PHP and MySQL Web Development, part 14

return a number greater than zero. If str1 is less than str2, strcmp() will return a number
less than zero. This function is case sensitive.
The function strcasecmp() is identical except that it is not case sensitive.
The function strnatcmp() and its non-case sensitive twin, strnatcasecmp(), were added in
PHP 4. These functions compare strings according to a “natural ordering,” which is more the
way a human would do it. For example, strcmp() would order the string “2” as greater than
the string “12” because it is lexicographically greater. strnatcmp() would do it the other
way around. You can read more about natural ordering at />projects/natsort/
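As a rough illustration of the difference between these comparison functions, the sketch below compares the same pair of strings both ways. The sign() helper is a hypothetical name introduced here only to normalize the return values, because the exact magnitude returned can differ between PHP versions.

```php
<?php
// Hypothetical helper: reduce any comparison result to -1, 0, or 1.
function sign(int $n): int
{
    return $n <=> 0;
}

$byteOrder    = sign(strcmp("2", "12"));      // compared character by character
$naturalOrder = sign(strnatcmp("2", "12"));   // compared as the numbers 2 and 12
$ignoreCase   = strcasecmp("Hello", "hello"); // case is ignored, so the strings are equal

echo "strcmp: $byteOrder, strnatcmp: $naturalOrder, strcasecmp: $ignoreCase\n";
```

With strcmp() the string "2" sorts after "12", whereas strnatcmp() puts it first.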
Testing String Length with strlen()
We can check the length of a string with the strlen() function. If you pass it a string, this
function returns its length. For example,
strlen("hello") returns 5.
This can be used for validating input data. Consider the email address on our form, stored in
$email. One basic way of validating it is to check its length. By my reasoning, the minimum
length of an email address is six characters: for example, an address with a country code and
no second-level domain, a one-letter server name, and a one-letter mailbox name. Therefore, an
error could be produced if the address is shorter than this:
if (strlen($email) < 6)
{
  echo "That email address is not valid";
  exit; // finish execution of PHP script
}
Clearly, this is a very simplistic way of validating this information. We will look at better ways
in the next section.
Matching and Replacing Substrings with String Functions
It’s common to want to check if a particular substring is present in a larger string. This partial
matching is usually more useful than testing for equality.
In our Smart Form example, we want to look for certain key phrases in the customer feedback
and send the mail to the appropriate department. If we want to send emails talking about Bob’s
shops to the retail manager, we want to know if the word “shop” (or derivatives thereof) appears
in the message.
Chapter 4: String Manipulation and Regular Expressions
Given the functions we have already looked at, we could use explode() or strtok() to
retrieve the individual words in the message, and then compare them using the == operator or
strcmp().
However, we could also do the same thing with a single function call to one of the string
matching or regular expression matching functions. These are used to search for a pattern
inside a string. We’ll look at each set of functions one by one.
Finding Strings in Strings: strstr(), strchr(), strrchr(), stristr()
To find a string within another string you can use any of the functions strstr(), strchr(),
strrchr(), or stristr().
The function strstr() is the most generic, and can be used to find a string or character match
within a longer string. Note that in PHP, the strchr() function is exactly the same as
strstr(), although its name implies that it is used to find a character in a string, similar to the
C version of this function. In PHP, either of these functions can be used to find a string inside a
string, including finding a string containing only a single character.
The prototype for strstr() is as follows:

string strstr(string haystack, string needle);
You pass the function a haystack to be searched and a needle to be found. If an exact match
of the needle is found, the function returns the haystack from the needle onwards; otherwise,
it returns false. If the needle occurs more than once, the returned string will start from the
first occurrence of needle.
For example, in the Smart Form application, we can decide where to send the email as follows:
$toaddress = ""; // the default value
// Change the $toaddress if the criteria are met
if (strstr($feedback, "shop"))
  $toaddress = "";
else if (strstr($feedback, "delivery"))
  $toaddress = "";
else if (strstr($feedback, "bill"))
  $toaddress = "";
This code checks for certain keywords in the feedback and sends the mail to the appropriate
person. If, for example, the customer feedback reads “I still haven’t received delivery of
my last order,” the string “delivery” will be detected and the feedback will be sent to
the address set in that branch.
Part I: Using PHP
There are two variants on strstr(). The first variant is stristr(), which is nearly identical
but is not case sensitive. This will be useful for this application as the customer might type
“delivery”, “Delivery”, or “DELIVERY”.
The second variant is strrchr(), which is again nearly identical, but will return the haystack
from the last occurrence of the needle onwards.
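The three functions can be compared side by side in a short sketch. The sample strings below are made up for illustration and are not part of the Smart Form code.

```php
<?php
// Sample text invented for this illustration.
$feedback = "I still haven't received DELIVERY of my last order";

// strstr() is case sensitive, so the lowercase needle is not found.
$exact = strstr($feedback, "delivery");

// stristr() ignores case and returns the haystack from the match onwards.
$anyCase = stristr($feedback, "delivery");

// strrchr() returns the haystack from the LAST occurrence of the character.
$last = strrchr("bob/shop/orders", "/");

var_dump($exact);            // bool(false)
echo $anyCase, "\n";         // DELIVERY of my last order
echo $last, "\n";            // /orders
```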
Finding the Position of a Substring: strpos(), strrpos()
The functions strpos() and strrpos() operate in a similar fashion to strstr(), except,
instead of returning a substring, they return the numerical position of a needle within a
haystack.
The strpos() function has the following prototype:
int strpos(string haystack, string needle, int [offset] );
The integer returned represents the position of the first occurrence of the needle within the
haystack. The first character is in position 0 as usual.
For example, the following code will echo the value 4 to the browser:
$test = "Hello world";
echo strpos($test, "o");
In this case, we have only passed in a single character as the needle, but it can be a string of
any length.
The optional offset parameter is used to specify a point within the haystack to start searching.
For example:
echo strpos($test, "o", 5);
This code will echo the value 7 to the browser because PHP has started looking for the
character o at position 5, and therefore does not see the one at position 4.
The strrpos() function is almost identical, but will return the position of the last occurrence
of the needle in the haystack. Unlike strpos(), in PHP 4 it only works with a single character
needle. Therefore, if you pass it a string as a needle, it will only use the first character of
the string to match. (From PHP 5 onward, strrpos() accepts a needle of any length.)
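The positions described above can be checked with a small sketch. It reuses the $test string from the text; the other variable names are chosen here for illustration.

```php
<?php
$test = "Hello world";

$first = strpos($test, "o");      // first "o"
$after = strpos($test, "o", 5);   // first "o" at or after position 5
$last  = strrpos($test, "o");     // last "o"
$word  = strpos($test, "world");  // the needle may be longer than one character

echo "$first $after $last $word\n"; // 4 7 7 6
```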
In any of these cases, if the needle is not in the string, strpos() or strrpos() will return
false. This can be problematic because false in a weakly typed language such as PHP is
equivalent to 0, that is, the first character in a string.
You can avoid this problem by using the === operator to test return values:
$result = strpos($test, "H");
if ($result === false)
  echo "Not found";
else
  echo "Found at position 0";
Note that this will only work in PHP 4 and later. In earlier versions you can test for false by
testing the return value to see if it is a string (that is, false).
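To make the pitfall concrete, the following sketch wraps the strict test in a small function. The name describe_match() is hypothetical and used only for this example; the loose comparison at the end shows what goes wrong with == when the match is at position 0.

```php
<?php
// Hypothetical helper that reports where a needle was found.
function describe_match(string $haystack, string $needle): string
{
    $pos = strpos($haystack, $needle);
    if ($pos === false) {          // strict test: only a real false means "no match"
        return "Not found";
    }
    return "Found at position $pos";
}

$atStart = describe_match("Hello world", "H");
$missing = describe_match("Hello world", "z");

// Loose comparison treats position 0 as false, which is the bug to avoid.
$looksMissing = (strpos("Hello world", "H") == false);

echo $atStart, "\n";   // Found at position 0
echo $missing, "\n";   // Not found
var_dump($looksMissing); // bool(true)
```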
Replacing Substrings: str_replace(), substr_replace()
Find-and-replace functionality can be extremely useful with strings. We have used find and
replace in the past for personalizing documents generated by PHP—for example by replacing
<<name>> with a person’s name and <<address>> with their address. You can also use it for
censoring particular terms, such as in a discussion forum application, or even in the Smart
Form application.
Again, you can use string functions or regular expression functions for this purpose.
The most commonly used string function for replacement is str_replace(). It has the following
prototype:
string str_replace(string needle, string new_needle, string haystack);
This function will replace all the instances of needle in haystack with new_needle.
For example, because people can use the Smart Form to complain, they might use some colorful
words. As programmers, we can prevent Bob’s various departments from being abused in
that way:
$feedback = str_replace($offcolor, "%!@*", $feedback);
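The text does not show how $offcolor is defined. One possibility, assumed here purely for illustration, is an array of words to censor, which str_replace() accepts as its first argument. The word list and the sample feedback are invented.

```php
<?php
// Assumption: $offcolor is an array of unwanted words (placeholder values).
$offcolor = ["darn", "heck"];
$feedback = "Where the heck is my darn order?";

// Every occurrence of every listed word is replaced with the same string.
$clean = str_replace($offcolor, "%!@*", $feedback);

echo $clean, "\n"; // Where the %!@* is my %!@* order?
```

Because str_replace() matches plain substrings, a word list like this would also alter longer words that happen to contain a listed word.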
The function substr_replace() is used to find and replace a particular substring of a string. It
has the following prototype:

string substr_replace(string string, string replacement, int start, int [length] );
This function will replace part of the string string with the string replacement. Which part is
replaced depends upon the values of the start and optional length parameters.
The start value represents an offset into the string where replacement should begin. If it is 0
or positive, it is an offset from the beginning of the string; if it is negative, it is an offset from
the end of the string. For example, this line of code will replace the last character in $test
with “X”:
$test = substr_replace($test, "X", -1);
The length value is optional and represents the point at which PHP will stop replacing. If you
don’t supply this value, the string will be replaced from start to the end of the string.
If length is zero, the replacement string will actually be inserted into the string without
overwriting the existing string.
A positive length represents the number of characters that you want replaced with the new
string.
A negative length represents the point at which you’d like to stop replacing characters,
counted from the end of the string.
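The four cases for start and length can be tried out in a short sketch. The sample string and variable names are chosen here for illustration.

```php
<?php
$test = "Hello world";

// Negative start, no length: replace from the last character to the end.
$lastChar = substr_replace($test, "X", -1);

// Length 0: insert at position 5 without overwriting anything.
$inserted = substr_replace($test, "XX", 5, 0);

// Positive length: replace exactly one character at the start.
$firstChar = substr_replace($test, "J", 0, 1);

// Negative length: start at position 5 and stop 5 characters before the end.
$middle = substr_replace($test, "---", 5, -5);

echo "$lastChar\n$inserted\n$firstChar\n$middle\n";
```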
Introduction to Regular Expressions
PHP supports two styles of regular expression syntax: POSIX and Perl. In PHP 4, the POSIX style
of regular expression is compiled into PHP by default, but you can use the Perl style by
compiling in the PCRE (Perl-compatible regular expression) library. (The POSIX functions were
deprecated in PHP 5.3 and removed in PHP 7, where PCRE is the standard engine.) We’ll cover the
simpler POSIX style, but if you’re already a Perl programmer, or want to learn more about PCRE,
read the PCRE section of the online manual.
So far, all the pattern matching we’ve done has used the string functions. We have been limited
to exact match, or to exact substring match. If you want to do more complex pattern matching,
you should use regular expressions. Regular expressions are difficult to grasp at first but can be
extremely useful.
The Basics
A regular expression is a way of describing a pattern in a piece of text. The exact (or literal)
matches we’ve done so far are a form of regular expression. For example, earlier we were
searching for regular expression terms like “shop” and “delivery”.
Matching regular expressions in PHP is more like a strstr() match than an equal comparison
because you are matching a string somewhere within another string. (It can be anywhere
within that string unless you specify otherwise.) For example, the string “shop” matches the
regular expression “shop”. It also matches the regular expressions “h”, “ho”, and so on.
We can use special characters to indicate a meta-meaning in addition to matching characters
exactly.
For example, with special characters you can indicate that a pattern must occur at the start or
end of a string, that part of a pattern can be repeated, or that characters in a pattern must be of
a particular type. You can also match on literal occurrences of special characters. We’ll look at
each of these.
Character Sets and Classes
Using character sets immediately gives regular expressions more power than exact matching
expressions. Character sets can be used to match any character of a particular type; they’re
really a kind of wildcard.
First of all, you can use the . character as a wildcard for any other single character except a
new line (\n). For example, the regular expression
.at
matches the strings “cat”, “sat”, and “mat”, among others.
This kind of wildcard matching is often used for filename matching in operating systems.
With regular expressions, however, you can be more specific about the type of character you
would like to match, and you can actually specify a set that a character must belong to. In the
previous example, the regular expression matches “cat” and “mat”, but also matches “#at”. If
you want to limit this to a character between a and z, you can specify it as follows:
[a-z]
Anything enclosed in the special square brace characters [ and ] is a character class—a set of
characters to which a matched character must belong. Note that the expression in the square
brackets matches only a single character.
You can list a set, for example
[aeiou]
means any vowel.
You can also describe a range, as we just did using the special hyphen character, or a set of
ranges:
[a-zA-Z]
This set of ranges stands for any alphabetic character in upper- or lowercase.
You can also use sets to specify that a character cannot be a member of a set. For example,
[^a-z]
matches any character that is not between a and z. The caret symbol means not when it is
placed inside the square brackets. It has another meaning when used outside square brackets,
which we’ll look at in a minute.
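The wildcard and character-set patterns above can be tried out with a small sketch. Because the POSIX ereg() functions are not available in PHP 7 and later, this example substitutes preg_match() from the PCRE extension, which handles these particular patterns the same way. The matches() helper is a hypothetical name introduced only for this example.

```php
<?php
// Hypothetical helper: true if $pattern matches anywhere in $subject.
// The surrounding '/' characters are PCRE delimiters, not part of the pattern.
function matches(string $pattern, string $subject): bool
{
    return preg_match('/' . $pattern . '/', $subject) === 1;
}

$dotCat    = matches('.at', 'cat');      // wildcard matches a letter
$dotHash   = matches('.at', '#at');      // ...and also a "#"
$rangeCat  = matches('[a-z]at', 'cat');  // first character must be a to z
$rangeHash = matches('[a-z]at', '#at');  // "#" is outside the range
$negated   = matches('[^a-z]at', '#at'); // caret inside brackets means "not"
$vowel     = matches('[aeiou]', 'xyz');  // no vowel anywhere in the subject

var_dump($dotCat, $dotHash, $rangeCat, $rangeHash, $negated, $vowel);
```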
In addition to listing out sets and ranges, a number of predefined character classes can be used
in a regular expression. These are shown in Table 4.3.
TABLE 4.3 Character Classes for Use in POSIX-Style Regular Expressions

Class          Matches
[[:alnum:]]    Alphanumeric characters
[[:alpha:]]    Alphabetic characters
[[:lower:]]    Lowercase letters
[[:upper:]]    Uppercase letters
[[:digit:]]    Decimal digits
[[:xdigit:]]   Hexadecimal digits
[[:punct:]]    Punctuation
[[:blank:]]    Tabs and spaces
[[:space:]]    Whitespace characters
[[:cntrl:]]    Control characters
[[:print:]]    All printable characters
[[:graph:]]    All printable characters except for space

Repetition
Often you want to specify that there might be multiple occurrences of a particular string or
class of character. You can represent this using two special characters in your regular
expression. The * symbol means that the pattern can be repeated zero or more times, and the +
symbol means that the pattern can be repeated one or more times. The symbol should appear
directly after the part of the expression that it applies to. For example,
[[:alnum:]]+
means “at least one alphanumeric character.”
Subexpressions
It’s often useful to be able to split an expression into subexpressions so you can, for example,
represent “at least one of these strings followed by exactly one of those.” You can do this using
parentheses, exactly the same way as you would in an arithmetic expression. For example,
(very )*large

matches “large”, “very large”, “very very large”, and so on.
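A short sketch can show the difference between * and +, and how parentheses group a repeated subexpression. As ereg() is unavailable in PHP 7 and later, preg_match() is substituted here; the matches() helper is a hypothetical name used only for illustration.

```php
<?php
// Hypothetical helper: true if $pattern matches anywhere in $subject.
function matches(string $pattern, string $subject): bool
{
    return preg_match('/' . $pattern . '/', $subject) === 1;
}

// "+" needs at least one alphanumeric character somewhere in the subject.
$plusOnPunct = matches('[[:alnum:]]+', '---');
// "*" is satisfied by zero occurrences, so it matches even here.
$starOnPunct = matches('[[:alnum:]]*', '---');

// The parentheses make "*" apply to the whole word "very " rather than one character.
$plain  = matches('(very )*large', 'large');
$double = matches('(very )*large', 'very very large');

var_dump($plusOnPunct, $starOnPunct, $plain, $double);
```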
Counted Subexpressions
We can specify how many times something can be repeated by using a numerical expression in
curly braces ({}). You can show an exact number of repetitions ({3} means exactly 3
repetitions), a range of repetitions ({2,4} means from 2 to 4 repetitions), or an open-ended
range of repetitions ({2,} means at least two repetitions).
For example,
(very ){1,3}
matches “very ”, “very very ”, and “very very very ”.
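The counts can be checked with the sketch below. It substitutes preg_match() for the POSIX functions, which are unavailable in PHP 7 and later, and it wraps each pattern in the ^ and $ anchors described in the next section, because without them a longer run of repetitions would still contain a matching substring. The matches() helper is a hypothetical name.

```php
<?php
// Hypothetical helper: true if $pattern matches anywhere in $subject.
function matches(string $pattern, string $subject): bool
{
    return preg_match('/' . $pattern . '/', $subject) === 1;
}

$one     = matches('^(very ){1,3}large$', 'very large');
$three   = matches('^(very ){1,3}large$', 'very very very large');
$four    = matches('^(very ){1,3}large$', 'very very very very large'); // one too many
$exact   = matches('^(very ){3}large$', 'very very large');             // needs exactly 3
$atLeast = matches('^(very ){2,}large$', 'very very very very large');  // 2 or more

var_dump($one, $three, $four, $exact, $atLeast);
```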
Anchoring to the Beginning or End of a String
You can specify whether a particular subexpression should appear at the start, the end, or both.
This is pretty useful when you want to make sure that only your search term and nothing else
appears in the string.
The caret symbol (^) is used at the start of a regular expression to show that it must appear at
the beginning of a searched string, and $ is used at the end of a regular expression to show that
it must appear at the end.
For example, this matches bob at the start of a string:
^bob
This matches com at the end of a string:
com$

Finally, this matches any single character from a to z, in the string on its own:
^[a-z]$
Branching
You can represent a choice in a regular expression with a vertical pipe. For example, if we
want to match com, edu, or net, we can use the expression:
(com)|(edu)|(net)
Matching Literal Special Characters
If you want to match one of the special characters mentioned in this section, such as ., {, or $,
you must put a backslash (\) in front of it. If you want to represent a backslash, you must
replace it with two backslashes, \\.
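Anchoring, branching, and escaping can be seen together in one sketch. Again, preg_match() is substituted for the POSIX functions, which are unavailable in PHP 7 and later, and the matches() helper and the sample strings are invented for illustration. The patterns are written in single-quoted PHP strings so that the backslash and the $ reach the regular expression engine unchanged.

```php
<?php
// Hypothetical helper: true if $pattern matches anywhere in $subject.
function matches(string $pattern, string $subject): bool
{
    return preg_match('/' . $pattern . '/', $subject) === 1;
}

// Anchors
$startsWithBob = matches('^bob', 'bobby');       // "bob" at the start
$bobLater      = matches('^bob', 'hi bob');      // "bob" is not at the start
$endsWithCom   = matches('com$', 'example.com'); // "com" at the end
$oneLetter     = matches('^[a-z]$', 'q');        // a single letter on its own
$twoLetters    = matches('^[a-z]$', 'qq');       // too long

// Branching
$isEdu = matches('(com)|(edu)|(net)', 'school.edu');

// Escaping: "\." is a literal dot, a bare "." is still the wildcard.
$literalDot  = matches('\.com$', 'examplecom');
$wildcardDot = matches('.com$', 'examplecom');

var_dump($startsWithBob, $bobLater, $endsWithCom, $oneLetter, $twoLetters);
var_dump($isEdu, $literalDot, $wildcardDot);
```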
Summary of Special Characters
A summary of all the special characters is shown in Tables 4.4 and 4.5. Table 4.4 shows the
meaning of special characters outside square brackets, and Table 4.5 shows their meaning
when used inside square brackets.
TABLE 4.4 Summary of Special Characters Used in POSIX Regular Expressions Outside Square Brackets

Character   Meaning
\           Escape character
^           Match at start of string
$           Match at end of string
.           Match any character except newline (\n)
|           Start of alternative branch (read as OR)
(           Start subpattern
)           End subpattern
*           Repeat 0 or more times
+           Repeat 1 or more times
{           Start min/max quantifier
}           End min/max quantifier

TABLE 4.5 Summary of Special Characters Used in POSIX Regular Expressions Inside Square Brackets

Character   Meaning
\           Escape character
^           NOT, only if used in initial position
-           Used to specify character ranges

Putting It All Together for the Smart Form
There are at least two possible uses of regular expressions in the Smart Form application. The
first use is to detect particular terms in the customer feedback. We can be slightly smarter
about this using regular expressions. Using a string function, we’d have to do three different
searches if we wanted to match on “shop”, “customer service”, or “retail”. With a regular
expression, we can match all three:
shop|customer service|retail
The second use is to validate customer email addresses in our application by encoding the
standardized format of an email address in a regular expression. The format includes some
alphanumeric or punctuation characters, followed by an @ symbol, followed by a string of
alphanumeric and hyphen characters, followed by a dot, followed by more alphanumeric and
hyphen characters and possibly more dots, up until the end of the string, which encodes as
follows:
^[a-zA-Z0-9_]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-\.]+$
The subexpression ^[a-zA-Z0-9_]+ means “start the string with at least one letter, number, or
underscore, or some combination of those.”
The @ symbol matches a literal @.
The subexpression [a-zA-Z0-9\-]+ matches the first part of the host name including
alphanumeric characters and hyphens. Note that we’ve escaped the hyphen with a backslash
because it’s a special character inside square brackets.
The \. combination matches a literal dot (.).
The subexpression [a-zA-Z0-9\-\.]+$ matches the rest of a domain name, including letters,
numbers, hyphens, and more dots if required, up until the end of the string.
A bit of analysis shows that you can produce invalid email addresses that will still match this
regular expression. It is almost impossible to catch them all, but this will improve the situation
a little.
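The behavior of this expression, including its blind spots, can be explored with a sketch. It runs the same pattern through preg_match(), substituted here because the POSIX functions are unavailable in PHP 7 and later. The function name is_plausible_email() and the sample addresses are invented for illustration.

```php
<?php
// Hypothetical helper applying the pattern from the text, wrapped in PCRE delimiters.
function is_plausible_email(string $email): bool
{
    $pattern = '/^[a-zA-Z0-9_]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-\.]+$/';
    return preg_match($pattern, $email) === 1;
}

$ordinary  = is_plausible_email('bob@example.com');
$noDot     = is_plausible_email('bob@example');           // no dot after the host name
$dottedBox = is_plausible_email('bob.smith@example.com'); // dot before the @ is rejected
$doubleDot = is_plausible_email('a@b..c');                // invalid, yet it still matches

var_dump($ordinary, $noDot, $dottedBox, $doubleDot);
```

The last two cases illustrate the point made above: the expression turns away some addresses that are in common use and accepts some that are not valid.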
Now that you have read about regular expressions, we’ll look at the PHP functions that use
them.
Finding Substrings with Regular Expressions
Finding substrings is the main application of the regular expressions we just developed. The
two functions available in PHP for matching regular expressions are ereg() and eregi().
The ereg() function has the following prototype:
int ereg(string pattern, string search, array [matches]);
This function searches the search string, looking for matches to the regular expression in
pattern. If matches are found for subexpressions of pattern, they will be stored in the array
matches, one subexpression per array element.
The eregi() function is identical except that it is not case sensitive.
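Because ereg() and eregi() are not available in PHP 7 and later, the sketch below shows the same idea with preg_match(), which is a substitution rather than the functions described above. It fills a matches array in a similar way, and the i modifier plays the role of eregi(). The sample feedback text is invented.

```php
<?php
// Sample text invented for this illustration.
$feedback = "The customer service at your Retail outlet was great";

// Parentheses create a subexpression whose match is stored in $matches[1];
// $matches[0] holds the text matched by the whole pattern.
$found = preg_match('/(shop|customer service|retail)/', $feedback, $matches);

// Case-sensitive search for "retail" fails because the text has "Retail".
$caseSensitive = preg_match('/retail/', $feedback);

// The "i" modifier makes the search case insensitive, as eregi() does.
$anyCase = preg_match('/retail/i', $feedback, $anyCaseMatches);

echo $found, " ", $matches[1], "\n";            // 1 customer service
echo $caseSensitive, "\n";                      // 0
echo $anyCase, " ", $anyCaseMatches[0], "\n";   // 1 Retail
```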