
PHP and MySQL Web Development

Chapter 4 String Manipulation and Regular Expressions
Repetition
Often you want to specify that there might be multiple occurrences of a particular string
or class of character. You can represent this using two special characters in your regular
expression. The * symbol means that the pattern can be repeated zero or more times, and
the + symbol means that the pattern can be repeated one or more times. The symbol
should appear directly after the part of the expression that it applies to. For example,
[[:alnum:]]+
means “at least one alphanumeric character.”
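As a rough illustration of the difference between * and +, the sketch below uses PHP's preg_match() from the PCRE extension. That is a substitution on our part: the POSIX functions covered later in this chapter were removed in PHP 7.0, and PCRE patterns need delimiters (here, slashes). The helper name matches_pattern() is hypothetical, and the ^ and $ anchors, described later in this section, are used only to force the whole string to be tested.

```php
<?php
// Hypothetical helper: true if $pattern matches somewhere in $subject.
function matches_pattern(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

// + requires at least one alphanumeric character.
var_dump(matches_pattern('/^[[:alnum:]]+$/', 'abc123')); // bool(true)
var_dump(matches_pattern('/^[[:alnum:]]+$/', ''));       // bool(false)

// * also accepts zero occurrences, so the empty string matches.
var_dump(matches_pattern('/^[[:alnum:]]*$/', ''));       // bool(true)
```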
Subexpressions
It’s often useful to be able to split an expression into subexpressions so you can, for
example, represent “at least one of these strings followed by exactly one of those.” You
can do this using parentheses, exactly the same way as you would in an arithmetic
expression. For example,
(very )*large
matches 'large', 'very large', 'very very large', and so on.
Counted Subexpressions
You can specify how many times something can be repeated by using a numerical
expression in curly braces ({}). You can show an exact number of repetitions ({3}
means exactly 3 repetitions), a range of repetitions ({2,4} means from 2 to 4 repetitions),
or an open-ended range of repetitions ({2,} means at least 2 repetitions).
For example,
(very ){1,3}
matches 'very ', 'very very ', and 'very very very '.
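The grouping and counting rules can be tried out with a short sketch. It assumes PHP's PCRE function preg_match() rather than the POSIX functions described later, and the helper matches_whole() is a hypothetical name. The helper wraps each pattern in ^(?: ... )$ so that the entire string has to match; (?: ) is a PCRE non-capturing group and is not part of the POSIX syntax in this chapter.

```php
<?php
// Hypothetical helper: true if the whole of $subject matches $pattern.
// The pattern must not contain the / delimiter.
function matches_whole(string $pattern, string $subject): bool
{
    return preg_match('/^(?:' . $pattern . ')$/', $subject) === 1;
}

// Zero or more repetitions of the group "very ".
var_dump(matches_whole('(very )*large', 'large'));           // bool(true)
var_dump(matches_whole('(very )*large', 'very very large')); // bool(true)

// Between one and three repetitions, written without a space in the braces.
var_dump(matches_whole('(very ){1,3}', 'very very '));           // bool(true)
var_dump(matches_whole('(very ){1,3}', 'very very very very ')); // bool(false)
```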
Anchoring to the Beginning or End of a String
You can specify whether a particular subexpression should appear at the start, the end, or both.
This is useful when you want to make sure that only your search term and nothing
else appears in the string.
The caret symbol (^) is used at the start of a regular expression to show that it must
appear at the beginning of a searched string, and $ is used at the end of a regular
expression to show that it must appear at the end.
For example, this matches bob at the start of a string:
^bob
This matches com at the end of a string:
com$
06 525x ch04 1/24/03 2:55 PM Page 112
Finally, this matches a string that consists of exactly one character from a to z:
^[a-z]$
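A minimal sketch of the three anchored patterns, again assuming preg_match() from the PCRE extension in place of the POSIX functions. The sample strings are invented for illustration. preg_match() returns 1 when the pattern matches and 0 when it does not.

```php
<?php
// Sample strings are hypothetical.
var_dump(preg_match('/^bob/', 'bob smith'));   // int(1): "bob" is at the start
var_dump(preg_match('/^bob/', 'hi bob'));      // int(0): "bob" is not at the start
var_dump(preg_match('/com$/', 'example.com')); // int(1): "com" is at the end
var_dump(preg_match('/^[a-z]$/', 'q'));        // int(1): exactly one lowercase letter
var_dump(preg_match('/^[a-z]$/', 'qq'));       // int(0): more than one character
```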
Branching
You can represent a choice in a regular expression with a vertical pipe. For example, if
we want to match com, edu, or net, we can use the expression:
(com)|(edu)|(net)
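One possible use of branching is sketched below, assuming the PCRE function preg_match(). The helper has_known_suffix() is a hypothetical name. It groups the three alternatives in one set of parentheses so that the leading dot and the $ anchor apply to every branch. The \. escape is explained in the next section.

```php
<?php
// Hypothetical helper: true if $host ends in .com, .edu, or .net.
function has_known_suffix(string $host): bool
{
    // The parentheses limit the alternation to the three suffixes.
    return preg_match('/\.(com|edu|net)$/', $host) === 1;
}

var_dump(has_known_suffix('example.edu')); // bool(true)
var_dump(has_known_suffix('example.org')); // bool(false)
```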
Matching Literal Special Characters
If you want to match one of the special characters mentioned in this section, such as .,
{, or $, you must put a backslash (\) in front of it. If you want to represent a backslash, you must
replace it with two backslashes, \\.
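The sketch below shows escaping by hand and, as an alternative, PHP's preg_quote(), which adds the backslashes automatically. Both are PCRE-based, which is an assumption on our part since the chapter itself describes the POSIX functions. The price string is an invented example. Inside a single-quoted PHP string, a backslash before $ or . is passed through to the pattern unchanged.

```php
<?php
// Escaping by hand: \$ matches a literal dollar sign, \. a literal dot.
var_dump(preg_match('/^\$9\.99$/', '$9.99')); // int(1)
var_dump(preg_match('/^\$9\.99$/', '$9x99')); // int(0)

// preg_quote() adds the backslashes; the second argument
// also escapes the pattern delimiter if it occurs in the text.
$literal = preg_quote('$9.99', '/');
echo $literal, "\n";                                   // \$9\.99
var_dump(preg_match('/^' . $literal . '$/', '$9.99')); // int(1)
```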
Summary of Special Characters
A summary of all the special characters is shown in Tables 4.4 and 4.5. Table 4.4 shows
the meaning of special characters outside square brackets, and Table 4.5 shows their
meaning when used inside square brackets.
Table 4.4 Summary of Special Characters Used in POSIX Regular Expressions Outside Square Brackets
Character    Meaning
\            Escape character
^            Match at start of string
$            Match at end of string
.            Match any character except newline (\n)
|            Start of alternative branch (read as OR)
(            Start subpattern
)            End subpattern
*            Repeat 0 or more times
+            Repeat 1 or more times
{            Start min/max quantifier
}            End min/max quantifier
Table 4.5 Summary of Special Characters Used in POSIX Regular Expressions Inside Square Brackets
Character    Meaning
\            Escape character
^            NOT, only if used in initial position
-            Used to specify character ranges
Putting It All Together for the Smart Form
There are at least two possible uses of regular expressions in the Smart Form application.
The first use is to detect particular terms in the customer feedback. We can be slightly
smarter about this using regular expressions. Using a string function, we’d have to do
three different searches if we wanted to match on 'shop', 'customer service', or
'retail'. With a regular expression, we can match all three:
shop|customer service|retail
The second use is to validate customer email addresses in our application by encoding
the standardized format of an email address in a regular expression. The format includes
some alphanumeric or punctuation characters, followed by an @ symbol, followed by a
string of alphanumeric and hyphen characters, followed by a dot, followed by more
alphanumeric and hyphen characters and possibly more dots, up until the end of the
string, which encodes as follows:
^[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-\.]+$
The subexpression ^[a-zA-Z0-9_\-\.]+ means “start the string with at least one letter,
number, underscore, hyphen, or dot, or some combination of those.”
The @ symbol matches a literal @.
The subexpression [a-zA-Z0-9\-]+ matches the first part of the host name including
alphanumeric characters and hyphens. Note that we’ve escaped the hyphen with a backslash because
it’s a special character inside square brackets.
The \. combination matches a literal dot (.).
The subexpression [a-zA-Z0-9\-\.]+$ matches the rest of a domain name, including
letters, numbers, hyphens, and more dots if required, up until the end of the string.
A bit of analysis shows that you can produce invalid email addresses that will still
match this regular expression. It is almost impossible to catch them all, but this will
improve the situation a little. You can refine this expression in many ways. You can, for
example, list valid TLDs. Be careful when making things more restrictive though, as a
validation function that rejects 1% of valid data is far more annoying than one that
allows through 10% of invalid data.
Now that you have read about regular expressions, we’ll look at the PHP functions
that use them.
Finding Substrings with Regular Expressions
Finding substrings is the main application of the regular expressions we just developed.
The two functions available in PHP for matching regular expressions are ereg() and
eregi().
The ereg() function has the following prototype:
int ereg(string pattern, string search, array [matches]);
This function searches the search string, looking for matches to the regular expression
in pattern. If matches are found for subexpressions of pattern, they will be stored in
the array matches, one subexpression per array element.
The eregi() function is identical except that it is not case sensitive.
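The matches array is easiest to understand by example. The sketch below is an approximation that uses preg_match() from the PCRE extension, because ereg() and eregi() were removed in PHP 7.0. preg_match() fills its third argument in a similar way: element 0 holds the whole match and the following elements hold one subexpression each. The helper split_email() and the sample address are hypothetical, and the parentheses were added to the chapter's email pattern to create two subexpressions.

```php
<?php
// Hypothetical helper: splits an address into local part and host,
// or returns null if it does not look like an email address.
function split_email(string $email): ?array
{
    $pattern = '/^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9\-]+\.[a-zA-Z0-9\-\.]+)$/';
    if (preg_match($pattern, $email, $matches) !== 1) {
        return null;
    }
    // $matches[0] is the whole match; [1] and [2] are the subexpressions.
    return ['local' => $matches[1], 'host' => $matches[2]];
}

print_r(split_email('bob@example.com'));  // local => bob, host => example.com
var_dump(split_email('not an address'));  // NULL
```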

We can adapt the Smart Form example to use regular expressions as follows:
if (!eregi('^[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-\.]+$', $email))
{
  echo 'That is not a valid email address. Please return to the'
       .' previous page and try again.';
  exit;
}
$toaddress = '';  // the default value
if (eregi('shop|customer service|retail', $feedback))
  $toaddress = '';
else if (eregi('deliver.*|fulfil.*', $feedback))
  $toaddress = '';
else if (eregi('bill|account', $feedback))
  $toaddress = '';
if (eregi('bigcustomer\.com', $email))
  $toaddress = '';
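On PHP 7.0 and later, where eregi() no longer exists, the same routing logic might be written with preg_match() and the i (case-insensitive) modifier. This is a sketch under several assumptions: the function name choose_recipient() is hypothetical, every address is an invented example.com placeholder, and the trailing .* has been dropped from the delivery pattern because an unanchored match does not need it.

```php
<?php
// All addresses below are hypothetical placeholders.
function choose_recipient(string $feedback, string $email): string
{
    $toaddress = 'feedback@example.com'; // the default value

    if (preg_match('/shop|customer service|retail/i', $feedback)) {
        $toaddress = 'retail@example.com';
    } elseif (preg_match('/deliver|fulfil/i', $feedback)) {
        $toaddress = 'fulfilment@example.com';
    } elseif (preg_match('/bill|account/i', $feedback)) {
        $toaddress = 'accounts@example.com';
    }

    // A message from the important customer overrides the choice above.
    if (preg_match('/bigcustomer\.com/i', $email)) {
        $toaddress = 'bob@example.com';
    }
    return $toaddress;
}

echo choose_recipient('My delivery was late', 'jo@example.org'), "\n"; // fulfilment@example.com
```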
Replacing Substrings with Regular Expressions
You can also use regular expressions to find and replace substrings in the same way as we
used str_replace(). The two functions available for this are ereg_replace() and
eregi_replace(). The function ereg_replace() has the following prototype:
string ereg_replace(string pattern, string replacement, string search);
This function searches for the regular expression pattern in the search string and
replaces it with the string replacement.
The function eregi_replace() is identical, but again, is not case sensitive.
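A short replacement example follows. It assumes preg_replace() from the PCRE extension, which takes its arguments in the same order (pattern, replacement, search string) and stands in for ereg_replace() on PHP 7.0 and later. The sample strings are invented.

```php
<?php
// Collapse each run of whitespace into a single space.
$tidy = preg_replace('/[[:space:]]+/', ' ', "too   many \t spaces");
echo $tidy, "\n"; // too many spaces

// The i modifier ignores case, much as eregi_replace() does.
$fixed = preg_replace('/php/i', 'PHP', 'Php and php');
echo $fixed, "\n"; // PHP and PHP
```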
Splitting Strings with Regular Expressions
Another useful regular expression function is split(), which has the following prototype:
array split(string pattern, string search, int [max]);
This function splits the string search into substrings on the regular expression pattern
and returns the substrings in an array. The max integer limits the number of items that
can go into the array.
This can be useful for splitting up domain names or dates. For example,
$domain = 'yallara.cs.rmit.edu.au';
$arr = split ('\.', $domain);
while (list($key, $value) = each ($arr))
  echo '<br />'.$value;
This splits the host name into its five components and prints each on a separate line.
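For comparison, here is how the same split might look with preg_split(), assuming the PCRE extension; split() was removed in PHP 7.0, and each() was removed in PHP 8.0, so the sketch uses foreach instead. The variable names $parts and $firstTwo are ours.

```php
<?php
$domain = 'yallara.cs.rmit.edu.au';

// preg_split() takes a delimited PCRE pattern; the dot must be escaped.
$parts = preg_split('/\./', $domain);
foreach ($parts as $value) {
    echo '<br />' . $value;
}

// The optional limit caps the number of pieces;
// the last piece keeps the rest of the string.
$firstTwo = preg_split('/\./', $domain, 2);
print_r($firstTwo); // 'yallara' and 'cs.rmit.edu.au'
```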
Comparison of String Functions and Regular Expression Functions
In general, the regular expression functions run less efficiently than the string functions
with similar functionality. If your application is simple enough to use string functions,
do so.
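To make the trade-off concrete, the sketch below performs the same two tasks both ways. It is illustrative only and makes no timing claims. preg_match() is assumed as the regular expression function, and the sample text is invented.

```php
<?php
$feedback = 'I visited your shop yesterday';

// Fixed substring: a string function is enough.
$viaString = stripos($feedback, 'shop') !== false;

// The same test with a regular expression.
$viaRegex = preg_match('/shop/i', $feedback) === 1;

var_dump($viaString, $viaRegex); // bool(true) bool(true)

// Splitting on a fixed delimiter: explode() needs no pattern at all.
$labels = explode('.', 'yallara.cs.rmit.edu.au');
print_r($labels);
```

A pattern earns its cost only when the search term varies, for example when several alternatives or a repetition have to be matched in one pass.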
Further Reading
PHP has many string functions. We have covered the more useful ones in this chapter,
but if you have a particular need (such as translating characters into Cyrillic), check the
PHP manual online to see if PHP has the function for you.
The amount of material available on regular expressions is enormous. You can start
with the man page for regexp if you are using UNIX, and there are also some terrific
articles at devshed.com and phpbuilder.com.
At Zend’s Web site, you can look at a more complex and powerful email validation
function than the one we developed here. It is called MailVal().
Regular expressions take a while to sink in; the more examples you look at and run,
the more confident you will be using them.
Next
In the next chapter, we’ll discuss several ways you can use PHP to save programming
time and effort and prevent redundancy by reusing pre-existing code.
