
Caution
As noted previously, you will need to modify the backreference
designator based on the implementation used. JavaScript users will
need to use $ instead of the previously used \. ColdFusion users
should use \ for both find and replace operations.

Tip
As seen in this example, a subexpression may be referred to
multiple times simply by referring to the backreference as needed.


Let's look at one more example. User information is stored in a database, and
phone numbers are stored in the format 313-555-1234. However, you need to
reformat the phone numbers as (313) 555-1234. Here is the example:


313-555-1234

248-555-9999

810-555-9000



(\d{3})(-)(\d{3})(-)(\d{4})



($1) $3-$5




(313) 555-1234

(248) 555-9999

(810) 555-9000


Again, two regular expression patterns are used here. The first looks far more
complicated than it is, so let's walk through it. (\d{3})(-)(\d{3})(-)(\d{4})
matches a phone number, but breaks it into five subexpressions (so as to
isolate its parts). (\d{3}) matches the first three digits as the first
subexpression, (-) matches - as the second subexpression, and so on. The end
result is that the phone number is broken into five parts (each part its own
subexpression): the area code, a hyphen, the first three digits of the number,
another hyphen, and then the final four digits. These five parts can be used
individually and as needed, and so ($1) $3-$5 simply reformats the number
using only three of the subexpressions and ignoring the other two, thereby turning
313-555-1234 into (313) 555-1234.
Tip
When manipulating text for reformatting, it is often useful to break
the text into lots of little subexpressions so as to have greater
control over that text.
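
The phone number reformatting above can be sketched in Python as follows. This is only an illustration under stated assumptions: Python's re module is not one of the implementations discussed in this lesson, it writes backreferences in the replacement as \1, \3, and \5 rather than $1, $3, and $5, and the variable names are made up for the example.

```python
import re

# Sample data from the example above.
phone_numbers = ["313-555-1234", "248-555-9999", "810-555-9000"]

# Five subexpressions: area code, hyphen, prefix, hyphen, line number.
pattern = re.compile(r"(\d{3})(-)(\d{3})(-)(\d{4})")

# Python writes replacement backreferences as \1, \3, \5 (not $1, $3, $5).
# Only groups 1, 3, and 5 are reused; the two hyphen groups are ignored.
reformatted = [pattern.sub(r"(\1) \3-\5", number) for number in phone_numbers]

for number in reformatted:
    print(number)
```

Because the hyphens are captured but never referenced, the same result could be had with only three subexpressions; the five-part version simply mirrors the pattern used in the text.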

Converting Case
Some regex implementations support the use of conversion operations via the
metacharacters listed in Table 8.1.
Table 8.1. Case Conversion Metacharacters

Metacharacter   Description
\E              Terminate \L or \U conversion
\l              Convert next character to lowercase
\L              Convert all characters up to \E to lowercase
\u              Convert next character to uppercase
\U              Convert all characters up to \E to uppercase

\l and \u are placed before a character (or expression) so as to convert the case of
the next character. \L and \U convert the case of all characters until a terminating
\E is reached.
Following is a simple example, converting the text within an <H1> tag pair to
uppercase:


<BODY>

<H1>Welcome to my Homepage</H1>

Content is divided into two sections:<BR>

<H2>ColdFusion</H2>


Information about Macromedia ColdFusion.

<H2>Wireless</H2>

Information about Bluetooth, 802.11, and more.

<H2>This is not valid HTML</H3>

</BODY>



(<[Hh]1>)(.*?)(</[Hh]1>)



$1\U$2\E$3



<BODY>

<H1>WELCOME TO MY HOMEPAGE</H1>

Content is divided into two sections:<BR>

<H2>ColdFusion</H2>

Information about Macromedia ColdFusion.


<H2>Wireless</H2>

Information about Bluetooth, 802.11, and more.

<H2>This is not valid HTML</H3>

</BODY>


The pattern (<[Hh]1>)(.*?)(</[Hh]1>) breaks the header into three
subexpressions: the opening tag, the text, and the closing tag. The second pattern
then puts the text back together: $1 contains the start tag, \U$2\E converts the
second subexpression (the header text) to uppercase, and $3 contains the end tag.
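
Not every implementation supports these metacharacters. Python's re module, for example, does not accept \U or \E in a replacement string, so a rough equivalent there passes a function as the replacement and converts the second subexpression by hand. The sketch below assumes Python, and the helper name upper_heading is hypothetical.

```python
import re

# A shortened version of the sample page above.
html = "<H1>Welcome to my Homepage</H1>\n<H2>ColdFusion</H2>"

# Three subexpressions: opening tag, header text, closing tag.
pattern = re.compile(r"(<[Hh]1>)(.*?)(</[Hh]1>)")

def upper_heading(match):
    # Stands in for $1\U$2\E$3: only the text between the tags is uppercased.
    return match.group(1) + match.group(2).upper() + match.group(3)

result = pattern.sub(upper_heading, html)
print(result)
```

The <H2> line is left alone because the pattern matches only <H1> tag pairs.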
Summary
Subexpressions are used to define sets of characters or expressions. In addition to
being used for repeating matches (as seen in the previous lesson), subexpressions
can be referred to within patterns. This type of reference is called a backreference
(and unfortunately, there are implementation differences in backreference syntax).
Backreferences are useful in text matching and in replace operations.
Lesson 9. Looking Ahead and Behind
All the expressions used thus far have matched text, but sometimes you may want
to use expressions to mark the position of text to be matched (in contrast to the
matched text itself). This involves the use of lookaround (the capability to look
ahead and behind), which will be explained in this lesson.
Introducing Lookaround
Again, we'll start with an example. You need to extract the title of a Web page;
HTML page titles are placed between <TITLE> and </TITLE> tags in the
<HEAD> section of HTML code. Here's the example:



<HEAD>

<TITLE>Ben Forta's Homepage</TITLE>

</HEAD>



<[tT][iI][tT][lL][eE]>.*</[tT][iI][tT][lL][eE]>



<HEAD>

<TITLE>Ben Forta's Homepage</TITLE>

</HEAD>


<[tT][iI][tT][lL][eE]>.*</[tT][iI][tT][lL][eE]> matches the opening <TITLE> tag
(in upper, lower, or mixed case), the closing </TITLE> tag, and whatever text is
between them. That worked.
Or did it? What you needed was the title text, but what you got also contained the
opening and closing <TITLE> tags. Is it possible to return just the title text?
One solution could be to use subexpressions (as seen in Lesson 7, "Using
Subexpressions"). This would allow you to retrieve the matched text in three
parts: the opening tag, the text, and the closing tag. With the matched text broken
into parts, it would not be too difficult to extract just the part you want.
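
As a rough illustration of the subexpression approach, the Python sketch below wraps each of the three parts in parentheses and reads the title from the second group. Python and the variable names are assumptions made for this example; the pattern is otherwise the one shown above.

```python
import re

# The sample HTML from the example above.
html = "<HEAD>\n<TITLE>Ben Forta's Homepage</TITLE>\n</HEAD>"

# Same pattern as before, split into three subexpressions:
# opening tag, title text, closing tag.
pattern = re.compile(r"(<[tT][iI][tT][lL][eE]>)(.*)(</[tT][iI][tT][lL][eE]>)")

match = pattern.search(html)
whole_match = match.group(0)   # the tags plus the text between them
title_text = match.group(2)    # only the text between the tags

print(whole_match)
print(title_text)
```

The tags are still matched and captured here; the code just discards them afterward.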
But it makes little sense to make the effort to retrieve something that you actually
don't want, only to have to manually remove it. What you really need here is a way
