Plug-in PHP: 100 Power Solutions, P16

Chapter 3: Text Processing
• $contractions Array of the contracted forms of $what
• $j, $k Integer loop counters
• $from, $to Strings to convert from and to
• $u1, $u2 Strings containing start and end underline tags if $emphasis is true
• $f, $t, $s, $e Various arguments passed to the function PIPHP_FT_FN1()
• $uf, $ut String variable copies of $f and $t with their initial letters capitalized
• $1, $2 String variables containing the matches found by preg_replace()
• PIPHP_FT_FN1() Function to perform the string replacements
How It Works
This plug-in takes a string of text as an argument, modifies a copy of it, and returns the
result. The original text is not changed by the process. It makes five passes through the
text, each contracting a different kind of English phrase.
The first pass iterates through the $misc array, stepping two elements at a time. It
searches for the first element of each pair and, if found, replaces it with the second. The
$misc array contains a set of unusual contractions that don’t follow the normal English
rules (plus the abbreviations i.e. and e.g., which it expands to for example), which is
one reason why the program gets them out of the way first.
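The pairwise stepping described above can be sketched as follows. This is a simplified illustration, not the plug-in's own code: the function name sketch_pairwise_replace() is hypothetical, and it uses plain str_replace() where the plug-in calls PIPHP_FT_FN1().

```php
<?php
// Hypothetical, simplified stand-in for the first pass. It uses plain
// str_replace() rather than the plug-in's PIPHP_FT_FN1() helper.
function sketch_pairwise_replace($text, $pairs)
{
    // Step two elements at a time: even index = search, odd = replacement
    for ($j = 0; $j + 1 < count($pairs); $j += 2) {
        $text = str_replace($pairs[$j], $pairs[$j + 1], $text);
    }
    return $text;
}

$pairs = array("will not", "won't", "shall not", "shan't");
echo sketch_pairwise_replace("It will not rain and we shall not mind.", $pairs), "\n";
// prints: It won't rain and we shan't mind.
```

Storing each search phrase next to its replacement keeps a pair together in one flat array, at the cost of having to remember that the loop counter advances by two.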
The second pass works through the $nots array and checks whether any of the words
in it are followed by the word not. If so, it contracts them so that, for example, did not
becomes didn’t.
In the third pass, the $haves array is processed in the same manner as the $nots
array, except that pairs of words such as should have become should’ve.
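The way passes two and three build their search and replacement strings can be sketched as below. The helper sketch_build_pair() is a hypothetical name used only for illustration. The last example suggests why will not is handled by the $misc array instead: will does not appear in $nots, and the regular rule would produce the wrong form.

```php
<?php
// Hypothetical helper mirroring how passes two and three build each pair:
// the search phrase is "word follower", the result is the word plus a suffix.
function sketch_build_pair($word, $follower, $suffix)
{
    return array("$word $follower", $word . $suffix);
}

$examples = array(
    sketch_build_pair("did", "not", "n't"),      // regular
    sketch_build_pair("should", "have", "'ve"),  // regular
    sketch_build_pair("will", "not", "n't"),     // irregular: the rule gives the wrong form
);

foreach ($examples as $pair) {
    echo $pair[0], " => ", $pair[1], "\n";
}
// prints:
// did not => didn't
// should have => should've
// will not => willn't
```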
Pass four uses a pair of nested loops. The outer loop iterates through the $who array of
pronouns and similar words, and the inner loop iterates through the $what array of words
that can follow directly after them and be contracted. If a match is made, the contraction to
use is looked up at the same position in $contractions and applied. So, for example, he has
will become he’s.
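The positional lookup between $what and $contractions can be sketched as follows. The two arrays are copied from the listing; the use of array_search() to find the index is only for illustration, since the plug-in itself simply reuses its loop counter $k.

```php
<?php
// The two parallel arrays, copied from the listing
$what         = array("am", "are", "had", "has", "have", "shall", "will", "would");
$contractions = array("m", "re", "d", "s", "ve", "ll", "ll", "d");

$who  = "he";
$k    = array_search("has", $what);   // index 3
$from = "$who $what[$k]";             // "he has"
$to   = "$who'$contractions[$k]";     // "he's"
echo "$from => $to\n";                // prints: he has => he's
```

Because the lookup is purely positional, the two arrays must stay the same length and in the same order. It also means that had and would both map to 'd, and shall and will both map to 'll.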
The final pass, at the end of the main function, looks for every instance of the word is
that has another word and a space in front of it, and contracts the two together so that,
for example, Paul is would become Paul’s.
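As a rough illustration of the final pass, the sketch below applies the same pattern that appears in the listing, with emphasis switched off. The sample sentence is an assumption chosen for the example.

```php
<?php
// Same pattern as the final pass, with emphasis switched off
$text   = "Paul is here and the cat is asleep.";
$result = preg_replace('/([\w]*) is([^\w]+)/', '$1\'s$2', $text);
echo $result, "\n";   // prints: Paul's here and the cat's asleep.
```

The pattern requires a non-word character after is, so a word that merely begins with those letters, such as island, is left alone.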
The second function in this code, PIPHP_FT_FN1(), is used only by the plug-in. It takes
the four arguments $f, $t, $s, and $e, which in order contain the string to change from, what
to change it to if found, the string to search within, and whether to emphasize any changes
by underlining them. It does all this by using regular expressions within the PHP
preg_replace() function. Each match and replace is performed twice: once as given, and a
second time with the initial letters capitalized, to catch strings beginning with capital letters.
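The idea of running each replacement twice, once in lowercase and once with a capitalized first letter, can be sketched as below. This is a simplified variant rather than the plug-in's helper: the function name is hypothetical, \b word boundaries stand in for the plug-in's own patterns, and preg_quote() is added to escape the search phrase.

```php
<?php
// Hypothetical simplified helper: \b word boundaries stand in for the
// plug-in's own patterns, and preg_quote() escapes the search phrase.
function sketch_replace_both_cases($from, $to, $text)
{
    // One entry as given, one with the first letter capitalized
    $variants = array($from => $to, ucfirst($from) => ucfirst($to));

    foreach ($variants as $f => $t) {
        $text = preg_replace('/\b' . preg_quote($f, '/') . '\b/', $t, $text);
    }
    return $text;
}

echo sketch_replace_both_cases("did not", "didn't", "Did not you know? I did not."), "\n";
// prints: Didn't you know? I didn't.
```

Matching is case-sensitive, which is why the capitalized form needs its own pass. A case-insensitive pattern would find both forms at once but would lose the capital letter in the replacement.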
NOTE The function PIPHP_FT_FN1() uses an obscure name since it has no real use anywhere
other than as a partner function to PIPHP_FriendlyText(). Where partner functions can be
useful in their own right, they are given a more memorable name, such as the ones for
PIPHP_SpellCheck(), PIPHP_SpellCheckLoadDictionary(), and PIPHP_SpellCheckWord(),
which appear a little further on in this chapter, in plug-in 8.

How to Use It
To transform any text (including text with HTML) using this plug-in, call the main function
in the following way:
$oldtext = "Let us go for a picnic. I hope it will not rain.";
$newtext = PIPHP_FriendlyText($oldtext, TRUE);
The first parameter holds the string to be modified. This will not be changed. Instead, a
new string containing the transformed text will be returned by the function. The second
parameter can be either FALSE or TRUE. If it is TRUE, all changes are underlined, which
can be useful for debugging purposes.

In this example, the text held in $newtext reads “Let’s go for a picnic. I hope it won’t
rain.” Because the second argument is TRUE, each change is wrapped in <u> and </u> tags.
The Plug-in
function PIPHP_FriendlyText($text, $emphasis)
{
   // Irregular from/to pairs, stored as alternating elements
   $misc = array("let us", "let's", "i\.e\.", "for example",
      "e\.g\.", "for example", "cannot", "can't", "can not",
      "can't", "shall not", "shan't", "will not", "won't");
   $nots = array("are", "could", "did", "do", "does", "is",
      "had", "has", "have", "might", "must", "should", "was",
      "were", "would");
   $haves = array("could", "might", "must", "should", "would");
   $who = array("he", "here", "how", "I", "it", "she", "that",
      "there", "they", "we", "who", "what", "when", "where",
      "why", "you");
   // $what and $contractions are parallel arrays
   $what = array("am", "are", "had", "has", "have", "shall",
      "will", "would");
   $contractions = array("m", "re", "d", "s", "ve", "ll", "ll",
      "d");

   // Pass 1: irregular pairs from $misc
   for ($j = 0 ; $j < sizeof($misc) ; $j += 2)
   {
      $from = $misc[$j];
      $to   = $misc[$j+1];
      $text = PIPHP_FT_FN1($from, $to, $text, $emphasis);
   }

   // Pass 2: "<word> not" becomes "<word>n't"
   for ($j = 0 ; $j < sizeof($nots) ; ++$j)
   {
      $from = $nots[$j] . " not";
      $to   = $nots[$j] . "n't";
      $text = PIPHP_FT_FN1($from, $to, $text, $emphasis);
   }

   // Pass 3: "<word> have" becomes "<word>'ve"
   for ($j = 0 ; $j < sizeof($haves) ; ++$j)
   {
      $from = $haves[$j] . " have";
      $to   = $haves[$j] . "'ve";
      $text = PIPHP_FT_FN1($from, $to, $text, $emphasis);
   }

   // Pass 4: every $who word followed by every $what word
   for ($j = 0 ; $j < sizeof($who) ; ++$j)
   {
      for ($k = 0 ; $k < sizeof($what) ; ++$k)
      {
         $from = "$who[$j] $what[$k]";
         $to   = "$who[$j]'$contractions[$k]";
         $text = PIPHP_FT_FN1($from, $to, $text, $emphasis);
      }
   }

   // Pass 5: "<word> is" becomes "<word>'s"
   $to = "'s";
   $u1 = $u2 = "";

   if ($emphasis)
   {
      $u1 = "<u>";
      $u2 = "</u>";
   }

   return preg_replace("/([\w]*) is([^\w]+)/", "$u1$1$to$u2$2",
      $text);
}

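function PIPHP_FT_FN1($f, $t, $s, $e)
{
   // Capitalized copies, used to catch matches that begin a sentence
   $uf = ucfirst($f);
   $ut = ucfirst($t);

   if ($e)
   {
      $t  = "<u>$t</u>";
      $ut = "<u>$ut</u>";
   }

   // (^|[^\w]+) and ([^\w]+|$) also allow a match at the very start
   // or end of the string, which ([^\w]+) on its own would miss
   $s = preg_replace("/(^|[^\w]+)$f([^\w]+|$)/", "$1$t$2", $s);
   return preg_replace("/(^|[^\w]+)$uf([^\w]+|$)/", "$1$ut$2", $s);
}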
Strip Whitespace
A few of the plug-ins in this book are really short and sweet, and at just a single line of code,
this is one of them. But although it’s tiny, it packs a punch because it can clean up the
messiest text by collapsing every run of whitespace in a string, such as extra spaces, tabs,
and newlines, into a single space.
Figure 3-4 shows part of the U.S. Declaration of Independence as it might appear if read
in from a poor quality reprint by some optical character recognition software, followed by
the result of running the text through this plug-in.

Although browsers generally ignore whitespace, if the text is displayed using the <pre>
tag or placed in a form element such as a <textarea> (as used in Figure 3-4) then all the
whitespace will be apparent.
About the Plug-in
This plug-in takes a string variable containing any text and replaces each run of whitespace
with a single space. It requires a single argument:
• $text A string variable containing the text to be modified
Variables, Arrays, and Functions
• None
How It Works
The plug-in makes use of the regular expression feature built into PHP. It searches for the
pattern between the two forward slash characters, /, and replaces each match it finds with a
single space. Between the slashes is the simple pattern \s+, which means find any section of
whitespace that is one or more characters in length. The \s stands for a whitespace character
and the + indicates that the preceding character should appear one or more times in the search.
The preg_replace() function does not alter the string passed to it. It builds a modified copy,
which the plug-in then returns to the calling code.
FIGURE 3-4 Unsightly whitespace can seriously mess up some text, but this plug-in will remove it for you.
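To illustrate what the + quantifier contributes, the short sketch below compares \s with \s+ on the same input. The sample string and variable names are assumptions made for the example.

```php
<?php
// "a", space, tab, space, "b": a run of three whitespace characters
$text = "a \t b";

$each = preg_replace('/\s/', ' ', $text);    // each whitespace character replaced separately
$runs = preg_replace('/\s+/', ' ', $text);   // the whole run replaced once

echo strlen($each), " ", strlen($runs), "\n";   // prints: 5 3
```

Without the +, the tab still becomes a space but the run keeps its original length, so the text would look just as ragged as before.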
How to Use It
To use this plug-in, call the function in the following manner, where $text is the string to
be cleaned up:
echo PIPHP_StripWhitespace($text);
The Plug-in
function PIPHP_StripWhitespace($text)
{
   return preg_replace('/\s+/', ' ', $text);
}
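A small usage sketch follows, with the function repeated so the example runs on its own. The sample text is an assumption standing in for the kind of ragged input shown in Figure 3-4.

```php
<?php
// The plug-in, repeated here so the example is self-contained
function PIPHP_StripWhitespace($text)
{
    return preg_replace('/\s+/', ' ', $text);
}

// Double spaces, a tab, and blank lines scattered through the text
$messy = "We  hold\tthese truths\n\n to be   self-evident";
echo PIPHP_StripWhitespace($messy), "\n";
// prints: We hold these truths to be self-evident

// Leading and trailing runs become one space each; trim() removes them
echo "[", trim(PIPHP_StripWhitespace("  padded  ")), "]\n";
// prints: [padded]
```

Note that whitespace at the very start or end of the string is reduced to a single space rather than removed, so wrapping the call in trim() may be worthwhile when that matters.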
Word Selector
Quite often you will find you need to somehow highlight chosen words within a web
page—for example, when a user arrives from a search engine, you may wish to highlight
the search terms they used to help them find what they are looking for. Other times, you
might not want certain words to appear, such as profanities or other terms you wish to
prevent your users from posting.
This plug-in is powerful enough to handle both of these cases because you simply
decide on the relevant words and what should happen to them. Figure 3-5 shows a few
words highlighted within a section of the U.S. Declaration of Independence.
FIGURE 3-5 Using the Word Selector plug-in, you can highlight selected words or censor unwanted ones.
