
Plug-in PHP: 100 Power Solutions (part 42)


Chapter 7: The Internet
FIGURE 7-10 With this plug-in you can make the busiest of web pages load quickly on a mobile browser.
FIGURE 7-11 This is the original Yahoo! home page before the plug-in is applied.

About the Plug-in
This plug-in accepts a string containing the HTML to be converted, along with other
required arguments, and returns a properly formatted HTML document with various
formatting elements removed. It takes these arguments:
• $html The HTML to convert
• $url The URL of the page being converted
• $style If “yes”, style and JavaScript elements are retained, otherwise they are
stripped out
• $images If “yes”, images are kept, otherwise they are removed
Variables, Arrays, and Functions
$dom                 Document object loaded with $html
$xpath               XPath object for traversing $dom
$hrefs               Object containing all a href= link elements in $dom
$links               Array of all the links discovered in $html
$to                  Array containing the absolute version that each $link should be changed to
$count               Integer containing the number of elements in $to
$link                Each link in turn extracted from $links
$temp                Copy of $link with any !!**1**!! tokens changed back to & symbols
$j                   Integer counter for iterating through $hrefs and $to
$allowed             String listing the tags that strip_tags() should keep
PIPHP_RelToAbsURL()  Plug-in 21: This function converts a relative URL to absolute
How It Works
This function starts off by creating a DOM object that is loaded with the HTML from $html.
Then an XPath object is created from this, with which all a href= tags are extracted and placed
in the object $hrefs. After initializing the arrays $links and $to, which will contain the links
before and after converting to absolute format, all occurrences of &amp; are converted to &
symbols, and then all & symbols to the token !!**1**!!, to avoid the suspected
str_replace() bug that doesn't handle & symbols well.
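The masking step can be tried on its own. The following is a minimal sketch, not the plug-in itself: the helper names maskAmpersands() and unmaskAmpersands() and the constant AMP_TOKEN are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical helpers illustrating the masking step; these names are not
// part of the plug-in.
const AMP_TOKEN = '!!**1**!!';

function maskAmpersands(string $html): string
{
    // Normalize &amp; to a bare & first so both forms end up as one token.
    $html = str_replace('&amp;', '&', $html);
    return str_replace('&', AMP_TOKEN, $html);
}

function unmaskAmpersands(string $html): string
{
    // Reverse the second replacement only; &amp; is not reinstated.
    return str_replace(AMP_TOKEN, '&', $html);
}

$sample = '<a href="page.php?a=1&amp;b=2">x</a>';
$masked = maskAmpersands($sample);
echo $masked, "\n";                   // the & is now the token
echo unmaskAmpersands($masked), "\n"; // the & is back, as a bare &
```

Note that the round trip is lossy by design: text that started as &amp; comes back as a bare & symbol.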
Next the link parts of the tags are pulled out from $hrefs and placed into the array
$links using a for loop, and all duplicate links are removed from the array, which is then
sorted.
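As a rough illustration of this extraction step, the sketch below runs the same DOM and XPath calls over a small hand-written HTML string. The sample markup is an assumption made up for the example.

```php
<?php
// Sketch: collect unique, sorted href values from an HTML string.
// The sample markup below is invented for this example.
$html = '<html><body><a href="b.html">B</a><a href="a.html">A</a>'
      . '<a href="b.html">B again</a><a name="top">no href</a></body></html>';

$dom = new DOMDocument();
// @ suppresses warnings that libxml raises for imperfect real-world markup.
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate('/html/body//a');

$links = array();
for ($j = 0 ; $j < $hrefs->length ; ++$j)
    $links[] = $hrefs->item($j)->getAttribute('href');

// Drop duplicates, then sort (sort() also renumbers the keys from 0).
$links = array_unique($links);
sort($links);

print_r($links);
```

An anchor without an href attribute yields an empty string from getAttribute(), which is why the plug-in later skips any $link equal to "".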
After this, the technique used in plug-ins 46 and 48 is implemented to swap all links in
$html with numbered tokens. This ensures that multiple replaces don't interfere with each
other. First, any !!**1**!! tokens in the link are changed back to & symbols and the result
is passed through PIPHP_RelToAbsURL() to ensure it is absolute. The absolute URL is then
URL encoded and stored in the $to array. This makes sure that legal URLs will be
substituted when the tokens are later changed back.
To be flexible, the plug-in supports three types of links—double quoted, single quoted,
and unquoted—each case being handled by one of the str_replace() calls. This function
substitutes links within $html for the token !!$count!!. This means that the first link
becomes !!0!!, the second !!1!!, and so on, as $count is incremented at each pass.
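To see the three quoting styles being tokenized, here is a small sketch that applies the same three str_replace() calls to a hand-written snippet. The sample HTML and the contents of $links are assumptions for the example; in the plug-in, $links comes from the DOM.

```php
<?php
// Sketch of the tokenizing pass over invented sample data.
$html  = '<a href="a.html">A</a> <a href=\'b.html\'>B</a> <a href=c.html>C</a>';
$links = array('a.html', 'b.html', 'c.html');
$count = 0;

foreach ($links as $link)
{
    if ($link != "")
    {
        // One call per quoting style: double quoted, single quoted, unquoted.
        $html = str_replace("href=\"$link\"", "href=\"!!$count!!\"", $html);
        $html = str_replace("href='$link'",   "href='!!$count!!'",   $html);
        $html = str_replace("href=$link",     "href=!!$count!!",     $html);
        ++$count;
    }
}

echo $html, "\n";
// prints: <a href="!!0!!">A</a> <a href='!!1!!'>B</a> <a href=!!2!!>C</a>
```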
With all the tokens having been substituted, they can now be swapped with their associated
links from the $to array. This is achieved using a for loop that replaces each token !!$j!!
with the matching element $to[$j].
Then, any remaining occurrences of the URL encoded format http%3A%2F%2F are rectified
to http://, and any !!**1**!! tokens are returned to being & symbols.
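The swap back from tokens to URLs can be sketched in isolation as follows. The URLs in $to are hypothetical sample values using example.com; in the plug-in they are produced by PIPHP_RelToAbsURL() and then passed through urlencode().

```php
<?php
// Sketch of the final substitution pass with hypothetical sample data.
$html = '<a href="!!0!!">A</a> <a href="!!1!!">B</a>';
$to   = array(
    urlencode('http://example.com/a.html'),
    urlencode('http://example.com/b.php?x=1&y=2'),
);
$count = count($to);

// Replace each numbered token with its URL-encoded absolute link.
for ($j = 0 ; $j < $count ; ++$j)
    $html = str_replace("!!$j!!", $to[$j], $html);

// urlencode() turned "http://" into "http%3A%2F%2F"; restore the scheme.
$html = str_replace('http%3A%2F%2F', 'http://', $html);

echo $html, "\n";
```

Because urlencode() also encodes the ! character, an inserted URL cannot itself contain something that looks like a later token. In this sketch only the scheme separator is decoded again, so characters after the host name, such as / and ?, remain percent-encoded in the output.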

Next, if $style does not have the value “yes”, then runs of whitespace are collapsed to
single spaces and all script and style blocks are removed from $html.
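A short sketch of this clean-up, using the same three regular expressions on an invented sample string, shows why the whitespace pass runs first: once newlines have become spaces, the . in the script and style patterns can match across what used to be separate lines.

```php
<?php
// Sketch of the style and script removal on invented sample markup.
$html = "<p>Hello</p>\n<script type=\"text/javascript\">\nalert('hi');\n</script>\n"
      . "<style>p { color: red; }</style>\n<p>World</p>";

// Collapse every run of whitespace, including newlines, to one space.
$html = preg_replace('/[\s]+/', ' ', $html);

// Remove each script and style block, tags and contents alike.
$html = preg_replace('/<script[^>]*>.*?<\/script>/i', '', $html);
$html = preg_replace('/<style[^>]*>.*?<\/style>/i', '', $html);

echo $html, "\n";
```

The two paragraphs survive, separated only by the few spaces that were left around the removed blocks.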
After this, $images is also tested and if it’s equal to “yes”, then images are allowed to
remain in place. This is achieved, along with removing all remaining tags, by appending the
tag <img> to the list of allowed tags in $allowed, which is then passed to the strip_tags()
function, along with $html. If $images is not equal to “yes”, then the <img> tag will not be
appended to $allowed, and consequently all image tags will also be removed by this function.
Upon completing all the processing, the result (in $html) is returned.
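The effect of the $images switch on the final strip_tags() call can be sketched like this. The wrapper function filterTags() and the sample markup are hypothetical and are used only to run the same two lines with both settings.

```php
<?php
// Hypothetical wrapper around the plug-in's final step, for illustration only.
function filterTags(string $html, string $images): string
{
    $allowed = "<a><p><h><i><b><u><s>";
    if (strtolower($images) == "yes") $allowed .= "<img>";
    return strip_tags($html, $allowed);
}

$html = '<div><p>Text <b>bold</b> <img src="x.png"> <span>s</span></p></div>';

echo filterTags($html, "no"), "\n";
// prints: <p>Text <b>bold</b>  s</p>
echo filterTags($html, "yes"), "\n";
// prints: <p>Text <b>bold</b> <img src="x.png"> s</p>
```

In both cases the div and span tags are dropped while their text content is kept; only the image tag depends on the setting.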
How to Use It
To convert HTML to a format more suitable for mobile browsers, use the plug-in like this:
$url = "";
$html = file_get_contents($url);
$style = "no";
$images = "no";
echo PIPHP_HTMLToMobile($html, $url, $style, $images);
This loads in the HTML from the index page at www.yahoo.com and then passes it to the
plug-in with both $style and $images set to “no”. This means that neither styling nor
JavaScript will be allowed in the converted HTML, and neither will images.
If $style is set to “yes”, then style tags and JavaScript are retained in the HTML. If
$images is also equal to “yes”, then some images will be retained—but not all, due to a lot
of the page’s content being removed.
If you play with this plug-in you’ll find that often you can set both $style and $images
to “yes” and many web pages will still return a lot less information because the
strip_tags() function removes plenty of HTML not strictly needed to use a web page.
Remember that this plug-in relies on plug-in 21, PIPHP_RelToAbsURL(). Therefore, you
must also copy it into your program or otherwise include it.
The Plug-in
function PIPHP_HTMLToMobile($html, $url, $style, $images)
{
   $dom = new DOMDocument();

   // @ suppresses the parser warnings that untidy real-world HTML triggers
   @$dom->loadHTML($html);
   $xpath = new DOMXPath($dom);
   $hrefs = $xpath->evaluate("/html/body//a");
   $links = array();
   $to    = array();
   $count = 0;

   // Mask every ampersand with a token until the link rewriting is done
   $html  = str_replace('&amp;', '&', $html);
   $html  = str_replace('&', '!!**1**!!', $html);

   // Collect the href of every link, then remove duplicates and sort
   for ($j = 0 ; $j < $hrefs->length ; ++$j)
      $links[] = $hrefs->item($j)->getAttribute('href');

   $links = array_unique($links);
   sort($links);

   // Swap each link in $html for a numbered token and remember its
   // URL-encoded absolute form in $to
   foreach ($links as $link)
   {
      if ($link != "")
      {
         $temp = str_replace('!!**1**!!', '&', $link);
         $to[$count] = urlencode(PIPHP_RelToAbsURL($url, $temp));

         // Double-quoted, single-quoted, and unquoted links
         $html = str_replace("href=\"$link\"",
            "href=\"!!$count!!\"", $html);
         $html = str_replace("href='$link'",
            "href='!!$count!!'", $html);
         $html = str_replace("href=$link",
            "href=!!$count!!", $html);

         ++$count;
      }
   }

   // Swap the numbered tokens for their absolute links
   for ($j = 0 ; $j < $count ; ++$j)
      $html = str_replace("!!$j!!", $to[$j],
         $html);

   // Restore the scheme separator and the masked ampersands
   $html = str_replace('http%3A%2F%2F', 'http://', $html);
   $html = str_replace('!!**1**!!', '&', $html);

   // Unless styling is wanted, collapse whitespace and drop scripts and styles
   if (strtolower($style) != "yes")
   {
      $html = preg_replace('/[\s]+/', ' ', $html);
      $html = preg_replace('/<script[^>]*>.*?<\/script>/i', '',
         $html);
      $html = preg_replace('/<style[^>]*>.*?<\/style>/i', '',
         $html);
   }

   // Tags that survive the final strip_tags() pass
   $allowed = "<a><p><h><i><b><u><s>";
   if (strtolower($images) == "yes") $allowed .= "<img>";
   return strip_tags($html, $allowed);
}
CHAPTER 8
Chat and Messaging
