Your charge will appear under the name "Marco Tabini & Associates, Inc." Please allow up to 4 to 6 weeks for your subscription to be established and your first issue to be mailed to you. *US Pricing is approximate and for illustration purposes only.
Choose a Subscription type:
Canada/USA: $83.99 CAD ($59.99 US*)
International Surface: $111.99 CAD ($79.99 US*)
International Air: $125.99 CAD ($89.99 US*)
Combo edition add-on (print + PDF edition): $14.00 CAD ($10.00 US)
Country: ___________________________________________ Payment type: VISA Mastercard
*By signing this order form, you agree that we will charge your account in Canadian dollars for the “CAD” amounts indicated above. Because of fluctuations in the exchange rates, the actual amount charged in your currency on your credit card statement may vary slightly. **Offer available only in conjunction with the purchase of a print subscription.
To subscribe via snail mail - please detach/copy this form, fill it out and mail to the address above or fax to +1-416-630-5057
EDITORIAL
EDITORIAL RANTS
I'm sure you're familiar with the Chinese proverb "may you live in interesting times." Even though I rarely think of my professional life as dull and boring, the last month has been particularly exciting. As promised in my exit(0) column from last month's issue, if you look through the middle of the magazine you'll find a full report (in colour!) on the best conference I have ever attended, our very own php|cruise (forgive me for a bit of professional pride; eight months of prep
work will do that to you). Things went so well that we're working on another cruise—this time going to Alaska in the fall—and plan on making php|c an annual event for many years to come. All good things come to an end, of course, and, once back from the cruise, it's back to work. Luckily for us, work means bringing you yet another great issue of php|architect—and I personally consider that another good thing. Like every month, we've got some great content waiting for you in the following pages. The one I'm most proud of is George Schlossnagle's regular expressions article. Regexes are something that pretty much every programmer has to deal with, but that very few among us really know how to use. In fact, I've seen developers write extremely complicated code with the explicit purpose of getting around having to use a regular expression—and that is just plain wrong. After all, using the best solution for each problem is what being a programmer is all about. Thus, I approached George about writing an article on regular expressions—and it became quickly evident that one article would not even come close to covering the complexity of regex. Now, everyone knows that I always try my best to stay away from multi-part articles for a multitude of reasons, but in this case I felt that the topic more than deserved our attention over multiple issues and, therefore, George's article is the first in a series of three. Over the next three months, he will take you for a ride from the basics (which are covered in this issue) to the more complex and exotic aspects of regular expressions, thus hopefully providing the PHP world with a definitive guide to this topic.
If regular expressions are not your bag, one of the other topics covered in this month's issue is certain to tickle your fancy. For example, you may want to read Alessandro Sfondrini's excellent article on using the Amazon.com API directly from your PHP website, or Andrea Trasatti's look at the world of WAP. As you can probably imagine, both Andrea and Alessandro hail from my native Italy—and that alone makes their articles more than worth reading. There, my monthly heritage tax is now paid up! As I'm sure you've noticed, in the past few months we've been publishing material about testing practices quite frequently. As larger and larger projects are devel-
php|architect Volume III - Issue 3 March, 2004
Publisher Marco Tabini
Editorial Team Arbi Arzoumani Peter MacIntyre Eddie Peloke
Graphics & Layout Arbi Arzoumani
Managing Editor Emanuela Corso
Director of Marketing J. Scott Johnson
Account Executive Shelley Johnston
Authors John Coggeshall, John Holmes, Dr. James McCaffrey, George Schlossnagle, Alessandro Sfondrini, Chris Shiflett, Andrea Trasatti
php|architect (ISSN 1709-7169) is published twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road, Toronto, ON M5M 4N5, Canada. Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibility with regard to the use of the information contained herein or in any associated material.
PHP 5.0 Beta 4
PHP.net has announced the release of PHP 5.0 Beta 4. This fourth beta of PHP 5 is also scheduled to be the last one (barring unexpected surprises, which did occur with beta 3). This beta incorporates dozens of bug fixes since Beta 3, rewritten exceptions support, improved interfaces support, new experimental SOAP support, as well as lots of other improvements, some of which are documented in the ChangeLog. Some of the key features of PHP 5 include:
• PHP 5 features the Zend Engine 2.
• XML support has been completely redone in PHP 5; all extensions are now focused around the excellent libxml2 library.
• SQLite has been bundled with PHP. For more information on SQLite, please visit their website.
• A new SimpleXML extension for easily accessing and manipulating XML as PHP objects. It can also interface with the DOM extension and vice-versa.
• Streams have been greatly improved, including the ability to access low-level socket operations on streams.
March 2004 ● PHP Architect ● www.phparch.com
PHP.net also announced the release of PHP 4.3.5 RC 3. This will be the last release candidate prior to the final release, so please test it as much as possible. For more information, visit PHP.net.
ZEND Optimizer 2.5.1
Zend has announced the release of Zend Optimizer 2.5.1. Zend.com describes the Optimizer as: "a free application that runs the files encoded by the Zend Encoder and Zend SafeGuard Suite, while enhancing the running speed of PHP applications. Benefits:
• Enables users to run files encoded by the Zend Encoder
• Increases runtime performance up to 40%."
Get more information from Zend.com.
NEW STUFF
Zend Launches New PHP5 In-Depth Articles Section
Zend Technologies has launched a new version of its Developer's Corner on the zend.com website. PHP5 In-depth showcases articles from many well-known PHP authors on the new features of PHP. For more information, check out zend.com.
DEV Web Management System
Dev is a small but powerful and very flexible content management system for web portals. The system is licensed as freeware under the terms of the GNU GPL license. It is absolutely free for non-commercial and commercial use, and it is based on PHP 4 and MySQL. The project allows the user to publish articles, evaluate articles by taking a poll, publish short news items and create back-ends in XML format, manage download lists, manage advertisements on your site, be informed about events on your site, create system reports and export them into MS Excel or XML format, and much more.
PhpMyAdmin 2.5.6
Phpmyadmin.net has released the latest version of phpMyAdmin. "Welcome to this new version, aimed at stabilization of the 2.5 branch. Meanwhile, work is continuing on the new 2.6 branch. PhpMyAdmin is a tool written in PHP intended to handle the administration of MySQL over the Web. Currently it can create and drop databases, create/drop/alter tables, delete/edit/add fields, execute any SQL statement, manage keys on fields." For more information visit: www.phpmyadmin.net.
PhpSQLiteAdmin 0.2
PhpSQLiteAdmin is a Web interface for the administration of SQLite databases. Version 0.2 comes with some new features and a lot of internal cleanups and refactoring. PhpSQLiteAdmin is still in an early stage of development. It comes free of charge and without warranty. For more information visit: www.phpsqliteadmin.net.
phpMyEdit 5.4
phpMyEdit generates PHP code for displaying/editing MySQL tables in HTML. All you need to do is to write a simple calling program (a utility to do this is included).
Looking for a new PHP Extension? Check out some of the latest offerings from PECL.
ps 1.1.0
ps is an extension similar to the pdf extension but for creating PostScript files. Its API is modeled after the pdf extension.
Memcache 0.2
Memcached is a caching daemon designed especially for dynamic web applications to decrease database load by storing objects in memory. This extension allows you to work with memcached through a handy OO interface.
This extension allows you to call the functions made available by the libstatgrab library.
POP3 1.0
The POP3 extension makes it possible for a PHP script to connect to and interact with a POP3 mail server. It is based on the PHP streams interface and requires no external library.
Fileinfo 0.1
This extension allows retrieval of information regarding the vast majority of file types. This information may include dimensions, quality, length, etc. Additionally, it can be used to retrieve the MIME type of a particular file and, for text files, the proper language encoding.
It includes a huge set of table manipulation functions (record addition, change, view, copy, and remove), table sorting, filtering, table lookups, and more. Several minor bugs were fixed. A few new options were added. Major features include tabs support, the ability to specify SQL expressions for fields when writing to the database, the ability to define new triggers, and more. All eval() calls were removed for security and performance reasons. Some code was optimized. Several parts of the documentation were updated. A lot of new language files were added and updated. For more information visit: phpMyEdit/ .
ionCube Releases New Encoder
UK-based ionCube has released a new version of their compiled code PHP encoding tools. New features include a choice of ASCII or binary encoded file formats and optional support for OpenSource extensions such as mmcache. Prices start at a special price of $159 in their March 20% off sale. For further information, please visit the homepage of the Encoder.
Editorial: Continued from page 5
oped using PHP, serious testing processes are going to become an integral part of every good developer's arsenal of programming tools. What we never quite considered is that PHP is a great testing platform even for those projects that are not written using it. Thankfully, James McCaffrey came to the rescue and provided us with a wonderful article on the subject.
Our final article this month is about the new Tidy extension, which author John Coggeshall has recently introduced in PHP. You may have already heard about the Tidy project, which provides a series of libraries capable of parsing and automatically repairing documents written in markup languages like HTML or XML. Tidy brings an important set of capabilities to PHP, and I'm happy to have the author of the extension introduce us to it.
That's it for this month. Time for me to go tend to my sunburn while I start working on the next issue. Until then, happy reading!
php|a
Check out some of the hottest new releases from PEAR.
Mail_Queue 1.1
Class to handle mail queue management. Wrapper for PEAR::Mail and PEAR::DB (or PEAR::MDB). It can load, save and send saved mails in the background and also back up some mails. The Mail_Queue class puts mails in a temporary container, where they wait to be fed to the MTA (Mail Transport Agent), and sends them later (e.g. every few minutes) by crontab or in some other way.
XML_Transformer 0.9.1
With the XML/Transformer class one can easily bind PHP functionality to XML tags, thus transforming the input XML tree into an output XML tree without the need for XSLT.
Net_LMTP 0.7.0
Provides an implementation of the RFC2033 LMTP using PEAR's Net_Socket and Auth_SASL classes.
Text_Wiki 0.8.3
Abstracts parsing and rendering rules for Wiki markup in structured plain text.
Connecting to Amazon.com Web Services with NuSOAP
FEATURE
by Alessandro Sfondrini
Have you ever wanted to add an online shop to your website but gave up on the idea because you lack the expertise and resources to run it? Using SOAP, you can connect to Amazon Web Services and create a PHP application to remotely browse and search products, add them to Amazon shopping carts or wish lists and, yes, you can even earn money on every purchase performed from your site.
In the article "Exploring the Google API with SOAP," which appeared in the January issue of php|a, I showed you what SOAP is and how it can be used together with PHP. We used a SOAP-encoded document to perform a search using the Google engine, then we parsed the response to display the results on our website. To perform these operations, we wrote an application from scratch. This approach is a great way to understand how SOAP works, but when a customer asks you to implement a SOAP-based feature in an application, you can't afford to spend your time that way. For these cases, there are libraries that will make your coding quicker and easier: one of these is NuSOAP, which allows you to send Remote Procedure Calls (RPCs) over HTTP.
This article will show you how we can use the Amazon.com API with NuSOAP to perform searches and display product details, without having to sort through a lot of SOAP syntax. If you have had an opportunity to read my previous article, you will notice how much shorter an application written this way is, and how much time can actually be saved by using this method.
What are Amazon Web Services?
Amazon.com is one of the most widely known on-line shops. You can find and buy almost everything, from books to toys to power tools. Several years ago, Amazon launched a very successful affiliate program, which they later expanded into their Web Services program.
Why would you want to use Amazon Web Services (AWS)? For instance, if your website is about literature, you may want to allow your users to look for books in the (huge) Amazon database directly from your pages, without redirecting them to Amazon.com. You can provide them with a detailed description of each book and, when they decide to buy one, you can add it directly to their Amazon shopping cart. When the time comes to complete the purchase, you can redirect the user directly to the Amazon website, where the checkout process actually takes place and you receive credit for your affiliate referral.
It is important to understand that AWS are designed only to retrieve information about products and to create and populate shopping carts, not to perform payments. Payment must be completed directly on the Amazon website, the reason being, of course, the security of the customer's personal information. In any case, a significant portion of the transaction is performed from your website. This benefits both you and your users, since you can offer your customers a nearly seamless user experience and collect your referral fees.
Access to AWS, as well as to the affiliate program, requires you to register with the Amazon Associates Program and obtain an Associates ID, which will identify each purchase sent through your website.
REQUIREMENTS
PHP: 4.1 and higher
OS: Any
Other software: NuSOAP 0.6.4
Code Directory: webs-nusoap
Getting started
Before we start coding, I recommend you download the AWS Software Developer's Kit. It contains the License Agreement, a guide (you should have a look at it to familiarize yourself with the concepts associated with the program) and some code samples, including a few written in PHP! As I mentioned earlier, you will also have to apply for your Developer's token, an alphanumerical string needed for performing searches and purchases. To do so, you have to visit />oin/developer/application.html and accept the AWS terms and conditions.
To write our application, we will take advantage of a PHP library called NuSOAP, which is really just a group of "userland" classes written in PHP and designed to let developers work with SOAP web services. It will speed up our coding by allowing us to focus on functionality rather than on the communication protocols. NuSOAP is distributed under the LGPL license and is freely available for download. To add NuSOAP support to our project, we simply have to include nusoap.php in our PHP scripts using require(). Performing a Remote Procedure Call (RPC) is simple. Look at this example:
require("nusoap.php");
$params = array('name' => 'value');
First of all, we include NuSOAP and store the parameters we will use for the RPC in the $params associative array. We then create a new soapclient object, passing two arguments to the constructor: the SOAP server address and a boolean value that indicates whether that address points to a WSDL document. WSDL (Web Services Description Language) documents contain information about a web service, as well as its methods and properties. They are often used by web service providers, including Amazon. Once we have created the object, all we have to do is execute the RPC by invoking the call() method and specifying the remote method name and the parameters to be passed (contained in $params in our case). NuSOAP automatically fetches the results of the call and returns them as an array, which we store in $result.
Since we are working with a WSDL-based server, NuSOAP can also create a "proxy" PHP class capable of providing a better interface to our scripts: once we have instantiated $s, we can ask it for a proxy object and invoke remote methods on that object as if they were local ones. This can be useful to simplify our code: first, we create a proxy client, $proxy; any subsequent RPCs to methods specified in the WSDL can be performed using the proxy, without having to use the NuSOAP call() method again. In our application, we will use proxies to work with AWS.
Figure 1: KeywordSearchRequest() parameters
Type | Description
String | The keyword on which the search should be performed.
String | The page number. AWS returns ten results per page, so page 1 will contain results 1 through 10, page 2 results 11 through 20, and so on.
String | Specifies the ID of the store to browse. Each Amazon store has its unique ID, which indicates what kind of products it sells (e.g.: books, music, dvd, vhs, etc.). You can find a complete list of all the IDs available in the AWS documentation.
String | Your Associate ID. If you don't have one, you can use the generic ID webservices-20.
String | Determines the type of search results. Lite indicates a simpler result set, while heavy provides a richer set of information about each item returned. We'll use lite for our example.
String | The Developer Token you have received from Amazon.
Figure 2: Data returned for each item by a lite search
Result Datum | Type | Description
Url | String | The URL of the product page for this item on Amazon
Asin | String | The Amazon Standard Identification Number for this product
ProductName | String | The name of the product (in our case, the title of the book)
Catalog | String | The category of the product (e.g.: books)
Authors | String | The name(s) of the author(s)
ReleaseDate | String | The release date, in human-readable format (e.g.: "23 February, 1976").
Manufacturer | String | The name of the product's manufacturer (the publisher in our case)
ImageUrlSmall | String | A pointer to the product's "small" image on the Amazon website
ImageUrlMedium | String | Same as above, for a slightly larger image
ImageUrlLarge | String | Same as above, but for an even larger image
ListPrice | String | The product's list price, including the currency symbol (e.g.: "$ 20.55")
OurPrice | String | The product's selling price on Amazon, including the currency symbol
UsedPrice | String | The product's price for used copies.
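To make the two calling styles described in the text concrete, here is a minimal sketch rather than the article's original listing. It assumes only what the text states, namely that the NuSOAP client offers a call() method and can hand out a proxy object (getProxy() in NuSOAP). FakeClient and FakeProxy are hypothetical stand-ins that let the sketch run without the library or a network connection, and the 'keyword' parameter name is an assumption as well.

```php
<?php
// Hypothetical stand-ins so the sketch runs without NuSOAP or network access.
// With the real library you would write something along these lines instead:
//   require("nusoap.php");
//   $s = new soapclient($wsdlUrl, true); // true = the URL is a WSDL document
// (newer NuSOAP releases name this class nusoap_client).
class FakeProxy
{
    // Mimics a method that the proxy would expose based on the WSDL.
    public function KeywordSearchRequest($params)
    {
        return array('TotalResults' => 0, 'TotalPages' => 0,
                     'Details' => array(), 'Echo' => $params);
    }
}

class FakeClient
{
    // Mimics call(): remote method name plus parameter array.
    public function call($method, $params)
    {
        $proxy = new FakeProxy();
        return $proxy->$method($params);
    }

    // Mimics getProxy(): returns an object with one method per remote method.
    public function getProxy()
    {
        return new FakeProxy();
    }
}

// Style 1: generic call() with the remote method name given as a string.
function searchWithCall($client, array $params)
{
    return $client->call('KeywordSearchRequest', $params);
}

// Style 2: obtain a proxy, then invoke the remote method like a local one.
function searchWithProxy($client, array $params)
{
    $proxy = $client->getProxy();
    return $proxy->KeywordSearchRequest($params);
}

$params   = array('keyword' => 'php'); // assumed parameter name
$viaCall  = searchWithCall(new FakeClient(), $params);
$viaProxy = searchWithProxy(new FakeClient(), $params);
```

Both functions accept any object with the right methods, so a real NuSOAP client could be passed in place of FakeClient without changing the calling code.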
Designing the application
Now that we've laid down some ground rules, it's time to decide in detail what the goals of our application are going to be. Since we're all PHP fans, our example website will be about PHP and, therefore, we'll want to allow our users to buy books on this topic from Amazon.
The first thing that we need is a search page: users will be able to search for a particular keyword (or for a set of keywords) and the page will display some basic information about each book that matches the criteria, such as its title, an image, the publishing company, author or authors and price. We also have to provide a way to browse the results, since AWS calls only return ten results per call.
The search page should also contain a link for each product to another page on our website that will contain a detailed description of the book, including any user reviews and comments. From here, the users will be able to continue their purchase on Amazon.com or add the product to their wish lists.
The search page
If you have had an opportunity to read through the AWS documentation, you have probably discovered that searches by keyword can be performed using the KeywordSearchRequest() method, which requires the parameters shown in Figure 1. Assuming that the call will be successful, the server will return an array containing several items:
• The TotalResults element, which indicates the number of total results returned by the query.
• The TotalPages element, which provides the number of pages available in the search result.
• The Details sub-array, which contains a set of data about each search result matching our search criteria that is included in the page we have requested. Given that a search only returns a maximum of ten items per page, you can expect that this array will contain no more than ten elements. The lite search mode returns the data shown in Figure 2.
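The request and response shapes described above can be sketched in plain PHP. This is an illustration under stated assumptions, not Listing 1: the keys 'keyword', 'page' and 'mode' are assumed names, while 'tag', 'type' and 'devtag' are the names the article gives for AsinSearchRequest(). Both helper functions and the sample data are hypothetical.

```php
<?php
// Build the parameter array for a keyword search.
// 'keyword', 'page' and 'mode' are assumed key names; 'tag', 'type' and
// 'devtag' follow the names listed for AsinSearchRequest() in the article.
function buildKeywordSearchParams($keyword, $page, $devtag, $tag = 'webservices-20')
{
    return array(
        'keyword' => trim($keyword),
        'page'    => (string) max(1, (int) $page), // pages start at 1
        'mode'    => 'books',                      // the store to browse
        'tag'     => $tag,                         // Associates ID (generic by default)
        'type'    => 'lite',                       // the simpler result set
        'devtag'  => $devtag,                      // your Developer Token
    );
}

// Turn a response array into a one-line summary for the top of the page.
function summarizeResults(array $results, $page)
{
    $total = isset($results['TotalResults']) ? (int) $results['TotalResults'] : 0;
    $pages = isset($results['TotalPages']) ? (int) $results['TotalPages'] : 0;
    if ($total === 0 || empty($results['Details'])) {
        return 'No results found.';
    }
    return sprintf('%d results, page %d of %d (%d shown)',
                   $total, $page, $pages, count($results['Details']));
}

// Made-up sample data shaped like the response described in the text.
$sample = array(
    'TotalResults' => 12,
    'TotalPages'   => 2,
    'Details'      => array(
        array('ProductName' => 'Example Book', 'Asin' => '0000000000'),
    ),
);

$params  = buildKeywordSearchParams(' php ', 0, 'YOUR-DEV-TOKEN');
$summary = summarizeResults($sample, 1);
$empty   = summarizeResults(array('TotalResults' => 0), 1);
```

Keeping the parameter construction in one function makes it easy to hardcode the constant values once while taking the keyword and page number from the request.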
\n\n";
// Prints a link to the previous page, if any
if ($_GET["page"] > 1)
    echo "<a href='" . $_SERVER["PHP_SELF"] . "?keyword=" . urlencode($_GET["keyword"]) . "&page=" . ($_GET["page"] - 1) . "'>Previous Page</a> \n";
// Prints a link to the next page, if any
if ($_GET["page"] < $results["TotalPages"])
    echo " <a href='" . $_SERVER["PHP_SELF"] . "?keyword=" . urlencode($_GET["keyword"]) . "&page=" . ($_GET["page"] + 1) . "'>Next Page</a>";
?>
Figure 4: AsinSearchRequest() parameters
Parameter | Type | Description
asin | String | The product's ASIN (which, in our case, can be retrieved from $_GET['asin'])
tag | String | The Associate ID, or webservices-20 if you want to use a generic one
type | String | The type of search. In this case, we'll choose heavy, since we want all the information available on a particular book
devtag | String | Your Developer Token
Figure 5: Additional data returned by a heavy search
Result Datum | Type | Description
SalesRank | Integer | The product's sales ranking
Lists | Array of Strings | The names of the ListMania lists that contain the product
 | | Indicates the product categories in which the product can be found.
 | | The type of medium on which the product is distributed (e.g.: paperback or hardcover for books)
Isbn | String | The ISBN code of the product (books only)
Availability | String | Indicates how long the product takes to be shipped
Reviews | Array | This array contains information about the customer reviews associated with the product. It includes three elements: AvgCustomerRating, which indicates the average customer rating for the product, TotalCustomerReviews, which contains the number of customer reviews available and CustomerReviews, which is an array that contains the three most recent reviews (you can find the contents of this array in Figure 6).
SimilarProducts | Array of Strings | Contains the ASINs of products that are similar to this one.
Figure 6: Contents of each CustomerReviews entry
Element | Type | Description
Rating | Integer | The rating of the product in this review
Summary | String | A summary of the review
Comment | String | The full review itself
As you can see, the KeywordSearchRequest() method returns quite a few pieces of information for every result item, although, of course, we don't have to output all of them on our site.
If you look at Listing 1, the source for our search page, you'll see that the very first part of the file is nothing more than a simple HTML form, which contains an input text box for the keyword and a hidden field that forces the page number to 1. This way, a new search will automatically start from the first page of results. The form uses the GET method because we need to use links for the "Next Page" and "Previous Page" operations (something like page.php?keyword=blah&page=2). Naturally, you could also use POST, but in that case it would be much more difficult for someone to create a direct link to your search results, which could, in theory, prevent you from completing some sales.
The second part of the script contains the actual PHP code. First of all, an if-then-else control block stops the execution of the script if $_GET["keyword"] is empty. Otherwise, we include NuSOAP and create a SOAP client by passing the URI of the *.wsdl file for Amazon (which is provided in the AWS documentation) and the boolean true, which tells the constructor of the soapclient() class that the URI refers to a WSDL document. We also create a proxy to call AWS methods directly, as we have seen in the first part of the article.
The parameters needed to invoke KeywordSearchRequest() are stored in the $param array; the first two (the keyword and the page number) are to be found in the $_GET superglobal, since they change each time we perform or browse a search, while the others are constant and, therefore, we hardcode them in our script. Remember to insert your developer token in $param["devtag"].
Once we have invoked the method and stored the search results in $results, we have to display the latter in a format that is comprehensible to the user. First, we check whether there are any results to begin with. If the search returned no data, the program displays a warning and exits. Otherwise, we print a short summary of the search: the keyword, the current page number and total page count, followed by details about each product in the current result page. These are produced by a simple foreach loop, which walks through the $results["Details"] array, echoing the title of each book, a medium-size image, its authors, publishing company and prices. We also provide a link to another page, details.php, which contains further information on each book. The link contains a reference to the product's ASIN (the Amazon identifier for each product) so that the application can retrieve the correct product from Amazon's catalogue with another RPC.
The last part of this page allows the user to browse the results: if the current page isn't the first one (Page 1), the script prints a link to the previous one and, if it isn't the last page (based on the information returned by our AWS call), it prints a link to the next one. Figure 3 shows our search page at work.
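The browsing logic just described can be isolated in a small function. The following is a sketch under assumptions rather than the code of Listing 1: paginationLinks() is a hypothetical helper, and it relies on http_build_query(), which requires PHP 5 or later, to encode the keyword safely in the URL.

```php
<?php
// Build "Previous Page" / "Next Page" links for a search results page.
// $script is the page's own URL path (for example $_SERVER['PHP_SELF']).
function paginationLinks($script, $keyword, $page, $totalPages)
{
    $page  = max(1, (int) $page);
    $links = array();

    // A previous page exists unless we are on the first one.
    if ($page > 1) {
        $query   = http_build_query(array('keyword' => $keyword, 'page' => $page - 1), '', '&');
        $links[] = '<a href="' . htmlspecialchars($script . '?' . $query) . '">Previous Page</a>';
    }
    // A next page exists unless we are on the last one.
    if ($page < (int) $totalPages) {
        $query   = http_build_query(array('keyword' => $keyword, 'page' => $page + 1), '', '&');
        $links[] = '<a href="' . htmlspecialchars($script . '?' . $query) . '">Next Page</a>';
    }
    return implode(' ', $links);
}

$first  = paginationLinks('search.php', 'php & mysql', 1, 3);
$middle = paginationLinks('search.php', 'php & mysql', 2, 3);
$last   = paginationLinks('search.php', 'php & mysql', 3, 3);
```

Encoding the keyword matters because it comes straight from the query string: a keyword containing "&" or quotes would otherwise break the link or the surrounding HTML.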
The Product Detail Page
Now that we are done with the first part of the application, it's time to move on to the product detail page, which will show advanced information about a particular book. The AWS method we need in this case is AsinSearchRequest(), which needs the parameters shown in Figure 4.
Just like before, the response that we get back from Amazon is an array of arrays, except that, in this case, we will simply concern ourselves with the first result set, since the ASIN uniquely identifies one product. Our data, therefore, will be stored in $results['Details'][0], which, in turn, will contain the information shown in Figure 5. As you can see, some of the values returned are the same as the results of the KeywordSearchRequest() call that we used in Listing 1, while some others, like the customer reviews, are more appropriate for a detailed product page.
Speaking of the product page, Listing 2 contains the code for details.php. First, we check $_GET["asin"]; if it is empty, the program displays a warning and exits. In a more complete application, you may want a slightly more verbose explanation of what went wrong, or perhaps an automatic redirection to the search page. If we have an ASIN, we include the NuSOAP library, then create a SOAP client and proxy as we did in the previous page. Please note that we have to use sprintf() to transform the ASIN into a ten-character string, since AWS requires it to be submitted in that format (as an alternative, you could use str_pad() to ensure that the string is ten characters long). This time, we only need to pass the ASIN and specify heavy as the search type. Once the RPC has been executed, we retrieve the results and print them out, using a foreach loop to cycle through the user reviews.
The final touch in our application consists of providing a link back to the Amazon website in order to make it possible for our users to purchase a product. After all, you can't do much selling by just showing which products are available! The AWS documentation specifies that an HTML form must be set up for the purpose of submitting the purchase information over to Amazon.com. This form (you can look at the one in Listing 2 for an example) uses the POST method, and its action attribute is nothing more than the URL of a page on Amazon.com that contains the ASIN of the product that must be added to the user's shopping basket. A few additional hidden fields provide the ASIN, the Associates ID and the Developer's token. The form supports two different buttons: one adds the product to the user's basket, while the other adds it to their wish list.
foreach ($results["Details"][0]["Reviews"]["CustomerReviews"] as $res)
    echo "\n" . $res["Summary"] . "\n"
        . "<b>Rating: </b>" . $res["Rating"] . "\n"
        . $res["Comment"] . " <hr />";
?>
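The ASIN formatting step mentioned above (sprintf() or str_pad()) can be sketched as follows. Both function names are hypothetical, and the validation rule, letters and digits only with at most ten characters, is an assumption added for illustration; it is not something the article or the AWS documentation is quoted as requiring.

```php
<?php
// Left-pad an ASIN with zeros to the ten-character form that AWS expects.
// Returns null when the input cannot be an ASIN under the assumed rule
// (non-empty, letters and digits only, at most ten characters).
function normalizeAsin($asin)
{
    $asin = trim((string) $asin);
    if ($asin === '' || strlen($asin) > 10 || !ctype_alnum($asin)) {
        return null;
    }
    return str_pad($asin, 10, '0', STR_PAD_LEFT);
}

// The same padding expressed with sprintf(), the approach named in the text.
function normalizeAsinSprintf($asin)
{
    return sprintf('%010s', $asin);
}

$padded    = normalizeAsin('596005601');        // leading zero was lost somewhere
$sprintfed = normalizeAsinSprintf('596005601');
$rejected  = normalizeAsin('');
```

Padding is needed mostly for book ASINs that start with a zero: if the value passes through anything that treats it as a number, the leading zero disappears and the lookup fails.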
Further Improvements As you have probably noticed, writing a SOAP-based application using a library like NuSOAP is much faster than developing your own SOAP classes—if you have read my article about the Google API that appeared on the January issue of php|a, you probably know what I am talking about. This means that you can develop rather complex applications without having to waste time dealing with the nitty-gritty details of the underlying protocol; in fact, we didn't even write any SOAP code for our Amazon application—NuSOAP did it all for us. Naturally, the code that I have introduced here is very basic and could stand to gain from some improvements. For instance, Amazon Web Services allow you to to manage a a remote shopping cart or wish list by adding and removing items to them. The very last part of the purchase—the one where money changes hands—must still take place on Amazon.com, but you can let the user perform most of the normal operations associated with an e-commerce website without leaving your website. However, do keep in mind that if you choose to manage the user's shopping cart remotely,
you can't change it once you've submitted to Amazon—this is done to protect the end user from fraudulent transactions. You can check out the AWS documentation for more details on this topic—you'll
find that it's not complicated at all. Depending on your needs, you may choose to perform a different kind of search operation on your website: by similar products, by author, by ISBN, by manufacturer, and so on. You may also want to browse a "node", or product category (e. g. "programming", "web", etc.) directly, without performing a search. It goes without saying that all this depends on what your goals are. If your Amazon-based shop becomes very popular, you may decide to join the Amazon Associates Program, an affiliate system that pays you commissions on every sale. Be careful, however, that your application must not send more than one request per second to Amazon—even if you provide an error handling system, you must not immediately retry a request if the previous one has failed. You should also provide a caching system, in order to store the data needed by your site without going back and forth to AWS for every request—you can check out Bruno Pedro's excellent article in the February 2004 issue of php|a for more idea on caching data from your PHP scripts. If you choose to do so, don't forget that you can't keep your data cached for more than twentyfour hours. Finally, please keep in mind that in the examples shown in this article we always referred to Amazon.com, the American website. AWS are also available for Amazon.co.uk, Amazon.de and Amazon.co.jp, but you have to modify the URIs in the
script, changing the specifications in the WSDL document from soap.amazon.com/ to soapeu.amazon.com/, and so on. You will also have to add the locale parameter to your RPC invocations; its value can be set to uk, de or jp, depending on which Amazon website you are referring to.
Figure 3
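The two constraints mentioned in this section (no more than one request per second, and no cached data older than twenty-four hours) can be combined in one small wrapper. What follows is only a minimal sketch under stated assumptions: the function name cached_fetch(), the $fetch callback and the directory of serialized files are all hypothetical, and the callback stands in for whatever remote call your own application makes.

```php
<?php
// Hypothetical helper (not part of NuSOAP or AWS): caches results on disk
// and spaces out live requests.

define('CACHE_TTL', 24 * 60 * 60);   // never serve data older than 24 hours
define('MIN_INTERVAL', 1.0);         // at most one live request per second

function cached_fetch($key, callable $fetch, $cacheDir)
{
    // Time of the last live request. This only throttles calls made within
    // one script run; throttling across page views would need shared state.
    static $lastRequest = 0.0;

    $file = $cacheDir . '/' . md5($key) . '.cache';

    // Serve from the cache while the entry is younger than the TTL.
    if (is_file($file) && (time() - filemtime($file)) < CACHE_TTL) {
        return unserialize(file_get_contents($file));
    }

    // Wait until a full second has passed since the previous live request.
    $wait = MIN_INTERVAL - (microtime(true) - $lastRequest);
    if ($wait > 0) {
        usleep((int) ceil($wait * 1000000));
    }
    $lastRequest = microtime(true);

    // $fetch is assumed to perform the remote call and return its result.
    $data = $fetch($key);
    file_put_contents($file, serialize($data), LOCK_EX);

    return $data;
}
```

In use, $fetch would typically be a closure around your SOAP call, and $cacheDir a writable directory outside the web root. A failed request is not retried here, which is in line with the rule about not retrying immediately.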
I'm Outta Here
Amazon.com Web Services is a powerful tool that you can use to add e-commerce functionality to your site without going to the expense of developing an online store of your own and stocking all the merchandise. Even if you can't create a complete on-line shop using AWS (because the purchase must be completed on the Amazon website), you can still give your users a customized shopping experience that relies on the practically limitless resources of one of the world's most popular e-commerce websites.
The sample application that I showed you in this article is quite simple: if you plan to use it in a production environment, especially if your site has a lot of traffic, you should probably consider implementing features like error handling and caching in order to prevent problems with the Amazon servers. Adding these elements to your application may require some extra work, but it could all pay off if you enjoy decent traffic and join the Amazon Associates Program.
Perhaps most importantly, I hope to have given you a good idea of how much a SOAP library (in this article we have chosen NuSOAP, but there are other packages, like PEAR::SOAP) can simplify the creation of a complex application: write a few lines of code to perform a Remote Procedure Call and you're practically done. If you want to extend our sample application and create a "complete" on-line shop using AWS, have a look at the documentation: there you will find a detailed description of every method that's available for use. If you want to learn more about SOAP, you can check out the World Wide Web Consortium's notes about the protocol or, if you missed it, read the article "Exploring the Google API with SOAP" published in the January 2004 issue of php|a.
About the Author
Alessandro Sfondrini is a young Italian PHP programmer from Como. He has already written some on-line PHP tutorials and published scripts on the most important Italian web portals. You can contact him at .
Matchmaker, Matchmaker Make Me A Match
An Introduction to Regular Expressions
F E A T U R E
by George Schlossnagle
A quick search for the words "hate" and "regular expressions" on your favourite search engine is likely to bring up thousands upon thousands of hits. While most developers recognize the usefulness of regular expressions (and many can't do without them once they have figured out how regexes work), their use remains something of a black-magic art, right up there with hypnosis and session management. Despite looking complicated, however, regular expressions are much easier to work with than most people are willing to admit.
REQUIREMENTS
PHP: Any
OS: Any
Applications: N/A
Code Directory: match-regex
Regular expressions (commonly known as regexes) are a powerful tool for pattern matching and text manipulation. A typical problem that pulls people into learning regular expressions is text munging: you have a string of text and you need to replace portions of it based on certain rules. For instance, you might want to obfuscate all the email addresses in a block of text so that each address gets translated to the form george [at] example [dot] com. Regular expressions are the tool for the job, and provide a powerful and deep syntax for handling tasks like these.
A Few Myths about Regexes
Before we get started, we should dispel a few popular myths about regexes:
Myth: Regular expressions are slow.
Truth: Regular expressions can be slow, but they don't need to be. The main regular expression library used by PHP (called PCRE and consisting of the preg_ family of functions) is quite fast and also quite powerful. This power means that it is easy to write a short regular expression that performs a lot of work, and performing a lot of work with any tool can be slow.
Myth: You should use basic string functions instead of regular expressions.
Truth: Regular string functions (for example strstr or strtok) are (marginally) faster than a regular expression that accomplishes the same task. That having been noted, this myth often leads to people implementing complicated string parsers using string matching functions where a single regular expression would do the trick. The PCRE library will almost always match complex patterns faster than a parser you implement on your own.
Alternatives to the PCRE Functions
PHP supplies some alternatives to the PCRE functions. The most direct competitor is the POSIX regular expression library that consists of ereg, ereg_replace and others. We won't be looking at the POSIX regular expression functions because the PCRE library provides a broader pattern-matching facility than its POSIX counterpart and is about 30% faster on average. The other option is to perform string matching with the standard string functions. As noted above, the string functions are faster on the tasks they were designed for (finding specific characters or substrings), but are not an appropriate fit for anything but the simplest patterns.
Your First Regex
The simplest regex is a match against a static string. To determine if a specific email address is present in a piece of text, we can use the following code fragment:
    // $text holds the text to search
    if (preg_match("/george@example\.com/", $text)) {
        print "Matches";
    } else {
        print "Does not match";
    }
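To make the fragment concrete, here is a minimal, self-contained sketch. The sample strings and the helper name contains_address() are assumptions made purely for illustration; the second call is there to show why the dot in the pattern is escaped.

```php
<?php
// Hypothetical helper wrapping the preg_match() call from the text.
function contains_address($text)
{
    // \. matches a literal dot; an unescaped . would match any character.
    return preg_match("/george@example\.com/", $text) === 1;
}

// Made-up sample strings: the second one has an X where the dot should be.
$hit  = contains_address("Mail george@example.com for details");
$miss = contains_address("Mail george@exampleXcom for details");

print $hit  ? "Matches\n" : "Does not match\n";   // prints "Matches"
print $miss ? "Matches\n" : "Does not match\n";   // prints "Does not match"
```

Had the pattern been written as /george@example.com/, the second string would have matched as well, since an unescaped dot accepts any character in that position.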
Despite its simplicity, the preg_match example illustrates the basic syntax of a regex match. The regex itself is the first parameter, and is contained within slashes (/). The second parameter is the text you want to test the pattern against. The preg_match function returns 1 (which PHP treats as true) if the match succeeds, and 0 (which it treats as false) if it fails. Using slashes to delimit regular expressions is a convention (taken from the UNIX utility awk), but is not necessary: you can actually use any non-alphanumeric character other than a backslash or whitespace. Alternative delimiters are convenient if your pattern itself contains slashes. For instance, when dealing with file paths or URLs (both of which contain numerous slashes), it is common to use a different delimiter.
We can also perform substitutions with PCREs. To substitute 'george [at] nospam.example.com' for my address (a common anti-spam technique), you can use
    $text = preg_replace("/george@example\.com/", "george [at] nospam.example.com", $text);
The other PCRE functions are:
• preg_grep(string pattern, array subjects [, int flag]): preg_grep applies the specified pattern to every element of subjects, returning an array consisting of those that matched. If the optional flag is set to PREG_GREP_INVERT, only those elements that did not match will be returned.
• preg_match_all(string pattern, string subject [, array matches [, int flags]]): preg_match returns only the first match found in its subject text. preg_match_all matches as many times as possible, storing all the matches in the matches array. I will discuss this function in more detail later in the article.
• preg_replace_callback: This function makes it possible to perform very complex operations on a per-match basis through the use of callback functions. We will cover it in a future article, but some of its functionality overlaps with evaluated replacements, which are discussed in this article.
• preg_quote(string text): When using input text in a pattern, you may want to sanitize it to ensure it does not contain any regex metacharacters. preg_quote escapes all regex metacharacters in a string.
• preg_split(string pattern, string subject [, int limit [, int flags]]): preg_split performs similarly to explode, allowing us to break up the string subject into at most limit parts. Instead of splitting on a specific delimiter, preg_split allows the string to be broken based on a regex.
Regex Basics
Of course, we can (and should) perform the previous simple match using strstr(), which is faster than any regex function. What if, however, we want to match all email addresses in a string, rather than a specific one? What if you wanted to change text only if it appeared in a particular position within your string?
The power of regular expressions is in matching complex patterns that cannot be identified using straightforward text-search functions like strstr(). The basic components of a regular expression pattern are:
• Character Classes: Patterns rarely consist of specific letters, but rather of classes of letters. For example, 'any number' instead of a particular number, or 'any letter' instead of a particular letter.
• Grouping: Grouping allows for changing the precedence of operations as well as providing a means to extract the text you matched with a pattern.
• Enumerations: Enumerators allow you to specify how many times a character class or sub-pattern appears. This allows for convenient expression of fixed length patterns like 'a US zipcode is 5 digits' as well as variable length patterns such as 'a domain is a number of alphanumeric characters separated by dots'.
• Alternations: Alternations allow for multiple patterns to be combined. Unlike character classes, which allow for a position to match multiple characters, alternations allow for entire patterns to be alternatively matched. For example, a valid workday can be Monday, Tuesday, Wednesday, Thursday or Friday.
• Positional Anchors: Anchors allow you to require your pattern to start matching at a specific location in the search text, for example at the beginning or end of a line.
• Global Pattern Modifiers: Global pattern modifiers allow you to change the basic behavior of a regular expression, for example rendering it case-insensitive.
Character Classes
While it's usually easy to find a particular substring within a larger string (for example, my e-mail address in a message), it's not always easy to find a particular type of substring, like any e-mail address. To do this, you need to be able to match against a more generic pattern and not just against a static string. PCRE supplies character classes to allow you to do this; a character class allows a specific character in a search text to be matched against a range of possible characters. For example, a US phone number is composed of a three digit area code, a three digit exchange, and a four digit line number, commonly delimited by a '-'. To match this pattern, you could use the following regular expression:
    /\d\d\d-\d\d\d-\d\d\d\d/
The \d specifier is a built-in PCRE character class that consists of all the digits. There are a couple of things you should note about the pattern above. The first is that we have many \d's. In regular expressions, any character or character class matches only a single character unless you use an enumerator (which we'll cover later) to attach a quantity to it.
Second, if you test this pattern you will find the following results.
• 555-123-4567 matches. This is correct.
• 5555-123-45678 matches. This is not correct.
The second example does not represent a valid phone number (the area code and line number are too long), but it matches because the pattern fits as shown in Figure 1.
Figure 1: Regex doesn't always work the way you expect
There are a couple of ways to combat this problem. If you know that your search text should be exactly a phone number (with no leading or trailing text), you can use positional anchors to force the pattern to start at the beginning of the text and end at the end, as we'll see later on. If the phone number might be contained in text, on the other hand, you might try to fix the pattern by requiring the number to have at least one character of leading and trailing whitespace, using a pattern like:
    /\s\d\d\d-\d\d\d-\d\d\d\d\s/
The \s specifier is another character class for all whitespace (spaces, tabs, newlines, etc.). This pattern does not work in all situations, though, since if the text begins with the phone number you will be unable to match the leading \s. To handle this case, PCRE supports \b, a boundary condition that matches at the border (or boundary) between a 'word' and a 'non-word' (these are words in the C programming language sense: letters, numbers and underscores only). \b is actually not a character class, but what is known as a 'zero-width assertion'; this means that the \b specifier does not actually match the character on the other side of the boundary, but only ensures that such a boundary exists. Putting that into our pattern we can refine it to:
    /\b\d\d\d-\d\d\d-\d\d\d\d\b/
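The behaviour of the three phone-number patterns seen so far is easy to check for yourself. The sketch below is illustrative only: the variable names and sample strings are made up, and each pattern is simply run against each sample with preg_match().

```php
<?php
// The three patterns discussed in the text (the variable names are mine).
$plain   = '/\d\d\d-\d\d\d-\d\d\d\d/';
$spaced  = '/\s\d\d\d-\d\d\d-\d\d\d\d\s/';
$bounded = '/\b\d\d\d-\d\d\d-\d\d\d\d\b/';

// Made-up sample strings.
$samples = array(
    '555-123-4567',            // a bare number, nothing around it
    '5555-123-45678',          // too many digits at both ends
    'Call 555-123-4567 now',   // a number embedded in text
);

foreach ($samples as $s) {
    // preg_match() returns 1 for a match and 0 for no match.
    printf("%-24s plain:%d spaced:%d bounded:%d\n",
        $s,
        preg_match($plain, $s),
        preg_match($spaced, $s),
        preg_match($bounded, $s));
}
// 555-123-4567             plain:1 spaced:0 bounded:1
// 5555-123-45678           plain:1 spaced:0 bounded:0
// Call 555-123-4567 now    plain:1 spaced:1 bounded:1
```

Only the \b version behaves sensibly on all three inputs: it accepts the bare number, which the \s version cannot, and rejects the over-long one, which the plain version cannot.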
Continuing the testing, we find that "077-xxx-yyyy" matches. US and Canadian area codes and exchanges cannot begin with 0 or 1 (these are reserved for long distance and operator-assisted or international services). To be able to restrict the leading numbers to the allowed set, we need to be able to create our own character classes. In PCRE, these are constructed by filling a set of brackets ([ ]) with the characters we want to match. To match 2-9, we can use the character class [23456789], which is commonly shortened via a range operator to [2-9]. To use a custom character class in a pattern, you use it exactly as you would a regular character or character class. Here is the phone number pattern reworked to employ this:
    /\b[2-9]\d\d-[2-9]\d\d-\d\d\d\d\b/
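As a quick check of the reworked pattern, the sketch below filters a small list of candidates with preg_grep(). The candidate numbers are invented for the example; note that 555-123-4567, which the earlier patterns accepted, is now rejected because its exchange begins with 1.

```php
<?php
$pattern = '/\b[2-9]\d\d-[2-9]\d\d-\d\d\d\d\b/';

// Made-up candidates for illustration.
$candidates = array(
    '555-234-5678',   // area code and exchange both start with 2-9
    '077-234-5678',   // area code starts with 0
    '555-123-4567',   // exchange starts with 1
);

// preg_grep() keeps the elements that match and preserves their keys,
// so array_values() is used to renumber the result.
$valid = array_values(preg_grep($pattern, $candidates));

// PREG_GREP_INVERT keeps the elements that do not match instead.
$invalid = array_values(preg_grep($pattern, $candidates, PREG_GREP_INVERT));

print_r($valid);     // contains only 555-234-5678
print_r($invalid);   // contains 077-234-5678 and 555-123-4567
```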
PCRE provides six commonly used built-in character classes, described in Figure 2. Additionally, PCRE provides POSIX-style character classes for compatibility with POSIX-style regular expressions. These classes are described in Figure 3. POSIX character classes aren't used much in real-life code, which is a shame because they are often a perfect fit for problems that programmers encounter in their day-to-day work. You can negate a POSIX character class by adding a ^ after the first colon. For instance, to match all non-letter characters, you could use the class :^alpha:. In a pattern, these names are written inside a bracket expression, for example [[:alpha:]] or [[:^alpha:]]. Negations are also available in custom character classes; for example, to match anything that is not the greater-than character (>), you can use the custom character class [^>]. Negations are very useful when you are creating regular expressions that extract quoted text or if you want to manually parse XML or HTML.
Since '-', '^' and '[ ]' have special meanings in custom character classes, if you want those actual characters to be elements of the class, you should escape them with a backslash (\). The two exceptions are the range operator -, which can appear un-escaped as the last character in a class, since that is unambiguous, and the negation character ^, which can appear un-escaped in any position but the first.
Grouping and Sub-Patterns
Usually, you will not only want to match a pattern, but extract data from it as well. To extract a specific part of a pattern, you surround it with parentheses. For example, to capture each part of the phone number pattern, you would add parentheses as follows:
    /\b([2-9]\d\d)-([2-9]\d\d)-(\d\d\d\d)\b/
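To see the captured pieces, one option is the optional third argument of preg_match(), which receives the matched text. This is a minimal sketch with an invented sample string; the variable names are my own.

```php
<?php
$pattern = '/\b([2-9]\d\d)-([2-9]\d\d)-(\d\d\d\d)\b/';
$text    = 'Call 555-234-5678 before noon';   // made-up sample text

if (preg_match($pattern, $text, $matches)) {
    // $matches[0] is the whole match; 1 to 3 are the parenthesised groups,
    // numbered from left to right.
    list($whole, $area, $exchange, $line) = $matches;

    print "Whole match: $whole\n";     // 555-234-5678
    print "Area code:   $area\n";      // 555
    print "Exchange:    $exchange\n";  // 234
    print "Line number: $line\n";      // 5678
}
```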
Figure 2: Basic Character Classes
.         Matches any character
\w        An alphanumeric character or the underscore character.
\W        Anything not a \w.
\d        A digit.
\D        A non-digit.
\s        Any whitespace. This includes spaces, tabs, newlines, control characters.
Figure 3: POSIX Style Classes
:alpha:   Any letter
:alnum:   Any alphanumeric character
:ascii:   Any ASCII character
:cntrl:   Any control character.
:digit:   Any digit (same as \d)
:graph:   Any alphanumeric or punctuation character.
:lower:   Any lowercase letter.
:print:   Any printable character.
:space:   Any whitespace character (same as \s).