
Pro PHP XML and Web Services, part 7


Table 14-6. Continued
Element      Use          Description
author       Recommended  A Person construct providing information about the author of a feed. A feed element must contain one or more author elements unless every entry element contains at least one author element.
link         Recommended  A link, as defined in the “Common Constructs” section, to a related Web page.
category     Optional     Associates a category, as defined in the “Common Constructs” section, with the feed. A feed can have zero or more category elements.
contributor  Optional     A Person construct providing information for a contributor to the feed. You can use zero or more contributor elements.
generator    Optional     Identifies the agent used to create the feed.
icon         Optional     Identifies a small image, by means of a URL, for the feed.
logo         Optional     Identifies a larger image, by means of a URL, for the feed.
rights       Optional     A Text construct containing any rights, such as copyrights, for the feed.
subtitle     Optional     A Text construct containing a description or subtitle for the feed.
A document using the metadata elements from Table 14-6 could look something like the
one in Listing 14-4.
Listing 14-4. Sample Atom Feed Document Using Optional Elements
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <id> />
  <updated>2005-10-02T15:15:00Z</updated>
  <author>
    <name>John Smith</name>
  </author>
  <author>
    <name>Jane Doe</name>
  </author>
  <link rel="self" href="/atom/" />
  <category term="technology"/>
  <category term="PHP"/>
  <contributor>
    <name>John Doe</name>
  </contributor>
  <generator uri="/phpatomgen.php" version="1.0">
    Example PHP Atom Generator
  </generator>
CHAPTER 14 ■ CONTENT SYNDICATION: RSS AND ATOM548
6331_c14_final.qxd 2/16/06 4:34 PM Page 548
  <icon> />
  <logo> />
  <rights> &copy; 2005 John Smith </rights>
  <subtitle>Description of Example Atom Feed</subtitle>
  <!-- Zero or more entry elements -->
</feed>
entry Element
Atom does not require a feed to contain any entry elements, just as RSS 2.0 does not
require items. Using the Atom format, however, an entry element can be part of a feed and
can also be its own document. This section covers the structure of an entry element, which
is the same whether it is used as a child element of a feed element or stand-alone as the
document element of an Atom entry document. The only difference is that because Atom
elements must live within the Atom namespace, an entry element used as an Atom entry
document must declare the namespace, while a child entry element within a feed would
normally already be within the scope of this namespace. Many of the possible child elements
of an entry element, shown in Table 14-7, are used in a similar fashion to those used by
the feed element.
Table 14-7. Entry Child Elements
Element      Use          Description
title        Required     A Text construct containing the title or name of the entry.
id           Required     A permanent and universally unique IRI. If this is not a URI, it is not dereferenced and is compared on a character-by-character basis like a URI.
updated      Required     A Date construct indicating the date and time of the last significant modification.
author       Recommended  A Person construct providing information about the author of the entry. An entry element must contain at least one author element unless one is contained by the feed or is provided within a source element for the current entry.
content      Recommended  Contains or links to the complete content, as defined in the “Common Constructs” section, of the entry. This element must be provided if the entry does not contain an alternate link and should be provided if there is no summary.
link         Recommended  A link, as defined in the “Common Constructs” section, to a related Web page. An alternate link must be used if the entry does not contain a content element.
summary      Recommended  A Text construct that provides a short summary or description of the entry. It is recommended that a summary element be used when no content element is used, the content is remote and uses an src attribute, or the content is Base64-encoded.
category     Optional     Associates categories, as defined in the “Common Constructs” section, with the entry. An entry can have zero or more category elements.
contributor  Optional     A Person construct providing information for a contributor to the entry. You can use zero or more contributor elements.
published    Optional     A Date construct containing the initial creation date and time of the entry.
source       Optional     A source element is used when an entry is copied from another feed. I will explain this element in further detail following this table.
rights       Optional     A Text construct containing any rights, such as copyrights, for the entry.
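The Required rows of Table 14-7 can be checked mechanically. The following is a minimal sketch, not part of the chapter's example code: the function name missingEntryElements() and the urn:example identifiers are assumptions made for illustration, and only the three required children (title, id, and updated) are tested.

```php
<?php
/* Sketch: report which required Atom entry children are missing.
   missingEntryElements() is a hypothetical helper, not part of any library. */
function missingEntryElements(DOMElement $entry)
{
    $atomNS  = 'http://www.w3.org/2005/Atom';
    $missing = array();
    foreach (array('title', 'id', 'updated') as $name) {
        $found = FALSE;
        /* Only direct children in the Atom namespace count */
        foreach ($entry->childNodes as $child) {
            if ($child->nodeType === XML_ELEMENT_NODE
                && $child->localName === $name
                && $child->namespaceURI === $atomNS) {
                $found = TRUE;
                break;
            }
        }
        if (! $found) {
            $missing[] = $name;
        }
    }
    return $missing;
}

/* Illustrative entry document that lacks an updated element */
$doc = new DOMDocument();
$doc->loadXML('<entry xmlns="http://www.w3.org/2005/Atom">'
    . '<title>Article 1</title><id>urn:example:article-1</id></entry>');
$missing = missingEntryElements($doc->documentElement);
print "Missing: " . implode(', ', $missing) . "\n";
```

The conditional requirements in the table (author, content, and the alternate link) depend on the surrounding feed and are deliberately left out of this sketch.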
I have explained each of the elements in Table 14-7 elsewhere in the chapter. The only
element that needs more clarification is the source element. You use a source element when
an entry is copied from another feed. Its children can be any of those used by the entry’s
original parent feed element except for entry elements, especially when the element is not
already contained by the entry. For example, if you used an entry from Listing 14-3 to
create an Atom entry document, it could look like the following:
<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Article 1</title>
  <link href=" />
  <id> />
  <updated>2005-10-02T11:35:27Z</updated>
  <summary>This is the description for article 1.</summary>
  <source>
    <link href=" />
    <author>
      <name>Rob Richards</name>
      <email></email>
    </author>
  </source>
</entry>
If you look at the source element, you will see that it uses the link and author elements
from the original feed. This applies to Atom entry documents and also when an entry from
one feed is incorporated into another feed. The original feed information for the entry is
maintained with the entry, keeping it completely separate from the current feed yet allowing
the entry to reference its original feed. The author, contributor, rights, and category
elements are some of the elements to preserve from the original feed because they provide
the most important information pertaining to the origins and rights for the entry.
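Copying an entry out of a feed and recording its origin can be sketched with DOM as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function name entryWithSource(), the sample feed, and the urn:example identifiers are all invented for the example, and the sketch simply copies every non-entry child of the feed into the source element.

```php
<?php
const ATOM_NS = 'http://www.w3.org/2005/Atom';

/* Sketch: build a stand-alone Atom entry document from one entry of a feed,
   preserving the feed's metadata in a source element.
   entryWithSource() is a hypothetical helper. */
function entryWithSource(DOMDocument $feedDoc, $index = 0)
{
    $feed  = $feedDoc->documentElement;
    $entry = $feedDoc->getElementsByTagNameNS(ATOM_NS, 'entry')->item($index);
    if ($entry === NULL) {
        throw new OutOfRangeException("No entry at position $index");
    }

    /* The copied entry becomes the document element of the new document */
    $out  = new DOMDocument('1.0', 'UTF-8');
    $copy = $out->appendChild($out->importNode($entry, TRUE));

    /* Copy every feed-level element except entry elements into source */
    $source = $copy->appendChild($out->createElementNS(ATOM_NS, 'source'));
    foreach ($feed->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE && $child->localName !== 'entry') {
            $source->appendChild($out->importNode($child, TRUE));
        }
    }
    return $out;
}

/* Illustrative feed; the identifiers are placeholders */
$feedXml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <id>urn:example:feed</id>
  <updated>2005-10-02T15:15:00Z</updated>
  <author><name>Jane Doe</name></author>
  <entry>
    <title>Article 1</title>
    <id>urn:example:article-1</id>
    <updated>2005-10-02T11:35:27Z</updated>
  </entry>
</feed>
XML;

$feedDoc = new DOMDocument();
$feedDoc->loadXML($feedXml);
$entryDoc = entryWithSource($feedDoc);
print $entryDoc->saveXML();
```

A fuller version might skip feed elements that the entry already carries itself, such as an author, which is the case the text singles out.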
Choosing a Format
With three competing technologies, how do you choose one to use? If you are going to be
subscribing to a feed, the answer is simple. You use what is offered and what your reader supports.
The hard part comes when you are the one creating the feed. Personally, when faced with a
decision like this, I often will check around to see what the big corporations are doing. It is
normally a safe bet that if several of them are using the same technology, it means good
support exists for it. Of course, big companies also have a decent amount of resources behind
them, so even if the support is not there, it usually arrives quickly.
In my opinion, RSS 2.0 looks like a safe bet, although I am not ruling out the others. With
a quick look at some RSS 2.0 implementers, you will see names such as Yahoo, the Wall Street
Journal, MSNBC, and IBM. This does not even include those providing podcasts. This,
however, doesn’t mean you have to use RSS 2.0 or even select just a single format.
If you look at the open source community, it is not surprising to find sites providing feeds
in all three formats. Unlike a company that normally mandates how its information is accessed,
open source sites tend to lean more toward freedom of choice. No matter what aggregator or
reader you are using, as long as it’s compatible with at least one of the technologies, you will be
able to access the information.
Comparing the three formats, my first choice is RSS 2.0. It is simple to use and has a high
usage rate. Second on my list is Atom. I consider Atom to be a wildcard format. It has a great
structure and offers more flexibility than RSS 2.0, but it does not yet have the user base RSS 2.0
does. Remember, Atom was created as a competing format because of all the problems between
the two RSS camps. So, unlike the RSS branches that already had user bases (though divided),
Atom started from the bottom. I consider it a wildcard because it still has the possibility of
gaining more widespread usage. RSS 1.0 is my least favorite. I think the structure is a bit
awkward, and the use of namespaces a bit extensive for my liking. You should also take into
account that RSS 1.0 is built on RDF technology, which in my opinion just overcomplicates things.

In the end, the choice is up to you. Everything here has been my opinion, not the voice of
the Great Oz. Only you understand who your audience is and your users’ needs. You know the
type of content your feed will be supplying. Finally, you will be the one who has to support it.
The advice offered should help you decide which format (or even formats) best suits your needs.
Seeing Some Examples in Action
Content syndication varies depending upon the technologies you are comparing. For this
reason, the examples in the following sections are intentionally simple and do not attempt to
demonstrate the complete functionality of each of the formats. I will demonstrate a simple
API for creating minimal RSS 1.0, RSS 2.0, and Atom feeds using DOM; a simple RSS 2.0 parser
using SimpleXML; and a simple Atom parser using XMLReader. You could extend each of
these examples to create much more feature-rich applications.
Creating Simple Feeds Using DOM
Depending upon the type of feed and the different support being added to it, building a feed
manually using DOM can become complex, especially when trying to support multiple formats.
This example will demonstrate how to use DOM to create feeds in multiple formats and support
the minimal requirements for each format. The code is split into four classes. The Syndicator
class is the base class, which is not instantiated directly, that provides the bulk of functionality
for building a feed. The remaining classes, which extend the Syndicator class, are the ones that
are instantiated to create a feed in a specific format. The RSS1 class supports an RSS 1.0 feed, the
RSS2 class supports RSS 2.0 feeds, and the Atom class supports Atom 1.0 feed documents.
Syndicator Class
The Syndicator class is the base class and provides the majority of functionality for creating
a feed. Because of the differing feed formats, much has been generalized in this class with
specifics provided by the extending classes. This class is not meant to be directly instantiated.
In actuality, this class should be made abstract, but in the event you are not fluent with OOP
or some of the newer aspects of PHP 5, I have written it as a regular class:
class Syndicator {
    protected $rssDoc = NULL;
    protected $docElement = NULL;
    protected $root = NULL;
    protected $items = NULL;
    protected $hasChannel = TRUE;
    protected $tagMap = array('item'=>'item', 'feeddesc'=>'description',
                              'itemdesc'=>'description');
    const ITEM = 0;
    const FEED = 1;
All class properties are protected because they are not meant to be accessed outside an
instantiated object. The first three properties are required because of the differing structures.
The rssDoc property holds the DOMDocument object you are using to create the feed. The
docElement property holds the DOMElement object to which item or entry elements are added.
This normally is the document element except in the case of RSS 2.0. The item elements are
added to the channel element in that format, which is actually a child of the document
element. The docElement property acts as a pseudo-document element, so you can add item and
entry elements using common functionality. The root property holds the DOMElement to which
you add the metadata for the feed. Again, this varies depending upon the format you are
using. For an Atom feed, the value of this property is the feed element, which is the document
element. For an RSS 1.0/RSS 2.0 feed, the value of this property is the channel element. I will
show how to use the remaining properties later in the example. The defaults for these,
however, are for the RSS 1.0 and 2.0 feeds.
    /* Common element creation function that handles namespace creation properly */
    protected function createSyndElement($namespace, $name, $value=NULL)
    {
        if (is_null($namespace)) {
            return $this->rssDoc->createElement($name, $value);
        } else {
            return $this->rssDoc->createElementNS($namespace, $name, $value);
        }
    }

    /* Default link element creation function as Atom has a different format */
    protected function createLink($parent, $url)
    {
        $link = $this->createSyndElement($this->NS, 'link', $url);
        $parent->appendChild($link);
    }
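The reason createSyndElement() branches on the namespace can be seen in a small stand-alone sketch. It is illustrative only and not part of the Syndicator class: an element created with createElement() carries no namespace in the DOM tree even when its parent is in the Atom namespace, which is why the namespaced formats need createElementNS().

```php
<?php
$atomNS = 'http://www.w3.org/2005/Atom';

$doc  = new DOMDocument('1.0', 'UTF-8');
$feed = $doc->appendChild($doc->createElementNS($atomNS, 'feed'));

/* Created within the Atom namespace */
$inNS = $feed->appendChild($doc->createElementNS($atomNS, 'title', 'Namespaced'));
/* Created without any namespace, even though its parent has one */
$noNS = $feed->appendChild($doc->createElement('title', 'Not namespaced'));

var_dump($inNS->namespaceURI);
var_dump($noNS->namespaceURI);
```

The first call reports the Atom namespace URI and the second reports NULL. RSS 2.0 uses no namespace, so createElement() is the right call for that format.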
The following function, createRSSNode(), adds a title, link, and description to the element
passed as the first parameter. In the case of an Atom feed, it also creates the updated and id
elements. Links in Atom feeds are created differently than in RSS 1.0 and RSS 2.0 feeds; thus,
the example uses a createLink() function. As you will see in the Atom class, it is overridden
so the element is created in the proper format. A $type variable is passed into this method to
indicate the type of element for which these child elements are being created. The reason for
this is to determine the element for the description. RSS 1.0 and RSS 2.0 use the element
description for both the channel and item elements. Atom, on the other hand, uses subtitle
for the feed element and content for the entry element. Based on the type, the proper name
is taken from the tagMap array, which is also overridden in the Atom class.
    /* Generic method to create appropriate title, link, and
       description for an element */
    protected function createRSSNode($type, $parent, $title, $url,
                                     $description, $pubDate = NULL, $id=NULL)
    {
        $this->createLink($parent, $url);
        $title = $this->createSyndElement($this->NS, 'title', $title);
        $parent->appendChild($title);
        if ($type == Syndicator::ITEM) {
            $titletag = $this->tagMap['itemdesc'];
        } else {
            $titletag = $this->tagMap['feeddesc'];
        }
        $description = $this->createSyndElement($this->NS, $titletag, $description);
        $parent->appendChild($description);
The remaining functionality of the createRSSNode() method is specific to Atom. The
corresponding elements could be supported for both RSS 1.0 and 2.0 with additional coding,
but that is currently out of the scope of this example. Doing so would require supporting
extension modules, the Dublin Core in particular, for RSS 1.0. The id and updated elements
are required for a valid Atom feed, so this code currently works properly only for that format.
        /* id elements and updated elements are specific to Atom
           - corresponding elements from other formats not currently supported */
        if (! is_null($id)) {
            $idnode = $this->createSyndElement($this->NS, 'id', $id);
            $parent->appendChild($idnode);
        }
        if (! is_null($pubDate)) {
            $datenode = $this->createSyndElement($this->NS, 'updated', $pubDate);
            $parent->appendChild($datenode);
        }
    }
The constructor performs all the initial setup for the feed. Each class defines a SHELL
property, which is just a template for the document. It is used to easily create a document with
the initial namespaces declared properly. The hasChannel property is set to FALSE for the Atom
class because it is the only format not using a channel element. Once the object is instantiated,
the constructor will have properly set up the properties mentioned earlier and set the initial
metadata for either the feed element or the channel element based on the values passed to
the constructor.
    function __construct($title, $url, $description, $pubDate = NULL, $id=NULL)
    {
        try {
            $this->rssDoc = new DOMDocument();
            $this->rssDoc->loadXML($this->SHELL);
            $this->docElement = $this->rssDoc->documentElement;
            if ($this->hasChannel) {
                $root = $this->createSyndElement($this->NS, 'channel');
                $this->root = $this->docElement->appendChild($root);
            } else {
                $this->root = $this->docElement;
            }
            $this->createRSSNode(Syndicator::FEED, $this->root, $title,
                                 $url, $description, $pubDate, $id);
            return;
        } catch (DOMException $e) {
            throw new Exception($e->getMessage());
        }
        throw new Exception("Unable to Create Object");
    }
The addItem() method is pretty simple. It creates an element using the name pulled from
the tagMap, which is entry for Atom and item for RSS 1.0 and 2.0. The new element is then
appended to the node held by the docElement property. The createRSSNode() method is then
called with the Syndicator::ITEM constant as the type, which results in the title, link,
description, and, when supplied, id and updated elements being created on this new element.
    public function addItem($title, $link, $description=NULL,
                            $pubDate = NULL, $id=NULL)
    {
        $item = $this->createSyndElement($this->NS, $this->tagMap['item']);
        if ($this->docElement->appendChild($item)) {
            $this->createRSSNode(Syndicator::ITEM, $item, $title, $link,
                                 $description, $pubDate, $id);
            return TRUE;
        }
        return FALSE;
    }

    /* Method used as a holder and is overridden in the Atom class */
    public function addAuthor($name)
    {
        trigger_error("Function not yet implemented");
        return FALSE;
    }
    /* Simple method to return the formatted XML document as a string */
    function dump()
    {
        if ($this->rssDoc) {
            $this->rssDoc->formatOutput = TRUE;
            return $this->rssDoc->saveXML();
        }
        return "";
    }
}
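As noted at the start of this section, the base class would ideally be abstract. The following is a minimal sketch of that idea under stated assumptions: the names AbstractSyndicator, MinimalRSS2, and shell() are invented for illustration, and the abstract shell() method stands in for the SHELL property so that every extending class is forced to supply a template.

```php
<?php
/* Sketch: an abstract base class cannot be instantiated directly */
abstract class AbstractSyndicator
{
    protected $rssDoc = NULL;

    public function __construct()
    {
        $this->rssDoc = new DOMDocument();
        $this->rssDoc->loadXML($this->shell());
    }

    /* Each format must supply its own document template */
    abstract protected function shell();

    public function dump()
    {
        $this->rssDoc->formatOutput = TRUE;
        return $this->rssDoc->saveXML();
    }
}

class MinimalRSS2 extends AbstractSyndicator
{
    protected function shell()
    {
        return '<rss version="2.0" />';
    }
}

$feed   = new MinimalRSS2();
$output = $feed->dump();
print $output;

/* Under PHP 7 and later this raises a catchable Error;
   PHP 5 stops with a fatal error instead */
$caught = FALSE;
try {
    $broken = new AbstractSyndicator();
} catch (Error $e) {
    $caught = TRUE;
    print "Cannot instantiate: " . $e->getMessage() . "\n";
}
```

With this arrangement, a mistaken attempt to create the base class fails immediately rather than producing an object whose SHELL and NS properties are undefined.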
RSS1 Class
The RSS1 class is the class to be instantiated when creating an RSS 1.0 feed. It has a format
much different than RSS 2.0 and Atom do and therefore must override some methods to sup-
port its structure properly. The first area to look at is the properties and the constant it defines.
The RDFNS constant is used only within this class. It defines the rdf namespace because it is
quite long and because the constant makes it easier to use. This namespace is needed for a
few elements, and attributes are specific to RSS 1.0. The NS property sets the common name-
space used within the Syndicator class. Using the property allows the Syndicator class to use

generalized code shared amongst the classes when creating elements.
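class RSS1 extends Syndicator {
    const RDFNS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
    protected $NS = 'http://purl.org/rss/1.0/';
    /* Following is formatted for readability */
    protected $SHELL =
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                  xmlns="http://purl.org/rss/1.0/" />';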
The addToItems() method is unique to this class. RSS 1.0 requires items to be referenced
within the channel element. The items property, which you saw defined in the Syndicator class,
holds the DOMElement to which the rdf:li elements are added. Upon the addition of the first
item, the structure is set up, which includes the items element and the rdf:Seq element, which
is the parent for the rdf:li items. This method is never called publicly, and hence you have the
private accessor. Instead, it is called by the overridden addItem() method in this class.
    private function addToItems($url)
    {
        if (is_null($this->items)) {
            $container = $this->createSyndElement($this->NS, 'items');
            $this->root->appendChild($container);
            $this->items = $this->rssDoc->createElementNS(self::RDFNS, 'Seq');
            $container->appendChild($this->items);
        }
        $item = $this->rssDoc->createElementNS(self::RDFNS, 'li');
        $this->items->appendChild($item);
        $item->setAttribute("resource", $url);
    }
The only reason that the addItem() method has been overridden is to support the creation
of the rdf:li elements. This method first calls the parent addItem() method and then
makes a call to the internal addToItems() method.
    public function addItem($title, $link, $description=NULL,
                            $pubDate = NULL, $id=NULL)
    {
        if (parent::addItem($title, $link, $description, $pubDate, $id)) {
            $this->addToItems($link);
            return TRUE;
        }
        return FALSE;
    }
As you probably recall from the RSS 1.0 section, the channel and item elements must
contain an rdf:about attribute. The createRSSNode() method is overridden to create this
attribute prior to the createRSSNode() method from the Syndicator class being called.
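    /* $id is accepted only to keep the signature compatible with the
       Syndicator method; it is not used for RSS 1.0 */
    protected function createRSSNode($type, $parent, $title, $url,
                                     $description, $pubDate = NULL, $id = NULL)
    {
        $parent->setAttributeNS(self::RDFNS, 'rdf:about', $url);
        parent::createRSSNode($type, $parent, $title, $url, $description, $pubDate);
    }
}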
RSS2 Class
The RSS2 class instantiates an object to create an RSS 2.0 document. This class is extremely
simple. RSS 2.0 does not use a namespace, so the NS property is set to NULL, and the tem-
plate is simply the rss element with a version. The structure of an RSS 2.0 feed differs from
that of RSS 1.0; as in RSS 2.0, all elements reside within the channel element. The construc-
tor has been overridden so that once the constructor from the Syndicator class has been
called, the docElement property can be set to point to the proper node. In this case, both
the root and docElement properties point to the channel element.
class RSS2 extends Syndicator {
    protected $NS = NULL;
    protected $SHELL = '<rss version="2.0" />';
    function __construct($title, $url, $description, $pubDate = NULL, $id=NULL)
    {
        try {
            parent::__construct($title, $url, $description, $pubDate, $id);
            $this->docElement = $this->root;
        } catch (Exception $e) {
            throw new Exception($e->getMessage());
        }
    }
}
Atom Class
The Atom class, used to instantiate an object to create an Atom 1.0 feed, is not much more difficult
than using the RSS2 class. Its NS property is set to the Atom namespace, and the SHELL property is
set to the initial feed element. The hasChannel variable is set to FALSE in this case. When the con-
structor is called, a channel element will not be created, and the docElement property will be set
accordingly. The class also defines a custom tagMap. Atom tags vary slightly from the RSS 1.0 and
2.0 tags, which is the reason for the use of this array mapping.
class Atom extends Syndicator {
    protected $NS = 'http://www.w3.org/2005/Atom';
    protected $SHELL = '<feed xmlns="http://www.w3.org/2005/Atom" />';
    protected $hasChannel = FALSE;
    protected $tagMap = array('item'=>'entry', 'feeddesc'=>'subtitle',
                              'itemdesc'=>'content');
Atom has a different syntax for a link element. This method overrides the default method
so that the link is created in the proper format:
    protected function createLink($parent, $url) {
        $link = $this->rssDoc->createElementNS($this->NS, 'link');
        $parent->appendChild($link);
        $link->setAttribute('href', $url);
    }
Atom also requires that the feed and entry elements contain updated and id elements. In
the event no values have been passed for these parameters to the constructor and addItem()
methods, the values are automatically populated. The id is set to the URL, and the pubDate
is set to the current date and time.
■Note If you are not familiar with the value c passed to the date function, it is a new format character as
of PHP 5 that formats dates in ISO 8601 format. This format is compatible with the Atom Date construct.
For example:
    function __construct($title, $url, $description, $pubDate = NULL, $id=NULL)
    {
        try {
            if (empty($id))
                $id = $url;
            if (empty($pubDate))
                $pubDate = date('c');
            parent::__construct($title, $url, $description, $pubDate, $id);
        } catch (Exception $e) {
            throw new Exception($e->getMessage());
        }
    }
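To see what the c format character produces, the following stand-alone sketch formats a fixed timestamp. It uses gmdate() rather than date() so that the result does not depend on the server's time zone; the timestamp itself is taken from the updated value in Listing 14-4 and is otherwise arbitrary.

```php
<?php
/* 2005-10-02 15:15:00 UTC: hour, minute, second, month, day, year */
$ts = gmmktime(15, 15, 0, 10, 2, 2005);

$iso  = gmdate('c', $ts);               /* ISO 8601 with a numeric offset */
$atom = gmdate(DATE_ATOM, $ts);         /* predefined constant for the Atom date format */
$zulu = gmdate('Y-m-d\TH:i:s\Z', $ts);  /* same instant written with the Z suffix */

print "$iso\n$atom\n$zulu\n";
```

Both the +00:00 offset form and the Z form are acceptable in an Atom Date construct, so either could be passed as the $pubDate argument.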
The addAuthor() method is specific to Atom. An author element is required either within
the feed or within every entry element. Rather than supporting some version of this for the
RSS formats, the method defined in the Syndicator class issues a user notice when it is
called on a class that does not override it. This method, when called, adds a simple author
element with a child name element to the feed. This is the minimal amount of data required
to create a valid Atom document.
    public function addAuthor($name)
    {
        $author = $this->rssDoc->createElementNS($this->NS, 'author');
        if ($this->docElement->appendChild($author)) {
            $namenode = $this->rssDoc->createElementNS($this->NS, 'name', $name);
            if ($author->appendChild($namenode)) {
                return TRUE;
            }
        }
        return FALSE;
    }

    public function addItem($title, $link, $description=NULL,
                            $pubDate = NULL, $id=NULL)
    {
        if (empty($id))
            $id = $link;
        if (empty($pubDate))
            $pubDate = date('c');
        return parent::addItem($title, $link, $description, $pubDate, $id);
    }
}
You can use the following code with the classes defined previously to create simple feeds
in each format. Currently, the RSS2 class is the default type of feed to be created. Executing
the code will create an RSS 2.0 document containing two articles and will print the resulting
document to the output. Depending upon how the script is being accessed (CLI versus Web page),
you need to use the correct input variable. It is currently set up to use CLI, but commenting
the line requesting the $_SERVER['argv'] and uncommenting the $_GET['format'] line will
allow the script to run within a Web page. In CLI mode, passing the value rss1 will create an
RSS 1.0 feed, atom will create an Atom feed, and anything else will result in an RSS 2.0 feed.
When executed within a Web page, the same values are used, although they need to be named
with the parameter format.
$type = "";
/* Uncomment the following when using within a Web server environment
if (isset($_GET['format'])) {
    $type = (string)$_GET['format'];
}
*/
/* Comment out the following to disable CLI mode */
if (isset($_SERVER['argc']) && $_SERVER['argc'] > 1) {
    $type = (string) $_SERVER['argv'][1];
}
switch ($type) {
case 'rss1':
$test = new RSS1("RSS1 Title", " />"My RSS1 Feed");
break;
case 'atom':
$test = new Atom("Atom Title", " />"My Atom Feed");
/* Author is only applicable to an Atom feed */
$test->addAuthor('Rob Richards');
break;
default:
$test = new RSS2("RSS2 Title", " />"My RSS2 Feed");
}
$test->addItem('Article 1', ' />'This is the description for article 1.');
$test->addItem('Article 2', ' />'This is the description for article 2.');
print $test->dump();
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <link> />
    <title>RSS2 Title</title>
    <description>My RSS2 Feed</description>
    <item>
      <link> />
      <title>Article 1</title>
      <description>This is the description for article 1.</description>
    </item>
    <item>
      <link> />
      <title>Article 2</title>
      <description>This is the description for article 2.</description>
    </item>
  </channel>
</rss>
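Instead of commenting and uncommenting lines to switch between CLI and Web use, the choice of input could be made at run time. The following is a sketch under stated assumptions: the function name requestedFormat() is invented for illustration, and the predefined PHP_SAPI constant is used to detect the command-line interface.

```php
<?php
/* Sketch: pick the requested feed format from the command line or the
   query string. requestedFormat() is a hypothetical helper. */
function requestedFormat($sapi, array $argv, array $get)
{
    if ($sapi === 'cli') {
        /* First command-line argument, for example: php feed.php atom */
        return isset($argv[1]) ? (string) $argv[1] : '';
    }
    /* Query string parameter, for example: feed.php?format=atom */
    if (isset($get['format']) && is_string($get['format'])) {
        return $get['format'];
    }
    return '';
}

$argv = isset($_SERVER['argv']) ? $_SERVER['argv'] : array();
$type = requestedFormat(PHP_SAPI, $argv, $_GET);
```

An empty string falls through to the default case of the switch statement and therefore still results in an RSS 2.0 feed.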
Creating a Simple RSS 2.0 Parser Using SimpleXML
SimpleXML provides a simple way to parse feeds. As long as no default namespaces have been
used in the feeds, you have little to deal with other than understanding the structure. As you
are already aware from Chapter 7, you access elements as properties by name, and you access
attributes like an array with string indexes.
<?php
/* Define some RSS 2.0 and other compatible feeds */
$rssfeed = array();
/* The PHP RSS feeds are RSS version 0.93 */
$rssfeed['PHPGEN'] = ' />
/* The YAHOO RSS feeds are RSS version 2.0 */
$rssfeed['YAHOOTOPNEWS'] = ' />
/* The Planet PHP RSS feed is RSS version 0.91 */
$rssfeed['PLNTPHP'] = ' />
/* Apress new book list feed - RSS 2.0 */
$rssfeed['APRESSBOOKS'] = ' />
/* Loop through and process each defined feed */
foreach($rssfeed AS $name=>$url) {
    $rssParser = simplexml_load_file($url);
    /* Skip any feed that could not be loaded or parsed */
    if ($rssParser === FALSE) {
        continue;
    }
    /* Output the channel information */
    print $rssParser->channel->title."\n";
    print " URL: ".$rssParser->channel->link."\n";
    print " ".$rssParser->channel->description."\n\n";

    /* Iterate through the items, and output each one */
    foreach ($rssParser->channel->item AS $item) {
        print $item->title."\n";
        print $item->link."\n";
        print $item->pubDate."\n";
        print $item->description."\n\n";
    }
}
?>
As you can see, in only a few lines of code the basic information from RSS feeds ranging
from version 0.91 to 2.0, excluding RSS 1.0, is easily parsed using SimpleXML.
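The two access styles, and the extra step that namespaces require, can be illustrated with a small inline feed. This is a sketch only: the feed content and the example.com URL are invented, and the Dublin Core creator element is used simply as a familiar namespaced element.

```php
<?php
/* Illustrative RSS 2.0 document with one namespaced element */
$xml = <<<XML
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example Channel</title>
    <item>
      <title>Article 1</title>
      <dc:creator>Jane Doe</dc:creator>
      <enclosure url="http://www.example.com/a1.mp3" length="1024" type="audio/mpeg" />
    </item>
  </channel>
</rss>
XML;

$rss  = simplexml_load_string($xml);
$item = $rss->channel->item[0];

/* Elements are accessed as properties */
$title = (string) $item->title;
/* Attributes are accessed like array entries with string indexes */
$version = (string) $rss['version'];
$url     = (string) $item->enclosure['url'];
/* Namespaced children must be requested by namespace URI */
$dc      = $item->children('http://purl.org/dc/elements/1.1/');
$creator = (string) $dc->creator;

print "$title ($version) by $creator: $url\n";
```

Accessing $item->creator directly would return an empty result here, because property access sees only the children that are not in a namespace.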
Creating a Simple Atom Parser Using XMLReader
This example uses XMLReader to parse an Atom feed from Planet PHP
(http://www.planet-php.org). Although the feed uses Atom 0.3, the code written here based on
Atom 1.0 is compatible with the older version of the feed. It is basic because it outputs
only the feed title, URL, and a subtitle, if one is defined. It then outputs the title, link,
and content for each entry element in the feed. The amount of code needed to perform this
simple task is much greater than with SimpleXML. XMLReader is a streaming parser, so the
entire tree is not loaded into memory. Although it is extremely fast and uses little memory,
the code is much more difficult to write because positioning must be tracked to retrieve the
correct information from the feed.
■Note Within the following example, you may notice $$curnode being used. This is not a typo
but rather the use of a variable variable. A variable variable allows access to variables
using dynamic names. For example, $a = 'myvariable'; $$a = 1; print $myvariable; results in
the output of 1. You can find detailed information concerning variable variables in the PHP
manual.

Here’s the code:
<?php
$rssURL = ' />
function outputChannelInfo($channelTitle, $channelLink, $channelDesc)
{
    print "Title: $channelTitle\n";
    print "URL: $channelLink\n";
    print "Description: $channelDesc\n";
    print " \n\n";
    $GLOBALS['printTitle'] = TRUE;
}
/* This function processes an entry element and its contents */
function processItem($rssParser)
{
    $content = "";
    $link = "";
    $title = "";
    $curnode = NULL;
    /* Keep processing the entry until the closing entry tag is encountered */
    while ($rssParser->read() && $rssParser->localName != "entry") {
        switch ($rssParser->nodeType) {
            case XMLReader::ELEMENT:
                $curnode = NULL;
                switch ($rssParser->localName) {
                    case "title":
                    case "content":
                        $curnode = $rssParser->localName;
                        break;
                    case "link":
                        $link = $rssParser->getAttribute('href');
                }
                break;
            case XMLReader::TEXT:
            case XMLReader::CDATA:
                if (! is_null($curnode)) {
                    $$curnode = $rssParser->value;
                }
        }
    }
    print " Title: $title\n";
    print " URL: $link\n";
    print " Description: $content\n\n";
}
/* Create a new XMLReader, and begin reading from the remote location */
$rssParser = new XMLReader();
$rssParser->open($rssURL);
$printTitle = FALSE;
/* Initialize every variable that may be output, in case the feed omits it */
$title = "";
$subtitle = "";
$link = "";
$description = "";
$curnode = NULL;
while ($rssParser->read()) {
    switch ($rssParser->nodeType) {
        case XMLReader::ELEMENT:
            $curnode = NULL;
            switch ($rssParser->localName) {
                case "entry":
                    if (! $printTitle) {
                        /* output the feed information before the first entry element;
                           the feed description is collected in $subtitle */
                        outputChannelInfo($title, $link, $subtitle);
                    }
                    /* If the entry is not empty, then process the contents */
                    if (! $rssParser->isEmptyElement) {
                        processItem($rssParser);
                    }
                    break;
                case "title":
                case "subtitle":
                    $curnode = $rssParser->localName;
                    break;
                case "link":
                    $link = $rssParser->getAttribute('href');
            }
            break;
        case XMLReader::TEXT:
        case XMLReader::CDATA:
            if (! is_null($curnode)) {
                $$curnode = $rssParser->value;
            }
    }
}
/* In the event the feed contained no entry elements, output the feed information */
if (! $printTitle) {
    outputChannelInfo($title, $link, $subtitle);
}
?>
XMLReader has an easy-to-understand API, so the code should be more than enough to
show how the feed is being parsed.
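The same read loop can be tried without a network connection. The following is only a minimal sketch: the inline Atom document and the variable names ($atom, $entryTitles, $inEntry, $inTitle) are assumptions made for illustration and are not part of the listing above. It tracks the current element with two flags and collects the title of each entry.

```php
<?php
/* Hypothetical inline Atom document, used so the sketch needs no remote feed */
$atom = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <entry><title>First entry</title></entry>
  <entry><title><![CDATA[Second entry]]></title></entry>
</feed>
XML;

$reader = new XMLReader();
$reader->XML($atom);

$entryTitles = array();
$inEntry = FALSE;   /* TRUE while the cursor is inside an entry element */
$inTitle = FALSE;   /* TRUE while the cursor is inside a title element */

while ($reader->read()) {
    switch ($reader->nodeType) {
        case XMLReader::ELEMENT:
            if ($reader->localName == "entry") {
                $inEntry = TRUE;
            }
            $inTitle = ($reader->localName == "title");
            break;
        case XMLReader::END_ELEMENT:
            if ($reader->localName == "entry") {
                $inEntry = FALSE;
            }
            $inTitle = FALSE;
            break;
        case XMLReader::TEXT:
        case XMLReader::CDATA:
            /* Only titles that belong to an entry are collected */
            if ($inEntry && $inTitle) {
                $entryTitles[] = $reader->value;
            }
    }
}
$reader->close();
print_r($entryTitles);
```

Because the feed-level title is read while $inEntry is FALSE, it is skipped, which mirrors how the listing separates channel information from entry information.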
Using PEAR XML_RSS
The PEAR XML_RSS class, mentioned in Chapter 13, provides an easy way to read RSS feeds
without having to even know XML. Although it cannot be used to read Atom-based feeds, it
should work with most RSS version 1.0 and 2.0 feeds. The only requirements to use this class,
other than having to install it on the machine, are that the XML_Parser package is installed
and that remote file access is enabled (unless all feeds being accessed are local files, which is
highly unlikely).
You create the RSS parser by instantiating an XML_RSS object and by passing a URI or file
handle for the RSS data to be parsed to the constructor:
$rss_parser = new XML_RSS(' />
In this instance, the RSS feed at the location passed to the constructor is set as the
data to be parsed for the instantiated XML_RSS object, $rss_parser. Once created, the parse()
method must be called to read and parse the data, which will then, barring any errors, be
available through the API:
$rss_parser->parse();
The API is quite simple, having only five methods. Each method returns an array of data,
which corresponds to a specific group of information from the RSS document. The first piece
of information that is typically requested concerns the channel. The getChannelInfo() method
returns an associative array containing information about the channel itself. You can use the
following keys to access specific channel information from the array:
• title: Title of the channel
• link: URI of the channel
• description: Description of the channel
• image: An image associated with the channel
The availability of these keys depends upon the actual data contained in the RSS feed,
so it is usually prudent to check that a certain key exists in the array prior to trying to retrieve
a value.
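A minimal sketch of such a check follows. The $channel array is a hypothetical stand-in for the value returned by getChannelInfo(), and channelValue() is a helper name invented for this example; neither comes from the XML_RSS package.

```php
<?php
/* Hypothetical array standing in for the return value of getChannelInfo();
   this feed supplied no link or image */
$channel = array(
    'title'       => 'My RSS Feed',
    'description' => 'My Example Rss Feed',
);

/* Return the value for $key, or $default when the feed did not supply it */
function channelValue(array $channel, $key, $default = '(not provided)')
{
    return array_key_exists($key, $channel) ? $channel[$key] : $default;
}

echo 'Channel: ' . channelValue($channel, 'title') . "\n";
echo ' Link: ' . channelValue($channel, 'link') . "\n";
```

Routing every lookup through one helper keeps the fallback text in a single place; calling isset() directly before each access would work just as well.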
The next area of the RSS feed typically accessed is the items contained in the feed. The
getItems() method returns a two-dimensional array containing each RSS item, which is then
accessed in a similar fashion to the channel information. Unlike a channel, no image is
associated with an item, but XML_RSS does provide access to the publication date for an item, if
available, through the pubDate key.
You can quickly access all images from the RSS document by using the getImages()
method. It returns a two-dimensional array containing information about each image. The
available keys for an image are as follows:
• title: Name of the channel
• link: URL to the site
• url: URL to the image
Text inputs are not all that common in feeds but can be accessed through the
getTextinputs() method. This method returns an array accessed through the following keys:
• title: The label of the Submit button
• description: The description of the input field
• link: The URL of the script that processes text input requests
• name: The name of the text object in the text input area
The last method, getStructure(), provides a quick way to retrieve the entire RSS document
as a structure. The return value for this method is an array, but its composition depends
upon the RSS document itself, so it has no fixed form.
Listing 14-5 shows the RSS document that is parsed by Listing 14-6, which also displays
some of its basic information on the console.
Listing 14-5. RSS File Located at />
<?xml version="1.0" encoding="UTF-8"?>
<rss version="0.91">
  <channel>
    <title>My RSS Feed</title>
    <link> />
    <description>My Example Rss Feed</description>
    <language>en</language>
    <item>
      <title>CDATA Section contained within description</title>
      <link> />
      <description><![CDATA[<p>CDATA sections contain the content for
the description element so may contain any type
of characters</p>]]></description>
    </item>
    <item>
      <title>RSS 0.91 does not have any namespaces</title>
      <link> />
      <description><![CDATA[<p>No need to deal with namespaces when
using RSS 0.91.]]></description>
    </item>
  </channel>
</rss>
This document uses an older format of RSS, version 0.91, to demonstrate the flexibility of
the XML_RSS class. The code in Listing 14-6 could easily use a different feed without you having
to change anything other than the URL passed to the XML_RSS constructor:
Listing 14-6. RSS Parser Example
<?php
/* Require XML_RSS package */
require "XML/RSS.php";
/* Create RSS Parser */
$rss_parser = new XML_RSS(" />
/* Parse RSS Feed */
$rss_parser->parse();
/* Get and Display Channel Information */
$channel = $rss_parser->getChannelInfo();
echo 'Channel: '.$channel['title']."\n";
echo ' Link: '.$channel['link']."\n";
echo ' Description: '.$channel['description']."\n";
echo " \n\n";
/* Get and Display Items */
foreach ($rss_parser->getItems() as $value) {
    echo 'Item: '.$value['title']."\n";
    echo ' Link: '.$value['link']."\n\n";
}
?>
Channel: My RSS Feed
 Link: />
 Description: My Example Rss Feed

Item: CDATA Section contained within description
 Link: />

Item: RSS 0.91 does not have any namespaces
 Link: />

Conclusion
Content syndication has become popular mainly because of the numerous blogs available on
the Web. The most popular formats for this are RSS 1.0, RSS 2.0, and Atom. These formats had
rough evolutions. With all the discontent between the RSS 1.0 and RSS 2.0 camps, a bunch of
developers decided to start things from scratch, which resulted in Atom. In this chapter, you
saw how documents in all of these formats are structured and learned how to create and parse
them using tools available in PHP. Through the recent chapters, you have gotten closer to
working with XML and the Internet, with content syndication being primarily an XML-based
Web technology.
In the next chapter, you will begin to enter the world of Web Services, starting with Web
Distributed Data Exchange (WDDX).
CHAPTER 15
Web Distributed Data Exchange (WDDX)
With the exception of content syndication, the material presented to this point has been
about general XML technologies and tools. Moving forward, the remaining chapters focus
more on Web services and data exchange through the use of XML. This chapter will cover
WDDX, which is a common XML format for exchanging data structures; specifically, the
chapter will explain what WDDX is, how to use it, and how to use the wddx extension in PHP.
Although WDDX itself is not a Web service, it can be used to create Web services.
Introducing WDDX
WDDX is an XML technology that allows data and data structures to be exchanged between
systems in a system-neutral format while keeping the data types intact. It defines an XML
structure that is used to pass the data but does not define the mechanism by which the data
is passed between systems. WDDX itself therefore cannot be considered a Web service, but it
can be used to build a Web service, in the general sense, using any form of transport you
like, including (but not limited to) HTTP, File Transfer Protocol (FTP), Simple Mail Transfer
Protocol (SMTP), and Post Office Protocol (POP). Basically, you can use any protocol that
supports transferring textual data.
Background
Allaire created WDDX in 1998 to provide distributed computing support to its ColdFusion
platform. With WDDX, variables (which include a name, data type, and value) can be serial-
ized into an XML document from one application and sent to another. The receiving
application can then unserialize the XML document and re-create these variables in their
native data types and values. Data types are not limited just to the simple number and string
types but also include more complex structures such as arrays, structures, and recordsets.
WDDX is platform and language agnostic. This allows other languages on a variety of
platforms to take advantage of this technology, thus letting an application on one platform
written in one language send data to another application on another platform using a dif-
ferent language. The receiving application is then able to unserialize the data into its own
native data types.
WDDX is not a formal standard but is built upon open standards, specifically XML 1.0,
and is freely available for both use and redistribution. WDDX development and future
evolution have moved to an open project, OpenWDDX.org. Although you
can find some information and software development kits (SDKs) at this site, you won’t find
much activity from the past few years. This does not mean the WDDX technology is dead. It is
still actively used on a number of platforms and programming languages, especially PHP.
WDDX Data Types
Thinking of the data in terms of variables and their data types in PHP, the question becomes,
how can you send the data to another system, using XML, for processing? For example, you
might have the following variable, whose value needs to be sent to another system:
$myinteger = 1;
Using XML, you might serialize the value, which simply means converting it to
a textual representation, and then send it in an XML document:
<data>1</data>
Depending upon the type of processing you need to perform, this might be sufficient.
The drawback to this is that you lose the native data types. Of course, the systems might
already have some predetermined structure and therefore map the structure accordingly,
or some sort of type hinting might be included in the document, like so:
<data type='integer'>1</data>
This does provide more flexibility, but any systems that are exchanging data have to
understand the structure and know how it should be processed. A different solution might
involve using XML Schemas to indicate data types, but, again, the system needs to know
how to process the document.
WDDX provides a solution to this problem. Through its common format, the value
would serialize to the following:
<wddxPacket version="1.0">
  <header/>
  <data>
    <number>1</number>
  </data>
</wddxPacket>
When passing a single value, this format might be acceptable, but XML is a descriptive
language. All you know from this structure is that it contains 1. You could never pass multiple
values in this format because nothing descriptive sets them apart. The majority of cases will
be serializing the actual variable rather than just the value, allowing for some descriptive
information to be passed. For instance, serializing the following variables, rather than single
values, produces something much more useful:
$mystring = 'Text Data';
$myinteger = 1;
<wddxPacket version="1.0">
  <header/>
  <data>
    <struct>
      <var name="mystring">
        <string>Text Data</string>
      </var>
      <var name="myinteger">
        <number>1</number>
      </var>
    </struct>
  </data>
</wddxPacket>
This structure clearly shows that it contains a variable named mystring, which is a string
containing the value Text Data, and a variable named myinteger, which is a number containing
the value 1. WDDX is not limited to just these simple data types; WDDX provides support for
several abstract types that are represented in a number of languages, as shown in Table 15-1.
Table 15-1. WDDX Data Types and Language Mappings
WDDX       PHP                     Java                        ECMAScript     COM Type
null       NULL                    null                        null           VT_NULL
boolean    boolean                 java.lang.Boolean           boolean        VT_BOOL
number     integer, float, double  java.lang.Double            number         VT_R8
dateTime                           java.util.Date              Date           VT_DATE
string     string                  java.lang.String            String         VT_BSTR
array      array                   java.lang.Vector            Array          VT_ARRAY | VT_VARIANT
struct     array, object           java.lang.Hashtable         Object         IWDDXStruct
recordset                          com.allaire.util.RecordSet  WddxRecordset  IWDDXRecordset
binary                             com.allaire.util.Binary     WddxBinary     V_ARRAY | UI1
Understanding the Structure of WDDX
The structure of WDDX documents has remained consistent since 1999 with the release of
WDDX 1.0. Although the structure looks simple based on the DTD ( />downloads/download.cfm), the actual complexity of the document depends upon the data
being serialized. The more complex the structure of a variable (for instance, containing multi-
dimensional arrays or classes), the more complex the composition of the WDDX document
will be. The following sections will cover the structure of WDDX documents; you can build
them manually using an extension such as DOM or XMLWriter (covered in Chapter 2), or
you can build them using the wddx extension in PHP, which requires little to no knowledge
of XML structures.
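As a rough illustration of the manual route, the following sketch builds a struct packet with XMLWriter. The function name buildWddxPacket() is hypothetical, and the sketch assumes a flat array containing only strings and numbers; it is not how the wddx extension itself is implemented.

```php
<?php
/* Hypothetical helper: serialize a flat array of name => value pairs into a
   WDDX packet holding one struct. Only strings and numbers are handled. */
function buildWddxPacket(array $vars)
{
    $writer = new XMLWriter();
    $writer->openMemory();
    $writer->startElement('wddxPacket');
    $writer->writeAttribute('version', '1.0');
    /* Empty header element: this sketch adds no comment */
    $writer->startElement('header');
    $writer->endElement();
    $writer->startElement('data');
    $writer->startElement('struct');
    foreach ($vars as $name => $value) {
        $writer->startElement('var');
        $writer->writeAttribute('name', (string) $name);
        /* Integers and floats both map to the WDDX number type */
        $type = (is_int($value) || is_float($value)) ? 'number' : 'string';
        $writer->writeElement($type, (string) $value);
        $writer->endElement(); /* var */
    }
    $writer->endElement(); /* struct */
    $writer->endElement(); /* data */
    $writer->endElement(); /* wddxPacket */
    return $writer->outputMemory();
}

echo buildWddxPacket(array('mystring' => 'Text Data', 'myinteger' => 1)), "\n";
```

XMLWriter escapes the text content it writes, so values containing characters such as < or & remain well formed without extra handling.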
WDDX Packets
Data exchange using WDDX takes place through packets. Packets are simply XML documents
passing data in WDDX format; they begin with the wddxPacket element. This document element
contains a single header element and a single data element, providing a container for notes or
comments and a container for the actual data being exchanged, respectively. It also contains a
version attribute with the version of WDDX being used. Because currently only a single version
exists, the value will always be 1.0. When adding notes and comments to the packet, you use
a comment element, which is an optional child of the header element; otherwise, the header ele-
ment is just an empty element. For example:
<!-- Packet without notes or comments -->
<wddxPacket version='1.0'>
  <header/>
  <data>
    <!-- WDDX data goes here -->
  </data>
</wddxPacket>
<!-- Packet with a comment -->
<wddxPacket version='1.0'>
  <header>
    <comment>
      This packet contains a comment
    </comment>
  </header>
  <data>
    <!-- WDDX data goes here -->
  </data>
</wddxPacket>
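On the receiving side, the packet envelope can be inspected with any XML parser. The following sketch uses SimpleXML; the packet string and the variable names ($xml, $version, $comment, $type) are made up for this example, and the code reads only the envelope rather than fully unserializing the data.

```php
<?php
/* Hypothetical packet, following the structure shown above */
$xml = '<wddxPacket version="1.0">'
     . '<header><comment>This packet contains a comment</comment></header>'
     . '<data><number>1</number></data>'
     . '</wddxPacket>';

$packet = simplexml_load_string($xml);

/* The version attribute on the document element */
$version = (string) $packet['version'];

/* The comment element is optional, so test for it before reading it */
$comment = isset($packet->header->comment)
    ? trim((string) $packet->header->comment)
    : NULL;

/* The name of the first child of the data element is the WDDX data type */
$type = NULL;
foreach ($packet->data->children() as $child) {
    $type = $child->getName();
    break;
}

echo "Version: $version\n";
echo "Comment: $comment\n";
echo "Type: $type\n";
```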
The data element contains the meat of the packet, which is the data you are exchanging.
It contains only a single element, which depends upon the data being added to the packet.
The following is a list of valid child elements, which are explained in the next sections, for the
data element:
• null
• boolean
• number
• dateTime
• string
• array
• struct
• recordset
• binary
Simple Data Type Elements

Simple data types are simple structures that cannot contain additional data types within their
contents. These elements include null, boolean, number, dateTime, and string. Each of these
elements simply contains a value of the specified type as its content or is empty when the data
type does not or cannot have a value, as is the case with NULL.
null
The null element represents a NULL value or empty string, depending upon whether the lan-
guage supports a NULL type. In the case of PHP, NULL is supported, but just keep in mind that
if exchanging data with another system using another language, it may be interpreted as an
empty string rather than NULL when unserialized. This is the element’s syntax:
<null/>
boolean
The boolean element represents a Boolean value. The value of this element can be true or false.
Case sensitivity is important here. These values must be lowercase in order to be considered
valid values according to the WDDX DTD. Even though mixed case may work in certain cases,
using all lowercase is highly recommended. This is the element’s syntax:
<boolean>false</boolean>
number
The number element is used for floating-point numbers. In PHP this covers both the integer
types and the float types. The range of numbers for the value of this element is restricted to
+/-1.7E+/-308 with the precision restricted to a maximum of 15 digits after the decimal point.
This is comparable to an 8-byte floating-point representation, which is the common maxi-
mum value for floats within PHP.
The following elements contain the serialized values for the numbers 12345, -12345,
12.345, -12.345, and 123456789012345:
<number>12345</number>
<number>-12345</number>
<number>12.345</number>
<number>-12.345</number>
<number>1.2345678901235E+014</number>
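The mapping from PHP values to the null, boolean, and number elements can be sketched as follows. The function name wddxSimpleElement() is hypothetical, and the number output relies on PHP's default conversion of numbers to strings, so the formatting of very large values may differ from the last example above.

```php
<?php
/* Hypothetical helper: map a PHP NULL, boolean, integer, or float to its
   WDDX element. Other types are rejected in this sketch. */
function wddxSimpleElement($value)
{
    if (is_null($value)) {
        return '<null/>';
    }
    if (is_bool($value)) {
        /* The WDDX DTD expects lowercase true/false */
        return '<boolean>' . ($value ? 'true' : 'false') . '</boolean>';
    }
    if (is_int($value) || is_float($value)) {
        /* Integers and floats share the single WDDX number type */
        return '<number>' . $value . '</number>';
    }
    throw new InvalidArgumentException('Unsupported type: ' . gettype($value));
}

echo wddxSimpleElement(NULL), "\n";
echo wddxSimpleElement(FALSE), "\n";
echo wddxSimpleElement(12345), "\n";
echo wddxSimpleElement(-12.345), "\n";
```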
dateTime

The dateTime element carries date and time information in ISO 8601 format. PHP does not
have a native datetime type. This does not mean you cannot use this element, though. For
instance, you can set values using the date() function with either the c format parameter
added in PHP 5 (that is, date('c')) or the DATE_ISO8601 constant added in PHP 5.1 (that is,
date(DATE_ISO8601)) to create ISO 8601–formatted dates. This is the element’s syntax:
<dateTime>2005-10-08T17:28:04-04:00</dateTime>
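A small sketch of the date('c') approach follows. The helper name wddxDateTime() is hypothetical, and the time zone is fixed to UTC here only so that the result is predictable.

```php
<?php
/* Fix the time zone so the sketch produces the same offset everywhere */
date_default_timezone_set('UTC');

/* Hypothetical helper: build a dateTime element from a Unix timestamp */
function wddxDateTime($timestamp)
{
    /* The c format character produces an ISO 8601 date */
    return '<dateTime>' . date('c', $timestamp) . '</dateTime>';
}

echo wddxDateTime(0), "\n";   /* <dateTime>1970-01-01T00:00:00+00:00</dateTime> */
```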
string
The string element contains arbitrary-length strings that must not contain embedded NULLs.
You can handle control characters, falling into the UTF-8 range 00–1F, using child char elements.
The char element is an empty element with a code attribute. The value of this attribute is the
two-digit hexadecimal code of a single character. You do not need to handle tab (09) and newline
(0A) characters by using a char element, because these characters are valid within XML text content.
When you set a string value containing any of the other control characters, the
value of the string element will therefore contain mixed content. For example, XML parsers
normalize line endings, which removes carriage returns from XML data. Line endings in a Windows
environment consist of a carriage return and a newline. You can preserve these using the char
element. The following examples illustrate how to use the string element as well as the char element:
<string>This is a string value without any control characters</string>
<string>Line 1<char code="0D"/><char code="0A"/>Line2</string>
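The encoding of control characters can be sketched in a few lines. The helper name wddxString() is hypothetical, and unlike the second example above, this sketch leaves the newline as literal text, which the rules described earlier permit.

```php
<?php
/* Hypothetical helper: wrap a PHP string in a WDDX string element */
function wddxString($value)
{
    /* Escape markup characters first so the char elements added next survive */
    $escaped = htmlspecialchars($value, ENT_NOQUOTES, 'UTF-8');
    /* Replace control characters other than tab (09) and newline (0A) */
    $encoded = preg_replace_callback(
        '/[\x01-\x08\x0B-\x1F]/',
        function ($match) {
            return sprintf('<char code="%02X"/>', ord($match[0]));
        },
        $escaped
    );
    return '<string>' . $encoded . '</string>';
}

echo wddxString("Line 1\r\nLine2"), "\n";
```

Escaping before inserting the char elements matters: doing it the other way around would turn the angle brackets of each char element into entities.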
Complex Data Type Elements
Complex data type elements include the array, struct, recordset, and binary elements.
These elements are used for more complex data structures, such as PHP arrays and classes.
Only two of these elements, array and struct, have direct mappings to native PHP types,
but the remainder can be converted into data usable by an application.
array
The array element holds data for an integer-based array. In PHP, arrays can have numeric or
string indexes. Only numeric-indexed arrays map to the array element. String-based indexed
arrays are handled with the struct element.

■Note Numerically indexed arrays in PHP are zero-based by default. Creating arrays that are not
zero-based, even though they are numerically indexed, may not result in using the array element.
For instance, the arrays array(2=>'a', 4=>'b', 6=>'c') and array(0=>'a', 2=>'b') serialized using
the wddx extension would result in a struct with named variables rather than array elements.
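The rule in the note can be approximated with a short check. This is a sketch only: the function name wddxContainerFor() is hypothetical, and it assumes that an array maps to the array element exactly when its keys run 0, 1, 2, and so on without gaps, which is an approximation of the behavior described rather than the extension's actual code.

```php
<?php
/* Hypothetical helper: name the WDDX element a PHP array would map to,
   assuming only gap-free, zero-based integer keys qualify as an array */
function wddxContainerFor(array $values)
{
    if (count($values) == 0) {
        return 'array';
    }
    $isZeroBasedList = array_keys($values) === range(0, count($values) - 1);
    return $isZeroBasedList ? 'array' : 'struct';
}

echo wddxContainerFor(array('a', 'b', 'c')), "\n";
echo wddxContainerFor(array(2 => 'a', 4 => 'b', 6 => 'c')), "\n";
echo wddxContainerFor(array(0 => 'a', 2 => 'b')), "\n";
```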
The children of an array element consist of the values held at each index. These values can
be both simple and complex data types. This means an array element can have one or more data
type child elements, which are the same child elements valid for use within the data element. The
array element also contains a length attribute. The value of this attribute is the number of values
held within the array. In PHP terms, the value of the length attribute is the value from calling the
count() function on the array being serialized.
For instance, the following PHP arrays, which are both numerically indexed, are serialized
into the same WDDX array structure: