VIRTUAL CLASSROOMS
Online Training Courses from php|architect

Zend PHP Essentials
Our introductory PHP course, Zend PHP Essentials, was developed for us and Zend Technologies by PHP expert Chris Shiflett, co-founder of the PHP Security Consortium. This 19-hour course provides a thorough introduction to PHP development, with particular care to "doing things right" by covering security, performance and the best development techniques. Rather than cramming as much theory as possible, PHP Essentials provides a thoroughly practical approach to learning PHP, thus ensuring that each student will be able to write good PHP code in a real-world setting by the end of the course.

Zend PHP Certification Training
If you want to become a Zend Certified Engineer, this course is the best preparation tool that you'll ever find! Designed by some of the same Subject Matter Experts who also helped write the exam itself, this course covers every single topic that is part of the exam. The Zend PHP Certification Training (course) provides a complete overview of the exam, and doubles as an excellent refresher course in PHP for any developer.

Zend Professional PHP Development
This is our advanced course for the professional PHP developer. This course picks up from where PHP Essentials ends and provides a thorough, in-depth analysis of advanced features found in both PHP 4 and PHP 5, including object-oriented programming and design patterns, XML development, regular expressions, encryption, e-mail manipulation, performance management and advanced databases.

For more information, visit our website or call us toll-free at (877) 630-6202 (416-630-6202 outside Canada and the U.S.).


Course: Zend PHP Essentials
Description: Covers PHP 4 and PHP 5; provides a thorough practical introduction to PHP; covers security and performance
Start dates: Every month
Duration: 7 sessions, 19 hours, 3 weeks
Tutoring: Yes
Prerequisites: None
Cost: $769.99 US ($999.99 CAD)

Course: Zend PHP Certification Training
Description: Covers every topic in the exam; provides an excellent refresher course for PHP at all levels
Start dates: Every month
Duration: 7 sessions, 19 hours, 3 weeks
Tutoring: Yes
Prerequisites: Zend PHP Essentials
Cost: $644.99 US ($838.99 CAD)

Course: Zend Professional PHP Development
Description: Covers advanced PHP 4 and PHP 5 topics; perfect for going "beyond the basics" and learning the true power of PHP
Start dates: Every month
Duration: 7 sessions, 19 hours, 3 weeks
Tutoring: Yes
Prerequisites: Zend PHP Essentials
Cost: $769.99 US ($999.99 CAD)

• All our courses are delivered entirely online using an innovative system that combines the convenience of the Internet with the unique experience of being in a real classroom.
• All sessions take place in real time, and the students can interact directly with the instructor, via voice or text messaging, as if they were in a real classroom.
• In most cases, our system requires no software installation and works with the majority of operating systems and browsers, including Windows, Mac OS and Linux, as well as Internet Explorer, Firefox and Safari.
• All courses include a generous amount of homework and in-class exercises to ensure that the students assimilate each topic thoroughly.
• Tutoring is available (via e-mail) throughout the duration of the entire course.
• Each class includes a complete set of recordings that the students can peruse at their leisure.

FEATURES
Roll Your Own Database Abstraction Module, by Jason Lustig
An Introduction to PDO: Uniform Database Access in PHP 5.x, by Ilia Alshanetsky
What are Trackbacks and Why Do They Exist?, by Chris Cornutt
End-to-End Testing with PHP and Internet Explorer, by Oz Solomon

09.2005
Download this month’s code at: http://www.phparch.com/code/

DEPARTMENTS
EDITORIAL: The Whining Stops Here
WHAT’S NEW
TIPS & TRICKS: Input Filtering, Part 3: Ensuring Input Received is Input Expected, by Ben Ramsey
TEST PATTERN: State of Confusion, by Marcus Baker
PRODUCT REVIEW: FUDforum 2.7.1, by Peter B. MacIntyre
SECURITY CORNER: PHP Security Audits, by Chris Shiflett
Exit(0);: Atomic Orange, by Marco Tabini

EDITORIAL
The Whining Stops Here

PHP has long been attacked by those who like to complain, usually about parts of the language that “don’t [quite] work properly,” or issues that have sprung up as a result of PHP’s constant evolution (but reluctance to break backwards-compatibility). How many times have you had to consult the manual to refresh your memory on the order of the needle and haystack parameters? Unfortunately, there’s no way to “fix” this particular issue without breaking every script in the history of PHP that has ever used the in_array() function.
Bogus complaints aside, one actually valid argument against PHP that I’ve seen recurring amongst the pundits is the lack of a built-in, common database access mechanism.
Sure, there are a number of database abstraction packages floating around the PHP world. Some of these are even quite mature and feature-rich. Still, none have been bundled with PHP (with the exception of PEAR::DB), nor have they received the de facto PHP Core Seal of Approval.
Enter PHP Data Objects (PDO), one of the first (if not the first) compiled, true PHP extensions that allows uniform database access for the majority of popular database platforms. Not only is it an actual PHP extension rather than the more common PHP user-land code (which generally means that the code will be fast, and PDO meets this expectation), but it will also be bundled with PHP 5.1, which should be released “Real Soon Now.” This is great news for everyone who uses PHP to communicate with a database.
One of the main PDO developers, and a name you’re likely to recognize, Ilia Alshanetsky, has written an introduction to this wonderful new extension, and we’re proud to be running it in this issue.
If you’re anxious to try out PDO, but aren’t so anxious as to immediately upgrade to PHP 5.1 (or a release candidate), the extension has been available in PECL for a while now, for anyone who is running at least PHP 5.0.
Back to the pundits: one thing to remember in this argument is that PDO doesn’t claim to be a database abstraction layer, but a common database access interface. True database abstraction is nearly impossible to maintain. Consider database-specific SQL, such as MySQL’s NOW() versus MSSQL’s GETDATE(). So, PDO aptly defers this behavior to the user, and doesn’t attempt to re-write queries (for the most part; see the part of the article that discusses prepared statements and emulation).
That’s why another approach, such as the one described in Jason Lustig’s piece (in this issue), would lend itself nicely to a common access interface such as PDO. Jason’s code could easily accommodate PDO, while allowing the user to specify RDBMS-specific SQL.
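To make the distinction concrete, here is a minimal sketch of user code supplying RDBMS-specific SQL on top of a common access interface. The helper name currentTimestampSql(), the driver names it checks, and the log table are illustrative assumptions; none of them come from PDO itself or from Jason's module. With a live connection, the driver name could be read from PDO's ATTR_DRIVER_NAME attribute.

```php
<?php
/**
 * Hypothetical helper: return the "current date and time" SQL
 * expression for a given driver name.
 */
function currentTimestampSql($driver)
{
    switch ($driver) {
        case 'mysql':
            return 'NOW()';
        case 'sqlsrv':
        case 'dblib':
            return 'GETDATE()';
        default:
            // Standard SQL spelling, used here as a fallback.
            return 'CURRENT_TIMESTAMP';
    }
}

// With a real connection the driver name would come from:
//   $driver = $pdo->getAttribute(PDO::ATTR_DRIVER_NAME);
$driver = 'mysql';
$sql = 'INSERT INTO log (message, created) VALUES (?, '
     . currentTimestampSql($driver) . ')';
echo $sql, "\n";
```

The access layer stays uniform, while the one platform-specific fragment is chosen explicitly by the caller.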
Looks like the PHP-haters will have to find something else to whine about. In the meantime, we PHP-lovers will go about our lives, eating up new features with enthusiasm.
Happy reading!
September 2005

PHP Architect

www.phparch.com
php|architect
Volume IV - Issue 9
September, 2005
Publisher
Marco Tabini
Editor-in-Chief
Sean Coates
Editorial Team
Arbi Arzoumani
Peter MacIntyre
Eddie Peloke
Graphics & Layout
Aleksandar Ilievski
Managing Editor

Emanuela Corso
News Editor
Leslie Hill

Authors
Ilia Alshanetsky, Marcus Baker,
Chris Cornutt, Jason Lustig,
Peter B. MacIntyre, Ben Ramsey,
Chris Shiflett, Oz Solomon
php|architect (ISSN 1709-7169) is published twelve times a year by
Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road,
Toronto, ON M5M 4N5, Canada.
Although all possible care has been placed in assuring the accuracy of
the contents of this magazine, including all associated source code, list-
ings and figures, the publisher assumes no responsibilities with regards
of use of the information contained herein or in all associated material.
Contact Information:
General mailbox:
Editorial:
Subscriptions:
Sales & advertising:
Technical support:
Copyright © 2003-2005 Marco Tabini &
Associates, Inc. — All Rights Reserved
WHAT’S NEW

PHP 5.1 RC 1
php.net announces the release of PHP 5.1 RC 1.
"PHP 5.1 Release Candidate 1 is now available! If all goes well, this RC will be followed by a release within a couple of weeks.
Some of the key improvements of PHP 5.1 include:
• PDO (PHP Data Objects) - A new native database abstraction layer providing performance, ease-of-use, and flexibility.
• Significantly improved language performance mainly due to the new Zend Engine II execution architecture.
• The PCRE extension has been updated to PCRE 6.2.
• Many more improvements including lots of new functionality & many bug fixes, especially in regards to SOAP, streams and SPL.
• See the bundled NEWS file for a more complete list of changes.
Everyone is encouraged to download and test this beta, although it is not yet recommended for mission-critical production use."
Get your hands on the latest release at php.net.
MySQL 4.1.14
MySQL announces the release of version 4.1.14.
Some new changes include:
• SHOW CHARACTER SET and INFORMATION_SCHEMA now properly report the Latin1 character set as cp1252.
• MySQL Cluster: A new -P option is available for use with the ndb_mgmd client. When called with this option, ndb_mgmd prints all configuration data to stdout, then exits.
• The output of perror help now displays the ndb option.
• NDB: Improved handling of the configuration variables NoOfPagesToDiskDuringRestartACC, NoOfPagesToDiskAfterRestartACC, NoOfPagesToDiskDuringRestartTUP, and NoOfPagesToDiskAfterRestartTUP should result in noticeably faster startup times for MySQL Cluster.
• Added support of WHERE clause for queries with FROM DUAL.
• Added an optimization that avoids key access with NULL keys for the ref method when used in outer joins.
• Added new query cache test for the embedded server to the test suite; there are now specific tests for the embedded and non-embedded servers.
• Release also contains several bug fixes.
Grab the latest release from mysql.com.
phpGroupWare 0.9.16.008
The phpGroupWare team is proud to announce their latest release, 0.9.16.008. What is phpGroupWare? phpGroupWare.org describes it as:
"phpGroupWare, formerly known as webdistro, is a multi-user groupware suite written in PHP. It provides about 50 web-based applications, such as Calendar, Address Book, an advanced Projects manager, To Do List, Notes, Email, Newsgroup and Headlines Reader, a File Manager and many more applications. The calendar supports repeating events and includes alarm functions. The email system supports inline graphics and file attachments. The system as a whole supports user preferences, themes, user permissions, multi-language support and user groups. It includes modules to set up and administer the working environment. The groupware suite is based on an advanced Application Programming Interface (API)."
Get more info at phpGroupWare.org.
Check out some of the hottest new releases from PEAR.
MP3_ID 1.2.0RC2
This class offers methods for reading and writing information tags (version 1) in MP3 files.
File_Find 1.0.0
File_Find, created as a replacement for its Perl counterpart, also named File_Find, is a directory searcher that handles globbing and recursive directory searching, as well as a slew of other cool features.
PHPUnit 1.3.0
PHPUnit is a regression testing framework used by developers to implement unit tests in PHP. This version is to be used with PHP 4.
Mail 1.1.8
PEAR's Mail package defines an interface for implementing mailers under the PEAR hierarchy. It also provides supporting functions that are useful to multiple mailer backends. Currently supported backends include: PHP's native mail() function, sendmail, and SMTP. This package also provides an RFC822 email address list validation utility class.
DB_DataObject_FormBuilder 0.18.1
DB_DataObject_FormBuilder will aid you in rapid application development using the DB_DataObject and HTML_QuickForm packages. For a quick, but working, prototype of your application, simply model the database, run DataObject's createTable script over it, and write a script that passes one of the resulting objects to the FormBuilder class. The FormBuilder will automatically generate a simple but working HTML_QuickForm object that you can use to test your application. It also provides a processing method that will automatically detect if an insert() or update() command has to be executed after the form has been submitted. If you have set up DataObject's links.ini file correctly, it will also automatically detect if a table field is a foreign key and will populate a select box with the linked table's entries. There are many optional parameters that you can place in your DataObjects.ini or in the properties of your derived classes, which will be used to fine-tune the form generation, gradually turning the prototypes into fully-featured forms. You can take control at any stage of the process.
Net_Curl 1.2.2
Provides an OO interface to PHP's curl extension.
php|architect Releases New Design Patterns Book
We're proud to announce the release of php|architect's Guide to PHP Design Patterns, the latest release in our Nanobook series.
You have probably heard a lot about Design Patterns, a technique that helps you design rock-solid solutions to practical problems that programmers everywhere encounter in their day-to-day work. Even though there has been a lot of buzz, however, no-one has yet come up with a comprehensive resource on design patterns for PHP developers, until today.
Author Jason E. Sweat's book php|architect's Guide to PHP Design Patterns is the first comprehensive guide to design patterns designed specifically for the PHP developer. This book includes coverage of 16 design patterns with a specific eye to their applications in PHP when building complex web applications, both in PHP 4 and PHP 5 (where appropriate, sample code for both versions of the language is provided).
Looking for a new PHP Extension? Check out some of the latest offerings from PECL.
pecl_http 0.12.0
pecl_http's features and functionality include:
• Building absolute URIs
• RFC compliant HTTP redirects
• RFC compliant HTTP date handling
• Parsing of HTTP headers and messages
• Caching by "Last-Modified" and/or ETag (with 'on the fly' option for ETag generation from buffered output)
• Sending data/files/streams with (multiple) ranges support
• Negotiating user preferred language/charset
• Convenient request functions built upon libcurl
• PHP 5 classes: HttpUtil, HttpResponse, HttpRequest, HttpRequestPool, HttpMessage
APC 3.0.8
APC is the Alternative PHP Cache. It was conceived of to provide a free, open, and robust framework for caching and optimizing PHP intermediate code.
ingres 1.1
This extension supports Computer Associates's Ingres Relational Database.
DTrace 1.0.2
Allows Solaris' dtrace to instrument PHP.
PHPEd 4.0
NuSphere announces the latest release of their php IDE: PHPEd 4.0. The announcement lists some of the main features of the new release as:
• Advanced, efficient and highly customizable EDITOR with support for object-oriented coding. Code highlighter, user-defined shortcuts, instant syntax analysis, code insight, code templates and much more.
• Sophisticated PHP DEBUGGER that can operate both locally and in the remote mode. Debugger module for the latest php version 5.0.4 is included in the package.
• PHP PROFILER. PhpED profiler shows executing time for each line, function or module of the code with tenth milliseconds precision. All the bottlenecks in the code are located quickly and efficiently.
• Project-wide CODE EXPLORER in PhpED IDE shows all php classes, methods, properties, functions and variables in every detail.
• Enhanced project management and deployment. Support for FTPS (TLS/SSL), SFTP and WebDAV/HTTPS (SSL) protocols make deployment and data transfer secure.
• Integrated MySQL, MSSQL, Oracle and UltraSQL/PostgreSQL clients. Connect to a database directly from the IDE. Browse databases, run SQL queries and work with database content without leaving the IDE.
• Integrated CVS client. Review changes in old versions of source files to track bugs while working on the same project in a team of developers.
• NuSOAP Wizard. Easily build professional web services in PHP using the NuSoap library.
• Enhanced integration. PhpED IDE can be easily integrated with 3rd party tools. The product is delivered with the embedded CSE HTML Validator LITE and PolyStyle Formatter. PhpED IDE includes a number of pre-configured tools like PHP documentor, HTML Tidy and a CVS client.
• Support for international character sets, including UTF-8. True Unicode editing is now available. PhpED IDE can be used to create web sites in different encodings and natural languages.
For all the latest info, visit NuSphere.com.
TIPS & TRICKS
Input Filtering, Part 3: Ensuring Input Received is Input Expected
by Ben Ramsey

This year has seen an increased focus on PHP security, and this is good for the language, developers, and business community. One phrase that comes to mind when discussing secure coding practices is Chris Shiflett’s mantra of “filter input, escape output.” While we know what this means in a general sense, practical examples elude us. This month’s installment of Tips & Tricks concludes the series on filtering input, providing practical examples and helpful tips to filter input using regular expressions, test for the length of data, and ensure acceptable values.

REQUIREMENTS
PHP: n/a
CODE DIRECTORY: tips

Part one of this series introduced the need to filter input and explained why all input, whether from a user or an RSS feed, should be considered tainted. I also introduced the whitelist approach as a best practice for filtering input. Part two further explained the whitelist approach, exploring the use of the ctype functions as excellent tools to implement a whitelist-based filter.
Recall from parts one and two the HTML form used for discussion. I have included a modified version of this form in Listing 1. For the purposes of the present discussion, I have added the age, color, and username fields. Listing 2 shows the processing form as seen at the end of part two.
Rounding out my three-part series on filtering input, this installment of Tips & Tricks includes discussion on using regular expressions to filter input, testing for the length of input, and ensuring the presence of acceptable values (e.g. from select, radio, or checkbox form fields, etc.).

Listing 1

    <form method="POST">
      Name:
      <input type="text" name="name" maxlength="50" /><br />
      Street:
      <input type="text" name="street" maxlength="100" /><br />
      City:
      <input type="text" name="city" maxlength="50" /><br />
      State:
      <select name="state">
        <option>Pick a state</option>
        <option>Alabama</option>
        <option>Alaska</option>
        <option>Arizona</option>
        <!-- ... -->
      </select><br />
      Postal Code:
      <input type="text" name="postal_code" maxlength="10" />
      <br />
      Phone:
      <input type="text" name="phone" maxlength="25" /><br />
      E-mail:
      <input type="text" name="email" maxlength="255" /><br />
      Age:
      <input type="text" name="age" maxlength="3" /><br />
      Color:<br />
      Blue
      <input type="checkbox" name="color[]" value="blue" />
      <br />
      Red
      <input type="checkbox" name="color[]" value="red" />
      <br />
      Green
      <input type="checkbox" name="color[]" value="green" />
      <br />
      Yellow
      <input type="checkbox" name="color[]" value="yellow" />
      <br />
      Username:
      <input type="text" name="username" maxlength="16" />
      <br />
      <input type="submit" value="Submit" />
    </form>

Listing 2

    <?php
    // Keep only whitelisted fields whose values pass the ctype check
    // for their declared type; anything else becomes an empty string.
    function filter ($input, $whitelist) {
        $clean = array();
        foreach ($input as $key => $value) {
            if (array_key_exists($key, $whitelist)) {
                switch ($whitelist[$key]) {
                    case 'string':
                        $clean[$key] =
                            (ctype_print($value)) ? $value : '';
                        break;
                    case 'int':
                        $clean[$key] =
                            (ctype_digit($value)) ? $value : '';
                        break;
                }
            }
        }
        return $clean;
    }

    $post_whitelist = array(
        'name' => 'string',
        'street' => 'string',
        'city' => 'string',
        'state' => 'string',
        'postal_code' => 'int',
        'phone' => 'string',
        'email' => 'string'
    );

    if ($_POST) {
        $clean = filter($_POST, $post_whitelist);
    }
    ?>

Filtering with Regular Expressions
In last month’s column, I discussed using PHP’s built-in character type (ctype) functions to filter input. When application design allows, the ctype functions provide a fast and easy-to-use interface to implement a whitelist approach to filtering input. However, application design doesn’t always allow this, and the ctype functions lack flexibility.
For example, ctype_alpha() only checks for alphabetic characters, while ctype_digit() checks for only numeric characters. ctype_alnum() checks for both, but then it doesn’t allow for the presence of spaces, underscores, hyphens, or any other non-alphanumeric characters (nor do the previous two mentioned functions). On the other hand, ctype_print() is too open, allowing all printable characters, and this isn’t always a desired approach.
When you know exactly what characters you want to allow, it’s best to restrict input to those characters, and only those characters. So, ctype_alnum() is good for usernames, and ctype_digit() is good for five-digit U.S. zip codes, but ctype_print() isn’t necessarily good for a first and last name, an e-mail address, or a phone number. Good application design defines what characters these fields should accept; good filtering accepts only these characters.
Enter PHP’s Perl-Compatible Regular Expression (PCRE) functions. These functions make up for their slowness (as compared to the ctype functions) with increased flexibility and power. Regular expressions can be used to match just about anything and can perform some amazing tasks.
Take, for example, the name field in Listing 1. In Listing 2, I define it as a “string” type and then the filter() function filters it using ctype_print(). The decision to use ctype_print() over ctype_alpha() should be clear: I wanted to allow users to enter a space between their first and last names. However, now users can enter all sorts of random characters, characters that should not be acceptable for a name, so I turn to a regular expression to match a name. First, I come up with the following to replace the ctype_print() function:

    $clean[$key] = (preg_match('/^[A-Z ]*$/i', $value))
        ? $value : '';

This works well for names such as “Ben Ramsey,” but suppose I want Tim O’Reilly or Tim Berners-Lee to fill out my form; I’ll need to allow more characters. Also, assuming I want to use the “string” type as a general purpose string filter, I’ll want to make the regular expression a bit more liberal, but not too liberal. I’m still in control, so I want to accept only a small range of characters, a range of characters I deem acceptable.
A better, “general purpose” regular expression for matching strings is:

    /^[-A-Z0-9\.\'"_ ]*$/i

I won’t go into the particular details of how regular expressions work. There are books and Web sites for that, but I will share a few of my preferred regular expressions for filtering standard types of information, such as e-mail addresses, phone numbers, and postal codes.
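As a quick check of the two patterns discussed so far, the sketch below runs a few sample names through each. The helper name filterWith(), the constant names, and the sample values are illustrative assumptions; the patterns themselves are the ones given above.

```php
<?php
// The first, letters-and-spaces pattern and the more liberal
// general-purpose pattern.
define('NAME_ONLY', '/^[A-Z ]*$/i');
define('STRING', '/^[-A-Z0-9\.\'"_ ]*$/i');

// Hypothetical helper: return the value if it matches, otherwise ''.
function filterWith($pattern, $value)
{
    return preg_match($pattern, $value) ? $value : '';
}

$samples = array('Ben Ramsey', "Tim O'Reilly", 'Tim Berners-Lee', '<b>Ben</b>');
foreach ($samples as $name) {
    printf(
        "%-16s name-only: %-3s general: %s\n",
        $name,
        filterWith(NAME_ONLY, $name) === '' ? 'no' : 'yes',
        filterWith(STRING, $name) === '' ? 'no' : 'yes'
    );
}
```

One detail worth knowing: in PCRE, `$` also matches just before a trailing newline, so a value such as "Ben\n" passes both patterns. Adding the `D` modifier (for example `/^[A-Z ]*$/iD`) closes that gap if it matters for your data.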
Listing 3

    <?php
    require_once 'PEAR.php';
    require_once 'Mail/RFC822.php';

    // Accept the address only if PEAR's RFC 822 parser reports no error.
    $parsed_email =
        Mail_RFC822::parseAddressList($_POST['email']);
    if (!PEAR::isError($parsed_email)) {
        $clean['email'] = $_POST['email'];
    }
    ?>

Listing 4

    <?php
    // Regular expressions for the whitelist types.
    define('STRING', '/^[-A-Z0-9\.\'"_ ]*$/i');
    define('EMAIL', '/^[^@\s]+@([-a-z0-9]+\.)+[a-z]{2,}$/i');
    define('PHONE',
        '/^[\(]?(\d{3})[\)]?[\s]?[\-]?(\d{3})[\s]?[\-]?'
        . '(\d{4})[\s]?[x]?(\d*)$/');
    define('POSTAL_US', '/^(\d{5})[\-]?(\d{4})?$/');

    // Each field maps to its expected type plus any extra constraints.
    $post_whitelist = array(
        'name' => array(
            'type' => 'string',
            'maxlength' => 50
        ),
        'street' => array(
            'type' => 'string',
            'maxlength' => 100
        ),
        'city' => array(
            'type' => 'string',
            'maxlength' => 50
        ),
        'state' => array(
            'type' => 'option',
            'options' => array(
                'Alabama',
                'Alaska',
                'Arizona'
            )
        ),
        'postal_code' => array(
            'type' => 'postal',
            'maxlength' => 10
        ),
        'phone' => array(
            'type' => 'phone',
            'maxlength' => 25
        ),
        'email' => array(
            'type' => 'email',
            'maxlength' => 255
        ),
        'age' => array(
            'type' => 'int',
            'maxlength' => 3
        ),
        'color' => array(
            'type' => 'option',
            'options' => array(
                'blue',
                'red',
                'green',
                'yellow'
            ),
            'multiselect' => TRUE
        ),
        'username' => array(
            'type' => 'username',
            'maxlength' => 16
        )
    );

    if ($_POST) {
        $clean = filter($_POST, $post_whitelist);
    }

    function filter ($input, $whitelist) {
        $clean = array();
        foreach ($input as $key => $value) {
            if (array_key_exists($key, $whitelist)) {
                $filtered = NULL;
                // Guard added in editing: only "option" fields may
                // arrive as arrays; the string checks below expect scalars.
                if (is_array($value)
                    && $whitelist[$key]['type'] != 'option') {
                    continue;
                }
                // Skip any value longer than the form allows.
                if (isset($whitelist[$key]['maxlength'])
                    && (strlen($value) >
                        $whitelist[$key]['maxlength'])) {
                    continue;
                }
                switch ($whitelist[$key]['type']) {
                    case 'string':
                        $filtered = (preg_match(STRING, $value))
                            ? $value : NULL;
                        break;
                    case 'int':
                        $filtered = (ctype_digit($value))
                            ? $value : NULL;
                        break;
                    case 'option':
                        if (is_array($value)) {
                            // Arrays are accepted only for multi-select fields.
                            if (!empty($whitelist[$key]['multiselect'])) {
                                $filtered = array();
                                foreach ($value as $option) {
                                    if (in_array($option,
                                        $whitelist[$key]['options'])) {
                                        $filtered[] = $option;
                                    }
                                }
                            }
                        } else {
                            $filtered =
                                in_array($value, $whitelist[$key]['options'])
                                ? $value : NULL;
                        }
                        break;
                    case 'username':
                        $filtered = (ctype_alnum($value))
                            ? $value : NULL;
                        break;
                    case 'email':
                        $filtered = (preg_match(EMAIL, $value))
                            ? $value : NULL;
                        break;
                    case 'phone':
                        $filtered = (preg_match(PHONE, $value))
                            ? $value : NULL;
                        break;
                    case 'postal':
                        $filtered = (preg_match(POSTAL_US, $value))
                            ? $value : NULL;
                        break;
                }
                // Only values that passed their check reach $clean.
                if (!is_null($filtered)) {
                    $clean[$key] = $filtered;
                }
            }
        }
        return $clean;
    }
    ?>

Looking back at Listing 2, I defined the postal code with the “int” type, which works well in certain circumstances when only the five-digit U.S. zip code is acceptable, but what if I want to accept a zip+4 postal code? These are typically written as “12345-1234,” and will cause ctype_digit() to return FALSE, because of the hyphen. Since the “int” type is useful in other situations (e.g. the age field), I won’t rewrite its definition. Instead, I’ll create a new type for “postal,” and create a regular expression to accept either a five-digit zip code or a zip+4 code (with or without the hyphen):
/^(\d{5})[\-]?(\d{4})?$/
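To see what this pattern accepts in practice, here is a small sketch that runs several candidate values through it. The helper name filterPostal() and the sample values are assumptions made for illustration.

```php
<?php
define('POSTAL_US', '/^(\d{5})[\-]?(\d{4})?$/');

// Hypothetical helper: return the postal code if acceptable, else NULL.
function filterPostal($value)
{
    return preg_match(POSTAL_US, $value) ? $value : null;
}

$candidates = array('12345', '12345-1234', '123451234', '1234', '12345-12');
foreach ($candidates as $zip) {
    echo $zip, ' => ', is_null(filterPostal($zip)) ? 'rejected' : 'accepted', "\n";
}
```

Because the hyphen and the four-digit group are each optional on their own, the pattern as written also accepts a five-digit code followed by a bare hyphen ("12345-"). If that is not wanted, the hyphen can be moved inside the optional group.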
Likewise, the e-mail and phone number fields in Listing 2 are of the “string” type, but I know that there are acceptable patterns I want to match for both of these. Plus, my existing “string” regular expression doesn’t allow the “@” symbol, or parentheses. Thus, I create an “email” type and define its regular expression as:

    /^[^@\s]+@([-a-z0-9]+\.)+[a-z]{2,}$/i

I also create a “phone” type, giving it the following expression:

    /^[\(]?(\d{3})[\)]?[\s]?[\-]?(\d{3})[\s]?[\-]?(\d{4})[\s]?[x]?(\d*)$/

These two regular expressions will match most e-mail addresses or U.S. phone numbers. In fact, the expression used for phone numbers here can extract all the parts of a standard phone number to the matches parameter of preg_match(), if desired.
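A minimal sketch of that extraction follows. The helper name parsePhone() and the array keys it returns are illustrative assumptions; the pattern is the “phone” expression given above.

```php
<?php
define('PHONE',
    '/^[\(]?(\d{3})[\)]?[\s]?[\-]?(\d{3})[\s]?[\-]?'
    . '(\d{4})[\s]?[x]?(\d*)$/');

// Hypothetical helper: split a U.S. phone number into its parts,
// or return NULL if the input does not match the pattern.
function parsePhone($value)
{
    if (!preg_match(PHONE, $value, $matches)) {
        return null;
    }
    // $matches[0] is the whole match; 1-4 are the captured groups.
    return array(
        'area'      => $matches[1],
        'exchange'  => $matches[2],
        'line'      => $matches[3],
        'extension' => $matches[4],
    );
}

print_r(parsePhone('(404) 555-1234 x56'));
```

When no extension is present, the last group still matches an empty string, so the 'extension' key is always set.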
It should be noted, however, that the e-mail address regular expression used above will not match some addresses considered compliant according to RFC 822 guidelines. Take the following input, for example: “John Doe (home address) <jdoe@example.com>”. According to RFC 822 guidelines, this full string is acceptable, but the e-mail regular expression will reject it. Also, addresses that contain no TLD, such as jdoe@example, are valid RFC 822 addresses.
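The sketch below runs those two cases, plus an ordinary address, through the “email” expression. The helper name isSimpleEmail() is an assumption made for illustration.

```php
<?php
define('EMAIL', '/^[^@\s]+@([-a-z0-9]+\.)+[a-z]{2,}$/i');

// Hypothetical helper: TRUE if the value matches the simple pattern.
function isSimpleEmail($value)
{
    return preg_match(EMAIL, $value) === 1;
}

$candidates = array(
    'jdoe@example.com',                            // ordinary address
    'jdoe@example',                                // no TLD
    'John Doe (home address) <jdoe@example.com>',  // full RFC 822 form
);
foreach ($candidates as $address) {
    echo $address, ' => ', isSimpleEmail($address) ? 'accepted' : 'rejected', "\n";
}
```

Only the first address is accepted, which is often the desired behavior for a sign-up form even though the other two are valid under RFC 822.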
If RFC 822 compliance is necessary, then Listing 3 provides an alternative e-mail address filtering method using the PEAR::Mail package. This can also be accomplished using imap_rfc822_parse_adrlist() if PHP is compiled --with-imap. If portability is a concern, however, I suggest using the PEAR::Mail package.
Testing Input Length
In part one of this series, I mentioned that, while the maxlength attribute of the HTML input tag controls how much data a user may enter when properly using a form located on the host site, it does not restrict the amount of data that a user may post when using a form located on another Web site, or when posting by some other means (see part one for more information).
Likewise, client-side validation with JavaScript may provide good measure for practicing “defense in depth,” as well as a potentially better user experience, but it will not restrict the actual data that can be sent to the form processing script from somewhere else (e.g. another form on another Web site). Thus, it is necessary to perform all input filtering, or validation, on the server side, in addition to any client-side validation.
Regardless of whether you filter input at the client, you must always filter input at the server.
I have seen many sites that provide a maxlength attribute in their input tags but fail to test the length of the field from the server side. This leaves the processing script open to receive all lengths of data, which can lead to database constraint violation errors and, potentially, more dangerous issues.
Checking the length of input, however, is simple, and, coupled with the maxlength attribute, it is easy to determine that a user is abusing the form if input received is longer than the expected length.
Listing 4 is a finalized version of
the
ffiilltteerr(())
function that incorpo-
rates all that I have discussed thus
far. Notice how I have expanded
$$ppoosstt__wwhhiitteelliisstt
to include more
information about each form field.
Now, I associate an array with each
field that defines the type of input
to check against, in addition to sev-
eral other details. One of those
details is
mmaaxxlleennggtthh
, which I check
in the
ffiilltteerr(())
function with:

if
(isset($whitelist[$key][‘maxlengt
h’])
&& (strlen($value) >
$whitelist[$key][‘maxlength’])) {
continue;
}
Here, I use the
ccoonnttiinnuuee
statement
to skip to the next item in the [fore-
ach] loop, essentially excluding this
value from the
$$cclleeaann
array if it con-
tains more data than expected.
Since I have maxlength defined for these fields in my form, I am
confident that no user using my form is able to enter more data than
expected. If the input contains values that are longer than their
respective maxlength, then I can assume that the user is abusing my
form in some way, and I can safely exclude the input from the $clean
array.
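As a minimal sketch of the length check described above (this is not the article's Listing 4), the following shows how oversized values can be dropped before they reach $clean. The field names, the limits, and the function name filter_length() are assumptions made up for illustration.

```php
<?php
// Hypothetical whitelist: field name => rules. The field names and
// 'maxlength' values are invented for this sketch.
$post_whitelist = array(
    'username' => array('maxlength' => 8),
    'comment'  => array('maxlength' => 20),
);

// Copy only whitelisted string fields whose length is within bounds.
function filter_length(array $input, array $whitelist)
{
    $clean = array();
    foreach ($input as $key => $value) {
        // Unknown fields and non-string values are never copied.
        if (!isset($whitelist[$key]) || !is_string($value)) {
            continue;
        }
        // Longer than the form allows: the form was bypassed, so skip it.
        if (isset($whitelist[$key]['maxlength'])
            && strlen($value) > $whitelist[$key]['maxlength']) {
            continue;
        }
        $clean[$key] = $value;
    }
    return $clean;
}

// Simulated $_POST data; 'username' exceeds its limit of 8 characters.
$post = array(
    'username' => 'averyveryverylongname',
    'comment'  => 'hello',
    'extra'    => 'not in the whitelist',
);
$clean = filter_length($post, $post_whitelist);
```

Only the comment field survives here; the oversized and the unlisted fields are simply absent from $clean, which the calling script can detect with isset().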
Ensuring Acceptable Values
In much the same way that maxlength cannot be relied upon to stop
would-be attackers from sending unlimited amounts of data to form
processing scripts, the values displayed in HTML select, radio button,
and checkbox lists are not the only values that can be posted. Thus,
it is necessary to filter the values of these fields and ensure that
the input received is input expected. Again, this is not a hard
practice to implement, but it does require more code.
Take another look at Listing 4. In $post_whitelist, I’ve also added
the “option” type, and for each item specified as type “option,” I
have listed the expected options in the “options” array. For
flexibility, I’ve also added the “multiselect” parameter that is
defined on fields in which more than one item may be selected (i.e.
checkboxes or menu lists).
September 2005, PHP Architect, www.phparch.com
TIPS & TRICKS: Input Filtering, Part 3
In the filter() function, under the “option” case of the switch
statement, I check whether the input received is an array. If it is,
then I further check to ensure that I’m allowing the user to select
more than one item. If not, then the input received shouldn’t be an
array, and I discard the data and move on. If it is a multi-select
field, then I check to ensure that every item in the array matches
those defined in the “options” parameter for the field.
If it’s not an array, then I simply check to ensure that it matches
one of the “options.” If it does, then I keep it; if not, then it is
discarded.
If a value is not acceptable (that is, it doesn’t conform to
expectations), then I don’t keep it. It doesn’t get added to the
$clean array.
Notice how all values in Listing 4 are now set to NULL if they don’t
conform to expectations. Then, I check whether the value is null. If
it is, I don’t save it to $clean. In part two of this series, recall
that I did save it to the $clean array, with an empty value. I no
longer do that, and instead choose to completely discard the reference
to the field. Now, the worst thing that can happen when working with
user input is that a field doesn’t exist, but that’s easy to check and
report.
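The option handling described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the article's Listing 4: the field names, the option values, and the function name filter_options() are hypothetical, and only the “option” type is handled.

```php
<?php
// Hypothetical whitelist; field names and option values are invented.
$post_whitelist = array(
    'color'    => array('type' => 'option',
                        'options' => array('red', 'green', 'blue')),
    'size'     => array('type' => 'option',
                        'options' => array('small', 'large')),
    'toppings' => array('type' => 'option', 'multiselect' => true,
                        'options' => array('ham', 'olives', 'onions')),
);

function filter_options(array $input, array $whitelist)
{
    $clean = array();
    foreach ($whitelist as $key => $rules) {
        if (!isset($input[$key])) {
            continue;
        }
        $value  = $input[$key];
        $result = null; // stays NULL unless the value passes every check

        if (is_array($value)) {
            // Arrays are acceptable only for multi-select fields, and only
            // if every submitted item is one of the expected options.
            if (!empty($rules['multiselect'])) {
                $ok = count($value) > 0;
                foreach ($value as $item) {
                    if (!in_array($item, $rules['options'], true)) {
                        $ok = false;
                        break;
                    }
                }
                if ($ok) {
                    $result = array_values($value);
                }
            }
        } elseif (in_array($value, $rules['options'], true)) {
            $result = $value;
        }

        // NULL values are discarded entirely rather than stored as empty.
        if ($result !== null) {
            $clean[$key] = $result;
        }
    }
    return $clean;
}

// Simulated $_POST data: 'size' arrives as an array although the field
// is single-select, so it is dropped.
$post = array(
    'color'    => 'green',
    'size'     => array('small'),
    'toppings' => array('ham', 'onions'),
);
$clean = filter_options($post, $post_whitelist);
```

Because rejected fields never appear in $clean, the processing script only has to test whether a key exists.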
Moving Right Along

Over the past three issues, I have given an in-depth look at input
filtering in PHP. This discussion has covered such topics as “why to
filter”, “using ctype functions and regular expressions”, and
“validating the length and acceptable values of received input.” I
have discussed this all the while promoting a whitelist approach to
ensure that input received is input expected.
For future installments of Tips &
Tricks, I would like to know what
tips and tricks you are using. Please
send your tip and/or trick to
, and, if I use
it, you’ll receive a free digital (PDF)
subscription to php|architect.
Until next time, happy coding!
About the Author
To Discuss this article:
Ben Ramsey is a Technology Manager for Hands On Network in Atlanta, Georgia. He is an author, Principal member of
the PHP Security Consortium, and Zend Certified Engineer. Ben lives just north of Atlanta with his wife Liz and dog Ashley.
You may contact him at ramsey@php.net or read his blog at http://benramsey.com/.
Roll Your Own
Database Abstraction Module
by Jason Lustig
How does Adobe keep Photoshop working on
both Windows and Mac OS, or Microsoft keep
Office portable? Often, people take the route of
maintaining separate codebases for different platforms.
Mega-corporations have the resources to pull it off, but
a smaller firm or even a lone coder probably couldn’t
do it particularly efficiently. It’s one of the reasons why
the Mozilla project decided to go with XUL as their
frontend instead of maintaining different sets of code
for Windows, Mac OS, Linux, and whatever else hap-
pened to come around. Prior to XUL, if the Netscape
developers had to make a change, they had to update
every codebase individually, and it was a major hassle.
Web applications give us a little more freedom. HTML
is fantastically portable—as long as there is a decent
web browser for your desktop platform of choice, you
will be able to access and work with your web applications. It has
been argued that Microsoft has neglected
Internet Explorer for exactly this reason: innovating too
much in the browser space would kill the desktop,
which is their big cash cow. Web applications are even
more portable, on the server side, because most of the
languages—be it PHP, Perl, or even some ASP, through
emulators such as Chili ASP—can run on almost any
web server in any operating system (within reason).
The bottleneck to ultimate portability turns out to be
the data itself. If you can abstract your data, then you
will never be tied down again! This is, in a way, the
“holy grail” of web application development: how can
you make the database code portable but at the same
time readable and hand-tuned for every database that
you are writing for? How can I take advantage of
low-level locking in Oracle when my MySQL code doesn’t
even have transactions? How can I abstract my data to
an extent that it can be used by all kinds of databases?
It’s possible; I’ve done it. I was able to port my
200,000-line web application from MySQL to
PostgreSQL in about two hours on a lazy Sunday afternoon.
REQUIREMENTS
PHP 4
CODE DIRECTORY abstraction
You may already use database abstraction in your applications,
perhaps through one of the available database
abstraction layers, such as PEAR::DB, or PDO (see the
PDO article in this issue), but what about various idiosyncrasies
in the actual SQL? Perhaps you’ve never even considered
this problem. This article will help you tame the data
abstraction beast.
What Is Data Abstraction, and Why Bother?
Data abstraction is when your application does not
have to worry about where its data comes from. In the
world of web applications, because most people use
databases to handle their data, usually this translates
into database portability—the ability for your applica-
tion to interact with all different kinds of databases.
Is portability worth it? In a perfect world, our com-
puters would work properly most of the time, and we
wouldn’t have any reason to switch operating systems,
web browsers, etc. Why would we want to keep our
code portable? Are there such big advantages that
make the hassle worth the pain and suffering? (It is extra
work to keep code portable, since you need
to test across multiple systems.)
It depends on your goals. There are definite advan-
tages to portable code, such as opening up the market
for your application to a larger group of people, avoid-
ing lock-in, and more, but there are also disadvantages.
Grow your market. If you are selling or otherwise mak-
ing a computer program that other people will use,
whether it’s a web-based application or not, it would
be great to be able to offer it to more people. That’s the
reason why the big guns (like Adobe) keep their soft-
ware running on both Mac and Windows platforms. If
they picked only one system to support, it would really
cut costs, but would also alienate a large group of potential
customers. The more databases your web application
supports, the larger the number of people who
might be interested in purchasing or downloading it.
Portability keeps your code more readable and more
maintainable. If you use a modular approach to data
abstraction, as I do, or even if you use an abstracted set
of functions like query() instead of mysql_query(), then
your code will be easier to read and maintain down the
line. This is something nobody will argue against!
Avoid lock-in. In my “real” job, I work in retail doing
market research. Our current setup uses Microsoft SQL
Server to mirror all of the data in our point of sale system,
so that we can mess around and not have to
worry about corrupting our actual, production data.
This setup generally works pretty well. One day, we
were having some trouble with the server, and my
boss, who is pretty smart, and has some technical back-
ground, said “Would it help if we switched to Oracle?”
The answer to this is, really, “I have no idea if it would
help if we switched to Oracle.” We don’t need any of
its fancy table locking features or anything like that,
and SQL Server has been pretty good to us so far. It
would be a lot of work to import our databases. The
reason that I said “no, Oracle would not help us much
at all,” is because we have many scripts, programs,
nightly jobs, and other little bits of code written for SQL
Server and fine-tuned to cater to its nuances and bugs.
To port all of this code would take weeks and would not
save us nearly enough time, in the long run.
We’ve been locked in. Now, there is nothing particular-
ly wrong with this, because we are doing everything
internally and really there is no reason why we would
want to switch to another database. But if we had to,
we would really be in a bind. Unfortunately, it is much
more difficult to keep code like this portable than it is
to keep web applications portable.
“Too Portable” or “Too Abstract?”
It depends on what you are trying to accomplish. Just
like many other processes that improve performance,
grow your market, and make things easier to do, the
concepts of data abstraction and portability function
under the law of diminishing returns. What this means in
plain English (as opposed to the economic mumbo-jumbo
that it really consists of) is that as you make your
code more and more portable, each additional step of
abstraction buys you less benefit than the one before.
So, when you first abstract your database, swapping
mysql_query() for PEAR::DB, or another similar abstraction
layer like ADODB, the relative increase in productivity
will be greater than when you then go and
abstract your queries, or do something crazy like begin
to use an XML-based definition of your database structure.
The key is to find a balance. You need to determine
the point at which you are kidding yourself, where
additional abstraction will cease to help you out. When
you’ve reached this critical juncture, you should stop
fussing around and get to programming your real
application.
This isn’t to say that abstracting your data isn’t worth
it. But depending on the application you are writing
and the job it is supposed to do, sometimes abstraction
isn’t worth the time that you would spend to maintain
it. A simple formula might be: the time spent maintaining
data abstraction, divided by (the time it takes to
write the application in the first place multiplied by the
amount of time you plan to spend maintaining the
application). If the result of this formula is greater than
one, it probably isn’t worth it to abstract the data any
more than you have to in order to get it working properly
without killing yourself with PHP’s arcane function
names. Otherwise, it makes sense to abstract the data
to your heart’s delight.
Luckily for us, making a good abstraction layer is easy
enough, and the learning curve is gentle enough, that
the time spent maintaining data abstraction is usually
low. Most of the time, it really is worth it.
Let’s Get to Business, Shall We?
We’ll begin with some simple pseudo-code to connect
to a database, pull some data, and then display it. We
are going to eventually abstract away different portions
of the database code, in varying amounts, to try to find
the “sweet spot” where we’ve balanced portability with
the time we’ll spend on further abstraction. See Listing
1.
Easy enough, right? We are already using some sort
of basic database abstraction. We don’t call
mysql_query() or pg_pconnect() anywhere in
here; we have abstracted away the PHP functions so
that we can rewrite the class to connect to an additional
database. In fact, you might notice that the function
names are similar to the ever-popular PEAR::DB abstraction
class. It’s my personal favorite, because it is simple
and takes care of most of the hard work for you, and at
the same time it does not force you to abstract your
database calls any further than you want.
Additionally, the SQL code itself is pretty portable;
the language is standardized to a certain extent, and you
can assume that basic SELECT, UPDATE, and DELETE statements,
JOINs, etc. will work on most modern databases.
The tricky part with writing code like this is that you
need to test it on all supported databases. When you
make a change to the SQL, it might break some databases
and not others. It increases the amount of QA
work that needs to be done, while minimizing the
amount of actual code you have to write.
One important thing to remember is: databases are
already a form of abstraction. They abstract away the
idea of data sitting on the disk in zeroes and ones, and
let you think about it as tables and rows. SQL stands for
Structured Query Language, and the theory is that it
should be standard across all the different database
engines. So, if you wrote your code with standard SQL,
it should be portable… right?
The problems that arise are often related to database-specific
extensions to the SQL standard. “Why use
extensions?” you might ask. “Just stick to the standard;
databases should be standards-compliant, just
like web browsers!” The reality is that databases just aren’t
always so standards-compliant. MySQL (before version
5) didn’t support stored procedures, and has a number
of different table types, many of which handle locking
and transactions differently. Oracle and Microsoft SQL
Server each have a hundred handy little features that they
have added to the standard which, in theory, make it
easier to write applications. These features often serve
as convenience functions, and allow you to do things
like grab only one row, quickly.

<?php
// Connect through the abstraction class (pseudo-code: no DSN is shown)
$db = DB::connect();

$sql = 'select * from mytable where something = 5';
$result = $db->query($sql);
while ($result->fetchInto($row))
{
    var_dump($row);
}
?>
Listing 1

<?php
$db = DB::connect();
$sql = 'select ';
// SQL Server limits rows with TOP, placed right after SELECT
if ($database == 'mssql')
{
    $sql .= ' top 10 ';
}
$sql .= ' * from mytable where something = 5 ';
// MySQL limits rows with LIMIT, placed at the end of the query
if ($database == 'mysql')
{
    $sql .= ' limit 10';
}

$result = $db->query($sql);
while ($result->fetchInto($row))
{
    var_dump($row);
}
?>
Listing 2

Why not take advantage of these extra features? If
you don’t, you are just hurting your application by
making it slower. But, if you have the SQL itself hard-coded
into your main code, there is no way to really do
this, right? Wrong. If you were so inclined, you could
dynamically generate the SQL query, based on the
database platform you are using. Say, for example, that
you want to select the top ten rows from a table, and
want to support both MySQL and Microsoft SQL
Server. These two databases use different syntaxes to
limit the number of rows returned from a query. SQL
Server uses “top xx” and MySQL uses “limit xx”.
However, the code in Listing 1 could be adapted to
support both databases, as in Listing 2.
Easy enough, right? In theory, yes, but it makes your
code impossibly hard to maintain, especially if, one day,
you decide that you also want to support Oracle,
PostgreSQL, Firebird, and maybe also DBase or SQLite.
Additionally, it is less secure because it opens the door
to making some big mistakes, since you are always generating
the SQL statement on-the-fly. What if you mess
up and put something inside the “$sql =” portion that
shouldn’t be there? This opens the $sql variable up to
a possible injection attack. It is a hacker’s paradise.
Roll Your Own Language
Let’s say you just want to have one set of database code
to rule them all. You could go the route of abstracting
the idea of your query, and then write a class that will
generate the SQL as necessary. You could add the abil-
ity to set optimization flags, if the database can handle
it. Depending on the database, your SQL generator will
either pay attention to or pretend these flags didn’t
exist. Let’s look at the same code again but with a
made-up SQL generator (Listing 3).
In this latest attempt to abstract our database query,
we have gone to great lengths to tell our code what we
are trying to do. Essentially, the db_query::generate()
function can figure out which database we want to talk
to, and create an optimized query at will. You don’t
even need to use a function-based abstraction; you can
create XML files that describe your queries, or even
your entire database structure, making it human-readable,
as well.
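To make the idea concrete, here is one way such a made-up generator might look. It is a sketch only: the article does not show the db_query class, so everything below is an assumption that merely matches the calls used in Listing 3, with the database type passed to a hypothetical constructor argument. It uses PHP 5 class syntax and handles only the row limit for MySQL and SQL Server.

```php
<?php
// Hypothetical query builder matching the calls made in Listing 3.
class db_query
{
    public $fields = array();
    public $table  = '';
    private $conditions = array();
    private $limit = null;
    private $dbtype;

    // $dbtype is an assumption of this sketch: 'mysql' or 'mssql'.
    public function __construct($dbtype = 'mysql')
    {
        $this->dbtype = $dbtype;
    }

    // Add a WHERE condition. The caller must pass trusted SQL here;
    // this sketch performs no escaping.
    public function check($condition)
    {
        $this->conditions[] = $condition;
    }

    // Record the desired row limit as an optimization flag.
    public function limit_rows($count)
    {
        $this->limit = (int) $count;
    }

    // Build the SQL string for the configured database type.
    public function generate()
    {
        $sql = 'select ';
        if ($this->limit !== null && $this->dbtype == 'mssql') {
            $sql .= 'top ' . $this->limit . ' ';
        }
        $sql .= implode(', ', $this->fields) . ' from ' . $this->table;
        if ($this->conditions) {
            $sql .= ' where ' . implode(' and ', $this->conditions);
        }
        if ($this->limit !== null && $this->dbtype == 'mysql') {
            $sql .= ' limit ' . $this->limit;
        }
        return $sql;
    }
}

// Build the query from Listing 3 for a given database type.
function build_example_query($dbtype)
{
    $query = new db_query($dbtype);
    $query->fields[] = '*';
    $query->table = 'mytable';
    $query->check('something = 5');
    $query->limit_rows(10);
    return $query->generate();
}

$mysql_sql = build_example_query('mysql');
$mssql_sql = build_example_query('mssql');
```

For any other database type, this sketch silently ignores the limit flag, which mirrors the “pretend these flags didn’t exist” behavior described above. Even at this size the class is a small query language of its own that has to be tested and audited.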
But is it the best way? Personally, I don’t think so. You
end up just writing your own query language that
needs to be debugged and audited for security. You’d
have to maintain another complex abstraction layer in
your application, when you could instead be writing
another simple layer. All too often, people over-abstract
their applications and focus too much on the framework
and not on making features that make their application
cool and fun to use.
Figure 1: An Ideal Web Application
Figures 2 and 3 (captions): A Unified Binary contains executable code for both the x86 and PowerPC (PPC) architectures in one file. The directory structure of our application makes it easy to make new data abstraction modules and to differentiate between them.
Unified Binaries, Unified Abstraction
Over the past few years, while working on various
applications, I have developed a method which, in my
opinion, is the best that I have seen. It’s a system that
allows you to create new modules—or port your code
to new databases—quickly and easily. In fact, this
method makes it so that your “core” application never
actually touches the database or whatever sort of data
store you’re using. This opens up all sorts of interesting
possibilities, because your application doesn’t care
which database stores the data. It really doesn’t even
need to be a database. You could write a module that
stores your data in flat files, or even shared memory, if
you wanted to. We’ll cover that, later.
Most people will agree with the idea that modular
applications are a good thing. This “ideal complicated
program” is made up of modules that interact with
each other, through interfaces, abstracting away the
ugliness of any code that may reside underneath. Other
portions of your program can assume (within reason)
that this abstraction layer simply works and you will
never have to think so much about what’s actually hap-
pening. You will only need to work with the data that
is returned from the modules. Using a simple, standard
way of returning errors from the database modules, fail-
ures can be easily handled, as well.
For those who don’t understand all that architecture
mumbo-jumbo, let’s draw a picture of this “ideal” web
application (Figure 1).
The idea, here, is that each level of the application
takes care of one aspect of displaying a page, whether
it is generating the HTML code that is sent to the
browser (templates), the “business logic,” sanitizing
users’ input data, or anything else that a typical web
application must do. This makes “n-tiered applica-
tions”, where n represents the number of tiers (also
known as levels, modules, or by many other names).
The most popular and well-known of these n-tiered
models in the web application space is the three-tiered
application, also known as Model-View-Controller. In
the Model-View-Controller architecture, you have three
levels: a database (“model”), the business logic
(“controller”) and HTML generator (“view”). There are many
troller”) and HTML generator (“view”). There are many
benefits to this model, especially in terms of scalability.
You can put each of these three tiers on different
groups of servers, and if you need to be able to support
more users, just throw more hardware at your applica-
tion.
Having multiple-tiered applications is great for other
reasons as well, including cleaner code, and better documentation.
You can also pull out modules and replace
them with others that have the same API, but work in a
totally different manner, underneath. This is where it
gets interesting with regard to our data abstraction
problem.

<?php
$db = DB::connect();

// Describe the query instead of writing SQL by hand
$query = new db_query();
$query->fields[] = '*';
$query->table = 'mytable';
$query->check('something = 5');
$query->limit_rows(10);

$result = $db->query($query->generate());

while ($result->fetchInto($row))
{
    var_dump($row);
}
?>
Listing 3

<?php
$db = DB::connect();
// The core script only calls a module function; it contains no SQL
$result = sql__select_top10_from_mytable();

if (DB::isError($result))
{
    throw_some_error();
}
else
{
    while ($result->fetchInto($row))
    {
        var_dump($row);
    }
}
?>
Listing 4

<?php
function require_sql($file)
{
    // this is a variable to hold 'mysql'
    // or otherwise the directory name associated
    // with our database.
    global $dbtype;
    require_once 'sql/'.$dbtype.'/'.$file;
}
?>
Listing 5

<?php
function sql__select_top10_from_mytable()
{
    // our database connectivity class
    global $db;
    $sql = 'select * from mytable limit 10;';
    return $db->query($sql);
}
?>
Listing 6

<?php
function sql__select_top10_from_mytable($test)
{
    // our database connectivity class
    global $db;
    // quote the value before placing it in the query
    $sql = 'select * from mytable where test = '
         . $db->quote($test) . ' limit 10;';
    return $db->query($sql);
}
?>
Listing 7
Another great advantage of web applications is
that, for the most part, especially if you use a language
like PHP, they are dynamically compiled and run. This
means that you can interchange files at will, and users
will not be able to tell the difference. We’ll take advantage
of this to create multiple database modules that
work along a set interface to our business logic. In this
way, to create a new module (in other words, to support
a new database system), all we need to do is port one
database module’s code to the new database, and
voila! Your application has been ported to a new database.
Usually, when people talk about database modules, the
discussion is for the most part constrained to database
connectivity, as we looked at before. Connectivity defines how
your application talks to the database and sends queries
and other messages back and forth. We can use a connectivity
layer with this system if we want to, but ultimately, because of
our modular system, it does not matter where we are
getting the data nor where we are storing it, so long as
the module conforms to the set interface that our “business
logic” knows how to deal with.
This modular tier that I propose won’t live on a differ-
ent server (though you could put the files on one),
because it is actually a part of the “controller” level of
the application. It is surprisingly similar to Apple’s
“Unified Binary” approach to compiling programs for
both the PowerPC and x86 CPUs, which is why I like to
refer to it as “Unified Abstraction.”
What is a “Unified Binary,” how does it work, and
what does this have to do with data abstraction? Well,
Apple has a peculiar situation coming up where it will
be supporting two CPU families: IBM’s PowerPC, which
is what Macintosh computers have used for the past
ten years or so, and Intel’s Pentium (x86) family of
processors. This presents a major problem for software
developers. What are you going to do about developing
for both processors, since a binary compiled for
PowerPC won’t run on x86, and vice-versa? It’s a very
similar problem to our issue with databases. The solution
that Apple came up with is this: within the “application”
that you create there are really two binary programs.
One is compiled for the PowerPC processor, and the
other for x86. When you open up a Unified Binary, Mac
OS will just use whichever binary is compatible with
your computer, and it can use resources (internationalization
files, images, etc.) normally, because they are
just normal files.
We will use a similar method. When you create the
PHP script for your web application, you will write it as
a core file that doesn’t really care about which database
you’ve chosen; this is similar to the resource files. It figures
out which database we are working with, and then
calls the appropriate database module, which is analogous
to the different binaries for PowerPC and x86.
The key is that your application somehow needs to
know which database it is using. Somewhere, you are
storing the database connection credentials, such as
the username, password, hostname, and so on. In this
same place, you can keep information about whether
you are connecting to MySQL, Oracle, Microsoft SQL
Server, or even a flat-file database. We can add an extra
line, $dbtype = 'mysql', to our examples.
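One possible shape for that configuration is sketched below. The array keys, the placeholder credentials, the list of supported modules, and the helper resolve_dbtype() are all assumptions made for illustration; the article itself only specifies the $dbtype variable.

```php
<?php
// Connection settings kept in one place. All values are placeholders
// invented for this sketch.
$dbconfig = array(
    'dbtype'   => 'mysql',     // selects the module directory under sql/
    'hostname' => 'localhost',
    'username' => 'appuser',
    'password' => 'secret',
    'database' => 'appdb',
);

// Directory names for which a database module exists (hypothetical list).
$supported = array('mysql', 'pgsql', 'mssql', 'oracle');

// Return the configured database type only if a module exists for it,
// so that a bad setting can never be turned into an arbitrary include path.
function resolve_dbtype(array $config, array $supported)
{
    if (!isset($config['dbtype'])
        || !in_array($config['dbtype'], $supported, true)) {
        return null;
    }
    return $config['dbtype'];
}

$dbtype = resolve_dbtype($dbconfig, $supported);
```

Checking the value against a fixed list is a small precaution, since $dbtype later becomes part of a file path.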
The main scripts that live in your web-root, which is
what people see when they come to your site, won’t
contain any actual database calls. Rather, they call functions
that return database records. Alternatively, you can
use an object-oriented approach, though I prefer simple
functions because they lead to less code, which is,
in turn, less complicated.
In Listing 4, the sql__select_top10_from_mytable()
function ideally returns a PEAR::DB_Result object. We
use the DB::isError() function to check that our query
worked properly.
You may have noticed that Listing 4 won’t run
because it’s missing the declaration of the
sql__select_top10_from_mytable() function. This is
because the listing contains only the core script, which
hasn’t yet called the database module. Let’s create a
script called dbtest.php, and place it in our application’s
root directory. We could create a subdirectory
called sql, and within that, another directory, mysql,
pgsql or whatever we want. This nested directory
would contain our database module. In that way, we
can create new modules simply by creating another
directory beneath sql, such as mssql or oracle.
How does the PHP file know where to find the SQL
file associated with it? Listing 5 shows a function that
performs this task.
Within our main script, we can just add the line
require_sql('dbtest.php'); and our file will be included.
Within sql/$dbtype/dbtest.php is the function
shown in Listing 6.
Of course, you could name the function anything you
like, but I usually choose to preface the names with sql__
(and then, usually, with something dealing with the
name and location of the associated core script, in a
larger application), because this way, functions won’t
have the same name, thus avoiding naming conflicts.
You could also pass it variables, as shown in Listing 7.
In this way, you could have a similar function that
uses the alternate method of limiting rows, within the
mssql module. Your application would be none the
wiser; it would just proceed as normal, and wouldn’t
care at all if you used limit or top within the query. You
can optimize each query for each specific database, as
much as you like, and you’ll not have to worry about
the fact that all those obscure keywords might fail on
another database system.
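As an illustration, the SQL Server counterpart of Listing 7 might look like the sketch below, which would live in sql/mssql/dbtest.php. The fake_db class is a stand-in invented so the example runs without a database server; it is not PEAR::DB and only mimics the two calls that Listing 7 makes, with deliberately simplistic quoting.

```php
<?php
// Stand-in for the connectivity class so the sketch runs without a server.
// It is NOT PEAR::DB; it only mimics the two calls used by Listing 7.
class fake_db
{
    public $last_sql = '';

    // Simplistic quoting for illustration only (doubles single quotes).
    public function quote($value)
    {
        return "'" . str_replace("'", "''", $value) . "'";
    }

    // Record the SQL instead of sending it to a database.
    public function query($sql)
    {
        $this->last_sql = $sql;
        return $sql;
    }
}

// Same name and signature as the MySQL version in Listing 7, but the
// row limit is expressed with TOP instead of LIMIT.
function sql__select_top10_from_mytable($test)
{
    // our database connectivity class
    global $db;
    $sql = 'select top 10 * from mytable where test = '
         . $db->quote($test);
    return $db->query($sql);
}

$GLOBALS['db'] = new fake_db();
$result = sql__select_top10_from_mytable("it's");
```

The core script calls the function in exactly the same way for either module; only the file that require_sql() loads differs.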
To port your application to a new database, all you’ll
need to do is take the database module whose SQL syntax
is closest to the one you are porting to, duplicate its
directory within sql/, and rename it appropriately (e.g.
to oracle or dbase). Then, just go in and change
the SQL calls so that they take advantage of the new
database’s features, and voila! You now support a new
database type!
The reason why you would start by copying the mod-
ule for the database whose syntax is most similar to
your new database is to require the fewest possible
changes to the SQL within the module.
Maintenance
“Alright,” you might be saying, “this sounds interest-
ing, but also it seems like a lot of work to maintain!” It
really isn’t that much work, once you’re used to it.
When you want to change a database query, you just
need to change the SQL in each of the database mod-
ules.
It’s also easy to add new functions, because, if you
first write only a simple function that doesn’t use
advanced and non-portable features of your favorite
database, you can just copy the function over to your
other modules and then go and make each one take
advantage of your table hints or other bells and whis-
tles.
Of course, if you keep your database modules well-
documented, maintenance is easier, as well.
Conclusion
Data abstraction can be done in many ways. The
method that I have suggested is one that I personally
prefer because of the ease of porting applications to
new databases and data storage methods. It isn’t for
everyone or for every project; just as some quick-and-dirty
applications don’t necessarily separate content
from logic using templates, sometimes abstraction
isn’t worth it. Database abstraction at the SQL level is
one of those things that doesn’t usually hurt too much,
and helps out in the long run.
About the Author
To Discuss this article:
Jason Lustig is a student at Brandeis University in Boston. He is a freelance programmer
who dabbles in database and application design, and works part-time doing market
research and data mining.
Nearly everyone who has ever employed PHP has
used it to talk to a database system. In most
cases, a database provides a highly flexible and
capable information storage and retrieval engine, ideal
for data gathering and analysis.
It is really no wonder that database use is so prevalent
in the developer community. As with most popular
tools, there are often multiple approaches to the same
problem, and database systems are no different from
the norm. There are literally dozens of different data-
base systems all competing for your attention as the
best way of dealing with information. PHP—the lan-
guage of choice for millions of developers—unsurpris-
ingly supports the majority of these database engines,
to ensure that no one is left out or feels neglected.
In most instances, the development of a database
interface in PHP is not the result of a master plan or
even a consequence of a well-planned specification,
designed to provide the ideal method of database com-
munication. More often than not, it is the result of a sit-
uation where a developer needed to have PHP connect
to a previously unfamiliar database.
By taking some existing code, possibly from other
database extensions, and adjusting it to work for their
particular database, the developer creates an initial
interface. Usually, other users and developers then
come up with tweaks, additions and refinements to the
initial code base that eventually evolves into a full data-
base extension.
While this approach has proven to be quite effective
over the years, it does pose one particular problem: the
PHP APIs for talking with most databases are relatively
similar, but are far from identical.
This problem is most apparent in the functions
defined by the various database extensions. Each has its
own distinct set of functions. For example, the MySQL
extension uses mysql_fetch_row() to retrieve a record
as an array of elements, while PostgreSQL makes use of
pg_fetch_row(). Aside from the differences in the
names, the parameter order of the functions also
varies from one extension to the next.
Using MySQL and PostgreSQL as examples, the former’s
query execution function does not require a database
connection resource, and if one is provided, it
takes the last position in the function call’s parameter
list. In PostgreSQL, and several other extensions, a
database resource is required, and must be supplied as
the first parameter to the function. Documenting the
differences between the various extensions would probably
require an entire book, and is far beyond the scope
of this article.

FEATURE
An Introduction to PDO
by Ilia Alshanetsky

A common complaint of the anti-PHP “expert” is the lack
of a bundled, uniform database access component. With
the advent of an improved object model in PHP 5.0, a few
of PHP’s core developers decided that the time had come
to fill this hole with PHP Data Objects (PDO). The package
itself has been in PECL for quite a while now, but
with the upcoming release of PHP 5.1, PDO will be bundled
in the main PHP distribution. What does it do? How does
it work? One of PDO’s main developers explains.

REQUIREMENTS
PHP: 5.0+
OS: N/A
Other Software: PDO and an appropriate driver: />
Code Directory: n/a

The API difference is of little concern to developers
who only communicate with a particular database; it
does, however, present a serious problem to those who
need to support multiple database back-ends.
This has led to the creation of numerous database
abstraction libraries. These range from simple ones that
merely choose the right native function for the job, and
possibly juggle the arguments, to complex and ultimately
slow beasts that not only abstract the interface,
but also try to handle various incompatibilities between
the database systems themselves.
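The simpler end of that range can be sketched in a few lines. This is only an illustration, not code from any real library: fake_mysql_query() and fake_pgsql_query() are hypothetical stand-ins that mimic the two argument orders described earlier, and db_query() is an invented wrapper name.

```php
<?php
// Hypothetical stand-ins for two native extensions. They are NOT real
// PHP functions; they only imitate the differing argument orders.
function fake_mysql_query($sql, $link = null)
{
    // MySQL style: the connection is optional and comes last.
    return 'mysql ran: ' . $sql;
}

function fake_pgsql_query($connection, $sql)
{
    // PostgreSQL style: the connection is required and comes first.
    return 'pgsql ran: ' . $sql;
}

// A thin wrapper (invented for this sketch): pick the native function
// for the driver and put the arguments in the order it expects.
function db_query($driver, $connection, $sql)
{
    switch ($driver) {
        case 'mysql':
            return fake_mysql_query($sql, $connection);
        case 'pgsql':
            return fake_pgsql_query($connection, $sql);
        default:
            throw new InvalidArgumentException("Unsupported driver: $driver");
    }
}

echo db_query('mysql', null, 'SELECT 1'), "\n";   // mysql ran: SELECT 1
echo db_query('pgsql', 'conn', 'SELECT 1'), "\n"; // pgsql ran: SELECT 1
```

Every new back-end means another branch in the wrapper, which is part of why such libraries tend to grow.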
This has been somewhat of a pet peeve for the PHP
core development community. This is why, during
LinuxTag 2003, we decided to address the issue by
creating PHP Data Objects (PDO).
PDO was designed to use the latest PHP 5 object
orientation support to provide a common API for all
database systems with which PHP can communicate. A
common database communication interface eliminates
the need for the majority of database wrappers. Because
it was written in C, rather than PHP, the interface is
very fast, and adds minimal overhead, if any, over the
native interface. Furthermore, PDO aimed to identify
common operations that are performed on a database,
and provide easy and convenient means of applying
(or emulating if necessary) them, for all supported
databases. These abilities include:
• execution of INSERT/UPDATE/DELETE queries
• retrieval of data from a database in various
forms:
  • as an array
  • as an object (new or pre-existing)
  • into bound variables
  • as a string
• retrieval of all rows as a multi-dimensional
array
• prepared statement querying
• the use of transactions
• auto-commit support
• the ability to normalize the case of table
columns
Thus, the “only” thing the code author needs to worry
about is the differences in the databases themselves,
which is simple enough as long as you use standard
SQL.
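To make the list above more concrete, here is a minimal sketch that exercises most of these abilities against an in-memory SQLite database. It assumes the pdo_sqlite driver is loaded, and it uses the constant and method names found in the PDO API of later PHP releases, which may differ from the pre-release version available at the time of writing. The table and column names are invented for the example.

```php
<?php
// Assumption: pdo_sqlite is available. Nothing is written to disk.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// Normalize column names to lower case, however the schema spells them.
$db->setAttribute(PDO::ATTR_CASE, PDO::CASE_LOWER);

$db->exec('CREATE TABLE users (ID INTEGER PRIMARY KEY, Name TEXT)');

// Prepared statement executed inside a transaction.
$db->beginTransaction();
$insert = $db->prepare('INSERT INTO users (Name) VALUES (:name)');
foreach (array('alice', 'bob') as $name) {
    $insert->execute(array(':name' => $name));
}
$db->commit();

// exec() returns the number of rows affected by an UPDATE or DELETE.
$changed = $db->exec("UPDATE users SET Name = 'carol' WHERE Name = 'bob'");

$sql = 'SELECT ID, Name FROM users ORDER BY ID';

// First row as an associative array.
$row = $db->query($sql)->fetch(PDO::FETCH_ASSOC);

// First row as a new object with one property per column.
$obj = $db->query($sql)->fetch(PDO::FETCH_OBJ);

// First row into bound variables.
$stmt = $db->query($sql);
$stmt->bindColumn(1, $id);
$stmt->bindColumn(2, $boundName);
$stmt->fetch(PDO::FETCH_BOUND);

// A single column of the first row as a string.
$single = $db->query('SELECT Name FROM users ORDER BY ID')->fetchColumn();

// All rows as a multi-dimensional array.
$all = $db->query($sql)->fetchAll(PDO::FETCH_ASSOC);

echo $row['name'], ' ', $obj->name, ' ', $boundName, ' ', $single, "\n";
echo count($all), " rows, $changed updated\n";
```

Because the case attribute is set to lower case, the columns declared as ID and Name come back as id and name in every fetch style.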
Current State of Affairs
At this time, PDO has reached the majority of the ini-
tially-set goals and offers nearly all of the initially-
planned features.
It also includes support for all major databases with
which PHP can communicate:
• MySQL 3 and 4 (pdo_mysql)
• PostgreSQL (pdo_pgsql)
• SQLite 2 and 3 (pdo_sqlite; in fact, PDO is
the only way to connect PHP to SQLite 3)
• Oracle (pdo_oci)
• Firebird (pdo_firebird)
• MSSQL and FreeTDS (pdo_dblib)
• ODBC (pdo_odbc)
All of the drivers (with the possible exception of the
Firebird driver) are quite stable and are regularly tested
for both bugs and functionality. At the present time,
some are already being used on production systems.
Nonetheless, PDO and its drivers are a relatively new
addition to PHP, and as such, may contain some yet-to-
be-discovered bugs, so consider yourself warned.
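With a common API, switching drivers mostly comes down to the connection string. The sketch below assumes the PDO core extension is loaded and uses the API of later PHP releases; the host names and database names in the connection strings are placeholders, and only the SQLite connection is actually opened.

```php
<?php
// Placeholder connection strings; adjust host and dbname for a real setup.
$dsns = array(
    'mysql'  => 'mysql:host=localhost;dbname=test',
    'pgsql'  => 'pgsql:host=localhost;dbname=test',
    'sqlite' => 'sqlite::memory:',
);

// Ask PDO which drivers are compiled in or loaded.
$available = PDO::getAvailableDrivers();

foreach ($dsns as $driver => $dsn) {
    $status = in_array($driver, $available) ? 'available' : 'not loaded';
    echo "$driver ($dsn): $status\n";
}

// Only SQLite is opened here, because it needs no server.
if (in_array('sqlite', $available)) {
    $db = new PDO($dsns['sqlite']);
    echo 'Connected using driver: ',
        $db->getAttribute(PDO::ATTR_DRIVER_NAME), "\n";
}
```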
Installing PDO
How do you get PDO? In PHP 5.1 (which should be out
shortly), the PDO core extension and its SQLite driver
are enabled by default.
Other drivers are part of the standard distribution;
however, they need to be explicitly enabled via a
configuration switch. These usually take the form
--with-pdo-[database_type]=[interface_lib_path]. For
example, to enable MySQL support you would use
--with-pdo-mysql=/usr/local/mysql, assuming that the
MySQL client library can be found in /usr/local/mysql.
For PHP 5.0.x users, the situation is a bit different.
Because PDO is not part of the standard distribution, it
must instead be downloaded and installed from the
PECL repository, or downloaded in binary form (for
Win32 users) from http://snaps.php.net/. For installation
from PECL, you simply need to execute the following
commands:
pear install pdo
pear install pdo_[driver]
#(example: pear install pdo_sqlite)
Upon execution, these commands will download the
latest stable PDO release, and then automatically com-
pile it.
The next step involves loading the compiled PDO
modules into PHP via php.ini:
; *NIX users
extension=pdo.so
extension=pdo_sqlite.so
; Win32 users
extension=php_pdo.dll
extension=php_pdo_sqlite.dll
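Once php.ini has been updated, a quick script can confirm that the modules were picked up. This is a minimal sketch; pdo_sqlite is used only as an example driver name, so substitute whichever driver you installed.

```php
<?php
// Report whether the PDO core and one driver extension are loaded.
// extension_loaded() works even when PDO itself is missing.
$extensions = array('pdo', 'pdo_sqlite');
$report = array();

foreach ($extensions as $extension) {
    $report[$extension] = extension_loaded($extension);
    printf("%-12s %s\n", $extension, $report[$extension] ? 'loaded' : 'missing');
}
```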
In PHP 5.0.x, there is no automatic handling of module
dependencies; therefore, it is absolutely imperative that
the PDO extension, itself, be loaded prior to any of its
drivers. Failure to follow the correct loading sequence
will usually result in a prompt crash, due to the driver