JUNE 2004
VOLUME III - ISSUE 6
www.phparch.com
The Magazine For PHP Professionals
> Artificial Intelligence made easy with PHP and FANN <
NEURAL NETWORKS
Spell checking with PHP
Automatic language detection
Make your script determine the
language of written text
Portable and stable GUI applications with PHP and XUL
Efficient Oracle Programming
Incredible-looking forms with PHP,
PDF and FDF
Plus:
Tips & Tricks, Product Reviews, Security Corner and much more...
This copy is registered to:
YOU
Sign up before July 20th and save up to $100!
Christian Mayaud - Getting Your OSS Business Funded
Rasmus Lerdorf - Best Practices for PHP Developers
Jim Elliott - Open Source: The View from IBM
Daniel Kushner - Attacking the PHP Market
Andrei Zmievski - Andrei’s Regex Clinic
Wez Furlong - Introducing PDO
Regina Mullen - OSS in Legal Technology
Derick Rethans - Multilingual Development with PHP
George Schlossnagle - PHP Design Patterns
... and many, many more!
php|works
Toronto, Sept. 22-24, 2004
Three days of pure PHP
Jump Right To It.
TABLE OF CONTENTS
php|architect
Departments
5 Editorial
6 What’s New!
42 Product Review: Maguma Workbench 2.0.4.1
by Peter B. MacIntyre
62 Tips & Tricks
by John W. Holmes
65 exit(0);
PHP And the What-if Machine
by Andi Gutmans and Marco Tabini
Features
10 Low-impact Programming with PHP and Oracle
by John Neil
19 Spell checking with PHP
by Ilia Alshanetsky
26 PHP | FDF
by Richard Lynch
36 Cyber-PHP
Neural Networks with FANN and PHP
by Evan Nemerson
47 PHP and XUL
by Jonathan Protzenko
55 PHP File Management: An Introduction
by Peter B. MacIntyre
3 ● June 2004 ● PHP Architect ● www.phparch.com
*By signing this order form, you agree that we will charge your account in Canadian
dollars for the “CAD” amounts indicated above. Because of fluctuations in the
exchange rates, the actual amount charged in your currency on your credit card
statement may vary slightly.
Choose a Subscription type:
Canada/USA $97.99 CAD ($69.99 US*)
International Air $139.99 CAD ($99.99 US*)
Combo edition add-on $14.00 CAD ($10.00 US)
(print + PDF edition)
Your charge will appear under the name "Marco Tabini & Associates, Inc." Please
allow up to 4 to 6 weeks for your subscription to be established and your first issue
to be mailed to you.
*US Pricing is approximate and for illustration purposes only.
php|architect Subscription Dept.
P.O. Box 54526
1771 Avenue Road
Toronto, ON M5M 4N5
Canada
Name: ____________________________________________
Address: _________________________________________
City: _____________________________________________
State/Province: ____________________________________
ZIP/Postal Code: ___________________________________
Country: ___________________________________________
Payment type:
VISA Mastercard American Express
Credit Card Number:________________________________
Expiration Date: _____________________________________
E-mail address: ______________________________________
Phone Number: ____________________________________
Visit: for
more information or to subscribe online.
Signature: Date:
To subscribe via snail mail - please detach/copy this form, fill it
out and mail to the address above or fax to +1-416-630-5057
php|architect
The Magazine For PHP Professionals
You’ll never know what we’ll come up with next
Subscribe to the print edition and get a copy of Lumen's LightBulb, a $99 value, absolutely FREE†!
In collaboration with:
Upgrade to the
Print edition
and save!
For existing
subscribers
Login to your account
for more details.
EXCLUSIVE!
† Lightbulb Lumination offer is valid until 12/31/2004 on the purchase of a 12-month print subscription.
EDITORIAL RANTS
php|architect
Volume III - Issue 6
June, 2004
Publisher
Marco Tabini
Editorial Team
Arbi Arzoumani
Peter MacIntyre
Eddie Peloke
Graphics & Layout
Arbi Arzoumani
Managing Editor
Emanuela Corso
Director of Marketing
J. Scott Johnson
Account Executive
Shelley Johnston
Authors
Ilia Alshanetsky, Andi Gutmans, Richard Lynch, John
Neil, Evan Nemerson, Peter B. MacIntyre, Jonathan
Protzenko
php|architect (ISSN 1709-7169) is published twelve times a year by Marco Tabini &
Associates, Inc., P.O. Box 54526, 1771 Avenue Road, Toronto, ON M5M 4N5, Canada.
Although all possible care has been taken to ensure the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibility with regard to the use of the information contained herein or in any associated material.
Contact Information:
General mailbox:
Editorial:
Subscriptions:
Sales & advertising:
Technical support:
Copyright © 2003-2004 Marco Tabini & Associates, Inc.
— All Rights Reserved
This month’s issue marks the first time, at least to my knowledge, that a topic such as artificial intelligence has been discussed in a PHP publication.
AI is one of those topics most people talk about with-
out really understanding its capabilities—and this has
resulted in a lot of confusion out there. If you’re wor-
ried that your server will become sentient and try to
take over the world (or, worse, spend all your money),
you can rest assured that that will not be the case (at
least until you run Internet Explorer—that’ll do the
trick).
However, a technology like neural networks can
come in very handy for a website developer. Ad-hoc
predictive solutions for tasks such as fraud prevention
and customer enticement already exist out there and are readily available to everyone, often at a steep price. As a PHP developer, however, you are both luckier and less fortunate at the same time. The FANN library extension that is now available through PECL provides you with the facilities needed to create, train and execute a generic neural network, which means that you can not only build applications similar to, or even better than, the ones available commercially, but that you can also build new and exciting ones.
On the other hand, designing and training a neural
network is a bit of a “black art” that requires a lot of
trial and error, so that you’ll have to be very creative
with it. It’s excellent news for us that Evan Nemerson,
who is the author and maintainer of the extension (as
well as one of the original authors of the library) has
agreed to tackle the problem of creating a neural net
from a practical perspective—building a simple script
that is capable of automatically determining the language in which a string of text is written. Even with surprisingly little training (and, even better, very little actual PHP code), the network can reach remarkably high levels of accuracy.
Still, I’m fairly convinced that, once more people start
appreciating the abilities of the FANN library in finer
detail, we’ll see applications built on top of it become
available for everyone to use and tweak—and before
you know it, your computer will shut down at the
sound of “I’ll be back”.
Neural networks are not all we’re doing this month,
of course. Ilia Alshanetsky covers spell checking—a
topic that can be helpful to everyone who runs a web-
site. As it turns out (but not surprisingly), PHP has
excellent facilities that support spell-checking operations. We also have a great article on optimizing Oracle-based websites; now that Oracle is showing more and more interest in open-source projects, this is likely to come in handy to more and more developers, even if they are not in the enterprise arena. If you ever
wanted to create beautiful-looking forms but dreaded
the prospect of converting them to PDF, you’ll likely be
Continued on page 9...
NEW STUFF
What’s New!
eZ publish 3.4
eZ.no announces the release of eZ publish 3.4.
“eZ publish is an open source content management
system and development framework. As a content
management system (CMS), its most notable feature
is its revolutionary, fully customizable, and extendable
content model. This is also what makes it suitable as
a platform for general Web development. Its stand-
alone libraries can be used for cross-platform, data-
base independent PHP projects. eZ publish is also well
suited for news publishing, e-commerce (B2B and
B2C), portals, and corporate Web sites, intranets, and
extranets. eZ publish is dual licensed between GPL
and the eZ publish professional license.”
View more information at eZ.no.
phpPgAdmin 3.4 Released
Postgresql.com announces the release of phpPgAdmin 3.4.
“phpPgAdmin is a web-based administration tool for
all 7.x versions of PostgreSQL.”
Some new features include:
• Add CACHE and CYCLE parameters in
sequence creation
• View, add, edit and delete comments on
tables, views, schemas, aggregates, conver-
sions, operators, functions, types, opclasses,
sequences and columns (Dan Boren &
ChrisKL)
• Add config file option for turning off the dis-
play of comments
• Allow creating array columns in tables
• Allow adding array columns to tables
• many more…
Get all the info at Postgresql.com.
Zend Technologies and Apollo
Interactive Unite
Thursday, May 27th 2004 13:48:55 GMT.
“Apollo Interactive®, America's leading Interactive
Agency, and Zend Technologies, the PHP company,
today announced a partnership to promote excellence
in open source development. Through the alliance,
the companies will share their varied technology perspectives to improve the functionality of the PHP language (which was developed by the founders of Zend) and refine PHP implementation for large, high-volume enterprise Web sites.
The combination of Apollo’s significant PHP site devel-
opment experience and Zend’s technological expertise
will help drive the continued evolution of PHP, an
open source Web scripting language that is gaining
momentum as the most popular language to power
dynamic Web sites. The alliance will further the development of PHP’s infrastructure and enable Zend to
establish best practices for its implementation in large
enterprise environments.”
For more information visit:
www.zend.com
PHP5 Coding Contest
Want to put your PHP5 skills to the test? Zend has
announced its PHP5 coding contest, of which
php|architect is also a sponsor.
“We’ve got lots of Prizes to give out just for entering,
as well as the Grand Prizes: a top-of-the-range Dell
laptop for a developer working by himself or an Apple
iPod Mini for each member of your team! Your appli-
cation will be rated both by your peers and by the
panel of Judges we’ve assembled from among the
most known and well-respected names in the PHP
community.”
Get all the contest information from Zend.com.
Looking for a new PHP extension? Check out some of the latest offerings from PECL.
BLENC 1.0alpha
BLENC is an extension that hooks into the Zend Engine, allowing for transparent encryption and
execution of PHP scripts using the blowfish algorithm. It is not designed for complete security
(it is still possible to disassemble the script into op codes using a package such as XDebug); however, it does keep people out of your code and makes reverse engineering difficult.
odbtp 1.1.1
This extension provides a set of ODBTP, Open Database Transport Protocol, client functions.
ODBTP allows any platform to remotely access Win32-based databases. Linux and UNIX clients
can use this extension to access Win32 databases like MS SQL Server, MS Access and Visual
FoxPro.
Fileinfo 0.2
This extension allows retrieval of information about the vast majority of files. This information may include dimensions, quality, length, and so on.
Additionally, it can be used to retrieve the MIME type of a particular file and, for text files, the proper language encoding.
PDO_ODBC 0.1.1
This extension provides an ODBC v3 driver for PDO. It supports unixODBC and IBM DB2
libraries, and will support more in future releases. Windows binary available from:
PDO_MYSQL 0.1
This extension provides a MySQL 3.x/4.0 driver for PDO.
PHP 4.3.7 Released
PHP.net announced the release of PHP 4.3.7. The PHP Development Team is proud to announce the release of PHP 4.3.7. This is a maintenance release that, in addition to several non-critical bug fixes, addresses an input validation vulnerability in the escapeshellcmd() and escapeshellarg() functions on the Windows platform. Users of PHP on Windows are encouraged to upgrade to this release as soon as possible.
For more information visit:
http://qa.php.net/
php|architect
New at php|a: PayPal support and single prints
Monday, June 7th 2004 13:22:00 GMT
You asked for it! php|architect's purchasing system
now accepts PayPal as a valid payment method! You
can use your PayPal account safely and securely to pay
for all your php|a purchases.
Also, effective immediately you can now purchase
individual print issues that will be delivered directly to
your doorstep. Expect more past issues to become
available as we update our inventory and introduce
new shipping methods to get the magazines out to you
faster!
PHP 5 Release Candidate 3 Released!
Tuesday, June 8th 2004 12:48:09 GMT. PHP.net
announces the third release candidate of PHP5!
The third (and hopefully final) Release Candidate of
PHP 5 is now available!
This mostly bug fix release improves PHP 5's stability
and irons out some of the remaining issues before PHP
5 can be deemed release quality. Everyone is now
encouraged to start playing with it!
There are few changes since Release Candidate 2, which can be found here.
For more information visit:
www.php.net
Check out some of the hottest new releases from PEAR.
Config 1.10.1
The Config package provides methods for configuration manipulation.
• Creates configurations from scratch
• Parses and outputs different formats (XML, PHP, INI,
Apache...)
• Edits existing configurations
• Converts configurations to other formats
• Allows manipulation of sections, comments, directives...
• Parses configurations into a tree structure
• Provides XPath-like access to directives
XML_HTMLSax3 3.0.0RC1
XML_HTMLSax3 is a SAX-based XML parser for badly formed XML documents, such as HTML.
The original code base was developed by Alexander Zhukov and published at />ects/phpshelve/. Alexander kindly gave permission to modify the code and license for inclusion in PEAR.
PEAR::XML_HTMLSax3 provides an API very similar to the native PHP XML extension, allowing handlers written for one to be easily adapted to the other. The key difference is that HTMLSax will not break on badly formed XML, allowing it to be used for parsing HTML documents.
Otherwise, HTMLSax supports all the handlers available from Expat except the namespace and external entity handlers. It also provides methods for handling XML escapes as well as JSP/ASP opening and closing tags.
DB_DataObject 1.6.1
DataObject performs 2 tasks:
1. It builds SQL statements based on the object’s vars and the builder methods.
2. It acts as a datastore for a table row.
The core class is designed to be extended for each of your tables so that you put the data logic inside the data
classes.
php|a
PHP_Beautifier 0.0.6.1
This program reformats and beauti-
fies PHP source code files automati-
cally. The program is Open Source
and distributed under the terms of the PHP License. It is written in PHP 5
and has a command line tool.
interested in this month’s article on FDF forms—PHP
provides an excellent interface to Adobe’s FDF library
that lets you combine a PDF form with POST data and
create a print-quality document with little or no effort.
Elsewhere, we cover XUL, the interface development
language that must have been born out of one Mozilla
developer asking the others “and now, how do we do
it in Windows?” XUL is great for building a GUI appli-
cation that can be ported across several operating sys-
tems and that requires almost no programming—and,
certainly, no code in C, Visual Basic et similia.
Finally, this issue also marks the debut of our very
own Peter MacIntyre in the role of reviewer and author.
Peter is a great help in the editorial process—and, as it
turns out, an incredibly gifted reviewer. Now, if I could
only interest him in some Italian food…
Editorial: Continued from page 5
php|a
LightBulb 5.02
Lumen Software announces the release of LightBulb
5.02.
“LightBulb is a complete, browser-based, WYSIWYG
PHP development suite which includes a PHP appli-
cation generator, a code editor (with context and
classes prompting and highlighting), a complete
middleware/framework environment (Lumenation),
a GUI application interface, record locking, HIPAA
application compliance, user application logging,
transaction logging, current user monitoring, a
library of PHP classes and data access security, DB
compatibility, a report builder, a query builder, an
SQL builder, a source code manager, an application
management system, and a virtual desktop system
metaphor, and many other features.”
For more information or to download, visit ezsdk.com.
FEATURE
Low-impact Programming with PHP and Oracle
by John Neil
PHP and Oracle are an excellent combination for creating powerful and scalable web solutions. This article sheds light on those performance issues that might arise only under high-traffic situations, so that you can stop them before they ever start cropping up.
REQUIREMENTS
PHP: 4.3.x
OS: Any
Other software: Oracle Database Server
Code Directory: oracle
Having your PHP-driven website use an Oracle database means you can tap into some of the most powerful tools available for web development. The speed and reliability of PHP code, coupled with the power and flexibility of an Oracle database, gives developers a tough combination to beat. However, unless you are careful, performance and reliability issues can creep up on you and become increasingly difficult to resolve.
This article is an attempt to outline a few steps you can take to create PHP and Oracle code that works well together and minimizes resource utilization on both sides. In other words, we’re presenting the tools and techniques to create “low impact programming.”
All of the examples shown are drawn from real-life pain and suffering. Our environment consists of a number of web servers running Linux, Apache version 2, and usually the latest version of PHP. Our Oracle database servers typically run on Sun hardware under Solaris (although we do have several test Oracle servers running on Linux).
We’ll start by describing the various ways in which you can minimize the impact of coding decisions on both the web and database servers. We will follow this up with some very specific examples of common tasks and the approaches you can take with them. We’ll conclude with tools and techniques for monitoring your progress at making low impact and robust web sites with PHP and Oracle.
What is “Low Impact Programming”?
Low impact programming, more an attitude than a skill, means always trying to reduce the impact that the code we write and the configurations we make have on the servers on which they run. It means always searching for ways to reduce resource utilization, regardless of how often a piece of code will be run.
In practical terms, low impact programming means that your systems will scale without your having to keep throwing hardware at the problem. Moreover, by concentrating on reducing the resources needed to accomplish the same tasks, you end up making your systems much more robust and fault tolerant.
We are often lulled into a false sense of security when we use tools like PHP and Oracle, because of their inherent speed and reliability. The danger is that when performance issues do arise, they escalate quickly.
Keeping Resource Utilization Light
A number of factors influence what resources are required when connecting PHP and Oracle in a website. The easiest ways to reduce resource usage include using persistent database connections, avoiding unnecessary Oracle database commits, taking advantage of the Oracle SQL cache, and minimizing data transfer.
Persistent Connections
Many arguments exist both for and against using per-
sistent connections. The biggest single advantage to
using persistent connections, with Oracle as your data-
base in particular, lies in the fact that creating database
connections takes a lot of time and CPU power on both
ends of the connection. In our testing, we found that
opening a new Oracle database con-
nection added between 0.25 and
0.5 seconds per page. Using persist-
ent connections saved us this time
on nearly every page.
If you choose to use persistent connections with an Oracle database, you must be aware of several pitfalls. Chief among them is that resources opened by one script on a persistent connection remain open for subsequent scripts that use the same connection. This has a cumulative effect on your database server and can be a hidden cause of system slowdowns and phantom error messages.
As an example, each statement handle created opens
a cursor. The open cursor in the Oracle database repre-
sents a memory handle within the database, and all
Oracle databases have a finite number of these handles
available. While the persistent connections will eventu-
ally close (closing all the open cursors on that connec-
tion) when the Apache child process ends, on a busy
site you can see the open cursor counts rise until you
start getting error messages. If you’re going to use per-
sistent connections, then, you must specifically close all
statement handles you create.
Another problem area that can sneak up on you is in
the use of Oracle session parameters. One of the most
common uses of Oracle session parameters is to set the
default date format. If you want a particular date for-
mat for a query on one page and use the Oracle session
parameters to accomplish that, then that change in the
date format will persist to all scripts that happen to use
the same connection. This can lead to very inconsistent
output without any clear indication of why it is happen-
ing.
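One way to get a particular date format without touching session state is to format the value inside the query itself with Oracle’s to_char() function, so nothing leaks into the next script that reuses the connection. This is a suggested alternative, not code from the article’s listings; it assumes the PHP 4.3-era oci8 function names, and the employees table, its columns and fetch_hire_dates() are hypothetical.

```php
<?php
// Hypothetical helper: format dates in the SQL instead of changing the session.
// Avoided on persistent connections, because the setting outlives this script:
//   alter session set nls_date_format = 'YYYY-MM-DD'
function fetch_hire_dates($conn)
{
    $dates = array();
    // to_char() formats the value for this query only; session state is untouched.
    $sql = "select last_name, to_char(hire_date, 'YYYY-MM-DD') as hire_date from employees";
    $stmt = OCIParse($conn, $sql);
    if (!$stmt) {
        return $dates;
    }
    // OCI_DEFAULT: a read-only query needs no commit.
    if (OCIExecute($stmt, OCI_DEFAULT)) {
        $row = array();
        while (OCIFetchInto($stmt, $row, OCI_ASSOC)) {
            // oci8 returns column names in upper case.
            $dates[$row['LAST_NAME']] = $row['HIRE_DATE'];
        }
    }
    OCIFreeStatement($stmt);
    return $dates;
}
```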
The last danger with persistent connections is that
current versions of PHP don’t handle database restarts
very well. If you are using persistent connections and
restart your Oracle database, all of the open connec-
tions being used by your Apache server will become
corrupt but will not be reopened until the next page.
This means that every time you restart your Oracle
database, you need to restart your Apache server or
your users will see lots of error messages until all the old
connections are retired.
Using persistent connections to reduce resource uti-
lization can work if your PHP scripts all follow these
guidelines:
• Program all scripts to clean up after them-
selves.
• Only use Oracle session parameters in well
defined and agreed upon ways.
Having all your scripts clean up after
themselves is an easy idea. If you
open a statement handle, close it. If
you open a new descriptor, close it.
If you can remember to always close every resource you ever open, your Oracle server will reward you with consistent performance and high uptime.
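A script that cleans up after itself might look like the minimal sketch below: the statement handle it opens is freed on every path, whether or not the query succeeded. It assumes the PHP 4.3-era oci8 functions (OCIPLogon(), OCIParse(), OCIExecute(), OCIFetchInto(), OCIFreeStatement()); list_employees(), the employees table and the login details in the usage comment are hypothetical.

```php
<?php
// Hypothetical helper showing the "close what you open" discipline.
function list_employees($conn)
{
    $names = array();
    $stmt = OCIParse($conn, 'select last_name, first_name from employees order by last_name, first_name');
    if (!$stmt) {
        return $names;          // nothing was opened, so nothing to free
    }
    if (OCIExecute($stmt, OCI_DEFAULT)) {
        $row = array();
        while (OCIFetchInto($stmt, $row, OCI_ASSOC)) {
            $names[] = $row['LAST_NAME'] . ', ' . $row['FIRST_NAME'];
        }
    }
    // Free the handle (and the cursor it holds) on every path: the
    // persistent connection, and any cursors left open on it, outlive this script.
    OCIFreeStatement($stmt);
    return $names;
}

// Usage, assuming the oci8 extension and a reachable database:
// $conn  = OCIPLogon('scott', 'tiger', 'ORCL');
// $names = list_employees($conn);
```

The connection itself is deliberately left open, since reusing it is the whole point of a persistent connection; only the per-script resources are released.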
While there are quite a few useful
features in an Oracle server that can
be taken advantage of via Oracle
session parameters, these parame-
ters must always be mutually agreed
upon by all the programmers and
used consistently. If one program-
mer chooses to set an Oracle session
parameter, the unintended effects this will have on
everyone else’s code are very difficult to predict.
Moreover, bugs caused by these types of parameters are almost impossible to track down.
Minimizing Commits and Transaction Size
Every time an Oracle database does a commit, it will
save whatever is in the current buffer to disk. This is true
whether or not there is any data to save. Each of these
disk writes takes time and resources on the Oracle serv-
er.
Because the Oracle functions in PHP have auto-commit turned on by default, every statement you execute triggers a commit, which dramatically increases the number of unnecessary disk writes performed by the database server. So many of these disk writes are unnecessary because almost every statement handle used in a PHP site is a query for data. Unless your query is a “select for update” operation, a select statement saves no data and only needs to read from disk.
The easiest way to avoid doing commits when all you want to do is read data from the database is to use the OCI_DEFAULT option on your OCIExecute statements. This changes the behavior of SQL execute statements from auto-committing your statement handle to deferring the commit. While this might lead to problems with the rollback segments, if you followed the earlier advice of always closing your statement handles, then resource utilization will be kept to a minimum.
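A small wrapper can apply this rule automatically. The sketch below picks OCI_DEFAULT for plain select statements and OCI_COMMIT_ON_SUCCESS (the oci8 default mode) for everything else. is_read_only_sql() and execute_sql() are hypothetical names, and the select test is a deliberately simple heuristic: a "for update" inside a string literal, for instance, would make it commit, which errs on the safe side.

```php
<?php
// Hypothetical heuristic: true only for plain selects that need no commit.
function is_read_only_sql($sql)
{
    $s = strtolower(trim($sql));
    if (substr($s, 0, 6) != 'select') {
        return false;           // insert, update, delete, DDL, PL/SQL...
    }
    // "select ... for update" locks rows and must end in a commit or rollback.
    return !preg_match('/\bfor\s+update\b/', $s);
}

// Hypothetical wrapper: parse and execute, committing only when needed.
// The caller must free the returned statement with OCIFreeStatement().
function execute_sql($conn, $sql)
{
    $stmt = OCIParse($conn, $sql);
    if (!$stmt) {
        return false;
    }
    $mode = is_read_only_sql($sql) ? OCI_DEFAULT : OCI_COMMIT_ON_SUCCESS;
    if (!OCIExecute($stmt, $mode)) {
        OCIFreeStatement($stmt);    // clean up even on failure
        return false;
    }
    return $stmt;
}
```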
When doing inserts, updates, and deletes, however,
you must do a commit or your data changes will not be
saved. There may be situations where you wish to defer
the commit until after further operations are complet-
ed, but under most circumstances you’ll want to com-
mit as soon as you execute the statement. The only sit-
uation where you must defer a commit is when using
certain types of bind variables such as when dealing
with large objects or PL/SQL.
If you are going to defer commits when doing inserts,
updates, or deletes, then you must make sure that you
keep enough room in your Oracle rollback segments.
The size of your rollback segments should be more than
large enough to accommodate the largest transaction
you will ever have in a single script. If you’re going to
do a lot of work with large objects, then you should
make sure that your rollback segments can accommodate the largest large object you think you’ll encounter.
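Deferring the commit across several changes that must succeed or fail together looks roughly like the sketch below: each statement runs with OCI_DEFAULT, and a single OCICommit() or OCIRollback() ends the transaction. The accounts table, its columns and transfer_funds() are hypothetical; everything executed before the final commit is what the rollback segments must be able to hold.

```php
<?php
// Hypothetical example: move an amount between two rows in one transaction.
function transfer_funds($conn, $from_id, $to_id, $amount)
{
    $steps = array(
        array('update accounts set balance = balance - :amount where account_id = :id', $from_id),
        array('update accounts set balance = balance + :amount where account_id = :id', $to_id),
    );
    foreach ($steps as $step) {
        $id = $step[1];
        $stmt = OCIParse($conn, $step[0]);
        if (!$stmt) {
            OCIRollback($conn);
            return false;
        }
        OCIBindByName($stmt, ':amount', $amount, -1);
        OCIBindByName($stmt, ':id', $id, -1);
        // OCI_DEFAULT: keep the change in the open transaction, no commit yet.
        $ok = OCIExecute($stmt, OCI_DEFAULT);
        OCIFreeStatement($stmt);
        if (!$ok) {
            OCIRollback($conn);     // undo the first update if the second fails
            return false;
        }
    }
    return OCICommit($conn);        // a single commit covers both changes
}
```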
Leveraging the SQL Buffer Cache
One of the chief benefits of using an Oracle database as
the backbone of your PHP-driven site is that the SQL
engine available to you has enormous power to manip-
ulate data. With that power, however, you pay a price
as your queries become more and more complicated.
Each time you pass a SQL statement to the Oracle data-
base, it must be parsed and an execution model must
be created. Every SQL statement must pass through the
Oracle cost based optimizer (CBO) to determine what
indexes will be used and in what order, along with the
various join conditions to best return the data request-
ed.
To keep from returning to the CBO unnecessarily,
Oracle will maintain a cache of the results of each parse
and execution. However, this cache is based on the
exact SQL statement and is case sensitive. As a result, to
properly leverage this cache and avoid having to re-
parse the same SQL statements over and over again, all
of your SQL statements must be standardized.
The easiest way to standardize your SQL statements is to follow two simple guidelines: always use a consistent case convention, and avoid putting newline characters in your SQL strings. While many programmers like to use mixed case for their SQL statements, it is hard to find two programmers who will do it exactly the same way. The easiest way to keep case differences from denying you access to the SQL buffer cache is to always use the same case for all SQL statements. Which case you use is a matter of personal choice; we always use lower case for SQL statements, since it is easier to type.
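If SQL text is assembled in several places, a small helper can enforce the no-newlines rule mechanically. The hypothetical normalize_sql() below collapses every run of whitespace outside quoted string literals into a single space. It deliberately does not change case: lower-casing a whole statement would also alter literals such as names in a where clause, so the case convention remains up to the programmer.

```php
<?php
// Hypothetical callback for normalize_sql(): keep quoted literals, squeeze whitespace.
function normalize_sql_piece($m)
{
    return substr($m[0], 0, 1) == "'" ? $m[0] : ' ';
}

// Collapse runs of whitespace (including newlines) outside '...' literals,
// so the same statement always produces identical text for Oracle's cache.
function normalize_sql($sql)
{
    return trim(preg_replace_callback("/'(?:[^']|'')*'|\\s+/", 'normalize_sql_piece', $sql));
}

// Usage:
// normalize_sql("select last_name\n  from employees\n where dept = 'R  and D'")
// gives "select last_name from employees where dept = 'R  and D'"
```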
Another way to leverage the SQL cache is to look for
queries that vary only by particular parameters. For
example, if you are always calling up rows from a par-
ticular table just varying the primary key you query on,
then you can make them all use the same SQL cache
entry by using a bind variable for the varying parame-
ter. In this way, the SQL is always the same but the bind
variable lets you select which row you wish to return.
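With the oci8 functions, that looks something like the following sketch: the statement text never changes, so Oracle can reuse one cached parse, and only the value bound to :employee_id varies. The table, the columns and find_employee() are hypothetical; OCIBindByName() is the PHP 4 oci8 binding call, where a maximum length of -1 means "use the current length of the variable".

```php
<?php
// Hypothetical lookup by primary key using a bind variable.
function find_employee($conn, $employee_id)
{
    // Same SQL text for every id, so Oracle can reuse the cached parse.
    $stmt = OCIParse($conn, 'select first_name, last_name from employees where employee_id = :employee_id');
    if (!$stmt) {
        return false;
    }
    OCIBindByName($stmt, ':employee_id', $employee_id, -1);
    $row = false;
    if (OCIExecute($stmt, OCI_DEFAULT)) {
        $fetched = array();
        if (OCIFetchInto($stmt, $fetched, OCI_ASSOC)) {
            $row = $fetched;    // keys are upper case: FIRST_NAME, LAST_NAME
        }
    }
    OCIFreeStatement($stmt);
    return $row;
}
// Contrast: concatenating "... where employee_id = " . $id produces a
// different SQL text, and therefore a new parse, for every id.
```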
Another way to reduce the load on the Oracle data-
base server is to move more complex queries into
Oracle views. A view is simply a pre-defined query. The
advantage to a view is that it gets compiled and opti-
mized when it is created, rather than whenever the
associated statement is executed. As a result, when the
PHP script calls on the view, the database server has
already dealt with the complex conditions. Effective use
of views, often difficult for PHP programmers making
the transition from other database systems, can dra-
matically reduce resource utilization. Another benefit is
that your DBA can optimize the views in the system while leaving the PHP programmers’ code untouched.
By following these guidelines, a busy site can often
achieve a buffer cache hit ratio of 80%, or even better
under some circumstances. This will dramatically
reduce the load on the CBO and other aspects of the
Oracle database server.
Minimizing Data Transfer
Another stealth reason for seemingly slow performance
lies in how much data is transferred between the Oracle
server and the Apache server. Hopefully, the database
and web servers are located physically close to one
another (preferably on the same network subnet).
However, large amounts of data transfer, often unnecessary, can cause slowdowns and lengthen response times.
One way to reduce data transfer is to create views
that let you retrieve only the data that you actually
need. For example, if you have a table with one or
more large objects in it and you don’t need LOB data,
don’t put those columns in your query. If you prefer to
use the select * from… syntax, then create a view that contains the non-LOB columns of the table.
Another way to avoid unnecessary data transfer is to write queries that do not require further queries inside your result-fetching loops. If you have your code do an inner
query for each return result row, then you’re going to
be putting a lot of extra pressure on data transfer
between your PHP script and the Oracle server. Because
of the ability of the Oracle SQL engine to do complex,
multi-dimensional queries, it is almost always possible
to write a nested query as a single query.
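For instance, instead of querying a departments table and then running one employees query per department inside the fetch loop, a single join returns everything in one statement. The table and column names and employees_by_department() are illustrative assumptions; the rows are grouped into a PHP array as they arrive.

```php
<?php
// Hypothetical example: one joined query instead of a query per department.
// The loop it replaces would run "select last_name from employees where
// department_id = ..." once for every department row fetched.
function employees_by_department($conn)
{
    $sql = 'select d.department_name, e.last_name'
         . ' from departments d, employees e'
         . ' where e.department_id = d.department_id'
         . ' order by d.department_name, e.last_name';
    $result = array();
    $stmt = OCIParse($conn, $sql);
    if (!$stmt) {
        return $result;
    }
    if (OCIExecute($stmt, OCI_DEFAULT)) {
        $row = array();
        while (OCIFetchInto($stmt, $row, OCI_ASSOC)) {
            // Group names under their department as the rows stream in.
            $result[$row['DEPARTMENT_NAME']][] = $row['LAST_NAME'];
        }
    }
    OCIFreeStatement($stmt);
    return $result;
}
```

Note that this inner join omits departments that have no employees, which a loop over all departments would still list; an outer join would keep them if that matters.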
Another place where data transfer can hurt perform-
ance is within the database itself. It is often difficult for
the CBO to know that a join condition is a foreign
key/primary key relationship. If you know that only one
matching row will ever occur for a given join condition,
then you can let the CBO know this by passing the FIRST_ROWS SQL compiler directive (an optimizer hint) as in Listing 1. This lets the CBO know that all of the join conditions are foreign keys to primary keys.
One last condition where data transfer can affect per-
formance is in the use of database links between multi-
ple Oracle servers. When you do joins across database
links or even on the far side of the database link, the
local CBO will work especially hard putting all the data
together. You will often find that performance on both
the local database server and remote database server
suffers as data transfers across the database link eat up
resources on both sides.
Optimizing Common Tasks with PHP and Oracle
There are a number of common tasks where the
choices made by the programmer can have a significant cumulative effect on performance. These tasks include providing paged output, computing subtotals and grand totals, finding sums and averages conditionally, and querying against date constraints. In all cases, there are multiple ways to accomplish the same task. We will show the approach we have found to have the least impact on all of our servers taken together.
Paging Query Output
Programmers are often called upon to provide search
output or reports in a paged format. For example, you
may want to show search results limiting the output to
only 20 results per page. In a web environment, this
helps to avoid situations where a poor search may
return thousands of entries, only the first few of which
are really of interest to the user.
Some database systems, including MySQL (with LIMIT)
and, to some extent, Microsoft SQL Server (with TOP),
provide SQL extensions that let you do this quickly and
easily. Oracle, for better or worse, has only half of what
you need: the ROWNUM pseudocolumn can limit output,
but the limit is always applied before any sorting
requested in the query. Given those limitations, you
need to decide whether PHP will limit your search
output or whether your Oracle query will return only
the rows you are interested in.
To really understand the limitations in Oracle, let’s
build a paged query from the inside out. Say you want
to retrieve all employee records sorted by employee
name. The query in Listing 2 works well enough. So
long as there are only a few dozen employees, you
never need to worry about how many rows get returned
and displayed in the browser.
Once you move beyond a few dozen rows returned
to a few hundred, you’ll want to limit the output. If
you wanted to display only the first twenty rows and
had done a superficial reading of the Oracle SQL
manuals, you would be tempted to use the query in
Listing 3. Unfortunately, Oracle assigns ROWNUM before
applying the ORDER BY, so Listing 3 returns twenty
arbitrary employees, sorted, rather than the first twenty
employees in name order. The cure is to put the main
query from Listing 2 into a subquery, as in Listing 4.
This basic technique can give you a query that cuts
off at a particular maximum value. This works well
enough to display the first page of your paged output.
To display the second page, you will have to employ
another round of querying. Because the query in
Listing 4 also returns each row’s number (r), you can
put the whole query in yet another subquery and limit
output to only those rows whose row number is at least
as large as the minimum value you want to return. The
full query in Listing 5 returns rows 21 through 40 of
the result set, which is the second page when the first
page holds rows 1 through 20.
The main performance question you’ll face when
deciding between the paged query described above and
a solution in PHP code is whether minimizing data
transfer or relying on the Oracle CBO’s optimizations
gives you the better performance. Our experience has
demonstrated that for the first 3-4 pages of output, the
query solution gives slightly better results.
select
  /*+ FIRST_ROWS */
  e.ename,
  d.dname
from
  emp e,
  dept d
where
  e.deptno=d.deptno
Listing 1
select * from emp order by ename
Listing 2
select * from emp where rownum < 21 order by ename
Listing 3
select
  e.*,
  rownum r
from
  (select * from emp order by ename) e
where
  rownum < 21
Listing 4
select
  f.*
from (
  select
    e.*, rownum r
  from (select * from emp order by ename) e
  where
    rownum < 41) f
where
  f.r >= 21
Listing 5
The larger the inner result set and the more pages of
output the user pages through, the smaller the perform-
ance difference between the query and PHP solutions
becomes.
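Here is a sketch of both approaches, assuming a page size of 20. pageBounds() computes the 1-based row range for a page. pagedQuery() wraps any sorted query in the Listing 5 pattern using bind variables; the names :min_row and :max_row are chosen for this sketch. pageInPhp() is the PHP-side alternative that slices an already-fetched result. All three function names are hypothetical.

```php
<?php
// 1-based, inclusive row range for $page (page 1 = rows 1..$perPage).
function pageBounds(int $page, int $perPage): array
{
    $min = ($page - 1) * $perPage + 1;
    return [$min, $min + $perPage - 1];
}

// Wrap an already-sorted query in the Listing 5 pattern: number the sorted
// rows with ROWNUM, stop at :max_row, then drop rows before :min_row.
function pagedQuery(string $sortedSql): string
{
    return "select f.* from ("
         . "select e.*, rownum r from ($sortedSql) e where rownum <= :max_row"
         . ") f where f.r >= :min_row";
}

// PHP-side alternative: fetch every row, then keep only the requested page.
function pageInPhp(array $rows, int $page, int $perPage): array
{
    return array_slice($rows, ($page - 1) * $perPage, $perPage);
}

[$min, $max] = pageBounds(2, 20);
echo "rows $min-$max\n";   // rows 21-40
echo pagedQuery("select * from emp order by ename"), "\n";
```

With OCI8, the bounds would be supplied before execution through oci_bind_by_name($stmt, ':min_row', $min) and oci_bind_by_name($stmt, ':max_row', $max). Binding keeps the statement text identical from page to page, so Oracle can reuse the parsed statement.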
Using Roll-up Queries
Another common task when producing reports
involves creating subtotals and grand totals. When the
only computations are summations and counts,
whether you use the aggregation functions in Oracle or
you use PHP variables is almost immaterial. However, if
your query involves averages or other functions, then
you’ll want to take advantage of the large family of
aggregation functions available in Oracle SQL.
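For the simple case the text describes, where only sums are needed, the PHP-variable approach can be sketched as follows. It assumes rows arrive sorted by department, as an ORDER BY d.dname would return them. The function name and output layout are hypothetical. Averages, minimums, and maximums are where Oracle’s aggregate functions start to pay off.

```php
<?php
// Accumulate department subtotals and a grand total of SAL in PHP,
// producing the same kinds of rows a ROLLUP would: detail rows, one
// subtotal per department, and a final grand total.
// Assumes $rows are sorted by DNAME.
function salaryReport(array $rows): array
{
    $out = [];
    $grand = 0;
    $deptTotal = 0;
    $dept = null;
    foreach ($rows as $row) {
        // department changed: close out the previous department
        if ($dept !== null && $row['DNAME'] !== $dept) {
            $out[] = ['type' => 'subtotal', 'dname' => $dept, 'sal' => $deptTotal];
            $deptTotal = 0;
        }
        $dept = $row['DNAME'];
        $deptTotal += $row['SAL'];
        $grand += $row['SAL'];
        $out[] = ['type' => 'detail', 'dname' => $dept,
                  'ename' => $row['ENAME'], 'sal' => $row['SAL']];
    }
    if ($dept !== null) {
        $out[] = ['type' => 'subtotal', 'dname' => $dept, 'sal' => $deptTotal];
    }
    $out[] = ['type' => 'total', 'sal' => $grand];
    return $out;
}
```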
The roll-up features available in Oracle SQL center on
options to the GROUP BY clause along with use of
the GROUPING function. An example query using
these functions is shown in Listing 6. This query finds
the minimum, maximum, average, and totals for
<table>
<tr>
  <th>Name</th>
  <th>Department</th>
  <th>Min Salary</th>
  <th>Max Salary</th>
  <th>Avg Salary</th>
  <th>Total</th>
</tr>
<?php
$conn_ora = OCIPLogin("SCOTT", "TIGER");

$sql  = "select d.dname, e.ename, min(e.sal) as min_salary, ";
$sql .= "max(e.sal) as max_salary, avg(e.sal) as avg_salary, ";
$sql .= "sum(e.sal) as salary, grouping(d.dname) as d, ";
$sql .= "grouping(e.ename) as e ";
$sql .= "from emp e, dept d ";
$sql .= "where e.deptno=d.deptno ";
$sql .= "group by rollup(d.dname, e.ename)";

$stmt = OCIParse($conn_ora, $sql);
OCIExecute($stmt, OCI_DEFAULT);
while (OCIFetchInto($stmt, $row, OCI_ASSOC)) {
  if ($row["D"] == 1) {
    // grand total row: departments have been rolled up
?>
<tr>
  <td colspan=2>Grand Totals</td>
  <td align=right>$<?php echo number_format($row["MIN_SALARY"], 2); ?></td>
  <td align=right>$<?php echo number_format($row["MAX_SALARY"], 2); ?></td>
  <td align=right>$<?php echo number_format($row["AVG_SALARY"], 2); ?></td>
  <td align=right>$<?php echo number_format($row["SALARY"], 2); ?></td>
</tr>
<?php
  } elseif ($row["E"] == 1) {
    // department subtotal row: employees have been rolled up
?>
<tr>
  <td colspan=2>Subtotal for <?php echo htmlspecialchars($row["DNAME"]); ?></td>
  <td align=right>$<?php echo number_format($row["MIN_SALARY"], 2); ?></td>
  <td align=right>$<?php echo number_format($row["MAX_SALARY"], 2); ?></td>
  <td align=right>$<?php echo number_format($row["AVG_SALARY"], 2); ?></td>
  <td align=right>$<?php echo number_format($row["SALARY"], 2); ?></td>
</tr>
<?php
  } else {
    // employee detail row
?>
<tr>
  <td><?php echo htmlspecialchars($row["ENAME"]); ?></td>
  <td><?php echo htmlspecialchars($row["DNAME"]); ?></td>
  <td align=right>$<?php echo number_format($row["MIN_SALARY"], 2); ?></td>
  <td align=right>$<?php echo number_format($row["MAX_SALARY"], 2); ?></td>
  <td align=right>$<?php echo number_format($row["AVG_SALARY"], 2); ?></td>
  <td align=right>$<?php echo number_format($row["SALARY"], 2); ?></td>
</tr>
<?php
  }
}
OCIFreeStatement($stmt);
?>
</table>
Listing 7
select
  d.dname,
  e.ename,
  min(e.sal) as min_salary,
  max(e.sal) as max_salary,
  avg(e.sal) as avg_salary,
  sum(e.sal) as salary,
  grouping(d.dname) as d,
  grouping(e.ename) as e
from
  emp e,
  dept d
where
  e.deptno=d.deptno
group by
  rollup(d.dname, e.ename)
Listing 6
salaries by employee with department subtotals and a
grand total for all rows. In order to know which rows
contain subtotals for a particular column, we include
values from the GROUPING function in our SELECT
clause. The main thing to remember about GROUPING
is that it returns 1 when the column has been rolled up
(that is, when the row is a subtotal or total across that
column) and 0 for ordinary rows. Moreover, when a
column is part of a rolled-up subtotal, the return set will
have a NULL value in that column. You cannot count on
this to tell you when you have a subtotal row, in case
there are actual NULLs that are a legitimate part of the
return set.
A complete PHP example using the query from
Listing 6 utilizing the GROUPING function output to
show subtotal rows and the grand total row is shown in
Listing 7.
Using Case and NVL
A number of specialized data situations occur where
the Oracle CASE and NVL functions provide beneficial
solutions. Often, you will want to total things condi-
tionally or otherwise operate on selective data. In addi-
tion, there are cases where you want to pivot your out-
put from what the natural select order would give you.
Finally, there are cases where you want one thing to
happen if there is a value in a column and another
thing to happen if the column is null.
To conditionally operate on columns, you can use
PHP with variable accumulators, or you can do these
accumulations in your query. The advantage to doing
them in PHP is that you will cut down on the amount
of work the Oracle CBO has to do when it is assembling
the query. However, with the use of the SQL buffer
cache, indexes, and potentially views, you can mitigate
this quite a bit. The big disadvantage to doing this with
PHP is that you will have a lot of unnecessary data
transfer between your Oracle server and your Apache
server. Moreover, you will be increasing the memory
utilization on your Apache server—a commodity that is
usually in short supply on a busy machine.
Let’s say that you wanted to sum up the salaries for
all managers. You can do this with the PHP code in
Listing 8. This will retrieve all the data from the data-
base and decide which data to use in its sum.
Alternatively, you can use the single query shown in
Listing 9 that will give you the answer right away.
When you wish to transpose or pivot the output, you
are often forced to retrieve all your data via calls to the
database and then put it into arrays in PHP for output.
This is often the most efficient method for accomplish-
ing this task. However, if the circumstances are right,
you can use CASE expressions to retrieve each of the
columns you wish to use individually. We have found
this particularly useful when we wish to display reports
on transactional data for today, yesterday, this week,
and this month. Using CASE to pivot around these date
values makes the process very straightforward. Listing
10 shows a sample query whose output appears in
Figure 1. Like the previous example, it gets just the
information required with a minimum of data transfer.
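For comparison, the PHP-side alternative to Listing 10 can be sketched as follows: fetch every user’s last login date and bucket it in PHP. This sketch assumes the dates have already been converted to DateTimeImmutable objects. PHP’s 'o-W' date format (ISO year plus ISO week number) plays the role of TRUNC(..., 'IW') here. loginCounts() is a hypothetical name.

```php
<?php
// Count logins that fall today, yesterday, this ISO week and this month,
// mirroring the four CASE columns of Listing 10.
function loginCounts(array $loginDates, DateTimeImmutable $now): array
{
    $counts = ['today' => 0, 'yesterday' => 0, 'this_week' => 0, 'this_month' => 0];
    $today     = $now->format('Y-m-d');
    $yesterday = $now->modify('-1 day')->format('Y-m-d');
    $week      = $now->format('o-W');   // ISO year + ISO week (weeks start Monday)
    $month     = $now->format('Y-m');

    foreach ($loginDates as $d) {
        $day = $d->format('Y-m-d');
        if ($day === $today)              { $counts['today']++; }
        if ($day === $yesterday)          { $counts['yesterday']++; }
        if ($d->format('o-W') === $week)  { $counts['this_week']++; }
        if ($d->format('Y-m') === $month) { $counts['this_month']++; }
    }
    return $counts;
}
```

Note that this version moves every row from Oracle to PHP, which is exactly the data transfer the single query in Listing 10 avoids.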
Finally, when you are looking to execute a query that
will have conditional logic based on whether a column
is NULL or contains a value, the NVL function can
greatly speed up the process. There are a number of
ways in which this function can be of use. For example,
when we have a sequence value that is used in a table
for sorting purposes, we often want new entries to be
appended at the end of the sequence. The SQL
statement in Listing 11 will insert a new item and place
it at the end of the list.
Another instance where the NVL function can be of
use is when dealing with effectivity dates. In those
     TODAY  YESTERDAY  THIS_WEEK THIS_MONTH
---------- ---------- ---------- ----------
      1201         91       1201       3034
Figure 1
<?php
$conn_ora = OCIPLogin("SCOTT", "TIGER");

$sql = "select * from emp";

$stmt = OCIParse($conn_ora, $sql);
OCIExecute($stmt, OCI_DEFAULT);
$mgr_total = 0;
while (OCIFetchInto($stmt, $row, OCI_ASSOC)) {
  if ($row["MGR_FLAG"] == "Y")
    $mgr_total += $row["SAL"];
}
OCIFreeStatement($stmt);
?>
Listing 8
select sum(case when mgr_flag='Y' then sal else 0 end) as mgr_sal from emp
Listing 9
select
  sum(case when trunc(last_login_date)=trunc(sysdate)
      then 1 else 0 end) as today,
  sum(case when trunc(last_login_date)=trunc(sysdate-1)
      then 1 else 0 end) as yesterday,
  sum(case when trunc(last_login_date, 'IW')=trunc(sysdate, 'IW')
      then 1 else 0 end) as this_week,
  sum(case when trunc(last_login_date, 'MONTH')=trunc(sysdate, 'MONTH')
      then 1 else 0 end) as this_month
from
  users
Listing 10
insert into menu_item
  (menu_item_id, seq, prompt, url)
values
  (menu_item_seq.nextval,
   (select nvl(max(seq)+10, 10) from menu_item),
   'New Item', '/new_item.html')
Listing 11
cases, you want to know whether the current date falls
on or after the start date and on or before the end
date, where a NULL start or end date leaves that side of
the range open. Listings 12 and 13 contain two
alternative queries that will return the same data. The
two queries do not differ much in their impact on the
database server; comparing them is mostly an exercise
in thinking about the uses of NVL in creative ways.
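In PHP, the null-coalescing operator ?? does the same job as NVL. The hypothetical isEffective() below sketches Listing 13’s test for a single row. It assumes the dates have already been fetched into DateTimeImmutable objects, with null wherever the column is NULL.

```php
<?php
// PHP counterpart of Listing 13: a missing (null) start or end date is
// replaced by "now", just as nvl(start_date, sysdate) does, so an open
// end never excludes the row. The ?? operator plays the role of NVL.
function isEffective(?DateTimeImmutable $start, ?DateTimeImmutable $end,
                     DateTimeImmutable $now): bool
{
    return ($start ?? $now) <= $now && ($end ?? $now) >= $now;
}
```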
Fast Oracle Date Functions
One thing that often trips up programmers who are
new to Oracle databases is that the Oracle DATE column
data type actually holds both a date and a time. Oracle
does not have a date-only or time-only column type, as
many other RDBMSs do. Instead, Oracle date arithmetic
treats a date as a number of days: the whole-number
part identifies the day, and the fractional part
represents the portion of the day that has elapsed.
Thus, adding 0.5 to a date at midnight gives noon on
the same day, and subtracting one date from another
gives the difference in days.
Treating dates as numbers means that there are some
very quick methods for doing particular kinds of
date-related logic. For example, if you want to know
the number of days between two dates (not counting
any time-of-day differences), you can use the TRUNC
function as shown in Listing 14. If you want to know
whether a column named LAST_DATE matches today,
you can compare TRUNC(LAST_DATE) with
TRUNC(SYSDATE). The TRUNC function, given just a
single parameter, removes the fractional (time-of-day)
part of the date. This has the effect of converting the
date and time into midnight of the given date.
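PHP has no TRUNC for dates, but the same logic can be sketched with DateTimeImmutable. The helper names below are hypothetical. The code mirrors what TRUNC does to a date, not how Oracle stores one.

```php
<?php
// Emulate Oracle's single-argument TRUNC(date): drop the time of day.
function truncDay(DateTimeImmutable $d): DateTimeImmutable
{
    return $d->setTime(0, 0, 0);
}

// Like trunc(a) - trunc(b) in Listing 14: whole days from $b to $a,
// negative when $a is earlier than $b.
function daysBetween(DateTimeImmutable $a, DateTimeImmutable $b): int
{
    $diff = truncDay($b)->diff(truncDay($a));
    return $diff->invert ? -$diff->days : $diff->days;
}

echo daysBetween(new DateTimeImmutable('2003-08-31 23:00'),
                 new DateTimeImmutable('2003-08-01 06:00')), "\n"; // 30
```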
If you pass additional arguments to TRUNC, you can
round your date down to other boundaries. For
example, to see whether two dates are in the same
month, you can compare TRUNC(date1, 'MONTH') to
TRUNC(date2, 'MONTH'). To see if two dates are in the
same week, you can compare TRUNC(date1, 'IW') and
TRUNC(date2, 'IW'). Note that we use IW instead of
WW since in Oracle, IW refers to an ISO week
specification in which weeks always begin on Monday.
If you use the WW week parameter, then the week will
begin on whatever day of the week that year’s January
1st occurs on.
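To make the IW/WW difference concrete, here is a PHP sketch that emulates both truncations. The function names are hypothetical. The WW emulation assumes Oracle’s documented rule that WW weeks start on the weekday of January 1.

```php
<?php
// TRUNC(d, 'IW'): midnight on the Monday of d's ISO week.
function truncIsoWeek(DateTimeImmutable $d): DateTimeImmutable
{
    $daysSinceMonday = (int) $d->format('N') - 1;    // 'N': 1 = Monday ... 7 = Sunday
    return $d->setTime(0, 0)->modify("-$daysSinceMonday days");
}

// TRUNC(d, 'WW'): weeks counted in 7-day blocks from January 1, so each
// week starts on the weekday that January 1 fell on.
function truncYearWeek(DateTimeImmutable $d): DateTimeImmutable
{
    $dayOfYear = (int) $d->format('z');              // 'z': 0-based day of year
    $jan1 = $d->setDate((int) $d->format('Y'), 1, 1)->setTime(0, 0);
    $offset = intdiv($dayOfYear, 7) * 7;
    return $jan1->modify("+$offset days");
}

$d = new DateTimeImmutable('2004-06-16 15:00');      // a Wednesday
echo truncIsoWeek($d)->format('Y-m-d'), "\n";         // 2004-06-14 (Monday)
echo truncYearWeek($d)->format('Y-m-d'), "\n";        // 2004-06-10 (Thursday, like 2004-01-01)
```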
If you refer back to Listing 10, you will see an effec-
tive use of the TRUNC function with Oracle dates. These
functions are much faster than using either TO_CHAR
comparisons or BETWEEN comparisons. Moreover,
because Oracle date columns also contain a time, using
TRUNC will save you from inclusive-range problems
when you do use the BETWEEN operator. For example,
let’s say that you want to know all transactions that
occurred between August 1, 2003, and August 31,
2003. If you just used the query in Listing 15, then no
transactions that occurred after midnight on August 31
would be included, because TO_DATE('31-AUG-2003',
'DD-MON-YYYY') stands for midnight at the start of that
day. However, if you use the query in Listing 16, then
you’ll pick up everything that occurred on August 31
regardless of the time.
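The pitfall is easy to reproduce outside the database. The PHP sketch below compares Listing 15’s comparison, Listing 16’s truncated comparison, and a third, half-open form. That third form is an addition for comparison, not one of the article’s listings. In SQL it would read log_date >= to_date('01-AUG-2003', 'DD-MON-YYYY') and log_date < to_date('01-SEP-2003', 'DD-MON-YYYY'), and because the column is left bare, an ordinary index on log_date stays usable. The function names are hypothetical.

```php
<?php
// Three ways of asking "did $t happen between $from and $to (whole days)?"
function inRangeNaive(DateTimeImmutable $t, DateTimeImmutable $from, DateTimeImmutable $to): bool
{
    return $t >= $from && $t <= $to;                  // like Listing 15
}

function inRangeTrunc(DateTimeImmutable $t, DateTimeImmutable $from, DateTimeImmutable $to): bool
{
    $day = $t->setTime(0, 0);
    return $day >= $from && $day <= $to;              // like Listing 16
}

function inRangeHalfOpen(DateTimeImmutable $t, DateTimeImmutable $from, DateTimeImmutable $to): bool
{
    return $t >= $from && $t < $to->modify('+1 day'); // strictly before the next midnight
}

$from = new DateTimeImmutable('2003-08-01');
$to   = new DateTimeImmutable('2003-08-31');
$t    = new DateTimeImmutable('2003-08-31 14:30');
var_dump(inRangeNaive($t, $from, $to));     // bool(false)
var_dump(inRangeTrunc($t, $from, $to));     // bool(true)
var_dump(inRangeHalfOpen($t, $from, $to));  // bool(true)
```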
Tuning and Monitoring
In a perfect world, all PHP programmers would be
experts in creating pre-tuned SQL statements. If we
could always be counted on to do things in the most
efficient manner, then we could do away with monitor-
ing of our databases. However, since none of us ever
seems to live in this perfect place, there is always a need
to keep an eye on which queries are using what
resources on the database and on the web server in
order to keep on top of performance issues.
Tuning and monitoring consists of a number of
tasks—most of the time performed by an Oracle DBA.
select
  trunc(t1.event_date) - trunc(t2.event_date) as days_between
from
  test_table t1,
  test_table t2
Listing 14
select
  *
from
  user_log
where
  log_date between to_date('01-AUG-2003', 'DD-MON-YYYY') and
                   to_date('31-AUG-2003', 'DD-MON-YYYY')
Listing 15
select
  *
from
  user_log
where
  trunc(log_date) between to_date('01-AUG-2003', 'DD-MON-YYYY') and
                          to_date('31-AUG-2003', 'DD-MON-YYYY')
Listing 16
select
  *
from
  users
where
  (start_date <= sysdate or start_date is null) and
  (end_date >= sysdate or end_date is null)
Listing 12
select
  *
from
  users
where
  nvl(start_date, sysdate) <= sysdate and
  nvl(end_date, sysdate) >= sysdate
Listing 13
However, occasions do occur where a PHP programmer
can participate in the tuning and monitoring cycles.
Often, a DBA will find a query that is performing badly,
will know how to fix it, but will have a terrible time
actually finding where this query lives in the code for
the web site. Moreover, the DBA will need to work
closely with the programmer to ensure that any alter-
ations to the query will continue to return the correct
data.
The basics of performance tuning come down to two
tasks for programmers: finding bad (or poorly perform-
ing) SQL and creating monitoring tools.
Finding Bad SQL
Oracle’s dynamic performance views (such as V$SQL,
which Listings 17 through 19 query) keep track of
system resource utilization for each SQL statement
currently cached in the shared pool. You can query
these views to discover all kinds of performance
characteristics at any time you wish.
There are several ways in which to measure good and
bad performance for Oracle SQL. The chief characteris-
tics we monitor include buffer gets, parse calls, and disk
reads. These refer to the various parts of the Oracle
database server having the greatest impact on query
performance. In each case, lower numbers indicate bet-
ter performance characteristics.
To find the queries with the highest ratio of buffer gets
per execution, you can run the query in Listing 17.
Buffer gets (logical reads from the buffer cache) are a
good indicator of CPU utilization in the Oracle server.
If you are concerned only with a few database users,
you can add a condition to the WHERE clause to include
only those users. To interpret the ratio returned by this
query, let’s examine how it is constructed. The query
divides buffer gets by the number of times the
statement was executed (using DECODE to treat zero
executions as one), and it lists only statements whose
buffer gets exceed the average for their parsing user.
This lets newer queries (those that haven’t been
executed many times yet) stand out over queries that
have been in the system longer.
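To make the construction of Listing 17 concrete, the sketch below applies the same steps to an array of rows shaped like V$SQL. Each row is assumed to carry the keys user, sql, and executions, plus a metric such as buffer_gets. The $factor argument is an addition for this sketch: 1.0 matches Listing 17, 6.0 matches the 6*avg(parse_calls) threshold of Listing 18, and 0.001 matches the avg(disk_reads)/1000 threshold of Listing 19. The function name and row layout are hypothetical.

```php
<?php
// Rank statements the way Listings 17-19 do:
//  1. average the metric per parsing user (rows with metric > 0 only),
//  2. keep statements whose metric exceeds $factor * that average,
//  3. divide by executions, treating 0 executions as 1 (the DECODE trick),
//  4. sort by that ratio, highest first, and keep the top $limit.
function worstStatements(array $stats, string $metric, float $factor = 1.0, int $limit = 80): array
{
    $sum = [];
    $count = [];
    foreach ($stats as $s) {
        if ($s[$metric] > 0) {
            $sum[$s['user']]   = ($sum[$s['user']] ?? 0) + $s[$metric];
            $count[$s['user']] = ($count[$s['user']] ?? 0) + 1;
        }
    }

    $flagged = [];
    foreach ($stats as $s) {
        $user = $s['user'];
        if (!isset($count[$user])) {
            continue;                       // user has no rows with metric > 0
        }
        $threshold = $factor * $sum[$user] / $count[$user];
        if ($s[$metric] <= $threshold) {
            continue;
        }
        $executions = $s['executions'] == 0 ? 1 : $s['executions'];
        $s['ratio'] = round($s[$metric] / $executions);
        $flagged[] = $s;
    }

    usort($flagged, fn($a, $b) => $b['ratio'] <=> $a['ratio']);
    return array_slice($flagged, 0, $limit);
}
```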
Another measure of poor performance would be the
number of times a particular query must be parsed by
the Oracle CBO. A query to find the worst performers
in this category is shown in Listing 18. A higher num-
ber indicates a query that may need to be placed in a
view or otherwise optimized to avoid having to parse it
over and over again. While a view won’t reduce the
number of “soft” parses, it will cut down on the num-
ber of “hard” parses. Again, by dividing the number of
parse calls by the execution count, newer queries will
rise to the top of the list.
The last major area of performance indicators would
be looking at those queries with the highest ratio of
disk reads. To find these queries you can use Listing 19.
This query will report a high ratio for a query if there are
a large number of disk reads for each execution of the
query. These are likely candidate queries to be further
optimized with additional WHERE clauses or other tech-
niques to cut down on data transfer.
More information on interpreting these values can be
found in the Oracle publication Oracle 8i Designing
and Tuning for Performance. This is usually available as
a PDF document in the set of CDs that came with your
Oracle server software.
select
  rownum as rank,
  b.*
from (
  select
    u.username,
    v.parse_calls,
    v.executions,
    round(v.parse_calls/decode(v.executions,0,1,v.executions)) as ratio,
    v.sql_text
  from
    v$sql v,
    dba_users u,
    (
      select
        parsing_user_id,
        6*avg(parse_calls) as avg_parse_calls
      from
        v$sql
      where
        parse_calls > 0
      group by
        parsing_user_id
    ) a
  where
    v.parse_calls > a.avg_parse_calls and
    v.parsing_user_id=a.parsing_user_id and
    v.parsing_user_id=u.user_id
  order by
    round(v.parse_calls/decode(v.executions,0,1,v.executions)) desc
) b
where
  rownum <= 80
Listing 18
select
  rownum as rank,
  b.*
from (
  select
    u.username,
    v.buffer_gets,
    v.executions,
    round(v.buffer_gets/decode(v.executions,0,1,v.executions)) as ratio,
    v.sql_text
  from
    v$sql v,
    dba_users u,
    (
      select
        parsing_user_id,
        avg(buffer_gets) as avg_buffer_gets
      from
        v$sql
      where
        buffer_gets > 0
      group by
        parsing_user_id
    ) a
  where
    v.buffer_gets > a.avg_buffer_gets and
    v.parsing_user_id=a.parsing_user_id and
    v.parsing_user_id=u.user_id
  order by
    round(v.buffer_gets/decode(v.executions,0,1,v.executions)) desc
) b
where
  rownum <= 80
Listing 17
Monitoring System Resources
When you’ve started looking at the performance of var-
ious queries within your system, you will soon find
yourself wanting to do something more systematic to
keep on top of performance issues before they get out
of hand. When this time comes, you’ll want to have a
set of queries and a process in place to look at SQL per-
formance over time.
By putting the queries mentioned above into a regu-
larly scheduled script, you can see over time which
queries are being used most often by your applications.
This can be an invaluable tool for programmers who
want to find out where optimization efforts will yield
the highest results and can also keep you from spend-
ing lots of time optimizing a query that is run once per
day at the expense of optimizing a query run on every
single page on your site.
Another area where you can track performance issues
is on the Apache servers. In this case, looking at how
many active Apache child processes occur at any given
time, as well as tracking load average, memory utiliza-
tion, and overall system process counts, can help iden-
tify problems with your web server before they get out
of hand.
We use a number of scripts and tools to monitor our
systems on a regular basis. Among the key tools are
scripts that run the queries looking for poor perform-
ance in buffer gets, parse calls, and disk reads on a daily
basis. We also have on every web server scripts that
monitor load averages, process counts, and memory
utilization and feed that data into a round-robin data-
base (RRD). We can then generate graphs of system
performance for a number of time periods on an ongo-
ing basis.
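The article does not show its collection scripts, so the following is only a minimal sketch of the load-average part under that caveat. PHP’s sys_getloadavg() supplies the 1-, 5-, and 15-minute load averages, which are then formatted as the N:value:value:value argument that rrdtool update accepts ("N" meaning "now"). The helper names, the RRD file path, and an RRD with three data sources are assumptions.

```php
<?php
// Format numeric samples as the value argument of "rrdtool update":
// "N" asks rrdtool to timestamp the sample with the current time.
// %F is used instead of %f so the decimal point ignores the locale.
function rrdUpdateArg(array $samples): string
{
    $parts = array_map(fn($v) => sprintf('%.2F', $v), $samples);
    return 'N:' . implode(':', $parts);
}

// Current 1-, 5- and 15-minute load averages as an rrdtool argument, or
// null where sys_getloadavg() is unavailable (e.g. on Windows) or fails.
function loadAverageSample(): ?string
{
    if (!function_exists('sys_getloadavg')) {
        return null;
    }
    $load = sys_getloadavg();
    return $load === false ? null : rrdUpdateArg($load);
}

// Hypothetical cron usage; the RRD path is an assumption:
// $sample = loadAverageSample();
// if ($sample !== null) {
//     exec('rrdtool update /var/lib/rrd/load.rrd ' . escapeshellarg($sample));
// }
```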
Only through a concerted effort on a number of
fronts can you maintain a good picture of where your
performance issues lie today and where the perform-
ance issues of tomorrow will likely occur.
Summary
Having access to a powerful database like Oracle is a
tremendous asset to a PHP programmer. The flexibility
and power of the data engine is something that can
really help create complex and robust web sites.
However, performance issues often arise that will take
you by surprise unless you are prepared to deal with
them.
Hopefully, this guide can serve as a starting point for
you and your organization as you begin using your
Oracle database to its full potential without causing too
many problems. The lessons passed on here are all the
result of painful processes as we dealt with performance
issues in real-life crisis situations. Perhaps learning how
we solved them will help keep your web and database
servers working well together.
About the Author
To Discuss this article:
Simone Grassi is a software developer and partner in Ci S.B.i.C. snc
(www.cisbic.com) in Cesena, Italy. He has been using PHP since late
2000. For Cisbic, he works as a project-planning analyst. As a software
developer, he takes care of the Lybra framework (lybra.sf.net). Simone
received his degree in Computer Science from the University of Bologna.
You can contact him at simone@cisbic.com.
select
  rownum as rank,
  b.*
from (
  select
    u.username,
    v.disk_reads,
    v.executions,
    round(v.disk_reads/decode(v.executions,0,1,v.executions)) as ratio,
    v.sql_text
  from
    v$sql v,
    dba_users u,
    (
      select
        parsing_user_id,
        avg(disk_reads)/1000 as avg_disk_reads
      from
        v$sql
      where
        disk_reads > 0
      group by
        parsing_user_id
    ) a
  where
    v.disk_reads > a.avg_disk_reads and
    v.parsing_user_id=a.parsing_user_id and
    v.parsing_user_id=u.user_id
  order by
    round(v.disk_reads/decode(v.executions,0,1,v.executions)) desc
) b
where
  rownum <= 80
Listing 19
PHP’s primary spell checking functionality is made
available through the pspell extension, which is
based on the Aspell library.
The Aspell library is a well-established open source
spell-checking engine used by many other applications.
One of its neat abilities is the capability to spell check
multiple languages, rather than the single one that
most other solutions are limited to. At this time, Aspell
has dictionaries for over 20 languages, and new ones
are being added all the time. Because Aspell is a fairly
commonly used library, it can be found by default on
most open-source operating systems—chances are, you
won’t actually need to download and install a new
library to take advantage of what Aspell has to offer.
This is very useful, because it makes the process of
adding the pspell extension to PHP a simple matter of
recompiling PHP with the --with-pspell flag, which
should be helpful if you need to convince your ISP to
add this extension. Unfortunately, even though the
underlying library is almost always available, very few
ISPs actually have this extension enabled, so keep that
in mind when writing software that will depend on the
functionality offered by pspell.
Getting Started with pspell
Installing or upgrading Aspell (PHP requires Aspell
0.50.0+) is a fairly simple process that involves down-
loading and installing the library itself, followed by the
installation of the dictionaries you intend to use. The
library includes only the spell checking engine—the
dictionaries must be installed individually from separate
packages available on the Aspell website. Additional
dictionaries can be added at any point, so there is little
need to install all of the available dictionaries right
away. That said, the dictionary files themselves take
very little space (about one to two megabytes each),
and the advantage of compiling them at the outset is
that you won’t have to waste time later if you want to
use additional dictionaries. In any case, all major
distributions have binary packages for both the library
and commonly used dictionaries, so the upgrade/install
process is fairly painless.
Once the library is installed, you simply need to add
--with-pspell to your PHP configuration. If the library
was not installed in a standard location, such as /usr
or /usr/local, you will need to specify the correct path
to the directory where it resides, for example
--with-pspell=/path/to/lib. You also have the option of
installing pspell as a shared extension (via
--with-pspell=shared,/usr) that can be enabled only for
particular hosts. This is quite useful if you need to
enable the functionality for a specific account or limit
the ability to use pspell to higher-tier accounts. It is
important to keep in mind that spell checking is a
relatively slow process and spell checking large
quantities of text may take some time. Therefore, it is
important to set execution
Spell checking with PHP
by Ilia Alshanetsky
PHP: Any
OS: Any
Applications: None
Code Directory: N/A
REQUIREMENTS
Everyone makes typos. That is a universal constant, but no one wants their typos to end up in the final product, be it
an e-mail or a blog entry. Consequently, many programs have integrated spell checkers that can find and help correct
the mistakes made by busy fingers. Unfortunately, for the most part this functionality is not available to many forms of
web communications, such as forums, blogs and online comment systems. This is primarily due to the fact that it is not
easy to implement a spell checker, and few developers are familiar with the extensions and libraries that can simplify the
process. This article will focus on two PHP extensions that offer spell checking functionality that can be used to validate
and correct typos and spelling errors.
time limits to prevent scripts from taking excessive
amounts of time when forced to spell-check large doc-
uments.
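PHP’s built-in set_time_limit() caps the whole request. As a sketch of a gentler, per-document budget, the hypothetical helper below stops checking once a time allowance is used up and reports whether it finished. The word checker is passed in as a callable, an assumption that keeps the sketch runnable without pspell. With the extension, it could be fn($w) => pspell_check($link, $w) once a dictionary link is open, as described next.

```php
<?php
// Check words until finished or until $budgetSeconds have elapsed, so a
// huge document cannot tie up the script; set_time_limit() remains the
// hard backstop for the whole request.
// $check returns true for a correctly spelled word.
function findTyposWithin(string $text, callable $check, float $budgetSeconds): array
{
    $started = microtime(true);
    preg_match_all("/[A-Za-z']+/", $text, $matches);
    $typos = [];
    foreach (array_unique($matches[0]) as $word) {
        if (microtime(true) - $started > $budgetSeconds) {
            return ['typos' => $typos, 'complete' => false];
        }
        if (!$check($word)) {
            $typos[] = $word;
        }
    }
    return ['typos' => $typos, 'complete' => true];
}
```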
Once all the necessary tools are in place, the actual
spell-checking process can begin. The first step is the
creation of a pspell resource that will allow the usage of
the spell checker. This is done via the pspell_new()
function, which accepts a number of parameters. The
first and the only required parameter is the two-letter
language code that tells the extension which dictionary
will be used. Since some languages use multiple
spellings for the same word (depending, for example,
on the particular dialect of the language spoken in a
country), you may also want to specify the country,
which can be passed along as a second, optional
parameter. For example, for the English language there
are three possible country values:
British, Canadian and American.
You also have an option of speci-
fying a jargon and locale files,
although these values are largely
unused and in most instances it
is best to leave them at their
defaults. The very last option (a
bit mask) allows you to set your
preferences regarding how hard
pspell should try to find spelling
alternatives to a word that is mis-
spelled. The values range from
PPSSPPEELLLL__FFAASSTT
, which will return
the fewest number of suggestions but will take the least
amount of CPU, to
PPSSPPEELLLL__BBAADD__SSPPEELLLLEERRSS
, which will
return the maximum possible number of suggestions,
but will take a noticeably greater amount of CPU. The
default mode is
PPSSPPEELLLL__NNOORRMMAALL
, which tries to find a
“happy compromise” between the quality of the sug-
gestions returned and the processing time needed to
generate them. You can also use this parameter to set
an option and indicate how words that are not separate
by a space (also known as run-togethers) should be
handled. By default, they would be considered typos,
but in some instances you may want to allow them.
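// simple version
$psl = pspell_new("en");
// with options: Canadian spelling, default jargon, ISO8859-15 encoding,
// fast suggestions and run-together words allowed
$psl = pspell_new("en", "canadian", "", "ISO8859-15",
    PSPELL_FAST | PSPELL_RUN_TOGETHER);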
The spell-checking options can also be set via the pspell_config_runtogether() function, which changes the run-together behaviour, and pspell_config_mode(), which changes the spelling mode. Note that both functions operate on a configuration resource created with pspell_config_create() (discussed later in this article) rather than on an existing pspell resource, so a new setting applies to resources subsequently created from that configuration. This is still very handy, as it lets you use the faster defaults and then, if they fail to generate useful suggestions, switch to a more thorough mode for a particular word.
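One way to apply this escalation is sketched below, assuming two pspell resources built from configurations with different modes, with the cheaper one tried first. The names suggest_with_fallback() and pspell_link_with_mode() are hypothetical helpers, not part of the pspell extension, and the closures require PHP 5.3 or later rather than the PHP 4 versions the rest of the article supports.

```php
<?php
// Hypothetical helper (not part of pspell): try each suggester in order,
// cheapest first, and return the first non-empty list of suggestions.
function suggest_with_fallback($word, array $suggesters)
{
    foreach ($suggesters as $suggest) {
        $list = call_user_func($suggest, $word);
        if (!empty($list)) {
            return $list;
        }
    }
    return array();
}

// Hypothetical helper: build a pspell resource whose suggestion mode is
// set on the configuration before the resource is created.
function pspell_link_with_mode($lang, $mode)
{
    $cfg = pspell_config_create($lang);
    pspell_config_mode($cfg, $mode);
    return pspell_new_config($cfg);
}

if (extension_loaded('pspell')) {
    $fast = pspell_link_with_mode('en', PSPELL_FAST);
    $thorough = pspell_link_with_mode('en', PSPELL_BAD_SPELLERS);
    if ($fast && $thorough) {
        print_r(suggest_with_fallback('speler', array(
            function ($w) use ($fast) { return pspell_suggest($fast, $w); },
            function ($w) use ($thorough) { return pspell_suggest($thorough, $w); },
        )));
    }
}
```

Because the suggesters are plain callables, the fallback logic itself can be exercised without pspell, for example with stub functions that return fixed arrays.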
Now that we have created a spell-checking resource, it can be used to validate text. Validation is done through two functions: pspell_check(), which determines whether the specified word is correctly spelled and returns false if it isn't, and pspell_suggest(), which you can then use to generate an array of possible alternatives.
if (!pspell_check($psl, "speler")) {
    $suggestions = pspell_suggest($psl, "speler");
    foreach ($suggestions as $word) {
        echo $word . "<br />\n";
    }
}
Both pspell_check() and pspell_suggest() work with only one word at a time; to spell-check an entire document, you will first need to use PHP to break the text down into individual words that can be fed to pspell. If you are dealing with plain text, this is very simple to do, especially if you have PHP 4.3.0 or later, where the str_word_count() function is available. This function has three modes of operation: the default mode simply counts the number of words inside a string and returns an integer. The second mode (selected by passing 1 as the second argument), which is the one we want, returns an array of words that can be spell checked.
$wl = str_word_count("will return an aray of words", 1);
foreach ($wl as $key => $word) {
    if (!pspell_check($psl, $word)) {
        $sug = pspell_suggest($psl, $word);
        // replace the word with the first suggestion, if there is one
        if ($sug) {
            $wl[$key] = $sug[0];
        }
    }
}
// print the corrected text (expected: "will return an array of words")
echo implode(' ', $wl);
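To make the difference between the modes concrete, the snippet below runs the same sample sentence through all three formats of str_word_count(). The keys returned by format 2 are byte offsets into the string, which is what the offset-based replacement technique discussed later in this article relies on.

```php
<?php
// The same sample sentence run through all three str_word_count() formats
$text = "will return an aray of words";

// Format 0 (default): the number of words
$count = str_word_count($text);
// Format 1: a plain list of the words
$words = str_word_count($text, 1);
// Format 2: the words, keyed by their byte offset inside $text
$offsets = str_word_count($text, 2);

echo $count, "\n";   // 6
print_r($words);     // will, return, an, aray, of, words
print_r($offsets);   // keys 0, 5, 12, 15, 20, 23
```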
If you are using an older version of PHP that does not have str_word_count(), you can emulate the second operation mode by using preg_match_all(), which is noticeably slower, but will still get the job done.
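if (!function_exists("str_word_count")) {
    // emulates format 0 (word count) and format 1 (list of words) only
    function str_word_count($text, $format = 0) {
        preg_match_all('!\w+!', $text, $m);
        return $format ? $m[0] : count($m[0]);
    }
}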
The problem with this approach, whether you use the native or the emulated str_word_count(), is that the resulting word array contains only the words: all of the punctuation and other non-alphabetic characters are discarded. Thus, if you simply call implode() as I did in the previous example, all of those characters will be lost and only the words will be retained, which is clearly a bad idea. We need a way to replace the misspellings without losing the formatting, so that our modifications only affect the misspelled words.
This is where the third, and arguably the most useful, mode of the str_word_count() function comes into play. In this mode (format 2), the key of each element in the resulting array is the offset of the word inside the string. This allows you to easily find the position of the word inside the text and replace it via substr_replace() quickly and efficiently, without the risk of corrupting the text. While this can be emulated with preg_match_all(), it would require the PREG_OFFSET_CAPTURE flag, which is only available in PHP 4.3.0 and higher. Since PHP 4.3.0 already has a native str_word_count() function, there is no need to emulate it using a slower alternative. If you are using an older version of PHP, you will need to come up with your own string parser, or better yet, upgrade your installation!
function spell_check_str($psl, $s)
{
    $off = 0; // how far the text has shifted due to earlier replacements
    foreach (str_word_count($s, 2) as $key => $w) {
        if (!pspell_check($psl, $w)) {
            $sug = pspell_suggest($psl, $w);
            if (!$sug) {
                continue; // no suggestion: leave the word untouched
            }
            $r = $sug[0];
            // replace the word with the first suggestion
            $s = substr_replace($s, $r, $key + $off, strlen($w));
            // adjust the offset, since the replacement may differ in length
            $off += strlen($r) - strlen($w);
        }
    }
    return $s;
}
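If you would rather keep the checking logic independent of a particular extension, the same offset technique can be written against two callables: one that reports whether a word is correct and one that returns suggestions. The sketch below is a hypothetical variant (spell_check_str_cb() is not from the article) that also walks the words from the last offset to the first. Since a replacement only shifts the text to its right, words that are still to be processed keep their original offsets, so the running $off adjustment is no longer needed. A small array-based stub stands in for pspell, and the closures need PHP 5.3 or later.

```php
<?php
// Hypothetical variant of spell_check_str(): the spell checker is passed in
// as two callables, so the same code can drive pspell, Enchant or a stub.
// Words are processed from the end of the string backwards, so earlier
// replacements never change the offsets of words not yet visited.
function spell_check_str_cb($s, $is_correct, $suggest)
{
    $words = str_word_count($s, 2);
    krsort($words); // highest offset first
    foreach ($words as $pos => $w) {
        if (call_user_func($is_correct, $w)) {
            continue;
        }
        $sug = call_user_func($suggest, $w);
        if ($sug) {
            // replace the word with the first suggestion
            $s = substr_replace($s, $sug[0], $pos, strlen($w));
        }
    }
    return $s;
}

// Stub dictionary standing in for a real spell checker
$known = array('hello', 'world', 'this', 'is', 'fine');
$fixes = array('wrold' => array('world'), 'thsi' => array('this'));
$is_correct = function ($w) use ($known) { return in_array(strtolower($w), $known); };
$suggest = function ($w) use ($fixes) { return isset($fixes[$w]) ? $fixes[$w] : array(); };

echo spell_check_str_cb("Hello, wrold! thsi is fine.", $is_correct, $suggest), "\n";
// prints: Hello, world! this is fine.
```

With pspell, the two callables could simply wrap pspell_check($psl, $w) and pspell_suggest($psl, $w).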
Replacing a misspelled word with the first suggestion offered by the spell checker is not always the best approach, although most of the time it will work reasonably well. Generally speaking, it is better to replace the word with a select box that lets the user choose the correct spelling, or to convert the word into a link that raises a layer with possible suggestions through JavaScript. The function itself would remain much the same, except that the code that deals with replacement would now loop through all the possible suggestions and build an appropriate list of alternatives.
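As a rough sketch of the select-box approach, the hypothetical spell_check_html() below (not part of the original article) copies the text between words unchanged but HTML-escaped, and turns each misspelled word into a select box whose first, preselected option is the original spelling, followed by the suggestions. The field names fix[0], fix[1], and so on are arbitrary choices for a form handler to read back; as before, the spell checker is passed in as two callables, so this assumes PHP 5.3 or later for the closures in the demo.

```php
<?php
// Hypothetical helper: wrap each misspelled word in a <select> listing the
// suggestions, keeping the original spelling as the preselected option.
// $is_correct and $suggest are callables (for example, thin wrappers around
// pspell_check() and pspell_suggest()).
function spell_check_html($s, $is_correct, $suggest)
{
    $out = '';
    $last = 0; // end of the previous word in $s
    $n = 0;    // counter giving each <select> a unique name
    foreach (str_word_count($s, 2) as $pos => $w) {
        // copy the text between the previous word and this one
        $out .= htmlspecialchars(substr($s, $last, $pos - $last));
        $last = $pos + strlen($w);
        if (call_user_func($is_correct, $w)) {
            $out .= htmlspecialchars($w);
            continue;
        }
        $opts = '<option selected="selected">' . htmlspecialchars($w) . '</option>';
        foreach (call_user_func($suggest, $w) as $alt) {
            $opts .= '<option>' . htmlspecialchars($alt) . '</option>';
        }
        $out .= '<select name="fix[' . $n++ . ']">' . $opts . '</select>';
    }
    // copy whatever follows the last word
    return $out . htmlspecialchars(substr($s, $last));
}

echo spell_check_html("a teh b",
    function ($w) { return $w !== 'teh'; },
    function ($w) { return array('the', 'tea'); }), "\n";
```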
Advanced pspell
Now that the basic spell checking functionality is working, let's take a look at some of the more advanced capabilities that the pspell extension offers.
When working with text, you will undoubtedly encounter words that are spelled correctly, but that the spell checker does not recognize. This is a frequent occurrence with industry-specific terminology, names or slang. In those instances, you would probably want the spell checker to ignore the word and not try to suggest alternatives for it. For example, when working with HTML tags or formatting tags such as FUDcode or BBcode inside the text, you could add the tags to the dictionary so that the spell checker simply skips over them, saving you from having to write special handlers for those tags and overcomplicating your code. For this purpose, you can use the pspell_add_to_session() function, which adds a word to the current session and effectively makes the spell checker ignore it.
// Before: will print <BLOCK QUOTE>stuff</BLOCK QUOTE>
echo spell_check_str($psl, '<BLOCKQUOTE>stuf</BLOCKQUOTE>');
// After: will print <BLOCKQUOTE>stuff</BLOCKQUOTE>
pspell_add_to_session($psl, 'BLOCKQUOTE');
echo spell_check_str($psl, '<BLOCKQUOTE>stuf</BLOCKQUOTE>');
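Rather than listing tag names by hand, you could collect them from the text itself. The hypothetical markup_tag_names() below (an illustration, not a pspell function) uses a regular expression to pick out the names of HTML-style and BBcode-style tags, opening or closing, and returns each name once. The pattern is a simplification: it only looks at the tag name and does not validate the rest of the markup.

```php
<?php
// Hypothetical helper: collect the distinct tag names used in HTML-style
// (<tag>) and BBcode-style ([tag]) markup, so they can be added to the
// session ignore list before the text is spell checked.
function markup_tag_names($text)
{
    // group 1 captures HTML tag names, group 2 captures BBcode tag names;
    // for each match, the group that did not participate is an empty string
    preg_match_all('!</?([A-Za-z][A-Za-z0-9]*)|\[/?([A-Za-z][A-Za-z0-9]*)!', $text, $m);
    $names = array_filter(array_merge($m[1], $m[2]), 'strlen');
    return array_values(array_unique($names));
}

print_r(markup_tag_names('<BLOCKQUOTE>stuf</BLOCKQUOTE> [b]bold[/b]'));
```

Each returned name could then be passed to pspell_add_to_session($psl, $tag) before calling spell_check_str().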
Having to add words to the "ignore list" every time you need to spell check some text, however, is rather annoying, not to mention slow. Fortunately, the pspell extension provides a mechanism for storing your ignore list and then quickly importing it back into your script. Not only does this make the code simpler and faster, but it also allows many processes to share a joint ignore list. To create such a list, the pspell resource creation process needs to be changed to allow for the usage of personal dictionary files.
Instead of the pspell_new() function, pspell_config_create() is used to create a new pspell configuration resource. This function takes all of the same arguments as pspell_new(), except the mode parameter. The options regarding the mode and the handling of run-togethers need to be set separately via pspell_config_mode() and pspell_config_runtogether(). The pspell_config_personal() function is then used to specify the path to the custom word list file, which contains the list of words to ignore. If you intend to add to this file, be sure that it is writable by the user PHP runs as. Once these steps are completed, a pspell resource can be created from the configuration resource via the pspell_new_config() function.
// create a new config based on English
$psc = pspell_config_create("en");
// specify the personal dictionary file
pspell_config_personal($psc, "./my.pws");
// create a pspell resource based on the new config
$psl = pspell_new_config($psc);
New words can then be added to the ignore list via the pspell_add_to_personal() function, which is identical to pspell_add_to_session() as far as its parameters are concerned. Once all of the necessary words have been added, they can be appended to the ignore file via the pspell_save_wordlist() function.
// add a word to the personal dictionary file
pspell_add_to_personal($psl, "Ilia");
// save the word list (appends to the existing list)
pspell_save_wordlist($psl);
Unfortunately, the Aspell library does not provide an API for removing or modifying existing entries inside the personal word list file; to make such changes, you will need to write your own function. Fortunately, this is very easy to do, since the file consists of a basic header that specifies how many entries it contains, followed by the entries themselves, one per line. If you edit your custom dictionaries only occasionally, your favorite text editor will do the trick; if you find yourself doing it often, a function such as the following can help:
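function mod_wordlist($file, $word, $rep = '')
{
    $lines = explode("\n", rtrim(file_get_contents($file), "\n"));
    // the first line is the header, e.g. "personal_ws-1.1 en 3"
    $header = explode(' ', array_shift($lines));
    // remove the word (array_diff() drops every matching entry)
    $lines = array_diff($lines, array($word));
    if ($rep != '') { // add the new word
        $lines[] = $rep;
    }
    // the third header field holds the number of entries
    $header[2] = count($lines);
    $body = count($lines) ? implode("\n", $lines) . "\n" : '';
    // update the word list file
    $fp = fopen($file, 'w');
    fwrite($fp, implode(' ', $header) . "\n" . $body);
    fclose($fp);
}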
In some instances, you may want not only to add words to the ignore list, but also to register them as possible alternatives, so that in future runs they can be offered as replacements for typos of words that are not found in the stock dictionary file.
This, too, can be done through the pspell extension, which allows for the creation and usage of personal dictionary files in addition to the base file provided for a particular language. If the base dictionary fails to find a match, pspell will try using the custom file to determine a possible alternative for a misspelled word. As with the word list file, you first need to specify the path to the file where possible alternatives can be found. This is done via the pspell_config_repl() function, which takes a pspell configuration resource as the first parameter and the path to the replacement file as the second. The pspell_store_replacement() function can then be used to add replacement suggestions, and calling pspell_save_wordlist() will now save both the ignore list and the replacement list.
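$psc = pspell_config_create("en");
// specify the personal replacement file
pspell_config_repl($psc, "./my.rep");
// create the pspell resource from the configuration
$psl = pspell_new_config($psc);
// add a replacement: "Iaaaliaa" should be corrected to "Ilia"
pspell_store_replacement($psl, "Iaaaliaa", "Ilia");
// save the replacement list
pspell_save_wordlist($psl);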
If you do not want to keep a replacement beyond the current session, you can call pspell_store_replacement() without specifying a replacement file and without saving the word list. The replacement mechanism itself is intelligent enough that, if the specified string is close enough to the source, it will use the replacement rather than the base dictionary, which may have further matches. Using the above code as an example, if I were to spell check "Iaaliaa", pspell would prioritize "Ilia", which was my replacement for "Iaaaliaa", over the dictionary's suggestion of "Alia". This, of course, means that, when adding replacement pairs, you don't need to add an entry for every possible misspelling of the word. Moreover, the library is intelligent enough to check its main database to see if the replacement is already available and, if it is, it will not add the word to the personal replacement file.
As with the word list file, there is no native function to modify or remove entries from the replacement file. Fortunately, the format of this file is even simpler than the one used by the word list: while it does have a one-line header, the header's contents are not actually used. After the header, the entries are stored one per line in bad_word replacement format and can be easily modified with the following function.
function md_repl($file, $src, $dst, $new_src = '', $new_dst = '')
{
    // remove the existing "src dst" entry
    $data = str_replace("\n{$src} {$dst}\n", "\n", file_get_contents($file));
    // optionally add a new replacement pair
    if ($new_src != '' && $new_dst != '') {
        $data .= "{$new_src} {$new_dst}\n";
    }
    // write the updated replacement file
    $fp = fopen($file, 'w');
    fwrite($fp, $data);
    fclose($fp);
}
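The same line format also makes it easy to read the file back, for instance to show users which replacements have been stored. The hypothetical load_repl() below (not part of pspell) skips the header line and splits each remaining line at its first space into a bad_word => replacement pair.

```php
<?php
// Hypothetical helper: read a personal replacement file into an array of
// bad_word => replacement pairs. The header (first line) is skipped.
function load_repl($file)
{
    $pairs = array();
    $lines = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return $pairs; // unreadable file: nothing to report
    }
    array_shift($lines); // discard the header
    foreach ($lines as $line) {
        // split at the first space: "bad_word replacement"
        $parts = explode(' ', $line, 2);
        if (count($parts) == 2) {
            $pairs[$parts[0]] = $parts[1];
        }
    }
    return $pairs;
}
```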
Beyond pspell
Aside from pspell, another spell checking extension, called Enchant, has recently been made available. This extension can be found in the PECL repository and can be installed by running pear install enchant. It is based on the Enchant library, which provides a common API to multiple spell checking engines, such as Aspell, Ispell, MySpell and so on.
Having a single native API means that you can seamlessly use multiple engines without having to write your
own wrappers around different interfaces. The Enchant
library works directly with each spell checking library,
so there is virtually no speed difference between using
the native interface offered by pspell and the wrapper
offered by Enchant.
The main advantage of Enchant is that it gives you the ability to use different spell checkers that may support other languages or have specific benefits, such as a lower memory footprint (Ispell) or better dictionaries. It also guarantees that you will have access to spell checking support on virtually any system, since at least one spell checking library is always included, although you will need to install Enchant itself.
The Enchant extension API is fairly similar to that of the pspell extension and, for the most part, offers the same capabilities, with a few notable differences. Since the Enchant extension can work with many different spell checking engines, the creation of the spell-checking resource is designed to accommodate the selection of the engine to be used. The first step is to initialize the Enchant broker, which is done through a call to the enchant_broker_init() function. You can then use the resulting resource to determine which spell checking engines are supported by calling the enchant_broker_describe() function, which returns an array of information arrays, one per supported spell checker.
/* Example output */
Array
(
    [0] => Array
        (
            [name] => aspell
            [desc] => Aspell Provider
            [file] => /usr/lib/enchant/libenchant_aspell.so
        )

)
The next step is determining whether a dictionary is available for the language you want to spell check. Unlike pspell, which uses two parameters to select language and locale, Enchant handles both settings with a single parameter of the form language_LOCALE (for example, en_CA for Canadian English). This parameter is then passed to the enchant_broker_dict_exists() function, which returns true if a dictionary is available and false otherwise.
If you have more than one spell checking engine available (which is almost always the case), the Enchant library will automatically choose what it thinks is the best engine for the task, based on the availability of a dictionary and its quality. If the default choice is not to your liking, you can change the order in which the engines are picked by using the enchant_broker_set_ordering() function, which takes the broker resource, the language string and a comma-delimited string listing the engines in the order you want them to be used.
$r = enchant_broker_init();
enchant_broker_set_ordering($r, "en_CA", "aspell,myspell,ispell");
At this point, you can load the actual dictionary file using enchant_broker_request_dict(), which will create a resource that can be used to perform spell-checking operations. To check which spell-checking engine is being used, you can use the enchant_dict_describe() function, which will return an array with information about the selected engine.
if (enchant_broker_dict_exists($r, "en_CA")) {
    $d = enchant_broker_request_dict($r, "en_CA");
    print_r(enchant_dict_describe($d));
}
To determine whether a word is spelled correctly, you can use the enchant_dict_check() function, which returns a Boolean value indicating whether the specified word was spelled correctly. Just like with the pspell extension, spell checking can only be done one word at a time. If the word has a spelling error, you can generate an array of suggestions by calling the enchant_dict_suggest() function.
if (!enchant_dict_check($d, "spel")) {
    $suggestions = enchant_dict_suggest($d, "spel");
}
To simplify the process, Enchant also offers the enchant_dict_quick_check() function, which checks a word and returns a list of possible alternatives if it is not spelled correctly, all in one go. This makes the spell-checking code slightly faster and reduces the amount of PHP code you need to write (which is never a bad thing).
if (!enchant_dict_quick_check($d, "spel", $suggestions)) {
    print_r($suggestions);
}
When the function returns false, indicating that the specified word is misspelled, and a variable has been provided as the optional third argument (passed by reference), the function populates that variable with possible spelling alternatives.
Once you are done working with the spell checker, you should free the dictionary and the broker resources by calling the enchant_broker_free_dict() and enchant_broker_free() functions respectively. While PHP will free those resources automatically on script termination, it is generally better to do so manually, so that memory and dictionary file handles are released as soon as possible.
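Putting these steps together, a complete Enchant session might look like the sketch below. The function name enchant_check_words() is hypothetical: it checks a list of words against one dictionary, collects suggestions for the misspelled ones via enchant_dict_quick_check(), and then frees the dictionary and the broker. The demo call at the end runs only if the enchant extension is loaded. Note that newer PHP releases deprecate the two free functions, since the resources are released automatically once they go out of scope; on those versions the two calls can simply be dropped.

```php
<?php
// Hypothetical sketch of a complete Enchant session: pick a dictionary,
// check a list of words, collect suggestions for the misspelled ones, then
// release the dictionary and the broker. Returns false if no dictionary
// is available for $lang.
function enchant_check_words($lang, array $words)
{
    $broker = enchant_broker_init();
    if (!enchant_broker_dict_exists($broker, $lang)) {
        enchant_broker_free($broker);
        return false;
    }
    $dict = enchant_broker_request_dict($broker, $lang);
    $report = array(); // misspelled word => list of suggestions
    foreach ($words as $w) {
        if (!enchant_dict_quick_check($dict, $w, $sug)) {
            $report[$w] = $sug;
        }
    }
    enchant_broker_free_dict($dict);
    enchant_broker_free($broker);
    return $report;
}

if (extension_loaded('enchant')) {
    $words = str_word_count('Mistakes hapen without a spel cheker', 1);
    print_r(enchant_check_words('en_CA', $words));
}
```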
The Enchant extension also supports ignore lists, which allow certain words to be skipped by the spell checker. As with pspell, you can use both session-based and file-based ignore lists, and the latter can be shared by multiple processes. Session-based ignore lists are handled by two functions: enchant_dict_is_in_session(), which checks whether a particular word is already being ignored, and enchant_dict_add_to_session(), which adds a word to the session's ignore list. Since the add-to-session function does not return a status indicator, you should use enchant_dict_is_in_session() to verify that the word was, in fact, added successfully. Keep in mind that not all spell-checking engines support this functionality, so this may not always be possible.
enchant_dict_add_to_session($d, "Ilia");
if (!enchant_dict_is_in_session($d, "Ilia")) {
    exit("Cannot add to session ignore list.\n");
}
To use a more permanent, file-based ignore list, you first need to establish the path to your ignore file by calling the enchant_broker_request_pwl_dict() function. The file must already exist, but can be empty; if it does not exist or is not accessible, the function will fail. To add entries to the file, you can use the enchant_dict_add_to_personal() function. Like its session counterpart, this function does not return a success indicator, and enchant_dict_is_in_session() should be used to confirm that the word has actually been added.
$r = enchant_broker_init();
$d = enchant_broker_request_pwl_dict($r, "./my.dict");
enchant_dict_add_to_personal($d, "BLOCKQUOTE");
if (!enchant_dict_is_in_session($d, "BLOCKQUOTE")) {
    exit("Cannot add to personal ignore list.\n");
}
Because the file name you provide is a generic holder for the personal word list, the Enchant library will automatically create a file specific to the spell-checking engine as well. For example, if the Aspell backend is being used, Enchant will also create my.pws inside the same directory as my.dict. This is important to keep in mind when adding new words to the list, since you will need to ensure that not only my.dict but also the directory it is in is writable.
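A simple pre-flight check along these lines could be run before adding words. The hypothetical pwl_is_writable() below (not an Enchant function) returns true only if the word list file already exists, is writable, and sits in a writable directory, mirroring the requirements described above.

```php
<?php
// Hypothetical pre-flight check before adding words to a personal word
// list: the file must exist and be writable, and so must its directory,
// because Enchant may create an engine-specific file (such as my.pws for
// Aspell) next to it.
function pwl_is_writable($path)
{
    return is_file($path) && is_writable($path) && is_writable(dirname($path));
}

var_dump(pwl_is_writable('./my.dict'));
```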
Adding replacement alternatives is also possible; however, unlike with pspell, these are always session-specific, and there is no way to save them for later re-use. This is done through the enchant_dict_store_replacement() function, which takes a source string and a possible replacement that can be used for substitution.
enchant_dict_store_replacement($d, "AAliaaa", "Ilia");
enchant_dict_quick_check($d, "AAliaaa", $sug);
echo $sug[0] . "\n"; // will print Ilia
Mistakes Hapen Without A Spel Cheker
As I hope you have had the opportunity to discover, spell checking text strings from your PHP scripts is not at all difficult. For the most part, the biggest difficulty lies not in checking the text, but in breaking it down into individual words that can then be validated. Fortunately, ever since the introduction of the str_word_count() function, this has become a fairly trivial process.
The functionality offered by a spell checker has many uses, even in situations where users do not input long text strings. For example, in a search engine a spell checker can be used to validate keywords. You can also use it inside a PHP 404 handler (which I discussed in the March 2004 issue of php|architect) to check for typos and automatically correct them, taking the user to the right page without any manual intervention or extra steps. Ultimately, a spell checker is a powerful tool that can be applied to many problems with little effort, yet can make a big impact on the quality of your applications.
About the Author ?>
To Discuss this article:
Ilia Alshanetsky is an active member of the PHP development team and the current release manager of PHP 4.3.X. Ilia is also the principal developer of FUDforum (http://fudforum.org), an open source bulletin board, and a contributor to several other projects.