Focus on Data: XMLPull as an Alternative to DOM & SAX



INDEX

php|architect
The Magazine For PHP Professionals

TABLE OF CONTENTS

Features
The Anatomy of a Hit: An Advanced PHP & MySQL Hit Counter, by John R. Zaleski, Ph.D.
Solving the Unicode Puzzle, by Michael Toppa
XMLPull: An Alternative to DOM & SAX, by Markus Nix
More on Advanced Sessions and Authentication in PHP5, by Ed Lecky-Thompson

Departments
Editorial: You Know Nothing
What's New!
Test Pattern: The Never Ending Backlog, by Marcus Baker
Product Review: Jaws 0.5: Just When You Thought it was Safe to Go Back in the Water, by Peter B. MacIntyre
Security Corner: Persistent Logins
exit(0);: Oh No, Not Again!, by Marco Tabini


EDITORIAL
You Know Nothing
Software development is humbling. Just when you think you've got a solid handle on every last (important) bit of technology you need to complete the project at hand, you're often slapped in the face with the news that you're just plain wrong. This news can be both frustrating and encouraging (at the same time, believe it or not).

Let me set the scene. Your team has been commissioned with adding a new section to your corporate intranet. In the course of the addition, you adopt a new technology of some sort. Perhaps this is a new database abstraction layer, or a different manner of handling HTML forms. It could be anything; it doesn't really matter. Your team has worked on this new module for two months. You've put all of your collective knowledge and experience into the project. The launch date is in a couple of days, and you're actually going to make your deadline.

So, this sounds pretty good so far; what could go wrong? Perhaps one of the directors is about to walk in with a must-have feature that needs to be in the next release, and will disrupt your schedule? Sure. This happens all the time, but it's not the scenario I'm thinking of; that's just frustrating, and rarely the least bit encouraging. The bad situation that I'm thinking of is (oddly) free of managerial influence.

This new technology that you've adopted is really great. It has a few problems, but you've managed to work around them. All things considered, it's saved you many hours in the course of the past few weeks, and you've been bragging about it to your developer friends who work at different companies.

Then, in the course of your daily, duly diligent reading of various PHP news sources, you discover a brand-new, just-released-yesterday extension that could replace this other new technology you've already adopted. Not only is it a suitable replacement, but it solves all of the problems you had to work around, and also opens the door to new possibilities that you didn't even consider.

It's frustrating because you're about to release a critical project that encompasses technology that you've just discovered is inferior. But it's encouraging because you're now awaiting the day you're allowed to rip out all of that legacy (but, ironically, not-yet-released) code and employ a superior product.

So, what's my point? Simple: I know nothing. What I think I know is only temporary, and could be supplanted at any moment. My life as a developer is a constant journey of staying on top of things, and no matter how much I think I "have it covered," there's always something new about to appear on the weblog, newsgroup, or source repository of tomorrow.

I hope the articles in this issue open your eyes to new ideas, especially the XMLPull article, which covers what I think is pretty sweet new (well, newer) technology. I also hope that it's not too late to incorporate these ideas into your current (or next) project.
May 2005

PHP Architect

www.phparch.com
6
php|architect
Volume IV - Issue 5
May, 2005
Publisher
Marco Tabini
Editor-in-Chief
Sean Coates
Editorial Team
Arbi Arzoumani
Peter MacIntyre
Eddie Peloke
Graphics & Layout
Aleksandar Ilievski
Managing Editor
Emanuela Corso

News Editor
Leslie Hill

Authors
Marcus Baker, Ed Lecky-Thompson,
Peter B. MacIntyre, Chris Shiflett,
John R. Zaleski, Ph.D., Michael Toppa,
Markus Nix
php|architect (ISSN 1709-7169) is published twelve times a year by
Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road,
Toronto, ON M5M 4N5, Canada.
Although all possible care has been placed in assuring the accuracy of
the contents of this magazine, including all associated source code, list-
ings and figures, the publisher assumes no responsibilities with regards
of use of the information contained herein or in all associated material.
Contact Information:
General mailbox:
Editorial:
Subscriptions:
Sales & advertising:
Technical support:
Copyright © 2003-2005 Marco Tabini &
Associates, Inc. — All Rights Reserved
EDITORIAL RANTS
Solar 0.2.0
paul-m-jones.com announces the release of Solar 0.2.0.
What is it? According to solarphp.com: "Solar is a simple object library and application repository (that is, a combined class library and application component suite) for PHP5."
"Solar provides simple, easy-to-comprehend classes and components for the common aspects of web-based rapid application development, all under the LGPL."
"Solar is designed for developers who intend to distribute their applications to the world. This means the database driver functions work exactly the same way for each supported database. It also means that localization support is built in from the start."
Get all the latest info from solarphp.com.
phpBB 2.0.14
The phpBB Group announces the release of phpBB 2.0.14, the "We know we are (not) furry" edition. "This release addresses some bugfixes as well as fixing some minor non-critical security issues. All issues not reported to us before being released are not credited to the founder, as usual."
"As with all new releases, we urge you to update as soon as possible. You can, of
course, find this download on our downloads page ( />loads.php). As usual, three packages are available to simplify your update."
"The Full Package contains entire phpBB2 source and English language package."
For more information visit:

What’s New!
Vogoo PHP API v0.8.2
Vogoo-API.com is happy to announce the release of Vogoo PHP API 0.8.2.
Vogoo-API.com announces: Vogoo PHP API v0.8.2 is a free PHP API licensed under the terms of the GNU GPL. With Vogoo PHP API, you can easily and freely add professional collaborative filtering features to your Web Site.
v0.8.2 features
• Handles all member/product votes (available since v0.8)
• Fast computation of similarities between members (available since v0.8)
• One-to-one product recommendations (available since v0.8)
• Ability for members to specify when they are not interested in a product recommendation
Planned features for future versions
• New engine based on product recommendations that gives better performance when little information is available on the member
• Real-time targeted ads
• Handles multiple product categories
• Collaborative filtering features available for non-member visitors
• Administration tool
• Engine for 'related sales'
Check out Vogoo-API.com for all the latest info.
The Zend PHP Certification Practice Test Book is now available!
We're happy to announce that, after many months of hard work, the Zend PHP
Certification Practice Test Book, written by John Coggeshall and Marco Tabini, is now
available for sale from our website and most book sellers worldwide!
The book provides 200 questions designed as a learning and practice tool for the Zend PHP Certification exam. Each question has been written and edited by four members of the Zend Education Board, the very same group who prepared the exam. The questions, which cover every topic in the exam, come with a detailed answer that explains not only the correct choice, but also the question's intention, pitfalls and the best strategy for tackling similar topics during the exam.
For more information, visit
http://www.phparch.com/cert/mock_testing.php
Check out some of the hottest new releases from PEAR.
MDB2_Schema 0.2.0
PEAR::MDB2_Schema enables users to maintain RDBMS-independent schema files in XML that can be used to create, alter and drop database entities and insert data into a database. Reverse engineering database schemas from existing databases is also supported.
The format is compatible with both PEAR::MDB and Metabase.
MDB2 2.0.0beta4
PEAR MDB2 is a merge of the PEAR DB and Metabase php database abstraction layers.
Note that the API will be adapted to better fit with the new PHP 5-only PDO before the first stable release.
It provides a common API for all supported RDBMS. The main difference to most other DB abstraction packages is that MDB2 goes
much further to ensure portability. Among other things, MDB2 features:
• An OO-style query API
• A DSN (data source name) or array format for specifying database servers
• Datatype abstraction and on demand datatype conversion
• Portable error codes
• Sequential and non sequential row fetching as well as bulk fetching
• Ability to make buffered and unbuffered queries
• Ordered array and associative array for the fetched rows
• Prepare/execute (bind) emulation
• Sequence emulation
• Replace emulation
• Limited Subselect emulation
• Row limit support
• Transactions support
• Large Object support
• Index/Unique support
• Module Framework to load advanced functionality on demand
• Table information interface
• RDBMS management methods (creating, dropping, altering)
• RDBMS independent xml based schema definition management
• Reverse engineering schemas from an existing DB (currently only MySQL)
• Full integration into the PEAR Framework
• PHPDoc API documentation
Currently supported RDBMS:

• MySQL (mysql and mysqli extension)
• PostgreSQL
• Oracle
• Frontbase
• Querysim
• Interbase/Firebird
• MSSQL
• SQLite
• Others soon to follow.
Cache 1.5.5RC1
With the PEAR Cache, you can cache the result of certain function calls, as well as the output of a whole script run, or share data
between applications.
DB_DataObject_FormBuilder 0.14.0
DB_DataObject_FormBuilder will aid you in rapid application development using the packages DB_DataObject and HTML_QuickForm.
To get a quick but working prototype of your application, simply model the database, run DataObject's createTable script over it and write a script that passes one of the resulting objects to the FormBuilder class. The FormBuilder will automatically generate a simple but working HTML_QuickForm object that you can use to test your application. It also provides a processing method that will automatically detect if an insert() or update() command has to be executed after the form has been submitted. If you have set up DataObject's links.ini file correctly, it will also automatically detect if a table field is a foreign key and will populate a selectbox with the linked table's entries. There are many optional parameters that you can place in your DataObjects.ini or in the properties of your derived classes to fine-tune the form generation, gradually turning the prototypes into fully-featured forms, and you can take control at any stage of the process.
Net_GeoIP 0.9.0alpha1
A library that uses Maxmind's GeoIP databases to accurately determine geographic location of an IP address.
Looking for a new PHP Extension? Check out some of the latest offerings from PECL.
archive 0.2
The archive extension allows reading and writing tar and cpio archives using libarchive
( />
xmlReader 1.0.1
This extension wraps the libxml xmlReader API. The reader acts as a cursor going forward on the document stream and stopping at
each node in the way. xmlReader is similar to SAX though uses a much simpler API.
runkit 0.1.0
Replace, rename, and remove user defined functions and classes. Define customized superglobal variables for general purpose use.
Execute code in restricted environment (sandboxing).
mqseries 0.8.0
This package provides support for IBM Websphere MQ (MQSeries).
colorer 0.2
Colorer take5 is a syntax highlighting and text parsing library that provides text parsing services in host editor systems in real time and transforms the results into colored text. For details, see />
While colorer is primarily designed for use with text editors, it can also be used for non-interactive syntax highlighting, for example, in web applications. This PHP extension provides basic functions for syntax highlighting.
CONFERENCES
ApacheCon Europe 05
ApacheCon.com announces:
"ApacheCon Europe, the official conference of the Apache Software Foundation (ASF) will be held July 18-22 in Stuttgart, Germany.
For the fourth consecutive year, half- and full-day pre-conference tutorials offer real world insight, techniques, and methodologies
pivotal to the increasing demand for Open Source software. Topics include Scalable Internet Architectures, Web Services, PHP,
mod_perl, Apache HTTP Server, Java, XML, Subversion, and SpamAssassin.
The three main conference days offer a wide range of beginner, intermediate and advanced sessions. ApacheCon attendees have
more than 70 sessions to choose from, to learn firsthand the latest developments of key Open-Source projects including the Apache
HTTP Server, the world's most popular web server software.
With plenty of room for networking and peer discussions, attendees can meet ASF Members and participants during the ApacheCon Expo, evening events, Birds Of a Feather sessions and a number of informal social gatherings."
For more information visit:
/>
VS.Php 1.1.1
Jcx.Software brings news of the immediate availability of VS.Php version 1.1.1. This update adds support for PhpDoc commenting, secure FTP deployment capabilities and many bug fixes.
PhpDoc is a powerful feature of PHP that allows the developer to add comments to the source code that can be used to generate documentation. VS.Php uses this information to provide better IntelliSense content. For instance, VS.Php is able to parse those comments to determine the type of a particular variable. IntelliSense uses this information to better help the developer. This update also adds support for the secure FTP protocol for deploying applications through a secure connection.
For information or to download VS.Php, visit:
/>
PHPEdit 1.2
PHPEdit proudly announces the release of the latest version, PHPEdit 1.2.
The next major version of PHPEdit is finally available for download. This version includes lots of changes in its internals, and adds new, powerful features to the IDE, like complete PHP5 support, real-time syntax checking, jump to declaration, SimpleTest integration, new document templates, a phpDocumentor Wizard and lots of enhancements to existing tools like CodeHint, CodeInsight and CodeBrowser.
This version is available for free to all our customers. You can download it and test it for 30 days. You can also buy a license to avoid the time limit.

To grab the latest version, visit
/>

FEATURE
The Anatomy of a Hit
An Advanced PHP & MySQL Hit Counter
by John R. Zaleski, Ph.D.

The combined approach of capturing web page access and charting the results provides a simple standalone capability for graphically displaying hit counts to a web site. It requires only a basic working knowledge of PHP and MySQL, yet provides a basic model for expanding and developing a much more sophisticated counter. Furthermore, the methodology for charting the hit count data can be decoupled from basic web page access counting for use in academic, business, or other types of data mining applications where data charting and mining provide a unique way of comparing and contrasting data as they change over time.

REQUIREMENTS
PHP: 5.0 or greater (5.0.4 available)
OS: Win2K Prof, Win2K Advanced Server, WinXP SP1/SP2
Other Software: MySQL version 4.0 or greater (4.1 available)
Code Directory: hitcounter

RESOURCES
URL: http://www.tizag.com/mysqlTutorial/
URL: http://php.resourceindex.com/Complete_Scripts/Access_Counters/Text_Based/

The following methodology was motivated by a request from a client of mine who asked me to provide a web page access counter for their main corporate web site. A condition of the deal, though, was that they did not want to show the actual number of accesses publicly on the web site itself. Instead, they wanted to keep track of this data privately.

Their reasons for omitting a public counter were in keeping with the idea that they did not want to broadcast the activity on their site to all visitors, and, in keeping with the tone of their message, did not desire to display a typical web page access counter on their site. Instead, they wanted an access counter that would provide them with a means of comparing and contrasting the number of accesses from day to day so that they could analyze advertising impacts on the number of visitors who were hitting their site.

As you may know, numerous types of Web counters exist that are wide ranging in their capabilities and styles. However, I wanted to tailor a solution for my client that would keep track of the number of accesses to their site, while providing a tool to view these data in a manner that was meaningful and comparative. The output would provide an at-a-glance summary that would allow my client to assess the effectiveness of advertising campaigns with respect to changes in site activity.

What developed was a custom hit counter which continues to evolve over time; an example screenshot can be seen in Figure 1. The benefits of this hit counter are not so much in its uniqueness as in the possibilities it offers to the average PHP developer who is interested in evolving their skills in the domain of PHP, MySQL, and user interface design.

The counter and graphing methodology I provide here are very simple to understand and can be modified and used for many applications, even beyond web page access counting.

Calling the Hit Counter
The visual hit counter methodology consists of two separate pieces of code: one for incrementing hit count statistics on a web page, and another for analyzing and mining those statistics for relevant value. The decision to separate these two sets of functionality is somewhat based on heuristics, but is born of logic: by separating the processing from the actual hit counting, we remove the potential performance impact associated with database access for each visit to a web page. Instead, we assign the analytical data mining of the statistics themselves to a web site dedicated to their study. This keeps the load time of the original web site low so that users are not affected.
To implement the data collection part of the process, the initial step in any web page involves incorporating the following lines of code:
<!-- Add the client hit counter -->
<?php include "./hc.php"; ?>
<!-- End body tag -->
The hc.php file is then included in the web page, at the desired location. Those wishing to make use of this methodology need only include the above code segment in their PHP page (once all supporting files have been uploaded to the server), and the hit counter becomes operational.
The hc.php code contains the logic to open a data file (hitcounter.dat), increment a counter, and store various other statistics to the opened file each time a web page with the preceding include statement is encountered.
We begin the code in hc.php by assigning the name of the data file to the variable $COUNT_FILE:
$COUNT_FILE = "hitcounter.dat";
//
if ( filesize( $COUNT_FILE ) > 0 ) {
    $contents = fread( $fp, filesize( $COUNT_FILE ) );
    // …
}
If the file referred to by $COUNT_FILE exists and already contains data, we can assume the contents are the results of previous page accesses. So, we read the contents of the entire file into the $contents variable, extract the last counter value, increment it by 1, and append the new value to the hitcounter.dat file.
If this is the first time the web page has been accessed, the file is empty, so we write the first record to it. (In Listing 1 the file itself must already exist; if it does not, the script reports that it cannot find it.) In addition to the current counter value, I also write the date and time stamp; this is to facilitate the data mining process. The hitcounter.dat file has the following format:
[1] 23 14 45 PM Wednesday July 28th 2004 1
[2] 06 19 09 AM Thursday July 29th 2004 2
[3] 08 29 13 AM Thursday July 29th 2004 3
Note that much more information can be added (such as the identity of those accessing the web page). However, the corresponding fields would then need to be added to the structure of the hit count listing. The code fragment responsible for writing the output listing above is:
fwrite( $fp, "[" . $counter . "] " . date("H i s A l F dS Y") . " " . $counter . " \n" );
The entire code listing for the hit counter is contained in Listing 1. It is important to set the permissions so that the hc.php script can read and write files in the directory in which it is placed. If this is not done properly, the script will be unable to write to the hitcounter.dat file.
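The read, increment, and append cycle described above can also be sketched in a more compact form. This is only an illustration, not the code from Listing 1: the function names last_counter(), next_record(), and record_hit() are hypothetical, and the file lock is an addition that the article does not use.

```php
<?php
// Sketch only: these helper names are hypothetical and do not appear in hc.php.

// Return the counter stored in the last "[n]" marker, or 0 if none is found.
function last_counter($contents)
{
    if (preg_match_all('/\[(\d+)\]/', $contents, $matches) > 0) {
        return (int) end($matches[1]);
    }
    return 0;
}

// Build the next record in the hitcounter.dat layout shown above:
// "[n] HH MM SS AM/PM Weekday Month DDth YYYY n".
function next_record($contents, $timestamp)
{
    $counter = last_counter($contents) + 1;
    return "[" . $counter . "] " . date("H i s A l F dS Y", $timestamp)
         . " " . $counter . " \n";
}

// Append one record to $file. The exclusive lock (an assumption, not part
// of the article) keeps two simultaneous requests from reading the same value.
function record_hit($file)
{
    $fp = fopen($file, "a+");
    if ($fp === false) {
        return false;
    }
    flock($fp, LOCK_EX);
    rewind($fp);                              // "a+" may start at the end
    $contents = stream_get_contents($fp);
    fwrite($fp, next_record($contents, time()));  // writes always append
    flock($fp, LOCK_UN);
    fclose($fp);
    return true;
}
```

A page would call record_hit("hitcounter.dat") at the point where hc.php is included. Unlike Listing 1, this sketch creates the data file if it is missing, which may or may not be what a given site wants.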
Plotting Preliminaries

Plotting preparation is accomplished using the siteIndex.php file (Listing 2). As I explained earlier, I opted to create the hit counter method independently of the plotting code to decouple the hit counter method from the database. This serves several purposes. First, it allows those interested in just a plain hit counter to implement it without requiring them to master the techniques of database connectivity. Second, it takes performance considerations into account by avoiding database access during the counter incrementing process. Third, and finally, it enables the user to alter and improve the plotting routine independently of the hit counter, so that accurate statistics can continue to be kept while the index page stays intact.
You will note that in the hit counter method I developed in Listing 1, there is no direct output of the number of hits to the Web page. This is a matter of choice for the Web page owner. Some individuals perceive that, if the count is too low, this can bode poorly for return visits, while others believe that the hit count statistic may be seen as inappropriate or tacky for the particular site. I manage several sites for local businesses, and I have experienced both kinds of sentiments from the business owners. Thus, by creating this separate method, and only publishing the link to a site that is not directly associated with the web index page and its child links, the business owners can privately view the web page statistics to determine how many accesses have been made. They can also view when these hits occurred in the course of the past weeks and months, and correlate the data to external events (for instance, during periods of specific types of advertising).
Figure 1
Updating the Database
I begin by opening a connection to the database and entering all existing data from the hit counter method into it. This is accomplished in the siteIndex.php code:
$conn = mysql_connect("localhost", "root", "admin");
In the examples I provide, everything is run on the local machine (localhost), and I have set the username and password to root and admin, respectively. The name of the database instance can be arbitrarily defined by the user; I chose sitestats. Developers have their own naming conventions, and I'm merely giving you some insight into my own. So, selecting the appropriate database is accomplished via the following statement:
mysql_select_db("sitestats", $conn)
    or die("Could not open sitestats: " . mysql_error());
The "or die" clause allows me to catch any errors and print them for debugging purposes, should a connection problem arise. I now read the table of site entries and find the last value so that it can be updated with the latest data:
$table = "sitevisits";
$check = "select * from $table";
$qry = mysql_query($check)
    or die("Could not match data because " . mysql_error());
$nRows = mysql_num_rows($qry);
This query allows me to determine the current number of rows contained in the table; this will be necessary later. In addition, I load an array with the data that I just read. To plot the data, I need it in a form that I can manipulate in memory:
while ( $newArray = mysql_fetch_array($qry) ) {
    $visits = $newArray['visits'];
    if ( strcmp( $debug, "yes" ) == 0 )
        echo " maxVisits = " . $maxVisits .
             " value from db = " . $visits . "<br>";
    if ( $visits > $maxVisits ) $maxVisits = $visits;
}
From this segment, we determine the number of visits and adjust our old maximum to reflect the current value. The array variable, $visits, now contains all of the data from the database. Therefore, $visits is a multi-dimensional array that allows us to keep track of all of this data. The time has come to read the hitcounter.dat file and determine what's new so that this can be added to the database, and the $visits array. The hitcounter.dat file is opened and its records are stored in a new temporary array, $fileElements:
$data = file($fileName);
foreach ( $data as $column => $val )
{
    // Skip blank lines; split every other record on spaces
    if ( strcmp( trim($val), "" ) != 0 )
    {
        $fileElements[$column] = explode(" ", $val);
    }
}
The explode function is very useful in expanding the elements read from the data file into separate fields that are then assigned to the $fileElements array. This is simple because the field delimiter in the hitcounter.dat file is the space character.

Listing 1
<?php
// hc.php

$debug = 0;

// Remote address and host of the visitor (not used further in this listing)
$ra = isset($_SERVER["REMOTE_ADDR"]) ? $_SERVER["REMOTE_ADDR"] : "";
$rh = isset($_SERVER["REMOTE_HOST"]) ? $_SERVER["REMOTE_HOST"] : "";

$COUNT_FILE = "hitcounter.dat";

$counter = 0; $start = 0; $stop = 0;

if (file_exists($COUNT_FILE)) {
    $fp = fopen($COUNT_FILE, "r");

    // If file exists, and has content, read that content,
    // extract the counter value, add 1 to it, and re-write
    // to the counter data file.

    if ( filesize( $COUNT_FILE ) > 0 ) {
        $contents = fread( $fp, filesize( $COUNT_FILE ) );
        if ( $debug == 1 ) echo $contents;

        $stringlength = strlen($contents);
        fclose( $fp );

        $fp = fopen($COUNT_FILE, "a");

        $i = 0;

        // Scan the whole file; $start and $stop end up just past the
        // last "[" and "]", that is, around the most recent counter value.
        while ( $i < $stringlength )
        {
            $char = $contents[$i];
            $i = $i + 1;

            if ( $char == "[" ) {
                if ( $debug == 1 )
                    echo "<br> Found [ " . $i . "<br>";
                $start = $i;
            }
            if ( $char == "]" ) {
                if ( $debug == 1 ) echo " Found ] " . $i . "<br>";
                $stop = $i;
            }
        }

        if ( $debug == 1 )
            echo " start: " . $start . "<br>";
        if ( $debug == 1 )
            echo " stop: " . $stop . "<br>";

        // Take only the digits between the last "[" and "]"
        $previous_count = substr( $contents, $start,
            $stop - $start - 1 );

        if ( $debug == 1 )
            echo " Previous Count: " . $previous_count
                . " <br>";

        $counter = 1 + (int) $previous_count;

        if ( $debug == 1 )
            echo " Counter: " . $counter . "<br>";

        fwrite( $fp, "[" . $counter . "] " .
            date("H i s A l F dS Y") . " " . $counter
            . " \n" );

        fclose( $fp );
    } // endif

    // If file exists, but has no content, this means it is
    // the first time the counter is being used. In this
    // instance, write the counter number and the date/time
    // stamp to the hit counter file, with the counter
    // number = 1.

    if ( filesize( $COUNT_FILE ) == 0 ) {
        fclose( $fp );

        $fp = fopen($COUNT_FILE, "a");

        $counter = 1;

        // Use the same record layout as the branch above
        if ( $debug == 1 ) echo "[" . $counter . "] "
            . date("H i s A l F dS Y");

        fwrite( $fp, "[" . $counter . "] "
            . date("H i s A l F dS Y") . " " . $counter . " \n" );
        fclose( $fp );

    } // end if filesize = 0
} else {
    echo "Can't find file, check '\$file'<BR>";
}
?>
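To make the explode step concrete, here is a minimal sketch that splits one hitcounter.dat record into named fields. The function parse_hit_line() is hypothetical. Positions 1 (hour), 5 (day of week), and 9 (visits) are the ones the article's code uses; the remaining positions are inferred from the record layout and the sitevisits column names, so treat them as assumptions.

```php
<?php
// Sketch only: parse_hit_line() is a hypothetical helper, not from Listing 2.
// Splits a record such as
//   "[2] 06 19 09 AM Thursday July 29th 2004 2 \n"
// on spaces and names the fields after the sitevisits columns.
function parse_hit_line($line)
{
    $parts = explode(" ", trim($line));
    if (count($parts) < 10) {
        return null; // blank or malformed line
    }
    return array(
        'hour'       => $parts[1],
        'minute'     => $parts[2],       // inferred position
        'second'     => $parts[3],       // inferred position
        'DayofWeek'  => $parts[5],
        'Month'      => $parts[6],       // inferred position
        'DayofMonth' => $parts[7],       // inferred position
        'Year'       => $parts[8],       // inferred position
        'visits'     => (int) $parts[9],
    );
}
```

Position 0 holds the bracketed counter and position 4 the AM/PM marker; neither is needed here because the counter is repeated at the end of the record and the hour is already in 24-hour form.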
The next step in the process involves locating the current position in the database and determining how many new data points need to be added. Then, we locate where to begin entering data into the database table. This is accomplished by reading the hitcounter.dat file and comparing the maximum number of visits last recorded in the database with the associated visit data contained in the data file. When the two are equal, we have reached the point in the data file at which the last entry was made to the database. Any data beyond this point represents new information that must be inserted into the database instance. This defines the starting index for future inserts into the database, which we fill using a for loop as follows:
for ( $k = $startIndex + 1; $k < sizeof($data) - 1; $k++ )
{
    if ( strcmp( $fileElements[$k][5], $fileElements[$k+1][5] ) != 0 )
    {
        $hour = $fileElements[$k][1];
        //
        $visits = $fileElements[$k][9];
        $sql = "insert into sitevisits (visit_ID, hour, minute,
                second, DayofWeek, Month, DayofMonth, Year, visits)
                values ('', '$hour', '$minute', '$second',
                '$DayofWeek', '$Month', '$DayofMonth', '$Year',
                '$visits')";
        //
    }
}
The code snippet above is contained in Listing 2; it inserts the new data into the sitevisits table. The starting point for the inserts is $startIndex+1, which marks where the new data begins in the hitcounter.dat file, and the ending point is sizeof($data), that is, the total number of records contained within the hitcounter.dat file. The fields entered into the database are truncated in the code segment above to save space. However, the fields include $hour, $minute, $second, $DayofWeek, $Month, $DayofMonth, $Year, and $visits.
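The search for the starting index and the selection of new rows can be sketched without a database, as below. The function names findStartIndex and newDailyRows and the sample records are hypothetical; the logic follows Listing 2, which keeps a record only when its day-of-week field (index 5) differs from the next record's, so one row per day is stored. One deliberate difference: the counts are compared as integers rather than with strcmp, which tolerates a trailing newline on the last field.

```php
<?php
// Return the index of the last record whose visit count (field 9) matches
// the maximum already stored in the database, or 0 if nothing matches.
function findStartIndex(array $fileElements, $maxVisits)
{
    $startIndex = 0;
    foreach ($fileElements as $k => $fields) {
        if (isset($fields[9]) && (int) $fields[9] === (int) $maxVisits) {
            $startIndex = $k;
        }
    }
    return $startIndex;
}

// Collect the last record of each day after $startIndex. The final record
// is left out here because Listing 2 inserts it in a separate step.
// Assumes the records are indexed 0..n-1 without gaps.
function newDailyRows(array $fileElements, $startIndex)
{
    $rows = array();
    $n = count($fileElements);
    for ($k = $startIndex + 1; $k < $n - 1; $k++) {
        if ($fileElements[$k][5] !== $fileElements[$k + 1][5]) {
            $rows[] = $fileElements[$k];
        }
    }
    return $rows;
}

// Hypothetical records; fields 0 and 4 are placeholders.
$fileElements = array(
    array("x", "10", "00", "00", "x", "Mon", "May", "2", "2005", "100"),
    array("x", "11", "00", "00", "x", "Mon", "May", "2", "2005", "101"),
    array("x", "09", "00", "00", "x", "Tue", "May", "3", "2005", "102"),
    array("x", "12", "00", "00", "x", "Tue", "May", "3", "2005", "103"),
    array("x", "08", "00", "00", "x", "Wed", "May", "4", "2005", "104"),
);

$startIndex = findStartIndex($fileElements, 101);
$rows = newDailyRows($fileElements, $startIndex);
echo "start index: ", $startIndex, ", rows to insert: ", count($rows), "\n";
```

With this sample, the database already holds the count 101, so the search stops at record 1 and only the last Tuesday record (count 103) is selected for insertion.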
Querying Results
Listing 3 is what I'll call queryDB.php, one of the plotting workhorses of the methodology. I start by performing a general query and fetching all data within the database:
$table = "sitevisits";
$check = "select * from $table";
$qry = mysql_query($check)
    or die ("Could not match data because " . mysql_error());
Then, I assign these data to an array:
$i = 0;
while ($newArray = mysql_fetch_array($qry) ) {
    $dow = $newArray['DayofWeek'];
    $mo = $newArray['Month'];
    $dom = $newArray['DayofMonth'];
    $yr = $newArray['Year'];
    $vis = $newArray['visits'];
    $dbElements[$i][0] = $dow;
    $dbElements[$i][1] = $mo;
    $dbElements[$i][2] = $dom;
    $dbElements[$i][3] = $yr;
    $dbElements[$i][4] = $vis;
    $i++;
}
These elements are to be used in the plotting process. The actual plotting takes place within the queryDB.php code using the values stored by the $dbElements[$i][4] = $vis; assignment. Quite simply, I arbitrarily define a field width (in pixels) that sets the span of the plotting window. I selected 400 pixels so that the bar chart does not take over the entire screen. Furthermore, I scale the individual bars to the current maximum value contained within the database. This is logical because over time, as more data accumulates, the overall maximum number of visits increases. It is therefore necessary to scale all data by the new maximum value so that earlier hit count recordings will display proportionally with respect to one another. Furthermore, since the maximum number of visits is
May 2005, PHP Architect, www.phparch.com
FEATURE: The Anatomy of a Hit: An Advanced PHP & MySQL Hit Counter
Listing 2
<?php
// siteindex.php

$debug = "no";

//********************************
// Read the hitcounter file
//********************************

$fileName = " /hitcounter.dat";

//**************************************
// Open the db connection to sitestats
// and look at the last entry
//**************************************

$conn = mysql_connect("localhost", "root", "admin");
if ( ! $conn )
    die("Could not connect to MySQL" );

mysql_select_db("sitestats",$conn)
    or die("Could not open sitestats: " . mysql_error());

//******************************************
// Select the last visit entry in the table
//******************************************

$table = "sitevisits";
$check = "select * from $table";
$qry = mysql_query($check)
    or die ("Could not match data: " . mysql_error());
$nRows = mysql_num_rows($qry);
$maxVisits = 0;

while ($newArray = mysql_fetch_array($qry) ) {
    $visits = $newArray['visits'];

    if ( strcmp( $debug, "yes" ) == 0 )
        echo " maxVisits = " . $maxVisits
            . " value from db = " . $visits . "<br>";

    if ( $visits > $maxVisits ) $maxVisits = $visits;
}

if ( strcmp( $debug, "yes" ) == 0 )
    echo " max visits: " . $maxVisits . "<br>";

mysql_close($conn);

if ( strcmp( $maxVisits, "" ) == 0 ) $maxVisits = 0;

if ( strcmp( $debug, "yes" ) == 0 )
    echo " Maximum number of visits stored in database: "
        . $maxVisits . "<br>";

//***************************************
// Open the db connection to sitestats
// and prepare to insert data
//***************************************

$conn = mysql_connect("localhost", "root", "admin");

if ( strcmp( $debug, "yes" ) == 0 )
    echo " $conn = " . $conn . "<br>";

if ( ! $conn ) die("Could not connect to MySQL" );

mysql_select_db("sitestats",$conn)
    or die("Could not open sitestats: " . mysql_error());

if ( strcmp( $debug, "yes" ) == 0 )
    echo " selected table <br>";

//********************************************
// Load data from hitcounter file into array
//********************************************

$data = file($fileName);

//**************************************
// Extract each value and explode into
// a two-dimensional array
//**************************************

foreach ($data as $column => $val )
{
    // Explode data into a new array

    if ( strcmp( $debug, "yes" ) == 0 )
        echo "[" . $val . "]<br>";

    if ( strcmp($val," ") == 0 ) {
    }
    else
    {
        $fileElements[$column] = explode(" ", $val);
    }
}

//********************************************************
// Determine where to begin new data entry into database,
// based on what is contained in the hitcounter file
//********************************************************

if ( strcmp( $debug, "yes" ) == 0 )
    echo " number of total data elements now: "
        . sizeof($data) . "<br>";

$startIndex = 0;
for ($k = 1; $k < sizeof($data); $k++ )
{
    if ( strcmp($maxVisits, $fileElements[$k][9]) == 0 ) {
        // Found the entry
        $startIndex = $k;
    }
    if ( strcmp( $debug, "yes" ) == 0 )
        echo " " . $fileElements[$k][9] . " " . $maxVisits
            . "<br>";
}

if ( strcmp( $debug, "yes" ) == 0 )
    echo " new start index: " . $startIndex . "<br>";

if ( strcmp( $debug, "yes" ) == 0 )
    echo " start index: " . $startIndex . "<br>";

//***************************************************
// Insert table data, beginning with the start index
//***************************************************

for ($k = $startIndex+1; $k < sizeof($data)-1; $k++ )
{
    if ( strcmp($fileElements[$k][5],
                $fileElements[$k+1][5]) != 0 ) {

        $hour = $fileElements[$k][1];
        $minute = $fileElements[$k][2];
        $second = $fileElements[$k][3];
        $DayofWeek = $fileElements[$k][5];
        $Month = $fileElements[$k][6];
        $DayofMonth = $fileElements[$k][7];
        $Year = $fileElements[$k][8];
        $visits = $fileElements[$k][9];

        $sql = "insert into sitevisits (visit_ID, hour,
            minute, second, DayofWeek, Month, DayofMonth,
            Year, visits) values ('', '$hour', '$minute',
            '$second', '$DayofWeek', '$Month', '$DayofMonth',
            '$Year', '$visits')";

        if ( strcmp( $debug, "yes" ) == 0 )
            echo " sql statement: " . $sql . "<br>";

        //
        // Execute the SQL statement
        //

        $result = mysql_query($sql);

        if ( strcmp( $debug, "yes" ) == 0 )
            echo " result of insert: " . $result . "<br>";

        if ( strcmp( $debug, "yes" ) == 0 )
            echo " result: " . $result . "<br>";

        if ( strcmp( $debug, "yes" ) == 0 ) {
            for ($m = 1; $m < 10; $m++ )
            {
                echo $fileElements[$k][$m] . " ";
            }
            echo "<br>";
        } // end if
    }
}

//**********************************
// Insert the last row of data
//**********************************

if ( strcmp( $debug, "yes" ) == 0 )
(logically) always represented by the last data element within the database, it follows that we need to scale based on this last element.
Thus, I define a maximum width using the variable $graphWidthMax = 400 pixels. Now, I need to define the height of each bar (that is, its extent in the vertical direction), which I've arbitrarily set with $barHeight = 10; (pixels), and the absolute maximum width of each bar, taken from the latest data entry in the sitevisits table of the sitestats database: $barMax = $dbElements[$nRows-1][4];
I also need to define the number of rows to plot on a given web page. This is an important feature because the number that should be plotted is related to each bar's width as well as the resolution of the screen and the ability of the user to see the data clearly without having to use the scroll bar. Scrollbars can become a nuisance, too, if the user is continually moving them to see all data. Hence, one requirement which I imposed was to keep all of the data within the eye span of the user. So, I opted for a relatively low count in terms of bars per page. Now, since I will only be plotting 10 bars per page, I need to come up with a mechanism for allowing the user to move to a new page and show the next 10 bars in the database. I therefore defined variables to keep track of the starting row and the ending row on any given page. These quantities are represented as follows:
$numberRowsToPlot = 10;
$startRow = 0;
$endRow = $startRow + $numberRowsToPlot;
These equations will become important shortly. First, let's plot the first 10 rows of data. We do this in a for loop, like this:
for ( $i = $startRow; $i < $endRow; $i++ )
{
    $countVal = intval( $dbElements[$i][4] );
    $barWidth = $graphWidthMax * $countVal/$barMax;
    // ...
}
I begin with the $startRow on the page and stop just before $endRow. I retrieve the counter value at the current index $i of the $dbElements array and assign it to the variable $countVal. I then scale $barWidth in proportion to the maximum graphing width (defined earlier as 400 pixels), normalized by the maximum number of hits. This gives me a proportional width with respect to the 400-pixel limit within the plotting frame (here, the web page itself).
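The scaling rule can be isolated in a small helper, shown here as a sketch. The function name barWidth is hypothetical, and the rounding to a whole pixel and the guard against a zero maximum are additions that the article's listings do not contain.

```php
<?php
// Scale a visit count to a bar width in pixels, so that the largest count
// ($barMax) fills the whole plotting width ($graphWidthMax).
function barWidth($countVal, $barMax, $graphWidthMax = 400)
{
    if ($barMax <= 0) {
        return 0; // nothing to scale against, e.g. an empty table
    }
    return (int) round($graphWidthMax * $countVal / $barMax);
}

echo barWidth(50, 200), "\n";   // a quarter of the maximum
echo barWidth(200, 200), "\n";  // the maximum itself
```

A count of 50 against a maximum of 200 yields 100 pixels, and the maximum itself yields the full 400 pixels.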
You'll note from Figure 1 that data are printed alongside the bars, including the value of a particular bar width. This is done in a straightforward manner by simply encapsulating the printing of the data within a table, as columns within that table. This ensures uniform spacing and alignment of the data within the cells.
Without going into all of the details (because Listing 3 provides the explicit implementation), the key elements of this plotting process are as follows: create a table, enter the data values into columns via an echo statement, and concatenate multiple columns so that the data are aligned across the page:
echo "<tr>";
echo "<td align=right><font face=arial color=blue size=2>";
echo $dbElements[$i][0] . ",</font></td>";
But how do we actually create the bar? Very easily: we have a JPG image of a single pixel, named reddot.jpg. Within the second-to-last column of the table we create an image reference to that JPG image and size it so that its width is equal to $barWidth and its
Listing 2 (cont'd)
    echo " startIndex = " . $startIndex
        . " sizeof(data) = " . sizeof($data) . "<br>";

if ( $startIndex+1 < sizeof($data) ) {
    $hour = $fileElements[sizeof($data)-1][1];
    $minute = $fileElements[sizeof($data)-1][2];
    $second = $fileElements[sizeof($data)-1][3];
    $DayofWeek = $fileElements[sizeof($data)-1][5];
    $Month = $fileElements[sizeof($data)-1][6];
    $DayofMonth = $fileElements[sizeof($data)-1][7];
    $Year = $fileElements[sizeof($data)-1][8];
    $visits = $fileElements[sizeof($data)-1][9];

    $sql = "insert into siteVisits (hour, minute, second,
        DayofWeek, Month, DayofMonth, Year, visits) values
        ('$hour', '$minute', '$second', '$DayofWeek',
        '$Month', '$DayofMonth', '$Year', '$visits')";

    //
    // Execute the SQL statement
    //

    $result = mysql_query($sql);

    if ( strcmp( $debug, "yes" ) == 0 )
        echo " result: " . $result . "<br>";

    if ( strcmp( $debug, "yes" ) == 0 ) {
        for ($m = 1; $m < 10; $m++ )
        {
            echo $fileElements[sizeof($data)-1][$m] . " ";
        }
        echo "<br>";
    } // end if
} // end if

//***********************
// Close the connection
//***********************

mysql_close($conn);
header("Location: queryDB.php");

?>
Listing 3
<?php
// queryDB.php

include("header.php");
include("logo.php");

$debug = "no";
$production = "no";

//***************************************
// Open the db connection to sitestats
// and look at the last entry
//***************************************

$conn = mysql_connect("localhost", "root", "admin");

if ( ! $conn )
    die("Could not connect to MySQL" );

mysql_select_db("sitestats",$conn)
    or die("Could not open sitestats: " . mysql_error());

//*****************************************************
// Note: mysql_fetch_row($qry) retrieves a single row
// mysql_fetch_field($qry, $i) fetches field $i
//*****************************************************

$table = "sitevisits";
$check = "select * from $table";
$qry = mysql_query($check)
    or die ("Could not match data: " . mysql_error());

if ( strcmp( $debug, "yes" ) == 0 )
    echo " qry = " . $qry . "<br>";

$nRows = mysql_num_rows($qry);

if ( strcmp( $debug, "yes" ) == 0 ) echo "<table>";
if ( strcmp( $debug, "yes" ) == 0 ) echo "<th>";
if ( strcmp( $debug, "yes" ) == 0 ) echo "</th>";

$i = 0;
while ($newArray = mysql_fetch_array($qry) ) {

    $dow = $newArray['DayofWeek'];
    $mo = $newArray['Month'];
    $dom = $newArray['DayofMonth'];
    $yr = $newArray['Year'];
    $vis = $newArray['visits'];

    $dbElements[$i][0] = $dow;
    $dbElements[$i][1] = $mo;
    $dbElements[$i][2] = $dom;
    $dbElements[$i][3] = $yr;
    $dbElements[$i][4] = $vis;
    if ( strcmp( $debug, "yes" ) == 0 )
        echo " ==> " . $dbElements[$i][4] . "<br>";
    $i++;

    if ( strcmp( $debug, "yes" ) == 0 ) echo "<tr>";
    if ( strcmp( $debug, "yes" ) == 0 )
        echo "<td><font face=arial color=blue size=3>"
            . $dow . "</font></td>";
    if ( strcmp( $debug, "yes" ) == 0 )
        echo "<td><font face=arial color=blue size=3>"
            . $mo . "</font></td>";
    if ( strcmp( $debug, "yes" ) == 0 )
        echo "<td><font face=arial color=blue size=3>"
            . $dom . "</font></td>";
    if ( strcmp( $debug, "yes" ) == 0 )
        echo "<td><font face=arial color=blue size=3>"
            . $yr . "</font></td>";
    if ( strcmp( $debug, "yes" ) == 0 )
        echo "<td><font face=arial color=blue size=3>"
            . $vis . "</font></td>";
    if ( strcmp( $debug, "yes" ) == 0 )
        echo "</tr>";
} // end while

if ( strcmp( $debug, "yes" ) == 0 ) echo "</table>";

mysql_close($conn);

//*************************************************
// Sort by visits, ascending, using insertion sort
//*************************************************

for ( $i = 1; $i < $nRows; $i++ )
{
    $index0 = $dbElements[$i][0];
    $index1 = $dbElements[$i][1];
    $index2 = $dbElements[$i][2];
    $index3 = $dbElements[$i][3];
    $index4 = $dbElements[$i][4];

    $j = $i;
    while ( ($j > 0) && ($dbElements[$j-1][4] > $index4) )
    {
        $dbElements[$j][4] = $dbElements[$j-1][4];
        $dbElements[$j][3] = $dbElements[$j-1][3];
        $dbElements[$j][2] = $dbElements[$j-1][2];
        $dbElements[$j][1] = $dbElements[$j-1][1];
        $dbElements[$j][0] = $dbElements[$j-1][0];

        $j = $j - 1;
    }

    $dbElements[$j][0] = $index0;
    $dbElements[$j][1] = $index1;
    $dbElements[$j][2] = $index2;
    $dbElements[$j][3] = $index3;
    $dbElements[$j][4] = $index4;
}

//************************************
// Print out new table and plot graph
//************************************

$graphWidthMax = 400;
$barHeight = 10; // pixels
$barMax = $dbElements[$nRows-1][4];

$numberRowsToPlot = 10;
$startRow = 0;
$endRow = $startRow + $numberRowsToPlot;

if ( strcmp( $debug, "yes" ) == 0 )
    echo " Max = " . $barMax . "<br>";

echo "<table>";
echo "<th>";
echo "</th>";

for ( $i = $startRow; $i < $endRow; $i++ )
{
    $countVal = intval( $dbElements[$i][4] );
    $barWidth = $graphWidthMax * $countVal/$barMax;

    echo "<tr>";
    echo "<td align=right><font face=arial color=blue "
        . "size=2>" . $dbElements[$i][0] . ",</font></td>";
    echo "<td align=right><font face=arial color=blue "
        . "size=2>" . $dbElements[$i][1]. "</font></td>";
    echo "<td align=right><font face=arial color=blue "
        . "size=2>" . $dbElements[$i][2] . "</font></td>";
    echo "<td align=right><font face=arial color=blue "
        . "size=2>" . $dbElements[$i][3]. "</font></td>";
    print("<td>\n");
    echo "<font face=arial color=purple size=2>";
    echo "<b>";
    print("<img src=\"reddot.jpg\" ");
    print("width=\"$barWidth\" height=\"$barHeight\">");
    echo " " . $dbElements[$i][4];
    echo "</b>";
    echo "</font>";
    print("</td>\n");

    echo "</tr>";
}

echo "</table>";
?>
<table>
<tr>
<td>
<font Style="font-family:arial; font-size:12pt;
font-style: bold; color: #000000;">
Entries: <?php echo $startRow; ?> to
<?php echo $endRow; ?> with
<?php echo $barMax; ?> total rows
</font>
</td>
<td>
<form method="post" action="queryDB1.php">
<input type="hidden" name="startRow"
value="<?php echo $startRow; ?>" >
<input type="hidden" name="numberRowsToPlot"
value="<?php echo $numberRowsToPlot; ?>" >
<input type="hidden" name="discrim" value="add" >
<input type="hidden" name="delta" value="10" >
<input type="submit" value=">"
Style="font-family:sans-serif; font-size:10pt;
font-style:bold; background:#4400ff none;
height is equal to $barHeight, as shown below:
print("<td>\n");
// ...
print("<img src=\"reddot.jpg\" ");
print("width=\"$barWidth\" height=\"$barHeight\">");
echo " " . $dbElements[$i][4];
// ...
print("</td>\n");
echo "</tr>";
At the end of each bar, I print the actual value of the bar, accomplished by outputting the value of $dbElements[$i][4].
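The stretched-pixel technique described above can be sketched as one small function that returns the table cell for a bar. The name renderBarCell is hypothetical, and the empty alt attribute and the call to htmlspecialchars are additions not present in the article's listings; only reddot.jpg and the width and height attributes come from the source.

```php
<?php
// Build the table cell holding one bar: a single-pixel image stretched to
// $barWidth by $barHeight, followed by the numeric value it represents.
function renderBarCell($value, $barWidth, $barHeight = 10)
{
    return sprintf(
        '<td><img src="reddot.jpg" width="%d" height="%d" alt=""> %s</td>',
        $barWidth,
        $barHeight,
        htmlspecialchars((string) $value)
    );
}

echo renderBarCell(101, 202), "\n";
```

Returning the markup as a string, rather than printing it piece by piece, makes the cell easy to check in isolation before it is echoed into the table.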
Getting the Next 10 Rows
At the bottom of Listing 3, there are two forms. I will focus on the first form for the time being. This form takes the current values of $startRow and $numberRowsToPlot and passes these, as hidden values, to the PHP code in Listing 4 (queryDB1.php). This is shown in the code segment below:
<form method="post" action="queryDB1.php">
<input type="hidden" name="startRow"
    value="<?php echo $startRow; ?>" >
<input type="hidden" name="numberRowsToPlot"
    value="<?php echo $numberRowsToPlot; ?>" >
<input type="hidden" name="discrim" value="add" >
<input type="hidden" name="delta" value="10" >
<input type="submit" value=">"
    Style="font-family:sans-serif; font-size:10pt;
    font-style:bold; background:#4400ff none;
    color: #ccbbcc; height: 2em; width: 2em">
</form>
Key within this form code are the variables named $discrim and $delta, which are passed as hidden variables from queryDB.php to queryDB1.php. The ASCII text string "add" is assigned to the discrim field. As you'll see in a moment, this is the key to how the queryDB1.php code displays results: they are posted through the form. These are retrieved within queryDB1.php using the following code:
$startRow = $_POST['startRow'];
$numberRowsToPlot = $_POST['numberRowsToPlot'];
$discrim = $_POST['discrim'];
$delta = $_POST['delta'];
Again, I open the database and retrieve the data, translate it to the $dbElements array, and then apply the $discrim parameter to the data.
if ( strcmp($discrim,"add") == 0 ) { // Going up
    $startRow = $startRow + $delta;
    $endRow = $startRow + $delta;
    if ( $endRow > $barMax ) {
        $endRow = $barMax;
    }
}
If we click the right-hand arrow in Figure 1 (that is, the "increase" button) then we expect to be presented with the next 10 rows of data. This is accomplished within queryDB1.php by adding the value $delta to the current $startRow and setting the new $endRow equal to the updated $startRow plus $delta. We must be careful if we are at the last few elements of data, because by attempting to add $delta rows to the current $startRow we may, in effect, run off the end of the data table. To accommodate this event, I perform a check on the value of $endRow in relation to $barMax. If $endRow is greater than $barMax, then I simply assign $barMax to $endRow. The application of this logic results in the screen snapshot shown in Figure 2, in which the next 10 rows appear.
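The paging step can be written as a pure function, as in the sketch below. The name nextPage is hypothetical. Two choices here are assumptions rather than the article's code: the end row is limited by the number of rows in the table ($nRows) instead of $barMax, and the start row is left unchanged when the user is already on the last page.

```php
<?php
// Advance the window by $delta rows and return array($startRow, $endRow).
// $endRow is exclusive, matching the "$i < $endRow" loop in the listings.
function nextPage($startRow, $delta, $nRows)
{
    $next = $startRow + $delta;
    if ($next >= $nRows) {
        $next = $startRow; // already on the last page; stay put
    }
    $endRow = min($next + $delta, $nRows);
    return array($next, $endRow);
}

list($startRow, $endRow) = nextPage(0, 10, 25);
echo $startRow, " to ", $endRow, "\n";
list($startRow, $endRow) = nextPage($startRow, 10, 25);
echo $startRow, " to ", $endRow, "\n";
```

With 25 rows, the first call moves the window to rows 10 to 20 and the second to rows 20 to 25, so the last page is simply shorter than the others.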
In the interest of completeness, it must be noted that code Listings 5, 6, and 7 are those for header.php, logo.php, and footer.php, respectively. These are small files that contain web page header, title, and page closing HTML tags that are included in the main PHP documents.
Getting the Previous 10 Rows
This process continues: located at the bottom of queryDB1.php are three forms. The second form is the
Listing 3 (cont'd)
color: #ccbbcc; height: 2em; width: 2em">
</form>
</td>
<td>
<font Style="font-family:arial; font-size:12pt;
font-style: bold; color: #000000;">
Go to Entry:
</font>
</td>
<td>
<form method="post" action="queryDB1.php">
<input name="startRow" type="text" >
<input type="hidden" name="numberRowsToPlot"
value="<?php echo $numberRowsToPlot; ?>" >
<input type="hidden" name="discrim" value="val" >
<input type="hidden" name="delta" value="10" >
<input type="submit" value=">|<"
Style="font-family:sans-serif; font-size:8pt;
font-style:bold; background:#4400ff none;
color: #ccbbcc; height: 3em; width: 3em">
</form>
</td>
</tr>
</table>

<?php
include("footer.php");
?>
Figure 2
Listing 4
<?php
// queryDB1.php
include("header.php");
include("logo.php");

$startRow = $_POST['startRow'];
$numberRowsToPlot = $_POST['numberRowsToPlot'];
$discrim = $_POST['discrim'];
$delta = $_POST['delta'];

$debug = "no";

//***************************************
// Open the db connection to sitestats
// and look at the last entry
//***************************************

$conn = mysql_connect("localhost", "root", "admin");

if ( ! $conn )
    die("Could not connect to MySQL" );

mysql_select_db("sitestats",$conn)
    or die("Could not open sitestats: " . mysql_error());

//*****************************************************
// Note: mysql_fetch_row($qry) retrieves a single row
// mysql_fetch_field($qry, $i) fetches field $i
//*****************************************************

$table = "sitevisits";

$check = "select * from $table";

$qry = mysql_query($check)
    or die ("Could not match data because " . mysql_error());

if ( strcmp( $debug, "yes" ) == 0 )
    echo " qry = " . $qry . "<br>";

$nRows = mysql_num_rows($qry);

if ( strcmp( $debug, "yes" ) == 0 ) echo "<table>";
if ( strcmp( $debug, "yes" ) == 0 ) echo "<th>";
if ( strcmp( $debug, "yes" ) == 0 ) echo "</th>";

$i = 0;
while ($newArray = mysql_fetch_array($qry) ) {
    $dow = $newArray['DayofWeek'];
    $mo = $newArray['Month'];
    $dom = $newArray['DayofMonth'];
    $yr = $newArray['Year'];
    $vis = $newArray['visits'];

    $dbElements[$i][0] = $dow;
    $dbElements[$i][1] = $mo;
    $dbElements[$i][2] = $dom;
    $dbElements[$i][3] = $yr;
    $dbElements[$i][4] = $vis;
    if ( strcmp( $debug, "yes" ) == 0 )
        echo " ==> " . $dbElements[$i][4] . "<br>";
    $i++;

    if ( strcmp( $debug, "yes" ) == 0 ) echo "<tr>";
    if ( strcmp( $debug, "yes" ) == 0 )
        echo "<td><font face=arial color=blue size=3>"
            . $dow . "</font></td>";
    if ( strcmp( $debug, "yes" ) == 0 )
        echo "<td><font face=arial color=blue size=3>"
            . $mo . "</font></td>";
    if ( strcmp( $debug, "yes" ) == 0 )
        echo "<td><font face=arial color=blue size=3>"
            . $dom . "</font></td>";
    if ( strcmp( $debug, "yes" ) == 0 )
        echo "<td><font face=arial color=blue size=3>"
            . $yr . "</font></td>";
    if ( strcmp( $debug, "yes" ) == 0 )
        echo "<td><font face=arial color=blue size=3>"
            . $vis . "</font></td>";
    if ( strcmp( $debug, "yes" ) == 0 )
        echo "</tr>";
} // end while

if ( strcmp( $debug, "yes" ) == 0 ) echo "</table>";

mysql_close($conn);

//*************************************************
// Sort by visits, ascending, using insertion sort
//*************************************************

for ( $i = 1; $i < $nRows; $i++ )
{
    $index0 = $dbElements[$i][0];
    $index1 = $dbElements[$i][1];
    $index2 = $dbElements[$i][2];
    $index3 = $dbElements[$i][3];
    $index4 = $dbElements[$i][4];

    $j = $i;
    while ( ($j > 0) && ($dbElements[$j-1][4] > $index4) )
    {
        $dbElements[$j][4] = $dbElements[$j-1][4];
        $dbElements[$j][3] = $dbElements[$j-1][3];
        $dbElements[$j][2] = $dbElements[$j-1][2];
        $dbElements[$j][1] = $dbElements[$j-1][1];
        $dbElements[$j][0] = $dbElements[$j-1][0];

        $j = $j - 1;
    }

    $dbElements[$j][0] = $index0;
    $dbElements[$j][1] = $index1;
    $dbElements[$j][2] = $index2;
    $dbElements[$j][3] = $index3;
    $dbElements[$j][4] = $index4;

}

//************************************
// Print out new table and plot graph
//************************************

$graphWidthMax = 400;
$barHeight = 10; // pixels
$barMax = $dbElements[$nRows-1][4];

//
// Define the field range to show on the page.
//

if ( strcmp($discrim,"val") == 0 ) {
    // Go to specific range
    $endRow = $startRow + $delta;
    if ( $endRow > $barMax ) $endRow = $barMax;
}

//
// Adding $delta
//

if ( strcmp($discrim,"add") == 0 ) { // Going up

    $startRow = $startRow + $delta;
    $endRow = $startRow + $delta;

    if ( $endRow > $barMax ) {
        $endRow = $barMax;
    }
}

//
// Subtracting $delta
//

if ( strcmp($discrim, "subtract") == 0 ) { // Going down

    $startRow = $startRow - $delta;
    $endRow = $startRow + $delta;

    if ( $startRow <= 0 ) {
        $startRow = 0;
        $endRow = $startRow + $delta;
    }
}

if ( strcmp( $debug, "yes" ) == 0 )
    echo " Max = " . $barMax . "<br>";

echo "<table>";
echo "<th>";
echo "</th>";

if ( strcmp( $debug, "yes" ) == 0 )
    echo " startRow = " . $startRow . "<br>";
if ( strcmp( $debug, "yes" ) == 0 )
    echo " endRow = " . $endRow . "<br>";
if ( strcmp( $debug, "yes" ) == 0 )
    echo " delta = " . $delta . "<br>";
if ( strcmp( $debug, "yes" ) == 0 )
    echo " discrim = " . $discrim . "<br>";

for ( $i = $startRow; $i < $endRow; $i++ )
{
    $countVal = intval( $dbElements[$i][4] );

    if ( $countVal != "" ) {
        $barWidth = $graphWidthMax * $countVal/$barMax;

        echo "<tr>";
same as shown for queryDB.php, in which the variable $delta is added to the current $startRow and $endRow. The first form accommodates the left-hand arrow, and assigns the string "subtract" to the $discrim variable. The code in queryDB1.php is then called again, since the form posts back to the same page. If the user opts to back up ten rows, then there is a "subtract" branch that does the following:
if ( strcmp($discrim, "subtract") == 0 ) { // Going down
    $startRow = $startRow - $delta;
    $endRow = $startRow + $delta;
    if ( $startRow <= 0 ) {
        $startRow = 0;
        $endRow = $startRow + $delta;
    }
}
In this instance, $startRow is decremented by the amount in $delta. The $endRow is still set $delta rows above $startRow. Then, we must accommodate the possibility of decrementing below the first row. The conditional statement handles this event by checking whether the current value of $startRow is less than or equal to zero. If so, it assigns zero to the $startRow variable, and sets $endRow to zero plus $delta.
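The backward step reduces to a clamp at zero, which the sketch below expresses with max. The function name previousPage is hypothetical; the behavior is meant to match the "subtract" branch shown above.

```php
<?php
// Move the window back by $delta rows, never going below row 0, and
// return array($startRow, $endRow) with $endRow exclusive.
function previousPage($startRow, $delta)
{
    $startRow = max(0, $startRow - $delta);
    return array($startRow, $startRow + $delta);
}

list($startRow, $endRow) = previousPage(20, 10);
echo $startRow, " to ", $endRow, "\n";
list($startRow, $endRow) = previousPage(5, 10);
echo $startRow, " to ", $endRow, "\n";
```

Stepping back from row 20 gives rows 10 to 20, while stepping back from row 5 is clamped to rows 0 to 10.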
Starting at an Arbitrary Row
The third and last form contained in queryDB1.php accommodates the condition in which a user wishes to go to an arbitrary row within the table. This behavior is preferred when, for example, much data exists within the database and the user would like to jump nearly to the end.
In this case, the value for $startRow is assigned directly by the user, through the form, and queryDB1.php is called again. The value of $discrim picks up the string value "val" from the form, and the code uses this to assign the $startRow:
<form method="post" action="queryDB1.php">
<input name="startRow" type="text" >
<input type="hidden" name="numberRowsToPlot"
    value="<?php echo $numberRowsToPlot; ?>" >
<input type="hidden" name="discrim"
    value="val" >
<input type="hidden" name="delta" value="10" >
<input type="submit" value=">|<"
    Style="font-family:sans-serif; font-size:8pt;
    font-style:bold; background:#4400ff none;
    color: #ccbbcc; height: 3em; width: 3em">
</form>
The $startRow variable becomes the point at which values will start to be displayed, and is entered by the user through the form above. Again, queryDB1.php is called with the form data, and the $discrim value is set to the string "val". The code segment that catches this value follows:
if ( strcmp($discrim,"val") == 0 ) { // Go to specific range
    $endRow = $startRow + $delta;
    if ( $endRow > $barMax ) $endRow = $barMax;
}
Listing 4 (cont'd)
        echo "<td align=right><font face=arial color=blue "
            . "size=2>" . $dbElements[$i][0] . ",</font></td>";
        echo "<td align=right><font face=arial color=blue "
            . "size=2>" . $dbElements[$i][1]. "</font></td>";
        echo "<td align=right><font face=arial color=blue "
            . "size=2>" . $dbElements[$i][2] . "</font></td>";
        echo "<td align=right><font face=arial color=blue "
            . "size=2>" . $dbElements[$i][3]. "</font></td>";
        print("<td>\n");
        echo "<font face=arial color=purple size=2>";
        echo "<b>";
        print("<img src=\"reddot.jpg\" ");
        print("width=\"$barWidth\" height=\"$barHeight\">");
        echo " " . $dbElements[$i][4];
        echo "</b>";
        echo "</font>";
        print("</td>\n");
        echo "</tr>";
    }
}

echo "</table>";

?>
<table>
<tr>
<td>
<font Style="font-family:arial; font-size:12pt;
font-style: bold; color: #000000;">
Entries: <?php echo $startRow; ?> to
<?php echo $endRow; ?> with
<?php echo $barMax; ?> total rows
</font>
</td>

<?php
if ( $startRow > 0 ) {
?>
<td>
<form method="post" action="queryDB1.php">
<input type="hidden" name="startRow"
value="<?php echo $startRow; ?>" >
<input type="hidden" name="numberRowsToPlot"
value="<?php echo $numberRowsToPlot; ?>" >
<input type="hidden" name="discrim"
value="subtract" >
<input type="hidden" name="delta" value="10">
<input type="submit" value="<"
Style="font-family:sans-serif; font-size:10pt;
font-style:bold; background:#4400ff none;
color: #ccbbcc; height: 2em; width: 2em">
</form>
</td>
<?php
}
?>

<td>
<form method="post" action="queryDB1.php">
<input type="hidden" name="startRow"
value="<?php echo $startRow; ?>" >
<input type="hidden" name="numberRowsToPlot"
value="<?php echo $numberRowsToPlot; ?>" >
<input type="hidden" name="discrim" value="add" >
<input type="hidden" name="delta" value="10" >
<input type="submit" value=">"
Style="font-family:sans-serif; font-size:10pt;
font-style:bold; background:#4400ff none;
color: #ccbbcc; height: 2em; width: 2em">
</form>
</td>
<td>
<font Style="font-family:arial; font-size:12pt;
font-style: bold; color: #000000;">
Go to Entry:
</font>
</td>
<td>
<form method="post" action="queryDB1.php">
<input name="startRow" type="text" >
<input type="hidden" name="numberRowsToPlot"
value="<?php echo $numberRowsToPlot; ?>" >
<input type="hidden" name="discrim" value="val" >
<input type="hidden" name="delta" value="10" >
<input type="submit" value=">|<"
Style="font-family:sans-serif; font-size:8pt;
font-style:bold; background:#4400ff none;
color: #ccbbcc; height: 3em; width: 3em">
</form>
</td>
</tr>
</table>

<?php
include("footer.php");
?>
The $endRow variable is set to $startRow plus $delta. If $endRow exceeds the number of rows in the database, it is automatically set to the maximum database row. In this way a user can access any starting row and hop over intermediate values as needed. The data are passed recursively back to queryDB1.php using the following variables, which are retrieved from the form post code:
$startRow = $_POST['startRow'];
$numberRowsToPlot = $_POST['numberRowsToPlot'];
$discrim = $_POST['discrim'];
$delta = $_POST['delta'];
The values are set based on the user’s selection during the previous call to queryDB1.php. It is possible to augment these statements by incorporating some error checking into the code to verify that the values have been set within the proper ranges. This is merely one suggestion offered to improve the robustness of the methodology.
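One way to add the error checking suggested above is sketched here. It is only an illustration, not code from the article's archive: the helper name clampInt(), the fallback values, and the placeholder upper bound of 1000 are all assumptions. Only the four POST field names and the three discriminator strings come from the listing.

```php
<?php
// Hypothetical helper (not part of the article's code archive):
// read an integer from an input array and force it into [$min, $max].
function clampInt(array $source, $key, $default, $min, $max)
{
    if (!isset($source[$key]) || !is_numeric($source[$key])) {
        return $default;            // missing or non-numeric input
    }
    $value = (int) $source[$key];
    return max($min, min($max, $value));
}

// Assumed upper bound; in queryDB1.php this would be the real row count.
$barMax = isset($barMax) ? (int) $barMax : 1000;

$startRow         = clampInt($_POST, 'startRow', 0, 0, $barMax);
$numberRowsToPlot = clampInt($_POST, 'numberRowsToPlot', 10, 1, $barMax);
$delta            = clampInt($_POST, 'delta', 10, 1, $barMax);

// Accept only the discriminator values the forms actually submit.
$allowed = array('add', 'subtract', 'val');
$discrim = (isset($_POST['discrim'])
            && in_array($_POST['discrim'], $allowed, true))
    ? $_POST['discrim']
    : 'val';
```

Clamping rather than rejecting keeps the page usable when a visitor types an out-of-range entry number; a stricter script might prefer to show an error message instead.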
Operation and Database Table Structure
For those interested in using this methodology on their own sites, all files are provided for download in the code archive. Figure 3 shows the structure of the sitestats database and the sitevisits table; it contains a screenshot taken from phpMyAdmin, a useful tool for managing MySQL databases. A user wishing to recreate this site counter tool will need to install MySQL on the server and will need to create the database instance and table required to run the code.
Summary
I have tried to provide some insight into how to develop a simple and useful bar-chart based hit counter using PHP and MySQL. The code I have provided is the same as that which I am using on client sites to keep track of access statistics. Anyone reasonably comfortable with PHP and MySQL can take this idea much further and include many different types of statistics.
The methodology I provide has educational value as well, illustrating a simple way of implementing PHP database connectivity, a capability that is necessary for any type of advanced commercial application. Some additional ideas include adding site statistics on time of day, user identity, and server identity. It is even possible to accommodate statistics for each web page associated with a site, thereby providing details on the popularity of various pages and on whether the site is able to hold the interest of individuals so that they visit other features available at your site.
There is no limit to what you can do.
About the Author
John R. Zaleski, Ph.D., is a biomedical systems engineer with 20 years of experience in software development and medical device integration as applied to acute care hospital environments. He has developed and fielded medical products that are currently in use in large acute care hospitals. He has developed products and many applications in Java, PHP, and MySQL and has authored two dozen patent applications and an equal number of refereed publications in the areas of medical device integration, software methods for medical device communication, software performance, and real-time clinical analysis of patient data.
To Discuss this article:
Listing 5
<?php
// header.php
echo "<html>";
echo "<head>";
echo "<title>Site Counter Tool</title>";
echo "</head>";
echo "<body bgcolor='#fffffb'>";
?>
Listing 6
<?php
// logo.php
echo "<table>";
echo "<tr align=center>";
echo "<td>";
echo "<h1>Site Counter Tool</h1>";
echo "</td>";
echo "</tr>";
echo "</table>";
?>
Listing 7
<?php
// footer.php
echo "</body>";
echo "</html>";
?>
Figure 3

Solving the Unicode Puzzle
by Michael Toppa
Many web sites cannot correctly interpret or display anything other than English language characters. Converting your site to UTF-8 (Unicode) enables you to handle characters from almost any language in the world. However, currently available conversion guidelines typically focus on just a single software product, offering little guidance on how to move UTF-8 encoded data between different products. Configuring your web server, PHP, and your database to support UTF-8 is one thing; configuring them so UTF-8 encoded data moves smoothly between them is another. This article guides you through a UTF-8 conversion using PHP, Oracle, and Apache. It also covers data exports to PDF, RTF, email, and plain text.
REQUIREMENTS
PHP: 4.3.10 or higher
OS: Any
Other Software: Oracle 9, Apache, PDFLib
Code Directory: n/a
REFERENCES
UNICODE: http://www.unicode.org/
UNICODE: http://www.alanwood.net/unicode/
ORACLE: http://www.oracle.com/technology/tech/opensource/php/globalizing_oracle_php_applications.html
PHP: http://us3.php.net/manual/en/ref.mbstring.php
Unicode is a single character set designed to include characters from just about every writing system on the planet (and off the planet: even Klingon has been written for Unicode, although it is not part of the official standard). In recent years, Unicode has become more prevalent on the web, and all major web browsers, web servers, programming languages, and databases worth their salt now support it. Switching your web applications to Unicode will give you the ability to correctly handle and display any character from any language you’re likely to encounter.
Understanding the significance of Unicode requires first understanding some basics of character sets and their history. The first thing you need to know was said best by Joel Spolsky of Joel On Software: “There ain’t no such thing as plain text.” If you don’t know the character set and the encoding that were used in the creation of a string of text, then you won’t know how to display it properly. For modern purposes, the story of character sets starts with ASCII. In the 1960s, unaccented English characters, as well as various control characters for carriage returns, form feeds, etc., were each assigned a number from 0 to 127; there was general agreement on these number assignments, and so ASCII was born. The ASCII characters could fit in 7 bits, and computers used 8-bit bytes, which left an extra bit of space. This led to the proliferation of hundreds of different character sets, with each one using this extra space in a different way. The characters from 0-127 are often referred to as Lower ASCII, and the characters from 128-255 as Upper ASCII or Extended ASCII. Extended ASCII character sets added characters from non-English languages, special characters like copyright symbols, and line-drawing characters to simplify drawing boxes, etc. With all these different versions of extended ASCII floating around, text generated on, say, a computer in Russia would turn into gibberish if you tried to read it on a computer in the US. This happened because the number codes representing the Cyrillic characters were assigned to totally different characters on the US computer. This became a bit of a problem when everyone started using the internet.
Unicode represents an effort to clean up this mess. The Unicode slogan is: “Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.” Unicode can do this because it allows characters to occupy more than one byte, so it has enough room to store characters from languages around the world, even Asian languages that have thousands of characters. With Unicode, it’s particularly important to understand the distinction between a character set and a character encoding. Unicode is a single character set, but there are three different ways to encode it: UTF-8, UTF-16, and UTF-32 (there’s also UTF-7, but it was never officially adopted by the Unicode Consortium, and for the most part it’s been deprecated in favor of UTF-8). The numbers 8, 16, and 32 indicate the number of bits in each Unicode code unit (a complete character may occupy more than one code unit, so it can be multi-byte). All three encodings can represent any Unicode character, and each has its own advantages and disadvantages depending on what’s important in a particular implementation. In the case of web applications, UTF-8 is the encoding of choice because it stores the lower ASCII characters in a single byte each, using the same byte values as ASCII. This makes existing ASCII “plain text” valid UTF-8 as is, even if you’re clueless about character encoding.
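To make the single-byte and multi-byte distinction concrete, here is a small sketch that prints how many bytes a few characters occupy in UTF-8. The sample characters are my own choice rather than examples from the article, and each one is written as raw bytes so the result does not depend on how the file itself is saved.

```php
<?php
// Each string below is spelled out as raw UTF-8 bytes.
$samples = array(
    'A'       => "\x41",              // U+0041, ASCII letter
    'e-acute' => "\xC3\xA9",          // U+00E9, in the Latin-1 range
    'euro'    => "\xE2\x82\xAC",      // U+20AC, beyond Latin-1
    'g-clef'  => "\xF0\x9D\x84\x9E",  // U+1D11E, outside the 16-bit range
);

foreach ($samples as $label => $bytes) {
    // strlen() counts bytes, which is exactly what we want to see here.
    printf("%-8s %d byte(s): %s\n", $label, strlen($bytes), bin2hex($bytes));
}
```

The ASCII letter is the only one stored in a single byte; everything else grows to two, three, or four bytes.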
For the sake of brevity, I’ve glossed over a great number of points related to Unicode and character sets. If you want to learn more, I highly recommend the article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky, at www.joelonsoftware.com/articles/Unicode.html. It contains links to a number of other good resources as well.
Why Care About Unicode?
As far as Unicode and UTF-8 are concerned, all web sites can be placed in one of three categories: those that don’t need to care about them, those that should convert to UTF-8, and those that should convert to UTF-8 and internationalize.
The most common character set currently in use on the English-speaking side of the web, other than UTF-8, is Western ISO-8859-1 (aka Latin-1). If your site isn’t already using UTF-8, then you’re probably using Latin-1. If you’ve had no problems related to character sets so far, and you have absolutely no foreseeable need to handle text outside the ASCII range, then you fall into the first category: you probably don’t need to do anything. As you’ll see in the rest of this article, converting to UTF-8 is not a painless process, so you should only undertake the work if you have some clearly identifiable, relevant goals to meet.
Here at the University of Pennsylvania School of Medicine, we fall into the second category: our web sites are in English, but we occasionally handle data from a variety of foreign languages that don’t use the English alphabet. We must receive, store, display, and transmit these characters faithfully. Since we can’t reliably predict what sort of characters might come our way, converting our applications to UTF-8 was the logical choice, since it can handle any language we might need to support.
The third category is for sites that don’t just occasionally handle foreign characters: they actually serve an international audience. In addition to using UTF-8, these sites typically employ various mechanisms that allow visitors to choose the language for displaying content. One important term applied here is internationalization, defined by the W3C as “[t]he process of designing, creating, and maintaining software that can serve the needs of users with differing language, cultural, or geographic requirements and expectations” (see http://www.w3.org/TR/ws-i18n-scenarios/). Another key term is localization: “[t]he tailoring of a system to the individual cultural expectations for a specific target market or group of individuals.” Sites that are able to dynamically perform localization for a variety of target audiences can do so because they’ve been configured with a good internationalization framework.
Internationalization and localization are substantial topics, and are not the focus of this article. However, getting all the various components of your web application environment to play nicely together using UTF-8 is a necessary step before you can even try internationalizing your site. So this article will be of interest to those who only want to handle the occasional non-English characters, and to those who are contemplating fully internationalizing their site.
Getting Ready for UTF-8
The first step is determining the scope of your work. At a minimum, you probably have PHP, a web server, and a database to consider. I’ll cover doing a UTF-8 conversion with PHP, Apache, and Oracle. If you are also using Oracle, then you must read An Overview on Globalizing Oracle PHP Applications at http://www.oracle.com/technology/tech/opensource/php/globalizing_oracle_php_applications.html. It’s an excellent starting point, but, unfortunately, it doesn’t always explain the reasons behind its recommendations, which means you’ll get stuck if things don’t happen to work after you follow its instructions. I’ll try to fill those gaps.
You also have to take a look at any other applications that interact with PHP, your web server, or your database, as they will also be affected by a character set conversion. For us, that included Smarty, PDFlib, and exporting data to RTF, text files, and email, so I’ll discuss those as well. Even if you have a different mix of applications, the concepts I’ll describe are probably applicable to your situation, although the implementation specifics, obviously, will be different.
Configuring Apache, PHP, and Oracle
Most of the time, PHP web applications are run under the Apache web server, which itself is running in a user account (assuming you’re in a Unix-ish environment). So, the first step is to set the environment of this account correctly. Since PHP and Oracle are speaking to each other through this account, it’s crucial to specify the right character set for it, so they both know what to expect. You do this by setting the NLS_LANG environment variable in the Apache configuration. The Oracle Overview document mentioned above says to set it to AL32UTF8, but doesn’t fully explain why. So when this didn’t do the trick for me, I had to do some more research. I looked up the Oracle Character Set descriptions and learned that AL32UTF8 corresponds to Unicode 3.1. After talking with our DBA I learned that our Oracle database was set to Unicode 3.0, which meant I needed to set NLS_LANG=.UTF8. Note that we ultimately switched to AL32UTF8, since it corresponds to the latest version of Unicode, and in Oracle it allows for conversion between UTF-16 and UTF-8 (just in case you ever need to do that). The moral of the story is that NLS_LANG should exactly match the character set you’re using in Oracle.
What I just said contradicts the advice of the Oracle Overview document, where it says NLS_LANG should be set to match the client (in this case, PHP) but that it doesn’t need to match the database character set. That’s technically true, but a mismatch will quickly lead to trouble if, for example, you try to insert records from PHP that are in an encoding that’s not compatible with the Oracle character set. If you’re going to switch to UTF-8, do it wholeheartedly: set PHP, your web server, and your database all to UTF-8. This will save you the headache of translating character encodings as you move data around.
NLS_LANG is not the end of the story. It applies to the communication between PHP and Oracle, but it doesn’t determine how characters are encoded within PHP, and it doesn’t influence how documents are served by Apache. There are a few different approaches to consider for having Apache and PHP serve your web pages in UTF-8.
If you want all of the documents on your server to default to UTF-8, one option is to set the AddDefaultCharset directive in the Apache configuration to UTF-8. Note, however, that the Apache documentation at http://httpd.apache.org/docs-2.0/mod/core.html does not express enthusiasm about this approach: “AddDefaultCharset should only be used when all of the text resources to which it applies are known to be in that character encoding and it is too inconvenient to label their charset individually. One such example is to add the charset parameter to resources containing generated content, such as legacy CGI scripts, that might be vulnerable to cross-site scripting attacks due to user-provided data being included in the output. Note, however, that a better solution is to just fix (or delete) those scripts…”
If you want all of your PHP-generated content to be served in UTF-8, set default_charset=UTF-8 in your php.ini file. It’s OK if the PHP default_charset is different from what’s specified in Apache’s AddDefaultCharset: the former will apply only to PHP files, and the latter will apply to everything else.
If you want some (but not all) of your PHP documents served in UTF-8, you don’t have to modify php.ini. Instead, specify UTF-8 as the character set in the Content-type header of those files. It’s important to point out here that you should set this header with the PHP header() function. If you try to set it with an HTML Meta tag, and you’ve used Apache’s AddDefaultCharset directive to specify a different character set, the Apache directive will override your Meta tag.
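As a rough illustration of the per-file approach, the header could be sent as shown below. The helper name utf8ContentType() is hypothetical and exists only to keep the header string in one place; header() and headers_sent() are standard PHP functions.

```php
<?php
// Build the Content-Type header line for a given MIME type.
// The helper name is hypothetical; header() itself is standard PHP.
function utf8ContentType($mimeType = 'text/html')
{
    return 'Content-Type: ' . $mimeType . '; charset=UTF-8';
}

// header() must run before any output is sent to the browser.
if (!headers_sent()) {
    header(utf8ContentType());
}
```

Placing this at the very top of a script, before any HTML or whitespace is output, avoids the "headers already sent" problem.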
Now that you’ve configured how you want documents served, you need to configure PHP so it can internally handle UTF-8. This means enabling multi-byte character support. You’ll need to re-compile PHP with the --enable-mbstring option (unless, of course, you had the foresight to do it previously), and set mbstring.internal_encoding=UTF-8 in your php.ini file. Look over the PHP documentation for multi-byte string functions at http://www.php.net/ref.mbstring.
Many of the PHP string functions have multi-byte equivalents. An example is the best way to illustrate what this means. The multi-byte version of strlen() is mb_strlen(). The strlen() function assumes that a character always occupies a single byte, so it actually returns the length of a string in bytes, and does not necessarily indicate the number of characters. In UTF-8, though, a string that is 4 characters long could occupy anywhere from 4 to 16 bytes, depending on the presence of multi-byte characters (each character takes between one and four bytes). The mb_strlen() function will correctly tell you the number of characters in such a string, but the regular strlen() function won’t.
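The difference is easy to see with a short sketch. The sample word is my own, and the mbstring extension is assumed to be loaded; the encoding is passed explicitly so the result does not depend on the php.ini settings.

```php
<?php
// "naive" with a diaeresis on the i, written as raw UTF-8 bytes:
// the accented letter (U+00EF) takes two bytes, the others one each.
$word = "na\xC3\xAFve";

$byteCount = strlen($word);              // counts bytes
$charCount = mb_strlen($word, 'UTF-8');  // counts characters

echo "bytes: $byteCount, characters: $charCount\n";
// prints "bytes: 6, characters: 5"
```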
Because of all this, you should consider enabling PHP’s function overloading feature, described at http://php.net/ref.mbstring#mbstring.overload. Activating function overloading will cause PHP to automatically assume it’s handling multi-byte strings, so (continuing with the example) it will actually execute mb_strlen() when you call strlen(). If you’re making a wholesale conversion to UTF-8, and you don’t want to revise all of the string function calls in your existing code, implementing function overloading makes sense. But there are a couple of caveats:
Watch out for calls to strlen() (or any other string function) where it really is intended to work with the byte length, not the character length. In that situation, function overloading will end up giving you an unintended result. Fortunately, there is a workaround for mb_strlen(): it accepts a character set specification as a second argument. If you pass in 'latin1' (even though it’s actually handling a UTF-8 string), the string will be evaluated as if it were single-byte encoded, so mb_strlen($your_utf8_string, 'latin1') will give you the number of bytes in a multi-byte string.
You may not want to do function overloading on mail(). I’ll explain why in the discussion of email below.
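The byte-length workaround from the first caveat can be sketched as follows. One assumption to flag: the sketch names the encoding 'ISO-8859-1', the canonical name for Latin-1, rather than the 'latin1' spelling used in the text.

```php
<?php
// Euro sign: one character, three bytes in UTF-8.
$utf8String = "\xE2\x82\xAC";

// Naming a single-byte encoding makes mb_strlen() count every byte
// as one character, which yields the byte length.
$byteLength = mb_strlen($utf8String, 'ISO-8859-1');

// Naming the real encoding yields the character length.
$charLength = mb_strlen($utf8String, 'UTF-8');
```

Here $byteLength is 3 and $charLength is 1.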
Note that if you haven’t upgraded to PHP 5, the html_entity_decode() function will return an error if you pass it a UTF-8 string. This was the only UTF-8 incompatibility we found in PHP 4.3.
Going back to Oracle: starting with Oracle 9i, it provides improved handling for multi-byte characters by giving you a way to distinguish between byte length and character length. When creating a table, you can specify whether a column’s length is defined in terms of characters or bytes. For example, VARCHAR2(20 BYTE) will give you a 20-byte length field, and VARCHAR2(20 CHAR) will give you a 20-character length field. The default is BYTE, which you can alter with the NLS_LENGTH_SEMANTICS parameter; see your Oracle documentation for more details.
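On the PHP side, the practical consequence is that a value can fit a CHAR-defined column yet overflow a BYTE-defined column of the same nominal size. The sketch below illustrates this with a hypothetical helper, fitsColumn(), which is not an Oracle or PHP API; it assumes the value is UTF-8 encoded and that the database stores it with the same byte lengths.

```php
<?php
// Hypothetical helper: would $value fit in a column declared as
// VARCHAR2($limit BYTE) or VARCHAR2($limit CHAR)?
// Assumes $value is UTF-8 and the database uses the same byte lengths.
function fitsColumn($value, $limit, $semantics = 'BYTE')
{
    $length = ($semantics === 'CHAR')
        ? mb_strlen($value, 'UTF-8')   // count characters
        : strlen($value);              // count bytes
    return $length <= $limit;
}

// Twenty euro signs: 20 characters, but 60 bytes in UTF-8.
$value = str_repeat("\xE2\x82\xAC", 20);

var_dump(fitsColumn($value, 20, 'CHAR'));  // bool(true)
var_dump(fitsColumn($value, 20, 'BYTE'));  // bool(false)
```

A check like this could run before an insert to give the user a clear message instead of a database error.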
Beware Windows-1252 in Web Forms
As I mentioned, other than UTF-8, the character encoding you’re most likely to find on English-speaking web sites these days is Latin-1 (aka Western ISO-8859-1). One of the nice things about Unicode is that its first 256 code points are the same as in Latin-1. That is, the Latin-1 ASCII characters and its Extended ASCII characters have the same numbers in Unicode. In UTF-8, the ASCII characters (0-127) are also stored as the same single bytes, while the characters from 128-255 keep their numbers but are stored as two bytes each. If you’re currently on Latin-1, this greatly eases the pain of switching to UTF-8.
So, the big “however” comes from (you guessed it) Windows. Fortunately, Windows NT, 2000, and XP use Unicode internally and shouldn’t cause headaches for a UTF-8 web site. But Windows 95 and 98 use the Windows-1252 character set. Its standard ASCII characters from 0-127 are the same as Latin-1 and UTF-8, but its Extended ASCII set is different: it assigns printable characters such as curly quotes to the range 128-159, which Latin-1 reserves for control codes. If you have a form on a web page that’s UTF-8 encoded, and someone running Windows 9x fills out the form by copying-and-pasting text from Microsoft Word, Extended ASCII characters may not be interpreted properly. You may have experienced this before: for example, the “©” symbol in your Word document turned into something like “ä” when you pasted it into a form. Nothing about the character’s underlying data changed (the decimal representation of the character is the same as it was before); it just means something different in UTF-8 than it does in Windows-1252.
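A short sketch shows how one and the same byte reads differently depending on the character set you assume. The byte chosen here (0x93, Word's left curly quote) is my own example, and the mbstring extension is assumed to be available.

```php
<?php
// Byte 0x93 is a left curly quote in Windows-1252, but an unprintable
// control code in Latin-1 (ISO-8859-1).
$byte = "\x93";

$fromCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');
$fromLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

echo bin2hex($fromCp1252), "\n";  // e2809c, the UTF-8 bytes for U+201C
echo bin2hex($fromLatin1), "\n";  // c293, the UTF-8 bytes for U+0093

// On its own, the byte is not valid UTF-8 at all.
var_dump(mb_check_encoding($byte, 'UTF-8'));  // bool(false)
```

The data never changes; only the character set named in the conversion decides whether the result is a curly quote or a control code.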
This was more of a problem in the past than it is now, as modern browsers try to transparently perform a character set conversion for you as needed in these situations. But the problems are by no means entirely resolved: see FORM submission and i18n at http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html for a thorough overview of all the issues related to this, as well as a rundown of how the major browsers behave (if you’re wondering about the meaning of i18n, it’s short-hand for internationalization).
What makes this a truly maddening problem is converting a Latin-1 encoded database to UTF-8 when some of the data in it came from Latin-1 encoded web forms where users pasted in Windows-1252 text, and their browsers didn’t convert the characters properly. There is no easy fix for this, as you simply have to look at the records yourself to see if the Extended ASCII characters are displaying as the user intended, or if there was a character set conversion problem along the way.
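The manual review can at least be narrowed down. The sketch below is one possible heuristic, not the procedure we used: it flags values containing bytes in the range 128-159 (0x80-0x9F), which Latin-1 reserves for control codes and Windows-1252 uses for curly quotes, dashes, and similar characters. The function name is hypothetical.

```php
<?php
// Hypothetical check: does a supposedly Latin-1 string contain bytes in
// the 0x80-0x9F range? Such bytes rarely belong in genuine Latin-1 text,
// so they hint at pasted Windows-1252 characters.
function looksLikeWindows1252($latin1Text)
{
    // No /u modifier: the pattern is matched byte by byte.
    return preg_match('/[\x80-\x9F]/', $latin1Text) === 1;
}

var_dump(looksLikeWindows1252("\x93quoted\x94"));  // bool(true)
var_dump(looksLikeWindows1252("caf\xE9"));         // bool(false)
```

Flagged records still need a human eye, since the check cannot tell what the user actually intended.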
UTF-8 Support in Smarty
Smarty handles UTF-8 transparently, almost. The one trouble spot is the escape modifier. It calls the PHP htmlentities() and htmlspecialchars() functions, but it doesn’t provide them with the charset argument they need to work with UTF-8. The solution is to