Advanced PHP Programming, Part 11

Chapter 20 PHP and Zend Engine Internals
• opcode 1: Here the ZEND_ASSIGN handler assigns the value hello to Register 0 (the pointer
  to $hi). Register 1 is also assigned to, but it is never used. Register 1 would be utilized
  if the assignment were being used in an expression like this:
      if($hi = 'hello'){}
• opcode 2: Here you re-fetch the value of $hi, now into Register 2. You use the op
  ZEND_FETCH_R because the variable is used in a read-only context.
• opcode 3: ZEND_ECHO prints the value of Register 2 (or, more accurately, sends it to the
  output buffering system). echo (like the closely related print) is an operation that is
  built in to PHP itself, as opposed to a function that needs to be called.
• opcode 4: ZEND_RETURN is called, setting the return value of the script to 1. Even though
  return is not explicitly called in the script, every script contains an implicit return 1,
  which is executed if the script completes without return being explicitly called.
Here is a more complex example:
<?php
$hi = 'hello';
echo strtoupper($hi);
?>
The intermediate code dump looks similar:
opnum  line  opcode          op1           op2      result
0      2     ZEND_FETCH_W    "hi"                   '0
1      2     ZEND_ASSIGN     '0            "hello"  '0
2      3     ZEND_FETCH_R    "hi"                   '2
3      3     ZEND_SEND_VAR   '2
4      3     ZEND_DO_FCALL   "strtoupper"           '3
5      3     ZEND_ECHO       '3
6      5     ZEND_RETURN     1
Notice the differences between these two scripts.
• opcode 3: The ZEND_SEND_VAR op pushes a pointer to Register 2 (the variable $hi) onto the
  argument stack. This argument stack is how the called function receives its arguments.
  Because the function called here is an internal function (implemented in C and not in
  PHP), its operation is completely hidden from PHP. Later you will see how a userspace
  function receives arguments.
• opcode 4: The ZEND_DO_FCALL op calls the function strtoupper and indicates that
  Register 3 is where its return value should be set.
Here is an example of a trivial PHP script that implements conditional flow control:
<?php
$i = 0;
while($i < 5) {
$i++;
}
?>
opnum  line  opcode            op1   op2   result
0      2     ZEND_FETCH_W      "i"         '0
1      2     ZEND_ASSIGN       '0    0     '0
2      3     ZEND_FETCH_R      "i"         '2
3      3     ZEND_IS_SMALLER   '2    5     '2
4      3     ZEND_JMPZ         $3
5      4     ZEND_FETCH_RW     "i"         '4
6      4     ZEND_POST_INC     '4          '4
7      4     ZEND_FREE         $5
8      5     ZEND_JMP
9      7     ZEND_RETURN       1
Note here that you have a ZEND_JMPZ op to set a conditional branch point (to evaluate
whether you should jump to the end of the loop if $i is greater than or equal to 5) and
a ZEND_JMP op to bring you back to the top of the loop to reevaluate the condition at
the end of each iteration.
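
The conditional branch and the backward jump can be mimicked with a toy interpreter. The following C sketch is not Zend Engine code: the opcode names (OP_IS_SMALLER, OP_JMPZ, and so on), the struct toy_op, and the function run_toy_loop are all hypothetical, and the jump targets are filled in by hand on the assumption that the conditional jump leaves the loop and the unconditional jump returns to the comparison.

```c
#include <stddef.h>

/* Hypothetical, simplified opcodes; these are not the real Zend constants. */
enum toy_opcode { OP_ASSIGN, OP_IS_SMALLER, OP_JMPZ, OP_POST_INC, OP_JMP, OP_RETURN };

struct toy_op {
    enum toy_opcode opcode;
    long operand;   /* constant operand or jump target, depending on opcode */
};

/* Runs: $i = 0; while ($i < limit) { $i++; } and returns the final $i.
 * If iterations is non-NULL, it receives the number of loop-body passes. */
long run_toy_loop(long limit, long *iterations)
{
    const struct toy_op ops[] = {
        { OP_ASSIGN,     0     },  /* 0: $i = 0                        */
        { OP_IS_SMALLER, limit },  /* 1: flag = ($i < limit)           */
        { OP_JMPZ,       5     },  /* 2: if flag is zero, jump to op 5 */
        { OP_POST_INC,   0     },  /* 3: $i++                          */
        { OP_JMP,        1     },  /* 4: jump back to the comparison   */
        { OP_RETURN,     1     },  /* 5: implicit return               */
    };
    long i = 0, flag = 0, passes = 0;
    size_t pc = 0;   /* index of the op being executed */

    for (;;) {
        const struct toy_op *op = &ops[pc];
        switch (op->opcode) {
        case OP_ASSIGN:     i = op->operand;          pc++; break;
        case OP_IS_SMALLER: flag = (i < op->operand); pc++; break;
        case OP_JMPZ:       pc = flag ? pc + 1 : (size_t)op->operand; break;
        case OP_POST_INC:   i++; passes++;            pc++; break;
        case OP_JMP:        pc = (size_t)op->operand; break;
        case OP_RETURN:
            if (iterations) *iterations = passes;
            return i;
        }
    }
}
```

Calling run_toy_loop(5, &n) walks the same path as the script above: the comparison is evaluated six times, and the loop body runs on the first five of them. The sketch keeps $i in a plain C variable rather than in numbered registers, so it illustrates only the jump structure, not register allocation.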
Observe the following in these examples:
• Six registers are allocated and used in this code, even though only two registers are
  ever used at any one time. Register reuse is not implemented in PHP. For large scripts,
  thousands of registers may be allocated.
• No real optimization is performed on the code. This postincrement:
      $i++;
  could be optimized to a pre-increment:
      ++$i;
  because it is used in a void context (that is, it is not used in an expression where
  the former value of $i needs to be stored). This would save you having to stash its
  value in a register.
• The jump oplines are not displayed in the debugger. This is really the fault of the
  assembly dumper. The Zend Engine leaves ops used for some internal purposes marked as
  unused.
Before we move on, there is one last important example to look at. The example showing
function calls earlier in this chapter uses strtoupper, which is a built-in function.
Calling a function written in PHP looks similar to calling a built-in function:
<?php
function hello($name) {
    echo "hello\n";
}
hello("George");
?>
opnum  line  opcode          op1       op2   result
0      2     ZEND_NOP
1      5     ZEND_SEND_VAL   "George"
2      5     ZEND_DO_FCALL   "hello"         '0
3      7     ZEND_RETURN     1
But where is the function code? This code simply sets the argument stack (via
ZEND_SEND_VAL) and calls hello, but you don’t see the code for hello anywhere. This is
because functions in PHP are op arrays as well, as if they were miniature scripts. For
example, here is the op array for the function hello:
FUNCTION: hello
opnum  line  opcode         op1          op2   result
0      2     ZEND_FETCH_W   "name"             '0
1      2     ZEND_RECV      1                  '0
2      3     ZEND_ECHO      "hello%0A"
3      4     ZEND_RETURN    NULL
This looks pretty similar to the inline code you’ve seen before. The only difference is
ZEND_RECV, which reads off the argument stack. As with standalone scripts, even though
you don’t explicitly return at the end, a ZEND_RETURN op is implicitly added, and it
returns null.
Includes work similarly to function calls:
<?php
include("file.inc");
?>
opnum  line  opcode                 op1          op2   result
0      2     ZEND_INCLUDE_OR_EVAL   "file.inc"         '0
1      4     ZEND_RETURN            1
This illustrates an important aspect of the PHP language: All includes and requires
happen at runtime. So when a script is initially parsed, the op array for that script is
generated, and any functions and classes defined in its top-level file (the one that is
actually run) are inserted into the symbol table; but no potentially included scripts are
parsed yet. When the script is executed, if an include statement is encountered, the
include is then parsed and executed on the spot. Figure 20.1 illustrates the flow of a
normal PHP script.
Figure 20.1 The execution path of a PHP script.
This design choice has a number of repercussions:
• Flexibility: It is an oft-vaunted fact that PHP is a runtime language. One of the
  important things that being a runtime language means for PHP is that it supports
  conditional inclusion of files and conditional declaration of functions and classes.
  Here’s an example:
      if($condition) {
          include("file1.inc");
      }
      else {
          include("file2.inc");
      }
  In this example, the runtime parsing and execution of included files makes this
  operation more efficient (because files are included only when needed), and it
  eliminates the potential hassles of symbol conflicts if two files contain different
  implementations of the same function or class.
• Speed: Having to actually compile includes on the fly means that a significant
  portion of a script’s execution time is spent simply compiling its dependent
  includes. If a file is included twice, it must be parsed and executed twice.
  include_once and require_once partially solve that problem, but it is further
  exacerbated by the fact that PHP resets its compiler state completely between
  script executions. (We’ll talk about that more in a minute, as well as some ways to
  minimize that effect.)
Variables
Programming languages come in two basic flavors when it comes to how variables are
declared:
• Statically typed: Statically typed languages include languages such as C++ or Java,
  where a variable is assigned a type (for example, int or String) and that type is
  fixed at compile time.
• Dynamically typed: Dynamically typed languages include languages such as PHP, Perl,
  Python, and VBScript, where types are automatically inferred at runtime. If you use
  this:
      $variable = 0;
  PHP will automatically create it as an integer type.
Furthermore, there are two additional criteria for how types are enforced or converted
between:
• Strongly typed: In a strongly typed language, if an expression receives an argument
  of the wrong type, an error is generated. With few exceptions, statically typed
  languages are strongly typed (although many allow one type to be cast, or forced
  to be interpreted, as another type). Some dynamically typed languages, such as
  Python and Ruby, have strong typing; in them, exceptions are thrown if variables
  are used in an incorrect context.
• Weakly typed: A weakly typed language does not necessarily enforce types. This
  is usually accompanied by autoconversion of variables to appropriate types. For
  instance, in this:
      $string = "The value of \$variable is $variable.";
  $variable (which was autocast into an integer when it was first set) is now
  autoconverted into a string type so that it can be used to create $string.
All these typing strategies have their relative benefits and drawbacks. Static typing allows
you to enforce a certain level of data validation at compile time. Because dynamically
typed languages must defer that type resolution and checking to runtime, they tend to be
slower than statically typed languages. Dynamic typing is, of course, more flexible. Most
interpreted languages choose to go with dynamic typing because it fits their flexibility.
Strong typing similarly allows you a good amount of built-in data validation, in this
case at runtime. Weak typing provides additional flexibility by allowing variables to
autoconvert between types as necessary. The interpreted languages are pretty well split on
strong typing versus weak typing. Python and Ruby (both of which bill themselves as
general-purpose “enterprise” languages) implement strong typing, whereas Perl, PHP, and
JavaScript implement weak typing.
PHP is both dynamically typed and weakly typed. One slight exception is the optional
type checking for argument types in functions. For example, this:
function foo(User $array) { }
and this:
function bar( Exception $array) {}
enforce being passed a User or an Exception object (or one of its descendants or
implementers), respectively.
To fully understand types in PHP, you need to look under the hood at the data
structures used in the engine. In PHP, all variables are zvals, represented by the
following C structure:
struct _zval_struct {
    /* Variable information */
    zvalue_value value;   /* value */
    zend_uint refcount;
    zend_uchar type;      /* active type */
    zend_uchar is_ref;
};
and its complementary data container:
typedef union _zvalue_value {
    long lval;               /* long value */
    double dval;             /* double value */
    struct {
        char *val;
        int len;
    } str;                   /* string value */
    HashTable *ht;           /* hashtable value */
    zend_object_value obj;   /* handle to an object */
} zvalue_value;
The zval consists of its own value (which we’ll get to in a moment), a refcount, a type,
and the flag is_ref.
A zval’s refcount is the reference counter for the value associated with that variable.
When you instantiate a new variable, like this, it is created with a reference count of 1:
$variable = ‘foo’;
If you create a copy of $variable, the zval for its value has its reference count
incremented. So after you perform the following, the zval for 'foo' has a reference
count of 2:
$variable_copy = $variable;
If you then change $variable, it will be associated to a new zval with a reference
count of 1, and the original string 'foo' will have its reference count decremented to 1,
as follows:
$variable = ‘bar’;
When a variable falls out of scope (say it’s defined in a function and that function is
returned from), or when the variable is destroyed, its zval’s reference count is
decremented by one. When a zval’s refcount reaches 0, it is picked up by the
garbage-collection system and its contents will be freed.
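
The counting described above can be sketched in a few lines of C. This is a hypothetical miniature, not the engine’s zval code: the type toy_val and the functions toy_new, toy_share, toy_release, and toy_assign are invented for illustration, and only string values are modeled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of a refcounted string value; not the real zval. */
typedef struct toy_val {
    char *str;
    unsigned refcount;
} toy_val;

/* Creates a value with refcount 1, as for: $variable = 'foo'; */
toy_val *toy_new(const char *s)
{
    toy_val *v = malloc(sizeof *v);
    if (!v) return NULL;
    v->str = malloc(strlen(s) + 1);
    if (!v->str) { free(v); return NULL; }
    strcpy(v->str, s);
    v->refcount = 1;
    return v;
}

/* Shares the value with another variable: $variable_copy = $variable; */
toy_val *toy_share(toy_val *v)
{
    v->refcount++;
    return v;
}

/* Drops one reference and frees the value at zero. Returns 1 if freed. */
int toy_release(toy_val *v)
{
    if (--v->refcount > 0) return 0;
    free(v->str);
    free(v);
    return 1;
}

/* Rebinds a variable slot to a new string: $variable = 'bar'; */
int toy_assign(toy_val **slot, const char *s)
{
    toy_val *fresh = toy_new(s);
    if (!fresh) return -1;
    toy_release(*slot);   /* the old value loses one reference */
    *slot = fresh;
    return 0;
}
```

Following the three PHP statements in order, toy_new("foo") yields a count of 1, toy_share raises it to 2, and toy_assign(&variable, "bar") leaves 'foo' with a count of 1 while 'bar' starts at 1. Freeing immediately at zero is a simplification chosen for the sketch.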
The zval type is especially interesting. The fact that PHP is a weakly typed language
does not mean that variables do not have types. The type attribute of the zval specifies
what the current type of the zval is; this indicates which part of the zvalue_value
union should be looked at for its value.
Finally, is_ref indicates whether this zval actually holds data or is simply a reference
to another zval that holds data.
The zvalue_value value is where the data for a zval is actually stored. This is a
union of all the possible base types for a variable in PHP: long integers, doubles, strings,
hashtables (arrays), and object handles. A union in C is a composite data type that uses a
minimal amount of space to store different possible types at different times. Practically,
this means that the data stored for a zval is either a numeric representation, a string
representation, an array representation, or an object representation, but never more than
one at a time. This is in contrast to a language such as Perl, where all these potential
representations can coexist (this is how in Perl you can have a variable that has entirely
different representations when accessed as a string than when accessed as a number).
When you switch types in PHP (which is almost never done explicitly; it almost always
happens implicitly, when a usage demands that a zval be in a different representation
than it currently is), zvalue_value is converted into the required format. This is why
you get behavior like this:
$a = “00”;
$a += 0;
echo $a;
which prints 0 and not 00 because the extra characters are silently discarded when $a is
converted to an integer on the second line.
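
The “one representation at a time” behavior can be sketched with a small tagged union in C. Everything here is hypothetical (toy_zval, toy_convert_to_long, toy_add_long), and the conversion uses the C library’s strtol as a stand-in; the engine’s own string-to-number rules are more involved.

```c
#include <stdlib.h>

/* Hypothetical tagged union in the spirit of zval/zvalue_value. */
enum toy_type { TOY_LONG, TOY_STRING };

typedef struct toy_zval {
    enum toy_type type;      /* says which union member is valid */
    union {
        long lval;
        const char *sval;    /* borrowed pointer; never freed here */
    } value;
} toy_zval;

/* Converts in place to a long, discarding the string representation. */
void toy_convert_to_long(toy_zval *z)
{
    if (z->type == TOY_STRING) {
        long n = strtol(z->value.sval, NULL, 10);
        z->value.lval = n;   /* overwrites the string pointer */
        z->type = TOY_LONG;
    }
}

/* Models `$a += n;` by converting first and then adding. */
void toy_add_long(toy_zval *z, long n)
{
    toy_convert_to_long(z);
    z->value.lval += n;
}
```

Starting from a toy_zval that holds the string "00", toy_add_long(&a, 0) leaves it as the long 0. Because the union has room for only one member, the original characters are gone after the conversion, which is the effect the PHP example shows.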
Variable types are also important in comparison. When you compare two variables
with the identical operator (===), like this, the active types for the zvals are compared,
and if they are different, the comparison fails outright:
$a = 0;
$b = '0';
echo ($a === $b) ? "Match" : "Doesn't Match";
For that reason, this comparison fails and the example prints Doesn’t Match.
With the is-equal operator (==), the comparison that is performed is based on the
active types of the operands. If the operands are strings or nulls, they are compared as
strings; if either is a Boolean, they are converted to Boolean values and compared;
otherwise they are converted to numbers and compared. Although this results in the ==
operator being symmetrical (that is, $a == $b is the same as $b == $a), it actually is
not transitive. The following example of this was kindly provided by Dan Cowgill:
$a = "0";
$b = 0;
$c = "";
echo ($a == $b) ? "True" : "False"; // True
echo ($b == $c) ? "True" : "False"; // True
echo ($a == $c) ? "True" : "False"; // False
Although transitivity may seem like a basic feature of an operator algebra, understanding
how == works makes it clear why transitivity does not hold. Here are some examples:
• $a == $b (that is, "0" == 0) because both variables end up being converted to
  integers and compared.
• $b == $c because both $b and $c are converted to integers and compared.
• However, $a != $c because both $a and $c are strings, and when they are compared
  as strings, they are decidedly different.
In his commentary on this example, Dan compared this to the == and eq operators in
Perl, which are both transitive. They are both transitive, though, because they are both
typed comparisons. == in Perl coerces both operands into numbers before performing the
comparison, whereas eq coerces both operands into strings. The PHP == is not a typed
comparator, though, and it coerces variables only if they are not of the same active type.
Thus the lack of transitivity.
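
To see the non-transitivity fall out of the rules mechanically, here is a small C model of them. It is a sketch under stated assumptions: cmp_val, cmp_to_long, and loose_equal are hypothetical names, only the string and integer cases from the text are modeled (Booleans and nulls are left out), and strtol stands in for the engine’s conversion. It describes the rules as given in this chapter; PHP 8 changed how numbers are compared with non-numeric strings, so 0 == "" no longer holds there.

```c
#include <stdlib.h>
#include <string.h>

enum cmp_type { CMP_LONG, CMP_STRING };

typedef struct cmp_val {
    enum cmp_type type;
    long lval;           /* valid when type == CMP_LONG   */
    const char *sval;    /* valid when type == CMP_STRING */
} cmp_val;

/* Numeric view of a value; an empty or non-numeric string becomes 0. */
static long cmp_to_long(const cmp_val *v)
{
    return v->type == CMP_LONG ? v->lval : strtol(v->sval, NULL, 10);
}

/* Loose equality following only the rules stated in the text:
 * two strings compare as strings; otherwise both become numbers. */
int loose_equal(const cmp_val *a, const cmp_val *b)
{
    if (a->type == CMP_STRING && b->type == CMP_STRING)
        return strcmp(a->sval, b->sval) == 0;
    return cmp_to_long(a) == cmp_to_long(b);
}
```

With a as the string "0", b as the integer 0, and c as the empty string, loose_equal(&a, &b) and loose_equal(&b, &c) are both true while loose_equal(&a, &c) is false. The coercion depends on the pair of operands rather than on the operator, which is exactly what breaks transitivity.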
Functions
You’ve seen that when a piece of code calls a function, it populates the argument stack
via ZEND_SEND_VAL and uses a ZEND_DO_FCALL op to execute the function. But what
does that really do? To really understand how these things work, you need to go back to
even before compilation. When PHP starts up, it looks through all its registered
extensions (both the ones that were compiled statically and any that were registered in
the php.ini file) and registers all the functions that they define. These functions look
like this:
typedef struct _zend_internal_function {
    /* Common elements */
    zend_uchar type;
    zend_uchar *arg_types;
    char *function_name;
    zend_class_entry *scope;
    zend_uint fn_flags;
    union _zend_function *prototype;
    /* END of common elements */
    void (*handler)(INTERNAL_FUNCTION_PARAMETERS);
} zend_internal_function;
The important things to note here are the type (which is always ZEND_INTERNAL_FUNCTION,
meaning that it is an extension function written in C), the function name, and the
handler, which is a C function pointer to the function itself and is part of the
extension code.
Registering one of these functions basically amounts to its being inserted into the
global function table (a hashtable in which functions are stored).
User-defined functions are, of course, inserted by the compiler. When the compiler
(by which I still mean the lexer, parser, and code generator all together) encounters a
piece of code like this:
function say_hello($name)
{
    echo "Hello $name\n";
}
it compiles the code inside the function’s block as a new op array, creates a
zend_function with that op array, and inserts that zend_function into the global function
table with its type set to ZEND_USER_FUNCTION. A zend_function looks like this:
typedef union _zend_function {
    zend_uchar type;
    struct {
        zend_uchar type;   /* never used */
        zend_uchar *arg_types;
        char *function_name;
        zend_class_entry *scope;
        zend_uint fn_flags;
        union _zend_function *prototype;
    } common;
    zend_op_array op_array;
    zend_internal_function internal_function;
} zend_function;
This definition can be rather confusing if you don’t recognize one of the design goals:
For the most part, zend_functions, zend_internal_functions, and op arrays are treated as
the same thing. They are not identical structs, but all the elements that are in “common”
they hold in common. Thus they can safely be cast to each other.
In practice, this means that when a ZEND_DO_FCALL op is executed, it stashes away the
current scope, populates the argument stack, and looks up the requested function by
name (actually by the lowercase version of the name because PHP implements
case-insensitive function names), returning a pointer to a zend_function. If the function’s
type is ZEND_INTERNAL_FUNCTION, it can be recast to a zend_internal_function and
executed via zend_execute_internal, which executes internal functions. Otherwise, it
will be executed via zend_execute, the same function that is called to execute scripts
and includes. This works because user functions are completely identical to op arrays.
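
The lookup-and-dispatch step can be sketched as follows. This is an illustration only: toy_function, toy_table, toy_call, and the two registered functions are hypothetical, a linear scan stands in for the engine’s hashtable, and a single number stands in for a user function’s op array. What the sketch does keep is the shared leading members, which let the code read the type and name before knowing which variant it holds.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum toy_fn_type { TOY_INTERNAL_FUNCTION, TOY_USER_FUNCTION };

/* Both variants start with the same members (type, name), so those
 * members can be read through either variant. */
typedef struct toy_internal_fn {
    enum toy_fn_type type;
    const char *name;
    long (*handler)(long arg);   /* C function pointer */
} toy_internal_fn;

typedef struct toy_user_fn {
    enum toy_fn_type type;
    const char *name;
    long add;                    /* stand-in for an op array */
} toy_user_fn;

typedef union toy_function {
    enum toy_fn_type type;
    toy_internal_fn internal;
    toy_user_fn user;
} toy_function;

static long toy_double(long x) { return 2 * x; }

/* Names are stored in lowercase, as in the engine's function table. */
static toy_function toy_table[] = {
    { .internal = { TOY_INTERNAL_FUNCTION, "double", toy_double } },
    { .user     = { TOY_USER_FUNCTION,     "inc",    1          } },
};

/* Looks up a function by lowercased name, then dispatches on its type.
 * Returns 0 on success and -1 if the name is unknown or too long. */
int toy_call(const char *name, long arg, long *result)
{
    char lower[32];
    size_t i, len = strlen(name);

    if (len >= sizeof lower) return -1;
    for (i = 0; i <= len; i++)   /* copies the terminator too */
        lower[i] = (char)tolower((unsigned char)name[i]);

    for (i = 0; i < sizeof toy_table / sizeof toy_table[0]; i++) {
        const toy_function *fn = &toy_table[i];
        if (strcmp(fn->internal.name, lower) != 0) continue;
        if (fn->type == TOY_INTERNAL_FUNCTION)
            *result = fn->internal.handler(arg);   /* run C code */
        else
            *result = arg + fn->user.add;          /* "execute" user code */
        return 0;
    }
    return -1;
}
```

A call such as toy_call("DOUBLE", 21, &r) succeeds because the name is lowercased before the lookup, and it takes the internal branch; toy_call("Inc", 41, &r) takes the user branch.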
As you can likely infer from the way that PHP functions work, ZEND_SEND_VAL does
not push an argument’s zval onto the argument stack; instead, it copies it and pushes the
copy onto the stack. This has the consequence that unless a variable is passed by
reference (with the exception of objects), changing its value in a function does not
change the argument passed; it changes only the copy. To change a passed argument in a
function, pass it by reference.
Classes
Classes are similar to functions in that, like functions, they are stashed in their own global
symbol table; but they are more complex than functions. Whereas functions are similar to
scripts (possessing the same instruction set), classes are like a miniature version of the
entire execution scope.
A class is represented by a zend_class_entry, like this:
struct _zend_class_entry {
    char type;
    char *name;
    zend_uint name_length;
    struct _zend_class_entry *parent;
    int refcount;
    zend_bool constants_updated;
    zend_uint ce_flags;
    HashTable function_table;
    HashTable default_properties;
    HashTable properties_info;
    HashTable class_table;
    HashTable *static_members;
    HashTable constants_table;
    zend_function_entry *builtin_functions;
    union _zend_function *constructor;
    union _zend_function *destructor;
    union _zend_function *clone;
    union _zend_function *__get;
    union _zend_function *__set;
    union _zend_function *__call;
    /* handlers */
    zend_object_value (*create_object)(zend_class_entry *class_type TSRMLS_DC);
    zend_class_entry **interfaces;
    zend_uint num_interfaces;
    char *filename;
    zend_uint line_start;
    zend_uint line_end;
    char *doc_comment;
    zend_uint doc_comment_len;
};
Like the main execution scope, a class contains its own function table (for holding class
methods), and its own constants table. The class entry also contains a number of other
items, including tables for its attributes (for example, default_properties,
properties_info, static_members) as well as the interfaces it implements, its constructor,
its destructor, its clone, and its overloadable access functions. In addition, there is the
create_object function pointer, which, if defined, is used to create a new object and
define its handlers, which allow for fine-grained control of how that object is accessed.
One of the major changes in PHP 5 is the object model. In PHP 4, when you create
an object, you are returned a zval whose zvalue_value looks like this:
typedef struct _zend_object {
zend_class_entry *ce;
HashTable *properties;
} zend_object;
This means that zend_objects in PHP 4 are little more than hashtables (of attributes)
with a zend_class_entry floating around to hold their methods. When objects are passed
to functions, they are copied (as all other variable types are), and implementing controls
of attribute accessors is extremely hackish.
In PHP 5, an object’s zval contains a zend_object_value, like this:
struct _zend_object_value {
zend_object_handle handle;
zend_object_handlers *handlers;
};
The zend_object_value in turn contains a zend_object_handle (an integer that
identifies the location of the object in a global object store, effectively a pointer to the
object proper) and a set of handlers, which regulate all accesses to the object.
This intrinsically changes the way that objects are handled in PHP. In PHP 5, when
an object’s zval is copied (as happens on assignment or when passed into a function),
the data is not copied; another reference to the object is created. These semantics are
much more standard and correspond to the object semantics in Java, Python, Perl, and
other languages.
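
The effect of storing a handle instead of the object itself can be sketched in C. The names below (toy_store, toy_object_value, toy_object_new, toy_object_get) are hypothetical, the store is a small fixed-size array chosen for the sketch, and the handlers table is left out entirely.

```c
#include <stddef.h>

#define TOY_STORE_SIZE 8

/* Hypothetical object body kept in a global store. */
typedef struct toy_object {
    int in_use;
    long property;
} toy_object;

static toy_object toy_store[TOY_STORE_SIZE];

/* What a variable holds: just an integer handle into the store. */
typedef struct toy_object_value {
    int handle;
} toy_object_value;

/* Allocates a slot; the returned handle is -1 if the store is full. */
toy_object_value toy_object_new(long property)
{
    toy_object_value v = { -1 };
    int i;
    for (i = 0; i < TOY_STORE_SIZE; i++) {
        if (!toy_store[i].in_use) {
            toy_store[i].in_use = 1;
            toy_store[i].property = property;
            v.handle = i;
            break;
        }
    }
    return v;
}

/* Resolves a handle to the object proper, or NULL if it is invalid. */
toy_object *toy_object_get(toy_object_value v)
{
    if (v.handle < 0 || v.handle >= TOY_STORE_SIZE) return NULL;
    return toy_store[v.handle].in_use ? &toy_store[v.handle] : NULL;
}
```

Copying a toy_object_value with plain struct assignment copies only the handle, so a write made through the copy is visible through the original. That is the PHP 5 behavior described above, in contrast to the PHP 4 struct, where copying the value copied the property table.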
The Object Handlers
In PHP 5 it is possible (in the extension API) to control almost all access to an object
and its properties. A handler API is provided that implements the following access
handlers:
typedef struct _zend_object_handlers {
    /* general object functions */
    zend_object_add_ref_t add_ref;
    zend_object_del_ref_t del_ref;
    zend_object_delete_obj_t delete_obj;
    zend_object_clone_obj_t clone_obj;
    /* individual object functions */
    zend_object_read_property_t read_property;
    zend_object_write_property_t write_property;
    zend_object_read_dimension_t read_dimension;
    zend_object_write_dimension_t write_dimension;
    zend_object_get_property_ptr_ptr_t get_property_ptr_ptr;
    zend_object_get_t get;
    zend_object_set_t set;
    zend_object_has_property_t has_property;
    zend_object_unset_property_t unset_property;
    zend_object_has_dimension_t has_dimension;
    zend_object_unset_dimension_t unset_dimension;
    zend_object_get_properties_t get_properties;
    zend_object_get_method_t get_method;
    zend_object_call_method_t call_method;
    zend_object_get_constructor_t get_constructor;
    zend_object_get_class_entry_t get_class_entry;
    zend_object_get_class_name_t get_class_name;
    zend_object_compare_t compare_objects;
    zend_object_cast_t cast_object;
} zend_object_handlers;
We’ll explore each handler in greater depth in Chapter 22, “Extending PHP: Part II,”
where you’ll actually implement extension classes. In the meantime, you just need to
know that the handler names offer a relatively clear indication as to what they do. For
example, add_ref is called whenever a reference to an object is added:
$object2 = $object;
and compare_objects is called whenever two objects are compared by using the
is-equal operator (==):
if($object2 == $object) {}
Object Creation
In the Zend Engine version 2, object creation happens in two phases. When you call
this:
$object = new ClassName;
a new zend_object is created and placed in the object store, and a handle to it is
assigned to $object. By default (as happens when you instantiate a userspace class), the
object is allocated by using the default allocator, and it is assigned the default access
handlers. Alternatively, if the class’s zend_class_entry has its create_object function
defined, that function is called to handle the allocation of the object and returns the
array of zend_object_handlers for that object.
This level of control is especially useful if you need to override the basic operations
of an object and if you need to store resource data in an object that should not be
touched by the normal memory management mechanisms. The Java and mono extensions
both use these facilities to allow PHP to instantiate and access objects from those
other languages.
Only after the zend_object_value is created is the constructor called on the object.
Even in extensions, the constructor (and destructor and clone) are “normal”
zend_functions. They do not alter the object’s access handlers, which have already been
established.
Other Important Structures
In addition to the function and class tables, there are a few other important global data
structures worth mentioning. Knowledge of how these work isn’t terribly important for
a user of PHP, but it can be useful if you want to modify how the engine itself works.
Most of these are elements of either the compiler_globals struct or the
executor_globals struct and are most often referenced in the source via the macros
CG() and EG(), respectively. These are some of the global data structures you should
know about:
• CG(function_table) and EG(function_table): These structures refer to the function
  table we’ve talked about up until now. It exists in both the compiler and executor
  globals. Iterating through this hashtable gives you every callable function.
• CG(class_table) and EG(class_table): These structures refer to the hashtable in
  which all the classes are stored.
• EG(symbol_table): This structure refers to a hashtable that is the main (that is,
  global) symbol table. This is where all the variables in the global scope are stored.
• EG(active_symbol_table): This structure refers to a hashtable that contains the
  symbol table for the current scope.
• EG(zend_constants): This structure refers to the constants hashtable, where
  constants set with the function define are stored.
• CG(auto_globals): This structure refers to the hashtable of autoglobals
  ($_SERVER, $_ENV, $_POST, and so on) that are used in the script. This is a compiler
  global so that the autoglobals can be conditionally initialized only if the script
  utilizes them. This boosts performance because it avoids the work of initializing
  and populating these variables when they are not needed.
• EG(regular_list): This structure refers to a hashtable that is used to store
  “regular” (that is, nonpersistent) resources. Resources here are PHP resource-type
  variables, such as streams, file pointers, database connections, and so on. You’ll learn
  more about how these are used in Chapter 22.
• EG(persistent_list): This structure is like EG(regular_list), but
  EG(persistent_list) resources are not freed at the end of every request (persistent
  database connections, for example).
• EG(user_error_handler): This structure refers to a pointer to a zval that contains
  the name of the current user_error_handler function (as set via the
  set_error_handler function). If no error-handler function is set, this structure is
  NULL.
• EG(user_error_handlers): This structure refers to the stack of error-handler
  functions.
• EG(user_exception_handler): This structure refers to a pointer to a zval that
  contains the name of the current global exception handler, as set via the function
  set_exception_handler. If none has been set, this structure is NULL.
• EG(user_exception_handlers): This structure refers to the stack of global
  exception handlers.
• EG(exception): This is an important structure. Whenever an exception is thrown,
  EG(exception) is set to the zval of the object that is thrown. Whenever a function
  call returns, EG(exception) is checked. If it is not NULL, execution halts and the
  script jumps to the op for the appropriate catch block. We will explore throwing
  exceptions from within extension code in depth in Chapter 21, “Extending PHP: Part I,”
  and Chapter 22.
• EG(ini_directives): This structure refers to a hashtable of the php.ini directives
  that are set in this execution context.
This is just a selection of the globals set in executor_globals and compiler_globals.
The globals listed here were chosen either because they are used in interesting
optimizations in the engine (the just-in-time population of autoglobals) or because you
will want to interact with them in extensions (such as resource lists).
The Principle of Sandboxing
The principle of sandboxing is that nothing that a user does in handling one request should in any way
affect a subsequent request. PHP is an extremely well-sandboxed language in that at the end of every
request, the interpreter is returned to a clean starting state. This specifically entails the following:
• All function and class tables have all ZEND_USER_FUNCTION and ZEND_USER_CLASS (that is, all
  userspace-defined functions and classes) removed.
• All op arrays for any parsed files are discarded. (They are actually discarded immediately after use.)
• The symbol tables and constants tables are completely cleaned of all data.
• All resources not on the persistent list are destructed.
Solutions such as mod_perl make it easy to accidentally instantiate global variables that have persistent
(and thus potentially unexpected) values between requests. PHP’s request-end sterilization makes that sort
of problem almost impossible. It also means that data that is known not to change between requests (for
example, the compilation results of a file) needs to be regenerated on every request in which it is used. As
we’ve discussed before in relation to compiler caches such as APC, IonCube, and the Zend Accelerator,
avoiding certain aspects of this sandboxing can be beneficial from a performance standpoint. We’ll look at
some methods for that in Chapter 23.
The PHP Request Life Cycle
Now that you have a decent understanding of how the Zend Engine works, let’s look at
how the engine sits inside PHP and how PHP itself sits inside other applications.
Any discussion of the architecture of PHP starts with a diagram such as Figure 20.2,
which shows the application layers in PHP.
The outermost layer, where PHP interacts with other applications, is the Server
Abstraction API (SAPI) layer. The SAPI layer partially handles the startup and shutdown
of PHP inside an application, and it provides hooks for handling data such as cookies
and POST data in an application-agnostic manner.
Figure 20.2 The architecture of PHP.
Below the SAPI layer lies the PHP engine itself. The core PHP code handles setting up
the running environment (populating global variables and setting default .ini options),
providing interfaces such as the streams I/O interface, parsing of data, and most
importantly, providing an interface for loading extensions (both statically compiled
extensions and dynamically loaded extensions).
At the core of PHP lies the Zend Engine, which we have discussed in depth here. As
you’ve seen, the Zend Engine fully handles the parsing and execution of scripts. The
Zend Engine was also designed for extensibility and allows for entirely overriding its
basic functionality (compilation, execution, and error handling), overriding selective
portions of its behavior (overriding op_handlers in particular ops), and having functions
called on registerable hooks (on every function call, on every opcode, and so on). These
features allow for easy integration of caches, profilers, debuggers, and semantics-altering
extensions.
[Figure 20.2 layer labels: Application (apache, thttpd, cli, etc.); SAPI (see Chapter 23); PHP; PHP API (streams, output, etc.; see Chapter 22); Extensions (mysql, standard library, etc.; see Chapter 22); Zend API; Zend Extension API (see Chapter 23); Zend Engine; Modular Code.]
The SAPI Layer
The SAPI layer is the abstraction layer that allows for easy embedding of PHP into other
applications. Some SAPIs include the following:
• mod_php5: This is the PHP module for Apache, and it is a SAPI that embeds
  PHP into the Apache Web server.
• fastcgi: This is an implementation of FastCGI that provides a scalable extension
  to the CGI standard. FastCGI is a persistent CGI daemon that can handle multiple
  requests. FastCGI is the preferred method of running PHP under IIS and shows
  performance almost as good as that of mod_php5.
• CLI: This is the standalone interpreter for running PHP scripts from the
  command line, and it is a thin wrapper around a SAPI layer.
• embed: This is a general-purpose SAPI that is designed to provide a C library
  interface for embedding a PHP interpreter in an arbitrary application.
The idea is that regardless of the application, PHP needs to communicate with an
application in a number of common places, so the SAPI interface provides a hook for each of
those places. When an application needs to start up PHP, for instance, it calls the startup
hook. Conversely, when PHP wants to output information, it uses the provided
ub_write hook, which the SAPI layer author has coded to use the correct output
method for the application PHP is running in.
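The output hook idea can be sketched in a few lines of standalone C. This is only a toy model under stated assumptions: every name here (toy_ub_write_fn, toy_cli_write, toy_server_write, toy_register_output, toy_echo) is hypothetical and none of it is PHP's real SAPI code. It shows the engine side calling through a function pointer while the host decides where the bytes actually go.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for a SAPI output hook; NOT PHP's real sapi_module_struct. */
typedef size_t (*toy_ub_write_fn)(const char *str, size_t len);

/* "CLI-like" host: output goes straight to standard output. */
static size_t toy_cli_write(const char *str, size_t len)
{
    return fwrite(str, 1, len, stdout);
}

/* "Server-like" host: output is collected in a response buffer. */
static char toy_response[256];
static size_t toy_response_len = 0;

static size_t toy_server_write(const char *str, size_t len)
{
    size_t room = sizeof(toy_response) - 1 - toy_response_len;
    if (len > room) {
        len = room;                 /* truncate rather than overflow */
    }
    memcpy(toy_response + toy_response_len, str, len);
    toy_response_len += len;
    toy_response[toy_response_len] = '\0';
    return len;
}

/* The hook the host registered; the "engine" never calls stdio directly. */
static toy_ub_write_fn toy_active_write = toy_cli_write;

static void toy_register_output(toy_ub_write_fn fn)
{
    assert(fn != NULL);
    toy_active_write = fn;
}

/* What an echo boils down to in this model: one call through the hook. */
static size_t toy_echo(const char *str)
{
    return toy_active_write(str, strlen(str));
}
```

Calling toy_register_output(toy_server_write) before toy_echo("hello") leaves the text in toy_response instead of on the terminal, which is the same separation of concerns the ub_write hook gives a real SAPI author.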
To understand the capabilities of the SAPI layer, it is easiest to look at the hooks it
implements. Every SAPI interface registers the following struct with PHP, describing
the callbacks it implements:
struct _sapi_module_struct {
    char *name;
    char *pretty_name;
    /* called once, when the SAPI is initialized and when it is torn down */
    int (*startup)(struct _sapi_module_struct *sapi_module);
    int (*shutdown)(struct _sapi_module_struct *sapi_module);
    /* called at the beginning and end of each request */
    int (*activate)(TSRMLS_D);
    int (*deactivate)(TSRMLS_D);
    /* output to the client */
    int (*ub_write)(const char *str, unsigned int str_length TSRMLS_DC);
    void (*flush)(void *server_context);
    struct stat *(*get_stat)(TSRMLS_D);
    char *(*getenv)(char *name, size_t name_len TSRMLS_DC);
    void (*sapi_error)(int type, const char *error_msg, ...);
    /* header handling */
    int (*header_handler)(sapi_header_struct *sapi_header,
                          sapi_headers_struct *sapi_headers TSRMLS_DC);
    int (*send_headers)(sapi_headers_struct *sapi_headers TSRMLS_DC);
    void (*send_header)(sapi_header_struct *sapi_header,
                        void *server_context TSRMLS_DC);
    /* request input */
    int (*read_post)(char *buffer, uint count_bytes TSRMLS_DC);
    char *(*read_cookies)(TSRMLS_D);
    void (*register_server_variables)(zval *track_vars_array TSRMLS_DC);
    void (*log_message)(char *message);
    char *php_ini_path_override;
    void (*block_interruptions)(void);
    void (*unblock_interruptions)(void);
    void (*default_post_reader)(TSRMLS_D);
    void (*treat_data)(int arg, char *str, zval *destArray TSRMLS_DC);
    char *executable_location;
    int php_ini_ignore;
    int (*get_fd)(int *fd TSRMLS_DC);
    int (*force_http_10)(TSRMLS_D);
    int (*get_target_uid)(uid_t * TSRMLS_DC);
    int (*get_target_gid)(gid_t * TSRMLS_DC);
    unsigned int (*input_filter)(int arg, char *var,
                                 char **val, unsigned int val_len TSRMLS_DC);
    void (*ini_defaults)(HashTable *configuration_hash);
    int phpinfo_as_text;
};
The following are some of the notable elements of this struct:
• startup: This is called the first time the SAPI is initialized. In an application that
  will serve multiple requests, this is performed only once. For example, in
  mod_php5, this is performed in the parent process before children are forked.
• activate: This is called at the beginning of each request. It reinitializes all the
  per-request SAPI data structures.
• deactivate: This is called at the end of each request. It ensures that all data has
  been correctly flushed to the application, and then it destroys all the per-request
  data structures.
• shutdown: This is called at interpreter shutdown. It destroys all the SAPI
  structures.
• ub_write: This is what PHP will use to output data to the client. In the CLI
  SAPI, this is as simple as writing to standard output; in mod_php5, the Apache
  library call rwrite is called.
• sapi_error: This is a handler for reporting errors to the application. Most SAPIs
  use php_error, which instructs PHP to use its own internal error system.
• flush: This tells the application to flush its output. In the CLI, this is
  implemented via the C library call fflush; mod_php5 uses the Apache library
  call rflush.
• send_header: This sends a single specified header to the client. Some servers
  (such as Apache) have built-in functions for handling header transmission. Others
  (such as the PHP CGI) require you to manually send them. Others still (such as
  the CLI) do not handle sending headers at all.
• send_headers: This sends all headers to the client.
• read_cookies: During SAPI activation, if a read_cookies handler is defined, it
  will be called to populate SG(request_info).cookie_data. This is then used to
  populate the $_COOKIE autoglobal.
• read_post: During SAPI activation, if the request method is POST (or if the
  php.ini variable always_populate_raw_post_data is true), the read_post
  handler is called to populate $HTTP_RAW_POST_DATA and $_POST.
Chapter 23 takes a closer look at using the SAPI interface to integrate PHP into
applications and does a complete walkthrough of the CGI SAPI.
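To make the "if a handler is defined, it is called during activation" behavior of read_cookies and read_post concrete, here is a minimal standalone sketch. It assumes a much smaller hook table than the real one; struct toy_sapi, struct toy_request, toy_activate, and the two sample hosts are hypothetical names invented for illustration, and the always_populate_raw_post_data case is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified hook table; the real one is sapi_module_struct. */
struct toy_sapi {
    const char *name;
    const char *(*read_cookies)(void);              /* may be NULL */
    size_t (*read_post)(char *buffer, size_t cap);  /* may be NULL */
};

/* Per-request state, loosely modeled on SG(request_info). */
struct toy_request {
    const char *method;
    const char *cookie_data;
    char post_data[64];
    size_t post_len;
};

static const char *web_read_cookies(void)
{
    return "session=abc123";        /* canned data standing in for a header */
}

static size_t web_read_post(char *buffer, size_t cap)
{
    static const char body[] = "name=value";
    size_t n = sizeof(body) - 1;
    if (n > cap) {
        n = cap;
    }
    memcpy(buffer, body, n);
    return n;
}

/* Per-request activation: reset state, then call only the hooks that exist. */
static void toy_activate(const struct toy_sapi *sapi, struct toy_request *req,
                         const char *method)
{
    assert(sapi != NULL && req != NULL && method != NULL);
    *req = (struct toy_request){0};
    req->method = method;
    if (sapi->read_cookies != NULL) {
        req->cookie_data = sapi->read_cookies();
    }
    if (strcmp(method, "POST") == 0 && sapi->read_post != NULL) {
        req->post_len = sapi->read_post(req->post_data,
                                        sizeof(req->post_data) - 1);
        req->post_data[req->post_len] = '\0';
    }
}

/* A web-style host implements both hooks; a CLI-style host implements none. */
static const struct toy_sapi toy_web_sapi = { "toy-web", web_read_cookies, web_read_post };
static const struct toy_sapi toy_cli_sapi = { "toy-cli", NULL, NULL };
```

Activating a request against toy_cli_sapi leaves cookie_data as NULL and post_len as 0, which mirrors how a SAPI that has nothing to offer for a hook simply leaves that slot empty.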
The PHP Core
There are several key steps in activating and running a PHP interpreter. When an
application wants to start a PHP interpreter, it starts by calling php_module_startup. This
function is like the master switch that turns on the interpreter. It activates the registered
SAPI, initializes the output buffering system, starts the Zend Engine, reads in and acts on
the php.ini file, and prepares the interpreter for its first request. Some important
functions that are used in the core are the following:
• php_module_startup: This is the master startup for PHP.
• php_startup_extensions: This runs the initialization function in all registered
  extensions.
• php_output_startup: This starts the output system.
• php_request_startup: At the beginning of a request, this is the master function,
  which calls up to the SAPI per-request functions, calls down into the Zend
  Engine for per-request initialization, and calls the request startup function in all
  registered modules.
• php_output_activate: This activates the output system, setting the output
  functions to use the SAPI-specified output functions.
• php_init_config: This reads in the php.ini file and acts on its contents.
• php_request_shutdown: This is the master function to destroy per-request
  resources.
• php_end_ob_buffers: This is used to flush output buffers, if output buffering has
  been enabled.
• php_module_shutdown: This is the master shutdown function for PHP, triggering
  all the rest of the interpreter shutdown functions.
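The way a host application drives these master functions can be sketched as a small loop: one startup, a startup/shutdown pair around every request, and one final shutdown. The sketch below is a toy model only. The toy_ functions are hypothetical stand-ins that just record their names in a trace string; they do not call or reproduce the real php_module_startup, php_request_startup, php_request_shutdown, or php_module_shutdown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Trace of the steps taken, so the call order can be inspected afterwards. */
static char toy_trace[512];

static void toy_note(const char *step)
{
    size_t used = strlen(toy_trace);
    assert(used + strlen(step) + 2 < sizeof(toy_trace));
    snprintf(toy_trace + used, sizeof(toy_trace) - used, "%s;", step);
}

/* Hypothetical stand-ins for the master functions named in the text. */
static int toy_module_startup(void)
{
    toy_trace[0] = '\0';
    toy_note("module_startup");
    return 0;
}

static int toy_request_startup(void)
{
    toy_note("request_startup");
    return 0;
}

static void toy_execute_script(void)
{
    toy_note("execute");
}

static void toy_request_shutdown(void)
{
    toy_note("request_shutdown");
}

static void toy_module_shutdown(void)
{
    toy_note("module_shutdown");
}

/* Serve a fixed number of requests and return the recorded trace. */
static const char *toy_serve(int requests)
{
    assert(requests >= 0);
    if (toy_module_startup() != 0) {
        return toy_trace;
    }
    for (int i = 0; i < requests; i++) {
        if (toy_request_startup() == 0) {
            toy_execute_script();
        }
        toy_request_shutdown();     /* per-request cleanup always runs */
    }
    toy_module_shutdown();
    return toy_trace;
}
```

Printing toy_serve(2) shows the module-level steps exactly once and the request-level steps twice, which is the shape every SAPI follows regardless of how the host obtains its requests.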
The PHP Extension API
Most of our discussion regarding the PHP extension API will be carried on in Chapter
22, where you will actually implement extensions. Here we’ll only look at the basic
callbacks available to extensions and when they are called.
Extensions can be registered in two ways. When an extension is compiled statically
into PHP, the configuration system permanently registers that module with PHP. An
extension can also be loaded from the .ini file, in which case it is registered during the
.ini parsing.
The hooks that an extension can register are contained in its zend_module_entry
struct, like so:
struct _zend_module_entry {
    unsigned short size;
    unsigned int zend_api;
    unsigned char zend_debug;
    unsigned char zts;
    struct _zend_ini_entry *ini_entry;
    char *name;
    zend_function_entry *functions;
    int (*module_startup_func)(INIT_FUNC_ARGS);
    int (*module_shutdown_func)(SHUTDOWN_FUNC_ARGS);
    int (*request_startup_func)(INIT_FUNC_ARGS);
    int (*request_shutdown_func)(SHUTDOWN_FUNC_ARGS);
    void (*info_func)(ZEND_MODULE_INFO_FUNC_ARGS);
    char *version;
    int (*global_startup_func)(void);
    int (*global_shutdown_func)(void);
    int globals_id;
    int module_started;
    unsigned char type;
    void *handle;
    int module_number;
};
The following are some important elements of this struct:
• module_startup_func: This hook is called when the module is first loaded. It
  traditionally registers globals, performs any one-time initializations, and registers
  any .ini file entries that the module wants to use. In some pre-fork architectures,
  notably Apache, this function is called in the parent process, before forking. This
  makes it an inappropriate place to initialize open sockets or database connections
  because they may not behave well if multiple processes try to use the same
  resources.
• module_shutdown_func: This hook is called when the interpreter shuts down.
  Any resources that the module has allocated should be freed here.
• request_startup_func: This is called at the beginning of each request. This
  hook is particularly useful for setting up any sort of per-request resources that a
  script may need.
• request_shutdown_func: This is called at the end of every request.
• functions: This is the table of functions that the extension defines.
• ini_entry: These are the .ini file entries that the extension registers.
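The split between once-per-module and once-per-request hooks is easier to see in a small model. The following standalone sketch assumes a trimmed-down hook table (struct toy_module) and an imaginary "counter" extension; all of these names are hypothetical and the code does not use the real zend_module_entry or its INIT_FUNC_ARGS macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down analogue of zend_module_entry. */
struct toy_module {
    const char *name;
    int (*module_startup)(void);
    int (*module_shutdown)(void);
    int (*request_startup)(void);
    int (*request_shutdown)(void);
};

static int counter_limit = 0;     /* configured once, at module startup */
static int counter_startups = 0;  /* how many times module startup ran  */
static int counter_hits = 0;      /* per-request state                  */

static int counter_mstartup(void)
{
    counter_limit = 3;            /* stands in for reading an .ini entry */
    counter_startups++;
    return 0;
}

static int counter_mshutdown(void)
{
    counter_limit = 0;
    return 0;
}

static int counter_rstartup(void)
{
    counter_hits = 0;             /* every request starts from a clean slate */
    return 0;
}

static int counter_rshutdown(void)
{
    counter_hits = 0;             /* nothing may leak into the next request */
    return 0;
}

/* The "function" this toy extension would expose to scripts. */
static int counter_hit(void)
{
    if (counter_hits < counter_limit) {
        counter_hits++;
    }
    return counter_hits;
}

static const struct toy_module counter_module = {
    "counter", counter_mstartup, counter_mshutdown,
    counter_rstartup, counter_rshutdown
};

/* Run one request whose script calls counter_hit() `calls` times. */
static int toy_run_request(const struct toy_module *m, int calls)
{
    int last = 0;
    assert(m != NULL && calls >= 0);
    if (m->request_startup != NULL) {
        m->request_startup();
    }
    for (int i = 0; i < calls; i++) {
        last = counter_hit();
    }
    if (m->request_shutdown != NULL) {
        m->request_shutdown();
    }
    return last;
}
```

After one call to counter_module.module_startup(), a request with two hits returns 2 and the next request with one hit returns 1 rather than 3, because the request hooks reset the per-request counter while the module-level limit survives across requests.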
The Zend Extension API
The final component of the PHP request life cycle is the extension API that the Zend
Engine itself provides for extensibility. There are two major components of the
extensibility: certain key internal functions are accessed via function pointers, meaning that
they can be overridden at runtime, and there is a hook API that allows an extension to
register code to be run before certain opcodes.
These are the main function pointers used in the Zend Engine:
• zend_compile: We discussed this function at the beginning of the chapter.
  zend_compile is the wrapper for the lexer, parser, and code generator. APC and
  the other compiler caches overload this pointer so that they can return cached
  copies of scripts’ op arrays.
• zend_execute: Also discussed earlier in this chapter, this is the function that
  executes the code generated by zend_compile. APD and the other code profilers
  overload zend_execute so that they can track with high granularity the time spent
  in every function call.
• zend_error_cb: This is a pointer that sets the function called anytime an error is
  triggered in PHP. If you wanted to write an extension that automatically converts
  errors to exceptions, this would be the place to do it.
• zend_fopen: This is the function that implements the open call that is used
  internally whenever a file needs to be opened.
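The function-pointer override technique that compiler caches rely on can be illustrated without the engine itself. The sketch below is a toy under stated assumptions: toy_compile plays the role of the zend_compile pointer, struct toy_op_array is a placeholder for a real op array, and toy_cache_install is a hypothetical extension entry point. It saves the original pointer, installs a wrapper, and falls through to the original only on a cache miss.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy "op array": it only remembers which file it was compiled from. */
struct toy_op_array {
    char filename[64];
};

typedef struct toy_op_array *(*toy_compile_fn)(const char *filename);

#define TOY_SLOTS 8
static struct toy_op_array toy_storage[TOY_SLOTS];
static int toy_storage_used = 0;
static int toy_real_compiles = 0;   /* how often the real compiler ran */

/* Default compiler: stands in for the lexer, parser, and code generator. */
static struct toy_op_array *toy_default_compile(const char *filename)
{
    struct toy_op_array *ops;
    assert(filename != NULL);
    assert(strlen(filename) < sizeof(toy_storage[0].filename));
    if (toy_storage_used >= TOY_SLOTS) {
        return NULL;
    }
    ops = &toy_storage[toy_storage_used++];
    strcpy(ops->filename, filename);
    toy_real_compiles++;
    return ops;
}

/* The overridable pointer, playing the role of zend_compile. */
static toy_compile_fn toy_compile = toy_default_compile;

/* --- a hypothetical caching extension --- */
static toy_compile_fn toy_saved_compile = NULL;
static struct toy_op_array *toy_cache[TOY_SLOTS];
static int toy_cache_used = 0;

static struct toy_op_array *toy_cached_compile(const char *filename)
{
    struct toy_op_array *ops;
    for (int i = 0; i < toy_cache_used; i++) {
        if (strcmp(toy_cache[i]->filename, filename) == 0) {
            return toy_cache[i];            /* hit: skip compilation */
        }
    }
    ops = toy_saved_compile(filename);      /* miss: defer to the original */
    if (ops != NULL && toy_cache_used < TOY_SLOTS) {
        toy_cache[toy_cache_used++] = ops;
    }
    return ops;
}

/* Install the wrapper once, remembering whatever pointer was there before. */
static void toy_cache_install(void)
{
    if (toy_compile != toy_cached_compile) {
        toy_saved_compile = toy_compile;
        toy_compile = toy_cached_compile;
    }
}
```

Saving the previous pointer rather than hard-coding the default compiler is the important design choice: it lets several extensions chain their wrappers, each deferring to whatever was installed before it.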
The hook API is an extension of the PHP extension API:
struct _zend_extension {
    char *name;
    char *version;
    char *author;
    char *URL;
    char *copyright;
    startup_func_t startup;
    shutdown_func_t shutdown;
    activate_func_t activate;
    deactivate_func_t deactivate;
    message_handler_func_t message_handler;
    op_array_handler_func_t op_array_handler;
    statement_handler_func_t statement_handler;
    fcall_begin_handler_func_t fcall_begin_handler;
    fcall_end_handler_func_t fcall_end_handler;
    op_array_ctor_func_t op_array_ctor;
    op_array_dtor_func_t op_array_dtor;
    int (*api_no_check)(int api_no);
    void *reserved2;
    void *reserved3;
    void *reserved4;
    void *reserved5;
    void *reserved6;
    void *reserved7;
    void *reserved8;
    DL_HANDLE handle;
    int resource_number;
};
The pointers provide the following functionality:
• startup: This is functionally identical to an extension’s module_startup_func
  function.
• shutdown: This is functionally identical to an extension’s module_shutdown_func
  function.
• activate: This is functionally identical to an extension’s request_startup_func
  function.
• deactivate: This is functionally identical to an extension’s
  request_shutdown_func function.
• message_handler: This is called when the extension is registered.
• op_array_handler: This is called on a function’s op_array after the function is
  compiled.
• statement_handler: If this handler is set, an additional opcode is inserted before
  every statement. This opcode’s handler executes all the registered statement
  handlers. This handler can be useful for debugging extensions, but because it
  effectively doubles the size of the script’s op array, it can have a deleterious effect
  on system performance.
• fcall_begin_handler: If this handler is set, an additional opcode is inserted
  before every ZEND_DO_FCALL and ZEND_DO_FCALL_BY_NAME opcode. That opcode’s
  handler executes all registered fcall_begin_handler functions.
• fcall_end_handler: If this handler is set, an additional opcode is inserted after
  every ZEND_DO_FCALL and ZEND_DO_FCALL_BY_NAME opcode. That opcode’s
  handler executes all registered fcall_end_handler functions.
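The effect of the fcall_begin_handler and fcall_end_handler hooks can be approximated with a toy executor that runs every registered hook around each call. This is a simplification and an assumption on our part: the real engine inserts extra opcodes into the op array, whereas this sketch calls the hooks directly from a hypothetical toy_do_fcall function. All names below are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*toy_fcall_hook)(const char *function_name);

#define TOY_MAX_HOOKS 4
static toy_fcall_hook toy_begin_hooks[TOY_MAX_HOOKS];
static toy_fcall_hook toy_end_hooks[TOY_MAX_HOOKS];
static int toy_begin_count = 0;
static int toy_end_count = 0;

/* Register a begin hook, an end hook, or both; returns -1 if the table is full. */
static int toy_register_hooks(toy_fcall_hook begin, toy_fcall_hook end)
{
    if (toy_begin_count >= TOY_MAX_HOOKS || toy_end_count >= TOY_MAX_HOOKS) {
        return -1;
    }
    if (begin != NULL) {
        toy_begin_hooks[toy_begin_count++] = begin;
    }
    if (end != NULL) {
        toy_end_hooks[toy_end_count++] = end;
    }
    return 0;
}

/* Toy executor: all begin hooks, then the call itself, then all end hooks. */
static int toy_do_fcall(const char *name, int (*fn)(int), int arg)
{
    int result;
    assert(name != NULL && fn != NULL);
    for (int i = 0; i < toy_begin_count; i++) {
        toy_begin_hooks[i](name);
    }
    result = fn(arg);
    for (int i = 0; i < toy_end_count; i++) {
        toy_end_hooks[i](name);
    }
    return result;
}

/* A tiny call-depth profiler built on the two hooks. */
static int prof_calls = 0;
static int prof_depth = 0;
static int prof_max_depth = 0;

static void prof_begin(const char *name)
{
    (void)name;
    prof_calls++;
    if (++prof_depth > prof_max_depth) {
        prof_max_depth = prof_depth;
    }
}

static void prof_end(const char *name)
{
    (void)name;
    prof_depth--;
}

/* A recursive "script function" whose calls all go through the executor. */
static int toy_factorial(int n)
{
    return n <= 1 ? 1 : n * toy_do_fcall("factorial", toy_factorial, n - 1);
}
```

After registering prof_begin and prof_end, running toy_do_fcall("factorial", toy_factorial, 4) leaves the profiler with a record of every nested call, which is the kind of bookkeeping a debugger or profiler extension would hang on these hooks.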
How All the Pieces Fit Together
The preceding sections provide a lot of information. PHP, SAPIs, the Zend Engine:
there are a lot of moving parts to consider. The most important part in understanding
how a system works is understanding how all the pieces fit together. Each SAPI is
unique in how it ties all the pieces together, but all the SAPIs follow the same basic
pattern.
Figure 20.3 shows the complete life cycle of the mod_php5 SAPI. After the initial
server startup, the process loops, handling requests.
Figure 20.3 The mod_php5 request life cycle.
[Figure 20.3 labels. Phases: Startup; Per Request Steps; request end. Calls: php_module_startup, sapi_startup, php_output_startup, zend_startup, php_request_startup, php_output_activate, zend_activate, sapi_activate, zend_activate_modules, zend_compile, zend_shutdown_modules, zend_deactivate, sapi_deactivate, sapi_shutdown, zend_shutdown. Annotations: parse ini values; startup internal extensions; startup dynamically loaded extensions; startup zend_extensions; initialize compiler and executor; pull in request data from Apache; run extension request startup functions; run zend_extension activate functions; actually parse and execute the script.]
Further Reading
Documentation for the Zend Engine is pretty scarce. If you prefer a more hands-on
introduction than is presented here, skip ahead to Chapter 23, where you will see a
complete walkthrough of the CGI SAPI as well as extensive coverage of how to embed PHP
into external applications.