Once this is done you can SELECT from the FEDERATED table and see the rows in the remote
table.
Limitations of FEDERATED tables
Although FEDERATED tables can be extremely useful, they do have some limitations. FEDERATED
tables are non-transactional in nature regardless of the support that the remote table
might have for transactions. In addition, indexes are not always handled the same for the
FEDERATED table as they are for the remote table. FEDERATED tables do not support index
prefixes on CHAR, VARCHAR, TEXT, or BLOB columns. There are instances where indexes are not
used as expected.
Though it is a fairly minor issue, FEDERATED tables do not support the use of
ALTER TABLE.
To change the table definition of a FEDERATED table you must drop the table and re-create it.
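A rough sketch of that drop-and-re-create workflow (the table, columns, and connection string here are hypothetical, not taken from an example elsewhere in this book):

mysql> DROP TABLE city_federated;
mysql> CREATE TABLE city_federated (
    ->   city_id SMALLINT UNSIGNED NOT NULL,
    ->   city VARCHAR(50) NOT NULL,
    ->   PRIMARY KEY (city_id)
    -> ) ENGINE=FEDERATED
    -> CONNECTION='mysql://remote_user:remote_pass@remote.host:3306/world/city';

Any change to the column list or indexes is made in the new CREATE TABLE statement, and the CONNECTION string must still point at the unchanged remote table.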
Even with these limitations the FEDERATED engine can be very useful when remote data access
is needed.
NDB storage engine
The NDB storage engine is used to build a MySQL Cluster. MySQL Cluster is a shared-nothing,
highly available system. MySQL Cluster is designed to avoid a single point of failure. A cluster
will contain one or more SQL nodes (traditional MySQL Servers), one or more data nodes, and
one or more management servers. For more information about MySQL Cluster, see Chapter 22.
Feature summary:
■ Built to provide high availability with redundant sets of data for node failure scenarios.
■ Transactional support.
■ No foreign key support.
■ Row-level locking.
■ Operates either entirely in memory or supports on-disk storage of data columns. Indexes
must always be stored in memory.
Beginning with MySQL Server version 5.1.24, the standard binaries built by Sun Microsystems
do not include the NDB storage engine. Though it is still available with the source code (for
now), the NDB storage engine code shipped with MySQL Server will not be updated to keep pace
with changes to the NDB storage engine in MySQL Cluster. If you need to use the NDB storage
engine, use the MySQL Cluster server instead of MySQL Server.
Archive storage engine
The Archive storage engine is a good option for those needing to store data long term or
perform data auditing on application operations. For quite some time MySQL has offered an
option of using compressed MyISAM tables. There are performance benefits to using compressed
MyISAM tables for read operations, but there are several problems with MyISAM compressed
tables:
■ Compressing (or packing) a MyISAM table requires that the database be taken offline while
the table is compressed. This is done through the use of the myisampack utility from the
shell prompt.
■ Compressed MyISAM tables are read-only. You cannot INSERT, DELETE, or UPDATE on a
compressed table.
Archive tables are, in several ways, an improvement over these older MyISAM packed tables.
The designers of the Archive storage engine were clearly looking at what was good about
compressed MyISAM tables and improving on their areas of weakness.
Feature summary:
■ Non-transactional.
■ Archive tables allow
INSERT operations on the table. However, Archive tables do not
allow
UPDATE or DELETE operations. For data auditing purposes this is ideal because
auditing requirements specify that once data has been created it cannot be changed in
any manner.
■ Very good data compression factor. The compression factor on an Archive table is higher
than a packed MyISAM table.
■ For reading data, much like the InnoDB engine, the Archive engine uses a snapshot read.
This ensures that read operations do not block write operations.
■ The Archive engine uses row-level locking.
■ The Archive storage engine supports only one index. However, in tests, the Archive storage
engine performs better with read queries than MyISAM packed tables, even when the
packed tables have indexes that the Archive tables do not.
Archive tables use up to four files during operation. All of the filenames begin with the table
name. The table definition file has a suffix of
.frm. The data file has a suffix of .ARZ. The file
with metadata information for the table (if present) has a suffix of
.ARM. During table optimiza-
tion operations it is also possible that there is a file with a suffix of
.ARN. These files will all be
located in the directory of the database in which the tables are located.
The following shows a modified version of the
rental table from the sakila database.
Changes were made because Archive tables do not support foreign keys and the
rental table
had more than one index:
mysql> CREATE TABLE `rental_archive` (
-> `rental_id` int(11) NOT NULL AUTO_INCREMENT,
-> `rental_date` datetime NOT NULL,
-> `inventory_id` mediumint(8) unsigned NOT NULL,
-> `customer_id` smallint(5) unsigned NOT NULL,
-> `return_date` datetime DEFAULT NULL,
-> `staff_id` tinyint(3) unsigned NOT NULL,
-> `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
-> PRIMARY KEY (`rental_id`)
-> ) ENGINE=Archive DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.01 sec)
This creates an archival version of the table. Now to demonstrate the ability to perform INSERT
statements but not UPDATE or DELETE statements on an Archive table:
mysql> INSERT INTO rental_archive
-> (rental_date, inventory_id, customer_id,
-> return_date, staff_id) values (NOW(),'1','23',NOW(),'1');
Query OK, 1 row affected (0.00 sec)
mysql> SELECT rental_id, rental_date, inventory_id, customer_id,
-> return_date, staff_id, last_update
-> FROM rental_archive\G
*************************** 1. row ***************************
rental_id: 1
rental_date: 2009-01-16 16:44:41
inventory_id: 1
customer_id: 23
return_date: 2009-01-16 16:44:41
staff_id: 1
last_update: 2009-01-16 16:44:41
1 row in set (0.00 sec)
mysql> UPDATE rental_archive SET rental_date=NOW() WHERE rental_id='1';
ERROR 1031 (HY000): Table storage engine for 'rental_archive' doesn't
have this option
mysql> DELETE FROM rental_archive WHERE rental_id='1';
ERROR 1031 (HY000): Table storage engine for 'rental_archive' doesn't
have this option
In a nutshell, what is demonstrated here, along with the data compression capabilities, are the
two primary reasons for using Archive tables.

Blackhole storage engine
The Blackhole storage engine does not actually store data in tables as with other storage engines.
This might seem to be very counter-intuitive. It is a storage engine that does not store data. But
there are uses for this setup.
A master server may have an extremely high-write table whose data is only ever used on a
slave. For example, session information including where a user clicked may be used in reporting
queries, but is never queried on the master server. In this case, the table on the master can be
created with ENGINE=BLACKHOLE and the table on the slave can be modified to use any storage
engine. In this way, data updates on the master happen instantaneously, because INSERT,
UPDATE, and DELETE statements return immediately as successful (so long as they are valid
DML statements according to the table schema). This is a frequently used method of having a
high-write table cause almost no extra I/O on the master (the binary logs are still written to).
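A rough sketch of that arrangement, using a hypothetical session-click table (the table and column names are illustrative, not taken from this chapter's sakila examples):

-- On the master: INSERTs succeed and are written to the binary log,
-- but no rows are stored locally
CREATE TABLE session_click (
  user_id INT UNSIGNED NOT NULL,
  url VARCHAR(255) NOT NULL,
  clicked_at DATETIME NOT NULL
) ENGINE=BLACKHOLE;

-- On the slave, after the table definition has replicated, switch to an
-- engine that actually stores the rows so reporting queries can use them
ALTER TABLE session_click ENGINE=InnoDB;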
Something quite common with large setups is the use of multiple slaves from a master server. If
you have too many slaves this can cause load issues on the master server. You can implement a
relay slave that acts as an intermediary to reduce this load. We discuss this in detail in Chapter
16. This relay slave can be configured using the Blackhole storage engine for all tables being
replicated. Even though the Blackhole tables do not store any data changes, the binary logs are
still written if logging is enabled. The relay slave operates much more efficiently because of the
minimization of I/O activity.
CSV storage engine
The CSV (Comma Separated Value) storage engine is an engine of a decidedly different nature.
A CSV data file is simply a text file that can be manipulated with a simple text editor or
command-line text tools if needed. One of the primary uses for CSV is for data exchange and
fast importing.
Feature summary:
■ Plain text data file in CSV format

■ Can have an instantaneous import time
■ Easily imported into programs such as Microsoft Excel
■ Table-level locking
■ No foreign key support
■ Non-transactional
■ Trivial backups and restores (copy the files)
■ Does not support indexing or partitions
Three files are created with any CSV table: an
.frm file, which contains the table format; a .CSM
file, which contains metadata; and the .CSV file, which contains the data. Each of these files has
a prefix that consists of the table name.
Here is a brief example from a snippet of a data file of a three-field CSV table:
"1","Overstreet","Joe"
"2","Beal", "Sally"
"3","Murphy","Ashton"
"4","McGhee", "Sam"
Want an instantaneous import time? Just have the data that you want to be imported
stored in the CSV format in an export file. Create a new CSV table with the desired
fields and then just copy the export file to the directory of the database. Then move the file to
table_name.CSV. Issue a FLUSH TABLES command and the data is instantly available.
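A minimal sketch of that procedure, assuming a hypothetical people table in the test database and a data directory of /var/lib/mysql (the file copy happens at the operating-system level, outside the MySQL client):

mysql> CREATE TABLE test.people (
    ->   id INT NOT NULL,
    ->   last_name VARCHAR(30) NOT NULL,
    ->   first_name VARCHAR(30) NOT NULL
    -> ) ENGINE=CSV;
-- at the shell: cp /tmp/people_export.csv /var/lib/mysql/test/people.CSV
mysql> FLUSH TABLES;
mysql> SELECT COUNT(*) FROM test.people;

Note that CSV tables require every column to be declared NOT NULL, and the columns in the export file must line up with the table definition.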
Working with Storage Engines
A number of commands are used specifically to work with any storage engine. These commands
are extremely useful in the day-to-day work of a system administrator.
CREATE TABLE
The CREATE TABLE statement is used to create a database table. You can specify the storage
engine that you want to use for the table by using the
ENGINE clause or the server will use the default storage engine.
For example, to create an InnoDB table:
CREATE TABLE innodb_example (
id INTEGER UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(20) NOT NULL
) ENGINE = InnoDB;
ALTER TABLE
The ALTER TABLE command is used to modify previously created tables. The full syntax for the
command is covered in Chapter 4. If you need to change a table type from one storage engine
to another the
ALTER TABLE command makes this simple. If you executed the previous CREATE
TABLE
without specifying an ENGINE clause it would create a MyISAM table by default:
CREATE TABLE innodb_example (
id INTEGER UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(20) NOT NULL
);
After you create the table you realize it needed to be an InnoDB table. To change this you simply
run the ALTER TABLE command:
ALTER TABLE command:
mysql> ALTER TABLE innodb_example ENGINE=InnoDB;
That is all there is to it!
DROP TABLE
The DROP TABLE command is used to drop a table and all its contents. Be careful, because this
action cannot be undone. If it is executed inside a transaction, the DROP TABLE command will
implicitly commit the transaction before performing the actual DROP TABLE.
mysql> DROP TABLE innodb_example;
Summary
The amazing variety of storage engines available for MySQL Server can be a great benefit to the
database administrator. It allows you to choose the storage engine that best fits the needs of your
application. You even have the choice of having more than one storage engine at the same time
in your database, and having a table on a slave server have a different storage engine than the
same table on the master server. This range of options brings complexity. There are typically
many options to configure each storage engine. A beginning administrator should concentrate on
understanding the MyISAM and InnoDB storage engines because these are going to be the ones
you work with most often in a typical MySQL Server environment.
Some of the topics covered in this chapter include:
■ Pluggable storage engine architecture
■ Storage engine plugins
■ Available storage engines:
■ MyISAM / Merge
■ InnoDB
■ MEMORY
■ Maria
■ Falcon
■ PBXT
■ FEDERATED
■ NDB
■ Archive
■ Blackhole
■ CSV
■ Commands used to work with storage engines
Caching with MySQL

IN THIS CHAPTER
Implementing Cache Tables
Working with the Query Cache
Utilizing Memcached
One of the ways of making queries respond faster is to implement
caching. A cache stores frequently used information in a place
where it can be accessed faster. For example, a web browser
like Firefox uses a cache to store the text, images, and other objects from
recently visited websites on your hard drive. When you visit a page you
have recently been to, the text and images do not have to be downloaded
a second time. Another way objects on web pages are cached is through a
caching proxy server such as Squid (www.squid-cache.org). A caching
proxy server is a proxy between the Internet and a set of machines.
It eliminates the redundancy of each machine having a private cache.
Furthermore, a cache of web objects from sites that all the machines have
recently visited is stored, so a machine can benefit from a proxy cache
even if it has never visited a particular web page before. Figure 12-1 shows
a simplified diagram of a network of desktops that utilize a caching proxy
server for web pages.
When users visit a web page their computer will first check the web page
cache in their browser as described previously. If the page is not stored locally,
the next step is to check the Squid cache. If the web page is not stored in the
Squid cache, the web page is downloaded from the actual website — storing
objects in the Squid and local browser cache along the way.
This may sound complicated, but there is very little overhead from check-
ing these caches. When a cache can be used it is much faster than
downloading content over the Internet. If you have a hundred people
behind a caching proxy, there are going to be many cached web objects that can be
accessed by someone else. Everyone
behind the web caching proxy benefits from the entire cache.
Not only is the user experience faster, but because
there is less need to go to the Internet to download a web page, less bandwidth is used. The
drawback is sometimes a web page has been updated and a cache has older information.
FIGURE 12-1 Desktop machines accessing the Internet through a caching proxy server
Like caching in any other application, caching with MySQL is designed to return data from a
query quickly. There are three common ways to cache data with MySQL: creating manual cache
tables, using internal caching in
mysqld, and using the memcached distributed caching system.
Implementing Cache Tables
Developers and database administrators spend lots of time optimizing queries. However, one
aspect of query optimization that is often overlooked is that a simple data access is much faster
than a query with calculations, sorting, and aggregations. A cache table is a regular table in a
database that stores the results of one or more queries for faster retrieval. Common query results
that could be cached in a table are counts, ratings, and summaries.
For example, instead of calculating the number of total site visitors every time someone visits a
particular web page, you can set an event in the Event Scheduler to calculate the total number
of visitors every 30 minutes and store that information in a visitor count table. Then the slower
query to calculate count is run once every 30 minutes, and every single time the web page is
visited it does a simple query of the visitor count table. For more information on the Event
Scheduler and how to create events, see Chapter 7.
As an example, assume it takes 4 seconds to calculate the total visitor count and a quarter of a
second to return the results from the visitor count cache table. If there are 100 visitors to a site in
30 minutes, it would take 4 seconds per visitor * 100 visitors = 400 seconds of query processing
time to run the query for every visitor. However, if you ran the query once every 30 minutes, it
would take 4 seconds for the query to run once + 1/4 second to retrieve data from the cache
table * 100 visitors to the site = 29 seconds of query processing time. The benefit is obvious.
The tradeoff is that the total count may not be 100% accurate. However, it is likely good
enough — visitor count is usually used to show how many people have visited the site, and
the count still adds value even if it is off by a few percentage points. The decision of whether
or not the number is good enough has to be made through a joint effort between the database
administrator and the business to determine if this is a reasonable tradeoff.
There is one other tradeoff of using the Event Scheduler — the query is run every 30 minutes,
regardless of whether or not the information is used. For example, if nobody visits the website
for an hour, there is no reason to recalculate the visitor count. If this is an issue you need to
overcome, you will have to coordinate with application developers as well to create querying
code that uses the following logic:
■ Check the age of the data in the cache table
■ If the data age is 30 minutes or less, use the data in the cache table
■ If the data age is greater than 30 minutes, re-run the query, storing the results and new
time in the cache table
It is preferable to schedule an event in the Event Scheduler, but for resource-heavy queries that
are run on data that does not change very often, it may be necessary to create custom applica-
tion code instead.
Going back to our example using the Event Scheduler — after the decision to store total
visitor count every 30 minutes is made, there is a problem. The business has changed its
requirements, and tells you that the count needs to be 100% accurate; it is unacceptable to have
the query take 4 seconds, and there is no money in the budget for faster hardware.
This issue can be solved by storing the total visitor count for a time period in the past and then
adding the count of visitors since the last calculation. For example, at midnight daily, calculate
the total visitor count for the previous day. This count will never change once the day is over,
so the count for an entire day never needs to be recalculated. The running total of visitors can
be calculated as well, and then a 100% accurate count can be done at any time by calculating
the total visitor count so far today and adding it to the running total calculated once per day.
Computing the count for one day is much less resource-intensive than running full count every
time someone visits a page.
Not only will this allow for accurate counts of total visitors, but it will allow for easy computa-
tion of number of visitors in a given data range — in our example, the number of visitors in a
week can be calculated by adding the seven daily totals of visitors. The previous method did not
easily allow for this type of data manipulation.
Here is an example of how to create such a system:
CREATE TABLE visitors_today (
today INT UNSIGNED NOT NULL DEFAULT 0,
vcount BIGINT UNSIGNED NOT NULL DEFAULT 0
) ENGINE=InnoDB;
CREATE TABLE visitors_stored (
vcount BIGINT UNSIGNED NOT NULL DEFAULT 0,
vcount_date DATE NOT NULL PRIMARY KEY
) ENGINE=MyISAM;
The visitors_stored table holds a historic record of the count of visitors per day. There can
only be one entry per day. The
visitors_today table holds the current count of visitors so far
today (in the
today field) and the total count of visitors up to today (in the vcount field).
With each new visitor, update the visitors_today.today field:
UPDATE visitors_today SET today = today + 1;
To retrieve the current total visitor count:
SELECT today + vcount FROM visitors_today;
Every day at midnight, schedule an event that:
■ Inserts a new row with yesterday’s date and count in the
visitors_stored table
■ Calculates and stores the total running count in
visitors_today.vcount
■ Resets the visitors_today.today field to zero
Here is a sample event that updates the current and historic visitor count tables:
CREATE EVENT update_vcount
ON SCHEDULE EVERY 1 DAY
STARTS '2009-01-01 00:00:00'
DO BEGIN
INSERT INTO visitors_stored (vcount, vcount_date)
SELECT today, CURRENT_DATE() - INTERVAL 1 DAY FROM visitors_today;
UPDATE visitors_today set today=0, vcount=(
SELECT SUM(vcount) as vcount FROM visitors_stored);
END
Do not forget to enable the Event Scheduler with SET GLOBAL event_scheduler='ON', and
recall from Chapter 7 that you will need to change the DELIMITER to define an event with
multiple statements. Also, for the system to function it must be initialized; at minimum
you must populate the
visitors_today table. If today already had 37 visitors and the total
number of visitors at the start of the day was 1,638,492, the following would populate the table
properly:
INSERT INTO visitors_today (today, vcount) VALUES (37,1638492);
For a new system, simply use INSERT INTO visitors_today (today, vcount) VALUES (0,0);
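Because visitors_stored keeps exactly one row per day, date-range questions such as the weekly total mentioned earlier become simple aggregations. A sketch, with an illustrative date range:

SELECT SUM(vcount) AS weekly_visitors
  FROM visitors_stored
 WHERE vcount_date BETWEEN '2009-01-05' AND '2009-01-11';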
These two tables achieve the business goals of allowing for accurate counts and faster count-
ing without buying new hardware. For more ways to optimize this and other queries, see
Chapter 18.
As a final example before moving on, another way to keep a counter accurate is through the use
of a trigger. If you need to keep an accurate count of the number of rows in a table and it is
resource-intensive to run the
SELECT COUNT(*) query (as is the case on InnoDB tables), just
have a trigger update a count statistic on every
INSERT or DELETE from the table. Keep in mind
that using triggers brings its own overhead and might not be ideal for your situation. As well, it
is useful for simple calculations such as a running counter, but there is not an equivalent useful
solution for more complex calculations such as ranking.
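A rough sketch of such a pair of triggers, assuming a hypothetical orders table; the one-row counter table must be seeded with the current COUNT(*) before the triggers are created:

CREATE TABLE orders_count (
  row_count BIGINT UNSIGNED NOT NULL
) ENGINE=InnoDB;

INSERT INTO orders_count SELECT COUNT(*) FROM orders;

CREATE TRIGGER orders_count_ins AFTER INSERT ON orders
FOR EACH ROW UPDATE orders_count SET row_count = row_count + 1;

CREATE TRIGGER orders_count_del AFTER DELETE ON orders
FOR EACH ROW UPDATE orders_count SET row_count = row_count - 1;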
System-intensive queries whose results can tolerate being slightly inaccurate are often perfect
candidates for cache tables. Some of these queries may be required to use a cache table because
the calculations use so many resources!
Working with the Query Cache
Internally, mysqld can cache the result sets of SQL statements. Similar to the concept of a local
cache of a web page,
mysqld can compare a query to the queries stored in the query cache. If
the query is stored in the query cache, the result set is retrieved without having to execute the
query again. If the query is not stored in the query cache, the query is executed and the result
set can be stored in the query cache so the next time the query is called the result set will be
there.
By default, the query cache is not enabled. Using the query cache adds more overhead —
memory is used to store the query cache, and it takes processing time to check the query cache.
If the query cache is frequently checked and matches are rarely found, the additional processing
time required may hurt performance instead of helping it. Make sure to thoroughly test the
query cache before implementing it.
In some cases, turning the query cache on will produce amazing performance improvements.
However, it is a common mistake for database administrators new to MySQL to believe that the
query cache will solve most performance issues. In reality, only queries with certain characteris-
tics will benefit from using the query cache. This section discusses how the query cache works,
to better provide you with information to help you decide when the query cache is appropriate
for you.
What gets stored in the query cache?
The MySQL query cache is simple in operation. It only caches SELECT statements and their
corresponding result sets, and only for deterministic
SELECT statements. Deterministic SELECT
statements are statements that always produce the same results no matter what, given the same
data set. For example, SELECT COUNT(*) FROM tbl is always the same given the same data set,
but
SELECT NOW() is not always the same.
The query cache stores a rudimentary hash of the query along with its result set. This means
that the query being compared must be byte-for-byte identical to the cached query — the
queries themselves are not compared; their hashes are compared. This means that in order to
match, the queries must match with regard to case sensitivity and whitespace. Consider the
following two queries:
SELECT customer_id,name from sakila.customer;
SELECT customer_id,name FROM sakila.customer;
Both queries executed from the command line will return identical result sets. However, from
the perspective of the query cache, the two queries are not identical: in the first query from is
in lowercase, and in the second query FROM is capitalized, so their hashes differ. As mentioned
previously, different spacing in the queries will also result in no match being found.
In addition to spacing and capitalization, queries are viewed as different by the query cache if
they use different databases, protocol versions, or character sets. Furthermore, not all
SELECT
statements will be cached. SELECT statements with the following properties are not cached:
■ Use of non-deterministic functions
■ Most subqueries
■ Use of user-defined functions
■ Use of temporary tables
■ SELECT statements in stored functions, triggers, views, and events
■ Using
LOCK IN SHARE MODE or FOR UPDATE
■ Result sets larger than query_cache_limit (1 Mb by default; see the section in this
chapter on query cache configuration)
■ Using
SQL_NO_CACHE
Just because a query can be cached, however, does not mean it should be cached. If a query is
being executed frequently and does not have a result set that exceeds the
query_cache_limit,
it may be a good candidate for caching. However, another factor to consider is how many times
a cached query is used before it is removed from the query cache.
Query cache invalidation is when a query is removed from the query cache because its result set
may have changed. When a table is modified by either DDL or DML (either schema changes or
data changes), all of the queries in the query cache referencing that table are invalidated and
removed. This query cache invalidation is not very granular — it invalidates query cache results
based on tables, not rows. A query may be removed from the query cache even though its result
set has not changed, because a table it references has changed.

Query cache pruning is when a query is removed from the query cache because the query cache
cannot find a free block in which to store a new query. The query cache prunes older entries
using a least recently used (LRU) algorithm.
The overhead of query removal must be considered when determining if a query is a good
candidate for caching. Queries that reference tables that are frequently updated and queries
that are pruned before being used several times are not good candidates for the query cache. In
addition, if there are many queries that can be cached, enabling the query cache for all queries
is not a good idea — we have seen many database administrators turn on the query cache, and
query performance gets worse, because every single query is being cached, and there are so
many queries being cached that the savings of using a cache are never recognized.
When using the query cache, each SELECT statement has overhead. First, a hash of
the query is calculated, and the query cache is checked for a match. If the query
does not match, the query proceeds as usual. If the query can be cached, the result set is stored in
the query cache when the query is completed. All queries that do not have a match in the query
cache will use more resources than if the query cache was turned off.
However, DML such as UPDATE, INSERT, and DELETE and DDL such as ALTER TABLE also have
extra overhead even though these statements are never cached. When a table is changed, the
cache is checked in order to remove any queries that are invalidated.
If you realize you do not want to store every query in the query cache, you can use the server
variable query_cache_type. It has three possible options: OFF, ON, and DEMAND. Setting
query_cache_type to OFF turns off the query cache. If you do not want to cache every result
set, you have two options (an example of the second option follows this list):
■ Cache all queries, except for a few chosen queries. Setting the query_cache_type to
ON and setting a query_cache_size larger than 0 enables the query cache for SELECT
statements as specified by the rules above. This means the query cache will cache all
possible queries, and you can choose certain queries not to cache by using SELECT
SQL_NO_CACHE.
■ Do not cache any queries, except for a few chosen queries. Setting the query_cache_type
to DEMAND and setting a query_cache_size larger than 0 enables the query
cache only for SELECT statements with the SQL_CACHE clause. In other words, the query
cache does not cache any queries except for the ones you choose, which will use SELECT
SQL_CACHE.
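A sketch of the second option (the table is from the sakila examples used earlier in this chapter):

SET GLOBAL query_cache_type = DEMAND;

-- This result set is stored in the query cache:
SELECT SQL_CACHE customer_id, first_name, last_name FROM sakila.customer;

-- This result set is not cached:
SELECT customer_id, first_name, last_name FROM sakila.customer;

Remember that a query_cache_size larger than 0 is still required for either option to have any effect.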
Query cache memory usage and tuning
The query cache for the server is stored completely in memory, and does not persist across
mysqld restarts. Aside from about 40k of memory used by the query cache, the entire
pool of memory is initialized as a single contiguous chunk of memory. During query cache
operation this chunk of memory is divided up into smaller blocks. These blocks are used to
store the result sets of the cached queries. Each block must be at least the size specified by
query_cache_min_res_unit, which defaults to 4 Kb. For result sets smaller than 4 Kb, it still
uses 4 Kb of memory.
In addition to these blocks containing the result sets of the queries there are two hash tables.
One hash table stores query hashes. The second hash table contains the list of database tables
that each cached query references. When a table is modified, this second hash table is consulted
to determine which queries should be removed due to query cache invalidation.
Seven system variables are used to control the configuration and memory usage of the query
cache. The values of these can be seen by using the SHOW GLOBAL VARIABLES statement or in
the GLOBAL_VARIABLES system view in the INFORMATION_SCHEMA (see Chapter 21 for more
information about INFORMATION_SCHEMA).
Table 12-1 lists these variables and a brief description of each one.
TABLE 12-1
Query Cache System Variables
Variable Name Description
query_alloc_block_size Specifies the size of memory blocks allocated by the
query cache during query processing. The default is
8 Kb.
query_cache_limit Result sets larger than this are not cached. The default
is 1 Mb.
query_cache_min_res_unit Specifies the minimum size for blocks in the query
cache. The default value is 4 Kb.
query_cache_size The total memory allocated by mysqld for the query
cache. The default is 0, which disables the query
cache. This can be changed dynamically — changes
do not require a mysqld restart.
query_cache_type Determines the mode of operation of the query
cache. Options are OFF, ON, and DEMAND. The
default is ON.
query_cache_wlock_invalidate If set to TRUE, queries referencing MyISAM tables
are invalidated when a write lock for that table is
obtained, even if none of the data is changed when
the write lock is released. The default is FALSE.
query_cache_prealloc_size Specifies the size of the buffer used for query parsing
by the cache. Defaults to 8 Kb.
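As the query_cache_size entry above notes, the cache can be sized at runtime. A brief illustration (the 64 Mb value is arbitrary):

mysql> SHOW GLOBAL VARIABLES LIKE 'query_cache%';
mysql> SET GLOBAL query_cache_size = 64 * 1024 * 1024;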

There are eight status variables relating to the query cache. These variables are used for
managing and tuning the query cache. The values of these can be seen by using the SHOW GLOBAL
STATUS statement or in the GLOBAL_STATUS system view in the INFORMATION_SCHEMA (see
Chapter 21 for more information about INFORMATION_SCHEMA).
Table 12-2 lists each query cache status variable with a brief description.
TABLE 12-2
Query Cache Status Variables
Counter Name Description
Qcache_free_blocks Number of memory blocks free in the query cache
Qcache_free_memory Total bytes of memory free in the query cache
Qcache_hits Number of times a query matched the query cache
Qcache_inserts Number of times a query and result set were inserted
into the query cache
Qcache_lowmem_prunes Number of times a query was removed due to query
pruning
Qcache_not_cached Number of queries that could not be cached
Qcache_queries_in_cache Number of queries currently stored in the query
cache
Qcache_total_blocks Total number of memory blocks in cache
To see how useful your query cache is (a worked example of retrieving these counters follows
this list):
■ Determine the percentage of SELECT queries that are getting their result sets from the
query cache. This rate, known as the query cache hit ratio, is calculated by taking
the number of query cache hits (matches) and dividing it by the total number of hits
plus the number of non-hits. Com_select is the status variable that holds the number of
executed SELECT queries that go through the full execution plan. To calculate the query
cache hit ratio:
Qcache_hits / (Qcache_hits + Com_select)
For example, if Qcache_hits is 15,593 and Com_select is 10,193:
15,593 / (15,593 + 10,193) = .60470 = 60.47%
Depending on your application, this can be a good hit ratio or not. If your application
has many different
SELECT statements, or if many statements cannot be cached or are
not being cached on purpose (that is, using
SELECT SQL_NO_CACHE), the hit ratio will
never be high. To increase your hit ratio, consider caching more queries (see the sidebar
on caching more queries).
■ Calculate the percentage of SELECT queries that are placed in the query cache. This rate is
known as the insert ratio and, like the hit ratio, is calculated using the
Com_select status
variable:
Qcache_inserts / Com_select * 100
If this percentage is high, it means that most of your SELECT queries are being put into
the query cache. If Qcache_lowmem_prunes is also low, it means that even though most
of your SELECT queries are being put into the query cache, they are not filling the query
cache to the point where queries must be pruned. However, this may not
necessarily indicate a well-tuned query cache, because there could be a lot of query cache
invalidation happening. Unfortunately there is no parameter to see exactly how much
query cache invalidation is happening. You may be able to estimate how many queries are
being removed by watching how the Qcache_queries_in_cache variable changes over
time.
■ Calculate the percentage of free memory, in bytes and blocks, left in the query cache:
Qcache_free_memory / query_cache_size * 100
Qcache_free_blocks / Qcache_total_blocks * 100
If the percentage of free memory is high, either lower the query_cache_size so that you
are not wasting memory by allocating it to the query cache, or cache more queries (see the
sidebar on caching more queries).
If the percentage of free memory is low and Qcache_queries_in_cache is also low, it
means that in addition to not having a lot of memory for more queries, there are very few
queries in the query cache. This may indicate that:
■ The query_cache_size needs to be bigger
■ The query_cache_limit needs to be smaller
■ The query cache is heavily fragmented. See the next section on query cache
fragmentation for more information.
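Here is a sketch of pulling the counters used in the calculations above; the figures returned will of course differ on your server, and GLOBAL_STATUS is the INFORMATION_SCHEMA view described in Chapter 21:

mysql> SHOW GLOBAL STATUS LIKE 'Qcache%';
mysql> SHOW GLOBAL STATUS LIKE 'Com_select';

mysql> SELECT hits.VARIABLE_VALUE /
    ->        (hits.VARIABLE_VALUE + sel.VARIABLE_VALUE) AS hit_ratio
    ->   FROM INFORMATION_SCHEMA.GLOBAL_STATUS hits,
    ->        INFORMATION_SCHEMA.GLOBAL_STATUS sel
    ->  WHERE hits.VARIABLE_NAME = 'QCACHE_HITS'
    ->    AND sel.VARIABLE_NAME = 'COM_SELECT';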
Caching More Queries
If you determine that your query cache is not being utilized enough, you may need to try caching
more queries. To do this, you can:
1. Use fewer SELECT SQL_NO_CACHE statements if your query_cache_type is ON.
2. Use more SELECT SQL_CACHE statements if your query_cache_type is
DEMAND.
3. Qcache_not_cached is the number of queries that could not be cached. If this is
large, increasing the
query_cache_limit parameter enables queries with larger
result sets to be stored in the query cache.
4. Qcache_lowmem_prunes is the number of times a query has had to
be pruned due to not enough memory. If
Qcache_free_blocks and
Qcache_free_memory are also high, you likely have fragmentation (see the next
section on query cache fragmentation). If the free memory in the query cache is
low, increasing the
query_cache_size parameter will allow more queries to
be cached in the query cache.
Query cache fragmentation
Fragmentation of the query cache is analogous to fragmentation of hard drives. With a typical
desktop computer, after you utilize it for a while your hard drive gets fragmented with files scat-
tered all over the hard drive. The fragmentation is caused by files being deleted that leave holes
in the layout of files on your hard drive. Though the operating system will try and fill in these
holes with other files, they will almost never be an exact fit. Over time you end up with hun-
dreds or thousands of these holes and they consume extra hard drive space and slow down the
system. To resolve the problem the hard drive must be defragmented. This involves moving
the files around to remove the gaps and make everything as compact as possible.
With a query cache there are two causes of fragmentation. The first cause is that the result sets
do not always fit exactly into a block of memory. The query cache program has to estimate the
size of the result set it will be storing in the cache and allocate a block of memory for storage of
the result set. The estimation is necessary because it receives the result set row by row instead
of in one large chunk. Once the query cache is done storing the result set it will trim the
allocated block of memory down in size so that there is just enough memory for the size of the

result set. This leaves a gap between the blocks of memory.
The second cause of query cache fragmentation is the invalidation of queries stored in the cache.
When a query is invalidated it is deleted from the cache. As with hard drive fragmentation, a
hole is created when the deletion occurs.
Often both trimming of blocks and invalidation of queries lead to blocks of memory that
are too small to store query results. The best way to handle query cache fragmentation is to
minimize the amount of fragmentation. One way to minimize fragmentation is to set the block
size of the query cache to the average result set size of your queries. To determine the average
result size:
(query_cache_size - Qcache_free_memory) / Qcache_queries_in_cache
Unfortunately this will not help for every application. Situations where this might not be helpful
include when the application has a combination of both large and small queries.
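A sketch of that calculation run directly against the INFORMATION_SCHEMA views mentioned earlier (this assumes the query cache is enabled and currently holds at least one query):

SELECT (v.VARIABLE_VALUE - s.VARIABLE_VALUE) / q.VARIABLE_VALUE
       AS avg_result_set_bytes
  FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES v,
       INFORMATION_SCHEMA.GLOBAL_STATUS s,
       INFORMATION_SCHEMA.GLOBAL_STATUS q
 WHERE v.VARIABLE_NAME = 'QUERY_CACHE_SIZE'
   AND s.VARIABLE_NAME = 'QCACHE_FREE_MEMORY'
   AND q.VARIABLE_NAME = 'QCACHE_QUERIES_IN_CACHE';

The result could then be used as a starting point for query_cache_min_res_unit.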
Another way to minimize fragmentation is to minimize query cache invalidation and pruning.
The more queries removed, the more fragmentation can occur. Often, it is not possible to
change the frequency of UPDATE, INSERT, and DELETE statements on a table. However, query
cache invalidation may be minimized by splitting large tables into smaller tables by using
MERGE
tables (see Chapter 11) or partitioning (see Chapter 15).
In situations where fragmentation is likely and difficult to minimize, it may be best to
choose which queries to cache by setting the
query_cache_type to DEMAND and using the
SQL_CACHE hint to select the queries you want cached.
The previous section on query cache memory usage and tuning showed how to determine if the
query cache is fragmented. Executing the
FLUSH QUERY CACHE command will defragment
the cache.

FLUSH QUERY CACHE does not flush the cache; it compacts the cache by locking
the query cache and re-ordering it so there are no gaps between blocks of memory. With
larger caches the re-ordering can take a significant amount of time. At the time of this writing, we
recommend a maximum query_cache_size of 256 Mb, so the defragmentation
process is not too long. If, instead of compacting the query cache, it is necessary to empty the
query cache, use the RESET QUERY CACHE command.
Be careful using the FLUSH QUERY CACHE command. DML and DDL use the query
cache to invalidate queries after the table changes are made, so these queries will
not complete until the lock on the query cache is released. In addition, any SELECT queries
checking the query cache must also wait for the lock on the query cache to be released. If the
query_cache_type is ON, this means that
all
SELECT queries except for those specifying
SQL_NO_CACHE depend on the query cache. Performing a FLUSH QUERY CACHE may block many
or all queries, so be careful when defragmenting!
Utilizing memcached
The memcached caching system is a distributed, in-memory cache system that speeds up web
applications by minimizing the number of database queries needed. It was originally developed
by Danga Interactive for the LiveJournal website.
The basic principle of memcached is that a number of memcached daemons work together
to provide data caching.
memcached is distributed because many daemons can work
together to cache data — there is no need for one centralized cache. It is an in-memory cache,
which means that when a daemon is restarted the cache data it held is lost. As with the query
cache in
mysqld, memcached stores a hash of a query along with its result set. The application

code must be changed to first retrieve the result set corresponding to the hash of the query — if
nothing is returned, there was no match in the
memcached cache, and the application code
should proceed to query the database as normal. The application code has the power to store
the query and the new result set, if desired.
Installing memcached is not difficult for those with system administration experience.
memcached runs on Unix-based platforms, and a third-party Windows port of memcached is
also available. The only prerequisite memcached requires is the libevent library.
The Frequently Asked Questions page for memcached has a lot of good information
for a memcached beginner, including where to download memcached and how to
install it.
Configuration of memcached is not difficult. In fact, it has just a handful of command-line
options. Table 12-3 lists the
memcached options and what they do.
TABLE 12-3
memcached Command-Line Options
Option Description
-d Runs in daemon mode (in the background). Without this option, memcached
runs in the foreground.
-l Listens on specific IP address. By default, memcached listens on all IP
addresses configured on the server.
-s Listens via the specified Unix socket file only.
-m Amount of memory to use (in megabytes).
-p Listens on specified port (11211 by default).

-u User who runs the daemon.
One consideration is the amount of memory you can allocate to each memcached daemon. If
you are running
memcached on a 32-bit machine, each process can only address approximately
2.4 Gb of memory. However, because
memcached is designed to be distributed, it is very easy
to run multiple instances on 32-bit machines. In fact, to maximize the cache memcached
provides, run as many memcached daemons as possible, using dedicated memcached machines
with one or more instances and putting memcached instances on the same machines as the
web servers. Both of these practices are recommended and encouraged.
Web servers are particularly well-suited to share resources with
memcached processes. For typi-
cal dynamic content, web servers use much more CPU than memory, and
memcached processes
use much more memory than CPU.
No authentication or security measures are built into memcached. Never expose
memcached ports to the Internet.
Once memcached is configured and running, it is time to begin utilizing the cache. You do this
via application code. In each place that your application queries the database, add in a check of
the cache in
memcached first. If memcached returns nothing, the application must then go to
the database, get the result set, and store it in
memcached.
A number of programming APIs are available that interface with

memcached. Table 12-4 lists
some of the more popular ones and their locations.
TABLE 12-4
memcached Programming API Libraries
API Language    API Name
C               libmemcached, apr_memcache
C#              memcacheddotnet, Enyim Memcached, beitmemcached
Java            spymemcached
Lua             memcached.lua
Perl            Cache::Memcached, Cache::Memcached::Fast
PHP             memcache
Python          python-memcached
MySQL UDFs      Memcached Functions for MySQL
Though we do not have the space to provide a complete programming guide, we will show a
simple example of what it is like to work with memcached.
In order to be utilized, memcached will need to be integrated into your application. In this
example, the application is using PHP and the
memcache API for PHP.
The first thing to do is create a new
Memcache object and establish a connection to our mem-
cached
server:
<?php
$memcache = new Memcache;
$memcache->connect('localhost', 11211) or die ("Could not connect");
?>
For this example we are running memcached on the local machine with default settings. You
would typically initiate this connection once, when you open a database connection. If you want
to connect to more than one
memcached server, call $memcache->connect() again with
the host and port of the additional server. For comparison purposes, before we show you how to
integrate
memcached with your code, here is the code for a database query that does not use
memcached:
<?php
$sql = "SELECT first_name, last_name FROM sakila.customer WHERE customer_id = 1";
$query = mysql_query($sql) or die(mysql_error()." : $sql");
if (mysql_num_rows($query) > 0) {
    $result = mysql_fetch_object($query);
    $content = $result->first_name." ".$result->last_name;
}
?>
Here is one way to integrate memcached:

<?php
$sql = "SELECT first_name, last_name FROM sakila.customer WHERE customer_id = 1";
//create a hash of the query
$key = md5('query'.$sql);
//look up the hash
$result = $memcache->get($key);
//if memcached does not return a result set,
//query the database normally
if ($result == null) {
    $query = mysql_query($sql) or die(mysql_error()." : $sql");
    if (mysql_num_rows($query) > 0) {
        $result = mysql_fetch_object($query);
        //store the result in memcache
        $memcache->set($key, $result, 0, 1800);
        $content = $result->first_name." ".$result->last_name;
    }
}
?>
The code first checks to see if it can find the result set in memcached. If no results are returned,
the result set is fetched from the database, returned for use, and stored into
memcached. In the code sample, the
set() function takes four arguments: the query hash ($key), the result set
(
$result), whether or not the result set should be compressed (0, which means no), and an
expiration time (1800 seconds, which is 30 minutes).
There are two ways to resolve the problem of stale data in the
memcached cache. The easiest
method is to use a low cache expiration time, applied to the result set when the data is stored
using the
set() function. Though this is simple to implement, there can still be a lag from the
time you updated the database to when the updates will appear in the cache. For example, if we
set the expiration time for the previous query to 60 seconds, the data may be up to 60 seconds
out of date. Though this is better than up to 30 minutes out of date, the data can still be stale.
The second method is to update the cache immediately when data is changed. This requires
more work because you have to update each result set that has changed in the cache.
There is another way for memcached to interact with mysqld: the memcache_engine storage
engine, developed by MySQL architect Brian Aker. memcache_engine makes memcached look
like a table in mysqld. Instead of API calls such as get() and set() in the preceding example,
API calls are in the format of SELECT, INSERT, UPDATE, and DELETE queries on a table
representing the memcached cache.
memcached can make a dramatic difference in the response time of your application and the
load on your database servers. Although the speed of
memcached is impressive, surpassing that
of the query cache in
mysqld, the ability to scale by adding additional servers and updating
your application code is equally important.
memcached can be a great way to scale a rapidly
growing website.
Summary
Caching at any level with database servers is a win-win situation when the end user frequently
gets faster responses from the application and the database server does not have to work as hard

to return those results. Combining multiple layers of caching provides even better results. The
following topics were covered in this chapter:
■ What caching is
■ Using cache tables to cache counts
■ The
mysqld query cache
■ The
memcached caching server
Backups and Recovery
IN THIS CHAPTER
Backing up MySQL
Recovering from crashes
Planning for disasters
The most important responsibility a database administrator has is
performing backups and recoveries. In almost every case, data is an
organization’s primary asset and keeping that asset safe is of utmost
importance. Backups and recovery are necessary in case of disaster, but
are also critical in setting up replication, upgrading, migrating, and troubleshooting
unexpected data changes.
This chapter has two goals. The first goal is to give an overview of the var-
ious methods of backups available to a database administrator. The second
goal is to help you to understand that there is more to backups and recov-
ery than just being able to perform the actual operations of backup and
recovery. Backup and recovery are parts of the big picture of planning for
disasters and recovering from them, as well as setting up new slave servers
and test/development environments.
Backing Up MySQL

A number of terms are associated with backup and recovery. Before cov-
ering the various methods of performing backups, it’s important to define
some terminology first:
■ Logical backup — A logical backup is created by saving information that represents
the logical database structures using SQL statements like CREATE DATABASE,
CREATE TABLE, and INSERT. It is not correct to say that a logical backup is a text
representation of the database server because there can be non-text binary data in
a logical backup. Other names for a logical backup are a logical export
and an export.
The advantages of a logical backup are that it allows the database administrator to manip-
ulate the backup data using tools such as
dd and grep and programming languages such
as awk and Perl. In addition, logical backups are more compatible between different ver-
sions of
mysqld when you are upgrading your database server. Sun recommends a logical
backup and restore of your database as a standard practice when upgrading between dif-
ferent release series (that is, when upgrading from version 5.1 to 6.0) of
mysqld.The
disadvantages compared to a physical backup are that a logical backup is slower to back
up and restore and may take up more space (but compresses well).
■ Physical backup — A physical backup is a backup of the actual database files or disk
partitions. This can be much faster to back up and restore than logical backups. Though
physical backups do not compress much (the data is usually in a binary format and thus
somewhat compressed already), physical backups are often smaller than uncompressed
logical backups. Physical backups may also be called raw backups.
■ Full backup — A full backup is a standalone backup containing everything in the
database. If necessary you could use the results of the full backup to re-create a server
somewhere else. A full backup can be either a logical or physical backup.
■ Incremental backup — An incremental backup is a backup that only contains the data
changed since the previous backup. The previous backup may have been full or
incremental. The advantage of an incremental backup compared to a full backup is quicker
backup times. Incremental backups make it possible to back up data more frequently
than a full backup may allow.
For example, a full backup of a large data set may take 3 hours, and a daily incremental
backup may take 30 minutes. An organization may only be able to run a 3-hour
full backup once a week on Sundays, but during the other six days of the week an
organization can run an incremental backup. In this way, the data set can be restored to
how it was on Sunday, Tuesday, or any other day of the week. To restore to Tuesday,
first the Sunday full backup is restored. Then the Monday incremental backup is applied
to the Sunday restore, restoring the data to how it was on Monday. Then the Tuesday
incremental backup can be applied.
The biggest disadvantage of an incremental backup is that it is not a full data set, and can-
not be used by itself to restore a full data set. This means that incremental backups have
longer recovery times than full backups, because the latest full backup and all intermediate
incremental backups need to be restored.
■ Consistent backup — A consistent backup is a backup at an exact moment in time. A
backup process takes time to complete. Consider the following scenario:
■ The
customer table has a field that references a record in the address table.
■ The
address table is backed up at 1:00 am.
■ The
customer table is backed up at 1:05 am.
■ Between 1:00 am and 1:05 am, a customer named Henry Inge signs up for a new
account using an address not previously in the system.
The address table was backed up at 1:00 am, before Henry signed up for a new account.
Therefore, the backup of
address will not contain Henry’s address. However, customer
is backed up at 1:05 am, and by then Henry’s address is in address, and there is a refer-
ence to that record in
customer. Thus, a problem occurs: in the backup of customer,
Henry’s record references an address that does not exist in the backup of
address!
An inconsistent backup is usually simpler and less resource-intensive to produce than a
consistent backup. However, an inconsistent backup cannot be used as a standalone
backup. An inconsistent backup may be used for partial data restores (such as restoring
only Henry’s e-mail address).
■ Hot backup — A hot backup is a backup of a database that is still running. During a hot
backup, neither reads nor writes are blocked.
■ Warm backup — A warm backup is a backup of a database that is still running. During
a warm backup, read queries are not blocked but writes are prohibited from making any
modifications to the database for the duration of the backup.
■ Cold backup — A cold backup is a backup performed while the database is shut down.
This makes it very easy to make a consistent copy of your data. The disadvantage is that
the server is not accessible during the time the backup is performed.
■ Point-in-time restore — A point-in-time restore is a restoration of a database to a
specified date and time. If this does not correspond to the time a full backup was performed,
incremental backups and/or server logs must be used to finish the restoration process.
Uses for backups
There are many reasons to have data backups. This section outlines the main reasons; you may
have additional backup uses specific to your environment.
Disaster recovery
Most people think of disaster recovery as the sole reason backups are performed. However,

disaster recovery itself is a misnomer. You have many different types of disasters to plan for;
an important part of a backup plan is determining both common and uncommon disasters and
planning the appropriate responses to them.
For example, hard drive failure could be handled several ways. One plan is to use RAID arrays
to handle a single drive failure, which will allow the file system to continue to be used (at a
reduced speed) until the drive can be replaced. This disaster recovery plan must also include
how to notice that a drive has failed, and how and when the replacement is done — is there
a vendor or third-party that will replace the drive within a certain period of time? Whose
responsibility in-house is it to make sure the correct parts are on hand (or ordered with rush
delivery), and make sure the replacement happens?
Of course this plan does not take into account what happens if the RAID controller fails.
You should try and think of as many possible failure scenarios and your response to them as