The Art of Scalability: Scalable Web Architecture, Processes, and Organizations for the Modern Enterprise (Part 8)

Caching Software
Adequately covering even a portion of the caching software that is available both
from vendors and the open source communities is beyond the scope of this chapter.
However, there are some points that should be covered to guide you in your search
for the right caching software for your company’s needs. The first point is that you
should thoroughly understand your application and user demands. Running a site
with multiple GB per second of traffic requires a much more robust and enterprise-
class caching solution than does a small site serving 10MB per second of traffic. Are
you projecting a doubling of requests, users, or traffic every month? Are you introducing a brand-new video product line that will completely change the type of caching you need? These are the types of questions you need to ask yourself
before you start shopping the Web for a solution, or you could easily fall into the trap
of making your problem fit the solution.
The second point addresses the difference between add-on features and purpose-
built solutions and is applicable to both hardware and software solutions. To under-
stand the difference, let’s discuss the life cycle of a typical technology product. A
product usually starts out as a unique technology that sells and gains traction, or is
adopted in the case of open source, as a result of its innovation and benefit within its
target market. Over time, this product becomes less unique and eventually commod-
itized, meaning everyone sells essentially the same product with the primary differen-
tiation being price. High tech companies generally don’t like selling commodity
products because the profit margins continue to get squeezed each year. And open
source communities are usually passionate about their software and want to see it
continue to serve a purpose. The way to prevent the margin squeeze or the move into
the history books is to add features to the product. The more “value” the vendor
adds, the longer the vendor can keep the price high. The problem with this is that these
add-on features are almost always inferior to purpose-built products designed to
solve this one specific problem.
An example of this can be seen in comparing the performance of mod_cache in Apache as an add-on feature with that of the purpose-built product memcached. This
is not to belittle or take away anything from Apache, which is a very common open
source Web server that is developed and maintained by an open community of devel-
opers known as the Apache Software Foundation. The application is available for a
wide variety of operating systems and has been the most popular Web server on the
World Wide Web since 1996. The Apache module, mod_cache, implements an HTTP
content cache that can be used to cache either local or proxied content. This module
is one of hundreds available for Apache, and it absolutely serves a purpose, but when
you need an object cache that is distributed and fault tolerant, there are better solu-
tions such as memcached.
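To make that concrete, here is a minimal sketch of the cache-aside pattern an application might use in front of its database. The ObjectCache interface and loadProfileFromDatabase method are hypothetical stand-ins; in production the cache would typically be a distributed one, such as memcached, rather than anything local.

// Hypothetical cache interface; in production this would wrap a distributed
// cache client (for example, a memcached client) rather than a local map.
interface ObjectCache {
    Object get(String key);
    void put(String key, Object value, int ttlSeconds);
}

class UserProfileService {
    private final ObjectCache cache;

    UserProfileService(ObjectCache cache) {
        this.cache = cache;
    }

    // Cache-aside read: check the cache first, fall back to the database on a
    // cache-miss, then populate the cache so later requests avoid the database.
    Object getProfile(String userId) {
        String key = "profile:" + userId;
        Object profile = cache.get(key);
        if (profile == null) {                      // cache-miss
            profile = loadProfileFromDatabase(userId);
            cache.put(key, profile, 300);           // keep it for five minutes
        }
        return profile;                             // cache-hit or freshly loaded
    }

    private Object loadProfileFromDatabase(String userId) {
        return new Object();                        // placeholder for the real query
    }
}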
Application caches are extensive in their types, implementations, and configura-
tions. You should first become familiar with the current and future requirements of
your application. Then, you should make sure you understand the differences
between add-on features and purpose-built solutions. With these two pieces of
knowledge, you are ready to make a good decision when it comes to the ideal caching
solution for your application.
Content Delivery Networks
The last type of caching that we are going to cover in this chapter is the content deliv-
ery networks (CDNs). This level of caching is used to push any of your content that is
cacheable closer to the end user. The benefits of this include faster response time and
fewer requests on your servers. The implementation of a CDN is varied but most
generically can be thought of as a network of gateway caches located in many differ-
ent geographical areas and residing on many different Internet peering networks.
Many CDNs use the Internet as their backbone and offer their servers to host your
content. Others, to provide higher availability and differentiate themselves, have built
their own network point to point between their hosting locations.
The advantages of CDNs are that they speed up response time, off load requests
from your application’s origin servers, and possibly lower delivery cost, although this is not always the case. The concept is that the total capacity of the CDN’s strategi-
cally placed servers can yield a higher capacity and availability than the network
backbone. The reason for this is that if there is a network constraint or bottleneck,
the total throughput is limited. When these are eliminated by placing CDN servers on
the edge of the network, the total capacity is increased and overall availability
increases as well. The way this works is that you place the CDN’s domain as an alias
for your server by using a canonical name (CNAME) in your DNS entry. A sample
entry might look like this:
ads.akfpartners.com CNAME ads.akfpartners.akfcdn.net
Here, we have our CDN, akfcdn.net, as an alias for our subdomain ads.akfpartners.com. The CDN alias could then be requested by the application, and as long as
the cache was valid, it would be served from the CDN and not our origin servers for
our system. The CDN gateway servers would periodically make requests to our
application origin servers to ensure that the data, content, or Web pages that they
have in cache are up-to-date. If the cache is out-of-date, the new content is distributed
through the CDN to their edge servers.
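How long the CDN may serve its cached copy before checking back with the origin is typically controlled by ordinary HTTP caching headers on the origin's responses. As a minimal sketch using the Java Servlet API, an origin response that shared caches such as CDN edge servers may keep for an hour might be produced like this (the servlet and the one-hour value are illustrative assumptions, not a recommendation):

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;

public class AdServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // Allow shared caches (such as CDN edge servers) to keep this
        // response for up to one hour before revalidating with the origin.
        resp.setHeader("Cache-Control", "public, max-age=3600");
        resp.setContentType("text/html");
        resp.getWriter().println("<html><body>ad content</body></html>");
    }
}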
Today, CDNs offer a wide variety of services in addition to the primary service of
caching your content closer to the end user. These services include DNS replacement,
geo-load balancing, which is serving content to users based on their geographical
location, and even application monitoring. All of these services are becoming more
commoditized as more providers enter into the market. In addition to commercial
CDNs, there are more peer-to-peer (P2P) services being utilized for content delivery to
end users to minimize the bandwidth and server utilization from providers.
Conclusion

In this chapter, we started off by explaining the concept that the best way to handle
large amounts of traffic is to avoid handling them in the first place. You can best do
this by utilizing caching. In this manner, caching can be one of the best tools in your
tool box for ensuring scalability. We identified that there are numerous forms of
caching already present in our environments, ranging from CPU cache to DNS cache
to Web browser caches. In this chapter, we wanted to focus primarily on three levels
of caching that are most under your control from an architectural perspective. These
are caching at the object, application, and content delivery network levels.
We started with a primer on caching in general and covered the tag-datum struc-
ture of caches and how they are similar to buffers. We also covered the terminology
of cache-hit, cache-miss, and hit-ratio. We discussed the various refreshing methodol-
ogies of batch and upon cache-miss as well as caching algorithms such as LRU and
MRU. We finished the introductory section with a comparison of write-through ver-
sus write-back methods of manipulating the data stored in cache.
The first type of cache that we discussed was the object cache. These are caches
used to store objects for reuse by the application. Objects stored within the cache usually either come from a database or have been generated by the application. These
objects are serialized to be placed into cache. For object caches to be used, the appli-
cation must be aware of them and have implemented methods to manipulate the
cache. The database is the first place to look to offset load through the use of an
object cache, because it is generally the slowest and most expensive of your applica-
tion tiers; but the application tier is often a target as well.
The next type of cache that we discussed was the application cache. We covered
two varieties of application caching: proxy caching and reverse proxy caching. The
basic premise of application caching is that you desire to speed up performance or
minimize resources used. Proxy caching is used for a limited number of users request-
ing an unlimited number of Web pages. This type of caching is often employed by
Internet service providers or local area networks such as in schools and corporations.
The other type of application caching we covered was the reverse proxy cache. A
reverse proxy cache is used for an unlimited number of users or requestors and for a limited number of sites or applications. These are most often implemented by system
owners in order to off load the requests on their application origin servers.
The last type of caching that we covered was the content delivery networks
(CDNs). The general principle of this level of caching is to push content that is cacheable closer to the end user. The benefits include faster response time and fewer
requests on the origin servers. CDNs are implemented as a network of gateway
caches in different geographical areas utilizing different ISPs.
No matter what type of service or application you provide, it is important to
understand the various methods of caching so that you can choose the right type of
cache. There is almost always a caching type or level that makes sense with Web 2.0
and SaaS systems.
Key Points
• The most easily scalable traffic is the type that never touches the application
because it is serviced by cache.
• There are many layers at which to consider adding caching, each with pros and cons.
• Buffers are similar to caches and can be used for performance, such as when
reordering of data is required before writing to disk.
• The structure of a cache is very similar to data structures, such as arrays with
key-value pairs. In a cache, these tuples or entries are called tags and datum.
• A cache is used for the temporary storage of data that is likely to be accessed again,
such as when the same data is read over and over without the data changing.
• When the requesting application or user finds the data that it is asking for in the
cache, this is called a cache-hit.
• When the data is not present in the cache, the application must go to the pri-
mary source to retrieve the data. Not finding the data in the cache is called a
cache-miss.
• The ratio of hits to requests is called the cache ratio or hit ratio.
• The use of an object cache makes sense if you have a piece of data either in the database or in the application server that gets accessed frequently but is updated
infrequently.
• The database is the first place to look to offset load because it is generally the
slowest and most expensive of your application tiers.
• A reverse proxy cache is the opposite in that it caches for an unlimited number of
users or requestors and for a limited number of sites or applications.
• Another term used for reverse proxy caches is gateway caches.
• Reverse proxy caches are most often implemented by system owners themselves
in order to off load the requests on their Web servers.
• Many CDNs use the Internet as their backbone and offer their servers to host
your content.
• Others, in order to provide higher availability and differentiate themselves, have
built their own network point to point between their hosting locations.
• The advantages of CDNs are that they lower delivery cost, speed up response
time, and off load requests from your application’s origin servers.
Chapter 26
Asynchronous Design for Scale
In all fighting, the direct method may be used for joining battle,
but indirect methods will be needed in order to secure victory.
—Sun Tzu
This last chapter in Part III, Architecting Scalable Solutions, will address an often
overlooked problem when developing services or products—that is, overlooked until
it becomes a noticeable and costly inhibitor to scaling. This problem is the use of syn-
chronous calls in the application. We will explore the reasons that most developers over-
look asynchronous calls as a scaling principle and how converting synchronous calls to
asynchronous ones can greatly improve the scalability and availability of the system.

We will explore the use of state in applications including why it is used, how it is
often used, why it can be problematic, and how to make the best of it when neces-
sary. Examining the need for state and eliminating it where possible will pay huge
dividends within your architecture if it is not already a problem. If it already is a
problem in your system, this chapter will give you some tools to fix it.
Synching Up on Synchronization
Let’s start our discussion by covering some of the basics of synchronization, starting
with a definition and some different types of synchronization methods. The process
of synchronization refers to the use and coordination of simultaneously executed
threads or processes that are part of an overall task. These processes must run in the
correct order to avoid a race condition or erroneous results. Stated another way, syn-
chronization is when two or more pieces of work must be done in a specific order to
accomplish a task. An example is a login task. First, the user’s password must be
encrypted; then it must be compared against the encrypted version in the database;
then the session data must be updated marking the user as authenticated; then the
welcome page must be generated; and finally the welcome page must be presented. If
any of those pieces of work are done out of order, the task of logging the user in fails
to get accomplished.
There are many types of synchronization processes that take place in program-
ming. One that all developers should be familiar with is the mutex or mutual exclu-
sion. Mutex refers to how global resources are protected from concurrently running
processes to ensure only one process is updating or accessing the resource at a time.
This is often accomplished through semaphores, which are essentially fancy flags. Sema-
phores are variables or data types that mark or flag a resource as being in use or free.
Another classic synchronization method is known as thread join. Thread join is when
a process is blocked from executing until a thread terminates. After the thread termi-
nates, the other process is free to continue. An example would be for a parent pro-
cess, such as a “look up,” to start executing. The parent process kicks off a child process to retrieve the location of the data that it is going to look up, and this child
thread is “joined.” This means that the parent process cannot complete until the
child process terminates.
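A minimal sketch of a thread join in Java, loosely following the "look up" example above; lookUpLocation is a hypothetical placeholder for the child's work:

public class LookupExample {
    public static void main(String[] args) throws InterruptedException {
        // Child thread retrieves the location of the data to be looked up.
        Thread child = new Thread(() -> lookUpLocation());
        child.start();

        // join() blocks the parent until the child thread terminates,
        // so the parent cannot complete before the child does.
        child.join();

        System.out.println("Child finished; parent may now complete the lookup.");
    }

    private static void lookUpLocation() {
        // Placeholder for the real work of finding where the data lives.
    }
}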
Dining Philosophers Problem
This analogy is credited to Sir Charles Antony Richard Hoare (a.k.a. Tony Hoare), the same person who invented the Quicksort algorithm. This analogy is used as an illustrative example of
resource contention and deadlock. The story goes that there were five philosophers sitting
around a table with a bowl of spaghetti in the middle. Each philosopher had a fork to his left,
and therefore each had one to his right. The philosophers could either think or eat, but not both.
Additionally, in order to serve and eat the spaghetti, each philosopher required the use of two
forks. Without any coordination, it is possible that all the philosophers pick up their forks simul-
taneously and therefore no one has two forks with which to serve or eat.
This analogy is used to show that without synchronization the five philosophers could
remain stalled indefinitely and starve just as five computer processes waiting for a resource
could all enter into a deadlocked state. There are many ways to solve such a dilemma. One is
to have a rule that each philosopher when reaching a deadlock state will place his fork down,
freeing up a resource, and think for a random time. If this solution sounds familiar, it might be
because it is the basic idea of retransmission that takes place in the Transmission Control Pro-
tocol (TCP). When no acknowledgement for data is received, a timer is started to wait for a
retry. The amount of time is adjusted by the smoothed round trip time algorithm and doubled
after each unsuccessful retry.
As you might expect, there are many other types of synchronization processes and
methods that are employed in programming. We’re not presenting an exhaustive list
but rather attempting to give you an overall understanding that synchronization is
used throughout programming in many different ways. Eliminating synchronization
is not possible, nor would it be advisable. It is, however, prudent to understand the
purpose and cost of synchronization so that when you use it you do so wisely.
Synchronous Versus Asynchronous Calls

Now that we have a basic definition and some examples of synchronization, we can
move on to a broader discussion of synchronous versus asynchronous calls within the
application. Synchronous calls perform their action completely by the time the call
returns. If a method is called and control is given to this method to execute, the point
in the application that made the call is not given control back until the method has
completed its execution and returned either successfully or with an error. In other
words, synchronous methods are called, they execute, and when they finish, you get
control back. As an example of a synchronous method, let’s look at a method called
query_exec from AllScale’s human resource management (HRM) service. This
method is used to build and execute a dynamic database query. One step in the
query_exec method is to establish a database connection. The query_exec method
does not continue executing without explicit acknowledgement of successful comple-
tion of this database connection task. Doing so would be a waste of resources and
time. If the database is not available, the application should not waste time creating
the query and waiting for it to become available. Indeed, if the database is not avail-
able, the team should reread Chapter 24, Splitting Databases for Scale, on how to
scale the database so that there is improved availability. Nevertheless, this is an
example of how synchronous calls work. The originating call is halted and not
allowed to complete until the invoked process returns.
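As a rough sketch of what such a synchronous flow looks like in code, consider the following Java method using the standard JDBC API; the connection string and query are hypothetical and are not AllScale's actual query_exec implementation:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class QueryExec {
    public static int countEmployees() throws SQLException {
        // The calling thread blocks here until the connection attempt
        // succeeds or fails; nothing below runs without it.
        try (Connection conn =
                     DriverManager.getConnection("jdbc:mysql://dbhost/hrm", "user", "pass");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM employees")) {
            rs.next();
            return rs.getInt(1);   // control returns only after the query completes
        }
    }
}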
A nontechnical example of synchronicity is communication between two individu-
als either in a face-to-face fashion or over a phone line. If both individuals are
engaged in meaningful conversation, there is not likely to be any other action going
on. One individual cannot easily start another conversation with another individual
without first stopping the conversation with the first person. Phone lines are held
open until one or both callers terminate the call.
Contrast the synchronous methods or threads with an asynchronous method.
With an asynchronous method call, the method is called to execute in a new thread,
and it immediately returns control back to the thread that called it. The design pat-
tern that describes the asynchronous method call is known as the asynchronous
design, or the asynchronous method invocation (AMI). The asynchronous call continues to execute in another thread and terminates either successfully or with error
without further interaction with the initiating thread. Let’s turn back to our AllScale
example with the query_exec method. After calling synchronously for the database
connection, the method needs to prepare and execute the query. In the HRM system,
AllScale has a monitoring framework that allows them to note the duration and suc-
cess of all queries by asynchronously calling a method for start_query_time and
end_query_time. These methods store a system time in memory and wait for the end
call to be placed in order to calculate duration. The duration is then stored in a mon-
itoring database that can be queried to understand how well the system is performing
in terms of query run time. Monitoring the query performance is important but not
as important as actually servicing the users’ requests. Therefore, the calls to the mon-
itoring methods of start_query_time and end_query_time are done asynchronously. If
they succeed and return, great—AllScale’s operations and engineering teams get the
query time in the monitoring database. If the monitoring calls fail or get delayed for
20 seconds waiting on the monitoring database connection, they don’t care. The user
query continues on without any concern over the asynchronous calls.
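A minimal sketch of this fire-and-forget pattern in Java; recordQueryTime is a hypothetical stand-in for the write to the monitoring database, and a small dedicated executor keeps a slow or unavailable monitoring store from ever blocking the user's query:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class QueryMonitor {
    // Small pool dedicated to monitoring writes, separate from user requests.
    private static final ExecutorService MONITOR_POOL = Executors.newFixedThreadPool(2);

    public static void recordAsync(String queryName, long durationMillis) {
        // submit() returns immediately; the user query thread never waits on
        // the monitoring database, even if it is down or responding slowly.
        MONITOR_POOL.submit(() -> recordQueryTime(queryName, durationMillis));
    }

    private static void recordQueryTime(String queryName, long durationMillis) {
        // Placeholder for the insert into the monitoring database.
    }
}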
Returning to our communication example, email is a great example of asynchro-
nous communication. You write an email and send it, immediately moving on to
another task, which may be another email, a round of golf, or whatever. When the
response comes in, at an appropriate time, you read the response and potentially
issue yet another email in response. The communication chain blocks neither the
sender nor receiver for anything but the time to process the communication and issue
a response.
Scaling Synchronously or Asynchronously
Now we understand the difference between synchronous and asynchronous calls.
Why does this matter? The answer lies in scalability. Synchronous calls, if used exces-
sively or incorrectly, cause undue burden on the system and prevent it from scaling.
Let’s continue with our query_exec example where we were trying to execute a user’s query. Suppose we had implemented the two monitoring calls synchronously using the rationale that (1) monitoring is important, (2) the monitoring methods are very quick, and (3) even if we slow down a user query, what’s the worst that could happen? These are all good intentions, but they are wrong. As we stated earlier, monitoring is
important but it is not more important than returning a user’s query. The monitoring
methods might be very quick, when the monitoring database is operational, but what
happens when it has a hardware failure and is inaccessible? The monitoring queries
back up waiting to time out. This means the users’ queries are blocked waiting for
completion of the monitoring queries and are in turn backed up. When the user que-
ries are slowed down or temporarily halted waiting for a time out, they still take up database connections on the user database and still consume memory on the application servers trying to execute these threads. As more and more user threads start
stalling waiting for their monitoring calls to time out, the user database might run
out of connections preventing other nonmonitored queries from executing, and the
threads on the app servers get written to disk to free up memory, which causes swap-
ping on the app servers. This swapping in turn slows down all processing and may
result in the TCP stack of the app server reaching some maximum limit and refusing
subsequent connections. Ultimately, new user requests are not processed and users sit
waiting for browser or application timeouts. Your application or platform is essen-
tially “down.” As you see, this ugly chain of events can quite easily occur because of
a simple oversight on whether a call should be synchronous or asynchronous. The
worst thing about this scenario is the root cause can be elusive. As we step through
the chain, it is relatively easy to follow; but when the symptoms of a problem are that
your system’s Web pages start loading slowly and over the next 15 minutes this con-
tinues to get worse and worse until finally the entire system grinds to a halt, diagnos-
ing the problem can be very difficult. Hopefully, you have sufficient monitoring in
place to help you diagnose these types of problems, but these extended chains of
events can be very daunting to unravel when your site is down and you are frantic to get it back into service.
Despite the fact that synchronous calls can be problematic if used incorrectly or
excessively, method calls are very often done synchronously. Why is this? The answer
is that synchronous calls are simpler than asynchronous calls. “But wait!” you say.
“Yes, they are simpler, but oftentimes our methods require that the other methods
invoked do successfully complete and therefore we can’t put a bunch of asynchro-
nous calls in our system.” Ah, yes; good point. There are many times when you do
need an invoked method to complete and you need to know the status of that in
order to continue along your thread. We are not going to tell you that all synchro-
nous calls are bad; in fact, many are necessary and make the developer’s life a thou-
sand times less complicated. However, there are times when asynchronous calls can
and should be used in place of synchronous calls, even when there is dependency as
described earlier. If the main thread couldn’t care less whether the invoked thread fin-
ishes, such as with the monitoring calls, a simple asynchronous call is all that is
required. If, however, you require some information from the invoked thread, but
you don’t want to stop the primary thread from executing, there are ways to use call-
backs to retrieve this information. An in-depth discussion of callbacks is beyond the
scope of this chapter. An example of callback functionality is interrupt handlers in
operating systems that report on hardware conditions.
Asynchronous Coordination
Asynchronous coordination and communication between the original method and the invoked method requires a mechanism by which the original method determines when or whether a called method
has completed executing. Callbacks are methods passed as an argument to other methods
and allow for the decoupling of different layers in the code.
In C/C++, this is done through function pointers; in Java, it is done through object refer-
ences. There are many design patterns that use callbacks, such as the delegate design pattern
and the observer design pattern. The higher level process acts as a client of the lower level and
calls the lower level method by passing it by reference. An example of what a callback method might be invoked for would be an asynchronous event like file system changes.
In the .NET Framework, asynchronous communication is characterized by the use of BeginBlah, where Blah is the name of the synchronous version of the method. There are four ways to determine if an asynchronous call has been completed: first is polling (the IsCompleted property), second is a callback Delegate, third is the AsyncWaitHandle to wait on the call to complete, and fourth is EndBlah, which waits on the call to complete.
Different languages offer different solutions to the asynchronous communication and coordi-
nation problem. Understand what your language and frameworks offer so that you can imple-
ment them when needed.
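As one illustration of the callback idea, Java's CompletableFuture lets the primary thread register a callback and keep working while the invoked work runs elsewhere; lookUpLocation is again a hypothetical helper:

import java.util.concurrent.CompletableFuture;

public class CallbackExample {
    public static void main(String[] args) {
        // Run the lookup in another thread and register a callback; the
        // primary thread is not blocked waiting for the result.
        CompletableFuture<Void> done = CompletableFuture
                .supplyAsync(CallbackExample::lookUpLocation)
                .thenAccept(location ->
                        System.out.println("Callback received location: " + location));

        System.out.println("Primary thread keeps working...");

        done.join();   // only for this demo, so the JVM does not exit before the callback runs
    }

    private static String lookUpLocation() {
        return "shard-42";   // placeholder for the real lookup
    }
}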
In the preceding paragraph, we said that synchronous calls are simpler than asyn-
chronous calls and therefore they get used an awful lot more often. Although this is
completely true, it is only part of the reason that engineers don’t pay enough atten-
tion to the impact of synchronous calls. The second part of the problem is that devel-
opers typically only see a small portion of the application. Very few people in the
organization get the advantage of viewing the application in total from a higher level
perspective. Your architects should certainly be looking at this level, as should some
of your management team. These are the people that you will have to rely on to help
challenge and explain how synchronization might cause scaling issues.
Example Asynchronous Systems
To fully understand how synchronous calls can cause scaling issues and how you can either design from the start or convert a system in place to use asynchronous calls, we
shall invoke an example system that we can explore. The system that we are going to
discuss is taken from an actual client implementation that we reviewed in our advi-
sory practice at AKF Partners, but obviously it is obfuscated to protect privacy and
simplified to derive the relevant teaching points quickly.
The client had a system, we’ll call it MailScale, that allowed subscribed users to
email groups of other users with special notices, newsletters, and coupons (see Figure
26.1). The volume of emails sent in a single campaign could be very large, as many as
several hundred thousand recipients. These jobs were obviously done asynchronously
from the main site. When a subscribed user was finished creating or uploading the
email notice, he submitted the email job to process. Because processing tens of thou-
sands of emails can take several minutes, it really would be ridiculous to hold up the
user’s done page with a synchronous call while the job actually processes. So far, so
good; we have email batch jobs that are performed asynchronously from the main
user site.
The problem was that behind the main site there were schedulers that queued the
email jobs and parceled them out to available email servers when they became avail-
able. These schedulers were the service that received the email job from the main site
when submitted by the user. This was done synchronously: a user clicked Send, the
call was placed to the scheduler to receive the email job, and a confirmation was
returned that the job was received and queued. This makes sense: you don’t want this submission to fail without the user knowing it, and the call usually takes only a couple hundred milliseconds, so this is just a simple synchronous method invocation.
However, the engineer who made this decision did not know that the schedulers were
placing synchronous calls to the mail servers.
When a scheduler received a job, it queued it up until a mail server became avail-
able. Then, the scheduler would establish a synchronous stream of communication
between itself and the mail server to pass all the information about the job and monitor the job while it completed. When all the mail servers were running under maxi-
mum capacity, and there were the proper number of schedulers for the number of
mail servers, everything worked fine. When mail slowed down because of an exces-
sive number of bounce back emails or an ISP mail server was slow receiving the out-
bound emails, the MailScale email servers could slow down and get backed up. This
in turn backed up the schedulers because they relied on a synchronous communica-
tion channel for monitoring the status of the jobs. When the schedulers slowed down
and became unresponsive, this backed up into the main site, making the application servers that were trying to synchronously insert and schedule email jobs slow down. The entire site became slow and unresponsive, all because of a chain of synchronous calls that no single person was aware of.

Figure 26.1 MailScale Example (end users connect over the Internet to the MailScale site and its database; the site submits mail jobs to the schedulers, which process the jobs on the mail servers; dashed-line arrows are synchronous calls)
The fix for this problem was to break the synchronous communication into asyn-
chronous calls, preferably at both the app-to-scheduler and the scheduler-to-email-server interfaces, but at least at one of those places. There are a few lessons to be learned here. The
first and most important is that synchronous calls can cause problems in your system
in unexpected places. One call can lead to another call to another, which can get very
complicated with all the interactions and multitude of independent code paths through most systems, often referred to as the cyclomatic complexity of a program.
The next lesson that we can take from this is that engineers usually do not have the
overall architecture vision, and this can cause them to make decisions that daisy
chain processes together. This is the reason that architects and managers are critical
to help with designs, constantly teach engineers about the larger system, and oversee
implementations in an attempt to avoid these problems. The last lesson that we can
take from this example is the complexity in debugging problems of this nature.
Depending on the monitoring system, it is likely that the first alert comes from the
slowdown of the site and not the mail servers. If that occurs, it is natural that every-
one start looking at why the site is slowing down the mail servers instead of the other
way around. These problems can take a while to unravel and decipher.
Another reason to analyze and remove synchronous calls is the multiplicative
effect of failure. If you are old enough, you might remember the old Christmas tree
lights. These were strings of lights where if you had a single bulb out in the entire
string of lights, it caused every other bulb to be out. These lights were wired in series,
and should any single light fail, the entire string would fail. As a result, the “avail-
ability” of the string of lights was the product of the availability (1 minus the probability of failure) of all the lights. If any light had a 99.999% availability, or a 0.001% chance of failure, and there were 100 lights in the string, the theoretical availability of the string of lights was 0.99999^100, or about 0.999, reduced from 5-nine availability to 3-nine
availability. In a year’s time, 5-nine availability, 99.999%, has just over five minutes
of downtime, bulbs out, whereas a 3-nine availability, 99.9%, has over 500 minutes
of downtime. This equates to increasing the chance of failure from 0.001% to 0.1%.
No wonder our parents hated putting up those lights!
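The arithmetic behind those numbers, written out (a year has 525,600 minutes):

A_{\text{string}} = \prod_{i=1}^{100} A_i = 0.99999^{100} \approx 0.999

(1 - 0.99999) \times 525{,}600 \approx 5.3 \ \text{minutes of downtime per year (5-nines)}

(1 - 0.999) \times 525{,}600 \approx 526 \ \text{minutes of downtime per year (3-nines)}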
Systems that rely upon each other for information in a series and in synchronous
fashion are subject to the same rates of failure as the Christmas tree lights of yore.
Synchronous calls cause the same types of problems as lights wired in series. If one
fails, it is going to cause problems within the line of communication back to the end customer. The more calls we make, the higher the probability of failure. The higher
the probability of failure, the more likely it is that we hold open connections and
refuse future customer requests. The easiest fix to this is to make these calls asynchro-
nous and ensure that they have a chance to recover gracefully with timeouts should
they not receive responses in a timely fashion. If you’ve waited two seconds and a
response hasn’t come back, simply discard the request and return a friendly error
message to the customer.
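A minimal sketch of that timeout-and-degrade behavior in Java, using a Future with a bounded wait; callRemoteService and the two-second budget are illustrative assumptions:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedCall {
    private static final ExecutorService POOL = Executors.newCachedThreadPool();

    public static String fetchWithTimeout() {
        Future<String> future = POOL.submit(BoundedCall::callRemoteService);
        try {
            // Wait at most two seconds; do not hold the connection open forever.
            return future.get(2, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);                     // discard the request
            return "Sorry, please try again later."; // friendly error to the customer
        } catch (Exception e) {
            return "Sorry, please try again later.";
        }
    }

    private static String callRemoteService() {
        return "real response";   // placeholder for the dependent service call
    }
}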
This entire discussion of synchronous and asynchronous calls is one of the often
missed but necessary topics that must be discussed, debated, and taught to organiza-
tions. Skipping over this is asking for problems down the road when loads start to
grow, servers start reaching maximum capacity, or services get added. Adopting prin-
ciples, standards, and coding practices now will save a lot of downtime and wasted
resources on tracking down and fixing these problems in the future.
Defining State
Another oft ignored engineering topic is stateful versus stateless applications. An
application that uses state is called stateful and it relies on the current condition of
execution as a determinant of the next action to be performed. An application or pro-
tocol that doesn’t use state is referred to as stateless. Hyper Text Transfer Protocol
(HTTP) is a stateless protocol because it doesn’t need any information about the pre-
vious request to know everything necessary to fulfill the next request. An example of
the use of state would be in a monitoring program that first identifies that a query
was requested instead of a cache request and then, based on that information, calculates a duration time for the query. In a stateless implementation of the same pro-
gram, it would receive all the information that it required to calculate the duration at
the time of request. If it was a duration calculation for a query, this information
would be passed to it upon invocation.
You may recall from a computer science computational theory class the descrip-
tion of Mealy and Moore machines, which are known as state machines or finite state machines. A state machine is an abstract model of states and actions that is used
to model behavior; these can be implemented in the real world in either hardware or
software. There are other ways to model or describe behavior of an application, but
the state machine is one of the most common.
Mealy and Moore Machines
A Mealy machine is a finite state machine that generates output based on the input and the
current state of the machine. A Moore machine, on the other hand, is a finite state machine that
generates output based solely on the current state. A very simple example of a Moore machine
is a turn signal that alternates on and off. The output is the light being turned on or off and is
completely determined by the current state. If it is on, it gets turned off. If it is off, it gets turned on.
Another very simple example, this time of a Mealy machine, is a traffic signal. Assume that
the traffic signal has a switch to determine whether a car is present. The output is the traffic
light red, yellow, or green. The input is a car at the intersection waiting on the light. The output
is determined by the current state of the light as well as the input. If a car is waiting and the cur-
rent state is red, the signal gets turned to green. Obviously, these are both overly simplified
examples, but you get the point that there are different ways of modeling behavior using states,
inputs, outputs, and actions.
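A minimal sketch of the two machines in Java: the Moore turn signal's next output depends only on its current state, while the Mealy traffic signal also consumes an input (whether a car is waiting). The enums are illustrative simplifications, just like the examples above:

public class StateMachines {
    // Moore machine: output (on/off) is determined solely by the current state.
    enum TurnSignal {
        ON, OFF;
        TurnSignal next() { return this == ON ? OFF : ON; }
    }

    // Mealy machine: output depends on the current state and the input.
    enum TrafficSignal {
        RED, YELLOW, GREEN;
        TrafficSignal next(boolean carWaiting) {
            if (this == RED && carWaiting) return GREEN;  // car waiting on red
            if (this == GREEN) return YELLOW;             // normal cycle
            if (this == YELLOW) return RED;
            return this;                                  // red with no car: stay red
        }
    }
}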
Given that finite state machines are one of the fundamental aspects of theoretical
computer science as mathematically modeled by automatons, it is no wonder that
this is a fundamental structure of our system designs. But why exactly do we see state
in almost all of our programs, and are there alternatives? The reason that most appli-
cations rely on state is that the languages used for Web based or Software as a Service
(SaaS) development are almost all imperative based. Imperative programming is the
use of statements to describe how to change the state of a program. Declarative pro-
gramming is the opposite and uses statements to describe what changes need to take
place. Procedural, structured, and object-oriented programming all are imperative-
based programming methodologies. Example languages include Pascal, C/C++ and
Java. Functional or logical programming is declarative and therefore does not make use of the state of the program. Structured Query Language (SQL) is a common exam-
ple of a logical language that is stateless.
Now that we have explored the definition of state and understand why state is
fundamental to most of our systems, we can start to explore how this can cause prob-
lems when we need to scale our applications. When an application runs as a single instance on a single server, the state of the machine is known and easy to manage. All
users run on the one server, so knowing that a particular user has logged in allows the
application to use this state of being logged in and whatever input arrives, such as
clicking a link, to determine what the resulting output should be. The complexity of
this comes when we begin to scale our application along the X-axis by adding serv-
ers. If a user arrives on one server for this request and on another server for the next
request, how would each machine know the current state of the user? If your applica-
tion is split along the Y-axis and the login service is running in a completely different
pool than the report service, how does each of these services know the state of the
other? These are all questions that arise when trying to scale applications that require
state. These are not insurmountable, but they do require some thought, hopefully
before you are in a bind with your current capacity and have to rush out a new server
or split the services.
One of the most common implementations of state is the user session. Just because
an application is stateful does not mean that it must have a user session. The oppo-
site is true also. An application or service that implements a session may do so in a
stateless manner; consider the stateless session beans in Enterprise JavaBeans. A user
session is an established communication between the client, typically the user’s
browser, and the server that gets maintained during the life of the session for that
user. There are lots of things that developers store in user sessions, perhaps the most
common being the fact that the user is logged in and has certain privileges. This obvi-
ously is important unless you want to continue validating the user’s authentication at
each page request. Other items typically stored in session include account attributes such as preferences for first seen reports, layout, or default settings. Again, having
these retrieved once from the database and then kept with the user during the session
can be the most economical thing to do.
As we laid out in the previous paragraph, there are lots of things that you may
want to store in a user’s session, but storing this information can be problematic in
terms of increased complexity for scaling. It makes great sense to not have to con-
stantly communicate with the database to retrieve a user’s preferences as they bounce
around your site, but this improved performance makes it difficult when there is a
pool of servers handling user requests. Another complexity of keeping session is that
if you are not careful the amount of information stored there will become unwieldy.
Although not common, sometimes an individual user’s session data reaches or
exceeds hundreds of kilobytes. Of course, this is excessive, but we’ve seen clients fail
to manage their session data and the result is a Frankenstein’s monster in terms of
both size and complexity. Every engineer wants his information to be quickly and
easily available, so he sticks his data in the session. Once you have stepped back and looked at the size and the obvious problems of keeping all these user sessions in memory or transmitting them back and forth between the user’s browser and the server, it becomes clear that this situation needs to be remedied quickly.
If you have managed to keep the user sessions to a reasonable size, what methods
are available for saving state or keeping sessions in environments with multiple serv-
ers? There are three basic approaches: avoid, centralize, and decentralize. Similar to
our approach with caching, the best way to solve a user session scaling issue is to
avoid having the issue. You can achieve this by either removing session data from
your application or making it stateless. The other way to achieve avoidance is to
make sure each user is only placed on a single server. This way, the session data can
remain in memory on the server because that user will always come back to that
server for requests; other users will go to other servers in the pool. You can accom-
plish this manually in the code by performing a Z-axis split (modulus or lookup), putting all users with usernames A through M on one server and all users with usernames
N through Z on another server. If DNS pushes a user with username jackal to the second server, it just redirects her to the first server to process her request. Another solu-
tion to this is to use session cookies on the load balancer. These cookies assign all
users to a particular server for the duration of the session. This way, every request
that comes through from a particular user will land on the same server. Almost all
load balancer solutions offer some sort of session cookie that provides this function-
ality. There are several solutions for avoiding the problem altogether.
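A minimal sketch of the modulus-style assignment described above; it hashes the username rather than splitting alphabetically, but the effect, every request from a given user landing on the same server, is the same (the server names are hypothetical):

public class SessionAffinity {
    private static final String[] SERVERS = { "app1.akfpartners.com", "app2.akfpartners.com" };

    // Deterministically map a username to one server so that the user's
    // session data can stay in that server's local memory.
    public static String serverFor(String username) {
        int bucket = Math.floorMod(username.toLowerCase().hashCode(), SERVERS.length);
        return SERVERS[bucket];
    }
}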
Let’s assume that for some reason none of these solutions work. The next method
of solving the complexities of keeping session on a myriad of servers when scaling is
decentralization of session storage. The way that this can be accomplished is by stor-
ing session in a cookie on the user’s browser. There are many implementations of this,
such as serializing the session data and then storing all of it in a cookie. This session
data must be transferred back and forth, marshalled/unmarshalled, and manipulated
by the application, which can add up to a lot of time. Remember that
marshalling and unmarshalling are processes where the object is transformed into a
data format suitable for transmitting or storing and converted back again. Another
twist to this is to store a very small amount of information in the session cookie and
use it as a reference index to a list of objects in a session database or file that contain
all the session information about each user. This way, the transmission and marshal-
ling costs are minimized.
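A minimal sketch of the lighter-weight variant just described: only a small identifier travels in the cookie and serves as the index into a server-side store holding the full session object. The SessionStore interface is a hypothetical stand-in for that session database or file store:

import java.util.UUID;
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServletResponse;

public class SessionCookies {
    // Hypothetical server-side store (database, file system, or cache) that
    // holds the full session object, keyed by a short identifier.
    interface SessionStore {
        void save(String sessionId, Object sessionData);
    }

    public static String createSession(HttpServletResponse resp,
                                        SessionStore store,
                                        Object sessionData) {
        String sessionId = UUID.randomUUID().toString();
        store.save(sessionId, sessionData);                   // full data stays server side

        Cookie cookie = new Cookie("SESSIONID", sessionId);   // only the index travels
        cookie.setHttpOnly(true);
        resp.addCookie(cookie);
        return sessionId;
    }
}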
The third method of solving the session problem with scaling systems is centraliza-
tion. This is where all user session data is stored centrally in a cache system and all
Web or app servers can access this data. This way, if a user lands on Web server 1 for
the login and then on Web server 3 for a report, both servers can access the central
cache and see that the user is logged in and what that user’s preferences are. A cen-
tralized cache system such as memcached that we discussed in Chapter 25, Caching
for Performance and Scale, would work well in this situation for storing user session
data. Some systems have success using session databases, but the overhead of connec-
tions and queries seems too much when there are other solutions, such as caches, for roughly the same cost in hardware and software. The issue to watch for with session
caching is that the cache hit ratio needs to be very high or the user experience will be
awful. If the cache expires a session because it doesn’t have enough room to keep all
the user sessions, the user who gets kicked out of cache will have to log back in. As
you can imagine, if this is happening 25% of the time, it is going to be extremely
annoying.
Three Solutions to Scaling with Sessions
There are three basic approaches to solving the complexities of scaling an application that
uses session data: avoidance, decentralization, and centralization.
• Avoidance
Remove session data completely
Modulus users to a particular server via the code
Stick users on a particular server per session with session cookies from the load balancer
• Decentralization
Store session cookies with all information in the browser’s cookie.
Store session cookies as an index to session objects in a database or file system with all
the information stored there.
• Centralization
Store sessions in a centralized session cache like memcached.
Databases can be used as well but are not recommended.
There are many creative methods of solving the session complexities when scaling applica-
tions. Depending on the specific needs and parameters of your application, one or more of
these might work better for you than others.
Whether you decide to design your application to be stateful or stateless and
whether you use session data or not are decisions that must be made on an applica-
tion by application basis. In general, it is easier to scale applications that are stateless
and do not care about sessions. Although this may aid in scaling, it may be unrealistic given the complexities that it causes for the application development. When you do require the use of state—in particular, session state—consider how you are going to
scale your application in all three axes of the AKF Scale Cube before you need to do
so. Scrambling to figure out the easiest or quickest way to fix a session issue across
multiple servers might lead to poor long-term decisions. These on-the-spot architec-
tural decisions should be avoided as much as possible.
Conclusion
In this last chapter of Part III, we dealt with synchronous versus asynchronous calls.
This topic is often overlooked when developing services or products until it becomes
a noticeable inhibitor to scaling. We started our discussion exploring synchroniza-
tion. The process of synchronization refers to the use and coordination of simulta-
neously executed threads or processes that are part of an overall task. We defined
synchronization as the situation when two or more pieces of work must be done in a specific order to
accomplish a task. One example of synchronization that we covered was a mutex or
mutual exclusion. Mutex is a method of protecting global resources from concur-
rently running processes, often accomplished through the use of semaphores.
After we covered synchronization, we tackled the topics of synchronous and asyn-
chronous calls. We discussed synchronous methods as ones that, when they are called,
execute, and when they finish, the calling method gets control back. This was con-
trasted with the asynchronous methods calls where the method is called to execute in
a new thread and it immediately returns control back to the thread that called it. The
design pattern that describes the asynchronous method call is known as the asynchro-
nous method invocation (AMI). With the general definitions under our belt, we con-
tinued with an analysis of why synchronous calls can become problematic for scaling.
We gave some examples of how an unsuspecting synchronous call can actually cause
severe problems across the entire system. Although we did not encourage the com-
plete elimination of synchronous calls, we did express the recommendation that you
thoroughly understand how to convert synchronous calls to asynchronous ones.
Additionally, we discussed why it is important to have individuals like architects and managers overseeing the entire system design to help point out to engineers when
asynchronous calls could be warranted.
Another topic that we covered in this chapter was the use of state in an applica-
tion. We started with what state is within application development. We then dove
into a discussion in computational theory on finite state machines and concluded
with a distinction between imperative and declarative languages. We finished the
stateful versus stateless conversation with one of the most commonly used implementations of state: the session state. A session, as we defined it, is an established communication between the client, typically the user’s browser, and the server that gets maintained during the life of the session for that user. We noted that keeping
track of session data can become laborious and complex, especially when dealing
with scaling an application on any of the axes from the AKF Scale Cube. We covered
three broad classes of solutions—avoidance, centralization, and decentralization—
and gave specific examples and alternatives for each.
The overall lesson that this chapter should impart to the reader is that there are reasons that we see engineers use synchronous calls and write stateful applications, some carefully considered and others simply a byproduct of modern computational theory and languages. The important point is that you should spend the time up front discussing these choices so that they are made deliberately, rather than finding yourself needing to scale an application and discovering that its design prevents you from doing so.
Key Points
• Synchronization is when two or more pieces of work must be done in a specific order to accomplish a task.
• Mutex is a synchronization method that defines how global resources are pro-
tected from concurrently running processes.
• Synchronous calls perform their action completely by the time the call returns.
• With an asynchronous method call, the method is called to execute in a new
thread and it immediately returns control back to the thread that called it.
• The design pattern that describes the asynchronous method call is known as the
asynchronous design and alternatively as the asynchronous method invocation
(AMI).
• Synchronous calls can, if used excessively or incorrectly, cause undue burden on
the system and prevent it from scaling.
• Synchronous calls are simpler than asynchronous calls.
• The second part of the problem of synchronous calls is that developers typically
only see a small portion of the application.
• An application that uses state is called stateful and it relies on the current state
of execution as a determinant of the next action to be performed.
• An application or protocol that doesn’t use state is referred to as stateless.
• Hyper Text Transfer Protocol (HTTP) is a stateless protocol because it doesn’t
need any information about the previous request to know everything necessary
to fulfill the next request.
• A state machine is an abstract model of states and actions that is used to model
behavior; these can be implemented in the real world in either hardware or
software.
• The reason that most applications rely on state is that the languages used for
Web based or SaaS development are almost all imperative based.
• Imperative programming is the use of statements to describe how to change the
state of a program.
• Declarative programming is the opposite and uses statements to describe what
changes need to take place.
• One of the most common implementations of state is the user session.
• Choosing wisely between synchronous/asynchronous as well as stateful/stateless
is critical for scalable applications.
• Have discussions and make decisions early, when standards, practices, and prin-
ciples can be followed.
Part IV
Solving Other Issues and Challenges
Chapter 27
Too Much Data
The skillful soldier does not raise a second levy, nor are his supply wagons loaded more than once.
—Sun Tzu
Hyper growth, or even slow steady growth over time, presents some unique scalabil-
ity problems with data retention and storage. We might log information relevant at
the time of a transaction, insert information relevant to a purchase, or keep track of
user account changes. We may log all customer contacts or allow users to store data
ranging from pictures to videos. The size of this data, as we will discuss later, has significant cost implications for our business and can negatively affect our ability to scale, or at least scale cost effectively.
Time also affects the value of our data in most systems. Although not universally
true, in many systems, the value of data decreases over time. Old customer contact
information, although potentially valuable, probably isn’t as valuable as the most
recent contact information. Old photos and videos aren’t likely accessed as often and
old log messages that we’ve made probably aren’t as relevant to us today. So as our
costs increase with all of the additional data being stored, the value per unit of data stored decreases, presenting unique challenges for most businesses.
The size of data alone can present issues for your business. Assuming that not all
elements of the data are valuable to all requests or actions against that data, we need
to find ways to process and store this data quickly and cost effectively.

This chapter is all about data size or the amount of data that you store. How do
we handle it, process it, and keep our business from being overly burdened by it?
What data do we get rid of and how do we store data in a tiered fashion that allows
all data to be accretive to shareholder value?
The Cost of Data
Data is costly. Your first response to this might be that the costs of mass storage
devices have decreased steadily over time and with the introduction of cloud storage
services, storage has become “nearly free.” But free and nearly free obviously aren’t
the same thing, as a whole lot of something that is nearly free actually turns out to be
quite expensive. As the price of storage decreases over time, we tend to care less
about how much we use and as a result our usage typically increases significantly.
Prices might drop by 50% and rather than passing that 50% reduction in price off to
shareholders as a reduction in our cost of operations, we may very likely allow the
size of our storage to double because it is “cheap.”
But the initial cost of this storage is not the only cost you incur with every piece of
data you store on it. The more storage you have, the more storage management you
need. This might be the overhead of systems administrators to handle the data, or
capacity planners to plan for the growth, or maybe even software licenses that allow
you to “virtualize” your storage environment and manage it more easily. As your
storage grows, so does the complexity of managing that storage.
Furthermore, as your storage increases, the power and space costs of handling that
storage increase as well. You might argue here that the advent of Massive Array of
Idle Disks (MAID) has offset those costs, or maybe you are thinking of even less
costly solutions such as cloud storage services. We applaud you if you have put your
infrequently accessed data on such a storage infrastructure. But the fact of the matter
is that if you run one massive array, it will cost you less than 10 massive arrays, and
less storage in the cloud will cost you less than more storage in the cloud. In the case
of MAID solutions, those disks spin from time to time, and they take power just to

ensure that they are “functioning.” Furthermore, you either paid for the power distri-
bution units (power sockets) into which they are plugged or you pay a monthly or
annual fee in the case of a collocation provider to have the plug and power available.
Finally, you either paid to build an infrastructure capable of some maximum power
utilization likely driven by a percentage of those drives being active or you pay some-
one else (again in the case of collocation) to handle that for you. And of course, if
you aren’t using MAID drives, the cost of your power to run systems that are always
spinning is even higher. If you are using cloud services, you still need the staff and
processes to understand where that storage is located and to ensure that you can
properly access it.
And that’s not it! If this data resides in a database upon which you are performing
transactions for end users, the cost of each query of that data increases with the size of the data being queried. We’re not talking about the cost of the physical storage at this point,
but rather the time to complete the query. Although it’s true that if you are querying
upon a properly balanced index that the time to query that data is not linear (it is

×