Performance Testing and Tuning an Application
o Can we cope with the increased implementation complexity required to support caching? This will
be mitigated if we use a good, generic cache implementation, but we must be aware that read-write
caching introduces significant threading issues.
o Is the volume of data we need to cache manageable? Clearly, if the data set we need to cache contains
millions of entities, and we can't predict which ones users will want, a cache will just waste memory.
Databases are very good at plucking small numbers of records from a large range, and our cache
isn't likely to do a better job.
o Will our cache work in a cluster? This usually isn't an issue for reference data: it's not a problem if
each server has its own copy of read-only data, but maintaining integrity of cached read-write data
across a cluster is hard. If replication between caches looks necessary, it's pretty obvious that we
shouldn't be implementing such infrastructure as part of our application, but looking for support in
our application server or a third-party product.
o Can the cache reasonably satisfy the kind of queries clients will make against the data? Otherwise we
might find ourselves trying to reinvent a database. In some situations, the need for querying might be
satisfied more easily by an XML document than cached Java objects.
o Are we sure that our application server cannot meet our caching requirements? For example, if we
know that it offers an efficient entity bean cache, caching data on the client may be unnecessary.
One decisive issue here will be how far (in terms of network distance) the client is from the EJB tier.
The Pareto Principle (the 80/20 rule) is applicable to caching. Most of the performance gain can often be
achieved with a small proportion of the effort involved in tackling the more difficult caching issues.
Data caching can radically improve the performance of J2EE applications. However,
caching can add much complexity and is a common cause of bugs. The difficulty of
implementing different caching solutions varies greatly. Jump at any quick wins, such as
caching read-only data (a simple sketch follows). This adds minimal complexity, and can produce a good
performance improvement. Think much more carefully about any alternatives when caching is a harder
problem - for example, when it concerns read-write data.
Don't rush to implement caching on the assumption that it will be required; base caching
policy on performance analysis.
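As an example of such a quick win, here is a minimal sketch of a read-only reference-data cache loaded once at startup. The GenreDao and Genre types are assumptions for illustration, not part of the sample application:

import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Minimal read-only cache: reference data is loaded once at startup and
// never modified, so no synchronization is needed for lookups.
public class GenreCache {

    private final Map genresById;

    public GenreCache(GenreDao genreDao) {
        Map map = new HashMap();
        for (Iterator it = genreDao.loadAllGenres().iterator(); it.hasNext(); ) {
            Genre genre = (Genre) it.next();
            map.put(new Integer(genre.getId()), genre);
        }
        // Wrap the map once to guard against accidental modification after startup
        this.genresById = Collections.unmodifiableMap(map);
    }

    public Genre getGenre(int id) {
        return (Genre) genresById.get(new Integer(id));
    }
}

Because the map is never modified after construction, concurrent reads are safe without any locking.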
A good application design, with a clean relationship between architectural tiers, will usually make it easy to add
any caching required. In particular, interface-based design facilitates caching: we can easily replace any
implementation of an interface with a caching implementation, if business requirements are satisfied. We'll look
at an example of a simple cache shortly.
Where to Cache
As using J2EE naturally produces a layered architecture, there are multiple locations where caching may occur.
Some of these types of caching are implemented by the J2EE server or underlying database, and are accessible
to the developer via configuration, not code. Other forms of caching must be implemented by developers, and
can absorb a large part of total development effort.
Let's look at choices for cache locations, beginning from the backend:

Generally, the closer to the client we can cache, the bigger the performance improvement, especially in distributed
applications. The flip side is that the closer to the client we cache, the narrower the range of scenarios that benefit
from the cache. For example, if we cache the whole of an application's dynamically generated pages, response time
on these pages will be extremely fast (of course, this particular optimization only works for pages that don't contain
user-specific information). However, this is a "dumb" form of caching - the cache may have an obvious key for the
data (probably the requested URL), but it can't understand the data it is storing, because it is mixed with
presentation markup. Such a cache would be of no use to a Swing client, even if the data in the varying fragments
of the cached pages were relevant to a Swing client.
J2EE standard infrastructure is really geared only to support the caching of data in entity
EJBs. This option isn't available unless we choose to use entity EJBs (and there are many
reasons why we might not). It's also of limited value in distributed applications, as they face as
much of a problem in moving data from EJB container to remote client as in moving data
from database to EJB container.
Thus we often need to implement our own caching solution, or resort to another third-party caching solution. I
recommend the following guidelines for caching:
o Avoid caching unless it involves reference data (in which case it's simple to implement) or
unless performance clearly requires it. In general, distributed applications are much more
likely to need to implement data caching than collocated applications.
o As read/write caches involve complex concurrency issues, use third-party libraries (discussed below)
to conceal the complexity of the necessary synchronization. Use the simplest approach to ensuring
integrity under concurrent access that delivers satisfactory performance.
o Consider the implications of multiple caches working together. Would it result in users seeing data that
is staler than any one of the caches might tolerate? Or does one cache eliminate the need for another?
Third-party Caching Products for Use in J2EE Applications
Let's look at some third-party commercial caching products that can be used in J2EE applications. The main
reasons we might spend money on a commercial solution are to achieve reliable replicated caching
functionality, and to avoid the need to implement and maintain complex caching functionality in-house.
Coherence, from Tangosol, is a replicated caching solution, which claims even to support clusters including
geographically dispersed servers. Coherence integrates with most leading application servers, including JBoss.
Coherence caches are basically alternatives to standard Java map implementations, such as java.util.HashMap,
so using them merely requires Coherence-specific implementations of Java core interfaces.
SpiritCache, from SpiritSoft, is also a replicated caching solution, and claims to provide a "universal caching
framework for the Java platform". The SpiritCache API is based on the proposed JCache standard API (JSR-107).
JCache, proposed by Oracle, defines a standard API for caching and retrieving objects, including an
event-based system allowing application code to register for notification of cache events.
Commercial caching products are likely to prove a very good investment for applications with sophisticated
caching requirements, such as the need for caching across a cluster of servers. Developing and maintaining
complex caching solutions in-house can prove very expensive. However, even if we use third-party products,
running a clustered cache will significantly complicate application deployment, as the caching product - in
addition to the J2EE application server - will need to be configured appropriately for our clustered environment.
Code Optimization
Since design largely determines performance, code optimization is seldom worth the effort in J2EE applications unless
application code is particularly badly written or the optimization is targeted at known problem areas. However, all
professional developers should be familiar with performance issues at code level, to avoid making basic errors. For a
discussion of Java performance in general, I recommend Java Performance Tuning by Jack Shirazi from O'Reilly
(ISBN: 0-596-00015-4) and Java 2 Performance and Idiom Guide from Prentice Hall (ISBN: 0-13-014260-3). There are
also many good online resources on performance tuning. Shirazi maintains a performance tuning web site that
contains an exhaustive directory of code tuning tips from many sources.
Avoid code optimizations that reduce maintainability unless there is an overriding performance imperative. Such
"optimizations" are not just a one-off effort, but are likely to prove an ongoing cost and cause of bugs.
The higher-level the coding issue, the bigger the potential performance gain by code optimization. Thus there often is potential to achieve
good results by techniques such as reordering the steps of an algorithm, so that expensive tasks are executed only if absolutely essential.
As with design, an ounce of prevention is worth a pound of cure. While obsession with performance is counter-productive, good
programmers don't write grossly inefficient code that will later need optimization. Sometimes, however, it does make sense to try a
simple algorithm first, and change the implementation to use a faster but more complex algorithm only if it proves necessary.
Really low-level techniques such as loop unrolling are unlikely to bring any benefit to J2EE systems. Any optimization should be targeted,
and based on the results of profiling. When looking at profiler output, concentrate on the slowest five methods; effort directed elsewhere
will probably be wasted.
The following table lists some potential code optimizations (worthwhile and counter-productive), to illustrate some of the tradeoffs
between performance and maintainability to be considered:


As an example of this, consider logging in our sample application. The following seemingly innocent statement
in our TicketController web controller, performed only once, accounts for a surprisingly high 5% of total
execution time if a user requests information about a reservation already held in their session:
logger.fine("Reservation request is [" + reservationRequest + "]");
The problem is not the logging statement itself, but the cost of performing a string concatenation (compiled
into StringBuffer operations) and invoking the toString() method on the ReservationRequest
object, which performs several further string operations. Adding a check as to whether the log message will
ever be displayed, to avoid creating it if it won't be, will all but eliminate this cost in production, as any good
logging package provides highly efficient querying of log configuration:
if (logger.isLoggable(Level.FINE)) {
    logger.fine("Reservation request is [" + reservationRequest + "]");
}
Of course a 5% performance saving is no big deal in most cases, but such careless use of logging can be much
more costly in frequently invoked methods. Such conditional logging is essential in heavily used code.
Generating log output usually has a minor impact on performance. However, building log
messages unnecessarily, especially if it involves unnecessary toString() invocations, can be
surprisingly expensive.
Two particularly tricky issues are synchronization and reflection. These are potentially important, because they
sit midway between design and implementation. Let's take a closer look at each in turn.
Correct use of synchronization is an issue of both design and coding. Excessive synchronization throttles
performance and has the potential to deadlock. Insufficient synchronization can cause state corruption.
Synchronization issues often arise when implementing caching. The essential reference on Java threading is
Concurrent Programming in Java: Design Principles and Patterns, by Doug Lea, from Addison-Wesley
(ISBN: 0-201-31009-0). I strongly recommend referring to this book when implementing any complex
multi-threaded code. However, the following tips may be useful:
o Don't assume that synchronization will always prove disastrous for performance. Base decisions
empirically. Especially if operations executed under synchronization execute quickly,
synchronization may ensure data integrity with minimal impact on performance. We'll look at a
practical example of the issues relating to synchronization later in this chapter.
o Use automatic variables instead of instance variables where possible, so that synchronization is
not necessary (this advice is particularly relevant to web-tier controllers).
o Use the least synchronization consistent with preserving state integrity.
o Synchronize the smallest possible sections of code.
o Remember that object references, like ints (but not longs and doubles), are atomic (read or
written in a single operation), so their state cannot be corrupted. Hence a race condition in which
two threads initialize the same object in succession (as when putting an object into a cache) may
do no harm, so long as it's not an error for initialization to occur more than once, and may be
acceptable in pursuit of reduced synchronization.
o Use lock splitting to minimize the performance impact of synchronization. Lock splitting is a
technique to increase the granularity of synchronization locks, so that each synchronized block
locks out only threads interested in the object being updated (a minimal sketch follows this list). If
possible, use a standard package such as Doug Lea's util.concurrent to avoid the need to implement
well-known synchronization techniques such as lock splitting. Remember that using EJB to take care of
concurrency issues isn't the only alternative to writing your own low-level multi-threaded code:
util.concurrent is an open source package that can be used anywhere in a Java application.
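Purely as an illustration of lock splitting (the class and its counters below are hypothetical, not taken from the sample application), each lock object guards only the state paired with it, so threads updating unrelated state never block one another:

public class HitStatistics {

    // Separate lock objects ("lock splitting"): updates to one counter
    // never block threads updating the other.
    private final Object pageViewLock = new Object();
    private final Object errorLock = new Object();

    private int pageViews;
    private int errors;

    public void recordPageView() {
        synchronized (pageViewLock) {
            pageViews++;
        }
    }

    public void recordError() {
        synchronized (errorLock) {
            errors++;
        }
    }

    public int getPageViews() {
        synchronized (pageViewLock) {
            return pageViews;
        }
    }

    public int getErrors() {
        synchronized (errorLock) {
            return errors;
        }
    }
}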
Reflection has a reputation for being slow. Reflection is central to much J2EE functionality and a powerful tool
in writing generic Java code, so it's worth taking a close look at the performance issues involved. It reveals that
most of the fear surrounding the performance of reflection is unwarranted.
To illustrate this, I ran a simple test to time four basic reflection operations:
o Loading a class by name with the Class.forName(String) method. The cost of invoking this
method depends on whether the requested class has already been loaded. Any operation - using
reflection or not - will be much slower if it requires a class to be loaded for the first time.
o Instantiating a loaded class by invoking the Class.newInstance() method, using the class's
no-argument constructor.
o Introspection: finding a class's methods using Class.getMethods().
o Method invocation using Method.invoke(), once a reference to a method has been cached.
The source code for the test can be found in the sample application download, under the path
/framework/test/reflection/Tests.java.
The following method was invoked via reflection:

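The method itself is trivial. Purely as an illustration - the doWork() method and timing harness below are assumptions, not the actual test listing - the comparison can be sketched as follows:

import java.lang.reflect.Method;

// Illustrative timing sketch: compares 10,000 reflective invocations with
// 10,000 direct invocations of the same trivial method.
public class ReflectionTiming {

    // Hypothetical trivial target method
    public int doWork(String s) {
        return s.length();
    }

    public static void main(String[] args) throws Exception {
        ReflectionTiming target = new ReflectionTiming();

        // Introspection is performed once and the Method reference cached
        Method m = ReflectionTiming.class.getMethod("doWork", new Class[] { String.class });
        Object[] invokeArgs = new Object[] { "hello" };

        long start = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            m.invoke(target, invokeArgs);
        }
        System.out.println("Reflective: " + (System.currentTimeMillis() - start) + "ms");

        start = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            target.doWork("hello");
        }
        System.out.println("Direct: " + (System.currentTimeMillis() - start) + "ms");
    }
}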
The most important results, running these tests concurrently on a 1 GHz Pentium III under JDK
1.3.1_02, were:
o 10,000 invocations of this method via Method.invoke() took 480ms.
o 10,000 invocations of this method directly took 301ms (less than twice as fast as via reflection).
o 10,000 creations, via Class.newInstance(), of an object with two superclasses and a fairly large amount
of instance data took 21,371ms.
o 10,000 creations of objects of the same class using the new operator took 21,280ms. This means
that whether reflection or the new operator is used has virtually no effect on the cost of creating a
large object.
My conclusions, from this and tests I have run in the past, and from experience of developing real applications,
are that:
o Invoking a method using reflection is very fast once a reference to the Method object is available.
When using reflection, try to cache the results of introspection if possible. Remember that a method
can be invoked on any object of the declaring class. If the method does any work at all, the cost of this
work is likely to outweigh the cost of reflective invocation.
o The cost of instantiating any but trivial objects dwarfs the cost of invoking the newInstance()
method on the relevant class. When a class has several instance variables and superclasses with
instance data, the cost of object creation is hundreds of times greater than that of initiating
that object creation through reflection.
o Reflective operations are so fast that virtually any amount of reflection done once per web
request will have no perceptible effect on performance.
o Many slow operations, such as string operations, are slower than invoking methods using reflection.
o Reflective operations are generally faster - and some dramatically faster - in JDK 1.3.1 and
JDK 1.4 than in JDK 1.3.0 and earlier JDKs. Sun have realized the importance of reflection, and
have put much effort into improving the performance of reflection with each new JVM.
The assumption among many Java developers that "reflection is slow" is misguided, and
becoming increasingly anachronistic with maturing JVMs. Avoiding reflection is pointless
except in unusual circumstances - for example, in a deeply nested loop. Appropriate use of
reflection has many benefits, and its performance overhead is nowhere near sufficient to
justify avoiding it. Of course application code will normally use reflection only via an
abstraction provided by infrastructure code.
Case Study: The "Display Show" Page in the Sample Application
All benchmarks in the following test were run on a 1 GHz Pentium III with 512 MB of RAM under
Windows XP. The Microsoft Web Application Stress Tool, application server, and database were all running on
the same machine. The software versions were JBoss 3.0.0, Oracle 8.1.7, and Java JDK 1.3.1_02. Logging
was switched to production level (errors and warnings only).
Let's now look at a case study of addressing the performance requirements of one use case in the sample
application. Let's consider requests for the "Display Show" page. This displays information about all bookable
performances of a particular show. The "Welcome" page links directly to this page, so most users will arrive
here on their second page view, although they may be interested in different shows. Thus it's vital that this page
can cope with heavy user activity, that it renders quickly and that generating it doesn't load the system too
heavily.
Some of the information displayed on this page is rarely changing reference data: for example, the name of the
show and the pricing structure. Other information changes frequently: for example, we must display the
availability of each seat type for every performance (with 10 performances of a show displayed and 4 classes of
seats for each, this would mean 40 availability checks). Business requirements state that caching may be
acceptable if required to deliver adequate performance, but that the availability information must be no more
than 30 seconds old. The following screenshot illustrates this page:

We begin by running load tests without any caching or other optimizations in application code to see whether
there is a problem. The Microsoft Web Application Stress Tool reveals that with 100 concurrent users, this page
can take 14 hits per second, with an average response time of just over 6 seconds. The load test showed JBoss
using 80% of CPU and Oracle almost 20% (it's important to use your operating system's load monitoring tools
during load testing).
Although this exceeds our modest performance targets for concurrent access, it does not meet requirements for
response time. Throughput and performance could deteriorate sharply if we had to display more than 3
performances of a show (our test data), or if Oracle was on a remote server, as would be the case in production.
Of course we would test the effect of these scenarios in a real application, but I have limited hardware and time
at my disposal while writing this book. Thus we must implement design and code changes necessary to improve
the performance of generating this page.
It's pretty clear from the Task Manager display that the problem is largely in communication with, and work
within, the database. However, before we begin amending our design and changing code, it's a good idea to get
some precise metrics of where the application spends its time. So we profile two requests for this page in JProbe.
The results, ordered by cumulative method time, look as follows:
These results indicate that we have executed 6 SQL queries per page view, shown by the 12 invocations of the
SqlQuery.execute() method, and that these queries accounted for 52% of the total time. Rendering the JSP
accounted for a surprisingly high 26% of execution time. However, it's clear that the database access is the main
limiter on performance. The 13% spent invoking methods reflectively via Method.invoke()
reflects the 12 EJB accesses per page view. Both JBoss and the EJB proxy infrastructure discussed in
Chapter 11 use reflection in EJB invocation. 12 EJB invocations per page is also unacceptably high, due to the
overhead of invoking EJB methods, so we will also want to address this.
As the queries involved are simple selects and don't involve transaction or locking issues, we can rule out
locking in the database or within the application server (we should also check that the database is correctly
configured and the schema efficient; we'll assume this to be the case). Since we can't make simple selects more
efficient, we'll need to implement caching in business objects to minimize the number of calls to the database.
As business requirements allow the data presented on this screen to be as much as 30 seconds out of date, we
have room for maneuvering.
Since the web-tier code in com.wrox.expertj2ee.ticket.web.TicketController is coded to use the
com.wrox.expertj2ee.ticket.command.AvailabilityCheck interface to retrieve availability
information, rather than a concrete implementation, we can easily substitute a different JavaBean
implementation that implements caching.
Interface-driven design is an area in which good design practice leads to maximum freedom
in performance tuning. While there is a tiny overhead in invoking methods through an
interface, rather than on a class, it is irrelevant in comparison with the benefits of being able
to reimplement an interface without affecting callers.
During high-level design, we also considered the possibility of using JMS to fire updates on reservations
and purchases, as an alternative to caching, so that data would be invalidated only when it's known to have
changed. As reservations can time out in the database, without further activity through the web tier, this
would be moderately complex to implement: we'd have to schedule a second JMS message to be sent on the
reservation's expiry, so that any cache could check whether the reservation had expired or had been converted
into a purchase. Further performance investigation will reveal whether this option is necessary.
Let's begin by looking at the present code in the implementation of the AvailabilityCheck interface to
return combined performance and availability information. The highlighted lines use the BoxOfficeEJB,
which will need to perform a database query. This method is invoked several times to build information for
each show. Note that the results of JNDI lookups have already been cached in infrastructure code:
public PerformanceWithAvailability getPerformanceWithAvailability(Performance p)
        throws NoSuchPerformanceException {

    int avail = boxOffice.getFreeSeatCount(p.getId());
    PerformanceWithAvailabilityImpl pai =
        new PerformanceWithAvailabilityImpl(p, avail);

    for (int i = 0; i < p.getPriceBands().size(); i++) {
        PriceBand pb = (PriceBand) p.getPriceBands().get(i);
        avail = boxOffice.getFreeSeatCount(p.getId(), pb.getId());
        PriceBandWithAvailability pba =
            new PriceBandWithAvailabilityImpl(pb, avail);
        pai.addPriceBand(pba);
    }
    return pai;
}

We begin by trying the simplest possible approach: caching performance objects by key in a hash table. As this
is quite simple, it's reasonable to implement it in application code, rather than introduce a third-party caching
solution. Rather than worry about synchronization - potentially the toughest problem in implementing
caches - we use a java.util.Hashtable to hold a cache of PerformanceWithAvailability objects,
keyed by integer performance ID.
Remember that the old, pre-Java 2, collections use synchronization on nearly every method, including put
and get on maps, while the newer collections, such as java.util.HashMap, leave the caller to handle
any synchronization necessary. This means that the newer collections are always a better choice for read-only
data.
There's no need to set a limit on the maximum size of the cache (another problem sometimes encountered
when implementing caches), as there can never be more show and performance objects than we can store in
RAM. Likewise, we don't need to worry about the implications of clustering (another potential caching
problem); business requirements state that data should be no older than 30 seconds, not that it must be
exactly the same on all servers in any cluster.
Since the business requirements state that the seat selection page, generation of which also uses the
AvailabilityCheck interface, always requires up-to-date data, we need to perform a little refactoring to add
a new boolean parameter to the methods of the AvailabilityCheck interface, so that caching can be
disabled if the caller chooses.
Our caching logic will need to be able to check how old a cached PerformanceWithAvailability
object is, so we make the PerformanceWithAvailability interface extend a simple interface,
TimeStamped, which exposes the age of the object:

package com.interface21.core;

public interface TimeStamped {

    long getTimeStamp();
}

As the period for which we cache data is likely to be critical to performance, we expose a "timeout" JavaBean
property on the CachingAvailabilityCheck class, our new caching implementation of the
AvailabilityCheck interface, which uses a Hashtable as its internal cache:

private Map performanceCache = new Hashtable();

private long timeout = 1000L;

public void setTimeout(int secs) {
    this.timeout = 1000L * secs;
}

Now we split getPerformanceWithAvailability() into two methods, separating the acquisition of fresh
data into the reloadPerformanceWithAvailability() method. I've highlighted the condition that
determines whether or not to use any cached copy of the performance data for the requested ID. Note that the
quickest checks - such as whether the timeout bean property is set to 0, meaning that caching is effectively
disabled - are performed first, so that we don't need to evaluate the slowest check, which involves getting the
current system time (a relatively slow operation), unless necessary.
Strictly speaking, the check as to whether the timeout property is 0 is unnecessary, as the timestamp
comparison would work even if it were omitted. However, as this check takes virtually no time, it's far better
to run a redundant check sometimes than ever to perform an unnecessary, expensive one:
public PerformanceWithAvailability getPerformanceWithAvailability(
        Performance p, boolean acceptCached) throws NoSuchPerformanceException {

    Integer key = new Integer(p.getId());
    PerformanceWithAvailability pai =
        (PerformanceWithAvailability) performanceCache.get(key);

    if (pai == null ||
            this.timeout <= 0L ||
            !acceptCached ||
            System.currentTimeMillis() - pai.getTimeStamp() > this.timeout) {
        pai = reloadPerformanceWithAvailability(p);
        this.performanceCache.put(key, pai);
    }

    return pai;
}

private PerformanceWithAvailability reloadPerformanceWithAvailability(Performance p)
        throws NoSuchPerformanceException {

    int avail = boxOffice.getFreeSeatCount(p.getId());
    PerformanceWithAvailabilityImpl pai =
        new PerformanceWithAvailabilityImpl(p, avail);

    for (int i = 0; i < p.getPriceBands().size(); i++) {
        PriceBand pb = (PriceBand) p.getPriceBands().get(i);
        avail = boxOffice.getFreeSeatCount(p.getId(), pb.getId());
        PriceBandWithAvailability pba =
            new PriceBandWithAvailabilityImpl(pb, avail);
        pai.addPriceBand(pba);
    }
    return pai;
}
Since using a synchronized hash table guarantees data integrity, we don't need to perform any synchronization
ourselves. There is a possibility that, at the first highlighted line in the above listing, we will retrieve a null value
from the hash table, but that before we retrieve the data and insert it into the hash table, another thread will have
beaten us to it. However, this won't cause any data integrity problems: the occasional unnecessary database
access is a lesser evil than more complex, bug-prone code. Clever synchronization is sometimes necessary, but
it's best avoided if it doesn't deliver real value.
With these changes, we set the timeout property of the availabilityCheck bean to 20 seconds in the
relevant bean definition in ticket-servlet.xml and rerun the Web Application Stress Tool. The result is a
massive improvement in throughput and performance: 51 pages per second, against the 14 achieved without
caching. The Task Manager indicates that Oracle is now doing virtually nothing. This more than satisfies our
business requirements.
However, the more up-to-date the data the better, so we experiment with reduced timeout settings. A timeout
setting of 10 seconds produces runs averaging 49 pages per second, with an average response time well under 2
seconds, indicating that this may be worthwhile. Reducing the timeout to 1 second reduces throughput to 28
pages per second: probably too great a performance sacrifice.
At this point, I was still concerned about the effect of synchronization. Would a more sophisticated approach
minimize locking and produce even better results? To check this, I wrote a multi-threaded test that enabled me
to test only the CachingAvailabilityCheck class, using the simple load-testing framework in the
com.interface21.load package discussed earlier. The worker thread extended the AbstractTest class,
and simply involved retrieving data from a random show among those loaded when the whole test suite started
up:
public class AvailabilityCheckTest extends AbstractTest {

    private AvailabilityFixture fixture;

    public void setFixture(Object fixture) {
        this.fixture = (AvailabilityFixture) fixture;
    }

    protected void runPass(int i) throws Exception {
        Show s = (Show) fixture.shows.get(randomIndex(fixture.shows.size()));
        fixture.availabilityCheck.getShowWithAvailability(s, true);
    }
}

It's essential that each thread invoke the same AvailabilityCheck object, so we create a "fixture" class
shared by all test instances. This creates and exposes a CachingAvailabilityCheck object. Note that in the
listing below I've exposed a public final instance variable. This isn't usually a good idea, as it's not
JavaBean-friendly and means that we can't add intelligence in a getter method, but it's acceptable in a quick
test case. The AvailabilityFixture class exposes three bean properties that enable tests to be
parameterized: timeout, which directly sets the timeout of the CachingAvailabilityCheck being tested,
and minDelay and maxDelay (discussed below):
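A sketch of the fixture follows. The shared CachingAvailabilityCheck, the public final shows variable, and the timeout, minDelay, and maxDelay properties are as described above; the setBoxOffice() wiring and other constructor details are assumptions for illustration:

public class AvailabilityFixture {

    // Shared by all worker threads: the object under test and the test data.
    // The shows list is populated with test Show objects at suite startup (details omitted).
    public final CachingAvailabilityCheck availabilityCheck = new CachingAvailabilityCheck();
    public final java.util.List shows = new java.util.ArrayList();

    private int minDelay;
    private int maxDelay;

    public AvailabilityFixture() {
        // The dummy BoxOffice listed below replaces the real EJB implementation
        this.availabilityCheck.setBoxOffice(new DummyBoxOffice());
    }

    public void setTimeout(int secs) {
        // Delegates directly to the cache under test
        this.availabilityCheck.setTimeout(secs);
    }

    public void setMinDelay(int minDelay) {
        this.minDelay = minDelay;
    }

    public void setMaxDelay(int maxDelay) {
        this.maxDelay = maxDelay;
    }

    // The DummyBoxOffice inner class follows, as listed below
}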
We're interested in the performance of the caching algorithm, not the underlying database access, so I use a
simple dummy implementation of the BoxOffice interface in an inner class (again, interface-based design proves
handy during testing). This always returns the same data (we're not interested in the values, just in how long it
takes to retrieve them), delaying for a random number of milliseconds between the values of the minDelay and
maxDelay bean properties. Those methods that are irrelevant to the test simply throw an
UnsupportedOperationException. This is better than returning null, as we'll immediately see if these
methods ever do unexpectedly get invoked:

private class DummyBoxOffice implements BoxOffice {

    public Reservation allocateSeats(ReservationRequest request)
            throws NotEnoughSeatsException, NoSuchPerformanceException,
                   InvalidSeatingRequestException {
        throw new UnsupportedOperationException("DummyBoxOffice.allocateSeats");
    }

    public Booking confirmReservation(PurchaseRequest purchase)
            throws ExpiredReservationTakenException, CreditCardAuthorizationException,
                   InvalidSeatingRequestException, BoxOfficeInternalException {
        throw new UnsupportedOperationException("DummyBoxOffice.confirmReservation");
    }

    public int getFreeSeatCount(int performanceId, int seatTypeId)
            throws NoSuchPerformanceException {
        AbstractTest.simulateDelay(minDelay, maxDelay);
        return 10;
    }

    public int getFreeSeatCount(int performanceId)
            throws NoSuchPerformanceException {
        AbstractTest.simulateDelay(minDelay, maxDelay);
        return 30;
    }

    public int getSeatCount(int performanceId)
            throws NoSuchPerformanceException {
        return 200;
    }
}

To use the real EJB implementation of the BoxOffice, we'd need to run the tests in the EJB container or
access the EJB through a remote interface, which would distort the test results. If we weren't using EJB, we
could simply read the XML bean definitions in ticket-servlet.xml in our test suite. The complexity
that any use of EJB adds throughout the software lifecycle should be considered before choosing to use
EJB; in this case using EJB does deliver real value through declarative transaction management, so we can
accept greater complexity in other areas.
We can configure our test suite with the following properties file, which is similar to the example we saw
above:
suite.class=com.interface21.load.BeanFactoryTestSuite
suite.name=Availability check
suite.reportIntervalSeconds=1
suite.longReports=false
suite.doubleFormat=###.#
suite.reportFile=<local path to report file>

The crucial framework test suite definitions are the number of threads to run concurrently, the number of
test passes to be run by each thread, and the maximum pause value per thread:

suite.threads=50
suite.passes=40
suite.maxPause=23
We set the fixture object as a bean property of the framework's generic BeanFactoryTestSuite:
suite.fixture(ref)=fixture

The fixture is also a bean, so we can configure the CachingAvailabilityCheck object's timeout, and the delays
in the JDBC simulation methods, as follows:

fixture.class=com.interface21.load.AvailabilityFixture
fixture.timeout=10
fixture.minDelay=60
fixture.maxDelay=120

Finally, we set the properties of each worker thread:
availabilityTest.class=com.interface21.load.AvailabilityCheckTest
availabilityTest.(singleton)=false

First I set the fixture.timeout property to 0 to disable caching. This produced throughput of about 140 hits per
second, with an average response time of about 360 milliseconds. Setting the timeout to 1 second
produced a dramatic improvement, with about 7,000 hits per second and an average response time of 7
milliseconds. Increasing the timeout further produced an improvement of 20% or less.

No surprises so far. However, I was a little surprised by the results of investigating the effect of synchronization.
I began by replacing the Hashtable with the unsynchronized java.util.HashMap. Unless this produced a
substantial improvement, there was no point in putting more effort into developing smarter synchronization.
The improvement was at most 10-15% at all realistic load levels. Only by trying hundreds of users
simultaneously requesting information about the same show, with an unrealistically slow database response
time and a 1 second timeout - an impossible scenario, as the web interface couldn't deliver this kind of load to
business objects - did the Hashtable synchronization begin to reduce throughput significantly. I also learned
that eliminating the potential race condition noted above through synchronization within the
getPerformanceWithAvailability() method reduced performance by around 40% under moderate to
heavy load, making it unattractive.

With a little thought, it's easy to explain these results. Although there is an inevitable lock management load in
the JVM associated with synchronization, the effect of synchronization on throughput will ultimately depend
on how long it takes to execute the synchronized operations. As hash table get and put operations take very
little time, the effect of synchronization is fairly small (this is quite like the Copy-On-Write approach we
discussed in Chapter 11: synchronization is applied only to updating a reference, not to looking up the new
data).
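The point can be sketched with a minimal copy-on-write cache (illustrative only, not code from the sample application or Chapter 11): readers simply dereference the current map, while writers copy, modify, and then swap the reference.

import java.util.HashMap;
import java.util.Map;

// Minimal copy-on-write cache: a published map is never mutated, so reads
// need no locking; only the brief reference swap is synchronized.
public class CopyOnWriteCache {

    // volatile so that readers promptly see the newly swapped-in map
    private volatile Map current = new HashMap();

    public Object get(Object key) {
        return current.get(key);
    }

    public synchronized void put(Object key, Object value) {
        Map updated = new HashMap(current);
        updated.put(key, value);
        current = updated;   // atomic reference assignment publishes the new map
    }
}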

Thus the simplest approach - the cache shown above, using the synchronized java.util.Hashtable - produced
performance far exceeding the business requirements.

Finally, I ran JProbe again on the same use case, with caching enabled, to see what had changed. Note that this
is a profile of a single request, and so doesn't reflect synchronization costs in concurrent access:


This indicates that 94% of the execution time is now spent rendering the JSP. Only by switching to a more
performant view technology might we appreciably improve performance. Further changes to Java application
code will produce no benefit. Normally, such results - indicating that we've run into a limit of the underlying
J2EE technologies - are very encouraging. However, it would be worth checking the JSP to establish that it's
efficient. In this case, it's trivial, so there's no scope for improvement.
It's time to stop. We've exceeded our performance goals, and further effort will produce no worthwhile return.
This case study indicates the value of an empirically based approach to performance
tuning, and how doing "the simplest thing that could possibly work" can be valuable in
performance tuning. As we had coded the web-tier controller to use a business interface,
not a concrete class, as part of our overall design strategy, it was easy to substitute a
caching implementation.
With an empirical approach using the Web Application Stress tool, we established that, in this case, the simplest
caching strategy - ensuring data integrity through synchronization - performed better than more sophisticated
locking strategies under all conditions except improbably high load. We also established that there was no
problem in ensuring that data displayed was no older than 10 seconds, more than satisfying the business
requirements on freshness of data. Using JProbe, we were able to confirm that the performance of the final
version, with caching in place, was limited by the work of rendering the JSP view, indicating no further scope
for performance improvements.
Of course the simplest approach may not always deliver adequate performance. However, this example shows
that it's wise to expend greater effort reluctantly, and only when it is proven to be necessary.
Performance in Distributed Applications

Distributed applications are much more complex than applications in which all components
run in the same JVM. Performance is among the most important of the many reasons to avoid
adopting a distributed architecture unless it's the only way to satisfy business requirements.
The commonest cause of disappointing performance in J2EE applications is unnecessary use of remote calling -
usually in the form of remote access to EJBs. This typically imposes an overhead far greater than that of any other
operation in a J2EE application. Many developers perceive J2EE to be an inherently distributed model. In fact, this is
a misconception. J2EE merely provides particularly strong support for implementing distributed architectures when
necessary. Just because this choice is available doesn't mean that we should always make it.
In this section, we'll look at why remote calling is so expensive, and how to minimize its effect on
performance when we must implement a distributed application.
The Overhead of Remote Method Invocation (RMI)
Whereas ordinary Java classes make calls by reference to objects in the same virtual machine, calls to EJBs in
distributed applications must be made remotely, using a remote invocation protocol such as IIOP. Clients
cannot directly reference EJB objects and must obtain remote references using JNDI. EJBs and EJB clients may
be located in different virtual machines, or even different physical servers. This indirection sometimes enhances
scalability: because an application server is responsible for managing naming lookups and remote method
invocation, multiple application server instances can cooperate to route traffic within a cluster and offer
failover support. However, the performance cost of remote, rather than local, calling can be hefty if we do not
design our applications appropriately.
EJB's support for remote clients is based on Java RMI. However, any infrastructure for distributed invocation will
have similar overheads.
Java RMI supports two types of objects: remote objects, and serializable objects. Remote objects support
method invocation from remote clients (clients running in different processes), who are given remote
references to them. Remote objects are of classes that implement the java.rmi.Remote interface and all of
whose remote methods are declared to throw java.rmi.RemoteException in addition to any application
exceptions. All EJBs with remote interfaces are remote objects.
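For example, a plain RMI remote interface might look like the following sketch (the interface is illustrative, not part of the sample application):

import java.rmi.Remote;
import java.rmi.RemoteException;

// Every method of a remote object's interface must declare RemoteException,
// because any call may fail due to network or marshaling problems
public interface QuoteService extends Remote {

    String getQuote(String symbol) throws RemoteException;
}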
Serializable objects are essentially data objects that can be used in invocations on remote objects. Serializable
objects must be of classes that implement the java.io.Serializable tag interface and must have serializable
fields (other serializable objects, or primitive types). Serializable objects are passed by value, meaning that both
copies can be changed independently, and that object state must be serialized (converted to a stream
representation) and deserialized (reconstituted from the stream representation) with each method call.
Serializable objects are used for data exchange in distributed J2EE applications, as parameters, return values,
and exceptions in calls to remote objects.
Method invocations on remote objects such as EJB objects or EJB homes always require a network round trip
from client to server and back. Hence remote calling consumes network bandwidth. Unnecessary remote calls
consume bandwidth that should be reserved for operations that do something necessary, such as moving data to
where it's needed.

Each remote call will encounter the overhead of marshaling and unmarshaling serializable parameters: the
process by which the caller converts method parameters into a format that can be sent across the network, and
the receiver reassembles object parameters. Marshaling and unmarshaling has an overhead over and above the
work of serialization and deserialization and the time taken to communicate the bytes across the network. The
overhead depends on the protocol being used, which may be IIOP or an optimized proprietary protocol such as
WebLogic's T3 or Orion's ORMI. J2EE 1.3 application servers must support IIOP, but need not use it by default.
The following diagram illustrates the overhead involved in remote method invocation:







This overhead means that remote calls may be more than 1,000 times slower than local calls, even if there's a fast
LAN connection between the application components involved.
The number of remote calls is a major determinant - potentially the major determinant -of a
distributed application's performance, because the overhead of remote calling is so great.
In the following section, we'll look at how we can minimize the performance impact of remote invocation when
designing distributed applications.

Fortunately, we have many choices as architects and developers. For example:
o We can try to structure our application to minimize the need to move data between architectural tiers
through remote calling. This technique is known as application partitioning.
o We can try to move the data we can't help moving in the minimum number of remote calls.
o We may be able to move individual pieces of data more efficiently.
o We can collocate components in the same virtual machine so that inter-tier calls do not
require remote calling.
o We can cache data from remote resources to minimize the number of remote calls. We've already
considered caching; it will be particularly beneficial in this scenario.
Let's examine these techniques in turn.
Minimizing Remote Calls
The greatest scope for performance gains is in structuring an application so as to minimize the number of
remote calls that will be required.
Application Partitioning
Application partitioning is the task of dividing a distributed application into major architectural tiers and
assigning each component to one tier. In a J2EE web application using EJB, this means assigning each object or
functional component to one of the client browser, the web tier, the EJB tier, or the database. A "functional
component" need not always be a Java object. For example, a stored procedure in a relational database might
be a functional component of an application.
Application partitioning will determine the maximum extent of network round trips required as the application
runs. The actual extent of network round trips may be less in some deployment configurations. A distributed
J2EE application must support different deployment configurations, meaning that the web container and EJB
container may be collocated in the same JVM, which will reduce the number of network round trips in some
deployments.
The main aim of application partitioning is to ensure that each architectural layer has a clearly defined
responsibility. For example, we should ensure that business logic in a distributed J2EE application is in the EJB
tier, so that it can be shared between client types. However, there is also a performance imperative: to ensure that
frequent, time-critical operations can be performed without network round trips. As we've seen from examining
the cost of Java remote method invocations, application partitioning can have a dramatic effect on performance.
Poor application partitioning decisions lead to "chatty" remote calling - the greatest enemy of performance in
distributed applications.
Design and performance considerations with respect to application partitioning tend to be in harmony.
Excessive remote calling complicates an application and is error prone, so it's no more desirable from a
design perspective than a performance perspective. However, application partitioning sometimes does
involve tradeoffs.
Appropriate application partitioning can have a dramatic effect on performance. Hence
it's vital to consider the performance impact of each decision in application partitioning.
The greatest performance benefits will result from minimizing the depth of calling down a distributed J2EE stack
to satisfy incoming requests.
The deeper down a distributed J2EE stack calls need to be made to service a request, the
poorer the resulting performance will be. Especially in the case of common types of
request, we should try to service requests as close as possible to the client. Of course, this
requires a tradeoff: we can easily produce hosts of other problems, such as complex,
bug-prone caching code or stale data, by making this our prime goal.
What techniques can we use to ensure efficient application partitioning?
One of the biggest determinants is where the data we operate on comes from. First, we need to analyze data
flow in the application. Data may flow from the data store in the EIS tier to the user, or from the user down the
application's tiers.
Three strategies are particularly useful for minimizing round trips:
o Moving data to where we operate on it.
o Moving the operation to where the data is. Java RMI enables us to move code as well as data in order
to do this. We can also move some operations inside EIS-tier resources such as databases to
minimize network traffic.

o Collocating components with a strong affinity. Objects with a strong affinity interact with each other
often.
Moving Data to Where We Operate on It
The worst situation is to have data located in one tier while the operations on it are in another. For example,
this arises if the web tier holds a data object and makes many calls to the EJB tier as it processes it. A better
alternative is to move the object to the EJB tier by passing it as a parameter, so that all operations run locally,
with only one remote invocation necessary. The EJB Command pattern, discussed in Chapter 10, is an
example of this approach. The Value Object pattern also moves entire objects in a single remote call.
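A value object is simply a serializable snapshot of the data needed on the other side of the network, allowing one coarse-grained call to replace many fine-grained ones. A minimal sketch, with a hypothetical class and fields, might look like this:

import java.io.Serializable;

// Passed by value in a single remote call; the receiver then reads its
// fields locally, with no further network round trips.
public class UserData implements Serializable {

    private final String email;
    private final String postcode;
    private final String username;

    public UserData(String email, String postcode, String username) {
        this.email = email;
        this.postcode = postcode;
        this.username = username;
    }

    public String getEmail() {
        return email;
    }

    public String getPostcode() {
        return postcode;
    }

    public String getUsername() {
        return username;
    }
}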
Caching, which we have discussed, is a special case of moving data to where we operate on it. In this case, data
is moved in the opposite direction: from EIS tier towards the client.
Moving the Operation to the Data
An example of this strategy is using a single stored procedure running inside a relational database to
implement an operation, instead of performing multiple round trips between the EJB tier and the database to
implement the same logic in Java and SQL. In some cases this will greatly improve performance. The use of
stored procedures is an example of a performance-inspired application partitioning decision that does involve
a tradeoff: it may reduce the portability of the application between databases, and may reduce maintainability.
However, this application partitioning technique is applicable to collocated, as well as distributed, J2EE
applications.
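From the Java side, this might look like the following sketch, in which a single CallableStatement invocation replaces several statements that would otherwise be issued from the EJB tier (the procedure name and parameter are assumptions for illustration):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

public class ReservationCleanup {

    // One round trip: the database iterates over stale reservations itself
    public void expireStaleReservations(Connection con, int maxAgeMinutes)
            throws SQLException {
        CallableStatement cs = con.prepareCall("{call expire_stale_reservations(?)}");
        try {
            cs.setInt(1, maxAgeMinutes);
            cs.execute();
        }
        finally {
            cs.close();
        }
    }
}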
Another example is a possible approach to validating user input. Validation rules are business logic, and
therefore belong naturally in the EJB tier in a distributed application, not the web tier. However, making a
network round trip from the web container to the EJB container to validate input each time a form is submitted
will be wasteful, especially if many problems in the input can be identified without access to back-end
components.
One solution is for the EJB tier to control validation logic, and move validation code to the web tier in a
serializable object implementing an agreed validation interface. The validator object need only be passed
across the network once. As the web tier will already have the class definition of the validator interface, only
the implementing class need be provided by the EJB tier at run time. The validator can then be invoked
locally in the web tier, and remote calls will only be necessary for the minority of validation operations, such
as checking a username is unique, that require access to data. As local calling is so much faster than remote
calling, this strategy is likely to be more performant than calling the EJB tier to perform validation, even if the
EJB tier needs to perform validation again (also locally) to ensure that invalid input can never result in a data
update.
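A sketch of this approach follows. The interface name, implementation class, and error-code conventions are assumptions for illustration, and the hypothetical UserData class from the earlier sketch is reused:

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Agreed validation interface: the web tier compiles against this
interface UserValidator {

    String[] validate(UserData user);
}

// Implementation supplied by the EJB tier at run time. Being Serializable,
// it crosses the network once and is then invoked locally in the web tier.
class DefaultUserValidator implements UserValidator, Serializable {

    public String[] validate(UserData user) {
        List errors = new ArrayList();
        if (user.getEmail() == null || user.getEmail().indexOf('@') == -1) {
            errors.add("error.email.invalid");
        }
        if (user.getPostcode() == null || user.getPostcode().trim().length() == 0) {
            errors.add("error.postcode.required");
        }
        // Username uniqueness still needs a remote call to the EJB tier,
        // as it requires access to persistent data
        return (String[]) errors.toArray(new String[errors.size()]);
    }
}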
Let's look at an illustration of this in practice. Imagine a requirement to validate a user object that contains e-mail
address, password, postcode, and username properties. In a naive implementation, a web-tier controller might
invoke a method on a remote EJB to validate each of these properties in turn, as shown in the following diagram:










This approach will guarantee terrible performance, with an excessive number of expensive remote calls
required to validate each user object.
A much better approach is to move the data to where we operate on it (as described above), using a serializable
value object so that user data can be sent to the EJB server in a single remote call, and the results of validating all
fields returned. This approach is shown in the diagram below:






This will deliver a huge performance improvement, especially if there are many fields to validate. Performing just
one remote method invocation, even if it involves passing more data, will be much faster than performing many
fine-grained remote method invocations.
However, let's assume that the validation of only the username field requires database access (to check that the
submitted username isn't already taken by another user), and that all other validation rules can be applied entirely
on the client. In this case, we can apply the approach described above of moving the validation code to the client
via a validation class obtained from the EJB tier when the application starts up. As the application runs, the
client-side validator instance can validate most fields, such as e-mail address and postcode, without invoking EJBs.
It will need to make only one remote call, to validate the username value, to validate each user object. This
scenario is shown in the diagram overleaf: