
3D Graphics with OpenGL ES and M3G


134 PERFORMANCE AND SCALABILITY CHAPTER 6
of writing boasts VGA true color displays, powered by dedicated GPUs and 600 MHz
multicore ARM11 processors with vector floating-point units. Currently only the expensive
smart phones have dedicated graphics processors, but the situation is changing
rapidly with ever-cheaper GPU designs entering the feature phone market.
Programming standards such as OpenGL ES attempt to unify the variety of devices
by providing a common interface for accessing the underlying graphics architecture:
they act as hardware abstraction layers. This is important, as now the set of available
graphics features is reasonably constant from the programmer’s point of view. Apart
from the API and feature set, these standards unify a third important factor: the underlying
rendering model. Both OpenGL ES and M3G build on desktop
OpenGL by adopting its rendering paradigms as well as its well-specified and documented
pipeline. So, even though a programmer can expect more or less the same feature
set on a low-end and a high-end device, use the same APIs to program both, and
have some expectations about the rendering quality, one thing cannot be guaranteed:
performance.
6.1 SCALABILITY
When building a scalable 3D application, two major factors need to be taken into account.
First of all, the application should have maximum graphics performance; no major bottle-
necks or loss of performance should exist. This is extremely important as the lowest-end
mobile phones being targeted have very limited capabilities. The second thing to con-
sider is identifying all aspects of the rendering process that can be scaled. Scaling in this
context means that once an application runs adequately on the lowest-end device being
targeted, the application can be made more interesting on devices that have better render-
ing performance by adding geometric detail, using higher-quality textures, more complex
special effects, better screen resolution, more accurate physics, more complex game logic,
and so forth. In other words, you should always scale applications upward by adding eye
candy, because the opposite, that is, downscaling a complex application, is much more
difficult to accomplish.
3D content is reasonably easy to scale using either automated or manually controlled
offline tools. For example, most modeling packages support automatic generation of
low-polygon-count models. This allows exporting the same scene using different triangle
budgets. Methods such as texture-based illumination, detail textures, and bump mapping
make it possible to use fewer triangles to express complex shapes; these were covered
earlier in Section 3.4.3. Texture maps are highly scalable, and creating smaller textures
is a trivial operation supported by all image-editing programs. The use of compressed
texture formats [BAC96, Fen03, SAM05] reduces the memory requirements even further.
Figure 6.1 illustrates how few triangles are needed for creating a compelling 3D game.
Figure 6.1: Low-polygon models from a golf game by Digital Chocolate.
6.1.1 SPECIAL EFFECTS
Most game applications contain highly scalable visual elements that do not have any
impact on the game play. Bullet holes on walls, skid marks left by a race
car, and drifting clouds in the sky are typical examples of eye candy that could be reduced
or dropped altogether without altering the fundamentals of the game. Whether a special
effect is a game play element depends on the context. As an example, fog is often used to
mask the popping rendering artifacts caused by geometric level-of-detail optimizations
and culling of distant objects. It is also a visual effect that makes scenes moodier and more
atmospheric. On the other hand, fog may make enemies more difficult to spot in a shooter
game; removing the fog would clearly affect the game play. Ensuring that the game play
is not disturbed is especially important in multiplayer games, as players should not
suffer unfair disadvantages due to scaling of special effects.
If you want to expose performance controls to the user, special effects are one of the prime
candidates for this. Most users can understand the difference between rendering bullet
holes and not rendering them, whereas having to make a choice between bilinear and
trilinear filtering is not for the uninitiated.
One family of effects that can be made fully scalable are particle systems such as explosions,
water effects, flying leaves, or fire, as shown in Figure 6.2. The number of particles, the
complexity of the particle simulation, and the associated visuals can all be scaled based
on the graphics capabilities of the device. Furthermore, one can allocate a shared budget
for all particle systems: this ensures that the load on the graphics system is controlled
dynamically, and that the maximum load can be bounded. A similar approach is often
used for sounds, e.g., during an intense firefight the more subtle sound effects are skipped,
as they would get drowned out by the gunshots anyway.
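As a minimal sketch of the shared-budget idea, the following C function divides one global particle budget among several particle systems in proportion to how many particles each one requests. The names `ParticleSystem` and `distribute_particle_budget`, and the proportional policy itself, are illustrative assumptions; the text does not prescribe a particular allocation scheme.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-system record: what the effect would like to emit,
   and what it is actually allowed to emit this frame. */
typedef struct {
    unsigned requested;
    unsigned granted;
} ParticleSystem;

/* Split a shared particle budget among all systems. Assumed policy:
   proportional to the request, rounded down. Returns the total number
   of particles granted, which never exceeds the budget when the
   requests do not fit. */
unsigned distribute_particle_budget(ParticleSystem *systems, size_t count,
                                    unsigned budget)
{
    unsigned long long total_requested = 0;
    unsigned total_granted = 0;
    size_t i;

    assert(systems != NULL || count == 0);

    for (i = 0; i < count; ++i)
        total_requested += systems[i].requested;

    for (i = 0; i < count; ++i) {
        if (total_requested <= budget) {
            /* Everything fits: no scaling needed. */
            systems[i].granted = systems[i].requested;
        } else {
            /* Scale down; the 64-bit intermediate avoids overflow. */
            systems[i].granted = (unsigned)
                ((unsigned long long)systems[i].requested * budget
                 / total_requested);
        }
        total_granted += systems[i].granted;
    }
    return total_granted;
}
```

Because the budget is a single number, it could be picked per device class or adjusted at run time, which is what bounds the maximum load regardless of how many effects are active.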
Figure 6.2: Particle effects can be used to simulate natural phenomena, such as fire, that are not
easily represented as polygonal surfaces. (Image copyright © AMD.)
6.1.2 TUNING DOWN THE DETAILS
Other scalable elements include noncritical detail objects and background elements.
In many 3D environments the most distant elements are rendered using 2D back-
drops instead of true 3D objects. In this technique faraway objects are collapsed into
a single panoramic sky cube at the expense of losing parallax effects between and
within those objects. Similarly, multi-pass detail textures can be omitted on low-end
devices.
The method selected for rendering shadows is another aspect that can be scaled. On a
high-performance device it may be visually pleasing to use stencil shadows [Cro77, EK02]
for some or all of the game objects. This is a costly approach, and less photorealistic meth-
ods, such as rendering shaded blobs under the main characters, should be utilized on less
capable systems. Again, one should be careful to make sure that shadows are truly just a
visual detail as in some games they can affect the game play.
6.2 PERFORMANCE OPTIMIZATION
The most important thing to do when attempting to optimize the performance of an
application is profiling. Modern graphics processors are complex devices, and the inter-
action between them and other hardware and software components of the system is not
trivial. This makes predicting the impact of program optimizations difficult. The only
effective way to find out how changes in the program code affect application performance
is to measure it.
The tips and tricks provided in this chapter are good rules of thumb but by no means
gospel. Following these rules is likely to increase overall rendering performance on most
devices, but the task of identifying device-specific bottlenecks is always left to the application
programmer. Problems in performance particular to a phone model often arise from
system integration issues rather than deficiencies in the rendering hardware. This means
that the profiling code must be run on the actual target device; it is not sufficient just
to obtain similar hardware. Publicly available benchmark programs such as those from
FutureMark¹ or JBenchmark² are useful for assessing approximate graphics processing
performance of a device. However, they may not pinpoint individual bottlenecks that
may ruin the performance of a particular application.
Performance problems of a 3D graphics application can be classified into three groups:
pixel pipeline, vertex pipeline, and application bottlenecks. These groups can then be further
partitioned into different pipeline stages. The overall pipeline runs only as fast as
its slowest stage, which forms a bottleneck. However, regardless of the source of the bot-
tleneck, the strategy for dealing with one is straightforward (see Figure 6.3). First, you
should locate the bottleneck. Then, you should try to eliminate it and move to the next
one. Locating bottlenecks for a single rendering task is simple. You should go through
each pipeline stage and reduce its workload. If the performance changes significantly, you
have found the bottleneck. Otherwise, you should move to the next pipeline stage. How-
ever, it is good to understand that the bottleneck often changes within a single frame that
contains multiple different primitives. For example, if the application first renders a group
of lines and afterward a group of lit and shaded triangles, we can expect the bottleneck to
change. In the following we study the main pipeline groups in more detail.
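The search strategy can be sketched as a small decision function. The code below is one reading of the flow in Figure 6.3, assuming that four frame times have already been measured on the target device; the type and field names, and the idea of a relative threshold for deciding what counts as "faster", are hypothetical and not taken from the text.

```c
#include <assert.h>

typedef enum {
    LIMITED_BY_APPLICATION,
    LIMITED_BY_BUFFER_SWAP,
    LIMITED_BY_GEOMETRY,
    LIMITED_BY_PIXELS
} Bottleneck;

/* Frame times in milliseconds (hypothetical field names). */
typedef struct {
    double baseline_ms;        /* unmodified application */
    double no_draw_calls_ms;   /* all draw calls eliminated */
    double minimal_render_ms;  /* only clear, one small triangle, swap */
    double tiny_viewport_ms;   /* viewport set to 8 x 8 pixels */
} FrameTimings;

/* A test counts as "faster" if it beats the baseline by more than
   the given fraction, e.g., 0.2 for 20 percent. */
static int is_faster(double baseline_ms, double test_ms, double threshold)
{
    return test_ms < baseline_ms * (1.0 - threshold);
}

/* Walk the tests in order and stop at the first one with no effect. */
Bottleneck classify_bottleneck(const FrameTimings *t, double threshold)
{
    assert(t != 0 && threshold >= 0.0 && threshold < 1.0);

    if (!is_faster(t->baseline_ms, t->no_draw_calls_ms, threshold))
        return LIMITED_BY_APPLICATION;
    if (!is_faster(t->baseline_ms, t->minimal_render_ms, threshold))
        return LIMITED_BY_BUFFER_SWAP;
    if (!is_faster(t->baseline_ms, t->tiny_viewport_ms, threshold))
        return LIMITED_BY_GEOMETRY;
    return LIMITED_BY_PIXELS;
}
```

Since the bottleneck can change within a frame, such a classification is most meaningful when the timings cover one rendering task at a time rather than the whole frame.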
6.2.1 PIXEL PIPELINE
Whether an application’s performance is bound by the pixel pipeline can be found out by
changing the rendering resolution; this is most easily done by scaling the viewport. If the performance
scales directly with the screen resolution, the bottleneck is in the pixel pipeline.
After this, further testing is needed for identifying the exact pipeline stage (Figure 6.4).
To determine if memory bandwidth is the limiting factor, you should try using smaller
pixel formats for the different buffers and textures, or disable texturing altogether. If a
performance difference is observed, you are likely to be bandwidth-bound. Other factors
contributing to the memory bandwidth include blending operations and depth buffer-
ing. Try disabling these features to see if there is a difference. Another culprit for slow
fragment processing may be the texture filtering used. Test the application with
nonfiltered textures to find out if the performance increases.
1 www.futuremark.com
2 www.jbenchmark.com
Figure 6.3: Determining whether the bottleneck is in application processing, buffer swapping, geometry processing, or
fragment processing.
[Flowchart: (1) Eliminate all draw calls. No effect: limited by application processing. Faster: limited by graphics; go to (2).
(2) Only clear, draw one small triangle, and swap. No effect: limited by buffer swap (reduce resolution or frame rate). Faster: limited by rendering; go to (3).
(3) Set viewport to 8 × 8 pixels. No effect: limited by geometry processing. Faster: limited by pixel processing.]
Figure 6.4: Finding the performance bottleneck in fill rate limited rendering.
[Flowchart: (1) Disable texturing. Faster: limited by texturing; go to (2). No effect: limited by frame buffer access; go to (3).
(2) Reduce textures to 1 × 1 pixel. Faster: limited by texture memory bandwidth (use smaller textures, compressed textures, nearest filtering, mipmaps). No effect: limited by texture mapping logic (replace textures with baked-in vertex colors, use nearest filtering).
(3) Disable blending, fragment tests. Faster: limited by frame buffer ops (use fewer ops, render in front-to-back order). No effect: limited by color buffer bandwidth (use smaller resolution, color depth, or viewport).]
To summarize: in order to speed up an application where the pixel pipeline is the
bottleneck, you have to either use a smaller screen resolution, render fewer objects, use
simpler data formats, utilize smaller texture maps, or perform less complex fragment and
texture processing. Many of these optimizations are covered in more detail later in this
chapter.
6.2.2 VERTEX PIPELINE
Bottlenecks in the vertex pipeline can be found by making two tests (Figure 6.5). First,
you should try rendering only every other triangle but keeping the vertex arrays used
intact. Second, you should try to reduce the complexity of the transformation and lighting
pipeline. If both of these changes show performance improvements, the application is
bound by vertex processing. If only the reduced triangle count shows a difference, we
have a submission bottleneck, i.e., we are bound by how fast the vertex and primitive data
can be transferred from the application.
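The first test can be set up without touching the vertex data at all. As a rough sketch, assuming the mesh is drawn as an indexed triangle list with 16-bit indices, the helper below builds a second index list that contains only every other triangle; the function name `keep_every_other_triangle` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Copy every other triangle from an indexed triangle list.
   src holds 3 * triangle_count indices; dst must have room for
   3 * ((triangle_count + 1) / 2) indices. The vertex arrays are not
   touched, so the amount of vertex data stays the same.
   Returns the number of indices written to dst. */
size_t keep_every_other_triangle(const unsigned short *src,
                                 size_t triangle_count,
                                 unsigned short *dst)
{
    size_t written = 0;
    size_t tri;

    assert(src != NULL && dst != NULL);

    for (tri = 0; tri < triangle_count; tri += 2) {
        dst[written++] = src[3 * tri + 0];
        dst[written++] = src[3 * tri + 1];
        dst[written++] = src[3 * tri + 2];
    }
    return written;
}
```

The returned count and the `dst` array would then take the place of the original index count and index pointer in the `glDrawElements` call (with `GL_TRIANGLES` and `GL_UNSIGNED_SHORT`), while the vertex array pointers stay as they were.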
When analyzing the vertex pipeline, you should always scale the viewport to make the
rendering resolution small in order to keep the cost of pixel processing to a minimum.
A good size for the current mobile phone display resolutions would be 8 × 8 pixels or
so. A resolution smaller than this might cause too many triangles to become subpixel-sized;
optimized drivers would cull them and skip their vertex processing, complicating
the analysis.
Figure 6.5: Finding the performance bottleneck in geometry-limited rendering.
[Flowchart: (1) Reduce the number of triangles. Faster: limited by triangle setup (use fewer triangles). No effect: limited by T&L; go to (2).
(2) Disable lighting. Faster: limited by the lighting pipeline (use fewer and simpler lights). No effect: limited by the vertex pipeline (use fewer triangles, 8/16-bit vertices).]
Submission bottlenecks can be addressed by using smaller data formats, by organizing
the vertices and primitives in a more cache-friendly manner, by storing the data on the
server rather than in the client, and of course by using simplified meshes that have fewer
triangles. On the other hand, if vertex processing is the cause for the slowdown, the
remedy is to reduce complexity in the transformation and lighting pipeline. This is best
done by using fewer and simpler light sources, or avoiding dynamic lighting altogether.
Also, disabling fog, providing prenormalized vertex normals, and avoiding the use of
texture matrices and floating-point vertex data formats are likely to reduce the geometry
workload.
6.2.3 APPLICATION CODE
Finally, it may be that the bottleneck is not in the rendering part at all. Instead, the
application code itself may be slow. To determine if this is the case, you should turn off
all application logic, i.e., just execute the code that performs the per-frame rendering. If
significant performance differences can be observed, you have an application bottleneck.
Alternatively, you could just comment out all rendering calls, e.g., glDrawElements
in OpenGL ES. If the frame rate does not change much, the application is not
rendering-bound.

A more fine-grained analysis is needed for pinpointing the slow parts in an application.
The best tool for this analysis is a profiler that shows how much time is spent in each function
or line of code. Unfortunately, hardware profilers for real mobile phones are both very
expensive and difficult to obtain. This means that applications need to be either executed
on other similar hardware (Lauterbach boards³ are commonly used), or compiled and
executed on a desktop computer where software-based profilers are readily
available. When profiling an application on anything except the real target device, the data
you get is only indicative. However, it may give you valuable insights into where time is
potentially spent in the application and into the complexity of the algorithms used, and it may
even reveal some otherwise hard-to-find bugs.
As floating-point code tends to be emulated on many embedded devices, slowdowns
are often caused by innocent-looking routines that perform math processing for physics
simulation or game logic. Rewriting these sections using integer arithmetic may yield significant
gains in performance. Appendix A provides an introduction to fixed-point programming.
Java programs have their own performance-related pitfalls. These are covered
in more detail in Appendix B.
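To give a flavor of such a rewrite, here is a small sketch of 16.16 fixed-point arithmetic (the same layout as the `GLfixed` type in OpenGL ES) applied to a simple position update. The type and function names are invented for this example, and the code is not taken from Appendix A.

```c
#include <assert.h>
#include <stdint.h>

/* 16.16 fixed point: 16 integer bits, 16 fractional bits. */
typedef int32_t fixed16;

#define FIXED_ONE ((fixed16)65536)

fixed16 fixed_from_int(int32_t v)
{
    return (fixed16)(v * 65536);
}

/* Multiply via a 64-bit intermediate so the product cannot overflow
   before it is scaled back down. Division by 65536 is used instead of
   a right shift to keep the result well defined for negative values. */
fixed16 fixed_mul(fixed16 a, fixed16 b)
{
    return (fixed16)(((int64_t)a * (int64_t)b) / 65536);
}

fixed16 fixed_div(fixed16 a, fixed16 b)
{
    assert(b != 0);
    return (fixed16)(((int64_t)a * 65536) / b);
}

/* Integer-only Euler step: position += velocity * dt. */
fixed16 integrate_position(fixed16 position, fixed16 velocity, fixed16 dt)
{
    return position + fixed_mul(velocity, dt);
}
```

Whether this is actually faster than floating point depends on the device, so a change like this should be confirmed by profiling on the target hardware.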
3 www.lauterbach.com
6.2.4 PROFILING OPENGL ES APPLICATIONS
Before optimizing your code you should always clean it up. This means that you should
first fix all graphics-related errors, i.e., make sure no OpenGL ES errors are raised. Then
you should take a look at the OpenGL ES call logs generated by your application. You
will need a separate tool for this: we will introduce one below. From the logs you will
get the list of OpenGL ES API calls made by your application. You should verify that
they are what you expect, and remove any redundant ones. At this stage you should trap
typical programming mistakes such as clearing the buffers multiple times, or enabling
unnecessary rendering states.
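One kind of redundancy that is easy to spot mechanically is a state change that sets a capability to the value it already has. The sketch below counts such calls in a simplified call log; the `StateCall` record, the `MAX_CAPS` limit, and the assumption that the log has already been reduced to enable/disable entries are all hypothetical and do not reflect the output format of any particular tool.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CAPS 32  /* hypothetical number of tracked capabilities */

/* One logged enable/disable call: which capability, and the new value. */
typedef struct {
    unsigned cap;      /* index in [0, MAX_CAPS) */
    int      enabled;  /* nonzero = enable, zero = disable */
} StateCall;

/* Count calls that set a capability to the value it already has.
   initial_state[i] holds the assumed starting value of capability i. */
size_t count_redundant_calls(const StateCall *calls, size_t length,
                             const int initial_state[MAX_CAPS])
{
    int state[MAX_CAPS];
    size_t redundant = 0;
    size_t i;

    for (i = 0; i < MAX_CAPS; ++i)
        state[i] = initial_state[i] != 0;

    for (i = 0; i < length; ++i) {
        int value = calls[i].enabled != 0;
        assert(calls[i].cap < MAX_CAPS);
        if (state[calls[i].cap] == value)
            ++redundant;                   /* no change: redundant call */
        else
            state[calls[i].cap] = value;   /* real state change */
    }
    return redundant;
}
```

A nonzero count does not prove a performance problem, but it points at places in the application code where state handling is worth a second look.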
One potentially useful commercial tool for profiling your application is gDEBugger ES
from Graphic Remedy.⁴ It is an OpenGL ES debugger and profiler that traces application
activity on top of the OpenGL ES APIs to provide the application behavior information
you need to find bugs and to optimize application performance (see Figure 6.6). gDEBugger
ES essentially transforms the debugging task of graphics applications from a “black
box” into a “white box” model; it lets you peer inside the OpenGL ES usage to see how
individual commands affect the graphics pipeline implementation. The profiler enables
viewing context state variables (Figure 6.7), texture data and properties, performance
counters, and OpenGL ES function call history. It allows adding breakpoints on OpenGL
ES commands, forcing the application’s raster mode and render target, and breaking on
OpenGL ES errors.
Another useful tool for profiling the application code is the Carbide IDE from Nokia for S60
and UIQ Symbian devices. With commercial versions of Carbide you can do on-target
debugging, performance profiling, and power consumption analysis. See Figure 6.8 for
an example view of the performance investigator.
Figure 6.6: gDEBugger ES is a tool for debugging and profiling the OpenGL ES graphics driver.
4 www.gremedy.com
Figure 6.7: gDEBugger ES showing the state variables of the OpenGL ES context.
6.2.5 CHECKLISTS
This section provides checklists for reviewing a graphics application for high perfor-
mance, quality, portability, and lower power usage. Tables 6.1–6.4 contain questions that
should be asked in a review, and the “correct” answers to those questions. The appli-
cability of each issue is characterized as ALL, MOST, or SOME to indicate whether the
question applies to practically all implementations and platforms, or just some of them.
For example, on some platforms enabling perspective correction does not reduce performance,
while on others you will have to pay a performance penalty. Note that even though
we are using OpenGL ES and EGL terminology and function names in the tables, most
of the issues also apply to M3G.

Figure 6.8: Carbide showing one of the performance analysis views. (Image copyright © Nokia.)
Table 6.1 contains a list of basic questions to go through for a quick performance analysis.
The list is by no means exhaustive, but it contains the most common pitfalls that cause
performance issues.
A checklist of features affecting rendering quality can be found in Table 6.2. Questions
in the table highlight quality settings that improve quality but do not have any nega-
tive performance impact on typical graphics hardware. However, the impact on software
implementations may be severe.
In a similar fashion, Table 6.3 provides checks for efficient power usage, and finally,
Table 6.4 covers programming practices and features that may cause portability problems.
