Morphing is a good technique for producing complex deformations, such as facial
expressions, that are not easily reproduced by simple transformations. The downside
of morphing is that it takes a lot of memory per keyframe, so the number of base
shapes should be kept relatively small for complex meshes. Alternatively, morphing
can be used to define lots of keyframes for very simple meshes to get fairly complex
animation that is still computationally cheap to render in real time.
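As a concrete illustration, the following sketch (plain Java, not tied to any particular API) blends a set of base shapes with per-shape weights; the flat array layout and the assumption that the weights sum to one are choices made for this example only.

final class Morpher {

    /** Computes result[j] = sum over k of weights[k] * baseShapes[k][j] for every vertex
     *  component j; baseShapes holds one flattened coordinate array per base shape. */
    static float[] blend(float[][] baseShapes, float[] weights) {
        float[] result = new float[baseShapes[0].length];
        for (int k = 0; k < baseShapes.length; k++) {
            for (int j = 0; j < result.length; j++) {
                result[j] += weights[k] * baseShapes[k][j];
            }
        }
        return result;
    }
}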
4.2.2 SKINNING
For complex characters with lots of vertices and more or less arbitrary numbers of possible
poses, morphing quickly becomes inefficient. Another way to deform meshes is to assign
vertices to the joints of an articulated skeleton, animate them, and connect the vertices
with a skin of polygons [Cat72]. However, that still leads to sharp changes at joints. During
the 1990s the gaming and 3D modeling industry generalized this approach and started
calling it skinning [Lan98]. The idea is that each vertex can be associated with several
joints or bones, weighted by linear weights. This technique is sometimes referred to as
subspace surface deformation, or linear blend skinning; we simply call it skinning. It is so
commonly used today that we can call it the de facto standard of character animation.
The general idea behind skinning is that instead of transforming the whole mesh with
a single transformation matrix, each vertex is individually transformed by a weighted
blend of several matrices as shown in Figure 4.7. By assigning different weights to different
vertices, we can simulate articulated characters with soft flesh around rigid bones.
The skeleton used in skinning stands for a hierarchy of transformations. An example
hierarchy can be seen in Figure 4.7. The pelvis is the root node, and the rest of the body
parts are connected to each other so that the limbs extend deeper into the hierarchy.
Each bone has a transformation relative to the parent node—usually at least translation
and rotation, but scaling can be used, for example, for cartoon-like animation. The hier-
archy also has a rest pose (also known as bind pose) in which the bone transformations
are such that the skeleton is aligned with the untransformed mesh.
Having the skeleton hierarchy, we can compute transformations from the bones to the
common root node. This gives us a transformation matrix T_i for each bone i. The matrices
for the rest pose are important, and we denote those B_i.
The relative transformation that takes a rest pose B to a target pose T is TB^{-1}. From
this, and allowing a vertex v to have weighted influence w_i from several bones, we get the
skinning equation for a transformed vertex

    v' = \sum_i w_i T_i B_i^{-1} v.                    (4.12)
Note that we can either transform the vertex with each matrix, then compute a blend of
the transformed vertices, or compute a blend of the matrices and transform the vertex just
once using the blended matrix. The latter can in some cases be more efficient if the inverse
transpose matrix is needed for transforming vertex normals. Also, the modelview matrix
can be premultiplied into each matrix T_i to avoid doing the camera transformation as a
separate step after the vertex blending.

Figure 4.7: Left: A skeletally animated, or skinned, character. Each arrow ends in a joint, and each
joint has a bone transformation, usually involving at least translation and rotation. Right: A close-up
of one animated joint, demonstrating vertex blending. The vertices around the joint are conceptually
transformed with both bone transformations, resulting in the positions denoted by the thin lines and
black dots. The transformed results are then interpolated (dotted line, white dots) to obtain the final
skin (thick lines).
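To make the blend-the-matrices-first variant concrete, here is a minimal software skinning sketch in plain Java; the row-major float[16] matrix layout and the per-vertex bone index and weight arrays are assumptions made for this example, and each palette entry is expected to already hold the product T_i B_i^{-1}.

final class SoftwareSkinning {

    /** Computes v' = (sum over k of weights[k] * palette[bones[k]]) * v, i.e., the matrices
     *  are blended first and the vertex is then transformed once with the blended matrix. */
    static float[] skinVertex(float[] v, float[][] palette, int[] bones, float[] weights) {
        float[] blended = new float[16];
        for (int k = 0; k < bones.length; k++) {
            float[] m = palette[bones[k]];              // m = T_k * inverse(B_k), row-major 4x4
            for (int j = 0; j < 16; j++) {
                blended[j] += weights[k] * m[j];
            }
        }
        float[] out = new float[4];                     // v is given as {x, y, z, 1}
        for (int row = 0; row < 4; row++) {
            out[row] = blended[4 * row]     * v[0]
                     + blended[4 * row + 1] * v[1]
                     + blended[4 * row + 2] * v[2]
                     + blended[4 * row + 3] * v[3];
        }
        return out;
    }
}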
With hardware-accelerated skinning, using either vertex shaders or the OpenGL matrix
palette extension (Section 10.4.3), the vertices will be transformed each time the mesh is
rendered. With multi-pass rendering in particular, the mesh will therefore be transformed
multiple times. A software implementation can easily perform the calculations only when
necessary and cache the results, but this will still place a considerable burden on the CPU.
As an animated mesh typically changes for each frame, there is usually no gain from using
software skinning if hardware acceleration is available, but it is worth keeping the option
in mind for special cases.
The animation for skinning can come from a number of sources. It is possible to use
keyframe animation to animate the bones of the skeleton, with the keyframes modeled by
hand or extracted from motion capture data. Another possibility is to use physics-based
animation. Rigid body dynamics [Len04] are often used to produce “ragdoll” effects, for
example when a foe is gunned down and does a spectacular fall from height in a shooter
game. Inverse kinematics (IK) [FvFH90, WW92] can also be used to make hands touch
scene objects, align feet with the ground, and so forth. Often a combination of these tech-
niques is used with keyframe animation driving the normal motion, rigid body dynamics
stepping in for falling and other special effects, and IK making small corrections to avoid
penetrating scene geometry.
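In M3G, this machinery is exposed through the SkinnedMesh class. The following hedged sketch outlines the setup for a simple two-bone arm; the vertex buffer, index buffer, and appearance are assumed to exist already, and the joint offsets, vertex ranges, and weights are invented for illustration.

import javax.microedition.m3g.Appearance;
import javax.microedition.m3g.Group;
import javax.microedition.m3g.IndexBuffer;
import javax.microedition.m3g.SkinnedMesh;
import javax.microedition.m3g.VertexBuffer;

final class SkinnedArm {

    static SkinnedMesh create(VertexBuffer vertices, IndexBuffer submesh, Appearance appearance) {
        Group skeleton = new Group();                   // root of the bone hierarchy
        Group upperArm = new Group();
        Group forearm  = new Group();
        skeleton.addChild(upperArm);
        upperArm.addChild(forearm);
        forearm.setTranslation(0.0f, 3.0f, 0.0f);       // elbow joint offset from the shoulder

        SkinnedMesh mesh = new SkinnedMesh(vertices, submesh, appearance, skeleton);

        // Associate vertex ranges with bones; the integer weights are normalized per vertex.
        mesh.addTransform(upperArm, 1, 0, 40);          // vertices 0..39 follow the upper arm
        mesh.addTransform(forearm,  1, 20, 40);         // vertices 20..59 also follow the forearm
        return mesh;
    }
}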
4.2.3 OTHER DYNAMIC DEFORMATIONS
Naturally, dynamic deformation of meshes need not be limited to morphing and skinning.
As we can apply arbitrary processing to the vertices, either in the application code or, more
commonly, in graphics hardware, almost unlimited effects are possible.
One common example of per-vertex animation is water simulation. By applying displace-
ments to each vertex based on a fluid simulation model, a convincing effect can be created.
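A hedged sketch of this kind of per-vertex displacement is shown below; a sum of two traveling sine waves stands in for a real fluid simulation, and the flat float array of positions is only an assumption about how the application stores its vertex data.

final class WaterAnimator {

    /** Displaces the y (height) component of each vertex; positions = {x, y, z, x, y, z, ...}. */
    static void displace(float[] positions, float time) {
        for (int i = 0; i < positions.length; i += 3) {
            float x = positions[i];
            float z = positions[i + 2];
            positions[i + 1] = 0.15f * (float) Math.sin(2.0f * x + 3.0f * time)
                             + 0.10f * (float) Math.sin(1.3f * z + 2.0f * time);
        }
    }
}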
Different kinds of physics-based deformation effects include soft body modeling, whereby
the mesh deforms upon contact based on, for example, a mass-and-spring simulation. A
variation of this is cloth modeling, where air density plays a more important role. The
details of creating these and other effects are beyond the scope of this book. For further
information, refer to the bibliography ([WW92, EMP+02]).
Once the vertex data is dynamically modified by the application, it needs to be fed to the
rendering stage. Most graphics engines prefer static vertex data, which allows for opti-
mizations such as precomputing bounding volumes or optimizing the storage format and
location of the data. Vertex data that is dynamically uploaded from the application pro-
hibits most such optimizations, and it also requires additional memory bandwidth to
transfer the data between application and graphics memory. Therefore, there is almost
always some performance reduction associated with dynamically modifying vertices. The
magnitude of this performance hit can vary greatly by system and application—for exam-
ple, vertex shaders in modern GPUs can perform vertex computations more efficiently
than application code because there is no need to move the data around in memory, and
it has an instruction set optimized for that particular task. This is also the reason that
modern rendering APIs, including both OpenGL ES and M3G, have built-in support for
the basic vertex deformation cases—to enable the most efficient implementation for the
underlying hardware.
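As a rough illustration of the mechanics involved, the following M3G sketch re-uploads animated positions into a VertexArray every frame; the 16-bit quantization with a fixed scale of 1/256 is an arbitrary choice for this example, and a real application would pick the scale and bias to match its own coordinate range.

import javax.microedition.m3g.VertexArray;
import javax.microedition.m3g.VertexBuffer;

final class DynamicVertexUpdater {
    private final VertexArray positions;                // 3 components of 16 bits per vertex
    private final short[] quantized;

    DynamicVertexUpdater(VertexBuffer target, int vertexCount) {
        positions = new VertexArray(vertexCount, 3, 2);
        quantized = new short[vertexCount * 3];
        // Object coordinates are reconstructed as value * scale + bias at rendering time.
        target.setPositions(positions, 1.0f / 256.0f, null);
    }

    /** Called once per frame with freshly computed float positions, {x, y, z, ...}. */
    void update(float[] animated) {
        for (int i = 0; i < quantized.length; i++) {
            quantized[i] = (short) (animated[i] * 256.0f);
        }
        positions.set(0, quantized.length / 3, quantized);
    }
}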
CHAPTER 5
SCENE MANAGEMENT
By dealing with individual triangles, matrices, and disparate pieces of rendering state, you
are in full control of the rendering engine and will get exactly what you ask for. However,
creating and managing 3D content at that level of detail quickly becomes a burden; this
typically happens when cubes and spheres no longer cut it, and graphic artists need to
get involved. Getting their animated object hierarchies and fancy materials out of 3ds
Max or Maya and into your real-time application can be a big challenge. The task is
not made any easier if your runtime API cannot handle complete objects, materials,
characters, and scenes, together with their associated animations. The artists and their
tools deal with higher-level concepts than triangle strips and blending functions, and
your runtime engine should accommodate that.
Raising the abstraction level of the runtime API closer to that of the modeling tools
facilitates a content-driven approach to development, where designers can work inde-
pendently of programmers, but it has other benefits as well. It flattens the learning
curve, reduces the amount of boilerplate code, eliminates many common sources of
error, and in general increases the productivity of both novice and expert programmers.
A high-level API can also result in better performance, particularly if you are not already
a 3D guru with in-depth knowledge of all the software and hardware configurations
that your application is supposed to be running on.
In this chapter, we take a look at how 3D objects are composed, how the objects can
be organized into a scene graph, and how the scene graph can be efficiently rendered
and updated. Our focus is on how these concepts are expressed in M3G, so we do not
cover the whole spectrum of data structures that have been used in other systems or
that you could use in your own game engine. For the most part, we will use terminology
from M3G.
5.1 TRIANGLE MESHES
A 3D object combines geometric primitives and rendering state into a self-contained visual
entity that is easier to animate and interact with than the low-level bits and pieces are. 3D
objects can be defined in many ways, e.g., with polygons, lines, points, Bézier patches,
NURBS, subdivision surfaces, implicit surfaces, or voxels, but in this chapter we concen-
trate on simple triangle meshes, as they are the only type of geometric primitive supported
by M3G.
A triangle mesh consists of vertices in 3D space, connected into triangles to define a
surface, plus associated rendering state to specify how the surface is to be shaded. The
structure of a triangle mesh in M3G is as shown in Figure 5.1: vertex coordinates, other
per-vertex attributes, and triangle indices are stored in their respective buffers, while
rendering state is aggregated into what we call the appearance of the mesh. Although
this exact organization is specific to M3G, other scene graphs are usually similar. We
will explain the function of each of the mesh components below.
VertexBuffers are used to store per-vertex attributes, which, in the case of M3G,
include vertex coordinates (x, y, z), texture coordinates (s, t, r, q), normal vectors
(n_x, n_y, n_z), and colors (R, G, B, A). Note that the first two texture coordinates (s, t) are
enough for typical use cases, but three or four can be used for projective texture mapping
and other tricks.

Figure 5.1: The components of a triangle mesh in M3G.
The coordinates and normals of a triangle mesh are given in its local coordinate system—
object coordinates—and are transformed into eye coordinates by the modelview matrix.
The mesh can be animated and instantiated by changing the modelview matrix between
frames (for animation) or between draw calls (for instantiation). Texture coordinates are
also subject to a 4 × 4 projective transformation. This allows you to scroll or otherwise
animate the texture, or to project it onto the mesh; see Section 3.4.1 for details.
IndexBuffers define the surface of the mesh by connecting vertices into triangles, as
shown in Figure 5.2. OpenGL ES defines three ways to form triangles from consecutive
indices—triangle strips, lists, and fans—but M3G only supports triangle strips. There
may be multiple index buffers per mesh; each buffer then defines a submesh, which is the
basic unit of rendering in M3G. Splitting a mesh into submeshes is necessary if different
parts of the mesh have different rendering state; for example, if one part is translucent
while others are opaque, or if the parts have different texture maps.
The Appearance defines how a mesh or submesh is to be shaded, textured, blended,
and so on. The appearance is typically divided into components that encapsulate coher-
ent subsets of the low-level rendering state: Figure 5.3 shows how this was done for M3G.
The appearance components have fairly self-explanatory names: the Texture2D object,
for instance, contains the texture blending, filtering, and wrapping modes, as well as the
4 × 4 texture coordinate transformation matrix. The texture image is included by refer-
ence, and stored in an Image2D object. Appearances and their component objects can
be shared between an arbitrary number of meshes and submeshes in the scene graph. The
appearance components of M3G are discussed in detail in Chapter 14.

Figure 5.2: Triangle meshes are formed by indexing a set of vertex arrays. Here the triangles are organized into a triangle
list, i.e., every three indices define a new triangle. For example, triangle T2 is formed by the vertices 2, 4, and 3.

Figure 5.3: The appearance components in M3G. Implementations may support an arbitrary number
of texturing units, but the most common choice (two units) is shown in this diagram.
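To make the structure of Figures 5.1 through 5.3 concrete, here is a hedged M3G sketch that builds a one-submesh quad; the byte coordinates, the default appearance, and the omission of normals, colors, and textures are simplifications made for this example.

import javax.microedition.m3g.Appearance;
import javax.microedition.m3g.Mesh;
import javax.microedition.m3g.TriangleStripArray;
import javax.microedition.m3g.VertexArray;
import javax.microedition.m3g.VertexBuffer;

final class QuadBuilder {

    static Mesh createQuad() {
        // Four vertices with three signed 8-bit components each.
        VertexArray coords = new VertexArray(4, 3, 1);
        coords.set(0, 4, new byte[] { -1, -1, 0,   1, -1, 0,   -1, 1, 0,   1, 1, 0 });

        VertexBuffer vertices = new VertexBuffer();
        vertices.setPositions(coords, 1.0f, null);      // scale 1, zero bias

        // One triangle strip over implicit indices 0, 1, 2, 3 forms the two triangles.
        TriangleStripArray strip = new TriangleStripArray(0, new int[] { 4 });

        Appearance appearance = new Appearance();       // default state: no texture, no lighting

        return new Mesh(vertices, strip, appearance);
    }
}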
5.2 SCENE GRAPHS
Rendering a single 3D object may be useful in a demo or a tutorial, but to create something
more exciting you will need a number of 3D objects in a particular spatial and logical
arrangement—a 3D scene.
3D scenes can be organized into many different data structures that are collectively
referred to as scene graphs. The term is decidedly vague, covering everything from simple
lists of objects up to very sophisticated spatial databases. In this section we aim to char-
acterize the design space of scene graphs, progressively narrowing down our scope to the
small subset of that space that is relevant for M3G.
5.2.1 APPLICATION AREA
When setting out to design a scene graph system, the first thing to decide is what it is for.
Is it for graphics, physics, artificial intelligence, spatial audio, or a combination of these?
Is it designed for real-time or offline use, or both? Is it for a specific game genre, such
as first-person shooters or flight simulators, or maybe just one title? A unified scene rep-
resentation serving all conceivable applications would certainly be ideal, but in practice
we have to specialize to avoid creating a huge monolithic system that runs slowly and is
difficult to use.
Typical scene graphs strike a balance by specializing in real-time animation and rendering,
but not in any particular application or game genre. This is also the case with M3G.
Physics, artificial intelligence, audio, user interaction, and everything else is left for the
user, although facilitated to some extent by the ability to store metadata and invisi-
ble objects into the main scene graph. Adjunct features such as collision detection are
included in some systems to serve as building blocks for physics simulation, path find-
ing, and so on. M3G does not support collision detection, but it does provide for simple
picking—that is, shooting a ray into the scene to see which object and triangle it first inter-
sects. This can be used as a replacement for proper collision detection in some cases.
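As a hedged sketch of that picking facility, the following method casts a ray through the center of the viewport and reports the nearest intersected node; the scope mask of -1, which accepts every node, and the scene and camera parameters are assumptions of this example.

import javax.microedition.m3g.Camera;
import javax.microedition.m3g.Group;
import javax.microedition.m3g.Node;
import javax.microedition.m3g.RayIntersection;

final class PickingHelper {

    /** Returns the closest node under the middle of the viewport, or null if nothing is hit. */
    static Node pickCenter(Group scene, Camera camera) {
        RayIntersection hit = new RayIntersection();
        if (scene.pick(-1, 0.5f, 0.5f, camera, hit)) {
            return hit.getIntersected();                // hit.getDistance() gives the ray distance
        }
        return null;
    }
}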
5.2.2 SPATIAL DATA STRUCTURE
Having decided to go for a rendering-oriented scene graph, the next step is to pick the
right spatial data structure for our system. The application areas or game genres that we
have in mind play a big role in that decision, because there is no single data structure that
would be a perfect fit for all types of 3D scenes.
The main purpose of a spatial data structure in this context is visibility processing, that
is, quickly determining which parts of the scene will not contribute to the final rendered
image. Objects may be too far away from the viewer, occluded by a wall, or outside the
field of view, and can thus be eliminated from further processing. This is called visibility
culling. In large scenes that do not fit into memory at once, visibility processing includes
paging, i.e., figuring out when to load each part of the scene from the mass storage device,
and which parts to remove to make room for the new things.
Depending on the type of scene, the data structure of choice may be a hierarchical space
partitioning scheme such as a quadtree, octree, BSP tree, or kd-tree. Quadtrees, for exam-
ple, are a good match with terrain rendering. Some scenes might be best handled with
portals or precomputed potentially visible sets (PVS). Specialized data structures are
available for massive terrain scenes, such as those in Google Earth. See Chapter 9 of
Real-Time Rendering [AMH02] for an overview of these and other visibility processing
techniques.
Even though this is only scratching the surface, it becomes clear that having built-in
support for all potentially useful data structures in the runtime engine is impossible.
Their sheer number is overwhelming, not to mention the complexity of implementing
them. Besides, researchers around the world are constantly coming up with new and
improved data structures.
The easy way out, taken by M3G and most other scene graphs, is to not incorporate any
spatial data structures beyond a transformation hierarchy, in which scene graph nodes
are positioned, oriented, and otherwise transformed with respect to their scene graph
parents. This is a convenient way to organize a 3D scene, as it mirrors the way that things
are often laid out in the real world—and more important, in 3D modeling tools.
The solar system is a classic example of hierarchical transformations: the moons orbit
the planets, the planets orbit the sun, and everything revolves around its own axis. The
solar system is almost trivial to set up and animate with hierarchical transformations, but
extremely difficult without them. The human skeleton is another typical example.
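A minimal M3G sketch of that hierarchy is shown below; the distances, the choice of the y axis as the orbital axis, and the assumption that the three meshes already exist are all made up for this example.

import javax.microedition.m3g.Group;
import javax.microedition.m3g.Mesh;

final class SolarSystem {
    private final Group sunPivot   = new Group();   // rotating this carries the earth around the sun
    private final Group earthPivot = new Group();   // rotating this carries the moon around the earth

    SolarSystem(Mesh sun, Mesh earth, Mesh moon) {
        sunPivot.addChild(sun);
        sunPivot.addChild(earthPivot);
        earthPivot.setTranslation(10.0f, 0.0f, 0.0f);   // earth's distance from the sun
        earthPivot.addChild(earth);

        Group moonPivot = new Group();
        moonPivot.setTranslation(2.0f, 0.0f, 0.0f);     // moon's distance from the earth
        moonPivot.addChild(moon);
        earthPivot.addChild(moonPivot);
    }

    /** Advances both orbits; angles are in degrees about the y axis. */
    void update(float earthOrbitAngle, float moonOrbitAngle) {
        sunPivot.setOrientation(earthOrbitAngle, 0.0f, 1.0f, 0.0f);
        earthPivot.setOrientation(moonOrbitAngle, 0.0f, 1.0f, 0.0f);
    }
}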
Visibility processing in M3G is limited to view frustum culling that is based on a bounding
volume hierarchy; see Figure 5.4. While the standard does not actually say anything about
bounding volumes or visibility processing, it appears that all widely deployed implemen-
tations have independently adopted similar means of hierarchical view frustum culling.
We will discuss this in more detail in Section 5.3.
Implementing more specialized or more advanced visibility processing is left for the user.
Luckily, this does not mean that you would have to ditch the whole scene graph and start
from scratch if you wanted to use a quadtree, for instance. You can leverage the built-in
scene tree as a basis for any of the tree structures mentioned above. Also, the same triangle
meshes and materials can often be used regardless of the higher-level data structure.
The fact that typical scene graphs are geared toward hierarchical view frustum culling and
transformations is also their weakness. There is an underlying assumption that the scene
graph structure is a close match to the spatial layout of the scene. To put it another way,
nodes are assumed to lie close to their siblings, parents, and descendants in world space.
Violating this assumption may degrade performance. If this were not the case, you might
want to arrange your scene such that all nonplayer characters are in the same branch of
the graph, for instance.
The implicit assumption of physical proximity may also cause you trouble when nodes
need to be moved with respect to each other. For instance, characters in a game world
may be wandering freely from one area to another. The seemingly obvious solution is
to relocate the moving objects to the branches that most closely match their physical
locations. However, sometimes it may be difficult to determine where each object should
go. Structural changes to the scene graph may not come for free, either.

Figure 5.4: A bounding volume hierarchy (BVH) consisting of axis-aligned bounding boxes, illustrated in two dimensions
for clarity. The bounding volume of node A encloses the bounding volumes of its children.
5.2.3 CONTENT CREATION
Creating any nontrivial scene by manually typing in vertices, indices, and rendering state
bits is doomed to failure. Ideally, objects and entire scenes would be authored in commer-
cial or proprietary tools, and exported into a format that can be imported by the runtime
engine. M3G defines its own file format to bridge the gap between the runtime engine
and DCC tools such as 3ds Max, Maya, or Softimage; see Figure 5.5. The file format is a
precise match with the capabilities of the runtime API, and supports a reasonable subset
of popular modeling tool features.
From the runtime engine’s point of view, the main problem with DCC tools is that they
are so flexible. The scene graph designer is faced with an abundance of animation and
rendering techniques that the graphics artists would love to use, but only a fraction of
which can be realistically supported in the runtime engine. See Figure 5.6 to get an idea
of the variety of features that are available in a modern authoring tool.
Figure 5.5: A typical M3G content production pipeline (DCC tool → exporter → intermediate format such as COLLADA →
optimizer/converter → M3G delivery format → M3G loader → runtime scene graph). None of the publicly available exporters
that we are aware of actually use COLLADA as their intermediate format, but we expect that to change in the future.
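At the runtime end of that pipeline, loading the delivery format takes only a few lines; the sketch below assumes the exported file is packaged as the resource /scene.m3g and that it contains a World node, which is the usual case for a complete scene.

import java.io.IOException;
import javax.microedition.m3g.Loader;
import javax.microedition.m3g.Object3D;
import javax.microedition.m3g.World;

final class SceneLoading {

    /** Loads an exported .m3g file and returns its root World, or null if the file has none. */
    static World loadWorld() throws IOException {
        Object3D[] roots = Loader.load("/scene.m3g");
        for (int i = 0; i < roots.length; i++) {
            if (roots[i] instanceof World) {
                return (World) roots[i];
            }
        }
        return null;
    }
}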