
VIETNAM NATIONAL UNIVERSITY, HANOI
COLLEGE OF TECHNOLOGY

ANH DUC NGUYEN

IMPROVING THE 3D TALKING HEAD
FOR USING IN AN AVATAR
OF VIRTUAL MEETING ROOM
Branch: Information Technology
Code: 1.01.10
MASTER THESIS
Supervisor: Dr. The Duy Bui

Hanoi, November 2006


Contents
List of Figures..................................................................................................................3
Chapter 1 - Introduction................................................................................................. 5
1.1 The avatar in the virtual meeting room ...........................................................5
1.2 Structure of this thesis.......................................................................................6
Chapter 2 - The 3D animated talking head................................................................ 8
2.1 A muscle based 3D face model........................................................................8
2.2 Combination of facial movements on a 3D talking head.............................9
2.3 From emotions to emotional facial expressions...........................................12
2.4 Conclusion.........................................................................................................15
Chapter 3 - OpenGL and JOGL overview............................................................... 16
3.1 OpenGL overview............................................................................................16
3.1.1 Immediate Mode and Retained Mode (Scene Graphs)........................16
3.1.2 OpenGL history.........................................................................................16
3.1.3 How does OpenGL work?........................................................................17


3.1.4 OpenGL as a state machine..................................................................... 19
3.1.5 Drawing geometry.................................................................................... 20
3.2 JOGL overview.................................................................................................22
3.2.1 Introduction............................................................................................... 22
3.2.2 Developing with JOGL............................................................................23
3.2.3 Using JOGL............................................................................................... 24
3.3 Conclusion.........................................................................................................25
Chapter 4 - Improving lip-sync ability......................................................................26
4.1 Introduction....................................................................................................... 26
4.2 Previous work...................................................................................................27
4.3 FreeTTS and Mbrola........................................................................................28
4.3.1 FreeTTS......................................................................................................28
4.3.2 Mbrola........................................................................................................31
4.4 The improved lip model...................................................................................32
4.5 Conclusion.........................................................................................................35
Chapter 5 - Adding the hair and eyelashes models................................................36
5.1 Introduction....................................................................................................... 36


5.2 The Hair model.................................................................................................. 37
5.2.1 Introduction to V RM L............................................................................. 37
5.2.2 Our hair model............................................................................................39
5.3 The Eyelashes model........................................................................ 42
5.4 Conclusion..........................................................................................................44
Chapter 6 - Implementation and illustrations......................................................... 45
6.1 Implementing the face m odel..........................................................................45
6.1.1 Structure of the system............................................................................. 45
6.1.2 Some improvements..................................................................................46
6.2 Face model illustrations....................................................................................47
Chapter 7 - Conclusion.................................................................................................. 56

Future research............................................................................................................... 56
References...........................................................................................................................58



List of Figures
2.1: The original 3D face model: (a): The face mesh with muscles; (b): The
face after rendering......................................................................................................9
2.2: System overview.......................................................................................................10
2.3: Combination of two movements in the same channel........................................ 11
2.4: The activity of Zygomatic Major and Orbicularis Oris before (top) and after
(bottom) applying combination algorithm............................................................11
2.5: The emotion-to-expression system ........................................................................12
2.6: Membership functions for emotion intensity (a) and muscle contraction
level (b).....................................................................................................13
2.7: Basic emotions: neutral, Sadness, Happiness, Anger, Fear, Disgust, Surprise
(from left to right)................................................................................................... 15
3.1: Software implementation of OpenGL....................................................................18
3.2: Hardware implementation of OpenGL................................................................. 18
3.3: A simplified version of OpenGL pipeline.............................................................19
3.4: The structure of an application using JOGL.........................................................25
4.1: FreeTTS Architecture...............................................................................................29
5.1: Dividing a polygon (a) to triangles (b)...................................................................40
5.2: Importing the hair model: (a): the original head; (b): the head with the
imported hair model; (c): the head with the imported and fine tuned hair
model........................................................................................................................... 41
5.3: Some other imported and fine tuned hair models...................................................41
5.4: The open (a) and close eyes (b) without and with eyelashes..............................43
5.5: The face without (a) and with (b), (c) the hair and eyelashes models.............. 44

6.1: The main interface of our program......................................................................... 47
6.2: The face model displays Happiness emotion with maximum intensity............ 48
6.3: The face model displays Surprise emotion with maximum intensity................48


6.4: The combination of two emotions: Happiness and Surprise................................49
6.5: The effect of left Zygomatic Major muscle's contraction at maximum level
on the face model....................................................................................................... 49
6.6: The face model from different view points..............................................................50
6.7: Increasing surprise.......................................................................................................50
6.8: The hair model after being imported.........................................................................51
6.9: The hair model after being fine tuned.......................................................................51
6.10: Some other hair models............................................................................................ 52
6.11: Closing the eyes.........................................................................................................53
6.12: The face model attached to the body........................................................................ 54
6.13: Our face model embedded into another project..............................................................54



Chapter 1
Introduction
1.1 The avatar in the virtual meeting room
The Virtual Meeting Rooms (VMRs) are 3D virtual simulations of meeting
rooms where the various modalities such as speech, gaze, distance, gestures and
facial expressions can be controlled (a VMR project in Twente). The rapid
development in the areas of computer graphics and embodied conversational agents allows
the creation of VMRs and makes them useful for various purposes. These
purposes can be divided into the following three categories [24]. First, they can be used

as a virtual environment for teleconferencing, a real-time communication means for
remote participants of a meeting [18]. Using VMRs helps to reduce the amount
of data that needs to be sent to and displayed on the screens of the remote client side. In
addition, they help to overcome some features that are problematic in real meetings
or in traditional video-based conferences. For example, participants can adapt
the Virtual Environment to their own preferences without disturbing other people, or
they can choose a view from any seat in the VMR that they want and feel
comfortable during the meeting [17]. Second, VMRs are used to replay the
content of a recorded meeting in different ways or to present multimedia information
about it. Information can be recorded directly from participants' behaviors in real
meetings (e.g. tracking of head or body movements, voice). These presentations can
be used as a 3D summary of the real meetings or for evaluating the annotations and
results obtained by machine learning methods. Third, because Virtual
Environments allow controlling various independent factors (voice, gaze, distance,
gestures, and facial expressions); these factors can be used to study their influence
on features of social interaction and social behavior. Conversely, the effect of social
interaction on these factors can be studied adequately in Virtual Environments as
well.
In the VMRs environment, each participant is represented by an avatar. An
avatar is an embodied conversational agent that simulates all behaviors and
movements of the participant. The avatar will typically contain a talking head which
is able to speak and display lip movements during speech, emotional facial
expressions and conversational signals, and a body which is able to display the
gestures of the participant. The important thing is that the avatar of each participant
must be believable to the other participants. The avatar will be believable if it can
simulate the appearance and express the characteristics of the participant, and if its
actions and reactions are as true to life as those of the person it is representing.
The talking head model plays an important role in the creation of a believable
avatar. It is not only used to display facial movements and expressions but also
to distinguish one avatar from another and to express the personality of the
participant. In order to create a talking head model which is suitable for use in an
avatar in VMRs, there are some problems which need to be dealt with. First, the
talking head must be simple enough to allow real-time animation but still produce
realistic and high-quality facial expressions. Second, the talking head not only has
to be capable of creating facial movements such as conversational signals, emotional
expressions, etc., but also has to combine them and resolve the conflicts between
them. Third, the talking head must look like a real head, which means the head must
have other models attached to it such as a hair model, a tongue model, an eyelashes
model, etc.
In this thesis, we choose the talking head model from [3] to improve and then
use for avatars in VMRs. We study the model carefully to discover all of its
advantages as well as its disadvantages. The advantages will be inherited, while the
missing functions or disadvantages will be supplemented or improved, respectively.
We change the rendering method of the head to a new one to improve the animation
speed. The synchronization between audible and visible speech is also improved.
We add hair and eyelashes models to make the head look more realistic. The
improved model can not only be used for avatars in the VMR environment but can
also be embedded into other projects.

1.2 Structure of this thesis
In Chapter 2, we introduce the 3D animated talking head [3] that our
work is based on. This head is able to produce realistic facial expressions with
real-time animation on a personal computer. It can display several types of facial
movements such as eye blinking, head rotation, lip movement, etc. at once, and
most importantly it can generate emotional facial expressions from emotions.
We briefly introduce the way this muscle based 3D face model is created, the
techniques it uses for producing animation, the combination of facial movements
and how to generate emotional facial expressions from emotions.
In Chapter 3, we present an overview of OpenGL and JOGL (Java
bindings for OpenGL). OpenGL is the industry-standard and premier environment for
developing 2D and 3D graphics applications. Its capabilities allow developers to
display compelling graphics and produce applications that require maximum
performance (OpenGL project). JOGL is a new OpenGL binding for the Java platform.
It is an open-source, clean and minimalist API among all the bindings available.
In Chapter 4, we introduce an overview of FreeTTS and Mbrola. FreeTTS
is a robust text-to-speech system that we used to get phonemes and timing
information from a text. This phoneme string is used to generate lip movements
when speaking. FreeTTS supports Mbrola, which is a speech synthesizer based on
the concatenation of diphones. We used Mbrola as an output thread of FreeTTS to
produce synthetic audio. We also present the method used to improve the lip-sync
capability. The original head can speak, but in some conditions the speech from the
speaker is not synchronized with the movements of the lips on the screen. Besides,
we may want the head to express various emotions depending on the sentence
currently being spoken, so we need to know exactly when the sentence is spoken so
that we can generate the suitable emotions.
The original head does not have a hair model or eyelashes. We add these
parts in order to make it look like a real head and become more attractive. In
Chapter 5, we present the method used to apply a hair model to the head and the way
we draw eyelashes for the eyes. Available hair models are attached to the head
model without much human intervention during the process. In addition, the eyelashes
are only a small part of the face, but without them the eyes may not look real. The
eyelashes also help to improve the emotion expression capability of the eyes when
the eyes flutter. We describe some problems concerning the creation of the eyelashes,
and how to attach them to the eyelid so that they can move with the eyelid when the
eyes close or open.
In Chapter 6, we introduce the implementation of the face using Java and
JOGL. We also introduce our improvement to the rendering method of the talking head,
which uses the new methods and mechanisms introduced in OpenGL 1.5. This
improvement increases the animation speed significantly. Some illustrations of
our 3D talking head model are also presented in this chapter.


Chapter 2
The 3D animated talking head
2.1 A muscle based 3D face model
The face model is created by a polygonal face mesh and a B-spline surface for
the lips. The face mesh data was first obtained from a 3D scanner and was then
processed to improve the animation performance while still keeping the high quality of
the model. The process contains two phases. In the first phase, the number of
vertices and polygons was reduced in non-expressive parts but maintained in the
expressive parts which are the areas around the eyes, the nose, the mouth and the
forehead. At the end of this phase, the face mesh contains 2,468 vertices and 4,746
polygons. This is small enough to have real-time animation but still preserves the
high quality of detail in expressive parts of the face. In the second phase, the face
model was divided into eleven regions. Five regions on the left part consist of the left
lower face, left middle face, left lower eyelid, left upper eyelid and left upper face.
There are five corresponding regions on the right part, and the last region is at the
back of the head. This not only helps to prevent unwanted artifacts generated by the
displacement of vertices in regions that should not be affected by muscle
contractions, but also increases the animation speed.
The lip model is a B-spline surface with a 24 x 6 control point grid. The lips are
deformed by moving the control points, and the B-spline surface is polygonalized to
connect with the face mesh for rendering. The B-spline surface has the advantage of
producing a smooth surface, but it cannot produce wrinkles and needs to be
polygonalized before rendering. If the number of control points is too large, heavy
computation is required. Due to these advantages and disadvantages, a B-spline
surface is suitable for modeling a small part of the face like the lips.
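To make the polygonalization step concrete, the following is a minimal sketch (not the actual implementation of [3]) of evaluating one point on a uniform cubic B-spline patch from a 4 x 4 window of control points; polygonalizing the lip surface amounts to evaluating such points on a regular (u, v) grid and connecting them into polygons. The class and method names are illustrative assumptions.

    // Sketch: evaluate a point on a uniform cubic B-spline patch.
    // ctrl is a 4 x 4 window of control points taken from the 24 x 6 grid;
    // u and v are local parameters in [0, 1].
    public class BSplinePatchSketch {

        // Uniform cubic B-spline basis functions B0..B3 at parameter t.
        private static double[] basis(double t) {
            double t2 = t * t, t3 = t2 * t;
            return new double[] {
                (1 - 3 * t + 3 * t2 - t3) / 6.0,     // B0
                (4 - 6 * t2 + 3 * t3) / 6.0,         // B1
                (1 + 3 * t + 3 * t2 - 3 * t3) / 6.0, // B2
                t3 / 6.0                             // B3
            };
        }

        // ctrl[i][j] is a control point {x, y, z}.
        public static double[] evaluate(double[][][] ctrl, double u, double v) {
            double[] bu = basis(u);
            double[] bv = basis(v);
            double[] p = new double[3];
            for (int i = 0; i < 4; i++) {
                for (int j = 0; j < 4; j++) {
                    double w = bu[i] * bv[j];
                    for (int k = 0; k < 3; k++) {
                        p[k] += w * ctrl[i][j][k];
                    }
                }
            }
            return p; // a surface point, later connected into polygons
        }
    }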
Almost all of the 19 muscles used on the face to generate animation are vector
muscles, except the Orbicularis Oris, which drives the mouth, and the Orbicularis
Oculi, which drives the eye. The vector muscle of the face is an improved
version of the vector muscle model from [28]. In addition, a mechanism to generate
wrinkles and bulges is added to increase the realism of the facial expressions, and
a technique to reduce the computation is also introduced to enhance the animation
performance. The Orbicularis Oris muscle is parameterization-based and is adopted
from [12]. The Orbicularis Oculi has two parts: the Pars Palpebralis, which opens and
closes the eyelid, is adopted from [22], and the Pars Orbitalis, which squeezes the eye,
is adopted from [28]. The jaw and the eyeball rotation algorithms are improved from
the ones proposed in [22]. The mouth now has a natural oval look, and the eyes
can track a target. Eye movement is independent of facial muscle movements, and the
eyes cannot rotate to impossible positions. All muscles have an intensity range from 0
to 1, and the step value between two adjacent muscle contractions is 0.2. This step value
was determined after trial and error experiments. It is small enough to ensure that the
facial animations are smooth and large enough to decrease the computation time.
Figure 2.1 shows the original face from [3].

Figure 2.1: The original 3D face model
(a): The face mesh with muscles; (b): The face after rendering
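As a rough illustration of the idea behind a vector muscle (a heavily simplified sketch; the actual model in [3] and [28] uses an angular zone of influence around the muscle vector, region masks, and wrinkle and bulge generation, none of which are shown here), a contraction can be thought of as pulling nearby vertices toward the fixed attachment end of the muscle with a falloff that fades to zero at the boundary of the influence zone:

    // Heavily simplified vector muscle sketch (illustrative only).
    // Vertices within "radius" of the attachment point are pulled toward it,
    // scaled by the contraction level and a cosine falloff.
    public class VectorMuscleSketch {

        public static void contract(float[][] vertices, float[] attachment,
                                    float radius, float level) {
            for (float[] v : vertices) {
                float dx = v[0] - attachment[0];
                float dy = v[1] - attachment[1];
                float dz = v[2] - attachment[2];
                float dist = (float) Math.sqrt(dx * dx + dy * dy + dz * dz);
                if (dist == 0.0f || dist > radius) {
                    continue; // outside the zone of influence
                }
                // Falloff: 1 near the attachment, 0 at the edge of the zone.
                float falloff = (float) Math.cos(dist / radius * Math.PI / 2.0);
                v[0] -= level * falloff * dx;
                v[1] -= level * falloff * dy;
                v[2] -= level * falloff * dz;
            }
        }
    }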

2.2 Combination of facial movements on a 3D talking
head
The system takes as input marked-up text in which each facial movement
(except lip movement while talking) is defined as a group of muscle contractions
that share the same function, start time, onset, offset and duration. Lip movement
is generated separately inside the system based on the phonemic representation
of the input text.


Figure 2.2: System overview
There are several types of facial movements on the face. They include lip
movements when talking, conversational signals, emotion displays, gaze and head
movements, and manipulators to satisfy biological requirements of the face. All of
them can occur at the same time and because they are driven by the muscle models,
there can be situations where there are conflicting muscles when two or more
movements happen at once. Conflicting muscles are muscles that can not contract at
the same time. For example, when we smile the Zygomatic Major and Minor
muscles contract to pull the comer of the lip outward. If at that time we
concurrently say “Hello”, the phoneme “@U” in the word “Hello” requires the
contraction of the Orbicularis Oris muscle which drives the lip into a tight, pursed
shape. So Zygomatic Major (and Minor) and Orbicularis are conflicting muscles.
The face must solve this problem to produce natural animation.

Each type of facial movement belongs to one channel. There are six channels
in the system: manipulators (eye blinking), lip movements (phoneme), conversation
signals (muscle contractions), emotion displays (expression), gaze movements (eye
movement) and head movements channel. The combination process contains two
steps. In the first step, the movements in each channel are concatenated to generate
smooth transitions between adjacent movements. In the second step, the
movements in all channels are combined and processed to solve “conflicting
muscles”.
[Plot: muscle contraction level against time (in seconds), showing the first movement, the second movement, and the combination of the two movements.]
Figure 2.3: Combination of two movements in the same channel
[Plot: contraction level against time (in seconds) for the Zygomatic Major and Orbicularis Oris muscles.]
Figure 2.4: The activity of Zygomatic Major and Orbicularis Oris before (top) and
after (bottom) applying combination algorithm.
Figure 2.3 is an example of combining two movements in the same channel.
The muscle's activity follows the first movement until time 3, when there is a
stimulus for the second movement; it then stops following the first movement and
releases to the target value of the muscle in the second movement (0.5), after which
it follows the second movement.
Figure 2.4 is an example of combining two movements in different channels.
Because the Zygomatic Major and Orbicularis Oris are conflicting muscles and the
Orbicularis Oris muscle has higher priority when it is activated (at time 3), the
Zygomatic Major is inhibited. However, its activity is adjusted so that it does not
release too fast, which would create an unnatural movement. The Zygomatic Major
activity releases gradually to zero, and then the Orbicularis Oris muscle starts
contracting.

2.3 From emotions to emotional facial expressions
There are six emotions considered to be universal, which means that they are
associated consistently with the same facial expressions across different cultures.
These emotions are: Happiness, Anger, Surprise, Fear, Disgust and Sadness [34].
Other emotions on the face are considered to be generated by combining the six basic
emotions above, but rarely do more than two emotions occur at the same time. So, two
aspects of generating emotional facial expressions from emotions must be addressed.
First, depending on the intensity of the emotion, the face must display the
continuous changes in expressions. Second, the face must have a method to
combine expressions from two emotions. A fuzzy rule-based system is suitable for
these requirements because it allows incorporating qualitative as well as
quantitative information.
[Diagram: emotion intensities feed either a Single Expression Mode fuzzy rule-based system (FRBS) or a Blend Expression Mode FRBS, which output muscle contraction values.]
Figure 2.5: The emotion-to-expression system
There are two fuzzy rule-based systems implemented to convert emotion
intensities into muscle contraction levels, which are used to generate emotional
expressions on the 3D face model. The first fuzzy rule-based system is used to
produce contraction levels from a single emotion intensity; it is called "Single
Expression Mode". The second one is used when two emotion intensity values are
converted to muscle contraction levels; it is called "Blend Expression Mode". The
mechanism that selects Single or Blend Expression Mode is based on the intensities of
the emotions felt. When a single emotion is expressed, Single Expression Mode is
chosen. Blend Expression Mode is chosen when more than one emotion is expressed,
but only the two highest emotion intensity values are used (Figure 2.5).
[Plots: the membership function μintensity(emotion) against degree of intensity (a), and μlevel(muscle contraction) against degree of level (b).]
Figure 2.6: Membership functions for emotion intensity (a) and muscle contraction level (b)
The intensity of each emotion is modeled by five fuzzy sets: VeryLow, Low,
Medium, High and VeryHigh. Similarly, the contraction level of each muscle is
described by five fuzzy sets: VerySmall, Small, Medium, Big and VeryBig. By
using these fuzzy sets, the system can describe qualitative descriptions of emotions
like "surprise then lift eyebrows" and quantitative descriptions like "if the level of
sadness is low then draw the eyebrows together; while if the level of sadness is
high, then draw the eyebrows together and draw the corners of the lips down", etc.

The shape of the membership functions shown in Figure 2.6 and the support of each
membership function were determined through experiments.
A rule in the single expression mode has the following form:
if Sadness is VeryLow then
muscle 9’s contraction level is VerySmall
muscle 13’s contraction level is VerySmall
muscle 14’s contraction level is VerySmall
muscle 15’s contraction level is VerySmall
muscle 18’s contraction level is VerySmall
A rule in the blend expression mode has the following form:
if Surprise is Low and Fear is Medium then
muscle 9’s contraction level is Small
muscle 10’s contraction level is Small
muscle 16’s contraction level is Small
muscle 3’s contraction level is Medium
muscle 4’s contraction level is Medium
muscle 5’s contraction level is Medium
muscle 17’s contraction level is Medium
There are no rules to blend expressions of Happiness and Disgust, as well as
Sadness and Surprise, because there is no evidence that these emotions can happen
concurrently. For these expressions, only the emotion with higher intensity is
expressed.
Figure 2.7 displays the six basic emotional facial expressions, which are generated
from the six corresponding emotions with all intensities equal to 1 (the maximum value).
The quality of the facial expressions is improved by using psychologically-based and
fairly simple fuzzy rules rather than other graphics algorithms, complicated formulas
or intensively trained neural networks.
Figure 2.7: Basic emotions: neutral, Sadness, Happiness, Anger, Fear, Disgust,
Surprise (from left to right)

2.4 Conclusion
This 3D face model is suitable for use in the avatar of a virtual meeting room
because it can display facial expressions with real-time animation. Besides verbal
movements (lip movements when speaking), it can display other non-verbal
behaviors such as eye blinking, head rotation, etc. It can also generate emotional
facial expressions from emotions and can combine different facial movements to
display at the same time. Not only is the face able to express the six basic built-in
emotions, but it can also generate many other emotions by controlling the muscle
model. Thus, the participants can express their own emotions and track the
emotions of the others through the face of the avatar. They can benefit from verbal and
non-verbal communication and have a new way to find the points of interest in the
meeting. One important thing is that the face can help the avatar appear plausible
to the participants, so they can feel that they are in a real meeting with real people.



Chapter 3
OpenGL and JOGL overview
3.1 OpenGL overview
3.1.1 Immediate Mode and Retained Mode (Scene Graphs)
There are two different types of APIs for programming real-time 3D
applications [32]. The first type is called retained mode. In retained mode, the
description of objects and the scene is provided to the API and then the graphics

package will create the image on the screen. All the programmer needs to do is give
commands to change the position and viewing orientation of the user (also called the
camera) or of other objects in the scene. The structure that is built this way is called a
scene graph. The scene graph is a data structure that includes all the objects in our
scene and their relationships to one another. Many high-level toolkits or "game
engines" use this approach. The programmer doesn't need to understand how the
scene is rendered because the graphics library takes care of rendering the model
or database that is handed over to it. Java3D is one example of a scene graph API.
The second approach to 3D rendering is called immediate mode. Most retained
mode APIs or scene graphs use an immediate mode API internally to actually
perform the rendering. For example, Java3D uses OpenGL or Direct3D to render
the geometry created by the user. In immediate mode, programmers don't describe
the models and environment at as high a level as in retained mode. Instead, they issue
commands directly to the graphics processor. Each command has an immediate
effect that depends on the current state settings, and new commands have no effect on
rendering commands that have already been executed. This allows everything to be
controlled at a low level.
3.1.2 OpenGL history
OpenGL is an industry-standard, cross-platform Application Programming
Interface (API). The specification for this API was finalized in 1992, and the first
implementations appeared in 1993. The forerunner of OpenGL is Iris GL (Graphics
Library), the API that was designed and supported by Silicon Graphics, Inc. To
establish an industry standard, Silicon Graphics collaborated with various graphics
hardware companies to create an open standard, which was named "OpenGL."
Until now, seven revisions have been introduced to add new functionality to
the API. The newest version of the OpenGL specification is 2.1. All newer versions
are upward compatible with earlier versions [4].
- Version 1.1 was finished in 1997 and added support for two important
capabilities: vertex arrays and texture objects.
- The specification for OpenGL 1.2 was released in 1998 and added support
for 3D textures and an optional set of imaging functionality.
- The OpenGL 1.3 specification was completed in 2001 and added support
for cube map textures, compressed textures, multi-textures, etc.
- OpenGL 1.4 was completed in 2002 and added automatic mipmap
generation, additional blending functions, internal texture formats for
storing depth values for use in shadow computations, support for drawing
multiple vertex arrays with a single command, more control over point
rasterization, control over stencil wrapping behavior, and various additions
to texturing capabilities.
- The OpenGL 1.5 specification was published in October 2003. It added
support for vertex buffer objects, shadow comparison functions and
occlusion queries.
- OpenGL 2.0, finalized in September 2004, opened up the processing
pipeline for user control by providing programmability for both vertex
processing and fragment processing. Other features added in 2.0 include
support for multiple render targets, nonpower-of-2 textures, point sprites,
and separate stencil functionality for front- and back-facing surfaces.
- Version 2.1, released in August 2006, added support for revision 1.20 of the
OpenGL Shading Language, non-square matrices, pixel buffer objects and
sRGB textures.
3.1.3 How does OpenGL work?
OpenGL implementations can be software implementations or hardware
implementations. Windows applications can call a Windows API called the
Graphics Device Interface (GDI) to create output onscreen, and graphics card vendors
usually supply a driver for the GDI to interface with. A software implementation of
OpenGL takes graphics requests from an application and constructs (rasterizes) a
color image of the 3D graphics. This image is then supplied to the GDI to
display on the monitor. Microsoft has its own OpenGL software implementation, and
almost all modern operating system products from Microsoft contain support for
OpenGL. However, SGI and Mesa also released software implementations of
OpenGL for Windows that greatly outperformed Microsoft's implementation.

Figure 3.1: Software implementation of OpenGL
An OpenGL hardware implementation usually takes the form of a graphics
card driver. OpenGL API calls from applications are passed to a hardware driver.
This driver does not pass its output to the Windows GDI for display; instead, it
interfaces directly with the graphics display hardware. The more components of
OpenGL that are implemented in hardware, the faster the implementation processes
the calls from applications and displays images onscreen.

Figure 3.2: Hardware implementation of OpenGL
When an application calls OpenGL API functions, the commands are placed in
a command buffer. Vertex data, texture data, etc. are also contained in this buffer.
When the buffer is flushed, the commands and data are passed to the

“Transformation and Lighting” step. In this step, points used to describe an object's
geometry are recalculated to determine the given object's location and orientation.
Lighting calculations are performed as well to indicate the brightness of the colors
at each vertex. When this stage has finished, the data is passed to the "Rasterization"
step of the pipeline. The rasterizer actually creates the color image from the
geometric, color, and texture data and places the image into the frame buffer. The
frame buffer is the memory area of the graphics display device, which means the
image is displayed on the screen. Figure 3.3 shows a simplified view of the OpenGL
pipeline. At a lower level, there are many boxes inside each box of the diagram.

Figure 3.3: A simplified version of OpenGL pipeline
3.1.4 OpenGL as a state machine
OpenGL is designed as a state machine [21]. If we put it into specific states (or
modes) then these states will remain in effect until we change them. For example,
the current color is a state variable. We can set the current color to black, white, red,
or any other color, and all objects will be drawn with that color until we set the
current color to something else. The current color is only one of many state
variables that OpenGL maintains. The other states are current viewing and
projection transformations, line and polygon stipple patterns, polygon drawing
modes, pixel-packing conventions, positions and characteristics of lights, and
material properties of the objects being drawn.
The execution model for OpenGL can be described as client-server. An
application (the client) issues OpenGL commands that are interpreted and processed
by an OpenGL implementation (the server). Many server-side variables have only
two states, on or off, and are enabled or disabled with the commands glEnable()
or glDisable(). On the client side, we enable a state with the glEnableClientState()
command and disable it with glDisableClientState(). Each state variable or
mode has a default value, and we can query the system for each variable's current
value at any time. In addition, we can save a collection of server-side state variables
on an attribute stack with glPushAttrib(), and client-side state can be pushed onto a
second stack with glPushClientAttrib(). We can temporarily modify the states, and
restore the values later with glPopAttrib() or glPopClientAttrib() for server-side or
client-side states, respectively. When we only need to change state temporarily,
using these commands is likely to be more efficient than issuing query commands.
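For example, temporarily overriding the current color and lighting state around a drawing call might look as follows (a sketch written against a JSR-231-style JOGL GL object; the surrounding setup is assumed):

    // Sketch: temporarily change server-side state and restore it afterwards.
    // "gl" is assumed to be a javax.media.opengl.GL obtained from the drawable.
    void drawHighlighted(GL gl) {
        gl.glPushAttrib(GL.GL_CURRENT_BIT | GL.GL_ENABLE_BIT); // save color + enables
        gl.glDisable(GL.GL_LIGHTING);   // server-side state switched off
        gl.glColor3f(1.0f, 0.0f, 0.0f); // the current color is a state variable
        // ... draw the highlighted geometry here ...
        gl.glPopAttrib();               // restore the saved state
    }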
3.1.5 Drawing geometry
All graphic objects in OpenGL are constructed from geometric drawing
primitives. OpenGL only supports the following geometry primitives: points, lines,
line strips, line loops, polygons, triangles, triangle strips, triangle fans,
quadrilaterals, and quadrilateral strips. To send geometry data to OpenGL for
rendering, we have three main ways [25]. The first is the vertex-at-a-time method.
The command glBegin() is called to start a primitive and then glEnd() to end it.
Between these two commands are commands that specify vertex attributes such as
vertex position, color, normal and texture coordinates. These commands are
glVertex*(), glColor*(), glNormal*(), glTexCoord*(), etc. When the vertex-at-a-time
method is used, the call to glVertex*() signals the end of the data definition for a
single vertex, and it may also define the completion of a primitive. After calling the
command glBegin() and specifying a primitive type, a graphics primitive is completed
by calling glVertex*() enough times to completely specify a primitive of the indicated
type. For example, a triangle is completed every third time glVertex*() is called.
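For instance, a single triangle could be specified vertex-at-a-time like this (a sketch written against a JSR-231-style JOGL GL object; the coordinates and colors are arbitrary):

    // Sketch: immediate-mode (vertex-at-a-time) drawing of one triangle.
    void drawTriangle(GL gl) {
        gl.glBegin(GL.GL_TRIANGLES);
        gl.glColor3f(1.0f, 0.0f, 0.0f);  gl.glVertex3f(-1.0f, -1.0f, 0.0f);
        gl.glColor3f(0.0f, 1.0f, 0.0f);  gl.glVertex3f( 1.0f, -1.0f, 0.0f);
        gl.glColor3f(0.0f, 0.0f, 1.0f);  gl.glVertex3f( 0.0f,  1.0f, 0.0f);
        gl.glEnd(); // the third glVertex3f() call completed the triangle
    }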
The second method to draw primitives is to use vertex arrays. With this
method, vertex attributes are stored in user-defined arrays; the application then sets
up pointers to the arrays and uses glDrawArrays(), glMultiDrawArrays(),
glInterleavedArrays(), etc. to draw a huge number of primitives at once.
Because this method can efficiently pass large amounts of geometry data to
OpenGL, it is usually used for portions of code that are extremely performance
critical. Using glBegin() and glEnd(), application developers have to specify
each attribute of each vertex, so the number of function calls can become significant
when objects with thousands of vertices are drawn. In contrast, we can draw a large
number of primitives with a single function call after the vertex data is organized
into arrays using the vertex arrays method. Besides, this method can be faster than
the vertex-at-a-time method because it is often more efficient for the OpenGL
implementation to deal with data organized into arrays. OpenGL supports several
types of arrays, including color arrays, vertex position arrays and normal vector
arrays. The values of the current arrays are specified with glColorPointer(),
glVertexPointer() and glNormalPointer(), respectively. We have to indicate
which types of arrays will be used before calling glDrawArrays() or
glMultiDrawArrays(). The function glInterleavedArrays() can specify
and enable several interleaved arrays simultaneously (e.g., each vertex might be
defined by three floating-point values representing a normal followed by three
floating-point values representing a vertex position).
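A minimal sketch of the vertex array path is shown below (JSR-231-style JOGL is assumed; a direct NIO buffer is used because JOGL expects direct buffers for client-side arrays):

    // Sketch: drawing one triangle from a client-side vertex array.
    void drawWithVertexArray(GL gl) {
        float[] coords = { -1f, -1f, 0f,   1f, -1f, 0f,   0f, 1f, 0f };
        java.nio.FloatBuffer vertices = java.nio.ByteBuffer
                .allocateDirect(coords.length * 4)
                .order(java.nio.ByteOrder.nativeOrder())
                .asFloatBuffer();
        vertices.put(coords).rewind();

        gl.glEnableClientState(GL.GL_VERTEX_ARRAY);      // client-side state
        gl.glVertexPointer(3, GL.GL_FLOAT, 0, vertices); // point OpenGL at the data
        gl.glDrawArrays(GL.GL_TRIANGLES, 0, 3);          // one call draws all vertices
        gl.glDisableClientState(GL.GL_VERTEX_ARRAY);
    }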
The two former methods are called immediate mode because primitives are
rendered right after they have been specified. In the third method, all function calls
are stored in a display list and are pre-processed before being executed. A display list
is an OpenGL-managed data structure that stores commands for later execution. Both
commands to set state and commands to draw geometry can be included in a display
list and are stored on the server side. A display list can be processed later with
glCallList() or glCallLists(). The display list is initiated with glNewList() and
completed with glEndList(). All the commands issued between those two calls
become part of the display list. There are, however, certain OpenGL commands that
are not allowed within display lists. In general, display list mode can provide better
performance than immediate mode. The OpenGL implementation can optimize the
commands in the display list for the underlying hardware and store the commands in
a memory area that allows better drawing performance, such as the memory of the
graphics accelerator. These optimizations require some extra computation or data
movement, so applications only see a performance benefit if the display list is called
more than once.
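A display list can be illustrated with a small sketch (again against a JSR-231-style JOGL GL object; the geometry commands are omitted):

    // Sketch: compile drawing commands into a display list once, replay it later.
    private int faceList = 0;

    void buildList(GL gl) {
        faceList = gl.glGenLists(1);            // reserve one list name
        gl.glNewList(faceList, GL.GL_COMPILE);  // start recording commands
        // ... glBegin()/glVertex*()/glEnd() calls for the static geometry ...
        gl.glEndList();                         // finish recording
    }

    void drawEachFrame(GL gl) {
        gl.glCallList(faceList); // replay the pre-processed commands on the server
    }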
From OpenGL version 1.5, there is a mechanism that permits vertex array data
to be stored in server-side memory. This mechanism typically provides the highest
performance rendering because the data can be stored in memory on the graphics
accelerator and need not be transferred over the I/O bus each time it is rendered.
The glBindBuffer() command creates a buffer object in the memory of the graphics
accelerator; the glBufferData() and glBufferSubData() commands are used to
specify the data values for that buffer. The API also supports efficiently streaming
data from the client to the server. glMapBuffer() can map a buffer object into the
client's address space and obtain a pointer to this memory so that we can specify
data values directly. Before using other rendering commands that access the buffer,
we need to call glUnmapBuffer() to release the current pointer to that buffer object.
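Since this vertex buffer object mechanism is what the improved rendering method in Chapter 6 relies on, a hedged sketch of the typical call sequence is given below (JSR-231-style JOGL method signatures are assumed and may differ slightly between releases):

    // Sketch: storing vertex data in a server-side buffer object (OpenGL 1.5 VBO).
    // "vertices" is assumed to be a direct FloatBuffer with vertexCount * 3 floats.
    private int[] vboId = new int[1];

    void createBuffer(GL gl, java.nio.FloatBuffer vertices, int vertexCount) {
        gl.glGenBuffers(1, vboId, 0);                  // create a buffer name
        gl.glBindBuffer(GL.GL_ARRAY_BUFFER, vboId[0]); // make it the current buffer
        gl.glBufferData(GL.GL_ARRAY_BUFFER,            // upload into server memory
                vertexCount * 3 * 4, vertices, GL.GL_STATIC_DRAW);
    }

    void drawBuffer(GL gl, int vertexCount) {
        gl.glBindBuffer(GL.GL_ARRAY_BUFFER, vboId[0]);
        gl.glEnableClientState(GL.GL_VERTEX_ARRAY);
        gl.glVertexPointer(3, GL.GL_FLOAT, 0, 0);         // offset into the bound VBO
        gl.glDrawArrays(GL.GL_TRIANGLES, 0, vertexCount); // no per-frame transfer
        gl.glDisableClientState(GL.GL_VERTEX_ARRAY);
        gl.glBindBuffer(GL.GL_ARRAY_BUFFER, 0);           // unbind
    }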

3.2 JOGL overview
3.2.1 Introduction
OpenGL is for making graphics, and it is fast. On almost all modern graphics cards
it is hardware accelerated. We can use OpenGL to create almost anything we would
want to display visually. Unfortunately, OpenGL is written in the C language. Besides,
we need to put graphics from OpenGL into a window to display them, but OpenGL
itself doesn't have any commands for creating windows. This makes OpenGL
hard to learn for beginners or for programmers who want to use a true Object Oriented
Programming (OOP) language like Java. Java is possibly the most popular true
OOP language. There have been many attempts to combine OpenGL with Java and
provide access to OpenGL through a friendly Java API, such as Java 3D, OpenGL
for Java Technology (gl4java) and the Lightweight Java Game Library (LWJGL), but
the most robust, simple and easy-to-use API is JOGL. The reason is that JOGL is
supported by both Sun (the creators of Java) and SGI (the creators of OpenGL).
JOGL is a Java programming language binding for the OpenGL 3D graphics
API. It supports integration with the Java platform's AWT and Swing widget sets
while providing a minimal and easy-to-use API that handles many of the issues
associated with building multithreaded OpenGL applications. JOGL provides access
to the latest OpenGL routines (OpenGL 2.0 with vendor extensions) as well as
platform-independent access to hardware-accelerated off screen rendering. JOGL
also provides some of the most popular features introduced by other Java bindings
for OpenGL like GL4Java, LWJGL and Magician, including a composable pipeline
model which can provide faster debugging for Java-based OpenGL applications
than the analogous C program. JOGL differs from these libraries in that it merely
exposes the procedural OpenGL API via methods on a few classes, rather than
attempting to map OpenGL functionality onto the OOP paradigm [9].
The JOGL binding is itself written almost completely in the Java
programming language. Indeed, the majority of the JOGL code is auto-generated
from the OpenGL C header files via a conversion tool named GlueGen, which was
programmed specifically to facilitate the creation of JOGL. GlueGen parses the C
header files and then automatically creates the Java and JNI code necessary to
connect to those native libraries. This design decision has both its advantages and
disadvantages. The procedural and state machine nature of OpenGL is inconsistent
with the typical method of programming under Java, which is bothersome to many

programmers. However, the straightforward mapping of the OpenGL C API to Java
methods makes conversion of existing C applications and example code much
simpler. The thin layer of abstraction provided by JOGL makes runtime execution
quite efficient. Because most of the code is auto-generated, all updates to
OpenGL can be added quickly to JOGL [30].
3.2.2 Developing with JOGL
JOGL was designed for the most recent versions of the Java platform and, for
this reason, it supports only J2SE 1.4 and later. It also only supports true color (15
bits per pixel and higher) rendering and does not support color-indexed modes. It
was designed with New I/O (NIO) in mind and uses NIO internally in the
implementation.
To develop an application using JOGL, we need both jogl.jar and the
appropriate native library jar file (for example, jogl-natives-win32.jar). The jogl.jar
needs to be on the CLASSPATH for compiling and running code, while the native library
file or files also need to be on the java.library.path at run time. We can include the
files with our code and point to them directly with the -classpath and
-Djava.library.path arguments. This approach helps end users who may not
want, or may not be able, to add files to these directories.
The recommended distribution vehicle for applications using JOGL is Java
Web Start. JOGL-based applications do not even need to be signed; all that is
necessary is to reference the JOGL extension JNLP file. Because the JOGL jar files
are signed, an unsigned application can reference the signed JOGL library and
continue to run inside the sandbox. The users only need to launch Java Web Start
and download the client application the first time; the application is then cached
on the client machine and can later be launched offline.
JOGL also supports applets. The JOGLAppletInstaller is distributed
inside jogl.jar as a utility class in com.sun.opengl.util. This installer uses
some clever tricks to allow deployment of unsigned applets which use JOGL into
existing web browsers and JREs as far back as 1.4.2, which is the earliest version of
Java supported by JOGL. It requires that the developer host a local, signed copy of
jogl.jar and all of the jogl-natives jar files; the certificates must be the same on all
of these jars. Because in the release builds of JOGL all of these jar files are signed
by Sun Microsystems, the developer can deploy applets without needing any
certificates.
3.2.3 Using JOGL
JOGL provides two basic widgets into which OpenGL rendering can be
performed. The GLCanvas is a heavyweight AWT widget which supports hardware
acceleration and which is intended to be the primary widget used by applications.
The GLJPanel is a fully Swing-compatible lightweight widget which supports
hardware acceleration, but it is not as fast as the GLCanvas because it typically
reads back the frame buffer in order to draw it using Java2D. The GLJPanel is
intended to provide 100% correct Swing integration in circumstances where a
GLCanvas cannot be used.
Both the GLCanvas and GLJPanel implement a common interface called
GLAutoDrawable, so applications can switch between them with minimal code
changes. The GLAutoDrawable interface provides:
- access to the GL object for calling OpenGL routines
- a callback mechanism (GLEventListener) for performing OpenGL rendering
- a display() method for forcing OpenGL rendering to be performed synchronously
- AWT- and Swing-independent abstractions for getting and setting the size of the widget and adding and removing event listeners
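A minimal sketch of how these pieces fit together is shown below (assuming the JSR-231-era javax.media.opengl package names, which may differ between JOGL releases; an Animator or repaint logic would be added for continuous animation):

    import java.awt.Frame;
    import javax.media.opengl.GL;
    import javax.media.opengl.GLAutoDrawable;
    import javax.media.opengl.GLCanvas;
    import javax.media.opengl.GLEventListener;

    // Sketch: the smallest possible JOGL application skeleton.
    public class SimpleRenderer implements GLEventListener {

        public void init(GLAutoDrawable drawable) {
            GL gl = drawable.getGL();           // access to the GL object
            gl.glClearColor(0f, 0f, 0f, 1f);
        }

        public void display(GLAutoDrawable drawable) {
            GL gl = drawable.getGL();
            gl.glClear(GL.GL_COLOR_BUFFER_BIT | GL.GL_DEPTH_BUFFER_BIT);
            // ... issue the drawing commands for the current frame here ...
        }

        public void reshape(GLAutoDrawable drawable, int x, int y, int w, int h) {
            // adjust the viewport and projection when the widget is resized
        }

        public void displayChanged(GLAutoDrawable drawable,
                                   boolean modeChanged, boolean deviceChanged) {
        }

        public static void main(String[] args) {
            GLCanvas canvas = new GLCanvas();   // heavyweight AWT widget
            canvas.addGLEventListener(new SimpleRenderer());
            Frame frame = new Frame("JOGL sketch");
            frame.add(canvas);
            frame.setSize(400, 400);
            frame.setVisible(true);
        }
    }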
Applications implement the GLEventListener interface to perform
OpenGL drawing via callbacks. When the methods of the GLEventListener are
called, the underlying OpenGL context associated with the drawable is already