3
EVOLUTION OF VoIP SIGNALING
PROTOCOLS1
This chapter reviews the existing and emerging VoIP signaling and call control
protocols. In PSTN networks, ISUP (ISDN user part) and TCAP (transaction
capabilities application part) messages of the SS7 protocol [1] are commonly
used for call control and interworking of services.
The first generation (released in 1996) of VoIP signaling and media control
protocols, such as ITU-T’s H.225/H.245 (defined under ITU-T’s H.323 umbrella
protocol [2]), was intended to offer LAN-based real-time VoIP services. These
protocols already had the proper ingredients (such as support of ISUP
messaging for call control) to support interworking with PSTN networks as
well. Consequently, there was a flurry of networking activities to deliver
VoIP services in LAN or within enterprises and to offer long-haul (inter-LATA
and international) transport of VoIP. The latter is also known as cheap,
wireless-quality long-distance voice service over a wireline network using IP.
However, the telecom service providers found the following two problems with
version 1 of the H.323 protocol:
a. Many of the desired and advanced PSTN-domain call features and services
could not be easily implemented using H.323v1 because of its lack of
openness (i.e., all of the procedures are internally defined), and
b. Scalable implementation was neither feasible nor cost-effective because it
needed call-stateful proxies.
1 The ideas and viewpoints presented here belong solely to Bhumip Khasnabish, Massachusetts,
USA.
Implementing Voice over IP. Bhumip Khasnabish
Copyright © 2003 John Wiley & Sons, Inc.
ISBN: 0-471-21666-6
These problems motivated ITU-T to release the second version of H.323 in
1998. H.323v2 supports lightweight call setup (it runs over UDP instead of
using multiple TCP sessions per call) and declares many of the mandatory
features and protocols of H.323v1 to be optional [3]. But in 1999, IETF
released the first version of its session initiation protocol (SIP, RFC 2543,
since revised as RFC 3261), which is based on the Internet paradigm, the Web
protocol (i.e., HTTP), and well-defined semantics, for VoIP call control and
for creation and management of services (a superset of those in the PSTN
domain). In addition, SIP supports call-stateless proxies and allows
traversal of call states over many proxy hops [4,5]. These features make
scalable implementation of VoIP more feasible than was possible using H.323.
The race to catch up continued. ITU-T announced versions 3 and 4 of
H.323 and then certified H.323v4 in late 2000. H.323v4 supports the following
features: (a) extensive support of UDP and SCTP (defined later in this
chapter), with H.245 made optional; (b) enhanced support of security as
defined in H.235v2; (c) support of the H.323 URL for network-based presence
and instant messaging; and (d) support of tunneled signaling (ISUP, Q.SIG,
and so on), HTTP commands, and stimulus-based call control.
IETF is also working on extending the service creation, security, and call
routing features of SIP (RFCs 3261, 3262, 3263, 3264, 3265, and 3266). Some
of these features are (a) instant messaging and presence management, (b)
advanced call routing and messaging features, and (c) support of SIP/SDP/
RTP message traversal over network address translation (NAT) and firewall
devices.
In parallel to the above-mentioned activities related to H.323v2 (and
beyond) and SIPv2, researchers at Cisco, Level 3 Communications, and Tel-
cordia developed a call/media control architecture for the next-generation
(packet-based) network that supports both IP telephony and evolution of
PSTN from a monolithic system to one that supports distributed call pro-
cessing. That architecture enables physical separation of call control intelli-
gence that resides in the media gateway controller (MGC) from the media-
adaptation/translation gateways (MGs). It also recommends a protocol called
MGCP (media gateway control protocol, RFC 2705, 1999), which was the
result of a merger of SGCP (simple gateway control protocol) and IPDC (IP
device control) protocol. MGCP supports PSTN evolution by allowing inter-
working with circuit-switched networks and devices (analog and digital POTS
phones) via the following predefined endpoints: (a) access and residential GWs,
and integrated network access server and VoIP GWs; (b) GWs supporting
ISUP and multifrequency-type trunks; and (c) announcement servers and net-
work access servers.
To provide seamless interoperability of call and service control between
PSTN and next-generation (packet-based) network domains, the MGC needs to
exchange control messages reliably and securely with the SS7 network via the
signaling gateway (SG); it can use the SCTP protocol (RFC 2960, discussed
later) for this purpose. Note that in the PSTN network, the call control and
signaling intelligence reside in the SS7 network.
MGCP is currently enjoying the widespread approval of cable TV (CATV)-
based VoIP service providers (e.g., see PKT-SP-EC-MGCP-I04-011221.pdf at
www.packetcable.com/specifications/). Both IETF and ITU-T’s study group 9
(Integrated Broadband Cable and Television Networks Study Group) are con-
sidering approval of the extensions of MGCP (MGCP v2, RFC 2705-bis, etc.).
MGCP is also evolving into ITU-T’s H.248 recommendation [6,7] and IETF’s
Megaco (media gateway control) protocol (RFCs 3054, 3015, and 2805).
SWITCH-BASED VERSUS SERVER-BASED VoIP
For switch-based VoIP services, interworking with the existing PSTN switches,
networks, and terminals is desirable. In such scenarios, H.225 and H.245 are
well-established signaling and media control protocols under the H.323
umbrella protocol. Note that H.323 defines IP-PSTN GWs, call controller or
GK, terminal equipment (TE), and multipoint control units (MCUs) as the
elements of the system architecture. H.248/Megaco appears to be the most
promising emerging protocol that can complement both H.323 and SIP when
SIP/H.323 is used for communication between TEs, and between TE and MG
or GW.
For server-based VoIP services, the intended network consists of servers and
IP routers. In these scenarios, SIP and its many variants are most useful. For
large networks, IETF suggests the use of the TRIP protocol (telephony routing
over IP, RFC 2871, a work in progress in IETF’s IPTel WG) to locate the
server to which a call should be routed; TRIP defines telephony routing in a
fashion similar to that of BGP. For routing a call from a SIP or IP phone to
a PSTN terminal (analog or digital POTS phone), one must use the IP-PSTN GW,
a call controller, and an ENUM server. ENUM (electronic numbering, RFC 2916)
converts the E.164 telephony address to an IP address and vice versa using an
enhanced domain name system (DNS) server.
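The first step of an ENUM lookup can be sketched in a few lines. RFC 2916
forms a DNS name from an E.164 number by keeping only the digits, reversing
them, separating them with dots, and appending the e164.arpa suffix. The
function name and the sample telephone number below are illustrative
assumptions; the sketch builds the name only and does not perform the DNS
query that follows.

```python
def e164_to_enum_domain(number: str, suffix: str = "e164.arpa") -> str:
    """Build the ENUM domain name for an E.164 number (name construction only).

    Hypothetical helper for illustration; it does not query DNS.
    """
    # Keep only the digits, dropping '+', '-', spaces, and so on.
    digits = [ch for ch in number if ch.isdigit()]
    if not digits:
        raise ValueError("no digits found in E.164 number")
    # Reverse the digits, join them with dots, and append the ENUM suffix.
    return ".".join(reversed(digits)) + "." + suffix


if __name__ == "__main__":
    # The number is a made-up example.
    print(e164_to_enum_domain("+1-555-123-4567"))
    # prints 7.6.5.4.3.2.1.5.5.5.1.e164.arpa
```

A resolver would then query this name in the enhanced DNS server to obtain
the address information for the called party.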
H.225 AND H.245 PROTOCOLS
Although there are a large number of protocols and standards for signaling and
control of real-time VoIP calls, ITU-T’s H.22x and H.32x recommendations
(details are available at www.itu.int/itu-t/) are by far the most widely deployed
first-generation VoIP protocols, especially for international VoIP calls. The key
network elements for operation of the H.323 protocol are the IP-PSTN media
gateway (MG), a call controller or GK, a multipoint control unit (MCU), and
TEs. All of these elements are connected to form the zone shown in Figure 3-1,
using a LAN where the quality of transmission cannot be controlled.
The H.225 standard defines call setup based on ITU-T’s Q.931 protocol (a
variation of the ISDN user network interface layer-3 specifications for basic
call control) and RAS (registration, admission/administration, and status)
messaging from a GW or end device/unit or TE to a GK. RAS messages are carried over
UDP packets; these contain a number of request/reply (confirmation or reject)
messages exchanged between the TE/GW and the GK. TEs can use RAS for
discovering a GK or to register/deregister with a GK. A GK uses the RAS
messages to monitor the endpoints within a zone and to manage the associated
resources.
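The request/reply pattern of RAS can be illustrated with a toy gatekeeper.
This is a minimal sketch, not an H.225 implementation: the message names
(GRQ/GCF for discovery, RRQ/RCF for registration, URQ/UCF/URJ for
deregistration, and ARQ/ACF/ARJ for admission) follow the RAS
request/confirm/reject convention, while the class name, the single
`max_calls` admission rule, and the endpoint labels are assumptions made for
illustration. Real RAS messages are binary-encoded and carried over UDP.

```python
class ToyGatekeeper:
    """Toy model of a GK answering RAS requests for one zone (illustrative)."""

    def __init__(self, max_calls: int):
        self.max_calls = max_calls   # hypothetical zone resource limit
        self.registered = set()      # endpoints currently registered
        self.active_calls = 0        # calls admitted so far

    def handle(self, message: str, endpoint: str) -> str:
        """Return the confirm or reject reply for one RAS request."""
        if message == "GRQ":         # gatekeeper discovery
            return "GCF"
        if message == "RRQ":         # registration
            self.registered.add(endpoint)
            return "RCF"
        if message == "URQ":         # deregistration
            if endpoint in self.registered:
                self.registered.discard(endpoint)
                return "UCF"
            return "URJ"
        if message == "ARQ":         # admission to place a call
            if endpoint not in self.registered:
                return "ARJ"
            if self.active_calls >= self.max_calls:
                return "ARJ"
            self.active_calls += 1
            return "ACF"
        raise ValueError(f"unsupported RAS message: {message}")


if __name__ == "__main__":
    gk = ToyGatekeeper(max_calls=1)
    for msg in ("GRQ", "RRQ", "ARQ", "ARQ", "URQ"):
        print(msg, "->", gk.handle(msg, "gw1"))
```

In this sketch the second ARQ is rejected because the assumed zone limit of
one call has been reached, which mirrors how a GK manages the resources of
its zone.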
H.245 defines in-band media and conference control protocols for call
parameter exchange and negotiation. These parameters include audiovisual
mode and channel, bit rate, data integrity, delay, and so on. They provide a
set of control functions for multiparty multimedia conferencing, and can also
determine the master/slave relationship between parties to open/close logical
channels between the endpoints. In Figure 2-7 I showed the functions and rel-
ative positions of H.225 and H.245 with reference to ISO’s open system inter-
connection (OSI) stack [1]. Figure 3-2 shows the protocol sequence for estab-
lishment of a real-time H.323 voice communication session from one PSTN
phone to another over an IP network. Note that in this diagram, ARQ stands
for Admission Request, ACF for Admission Confirm, LRQ for Location
Request, and LCF for Location Confirm. Ingress and egress gateways are
indicated by IGW and EGW, respectively. Ingress and egress gatekeepers are
indicated by IGK and EGK, respectively.
Figure 3-1 Network elements and their interconnection using a LAN in an H.323 zone.
Note that the PBX (PSTN) is outside the scope of H.323 and is shown to demonstrate
the interoperability of H.323 with PSTN.
SESSION INITIATION PROTOCOL (SIP)
SIP (IETF’s RFC 3261) refers to a suite of call setup and media mapping
protocols for multimedia (including voice) communications over a wide area
network (WAN). It includes definitions of the SIP, Session Announcement
Protocol (SAP), and Session Description Protocol (SDP; RFCs 3266, 3108, and
2327).
SIP supports flexible addressing. The called party’s address can be an e-mail
address, a URL, or ITU-T’s E.164-based telephone number. It uses a simple
request-response protocol with syntax and semantics that are very similar to
those of the HTTP protocol used in the World Wide Web (WWW). As the
name suggests, SIP is used to initiate a session between users, but it does so in a
lightweight fashion. This is because SIP performs location service, call partici-
pant management, and call establishment but not resource reservation for the
circuit or tunnel that is to be used for transmission of information. These
characteristics of SIP appear to be very similar to the features of the H.225
protocol. SAP is used along with SDP to announce the session descriptions
proactively (via UDP packets) to the users.
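The HTTP-like text format of a SIP request can be sketched as follows. This
is an illustrative assumption rather than a complete implementation: the
function name, the addresses, and the fixed tag and branch values are made
up, and a real user agent would generate unique values and handle
retransmission and responses.

```python
def build_invite(caller: str, callee: str, via_host: str,
                 call_id: str, body: str = "") -> str:
    """Assemble a minimal SIP INVITE request as text (illustrative only)."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",                     # request line
        f"Via: SIP/2.0/UDP {via_host};branch=z9hG4bK-demo",  # fixed demo branch
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag=demo-tag",                # fixed demo tag
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
    ]
    if body:
        lines.append("Content-Type: application/sdp")
    lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    # Headers end with an empty line, as in HTTP; lines are CRLF-terminated.
    return "\r\n".join(lines) + "\r\n\r\n" + body


if __name__ == "__main__":
    # All addresses are hypothetical.
    print(build_invite("alice@example.com", "bob@example.org",
                       "pc33.example.com", "a84b4c76e66710"))
```

The request line and header layout parallel an HTTP request, and the called
party here is addressed in the e-mail-like form described above.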
SDP includes information about the media streams, attributes of the
receiver’s capability, destination address(es) for unicast or multicast, UDP port,
payload type, and so on. The receiver’s capability may include a list of en-
coders that the sender can use during a session. These attributes can also be
renegotiated dynamically during a session to reduce the probability of conges-
tion. These characteristics of SDP appear to be very similar to the features of
the H.245 protocol.
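A session description is a short list of `type=value` text lines. The sketch
below parses a made-up audio description and pulls out the items listed
above: the destination address, the UDP port, and the payload types with
their encoders. The sample addresses, port, and function name are
assumptions for illustration; only the `c=`, `m=`, and `a=rtpmap:` lines are
interpreted, and everything else is ignored.

```python
SAMPLE_SDP = """v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=VoIP call
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""


def parse_sdp_media(sdp: str) -> dict:
    """Extract the connection address and media streams (partial parser)."""
    address = None
    media = []
    for line in sdp.splitlines():
        if len(line) < 2 or line[1] != "=":
            continue                      # skip anything not of the form x=...
        key, value = line[0], line[2:]
        if key == "c":                    # c=<nettype> <addrtype> <address>
            address = value.split()[2]
        elif key == "m":                  # m=<media> <port> <proto> <fmt> ...
            kind, port, proto, *formats = value.split()
            media.append({
                "kind": kind,
                "port": int(port),
                "proto": proto,
                "payload_types": [int(f) for f in formats],
                "codecs": {},
            })
        elif key == "a" and value.startswith("rtpmap:") and media:
            # a=rtpmap:<payload type> <encoding>/<clock rate>
            pt, codec = value[len("rtpmap:"):].split(None, 1)
            media[-1]["codecs"][int(pt)] = codec
    return {"address": address, "media": media}


if __name__ == "__main__":
    print(parse_sdp_media(SAMPLE_SDP))
```

Here the description offers two encoders (payload types 0 and 8) for a
single audio stream, which corresponds to the capability list from which the
sender chooses.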
Figure 3-2 Message exchange for setting up an H.323-based VoIP session from one
PSTN phone to another over an IP network.
SIP architectural elements include (a) user agents (UA): client (UAC) or
server (UAS) and (b) network servers: redirect, proxy, or registrar. The end
device in SIP includes both the UAC and the UAS; hence, a call participant
(end device) may either generate or receive requests. SIP requests can
traverse many proxy servers. Each proxy server may receive a request and then
forward it to the next-hop server, which may be another proxy server or the
destination UA server. A SIP server may act as a redirect server as well. A
redirect server informs the client about the next-hop server so that the
client can contact it directly.
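The difference between proxy and redirect behavior can be sketched with a toy
model. Everything here is an illustrative assumption: the location table, the
addresses, and the function names are hypothetical, and servers are modeled
as plain function calls rather than network messages. The status strings
(200 OK, 302 Moved Temporarily, 404 Not Found) are standard SIP responses.

```python
# Hypothetical location table: public address -> current contact address.
LOCATIONS = {"bob@example.org": "sip:bob@host17.example.org"}


def user_agent_server(contact: str) -> str:
    """Destination UAS: accepts any request that reaches it."""
    return "200 OK"


def proxy_server(request_uri: str, next_hop=user_agent_server) -> str:
    """Proxy: forwards the request itself and relays the response back."""
    contact = LOCATIONS.get(request_uri)
    if contact is None:
        return "404 Not Found"
    return next_hop(contact)


def redirect_server(request_uri: str):
    """Redirect server: returns the next-hop address instead of forwarding."""
    contact = LOCATIONS.get(request_uri)
    if contact is None:
        return ("404 Not Found", None)
    return ("302 Moved Temporarily", contact)


def call_via_redirect(request_uri: str) -> str:
    """Client side: follow a redirect by contacting the next hop directly."""
    status, contact = redirect_server(request_uri)
    if status.startswith("302"):
        return user_agent_server(contact)
    return status


if __name__ == "__main__":
    print(proxy_server("bob@example.org"))
    print(redirect_server("bob@example.org"))
    print(call_via_redirect("bob@example.org"))
```

With a proxy, the client sends one request and the server does the
forwarding; with a redirect server, the client receives the next-hop address
and sends a second request itself.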
Figure 3-3 shows the message exchange for a SIP-based call setup. Note that
the number of messages that need to be exchanged to set up a SIP session is
smaller than that for an H.323 session (Fig. 3-2). As of 2001, both software-
based (running in a PC) and hardware-based SIP and IP phones were available.
For call routing over a large IP network, SIP may use the TRIP protocol
(telephony routing over IP, RFC 2871, a work in progress in IETF’s IPTel WG)
to locate the server to which a call should be routed. For routing a call to
a PSTN terminal (POTS phone), it may be necessary to use the ENUM (electronic
numbering, RFC 2916) protocol. ENUM converts an E.164 telephony address to an
IP address (using an enhanced DNS server) and vice versa.
Figure 3-3 Message exchange for setting up a SIP-based voice communication session
from one IP or SIP phone to another.