Chapter 4: The Socket Library
The socket library is a low-level programmer's interface that allows clients
to set up a TCP/IP connection and communicate directly with servers. Servers
use sockets to listen for incoming connections, and clients use sockets to
initiate transactions on the port that the server is listening on.
Do you really need to know about sockets? Possibly not. In Chapter 5, The
LWP Library, we cover LWP, a library that includes a simple framework for
connecting to and communicating over the Web, making knowledge of the
underlying network communication superfluous. If you plan to use LWP you
can probably skip this chapter for now (and maybe forever).
Compared to using something like LWP, working with sockets is a tedious
undertaking. While it gives you the power to say whatever you want through
your network connection, you need to be really careful about what you say;
if it's not fully compliant with the HTTP specs, the web server won't
understand you! Perhaps your web client works with one web server but not
another. Or maybe your web client works most of the time, but not in special
cases. Writing a fully compliant application could become a real headache.
A programmer's library like LWP will figure out which headers to use, the
parameters with each header, and special cases like dealing with HTTP
version differences and URL redirections. With the socket library, you do all
of this on your own. To some degree, writing a raw client with the socket
library is like reinventing the wheel.
However, some people may be forced to use sockets because LWP is
unavailable, or because they just prefer to do things by hand (the way some
people prefer to make spaghetti sauce from scratch). This chapter covers the
socket calls that you can use to establish HTTP connections independently
of LWP. At the end of the chapter are some extended examples using
sockets that you can model your own programs on.
A Typical Conversation over Sockets
The basic idea behind sockets (as with all TCP-based client/server services)
is that the server sits and waits for connections over the network to the port
in question. When a client connects to that port, the server accepts the
connection and then converses with the client using whatever protocol they
agree on (e.g., HTTP, NNTP, SMTP, etc.).
Initially, the server uses the socket( ) system call to create the socket, and the
bind( ) call to assign the socket to a particular port on the host. The server
then uses the listen( ) and accept( ) routines to establish communication on
that port.
On the other end, the client also uses the socket( ) system call to create a
socket, and then the connect( ) call to initiate a connection associated with
that socket on a specified remote host and port.
The server uses the accept( ) call to intercept the incoming connection and
initiate communication with the client. Now the client and server can each
use sysread( ) and syswrite( ) calls to speak HTTP, until the transaction is
over.
Instead of using sysread( ) and syswrite( ), you can also just read from and
write to the socket as you would any other file handle (e.g., $line = <FH>;
to read, or print FH $data; to write).
Finally, either the client or server uses the close( ) or shutdown( ) routine to
end the connection.
Figure 4-1 shows the flow of a sockets transaction.
Figure 4-1. Socket calls
Using the Socket Calls
The socket library is part of the standard Perl distribution. Include the socket
module like this:
use Socket;
Table 4-1 lists the socket calls available using the socket library in Perl.
Table 4-1: Socket Calls
Function      Usage                    Purpose
socket( )     Both client and server   Create a generic I/O buffer in the operating system
connect( )    Client only              Establish a network connection and associate it with the I/O buffer created by socket( )
sysread( )    Both client and server   Read data from the network connection
syswrite( )   Both client and server   Write data to the network connection
close( )      Both client and server   Terminate communication
bind( )       Server only              Associate a socket buffer with a port on the machine
listen( )     Server only              Wait for incoming connection from a client
accept( )     Server only              Accept the incoming connection from client
Conceptually, think of a socket as a "pipe" between the client and server.
Data written to one end of the pipe appears on the other end of the pipe. To
create a pipe, call socket( ). To write data into one end of the pipe, call
syswrite( ). To read on the other end of the pipe, call sysread( ). Finally, to
dispose of the pipe and cease communication between the client and server,
call close( ).
Since this book is primarily about client programming, we'll talk about the
socket calls used by clients first, followed by the calls that are only used on
the server end. Although we're only writing client programs, we cover both
client and server functions, for the sake of showing how the library fits
together.
Initializing the Socket
Both the client and server use the socket( ) function to create a generic
"pipe" or I/O buffer in the operating system. The socket( ) call takes several
arguments, specifying which file handle to associate with the socket, what
the network protocol is, and whether the socket should be stream-oriented or
record-oriented. For HTTP transactions, sockets are stream-oriented
connections running TCP over IP, so HTTP-based applications must
associate these characteristics with a newly created socket.
For example, in the following line, the SH file handle is associated with the
newly created socket. PF_INET indicates the Internet Protocol while
getprotobyname('tcp') indicates that the Transmission Control Protocol
(TCP) runs on top of IP. Finally, SOCK_STREAM indicates that the socket
is stream-oriented, as opposed to record-oriented:
socket(SH, PF_INET, SOCK_STREAM, getprotobyname('tcp')) || die $!;
If the socket call fails, the program should die( ) using the error message
found in $!.
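As a minimal sketch of the same call under use strict, the version below keeps the socket in a lexical variable instead of the bareword handle SH. The helper name new_tcp_socket( ) and the variable names are assumptions made for this example, not part of the socket library.

```perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: create an unconnected, stream-oriented TCP socket.
sub new_tcp_socket {
    # In scalar context, getprotobyname() returns the protocol number.
    my $proto = getprotobyname('tcp');
    socket(my $sock, PF_INET, SOCK_STREAM, $proto)
        or die "socket: $!";
    return $sock;
}

my $sock = new_tcp_socket();

# A socket is an ordinary file handle, so it has a file descriptor.
print "socket created on descriptor ", fileno($sock), "\n";
close($sock) or die "close: $!";
```

Nothing has been sent over the network at this point; the socket is not yet associated with any host or port.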
Establishing a Network Connection
Calling connect( ) attempts to contact a server at a desired host and port. The
configuration information is stored in a data structure that is passed to
connect( ).
my $sin = sockaddr_in(80, inet_aton('www.ora.com'));
connect(SH, $sin) || die $!;
The Socket::sockaddr_in( ) routine accepts a port number as the first
parameter and a packed 32-bit IP address as the second parameter.
Socket::inet_aton( ) translates a hostname string or dotted decimal string
to a packed 32-bit IP address; it returns undef if the hostname cannot be
resolved. Socket::sockaddr_in( ) returns a data structure that is then passed
to connect( ). From there, connect( ) attempts to establish a network
connection to the specified server and port. Upon successful connection, it
returns true. Otherwise, it returns false and sets $! to an error message.
Use die( ) after connect( ) to stop the program and report any errors.
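The steps above can be sketched as a small routine that checks each call separately, so that a failed hostname lookup is reported differently from a refused connection. The name tcp_connect( ) is hypothetical. To keep the example runnable without network access, it connects to a throwaway listener on the loopback address rather than to a real web server; that listener is an assumption of the demonstration.

```perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: connect to $host:$port and return the socket.
sub tcp_connect {
    my ($host, $port) = @_;

    # inet_aton() returns undef when the name cannot be resolved.
    my $ip = inet_aton($host);
    die "cannot resolve host '$host'\n" unless defined $ip;

    my $proto = getprotobyname('tcp');
    socket(my $sock, PF_INET, SOCK_STREAM, $proto)
        or die "socket: $!\n";

    # Called with two arguments in scalar context, sockaddr_in() packs them.
    my $sin = sockaddr_in($port, $ip);
    connect($sock, $sin)
        or die "connect to $host:$port: $!\n";
    return $sock;
}

# Demonstration only: a local listener (port 0 = any free port).
my $proto = getprotobyname('tcp');
socket(my $listener, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
my $local = sockaddr_in(0, INADDR_LOOPBACK);
bind($listener, $local) or die "bind: $!";
listen($listener, 5)    or die "listen: $!";

# In list context, sockaddr_in() unpacks an address into (port, ip).
my ($port) = sockaddr_in(getsockname($listener));

my $sock = tcp_connect('127.0.0.1', $port);
print "connected to 127.0.0.1:$port\n";
close($sock);
close($listener);
```

For a real client, the call would look like tcp_connect('www.ora.com', 80), as in the text.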
Writing Data to a Network Connection
To write to the file handle associated with the open socket connection, use
the syswrite( ) routine. The first parameter is the file handle to write the
data to. The data to write is specified as the second parameter. Finally, the
third parameter is the length of the data to write. For example:
$buffer="hello world!";
syswrite(FH, $buffer, length($buffer));
An easier way to communicate is with print. When used with an autoflushed
file handle, the result is the same as calling syswrite( ). The print command
is more flexible than syswrite( ) because the programmer can specify more
complex string expressions that are difficult to specify in syswrite( ). Using
print, the previous example looks like this:
select(FH);
$| = 1;          # set $| to non-zero to make the selected handle autoflushed
select(STDOUT);  # restore the default output handle
print FH "hello world!";
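One detail worth a sketch: syswrite( ) returns the number of bytes actually written, which can be less than the length requested. The loop below keeps writing until the whole string has been sent. The helper name write_all( ) is hypothetical, and socketpair( ) is used here only as a stand-in for a real network connection so the example runs on its own.

```perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: call syswrite() until all of $data has been sent.
sub write_all {
    my ($fh, $data) = @_;
    my $length = length($data);
    my $offset = 0;
    while ($offset < $length) {
        # The fourth argument is the position in $data to start from.
        my $written = syswrite($fh, $data, $length - $offset, $offset);
        die "syswrite: $!" unless defined $written;   # undef means error
        $offset += $written;
    }
    return $offset;
}

# Stand-in for a network connection: two connected local handles.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

my $request = "GET / HTTP/1.0\r\n\r\n";
my $sent = write_all($client, $request);
print "sent $sent bytes\n";

# Read on the other end of the "pipe" to confirm what arrived.
sysread($server, my $received, 200);
print "peer received the full request\n" if $received eq $request;
```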
Reading Data From a Network Connection
To read from the file handle associated with the open socket connection, use
the sysread( ) routine. In the first parameter, a file handle is given to specify
the connection to read from. The second parameter specifies a scalar
variable to store the data that was read. Finally, the third parameter specifies
the maximum number of bytes you want to read from the connection. The
sysread( ) routine returns the number of bytes actually read:
sysread(FH, $buffer, 200);    # read at most 200 bytes from FH
If you want to read a line at a time from the file handle, you can also use the
angle operator on it, like so:
$buffer = <FH>;
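Because sysread( ) may return fewer bytes than requested, reading a whole response usually means calling it in a loop. It returns 0 at end of file and undef on error. The sketch below assumes a hypothetical helper, read_all( ), and again uses socketpair( ) as a local stand-in for a server connection. Note that sysread( ) bypasses Perl's buffering, so it is safest not to mix it with the angle operator on the same handle.

```perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: read from $fh until the peer closes the connection.
sub read_all {
    my ($fh) = @_;
    my $data = '';
    while (1) {
        # Passing length($data) as the offset appends to what we have.
        my $count = sysread($fh, $data, 200, length($data));
        die "sysread: $!" unless defined $count;   # undef signals an error
        last if $count == 0;                       # 0 signals end of file
    }
    return $data;
}

# Stand-in for a network connection: two connected local handles.
socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# A response longer than one 200-byte read.
my $response = "HTTP/1.0 200 OK\r\n\r\n" . ("x" x 500);
defined syswrite($writer, $response) or die "syswrite: $!";
close($writer);    # closing the writer lets the reader see end of file

my $got = read_all($reader);
print "read ", length($got), " bytes\n";
```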
Closing the Connection
After the network transaction is complete, close( ) disconnects the network
connection.
close(FH);
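Where close( ) ends the connection in both directions, shutdown( ) can end just one. The sketch below is an illustration under assumptions: it uses socketpair( ) as a local stand-in for a client/server connection, and the "ping"/"pong" messages are invented for the example. The client stops writing with shutdown( ), which lets the server see end of file, and then still reads the server's reply.

```perl
use strict;
use warnings;
use Socket;

# Stand-in for a network connection: two connected local handles.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# Client: send the request, then signal "no more data" (1 = stop writing).
defined syswrite($client, "ping") or die "syswrite: $!";
shutdown($client, 1) or die "shutdown: $!";

# Server: read until end of file, then answer on the still-open direction.
my $request = '';
while (sysread($server, my $chunk, 200)) {
    $request .= $chunk;
}
defined syswrite($server, "pong for $request") or die "syswrite: $!";
close($server);

# Client: the read side still works after the half-close.
sysread($client, my $reply, 200);
print "$reply\n";
close($client);
```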
Server Socket Calls
The following functions set the socket in server mode and map a client's
incoming request to a file handle. After a client request has been accepted,
all subsequent communication with the client is referenced through the file
handle with sysread( ) and syswrite( ), as described earlier.
Binding to the Port
A sockets-based server application first creates the socket as follows:
my $proto = getprotobyname('tcp');
socket(F, PF_INET, SOCK_STREAM, $proto) || die $!;
Next, the program calls bind( ) to associate the socket with a port number on
the machine. If another program is already using the port, bind( ) returns a
false (zero) value. Here, we use sockaddr_in( ) to identify the port for
bind( ). (We use port 80, the traditional port for HTTP. On most Unix systems,
binding to a port below 1024 requires superuser privileges.)
my $sin = sockaddr_in(80,INADDR_ANY);
bind(F,$sin) || die $!;
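To experiment with bind( ) without superuser privileges, a sketch like the following can help. It makes two assumptions that differ from the text: it binds to the loopback address rather than INADDR_ANY, and it asks for port 0, which lets the operating system choose any free port. It then tries to bind a second socket to the same port to show bind( ) returning false.

```perl
use strict;
use warnings;
use Socket;

my $proto = getprotobyname('tcp');
socket(my $server, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";

# Port 0 asks the OS for any free port; loopback keeps the demo local.
my $sin = sockaddr_in(0, INADDR_LOOPBACK);
bind($server, $sin) or die "bind: $!";

# In list context, sockaddr_in() unpacks an address into (port, ip).
my ($port, $ip) = sockaddr_in(getsockname($server));
print "bound to ", inet_ntoa($ip), ":$port\n";

# A second socket cannot bind to the port while the first one holds it.
socket(my $second, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
my $same = sockaddr_in($port, INADDR_LOOPBACK);
my $second_ok = bind($second, $same);
print "second bind failed: $!\n" unless $second_ok;

close($second);
close($server);
```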
Waiting for a Connection
The listen( ) function tells the operating system that the server is ready to
accept incoming network connections on the port. The first parameter is the
file handle of the socket to listen to. In the event that multiple client
programs are connecting to the port at the same time, a queue of network
connections is maintained by the operating system. The queue length is
specified in the second parameter:
listen(F, $length) || die $!;
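The queue can be observed directly. In the sketch below, which assumes a loopback listener on an OS-chosen port and uses the Socket constant SOMAXCONN as the queue length, three clients connect successfully before the server has accepted anyone; the operating system holds them in the queue.

```perl
use strict;
use warnings;
use Socket;

my $proto = getprotobyname('tcp');

# Server: socket, bind (port 0 = any free port), then listen.
socket(my $server, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
my $sin = sockaddr_in(0, INADDR_LOOPBACK);
bind($server, $sin) or die "bind: $!";

# SOMAXCONN is the largest queue length the system supports.
listen($server, SOMAXCONN) or die "listen: $!";
my ($port) = sockaddr_in(getsockname($server));

# Clients: each connect() completes even though accept() was never called.
my @clients;
for my $n (1 .. 3) {
    socket(my $client, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
    my $dest = sockaddr_in($port, INADDR_LOOPBACK);
    connect($client, $dest) or die "connect $n: $!";
    push @clients, $client;
}
print scalar(@clients), " clients connected and queued before any accept()\n";

close($_) for @clients;
close($server);
```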
Accepting a Connection
The accept( ) function waits for an incoming request to the server. For
parameters, accept( ) uses two file handles. The one we've been dealing with
so far is a generic file handle associated with the socket. In the above
example code, we've called it F. This is passed in as the second parameter.
The first parameter is a file handle that accept( ) will associate with a
specific network connection.
accept(FH,F) || die $!;
So when a client connects to the server, accept( ) associates the client's
connection with the file handle passed in as the first parameter. The second
parameter, F, still refers to a generic socket that is connected to the
designated port and is not specifically connected to any clients.
You can now read and write to the file handle to communicate with the
client. In this example, the file handle is FH. For example (HTTP lines end
with a carriage return and line feed):
print FH "HTTP/1.0 404 Not Found\r\n";
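Putting the server calls together with the client calls, the following sketch runs one complete exchange inside a single process. It is an illustration under assumptions, not a real server: it listens on the loopback address with an OS-chosen port, handles exactly one connection, and the request path /missing is invented for the example. On success, accept( ) returns the packed address of the client, which sockaddr_in( ) can unpack.

```perl
use strict;
use warnings;
use Socket;

my $proto = getprotobyname('tcp');

# Server side: socket, bind, listen (port 0 = any free port; demo only).
socket(my $listener, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
my $sin = sockaddr_in(0, INADDR_LOOPBACK);
bind($listener, $sin) or die "bind: $!";
listen($listener, 5)  or die "listen: $!";
my ($port) = sockaddr_in(getsockname($listener));

# Client side: connect and send a request.
socket(my $client, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
my $dest = sockaddr_in($port, INADDR_LOOPBACK);
connect($client, $dest) or die "connect: $!";
defined syswrite($client, "GET /missing HTTP/1.0\r\n\r\n")
    or die "syswrite: $!";

# Server side: $conn talks to this client; $listener keeps listening.
my $peer = accept(my $conn, $listener) or die "accept: $!";
my ($peer_port, $peer_ip) = sockaddr_in($peer);
sysread($conn, my $request, 200);
defined syswrite($conn, "HTTP/1.0 404 Not Found\r\n\r\n")
    or die "syswrite: $!";
close($conn);

# Client side: read the reply.
sysread($client, my $reply, 200);
print "client ", inet_ntoa($peer_ip), ":$peer_port received: $reply";

close($client);
close($listener);
```

A real server would call accept( ) in a loop and typically handle each connection in a separate process, so that one slow client does not block the others.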
Client Connection Code
The following Perl function encapsulates all the code needed to
establish a network connection to a server. As input, open_TCP( ) requires a
file handle as a first parameter, a hostname or dotted decimal IP address as
the second parameter, and a port number as the third parameter. Upon
successfully connecting to the server, open_TCP( ) returns 1. Otherwise, it
returns undef upon error.