Chapter 4: The Socket Library (Part 2)
Now we wait for a response from the server. We read in the response and
selectively echo it, checking the $all, $response, $header, and $data
variables to decide which parts of the reply the user wants to see:
# get the HTTP response line
my $the_response=<F>;
print $the_response if ($all || defined $response);
# get the header data
while(<F>=~ m/^(\S+):\s+(.+)/) {
print "$1: $2\n" if ($all || defined $header);
}
# get the entity body
if ($all || defined $data) {
print while (<F>);
}
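If you want to try this logic without a live server, the minimal sketch below runs the same three steps against a canned reply held in a string. It assumes Perl 5.8 or later, which can open a filehandle on a scalar. show_reply() is a hypothetical helper and not part of hcat. Instead of printing, it collects the selected text and returns it. It also strips the carriage return from header lines, because servers usually end lines with CRLF. The flags are passed here as plain 0/1 values, so the sketch tests them for truth rather than with defined.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of hcat): same status/header/body steps,
# but reading from any filehandle and returning the selected text.
sub show_reply {
    my ($fh, $response, $header, $data) = @_;
    # with no flags set, show everything, as hcat does
    my $all = !($response || $header || $data);
    my $out = '';

    # status line, e.g. "HTTP/1.0 200 OK"
    my $status = <$fh>;
    $out .= $status if ($all || $response);

    # header lines; the blank line after the headers fails the match,
    # which ends the loop (and consumes that blank line)
    while (defined(my $line = <$fh>)) {
        $line =~ s/\r?\n$//;            # tolerate CRLF line endings
        last unless $line =~ m/^(\S+):\s+(.+)/;
        $out .= "$1: $2\n" if ($all || $header);
    }

    # whatever is left is the entity body
    if ($all || $data) {
        while (defined(my $line = <$fh>)) { $out .= $line; }
    }
    return $out;
}

# a canned reply standing in for the socket
my $reply = "HTTP/1.0 200 OK\r\n"
          . "Content-type: text/plain\r\n"
          . "Content-length: 6\r\n"
          . "\r\n"
          . "hello\n";

open(my $fh, '<', \$reply) or die "can't open in-memory handle: $!";
print show_reply($fh, 0, 1, 0);   # like -H: prints only the two header lines
close($fh);
```

Because show_reply() takes a filehandle, the same routine could be handed the socket handle. Taking a handle instead of hard-coding F is our design choice and not something hcat does.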
The full source code looks like this:
#!/usr/local/bin/perl -w
# socket based hypertext version of UNIX cat
use strict;
use Socket; # include Socket module
require 'tcp.pl'; # file with open_TCP routine
require 'web.pl'; # file with parse_URL routine
use vars qw($opt_h $opt_H $opt_r $opt_d);
use Getopt::Std;
# parse command line arguments
getopts('hHrd');
# print out usage if needed
if (defined $opt_h || $#ARGV<0) { help(); }
# if it wasn't an option, it was a URL
while($_ = shift @ARGV) {
hcat($_, $opt_r, $opt_H, $opt_d);
}
# Subroutine to print out usage information
sub usage {
print "usage: $0 -rhHd URL(s)\n";
print " -h help\n";
print " -r print out response\n";
print " -H print out header\n";
print " -d print out data\n\n";
exit(-1);
}
# Subroutine to print out help text along with usage information
sub help {
print "Hypertext cat help\n\n";
print "This program prints out documents on a remote web server.\n";
print "By default, the response code, header, and data are printed\n";
print "but can be selectively printed with the -r, -H, and -d options.\n\n";
usage();
}
# Given a URL, print out the data there
sub hcat {
# grab parameters
my ($full_url, $response, $header, $data)=@_;
# if none of -r, -H, or -d was given, print response, header, and data
my $all = !($response || $header || $data);
# if the URL isn't a full URL, assume that it is an http request
$full_url="http://$full_url" if ($full_url !~ m/(\w+):\/\/([^\/:]+)(:\d*)?([^#]*)/);
# break up URL into meaningful parts
my @the_url = parse_URL($full_url);
# defined(@array) is a fatal error in modern Perl; test the array directly
if (!@the_url) {
print "Please use fully qualified valid URL\n";
exit(-1);
}
# we're only interested in HTTP URL's
return if ($the_url[0] !~ m/http/i);
# connect to server specified in 1st parameter
if (!defined open_TCP('F', $the_url[1], $the_url[2])) {
print "Error connecting to web server: $the_url[1]\n";
exit(-1);
}
# request the path of the document to get
print F "GET $the_url[3] HTTP/1.0\n";
print F "Accept: */*\n";
print F "User-Agent: hcat/1.0\n\n";
# print out server's response.
# get the HTTP response line
my $the_response=<F>;
print $the_response if ($all || defined $response);
# get the header data
while(<F>=~ m/^(\S+):\s+(.+)/) {
print "$1: $2\n" if ($all || defined $header);
}
# get the entity body
if ($all || defined $data) {
print while (<F>);
}
# close the network connection
close(F);
}
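hcat depends on parse_URL from web.pl, which isn't shown here. Judging from how hcat indexes @the_url, it returns the scheme, host, port, and path in that order. The sketch below is a hypothetical stand-in, split_url(), and not the book's parse_URL. It reuses the pattern hcat applies when deciding whether a URL is complete. The defaults of port 80 and path / are our assumptions and may differ from what web.pl does.

```perl
use strict;
use warnings;

# Hypothetical stand-in for web.pl's parse_URL (not shown in this section).
# Returns (scheme, host, port, path), the order hcat indexes @the_url in,
# or an empty list if the string doesn't look like a full URL.
sub split_url {
    my ($url) = @_;
    # same pattern hcat uses to decide whether a URL is "full"
    return () unless $url =~ m/(\w+):\/\/([^\/:]+)(:\d*)?([^#]*)/;
    my ($scheme, $host, $port, $path) = ($1, $2, $3, $4);
    # assumed defaults: port 80 when absent (or just ":"), path "/" when empty
    $port = (defined $port && length($port) > 1) ? substr($port, 1) : 80;
    $path = '/' if $path eq '';
    return ($scheme, $host, $port, $path);
}

my @parts = split_url('http://www.example.com:8080/index.html#top');
print join(' | ', @parts), "\n";   # http | www.example.com | 8080 | /index.html
```

The fragment after # is dropped because the ([^#]*) group stops there. Only the path is sent in the GET line.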
Shell Hypertext cat
With hcat, one can easily retrieve documents from remote web servers. But
there are times when a client request needs to be more complex than hcat is
willing to allow. To give the user more flexibility in sending client requests,
we'll change hcat into shcat, a shell utility that accepts methods, headers, and
entity-body data from standard input. With this program, you can write shell
scripts that use different methods, send custom headers, and submit form data.
All of this can be done by changing just a few lines. In hcat, where you
see this:
# request the path of the document to get
print F "GET $the_url[3] HTTP/1.0\n";
print F "Accept: */*\n";
print F "User-Agent: hcat/1.0\n\n";
Replace it with this:
# copy STDIN to network connection
while (<STDIN>) {print F;}
and save it as shcat. Now whatever you write to shcat's STDIN is forwarded
to the web server named on the command line. The URL argument now only
selects the host and port, because the request line itself comes from STDIN.
This allows you to do things like HTML form postings with POST or file
uploads with PUT, and selectively look at the results. At this point, it's
really all up to you what you want to say, as long as it's HTTP compliant.
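When you write a POST by hand, the two fiddly parts are encoding the form fields and getting Content-length right. Here is a small sketch that does both and prints the finished request, ready to be piped into shcat. The path /cgi-bin/register and the fields name and email are hypothetical. url_encode() and form_post() are our own helpers, and they assume plain byte strings. Lines end in CRLF, which the HTTP specification calls for, although many servers also accept the bare newlines that hcat sends.

```perl
use strict;
use warnings;

# Percent-encode one form value for application/x-www-form-urlencoded
# (spaces become "+"). Assumes byte strings, not wide characters.
sub url_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~ ])/sprintf("%%%02X", ord($1))/ge;
    $s =~ tr/ /+/;
    return $s;
}

# Build an HTTP/1.0 POST whose body is the encoded form data and whose
# Content-length is computed from that body rather than counted by hand.
sub form_post {
    my ($path, @pairs) = @_;
    my @fields;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        push @fields, url_encode($name) . '=' . url_encode($value);
    }
    my $body = join('&', @fields);
    return "POST $path HTTP/1.0\r\n"
         . "User-Agent: shcat/1.0\r\n"
         . "Content-type: application/x-www-form-urlencoded\r\n"
         . "Content-length: " . length($body) . "\r\n"
         . "\r\n"                       # blank line ends the headers
         . $body;
}

# hypothetical form at /cgi-bin/register with fields "name" and "email"
print form_post('/cgi-bin/register', name => 'A. P. Man', email => 'apm@example.com');
```

If you saved this as post.pl (a hypothetical name), you could send it with `perl post.pl | shcat www.example.com`, where www.example.com stands in for the real server.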
Here's a UNIX shell script example that calls shcat to do a file upload:
#!/bin/ksh
# www.example.com is a placeholder; substitute the target server
echo "PUT /~apm/hi.txt HTTP/1.0
User-Agent: shcat/1.0
Accept: */*
Content-type: text/plain
Content-length: 2

hi" | shcat www.example.com
Grep out URL References
When you need a quick list of all the references in an HTML page, here's a
utility that fetches an HTML page from a server and prints out the URLs
referenced within it. We've taken the hcat code and modified it a little, and
we've also added a function that parses URLs out of the HTML. Let's go over
that first:
sub grab_urls {
my($data, %tags) = @_;
my @urls;