6 - 1
Information Assurance Foundations - SANS
©2001
1
Web Security
Security Essentials
The SANS Institute
Hello. With everything that is occurring on the Internet and all of the articles that have been written,
web security is a very exciting area. Most attacks that are publicized are either directly or indirectly
web-based attacks. Every company and person seems to have a web site, yet most web sites are not
designed or built properly from a security standpoint. In the next hour, we are going to take a look at
web security and cover some things you can do to check the security of the web sites you either
maintain or use. This is a foundational course, developed for the SANS Security Essentials program.
When you complete this course, there will be a quiz available from the SANS web page to help
reinforce the material and ensure your mastery of it. I would also recommend trying these steps out
on your own web sites, always with prior permission, to see what vulnerabilities
might exist. Remember, before you can fix a problem you must be aware of the problem. Hopefully
after this module you will have some of the knowledge you need to start securing your web
applications.
Agenda
• Web communication
• Web security protocols
• Active content
• Cracking web applications
• Web application defenses
On the slide “Agenda” we list some of the key things that we are going to cover in this section. First,
we are going to cover web communication and how it works. Topics that are often misunderstood,
like input validation and cookies, will also be covered. Next, security protocols like SSL and SET
will be discussed and key attributes and potential weaknesses of each will be covered. Active
content that increases functionality but decreases security will also be addressed. CGI, Java, and
ActiveX will all be explained, which will allow a reader to compare their alternatives when deciding
on which one to use when putting together a web site. Then the exciting topic of cracking web
applications will be covered. This is meant from an informative standpoint, and you should
always get written permission before running any of these tests or tools
against a website. Since the purpose of security is to protect your data, we will finish this section
with how you can defend against web application attacks. Now, without delay, let’s get started
looking at web security.
Everything You Always Wanted to
Know About Web Communications…
• Servers and Clients
• HTTP and HTML
The World Wide Web has become the de facto communications medium for the Internet. Millions
of people use it every day to get information, communicate with coworkers, buy and sell goods,
entertain themselves, and keep up-to-date with current events. However, most of these people have
very little knowledge about how the web actually works. On this slide, we will give you a brief
introduction to the web and tell you everything you always wanted to know about the web but were
afraid to ask. All in less than three minutes.
All computers on the web fall into one of two categories: clients or servers. Let’s start with servers.
A server is a computer that contains some sort of information that an organization wants to
distribute. The server runs a special piece of software, called a web server, that takes requests from
other machines, figures out what the request is for, finds the answer to the request, and sends it back
to the requesting machine. That’s basically all a server does.
The client machine is the machine that is doing the requesting. The client runs a piece of software
called a “Web browser”, or just browser for short. Browsers take input from users, convert that
input into a language the server will understand, send it off to the server over the network, and wait
for the reply. When the server sends the reply, the browser will format it and display it for the user.
Simple as that. OK, it’s not really all that simple. There may be a lot of processing that goes on
behind the scenes. For example, the server may have to contact other computers to get the
information the client needs, or the client may have to run some other programs in order to properly
interpret the response from the server, but here you have the basics: Client sends the request,
server responds to the request.
The way clients and servers communicate on the Web is through a protocol called HTTP – the
HyperText Transfer Protocol. Like any other protocol, HTTP is just a set of standards,
conventions, and notations the two systems must understand in order to communicate.
The HyperText Markup Language, or HTML, is the actual language used to develop web pages.
HTML uses a set of special notations, called tags, to tell the browser how to display a page,
including things like where to center text, what fonts to use, where to place images on a page, and so
on. If you want to see examples of HTML, most browsers allow you to view the HTML source code
for any page they display.
Everything You Always Wanted to
Know About Web Communications (2)
• Stateless Communications
• Retrieving Information – GET
• Sending Information – POST
Communication on the Web is called “stateless.” This is because each interaction between clients
and servers is an independent transaction. For example, each time you click on a web page you are
starting a completely new interaction between your browser and the server. If you click on 12
different links on a page, your browser will make 12 different connections to the server. There is no
information about the state of any previous transactions carried over from one transaction to the next.
That’s why it is called “stateless.” Later in this section, we will see how servers and clients can be
made to carry state information between transactions.
There are two types of transactions that browsers commonly request of servers. They are called GET and
POST. A GET transaction asks the server to get some information and send it back to the browser.
When you click on a simple link on a web page, you are typically issuing a GET transaction request.
A POST transaction allows the browser to send some information to the server, usually information
from a form the user fills out. POST transactions send the information from the browser to the server.
The server will then act on the input and send any results back to the client. Generally, users don’t
have control over whether clicking on a link on a web page will initiate a GET or POST transaction.
That decision is already coded into the web page itself.
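To make the difference between GET and POST a little more concrete, here is a minimal sketch in Python of what the two requests might look like on the wire. The host name, paths, and form fields are made up for illustration, and the requests are only built as text, never sent.

```python
from urllib.parse import urlencode


def build_get(host: str, path: str) -> str:
    """Return the text of a simple GET request: request line and headers, no body."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"


def build_post(host: str, path: str, form: dict) -> str:
    """Return the text of a POST request that carries form fields in its body."""
    body = urlencode(form)  # for example: item=widget&qty=2
    return (
        f"POST {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    )


if __name__ == "__main__":
    # Hypothetical host, path, and form values.
    print(build_get("www.example.com", "/index.html"))
    print(build_post("www.example.com", "/order", {"item": "widget", "qty": "2"}))
```

Notice that the GET request has no body at all, while the POST request carries the form data after the blank line that ends the headers.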
HTML Security
• Reading HTML Source
Given the open nature of the HTTP protocol, it is easy to start seeing some of the negative
security issues that surround its use. On this slide, we will examine some of these problems.
The easiest way to learn HTML is to examine the HTML source code of any page you happen to
visit. Most browsers have an option to let you view the HTML source of the current page you are
viewing. From there you can see all the code, fields, tags, and other HTML elements that make up
the page. You may also see some unexpected things. Many developers put information into
source code that is never meant for public viewing, thinking that regular people will never see it.
When you view the code, you may see things like variable names and data values that are used
internally by the web site’s programs. You may see references to the names of the site’s
developers or internal information about the organization that runs the server. You may
see references to directory names where files are stored on the web server. There may be
references to user IDs or passwords for different services on the machines. If the page uses
JavaScript, or some other scripting language, you may see code paths that refer to options that the
user would not normally see. All this information can give an attacker a clue as to the underlying
structure and organization of the server, in order to plan an attack. And it’s all there, free for the
looking.
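If you want to check your own pages for this kind of leftover information, a small script can pull out the comments for review. The sketch below uses Python's standard html.parser module; the sample page and its comments are invented for illustration.

```python
from html.parser import HTMLParser


class CommentFinder(HTMLParser):
    """Collect the text of every HTML comment found in a page."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # Called by HTMLParser with the text between <!-- and -->.
        self.comments.append(data.strip())


def find_comments(html: str) -> list:
    finder = CommentFinder()
    finder.feed(html)
    finder.close()
    return finder.comments


# Made-up page containing the kind of notes developers sometimes leave behind.
SAMPLE_PAGE = """
<html><body>
<!-- TODO: move files out of /home/web/staging -->
<h1>Welcome</h1>
<!-- test account: demo / demo123 -->
</body></html>
"""

if __name__ == "__main__":
    for comment in find_comments(SAMPLE_PAGE):
        print("comment:", comment)
```

Remember to run something like this only against pages you maintain or have permission to review.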
HTML Security
• Hidden Fields
• Server Side Includes
Many web pages, particularly those that use input forms, make use of a feature of HTML called
Hidden Fields. As their name implies, hidden fields reside on a web page form but they are
hidden from view when the page is displayed. Hidden fields are typically used as a method for
carrying information from one form to another without requiring the user to re-enter the
information on each form. However, hidden fields can also contain values not entered by the user.
For example, when a user enters a user ID on a web form, the server might look up the user’s
Social Security Number and place that in a hidden field for later use. If you look at the HTML
source for the page with the hidden field, you will see that information. Unfortunately, so will
anyone else who may be sniffing the network when that page is transmitted.
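Here is a rough sketch of how you might list the hidden fields on one of your own forms, again using Python's standard html.parser module. The form, field names, and values are hypothetical.

```python
from html.parser import HTMLParser


class HiddenFieldFinder(HTMLParser):
    """Collect the name and value of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        if (attributes.get("type") or "").lower() == "hidden":
            self.hidden[attributes.get("name")] = attributes.get("value")


def find_hidden_fields(html: str) -> dict:
    finder = HiddenFieldFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden


# Made-up order form; note the price travels to the browser and back.
SAMPLE_FORM = """
<form action="/order" method="post">
  <input type="text" name="quantity">
  <input type="hidden" name="customer_id" value="10442">
  <input type="hidden" name="price" value="19.95">
</form>
"""

if __name__ == "__main__":
    print(find_hidden_fields(SAMPLE_FORM))
```

Anything this script can see, a user or an eavesdropper can see too, so a value like the price above should never be trusted when it comes back from the browser.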
Another neat tool is a technology called Server Side Includes. Server Side Includes
are small pieces of code that are embedded in HTML documents. When a Web server begins to
serve a web page, it will go line by line through the document looking for these embedded directives.
When it comes upon a Server Side Include line, it stops and does whatever the include says. For
example, it might insert text from a different file, like a copyright notice or policy statement. It
might insert today’s date and time to be displayed on the page. Or, and this is the scary part, it
might run a separate program and insert its output into the HTML document. This is scary
because if the included program has a bug, or the attacker can manipulate the program to run
some malicious code, the potential exists for the attacker to compromise the server and gain
unauthorized access or obtain confidential information.
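As a rough illustration, the sketch below scans a page for include directives written in the Apache style and flags the ones that run a program. The sample page, file paths, and function names are assumptions made for this example; other servers may use a different syntax.

```python
import re

# Apache-style directives look like: <!--#element attribute="value" -->
SSI_DIRECTIVE = re.compile(r"<!--#(\w+)\s+(.*?)\s*-->", re.DOTALL)


def list_directives(page: str) -> list:
    """Return (element, attributes) for each include directive in the page."""
    return [(m.group(1).lower(), m.group(2)) for m in SSI_DIRECTIVE.finditer(page)]


def risky_directives(page: str) -> list:
    """Return only the directives that ask the server to run a program."""
    return [d for d in list_directives(page) if d[0] == "exec"]


# Made-up page showing the three uses described above.
SAMPLE = '''<p>Today is <!--#echo var="DATE_LOCAL" --></p>
<!--#include virtual="/copyright.html" -->
<!--#exec cmd="/usr/local/bin/counter" -->
'''

if __name__ == "__main__":
    for element, attributes in list_directives(SAMPLE):
        print(element, attributes)
    print("needs review:", risky_directives(SAMPLE))
```

A review like this is only a starting point, but it quickly tells you which pages ask the server to execute something.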
Now, despite these shortcomings, and some others we will examine shortly, nobody is saying that
we should do away with HTML. But security practitioners need to take extra care when
developing, implementing, or reviewing HTML systems to reduce the likelihood that information
in source code, hidden fields, or server side includes will have a negative effect on
the server or the organization.
Input Validation
• All programs are driven by their input
– “garbage in, garbage out”
• Normal input → normal results
• Abnormal input → unknown results
• No validation in standard HTML
• Need to check all form input before processing
Many web pages use forms or some other user input as a way of interacting with the
user. The user may need to enter a term for searching or enter a user ID and password to gain entry to a
particular site. Or, more commonly, the user will need to enter information into a form like a credit card
number or merchandise numbers. The server will then send this information to some other program for
processing. The processing program, like any other computer program, relies on this input to drive its
functions. There is an age-old axiom in computers that says, “Garbage in, garbage out.” This means that if
the user enters bad input into the program they will get bad output.
Computer programs by and large don’t handle bad input very well. They do great with
normal, expected input. As long as the user works with the program in the ways that the designer
anticipated, everything goes along just fine. But when a user acts in a way that the designer did not
anticipate, either accidentally or maliciously, the program will not act predictably. In fact, the results of
this action are generally unknown. If you haven’t figured this out already, computer security people hate
when things act in unknown, unpredictable ways. That’s because it makes it difficult, if not impossible, to
protect the system. SYN floods, fragmentation attacks, and the Ping of Death are all examples of what
happens when a system receives input it did not expect.
Plain vanilla HTML also has no built-in methods for validating user input. There are no
variable checks or data validation rules built into HTML to prevent bad input from happening. If you are
using a scripting language to develop your pages, you can build validation routines into your forms, but if
you want to stick with plain HTML, you are out of luck.
That’s why you need to pay particular attention to any web page, or any program for
that matter, that requires user input. You need to ensure that all input is validated for correctness. What
does “validated” mean? It means that you need to check that the input is correct for the type of information
being requested. If you are looking for a Social Security number, make sure that there are no letters
entered by the user. If you are requesting a piece of text that should be 10 characters long, make sure the
user doesn’t enter 500 characters of text.
Beyond simple type and length validation, you also need to check whether the input
is plausible for the kind of transaction you expect. For example, if you normally only sell 2 or 3 of a
particular item, is it normal for a user to order 999? Is the name on the customer’s credit card different
from the name on the shipping address? Things like this can be a clue to possible unauthorized activity or
fraud.
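The checks described above might be sketched in Python as follows. The function names, the 10-character limit, and the "typical" order size of 3 are assumptions taken from the examples in the text, not fixed rules.

```python
import re

# Nine digits, with or without dashes; letters are never allowed.
SSN_PATTERN = re.compile(r"\d{3}-?\d{2}-?\d{4}")
QUANTITY_PATTERN = re.compile(r"[0-9]{1,6}")


def valid_ssn(text: str) -> bool:
    """Type check: the whole input must look like a Social Security number."""
    return SSN_PATTERN.fullmatch(text) is not None


def valid_text(text: str, max_len: int = 10) -> bool:
    """Length check: reject input longer than the field was designed for."""
    return len(text) <= max_len


def check_quantity(text: str, typical_max: int = 3) -> str:
    """Return 'reject' for malformed input, 'review' for unusual orders, else 'ok'."""
    if QUANTITY_PATTERN.fullmatch(text) is None:
        return "reject"
    quantity = int(text)
    if quantity < 1:
        return "reject"
    return "ok" if quantity <= typical_max else "review"


if __name__ == "__main__":
    print(valid_ssn("123-45-6789"), valid_ssn("123-45-678A"))
    print(valid_text("x" * 500))
    print(check_quantity("2"), check_quantity("999"), check_quantity("2; ls"))
```

Note that an order for 999 items is not rejected outright. It is well formed, so it is flagged for a closer look instead, which matches the idea that unusual input is a clue rather than proof of fraud.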
Cookies
• HTTP is “stateless” – no context information
• Cookies provide “state” and context
• Can only hold information given to the browser
by the server
• Can only be exchanged with originating server or
domain
• Beware of cross-site sharing (e.g. DoubleClick)
• Can block cookies if desired
One of the interesting things that we mentioned before about the HTTP protocol is that it is stateless. By
“stateless” we mean that each transaction is an independent unit with no relation to any transactions that
came before or after it. When you request a web page, your computer connects to the server, gets the
page, then closes the connection. The next time you request a page, your computer makes a new
connection to the server.
Unfortunately, many web applications, like shopping or information retrieval systems, require that
information be passed from one page to the other. How do you accomplish this in a stateless system? The
answer is a mechanism called “cookies.” A cookie is a small piece of information that a server will send to
a browser. What does this information contain? Well, almost anything the server wants it to. It might be
the product numbers and prices for things you want to order from a site. It might be your user ID or
customer number on a particular site. It can be anything that the site needs you to store from page to
page.
Cookies are actually a pretty neat technology, and nicely solve a major problem in the original HTTP
protocol. However, many people don’t like cookies. There are a couple of reasons for this. The first is
that the user has no control over the information stored in cookies. Since the content of the cookie is
controlled by the server, you have no way of knowing what’s in it. Also, if a site puts sensitive
information in a cookie, like a Social Security number or a credit card number, unless they take steps to
hide that information (for example, using encryption) that information will be available to everyone on
the network as the cookie is transmitted back and forth between the browser and the server.
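To see what actually travels back and forth, here is a small sketch using Python's standard http.cookies module. The cookie names and values are made up; the point is that the server builds a Set-Cookie header and the browser later returns the same values in a Cookie header, both in plain text unless something else protects them.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name: str, value: str) -> str:
    """Build the header line a server would send to hand the browser a cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie.output()


def parse_cookie_header(header_value: str) -> dict:
    """Parse the value of the Cookie header a browser sends back."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {key: morsel.value for key, morsel in cookie.items()}


if __name__ == "__main__":
    # Hypothetical shopping-cart cookie.
    print(make_set_cookie("cart", "A123"))
    print(parse_cookie_header("cart=A123; customer=10442"))
```

If the value were a credit card number instead of a cart ID, it would be just as readable to anyone watching the network.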
Cookies (2)
Some people object to cookies on privacy principles. They believe that cookies are somehow magically
taking information from you or your computer and spreading that information around the Internet. Most
of these fears are based on a lack of understanding of how cookies really work. First off, cookies can
only contain information that you’ve already given to the web server or the company you are dealing
with. There is no way the site can know your home address or credit card number unless you have
already given it to them. So you’ve already given up some of your privacy before the cookies even
entered into the picture. Secondly, cookies can only be sent to and from the server or domain that
originally created the cookie. There is no way that a cookie from xyz.com can be shared with a server
from abc.com.
This last point, however, while technically true, has found a wrinkle lately. It is true that one company’s
server cannot share a cookie with another company’s server. But what if one company were able to
distribute cookies on ALL servers? This is exactly what a company called DoubleClick has done. You’ve
probably seen their advertisements on web pages you’ve visited. DoubleClick rents space on web pages
for advertisements. So, for example, when you visit the web page for acme.com, you will see an ad that
is actually generated by DoubleClick from the DoubleClick server. The cookies generated by that ad are
shared between the browser and DoubleClick, not the browser and Acme. Then, when you go to
widgets.com, you may see another DoubleClick advertisement. Again, you will share a cookie with
DoubleClick, not Widgets.com. In this way, the DoubleClick service can begin to collect information on
what sites you have visited over the Internet. Many privacy advocates are extremely worried about this
practice.
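The tracking described above can be modeled with a toy simulation. Everything in this sketch is hypothetical: the AdServer and Browser classes, the ads.example domain, and the visitor IDs are stand-ins, not a description of how any real advertising network is built.

```python
import itertools
from collections import defaultdict


class AdServer:
    """Toy third-party ad network that hands out one cookie per browser."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.history = defaultdict(list)  # cookie value -> sites that showed an ad

    def serve_ad(self, cookie, referring_site):
        """Record which site displayed the ad and return the cookie to keep."""
        if cookie is None:
            cookie = f"visitor-{next(self._ids)}"
        self.history[cookie].append(referring_site)
        return cookie


class Browser:
    """Toy browser that stores cookies by the domain that set them."""

    def __init__(self):
        self.cookies = {}

    def visit(self, site, ad_server, ad_domain="ads.example"):
        # The page comes from `site`, but the ad is fetched from ad_domain,
        # so only the ad_domain cookie goes along with the ad request.
        existing = self.cookies.get(ad_domain)
        self.cookies[ad_domain] = ad_server.serve_ad(existing, site)


if __name__ == "__main__":
    ads = AdServer()
    browser = Browser()
    browser.visit("acme.com", ads)
    browser.visit("widgets.com", ads)
    print(dict(ads.history))
```

Neither acme.com nor widgets.com ever sees the other's cookies, yet the ad server ends up with a list of both sites under a single visitor ID.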
If you are really worried about cookies, you can take steps to protect yourself. In most browsers, you can
set an option to prevent the downloading of cookies to your browser. There are also a number of
shareware add-on utilities that let you selectively block cookies based on various criteria.
What About Non-Persistent
Cookies?
• Non-persistent cookies = per-session cookies
• Non-persistent cookies are not written to the browser
file system
– So they can’t be edited and are therefore safe from session
ID attacks. Right?
– Wrongo!!!
• Several possible methods
– 1) Raw read/write from/to the client-side memory
– 2) Adapt Mozilla browser source to edit cookies
– 3) Write a JavaScript app that lets you view page and edit
– 4) Write a proxy that allows for editing of entire session
passed back and forth from browser (Achilles)
There are two general types of cookies, persistent and non-persistent. Persistent cookies are usually
written to your hard drive and even if you reboot your system, they are still there. They are meant
for long term tracking. Non-persistent cookies are meant for a much shorter period of time and are
usually used on a per-session basis. Non-persistent cookies are usually written to memory so
if the computer is rebooted the information in the cookie is lost. Since non-persistent cookies are not
written to the file system, some people think they cannot be edited. Don’t think you are secure if
your web application uses non-persistent cookies. Sure, these cookies aren’t written to the browser
machine’s file system, so they are more difficult to edit. However, there are several techniques for
altering non-persistent cookies. Using a proxy to completely edit the session between browser and
web server is the most promising of these techniques, and can support SSL.
The following are four possible methods for accessing or modifying non-persistent cookies.
1) Raw read/write from/to the client-side memory
2) Adapt Mozilla browser source to edit cookies
3) Write a JavaScript app that lets you view page and edit
4) Write a proxy that allows for editing of entire session passed back and forth from browser
(Achilles)
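To show why the proxy approach works, here is a sketch of just the rewriting step such a proxy could perform on a Cookie header before passing the request along. This is not how Achilles itself is implemented; the function name and the cookie names are invented for illustration, and it should only be tried against applications you have permission to test.

```python
def rewrite_cookie(header_value: str, name: str, new_value: str) -> str:
    """Return a Cookie header value with one cookie's value replaced."""
    pairs = []
    for part in header_value.split(";"):
        key, _, value = part.strip().partition("=")
        if key == name:
            value = new_value  # the edit a person at the proxy would make
        pairs.append(f"{key}={value}")
    return "; ".join(pairs)


if __name__ == "__main__":
    # Hypothetical per-session cookies held only in browser memory.
    original = "session=abc123; role=user"
    print(rewrite_cookie(original, "role", "admin"))
```

The server receives the edited header and has no way to tell, from the header alone, that the value was not the one it originally issued.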
SSL
• Protocol for encrypting network traffic
• Operates at Transport Layer
• Operates on port 443
• How it works
– Client connects to server
– Server indicates need for SSL
– Client and server exchange crypto keys
– Secure session begins
• Not a guarantee of security
Plain, generic HTTP is fine for open, non-secret communications, but some applications require more privacy
than that provided by HTTP. For example, you may want to keep your credit card information or information
about your bank accounts secret over the Internet. For these types of applications, there is the Secure Socket
Layer protocol, or SSL.
SSL is a general-purpose protocol for encryption of network traffic. Although it is most commonly associated
with HTTP traffic, SSL operates at the Transport Layer of the TCP/IP stack and can be used with many different
application protocols. Any program that uses TCP can be modified to use SSL. General HTTP traffic typically
operates on port 80. When SSL is enabled on an HTTP connection, it usually runs on port 443.
When a client connects to a web server, the server will generally indicate whether SSL is required for that
page. If it is required, the client and the server will negotiate to determine what type of encryption the session
will use. Generally, the strongest algorithm that the two programs support will be selected.
The client and the server will then exchange encryption keys. These are the codes that will enable the two to
encrypt messages back and forth. Once the keys have been exchanged, all further communications between the
client and the server are encrypted.
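The steps above can be sketched with Python's standard ssl module. Keep in mind that current Python versions negotiate TLS, the successor to SSL, so this is an illustration of the same connect, negotiate, and encrypt sequence rather than of SSL as it existed when this course was written. The function names are made up for this example.

```python
import socket
import ssl


def make_context() -> ssl.SSLContext:
    """Client-side settings that verify the server's certificate and host name."""
    return ssl.create_default_context()


def negotiated_details(host: str, port: int = 443, timeout: float = 5.0) -> tuple:
    """Connect, complete the handshake, and return (protocol version, cipher name)."""
    context = make_context()
    # Step 1: an ordinary TCP connection to the server.
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # Steps 2-3: negotiate the algorithm and exchange keys (the handshake).
        with context.wrap_socket(raw, server_hostname=host) as secure:
            # Step 4: anything sent on `secure` from here on is encrypted.
            return secure.version(), secure.cipher()[0]
```

Calling negotiated_details with the name of a site you use requires network access and reports which protocol version and cipher the two sides settled on.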
I have left out a LOT of detail here about the specifics of the key exchange and the use of certificates to
validate the identity of the client and the server, but most of it is not needed to gain a high-level
understanding of the process. What’s important to remember is that all sensitive information that is to be
transmitted over the web should require SSL to be enabled. You can tell if SSL is enabled on a web page by
looking at the bottom of your browser. In Internet Explorer there will be a small icon of a lock in the lower right
corner. In Netscape there will be a small lock in the lower left corner. Other browsers may have other indicators,
but they all mean the same thing – your information is being protected with encryption.
Please note that the use of SSL does not guarantee that your information is secure from all prying eyes. SSL
only secures data in transit over the network. Even then, it is possible that someone will capture the information
as it is transmitted and decrypt the packets. The likelihood is reduced, particularly if you are using strong
encryption, but it is possible. Also, SSL does not protect your information once it reaches the destination
computer. If that computer stores the information in a publicly accessible area or an attacker gains unauthorized
access to that computer, your information is still vulnerable.
Secure Electronic Transactions
(SET)
• Developed by Visa, MasterCard,
Microsoft, Netscape
• Specific-purpose protocol
• Secures credit and debit card
transactions
Protocols like SSL are designed to be general purpose protocols. This means that they can be used in
a variety of applications under a variety of different circumstances. In some instances, however, it is
better to have an application-specific protocol. This is a protocol that is designed with a particular
purpose in mind. Such an application is the exchange of credit and payment information over the
Internet. This type of information can be highly sensitive and the need to keep it confidential is great.
For this reason the Secure Electronic Transaction protocol, or SET, was developed.
SET was developed by a number of large players in the credit card and computer industries,
including Visa, MasterCard, Microsoft, and Netscape. It was designed to handle the specific
problems of transmitting credit and debit card information. For example, SET handles issues like
validating credit card numbers, checking the customer’s authorization to use the credit card,
authorizing the transaction with the bank, and processing the transaction. SET provides an integrated
system that handles the entire transaction, including card authorization and finalization of the sale.
SET has a number of mechanisms that protect the customer, the merchant, and the bank. For
example, the protocol hides the actual credit card number from the merchant, instead sending it
directly to the bank. Also, the bank does not know the actual merchandise purchased by the
customer, protecting the privacy of the customer’s purchases.
Secure Electronic Transactions
(SET) (2)
• Services provided
– Authentication
– Confidentiality
– Message Integrity
– Linkage
SET provides four basic services that protect transactions.
Authentication: All the parties to the transaction are authenticated using digital signatures. We will
learn more about digital signatures later when we discuss cryptography.
Confidentiality: The transaction is encrypted so that Internet eavesdroppers cannot read the data
and discover the details of the transaction.
Message Integrity: The transaction cannot be tampered with by attackers. Thus, they cannot alter
the account numbers or payment amounts involved in the transaction.
Linkage: SET allows a message sent by one party to the transaction (either the customer, the
merchant, or the bank) to contain an attachment that can be read only by another specified party.
This allows the first party to verify that the attachment is correct without being able to read the
contents of the attachment. This is very important for the privacy reasons stated above.
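One way to picture linkage is with message digests. In the sketch below, the customer ties the order and the payment details together with a combined digest, and the merchant can confirm that the two belong together while seeing only a digest of the payment details. This is a simplified illustration under stated assumptions: the real protocol also has the customer sign the combined digest with a private key, which is left out here, SHA-256 is used as a stand-in hash, and all function names and sample data are hypothetical.

```python
import hashlib


def digest(data: bytes) -> bytes:
    """Stand-in hash function for this sketch."""
    return hashlib.sha256(data).digest()


def linked_digest(order_info: bytes, payment_info: bytes) -> bytes:
    """Customer side: tie both parts together as H( H(payment) + H(order) )."""
    return digest(digest(payment_info) + digest(order_info))


def merchant_check(order_info: bytes, payment_digest: bytes, expected: bytes) -> bool:
    """Merchant side: sees the order and only a digest of the payment details."""
    return digest(payment_digest + digest(order_info)) == expected


if __name__ == "__main__":
    order = b"2 widgets, ship to Main St."
    payment = b"card number and expiry (never shown to the merchant)"
    link = linked_digest(order, payment)
    print(merchant_check(order, digest(payment), link))
    print(merchant_check(b"200 widgets", digest(payment), link))
```

The first check succeeds and the second fails, because changing the order breaks the link to the payment details even though the merchant never read them.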
SET has many advantages over plain SSL in that it covers the entire transaction from end to end. If
plain SSL were used, the credit and validation information would be exposed at many different
points along the way, leaving the information available for attackers or data thieves. This is, in fact,
what happened in 1994 when an attacker broke into the Netcom Internet Service Provider and stole
thousands of credit card numbers that were stored on Netcom’s computers.
Although it seems like the perfect answer to credit exchanges on the Internet, use of SET in the real
world has been slow in coming. Hopefully, in the near future, its use will increase as more
companies implement it as part of their on-line ordering systems and more customers see its
advantages and begin demanding it for their personal transactions.
Common Gateway Interface
(CGI)
• Allows web pages to do something instead of
just returning pages
• Extends the capabilities of a web server
• Creates many exposures on server
– Leaking information
– Performing unauthorized transactions
– Executing unintended programs
For all the hype surrounding it, HTTP is still pretty much a dumb protocol. By that, I mean it really only does
one thing – once you make a request for a web page, HTTP gets the page from the server and delivers it to your
browser. Not too exciting, huh? Well, early users of the Web didn’t think so either. They wanted a way to
interact with Web servers. They wanted the servers to do something instead of blindly returning pages. To make
that happen, they invented the Common Gateway Interface, or CGI.
CGI is a method of extending the web server’s abilities by executing programs on the server and returning the
results back to the user. CGI scripts can generate pages dynamically based on the information obtained during
their execution. Some examples of CGI programs that have been written include database transaction systems,
computer games, financial transaction systems, and even vending machine ordering.
However, CGI is a very primitive process for handling such interaction, and it may create a large number of
vulnerabilities on the server in which it is used. For example, if the results of the CGI execution are not filtered
before being sent to the user, the use of CGI programs can lead to the leakage of information about the system
or its data. Because CGI has few built-in data checking mechanisms, it can be relatively easy for a user to
falsify the information sent to the CGI program, increasing the potential for the execution of unauthorized or
fraudulent transactions. Finally, since many CGI programs use underlying command interpreters (like Perl or a
UNIX shell), the potential exists for an attacker to run programs not intended by the designers of the system.
This is a popular method of gaining unauthorized administrative access on web servers.
Common Gateway Interface
(CGI) (2)
• Common Mistakes
– Misuse of command interpreters
– Bad memory management
– Passing unchecked parameters to system
There are several common mistakes that many CGI developers make when writing their programs. The first is
misuse of command interpreters. As mentioned before, many CGI programs use command interpreters that
are called by the CGI program. Since there is no direct linkage between the CGI program and the command
interpreter, the interpreter has no good way of validating the information it is being sent. If an attacker can find a
way to pass arbitrary system commands to the interpreter, they have the potential to successfully compromise the
system.
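The difference between handing user input to a command interpreter and passing it as plain data can be sketched in Python. In this example the "finger" command line is only built as a string to show the problem and is never executed, and the helper program is simply the Python interpreter echoing its argument back. Both function names are made up for illustration.

```python
import subprocess
import sys


def unsafe_command_line(user_input: str) -> str:
    """What NOT to give to a shell: user input pasted into command text."""
    return "finger " + user_input


def echo_via_argv(user_input: str) -> str:
    """Pass user input as one argument in a list, so no shell ever parses it."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\r\n")


if __name__ == "__main__":
    hostile = "alice; cat /etc/passwd"
    # A shell would read this as two commands separated by the semicolon.
    print(unsafe_command_line(hostile))
    # Here the semicolon is just another character in the argument.
    print(echo_via_argv(hostile))
```

Passing arguments as a list does not replace input validation, but it removes the interpreter from the picture, which is the specific mistake described above.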
Another common mistake is the lack of attention paid to memory management. As we will see later on when
we discuss buffer overflows, a common method of attack is to send a program more information than it was
designed to handle. If the information exceeds a certain size, or if it is carefully crafted, it has the ability to
crash the server, often leaving the attacker with administrator privileges on the computer. Also, if the program
itself does not pay close attention to the resources it is using, it can potentially consume all the available
resources of the computer, again leaving it exposed to compromise.
The final common mistake, and the one that is also the most preventable, is passing unchecked user input to
CGI programs. Many of the most successful attacks have been based on the fact that a CGI program did not
check the information entered by the user. In some cases, users are able to enter privileged system commands
as input to web forms and the computer will blindly execute them without even a virtual glance.
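As one small defensive habit, a CGI program can refuse to read more input than it was designed to handle. The sketch below checks the CONTENT_LENGTH value that a web server passes to a CGI program before reading the request body. The 4096-byte limit and the names used here are assumptions for illustration, and because Python manages memory for you, this shows the bounded-read idea rather than a buffer overflow itself.

```python
import io

MAX_BODY = 4096  # assumed upper limit for this form, in bytes


class BadLength(Exception):
    """Raised when the declared input size is missing, malformed, or too large."""


def read_form_body(environ: dict, stream, max_body: int = MAX_BODY) -> bytes:
    """Read at most the declared number of bytes, and only if it is reasonable."""
    declared = environ.get("CONTENT_LENGTH", "")
    try:
        length = int(declared)
    except ValueError:
        raise BadLength("CONTENT_LENGTH is missing or not a number") from None
    if length < 0 or length > max_body:
        raise BadLength(f"refusing to read {length} bytes")
    return stream.read(length)


if __name__ == "__main__":
    # Simulated request: the environment and input stream are stand-ins.
    print(read_form_body({"CONTENT_LENGTH": "5"}, io.BytesIO(b"hello world")))
    try:
        read_form_body({"CONTENT_LENGTH": "999999"}, io.BytesIO(b"x"))
    except BadLength as error:
        print("rejected:", error)
```

In a real CGI program the environment would come from os.environ and the stream from standard input; here both are simulated so the example can run on its own.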
CGI programs can add a great deal of flexibility to your web site. But, like any enabling technology, CGI has a
negative side that must be checked before proceeding blindly with its implementation. Also, CGI is a relatively
old protocol, designed back when the web was still in its infancy. There are more modern alternatives to CGI
that have addressed some of CGI’s shortcomings. Unfortunately, they have also introduced some of their own.
Active Content
• Programs that interact in a network
environment
• Java/ActiveX
It used to be that computer programs were fairly simple. You ran them, they did some work on your
computer, you got the results, and you were done. Then as network computing took off, we began to
see client/server programs. You ran them, they interacted with a server somewhere on the network,
they did the work on the server, you got the results, and you were done. Then with the advent of the
web, we started seeing the use of CGI programs to do the work. However, with both client/server
and CGI, much of the work was being done on the remote computer. This placed a very heavy
burden on the server. It would be nice if the work could be performed locally on your machine, just
like in the olden days. The server wouldn’t be so burdened and you could probably get the work
done faster.
Enter Active Content. “Active Content” is a term commonly used for program code that is
embedded in the contents of a web page. When the page is accessed by a web browser, the
embedded code is automatically downloaded and executed on the user’s workstation. Other terms
that are sometimes used to describe active content include executable content, active code, or
mobile code. Active Content can be thought of as CGI: The Next Generation.
Two of the most common examples of active content are Java and ActiveX. Java is a programming
and execution environment originally developed by Sun Microsystems. It was designed for
developing programs that run on many different types of devices. One of the features of Java’s
portability is that a special type of Java program, called an applet, can be embedded in a web page’s
HTML code and run on a user’s machine. ActiveX is the term Microsoft uses for its active content
components. ActiveX components are called “controls” and, like Java, are downloaded to the user’s
computer where they are executed.