Network Traffic Analysis
Using tcpdump
Judy Novak
Johns Hopkins University Applied Physics Laboratory
Writing tcpdump Filters
All material Copyright Novak, 2000, 2001. All rights reserved.
Writing tcpdump Filters
• Introduction to tcpdump
• Writing tcpdump Filters
• Examination of Datagram Fields
• Beginning Analysis
• Real World Examples
• Step by Step Analysis
Objectives
• Review the foundations to understand and create tcpdump filters including:
  • tcpdump filter format
  • Review of bit/byte theory
  • Review of binary/hexadecimal numbering systems
  • Review of bit masking
  • Learning to formulate tcpdump filters
  • Review of tcpdump output
tcpdump filters are necessary to selectively gather/read records of network traffic.
While this section may be somewhat difficult to understand especially if you haven’t been exposed
to this theory before, it is more than just an academic exercise. In order to comprehend network
traffic at its most visceral level, you will have to understand tcpdump filters. Also, familiarity with
tcpdump filters is necessary if you want to process tcpdump files for some trait. For instance, if you
wanted to identify the beginning of a TCP connection, you would search for traffic with the SYN bit
alone set.
Foundations For Understanding tcpdump Filters
• Specify item of interest for record selection
  • Any field in the IP datagram
  • Examples: header length or TCP flags
• Variables for more commonly used fields:
  • Examples: “port” or “host”
• Less common fields:
  • Identify protocol
  • Identify byte displacement
  • Examples: ip[0], tcp[13]
tcpdump filters need to specify an item of interest, a field in the IP datagram for record selection.
Such items can be part of the IP header such as the IP header length, the TCP header such as TCP
flags, the UDP header such as the destination port, or the ICMP message such as the message type.
tcpdump provides a special name for each type of header. Much as you would expect, ip is used to
denote a field in the IP header or data portion of the IP datagram, tcp for a field in the TCP header or
segment, udp for the UDP header or UDP datagram, and icmp for the ICMP message.
For instance, ip[0] is the byte at offset 0 of the IP datagram, which is the first byte of the IP header
(remember counting starts at 0). tcp[13] is the byte at offset 13 of the TCP segment (its fourteenth
byte), which is also part of the TCP header, and icmp[0] is the byte at offset 0 of the ICMP
message, which is the ICMP message type.
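The protocol[displacement] notation behaves much like indexing into a string of bytes. The following is a minimal Python sketch of that idea, assuming a made-up 20-byte IP header; the identification, checksum, and addresses are invented for illustration and do not come from a real capture.

```python
# Hypothetical 20-byte IP header, written in hex for readability.
# Every field value here is invented for illustration.
ip = bytes.fromhex(
    "45000030"   # version/header length, TOS, total length
    "12344000"   # identification, flags/fragment offset
    "40060000"   # TTL, protocol, header checksum (left as zero)
    "01020304"   # source address 1.2.3.4
    "c0a8181d"   # destination address 192.168.24.29
)

# Offsets count from 0, just as in the filter notation ip[0] or ip[9].
first_byte = ip[0]   # version and header length packed into one byte
protocol = ip[9]     # protocol field; 6 means TCP

print(hex(first_byte), protocol)   # 0x45 6
```

Python indexes the byte string the same way tcpdump indexes the datagram: the first byte is at offset 0.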
Sample filters and reference material are found in:
• tcpdump man pages
Specifying Fields
IP header layout (20 bytes), shown as rows of 32 bits (bits 0 - 15 | bits 16 - 31):
4-bit version, 4-bit IP header length, 8-bit TOS | 16-bit total length (in bytes)
16-bit IP identification number | 3-bit flags, 13-bit fragment offset
8-bit time to live (TTL), 8-bit protocol | 16-bit header checksum
32-bit source IP address
32-bit destination IP address
protocol[displacement] example: ip[1] refers to the 8-bit TOS field
macro example: src host refers to the 32-bit source IP address
Looking at the IP header as an example, we learn two ways to specify different fields. The easier way to
specify a field of interest is by using a tcpdump macro. Not all fields have these macros. The source IP can
be specified by combining two macros “src” and “host” to identify the field. But, if we want to look at the
type of service field, we have to identify a protocol in which the field is found (IP because this is in the IP
header) and a displacement in bytes (1) offset in the protocol.
What are some of the more common macros used in filters?
host       select the record if either the source or destination host matches this IP
net        select the record if either the source or destination subnet matches
           (useful if several IPs from the same subnet are of interest to you)
port       select the record if either the source or destination port matches
src host   select the record if the source host matches
dst host   select the record if the destination host matches
src net    select the record if the source subnet matches
dst net    select the record if the destination subnet matches
src port   select the record if the source port matches
dst port   select the record if the destination port matches
icmp       select the record if the protocol field ip[9] has a decimal value of 1
tcp        select the record if the protocol field ip[9] has a decimal value of 6
udp        select the record if the protocol field ip[9] has a decimal value of 17
The tcpdump Filter Format
• The two different formats for a tcpdump filter are:
  • <protocol header>[offset:length] <relation> <value>
      ip[9] = 1
      tcp[2:2] < 20
      udp[4:2] != 0
      icmp[0] = 8
  • <variable> <value>
      port 23
      dst host 1.2.3.4
      src net 0
The first filter, ip[9] = 1, selects any record with the IP protocol of 1 (ICMP).
The second filter, tcp[2:2] < 20, selects any TCP record with a destination port less than 20.
The third filter, udp[4:2] != 0, selects any UDP record with a non-zero UDP length.
The fourth filter, icmp[0] = 8, selects any record with an ICMP message type of 8, an ICMP echo request.
The first variable filter selects any record with source/destination port of 23 (telnet).
The second variable filter selects any record with destination host 1.2.3.4.
The third variable filter selects any record with a source subnet of 0.x.x.x.
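To make the first format concrete, here is a small Python sketch of how a protocol[offset:length] expression can be read as a number and compared. The helper name field is hypothetical, and the sample bytes are the 8-byte UDP header 0089 0089 004c 1fd7 that this section examines later; applying the "port below 20" test to a UDP header is only an illustration.

```python
# The 8-byte UDP header examined later in this section.
udp = bytes.fromhex("00890089004c1fd7")

def field(data: bytes, offset: int, length: int = 1) -> int:
    """Read `length` bytes starting at `offset` as one unsigned integer
    in network (big-endian) byte order. Hypothetical helper."""
    return int.from_bytes(data[offset:offset + length], "big")

# Equivalent of udp[4:2] != 0 : the UDP length field is non-zero.
print(field(udp, 4, 2) != 0)   # True  (0x004c is 76)

# Equivalent of udp[2:2] < 20 : the destination port is below 20.
print(field(udp, 2, 2) < 20)   # False (0x0089 is 137)
```

A record is selected only when the comparison is true, so this header would match the first test and not the second.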
Bit/Byte Fundamentals
• A byte is an 8 bit field
• It is possible to denote a span of bytes, i.e. udp[0:2]
• Smallest precision that the tcpdump “language” offers is a byte
• How do you reference bits within a byte?
  • Bit masking
First 4 bytes (bytes 0 - 3) of the IP header:
BYTE 0: 4 bit version, 4 bit length
BYTE 1: 8 bit TOS
BYTES 2 - 3: 16 bit IP total length
The bit is the smallest unit that can be represented by a computer - it can have a value of either 0 or 1. A
byte is composed of 8 bits. Byte counting begins at byte 0; all successive bytes fall on these 8 bit
boundaries. udp[0:2] specifies the two-byte span of the UDP datagram that begins at byte 0 (bytes 0 and 1).
Bit masking, a combination of boolean arithmetic and binary/hexadecimal values, will help “isolate”
individual bits.
Decimal/Binary Representations
Base 10 Arithmetic - Decimal
  10^2  10^1  10^0
     2     6     5
= 2x100 + 6x10 + 5x1 = 265
Base 2 Arithmetic - Binary
  2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
  128   64   32   16    8    4    2    1
    1    0    0    0    0    0    0    1
= 1x128 + 1x1 = 129
Because decimal is our native number system, we really don’t have to do any conversions to understand the
value of a number. But, if you examine the number, you realize that a digit has value based on its
placement in the number. The digits that are least significant (to the right) have less value and those that
are most significant (to the left) have the most value. Each digit is represented by an increasing power of
the native base or base 10.
The same theory applies when we are dealing with binary or base 2. Instead of using exponents of 10, we
use exponents of 2 to figure out the decimal representation of the number. Also, because we are talking in
terms of a byte, we use 8 bits or binary digits to represent a byte. So, we see above how we convert the
binary number of 10000001 to a decimal 129.
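The positional arithmetic above can be sketched in a few lines of Python. The function name positional_value is hypothetical; it simply multiplies each digit by the matching power of the base, starting from the rightmost digit.

```python
def positional_value(digits: str, base: int) -> int:
    """Sum each digit times base**position, counting positions
    from 0 at the rightmost digit."""
    total = 0
    for power, digit in enumerate(reversed(digits)):
        total += int(digit, base) * base ** power
    return total

print(positional_value("265", 10))       # 265 = 2x100 + 6x10 + 5x1
print(positional_value("10000001", 2))   # 129 = 1x128 + 1x1
```

Python's built-in int("10000001", 2) performs the same conversion; the loop is spelled out only to mirror the slide.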
Binary/Hex Conversion
Base 2 Arithmetic - Binary
  2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
  128   64   32   16    8    4    2    1
    1    0    0    0    0    0    0    1
= 1x128 + 1x1 = 129
Base 16 Arithmetic - Hexadecimal
  2^3  2^2  2^1  2^0 |  2^3  2^2  2^1  2^0
    1    0    0    0 |    0    0    0    1
4 binary bits represent one hex character. 1000 0001 binary is 81 hex. To denote hex we use the 0x
prefix - 0x81.
81 hex = 8x16^1 + 1x16^0 = 129
If you consider a byte as two hexadecimal characters, each character will be 4 bits long. So 16 different
hex values can be represented - if all bits of a 4-bit chunk (nibble) are turned on or set to 1 the maximum
value will be 15 (8 + 4 + 2 + 1). Counting in hex goes from 0 to 9, 10 = a, 11 = b, 12 = c, 13 = d, 14 =e,
15 = f.
The leftmost bits are called the high-order bits - they have the most value, whereas the rightmost bits are
referred to as the low-order bits. The same holds true for bytes; the left most are known as high-order
bytes and right most are known as low-order bytes.
Remember from arithmetic that any nonzero number raised to the power of 0 is 1.
Terminology:
Byte = 8 bits
Nibble = 4 bits
Hex char = 4 bits
Word = 32 bits
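As a quick check of the byte-to-hex relationship, the Python sketch below splits the example byte into its two nibbles and prints it in binary and hex. The variable names are illustrative only.

```python
value = 0b10000001                  # the byte from the example, 129 decimal

# Dividing by 16 separates the high-order nibble from the low-order nibble.
high_nibble, low_nibble = divmod(value, 16)

print(f"{value:08b}")               # 10000001 (binary, padded to 8 bits)
print(f"{value:#04x}")              # 0x81     (hex with the 0x prefix)
print(high_nibble, low_nibble)      # 8 1
print(high_nibble * 16**1 + low_nibble * 16**0)   # 129
```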
Hexadecimal Representation
2^3 2^2 2^1 2^0   (Hex)     2^3 2^2 2^1 2^0   (Hex)
 0   0   0   0  =  0         1   0   0   0  =  8
 0   0   0   1  =  1         1   0   0   1  =  9
 0   0   1   0  =  2         1   0   1   0  = 10 (a)
 0   0   1   1  =  3         1   0   1   1  = 11 (b)
 0   1   0   0  =  4         1   1   0   0  = 12 (c)
 0   1   0   1  =  5         1   1   0   1  = 13 (d)
 0   1   1   0  =  6         1   1   1   0  = 14 (e)
 0   1   1   1  =  7         1   1   1   1  = 15 (f)
Hexadecimal is a numbering system whose digits run from 0 to 15. The problem
comes in representing digit values above 9 with a single character so that we can differentiate decimal and
hexadecimal. A value of 10 decimal is a different value than 10 hexadecimal: 10
hexadecimal has a value of 16 in decimal. So, when we get to digit values above 9, we use the letters a through f to
represent 10 – 15, as you can see in the second column above. The letters in parentheses are the
hexadecimal digits for the decimal values shown.
Figuring Out Decimal Values for Hex Output
1. Use reference to discover where fields start and end
2. Each character in the hex output is a power of 16
3. Start at the rightmost character and increase power of 16
4. Multiply each hex digit by its power of 16, add all values
First 8 bytes of hexadecimal output of a UDP header
0089 0089 004c 1fd7
  Source Port   Dest Port   Length   Checksum
     0089          0089      004c      1fd7
Within each field the four hex characters are labeled, left to right, 16^3 16^2 16^1 16^0.
Source port: 8*16^1 + 9*16^0 = 128 + 9 = 137
When you see hexadecimal output and you need to translate it to some kind of coherent output, how do
you start? Let’s assume that we are looking at a field or fields that have numeric values. In other words,
we are not looking at a string payload. Let’s use 8 bytes of hexadecimal output from a UDP header to
describe the process of figuring out the decimal values of all the fields.
The first thing that you need to do is to identify what you are looking at. Most of the time when you look
at hex output, it will be the entire datagram. In this case, for demonstration purposes, we will take an
excerpt of the datagram. This is the first 8 bytes of the UDP header. You’ll need to use some reference,
such as TCP/IP Illustrated, Volume 1 by W. Richard Stevens or the references at the back of the course to
identify the fields in the UDP header. Remember that each character that you see in the output is one hex
character (4 bits) so there are 2 hex characters in a byte. You’ll discover that there is a 16-bit source port,
a 16-bit destination port, a 16-bit UDP length and a 16-bit checksum in the UDP header. Coincidentally,
these are all 2 byte fields – or 4 hex characters. You see that we divide up the hex output accordingly.
Next, start with the rightmost hex character of a field and label it with 16^0. For each hex
character associated with that field, move left and increase the power of 16 by one until you hit the leftmost
character in the field. Then, multiply each hex digit by the power of 16 above it and add all the values.
Using the source port 0089 as an example, we start with the rightmost character and label it 16^0. Next,
we only have one more character that is non-zero and we label that as 16^1. Now, we multiply the
rightmost character 9 by 16^0 (any nonzero number to the 0 power is 1) and get a result of 9. Then we multiply the
next character 8 by 16^1 (or 16) and get 128. Adding 128 and 9, we arrive at 137, which is the source port
typically associated with NetBIOS name service queries.
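The same conversion can be checked with a short Python sketch. It assumes the four 16-bit fields identified above and uses the standard struct module to read them in network byte order; the variable names are illustrative.

```python
import struct

# First 8 bytes of the UDP header from the slide.
udp_header = bytes.fromhex("0089 0089 004c 1fd7")

# "!" selects network (big-endian) byte order; each "H" is one
# unsigned 16-bit field.
src_port, dst_port, length, checksum = struct.unpack("!HHHH", udp_header)

# The same arithmetic done by hand for the source port 0089.
by_hand = 0 * 16**3 + 0 * 16**2 + 8 * 16**1 + 9 * 16**0

print(src_port, dst_port, length, checksum)   # 137 137 76 8151
print(by_hand)                                # 137
```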
Your Turn
These are the first four bytes of the IP header
4500 0030
Use the reference pages at the end of the course to figure out
what the 16-bit total length is in decimal
Figure out the decimal value of the 16-bit total length. Use the reference materials at the end of this
course to find a layout of the IP header and where the 16-bit total length falls in the IP header. Once
you’ve discovered that field, use the methods discussed to figure out the decimal equivalent of the
hex value.
Answer
4 5 0 0 0 0 3 0
  4     = IP version
  5     = IP header length
  00    = TOS
  0030  = 16-bit total length, with its digits labeled 16^3 16^2 16^1 16^0
3*16^1 = 48
Answer: 48 bytes in the IP datagram
The first thing we do is look at the layout for the IP header. The 16-bit total length field is found in
the 2nd and 3rd bytes offset from the start of the IP header (counting starts at 0). We find a value of 0030 in
these 2 bytes. So, we methodically label all the hex digits in this field as powers of 16 starting at
the rightmost digit 0. Because we only have one non-zero value in the IP length field, we really only
need to figure out its value.
The non-zero value of 3 is located in the 16^1 position. So, we simply multiply 3*16 and discover
that the IP length is 48 bytes.
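The answer can be double-checked with a brief Python sketch. It takes the bytes at offsets 2 and 3, the span that the filter notation would write as ip[2:2], and converts them from hex; the variable names are illustrative.

```python
first_four = bytes.fromhex("4500 0030")   # first four bytes of the IP header

# Bytes at offsets 2 and 3 hold the 16-bit total length.
total_length_hex = first_four[2:4].hex()
total_length = int(total_length_hex, 16)

# The same result by hand: only the 16^1 digit is non-zero.
by_hand = 0 * 16**3 + 0 * 16**2 + 3 * 16**1 + 0 * 16**0

print(total_length_hex, total_length, by_hand)   # 0030 48 48
```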
The Problem: Looking at Fields Less Than a Byte
Layout of first byte
4 bit IP version | 4 bit header length
0 1 0 0   Current value in IP version
0 0 0 0   Desired value in IP version
We run into a slight problem when we deal with fields in an IP datagram that are less than a byte in
length. The first byte of the IP header is actually two different fields – a 4 bit IP version and a 4 bit
header length. If we use the protocol[displacement] notation, ip[0] finds both fields. What if we
wanted to look at the 4 bit IP header length only and we were not interested in the 4 bit IP version?
There is really no simple operation that is native to the tcpdump “language” that allows us to do this.
But, we can do some operations and manipulations of fields and bits that will allow us to look at the
4 bit header length only. In essence, if we can zero out or change all the bits in the IP version field
to 0, we really are looking at just the 4 bit header length if we look at ip[0]. How exactly do we
discard or zero-out this high-order nibble and preserve the low-order nibble found in the 4 bit header
length? This is what we will discuss next.
More Fundamentals
• Individual bit or a range of bits selected by bit masking
• Uses the boolean AND operation to keep or discard a bit(s)
• Two bits are AND’ed; the following values yield the following results
BIT A   AND   BIT B   =   RESULT
  0             0           0
  1             0           0
  0             1           0
  1             1           1
We will use the boolean AND operation to help us zero-out unwanted bits. Let’s look at the
fundamentals of applying this theory.
Because we are dealing with computers that talk in binary, we consider taking every combination of
the only two possible bit values - 0 and 1. As you can see from the truth table above, the only time
the resulting value is 1 is when both bits that are AND’ed are 1.
If you imagine “BIT A” as the bit found in the original byte and “BIT B” as a mask value used in an
AND operation of “BIT A”, we can determine the appropriate mask value to either discard or
preserve an original bit.
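The truth table can be reproduced with Python's bitwise AND operator, &. This is only a sketch; the names bit_a and bit_b follow the table, with bit_a standing for the original bit and bit_b for the mask bit.

```python
# Build every combination of the two possible bit values.
table = [(bit_a, bit_b, bit_a & bit_b)
         for bit_a in (0, 1)
         for bit_b in (0, 1)]

for bit_a, bit_b, result in table:
    print(bit_a, "AND", bit_b, "=", result)

# A mask bit of 0 always yields 0 (the original bit is discarded);
# a mask bit of 1 yields the original bit unchanged (it is preserved).
```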
Solution: “AND” Unwanted Bits With 0’s
0 1 0 0   Current value in IP version
0 0 0 0   AND with 0’s
0 0 0 0   Resulting high order nibble value
The solution to dealing with fields that are less than a byte is basically to zero-out all other bits in the
byte other than those we are interested in. In this instance, we want to “AND” the high-order 4 bits
in the first byte of the IP header with zeros. This will yield zeros in the place where there once might
have been non-zero values.
Solution: “AND” Wanted Bits With 1’s
0 1 0 1   Current value in IP header length
1 1 1 1   AND with 1’s
0 1 0 1   Resulting low order nibble value
Because we are dealing with an entire byte, we must also pay attention to the low-order nibble, the
IP header length that we want to preserve. We must preserve the original value that we found there.
We can’t simply ignore this field. In order to preserve the current value found in that field, we
“AND” all bits with a value of 1. This will not change the current value found in that nibble.
The Mask Byte
0 1 0 0   0 1 0 1   Current value in first byte of IP header
0 0 0 0   1 1 1 1   Mask value
0 0 0 0   0 1 0 1   Resulting byte value
Mask value 0000 1111
Hex - 0x0f
Ultimately, what you have to do is create a “mask” byte. This is a byte that will be AND’ed with the
original value found in the first byte of the IP header to give us the desired resulting byte, which will
have the high-order nibble of all zeros and the low-order nibble as it was before the AND operation.
So, this just means that our mask byte is 0000 1111, which translates to two hexadecimal characters
of 0f.
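A minimal Python sketch of applying the mask byte follows, using the first-byte value 0100 0101 from the slide. The variable names are illustrative.

```python
first_byte = 0b0100_0101   # 0x45: version 4, header length 5
mask = 0b0000_1111         # 0x0f: discard high nibble, keep low nibble

result = first_byte & mask

print(f"{first_byte:08b} AND {mask:08b} = {result:08b}")
# 01000101 AND 00001111 = 00000101
print(result)   # 5
```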
Putting it all Together
0 1 0 0   0 1 0 1   Current value in first byte of IP header
0 0 0 0   1 1 1 1   Mask value
Mask value 0000 1111
Hex - 0x0f
Partial filter = ip[0] & 0x0f
                 field AND mask
We figured out the mask that we want to AND with the first byte of the IP header, but how do we
tell tcpdump how to do this? What we do is first identify the byte (or bytes) that we are dealing with
by identifying what protocol we are dealing with (IP) and the displacement into the protocol that the
byte is found (0 – first byte). Next, we use the “&” symbol to denote the AND operation and then
we must tell it what value to AND it with. This is the mask value that we figured out or 0x0f in
hexadecimal.
And Your Point Would Be?
A 1 in a mask bit preserves a corresponding value bit, a 0 in a
mask bit discards a corresponding value bit.
First byte of IP header
    4 bit version    |    4 bit length
 2^3 2^2 2^1 2^0     |  2^3 2^2 2^1 2^0
  0   1   0   0      |   0   1   0   1     Current IP byte 0 fields, version = 4, length = 5
  0   0   0   0      |   1   1   1   1     Mask value
  0   0   0   0      |   0   1   0   1     Discards first 4 bits, preserves second 4 bits
The mask would be 0x0f and the partial filter would be ip[0] & 0x0f.
Once the mask has been computed to figure out which bits to discard and which to preserve, it has
to be “superimposed” over some byte or span of bytes. In this case we need to superimpose the
mask over the entire first byte of the IP header because that is where the fields we are interested in
lie. So, in this case that field is represented by ip[0]. The partial filter of superimposing the
appropriate mask over the field of interest becomes ip[0] & 0x0f.
A way to test whether an IP datagram has options is to test if the IP header length is greater than 5
(the field counts 32 bit “words” of 4 bytes each, so 5 means a 20 byte header with no options). The filter then would become:
ip[0] & 0x0f > 5
If this filter were included in the tcpdump statement with the proper notation or in a file and pointed
to by the tcpdump option -F, all records read that had an IP header length of greater than 5 would be
selected.
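The logic of this filter can be sketched in Python. The function names below are hypothetical, and the one-byte inputs stand in for full packets purely for illustration.

```python
def header_length_words(ip_packet: bytes) -> int:
    """Low-order nibble of byte 0: header length in 32-bit words."""
    return ip_packet[0] & 0x0F

def has_ip_options(ip_packet: bytes) -> bool:
    """Mirror of the filter ip[0] & 0x0f > 5."""
    return header_length_words(ip_packet) > 5

print(has_ip_options(bytes([0x45])))   # False: 5 words, 20 bytes, no options
print(has_ip_options(bytes([0x46])))   # True:  6 words, 24 bytes, options present
```

Multiplying the word count by 4 gives the header length in bytes.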
What would the mask be to preserve the high-order 4 bits (the version number) and discard the
low-order 4 bits (the length)?
0 1 0 0 0 1 0 1 AND
__ __ __ __ __ __ __ __ MASK? <== Fill in the blanks with the mask.
0 1 0 0 0 0 0 0 YIELDS
The answer for this can be found at the end of this chapter, Exercise 1.
TCP Flag Bits
• Located in the TCP header
• Tells much about the state of a given TCP segment
• We often examine this field in filters
Reserved (2 bits) | URG | ACK | PSH | RST | SYN | FIN
This field is denoted as tcp[13]
The TCP reserved bits, in the past, have not been used for anything other than operating system
fingerprinting. They were probably included for some anticipated future use, but at present are not
supposed to be used. When nmap attempts to do operating system fingerprinting, it might try to set
these values just to see how the receiving host responds.
Also, if a packet gets corrupted in transit, these bits might be erroneously set. We will always mask
these bits with 0’s when analyzing the TCP flag bits since they will need to be discarded. We will
then select or discard other flag bits as necessary using an appropriate mask and the boolean AND
operator.
Masking the TCP Flag Bits
What would the mask be to single out the SYN bit only?
  2^3  2^2  2^1  2^0 |  2^3  2^2  2^1  2^0
  Reserved  URG  ACK |  PSH  RST  SYN  FIN
    0    0    0    0 |    0    0    1    0   = 0x02
    0    0    0    1 |    0    0    0    0   = 0x___ ___?
    0    0    0    0 |    1    1    1    1   = 0x___ ___?
What TCP flag bit(s) would lines 2 and 3 individually look at and what
would the masks be to check if any of these non-zero bits is set?
In the above slide, we try to set the foundations for analyzing the TCP flag bits. When we examine
tcpdump data, we may be interested in TCP data that has a particular flag set. For instance, we may
be interested in initial connections only, in which case the SYN bit alone is set. So, in this case we
need to be able to mask the other bits so that we check if the SYN bit only is highlighted. Once this
mask is superimposed over the tcp[13] byte, we will select corresponding tcpdump records in which
the SYN bit is found to have a value of 1.
The TCP flag bits field in the slide above has been depicted to help you determine the mask value.
Since there will be two hexadecimal characters in the mask, the dividing line down the middle marks
the two different characters. Also, notice the base 2 values above each of the bits to assist you in
figuring out the corresponding value for the bit.
The answers for this slide can be found at the end of the chapter, Exercise 2.
Masking the TCP Flag Bits
What would the entire filter be to check if the SYN bit is set?
  2^3  2^2  2^1  2^0 |  2^3  2^2  2^1  2^0
  Reserved  URG  ACK |  PSH  RST  SYN  FIN
    0    0    0    0 |    0    0    1    0   = tcp[13] & 0x02 != 0
    0    0    0    1 |    0    0    0    0   = ?
    0    0    0    0 |    1    1    1    1   = ?
What would the filters be for the second and third lines
to check if the corresponding bits are set?
The comparison of != 0 is used because a complete filter requires a relation and value. The condition
not equal 0 is a generic way of testing that one or more bits of the resulting value between the
original and masked values is not zero. Alternatively, you can test for a condition equal to a value,
but this requires you to figure out the resulting bit values that are set. For instance, tcp[13] & 0x10
= 16 would be another way to test if the ACK bit were set.
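The two styles of test can be compared in a short Python sketch. The flags value below is a hypothetical tcp[13] byte with the SYN and ACK bits set, chosen only for illustration.

```python
SYN = 0x02
ACK = 0x10

flags = SYN | ACK   # hypothetical tcp[13] value 0x12 (SYN and ACK set)

# Generic test: is the masked result anything other than zero?
print((flags & ACK) != 0)    # True

# Equivalent exact test: the masked result equals the bit's value.
print((flags & ACK) == 16)   # True

# A bit that is not set fails both tests (URG is 0x20).
print((flags & 0x20) != 0)   # False
```

For a single-bit mask the two tests always agree; they differ only when the mask covers more than one bit.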
The answers for this slide can be found at the end of this chapter, Exercise 3.
A Couple More Masks
Reserved bits URG ACK PSH RST SYN FIN
1) What would the mask be to check for either the SYN or
FIN bits set? What would the filter be to test for this?
2) What would the mask be to test if any of the 6 TCP flag
bits is non-zero? (hint test for all 6 of them) What would the
filter be for this?
Mask 1 = 0x ___ ___ ???                              Example mask: 0xff
Mask 2 = 0x ___ ___ ???
Filter 1 = _____ [ ___ ] & 0x ___ ___ != 0           Example filter: ip[6] & 0x20 != 0
           protocol [bytes]
Filter 2 = _____ [ ___ ] & 0x ___ ___ != 0
           protocol [bytes]
The answers for this slide can be found at the end of this chapter, Exercise 4.
Checking For Multiple Bits Set
• tcp[13] & 0x03 != 0 will check for either the SYN bit on, the FIN bit on or both the SYN and FIN bits on
• How would you select only the records with both the SYN and FIN bits on simultaneously?
  • Check for an exact value
  Reserved  URG  ACK |  PSH  RST  SYN  FIN
    0    0    0    0 |    0    0    1    1
tcp[13] = 3
What would the filter be to check for all PSH, RST, SYN, and FIN bits set?
Answer:
tcp[13] = 0x0f
To test for exact values in a bit or bits we test for the exact value that should exist if those bits are set. If we
want to check for the PSH, RST, SYN, and FIN flags set, we add up the bit value for all those bits and arrive
at a hexadecimal 0f.
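The difference between the "any of these bits" test and the exact-value test can be sketched in Python. The function names and the three sample flag bytes are hypothetical.

```python
FIN = 0x01
SYN = 0x02

def syn_or_fin(flags: int) -> bool:
    """Mirror of tcp[13] & 0x03 != 0: SYN, FIN, or both are set."""
    return (flags & (SYN | FIN)) != 0

def only_syn_and_fin(flags: int) -> bool:
    """Mirror of tcp[13] = 3: the whole byte equals SYN + FIN."""
    return flags == (SYN | FIN)

# Sample bytes: SYN alone, SYN+FIN, SYN+FIN+ACK.
for flags in (0x02, 0x03, 0x13):
    print(f"{flags:#04x}", syn_or_fin(flags), only_syn_and_fin(flags))
# 0x02 True False
# 0x03 True True
# 0x13 True False
```

Note that the exact-value comparison looks at the entire byte, so it matches only when every other bit, including the reserved bits, is 0.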
Here is an example of the type of records that the first SYN/FIN filter will pick up:
22:57:14.740000 1.2.3.4.0 > 192.168.24.29.53: SF 2969894912:2969894912(0) win 512
22:57:14.750000 1.2.3.4.0 > 192.168.24.30.53: SF 2969894912:2969894912(0) win 512
22:57:14.800000 1.2.3.4.0 > 192.168.24.31.53: SF 2969894912:2969894912(0) win 512
  Reserved  URG  ACK |  PSH  RST  SYN  FIN
    0    0    0    0 |    1    1    1    1