Unix and Linux Backups – SANS GIAC LevelOne
© 2000, 2001
Unix and Linux Backups
for System Administrators
By Robert Blader
Hello, my name is Robert Blader. I’m here to present a tutorial on how to make use of the backup utilities that
UNIX provides and apply them to the development of a backup plan. For the past 10 years, I have worked as
a system administrator at the Naval Surface Warfare Center in Dahlgren, Virginia. The mission of the site I
managed was to develop fire-control software for deployment on board submarines. As such, data
availability, security, and configuration management were of paramount importance.
Before I start, I’d like to tell a story. Perhaps some of you can identify with it. You’re tasked with managing
a system. If it’s new, start with hardware - connect cables, attach peripherals, etc. Next, you install and
configure the operating system, the latest security patches, and security software (Tripwire, TCP Wrappers,
COPS, etc). Next, you create user accounts, groups, and directories. Finally, you add your applications,
compilers, tools, etc. You’re running along fine for six months until one morning users notice they cannot
access files.
The day in question, there is some deadline that must be met and that data is essential. You confirm what you
are being told - you cannot access directories that should be there, and attempts to mount the filesystem are
futile. Your choices are (A) panic; (B) panic while trying to locate a backup that you fear is old and was not
done with a timely recovery in mind; or (C) break out your contingency plan that has your backup/recovery
plan documented step by step. If this is your first crisis, then you probably will handle it using some
combination of A and B. Hopefully, after going through this tutorial, choice C will be a viable option.
Course Objectives
• Use three Unix/Linux backup
commands: tar, dump, and dd (or
cpio)
• Operate the tape device via the mt
command
• Develop a backup strategy that meets
your needs as well as your users’
At the completion of this tutorial, the student will know how to (1) use tar, dump, and dd to
archive data; (2) use the mt command to control the tape media and the tape device;
and (3) apply the UNIX archiving tool set to formulate a backup plan. (Editor’s note:
information on the UNIX command cpio is also included as an appendix to this course. – JEK)
No one can argue against the value of a backup in a time of crisis. Whether the crisis is the result of
a hardware failure such as a disk crash, a security breach, or a user accidentally deleting files, the
ability to recover from the event in a timely manner is what will separate an excellent system
administrator from a mediocre one. Obtaining funding – and the respect and confidence of users – is
a lot easier when you can provide them with restored data rather than with excuses. Devising a
backup scheme that achieves this in a UNIX environment may seem a daunting task, but
it does not need to be. This tutorial will explain the concepts you need to be able to meet
this challenge and succeed.
A list of the requirements that a backup plan should meet will be discussed. A little bit of time spent
creating a backup plan now will make dealing with lost data much less stressful later.
Tutorial Outline
• Unix/Linux Backup Commands
• Tape operation
• Backup strategies
•Conclusion
We will start by presenting the three backup utilities that UNIX provides us.
They are tar, dump, and dd. Each command will be presented with usage, examples, and a
description of the situation that each is best suited for. We will also touch on some personal "war
stories" and useful examples. This way, we will see how the utilities come together to form a
comprehensive backup scheme.
Since magnetic tape is by far the most common medium, we will show how the mt command
comes into play to manage the tape device and manipulate the tape. Next, we will present some
considerations to take into account when creating a backup plan, and wrap up with some closing
notes.
Unix/Linux Backup Commands
•tar
•dump
•dd
•cpio (in Appendix)
The archival commands we will discuss here are tar, dump, and dd.
As we will see, each is suited for different types of backups. Combined, they form a versatile toolkit
for performing backups.
Some information on syntax: the dash preceding the option flags for tar and dump is optional.
Dashes, however, are not used with dd.
tar Usage
• Create tar file
tar cvf <archive> <file>
• Extract tar file
tar xvf <archive> <file>
• List contents of tar archive
tar tvf <archive> <file>
• Copy current directory to another
tar cpf - . | ( cd newdir; tar xvpf - )
–Where
• <archive> is a file or tape device
• <file> is the file or directory to archive
The three primary functions of tar are (1) to create an archive; (2) to extract files from the archive;
and (3) to generate a table of contents for a tar file.
It is simple to use, ideal for backing up only a particular directory tree or a list of files.
Note how in the fourth bullet, we use a dash instead of specifying an “archive”. A dash can be used
in lieu of a device or file name to indicate that the data will either be read from standard input or
written to standard output depending on which side of the pipe it is used.
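To make the fourth bullet concrete, here is a minimal sketch of the directory copy. The src and dst directories are scratch locations invented for this illustration; substitute your own.

```shell
#!/bin/sh
# Sketch: copy one directory tree to another with a tar pipe.
# src and dst are hypothetical scratch directories created just for this demo.
set -e
work=$(mktemp -d)
src="$work/src"
dst="$work/dst"
mkdir -p "$src/sub" "$dst"
echo "hello" > "$src/sub/file.txt"

# Left side: archive "." and write the archive to standard output (the dash).
# Right side: read the archive from standard input (the dash) and extract it.
(cd "$src" && tar cpf - .) | (cd "$dst" && tar xpf -)

ls -l "$dst/sub/file.txt"
```

The parentheses run each side in its own subshell, so the cd on one side does not affect the other.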
tar - the <file> Parameter
• Warning: -p to get all ACL and
permission information
• Absolute vs. Relative path
–Affects whether files will be placed in
current working directory or in
absolute path when restoring
• If restoring file from tar created using absolute
pathname, could wind up overwriting a file if
one exists by that name
tar, when used with the -p flag, will preserve access information. If you administer a heterogeneous
environment, it may be important to extract your tar files on the same platform they were created
on. This is because some operating systems (such as Solaris) support Access Control Lists; others (such as
Linux) do not. If maintaining ACLs is important at your site, note that the information will
be lost when an archive is extracted on a platform that does not support them.
Another thing to keep in mind when creating a tar archive is the use of absolute vs. relative path names. Tar
files are restored to locations based on how they were put on the tape. If they were created using absolute
path names, they will be restored to the same location. Otherwise they are restored relative to the current
working directory. To illustrate the significance, here is a true story:
At the site I used to work at, we routinely got deliveries of software from our contractors. Unfortunately,
one company was lax in their documentation, especially when it came to installation notes. The normal
course of action with a new delivery was to unload it to a “test” area, where the code would be tested prior to
being put into production. The current version remains in use until the code is tested. One day, I was given
an update to install. I extracted the tar file that was delivered. Since it was backed up using absolute path
names, the current version wound up being overwritten. I had to restore the original version, move it to a
temporary location, extract the new files, move them to a test directory, and move the old version back to
where it belonged. Moral of the story: know what you are extracting, make sure you know where the files
are going, and know if the files already exist on disk. Otherwise, a 15 minute task could take you all
afternoon.
Absolute vs Relative Examples
•Backup /etc to etc_archive.tar
Absolute path (would overwrite /etc when
extracted):
tar -cvf etc_archive.tar /etc
Relative path (use “.” to indicate current directory):
cd /etc
tar cvf /etc_archive.tar .
Here are examples of how an archive is created with tar using both absolute and relative path names.
In the absolute path example, the contents of /etc would be overwritten when restored.
Use of the “.” indicates that the archive uses relative path names. Restoring files created in this
manner will place them in the current directory. Typically, you would want to first create an empty
directory from which to stage the tar extraction.
By the way, Linux (Red Hat) tar, by default, strips any leading slashes. This can be
overridden with the -P flag. Note that not all vendors’ implementations of tar behave this way.
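As a rough illustration of the relative-path approach, the sketch below archives a directory using “.”, checks the stored names, and extracts into an empty staging directory. Every path in it is a scratch location made up for the demo, not a real system directory.

```shell
#!/bin/sh
# Sketch: archive with relative names, inspect the listing, extract into staging.
# etc_copy and staging are hypothetical scratch directories, not the real /etc.
set -e
work=$(mktemp -d)
mkdir -p "$work/etc_copy" "$work/staging"
echo "127.0.0.1 localhost" > "$work/etc_copy/hosts"

# cd first and archive ".", so names are stored relative to the directory.
(cd "$work/etc_copy" && tar cf "$work/etc_archive.tar" .)

# Look at the stored names before extracting anything.
tar tf "$work/etc_archive.tar"

# Extract into an empty staging directory rather than on top of live files.
(cd "$work/staging" && tar xf "$work/etc_archive.tar")
ls "$work/staging"
```

Once the files are in the staging directory, they can be compared with the live copies and moved into place deliberately.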
Use Caution When Extracting
Tar Files
• If backed up with absolute path:
–Take care that files by that name
don’t already exist
• If backed up with relative path
– Will restore to current directory. Be
certain you cd to the directory you
want the files to reside in
Whether using relative or absolute pathnames, caution should be used. If absolute pathnames are
used, make sure you do not accidentally overwrite files on disk. The next slide shows a snippet of
code that can be used as a shell script to check that the files that are on a tape will not overwrite any
files without you knowing it.
Alternatively, if relative paths are used and the files go to the directory you are in, you need to make
sure that is where you want them to wind up. A common mistake is to untar the file while still sitting
in a directory full of files like /usr for example, and then having to “relocate” the files that do not
belong there.
Ensure Don’t Overwrite Files
With tar
• The following code could help find
files that could get overwritten:
tar -tf /dev/nrst0 > tar_listing.out
while IFS= read -r FILE
do
  if [ -f "$FILE" ]; then
    echo "$FILE exists"
    mv "$FILE" "$FILE.orig"
  fi
done < tar_listing.out
Here is one way to ensure that you don’t overwrite files. First, we use tar with the -t option to
list the file names in the archive and save the listing to a temporary file called tar_listing.out.
Then, we read the listing one name at a time and test to
see if a file by that name exists. If so, we print a warning and rename the existing file with a .orig extension. This
way we can be proactive when we restore files and not just cross our fingers and hope for the best.
As a rule of thumb, it is recommended that you use relative path names, extract to a temporary
directory, and then copy files to where you want them to permanently reside. This way, you avoid
overwriting a file by accident.
Other Tar Options
• Tar a list of files with -I (include)
–Want all *.c files from the /development
directory tree or file system:
find /development -name "*.c" > filelist.out
tar -cvf c_files_archive.tar -I filelist.out
• Likewise, exclude files with -X
This example shows how you can use the find command in conjunction with tar (with the -I
flag) to create an include list. Here we are archiving C source files.
The find command says “search the /development directory tree for files matching the pattern *.c.
Save the results to a file called filelist.out”.
The tar command says “archive all the files in filelist.out and call the archive c_files_archive.tar”.
Note that -I is the include-file flag of Solaris tar; GNU tar reads a file list with the -T flag instead.
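For readers who want to try the include-list idea without a /development tree, here is a small sketch. It assumes a tar that accepts -T for the file list (GNU tar and bsdtar do), and the directory and file names are invented for the demo.

```shell
#!/bin/sh
# Sketch: build an include list with find and hand it to tar.
# Assumes tar reads the list with -T (GNU tar, bsdtar); Solaris tar uses -I.
# The development tree below is a hypothetical stand-in created for the demo.
set -e
work=$(mktemp -d)
mkdir -p "$work/development/lib"
echo 'int main(void) { return 0; }' > "$work/development/main.c"
echo 'int f(void) { return 1; }'    > "$work/development/lib/f.c"
echo 'notes'                        > "$work/development/README"

cd "$work"
# Collect only the C source files, using relative names.
find development -name '*.c' > filelist.out

# Archive exactly the files named in the list, then show what was stored.
tar -cf c_files_archive.tar -T filelist.out
tar -tf c_files_archive.tar
```

Because find is run with a relative starting point, the archive also ends up with relative path names.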
tar Summary
• Use relative paths when possible
• Can be used for directory trees or selected
files if listed with an include or exclude file
•Use -p to retain security attributes (e.g.
ACLs)
• Can archive on-line, to tape, or use on either
side of a pipe
•Use -M to span multiple volumes (Red Hat)
To summarize, tar is the simplest and perhaps the most versatile of the backup commands
available.
To be safe, use relative pathnames and use -t to double check the contents of a tape prior to
extracting it.
You can tar an entire directory tree or just selected files.
Use the –p flag to retain ownership, group and access mode, and ACLs on platforms that support
them.
You can use tar to archive on-line, to tape device, or use it on either side of a pipe.
tar does not support device files. Also, on some versions of UNIX, multiple volumes may not be
supported. However, Red Hat Linux supports multiple volumes with the -M option. Check to see if
your version of tar supports the use of multiple volumes/tapes.
Using dump/restore
• Dump levels (full vs. incremental)
• File-system based:
– Rerun full dump if you upgrade the operating
system.
– Need to be root to run dump (need read access to
raw partition).
– Can only dump a local file system, not NFS-
mounted.
– Good time for housekeeping.
dump gives you the option to archive either an entire file system, or only the files that have been
changed since a previous dump. tar does not look at whether or not a file was previously dumped
(or tarred).
Since dump is file system based, there are some things to keep in mind.
First, a full dump should be run after an upgrade or reinstallation of the operating system. This is
because dates on files are when the files were “mastered”, not actually copied to your system.
Therefore, their creation dates relative to your dumps will be out of synch. In other words, the files
you install will be NEW, yet could have older time stamps than the files they are replacing. The
/etc/dumpdates file will not be accurate, and incremental dumps will not pick them up as being
changed files.
Second, tar only requires that you be able to read the file in order to archive it. dump accesses the
raw device (which typically is readable only by root), so non-privileged users cannot run it (without
use of sudo, or a setUID script).
And third, dump only supports local UNIX file systems. It cannot dump NFS-mounted partitions. If
you need to dump a remote partition, run dump on the system serving it and use
hostname:device to specify a remote tape device.
dump
• Full dump or level “0”
– dump captures entire file systems - can only be used to
dump an entire local file system
• Incremental dump or level 1-9
– Captures all files modified since previous dump of a
lower level
–Uses /etc/dumpdates to store date, dumplevel and
filesystem name
• Set block size, tape length, density, etc
– Defaults are often okay. Some large drives require
specific values which can be obtained from the vendor
Aside from full vs. incremental, you really have no control over which files get dumped.
A level 0 dump captures an entire file system. Incremental dumps (levels 1-9) record files modified
since a dump of lower level.
dump uses the /etc/dumpdates file to record what level dump was done on which file system
and when.
dump also keeps track of the amount of media used. When dumping small partitions to tape, you
can usually rely on defaults; but if dumping large partitions (several GB) to large capacity media,
you may need to specify tape length, density or both. Tape drive vendors can usually assist with
these parameters.
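The rule “modified since the previous dump of a lower level” can be sketched in a few lines of shell. This only illustrates the selection logic and is not how dump itself is implemented; it assumes a simplified, hypothetical dumpdates layout that stores the time as seconds rather than as a readable date.

```shell
#!/bin/sh
# Sketch of how an incremental dump picks its reference time.
# Hypothetical, simplified dumpdates layout: "<filesystem> <level> <seconds>".
# (The real /etc/dumpdates stores a human-readable date instead.)
dumpdates=$(mktemp)
cat > "$dumpdates" <<'EOF'
/usr 0 1000
/usr 1 2000
/usr 2 3000
/home 0 2500
EOF

# Print the time of the most recent dump of a LOWER level for a file system.
# Prints 0 when no such record exists, meaning every file gets dumped.
reference_time() {
    awk -v fs="$1" -v lvl="$2" '
        $1 == fs && $2 + 0 < lvl + 0 && $3 + 0 > best { best = $3 + 0 }
        END { print best + 0 }
    ' "$dumpdates"
}

reference_time /usr 2    # prints 2000: relative to the level 1 dump
reference_time /usr 1    # prints 1000: relative to the level 0 dump
reference_time /usr 0    # prints 0: no lower level, so a full dump
```

The last call also shows why a missing dumpdates record makes an incremental dump behave like a full one.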
dump Examples
• Full dump of /usr:
dump 0uf /dev/nrst0 /usr
• Incremental (level 2) dump of /usr to
a 20 GB Travan tape drive
dump 2usdf 740 106400 /dev/nrst0 /usr
Some dump examples:
The simplest form of the dump command consists of the dump level, u (update the dumpdates file),
f (followed by the device name), and the file system to dump.
The last parameter may be specified as a mount point (like /usr) or a disk device name
(/dev/hd0a). If you need to specify other parameters, they must be given in the order of the flags used;
example two uses the s (size) and d (density) flags, so the size must come before the density, followed by the device name for f.
The first example is a full dump of the /usr file system.
The second example is a level 2 incremental dump of /usr to a 20 GB tape drive.
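The pairing of key letters with their arguments is easy to get wrong, so here is a small sketch that walks through a key string and prints which argument each letter consumes. The explain_dump_key function is a helper invented for this illustration; it handles only the letters used in this tutorial and does not run dump.

```shell
#!/bin/sh
# Sketch: show which argument each dump key letter consumes.
# explain_dump_key is a hypothetical helper; it only prints, it never runs dump.
explain_dump_key() {
    key=$1
    shift
    while [ -n "$key" ]; do
        letter=${key%"${key#?}"}    # first character of the key string
        key=${key#?}                # remainder of the key string
        case $letter in
            [0-9]) echo "level=$letter" ;;
            u)     echo "u: update dumpdates" ;;
            s)     echo "s: size=$1"; shift ;;
            d)     echo "d: density=$1"; shift ;;
            f)     echo "f: device=$1"; shift ;;
        esac
    done
    echo "file system=$1"
}

explain_dump_key 2usdf 740 106400 /dev/nrst0 /usr
```

Reading the output top to bottom shows that the arguments are taken in exactly the order the letters s, d, and f appear in the key.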
dump Notes
• u updates the dumpdates file
– Otherwise dump is full (regardless of
dump level)
• dump normally rewinds to beginning of
tape prior to writing
– If you write multiple dump files to tape
and do NOT use a non-rewinding device,
you will overwrite archives
We’ve already discussed the use of the dumpdates file. If you do not use the -u flag to indicate that
the dumpdates file should be updated, your incremental dumps will be run
as though you had specified full dumps. This is because dump has no record of when the file system
was last dumped.
Be careful to use a non-rewinding device! dump rewinds to the beginning of the tape if not directed
otherwise. This will result in all but the last dump file being overwritten.
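Here is a hedged sketch of writing several dumps back to back on one tape. It defaults to a dry run that only prints the commands, since the device name /dev/nrst0 is simply the one used in this tutorial and the run and dump_all helpers are invented for the illustration.

```shell
#!/bin/sh
# Sketch: write several full dumps one after another on a single tape.
# TAPE is the non-rewinding device name used in this tutorial; yours may differ.
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
TAPE=${TAPE:-/dev/nrst0}
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

dump_all() {
    run mt -f "$TAPE" rew              # start from the beginning of the tape
    for fs in "$@"; do
        run dump 0uf "$TAPE" "$fs"     # non-rewinding device: next dump follows this one
    done
    run mt -f "$TAPE" offl             # rewind and eject when finished
}

dump_all / /usr /home
```

With a rewinding device name in TAPE, each dump in the loop would start at the beginning of the tape and overwrite the one before it.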
restore
• Used for reading dump-formatted
archives to restore or create “table of
contents”
• Interactive and non-interactive mode
– Interactive mode permits “browsing
of dump file”
restore reads back archives created by dump. It can be used to generate a listing or extract files.
Note it cannot read tar files, nor can tar read files created by dump.
restore has both an interactive and a non-interactive mode. The interactive mode permits you to
“browse” through a dump file so that you may search for what you want to retrieve.
Restore Examples
• Interactive restore of /etc/hosts:
cd /tmp; restore ivf /dev/nrst0
restore> cd etc
restore> add hosts
make node ./etc
restore> extract
Specify next volume #: 1
Extract file ./etc/hosts
Add links
Set directory mode, owner, and times
Set owner/mode for '.'? [yn] y
restore> quit
• add builds list of files to search for
• When extracted, full path is recreated relative to
current directory and restored file is
/tmp/etc/hosts
Here is an example of an interactive restore.
We first cd to /tmp and use /tmp as a staging directory.
Next, we invoke restore with the i (interactive) flag.
Note that add only builds the search list; it does not actually retrieve files. Issue an add command
for each file you want to extract. The extract command tells restore to start searching
the tape.
The full path to each restored file is recreated relative to the current directory. In this case, we are retrieving /etc/hosts while in
the /tmp directory. Therefore, restore will put the restored file in /tmp/etc/hosts.
Restore Examples
•Non-interactive mode -
– Restore full filesystem, /big_project
– Start with most recent level 0 dump tape
cd /big_project
restore rf /dev/nrst0
– Repeat for each incremental dump taken
between the level 0 and present,
restore rf /dev/nrst0
In non-interactive mode, be certain that you have set your current working directory to be the
directory that you want data restored to prior to running the restore -r command.
Position the tape to the desired dump file on the tape. There will be more on this when we discuss
tape operations.
cd to the directory where you want the data restored.
Issue restore with the r and f flags to extract the entire dump.
Repeat the procedure for each incremental dump taken between the level 0 and the present.
Using dd
• Provides image of disk - as near to a bit-by-
bit copy as you can get with Linux tools
– In event of compromise, could include
“deleted” files
• Ability to copy tape to tape
• Ability to read from non-UNIX platform or
UNIX systems with different byte order
(Sun/SGI)
dd is a utility that reads input files block by block.
If you specify a disk device, you can capture file system metadata – blocks of “data” marked deleted
that could be useful for evidence gathering following a break in. This data would be missed if using
tar or dump, which rely on the UNIX file system.
The input file can be a tape or disk device name, enabling you to make tape-to-tape copies without
having to unpack the archive.
Additionally, you can do conversions on the blocks of data permitting you to swap byte pairs -
enabling you to go between SGI and other UNIX variants.
Other conversions include changing upper to lowercase data, ASCII to EBCDIC, and others. Refer
to the man pages for a complete list.
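The conversions are easy to try on standard input, with no tape or disk involved. The following is a minimal sketch using two of them.

```shell
#!/bin/sh
# Sketch: dd conversions applied to data on standard input (no device needed).
# 2>/dev/null hides the record counts that dd prints when it finishes.

# swab swaps each pair of bytes, which is what a byte-order change requires.
printf 'abcd' | dd conv=swab 2>/dev/null
echo

# ucase converts lowercase letters to uppercase.
printf 'Backup\n' | dd conv=ucase 2>/dev/null
```

The first command prints badc and the second prints BACKUP.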
dd Examples
• Image copy of a file system
dd if=/dev/hd0a of=/dev/nrst0
• Tape to tape copy
dd if=/dev/nrst0 of=/dev/nrst1
• Copy from a platform with different
byte order
dd if=/dev/nrst0 conv=swab | tar xf -
Here are three sample uses of dd:
First, we copy a disk partition to a tape.
In the second example, we are copying from one tape drive to another.
Example three shows a more complex use of dd. Here we are using it to help transfer a tar file from
an SGI to a Linux system. Since these two platforms have a different byte order, a conversion needs
to take place. The byte-order conversion is made to an archive residing on a tape and piped to a tar
command.
This is probably not something you need to do often but is shown to illustrate how powerful dd can
be.
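To experiment with an image copy safely, an ordinary file can stand in for the disk partition and for the tape. The sketch below does that and then checks the copy; the file names are made up for the demo.

```shell
#!/bin/sh
# Sketch: image copy with dd, using ordinary files as stand-ins for devices.
set -e
work=$(mktemp -d)
disk="$work/disk.img"       # hypothetical stand-in for a partition such as /dev/hd0a
image="$work/backup.img"    # hypothetical stand-in for the tape device

# Build a 64 KB "partition" filled with zeros.
dd if=/dev/zero of="$disk" bs=1024 count=64 2>/dev/null

# Copy it block by block, exactly as with a real device.
dd if="$disk" of="$image" bs=1024 2>/dev/null

# Confirm that the copy is byte-for-byte identical to the source.
cmp "$disk" "$image" && echo "image matches source"
```

The same if= and of= arguments accept device names, so the commands carry over directly to a real partition and tape drive.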
dd Examples - Forensics
• Files AND filesystem metadata are
saved for forensics study
• Files that might have been maliciously
deleted (i.e., log files) might be able to
be restored
•Usage:
dd if=input_device of=output_device
If your system should ever be infected by a virus, Trojan horse, etc., first perform a backup of the
filesystem using dd. This will preserve filesystem information, along with “deleted” disk blocks
which forensics experts may be able to recover.
Ideally, you will have a ready spare to rebuild onto from your backups and can set the compromised
disk aside for forensic study.
A binary copy, in the hands of a computer forensics expert, might also provide insight into how the
virus operates.
Command Summary
• tar
– Archive selected files or directories
– Copy contents of one directory to another
• dump/restore
– Can choose between full and incremental
– Can only dump entire file system, not a
single file or directory
•dd
–Binary backup
– Use to modify format of a dump file
– Copy archives between tapes
This slide compares and contrasts the backup commands tar, dump, and dd.
tar is best for backing up a single directory or selected files. You can also use it to copy the
contents of one directory to another, with the exception of /dev.
dump and its counterpart restore are best suited for creating backups of entire partitions. Usually
run from cron as root (since it needs to be able to read the raw device), it is typically used for
nightly backups.
dd is used to create a binary (byte for byte) copy of a device, convert between block formats, and
copy archive files, without extracting or restoring the archives.
None of the three is better or worse than the others; they are just suited for different things. All three combined make for
an effective tool set that can support your backup plan.
Tutorial Outline
• Unix/Linux Backup commands
• Tape operation
• Backup strategies
•Conclusion
Since backups are usually done with a tape device as the target, an overview of how to manage the
media is in order.
Let’s examine how we manipulate tape media in conjunction with backup.
Tape Management
• Magnetic tape is the traditional medium of
choice
– Capacity
–Size
– Range of options
• 4mm, 8mm, DLT,
– Reusable and economical
• However, disks and CDROM are other
technologies that might be considered
Tapes are, by far, the backup media of choice. They offer high capacity, take little space, are
available in various sizes and formats, and are cost efficient. However, they are not the only option
available.
Disks are getting cheaper and bigger. Experience has shown me that about 90% of user-requested
restores are for files that were modified within the past week. In an environment where restores are
requested often, and restore time must be kept low, keeping archives online for a brief period of
time may be worth considering.
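One possible way to keep a week of archives on disk is sketched below: write a dated tar archive, then prune anything older than seven days. The directory names and the seven-day window are assumptions chosen for the illustration, not a recommendation for every site.

```shell
#!/bin/sh
# Sketch: keep recent tar archives on disk and prune those older than a week.
# archives and project are hypothetical scratch directories created for the demo.
set -e
work=$(mktemp -d)
archives="$work/archives"
data="$work/project"
mkdir -p "$archives" "$data"
echo "source" > "$data/file.txt"

# Today's archive, named by date and created with relative path names.
today=$(date +%Y%m%d)
(cd "$data" && tar cf "$archives/project-$today.tar" .)

# Simulate an old archive so the pruning step has something to remove.
touch -t 200001010000 "$archives/project-20000101.tar"

# Remove archives last modified more than 7 days ago.
find "$archives" -name '*.tar' -mtime +7 -exec rm -f {} \;
ls "$archives"
```

Run nightly, a script along these lines leaves only the most recent week of archives available for quick restores.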
mt Usage
mt -f /dev/nrst0 command
command can be any of the following:
• status = status of device
– (Tape must be loaded to get information on
device type, on-line, etc)
• rew = rewind
• offl = rewinds tape, ejects it
• fsf n = fast forward over n archive files
• bsf n = rewind over n archive files
• eom = skip to end of recorded media
Tape devices are managed using the mt command.
This slide shows some of the mt options for manipulating tape. As mentioned earlier, mt can
provide tape drive status, rewind, eject a tape and take the drive offline, skip forward over archive files,
back up over archive files, and jump to the end of the recorded media in preparation for appending a new
archive.
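As an illustration of combining these options, the sketch below positions the tape at a chosen archive file and lists it. It runs as a dry run by default and only prints the commands; the run and list_nth_archive helpers are invented for this example, and the device name is the one used throughout this tutorial.

```shell
#!/bin/sh
# Sketch: position the tape at the Nth archive file (counting from 1) and list it.
# TAPE is the non-rewinding device name used in this tutorial; yours may differ.
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
TAPE=${TAPE:-/dev/nrst0}
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

list_nth_archive() {
    n=$1
    run mt -f "$TAPE" rew                   # begin from a known position
    if [ "$n" -gt 1 ]; then
        run mt -f "$TAPE" fsf $((n - 1))    # skip over the archives before it
    fi
    run tar tvf "$TAPE"                     # list the archive at this position
}

list_nth_archive 3
```

Replacing the tar listing with restore tf would do the same for a tape of dump files.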