Self-Service Linux®
BRUCE PERENS’ OPEN SOURCE SERIES
www.phptr.com/perens
◆
Java™ Application Development on Linux®
Carl Albing and Michael Schwarz
◆
C++ GUI Programming with Qt 3
Jasmin Blanchette and Mark Summerfield
◆
Managing Linux Systems with Webmin: System Administration and Module Development
Jamie Cameron
◆
Understanding the Linux Virtual Memory Manager
Mel Gorman
◆
PHP 5 Power Programming
Andi Gutmans, Stig Bakken, and Derick Rethans
◆
Linux® Quick Fix Notebook
Peter Harrison
◆
Implementing CIFS: The Common Internet File System
Christopher Hertel
◆
Open Source Security Tools: A Practical Guide to Security Applications
Tony Howlett
◆
Apache Jakarta Commons: Reusable Java™ Components
Will Iverson
◆
Embedded Software Development with eCos
Anthony Massa
◆
Rapid Application Development with Mozilla
Nigel McFarlane
◆
Subversion Version Control: Using the Subversion Version Control System in Development
Projects
William Nagel
◆
Intrusion Detection with SNORT: Advanced IDS Techniques Using SNORT, Apache, MySQL,
PHP, and ACID
Rafeeq Ur Rehman
◆
Cross-Platform GUI Programming with wxWidgets
Julian Smart and Kevin Hock with Stefan Csomor
◆
Samba-3 by Example, Second Edition: Practical Exercises to Successful Deployment
John H. Terpstra
◆
The Official Samba-3 HOWTO and Reference Guide, Second Edition
John H. Terpstra and Jelmer R. Vernooij, Editors
◆
Self-Service Linux®: Mastering the Art of Problem Determination
Mark Wilding and Dan Behman
Mark Wilding and Dan Behman
PRENTICE HALL
Professional Technical Reference
Upper Saddle River, NJ ● Boston ●
Indianapolis ● San Francisco ● New York ●
Toronto ● Montreal ● London ● Munich ●
Paris ● Madrid ● Capetown ● Sydney ●
Tokyo ● Singapore ● Mexico City
Self-Service Linux®
Mastering the Art of Problem
Determination
Many of the designations used by manufacturers and sellers to distinguish their products
are claimed as trademarks. Where those designations appear in this book, and the publisher
was aware of a trademark claim, the designations have been printed with initial capital
letters or in all capitals.
The authors and publisher have taken care in the preparation of this book, but make no
expressed or implied warranty of any kind and assume no responsibility for errors or omissions.
No liability is assumed for incidental or consequential damages in connection with or arising
out of the use of the information or programs contained herein.
The publisher offers excellent discounts on this book when ordered in quantity for bulk
purchases or special sales, which may include electronic versions and/or custom covers and
content particular to your business, training goals, marketing focus, and branding interests.
For more information, please contact:
U. S. Corporate and Government Sales
(800) 382-3419
For sales outside the U. S., please contact:
International Sales
Visit us on the Web: www.phptr.com
Library of Congress Number: 2005927150
Copyright © 2006 Pearson Education, Inc.
This material may be distributed only subject to the terms and conditions set forth in the Open
Publication License, v1.0 or later (the latest version is presently available at
http://www.opencontent.org/openpub/).
ISBN 0-13-147751-X
Text printed in the United States on recycled paper at R.R. Donnelley in Crawfordsville,
Indiana.
First printing, September, 2005
I would like to dedicate this book to my wife, Caryna, whose relentless nagging and
badgering forced me to continue working on this book when nothing else could. Just
kidding! Without Caryna’s support and understanding, I could never have written
this book. Not only did she help me find time to write, she also spent countless hours
formatting the entire book for production. I would also like to dedicate this book to
my two sons, Rhys and Dylan, whose boundless energy acted as inspiration
throughout the writing of this book.
Mark Wilding
Without the enduring love and patience of my wife Kim, this laborious
project would have halted long ago. I dedicate this book to her,
as well as to my beautiful son Nicholas, my family,
and all of the Botzangs and Mayos.
Dan Behman
Contents
Preface
Chapter 1: Best Practices and Initial Investigation
Chapter 2: strace and System Call Tracing Explained
Chapter 3: The /proc Filesystem
Chapter 4: Compiling
Chapter 5: The Stack
Chapter 6: The GNU Debugger (GDB)
Chapter 7: Linux System Crashes and Hangs
Chapter 8: Kernel Debugging with KDB
Chapter 9: ELF: Executable and Linking Format
A: The Toolbox
B: Data Collection Script
Index
Contents
Preface xvii
1 Best Practices and Initial Investigation 1
1.1 Introduction 1
1.2 Getting Your System(s) Ready for Effective Problem
Determination 2
1.3 The Four Phases of Investigation 3
1.3.1 Phase #1: Initial Investigation Using Your Own Skills 5
1.3.2 Phase #2: Searching the Internet Effectively 9
1.3.3 Phase #3: Begin Deeper Investigation (Good Problem
Investigation Practices) 12
1.3.4 Phase #4: Getting Help or New Ideas 21
1.4 Technical Investigation 28
1.4.1 Symptom Versus Cause 28
1.5 Troubleshooting Commercial Products 38
1.6 Conclusion 39
2 strace and System Call Tracing Explained 41
2.1 Introduction 41
2.2 What Is strace? 41
2.2.1 More Information from the Kernel Side 45
2.2.2 When to Use It 48
2.2.3 Simple Example 49
2.2.4 Same Program Built Statically 53
2.3 Important strace Options 54
2.3.1 Following Child Processes 54
2.3.2 Timing System Call Activity 55
2.3.3 Verbose Mode 57
2.3.4 Tracing a Running Process 59
2.4 Effects and Issues of Using strace 60
2.4.1 strace and EINTR 61
2.5 Real Debugging Examples 62
2.5.1 Reducing Start Up Time by Fixing
LD_LIBRARY_PATH 62
2.5.2 The PATH Environment Variable 65
2.5.3 stracing inetd or xinetd (the Super Server) 66
2.5.4 Communication Errors 68
2.5.5 Investigating a Hang Using strace 69
2.5.6 Reverse Engineering (How the strace Tool Itself Works) 71
2.6 System Call Tracing Examples 74
2.6.1 Sample Code 75
2.6.2 The System Call Tracing Code Explained 87
2.7 Conclusion 88
3 The /proc Filesystem 89
3.1 Introduction 89
3.2 Process Information 90
3.2.1 /proc/self 90
3.2.2 /proc/<pid> in More Detail 91
3.2.3 /proc/<pid>/cmdline 107
3.2.4 /proc/<pid>/environ 107
3.2.5 /proc/<pid>/mem 107
3.2.6 /proc/<pid>/fd 108
3.2.7 /proc/<pid>/mapped_base 108
3.3 Kernel Information and Manipulation 109
3.3.1 /proc/cmdline 109
3.3.2 /proc/config.gz or /proc/sys/config.gz 109
3.3.3 /proc/cpufreq 109
3.3.4 /proc/cpuinfo 110
3.3.5 /proc/devices 110
3.3.6 /proc/kcore 111
3.3.7 /proc/locks 111
3.3.8 /proc/meminfo 111
3.3.9 /proc/mm 111
3.3.10 /proc/modules 112
3.3.11 /proc/net 112
3.3.12 /proc/partitions 112
3.3.13 /proc/pci 113
3.3.14 /proc/slabinfo 113
3.4 System Information and Manipulation 113
3.4.1 /proc/sys/fs 113
3.4.2 /proc/sys/kernel 115
3.4.3 /proc/sys/vm 120
3.5 Conclusion 120
4 Compiling 121
4.1 Introduction 121
4.2 The GNU Compiler Collection 121
4.2.1 A Brief History of GCC 121
4.2.2 GCC Version Compatibility 122
4.3 Other Compilers 122
4.4 Compiling the Linux Kernel 123
4.4.1 Obtaining the Kernel Source 123
4.4.2 Architecture Specific Source 124
4.4.3 Working with Kernel Source Compile Errors 124
4.4.4 General Compilation Problems 128
4.5 Assembly Listings 133
4.5.1 Purpose of Assembly Listings 134
4.5.2 Generating Assembly Listings 135
4.5.3 Reading and Understanding an Assembly Listing 136
4.6 Compiler Optimizations 140
4.7 Conclusion 149
5 The Stack 151
5.1 Introduction 151
5.2 A Real-World Analogy 152
5.3 Stacks in x86 and x86-64 Architectures 153
5.4 What Is a Stack Frame? 157
5.5 How Does the Stack Work? 159
5.5.1 The BP and SP Registers 159
5.5.2 Function Calling Conventions 162
5.6 Referencing and Modifying Data on the Stack 171
5.7 Viewing the Raw Stack in a Debugger 173
5.8 Examining the Raw Stack in Detail 176
5.8.1 Homegrown Stack Traceback Function 180
5.9 Conclusion 191
6 The GNU Debugger (GDB) 193
6.1 Introduction 193
6.2 When to Use a Debugger 194
6.3 Command Line Editing 195
6.4 Controlling a Process with GDB 196
6.4.1 Running a Program Off the Command Line with GDB 197
6.4.2 Attaching to a Running Process 199
6.4.3 Use a Core File 200
6.5 Examining Data, Memory, and Registers 204
6.5.1 Memory Map 204
6.5.2 Stack 206
6.5.3 Examining Memory and Variables 210
6.5.4 Register Dump 217
6.6 Execution 220
6.6.1 The Basic Commands 221
6.6.2 Settings for Execution Control Commands 223
6.6.3 Breakpoints 228
6.6.4 Watchpoints 230
6.6.5 Display Expression on Stop 234
6.6.6 Working with Shared Libraries 235
6.7 Source Code 238
6.8 Assembly Language 240
6.9 Tips and Tricks 241
6.9.1 Attaching to a Process—Revisited 241
6.9.2 Finding the Address of Variables and Functions 244
6.9.3 Viewing Structures in Executables without Debug
Symbols 246
6.9.4 Understanding and Dealing with Endian-ness 250
6.10 Working with C++ 252
6.10.1 Global Constructors and Destructors 252
6.10.2 Inline Functions 256
6.10.3 Exceptions 257
6.11 Threads 260
6.11.1 Running Out of Stack Space 265
6.12 Data Display Debugger (DDD) 266
6.12.1 The Data Display Window 268
6.12.2 Source Code Window 272
6.12.3 Machine Language Window 273
6.12.4 GDB Console Window 274
6.13 Conclusion 274
7 Linux System Crashes and Hangs 275
7.1 Introduction 275
7.2 Gathering Information 275
7.2.1 Syslog Explained 276
7.2.2 Setting up a Serial Console 277
7.2.3 Connecting the Serial Null-Modem Cable 278
7.2.4 Enabling the Serial Console at Startup 279
7.2.5 Using SysRq Kernel Magic 281
7.2.6 Oops Reports 281
7.2.7 Adding a Manual Kernel Trap 281
7.2.8 Examining an Oops Report 284
7.2.9 Determining the Failing Line of Code 289
7.2.10 Kernel Oopses and Hardware 293
7.2.11 Setting up cscope to Index Kernel Sources 294
7.3 Conclusion 295
8 Kernel Debugging with KDB 297
8.1 Introduction 297
8.2 Enabling KDB 297
8.3 Using KDB 299
8.3.1 Activating KDB 299
8.3.2 Resuming Normal Execution 300
8.3.3 Basic Commands 300
8.4 Conclusion 305
9 ELF: Executable and Linking Format 307
9.1 Introduction 307
9.2 Concepts and Definitions 309
9.2.1 Symbol 309
9.2.2 Object Files, Shared Libraries, Executables, and Core
Files 311
9.2.3 Linking 314
9.2.4 Run Time Linking 318
9.2.5 Program Interpreter / Run Time Linker 318
9.3 ELF Header 318
9.4 Overview of Segments and Sections 324
9.5 Segments and the Program Header Table 325
9.5.1 Text and Data Segments 329
9.6 Sections and the Section Header Table 331
9.6.1 String Table Format 335
9.6.2 Symbol Table Format 335
9.6.3 Section Names and Types 338
9.7 Relocation and Position Independent Code (PIC) 362
9.7.1 PIC vs. non-PIC 363
9.7.2 Relocation and Position Independent Code 366
9.7.3 Relocation and Linking 367
9.8 Stripping an ELF Object 371
9.9 Program Interpreter 372
9.9.1 Link Map 376
9.10 Symbol Resolution 377
9.11 Use of Weak Symbols for Problem Investigations 382
9.12 Advanced Interception Using Global Offset Table 386
9.13 Source Files 390
9.14 ELF APIs 392
9.15 Other Information 392
9.16 Conclusion 392
A The Toolbox 393
A.1 Introduction 393
A.2 Process Information and Debugging 393
A.2.1 Tool: GDB 393
A.2.2 Tool: ps 393
A.2.3 Tool: strace (system call tracer) 394
A.2.4 Tool: /proc filesystem 394
A.2.5 Tool: DDD (Data Display Debugger) 394
A.2.6 Tool: lsof (List Open Files) 394
A.2.7 Tool: ltrace (library call tracer) 395
A.2.8 Tool: time 395
A.2.9 Tool: top 395
A.2.10 Tool: pstree 396
A.3 Network 396
A.3.1 Tool: traceroute 396
A.3.2 File: /etc/hosts 396
A.3.3 File: /etc/services 396
A.3.4 Tool: netstat 397
A.3.5 Tool: ping 397
A.3.6 Tool: telnet 397
A.3.7 Tool: host/nslookup 397
A.3.8 Tool: ethtool 398
A.3.9 Tool: ethereal 398
A.3.10 File: /etc/nsswitch.conf 398
A.3.11 File: /etc/resolv.conf 398
A.4 System Information 399
A.4.1 Tool: vmstat 399
A.4.2 Tool: iostat 399
A.4.3 Tool: nfsstat 399
A.4.4 Tool: sar 400
A.4.5 Tool: syslogd 400
A.4.6 Tool: dmesg 400
A.4.7 Tool: mpstat 400
A.4.8 Tool: procinfo 401
A.4.9 Tool: xosview 401
A.5 Files and Object Files 401
A.5.1 Tool: file 401
A.5.2 Tool: ldd 401
A.5.3 Tool: nm 402
A.5.4 Tool: objdump 402
A.5.5 Tool: od 402
A.5.6 Tool: stat 402
A.5.7 Tool: readelf 403
A.5.8 Tool: strings 403
A.6 Kernel 403
A.6.1 Tool: KDB 403
A.6.2 Tool: KGDB 403
A.6.3 Tool: ksymoops 404
A.7 Miscellaneous 404
A.7.1 Tool: VMWare Workstation 404
A.7.2 Tool: VNC Server 405
A.7.3 Tool: VNC Viewer 405
B Data Collection Script 407
B.1 Overview 407
B.1.1 -thorough 409
B.1.2 -perf, -hang <pid>, -trap, -error <cmd> 409
B.2 Running the Script 410
B.3 The Script Source 410
B.4 Disclaimer 419
Index 421
About the Authors
Mark Wilding is a senior developer at IBM who currently specializes in
serviceability technologies, UNIX, and Linux. With over 15 years of experience
writing software, Mark has extensive expertise in operating systems, networks,
C/C++ development, serviceability, quality engineering, and computer hardware.
Dan Behman is a member of the DB2 UDB for Linux Platform Exploitation
development team at the Toronto IBM Software Lab. He has over 10 years of
experience with Linux, and has been involved in porting and enabling DB2
UDB on the latest architectures that Linux supports, including x86-64,
zSeries, and POWER platforms.
Preface
Linux is the ultimate choice for home and business users. It is powerful, as
stable as any commercial operating system, secure, and best of all, it is open
source. One of the biggest deciding factors for whether to use Linux at home or
for your business can be service and support. Because Linux is developed by
thousands of volunteers from around the world, it is not always clear who to
turn to when something goes wrong.
In the true spirit of Linux, there is a slightly different approach to support
than the commercial norm. After all, Linux represents an unparalleled
community of experts, it includes industry leading problem determination tools,
and of course, the product itself includes the source code. These resources are
in addition to the professional Linux support services that are available from
companies, such as IBM, and the various Linux vendors, such as Red Hat and
SUSE. Making the most of these additional resources is called “self-service”
and is the main topic covered by this book.
Self-service on Linux means different things to different people. For those
who use Linux at home, it means a more enjoyable Linux experience. For those
who use Linux at work, being able to quickly and effectively diagnose problems
on Linux can increase their value as employees as well as their marketability.
For corporate leaders deciding whether to adopt Linux as part of the corporate
strategy, self-service for Linux means reduced operation costs and increased
Return on Investment (ROI) for any Linux adoption strategy. Regardless of
what type of Linux user you are, it is important to make the most of your
Linux experience and investment.
WHAT IS THIS BOOK ABOUT?
In a nutshell, this book is about effectively and efficiently diagnosing problems
that occur in the Linux environment. It covers good investigation practices,
how to use the information and resources on the Internet, and then dives right
into detail describing how to use the most important problem determination
tools that Linux has to offer.
Chapter 1 is like a crash course on effective problem determination
practices, which will help you to diagnose problems like an expert. It covers
where and how to look for information on the Internet as well as how to start
investigating common types of problems.
Chapter 2 covers strace, which is arguably the most frequently used
problem determination tool in Linux. This chapter includes both practical usage
information as well as details about how strace works. It also includes source
code for a simple strace tool and details about how the underlying functionality
works with the kernel through the ptrace interface.
Chapter 3 is about the /proc filesystem, which contains a wealth of
information about the hardware, kernel, and processes that are running on
the system. The purpose of this chapter is to point out and examine some of the
more advanced features and tricks primarily related to problem determination
and system diagnosis. For example, the chapter covers how to use the SysRq
Kernel Magic hotkey with /proc/sys/kernel/sysrq.
Chapter 4 provides detailed information about compiling. Why does a
book about debugging on Linux include a chapter about compiling? Well, the
beginning of this preface mentioned that diagnosing problems in Linux is
different from that in commercial environments. The main reason behind this
is that the source code is freely available for all of the open source tools and
the operating system itself. This chapter provides vital information whether
you need to recompile an open source application with debug information (as
is often the case), whether you need to generate an assembly language listing
for a tough problem (for example, to find the line of code for a trap), or whether you
run into a problem while recompiling the Linux kernel itself.
Chapter 5 covers intimate details about the stack, one of the most
important and fundamental concepts of a computer system. Besides explaining
all the gory details about the structure of a stack (which is pretty much required
knowledge for any Linux expert), the chapter also includes and explains source
code that can be used by the readers to generate stack traces from within their
own tools and applications. The code examples are not only useful to illustrate
how the stack works but they can save real time and debugging effort when
included as part of an application’s debugging facilities.
Chapter 6 takes an in-depth and detailed look at debugging applications
with the GNU Debugger (GDB) and includes an overview of the Data Display
Debugger (DDD) graphical user interface. Linux has an advantage over most
other operating systems in that it includes a feature rich debugger, GDB, for
free. Debuggers can be used to debug many types of problems, and given that
GDB is free, it is well worth the effort to understand the basic as well as the
more advanced features. This chapter covers hard-to-find details about
debugging C++ applications, threaded applications, as well as numerous best
practices. Have you ever spawned an xterm to attach to a process with GDB?
This chapter will show you how—and why!
Chapter 7 provides a detailed overview of system crashes and hangs. With
proprietary operating systems (OSs), a system crash or hang almost certainly
requires you to call the OS vendor for help. However, with Linux, the end user
can debug a kernel problem on his or her own or at least identify key information
to search for known problems. If you do need to get an expert involved, knowing
what to collect will help you to get the right data quickly for a fast diagnosis.
This chapter describes everything from how to attach a serial console to how
to find the line of code for a kernel trap (an “oops”). For example, the chapter
provides step-by-step details for how to manually add a trap in the kernel and
then debug it to find the resulting line of code.
Chapter 8 covers more details about debugging the kernel or debugging
with the kernel debugger, kdb. The chapter covers how to configure and enable
kdb on your system as well as some practical commands that most Linux users
can use without being a kernel expert. For example, this chapter shows you
how to find out what a process is doing from within the kernel, which can be
particularly useful if the process is hung and not killable.
Chapter 9 is a detailed, head-on look at Executable and Linking Format
(ELF). The details behind ELF are often ignored or just assumed to work. This
is really unfortunate because a thorough understanding of ELF can lead to a
whole new world of debugging techniques. This chapter covers intimate but
practical details of the underlying ELF file format as well as tips and tricks
that few people know. There is even sample code and step-by-step instructions
for how to override functions using LD_PRELOAD and how to use the global
offset table and the GDB debugger to intercept functions manually and redirect
them to debug versions.
Appendix A is a toolbox that outlines the most useful tools, facilities, and
files on Linux. For each tool, there is a description of when it is useful and
where to get the latest copy.
Appendix B includes a production-ready data collection script that is
especially useful for mission-critical systems or those who remotely support
customers on Linux. The data collection script alone can save many hours or
even days for debugging a remote problem.
Note: The source code used in this book can be found at
http://www.phptr.com/title/013147751X.
Note: A code continuation character, ➥, appears at the beginning
of code lines that have wrapped down from the line above it.
Lastly, as we wrote this book it became clear to us that we were covering
the right information. Reviewers often commented about how they were able
to use the information immediately to solve real problems, not the problems
that may come in the future or may have happened in the past, but real problems
that people were actually struggling with when they reviewed the chapters.
We also found ourselves referring to the content of the book to help solve
problems as they came up. We hope you find it as useful as it has been to those
who have read it thus far.
WHO IS THIS BOOK FOR?
This book has useful information for any Linux user but is certainly geared
more toward the Linux professional. This includes Linux power users, Linux
administrators, developers who write software for Linux, and support staff
who support products on Linux.
Readers who casually use Linux at home will benefit also, as long as they
either have a basic understanding of Linux or are at least willing to learn
more about it—the latter being most important.
Ultimately, as Linux increases in popularity, there are many seasoned
experts who are facing the challenge of translating their knowledge and
experience to the Linux platform. Many are already experts with one or more
operating systems except that they lack specific knowledge about the various
command line incantations or ways to interpret their knowledge for Linux.
This book will help such experts to quickly adapt their existing skill set and
apply it effectively on Linux.
This power-packed book contains real industry experience on many topics
and very hard-to-find information. Without a doubt, it is a must have for any
developer, tester, support analyst, or anyone who uses Linux.
ACKNOWLEDGMENTS
Anyone who has written a book will agree that it takes an enormous amount of
effort. Yes, there is a lot of work for the authors, but without the many key
people behind the scenes, writing a book would be nearly impossible. We would
like to thank all of the people who reviewed, supported, contributed, or otherwise
made this book possible.
First, we would like to thank the reviewers for their time, patience, and
valuable feedback. Besides the typos, grammatical errors, and technical
omissions, in many cases the reviewers allowed us to see other vantage points,
which in turn helped to make the content more well-rounded and complete. In
particular, we would like to thank Richard Moore, for reviewing the technical
content of many chapters; Robert Haskins, for being so thorough with his
reviews and comments; Mel Gorman, for his valuable feedback on the ELF
(Executable and Linking Format) chapter; Scott Dier, for his many valuable
comments; Jan Kritter, for reviewing pretty much the entire book; and Joyce
Coleman, Ananth Narayan, Pascale Stephenson, Ben Elliston, Hien Nguyen,
Jim Keniston, as well as the IBM Linux Technology Center, for their valuable
feedback. We would also like to thank the excellent engineers from SUSE for
helping to answer many deep technical questions, especially Andi Kleen, Frank
Balzer, and Michael Matz.
We would especially like to thank our wives and families for their support and
encouragement, and for giving us the time to work on this project. Without their
support, this book would have never gotten past the casual conversation we
had about possibly writing one many months ago. We truly appreciate the
sacrifices that they have made to allow us to finish this book.
Last of all, we would like to thank the Open Source Community as a
whole. The open source movement is a truly remarkable phenomenon that has raised
and will continue to raise the bar for computing at home or for commercial
environments. Our thanks to the Open Source Community is not specifically
for this book but rather for their tireless dedication and technical prowess that
make Linux and all open source products a reality. It is our hope that the
content in this book will encourage others to adopt, use or support open source
products and of course Linux. Every little bit helps.
Thanks for reading this book.
OTHER
The history and evolution of the Linux operating system is fascinating and
certainly still being written with new twists popping up all the time. Linux
itself comprises only the kernel of the whole operating system. Granted, this is
the single most important part, but everything else surrounding the Linux
kernel is made up mostly of GNU free software. There are two major things
that GNU software and the Linux kernel have in common. The first is that the
source code for both is freely accessible. The second is that they have been
developed and continue to be developed by many thousands of volunteers
throughout the world, all connecting and sharing ideas and work through the
Internet. Many refer to this collaboration of people and resources as the Open
Source Community.
The Open Source Community is much like a distributed development
team with skills and experience spanning many different areas of computer
science. The source code that is written by the Open Source Community is
available for anyone and everyone to see. Not only can this make problem
determination easier, but having such a large and diverse group of people looking
at the code can also reduce the number of defects and improve the security of the
source code. Open source software is open to innovations as much as criticism,
both helping to improve the quality and functionality of the software.
One of the most common concerns about adopting Linux is service and
support. However, Linux has the Open Source Community, a wide range of
freely available problem determination tools, the source code, and the Internet
itself as a source of information including numerous sites and newsgroups
dedicated to Linux. It is important for every Linux user to understand the
resources and tools that are available to help them diagnose problems. That is
the purpose of this book. It is not intended to be a replacement for a support
contract, nor does it require one. If you have one, this book is an enhancement
that will be sure to help you make the most of your existing support contract.
1
Best Practices and Initial
Investigation
1.1 INTRODUCTION
Your boss is screaming, your customers are screaming, you’re screaming …
Whatever the situation, there is a problem, and you need to solve it. Remember
those old classic MUD games? For those who don’t, a Multi-User Dungeon or
MUD was the earliest incarnation of the online video game. Users played the
game through a completely non-graphical text interface that described the
surroundings and options available to the player and then prompted the user
with what to do next.
You are alone in a dark cubicle. To the North is your boss’s office, to
the West is your Team Lead’s cubicle, to the East is a window opening
out to a five-floor drop, and to the South is a kitchenette containing
a freshly brewed pot of coffee. You stare at your computer screen in
bewilderment as the phone rings for the fifth time in as many minutes
indicating that your users are unable to connect to their server.
Command>
What will you do? Will you run toward the East and dive through the
open window? Will you go grab a hot cup of coffee to ensure you stay alert for
the long night ahead? A common thing to do in these MUD games was to
examine your surroundings further, usually done by the look command.
Command> look
Your cubicle is a mess of papers and old coffee cups. The message
waiting light on your phone is burnt out from flashing for so many
months. Your email inbox is overflowing with unanswered emails. On top
of the mess is the brand new book you ordered entitled “Self-Service
Linux.” You need a shower.
Command> read book “Self-Service Linux”
You still need a shower.
This tongue-in-cheek MUD analogy aside, what can this book really do
for you? This book includes chapters that are loaded with useful information
to help you diagnose problems quickly and effectively. This first chapter covers
best practices for problem determination and points to the more in-depth
information found in the chapters throughout this book. The first step is to
ensure that your Linux system(s) are configured for effective problem
determination.
1.2 GETTING YOUR SYSTEM(S) READY FOR EFFECTIVE PROBLEM
DETERMINATION
The Linux problem determination tools and facilities are free, which begs the
question: Why not install them? Without these tools, a simple problem can
turn into a long and painful ordeal that can affect a business and/or your
personal time. Before reading through the rest of the book, take some time to
make sure the following tools are installed on your system(s). These tools are
just waiting to make your life easier and/or your business more productive:
☞ strace: The strace tool traces the system calls, special functions that
interact with the operating system. You can use this for many types of
problems, especially those that relate to the operating system.
☞ ltrace: The ltrace tool traces the library functions that a process calls. This
is similar to strace, but because it shows library calls rather than only
system calls, the output provides more detail.
☞ lsof: The lsof tool lists all of the open files on the operating system (OS).
When a file is opened, the OS returns a numeric file descriptor for the
process to use. The listing shows each open file along with the ID of the
process that holds it and the file descriptor that process uses.
☞ top: This tool lists the “top” processes that are running on the system. By
default, it sorts processes by the amount of CPU each is currently consuming.
☞ traceroute/tcptraceroute: These tools can be used to trace a network
route (or at least one direction of it).
☞ ping: Ping simply checks whether a remote system can respond. Sometimes
firewalls block the network packets ping uses, but it is still very useful.
☞ hexdump or equivalent: This is simply a tool that can display the raw
contents of a file.
☞ tcpdump and/or ethereal: Used for network problems, these tools can
display the packets of network traffic.
☞ GDB: This is a powerful debugger that can be used to investigate some of
the more difficult problems.
☞ readelf: This tool can read and display information about various sections
of an Executable and Linking Format (ELF) file.
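The lsof entry above mentions numeric file descriptors. As a minimal sketch in Python (not taken from the book; the helper name describe_open_file is made up for illustration), the following opens a file, prints the descriptor the OS hands back, and, on Linux systems where /proc/self/fd is available, shows what that descriptor points to.

```python
import os
import tempfile


def describe_open_file(path):
    """Open path read-only and return (fd, target).

    target is what /proc/self/fd/<fd> points to, or None when that
    Linux-specific directory is not available.
    """
    fd = os.open(path, os.O_RDONLY)  # the OS returns a small integer
    try:
        link = "/proc/self/fd/%d" % fd
        target = os.readlink(link) if os.path.islink(link) else None
        return fd, target
    finally:
        os.close(fd)  # release the descriptor again


# Demonstrate with a throwaway temporary file.
with tempfile.NamedTemporaryFile() as tmp:
    fd, target = describe_open_file(tmp.name)
    print("file descriptor:", fd)
    print("points to:", target)
```

Running lsof -p <pid> reports the same kind of pairing of descriptor and file for any process you are allowed to inspect.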
These tools (and many more) are listed in Appendix A, “The Toolbox,”
along with information on where to find these tools. The rest of this book
assumes that your systems have these basic Linux problem determination tools
installed. These tools and facilities are free, and they won’t do much good sitting
quietly on an installation CD (or on the Internet somewhere). In fact, this book
will self-destruct in five minutes if these tools are not installed.
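One way to check that assumption is a short script that looks for each tool on the PATH. The sketch below is in Python and is not from the book; the function name missing_tools and the exact command names in TOOLS are assumptions (for instance, it looks for gdb in lowercase, and it leaves out tcptraceroute and ethereal, which the list above names as alternatives).

```python
import shutil

# Command names assumed for the tools listed in this section.
TOOLS = [
    "strace", "ltrace", "lsof", "top", "traceroute",
    "ping", "hexdump", "tcpdump", "gdb", "readelf",
]


def missing_tools(tools=TOOLS):
    """Return the names in tools that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Not installed (or not on PATH):", ", ".join(missing))
else:
    print("All listed tools were found.")
```

Anything the script reports as missing can usually be installed from your distribution's package repository; Appendix A describes where to find each tool.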
Now of course, just because you have a tool in your toolbox, it doesn’t
mean you know how to use it in a particular situation. Imagine a toolbox with
lots of very high quality tools sitting on your desk. Suddenly your boss walks
into your office and asks you to fix a car engine or TV. You know you have the
tools. You might even know what the tools are used for (for example, a wrench is
used for loosening and tightening bolts), but could you fix that car engine? A
toolbox is not a substitute for a good understanding of how and when to use
the tools. Understanding how and when to use these tools is the main focus of
this book.
1.3 THE FOUR PHASES OF INVESTIGATION
Good investigation practices should balance the need to solve problems quickly,
the need to build your skills, and the effective use of subject matter experts.
The need to solve a problem quickly is obvious, but building your skills is
important as well.
Imagine walking into a library looking for information about a type of
hardwood called “red oak.” To your surprise, you find a person who knows
absolutely everything about wood. You have a choice to make. You can ask this
person for the information you need, or you can read through several books
and resources trying to find the information on your own. In the first case, you
will get the answer you need right away; you just need to ask. In the second
case, you will likely end up reading a lot of information about hardwood on