Chapter 5: Manipulating Text
With only a shell available on the first Unix systems (on which Linux was based), using those systems meant dealing primarily with commands and plain text files. Documents, program code, configuration files, e-mail, and almost anything you created or configured were represented by text files. To work with those files, early developers created many text manipulation tools.
Despite having graphical tools for working with text, most seasoned Linux users find command line tools to be more efficient and convenient. Text editors such as vi (Vim), Emacs, JOE, nano, and Pico are available with most Linux distributions. Commands such as grep, sed, and awk can be used to find, and possibly change, pieces of information within text files.
This chapter shows how to use many popular
commands for working with text files in Ubuntu.
It also explores some of the less common uses of
text manipulation commands that you might find
interesting.
IN THIS CHAPTER
Matching text with regular expressions
Editing text files with vi, JOE, or nano
Using graphical text editors
Listing text with cat, head, and tail
Paging text with less and more
Paginating text with pr
Searching for text with grep
Counting words, lines, and characters with wc
Sorting output with sort
Stream editing with sed, tr, cut, and awk
Searching binaries for text with strings
Finding differences in files with diff
Converting text files with unix2dos/dos2unix
Matching Text with Regular Expressions
Many of the tools for working with text enable you to use regular expressions, sometimes referred to as regex, to identify the text you are looking for based on some pattern. You can use these patterns to find text within a text editor or use them with search commands to scan multiple files for the strings of text you want.
82935c05.qxd:Toolbox 10/29/07 1:32 PM Page 89
A regex search pattern can include a specific string of text (as in a word such as Linux) or a location (such as the end of a line or the beginning of a word). It can also be specific (find just the word hello) or more inclusive (find any word beginning with h and ending with o).
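One quick way to see both kinds of patterns at work is to run grep against a few sample lines. The following is a minimal sketch, assuming GNU grep as shipped with Ubuntu; the file name words.txt and its contents are made up for illustration. Setting LC_ALL=C makes bracket ranges such as [A-D] follow plain ASCII order, so they never mix uppercase and lowercase letters.

```shell
#!/bin/sh
# Use the C locale so bracket ranges like [A-D] compare plain ASCII codes
export LC_ALL=C

# Hypothetical sample file: one short string per line (12 lines in total)
printf '%s\n' bat cat fat hat Cat abc banana 137 147 167 notes.txt NOTES.TXT > words.txt

grep '^a' words.txt              # location: lines beginning with a -> abc
grep 'a$' words.txt              # location: lines ending with a -> banana
grep 'a.c' words.txt             # a, any one character, then c -> abc
grep '[bcf]at' words.txt         # bat, cat, fat (not hat, not Cat)
grep '[A-D]at' words.txt         # Cat only, because [A-D] is uppercase here
grep '1[3-5]7' words.txt         # 137 and 147 (not 167)
grep '\.[tT][xX][tT]' words.txt  # notes.txt and NOTES.TXT
grep -c 'a*' words.txt           # 12: a* also matches zero a's, so every line counts
```

Each command prints only the lines that match, so you can change a pattern and rerun the script to check how it behaves before using it in an editor or a larger search.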
Appendix C includes reference information for shell metacharacters that can be used in conjunction with regular expressions to do the exact kinds of matches you are looking for. This section shows examples of using regular expressions with several different tools you encounter throughout this chapter.
Table 5-1 shows some examples of using basic regular expressions to match text strings. Many more regular expressions appear in examples throughout this chapter. Keep in mind that not every command that incorporates regex uses its features the same way.
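As one illustration of that caveat, the same characters can mean different things depending on whether a tool reads them as basic (BRE) or extended (ERE) regular expressions. In a basic expression, a bare ? is an ordinary character; in an extended expression (grep -E, sed -E), it makes the preceding character optional. This is a minimal sketch assuming the GNU versions of grep and sed found on Ubuntu (the \? operator in basic expressions is a GNU extension); spelling.txt is a hypothetical sample file.

```shell
#!/bin/sh
export LC_ALL=C

# Hypothetical sample file with two spellings of the same word
printf '%s\n' color colour > spelling.txt

# Extended regex: u? means "an optional u", so both lines match
grep -E 'colou?r' spelling.txt

# Basic regex: ? is literal, so this looks for the text "colou?r" and finds nothing
grep 'colou?r' spelling.txt || echo "no match"

# GNU basic regex spells the optional operator as \? instead
grep 'colou\?r' spelling.txt

# sed follows the same rules: -E selects extended syntax
sed -E 's/colou?r/hue/' spelling.txt
```

If a pattern that works in one tool finds nothing in another, checking which regex flavor that tool expects is a good first step.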
Table 5-1: Matching Using Regular Expressions
Expression        Matches
a*                Zero or more consecutive a characters (a, aa, aaa, and so on); because zero counts, it matches any line, even one with no a
^a                Any “a” appearing at the beginning of a line
a$                Any “a” appearing at the end of a line
a.c               Three-character strings that begin with a and end with c
[bcf]at           bat, cat, or fat
[a-d]at           aat, bat, cat, dat, but not Aat, Bat, and so on
[A-D]at           Aat, Bat, Cat, and Dat, but not aat, bat, and so on
1[3-5]7           137, 147, and 157
\tHello           A tab character preceding the word Hello
\.[tT][xX][Tt]    .txt, .TXT, .TxT, or other case combinations
Editing Text Files
There are many text editors in the Linux/Unix world. The most common editor is vi, which can be found on virtually any Unix system available today. That is why knowing how to at least make minor file edits in vi is a critical skill for any Linux administrator. One day, if you find yourself in a minimalist, foreign Linux environment trying to bring a server back online, vi is the tool that will almost always be there.
On Ubuntu, make sure you have the full vim package installed. Vim (Vi IMproved) provides the most up-to-date, feature-rich, and user-friendly vi editor. For more details about using vi, refer to Appendix A.
NOTE Ubuntu installs vim by default.
Traditionally, the other popular Unix text editor has been Emacs and its more graphical variant, XEmacs. Emacs is a powerful multi-function tool that can also act as a mail/news reader or shell, and perform other functions. Emacs is also known for its very complex series of keyboard shortcuts that require three arms to execute properly.
In the mid-90s, Emacs was ahead of vi in terms of features. Now that Vim is widely available, both can provide all the text editing features you’ll ever need. If you are not already familiar with either vi or Emacs, we recommend you start by learning vi.
There are many other command line and GUI text editors available for Linux. Text-based editors that you may find to be simpler than vi and Emacs include JED, JOE, and nano. Start any of those editors by typing its command name, optionally followed by the file name you want to edit. The following sections offer some quick descriptions of how to use each of those editors.
Using the JOE Editor
If you have used classic word processors such as WordStar that worked with text files, you might be comfortable with the JOE editor. To use the spell checker in JOE, make sure the aspell package is installed. (Ubuntu installs aspell by default.) To install JOE itself, run the following command:
$ sudo apt-get install joe
With JOE, instead of entering a command or text mode, you are always ready to type.
To move around in the file, you can use control characters or the arrow keys. To open a
text file for editing, just type joe and the file name or use some of the following options:
$ joe memo.txt Open memo.txt for editing
$ joe -wordwrap memo.txt Turn on wordwrap while editing
$ joe -lmargin 5 -tab 5 memo.txt Set left margin to 5 and tab to 5
$ joe +25 memo.txt Begin editing on line 25
To add text, just begin typing. You can use keyboard shortcuts for many functions. Use arrow
keys to move the cursor left, right, up, or down. Use the Delete key to delete text under
the cursor or the Backspace key to erase text to the left of the cursor. Press Enter to add
a line break. Press Ctrl+k+h to see the help screen. Table 5-2 shows the most commonly
used control keys for editing in JOE.
Table 5-2: Control Keys for Editing with JOE
Key Combo     Result
Cursor
Ctrl+b        Left
Ctrl+p        Up
Ctrl+f        Right
Ctrl+n        Down
Ctrl+z        Previous word
Ctrl+x        Next word
Search
Ctrl+k+f      Find text
Ctrl+l        Find next
Block
Ctrl+k+b      Begin
Ctrl+k+k      End
Ctrl+k+m      Move block
Ctrl+k+c      Copy block
Ctrl+k+w      Write block to file
Ctrl+k+y      Delete block
Ctrl+k+/      Filter
Misc
Ctrl+k+a      Center line
Ctrl+t        Options
Ctrl+r        Refresh
File
Ctrl+k+e      Open new file to edit
Ctrl+k+r      Insert file at cursor
Ctrl+k+d      Save
Goto
Ctrl+u        Previous screen
Ctrl+v        Next screen
Ctrl+a        Line beginning
Ctrl+e        End of line
Ctrl+k+u      Top of file
Ctrl+k+v      End of file
Ctrl+k+l      To line number
Delete
Ctrl+d        Delete character
Ctrl+y        Delete line
Ctrl+w        Delete word right
Ctrl+o        Delete word left
Ctrl+j        Delete line to right
Ctrl+-        Undo
Ctrl+6        Redo
Exit
Ctrl+k+x      Save and quit
Ctrl+c        Abort
Ctrl+k+z      Shell
Spell
Ctrl+[+n      Word
Ctrl+[+l      File
Using the Pico and nano Editors
Pico is a popular, very small text editor, distributed as part of the Pine e-mail client. Although Pico is free, it is not truly open source. Therefore, many Linux distributions, including Ubuntu, don’t offer Pico. Instead, they offer an open source clone of Pico called nano (nano’s another editor). This section describes the nano editor.
NOTE Ubuntu links the command pico to the program for the nano editor.
Nano (represented by the nano command) is a compact text editor that runs from the shell, but is screen-oriented because it is built on the curses library. Nano is popular with those who formerly used the Pine e-mail client because nano’s editing features are the same as those used by Pine’s Pico editor. On the rare occasion that you don’t have the vi editor available on a Linux system (such as when installing a minimal Gentoo Linux), nano may be available. Ubuntu installs nano by default.
You need the spell command, rather than aspell, to perform a spelling check within nano.
As with the JOE editor, instead of having command and typing modes, you can just begin typing. To open a text file for editing, just type nano and the file name or use some of the following options:
$ nano memo.txt Open memo.txt for editing
$ nano -B memo.txt When saving, back up the previous version to memo.txt~
$ nano -m memo.txt Turn on mouse to move cursor (if supported)
$ nano +83 memo.txt Begin editing on line 83
The -m command-line option turns on support for a mouse. You can use the mouse to select a position in the text, and the cursor moves to that position. After the first click, though, nano uses the mouse to mark a block of text, which may not be what you are expecting.
As with JOE, to add text, just begin typing. Use arrow keys to move the cursor left, right, up, or down. Use the Delete key to delete text under the cursor or the Backspace key to erase text to the left of the cursor. Press Enter to add a line break. Press Ctrl+g to read help text. Table 5-3 shows the control codes for nano that are described on the help screen.
Table 5-3: Control Keys for Editing with nano
Control Code  Function Key  Description
Ctrl+g        F1            Show help text. (Press Ctrl+x to exit help.)
Ctrl+x        F2            Exit nano (or close the current file buffer).
Ctrl+o        F3            Save the current file.
Ctrl+j        F4            Justify the current paragraph.
Ctrl+r        F5            Insert a file into the current file.
Ctrl+w        F6            Search for text.
Ctrl+y        F7            Go to the previous screen.
Ctrl+v        F8            Go to the next screen.
Ctrl+k        F9            Cut (and store) the current line or marked text.
Ctrl+u        F10           Uncut (paste) the previously cut line into the file.
Ctrl+c        F11           Display the current cursor position.
Ctrl+t        F12           Start spell checking.
Ctrl+-                      Go to a specified line and column number.
Ctrl+\                      Search and replace text.
Ctrl+6                      Mark text, starting at the cursor (Ctrl+6 to unset mark).
Ctrl+f                      Go forward one character.
Ctrl+b                      Go back one character.
Ctrl+Space                  Go forward one word.
Alt+Space                   Go backward one word.
Ctrl+p                      Go to the previous line.
Ctrl+n                      Go to the next line.
Ctrl+a                      Go to the beginning of the current line.
Ctrl+e                      Go to the end of the current line.
Alt+(                       Go to the beginning of the current paragraph.
Alt+)                       Go to the end of the current paragraph.
Alt+\                       Go to the first line of the file.
Alt+/                       Go to the last line of the file.
Alt+]                       Go to the bracket matching the current bracket.
Alt+=                       Scroll down one line.
Alt+-                       Scroll up one line.
Graphical Text Editors
Just because you are editing text doesn’t mean you have to use a text-based editor. The main advantages of using a graphical text editor are that you can use a mouse to select menus, highlight text, cut and copy text, or run special plug-ins.
You can expect to have the GNOME text editor (gedit) if your Linux system has the GNOME desktop installed. Features in gedit enable you to check spelling, list document statistics, change display fonts and colors, and print your documents. The KDE desktop also has its own KDE text editor (kedit in the kdeutils package). It includes similar features to the GNOME text editor, along with a few extras, such as the ability to send the current document with kmail or another user-configurable KDE component.
Vim itself comes with an X GUI version, which is launched with the gvim command. If you’d like to turn GUI Vim into a more user-friendly text editor, you can download a third-party configuration called Cream from http://cream.sourceforge.net/.
NOTE To use gvim, you need to install an additional package, vim-gnome.
Other text editors you can install include nedit (aimed at software developers, with features for using macros and executing shell commands) and leafpad (which is similar to the Windows Notepad text editor). The Scribes text editor (scribes) includes some advanced features for automatic correction, replacement, indentation, and word completion.
Listing, Sorting, and Changing Text
Instead of just editing a single text file, you can use a variety of Linux commands to display, search, and manipulate the contents of one or more text files at a time.