
Thinking in Java, 3rd Edition (part 7)



684 Thinking in Java www.BruceEckel.com
static method you’ll usually use getClass(). You don’t need to use the
current class as the node identifier, but that’s the usual practice.

Once you create the node, it’s available for either loading or reading data.
This example loads the node with various types of items, and then gets the
keys(). These come back as a String[], which you might not expect if
you’re used to keys() in the collections library. Here, they’re converted to
a List which is used to produce an Iterator for printing the keys and
values. Notice the second argument to get(). This is the default value
which is produced if there isn’t any entry for that key value. While
iterating through a set of keys, you always know there’s an entry so using
null as the default is safe, but normally you’ll be fetching a named key, as
in:

prefs.getInt("Companions", 0);

In the normal case you’ll want to provide a reasonable default value. In
fact, a typical idiom is seen in the lines:
int usageCount = prefs.getInt("UsageCount", 0);
usageCount++;
prefs.putInt("UsageCount", usageCount);

This way, the first time you run the program the UsageCount will be
zero, but on subsequent invocations it will be nonzero.
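The read, increment, write-back idiom can be wrapped in a small method. This is a minimal sketch, not the book's PreferencesDemo: it assumes a user-preferences node with the hypothetical path usageCountSketch, whereas the book identifies its node by class.

```java
import java.util.prefs.Preferences;

// Minimal sketch of the usage-counter idiom. The node name
// "usageCountSketch" is a hypothetical choice for this example.
class UsageCounter {
  // Reads the stored count (0 if absent), increments it, writes it back.
  static int recordRun(Preferences prefs) {
    int usageCount = prefs.getInt("UsageCount", 0);
    usageCount++;
    prefs.putInt("UsageCount", usageCount);
    return usageCount;
  }

  public static void main(String[] args) {
    Preferences prefs = Preferences.userRoot().node("usageCountSketch");
    System.out.println("Runs so far: " + recordRun(prefs));
  }
}
```

Each run of main( ) should report a count one higher than the previous run, since the value lives in the Preferences backing store rather than in a file you manage yourself.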

When you run PreferencesDemo.java you’ll see that the UsageCount does indeed increment every time you run the program, but where is the data stored? There’s no local file that appears after the program is run the first time. The Preferences API uses appropriate system resources to accomplish its task, and these will vary depending on the OS. In Windows, the registry is used (since it’s already a hierarchy of nodes with key-value pairs). But the whole point is that the information is magically stored for you so that you don’t have to worry about how it works from one system to another.

There’s more to the preferences API than shown here. Consult the JDK
documentation, which is fairly understandable, for further details.


Chapter 12: The Java I/O System 685
Regular expressions
To finish this chapter, we’ll look at regular expressions, which were added in JDK 1.4 but have been integral to standard Unix utilities like sed and awk, and to languages like Python and Perl (some would argue that they are the predominant reason for Perl’s success). Technically these are string manipulation tools (previously delegated to the String, StringBuffer, and StringTokenizer classes in Java), but they are typically used in conjunction with I/O, so it’s not too far-fetched to include them here.[5]

Regular expressions are powerful and flexible text processing tools. They allow you to specify, programmatically, complex patterns of text that can be discovered in an input string. Once you discover these patterns, you can then react to them any way you want. Although the syntax of regular expressions can be intimidating at first, they provide a compact and dynamic language which can be employed to solve all sorts of string processing, matching and selection, editing, and verification problems in a completely general way.

Creating regular expressions
You can begin learning regular expressions with a useful subset of the
possible constructs. A complete list of constructs for building regular
expressions can be found in the JavaDocs for the Pattern class for
package java.util.regex.

Characters
B          The specific character B
\xhh       Character with hex value 0xhh
\uhhhh     The Unicode character with hex representation 0xhhhh
\t         Tab
\n         Newline
\r         Carriage return
\f         Formfeed
\e         Escape

[5] A chapter dedicated to strings will have to wait until the 4th edition. Mike Shea contributed to this section.

The power of regular expressions begins to appear when defining character classes. Here are some typical ways to create character classes, and some predefined classes:

Character Classes
.              Represents any character (by default, any character except a line terminator)
[abc]          Any of the characters a, b, or c (same as a|b|c)
[^abc]         Any character except a, b, and c (negation)
[a-zA-Z]       Any character a through z or A through Z (range)
[abc[hij]]     Any of a, b, c, h, i, j (same as a|b|c|h|i|j) (union)
[a-z&&[hij]]   Either h, i, or j (intersection)
\s             A whitespace character (space, tab, newline, vertical tab, formfeed, carriage return)
\S             A non-whitespace character ([^\s])
\d             A numeric digit [0-9]
\D             A non-digit [^0-9]
\w             A word character [a-zA-Z_0-9]
\W             A non-word character [^\w]
If you have any experience with regular expressions in other languages, you’ll immediately notice a difference in the way backslashes are handled. In other languages, “\\” means “I want to insert a plain old (literal) backslash in the regular expression. Don’t give it any special meaning.” In Java, “\\” means “I’m inserting a regular expression backslash, so the following character has special meaning.” This is because the Java compiler first turns each \\ in a string literal into a single backslash, and only that single backslash reaches the regular expression engine. For example, if you want to indicate one or more word characters, your regular expression string will be “\\w+”. If you want to insert a literal backslash, you say “\\\\”. However, things like newlines and tabs just use a single backslash: “\n\t”.
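To make the double escaping concrete, here is a small sketch (the class and method names are illustrative only) that uses the static Pattern.matches( ) to check a few inputs against regexes written as Java string literals:

```java
import java.util.regex.Pattern;

// How regex backslashes look inside Java string literals.
class BackslashDemo {
  // The literal "\\w+" reaches the regex engine as \w+ (one or more word chars).
  static boolean isWord(String s) {
    return Pattern.matches("\\w+", s);
  }
  // The literal "\\\\" reaches the regex engine as \\ (one literal backslash).
  static boolean isSingleBackslash(String s) {
    return Pattern.matches("\\\\", s);
  }
  public static void main(String[] args) {
    System.out.println(isWord("linnet_42"));     // true
    System.out.println(isWord("linnet's"));      // false: ' is not a word character
    System.out.println(isSingleBackslash("\\")); // true: "\\" is one backslash
    // "\n\t" needs single backslashes: the compiler turns them into real
    // newline and tab characters, which the regex then matches literally.
    System.out.println(Pattern.matches("a\n\tb", "a\n\tb")); // true
  }
}
```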

What’s shown here is only a sampling; you’ll want to have the
java.util.regex.Pattern JDK documentation page bookmarked or on
your “start” menu so you can easily access all the possible regular
expression patterns.

Logical Operators
XY     X followed by Y
X|Y    X or Y
(X)    A capturing group. You can refer to the ith captured group later in the expression with \i

Boundary Matchers
^ Beginning of a line
$ End of a line
\b Word boundary
\B Non-word boundary
\G End of the previous match
As an example, each of the following represents a valid regular expression, and all will successfully match the character sequence "Rudolph":

Rudolph
[rR]udolph
[rR][aeiou][a-z]ol.*
R.*
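A quick way to confirm the claim is to run each expression through String.matches( ), which requires the whole input to match. This is a sketch; the class and helper names are invented for the example:

```java
// Checks that each sample pattern matches the entire string "Rudolph".
class RudolphDemo {
  static final String[] PATTERNS = {
    "Rudolph", "[rR]udolph", "[rR][aeiou][a-z]ol.*", "R.*"
  };

  // True only if every pattern matches the whole input.
  static boolean allMatch(String input) {
    for (String regex : PATTERNS)
      if (!input.matches(regex))
        return false;
    return true;
  }

  public static void main(String[] args) {
    for (String regex : PATTERNS)
      System.out.println(regex + ": " + "Rudolph".matches(regex));
  }
}
```

Note that "rudolph" would fail both "Rudolph" and "R.*", so allMatch("rudolph") is false.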

Quantifiers
A quantifier describes the way that a pattern absorbs input text:
• Greedy: Quantifiers are greedy unless otherwise altered. A greedy expression matches as much of the input as possible while still letting the overall pattern succeed. A typical cause of problems is assuming that your pattern will only match the first possible group of characters, when it’s actually greedy and will keep going.
• Reluctant: Specified with a question mark. Matches the minimum number of characters necessary to satisfy the pattern. Also called lazy, minimal matching, non-greedy, or ungreedy.
• Possessive: Currently only available in Java (not in other languages), and more advanced, so you probably won’t use it right away. As a regular expression is applied to a string, it generates many states so that it can backtrack if the match fails. Possessive quantifiers do not keep those intermediate states, preventing backtracking. They can be used to prevent a regular expression from running away and also to make it execute more efficiently.


Greedy    Reluctant   Possessive   Matches
X?        X??         X?+          X, one or none
X*        X*?         X*+          X, zero or more
X+        X+?         X++          X, one or more
X{n}      X{n}?       X{n}+        X, exactly n times
X{n,}     X{n,}?      X{n,}+       X, at least n times
X{n,m}    X{n,m}?     X{n,m}+      X, at least n but not more than m times
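The three kinds of quantifier behave very differently on the same input. This sketch (class and helper names are illustrative) applies greedy, reluctant, and possessive versions of “any characters between angle brackets” to a small HTML-like string:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
  // Returns the text of the first match, or null if there is none.
  static String firstMatch(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    String input = "<b>bold</b>";
    // Greedy: .* runs to the end, then backs off just enough for '>' to match.
    System.out.println(firstMatch("<.*>", input));  // <b>bold</b>
    // Reluctant: .*? takes as little as possible.
    System.out.println(firstMatch("<.*?>", input)); // <b>
    // Possessive: .*+ swallows the rest of the input and never gives any
    // back, so the final '>' can never match.
    System.out.println(firstMatch("<.*+>", input)); // null
  }
}
```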
You should be very aware that the expression ‘X’ will often need to be surrounded in parentheses for it to work the way you desire. For example:
abc+

might seem like it would match the sequence ‘abc’ one or more times, and if you apply it to the input string ‘abcabcabc’ you will in fact get three matches. However, the expression actually says “match ‘ab’ followed by one or more occurrences of ‘c’.” To match the entire string ‘abc’ one or more times, you must say:

(abc)+

You can easily be fooled when using regular expressions – it’s a new
language, on top of Java.
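The difference is easy to see by collecting every match each expression finds. A sketch, with an illustrative class name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupingDemo {
  // Collects every successive match of regex in input.
  static List<String> allMatches(String regex, String input) {
    List<String> result = new ArrayList<String>();
    Matcher m = Pattern.compile(regex).matcher(input);
    while (m.find())
      result.add(m.group());
    return result;
  }

  public static void main(String[] args) {
    // '+' applies only to 'c': three separate matches of "abc".
    System.out.println(allMatches("abc+", "abcabcabc"));   // [abc, abc, abc]
    // '+' applies to the whole group: one match covering everything.
    System.out.println(allMatches("(abc)+", "abcabcabc")); // [abcabcabc]
    // Where "abc+" really repeats: the 'c' at the end.
    System.out.println(allMatches("abc+", "abccc"));       // [abccc]
  }
}
```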

CharSequence
JDK 1.4 defines a new interface called CharSequence, which establishes a definition of a character sequence, abstracted from the String or StringBuffer classes:

interface CharSequence {
  char charAt(int i);
  int length();
  CharSequence subSequence(int start, int end);
  String toString();
}

The String, StringBuffer, and CharBuffer classes have been modified to implement this new CharSequence interface. Many regular expression operations take CharSequence arguments.
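Because the three classes share the interface, a single method written against CharSequence accepts all of them. This is a sketch (names are illustrative) that uses only the interface methods, plus Pattern.matcher( ), which also takes a CharSequence:

```java
import java.nio.CharBuffer;
import java.util.regex.Pattern;

class CharSequenceDemo {
  // Counts digit characters using only CharSequence methods.
  static int countDigits(CharSequence seq) {
    int count = 0;
    for (int i = 0; i < seq.length(); i++)
      if (Character.isDigit(seq.charAt(i)))
        count++;
    return count;
  }

  public static void main(String[] args) {
    CharSequence[] inputs = {
      "JDK 1.4",                   // String
      new StringBuffer("JDK 1.4"), // StringBuffer
      CharBuffer.wrap("JDK 1.4")   // CharBuffer (java.nio)
    };
    Pattern digit = Pattern.compile("\\d");
    for (CharSequence seq : inputs) {
      // Pattern.matcher() accepts any CharSequence as well:
      System.out.println(seq.subSequence(0, 3) + ": "
        + countDigits(seq) + " digits, find() = "
        + digit.matcher(seq).find());
    }
  }
}
```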

Pattern and Matcher
As a first example, the following class can be used to test regular
expressions against an input string. The first argument is the input string
to match against, followed by one or more regular expressions to be applied to the input. Under Unix/Linux, the regular expressions must be quoted on the command line.

This program can be useful in testing regular expressions as you construct
them to see that they produce your intended matching behavior.
//: c12:TestRegularExpression.java
// Allows you to easily try out regular expressions.
// {Args: abcabcabcdefabc "abc+" "(abc)+" "(abc){2,}" }
import java.util.regex.*;

public class TestRegularExpression {
  public static void main(String[] args) {
    if(args.length < 2) {
      System.out.println("Usage:\n" +
        "java TestRegularExpression " +
        "characterSequence regularExpression+");
      System.exit(0);
    }
    System.out.println("Input: \"" + args[0] + "\"");
    for(int i = 1; i < args.length; i++) {
      System.out.println(
        "Regular expression: \"" + args[i] + "\"");
      Pattern p = Pattern.compile(args[i]);
      Matcher m = p.matcher(args[0]);
      while(m.find()) {
        System.out.println("Match \"" + m.group() +
          "\" at positions " +
          m.start() + "-" + (m.end() - 1));
      }
    }
  }
} ///:~

Regular expressions are implemented in Java through the Pattern and Matcher classes in the package java.util.regex. A Pattern object represents a compiled version of a regular expression. The static compile( ) method compiles a regular expression string into a Pattern object. As seen above, you can use the matcher( ) method and the input string to produce a Matcher object from the compiled Pattern object. Pattern also has a

static boolean matches(String regex, CharSequence input)

for quickly discerning whether regex matches the entire input (to learn whether regex occurs anywhere within input, use a Matcher and find( ) instead), and a split( ) method that produces an array of String that has been broken around matches of the regex.
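The distinction between “matches the whole input” and “occurs somewhere in the input” trips people up, so here is a short sketch contrasting the static Pattern.matches( ) with find( ), along with a split( ) call. The helper names are illustrative:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternStaticsDemo {
  // True only if regex matches the entire input.
  static boolean matchesWhole(String regex, String input) {
    return Pattern.matches(regex, input);
  }
  // True if regex matches anywhere in the input.
  static boolean occursIn(String regex, String input) {
    return Pattern.compile(regex).matcher(input).find();
  }
  // Breaks input around every match of regex.
  static String[] splitAround(String regex, String input) {
    return Pattern.compile(regex).split(input);
  }

  public static void main(String[] args) {
    System.out.println(matchesWhole("\\d+", "2002"));     // true
    System.out.println(matchesWhole("\\d+", "JDK 2002")); // false
    System.out.println(occursIn("\\d+", "JDK 2002"));     // true
    System.out.println(Arrays.asList(
      splitAround(",\\s*", "sed, awk,perl")));            // [sed, awk, perl]
  }
}
```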

A Matcher object is generated by calling Pattern.matcher( ) with the
input string as an argument. The Matcher object is then used to access
the results, using methods to evaluate the success or failure of different
types of matches:
boolean matches()
boolean lookingAt()
boolean find()
boolean find(int start)

The matches( ) method is successful if the pattern matches the entire input string, while lookingAt( ) is successful if the pattern matches starting at the beginning of the input string, even if it does not consume all of the input.
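Running the same pattern through all three operations shows how they differ. A sketch; each call gets a fresh Matcher so earlier operations don’t affect later ones:

```java
import java.util.regex.Pattern;

class MatchModesDemo {
  // Reports which of the three Matcher operations succeed for regex on input.
  static String modes(String regex, String input) {
    Pattern p = Pattern.compile(regex);
    return "matches=" + p.matcher(input).matches()
      + " lookingAt=" + p.matcher(input).lookingAt()
      + " find=" + p.matcher(input).find();
  }

  public static void main(String[] args) {
    System.out.println(modes("Java", "Java"));        // all three true
    System.out.println(modes("Java", "Java rocks"));  // lookingAt and find only
    System.out.println(modes("Java", "I like Java")); // find only
  }
}
```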


find()
Matcher.find( ) can be used to discover multiple pattern matches in the
CharSequence to which it is applied. For example:
//: c12:FindDemo.java
import java.util.regex.*;
import com.bruceeckel.simpletest.*;
import java.util.*;

public class FindDemo {
  private static Test monitor = new Test();
  public static void main(String[] args) {
    Matcher m = Pattern.compile("\\w+")
      .matcher("Evening is full of the linnet's wings");
    while(m.find())
      System.out.println(m.group());
    int i = 0;
    while(m.find(i)) {
      System.out.print(m.group() + " ");
      i++;
    }
    monitor.expect(new String[] {
      "Evening",
      "is",
      "full",
      "of",
      "the",
      "linnet",
      "s",
      "wings",
      "Evening vening ening ning ing ng g is is s full " +
      "full ull ll l of of f the the he e linnet linnet " +
      "innet nnet net et t s s wings wings ings ngs gs s "
    });
  }
} ///:~

The pattern “\\w+” indicates “one or more word characters,” so it will simply split the input up into words. find( ) is like an iterator, moving forward through the input string. However, the second version of find( ) can be given an integer argument that tells it the character position for the beginning of the search; this version resets the search position to the value of the argument, as you can see from the output.


Groups
Groups are regular expressions set off by parentheses, which can be called
up later with their group number. Group zero indicates the whole
expression match, group one is the first parenthesized group, etc. Thus in
A(B(C))D

there are three groups: group 0 is ABCD, group 1 is BC, and group 2 is C.

The Matcher object has methods to give you information about groups:
public int groupCount( ) returns the number of groups in this
matcher's pattern. Group zero is not included in this count.
public String group( ) returns group zero (the entire match) from the
previous match operation (find( ), for example).
public String group(int i) returns the given group number during the
previous match operation. If the match was successful but the group
specified failed to match any part of the input string, then null is returned.
public int start(int group) returns the start index of the group found
in the previous match operation.
public int end(int group) returns the index of the last character, plus
one, of the group found in the previous match operation.
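The following sketch (class and helper names are illustrative) exercises these methods: it confirms the group numbering for A(B(C))D, shows group( ) returning null for an optional group that took no part in the match, and prints start( ) and end( ) for a group:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupInfoDemo {
  // Returns groups 0..groupCount() of a whole-input match, or null if none.
  static String[] groups(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    if (!m.matches())
      return null;
    String[] result = new String[m.groupCount() + 1];
    for (int i = 0; i <= m.groupCount(); i++)
      result[i] = m.group(i);
    return result;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(groups("A(B(C))D", "ABCD"))); // [ABCD, BC, C]
    // Group 2 is optional; when it matches nothing, group(2) is null:
    System.out.println(Arrays.toString(groups("([a-z]+)(\\d+)?", "abc"))); // [abc, abc, null]
    Matcher m = Pattern.compile("([a-z]+)(\\d+)").matcher("xyz42");
    if (m.matches())
      System.out.println(m.start(2) + "-" + m.end(2)); // 3-5
  }
}
```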


Here’s an example of regular expression groups:
//: c12:Groups.java
import java.util.regex.*;
import com.bruceeckel.simpletest.*;

public class Groups {
  private static Test monitor = new Test();
  static public final String poem =
    "Twas brillig, and the slithy toves\n" +
    "Did gyre and gimble in the wabe.\n" +
    "All mimsy were the borogoves,\n" +
    "And the mome raths outgrabe.\n\n" +
    "Beware the Jabberwock, my son,\n" +
    "The jaws that bite, the claws that catch.\n" +
    "Beware the Jubjub bird, and shun\n" +
    "The frumious Bandersnatch.";
  public static void main(String[] args) {
    Matcher m =
      Pattern.compile("(?m)(\\S+)\\s+((\\S+)\\s+(\\S+))$")
        .matcher(poem);
    while(m.find()) {
      for(int j = 0; j <= m.groupCount(); j++)
        System.out.print("[" + m.group(j) + "]");
      System.out.println();
    }
    monitor.expect(new String[]{
      "[the slithy toves]" +
      "[the][slithy toves][slithy][toves]",
      "[in the wabe.][in][the wabe.][the][wabe.]",
      "[were the borogoves,]" +
      "[were][the borogoves,][the][borogoves,]",
      "[mome raths outgrabe.]" +
      "[mome][raths outgrabe.][raths][outgrabe.]",
      "[Jabberwock, my son,]" +
      "[Jabberwock,][my son,][my][son,]",
      "[claws that catch.]" +
      "[claws][that catch.][that][catch.]",
      "[bird, and shun][bird,][and shun][and][shun]",
      "[The frumious Bandersnatch.][The]" +
      "[frumious Bandersnatch.][frumious][Bandersnatch.]"
    });
  }
} ///:~

The poem is the first part of Lewis Carroll’s “Jabberwocky,” from Through the Looking Glass. You can see that the regular expression pattern has a number of parenthesized groups, consisting of one or more non-whitespace characters (‘\S+’) followed by one or more whitespace characters (‘\s+’). The goal is to capture the last three words on each line; the end of a line is delimited by ‘$’. However, the normal behavior is to match ‘$’ with the end of the entire input sequence, so we must explicitly tell the regular expression to pay attention to newlines within the input. This is accomplished with the ‘(?m)’ pattern flag at the beginning of the sequence (pattern flags will be shown shortly).



start() and end()
Following a successful matching operation, start( ) returns the start index of the previous match, and end( ) returns the index of the last character matched, plus one. Invoking either start( ) or end( ) following an unsuccessful matching operation (or prior to a matching operation being attempted) produces an IllegalStateException. The following program also demonstrates matches( ) and lookingAt( ):

//: c12:StartEnd.java
import java.util.regex.*;
import com.bruceeckel.simpletest.*;

public class StartEnd {
  private static Test monitor = new Test();
  public static void main(String[] args) {
    String[] input = new String[] {
      "Java has regular expressions in 1.4",
      "regular expressions now expressing in Java",
      "Java represses oracular expressions"
    };
    Pattern
      p1 = Pattern.compile("re\\w*"),
      p2 = Pattern.compile("Java.*");
    for(int i = 0; i < input.length; i++) {
      System.out.println("input " + i + ": " + input[i]);
      Matcher
        m1 = p1.matcher(input[i]),
        m2 = p2.matcher(input[i]);
      while(m1.find())
        System.out.println("m1.find() '" + m1.group() +
          "' start = "+ m1.start() + " end = " + m1.end());
      while(m2.find())
        System.out.println("m2.find() '" + m2.group() +
          "' start = "+ m2.start() + " end = " + m2.end());
      if(m1.lookingAt()) // No reset() necessary
        System.out.println("m1.lookingAt() start = "
          + m1.start() + " end = " + m1.end());
      if(m2.lookingAt())
        System.out.println("m2.lookingAt() start = "
          + m2.start() + " end = " + m2.end());
      if(m1.matches()) // No reset() necessary
        System.out.println("m1.matches() start = "
          + m1.start() + " end = " + m1.end());
      if(m2.matches())
        System.out.println("m2.matches() start = "
          + m2.start() + " end = " + m2.end());
    }
    monitor.expect(new String[] {
      "input 0: Java has regular expressions in 1.4",
      "m1.find() 'regular' start = 9 end = 16",
      "m1.find() 'ressions' start = 20 end = 28",
      "m2.find() 'Java has regular expressions in 1.4'" +
      " start = 0 end = 35",
      "m2.lookingAt() start = 0 end = 35",
      "m2.matches() start = 0 end = 35",
      "input 1: regular expressions now " +
      "expressing in Java",
      "m1.find() 'regular' start = 0 end = 7",
      "m1.find() 'ressions' start = 11 end = 19",
      "m1.find() 'ressing' start = 27 end = 34",
      "m2.find() 'Java' start = 38 end = 42",
      "m1.lookingAt() start = 0 end = 7",
      "input 2: Java represses oracular expressions",
      "m1.find() 'represses' start = 5 end = 14",
      "m1.find() 'ressions' start = 27 end = 35",
      "m2.find() 'Java represses oracular expressions' " +
      "start = 0 end = 35",
      "m2.lookingAt() start = 0 end = 35",
      "m2.matches() start = 0 end = 35"
    });
  }
} ///:~

Notice that find( ) will locate the regular expression anywhere in the input, but lookingAt( ) and matches( ) only succeed if the regular expression starts matching at the very beginning of the input. While matches( ) only succeeds if the entire input matches the regular expression, lookingAt( )[6] succeeds even if only the first part of the input matches.

[6] I have no idea how they came up with this method name, or what it’s supposed to refer to. But it’s reassuring to know that whoever comes up with nonintuitive method names is still employed at Sun. And that their apparent policy of not reviewing code designs is still in place. Sorry for the sarcasm, but this kind of thing gets tiresome after a few years.

Pattern flags
An alternative compile( ) method accepts flags that affect the behavior
of regular expression matching:
Pattern Pattern.compile(String regex, int flag)

where flag is drawn from among the following Pattern class constants:
Compile Flag                    Effect
Pattern.CANON_EQ                Two characters will be considered to match if, and only if, their full canonical decompositions match. The expression “a\u030A”, for example, will match the string “å” when this flag is specified. By default, matching does not take canonical equivalence into account.
Pattern.CASE_INSENSITIVE (?i)   This flag allows your pattern to match without regard to case (upper or lower). By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Unicode-aware case-insensitive matching can be enabled by specifying the UNICODE_CASE flag in conjunction with this flag.
Pattern.COMMENTS (?x)           In this mode, whitespace is ignored, and embedded comments starting with # are ignored until the end of a line. Comments mode can also be enabled via the embedded flag expression.
Pattern.DOTALL (?s)             In dotall mode, the expression ‘.’ matches any character, including a line terminator. By default the ‘.’ expression does not match line terminators.
Pattern.MULTILINE (?m)          In multiline mode the expressions ‘^’ and ‘$’ match the beginning and ending of a line, respectively. ‘^’ also matches the beginning of the input string, and ‘$’ also matches the end of the input string. By default these expressions only match at the beginning and the end of the entire input string.
Pattern.UNICODE_CASE (?u)       When this flag is specified then case-insensitive matching, when enabled by the CASE_INSENSITIVE flag, is done in a manner consistent with the Unicode Standard. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched.
Pattern.UNIX_LINES (?d)         In this mode, only the ‘\n’ line terminator is recognized in the behavior of ‘.’, ‘^’, and ‘$’.
Particularly useful among these flags are Pattern.CASE_INSENSITIVE, Pattern.MULTILINE, and Pattern.COMMENTS (which is helpful for clarity and/or documentation). Note that the behavior of most of the flags can also be obtained by inserting the parenthesized characters listed with each flag in the table, such as (?i), into your regular expression, preceding the place where you want the mode to take effect.
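This sketch (names are illustrative) compares the CASE_INSENSITIVE compile flag with its embedded (?i) form, and shows that an embedded flag placed mid-pattern only affects what follows it:

```java
import java.util.regex.Pattern;

class EmbeddedFlagDemo {
  // Case-insensitive matching via a compile flag.
  static boolean withFlag(String input) {
    return Pattern.compile("java", Pattern.CASE_INSENSITIVE)
      .matcher(input).matches();
  }
  // The same effect via the embedded (?i) form at the start of the regex.
  static boolean withEmbeddedFlag(String input) {
    return Pattern.matches("(?i)java", input);
  }
  // (?i) placed after "J": the "J" must match exactly, "ava" ignores case.
  static boolean flagMidPattern(String input) {
    return Pattern.matches("J(?i)ava", input);
  }

  public static void main(String[] args) {
    for (String s : new String[] { "java", "JAVA", "jAVA" })
      System.out.println(s + ": " + withFlag(s) + " "
        + withEmbeddedFlag(s) + " " + flagMidPattern(s));
  }
}
```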

You can combine the effect of these and other flags through an "OR" (‘|’)
operation:
//: c12:ReFlags.java
import java.util.regex.*;
import com.bruceeckel.simpletest.*;

public class ReFlags {
  private static Test monitor = new Test();
  public static void main(String[] args) {
    Pattern p = Pattern.compile("^java",
      Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    Matcher m = p.matcher(
      "java has regex\nJava has regex\n" +
      "JAVA has pretty good regular expressions\n" +
      "Regular expressions are in Java");
    while(m.find())
      System.out.println(m.group());
    monitor.expect(new String[] {
      "java",
      "Java",
      "JAVA"
    });
  }
} ///:~

This creates a pattern which will match lines which start with "java",
"Java", "JAVA", etc. and attempt a match for each line within a multiline
set (matches starting at the beginning of the character sequence and
following each line terminator within the character sequence). Note that
the group( ) method only produces the matched portion.


split()
Splitting divides an input string into an array of String objects, delimited
by the regular expression.
String[] split(CharSequence charseq)
String[] split(CharSequence charseq, int limit)

This is a quick and handy way of breaking up input text over a common
boundary:
//: c12:SplitDemo.java
import java.util.regex.*;
import com.bruceeckel.simpletest.*;
import java.util.*;

public class SplitDemo {
  private static Test monitor = new Test();
  public static void main(String[] args) {
    String input =
      "This!!unusual use!!of exclamation!!points";
    System.out.println(Arrays.asList(
      Pattern.compile("!!").split(input)));
    // Only do the first three:
    System.out.println(Arrays.asList(
      Pattern.compile("!!").split(input, 3)));
    System.out.println(Arrays.asList(
      "Aha! String has a split() built in!".split(" ")));
    monitor.expect(new String[] {
      "[This, unusual use, of exclamation, points]",
      "[This, unusual use, of exclamation!!points]",
      "[Aha!, String, has, a, split(), built, in!]"
    });
  }
} ///:~

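The second form of split( ) limits the number of splits that occur: with a positive limit n, the regular expression is applied at most n - 1 times, so the resulting array has at most n elements and the last element holds the rest of the input.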
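The limit argument also controls what happens to trailing empty strings, which is worth knowing when the input ends with delimiters. A sketch, with an illustrative class name:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimitDemo {
  static String[] split(String input, int limit) {
    return Pattern.compile(",").split(input, limit);
  }

  public static void main(String[] args) {
    String input = "a,b,c,,";
    // Positive limit: at most 2 elements; the last holds the remainder.
    System.out.println(Arrays.toString(split(input, 2)));  // [a, b,c,,]
    // Zero (the default for one-argument split): trailing empty strings dropped.
    System.out.println(Arrays.toString(split(input, 0)));  // [a, b, c]
    // Negative limit: no limit, and trailing empty strings are kept.
    System.out.println(Arrays.toString(split(input, -1))); // [a, b, c, , ]
  }
}
```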

Notice that regular expressions are so valuable that some operations have
also been added to the String class, including split( ) (shown here),
matches( ), replaceFirst( ), and replaceAll( ). These behave like
their Pattern and Matcher counterparts.

Replace operations
Regular expressions become especially useful when you begin replacing
text. Here are the available methods:
replaceFirst(String replacement) replaces the first matching part of the input string with replacement.

replaceAll(String replacement) replaces every matching part of the input string with replacement.

appendReplacement(StringBuffer sbuf, String replacement) performs step-by-step replacements into sbuf, rather than replacing only the first one or all of them, as in replaceFirst( ) and replaceAll( ), respectively. This is a very important method, because it allows you to call methods and perform other processing in order to produce the replacement (replaceFirst( ) and replaceAll( ) are only able to put in fixed strings). With this method, you can programmatically pick apart the groups and create powerful replacements.

appendTail(StringBuffer sbuf) is invoked after one or more invocations of the appendReplacement( ) method in order to copy the remainder of the input string.

Here’s an example which shows the use of all the replace operations. In
addition, the block of commented text at the beginning is extracted and
processed with regular expressions, for use as input in the rest of the
example:
//: c12:TheReplacements.java
import java.util.regex.*;
import java.io.*;
import com.bruceeckel.util.*;
import com.bruceeckel.simpletest.*;

/*! Here's a block of text to use as input to
the regular expression matcher. Note that we'll
first extract the block of text by looking for
the special delimiters, then process the
extracted block. !*/

public class TheReplacements {
  private static Test monitor = new Test();
  public static void main(String[] args) throws Exception {
    String s = TextFile.read("TheReplacements.java");
    // Match the specially-commented block of text above:
    Matcher mInput =
      Pattern.compile("/\\*!(.*)!\\*/", Pattern.DOTALL)
        .matcher(s);
    if(mInput.find())
      s = mInput.group(1); // Captured by parentheses
    // Replace two or more spaces with a single space:
    s = s.replaceAll(" {2,}", " ");
    // Replace one or more spaces at the beginning of each
    // line with no spaces. Must enable MULTILINE mode:
    s = s.replaceAll("(?m)^ +", "");
    System.out.println(s);
    s = s.replaceFirst("[aeiou]", "(VOWEL1)");
    StringBuffer sbuf = new StringBuffer();
    Pattern p = Pattern.compile("[aeiou]");
    Matcher m = p.matcher(s);
    // Process the find information as you
    // perform the replacements:
    while(m.find())
      m.appendReplacement(sbuf, m.group().toUpperCase());
    // Put in the remainder of the text:
    m.appendTail(sbuf);
    System.out.println(sbuf);
    monitor.expect(new String[]{
      "Here's a block of text to use as input to",
      "the regular expression matcher. Note that we'll",
      "first extract the block of text by looking for",
      "the special delimiters, then process the",
      "extracted block. ",
      "H(VOWEL1)rE's A blOck Of tExt tO UsE As InpUt tO",
      "thE rEgUlAr ExprEssIOn mAtchEr. NOtE thAt wE'll",
      "fIrst ExtrAct thE blOck Of tExt by lOOkIng fOr",
      "thE spEcIAl dElImItErs, thEn prOcEss thE",
      "ExtrActEd blOck. "
    });
  }
} ///:~

The file is opened and read using the TextFile.read( ) method introduced earlier in this chapter. mInput is created to match all the text (notice the grouping parentheses) between ‘/*!’ and ‘!*/’. Then, runs of two or more spaces are reduced to a single space, and any spaces at the beginning of each line are removed (in order to do this on all lines and not just the beginning of the input, multiline mode must be enabled). These two replacements are performed with the equivalent (but more convenient, in this case) replaceAll( ) that’s part of String. Note that since each replacement is only used once in the program, there’s no extra cost to doing it this way rather than precompiling it as a Pattern.

replaceFirst( ) only performs the first replacement that it finds. In addition, the replacement strings in replaceFirst( ) and replaceAll( ) are fixed text (apart from the group references described below), so if you want to perform some processing on each replacement they don’t help. In that case, you need to use appendReplacement( ), which allows you to write any amount of code in the process of performing the replacement. In the above example, a group( ) is selected and processed (in this example, setting the vowel found by the regular expression to upper case) as the resulting sbuf is being built. Normally, you would step through and perform all the replacements and then call appendTail( ), but if you wanted to simulate replaceFirst( ) (or “replace n”) you would just do the replacement one time (or n times) and then call appendTail( ) to put the rest into sbuf.
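Here is one way the “replace n” idea might look. This is a sketch: upperFirstN( ) is a hypothetical helper, not a JDK method, and it uses Matcher.quoteReplacement( ) (added in JDK 5) so that any ‘$’ or ‘\’ in the matched text is inserted literally:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// "Replace n" built from appendReplacement()/appendTail().
// upperFirstN is a hypothetical helper name.
class ReplaceN {
  static String upperFirstN(String input, String regex, int n) {
    Matcher m = Pattern.compile(regex).matcher(input);
    StringBuffer sbuf = new StringBuffer();
    int count = 0;
    while (count < n && m.find()) {
      // quoteReplacement() keeps '$' or '\' in the match from being
      // treated as a group reference or escape.
      m.appendReplacement(sbuf,
        Matcher.quoteReplacement(m.group().toUpperCase()));
      count++;
    }
    m.appendTail(sbuf); // copy everything after the last replacement
    return sbuf.toString();
  }

  public static void main(String[] args) {
    System.out.println(upperFirstN("the slithy toves", "[aeiou]", 2));
    // thE slIthy toves
  }
}
```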

appendReplacement( ) also allows you to refer to captured groups directly in the replacement string by saying “$g” where ‘g’ is the group number (the same syntax works in replaceFirst( ) and replaceAll( )). However, this is for simpler processing and wouldn’t give you the desired results in the above program.
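A short sketch of “$g” references (names are illustrative), used once with replaceAll( ) and once with appendReplacement( ):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupReferenceDemo {
  // "$2, $1" refers to captured groups 2 and 1 in the replacement string.
  static String swapNames(String input) {
    return Pattern.compile("(\\w+) (\\w+)").matcher(input)
      .replaceAll("$2, $1");
  }
  // The same $g syntax works inside appendReplacement():
  static String bracketWords(String input) {
    Matcher m = Pattern.compile("(\\w+)").matcher(input);
    StringBuffer sbuf = new StringBuffer();
    while (m.find())
      m.appendReplacement(sbuf, "[$1]");
    m.appendTail(sbuf);
    return sbuf.toString();
  }

  public static void main(String[] args) {
    System.out.println(swapNames("Lewis Carroll"));   // Carroll, Lewis
    System.out.println(bracketWords("slithy toves")); // [slithy] [toves]
  }
}
```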

reset()
An existing Matcher object can be applied to a new character sequence using the reset( ) methods:
//: c12:Resetting.java
import java.util.regex.*;
import java.io.*;
import com.bruceeckel.simpletest.*;

public class Resetting {
  private static Test monitor = new Test();
  public static void main(String[] args) throws Exception {
    Matcher m = Pattern.compile("[frb][aiu][gx]")
      .matcher("fix the rug with bags");
    while(m.find())
      System.out.println(m.group());
    m.reset("fix the rig with rags");
    while(m.find())
      System.out.println(m.group());
    monitor.expect(new String[]{
      "fix",
      "rug",
      "bag",
      "fix",
      "rig",
      "rag"
    });
  }
} ///:~

reset( ) without any arguments sets the Matcher to the beginning of the
current sequence.
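The no-argument form is handy when you want to scan the same input again. In this sketch (names are illustrative), one find( ) consumes the first match, the rest are counted, and reset( ) then makes all three matches available again:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
  // Counts the matches remaining from the matcher's current position.
  static int countRemaining(Matcher m) {
    int count = 0;
    while (m.find())
      count++;
    return count;
  }

  public static void main(String[] args) {
    Matcher m = Pattern.compile("[frb][aiu][gx]")
      .matcher("fix the rug with bags");
    m.find();                              // consumes "fix"
    System.out.println(countRemaining(m)); // 2 ("rug" and "bag")
    m.reset();                             // back to the start of the same input
    System.out.println(countRemaining(m)); // 3
  }
}
```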


Regular expressions and Java I/O
Most of the examples so far have shown regular expressions applied to
static strings. The following example shows one way to apply regular
expressions to search for matches in a file. Inspired by Unix’s grep, JGrep.java takes two arguments: a filename and the regular expression that you want to match. The output shows each line where a match occurs and the match position(s) within the line.


//: c12:JGrep.java
// A very simple version of the "grep" program.
// {Args: JGrep.java "\\b[Ssct]\\w+"}
import java.io.*;
import java.util.regex.*;
import java.util.*;
import com.bruceeckel.util.*;

public class JGrep {
  public static void main(String[] args) throws Exception {
    if(args.length < 2) {
      System.out.println("Usage: java JGrep file regex");
      System.exit(0);
    }
    Pattern p = Pattern.compile(args[1]);
    // Iterate through the lines of the input file:
    ListIterator it = new TextFile(args[0]).listIterator();
    while(it.hasNext()) {
      Matcher m = p.matcher((String)it.next());
      while(m.find())
        System.out.println(it.nextIndex() + ": " +
          m.group() + ": " + m.start());
    }
  }
} ///:~

The file is opened as a TextFile object (these were introduced earlier in this chapter). Since a TextFile contains the lines of the file in an ArrayList, a ListIterator is produced from that list. The result is an iterator that will allow you to move through the lines of the file (forward and backward).

Each input line is used to produce a Matcher and the result is scanned with find( ). Note that ListIterator.nextIndex( ) keeps track of the line numbers.

The test arguments open the JGrep.java file to read as input, and search
for words starting with [Ssct].
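If you don’t have the book’s TextFile utility at hand, the same idea can be sketched with only the standard library. This is a hypothetical variant (LineGrep is not from the book) that reads lines through a BufferedReader and counts line numbers itself; taking a Reader makes it easy to try on in-memory text:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stdlib-only variant of JGrep.
class LineGrep {
  // Returns "lineNumber: match: position" entries, numbering lines from 1.
  static List<String> grep(Reader source, Pattern p) throws IOException {
    List<String> hits = new ArrayList<String>();
    try (BufferedReader in = new BufferedReader(source)) {
      String line;
      int lineNumber = 0;
      while ((line = in.readLine()) != null) {
        lineNumber++;
        Matcher m = p.matcher(line);
        while (m.find())
          hits.add(lineNumber + ": " + m.group() + ": " + m.start());
      }
    }
    return hits;
  }

  public static void main(String[] args) throws IOException {
    if (args.length < 2) {
      System.out.println("Usage: java LineGrep file regex");
      return;
    }
    for (String hit : grep(new FileReader(args[0]), Pattern.compile(args[1])))
      System.out.println(hit);
  }
}
```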

Is StringTokenizer needed?
The new capabilities provided with regular expressions might prompt you
to wonder whether the original StringTokenizer class is still necessary.
Before JDK 1.4, the way to split a string into parts was to “tokenize” it
with StringTokenizer. But now it’s much easier and more succinct to do
the same thing with regular expressions:
//: c12:ReplacingStringTokenizer.java
import java.util.regex.*;
import com.bruceeckel.simpletest.*;
import java.util.*;

public class ReplacingStringTokenizer {
  private static Test monitor = new Test();
  public static void main(String[] args) {
    String input = "But I'm not dead yet! I feel happy!";
    StringTokenizer stoke = new StringTokenizer(input);
    while(stoke.hasMoreTokens())
      System.out.println(stoke.nextToken());
    System.out.println(Arrays.asList(input.split(" ")));
    monitor.expect(new String[] {
      "But",
      "I'm",
      "not",
      "dead",
      "yet!",
      "I",
      "feel",
      "happy!",
      "[But, I'm, not, dead, yet!, I, feel, happy!]"
    });
  }
} ///:~


Chapter 12: The Java I/O System 705
With regular expressions you can also split a string into parts using more
complex patterns, something that’s much more difficult with
StringTokenizer. It seems safe to say that regular expressions replace
any tokenizing classes in earlier versions of Java.
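For example, one pattern can treat a comma, a semicolon, or the whole word "and" (with any surrounding whitespace) as a delimiter. StringTokenizer cannot do this, because its delimiters are single characters. The following is a sketch; the class SplitSketch and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class SplitSketch {
    // Delimiter: a comma, a semicolon, or the word "and",
    // together with any whitespace around it
    static final String DELIMITER = "\\s*(?:,|;|\\band\\b)\\s*";

    public static List<String> withRegex(String input) {
        return Arrays.asList(input.split(DELIMITER));
    }

    // StringTokenizer only understands single-character delimiters,
    // so the word "and" comes back as an ordinary token
    public static List<String> withTokenizer(String input) {
        List<String> tokens = new ArrayList<String>();
        StringTokenizer st = new StringTokenizer(input, ",; ");
        while (st.hasMoreTokens())
            tokens.add(st.nextToken());
        return tokens;
    }

    public static void main(String[] args) {
        String input = "apples, pears and plums;cherries";
        System.out.println(withRegex(input));     // [apples, pears, plums, cherries]
        System.out.println(withTokenizer(input)); // [apples, pears, and, plums, cherries]
    }
}
```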

You can learn much more about regular expressions in Mastering
Regular Expressions, 2nd Edition, by Jeffrey E. F. Friedl (O’Reilly, 2002).

Summary
The Java I/O stream library does satisfy the basic requirements: you can
perform reading and writing with the console, a file, a block of memory,
or even across the Internet. With inheritance, you can create new types of
input and output objects. And you can even add a simple extensibility to
the kinds of objects a stream will accept by redefining the toString( )
method. Java calls it automatically when an object is used in a String
concatenation or passed to a method such as println( ) (Java’s limited
“automatic type conversion”).
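Here is a short illustration of that conversion, using a hypothetical Point class. String concatenation calls toString( ) implicitly. println(Object) calls String.valueOf( ), which in turn calls toString( ) on a non-null argument.

```java
public class ToStringSketch {
    // Hypothetical value class that describes itself through toString()
    public static class Point {
        private final int x, y;
        public Point(int x, int y) { this.x = x; this.y = y; }
        public String toString() { return "(" + x + ", " + y + ")"; }
    }

    public static String describe(Point p) {
        // The + operator converts p by calling p.toString()
        return "Point at " + p;
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        System.out.println(describe(p)); // Point at (3, 4)
        System.out.println(p);           // (3, 4), via String.valueOf(p)
    }
}
```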

There are questions left unanswered by the documentation and design of
the I/O stream library. For example, it would have been nice if you could
say that you want an exception thrown if you try to overwrite a file when
opening it for output—some programming systems allow you to specify
that you want to open an output file, but only if it doesn’t already exist. In
Java, it appears that you are supposed to use a File object to determine
whether a file exists. If you open the file as a FileOutputStream or a
FileWriter, its contents are always overwritten, unless you use the
constructor form that opens the file in append mode.
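The java.io classes offer two partial workarounds. File.createNewFile( ) creates the file only if it does not already exist, performing the check and the creation as a single atomic step, and returns false if the file was already there. The FileWriter(String, boolean) constructor opens an existing file in append mode instead of truncating it. The class and method names below are hypothetical, and the demo assumes a temporary file is an acceptable stand-in for a real output file.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class NoClobberSketch {
    // Writes text only if the file does not exist yet.
    // createNewFile() checks and creates in one atomic step, avoiding
    // the race between calling File.exists() and opening the file.
    public static boolean writeIfAbsent(File f, String text) throws IOException {
        if (!f.createNewFile())
            return false; // file already existed; leave it untouched
        FileWriter out = new FileWriter(f);
        try {
            out.write(text);
        } finally {
            out.close();
        }
        return true;
    }

    // The second argument 'true' selects append mode instead of truncation
    public static void append(File f, String text) throws IOException {
        FileWriter out = new FileWriter(f.getPath(), true);
        try {
            out.write(text);
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("noclobber", ".txt");
        f.delete(); // start from a name that does not exist yet
        System.out.println(writeIfAbsent(f, "first\n"));  // true
        System.out.println(writeIfAbsent(f, "second\n")); // false
        append(f, "appended\n");
        f.deleteOnExit();
    }
}
```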

The I/O stream library brings up mixed feelings; it does much of the job
and it’s portable. But if you don’t already understand the decorator
pattern, the design is nonintuitive, so there’s extra overhead in learning
and teaching it. It’s also incomplete: for example, I shouldn’t have to write
utilities like TextFile, and there’s no support for the kind of output
formatting that virtually every other language’s I/O package supports.
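To make the decorator idea concrete: each wrapper adds one capability to the stream it wraps. This is a minimal sketch, assuming the input is an in-memory byte array encoded as UTF-8 rather than a file; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class DecoratorSketch {
    public static List<String> readLines(byte[] data) throws IOException {
        // Source: a raw byte stream over an in-memory array
        InputStream bytes = new ByteArrayInputStream(data);
        // Decorator 1: translate bytes into characters (UTF-8 assumed)
        InputStreamReader chars = new InputStreamReader(bytes, "UTF-8");
        // Decorator 2: add buffering and line-at-a-time reading
        BufferedReader lines = new BufferedReader(chars);
        List<String> result = new ArrayList<String>();
        String line;
        while ((line = lines.readLine()) != null)
            result.add(line);
        lines.close(); // closing the outermost decorator closes the whole chain
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "first line\nsecond line\n".getBytes("UTF-8");
        System.out.println(readLines(data)); // [first line, second line]
    }
}
```

Three objects are needed to get a line-oriented reader, and that layering is where the extra lines of code come from. The benefit is that any layer can be swapped independently: for example, a FileInputStream could replace the ByteArrayInputStream.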


However, once you do understand the decorator pattern and begin using
the library in situations that require its flexibility, you can begin to benefit
from this design, at which point its cost in extra lines of code may not
bother you as much.

If you do not find what you’re looking for in this chapter (which has only
been an introduction, and is not meant to be comprehensive), you can
find in-depth coverage in Java I/O, by Elliotte Rusty Harold (O’Reilly,
1999).

Exercises
Solutions to selected exercises can be found in the electronic document The Thinking in Java
Annotated Solution Guide, available for a small fee from www.BruceEckel.com.

1. Open a text file so that you can read the file one line at a time.
Read each line as a String and place that String object into a
LinkedList. Print all of the lines in the LinkedList in reverse
order.

2. Modify Exercise 1 so that the name of the file you read is provided
as a command-line argument.

3. Modify Exercise 2 to also open a text file so you can write text into
it. Write the lines in the LinkedList, along with line numbers (do
not attempt to use the “LineNumber” classes), out to the file.

4. Modify Exercise 2 to force all the lines in the LinkedList to upper
case and send the results to System.out.

5. Modify Exercise 2 to take additional command-line arguments of
words to find in the file. Print all lines in which any of the words
match.

6. Modify DirList.java so that the FilenameFilter actually opens
each file and accepts the file based on whether any of the trailing
arguments on the command line exist in that file.

7. Modify DirList.java to produce all the file names in the current
directory and subdirectories that satisfy the given regular
expression. Hint: use recursion to traverse the subdirectories.

8. Create a class called SortedDirList with a constructor that takes
file path information and builds a sorted directory list from the
files at that path. Create two overloaded list( ) methods that will
either produce the whole list or a subset of the list based on an
argument. Add a size( ) method that takes a file name and
produces the size of that file.


9. Modify WordCount.java so that it produces an alphabetic sort
instead, using the tool from Chapter 11.

10. Modify WordCount.java so that it uses a class containing a
String and a count value to store each different word, and a Set
of these objects to maintain the list of words.

11. Modify IOStreamDemo.java so that it uses
LineNumberReader to keep track of the line count. Note that
it’s much easier to just keep track programmatically.

12. Starting with section 4 of IOStreamDemo.java, write a program
that compares the performance of writing to a file when using
buffered and unbuffered I/O.

13. Modify section 5 of IOStreamDemo.java to eliminate the spaces
in the line produced by the first call to in5.readUTF( ).

14. Repair the program CADState.java as described in the text.

15. Copy Blips.java to a new file named BlipCheck.java, and rename the
class Blip2 to BlipCheck (making it public and removing the public scope
from the class Blips in the process). Remove the //! marks in the file and
execute the program including the offending lines. Next, comment out the
default constructor for BlipCheck. Run it and explain why it works. Note
that after compiling, you must execute the program with “java Blips”
because the main( ) method is still in class Blips.

16. In Blip3.java, comment out the two lines after the phrases “You
must do this:” and run the program. Explain the result and why it
differs from when the two lines are in the program.


17. (Intermediate) In Chapter 8, locate the
GreenhouseController.java example, which consists of four
files. GreenhouseController contains a hard-coded set of
events. Change the program so that it reads the events and their
relative times from a text file. (Challenging: Use the Factory Method
design pattern to build the events; see Thinking in Patterns with
Java at www.BruceEckel.com.)

18. For the phrase “Java now has regular expressions”, evaluate
whether the following expressions will find a match:
^Java
\Breg.*
n.w\s+h(a|i)s
s?
s*
s+
s{4}
s{1.}
s{0,3}

19. Apply the regular expression
(?i)((^[aeiou])|(\s+[aeiou]))\w+?[aeiou]\b
to
"Arline ate eight apples and one orange while Anita hadn't any"

20. Modify JGrep.java to accept flags as arguments (e.g.,
Pattern.CASE_INSENSITIVE, Pattern.MULTILINE).
21. Modify JGrep.java to use Java NIO memory-mapped files.
22. Modify JGrep.java to accept a directory name or a file name as an
argument (if a directory is provided, the search should include all files
in the directory). Hint: you can generate a list of filenames with:
String[] filenames = new File(".").list();
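To check your answers to Exercise 18, a small harness along these lines may help. This is a sketch. The class name RegexCheck is hypothetical. It reports whether Matcher.find( ) succeeds anywhere in the phrase, which is what "find a match" means for the exercise. It also catches PatternSyntaxException, so any expression in the list that fails to compile is reported instead of halting the run.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexCheck {
    static final String PHRASE = "Java now has regular expressions";

    // Returns "true" or "false" for a match anywhere in the phrase,
    // or a description of the error if the expression does not compile
    public static String check(String regex) {
        try {
            return String.valueOf(Pattern.compile(regex).matcher(PHRASE).find());
        } catch (PatternSyntaxException e) {
            return "invalid: " + e.getDescription();
        }
    }

    public static void main(String[] args) {
        // Backslashes are doubled because these are Java string literals
        String[] regexes = {
            "^Java", "\\Breg.*", "n.w\\s+h(a|i)s",
            "s?", "s*", "s+", "s{4}", "s{1.}", "s{0,3}"
        };
        for (String r : regexes)
            System.out.println(r + " -> " + check(r));
    }
}
```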
