
Constituent Structure - Part 10


called immediate constituent analysis (IC). IC was not so much a
formalized algorithm for segmenting sentences as a method based on the
native speaker's and linguist's intuitions about semantic relatedness
between elements. IC splits sentences into constituents based on how
close the modification relations among the words are. For example,
take the diagram in (1) (adapted from Wells 1947: 84), where a sentence
has been analyzed into immediate constituents. The greater the number
of pipes (|), the weaker the boundary between the constituents (i.e.
the more pipes, the more closely related the words).2 The constituents in
this diagram are listed below it.
(1) The || King ||| of |||| England | open||| ed || parliament.
Constituents:
(a) The King of England
(b) The
(c) King of England
(d) King
(e) of England
(f) of
(g) England
(h) opened
(i) open
(j) ed
(k) opened parliament
(l) parliament
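The nesting behind the list of constituents in (a)–(l) can be made explicit with bracketing. The sketch below (my own illustration, not part of the original text) encodes the IC analysis in (1) as nested Python tuples, where each pair groups the two immediate constituents of a larger unit, and then reads off every constituent:

```python
# IC analysis of (1) as nested tuples (illustrative encoding, not from the text):
# each tuple pairs the two immediate constituents of a larger unit.
sentence = (
    ("The", ("King", ("of", "England"))),  # (a) The King of England
    (("open", "ed"), "parliament"),        # (k) opened parliament
)

def flatten(node):
    """Spell out a constituent as a space-separated string of morphemes."""
    return node if isinstance(node, str) else " ".join(flatten(c) for c in node)

def constituents(node, out=None):
    """Collect every constituent (every sub-tree), largest first."""
    if out is None:
        out = []
    if isinstance(node, str):
        out.append(node)
        return out
    out.append(flatten(node))
    for child in node:
        constituents(child, out)
    return out

print(flatten(sentence))  # The King of England open ed parliament
for c in constituents(sentence):
    print(c)
```

Walking the structure this way recovers the constituents (a)–(l) listed above (plus the whole sentence itself), which is exactly what an IC segmentation is meant to deliver.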
Pike (1943) criticized Bloomfield's IC system for its vagueness (al-
though see Longacre 1960 for a defense of the vaguer notions). Pike
developed a set of discovery procedures (methodologies that a linguist
can use to come up with a grammatical analysis), which are very
similar to the constituency tests listed in Chapter 2. Harris (1946)
(drawing on Aristotelian notions borrowed from logic) refined
these tests somewhat by formalizing the procedure of identification


of immediate constituents by making reference to substitution. That is,
if one can substitute a single morpheme of a given type for a string of
words, then that string functions as a constituent of the same type.

2 The number of pipes should not be taken relativistically. That is, the fact that there are
three pipes between open and ed and four pipes between of and England does not mean
that of and England are more closely related than open and ed. The fact that there are four
pipes in the first half has to do with the fact that there are four major morphemes in the
NP, and only three in the VP. The number of pipes is determined by the number of
ultimate constituents (i.e. morphemes), not by degree of relationship.

70 phrase structure grammars and x-bar
Wells (1947) enriches Harris's system by adding a notion of construc-
tion—an idea we will return to in Chapter 9. Harwood (1955) fore-
shadows Chomsky's work on phrase structure grammars, and suggests
that Harris's substitution procedures can be axiomatized into formation
rules of the kind we will look at in the next section.
Harris's work is the first step away from an analysis based on semantic
relations like "subject", "predicate", and "modifier", and towards an
analysis based purely on the structural equivalence of strings of words.3
Harris was Chomsky's teacher and was undoubtedly a major influence
on Chomsky's (1957) formalization of phrase structure grammars.
5.2 Phrase structure grammars
In his early unpublished work (The Logical Structure of Linguistic Theory
(LSLT), later published in 1975), Chomsky first articulates a family of
formal systems that might be applied to human language. These are
phrase structure grammars (PSGs). The most accessible introduction to
Chomsky's PSGs can be found in Chomsky (1957).4 Chomsky asserts
that PSGs are a formal implementation of the structuralist IC analyses.
Postal (1967) presents a defense of this claim, arguing that IC systems
are all simply poorly formalized phrase structure grammars. Manaster-
Ramer and Kac (1990) and Borsley (1996) claim that this is not quite
accurate, and that there were elements of analysis present in IC that were
explicitly excluded from Chomsky's original definitions of PSGs (e.g.
discontinuous structures). Nevertheless, Chomsky's formalizations re-
main the standard against which all other theories are currently meas-
ured, so we will retain them here for the moment.
A PSG draws on the structuralist notion that large constituents are
replaced by linearly adjacent sequences of smaller constituents. A PSG
thus represents a substitution operation. This grammar consists of
four parts. First we have what is called an initial symbol (usually S
(= sentence)), which will start the series of replacement operations.
Second we have a vocabulary of non-terminal symbols {A, B, ...}. These
symbols may never appear in the final line of the derivation of a
sentence.

3 Harris's motivation was computerized translation, so the goal was to find objectively
detectable characterizations and categorizations instead of pragmatic and semantic no-
tions that required an interaction with the world that only a human could provide.
4 See Lasnik (2000) for a modern recapitulation of this work, and its relevance today.

phrase structure grammars 71

Traditionally these symbols are represented with capital
letters (however, later lexicalist versions of PSGs abandon this conven-
tion). Next we have a vocabulary of terminal symbols {a, b, ...}, or
"words". Traditionally, these are represented by lower-case letters
(again, however, this convention is often abandoned in much recent
linguistic work). Finally, we have a set of replacement or production
rules (called phrase structure rules or PSRs), which take the initial
symbol and through a series of substitutions result in a string of
terminals (and only a string of terminals). More formally, a PSG is
defined as a quadruple (Lewis and Papadimitriou 1981; Rayward-Smith
1995; Hopcroft, Motwani, and Ullman 2001):
(2) PSG = ⟨N, T, P, S⟩
N = set of non-terminals
T = set of terminals
P = set of production rules (PSRs)
S = start symbol
The production rules take the form in (3).
(3) X → WYZ
The element on the left is a higher-level constituent replaced by the
smaller constituents on the right. The arrow, in this conception of
PSG, should be taken to mean ‘‘is replaced by’’ (in other conceptions
of PSG, which we will discuss later, the arrow has subtly different
meanings).
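The quadruple in (2) and rules of the form in (3) can be put together as a small data structure. The sketch below is my own illustration (the class and field names are not from the text); it simply records the four parts of a PSG:

```python
# A minimal sketch of a PSG as the quadruple <N, T, P, S> in (2).
# All names here are illustrative, not from the text.
from dataclasses import dataclass

@dataclass
class PSG:
    nonterminals: set  # N: symbols that never appear in the final line
    terminals: set     # T: the "words"
    productions: dict  # P: maps a non-terminal to its possible right-hand sides
    start: str         # S: the initial symbol

# The toy grammar discussed in the next paragraphs, encoded this way:
grammar = PSG(
    nonterminals={"S", "A", "B"},
    terminals={"a", "b"},
    productions={
        "S": [["A", "B"]],
        "A": [["A", "a"], ["a"]],
        "B": [["b", "B"], ["b"]],
    },
    start="S",
)

# Sanity check: the start symbol must be a non-terminal.
assert grammar.start in grammar.nonterminals
```

Each right-hand side is a list of symbols, matching the form X → WYZ in (3): the key is the element on the left, and each list is one possible replacement.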
Take the toy grammar in (4) as an example:
(4) N = {A, B, S}, S = {S}, T = {a, b},
P = (i) S → AB
(ii) A → A a
(iii) B → b B
(iv) A → a
(v) B → b
This grammar represents a very simple language in which there are only
two words (a and b), and in which sentences consist of one or more
as followed by one or more bs. To see how this works, let us do one
possible derivation (there are many possibilities) of the sentence aaab.
We start with the symbol S, and apply rule (i).
(5) (a) S
(b) A B rule i
Then we can apply the rule in (v) which will replace the B symbol with
the terminal b:
(c) A b rule v
Now we can apply rule (ii) which replaces A with another A and the
terminal a:

(d) A a b rule ii
If we apply it again we get the next line, replacing the A in (d) with A
and another a:
(e) A aa b rule ii
Finally we can apply the rule in (iv) which replaces A with the single
terminal symbol a:
(f) aaa b rule iv
This is our terminal string. The steps in (5a–f) are known as a derivation.
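The derivation in (5) can be replayed mechanically. The sketch below (my own illustration, not from the text) encodes the rules of the toy grammar in (4) and applies them in the order used above, rewriting one symbol per step:

```python
# Toy grammar (4): each rule rewrites one non-terminal as a sequence of symbols.
RULES = {
    "i":   ("S", ["A", "B"]),
    "ii":  ("A", ["A", "a"]),
    "iii": ("B", ["b", "B"]),
    "iv":  ("A", ["a"]),
    "v":   ("B", ["b"]),
}

def apply_rule(symbols, rule_name):
    """Replace the first occurrence of the rule's left-hand symbol."""
    lhs, rhs = RULES[rule_name]
    i = symbols.index(lhs)
    return symbols[:i] + rhs + symbols[i + 1:]

# Replay derivation (5): S => A B => A b => A a b => A a a b => a a a b
symbols = ["S"]
for step in ["i", "v", "ii", "ii", "iv"]:
    symbols = apply_rule(symbols, step)
    print(" ".join(symbols))

print("".join(symbols))  # aaab
```

Any other order of rule applications that ends in all terminals is an equally valid derivation of some sentence of this language; this is just the particular sequence walked through in (5a–f).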
Let’s now bring constituent trees into the equation. It is possible
to represent each step in the derivation as a line in a tree, starting at
the top.
(6) (a)          S
                /  \
    (b)        A    B
               |    |
    (c)        A    b
              / \   |
    (d)      A   a  b
            / \  |  |
    (e)    A   a a  b
           |   | |  |
    (f)    a   a a  b
This tree looks a little familiar, but it is not identical to the trees in Chapters 2
to 4. However, it doesn't take much manipulation to transform it into a
more typical constituent tree. In the derivational tree in (6) the arrows
represent the directional "is a" relation. (That is, S "is a" A B se-
quence; in line (c), A "is a" A, and B "is a" b.) By the conventions we devel-
oped in Chapter 3, things at the top of the tree have a directional
"dominance" relation, which is assumed but not represented by
arrows. If we take the "is a" relation to be identical to domination,
then we can delete the directional arrows. Furthermore, if we conflate
the sequences of non-branching identical nodes then we get the
tree in (7):
(7)          S
            /  \
           A    B
          / \   |
         A   a  b
        / \
       A   a
       |
       a
This is the more familiar constituency tree that we have already discussed.
What is crucial to understanding this particular conception of PSGs is
that the derivational steps of the production correspond roughly to the
constituents of the sentence.
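That correspondence can be checked directly: reading the terminals off the constituent tree recovers exactly the string the derivation produced. The sketch below (my own encoding, not from the text) represents tree (7) as nested lists of the form [label, child, ...] and computes its yield:

```python
# Tree (7) as nested lists (illustrative encoding): [label, child1, child2, ...]
tree = ["S",
        ["A",
         ["A",
          ["A", "a"],
          "a"],
         "a"],
        ["B", "b"]]

def yield_of(node):
    """The terminal string ("yield") of a constituent tree: its leaves, in order."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "".join(yield_of(c) for c in children)

print(yield_of(tree))  # aaab
```

The yield of (7) is the same terminal string aaab that the derivation in (5) arrived at, which is the sense in which the derivational steps correspond to the constituents of the sentence.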
Ambiguity in structure results when you have two derivations that
do not reduce to the same tree, but have the same surface string.
Consider the more complicated toy grammar in (8):
(8) N = {A, B, S}, S = {S}, T = {a, b},
P = (i) S → AB
(ii) S → A
(iii) A → AB
(iv) A → a
(v) B → b
The sentence ab has many possible derivations with this grammar.
However, at least two of them result in quite different PS trees.
Compare the derivations in (9), (10), and (11):
(9) (a) S
(b) A B (i)
(c) a B (iv)
(d) a b (v)
(10) (a) S
(b) A B (i)
(c) A b (v)

(d) a b (iv)
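The structural ambiguity can be made concrete. The sketch below (my own illustration, not from the text) encodes two trees that grammar (8) assigns to the string ab: one built directly via rule (i), and one built via rules (ii) and (iii). Both yield the same surface string, but the trees are distinct:

```python
# Two distinct trees for "ab" under grammar (8) (illustrative encoding):
# Tree 1 uses rule (i):  S -> A B, then A -> a, B -> b.
tree1 = ["S", ["A", "a"], ["B", "b"]]
# Tree 2 uses rules (ii) and (iii):  S -> A, A -> A B, then A -> a, B -> b.
tree2 = ["S", ["A", ["A", "a"], ["B", "b"]]]

def yield_of(node):
    """The terminal string of a tree: its leaves, left to right."""
    if isinstance(node, str):
        return node
    return "".join(yield_of(c) for c in node[1:])

# Same surface string, different constituent structure:
assert yield_of(tree1) == yield_of(tree2) == "ab"
print(tree1 == tree2)  # False
```

This is exactly the situation described above: two derivations that do not reduce to the same tree, but share the same surface string.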