Assignment 2: Syntax Analysis
1. Introduction
In this assignment, you are required to implement a parser manually for MC programs.
The parser performs a syntax analysis process that receives a sequence of tokens
produced by the scanner, which should have been implemented in Assignment 1, and
verifies if the token sequence is grammatically correct or not.
In order to complete the assignment, the following tasks are to be fulfilled:
- Construct a context-free grammar for the MC language.
- Implement a parser according to the constructed grammar.
You can employ either a top-down or a bottom-up parsing technique for your parser.
You must adopt the scanner provided by the course staff to perform lexical analysis
for your parser. Thus, the token set used must be the same as that previously specified in
Assignment 1.
You should refer to lecture notes and textbooks as well as the MC language specification
carefully to find out the grammar rules that precisely reflect the MC program structures.
2. Operational Instructions
The programming language used to implement the parser must be Java. You should
install Java JDK 5.0, which includes a Java compiler and a Java virtual machine, as done
in Assignment 1.
To implement the parser, first download the provided file ass2.zip, which will be
uploaded after the due date of Assignment 1, and decompress it into a directory, referred
to here as $ROOT$, as done in Assignment 1. In your $ROOT$ directory, you will have
an MC directory whose structure is as follows (names without an extension are folders,
the rest are files):
MC
|__ lexicalanalysis
|   |___ Scanner.java, SourcePosition.java, Token.java, ErrorReport.java
|__ syntaxanalysis
|   |___ grammar
|   |    |____ grammar.txt
|   |___ test
|   |    |____ test.txt
|   |    |____ solution.txt
|   |___ Parser.java
|__ MCCompiler.java
The files in directory lexicalanalysis are the scanner provided for your convenience. You
must not modify these files.
File grammar.txt specifies your constructed grammar. You can use either BNF or EBNF
formalism to specify your productions. If you transform the grammar for top-down
parsing, please put both the original and the transformed grammars in the same file as
separate sections.
File MCCompiler.java defines class MCCompiler. You must not modify this file.
File Parser.java defines class Parser. This class will perform all the necessary tasks of a
parser.
- The public method parse will be invoked from the main method of class
MCCompiler and reports whether the input file is grammatically correct or not.
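As an illustration of how a hand-written top-down parser can be organised, here is a minimal recursive-descent sketch for a toy expression grammar. It is not the MC grammar and it does not use the provided Scanner or Token classes: the class ToyParser, its one-character tokens and the "EOF" marker are all assumptions made up for this example. It also shows, in comments, a left-recursive rule next to its transformed form.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy parser; every name here is illustrative only.
class ToyParser {
    // Toy grammar (NOT the MC grammar):
    //   original:     expr ::= expr "+" term | term      (left-recursive)
    //   transformed:  expr ::= term { "+" term }         (EBNF, top-down friendly)
    //                 term ::= ID | "(" expr ")"
    private final List<String> tokens;
    private int pos = 0;

    ToyParser(List<String> tokens) { this.tokens = tokens; }

    // Stand-in scanner: one token per non-blank character, then "EOF".
    static List<String> scan(String input) {
        List<String> out = new ArrayList<String>();
        for (char c : input.toCharArray()) {
            if (!Character.isWhitespace(c)) out.add(String.valueOf(c));
        }
        out.add("EOF");
        return out;
    }

    private String peek() { return tokens.get(pos); }

    private boolean isId(String t) {
        return t.length() == 1 && Character.isLetter(t.charAt(0));
    }

    // Stops at the first unexpected token; no error recovery is attempted.
    private void fail() { throw new IllegalStateException(peek()); }

    private void accept(String expected) {
        if (peek().equals(expected)) pos++; else fail();
    }

    // expr ::= term { "+" term }
    private void parseExpr() {
        parseTerm();
        while (peek().equals("+")) { pos++; parseTerm(); }
    }

    // term ::= ID | "(" expr ")"
    private void parseTerm() {
        if (isId(peek())) pos++;
        else if (peek().equals("(")) { pos++; parseExpr(); accept(")"); }
        else fail();
    }

    // Returns null if the input is grammatically correct,
    // otherwise the first unexpected token.
    static String check(String input) {
        ToyParser p = new ToyParser(scan(input));
        try {
            p.parseExpr();
            p.accept("EOF");
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(check("a+(b+c)")); // correct input
        System.out.println(check("a+;"));     // ";" is the unexpected token
    }
}
```

Each nonterminal of the transformed grammar becomes one method, and the next token alone decides which alternative to follow. A real Parser.java would read tokens from the provided scanner instead of the stand-in scan method.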
Files test.txt and solution.txt are supplied for your convenience. You can try the provided
files by typing the following commands and comparing the output with the content of solution.txt.
$ROOT$> javac MC\*.java
$ROOT$> java MC.MCCompiler MC\syntaxanalysis\test.txt
If necessary, you can create your own new Java source files, but they must be in the
same package, MC.syntaxanalysis. Only Parser.java, grammar.txt and your own
new files need to be submitted.
3. Output Format
If the input file is grammatically correct, the parser outputs nothing; otherwise an error
message is reported (please refer to Section 5 for details of the error
message).
4. Parser Testing
Although some test files are provided for this assignment, you are encouraged to
design additional test cases to make sure your parser works as intended. The mechanism for
testing your parser is similar to that in Assignment 1.
5. Syntax Error
Your parser must be capable of detecting syntax errors in the input file as soon as they occur.
When a syntax error is detected, an error message displays the position
of the error and the token whose occurrence causes it. The error
message has the following format:
[Syntax error: Unexpected token:][“ “]<token-kind>[tab][Lexeme:]<token-lexeme>[tab][charStart=]<token-charStart>[“ “][charFinish=]<token-charFinish>[“ ”][line=]<token-line>
For example, with the input “x = a+ ;” at line 3, the following message should be
displayed:
“Syntax error: Unexpected token: Token.SEMICOLON	Lexeme:;	charStart=8 charFinish=8 line=3”
Note that there is no new line at the end of the error message. The parser will terminate
immediately when a syntax error is found. No further error recovery action is taken.
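The message format above can be sketched as a small Java helper. This is only an illustration under stated assumptions: the class SyntaxErrorMessage and its parameters are hypothetical, and the token kind is passed in as ready-made text because the way the provided Token class exposes its kind, lexeme and position is not described here.

```java
// Hypothetical helper; not part of the provided MC sources.
class SyntaxErrorMessage {
    // Builds the message field by field, following the format in Section 5:
    // a space after the fixed prefix, tabs before "Lexeme:" and "charStart=",
    // and single spaces before "charFinish=" and "line=".
    static String format(String kind, String lexeme,
                         int charStart, int charFinish, int line) {
        return "Syntax error: Unexpected token: " + kind
                + "\tLexeme:" + lexeme
                + "\tcharStart=" + charStart
                + " charFinish=" + charFinish
                + " line=" + line;
    }

    public static void main(String[] args) {
        // The example from the text: ";" at characters 8..8 of line 3.
        // print, not println, so that no newline follows the message.
        System.out.print(format("Token.SEMICOLON", ";", 8, 8, 3));
    }
}
```

Keeping the formatting in one place makes it easier to compare your output character by character with solution.txt.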
6. Submission and Late Penalties
Instructions for submitting your assignment will be available on the course site around
week 6. The submission mechanism is basically the same as that of Assignment 1.
The deadline for Assignment 2 is 12:00 noon on Wed, Oct 25th, 2006.
This assignment is worth 25% of the assignment mark.
You are strongly advised to start as soon as possible and should not wait until the last
minute.
If you are 1 day late (12:00 noon Oct 26th, 2006), your maximum mark is 7.
If you are 2 days late (12:00 noon Oct 27th, 2006), your maximum mark is 4.
If you are 3 days late (12:00 noon Oct 28th, 2006), your maximum mark is 1.
After Oct 28th, 2006, submissions are no longer accepted, and no
excuse will be accepted after this point.
7. Plagiarism
You must do the assignment by yourself. If it is discovered that your assignment is a
copy of a friend’s work, both of you will receive a mark of zero for this subject (not
only for the assignment). NO EXCUSE AND NO EXCEPTION!
TUTORIAL 2
1. Consider the following grammar:
S → (L) | a
L → L,S | S
a) What are the terminals, nonterminals and start symbol?
b) Find the parse tree for each of the following sentences:
(a,a)
(a,(a,a))
(a,((a,a),(a,a)))
c) Construct a leftmost derivation for each sentence given in (b)
d) Construct a rightmost derivation for each sentence given in (b)
e) What is the language generated by this grammar?
2. Consider the following grammar:
S → aSbS | bSaS | ε
a) Find a rightmost derivation for abab
b) Construct all possible parse trees for abab
c) Is this grammar ambiguous? Why?
d) What is the language generated by this grammar?
3. Write a grammar that generates all boolean expressions. Construct the
corresponding parse tree for not (true or false).
4.
a) Eliminate the left-recursion from the grammar in Exercise 1
b) Compute the First, Follow and Select sets for the transformed grammar
c) Construct a recursive predictive parser for the transformed grammar
d) Show the behavior of the parser for the sentences given in Exercise 1b
5. Eliminate left recursion from, and apply left factoring to, the grammar constructed in
Exercise 3, where applicable.