Lexical Structure


Chapter 2. Lexical Structure
The lexical structure of a programming language is the set of elementary rules that
specifies how you write programs in that language. It is the lowest-level syntax of a
language; it specifies such things as what variable names look like, what characters are
used for comments, and how one program statement is separated from the next. This
short chapter documents the lexical structure of JavaScript.
2.1 Character Set
JavaScript programs are written using the Unicode character set. Unlike the 7-bit ASCII
encoding, which is useful only for English, and the 8-bit ISO Latin-1 encoding, which is
useful only for English and major Western European languages, the 16-bit Unicode
encoding can represent virtually every written language in common use on the planet.
This is an important feature for internationalization and is particularly important for
programmers who do not speak English.
American and other English-speaking programmers typically write programs using a text
editor that supports only the ASCII or Latin-1 character encodings, and thus they don't
have easy access to the full Unicode character set. This is not a problem, however,
because both the ASCII and Latin-1 encodings are subsets of Unicode, so any JavaScript
program written using those character sets is perfectly valid. Programmers who are used
to thinking of characters as 8-bit quantities may be disconcerted to know that JavaScript
represents each character using 2 bytes, but this fact is actually transparent to the
programmer and can simply be ignored.
Although the ECMAScript v3 standard allows Unicode characters anywhere in a
JavaScript program, Versions 1 and 2 of the standard allow Unicode characters only in
comments and quoted string literals; all other parts of an ECMAScript v1 program are
restricted to the ASCII character set. Versions of JavaScript that predate ECMAScript
standardization typically do not support Unicode at all.
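As a small illustration of the point that 2-byte characters are transparent to the
programmer, the following sketch inspects two one-character strings. It assumes an
interpreter with Unicode support, and the variable names are illustrative only.

```javascript
// A minimal sketch: each character in these strings is a single 16-bit unit,
// whether or not it falls in the ASCII range.
var ascii = "A";        // an ASCII character
var greek = "\u03c0";   // Greek small letter pi, written as a Unicode escape

var asciiLength = ascii.length;        // 1
var greekLength = greek.length;        // 1: still a single character
var asciiCode = ascii.charCodeAt(0);   // 65, the same value the character has in ASCII
var greekCode = greek.charCodeAt(0);   // 960, which is hexadecimal 03c0
```

The ASCII character keeps its ASCII code, which is what makes programs written in ASCII
valid without any conversion.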
2.2 Case Sensitivity
JavaScript is a case-sensitive language. This means that language keywords, variables,
function names, and any other identifiers must always be typed with a consistent
capitalization of letters. The while keyword, for example, must be typed "while", not
"While" or "WHILE". Similarly, online, Online, OnLine, and ONLINE are four distinct
variable names.
Note, however, that HTML is not case-sensitive. Because of its close association with
client-side JavaScript, this difference can be confusing. Many JavaScript objects and
properties have the same names as the HTML tags and attributes they represent. While
these tags and attribute names can be typed in any case in HTML, in JavaScript they
typically must be all lowercase. For example, the HTML onclick event handler attribute
is commonly specified as onClick in HTML, but it must be referred to as onclick in
JavaScript code.
While core JavaScript is entirely and exclusively case-sensitive, exceptions to this rule
are allowed in client-side JavaScript. In Internet Explorer 3, for example, all client-side
objects and properties were case-insensitive. This caused problematic incompatibilities
with Netscape, however, so in Internet Explorer 4 and later, client-side objects and
properties are case-sensitive.
2.3 Whitespace and Line Breaks
JavaScript ignores spaces, tabs, and newlines that appear between tokens in programs,
except those that are part of string or regular expression literals. A token is a keyword,
variable name, number, function name, or some other entity in which you would
obviously not want to insert a space or a line break. If you place a space, tab, or newline
within a token, you break it up into two tokens. Thus, 123 is a single numeric token, but
12 3 is two separate tokens (and constitutes a syntax error, incidentally).
Because you can use spaces, tabs, and newlines freely in your programs (except in
strings, regular expressions, and tokens), you are free to format and indent your programs
in a neat and consistent way that makes the code easy to read and understand. Note,
however, that there is one minor restriction on the placement of line breaks; it is
described in the following section.
2.4 Optional Semicolons
Simple statements in JavaScript are generally followed by semicolons (;), just as they are
in C, C++, and Java. The semicolon serves to separate statements from each other. In
JavaScript, however, you may omit the semicolon if each of your statements is placed on
a separate line. For example, the following code could be written without semicolons:
a = 3;
b = 4;
But when formatted as follows, the first semicolon is required:
a = 3; b = 4;
Omitting semicolons is not a good programming practice; you should get in the habit of
using them.
Although JavaScript theoretically allows line breaks between any two tokens, the fact
that JavaScript automatically inserts semicolons for you causes some exceptions to this
rule. Loosely, if you break a line of code in such a way that the line before the break
appears to be a complete statement, JavaScript may think you omitted the semicolon and
insert one for you, altering your meaning. Some places you should look out for this are
with the return, break, and continue statements (which are described in Chapter 6).
For example, consider the following:
return
true;
JavaScript assumes you meant:
return;
true;
However, you probably meant:
return true;
This is something to watch out for: this code does not cause a syntax error and will fail
in a nonobvious way. A similar problem occurs if you write:
break
outerloop;
JavaScript inserts a semicolon after the break keyword, so the label on the next line is
no longer part of the break statement, and the break exits only the innermost enclosing
loop. For similar reasons, the ++ and -- postfix operators (see Chapter 5) must always
appear on the same line as the expressions to which they are applied.
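The effect of automatic semicolon insertion can be observed directly. The following is a
minimal sketch, assuming an interpreter that follows the ECMAScript insertion rules; the
function and variable names (brokenReturn, intendedReturn, visits, and so on) are
illustrative and not part of any API.

```javascript
// A line break after "return" ends the statement, so this returns undefined.
function brokenReturn() {
    return
    true;
}

// The value stays on the same line as "return", so this returns true.
function intendedReturn() {
    return true;
}

var brokenResult = brokenReturn();      // undefined
var intendedResult = intendedReturn();  // true

// A line break before "++" keeps it from acting as a postfix operator on x.
// The three lines below are parsed as "x; ++y;".
var x = 1, y = 2;
x
++
y;
// x is still 1; y is now 3

// A line break after "break" detaches the label, so only the inner loop is exited.
var visits = 0;
outerloop:
for (var i = 0; i < 3; i++) {
    for (var j = 0; j < 3; j++) {
        visits++;
        break
        outerloop;   // a separate statement that is never reached
    }
}
// visits is 3; with "break outerloop;" on one line it would be 1
```

None of these mistakes is reported as an error, which is why they are hard to spot.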
2.5 Comments
JavaScript, like Java, supports both C++ and C-style comments. Any text between a //
and the end of a line is treated as a comment and is ignored by JavaScript. Any text
between the characters /* and */ is also treated as a comment. These C-style comments
may span multiple lines but may not be nested. The following lines of code are all legal
JavaScript comments:
// This is a single-line comment.
/* This is also a comment */ // and here is another comment.
/*
 * This is yet another comment.
 * It has multiple lines.
 */
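The rule that C-style comments may not be nested can be checked with a short
experiment. This is only a sketch: it uses the Function constructor to compile a string
at run time so that the resulting error can be caught, and the variable names are
illustrative.

```javascript
// Comment markers inside a string literal are part of the string, not a comment.
var text = "// not a comment";

// A C-style comment can sit in the middle of an expression.
var sum = 1 /* ignored */ + 2;   // 3

// Nesting is not supported: the first */ ends the comment, and the text
// left over after it fails to parse.
var nestedRejected = false;
try {
    new Function("/* outer /* inner */ tail */");
} catch (e) {
    nestedRejected = e instanceof SyntaxError;
}
```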
2.6 Literals
A literal is a data value that appears directly in a program. The following are all literals:
12 // The number twelve
1.2 // The number one point two
"hello world" // A string of text
'Hi' // Another string
true // A boolean value
false // The other boolean value
/javascript/gi // A "regular expression" literal (for pattern matching)
null // Absence of an object
In ECMAScript v3, expressions that serve as array and object literals are also supported.
For example:
{ x:1, y:2 } // An object initializer
[1,2,3,4,5] // An array initializer
Note that these array and object literals have been supported since JavaScript 1.2 but
were not standardized until ECMAScript v3.
Literals are an important part of any programming language, as it is impossible to write a
program without them. The various JavaScript literals are described in detail in Chapter
3.
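To show literals in use rather than in isolation, the following sketch assigns several of
the literals listed above to variables and reads values back from them. The variable
names are illustrative, and the sketch assumes an interpreter that supports array and
object initializers.

```javascript
var count = 12;                 // numeric literal
var greeting = "hello world";   // string literal
var pattern = /javascript/gi;   // regular expression literal
var point = { x:1, y:2 };       // object initializer
var list = [1,2,3,4,5];         // array initializer
var nothing = null;             // absence of an object

var matched = pattern.test("Core JavaScript");  // true: the i flag ignores case
var total = point.x + point.y;                  // 3
var last = list[list.length - 1];               // 5
```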
2.7 Identifiers
An identifier is simply a name. In JavaScript, identifiers are used to name variables and
functions and to provide labels for certain loops in JavaScript code. The rules for legal
identifier names are the same in JavaScript as they are in Java and many other languages.
The first character must be a letter, an underscore (_), or a dollar sign ($).[1] Subsequent
characters may be any letter or digit or an underscore or dollar sign. (Numbers are not
allowed as the first character so that JavaScript can easily distinguish identifiers from
numbers.) These are all legal identifiers:
i
my_variable_name
v13
_dummy
$str
In ECMAScript v3, identifiers can contain letters and digits from the complete Unicode
character set. Prior to this version of the standard, JavaScript identifiers are restricted to
the ASCII character set. ECMAScript v3 also allows Unicode escape sequences to appear
in identifiers. A Unicode escape is the characters \u followed by 4 hexadecimal digits
that specify a 16-bit character encoding. For example, the identifier π could also be
written as \u03c0. Although this is an awkward syntax, it makes it possible to translate
JavaScript programs that contain Unicode characters into a form that allows them to be
manipulated with text editors and other tools that do not support the full Unicode
character set.
Finally, identifiers cannot be the same as any of the keywords used for other purposes in
JavaScript. The next section lists the special names that are reserved in JavaScript.
[1] Note that dollar signs are not legal in identifiers prior to JavaScript 1.1. They are
intended for use only by code-generation tools, so you should avoid using dollar signs in
identifiers in the code you write yourself.
2.8 Reserved Words
There are a number of reserved words in JavaScript. These are words that you cannot use
as identifiers (variable names, function names, and loop labels) in your JavaScript
programs. Table 2-1 lists the keywords standardized by ECMAScript v3. These words
have special meaning to JavaScript: they are part of the language syntax itself.
Table 2-1. Reserved JavaScript keywords
break do if switch typeof
case else in this var
catch false instanceof throw void
continue finally new true while
default for null try with
delete function return
Table 2-2 lists other reserved keywords. These words are not currently used in
JavaScript, but they are reserved by ECMAScript v3 as possible future extensions to the
language.
Table 2-2. Words reserved for ECMA extensions
abstract double goto native static
boolean enum implements package super
byte export import private synchronized
char extends int protected throws
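The identifier rules and the reserved-word lists can be combined into a simple check. The
following is a minimal sketch under stated assumptions: isLegalIdentifier is a
hypothetical helper, it applies only the ASCII form of the identifier rule (no Unicode
letters or escapes), and its RESERVED list contains only the words shown in Tables 2-1
and 2-2 as they appear in this excerpt.

```javascript
// Words copied from Tables 2-1 and 2-2 as they appear in this excerpt.
var RESERVED = [
    "break", "do", "if", "switch", "typeof", "case", "else", "in", "this", "var",
    "catch", "false", "instanceof", "throw", "void", "continue", "finally", "new",
    "true", "while", "default", "for", "null", "try", "with", "delete", "function",
    "return",
    "abstract", "double", "goto", "native", "static", "boolean", "enum",
    "implements", "package", "super", "byte", "export", "import", "private",
    "synchronized", "char", "extends", "int", "protected", "throws"
];

// Hypothetical helper: returns true if name is a legal ASCII identifier
// and is not one of the reserved words listed above.
function isLegalIdentifier(name) {
    // First character: letter, underscore, or dollar sign.
    // Later characters: letters, digits, underscores, or dollar signs.
    if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(name)) {
        return false;
    }
    for (var i = 0; i < RESERVED.length; i++) {
        if (RESERVED[i] === name) {
            return false;
        }
    }
    return true;
}
```

Because the comparison is case-sensitive, isLegalIdentifier("While") returns true even
though isLegalIdentifier("while") returns false, and isLegalIdentifier("13v") returns
false because the name begins with a digit.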
