Class Notes in Statistics and Econometrics, Part 11

CHAPTER 21
About Computers
21.1. General Strategy
With the fast-paced development of computer hardware and software, anyone
who uses computers professionally needs a strategy for how to allocate time
and money to hardware and software.
21.1.1. Operating System. In my view, there are two alternatives today:
either do everything in Microsoft Windows and other commercial software, or use
GNU/Linux, the free unix operating system, together with the free software built on
top of it (see www.linux.org), in addition to Microsoft Windows. I will argue here for
the second route. It is true, GNU/Linux has a steeper learning curve than Windows,
but this also means that you have a more powerful tool, and serious efforts are under
way to make GNU/Linux more and more user friendly. Windows, on the other hand,
has the following disadvantages:
• Microsoft Windows and the other commercial software are expensive.
• The philosophy of Microsoft Windows is to keep the user in the dark about
how the computer is working, i.e., turn the computer user into a passive
consumer. This severely limits the range of things you can do with your
computer. The source code of the programs you are using is usually unavail-
able, therefore you never know exactly what you are doing and you cannot
modify the program for your own uses. The unavailability of source code
also makes the programs more vulnerable to virus attacks and break-ins.
In Linux, the user is the master of the computer and can exploit its full
potential.
• You spend too much time pointing and clicking. In GNU/Linux and other
unix systems, it is possible to set up menus too, but everything that can
be done through a menu can also be done on the command line or through
a script.
• Windows and the commercial software based on it are very resource-hungry;
they require powerful computers. Computers which are no longer fast and
big enough to run the latest version of Windows are still quite capable of
running Linux.
• It is becoming more and more apparent that free software is more stable
and of higher quality than commercial software. Free software is developed
by programmers throughout the world who want good tools for themselves.
• Most Linux distributions have excellent systems which allow the user to
automatically download the latest versions of the software; this automates
the tedious task of software maintenance, i.e., updating and fitting
together the updates.
Some important software is not available on Linux or is much better on Windows.
Certain tasks, like scanning, voice recognition, and www access, which have mass
markets, are better done on Microsoft Windows than on Linux. Therefore you will
probably not be able to eliminate Microsoft Windows completely; however it is pos-
sible to configure your PC so that you can run MS-Windows and Linux on it, or
to have a Linux machine be the network server for a network which has Windows
machines on it (this is more stable, faster, and cheaper than Windows NT).
There are several versions of Linux available, and the one which is most
independent of commercial interests, and which is one of the most quality-conscious
distributions, in my view, is Debian GNU/Linux. The Linux
route is more difficult at the beginning but will pay off in the long run, and I recommend
it especially if you are going to work outside the USA. The Salt Lake Linux
Users Group meets on the third Wednesday of
every month, usually on the University of Utah campus.
In order to demonstrate the usefulness of Linux I loaded Debian GNU/Linux on
an old computer with one of the early Pentium processors, which became available at
the Econ Department because it was too slow for Windows 98. It is by the window
in the Econ Computer Lab. When you log onto this computer you are in the
X-windows system. In Linux and other unix systems, the mouse usually has 3 buttons:
left, right, and middle. The mouse which comes with the computer in the computer
lab has 2 buttons: left and right, but if you press both buttons simultaneously you
get the same effect as pressing the middle button on a unix mouse.
If the cursor is in front of the background, then you will get 3 different menus by
pressing the different mouse buttons. The left mouse button gives you the different
programs, if you press both buttons at the same time you can perform operations on
the windows, and the right button gives you a list of all open windows.
Another little tidbit you need to know about unix systems is this: There are no
drives as in Microsoft Dos or Windows, but all files are in one hierarchical directory
tree. Instead of a backslash \ you have a forward slash /. In order to use the floppy
disk, you have to insert the disk in the disk drive and then give the command mount
/floppy. Then the disk is accessible to you as the contents of the directory /floppy.
Before taking the disk out you should give the command umount /floppy. You can
do this only if /floppy is not the current directory.
In order to remotely access X-windows from Microsoft Windows, you have to go
through the following steps.
• click on the exceed icon which is in the network-neighborhood folder.
• then open a telnet session to the unix station you want to access.
• at the unix station give the who -l command so that you know the id of the
machine from which you are telnetting; assume it is econlab9.econ.utah.edu.
• then give the command (if you are in a bash shell, as you probably will be
if it is linux)
DISPLAY=econlab9.econ.utah.edu:0; export DISPLAY
or, if it is the C-shell:
setenv DISPLAY econlab9.econ.utah.edu:0
DISPLAY=buc-17.econ.utah.edu:0; export DISPLAY
Something else: if I use the usual telnet program which comes with Windows in
order to telnet into a unix machine, and then try to edit a file using emacs, it does
not work; it seems that some of the key sequences used by emacs make telnet hang.
Therefore I use a different telnet program, Teraterm Pro, with downloading
instructions at answers/remote access/install ttssh.html.
21.1.2. Application Software. I prefer learning a few pieces of software well
instead of learning lots of software superficially. Therefore the choice of software is
an especially important question.
I am using the editor emacs for reading mail, for writing papers which are then
printed in TeX, for many office tasks, such as appointment calendar, address book,
etc., for browsing the www, and as a frontend for running SAS or R/Splus and also
the shell and C. Emacs shows that free software can have unsurpassed quality. The
webpage for GNU is www.gnu.org.
With personal computers becoming more and more powerful, emacs and much
of the GNU software is available not only on unix systems but also on Windows. As
a preparation to a migration to Linux, you may want to install these programs on
Microsoft Windows first. On the other hand, netscape and wordperfect are now both
available for free on Linux.
Besides emacs I am using the typesetting system TeX, or, to be precise, the TeX
macro package AMS-LaTeX. This is the tool which mathematicians use to write their
articles and books, and many econometrics and statistics textbooks were written using
TeX. Besides its math capabilities, another advantage of TeX is that it supports
many different alphabets and languages.
For statistical software I recommend the combination of SAS and Splus, and it
is easy to have a copy of the GNU-version of Splus, called R, on your computer. R is
not as powerful as Splus, but it is very similar, in the simple tasks almost identical.
There is also a GNU version of SPSS in preparation.
21.1.3. Other points. With modern technology it is easy to keep everything
you ever write, all your class notes, papers, book excerpts, etc. It will just take
one or perhaps a handful of CD-ROMs to have it available, and it allows you greater
continuity in your work.
In my view, windowing systems are overrated: they are necessary for web brows-
ing or graphics applications, but I am still using character-based terminals most of
the time. I consider them less straining on the eye, and in this way I also have world-
wide access to my unix account through telnet. Instead of having several windows
next to each other I do my work in several emacs buffers which I can display at will
(i.e., the windows are on top of each other, but if necessary I can also display them
side by side on the screen).
In an earlier version of these notes, in 1995, I had written the following:
I do not consider it desirable to have a computer at home in which
I buy and install all the software for myself. The installation of the
regular updates, and then all the adjustments that are necessary so
that the new software works together again like the old software
did, is a lot of work, which should be centralized. I keep all

my work on a unix account at the university. In this way it is
accessible to me wherever I go, and it is backed up regularly.
In the meantime, I changed my mind about that. After switching to Debian
GNU/Linux, with its excellent automatic updating of the software, I realized how
outdated the unix workstations at the Econ Department have become. My Linux
workstations have more modern software than the Sun stations. In my own situa-
tion as a University Professor, there is an additional benefit if I do my work on my
own Linux workstation at home: as long as I am using University computers, the
University will claim copyright for the software which I develop, even if I do it on
my own time. If I have my own Linux workstation at home, it is more difficult for
the University to appropriate work which they do not pay for.
21.2. The Emacs Editor
You can use emacs either on a character-based terminal or in X-windows. On a
character-based terminal you simply type emacs. In a windows setting, it is probably
available in one of the menus, but you can also get into it by just typing emacs & in
one of the x-terminal windows. The ampersand means that you are running emacs
in the “background.” This is sufficient since emacs opens its own window. If you
issue the command without the ampersand, then the X-terminal window from which
you invoked emacs will not accept any other commands, i.e., will be useless, until
you leave emacs again.
The emacs commands which you have to learn first are the help commands.
They all start with a C-h, i.e., control-h: type h while holding the control button
down. The first thing you may want to do at a quiet moment is go through the emacs
tutorial: get into emacs and then type C-h t and then follow instructions. Another
very powerful resource at your fingertips is emacs-info. To get into it type C-h i. It
has information pages for you to browse through, not only about emacs itself, but
also about a variety of other subjects. The parts most important for you are the Emacs menu
item, which gives the whole Emacs manual, and the ESS menu item, which explains
how to run Splus and SAS from inside emacs.
Another important emacs key is the “quit” command C-g. If you want to abort a
command, this will usually get you out. Also important is the command for changing
the buffer, C-x b. Usually you will have many buffers in emacs, and switch between
them as needed. The command C-x C-c terminates emacs.
Another thing I recommend learning is how to send and receive electronic
mail from inside emacs. To send mail, give the command C-x m. Then fill out address
and message field, and send it by typing C-c C-c. In order to receive mail, type M-x
rmail. There are a few one-letter commands which allow you to move around in
your messages: n is next message, p is previous message, d is delete the message, r
means: reply to this message.
21.3. How to Enter and Exit SAS
From one of the computers on the Econ network, go into the Windows menu and
double-click on the SAS icon. It will give you two windows, the command window
on the bottom and a window for output on the top. Type your commands into the
command window, and click on the button with the runner on it in order to submit
the commands.
If you log on to the workstation marx or keynes, the first command you have
to give is openwin in order to start the X-window-system. Then go to the local
window and give the command sas &. The ampersand means that sas is run in the
background; if you forget it you won’t be able to use the local window until you
exit sas again. As SAS starts up, it creates 3 windows, and you have to move those
windows where you want them and then click the left mouse button.
From any computer with telnet access, get into the DOS prompt and then type
telnet marx.econ.utah.edu. Then sign on with your user-id and your password,
and then issue the command sas. Over telnet, those SAS commands which use
function keys etc. will probably not work, and you have to do more typing. SAS over
telnet is more feasible if you use SAS from inside emacs for instance.
The book [Ell95] is a simple introduction to SAS written by an instructor at
the University of Utah and used in Math 317/318.
21.4. How to Transfer SAS Data Sets Between Computers
The following instructions work even if the computers have different operating
systems. In order to transfer all SAS data files in the /home/econ/ehrbar/sas
directory on smith to your own computer, you have to first enter SAS on smith and
give the following commands:
libname ec7800 '/home/econ/ehrbar/ec7800/sasdata';
proc cport L=ec7800;
run;
This creates a file in the directory you were in when you started SAS (usually
your home directory) by the name sascat.dat. Then you must transport the file
sascat.dat to your own computer. If you want to put it onto your account on the
novell network, you must log in to your novell account and ftp from there to smith
and get the file this way. For this you have to log in to your account and then
cd ehrbar/ec7800/sasdata, then first give the command binary because it
is a binary file, and then get sascat.dat. Or you can download it from the www
by ehrbar/sascat.dat, but depending on your web
browser it may not arrive in the right format. And the following SAS commands
deposit the data sets into your directory sasdata on your machine:
libname myec7800 'mysasdata';
proc cimport L=myec7800;
run;
21.5. Instructions for Statistics 5969, Hans Ehrbar’s Section
21.5.1. How to Download and Install the free Statistical Package R.
The main archive for R is at , and the mirror for the
USA is at . Here are instructions, current as of May
30, 2001, for installing R on a Microsoft Windows machine: click on “Download R
for Windows”; this leads you into a directory; go to the subdirectory “base” and
from there download the file SetupR.exe. I.e., from Microsoft Internet Explorer
right-click on the above link and choose the menu option “save target as.” It will
ask you where to save it; the default will probably be a file of the same name in the
“My Documents” folder, which is quite alright.
The next step is to run SetupR.exe. For this, close Internet Explorer and any
other applications that may be running on your computer. Then go into the Start
Menu, click on “Run”, and then click on “Browse” and find the file SetupR.exe in
the “My Documents” folder, and press OK to run it.
It may be interesting for you to read the license, which is the famous and
influential GNU General Public License.
Then you get to a screen “Select Destination Directory”. It is ok to choose the
default C:\Program Files\R\rw1023, click on Next.
Then it asks you to select the components to install, again the default is fine,
but you may choose more or fewer components.
Under “Select Start Menu Folder” again select the default.
You may also want to install wget for windows from .
uk/pub/Rtools/wget.zip. Interesting is also the FAQ at .
ac.uk/pub/R/rw-FAQ.html.
21.5.2. The text used in Stat 5969. This text is the R-manual called “An
Introduction to R” version 1.2.3 which you will have on your computer as a pdf
file after installing R. If you used all the defaults above, the path is C:\Program
Files\R\rw1023\doc\manual\R-intro.pdf. This manual is also available on the www.
21.5.3. Syllabus for Stat 5969. Wednesday June 13: Your reading assignment
for June 13 is some background reading about the GNU-Project and the concept
of Free Software. There will be a mini quiz on Wednesday testing whether you have read it. In class
we will go through the Sample Session pp. 80–84 in the Manual, and then discuss
the basics of the R language, chapters 1–6 of the Manual. The following homework
problems apply these basic language features:

Problem 264. 3 points In the dataset LifeCycleSavings, which R-command
returns a vector with the names of all countries for which the savings rate is smaller
than 10 percent?
Answer. row.names(LifeCycleSavings)[LifeCycleSavings$sr < 10]
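The answer combines two indexing steps, which the following minimal sketch spells out. It assumes the built-in data frame LifeCycleSavings from R's datasets package; the names low and countries are introduced here only for illustration.

```r
# Logical vector: TRUE for every country whose savings rate sr is below 10
low <- LifeCycleSavings$sr < 10

# A logical index picks the matching row names (the country names)
countries <- row.names(LifeCycleSavings)[low]

# Sanity check: every selected country really has sr < 10
all(LifeCycleSavings[countries, "sr"] < 10)
```

Splitting the expression this way makes it easy to inspect the intermediate logical vector before using it as an index.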
Problem 265. 6 points x <- 1:26; names(x) <- letters; vowels <- c("a",
"e", "i", "o", "u") Which R-expression returns the subvector of x corresponding
to all consonants?
Answer. x[-x[vowels]]
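This answer works because the values of x happen to equal their positions, so the values selected by name can be reused as negative position indices. The sketch below compares it with a variant that uses only the names; the variable names consonants_by_position and consonants_by_name are mine, not part of the exercise.

```r
x <- 1:26
names(x) <- letters
vowels <- c("a", "e", "i", "o", "u")

# x[vowels] holds the values 1, 5, 9, 15, 21; since each value equals its
# position in x, negating them drops exactly the vowel positions.
consonants_by_position <- x[-x[vowels]]

# Variant that does not depend on the values of x at all, only on its names
consonants_by_name <- x[!(names(x) %in% vowels)]

identical(consonants_by_position, consonants_by_name)
```

If x held arbitrary numbers instead of 1:26, only the name-based variant would still select the consonants.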
Problem 266. 4 points x is a numerical vector. Construct the vector of first
differences of x, whose ith element is $x_i - x_{i-1}$ ($i \ge 2$), and whose first element is NA.
Do not use the function diff(x) but the tools described in Section 2.7 of R-intro.
Answer. x-c(NA, x[-length(x)]) or c(NA, x[-1]-x[-length(x)])
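Both forms shift the vector by one position: x[-length(x)] drops the last element and x[-1] drops the first. As a quick check, here is a small sketch that compares them with diff(x) on an arbitrary example vector chosen for illustration.

```r
x <- c(3, 7, 4, 10, 12)   # arbitrary example data

# Subtract the lagged vector (NA in front, last element dropped)
d1 <- x - c(NA, x[-length(x)])

# Difference the two shifted copies, then put NA in front
d2 <- c(NA, x[-1] - x[-length(x)])

# Reference result using the function the exercise asks you to avoid
reference <- c(NA, diff(x))

all.equal(d1, reference)
all.equal(d2, reference)
```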
Problem 267. 2 points x is a vector with missing values. Which R-expression
replaces all missing values by 0?
Answer. x[is.na(x)] <- 0 or ifelse(is.na(x), 0, x)
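The two forms differ in one respect: the indexed assignment changes x itself, while ifelse returns a new vector and leaves x alone until the result is assigned. A minimal sketch with made-up data:

```r
x <- c(1, NA, 3, NA, 5)   # made-up example data

# ifelse() builds a new vector; x is unchanged at this point
y <- ifelse(is.na(x), 0, x)
anyNA(x)                  # x still contains missing values here

# Indexed assignment modifies x in place
x[is.na(x)] <- 0

all.equal(x, y)
```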
Problem 268. 2 points Use paste to get the character vector "1999:1" "1999:2"
"1999:3" "1999:4"
Answer. paste(1999, 1:4, sep=":")
Problem 269. 5 points Do the exercise described on the middle of p. 17, i.e.,
compute the 95 percent confidence limits for the state mean incomes. You should be
getting the following intervals:
  act   nsw     nt   qld    sa   tas   vic    wa
63.56 68.41 112.68 65.00 63.72 66.85 70.56 60.71
25.44 46.25  -1.68 42.20 46.28 54.15 41.44 43.79

Answer.
state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa", "qld", "vic",
  "nsw", "vic", "qld", "qld", "sa", "tas", "sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa",
  "sa", "act", "nsw", "vic", "vic", "act")
statef <- factor(state)
incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56, 61, 61, 61, 58, 51, 48, 65,
  49, 49, 41, 48, 52, 46, 59, 46, 58, 43)
incmeans <- tapply(incomes, statef, mean)
stderr <- function(x) sqrt(var(x)/length(x))
incster <- tapply(incomes, statef, stderr)
sampsize <- tapply(incomes, statef, length)
Use 2-tail 5 percent, each tail has 2.5 percent:
critval <- qt(0.975, sampsize-1)
conflow <- incmeans - critval * incster
confhigh <- incmeans + critval * incster
To print the confidence intervals use rbind(confhigh, conflow), which gives the following output:
              act      nsw         nt      qld      sa     tas     vic       wa
confhigh 63.55931 68.41304 112.677921 65.00034 63.7155 66.8531 70.5598 60.70747
conflow  25.44069 46.25363  -1.677921 42.19966 46.2845 54.1469 41.4402 43.79253
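To see where one column of this table comes from, the interval for act can be recomputed by hand. The data above contain exactly two act observations, with incomes 46 and 43; the sketch below repeats the calculation for them alone. The variable names are chosen here for illustration.

```r
act <- c(46, 43)                  # the two "act" incomes from the data above
n <- length(act)
se <- sqrt(var(act) / n)          # standard error of the mean
crit <- qt(0.975, df = n - 1)     # two-sided 5 percent critical value, 1 degree of freedom
ci <- c(low = mean(act) - crit * se, high = mean(act) + crit * se)
round(ci, 5)
```

With only one degree of freedom the t critical value is large, which is why the intervals for the states with two observations (act and nt) are so much wider than the others.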

Problem 270. 4 points Use the cut function to generate a factor from the
variable ddpi in the data frame LifeCycleSavings. This factor should have the
three levels low for values ddpi ≤ 3, medium for values 3 < ddpi ≤ 6, and high for
the other values.
Answer. cut(LifeCycleSavings$ddpi, c(0,3,6,20), c("low", "medium", "high"))
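By default cut uses intervals that are open on the left and closed on the right, which is what the ≤ conditions in the problem require. The sketch below shows the boundary behavior on a few made-up values; it uses -Inf and Inf as outer breaks, which is my variation and not part of the answer above, so that no value can fall outside the breaks and become NA.

```r
ddpi_demo <- c(1, 3, 3.5, 6, 7)   # made-up values, including both boundaries

f <- cut(ddpi_demo,
         breaks = c(-Inf, 3, 6, Inf),
         labels = c("low", "medium", "high"))

# 3 falls into "low" and 6 into "medium" because intervals are right-closed
data.frame(ddpi = ddpi_demo, level = f)
```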
Monday June 18: graphical procedures, chapter 12. Please read this chapter
before coming to class, there will be a mini quiz again. For the following homework
it is helpful to do demo(graphics) and to watch closely which commands were used
there.
Problem 271. 5 points The data frame LifeCycleSavings has some egregious
outliers. Which plots allow you to identify those? Use those plots to determine which
of the data you consider outliers.
Answer. Do pairs(LifeCycleSavings) and look for panels which have isolated points. In
order to see which observation this is, do attach(LifeCycleSavings), then plot(sr,ddpi), then
identify(sr,ddpi). You see that 49 is clearly an outlier, and perhaps 47 and 23. Looking at some
other panels in the scatter plot matrix you will find that 49 always stands out, along with 47 and
44.
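Since identify() needs an interactive graphics window, a non-interactive cross-check can be useful. The following is only a rough sketch of one way to do it: it lists the rows with the largest ddpi values so that the row numbers read off the plot can be matched to country names. The choice of ddpi and of three rows is arbitrary.

```r
# Row numbers of the three largest ddpi values, largest first
top <- head(order(LifeCycleSavings$ddpi, decreasing = TRUE), 3)

# Show row number, country name, and value side by side
data.frame(row = top,
           country = row.names(LifeCycleSavings)[top],
           ddpi = LifeCycleSavings$ddpi[top])
```

The same pattern can be repeated for the other variables in the scatter plot matrix.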
Problem 272. 5 points x <- 1:40 + rnorm(40) + c(1,3,0,-4) Assume x is
quarterly data. Make a plot of x in which each of the seasons is marked by a hollow
dot filled in with a different color.
Answer. plot(x, type="n"); lines(x, lty="dotted"); points(x, bg=c("tan", "springgreen",
"tomato", "orange"), pch=21)
Wednesday June 20: More language features, chapters 6–10, and the beginning
of statistical models, chapter 11. A Mini Quiz will check that you read chapters 6–10
before coming to class. Homework is an estimation problem.
Monday June 25: Mini Quiz about chapter 11. We will finish chapter 11. After
this session you will have a take-home final exam for this part of the class, using the
features of R. It will be due on Monday, July 2nd, at the beginning of class.
If you have installed wget in a location R can find it in (I think no longer
necessary).
In unix, it is possible to start R or Splus just by typing R or Splus, whether
you are in the X-windows system or on a character-based terminal.
But for serious work I prefer to run it from inside the editor emacs. Emacs
provides a very convenient front end for Splus and SAS (and other languages will be
added in the future). After entering Emacs, all you have to do is type M-x S (for
Splus version 5 which we have on our workstations) or M-x SAS (for SAS). Here M-x
means meta-x. On the workstations, the meta-key is the key to the left of the space
bar. It works like the control key. Hold down this key and then type x. If you telnet
in from your own computer, you need a two-key sequence for all meta-characters:
first type the escape-key, then release it and then type x. If you do M-x S or M-x
SAS, emacs will ask you: “from which directory?” This is the directory to which you
would have cd’d before starting up Splus or SAS. Just type a return as a response,
in this way your home directory will be the default directory. Then you can type
and submit the Splus-commands given below from inside emacs.

Here are some common procedures for Splus: To dump a function into an edit
buffer do C-c C-d, to compile it do C-c C-l, for parsing errors C-x `, for help about
R/Splus C-c C-v, and for help on ess C-h i, and then m ESS.
The interface with SAS is at this point less well developed than that with Splus.
You have to write a file with your SAS-commands in it, typically it is called myfile.sas.
The file name extension should conventionally be sas, and if it is, emacs will help
you write the SAS code with the proper indentation. Say you have such a sas file
in your current buffer and you want to submit it to SAS. First do M-x SAS to start
SAS. This creates some other windows but your cursor should stay in the original
window with the sas-file. Then do C-c C-b to submit the whole buffer to SAS.
There are some shortcuts to switch between the buffers: C-c C-t switches you
into *SAS.lst* which lists the results of your computation.
For further work you may have to create a region in your buffer; go to the
beginning of the region and type C-@ (emacs will respond with the message in the
minibuffer: “mark set”), and then go to the end of the region. Before using the
region for editing, it is always good to do the command C-x C-x (which puts the
cursor where the mark was and the marker where the cursor was) to make sure the
region is what you want it to be. There is apparently a bug in many emacs versions
where the point jumps by a word when you do it the first time, but when you correct
it then it will stay. Emacs may also be configured in such a way that the region
becomes inactive if other editing is done before it is used; the command C-x C-x
re-activates the region. Then type C-c C-r to submit the region to the SAS process.
In order to make high resolution gs-plots, you have to put the following two lines
into your batch files. For interactive use on X-terminals you must comment them
out again (by putting /* in front and */ after them).
filename grafout 'temp.ps';
goptions device=ps gsfname=grafout gsfmode=append gaccess=sasgastd;
The emacs interface for Splus is much more sophisticated. Here are some commands
to get you started. Whenever you type a command on the last line starting
with > and hit return, this command will be submitted to Splus. The key combi-
nation M-p puts the previous command on the last line with the prompt; you may
then edit it and resubmit it simply by typing the return key (the cursor does not
have to be at the end of the line to do this). Earlier commands can be obtained
by repeated M-p, and M-n will scroll the commands in the other direction. C-c C-v
will display the help files for any object of your choice in a split screen. This is easy
to remember, the two keys are right next to each other, and you will probably use
this key sequence a lot. You can use the usual emacs commands to switch between
buffers. Inside S-mode there is name completion for all objects, by just typing the
tab key. There are very nice commands which allow you to write and debug your
own Splus-functions. The command C-c C-d “dumps” a Splus-object into a sep-
arate buffer, so that you can change it with the editor. Then when you are done,
type C-c C-l to “load” the new code. This will generate a new Splus-object, and if
this is successful, you no longer need the special edit buffer. These are well designed
powerful tools, but you have to study them, by accessing the documentation about
S-mode in Emacs-info. They cannot be learned by trial and error, and they cannot
be learned in one or two sessions.
If you are sitting at the console, then you must give the command openwin()
to tell Splus to display high resolution graphs in a separate window. You will get a
postscript printout simply by clicking the mouse on the print button in this window.
If you are logged in over telnet and access Splus through emacs, then it is possible
to get some crude graphs on your screen after giving the command printer(width=79).
Your plotting commands will not generate a plot until you give the command show()
in order to tell Splus that now is the time to send a character-based plot to the screen.
Splus has a very convenient routine to translate SAS-datasets into Splus-datasets.
Assume there is a SAS dataset cobbdoug in the unix directory /home/econ/ehrbar/ec7800/sasdata,
i.e., this dataset is located in a unix file by the name /home/econ/ehrbar/ec7800/sasdata/cobbdoug.ssd02.
Then the Splus-command mycobbdoug <- sas.get("/home/econ/ehrbar/ec7800/sasdata",
"cobbdoug") will create a Splus-dataframe with the same data in it.
In order to transfer Splus-files from one computer to another, use the data.dump
and data.restore commands.
To get out of Splus again, issue the command C-c C-q. It will ask you if you
want all temporary files and buffers deleted, and you should answer yes. This will
not delete the buffer with your Splus-commands in it. If you want a record of your
Splus-session, you should save this buffer in a file, by giving the command C-x C-s
(it will prompt you for a filename).
By the way, it is a good idea to do your unix commands through an emacs buffer
too. In this way you have a record of your session and you have easier facilities
to recall commands, which are usually the same as the commands you use in your
*S*-buffer. To do this you have to give the command M-x shell.
Books on Splus include the “Blue book” [BCW96] which unfortunately does
not discuss some of the features recently introduced into S, and the “White book”
[CH93] which covers what is new in the 1991 release of S. The files book.errata and
model.errata in the directory /usr/local/splus-3.1/doc/ specify known errors
in the Blue and White book.
Textbooks for using Splus include [VR99], which has a URL www.stats.oz.ac.uk/pub/MASS3/,
[Spe94], [Bur98] (downloadable for free from the internet), and [Eve94].
R now has a very convenient facility to automatically download and update
packages from CRAN. Look at the help page for update.packages.
21.6. The Data Step in SAS
We will mainly discuss here how to create new SAS data sets from already
existing data sets. For this you need the set and merge statements.
Assume you have a dataset mydata which includes the variable year, and you
want to run a regression procedure only for the years 1950–59. This you can do by
including the following data step before running the regression:
data fifties;
set mydata;
if 1950 <= year <= 1959;
This works because the data step executes every command once for every obser-
vation. When it executes the set statement, it starts with the first observation and
includes every variable from the data set mydata into the new data set fifties; but
if the expression 1950 <= year <= 1959 is not true, then it throws this observation
out again.
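For readers who think more easily in R, the effect of this subsetting if can be sketched on a data frame. The data frame mydata and its column y below are invented stand-ins, not the actual class data. Note that the chained comparison is SAS syntax; in R the two bounds are written as separate conditions.

```r
# Invented stand-in for the SAS data set mydata
mydata <- data.frame(year = 1945:1965, y = seq_along(1945:1965))

# Keep only the observations for which the condition holds,
# like "if 1950 <= year <= 1959;" in the data step
fifties <- subset(mydata, year >= 1950 & year <= 1959)

range(fifties$year)
nrow(fifties)
```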
Another example is: you want to transform some of the variables in your data
set. For instance you want to get aggregate capital stock, investment, and output
for all industries. Then you might issue the commands:
data aggregate;
set ec781.invconst;
kcon00=sum(of kcon20-kcon39);
icon00=sum(of icon20-icon39);
ocon00=sum(of ocon20-ocon39);
keep kcon00 icon00 ocon00 year;
The keep statement tells SAS to drop all the other variables, otherwise all variables
in ec781.invconst would also be in aggregate.
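The same aggregation can be sketched in R to show what the data step computes row by row. Everything in this sketch is hypothetical: invconst stands in for ec781.invconst, it has only two industries (20 and 21) instead of 20 through 39, and the numbers are made up. The na.rm = TRUE argument mirrors the SAS sum function, which skips missing values.

```r
# Hypothetical stand-in for ec781.invconst with two industries only
invconst <- data.frame(
  year   = 1950:1952,
  kcon20 = c(1, 2, 3), kcon21 = c(10, NA, 30),
  icon20 = c(4, 5, 6), icon21 = c(40, 50, 60)
)

kcols <- paste0("kcon", 20:21)   # plays the role of the variable range kcon20-kcon21
icols <- paste0("icon", 20:21)

# Only the columns listed here end up in the result, like the keep statement
aggregate_df <- data.frame(
  year   = invconst$year,
  kcon00 = rowSums(invconst[kcols], na.rm = TRUE),
  icon00 = rowSums(invconst[icols], na.rm = TRUE)
)
aggregate_df
```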
Assume you need some variables from ec781.invconst and some from ec781.invmisc.
Let us assume both have the same variable year. Then you can use the merge state-
ment:
data mydata;
merge ec781.invconst ec781.invmisc;
by year;
keep kcon20 icon20 ocon20 year prate20 primeint;
For this step it is sometimes necessary to rename variables before merging. This can
be done by the rename option.
The by statement makes sure that the years in the different datasets do not get
mixed up. This allows you to use the merge statement also to get variables from the
Citybase, even if the starting and ending years are not the same as in our datasets.
An alternative, but not so good method would be to use two set statements:
data mydata;
set ec781.invconst;
set ec781.invmisc;
keep kcon20 icon20 ocon20 year prate20 primeint;
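The reason the second method is worse shows up as soon as the two data sets do not cover the same years. The R sketch below imitates both methods on two small invented data frames: merging by year matches rows on the key, while two set statements pair the observations by position, let the second data set's year overwrite the first, and stop at the shorter data set. The data and the names by_year and by_position are assumptions made for illustration.

```r
# Invented data sets covering different years
invconst <- data.frame(year = 1950:1953, kcon20 = c(10, 11, 12, 13))
invmisc  <- data.frame(year = 1952:1955, primeint = c(3.0, 3.2, 3.1, 2.9))

# Analogue of "merge ...; by year;": rows are matched on year,
# and years present in only one data set get missing values
by_year <- merge(invconst, invmisc, by = "year", all = TRUE)

# Analogue of two set statements: rows are paired by position
n <- min(nrow(invconst), nrow(invmisc))
by_position <- data.frame(
  year     = invmisc$year[seq_len(n)],      # the second data set's year wins
  kcon20   = invconst$kcon20[seq_len(n)],
  primeint = invmisc$primeint[seq_len(n)]
)

# kcon20 for 1952 is right after the merge, but wrong after positional pairing
by_year[by_year$year == 1952, ]
by_position[by_position$year == 1952, ]
```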
