sequence data analysis guidebook


1
GeneJockeyII
Entering and Editing Sequences
Phil Taylor
1. Introduction
Entering sequence by hand is a tedious and error-prone process. In general,
if the sequence that you need is available in any electronic form, you should be
able to import it into GeneJockey without having to retype the data. For
example, most sequences published in research papers are normally accompa-
nied by a GenBank/EMBL accession number, which allows you to retrieve the
sequence from the GenBank CD-ROM or from a remote networked database.
If, however, you have no option but to type the required sequence (for
example, if you are reading sequence by hand from a manual sequencing gel),
GeneJockey provides powerful facilities to do so, and to check the accuracy
of the entered data. Sequence data in GeneJockey is simple text, displayed in
capitals, and behaves just as text does in any word processor. All the standard
editing commands act in the way in which you expect them to act, and you may
use fonts, styles, and colors to draw attention to parts of your sequence, just as
you would when editing ordinary text.
2. Materials
1. Hardware: GeneJockey requires a Macintosh with Color QuickDraw in ROM
(this excludes the Macintosh Plus [and older machines], the SE, the PowerBook
100, and the Macintosh Portable). The program also requires System 7.0 or later,
and at least 2 Mb of available memory. A color display capable of showing 256
colors is helpful but not essential.
2. Software: For the operations described in this chapter, you need only the
GeneJockey program itself. For operations described in later chapters, you will
need some additional files supplied with the program. You would normally install
GeneJockey on your hard drive by simply copying all the files supplied into a
single folder. When running on a Power Macintosh, the GeneJockey Helper file
should be present in the same folder. The native-code resources in this file run
about ten times faster than the code in the main program, and since multiple
alignment is a time-consuming process, the extra speed is very helpful. GeneJockey
is licensed for use only on a single-user basis, but is not copy-protected.

Fig. 1. Anatomy of a GeneJockey sequence window. (Figure not reproduced. Its
labels point out the comments area, the comments scroll bar, the First Nt legend,
and the button used to set the numbering of the first nucleotide.)

From: Methods in Molecular Biology, Vol. 70: Sequence Data Analysis Guidebook
Edited by S. R. Swindell, Humana Press Inc., Totowa, NJ

3. Methods
3.1. Sequence Entry
1. Start up the program by double-clicking on the GeneJockey icon. The program
offers three kinds of windows in which you may enter and edit text. For this
reason, the New command in the File menu is hierarchical, offering you the
choice of a new nucleotide sequence window, peptide sequence window, or a plain
text window. We will start by opening a nucleotide sequence window and entering
a DNA sequence (see Note 1). Fig. 1 shows a nucleotide sequence window.
2. Use the New > Nucleotide sequence command to open the window. Note that the
window title is Untitled 1. As is usual with Macintosh programs, the window will
not be given a title until you save it to disk (see Note 1).
3. Use the Save as... command from the File menu to save your new window before
you start typing.
4. Give the file a suitable name for the sequence you are going to enter.
5. When the file is saved, click on the empty sequence box to place the insertion
point at the top left of the box.
6. Start typing your DNA sequence. Note that the program converts text that you
type into this box to uppercase (see Notes 2,3).
7. Next, select Speak on Entry from the Edit menu. Continue typing. Each time you
hit a key, the machine will speak the corresponding letter. This is very helpful if you
are not a touch typist, because it means that you do not have to look at the screen to
check what you type. You can turn this facility off again using the same command.
8. Select Tidy up to format the sequence into blocks of 10 nucleotides (see Note 4).
9. Once you have typed in a few lines of sequence, use the Save command to update
the disk file. It is always a good idea to save sequences frequently when typing,
in case of accidents. You should make sure your sequence is saved before carrying
out the operations in the next step.
10. Type a few more bases and look at the Revert to Original and Undo commands
(see Notes 5,6).
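
The entry behavior described in steps 6 and 8 (conversion to uppercase, rejection of illegal characters, and tidying into blocks of 10) can be sketched in a few lines of Python. This is only an illustration and not GeneJockey's own code: the function names are hypothetical, the legal alphabet is assumed to be A, C, G, and T plus the standard IUPAC degenerate symbols, and the choice of five blocks per line is an assumption based on the sample layouts shown in Note 10.

```python
# Assumed legal alphabet: A, C, G, T plus the IUPAC degenerate symbols.
# U is deliberately absent, so RNA has to be entered as DNA.
LEGAL = set("ACGTRYSWKMBDHVN")


def clean_entry(typed):
    """Uppercase typed text, drop white space, and reject illegal characters."""
    bases = "".join(typed.split()).upper()
    illegal = sorted(set(bases) - LEGAL)
    if illegal:
        raise ValueError("illegal character(s): " + ", ".join(illegal))
    return bases


def tidy(sequence, block=10, blocks_per_line=5):
    """Lay a sequence out in space-separated blocks, several blocks per line."""
    bases = "".join(sequence.split())
    blocks = [bases[i:i + block] for i in range(0, len(bases), block)]
    lines = [
        " ".join(blocks[i:i + blocks_per_line])
        for i in range(0, len(blocks), blocks_per_line)
    ]
    return "\n".join(lines)


print(tidy(clean_entry("cgaagggctccccactcctagcc")))
```

Because white space is stripped before any further processing, the layout never affects the underlying sequence, which is the property the analyses rely on.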

3.2. Switching Between Circular and Linear Sequences
1. Of the three buttons at the center of the screen, the left-hand button currently
reads Linear. When you click on it, the legend changes to Circular. The button
toggles between these two states, and the legend indicates the current conformation
of the sequence. The difference between linear and circular sequences is for
the most part trivial, affecting only the restriction enzyme analysis, in which it is
important to deal correctly with restriction sites that span the origin of circular
sequences (i.e., where part of the site is at the top left of the display at position 1,
and the other part at the very end).
2. Click on the button again to return the sequence to the linear state.
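
To see why the circular setting matters for restriction analysis, consider a site that is split across the origin. The Python sketch below is an illustration only (the function name and the short test sequence are hypothetical, and this is not how GeneJockey itself is implemented). It searches a circular sequence by temporarily appending the first few bases to the end, so that a site spanning the origin is found like any other.

```python
def find_sites(sequence, site, circular=False):
    """Return the 1-based start positions of every occurrence of `site`."""
    seq = sequence.upper()
    site = site.upper()
    # For a circular molecule, append the first len(site) - 1 bases so that
    # a site spanning the origin shows up as an ordinary match.
    search = seq + seq[: len(site) - 1] if circular else seq
    positions = []
    start = search.find(site)
    while start != -1:
        positions.append(start + 1)
        start = search.find(site, start + 1)
    return positions


# Hypothetical 14-base sequence: GAA at the very end, TTC at the very start.
plasmid = "TTCACGTACGTGAA"
print(find_sites(plasmid, "GAATTC"))                 # [] when linear
print(find_sites(plasmid, "GAATTC", circular=True))  # [12] when circular
```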
3.3. Changing the Origin Point
1. Click on the Set Origin button. You will see a dialog box that asks you for the
number of the first nucleotide in the sequence and tells you that you may enter
any number between 32 K and -32 K, except zero (see Note 7).
2. Enter a small negative number, such as -20, and click on OK. The First Nt:
legend at center left now reads -20 to remind you of the current numbering, and
if you run the cursor along the top line of the sequence, you will see that the
numbering jumps from -1 to +1 without using zero.
3. Click on the Set Origin button again and set the origin back to 1.
4. Now, make the sequence circular, and if you have made any changes, save the
sequence again. (If you cannot remember whether you have made any significant
changes, pull down the File menu and look at the Save command. If it is disabled
then you do not need to save.)
5. Now use the Set Origin button again. The effect of changing the origin of a
circular sequence is quite different, since by convention the origin of a circular
sequence is always shown at the top left of the display. If you set the origin to -20,
the sequence will be rotated so that the last 20 nucleotides are brought to the
beginning, with the nucleotide that was twentieth from the end of the sequence
displayed at the top left, and numbered 1. Remember that there is no Undo command
for this, so it is a good idea to make sure the sequence is saved in case you make
a mistake with the numbering. You can then use the Revert command to restore
the original display. The effect of circularizing a linear sequence whose origin is
not the first nucleotide displayed is similar, and the same caution applies here.
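
The two behaviors of Set Origin can be written out as a short Python sketch. It is a sketch under stated assumptions rather than the program's actual logic: both function names are hypothetical, and the rotation function covers only the negative-origin case described in step 5, since the text does not say what happens for other values.

```python
def displayed_number(index, first_nt):
    """Number shown for the base at 0-based `index` of a linear sequence
    whose first nucleotide is numbered `first_nt` (zero is never used)."""
    if first_nt == 0:
        raise ValueError("zero is not a valid nucleotide number")
    number = first_nt + index
    # Numbering jumps from -1 straight to +1, so skip over zero.
    if first_nt < 0 and number >= 0:
        number += 1
    return number


def rotate_circular(sequence, first_nt):
    """Rotate a circular sequence as described for a negative origin:
    the last -first_nt bases are brought to the beginning."""
    if first_nt >= 0:
        raise ValueError("this sketch only covers negative origins")
    if not sequence:
        return sequence
    k = (-first_nt) % len(sequence)
    return sequence[-k:] + sequence[:-k]


print([displayed_number(i, -2) for i in range(4)])   # [-2, -1, 1, 2]
print(rotate_circular("AAAAACCCCCGGGGGTTTTT", -5))   # TTTTTAAAAACCCCCGGGGG
```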
3.4. Verifying the Sequence
Entry of sequences at the keyboard is an error-prone process, and if you
wish to be certain that the sequence you have entered is correct it is necessary
to use some form of verification. GeneJockey offers you two methods of
verifying sequences: Verify by Speaking and Verify by Typing. Both commands
are found in the Edit menu.
1. First, click at the top left of the sequence to set the insertion point at the
beginning (or just before the part of the sequence you wish to check).
2. Select Verify by Speaking. The computer will speak the first 10 bases of
the sequence, permitting you to check that you have entered them correctly. Hit the
space bar or any other printing key to start reading the next 10 bases. If you wish to
move quickly around the sequence, use the left or right arrow keys to move forward or
back 10 bases, or the up or down arrow keys to move one line up or down (see Note 8).
3. Set the insertion point back to the beginning of the section you wish to check.
4. Select Verify by Typing (see Note 8).
5. Start retyping the sequence. As you type each base, the selection moves one place
forward. If you type a base that does not match the sequence you entered originally,
the machine will beep and the selection will not move on.
6. In order to correct the error, type Command-period and the machine will return
control to you with the incorrect base already selected for changing.
7. Type the correct base then reissue the Verify by Typing command to continue
verification (since this is a keyboard-orientated operation you will find it quicker
to use the Command-T equivalent to restart verification). As before, you may use
the arrow keys to move around rapidly during verification, and the machine will
exit from the mode automatically when you reach the end of the sequence.
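
The idea behind Verify by Typing, checking a second, independent entry against the first, can be reproduced outside the program. The following Python sketch is an illustration only (the function name is hypothetical, and it compares two complete entries rather than working keystroke by keystroke as GeneJockey does). It reports the position of the first base at which the two entries disagree.

```python
def first_mismatch(entered, retyped):
    """Return the 1-based position of the first disagreement between two
    entries of the same sequence, or None if they agree completely."""
    # Spaces, returns, and letter case are layout only, so ignore them.
    a = "".join(entered.split()).upper()
    b = "".join(retyped.split()).upper()
    for position, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return position
    if len(a) != len(b):
        # One entry stops early; the first unmatched base is the problem.
        return min(len(a), len(b)) + 1
    return None


entered = "ACGTACGTAC GT"
retyped = "acgtacctacgt"
print(first_mismatch(entered, retyped))  # 7
```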
3.5. Annotating Sequences
You can insert notes and comments on your sequence in the upper text box
of the window. Only one of the text boxes is active at any time, indicated by the
flashing insertion point.
1. Click in the top box and type in a few lines of text. Comments in GeneJockey are
simple free-form text: You may type in anything you want here. Text in either
box that is off screen can be reached by using the scroll bars in the usual way.
2. Click on the arrows at the top or bottom of the scroll bars; the text will scroll by
one line. If you continue to hold down the mouse button, the text will scroll a
second line after a short pause. Holding down the button continuously produces
progressively shorter pauses until the text is scrolling at full speed. All of the
standard Macintosh editing commands, Cut, Copy, Paste, Clear, and Undo, apply
to both Comment and Sequence boxes, but Speak on Entry and the two Verify
commands only operate on the sequence box.
3.6. Advanced Editing: Making a Construct
GeneJockey is a multiwindow editor, and you may have as many windows
open at once as you need, subject to a maximum of 50. This means that you can
construct new sequences by copying text from one window and inserting it into
a sequence in a second window. We will use this facility to insert the sequence
that you previously typed into a plasmid vector, and in a later chapter we will
run a restriction analysis on this construct.
1. Use the Open command to open a suitable linear DNA sequence. Use the dopamine
D2A receptor sequence from the demo files disk supplied with the program if
you have no other sequence.
2. Next, Open a suitable vector sequence; we will use the plasmid pBluescript as an
example.
3. Bring the first window back to the front, either by clicking on it or by selecting its
title from the Windows menu. We are going to ligate this sequence into the EcoRI
site of pBluescript, and to do this properly we will first have to attach EcoRI
linkers to our test sequence. The recognition sequence for this enzyme is
G|AATTC, where | represents the cut site, so we have to ensure that our test sequence
starts with AATTC and ends with G (of course, real linkers are a little longer than
that, but we need not concern ourselves with that here).
4. Set the insertion point at the beginning of the test sequence and type in AATTC.
5. Use the Tidy up button to put the sequence back in regular columns.
6. Next, scroll to the end of the sequence (if it is not on the screen).
7. Set the insertion point after the last nucleotide.
8. Type in a single G.
9. Switch back to the window containing the vector.
10. Locate the EcoRI site in the vector. To do this you could run a restriction analysis,
but that is a little complex just to find a single restriction site. Instead, we will
use the Find command. First, make sure that the insertion point is at the beginning
of the sequence, then select Find > in sequence... from the Find menu (see
Note 9).
11. Type in GAATTC and hit the OK button. The program will scroll the site onto the
window and leave it selected.
12. Set the insertion point on the cut site, i.e., between the G (at 701) and the
following A.
13. Click on the test sequence window to bring it back to the front.
14. Select the whole sequence by means of the Select All command from the Edit
menu. (You could also do this by dragging across the whole sequence, or by
setting the insertion point at the beginning and shift-clicking at the end.) So that
we will be able to identify the insert when we have made the construct, it is a
good idea to label it now.
15. Use the Color... command from the Text menu to put the sequence into a
contrasting color (see Note 10).
16. Next, copy the entire sequence onto the clipboard by means of the Copy command
from the Edit menu.
17. Bring the vector sequence window back to the front. If you do this by clicking on
it, be careful to click only once, or you may shift the insertion point from the
place where you left it. Check that it is still after the G at 701.
18. Paste the test sequence in using the Paste command from the Edit menu.
19. Click on the Tidy up button to reformat the sequence.
20. Save it under a suitable name.
There, you have just ligated a test sequence into a vector. I bet you wish it
was that simple in the real world!
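
The same construct can be assembled programmatically, which is a useful cross-check on the manual editing. The Python sketch below is a minimal illustration under several assumptions: the function names are hypothetical, the 12-base vector is a stand-in and not the real pBluescript sequence, sequences are assumed to contain no spaces, and the matching rule for degenerate codes (two symbols match when they share at least one possible base) is a common convention that may differ from GeneJockey's own Find command.

```python
# IUPAC nucleotide codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def compatible(a, b):
    """Two symbols match when they share at least one possible base."""
    return bool(set(IUPAC[a]) & set(IUPAC[b]))


def find_site(sequence, target, max_mismatches=0, start=0):
    """Return the 0-based index of the first match at or after `start`,
    allowing up to `max_mismatches` mismatching positions, or -1."""
    seq, tgt = sequence.upper(), target.upper()
    for i in range(start, len(seq) - len(tgt) + 1):
        window = seq[i:i + len(tgt)]
        mismatches = sum(not compatible(s, t) for s, t in zip(window, tgt))
        if mismatches <= max_mismatches:
            return i
    return -1


def paste_at_cut(vector, insert, site="GAATTC", cut_offset=1):
    """Paste `insert` at the cut position inside the first `site` in `vector`."""
    pos = find_site(vector, site)
    if pos == -1:
        raise ValueError("site not found in vector")
    cut = pos + cut_offset  # EcoRI cuts between the G and AATTC
    return vector[:cut] + insert + vector[cut:]


vector = "CCCGAATTCGGG"            # hypothetical stand-in for the vector
insert = "AATTC" + "ACGTT" + "G"   # test sequence with the linker ends added
construct = paste_at_cut(vector, insert)
print(construct)                   # CCCGAATTCACGTTGAATTCGGG
```

Note that the construct now contains two EcoRI sites, one at each end of the insert, just as it would after a real ligation.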
3.7. Inverting Sequences
Suppose that we have only the construct sequence to work with, but we
decide that the wrong strand of DNA has been inserted into the vector, and we
need to take it out, invert it (i.e., generate the opposite strand), and put it back
again. First, we have to select the insert, which is now in the middle of the
pBluescript sequence. We know where the beginning is, just after the EcoRI
site at 701, so we only need to locate the end. We could find that numerically
by adding the length of the test sequence to 701, or we could simply scroll
down the screen to see where the color changes, but we will search again for
the second EcoRI site, which now marks the end of the insert.
1. Set the insertion point at the beginning of the sequence.
2. Select the Find Same command. This simply repeats the previous search, finding
the original site.
3. Repeat the Find Same command to find the second EcoRI site.
4. Set the insertion point just before the G of the second site.
5. Scroll back to the first site at 701. Hold down the shift key while you click after
the C of the first site. The whole of the insert will then be selected. (In
GeneJockeyII, the cursor display remains active while you drag, so you could
also just drag across the part of the sequence that you want, watching the numbers
to see when you get to the right place. Yet another alternative would be to
use the Select... command from the Find menu and specify numerically the region
of sequence you want selected.)
6. Copy the insert onto the clipboard.
7. Use the New > Nucleotide Sequence command to generate a new sequence window.
8. Paste the sequence into it and Tidy it.
9. Select Invert from the Modify menu. The program opens a new window containing
the inverted sequence (see Note 11).
10. Use Select All to change the color as before, if you wish.
11. Copy the entire sequence.
12. Pull the window containing the construct back to the front. Since we now have
several windows open, it is easier to do this by means of the Windows menu than
by trying to find it by moving the windows around on the screen. The part of the
sequence that represents our original insert is still selected.
13. Paste the inverted sequence, and it will replace the original.
14. Tidy up the sequence.
We are now finished with the windows that we currently have open, so close
them all. To do this, hold down the Option key while clicking in the close box
of the front window. The program will close all the windows in turn, prompting
us as it does so to save any new work.
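
The whole procedure in this section (locate the two EcoRI sites, select what lies between them, invert it, and paste it back) reduces to a few string operations. The Python sketch below illustrates this under stated assumptions: the function names and the short construct are hypothetical, only the four unambiguous bases are handled, and the region inverted runs from just after the C of the first site to just before the G of the second, as in steps 4 and 5.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def invert(sequence):
    """Reverse and complement a sequence (the opposite strand, 5' to 3')."""
    return sequence.translate(COMPLEMENT)[::-1]


def invert_insert(construct, site="GAATTC"):
    """Invert the region lying between the first two copies of `site`."""
    first = construct.find(site)
    second = construct.find(site, first + 1) if first != -1 else -1
    if second == -1:
        raise ValueError("two copies of the site are required")
    start = first + len(site)  # just after the C of the first site
    body = construct[start:second]  # up to the G of the second site
    return construct[:start] + invert(body) + construct[second:]


construct = "CCCGAATTCACGTTGAATTCGGG"  # hypothetical construct
print(invert_insert(construct))        # CCCGAATTCAACGTGAATTCGGG
```

Both EcoRI sites survive the operation, because only the sequence between them is replaced.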
4. Notes
1. Using the New command offers three alternatives. One is for creating a new
nucleotide sequence. The second is for creating a new peptide sequence. Peptide
sequences are entered in precisely the same way as nucleotide sequences, and a peptide
sequence window looks just like a nucleotide sequence window, the only obvious
difference being that the origin prompt at center left reads "First AA:" rather than
"First Nt:". You will notice some differences when you come to use the modification
and analysis commands, however, since different menu commands will be
enabled depending on what type of window is foremost on the screen.
Peptide sequences are entered in single letter code and represented in uppercase
characters only. There are no wildcard characters. The type of window you choose
specifies whether the program will treat the sequence as DNA or protein, and there is
very little to prevent you from entering the wrong kind of sequence into a window
(there is no way for the program to distinguish between a short DNA sequence and
the equivalent set of characters representing a peptide consisting entirely of alanine,
cysteine, glycine, and threonine, for example), so be careful when using the New
command to ask for the correct window type for the sequence you intend to enter.
A third type of window that may be obtained with the New command is a plain
text window. This has a single scroll bar and is 80 characters wide. There is a title
area at the top that holds a single line of text and initially reads "New text window."
This title string is not directly editable, but may be changed via a dialog
box obtained by clicking in this area. The remainder of the window acts as a plain
text area, and is useful for general purpose editing. Many of the analyses that
GeneJockey performs display their results in text windows, and you may edit
such results before printing or saving them.

2. GeneJockey only handles sequences consisting of uppercase symbols. Note that
when you reach nucleotide number 10, and any multiple of 10 thereafter, the
program will automatically insert a space or return so that the sequence is displayed
in blocks of 10. In a nucleotide sequence window, you may use the symbols
A, C, G, and T, plus the standard degenerate symbols that are used to
represent the case in which a particular position may be occupied by more than
one base. U is not a legal character, so RNA sequences should be entered as
DNA. If you type an illegal character you will get a dialog box displaying the
complete list of these characters. For example, type in an X to see this. You can
also see the display of permitted degenerate codes at any time by selecting the
Show Wildcards command from the Edit menu. You can dismiss the
Wildcards dialog either by clicking on the Cancel button or by clicking on any of
the buttons that display the degenerate codes; in the latter case the dialog causes
that code to be inserted into the sequence at the current selection point.
3. When entering DNA sequences you will make extensive use of the A, C, G, and T
keys, and it is most convenient to have these keys close together so that you can enter
the data with one hand and not have to look at the keyboard. Use the Re-Assign
Keys command from the Edit menu to do this. Because I am right-handed, I normally
reassign the keys U, I, O, and P to give me A, C, G, and T, respectively. This
has the advantage that none of U, I, O, or P are degenerate codes, so I will never want
to use them for their original symbols within a DNA sequence, and they are close
enough on the keyboard to the delete key that if I make a mistake I can backspace
over it without taking my eyes off the gel or sequence from which I am reading. If
you wish your keyboard always to work in this way, you should click on the Set
Default checkbox before clicking on OK in the dialog. To return the keyboard to
normal you should click on the Standard Layout button. The reassigned keyboard
only applies to DNA sequences; the keyboard will operate normally when you type
ordinary text into the comments area of a sequence window or anywhere else.
4. You have probably noticed by now that if you move the mouse cursor across the
sequence box the number of the nucleotide beneath the cursor is continuously
displayed at center left. This is very helpful for locating a particular nucleotide
by number. The calculation of the number does, however, depend on the sequence
being formatted correctly in regular blocks of ten. Some operations destroy this
regular format, and the function of the Tidy up button is to restore order in these
cases. For example, suppose you wished to insert an extra block of sequence in
the middle of your existing sequence. Place the insertion point in the middle of
the sequence by clicking on it. Now type in a few nucleotides. The resulting
disorder would not affect any analyses that you later ran on this sequence, since
all the analyses ignore the presence of space and return characters, but it looks
untidy and spoils the operation of the cursor position display. Click on the Tidy
Up button to put the sequence back into regular columns. It would have been
possible to make the program tidy the sequence after every keystroke, but it would
have slowed the operation of the program to an irritating extent, especially when
inserting residues near the beginning of a long sequence.
5. If you now wish to restore your sequence to its original state, select the Revert to
Original command from the File menu. This returns the window to the state it
was in when you issued the last Save command, checking with you first to see if
you really want to discard any changes made since then.
6. Another way to reverse any change you have made is to use the Undo command
at the top of the Edit menu. Pull down the menu and look at this command now.
It reads Undo Typing, and if you use it, all the typing you have done since you
placed the insertion point will be removed. Undo always shows you what can be
undone. Almost all editing operations can be undone, the only exceptions being
the three operations performed with the buttons at the center of the screen. It may
read Cannot Undo, and be disabled (i.e., it is shown in gray, and does not respond
if you try to use it). This is because the file has just been loaded or saved, and you
have not yet made any changes: There is nothing to undo.
7. Set Origin changes the way in which the sequence is numbered, and has different
effects depending on whether the sequence is linear or circular. The origin of a linear
sequence is position number 1, which may be anywhere on the screen, or indeed
outside the sequence displayed. If your sequence represents a small segment of a
larger sequence that is itself numbered from 1, the first nucleotide displayed on the
screen will have a number >1. If, on the other hand, you wish to set the origin at
some feature in the body of the sequence (for example, at the start codon of a
translated region), the first nucleotide will have a negative number. By convention,
nucleotide numbering does not use zero, so you may not set the origin to zero.
Strictly speaking, when you set the origin of a linear sequence, you do not specify
the position of the origin itself, but rather the numbering of the first nucleotide.
8. Verify by Typing and Verify by Speaking are modal commands, i.e., you cannot
do anything else at the same time, because the menus, scrollbars, and so on,
are all inactive. When the program has talked its way to the end of the sequence
it will exit automatically from this mode and return to normal operation. If you
wish to exit before the end of the sequence is reached (in order to make corrections)
you may do so by holding down the command key and simultaneously
typing a period. (This is the standard Macintosh abort command: You can stop
most operations in GeneJockey this way if you change your mind.)
9. The Find command in GeneJockey is similar to that in a word processor, but has
some special facilities for use with sequences. Since all sequences in GeneJockey
are in uppercase, it does not matter whether you type in the target sequence in
capitals or lowercase; the program will convert the characters to capitals before
searching. You can include degenerate codes in the target sequence, so AATNG
will find AATAG, AATCG, AATGG, or AATTG. Likewise, degenerate codes in
the sequence being searched will be honored, so AATAG will find not only AATAG but
NATAG, ANTAG, AANAG, and so on. The Find command will also permit you
to specify a number of allowable mismatches, so you can find sections that are
similar to, but not identical to the target sequence. You can also set the program
to find the minimum number of mismatches required to produce a match, by
means of the Find Mismatches button.
10. Using the Text Menu: Unlike most sequence handling programs, GeneJockey
has the ability to make use of formatted text. Any part of a sequence or annotation
text may be placed in any font, size, style, or color. This is most useful for labeling
parts of a sequence, especially since when you make constructs by editing sequences
together the format is carried over to the composite sequence, allowing you to
identify immediately where each part of the composite sequence came from.
Most of the Text menu, and its submenus Font and Style, will be familiar to
Macintosh users. You may be surprised to see that very few fonts are displayed in
the Font submenu. The reason for this is that GeneJockey only displays fixed-width
fonts here. Most users will find only Monaco and Courier fonts listed. The
reason for this is that proportionally spaced fonts, which look so nice for standard
text, disrupt the display of sequences, making it impossible to line up the blocks
neatly. Here are some examples.
9 pt. Monaco font (the default):
CGAAGGGCTC CCCACTCCTA GCCAGCCCAC ACCAAGCTTC TTGCAGCCCG
GGGAGCAAGT GGAACTAAAC CTGCGGCAGG TTTAAATGTG TATTTGGCTA
CTTGGCTACT GAGTAGAGAA CACAAAATGA ATAACTCCAC CAACTCCTCT
AACAGTGGCC TGGCTCTGAC CAGTCCTTAT AAGACATTTG AAGTGGTTTT
10 pt. Courier font (good for printing on PostScript printers, but less legible on screen):
CGAAGGGCTC CCCACTCCTA GCCAGCCCAC ACCAAGCTTC TTGCAGCCCG
GGGAGCAAGT GGAACTAAAC CTGCGGCAGG TTTAAATGTG TATTTGGCTA
CTTGGCTACT GAGTAGAGAA CACAAAATGA ATAACTCCAC CAACTCCTCT
AACAGTGGCC TGGCTCTGAC CAGTCCTTAT AAGACATTTG AAGTGGTTTT
TATTGTCCTT GTCGCCGGAT CCCTCAGTTT GGTGACCATT ATTGGGAACA
TCCTGGTCAT GGTCTCCATC AAAGTCAACC GACACCTCCA GACAGTCAAC
AATTACTTTT TGTTCAGCTT GGCCTGTGCT GACCTCATCA TTGGTGTTTT
CTCCATGAAC CTGTACACTC TTTACACTGT GATTGGCTAC TGGCCTTTGG
GCCCCGTGGT GTGTGACCTT TGGCTAGCTC TGGACTACGT GGTCAGTAAT
12 pt. Geneva font (proportionally spaced and therefore fine for text, but messy for sequences):
CGAAGGGCTC CCCACTCCTA GCCAGCCCAC ACCAAGCTTC TTGCAGCCCG
GGGAGCAAGT GGAACTAAAC CTGCGGCAGG TTTAAATGTG TATTTGGCTA
CTTGGCTACT GAGTAGAGAA CACAAAATGA ATAACTCCAC CAACTCCTCT
AACAGTGGCC TGGCTCTGAC CAGTCCTTAT AAGACATTTG AAGTGGTTTT
Of course, if you insist, you can use proportionally spaced fonts, but you will
need to use the More... command to get access to the full set of fonts in your
system. This command also permits you to use sizes other than the basic ones
listed on the Font submenu.
To use the Color... command, first select the area of text or sequence that you
wish to change, then issue the command. The dialog that follows is the Macintosh
standard color wheel. If the text was originally black, the wheel will appear
entirely black. Move the scrollbar at the right to the top of its travel to show
colors, then click on the wheel to select a color. Dismiss the dialog with the OK
button and the text will change color.
When using these commands to label parts of sequences, you should be aware
that not all combinations of Font, Size, and Style will work well. In addition to
the advice given above about the use of proportionally spaced fonts, you should
be aware that some Styles also change the width of the characters. In general, the
Underline and Italic styles work well, but Bold, Condense, Outline, and Shadow
all increase the width of the characters to which they apply. Some combinations
can be used successfully, e.g., Bold + Condense works provided that you return
the space characters between the blocks to plain text. Be aware also that different
fonts may be of differing widths, even though they are nominally of the same
point size, so if you change part of a sequence from Monaco to Courier font, you
should increase the point size from 9 to 10 to make the character widths match.
Examples:
CGGTGGACTA TCCAAACCTA CTTCATACCC CAGGAACTCG CTAACAAAGT: Underline (OK)
GCCCTACACT CGCGACTTGA ACATGGTCTT AGCTCCCCAG AACATGCGCC: Italic (OK)
GGTGCCCTAA GTTCTCAATC GCTGTTCGAA ACTCGGAACA TAGTTTTGGC: Bold alone (messy)
CACTCCAAAC AACTTAGCCT GGATAGGACG GATGCAGGCC CCTTTCCACC: Bold + Condense (OK)
11. The name of the Modify menu is something of a misnomer, since although the
commands on it (except Generate Random Sequence... and Genetic code >) generate a
derivative sequence, they always open a new window to contain the derivative,
leaving the original sequence unmodified. The commands on this menu operate on
the front (active) window only. The first three commands on the menu manipulate
sequences in standard ways, and it is important to distinguish between them.
The Reverse command produces a sequence that is reversed in order, and is
the only command on this menu that will operate on both nucleotide and protein
sequences (e.g., ATTGGGCC reversed is CCGGGTTA). The Complement command
produces a sequence that is complementary to the original sequence (e.g.,
ATTGGGCC complemented is TAACCCGG). You should note that although this sequence is
complementary to the original, it is shown in the reverse direction, i.e., with the 3' end
at the left. The Invert command both reverses and complements the sequence,
generating the sequence of the strand that is biologically complementary to the original
(e.g., ATTGGGCC inverted is GGCCCAAT). Of the three commands, this is the one
that is used most often, which is why it is the only menu command in color.
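
The distinction between the three commands is easy to check with a short Python sketch. This is an illustration only, not GeneJockey's code: the function names simply mirror the menu commands, and the complement table is assumed to follow the standard IUPAC pairing for degenerate symbols (for example, R pairs with Y, and N with N).

```python
# Each symbol in the first string is complemented by the symbol at the
# same position in the second (standard IUPAC pairing, assumed here).
COMPLEMENT = str.maketrans("ACGTRYKMBVDHSWN", "TGCAYRMKVBHDSWN")


def reverse(sequence):
    """Reverse the order of the symbols (valid for DNA or protein)."""
    return sequence[::-1]


def complement(sequence):
    """Complement each base in place; the result reads 3' to 5'."""
    return sequence.translate(COMPLEMENT)


def invert(sequence):
    """Reverse and complement: the opposite strand, read 5' to 3'."""
    return reverse(complement(sequence))


print(reverse("ATTGGGCC"))     # CCGGGTTA
print(complement("ATTGGGCC"))  # TAACCCGG
print(invert("ATTGGGCC"))      # GGCCCAAT
```

Applying invert twice returns the original sequence, which is a convenient sanity check after editing a construct.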
The Genetic Data Environment
A User Modifiable and Expandable
Multiple Sequence Analysis Package
Jonathan A. Eisen
1. Introduction
The Genetic Data Environment (GDE) is a software package designed for
molecular sequence alignment and analysis (1). Four features make GDE stand
out relative to other similar programs:
1. It is free.
2. It has a user-friendly and visually powerful multiple sequence alignment editor.
3. Analysis can readily be performed on any sequence(s) or region(s) of sequences
simply by selecting the sequence(s) or region(s) of interest and choosing the desired
function from the pop-up menus.
4. Although it comes with a variety of powerful sequence analysis tools, any additional
programs of the user's interest or updates for programs in use can be incorporated
quickly and easily into the menu system (see Note 1).
The current release of GDE includes a variety of sequence analysis tools,
including methods for sequence alignment and editing, conversion between
sequence formats, nucleic acid translation, identification of restriction sites,
RNA secondary structure prediction and drawing, database searching, dot plots,
phylogenetic analysis, consensus determination, and printing and formatting.
Instructions for how to use many of these features are presented here. However,
since GDE is user-expandable, the main focus of this chapter will be on
how to use the core GDE alignment window. In addition, a brief guide on how
to add additional programs to the GDE menu system is included. Learning to
use this type of program may be of more use in the future, as other programs will
likely adopt this user-expandable system. Currently, work is in progress to
incorporate many of the features of GDE into the incredibly powerful but somewhat
cumbersome software package GCG.
2. Materials
2.1. Hardware
The GDE software package is designed to run on the Sun family of computer
workstations. However, it can also be run with some modifications on
other Unix-based workstations, such as DecStations (Digital Equipment
Corp., Maynard, MA) and SGIs (Silicon Graphics, Inc., Mountain View, CA).
The sequence alignment editor is designed to be run in an X-Windows or
OpenWindows environment and can be displayed locally (on the machine
running the GDE software) or remotely on any machine capable of X-window
emulation (e.g., MacX can be used for displaying on a Macintosh). Although
most of the features and programs of GDE are designed to be run from the
alignment editor, many can also be run from the Unix prompt. A working
knowledge of Unix and X-windows is helpful for using GDE but not necessary.
Whenever possible, I include all instructions needed.
The core GDE package requires about 15 Mb of disk space. Additional space
is required for sequence database files. The amount of RAM needed varies a
great deal, depending on the size of the sequence files being viewed and the
number and type of programs used to manipulate or analyze these sequences.
The GDE system can be run on color or black and white machines. However,
to make full use of the sequence alignment window, it is helpful to have
color. For example, amino acids are colored by chemical type (all acidic are
one color, all basic are another, and so on). Thus, regions of sequence
similarity can be quickly identified by blocks of particular colors. In addition, some of
the highlighting features of particular GDE programs work best when viewed
in color.
2.2. Software
The current GDE package (version 2.2) can be obtained from a variety of
computer archives. URL addresses for some sites are given below.
1
2.
3. gopher://megasun.bch.umontreal.ca/ll/GDE
4. gopher://rdpgopher.life.uiuc.edu/ll/progr~s/Editor-GDE
5.
6.

7
unix/GDE
The GDE package is usually found at archive sites in compressed archive
format as a single file (e.g., gde2.2.tar.Z). This file must be copied to a local
machine, decompressed, and unarchived. In addition, the .cshrc file of all users
who want to run GDE must be modified slightly. Below are instructions that
can be used to set up GDE for a Sun Sparcstation (once the file has been
copied from an archive site). The commands shown after the % prompt should be
typed at the Unix prompt and followed by a carriage return. For other types of
computers, some modifications of these instructions may be necessary. The specifics will
depend on the machine, the type of Unix being run, and the type of X-windows
being used for display. Instructions for setting up GDE on a variety of other
machines are available at many of the above archive sites.
1. % mkdir /usr/local/GDE <return>: makes a directory for the GDE program.
2. % mv gde2.2.tar.Z /usr/local/GDE/ <return>: moves the file to the directory.
3. % cd /usr/local/GDE <return>: changes to that directory so the next two commands find the file.
4. % uncompress gde2.2.tar.Z <return>: uncompresses the file.
5. % tar -xvf gde2.2.tar <return>: unarchives the file.
For each user, the following lines should be added to the .cshrc file found in
their home directory. The additions can be made using a text editor like
vi, emacs, or textedit or by using the cat command (type cat >> .cshrc from the
Unix prompt and any text typed will be added to the .cshrc file; when done, type
control-D).
1. set path = ($path /usr/local/GDE/bin)
2. setenv GDE_HELP_DIR /usr/local/GDE/GDEHELP
2.3. Databases
The GDE package comes with two database comparison programs: fasta
(2) and blast (3). To make use of these programs, the desired databases must be
set up in specific formats and locations. All should be set up in subdirectories
within the GDEHELP directory (/usr/local/GDE/GDEHELP). Instructions for
doing so are given below. Special programs are required to format databases
for the blast programs, and these are included with the GDE package. To run
these programs, simply type their name followed by a carriage return from the
Unix prompt. If the appropriate databases are already set up elsewhere on a
local system, aliases for the locations of these files can be set up in the
directories described below instead of copying the entire databases.
1. For fasta protein searches, copy PIR to the GDEHELP/FASTA/PIR/ directory.
2. For fasta nucleotide searches, copy Genbank to the GDEHELP/FASTA/GENBANK/
directory.
3. For blast protein comparisons, copy PIR to GDEHELP/BLAST/PIR/. Then use
the pir2fasta program to convert to temporary FASTA format. Then reformat the
database using the setdb program.
4. For blast nucleotide comparisons, copy Genbank to BLAST/GENBANK/ in the
GDEHELP directory. Then use the gb2fasta program to convert to temporary
FASTA format. Finally, use the pressdb program to reformat the database.
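The four database layouts above can be summarized as a small lookup table. The sketch below only restates the directory names and formatting programs given in the text. The dictionary, the function name setup_plan, and the wording of the printed plan are illustrative assumptions and not part of GDE, and the code touches no files.

```python
from pathlib import PurePosixPath

# Root directory taken from the text; everything else here is a hypothetical
# helper for illustration and is not shipped with GDE.
GDEHELP = PurePosixPath("/usr/local/GDE/GDEHELP")

# (search program, sequence kind) -> (source database, subdirectory, formatters)
DATABASE_SETUP = {
    ("fasta", "protein"): ("PIR", "FASTA/PIR", []),
    ("fasta", "nucleotide"): ("Genbank", "FASTA/GENBANK", []),
    ("blast", "protein"): ("PIR", "BLAST/PIR", ["pir2fasta", "setdb"]),
    ("blast", "nucleotide"): ("Genbank", "BLAST/GENBANK", ["gb2fasta", "pressdb"]),
}

def setup_plan(program: str, kind: str) -> str:
    """Describe, in order, the steps needed for one program/database pair."""
    source, subdir, formatters = DATABASE_SETUP[(program, kind)]
    steps = [f"copy {source} to {GDEHELP / subdir}/"]
    steps += [f"run {name}" for name in formatters]
    return "; ".join(steps)

print(setup_plan("blast", "nucleotide"))
```

Keeping the mapping in one table makes it easy to see that only the blast databases need the extra conversion and reformatting steps.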
3. Methods
3.1. GDE Basics
3.1.1. Starting the Program
Prior to starting GDE, the user must set up for displaying in an X-windows or
equivalent environment. If GDE is to be run locally on a workstation, usually the
windows environment will be started when you log on to the machine. If not, try
typing x or openwin from the Unix prompt. GDE can also be run remotely
by setting up to display on a local machine but running the program elsewhere.
There are many ways to do this depending on the machine on which you will be
displaying. In general, what you have to do is tell the local machine that you are
allowing the machine that will be used to run the GDE software to be an X-windows
host (for many Unix systems type xhost +remote-machine-address from
the Unix prompt, replacing remote-machine-address by the name or IP address
of the machine from which you will run GDE). Then you have to tell the remote
machine that you will be displaying GDE elsewhere (for many Unix systems
type setenv DISPLAY local-machine-address:0 from the Unix prompt, replacing
local-machine-address by the IP address or name of the machine used as
the display).
Once everything is set up, to start GDE type gde or gde filename (where
“filename” is replaced by the name of the file one wants to open), followed by
a carriage return (this must be typed in the window of the machine running the
GDE software if you are using a remote server). The GDE alignment window
should appear. An example window is shown in Fig. 1. This window includes
many of the features that will be referred to later.
3.1.2. Using the Mouse and Menus in GDE
GDE is a menu-driven, X-windows-based system. As with other windows
environments, in X-windows, pop-up/drag-down menus are used to access a
variety of commands. The most obvious difference between X-windows and
traditional Mac or PC Windows is that there are three buttons on the mouse
with which to become familiar. The buttons are used for different functions,
including:
1. Left button: placing cursor, selecting sequences and regions of sequences,
scrolling, resizing windows, and splitting screen.
2. Middle button: extending text selection.
3. Right button: opening pop-up menus and scroll-bar menus.
Fig. 1. The main GDE window. Labeled elements include the GDE menus, sequence
names and group numbers, a selected region, the scroll-bar elevator, and the
split screen divider.
The most important mouse skill in GDE is selection of items from the GDE menus.
To select an item in a menu, such as the File menu (in the upper left in Fig. 1):
1. Point the mouse cursor at the menu button of interest and click with the right
mouse button (this will expose the items in the drag-down menu).
2. Select one of the items in the menu by pointing and clicking with the left mouse
button.
3. Menus can be “thumbtacked” to the screen by first selecting the menu with the
right mouse button and then clicking on the thumbtack with the left mouse button.
For most GDE menu items, a dialog box will appear after the command has
been selected. These boxes ask for various types of input that define exactly
how the command will be executed. The GDE uses five types of input formats
in these dialog boxes: text lines, sliders, chooser buttons, pop-up menus, and
check-boxes. The first four of these are demonstrated in the menu for the Find
command (Fig. 2).

1. Text lines: To enter text in a text line, point the mouse cursor to the text line,
click with the left mouse button, and then type the text.
2. Sliders: To modify values in sliders, point the cursor to the rectangular box on the
slider and then click and hold the left mouse button and drag to the left or to the
right to get to the desired number (which is shown in the text line to the left).
Sliders can be altered in increments of one by pointing and clicking with the left
mouse button to the right or left of the slider box, along the slider line.
3. Chooser buttons: To alter selections in chooser buttons, simply point the mouse
cursor and click with the left button on one of the boxes to the right of the choice.
The selected box will be highlighted.
4. Pop-up menus: Pop-up menus can be altered as described above for GDE menus.
5. Check-boxes: Boxes are checked by simply pointing and clicking with the left
mouse button.
Fig. 2. An example of a GDE dialog box. The figure shows the dialog for the Find
command showing four of the five possible means of inputting information.
In general, once the dialog box has been “filled out” to the user’s interest,
the command is usually started by clicking the OK or DONE buttons.
As mentioned above, one of the most powerful aspects of GDE is the ability
to quickly add new programs. A dialog box like this one for a new program can
usually be added in about 30 min with no programming experience except a
little knowledge of Unix commands. The dialog boxes are helpful because once
they are programmed the user does not have to remember the command line
instructions for each program (see Section 4. for more information about
incorporating new programs) and the program can be run on specific sequences or regions
with the click of a button.
3.1.3. Sequence Input and Sequence Types
GDE uses four different types of sequences: DNA/RNA, protein, text, and
masks. The sequence type is important in determining which characters are
allowed to be entered into the sequence, as well as how external programs
handle the sequence when it is selected for analysis. The DNA/RNA and protein
sequences use the standard nucleotide, amino acid, and degenerate position
abbreviations. Text sequences allow any characters and are particularly
useful for keeping notes along with an alignment (such as intron positions,
transcription start sites, mutation spots, and so on). Masks are used to direct
external programs to use only subsets of a sequence alignment. They can be
particularly useful in phylogenetic analysis (see Section 3.3.5.) but are useful
in other functions as well (see Section 3.1.17.).
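To make the idea of a mask concrete, the following sketch filters alignment columns with a mask string. It assumes a mask written as a string of 1 (keep the column) and 0 (skip the column), one character per alignment position. That encoding, the function name apply_mask, and the example sequences are assumptions made for illustration, not a description of GDE internals.

```python
def apply_mask(alignment: dict, mask: str) -> dict:
    """Keep only the alignment columns whose mask character is '1'."""
    keep = [i for i, flag in enumerate(mask) if flag == "1"]
    return {
        name: "".join(seq[i] for i in keep if i < len(seq))
        for name, seq in alignment.items()
    }

# Made-up two-sequence alignment; columns 4 and 7 (1-based) contain gaps
# and are switched off in the mask.
alignment = {"seqA": "ATG-CCGT", "seqB": "ATGACC-T"}
mask = "11101101"
print(apply_mask(alignment, mask))
```

An external program handed the filtered alignment sees only the unmasked columns, which is the effect a mask is meant to have on an analysis.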
There are three ways to get sequences into a GDE window. Short descriptions
for each method are given below. Combinations of these can be used to
load multiple files and sequences into one window (remember to check the file
name prior to saving if multiple files have been opened or imported; see Note 2).
3.1.3.1. DIRECT INPUT (FOR SEQUENCES IN GDE, FLAT, OR GENBANK FORMAT)
1. Choose the Open command from the File menu.
2. In the dialog box, the local directory is shown. Click on the name of the file to be
opened or move through the directories to find the file of interest.
3. Once the file is selected, click the Open button.
4. The sequence(s) will be added to the ones currently in the GDE window.
3.1.3.2. LOADING SEQUENCES IN OTHER FORMATS
1. Choose the Input Foreign Format command from the File menu (see Note 3).
2. A text line for inputting the name of the file to import will appear in the dialog
box. If the file of interest is in the directory from which the GDE program was
started, type in the file name (e.g., gde.pir). If the file is in another directory, you
need to type the path name as well (e.g., /GDE/gde.pir). Sometimes it is easier to
move the file to the directory in which GDE was started rather than typing the
entire path name.
3. Click the OK button.
4. The sequences will be imported and added to those already in the GDE window.
5. This function uses the readseq program to convert between sequence formats
and thus has all of the features and bugs of this program. It is important to be
careful when importing sequences that have been received by E-mail from sequence
databases. Depending on the way they were received and the E-mail system used,
sometimes the E-mail headers can interfere with the importing functions. In
addition, only some sequence information fields will be converted; others may be left
out or merged into the same field. Instructions for accessing sequence
information fields are in Section 3.1.6.
6. Readable formats include Genbank, IG/Stanford, NBRF, EMBL, GCG, DNA
Strider, Fitch, Pearson/Fasta, Zuker, Olsen, Phylip, Plain text, ASN.1, PIR, MSF,
and PAUP.
3.1.3.3. NEW SEQUENCES
1. Choose the New Sequence command from the File menu.
2. Choose the sequence type (DNA/RNA, protein, text, mask) from the pop-up menu.
3. Type in a name.
4. Click the OK button.
5. A sequence name (with no sequence yet) will be added to the sequences already
in the GDE window. The sequence can then be typed in directly (see Section 3.2.1.).
3.1.4. Selection of Sequences or Regions for Analysis
In general, functions selected from GDE menus are performed only on the
sequence(s) or region(s) that have been selected by the user. The ability to quickly
select different sequences and regions of interest allows the user to perform analysis
with high specificity. For example, to compare a small segment of the N-terminus
of a protein to a sequence database, just select that region and then choose one of
the database searching options from the GDE menu.
Sequences and regions can be selected either directly using the mouse or
indirectly using menu functions. The currently selected sequences or regions
are highlighted in the GDE window (Fig. 1). It is important to note that region
and sequence selection are independent: changing selected regions has no effect
on which sequences are selected and vice versa. However, for some commands,
sequence and region selection can be in conflict. This occurs when the command
chosen can be performed on either sequences or regions (e.g., multiple
sequence alignments). In these cases, a selection window will appear asking
the user to choose whether the function is to be performed on the region(s) or
sequence(s) selected. Some functions can only be performed on either regions
or sequences but not both (e.g., grouping, see Section 3.1.9.), and thus a chooser
window will not appear in these cases.
3.1.4.1. SEQUENCE SELECTION
1. Click on the short name of the sequence with the left mouse button.
2. To select multiple sequences use mouse dragging (click and hold the left mouse
button while dragging the mouse cursor across the names of the sequences to be
selected and releasing after the last name) or shift clicking (hold the shift key
while performing additional selections with the mouse button).
3. Use the Select All or Select by name commands from the Edit menu to select
sequences indirectly.
4. Deselection of sequences is done either by selecting other sequences without
holding down the shift key or by clicking the mouse in the region immediately to the
right of the sequence names (but to the left of the sequence text).
3.1.4.2. REGION SELECTION

1. With the left mouse button held, drag the mouse cursor across the region to be
selected and release the button when at the end of the region.
2. Alternatively, “embrace” the region to be selected by pointing the mouse cursor at
one side of the region and clicking with the left mouse button and then point and click
on the other side of the region with the middle mouse button. This method allows
the use of scroll-bars to move to the second edge of the region to be selected (which
makes selection of long regions of sequence easier than with mouse dragging).
3. Both of the above methods can be used to select a region from one sequence or
comparable regions from multiple sequences.
4. Selection of regions can be complicated by grouping of sequences (see Section
3.1.9.): selecting a region in one member of a sequence group will automatically
select that region in all members of the group.
5. To deselect regions, select another region or point and click the mouse anywhere
in the text.
3.1.5. Saving Sequences and Alignments
GDE allows for sequences and alignments to be saved in a variety of formats
(see Note 4). The three different means of saving sequences and alignments
are described below. If you have many sequence files, be careful not to
overwrite files of interest.
3.1.5.1. SAVING AN ENTIRE ALIGNMENT
1. Choose Save As from the File menu.
2. Select the format (GDE, Genbank, or Flat).
3. Enter a new name or leave the original name.
4. Click the OK button.
5. The file will be saved in the directory where the GDE program was opened.
3.1.5.2. SAVING SPECIFIC SEQUENCE(S) OR REGION(S)
1. Select the sequence(s) or region(s) to be saved.
2. Choose Save Selection from the File menu.
3. Select the sequence format (GDE, Genbank, Flat).
4. Enter a file name.
5. Click the OK button.
6. The file will be saved in the directory where the GDE program was opened.
Alignment information will be retained.
3.1.5.3. SAVING IN OTHER FORMATS
1. Select the sequence(s) or region(s) to be saved.
2. Choose Output Foreign Format from the File menu (as with the Input Foreign
Format command in Section 3.1.3.2., this uses readseq).
3. Select the output format from the pop-up menu (Genbank, IG/Stanford, NBRF,
EMBL, GCG, DNA Strider, Fitch, Pearson/Fasta, Zuker, Olsen, Phylip, Plain
text, ASN.1, PIR, MSF, PAUP, Pretty).
4. Enter a name for the file.
5. Click the OK button.
6. The file will be saved in the directory where the GDE program was opened.
3.1.6. Sequence Information
GDE allows storage of a variety of information for each sequence. Under
normal conditions, the majority of this information is kept hidden. Access to
this information is gained via a dialog box. This information can be useful for
sorting functions (see Section 3.1.7.) as well as for future reference. For example,
strand and direction will influence translation functions and sequence type
will influence allowable modifications.
1. Select the sequence of interest.
2. Choose Get Info from the File menu (Fig. 3).
Fig. 3. Sequence Information dialog box.
3. Change or enter the text for short name (the name shown in GDE Window), full
name, ID number, description, author, and comments.
4. Set the pop up menus for type, strand, and direction.
5. Click the OK button when done.
3.1.7. Sorting and Ordering Sequences
In order to aid multiple sequence alignment and analysis, it is sometimes
helpful to have specific sequences next to each other. Reordering of sequences can
be done in two ways: either by cutting and pasting or by using sorting functions.
3.1.7.1. MANUAL
1. Select the sequence(s) to be moved.
2. Choose the Cut or Copy commands from the Edit menu, or use built-in cut/copy
keyboard function keys.
3. Select the site at which the sequences are to be placed (by selecting the sequence
immediately above the site).
4. Choose the Paste command from the Edit menu.
3.1.7.2. COMPUTER-BASED
1. Select the sequence(s) or region(s) to be sorted.
2. Choose the Sort command from the Edit menu.
3. Choose the Primary and Secondary Sort Fields (group, type, name, sequence ID,
creator, offset).
4. Click the OK button.
5. A new GDE window with the results will appear.
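The effect of choosing a primary and a secondary sort field can be illustrated with a short Python sketch. The field names follow the Sort dialog, but the records, the function name sort_sequences, and the data structure are made up for illustration and are not how GDE stores sequences.

```python
from operator import itemgetter

# Made-up records standing in for sequences in an alignment window.
sequences = [
    {"name": "beta_human", "group": 2, "type": "protein"},
    {"name": "gamma_mouse", "group": 1, "type": "protein"},
    {"name": "beta_mouse", "group": 2, "type": "protein"},
    {"name": "gamma_human", "group": 1, "type": "protein"},
]

def sort_sequences(records, primary, secondary):
    """Order by the primary field, breaking ties with the secondary field."""
    return sorted(records, key=itemgetter(primary, secondary))

ordered = sort_sequences(sequences, "group", "name")
print([rec["name"] for rec in ordered])
```

The function returns a new list and leaves the input untouched, which mirrors the way the Sort command presents its results in a new window rather than reordering the current one.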
3.1.8. Extracting Sequences/Regions
Sometimes it is helpful to extract subsets of sequences or regions of
sequences into a new alignment window. This can be done in either of the
following two ways.
3.1.8.1. DIRECT
1. Select the sequence(s) or region(s).
2. Choose Extract from the Edit menu.
3. A new GDE window with the results will appear.
3.1.8.2. INDIRECT
1. Select the sequence(s) or region(s).
2. Choose Save Selection from the File menu (see Section 3.1.5.2.).
3. Use the Open command to reopen this saved selection (see Section 3.1.3.).
3.1.9. Grouping Sequences
Grouping of sequences allows editing functions to be performed on all members
of the group at the same time. This feature is particularly useful for aligning
sequences by hand. For example, if one had separate alignments of 30 gamma
globins and 30 beta globins and wanted to align them together manually, it might
be easiest to group all of the beta globins into one group and all of the gammas into
another. Then, alignment gaps could be placed in all gammas at the same time and
all betas at the same time by entering the gap into only one of the members of the
group. If one then wanted to put a gap in only one or a few of the beta globins, they
could be ungrouped and the gap could be placed in just those few. When editing
functions are attempted on one member of a group, only those actions that are
permitted for all members of the group will be allowed (see Section 3.1.10.).
Regions cannot be grouped, only sequences can. To change sequence groups:
1. Select the sequence(s) to be grouped or ungrouped.
2. Choose Group or Ungroup from the Edit menu.
3. If any of the sequences selected are part of another group, the user will be asked
whether to merge the groups or to create a new one.
4. A number will be placed to the left of the short sequence name(s) to indicate
group status.
3.1.10. Sequence Protections
GDE allows for the protection of sequences against accidental modification.
There are four different types of modifications allowed during editing. The default
is to allow only modification of alignment gaps and translations. Depending on the
type of sequence (DNA, protein, text, mask), “ambiguous” characters are
different. For example, N is ambiguous for DNA and RNA, but is not for protein.
1. Select the sequence(s).
2. Choose Protections from the File menu.
3. Select the modifications allowed (unambiguous characters, ambiguous charac-
ters, alignment gaps, translations).
4. Click the Done button when finished.
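One way to picture the four protection settings, and the rule from Section 3.1.9. that a grouped edit is allowed only if every member of the group permits it, is as a set of flags. The sketch below is a hypothetical model written for illustration. The class name Edit, the function allowed_for_group, and the example group are assumptions, not GDE code.

```python
from enum import Flag, auto

class Edit(Flag):
    UNAMBIGUOUS = auto()   # unambiguous characters
    AMBIGUOUS = auto()     # ambiguous characters (e.g., N in DNA/RNA)
    GAPS = auto()          # alignment gaps
    TRANSLATIONS = auto()  # translations

# Default stated in the text: only gaps and translations may be modified.
DEFAULT = Edit.GAPS | Edit.TRANSLATIONS

def allowed_for_group(member_protections):
    """Return the edits permitted by every member of a sequence group."""
    allowed = Edit.UNAMBIGUOUS | Edit.AMBIGUOUS | Edit.GAPS | Edit.TRANSLATIONS
    for protections in member_protections:
        allowed &= protections  # keep only what this member also permits
    return allowed

# Made-up group: the third member permits gap edits only.
group = [DEFAULT, DEFAULT | Edit.AMBIGUOUS, Edit.GAPS]
print(Edit.GAPS in allowed_for_group(group))          # True
print(Edit.TRANSLATIONS in allowed_for_group(group))  # False
```

Taking the intersection means the most restrictive member decides what can be done to the whole group, so ungrouping is needed before making an edit that only some members allow.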
3.1.11. Repeat Counts
Repeat counts allow the user to repeat a keystroke any number of times by
typing the number corresponding to the desired number of repeats immediately
prior to the key being typed. This is very useful for manual sequence alignment
(for inserting or removing multiple gap characters) and for moving the cursor a
defined number of spaces (see Section 3.2.). Repeat counts will not work when
the cursor is in a text or mask sequence because numbers can be used as input.
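The repeat-count behavior can be modeled as a small keystroke expander: digits typed before a key set how many times that key is applied, unless the sequence type accepts digits as data. This is a sketch of the idea only. The function name expand_keystrokes and the numbers_are_input switch are assumptions, and GDE's actual key handling may differ.

```python
def expand_keystrokes(keys: str, numbers_are_input: bool = False) -> str:
    """Expand digit prefixes into repeated keystrokes, e.g. '3-' -> '---'."""
    if numbers_are_input:  # text or mask sequence: digits are ordinary input
        return keys
    out = []
    count = ""
    for key in keys:
        if key in "0123456789":
            count += key  # accumulate a (possibly multi-digit) repeat count
        else:
            out.append(key * (int(count) if count else 1))
            count = ""
    return "".join(out)

print(expand_keystrokes("5-A12-"))
```

In this model a count with no key after it is simply dropped, and a key with no count is applied once.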
3.1.12. Printing
The GDE has two means of printing sequences or alignments. Normal GDE
printing allows printing of sequences and alignments with a variety of Unix
commands as well as viewing and editing the file to be printed. Sequences can
also be printed with the PrettyPrint format of the readseq program. PrettyPrint
output is designed for publishing and presentation of alignments and can
produce very polished figures. Both printing commands are accessible from the
File menu.
3.1.13. Cursor Position
The cursor is identified by the flashing horizontal line in the sequence text
section of the GDE window. It is used in essentially the same way as the cursor
in most word processing programs. First and foremost, the cursor marks the
spot at which editing commands are performed and text selections begin. In
addition, it can be used to mark a place for quick returns if the screen is scrolled
to another page. Information about the cursor position is displayed in the status
line (Fig. 1). To move the cursor, either point to a new region and click with the
left mouse button or move with the arrow keys (repeat numbers can be typed
before the arrow keys to move a specific number of positions). If the cursor is
moved past the edge of the screen, scrolling will be activated and the next page
of sequence will be shown. Since scrolling can be performed without moving
the cursor (see Section 3.1.14.), the cursor may not always be visible in the
GDE window. The cursor may be hidden from view if the scroll-bars are used
to show a different region of sequence. To return the screen to display the
region of sequence where the cursor is, type one of the arrow keys. This function
(which I will refer to later as the return screen function) is helpful, but can lead to
some confusion. If you want to keep the view on the sequences you have scrolled
to, remember to change the cursor point to that region using the mouse.
3.1.14. Scrolling
Only a portion of most sequences will be viewable in a single GDE window.
The rest of the sequence can be viewed by scrolling to another page (to the
right or left). In addition, if an alignment contains many sequences, it may
be necessary to scroll up and down to see different sequences. Scrolling can be
performed in a variety of ways, including:
1. Click with the left mouse button on the arrows on the scrolling elevator (Fig. 1).
2. Click and drag in the center of the elevator.
3. Use the scroll-bar menu (which is opened by clicking with the right button on the
scroll-bar).
4. Click on the scroll-bar edges (the vertical lines at the edge of the scroll-bar). This
moves the window all the way to the beginning or end of an alignment.
5. Use the cursor arrows to move the cursor past one edge of a screen page (see
Section 3.1.13.).
3.1.15. Split Screens
A split screen allows the viewing of discontinuous regions of a particular
alignment. This can be used, for example, to insert gaps in the upstream portion of a
sequence while simultaneously monitoring the alignment of the downstream
portion, even thousands of bases away. Be careful not to have different vertical
positions for different screens; this will make identification of specific
sequences difficult. Vertical scrolling can be locked in the screen properties menu.
The region of the alignment shown in a particular screen can be changed in
three ways: by downstream manipulations of the sequence (such as insertion of
gaps) in another screen, by using the scrolling functions, or by using the return
screen function described in Section 3.1.13. The return screen function can
lead to much confusion when using split screens because this function only
operates on the active screen. The active screen is the one in which the mouse
pointer is located. Therefore, be sure to know which screen
the mouse is pointing to before you use the return screen function. For example,
imagine you are using the right screen to view the C-termini of a protein
alignment and the left screen to view the N-termini, and the cursor is in one of
the proteins in the N-termini. If you want to insert a few alignment gaps in this
protein’s N-terminus, be careful that the mouse is pointing to the left screen. If
it is pointing to the right screen when you type the alignment gaps, the right
screen will return to the position of the cursor and thus you will have two
screens showing the N-termini. Below are descriptions of the two ways to make
and remove split screens. Any number of split screens can be used at one time.
1. Point the mouse cursor at the edge of the scroll-bar.
2. Click and drag to create or remove split screens.
Alternatively:
1. Point and click the right mouse button on the scroll-bar.
2. Select Split Views or Unsplit Views from the pop-up menu.
