
Ebook The indispensable PC hardware book (3rd edition) Part 2


18 Our Mathematical Grandmother - The MathCo 8087
The four basic arithmetical operations with integers are already integrated on the 8086/88. It is
not surprising that the 8086/88 can handle neither floating-point numbers nor transcendental
functions; these are the job of the 8087 mathematical coprocessor. Compared to software emulation, it can improve performance by up to a factor of 100. Additionally, the 8087
supports an 8086/88 CPU in maximum mode with 68 new mnemonics.

18.1 8087 Number Formats and Numerical Instruction Set
As a mathematical coprocessor, the 8087 can process floating-point numbers directly. In the
same way as the 80286 and its successors, the 8087 represents all numbers in the temporary real
format according to the IEEE standard. Figure 6.3 (Chapter 6) shows the number formats that
are supported by the 8087. Unfortunately, the 8087 does not implement the IEEE standard for
floating-point numbers in a very strict way (not very surprising - the 8087 was available before
the standard). The 8087 numeric instruction set is slightly smaller than that for an i387 or
80287XL; for example, the FSETPM (set protected mode) instruction is (of course) missing.
Further, no functions for evaluating sine and cosine are available. But they can be constructed
with the help of the tangent. A detailed list of all 8087 instructions is given in Appendix C1.

18.2 8087 Pins and Signals
Like the 8086/88, the 8087 has 40 pins in all for inputting and outputting signals and supply voltages.
Usually, the 8087 comes in a 40-pin DIP package. Figure 18.1 shows the pin assignment of the 8087.
AD15-AD0 (I/O)
Pins 39, 2-16
These 16 connections form the 16 data bits when the 8087 is reading or writing data, as well as
the lower 16 address bits for addressing memory. As is the case with the 8086, these 16 pins
form a time-divisionally multiplexed address and data bus.
A19-A16/S6-S3 (I/O)
Pins 35-38
These four pins form the four high-order bits of the address bus, as well as four status signals,
and form a time-divisionally multiplexed address and control bus. During bus cycles controlled
by the 8087, the S6, S4 and S3 signals are reserved and held on a high level. Additionally, S5
is then always low. If the 8086/88 is controlling the bus then the 8087 observes the CPU activity
using the signals at pins S6 to S3.

Figure 18.1: 8087 pin assignment. The 8087 comes in a standard DIP package comprising 40 pins.

BHE/S7 (I/O)
Pin 34
This bus high enable signal indicates whether a byte is transferred on the high-order part AD15-AD8 of the data bus. When the 8086/88 is in control of the bus the 8087 observes the signal at
pin S7 supplied by the CPU.
BUSY (O)
Pin 23
If the signal at this pin is high then the 8087 is currently executing a numerical instruction.
Usually, BUSY is connected to the TEST pin of the 8086/88. The CPU checks the TEST pin and
therefore the BUSY signal to determine the completion of a numerical instruction.
CLK (I)
Pin 19
CLK is the clock signal for the 8087.
INT (O)
Pin 32
The signal output at this pin indicates that during the execution of a numerical instruction in
the 8087, an unmasked exception has occurred, for example an overflow. The output of this
signal can be suppressed by interrupt masking in the 8087.



QS1, QS0 (I, I)
Pins 24, 25
The signals at these pins indicate the status of the prefetch queue in the 8086/88. Thus, the 8087
can observe the CPU’s prefetch queue. For (QS1, QS0) the following interpretations hold:
(00) the prefetch queue is not active;
(01) the first byte of the opcode in the prefetch queue is processed;
(10) the prefetch queue is cancelled;
(11) a next byte of the opcode in the prefetch queue is processed.

READY (I)
Pin 22
The addressed memory confirms the completion of a data transfer from or to memory with a
high-level signal at READY. Therefore, like the 8086/88, the 8087 can also insert wait cycles if
the memory doesn’t respond quickly enough to an access.
RESET (I)
Pin 21
If this input is high for at least four clock cycles, the 8087 aborts its operation immediately and
carries out a processor reset.
RQ/GT0 (I/O)
Pin 31
The 8087 uses this pin to get control of the local bus from the 8086/88 so as to execute its own
memory cycles. RQ/GT0 is connected to the CPU’s RQ/GT1 pin. Normally, the 8086/88 is in
control of the bus to read instructions and data. If the 8087 accesses the memory because of a
LOAD or STORE instruction, it takes over control of the local bus. Therefore, both the 8086/88
and the 8087 can act as a local busmaster.
RQ/GT1 (I/O)
Pin 33
This pin may be used by another local busmaster to get control of the local bus from the 8087.
S2, S1, S0 (I/O)
Pins 28-26
These three control signals indicate the current bus cycle. For the combinations (S2, S1, S0) the
following interpretations hold for bus cycles controlled by the 8087:
(0XX) invalid;
(100) invalid;
(101) data is read from memory;
(110) data is written into memory;
(111) passive state.

If the 8086/88 is controlling the bus, the 8087 observes the CPU activity using the signals at pins
S2 to S0.

VCC (I)
Pin 40
This pin is supplied with the supply voltage of +5 V.
GND
Pins 1, 20
These pins are grounded (usually at 0 V).
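
The two status tables in this section (QS1/QS0 and S2-S0) translate directly into lookup tables. The following Python sketch is only an illustration of how the 8087 could interpret the sampled pin levels; the function names and the short status strings are hypothetical and do not come from Intel.

```python
# Meanings of the (QS1, QS0) queue status codes listed in the text.
QUEUE_STATUS = {
    (0, 0): "queue not active",
    (0, 1): "first opcode byte taken",
    (1, 0): "queue cancelled",
    (1, 1): "next opcode byte taken",
}

# Meanings of (S2, S1, S0) for bus cycles driven by the 8087 itself.
# Every combination that is missing here is listed as invalid in the text.
BUS_STATUS_8087 = {
    (1, 0, 1): "memory read",
    (1, 1, 0): "memory write",
    (1, 1, 1): "passive",
}


def decode_queue_status(qs1, qs0):
    """Translate one sample of the QS1/QS0 pins into its meaning."""
    return QUEUE_STATUS[(qs1, qs0)]


def decode_bus_status(s2, s1, s0):
    """Translate one sample of S2-S0 for an 8087-controlled bus cycle."""
    return BUS_STATUS_8087.get((s2, s1, s0), "invalid")


def count_started_instructions(samples):
    """Count how often the CPU took a first opcode byte from its queue."""
    return sum(1 for sample in samples if sample == (0, 1))
```

Counting the (01) codes in a trace of queue status samples gives the number of instructions the CPU has started, which is the kind of bookkeeping the 8087 needs in order to stay in step with the CPU.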

18.3 8087 Structure and Functioning

The control unit largely comprises a unit for bus control, data buffers, and a prefetch queue. The
prefetch queue is identical to that in the 8086/88 in a double sense:
- It has the same length. Immediately after a processor reset the 8087 checks by means of the
BHE/S7 signal whether it is connected to an 8086 or 8088. The 8087 adjusts the length of its
prefetch queue according to the length in the 8086 (six bytes) or 8088 (four bytes), respectively.
- The prefetch queue contains the same instructions. By synchronous operation of the 8086/88
and 8087, the same bytes (and therefore also the same instructions) are present in the
prefetch queues of both CPU and coprocessor.

Thus, the CU of the coprocessor monitors the data bus synchronously to and concurrently with
the CPU and fetches instructions to decode. Like the other 80x87 coprocessors, the 8087 also has
a status, control and tag word, as well as a register stack with eight 80-bit FP-registers. Additionally, the two registers for instruction and data pointers are implemented.
The status word format is shown in Figure 18.2. If bit B is set the numerical unit NU is occupied
by a calculation or has issued an interrupt that hasn’t yet been serviced completely. If the IR bit
is set, an unmasked exception has occurred and the 8087 has activated its INT output. In the
PC/XT an NMI is issued. (Beginning with the 80287, IR has been replaced by ES = error status.)
The meaning of the remaining bits C3-C0, TOP, PE, UE, OE, ZE, DE and IE is the same as for
the 80287.
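
The status word fields can be separated with a few shifts and masks. The following Python sketch is illustrative only: Figure 18.2 is not reproduced in this text, so the bit positions used here are an assumption taken from Intel's published 8087 status word layout, and `decode_status_word` is a hypothetical helper name.

```python
# Assumed bit positions (Intel 8087 status word layout):
# 15 = B, 14 = C3, 13-11 = TOP, 10 = C2, 9 = C1, 8 = C0,
# 7 = IR, 6 = reserved, 5 = PE, 4 = UE, 3 = OE, 2 = ZE, 1 = DE, 0 = IE.
SINGLE_BIT_FIELDS = {
    "B": 15, "C3": 14, "C2": 10, "C1": 9, "C0": 8,
    "IR": 7, "PE": 5, "UE": 4, "OE": 3, "ZE": 2, "DE": 1, "IE": 0,
}


def decode_status_word(sw):
    """Split a 16-bit status word into its named fields."""
    fields = {name: (sw >> bit) & 1 for name, bit in SINGLE_BIT_FIELDS.items()}
    fields["TOP"] = (sw >> 11) & 0b111  # 3-bit stack-top pointer
    return fields


status = decode_status_word(0x8080)
print(status["B"], status["IR"], status["TOP"])
```

With the value 0x8080 only bits 15 and 7 are set, so the sketch reports a busy numerical unit and a pending interrupt request.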

The 8087 generates an exception under various circumstances, but some exceptions may be
masked. Further, you are free to define various modes for rounding, precision and the representation of infinite values. For this purpose, the 8087 has a control word, shown in Figure 18.3.



Figure 18.3: 8087 control word


The IC bit controls the processing of infinite values. Projective infinity leads to only one value,
namely ∞. If you set IC equal to 1, then the 8087 operates with affine infinity, and two infinite
values +∞ and -∞ are possible. Beginning with the 80287XL, the IC bit is only present on
compatibility grounds because the IEEE standard allows affine infinity only. With the M bit, you
can mask interrupts globally, in which case the 8087 ignores all exceptions and doesn’t execute
an on-chip exception handler. This capability has also been removed with the 80287. The function of the remaining bits PM, UM, OM, ZM, DM and IM is the same as in the i387 (Section 6.5).
You will find the 8087 tag word in Section 6.5; it is identical to that in the i387. Moreover, the
memory images of the instruction and data pointers match those for the 16-bit real format in the
i387. They are shown in Figure 6.10.
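
As an illustration of how the control word bits fit together, the following Python sketch decodes a 16-bit control word. Figure 18.3 is not reproduced in this text, so the bit positions and encodings are an assumption based on Intel's published 8087 control word layout (IC in bit 12 with 1 = affine, rounding control in bits 11-10, precision control in bits 9-8, the interrupt mask M in bit 7, the six exception masks in bits 5-0). The function name is hypothetical.

```python
ROUNDING = {0: "nearest", 1: "down", 2: "up", 3: "chop"}
PRECISION = {0: "24 bits", 1: "reserved", 2: "53 bits", 3: "64 bits"}
MASK_NAMES = ["IM", "DM", "ZM", "OM", "UM", "PM"]  # bits 0 to 5


def decode_control_word(cw):
    """Split a 16-bit control word into its fields (assumed layout)."""
    return {
        "infinity": "affine" if (cw >> 12) & 1 else "projective",
        "rounding": ROUNDING[(cw >> 10) & 0b11],
        "precision": PRECISION[(cw >> 8) & 0b11],
        "M": (cw >> 7) & 1,
        "masks": {name: (cw >> bit) & 1 for bit, name in enumerate(MASK_NAMES)},
    }


# 0x03FF is, according to Intel's documentation, the value loaded by FINIT.
print(decode_control_word(0x03FF))
```

Under these assumptions the initialized 8087 works with projective infinity, rounds to nearest, uses the full 64-bit mantissa, and has all exceptions masked.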

18.4 8087 Memory Cycles
An interesting difference between the 8087 and all later 80x87 models lies in the memory
access: the 8087 can access memory on its own; there are no I/O cycles between CPU and
coprocessor.
The 8086/88 distinguishes instructions with memory access from pure arithmetical instructions
handled by the 8087. For an instruction with a memory operand, the CPU calculates the operand
address according to the addressing scheme indicated, and then executes a dummy read cycle. This cycle differs from a
normal read cycle only in that the CPU ignores the data supplied by the memory. If the CPU
recognizes a coprocessor instruction without a memory operand, it continues with the next
instruction after the 8087 has signalled via its BUSY pin that it has completed the current
instruction.
The 8087 also behaves differently for instructions with and without a memory operand. Without
a memory operand, it simply executes an instruction such as FSQRT (square root of a floating-point
number). For an instruction with a memory operand it uses the 8086/88 dummy read cycle in
the following way:
- Fetching an operand from memory: the 8087 reads the address supplied by the CPU in the
dummy read cycle via the address bus and stores it in an internal temporary register. Then
the 8087 reads the data word that is put onto the data bus by the memory. If the operand
is longer than the data word transferred within this read cycle, the 8087 requests control of
the local bus from the 8086/88. Now the 8087 carries out one or more succeeding read cycles
on its own. The coprocessor uses the memory address fetched during the course of the
dummy read cycle and increments it until the whole memory operand is read. For example,
in the case of the 8088/87 combination, eight memory read cycles are necessary to read a
floating-point number in long real format. Afterwards, the 8087 releases control of the local
bus to the 8086/88 again.
- Writing an operand into memory: in this case the coprocessor also fetches the address output
by the CPU in a dummy read cycle, but ignores the memory data appearing on the data bus.
Afterwards, the 8087 takes over control of the local bus and writes the operand into memory,
starting with the fetched address, in one or more write cycles.
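
The number of read cycles for an operand follows from its size and the width of the data bus. The following Python sketch reproduces the figure of eight cycles quoted for the 8088/87 combination. It assumes the usual operand sizes of the 80x87 formats (short real 4 bytes, long real 8 bytes, temporary real 10 bytes) and, for the 16-bit bus of the 8086, a word-aligned operand; the names are hypothetical.

```python
import math

# Operand sizes in bytes (assumed from the standard 80x87 number formats).
OPERAND_BYTES = {"short real": 4, "long real": 8, "temporary real": 10}

# Data bus width in bytes for the two CPUs.
BUS_BYTES = {"8086": 2, "8088": 1}


def read_cycles(fmt, cpu):
    """Total read cycles for one operand, including the CPU's dummy read."""
    return math.ceil(OPERAND_BYTES[fmt] / BUS_BYTES[cpu])


def cycles_by_8087(fmt, cpu):
    """Read cycles the 8087 must run itself after taking over the bus."""
    return read_cycles(fmt, cpu) - 1


print(read_cycles("long real", "8088"))  # eight cycles, as stated in the text
```

The first cycle is the dummy read of the CPU, during which the 8087 picks up both the address and the first data word; only the remaining cycles need the request/grant handshake.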

Because of the dummy read cycle the 8087 doesn’t need its own addressing unit to determine
the effective address of the operand with segment, offset and displacement. This is advantageous because the 8087, with its 75 000 transistors, integrates far more components on a single
chip compared to the 28 000 transistors of the 8086/88, and space is at a premium (remember
that the 8087 was born in the 1970s).
The 8087 also uses the 8086/88 addressing unit if new instructions have to be fetched into the
prefetch queue. The CPU addresses the memory to load one or two bytes into the prefetch
queue. These instruction bytes appear on the data bus. The processor status signals keep the
8087 informed about the prefetch processes, and it monitors the bus. If the instruction bytes
from memory appear on the data bus, the 8087 (and also the 8086/88, of course) loads them into
the prefetch queue.
For the data transfer between memory and coprocessor, no additional I/O bus cycles between
CPU and 8087 are necessary. By comparison, the LOAD and STORE instructions require more time
on an 80287. Don’t be surprised if, for pure mathematical applications, a 10 MHz XT with an
8087 coprocessor is nearly as fast as a 10 MHz AT with an 80287. The 80287 (without XL) runs
only at two-thirds of the CPU speed, thus at 6.67 MHz. Moreover, it requires the additional
I/O bus cycles between CPU and 80287 when accessing memory. However, the 80286/80287
combination cancels this disadvantage with a more effective bus cycle lasting for only two clock
cycles per data transfer at zero wait states, compared to the four clock cycles of the 8086/8087
combination. In the end, both systems give about the same performance.
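
The figures in this comparison can be checked with a few lines of arithmetic. The following Python sketch is only a back-of-the-envelope calculation using the numbers from the text (10 MHz CPU clock, four or two clock cycles per bus cycle, two-thirds clock ratio for the 80287); the function names are hypothetical.

```python
def bus_cycle_ns(clock_mhz, clocks_per_cycle):
    """Duration of one bus cycle at zero wait states, in nanoseconds."""
    return clocks_per_cycle * 1000.0 / clock_mhz


def coprocessor_clock_mhz(cpu_mhz, ratio=2.0 / 3.0):
    """Clock of an 80287 that runs at two-thirds of the CPU speed."""
    return cpu_mhz * ratio


xt_cycle = bus_cycle_ns(10, 4)   # 8086/8087: four clocks per transfer
at_cycle = bus_cycle_ns(10, 2)   # 80286/80287: two clocks per transfer
print(xt_cycle, at_cycle, round(coprocessor_clock_mhz(10), 2))
```

The AT moves data in half the time per transfer, while its coprocessor computes at only two-thirds of the clock rate, which is why the two effects roughly cancel.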

18.5 8086/8087 System Configuration
Figure 18.4 shows typical wiring of the 8087 coprocessor and CPU 8086/88. As they are
busmasters, both chips access the same local bus which is connected to memory, the I/O address space and the bus slots via the 8288 bus controller. The 8086/88 and the 8087 read and
decode the same instruction stream at the same speed, thus they operate synchronously and are
supplied with the same clock signal (CLK) by the 8284 clock generator. All higher coprocessors,
however, such as the 80287, 387, etc., run asynchronously to the CPU. For synchronous operation
of the 8086/88 and 8087, the 8087 must always know the current state of the 8086/88.
The 8087 can process its instructions independently of the CPU. Even concurrent (parallel)
execution of instructions is possible, but here the problem of resynchronization arises after
completion of the coprocessor instruction. After decoding the current ESC instruction, the 8086/88
would prefer to execute the next instruction at once, but cannot do so because the CPU has
to wait for the coprocessor. Because of this, the BUSY pin of the 8087 is connected to the
TEST pin of the 8086/88. When the coprocessor executes an instruction it activates the BUSY
signal. When it has completed the instruction, it deactivates the signal. The WAIT instruction
of the 8086/88 causes the CPU to check the TEST pin continuously to observe the BUSY state
of the coprocessor. Only when the 8087 has deactivated BUSY to signal to the 8086/88 that the
current instruction is completed and the 8087 is ready to accept further numeric instructions
does the CPU continue with the next instruction. Via the QS0 and QS1 pins, the 8087 detects the
status of the 8086/88’s prefetch queue to observe the CPU’s operation. Thus, the 8086/88 and
8087 always operate synchronously.

Figure 18.4: 8086/8087 system configuration. The 8087 harmonizes especially well with the 8086/88, and can
therefore be connected to the 8086/88 without difficulties. The 8087 uses the same bus controller, the same clock
generator, and the same interrupt controller as the CPU.
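
The BUSY/TEST handshake can be pictured with a small simulation. The following Python sketch is a deliberately simplified toy model, not a cycle-accurate description: it assumes that the CPU samples TEST once per clock, and the class and function names are hypothetical.

```python
class Coprocessor:
    """Toy model of the 8087: BUSY is high while clocks remain."""

    def __init__(self):
        self.remaining_clocks = 0

    def start(self, clocks):
        """Begin a numeric instruction that takes the given number of clocks."""
        self.remaining_clocks = clocks

    @property
    def busy(self):
        """Level of the BUSY pin, which is wired to the CPU's TEST pin."""
        return self.remaining_clocks > 0

    def tick(self):
        """Advance the coprocessor by one clock."""
        if self.remaining_clocks > 0:
            self.remaining_clocks -= 1


def wait(coprocessor):
    """Model of WAIT: poll TEST until BUSY drops; return the clocks spent."""
    clocks = 0
    while coprocessor.busy:
        coprocessor.tick()
        clocks += 1
    return clocks


npx = Coprocessor()
npx.start(5)           # a made-up instruction length
print(wait(npx))       # the CPU idles until the coprocessor is finished
```

If the coprocessor is already idle, `wait` returns immediately, which mirrors the fact that WAIT costs almost nothing when no numeric instruction is in progress.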
If an error or an exception occurs during a numerical calculation in the coprocessor, such as
overflow or underflow, the 8087 activates its INT output to issue a hardware interrupt request
to the CPU. Usually, the INT signal of the 8087 is managed by an interrupt controller (the 8259A,
for example) and then applied to the 8086/88. But the PC/XT does it in another way: the 8087
hardware interrupt request is supplied to the NMI input of the 8086/88. The PC/XT has only
one 8259A PIC and must therefore save IRQ channels. Note that besides the coprocessor interrupt, an error on an extension adapter or a memory parity error may also issue an NMI corresponding to interrupt 2. Thus, the interrupt handler must be able to locate the source of an NMI.
Figure 18.4 demonstrates that both the 8086/88 and the 8087 can access the local bus, to read
data from memory, for example. 8086/88 instructions such as MOV reg, mem or the LOAD
instruction of the 8087 carry out a memory access. Thus there are two busmasters, each using
the local bus independently. A simultaneous access of the local bus by the CPU and coprocessor
would give rise to a conflict between them, with disastrous consequences. Therefore, only one
of these two processors may control the local bus, and the transfer of control between them must
be carried out in a strictly defined way. Because of this, the RQ/GT1 pin of the 8086/88
and the RQ/GT0 pin of the 8087 are connected. From the description above you can see that these
pins serve to request and grant local bus control. The 8087 uses the RQ/GT0 pin to get control
of the local bus for data transfers to and from memory. The RQ/GT1 pin is available for other
busmasters, for example the 8089 I/O coprocessor. Therefore, CPU and coprocessor may alternate in controlling the local bus. The 8087 bus structure and its bus control signals are equivalent
to those of the 8086/88.


19 Memory Chips



Virtually no other computer element has been the subject of such almost suicidal competition
between the world’s leading microelectronic giants over the past ten years as memory chips. At
the beginning of the PC era, 64 kbit chips and 16 kbit chips were considered to be high-tech. But
today in our PCs, 16 Mbit chips are used, and 256 Mbit chips are already running in several
laboratories.

Note that the storage capacity of memory chips is always indicated in bits and not in bytes.
Today’s most common 4 Mb memory chip is therefore able to hold four million bits, or 512 kbytes.
For a main memory of 4 Mbytes, eight of these chips (plus one for parity) are thus required.
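
The conversion between chip capacity in bits and main memory size in bytes is simple arithmetic. The following Python sketch repeats the calculation above; it assumes binary prefixes (1 Mbit = 2^20 bits) and one parity chip for every eight data chips, and the function names are hypothetical.

```python
MBIT = 2 ** 20    # bits in one Mbit (binary prefix, assumed)
MBYTE = 2 ** 20   # bytes in one Mbyte


def chip_capacity_bytes(chip_bits):
    """Convert a chip capacity given in bits into bytes."""
    return chip_bits // 8


def chips_for_memory(memory_bytes, chip_bits, parity=False):
    """Number of chips needed for a main memory of the given size."""
    data_chips = memory_bytes // chip_capacity_bytes(chip_bits)
    if parity:
        # one additional parity chip per group of eight data chips
        return data_chips + data_chips // 8
    return data_chips


print(chip_capacity_bytes(4 * MBIT) // 1024)               # kbytes per chip
print(chips_for_memory(4 * MBYTE, 4 * MBIT))               # data chips
print(chips_for_memory(4 * MBYTE, 4 * MBIT, parity=True))  # with parity
```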

The technological problems of manufacturing such highly-integrated electronic elements are
enormous. The typical structure size is only about 1 µm, and with the 64 Mbit chip it will
be even less (about 0.3 µm). Human hairs are at least 20 times thicker. Moreover, all transistors
and other elements must operate correctly (and at enormous speed); after all, on a 64 Mbit chip
there are more than 200 million (!) transistors, capacitors and resistors. If only one of these
elements is faulty, then the chip is worthless (but manufacturers have integrated redundant
circuits to repair minor malfunctions that will then only affect the overall access time). Thus, it
is not surprising that the development of these tiny and quite cheap chips costs several hundred
million dollars.
For the construction of highly integrated memory chips the concept of dynamic RAM (DRAM)
is generally accepted today. If only the access speed is in question (for example, for fast cache
memories), then static RAM (SRAM) is used. But both memory types have the disadvantage that
they lose their ability to remember as soon as the power supply is switched off or fails. They
store information in a volatile manner. For the boot routines and the PC BIOS, therefore, only
a few types of ROM are applicable. These memories also hold the stored information after a
power-down. They store information in a non-volatile manner, but their contents may not be
altered, or at least only with some difficulty.

19.1 Small and Cheap - DRAM
The name dynamic RAM (DRAM) comes from the operation principle of these memory chips.
They represent the stored information using charges in a capacitor. However, all capacitors have
the disadvantageous characteristic of losing their charge with the lapse of time, so the chip loses
the stored information. To avoid this the information must be refreshed periodically or “dynamically”, that is, the capacitor is recharged according to the information held. Figure 19.1
shows the pin assignment of a 16 Mb chip as an example. Compared with the processors, we only
have to discuss a few pins here.
A9-A0 (I)
Pins 21-24, 27-32
These ten pins are supplied with the row and column addresses of the accessed memory cells.

Figure 19.1: Pin assignment of a 16 Mb chip.

LCAS, UCAS (I)
Pins 35, 34

If the corresponding column address strobe pin is on a low level then the DRAM strobes the
supplied address and processes it as a column address. LCAS is assigned the low-order data
byte IO7-IO0, UCAS the high-order data byte IO15-IO8. LCAS, UCAS and RAS serve as
address control signals for the DRAM chip.
IO15-IO0 (I/O)
Pins 2-5, 7-10, 41-44, 46-49
These pins are supplied with the write data during a write access, and they provide the read
data in the course of a read access.
RAS (I)
Pin 18
If the memory controller applies a low-level signal at this row address strobe pin then the
DRAM latches the supplied address and interprets it as a row address.
WE (I)
Pin 17
If the write-enable signal at this pin is on a low level then the DRAM performs a write access.
OE (I)
Pin 33
If the output-enable signal at this pin is on a low level then data is read from the addressed
memory cell and output.



VCC (I)
Pins 1, 6, 25
These pins are supplied with the supply voltage.
GND
Pins 26, 45, 50
These pins are grounded.

19.1.1 Structure and Operation Principle
For data storage, reading the information, and the internal management of the DRAM, several
functional groups are necessary. Figure 19.2 shows a typical block diagram of a dynamic RAM.

Figure 19.2: Block diagram of a dynamic RAM. The memory cells are arranged in a matrix, the so-called memory
cell array. The address buffer sequentially accepts the row and column addresses and transmits them to the row
and column decoder, respectively. The decoders drive internal signal lines and gates so that the data of the
addressed memory cell is transmitted to the data buffer after a short time period to be output.

The central part of the DRAM is the memory cell array. Usually, a bit is stored in an individually
addressable unit memory cell (see Figure 19.3), which is arranged together with many others
in the form of a matrix with rows and columns. A 4 Mbit chip has 4 194 304 memory cells
arranged in a matrix of, for example, 2048 rows and 2048 columns. By specifying the row and
column number, a memory cell is unambiguously determined.



The address buffer accepts the memory address output by the external memory controller
according to the CPU’s address. For this purpose, the address is divided into two parts, a row
and a column address. These two addresses are read into the address buffer in succession: this
process is called multiplexing. The reason for this division is obvious: to address one cell in a
4 Mbit chip with 2048 rows and 2048 columns, 22 address bits are required in total (11 for the
row and 11 for the column). If all address bits were to be transferred at once, 22 address pins
would also be required. Thus the chip package would become very large. Moreover, a large address
buffer would be necessary. For high integration, it is disadvantageous if all element groups that
establish a connection to their surroundings (for example, the address or data buffer) have to
be powerful and therefore occupy a comparably large area, because only then can they supply
enough current for driving external chips such as the memory controller or external data buffers.
Thus it is better to transfer the memory address in two portions. Generally, the address buffer
first reads the row address and then the column address. This address multiplexing is controlled
by the RAS and CAS control signals. If the memory controller passes a row address then it
simultaneously activates the RAS signal, that is, it lowers the level of RAS to low. RAS (row
address strobe) informs the DRAM chip that the supplied address is a row address. Now the
DRAM control activates the address buffer to fetch the address and transfers it to the row
decoder, which in turn decodes this address. If the memory controller later supplies the column
address then it activates the CAS (column address strobe) signal. Thus the DRAM control recognizes that the address now represents a column address, and activates the address buffer again.
The address buffer accepts the supplied address and transfers it to the column decoder. The
duration of the RAS and CAS signals as well as their interval (the so-called RAS-CAS delay)
must fulfil the requirements of the DRAM chip.
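
The division of a cell address into a row and a column part can be sketched in a few lines. The following Python example assumes the 2048 x 2048 organization mentioned above and, purely for illustration, that the upper 11 bits form the row address and the lower 11 bits the column address; which address bits a real memory controller routes to rows and columns depends on the design. The function names are hypothetical.

```python
ROW_BITS = 11   # 2048 rows
COL_BITS = 11   # 2048 columns


def split_address(cell_address):
    """Split a 22-bit cell address into (row, column)."""
    row = (cell_address >> COL_BITS) & ((1 << ROW_BITS) - 1)
    column = cell_address & ((1 << COL_BITS) - 1)
    return row, column


def multiplexed_sequence(cell_address):
    """Values placed on the 11 address pins, in the order they are strobed."""
    row, column = split_address(cell_address)
    return [("RAS", row), ("CAS", column)]


print(split_address(2048))           # first cell of the second row
print(multiplexed_sequence(4194303))  # last cell of the array
```

Eleven address pins are thus used twice, once with RAS and once with CAS, instead of providing 22 separate pins.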

The memory cell thus addressed outputs the stored data, which is amplified by a sense amplifier
and transferred to a data output buffer by an I/O gate. The buffer finally supplies the information as read data Dout via the data pins of the memory chip.
If data is to be written the memory controller activates the WE signal for write enable and applies
the write data Din to the data input buffer. Via the I/O gate and a sense amplifier, the information is amplified, transferred to the addressed memory cell, and stored. The precharge circuit
serves to support the sense amplifier (described later).
Thus the PC’s memory controller carries out three different jobs: dividing the address from the
CPU into a row and a column address that are supplied in succession, activating the signals
RAS, CAS and WE correctly, and transferring and accepting the write and read data, respectively. Moreover, advanced memory concepts such as interleaving and page mode request wait
cycles flexibly, and the memory controller must prepare the addressed memory chips accordingly
(more about this subject later). The raw address and data signal from the CPU is not suitable
for the memory, thus the memory controller is an essential element of the PC’s memory subsystem.

19.1.2 Reading and Writing Data
The 1-transistor-1-capacitor cell is mainly established as the common unit memory cell today.
Figure 19.3 shows the structure of such a unit memory cell and the I/O peripherals required to
read and write data.


Figure 19.3: Memory cell array and I/O peripherals. The unit memory cell for holding one bit comprises a
capacitor and a transistor. The word lines turn on the access transistors of a row and the column decoder selects
a bit line pair. The data of a memory cell is thus transmitted onto the I/O line pair and afterwards to the data
output buffer.

The unit memory cell has a capacitor which holds the data in the form of electrical charges, and
an access transistor which serves as a switch for selecting the capacitor. The transistor’s gate is
connected to the word line WLx. The memory cell array accommodates as many word lines
WL1 to WLn as rows are formed.
Besides the word lines the memory cell array also comprises so-called bit line pairs, each consisting of a bit line BL and its complement. The
number of these bit line pairs is equal to the number of columns in the memory cell array. The
bit lines are alternately connected to the sources of the access transistors. Finally, the unit
memory cell contains the capacitor which constitutes the actual memory element of the cell. One of
its electrodes is connected to the drain of the corresponding access transistor, and the other is
grounded.


The regular arrangement of access transistors, capacitors, word lines and bit line pairs is
repeated until the chip’s capacity is reached. Thus, for a 4 Mbit memory chip, 4 194 304 access
transistors, 4 194 304 storage capacitors, 2048 word lines and 2048 bit line pairs are formed.
Of particular significance for detecting memory data during the course of a read operation is the
precharge circuit. In advance of a memory controller access and the activation of a word line
(which is directly connected to this access), the precharge circuit charges all bit line pairs up to
half of the supply potential, that is, Vcc/2. Additionally, the bit line pairs are short-circuited by
a transistor so that they are each at an equal potential. If this equalizing and precharging process
is completed, then the precharge circuit is again deactivated. The time required for precharging
and equalizing is called the RAS precharge time. Only once this process is finished can the chip
carry out an access to its memory cells. Figure 19.4 shows the course of the potential on a bit
line pair during a data read.
When the memory controller addresses a memory cell within the chip the controller first supplies the row address signal, which is accepted by the address buffer and transferred to the row
decoder. At this time the two bit lines of a pair have the same potential Vcc/2. The row decoder
decodes the row address signal and activates the word line corresponding to the decoded row
address. Now all the access transistors connected to this word line are switched on. The charges
of all the storage capacitors of the addressed row flow onto the corresponding bit line (time t1
in Figure 19.4). In the 4 Mbit chip concerned, 2048 access transistors are thus turned on and
the charges of 2048 storage capacitors flow onto the 2048 bit line pairs.


The problem, particularly with today’s highly integrated memory chips, is that the capacitance of
the storage capacitors is far less than the capacitance of the bit lines connected to them by the access
transistors. Thus the potential of the bit line changes only slightly, typically by ±100 mV (t2). If
the storage capacitor was empty, then the potential of the bit line slightly decreases; if charged
then the potential increases. The sense amplifier activated by the DRAM control amplifies the
potential difference on the two bit lines of the pair. In the first case, it draws the potential of the
bit line connected to the storage capacitor down to ground and raises the potential of the other
bit line up to Vcc (t3). In the second case, the opposite happens - the bit line connected to the
storage capacitor is raised to Vcc and the other bit line decreased to ground.
Without precharging and potential equalization by the precharge circuit, the sense amplifier
would need to amplify the absolute potential of the bit line. But because the potential change
is only about 100 mV, this amplifying process would be much less stable and therefore more
likely to fail, compared to the difference forming of the two bit lines. Here the dynamic range
is ±100 mV, that is, 200 mV in total. Thus the precharge circuit enhances reliability.
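
The small swing on the bit line follows from charge sharing between the storage capacitor and the much larger bit line capacitance. The following Python sketch uses the standard charge-conservation formula; the supply voltage of 5 V and the capacitance ratio of 1:24 are assumptions chosen only so that the result matches the roughly 100 mV quoted above, and the function name is hypothetical.

```python
def bit_line_swing(v_cell, vcc=5.0, c_cell=1.0, c_bit_line=24.0):
    """Change of the bit line potential after the access transistor opens.

    The bit line starts at the precharge level Vcc/2. Connecting the
    storage capacitor shares the charge between both capacitances:
        delta_v = (v_cell - Vcc/2) * c_cell / (c_cell + c_bit_line)
    """
    v_precharge = vcc / 2.0
    return (v_cell - v_precharge) * c_cell / (c_cell + c_bit_line)


charged = bit_line_swing(5.0)   # capacitor holds Vcc
empty = bit_line_swing(0.0)     # capacitor is discharged
print(round(charged * 1000), round(empty * 1000))  # swing in millivolts
```

Because the bit line starts exactly halfway between the two possible cell voltages, a charged and an empty cell produce swings of equal size and opposite sign, which is what the sense amplifier exploits.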
Each of the 2048 sense amplifiers supplies the amplified storage signal at its output and applies
the signal to the I/O gate block. This block has gate circuits with two gate transistors, each
controlled by the column decoder. The column decoder decodes the applied column address
signal (which is applied after the row address signal), and activates exactly one gate. This means
that the data of only one sense amplifier is transmitted onto the I/O line pair (I/O and its complement) and
transferred to the output data buffer. Only now, and thus much later than the row address, does
the column address become important. Multiplexing of the row and column address therefore
has no adverse effect, as one might expect at a first glance.
The output data buffer amplifies the data signal again and outputs it as output data Dout. At the
same time, the potentials of the bit line pairs are on a low or a high level according to the data
in the memory cell that is connected to the selected word line. Thus they correspond to the
stored data. As the access transistors remain on by the activated word line, the read-out data
is written back into the memory cells of one row. The reading of a single memory cell therefore
simultaneously leads to a refreshing of the whole row. The time period between applying the
row address and outputting the data Dout via the data output buffer is called RAS access time tRAS
or access time. The much shorter CAS access time tCAS is significant for certain high-speed modes.
This access time characterizes the time period between supplying the column address and
outputting the data Dout. Both access times are illustrated in Figure 19.4.
After completing the data output the row and column decoders as well as the sense amplifiers
are disabled again, and the gates in the I/O gate block are switched off. At that time the bit lines
are still on the potentials according to the read data. The refreshed memory cells are disconnected from the bit lines by the disabled word line, and the access transistors thus switched off.
Now the DRAM control activates the precharge circuit (t,), which lowers and increases, respectively, the potentials of the bit lines to Vcc/2 and equalizes them again (tJ. After stabilization
of the whole DRAM circuitry, the chip is ready for another memory cycle. The necessary time
period between stabilization of the output data and supply of a new row address and activation
of RAS is called recovery time or RAS precharge time tRP (Figure 19.4).
The total of RAS precharge time and access time gives the cycle time tcycle. Generally, the RAS
precharge time lasts about 80% of the access time, so that the cycle time is about 1.8 times
the access time. Thus, a DRAM with an access time of 100 ns has a cycle time of 180 ns.
Not until this 180 ns has elapsed may a new access to memory be carried out. Therefore, the
time period between two successive memory accesses is not determined by the short access time
but by the nearly doubled cycle time of 180 ns. If one adds the signal propagation delays between
CPU and memory on the motherboard of about 20 ns, then an 80286 CPU with an access time
of two processor clock cycles may not exceed a clock rate of 10 MHz, otherwise one or more
wait states must be inserted. Advanced memory concepts such as interleaving hide the RAS
precharge time so that in most cases only the access time is decisive. In page mode or static
column mode, the even shorter CAS access time determines the access rate. (More about these
subjects in Section 19.1.6.)
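The timing arithmetic of the last paragraphs can be retraced in a few lines of Python. This is only a sketch: the function and parameter names are chosen for this illustration, and the figures (100 ns access time, 80% precharge share, 20 ns propagation delay, two clock cycles per access) are the ones quoted in the text.

```python
def cycle_time_ns(access_ns, precharge_percent=80):
    """Cycle time = access time + RAS precharge time (a share of the access time)."""
    return access_ns * (100 + precharge_percent) / 100


def max_clock_mhz(cycle_ns, propagation_ns, clocks_per_access):
    """Highest CPU clock at which one memory cycle fits without wait states."""
    total_ns = cycle_ns + propagation_ns          # DRAM cycle plus board delays
    clock_period_ns = total_ns / clocks_per_access
    return 1000 / clock_period_ns                 # period in ns -> frequency in MHz


t_cycle = cycle_time_ns(100)
f_max = max_clock_mhz(t_cycle, propagation_ns=20, clocks_per_access=2)
print(f"cycle time: {t_cycle:.0f} ns, maximum clock: {f_max:.0f} MHz")
```

With these values the sketch reproduces the 180 ns cycle time and the 10 MHz limit for the 80286 mentioned above.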
The data write is carried out in nearly the same way as data reading. At first the memory controller
supplies the row address signal upon an active RAS. Simultaneously, it enables the control
signal WE to inform the DRAM that it should carry out a data write. The data Din to write is
supplied to the data input buffer, amplified and transferred onto the I/O line pair (I/O and its complement).
The data output buffer is not activated for the data write.
The row decoder decodes the row address signal and activates the corresponding word line. As
is the case for data reading, here also the access transistors are turned on and they transfer the
stored charges onto the bit line pairs (BLx and its complement). Afterwards, the memory controller activates the
CAS signal and applies the column address via the address buffer to the column decoder. It
decodes the address and switches on a single transfer gate through which the data from the
I/O line pair is transmitted to the corresponding sense amplifier. This sense amplifier amplifies
the data signal and raises or lowers the potential of the bit lines in the pair concerned according
to the value "1" or "0" of the write data. As the signal from the data input buffer is stronger
than that from the memory cell concerned, the amplification of the write data gains the upper
hand. The potential on the bit line pair of the selected memory cell reflects the value of the write
data. All other sense amplifiers amplify the data held in the memory cells so that after a short
time potentials are present on all bit line pairs that correspond to the unchanged data and the
new write data, respectively.
These potentials are transferred as corresponding charges into the storage capacitors. Afterwards,
the DRAM controller deactivates the row decoder, the column decoder and the data input
buffer. The capacitors of the memory cells are disconnected from the bit lines and the write
process is completed. As was the case for the data read, the precharge circuit sets the bit line
pairs to a potential level Vcc/2 again, and the DRAM is ready for another memory cycle.
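The read and write sequences described above can be mimicked with a small behavioral model. The following Python sketch is not a circuit simulation: the class name ToyDram, the leak factor and the representation of charge as a number between 0.0 and 1.0 are assumptions made for this illustration. It only shows that every access senses a whole row, restores it, and that a write overrides just the selected column.

```python
class ToyDram:
    """Hypothetical behavioral model of a DRAM cell array."""

    def __init__(self, rows, cols):
        # Each cell holds a charge level between 0.0 (empty) and 1.0 (full).
        self.cells = [[0.0] * cols for _ in range(rows)]

    def leak(self, factor=0.8):
        """Let every capacitor lose part of its charge (simplified leakage)."""
        for row in self.cells:
            for c in range(len(row)):
                row[c] *= factor

    def _sense_row(self, r):
        # Sense amplifiers decide relative to the midpoint (Vcc/2 in the text).
        return [1 if level > 0.5 else 0 for level in self.cells[r]]

    def read(self, r, c):
        bits = self._sense_row(r)                   # the whole row is amplified
        self.cells[r] = [float(b) for b in bits]    # and written back (refresh)
        return bits[c]                              # column decoder picks one bit

    def write(self, r, c, bit):
        bits = self._sense_row(r)                   # other cells keep their data
        bits[c] = bit                               # write data wins in column c
        self.cells[r] = [float(b) for b in bits]


dram = ToyDram(rows=4, cols=8)
dram.write(2, 5, 1)
dram.write(2, 7, 1)
dram.leak()                      # charges in row 2 drop to 0.8
print(dram.read(2, 5))           # reading one cell ...
print(dram.cells[2][7])          # ... has restored its neighbor in the same row
```

Reading cell (2, 5) returns 1, and afterwards cell (2, 7) holds the full level 1.0 again, although it was never addressed by a column address.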
Besides the memory cell with one access transistor and one storage capacitor, there are other cell
types with several transistors or capacitors. The structure of such cells is much more complicated, of course, and the integration of their elements gets more difficult because of their higher
number. Such memory types are therefore mainly used for specific applications, for example, a
so-called dual-port RAM where the memory cells have a transistor for reading and another
transistor for writing data so that data can be read and written simultaneously. This is advantageous, for example, for video memories because the CPU can write data into the video RAM
to set up an image without the need to wait for a release of the memory. On the other hand,
the graphics hardware may continuously read out the memory to drive the monitor. For this
purpose, VRAM chips have a parallel random access port used by the CPU for writing data into
the video memory and, further, a very fast serial output port that clocks out many bits at once,
for example a whole memory row. The monitor driver circuit can thus be supplied very quickly
and continuously with image data. The CRT controller need not address the video memory
periodically to read every image byte, and the CPU need not wait for a horizontal or vertical
retrace until it is allowed to read or write video data.
Instead of the precharge circuit, other methods can also be employed. For example, it is possible
to install a dummy cell for every column in the memory cell array that holds only half of the
charge that corresponds to a "1". In practice, this cell holds the value "1/2". The sense amplifiers then compare the potential read from the addressed memory cell with the potential of
the dummy cell. The effect is similar to that of the precharge circuit. Here too, a difference rather
than an absolute value is amplified.
It is not necessary to structure the memory cell array in a square form with an equal number
of rows and columns, for example a symmetrical design with 2048 rows and 2048 columns. The
designers have complete freedom in this respect. Internally, 4 Mbit chips often have 1024 rows
and 4096 columns simply because the chip is longer than it is wide. In this case, one of the
supplied row address bits is used as an additional (that is, 12th) column address bit internally.
The ten row address bits select one of 2^10 = 1024 rows, but the 12 column address bits select one
of 2^12 = 4096 columns. In high-capacity memory chips the memory cell array is also often
divided into two or more subarrays. In a 4 Mbit chip eight subarrays with 512 rows and 1024
columns may be present, for example. One or more row address bits are then used as the
subarray address; the remaining row and column address bits then only select a row or column
within the selected subarray.
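How an externally symmetrical address can be remapped onto an internal array of 1024 rows and 4096 columns is sketched below in Python. The function name is hypothetical, and the choice of which row address bit is moved is an assumption: the text only says that one row bit serves as the 12th column bit, not which one.

```python
def external_to_internal(ext_row, ext_col, ext_col_bits=11):
    """Map an external (row, column) address pair onto the internal array.

    Assumption for this sketch: the lowest of the 11 row address bits is
    the one reused internally as the 12th (highest) column address bit.
    """
    moved_bit = ext_row & 1                          # row bit that becomes a column bit
    int_row = ext_row >> 1                           # 10 remaining row bits: 0..1023
    int_col = (moved_bit << ext_col_bits) | ext_col  # 12 column bits: 0..4095
    return int_row, int_col


print(external_to_internal(2047, 2047))   # highest external address
print(external_to_internal(5, 3))
```

The highest external address (2047, 2047) lands on internal row 1023 and column 4095, so the 2048 x 2048 address space exactly fills the 1024 x 4096 array.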
The word and bit lines thus get shorter and the signals become stronger. But as a disadvantage,
the number of sense amplifiers and I/O gates increases. Such methods are usual, particularly
in the new highly-integrated DRAMs, because with the cells getting smaller and smaller
and the capacitors therefore having less capacity, the long bit lines "eat" the signal before it can reach
the sense amplifier. Which concept a manufacturer implements for the various chips cannot be
recognized from the outside. Moreover, these concepts are often kept secret so that competitors
don't get an insight into their rivals' technologies.

19.1.3 Semiconductor Layer Structure
The following sections present the usual concepts for implementing DRAM memory cells.
Integrated circuits are formed by layers of various materials on a single substrate. Figure 19.5
is a sectional view through such a layer structure of a simple DRAM memory cell with a plane
capacitor. In the lower part of the figure, a circuit diagram of the memory cell is additionally
illustrated.
The actual memory cell is formed between the field oxide films on the left and right sides. The
field oxides separate and isolate the individual memory cells. The gate and the two n-doped
regions source and drain constitute the access transistor of the memory cell. The gate is separated from the p-substrate by a so-called gate isolation or gate oxide film, and controls the
conductivity of the channel between source and drain. The capacitor in its simplest configuration is formed by an electrode which is grounded. The electrode is separated by a dielectric
isolation film from the p-substrate in the same way as the gate, so that the charge storage takes
place below the isolation layer in the substrate. To simplify the interconnection of the memory
cells as far as possible, the gate simultaneously forms a section of the word line and the drain
is part of the bit line. If the word line W is selected by the row decoder, then the electric field
below the gate that is part of the word line lowers the resistance value of the channel between
source and drain. Capacitor charges may thus flow away through the source-channel-drain
path to the bit line BL, which is connected to the n-drain. They generate a data signal on the bit
line pair (BL and its complement), which in turn is sensed and amplified by the sense amplifier.

Figure 19.5: A typical DRAM cell. The access transistor of the DRAM cell generally consists of an MOS
transistor. The gate of the transistor simultaneously forms the word line, and the drain is connected to the bit
line. Charges that represent the stored information are held in the substrate in the region below the electrode.
(Labels in the figure: GND, W, BL.)

A problem arising in connection with the higher integration of the memory cells is that the size
of the capacitor, and thus its capacity, decreases. Therefore, fewer and fewer charges can be
stored between electrode and substrate. The data signals during a data read become too weak
to ensure reliable operation of the DRAM. With the latest 4 Mbit chip the engineers therefore
went over to a three-dimensional memory cell structure. One of the concepts used is shown in
Figure 19.6, namely the DRAM memory cell with trench capacitor.
In this memory cell type the information charges are no longer stored simply between two plane
capacitor electrodes, but the capacitor has been extended into the depth of the substrate. The
facing area of the two capacitor electrodes thus becomes much larger than is possible with an
ordinary plane capacitor. The memory cell can be miniaturized and the integration density
increased without decreasing the amount of charge held in the storage capacitor. The read-out
signals are strong enough and the DRAM chip also operates very reliably at higher integration
densities.
Unfortunately, the technical problems of manufacturing such tiny trenches are enormous. We
must handle trench widths of about 1 µm at a depth of 3-4 µm here. For manufacturing such
small trenches completely new etching techniques had to be developed which are anisotropic,
and therefore etch more in depth than in width. It was two years before this technology was
reliably available. Also, doping the source and drain regions as well as the dielectric layer
between the two capacitor electrodes is very difficult. Thus it is not surprising that only a few
big companies in the world with enormous financial resources are able to manufacture these
memory chips.

Figure 19.6: Trench capacitor for highest integration densities. To enhance the electrode area of the storage
capacitor, the capacitor is built into the depth of the substrate. Thus the memory cells can move closer together
without decreasing the stored charge per cell. (Labels in the figure: GND, W, BL, Charge Storage Area.)
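A rough geometric estimate shows why the trench helps. The Python sketch below compares the electrode area of a flat capacitor with that of a trench of the same footprint, using the dimensions quoted for such trenches (about 1 µm wide, 3-4 µm deep). It assumes an idealized square trench with vertical walls whose bottom and four walls all act as electrode area; real cells deviate from this, so the result is only an order-of-magnitude illustration.

```python
def planar_area_um2(width_um):
    """Electrode area of a flat, square capacitor with the given side length."""
    return width_um * width_um


def trench_area_um2(width_um, depth_um):
    """Electrode area of an idealized square trench: bottom plus four side walls."""
    return width_um * width_um + 4 * width_um * depth_um


for depth in (3, 4):
    ratio = trench_area_um2(1, depth) / planar_area_um2(1)
    print(f"depth {depth} um: about {ratio:.0f} times the planar electrode area")
```

Under these assumptions the trench offers roughly 13 to 17 times the electrode area on the same chip surface, and the storable charge grows accordingly if the dielectric stays the same.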
To enhance the integration density of memory chips, other methods are also possible and
applied, for example folded bit line structures, shared sense amplifiers, and stacked capacitors.
Lack of space prohibits an explanation of all these methods, but it is obvious that the memory
chips which appear to be so simple from the outside accommodate many high-tech elements
and methods. Without them, projects such as the 64 Mbit chip could not be realized.

19.1.4 DRAM Refresh
From Figure 19.5 you already know that the data is stored in the form of electrical charges in
a capacitor. As is true for all technical equipment, this capacitor is not perfect, that is, it
discharges over the course of time via the access transistor and its dielectric layer. Thus the
charges and therefore also the data held get lost. The capacitor must be recharged periodically. Remember that during the course of a memory read or write a refresh of the memory
cells within the addressed row is automatically carried out. Normal DRAMs must be refreshed
every 1-16 ms, depending upon the type. Currently, three refresh methods are employed: RAS-only refresh, CAS-before-RAS refresh and hidden refresh. Figure 19.7 shows the course of the
signals involved during these refresh types.


Figure 19.7: Three refresh types. (a) RAS-only refresh; (b) CAS-before-RAS refresh; (c) hidden refresh.

RAS-only Refresh
The simplest and most widely used method for refreshing a memory cell is to carry out a dummy read
cycle. For this cycle the RAS signal is activated and a row address (the so-called refresh address)
is applied to the DRAM, but the CAS signal remains disabled. The DRAM thus internally reads
one row onto the bit line pairs and amplifies the read data. But because of the disabled CAS
signal they are not transferred to the I/O line pair and thus not to the data output buffer. To
refresh the whole memory an external logic or the processor itself must supply all the row
addresses in succession. This refresh type is called RAS-only refresh. The disadvantage of this
outdated refresh method is that an external logic, or at least a program, is necessary to carry out
the DRAM refresh. In the PC this is done by channel 0 of the 8237 DMA chip, which is periodically activated by counter 1 of the 8253/8254 timer chip and issues a dummy read cycle. In
an RAS-only refresh, several refresh cycles can be executed successively if the CPU or refresh
control drives the DRAM chip accordingly.
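What the external logic has to do for a RAS-only refresh can be sketched in Python as follows. The class and function names are hypothetical, and the example values (512 rows, 8 ms refresh period) are assumptions picked from within the 1-16 ms range mentioned earlier, not data of a particular chip.

```python
class RefreshCounter:
    """External row address source for RAS-only refresh (hypothetical helper)."""

    def __init__(self, rows):
        self.rows = rows
        self.next_row = 0

    def refresh_address(self):
        """Return the row to refresh now and advance to the following row."""
        row = self.next_row
        self.next_row = (row + 1) % self.rows   # wrap around after the last row
        return row


def refresh_spacing_us(rows, refresh_period_ms):
    """Time between single-row refresh cycles if they are spread evenly."""
    return refresh_period_ms * 1000 / rows


counter = RefreshCounter(rows=512)
print([counter.refresh_address() for _ in range(3)])   # first refresh addresses
print(refresh_spacing_us(512, 8))                      # microseconds between cycles
```

For the assumed chip, one dummy read cycle would have to be issued every 15.625 µs so that all 512 rows are refreshed within 8 ms.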
CAS-before-RAS Refresh

Most modern DRAM chips additionally implement one or more internal refresh modes. The
most important is the so-called CAS-before-RAS refresh. For this purpose, the DRAM chip has its
own refresh logic with an address counter. For a CAS-before-RAS refresh, CAS is held low for
a certain time period before RAS also drops (thus CAS-before-RAS). The on-chip refresh (that
is, the internal refresh logic) is thus activated, and the refresh logic carries out an automatic
internal refresh. The refresh address is generated internally by the address counter and the
refresh logic, and need not be supplied externally. After every CAS-before-RAS refresh cycle,
the internal address counter is incremented so that it indicates the new address to refresh. Thus
it is sufficient if the memory controller "bumps" the DRAM from time to time to issue a refresh
cycle. With the CAS-before-RAS refresh, several refresh cycles can also be executed in succession.
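The difference from the RAS-only refresh is that the address counter sits inside the chip. A minimal Python sketch of this behavior follows; the class name CbrDram and its methods are invented for this illustration, and starting the counter at row 0 is an assumption, since the text does not say which value the counter holds initially.

```python
class CbrDram:
    """Hypothetical model of a DRAM with on-chip CAS-before-RAS refresh logic."""

    def __init__(self, rows):
        self.rows = rows
        self._refresh_counter = 0      # on-chip address counter (assumed start: 0)
        self.refreshed_rows = []       # log of the rows refreshed so far

    def cas_before_ras_refresh(self):
        """One refresh cycle: no address is supplied from the outside."""
        row = self._refresh_counter
        self.refreshed_rows.append(row)                  # internal row refresh
        self._refresh_counter = (row + 1) % self.rows    # increment for next cycle
        return row


chip = CbrDram(rows=4)
for _ in range(6):                     # the controller merely triggers the cycles
    chip.cas_before_ras_refresh()
print(chip.refreshed_rows)
```

The controller never passes a row number; six triggered cycles on this four-row model refresh rows 0, 1, 2, 3 and then start again with 0 and 1.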
Hidden Refresh
Another elegant option is the hidden refresh. Here the actual refresh cycle is more or less <
