VIETNAM NATIONAL UNIVERSITY – HO CHI MINH CITY
UNIVERSITY OF SCIENCE
LÊ THỊ LINH AN
A SOFT ERROR TOLERANT SRAM DESIGN
IN 130NM CMOS TECHNOLOGY
Specialization: Electronic Engineering – Microelectronics Major
Code: 60 52 70
MASTER DEGREE THESIS
ELECTRONICS ENGINEERING – MICROELECTRONICS
SUPERVISOR
Dr. BÙI TRỌNG TÚ
Ho Chi Minh City, 2010
ACKNOWLEDGEMENTS
It is my pleasure to thank all the people who made this thesis possible.
First of all, I would like to sincerely express my appreciation to my advisor, Dr.
Bui Trong Tu, for his tremendous support, valuable guidance and constant
encouragement during my studies. His technical advice made my master’s studies
a meaningful learning experience.
I am also grateful to Prof. Dang Luong Mo, Prof. Nguyen Huu Phuong, and Dr.
Huynh Huu Thuan, who manage this Microelectronics Master's program. It is a
truly interesting course with enthusiastic and devoted professors who are
experts in the IC industry.
I also wish to thank my colleagues in the TCAM team for their helpful discussions
and valuable advice during my studies. I am grateful to Silicon Design Solutions,
which supported me financially and allowed me to attend this Master's course
while working.
Finally, my special thanks go to my family, who have always been with me
through the difficulties and challenges of my master's studies.
Ho Chi Minh City, November 2010
Le Thi Linh An
ABSTRACT
Soft errors are a major concern for microelectronic circuits today. As CMOS
technologies advance, VLSI circuits are becoming more sensitive to external
noise sources, especially radiation particle strikes, which cause soft errors.
Soft errors are random and do not cause permanent failure. However, they
corrupt stored information, which can lead to functional failure of the circuit.
Meanwhile, the demand for higher reliability in electronic applications keeps
growing. Many critical applications require extremely accurate circuit
functionality, such as circuits used in space or biomedical equipment and in
military electronics.
Generally, soft errors in memories have attracted more attention than soft errors
in logic circuits. Memories play an important part in modern systems, and
because of the high integration density of storage cells, a large memory is more
sensitive to particle strikes than logic. For that reason, this thesis focuses on
soft errors in memories.
The thesis reviews the background of soft errors and their mitigation
techniques. Then an SRAM design with an additional soft error tolerant feature
is presented. The SRAM is designed in 130nm CMOS technology and uses circuit
hardening and error correcting code techniques to mitigate soft error effects.
The level of soft error tolerance is verified by simulation. Beyond the soft
error tolerant circuits, the whole SRAM architecture is shown in detail, from
circuit to physical implementation. The verification and simulation results are
also included.
TABLE OF CONTENTS
Acknowledgement
Abstract
Table of contents
Abbreviations
List of tables
List of figures
CHAPTER 1 – INTRODUCTION
1.1. Problem and motivation
1.2. Contribution of the thesis
1.3. Thesis organization
CHAPTER 2 – BACKGROUND
2.1. Soft errors in semiconductor device
2.1.1. Radiation sources
2.2. Soft errors occurrence mechanism
2.3. Soft errors mitigation techniques
2.3.1. Device level techniques
2.3.2. Circuit level techniques
2.3.3. Block level techniques
CHAPTER 3 – SOFT ERROR TOLERANT SRAM DESIGN
3.1. SRAM specification
3.1.1. General information
3.1.2. Floorplan
3.1.3. Interface pin description
3.1.4. Operation brief description
3.2. SRAM detail design
3.2.1. SRAM cell architecture
3.2.2. Replica path for Read operation
3.2.3. Internal clock generator
3.2.4. Write circuit
3.2.5. Decoder
3.2.6. Input/output latches
3.3. Error detecting and correcting (EDC) block
3.3.1. Hamming code algorithm
3.3.2. EDC block implementation
3.3.3. EDC detail architecture
CHAPTER 4 – DESIGN SIMULATION AND VERIFICATION
4.1. SRAM cell simulation
4.1.1. SRAM cell simulation to find device size
4.1.2. SRAM cell characteristic summary
4.1.3. Static noise margin comparison
4.1.4. SRAM cell capacitance
4.2. Soft error tolerant simulation
4.2.1. Verification methodology
4.2.2. Critical charge simulation
4.2.3. Simulation results
4.2.4. Conclusion
4.3. Post-layout simulation
4.3.1. Simulation setup
4.3.2. Cycle time definition and simulation result
4.3.3. Access time
4.3.4. Setup time
4.3.5. Timing delay of some critical paths
4.3.6. Simulation results summary
4.4. SRAM and EDC functional verification
4.4.3. Simulation setup
4.4.4. Functional verification result
4.5. Physical verification
CHAPTER 5 – CONCLUSION AND FUTURE WORK
ABBREVIATIONS
VLSI Very large scale integration
CMOS Complementary Metal-Oxide Semiconductor
SEU Single Event Upset
DRC Design Rule Check
LVS Layout versus Schematic
SRAM Static Random Access Memory
ECC Error Correcting Code
EDC Error Detecting and Correcting
SNM Static noise margin
LPE Layout Parasitic Extraction
LIST OF TABLES
Table 3.1: Pin description
Table 3.2: Hamming code for 22 bits
Table 4.1: Read current
Table 4.2: Read leakage current
Table 4.3: Effect of leakage on read current
Table 4.4: Write current
Table 4.5: Static noise margin
Table 4.6: SRAM cell characteristic summary
Table 4.7: SNM comparison
Table 4.8: SRAM cell capacitance
Table 4.9: Critical charge result of hardened SRAM cell
Table 4.10: Critical charge result for normal SRAM cell
Table 4.11: Performance result (SS_125_1.35)
Table 4.12: Timing delay between nodes
Table 4.13: Design fault model
LIST OF FIGURES
Figure 2.1: Redundancy
Figure 2.2: Concurrent error detection
Figure 3.1: SRAM floorplan
Figure 3.2: Write operation
Figure 3.3: Read operation
Figure 3.4: SRAM cell architecture
Figure 3.5: Timing scheme for read operation
Figure 3.6: Reference IO cell and read circuit
Figure 3.7: Read clock generator circuit
Figure 3.8: Write clock generator
Figure 3.9: Write circuit and sequential waveform
Figure 3.10: Row decoder block diagram
Figure 3.11: Xdec circuit
Figure 3.12: Hardened latch architecture
Figure 3.13: EDC block diagram
Figure 3.14: Write encoder schematic
Figure 3.15: Parity comparison schematic
Figure 3.16: Syndrome decoder schematic
Figure 3.17: Bit flipper block
Figure 3.18: Input select
Figure 3.19: Output select and output latch
Figure 3.20: Top level layout view
Figure 3.21: SRAM cell layout with only device layers shown
Figure 3.22: SRAM cell layout
Figure 3.23: Xdec cell layout
Figure 3.24: Xdec array 1x256
Figure 3.25: Control block
Figure 3.26: IO array 1x22
Figure 4.1: Read current
Figure 4.2: Write current
Figure 4.3: Inject a current source to an off NMOS drain
Figure 4.4: The injected SEU current for hardened SRAM cell
Figure 4.5: IBL waveform of hardened SRAM cell
Figure 4.6: The exchange state between IBL and IBLX
Figure 4.7: The injected SEU current for normal SRAM cell
Figure 4.8: IBL waveform of normal SRAM cell
Figure 4.9: The exchange state between IBL and IBLX
Figure 4.10: A part of LPE netlist containing capacitance value
Figure 4.11: A part of LPE netlist containing resistor value
Figure 4.12: A part of input waveform for performance simulation
Figure 4.13: Hspice option
Figure 4.14: Cycle time must cover the internal clock
Figure 4.15: Delay from clk rise to resetx rise
Figure 4.16: Cycle time must make sure all RBL be precharged fully
Figure 4.17: Delay from clk rise to dmrbl rise
Figure 4.18: Cycle time must cover PWH of input latch plus for max setup time
Figure 4.19: PWH of input latch
Figure 4.20: Access time definition
Figure 4.21: Access time
Figure 4.22: Address input path delay
Figure 4.23: Clock path delay
Figure 4.24: Delay from CLKA to intckx fall
Figure 4.25: Delay from intclk fall to rhcpx fall
Figure 4.26: Delay from rhcpx fall to latch rise
Figure 4.27: Delay from rhcpx fall to echo rise
Figure 4.28: Delay from echo rise to resetx fall
Figure 4.29: Delay from resetx fall to intclk rise
Figure 4.30: Delay from intclk rise to rhcpx rise
Figure 4.31: Delay from rhcpx rise to latch fall
Figure 4.32: Delay from intclk rise to resetx rise
Figure 4.33: Netlist of top level
Figure 4.34: A part of full test vector
Figure 4.35: Hsim option
Figure 4.36: Hsim log file
Figure 4.37: Waveform of SRAM functional simulation
Figure 4.38: Waveform of EDC functional simulation
Figure 4.39: LVS Calibre report for hierarchical checking
Figure 4.40: Detail LVS report for top level
Figure 4.41: DRC report
CHAPTER 1
INTRODUCTION
1.1. Problem and motivation
Reliability is a key challenge facing modern VLSI systems. Advances in CMOS
technology have brought lower supply voltages, higher clock frequencies, and
higher transistor integration densities. Consequently, VLSI circuits are
becoming more vulnerable to various noise sources. Well-known examples include
power and ground noise, capacitive coupling noise, and radiation particle strikes.
With the rapid scaling of technology, integrated circuits (ICs) have become very
sensitive to radiation particle strikes. When a radiation particle strikes a
sensitive region of a semiconductor device, the generated charge can corrupt
the information stored in a memory element, resulting in erroneous data at the
output. This is called a soft error. Soft errors are incidental and do not destroy
the device. They cause only a temporary functional failure, and the system works
correctly afterwards. Radiation particle strikes are a random natural phenomenon;
therefore, they cannot be predicted or controlled by designers.
The particles involved are alpha particles, the products of neutron-induced ¹⁰B
fission, and high energy cosmic ray neutrons. They originate from radioactive
materials or from cosmic rays. In addition, they can result from the interaction
of high energy particles with the semiconductor itself.
Electronic applications today demand ever higher reliability. Many critical
applications, such as biomedical circuits and space and military electronics,
demand extremely reliable circuit functionality. As a result, soft errors are
becoming less and less acceptable, even in commercial applications [1].
Reducing soft errors is therefore a major consideration for VLSI circuits today.
Memories have a high integration density of storage elements. Hence, they are
more sensitive to soft errors than logic circuits. Soft errors in memories
(SRAM and DRAM) have been widely studied since the end of the twentieth
century [2], yet they remain a problem today. For that reason, this thesis
studies soft errors in memories (specifically SRAM) and applies several
mitigation techniques to design an SRAM with a soft error tolerant feature.
1.2. Contribution of the thesis
The thesis presents the detailed design of a synchronous two-port SRAM in 130nm
CMOS technology with an additional soft error tolerant feature. Two soft error
mitigation techniques are applied to the design: circuit hardening and error
correcting code (ECC).
With the first technique, only selected parts of the design are hardened,
including the memory cell, the address input latches, the data input and output
latches, and the keeper circuits. Because they are storage elements, these are
the parts of a memory most susceptible to soft errors.
The second technique helps the design recover if a soft error does occur. It is
a built-in error detecting and correcting (EDC) block for the SRAM. This block
applies ECC, using a Hamming code to detect a single-bit or double-bit error in
the memory array, and it corrects the error when a single bit is upset.
1.3. Thesis organization
The rest of the thesis is organized as follows:
Chapter 2 introduces the background of soft errors, the mechanism by which they
occur, and the mitigation techniques.
Chapter 3 describes the SRAM design in detail, including the SRAM
specification, the SRAM architecture, the circuits specific to the soft error
tolerant feature, and the physical implementation.
Chapter 4 focuses on the verification methodologies and the simulation results.
The simulations cover the soft error tolerance level, the memory cell
characteristics, post-layout simulation, functional verification and physical
verification.
Finally, Chapter 5 presents the conclusion and discusses possible improvements
to this SRAM design.
CHAPTER 2
BACKGROUND
2.1. Soft errors in semiconductor device
Soft errors, also called Single Event Upsets (SEUs), are errors in
microelectronic circuits caused when radiation particles strike sensitive
regions of silicon devices. Soft errors are incidental and do not break the
device [3]. They only flip the stored state of a memory element, which produces
an erroneous value at the output. Soft errors cause no permanent faults; the
system still works correctly after suffering an SEU, which is why they are
called soft errors. This section gives an overview of the radiation sources that
cause soft errors and of the mechanism by which soft errors occur in
semiconductor devices.
2.1.1. Radiation sources
Radiation is kinetic energy in the form of high speed particles and
electromagnetic waves [4]. The three main sources of radiation causing soft
errors in semiconductor devices can be summarized as follows:
· Alpha particles: these are the nuclei of helium atoms, consisting of 2 protons
and 2 neutrons. Alpha particles are generated in radioactive decay and in
collisions with other atoms. Because alpha particles cannot travel far in
material, their main source is not the atmosphere but the integrated circuit
itself. Packaging and solder contain traces of radioactive isotopes, which
release alpha particles, as well as other particles such as gamma and beta
particles, as they decay to a lower energy state. Alpha particles carry kinetic
energy in the range of 4 to 9 MeV [5].
· High energy neutrons: when cosmic radiation reacts with the atmosphere, it
generates secondary particles, including protons, electrons and neutrons. All
of them can cause soft errors; however, at the same energy a neutron generates
more charge than a proton or an electron. A neutron carries no charge, so it
cannot ionize the material directly. However, when a neutron with energy above
1 MeV interacts with a silicon atom, it causes a nuclear reaction that
generates charged particles. These charged particles cause ionization, which
leads to a soft error in the device.
· Thermal neutrons: thermal neutrons are low energy neutrons with a kinetic
energy of about 0.025 eV. The interaction of low energy cosmic neutrons with
the boron dopant in the semiconductor material (boron occurs as the isotopes
¹⁰B and ¹¹B; the reaction involves ¹⁰B) generates secondary radiation particles
(a lithium nucleus and an alpha particle). Both of these particles can cause
soft errors in the device.
2.2. Soft errors occurrence mechanism
In a semiconductor device, some sensitive nodes are especially prone to SEUs:
the drains of OFF NMOS and OFF PMOS transistors.
Consider an OFF NMOS transistor whose source, gate and substrate terminals are
connected to VSS and whose drain is connected to VDD. The drain and substrate of
this OFF transistor form a reverse-biased junction, so a strong electric field
directed from drain to substrate exists in the depletion region of this
junction. Because radiation particles generate free electron-hole pairs, this
electric field causes electrons to be collected at the drain and holes at the
substrate. That is why these reverse-biased junctions are the nodes most
sensitive to particle strikes.
When a particle strikes one of these sensitive nodes, the electric field of the
reverse-biased junction sweeps the generated charges to the opposite terminals
(drain and substrate) of the junction. Electrons move towards the positive
voltage while holes move towards the negative voltage.
This event causes a current pulse that flows from the n-type diffusion to the
p-type diffusion for a very short duration. When the collected charge exceeds
the critical charge, the stored value changes. The critical charge (Qcrit) is
the minimum charge required to flip the cell. Qcrit depends on the
characteristics of the circuit, especially the supply voltage and the nodal
capacitance of the drain [6].
When a particle strike discharges the charge stored at the drain of an OFF NMOS
transistor, the node flips from 1 to 0. Similarly, a strike at the drain of an
OFF PMOS transistor flips the node from 0 to 1.
As technology scales down, the supply voltage and the nodal capacitance decrease
rapidly in order to meet tighter power constraints and to make circuit
transitions faster. The charge stored at the sensitive nodes of the device
therefore shrinks, because Qnode = Cnode×Vdd, which makes SRAM more and more
vulnerable to soft errors.
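The scaling argument can be illustrated with a short numerical sketch. It only
evaluates Qnode = Cnode×Vdd and the condition that an upset occurs when the
collected charge exceeds the critical charge. The capacitance, voltage and
charge values are hypothetical and chosen for illustration; they are not
results from this thesis, and the function names are assumptions.

```python
def node_charge_fc(c_node_ff: float, vdd_v: float) -> float:
    """Return Qnode = Cnode * Vdd in femtocoulombs (fF * V = fC)."""
    return c_node_ff * vdd_v


def is_upset(q_collected_fc: float, q_crit_fc: float) -> bool:
    """A cell flips when the collected charge exceeds the critical charge."""
    return q_collected_fc > q_crit_fc


# Hypothetical operating points (NOT thesis data): an older node with a
# larger capacitance and supply, and a scaled node with both reduced.
q_old = node_charge_fc(c_node_ff=2.0, vdd_v=1.5)
q_scaled = node_charge_fc(c_node_ff=1.0, vdd_v=1.0)

print(f"stored charge, older node : {q_old:.2f} fC")
print(f"stored charge, scaled node: {q_scaled:.2f} fC")

# The same hypothetical particle strike (2 fC collected) compared against
# each stored charge, used here as a first-order stand-in for Qcrit.
strike_fc = 2.0
print("older node upset :", is_upset(strike_fc, q_old))
print("scaled node upset:", is_upset(strike_fc, q_scaled))
```

Using the stored charge as a stand-in for Qcrit is a first-order simplification;
the actual critical charge also depends on the restoring current of the cell.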
2.3. Soft errors mitigation techniques
In general, soft error mitigation techniques can be classified into three
categories: device level techniques, circuit level techniques and block level
techniques.
2.3.1. Device level techniques
At this level, the traditional fabrication process is modified to make the
device resistant to soft errors. Manufacturers and designers can choose
appropriate materials and packages, as well as better device geometries. For
example, soft errors can be caused by alpha particles emitted by the materials
or compounds used in packaging. Choosing materials with a lower probability of
emitting alpha particles can therefore reduce the soft error rate. Similarly,
to reduce the soft errors induced by the interaction of low energy cosmic
neutrons with boron ¹⁰B, BPSG (borophosphosilicate glass) is replaced by other
insulators that do not contain boron.
2.3.2. Circuit level techniques
The technique most widely used at this level to make a circuit resistant to
SEUs is radiation hardening [7]. With this technique, selected parts of the
design are hardened. Basic circuit elements such as inverters, NAND gates, NOR
gates, flip-flops and latches are made SEU resistant by adding extra
transistors. The technique is usually applied to memory cells, keeper circuits,
latches and flip-flops, which are data storage elements and are therefore
prone to soft errors. It increases the critical charge at sensitive nodes,
making those nodes less susceptible to SEUs. The technique is widely used
because the designer can predict which nodes are sensitive and protect them.
However, it incurs overhead in area and power consumption [8].
2.3.3. Block level techniques
Unlike the two approaches above, which mainly protect and strengthen the
design, block level techniques are used to detect an error and recover the
design after it has suffered an SEU. There are two main mitigation techniques
at this level:
2.3.3.1. Redundancy
Redundancy techniques duplicate a circuit to create redundant copies.
However, this results in area and performance overhead as well as higher
power consumption. Triple modular redundancy is a classical method
with high soft error reliability. Three identical copies of a circuit
compute on the same data in parallel. The three outputs are then evaluated
by majority voting logic, which returns the value that occurs in at least
two of the three copies. With this technique, a soft error that occurs in
one of the three circuits is detected and outvoted, assuming the other two
circuits operate correctly.
Figure 2.1: Redundancy
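The voting step of triple modular redundancy can be sketched in a few lines.
This is a behavioral illustration only, not the circuit of any design in this
thesis; the names majority_vote and tmr and the upset_mask parameter are
assumptions introduced for the example.

```python
from typing import Callable


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def tmr(compute: Callable[[int], int], x: int, upset_mask: int = 0) -> int:
    """Run three copies of `compute` on the same input and vote on the outputs.

    `upset_mask` flips bits in the output of the first copy only, which
    emulates a soft error striking one of the three circuits.
    """
    outputs = [compute(x), compute(x), compute(x)]
    outputs[0] ^= upset_mask
    return majority_vote(*outputs)


def increment(value: int) -> int:
    """Stand-in for the protected logic block."""
    return value + 1


print(tmr(increment, 5))                    # no upset
print(tmr(increment, 5, upset_mask=0b100))  # one copy corrupted, still outvoted
```

If two copies are corrupted in the same bit, the voter returns the wrong value,
which is why the technique assumes at most one faulty copy at a time.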
Another example of redundancy is concurrent error detection, in which
only selected parts of the design are duplicated.
Figure 2.2: Concurrent error detection
In these techniques, selecting the parts to be duplicated is very important. If
a particle strike happens in the non-duplicated region, it cannot be
detected. In contrast, if it occurs in the duplicated portion of the circuit,
the checker can detect it. Therefore, care must be taken to select the cutset
logic whose nodes have the highest soft error susceptibility [9].
2.3.3.2. Error correcting code and parity
Because a soft error does not destroy the device, it can be removed by
rewriting the correct data to the affected location, or the correct output
data can be obtained by fixing the erroneous bits before they reach the
output. This technique incorporates redundant data into each word to create
an error correcting code (ECC), which is used to detect and correct soft
errors [10]. Normally, the redundant data are parity bits generated during
the write operation and stored in the memory array. Each time the data is
read, they are used to detect an error and correct the erroneous bit. The
ECC used can be a Hamming code, a Turbo code, and so on.
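As an illustration of how parity bits allow detection and correction, the
following sketch implements a textbook extended Hamming code with 16 data bits
and 6 parity bits, the same word split used by the SRAM in Chapter 3. The bit
ordering here (parity bits at positions 1, 2, 4, 8, 16 plus an overall parity
bit at position 0) is an assumption made for the example and is not
necessarily the arrangement of Table 3.2. The function names and status
strings are likewise hypothetical.

```python
PARITY_POS = (1, 2, 4, 8, 16)                                # Hamming parity positions
DATA_POS = [p for p in range(1, 22) if p not in PARITY_POS]  # 16 data positions


def encode(data: int) -> list[int]:
    """Encode a 16-bit word into 22 bits; index 0 holds the overall parity."""
    code = [0] * 22
    for i, pos in enumerate(DATA_POS):
        code[pos] = (data >> i) & 1
    for p in PARITY_POS:
        # Parity bit p makes the group of positions whose index has bit p set even.
        code[p] = sum(code[k] for k in range(1, 22) if k & p) % 2
    code[0] = sum(code[1:]) % 2  # overall even parity over all 22 bits
    return code


def decode(code: list[int]) -> tuple[int, str]:
    """Return (data, status); status is 'ok', 'single' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for p in PARITY_POS:
        if sum(code[k] for k in range(1, 22) if k & p) % 2:
            syndrome |= p
    odd_flips = sum(code) % 2  # 1 when an odd number of bits was flipped
    if syndrome == 0 and odd_flips == 0:
        status = "ok"
    elif odd_flips == 1 and syndrome < 22:
        code[syndrome] ^= 1    # syndrome is the position of the single flipped bit
        status = "single"
    else:
        status = "uncorrectable"  # double-bit errors are detected here, not fixed
    data = sum(code[pos] << i for i, pos in enumerate(DATA_POS))
    return data, status


word = 0xBEEF
stored = encode(word)
upset = list(stored)
upset[7] ^= 1                  # emulate a single-bit soft error
print(decode(stored))
print(decode(upset))
```

A single flipped bit, whether a data bit or a parity bit, is located by the
syndrome and corrected; two flipped bits leave the overall parity even while
the syndrome is non-zero, so they are flagged but cannot be corrected.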
CHAPTER 3
SOFT ERROR TOLERANT SRAM DESIGN
3.1. SRAM specification
3.1.1. General information
· Two-port synchronous SRAM, 22 kbit memory
· Built-in Error Detecting and Correcting (EDC) block to mitigate soft errors.
The EDC block detects single-bit and double-bit errors and corrects only
single-bit errors.
· Designed in 130nm CMOS technology
· Operating voltage range from 1.35V to 1.65V
· Operating frequency of 200MHz (worst case)
· Hand-crafted layout
· 22-bit data in/out for the SRAM
· Only 16-bit data in/out at the EDC block interface, because the remaining 6
bits of SRAM data are used as parity check bits
· 8 row address inputs and 2 column address inputs
· Two independent clocks for read and write operations, as well as two
independent data in/out ports and address buses
· Selected parts of the design are radiation hardened
· Memory enable controls for read and write
· EDC enable pin allows operation with or without the error detection and
correction task
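The figures in this list are consistent with each other, as the following
arithmetic sketch shows. The constants are taken from the list above; the
assumption that the 2 column address bits select one of 4 words sharing a row
is an interpretation, not a statement from the specification.

```python
# Address and word widths taken from the specification list.
ROW_ADDRESS_BITS = 8
COLUMN_ADDRESS_BITS = 2
WORD_BITS = 22                        # bits stored per address
DATA_BITS = 16                        # user data bits per word
PARITY_BITS = WORD_BITS - DATA_BITS   # bits reserved for the Hamming code

rows = 2 ** ROW_ADDRESS_BITS               # word lines
words_per_row = 2 ** COLUMN_ADDRESS_BITS   # assumed: words sharing one row
words = rows * words_per_row
total_bits = words * WORD_BITS
user_bits = words * DATA_BITS

print(f"{words} words x {WORD_BITS} bits = {total_bits} bits "
      f"({total_bits // 1024} kbit)")
print(f"user data: {user_bits} bits, parity overhead: "
      f"{PARITY_BITS / DATA_BITS:.1%}")
```

With 10 address bits this gives 1024 words and 22528 stored bits, which is the
22 kbit quoted in the specification, of which 16 kbit carry user data.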
3.1.2. Floorplan
[Figure: floorplan of the SRAM. Blocks: memory array with reference column and
reference row, row decoder, control block, reference IO, column mux, sense
amplifier and output buffer, and the built-in EDC block with its latches.
Signals shown: CENA, CENB, AA[0:9], AB[0:9], DB[0:21], QA[0:21], QI[0:21],
BWEN, DI[0:15], QO[0:15], RAM_MODE, SE, DE, PE.]
Figure 3.1: SRAM floorplan
3.1.3. Interface pin description
Table 3.1: Pin description
Pin Name    Description
CLKA        Read port clock input
CLKB        Write port clock input
CENB        Write enable
CENA        Read enable
AA<0:9>     Read address
AB<0:9>     Write address
DI<0:15>    Data in
QO<0:15>    Data output
RAM_MODE    EDC block disable pin
            · RAM_MODE = 0: the SRAM will work with error detecting
              and correcting tasks
            · RAM_MODE = 1: the SRAM will work in normal mode,
              without error detecting and correcting tasks
DE          Double bit error flag
SE          Single bit error flag
PE          Parity bit error flag
3.1.4. Operation brief description
3.1.4.1. SRAM operation
A write operation starts at the rising edge of the CLKB signal. The write
enable control input, the data input and the address input are latched at
the beginning of each cycle. During a write operation, data is written
into the memory and does not propagate to the memory output.
The memory output will remain at the value determined by the last
memory read.
Figure 3.2: Write operation
Similarly, a read operation starts at the rising edge of the CLKA signal. The
read enable control input and the address input are latched at the beginning
of each cycle. The data output latch captures the read data after each read
access, under the control of the track path.
Figure 3.3: Read operation
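The read and write behavior described above can be summarized in a small
behavioral model. It is a sketch only: timing, the track path and the EDC
block are ignored, and the class and method names are hypothetical. The
enables are passed as booleans because the active level of CENA and CENB is
not stated in this section.

```python
class TwoPortSram:
    """Behavioral model: one write port (CLKB) and one read port (CLKA)."""

    def __init__(self, address_bits: int = 10, word_bits: int = 22) -> None:
        self.word_mask = (1 << word_bits) - 1
        self.cells = [0] * (1 << address_bits)
        self.output_latch = 0  # holds the value of the last memory read

    def clkb_rising(self, write_enabled: bool, address: int, data: int) -> None:
        """Write port: inputs are sampled on the rising edge of CLKB."""
        if write_enabled:
            self.cells[address] = data & self.word_mask
        # The output latch is untouched: written data does not reach the output.

    def clka_rising(self, read_enabled: bool, address: int) -> int:
        """Read port: inputs are sampled on the rising edge of CLKA."""
        if read_enabled:
            self.output_latch = self.cells[address]
        return self.output_latch


ram = TwoPortSram()
ram.clkb_rising(write_enabled=True, address=3, data=0x2ABCDE)
print(hex(ram.output_latch))                               # still the reset value
print(hex(ram.clka_rising(read_enabled=True, address=3)))  # data appears after a read
```

The behavior when both ports access the same address in the same cycle is not
covered by this model.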
3.1.4.2. Built-in EDC operation
In each write operation, the 16-bit data input DI<0:15> of the EDC block is
encoded into 6 parity bits according to the Hamming code. The 16 data
bits and the 6 parity bits are then passed to the 22-bit data input port
DB<0:21> of the SRAM. That means, in the memory array, only 16 bit is