ispLSI/GAL Metastability Report

October 2001
Introduction
The dictionary definition of metastability is “a situation that is characterized by a slight margin of stability.” When
applied to bi-stable (digital) logic, the term refers to an undesirable, marginally stable output state between VIL
(max) and VIH (min).
Metastability can occur in bi-stable storage elements (registers, latches, memories, etc.) when setup and/or hold
times are violated. Since setup and hold times vary with temperature and operating voltage, among other factors,
the times referred to here are not the min/max numbers printed in data sheets, but rather the actual times for the
given set of operating conditions. Typical applications where such times are likely to be violated include bus and
memory arbiters, interfaces, synchronizers, and other state machines employing asynchronous inputs or asynchronous clocks.
Metastability manifests itself in a number of different ways. Common responses, shown in Figure 1 as they might be captured on a digital oscilloscope, are: runt pulse (1a), decreased output slew rate (1b), output oscillation (1c),
and increased clock-to-output time (1d). By definition, the phenomenon of metastability is statistical in nature. Not
only is entry into the metastable state uncertain, but the time spent there can also vary.
Because PLDs are commonplace in today’s designs, a thorough understanding of their metastable behavior is crucial. In some applications, output anomalies shorter than one clock cycle may be acceptable, but in applications
where the register output is used as a control signal (clock, bus grant, chip select, etc.) for other circuitry, faults
such as runt pulses and oscillation cannot be tolerated.
This report will not study the causes or characteristics of metastability in great detail; excellent material has already
been prepared on this subject (see Bibliography). Rather, this report will introduce a mathematical model for the
metastable phenomenon, discuss potential test methodologies, present and compare test results from various
bipolar and CMOS PLDs, and discuss how to interpret the data. This report will close with suggestions on how to
design metastability-tolerant systems.
Derivation of Constants
The basic premise of all metastability models is that a device’s output is more likely to have settled to a valid state
in time (t) than in time (t-n). In fact, the failure probability distribution follows an exponential curve. Figure 2 shows a
typical failure frequency plot.
It is accepted [1] that metastable failures can be accurately modeled by the equation:
log Failure = log MAX - b(∆ - ∆o)    (1)
In this equation, MAX represents the maximum failure rate for a particular environment, ∆ is the time delayed
before sampling the DUT (Device Under Test) output, and ∆o is the time at which the number of failures starts to
decrease. On a failure frequency plot (such as the one in Figure 2), ∆o represents the knee of the curve. The constant b is the rate at which the frequency of failures decreases after the knee is reached.
Recall that:
log X = a ln (X), where a = log (e)
Substituting this into Equation 1:
a • ln Failure = a • ln MAX - b(∆ - ∆o)    (2)
www.latticesemi.com
ispLSI and GAL
Metastability Report
Lattice Semiconductor
MAX is related to the clock frequency (fCLOCK) and data frequency (fDATA). That is,
MAX = k1 • fCLOCK • fDATA    (3)
Substituting Equation 3 into Equation 2 and applying some algebra:
a • ln Failure = a • ln (k1 • fCLOCK • fDATA) - b(∆ - ∆o)
ln Failure - ln (k1 • fCLOCK • fDATA) = -(b/a)(∆ - ∆o)
Setting k2 = b/a and rearranging the equation yields:
Failure = (k1 • fCLOCK • fDATA) • e^(-k2 (∆ - ∆o))    (4)
When used with Equation 4, the constants k1, k2, and ∆o completely describe a particular device’s metastable
characteristics; they indicate how quickly a device can resolve the metastable condition. Devices which transition
out of the metastable region quickly are characterized by a small ∆o and a large k2.
The constant k1 is peculiar to the test apparatus (it can be thought of as a “scaling factor”). The maximum metastable failure rate (MAX) is limited by fCLOCK; a failure cannot occur if the device isn’t clocked. Likewise, it is true that a
metastable failure cannot occur unless data has changed. So, if fDATA < fCLOCK, then MAX = fDATA. This was the
case in the test fixture Lattice used (fCLOCK = 10MHz, fDATA = 2.5MHz). Substituting MAX = fDATA back into Equation 3 yields: k1 = 1/fCLOCK, so k1 = 100ns for our tests.
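As a minimal sketch of how Equation 4 might be evaluated numerically, the following Python functions compute a failure rate and the corresponding MTBF for a given sampling delay. The function and variable names are illustrative assumptions, not part of the original test setup. Times are expressed in seconds and frequencies in Hz so that the units stay consistent, and the k2 and ∆o values are arbitrary placeholders rather than measured constants.

```python
import math

def failure_rate(delta, f_clock, f_data, k2, delta_o, k1=None):
    """Failures per second from Equation 4.

    delta and delta_o are in seconds, k2 is in 1/seconds, frequencies in Hz.
    If k1 is not given, k1 = 1/f_clock is assumed (the case fDATA < fCLOCK).
    """
    if k1 is None:
        k1 = 1.0 / f_clock
    max_rate = k1 * f_clock * f_data          # Equation 3
    return max_rate * math.exp(-k2 * (delta - delta_o))

def mtbf(delta, f_clock, f_data, k2, delta_o):
    """Mean time between failures in seconds (reciprocal of the failure rate)."""
    return 1.0 / failure_rate(delta, f_clock, f_data, k2, delta_o)

# Test-fixture conditions from the text: fCLOCK = 10 MHz, fDATA = 2.5 MHz.
F_CLOCK = 10e6
F_DATA = 2.5e6
K1 = 1.0 / F_CLOCK                            # 100 ns, as stated in the text

# Placeholder device constants (hypothetical): k2 = 5 per ns, delta_o = 0.5 ns.
K2 = 5e9
DELTA_O = 0.5e-9

rate_at_knee = failure_rate(DELTA_O, F_CLOCK, F_DATA, K2, DELTA_O)
rate_later = failure_rate(DELTA_O + 1e-9, F_CLOCK, F_DATA, K2, DELTA_O)
```

At the knee (∆ = ∆o) the rate equals MAX, and each additional 1/k2 of delay reduces it by a factor of e, which is why a large k2 is desirable.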
Figure 1a. Runt Pulse
Figure 1b. Decreased Slew Rate
Figure 1c. Output Oscillation
Figure 1d. Increased TCO
Figure 2. Typical Failure Frequency Plot
Test Fixture
The goal of testing a particular device’s metastable characteristics is to generate real numbers for the constants k2
and ∆o. To do this, the device must first be forced into the metastable state. This is done by intentionally violating
setup and/or hold times. Once metastable, the output can be observed on an oscilloscope or used to increment an
event counter.
Traditional Approach
One approach to characterizing a device’s metastable behavior employs a test fixture similar to that shown in Figure 3a. In such a fixture, data to the device includes a “jitter band” so that the device sees changing data as it is
clocked. The DUT output is fed to a window comparator to determine when it is in the metastable region (between
VIL max and VIH min). The comparator output can be sampled periodically and used to increment an event counter.
Figure 3a. Traditional Metastability Test Circuit
(Labeled elements: Osc., ÷2, Data Shifter, Variable Delay, D/Q registers, DUT, '373, comparators referenced to VIH and VIL, output To Counter.)
Figure 3b. Lattice Metastability Test Circuit
(Labeled elements: Osc., ÷2, ÷8, two DELAY lines each feeding a 0-7 MUX with a SELECT input, DUT with D and Q, output To Digital Oscilloscope.)
The traditional method of testing, though it directly yields MTBF numbers, has some drawbacks. The first is that it does not
distinguish between the different types of metastable behavior (runt pulse, oscillation, slow rise/fall time, delayed
transition), and it may have difficulty detecting every type. Also, the registers used in the detector circuit itself may
become metastable, which would adversely affect the results.
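The detection logic of the traditional fixture can be sketched in a few lines of Python. This is only an illustration of the window-comparator idea: the threshold values are assumed standard TTL levels, and the sample voltages are hypothetical, not measured data.

```python
V_IL_MAX = 0.8   # volts; assumed TTL input-low threshold
V_IH_MIN = 2.0   # volts; assumed TTL input-high threshold

def in_metastable_window(v_out, v_il_max=V_IL_MAX, v_ih_min=V_IH_MIN):
    """True when the sampled output is neither a valid low nor a valid high."""
    return v_il_max < v_out < v_ih_min

def count_events(samples):
    """Count samples that fall inside the window, as an event counter would."""
    return sum(1 for v in samples if in_metastable_window(v))

# Hypothetical output voltages, each sampled at a fixed delay after a clock edge.
samples = [0.2, 3.4, 1.4, 0.3, 1.1, 3.5]
events = count_events(samples)   # 1.4 V and 1.1 V fall inside the window
```

Because the comparator output is examined at only one instant per cycle, an output that has already returned to a valid level (for example, after a runt pulse) would not be counted, which may be one reason this approach has difficulty detecting every failure type.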
A New Approach
The test method used to gather data for this report used the circuit shown in Figure 3b. The tester employed an
“infinite precision” variable delay circuit to control clock placement with respect to data. This arrangement allowed
exact worst case placement of the clock, so as to induce metastability with nearly every clock pulse.
Using a digital oscilloscope (Tektronix 11403A) in point accumulate mode, metastable failures were recorded over
a lengthy period of time. A hardcopy was then made and the constants empirically obtained (details below).
The oscilloscope approach, being visual in nature, enables the designer to make educated decisions regarding
maximum clock and data rates, as well as the suitability of using the output to drive other circuitry. The five minute
sample period used in our tests contained approximately 750 million failures. Much longer sample periods were
evaluated, but they provided no perceptible gain in usable information.
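The figure of approximately 750 million failures is consistent with the fixture's data rate, on the assumption that the failure rate during the test was close to MAX = fDATA. A quick check in Python (variable names are illustrative):

```python
F_DATA_HZ = 2.5e6          # data rate of the test fixture
SAMPLE_PERIOD_S = 5 * 60   # five-minute sample period, in seconds

# Estimated number of induced failures, assuming the rate stays at MAX = fDATA.
approx_failures = F_DATA_HZ * SAMPLE_PERIOD_S
```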
A slight disadvantage of this approach is that extracting k2 and ∆o values from the hardcopies is not straightforward. Because each point on the hardcopy can represent any number of actual samples (between one and 1.5 million), one cannot simply count the points at time (t) for the MTBF at that time (although, in the case of the scattered
points, the probability is low that a single isolated point represents more than one sample).
To generate values for k2 and ∆o, it was necessary to refer to previous metastability studies [1]. By studying the
output plots of devices with known constants, certain relationships were established. For example, it was determined that ∆o represents the time from the leading edge of the output until the “dot density” starts to decrease
measurably. It should be noted that ∆o in previous studies included device propagation delays, whereas in our test
it does not.
The time from ∆o until the dot density equals zero was defined to be the “time to metastable release” or simply
time(r). The relationship between k2 and time(r) is given below in (5), and shown graphically in Figure 4. Recall that
MAX = 2.5 x 10^6 and a = log(e).
k2 = log(MAX) / (time(r) • a) = 14.73/time(r)    (5)
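Equation 5 can be checked numerically. The sketch below, with illustrative names, confirms that log(MAX)/a is simply ln(MAX), about 14.73 for MAX = 2.5 x 10^6, and converts a time(r) reading into k2. The 3 ns reading is a hypothetical value chosen only to exercise the function.

```python
import math

MAX_RATE = 2.5e6   # failures per second at the knee (MAX = fDATA in this fixture)

def k2_from_release_time(time_r_ns, max_rate=MAX_RATE):
    """Equation 5: k2 (in 1/ns) from the time to metastable release (in ns)."""
    a = math.log10(math.e)
    return math.log10(max_rate) / (time_r_ns * a)

numerator = math.log(MAX_RATE)          # log(MAX)/a equals ln(MAX), about 14.73
k2_example = k2_from_release_time(3.0)  # hypothetical time(r) of 3 ns
```

A shorter time(r) gives a proportionally larger k2, which is the inverse relationship plotted in Figure 4.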
Figure 4. k2 Constant
(Plot of k2, vertical axis 0 to 15, against time(r), horizontal axis 2 to 30.)
Interpreting the Results
In addition to examining E²CMOS® GAL® devices, this study also tested several bipolar PAL devices as well as
other CMOS PLDs. To ensure that the results of this study would be relevant, all necessary precautions were
observed: the devices were of recent vintage and were acquired blindly through distributors; multiple samples of
each device were tested and the results combined; all devices had either fixed 16R8 architectures or were configured to emulate the 16R8 architecture; the devices were programmed from the same JEDEC fuse map file (the
source equations and the JEDEC fuse map file are presented in Listing 1).
Plots 1 through 11 on the following pages are some of the oscilloscope plots generated for this study. The top
waveform in each plot is the clock signal, the middle trace is the metastable data output and the bottom trace is the
histogram of the accumulated samples between 1V and 2V of the output signal. The horizontal scale is 2ns per
division, so the exact clock to output time of the metastable output condition can be read directly. The vertical scale
is 2V per division for the top trace, and 1V per division for the middle trace.
The middle waveform in each plot is the metastable device output which is the only signal captured in point accumulate mode. In every case, the output signal plot shows two stable levels after the transition. This is a direct result
of the “indecision” caused by metastability; on some cycles the output settled to a high level, while on others it settled to a low level.
Plot 9 shows the response of a bipolar PAL16R8-7. Notice the very well defined runt pulse (this correlates with previous data gathered on similar devices by the manufacturer [1]). The absence of a secondary trace along ground
indicates that the output always starts to transition to a high level, even when it finally settles to a low level. This
characteristic makes the device unsuitable for use in control path applications (when metastability is possible). All
of the bipolar parts examined showed similar results.
Plots 1 through 8 show typical metastability characteristics of Lattice PLD devices. Aside from the fact that setup
time violations may cause tCO to increase by a small (but random) amount, the outputs are very clean and well
behaved. The absence of runt pulses or other anomalies is extremely significant, as the GAL6002B not
only allows asynchronous clocking, but encourages it. Although the GAL6002B is a much slower device than
the GAL16V8 and GAL22V10, its metastable characteristics are similar to those of the much faster
GAL devices, which indicates that the desirable metastable behavior of the GAL devices is consistent across all speed grades. Comparing Plots 4 through 8 with Plots 9 and 10 shows that the characteristics of the GAL devices are superior to those of bipolar PLDs. Plot 11 illustrates the metastable characteristics of
a TTL flip-flop (TI SN74AS74).
For reference purposes, Plots 12 through 14 are included. Plot 12 shows a normal (i.e. non-metastable)
GAL16V8B-7 transition, and Plot 13 a normal PAL16R8-7 transition. Plot 14 is the normal transition of the TTL flip-flop (TI SN74AS74). For consistency, only rising edges have been shown. Our tests also covered falling edges
which, in general, were interesting but did not provide any additional information.
For a more quantitative look at the phenomenon of metastability, refer to the table beneath each plot. These tables
list the measured values of the constants ∆o and k2 for the device whose plot is shown, and for similar devices.
Recall that large k2 and small ∆o values are desirable. The numbers in the tables correlate closely with the results
of earlier tests [1,5], confirming the validity of our test method.
Since all the devices within each family possess very similar register and output buffer circuitry and all are fabricated using the same basic process, the data shown in the table accompanying each plot is considered applicable
to all devices and speed grades in the same family.
Using the Results
If a register enters the metastable state in a system, then data was obviously unstable as the register was being
clocked. The argument over which data should have been captured (old or new) is academic as the register will
randomly pick one or the other. Signals in most asynchronous systems are active for more than one clock cycle, so
if they are missed initially, they could be captured on a subsequent clock cycle.
It is the task of the state machine designer to take adequate precautions against metastability causing illegal states
to be entered. One way to do this is by using “Gray codes” when ordering states. Gray code state equations allow
only one state bit to change during a state transition. Thus, the worst metastability could do would be to delay a
state transition by one clock cycle. If more than one bit were allowed to change, the outcome would be purely random, and probably illegal. Figure 5 shows examples of both cases.
Figure 5.
SEQUENTIAL STATE ORDERING (00, 01, 10, 11): If metastability occurs while transitioning from 01, every state is a possible next state.
GRAY CODE STATE ORDERING (00, 01, 11, 10): If metastability occurs while transitioning from 01, the possible next states are 01 and 11.
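The two cases in Figure 5 can be reproduced with a short Python sketch. It assumes a simplified model in which every state bit that changes during a transition may independently resolve to either its old or its new value; the function names are illustrative.

```python
from itertools import product

def gray_code(n_bits):
    """Return the reflected binary Gray code sequence for n_bits state bits."""
    return [i ^ (i >> 1) for i in range(2 ** n_bits)]

def possible_next_states(current, intended, n_bits):
    """States reachable if each changing bit may resolve to old or new value."""
    changing = [b for b in range(n_bits) if ((current ^ intended) >> b) & 1]
    results = set()
    for choice in product((0, 1), repeat=len(changing)):
        state = current
        for bit, take_new in zip(changing, choice):
            if take_new:
                state ^= 1 << bit   # this bit resolved to its new value
        results.add(state)
    return sorted(results)

def as_bits(states, n_bits=2):
    """Format integer states as fixed-width binary strings."""
    return [format(s, "0{}b".format(n_bits)) for s in states]

gray_order = as_bits(gray_code(2))                           # ['00', '01', '11', '10']
sequential_case = as_bits(possible_next_states(0b01, 0b10, 2))  # two bits change
gray_case = as_bits(possible_next_states(0b01, 0b11, 2))        # one bit changes
```

With sequential ordering the transition from 01 to 10 changes both bits, so all four states are reachable; with Gray code ordering only one bit changes, so the machine either stays in 01 or moves to 11.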
Other solutions are to externally (or internally) synchronize the asynchronous signals, or to increase cycle times to
allow time for metastable outputs to settle. An example of the latter solution is given below.
It is worth noting at this point that state machines (synchronous or asynchronous) can fail for reasons other than
metastability. A not insignificant component of a PLD’s specified setup time is directly attributable to internal data
skewing [2]. Data skewing is the inevitable result of differing signal path lengths, loading conditions, and gate
delays. Stated another way, each input to output path has its own set of actual AC specifications. If insufficient
setup time has passed, different “versions” of the same data may be present at the inputs of different registers as
they are clocked. A good example of this is:
Output_Pin19 := Input_Pin2;
Output_Pin15 := !Input_Pin2;
If clocked at precisely the right moment after an input transition, one register will capture old data while the other
captures new data, resulting in a system failure. This condition, though also the result of a setup time violation,
should not be confused with metastability (the “incorrect” data that is captured has normal output characteristics); it
is, purely and simply, the result of a violation of specifications.
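The data-skew failure can be illustrated with a simple timing model in Python. This is a sketch under stated assumptions: the two internal path delays (3 ns and 5 ns) are hypothetical, metastability is ignored, and each register simply captures whichever value is present at its D input when the clock edge arrives.

```python
def captured(old_value, new_value, path_delay_ns, clock_time_ns):
    """Value present at a register's D input when the clock edge arrives.

    The input pin changes at t = 0 and the change reaches the register
    after path_delay_ns. Metastability is deliberately not modeled.
    """
    return new_value if clock_time_ns >= path_delay_ns else old_value

def sample_outputs(clock_time_ns, delay_pin19_ns, delay_pin15_ns):
    """Model Output_Pin19 := Input_Pin2 and Output_Pin15 := !Input_Pin2
    for an input transition from 0 to 1 at t = 0."""
    pin19 = captured(0, 1, delay_pin19_ns, clock_time_ns)
    pin15 = captured(1, 0, delay_pin15_ns, clock_time_ns)   # inverted path
    return pin19, pin15

# Hypothetical internal path delays of 3 ns (pin 19) and 5 ns (pin 15).
early = sample_outputs(2.0, 3.0, 5.0)    # both registers capture old data
skewed = sample_outputs(4.0, 3.0, 5.0)   # pin 19 is new, pin 15 is still old
late = sample_outputs(6.0, 3.0, 5.0)     # both registers capture new data
```

In the skewed case the two outputs are no longer complementary, even though each output individually looks like a normal, clean logic level.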
Example
To determine the maximum clock rate (given an acceptable error rate) that a particular device will allow in an asynchronous environment, Equation 4 is used. For example, the system shown in Figure 6 utilizes a 9600 baud
(bits/sec) asynchronous data stream. The system clock period is tCO + tPD + tSU + ∆. For one failure per year (about 3.2 x 10^-8 failures per second):
3.2 x 10^-8 = [(1 x 10^-7)(1/(∆ + 22))(9600)] • e^(-4(∆ - .44))
Solving for ∆ yields ∆ = 2.22ns, or about 2ns, for a cycle time of 24ns. Referring back to Plot 1, the additional delay
of 2ns intuitively makes sense. Remember, in terms of setup and hold time violations, the oscilloscope plots were
made under worst case failure conditions; the scattered dots could represent MTBFs of days, years, or even millennia in a typical asynchronous environment.
Due to the extremely quick metastable settling times of GAL devices, a relatively small increase in the cycle time
will produce a dramatic improvement in reliability.
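The example equation has no closed-form solution for ∆, but it is easy to solve numerically. The following Python sketch uses bisection and takes the equation exactly as printed, with ∆ in nanoseconds and the term 1/(∆ + 22) used as a bare number; a different unit convention for the clock frequency would change the answer. The function names and the search interval are assumptions made for illustration.

```python
import math

def failure_rate_as_printed(delta_ns, k1=1e-7, f_data=9600.0,
                            fixed_ns=22.0, k2=4.0, delta_o_ns=0.44):
    """Right-hand side of the example equation, using its printed numbers.

    Note: 1/(delta + 22) is used as a bare number with delta in ns,
    exactly as the equation is printed.
    """
    max_rate = k1 * (1.0 / (delta_ns + fixed_ns)) * f_data
    return max_rate * math.exp(-k2 * (delta_ns - delta_o_ns))

def solve_delta(target, lo=0.0, hi=50.0, iterations=100):
    """Bisection: the rate falls monotonically as delta grows."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if failure_rate_as_printed(mid) > target:
            lo = mid       # still failing too often: more delay is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

ONE_PER_YEAR = 3.2e-8      # failures per second, as used in the text
delta = solve_delta(ONE_PER_YEAR)
cycle_time_ns = 22.0 + delta
```

Under these assumptions the solver returns a ∆ of about 2.22ns, matching the value quoted above. Evaluating failure_rate_as_printed at ∆ + 1ns shows the rate dropping by a factor of more than 50, which illustrates how strongly a small increase in cycle time improves reliability.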
Bibliography
1. D.M. Tavana (MMI), “Metastability - A study of the Anomalous Behavior of Synchronizer Circuits,” in: Programmable Array Logic Handbook, Monolithic Memories Inc., 1986, pp 11-13 - 11-16.
2. K. Rubin (Force Computers), “Metastability Testing in PALs,” Wescon/87 Conference Record (San Francisco,
November 17-19, 1987). Los Angeles: Electronics Conventions Management, Inc, 1987, pp 16/1 1-10.
3. K. Nootbaar (Applied Microcircuits Corp.), “Design, Testing, and Application of a Metastable Hardened Flip-Flop,” ibid., pp 16/2 1-9.
4. J. Birkner (MMI), “Understanding Metastability,” ibid., pp 16/3 1-3.
5. R.K. Breuninger, K. Frank, “Metastable Characteristics of Texas Instruments Advanced Bipolar Logic Families,” application note SDAA004, Texas Instruments, 1985.
Plot 1. ispLSI 2032 Metastable Output
Traces: Clock, Output. Vertical: 1V/div. Horizontal: 2ns/div.
Part Number      Manufacturer    ∆o (ns)    k2 (1/ns)
ispLSI 2032      Lattice         0.986      13.9
Plot 2. ispLSI 2032V Metastable Output
Traces: Clock, Output. Vertical: 1V/div. Horizontal: 2ns/div.
Part Number      Manufacturer    ∆o (ns)    k2 (1/ns)
ispLSI 2032V     Lattice         1.044      13.9
Plot 3. ispLSI 3192 Metastable Output
Traces: Clock, Output. Vertical: 1V/div. Horizontal: 2ns/div.
Part Number      Manufacturer    ∆o (ns)    k2 (1/ns)
ispLSI 3192      Lattice         0.772      13.9
Plot 4. GAL16V8C-5 Metastable Output
Traces: Clock, Output. Vertical: 1V/div. Horizontal: 2ns/div.
Part Number      Manufacturer    ∆o (ns)    k2 (1/ns)
GAL16V8C-5       Lattice         1.4        9.82
Plot 5. ispLSI 1016-80 Metastable Output
Traces: Clock, Output. Vertical: 1V/div. Horizontal: 2ns/div.
Part Number      Manufacturer    ∆o (ns)    k2 (1/ns)
ispLSI 1016-80   Lattice         0.854      11.0
Plot 6. GAL16V8B-7 Metastable Output
Traces: Clock (2V/div), Output (1V/div). Horizontal: 2ns/div.
Part Number      Manufacturer    ∆o (ns)    k2 (1/ns)
GAL16V8B-7       Lattice         0.44       5.0
Plot 7. GAL22V10B-10 Metastable Output
Traces: Clock (2V/div), Output (1V/div). Horizontal: 2ns/div.
Part Number      Manufacturer    ∆o (ns)    k2 (1/ns)
GAL22V10B-10     Lattice         0.51       5.2
Plot 8. GAL6002B-15 Metastable Output
Traces: Clock (2V/div), Output (1V/div). Horizontal: 2ns/div.
Part Number      Manufacturer    ∆o (ns)    k2 (1/ns)
GAL6002B-15      Lattice         1.1        6.52
Plot 9. PAL16R8-7 Metastable Output
Traces: Clock (2V/div), Output (1V/div). Horizontal: 2ns/div.
Part Number      Manufacturer    ∆o (ns)    k2 (1/ns)
PAL16R8-7        Lattice         1.2        2.5
Plot 10. TIBPAL16R6-7 Metastable Output
Traces: Clock (2V/div), Output (1V/div). Horizontal: 2ns/div.
Part Number      Manufacturer    ∆o (ns)    k2 (1/ns)
TIBPAL16R6-7     TI              1.5        1.5
Plot 11. SN74AS74 Metastable Output
Traces: Clock (2V/div), Output (1V/div). Horizontal: 2ns/div.
Part Number      Manufacturer    ∆o (ns)    k2 (1/ns)
SN74AS74         TI              0.91       3.5
Plot 12. Normal GAL16V8B-7 Transition
Traces: Clock (2V/div), Output (1V/div). Horizontal: 2ns/div.
Plot 13. Normal PAL16R8-7 Transition
Traces: Clock (2V/div), Output (1V/div). Horizontal: 2ns/div.
Plot 14. Normal SN74AS74 Transition
Traces: Clock (2V/div), Output (1V/div). Horizontal: 2ns/div.