
A NEW GENERATION OF
TAG SRAMS—THE IDT71215 AND
IDT71216
APPLICATION NOTE AN-136
Integrated Device Technology, Inc.
By Kelly Maas
INTRODUCTION
The 71215 and 71216 represent a new generation of integrated Tag SRAMs. Just as earlier Tag SRAMs such as the 71B74 were better suited for tag applications than conventional SRAMs, the 71215/16 go a step further by integrating new features that significantly ease the design of high-performance cache subsystems for today's high-speed processors. These Tag RAMs are designed for easy interfacing to Intel and PowerPC processors, but are flexible enough to be used easily in other applications as well.
This application note first provides some background information on caches, then describes the architecture and operation of the 71215 and 71216 in detail. This is followed by three application examples, then a brief discussion of implementing a cache coherency protocol with these Tag RAMs. Since the 71215 and 71216 are very similar, the descriptions and explanations in this application note apply to both unless otherwise noted.

CACHE AND TAG BASICS
For those new to caches, a brief review of cache basics may be worthwhile. A cache is a memory that provides a CPU with high-speed access to a subset of the data in main memory. Our discussion focuses on the secondary cache, also known as the L2 cache, but it is not much different from the faster primary (L1) cache residing inside most CPUs.
The cache consists of a controller, a data memory and a tag memory. The data memory stores the active data from main memory and is composed of either synchronous burst or asynchronous SRAMs. The tag memory stores indexes (part of the CPU address field) that indicate which data is stored in the cache. Most caches also require at least one bit of memory for each cache entry to indicate the valid or dirty status of that entry. Figure 1 shows how the CPU address field relates to the cache and the tag memory. This example includes valid and dirty status bits, and represents a 512KB cache, a 2GB cacheable address space, a 32-byte line size, and an 8-byte word size.
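[Figure 1 (diagram): the CPU address field from A31 (MSB) down to A3 (LSB). A30-A19 form the 12-bit tag; A18-A5 address the tag memory, each entry of which holds the 12-bit tag plus a 1-bit line-valid and a 1-bit line-dirty flag; A18-A3 form the data SRAM address. A tag comparator checks the stored tag against the CPU address and sends MATCH to the cache controller.]
Figure 1. CPU Address Field and the L2 Cache (Showing 512 KB cache size and 2 GB cacheable main memory)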
The IDT logo is a registered trademark of Integrated Device Technology, Inc.
PowerPC is a trademark of International Business Machines Corporation
Pentium is a trademark of Intel Corporation
1995 Integrated Device Technology, Inc.
1/95
Integrated Tag RAMs operate as ordinary SRAMs, but have an additional access mode in which a word of data (an index) is read internally (but not driven off-chip) and compared with the CPU address driven onto the Tag RAM's data bus. Figure 2 shows the basic architecture of an integrated Tag SRAM. The comparator indicates whether the cache holds the data for the address supplied by the CPU or other bus master. This is a critical timing path, since the tag "hit" or "miss" must be determined before the cache memory access can be completed (or, in many cases, even started). Note that tag memories connect only to the CPU address bus, never to the CPU data bus.

[Figure 2 (block diagram): an address-indexed memory whose DATA (TAG) bus is used for WRITE (data in) and READ (data out), plus a COMPARE path that checks the DATA (TAG) input against the stored word and drives MATCH.]
Figure 2. Basic Integrated Tag SRAM Architecture

An additional feature of the Tag SRAM is that a portion of the memory is resettable. This permits one bit of the data field to be used as a "valid" status bit. Upon system initialization, when the cache contains random data, a quick reset clears the valid bit for every cache line so that all initial cache accesses result in a miss. A miss then causes the address to be loaded into the Tag RAM, data from main memory to be loaded into the data RAMs, and the valid bit to be set true. If this function is not included in the Tag RAM, it requires an additional 1-bit wide SRAM.
The reset feature of earlier Tag RAMs was sufficient for implementing a valid bit, but nothing more. Today's secondary caches frequently implement four-state write-back protocols such as MESI, and multiprocessor applications require five states (e.g. MOESI) or more. Hence, most caches need a two- or three-bit status memory that is accessed separately from the tag memory. It is used in conjunction with the match output to determine the response to a CPU memory access or a snoop. (A snoop is an operation initiated by the system in order to maintain coherency between the cache(s) and main memory.) This has typically been handled with yet another RAM: a conventional separate I/O SRAM organized as either x1 or x4. The 71215/16 includes a three-bit status memory on chip.

THE 71215 AND 71216
As shown in Figure 3, these 16K x 15 RAMs are configured internally as two memories: 16K x 12 for tag and 16K x 3 for status. These two memories share the address bus but are controlled independently. An important new feature is the addition of pins and logic for generating BRDY (Intel's Burst Ready) and TA (PowerPC's Transfer Acknowledge). These are CPU input signals that are time critical in zero-wait-state secondary caches. The I/Os are 3.3V compatible, and there is a low-power standby mode. All writes are synchronous, as with burst data SRAMs, while all reads and compares are asynchronous for minimum delay. Two opposite-polarity chip select pins are provided for easy depth expansion.
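To make these access modes concrete, here is a minimal behavioral sketch in Python of an integrated Tag RAM organized like the 71215/16: a 16K x 12 tag array and a 16K x 3 status array sharing one address. It ignores timing, pin polarities and tri-state behavior, and it assumes, for illustration only, that the lowest status bit serves as the valid bit. The class and method names are hypothetical. Real parts power up with random contents; the model starts from all zeros, which is enough to show why a tag compare alone is not a usable hit.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

DEPTH = 16 * 1024     # 16K entries, addressed by ADDR(13:0)
TAG_MASK = 0xFFF      # 12-bit tag field
STATUS_MASK = 0x7     # 3-bit status field
VALID = 0x1           # assumption: lowest status bit used as the valid bit


@dataclass
class TagRamModel:
    """Behavioral sketch of a 16K x 12 tag array plus a 16K x 3 status array."""
    tags: List[int] = field(default_factory=lambda: [0] * DEPTH)
    status: List[int] = field(default_factory=lambda: [0] * DEPTH)

    def write(self, addr: int, tag: Optional[int] = None,
              status: Optional[int] = None) -> None:
        # The arrays share the address but are written independently.
        if tag is not None:
            self.tags[addr] = tag & TAG_MASK
        if status is not None:
            self.status[addr] = status & STATUS_MASK

    def read(self, addr: int) -> Tuple[int, int]:
        return self.tags[addr], self.status[addr]

    def compare(self, addr: int, tag: int) -> bool:
        # The stored tag is read internally (never driven off-chip) and
        # compared with the address bits presented on the TAG bus.
        return self.tags[addr] == (tag & TAG_MASK)

    def hit(self, addr: int, tag: int) -> bool:
        # A usable hit needs both a tag match and a set valid bit.
        return self.compare(addr, tag) and bool(self.status[addr] & VALID)

    def reset(self) -> None:
        # Only the status array is resettable; tag contents are left alone.
        self.status = [0] * DEPTH
```

After `reset()`, a lookup can still find a matching tag, but `hit()` reports a miss until the line is refilled and its valid bit set again.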
[Figure 3 (block diagram): ADDR(0:13) addresses a 16K x 12 tag memory (TAG(0:11), controlled by WET and OET) and a 16K x 3 status memory (inputs VLDIN/S1IN, DTYIN/S2IN, WTIN/S3IN; outputs VLDOUT, DTYOUT, WTOUT; controlled by WES and OES), with a register clocked by CLK. The match and BRDY logic uses SFUNC, W/R (TT1), BRDYH (TAH), BRDYIN (TAIN) and BRDYOE (TAOE) to drive MATCH and BRDY (TA). Control logic on CS1, CS2, RESET and PWRDN handles chip enabling, resetting the 16K x 3 memory, powering down and disabling outputs.]
Figure 3. Simplified 71215 / 71216 Block Diagram (71216 signal names are in parentheses)
For a 1MB cache and 4GB of cacheable main memory, two
of the devices may be cascaded in depth without any timing
penalty apart from increased capacitive loading. This is
accomplished with the two Chip Select pins. A low order
address signal may be connected to CS1 on one chip and to
CS2 on the other so that at any given time, one is selected and
the other is deselected. The deselected chip ignores all
control inputs (except RESET and PWRDN) and tri-states its
outputs so that the two chips can be conveniently bussed
together. As expected, worst case timing delays from the Chip
Select inputs are the same as for the Address inputs. When
only a single 71215 or 71216 is used in an application, CS1 is
tied to VSS and CS2 is tied to VCC.
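The chip-select arrangement for depth expansion can be checked with a few lines of Python. The sketch assumes, consistent with the single-device strapping above, that a 71215/16 is selected only when CS1 is low and CS2 is high, and that the unused chip select on each device is tied to its active level. In a 1MB cache the selecting address signal would be the bit just above the 14 Tag RAM address bits. Function names are hypothetical.

```python
from typing import Optional, Tuple


def selected(cs1: int, cs2: int) -> bool:
    """A 71215/16 responds only when CS1 is low and CS2 is high."""
    return cs1 == 0 and cs2 == 1


def select_pair(addr_bit: int) -> Tuple[bool, bool]:
    """Select state of two cascaded Tag RAMs driven by one address bit.

    Chip A: CS1 <- address bit, CS2 tied to VCC.
    Chip B: CS1 tied to VSS, CS2 <- address bit.
    """
    return selected(addr_bit, 1), selected(0, addr_bit)


def bussed_match(addr_bit: int, match_a: bool, match_b: bool) -> Optional[bool]:
    """MATCH seen on the shared net; a deselected chip tri-states its output."""
    a_on, b_on = select_pair(addr_bit)
    drivers = [m for on, m in ((a_on, match_a), (b_on, match_b)) if on]
    # Exactly one chip is selected for either value of the bit, so the
    # bussed outputs never contend.
    return drivers[0] if drivers else None


if __name__ == "__main__":
    for bit in (0, 1):
        print(bit, select_pair(bit))
```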
With a 16K x 12 tag memory, the 71215 and 71216 are
wider and deeper than most Tag RAMs. For a typical 64-bit
CPU with a 32-byte line size, the 16K depth supports a 512KB
cache while the 12-bit tag field supports 2GB of cacheable
main memory. Thus, only a single component is required for
most applications. Table 1 shows the relationships between
Tag RAM size, cache size, and cacheable main memory size.
The Tag depth is equal to the cache size divided by the line size. The Tag width is equal to the base-2 logarithm of the ratio of cacheable main memory size to cache size.
TABLE 1: REQUIRED TAG RAM SIZE AS A FUNCTION OF CACHE SIZE AND MAIN MEMORY SIZE
(For 32-byte line size and direct-mapped cache architecture.)

             Cacheable Main Memory Size
Cache Size   64MB      256MB     1GB       2GB       4GB
128KB        4K x 9    4K x 11   4K x 13   4K x 14   4K x 15
256KB        8K x 8    8K x 10   8K x 12   8K x 13   8K x 14
512KB        16K x 7   16K x 9   16K x 11  16K x 12  16K x 13
1MB          32K x 6   32K x 8   32K x 10  32K x 11  32K x 12
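Table 1 can be regenerated from the two rules stated above: depth = cache size / line size, and width = log2(cacheable main memory size / cache size). A small Python sketch, assuming the same 32-byte line and direct-mapped organization; the helper names are illustrative only.

```python
KB, MB, GB = 2**10, 2**20, 2**30

CACHE_SIZES = [(128 * KB, "128KB"), (256 * KB, "256KB"), (512 * KB, "512KB"), (1 * MB, "1MB")]
MEMORY_SIZES = [(64 * MB, "64MB"), (256 * MB, "256MB"), (1 * GB, "1GB"),
                (2 * GB, "2GB"), (4 * GB, "4GB")]


def tag_ram_size(cache_bytes: int, cacheable_bytes: int, line_bytes: int = 32) -> tuple:
    """Return (depth, width) of the tag RAM for a direct-mapped cache."""
    depth = cache_bytes // line_bytes
    ratio = cacheable_bytes // cache_bytes
    if ratio < 1 or ratio & (ratio - 1):
        raise ValueError("memory/cache ratio must be a power of two")
    width = ratio.bit_length() - 1  # exact log2 of a power of two
    return depth, width


def label(depth: int, width: int) -> str:
    return f"{depth // 1024}K x {width}"


if __name__ == "__main__":
    print("Cache".ljust(8) + "".join(name.ljust(10) for _, name in MEMORY_SIZES))
    for cache, cname in CACHE_SIZES:
        row = [label(*tag_ram_size(cache, mem)) for mem, _ in MEMORY_SIZES]
        print(cname.ljust(8) + "".join(cell.ljust(10) for cell in row))
```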
[Figure 4 (detailed block diagram): ADDR(13:0) reaches the 16K x 12 tag memory and the 16K x 3 status memory through a shared address register that can be bypassed for flow-through operation. Each memory has its own data-in register (TAG(11:0); VLD/S1IN, DTY/S2IN, WT/S3IN) and sense amplifiers; OET enables the TAG outputs and OES enables VLD/S1OUT, DTY/S2OUT and WT/S3OUT. WET and WES drive a write pulse generator clocked by CLK, and RESET drives a separate pulse generator for the status memory. The compare logic, with SFUNC and W/R (TT1), drives MATCH and, together with BRDYH (TAH) and a registered BRDYIN (TAIN), drives BRDY (TA), enabled by BRDYOE (TAOE). CS1, CS2 and PWRDN also enter the control logic.]
Figure 4. Detailed 71215 / 71216 Block Diagram (71216 pin names are in parentheses)
The 71215/16 is shown in more detail in Figure 4. The tag memory is controlled by the Write Enable Tag (WET) and Output Enable Tag (OET) pins. During writes, WET is synchronous to CLK, as are the input data (TAG0 - TAG11) and address (A0 - A13). Note that WET has no effect on the TAG output buffers, so OET must be high to disable the outputs during writes. Reads are performed by deasserting WET and asynchronously asserting OET. For cache architectures in which the tag is never read (e.g. write-through caches), OET may be tied to VCC. When both WET and OET are high, the 71215/16 is in match mode, where the TAG0 - TAG11 inputs are compared with the stored data and are used to generate the MATCH and BRDY/TA outputs. In both read and match modes, the address path is flow-through for the fastest possible response to a new address.
The three status bits of the 71215/16 are labeled VLD/S1, DTY/S2, and WT/S3. They have dual names because their functions depend on the state of the static Status Function (SFUNC) input signal. When SFUNC is low, the status bits are in "dedicated" mode and are referred to as Valid, Dirty and Write-Through (see Figure 5). When SFUNC is high, the status bits play no special role within the 71215/16 and are simply referred to as Status 1, Status 2 and Status 3 (see Figure 6). The functionality of VLD and WT in dedicated mode is described later. DTY/S2 has no special functionality within the 71215/16.
[Figure 5 (logic diagram): in dedicated mode, the VLD, DTY and WT bits are stored in the status memory; the stored VLD bit gates the comparator output to form MATCH, and the stored WT bit, together with W/R (TT1), can block the internally generated BRDY (TA). BRDYH (TAH), BRDYIN (TAIN) and BRDYOE (TAOE) also act on the BRDY (TA) output.]
Figure 5. Dedicated Mode Logic (71216 pin names are in parentheses)

[Figure 6 (logic diagram): in generic mode, S1-S3 are stored and read back, but MATCH is taken directly from the comparator and the status bits do not affect BRDY (TA); BRDYH (TAH), BRDYIN (TAIN) and BRDYOE (TAOE) still act on the BRDY (TA) output.]
Figure 6. Generic Mode Logic (71216 pin names are in parentheses)
The status bits are accessed through separate input and output pins. This avoids the need for fast turnaround on this bus, as in the following example: a single-word write hit to a write-back line requires the line's state to be set to dirty (also called “modified”), so the status memory must go from reading to writing and back to reading in as little as two cycles. If common I/O is preferred, the user may tie the respective input and output pins together. The status memory control signals (WES and OES) are equivalent to WET and OET for the tag memory. Because the status field has separate I/O, OES is normally tied to VSS to permanently enable the status outputs.
The tag and status memories are controlled independently
since normal operation of the 71215/16 finds the tag memory
in match mode and the status memory in read mode. Often,
however, WET and WES are tied together in a design because
the write function tends to be common between them. For
those times when only the status bits need to be updated,
WET, WES and OET can be asserted together without having
to externally drive the TAG bus. This causes the data read
from the tag field to be written back to the same address,
resulting in no change to the tag data.
Note that there is only one address register that is used by
both memory segments. The address is registered when
either WET or WES is sampled low, and is flow-through when
both WET and WES are sampled high.
The entire status memory is cleared to zeros when RESET
is sampled low on at least one rising edge of CLK. This can
be used to put the cache into a known state after power up, or
after a cache flush. Since reset is a type of write, WET and
WES are required to be high during reset. PWRDN must also
be high, but the state of the chip select inputs does not matter.
During reset, BRDY/TA is driven high, and MATCH is driven
low.
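The control rules in this section can be summarized as a small decode model. The Python sketch below is an illustration under stated assumptions, not IDT's internal logic: the chip is assumed selected, the control pins are treated as active low as described (WET, WES, OET, RESET, PWRDN), each write is applied on a single rising CLK edge, and the status-only update is modeled as the tag array rewriting its own contents when WET and OET are both low. All names are hypothetical.

```python
from enum import Enum
from typing import List


class TagMode(Enum):
    WRITE = "write"   # WET sampled low (synchronous)
    READ = "read"     # WET high, OET low (asynchronous)
    MATCH = "match"   # WET high, OET high: compare against the TAG inputs


def tag_mode(wet: int, oet: int) -> TagMode:
    if wet == 0:
        return TagMode.WRITE
    return TagMode.READ if oet == 0 else TagMode.MATCH


def address_registered(wet: int, wes: int) -> bool:
    """One shared address register: latched on either write, else flow-through."""
    return wet == 0 or wes == 0


def clock_edge(tags: List[int], status: List[int], addr: int,
               wet: int, oet: int, wes: int, reset: int, pwrdn: int,
               tag_in: int, status_in: int) -> None:
    """Apply one rising CLK edge to lists modelling the tag and status arrays."""
    if reset == 0:
        # Reset requires WET, WES and PWRDN high; chip selects are don't-cares.
        if wet == 0 or wes == 0 or pwrdn == 0:
            raise ValueError("RESET needs WET, WES and PWRDN high")
        status[:] = [0] * len(status)
        return
    if wet == 0:
        # With OET also low, the tag outputs drive the TAG bus, so the stored
        # tag is written back unchanged (a status-only update).
        tags[addr] = tags[addr] if oet == 0 else tag_in
    if wes == 0:
        status[addr] = status_in
```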
MATCH, BRDY AND TA
As mentioned earlier, the 71215/16 is in match mode when OET is high and WET is sampled high. This allows the TAG0 - TAG11 inputs (the high-order bits of the CPU/system address bus) to be compared with the internally stored data. When SFUNC is low, the stored VLD bit is combined with the comparator output to generate a MATCH output that is true only when the tag comparison is true and the VLD bit is high. Thus, an invalid tag entry does not generate a hit. Note that OES and WES do not affect internal access of the status bits. When SFUNC is high, the status bits are generic and MATCH is simply the output of the comparator. MATCH is driven low when the 71215/16 is not in match mode. When the chip is deselected, MATCH becomes high impedance.
The cache/memory controller has traditionally generated the BRDY/TA signal to the CPU, using MATCH and other inputs. This is a critical timing path. During a zero-wait-state lead-off, there are only two clock cycles for the CPU to drive the address and other bus signals and for BRDY or TA to be returned to the CPU by the cache controller (see Figure 7). Typically there is not enough time to have two chips (Tag RAM and controller) in this timing path. The 71215/16 addresses this difficulty by incorporating logic for generating BRDY/TA, thereby removing the cache controller from this path, as shown in Figure 8.
While the cache controller is removed from the primary BRDY/TA timing path, it must still play a part in generating BRDY/TA. The controller has address and other bus cycle information that is needed to qualify the generation of BRDY/TA. This qualification logic is best placed in parallel with the tag lookup, rather than in series with it. Also, there are cases where the generation of BRDY/TA by the 71215/16 must be blocked so that the cache/memory controller can generate it instead.

[Figure 7 (timing path): a 66MHz CPU drives the address (10ns delay with derating) to the Tag SRAM (10ns delay to MATCH); the cache controller combines MATCH with miscellaneous address and status inputs (5ns delay) and returns BRDY/TA, which needs 5ns of setup at the CPU. The whole path must fit in two clock cycles (30ns).]
Figure 7. Conventional Tag RAM Usage - Chip Set in BRDY/TA Critical Path

[Figure 8 (timing path): the CPU address (10ns delay with derating) goes directly to the 71215/16, which drives BRDY/TA to the CPU (11ns), subject to the same 5ns setup and two-clock (30ns) budget; MATCH and the status bits go to the cache controller in parallel, outside the critical path.]
Figure 8. 71215/16 BRDY/TA Timing - Chip Set Removed from the Critical Path

Logic has been included in the 71215/16 that enables it to qualify BRDY/TA for one particular case. While a write hit to a write-back line can be handled by the cache alone, a write-through line requires that the write also proceed to main memory. In the former case, the cache can respond without wait states, and BRDY/TA is driven low immediately as the result of a tag match and a set VLD bit. In the latter case, main memory writes normally require wait states. If a line (or the whole cache) is write-through, the 71215/16 should not drive BRDY/TA low, so that the cache/memory controller may do so later, when the main memory (or write buffer) write is complete. When the Tag RAM is in dedicated status mode (SFUNC low), the stored WT bit determines whether the line is write-through (high) or write-back (low). Note that it may also be used to denote a write-protected line. Another pin, W/R on the 71215 and TT1 on the 71216, connects directly to the CPU to distinguish processor reads from processor writes. These two bits of information are used to block internal generation of BRDY/TA during a processor write to a write-through line. Without this feature, the cache controller might not have enough time to generate a blocking signal (as described below) based on the WT output from the Tag RAM. If a user wants to gate the VLD bit with MATCH but not use the WT bit in combination with W/R or TT1, the dedicated mode (SFUNC low) should be selected and W/R tied low or TT1 tied high. Note that the only functional difference between the 71215 and 71216 is the polarity of the W/R and TT1 signals.
The cache controller may have additional information and may wish to delay the assertion of BRDY/TA. Thus, the 71215 and 71216 have input pins, BRDYH and TAH respectively, so that the cache controller may force BRDY/TA high regardless of the result of the tag comparison inside the 71215/16.
In the case of a cache miss or a write-through, the system memory controller (usually combined with the cache controller) becomes completely responsible for generating BRDY or TA for that bus cycle. For flexibility, the 71215/16 incorporates two options for merging its own BRDY/TA output with that generated by the system memory controller.
One approach is to bus the two signals together. This is the preferred approach when the cache (including the 71215/16) is optional, as on a module, since adding or removing the cache does not affect the way in which the cache controller generates BRDY/TA. Figure 9 shows this approach for the 71215 used with the Pentium; it applies equally to the 71216 and PowerPC. This approach requires that both BRDY/TA sources be tri-statable. The BRDYOE and TAOE input pins of the 71215 and 71216 are driven by the cache/memory controller and are used to enable or disable the 71215/16 BRDY/TA output as necessary. To be prepared for a possible hit, each new bus cycle begins with BRDYOE/TAOE low. In the event of a cache miss, the controller deasserts BRDYOE/TAOE, then takes over responsibility for driving BRDY/TA. This is also the procedure for writes to write-through lines, where even cache hits are responded to by the controller. The controller also usually takes over control of BRDY/TA for the second, third and fourth words of a burst transfer. This is required if either the CPU address is not guaranteed to remain valid throughout the entire bus cycle (a change to the 71215/16 address bus will ripple through to BRDY/TA), or if data SRAM accesses require wait states (it is necessary to toggle BRDY/TA from cycle to cycle). For those times when neither device is driving BRDY/TA, a pull-up resistor is used to keep the signal high. In this case, it is suggested that the controller drive BRDY/TA high before putting it in a high-impedance state. Thus, the resistor is never used to generate a low-to-high transition and can therefore be weak (3 KΩ to 20 KΩ). Also, both the 71215/16 and the controller can remain off the BRDY/TA bus for extended periods of time if desired. With this approach, BRDYIN or TAIN (Burst Ready Input, Transfer Acknowledge Input) is tied high.
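The MATCH and BRDY/TA rules described in this section can be collected into a behavioral sketch. This Python model is an illustration under assumptions, not a gate-level description of the 71215/16: signals are plain 0/1 levels evaluated once per cycle, the BRDYH/TAH input is modeled as a boolean "hold asserted" because its pin polarity is not given here, and a processor write is identified as W/R high on the 71215 or TT1 low on the 71216 (consistent with tying W/R low or TT1 high to disable the write-through block). The last function resolves the bussing option, where a pull-up holds BRDY/TA high when neither source drives it. All names are hypothetical.

```python
from typing import Optional, Sequence


def is_write(part: str, w_r_or_tt1: int) -> bool:
    """Processor write: W/R high on the 71215, TT1 low on the 71216."""
    return w_r_or_tt1 == 1 if part == "71215" else w_r_or_tt1 == 0


def match_out(match_mode: bool, tag_equal: bool, vld: int, sfunc: int) -> bool:
    """MATCH level (True = high) for a selected chip."""
    if not match_mode:
        return False                    # driven low outside match mode
    if sfunc == 0:
        return tag_equal and vld == 1   # dedicated mode: gated by stored VLD
    return tag_equal                    # generic mode: raw comparator output


def internal_brdy(match_mode: bool, tag_equal: bool, vld: int, wt: int,
                  write: bool, hold: bool, sfunc: int) -> int:
    """Internally generated active-low BRDY/TA (0 = asserted)."""
    hit = match_out(match_mode, tag_equal, vld, sfunc)
    # Dedicated mode only: a processor write to a write-through line is
    # left for the cache/memory controller to acknowledge.
    wt_block = sfunc == 0 and wt == 1 and write
    return 0 if hit and not wt_block and not hold else 1


def resolve_bussed(drivers: Sequence[Optional[int]]) -> int:
    """Level on a shared BRDY/TA net; None = driver in high impedance."""
    active = [d for d in drivers if d is not None]
    if not active:
        return 1                        # weak pull-up keeps the line high
    if len(set(active)) > 1:
        raise ValueError("both sources driving opposite levels")
    return active[0]
```

In the bussing option, the Tag RAM's entry in `drivers` would be `None` whenever BRDYOE/TAOE is high or the chip is deselected, so the controller's value (or the pull-up) determines the line.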
[Figure 9 (schematic): 66MHz Pentium, 71215 Tag RAM and chip set. The 71215 receives A(0:13) and TAG(0:11) and sends MATCH and the status bits to the chip set, which drives BRDYH and BRDYOE; the BRDY outputs of the 71215 and the chip set are bussed to the Pentium. Note: the BRDY sources are totem-pole, NOT open-drain.]
Figure 9. Combining BRDY/TA : Bussing Option

The second approach is to have the cache/memory controller drive its BRDY/TA output into the BRDYIN/TAIN input on the 71215/16 at all times. Inside the 71215/16, BRDYIN/TAIN is registered by the clock, then ANDed (negative-logic ORed) with the internally generated BRDY/TA. For this approach, BRDYOE/TAOE is tied permanently low. The controller no longer generates BRDYOE/TAOE, but instead must generate BRDY/TA one cycle earlier, because it is delayed by one cycle in reaching the CPU. Note that BRDYH/TAH only enables or disables the BRDY/TA generated inside the 71215/16, and does not affect the propagation of BRDYIN/TAIN through to the BRDY/TA output. Figure 10 shows this approach for the 71215 and Pentium.

[Figure 10 (schematic): same connections as Figure 9, except that the chip set's BRDY output drives the 71215 BRDYIN input, which is registered, and only the 71215 drives BRDY to the Pentium.]
Figure 10. Combining BRDY/TA : Pass-Through Option

BRDY/TA behaves like MATCH (but with opposite polarity) when the 71215/16 is not in match mode. It is high impedance when the chip is deselected (or BRDYOE/TAOE is high), and is otherwise driven high when out of match mode.

REDUCED POWER
For the increasing number of applications that require a low-power standby mode, the 71215/16 includes an asynchronous power-down pin (PWRDN). When it is driven low, both the tag and status memories are shut down, considerably reducing power consumption. For optimum power savings, all input and bidirectional signals should also be held at CMOS voltage levels (near VCC or VSS). During power down, all outputs are placed in a high-impedance state and all data is retained. All writes should be allowed to complete before PWRDN is asserted. There is no minimum time that it must be held low. When exiting the power-down state, there is only a very short delay after the rising edge of PWRDN before normal activity can be resumed.

SYSTEM USAGE
For applications not using the entire 12-bit tag field, the unused TAG I/O pins should be pulled either high or low through 1 KΩ to 5 KΩ resistors. For applications not using the entire 3-bit status field, the unused inputs may be tied directly to VCC or VSS, and the unused outputs are left unconnected. All other unused inputs should be tied either to VCC or VSS as appropriate for their function. This includes unused address signals if only part of the depth of the 71215/16 is used.
[Figure 11 (schematic): Pentium address lines through an address buffer, with A29-A18 to the 71215 TAG(11:0), A17-A5 to ADDR(12:0), and A17-A3 to two IDT71V432 32K x 32 burst SRAMs. MATCH, BRDY, BRDYIN, W/R, BRDYH, RESET, the 3-bit status bus, WES, WET, OET and PWRDN connect the 71215 to the chip set; ADDR(13), CS1, CS2, BRDYOE, OES and SFUNC are strapped to fixed levels. The data bus D63 - D0 connects to the system bus through a data buffer.]
Figure 11: Pentium / 71215 Example of 256 KB Cache
When changing the depth of the cache (and Tag RAM), the
TAG field shifts accordingly, so that it remains contiguous with
the address field. The example in Figure 11 uses only half of
a 71215, and can be compared with Figures 1 and 12. It shows
a 256KB cache for the Pentium, uses the BRDY pass-through
option from Figure 10, and maps 1GB of main memory into the
cache. Two 32Kx32 burst SRAMs are shown as the cache
data RAM.
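Because the tag field simply moves with the cache size, the bit ranges for each configuration can be computed rather than looked up. A Python sketch, assuming Intel-style numbering (A31 = MSB), 32-byte lines and 8-byte words as in the examples; for Figure 11's 256KB cache and 1GB of cacheable memory it yields the A29-A18 tag field, the A17-A5 Tag RAM address and the A17-A3 data RAM address. The function names are hypothetical.

```python
def log2_exact(n: int) -> int:
    """Return log2(n), requiring n to be a positive power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def intel_fields(cache_bytes: int, cacheable_bytes: int,
                 line_bytes: int = 32, word_bytes: int = 8) -> dict:
    """(msb, lsb) Intel bit ranges for the tag, Tag RAM address and data RAM address."""
    index_top = log2_exact(cache_bytes) - 1      # highest address bit inside the cache
    tag_top = log2_exact(cacheable_bytes) - 1    # highest cacheable address bit
    return {
        "tag": (tag_top, index_top + 1),
        "tag_ram_addr": (index_top, log2_exact(line_bytes)),
        "data_ram_addr": (index_top, log2_exact(word_bytes)),
    }


if __name__ == "__main__":
    for cache_kb, mem_gb in ((256, 1), (512, 2)):
        f = intel_fields(cache_kb * 1024, mem_gb * 2**30)
        print(f"{cache_kb}KB / {mem_gb}GB:",
              {k: f"A{hi}-A{lo}" for k, (hi, lo) in f.items()})
```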
[Figure 12 (schematic): PowerPC address lines through an address buffer, with A1-A12 to the 71216 TAG(11:0), A13-A26 to ADDR(13:0), and A13-A28 to the 64K x 18 burst SRAM data RAMs. TA, TT1, MATCH, TAOE, TAH, RESET, the 3-bit status bus, WES, WET, OET and PWRDN connect the 71216 to the chip set; CS1, CS2, TAIN, OES and SFUNC are strapped to fixed levels. The 72-bit data bus (DH0 - DH31, DL0 - DL31 and DP0 - DP7) connects to the system bus through a data buffer.]
Figure 12: PowerPC / 71216 Example of 512 KB Cache
Figure 12 shows a 512KB cache implementation for the
PowerPC, using the full address range of the 71216. This
example uses the bussed TA implementation shown in Figure 9.
The tag size is sufficient to support 2GB of cacheable main
memory.
[Figure 13 (schematic): as in Figure 12, but with two 71216 Tag RAMs and two banks of 64K x 18 burst SRAMs. A0-A11 drive TAG(11:0) of both Tag RAMs, A13-A26 drive ADDR(13:0), and A12 drives CS1 of one Tag RAM and data SRAM bank and CS2 of the other.]
Figure 13: PowerPC / 71216 Example of 1 MB Cache
Figure 13 shows a 1MB cache for the PowerPC using the
71216. The implementation is essentially the same as for
512KB, but with two 71216 Tags and two banks of data
SRAMs. Except for CS1 and CS2, all the same signals that
were connected to the first Tag RAM should be connected to
the same pins of the second Tag RAM. The least significant
tag bit of the 512KB cache is used to select between the two
Tag RAMs of the 1MB cache. The same is true for the two
banks of data SRAMs. The tag field then shifts one bit in the
direction of the more significant address bits. Please note that
the PowerPC and Intel processors number their address bits in
opposite order: A0 is the MSB for the PowerPC, while A31 is
the MSB for Intel's processors.
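Converting between the two numbering conventions is a fixed mirror of the 32-bit address: Intel bit n corresponds to PowerPC bit 31 - n. The short Python sketch below (hypothetical helper names) applies this to the Intel-style ranges of the 1MB, 4GB configuration and reproduces the PowerPC connections of Figure 13: A0-A11 for the tag, A12 as the chip select bit, and A13-A26 for the Tag RAM address.

```python
from typing import Tuple

ADDRESS_BITS = 32


def intel_to_ppc(bit: int) -> int:
    """Intel numbers A31 as the MSB; PowerPC numbers A0 as the MSB."""
    if not 0 <= bit < ADDRESS_BITS:
        raise ValueError("bit out of range")
    return ADDRESS_BITS - 1 - bit


def range_to_ppc(msb: int, lsb: int) -> Tuple[int, int]:
    """Convert an Intel (msb, lsb) range to a PowerPC (first, last) range.

    Both are listed MSB first, so PowerPC ranges read upward from A0.
    """
    return intel_to_ppc(msb), intel_to_ppc(lsb)


if __name__ == "__main__":
    # 1MB cache, 4GB cacheable, two 71216 devices (Intel-style ranges).
    for name, (msb, lsb) in {"tag": (31, 20), "chip select": (19, 19),
                             "tag RAM address": (18, 5)}.items():
        first, last = range_to_ppc(msb, lsb)
        print(f"{name}: A{first}-A{last}")
```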
It is also possible to double the size of the cache and
cached address space without doubling up the Tag RAMs.
This can be done by doubling the line size of the cache - from
32 bytes to 64 bytes, for example. It is not necessary to have
the same line size for both the primary and secondary caches,
though it does simplify the cache controller. A more detailed
discussion of this topic is beyond the scope of this application
note.
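The arithmetic behind this trade-off is simple: the cache size is the tag depth times the line size, and the cacheable space is the cache size times 2 raised to the tag width. A short Python check (hypothetical function name) shows that the same 16K x 12 tag array covers a 512KB cache and 2GB with 32-byte lines, or a 1MB cache and 4GB with 64-byte lines.

```python
def coverage(depth: int, tag_width: int, line_bytes: int) -> tuple:
    """Return (cache_bytes, cacheable_bytes) for a direct-mapped tag array."""
    cache_bytes = depth * line_bytes
    cacheable_bytes = cache_bytes << tag_width   # cache size * 2**tag_width
    return cache_bytes, cacheable_bytes


if __name__ == "__main__":
    for line in (32, 64):
        cache, mem = coverage(16 * 1024, 12, line)
        print(f"{line}-byte lines: {cache // 2**10}KB cache, {mem // 2**30}GB cacheable")
```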
The CLK pin should be driven by the same clock that drives
the CPU. Although there is no standard for clock skew
tolerances between devices, a recommended target is ±1 ns.
MESI PROTOCOL IMPLEMENTATION
MESI is a cache coherency protocol, implemented in the primary cache of both the PowerPC 601 and the Pentium processor. With the 71215/16, it is now practical to also implement MESI for the L2 cache. The acronym stands for Modified (write-back data that is dirty), Exclusive (clean write-back data that can later transition to Modified), Shared (write-through data that cannot become Modified) and Invalid. In short, it allows cache lines to be individually marked as either write-through or write-back. While the cache controller is responsible for implementing the protocol and controlling the state transitions, the 71215/16's features can be helpful in the implementation.
The following state assignments for MESI are intended to
take advantage of the features of the 71215/16 when it is in the
dedicated status mode (SFUNC low). Some variations are
possible:
TABLE 2: SUGGESTED MESI STATE ASSIGNMENTS
State       VLD/S1   DTY/S2   WT/S3
Invalid     0        X        X
Shared      1        0        1
Exclusive   1        0        0
Modified    1        1        0
(X = don't care)
As described earlier, BRDY/TA generation is blocked when
VLD/S1 is low, and during write hits to write-through lines. The
cache controller is responsible for all state transitions, including Exclusive to Modified.
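The assignments in Table 2 and the blocking rule above can be expressed as a small encode/decode table. This Python sketch is illustrative only: the don't-care bits of Invalid are written as 0, the unassigned combination (VLD, DTY, WT) = (1, 1, 1) is rejected, and the write-hit transition shown is a simplified single-processor view with snoop-driven transitions omitted. In keeping with the text, the transition function stands in for the cache controller, not the Tag RAM. All names are hypothetical.

```python
from enum import Enum
from typing import Tuple


class Mesi(Enum):
    INVALID = "I"
    SHARED = "S"
    EXCLUSIVE = "E"
    MODIFIED = "M"


# (VLD/S1, DTY/S2, WT/S3) per Table 2; Invalid's don't-care bits written as 0.
ENCODE = {
    Mesi.INVALID: (0, 0, 0),
    Mesi.SHARED: (1, 0, 1),
    Mesi.EXCLUSIVE: (1, 0, 0),
    Mesi.MODIFIED: (1, 1, 0),
}


def decode(bits: Tuple[int, int, int]) -> Mesi:
    vld, dty, wt = bits
    if vld == 0:
        return Mesi.INVALID              # DTY and WT are don't-cares
    if (dty, wt) == (1, 1):
        raise ValueError("VLD=1, DTY=1, WT=1 is not assigned in Table 2")
    if wt == 1:
        return Mesi.SHARED
    return Mesi.MODIFIED if dty == 1 else Mesi.EXCLUSIVE


def tag_ram_acknowledges(state: Mesi, write: bool) -> bool:
    """Whether the 71215/16 can drive BRDY/TA itself on a tag match."""
    vld, _, wt = ENCODE[state]
    return vld == 1 and not (write and wt == 1)


def controller_write_hit(state: Mesi) -> Mesi:
    """Cache controller's next state after a CPU write hit (simplified)."""
    if state is Mesi.INVALID:
        raise ValueError("an invalid line cannot hit")
    if state is Mesi.EXCLUSIVE:
        return Mesi.MODIFIED             # the E->M transition the controller owns
    return state                         # M stays M; S stays S (write-through)
```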
SUMMARY
The 71215 and 71216 represent a major step forward in
Tag RAMs. They are sized appropriately for the majority of
today’s cache and main memory requirements, and offer new
features that help remove many of the barriers to implementing zero-wait-state caches.
As this application note is being written, the fastest speed
grade of the 71215 and 71216 is 9 ns (address to match time),
with faster speeds expected in the future. Please contact your
local IDT sales office or representative for information on the
latest speed availability.