RM5231™ Microprocessor with 32-Bit System Bus
Document Rev. 1.3
Date: 01/2000
Quantum Effect Devices, www.qedinc.com

FEATURES
• Dual-issue superscalar microprocessor
  - 150, 200, and 250 MHz operating frequencies
  - 300 Dhrystone 2.1 MIPS
• System interface optimized for embedded applications
  - 32-bit system interface lowers total system cost
  - High-performance write protocols maximize uncached write bandwidth
  - Processor clock multipliers: 2, 2.5, 3, 3.5, 4, 4.5, 5, 6, 7, 8, 9
  - 2.5V core with 3.3V I/O
  - IEEE 1149.1 JTAG boundary scan
• Integrated on-chip caches
  - 32KB instruction and 32KB data, 2-way set associative
  - Per-set locking
  - Virtually indexed, physically tagged
  - Write-back and write-through on a per-page basis
  - Pipeline restart on first doubleword for data cache misses
• Integrated memory management unit
  - Fully associative joint TLB (shared by I and D translations)
  - 48 dual entries map 96 pages
  - Variable page size (4KB to 16MB in 4x increments)
• High-performance floating-point unit, up to 500 MFLOPS
  - Single-cycle repeat rate for common single-precision operations and some double-precision operations
  - Two-cycle repeat rate for double-precision multiply and double-precision combined multiply-add operations
  - Single-cycle repeat rate for single-precision combined multiply-add operation
• MIPS IV instruction set
  - Floating-point multiply-add instruction increases performance in signal processing and graphics applications
  - Conditional moves to reduce branch frequency
  - Indexed address modes (register + register)
• Embedded application enhancements
  - Specialized DSP integer multiply-accumulate instructions and 3-operand multiply instruction
  - I and D cache locking by set
  - Optional dedicated exception vector for interrupts
• Fully static 0.25 micron CMOS design with power-down logic
  - Standby reduced-power mode with WAIT instruction
  - 2.5V core with 3.3V I/O
• 128-pin Power-Quad 4 (QFP) package

BLOCK DIAGRAM
[Figure: RM5231 block diagram. Major blocks: Primary Instruction Cache and Primary Data Cache (2-way set associative, with ITag and DTag), ITLB and DTLB, Joint TLB, Instruction Dispatch Unit, Integer and FP Instruction Registers, Integer Register File, Integer Address/Adder, Logic Unit, Shifter/Store Aligner, Load Aligner, Int Mult, Div, Madd, Integer Control, PC Incrementer, Branch PC Adder, Virtual Program Counter, Floating-Point Register File, Floating-Point MultAdd, Add, Sub, Cvt, Div, Sqrt, Floating-Point Load/Align, Floating-Point Control, Coprocessor 0, System/Memory Control, Store Buffer, Write Buffer, Read Buffer, Address Buffer, Pad Buffer, Packer/Unpacker, and PLL/Clocks, linked by the A/D, Pad, D, FP, Integer, FA, IVA, and DVA buses.]

HARDWARE OVERVIEW
The RM5231 offers a high level of integration targeted at high-performance embedded applications. The key elements of the RM5231 are briefly described below.

Pipeline
For integer operations, loads, stores, and other non-floating-point operations, the RM5231 uses a 5-stage pipeline. In addition to the integer pipeline, the RM5231 uses an extended 7-stage pipeline for floating-point operations.

Superscalar Dispatch
The RM5231 has an asymmetric superscalar dispatch unit that allows it to issue an integer instruction and a floating-point computation instruction simultaneously. Integer instructions include ALU, branch, load/store, and floating-point load/store instructions, while floating-point computation instructions include floating-point add, subtract, combined multiply-add, converts, and so on. In combination with its high-throughput, fully pipelined floating-point execution unit, the superscalar capability of the RM5231 provides high price/performance in computationally intensive embedded applications.

CPU Registers
The RM5231 CPU has a user-visible state consisting of 32 general-purpose registers, two special-purpose registers for integer multiplication and division, a program counter, and no condition code bits. Figure 1 shows the user-visible state.
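The pairing rule described under Superscalar Dispatch can be expressed as a small classifier. The sketch below is a simplified model, not the RM5231's actual dispatch logic: the `insn_class` enumeration and `can_dual_issue` function are hypothetical, and the model looks only at instruction class, ignoring operand dependencies, instruction alignment, and any other issue constraints the hardware checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical instruction classes for illustration; not an RM5231 encoding. */
typedef enum {
    INSN_INT_ALU,       /* add, sub, logical, shift */
    INSN_BRANCH,
    INSN_LOAD_STORE,    /* integer loads and stores */
    INSN_FP_LOAD_STORE, /* e.g. LDC1, SDC1: issued on the integer side */
    INSN_FP_COMPUTE     /* fadd, fsub, fmadd, converts, ... */
} insn_class;

/* Per the datasheet text, floating-point loads/stores count as
 * integer-side instructions; only FP computation goes to the FP side. */
static bool is_integer_side(insn_class c)
{
    assert(c >= INSN_INT_ALU && c <= INSN_FP_COMPUTE);
    return c != INSN_FP_COMPUTE;
}

/* Simplified model: two instructions may issue together only when one is
 * integer-side and the other is a floating-point computation. */
bool can_dual_issue(insn_class a, insn_class b)
{
    return is_integer_side(a) != is_integer_side(b);
}
```

Under this model, the LDC1-plus-fmadd pairing typical of a floating-point loop qualifies, while two integer ALU instructions, or two floating-point computations, do not.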
Integer Unit
The RM5231 integer unit includes thirty-two general-purpose 64-bit registers, a load/store architecture with single-cycle ALU operations (add, sub, logical, shift), and an autonomous multiply/divide unit. Additional register resources include the HI/LO result registers for the two-operand integer multiply/divide operations, and the program counter (PC).

The RM5231 implements the MIPS IV Instruction Set Architecture and is therefore fully upward compatible with applications that run on processors implementing the earlier-generation MIPS I-III instruction sets.

[Figure 1: CPU Registers. General-purpose registers r0 through r31, each 64 bits (bits 63..0); multiply/divide registers HI and LO, each 64 bits; Program Counter (PC), 64 bits.]

Figure 2 shows the RM5231 integer pipeline. As illustrated in the figure, up to five integer instructions can be executing simultaneously.

[Figure 2: Pipeline. Five successive instructions (I0 through I4), each starting one cycle after the previous one, pass through the I, R, A, D, and W stages; each stage is divided into two phases (1 and 2).]
1I-1R: Instruction cache access
2I: Instruction virtual to physical address translation
2R: Register file read, bypass calculation, instruction decode, branch address calculation
1A: Issue or slip decision, branch decision
1A: Data virtual address calculation
1A-2A: Integer add, logical, shift
2A: Store align
2A-2D: Data cache access and load align
1D: Data virtual to physical address translation
2W: Register file write

Register File
The RM5231 has thirty-two general-purpose registers, with register location 0 (r0) hardwired to a zero value. These registers are used for scalar integer operations and address calculation. The register file has two read ports and one write port and is fully bypassed to minimize operation latency in the pipeline.
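The overlap in Figure 2 follows from each instruction entering the I, R, A, D, W pipeline one cycle after its predecessor. The sketch below models only that ideal case (an assumption: no slips, cache misses, or branch effects) and reports which stage a given instruction occupies in a given cycle; `stage_of` and `in_flight` are illustrative names, not part of any RM5231 tool.

```c
#include <assert.h>

#define NUM_STAGES 5
/* Stage letters as labeled in Figure 2. */
static const char STAGE_NAMES[NUM_STAGES] = {'I', 'R', 'A', 'D', 'W'};

/* Stage occupied by instruction `insn` (0-based, started one cycle apart)
 * during `cycle`, or '-' if it has not yet entered or has already left.
 * Ideal model: one new integer instruction per cycle, no stalls. */
char stage_of(int insn, int cycle)
{
    assert(insn >= 0 && cycle >= 0);
    int offset = cycle - insn;
    if (offset < 0 || offset >= NUM_STAGES)
        return '-';
    return STAGE_NAMES[offset];
}

/* Number of instructions in the pipeline during `cycle` when `count`
 * instructions are started back to back from cycle 0. */
int in_flight(int count, int cycle)
{
    int n = 0;
    for (int i = 0; i < count; i++)
        if (stage_of(i, cycle) != '-')
            n++;
    return n;
}
```

In cycle 4, instruction 0 is in W while instruction 4 is in I, so `in_flight` reaches its maximum of five, matching the figure.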
ALU
The RM5231 ALU consists of an integer adder/subtractor, a logic unit, and a shifter. The adder performs address calculations in addition to arithmetic operations. The logic unit performs all logical and zero-shift data moves. The shifter performs shifts and store alignment operations. Each of these units is optimized to perform all operations in a single processor cycle.

Integer Multiply/Divide
The RM5231 has a dedicated integer multiply/divide unit optimized for high-speed multiply and multiply-accumulate operations. Table 1 shows the performance of the multiply/divide unit on each operation.

Table 1: Integer Multiply/Divide Operations
Opcode          Operand Size   Latency   Repeat Rate   Stall Cycles
MULT/U, MAD/U   16 bit         3         2             0
                32 bit         4         3             0
MUL             16 bit         3         2             1
                32 bit         4         3             2
DMULT, DMULTU   any            7         6             0
DIV, DIVU       any            36        36            0
DDIV, DDIVU     any            68        68            0

The baseline MIPS IV ISA specifies that the results of a multiply or divide operation be placed in the Hi and Lo registers. These values can then be transferred to the general-purpose register file using the Move-from-Hi and Move-from-Lo (MFHI/MFLO) instructions.

In addition to the baseline MIPS IV integer multiply instructions, the RM5231 also implements the 3-operand multiply instruction, MUL. This instruction writes the multiply result directly to the integer register file rather than to the Lo register. The portion of the product that would normally have gone into the Hi register is discarded. For applications where the upper half of the product is known not to be required, using the MUL instruction eliminates the need to execute an explicit MFLO instruction.

Also included in the RM5231 are the multiply-add instructions, MAD and MADU. These instructions multiply two operands and add the resulting product to the current contents of the Hi and Lo registers.
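The register semantics of MULT, MUL, and MAD described above can be modeled in C. This is a behavioral sketch for signed 32-bit operands only: the `hilo_regs` type and the `mult`, `mul`, `mad`, and `mad_dot` functions are hypothetical names, and the model assumes, following the usual 32-bit MIPS convention, that Hi holds the upper 32 bits and Lo the lower 32 bits of the 64-bit product or accumulator. On the RM5231, Hi and Lo are 64-bit registers; the extension of each half to 64 bits is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of HI:LO as a 64-bit value split into two halves.
 * ASSUMPTION: HI = upper 32 bits, LO = lower 32 bits. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} hilo_regs;

static uint64_t hilo_value(hilo_regs r)
{
    return ((uint64_t)r.hi << 32) | r.lo;
}

static hilo_regs hilo_from(uint64_t v)
{
    hilo_regs r = { (uint32_t)(v >> 32), (uint32_t)v };
    return r;
}

/* MULT: signed 32x32 -> 64-bit product written to HI:LO. */
hilo_regs mult(int32_t rs, int32_t rt)
{
    return hilo_from((uint64_t)((int64_t)rs * rt));
}

/* MAD: HI:LO += rs * rt (signed), wrapping modulo 2^64. */
hilo_regs mad(hilo_regs acc, int32_t rs, int32_t rt)
{
    return hilo_from(hilo_value(acc) + (uint64_t)((int64_t)rs * rt));
}

/* MUL: the low 32 bits of the product go straight to the destination
 * register (returned here as raw bits); the upper half is discarded,
 * so no MFLO is needed. */
uint32_t mul(int32_t rs, int32_t rt)
{
    return (uint32_t)(uint64_t)((int64_t)rs * rt);
}

/* Typical DSP use: clear the accumulator, then one MAD per sample pair. */
hilo_regs mad_dot(const int32_t *a, const int32_t *b, int n)
{
    assert(n >= 0 && (n == 0 || (a != 0 && b != 0)));
    hilo_regs acc = {0u, 0u};
    for (int i = 0; i < n; i++)
        acc = mad(acc, a[i], b[i]);
    return acc;
}
```

For example, adding 1 x 1 to an accumulator whose Lo half is 0xFFFFFFFF carries into Hi, which is why the accumulator spans both registers, whereas MUL keeps only the low 32 bits.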
The multiply-accumulate operation is the core primitive of almost all signal processing algorithms, allowing the RM5231 to eliminate the need for a separate DSP engine in many embedded applications.

Floating-Point Co-Processor
The RM5231 incorporates a high-performance, fully pipelined floating-point co-processor that includes a floating-point register file and autonomous execution units for multiply/add/convert and divide/square root. The floating-point co-processor is a tightly coupled execution unit, decoding and executing instructions in parallel with the integer unit and, in the case of floating-point loads and stores, in cooperation with it. The superscalar capabilities of the RM5231 allow floating-point computation instructions to issue concurrently with integer instructions.

Floating-Point Unit
The RM5231 floating-point execution unit supports single- and double-precision arithmetic, as specified in IEEE Standard 754. The execution unit is divided into a separate divide/square root unit and a pipelined multiply/add unit. Overlap of the divide/square root and multiply/add operations is supported.

The RM5231 maintains fully precise floating-point exceptions while allowing both overlapped and pipelined operations. Precise exceptions are extremely important in object-oriented programming environments and highly desirable for debugging in any environment.

Floating-point operations include:
• add
• subtract
• multiply
• divide
• square root
• reciprocal
• reciprocal square root
• conditional moves
• conversion between fixed-point and floating-point format
• conversion between floating-point formats
• floating-point compare

Table 2 gives the latencies of the floating-point instructions in internal processor cycles. In combination with the 64-bit wide data cache, the floating-point unit can issue a floating-point co-processor load or store doubleword instruction in every cycle.
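The latency and repeat-rate figures in Table 2 can be combined into a rough cycle estimate for a run of independent floating-point operations: the first result appears after the latency, and each further operation can start one repeat interval later. The sketch below assumes a fully pipelined, stall-free stream with no data dependencies between operations; `fp_timing` and `fp_stream_cycles` are illustrative names, and the result is a back-of-the-envelope figure, not a cycle-accurate simulation of the RM5231.

```c
#include <assert.h>

/* Latency and repeat rate, in processor cycles, for one operation type. */
typedef struct {
    int latency;
    int repeat;
} fp_timing;

/* Values from Table 2 (single / double precision). */
static const fp_timing FMADD_S = {4, 1};
static const fp_timing FMADD_D = {5, 2};
static const fp_timing FDIV_D  = {36, 34};

/* Cycles until the last of n independent operations completes, assuming
 * a new operation issues every `repeat` cycles and nothing else stalls. */
long fp_stream_cycles(fp_timing t, long n)
{
    assert(n >= 0 && t.latency > 0 && t.repeat > 0);
    if (n == 0)
        return 0;
    return t.latency + (n - 1) * (long)t.repeat;
}
```

For 100 independent single-precision multiply-adds this gives 4 + 99 = 103 cycles, close to one per cycle, while the double-precision form gives 5 + 99 x 2 = 203 cycles. A dependent chain, such as a single running accumulator, is instead limited by latency rather than repeat rate.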
Table 2: Floating-Point Instruction Cycles
Operation   Latency   Repeat Rate
fadd        4         1
fsub        4         1
fmult       4/5       1/2
fmadd       4/5       1/2
fmsub       4/5       1/2
fdiv        21/36     19/34
fsqrt       21/36     19/34
frecip      21/36     19/34
frsqrt      38/68     36/66
fcvt.s.d    4         1
fcvt.s.w    6         3
fcvt.s.l    6         3
fcvt.d.s    4         1
fcvt.d.w    4         1
fcvt.d.l    4         1
fcvt.w.s    4         1
fcvt.w.d    4         1
fcvt.l.s    4         1
fcvt.l.d    4         1
fcmp        1         1
fmov        1         1
fmovc       1         1
fabs        1         1
fneg        1         1
Note: Where two numbers are given, they are for single/double precision format.

Floating-Point General Register File
The floating-point general register file (FGR) is made up of thirty-two 64-bit registers. With the floating-point load and store double instructions (LDC1 and SDC1), the floating-point unit can take advantage of the 64-bit wide data cache.

To support superscalar operation, the FGR has four read ports and two write ports, and is fully bypassed to minimize operation latency in the pipeline. Three of the read ports and one write port are used to support the combined multiply-add instruction, while the fourth read port and second write port allow a concurrent floating-point load or store.

The floating-point control register space contains two registers: one for determining configuration and revision information for the coprocessor, and one for control and status information. These are primarily used for diagnostic software, exception handling, state saving and restoring, and control of rounding modes.

System Control Co-processor (CP0)
The system control co-processor, also called co-processor 0 or CP0 in the MIPS architecture, is responsible for the virtual memory sub-system, the exception control system, and the diagnostic capabilities of the processor. In the MIPS architecture, the system control co-processor (and thus the kernel software) is implementation dependent. The memory management unit controls the virtual memory system page mapping.
It consists of an instruction address translation buffer (ITLB), a data address translation buffer (DTLB), a joint instruction and data address translation buffer (JTLB), and co-processor registers used by the virtual memory mapping sub-system.

System Control Co-Processor Registers
The RM5231 incorporates all system control co-processor (CP0) registers on-chip. These registers provide the path through which the virtual memory system's page mapping is examined and modified, exceptions are handled, and operating modes are controlled (kernel vs. user mode, interrupts enabled or disabled, cache features). In addition, the RM5231 includes registers to implement a real-time cycle counting facility, to aid in cache diagnostic testing, and to assist in data error detection. Figure 3 shows the CP0 registers.

[Figure 3: CP0 Registers (register number in parentheses): Index (0), Random (1), EntryLo0 (2), EntryLo1 (3), Context (4), PageMask (5), Wired (6; entries protected from TLBWR), BadVAddr (8), Count (9), EntryHi (10), Compare (11), Status (12), Cause (13), EPC (14), PRId (15), Config (16), LLAddr (17), XContext (20), ECC (26), CacheErr (27), TagLo (28), TagHi (29), ErrorEPC (30), and the TLB (entries 0 through 47). The figure distinguishes registers used for memory management from those used for exception processing.]

Virtual to Physical Address Mapping
The RM5231 provides three modes of virtual addressing:
• user mode
• kernel mode
• supervisor mode

This mechanism is available to system software to provide a secure environment for user processes. Bits in the CP0 Status register determine which virtual addressing mode is used. In user mode, the RM5231 provides a single, uniform virtual address space of 1TB (2GB in 32-bit mode).

When operating in kernel mode, four distinct virtual address spaces, totaling over 2.5TB (4GB in 32-bit mode), are simultaneously available and are differentiated by the high-order bits of the virtual address.

The RM5231 also supports a supervisor mode, in which the virtual address space is over 2TB (2.5GB in 32-bit mode), divided into three regions based on the high-order bits of the virtual address.

When the RM5231 is configured as a 64-bit microprocessor, the virtual address space layout is an upward-compatible extension of the 32-bit virtual address space layout. Figure 4 shows the address space layout for 32-bit operation.

[Figure 4: Kernel Mode Virtual Addressing (32-bit)
0xE0000000-0xFFFFFFFF   Kernel virtual address space (kseg3), mapped, 0.5GB
0xC0000000-0xDFFFFFFF   Supervisor virtual address space (ksseg), mapped, 0.5GB
0xA0000000-0xBFFFFFFF   Uncached kernel physical address space (kseg1), unmapped, 0.5GB
0x80000000-0x9FFFFFFF   Cached kernel physical address space (kseg0), unmapped, 0.5GB
0x00000000-0x7FFFFFFF   User virtual address space (kuseg), mapped, 2.0GB]

Joint TLB
For fast virtual-to-physical address translation, the RM5231 uses a large, fully associative TLB that maps 96 virtual pages to their corresponding physical addresses. As indicated by its name, the joint TLB (JTLB) is used for both instruction and data translations. The JTLB is organized as 48 pairs of even-odd entries, and maps a virtual address and address space identifier into the large, 64GB physical address space.

Two mechanisms are provided to assist in controlling the amount of mapped space and the replacement characteristics of various memory regions. First, the page size can be configured on a per-entry basis, in steps of 4x from 4KB to 16MB (4KB, 16KB, 64KB, 256KB, 1MB, 4MB, or 16MB). The CP0 PageMask register is loaded with the desired page size of a mapping, and that size is stored into the TLB along with the virtual address when a new entry is written. Thus, operating systems can create special-purpose maps; for example, an entire frame buffer can be memory mapped using only one TLB entry.

The second mechanism controls the replacement algorithm when a TLB miss occurs. The RM5231 provides a random replacement algorithm to select a TLB entry to be written with a new mapping. However, the processor also provides a mechanism whereby a system-specific number of mappings can be locked into the TLB (the entries protected through the CP0 Wired register), thereby avoiding random replacement. This mechanism allows the operating system to guarantee that certain pages are always mapped, for performance reasons and for deadlock avoidance. It also facilitates the design of real-time systems by allowing deterministic access to critical software.

The JTLB also contains information that controls the cache coherency protocol for each page. Specifically, each page has attribute bits to determine the following coherency algorithms:
• uncached
• non-coherent write-back
• non-coherent write-through with write-allocate
• non-coherent write-through without write-allocate
• sharable
• exclusive
• update

Note that both of the write-through protocols bypass the secondary cache, since it does not support writes of less than a complete cache line.

The non-coherent protocols are used for both code and data on the RM5231, with data using write-back or write-through depending on the application. The coherency attributes generate coherent transaction types on the system interface. However, cache coherency is not supported in the RM5231, so the coherency attributes should never be used.

Instruction TLB
The RM5231 implements a 2-entry instruction TLB (ITLB) to minimize contention for the JTLB, eliminate the timing-critical path of translating through a large associative array, and save power. Each ITLB entry maps a 4KB page. The ITLB improves performance by allowing instruction address translation to occur in parallel with data address translation.
When a miss occurs on an instruction address translation by the ITLB, the least-recently used ITLB entry is filled from the JTLB. The operation of the ITLB is completely transparent to the user.

Data TLB
The RM5231 implements a 4-entry data TLB (DTLB) for the same reasons cited above for the ITLB. Each DTLB entry maps a 4KB page. The DTLB improves performance by allowing data address translation to occur in parallel with instruction address translation. When a miss occurs on a data address translation by the DTLB, the DTLB is filled from the JTLB. The DTLB refill is pseudo-LRU: the least recently used entry of the least recently used pair of entries is filled. The operation of the DTLB is completely transparent to the user.

Cache Memory
In order to keep the RM5231's high-performance pipeline full and operating efficiently, the RM5231 incorporates on-chip instruction and data caches that can each be accessed in a single processor cycle. Each cache has its own 64-bit data path, and both caches can be accessed simultaneously. The cache subsystem provides the integer and floating-point units with an aggregate bandwidth of over 3GB per second at an internal clock frequency of 200 MHz (two 8-byte paths at 200 MHz give 3.2GB per second).

Instruction Cache
The RM5231 incorporates a two-way set associative on-chip instruction cache. This virtually indexed, physically tagged cache is 32KB in size and is protected with word parity.

Since the cache is virtually indexed, the virtual-to-physical address translation occurs in parallel with the cache access, further increasing performance by allowing these two operations to occur simultaneously. The cache tag contains a 24-bit physical address and a valid bit, and is protected by a single parity bit.

The instruction cache is 64 bits wide and can be accessed each processor cycle. Accessing 64 bits per cycle allows the instruction cache to supply two instructions per cycle to the superscalar dispatch unit. For typical code sequences where a floating-point load or store and a floating-point computation instruction are being issued together in a loop, the entire bandwidth available from the instruction cache will be consumed.

Cache miss refill writes 64 bits per cycle to minimize the cache miss penalty. The line size is eight instructions (32 bytes) to maximize the performance of communication between the processor and the memory system.

The RM5231 supports instruction cache locking. The contents of one set of the cache, set A, can be locked by setting a bit in the coprocessor 0 Status register. Locking the set prevents its contents from being overwritten by a subsequent cache miss; refill will occur only into set B. This mechanism allows the programmer to lock critical code into the cache, thereby guaranteeing deterministic behavior for the locked code sequence.

Data Cache
For fast, single-cycle data access, the RM5231 includes a 32KB on-chip data cache that is two-way set associative with a fixed 32-byte (eight-word) line size. The data cache is protected with byte parity, and its tag is protected with a single parity bit. It is virtually indexed and physically tagged to allow simultaneous address translation and data cache access.

Cache protocols supported for the data cache are:
1. Uncached. Reads to addresses in a memory area identified as uncached do not access the cache. Writes to such addresses are written directly to main memory without updating the cache.
2. Write-back. Loads and instruction fetches first search the cache, reading main memory only if the desired data is not cache resident. On data store operations, the cache is first searched to determine if the target address is cache resident. If it is resident, the cache contents are updated, and the cache line is marked for later write-back. If the cache lookup misses, the target line is first brought into the cache and then the write is performed as above.
3. Write-through with write allocate. Loads and instruction fetches first search the cache, reading main memory only if the desired data is not cache resident. On data store operations, the cache is first searched to determine if the target address is cache resident. If it is resident, the cache contents are updated and main memory is written, leaving the write-back bit of the cache line unchanged. If the cache lookup misses, the target line is first brought into the cache and then the write is performed as above.
4. Write-through without write allocate. Loads and instruction fetches first search the cache, reading main memory only if the desired data is not cache resident. On data store operations, the cache is first searched to determine if the target address is cache resident. If it is resident, the cache contents are updated and main memory is written, leaving the write-back bit of the cache line unchanged. If the cache lookup misses, only main memory is written.

The most commonly used write policy is write-back, where a store to a cache line does not immediately cause memory to be updated. This increases system performance by reducing bus traffic and eliminating the bottleneck of waiting for each store operation to finish before issuing a subsequent memory operation. Software can, however, select write-through on a per-page basis when appropriate, such as for frame buffers.

Associated with the data cache is the store buffer. When the RM5231 executes a store instruction, this single-entry buffer is written with the store data while the tag comparison is performed. If the tag matches, the data is written into the data cache in the next cycle that the data cache is not accessed (the next non-load cycle). The store buffer allows the RM5231 to execute a store every processor cycle and to perform back-to-back stores without penalty.
In the event of a store immediately followed by a load to the same address, a combined merge and cache write occurs such that no penalty is incurred.

The RM5231 cache attributes for both the instruction and data caches are summarized in Table 3.

Table 3: Cache Attributes
Characteristic   Instruction                      Data
Size             32KB                             32KB
Organization     2-way set associative            2-way set associative
Line size        32B                              32B
Index            vAddr11..0                       vAddr11..0
Tag              pAddr31..12                      pAddr31..12
Write policy     n.a.                             write-back/write-through
Read order       sub-block                        sub-block
Write order      sequential                       sequential
Miss restart     after transfer of entire line    first doubleword
Parity           per-word                         per-byte
Cache locking    set A                            set A

Write Buffer
Writes to external memory, whether cache miss write-backs or stores to uncached or write-through addresses, use the on-chip write buffer. The write buffer holds up to four 64-bit address and data pairs. The entire buffer is used for a data cache write-back and allows the processor to proceed in parallel with the memory update. For uncached and write-through stores, the write buffer significantly increases performance by decoupling the SysAD bus transfers from the instruction execution stream.

System Interface
The system interface consists of a 32-bit Address/Data bus with 4 parity check bits and a 9-bit command bus. In addition, there are 6 handshake signals and 6 interrupt inputs. The interface is capable of transferring data between the processor and memory at a peak rate of 400MB/sec with a 100 MHz SysClock (4 bytes per SysClock cycle).

Figure 5 shows a typical embedded system using the RM5231. In this example, a bank of DRAMs and a memory controller ASIC share the processor's SysAD bus, while the memory controller provides separate ports to a boot ROM and an I/O system.

System Address/Data Bus
The 32-bit System Address/Data (SysAD) bus is used to transfer addresses and data between the RM5231 and the rest of the system. It is protected with a 4-bit parity check bus (SysADC).
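A 4-bit check bus on a 32-bit data bus suggests one parity bit per byte. The sketch below computes such per-byte parity, but it rests on two assumptions that this section does not state: that SysADC bit i covers SysAD byte i (bits 8i+7..8i), and that the check is even parity. The function names are hypothetical; confirm both assumptions against the full interface specification before relying on the result.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if `byte` contains an odd number of 1 bits, else 0. */
static unsigned odd_ones(uint8_t byte)
{
    unsigned p = 0;
    while (byte) {
        p ^= byte & 1u;
        byte >>= 1;
    }
    return p;
}

/* ASSUMPTIONS: SysADC bit i covers SysAD byte i (bits 8i+7..8i), and
 * even parity is used, so each check bit makes the count of 1s in its
 * byte plus the check bit even. */
uint8_t sysadc_even_parity(uint32_t sysad)
{
    uint8_t check = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint8_t byte = (uint8_t)(sysad >> (8 * i));
        check |= (uint8_t)(odd_ones(byte) << i);
    }
    assert(check <= 0xF);
    return check;
}
```

If the bus turned out to use odd parity, the result would simply be complemented within the low four bits (`~check & 0xF`).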
The system interface is configurable to allow easy interfacing to memory and I/O systems of varying frequencies. The data rate and the bus frequency at which the RM5231 transmits data to the system interface are programmable at boot time via mode control bits. The rate at which the processor receives data is fully controlled by the external device.

[Figure 5: Typical Embedded System Block Diagram. Blocks shown: RM5231, DRAM, address latch, memory/I/O controller, Flash/Boot ROM, and PCI bus, with 36-bit connections between the RM5231, the DRAM, and the controller.]

System Command Bus
The RM5231 interface has a 9-bit System Command (SysCmd) bus. The command bus indicates whether the SysAD bus carries address or data information on a per-clock basis. If the SysAD bus carries an address, the SysCmd bus also indicates what type of transaction is to take place (for example, a read or write). If the SysAD bus carries data, the SysCmd bus also gives information about the data (for example, that this is the last data word transmitted, or that the data contains an error).

The SysCmd bus is bidirectional to support both processor requests and external requests to the RM5231. Processor requests are initiated by the RM5231 and responded to by an external device. External requests are issued by an external device and require the RM5231 to respond.

The RM5231 supports one- to four-byte transfers as well as block transfers on the SysAD bus. In the case of a subword transfer, the two low-order address bits give the byte address of the transfer, and the SysCmd bus indicates the number of bytes being transferred.

Handshake Signals
There are six handshake signals on the system interface. Two of these, RdRdy* and WrRdy*, are used by an external device to indicate to the RM5231 whether it can accept a new read or write transaction. The RM5231 samples these signals before deasserting the address on read and write requests.
ExtRqst* and Release* are used to transfer control of the SysAD and SysCmd buses from the processor to an external device. When an external device needs to control the interface, it asserts ExtRqst*. The RM5231 responds by asserting Release* to release the system interface to slave state.

ValidOut* and ValidIn* are used by the RM5231 and the external device, respectively, to indicate that there is a valid command or data on the SysAD and SysCmd buses. The RM5231 asserts ValidOut* when it is driving these buses with a valid command or data, and the external device asserts ValidIn* when it has control of the buses and is driving a valid command or data.

Non-overlapping System Interface
The RM5231 requires a non-overlapping system interface. This means that only one processor request may be outstanding at a time, and that the request must be serviced by an external device before the RM5231 issues another request. The RM5231 can issue read and write requests to an external device, whereas an external device can issue null and write requests to the RM5231.

For processor reads, the RM5231 asserts ValidOut* and simultaneously drives the address and read command on the SysAD and SysCmd buses, respectively. If the system interface has RdRdy* asserted, the processor then tristates its drivers and releases the system interface to the slave state by asserting Release*. The external device can then begin sending data to the RM5231.

Figure 6 shows a processor block read request and the external agent read response. The read latency is 4 cycles (ValidOut* to ValidIn*), and the response data pattern is “WWWWWWWW”, indicating that data can be transferred on every clock with no wait states in between. Figure 7 shows a processor block write using the write data pattern “WWWWWWWW”, code 0 of the boot-time mode select options.
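The data patterns selected through the boot-time mode options (for example “WWWWWWWW”, code 0) determine how many SysClock cycles a block transfer occupies. The sketch below counts transfer (W) and wait (x) cycles in a pattern string and scales the 4-bytes-per-cycle peak of the 32-bit bus by the fraction of cycles that carry data. It is an illustrative calculation only: `pattern_bandwidth_mbs` is a hypothetical name, and the address cycle, read latency, and any handshake stalls are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Effective data bandwidth, in MB/s, for a SysAD data pattern such as
 * "WWxWWxWWxWWx" (W = 4-byte data transfer, x = wait state), assuming a
 * 32-bit bus with one pattern symbol per SysClock at sysclock_mhz. */
double pattern_bandwidth_mbs(const char *pattern, double sysclock_mhz)
{
    size_t words = 0, cycles = 0;
    for (const char *p = pattern; *p != '\0'; p++) {
        assert(*p == 'W' || *p == 'x');
        if (*p == 'W')
            words++;
        cycles++;
    }
    assert(cycles > 0);
    /* 4 bytes per data cycle; bytes per cycle times MHz gives MB/s. */
    return 4.0 * sysclock_mhz * (double)words / (double)cycles;
}
```

With a 100 MHz SysClock, code 0 yields the full 400MB/sec, code 1 (WWxWWx...) about 267MB/sec, code 3 (WxWx...) 200MB/sec, and code 8 (WxxxWxxx...) 100MB/sec.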
[Figure 6: Processor Block Read. Waveforms for SysClock, SysAD (Addr, then Data0 through Data7), SysCmd (Read, then NData for Data0 through Data6 and NEOD for Data7), ValidOut*, ValidIn*, RdRdy*, WrRdy*, and Release*.]

[Figure 7: Processor Block Write. Waveforms for SysClock, SysAD (Addr, then Data0 through Data7), SysCmd (Write, then NData for Data0 through Data6 and NEOD for Data7), ValidOut*, ValidIn*, RdRdy*, WrRdy*, and Release*.]

Enhanced Write Modes
The RM5231 implements two enhancements to the original R4000 write mechanism: Write Reissue and Pipelined Writes. The original R4000 allowed a write on the SysAD bus every four SysClock cycles; hence, for a non-block write, two out of every four cycles were wait states.

Pipelined write mode eliminates these two wait states by allowing the processor to drive a new write address onto the bus immediately after the previous data cycle. This allows for higher SysAD bus utilization. However, at high frequencies the processor may drive a subsequent write onto the bus before the external agent deasserts WrRdy* to indicate that it cannot accept another write cycle. This can cause the cycle to be aborted.

Write reissue mode is an enhancement to pipelined write mode and allows the processor to reissue aborted write cycles. If WrRdy* is deasserted during the issue phase of a write operation, the cycle is aborted by the processor and reissued at a later time.

In write reissue mode, a write rate of one write every two bus cycles can be achieved. Pipelined writes have the same two-bus-cycle write repeat rate, but can issue one additional write following the deassertion of WrRdy*.

External Requests
The RM5231 can respond to certain requests issued by an external device. These requests take one of two forms: Write requests and Null requests.
An external device executes a write request when it wishes to update one of the processor's writable resources, such as the internal interrupt register. A null request is executed when the external device wishes the processor to reassert ownership of the processor external interface (that is, the external device wants the processor interface to go from slave state to master state). Typically, a null request is executed after an external device that has acquired control of the processor interface via the assertion of ExtRqst* has completed a transaction between itself and system memory, in a system where memory is connected directly to the SysAD bus. Normally, this transaction is a DMA read or write from the I/O system.

Interrupt Handling
To provide better real-time interrupt handling, the RM5231 supports a dedicated interrupt vector. When this feature is enabled by the real-time executive, by setting a bit in the Cause register, interrupts vector to a specific address that is not shared with any of the other exception types. This eliminates the need to go through the normal software routine for exception decode and dispatch, thereby lowering interrupt latency.

Standby Mode
The RM5231 provides a means to reduce the power consumed by the internal core when the CPU would otherwise not be performing any useful operations. This state is known as Standby Mode. Executing the WAIT instruction enables interrupts and causes the processor to enter Standby Mode. When the WAIT instruction completes the W pipe stage, and if the SysAD bus is currently idle, the internal processor clocks stop, thereby freezing the pipeline. The phase-locked loop (PLL), the internal timer/counter, and the “wake up” input pins (Int*[5:0], NMI*, ExtRqst*, Reset*, and ColdReset*) continue to operate normally. If the SysAD bus is not idle when the WAIT instruction completes the W pipe stage, the WAIT is treated as a NOP until the bus operation is completed.

Once the processor is in Standby, any interrupt, including the internally generated timer interrupt, causes the processor to exit Standby and resume operation where it left off. The WAIT instruction is typically inserted in the idle loop of the operating system or real-time executive.

JTAG Interface
The RM5231 supports JTAG boundary scan in conformance with the IEEE 1149.1 specification. The JTAG interface is especially helpful for checking the integrity of the processor's pin connections.

Boot-Time Options
Fundamental operational modes for the processor are initialized by the boot-time mode control interface. This serial interface operates at a very low frequency (SysClock divided by 256). The low-frequency operation allows the initialization information to be kept in a low-cost EPROM or a system interface ASIC.

Immediately after the VccOK signal is asserted, the processor reads a serial bit stream of 256 bits to initialize all the fundamental operational modes. ModeClock runs continuously from the assertion of VccOK. The boot-time serial mode stream is defined in Table 4. Bit 0 is the first bit presented to the processor in the stream after VccOK is asserted; bit 255 is the last bit transferred.

Boot-Time Mode Bit Stream
Table 4: Boot-Time Modes

Mode bit   Description
0          Reserved: Must be zero
4:1        Write-back data rate (W = write data transfer, x = wait state)
             0: WWWWWWWW
             1: WWxWWxWWxWWx
             2: WWxxWWxxWWxxWWxx
             3: WxWxWxWxWxWxWxWx
             4: WWxxxWWxxxWWxxxWWxxx
             5: WWxxxxWWxxxxWWxxxxWWxxxx
             6: WxxWxxWxxWxxWxxWxxWxxWxx
             7: WWxxxxxxWWxxxxxxWWxxxxxxWWxxxxxx
             8: WxxxWxxxWxxxWxxxWxxxWxxxWxxxWxxx
             9-15: reserved
7:5        Pclock to SysClock multiplier (Mode bit 20 = 0 / Mode bit 20 = 1; x = not available)
             0: Multiply by 2 / x
             1: Multiply by 3 / x
             2: Multiply by 4 / x
             3: Multiply by 5 / 2.5
             4: Multiply by 6 / x
             5: Multiply by 7 / 3.5
             6: Multiply by 8 / x
             7: Multiply by 9 / 4.5
8          Specifies byte ordering. Logically ORed with the BigEndian input signal.
             0: Little endian
             1: Big endian
10:9       Non-block write control
             00: R4000 compatible non-block writes
             01: reserved
             10: pipelined non-block writes
             11: non-block write re-issue
11         Timer interrupt enable/disable
             0: Enable the timer interrupt on Int*[5]
             1: Disable the timer interrupt on Int*[5]
12         Reserved: Must be zero
14:13      Output driver strength (100% = fastest)
             00: 67% strength
             01: 50% strength
             10: 100% strength
             11: 83% strength
15         Reserved: Must be zero
17:16      System configuration identifiers; software visible in processor Config[21..20] register
19:18      Reserved: Must be zero
20         Select SysClock to PClock multiply mode
             0: Integer multipliers
             1: Half-integer multipliers
21         Reserved: Must be one
255:22     Reserved: Must be zero

PIN DESCRIPTIONS
The following is a list of interface, interrupt, and miscellaneous pins available on the RM5231.

Pin Name      Type          Description
System Interface
ExtRqst*      Input         External request. Signals that the system interface is submitting an external request.
Release*      Output        Release interface. Signals that the processor is releasing the system interface to slave state.
RdRdy*        Input         Read ready. Signals that an external agent can now accept a processor read.
WrRdy*        Input         Write ready. Signals that an external agent can now accept a processor write request.
ValidIn*      Input         Valid input. Signals that an external agent is now driving a valid address or data on the SysAD bus and a valid command or data identifier on the SysCmd bus.
ValidOut*     Output        Valid output. Signals that the processor is now driving a valid address or data on the SysAD bus and a valid command or data identifier on the SysCmd bus.
SysAD[31:0]   Input/Output  System address/data bus. A 32-bit address and data bus for communication between the processor and an external agent.
SysADC[3:0]   Input/Output  System address/data check bus. A 4-bit bus containing parity check bits for the SysAD bus during data cycles.
SysCmd[8:0]   Input/Output  System command/data identifier bus. A 9-bit bus for command and data identifier transmission between the processor and an external agent.
SysCmdP       Input/Output  Reserved for system command/data identifier bus parity. For the RM5231, unused on input and zero on output.
Clock/Control Interface
SysClock      Input         System clock. Master clock input used as the system interface reference clock. All output timings are relative to this input clock. The pipeline operating frequency is derived by multiplying this clock by the factor selected during boot initialization.
VccP          Input         Quiet Vcc for PLL. Quiet Vcc for the internal phase-locked loop. Must be connected to VccInt through a filter circuit.
VssP          Input         Quiet Vss for PLL. Quiet Vss for the internal phase-locked loop. Must be connected to VssInt through a filter circuit.
Interrupt Interface
Int*[5:0]     Input         Interrupt. Six general processor interrupts, bit-wise ORed with bits 5:0 of the interrupt register.
NMI*          Input         Non-maskable interrupt. ORed with bit 6 of the interrupt register.
JTAG Interface
JTDI          Input         JTAG data in. JTAG serial data in.
JTCK          Input         JTAG clock input. JTAG serial clock input.
JTDO          Output        JTAG data out. JTAG serial data out.
JTMS          Input         JTAG command. JTAG command signal; signals that the incoming serial data is command data.
Initialization Interface
BigEndian     Input         Allows the system to change the processor addressing mode without rewriting the mode ROM.
VccOK         Input         Vcc is OK. When asserted, this signal indicates to the RM5231 that the 3.3V power supply has been above 3.0V for more than 100 milliseconds and will remain stable. The assertion of VccOK initiates the reading of the boot-time mode control serial stream.
ColdReset*    Input         Cold reset. This signal must be asserted for a power-on reset or a cold reset. ColdReset* must be deasserted synchronously with SysClock.
Reset*        Input         Reset. This signal must be asserted for any reset sequence. It may be asserted synchronously or asynchronously for a cold reset, or synchronously to initiate a warm reset. Reset* must be deasserted synchronously with SysClock.
ModeClock     Output        Boot mode clock. Serial boot-mode data clock output at the system clock frequency divided by 256.
ModeIn        Input         Boot mode data in. Serial boot-mode data input.

ABSOLUTE MAXIMUM RATINGS (Note 1)
Symbol   Rating                                 Limits                   Unit
VTERM    Terminal voltage with respect to GND   –0.5 (Note 2) to +3.9    V
TCASE    Operating temperature                  0 to +85                 °C
TSTG     Storage temperature                    –55 to +125              °C
IIN      DC input current                       20 (Note 3)              mA
IOUT     DC output current                      50 (Note 4)              mA

Note 1: Stresses greater than those listed under ABSOLUTE MAXIMUM RATINGS may cause permanent damage to the device. This is a stress rating only, and functional operation of the device at these or any other conditions above those indicated in the operational sections of this specification is not implied. Exposure to absolute maximum rating conditions for extended periods may affect reliability.
Note 2: VIN minimum = -2.0V for pulse width less than 15ns. VIN should not exceed 3.9 Volts.
Note 3: When VIN < 0V or VIN > VccIO.
Note 4: Not more than one output should be shorted at a time. Duration of the short should not exceed 30 seconds.
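The boot-time mode stream described under Boot-Time Options and Table 4, and delivered through the ModeClock and ModeIn pins listed above, can be sketched in Python as a simple bit-packing step. The function names, parameters and defaults (write re-issue, 100% drive strength, timer interrupt enabled) are hypothetical illustrations, not part of any QED tool; the bit positions follow Table 4, including writing reserved bit 21 as one. The timing helper assumes one mode bit per ModeClock period, with ModeClock running at SysClock divided by 256.

```python
# Pclock-to-SysClock multiplier -> mode bits 7:5, for each setting of mode bit 20.
INTEGER_MULT = {m: m - 2 for m in range(2, 10)}         # bit 20 = 0: codes 0..7 -> x2..x9
HALF_MULT = {2.5: 3, 3.5: 5, 4.5: 7}                    # bit 20 = 1: only codes 3, 5, 7 defined
DRIVE_BITS = {67: 0b00, 50: 0b01, 100: 0b10, 83: 0b11}  # mode bits 14:13, percent strength


def mode_word(multiplier, big_endian=False, write_mode=0b11, timer_int=True,
              drive_pct=100, writeback_rate=0, config_id=0):
    """Return the mode bits as one integer; bit n of the result is mode bit n."""
    if multiplier in INTEGER_MULT:
        mult_code, half = INTEGER_MULT[multiplier], 0
    elif multiplier in HALF_MULT:
        mult_code, half = HALF_MULT[multiplier], 1
    else:
        raise ValueError(f"unsupported multiplier: {multiplier}")
    if write_mode not in (0b00, 0b10, 0b11):
        raise ValueError("non-block write mode 01 is reserved")
    if not 0 <= writeback_rate <= 8:
        raise ValueError("write-back data rate codes 9-15 are reserved")
    if not 0 <= config_id <= 3:
        raise ValueError("config_id is a 2-bit field")

    word = 0                               # bits 0, 12, 15, 19:18 and 255:22 stay zero
    word |= writeback_rate << 1            # bits 4:1   write-back data rate
    word |= mult_code << 5                 # bits 7:5   clock multiplier
    word |= int(big_endian) << 8           # bit 8      byte order (ORed with BigEndian pin)
    word |= write_mode << 9                # bits 10:9  non-block write control
    word |= int(not timer_int) << 11       # bit 11     1 = timer interrupt disabled
    word |= DRIVE_BITS[drive_pct] << 13    # bits 14:13 output driver strength
    word |= config_id << 16                # bits 17:16 visible in Config[21..20]
    word |= half << 20                     # bit 20     half-integer multiply mode
    word |= 1 << 21                        # bit 21     reserved, must be one
    return word


def mode_stream(word):
    """Bits in ModeIn order: mode bit 0 first, mode bit 255 last."""
    return [(word >> n) & 1 for n in range(256)]


def stream_duration_us(sysclock_mhz):
    """Time to shift 256 bits at one bit per ModeClock (SysClock / 256)."""
    return 256 * 256 / sysclock_mhz


# Example: 250 MHz pipeline from a 100 MHz SysClock (x2.5), big endian, write re-issue.
bits = mode_stream(mode_word(2.5, big_endian=True))
print(hex(mode_word(2.5, big_endian=True)), len(bits), stream_duration_us(100.0))
```

For the example settings the packed value is 0x304760, and at a 100 MHz SysClock shifting the full stream takes about 655 µs. Keeping the fields in one integer makes it easy to check the reserved bits before the stream is burned into the mode EPROM.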
RECOMMENDED OPERATING CONDITIONS
Grade        Temperature             Vss   VccInt     VccIO      VccP
Commercial   0°C to +85°C (Case)     0V    2.5V±5%    3.3V±5%    2.5V±5%
Industrial   -40°C to +85°C (Case)   0V    2.5V±5%    3.3V±5%    2.5V±5%

Note: VccIO should not exceed VccInt by more than 1.2V during the power-up sequence.
Note: Applying a logic high state to any I/O pin before VccInt becomes stable is not recommended.
Note: As specified in IEEE 1149.1 (JTAG), the JTMS pin must be held low during reset to avoid entering JTAG test mode.

DC ELECTRICAL CHARACTERISTICS
Parameter   Minimum        Maximum        Conditions
VOL                        0.1V           |IOUT| = 20 µA
VOH         VccIO - 0.1V                  |IOUT| = 20 µA
VOL                        0.4V           |IOUT| = 4mA
VOH         2.4V                          |IOUT| = 4mA
VIL         -0.5V          0.2 x VccIO
VIH         0.7 x VccIO    VccIO + 0.5V
IIN                        ±20 µA         VIN = 0
IIN                        ±20 µA         VIN = VccIO
CIN                        10pF
COUT                       10pF

POWER CONSUMPTION
Conditions: Typ: VccInt = 2.5V; Max: VccInt = 2.625V

VccInt Power (mW)                                150 MHz        200 MHz        250 MHz
                                                 Typ   Max(1)   Typ   Max(1)   Typ   Max(1)
Standby                                          -     200      -     250      -     350
Active, R4000 write protocol with no FPU         1100  2200     1425  2800     1725  3450
  operation (integer instructions only)
Active, write re-issue or pipelined writes       1225  2450     1600  3200     1900  3800
  with superscalar

Note 1: Worst-case instruction mix with worst-case supply voltage.
Note: I/O supply power is application dependent, but typically less than 10% of VccInt power.

AC ELECTRICAL CHARACTERISTICS
Capacitive Load Deration (CPU speed 150–250 MHz)
Parameter                     Symbol   Min    Max    Units
Load Derate                   CLD             2      ns/25pF
IO Power Derate                               17.5   mW/25pF/MHz
IO Power Derate @ 20pF Load            4.0    5.5    mW/MHz

Clock Parameters
Parameter                   Symbol     Test Conditions    150 MHz       200 MHz       250 MHz       Units
                                                          Min   Max     Min   Max     Min   Max
SysClock High               tSCH       Transition ≤ 5ns   3             3             3             ns
SysClock Low                tSCL       Transition ≤ 5ns   3             3             3             ns
SysClock Frequency                                        25    75      25    100     25    100     MHz
SysClock Period             tSCP                                40            40            40      ns
Clock Jitter for SysClock   tJI                                 ±200          ±200          ±150    ps
SysClock Rise Time          tCR                                 2             2             2       ns
SysClock Fall Time          tCF                                 2             2             2       ns
ModeClock Period            tModeCKP                      256           256           256           tSCP
JTAG Clock Period           tJTAGCKP                      4             4             4             tSCP

Note: Operation of the RM5231 is only guaranteed with the phase-locked loop enabled.

System Interface Parameters (Note 1)
Parameter                    Symbol   Test Conditions (CPU speed 150–250 MHz)   Min   Max   Units
Data Output (Notes 2, 3)     tDO      mode14..13 = 10 (fastest) (Note 5)        1.0   4.5   ns
                                      mode14..13 = 11                           1.0   5.0   ns
                                      mode14..13 = 00                           1.0   5.5   ns
                                      mode14..13 = 01 (slowest)                 1.0   6.0   ns
Data Setup (Note 4)          tDS      trise = see above table                   2.5         ns
Data Hold (Note 4)           tDH      tfall = see above table                   1.0         ns

Note 1: Timings are measured from 1.5V of the clock to 1.5V of the signal.
Note 2: Capacitive load for all output timings is 50pF.
Note 3: Data Output timing applies to all signal pins, whether tristate I/O or output only.
Note 4: Setup and Hold parameters apply to all signal pins, whether tristate I/O or input only.
Note 5: Only mode 14:13 = 10 is tested and guaranteed.

Boot-Time Interface Parameters (CPU speed 150–250 MHz)
Parameter          Symbol   Min   Max   Units
Mode Data Setup    tDS(M)   4           SysClock cycles
Mode Data Hold     tDH(M)   0           SysClock cycles

TIMING DIAGRAMS
Figure 8 Clock Timing (SysClock: tRise, tFall, tHigh, tLow, ±tJitterIn)

System Interface Timing (SysAD, SysCmd, ValidIn*, ValidOut*, etc.)
Figure 9 Input Timing (SysClock, Data; tDS, tDH)

Figure 10 Output Timing (SysClock, Data; tDOmin, tDOmax)

PACKAGING INFORMATION
Package outline drawing (top view, bottom view, Detail “A”, Detail “B”, Section C-C)

All dimensions are in millimeters unless otherwise noted.

Symbol    Min    Nominal     Max     Note
A         --     3.70        4.07
A1        0.25   0.33        --
A2        3.17   3.37        3.67
D                31.20 BSC           4
D1               28.00 BSC           5
D2               24.00 REF
E                31.20 BSC           4
E1               28.00 BSC           5
E2               24.00 REF
D3               21.0 REF
E3               21.0 REF
L         0.65   0.70        0.95
N         --     128         --
e                0.80 BSC
b         0.30   --          0.45
b1        0.30   0.35        0.40
aaa              0.16
ThetaJa          13.7 °C/W
ThetaJc          1.5 °C/W

NOTES:
1. All dimensioning and tolerances conform to ASME Y14.5–1994.
2. Datum Plane H is located at the bottom of the mold parting line and is coincident with where the lead exits the plastic body.
3. Datums A–B and D are to be determined where the center line between leads exits the plastic body at Datum Plane H.
4. To be determined at seating Plane C.
5. Dimensions D1 and E1 do not include mold protrusion. Allowable mold protrusion is 0.254 mm per side. Dimensions D1 and E1 do include mold mismatch and are determined at Datum Plane H.
6. “N” is the number of terminals.
7. Package top dimensions are smaller than bottom dimensions by 0.20 millimeters, and the top of the package will not overhang the bottom of the package.
8. Dimension b does not include dambar protrusion. Allowable dambar protrusion shall be 0.08 mm total in excess of the b dimension at maximum material condition. Dambar cannot be located on the lower radius or the foot. The space between a protrusion and an adjacent lead shall not be less than 0.07 mm for 0.4 mm and 0.50 mm pitch packages.
9. All dimensions are in millimeters.
10. The optional exposed heat slug is coincident with the top or bottom side of the package and is not allowed to protrude beyond that surface.
11. These dimensions apply to the flat section of the lead between 0.10 mm and 0.25 mm from the lead tip.
12. This drawing conforms to JEDEC registered outline MS-022, but the heat slug dimension is not specified by JEDEC.
13. A1 is defined as the distance from the seating plane to the lowest point of the package body.

RM5231 128 P-QUAD PACKAGE PINOUT
Pin  Function     Pin  Function     Pin  Function     Pin  Function
1    NC           33   ModeIn       65   NMI*         97   NC
2    NC           34   RdRdy*       66   ExtRqst*     98   NC
3    VccIO        35   WrRdy*       67   Reset*       99   NC
4    Vss          36   ValidIn*     68   ColdReset*   100  NC
5    SysAD4       37   ValidOut*    69   VccOK        101  VccIO
6    SysAD5       38   Release*     70   BigEndian    102  Vss
7    VccInt       39   VccP         71   VccIO        103  SysAD28
8    Vss          40   VssP         72   Vss          104  SysAD29
9    SysAD6       41   SysClock     73   SysAD16      105  VccInt
10   SysAD7       42   VccInt       74   VccInt       106  Vss
11   SysAD8       43   Vss          75   Vss          107  SysAD30
12   SysAD9       44   SysCmd0      76   SysAD17      108  SysAD31
13   VccIO        45   SysCmd1      77   SysAD18      109  SysADC2
14   Vss          46   SysCmd2      78   SysAD19      110  VccInt
15   SysAD10      47   SysCmd3      79   VccInt       111  Vss
16   SysAD11      48   VccIO        80   Vss          112  SysADC3
17   VccInt       49   Vss          81   SysAD20      113  VccIO
18   Vss          50   SysCmd4      82   SysAD21      114  Vss
19   SysAD12      51   SysCmd5      83   VccIO        115  SysADC0
20   SysAD13      52   Vss          84   Vss          116  SysADC1
21   SysAD14      53   SysCmd6      85   SysAD22      117  SysAD0
22   VccInt       54   SysCmd7      86   SysAD23      118  SysAD1
23   Vss          55   SysCmd8      87   SysAD24      119  VccInt
24   SysAD15      56   SysCmdP      88   SysAD25      120  Vss
25   VccIO        57   VccInt       89   VccInt       121  SysAD2
26   Vss          58   Vss          90   Vss          122  SysAD3
27   ModeClock    59   Int0*        91   SysAD26      123  VccIO
28   JTDO         60   Int1*        92   SysAD27      124  Vss
29   JTDI         61   Int2*        93   VccIO        125  NC
30   JTCK         62   Int3*        94   Vss          126  NC
31   JTMS         63   Int4*        95   NC           127  NC
32   VccIO        64   Int5*        96   NC           128  NC

ORDERING INFORMATION
Part number format: RM5231 -123 A I
  RM5231   Device type
  123      Device maximum speed
  A        Package type: Q = Power Quad 4 (PQ-4), S = SBGA, T = TBGA
  I        Temperature grade: (blank) = commercial, I = industrial
Valid combinations: RM5231-150Q, RM5231-200Q, RM5231-250Q, RM5231-200QI

This document may, wholly or partially, be subject to change without notice. Quantum Effect Devices, Inc. reserves the right to make changes to its products or specifications at any time without notice, in order to improve design or performance and to supply the best possible product. All rights are reserved.
No one is permitted to reproduce or duplicate, in any form, the whole or part of this document without QED's permission. QED will not be held responsible for any damage to the user or any property that may result from accidents, misuse, or any other causes arising during operation of the user's unit.

LIFE SUPPORT POLICY: QED's products are not designed, intended, or authorized for use as components intended for surgical implant into the body, or other applications intended to support or sustain life, or for any other application in which failure of the product could create a situation where personal injury or death may occur. Should a customer purchase or use the products for any such unintended or unauthorized application, the customer shall indemnify and hold QED and its officers, employees, subsidiaries, affiliates, and distributors harmless against all claims, costs, damages, and expenses, and reasonable attorney fees arising out of, directly or indirectly, any claim of personal injury or death associated with such unintended or unauthorized use, even if such claim alleges that QED was negligent regarding the design or manufacture of the part.

QED does not assume any responsibility for use of any circuitry described other than the circuitry embodied in a QED product. The company makes no representations that the circuitry described herein is free from patent infringement or other rights of third parties, which may result from its use. No license is granted by implication or otherwise under any patent, patent rights, or other rights, of QED.

The QED logo and RISCMark are trademarks of Quantum Effect Devices, Inc. MIPS is a registered trademark of MIPS Technologies, Inc. All other trademarks are the respective property of the trademark holders.

Document Number: RM5231-DS0011300001
Quantum Effect Devices
3255-3 Scott Blvd, Suite 200
Santa Clara, CA. 95054
phone (408) 565-0300
fax (408) 565-0330