376(TM) HIGH PERFORMANCE 32-BIT EMBEDDED PROCESSOR

• Full 32-Bit Internal Architecture
  - 8-, 16-, 32-Bit Data Types
  - 8 General Purpose 32-Bit Registers
  - Extensive 32-Bit Instruction Set
• High Performance 16-Bit Data Bus
  - 16 or 20 MHz CPU Clock
  - Two-Clock Bus Cycles
  - 16 Mbytes/Sec Bus Bandwidth
• 16 Mbyte Physical Memory Size
• High Speed Numerics Support with the 80387SX
• Low System Cost with the 82370 Integrated System Peripheral
• On-Chip Debugging Support Including Break Point Registers
• Complete Intel Development Support
  - C, PL/M, Assembler
  - ICE(TM)-376, In-Circuit Emulator
  - iRMK Real Time Kernel
  - iSDM Debug Monitor
  - DOS Based Debug
• Extensive Third-Party Support:
  - Languages: C, Pascal, FORTRAN, BASIC and ADA*
  - Hosts: VMS*, UNIX*, MS-DOS*, and Others
  - Real-Time Kernels
• High Speed CHMOS IV Technology
• Available in 100 Pin Plastic Quad FlatPack Package and 88-Pin Pin Grid Array (See Packaging Outlines and Dimensions #231369)

INTRODUCTION

The 376 32-bit embedded processor is designed for high performance embedded systems. It provides the performance benefits of a highly pipelined 32-bit internal architecture with the low system cost associated with 16-bit hardware systems. The 80376 processor is based on the 80386 and offers a high degree of compatibility with the 80386. All 80386 32-bit programs not dependent on paging can be executed on the 80376 and all 80376 programs can be executed on the 80386. All 32-bit 80386 language translators can be used for software development. With proper support software, any 80386-based computer can be used to develop and test 80376 programs. In addition, any 80386-based PC-AT* compatible computer can be used for hardware prototyping for designs based on the 80376 and its companion product the 82370.

Figure: 80376 Microarchitecture

Intel, iRMK, ICE, 376, 386, Intel386, iSDM, Intel376 are trademarks of Intel Corp. *UNIX is a registered trademark of AT&T. ADA is a registered trademark of the U.S.
Government, Ada Joint Program Office. PC-AT is a registered trademark of IBM Corporation. VMS is a trademark of Digital Equipment Corporation. MS-DOS is a trademark of MicroSoft Corporation.
*Other brands and names are the property of their respective owners.

Information in this document is provided in connection with Intel products. Intel assumes no liability whatsoever, including infringement of any patent or copyright, for sale and use of Intel products except as provided in Intel’s Terms and Conditions of Sale for such products. Intel retains the right to make changes to these specifications at any time, without notice. Microcomputer Products may have minor variations to this specification known as errata.

COPYRIGHT © INTEL CORPORATION, 1995
December 1990
Order Number: 240182-004

376 EMBEDDED PROCESSOR

1.0 PIN DESCRIPTION

Figure 1.1. 80376 100-Pin Quad Flat-Pack Pin Out (Top View)

Table 1.1. 100-Pin Plastic Quad Flat-Pack Pin Assignments

Address       Data          Control        N/C    VCC    VSS
A1    18      D0    1       ADS    16      20     8      2
A2    51      D1    100     BHE    19      27     9      5
A3    52      D2    99      BLE    17      29     10     11
A4    53      D3    96      BUSY   34      30     21     12
A5    54      D4    95      CLK2   15      31     32     13
A6    55      D5    94      D/C    24      43     39     14
A7    56      D6    93      ERROR  36      44     42     22
A8    58      D7    92      FLT    28      45     48     35
A9    59      D8    90      HLDA   3       46     57     41
A10   60      D9    89      HOLD   4       47     69     49
A11   61      D10   88      INTR   40             71     50
A12   62      D11   87      LOCK   26             84     63
A13   64      D12   86      M/IO   23             91     67
A14   65      D13   83      NA     6              97     68
A15   66      D14   82      NMI    38                    77
A16   70      D15   81      PEREQ  37                    78
A17   72                    READY  7                     85
A18   73                    RESET  33                    98
A19   74                    W/R    25
A20   75
A21   76
A22   79
A23   80

Figure 1.2. 80376 88-Pin Grid Array Pin Out (Top View, Component Side; Bottom View, Pin Side)

Table 1.2.
88-Pin Grid Array Pin Assignments

Pin   Label     Pin   Label     Pin   Label     Pin   Label
2H    CLK2      12D   A18       2L    M/IO      11A   VCC
9B    D15       12E   A17       5M    LOCK      13A   VCC
8A    D14       13E   A16       1J    ADS       13C   VCC
8B    D13       12F   A15       1H    READY     13L   VCC
7A    D12       13F   A14       2G    NA        1N    VCC
7B    D11       12G   A13       1G    HOLD      13N   VCC
6A    D10       13G   A12       2F    HLDA      11B   VSS
6B    D9        13H   A11       7N    PEREQ     2C    VSS
5A    D8        12H   A10       7M    BUSY      1D    VSS
5B    D7        13J   A9        8N    ERROR     1M    VSS
4B    D6        12J   A8        9M    INTR      4N    VSS
4A    D5        12K   A7        8M    NMI       9N    VSS
3B    D4        13K   A6        6M    RESET     11N   VSS
2D    D3        12L   A5        2B    VCC       2A    VSS
1E    D2        12M   A4        12B   VCC       12A   VSS
2E    D1        11M   A3        1C    VCC       1B    VSS
1F    D0        10M   A2        2M    VCC       13B   VSS
9A    A23       1K    A1        3N    VCC       13M   VSS
10A   A22       2J    BLE       5N    VCC       2N    VSS
10B   A21       2K    BHE       10N   VCC       6N    VSS
12C   A20       4M    W/R       1A    VCC       12N   VSS
13D   A19       3M    D/C       3A    VCC       1L    N/C

The following table lists a brief description of each pin on the 80376. The following definitions are used in these descriptions:

Overbar   The named signal is active LOW.
I         Input signal.
O         Output signal.
I/O       Input and Output signal.
-         No electrical connection.

Symbol    Type  Name and Function
CLK2      I     CLK2 provides the fundamental timing for the 80376. For additional information see Clock in Section 4.1.
RESET     I     RESET suspends any operation in progress and places the 80376 in a known reset state. See Interrupt Signals in Section 4.1 for additional information.
D15–D0    I/O   DATA BUS inputs data during memory, I/O and interrupt acknowledge read cycles and outputs data during memory and I/O write cycles. See Data Bus in Section 4.1 for additional information.
A23–A1    O     ADDRESS BUS outputs physical memory or port I/O addresses. See Address Bus in Section 4.1 for additional information.
W/R       O     WRITE/READ is a bus cycle definition pin that distinguishes write cycles from read cycles. See Bus Cycle Definition Signals in Section 4.1 for additional information.
D/C       O     DATA/CONTROL is a bus cycle definition pin that distinguishes data cycles, either memory or I/O, from control cycles which are: interrupt acknowledge, halt, and instruction fetching.
See Bus Cycle Definition Signals in Section 4.1 for additional information.
M/IO      O     MEMORY I/O is a bus cycle definition pin that distinguishes memory cycles from input/output cycles. See Bus Cycle Definition Signals in Section 4.1 for additional information.
LOCK      O     BUS LOCK is a bus cycle definition pin that indicates that other system bus masters are denied access to the system bus while it is active. See Bus Cycle Definition Signals in Section 4.1 for additional information.
ADS       O     ADDRESS STATUS indicates that a valid bus cycle definition and address (W/R, D/C, M/IO, BHE, BLE and A23–A1) are being driven at the 80376 pins. See Bus Control Signals in Section 4.1 for additional information.
NA        I     NEXT ADDRESS is used to request address pipelining. See Bus Control Signals in Section 4.1 for additional information.
READY     I     BUS READY terminates the bus cycle. See Bus Control Signals in Section 4.1 for additional information.
BHE, BLE  O     BYTE ENABLES indicate which data bytes of the data bus take part in a bus cycle. See Address Bus in Section 4.1 for additional information.
HOLD      I     BUS HOLD REQUEST input allows another bus master to request control of the local bus. See Bus Arbitration Signals in Section 4.1 for additional information.
HLDA      O     BUS HOLD ACKNOWLEDGE output indicates that the 80376 has surrendered control of its local bus to another bus master. See Bus Arbitration Signals in Section 4.1 for additional information.
INTR      I     INTERRUPT REQUEST is a maskable input that signals the 80376 to suspend execution of the current program and execute an interrupt acknowledge function. See Interrupt Signals in Section 4.1 for additional information.
NMI       I     NON-MASKABLE INTERRUPT REQUEST is a non-maskable input that signals the 80376 to suspend execution of the current program and execute an interrupt acknowledge function. See Interrupt Signals in Section 4.1 for additional information.
BUSY      I     BUSY signals a busy condition from a processor extension. See Coprocessor Interface Signals in Section 4.1 for additional information.
ERROR     I     ERROR signals an error condition from a processor extension. See Coprocessor Interface Signals in Section 4.1 for additional information.
PEREQ     I     PROCESSOR EXTENSION REQUEST indicates that the processor extension has data to be transferred by the 80376. See Coprocessor Interface Signals in Section 4.1 for additional information.
FLT       I     FLOAT, when active, forces all bidirectional and output signals, including HLDA, to the float condition. FLOAT is not available on the PGA package. See Float for additional information.
N/C       -     NO CONNECT should always remain unconnected. Connection of an N/C pin may cause the processor to malfunction or be incompatible with future steppings of the 80376.
VCC       I     SYSTEM POWER provides the 5V nominal D.C. supply input.
VSS       I     SYSTEM GROUND provides the 0V connection from which all inputs and outputs are measured.

2.0 ARCHITECTURE OVERVIEW

The 80376 supports the protection mechanisms needed by sophisticated multitasking embedded systems and real-time operating systems. The use of these protection mechanisms is completely optional. For embedded applications not needing protection, the 80376 can easily be configured to provide a 16 Mbyte physical address space.

Instruction pipelining, high bus bandwidth, and a very high performance ALU ensure short average instruction execution times and high system throughput. The 80376 is capable of execution at sustained rates of 2.5–3.0 million instructions per second.

The 80376 offers on-chip testability and debugging features. Four break point registers allow conditional or unconditional break point traps on code execution or data accesses for powerful debugging of even ROM based systems. Other testability features include self-test and tri-stating of output buffers during RESET.
The Intel 80376 embedded processor consists of a central processing unit, a memory management unit and a bus interface. The central processing unit consists of the execution unit and instruction unit. The execution unit contains the eight 32-bit general registers which are used for both address calculation and data operations and a 64-bit barrel shifter used to speed shift, rotate, multiply, and divide operations. The instruction unit decodes the instruction opcodes and stores them in the decoded instruction queue for immediate use by the execution unit.

The Memory Management Unit (MMU) consists of a segmentation and protection unit. Segmentation allows the managing of the logical address space by providing an extra addressing component, one that allows easy code and data relocatability, and efficient sharing.

The protection unit provides four levels of protection for isolating and protecting applications and the operating system from each other. The hardware enforced protection allows the design of systems with a high degree of integrity and simplifies debugging.

Finally, to facilitate high performance system hardware designs, the 80376 bus interface offers address pipelining and direct Byte Enable signals for each byte of the data bus.

2.1 Register Set

The 80376 has twenty-nine registers as shown in Figure 2.1. These registers are grouped into the following six categories:

Figure 2.1. 80376 Base Architecture Registers

General Registers: The eight 32-bit general purpose registers are used to contain arithmetic and logical operands. Four of these (EAX, EBX, ECX and EDX) can be used either in their entirety as 32-bit registers, as 16-bit registers, or split into pairs of separate 8-bit registers.

Segment Registers: Six 16-bit special purpose registers select, at any given time, the segments of memory that are immediately addressable for code, stack, and data.
Flags and Instruction Pointer Registers: These two 32-bit special purpose registers in Figure 2.1 record or control certain aspects of the 80376 processor state. The EFLAGS register includes status and control bits that are used to reflect the outcome of many instructions and modify the semantics of some instructions. The Instruction Pointer, called EIP, is 32 bits wide. The Instruction Pointer controls instruction fetching and the processor automatically increments it after executing an instruction.

Control Register: The 32-bit control register, CR0, is used to control Coprocessor Emulation.

System Address Registers: These four special registers reference the tables or segments supported by the 80376/80386 protection model. These tables or segments are: GDTR (Global Descriptor Table Register), IDTR (Interrupt Descriptor Table Register), LDTR (Local Descriptor Table Register), TR (Task State Segment Register).

Debug Registers: The six programmer accessible debug registers provide on-chip support for debugging. The use of the debug registers is described in Section 2.11 Debugging Support.

EFLAGS REGISTER

The flag Register is a 32-bit register named EFLAGS. The defined bits and bit fields within EFLAGS, shown in Figure 2.2, control certain operations and indicate the status of the 80376 processor. The function of the flag bits is given in Table 2.1.

Figure 2.2. Status and Control Register Bit Functions

Table 2.1. Flag Definitions

Bit Position  Name  Function
0             CF    Carry Flag: Set on high-order bit carry or borrow; cleared otherwise.
2             PF    Parity Flag: Set if low-order 8 bits of result contain an even number of 1-bits; cleared otherwise.
4             AF    Auxiliary Carry Flag: Set on carry from or borrow to the low order four bits of AL; cleared otherwise.
6             ZF    Zero Flag: Set if result is zero; cleared otherwise.
7             SF    Sign Flag: Set equal to high-order bit of result (0 if positive, 1 if negative).
8             TF    Single Step Flag: Once set, a single step interrupt occurs after the next instruction executes. TF is cleared by the single step interrupt.
9             IF    Interrupt-Enable Flag: When set, external interrupts signaled on the INTR pin will cause the CPU to transfer control to an interrupt vector specified location.
10            DF    Direction Flag: Causes string instructions to auto-increment (default) the appropriate index registers when cleared. Setting DF causes auto-decrement.
11            OF    Overflow Flag: Set if the operation resulted in a carry/borrow into the sign bit (high-order bit) of the result but did not result in a carry/borrow out of the high-order bit or vice-versa.
12, 13        IOPL  I/O Privilege Level: Indicates the maximum CPL permitted to execute I/O instructions without generating an exception 13 fault or consulting the I/O permission bit map. It also indicates the maximum CPL value allowing alteration of the IF bit.
14            NT    Nested Task: Indicates that the execution of the current task is nested within another task (see Task Switching).
16            RF    Resume Flag: Used in conjunction with debug register breakpoints. It is checked at instruction boundaries before breakpoint processing. If set, any debug fault is ignored on the next instruction. It is reset at the successful completion of any instruction except IRET, POPF, and those instructions causing task switches.

CONTROL REGISTER

The 80376 has a 32-bit control register called CR0 that is used to control coprocessor emulation. This register is shown in Figures 2.1 and 2.2. The defined CR0 bits are described in Table 2.2. Bits 0, 4 and 31 of CR0 have fixed values in the 80376. These values cannot be changed. Programs that load CR0 should always load bits 0, 4 and 31 with the values previously there to be compatible with the 80386.

Table 2.2.
CR0 Definitions

Bit Position  Name  Function
1             MP    Monitor Coprocessor Extension: Allows WAIT instructions to cause a processor extension not present exception (number 7).
2             EM    Emulate Processor Extension: When set, this bit causes a processor extension not present exception (number 7) on ESC instructions to allow processor extension emulation.
3             TS    Task Switched: When set, this bit indicates the next instruction using a processor extension will cause exception 7, allowing software to test whether the current processor extension context belongs to the current task (see Task Switching).

2.2 Instruction Set

The instruction set is divided into nine categories of operations:

  Data Transfer
  Arithmetic
  Shift/Rotate
  String Manipulation
  Bit Manipulation
  Control Transfer
  High Level Language Support
  Operating System Support
  Processor Control

These 80376 processor instructions are listed in Table 8.1 80376 Instruction Set and Clock Count Summary.

All 80376 processor instructions operate on either 0, 1, 2 or 3 operands; an operand resides in a register, in the instruction itself, or in memory. Most zero operand instructions (e.g. CLI, STI) take only one byte. One operand instructions generally are two bytes long. The average instruction is 3.2 bytes long. Since the 80376 has a 16-byte prefetch instruction queue an average of 5 instructions can be prefetched.

The use of two operands permits the following types of common instructions:

  Register to Register
  Memory to Register
  Immediate to Register
  Memory to Memory
  Register to Memory
  Immediate to Memory

The operands are either 8, 16 or 32 bits long.

2.3 Memory Organization

Memory on the 80376 is divided into 8-bit quantities (bytes), 16-bit quantities (words), and 32-bit quantities (dwords). Words are stored in two consecutive bytes in memory with the low-order byte at the lowest address. Dwords are stored in four consecutive bytes in memory with the low-order byte at the lowest address. The address of a word or dword is the byte address of the low-order byte. For maximum performance word and dword values should be at even physical addresses.

In addition to these basic data types the 80376 processor supports segments. Memory can be divided up into one or more variable length segments, which can be shared between programs.

ADDRESS SPACES

The 80376 has three types of address spaces: logical, linear, and physical. A logical address (also known as a virtual address) consists of a selector and an offset. A selector is the contents of a segment register. An offset is formed by summing all of the addressing components (BASE, INDEX, and DISPLACEMENT), discussed in Section 2.4 Addressing Modes, into an effective address.

Every selector has a logical base address associated with it that can be up to 32 bits in length. This 32-bit logical base address is added to either a 32-bit offset address or a 16-bit offset address (by using the address length prefix) to form a final 32-bit linear address. This final linear address is then truncated so that only the lower 24 bits of this address are used to address the 16 Mbytes physical memory address space. The logical base address is stored in one of two operating system tables (i.e. the Local Descriptor Table or Global Descriptor Table).

Figure 2.3 shows the relationship between the various address spaces.

Figure 2.3. Address Translation

SEGMENT REGISTER USAGE

The main data structure used to organize memory is the segment. On the 80376, segments are variable sized blocks of linear addresses which have certain attributes associated with them. There are two main types of segments, code and data. The simplest use of segments is to have one code segment and one data segment, each 16 Mbytes in size and overlapping the other. This allows code and data to be directly addressed by the same offset.
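
The translation described under ADDRESS SPACES can be illustrated with a short model. The Python sketch below is an illustration under stated assumptions, not part of the 80376 documentation: the function names, the sample base and offset values, and the modulo-2^32 wrap of the sum are hypothetical choices made here, and descriptor table lookup and limit or protection checks are left out entirely.

```python
# Illustrative model only: names and sample values are assumptions,
# and no descriptor lookup or limit checking is performed.

MASK32 = 0xFFFFFFFF   # width of the linear address
MASK24 = 0x00FFFFFF   # width of the physical address (16 Mbytes)


def logical_to_linear(segment_base: int, offset: int) -> int:
    """Add the segment base to the offset to form a 32-bit linear address.

    The wrap to 32 bits is an assumption of this sketch.
    """
    return (segment_base + offset) & MASK32


def linear_to_physical(linear: int) -> int:
    """Keep only the lower 24 bits of the linear address."""
    return linear & MASK24


# Hypothetical segment base and offset, chosen so that the linear
# address has bits set above bit 23.
linear = logical_to_linear(0x01200000, 0x00003456)
physical = linear_to_physical(linear)
print(hex(linear), hex(physical))
```

In this model two linear addresses that differ only in bits 24 through 31 map to the same physical location, which is the consequence of the truncation step described above.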
In order to provide compact instruction encoding and increase processor performance, instructions do not need to explicitly specify which segment register is used. The segment register is automatically chosen according to the rules of Table 2.3 (Segment Register Selection Rules). In general, data references use the selector contained in the DS register, stack references use the SS register and instruction fetches use the CS register. The contents of the Instruction Pointer provide the offset. Special segment override prefixes allow the explicit use of a given segment register, and override the implicit rules listed in Table 2.3. The override prefixes also allow the use of the ES, FS and GS segment registers.

There are no restrictions regarding the overlapping of the base addresses of any segments. Thus, all 6 segments could have the base address set to zero. Further details of segmentation are discussed in Section 3.0 Architecture.

Table 2.3. Segment Register Selection Rules

Type of Memory Reference                               Implied (Default)   Segment Override
                                                       Segment Use         Prefixes Possible
Code Fetch                                             CS                  None
Destination of PUSH, PUSHF, INT, CALL, PUSHA           SS                  None
  Instructions
Source of POP, POPA, POPF, IRET, RET Instructions      SS                  None
Destination of STOS, MOVS, REP STOS, REP MOVS          ES                  None
  Instructions (DI is Base Register)
Other Data References, with Effective Address
  Using Base Register of:
    [EAX]                                              DS                  CS, SS, ES, FS, GS
    [EBX]                                              DS                  CS, SS, ES, FS, GS
    [ECX]                                              DS                  CS, SS, ES, FS, GS
    [EDX]                                              DS                  CS, SS, ES, FS, GS
    [ESI]                                              DS                  CS, SS, ES, FS, GS
    [EDI]                                              DS                  CS, SS, ES, FS, GS
    [EBP]                                              SS                  CS, SS, ES, FS, GS
    [ESP]                                              SS                  CS, SS, ES, FS, GS

2.4 Addressing Modes

The 80376 provides a total of 8 addressing modes for instructions to specify operands. The addressing modes are optimized to allow the efficient execution of high level languages such as C and FORTRAN, and they cover the vast majority of data references needed by high-level languages.
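
The default segment rules of Table 2.3 and the BASE, INDEX and DISPLACEMENT components that this section describes can be combined in a small model. The following Python sketch is illustrative only and rests on assumptions: the function names, the dictionary used as a register file, the 32-bit wrap of the sum, and the choice of DS when no base register is given are all hypothetical and do not come from the 80376 documentation. Only 32-bit addressing is modeled.

```python
# Illustrative model only; 32-bit addressing, no address length prefix.
from typing import Dict, Optional

MASK32 = 0xFFFFFFFF
STACK_BASES = {"EBP", "ESP"}   # base registers that default to SS in Table 2.3


def effective_address(regs: Dict[str, int],
                      base: Optional[str] = None,
                      index: Optional[str] = None,
                      scale: int = 1,
                      disp: int = 0) -> int:
    """Sum BASE + (INDEX * scale) + DISPLACEMENT, wrapped to 32 bits."""
    if index == "ESP":
        raise ValueError("ESP cannot be used as an index register")
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4 or 8")
    ea = disp
    if base is not None:
        ea += regs[base]
    if index is not None:
        ea += regs[index] * scale
    return ea & MASK32


def default_segment(base: Optional[str] = None) -> str:
    """Pick the implied segment for a data reference from its base register.

    Returning DS when there is no base register is an assumption here.
    """
    return "SS" if base in STACK_BASES else "DS"


# Hypothetical register contents for the example.
regs = {"EBX": 0x1000, "ESI": 3, "EBP": 0x8000}
print(default_segment("EBX"), hex(effective_address(regs, "EBX", "ESI", 4, 8)))
print(default_segment("EBP"), hex(effective_address(regs, "EBP", disp=-4)))
```

The first call models an array element access with a scaled index, and the second models a local variable reached through EBP with a negative displacement, which defaults to the stack segment.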
Two of the addressing modes provide for instructions that operate on register or immediate operands:

Register Operand Mode: The operand is located in one of the 8-, 16- or 32-bit general registers.

Immediate Operand Mode: The operand is included in the instruction as part of the opcode.

The remaining 6 modes provide a mechanism for specifying the effective address of an operand. The linear address consists of two components: the segment base address and an effective address. The effective address is calculated by summing any combination of the following three address elements (see Figure 2.3):

DISPLACEMENT: an 8-, 16- or 32-bit immediate value following the instruction.

BASE: The contents of any general purpose register. The base registers are generally used by compilers to point to the start of the local variable area. Note that if the Address Length Prefix is used, only BX and BP can be used as a BASE register.

INDEX: The contents of any general purpose register except for ESP. The index registers are used to access the elements of an array, or a string of characters. The index register’s value can be multiplied by a scale factor, either 1, 2, 4 or 8. The scaled index is especially useful for accessing arrays or structures. Note that if the Address Length Prefix is used, no scaling is available and only the registers SI and DI can be used as an INDEX.

Combinations of these 3 components make up the 6 additional addressing modes. There is no performance penalty for using any of these addressing combinations, since the effective address calculation is pipelined with the execution of other instructions. The one exception is the simultaneous use of BASE and INDEX components which requires one additional clock.

As shown in Figure 2.4, the effective address (EA) of an operand is calculated according to the following formula:

EA = BASE Register + (INDEX Register × scaling) + DISPLACEMENT

1. Direct Mode: The operand’s offset is contained as part of the instruction as an 8-, 16- or 32-bit DISPLACEMENT.
2. Register Indirect Mode: A BASE register contains the address of the operand.
3. Based Mode: A BASE register’s contents are added to a DISPLACEMENT to form the operand’s offset.
4. Scaled Index Mode: An INDEX register’s contents are multiplied by a SCALING factor and the result is added to a DISPLACEMENT to form the operand’s offset.
5. Based Scaled Index Mode: The contents of an INDEX register are multiplied by a SCALING factor and the result is added to the contents of a BASE register to obtain the operand’s offset.
6. Based Scaled Index Mode with Displacement: The contents of an INDEX register are multiplied by a SCALING factor, and the result is added to the contents of a BASE register and a DISPLACEMENT to form the operand’s offset.

Figure 2.4. Addressing Mode Calculations

GENERATING 16-BIT ADDRESSES

The 80376 executes code with a default length for operands and addresses of 32 bits. The 80376 can also execute instructions that use 16-bit operands and addresses. This is specified through the use of override prefixes. Two prefixes, the Operand Length Prefix and the Address Length Prefix, override the default 32-bit length on an individual instruction basis. These prefixes are automatically added by assemblers. The Operand Length and Address Length Prefixes can be applied separately or in combination to any instruction.

The 80376 normally executes 32-bit code and uses either 8- or 32-bit displacements, and any register can be used as a base or index register. When executing 16-bit code (by prefix overrides), the displacements are either 8 or 16 bits, and the base and index registers conform to the 16-bit model. Table 2.4 illustrates the differences.

Table 2.4.
BASE and INDEX Registers for 16- and 32-Bit Addresses

                  16-Bit Addressing   32-Bit Addressing
BASE REGISTER     BX, BP              Any 32-Bit GP Register
INDEX REGISTER    SI, DI              Any 32-Bit GP Register except ESP
SCALE FACTOR      None                1, 2, 4, 8
DISPLACEMENT      0, 8, 16 Bits       0, 8, 32 Bits

2.5 Data Types

The 80376 supports all of the data types commonly used in high level languages:

Bit: A single bit quantity.
Bit Field: A group of up to 32 contiguous bits, which spans a maximum of four bytes.
Bit String: A set of contiguous bits; on the 80376 bit strings can be up to 16 Mbits long.
Byte: A signed 8-bit quantity.
Unsigned Byte: An unsigned 8-bit quantity.
Integer (Word): A signed 16-bit quantity.
Long Integer (Double Word): A signed 32-bit quantity. All operations assume a 2’s complement representation.
Unsigned Integer (Word): An unsigned 16-bit quantity.
Unsigned Long Integer (Double Word): An unsigned 32-bit quantity.
Signed Quad Word: A signed 64-bit quantity.
Unsigned Quad Word: An unsigned 64-bit quantity.
Pointer: A 16- or 32-bit offset only quantity which indirectly references another memory location.
Long Pointer: A full pointer which consists of a 16-bit segment selector and either a 16- or 32-bit offset.
Char: A byte representation of an ASCII Alphanumeric or control character.
String: A contiguous sequence of bytes, words or dwords. A string may contain between 1 byte and 16 Mbytes.
BCD: A byte (unpacked) representation of decimal digits 0–9.
Packed BCD: A byte (packed) representation of two decimal digits 0–9 storing one digit in each nibble.

When the 80376 is coupled with a numerics coprocessor such as the 80387SX, the following common floating point types are supported:

Floating Point: A signed 32-, 64- or 80-bit real number representation. Floating point numbers are supported by the 80387SX numerics coprocessor.

Figure 2.5 illustrates the data types supported by the 80376 processor and the 80387SX coprocessor.

Figure 2.5.
80376 Supported Data Types

2.6 I/O Space

The 80376 has two distinct physical address spaces: physical memory and I/O. Generally, peripherals are placed in I/O space although the 80376 also supports memory-mapped peripherals. The I/O space consists of 64 Kbytes which can be divided into 64K 8-bit ports, 32K 16-bit ports, or any combination of ports which add to no more than 64 Kbytes. The M/IO pin acts as an additional address line, thus allowing the system designer to easily determine which address space the processor is accessing. Note that the I/O address refers to a physical address.

The I/O ports are accessed by the IN and OUT instructions, with the port address supplied as an immediate 8-bit constant in the instruction or in the DX register. All 8-bit and 16-bit port addresses are zero extended on the upper address lines. The I/O instructions cause the M/IO pin to be driven LOW. I/O port addresses 00F8H through 00FFH are reserved for use by Intel.

2.7 Interrupts and Exceptions

Interrupts and exceptions alter the normal program flow in order to handle external events, report errors or exceptional conditions. The difference between interrupts and exceptions is that interrupts are used to handle asynchronous external events while exceptions handle instruction faults. Although a program can generate a software interrupt via an INT N instruction, the processor treats software interrupts as exceptions.

Hardware interrupts occur as the result of an external event and are classified into two types: maskable or non-maskable. Interrupts are serviced after the execution of the current instruction. After the interrupt handler is finished servicing the interrupt, execution proceeds with the instruction immediately after the interrupted instruction.

Exceptions are classified as faults, traps, or aborts depending on the way they are reported, and whether or not restart of the instruction causing the exception is supported.
Faults are exceptions that are detected and serviced before the execution of the faulting instruction. Traps are exceptions that are reported immediately after the execution of the instruction which caused the problem. Aborts are exceptions which do not permit the precise location of the instruction causing the exception to be determined.

Thus, when an interrupt service routine has been completed, execution proceeds from the instruction immediately following the interrupted instruction. On the other hand the return address from an exception/fault routine will always point at the instruction causing the exception and include any leading instruction prefixes. Table 2.5 summarizes the possible interrupts for the 80376 and shows where the return address points to.

The 80376 has the ability to handle up to 256 different interrupts/exceptions. In order to service the interrupts, a table with up to 256 interrupt vectors must be defined. The interrupt vectors are simply pointers to the appropriate interrupt service routine. The interrupt vectors are 8-byte quantities, which are put in an Interrupt Descriptor Table. Of the 256 possible interrupts, 32 are reserved for use by Intel and the remaining 224 are free to be used by the system designer.

INTERRUPT PROCESSING

When an interrupt occurs the following actions happen. First, the current program address and the Flags are saved on the stack to allow resumption of the interrupted program. Next, an 8-bit vector is supplied to the 80376 which identifies the appropriate entry in the interrupt table. The table contains either an Interrupt Gate, a Trap Gate or a Task Gate that will point to an interrupt procedure or task. The user supplied interrupt service routine is executed. Finally, when an IRET instruction is executed the old processor state is restored and program execution resumes at the appropriate instruction.
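
Because each interrupt vector is an 8-byte quantity, the location of a table entry follows directly from the 8-bit vector number. The Python sketch below shows that arithmetic. It is a minimal illustration under assumptions: the function name, the `entries` parameter used to model a table shorter than 256 entries, and the sample base addresses are hypothetical, and the contents of the gate itself are not modeled.

```python
# Illustrative arithmetic only; the gate contents are not modeled.

GATE_SIZE = 8        # each interrupt vector occupies 8 bytes
MAX_VECTORS = 256    # an 8-bit vector selects one of 256 entries


def idt_entry_address(idt_base: int, vector: int,
                      entries: int = MAX_VECTORS) -> int:
    """Return the address of the table entry selected by an 8-bit vector.

    `entries` is a hypothetical parameter modeling a table that defines
    fewer than 256 vectors.
    """
    if not 0 <= vector < MAX_VECTORS:
        raise ValueError("vector must fit in 8 bits")
    if vector >= entries:
        raise ValueError("vector lies beyond the end of the table")
    return idt_base + vector * GATE_SIZE


# Vector 2 is the internally supplied vector for NMI.
print(hex(idt_entry_address(0x0, 2)))
# Last entry of a full table placed at a hypothetical base address.
print(hex(idt_entry_address(0x1000, 255)))
```

A full table of 256 entries therefore occupies 256 × 8 = 2048 bytes.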
The 8-bit interrupt vector is supplied to the 80376 in several different ways: exceptions supply the interrupt vector internally; software INT instructions contain or imply the vector; maskable hardware interrupts supply the 8-bit vector via the interrupt acknowledge bus sequence. Non-Maskable hardware interrupts are assigned to interrupt vector 2.

Maskable Interrupt

Maskable interrupts are the most common way to respond to asynchronous external hardware events. A hardware interrupt occurs when the INTR is pulled HIGH and the Interrupt Flag bit (IF) is enabled. The processor only responds to interrupts between instructions (string instructions have an ‘‘interrupt window’’ between memory moves which allows interrupts during long string moves). When an interrupt occurs the processor reads an 8-bit vector supplied by the hardware which identifies the source of the interrupt (one of 224 user defined interrupts).

Table 2.5. Interrupt Vector Assignments

Function                      Interrupt  Instruction Which Can             Return Address Points    Type
                              Number     Cause Exception                   to Faulting Instruction
Divide Error                  0          DIV, IDIV                         Yes                      FAULT
Debug Exception               1          Any Instruction                   Yes                      TRAP*
NMI Interrupt                 2          INT 2 or NMI                      No                       NMI
One-Byte Interrupt            3          INT                               No                       TRAP
Interrupt on Overflow         4          INTO                              No                       TRAP
Array Bounds Check            5          BOUND                             Yes                      FAULT
Invalid OP-Code               6          Any Illegal Instruction           Yes                      FAULT
Device Not Available          7          ESC, WAIT                         Yes                      FAULT
Double Fault                  8          Any Instruction That Can                                   ABORT
                                         Generate an Exception
Coprocessor Segment Overrun   9          ESC                               No                       ABORT
Invalid TSS                   10         JMP, CALL, IRET, INT              Yes                      FAULT
Segment Not Present           11         Segment Register Instructions     Yes                      FAULT
Stack Fault                   12         Stack References                  Yes                      FAULT
General Protection Fault      13         Any Memory Reference              Yes                      FAULT
Intel Reserved                14–15      -                                 -
Coprocessor Error             16         ESC, WAIT                         Yes                      FAULT
Intel Reserved                17–32      -
Two-Byte Interrupt            0–255      INT n                             No                       TRAP

*Some debug exceptions may report both traps on the previous instruction, and faults on the next instruction.
Interrupts through Interrupt Gates automatically reset IF, disabling INTR requests. Interrupts through Trap Gates leave the state of the IF bit unchanged. Interrupts through a Task Gate change the IF bit according to the image of the EFLAGs register in the task's Task State Segment (TSS). When an IRET instruction is executed, the original state of the IF bit is restored.

Non-Maskable Interrupt

Non-maskable interrupts provide a method of servicing very high priority interrupts. When the NMI input is pulled HIGH it causes an interrupt with an internally supplied vector value of 2. Unlike a normal hardware interrupt, no interrupt acknowledgement sequence is performed for an NMI.

While executing the NMI servicing procedure, the 80376 will not service any further NMI request, or INT requests, until an interrupt return (IRET) instruction is executed or the processor is reset. If NMI occurs while currently servicing an NMI, its presence will be saved for servicing after executing the first IRET instruction. The disabling of INTR requests depends on the gate in IDT location 2.

Software Interrupts

A third type of interrupt/exception for the 80376 is the software interrupt. An INT n instruction causes the processor to execute the interrupt service routine pointed to by the nth vector in the interrupt table.

A special case of the two byte software interrupt INT n is the one byte INT 3, or breakpoint interrupt. By inserting this one byte instruction in a program, the user can set breakpoints in his program as a debugging tool.

A final type of software interrupt is the single step interrupt. It is discussed in the Single-Step Trap section.

INTERRUPT AND EXCEPTION PRIORITIES

Interrupts are externally-generated events. Maskable Interrupts (on the INTR input) and Non-Maskable Interrupts (on the NMI input) are recognized at instruction boundaries.
When NMI and maskable INTR are both recognized at the same instruction boundary, the 80376 invokes the NMI service routine first. If, after the NMI service routine has been invoked, maskable interrupts are still enabled, then the 80376 will invoke the appropriate interrupt service routine.

As the 80376 executes instructions, it follows a consistent cycle in checking for exceptions, as shown in Table 2.6. This cycle is repeated as each instruction is executed, and occurs in parallel with instruction decoding and execution.

Table 2.6. Sequence of Exception Checking

Consider the case of the 80376 having just completed an instruction. It then performs the following checks before reaching the point where the next instruction is completed:

1. Check for Exception 1 Traps from the instruction just completed (single-step via Trap Flag, or Data Breakpoints set in the Debug Registers).
2. Check for external NMI and INTR.
3. Check for Exception 1 Faults in the next instruction (Instruction Execution Breakpoint set in the Debug Registers for the next instruction).
4. Check for Segmentation Faults that prevented fetching the entire next instruction (exceptions 11 or 13).
5. Check for Faults decoding the next instruction (exception 6 if illegal opcode; or exception 13 if instruction is longer than 15 bytes, or privilege violation (i.e. not at IOPL or at CPL = 0)).
6. If WAIT opcode, check if TS = 1 and MP = 1 (exception 7 if both are 1).
7. If ESCape opcode for numeric coprocessor, check if EM = 1 or TS = 1 (exception 7 if either is 1).
8. If WAIT opcode or ESCape opcode for numeric coprocessor, check ERROR input signal (exception 16 if ERROR input is asserted).
9. Check for Segmentation Faults that prevent transferring the entire memory quantity (exceptions 11, 12, 13).

INSTRUCTION RESTART

The 80376 fully supports restarting all instructions after faults. If an exception is detected in the instruction to be executed (exception categories 4 through 9 in Table 2.6), the 80376 device invokes the appropriate exception service routine. The 80376 is in a state that permits restart of the instruction.

DOUBLE FAULT

A Double fault (exception 8) results when the processor attempts to invoke an exception service routine for the segment exceptions (10, 11, 12 or 13), but in the process of doing so, detects an exception.

2.8 Reset and Initialization

When the processor is Reset the registers have the values shown in Table 2.7. The 80376 will then start executing instructions near the top of physical memory, at location 0FFFFF0H. A short JMP should be executed within the segment defined for power-up (see Table 2.7). The GDT should then be initialized for a start-up data and code segment followed by a far JMP that will load the segment descriptor cache with the new descriptor values. The IDT table, after reset, is located at physical address 0H, with a limit of 256 entries.

RESET forces the 80376 to terminate all execution and local bus activity. No instruction execution or bus activity will occur as long as Reset is active. Between 350 and 450 CLK2 periods after Reset becomes inactive, the 80376 will start executing instructions at the top of physical memory.

Table 2.7. Register Values after Reset

Flag Word (EFLAGS)           uuuu0002H   (Note 1)
Machine Status Word (CR0)    uuuuuuu1H   (Note 2)
Instruction Pointer (EIP)    0000FFF0H
Code Segment (CS)            F000H       (Note 3)
Data Segment (DS)            0000H       (Note 4)
Stack Segment (SS)           0000H
Extra Segment (ES)           0000H
Extra Segment (FS)           0000H
Extra Segment (GS)           0000H       (Note 4)
EAX Register                 0000H       (Note 5)
EDX Register                 Component and Stepping ID (Note 6)
All Other Registers          Undefined   (Note 7)

NOTES:
1. EFLAG Register. The upper 14 bits of the EFLAGS register are undefined; all defined flag bits are zero.
2. CR0: The defined 4 bits in CR0 are equal to 1H.
3. The Code Segment Register (CS) will have its Base Address set to 0FFFF0000H and Limit set to 0FFFFH.
4. The Data and Extra Segment Registers (DS and ES) will have their Base Address set to 000000000H and Limit set to 0FFFFH.
5. If self-test is selected, the EAX should contain a 0 value. If a value of 0 is not found the self-test has detected a flaw in the part.
6. EDX register always holds component and stepping identifier.
7. All unidentified bits are Intel Reserved and should not be used.

2.9 Initialization

Because the 80376 processor starts executing in protected mode, certain precautions need to be taken during initialization. Before any far jumps can take place the GDT and/or LDT tables need to be set up and their respective registers loaded. Before interrupts can be initialized the IDT table must be set up and the IDTR must be loaded. The example code is shown below:

; ****************************************************************
;
; This is an example of startup code to put either an 80376,
; 80386SX or 80386 into flat mode. All of memory is treated as
; simple linear RAM. There are no interrupt routines. The
; Builder creates the GDT-alias and IDT-alias and places them,
; by default, in GDT[1] and GDT[2]. Other entries in the GDT
; are specified in the Build file. After initialization it jumps
; to a C startup routine. To use this template, change this jmp
; address to that of your code, or make the label of your code
; "c_startup".
;
; This code was assembled and built using version 1.2 of the
; Intel RLL utilities and Intel 386ASM assembler.
;
; *** This code was tested ***
;
; ****************************************************************

        NAME    FLAT                    ; name of the object module

        EXTRN   c_startup:near          ; this is the label jmped to after init

pe_flag         equ     1
data_selc       equ     20h             ; assume code is GDT[3], data GDT[4]

INIT_CODE       SEGMENT ER PUBLIC USE32 ; Segment base at 0ffffff80h

        PUBLIC  GDT_DESC
gdt_desc        dq      ?

        PUBLIC  START

start:
        cld                             ; clear direction flag
        smsw    bx                      ; check for processor (80376) at reset
        test    bl,1                    ; use SMSW rather than MOV for speed
        jnz     pestart

realstart:                              ; is an 80386 and in real mode
        db      66h                     ; force the next operand into 32-bit mode.
        mov     eax,offset gdt_desc     ; move address of the GDT descriptor into eax
        xor     ebx,ebx                 ; clear ebx
        mov     bh,ah                   ; load 8 bits of address into bh
        mov     bl,al                   ; load 8 bits of address into bl
        db      67h
        db      66h                     ; use the 32-bit form of LGDT to load
        lgdt    cs:[ebx]                ; the 32-bits of address into the GDTR
        smsw    ax                      ; go into protected mode (set PE bit)
        or      al,pe_flag
        lmsw    ax
        jmp     next                    ; flush prefetch queue

pestart:
        mov     ebx,offset gdt_desc
        xor     eax,eax
        mov     ax,bx                   ; lower portion of address only
        lgdt    cs:[eax]
        xor     ebx,ebx                 ; initialize data selectors
        mov     bl,data_selc            ; GDT[3]
        mov     ds,bx
        mov     ss,bx
        mov     es,bx
        mov     fs,bx
        mov     gs,bx
        jmp     pejump

next:
        xor     ebx,ebx                 ; initialize data selectors
        mov     bl,data_selc            ; GDT[3]
        mov     ds,bx
        mov     ss,bx
        mov     es,bx
        mov     fs,bx
        mov     gs,bx
        db      66h                     ; for the 80386, need to make a 32-bit jump
pejump:
        jmp     far ptr c_startup       ; but the 80376 is already 32-bit.

        org     70h                     ; only if segment base is at 0ffffff80h
        jmp     short start

INIT_CODE       ENDS
        END

This code should be linked into your application for boot loadable code. The following build file illustrates how this is accomplished.

FLAT;                                   -- build program id

SEGMENT
        *segments (dpl=0),              -- Give all user segments a DPL of 0.
        _phantom_code_ (dpl=0),         -- These two segments are created by
        _phantom_data_ (dpl=0),         -- the builder when the FLAT control is used.
        init_code (base=0ffffff80h);    -- Put startup code at the reset vector area.

GATE
        g13 (entry=13, dpl=0, trap),        -- trap gate leaves interrupts enabled
        i32 (entry=32, dpl=0, interrupt),   -- interrupt gate disables interrupts

TABLE
                                        -- create GDT
        GDT (LOCATION = GDT_DESC,       -- In a buffer starting at GDT_DESC,
                                        -- BLD386 places the GDT base and
                                        -- GDT limit values. Buffer must be
                                        -- 6 bytes long. The base and limit
                                        -- values are placed in this buffer
                                        -- as two bytes of limit plus
                                        -- four bytes of base in the format
                                        -- required for use by the LGDT
                                        -- instruction.
             ENTRY = (3:_phantom_code_, -- Explicitly place segment
                      4:_phantom_data_, -- entries into the GDT.
                      5:code32,
                      6:data,
                      7:init_code)
            );

TASK
        MAIN_TASK
          (
            DPL = 0,                    -- Task privilege level is 0.
            DATA = DATA,                -- Points to a segment that
                                        -- indicates initial DS value.
            CODE = main,                -- Entry point is main, which
                                        -- must be a public id.
            STACKS = (DATA),            -- Segment id points to stack
                                        -- segment. Sets the initial SS:ESP.
            NO INTENABLED,              -- Disable interrupts.
            PRESENT                     -- Present bit in TSS set to 1.
          );

MEMORY
        (RANGE = (EPROM = ROM(0ffff8000h..0ffffffffh),
                  DRAM = RAM(0..0ffffh)),
         ALLOCATE = (EPROM = (MAIN_TASK)));

END

asm386 flatsim.a38 debug
asm386 application.a38 debug
bnd386 application.obj,flatsim.obj nolo debug oj (application.bnd)
bld386 application.bnd bf (flatsim.bld) bl flat

Commands to assemble and build a boot-loadable application named "application.a38". The initialization code is called "flatsim.a38", and build file is called "application.bld".

2.10 Self-Test

The 80376 has the capability to perform a self-test. The self-test checks the function of all of the Control ROM and most of the non-random logic of the part. Approximately one-half of the 80376 can be tested during self-test.

Self-Test is initiated on the 80376 when the RESET pin transitions from HIGH to LOW, and the BUSY pin is LOW. The self-test takes about 2^20 clocks, or approximately 33 ms with a 16 MHz 80376 processor. At the completion of self-test the processor performs reset and begins normal operation. The part has successfully passed self-test if the contents of the EAX register is zero. If the EAX register is not zero then the self-test has detected a flaw in the part. If self-test is not selected after reset, EAX may be non-zero after reset.

2.11 Debugging Support

The 80376 provides several features which simplify the debugging process. The three categories of on-chip debugging aids are:

1. The code execution breakpoint opcode (0CCH).
2. The single-step capability provided by the TF bit in the flag register, and
3. The code and data breakpoint capability provided by the Debug Registers DR0–3, DR6, and DR7.

BREAKPOINT INSTRUCTION

A single-byte software interrupt (INT 3) breakpoint instruction is available for use by software debuggers. The breakpoint opcode is 0CCH, and generates an exception 3 trap when executed.

SINGLE-STEP TRAP

If the single-step flag (TF, bit 8) in the EFLAG register is found to be set at the end of an instruction, a single-step exception occurs. The single-step exception is auto vectored to exception number 1.

DEBUG REGISTERS

The Debug Registers are an advanced debugging feature of the 80376. They allow data access breakpoints as well as code execution breakpoints. Since the breakpoints are indicated by on-chip registers, an instruction execution breakpoint can be placed in ROM code or in code shared by several tasks, neither of which can be supported by the INT 3 breakpoint opcode.

The 80376 contains six Debug Registers, consisting of four breakpoint address registers and two breakpoint control registers. Initially after reset, breakpoints are in the disabled state; therefore, no breakpoints will occur unless the debug registers are programmed. Breakpoints set up in the Debug Registers are auto-vectored to exception 1. Figure 2.6 shows the breakpoint status and control registers.

Figure 2.6. Debug Registers

3.0 ARCHITECTURE

The Intel 80376 Embedded Processor has a physical address space of 16 Mbytes (2^24 bytes) and allows the running of virtual memory programs of almost unlimited size (16 Kbytes × 16 Mbytes or 256 Gbytes (2^38 bytes)). In addition the 80376 provides a sophisticated memory management and a hardware-assisted protection mechanism.
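The address-space figures quoted for the 80376 (16 Mbytes physical, 256 Gbytes virtual) and the reset fetch address can be checked with a few lines of arithmetic. The sketch below is illustrative only: the names are hypothetical, and it assumes that the first fetch address is the CS base from Table 2.7 plus EIP, truncated to the 24 address bits the part drives.

```python
PHYSICAL_ADDRESS_BITS = 24  # the 80376 has a 16-Mbyte physical address space


def physical_address(linear: int) -> int:
    """Truncate a 32-bit linear address to the 24 bits placed on the bus."""
    return linear & ((1 << PHYSICAL_ADDRESS_BITS) - 1)


physical_space = 1 << PHYSICAL_ADDRESS_BITS   # 16 Mbytes
virtual_space = (16 * 1024) * physical_space  # 16K x 16 Mbytes

# Reset values from Table 2.7 (CS base per Note 3).
CS_BASE_AFTER_RESET = 0xFFFF0000
EIP_AFTER_RESET = 0x0000FFF0
reset_fetch = physical_address(CS_BASE_AFTER_RESET + EIP_AFTER_RESET)
```

Under these assumptions the linear address 0FFFFFFF0H truncates to 0FFFFF0H, which matches the start-up location given in Section 2.8.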
3.1 Addressing Mechanism

The 80376 uses two components to form the logical address: a 16-bit selector, which determines the linear base address of a segment, and a 32-bit effective address. The selector is used to specify an index into an operating system defined table (see Figure 3.1). The table contains the 32-bit base address of a given segment. The linear address is formed by adding the base address obtained from the table to the 32-bit effective address. This value is truncated to 24 bits to form the physical address, which is then placed on the address bus.

Figure 3.1. Address Calculation

3.2 Segmentation

Segmentation is one method of memory management and provides the basis for protection in the 80376. Segments are used to encapsulate regions of memory which have common attributes. For example, all of the code of a given program could be contained in a segment, or an operating system table may reside in a segment. All information about each segment is stored in an 8-byte data structure called a descriptor. All of the descriptors in a system are contained in tables recognized by hardware.

Each of the tables has a register associated with it: GDTR, LDTR and IDTR; see Figure 3.2. The LGDT, LLDT and LIDT instructions load the base and limit of the Global, Local and Interrupt Descriptor Tables into the appropriate register. The SGDT, SLDT and SIDT instructions store these base and limit values. These are privileged instructions.

TERMINOLOGY

The following terms are used throughout the discussion of descriptors, privilege levels and protection:

PL: Privilege Level. One of the four hierarchical privilege levels. Level 0 is the most privileged level and level 3 is the least privileged.

RPL: Requestor Privilege Level. The privilege level of the original supplier of the selector. RPL is determined by the two least significant bits of a selector.
DPL: Descriptor Privilege Level. This is the least privileged level at which a task may access that descriptor (and the segment associated with that descriptor). Descriptor Privilege Level is determined by bits 6:5 in the Access Right Byte of a descriptor.

CPL: Current Privilege Level. The privilege level at which a task is currently executing, which equals the privilege level of the code segment being executed. CPL can also be determined by examining the lowest 2 bits of the CS register, except for conforming code segments.

EPL: Effective Privilege Level. The effective privilege level is the least privileged of the RPL and the DPL. EPL is the numerical maximum of RPL and DPL.

Task: One instance of the execution of a program. Tasks are also referred to as processes.

DESCRIPTOR TABLES

The descriptor tables define all of the segments which are used in an 80376 system. There are three types of tables on the 80376 which hold descriptors: the Global Descriptor Table, Local Descriptor Table, and the Interrupt Descriptor Table. All of the tables are variable length memory arrays; they can range in size between 8 bytes and 64 Kbytes. Each table can hold up to 8192 8-byte descriptors. The upper 13 bits of a selector are used as an index into the descriptor table. The tables have registers associated with them which hold the 32-bit linear base address, and the 16-bit limit of each table.

Figure 3.2. Descriptor Table Registers

Global Descriptor Table

The Global Descriptor Table (GDT) contains descriptors which are possibly available to all of the tasks in a system. The GDT can contain any type of segment descriptor except for interrupt and trap descriptors. Every 80376 system contains a GDT. A simple 80376 system contains only 2 entries in the GDT: a code and a data descriptor. For maximum performance, descriptor tables should begin on even addresses.

The first slot of the Global Descriptor Table corresponds to the null selector and is not used. The null selector defines a null pointer value.

Local Descriptor Table

LDTs contain descriptors which are associated with a given task. Generally, operating systems are designed so that each task has a separate LDT. The LDT may contain only code, data, stack, task gate, and call gate descriptors. LDTs provide a mechanism for isolating a given task's code and data segments from the rest of the operating system, while the GDT contains descriptors for segments which are common to all tasks. A segment cannot be accessed by a task if its segment descriptor does not exist in either the current LDT or the GDT. This provides both isolation and protection for a task's segments, while still allowing global data to be shared among tasks.

Unlike the 6-byte GDT or IDT registers which contain a base address and limit, the visible portion of the LDT register contains only a 16-bit selector. This selector refers to a Local Descriptor Table descriptor in the GDT (see Figure 2.1).

Interrupt Descriptor Table

The third table needed for 80376 systems is the Interrupt Descriptor Table. The IDT contains the descriptors which point to the location of up to 256 interrupt service routines. The IDT may contain only task gates, interrupt gates and trap gates. The IDT should be at least 256 bytes in size in order to hold the descriptors for the 32 Intel Reserved Interrupts. Every interrupt used by a system must have an entry in the IDT. The IDT entries are referenced by INT instructions, external interrupt vectors, and exceptions.

DESCRIPTORS

The object to which the segment selector points is called a descriptor. Descriptors are eight-byte quantities which contain attributes about a given region of linear address space.
These attributes include the 32-bit logical base address of the segment, the 20-bit length and granularity of the segment, the protection level, read, write or execute privileges, and the type of segment. All of the attribute information about a segment is contained in 12 bits in the segment descriptor. Figure 3.3 shows the general format of a descriptor. All segments on the 80376 have three attribute fields in common: the Present bit (P), the Descriptor Privilege Level bits (DPL) and the Segment bit (S). P = 1 if the segment is loaded in physical memory; if P = 0 then any attempt to access the segment causes a not present exception (exception 11). The DPL is a two-bit field which specifies the protection level, 0–3, associated with a segment.

The 80376 has two main categories of segments: system segments, and non-system segments (for code and data). The segment bit, S, determines if a given segment is a system segment, a code segment or a data segment. If the S bit is 1 then the segment is either a code or data segment; if it is 0 then the segment is a system segment.

Note that although the 80376 is limited to a 16-Mbyte Physical address space (2^24), its base address allows a segment to be placed anywhere in a 4-Gbyte linear address space. When writing code for the 80376, users should keep code portability to an 80386 processor (or other processors with a larger physical address space) in mind. A segment base address can be placed anywhere in this 4-Gbyte linear address space, but a physical address will be generated that is a truncated version of this linear address. Truncation will be to the maximum number of address bits. It is recommended to place EPROM at the highest physical address and DRAM at the lowest physical addresses.

Bits 31 ... 0                                                                                      Byte Address
SEGMENT BASE 15...0 | SEGMENT LIMIT 15...0                                                         0
BASE 31...24 | G | 1 | 0 | AVL | LIMIT 19...16 | P | DPL | S | TYPE | A | BASE 23...16             +4

BASE    Base Address of the segment
LIMIT   The length of the segment
P       Present Bit: 1 = Present, 0 = Not Present
DPL     Descriptor Privilege Level 0–3
S       Segment Descriptor: 0 = System Descriptor, 1 = Code or Data Descriptor
TYPE    Type of Segment
A       Accessed Bit
G       Granularity Bit: 1 = Segment length is 4 Kbyte granular, 0 = Segment length is byte granular
0       Bit must be zero (0) for compatibility with future processors
AVL     Available field for user or OS

Figure 3.3. Segment Descriptors

Code and Data Descriptors (S = 1)

Figure 3.4 shows the general format of a code and data descriptor and Table 3.1 illustrates how the bits in the Access Right Byte are interpreted.

Code and data segments have several descriptor fields in common. The accessed bit, A, is set whenever the processor accesses a descriptor. The granularity bit, G, specifies if a segment length is 1-byte-granular or 4-Kbyte-granular. Base address bits 31–24, which are normally found in 80386 descriptors, are not made externally available on the 80376. They do not affect the operation of the 80376. The A31–A24 field should be set to allow an 80386 to correctly execute with EPROM at the upper 4096 Mbytes of physical memory.

Bits 31 ... 0                                                                                      Byte Address
SEGMENT BASE 15...0 | SEGMENT LIMIT 15...0                                                         0
BASE 31...24 | G | 1 | 0 | AVL | LIMIT 19...16 | ACCESS RIGHTS BYTE | BASE 23...16                 +4

G       Granularity Bit: 1 = Segment length is 4 Kbyte granular, 0 = Segment length is byte granular
0       Bit must be zero (0) for compatibility with future processors
AVL     Available field for user or OS

Figure 3.4. Code and Data Descriptors

Table 3.1. Access Rights Byte Definition for Code and Data Descriptors

Bit Position  Name                              Function
7             Present (P)                       P = 1  Segment is mapped into physical memory.
                                                P = 0  No mapping to physical memory exists.
6–5           Descriptor Privilege Level (DPL)  Segment privilege attribute used in privilege tests.
4             Segment Descriptor (S)            S = 1  Code or Data (includes stacks) segment descriptor.
                                                S = 0  System Segment Descriptor or Gate Descriptor.

If Data Segment (S = 1, E = 0):
3             Executable (E)                    E = 0  Descriptor type is data segment.
2             Expansion Direction (ED)          ED = 0 Expand up segment, offsets must be ≤ limit.
                                                ED = 1 Expand down segment, offsets must be > limit.
1             Writable (W)                      W = 0  Data segment may not be written into.
                                                W = 1  Data segment may be written into.

If Code Segment (S = 1, E = 1):
3             Executable (E)                    E = 1  Descriptor type is code segment.
2             Conforming (C)                    C = 1  Code segment may only be executed when CPL ≥ DPL and CPL remains unchanged.
1             Readable (R)                      R = 0  Code segment may not be read.
                                                R = 1  Code segment may be read.

0             Accessed (A)                      A = 0  Segment has not been accessed.
                                                A = 1  Segment selector has been loaded into segment register or used by selector test instructions.

System Descriptor Formats (S = 0)

System segments describe information about operating system tables, tasks, and gates. Figure 3.5 shows the general format of system segment descriptors, and the various types of system segments. 80376 system descriptors (which are the same as 80386 descriptor types 2, 5, 9, B, C, E and F) contain a 32-bit logical base address and a 20-bit segment limit.

Selector Fields

A selector has three fields: Local or Global Descriptor Table Indicator (TI), Descriptor Entry Index (Index), and Requestor (the selector's) Privilege Level (RPL) as shown in Figure 3.6. The TI bit selects either the Global Descriptor Table or the Local Descriptor Table. The Index selects one of 8K descriptors in the appropriate descriptor table. The RPL bits allow high speed testing of the selector's privilege attributes.
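The three selector fields can be pulled apart with shifts and masks. The following is a minimal sketch with hypothetical names; it relies on the stated layout (index in the upper 13 bits, RPL in the two least significant bits) and assumes that TI is the remaining bit 2, with 0 selecting the GDT and 1 the LDT as on the 80386.

```python
from typing import NamedTuple

DESCRIPTOR_SIZE = 8  # descriptors are 8-byte quantities


class Selector(NamedTuple):
    index: int  # upper 13 bits: descriptor entry index
    ti: int     # table indicator (assumed: 0 = GDT, 1 = LDT)
    rpl: int    # two least significant bits: requestor privilege level


def decode_selector(selector: int) -> Selector:
    """Split a 16-bit selector into its Index, TI and RPL fields."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must be a 16-bit value")
    return Selector(index=selector >> 3, ti=(selector >> 2) & 1, rpl=selector & 3)


def descriptor_offset(selector: int) -> int:
    """Byte offset of the selected descriptor within its table."""
    return decode_selector(selector).index * DESCRIPTOR_SIZE
```

For instance, the selector value 20H decodes to index 4 in the GDT with RPL 0, and the descriptor it names starts 32 bytes into the table.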
Segment Descriptor Cache

In addition to the selector value, every segment register has a segment descriptor cache register associated with it. Whenever a segment register's contents are changed, the 8-byte descriptor associated with that selector is automatically loaded (cached) on the chip. Once loaded, all references to that segment use the cached descriptor information instead of reaccessing the descriptor. The contents of the descriptor cache are not visible to the programmer. Since descriptor caches only change when a segment register is changed, programs which modify the descriptor tables must reload the appropriate segment registers after changing a descriptor's value.

Bits 31 ... 0                                                                                      Byte Address
SEGMENT BASE 15...0 | SEGMENT LIMIT 15...0                                                         0
BASE 31...24 | G | 0 | 0 | 0 | LIMIT 19...16 | P | DPL | 0 | TYPE | BASE 23...16                   +4

Type  Defines                        Type  Defines
0     Invalid                        8     Invalid
1     Reserved                       9     Available 80376/80386 TSS
2     LDT                            A     Undefined (Intel Reserved)
3     Reserved                       B     Busy 80376/80386 TSS
4     Reserved                       C     80376/80386 Call Gate
5     Task Gate (80376/80386 Task)   D     Undefined (Intel Reserved)
6     Reserved                       E     80376/80386 Interrupt Gate
7     Reserved                       F     80376/80386 Trap Gate

Figure 3.5. System Descriptors

Figure 3.6. Example Descriptor Selection

3.3 Protection

The 80376 offers extensive protection features. These protection features are particularly useful in sophisticated embedded applications which use multitasking real-time operating systems. For simpler embedded applications these protection capabilities can be easily bypassed by making all applications run at privilege level (PL) 0.

RULES OF PRIVILEGE

The 80376 controls access to both data and procedures between levels of a task, according to the following rules.

- Data stored in a segment with privilege level p can be accessed only by code executing at a privilege level at least as privileged as p.
- A code segment/procedure with privilege level p can only be called by a task executing at the same or a lesser privilege level than p.

PRIVILEGE LEVELS

At any point in time, a task on the 80376 always executes at one of the four privilege levels. The Current Privilege Level (CPL) specifies what the task's privilege level is. A task's CPL may only be changed by control transfers through gate descriptors to a code segment with a different privilege level. Thus, an application program running at PL = 3 may call an operating system routine at PL = 1 (via a gate) which would cause the task's CPL to be set to 1 until the operating system routine was finished.

Selector Privilege (RPL)

The privilege level of a selector is specified by the RPL field. The selector's RPL is only used to establish a less trusted privilege level than the current privilege level of the task for the use of a segment. This level is called the task's effective privilege level (EPL). The EPL is defined as being the least privileged (numerically larger) level of a task's CPL and a selector's RPL. The RPL is most commonly used to verify that pointers passed to an operating system procedure do not access data that is of higher privilege than the procedure that originated the pointer. Since the originator of a selector can specify any RPL value, the Adjust RPL (ARPL) instruction is provided to force the RPL bits to the originator's CPL.

I/O Privilege

The I/O privilege level (IOPL) lets the operating system code executing at CPL = 0 define the least privileged level at which I/O instructions can be used. An exception 13 (General Protection Violation) is generated if an I/O instruction is attempted when the CPL of the task is less privileged than the IOPL. The IOPL is stored in bits 12 and 13 of the EFLAGS register. The following instructions cause an exception 13 if the CPL is greater than IOPL: IN, INS, OUT, OUTS, STI, CLI and LOCK prefix.
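The privilege comparisons described in this section reduce to numeric comparisons on levels 0 to 3. The sketch below is one reading of those rules rather than a statement of the hardware's exact checks: the function names are hypothetical, EPL is taken as the larger of CPL and RPL, and the data-access test applies the first rule of privilege to that EPL.

```python
def _check_level(name: str, value: int) -> None:
    """Privilege levels are restricted to 0 (most privileged) through 3."""
    if value not in (0, 1, 2, 3):
        raise ValueError(name + " must be in the range 0..3")


def effective_privilege_level(cpl: int, rpl: int) -> int:
    """EPL: the least privileged (numerically larger) of CPL and RPL."""
    _check_level("cpl", cpl)
    _check_level("rpl", rpl)
    return max(cpl, rpl)


def data_access_allowed(cpl: int, rpl: int, dpl: int) -> bool:
    """Assumed check: the EPL must be at least as privileged as the segment's DPL."""
    _check_level("dpl", dpl)
    return effective_privilege_level(cpl, rpl) <= dpl


def io_instruction_allowed(cpl: int, iopl: int) -> bool:
    """I/O-sensitive instructions raise exception 13 when CPL is greater than IOPL."""
    _check_level("cpl", cpl)
    _check_level("iopl", iopl)
    return cpl <= iopl
```

This shows why RPL matters: an operating system routine running at CPL 0 that is handed a selector with RPL 3 is treated as level 3 for that access, so it cannot be tricked into touching a DPL 0 segment on the caller's behalf.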
Descriptor Access

There are basically two types of segment accesses: those involving code segments such as control transfers, and those involving data accesses. Determining the ability of a task to access a segment involves the type of segment to be accessed, the instruction used, the type of descriptor used and CPL, RPL, and DPL as described above.

Any time an instruction loads a data segment register (DS, ES, FS, GS) the 80376 makes protection validation checks. Selectors loaded in the DS, ES, FS, GS registers must refer only to data segments or readable code segments. Finally the privilege validation checks are performed. The CPL is compared to the EPL and if the EPL is more privileged than the CPL, an exception 13 (general protection fault) is generated.

The rules regarding the stack segment are slightly different than those involving data segments. Instructions that load selectors into SS must refer to data segment descriptors for writeable data segments. The DPL and RPL must equal the CPL. All other descriptor types, or a privilege level violation, will cause an exception 13. A stack not present fault causes an exception 12.

PRIVILEGE LEVEL TRANSFERS

Inter-segment control transfers occur when a selector is loaded in the CS register. For a typical system most of these transfers are simply the result of a call or a jump to another routine. There are five types of control transfers which are summarized in Table 3.2. Many of these transfers result in a privilege level transfer. Changing privilege levels is done only by control transfers, using gates, task switches, and interrupt or trap gates.

Control transfers can only occur if the operation which loaded the selector references the correct descriptor type. Any violation of these descriptor usage rules will cause an exception 13.

CALL GATES

Gates provide protected indirect CALLs. One of the major uses of gates is to provide a secure method of privilege transfers within a task.
Since the operating system defines all of the gates in a system, it can ensure that all gates only allow entry into a few trusted procedures.

Table 3.2. Descriptor Types Used for Control Transfer

Control Transfer Types                 Operation Types                  Descriptor Referenced    Descriptor Table
Intersegment within the same           JMP, CALL, RET, IRET*            Code Segment             GDT/LDT
privilege level
Intersegment to the same or higher     CALL                             Call Gate                GDT/LDT
privilege level.                       Interrupt Instruction,           Trap or Interrupt Gate   IDT
Interrupt within task may              Exception, External Interrupt
change CPL
Intersegment to a lower privilege      RET, IRET*                       Code Segment             GDT/LDT
level (changes task CPL)
Task Switch                            CALL, JMP                        Task State Segment       GDT
                                       CALL, JMP                        Task Gate                GDT/LDT
                                       IRET**, Interrupt Instruction,   Task Gate                IDT
                                       Exception, External Interrupt

*NT (Nested Task bit of flag register) = 0
**NT (Nested Task bit of flag register) = 1

NOTE: BIT_MAP_OFFSET must be ≤ DFFFH
Type = 9: Available 80376 TSS.
Type = B: Busy 80376 TSS.

Figure 3.7. 80376 TSS And TSS Registers

TASK SWITCHING

A very important attribute of any multi-tasking operating system is its ability to rapidly switch between tasks or processes. The 80376 directly supports this operation by providing a task switch instruction in hardware. The 80376 task switch operation saves the entire state of the machine (all of the registers, address space, and a link to the previous task), loads a new execution state, performs protection checks, and commences execution in the new task. Like transfer of control by gates, the task switch operation is invoked by executing an inter-segment JMP or CALL instruction which refers to a Task State Segment (TSS), or a task gate descriptor in the GDT or LDT. An INT n instruction, exception, trap or external interrupt may also invoke the task switch operation if there is a task gate descriptor in the associated IDT descriptor slot.
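Whether a JMP, CALL or interrupt turns into a task switch depends on the type of the system descriptor it references. The sketch below is a simplified, hypothetical model of that decision using the type codes from the system descriptor figure (5 = task gate, 9 = available TSS, B = busy TSS); the exception class merely stands in for exception 13, and the return through IRET with NT = 1 is deliberately left out.

```python
TASK_GATE = 0x5
AVAILABLE_TSS = 0x9
BUSY_TSS = 0xB


class GeneralProtectionFault(Exception):
    """Stands in for exception 13 in this model."""


def causes_task_switch(descriptor_type: int) -> bool:
    """Decide whether a JMP, CALL or interrupt through this descriptor switches tasks.

    A reference to a busy TSS is rejected, since use of a selector that
    references a busy task state segment causes an exception 13.
    """
    if descriptor_type == BUSY_TSS:
        raise GeneralProtectionFault("selector references a busy TSS")
    return descriptor_type in (TASK_GATE, AVAILABLE_TSS)
```

A call gate (type C), by contrast, transfers control within the current task, so the function returns False for it.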
For simple applications, the TSS and task switching may not be used. The TSS will not be used, and no task switch will occur, if no task gates are present in the GDT, LDT or IDT.

The TSS descriptor points to a segment (see Figure 3.7) containing the entire 80376 execution state. A task gate descriptor contains a TSS selector. The limit of an 80376 TSS must be greater than 64H, and can be as large as 16 Mbytes. In the additional TSS space, the operating system is free to store additional information such as the reason the task is inactive, the time the task has spent running, and open files belonging to the task. For maximum performance, the TSS should start on an even address.

Each task must have a TSS associated with it. The current TSS is identified by a special register in the 80376 called the Task State Segment Register (TR). This register contains a selector referring to the task state segment descriptor that defines the current TSS. A hidden base and limit register associated with the TSS descriptor is loaded whenever TR is loaded with a new selector. Returning from a task is accomplished by the IRET instruction. When IRET is executed, control is returned to the task which was interrupted. The currently executing task's state is saved in the TSS and the old task state is restored from its TSS.

Several bits in the flag register and CR0 register give information about the state of a task which is useful to the operating system. The Nested Task bit, NT, controls the function of the IRET instruction. If NT = 0, the IRET instruction performs the regular return. If NT = 1, IRET performs a task switch operation back to the previous task. The NT bit is set or reset in the following fashion:

When a CALL or INT instruction initiates a task switch, the new TSS will be marked busy and the back link field of the new TSS set to the old TSS selector. The NT bit of the new task is set by CALL or INT initiated task switches.
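The nesting rules above can be pictured with a small Python model. This is only a sketch under stated assumptions: the `Task` class, its field names and the two functions are hypothetical, and the model covers just the busy marking, back link and NT handling described in this section (type values 9 and B are those given with Figure 3.7). It does not save or restore any register state.

```python
from dataclasses import dataclass

TSS_AVAILABLE = 0x9  # descriptor type of an available 80376 TSS
TSS_BUSY = 0xB       # descriptor type of a busy 80376 TSS


@dataclass
class Task:
    """Hypothetical model of a task: TSS selector, descriptor type, TSS fields."""
    selector: int
    desc_type: int = TSS_AVAILABLE
    back_link: int = 0
    nt: bool = False


def call_task_switch(old, new):
    """Model a CALL- or INT-initiated task switch from `old` to `new`."""
    if new.desc_type == TSS_BUSY:
        # Referencing a busy TSS is reported as exception 13.
        raise RuntimeError("exception 13: target TSS is busy")
    new.desc_type = TSS_BUSY        # new TSS is marked busy
    new.back_link = old.selector    # back link points at the old task
    new.nt = True                   # NT is set in the new task
    return new


def iret(current, tasks):
    """Return the task resumed by IRET, or None for a regular return (NT = 0)."""
    if not current.nt:
        return None
    return tasks[current.back_link]
```

With a task table keyed by selector, calling `iret` on a task entered through `call_task_switch` yields the task named by its back link, while a task with NT = 0 simply performs the regular return.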
An interrupt that does not cause a task switch will clear NT. (The NT bit will be restored after execution of the interrupt handler.) NT may also be set or cleared by POPF or IRET instructions.

The 80376 task state segment is marked busy by changing the descriptor type field from TYPE 9 to TYPE 0BH. Use of a selector that references a busy task state segment causes an exception 13.

The coprocessor’s state is not automatically saved when a task switch occurs. The Task Switched Bit, TS, in the CR0 register helps deal with the coprocessor’s state in a multi-tasking environment. Whenever the 80376 switches tasks, it sets the TS bit. The 80376 detects the first use of a processor extension instruction after a task switch and causes the processor extension not available exception 7. The exception handler for exception 7 may then decide whether to save the state of the coprocessor.

The T bit in the 80376 TSS indicates that the processor should generate a debug exception when switching to a task. If T = 1, then upon entry to a new task a debug exception 1 will be generated.

I/O Ports Accessible: 2-9, 12, 13, 15, 20-24, 27, 33, 34, 40, 41, 48, 50, 52, 53, 58-60, 62, 63, 96-127

Figure 3.8. Sample I/O Permission Bit Map

PROTECTION AND I/O PERMISSION BIT MAP

The I/O instructions that directly refer to addresses in the processor’s I/O space are IN, INS, OUT and OUTS. The 80376 has the ability to selectively trap references to specific I/O addresses. The structure that enables selective trapping is the I/O Permission Bit Map in the TSS segment (see Figures 3.7 and 3.8). The I/O permission map is a bit vector. The size of the map and its location in the TSS segment are variable. The processor locates the I/O permission map by means of the I/O map base field in the fixed portion of the TSS. The I/O map base field is 16 bits wide and contains the offset of the beginning of the I/O permission map.
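As a rough illustration of how the I/O permission map is consulted, the following Python sketch models the check made on an I/O instruction: CPL is first compared with IOPL, and only if that test fails is one bit tested per port byte address spanned by the operation. The function names, the representation of the TSS as a bytes object, and the map base offset of 68H are assumptions made for this example and are not part of the 80376 definition. Bit offset 0 is taken to be the least significant bit of each map byte.

```python
def build_io_map(accessible_ports, num_ports=128):
    """Build a map for ports 0..num_ports-1 followed by the all-1's byte.

    A 0 bit means the port is accessible; a 1 bit makes the access trap.
    """
    bits = bytearray([0xFF] * (num_ports // 8))
    for port in accessible_ports:
        bits[port // 8] &= ~(1 << (port % 8)) & 0xFF
    return bytes(bits) + b"\xff"


def io_access_allowed(cpl, iopl, tss, io_map_base, port, width=1):
    """Return True if the I/O operation may proceed, False if it would
    signal a general protection exception.

    tss holds the whole TSS segment, so len(tss) - 1 plays the role of
    the TSS limit. width is the operand size in bytes (1, 2 or 4).
    """
    if cpl <= iopl:
        return True                  # privileged enough: map is not checked
    for addr in range(port, port + width):
        offset = io_map_base + addr // 8
        if offset >= len(tss):
            return False             # not spanned by the map: treated as a 1 bit
        if (tss[offset] >> (addr % 8)) & 1:
            return False             # bit set: the access traps
    return True


# Ports listed as accessible in Figure 3.8.
FIG_3_8_PORTS = (list(range(2, 10)) + [12, 13, 15] + list(range(20, 25)) +
                 [27, 33, 34, 40, 41, 48, 50, 52, 53] + list(range(58, 61)) +
                 [62, 63] + list(range(96, 128)))

IO_MAP_BASE = 0x68  # hypothetical offset chosen for this example
tss = bytes(IO_MAP_BASE) + build_io_map(FIG_3_8_PORTS)
```

In this model a word access to port 8 is allowed because ports 8 and 9 are both accessible, while a word access to port 9 is refused because it also spans port 10.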
If an I/O instruction (IN, INS, OUT or OUTS) is encountered, the processor first checks whether CPL ≤ IOPL. If this condition is true, the I/O operation may proceed. If not true, the processor checks the I/O permission map.

Each bit in the map corresponds to an I/O port byte address; for example, the bit for port 41 is found at I/O map base + 5 (5 × 8 = 40), bit offset 1. The processor tests all the bits that correspond to the I/O addresses spanned by an I/O operation; for example, a double word operation tests four bits corresponding to four adjacent byte addresses. If any tested bit is set, the processor signals a general protection exception. If all the tested bits are zero, the I/O operation may proceed.

It is not necessary for the I/O permission map to represent all the I/O addresses. I/O addresses not spanned by the map are treated as if they had one bits in the map. The I/O map base should be at least one byte less than the TSS limit and the last byte beyond the I/O mapping information must contain all 1’s.

Because the I/O permission map is in the TSS segment, different tasks can have different maps. Thus, the operating system can allocate ports to a task by changing the I/O permission map in the task’s TSS.

IMPORTANT IMPLEMENTATION NOTE: Beyond the last byte of I/O mapping information in the I/O permission bit map must be a byte containing all 1’s. The byte of all 1’s must be within the limit of the 80376’s TSS segment (see Figure 3.7).

4.0 FUNCTIONAL DATA

The Intel 80376 embedded processor features a straightforward functional interface to the external hardware. The 80376 has separate parallel buses for data and address. The data bus is 16 bits in width, and bidirectional. The address bus outputs 24-bit address values using 23 address lines and two byte-enable signals.

The 80376 has two selectable address bus cycles: pipelined and non-pipelined. The pipelining option allows as much time as possible for data access by starting the pending bus cycle before the present bus cycle is finished. A non-pipelined bus cycle gives the highest bus performance by executing every bus cycle in two processor clock cycles. For maximum design flexibility, the address pipelining option is selectable on a cycle-by-cycle basis.

Figure 4.1. Functional Signal Groups

The processor’s bus cycle is the basic mechanism for information transfer, either from system to processor, or from processor to system. 80376 bus cycles perform data transfer in a minimum of only two clock periods. On a 16-bit data bus, the maximum 80376 transfer bandwidth at 16 MHz is therefore 16 Mbytes/sec. However, any bus cycle will be extended for more than two clock periods if external hardware withholds acknowledgement of the cycle.

The 80376 can relinquish control of its local buses to allow mastership by other devices, such as direct memory access (DMA) channels. When the buses are relinquished, HLDA is the only output pin driven by the 80376, providing near-complete isolation of the processor from its system (all other output pins are in a float condition).

4.1 Signal Description Overview

The following is a brief description of the 80376 input and output signals arranged by functional groups.

The signal descriptions sometimes refer to A.C. timing parameters, such as "t25 Reset Setup Time" and "t26 Reset Hold Time." The values of these parameters can be found in Tables 6.4 and 6.5.

CLOCK (CLK2)

CLK2 provides the fundamental timing for the 80376. It is divided by two internally to generate the internal processor clock used for instruction execution. The internal clock is comprised of two phases, "phase one" and "phase two". Each CLK2 period is a phase of the internal clock. Figure 4.2 illustrates the relationship.

Figure 4.2. CLK2 Signal and Internal Processor Clock
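The timing relationships described so far (CLK2 divided by two, a minimum of two processor clock periods per bus cycle, two bytes per cycle on the 16-bit bus) can be checked with a few lines of arithmetic. The function below is an illustrative sketch only; its name and parameters are assumptions, and each wait state is modeled as one additional processor clock period.

```python
def bus_bandwidth_bytes_per_sec(clk2_hz, wait_states=0, bus_bytes=2):
    """Peak transfer rate for back-to-back bus cycles.

    clk2_hz     -- frequency applied to the CLK2 pin
    wait_states -- extra processor clock periods added to every bus cycle
    bus_bytes   -- bytes moved per bus cycle (2 for the 16-bit data bus)
    """
    processor_clock_hz = clk2_hz / 2      # CLK2 is divided by two internally
    clocks_per_cycle = 2 + wait_states    # minimum bus cycle is two clocks
    return bus_bytes * processor_clock_hz / clocks_per_cycle


# A 16 MHz processor clock corresponds to a 32 MHz CLK2 input.
peak = bus_bandwidth_bytes_per_sec(32_000_000)
one_wait = bus_bandwidth_bytes_per_sec(32_000_000, wait_states=1)
```

With no wait states this reproduces the 16 Mbytes/sec figure quoted for 16 MHz; a single wait state on every cycle reduces the peak to two thirds of that value.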
If desired, the phase of the internal processor clock can be synchronized to a known phase by ensuring the falling edge of the RESET signal meets the applicable setup and hold times t25 and t26.

DATA BUS (D15-D0)

These three-state bidirectional signals provide the general purpose data path between the 80376 and other devices. The data bus outputs are active HIGH and will float during bus hold acknowledge. Data bus reads require that read-data setup and hold times t21 and t22 be met relative to CLK2 for correct operation.

ADDRESS BUS (BHE, BLE, A23-A1)

These three-state outputs provide physical memory addresses or I/O port addresses. A23-A16 are LOW during I/O transfers except for I/O transfers automatically generated by coprocessor instructions. During coprocessor I/O transfers, A22-A16 are driven LOW, and A23 is driven HIGH so that this address line can be used by external logic to generate the coprocessor select signal. Thus, the I/O address driven by the 80376 for coprocessor commands is 8000F8H, and the I/O address driven by the 80376 for coprocessor data is 8000FCH or 8000FEH.

The address bus is capable of addressing 16 Mbytes of physical memory space (000000H through 0FFFFFFH), and 64 Kbytes of I/O address space (000000H through 00FFFFH) for programmed I/O. The address bus is active HIGH and will float during bus hold acknowledge.

The Byte Enable outputs BHE and BLE directly indicate which bytes of the 16-bit data bus are involved with the current transfer. BHE applies to D15-D8 and BLE applies to D7-D0. If both BHE and BLE are asserted, then 16 bits of data are being transferred. See Table 4.1 for a complete decoding of these signals. The byte enables are active LOW and will float during bus hold acknowledge.

Table 4.1. Byte Enable Definitions

BHE   BLE   Function
0     0     Word Transfer
0     1     Byte Transfer on Upper Byte of the Data Bus, D15-D8
1     0     Byte Transfer on Lower Byte of the Data Bus, D7-D0
1     1     Never Occurs

BUS CYCLE DEFINITION SIGNALS (W/R, D/C, M/IO, LOCK)

These three-state outputs define the type of bus cycle being performed: W/R distinguishes between write and read cycles, D/C distinguishes between data and control cycles, M/IO distinguishes between memory and I/O cycles, and LOCK distinguishes between locked and unlocked bus cycles. All of these signals are active LOW and will float during bus hold acknowledge.

The primary bus cycle definition signals are W/R, D/C and M/IO, since these are the signals driven valid as ADS (Address Status output) becomes active. The LOCK signal is driven valid at the same time the bus cycle begins, which, due to address pipelining, could be after ADS becomes active. Exact bus cycle definitions, as a function of W/R, D/C and M/IO, are given in Table 4.2.

LOCK indicates that other system bus masters are not to gain control of the system bus while it is active. LOCK is activated on the CLK2 edge that begins the first locked bus cycle (i.e., it is not active at the same time as the other bus cycle definition pins) and is deactivated when READY is returned at the end of the last bus cycle which is to be locked. The beginning of a bus cycle is determined when READY is returned in a previous bus cycle and another is pending (ADS is active), or by the clock in which ADS is driven active if the bus was idle. This means that it follows more closely with the write data rules when it is valid, but may cause the bus to be locked longer than desired. The LOCK signal may be explicitly activated by the LOCK prefix on certain instructions. LOCK is always asserted when executing the XCHG instruction, during descriptor updates, and during the interrupt acknowledge sequence.

Table 4.2. Bus Cycle Definition

M/IO  D/C  W/R  Bus Cycle Type                             Locked?
0     0    0    INTERRUPT ACKNOWLEDGE                      Yes
0     0    1    Does Not Occur                             -
0     1    0    I/O DATA READ                              No
0     1    1    I/O DATA WRITE                             No
1     0    0    MEMORY CODE READ                           No
1     0    1    HALT: Address = 2, BHE = 1, BLE = 0        No
                SHUTDOWN: Address = 0, BHE = 1, BLE = 0
1     1    0    MEMORY DATA READ                           Some Cycles
1     1    1    MEMORY DATA WRITE                          Some Cycles

BUS CONTROL SIGNALS (ADS, READY, NA)

The following signals allow the processor to indicate when a bus cycle has begun, and allow other system hardware to control address pipelining and bus cycle termination.

Address Status (ADS)

This three-state output indicates that a valid bus cycle definition and address (W/R, D/C, M/IO, BHE, BLE and A23-A1) are being driven at the 80376 pins. ADS is an active LOW output. Once ADS is driven active, valid address, byte enables, and definition signals will not change. In addition, ADS will remain active until its associated bus cycle begins (when READY is returned for the previous bus cycle when running pipelined bus cycles). ADS will float during bus hold acknowledge. See sections Non-Pipelined Bus Cycles and Pipelined Bus Cycles for additional information on how ADS is asserted for different bus states.

Transfer Acknowledge (READY)

This input indicates the current bus cycle is complete, and the active bytes indicated by BHE and BLE are accepted or provided. When READY is sampled active during a read cycle or interrupt acknowledge cycle, the 80376 latches the input data and terminates the cycle. When READY is sampled active during a write cycle, the processor terminates the bus cycle.

READY is ignored on the first bus state of all bus cycles, and sampled each bus state thereafter until asserted. READY must eventually be asserted to acknowledge every bus cycle, including Halt Indication and Shutdown Indication bus cycles. When being sampled, READY must always meet setup and hold times t19 and t20 for correct operation.

Next Address Request (NA)

This is used to request pipelining.
This input indicates the system is prepared to accept new values of BHE, BLE, A23-A1, W/R, D/C and M/IO from the 80376 even if the end of the current cycle is not being acknowledged on READY. If this input is active when sampled, the next bus cycle’s address and status signals are driven onto the bus, provided the next bus request is already pending internally. NA is ignored in clock cycles in which ADS or READY is activated. This signal is active LOW and must satisfy setup and hold times t15 and t16 for correct operation. See Pipelined Bus Cycles and Read and Write Cycles for additional information.

BUS ARBITRATION SIGNALS (HOLD, HLDA)

This section describes the mechanism by which the processor relinquishes control of its local buses when requested by another bus master device. See Entering and Exiting Hold Acknowledge for additional information.

Bus Hold Request (HOLD)

This input indicates some device other than the 80376 requires bus mastership. When control is granted, the 80376 floats A23-A1, BHE, BLE, D15-D0, LOCK, M/IO, D/C, W/R and ADS, and then activates HLDA, thus entering the bus hold acknowledge state. The local bus will remain granted to the requesting master until HOLD becomes inactive. When HOLD becomes inactive, the 80376 will deactivate HLDA and drive the local bus (at the same time), thus terminating the hold acknowledge condition.

HOLD must remain asserted as long as any other device is a local bus master. External pull-up resistors may be required when in the hold acknowledge state since none of the 80376 floated outputs have internal pull-up resistors. See Resistor Recommendations for additional information. HOLD is not recognized while RESET is active but is recognized during the time between the high-to-low transition of RESET and the first instruction fetch. If RESET is asserted while HOLD is asserted, RESET has priority and places the bus into an idle state, rather than the hold acknowledge (high-impedance) state.

HOLD is a level-sensitive, active HIGH, synchronous input. HOLD signals must always meet setup and hold times t23 and t24 for correct operation.

Bus Hold Acknowledge (HLDA)

When active (HIGH), this output indicates the 80376 has relinquished control of its local bus in response to an asserted HOLD signal, and is in the bus Hold Acknowledge state.

The Bus Hold Acknowledge state offers near-complete signal isolation. In the Hold Acknowledge state, HLDA is the only signal being driven by the 80376. The other output signals or bidirectional signals (D15-D0, BHE, BLE, A23-A1, W/R, D/C, M/IO, LOCK and ADS) are in a high-impedance state so the requesting bus master may control them. These pins remain OFF throughout the time that HLDA remains active (see Table 4.3). Pull-up resistors may be desired on several signals to avoid spurious activity when no bus master is driving them. See Resistor Recommendations for additional information.

When the HOLD signal is made inactive, the 80376 will deactivate HLDA and drive the bus. One rising edge on the NMI input is remembered for processing after the HOLD input is negated.

Table 4.3. Output Pin State during HOLD

Pin Value   Pin Names
1           HLDA
Float       LOCK, M/IO, D/C, W/R, ADS, A23-A1, BHE, BLE, D15-D0

Hold Latencies

The maximum possible HOLD latency depends on the software being executed. The actual HOLD latency at any time depends on the current bus activity, the state of the LOCK signal (internal to the CPU) activated by the LOCK prefix, and interrupts. The 80376 will not honor a HOLD request until the current bus operation is complete.

The 80376 breaks 32-bit data or I/O accesses into 2 internally locked 16-bit bus cycles; the LOCK signal is not asserted. The 80376 breaks unaligned 16-bit or 32-bit data or I/O accesses into 2 or 3 internally locked 16-bit bus cycles.
Again, the LOCK signal is not asserted, but a HOLD request will not be recognized until the end of the entire transfer.

Wait states affect HOLD latency. The 80376 will not honor a HOLD request until the end of the current bus operation, no matter how many wait states are required. Systems with DMA where data transfer is critical must ensure that READY returns sufficiently soon.

COPROCESSOR INTERFACE SIGNALS (PEREQ, BUSY, ERROR)

The following sections describe signals dedicated to the numeric coprocessor interface. In addition to the data bus, address bus, and bus cycle definition signals, these signals control communication between the 80376 and the 80387SX processor extension.

Coprocessor Request (PEREQ)

When asserted (HIGH), this input signal indicates a coprocessor request for a data operand to be transferred to/from memory by the 80376. In response, the 80376 transfers information between the coprocessor and memory. Because the 80376 has internally stored the coprocessor opcode being executed, it performs the requested data transfer with the correct direction and memory address.

PEREQ is a level-sensitive, active HIGH, asynchronous signal. Setup and hold times, t29 and t30, relative to the CLK2 signal must be met to guarantee recognition at a particular clock edge. This signal is provided with a weak internal pull-down resistor of around 20 kΩ to ground so that it will not float active when left unconnected.

Coprocessor Busy (BUSY)

When asserted (LOW), this input indicates the coprocessor is still executing an instruction, and is not yet able to accept another. When the 80376 encounters any coprocessor instruction which operates on the numerics stack (e.g. load, pop, or arithmetic operation), or the WAIT instruction, this input is first automatically sampled until it is seen to be inactive. This sampling of the BUSY input prevents overrunning the execution of a previous coprocessor instruction.

The F(N)INIT and F(N)CLEX coprocessor instructions are allowed to execute even if BUSY is active, since these instructions are used for coprocessor initialization and exception-clearing.

BUSY is an active LOW, level-sensitive asynchronous signal. Setup and hold times, t29 and t30, relative to the CLK2 signal must be met to guarantee recognition at a particular clock edge. This pin is provided with a weak internal pull-up resistor of around 20 kΩ to VCC so that it will not float active when left unconnected.

BUSY serves an additional function. If BUSY is sampled LOW at the falling edge of RESET, the 80376 processor performs an internal self-test (see Bus Activity During and Following Reset). If BUSY is sampled HIGH, no self-test is performed.

Coprocessor Error (ERROR)

When asserted (LOW), this input signal indicates that the previous coprocessor instruction generated a coprocessor error of a type not masked by the coprocessor’s control register. This input is automatically sampled by the 80376 when a coprocessor instruction is encountered, and if active, the 80376 generates exception 16 to access the error-handling software.

Several coprocessor instructions, generally those which clear the numeric error flags in the coprocessor or save coprocessor state, do execute without the 80376 generating exception 16 even if ERROR is active. These instructions are FNINIT, FNCLEX, FNSTSW, FNSTSW AX, FNSTCW, FNSTENV and FNSAVE.

ERROR is an active LOW, level-sensitive asynchronous signal. Setup and hold times, t29 and t30, relative to the CLK2 signal must be met to guarantee recognition at a particular clock edge. This pin is provided with a weak internal pull-up resistor of around 20 kΩ to VCC so that it will not float active when left unconnected.

INTERRUPT SIGNALS (INTR, NMI, RESET)

The following descriptions cover inputs that can interrupt or suspend execution of the processor’s current instruction stream.

Maskable Interrupt Request (INTR)

When asserted, this input indicates a request for interrupt service, which can be masked by the 80376 Flag Register IF bit. When the 80376 responds to the INTR input, it performs two interrupt acknowledge bus cycles and, at the end of the second, latches an 8-bit interrupt vector on D7-D0 to identify the source of the interrupt.

INTR is an active HIGH, level-sensitive asynchronous signal. Setup and hold times, t27 and t28, relative to the CLK2 signal must be met to guarantee recognition at a particular clock edge. To assure recognition of an INTR request, INTR should remain active until the first interrupt acknowledge bus cycle begins. INTR is sampled at the beginning of every instruction. In order to be recognized at a particular instruction boundary, INTR must be active at least eight CLK2 clock periods before the beginning of the execution of the instruction. If recognized, the 80376 will begin execution of the interrupt.

Non-Maskable Interrupt Request (NMI)

This input indicates a request for interrupt service which cannot be masked by software. The non-maskable interrupt request is always processed according to the pointer or gate in slot 2 of the interrupt table. Because of the fixed NMI slot assignment, no interrupt acknowledge cycles are performed when processing NMI.

NMI is an active HIGH, rising edge-sensitive asynchronous signal. Setup and hold times, t27 and t28, relative to the CLK2 signal must be met to guarantee recognition at a particular clock edge. To assure recognition of NMI, it must be inactive for at least eight CLK2 periods, and then be active for at least eight CLK2 periods before the beginning of the execution of an instruction.

Once NMI processing has begun, no additional NMIs are processed until after the next IRET instruction, which is typically the end of the NMI service routine. If NMI is re-asserted prior to that time, however, one rising edge on NMI will be remembered for processing after executing the next IRET instruction.

Interrupt Latency

The time that elapses before an interrupt request is serviced (interrupt latency) varies according to several factors. This delay must be taken into account by the interrupt source. Any of the following factors can affect interrupt latency:

1. If interrupts are masked, an INTR request will not be recognized until interrupts are reenabled.

2. If an NMI is currently being serviced, an incoming NMI request will not be recognized until the 80376 encounters the IRET instruction.

3. An interrupt request is recognized only on an instruction boundary of the 80376 Execution Unit except for the following cases:
   - Repeat string instructions can be interrupted after each iteration.
   - If the instruction loads the Stack Segment register, an interrupt is not processed until after the following instruction, which should be an ESP load. This allows the entire stack pointer to be loaded without interruption.
   - If an instruction sets the interrupt flag (enabling interrupts), an interrupt is not processed until after the next instruction.
   The longest latency occurs when the interrupt request arrives while the 80376 processor is executing a long instruction such as multiplication, division or a task-switch.

4. Saving the Flags register and CS:EIP registers.

5. If the interrupt service routine requires a task switch, time must be allowed for the task switch.

6. If the interrupt service routine saves registers that are not automatically saved by the 80376.

RESET

This input signal suspends any operation in progress and places the 80376 in a known reset state. The 80376 is reset by asserting RESET for 15 or more CLK2 periods (80 or more CLK2 periods before requesting self-test). When RESET is active, all other input pins except FLT are ignored, and all other bus pins are driven to an idle bus state as shown in Table 4.4.
If RESET and HOLD are both active at a point in time, RESET takes priority even if the 80376 was in a Hold Acknowledge state prior to RESET active.

RESET is an active HIGH, level-sensitive synchronous signal. Setup and hold times, t25 and t26, must be met in order to assure proper operation of the 80376.

Table 4.4. Pin State (Bus Idle) during RESET

Pin Name     Signal Level during RESET
ADS          1
D15-D0       Float
BHE, BLE     0
A23-A1       1
W/R          0
D/C          1
M/IO         0
LOCK         1
HLDA         0

4.2 Bus Transfer Mechanism

All data transfers occur as a result of one or more bus cycles. Logical data operands of byte and word lengths may be transferred without restrictions on physical address alignment. Any byte boundary may be used, although two physical bus cycles are performed as required for unaligned operand transfers.

The 80376 processor address signals are designed to simplify external system hardware. BHE and BLE provide linear selects for the two bytes of the 16-bit data bus.

Byte Enable outputs BHE and BLE are asserted when their associated data bus bytes are involved with the present bus cycle, as listed in Table 4.5.

Each bus cycle is composed of at least two bus states. Each bus state requires one processor clock period. Additional bus states added to a single bus cycle are called wait states. See Bus Functional Description for additional information.

4.3 Memory and I/O Spaces

Bus cycles may access physical memory space or I/O space. Peripheral devices in the system may either be memory-mapped, or I/O-mapped, or both. As shown in Figure 4.3, physical memory addresses range from 000000H to 0FFFFFFH (16 Mbytes) and I/O addresses from 000000H to 00FFFFH (64 Kbytes). Note the I/O addresses used by the automatic I/O cycles for coprocessor communication are 8000F8H to 8000FFH, beyond the address range of programmed I/O, to allow easy generation of a coprocessor chip select signal using the A23 and M/IO signals.
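The address map described in this section lends itself to a very small decoder. The Python sketch below shows one way external logic could classify a bus cycle from M/IO and the 24-bit address. The function name and the returned labels are assumptions for illustration; only the A23 and M/IO test for the coprocessor comes from the text.

```python
def classify_bus_address(m_io, address):
    """Classify a bus cycle target from M/IO and the 24-bit address.

    m_io    -- 1 for a memory cycle, 0 for an I/O cycle
    address -- physical address, 000000H through 0FFFFFFH
    """
    if not 0 <= address <= 0xFFFFFF:
        raise ValueError("address does not fit in 24 bits")
    if m_io:
        return "memory"              # whole 16-Mbyte range is memory space
    if address & 0x800000:
        return "coprocessor"         # A23 HIGH with M/IO LOW
    if address <= 0xFFFF:
        return "programmed I/O"      # A23-A16 LOW: 64-Kbyte I/O space
    # A23 LOW with any of A22-A16 HIGH is not described for I/O cycles.
    raise ValueError("I/O address outside the ranges described")
```

Note that the coprocessor test needs only A23 and M/IO, which is the point of placing the automatic coprocessor cycles at 8000F8H and above.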
OPERAND ALIGNMENT

With the flexibility of memory addressing on the 80376, it is possible to transfer a logical operand that spans more than one physical Dword or word of memory or I/O. Examples are 32-bit Dword or 16-bit word operands beginning at addresses not evenly divisible by 2.

Operand alignment and size dictate when multiple bus cycles are required. Table 4.6 describes the transfer cycles generated for all combinations of logical operand lengths and alignment.

Table 4.5. Byte Enables and Associated Data and Operand Bytes

Byte Enable   Associated Data Bus Signals
BHE           D15-D8 (Byte 1: Most Significant)
BLE           D7-D0 (Byte 0: Least Significant)

Table 4.6. Transfer Bus Cycles for Bytes, Words and Dwords

Byte-Length of     Physical Byte Address in     Transfer Cycles
Logical Operand    Memory (Low-Order Bits)
1                  xx                           b
2                  00                           w
2                  01                           lb, hb
2                  10                           w
2                  11                           hb, lb
4                  00                           lw, hw
4                  01                           hb, lb, mw
4                  10                           hw, lw
4                  11                           mw, hb, lb

Key: b = byte transfer, w = word transfer, l = low-order portion, m = mid-order portion, h = high-order portion, x = don’t care

Figure 4.3. Physical Memory and I/O Spaces
NOTE: Since A23 is HIGH during automatic communication with coprocessor, A23 HIGH and M/IO LOW can be used to easily generate a coprocessor select signal.

4.4 Bus Functional Description

The 80376 has separate, parallel buses for data and address. The data bus is 16 bits in width, and bidirectional. The address bus provides a 24-bit value using 23 signals for the 23 upper-order address bits and 2 Byte Enable signals to directly indicate the active bytes. These buses are interpreted and controlled by several definition signals.

The definition of each bus cycle is given by three signals: M/IO, W/R and D/C. At the same time, a valid address is present on the byte enable signals, BHE and BLE, and the other address signals A23-A1. A status signal, ADS, indicates when the 80376 issues a new bus cycle definition and address.
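To make the operand alignment rules of Tables 4.5 and 4.6 concrete, the following Python sketch splits a logical operand into the 16-bit bus cycles needed to transfer it and gives the byte enables for each. It is an illustration under stated assumptions: the function name is hypothetical, and the cycles are listed in ascending address order, which is not necessarily the order in which the 80376 issues them (Table 4.6 gives that order).

```python
def bus_cycles(address, length):
    """Split an operand into 16-bit bus cycles.

    Returns a list of (word_address, bhe, ble) tuples in ascending
    address order. bhe and ble are active LOW: 0 means asserted.
    """
    if length not in (1, 2, 4):
        raise ValueError("operand length must be 1, 2 or 4 bytes")
    end = address + length
    cycles = []
    word = address & ~1                        # word containing the first byte
    while word < end:
        low_used = address <= word < end       # even byte, carried on D7-D0
        high_used = address <= word + 1 < end  # odd byte, carried on D15-D8
        cycles.append((word, 0 if high_used else 1, 0 if low_used else 1))
        word += 2
    return cycles
```

For example, a Dword at an address whose low-order bits are 01 needs three cycles (a byte, a word, and a byte), which matches the three transfers listed for that case in Table 4.6.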
Collectively, the address bus, data bus and all associated control signals are referred to simply as "the bus".

When active, the bus performs one of the bus cycles below:

1. Read from memory space
2. Locked read from memory space
3. Write to memory space
4. Locked write to memory space
5. Read from I/O space (or coprocessor)
6. Write to I/O space (or coprocessor)
7. Interrupt acknowledge (always locked)
8. Indicate halt, or indicate shutdown

Table 4.2 shows the encoding of the bus cycle definition signals for each bus cycle. See Bus Cycle Definition Signals for additional information.

When the 80376 bus is not performing one of the activities listed above, it is either Idle or in the Hold Acknowledge state, which may be detected by external circuitry. The idle state can be identified by the 80376 giving no further assertions on its address strobe output (ADS) since the beginning of its most recent bus cycle, and the most recent bus cycle having been terminated. The hold acknowledge state is identified by the 80376 asserting its hold acknowledge (HLDA) output.

The shortest time unit of bus activity is a bus state. A bus state is one processor clock period (two CLK2 periods) in duration. A complete data transfer occurs during a bus cycle, composed of two or more bus states.

Figure 4.4. Fastest Read Cycles with Non-Pipelined Timing

The fastest 80376 bus cycle requires only two bus states. For example, three consecutive bus read cycles, each consisting of two bus states, are shown by Figure 4.4. The bus states in each cycle are named T1 and T2. Any memory or I/O address may be accessed by such a two-state bus cycle, if the external hardware is fast enough.

Every bus cycle continues until it is acknowledged by the external system hardware, using the 80376 READY input. Acknowledging the bus cycle at the end of the first T2 results in the shortest bus cycle, requiring only T1 and T2.
If READY is not immediately asserted, however, T2 states are repeated indefinitely until the READY input is sampled active.

The pipelining option provides a choice of bus cycle timings. Pipelined or non-pipelined cycles are selectable on a cycle-by-cycle basis with the Next Address (NA) input.

When pipelining is selected, the address (BHE, BLE and A23-A1) and definition (W/R, D/C, M/IO and LOCK) of the next cycle are available before the end of the current cycle. To signal their availability, the 80376 address status output (ADS) is asserted. Figure 4.5 illustrates the fastest read cycles with pipelined timing.

Note from Figure 4.5 that the fastest bus cycles using pipelining require only two bus states, named T1P and T2P. Therefore pipelined cycles allow the same data bandwidth as non-pipelined cycles, but address-to-data access time is increased by one T-state time compared to that of a non-pipelined cycle.

Figure 4.5. Fastest Read Cycles with Pipelined Timing

READ AND WRITE CYCLES

Data transfers occur as a result of bus cycles, classified as read or write cycles. During read cycles, data is transferred from an external device to the processor. During write cycles, data is transferred from the processor to an external device.

Two choices of bus cycle timing are dynamically selectable: non-pipelined or pipelined. After an idle bus state, the processor always uses non-pipelined timing. However, the NA (Next Address) input may be asserted to select pipelined timing for the next bus cycle. When pipelining is selected and the 80376 has a bus request pending internally, the address and definition of the next cycle is made available even before the current bus cycle is acknowledged by READY.

Terminating a read or write cycle, like any bus cycle, requires acknowledging the cycle by asserting the READY input.
Until acknowledged, the processor inserts wait states into the bus cycle, to allow adjustment for the speed of any external device. External hardware, which has decoded the address and bus cycle type, asserts the READY input at the appropriate time.

At the end of the second bus state within the bus cycle, READY is sampled. At that time, if external hardware acknowledges the bus cycle by asserting READY, the bus cycle terminates as shown in Figure 4.6. If READY is negated as in Figure 4.7, the 80376 executes another bus state (a wait state) and READY is sampled again at the end of that state. This continues indefinitely until the cycle is acknowledged by READY asserted.

When the current cycle is acknowledged, the 80376 terminates it. When a read cycle is acknowledged, the 80376 latches the information present at its data pins. When a write cycle is acknowledged, the write data of the 80376 remains valid throughout phase one of the next bus state, to provide write data hold time.

[Drawing 240182-21]
Idle states are shown here for diagram variety only. Write cycles are not always followed by an idle state. An active bus cycle can immediately follow the write cycle.
Figure 4.6. Various Non-Pipelined Bus Cycles (Zero Wait States)

Non-Pipelined Bus Cycles

Any bus cycle may be performed with non-pipelined timing. For example, Figure 4.6 shows a mixture of non-pipelined read and write cycles. Figure 4.6 shows that the fastest possible non-pipelined cycles have two bus states per bus cycle. The states are named T1 and T2. In phase one of T1, the address signals and bus cycle definition signals are driven valid and, to signal their availability, address strobe (ADS) is simultaneously asserted.

During read or write cycles, the data bus behaves as follows. If the cycle is a read, the 80376 floats its data signals to allow driving by the external device being addressed.
The 80376 requires that all data bus pins be at a valid logic state (HIGH or LOW) at the end of each read cycle, when READY is asserted. The system MUST be designed to meet this requirement. If the cycle is a write, data signals are driven by the 80376 beginning in phase two of T1 until phase one of the bus state following cycle acknowledgement.

[Drawing 240182-22]
Idle states are shown here for diagram variety only. Write cycles are not always followed by an idle state. An active bus cycle can immediately follow the write cycle.
Figure 4.7. Various Non-Pipelined Bus Cycles (Various Number of Wait States)

Figure 4.7 illustrates non-pipelined bus cycles with one wait state added to Cycles 2 and 3. READY is sampled inactive at the end of the first T2 in Cycles 2 and 3. Therefore Cycles 2 and 3 have T2 repeated again. At the end of the second T2, READY is sampled active.

When address pipelining is not used, the address and bus cycle definition remain valid during all wait states. When wait states are added and it is desirable to maintain non-pipelined timing, it is necessary to negate NA during each T2 state except the last one, as shown in Figure 4.7, Cycles 2 and 3. If NA is sampled active during a T2 other than the last one, the next state would be T2I or T2P instead of another T2.

When address pipelining is not used, the bus states and transitions are completely illustrated by Figure 4.8. The bus transitions between four possible states, T1, T2, Ti, and Th. Bus cycles consist of T1 and T2, with T2 being repeated for wait states. Otherwise the bus may be idle, Ti, or in the hold acknowledge state Th.

[Drawing 240182-23]
Bus States:
T1 - first clock of a non-pipelined bus cycle (80376 drives new address and asserts ADS).
T2 - subsequent clocks of a bus cycle when NA has not been sampled asserted in the current bus cycle.
Ti - idle state.
Th - hold acknowledge state (80376 asserts HLDA).
The fastest bus cycle consists of two states: T1 and T2.
Four basic bus states describe bus operation when not using pipelined address.
Figure 4.8. 80376 Bus States (Not Using Pipelined Address)

Bus cycles always begin with T1. T1 always leads to T2. If a bus cycle is not acknowledged during T2 and NA is inactive, T2 is repeated. When a cycle is acknowledged during T2, the following state will be T1 of the next bus cycle if a bus request is pending internally, or Ti if there is no bus request pending, or Th if the HOLD input is being asserted.

Use of pipelining allows the 80376 to enter three additional bus states not shown in Figure 4.8. Figure 4.12 is the complete bus state diagram, including pipelined cycles.

Pipelined Bus Cycles

Pipelining is the option of requesting the address and the bus cycle definition of the next internally pending bus cycle before the current bus cycle is acknowledged with READY asserted. ADS is asserted by the 80376 when the next address is issued. The pipelining option is controlled on a cycle-by-cycle basis with the NA input signal.

Once a bus cycle is in progress and the current address has been valid for at least one entire bus state, the NA input is sampled at the end of every phase one until the bus cycle is acknowledged. During non-pipelined bus cycles NA is sampled at the end of phase one in every T2. An example is Cycle 2 in Figure 4.9, during which NA is sampled at the end of phase one of every T2 (it was asserted once during the first T2 and has no further effect during that bus cycle).

[Drawing 240182-24]
Following any idle bus state (Ti), bus cycles are non-pipelined. Within non-pipelined bus cycles, NA is only sampled during wait states. Therefore, to begin pipelining during a group of non-pipelined bus cycles requires a non-pipelined cycle with at least one wait state (Cycle 2 above).
Figure 4.9.
Transitioning to Pipelining during Burst of Bus Cycles

If NA is sampled active, the 80376 is free to drive the address and bus cycle definition of the next bus cycle, and assert ADS, as soon as it has a bus request internally pending. It may drive the next address as early as the next bus state, whether the current bus cycle is acknowledged at that time or not.

Regarding the details of pipelining, the 80376 has the following characteristics:

1. The next address and status may appear as early as the bus state after NA was sampled active (see Figures 4.9 or 4.10). In that case, state T2P is entered immediately. However, when there is not an internal bus request already pending, the next address and status will not be available immediately after NA is asserted and T2I is entered instead of T2P (see Figure 4.11, Cycle 3). Provided the current bus cycle isn’t yet acknowledged by READY asserted, T2P will be entered as soon as the 80376 does drive the next address and status. External hardware should therefore observe the ADS output as confirmation the next address and status are actually being driven on the bus.
2. Any address and status which are validated by a pulse on the 80376 ADS output will remain stable on the address pins for at least two processor clock periods. The 80376 cannot produce a new address and status more frequently than every two processor clock periods (see Figures 4.9, 4.10 and 4.11).
3. Only the address and bus cycle definition of the very next bus cycle is available. The pipelining capability cannot look further than one bus cycle ahead (see Figure 4.11, Cycle 1).

[Drawing 240182-25]
Following any idle bus state (Ti) the bus cycle is always non-pipelined and NA is only sampled during wait states. To start address pipelining after an idle state requires a non-pipelined cycle with at least one wait state (Cycle 1 above). The pipelined cycles (2, 3, 4 above) are shown with various numbers of wait states.
Figure 4.10.
Fastest Transition to Pipelined Bus Cycle Following Idle Bus State

The complete bus state transition diagram, including pipelining, is given by Figure 4.12. Note that it is a superset of the diagram for non-pipelined only, and the three additional bus states for pipelining are drawn in bold.

The fastest bus cycle with pipelining consists of just two bus states, T1P and T2P (recall for non-pipelined it is T1 and T2). T1P is the first bus state of a pipelined cycle.

Initiating and Maintaining Pipelined Bus Cycles

Using the state diagram Figure 4.12, observe the transitions from an idle state, Ti, to the beginning of a pipelined bus cycle T1P. From an idle state, Ti, the first bus cycle must begin with T1, and is therefore a non-pipelined bus cycle. The next bus cycle will be pipelined, however, provided NA is asserted and the first bus cycle ends in a T2P state (the address and status for the next bus cycle is driven during T2P). The fastest path from an idle state to a pipelined bus cycle is shown below:

    Ti, Ti,        T1-T2-T2P,             T1P-T2P,
    idle states    non-pipelined cycle    pipelined cycle

T1-T2-T2P are the states of the bus cycle that establishes address pipelining for the next bus cycle, which begins with T1P. The same is true after a bus hold state, shown below:

    Th, Th, Th,                T1-T2-T2P,             T1P-T2P,
    hold acknowledge states    non-pipelined cycle    pipelined cycle

[Drawing 240182-26]
Figure 4.11. Details of Address Pipelining during Cycles with Wait States

The transition to pipelined address is shown functionally by Figure 4.10, Cycle 1. Note that Cycle 1 is used to transition into pipelined address timing for the subsequent Cycles 2, 3 and 4, which are pipelined. The NA input is asserted at the appropriate time to select address pipelining for Cycles 2, 3 and 4.

Once a bus cycle is in progress and the current address and status has been valid for one entire bus state, the NA input is sampled at the end of every phase one until the bus cycle is acknowledged.
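The state sequences above can be reproduced with a small software model of the bus states. This is a simplified sketch only: the transition rules are inferred from the bus state definitions given in this section rather than copied from Figure 4.12, LOCK is ignored, and HOLD is only honored at cycle boundaries and in the Ti and Th states. The names `next_state`, `run` and `FASTEST_PATH_INPUTS` are hypothetical.

```python
def next_state(state, ready=False, na=False, pending=False, hold=False):
    """Return the bus state that follows `state` (simplified model).

    ready   - READY sampled active at the end of this state
    na      - NA sampled active during this state
    pending - an internal bus request is pending
    hold    - the HOLD input is asserted
    """
    def idle_or_new_cycle():
        # No pipelined address is outstanding: go to Th, T1 or Ti.
        if hold:
            return "Th"
        return "T1" if pending else "Ti"

    def after_na_sample():
        # Cycle not acknowledged yet: NA decides whether pipelining starts.
        if not na:
            return "T2"
        return "T2P" if pending else "T2I"

    if state in ("Ti", "Th"):
        return idle_or_new_cycle()
    if state == "T1":
        return "T2"  # T1 always leads to T2
    if state == "T1P":
        return after_na_sample()
    if state == "T2":
        return idle_or_new_cycle() if ready else after_na_sample()
    if state == "T2I":
        if ready:
            return idle_or_new_cycle()  # next cycle is not pipelined
        return "T2P" if pending else "T2I"
    if state == "T2P":
        # Next address is already driven; HOLD is ignored here in this sketch.
        return "T1P" if ready else "T2P"
    raise ValueError(f"unknown bus state: {state!r}")


def run(start, steps):
    """Apply a list of per-state input dictionaries and return the state trace."""
    trace = [start]
    for inputs in steps:
        trace.append(next_state(trace[-1], **inputs))
    return trace


# Inputs for the fastest path into pipelining: a request is pending, NA is
# asserted in the first T2, and READY acknowledges the cycle in T2P.
FASTEST_PATH_INPUTS = [
    {"pending": True},               # Ti or Th -> T1
    {},                              # T1 -> T2
    {"na": True, "pending": True},   # T2 -> T2P
    {"ready": True},                 # T2P -> T1P
    {"na": True, "pending": True},   # T1P -> T2P
]
```

Calling `run("Ti", FASTEST_PATH_INPUTS)` yields Ti, T1, T2, T2P, T1P, T2P, and starting from "Th" gives the same cycle states, matching the two sequences listed above.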
[Drawing 240182-27]
Bus States:
T1 - first clock of a non-pipelined bus cycle (80376 drives new address, status and asserts ADS).
T2 - subsequent clocks of a bus cycle when NA has not been sampled asserted in the current bus cycle.
T2I - subsequent clocks of a bus cycle when NA has been sampled asserted in the current bus cycle but there is not yet an internal bus request pending (80376 will not drive new address, status or assert ADS).
T2P - subsequent clocks of a bus cycle when NA has been sampled asserted in the current bus cycle and there is an internal bus request pending (80376 drives new address, status and asserts ADS).
T1P - first clock of a pipelined bus cycle.
Ti - idle state.
Th - hold acknowledge state (80376 asserts HLDA).
Asserting NA for pipelined bus cycles gives access to three more bus states: T2I, T2P and T1P. Using pipelining the fastest bus cycle consists of T1P and T2P.
Figure 4.12. 80376 Processor Complete Bus States (Including Pipelining)

Sampling of NA begins in T2 during Cycle 1 in Figure 4.10. Once NA is sampled active during the current cycle, the 80376 is free to drive a new address and bus cycle definition on the bus as early as the next bus state. In Figure 4.10, Cycle 1 for example, the next address and status is driven during state T2P. Thus Cycle 1 makes the transition to pipelined timing, since it begins with T1 but ends with T2P. Because the address for Cycle 2 is available before Cycle 2 begins, Cycle 2 is called a pipelined bus cycle, and it begins with T1P. Cycle 2 begins as soon as READY asserted terminates Cycle 1.

Examples of transition bus cycles are Figure 4.10, Cycle 1 and Figure 4.9, Cycle 2. Figure 4.10 shows transition during the very first cycle after an idle bus state, which is the fastest possible transition into address pipelining. Figure 4.9, Cycle 2 shows a transition cycle occurring during a burst of bus cycles.
A transition cycle is the same whenever it occurs: it consists at least of T1, T2 (NA is asserted at that time), and T2P (provided the 80376 has an internal bus request already pending, which it almost always has). T2P states are repeated if wait states are added to the cycle.

Note that only three states (T1, T2 and T2P) are required in a bus cycle performing a transition from non-pipelined into pipelined timing, for example Figure 4.10, Cycle 1. Figure 4.10, Cycles 2, 3 and 4 show that pipelining can be maintained with two-state bus cycles consisting only of T1P and T2P.

Once a pipelined bus cycle is in progress, pipelined timing is maintained for the next cycle by asserting NA and detecting that the 80376 enters T2P during the current bus cycle. The current bus cycle must end in state T2P for pipelining to be maintained in the next cycle. T2P is identified by the assertion of ADS. Figures 4.9 and 4.10, however, each show pipelining ending after Cycle 4 because Cycle 4 ends in T2I. This indicates the 80376 didn’t have an internal bus request prior to the acknowledgement of Cycle 4. If a cycle ends with a T2 or T2I, the next cycle will not be pipelined.

Realistically, pipelining is almost always maintained as long as NA is sampled asserted. This is so because in the absence of any other request, a code prefetch request is always internally pending until the instruction decoder and code prefetch queue are completely full. Therefore pipelining is maintained for long bursts of bus cycles, if the bus is available (i.e., HOLD inactive) and NA is sampled active in each of the bus cycles.

INTERRUPT ACKNOWLEDGE (INTA) CYCLES

In response to an interrupt request on the INTR input when interrupts are enabled, the 80376 performs two interrupt acknowledge cycles. These bus cycles are similar to read cycles in that bus definition signals define the type of bus activity taking place, and each cycle continues until acknowledged by READY sampled active.
The state of A2 distinguishes the first and second interrupt acknowledge cycles. The byte address driven during the first interrupt acknowledge cycle is 4 (A23-A3, A1, BLE LOW, A2 and BHE HIGH). The byte address driven during the second interrupt acknowledge cycle is 0 (A23-A1, BLE LOW and BHE HIGH).

The LOCK output is asserted from the beginning of the first interrupt acknowledge cycle until the end of the second interrupt acknowledge cycle. Four idle bus states, Ti, are inserted by the 80376 between the two interrupt acknowledge cycles for compatibility with the interrupt specification TRHRL of the 8259A Interrupt Controller and the 82370 Integrated Peripheral.

[Drawing 240182-28]
Interrupt Vector (0-255) is read on D0-D7 at end of second Interrupt Acknowledge bus cycle.
Because each Interrupt Acknowledge bus cycle is followed by idle bus states, asserting NA has no practical effect. Choose the approach which is simplest for your system hardware design.
Figure 4.13. Interrupt Acknowledge Cycles

During both interrupt acknowledge cycles, D15-D0 float. No data is read at the end of the first interrupt acknowledge cycle. At the end of the second interrupt acknowledge cycle, the 80376 will read an external interrupt vector from D7-D0 of the data bus. The vector indicates the specific interrupt number (from 0-255) requiring service.

HALT INDICATION CYCLE

The 80376 execution unit halts as a result of executing a HLT instruction. Signaling its entrance into the halt state, a halt indication cycle is performed. The halt indication cycle is identified by the state of the bus definition signals and a byte address of 2. See the Bus Cycle Definition Signals section. The halt indication cycle must be acknowledged by READY asserted. A halted 80376 resumes execution when INTR (if interrupts are enabled), NMI or RESET is asserted.

[Drawing 240182-29]
Figure 4.14.
Example Halt Indication Cycle from Non-Pipelined Cycle

SHUTDOWN INDICATION CYCLE

The 80376 shuts down as a result of a protection fault while attempting to process a double fault. Signaling its entrance into the shutdown state, a shutdown indication cycle is performed. The shutdown indication cycle is identified by the state of the bus definition signals shown in Bus Cycle Definition Signals and a byte address of 0. The shutdown indication cycle must be acknowledged by READY asserted. A shutdown 80376 resumes execution when NMI or RESET is asserted.

ENTERING AND EXITING HOLD ACKNOWLEDGE

The bus hold acknowledge state, Th, is entered in response to the HOLD input being asserted. In the bus hold acknowledge state, the 80376 floats all outputs or bidirectional signals, except for HLDA. HLDA is asserted as long as the 80376 remains in the bus hold acknowledge state. In the bus hold acknowledge state, all inputs except HOLD and RESET are ignored.

[Drawing 240182-30]
Figure 4.15. Example Shutdown Indication Cycle from Non-Pipelined Cycle

Th may be entered from a bus idle state as in Figure 4.16 or after the acknowledgement of the current physical bus cycle if the LOCK signal is not asserted, as in Figures 4.17 and 4.18.

Th is exited in response to the HOLD input being negated. The following state will be Ti as in Figure 4.16 if no bus request is pending. The following bus state will be T1 if a bus request is internally pending, as in Figures 4.17 and 4.18. Th is also exited in response to RESET being asserted.

If a rising edge occurs on the edge-triggered NMI input while in Th, the event is remembered as a non-maskable interrupt 2 and is serviced when Th is exited, unless the 80376 is reset before Th is exited.

[Drawing 240182-31]
NOTE: For maximum design flexibility the 80376 has no internal pull-up resistors on its outputs. Your design may require an external pull-up on ADS and other 80376 outputs to keep them negated during float periods.
Figure 4.16. Requesting Hold from Idle Bus

RESET DURING HOLD ACKNOWLEDGE

RESET being asserted takes priority over HOLD being asserted. If RESET is asserted while HOLD remains asserted, the 80376 drives its pins to defined states during reset, as in Table 4.5, Pin State During Reset, and performs internal reset activity as usual.

If HOLD remains asserted when RESET is inactive, the 80376 enters the hold acknowledge state before performing its first bus cycle, provided HOLD is still asserted when the 80376 processor would otherwise perform its first bus cycle. If HOLD remains asserted when RESET is inactive, the BUSY input is still sampled as usual to determine whether a self test is being requested.

FLOAT

Activating the FLT input floats all 80376 bidirectional and output signals, including HLDA. Asserting FLT isolates the 80376 from the surrounding circuitry.

When an 80376 in a PQFP surface-mount package is used without a socket, it cannot be removed from the printed circuit board. The FLT input allows the 80376 to be electrically isolated to allow testing of external circuitry. This technique is known as ONCE for "ON-Circuit Emulation".

ENTERING AND EXITING FLOAT

FLT is an asynchronous, active-low input. It is recognized on the rising edge of CLK2. When recognized, it aborts the current bus cycle and floats the outputs of the 80376 (Figure 4.20). FLT must be held low for a minimum of 16 CLK2 cycles. Reset should be asserted and held asserted until after FLT is deasserted. This will ensure that the 80376 will exit float in a valid state.

Asserting the FLT input unconditionally aborts the current bus cycle and forces the 80376 into the FLOAT mode. Since activating FLT unconditionally forces the 80376 into FLOAT mode, the 80376 is not guaranteed to enter FLOAT in a valid state. After deactivating FLT, the 80376 is not guaranteed to exit FLOAT mode in a valid state. This is not a problem as the FLT pin is meant to be used only during ONCE. After exiting FLOAT, the 80376 must be reset to return it to a valid state. Reset should be asserted before FLT is deasserted. This will ensure that the 80376 will exit float in a valid state.

FLT has an internal pull-up resistor, and if it is not used it should be unconnected.

[Drawing 240182-32]
NOTE: HOLD is a synchronous input and can be asserted at any CLK2 edge, provided setup and hold (t23 and t24) requirements are met. This waveform is useful for determining Hold Acknowledge latency.
Figure 4.17. Requesting Hold from Active Bus (NA Inactive)

BUS ACTIVITY DURING AND FOLLOWING RESET

RESET is the highest priority input signal, capable of interrupting any processor activity when it is asserted. A bus cycle in progress can be aborted at any stage, or idle states or bus hold acknowledge states discontinued so that the reset state is established.

RESET should remain asserted for at least 15 CLK2 periods to ensure it is recognized throughout the 80376, and at least 80 CLK2 periods if an 80376 self-test is going to be requested at the falling edge. RESET asserted pulses less than 15 CLK2 periods may not be recognized. RESET pulses less than 80 CLK2 periods followed by a self-test may cause the self-test to report a failure when no true failure exists.

Provided the RESET falling edge meets setup and hold times t25 and t26, the internal processor clock phase is defined at that time as illustrated by Figure 4.19 and Figure 6.7.

[Drawing 240182-33]
NOTE: HOLD is a synchronous input and can be asserted at any CLK2 edge, provided setup and hold (t23 and t24) requirements are met. This waveform is useful for determining Hold Acknowledge latency.
Figure 4.18. Requesting Hold from Idle Bus (NA Active)

An 80376 self-test may be requested at the time RESET goes inactive by having the BUSY input at a LOW level as shown in Figure 4.19. The self-test requires (2^20 + approximately 60) CLK2 periods to complete. The self-test duration is not affected by the test results.
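The reset and self-test figures above are expressed in CLK2 periods, so converting them to time for a given CLK2 frequency is a one-line calculation. The sketch below is illustrative only: the constant and function names are hypothetical, and the "approximately 60" term of the self-test duration is treated as exactly 60.

```python
MIN_RESET_CLK2 = 15            # minimum RESET pulse, no self-test requested
MIN_RESET_SELFTEST_CLK2 = 80   # minimum RESET pulse when self-test follows
SELF_TEST_CLK2 = 2**20 + 60    # "approximately 60" taken as exactly 60 here


def clk2_periods_to_us(periods: float, clk2_hz: float) -> float:
    """Convert a number of CLK2 periods to microseconds."""
    return periods * 1e6 / clk2_hz


def min_reset_pulse_us(clk2_hz: float, self_test: bool = False) -> float:
    """Shortest RESET pulse, in microseconds, that meets the stated minimum."""
    periods = MIN_RESET_SELFTEST_CLK2 if self_test else MIN_RESET_CLK2
    return clk2_periods_to_us(periods, clk2_hz)


def self_test_ms(clk2_hz: float) -> float:
    """Approximate self-test duration in milliseconds."""
    return clk2_periods_to_us(SELF_TEST_CLK2, clk2_hz) / 1000.0
```

At CLK2 = 32 MHz this works out to a minimum RESET pulse of about 0.47 microseconds (2.5 microseconds if self-test is requested) and a self-test duration of roughly 32.8 ms.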
Even if the self-test indicates a problem, the 80376 attempts to proceed with the reset sequence afterwards. After the RESET falling edge (and after the self-test if it was requested) the 80376 performs an internal initialization sequence for approximately 350 to 450 CLK2 periods.

[Drawing 240182-34]
NOTES:
1. BUSY should be held stable for 8 CLK2 periods before and after the CLK2 period in which RESET falling edge occurs.
2. If self-test is requested, the 80376 outputs remain in their reset state as shown here.
Figure 4.19. Bus Activity from Reset until First Code Fetch

[Drawing 240182-53]
Figure 4.20. Entering and Exiting FLOAT

4.5 Self-Test Signature

Upon completion of self-test (if self-test was requested by driving BUSY LOW at the falling edge of RESET) the EAX register will contain a signature of 00000000H indicating the 80376 passed its self-test of microcode and major PLA contents with no problems detected. The passing signature in EAX, 00000000H, applies to all 80376 revision levels. Any non-zero signature indicates the 80376 unit is faulty.

4.6 Component and Revision Identifiers

To assist 80376 users, the 80376 after reset holds a component identifier and revision identifier in its DX register. The upper 8 bits of DX hold 33H as identification of the 80376 component. (The lower nibble, 03H, refers to the Intel386(TM) architecture. The upper nibble, 30H, refers to the third member of the Intel386 family). The lower 8 bits of DX hold an 8-bit unsigned binary number related to the component revision level. The revision identifier will, in general, chronologically track those component steppings which are intended to have certain improvements or distinction from previous steppings. The 80376 revision identifier will track that of the 80386 where possible.

The revision identifier is intended to assist 80376 users to a practical extent.
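The register contents described in Sections 4.5 and 4.6 can be checked in software once their values have been captured after reset. The following is a hedged sketch: it assumes some startup code (not shown, and outside the scope of this data sheet) has saved EAX and DX, and the names `decode_dx`, `self_test_passed` and `KNOWN_REVISIONS` are hypothetical. The two stepping entries are taken from Table 4.7.

```python
COMPONENT_ID_80376 = 0x33

# Revision identifiers listed in Table 4.7; other values may exist.
KNOWN_REVISIONS = {0x05: "A0", 0x08: "B"}


def decode_dx(dx: int):
    """Split the post-reset DX value into (component id, revision id)."""
    component = (dx >> 8) & 0xFF   # upper 8 bits: component identifier
    revision = dx & 0xFF           # lower 8 bits: revision identifier
    return component, revision


def self_test_passed(eax: int) -> bool:
    """A signature of 00000000H in EAX means the self-test passed."""
    return (eax & 0xFFFFFFFF) == 0


def describe(dx: int) -> str:
    """Human-readable summary of a saved DX value."""
    component, revision = decode_dx(dx)
    part = "80376" if component == COMPONENT_ID_80376 else "unknown component"
    stepping = KNOWN_REVISIONS.get(revision, "unlisted stepping")
    return f"{part}, revision {revision:02X}H ({stepping})"
```

For example, a saved DX value of 3305H decodes to component 33H and revision 05H, which Table 4.7 lists as the A0 stepping. Because the revision identifier is not guaranteed to change with every stepping, software should treat unlisted values as acceptable rather than as errors.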
The revision identifier value is not guaranteed, however, to change with every stepping revision, or to follow a completely uniform numerical sequence, depending on the type or intention of revision, or manufacturing materials required to be changed. Intel has sole discretion over these characteristics of the component.

Table 4.7. Component and Revision Identifier History

80376 Stepping Name    Revision Identifier
A0                     05H
B                      08H

4.7 Coprocessor Interfacing

The 80376 provides an automatic interface for the Intel 80387SX numeric floating-point coprocessor. The 80387SX coprocessor uses an I/O mapped interface driven automatically by the 80376 and assisted by three dedicated signals: BUSY, ERROR and PEREQ.

As the 80376 begins supporting a coprocessor instruction, it tests the BUSY and ERROR signals to determine if the coprocessor can accept its next instruction. Thus, the BUSY and ERROR inputs eliminate the need for any "preamble" bus cycles for communication between processor and coprocessor. The 80387SX can be given its command opcode immediately. The dedicated signals provide instruction synchronization, and eliminate the need of using the 80376 WAIT opcode (9BH) for 80387SX instruction synchronization (the WAIT opcode was required when the 8086 or 8088 was used with the 8087 coprocessor).

Custom coprocessors can be included in 80376 based systems by memory-mapped or I/O-mapped interfaces. Such coprocessor interfaces allow a completely custom protocol, and are not limited to a set of coprocessor protocol "primitives". Instead, memory-mapped or I/O-mapped interfaces may use all applicable 80376 instructions for high-speed coprocessor communication. The BUSY and ERROR inputs of the 80376 may also be used for the custom coprocessor interface, if such hardware assist is desired. These signals can be tested by the 80376 WAIT opcode (9BH). The WAIT instruction will wait until the BUSY input is inactive (interruptible by an NMI or enabled INTR input), but generates an exception 16 fault if the ERROR pin is active when BUSY goes (or is) inactive. If the custom coprocessor interface is memory-mapped, protection of the addresses used for the interface can be provided with the segmentation mechanism of the 80376. If the custom interface is I/O-mapped, protection of the interface can be provided with the 80376 IOPL (I/O Privilege Level) mechanism.

The 80387SX numeric coprocessor interface is I/O mapped as shown in Table 4.8. Note that the 80387SX coprocessor interface addresses are beyond the 0H-0FFFFH range for programmed I/O. When the 80376 supports the 80387SX coprocessor, the 80376 automatically generates bus cycles to the coprocessor interface addresses.

Table 4.8. Numeric Coprocessor Port Addresses

Address in 80376 I/O Space    80387SX Coprocessor Register
8000F8H                       Opcode Register
8000FCH                       Operand Register
8000FEH                       Operand Register

SOFTWARE TESTING FOR COPROCESSOR PRESENCE

When software is used to test coprocessor (80387SX) presence, it should use only the following coprocessor opcodes: FNINIT, FNSTCW and FNSTSW. To use other coprocessor opcodes when a coprocessor is known to be not present, first set EM = 1 in the 80376 CR0 register.

5.0 PACKAGE THERMAL SPECIFICATIONS

The Intel 80376 embedded processor is specified for operation when case temperature is within the range of 0°C-115°C for both the ceramic 88-pin PGA package and the plastic 100-pin PQFP package. The case temperature may be measured in any environment, to determine whether the 80376 is within specified operating range. The case temperature should be measured at the center of the top surface.

The ambient temperature is guaranteed as long as TC is not violated. The ambient temperature can be calculated from θjc and θja using the following equations:

    TJ = TC + P × θjc
    TA = TJ - P × θja
    TC = TA + P × [θja - θjc]

Values for θja and θjc are given in Table 5.1 for the 100-lead fine pitch. θja is given at various airflows. Table 5.2 shows the maximum TA allowable (without exceeding TC) at various airflows. Note that TA can be improved further by attaching "fins" or a "heat sink" to the package. P is calculated using the maximum cold ICC of 305 mA and the maximum VCC of 5.5V for both packages.

Table 5.1. 80376 Package Thermal Characteristics
Thermal Resistances (°C/Watt) θjc and θja

                                θja Versus Airflow - ft/min (m/sec)
Package                θjc      0       200      400      600      800      1000
                                (0)     (1.01)   (2.03)   (3.04)   (4.06)   (5.07)
100-Lead Fine Pitch    7.5      34.5    29.5     25.5     22.5     21.5     21.0
88-Pin PGA             2.5      29.0    22.5     17.0     14.5     12.5     12.0

Table 5.2. 80376 Maximum Allowable Ambient Temperature at Various Airflows

                                TA (°C) vs Airflow - ft/min (m/sec)
Package                θjc      0       200      400      600      800      1000
                                (0)     (1.01)   (2.03)   (3.04)   (4.06)   (5.07)
100-Lead Fine Pitch    7.5      70      78       85       90       92       93
88-Pin PGA             2.5      70      81       90       95       98       99

6.0 ELECTRICAL SPECIFICATIONS

The following sections describe recommended electrical connections for the 80376, and its electrical specifications.

6.1 Power and Grounding

The 80376 is implemented in CHMOS IV technology and has modest power requirements. However, its high clock frequency and 47 output buffers (address, data, control, and HLDA) can cause power surges as multiple output buffers drive new signal levels simultaneously. For clean on-chip power distribution at high frequency, 14 VCC and 18 VSS pins separately feed functional units of the 80376.

Power and ground connections must be made to all external VCC and GND pins of the 80376. On the circuit board, all VCC pins should be connected on a VCC plane and all VSS pins should be connected on a GND plane.

POWER DECOUPLING RECOMMENDATIONS

Liberal decoupling capacitors should be placed near the 80376.
The 80376 driving its 24-bit address bus and 16-bit data bus at high frequencies can cause transient power surges, particularly when driving large capacitive loads.

Low inductance capacitors and interconnects are recommended for best high frequency electrical performance. Inductance can be reduced by shortening circuit board traces between the 80376 and decoupling capacitors as much as possible.

RESISTOR RECOMMENDATIONS

The ERROR, FLT and BUSY inputs have internal pull-up resistors of approximately 20 KΩ and the PEREQ input has an internal pull-down resistor of approximately 20 KΩ built into the 80376 to keep these signals inactive when the 80387SX is not present in the system (or temporarily removed from its socket).

In typical designs, the external pull-up resistors shown in Table 6.1 are recommended. However, a particular design may have reason to adjust the resistor values recommended here, or alter the use of pull-up resistors in other ways. If not using address pipelining, connect the NA pin to VCC through a pull-up resistor in the range of 20 KΩ.

Table 6.1. Recommended Resistor Pull-Ups to VCC

Pin    Signal    Pull-Up Value    Purpose
16     ADS       20 KΩ ± 10%      Lightly Pull ADS Inactive during 80376 Hold Acknowledge States
26     LOCK      20 KΩ ± 10%      Lightly Pull LOCK Inactive during 80376 Hold Acknowledge States

OTHER CONNECTION RECOMMENDATIONS

For reliable operation, always connect unused inputs to an appropriate signal level. N/C pins should always remain unconnected. Connection of N/C pins to VCC or VSS will result in incompatibility with future steppings of the 80376.

Particularly when not using interrupts or bus hold (as when first prototyping), prevent any chance of spurious activity by connecting these associated inputs to GND:
- INTR
- NMI
- HOLD

6.2 Absolute Maximum Ratings

Table 6.2. Maximum Ratings

Parameter                             Maximum Rating
Storage Temperature                   -65°C to +150°C
Case Temperature under Bias           -65°C to +120°C
Supply Voltage with Respect to VSS    -0.5V to +6.5V
Voltage on Other Pins                 -0.5V to (VCC + 0.5)V

Table 6.2 gives stress ratings only, and functional operation at the maximums is not guaranteed. Functional operating conditions are given in Section 6.3, D.C. Specifications, and Section 6.4, A.C. Specifications.

Extended exposure to the Maximum Ratings may affect device reliability. Furthermore, although the 80376 contains protective circuitry to resist damage from static electric discharge, always take precautions to avoid high static voltages or electric fields.

6.3 D.C. Specifications

ADVANCE INFORMATION SUBJECT TO CHANGE

Table 6.3: 80376 D.C.
Characteristics

Functional Operating Range: VCC = 5V ± 10%; TCASE = 0°C to 115°C for 88-pin PGA or 100-pin PQFP

Symbol  Parameter                                       Min         Max         Unit  Conditions and Notes
VIL     Input LOW Voltage                               -0.3        +0.8        V     (1)
VIH     Input HIGH Voltage                              2.0         VCC + 0.3   V     (1)
VILC    CLK2 Input LOW Voltage                          -0.3        +0.8        V     (1)
VIHC    CLK2 Input HIGH Voltage                         VCC - 0.8   VCC + 0.3   V     (1)
VOL     Output LOW Voltage
          IOL = 4 mA: A23-A1, D15-D0                                0.45        V     (1)
          IOL = 5 mA: BHE, BLE, W/R, D/C, M/IO,
            LOCK, ADS, HLDA                                         0.45        V     (1)
VOH     Output HIGH Voltage
          IOH = -1 mA: A23-A1, D15-D0                   2.4                     V     (1)
          IOH = -0.2 mA: A23-A1, D15-D0                 VCC - 0.5               V     (1)
          IOH = -0.9 mA: BHE, BLE, W/R, D/C, M/IO,
            LOCK, ADS, HLDA                             2.4                     V     (1)
          IOH = -0.18 mA: BHE, BLE, W/R, D/C, M/IO,
            LOCK, ADS, HLDA                             VCC - 0.5               V     (1)
ILI     Input Leakage Current (for all pins except
          PEREQ, BUSY, FLT and ERROR)                               ±15         µA    0V ≤ VIN ≤ VCC (1)
IIH     Input Leakage Current (PEREQ pin)                           200         µA    VIH = 2.4V (1, 2)
IIL     Input Leakage Current (BUSY and ERROR pins)                 -400        µA    VIL = 0.45V (3)
ILO     Output Leakage Current                                      ±15         µA    0.45V ≤ VOUT ≤ VCC (1)
ICC     Supply Current
          CLK2 = 32 MHz                                             275         mA    ICC typ = 175 mA (4)
          CLK2 = 40 MHz                                             305         mA    ICC typ = 200 mA (4)
CIN     Input Capacitance                                           10          pF    FC = 1 MHz (5)
COUT    Output or I/O Capacitance                                   12          pF    FC = 1 MHz (5)
CCLK    CLK2 Capacitance                                            20          pF    FC = 1 MHz (5)

NOTES:
1. Tested at the minimum operating frequency of the device.
2. PEREQ input has an internal pull-down resistor.
3. BUSY, FLT and ERROR inputs each have an internal pull-up resistor.
4. ICC max measurement at worst case load, VCC and temperature (0°C).
5. Not 100% tested.

The A.C. specifications given in Table 6.4 consist of output delays, input setup requirements and input hold requirements. All A.C. specifications are relative to the CLK2 rising edge crossing the 2.0V level.

A.C. specification measurement is defined by Figure 6.1. Inputs must be driven to the voltage levels indicated by Figure 6.1 when A.C. specifications are measured.
80376 output delays are specified with minimum and maximum limits measured as shown. The minimum 80376 delay times are hold times provided to external circuitry. 80376 input setup and hold times are specified as minimums, defining the smallest acceptable sampling window. Within the sampling window, a synchronous input signal must be stable for correct 80376 processor operation.

Outputs ADS, W/R, D/C, M/IO, LOCK, BHE, BLE, A23–A1 and HLDA only change at the beginning of phase one. D15–D0 (write cycles) only change at the beginning of phase two. The READY, HOLD, BUSY, ERROR, PEREQ and D15–D0 (read cycles) inputs are sampled at the beginning of phase one. The NA, INTR and NMI inputs are sampled at the beginning of phase two.

LEGEND:
A - Maximum Output Delay Spec.
B - Minimum Output Delay Spec.
C - Minimum Input Setup Spec.
D - Minimum Input Hold Spec.

Figure 6.1. Drive Levels and Measurement Points for A.C. Specifications

6.4 A.C. Specifications

Table 6.4. 80376 A.C.
Characteristics at 16 MHz

Functional Operating Range: VCC = 5V ±10%; TCASE = 0°C to 115°C for 88-pin PGA or 100-pin PQFP

Symbol  Parameter                             Min   Max   Unit  Figure  Notes
        Operating Frequency                   4     16    MHz           Half CLK2 Freq
t1      CLK2 Period                           31    125   ns    6.3
t2a     CLK2 HIGH Time                        9           ns    6.3     At 2V (3)
t2b     CLK2 HIGH Time                        5           ns    6.3     At (VCC - 0.8)V (3)
t3a     CLK2 LOW Time                         9           ns    6.3     At 2V (3)
t3b     CLK2 LOW Time                         7           ns    6.3     At 0.8V (3)
t4      CLK2 Fall Time                              8     ns    6.3     (VCC - 0.8)V to 0.8V (3)
t5      CLK2 Rise Time                              8     ns    6.3     0.8V to (VCC - 0.8)V (3)
t6      A23–A1 Valid Delay                    4     36    ns    6.5     CL = 120 pF (4)
t7      A23–A1 Float Delay                    4     40    ns    6.6     (1)
t8      BHE, BLE, LOCK Valid Delay            4     36    ns    6.5     CL = 75 pF (4)
t9      BHE, BLE, LOCK Float Delay            4     40    ns    6.6     (1)
t10     W/R, M/IO, D/C, ADS Valid Delay       6     33    ns    6.5     CL = 75 pF (4)
t11     W/R, M/IO, D/C, ADS Float Delay       6     35    ns    6.6     (1)
t12     D15–D0 Write Data Valid Delay         4     40    ns    6.5     CL = 120 pF (4)
t13     D15–D0 Write Data Float Delay         4     35    ns    6.6     (1)
t14     HLDA Valid Delay                      4     33    ns    6.6     CL = 75 pF (4)
t15     NA Setup Time                         5           ns    6.4
t16     NA Hold Time                          21          ns    6.4
t19     READY Setup Time                      19          ns    6.4
t20     READY Hold Time                       4           ns    6.4
t21     D15–D0 Read Data Setup Time           9           ns    6.4
t22     D15–D0 Read Data Hold Time            6           ns    6.4
t23     HOLD Setup Time                       26          ns    6.4
t24     HOLD Hold Time                        5           ns    6.4
t25     RESET Setup Time                      13          ns    6.7
t26     RESET Hold Time                       4           ns    6.7
t27     NMI, INTR Setup Time                  16          ns    6.4     (2)
t28     NMI, INTR Hold Time                   16          ns    6.4     (2)
t29     PEREQ, ERROR, BUSY, FLT Setup Time    16          ns    6.4     (2)
t30     PEREQ, ERROR, BUSY, FLT Hold Time     5           ns    6.4     (2)

NOTES:
1. Float condition occurs when maximum output current becomes less than ILO in magnitude. Float delay is not 100% tested.
2. These inputs are allowed to be asynchronous to CLK2. The setup and hold specifications are given for testing purposes, to assure recognition within a specific CLK2 period.
3.
These are not tested. They are guaranteed by design characterization.
4. Tested with CL set to 50 pF and derated to support the indicated distributed capacitive load. See Figures 6.8 through 6.10 for capacitive derating curves.
5. The 80376 does not have t17 or t18 timing specifications.

Table 6.5. 80376 A.C. Characteristics at 20 MHz

Functional Operating Range: VCC = 5V ±10%; TCASE = 0°C to 115°C for 88-pin PGA or 100-pin PQFP

Symbol  Parameter                             Min   Max   Unit  Figure  Notes
        Operating Frequency                   4     20    MHz           Half CLK2 Frequency
t1      CLK2 Period                           25    125   ns    6.3
t2a     CLK2 HIGH Time                        8           ns    6.3     At 2V (3)
t2b     CLK2 HIGH Time                        5           ns    6.3     At (VCC - 0.8)V (3)
t3a     CLK2 LOW Time                         8           ns    6.3     At 2V (3)
t3b     CLK2 LOW Time                         6           ns    6.3     At 0.8V (3)
t4      CLK2 Fall Time                              8     ns    6.3     (VCC - 0.8)V to 0.8V (3)
t5      CLK2 Rise Time                              8     ns    6.3     0.8V to (VCC - 0.8)V (3)
t6      A23–A1 Valid Delay                    4     30    ns    6.5     CL = 120 pF (4)
t7      A23–A1 Float Delay                    4           ns    6.6     (1)
t8      BHE, BLE, LOCK Valid Delay            4     30    ns    6.5     CL = 75 pF (4)
t9      BHE, BLE, LOCK Float Delay            4     32    ns    6.6     (1)
t10a    M/IO, D/C Valid Delay                 6     28    ns    6.5     CL = 75 pF (4)
t10b    W/R, ADS Valid Delay                  6     26    ns    6.5     CL = 75 pF (4)
t11     W/R, M/IO, D/C, ADS Float Delay       6     30    ns    6.6     (1)
t12     D15–D0 Write Data Valid Delay         4     38    ns    6.5     CL = 120 pF
t13     D15–D0 Write Data Float Delay         4     27    ns    6.6     (1)

Table 6.5. 80376 A.C.
Characteristics at 20 MHz (Continued)

Functional Operating Range: VCC = 5V ±10%; TCASE = 0°C to 115°C for 88-pin PGA or 100-pin PQFP

Symbol  Parameter                             Min   Max   Unit  Figure  Notes
t14     HLDA Valid Delay                      4     28    ns    6.6     CL = 75 pF (4)
t15     NA Setup Time                         5           ns    6.4
t16     NA Hold Time                          12          ns    6.4
t19     READY Setup Time                      12          ns    6.4
t20     READY Hold Time                       4           ns    6.4
t21     D15–D0 Read Data Setup Time           9           ns    6.4
t22     D15–D0 Read Data Hold Time            6           ns    6.4
t23     HOLD Setup Time                       17          ns    6.4
t24     HOLD Hold Time                        5           ns    6.4
t25     RESET Setup Time                      12          ns    6.7
t26     RESET Hold Time                       4           ns    6.7
t27     NMI, INTR Setup Time                  16          ns    6.4     (2)
t28     NMI, INTR Hold Time                   16          ns    6.4     (2)
t29     PEREQ, ERROR, BUSY, FLT Setup Time    14          ns    6.4     (2)
t30     PEREQ, ERROR, BUSY, FLT Hold Time     5           ns    6.4     (2)

NOTES:
1. Float condition occurs when maximum output current becomes less than ILO in magnitude. Float delay is not 100% tested.
2. These inputs are allowed to be asynchronous to CLK2. The setup and hold specifications are given for testing purposes, to assure recognition within a specific CLK2 period.
3. These are not tested. They are guaranteed by design characterization.
4. Tested with CL set to 50 pF and derated to support the indicated distributed capacitive load. See Figures 6.8 through 6.10 for capacitive derating curves.
5. The 80376 does not have t17 or t18 timing specifications.

A.C. TEST LOADS

Figure 6.2. A.C. Test Loads

A.C. TIMING WAVEFORMS

Figure 6.3. CLK2 Waveform

Figure 6.4. A.C. Timing Waveforms - Input Setup and Hold Timing

Figure 6.5. A.C. Timing Waveforms - Output Valid Delay Timing

Figure 6.6. A.C. Timing Waveforms - Output Float Delay and HLDA Valid Delay Timing

The second internal processor phase following RESET high-to-low transition (provided t25 and t26 are met) is φ2.

Figure 6.7. A.C. Timing Waveforms - RESET Setup and Hold Timing, and Internal Phase

Figure 6.8.
Typical Output Valid Delay versus Load Capacitance at Maximum Operating Temperature (CL = 120 pF)

Figure 6.9. Typical Output Valid Delay versus Load Capacitance at Maximum Operating Temperature (CL = 75 pF)

Figure 6.10. Typical Output Rise Time versus Load Capacitance at Maximum Operating Temperature

Figure 6.11. Typical ICC vs Frequency

6.5 Designing for the ICE™-376 Emulator

The 376 embedded processor in-circuit emulator product is the ICE-376 emulator. Use of the emulator requires the target system to provide a socket that is compatible with the ICE-376 emulator. Two different probes are available for emulating user systems: an 88-pin PGA probe and a 100-pin fine pitch flat-pack probe. The 100-pin fine pitch flat-pack probe requires a socket, called the 100-pin PQFP, which is available from 3-M Textool (part number 2-0100-07243-000). The ICE-376 emulator probe attaches to the target system via an adapter which replaces the 80376 component in the target system.

Because of the high operating frequency of 80376 systems and of the ICE-376 emulator, there is no buffering between the 80376 emulation processor in the ICE-376 emulator probe and the target system. A direct result of the non-buffered interconnect is that the ICE-376 emulator shares the address and data bus with the user's system, and the RESET signal is intercepted by the ICE emulator hardware. In order for the ICE-376 emulator to be functional in the user's system without the Optional Isolation Board (OIB), the designer must be aware of the following conditions:
1. The bus controller must only enable data transceivers onto the data bus during valid read cycles of the 80376, other local devices or other bus masters.
2. Before another bus master drives the local processor address bus, the other master must gain control of the address bus by asserting HOLD and receiving the HLDA response.
3.
The emulation processor receives the RESET signal 2 or 4 CLK2 cycles later than an 80376 would, and responds to RESET later. Correct phase of the response is guaranteed.

In addition to the above considerations, the ICE-376 emulator processor module has several electrical and mechanical characteristics that should be taken into consideration when designing the 80376 system.

Capacitive Loading: ICE-376 adds up to 27 pF to each 80376 signal.

Drive Requirements: ICE-376 adds one FAST TTL load on the CLK2, control, address, and data lines. These loads are within the processor module and are driven by the 80376 emulation processor, which has standard drive and loading capability listed in Tables 6.3 and 6.4.

Power Requirements: For noise immunity and CMOS latch-up protection the ICE-376 emulator processor module is powered by the user system. The circuitry on the processor module draws up to 1.4A including the maximum 80376 ICC from the user 80376 socket.

80376 Location and Orientation: The ICE-376 emulator processor module may require lateral clearance. Figure 6.12 shows the clearance requirements of the iMP adapter and Figure 6.13 shows the clearance requirements of the 88-pin PGA adapter. The optional isolation board (OIB), which provides extra electrical buffering and has the same lateral clearance requirements as Figures 6.12 and 6.13, adds an additional 0.5 inches to the vertical clearance requirement. This is illustrated in Figure 6.14.

Optional Isolation Board (OIB) and the CLK2 speed reduction: Due to the unbuffered probe design, the ICE-376 emulator is susceptible to errors on the user's bus. The OIB allows the ICE-376 emulator to function in user systems with faults (shorted signals, etc.). After electrical verification the OIB may be removed. When the OIB is installed, the user system must have a maximum CLK2 frequency of 20 MHz.

Figure 6.12. Preliminary ICE™-376 Emulator User Cable with PQFP Adapter

Figure 6.13. ICE™-376 Emulator User Cable with 88-Pin PGA Adapter

Figure 6.14. ICE™-376 Emulator User Cable with OIB and PQFP Adapter

7.0 DIFFERENCES BETWEEN THE 80376 AND THE 80386

The following are the major differences between the 80376 and the 80386.
1. The 80376 generates byte selects on BHE and BLE (like the 8086 and 80286 microprocessors) to distinguish the upper and lower bytes on its 16-bit data bus. The 80386 uses four byte selects, BE0–BE3, to distinguish between the different bytes on its 32-bit bus.
2. The 80376 has no bus sizing option. The 80386 can select between either a 32-bit bus or a 16-bit bus by use of the BS16 input. The 80376 has a 16-bit bus size.
3. The NA pin operation in the 80376 is identical to that of the NA pin on the 80386 with one exception: the NA pin of the 80386 cannot be activated on 16-bit bus cycles (where BS16 is LOW in the 80386 case), whereas NA can be activated on any 80376 bus cycle.
4. The contents of all 80376 registers at reset are identical to the contents of the 80386 registers at reset, except the DX register. The DX register contains a component-stepping identifier at reset: in the 80386, after reset DH = 03H (indicating 80386) and DL = revision number; in the 80376, after reset DH = 33H (indicating 80376) and DL = revision number.
5. The 80386 uses A31 and M/IO to select its numerics coprocessor. The 80376 uses A23 and M/IO to select its numerics coprocessor.
6. The 80386 prefetch unit fetches code in four-byte units. The 80376 prefetch unit reads two bytes as one unit (like the 80286 microprocessor). In BS16 mode, the 80386 takes two consecutive bus cycles to complete a prefetch request. If there is a data read or write request after the prefetch starts, the 80386 will fetch all four bytes before addressing the new request.
7. The 80376 has no paging mechanism.
8.
The 80376 starts executing code in what corresponds to the 80386 protected mode. The 80386 starts execution in real mode, which is then used to enter protected mode.
9. The 80386 has a virtual-86 mode that allows the execution of a real mode 8086 program as a task in protected mode. The 80376 has no virtual-86 mode.
10. The 80386 maps a 48-bit logical address into a 32-bit physical address by segmentation and paging. The 80376 maps its 48-bit logical address into a 24-bit physical address by segmentation only.
11. The 80376 uses the 80387SX numerics coprocessor for floating point operations, while the 80386 uses the 80387 coprocessor.
12. The 80386 can execute from 16-bit code segments. The 80376 can only execute from 32-bit code segments.
13. The 80376 has an input called FLT which three-states all bidirectional and output pins, including HLDA, when asserted. It is used with ON Circuit Emulation (ONCE).

8.0 INSTRUCTION SET

This section describes the 376 embedded processor instruction set. Table 8.1 lists all instructions along with instruction encoding diagrams and clock counts. Further details of the instruction encoding are then provided in the following sections, which completely describe the encoding structure and the definition of all fields occurring within 80376 instructions.

8.1 80376 Instruction Encoding and Clock Count Summary

To calculate elapsed time for an instruction, multiply the instruction clock count, as listed in Table 8.1 below, by the processor clock period (e.g. 50 ns for an 80376 operating at 20 MHz). The actual clock count of an 80376 program will average 10% more than the calculated clock count due to instruction sequences which execute faster than they can be fetched from memory.

Instruction Clock Count Assumptions:
1. The instruction has been prefetched, decoded, and is ready for execution.
2. Bus cycles do not require wait states.
3. There are no local bus HOLD requests delaying processor access to the bus.
4.
No exceptions are detected during instruction execution.
5. If an effective address is calculated, it does not use two general register components. One register, scaling and displacement can be used within the clock counts shown. However, if the effective address calculation uses two general register components, add 1 clock to the clock count shown.
6. Memory reference instructions access byte or aligned 16-bit operands.

Instruction Clock Count Notation
- If two clock counts are given, the smaller refers to a register operand and the larger refers to a memory operand.
- n = number of times repeated.
- m = number of components in the next instruction executed, where the entire displacement (if any) counts as one component, the entire immediate data (if any) counts as one component, and all other bytes of the instruction and prefix(es) each count as one component.

Misaligned or 32-Bit Operand Accesses:
- If an instruction accesses a misaligned 16-bit operand or a 32-bit operand on an even address, add:
    2* clocks for read or write.
    4** clocks for read and write.
- If an instruction accesses a 32-bit operand on an odd address, add:
    4* clocks for read or write.
    8** clocks for read and write.

Wait States: Wait states add 1 clock per wait state to instruction execution for each data access.

Table 8.1.
80376 Instruction Set Clock Count Summary Instruction Clock Counts Number of Data Cycles Notes 2/2* 0/1* a 2/4* 0/1* a 2/2* 0/1* a immediate data 2 2 Format GENERAL DATA TRANSFER MOV e Move: Register to Register/Memory 1000100w mod reg Register/Memory to Register 1000101w mod reg r/m Immediate to Register/Memory 1100011w mod 0 0 0 r/m Immediate to Register (Short Form) 1011 w reg r/m immediate data Memory to Accumulator (Short Form) 1010000w full displacement 4* 1* a Accumulator to Memory (Short Form) 1010001w full displacement 2* 1* a Register/Memory to Segment Register 10001110 mod sreg3 r/m 22/23* 0/6* a,b,c Segment Register to Register/Memory 10001100 mod sreg3 r/m 2/2* 0/1* a 00001111 1011111w mod reg r/m 3/6* 0/1* a 00001111 1011011w mod reg r/m 3/6* 0/1* a MOVSX e Move with Sign Extension Register from Register/Memory MOVZX e Move with Zero Extension Register from Register/Memory PUSH e Push: Register/Memory Register (Short Form) Segment Register (ES, CS, SS or DS) Segment Register (FS or GS) 11111111 7/9* 2/4* a reg 4 2 a 0 0 0 sreg2 1 1 0 4 2 a 01010 mod 1 1 0 r/m 00001111 1 0 sreg3 0 0 0 4 2 a Immediate 011010s0 immediate data 4 2 a PUSHA e Push All 01100000 34 16 a 7/9* 2/4* a 6 2 a 25 6 a, b, c 25 6 a, b, c 40 16 a 3/5** 0/2** a, m 3 0 POP e Pop Register/Memory Register (Short Form) Segment Register (ES, SS or DS) Segment Register (FS or GS) POPA e Pop All 10001111 01011 mod 0 0 0 r/m reg 0 0 0 sreg 2 1 1 1 00001111 1 0 sreg 3 0 0 1 01100001 XCHG e Exchange Register/Memory with Register Register with Accumulator (Short Form) 1000011w 10010 mod reg r/m reg IN e Input from: Fixed Port Variable Port 1110010w port number 1110110w 6* 1* 26* 1* f,k f,l 7* 1* f,k 27* 1* f,l f,k OUT e Output to: Fixed Port 1110011w Variable Port 1110111w LEA e Load EA to Register 10001101 port number mod reg r/m 4* 1* 24* 1* f,l 5* 1* f,k 26* 1* f,l 2 73 376 EMBEDDED PROCESSOR Table 8.1. 
80376 Instruction Set Clock Count Summary (Continued) Instruction Format Clock Counts Number of Data Cycles Notes 26* 6* a, b, c SEGMENT CONTROL LDS e Load Pointer to DS 11000101 mod reg r/m LES e Load Pointer to ES 11000100 mod reg r/m 26* 6* a, b, c LFS e Load Pointer to FS 00001111 10110100 mod reg r/m 29* 6* a, b, c LGS e Load Pointer to GS 00001111 10110101 mod reg r/m 29* 6* a, b, c LSS e Load Pointer to SS 00001111 10110010 mod reg r/m 26* 6* a, b, c FLAG CONTROL CLC e Clear Carry Flag 11111000 CLD e Clear Direction Flag 11111100 2 2 CLI e Clear Interrupt Enable Flag 11111010 8 f 5 e CLTS e Clear Task Switched Flag 00001111 CMC e Complement Carry Flag 11110101 00000110 2 LAHF e Load AH into Flag 10011111 2 POPF e Pop Flags 10011101 7 a, g PUSHF e Push Flags 10011100 4 a SAHF e Store AH into Flags 10011110 3 STC e Set Carry Flag 11111001 2 STD e Set Direction Flag 11111101 2 STI e Set Interrupt Enable Flag 11111011 8 f ARITHMETIC ADD e Add Register to Register 000000dw mod reg r/m 2 Register to Memory 0000000w mod reg r/m 7** Memory to Register 0000001w mod reg r/m Immediate to Register/Memory 100000sw mod 0 0 0 r/m Immediate to Accumulator (Short Form) 0000010w immediate data immediate data 2** a 6* 1* a 2/7** 0/2** a 2 ADC e Add with Carry Register to Register 000100dw mod reg r/m 2 Register to Memory 0001000w mod reg r/m 7** 2** a Memory to Register 0001001w mod reg r/m 6* 1* a Immediate to Register/Memory 100000sw mod 0 1 0 r/m 2/7** 0/2** a Immediate to Accumulator (Short Form) 0001010w 0/2** a immediate data immediate data 2 INC e Increment Register/Memory Register (Short Form) 1111111w 01000 mod 0 0 0 r/m reg 2/6** 2 SUB e Subtract Register from Register 74 001010dw mod reg r/m 2 376 EMBEDDED PROCESSOR Table 8.1. 
80376 Instruction Set Clock Count Summary (Continued) Instruction Format Clock Counts Number Of Data Cycles Notes ARITHMETIC (Continued) Register from Memory 0 0 1 0 1 0 0 w mod reg r/m 7** 2** a Memory from Register 0 0 1 0 1 0 1 w mod reg r/m 6* 1 a 2/7** 0/1** a 2** a 6* 1* a 2/7** 0/2** a 0/2** a Immediate from Register/Memory 1 0 0 0 0 0 s w mod 1 0 1 Immediate from Accumulator (Short Form) 0010110w r/m immediate data immediate data 2 SBB e Subtract with Borrow Register from Register 0 0 0 1 1 0 d w mod reg r/m 2 Register from Memory 0 0 0 1 1 0 0 w mod reg r/m 7** Memory from Register 0 0 0 1 1 0 1 w mod reg r/m Immediate from Register/Memory 1 0 0 0 0 0 s w mod 0 1 1 r/m immediate data Immediate from Accumulator (Short Form) 0001110w immediate data 2 DEC e Decrement Register/Memory Register (Short Form) 1 1 1 1 1 1 1 w reg 0 0 1 01001 r/m 2/6** reg 2 CMP e Compare Register with Register 0 0 1 1 1 0 d w mod reg r/m 2 Memory with Register 0 0 1 1 1 0 0 w mod reg r/m 5* 1* a Register with Memory 0 0 1 1 1 0 1 w mod reg r/m 6** 2** a Immediate with Register/Memory 1 0 0 0 0 0 s w mod 1 1 1 r/m immediate data 2/5* 0/1* a Immediate with Accumulator (Short Form) 0011110w NEG e Change Sign 1 1 1 1 0 1 1 w mod 0 1 1 0/2* a AAA e ASCII Adjust for Add 00110111 12–17/15–20 12–25/15–28* 12–41/17–46* 0/1 0/1* 0/2* a,n a,n a,n 12–17/15–20 12–25/15–28* 12–41/17–46* 0/1 0/1* 0/2* a,n a,n a,n 12–17/15–20 12–25/15–28* 12–41/17–46* 0/1 0/1* 0/2* a,n a,n a,n 13–26/14–27* 13–42/16–45* 0/1* 0/2* a,n a,n immediate data 2 r/m 2/6* 4 AAS e ASCII Adjust for Subtract 00111111 4 DAA e Decimal Adjust for Add 00100111 4 DAS e Decimal Adjust for Subtract 00101111 4 MUL e Multiply (Unsigned) Accumulator with Register/Memory 1 1 1 1 0 1 1 w mod 1 0 0 r/m MultiplierÐByte ÐWord ÐDoubleword IMUL e Integer Multiply (Signed) Accumulator with Register/Memory 1 1 1 1 0 1 1 w mod 1 0 1 r/m MultiplierÐByte ÐWord ÐDoubleword Register with Register/Memory 00001111 10101111 mod reg r/m MultiplierÐByte 
ÐWord ÐDoubleword Register/Memory with Immediate to Register ÐWord ÐDoubleword 011010s1 mod reg r/m immediate data 75 376 EMBEDDED PROCESSOR Table 8.1. 80376 Instruction Set Clock Count Summary (Continued) Instruction Format Clock Counts Number Of Data Cycles Notes 14/17 22/25* 38/43* 0/1 0/1* 0/2* a, o a, o a, o 19/22 27/30* 43/48* 0/1 0/1 0/2* a, o a, o a, o ARITHMETIC (Continued) DIV e Divide (Unsigned) Accumulator by Register/Memory 1 1 1 1 0 1 1 w mod 1 1 0 r/m DivisorÐByte ÐWord ÐDoubleword IDIV e Integer Divide (Signed) Accumulator by Register/Memory 1 1 1 1 0 1 1 w mod 1 1 1 r/m DivisorÐByte ÐWord ÐDoubleword AAD e ASCII Adjust for Divide 11010101 00001010 19 AAM e ASCII Adjust for Multiply 11010100 00001010 17 CBW e Convert Byte to Word 10011000 3 CWD e Convert Word to Double Word 10011001 2 LOGIC Shift Rotate Instructions Not Through Carry (ROL, ROR, SAL, SAR, SHL, and SHR) Register/Memory by 1 1 1 0 1 0 0 0 w mod TTT r/m 3/7** 0/2** a Register/Memory by CL 1 1 0 1 0 0 1 w mod TTT r/m 3/7** 0/2** a Register/Memory by Immediate Count 1 1 0 0 0 0 0 w mod TTT r/m immed 8-bit data 3/7** 0/2** a Register/Memory by 1 1 1 0 1 0 0 0 w mod TTT r/m 9/10** 0/2** a Register/Memory by CL 1 1 0 1 0 0 1 w mod TTT r/m 9/10** 10/2** a Register/Memory by Immediate Count 1 1 0 0 0 0 0 w mod TTT r/m immed 8-bit data 9/10** 0/2** a Through Carry (RCL and RCR) T T T Instruction 000 ROL 001 ROR 010 RCL 011 RCR 100 SHL/SAL 101 SHR 111 SAR SHLD e Shift Left Double Register/Memory by Immediate 00001111 1 0 1 0 0 1 0 0 mod reg r/m immed 8-bit data 3/7** 0/2** Register/Memory by CL 00001111 1 0 1 0 0 1 0 1 mod reg r/m 3/7** 0/2** Register/Memory by Immediate 00001111 1 0 1 0 1 1 0 0 mod reg r/m immed 8-bit data 3/7** 0/2** Register/Memory by CL 00001111 1 0 1 0 1 1 0 1 mod reg r/m 3/7** 0/2** SHRD e Shift Right Double AND e And Register to Register 76 0 0 1 0 0 0 d w mod reg r/m 2 376 EMBEDDED PROCESSOR Table 8.1. 
80376 Instruction Set Clock Count Summary (Continued) Instruction Format Clock Counts Number of Data Cycles Notes 7** 2** a 6* 1* a 2/7** 0/2** a LOGIC (Continued) Register to Memory 0010000w mod reg r/m Memory to Register 0010001w mod reg r/m Immediate to Register/Memory 1000000w mod 1 0 0 r/m immediate data Immediate to Accumulator (Short Form) 0010010w immediate data Register/Memory and Register 1000010w mod reg r/m 2/5* 0/1* a Immediate Data and Register/Memory 1111011w mod 0 0 0 r/m immediate data 2/5* 0/1* a Immediate Data and Accumulator (Short Form) 1010100w immediate data Register to Register 000010dw mod reg r/m 2 Register to Memory 0000100w mod reg r/m 7** 2** a Memory to Register 0000101w mod reg r/m 6* 1* a Immediate to Register/Memory 1000000w mod 0 0 1 r/m immediate data 2/7** 0/2** a Immediate to Accumulator (Short Form) 0000110w immediate data Register to Register 001100dw mod reg r/m 2 Register to Memory 0011000w mod reg r/m 7** 2** a Memory to Register 0011001w mod reg r/m 6* 1* a Immediate to Register/Memory 1000000w mod 1 1 0 r/m immediate data 2/7** 0/2** a Immediate to Accumulator (Short Form) 0011010w immediate data NOT e Invert Register/Memory 1111011w mod 0 1 0 0/2** a 2 TEST e And Function to Flags, No Result 2 OR e Or 2 XOR e Exclusive Or r/m 2 2/6** STRING MANIPULATION CMPS e Compare Byte Word 1010011w 10* 2* a INS e Input Byte/Word from DX Port 0110110w 9** 29** 1** 1** a,f,k a,f,l LODS e Load Byte/Word to AL/AX/EAX 1010110w 5* 1* a MOVS e Move Byte Word 1010010w 7** 2** a OUTS e Output Byte/Word to DX Port 0110111w 8** 28** 1** 1** a,f,k a,f,l SCAS e Scan Byte Word 1010111w 7* 1* a 1010101w 4* 1* a 11010111 5* 1* a 5 a 9n** 2n** a STOS e Store Byte/Word from AL/AX/EX XLAT e Translate String REPEATED STRING MANIPULATION Repeated by Count in CX or ECX REPE CMPS e Compare String (Find Non-Match) 11110011 1010011w 77 376 EMBEDDED PROCESSOR Table 8.1. 
80376 Instruction Set Clock Count Summary (Continued) Instruction Format Clock Counts Number of Data Cycles Notes REPEATED STRING MANIPULATION (Continued) REPNE CMPS e Compare String (Find Match) 11110010 1010011w 5 a 9n** 2n** a REP INS e Input String 11110011 0110110w 7 a 6n* 27 a 6n* 1n* 1n* a,f,k a,f,l REP LODS e Load String 11110011 1010110w 5 a 6n* 1n* a REP MOVS e Move String 11110011 1010010w 7 a 4n** 2n** a REP OUTS e Output String 11110011 0110111w 6 a 5n* 26 a 5n* 1n* 1n* a,f,k a,f,l 11110011 1010111w 5 a 8n* 1n* a 11110010 1010111w 5 a 8n* 1n* a 11110011 1010101w 5 a 5n* 1n* a BSF e Scan Bit Forward 00001111 10111100 mod reg r/m 10 a 3n** 2n** a BSR e Scan Bit Reverse 00001111 10111101 mod reg r/m 10 a 3n** 2n** a Register/Memory, Immediate 00001111 10111010 mod 1 0 0 r/m immed 8-bit data 3/6* 0/1* a Register/Memory, Register 00001111 10100011 mod reg r/m 3/12* 0/1* a Register/Memory, Immediate 00001111 10111010 mod 1 1 1 r/m immed 8-bit data 6/8* 0/2* a Register/Memory, Register 00001111 10111011 mod reg r/m 6/13* 0/2* a Register/Memory, Immediate 00001111 10111010 mod 1 1 0 r/m immed 8-bit data 6/8* 0/2* a Register/Memory, Register 00001111 10110011 mod reg r/m 6/13* 0/2* a Register/Memory, Immediate 00001111 10111010 mod 1 0 1 r/m immed 8-bit data 6/8* 0/2* a Register/Memory, Register 00001111 10101011 mod reg r/m 6/13* 0/2* a 11101000 full displacement 9 a m* 2 j 11111111 mod 0 1 0 r/m 9 a m/12 a m 2/3 a, j 10011010 unsigned full offset, selector 42 a m 9 c, d, j REPE SCAS e Scan String (Find Non-AL/AX/EAX) REPNE SCAS e Scan String (Find AL/AX/EAX) REP STOS e Store String BIT MANIPULATION BT e Test Bit BTC e Test Bit and Complement BTR e Test Bit and Reset BTS e Test Bit and Set CONTROL TRANSFER CALL e Call Direct within Segment Register/Memory Indirect within Segment Direct Intersegment 78 376 EMBEDDED PROCESSOR Table 8.1. 
80376 Instruction Set Clock Count Summary (Continued) Instruction Format Clock Counts Number of Data Cycles Notes 64 a m 13 a,c,d,j 98 a m 13 a,c,d,j 106 a 8x a m 13 a 4x a,c,d,j 392 124 a,c,d,j 46 a m 10 a,c,d,j 68 a m 14 a,c,d,j 102 a m 14 a,c,d,j 110 a 8x a m 14 a 4x a,c,d,j 399 130 a,c,d,j CONTROL TRANSFER (Continued) (Direct Intersegment) Via Call Gate to Same Privilege Level Via Call Gate to Different Privilege Level, (No Parameters) Via Call Gate to Different Privilege Level, (x Parameters) From 386 Task to 386 TSS Indirect Intersegment 11111111 mod 0 1 1 r/m Via Call Gate to Same Privilege Level Via Call Gate to Different Privilege Level, (No Parameters) Via Call Gate to Different Privilege Level, (x Parameters) From 386 Task to 386 TSS JMP e Unconditional Jump Short 11101011 8-bit displacement 7am Direct within Segment 11101001 full displacement 7am Register/Memory Indirect within Segment 11111111 mod 1 0 0 Direct Intersegment 11101010 unsigned full offset, selector r/m Via Call Gate to Same Privilege Level From 386 Task to 386 TSS Indirect Intersegment Via Call Gate to Same Privilege Level From 386 Task to 386 TSS 11111111 mod 1 0 1 r/m j j 9 a m/14 a m 2/4 a,j 37 a m 5 c,d,j 53 a m 9 a,c,d,j 395 124 a,c,d,j 37 a m 9 a,c,d,j 59 a m 401 13 124 a,c,d,j a,c,d,j 79 376 EMBEDDED PROCESSOR Table 8.1. 
80376 Instruction Set Clock Count Summary (Continued)

Instruction | Format | Clock Counts | Number of Data Cycles | Notes

CONTROL TRANSFER (Continued)

RET = Return from CALL:
  Within Segment | 11000011 | 12 + m | 2 | a,j,p
  Within Segment Adding Immediate to SP | 11000010 16-bit displ | 12 + m | 2 | a,j,p
  Intersegment | 11001011 | 36 + m | 4 | a,c,d,j,p
  Intersegment Adding Immediate to SP | 11001010 16-bit displ | 36 + m | 4 | a,c,d,j,p
  to Different Privilege Level:
    Intersegment | | 80 | 4 | c,d,j,p
    Intersegment Adding Immediate to SP | | 80 | 4 | c,d,j,p

CONDITIONAL JUMPS
NOTE: Times Are Jump "Taken or Not Taken"

JO = Jump on Overflow
  8-Bit Displacement | 01110000 8-bit displ | 7 + m or 3 | | j
  Full Displacement | 00001111 10000000 full displacement | 7 + m or 3 | | j
JNO = Jump on Not Overflow
  8-Bit Displacement | 01110001 8-bit displ | 7 + m or 3 | | j
  Full Displacement | 00001111 10000001 full displacement | 7 + m or 3 | | j
JB/JNAE = Jump on Below/Not Above or Equal
  8-Bit Displacement | 01110010 8-bit displ | 7 + m or 3 | | j
  Full Displacement | 00001111 10000010 full displacement | 7 + m or 3 | | j
JNB/JAE = Jump on Not Below/Above or Equal
  8-Bit Displacement | 01110011 8-bit displ | 7 + m or 3 | | j
  Full Displacement | 00001111 10000011 full displacement | 7 + m or 3 | | j
JE/JZ = Jump on Equal/Zero
  8-Bit Displacement | 01110100 8-bit displ | 7 + m or 3 | | j
  Full Displacement | 00001111 10000100 full displacement | 7 + m or 3 | | j
JNE/JNZ = Jump on Not Equal/Not Zero
  8-Bit Displacement | 01110101 8-bit displ | 7 + m or 3 | | j
  Full Displacement | 00001111 10000101 full displacement | 7 + m or 3 | | j
JBE/JNA = Jump on Below or Equal/Not Above
  8-Bit Displacement | 01110110 8-bit displ | 7 + m or 3 | | j
  Full Displacement | 00001111 10000110 full displacement | 7 + m or 3 | | j
JNBE/JA = Jump on Not Below or Equal/Above
  8-Bit Displacement | 01110111 8-bit displ | 7 + m or 3 | | j
  Full Displacement | 00001111 10000111 full displacement | 7 + m or 3 | | j
JS = Jump on Sign
  8-Bit Displacement | 01111000 8-bit displ | 7 + m or 3 | | j
  Full Displacement | 00001111 10001000 full displacement | 7 + m or 3 | | j
JNS = Jump on Not Sign
  8-Bit Displacement | 01111001 8-bit displ | 7 + m or 3 | | j
  Full Displacement | 00001111 10001001 full displacement | 7 + m or 3 | | j
JP/JPE = Jump on Parity/Parity Even
  8-Bit Displacement | 01111010 8-bit displ | 7 + m or 3 | | j
  Full Displacement | 00001111 10001010 full displacement | 7 + m or 3 | | j
JNP/JPO = Jump on Not Parity/Parity Odd
  8-Bit Displacement | 01111011 8-bit displ | 7 + m or 3 | | j
  Full Displacement | 00001111 10001011 full displacement | 7 + m or 3 | | j
JL/JNGE = Jump on Less/Not Greater or Equal
  8-Bit Displacement | 01111100 8-bit displ | 7 + m or 3 | | j
  Full Displacement | 00001111 10001100 full displacement | 7 + m or 3 | | j
JNL/JGE = Jump on Not Less/Greater or Equal
  8-Bit Displacement | 01111101 8-bit displ | 7 + m or 3 | | j
  Full Displacement | 00001111 10001101 full displacement | 7 + m or 3 | | j
JLE/JNG = Jump on Less or Equal/Not Greater
  8-Bit Displacement | 01111110 8-bit displ | 7 + m or 3 | | j
  Full Displacement | 00001111 10001110 full displacement | 7 + m or 3 | | j
JNLE/JG = Jump on Not Less or Equal/Greater
  8-Bit Displacement | 01111111 8-bit displ | 7 + m or 3 | | j
  Full Displacement | 00001111 10001111 full displacement | 7 + m or 3 | | j
JECXZ = Jump on ECX Zero | 11100011 8-bit displ | 9 + m or 5 | | j
  (Address Size Prefix Differentiates JCXZ from JECXZ)
LOOP = Loop ECX Times | 11100010 8-bit displ | 11 + m | | j
LOOPZ/LOOPE = Loop with Zero/Equal | 11100001 8-bit displ | 11 + m | | j
LOOPNZ/LOOPNE = Loop While Not Zero | 11100000 8-bit displ | 11 + m | | j

CONDITIONAL BYTE SET
NOTE: Times Are Register/Memory

SETO = Set Byte on Overflow
  To Register/Memory | 00001111 10010000 mod 000 r/m | 4/5* | 0/1* | a
SETNO = Set Byte on Not Overflow
  To Register/Memory | 00001111 10010001 mod 000 r/m | 4/5* | 0/1* | a
SETB/SETNAE = Set Byte on Below/Not Above or Equal
  To Register/Memory | 00001111 10010010 mod 000 r/m | 4/5* | 0/1* | a
SETNB = Set Byte on Not Below/Above or Equal
  To Register/Memory | 00001111 10010011 mod 000 r/m | 4/5* | 0/1* | a
SETE/SETZ = Set Byte on Equal/Zero
  To Register/Memory | 00001111 10010100 mod 000 r/m | 4/5* | 0/1* | a
SETNE/SETNZ = Set Byte on Not Equal/Not Zero
  To Register/Memory | 00001111 10010101 mod 000 r/m | 4/5* | 0/1* | a
SETBE/SETNA = Set Byte on Below or Equal/Not Above
  To Register/Memory | 00001111 10010110 mod 000 r/m | 4/5* | 0/1* | a
SETNBE/SETA = Set Byte on Not Below or Equal/Above
  To Register/Memory | 00001111 10010111 mod 000 r/m | 4/5* | 0/1* | a
SETS = Set Byte on Sign
  To Register/Memory | 00001111 10011000 mod 000 r/m | 4/5* | 0/1* | a
SETNS = Set Byte on Not Sign
  To Register/Memory | 00001111 10011001 mod 000 r/m | 4/5* | 0/1* | a
SETP/SETPE = Set Byte on Parity/Parity Even
  To Register/Memory | 00001111 10011010 mod 000 r/m | 4/5* | 0/1* | a
SETNP/SETPO = Set Byte on Not Parity/Parity Odd
  To Register/Memory | 00001111 10011011 mod 000 r/m | 4/5* | 0/1* | a
SETL/SETNGE = Set Byte on Less/Not Greater or Equal
  To Register/Memory | 00001111 10011100 mod 000 r/m | 4/5* | 0/1* | a
SETNL/SETGE = Set Byte on Not Less/Greater or Equal
  To Register/Memory | 00001111 10011101 mod 000 r/m | 4/5* | 0/1* | a
SETLE/SETNG = Set Byte on Less or Equal/Not Greater
  To Register/Memory | 00001111 10011110 mod 000 r/m | 4/5* | 0/1* | a
SETNLE/SETG = Set Byte on Not Less or Equal/Greater
  To Register/Memory | 00001111 10011111 mod 000 r/m | 4/5* | 0/1* | a

ENTER = Enter Procedure | 11001000 16-bit displacement, 8-bit level | L = 0: 10; L = 1: 14; L > 1: 17 + 8(n - 1) | 1; 4(n - 1) | a
LEAVE = Leave Procedure | 11001001 | 6 | | a

Table 8.1.
80376 Instruction Set Clock Count Summary (Continued)

Instruction | Format | Clock Counts | Number of Data Cycles | Notes

INTERRUPT INSTRUCTIONS

INT = Interrupt:
  Type Specified | 11001101 type
    Via Interrupt or Trap Gate to Same Privilege Level | | 71 | 14 | c,d,j,p
    Via Interrupt or Trap Gate to Different Privilege Level | | 111 | 14 | c,d,j,p
    From 386 Task to 386 TSS via Task Gate | | 467 | 140 | c,d,j,p
  Type 3 | 11001100
    Via Interrupt or Trap Gate to Same Privilege Level | | 71 | 14 | c,d,j,p
    Via Interrupt or Trap Gate to Different Privilege Level | | 111 | 14 | c,d,j,p
    From 386 Task to 386 TSS via Task Gate | | 308 | 138 | c,d,j,p
INTO = Interrupt 4 if Overflow Flag Set | 11001110
  If OF = 1:
    Via Interrupt or Trap Gate to Same Privilege Level | | 71 | 14 | c,d,j,p
    Via Interrupt or Trap Gate to Different Privilege Level | | 111 | 14 | c,d,j,p
    From 386 Task to 386 TSS via Task Gate | | 413 | 138 | c,d,j,p
  If OF = 0 | | 3
BOUND = Interrupt 5 if Detect Value Out of Range | 01100010 mod reg r/m
  If in Range | | 10 | 0 | a,c,d,j,o,p
  If Out of Range:
    Via Interrupt or Trap Gate to Same Privilege Level | | 71 | 14 | c,d,j,p
    Via Interrupt or Trap Gate to Different Privilege Level | | 111 | 14 | c,d,j,p
    From 386 Task to 386 TSS via Task Gate | | 398 | 138 | c,d,j,p

INTERRUPT RETURN

IRET = Interrupt Return | 11001111
  To the Same Privilege Level (within Task) | | 42 | 5 | a,c,d,j,p
  To Different Privilege Level (within Task) | | 86 | 5 | a,c,d,j,p
  From 386 Task to 386 TSS | | 328 | 138 | c,d,j,p

PROCESSOR CONTROL

HLT = HALT | 11110100 | 5 | | b
MOV = Move to and from Control/Debug/Test Registers
  CR0 from Register | 00001111 00100010 11 eee reg | 10 | | b
  Register from CR0 | 00001111 00100000 11 eee reg | 6 | | b
  DR0–3 from Register | 00001111 00100011 11 eee reg | 22 | | b
  DR6–7 from Register | 00001111 00100011 11 eee reg | 16 | | b
  Register from DR6–7 | 00001111 00100001 11 eee reg | 14 | | b
  Register from DR0–3 | 00001111 00100001 11 eee reg | 22 | | b
NOP = No Operation | 10010000 | 3
WAIT = Wait until BUSY Pin is Negated | 10011011 | 6

PROCESSOR EXTENSION INSTRUCTIONS

Processor Extension Escape | 11011TTT mod LLL r/m | See 80387SX Data Sheet | | a
  TTT and LLL bits are opcode information for coprocessor.

PREFIX BYTES

Address Size Prefix | 01100111 | 0
LOCK = Bus Lock Prefix | 11110000 | 0
Operand Size Prefix | 01100110 | 0
Segment Override Prefix:
  CS: | 00101110 | 0
  DS: | 00111110 | 0
  ES: | 00100110 | 0
  FS: | 01100100 | 0
  GS: | 01100101 | 0
  SS: | 00110110 | 0

PROTECTION CONTROL

ARPL = Adjust Requested Privilege Level
  From Register/Memory | 01100011 mod reg r/m | 20/21** | 2** | a
LAR = Load Access Rights
  From Register/Memory | 00001111 00000010 mod reg r/m | 17/18* | 1* | a,c,i,p
LGDT = Load Global Descriptor Table Register | 00001111 00000001 mod 010 r/m | 13** | 3* | a,e
LIDT = Load Interrupt Descriptor Table Register | 00001111 00000001 mod 011 r/m | 13** | 3* | a,e
LLDT = Load Local Descriptor Table Register
  to Register/Memory | 00001111 00000000 mod 010 r/m | 24/28* | 5* | a,c,e,p
LMSW = Load Machine Status Word
  From Register/Memory | 00001111 00000001 mod 110 r/m | 10/13* | 1* | a,e
LSL = Load Segment Limit
  From Register/Memory | 00001111 00000011 mod reg r/m
    Byte-Granular Limit | | 24/27* | 2* | a,c,i,p
    Page-Granular Limit | | 29/32* | 2* | a,c,i,p
LTR = Load Task Register
  From Register/Memory | 00001111 00000000 mod 011 r/m | 27/31* | 4* | a,c,e,p
SGDT = Store Global Descriptor Table Register | 00001111 00000001 mod 000 r/m | 11* | 3* | a
SIDT = Store Interrupt Descriptor Table Register | 00001111 00000001 mod 001 r/m | 11* | 3* | a
SLDT = Store Local Descriptor Table Register
  To Register/Memory | 00001111 00000000 mod 000 r/m | 2/2* | 4* | a

Table 8.1.
80376 Instruction Set Clock Count Summary (Continued)

Instruction | Format | Clock Counts | Number of Data Cycles | Notes

PROTECTION CONTROL (Continued)

SMSW = Store Machine Status Word | 00001111 00000001 mod 100 r/m | 2/2* | 1* | a,c
STR = Store Task Register
  To Register/Memory | 00001111 00000000 mod 001 r/m | 2/2* | 1* | a
VERR = Verify Read Access
  Register/Memory | 00001111 00000000 mod 100 r/m | 10/11** | 2** | a,c,i,p
VERW = Verify Write Access | 00001111 00000000 mod 101 r/m | 15/16** | 2** | a,c,i,p

NOTES:
a. Exception 13 fault (general protection violation) will occur if the memory operand in CS, DS, ES, FS or GS cannot be used due to either a segment limit violation or access rights violation. If a stack limit is violated, an exception 12 (stack segment limit violation or not present) occurs.
b. For segment load operations, the CPL, RPL and DPL must agree with the privilege rules to avoid an exception 13 fault (general protection violation). The segment's descriptor must indicate "present" or exception 11 (CS, DS, ES, FS, GS not present) occurs. If the SS register is loaded and a stack segment not present is detected, an exception 12 (stack segment limit violation or not present) occurs.
c. All segment descriptor accesses in the GDT or LDT made by this instruction will automatically assert LOCK to maintain descriptor integrity in multiprocessor systems.
d. JMP, CALL, INT, RET and IRET instructions referring to another code segment will cause an exception 13 (general protection violation) if an applicable privilege rule is violated.
e. An exception 13 fault occurs if CPL is greater than 0.
f. An exception 13 fault occurs if CPL is greater than IOPL.
g. The IF bit of the flag register is not updated if CPL is greater than IOPL. The IOPL field of the flag register is updated only if CPL = 0.
h. Any violation of privilege rules as applied to the selector operand does not cause a protection exception; rather, the zero flag is cleared.
i. If the coprocessor's memory operand violates a segment limit or segment access rights, an exception 13 fault (general protection exception) will occur before the ESC instruction is executed. An exception 12 fault (stack segment limit violation or not present) will occur if the stack limit is violated by the operand's starting address.
j. The destination of a JMP, CALL, INT, RET or IRET must be in the defined limit of a code segment or an exception 13 fault (general protection violation) will occur.
k. If CPL ≤ IOPL
l. If CPL > IOPL
m. LOCK is automatically asserted, regardless of the presence or absence of the LOCK prefix.
n. The 80376 uses an early-out multiply algorithm. The actual number of clocks depends on the position of the most significant bit in the operand (multiplier). Clock counts given are minimum to maximum. To calculate actual clocks use the following formula:
     Actual Clock = if m <> 0 then max([log2 |m|], 3) + 9 clocks;
                    if m = 0 then 12 clocks
     (where m is the multiplier)
o. An exception may occur, depending on the value of the operand.
p. LOCK is asserted during descriptor table accesses.

8.2 INSTRUCTION ENCODING

Overview

All instruction encodings are subsets of the general instruction format shown in Figure 8.1. Instructions consist of one or two primary opcode bytes, possibly an address specifier consisting of the "mod r/m" byte and "scaled index" byte, a displacement if required, and an immediate data field if required.

Within the primary opcode or opcodes, smaller encoding fields may be defined. These fields vary according to the class of operation. The fields define such information as direction of the operation, size of the displacements, register encoding, or sign extension.

Almost all instructions referring to an operand in memory have an addressing mode byte following the primary opcode byte(s). This byte, the mod r/m byte, specifies the address mode to be used. Certain encodings of the mod r/m byte indicate that a second addressing byte, the scale-index-base byte, follows the mod r/m byte to fully specify the addressing mode.

Addressing modes can include a displacement immediately following the mod r/m byte, or scaled index byte. If a displacement is present, the possible sizes are 8, 16 or 32 bits.

If the instruction specifies an immediate operand, the immediate operand follows any displacement bytes. The immediate operand, if specified, is always the last field of the instruction.

Figure 8.1 illustrates several of the fields that can appear in an instruction, such as the mod field and the r/m field, but the Figure does not show all fields. Several smaller fields also appear in certain instructions, sometimes within the opcode bytes themselves. Table 8.2 is a complete list of all fields appearing in the 80376 instruction set. Further ahead, following Table 8.2, are detailed tables for each field.

[Figure: fields in instruction order.
  opcode (one or two bytes; T represents an opcode bit)
  "mod r/m" byte: mod (bits 7–6), TTT (bits 5–3), r/m (bits 2–0)
  "s-i-b" byte: ss (bits 7–6), index (bits 5–3), base (bits 2–0)
  (the "mod r/m" and "s-i-b" bytes together form the register and address mode specifier)
  address displacement: d32 | 16 | 8 | none (4, 2, 1 bytes or none)
  immediate data: data32 | 16 | 8 | none (4, 2, 1 bytes or none)]

Figure 8.1. General Instruction Format

Table 8.2.
Fields within 80376 Instructions

Field Name | Description | Number of Bits
w | Specifies if Data is Byte or Full Size (Full Size is either 16 or 32 Bits) | 1
d | Specifies Direction of Data Operation | 1
s | Specifies if an Immediate Data Field Must be Sign-Extended | 1
reg | General Register Specifier | 3
mod r/m | Address Mode Specifier (Effective Address can be a General Register) | 2 for mod; 3 for r/m
ss | Scale Factor for Scaled Index Address Mode | 2
index | General Register to be used as Index Register | 3
base | General Register to be used as Base Register | 3
sreg2 | Segment Register Specifier for CS, SS, DS, ES | 2
sreg3 | Segment Register Specifier for CS, SS, DS, ES, FS, GS | 3
tttn | For Conditional Instructions, Specifies a Condition Asserted or a Condition Negated | 4

Note: Table 8.1 shows encoding of individual instructions.

16-Bit Extensions of the Instruction Set

Two prefixes, the operand size prefix (66H) and the effective address size prefix (67H), allow overriding individually the default selection of operand size and effective address size. These prefixes may precede any opcode bytes and affect only the instruction they precede. If necessary, one or both of the prefixes may be placed before the opcode bytes. The operand size prefix (66H) selects 16-bit data operation, and the effective address size prefix (67H) selects 16-bit effective address calculation. For instructions with more than one prefix, the order of prefixes is unimportant.

Unless specified otherwise, instructions with 8-bit and 16-bit operands do not affect the contents of the high-order bits of the extended registers.

Encoding of reg Field When w Field is not Present in Instruction

reg Field | Register Selected with 66H Prefix | Register Selected During 32-Bit Data Operations
000 | AX | EAX
001 | CX | ECX
010 | DX | EDX
011 | BX | EBX
100 | SP | ESP
101 | BP | EBP
110 | SI | ESI
111 | DI | EDI
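The prefix rules and the reg table above can be illustrated with a short Python sketch. It is not part of the Intel specification: the names scan_size_prefixes and reg_name are hypothetical, and the scanner is deliberately simplified to recognize only the 66H and 67H prefixes (segment override and LOCK prefixes are ignored).

```python
# Register names taken from the table "Encoding of reg Field When w Field
# is not Present in Instruction".
REG16 = ("AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI")
REG32 = ("EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI")

OPERAND_SIZE_PREFIX = 0x66   # selects 16-bit data operation
ADDRESS_SIZE_PREFIX = 0x67   # selects 16-bit effective address calculation


def scan_size_prefixes(code):
    """Return (operand16, address16, opcode_offset) for a byte sequence.

    Simplifying assumption: only the 66H and 67H prefixes are recognized.
    Prefix order does not matter, as stated in the text.
    """
    operand16 = False
    address16 = False
    offset = 0
    while offset < len(code) and code[offset] in (OPERAND_SIZE_PREFIX,
                                                  ADDRESS_SIZE_PREFIX):
        if code[offset] == OPERAND_SIZE_PREFIX:
            operand16 = True
        else:
            address16 = True
        offset += 1
    return operand16, address16, offset


def reg_name(reg, operand16=False):
    """Map a 3-bit reg field to a register name (w field not present)."""
    if not 0 <= reg <= 7:
        raise ValueError("reg field must be a 3-bit value")
    return (REG16 if operand16 else REG32)[reg]
```

For instance, reg_name(0b100) selects ESP, while the same field with the 66H prefix in effect, reg_name(0b100, operand16=True), selects SP.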
Encoding of Instruction Fields

Within the instruction are several fields indicating register selection, addressing mode and so on.

ENCODING OF OPERAND LENGTH (w) FIELD

For any given instruction performing a data operation, the instruction will execute as a 32-bit operation. Within the constraints of the operation size, the w field encodes the operand size as either one byte or the full operation size, as shown in the table below.

w Field | Operand Size with 66H Prefix | Normal Operand Size
0 | 8 Bits | 8 Bits
1 | 16 Bits | 32 Bits

ENCODING OF THE GENERAL REGISTER (reg) FIELD

The general register is specified by the reg field, which may appear in the primary opcode bytes, or as the reg field of the "mod r/m" byte, or as the r/m field of the "mod r/m" byte.

Encoding of reg Field When w Field is Present in Instruction

Register Specified by reg Field with 66H Prefix

reg | Function of w Field (when w = 0) | (when w = 1)
000 | AL | AX
001 | CL | CX
010 | DL | DX
011 | BL | BX
100 | AH | SP
101 | CH | BP
110 | DH | SI
111 | BH | DI

Register Specified by reg Field without 66H Prefix

reg | Function of w Field (when w = 0) | (when w = 1)
000 | AL | EAX
001 | CL | ECX
010 | DL | EDX
011 | BL | EBX
100 | AH | ESP
101 | CH | EBP
110 | DH | ESI
111 | BH | EDI

ENCODING OF THE SEGMENT REGISTER (sreg) FIELD

The sreg field in certain instructions is a 2-bit field allowing one of the CS, DS, ES or SS segment registers to be specified. The sreg field in other instructions is a 3-bit field, allowing the FS and GS segment registers to be specified also.

2-Bit sreg2 Field

2-Bit sreg2 Field | Segment Register Selected
00 | ES
01 | CS
10 | SS
11 | DS

3-Bit sreg3 Field

3-Bit sreg3 Field | Segment Register Selected
000 | ES
001 | CS
010 | SS
011 | DS
100 | FS
101 | GS
110 | do not use
111 | do not use

ENCODING OF ADDRESS MODE

Except for special instructions, such as PUSH or POP, where the addressing mode is pre-determined, the addressing mode for the current instruction is specified by addressing bytes following the primary opcode. The primary addressing byte is the "mod r/m" byte, and a second byte of addressing information, the "s-i-b" (scale-index-base) byte, can be specified.

The s-i-b byte (scale-index-base byte) is specified when using 32-bit addressing mode and the "mod r/m" byte has r/m = 100 and mod = 00, 01 or 10. When the s-i-b byte is present, the 32-bit addressing mode is a function of the mod, ss, index, and base fields.

The primary addressing byte, the "mod r/m" byte, also contains three bits (shown as TTT in Figure 8.1) sometimes used as an extension of the primary opcode. The three bits, however, may also be used as a register field (reg).

When calculating an effective address, either 16-bit addressing or 32-bit addressing is used. 16-bit addressing uses 16-bit address components to calculate the effective address while 32-bit addressing uses 32-bit address components to calculate the effective address. When 16-bit addressing is used, the "mod r/m" byte is interpreted as a 16-bit addressing mode specifier. When 32-bit addressing is used, the "mod r/m" byte is interpreted as a 32-bit addressing mode specifier.

Tables on the following three pages define all encodings of all 16-bit addressing modes and 32-bit addressing modes.
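As an illustration of the rules above, the following Python sketch splits the "mod r/m" and "s-i-b" bytes into their fields and tests whether an s-i-b byte follows. It is a sketch under stated assumptions rather than a reference decoder: the function names split_modrm, needs_sib and split_sib are hypothetical, and the bit positions (bits 7–6, 5–3, 2–0) are those shown in Figure 8.1.

```python
def split_modrm(byte):
    """Split a "mod r/m" byte into (mod, reg, r/m).

    The middle three bits are returned as reg; in some instructions they
    act as an opcode extension instead (shown as TTT in Figure 8.1).
    """
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm


def needs_sib(modrm, address16=False):
    """Return True if an "s-i-b" byte follows the "mod r/m" byte.

    Per the text: 32-bit addressing mode, r/m = 100 and mod = 00, 01 or 10.
    address16 is True when the 67H prefix selects 16-bit addressing.
    """
    mod, _, rm = split_modrm(modrm)
    return (not address16) and rm == 0b100 and mod != 0b11


def split_sib(byte):
    """Split an "s-i-b" byte into (ss, index, base)."""
    ss = (byte >> 6) & 0b11
    index = (byte >> 3) & 0b111
    base = byte & 0b111
    return ss, index, base
```

For example, the byte 44H (01 000 100) has mod = 01 and r/m = 100, so in 32-bit addressing mode an s-i-b byte follows; the byte C4H (11 000 100) has mod = 11 and therefore names a register, with no s-i-b byte.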
Encoding of Normal Address Mode with "mod r/m" byte (no "s-i-b" byte present):

mod r/m | Effective Address
00 000 | DS:[EAX]
00 001 | DS:[ECX]
00 010 | DS:[EDX]
00 011 | DS:[EBX]
00 100 | s-i-b is present
00 101 | DS:d32
00 110 | DS:[ESI]
00 111 | DS:[EDI]
01 000 | DS:[EAX + d8]
01 001 | DS:[ECX + d8]
01 010 | DS:[EDX + d8]
01 011 | DS:[EBX + d8]
01 100 | s-i-b is present
01 101 | SS:[EBP + d8]
01 110 | DS:[ESI + d8]
01 111 | DS:[EDI + d8]
10 000 | DS:[EAX + d32]
10 001 | DS:[ECX + d32]
10 010 | DS:[EDX + d32]
10 011 | DS:[EBX + d32]
10 100 | s-i-b is present
10 101 | SS:[EBP + d32]
10 110 | DS:[ESI + d32]
10 111 | DS:[EDI + d32]
11 000 through 11 111 | register (see below)

Register Specified by reg or r/m during Normal Data Operations:

mod r/m | Function of w Field (when w = 0) | (when w = 1)
11 000 | AL | EAX
11 001 | CL | ECX
11 010 | DL | EDX
11 011 | BL | EBX
11 100 | AH | ESP
11 101 | CH | EBP
11 110 | DH | ESI
11 111 | BH | EDI

Register Specified by reg or r/m during 16-Bit Data Operations (66H Prefix):

mod r/m | Function of w Field (when w = 0) | (when w = 1)
11 000 | AL | AX
11 001 | CL | CX
11 010 | DL | DX
11 011 | BL | BX
11 100 | AH | SP
11 101 | CH | BP
11 110 | DH | SI
11 111 | BH | DI

Encoding of 16-bit Address Mode with "mod r/m" Byte Using 67H Prefix

mod r/m | Effective Address
00 000 | DS:[BX + SI]
00 001 | DS:[BX + DI]
00 010 | SS:[BP + SI]
00 011 | SS:[BP + DI]
00 100 | DS:[SI]
00 101 | DS:[DI]
00 110 | DS:d16
00 111 | DS:[BX]
01 000 | DS:[BX + SI + d8]
01 001 | DS:[BX + DI + d8]
01 010 | SS:[BP + SI + d8]
01 011 | SS:[BP + DI + d8]
01 100 | DS:[SI + d8]
01 101 | DS:[DI + d8]
01 110 | SS:[BP + d8]
01 111 | DS:[BX + d8]
10 000 | DS:[BX + SI + d16]
10 001 | DS:[BX + DI + d16]
10 010 | SS:[BP + SI + d16]
10 011 | SS:[BP + DI + d16]
10 100 | DS:[SI + d16]
10 101 | DS:[DI + d16]
10 110 | SS:[BP + d16]
10 111 | DS:[BX + d16]
11 000 through 11 111 | register (see the register tables for mod = 11)

Encoding of 32-bit Address Mode ("mod r/m" byte and "s-i-b" byte present):

mod base | Effective Address
00 000 | DS:[EAX + (scaled index)]
00 001 | DS:[ECX + (scaled index)]
00 010 | DS:[EDX + (scaled index)]
00 011 | DS:[EBX + (scaled index)]
00 100 | SS:[ESP + (scaled index)]
00 101 | DS:[d32 + (scaled index)]
00 110 | DS:[ESI + (scaled index)]
00 111 | DS:[EDI + (scaled index)]
01 000 | DS:[EAX + (scaled index) + d8]
01 001 | DS:[ECX + (scaled index) + d8]
01 010 | DS:[EDX + (scaled index) + d8]
01 011 | DS:[EBX + (scaled index) + d8]
01 100 | SS:[ESP + (scaled index) + d8]
01 101 | SS:[EBP + (scaled index) + d8]
01 110 | DS:[ESI + (scaled index) + d8]
01 111 | DS:[EDI + (scaled index) + d8]
10 000 | DS:[EAX + (scaled index) + d32]
10 001 | DS:[ECX + (scaled index) + d32]
10 010 | DS:[EDX + (scaled index) + d32]
10 011 | DS:[EBX + (scaled index) + d32]
10 100 | SS:[ESP + (scaled index) + d32]
10 101 | SS:[EBP + (scaled index) + d32]
10 110 | DS:[ESI + (scaled index) + d32]
10 111 | DS:[EDI + (scaled index) + d32]

ss | Scale Factor
00 | x1
01 | x2
10 | x4
11 | x8

index | Index Register
000 | EAX
001 | ECX
010 | EDX
011 | EBX
100 | no index reg**
101 | EBP
110 | ESI
111 | EDI

NOTE: Mod field in "mod r/m" byte; ss, index, base fields in "s-i-b" byte.

**IMPORTANT NOTE: When index field is 100, indicating "no index register," then ss field MUST equal 00. If index is 100 and ss does not equal 00, the effective address is undefined.

ENCODING OF OPERATION DIRECTION (d) FIELD

In many two-operand instructions the d field is present to indicate which operand is considered the source and which is the destination.
d | Direction of Operation
0 | Register/Memory <-- Register. "reg" Field Indicates Source Operand; "mod r/m" or "mod ss index base" Indicates Destination Operand
1 | Register <-- Register/Memory. "reg" Field Indicates Destination Operand; "mod r/m" or "mod ss index base" Indicates Source Operand

ENCODING OF SIGN-EXTEND (s) FIELD

The s field occurs primarily in instructions with immediate data fields. The s field has an effect only if the size of the immediate data is 8 bits and is being placed in a 16-bit or 32-bit destination.

s | Effect on Immediate Data8 | Effect on Immediate Data 16 | 32
0 | None | None
1 | Sign-Extend Data8 to Fill 16-Bit or 32-Bit Destination | None

ENCODING OF CONDITIONAL TEST (tttn) FIELD

For the conditional instructions (conditional jumps and set on condition), tttn is encoded with n indicating to use the condition (n = 0) or its negation (n = 1), and ttt giving the condition to test.

Mnemonic | Condition | tttn
O | Overflow | 0000
NO | No Overflow | 0001
B/NAE | Below/Not Above or Equal | 0010
NB/AE | Not Below/Above or Equal | 0011
E/Z | Equal/Zero | 0100
NE/NZ | Not Equal/Not Zero | 0101
BE/NA | Below or Equal/Not Above | 0110
NBE/A | Not Below or Equal/Above | 0111
S | Sign | 1000
NS | Not Sign | 1001
P/PE | Parity/Parity Even | 1010
NP/PO | Not Parity/Parity Odd | 1011
L/NGE | Less Than/Not Greater or Equal | 1100
NL/GE | Not Less Than/Greater or Equal | 1101
LE/NG | Less Than or Equal/Not Greater Than | 1110
NLE/G | Not Less or Equal/Greater Than | 1111

ENCODING OF CONTROL OR DEBUG REGISTER (eee) FIELD

The eee field is used for the loading and storing of the Control and Debug registers.

When Interpreted as Control Register Field

eee Code | Reg Name
000 | CR0
010 | Reserved
011 | Reserved
Do not use any other encoding

When Interpreted as Debug Register Field

eee Code | Reg Name
000 | DR0
001 | DR1
010 | DR2
011 | DR3
110 | DR6
111 | DR7
Do not use any other encoding

9.0 REVISION HISTORY

The sections significantly revised since version -003 are:

Section 1.0: Added FLT pin.
Section 4.4: Added description of FLOAT operation and ONCE Mode. Figure 4.20 is new.
Section 4.6: Added revision identifier information for change to CHMOS IV manufacturing process.
Section 5.0: Both packages now specified for 0°C–115°C case temperature operation. Thermal resistance values changed.
Section 6.3: ICC Max. specifications changed from 400 mA (cold) and 360 mA (hot) to 275 mA (cold, 16 MHz) and 305 mA (cold, 20 MHz).
Section 6.4: HLDA Valid Delay, t14, min. changed from 6 ns to 4 ns. Added 20 MHz A.C. specifications in Table 6.5. Replaced Capacitive Derating Curves in Figures 6.8–6.10 to reflect new manufacturing process. Replaced ICC vs. Frequency data (Figure 6.11) to reflect new specifications.

The sections significantly revised since version -002 are:

Section 1.0: Modified Table 1.1 to list pins in alphabetical order.

The sections significantly revised since version -001 are:

Section 2.0: Figure 2.0 was updated to show the 16-bit registers SI, DI, BP and SP.
Section 2.1: Figure 2.2 was updated to show the correct bit polarity for bit 4 in the CR0 register.
Section 2.1: Tables 2.1 and 2.2 were updated to include additional information on the EFLAGs and CR0 registers.
Section 2.3: Figure 2.3 was updated to more accurately reflect the addressing mechanism of the 80376.
Section 2.6: In the subsection Maskable Interrupt a paragraph was added to describe the effect of interrupt gates on the IF EFLAGs bit.
Section 2.8: Table 2.7 was updated to reflect the correct power up condition of the CR0 register.
Section 2.10: Figure 2.6 was updated to show the correct bit positions of the BT, BS and BD bits in the DR6 register.
Section 3.0: Figure 3.1 was updated to clearly show the address calculation process.
Section 3.2: The subsection DESCRIPTORS was elaborated upon to clearly define the relationship between the linear address space and physical address space of the 80376.
Section 3.2: Figures 3.3 and 3.4 were updated to show the AVL bit field.
Section 3.3: The last sentence in the first paragraph of subsection PROTECTION AND I/O PERMISSION BIT MAP was deleted. This was an incorrect statement.
Section 4.1: In the subsection ADDRESS BUS (BHE, BLE, A23–A1) the last sentence in the first paragraph was updated to reflect the numerics operand addresses as 8000FCH and 8000FEH. Because the 80376 sometimes does a double word I/O access, a second access to 8000FEH can be seen.
Section 4.1: The subsection Hold Latencies was updated to describe how 32-bit and unaligned accesses are internally locked but do not assert the LOCK signal.
Section 4.2: Table 4.6 was updated to show the correct active data bits during a BLE assertion.
Section 4.4: This section was updated to correctly reflect the pipelining of the address and status of the 80376 as opposed to "Address Pipelining" which occurs on processors such as the 80286.
Section 4.6: Table 4.7 was updated to show the correct Revision number, 05H.
Section 4.7: Table 4.8 was updated to show the numerics operand register 8000FEH. This address is seen when the 80376 does a DWORD operation to the port address 8000FCH.
Section 5.0: In the first paragraph the case temperatures were updated to reflect 0°C–115°C for the ceramic package and 0°C–110°C for the plastic package.
Section 6.2: Table 6.2 was updated to reflect the Case Temperature under Bias specification of -65°C to 120°C.
Section 6.4: Figure 6.8 vertical axis was updated to reflect "Output Valid Delay (ns)".
Section 6.4: Figure 6.11 was updated to show typical ICC vs Frequency for the 80376.
Section 8.1: The clock counts and opcodes for various instructions were updated to their correct values.
Section 8.2: The section INSTRUCTION ENCODING was appended to the data sheet.