Intel387™ SX MATH COPROCESSOR

• New Automatic Power Management
  - Low Power Consumption
  - Typically 100 mA in Dynamic Mode, and 4 mA in Idle Mode
• Socket Compatible with Intel387 Family of Math CoProcessors
  - Hardware and Software Compatible
  - Supported by Over 2100 Commercial Software Packages
  - 10% to 15% Performance Increase on Whetstone and Livermore Benchmarks
• Compatible with the Intel386™ SX Microprocessor
  - Extends CPU Instruction Set to Include Trigonometric, Logarithmic, and Exponential
• High Performance 80-Bit Internal Architecture
• Implements ANSI/IEEE Standard 754-1985 for Binary Floating-Point Arithmetic
• Available in a 68-Pin PLCC Package
  See Intel Packaging Specification, Order #231369

The Intel387™ SX Math CoProcessor is an extension to the Intel386™ SX microprocessor architecture. The combination of the Intel387™ SX with the Intel386™ SX microprocessor dramatically increases the processing speed of computer application software that utilizes high performance floating-point operations. An internal Power Management Unit enables the Intel387™ SX to perform these floating-point operations while maintaining very low power consumption for portable and desktop applications. The internal Power Management Unit effectively reduces power consumption by 95% when the device is idle.

The Intel387™ SX Math CoProcessor is available in a 68-pin PLCC package, and is manufactured on Intel's advanced 1.0 micron CHMOS IV technology.

Intel386 and Intel387 are trademarks of Intel Corporation.
*Other brands and names are the property of their respective owners.

Information in this document is provided in connection with Intel products. Intel assumes no liability whatsoever, including infringement of any patent or copyright, for sale and use of Intel products except as provided in Intel's Terms and Conditions of Sale for such products. Intel retains the right to make changes to these specifications at any time, without notice.
Microcomputer Products may have minor variations to this specification known as errata.

January 1994
COPYRIGHT © INTEL CORPORATION, 1995
Order Number: 240225-009

Intel387™ SX Math CoProcessor

CONTENTS

1.0 PIN ASSIGNMENT
  1.1 Pin Description Table
2.0 FUNCTIONAL DESCRIPTION
  2.1 Feature List
  2.2 Math CoProcessor Architecture
  2.3 Power Management
    2.3.1 Dynamic Mode
    2.3.2 Idle Mode
  2.4 Compatibility
  2.5 Performance
3.0 PROGRAMMING INTERFACE
  3.1 Instruction Set
    3.1.1 Data Transfer Instructions
    3.1.2 Arithmetic Instructions
    3.1.3 Comparison Instructions
    3.1.4 Transcendental Instructions
    3.1.5 Load Constant Instructions
    3.1.6 Processor Instructions
  3.2 Register Set
    3.2.1 Status Word (SW) Register
    3.2.2 Control Word (CW) Register
    3.2.3 Data Register
    3.2.4 Tag Word (TW) Register
    3.2.5 Instruction and Data Pointers
  3.3 Data Types
  3.4 Interrupt Description
  3.5 Exception Handling
  3.6 Initialization
  3.7 Processing Modes
  3.8 Programming Support
4.0 HARDWARE SYSTEM INTERFACE
  4.1 Signal Description
    4.1.1 Intel386 CPU Clock 2 (CPUCLK2)
    4.1.2 Intel387 Math CoProcessor Clock 2 (NUMCLK2)
    4.1.3 Clocking Mode (CKM)
    4.1.4 System Reset (RESETIN)
    4.1.5 Processor Request (PEREQ)
    4.1.6 Busy Status (BUSY#)
    4.1.7 Error Status (ERROR#)
    4.1.8 Data Pins (D15–D0)
    4.1.9 Write/Read Bus Cycle (W/R#)
    4.1.10 Address Strobe (ADS#)
    4.1.11 Bus Ready Input (READY#)
    4.1.12 Ready Output (READYO#)
    4.1.13 Status Enable (STEN)
    4.1.14 Math CoProcessor Select 1 (NPS1#)
    4.1.15 Math CoProcessor Select 2 (NPS2)
    4.1.16 Command (CMD0#)
    4.1.17 System Power (VCC)
    4.1.18 System Ground (VSS)
  4.2 System Configuration
  4.3 Math CoProcessor Architecture
    4.3.1 Bus Control Logic
    4.3.2 Data Interface and Control Unit
    4.3.3 Floating Point Unit
    4.3.4 Power Management Unit
  4.4 Bus Cycles
    4.4.1 Intel387 SX Math CoProcessor Addressing
    4.4.2 CPU/Math CoProcessor Synchronization
    4.4.3 Synchronous/Asynchronous Modes
    4.4.4 Automatic Bus Cycle Termination
5.0 BUS OPERATION
  5.1 Non-pipelined Bus Cycles
    5.1.1 Write Cycle
    5.1.2 Read Cycle
  5.2 Pipelined Bus Cycles
  5.3 Mixed Bus Cycles
  5.4 BUSY# and PEREQ Timing Relationship
6.0 PACKAGE SPECIFICATIONS
  6.1 Mechanical Specifications
  6.2 Thermal Specifications
7.0 ELECTRICAL CHARACTERISTICS
  7.1 Absolute Maximum Ratings
  7.2 D.C. Characteristics
  7.3 A.C. Characteristics
8.0 Intel387 SX MATH COPROCESSOR INSTRUCTION SET
APPENDIX A - Intel387 SX MATH COPROCESSOR COMPATIBILITY
  A.1 8087/80287 Compatibility
    A.1.1 General Differences
    A.1.2 Exceptions
APPENDIX B - COMPATIBILITY BETWEEN THE 80287 AND 8087 MATH COPROCESSOR

FIGURES

Figure 1-1   Intel387 SX Math CoProcessor Pinout
Figure 2-1   Intel387 SX Math CoProcessor Block Diagram
Figure 3-1   Intel386 SX CPU and Intel387 Math CoProcessor Register Set
Figure 3-2   Status Word
Figure 3-3   Control Word
Figure 3-4   Tag Word Register
Figure 3-5   Instruction and Data Pointer Image in Memory, 32-Bit Protected Mode Format
Figure 3-6   Instruction and Data Pointer Image in Memory, 16-Bit Protected Mode Format
Figure 3-7   Instruction and Data Pointer Image in Memory, 32-Bit Real Mode Format
Figure 3-8   Instruction and Data Pointer Image in Memory, 16-Bit Real Mode Format
Figure 4-1   Intel386 SX CPU and Intel387 SX Math CoProcessor System Configuration
Figure 5-1   Bus State Diagram
Figure 5-2   Non-Pipelined Read and Write Cycles
Figure 5-3   Fastest Transition to and from Pipelined Cycles
Figure 5-4   Pipelined Cycles with Wait States
Figure 5-5   BUSY# and PEREQ Timing Relationship
Figure 7-1a  Typical Output Valid Delay vs Load Capacitance at Max Operating Temperature
Figure 7-1b  Typical Output Slew Time vs Load Capacitance at Max Operating Temperature
Figure 7-1c  Maximum ICC vs Frequency
Figure 7-2   CPUCLK2/NUMCLK2 Waveform and Measurement Points for Input/Output
Figure 7-3   Output Signals
Figure 7-4   Input and I/O Signals
Figure 7-5   RESET Signal
Figure 7-6   Float from STEN
Figure 7-7   Other Parameters

TABLES

Table 1-1   Pin Cross Reference - Functional Grouping
Table 3-1   Condition Code Interpretation
Table 3-2   Condition Code Interpretation after FPREM and FPREM1 Instructions
Table 3-3   Condition Code Resulting from Comparison
Table 3-4   Condition Code Defining Operand Class
Table 3-5   Mapping Condition Codes to Intel386 CPU Flag Bits
Table 3-6   Intel387 SX Math CoProcessor Data Type Representation in Memory
Table 3-7   CPU Interrupt Vectors Reserved for Math CoProcessor
Table 3-8   Intel387 SX Math CoProcessor Exceptions
Table 4-1   Pin Summary
Table 4-2   Output Pin Status during Reset
Table 4-3   Bus Cycle Definition
Table 6-1   Thermal Resistances (°C/Watt) θJC and θJA
Table 6-2   Maximum TA at Various Airflows
Table 7-1   D.C. Specifications
Table 7-2a  Timing Requirements of the Bus Interface Unit
Table 7-2b  Timing Requirements of the Execution Unit
Table 7-2c  Other AC Parameters
Table 8-1   Instruction Formats

1.0 PIN ASSIGNMENT

The Intel387 SX Math CoProcessor pinout as viewed from the top side of the component is shown in Figure 1-1. VCC and VSS (GND) connections must be made to multiple pins. The circuit board should include VCC and VSS planes for power distribution, and all VCC and VSS pins must be connected to the appropriate plane.

NOTE: Pins identified as N.C. should remain completely unconnected.

Figure 1-1.
Intel387™ SX Math CoProcessor Pinout

Table 1-1. Pin Cross Reference - Functional Grouping

Signal    Pin(s)
BUSY#     36
PEREQ     56
ERROR#    35
ADS#      47
CMD0#     48
NPS1#     44
NPS2      45
STEN      40
W/R#      41
READY#    49
READYO#   57
CKM       59
CPUCLK2   54
NUMCLK2   53
RESETIN   51
D00       19
D01       20
D02       23
D03       8
D04       7
D05       6
D06       3
D07       2
D08       24
D09       28
D10       29
D11       30
D12       16
D13       15
D14       12
D15       11
VCC       4, 9, 13, 22, 26, 31, 33, 37, 39, 43, 46, 50, 58, 62, 64
VSS       5, 14, 21, 25, 27, 32, 34, 38, 42, 55, 60, 61, 63, 66
N.C.      1, 10, 17, 18, 52, 65, 67, 68

1.1 Pin Description Table

The following table gives a brief description of each pin on the Intel387 SX Math CoProcessor. For a more complete description refer to Section 4.1, Signal Description. The following definitions are used in these descriptions:

#     The signal is active LOW.
I     Input Signal
O     Output Signal
I/O   Input and Output Signal

Symbol    Type  Name and Function
ADS#      I     ADDRESS STROBE indicates that the address and bus cycle definition is valid.
BUSY#     O     BUSY indicates that the Math CoProcessor is currently executing an instruction.
CKM       I     CLOCKING MODE is used to select synchronous or asynchronous clock modes.
CMD0#     I     COMMAND determines whether an opcode or operand is being sent to the Math CoProcessor. During a read cycle it indicates which register group is being read.
CPUCLK2   I     CPU CLOCK input provides the timing for the bus interface unit and the execution unit in synchronous mode.
D15–D0    I/O   DATA BUS is used to transfer instructions and data between the Math CoProcessor and CPU.
ERROR#    O     ERROR signals that an unmasked exception has occurred.
NC        -     NO CONNECT should always remain unconnected. Connection of an N.C. pin may cause the Math CoProcessor to malfunction or be incompatible with future steppings.
NPS1#     I     NPX SELECT 1 is used to select the Math CoProcessor.
NPS2      I     NPX SELECT 2 is used to select the Math CoProcessor.
NUMCLK2   I     NUMERICS CLOCK is used in asynchronous mode to drive the Floating Point Execution Unit.
PEREQ     O     PROCESSOR EXTENSION REQUEST signals the CPU that the Math CoProcessor is ready for data transfer to/from its FIFO.
READY#    I     READY indicates that the bus cycle is being terminated.
READYO#   O     READY OUT signals the CPU that the Math CoProcessor is terminating the bus cycle.
RESETIN   I     SYSTEM RESET terminates any operation in progress and forces the Math CoProcessor to enter a dormant state.
STEN      I     STATUS ENABLE serves as a master chip select for the Math CoProcessor. When inactive, this pin forces all outputs and bi-directional pins into a floating state.
W/R#      I     WRITE/READ indicates whether the CPU bus cycle in progress is a read or a write cycle.
VCC       I     SYSTEM POWER provides the +5V nominal D.C. supply input.
VSS       I     SYSTEM GROUND provides the 0V connection from which all inputs and outputs are measured.

2.0 FUNCTIONAL DESCRIPTION

The Intel387 SX Math CoProcessor is designed to support the Intel386 SX Microprocessor and effectively extend the CPU architecture by providing fast execution of arithmetic instructions and transcendental functions. This component contains internal power management circuitry for reduced active power dissipation and an automatic idle mode.

2.1 Feature List

• New power saving design provides low power dissipation in active and idle modes.
• Higher Performance, 10%–25% higher benchmark performance than the original Intel387 SX Math CoProcessor.
• High Performance 84-bit Internal Architecture
• Eight 80-bit Numeric Registers, usable as individually addressable general registers or as a register stack.
• Full-range transcendental operations for SINE, COSINE, TANGENT, ARCTANGENT, and LOGARITHM.
• Programmable rounding modes and notification of rounding effects.
• Exception reporting either by software polling or hardware interrupts.
• Fully compatible with the SX Microprocessors.
• Expands Intel386 SX CPU data types to include 32-bit, 64-bit, and 80-bit Floating Point; 32-bit and 64-bit Integers; and 18 Digit BCD Operands.
• Directly extends the Intel386 SX CPU Instruction Set to trigonometric, logarithmic, exponential, and arithmetic functions for all data types.
• Operates independently of Real, Protected, and Virtual-86 Modes of the Intel386 SX Microprocessors.
• Fully compatible with the Intel387 SL Mobile and DX Math CoProcessors. Implements all Intel387 Math CoProcessor architectural enhancements over 8087 and 80287.
• Implements ANSI/IEEE Standard 754-1985 for binary floating point arithmetic.
• Upward Object Code compatible from 8087 and 80287.

2.2 Math CoProcessor Architecture

As shown in Figure 2-1, the Intel387 SX Math CoProcessor is internally divided into four sections: the Bus Control Logic, the Data Interface and Control Unit, the Floating Point Unit, and the Power Management Unit. The Bus Control Logic is responsible for the CPU bus tracking and interface. The Data Interface and Control Unit latches data and decodes instructions. The Floating Point Unit executes the mathematical instructions. The Power Management Unit is new to the Intel387 family and is the nucleus of the static architecture. It is responsible for shutting down idle sections of the device to save power.

Figure 2-1. Intel387™ SX Math CoProcessor Block Diagram

Microprocessor/Math CoProcessor Interface

The Intel386 CPU interprets the pattern 11011B in the five most significant bits of an instruction (first opcode byte D8H–DFH) as an opcode intended for a math coprocessor. Instructions thus marked are called ESCAPE or ESC instructions. Upon decoding the instruction as an ESC instruction, the Intel386 CPU transfers the opcode to the math coprocessor through an I/O write cycle at a dedicated address (8000F8H) outside the normal programmed I/O address range.
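The escape-pattern test described above can be sketched in a few lines of Python. This is only an illustration of the bit pattern, not a model of the CPU decoder: the names `ESC_PATTERN`, `COPROCESSOR_OPCODE_PORT`, and `is_esc_opcode` are hypothetical, and the sketch assumes that only the first opcode byte is examined.

```python
# Illustrative sketch only; names are hypothetical, not taken from the datasheet.
ESC_PATTERN = 0b11011                # five most significant bits of an ESC opcode byte
COPROCESSOR_OPCODE_PORT = 0x8000F8   # dedicated I/O address quoted in the text


def is_esc_opcode(first_byte: int) -> bool:
    """Return True if the byte carries the 11011B escape pattern in bits 7-3."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("opcode byte must be in the range 0..255")
    return (first_byte >> 3) == ESC_PATTERN


# Every byte value that the rule classifies as an ESC opcode.
esc_bytes = [b for b in range(256) if is_esc_opcode(b)]
```

Under this rule exactly eight byte values qualify, D8H through DFH; the remaining three bits of the byte are left free for the coprocessor opcode itself.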
The math coprocessor has dedicated output signals for controlling the data transfer and notifying the CPU if the Math CoProcessor is busy or that a floating point error has occurred.

2.3 Power Management

The Intel387 SX Math CoProcessor offers two modes of power management: dynamic and idle.

2.3.1 DYNAMIC MODE

Dynamic Mode is when the device is executing an instruction. Using Intel's CHMOS IV technology, the Intel387 SX Math CoProcessor draws considerably less power than its predecessor. The active power supply current is reduced to approximately 100 mA at 20 MHz and provides low case temperatures.

2.3.2 IDLE MODE

When an instruction is not being executed, the Intel387 SX Math CoProcessor will automatically change to Idle Mode. Three clocks after completion of the previous instruction, the internal power manager shuts down the floating point execution unit and all non-essential circuitry. Only portions of the Bus Interface Unit remain active to monitor the CPU bus activity and to accept the next instruction when it is transferred. When the CPU transfers the next instruction to the Math CoProcessor, the Intel387 SX Math CoProcessor accepts the instruction and ramps the internal core within one clock so there is no impact to performance or throughput. In idle mode, the Intel387 SX Math CoProcessor typically draws 4 mA of current and reduces case temperature to near ambient.

NOTE: In asynchronous clock mode (CKM = 0), the internal idle mode is disabled.

2.4 Compatibility

The Intel387 SX Math CoProcessor is compatible with the Intel387 SL Mobile Math CoProcessor. Due to the increased performance and internal pipelining effects, diagnostic programs should never use instruction execution time for test purposes.

2.5 Performance

The increased performance of floating point calculations can be attributed to the 84-bit architecture and floating point processor.
Without a math coprocessor, the CPU must execute floating point calculations through very long software emulation routines, with reduced resolution and accuracy. The performance of the Intel387 SX Math CoProcessor has been further enhanced through improvements in the internal microcode and through internal architectural changes. These refinements will increase Whetstone benchmarks by approximately 10% to 25% over the original Intel387 SX Math CoProcessor. Real performance, however, should be measured with application software. Depending upon software coding, system overhead, and percentage of floating point instructions, performance can vary significantly.

3.0 PROGRAMMING INTERFACE

The Intel387 SX Math CoProcessor effectively extends an Intel386 Microprocessor system with additional instructions, registers, data types, and interrupts specifically designed to facilitate high-speed floating point processing. All communication between the CPU and the Math CoProcessor is transparent to applications software. The CPU automatically controls the Math CoProcessor whenever a numerics instruction is executed. All physical memory and virtual memory of the CPU are available for storage of the instructions and operands of programs that use the Math CoProcessor. All memory addressing modes, including use of displacement, base register, index register, and scaling, are available for addressing numerical operands.

The Intel387 SX Math CoProcessor is software compatible with the Intel387 DX Math CoProcessors and supports all applications written for the Intel386 CPU and Intel387 Math CoProcessors.

3.1 Instruction Set

The Intel386 CPU interprets the pattern 11011B in the five most significant bits of an instruction as an opcode intended for a math coprocessor. Instructions thus marked are called ESCAPE or ESC instructions.

The typical Math CoProcessor instruction accepts one or two operands and produces one or sometimes two results.
In two-operand instructions, one operand is the contents of a Math CoProcessor register, while the other may be a memory location. The operands of some instructions are predefined; for example, FSQRT always takes the square root of the number in the top stack element.

The Intel387 SX Math CoProcessor instruction set can be divided into six groups. The following sections give a brief description of each instruction. Section 8.0 defines the instruction format and byte fields. Further details can be obtained from the Intel387 User's Manual, Programmer's Reference, Order #231917.

3.1.1 DATA TRANSFER INSTRUCTIONS

This class includes the operations that load, store, and convert operands of any supported data type.

Real Transfers
FLD       Load Real (single, double, extended)
FST       Store Real (single, double)
FSTP      Store Real and pop (single, double, extended)
FXCH      Exchange registers

Integer Transfers
FILD      Load (convert from) Integer (word, short, long)
FIST      Store (convert to) Integer (word, short)
FISTP     Store (convert to) Integer and pop (word, short, long)

Packed Decimal Transfers
FBLD      Load (convert from) packed decimal
FBSTP     Store packed decimal and pop

3.1.2 ARITHMETIC INSTRUCTIONS

This class of instructions provides variations on the basic add, subtract, multiply, and divide operations and a number of other basic arithmetic operations. Operands may reside in registers or one operand may reside in memory.

Addition
FADD      Add Real
FADDP     Add Real and pop
FIADD     Add Integer

Subtraction
FSUB      Subtract Real
FSUBP     Subtract Real and pop
FISUB     Subtract Integer
FSUBR     Subtract Real reversed
FSUBRP    Subtract Real reversed and pop
FISUBR    Subtract Integer reversed

Multiplication
FMUL      Multiply Real
FMULP     Multiply Real and pop
FIMUL     Multiply Integer

Division
FDIV      Divide Real
FDIVP     Divide Real and pop
FIDIV     Divide Integer
FDIVR     Divide Real reversed
FDIVRP    Divide Real reversed and pop
FIDIVR    Divide Integer reversed

Other Operations
FSQRT     Square Root
FSCALE    Scale
FPREM     Partial Remainder
FPREM1    IEEE standard partial remainder
FRNDINT   Round to Integer
FXTRACT   Extract Exponent and Significand
FABS      Absolute Value
FCHS      Change sign

3.1.3 COMPARISON INSTRUCTIONS

Instructions of this class allow comparison of numbers of all supported real and integer data types. Each of these instructions analyzes the top stack element, often in relationship to another operand, and reports the result as a condition code in the status word.

FCOM      Compare Real
FCOMP     Compare Real and pop
FCOMPP    Compare Real and pop twice
FUCOM     Unordered compare Real
FUCOMP    Unordered compare Real and pop
FUCOMPP   Unordered compare Real and pop twice
FICOM     Compare Integer
FICOMP    Compare Integer and pop
FTST      Test
FXAM      Examine

3.1.4 TRANSCENDENTAL INSTRUCTIONS

This group of the Intel387 operations includes trigonometric, inverse trigonometric, logarithmic and exponential functions. The transcendental instructions operate on the top one or two stack elements, and they return their results to the stack. The trigonometric operations assume their arguments are expressed in radians. The logarithmic and exponential operations work in base 2.
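Because the logarithmic and exponential operations work in base 2, results in other bases are obtained by scaling with a constant. The following Python sketch illustrates the arithmetic only. The function names are hypothetical, Python floats are 64-bit rather than the coprocessor's 80-bit extended format, and the argument-range limits of the real instructions are ignored.

```python
import math


def fyl2x(y: float, x: float) -> float:
    """Arithmetic model of the FYL2X result: y * log2(x)."""
    return y * math.log2(x)


def log10_via_base2(x: float) -> float:
    # Scaling by log10(2) converts a base-2 logarithm to base 10.
    return fyl2x(math.log10(2.0), x)


def ln_via_base2(x: float) -> float:
    # Scaling by loge(2) converts a base-2 logarithm to a natural logarithm.
    return fyl2x(math.log(2.0), x)


def exp_via_base2(x: float) -> float:
    # e**x rewritten as a power of two: 2 ** (x * log2(e)).
    return 2.0 ** (x * math.log2(math.e))
```

The scale factors used here, log10(2), loge(2), and log2(e), are among the values provided by the load constant instructions of Section 3.1.5.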
FSIN      Sine
FCOS      Cosine
FSINCOS   Sine and cosine
FPTAN     Tangent
FPATAN    Arctangent of ST(1)/ST
F2XM1     2^X − 1
FYL2X     Y * log2(X)
FYL2XP1   Y * log2(X + 1)

3.1.5 LOAD CONSTANT INSTRUCTIONS

Each of these instructions loads (pushes) a commonly used constant onto the stack. The constants have extended real values nearest to the infinitely precise numbers. The only error that can be generated is an Invalid Exception if a stack overflow occurs.

FLDZ      Load +0.0
FLD1      Load +1.0
FLDPI     Load π
FLDL2T    Load log2(10)
FLDL2E    Load log2(e)
FLDLG2    Load log10(2)
FLDLN2    Load loge(2)

3.1.6 PROCESSOR INSTRUCTIONS (ADMINISTRATIVE)

FINIT     Initialize Math CoProcessor
FLDCW     Load Control Word
FSTCW     Store Control Word
FSTSW     Store Status Word
FSTSW AX  Store Status Word to AX register
FCLEX     Clear Exceptions
FSTENV    Store Environment
FLDENV    Load Environment
FSAVE     Save State
FRSTOR    Restore State
FINCSTP   Increment Stack pointer
FDECSTP   Decrement Stack pointer
FFREE     Free Register
FNOP      No Operation
FWAIT     Report Math CoProcessor Error

3.2 Register Set

Figure 3-1 shows the Intel387 SX Math CoProcessor register set. When a Math CoProcessor is present in a system, programmers may use these registers in addition to the registers normally available on the CPU.

[Figure: i386™ Microprocessor registers (32-bit general registers EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP with their 16-bit and 8-bit subregisters; 16-bit segment registers CS, SS, DS, ES, FS, GS; EIP; EFLAGS) shown beside the i387™ Math CoProcessor registers: eight data registers R0–R7, each with a sign bit (bit 79), exponent (bits 78–64), significand (bits 63–0), and a two-bit tag field; the 16-bit Control Register, Status Register, and Tag Word; and the 48-bit Instruction Pointer and Data Pointer, which are held in the CPU.]

Figure 3-1.
Intel386™ CPU and Intel387™ Math CoProcessor Register Set

3.2.1 STATUS WORD (SW) REGISTER

The 16-bit status word (in the status register) shown in Figure 3-2 reflects the overall state of the Math CoProcessor. It can be read and inspected by programs using the FSTSW memory or FSTSW AX instructions.

Bit 15, the Busy bit (B), is included for 8087 compatibility only. It always has the same value as the Error Summary bit (ES, bit 7 of the status word); it does not indicate the status of the BUSY# output of the Math CoProcessor.

Bits 13–11 (TOP) serve as the pointer to the Math CoProcessor data register that is the current Top-Of-Stack. The significance of the stack top is described in Section 3.2.3, Data Register.

The four numeric condition code bits (C3–C0, bits 14 and 10–8) are similar to the flags in a CPU; instructions that perform arithmetic operations update these bits to reflect the outcome. The effects of the instructions on the condition code are summarized in Tables 3-1 through 3-4. These condition code bits are used principally for conditional branching. The FSTSW AX instruction stores the Math CoProcessor status word directly to the CPU AX register, allowing the condition codes to be inspected efficiently by Intel386 CPU code. The Intel386 CPU SAHF instruction can copy C3–C0 directly to the flag bits to simplify conditional branching. Table 3-5 shows the mapping of these bits to the Intel386 CPU flag bits.

Bit 7 is the error summary (ES) status bit. This bit is set if any unmasked exception bit is set; it is clear otherwise. If this bit is set, the ERROR# signal is asserted.

Bit 6 is the stack flag (SF). This bit is used to distinguish invalid operations due to stack overflow or underflow from other kinds of invalid operations. When SF is set, bit 9 (C1) distinguishes between stack overflow (C1 = 1) and underflow (C1 = 0).
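The bit positions given above can be unpacked with shifts and masks. The Python sketch below is illustrative only: `StatusWord`, `decode_status_word`, and `stack_fault_kind` are hypothetical names, and the input is assumed to be a 16-bit value such as one stored by FSTSW. Bits 5–0 are kept as a raw field because the individual exception flags are covered in the text that follows.

```python
from typing import NamedTuple, Optional


class StatusWord(NamedTuple):
    busy: int             # bit 15 (B), mirrors ES
    c3: int               # bit 14
    top: int              # bits 13-11, current top-of-stack register number
    c2: int               # bit 10
    c1: int               # bit 9
    c0: int               # bit 8
    es: int               # bit 7, error summary
    sf: int               # bit 6, stack flag
    exception_flags: int  # bits 5-0, left undecoded in this sketch


def decode_status_word(sw: int) -> StatusWord:
    """Split a 16-bit status word into the fields named in Section 3.2.1."""
    if not 0 <= sw <= 0xFFFF:
        raise ValueError("status word must fit in 16 bits")
    return StatusWord(
        busy=(sw >> 15) & 1,
        c3=(sw >> 14) & 1,
        top=(sw >> 11) & 0b111,
        c2=(sw >> 10) & 1,
        c1=(sw >> 9) & 1,
        c0=(sw >> 8) & 1,
        es=(sw >> 7) & 1,
        sf=(sw >> 6) & 1,
        exception_flags=sw & 0x3F,
    )


def stack_fault_kind(sw: int) -> Optional[str]:
    """Return 'overflow' or 'underflow' when SF is set, otherwise None."""
    fields = decode_status_word(sw)
    if not fields.sf:
        return None
    return "overflow" if fields.c1 else "underflow"
```

For example, `decode_status_word(0xEAC1)` yields TOP = 5 with B, C3, C1, ES, and SF set, and `stack_fault_kind(0xEAC1)` reports an overflow.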
Bits 5–0 are the six exception flags of the status word and are set to indicate that during an instruction execution the Math CoProcessor has detected one of six possible exception conditions since these status bits were last cleared or reset. Section 3.5, Exception Handling, explains how they are set and used. The exception flags are "sticky" bits and can only be cleared by the instructions FINIT, FCLEX, FLDENV, FSAVE, and FRSTOR.

Note that when a new value is loaded into the status word by the FLDENV or FRSTOR instruction, the values of ES (bit 7) and B (bit 15) are not derived from the values loaded from memory but rather are dependent upon the values of the exception flags (bits 5–0) in the status word and their corresponding masks in the control word. If ES is set in such a case, the ERROR# output of the Math CoProcessor is activated immediately.

[Figure: status word bit layout.]

Notes to Figure 3-2:
ES is set if any unmasked exception bit is set; cleared otherwise.
See Tables 3-1 through 3-4 for interpretation of the condition code.
TOP values:
  000 = Register 0 is Top of Stack
  001 = Register 1 is Top of Stack
  ...
  111 = Register 7 is Top of Stack
For definitions of exceptions, refer to the section entitled "Exception Handling".

Figure 3-2. Status Word

Table 3-1.
Condition Code Interpretation

FPREM, FPREM1 (see Table 3-2)
  C0 (S), C3 (Z), C1 (A): three least significant bits of quotient
  C0 (S): Q2
  C3 (Z): Q1
  C1 (A): Q0 or O/U#
  C2 (C): Reduction (0 = complete, 1 = incomplete)

FCOM, FCOMP, FCOMPP, FTST, FUCOM, FUCOMP, FUCOMPP, FICOM, FICOMP
  C0 (S), C3 (Z): Result of comparison (see Table 3-3)
  C1 (A): Zero or O/U#
  C2 (C): Operand is not comparable (Table 3-3)

FXAM
  C0 (S), C3 (Z): Operand class (see Table 3-4)
  C1 (A): Sign or O/U#
  C2 (C): Operand class (Table 3-4)

FCHS, FABS, FXCH, FINCSTP, FDECSTP, Constant loads, FXTRACT, FLD, FILD, FBLD, FSTP (ext real)
  C0 (S), C3 (Z): UNDEFINED
  C1 (A): Zero or O/U#
  C2 (C): UNDEFINED

FIST, FBSTP, FRNDINT, FST, FSTP, FADD, FMUL, FDIV, FDIVR, FSUB, FSUBR, FSCALE, FSQRT, FPATAN, F2XM1, FYL2X, FYL2XP1
  C0 (S), C3 (Z): UNDEFINED
  C1 (A): Roundup or O/U#
  C2 (C): UNDEFINED

FPTAN, FSIN, FCOS, FSINCOS
  C0 (S), C3 (Z): UNDEFINED
  C1 (A): Roundup or O/U#, undefined if C2 = 1
  C2 (C): Reduction (0 = complete, 1 = incomplete)

FLDENV, FRSTOR
  C0, C3, C1, C2: Each bit loaded from memory

FLDCW, FSTENV, FSTCW, FSTSW, FCLEX, FINIT, FSAVE
  C0, C3, C1, C2: UNDEFINED

O/U#: When both IE and SF bits of the status word are set, indicating a stack exception, this bit distinguishes between stack overflow (C1 = 1) and underflow (C1 = 0).

Reduction: If FPREM or FPREM1 produces a remainder that is less than the modulus, reduction is complete. When reduction is incomplete the value at the top of the stack is a partial remainder, which can be used as input to further reduction. For FPTAN, FSIN, FCOS, and FSINCOS, the reduction bit is set if the operand at the top of the stack is too large. In this case the original operand remains at the top of the stack.

Roundup: When the PE bit of the status word is set, this bit indicates whether the last rounding in the instruction was upward.

UNDEFINED: Do not rely on finding any specific value in these bits.

Table 3-2.
Condition Code Interpretation after FPREM and FPREM1 Instructions

C2   C3 (Q1)   C1 (Q0)   C0 (Q2)   Interpretation after FPREM and FPREM1
1    X         X         X         Incomplete Reduction: further iteration required for complete reduction
0    0         0         0         Q MOD 8 = 0
0    0         1         0         Q MOD 8 = 1
0    1         0         0         Q MOD 8 = 2
0    1         1         0         Q MOD 8 = 3
0    0         0         1         Q MOD 8 = 4
0    0         1         1         Q MOD 8 = 5
0    1         0         1         Q MOD 8 = 6
0    1         1         1         Q MOD 8 = 7

Complete Reduction (C2 = 0): C0, C3, C1 contain the three least significant bits of the quotient.

Table 3-3. Condition Code Resulting from Comparison

Order             C3   C2   C0
TOP > Operand     0    0    0
TOP < Operand     0    0    1
TOP = Operand     1    0    0
Unordered         1    1    1

Table 3-4. Condition Code Defining Operand Class

C3   C2   C1   C0   Value at TOP
0    0    0    0    + Unsupported
0    0    0    1    + NaN
0    0    1    0    − Unsupported
0    0    1    1    − NaN
0    1    0    0    + Normal
0    1    0    1    + Infinity
0    1    1    0    − Normal
0    1    1    1    − Infinity
1    0    0    0    + 0
1    0    0    1    + Empty
1    0    1    0    − 0
1    0    1    1    − Empty
1    1    0    0    + Denormal
1    1    1    0    − Denormal

Table 3-5. Mapping Condition Codes to Intel386™ CPU Flag Bits

[Table 3-5 is a diagram in the original and is not reproduced in this text.]

3.2.2 CONTROL WORD (CW) REGISTER

The Math CoProcessor provides the programmer with several processing options that are selected by loading a control word from memory into the control register. Figure 3-3 shows the format and encoding of fields in the control word.

The low-order byte of the control word register is used to configure the exception masking. Bits 5–0 of the control word contain individual masks for each of the six exceptions that the Math CoProcessor recognizes. See Section 3.5, Exception Handling, for further explanation of the exception control and definition.

The high-order byte of the control word is used to configure the Math CoProcessor operating mode, including precision, rounding and infinity control.

• The rounding control (RC) field (bits 11–10) provides for directed rounding and true chop, as well as the unbiased round to nearest even mode specified in the IEEE standard. Rounding control affects only those instructions that perform rounding at the end of the operation (and thus can generate a precision exception); namely, FST, FSTP, FIST, all arithmetic instructions (except FPREM, FPREM1, FXTRACT, FABS, and FCHS) and all transcendental instructions.
• The precision control (PC) field (bits 9–8) can be used to set the Math CoProcessor internal operating precision of the significand at less than the default of 64 bits (extended precision). This can be useful in providing compatibility with early generation arithmetic processors of smaller precision. PC affects only the instructions FADD, FSUB(R), FMUL, FDIV(R), and FSQRT. For all other instructions, either the precision is determined by the opcode or extended precision is used.
• The "infinity control bit" (bit 12) is not meaningful to the Intel387 SX Math CoProcessor and programs must ignore its value. To maintain compatibility with the 8087 and 80287 (non-387 core), this bit can be programmed; however, regardless of its value, the Intel387 SX Math CoProcessor always treats infinity in the affine sense (−∞ < +∞). This bit is initialized to zero both after a hardware reset and after the FINIT instruction.

All other bits are reserved and should not be programmed, to assure compatibility with future processors.

[Figure: control word bit layout.]

Precision Control
  00 = 24 bits (single precision)
  01 = (reserved)
  10 = 53 bits (double precision)
  11 = 64 bits (extended precision)

Rounding Control
  00 = Round to nearest or even
  01 = Round down (toward −∞)
  10 = Round up (toward +∞)
  11 = Chop (truncate toward zero)

Figure 3-3. Control Word

3.2.3 DATA REGISTER

The Intel387 SX Math CoProcessor data register set consists of eight registers (R0–R7) which are treated as both a stack and a general register file.
Each of these data registers in the Math CoProcessor is 80 bits wide and is divided into fields corresponding to the Math CoProcessor’s extended-precision real data type, which is used for internal calculations.

The Math CoProcessor register set can be accessed either as a stack, with instructions operating on the top one or two stack elements, or as individually addressable registers. The TOP field in the status word identifies the current top-of-stack register. A ‘‘push’’ operation decrements TOP by one and loads a value into the new top register. A ‘‘store and pop’’ operation stores the value from the current top register into memory and then increments TOP by one. The Math CoProcessor register stack grows ‘‘down’’ toward lower-addressed registers.

Most of the Intel387 SX Math CoProcessor operations use the register stack as the operand(s) and/or as a place to store the result. Instructions may address the data registers either implicitly or explicitly. Many instructions operate on the register at the top of the stack. These instructions implicitly address the register at which TOP points. Other instructions allow the programmer to explicitly specify which register to use. Explicit register addressing is also relative to TOP: ST denotes the current stack top and ST(i) refers to the i’th register from ST in the stack, so the real register address is computed as ST + i.

3.2.4 TAG WORD (TW) REGISTER

The tag word marks the content of each numeric data register, as Figure 3-4 shows. Each two-bit tag represents one of the eight data registers. The principal function of the tag word is to optimize the Math CoProcessor’s performance and stack handling by making it possible to distinguish between empty and non-empty register locations. It also enables exception handlers to identify special values (e.g. NaNs or denormals) in the contents of a stack location without the need to perform complex decoding of the actual data.
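The stack addressing and tag lookup described above can be sketched in a few lines of Python. This is an illustrative model only, not Intel code: the function names are hypothetical, the modulo-8 wrap of TOP is inferred from the statement in Section 3.6 that the first push after initialization selects register seven, and the two-bit tag encodings are the ones listed with Figure 3-4.

```python
# Illustrative model of the register stack and tag word (names are hypothetical).
TAG_NAMES = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}


def physical_register(top: int, i: int) -> int:
    """Map ST(i) to a physical register R0..R7; the sum wraps modulo 8."""
    return (top + i) & 7


def push(top: int) -> int:
    """A push decrements TOP (the stack grows toward lower registers)."""
    return (top - 1) & 7


def pop(top: int) -> int:
    """A pop increments TOP."""
    return (top + 1) & 7


def tag_of(tag_word: int, reg: int) -> str:
    """Tag for physical register `reg`; tag(i) is NOT relative to TOP."""
    return TAG_NAMES[(tag_word >> (2 * reg)) & 0b11]


top = 0                       # TOP value after FNINIT
tag_word = 0xFFFF             # all registers empty after FNINIT
top = push(top)               # first push selects R7
tag_word &= ~(0b11 << (2 * top)) & 0xFFFF   # mark the new top as "valid" (00)
print(top, physical_register(top, 1), tag_of(tag_word, top))
```

Because the tag index is a physical register number, an exception handler would first read TOP from the status word and then call something like `tag_of(tag_word, physical_register(top, i))` to classify ST(i).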
3.2.5 INSTRUCTION AND DATA POINTERS

Because the Math CoProcessor operates in parallel with the CPU, any exceptions detected by the Math CoProcessor may be reported after the CPU has executed the ESC instruction which caused it. To allow identification of the numeric instruction which caused the exception, the Intel386 Microprocessor contains registers that aid in diagnosis. These registers supply the address of the failing instruction and the address of its numeric memory operand (if appropriate).

The instruction and data pointers are provided for user-written exception handlers. These registers are located in the CPU, but appear to be located in the Math CoProcessor because they are accessed by the ESC instructions FLDENV, FSTENV, FSAVE, and FRSTOR, which transfer the values between the registers and memory. Whenever the CPU executes a new ESC instruction (except administrative instructions), it saves the address of the instruction (including any prefixes that may be present), the address of the operand (if present) and the opcode.

The instruction and data pointers appear in one of four formats depending on the operating mode of the CPU (protected mode or real-address mode) and depending on the operand size attribute in effect (32-bit operand or 16-bit operand). (See Figures 3-5, 3-6, 3-7, and 3-8.) Note that the value of the data pointer is undefined if the prior ESC instruction did not have a memory operand.

  Bit 15                                                          Bit 0
  TAG(7)  TAG(6)  TAG(5)  TAG(4)  TAG(3)  TAG(2)  TAG(1)  TAG(0)

  NOTE: The index i of tag(i) is not top-relative. A program typically uses the ‘‘top’’ field of the Status Word to determine which tag(i) field refers to the logical top of stack.

  TAG VALUES:
    00 = Valid
    01 = Zero
    10 = QNaN, SNaN, Infinity, Denormal and Unsupported Formats
    11 = Empty

Figure 3-4. Tag Word Register

  32-BIT PROTECTED MODE FORMAT
  Offset   Bits 31..16                                 Bits 15..0
  0H       Reserved                                    Control Word
  4H       Reserved                                    Status Word
  8H       Reserved                                    Tag Word
  CH       IP Offset (bits 31..0)
  10H      00000 (31..27), Opcode 10..0 (26..16)       CS Selector
  14H      Data Operand Offset (bits 31..0)
  18H      Reserved                                    Operand Selector

Figure 3-5. Instruction and Data Pointer Image in Memory, 32-Bit Protected-Mode Format

  16-BIT PROTECTED MODE FORMAT
  Offset   Bits 15..0
  0H       Control Word
  2H       Status Word
  4H       Tag Word
  6H       IP Offset
  8H       CS Selector
  AH       Operand Offset
  CH       Operand Selector

Figure 3-6. Instruction and Data Pointer Image in Memory, 16-Bit Protected-Mode Format

  32-BIT REAL-ADDRESS MODE FORMAT
  Offset   Bits 31..16                                          Bits 15..0
  0H       Reserved                                             Control Word
  4H       Reserved                                             Status Word
  8H       Reserved                                             Tag Word
  CH       Reserved                                             Instruction Pointer 15..0
  10H      0000 (31..28), Instruction Pointer 31..16 (27..12), 0 (11), Opcode 10..0 (10..0)
  14H      Reserved                                             Operand Pointer 15..0
  18H      0000 (31..28), Operand Pointer 31..16 (27..12), 0000 00000000 (11..0)

Figure 3-7. Instruction and Data Pointer Image in Memory, 32-Bit Real-Mode Format

  16-BIT REAL-ADDRESS MODE AND VIRTUAL 8086 MODE FORMAT
  Offset   Bits 15..0
  0H       Control Word
  2H       Status Word
  4H       Tag Word
  6H       Instruction Pointer 15..0
  8H       IP 19..16 (15..12), 0 (11), Opcode 10..0 (10..0)
  AH       Operand Pointer 15..0
  CH       DP 19..16 (15..12), 0000 00000000 (11..0)

Figure 3-8. Instruction and Data Pointer Image in Memory, 16-Bit Real-Mode Format

3.3 Data Types

Table 3-6 lists the seven data types that the Math CoProcessor supports and presents the format for each type. Operands are stored in memory with the least significant digit at the lowest memory address. Programs retrieve these values by generating the lowest address. For maximum system performance, all operands should start at physical-memory addresses that correspond to the word size of the CPU; operands may begin at any other addresses, but will require extra memory cycles to access the entire operand.

The data type formats can be divided into three classes: binary integer, decimal integer, and binary real. These formats, however, exist in memory only. Internally, the Math CoProcessor holds all numbers in the extended-precision real format. Instructions that load operands from memory automatically convert operands represented in memory as 16, 32, or 64-bit integers, 32 or 64-bit floating point numbers, or 18 digit packed BCD numbers into extended-precision real format. Instructions that store operands in memory perform the inverse type conversion.

In addition to the typical real and integer data values, the Intel387 SX Math CoProcessor data formats encompass encodings for a variety of special values. These special values have significance and can express relevant information about the computations or operations that produced them. The various types of special values are denormal real numbers, zeros, positive and negative infinity, NaNs (Not-a-Number), Indefinite, and unsupported formats.

For further information on data types and formats, see the Intel387 Programmer’s Reference Manual.

3.4 Interrupt Description

CPU interrupts are used to report errors or exceptional conditions while executing numeric programs in either real or protected mode. Table 3-7 shows these interrupts and their functions.

3.5 Exception Handling

The Math CoProcessor detects six different exception conditions that occur during instruction execution. Table 3-8 lists the exception conditions in order of precedence, showing for each the cause and the default action taken by the Math CoProcessor if the exception is masked by its corresponding mask bit in the control word.

Any exception that is not masked by the control word sets the corresponding exception flag of the status word, sets the ES bit of the status word, and asserts the ERROR# signal. When the CPU attempts to execute another ESC instruction or WAIT, exception 16 occurs. The exception condition must be resolved via an interrupt service routine.
The return address pushed onto the CPU stack upon entry to the service routine does not necessarily point to the failing instruction nor to the following instruction. The CPU saves the address of the floating-point instruction that caused the exception and the address of any memory operand required by that instruction.

Table 3-6. Intel387™ SX Math CoProcessor Data Type Representation in Memory (figure 240225-23)

NOTES:
1. S = Sign bit (0 = positive, 1 = negative)
2. dn = Decimal digit (two per byte)
3. X = Bits have no significance; Math CoProcessor ignores when loading, zeros when storing
4. ▲ = Position of implicit binary point
5. I = Integer bit of significand; stored in temporary real, implicit in single and double precision
6. Exponent Bias (normalized values): Single: 127 (7FH); Double: 1023 (3FFH); Extended Real: 16383 (3FFFH)
7. Packed BCD: (-1)^S (D17..D0)
8. Real: (-1)^S (2^(E-BIAS)) (F0.F1...)

Table 3-7. CPU Interrupt Vectors Reserved for Math CoProcessor

  Interrupt Number   Cause of Interrupt

  7    An ESC instruction was encountered when EM or TS of CPU control register zero (CR0) was set. EM = 1 indicates that software emulation of the instruction is required. When TS is set, either an ESC or WAIT instruction causes interrupt 7. This indicates that the current Math CoProcessor context may not belong to the current task.

  9    In a protected-mode system, an operand of a coprocessor instruction wrapped around an addressing limit (0FFFFH for expand-up segments, zero for expand-down segments) and spanned inaccessible addresses (see Note 1). The failing numerics instruction is not restartable. The address of the failing numerics instruction and data operand may be lost; an FSTENV does not return reliable addresses. The segment overrun exception should be handled by executing an FNINIT instruction (i.e., an FINIT without a preceding WAIT). The exception can be avoided by never allowing numerics operands to cross the end of a segment.

  13   In a protected-mode system, the first word of a numeric operand is not entirely within the limit of its segment. The return address pushed onto the stack of the exception handler points at the ESC instruction that caused the exception, including any prefixes. The Math CoProcessor has not executed this instruction; the instruction pointer and data pointer registers refer to a previous, correctly executed instruction.

  16   The previous numerics instruction caused an unmasked exception. The address of the faulty instruction and the address of its operand are stored in the instruction pointer and data pointer registers. Only ESC and WAIT instructions can cause this interrupt. The CPU return address pushed onto the stack of the exception handler points to a WAIT or ESC instruction (including prefixes). This instruction can be restarted after clearing the exception condition in the Math CoProcessor. FNINIT, FNCLEX, FNSTSW, FNSTENV, and FNSAVE cannot cause this interrupt.

NOTE:
1. An operand may wrap around an addressing limit when the segment limit is near an addressing limit and the operand is near the largest valid address in the segment. Because of the wrap-around, the beginning and ending addresses of such an operand will be at opposite ends of the segment. There are two ways that such an operand may also span inaccessible addresses: 1) if the segment limit is not equal to the addressing limit (e.g. addressing limit is FFFFH and segment limit is FFFDH) the operand will span addresses that are not within the segment (e.g. an 8-byte operand that starts at valid offset FFFCH will span addresses FFFC–FFFFH and 0000–0003H; however, addresses FFFEH and FFFFH are not valid, because they exceed the limit); 2) if the operand begins and ends in present and accessible segments but intermediate bytes of the operand fall in a not-present page or in a segment or page to which the procedure does not have access rights.

Table 3-8. Intel387™ SX Math CoProcessor Exceptions

  Invalid Operation
    Cause: Operation on a signalling NaN, unsupported format, indeterminate form (0 × ∞, 0/0, (+∞) + (-∞), etc.), or stack overflow/underflow (SF is also set).
    Default action (if exception is masked): Result is a quiet NaN, integer indefinite, or BCD indefinite.

  Denormalized Operand
    Cause: At least one of the operands is denormalized, i.e., it has the smallest exponent but a nonzero significand.
    Default action (if exception is masked): Normal processing continues.

  Zero Divisor
    Cause: The divisor is zero while the dividend is a noninfinite, nonzero number.
    Default action (if exception is masked): Result is ∞.

  Overflow
    Cause: The result is too large in magnitude to fit in the specified format.
    Default action (if exception is masked): Result is largest finite value or ∞.

  Underflow
    Cause: The true result is nonzero but too small to be represented in the specified format, and, if the underflow exception is masked, denormalization causes the loss of accuracy.
    Default action (if exception is masked): Result is denormalized or zero.

  Inexact Result (Precision)
    Cause: The true result is not exactly representable in the specified format (e.g. 1/3); the result is rounded according to the rounding mode.
    Default action (if exception is masked): Normal processing continues.

3.6 Initialization

After FNINIT or RESET, the control word contains the value 037FH (all exceptions masked, precision control 64 bits, rounding to nearest), the same value as in an Intel287 after RESET. For compatibility with the 8087 and Intel287, the bit that used to indicate infinity control (bit 12) is set to zero; however, regardless of its setting, infinity is treated in the affine sense.
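As a cross-check of the 037FH value, the field layout from Section 3.2.2 can be decoded mechanically. The following Python sketch is illustrative only: the function and dictionary names are hypothetical, and the order of the six mask bits (IM, DM, ZM, OM, UM, PM in bits 0 through 5) is assumed from the conventional x87 control word layout, since Figure 3-3 is not reproduced in this text.

```python
# Hypothetical helper for decoding a control word image; not part of any Intel tool.
PRECISION = {
    0b00: "24 bits (single precision)",
    0b01: "reserved",
    0b10: "53 bits (double precision)",
    0b11: "64 bits (extended precision)",
}
ROUNDING = {
    0b00: "nearest or even",
    0b01: "down (toward -inf)",
    0b10: "up (toward +inf)",
    0b11: "chop (toward zero)",
}
# Assumed bit order for the exception masks in bits 0..5.
MASK_NAMES = ["IM", "DM", "ZM", "OM", "UM", "PM"]


def decode_control_word(cw: int) -> dict:
    """Split a 16-bit control word into its documented fields."""
    return {
        "masks": {name: bool((cw >> bit) & 1) for bit, name in enumerate(MASK_NAMES)},
        "precision": PRECISION[(cw >> 8) & 0b11],    # PC field, bits 9-8
        "rounding": ROUNDING[(cw >> 10) & 0b11],     # RC field, bits 11-10
        "infinity_bit": (cw >> 12) & 1,              # programmable but ignored
    }


fields = decode_control_word(0x037F)
print(fields["precision"], "/", fields["rounding"])
print("all exceptions masked:", all(fields["masks"].values()))
```

Decoding 037FH this way yields extended precision, round to nearest, and all six masks set, which agrees with the description in the paragraph above.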
After FNINIT or RESET, the status word is initialized as follows:

• All exceptions are set to zero.
• Stack TOP is zero, so that after the first push the stack top will be register seven (111B).
• The condition code C3–C0 is undefined.
• The B-bit is zero.

The tag word contains FFFFH (all stack locations are empty).

The Intel386 Microprocessor and Intel387 Math CoProcessor initialization software must execute an FNINIT instruction (i.e., FINIT without a preceding WAIT) after RESET. The FNINIT is not strictly required for the Intel386 software, but Intel recommends its use to help ensure upward compatibility with other processors. After a hardware RESET, the ERROR# output is asserted to indicate that an Intel387 Math CoProcessor is present. To accomplish this, the IE (Invalid Exception) and ES (Error Summary) bits of the status word are set, and the IM bit (Invalid Exception Mask) in the control word is cleared. After FNINIT, the status word and the control word have the same values as in an Intel287 Math CoProcessor after RESET.

3.7 Processing Modes

The Intel387 SX Math CoProcessor works the same whether the CPU is executing in real-address mode, protected mode, or virtual-8086 mode. All references to memory for numerics data or status information are performed by the CPU, and therefore obey the memory-management and protection rules of the CPU mode currently in effect. The Intel387 SX Math CoProcessor merely operates on instructions and values passed to it by the CPU and therefore is not sensitive to the processing mode of the CPU.

In real-address mode and virtual-8086 mode, the Intel387 SX Math CoProcessor is completely upward compatible with software for the 8086/8087 and 80286/80287 real-address mode systems.

In protected mode, the Intel387 SX Math CoProcessor is completely upward compatible with software for the 80286/80287 protected mode system.
The only difference of operation that may appear when 8086/8087 programs are ported to protected mode (not using virtual-8086 mode) is in the format of operands for the administrative instructions FLDENV, FSTENV, FRSTOR, and FSAVE.

3.8 Programming Support

Using the Intel387 SX Math CoProcessor requires no special programming tools, because all new instructions and data types are directly supported by the assembler and compilers for high-level languages. All Intel386 Microprocessor development tools that support Intel387 Math CoProcessor programs can also be used to develop software for the Intel386 SX Microprocessors and Intel387 SX Math CoProcessors. All 8086/8088 development tools that support the 8087 can also be used to develop software for the CPU and Math CoProcessor in real-address mode or virtual-8086 mode. All 80286 development tools that support the Intel287 Math CoProcessor can also be used to develop software for the Intel386 CPU and Intel387 Math CoProcessor.

4.0 HARDWARE SYSTEM INTERFACE

In the following description of the hardware interface, the # symbol at the end of a signal name indicates that the active or asserted state occurs when the signal is at a low voltage. When no # is present after the signal name, the signal is asserted when at the high voltage level.

4.1 Signal Description

In the following signal descriptions, the Intel387 SX Math CoProcessor pins are grouped by function as shown by Table 4-1. Table 4-1 lists every pin by its identifier, gives a brief description and lists some of its characteristics (refer to Figure 1-1 and Table 1-1 for pin configuration).

All output signals can be tri-stated by driving STEN inactive. The output buffers of the bi-directional data pins D15–D0 are also tri-state; they only leave the floating state during read cycles when the Math CoProcessor is selected.

4.1.1 Intel386 CPU CLOCK 2 (CPUCLK2)

This input uses the CLK2 signal of the CPU to time the bus control logic.
Several other Math CoProcessor signals are referenced to the rising edge of this signal. When CKM = 1 (synchronous mode) this pin also clocks the data interface and control unit and the floating point unit of the Math CoProcessor. This pin requires CMOS-level input. The signal on this pin is divided by two to produce the internal clock signal CLK.

4.1.2 Intel387 MATH COPROCESSOR CLOCK 2 (NUMCLK2)

When CKM = 0 (asynchronous mode), this pin provides the clock for the data interface and control unit and the floating point unit of the Math CoProcessor. In this case, the ratio of the frequency of NUMCLK2 to the frequency of CPUCLK2 must lie within the range 10:16 to 14:10 and the maximum frequency must not exceed the device specifications. When CKM = 1 (synchronous mode), signals on this pin are ignored: CPUCLK2 is used instead for the data interface and control unit and the floating point unit. This pin requires CMOS-level input and should be tied low if not used.

Table 4-1. Pin Summary

  Pin Name   Function                      Active State   Input/Output   Referenced To

  Execution Control
  CPUCLK2    Microprocessor Clock2                        I
  NUMCLK2    Math CoProcessor Clock2                      I
  CKM        Math CoProcessor Clock Mode                  I
  RESETIN    System Reset                  High           I              CPUCLK2

  Math CoProcessor Handshake
  PEREQ      Processor Request             High           O              CPUCLK2
  BUSY#      Busy Status                   Low            O              CPUCLK2
  ERROR#     Error Status                  Low            O              NUMCLK2

  Bus Interface
  D15–D0     Data Pins                                    I/O            CPUCLK2
  W/R#       Write/Read Bus Cycle          High/Low       I              CPUCLK2
  ADS#       Address Strobe                Low            I              CPUCLK2
  READY#     Bus Ready Input               Low            I              CPUCLK2
  READYO#    Ready Output                  Low            O              CPUCLK2

  Chip/Port Select
  STEN       Status Enable                 High           I              CPUCLK2
  NPS1#      Numerics Select #1            Low            I              CPUCLK2
  NPS2       Numerics Select #2            High           I              CPUCLK2
  CMD0#      Command                       Low            I              CPUCLK2

  Power and Ground
  VCC        System Power
  VSS        System Ground

4.1.3 CLOCKING MODE (CKM)

This pin is a strapping option. When it is strapped to VCC (HIGH), the Math CoProcessor operates in synchronous mode; when strapped to VSS (LOW), the Math CoProcessor operates in asynchronous mode. These modes relate to clocking of the internal data interface and control unit and the floating point unit only; the bus control logic always operates synchronously with respect to the CPU.

Synchronous mode requires the use of only one clock, the CPU’s CLK2. Use of synchronous mode eliminates one clock generator from the board design and is recommended for all designs. Synchronous mode also allows the internal Power Management Unit to enable the idle and standby power saving modes.

Asynchronous mode can provide higher performance of the floating point unit by running a faster clock on NUMCLK2. (The CPU’s CLK2 must still be connected to the CPUCLK2 input.) This allows the floating point unit to run up to 40% faster than in synchronous mode. Internal power management is disabled in asynchronous mode.

4.1.4 SYSTEM RESET (RESETIN)

A LOW to HIGH transition on this pin causes the Math CoProcessor to terminate its present activity and to enter a dormant state. RESETIN must remain active (HIGH) for at least 40 CPUCLK2 (NUMCLK2 if CKM = 0) periods.

The HIGH to LOW transitions of RESETIN must be synchronous with CPUCLK2, so that the phase of the internal clock of the bus control logic (which is CPUCLK2 divided by two) is the same as the phase of the internal clock of the CPU. After RESETIN goes LOW, at least 50 CPUCLK2 (NUMCLK2 if CKM = 0) periods must pass before the first Math CoProcessor instruction is written into the Math CoProcessor. This pin should be connected to the CPU RESET pin. Table 4-2 shows the status of the output pins during the reset sequence. After a reset, all output pins return to their inactive state except for ERROR#, which remains active (for CPU recognition) until cleared.

Table 4-2. Output Pin Status during Reset

  Pin Value        Pin Name
  HIGH             READYO#, BUSY#
  LOW              PEREQ, ERROR#
  Tri-State OFF    D15–D0

4.1.5 PROCESSOR REQUEST (PEREQ)

When active, this pin signals to the CPU that the Math CoProcessor is ready for data transfer to/from its data FIFO. When all data is written to or read from the data FIFO, PEREQ is deactivated. This signal always goes inactive before BUSY# goes inactive. This signal is referenced to CPUCLK2. It should be connected to the CPU PEREQ input pin.

4.1.6 BUSY STATUS (BUSY#)

When active, this pin signals to the CPU that the Math CoProcessor is currently executing an instruction. This signal is referenced to CPUCLK2. It should be connected to the CPU BUSY# input pin.

4.1.7 ERROR STATUS (ERROR#)

This pin reflects the ES bit of the status register. When active, it indicates that an unmasked exception has occurred. This signal can be changed to the inactive state only by the following instructions (without a preceding WAIT): FNINIT, FNCLEX, FNSTENV, FNSAVE, FLDCW, FLDENV, and FRSTOR. ERROR# is driven active during RESET to indicate to the CPU that the Math CoProcessor is present. This pin is referenced to NUMCLK2 (or CPUCLK2 if CKM = 1). It should be connected to the ERROR# pin of the CPU.

4.1.8 DATA PINS (D15–D0)

These bi-directional pins are used to transfer data and opcodes between the CPU and Math CoProcessor. They are normally connected directly to the corresponding CPU data pins. HIGH state indicates a value of one. D0 is the least significant data bit. Timings are referenced to the rising edge of CPUCLK2.

4.1.9 WRITE/READ BUS CYCLE (W/R#)

This signal indicates to the Math CoProcessor whether the CPU bus cycle in progress is a read or a write cycle. This pin should be connected directly to the CPU’s W/R# pin. HIGH indicates a write cycle to the Math CoProcessor; LOW a read cycle from the Math CoProcessor. This input is ignored if any of the signals STEN, NPS1#, or NPS2 are inactive.
Setup and hold times are referenced to CPUCLK2.

4.1.10 ADDRESS STROBE (ADS#)

This input, in conjunction with the READY# input, indicates when the Math CoProcessor bus control logic may sample W/R# and the chip select signals. Setup and hold times are referenced to CPUCLK2. This pin should be connected to the ADS# pin of the CPU.

4.1.11 BUS READY INPUT (READY#)

This input indicates to the Math CoProcessor when a CPU bus cycle is to be terminated. It is used by the bus control logic to trace bus activities. Bus cycles can be extended indefinitely until terminated by READY#. This input should be connected to the same signal that drives the CPU’s READY# input. Setup and hold times are referenced to CPUCLK2.

4.1.12 READY OUTPUT (READYO#)

This pin is activated at such a time that write cycles are terminated after two clocks (except FLDENV and FRSTOR) and read cycles after three clocks. In configurations where no extra wait states are required, this pin must directly or indirectly drive the READY# input of the CPU. Refer to the section entitled ‘‘BUS OPERATION’’ for details. This pin is activated only during bus cycles that select the Math CoProcessor. This signal is referenced to CPUCLK2.

(FLDENV and FRSTOR require data transfers larger than the FIFO. Therefore, PEREQ is activated for the duration of transferring 2 words of 32 bits and then deactivated until the FIFO is ready to accept two additional words. The length of the write cycles of the last operand word in each transfer as well as the first operand word transfer of the entire instruction is 3 clocks instead of 2 clocks. This is done to give the Intel386 CPU enough time to sample PEREQ and to notice that the Intel387 is not ready for additional transfers.)

4.1.13 STATUS ENABLE (STEN)

This pin serves as a chip select for the Math CoProcessor. When inactive, this pin forces BUSY#, PEREQ, ERROR# and READYO# outputs into a floating state.
D15–D0 are normally floating and will leave the floating state only if STEN is active and additional conditions are met (read cycle). STEN also causes the chip to recognize its other chip select inputs. STEN makes it easier to do on-board testing (using the overdrive method) of other chips in systems containing the Math CoProcessor. STEN should be pulled up with a resistor so that it can be pulled down when testing. In boards that do not use on-board testing STEN should be connected to VCC. Setup and hold times are relative to CPUCLK2. Note that STEN must maintain the same setup and hold times as NPS1#, NPS2, and CMD0# (i.e., if STEN changes state during a Math CoProcessor bus cycle, it must change state during the same CLK period as the NPS1#, NPS2, and CMD0# signals).

4.1.14 MATH COPROCESSOR SELECT 1 (NPS1#)

When active (along with STEN and NPS2) in the first period of a CPU bus cycle, this signal indicates that the purpose of the bus cycle is to communicate with the Math CoProcessor. This pin should be connected directly to the M/IO# pin of the CPU, so that the Math CoProcessor is selected only when the CPU performs I/O cycles. Setup and hold times are referenced to the rising edge of CPUCLK2.

4.1.15 MATH COPROCESSOR SELECT 2 (NPS2)

When active (along with STEN and NPS1#) in the first period of a CPU bus cycle, this signal indicates that the purpose of the bus cycle is to communicate with the Math CoProcessor. This pin should be connected directly to the A23 pin of the CPU, so that the Math CoProcessor is selected only when the CPU issues one of the I/O addresses reserved for the Math CoProcessor (8000F8h, 8000FCh, or 8000FEh, which is treated as 8000FCh by the Math CoProcessor). Setup and hold times are referenced to the rising edge of CPUCLK2.

4.1.16 COMMAND (CMD0#)

During a write cycle, this signal indicates whether an opcode (CMD0# active low) or data (CMD0# inactive high) is being sent to the Math CoProcessor.
During a read cycle, it indicates whether the control or status register (CMD0# active) or a data register (CMD0# inactive) is being read. CMD0# should be connected directly to the A2 output of the CPU. Setup and hold times are referenced to the rising edge of CPUCLK2 at the end of PH2.

4.1.17 SYSTEM POWER (VCC)

System power provides the +5V DC supply input. All VCC pins should be tied together on the circuit board and local decoupling capacitors should be used between VCC and VSS.

4.1.18 SYSTEM GROUND (VSS)

System ground provides the 0V connection from which all inputs and outputs are measured. All VSS pins should be tied together on the circuit board and local decoupling capacitors should be used between VCC and VSS.

4.2 System Configuration

The Intel387 SX Math CoProcessor is designed to interface with the Intel386 SX Microprocessor as shown by Figure 4-1. A dedicated communication protocol makes possible high-speed transfer of opcodes and operands between the CPU and Math CoProcessor. The Intel387 SX Math CoProcessor is designed so that no additional components are required for interface with the CPU. Most control pins of the Math CoProcessor are connected directly to pins of the CPU.

The interface between the Math CoProcessor and the CPU has these characteristics:

• The Math CoProcessor shares the local bus of the Intel386 SX Microprocessor.
• The CPU and Math CoProcessor share the same reset signals. They may also share the same clock input; however, for greatest performance, an external oscillator may be needed.
• The corresponding BUSY#, ERROR#, and PEREQ pins are connected together.
• The Math CoProcessor NPS1# and NPS2 inputs are connected to the latched CPU M/IO# and A23 outputs respectively. For Math CoProcessor cycles, M/IO# is always LOW and A23 always HIGH.
• The Math CoProcessor input CMD0# is connected to the latched A2 output.
The Intel386 SX Microprocessor generates address 8000F8H when writing a command and address 8000FCH or 8000FEH (treated as 8000FCH by the Intel387 SX Math CoProcessor) when writing or reading data. It does not generate any other addresses during Math CoProcessor bus cycles. 240225 – 6 Figure 4-1. Intel386 TM SX CPU and Intel387 TM SX Math CoProcessor System Configuration 25 25 Intel387 TM SX MATH COPROCESSOR 4.3 Math CoProcessor Architecture As shown in Figure 2-1 Block Diagram, the Intel387 SX Math CoProcessor is internally divided into four sections; the Bus Control Logic (BCL), the Data Interface and Control Logic, the Floating Point Unit (FPU), and the Power Management Unit (PMU). The Bus Control Logic is responsible for the CPU bus tracking and interface. The BCL is the only unit in the Math CoProcessor that must run synchronously with the CPU; the rest of the Math CoProcessor can run asynchronously with respect to the CPU. The Data Interface and Control Unit is responsible for the data flow to and from the FPU and the control registers, for receiving the instructions, decoding them, sequencing the microinstructions, and for handling some of the administrative instructions. The Floating Point Unit (with the support of the control unit which contains the sequencer and other support units) executes the mathematical instructions. The Power Manager is new to the Intel387 family. It is responsible for shutting down idle sections of the device to save power. FIFO or the instruction decoder. The instruction decoder decodes the ESC instructions sent to it by the CPU and generates controls that direct the data flow in the FIFO. It also triggers the microinstruction sequencer that controls execution of each instruction. If the ESC instruction is FINIT, FCLEX, FSTSW, FSTSW AX, FSTCW, FSETPM, or FRSTPM, the control unit executes it independently of the FPU and the sequencer. 
The data interface and control unit is the unit that generates the BUSYÝ, PEREQ, and ERRORÝ signals that synchronize the Math CoProcessor activities with the CPU. 4.3.3 FLOATING POINT UNIT The FPU executes all instructions that involve the register stack, including arithmetic, logical, transcendental, constant, and data transfer instructions. The data path in the FPU is 84 bits wide (68 significant bits, 15 exponent bits, and a sign bit) which allows internal operand transfers to be performed at very high speeds. 4.3.4 POWER MANAGEMENT UNIT 4.3.1 BUS CONTROL LOGIC The BCL communicates solely with the CPU using I/O bus cycles. The BCL appears to the CPU as a special peripheral device. It is special in two respects: the CPU initiates I/O automatically when it encounters ESC instructions, and the CPU uses reserved I/O addresses to communicate with the BCL. The BCL does not communicate directly with memory. The CPU performs all memory access, transferring input operands from the memory to the Math CoProcessor and transferring outputs from the Math CoProcessor to memory. 4.3.2 DATA INTERFACE AND CONTROL UNIT The data interface and control unit latches the data and, subject to BCL control, directs the data to the The Power Management Unit (PMU) controls all internal power savings circuits. When the Math CoProcessor is not executing an instruction, the PMU disables the internal clock to the FPU, Control Unit, and Data Interface within three clocks. The Bus Control Logic remains enabled to accept the next instruction. Upon decode of a valid Math CoProcessor bus cycle, the PMU enables the internal clock to all circuits. No loss in performance occurs. 4.4 Bus Cycles All bus cycles are initiated by the CPU. The pins STEN, NPS1Ý, NPS2, CMD0, and W/RÝ identify bus cycles for the Math CoProcessor. Table 4-3 defines the types of Math CoProcessor bus cycles. Table 4-3. 
Bus Cycle Definition

STEN  NPS1#  NPS2  CMD0#  W/R#  Bus Cycle Type
0     X      X     X      X     Math CoProcessor not selected and all outputs in floating state
1     1      X     X      X     Math CoProcessor not selected
1     X      0     X      X     Math CoProcessor not selected
1     0      1     0      0     CW or SW read from Math CoProcessor
1     0      1     0      1     Opcode write to Math CoProcessor
1     0      1     1      0     Data read from Math CoProcessor
1     0      1     1      1     Data write to Math CoProcessor

4.4.1 INTEL387 SX MATH COPROCESSOR ADDRESSING

The NPS1#, NPS2, and CMD0# signals allow the Math CoProcessor to identify which bus cycles are intended for the Math CoProcessor. The Math CoProcessor responds to I/O cycles when the I/O address is 8000F8h, 8000FCh, or 8000FEh (treated as 8000FCh). The Math CoProcessor responds to I/O cycles when bit 23 of the I/O address is set. In other words, the Math CoProcessor acts as an I/O device in a reserved I/O address space.

Because A23 is used to select the Intel387 SX Math CoProcessor for data transfers, it is not possible for a program running on the CPU to address the Math CoProcessor with an I/O instruction. Only ESC instructions cause the CPU to communicate with the Math CoProcessor.

4.4.2 CPU/MATH COPROCESSOR SYNCHRONIZATION

The pins BUSY#, PEREQ, and ERROR# are used for various aspects of synchronization between the CPU and the Math CoProcessor.

BUSY# is used to synchronize instruction transfer from the CPU to the Math CoProcessor. When the Math CoProcessor recognizes an ESC instruction it asserts BUSY#. For most ESC instructions, the CPU waits for the Math CoProcessor to deassert BUSY# before sending the new opcode.

The Math CoProcessor uses the PEREQ pin of the CPU to signal that the Math CoProcessor is ready for data transfer to or from its data FIFO. The Math CoProcessor does not directly access memory; rather, the CPU provides memory access services for the Math CoProcessor. (For this reason, memory access on behalf of the Math CoProcessor always obeys the protection rules applicable to the current CPU mode.) Once the CPU initiates a Math CoProcessor instruction that has operands, the CPU waits for PEREQ signals that indicate when the Math CoProcessor is ready for operand transfer. Once all operands have been transferred (or if the instruction has no operands) the CPU continues program execution while the Math CoProcessor executes the ESC instruction.

In 8086/8087 systems, WAIT instructions may be required to achieve synchronization of both commands and operands. In the Intel386 Microprocessor and Intel387 Math CoProcessor systems, however, WAIT instructions are required only for operand synchronization; namely, after Math CoProcessor stores to memory (except FSTSW and FSTCW) or loads from memory. (In 80286/80287 systems, WAIT is required before FLDENV and FRSTOR.) Used this way, WAIT ensures that the value has already been written or read by the Math CoProcessor before the CPU reads or changes the value.

Once it has started to execute a numerics instruction and has transferred the operands from the CPU, the Math CoProcessor can process the instruction in parallel with and independent of the host CPU. When the Math CoProcessor detects an exception, it asserts the ERROR# signal, which causes a CPU interrupt.

4.4.3 SYNCHRONOUS/ASYNCHRONOUS MODES

The internal logic of the Math CoProcessor can operate either directly from the CPU clock (synchronous mode) or from a separate clock (asynchronous mode). The two configurations are distinguished by the CKM pin. In either case, the bus control logic (BCL) of the Math CoProcessor is synchronized with the CPU clock. Use of asynchronous mode allows the BCL and the FPU section of the Math CoProcessor to run at different speeds. In this case, the ratio of the frequency of NUMCLK2 to the frequency of CPUCLK2 must lie within the range 10:16 to 14:10. Use of synchronous mode eliminates one clock generator from the board design.
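As a rough illustration of the asynchronous-mode constraint above, the following Python sketch checks whether a NUMCLK2/CPUCLK2 pair falls inside the 10:16 to 14:10 window and derives the NUMCLK2 limits for a given CPUCLK2. The function names, the example frequencies, and the choice to treat both limits as inclusive are assumptions made for illustration; they are not taken from this data sheet.

```python
from fractions import Fraction

# Ratio limits quoted in the text for asynchronous mode (CKM = 0).
MIN_RATIO = Fraction(10, 16)
MAX_RATIO = Fraction(14, 10)


def ratio_in_range(numclk2_hz, cpuclk2_hz):
    """Return True if NUMCLK2/CPUCLK2 lies within 10:16 to 14:10.

    Both limits are treated as inclusive, which is an assumption.
    """
    ratio = Fraction(numclk2_hz, cpuclk2_hz)
    return MIN_RATIO <= ratio <= MAX_RATIO


def numclk2_limits(cpuclk2_hz):
    """Return (lowest, highest) NUMCLK2 frequency allowed for a given CPUCLK2."""
    return (cpuclk2_hz * MIN_RATIO, cpuclk2_hz * MAX_RATIO)


if __name__ == "__main__":
    # Hypothetical example: a 40 MHz CPUCLK2 (that is, a 20 MHz internal CLK).
    low, high = numclk2_limits(40_000_000)
    print(f"NUMCLK2 may range from {float(low) / 1e6} MHz to {float(high) / 1e6} MHz")
    print(ratio_in_range(50_000_000, 40_000_000))
```

Exact fractions are used instead of floating point so that a frequency sitting exactly on a limit is not rejected by rounding.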
The internal Power Management Unit of the Intel387 SX Math CoProcessor is disabled in asynchronous mode.

4.4.4 AUTOMATIC BUS CYCLE TERMINATION

In configurations where no extra wait states are required, READYO# can drive the CPU’s READY# input and the Math CoProcessor’s READY# input. If wait states are required, this pin should be connected to the logic that ORs all READY outputs from peripheral devices on the CPU bus. READYO# is asserted by the Math CoProcessor only during I/O cycles that select the Math CoProcessor. Refer to Section 5.0 Bus Operation for details.

5.0 BUS OPERATION

With respect to bus interface, the Intel387 SX Math CoProcessor is fully synchronous with the CPU. Both operate at the same rate because each generates its internal CLK signal by dividing CPUCLK2 by two. Furthermore, both internal CLK signals are in phase, because they are synchronized by the same RESETIN signal.

A bus cycle for the Math CoProcessor starts when the CPU activates ADS# and drives new values on the address and cycle definition lines (W/R#, M/IO#, etc.). The Math CoProcessor examines the address and cycle definition lines in the same CLK period during which ADS# is activated. This CLK period is considered the first CLK of the bus cycle. During this first CLK period, the Math CoProcessor also examines the W/R# input signal to determine whether the cycle is a read or a write cycle and examines the CMD0# input to determine whether an opcode, operand, or control/status register transfer is to occur.

The Intel387 SX Math CoProcessor supports both pipelined (i.e., overlapped) and non-pipelined bus cycles. A non-pipelined cycle is one for which the CPU asserts ADS# when no other bus cycle is in progress. A pipelined bus cycle is one for which the CPU asserts ADS# and provides valid next address and control signals before the prior Math CoProcessor cycle terminates.
The CPU may do this as early as the second CLK period after asserting ADS# for the prior cycle. Pipelining increases the availability of the bus by at least one CLK period. The Intel387 SX Math CoProcessor supports pipelined bus cycles in order to optimize address pipelining by the CPU for memory cycles.

Bus operation is described in terms of an abstract state machine. Figure 5-1 illustrates the states and state transitions for Math CoProcessor bus cycles:

• TI is the idle state. This is the state of the bus logic after RESET, the state to which bus logic returns after every non-pipelined bus cycle, and the state to which bus logic returns after a series of pipelined cycles.
• TRS is the READY# sensitive state. Different types of bus cycles may require a minimum of one or two successive TRS states. The bus logic remains in TRS state until READY# is sensed, at which point the bus cycle terminates. Any number of wait states may be implemented by delaying READY#, thereby causing additional successive TRS states.
• TP is the first state for every pipelined bus cycle. This state is not used by non-pipelined cycles.

Note that the bus logic tracks bus state regardless of the values on the chip/port select pins.

The READYO# output of the Math CoProcessor indicates when a Math CoProcessor bus cycle may be terminated if no extra wait states are required. For all write cycles (except those for the instructions FLDENV and FRSTOR), READYO# is always asserted during the first TRS state, regardless of the number of wait states. For all read cycles (and write cycles for FLDENV and FRSTOR), READYO# is always asserted in the second TRS state, regardless of the number of wait states. These rules apply to both pipelined and non-pipelined cycles. Systems designers may use READYO# in one of the following ways:

1. Connect it (directly or through logic that ORs READY# signals from other devices) to the READY# inputs of the CPU and Math CoProcessor.
2. Use it as one input to a wait-state generator.

The following sections illustrate different types of Intel387 SX Math CoProcessor bus cycles. Because different instructions have different amounts of overhead before, between, and after operand transfer cycles, it is not possible to represent in a few diagrams all of the combinations of successive operand transfer cycles. The following bus cycle diagrams show memory cycles between Math CoProcessor operand transfer cycles. Note however that, during FRSTOR, some consecutive accesses to the Math CoProcessor do not have intervening memory accesses. For the timing relationship between operand transfer cycles and opcode write or other overhead activities, see Figure 7-7 “Other Parameters”.

5.1 Non-Pipelined Bus Cycles

Figure 5-2 illustrates bus activity for consecutive non-pipelined bus cycles. At the second clock of the bus cycle, the Math CoProcessor enters the TRS state. During this state, it samples the READY# input and stays in this state as long as READY# is inactive.

5.1.1 WRITE CYCLE

In write cycles, the Math CoProcessor drives the READYO# signal for one CLK period during the second CLK period of the cycle (i.e., the first TRS state); therefore, the fastest write cycle takes two CLK periods (see cycle 2 of Figure 5-2). For the instructions FLDENV and FRSTOR, however, the Math CoProcessor forces a wait state by delaying the activation of READYO# to the second TRS state (not shown in Figure 5-2).

The Math CoProcessor samples the D15–D0 inputs into data latches at the falling edge of CLK as long as it stays in TRS state.

Figure 5-1. Bus State Diagram

Cycles 1 & 2 represent part of the operand transfer cycle for instructions involving either 4-byte or 8-byte operand loads.
Cycles 3 & 4 represent part of the operand transfer cycle for a store operation.
*Cycles 1 & 2 could repeat here or TI states for various non-operand transfer cycles and overhead.
Figure 5-2. Non-Pipelined Read and Write Cycles

When READY# is asserted, the Math CoProcessor returns to the idle state. Simultaneously with the Math CoProcessor entering the idle state, the CPU may assert ADS# again, signaling the beginning of yet another cycle.

5.1.2 READ CYCLE

At the rising edge of CLK in the second CLK period of the cycle (i.e., the first TRS state), the Math CoProcessor starts to drive the D15–D0 outputs and continues to drive them as long as it stays in TRS state.

At least one wait state must be inserted to ensure that the CPU latches the correct data. Because the Math CoProcessor starts driving the data bus only at the rising edge of CLK in the second clock period of the bus cycle, not enough time is left for the data signals to propagate and be latched by the CPU before the next falling edge of CLK. Therefore, the Math CoProcessor does not drive the READYO# signal until the third CLK period of the cycle. Thus, if the READYO# output drives the CPU’s READY# input, one wait state is automatically inserted.

Because one wait state is required for Math CoProcessor reads, the minimum length of a Math CoProcessor read cycle is three CLK periods, as cycle 3 of Figure 5-2 shows.

When READY# is asserted, the Math CoProcessor returns to the idle state. Simultaneously with the Math CoProcessor’s entering the idle state, the CPU may assert ADS# again, signaling the beginning of yet another cycle. The transition from TRS state to idle state causes the Math CoProcessor to put the D15–D0 outputs into the floating state, allowing another device to drive the data bus.

5.2 Pipelined Bus Cycles

Because all the activities of the Math CoProcessor bus interface occur either during the TRS state or during the transitions to or from that state, the only difference between a pipelined and a non-pipelined cycle is the manner of changing from one state to another.
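The TI, TRS, and TP states introduced in Section 5.0 can be modeled as a small state machine, which may help when reading the bus cycle figures. The Python sketch below is an illustration only: the class and function names are hypothetical, each sample stands for one CLK period, and a value of True means the active-low signal (ADS# or READY#) is asserted. The transition rules follow the text of Sections 5.0 through 5.3; anything the text does not spell out (for example, ignoring READY# while idle) is an assumption.

```python
from enum import Enum


class BusState(Enum):
    TI = "idle"
    TRS = "READY#-sensitive"
    TP = "first state of a pipelined cycle"


def next_state(state, ads_active, ready_active):
    """Advance the bus-tracking logic by one CLK period.

    ads_active / ready_active are True when ADS# / READY# are asserted (low).
    """
    if state is BusState.TI:
        # A cycle starts when ADS# is asserted; the next CLK is the first TRS.
        # READY# is ignored while idle (assumption).
        return BusState.TRS if ads_active else BusState.TI
    if state is BusState.TP:
        # One clock period after TP the logic always returns to TRS.
        return BusState.TRS
    # state is TRS: remain here until READY# is sensed.
    if not ready_active:
        return BusState.TRS
    # READY# alone ends a non-pipelined cycle; READY# with ADS# pipelines.
    return BusState.TP if ads_active else BusState.TI


def run(samples, state=BusState.TI):
    """Return the list of states visited for a sequence of (ADS#, READY#) samples."""
    trace = [state]
    for ads_active, ready_active in samples:
        state = next_state(state, ads_active, ready_active)
        trace.append(state)
    return trace


if __name__ == "__main__":
    # Fastest non-pipelined write: ADS# in TI, READY# in the first TRS.
    print([s.name for s in run([(True, False), (False, True)])])
    # Transition into a pipelined cycle: ADS# and READY# both active in TRS.
    print([s.name for s in run([(True, False), (True, True), (False, True)])])
```

Delaying READY# in the sample list simply keeps the model in TRS, which corresponds to inserting wait states.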
The exact activities during each state are detailed in the previous section “Non-pipelined Bus Cycles”.

When the CPU asserts ADS# before the end of a bus cycle, both ADS# and READY# are active during a TRS state. This condition causes the Math CoProcessor to change to a different state named TP. One clock period after a TP state, the Math CoProcessor always returns to the TRS state. In consecutive pipelined cycles, the Math CoProcessor bus logic uses only the TRS and TP states.

Figure 5-3 shows the fastest transitions into and out of the pipelined bus cycles. Cycle 1 in the figure represents a non-pipelined cycle. (Non-pipelined writes are always followed by another non-pipelined cycle, because READY# is asserted before the earliest possible assertion of ADS# for the next cycle.)

Figure 5-4 shows pipelined write and read cycles with one additional TRS state beyond the minimum required. Delaying the assertion of READY# requires external logic.

5.3 Mixed Bus Cycles

When the Math CoProcessor bus logic is in the TRS state, it distinguishes between non-pipelined and pipelined cycles according to the behavior of ADS# and READY#. In a non-pipelined cycle, only READY# is activated, and the transition is from the TRS state to the idle state. In a pipelined cycle, both READY# and ADS# are active, and the transition is first from TRS state to TP state, then, after one clock period, back to TRS state.

Cycle 1 – Cycle 4 represent the operand transfer cycle for an instruction involving a transfer of two 32-bit loads in total. The opcode write cycles and other overhead are not shown.
Note that the next cycle will be a pipelined cycle if both READY# and ADS# are sampled active at the end of a TRS state of the current cycle.

Figure 5-3. Fastest Transitions to and from Pipelined Cycles

NOTE:
1. Cycles between operand write to the Math CoProcessor and storing result.

Figure 5-4.
Pipelined Cycles with Wait States

5.4 BUSY# and PEREQ Timing Relationship

Figure 5-5 shows the activation of BUSY# at the beginning of instruction execution and its deactivation upon completion of the instruction. PEREQ is activated within this interval. If ERROR# is ever asserted, it would be asserted at least six CPUCLK2 periods after the deactivation of PEREQ and would be deasserted at least six CPUCLK2 periods before the deactivation of BUSY#.

NOTES:
1. Instruction dependent.
2. PEREQ is an asynchronous input to the Intel386™ Microprocessor; it may not be asserted (instruction dependent).
3. More operand transfers.
4. Memory read (operand) cycle is not shown.

Figure 5-5. STEN, BUSY#, and PEREQ Timing Relationships

6.0 PACKAGE SPECIFICATIONS

6.1 Mechanical Specifications

The Intel387 SX Math CoProcessor is packaged in a 68-pin PLCC package. Detailed mechanical specifications can be found in the Intel Packaging Specification, Order Number 231369.

6.2 Thermal Specifications

The Intel387 SX Math CoProcessor is specified for operation when the case temperature is within the range of 0°C to 100°C.

The ambient temperature (TA) is guaranteed as long as TC is not violated. The ambient temperature can be calculated from θJC (thermal resistance from the transistor junction to the case) and θJA (thermal resistance from junction to ambient) using the following equations:

Junction Temperature: TJ = TC + P * θJC
Ambient Temperature: TA = TJ − P * θJA
Case Temperature: TC = TA + P * (θJA − θJC)

Values for θJA and θJC are given in Table 6-1 for the 68-pin PLCC package. θJA is given at various airflows. Table 6-2 shows the maximum TA allowable without exceeding TC at various airflows. Note that TA can be improved further by attaching a heat sink to the package. P is calculated by using the maximum hot ICC and maximum VCC.
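As a worked illustration of the case-temperature relation above, the Python sketch below solves TC = TA + P * (θJA − θJC) for the maximum ambient temperature, using the 68-pin PLCC thermal resistances of Table 6-1 and the 100°C case limit. The power value is an assumption: the text states only that P uses the maximum hot ICC and maximum VCC, and 125 mA at 5.5 V (0.6875 W) is used here because it yields results within about 0.1°C of the Table 6-2 entries. The names are hypothetical.

```python
# Thermal resistances for the 68-pin PLCC package, in °C/W (Table 6-1).
THETA_JC = 8.0
THETA_JA = {0: 30.0, 200: 25.0, 400: 20.0, 600: 15.5, 800: 13.0, 1000: 12.0}

TC_MAX = 100.0  # maximum case temperature, °C

# Assumption: 125 mA at VCC = 5.5 V; the data sheet does not state this figure.
ASSUMED_POWER_W = 0.125 * 5.5


def max_ambient(power_w, airflow_ft_min):
    """Maximum TA (°C) that keeps TC at or below TC_MAX for a given airflow."""
    return TC_MAX - power_w * (THETA_JA[airflow_ft_min] - THETA_JC)


def junction_temp(case_temp_c, power_w):
    """TJ = TC + P * θJC."""
    return case_temp_c + power_w * THETA_JC


if __name__ == "__main__":
    for airflow in sorted(THETA_JA):
        ta = max_ambient(ASSUMED_POWER_W, airflow)
        print(f"{airflow:>4} ft/min: max TA = {ta:.2f} °C")
    print(f"TJ at TC = 100 °C: {junction_temp(TC_MAX, ASSUMED_POWER_W):.1f} °C")
```

Substituting a different power figure (for example, one derived from the ICC maximum for the operating frequency in use) shows how the ambient limit shifts.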
The case temperature (TC) may be measured in any environment to determine whether the Intel387 SX Math CoProcessor is within the specified operating range. The case temperature should be measured at the center of the top surface.

Table 6-1. Thermal Resistances (°C/Watt) θJC and θJA

                    θJA versus Airflow - ft/min (m/sec)
Package      θJC    0 (0)   200 (1.01)   400 (2.03)   600 (3.04)   800 (4.06)   1000 (5.07)
68-Pin PLCC  8      30      25           20           15.5         13           12

Table 6-2. Maximum TA at Various Airflows

             TA (°C) versus Airflow - ft/min (m/sec)
Package      0 (0)   200 (1.01)   400 (2.03)   600 (3.04)   800 (4.06)   1000 (5.07)
68-Pin PLCC  84.9    88.3         91.8         94.8         96.6         97.2

Maximum TA is calculated at maximum VCC and maximum ICC.

7.0 ELECTRICAL CHARACTERISTICS

The following specifications represent the targets of the design effort. They are subject to change without notice. Contact your Intel representative to get the most up-to-date values.

7.1 Absolute Maximum Ratings*

Case Temperature TC Under Bias ............ 0°C to +100°C
Storage Temperature ....................... −65°C to +150°C
Voltage on Any Pin with Respect to Ground . −0.5 to VCC + 0.5
Power Dissipation ......................... 0.8W

NOTICE: This is a production data sheet. The specifications are subject to change without notice.

*WARNING: Stressing the device beyond the “Absolute Maximum Ratings” may cause permanent damage. These are stress ratings only. Operation beyond the “Operating Conditions” is not recommended and extended exposure beyond the “Operating Conditions” may affect device reliability.

7.2 D.C. Characteristics

Table 7-1. D.C.
Specifications
TC = 0°C to +100°C, VCC = 5V ± 10%

Symbol  Parameter                              Min        Max        Units  Test Conditions
VIL     Input LO Voltage                       −0.3       +0.8       V      (Note 1)
VIH     Input HI Voltage                       2.0        VCC + 0.3  V      (Note 1)
VCL     CPUCLK2 and NUMCLK2 Input LO Voltage   −0.3       +0.8       V
VCH     CPUCLK2 and NUMCLK2 Input HI Voltage   VCC − 0.8  VCC + 0.8  V
VOL     Output LO Voltage                                 0.45       V      (Note 2)
VOH     Output HI Voltage                      2.4                   V      (Note 3)
VOH     Output HI Voltage                      VCC − 0.8             V      (Note 4)
ICC     Power Supply Current
          Dynamic Mode
            Freq. = 33 MHz(5)                             150        mA     ICC typ. = 135 mA
            Freq. = 25 MHz(5)                             150        mA     ICC typ. = 130 mA
            Freq. = 20 MHz(5)                             125        mA     ICC typ. = 110 mA
            Freq. = 16 MHz(5)                             100        mA     ICC typ. = 90 mA
            Freq. = 1 MHz(5)                              20         mA     ICC typ. = 5 mA
          Idle Mode(6)                                    7          mA     ICC typ. = 4 mA
ILI     Input Leakage Current                             ±15        µA     0V ≤ VIN ≤ VCC
ILO     I/O Leakage Current                               ±15        µA     0.45V ≤ VO ≤ VCC
CIN     Input Capacitance                      7          10         pF     fc = 1 MHz
CO      I/O Capacitance                        7          12         pF     fc = 1 MHz
CCLK    Clock Capacitance                      7          20         pF     fc = 1 MHz

NOTES:
1. This parameter is for all inputs, excluding the clock inputs.
2. This parameter is measured at IOL as follows: Data = 4.0 mA; READYO#, ERROR#, BUSY#, PEREQ = 25 mA
3. This parameter is measured at IOH as follows: Data = 1.0 mA; READYO#, ERROR#, BUSY#, PEREQ = 0.6 mA
4. This parameter is measured at IOH as follows: Data = 0.2 mA; READYO#, ERROR#, BUSY#, PEREQ = 0.12 mA
5. Synchronous Clock Mode (CKM = 1). ICC is measured at steady state, maximum capacitive loading on the outputs, and worst-case D.C. level at the inputs.
6. Intel387 SX Math CoProcessor Internal Idle Mode. Synchronous clock mode, clock and control inputs are active but the Math CoProcessor is not executing an instruction. Outputs driving CMOS inputs.

7.3 A.C. Characteristics

Table 7-2a.
Timing Requirements of the Bus Interface Unit
TC = 0°C to +100°C, VCC = 5V ± 10% (All measurements made at 1.5V unless otherwise specified)

Pin Symbol Parameter CPUCLK2 CPUCLK2 CPUCLK2 CPUCLK2 CPUCLK2 CPUCLK2 CPUCLK2 t1 t2a t2b t3a t3b t4 t5 READYO# PEREQ BUSY# ERROR# 16 MHz – 25 MHz 33 MHz Test Conditions Refer to Figure 7.2 4 4 2.0V 2.0V VCC − 0.8V 2.0V 0.8V From VCC − 0.8V to 0.8V From 0.8V to VCC − 0.8V 4 4 4 4 17 21 21 23 CL CL CL CL 50 pF 50 pF 50 pF 50 pF 7.3 37 CL = 50 pF 7.4 24 0 8 8 6 40 40 40 40 1 1 1 1 30 30 30 30 Min (ns) Max (ns) Min (ns) Max (ns) Period High Time High Time Low Time Low Time Fall Time Rise Time 20 6 3 6 4 DC 15 6.25 4.5 6.25 4.5 DC t7a t7b t7c t7d Out Delay Out Delay Out Delay Out Delay 4 4 4 4 25 23 23 23 D15–D0 D15–D0 D15–D0 D15–D0 t8 t10 t11 t12* Out Delay Setup Time Hold Time Float Time 1 11 11 6 45 READYO# PEREQ BUSY# ERROR# t13a* t13b* t13c* t13d* Float Time Float Time Float Time Float Time 1 1 1 1 ADS# ADS# W/R# W/R# t14a t15a t14b t15b Setup Time Hold Time Setup Time Hold Time 15 4 15 4 13 4 13 4 7.4 READY# READY# CMD0# CMD0# NPS1#, NPS2 NPS1#, NPS2 STEN STEN t16a t17a t16b t17b t16c t17c t16d t17d Setup Time Hold Time Setup Time Hold Time Setup Time Hold Time Setup Time Hold Time 9 4 16 2 16 2 15 2 7 4 13 2 13 2 13 2 7.4 RESETIN RESETIN t18 t19 Setup Time Hold Time 8 3 5 2 7.5 7 7 = = = = 19 7.6

NOTE: *Float condition occurs when maximum output current becomes less than ILO in magnitude. Float delay is not tested.

Table 7-2b.
Timing Requirements of the Execution Unit (Asynchronous Mode CKM = 0)

                                       16 MHz – 25 MHz       33 MHz
Pin              Symbol  Parameter     Min (ns)  Max (ns)    Min (ns)  Max (ns)   Test Conditions            Refer to Figure
NUMCLK2          t1      Period        20        500         15        500        2.0V                       7.2
NUMCLK2          t2a     High Time     6                     6.25                 2.0V
NUMCLK2          t2b     High Time     3                     4.5                  VCC − 0.8V
NUMCLK2          t3a     Low Time      6                     6.25                 2.0V
NUMCLK2          t3b     Low Time      4                     4.5                  0.8V
NUMCLK2          t4      Fall Time               7                     6          From VCC − 0.8V to 0.8V
NUMCLK2          t5      Rise Time               7                     6          From 0.8V to VCC − 0.8V
NUMCLK2/CPUCLK2          Ratio         10/16     14/10       10/16     14/10

NOTE: If not used (CKM = 1) tie NUMCLK2 low.

Table 7-2c. Other A.C. Parameters

Pin            Symbol  Parameter                                                Min  Max  Units
RESETIN        t30     Duration                                                 40        NUMCLK2
RESETIN        t31     RESETIN Inactive to 1st Opcode Write                     50        NUMCLK2
BUSY#          t32     Duration                                                 6         CPUCLK2
BUSY#, ERROR#  t33     ERROR# (In)Active to BUSY# Inactive                      6         CPUCLK2
PEREQ, ERROR#  t34     PEREQ Inactive to ERROR# Active                          6         CPUCLK2
READY#, BUSY#  t35     READY# Active to BUSY# Active                            0    4    CPUCLK2
READY#         t36     Minimum Time from Opcode Write to Opcode/Operand Write   4         CPUCLK2
READY#         t37     Minimum Time from Operand Write to Operand Write         4         CPUCLK2

NOTE: *Typical part under worst-case conditions.
Figure 7-1a. Typical Output Valid Delay vs Load Capacitance at Max Operating Temperature

NOTE: *Typical part under worst-case conditions.
Figure 7-1b. Typical Output Slew Time vs Load Capacitance at Max Operating Temperature

Figure 7-1c. Maximum ICC vs Frequency

Figure 7-2. CPUCLK2/NUMCLK2 Waveform and Measurement Points for Input/Output

Figure 7-3. Output Signals

Figure 7-4. Input and I/O Signals

NOTE: The second internal processor phase following RESET high to low transition is PH2.
Figure 7-5. RESET Signal

Figure 7-6. Float from STEN

*In NUMCLK2’s
**or last operand
NOTE:
1.
Memory read (operand) cycle is not shown.

Figure 7-7. Other Parameters

8.0 INTEL387 SX MATH COPROCESSOR INSTRUCTION SET

Instructions for the Intel387 SX Math CoProcessor assume one of the five forms shown in Table 8-1. In all cases, instructions are at least two bytes long and begin with the bit pattern 11011B, which identifies the ESCAPE class of instruction. Instructions that refer to memory operands specify addresses using the CPU’s addressing modes.

MOD (Mode field) and R/M (Register/Memory specifier) have the same interpretation as the corresponding fields of CPU instructions (refer to Programmer’s Reference Manual for the CPU). SIB (Scale Index Base) byte and DISP (displacement) are optionally present in instructions that have MOD and R/M fields. Their presence depends on the values of MOD and R/M, as for instructions of the CPU.

The instruction summaries that follow in Table 8-2 assume that the instruction has been prefetched, decoded, and is ready for execution; that bus cycles do not require wait states; that there are no local bus HOLD requests delaying processor access to the bus; and that no exceptions are detected during instruction execution. If the instruction has MOD and R/M fields that call for both base and index registers, add one clock.

Table 8-1.
Instruction Formats

Instruction  First Byte             Second Byte              Optional Fields
1            11011   OPA     1      MOD   1   OPB    R/M     SIB  DISP
2            11011   MF      OPA    MOD   OPB*       R/M     SIB  DISP
3            11011   d   P   OPA    1 1   OPB*       ST(i)
4            11011   0   0   1      1 1   1   OP
5            11011   0   1   1      1 1   1   OP
Bits         15–11   10  9   8      7 6   5   4 3    2 1 0

OP = Instruction opcode, possibly split into two fields OPA and OPB
MF = Memory Format
  00 - 32-bit real
  01 - 32-bit integer
  10 - 64-bit real
  11 - 16-bit integer
d = Destination
  0 - Destination is ST(0)
  1 - Destination is ST(i)
R XOR d = 0 - Destination (op) Source
R XOR d = 1 - Source (op) Destination
*In FSUB and FDIV, the low-order bit of OPB is the R (reversed) bit
P = POP
  0 - Do not pop stack
  1 - Pop stack after operation
ESC = 11011
ST(i) = Register stack element i
  000 = Stack top
  001 = Second stack element
  ...
  111 = Eighth stack element

Columns: Instruction; Encoding (Byte 0, Byte 1, Optional Bytes 2 – 6); Clock Count Range (32-Bit Real, 32-Bit Integer, 64-Bit Real, 16-Bit Integer)

DATA TRANSFER FLD = Load(a) Integer/real memory to ST(0) ESC MF 1 MOD 000 R/M SIB/DISP 11 – 20 Long integer memory to ST(0) ESC 111 MOD 101 R/M SIB/DISP Extended real memory to ST(0) ESC 011 MOD 101 R/M SIB/DISP 16 – 47 BCD memory to ST(0) ESC 111 MOD 100 R/M SIB/DISP 49 – 101 ST(i) to ST(0) ESC 001 11000 ST(i) ST(0) to integer/real memory ESC MF 1 MOD 010 R/M ST(0) to ST(i) ESC 101 11010 ST(i) 28 – 44 20 – 27 16-Bit Integer 42 – 53 30 – 58 7 – 12 FST = Store SIB/DISP 27 – 45 59 – 78 59 58 – 76 59 58 – 76 7 – 11 FSTP = Store and Pop ST(0) to integer/real memory ESC MF 1 MOD 011 R/M SIB/DISP ST(0) to long integer memory ESC 111 MOD 111 R/M SIB/DISP 27 – 45 59 – 78 ST(0) to extended real memory ESC 011 MOD 111 R/M SIB/DISP 50 – 56 ST(0) to BCD memory ESC 111 MOD 110 R/M SIB/DISP 116 – 194 ST(0) to ST(i) ESC 101 11011 ST(i) 7 – 11 ESC 001 11001 ST(i) 10 – 17 Integer/real memory to ST(0) ESC MF 0 MOD 010 R/M ST(i) to ST(0) ESC 000 11010 ST(i) Integer/real memory to ST(0) ESC MF 0 MOD 011 R/M ST(i) to ST(0) ESC 000 11011 ST(i) 64 – 86
FXCH = Exchange ST(i) and ST(0) COMPARISON FCOM = Compare SIB/DISP 15 – 27 36 – 54 18 – 31 39 – 62 13 – 21 FCOMP = Compare and pop SIB/DISP 15 – 27 36 – 54 18 – 31 39 – 62 13 – 21 FCOMPP = Compare and pop twice ESC 110 1101 1001 13 – 21 FTST = Test ST(0) ST(1) to ST(0) ESC 001 1110 0100 17 – 25 FUCOM = Unordered compare ESC 101 11100 ST(i) 13 – 21 FUCOMP = Unordered compare and pop ESC 101 11101 ST(i) 13 – 21 FUCOMPP = Unordered compare and pop twice ESC 010 1110 1001 13 – 21 FXAM = Examine ST(0) ESC 001 1110 0101 24 – 37

Shaded areas indicate instructions not available in 8087/80287.
NOTE:
a. When loading single or double precision zero from memory, add 5 clocks.

Columns: Instruction; Encoding (Byte 0, Byte 1, Optional Bytes 2 – 6); Clock Count Range (32-Bit Real, 32-Bit Integer, 64-Bit Real, 16-Bit Integer)

14 – 31 36 – 58 19 – 38 38 – 64 Integer/real memory to ST(0) ESC MF 0 MOD 000 R/M SIB/DISP ST(i) and ST(0) ESC d P 0 11000 ST(i) SIB/DISP ESC MF 0 MOD 10 R R/M SIB/DISP ARITHMETIC FADD = Add 12 – 26(b) FSUB = Subtract Integer/real memory with ST(0) ST(i) to ST(0) 14 – 31 36 – 58 19 – 38 38 – 64(c) 12 – 26(d) ESC d P 0 1110 R R/M Integer/real memory with ST(0) ESC MF 0 MOD 001 R/M ST(i) and ST(0) ESC d P 0 1100 1 R/M Integer/real memory with ST(0) ESC MF 0 MOD 11 R R/M ST(i) and ST(0) ESC d P 0 1111 R R/M 77 – 80(h) FSQRT(i) = Square root ESC 001 1111 1010 97 – 111 FSCALE = Scale ST(0) by ST(1) ESC 001 1111 1101 44 – 82 FPREM = Partial remainder ESC 001 1111 1000 56 – 140 FPREM1 = Partial remainder (IEEE) ESC 001 1111 0101 81 – 168 FRNDINT = Round ST(0) to integer ESC 001 1111 1100 41 – 62 FMUL = Multiply SIB/DISP 21 – 33 45 – 73 27 – 57 46 – 74 17 – 50(e) FDIV = Divide SIB/DISP 79 – 87 103 – 116(f) 85 – 95 FXTRACT = Extract components of ST(0) ESC 001 1111 0100 42 – 63 FABS = Absolute value of ST(0) ESC 001 1110 0001 14 – 21 FCHS = Change sign of ST(0) ESC 001 1110 0000 17 – 24 105 – 124(g)

TRANSCENDENTAL
FCOS(k) = Cosine of ST(0) ESC 001 1111 1111 122 – 680
FPTAN(k) = Partial tangent of ST(0) ESC 001 1111 0010 162 – 430(j)
FPATAN = Partial arctangent of ST(0) ESC 001 1111 0011 250 – 420
FSIN(k) = Sine of ST(0) ESC 001 1111 1110 121 – 680
FSINCOS(k) = Sine and cosine of ST(0) ESC 001 1111 1011 150 – 650
F2XM1(l) = 2^ST(0) − 1 ESC 001 1111 0000 167 – 410
FYL2X(m) = ST(1) * log2(ST(0)) ESC 001 1111 0001 99 – 436
FYL2XP1(n) = ST(1) * log2[ST(0) + 1.0] ESC 001 1111 1001 210 – 447

Shaded areas indicate instructions not available in 8087/80287.
NOTES:
b. Add 3 clocks to the range when d = 1.
c. Add 1 clock to each range when R = 1.
d. Add 3 clocks to the range when d = 0.
e. Typical = 52 (when d = 0, 46 – 54, typical = 49).
f. Add 1 clock to the range when R = 1.
g. 135 – 141 when R = 1.
h. Add 3 clocks to the range when d = 1.
i. −0 ≤ ST(0) ≤ +∞.
j. These timings hold for operands in the range |x| < π. For operands not in this range, up to 76 additional clocks may be needed to reduce the operand.
k. 0 ≤ ST(0) < 2^63.
l. −1.0 ≤ ST(0) ≤ 1.0.
m. 0 ≤ ST(0) < ∞, −∞ < ST(1) < +∞.
n. 0 ≤ |ST(0)| < [2 − SQRT(2)]/2, −∞ < ST(1) < +∞.
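The encodings in the instruction tables are written as bit fields, with ESC standing for the pattern 11011. As a small illustration of how such a row maps to opcode bytes, the Python sketch below packs a 16-bit pattern into two bytes and builds the register-stack form (ESC d P OPA 11 OPB ST(i)) used by rows such as FADD. The helper names `encode` and `stack_form` are hypothetical, and the sketch covers only two-byte encodings without SIB or DISP bytes.

```python
ESC = "11011"  # ESCAPE class prefix, as given in the instruction format legend


def encode(bit_pattern):
    """Pack a 16-bit pattern such as '11011 001 1111 1110' into two bytes."""
    bits = bit_pattern.replace(" ", "")
    if len(bits) != 16 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 16 binary digits")
    return int(bits, 2).to_bytes(2, "big")


def stack_form(d, p, opa, opb, i):
    """Build the register-stack form: ESC d P OPA | 11 OPB ST(i).

    d and p are 0 or 1, opa is a 1-bit string, opb is a 3-bit string,
    and i is the stack element number 0..7.
    """
    if d not in (0, 1) or p not in (0, 1):
        raise ValueError("d and p must be 0 or 1")
    if not 0 <= i <= 7:
        raise ValueError("ST(i) index must be 0..7")
    return encode(f"{ESC} {d}{p}{opa} 11 {opb} {i:03b}")


if __name__ == "__main__":
    # FSIN row: ESC 001 1111 1110
    print(encode(f"{ESC} 001 1111 1110").hex())
    # FADD with ST(1) as source: ESC d P 0 11000 ST(i), with d = 0 and P = 0
    print(stack_form(0, 0, "0", "000", 1).hex())
```

Memory forms additionally carry the MOD and R/M fields and any SIB or DISP bytes, which follow the CPU’s own addressing rules and are left out of this sketch.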
Columns: Instruction; Encoding (Byte 0, Byte 1, Optional Bytes 2 – 6); Clock Count Range (32-Bit Real, 32-Bit Integer, 64-Bit Real, 16-Bit Integer)

CONSTANTS
FLDZ = Load +0.0 to ST(0) ESC 001 1110 1110 10 – 17
FLD1 = Load +1.0 to ST(0) ESC 001 1110 1000 15 – 22
FLDPI = Load π to ST(0) ESC 001 1110 1011 26 – 36
FLDL2T = Load log2(10) to ST(0) ESC 001 1110 1001 26 – 36
FLDL2E = Load log2(e) to ST(0) ESC 001 1110 1010 26 – 36
FLDLG2 = Load log10(2) to ST(0) ESC 001 1110 1100 25 – 35
FLDLN2 = Load loge(2) to ST(0) ESC 001 1110 1101 26 – 38

PROCESSOR CONTROL FINIT = Initialize Math CoProcessor ESC 011 1110 0011 FLDCW = Load control word from memory ESC 001 MOD 101 R/M SIB/DISP 19 33 FSTCW = Store control word to memory ESC 001 MOD 111 R/M SIB/DISP 15 FSTSW = Store status word to memory ESC 101 MOD 111 R/M SIB/DISP 15 FSTSW AX = Store status word to AX ESC 111 1110 0000 FCLEX = Clear exceptions ESC 011 1110 0010 FSTENV = Store environment ESC 001 MOD 110 R/M SIB/DISP 117 – 118 FLDENV = Load environment ESC 001 MOD 100 R/M SIB/DISP 85 13 11 FSAVE = Save state ESC 101 MOD 110 R/M SIB/DISP 402 – 403 FRSTOR = Restore state ESC 101 MOD 100 R/M SIB/DISP 415 FINCSTP = Increment stack pointer ESC 001 1111 0111 FDECSTP = Decrement stack pointer ESC 001 1111 0110 22 FFREE = Free ST(i) ESC 101 1100 0 ST(i) 18 FNOP = No operations ESC 001 1101 0000 12 21

APPENDIX A
INTEL387 SX MATH COPROCESSOR COMPATIBILITY

A.1 8087/80287 Compatibility

This section summarizes the differences between the Intel387 SX Math CoProcessor and the 80287 Math CoProcessor. Any migration from the 8087 directly to the Intel387 SX Math CoProcessor must also take into account the differences between the 8087 and the 80287 Math CoProcessor as listed in Appendix B.

Many changes have been designed into the Intel387 SX Math CoProcessor to directly support the IEEE standard in hardware.
These changes result in increased performance by eliminating the need for software that supports the standard.

A.1.1 GENERAL DIFFERENCES

The Intel387 SX Math CoProcessor supports only affine closure for infinity arithmetic, not projective closure.

Operands for FSCALE and FPATAN are no longer restricted in range (except for ±∞); F2XM1 and FPTAN accept a wider range of operands.

Rounding control is in effect for FLD constant.

Software cannot change entries of the tag word to values (other than empty) that differ from actual register contents.

After reset, FINIT, and incomplete FPREM, the Intel387 SX Math CoProcessor resets to zero the condition code bits C3–C0 of the status word.

In conformance with the IEEE standard, the Intel387 SX Math CoProcessor does not support the special data formats pseudo-zero, pseudo-NaN, pseudo-infinity, and unnormal.

The denormal exception has a different purpose on the Intel387 SX Math CoProcessor. A system that uses the denormal exception handler solely to normalize the denormal operands would do better to mask the denormal exception on the Intel387 SX Math CoProcessor. The Intel387 SX Math CoProcessor automatically normalizes denormal operands when the denormal exception is masked.

A.1.2 EXCEPTIONS

A number of differences exist due to changes in the IEEE standard and to functional improvements to the architecture of the Intel387 SX Math CoProcessor:

1. When the overflow or underflow exception is masked, the Intel387 SX Math CoProcessor differs from the 80287 in rounding when overflow or underflow occurs. The Intel387 SX Math CoProcessor produces results that are consistent with the rounding mode.
2. When the underflow exception is masked, the Intel387 SX Math CoProcessor sets its underflow flag only if there is also a loss of accuracy during denormalization.
3. Fewer invalid-operation exceptions occur due to denormal operands, because the instructions FSQRT, FDIV, FPREM, and conversions to BCD or to integer normalize denormal operands before proceeding.
4. The FSQRT, FBSTP, and FPREM instructions may cause underflow, because they support denormal operands.
5. The denormal exception can occur during the transcendental instructions and the FXTRACT instruction.
6. The denormal exception no longer takes precedence over all other exceptions.
7. When the denormal exception is masked, the Intel387 SX Math CoProcessor automatically normalizes denormal operands. The 8087/80287 performs unnormal arithmetic, which might produce an unnormal result.
8. When the operand is zero, the FXTRACT instruction reports a zero-divide exception and leaves −∞ in ST(1).
9. The status word has a new bit (SF) that signals when invalid-operation exceptions are due to stack underflow or overflow.
10. FLD extended precision no longer reports denormal exceptions, because the instruction is not numeric.
11. FLD single/double precision when the operand is denormal converts the number to extended precision and signals the denormal operand exception. When loading a signaling NaN, FLD single/double precision signals an invalid-operation exception.
12. The Intel387 SX Math CoProcessor only generates quiet NaNs (as on the 80287); however, the Intel387 SX Math CoProcessor distinguishes between quiet NaNs and signaling NaNs. Signaling NaNs trigger exceptions when they are used as operands; quiet NaNs do not (except for FCOM, FIST, and FBSTP which also raise IE for quiet NaNs).
13. When stack overflow occurs during FPTAN and overflow is masked, both ST(0) and ST(1) contain quiet NaNs. The 80287/8087 leaves the original operand in ST(1) intact.
14. When the scaling factor is ±∞, the FSCALE instruction behaves as follows:
• FSCALE (0, ∞) generates the invalid operation exception.
• FSCALE (finite, −∞) generates zero with the same sign as the scaled operand.
    - FSCALE(finite, +∞) generates ∞ with the same sign as the scaled operand.
    The 8087/80287 returns zero in the first case and raises the invalid-operation exception in the other cases.

15. The Intel387 SX Math CoProcessor returns signed infinity/zero as the unmasked response to massive overflow/underflow. The 8087 and 80287 support a limited range for the scaling factor; within this range either massive overflow/underflow does not occur or undefined results are produced.

APPENDIX B
COMPATIBILITY BETWEEN THE 80287 AND 8087 MATH COPROCESSOR

The 80286/80287 operating in Real Address mode will execute 8086/8087 programs without major modification. However, because of differences in the handling of numeric exceptions by the 80287 Math CoProcessor and the 8087 Math CoProcessor, exception handling routines may need to be changed.

This appendix summarizes the differences between the 80287 Math CoProcessor and the 8087 Math CoProcessor, and provides details showing how 8086/8087 programs can be ported to the 80286/80287.

1. The Math CoProcessor signals exceptions through a dedicated ERROR# line to the 80286. The Math CoProcessor error signal does not pass through an interrupt controller (the 8087 INT signal does). Therefore, any interrupt-controller-oriented instructions in numeric exception handlers for the 8086/8087 should be deleted.

2. The 8087 instructions FENI and FDISI perform no useful function in the 80287. If the 80287 encounters one of these opcodes in its instruction stream, the instruction will effectively be ignored; none of the 80287 internal states will be updated. While 8086/8087 programs containing these instructions may be executed on the 80286/80287, it is unlikely that the exception handling routines containing these instructions will be completely portable to the 80287.

3. Interrupt vector 16 must point to the numeric exception handling routine.

4. The ESC instruction address saved in the 80287 includes any leading prefixes before the ESC opcode. The corresponding address saved in the 8087 does not include leading prefixes.

5. In Protected Address mode, the format of the 80287's saved instruction and address pointers is different from that of the 8087. The instruction opcode is not saved in Protected mode; exception handlers will have to retrieve the opcode from memory if needed.

6. Interrupt 7 will occur in the 80286 when executing ESC instructions with either TS (task switched) or EM (emulation) of the 80286 MSW set (TS = 1 or EM = 1). If TS is set, then a WAIT instruction will also cause interrupt 7. An exception handler should be included in 80286/80287 code to handle these situations.

7. Interrupt 9 will occur if the second or subsequent words of a floating-point operand fall outside a segment's size. Interrupt 13 will occur if the starting address of a numeric operand falls outside a segment's size. An exception handler should be included in 80286/80287 code to report these programming errors.

8. Except for the processor control instructions, all of the 80287 numeric instructions are automatically synchronized by the 80286 CPU; the 80286 CPU automatically tests the BUSY# line from the 80287 to ensure that the 80287 has completed its previous instruction before executing the next ESC instruction. No explicit WAIT instructions are required to assure this synchronization. For the 8087 used with 8086 and 8088 processors, explicit WAITs are required before each numeric instruction to ensure synchronization. Although 8086/8087 programs having explicit WAIT instructions will execute perfectly on the 80286/80287 without reassembly, these WAIT instructions are unnecessary.

9. Since the 80287 does not require WAIT instructions before each numeric instruction, the ASM286 assembler does not automatically generate these WAIT instructions. The ASM86 assembler, however, automatically precedes every ESC instruction with a WAIT instruction. Although numeric routines generated using the ASM86 assembler will generally execute correctly on the 80286/80287, reassembly using ASM286 may result in a more compact code image.

   The processor control instructions for the 80287 may be coded using either a WAIT or No-WAIT form of mnemonic. The WAIT forms of these instructions cause ASM286 to precede the ESC instructions with a CPU WAIT instruction, in the same manner as ASM86 does.
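The code-size effect described in item 9 can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the behavior of any real assembler: the helper names is_esc and emit are hypothetical, instructions are assumed to carry no prefix bytes, and the byte values assume the usual ESC opcode pattern 11011xxx (0xD8 to 0xDF) and the WAIT opcode 0x9B. The three numeric instructions use the encodings listed for FLDZ, FLDPI, and FNOP in the instruction table.

```python
WAIT = 0x9B  # CPU WAIT opcode (assumed standard 8086-family encoding)


def is_esc(instr: bytes) -> bool:
    """Return True if the first byte matches the ESC pattern 11011xxx."""
    return len(instr) > 0 and (instr[0] & 0xF8) == 0xD8


def emit(instrs, wait_before_esc: bool) -> bytes:
    """Concatenate instructions, optionally placing WAIT before each ESC.

    wait_before_esc=True models the ASM86 behavior described in item 9;
    wait_before_esc=False models ASM286 for ordinary numeric instructions.
    """
    out = bytearray()
    for instr in instrs:
        if wait_before_esc and is_esc(instr):
            out.append(WAIT)
        out += instr
    return bytes(out)


# Hypothetical instruction stream: three numeric instructions and one CPU NOP.
program = [
    bytes([0xD9, 0xEE]),  # FLDZ:  ESC 001, 1110 1110
    bytes([0xD9, 0xEB]),  # FLDPI: ESC 001, 1110 1011
    bytes([0xD9, 0xD0]),  # FNOP:  ESC 001, 1101 0000
    bytes([0x90]),        # CPU NOP, not an ESC instruction
]

asm86_style = emit(program, wait_before_esc=True)
asm286_style = emit(program, wait_before_esc=False)

print(asm86_style.hex(" "))
print(asm286_style.hex(" "))
print(len(asm86_style) - len(asm286_style), "bytes saved")
```

In this model the ASM86-style image is one byte longer per ESC instruction, which is the compaction that reassembly with ASM286 may recover.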