November 1995

NS32CG16-10/NS32CG16-15 High-Performance Printer/Display Processor

General Description

The NS32CG16 is a 32-bit microprocessor in the Series 32000/EP™ family that provides special features for graphics applications. It is specifically designed to support page oriented printing technologies such as Laser, LCS, LED, Ion-Deposition and InkJet.

The NS32CG16 provides a 16 Mbyte linear address space and a 16-bit external data bus. It also has a 32-bit ALU, an eight-byte prefetch queue, and a slave processor interface.

The capabilities of the NS32CG16 can be expanded by using an external floating point unit which interfaces to the NS32CG16 as a slave processor. This combination provides optimal support for outline character fonts.

The NS32CG16's highly efficient architecture, in addition to the built-in capabilities for supporting BITBLT (BIT-aligned BLock Transfer) operations and other special graphics functions, makes the device the ideal choice to handle a variety of page description languages such as Postscript™ and PCL™.

Features

- Software compatible with the Series 32000/EP processors
- 32-bit architecture and implementation
- Special support for graphics applications
    - 18 graphics instructions
    - Binary compression/expansion capability for font storage using RLL encoding
    - Pattern magnification
    - Interface to an external BITBLT processing unit for fast color BITBLT operations
- On-chip clock generator
- Floating-point support via the NS32081 or NS32181
- Optimal interface to large memory arrays via the NS32CG821 and the DP84xx family of DRAM controllers
- Power save mode
- High-speed CMOS technology
- 68-pin PLCC package

Block Diagram

[Figure: CPU Block Diagram]

Series 32000® is a registered trademark of National Semiconductor Corporation.
EP™ and Embedded System Processors™ are trademarks of National Semiconductor Corporation.
Postscript™ is a trademark of Adobe Systems, Inc.
PCL™ is a trademark of Hewlett Packard.
© 1995 National Semiconductor Corporation   TL/EE/9424   RRD-B30M115/Printed in U.S.A.

PRELIMINARY

Table of Contents

1.0 PRODUCT INTRODUCTION
    1.1 NS32CG16 Special Features
2.0 ARCHITECTURAL DESCRIPTION
    2.1 Register Set
        2.1.1 General Purpose Registers
        2.1.2 Address Registers
        2.1.3 Processor Status Register
        2.1.4 Configuration Register
    2.2 Memory Organization
    2.3 Modular Software Support
    2.4 Instruction Set
        2.4.1 General Instruction Format
        2.4.2 Addressing Modes
        2.4.3 Instruction Set Summary
    2.5 Graphic Support
        2.5.1 Frame Buffer Addressing
        2.5.2 BITBLT Fundamentals
            2.5.2.1 Frame Buffer Architecture
            2.5.2.2 BIT Alignment
            2.5.2.3 Block Boundaries and Destination Masks
            2.5.2.4 BITBLT Directions
            2.5.2.5 BITBLT Variations
        2.5.3 Graphics Support Instructions
            2.5.3.1 BITBLT (BIT-aligned BLock Transfer)
            2.5.3.2 Pattern Fill
            2.5.3.3 Data Compression, Expansion and Magnify
                2.5.3.3.1 Magnifying Compressed Data
3.0 FUNCTIONAL DESCRIPTION
    3.1 Instruction Execution
        3.1.1 Operating States
        3.1.2 Instruction Endings
            3.1.2.1 Completed Instructions
            3.1.2.2 Suspended Instructions
            3.1.2.3 Terminated Instructions
            3.1.2.4 Partially Completed Instructions
        3.1.3 Slave Processor Instructions
            3.1.3.1 Slave Processor Protocol
            3.1.3.2 Floating-Point Instructions
    3.2 Exception Processing
        3.2.1 Exception Acknowledge Sequence
        3.2.2 Returning from an Exception Service Procedure
        3.2.3 Maskable Interrupts
            3.2.3.1 Non-Vectored Mode
            3.2.3.2 Vectored Mode: Non-Cascaded Case
            3.2.3.3 Vectored Mode: Cascaded Case
        3.2.4 Non-Maskable Interrupt
        3.2.5 Traps
        3.2.6 Priority among Exceptions
        3.2.7 Exception Acknowledge Sequences: Detailed Flow
            3.2.7.1 Maskable/Non-Maskable Interrupt Sequence
            3.2.7.2 SLAVE/ILL/SVC/DVZ/FLG/BPT/UND Trap Sequence
            3.2.7.3 Trace Trap Sequence
    3.3 Debugging Support
        3.3.1 Instruction Tracing
    3.4 System Interface
        3.4.1 Power and Grounding
        3.4.2 Clocking
        3.4.3 Power Save Mode
        3.4.4 Resetting
        3.4.5 Bus Cycles
            3.4.5.1 Bus Status
            3.4.5.2 Basic Read and Write Cycles
            3.4.5.3 Cycle Extension
            3.4.5.4 Instruction Fetch Cycles
            3.4.5.5 Interrupt Control Cycles
            3.4.5.6 Slave Processor Bus Cycles
            3.4.5.7 Data Access Sequences
            3.4.5.8 Bus Access Control
            3.4.5.9 Instruction Status
4.0 DEVICE SPECIFICATIONS
    4.1 NS32CG16 Pin Descriptions
        4.1.1 Supplies
        4.1.2 Input Signals
        4.1.3 Output Signals
        4.1.4 Input-Output Signals
    4.2 Absolute Maximum Ratings
    4.3 Electrical Characteristics
    4.4 Test Loading Characteristics
    4.5 Switching Characteristics
        4.5.1 Definitions
        4.5.2 Timing Tables
            4.5.2.1 Output Signals: Internal Propagation Delays
            4.5.2.2 Input Signal Requirements
        4.5.3 Timing Diagrams
Appendix A: INSTRUCTION FORMATS
Appendix B: INSTRUCTION EXECUTION TIMES
    B.1 Basic and Floating-Point Instructions
        B.1.1 Equations
        B.1.2 Notes on Table Use
        B.1.3 Calculation of the Execution Time TEX for Basic Instructions
        B.1.4 Calculation of the Execution Time TEX for Floating-Point Instructions
    B.2 Special Graphics Instructions
        B.2.1 Execution Time Calculation for Special Graphics Instructions

List of Illustrations

Figure  Title
1-1     CPU Block Diagram
2-1     NS32CG16 Internal Registers
2-2     Processor Status Register (PSR)
2-3     Configuration Register (CFG)
2-4     NS32CG16 Run-Time Environment
2-5     General Instruction Format
2-6     Index Byte Format
2-7     Displacement Encodings
2-8     Correspondence between Linear and Cartesian Addressing
2-9     32-Pixel by 32-Scan Line Frame Buffer
2-10    Overlapping BITBLT Blocks
2-11    BB Instructions Format
2-12    BITWT Instruction Format
2-13    EXTBLT Instruction Format
2-14    MOVMPi Instruction Format
2-15    TBITS Instruction Format
2-16    SBITS Instruction Format
2-17    SBITPS Instruction Format
2-18    Bus Activity for a Simple BITBLT Operation
3-1     Operating States
3-2     Slave Processor Protocol
3-3     Slave Processor Status Word
3-4     Interrupt Dispatch Table
3-5     Exception Acknowledge Sequence
3-6     Return from Trap (RETTn) Instruction Flow
3-7     Return from Interrupt (RETI) Instruction Flow
3-8     Interrupt Control Unit Connections (16 Levels)
3-9     Cascaded Interrupt Control Unit Connections
3-10    Exception Processing Flowchart
3-11    Service Sequence
3-12    Power and Ground Connections
3-13a   Crystal Interconnections: 20 MHz, 30 MHz
3-13b   Crystal Interconnections: 30 MHz
3-14    Recommended Reset Connections
3-15    Power-On Reset Requirements
3-16    General Reset Timings
3-17    Bus Connections
3-18    Read Cycle Timing
3-19    Write Cycle Timing
3-20    Cycle Extension of a Read Cycle
3-21    Slave Processor Read Cycle
3-22    Slave Processor Write Cycle
3-23    NS32CG16 and FPU Interconnections
3-24    Memory Interface
3-25    HOLD Timing, Bus Initially Idle
3-26    HOLD Timing, Bus Initially Not Idle
4-1     Connection Diagram
4-2     Test Loading Configuration
4-3     Output Signals Specification Standard
4-4     Input Signals Specification Standard
4-5     Read Cycle
4-6     Write Cycle
4-7     HOLD Acknowledge Timing (Bus Initially Not Idle)
4-8     HOLD Timing (Bus Initially Idle)
4-9     External DMA Controller Bus Cycle
4-10    Slave Processor Write Timing
4-11    Slave Processor Read Timing
4-12    SPC Timing
4-13    PFS Signal Timing
4-14    ILO Signal Timing
4-15    Clock Waveforms
4-16    INT Signal Timing
4-17    NMI Signal Timing
4-18    Power-On Reset
4-19    Non-Power-On Reset

List of Tables

Table   Title
2-1     NS32CG16 Addressing Modes
2-2     NS32CG16 Instruction Set Summary
2-3     'op' and 'i' Field Encodings
3-1     Floating-Point Instruction Protocols
3-2     Summary of Exception Processing
3-3     External Oscillator Specifications
3-4     Interrupt Sequences
3-5     Bus Cycle Categories
3-6     Data Access Sequences
B-1     Basic Instructions
B-2     Floating-Point Instructions: CPU Portion
B-3     Average Instruction Execution Times with No Wait-States
B-4     Average Instruction Execution Times with Wait-States

1.0 Product Introduction

The NS32CG16 is a high speed CMOS microprocessor in the Series 32000/EP family. The NS32CG16 is software-compatible with all other CPUs in the family. The device incorporates all of the Series 32000 advanced architectural features, with the exception of the virtual memory capability.

Brief descriptions of the NS32CG16 features that are shared with other members of the family are provided below:

Powerful Addressing Modes. Nine addressing modes, available to all instructions, are included to access data structures efficiently.

Data Types. The architecture provides for numerous data types, such as byte, word, doubleword, and BCD, which may be arranged into a wide variety of data structures.

Symmetric Instruction Set. While avoiding special case instructions that compilers can't use, the Series 32000 family incorporates powerful instructions for control operations, such as array indexing and external procedure calls, which save considerable space and time for compiled code.

Memory-to-Memory Operations. The Series 32000 CPUs represent two-address machines. This means that each operand can be referenced by any one of the addressing modes provided. This powerful memory-to-memory architecture permits memory locations to be treated as registers for all useful operations. This is important for temporary operands as well as for context switching.

Large, Uniform Addressing. The NS32CG16 has 24-bit address pointers that can address up to 16 megabytes without any segmentation; this addressing scheme provides flexible memory management without add-on expense.

Modular Software Support. Any software package for the Series 32000 architecture can be developed independent of all other packages, without regard to individual addressing. In addition, ROM code is totally relocatable and easy to access, which allows a significant reduction in hardware and software cost.

Software Processor Concept. The Series 32000 architecture allows future expansions of the instruction set that can be executed by special slave processors, acting as extensions to the CPU. This concept of slave processors is unique to the Series 32000 architecture. It allows software compatibility even for future components because the slave hardware is transparent to the software. With future advances in semiconductor technology, the slaves can be physically integrated on the CPU chip itself.

To summarize, the architectural features cited above provide three primary performance advantages and characteristics:

- High-Level Language Support
- Easy Future Growth Path
- Application Flexibility

1.1 NS32CG16 SPECIAL FEATURES

In addition to the above Series 32000 features, the NS32CG16 provides features that make the device extremely attractive for a wide range of applications where graphics support, low chip count, and low power consumption are required.

The most relevant of these features are the graphics support capabilities, which can be used in applications such as printers, CRT terminals, and other varieties of display systems, where text and graphics are to be handled.

Graphics support is provided by eighteen instructions that allow operations such as BITBLT, data compression/expansion, fills, and line drawing to be performed very efficiently. In addition, the device can be easily interfaced to an external BITBLT Processing Unit (BPU) for high BITBLT performance.

The NS32CG16 allows systems to be built with a relatively small amount of random logic. The bus is highly optimized to allow simple interfacing to a large variety of DRAMs and peripheral devices.
All the relevant bus access signals and clock signals are generated on-chip. The cycle extension logic is also incorporated on-chip.

The device is fabricated in a low-power CMOS technology. It also includes a power-save feature that allows the clock to be slowed down under software control, thus minimizing the power consumption. This feature can be used in those applications where power saving during periods of low performance demand is highly desirable.

The power save feature and the bus characteristics are described in the "Functional Description" section. A general overview of BITBLT operations and a description of the graphics support instructions are provided in Section 2.5. Details on all the NS32CG16 instructions can be found in the NS32CG16 Printer/Display Processor Programmer's Reference Supplement.

Below is a summary of the instructions that are directly applicable to graphics along with their intended use.

Instruction                  Application

BBAND, BBOR, BBFOR,          The BITBLT group of instructions provides a method of quickly
BBXOR, BBSTOD, BITWT,        imaging characters, creating patterns, windowing and other
EXTBLT                       block oriented effects.

MOVMP                        Move Multiple Pattern is a very fast instruction for clearing
                             memory and drawing patterns and lines.

TBITS                        Test Bit String will measure the length of 1's or 0's in an
                             image, supporting many data compression methods (RLL). TBITS
                             may also be used to test for boundaries of images.

SBITS                        Set Bit String is a very fast instruction for filling objects,
                             outline characters and drawing horizontal lines.
                             The TBITS and SBITS instructions support Group 3 and Group 4
                             CCITT standards for compression and decompression algorithms.

SBITPS                       Set Bit Perpendicular String is a very fast instruction for
                             drawing vertical, horizontal and 45° lines.
                             In printing applications SBITS and SBITPS may be used to
                             express portrait and landscape respectively from the same
                             compressed font data. The size of the character may be scaled
                             as it is drawn.

SBIT, CBIT, TBIT, IBIT       The Bit group of instructions enables single pixels anywhere
                             in memory to be set, cleared, tested or inverted.

INDEX                        The INDEX instruction combines a multiply-add sequence into a
                             single instruction. This provides a fast translation of an
                             X-Y address to a pixel relative address.

2.0 Architectural Description

2.1 REGISTER SET

The NS32CG16 CPU has 17 internal registers grouped according to functions as follows: 8 general purpose, 7 address, 1 processor status and 1 configuration. Figure 2-1 shows the NS32CG16 internal registers.

    Address (32 bits)        General Purpose (32 bits)
    PC                       R0
    SP0                      R1
    SP1                      R2
    FP                       R3
    SB                       R4
    INTBASE                  R5
    MOD                      R6
                             R7

    Processor Status         Configuration
    PSR                      CFG

FIGURE 2-1. NS32CG16 Internal Registers

2.1.1 General Purpose Registers

There are eight registers (R0-R7) used for satisfying the high speed general storage requirements, such as holding temporary variables and addresses. The general purpose registers are free for any use by the programmer. They are 32 bits in length. If a general purpose register is specified for an operand that is 8 or 16 bits long, only the low part of the register is used; the high part is not referenced or modified.

2.1.2 Address Registers

The seven address registers are used by the processor to implement specific address functions. Except for the MOD register, which is 16 bits wide, all the others are 32 bits. A description of the address registers follows.

PC: Program Counter. The PC register is a pointer to the first byte of the instruction currently being executed. The PC is used to reference memory in the program section.

SP0, SP1: Stack Pointers. The SP0 register points to the lowest address of the last item stored on the INTERRUPT STACK. This stack is normally used only by the operating system. It is used primarily for storing temporary data, and holding return information for operating system subroutines and interrupt and trap service routines. The SP1 register points to the lowest address of the last item stored on the USER STACK. This stack is used by normal user programs to hold temporary data and subroutine return information.

When a reference is made to the selected Stack Pointer (see PSR S-bit), the terms "SP Register" or "SP" are used. SP refers to either SP0 or SP1, depending on the setting of the S bit in the PSR register. If the S bit in the PSR is 0, SP refers to SP0. If the S bit in the PSR is 1, SP refers to SP1.

Stacks in the Series 32000 architecture grow downward in memory. A Push operation pre-decrements the Stack Pointer by the operand length. A Pop operation post-increments the Stack Pointer by the operand length.

FP: Frame Pointer. The FP register is used by a procedure to access parameters and local variables on the stack. The FP register is set up on procedure entry with the ENTER instruction and restored on procedure termination with the EXIT instruction. The frame pointer holds the address in memory occupied by the old contents of the frame pointer.

SB: Static Base. The SB register points to the global variables of a software module. This register is used to support relocatable global variables for software modules. The SB register holds the lowest address in memory occupied by the global variables of a module.

INTBASE: Interrupt Base. The INTBASE register holds the address of the dispatch table for interrupts and traps (Section 3.2.1).

MOD: Module. The MOD register holds the address of the module descriptor of the currently executing software module. The MOD register is 16 bits long, therefore the module table must be contained within the first 64 kbytes of memory.

2.1.3 Processor Status Register

The Processor Status Register (PSR) holds status information for the microprocessor. The PSR is sixteen bits long, divided into two eight-bit halves. The low order eight bits are accessible to all programs, but the high order eight bits are accessible only to programs executing in Supervisor Mode.

    Bits 15-8 (fields, high to low):  B  I  P  S  U
    Bits 7-0  (fields, high to low):  N  Z  F  J  K  L  T  C

FIGURE 2-2. Processor Status Register (PSR)

C   The C bit indicates that a carry or borrow occurred after an addition or subtraction instruction. It can be used with the ADDC and SUBC instructions to perform multiple-precision integer arithmetic calculations. It may have a setting of 0 (no carry or borrow) or 1 (carry or borrow).

T   The T bit causes program tracing. If this bit is set to 1, a TRC trap is executed after every instruction (Section 3.3.1).

L   The L bit is altered by comparison instructions. In a comparison instruction the L bit is set to "1" if the second operand is less than the first operand, when both operands are interpreted as unsigned integers. Otherwise, it is set to "0". In Floating-Point comparisons, this bit is always cleared.

K   Reserved for use by the CPU.

J   Reserved for use by the CPU.

F   The F bit is a general condition flag, which is altered by many instructions (e.g., integer arithmetic instructions use it to indicate overflow).

Z   The Z bit is altered by comparison instructions. In a comparison instruction the Z bit is set to "1" if the second operand is equal to the first operand; otherwise it is set to "0".

N   The N bit is altered by comparison instructions. In a comparison instruction the N bit is set to "1" if the second operand is less than the first operand, when both operands are interpreted as signed integers. Otherwise, it is set to "0".

U   If the U bit is "1" no privileged instructions may be executed. If the U bit is "0" then all instructions may be executed. When U = 0 the processor is said to be in Supervisor Mode; when U = 1 the processor is said to be in User Mode. A User Mode program is restricted from executing certain instructions and accessing certain registers which could interfere with the operating system. For example, a User Mode program is prevented from changing the setting of the flag used to indicate its own privilege mode. A Supervisor Mode program is assumed to be a trusted part of the operating system, hence it has no such restrictions.

S   The S bit specifies whether the SP0 register or SP1 register is used as the Stack Pointer. The bit is automatically cleared on interrupts and traps. It may have a setting of 0 (use the SP0 register) or 1 (use the SP1 register).

P   The P bit prevents a TRC trap from occurring more than once for an instruction (Section 3.3.1). It may have a setting of 0 (no trace pending) or 1 (trace pending).

I   If I = 1, then all interrupts will be accepted. If I = 0, only the NMI interrupt is accepted. Trap enables are not affected by this bit.

B   Reserved for use by the CPU. This bit is set to 1 during the execution of the EXTBLT instruction and causes the BPU signal to become active. Upon reset, B is set to zero and the BPU signal is set high.

Note 1: When an interrupt is acknowledged, the B, I, P, S and U bits are set to zero and the BPU signal is set high. A return from interrupt will restore the original values from the copy of the PSR register saved in the interrupt stack.

Note 2: If BITBLT (BB) or EXTBLT instructions are executed in an interrupt routine, the PSR bits J and K must be cleared first.

2.1.4 Configuration Register

The Configuration Register (CFG) is 8 bits wide, of which four bits are implemented. The implemented bits are used to declare the presence of certain external devices and to select the clock scaling factor. CFG is programmed by the SETCFG instruction. The format of CFG is shown in Figure 2-3. The various control bits are described below.

    Bits 7-0 (implemented fields, high to low):  C  M  F  I

FIGURE 2-3. Configuration Register (CFG)

I   Interrupt vectoring. This bit controls whether maskable interrupts are handled in nonvectored (I = 0) or vectored (I = 1) mode. Refer to Section 3.2.3 for more information.
F Floating-point instruction set. This bit indicates whether a floating-point unit (FPU) is present to execute floatingpoint instructions. If this bit is 0 when the CPU executes a floating-point instruction, a Trap (UND) occurs. If this bit is 1, then the CPU transfers the instruction and any necessary operands to the FPU using the slave-processor protocol described in Section 3.1.3.1. M Clock scaling. This bit is used in conjuction with the C bit to select the clock scaling factor. C Clock scaling. Same as the M bit above. Refer to Section 3.4.3 on ‘‘Power Save Mode’’ for details. I 8 2.0 Architectural Description (Continued) 2.2 MEMORY ORGANIZATION 2. Static Data Segment. The main memory of the NS32CG16 is a uniform linear address space. Memory locations are numbered sequentially starting at zero and ending at 224 b 1. The number specifying a memory location is called an address. The contents of each memory location is a byte consisting of eight bits. Unless otherwise noted, diagrams in this document show data stored in memory with the lowest address on the right and the highest address on the left. Also, when data is shown vertically, the lowest address is at the top of a diagram and the highest address at the bottom of the diagram. When bits are numbered in a diagram, the least significant bit is given the number zero, and is shown at the right of the diagram. Bits are numbered in increasing significance and toward the left. Used to store variables and data that may be accessed by all procedures within the module. 3. Link Table. This component contains two types of entries: Absolute Addresses and Procedure Descriptors. An Absolute Address is used in the external addressing mode, in conjunction with a displacement and the current MOD Register contents to compute the effective address of an external variable belonging to another module. 
The Procedure Descriptor is used in the call external procedure (CXP) instruction to compute the address of an external procedure. Normally, the linker program specifies the locations of the three components. The Static Data and Link Table typically reside in RAM; the code component can be either in RAM or in ROM. The three components can be mapped into noncontiguous locations in memory, and each can be independently relocated. Since the Link Table contains the absolute addresses of external variables, the linker need not assign absolute memory addresses for these in the module itself; they may be assigned at load time. To handle the transfer of control from one module to another, the NS32CG16 uses a module table in memory and two registers in the CPU. The Module Table is located within the first 64 kbytes of memory. This table contains a Module Descriptor (also called a Module Table Entry) for each module in the address space of the program. A Module Descriptor has four 32-bit entries corresponding to each component of a module: 7 0 A Byte at Address A Two contiguous bytes are called a word. Except where noted, the least significant byte of a word is stored at the lower address, and the most significant byte of the word is stored at the next higher address. In memory, the address of a word is the address of its least significant byte, and a word may start at any address. 15 8 7 0 Aa1 A MSB LSB Word at Address A # The Static Base entry contains the address of the begin- Two contiguous words are called a double-word. Except where noted, the least significant word of a double-word is stored at the lowest address and the most significant word of the double-word is stored at the address two higher. In memory, the address of a double-word is the address of its least significant byte, and a double-word may start at any address. 31 24 23 Aa3 16 15 Aa2 8 7 Aa1 MSB ning of the module’s static data segment. 
# The Link Table Base points to the beginning of the module’s Link Table. # The Program Base is the address of the beginning of the code and constant data for the module. # A fourth entry is currently unused but reserved. The MOD Register in the CPU contains the address of the Module Descriptor for the currently executing module. The Static Base Register (SB) contains a copy of the Static Base entry in the Module Descriptor of the currently executing module, i.e., it points to the beginning of the current module’s static data area. This register is implemented in the CPU for efficiency purposes. By having a copy of the static base entry or chip, the CPU can avoid reading it from memory each time a data item in the static data segment is accessed. In an NS32CG16 software environment modules need not be linked together prior to loading. As modules are loaded, a linking loader simply updates the Module Table and fills the Link Table entries with the appropriate values. No modification of a module’s code is required. Thus, modules may be stored in read-only memory and may be added to a system independently of each other, without regard to their individual addressing. Figure 2-4 shows a typical NS32CG16 run-time environment. 0 A LSB Double Word at Address A Although memory is addressed as bytes, it is actually organized as words. Therefore, words and double-words that are aligned to start at even addresses (multiples of two) are accessed more quickly than words and double-words that are not so aligned. 2.3 MODULAR SOFTWARE SUPPORT The NS32CG16 provides special support for software modules and modular programs. Each module in a NS32CG16 software environment consists of three components: 1. Program Code Segment. This segment contains the module’s code and constant data. 9 2.0 Architectural Description (Continued) TL/EE/9424 – 2 Note: Dashed lines indicate information copied to register during transfer of control between modules. FIGURE 2-4. 
NS32CG16 Run-Time Environment

2.4 INSTRUCTION SET

2.4.1 General Instruction Format

Figure 2-5 shows the general format of a Series 32000 instruction. The Basic Instruction is one to three bytes long and contains the Opcode and up to two 5-bit General Addressing Mode ("Gen") fields. Following the Basic Instruction field is a set of optional extensions, which may appear depending on the instruction and the addressing modes selected.

FIGURE 2-5. General Instruction Format

Index Bytes appear when either or both Gen fields specify Scaled Index. In this case, the Gen field specifies only the Scale Factor (1, 2, 4 or 8), and the Index Byte specifies which General Purpose Register to use as the index, and which addressing mode calculation to perform before indexing.

Following Index Bytes come any displacements (addressing constants) or immediate values associated with the selected addressing modes. Each Disp/Imm field may contain one or two displacements, or one immediate value. The size of a Displacement field is encoded within the top bits of that field, as shown in Figure 2-7, with the remaining bits interpreted as a signed (two's complement) value. The size of an immediate value is determined from the Opcode field. Both Displacement and Immediate fields are stored most-significant byte first. Note that this is different from the memory representation of data (Section 2.2).

Some instructions require additional "implied" immediates and/or displacements, apart from those associated with addressing modes. Any such extensions appear at the end of the instruction, in the order that they appear within the list of operands in the instruction definition (Section 2.4.3).

Byte Displacement: Range -64 to +63 (Figure 2-7)

FIGURE 2-6.
Index Byte Format

Word Displacement: Range -8192 to +8191
Double Word Displacement: Range (Entire Addressing Space)

FIGURE 2-7. Displacement Encodings

2.4.2 Addressing Modes

The NS32CG16 CPU generally accesses an operand by calculating its Effective Address based on information available when the operand is to be accessed. The method to be used in performing this calculation is specified by the programmer as an "addressing mode."

Addressing modes in the NS32CG16 are designed to optimally support high-level language accesses to variables. In nearly all cases, a variable access requires only one addressing mode, within the instruction that acts upon that variable. Extraneous data movement is therefore minimized.

NS32CG16 Addressing Modes fall into nine basic types:

Register: The operand is available in one of the eight General Purpose Registers. In certain Slave Processor instructions, an auxiliary set of eight registers may be referenced instead.

Register Relative: A General Purpose Register contains an address to which is added a displacement value from the instruction, yielding the Effective Address of the operand in memory.

Memory Space: Identical to Register Relative above, except that the register used is one of the dedicated registers PC, SP, SB or FP. These registers point to data areas generally needed by high-level languages.

Memory Relative: A pointer variable is found within the memory space pointed to by the SP, SB or FP register. A displacement is added to that pointer to generate the Effective Address of the operand.

Immediate: The operand is encoded within the instruction. This addressing mode is not allowed if the operand is to be written.

Absolute: The address of the operand is specified by a displacement field in the instruction.

External: A pointer value is read from a specified entry of the current Link Table. To this pointer value is added a displacement, yielding the Effective Address of the operand.
Top of Stack: The currently-selected Stack Pointer (SP0 or SP1) specifies the location of the operand. The operand is pushed or popped, depending on whether it is written or read.

Scaled Index: Although encoded as an addressing mode, Scaled Indexing is an option on any addressing mode except Immediate or another Scaled Index. It has the effect of calculating an Effective Address, then multiplying any General Purpose Register by 1, 2, 4 or 8 and adding into the total, yielding the final Effective Address of the operand.

Table 2-1 is a brief summary of the addressing modes. For a complete description of their actions, see the Series 32000 Instruction Set Reference Manual.

In addition to the general modes, Register-Indirect with auto-increment/decrement and warps or pitch are available on several of the graphics instructions.

TABLE 2-1. NS32CG16 Addressing Modes

ENCODING  MODE                     ASSEMBLER SYNTAX    EFFECTIVE ADDRESS

Register                                               None: Operand is in the specified register.
00000     Register 0               R0 or F0
00001     Register 1               R1 or F1
00010     Register 2               R2 or F2
00011     Register 3               R3 or F3
00100     Register 4               R4 or F4
00101     Register 5               R5 or F5
00110     Register 6               R6 or F6
00111     Register 7               R7 or F7

Register Relative                                      Disp + Register.
01000     Register 0 relative      disp(R0)
01001     Register 1 relative      disp(R1)
01010     Register 2 relative      disp(R2)
01011     Register 3 relative      disp(R3)
01100     Register 4 relative      disp(R4)
01101     Register 5 relative      disp(R5)
01110     Register 6 relative      disp(R6)
01111     Register 7 relative      disp(R7)

Memory Relative                                        Disp2 + Pointer; Pointer found at address Disp1 + Register.
                                                       "SP" is either SP0 or SP1, as selected in PSR.
10000     Frame memory relative    disp2(disp1(FP))
10001     Stack memory relative    disp2(disp1(SP))
10010     Static memory relative   disp2(disp1(SB))

Encodings of the remaining groups:
Reserved       10011
Immediate      10100
Absolute       10101
External       10110
Top Of Stack   10111
Memory Space   11000 11001 11010 11011
Scaled Index   11100 11101 11110 11111
ENCODING  MODE                     ASSEMBLER SYNTAX    EFFECTIVE ADDRESS

10100     Immediate                value               None: Operand is input from instruction queue.
10101     Absolute                 @disp               Disp.
10110     External                 EXT(disp1) + disp2  Disp2 + Pointer; Pointer is found at Link Table Entry number Disp1.
10111     Top of stack             TOS                 Top of current stack, using either User or Interrupt Stack Pointer, as selected in PSR. Automatic Push/Pop included.

Memory Space                                           Disp + Register; "SP" is either SP0 or SP1, as selected in PSR.
11000     Frame memory             disp(FP)
11001     Stack memory             disp(SP)
11010     Static memory            disp(SB)
11011     Program memory           * + disp

Scaled Index
11100     Index, bytes             mode[Rn:B]          EA(mode) + Rn.
11101     Index, words             mode[Rn:W]          EA(mode) + 2 × Rn.
11110     Index, double words      mode[Rn:D]          EA(mode) + 4 × Rn.
11111     Index, quad words        mode[Rn:Q]          EA(mode) + 8 × Rn.
          "Mode" and "n" are contained within the Index Byte. EA(mode) denotes the effective address generated using mode.

10011     (Reserved for Future Use)

2.4.3 Instruction Set Summary

Table 2-2 presents a brief description of the NS32CG16 instruction set. The Format column refers to the Instruction Format tables (Appendix A). The Instruction column gives the instruction as coded in assembly language, and the Description column provides a short description of the function provided by that instruction. Further details of the exact operations performed by each instruction may be found in the Series 32000 Instruction Set Reference Manual and the NS32CG16 Printer/Display Processor Programmer's Reference.

Notations:

i = Integer length suffix: B = Byte, W = Word, D = Double Word
f = Floating Point length suffix: F = Standard Floating, L = Long Floating
gen = General operand. Any addressing mode can be specified.
short = A 4-bit value encoded within the Basic Instruction (see Appendix A for encodings).
imm = Implied immediate operand. An 8-bit value appended after any addressing extensions.
disp = Displacement (addressing constant): 8, 16 or 32 bits. All three lengths legal.
reg = Any General Purpose Register: R0-R7.
areg = Any Processor Register: SP, SB, FP, INTBASE, MOD, PSR, US (bottom 8 PSR bits).
cond = Any condition code, encoded as a 4-bit field within the Basic Instruction (see Appendix A for encodings).

TABLE 2-2. NS32CG16 Instruction Set Summary

MOVES
Format  Operation  Operands       Description
4       MOVi       gen,gen        Move a value.
2       MOVQi      short,gen      Extend and move a signed 4-bit constant.
7       MOVMi      gen,gen,disp   Move multiple: disp bytes (1 to 16).
7       MOVZBW     gen,gen        Move with zero extension.
7       MOVZiD     gen,gen        Move with zero extension.
7       MOVXBW     gen,gen        Move with sign extension.
7       MOVXiD     gen,gen        Move with sign extension.
4       ADDR       gen,gen        Move effective address.

INTEGER ARITHMETIC
Format  Operation  Operands       Description
4       ADDi       gen,gen        Add.
2       ADDQi      short,gen      Add signed 4-bit constant.
4       ADDCi      gen,gen        Add with carry.
4       SUBi       gen,gen        Subtract.
4       SUBCi      gen,gen        Subtract with carry (borrow).
6       NEGi       gen,gen        Negate (2's complement).
6       ABSi       gen,gen        Take absolute value.
7       MULi       gen,gen        Multiply.
7       QUOi       gen,gen        Divide, rounding toward zero.
7       REMi       gen,gen        Remainder from QUO.
7       DIVi       gen,gen        Divide, rounding down.
7       MODi       gen,gen        Remainder from DIV (Modulus).
7       MEIi       gen,gen        Multiply to extended integer.
7       DEIi       gen,gen        Divide extended integer.

PACKED DECIMAL (BCD) ARITHMETIC
Format  Operation  Operands       Description
6       ADDPi      gen,gen        Add packed.
6       SUBPi      gen,gen        Subtract packed.

INTEGER COMPARISON
Format  Operation  Operands       Description
4       CMPi       gen,gen        Compare.
2       CMPQi      short,gen      Compare to signed 4-bit constant.
7       CMPMi      gen,gen,disp   Compare multiple: disp bytes (1 to 16).

LOGICAL AND BOOLEAN
Operation  Operands       Description
ANDi       gen,gen        Logical AND.
ORi        gen,gen        Logical OR.
BICi       gen,gen        Clear selected bits.
XORi       gen,gen        Logical exclusive OR.
COMi       gen,gen        Complement all bits.
NOTi       gen,gen        Boolean complement: LSB only.
Scondi     gen            Save condition code (cond) as a Boolean variable of size i.
Formats of the LOGICAL AND BOOLEAN instructions: ANDi 4, ORi 4, BICi 4, XORi 4, COMi 6, NOTi 6, Scondi 2.

SHIFTS
Format  Operation  Operands       Description
6       LSHi       gen,gen        Logical shift, left or right.
6       ASHi       gen,gen        Arithmetic shift, left or right.
6       ROTi       gen,gen        Rotate, left or right.

BIT FIELDS
Bit fields are values in memory that are not aligned to byte boundaries. Examples are PACKED arrays and records used in Pascal. "Extract" instructions read and align a bit field. "Insert" instructions write a bit field from an aligned source.

Format  Operation  Operands           Description
8       EXTi       reg,gen,gen,disp   Extract bit field (array oriented).
8       INSi       reg,gen,gen,disp   Insert bit field (array oriented).
7       EXTSi      gen,gen,imm,imm    Extract bit field (short form).
7       INSSi      gen,gen,imm,imm    Insert bit field (short form).
8       CVTP       reg,gen,gen        Convert to bit field pointer.

ARRAYS
Format  Operation  Operands       Description
8       CHECKi     reg,gen,gen    Index bounds check.
8       INDEXi     reg,gen,gen    Recursive indexing step for multiple-dimensional arrays.

STRINGS
String instructions assign specific functions to the General Purpose Registers:

R4 - Comparison Value
R3 - Translation Table Pointer
R2 - String 2 Pointer
R1 - String 1 Pointer
R0 - Limit Count

Options on all string instructions are:

B (Backward):     Decrement string pointers after each step rather than incrementing.
U (Until match):  End instruction if String 1 entry matches R4.
W (While match):  End instruction if String 1 entry does not match R4.

All string instructions end when R0 decrements to zero.

Format  Operation  Operands   Description
5       MOVSi      options    Move string 1 to string 2.
5       MOVST      options    Move string, translating bytes.
5       CMPSi      options    Compare string 1 to string 2.
5       CMPST      options    Compare, translating string 1 bytes.
5       SKPSi      options    Skip over string 1 entries.
5       SKPST      options    Skip, translating bytes for until/while.
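The register roles and the B, U and W options listed for the string instructions can be illustrated with a small Python model of SKPSi. This is only a sketch: the function name `skps`, its argument names, and the choice to leave the pointer and count at the entry that ended the instruction are assumptions made for illustration, not details taken from this data sheet.

```python
def skps(memory, r1, r0, r4=None, option=None, backward=False, size=1):
    """Illustrative model of SKPSi (skip over String 1 entries).

    memory   -- bytes-like object standing in for main memory
    r1       -- String 1 pointer (byte index into memory)
    r0       -- limit count (number of entries still to examine)
    r4       -- comparison value, used only with the U or W option
    option   -- None, "U" (until match) or "W" (while match)
    backward -- True models the B option (pointer decrements)
    size     -- entry size in bytes (1, 2 or 4 for B, W, D)

    Returns (r1, r0, ended_by_comparison).
    Assumption: when U or W ends the instruction, r1 and r0 are left
    at the entry that caused the termination.
    """
    step = -size if backward else size
    while r0 > 0:
        # Data is stored least significant byte first (Section 2.2).
        entry = int.from_bytes(memory[r1:r1 + size], "little")
        if option == "U" and entry == r4:
            return r1, r0, True
        if option == "W" and entry != r4:
            return r1, r0, True
        r1 += step
        r0 -= 1
    # The limit count reached zero without a terminating comparison.
    return r1, r0, False


text = b"aaab"
until_b = skps(text, 0, 4, r4=ord("b"), option="U")
while_a = skps(text, 0, 4, r4=ord("a"), option="W")
no_match = skps(b"aaaa", 0, 4, r4=ord("b"), option="U")
backward_scan = skps(b"baaa", 3, 4, r4=ord("b"), option="U", backward=True)
```

In this model, "until b" and "while a" stop at the same entry of "aaab", which is the relationship between the U and W options that the option list describes.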
JUMPS AND LINKAGE
Format  Operation  Operands           Description
3       JUMP       gen                Jump.
0       BR         disp               Branch (PC Relative).
0       Bcond      disp               Conditional branch.
3       CASEi      gen                Multiway branch.
2       ACBi       short,gen,disp     Add 4-bit constant and branch if non-zero.
3       JSR        gen                Jump to subroutine.
1       BSR        disp               Branch to subroutine.
1       CXP        disp               Call external procedure.
3       CXPD       gen                Call external procedure using descriptor.
1       SVC                           Supervisor call.
1       FLAG                          Flag trap.
1       BPT                           Breakpoint trap.
1       ENTER      [reg list],disp    Save registers and allocate stack frame (Enter Procedure).
1       EXIT       [reg list]         Restore registers and reclaim stack frame (Exit Procedure).
1       RET        disp               Return from subroutine.
1       RXP        disp               Return from external procedure call.
1       RETT       disp               Return from trap. (Privileged)
1       RETI                          Return from interrupt. (Privileged)

CPU REGISTER MANIPULATION
Format  Operation  Operands           Description
1       SAVE       [reg list]         Save general purpose registers.
1       RESTORE    [reg list]         Restore general purpose registers.
2       LPRi       areg,gen           Load dedicated register. (Privileged if PSR or INTBASE)
2       SPRi       areg,gen           Store dedicated register. (Privileged if PSR or INTBASE)
3       ADJSPi     gen                Adjust stack pointer.
3       BISPSRi    gen                Set selected bits in PSR. (Privileged if not Byte length)
3       BICPSRi    gen                Clear selected bits in PSR. (Privileged if not Byte length)
5       SETCFG     [option list]      Set configuration register. (Privileged)

FLOATING POINT
Format  Operation  Operands   Description
11      MOVf       gen,gen    Move a floating point value.
9       MOVLF      gen,gen    Move and shorten a long value to standard.
9       MOVFL      gen,gen    Move and lengthen a standard value to long.
9       MOVif      gen,gen
9       ROUNDfi    gen,gen
9       TRUNCfi    gen,gen
9       FLOORfi    gen,gen
11      ADDf       gen,gen
11      SUBf       gen,gen
11      MULf       gen,gen
11      DIVf       gen,gen
11      CMPf       gen,gen
11      NEGf       gen,gen
11      ABSf       gen,gen
9       LFSR       gen
9       SFSR       gen
12      POLYf      gen,gen
12      DOTf       gen,gen
12      SCALBf     gen,gen
12      LOGBf      gen,gen

Formats of the two groups that follow: the three MISCELLANEOUS instructions are all Format 1, and the eleven GRAPHICS instructions are all Format 5.
FLOATING POINT (descriptions of the remaining instructions)
Operation  Description
MOVif      Convert any integer to standard or long floating.
ROUNDfi    Convert to integer by rounding.
TRUNCfi    Convert to integer by truncating, toward zero.
FLOORfi    Convert to largest integer less than or equal to value.
ADDf       Add.
SUBf       Subtract.
MULf       Multiply.
DIVf       Divide.
CMPf       Compare.
NEGf       Negate.
ABSf       Take absolute value.
LFSR       Load FSR.
SFSR       Store FSR.
POLYf      Polynomial Step.
DOTf       Dot Product.
SCALBf     Binary Scale.
LOGBf      Binary Log.

MISCELLANEOUS
Operation  Operands   Description
NOP                   No operation.
WAIT                  Wait for interrupt.
DIA                   Diagnose. Single-byte "Branch to Self" for hardware breakpointing. Not for use in programming.

GRAPHICS
Operation  Operands   Description
BBOR       options*   Bit-aligned block transfer 'OR'.
BBAND      options    Bit-aligned block transfer 'AND'.
BBFOR                 Bit-aligned block transfer fast 'OR'.
BBXOR      options    Bit-aligned block transfer 'XOR'.
BBSTOD     options    Bit-aligned block source to destination.
BITWT                 Bit-aligned word transfer.
EXTBLT                External bit-aligned block transfer.
MOVMPi                Move multiple pattern.
TBITS      options    Test bit string.
SBITS                 Set bit string.
SBITPS                Set bit perpendicular string.

*Note: Options are controlled by fields of the instruction, PSR status bits, or dedicated register values.

BITS
Format  Operation  Operands   Description
4       TBITi      gen,gen    Test bit.
6       SBITi      gen,gen    Test and set bit.
6       SBITIi     gen,gen    Test and set bit, interlocked.
6       CBITi      gen,gen    Test and clear bit.
6       CBITIi     gen,gen    Test and clear bit, interlocked.
6       IBITi      gen,gen    Test and invert bit.
8       FFSi       gen,gen    Find first set bit.

2.5 GRAPHICS SUPPORT

The following sections provide a brief description of the NS32CG16 graphics support capabilities. Basic discussions on frame buffer addressing and BITBLT operations are also provided. More detailed information on the NS32CG16 graphics support instructions can be found in the NS32CG16 Printer/Display Processor Programmer's Reference.

Example 2: Create fat pixel by setting bits at locations (x, y), (x + 1, y), (x, y + 1) and (x + 1, y + 1).
Setup:
    R0  x coordinate
    R1  y coordinate

Instruction Sequence:
    INDEXD  R1, (WARP-1), R0   ; BIT ADDRESS
    SBITD   R1, ORG            ; SET FIRST PIXEL
    ADDQD   1, R1              ; (X+1, Y)
    SBITD   R1, ORG            ; SECOND PIXEL
    ADDD    (WARP-1), R1       ; (X, Y+1)
    SBITD   R1, ORG            ; THIRD PIXEL
    ADDQD   1, R1              ; (X+1, Y+1)
    SBITD   R1, ORG            ; LAST PIXEL

2.5.1 Frame Buffer Addressing

There are two basic addressing schemes for referencing pixels within the frame buffer: Linear and Cartesian (or x-y). Linear addressing associates a single number to each pixel representing the physical address of the corresponding bit in memory. Cartesian addressing associates two numbers to each pixel representing the x and y coordinates of the pixel relative to a point in the Cartesian space taken as the origin. The Cartesian space is generally defined as having the origin in the upper left. A movement to the right increases the x coordinate; a movement downward increases the y coordinate.

The correspondence between the location of a pixel in the Cartesian space and the physical (BIT) address in memory is shown in Figure 2-8. The origin of the Cartesian space (x = 0, y = 0) corresponds to the bit address 'ORG'. Incrementing the x coordinate increments the bit address by one. Incrementing the y coordinate increments the bit address by an amount representing the warp (or pitch) of the Cartesian space. Thus, the linear address of a pixel at location (x, y) in the Cartesian space can be found by the following expression.

    ADDR = ORG + y * WARP + x

Warp is the distance (in bits) in the physical memory space between two vertically adjacent bits in the Cartesian space.

Example 1 below shows two NS32CG16 instruction sequences to set a single pixel given the x and y coordinates. Example 2 shows how to create a fat pixel by setting four adjacent bits in the Cartesian space.

2.5.2 BITBLT Fundamentals

BITBLT, BIT-aligned BLock Transfer, is a general operator that provides a mechanism to move an arbitrary size rectangle of an image from one part of the frame buffer to another.
During the data transfer process a bitwise logical operation can be performed between the source and the destination data. BITBLT is also called RasterOp: operations on rasters. It defines two rectangular areas, source and destination, and performs a logical operation (e.g., AND, OR, XOR) between these two areas and stores the result back to the destination. It can be expressed in simple notation as:

    Source op Destination → Destination
    op: AND, OR, XOR, etc.

FIGURE 2-8. Correspondence between Linear and Cartesian Addressing

Example 1: Set pixel at location (x, y)

Setup:
    R0  x coordinate
    R1  y coordinate

Instruction Sequence 1:
    MULD    WARP, R1           ; Y*WARP
    ADDD    R0, R1             ; + X = BIT OFFSET
    SBITD   R1, ORG            ; SET PIXEL

Instruction Sequence 2:
    INDEXD  R1, (WARP-1), R0   ; Y*WARP + X
    SBITD   R1, ORG            ; SET PIXEL

(The instruction sequence of Example 2 opens with the same INDEXD/SBITD pair, commented BIT ADDRESS and SET FIRST PIXEL, and continues with ADDQD and SBITD.)

Due to the plane-oriented frame buffer architecture, all memory operations must be word-aligned. In order to preserve the neighboring bits surrounding the BITBLT destination block, both a left mask and a right mask are needed for all the leftmost and all the rightmost data words of the destination block. The left mask and the right mask both remain the same during a BITBLT operation.

The following example illustrates the bit alignment requirements. In this example, the memory data path is 16 bits wide. Figure 2-9 shows a 32 pixel by 32 scan line frame buffer which is organized as a long bit stream which wraps around every two words (32 bits). The origin (top left corner) of the frame buffer starts from the lowest word in memory (word address 00 (hex)). Each word in the memory contains 16 bits, D0-D15. The least significant bit of a memory word, D0, is defined as the first displayed pixel in a word. In this example, BITBLT addresses are expressed as pixel addresses relative to the origin of the frame buffer.
The source block starting address is 021 (hex) (the second pixel in the third word). The destination block starting address is 204 (hex) (the fifth pixel in the 33rd word). The block width is 13 (hex), and the height is 06 (hex) (corresponding to 6 scan lines). The shift value is 3.

2.5.2.1 Frame Buffer Architecture

There are two basic types of frame buffer architectures: plane-oriented or pixel-oriented. BITBLT takes advantage of the plane-oriented frame buffer architecture's attribute of multiple, adjacent pixels-per-word, facilitating the movement of large blocks of data. The source and destination starting addresses are expressed as pixel addresses. The width and height of the block to be moved are expressed in terms of pixels and scan lines. The source block may start and end at any bit position of any word, and the same applies for the destination block.

2.5.2.2 Bit Alignment

Before a logical operation can be performed between the source and the destination data, the source data must first be bit aligned to the destination data. In Figure 2-9, the source data needs to be shifted three bits to the right in order to align the first pixel (i.e., the pixel at the top left corner) in the source data block to the first pixel in the destination data block.

2.5.2.3 Block Boundaries and Destination Masks

Each BITBLT destination scan line may start and end at any bit position in any data word. The neighboring bits (bits sharing the same word address with any words in the destination data block, but not a part of the BITBLT rectangle) of the BITBLT destination scan line must remain unchanged after the BITBLT operation.

FIGURE 2-9. 32-Pixel by 32-Scan Line Frame Buffer

FIGURE 2-10. Overlapping BITBLT Blocks (panels (a) and (b))

The left mask and the right mask are 0000,1111,1111,1111 and 1111,1111,0000,0000 respectively.
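The address arithmetic in this example can be checked with a short Python sketch. It applies the expression ADDR = ORG + y * WARP + x from Section 2.5.1 with ORG = 0 and a warp of 32 bits, then splits each pixel address into a word index and a bit position for the 16-bit data path. The function names are illustrative only, and computing the shift as the difference of the two bit positions modulo the word width is an assumption that happens to match the shift value of 3 quoted above; the mask values are not derived here.

```python
WORD_BITS = 16   # memory data path width used in the example
WARP = 32        # bits per scan line of the 32 x 32 frame buffer


def linear_address(x, y, org=0, warp=WARP):
    """Pixel (bit) address of Cartesian location (x, y)."""
    return org + y * warp + x


def word_and_bit(pixel_address, word_bits=WORD_BITS):
    """Split a pixel address into (word index, bit position in word)."""
    return divmod(pixel_address, word_bits)


src_word, src_bit = word_and_bit(0x021)   # third word, second pixel
dst_word, dst_bit = word_and_bit(0x204)   # 33rd word, fifth pixel

# Assumed convention: shift = destination bit position minus source
# bit position, taken modulo the word width.
shift = (dst_bit - src_bit) % WORD_BITS

# The same pixel addresses expressed as Cartesian coordinates.
src_from_xy = linear_address(x=1, y=1)
dst_from_xy = linear_address(x=4, y=16)
```

Word and bit indices are zero-based here, so word 2 is the "third word" and bit 4 is the "fifth pixel" of the text.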
Note 1: Zeros in either the left mask or the right mask indicate the destination bits which will not be modified.
Note 2: The BB(function) and EXTBLT instructions use different setup parameters and techniques.

2.5.2.4 BITBLT Directions

A BITBLT operation moves a rectangular block of data in a frame buffer. The operation itself can be considered as a subroutine with two nested loops. The loops are preceded by setup operations. In the outer loop the source and destination starting addresses are calculated, and the test for completion is performed. In the inner loop the actual data movement for a single scan line takes place. The length of the inner loop is the number of (aligned) words spanned by each scan line. The length of the outer loop is equal to the height (number of scan lines) of the block to be moved.

A skeleton of the subroutine representing the BITBLT operation follows.

    BITBLT:     calculate BITBLT setup parameters (once per BITBLT operation),
                such as width, height,
                bit misalignment (shift number),
                left, right masks,
                horizontal, vertical directions,
                etc.
    OUTERLOOP:  calculate source, dest addresses (once per scan line).
    INNERLOOP:  move data (logical operation) and increment addresses (once per word).
                UNTIL done horizontally
                UNTIL done vertically
                RETURN (from BITBLT).

Note: In the NS32CG16 only the setup operations must be done by the programmer. The inner and outer loops are automatically executed by the BITBLT instructions.

Each loop can be executed in one of two directions: the inner loop from left to right or right to left, the outer loop from top to bottom (down) or bottom to top (up).

The ability to move data starting from any corner of the BITBLT rectangle is necessary to avoid destroying the BITBLT source data as a result of destination writes when the source and destination are overlapped (i.e., when they share pixels). This situation is routinely encountered while panning or scrolling.

A determination of the correct execution directions of the BITBLT must be performed whenever the source and destination rectangles overlap.
Any overlap will result in the destruction of source data (from a destination write) if the correct vertical direction is not used. Horizontal BITBLT direction is of concern only in certain cases of overlap, as will be explained below.

Figures 2-10(a) and (b) illustrate two cases of overlap. Here, the BITBLT rectangles are three pixels wide by five scan lines high; they overlap by a single pixel in (a) and a single column of pixels in (b). For purposes of illustration, the BITBLT is assumed to be carried out pixel-by-pixel. This convention does not affect the conclusions.

In Figure 2-10(a), if the BITBLT is performed in the UP direction (bottom-to-top) one of the transfers of the bottom scan line of the source will write to the circled pixel of the destination. Due to the overlap, this pixel is also part of the uppermost scan line of the source rectangle. Thus, data needed later is destroyed. Therefore, this BITBLT must be performed in the DOWN direction. Another example of this occurs any time the screen is moved in a purely vertical direction, as in scrolling text. It should be noted that, in both of these cases, the choice of horizontal BITBLT direction may be made arbitrarily.

In the BITBLT skeleton, OUTERLOOP calculates the source and destination addresses (once per scan line), and INNERLOOP moves data (with the logical operation) and increments addresses (once per word).

These five instructions perform standard BITBLT operations between source and destination blocks. The operations available include the following:

    BBAND:   src AND dst    -src AND dst
    BBOR:    src OR dst     -src OR dst
    BBXOR:   src XOR dst    -src XOR dst
    BBFOR:   src OR dst
    BBSTOD:  src TO dst     -src TO dst

'src' and '-src' stand for 'True Source' and 'Inverted Source' respectively; 'dst' stands for 'Destination'.
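The logical operations listed for the five BB instructions can be modelled on a single 16-bit word as follows. This Python sketch is only an illustration of the src/dst combinations named above: the function `bb_word` and its parameters are hypothetical, and it deliberately leaves out shifting, masking and the scan-line loops that the real instructions perform.

```python
WORD_MASK = 0xFFFF  # one 16-bit memory word


def bb_word(function, src, dst, inverted_source=False):
    """Combine one source word with one destination word.

    function        -- "AND", "OR", "XOR", "FOR" or "STOD"
    inverted_source -- True models the '-src' (Inverted Source) column

    Returns the new destination word. Shift and mask handling are
    intentionally omitted from this sketch.
    """
    if function == "FOR" and inverted_source:
        # Only 'src OR dst' is listed for BBFOR.
        raise ValueError("BBFOR has no inverted-source form")
    s = (~src if inverted_source else src) & WORD_MASK
    d = dst & WORD_MASK
    if function == "AND":
        return s & d
    if function in ("OR", "FOR"):
        return s | d
    if function == "XOR":
        return s ^ d
    if function == "STOD":
        return s          # source replaces destination
    raise ValueError("unknown BB function: " + function)


and_true = bb_word("AND", 0x0F0F, 0x00FF)
and_inverted = bb_word("AND", 0x0F0F, 0x00FF, inverted_source=True)
xor_true = bb_word("XOR", 0x0F0F, 0x00FF)
stod_inverted = bb_word("STOD", 0x0F0F, 0x00FF, inverted_source=True)
```

A full BITBLT would apply this per-word step inside the inner loop, after the source has been bit-aligned and with the left and right masks protecting the neighboring destination bits.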
Note 1: For speed reasons, the BB instructions require the masks to be specified with respect to the source block. In Figure 2-9 masking was defined relative to the destination block.
Note 2: The options -S and DA are not available for the BBFOR instruction.
Note 3: BBFOR performs the same operation as BBOR with IA and S options.
Note 4: IA and DA are mutually exclusive and so are S and -S.
Note 5: The width is defined as the number of words of source data to read.
Note 6: An odd number of bytes can be specified for the source warp. However, word alignment of source scan lines will result in faster execution.

The horizontal and vertical directions of the BITBLT operations performed by the above instructions, with the exception of BBFOR, are both programmable. The horizontal direction is controlled by the IA and DA options. The vertical direction is controlled by the sign of the source and destination warps. Figure 2-11 and Table 2-3 show the format of the BB instructions and the encodings for the 'op' and 'i' fields.

2.5.2.5 BITBLT Variations

The "classical" definition of BITBLT, as described in "Smalltalk-80: The Language and its Implementation", by Adele Goldberg and David Robson, provides for three operands: source, destination and mask/texture. This third operand is commonly used in monochrome systems to incorporate a stipple pattern into an area. These stipple patterns provide the appearance of multiple shades of gray in single-bit-per-pixel systems, in a manner similar to the "halftone" process used in printing.

    Texture op1 Source op2 Destination → Destination

While the NS32CG16 and the external BPU (if used) are essentially two-operand devices, three-operand BITBLT operations can be implemented quite flexibly and efficiently by performing the two operations serially.

2.5.3 GRAPHICS SUPPORT INSTRUCTIONS

The NS32CG16 provides eleven instructions for supporting graphics oriented applications.
These instructions are divided into three groups according to the operations they perform. General descriptions for each of them and the related formats are provided in the following sections.

2.5.3.1 BITBLT (BIT-aligned BLock Transfer)

This group includes seven instructions. They are used to move characters and objects into the frame buffer which will be printed or displayed. One of the instructions works in conjunction with an external BITBLT Processing Unit (BPU) to maximize performance. The other six are executed by the NS32CG16.

BIT-aligned BLock Transfer

Syntax: BB(function) Options

Setup:
    R0     base address, source data
    R1     base address, destination data
    R2     shift value
    R3     height (in lines)
    R4     first mask
    R5     second mask
    R6     source warp (adjusted)
    R7     destination warp (adjusted)
    0(SP)  width (in words)

Function: AND, OR, XOR, FOR, STOD

Options:
    IA   Increasing Address (default option). When IA is selected, scan lines are transferred in the increasing BIT/BYTE order.
    DA   Decreasing Address.
    S    True Source (default option).
    -S   Inverted Source.

    Bits 23-16:  0 0 0 0 0 D X S
    Bits 15-8:   0 op i 0        ('op' is 4 bits, 'i' is 2 bits)
    Bits 7-0:    0 0 0 0 1 1 1 0

• D is set when the DA option is selected
• S is set when the -S option is selected
• X is set for BBAND, and it is clear for all other BB instructions

FIGURE 2-11. BB Instructions Format

Figure 2-10(b) demonstrates a case in which the horizontal BITBLT direction may not be chosen arbitrarily. This is an instance of purely horizontal movement of data (panning). Because the movement from source to destination involves data within the same scan line, the incorrect direction of movement will overwrite data which will be needed later. In this example, the correct direction is from right to left.

TABLE 2-3.
'op' and 'i' Field Encodings

Instruction  Options  'op' Field  'i' Field
BBAND        Yes      1010        11
BBOR         Yes      0110        01
BBXOR        Yes      1110        01
BBFOR        No       1100        01
BBSTOD       Yes      0100        01

(BB option -S: Inverted Source.)

BIT-aligned Word Transfer

Syntax: BITWT

Setup:
    R0  base address, source word
    R1  base address, destination double word
    R2  shift value

The BITWT instruction performs a fast logical OR operation between a source word and a destination double word, stores the result into the destination double word and increments registers R0 and R1 by two. Before performing the OR operation, the source word is shifted left (i.e., in the direction of increasing bit numbers) by the value in register R2.

This instruction can be used within the inner loop of a block OR operation. Its use assumes that the source data is 'clean' and does not need masking. The BITWT format is shown in Figure 2-12.

    Bits 23-16:  0 0 0 0 0 0 0 0
    Bits 15-8:   0 0 1 0 0 0 0 1
    Bits 7-0:    0 0 0 0 1 1 1 0

FIGURE 2-12. BITWT Instruction Format

External BITBLT

Syntax: EXTBLT

Setup:
    R0  base address, source data
    R1  base address, destination data
    R2  width (in bytes)
    R3  height (in lines)
    R4  horizontal increment/decrement
    R5  temporary register (current width)
    R6  source warp (adjusted)
    R7  destination warp (adjusted)

The MOVMPi instruction stores the pattern in register R3 into the destination area whose address is in register R0. The pattern count is specified in register R2. After each store operation the destination address is changed by the contents of register R1. This allows the pattern to be stored in rows, in columns, and in any direction, depending on the value and sign of R1. The MOVMPi instruction format is shown in Figure 2-14.

    Bits 23-16:  0 0 0 0 0 0 0 0
    Bits 15-8:   0 0 0 1 1 1 i     ('i' is 2 bits)
    Bits 7-0:    0 0 0 0 1 1 1 0

FIGURE 2-14. MOVMPi Instruction Format

2.5.3.3 Data Compression, Expansion and Magnify

The three instructions in this group can be used to compress data and restore data from compression.
A compressed character set may require from 30% to 50% less memory space for its storage. The compression ratio possible can be 50:1 or higher depending on the data and algorithm used. TBITS can also be used to find boundaries of an object. As a character is needed, the data is expanded and stored in a RAM buffer. The expand instructions (SBITS, SBITPS) can also function as line drawing instructions.

Test Bit String

Syntax: TBITS option

Setup:
    R0  base address, source (byte address)
    R1  starting source bit offset
    R2  destination run length limited code
    R3  maximum value run length limit
    R4  maximum source bit offset

Option:
    1   count set bits until a clear bit is found
    0   count clear bits until a set bit is found

EXTBLT notes:

Note 1: R0 and R1 are updated after execution to point to the last source and destination addresses plus related warps. R2, R3 and R5 will be modified. R4, R6, and R7 are returned unchanged.
Note 2: Source and destination pointers should point to word-aligned operands to maximize speed and minimize external interface logic.

The EXTBLT instruction performs an entire BITBLT operation in conjunction with an external BITBLT Processing Unit (BPU). The external BPU Control Register should be loaded by the software before the instruction is executed (refer to the DP8510 or DP8511 data sheets for more information on the BPU). The NS32CG16 generates a series of source read, destination read and destination write bus cycles until the entire data block has been transferred. The BITBLT operation can be performed in either horizontal direction, as controlled by the sign of the contents of register R4.

Depending on the relative alignment of the source and destination blocks, an extra source read may be required at the beginning of each scan line, to load the pipeline register in the external BPU. The L bit in the PSR register determines whether the extra source read is performed. If L is 1, no extra read is performed.
The instructions CMPQB 2,1 or CMPQB 1,2 could be executed to provide the right setting for the L bit just before executing EXTBLT. Figure 2-13 shows the EXTBLT format. The bus activity for a simple BITBLT operation is shown in Figure 2-18.

23             16 15              8 7               0
0 0 0 0 0 0 0 0   0 0 0 1 0 1 1 1   0 0 0 0 1 1 1 0
FIGURE 2-13. EXTBLT Instruction Format

Note (TBITS): R0, R3 and R4 are not modified by the instruction execution. R1 reflects the new bit offset. R2 holds the result.

The TBITS instruction starts at the base address, adds a bit offset, and tests the bit for clear if “option” = 0 (and for set if “option” = 1). If clear (or set), the instruction increments to the next higher bit and tests for clear (or set). This testing for clear proceeds through memory until a set bit is found or until the maximum source bit offset or maximum run length value is reached. The total number of clear bits is stored in the destination as a run length value. When TBITS finds a set bit and terminates, the bit offset is adjusted to reflect the current bit address. Offset is then ready for the next TBITS instruction with “option” = 0.

After the instruction is executed, the F flag is set to the value of the bit previous to the bit currently being pointed to (i.e., the value of the bit on which the instruction completed execution). In the case of a starting bit offset exceeding the maximum bit offset (R1 ≥ R4), the F flag is set if the option was 1 and clear if the option was 0. The L flag is set when the desired bit is found, or if the run length equalled the maximum run length value and the bit was not found. It is cleared otherwise. Figure 2-15 shows the TBITS instruction format.

2.5.3.2 Pattern Fill
Only one instruction is in this group. It is usually used for clearing RAM and drawing patterns and lines.
Move Multiple Pattern
Syntax: MOVMPi
Setup:  R0  base address of the destination
        R1  pointer increment (in bytes)
        R2  number of pattern moves
        R3  source pattern
Note: R1 and R3 are not modified by the instruction. R2 will always be returned as zero. R0 is modified to reflect the last address into which a pattern was written.

23              15                8 7               0
0 0 0 0 0 0 0 0   S 0 1 0 0 1 1 1   0 0 0 0 1 1 1 0
S is set for ‘TBITS 1’ and clear for ‘TBITS 0’.
FIGURE 2-15. TBITS Instruction Format

Set BIT Perpendicular String
Syntax: SBITPS
Setup:  R0  base address, destination (byte address)
        R1  starting bit offset
        R2  number of bits to set
        R3  destination warp (signed value, in bits)
Note: When the instruction terminates, the R0 and R3 registers are returned unchanged. R1 becomes the final bit offset. R2 is zero.

Set Bit String
Syntax: SBITS
Setup:  R0  base address of the destination
        R1  starting bit offset (signed)
        R2  number of bits to set (unsigned)
        R3  address of string look-up table
Note: When the instruction terminates, the registers are returned unchanged.

SBITS sets a number of contiguous bits in memory to 1, and is typically used for data expansion operations. The instruction draws the number of ones specified by the value in R2, starting at the bit address provided by registers R0 and R1. In order to maximize speed and allow drawing of patterned lines, an external 1k byte lookup table is used. The lookup table is specified in the NS32CG16 Printer/Display Processor Programmer’s Reference Supplement.

When SBITS begins executing, it compares the value in R2 with 25. If the value in R2 is less than or equal to 25, the F flag is cleared and the appropriate number of bits are set in memory. If R2 is greater than 25, the F flag is set and no other action is performed. This allows the software to use a faster algorithm to set longer strings of bits. Figure 2-16 shows the SBITS instruction format.
The SBITPS can be used to set a string of bits in any direction. This allows a font to be expanded with a 90 or 270 degree rotation, as may be required in a printer application. SBITPS sets a string of bits starting at the bit address specified in registers R0 and R1. The number of bits in the string is specified in R2. After the first bit is set, the destination warp is added to the bit address and the next bit is set. The process is repeated until all the bits have been set. A negative raster warp offset value leads to a 90 degree rotation. A positive raster warp value leads to a 270 degree rotation. If the R3 value is = (space warp +1 or −1), then the result is a 45 degree line. If the R3 value is +1 or −1, a horizontal line results.

SBITS and SBITPS allow expansion on any 90 degree angle, giving portrait, landscape and mirror images from one font. Figure 2-17 shows the SBITPS instruction format.

23              15                8 7               0
0 0 0 0 0 0 0 0   0 0 1 1 0 1 1 1   0 0 0 0 1 1 1 0
FIGURE 2-16. SBITS Instruction Format

23              15                8 7               0
0 0 0 0 0 0 0 0   0 0 1 0 1 1 1 1   0 0 0 0 1 1 1 0
FIGURE 2-17. SBITPS Instruction Format

FIGURE 2-18. Bus Activity for a Simple BITBLT Operation
Note 1: This example is for a block 4 words wide and 1 line high.
Note 2: The sequence is common with all logical operations of the DP8510/DP8511 BPU.
Note 3: Mask values, shift values and number of bit planes do not affect the performance.
Note 4: Zero wait states are assumed throughout the BITBLT operation.
Note 5: The extra read is performed when the BPU pipeline register needs to be preloaded.

2.5.3.3.1 Magnifying Compressed Data
Restoring data is just one application of the SBITS and SBITPS instructions. Multiplying the “length” operand used by the SBITS and SBITPS instructions causes the resulting pattern to be wider, or a multiple of “length”. As the pattern of data is expanded, it can be magnified by 2x, 3x, 4x, . . . , 10x and so on.
This creates several sizes of the same style of character, or changes the size of a logo. A magnify in both dimensions X and Y can be accomplished by drawing a single line, then using the MOVS (Move String) or the BB instructions to duplicate the line, maintaining an equal aspect ratio. More information on this subject is provided in the NS32CG16 Printer/Display Processor Programmer’s Reference Supplement.

3.0 Functional Description
This chapter provides details on the functional characteristics of the NS32CG16 microprocessor. The chapter is divided into four main sections: Instruction Execution, Exception Processing, Debugging and System Interface.

FIGURE 3-1. Operating States

In the Executing-Instructions state, the CPU executes instructions. It will exit this state when an exception is recognized or a WAIT instruction is encountered, at which time it enters the Processing-An-Exception state or the Waiting-For-An-Interrupt state, respectively.

While in the Processing-An-Exception state, the CPU saves the PC, PSR and MOD register contents on the stack and reads the new PC and module linkage information to begin execution of the exception service procedure. Following the completion of all data references required to process an exception, the CPU enters the Executing-Instructions state.

In the Waiting-For-An-Interrupt state, the CPU is idle. A special status identifying this state is presented on the system interface (Section 3.4). When an interrupt is detected, the CPU enters the Processing-An-Exception state.

3.1 INSTRUCTION EXECUTION
To execute an instruction, the NS32CG16 performs the following operations:
• Fetch the Instruction
• Read Source Operands, if Any (1)
• Calculate Results
• Write Result Operands, if Any
• Modify Flags, if Necessary
• Update the Program Counter

Under most circumstances, the CPU can be conceived to execute instructions by completing the operations above in strict sequence for one instruction and then beginning the sequence of operations for the next instruction. However, due to the internal instruction pipelining, as well as the occurrence of exceptions, the sequence of operations performed during the execution of an instruction may be altered. Furthermore, exceptions also break the sequentiality of the instructions executed by the CPU.

Note 1: In this and following sections, memory locations read by the CPU to calculate effective addresses for Memory-Relative and External addressing modes are considered like source operands, even if the effective address is being calculated for an operand with access class of write.

3.1.2 Instruction Endings
The NS32CG16 checks for exceptions at various points while executing instructions. Certain exceptions, like interrupts, are in most cases recognized between instructions. Other exceptions, like Divide-By-Zero Trap, are recognized during execution of an instruction. When an exception is recognized during execution of an instruction, the instruction ends in one of four possible ways: completed, suspended, terminated, or partially completed. Each type of exception causes a particular ending, as specified in Section 3.2.

3.1.1 Operating States
The CPU has four operating states regarding the execution of instructions and the processing of exceptions: Reset, Executing-Instructions, Processing-An-Exception and Waiting-For-An-Interrupt. The various states and transitions between them are shown in Figure 3-1.

Whenever the RSTI signal is asserted, the CPU enters the Reset state. The CPU remains in the Reset state until the RSTI signal is driven inactive, at which time it enters the Executing-Instructions state.
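The operating states and the transitions named in the text can be summarized as a small state table. The following Python sketch is illustrative only: the State and Event names are hypothetical labels for the conditions the text describes, and treating any unlisted event as leaving the state unchanged is an assumption, since Figure 3-1 itself is not reproduced here.

```python
from enum import Enum, auto


class State(Enum):
    RESET = auto()
    EXECUTING_INSTRUCTIONS = auto()
    PROCESSING_AN_EXCEPTION = auto()
    WAITING_FOR_AN_INTERRUPT = auto()


class Event(Enum):  # hypothetical names for the conditions in the text
    RSTI_ASSERTED = auto()
    RSTI_INACTIVE = auto()
    EXCEPTION_RECOGNIZED = auto()
    WAIT_INSTRUCTION = auto()
    EXCEPTION_REFERENCES_DONE = auto()
    INTERRUPT_DETECTED = auto()


TRANSITIONS = {
    (State.RESET, Event.RSTI_INACTIVE): State.EXECUTING_INSTRUCTIONS,
    (State.EXECUTING_INSTRUCTIONS, Event.EXCEPTION_RECOGNIZED): State.PROCESSING_AN_EXCEPTION,
    (State.EXECUTING_INSTRUCTIONS, Event.WAIT_INSTRUCTION): State.WAITING_FOR_AN_INTERRUPT,
    (State.PROCESSING_AN_EXCEPTION, Event.EXCEPTION_REFERENCES_DONE): State.EXECUTING_INSTRUCTIONS,
    (State.WAITING_FOR_AN_INTERRUPT, Event.INTERRUPT_DETECTED): State.PROCESSING_AN_EXCEPTION,
}


def next_state(state: State, event: Event) -> State:
    """Return the state entered after `event` occurs in `state`."""
    if event is Event.RSTI_ASSERTED:
        # "Whenever the RSTI signal is asserted, the CPU enters the Reset state."
        return State.RESET
    # Assumption: events not listed for a state leave it unchanged.
    return TRANSITIONS.get((state, event), state)
```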
In the Reset state the contents of certain registers are initialized. Refer to Section 3.4.4 for details.

3.1.2.1 Completed Instructions
When an exception is recognized after an instruction is completed, the CPU has performed all of the operations for that instruction and for all other instructions executed since the last exception occurred. Result operands have been written, flags have been modified, and the PC saved on the Interrupt Stack contains the address of the next instruction to execute. The exception service procedure can, at its conclusion, execute the RETT instruction (or the RETI instruction for maskable interrupts), and the CPU will begin executing the instruction following the completed instruction.

3.1.2.2 Suspended Instructions
An instruction is suspended when one of several trap conditions is detected during execution of the instruction. A suspended instruction has not been completed, but all other instructions executed since the last exception occurred have been completed. Result operands and flags due to be affected by the instruction may have been modified, but only modifications that allow the instruction to be executed again and completed can occur. For certain exceptions (Trap (UND)) the CPU clears the P-flag in the PSR before saving the copy that is pushed on the Interrupt Stack. The PC saved on the Interrupt Stack contains the address of the suspended instruction.

To complete a suspended instruction, the exception service procedure takes either of two actions:
1. The service procedure can simulate the suspended instruction’s execution. After calculating and writing the instruction’s results, the flags in the PSR copy saved on the Interrupt Stack should be modified, and the PC saved on the Interrupt Stack should be updated to point to the next instruction to execute. The service procedure can then execute the RETT instruction, and the CPU begins executing the instruction following the suspended instruction. This is the action taken when floating-point instructions are simulated by software in systems without a hardware floating-point unit.
2. The suspended instruction can be executed again after the service procedure has eliminated the trap condition that caused the instruction to be suspended. The service procedure should execute the RETT instruction at its conclusion; then the CPU begins executing the suspended instruction again. This is the action taken by a debugger when it encounters a BPT instruction that was temporarily placed in another instruction’s location in order to set a breakpoint.

3.1.2.3 Terminated Instructions
An instruction being executed is terminated when reset occurs. Any result operands and flags due to be affected by the instruction are undefined, as are the contents of the PC.

3.1.2.4 Partially Completed Instructions
When an interrupt condition is recognized during execution of a string instruction, the instruction is said to be partially completed. A partially completed instruction has not completed, but all other instructions executed since the last exception occurred have been completed. Result operands and flags due to be affected by the instruction may have been modified, but the values stored in the string pointers and other general-purpose registers used during the instruction’s execution allow the instruction to be executed again and completed. The CPU clears the P-flag in the PSR before saving the copy that is pushed on the Interrupt Stack. The PC saved on the Interrupt Stack contains the address of the partially completed instruction. The exception service procedure can, at its conclusion, simply execute the RETT instruction (or the RETI instruction for maskable interrupts), and the CPU will resume executing the partially completed instruction.
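As a compact summary of the four instruction endings, the following Python sketch records which address ends up as the saved PC in each case and which address a service procedure would leave on the stack for a suspended instruction. The names Ending, saved_pc and return_pc_for_suspended are hypothetical, and the addresses are plain integers supplied by the caller; nothing here models the actual stack layout.

```python
from enum import Enum
from typing import Optional


class Ending(Enum):
    COMPLETED = "completed"
    SUSPENDED = "suspended"
    TERMINATED = "terminated"
    PARTIALLY_COMPLETED = "partially completed"


def saved_pc(ending: Ending, instr_addr: int, next_addr: int) -> Optional[int]:
    """PC value found on the Interrupt Stack for each kind of ending.

    instr_addr is the address of the instruction that ended, next_addr the
    address of the next instruction to execute. None means "undefined".
    """
    if ending is Ending.COMPLETED:
        return next_addr
    if ending in (Ending.SUSPENDED, Ending.PARTIALLY_COMPLETED):
        return instr_addr
    return None  # TERMINATED: reset leaves the PC contents undefined


def return_pc_for_suspended(simulate: bool, instr_addr: int, next_addr: int) -> int:
    """Saved PC a service procedure leaves in place before executing RETT.

    simulate=True  : action 1, the procedure emulated the instruction and
                     updates the saved PC to the next instruction.
    simulate=False : action 2, the trap condition was removed and the
                     suspended instruction is executed again.
    """
    return next_addr if simulate else instr_addr
```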
Note 1: It may be necessary for the exception service procedure to alter the P-flag in the PSR copy saved on the Interrupt Stack: If the exception service procedure simulates the suspended instruction and the P-flag was cleared by the CPU before saving the PSR copy, then the saved T-flag must be copied to the saved P-flag (like the floating-point instruction simulation described above). Or if the exception service procedure executes the suspended instruction again and the P-flag was not cleared by the CPU before saving the PSR copy, then the saved P-flag must be cleared (like the breakpoint trap described above). Otherwise, no alteration to the saved P-flag is necessary.

3.1.3 Slave Processor Instructions
The NS32CG16 supports only one group of instructions, the floating-point instruction set, as being executable by a slave processor. The floating-point instruction set is validated by the F-bit in the CFG register. If a floating-point instruction is encountered and the F-bit in the CFG register is not set, a Trap (UND) will result, without any slave processor communication attempted by the CPU. This allows software emulation in case an external floating-point unit (FPU) is not used.

3.1.3.1 Slave Processor Protocol
Slave Processor instructions have a three-byte Basic Instruction field, consisting of an ID Byte followed by an Operation Word. The ID Byte has three functions:
1. It identifies the instruction as being a Slave Processor instruction.
2. It specifies which Slave Processor will execute it.
3. It determines the format of the following Operation Word of the instruction.

Upon receiving a Slave Processor instruction, the CPU initiates the sequence outlined in Figure 3-2. While applying Status Code 1111 (Broadcast ID, Section 3.4.5.1), the CPU transfers the ID Byte on the least-significant half of the Data Bus (AD0–AD7). All Slave Processors input this byte and decode it. The Slave Processor selected by the ID Byte is activated, and from this point the CPU is communicating only with it. If any other slave protocol was in progress (e.g., an aborted Slave instruction), this transfer cancels it.

The CPU next sends the Operation Word while applying Status Code 1101 (Transfer Slave Operand, Section 3.4.5.1). Upon receiving it, the Slave Processor decodes it, and at this point both the CPU and the Slave Processor are aware of the number of operands to be transferred and their sizes. The Operation Word is swapped on the Data Bus; that is, bits 0–7 appear on pins AD8–AD15 and bits 8–15 appear on pins AD0–AD7.

Using the Address Mode fields within the Operation Word, the CPU starts fetching operands and issuing them to the Slave Processor. To do so, it references any Addressing Mode extensions which may be appended to the Slave Processor instruction. Since the CPU is solely responsible for memory accesses, these extensions are not sent to the Slave Processor. The Status Code applied is 1101 (Transfer Slave Processor Operand, Section 3.4.5.1).

After the CPU has issued the last operand, the Slave Processor starts the actual execution of the instruction. Upon completion, it will signal the CPU by pulsing SPC low. While the Slave Processor is executing the instruction, the CPU is free to prefetch instructions into its queue. If it fills the queue before the Slave Processor finishes, the CPU will wait, applying Status Code 0011 (Waiting for Slave).

Upon receiving the pulse on SPC, the CPU uses SPC to read a Status Word from the Slave Processor, applying Status Code 1110 (Read Slave Status). This word has the format shown in Figure 3-3. If the Q-bit (“Quit”, Bit 0) is set, this indicates that an error was detected by the Slave Processor. The CPU will not continue the protocol, but will immediately trap through the Slave vector in the Interrupt Table. Certain Slave Processor instructions cause CPU PSR bits to be loaded from the Status Word.

The last step in the protocol is for the CPU to read a result, if any, and transfer it to the destination. The Read cycles from the Slave Processor are performed by the CPU while applying Status Code 1101 (Transfer Slave Operand).

Status Combinations:
  Send ID (ID):       Code 1111
  Xfer Operand (OP):  Code 1101
  Read Status (ST):   Code 1110

Step  Status  Action
1     ID      CPU Sends ID Byte
2     OP      CPU Sends Operation Word
3     OP      CPU Sends Required Operands
4     -       Slave Starts Execution. CPU Pre-Fetches.
5     -       Slave Pulses SPC Low
6     ST      CPU Reads Status Word. (Trap? Alter Flags?)
7     OP      CPU Reads Results (If Any).
FIGURE 3-2. Slave Processor Protocol

FIGURE 3-3. Slave Processor Status Word

3.1.3.2 Floating-Point Instructions
Table 3-1 gives the protocols followed for each Floating-Point instruction. The instructions are referenced by their mnemonics. For the bit encodings of each instruction, see Appendix A.

TABLE 3-1. Floating-Point Instruction Protocols

Mnemonic  Operand 1  Operand 2  Operand 1  Operand 2  Returned Value   PSR Bits
          Class      Class      Issued     Issued     Type and Dest.   Affected
ADDf      read.f     rmw.f      f          f          f to Op.2        none
SUBf      read.f     rmw.f      f          f          f to Op.2        none
MULf      read.f     rmw.f      f          f          f to Op.2        none
DIVf      read.f     rmw.f      f          f          f to Op.2        none
MOVf      read.f     write.f    f          N/A        f to Op.2        none
ABSf      read.f     write.f    f          N/A        f to Op.2        none
NEGf      read.f     write.f    f          N/A        f to Op.2        none
CMPf      read.f     read.f     f          f          N/A              N,Z,L
FLOORfi   read.f     write.i    f          N/A        i to Op.2        none
TRUNCfi   read.f     write.i    f          N/A        i to Op.2        none
ROUNDfi   read.f     write.i    f          N/A        i to Op.2        none
MOVFL     read.F     write.L    F          N/A        L to Op.2        none
MOVLF     read.L     write.F    L          N/A        F to Op.2        none
MOVif     read.i     write.f    i          N/A        f to Op.2        none
LFSR      read.D     N/A        D          N/A        N/A              none
SFSR      N/A        write.D    N/A        N/A        D to Op.2        none
POLYf     read.f     read.f     f          f          f to F0          none
DOTf      read.f     read.f     f          f          f to F0          none
SCALBf    read.f     rmw.f      f          f          f to Op.2        none
LOGBf     read.f     write.f    f          N/A        f to Op.2        none

Notes:
D = Double Word
i = Integer size (B, W, D) specified in mnemonic.
f = Floating-Point type (F, L) specified in mnemonic.
N/A = Not Applicable to this instruction.

The Operand class columns give the Access Class for each general operand, defining how the addressing modes are interpreted (see Series 32000 Instruction Set Reference Manual).

The Operand Issued columns show the sizes of the operands issued to the Floating-Point Unit by the CPU. “D” indicates a 32-bit Double Word. “i” indicates that the instruction specifies an integer size for the operand (B = Byte, W = Word, D = Double Word). “f” indicates that the instruction specifies a Floating-Point size for the operand (F = 32-bit Standard Floating, L = 64-bit Long Floating).

The Returned Value Type and Destination column gives the size of any returned value and where the CPU places it. The PSR Bits Affected column indicates which PSR bits, if any, are updated from the Slave Processor Status Word (Figure 3-3).

Any operand indicated as being of type “f” will not cause a transfer if the Register addressing mode is specified. This is because the Floating-Point Registers are physically on the Floating-Point Unit and are therefore available without CPU assistance.

3.2 EXCEPTION PROCESSING
Exceptions are special events that alter the sequence of instruction execution. The CPU recognizes two basic types of exceptions: interrupts and traps.

An interrupt occurs in response to an event signalled by activating the NMI or INT input signals. Interrupts are typically requested by peripheral devices that require the CPU’s attention.

Traps occur as a result either of exceptional conditions (e.g., attempted division by zero) or of specific instructions whose purpose is to cause a trap to occur (e.g., supervisor call instruction).

When an exception is recognized, the CPU saves the PC, PSR and the MOD register contents on the interrupt stack and then it transfers control to an exception service procedure. Details on the operations performed in the various cases by the CPU to enter and exit the exception service procedure are given in the following sections.

Note that the reset operation is not treated here as an exception even though, like any exception, it alters the instruction execution sequence. The reason is that the CPU handles reset in a significantly different way than it handles exceptions. Refer to Section 3.4.4 for details on the reset operation.

3.2.1 Exception Acknowledge Sequence
When an exception is recognized, the CPU goes through three major steps:
1) Adjustment of Registers. Depending on the source of the exception, the CPU may restore and/or adjust the contents of the Program Counter (PC), the Processor Status Register (PSR) and the currently-selected Stack Pointer (SP). A copy of the PSR is made, and the PSR is then set to reflect Supervisor Mode and selection of the Interrupt Stack.
2) Vector Acquisition. A Vector is either obtained from the Data Bus or is supplied by default.
3) Service Call. The Vector is used as an index into the Interrupt Dispatch Table, whose base address is taken from the CPU Interrupt Base (INTBASE) Register. See Figure 3-4. A 32-bit External Procedure Descriptor is read from the table entry, and an External Procedure Call is performed using it. The MOD Register (16 bits) and Program Counter (32 bits) are pushed on the Interrupt Stack.

FIGURE 3-4. Interrupt Dispatch and Cascade Tables

This process is illustrated in Figure 3-13a, from the viewpoint of the programmer. Details on the sequences of events in processing interrupts and traps are given in the following sections.

FIGURE 3-5.
Exception Acknowledge Sequence

3.2.2 Returning from an Exception Service Procedure
To return control to an interrupted program, one of two instructions can be used: RETT (Return from Trap) and RETI (Return from Interrupt).

RETT is used to return from any trap or a non-maskable interrupt service procedure. Since some traps are often used deliberately as a call mechanism for supervisor mode procedures, RETT can also adjust the Stack Pointer (SP) to discard a specified number of bytes from the original stack as surplus parameter space.

RETI is used to return from a maskable interrupt service procedure. Unlike RETT, RETI also informs any external interrupt control units that interrupt service has completed. Since interrupts are generally asynchronous external events, RETI does not discard parameters from the stack.

Both of the above instructions always restore the PSR, MOD, PC and SB registers to their previous contents.

3.2.3 Maskable Interrupts
The INT pin is a level-sensitive input. A continuous low level is allowed for generating multiple interrupt requests. The input is maskable, and is therefore enabled to generate interrupt requests only while the Processor Status Register I bit is set. The I bit is automatically cleared during service of an INT or NMI request, and is restored to its original setting upon return from the interrupt service routine via the RETT or RETI instruction.

The INT pin may be configured via the SETCFG instruction as either Non-Vectored (CFG Register bit I = 0) or Vectored (bit I = 1).

3.2.3.1 Non-Vectored Mode
In the Non-Vectored mode, an interrupt request on the INT pin will cause an Interrupt Acknowledge bus cycle, but the CPU will ignore any value read from the bus and use instead a default vector of zero. This mode is useful for small systems in which hardware interrupt prioritization is unnecessary.

FIGURE 3-6.
Return from Trap (RETT n) Instruction Flow

FIGURE 3-7. Return from Interrupt (RETI) Instruction Flow

3.2.3.2 Vectored Mode: Non-Cascaded Case
In the Vectored mode, the CPU uses an Interrupt Control Unit (ICU) to prioritize up to 16 interrupt requests. Upon receipt of an interrupt request on the INT pin, the CPU performs an “Interrupt Acknowledge, Master” bus cycle reading a vector value from the low-order byte of the Data Bus. This vector is then used as an index into the Dispatch Table in order to find the External Procedure Descriptor for the proper interrupt service procedure. The service procedure eventually returns via the Return from Interrupt (RETI) instruction, which performs an End of Interrupt bus cycle, informing the ICU that it may re-prioritize any interrupt requests still pending. The ICU provides the vector number again, which the CPU uses to determine whether it needs also to inform a Cascaded ICU.

In a system with only one ICU (16 levels of interrupt), the vectors provided must be in the range of 0 through 127; that is, they must be positive numbers in eight bits. By providing a negative vector number, an ICU flags the interrupt source as being a Cascaded ICU (see below).

Note: During a return from interrupt, the CPU looks at Bit 7 of the vector number from the master ICU. If Bit 7 is 0, bits 0 through 6 are ignored.

3.2.3.3 Vectored Mode: Cascaded Case
In order to allow up to 256 levels of interrupt, provision is made both in the CPU and in the NS32202 Interrupt Control Unit (ICU) to transparently support cascading. Figure 3-9 shows a typical cascaded configuration. Note that the Interrupt output from a Cascaded ICU goes to an Interrupt Request input of the Master ICU, which is the only ICU which drives the CPU INT pin.
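The way the vector byte from the Master ICU selects a table entry can be sketched as follows. This is an illustrative Python model, not the CPU's logic: the function names are hypothetical, the 4-byte entry size follows from the 32-bit descriptors and 32-bit Cascade Table entries, the negative-index arithmetic is the one given in the remainder of Section 3.2.3.3, and rejecting negative values below −16 is an assumption, since that case is not described here.

```python
def to_signed8(byte: int) -> int:
    """Interpret an 8-bit bus value as a signed number (bit 7 is the sign)."""
    byte &= 0xFF
    return byte - 256 if byte & 0x80 else byte


def table_entry_address(intbase: int, vector_byte: int):
    """Classify the Master ICU vector byte and locate its table entry.

    Returns ("dispatch", address) for a non-negative vector (0..127) and
    ("cascade", address) for a negative Cascade Table index (-16..-1).
    """
    index = to_signed8(vector_byte)
    if index >= 0:
        return "dispatch", intbase + 4 * index
    if index < -16:
        # Assumption: indices outside -16..-1 are treated as invalid here.
        raise ValueError("negative vector outside the Cascade Table range")
    return "cascade", intbase + 4 * index


def cascade_entry_address(intbase: int, master_line: int) -> int:
    """Address of the Cascade Table entry for a Master ICU line (0..15)."""
    if not 0 <= master_line <= 15:
        raise ValueError("Master ICU line number must be 0 to 15")
    return intbase + 4 * (master_line - 16)
```

With INTBASE at 0x1000, for example, a Cascaded ICU on Master line 0 uses the entry at 0x0FC0, which is the same address the CPU reaches when the Master ICU returns the byte 0xF0 (−16).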
In a system which uses cascading, two tasks must be performed upon initialization:
1) For each Cascaded ICU in the system, the Master ICU must be informed of the line number (0 to 15) on which it receives the cascaded requests.
2) A Cascade Table must be established in memory. The Cascade Table is located in a NEGATIVE direction from the location indicated by the CPU Interrupt Base (INTBASE) Register. Its entries are 32-bit addresses, pointing to the Vector Registers of each of up to 16 Cascaded ICUs.

Figure 3-4 illustrates the position of the Cascade Table. To find the Cascade Table entry for a Cascaded ICU, take its Master ICU line number (0 to 15) and subtract 16 from it, giving an index in the range −16 to −1. Multiply this value by 4, and add the resulting negative number to the contents of the INTBASE Register. The 32-bit entry at this address must be set to the address of the Hardware Vector Register of the Cascaded ICU. This is referred to as the “Cascade Address.”

Upon receipt of an interrupt request from a Cascaded ICU, the Master ICU interrupts the CPU and provides the negative Cascade Table index instead of a (positive) vector number. The CPU, seeing the negative value, uses it as an index into the Cascade Table and reads the Cascade Address from the referenced entry. Applying this address, the CPU performs an “Interrupt Acknowledge, Cascaded” bus cycle, reading the final vector value. This vector is interpreted by the CPU as an unsigned byte, and can therefore be in the range of 0 through 255.

In returning from a Cascaded interrupt, the service procedure executes the Return from Interrupt (RETI) instruction, as it would for any Maskable Interrupt. The CPU performs an “End of Interrupt, Master” bus cycle, whereupon the Master ICU again provides the negative Cascade Table index. The CPU, seeing a negative value, uses it to find the corresponding Cascade Address from the Cascade Table. Applying this address, it performs an “End of Interrupt, Cascaded” bus cycle, informing the Cascaded ICU of the completion of the service routine. The byte read from the Cascaded ICU is discarded.

Note: If an interrupt must be masked off, the CPU can do so by setting the corresponding bit in the Interrupt Mask Register of the Interrupt Controller. However, if an interrupt is set pending during the CPU instruction that masks off that interrupt, the CPU may still perform an interrupt acknowledge cycle following that instruction since it might have sampled the INT line before the ICU deasserted it. This could cause the ICU to provide an invalid vector. To avoid this problem the above operation should be performed with the CPU interrupt disabled.

FIGURE 3-8. Interrupt Control Unit Connections (16 Levels)

FIGURE 3-9. Cascaded Interrupt Control Unit Connections

3.2.4 Non-Maskable Interrupt
The Non-Maskable Interrupt is triggered whenever a falling edge is detected on the NMI pin. The CPU performs an “Interrupt Acknowledge” bus cycle from Address FFFF00₁₆ when processing of this interrupt actually begins. The vector value used for the Non-Maskable Interrupt is taken as 1, regardless of the value read from the bus.

The service procedure returns from the Non-Maskable Interrupt using the Return from Trap (RETT) instruction. No special bus cycles occur on return.

3.2.5 Traps
Traps are processing exceptions that are generated as direct results of the execution of an instruction. The return address saved on the stack by any trap except Trap (TRC) is the address of the first byte of the instruction during which the trap occurred. When a trap is recognized, maskable interrupts are not disabled. There are 8 trap conditions recognized by the NS32CG16 as described below.

Trap (SLAVE): An exceptional condition was detected by the Floating-Point Unit during the execution of a Slave Instruction. This trap is requested via the Status Word returned as part of the Slave Processor Protocol (Section 3.1.3.1).
Trap (ILL): Illegal operation. A privileged operation was attempted while the CPU was in User Mode (PSR bit U = 1).
Trap (SVC): The Supervisor Call (SVC) instruction was executed.
Trap (DVZ): An attempt was made to divide an integer by zero. (The FPU trap is used for Floating-Point division by zero.)
Trap (FLG): The FLAG instruction detected a “1” in the PSR F-bit.
Trap (BPT): The Breakpoint (BPT) instruction was executed.
Trap (TRC): The instruction just completed is being traced. Refer to Section 3.3.1 for details.
Trap (UND): An undefined opcode was encountered by the CPU.

3.2.6 Priority among Exceptions
The CPU checks for specific exceptions at various points while executing an instruction. It is possible that several exceptions occur simultaneously. In that event, the CPU responds to the exception with highest priority.

Figure 3-10 shows an exception processing flowchart. Before executing an instruction, the CPU checks for pending interrupts, or Trap (TRC). The CPU responds to any pending interrupt requests; nonmaskable interrupts are recognized with higher priority than maskable interrupts. If no interrupts are pending, then the CPU checks the P-flag in the PSR to determine whether a Trap (TRC) is pending. If the P-flag is 1, a Trap (TRC) is processed. If no interrupt or Trap (TRC) is pending, the CPU begins executing the instruction.

While executing an instruction, the CPU may recognize up to two exceptions:
1. Interrupt, if the instruction is interruptible.
2. One of 7 mutually exclusive traps: SLAVE, ILL, SVC, DVZ, FLG, BPT, UND

If no exception is detected while the instruction is executing, then the instruction is completed and the PC is updated to point to the next instruction.

FIGURE 3-10.
Exception Processing Flowchart

3.2.7 Exception Acknowledge Sequences: Detailed Flow

For purposes of the following detailed discussion of exception acknowledge sequences, a single sequence called "service" is defined in Figure 3-11.

Upon detecting any interrupt request or trap condition, the CPU first performs a sequence dependent upon the type of exception. This sequence will include saving a copy of the Processor Status Register and establishing a vector and a return address. The CPU then performs the service sequence.

3.2.7.1 Maskable/Non-Maskable Interrupt Sequence

This sequence is performed by the CPU when the NMI pin receives a falling edge, or the INT pin becomes active with the PSR I bit set. The interrupt sequence begins either at the next instruction boundary or, in the case of String instructions or Graphics instructions that have interior loops (BBOR, BBXOR, BBAND, BBFOR, EXTBLT, MOVMP, SBITPS, TBITS), at the next interruptible point during their execution. The graphics instructions are interruptible.

1. If a String instruction was interrupted and not yet completed:
   a. Clear the Processor Status Register P bit.
   b. Set "Return Address" to the address of the first byte of the interrupted instruction.
   Otherwise, set "Return Address" to the address of the next instruction.
2. Copy the Processor Status Register (PSR) into a temporary register, then clear PSR bits S, U, T, P and I.
3. If the interrupt is Non-Maskable:
   a. Read a byte from address FFFF00₁₆, applying Status Code 0100 (Interrupt Acknowledge, Master: Section 3.4.5.1). Discard the byte read.
   b. Set "Vector" to 1.
   c. Go to Step 8.
4. If the interrupt is Non-Vectored:
   a. Read a byte from address FFFE00₁₆, applying Status Code 0100 (Interrupt Acknowledge, Master: Section 3.4.5.1). Discard the byte read.
   b. Set "Vector" to 0.
   c. Go to Step 8.
5. Here the interrupt is Vectored. Read "Byte" from address FFFE00₁₆, applying Status Code 0100 (Interrupt Acknowledge, Master: Section 3.4.5.1).
6. If "Byte" ≥ 0, then set "Vector" to "Byte" and go to Step 8.
7. If "Byte" is in the range −16 through −1, then the interrupt source is Cascaded. (More negative values are reserved for future use.) Perform the following:
   a. Read the 32-bit Cascade Address from memory. The address is calculated as INTBASE + 4 * Byte.
   b. Read "Vector", applying the Cascade Address just read and Status Code 0101 (Interrupt Acknowledge, Cascaded: Section 3.4.5.1).
8. Perform Service (Vector, Return Address), Figure 3-11.

3.2.7.2 SLAVE/ILL/SVC/DVZ/FLG/BPT/UND Trap Sequence

1. Restore the currently selected Stack Pointer and the Processor Status Register to their original values at the start of the trapped instruction.
2. Set "Vector" to the value corresponding to the trap type.
   SLAVE: Vector = 3.
   ILL:   Vector = 4.
   SVC:   Vector = 5.
   DVZ:   Vector = 6.
   FLG:   Vector = 7.
   BPT:   Vector = 8.
   UND:   Vector = 10.
3. If Trap (UND):
   a. Clear the Processor Status Register P bit.
4. Copy the Processor Status Register (PSR) into a temporary register, then clear PSR bits T, U, S and P.
5. Set "Return Address" to the address of the first byte of the trapped instruction.
6. Perform Service (Vector, Return Address), Figure 3-11.

3.2.7.3 Trace Trap Sequence

1. In the Processor Status Register (PSR), clear the P bit.
2. Copy the PSR into a temporary register, then clear PSR bits S, U and T.
3. Set "Vector" to 9.
4. Set "Return Address" to the address of the next instruction.
5. Perform Service (Vector, Return Address), Figure 3-11.

Service (Vector, Return Address):
1. Push the PSR copy onto the Interrupt Stack as a 16-bit value.
2. Read the 32-bit External Procedure Descriptor from the Interrupt Dispatch Table: address is Vector * 4 + INTBASE Register contents.
3. Move the Module field of the Descriptor into the temporary MOD Register.
4. Read the Program Base pointer from memory address MOD + 8, and add to it the Offset field from the Descriptor, placing the result in the Program Counter.
5. Read the new Static Base pointer from the memory address contained in MOD, placing it into the SB Register.
6. Flush Queue: Non-sequentially fetch first instruction of Interrupt Routine.
7. Push MOD Register onto the Interrupt Stack as a 16-bit value. (The PSR has already been pushed as a 16-bit value.)
8. Push the Return Address onto the Interrupt Stack as a 32-bit quantity.
9. Copy temporary MOD Register to MOD Register.

FIGURE 3-11. Service Sequence Invoked during All Interrupt/Trap Sequences

TABLE 3-2. Summary of Exception Processing

Exception                        Instruction Ending   Cleared before Saving PSR   Cleared after Saving PSR
Interrupt                        Before Instruction   None/P*                     TUSPI
UND                              Suspended            P                           TUS
SLAVE, SVC, DVZ, FLG, BPT, ILL   Suspended            None                        TUSP
TRC                              Before Instruction   P                           TUS

For all other Series 32000 instructions, a single-step operation will complete the entire instruction before trapping back to the debugger. On the instructions mentioned above, several single-step commands may be required to complete the instruction, ONLY when interrupts are occurring. There are some methods to give the appearance of single-stepping for these NS32CG16 instructions.

1. MON16/MONCG monitors the PC value on return from the single-step trap vector. If the PC has not changed since the last single-step command was issued, the single-step operation is repeated. It is also advisable to ensure that one of the NS32CG16 instructions is being single-stepped, by inspecting the first byte of the address pointed to by the PC register. If it is 0x0E, then the instruction is an NS32CG16-specific instruction.
2. A breakpoint following the instruction would also trap after the instruction had completed.

3.3 DEBUGGING SUPPORT

The NS32CG16 provides features to assist in program debugging.
Besides the Breakpoint (BPT) instruction that can be used to generate soft breaks, the CPU also provides the instruction tracing capability. 3.3.1 Instruction Tracing Instruction tracing is a very useful feature that can be used during debugging to single-step through selected portions of a program. Tracing is enabled by setting the T-bit in the PSR Register. When enabled, the CPU generates a Trace Trap (TRC) after the execution of each instruction. At the beginning of each instruction, the T-bit is copied into the PSR P (Trace ‘‘Pending’’) bit. If the P-bit is set at the end of an instruction, then the Trace Trap is activated. If any other trap or interrupt request is made during a traced instruction, its entire service procedure is allowed to complete before the Trace Trap occurs. Each interrupt and trap sequence handles the P-bit for proper tracing, guaranteeing only one Trace Trap per instruction, and guaranteeing that the Return Address pushed during a Trace Trap is always the address of the next instruction to be traced. The beginning of the execution of a TRAP(UND) is not considered to be a beginning of an instruction, and hence the T-bit is not copied into the P-bit. Due to the fact that some instructions can clear the T- and P-bits in the PSR, in some cases a Trace Trap may not occur at the end of the instruction. This happens when one of the privileged instructions BICPSRW or LPRW PSR is executed. In other cases, it is still possible to guarantee that a Trace Trap occurs at the end of the instruction, provided that special care is taken before returning from the Trace Trap Service Procedure. In case a BICPSRB instruction has been executed, the service procedure should make sure that the T-bit in the PSR copy saved on the Interrupt Stack is set before executing the RETT instruction to return to the program being traced. 
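The T-bit/P-bit handshake described above can be illustrated with a small behavioural model. The following is only a sketch in Python, not vendor code: the class name TraceModel and its methods are hypothetical, and the model assumes that no other trap or interrupt occurs and that no traced instruction modifies the PSR. The service procedure itself, and the restoration of the T-bit by RETT, are not modelled.

```python
class TraceModel:
    """Toy model of the PSR T (trace enable) and P (trace pending) bits."""

    def __init__(self, t_bit=False):
        self.t = t_bit          # PSR T-bit: tracing enabled
        self.p = False          # PSR P-bit: trace pending
        self.trace_traps = []   # instructions that ended in a Trap (TRC)

    def execute(self, name):
        # At the beginning of each instruction, the T-bit is copied into P.
        self.p = self.t
        # ... the instruction itself would execute here ...
        # At the end of the instruction, a set P-bit activates the Trace Trap.
        if self.p:
            # Trace Trap sequence, step 1: clear the P bit.
            self.p = False
            self.trace_traps.append(name)


# Hypothetical instruction stream, traced with T = 1.
cpu = TraceModel(t_bit=True)
for insn in ["MOVD", "ADDD", "BR"]:
    cpu.execute(insn)
print(cpu.trace_traps)
```

In this model each traced instruction yields exactly one Trace Trap, which mirrors the guarantee stated above; an instance created with t_bit=False records none.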
If the RETT or RETI instructions have to be traced, the Trace Trap Service Procedure should set the P- and T-bits in the PSR copy on the Interrupt Stack that is going to be restored in the execution of such instructions.

While debugging the NS32CG16 instructions which have interior loops (BBOR, BBXOR, BBAND, BBFOR, EXTBLT, MOVMP, SBITPS, TBITS), special care must be taken with the single-step trap. If an interrupt occurs during a single-step of one of the graphics instructions, the interrupt will be serviced. Upon return from the interrupt service routine, the new NS32CG16 instruction will not be re-entered, due to a single-step trap. Both the NMI and INT interrupts will cause this behavior. Another single-step operation (S command in DBG16/MONCG) will resume from where the instruction was interrupted. There are no side effects from this early termination, and the instruction will complete normally.

Note: If instruction tracing is enabled while the WAIT instruction is executed, the Trap (TRC) occurs after the next interrupt, when the interrupt service procedure has returned.

3.4 SYSTEM INTERFACE

This section provides general information on the NS32CG16 interface to the external world. Descriptions of the CPU requirements as well as the various bus characteristics are provided here. Details on other device characteristics including timing are given in Chapter 4.

3.4.1 Power and Grounding

The NS32CG16 requires a single 5V power supply, applied on 5 pins. The logic voltage pin (VCCL) supplies the power to the on-chip logic. The buffer voltage pins VCCCTTL, VCCFCLK, VCCAD, and VCCIO supply the power to the on-chip output drivers. Grounding connections are made on 6 pins. The Logic Ground Pin (VSSL) provides the ground connection to the on-chip logic. The buffer ground pins VSSFCLK, VSSNTSO, VSSHAD, VSSLAD, VSSIO are the ground pins for the on-chip output drivers.
For optimal noise immunity, the power and ground pins should be connected to VCC and ground planes respectively. If VCC and ground planes are not used, single conductors should be run directly from each VCC pin to a power point, and from each GND pin to a ground point. Daisy-chained connections should be avoided.

Decoupling capacitors should also be used to keep the noise level to a minimum. Standard 0.1 μF ceramic capacitors can be used for this purpose. In addition, a 1.0 μF tantalum capacitor should be connected between VCCL and ground. They should attach to VCC, VSS pairs as close as possible to the NS32CG16.

During prototyping using wire-wrap or similar methods, the capacitors should be soldered directly to the power pins of the NS32CG16 socket, or as close as possible, with very short leads.

Recommended bypass for production in printed circuit boards:

+5        Ground    Capacitors
VCCL      VSSL      0.1 μF Disk Ceramic, 1.0 μF Tantalum
VCCIO     VSSIO     0.1 μF
VCCCTTL   VSSNTSO   0.1 μF
VCCAD     VSSLAD    0.1 μF
VCCAD     VSSHAD    None
VCCFCLK   VSSFCLK   0.1 μF

The VCCL–VSSL bypass requires a very short lead length and low inductance on the 0.1 μF capacitor.

Design Notes

When constructing a board using high frequency clocks with multiple lines switching, special care should be taken to avoid resonances on signal lines. A separate power and ground layer is recommended. This is true when designing boards for the NS32CG16. Switching times of under 5 ns on some lines are possible. Resonant frequencies should be maintained well above the 200 MHz frequency range on signal paths by keeping traces short and inductance low. Loading capacitance at the end of a transmission line contributes to the resonant frequency and should be minimized if possible. Capacitors should be located as close as possible across each power and ground pair near the NS32CG16.

Power and ground connections are shown in Figure 3-12.

FIGURE 3-12. Power and Ground Connections (TL/EE/9424-7)

3.4.2 Clocking

The NS32CG16 provides an internal oscillator that interacts with an external clock source through two signals: OSCIN and OSCOUT.

Either an external single-phase clock signal or a crystal can be used as the clock source. If a single-phase clock source is used, only the connection on OSCIN is required; OSCOUT should be left unconnected or loaded with no more than 5 pF of stray capacitance. The voltage level requirements specified in Section 4.3 must also be met for proper operation.

When operation with a crystal is desired, special care should be taken to minimize stray capacitances and inductances. The crystal, as well as the external components, should be placed in close proximity to the OSCIN and OSCOUT pins to keep the printed circuit trace lengths to an absolute minimum. Figures 3-13a and 3-13b show the external crystal interconnections. Table 3-3 provides the crystal characteristics and the values of the R, C, and L components, including stray capacitance, required for various frequencies.

FIGURE 3-13a. Crystal Interconnections - 20 MHz, 30 MHz (TL/EE/9424-21)
FIGURE 3-13b. Crystal Interconnections - 30 MHz (TL/EE/9424-22)

TABLE 3-3. External Oscillator Specifications

Crystal Characteristics
Type                        AT-Cut
Tolerance                   0.005% at +25°C
Stability                   0.01% from 0°C to +70°C
Resonance                   20 MHz or 30 MHz: Fundamental (Parallel)
                            30 MHz: Third Overtone (Parallel)
Maximum Series Resistance   50 Ω
Maximum Shunt Capacitance   7 pF

R, C and L Values
Frequency (MHz)   R1 (kΩ)   R2 (Ω)   C1 (pF)   C2 (pF)   C3 (pF)    L (μH)
20                270       75       20        20
30                180       51       20        20
30                180       51       20        20        800–1300   3.3

3.4.3 Power Save Mode

The NS32CG16 provides a power save feature that can be used to significantly reduce the power consumption at times when the computational demand decreases. The device uses the clock signal at the OSCIN pin to derive the internal clock as well as the external signals PHI1, PHI2, CTTL and FCLK. The frequency of these clock signals is affected by the clock scaling factor. Scaling factors of 1, 2, 4, or 8 can be selected by properly setting the C- and M-bits in the CFG register. The power save mode should not be used to reduce the clock frequency below the minimum frequency required by the CPU. Upon reset, both C and M are set to zero, thus the maximum clock rate is selected.

Because the C- and M-bits are programmed by the SETCFG instruction, the power save feature can only be controlled by programs running in supervisor mode. The following table shows the C- and M-bit settings for the various scaling factors, and the resulting supply current for a crystal frequency of 30 MHz.

Clock Scaling Factor vs Supply Current

C   M   Scaling Factor   CPU Clock Frequency   Typical ICC at +5V
0   0   1                15 MHz                140 mA
0   1   2                7.5 MHz               76 mA
1   0   4                3.75 MHz              42 mA
1   1   8                1.88 MHz              25 mA

3.4.4 Resetting

The RSTI input pin is used to reset the NS32CG16. The CPU samples RSTI on the falling edge of CTTL. Whenever a low level is detected, the CPU responds immediately. Any instruction being executed is terminated; any results that have not yet been written to memory are discarded; and any pending interrupts and traps are eliminated. The internal latch for the edge-sensitive NMI signal is cleared.

On application of power, RSTI must be held low for at least 50 ms after VCC is stable. This is to ensure that all on-chip voltages are completely stable before operation. Whenever a Reset is applied, it must also remain active for not less than 64 CTTL cycles. See Figures 3-15 and 3-16.

FIGURE 3-14. Recommended Reset Connections (TL/EE/9424-23)
FIGURE 3-15. Power-On Reset Requirements (TL/EE/9424-24)
FIGURE 3-16. General Reset Timing (TL/EE/9424-25)

While in the Reset state, the CPU drives the signals ADS, RD, WR, DBE, TSO, BPU, and DDIN inactive. AD0–AD15, A16–A23 and SPC are floated, and the state of all other output signals is undefined. The internal CPU clock PHI1, PHI2 and CTTL run at half the frequency of the signal on the OSCIN pin. The HOLD signal must be kept inactive. After the RSTI signal is driven high, the CPU will stay in the reset condition for approximately 8 clock cycles and then it will begin execution at address 0.

The PSR is reset to 0. The CFG C- and M-bits are reset to 0. FCLK runs at the same frequency as OSCIN. NMI is enabled to allow Non-Maskable Interrupts.

The following conditions are present after reset due to the PSR being reset to 0:

Tracing is disabled.
Supervisor mode is enabled.
Supervisor stack space is used when the TOS addressing mode is indicated.
No trace traps are pending.
Only NMI is enabled. Maskable interrupts are disabled.
BPU is inactive high.
The Clock Scaling Factor is set to 1, refer to Section 3.4.3.

Note that vectored/non-vectored interrupts have not been selected. While interrupts are disabled, a SETCFG [I] instruction must be executed to enable vectored interrupts. If non-vectored interrupts are required, a SETCFG without the [I] must be executed.

The presence/absence of the NS32081, NS32181, or NS32381 has also not been declared. If there is a Floating-Point Unit, a SETCFG [F] instruction must be executed. If there is no Floating-Point Unit, a SETCFG without the [F] must be executed.

In general, a SETCFG instruction must be executed in the reset routine, in order to properly configure the CPU. The options should be combined, and executed in a single instruction. For example, to declare vectored interrupts, a Floating-Point Unit installed, and full CPU clock rate, execute a SETCFG [F, I] instruction. To declare non-vectored interrupts, no FPU, and full CPU clock rate, execute a SETCFG [ ] instruction.

3.4.5 Bus Cycles

The NS32CG16 will perform bus cycles for one of the following reasons:
1. To fetch instructions from memory.
2. To write or read data to or from memory or external peripheral devices.
3. To acknowledge an interrupt, or to acknowledge completion of an interrupt service routine.
4. To transfer information to or from a Slave Processor.

3.4.5.1 Bus Status

The NS32CG16 CPU presents four bits of Bus Status information on pins ST0–ST3. The various combinations on these pins indicate why the CPU is performing a bus cycle, or, if it is idle on the bus, then why it is idle.

The Bus Status pins are interpreted as a 4-bit value, with ST0 the least significant bit. Their values decode as follows:

0000 - The bus is idle because the CPU does not need to perform a bus access.
0001 - The bus is idle because the CPU is executing the WAIT instruction.
0010 - (Reserved for future use.)
0011 - The bus is idle because the CPU is waiting for a Slave Processor to complete an instruction.
0100 - Interrupt Acknowledge, Master. The CPU is performing a Read cycle to acknowledge an interrupt request. See Section 3.2.3.
0101 - Interrupt Acknowledge, Cascaded. The CPU is reading an interrupt vector to acknowledge a maskable interrupt request from a Cascaded Interrupt Control Unit.
0110 - End of Interrupt, Master. The CPU is performing a Read cycle to indicate that it is executing a Return from Interrupt (RETI) instruction at the completion of an interrupt's service procedure.
0111 - End of Interrupt, Cascaded. The CPU is performing a read cycle from a Cascaded Interrupt Control Unit to indicate that it is executing a Return from Interrupt (RETI) instruction at the completion of an interrupt's service procedure.
1000 - Sequential Instruction Fetch. The CPU is reading the next sequential word from the instruction stream into the Instruction Queue. It will do so whenever the bus would otherwise be idle and the queue is not already full.
1001 - Non-Sequential Instruction Fetch. The CPU is performing the first fetch of instruction code after the Instruction Queue is purged. This will occur as a result of any jump or branch, any interrupt or trap, or execution of certain instructions.
1010 - Data Transfer. The CPU is reading or writing an operand of an instruction.
1011 - Read RMW Operand. The CPU is reading an operand which will subsequently be modified and rewritten. The write cycle of RMW will have a "write" status.
1100 - Read for Effective Address Calculation. The CPU is reading information from memory in order to determine the Effective Address of an operand. This will occur whenever an instruction uses the Memory Relative or External addressing mode.
1101 - Transfer Slave Processor Operand. The CPU is either transferring an instruction operand to or from a Slave Processor, or it is issuing the Operation Word of a Slave Processor instruction.
1110 - Read Slave Processor Status. The CPU is reading a Status Word from a Slave Processor after the Slave Processor has signalled completion of an instruction.
1111 - Broadcast Slave ID. The CPU is initiating the execution of a Slave Processor instruction by transferring the first byte of the instruction, which represents the slave processor identification.

3.4.5.2 Basic Read and Write Cycles

The sequence of events occurring during a CPU access to either memory or peripheral device is shown in Figure 3-18 for a read cycle, and Figure 3-19 for a write cycle. The cases shown assume that the selected memory or peripheral device is capable of communicating with the CPU at full speed. If not, then cycle extension may be requested through CWAIT and/or WAIT1–2.

A full-speed bus cycle is performed in four cycles of the CTTL clock signal, labeled T1 through T4. Clock cycles not associated with a bus cycle are designated Ti (for "idle").

During T1, the CPU applies an address on pins AD0–AD15 and A16–A23 and provides a low-going pulse on the ADS pin, which serves the dual purpose of informing external circuitry that a bus cycle is starting and of providing control to an external latch for demultiplexing Address bits 0–15 from the AD0–AD15 pins. See Figure 3-17. During this time also the status signals DDIN, indicating the direction of the transfer, and HBE, indicating whether the high byte (AD8–AD15) is to be referenced, become valid.

During T2 the CPU switches the Data Bus, AD0–AD15, to either accept or present data. Note that the signals A16–A23 remain valid, and need not be latched. At this time the signals TSO (Timing State Output), DBE (Data Buffer Enable) and either RD (Read Strobe) or WR (Write Strobe) will also be activated.

The T3 state provides for access time requirements, and it occurs at least once in a bus cycle. At the end of T2, on the rising edge of CTTL, the CWAIT and WAIT1–2 signals are sampled to determine whether the bus cycle will be extended. See Section 3.4.5.3.

If the CPU is performing a read cycle, the data bus (AD0–AD15) is sampled at the beginning of T4 on the rising edge of CTTL. Data must, however, be held a little longer to meet the data hold time requirements. The RD signal is guaranteed not to go inactive before this time, so its rising edge can be safely used to disable the device providing the input data.

The T4 state finishes the bus cycle. At the beginning of T4, the RD or WR, and TSO signals go inactive, and on the falling edge of CTTL, DBE goes inactive, having provided for necessary data hold times. Data during Write cycles remains valid from the CPU throughout T4. Note that the Bus Status lines (ST0–ST3) change at the beginning of T4, anticipating the following bus cycle (if any).

FIGURE 3-17. Bus Connections (TL/EE/9424-11)
FIGURE 3-18. Read Cycle Timing (TL/EE/9424-12)
FIGURE 3-19. Write Cycle Timing (TL/EE/9424-13)

3.4.5.3 Cycle Extension

To allow sufficient access time for any speed of memory or peripheral device, the NS32CG16 provides for extension of a bus cycle. Any type of bus cycle except a Slave Processor cycle and a special bus cycle can be extended.

In Figures 3-18 and 3-19, note that during T3 all bus control signals from the CPU are flat. Therefore, a bus cycle can be cleanly extended by causing the T3 state to be repeated. This is the purpose of the WAIT1–2 and CWAIT input signals.

At the end of state T2, on the rising edge of CTTL, WAIT1–2 and CWAIT are sampled. If any of these signals are active, the bus cycle will be extended by at least one clock cycle. Thus, one or more additional T3 states (also called wait states) will be inserted after the next T-State. Any combination of the above signals can be activated at one time. However, the WAIT1–2 inputs are only sampled by the CPU at the end of state T2. They are ignored at all other times.

The WAIT1–2 inputs are binary weighted, and can be used to insert up to 3 wait states, according to the following table.

WAIT2   WAIT1   Number of Wait States
HIGH    HIGH    0
HIGH    LOW     1
LOW     HIGH    2
LOW     LOW     3

CWAIT causes wait states to be inserted continuously as long as it is sampled active. It is normally used when the number of wait states to be inserted in the CPU bus cycle is not known in advance.

The following sequence shows the CPU response to the WAIT1–2 and CWAIT inputs.
1. Start bus cycle.
2. Sample WAIT1–2 and CWAIT at the end of state T2.
3. If the WAIT1–2 inputs are both inactive, then go to step 6.
4. Insert the number of wait states selected by WAIT1–2.
5. Sample CWAIT again.
6. If CWAIT is not active, then go to step 8.
7. Insert one wait state and then go to step 5.
8. Complete bus cycle.

Figure 3-20 shows a bus cycle extended by three wait states, two of which are due to WAIT2, and one is due to CWAIT.

FIGURE 3-20. Cycle Extension of a Read Cycle (TL/EE/9424-14)

3.4.5.4 Instruction Fetch Cycles

Instructions for the NS32CG16 CPU are "prefetched"; that is, they are input before being needed into the next available entry of the eight-byte Instruction Queue. The CPU performs two types of Instruction Fetch cycles: Sequential and Non-Sequential. These can be distinguished from each other by their differing status combinations on pins ST0–ST3 (Section 3.4.5.1).

A Sequential Fetch will be performed by the CPU whenever the Data Bus would otherwise be idle and the Instruction Queue is not currently full. Sequential Fetches are always Even Word Read cycles (Table 3-5).

A Non-Sequential Fetch occurs as a result of any break in the normally sequential flow of a program. Any jump or branch instruction, a trap or an interrupt will cause the next Instruction Fetch cycle to be Non-Sequential. In addition, certain instructions flush the instruction queue, causing the next instruction fetch to display Non-Sequential status. Only the first bus cycle after a break displays Non-Sequential status, and that cycle is either an Even Word Read or an Odd Byte Read, depending on whether the destination address is even or odd.

3.4.5.5 Interrupt Control Cycles

Activating the INT or NMI pin on the CPU will initiate one or more bus cycles whose purpose is interrupt control rather than the transfer of instructions or data. Execution of the Return from Interrupt instruction (RETI) will also cause Interrupt Control bus cycles. These differ from instruction or data transfers only in the status presented on pins ST0–ST3. All Interrupt Control cycles are single-byte Read cycles.
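The byte read during a vectored Interrupt Acknowledge cycle is interpreted as described in Section 3.2.7.1: a non-negative value is the vector itself, while a value from −16 through −1 is a Cascade Index. The following Python sketch illustrates that decoding and the related address arithmetic. It is a behavioural illustration only; the function names and the INTBASE value used in the example are hypothetical, and the byte is assumed to be a two's-complement quantity.

```python
NMI_ACK_ADDR = 0xFFFF00  # Interrupt Acknowledge address, Non-Maskable Interrupt
INT_ACK_ADDR = 0xFFFE00  # Interrupt Acknowledge address, maskable interrupt


def to_signed_byte(raw):
    """Interpret an 8-bit bus value (0..255) as a two's-complement byte."""
    return raw - 256 if raw >= 128 else raw


def classify_ack_byte(raw, intbase):
    """Decode the byte read in a vectored Interrupt Acknowledge cycle.

    Returns ("vector", n) when the byte is a vector number, or
    ("cascade", addr) when it is a Cascade Index, where addr is the memory
    location holding the 32-bit Cascade Address (INTBASE + 4 * Byte).
    """
    byte = to_signed_byte(raw)
    if byte >= 0:
        return ("vector", byte)
    if byte >= -16:
        return ("cascade", intbase + 4 * byte)
    raise ValueError("more negative values are reserved: %d" % byte)


def descriptor_address(vector, intbase):
    """Interrupt Dispatch Table entry address: Vector * 4 + INTBASE."""
    return vector * 4 + intbase


# Example with a hypothetical INTBASE of 0x001000.
print(classify_ack_byte(0x05, 0x001000))
print(classify_ack_byte(0xFF, 0x001000))
print(hex(descriptor_address(1, 0x001000)))
```

With the example INTBASE, a byte of 0xFF (Cascade Index −1) selects the Cascade Address stored four bytes below INTBASE, and vector 1 (the value used for the Non-Maskable Interrupt) selects the dispatch table entry four bytes above it.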
Table 3-4 shows the Interrupt Control sequences associated with each interrupt and with the return from its service routine. For full details of the NS32CG16 interrupt structure, see Section 3.2.

TABLE 3-4. Interrupt Sequences

Cycle   Status   Address           DDIN   HBE       A0        High Bus     Low Bus

A. Non-Maskable Interrupt Control Sequence
Interrupt Acknowledge
1       0100     FFFF00₁₆          0      1         0         Don't Care   Don't Care
Interrupt Return
None: Performed through Return from Trap (RETT) instruction.

B. Non-Vectored Interrupt Control Sequence
Interrupt Acknowledge
1       0100     FFFE00₁₆          0      1         0         Don't Care   Don't Care
Interrupt Return
None: Performed through Return from Trap (RETT) instruction.

C. Vectored Interrupt Sequence: Non-Cascaded
Interrupt Acknowledge
1       0100     FFFE00₁₆          0      1         0         Don't Care   Vector: range 0–127
Interrupt Return
1       0110     FFFE00₁₆          0      1         0         Don't Care   Vector: same as in previous Int. Ack. cycle

D. Vectored Interrupt Sequence: Cascaded
Interrupt Acknowledge
1       0100     FFFE00₁₆          0      1         0         Don't Care   Cascade Index: range −16 to −1
(The CPU here uses the Cascade Index to find the Cascade Address.)
2       0101     Cascade Address   0      1 or 0*   0 or 1*   Vector, range 0–255; on appropriate half of Data Bus for even/odd address
Interrupt Return
1       0110     FFFE00₁₆          0      1         0         Don't Care   Cascade Index: same as in previous Int. Ack. cycle
(The CPU here uses the Cascade Index to find the Cascade Address.)
2       0111     Cascade Address   0      1 or 0*   0 or 1*   Don't Care   Don't Care

* If the Cascaded ICU Address is Even (A0 is low), then the CPU applies HBE high and reads the vector number from bits 0–7 of the Data Bus. If the address is Odd (A0 is high), then the CPU applies HBE low and reads the vector number from bits 8–15 of the Data Bus. The vector number may be in the range 0–255.

3.4.5.6 Slave Processor Bus Cycles

A Slave Processor bus cycle always takes exactly two clock cycles, labeled T1 and T4 (see Figures 3-21 and 3-22).
During a Read cycle SPC is active from the beginning of T1 to the beginning of T4, and the data is sampled at the end of T1. The Cycle Status pins lead the cycle by one clock period, and are sampled on the leading edge of SPC. During a Write cycle, the CPU applies data and activates SPC at T1, removing SPC at T4. The Slave Processor latches the status on the leading edge of SPC and latches data on the trailing edge.

The CPU does not pulse the Address Strobe (ADS), and no bus signals are generated. The direction of a transfer is determined by the sequence ("protocol") established by the instruction under execution; but the CPU indicates the direction on the DDIN pin for hardware debugging purposes.

A Slave Processor operand is transferred in one or more Slave bus cycles. A Byte operand is transferred on the least-significant byte of the Data Bus (AD0–AD7), and a Word operand is transferred on the entire bus. A Double Word is transferred in a consecutive pair of bus cycles, least-significant word first. A Quad Word is transferred in two pairs of Slave cycles, with other bus cycles possibly occurring between them. The word order is from least-significant word to most-significant.

Figure 3-23 shows the NS32CG16 and FPU connection diagram.

FIGURE 3-22. Slave Processor Write Cycle (TL/EE/9424-31). Note: Slave Processor samples Data Bus here.

3.4.5.7 Data Access Sequences

The 24-bit address provided by the NS32CG16 is a byte address; that is, it uniquely identifies one of up to 16,777,216 8-bit memory locations. An important feature of the NS32CG16 is that the presence of a 16-bit data bus imposes no restrictions on data alignment; any data item, regardless of size, may be placed starting at any memory address. The NS32CG16 provides a special control signal, High Byte Enable (HBE), which facilitates individual byte addressing on a 16-bit bus.

Memory is organized as two 8-bit banks, each bank receiving the word address (A1–A23) in parallel.
One bank, connected to Data Bus pins AD0–AD7, is enabled to respond to even byte addresses; i.e., when the least significant address bit (A0) is low. The other bank, connected to Data Bus pins AD8–AD15, is enabled when HBE is low. See Figure 3-24.

Any bus cycle falls into one of three categories: Even Byte Access, Odd Byte Access, and Even Word Access. All accesses to any data type are made up of sequences of these cycles. Table 3-5 gives the state of A0 and HBE for each category.

FIGURE 3-21. Slave Processor Read Cycle
Note: CPU samples Data Bus here.

FIGURE 3-23. NS32CG16 and FPU Interconnections

FIGURE 3-24. Memory Interface

TABLE 3-5. Bus Cycle Categories

Category | HBE | A0
Even Byte | 1 | 0
Odd Byte | 0 | 1
Even Word | 0 | 0

Accesses of operands requiring more than one bus cycle are performed sequentially, with no idle T-states separating them. The number of bus cycles required to transfer an operand depends on its size and its alignment (i.e., whether it starts on an even byte address or an odd byte address). Table 3-6 lists the bus cycles performed for each situation. For the timing of A0 and HBE, see Section 3.4.5.2.

TABLE 3-6. Data Access Sequences

In each sequence the operand starts at address A, and Byte 0 is the byte located at A.

Cycle | Type | Address | HBE | A0 | High Bus | Low Bus

A. Odd Word Access Sequence
1 | Odd Byte | A | 0 | 1 | Byte 0 | Don't Care
2 | Even Byte | A+1 | 1 | 0 | Don't Care | Byte 1

B. Even Double-Word Access Sequence
1 | Even Word | A | 0 | 0 | Byte 1 | Byte 0
2 | Even Word | A+2 | 0 | 0 | Byte 3 | Byte 2

C. Odd Double-Word Access Sequence
1 | Odd Byte | A | 0 | 1 | Byte 0 | Don't Care
2 | Even Word | A+1 | 0 | 0 | Byte 2 | Byte 1
3 | Even Byte | A+3 | 1 | 0 | Don't Care | Byte 3

D. Even Quad-Word Access Sequence
1 | Even Word | A | 0 | 0 | Byte 1 | Byte 0
2 | Even Word | A+2 | 0 | 0 | Byte 3 | Byte 2
Other Bus Cycles (Instruction Prefetch or Slave) can occur here.
3 | Even Word | A+4 | 0 | 0 | Byte 5 | Byte 4
4 | Even Word | A+6 | 0 | 0 | Byte 7 | Byte 6

E. Odd Quad-Word Access Sequence
1 | Odd Byte | A | 0 | 1 | Byte 0 | Don't Care
2 | Even Word | A+1 | 0 | 0 | Byte 2 | Byte 1
3 | Even Byte | A+3 | 1 | 0 | Don't Care | Byte 3
Other Bus Cycles (Instruction Prefetch or Slave) can occur here.
4 | Odd Byte | A+4 | 0 | 1 | Byte 4 | Don't Care
5 | Even Word | A+5 | 0 | 0 | Byte 6 | Byte 5
6 | Even Byte | A+7 | 1 | 0 | Don't Care | Byte 7

3.4.5.8 Bus Access Control

The NS32CG16 CPU has the capability of relinquishing its control of the bus upon request from a DMA controller or another CPU. This capability is implemented by means of the HOLD (Hold Request) and HLDA (Hold Acknowledge) pins. By asserting HOLD low, an external device requests access to the bus. On receipt of HLDA from the CPU, the device may perform bus cycles, as the CPU at this point has set AD0–AD15, A16–A23 and HBE to the TRI-STATE® condition and has switched ADS and DDIN to the input mode. The CPU now monitors ADS and DDIN from the external device to generate the relevant strobe signals (i.e., TSO, DBE, RD or WR). To return control of the bus to the CPU, the device sets HOLD inactive, and the CPU acknowledges it by setting HLDA inactive.

How quickly the CPU releases the bus depends on whether it is idle on the bus at the time the HOLD request is made, as the CPU must always complete the current bus cycle. Figure 3-25 shows the timing sequence when the CPU is idle. In this case, the CPU grants the bus during the immediately following clock cycle. Figure 3-26 shows the sequence when the CPU is using the bus at the time the HOLD request is made. If the request is made during or before the clock cycle shown (two clock cycles before T4), the CPU will release the bus during the clock cycle following T4. If the request occurs closer to T4, the CPU may already have decided to initiate another bus cycle. In that case it will not grant the bus until after the next T4 state. Note that this situation will also occur if the CPU is idle on the bus but has initiated a bus cycle internally.

Note 1: During DMA cycles the WAIT1–2 signals should be kept inactive, unless they are also monitored by the DMA controller. If wait states are required, CWAIT should be used.

Note 2: The logic value of the status pins, ST0–3, is undefined during DMA activity.

FIGURE 3-25. HOLD Timing, Bus Initially Idle

FIGURE 3-26. HOLD Timing, Bus Initially Not Idle

3.4.5.9 Instruction Status

In addition to the four bits of Bus Cycle status (ST0–3), the NS32CG16 CPU also presents Instruction Status information on three separate pins. These pins differ from ST0–3 in that they are synchronous to the CPU's internal instruction execution section rather than to its bus interface section.

PFS (Program Flow Status) is pulsed low as each instruction begins execution. It is intended for debugging purposes.

U/S originates from the U-bit of the Processor Status Register, and indicates whether the CPU is currently running in User or Supervisor mode. Although it is not synchronous to bus cycles, there are guarantees on its validity during any given bus cycle. See the Timing Specifications in Section 4.

ILO (Interlocked Operation) is activated during an SBITI (Set Bit, Interlocked) or CBITI (Clear Bit, Interlocked) instruction. It is made available to external bus arbitration circuitry in order to allow these instructions to implement the semaphore primitive operations for multi-processor communication and resource sharing.
ILO is guaranteed to be active during the operand accesses performed by the interlocked instructions.

Note: The acknowledge of HOLD is on a cycle by cycle basis. Therefore, it is possible to have HLDA active when an interlock operation is in progress. In this case, ILO remains low and the interlocked instruction continues only after HOLD is de-asserted.

4.0 Device Specifications

4.1 NS32CG16 PIN DESCRIPTIONS

The following is a brief description of all NS32CG16 pins. The descriptions reference portions of the Functional Description, Section 3. Unless otherwise indicated, reserved pins should be left open.

Note: An asterisk next to the signal name indicates a TRI-STATE condition for that signal during HOLD acknowledge.

4.1.1 Supplies

VCCL  Logic Power. +5V positive supply for on-chip logic.
VCCCTTL, VCCFCLK, VCCAD, VCCIO  Buffers Power. +5V positive supplies for on-chip output buffers.
VSSL  Logic Ground. Ground reference for on-chip logic.
VSSFCLK, VSSNTSC, VSSHAD, VSSLAD, VSSIO  Buffers Ground. Ground reference for on-chip output buffers.

4.1.2 Input Signals

RSTI  Reset Input. Schmitt triggered, asynchronous signal used to generate a CPU reset. See Section 3.4.4.
Note: The reset signal is a true asynchronous input. Therefore, no external synchronizing circuit is needed. When RSTI changes right before the falling edge of CTTL, and meets the specified set-up time, it will be recognized on that falling edge. Otherwise it will be recognized on the falling edge of CTTL in the following clock cycle.

HOLD  Hold Request. When active, causes the CPU to release the bus for DMA or multiprocessing purposes. See Section 3.4.5.8.
Note: If the HOLD signal is generated asynchronously, its set up and hold times may be violated. In this case, it is recommended to synchronize it with CTTL to minimize the possibility of metastable states. The CPU provides only one synchronization stage to minimize the HLDA latency. This is to avoid speed degradations in cases of heavy HOLD activity (i.e., DMA controller cycles interleaved with CPU cycles).

INT  Interrupt. A low level on this pin requests a maskable interrupt. INT must be kept asserted until the interrupt is acknowledged.

NMI  Non-Maskable Interrupt. A High-to-Low transition on this signal requests a non-maskable interrupt.
Note: INT and NMI are true asynchronous inputs. Therefore, no synchronization with CTTL is required.

CWAIT  Continuous Wait. Causes the CPU to insert continuous wait states if sampled low at the end of T2 and each following T-State. See Section 3.4.5.3.

WAIT1–2  Two-Bit Wait State Inputs. These inputs, collectively called WAIT1–2, allow from zero to three wait states to be specified. They are binary weighted. See Section 3.4.5.3.
Note: During a DMA cycle, WAIT1–2 should be kept inactive unless they are also monitored by the DMA Controller. Wait states, in this case, should be generated through CWAIT.

OSCIN  Crystal/External Clock Input. Input from a crystal or an external clock source. See Section 3.4.2.

4.1.3 Output Signals

A16–A23  *High-Order Address Bits. These are the most significant 8 bits of the memory address bus.

HBE  *High Byte Enable. Status signal used to enable data transfers on the most significant byte of the data bus.

ST0–3  Status. Bus cycle status code; ST0 is the least significant. Encodings are:
0000 - Idle: CPU Inactive on Bus.
0001 - Idle: WAIT Instruction.
0010 - (Reserved)
0011 - Idle: Waiting for Slave.
0100 - Interrupt Acknowledge, Master.
0101 - Interrupt Acknowledge, Cascaded.
0110 - End of Interrupt, Master.
0111 - End of Interrupt, Cascaded.
1000 - Sequential Instruction Fetch.
1001 - Non-Sequential Instruction Fetch.
1010 - Data Transfer.
1011 - Read Read-Modify-Write Operand.
1100 - Read for Effective Address.
1101 - Transfer Slave Operand.
1110 - Read Slave Status Word.
1111 - Broadcast Slave ID.

U/S  User/Supervisor. User or Supervisor Mode status. High indicates User Mode; low indicates Supervisor Mode.

ILO  Interlocked Operation. When active, indicates that an interlocked operation is being executed.

HLDA  Hold Acknowledge. Activated by the CPU in response to the HOLD input to indicate that the CPU has released the bus.

PFS  Program Flow Status. A pulse on this signal indicates the beginning of execution of an instruction.

BPU  BPU Cycle. This signal is activated during a bus cycle to enable an external BITBLT processing unit. The EXTBLT instruction activates this signal.*
*Note: BPU is low (Active) only during bus cycles involving pre-fetching instructions and execution of EXTBLT operands. It is recommended that BPU, ADS and status lines (ST0–ST3) be used to qualify BPU bus cycles. If a DMA circuit exists in the system, the HLDA signal should be used to further qualify BPU cycles. BPU may become active during T4 of a non-BPU bus cycle, and may become inactive during T4 of a BPU bus cycle. BPU must be qualified by ADS and status lines (ST0–ST3) to be used as an external gating signal.

RSTO  Reset Output. This signal becomes active when RSTI is low, initiating a system reset.

RD  Read Strobe. Activated during CPU or DMAC read cycles to enable reading of data from memory or peripherals. See Section 3.4.5.2.

WR  Write Strobe. Activated during CPU or DMAC write cycles to enable writing of data to memory or peripherals.

TSO  Timing State Output. The falling edge of TSO identifies the beginning of state T2 of a bus cycle. The rising edge identifies the beginning of state T4.

DBE  Data Buffers Enable. Used to control external data buffers. It is active when the data buffers are to be enabled.

OSCOUT  Crystal Output. This line is used as the return path for the crystal (if used). When an external clock source is used, OSCOUT should be left unconnected or loaded with no more than 5 pF of stray capacitance.

FCLK  Fast Clock. This clock is derived from the clock waveform on OSCIN. Its frequency is either the same as OSCIN or is lower, depending upon the scale factor programmed into the CFG register.

PHI1, PHI2  Two-Phase Clock. These outputs provide a two-phase clock with frequency half that of FCLK. They can be used to clock the DP8510/DP8511 BPU. The trace lengths of PHI1 and PHI2 should be shorter than 4 inches (10 centimeters) when connected to the BPU.

CTTL  System Clock. This clock is similar to PHI1 but has a much higher driving capability. The skew between its rising edge and PHI1 rising edge is kept to a minimum.

4.1.4 Input-Output Signals

AD0–15  *Address/Data Bus. Multiplexed Address/Data information. Bit 0 is the least significant bit of each.

SPC  Slave Processor Control. Used by the CPU as the data strobe output for slave processor transfers; used by a slave processor to acknowledge completion of a slave instruction. See Section 3.4.5.6.

DDIN  *Data Direction. Status signal indicating the direction of the data transfer during a bus cycle. During HOLD acknowledge this signal becomes an input and determines the activation of RD or WR.

ADS  *Address Strobe. Controls address latches; signals the beginning of a bus cycle. During HOLD acknowledge this signal becomes an input and the CPU monitors it to detect the beginning of a DMA cycle and generate the relevant strobe signals. When a DMA is used, ADS should be pulled up to VCC through a 10 kΩ resistor.

FIGURE 4-1. Connection Diagram (68-Pin PCC Package, Bottom View)
Order Number NS32CG16V-10 or NS32CG16V-15
NS Package Number V68A

4.2 ABSOLUTE MAXIMUM RATINGS

If Military/Aerospace specified devices are required, please contact the National Semiconductor Sales Office/Distributors for availability and specifications.

Temperature Under Bias: 0°C to +70°C
Storage Temperature: -65°C to +150°C
All Input or Output Voltages with Respect to GND: -0.5V to +7V

Note: Absolute maximum ratings indicate limits beyond which permanent damage may occur. Continuous operation at these limits is not intended; operation should be limited to those conditions specified under Electrical Characteristics.

4.3 ELECTRICAL CHARACTERISTICS: TA = 0°C to +70°C, VCC = 5V ±5%, GND = 0V

Symbol | Parameter | Conditions | Min | Typ | Max | Units
VIH | High Level Input Voltage | (Note 4) | 2.0 | | VCC + 0.5 | V
VIL | Low Level Input Voltage | (Note 3) | -0.5 | | 0.8 | V
VT+ | RSTI Rising Threshold Voltage | VCC = 5.0V (Note 5) | 2.5 | | 3.5 | V
VHYS | RSTI Hysteresis Voltage | VCC = 5.0V (Note 5) | 0.8 | | 1.8 | V
VXL | OSCIN Input Low Voltage | | | | 0.5 | V
VXH | OSCIN Input High Voltage | | 4.5 | | | V
VOH | High Level Output Voltage | IOH = -400 μA (Note 6) | 2.4 | | | V
VOL | Low Level Output Voltage | IOL = 4 mA (Note 6) | | | 0.45 | V
IILS | SPC Input Current (low) | VIN = 0.4V, SPC in Input Mode | 0.05 | | 1.0 | mA
II | Input Load Current | 0 ≤ VIN ≤ VCC, All Inputs except SPC | -20 | | 20 | μA
IL | Leakage Current, Output and I/O Pins in TRI-STATE/Input Mode | 0.4 ≤ VOUT ≤ VCC | -20 | | 20 | μA
ICC | Active Supply Current | IOUT = 0, TA = 25°C (Note 2) | | 140 | 200 | mA
VPH | PHI1, 2 High Level Output Voltage | IOH = -400 μA | 0.9 VCC | | | V
VPL | PHI1, 2 Low Level Output Voltage | IOL = 4 mA | | | 0.1 VCC | V

Note 1: Care should be taken by designers to provide a minimum inductance path between the VSS pins and system ground in order to minimize noise.
Note 2: ICC is affected by the clock scaling factor selected by the C and M bits in the CFG register, see Section 3.2.1.
Note 3: VIL min: in the range of -0.5V to -1.5V, the pulse must be ≤ 20 ns, and the period between pulses ≥ 120 ns.
Note 4: VIH max: in the range of VCC + 0.5V to VCC + 2.0V, the pulse must be ≤ 25 ns, and the period between pulses ≥ 120 ns.
Note 5: Not 100% tested.
Note 6: All outputs except PHI1 and PHI2.
54 V 4.0 Device Specifications (Continued) 4.4 TEST LOADING CHARACTERISTICS Signal Name HBE, ST0–3, U/S, Capacitive Loading High Level Low Level Output Voltage Output Voltage Input Load Current (IOH e b 400 mA) (IOL e 4 mA) (0 s VIN s VCC) 2.0V s VOH s VCC a 0.5V b 0.5V s VOL s 0.8V High Level Low Level Input Voltage Input Voltage b 20 mA s II s 20 mA 2.0V s VIH s VCC a 0.5V b 0.5V s VIL s 0.45V b 20 mA s II s 20 mA 2.0V s VIH s VCC a 0.5V b 0.5V s VIL s 0.8V b 20 mA s II s 20 mA 4.5V s VIH s VCC a 0.5V b 0.5V s VIL s 0.5V b 20 mA s II s 20 mA 2.4V s VIH s VCC a 0.5V b 0.5V s VIL s 0.45V 50 mA s II s 1.0 mA 2.0V s VIH s VCC a 0.5V b 0.5V s VIL s 0.4V 50 pF ILO, HLDA, PFS, BPU, RST0, RD, WR, TSO, DBE, FCLK, DDIN, ADS RSTI, HOLD, INT, 50 pF NMI, CWAIT, WAIT1–2 OSCIN 50 pF AD0–15, A16–23, 100 pF 2.0V s VOH s VCC a 0.5V b 0.5V s VOL s 0.8V CTTL PHI1, PHI2 30 pF (Note 2) (Note 2) SPC 30 pF 2.0V s VOH s VCC a 0.5V b 0.5V s VOL s 0.8V OSCOUT see Table 2.0V s VOH s VCC a 0.5V b 0.5V s VOL s 0.8V (Note 1) 3-1 Note 1: The maximum capacitive loading of OSCOUT is given in Table 3-1 when the NS32CG16’s oscillator is driven with a crystal. If a single phase clock source is used, OSCOUT should be left unconnected or loaded with no more than 5 pF of stray capacitance. Note 2: As stated in Table 4.5.2. TL/EE/9424 – 77 FIGURE 4.3. Output Signals Specification Standard Abbreviations: L.E.ÐLeading Edge T.E.Ð Trailing Edge TL/EE/9424 – 65 FIGURE 4.2. Test Loading Configuration 4.5 SWITCHING CHARACTERISTICS R.E.ÐRising Edge F.E.ÐFalling Edge 4.5.1 Definitions All the timing specifications given in this section refer to 0.8V or 2.0V on the rising or falling edges of all the signals as illustrated in Figures 4-3 and 4-4 unless specifically stated otherwise. The capacitive load is assumed to be 100 pF on CTTL and 50 pF on all the other output signals. TL/EE/9424 – 78 FIGURE 4.4. 
Input Signals Specification Standard 55 4.0 Device Specifications (Continued) 4.5.2 Timing Tables 4.5.2.1 Output Signals: Internal Propagation Delays, NS32CG16-10 and NS32CG16-15 Name Figure Description Reference/Conditions tCTp 4-15 CTTL Clock Period R.E., CTTL to Next R.E., CTTL tCTh 4-15 CTTL High Time At 1.5V (Both Edges) (see Note 1) 25 pF–100 pF Capacitive Load NS32CG16-10 NS32CG16-15 (Note 3) Min Max Min Max 100 1000 66 1000 ns 0.42 0.57 0.41 0.58 tCTp Units tCTl 4-15 CTTL Low Time At 0.8V 25 pF–100 pF Capacitive Load 0.42 0.56 0.41 0.53 tCTp tCTr 4-15 CTTL Rise Time 0.8V to 2.0V VCC on R.E., CTTL 0 8 0 6 ns tCTf 4-15 CTTL Fall Time 2.0V to 0.8V VCC on F.E., CTTL 0 8 0 6 ns tCLw(1,2) 4-15 PHI1, PHI2 Pulse Width At 2.0V on PHI1, PHI2 (Both Edges) 0.35 0.55 0.32 0.53 tCTp tCLh 4-15 Clock High Time At 90% VCC on PHI1, PHI2 (Both Edges) 0.22 0.50 0.28 0.50 tCTp tnOVL(1,2) 4-15 PHI1, PHI2, Non-Overlap Time At 50% VCC on PHI1, PHI2 tXFr 4-15 OSCIN to FCLK R.E. Delay 80% VCC on R.E., OSCIN to R.E., FCLK tFCr 4-15 FCLK to CTTL R.E. Delay R.E., FCLK to R.E., CTTL tFCf 4-15 FCLK to CTTL F.E. Delay R.E., FCLK to F.E., CTTL tPCr 4-15 CTTL and PHI1 Skew R.E., CTTL to R.E., PHI1 tALv 4-5 Address Bits 0–15 Valid after R.E., CTTL T1 tALh 4-5 Address Bits 0–15 Hold after R.E., CTTL T2 tAHv 4-5 Address Bits 16–23 Valid after R.E., CTTL T1 tAHh 4-5 Address Bits 16–23 Hold after R.E., CTTL Next T1 or Ti tALfr 4-5 Address Bits 0–15 floating (during read) after R.E., CTTL T2 tALnfr 4-5 AD0–AD15 Floating (Note 2) 2 2 ns 2 29 2 25 ns b2 10 b2 10 ns b2 10 b2 10 ns 4 b4 4 ns 40 4 30 b4 5 5 40 0 0 ns ns 30 0 ns ns 5 38 5 28 ns 4 36 4 26 ns Note 1: Device testing is performed using the Test Loading Characteristics in Table 4.1. Additional timing data for CTTL with various capacitive loads is not 100% tested. Note 2: tALnfr is address bits 0–15 floating or not active after R.E. CTTL T1. This is only valid if the previous CPU cycle was a read (Figure 4.5) . 
A previous write may have ‘‘data’’ active into T1 of the next cycle which then becomes ‘‘address’’ during T1. Note 3: 15 MHz specifications are only guaranteed when tCTp e 66 ns. 56 4.0 Device Specifications (Continued) 4.5.2.1 Output Signals: Internal Propagation Delays, NS32CG16-10 and NS32CG16-15 (Continued) Name Figure Description Reference/Conditions NS32CG16-10 Min tALf 4-7 AD0–AD15 Floating (Caused by HOLD) after R.E., CTTL Ti tAHf 4-7 A16–A23 Floating after R.E., CTTL Ti tALnf 4-5, 4-8 Address Bits 0–15 Not Floating after R.E., CTTL T1 tAHnf 4-8 Address Bits 16–23 Not Floating after R.E., CTTL T4 after R.E., CTTL T2 or T1 tDv 4-6, 4-10 Data Valid (Write Cycle) tDh 4-6, 4-10 Max NS32CG16-15 Min Units Max 25 18 ns 25 18 ns 4 36 4 26 ns 4 36 4 26 ns 38 ns 5 26 ns 5 25 50 Data Hold after R.E., CTTL Next T1 or Ti 0 tADSa 4-5 ADS Signal Active after R.E., CTTL T1 5 35 0 ns tADSia 4-5 ADS Signal Inactive after F.E., CTTL T1 5 35 tADSw 4-6 ADS Pulse Width at 15% VCC (Both Edges) 30 tADSf 4-7 ADS Floating after R.E., CTTL Ti tADSr 4-8 ADS Return from Floating after R.E., CTTL Ti tALADSs 4-6 Address Bits 0–15 Setup before ADS T.E. 25 18 ns tAHADSs 4-6 Address Bits 16–23 Setup before ADS T.E. 25 18 ns tALADSh 4-5 Address Bits 0–15 Hold after ADS T.E. 
12 12 ns tHBEv 4-5 HBE Signal Valid after R.E., CTTL T1 tHBEh 4-5 HBE Signal Hold after R.E., CTTL Next T1 or Ti tHBEf 4-7 HBE Signal Floating after R.E., CTTL Ti tHBEr 4-8 HBE Return from Floating after R.E., CTTL Ti tDDINv 4-5 DDIN Signal Valid after R.E., CTTL T1 tDDINh 4-5 DDIN Signal Hold after R.E., CTTL Next T1 or Ti tDDINf 4-7 DDIN Floating after R.E., CTTL Ti tDDINr 4-8 DDIN Return from Floating after R.E., CTTL Ti tSPCa 4-10 SPC Output Active after R.E., CTTL T1 tSPCia 4-10 SPC Output Inactive after R.E., CTTL T4 tSPCnf 4-12 SPC Output Non-Forcing (Note 2) after F.E., CTTL T4 tHLDAa 4-7 HLDA Signal Active tHLDAia 4-8 HLDA Signal Inactive tSTv 4-5 Status ST0–ST3 Valid tSTh 4-5 Status ST0–ST3 Hold after R.E., CTTL T4 tBPUv 4-5 BPU Signal Valid after R.E., CTTL T4 tBPUh 4-5 BPU Signal Hold after R.E., CTTL T4 25 55 40 55 40 60 ns ns ns ns 38 ns 55 40 ns 55 40 ns 38 ns 55 40 ns 55 40 ns 0 0 65 0 ns 0 ns 30 5 21 ns 35 5 26 ns tCTp a 10 tCTp a 8 ns after R.E., CTTL Ti 50 28 ns after R.E., CTTL Ti 50 28 ns after R.E., CTTL T4 (before T1, see Note 1) 45 38 ns 30 ns 5 0 0 45 5 5 ns ns Note 1: Every memory cycle starts with T4, during which Cycle Status is applied. If the CPU was idling, the sequence will be: ‘‘ . . . Ti, T4, T1 . . . ’’. If the CPU was not idling, the sequence will be: ‘‘ . . . T4, T1 . . . ’’. Note 2: If the CPU is connected directly to the FPU and the CTTL loading is not violated, the CPU and FPU will function correctly together. The CPU and FPU connect directly without buffers. They should be located less than 4 inches (10 centimeters) apart. tSPCa and tSPCia will track each other on all CPU’s and therefore it is not possible to have a minimum tSPCia and a maximum tSPCa value. The pulse width minimum, tSPCw, of the FPU will not be violated by the NS32CG16 when connected directly to the FPU. 
57 4.0 Device Specifications (Continued) 4.5.2.1 Output Signals: Internal Propagation Delays, NS32CG16-10 and NS32CG16-15 (Continued) Name Figure Description Reference/Conditions NS32CG16-10 NS32CG16-15 Min Max Min Max Units tTSOa 4-5 TSO Signal Active after R.E., CTTL T2 2 15 2 14 ns tTSOia 4-5 TSO Signal Inactive after R.E., CTTL T4 0 15 0 10 ns tRDa 4-5 RD Signal Active after R.E., CTTL T2 20 15 ns tRDia 4-5 RD Signal Inactive after R.E., CTTL T4 20 0 15 ns tWRa 4-6 WR Signal Active after R.E., CTTL T2 20 15 ns tWRia 4-6 WR Signal Inactive after R.E., CTTL T4 20 0 15 ns tDBEa(R) 4-5 DBE Active (Read Cycle) after F.E., CTTL T2 21 15 ns tDBEa(W) 4-6 DBE Active (Write Cycle) after R.E., CTTL T2 28 15 ns DBE Inactive after F.E., CTTL T4 23 15 ns 4-5 U/S Signal Valid after R.E., CTTL T4 40 30 ns tUSh 4-5 U/S Signal Hold after R.E., CTTL T4 tPFSa 4-13 PFS Signal Active after F.E., CTTL 50 38 tPFSia 4-13 PFS Signal Inactive after F.E., CTTL 50 38 tPFSw 4-13 PFS Pulse Width at 15% VCC (Both Edges) tILOa 4-14 ILO Signal Active after R.E., CTTL 55 35 ns tILOia 4-14 ILO Signal Inactive after R.E., CTTL 55 35 ns tRSTOa 4-19 RSTO Signal Active after R.E., CTTL 21 15 ns tRSTOia 4-19 RSTO Signal Inactive after R.E., CTTL 21 15 ns tRTOI 4-19 Reset to Idle after F.E. 
of RSTO 10 10 tCTp tDBEia tUSv 4-5, 4-6 58 5 5 70 ns 45 ns ns ns 4.0 Device Specifications (Continued) 4.4.2.2 Input Signal Requirements: NS32CG16-10 and NS32CG16-15 Name Figure Description Reference/Conditions tXp 4-15 OSCIN Clock Period R.E., OSCIN to Next R.E., OSCIN tXh 4-15 OSCIN High Time (External Clock) at 4.2V (Both Edges) tXl 4-15 OSCIN Low Time tDIs 4-5, 4-11 tDIh 4-5, 4-11 tCWs NS32CG16-10 NS32CG16-15 Min Max Min Max 50 500 33 500 Units ns 16 11 ns at 1.0V (Both Edges) 16 11 ns Data In Setup before R.E., CTTL T4 18 15 ns Data In Hold (see Note 1) after R.E., CTTL T4 7 7 ns 4-5, 4-6 CWAIT Signal Setup before R.E., CTTL T3 or T3(w) 20 20 ns tCWh 4-5, 4-6 CWAIT Signal Hold after R.E., CTTL T3 or T3(w) 5 5 ns tWs 4-5, 4-6 WAITn Signals Setup before R.E., CTTL T3 or T3(w) 20 20 ns tWh 4-5, 4-6 WAITn Signals Hold after R.E., CTTL T3 or T3(w) 5 5 ns tHLDs 4-7, 4-8 HOLD Setup Time before R.E., CTTL TX2 or Ti 30 22 ns tHLDh 4-7, 4-8 HOLD Hold Time after R.E., CTTL Ti 0 0 ns tPWR 4-18 Power Stable to RSTI R.E. after VCC Reaches 4.5V 50 33 ms tRSTw 4-19 RSTI Pulse Width at 0.8V (Both Edges) 64 64 tCTp tSPCh 4-12 SPC Hold Time (see Note 3) after R.E., CTTL 0 0 ns tINTh 4-16 INT Signal Hold After R.E., CTTL T2 of Interrupt Acknowledge Cycle tNMIw 4-17 NMI Pulse Width at 0.8V (Both Edges) tSPCd 4-12 SPC Pulse Delay from Slave after F.E., CTTL T4 tSPCs 4-12 SPC Input Setup tADSs 4-9 ADS Input Setup tADSh 4-9 ADS Input Hold (see Note 2) after F.E., CTTL T1 tDDINs 4-9 DDIN Input Setup tDDINh 4-9 DDIN Input Hold 8 8 tCTp 70 50 ns 2 2 tCTp before R.E., CTTL 25 25 ns before F.E., CTTL 15 10 ns 10 10 ns before F.E., CTTL 15 10 ns after R.E., CTTL T4 7 5 ns Note 1: tDIh is always less than or equal to tRDia. Note 2: ADS must be deasserted before state T4 of the DMA controller cycle. Note 3: Not tested, guaranteed by design. 59 4.0 Device Specifications (Continued) 4.5.4 TIMING DIAGRAMS TL/EE/9424 – 32 FIGURE 4-5. 
Read Cycle

FIGURE 4-6. Write Cycle

FIGURE 4-7. HOLD Acknowledge Timing (Bus Initially Not Idle)
Note: When the bus is not idle, HOLD must be asserted before the rising edge of CTTL of the timing state that precedes state T4 in order for the request to be acknowledged.

FIGURE 4-8. HOLD Timing (Bus Initially Idle)

FIGURE 4-9. DMAC Initiated Bus Cycle
Note 1: ADS must be deactivated before state T4 of the DMA controller cycle.
Note 2: During a DMA cycle WAIT1–2 must be kept inactive unless they are monitored by the DMA Controller. A DMA cycle is similar to a CPU cycle. The NS32CG16 generates TSO, RD, WR and DBE. The DMAC drives the address/data lines, HBE, ADS and DDIN.
Note 3: During a DMA cycle, if the ADS signal is pulsed in order to initiate a bus cycle, the HOLD signal must remain asserted until state T4 of the DMAC cycle.

FIGURE 4-10. Slave Processor Write Timing

FIGURE 4-11. Slave Processor Read Timing

FIGURE 4-12. SPC Timing
After transferring the last operand to the FPU, the CPU turns OFF the output driver and holds SPC high with an internal 5 kΩ pullup.

FIGURE 4-13. PFS Signal Timing

FIGURE 4-14. ILO Signal Timing
Note: ILO may be asserted more than one clock cycle before the beginning of an interlocked access.

FIGURE 4-15. Clock Waveforms

FIGURE 4-16. INT Signal Timing
Note 1: Once INT is asserted, it must remain asserted until it is acknowledged.
Note 2: INTA is the Interrupt Acknowledge bus cycle (not a CPU signal). Refer to Section 3.4.5 and Table 3.4.

FIGURE 4-17.
NMI Signal Timing TL/EE/9424 – 53 FIGURE 4-18. Power-On Reset 67 4.0 Device Specifications (Continued) TL/EE/9424 – 54 Note 1: During Reset the HOLD signal must be kept high. Note 2: After RSTI is deasserted the first bus cycle will be an instruction fetch at address zero. FIGURE 4-19. Non-Power-On Reset 68 Appendix A: Instruction Formats T e Translated B e Backward U/W e 00: None 01: While Match 11: Until Match Configuration bits in SETCFG instruction: NOTATIONS i e Integer Type Field B e 00 (Byte) W e 01 (Word) D e 11 (Double Word) f e Floating-Point Type Field F e 1 (Std. Floating: 32 bits) L e 0 (Long Floating: 64 bits) op e Operation Code Valid encodings shown with each format. gen, gen 1, gen 2 e General Addressing Mode Field See Section 2.4.2 for encodings. reg e General Purpose Register Number C M F I 7 0 cond 1 0 1 0 Format 0 Bcond cond e Condition Code Field 0000 e EQual: Z e 1 0001 e Not Equal: Z e 0 0010 e Carry Set: C e 1 0011 e Carry Clear: C e 0 0100 e Higher: L e 1 0101 e Lower or Same: L e 0 0110 e Greater Than: N e 1 0111 e Less or Equal: N e 0 1000 e Flag Set: F e 1 1001 e Flag Clear: F e 0 1010 e LOwer: L e 0 and Z e 0 1011 e Higher or Same: L e 1 or Z e 1 1100 e Less Than: N e 0 and Z e 0 1101 e Greater or Equal: N e 1 or Z e 1 1110 e (Unconditionally True) 1111 e (Unconditionally False) short e Short Immediate Value. May contain quick: Signed 4-bit value, in MOVQ, ADDQ, CMPQ, ACB cond: Condition Code (above), in Scond. 
areg: CPU Dedicated Register, in LPR, SPR 0000 e UPSR (BR) 7 0 op 0 0 1 0 Format 1 BSR RET CXP RXP RETT RETI SAVE RESTORE Ð0000 Ð0001 Ð0010 Ð0011 Ð0100 Ð0101 Ð0110 Ð0111 ENTER EXIT NOP WAIT DIA FLAG SVC BPT 15 Ð1000 Ð1001 Ð1010 Ð1011 Ð1100 Ð1101 Ð1110 Ð1111 87 gen 0 short op 1 1 ADDQ CMPQ SPR Scond Ð000 Ð001 Ð010 Ð011 ACB MOVQ LPR 15 Ð100 Ð101 Ð110 87 gen 0001–0111 e (Reserved) 1000 e FP 1001 e SP 1010 e SB 1011 e (Reserved) 1100 e (Reserved) 1101 e PSR 1110 e INTBASE 1111 e MOD Options: in String Instructions op 0 1 1 1 1 1 i Format 3 CXPD Ð0000 BICPSR Ð0010 JUMP Ð0100 BISPSR Ð0110 Trap (UND) on XXX1, 1000 ADJSP JSR CASE 15 Ð1010 Ð1100 Ð1110 87 gen 1 gen 2 0 op Format 4 U/W i Format 2 B T ADD CMP BIC ADDC MOV OR 69 Ð0000 Ð0001 Ð0010 Ð0100 Ð0101 Ð0110 SUB ADDR AND SUBC TBIT XOR Ð1000 Ð1001 Ð1010 Ð1100 Ð1101 Ð1110 i Appendix A: Instruction Formats (Continued) 23 16 15 0 0 0 0 0 short 8 7 0 op i 23 0 0 0 0 0 1 1 1 0 16 15 gen 1 gen 2 BITWT TBITS BBAND SBITPS BBFOR SBITS BBXOR MOVif LFSR MOVLF MOVFL b 1000 b 1001 b 1010 b 1011 b 1100 b 1101 b 1110 16 15 8 7 gen 2 op i b 000 b 001 b 010 b 011 b 0000 b 0001 b 0010 b 0011 b 0100 b 0101 b 0110 b 0111 b 1000 b 1001 b 1010 b 1011 b 1100 b 1101 b 1110 b 1111 8 7 gen 2 16 15 op i ADDf MOVf CMPf (Note 3) SUBf NEGf Trap (UND) Trap (UND) 0 23 1 1 0 0 1 1 1 0 b 0000 b 0001 b 0010 b 0011 b 0100 b 0101 b 0110 b 0111 8 7 gen 2 op 0 0 f 1 0 1 1 1 1 1 0 MUL MEI Trap (UND) DEI QUO REM MOD DIV b 0000 b 0001 b 0010 b 0011 b 0100 b 0101 b 0110 b 0111 DIVf (Note 1) Trap (UND) Trap (UND) MULf ABSf Trap (UND) Trap (UND) 16 15 gen 1 Format 7 MOVM CMPM INSS EXTS MOVXBW MOVZBW MOVZiD MOVXiD b 100 b 101 b 110 b 111 Format 11 NEG NOT Trap (UND) SUBP ABS COM IBIT ADDP 16 15 gen 1 ROUND TRUNC SFSR FLOOR Format 10 Always Format 6 23 0 0 1 1 1 1 1 0 0 1 0 0 1 1 1 0 gen 1 ROT ASH CBIT CBITI Trap (UND) LSH SBIT SBITI i 0 23 gen 1 f TL/EE/9424 – 56 Trap (UND) 23 op 0 Format 9 Format 5 b 0000 MOVS b 0001 CMPS b 0010 SETCFG b 0011 SKPS b 0100 BBSTOD b 
0101 EXTBLT b 0110 BBOR b 0111 MOVMP No Operation on 1111 8 7 b 1000 b 1001 b 1010 b 1011 b 1100 b 1101 b 1110 b 1111 8 7 gen 2 op 0 0 f 1 1 1 1 1 1 1 0 Format 12 b 1000 b 1001 b 1010 b 1011 b 1100 b 1101 b 1110 b 1111 (Note 2) (Note 1) POLYf DOTf SCALBf LOGBf Trap (UND) Trap (UND) b 0000 b 0001 b 0010 b 0011 b 0100 b 0101 b 0110 b 0111 (Note 2) (Note 1) Trap (UND) Trap (UND) (Note 2) (Note 1) Trap (UND) Trap (UND) b 1000 b 1001 b 1010 b 1011 b 1100 b 1101 b 1110 b 1111 *Instructions with Format 12 are available only when the NS32381 is used. TL/EE/9424–55 Format 8 TL/EE/9424 – 57 b 0 00 EXT INDEX b 0 01 CVTP FFS b 0 10 INS b 0 11 CHECK Trap (UND) on b1 10 and b1 11 b 1 00 b 1 01 Trap (UND) Format 13 Always TL/EE/9424 – 58 70 Appendix A: Instruction Formats (Continued) Trap (UND) Format 14 Always Trap (UND) Format 15 Always Trap (UND) Format 18 Always Trap (UND) Format 19 Always TL/EE/9424 – 59 TL/EE/9424 – 63 Implied Immediate Encodings: 7 TL/EE/9424 – 60 Trap (UND) 0 r7 Format 16 Always r6 r5 r4 r3 r2 r1 r0 Register Mask, appended to SAVE, ENTER 7 0 ro Trap (UND) r1 r2 r3 r4 r5 r6 r7 Register Mask, appended to RESTORE, EXIT TL/EE/9424 – 61 Format 17 Always 7 0 offset lengthb1 Offset/Length Modifier appended to INSS, EXTS TL/EE/9424 – 62 Note 1: Opcode not defined; CPU treats like MOVf. First operand has access class of read; second operand has access class of write; f-field selects 32-bit or 64-bit data. Note 2: Opcode not defined; CPU treats like ADDf. First operand has access class of read; second operand has access class of read-modify-write. f-field selects 32-bit or 64-bit data. Note 3: Reserved opcode; execution of this opcode will generate an undefined result. 71 Appendix B: Instruction Execution Times fÐ This parameter is related to the floating-point operand size. This section provides the necessary information to calculate the instruction execution times for the NS32CG16. 
The following assumptions are made: Y The entire instruction, with all displacements and immediate operands, is assumed to be present in the instruction queue when needed. Y Interference from instruction prefetches, which is very dependent upon the preceding instruction(s), is ignored. This assumption will tend to affect the timing estimate in an optimistic direction. Y It is assumed that all memory operand transfers are completed before the next instruction begins execution. In the case of an operand of access class rmw in memory, this is pessimistic, as the Write transfer occurs in parallel with the execution of the next instruction. Y It is assumed that there is no overlap between the fetch of an operand and the following sequences of microcode. This is pessimistic, as the fetch of Operand 1 will generally occur in parallel with the effective address calculation of Operand 2, and the fetch of Operand 2 will occur in parallel with the execution phase of the instruction. Y Where possible, the values of operands are taken into consideration when they affect instruction timing, and a range of times is given. Where this is not done, the worst case is assumed. TfÐ The time required to transfer 32 bits of floating point value to or from the FPU. TiÐ The time required to transfer an integer value to or from the FPU. B.1.1 Equations The following equations assume that: # Memory accesses occur at full speed. # Any wait states should be reflected in the calculations of TOPB, TOPW and TOPD. Note: When multiple writes are performed during the execution of an instruction, wait states occurring during intermediate write transactions may be partially hidden by the internal execution. Therefore, a certain number of wait states can be inserted with no effect on the execution time. 
For example, in the case of the MOVSi instructions, each wait state on write operations subtracts 1 clock cycle per write bus access from the TCY of the instruction, since updating the pointers occurs in parallel with the write operation. This means that wait states can be added to write cycles without changing the execution time of the instruction, up to a maximum of 13 wait states on writes for MOVSB and MOVSW, and 4 wait states on writes for MOVSD.

B.1 BASIC AND FLOATING-POINT INSTRUCTIONS

Execution times for basic and floating-point instructions are given in Tables B-1 and B-2. The parameters needed for the various calculations are defined below.

TEA:   The time required to calculate an operand's Effective Address. For a Register or Immediate operand, this includes the fetch of that operand.
TEA1:  TEA value for the GEN or GEN1 operand.
TEA2:  TEA value for the GEN2 operand.
TOPB:  The time needed to read or write a memory byte.
TOPW:  The time needed to read or write a memory word.
TOPD:  The time needed to read or write a memory double-word.
TOPi:  The time needed to read or write a memory operand, where the operand size is given by the operation length of the instruction. It is always equivalent to either TOPB, TOPW or TOPD.
TCY:   Internal processing overhead, in clock cycles.
L:     Internal processing whose duration depends on the operation length. The number of clock cycles is derived by multiplying this value by the number of bytes in the operation length.
NCYC:  Number of bus cycles performed by the CPU to fetch or store an operand. NCYC depends on the operand size and alignment.
TPR:   CPU processing (in clock cycles) performed in parallel with the FPU.
TFPU:  Processing time required by the FPU to execute the instruction. This is the time from the last data sent to the FPU, until done is issued. TFPU can be found in the FPU data sheets.

TEA: TEA values for the various addressing modes are provided in the following table.

TEA TABLE (TEA value by addressing mode)
Addressing Mode                   TEA Value       Notes
IMMEDIATE, ABSOLUTE               4
EXTERNAL                          11 + 2 * TOPD
MEMORY RELATIVE                   7 + TOPD
REGISTER                          2
REGISTER RELATIVE, MEMORY SPACE   5
TOP OF STACK                      4               Access Class Write
                                  2               Access Class Read
                                  3               Access Class RMW
SCALED INDEXED                    TI1 + TI2       TI1 = TEA of the basemode except:
                                                    if basemode is REGISTER then TI1 = 5
                                                    if basemode is TOP OF STACK then TI1 = 4
                                                  TI2 depends on the scale factor:
                                                    if byte indexing TI2 = 5
                                                    if word indexing TI2 = 7
                                                    if double-word indexing TI2 = 8
                                                    if quad-word indexing TI2 = 10

TOPB:  If operand is in a register or is immediate then TOPB = 0
       else TOPB = 3
TOPW:  If operand is in a register or is immediate then TOPW = 0
       else TOPW = 4 * NCYC - 1
TOPD:  If operand is in a register or is immediate then TOPD = 0
       else TOPD = 4 * NCYC - 1
TOPi:  If operand is in a register or is immediate then TOPi = 0
       else if i = byte then TOPi = TOPB
       else if i = word then TOPi = TOPW
       else (i = double-word) TOPi = TOPD
L:     If i (operation length) = byte then L = 1
       else if i = word then L = 2
       else (i = double-word) L = 4
f:     If standard floating (32 bits): f = 1
       If long floating (64 bits): f = 2
Tf:    Tf = 4
Ti:    If integer = byte or word, then Ti = 2
       If integer = double-word, then Ti = 4

B.1.3 Calculation of the Execution Time TEX for Basic Instructions

The execution time for a basic instruction is obtained by performing the following steps:

1. Find the desired instruction in Table B-1.
2. Calculate the values of TEA, TOPB, etc. using the numbers in the table and the equations given in the previous sections.
3. The result derived by adding together these values is the execution time TEX in clock cycles.

EXAMPLE

Calculate TEX for the instruction CMPW R0, TOS.

Operand 1 is in a register; Operand 2 is in memory. This means that we must use the table values corresponding to the <xM> case as given in the Notes column. Only the #TEA1, #TEA2, #TOPi and TCY columns have values assigned for the CMPi instruction.
Therefore, they are the only ones that need to be calculated to find TEX. The blank columns are irrelevant to this instruction.

Both #TEA1 and #TEA2 columns contain 1 for the <xM> case. This means that effective address times have to be calculated for both operands. (For the <MR> case, the Register operand would have required no TEA time, therefore only the Memory operand TEA would have been necessary.) From the equations:

TEA1 (Register mode) = 2
TEA2 (Top of Stack mode, access class read) = 2

The #TOPi column represents potential operand transfers to or from memory. For a Compare instruction, each operand is read once, for a total of two operand transfers.

TOPi (Word, Register) = 0
TOPi (Word, TOS) = 3 (assuming the operand is aligned)
Total TOPi = 3

TCY is the time required for internal operation within the CPU. The TCY value for this case is 3.

TEX = TEA1 + TEA2 + TOPi + TCY = 2 + 2 + 3 + 3 = 10 machine cycles.

If the CPU is running at 20 MHz then a machine cycle (clock cycle) is 50 ns. Therefore, this instruction would take 10 × 50 ns, or 0.5 µs, to execute.

B.1.2 Notes on Table Use

Values in the #TEA1 and #TEA2 columns indicate whether effective addresses need to be calculated. A value of 1 indicates that address calculation time is required for the corresponding operand. A 0 indicates that the operand is either missing, or it is in a register and the instruction has an optimized form which eliminates the TEA calculation for it.

In the L column, multiply the entry by the operation length in bytes (1, 2 or 4).

In the TCY column, special notations sometimes appear:

n1 → n2 means n1 minimum, n2 maximum.

n1%n2 means that the instruction flushes the instruction queue after n1 clock cycles and nonsequentially fetches the next instruction. The value n2 indicates the number of clock cycles for the internal execution of the instruction (including n1).
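The CMPW R0, TOS calculation worked through in this section can be sketched in Python as follows. This is only an illustration under the stated assumptions: the helper names (top_word, tex_basic) are hypothetical, and the constants are the values quoted in the example, not values looked up programmatically from Table B-1.

```python
# Hypothetical helper names; constants are those quoted in the CMPW R0, TOS example.

TEA_REGISTER = 2   # TEA for Register mode
TEA_TOS_READ = 2   # TEA for Top of Stack mode, access class read
TCY_CMP_XM = 3     # TCY quoted for CMPi in the <xM> case


def top_word(in_memory: bool, ncyc: int = 1) -> int:
    """TOPW: 0 for a register or immediate operand, else 4 * NCYC - 1."""
    return 4 * ncyc - 1 if in_memory else 0


def tex_basic(tea1: int, tea2: int, top_total: int, tcy: int) -> int:
    """TEX for a basic instruction: the sum of its table parameters."""
    return tea1 + tea2 + top_total + tcy


# Operand 1 (R0) is a register; operand 2 (TOS) is an aligned word in memory.
top_total = top_word(in_memory=False) + top_word(in_memory=True)
cycles = tex_basic(TEA_REGISTER, TEA_TOS_READ, top_total, TCY_CMP_XM)

clock_mhz = 20                         # clock rate assumed in the example
time_ns = cycles * 1000 // clock_mhz   # 1000 / MHz gives ns per clock cycle

print(cycles, "clock cycles,", time_ns, "ns")
```

With the values above, the sum is 10 clock cycles, which is 500 ns at 20 MHz. An unaligned word operand would be modeled by passing ncyc=2 to top_word.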
The effective number of cycles (TCY) must take into account the time (Tfetch) required to fetch the portion of the next instruction including the basic encoding and the index bytes. This time depends on the size and the alignment of this portion. If only one memory cycle is required, then:

TCY = n1 + 6 + Tfetch

If more than one memory cycle is required, then:

TCY = n1 + 5 + Tfetch

In the notes column, notations held within angle brackets < > indicate alternatives in the operand addressing modes which affect the execution time. A table entry which is affected by the operand addressing may have multiple values, corresponding to the alternatives. These addressing notations are:

<I>   Immediate
<R>   CPU Register
<M>   Memory
<F>   FPU Register, either 32 or 64 Bits
<x>   Any Addressing Mode
<ab>  a and b represent the addressing modes of operand 1 and 2 respectively. Both a and b can be any addressing mode (e.g., <MR> means memory to CPU register).

Note: Unless otherwise specified the TCY value for immediate addressing is the same as for CPU register addressing.

B.1.4 Calculation of the Execution Time TEX for Floating-Point Instructions

The execution time for a floating-point instruction is obtained by performing the following steps:

1. Find the desired instruction in Table B-2.
2. Calculate the values of TEA1, TEA2, TOPB, etc., using the numbers in the table, and the equations given in the previous sections.
3. Get the floating-point instruction execution time TFPU from the appropriate FPU data sheet.
4. Choose the higher value between TPR and TFPU + 3.
5. The result derived by adding together these values is the execution time TEX in clock cycles.

EXAMPLE 1

Calculate TEX for the instruction MOVLF F0, @h'3000.

Assumptions:

- The FPU being used is the NS32181.
- Write cycles are performed with no wait states.

TEX Calculation:

Operand 1 is in a register, operand 2 is in memory.
This means that we have to use the table values for the <FM> case. The following parameter values are obtained from Table B-2 and the equations in the previous sections.

TEA2 (Absolute Mode) = 4
TOPD (Memory Write) = 7 (Operand aligned, no waits)
Tf = 4
TCY = 32
TPR = TEA2 + 6 = 4 + 6 = 10

From the FPU Execution Timing table in the NS32181 data sheet we get a TFPU for MOVLF of 19 clock cycles. The higher value between TPR and TFPU + 3 is 22. The total execution time in clock cycles is:

TEX = TEA2 + TOPD + Tf + TCY + 22 = 65

EXAMPLE 2

Calculate TEX for the instruction MULF 20(R0), 4(10(FP)).

Assumptions:

- The FPU being used is the NS32181.
- 20(R0) is an aligned read with one wait state.
- 10(FP) is an aligned read with no wait states.
- 4(10(FP)) is an unaligned rmw with two wait states.

TEX Calculation:

Operand 1 and operand 2 are both in memory. Therefore, the table values for the <MM> case must be used. The parameter values obtained from Table B-2 and the equations in the previous sections are as follows:

TEA1 (Register Relative Mode) = 5
TEA2 (Memory Relative Mode) = 8 + TOPD = 15 (TOPD = 7 (Operand Aligned, No Wait))
TOPD1 (Read from GEN1) = 7 + 2 = 9 (Operand Aligned, One Wait)
TOPD2 (rmw from GEN2) = 11 + 6 = 17 (Operand Unaligned, Two Waits)
Tf = 4
TCY = 22 → 28
TPR = 0

From the FPU Execution Timing table in the NS32181 data sheet we get a TFPU for MULF of 33 clock cycles. The higher value between TPR and TFPU + 3 is 36. The total execution time in clock cycles is:

TEX = TEA1 + TEA2 + TOPD1 + TOPD2 + 3 * Tf + TCY + 36 = 5 + 15 + 9 + 17 + (22 → 28) + 36 = 133 → 140

TABLE B-1.
Basic Instructions Mnemonic ÝTEA1 ÝTEA2 ÝTOPB ÝTOPW ÝTOPD ÝTOPi ÝL TCY ABSi 1 1 1 1 Ð Ð Ð Ð Ð Ð 2 2 Ð Ð 9 8 ACBi 1 1 Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð 2 2 Ð Ð Ð Ð Ð Ð 16 15%20 18 17%22 ADDi 1 1 Ð 1 Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð 3 1 Ð Ð Ð Ð 3 4 4 k xM l k MR l k RR l ADDCi 1 1 Ð 1 Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð 3 1 Ð Ð Ð Ð 3 4 4 k xM l k MR l k RR l ADDPi 1 1 1 1 Ð Ð Ð Ð Ð Ð 3 3 Ð Ð 16 18 ADDQi Ð Ð 1 Ð Ð Ð Ð Ð Ð Ð 2 Ð Ð Ð 6 4 kMl kRl ADDR 1 1 1 Ð Ð Ð Ð Ð 1 Ð Ð Ð Ð Ð 2 3 k xM l k xR l ADJSPi 1 Ð Ð Ð Ð 1 Ð 6 ANDi 1 1 Ð 1 Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð 3 1 Ð Ð Ð Ð 3 4 4 k M l no branch k M l branch k R l no branch k R l branch No Carry Carry k xM l k MR l k RR l x 45 ASHi 1 1 1 Ð Ð 2 Ð Bcond Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð 7 6%10 BICi 1 1 Ð 1 Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð 3 1 Ð Ð Ð Ð 3 4 4 74 14 Notes SCR k 0 SCR l 0 no branch branch k xM l k MR l k RR l Appendix B: Instruction Execution Times (Continued) TABLE B-1. Basic Instructions (Continued) ÝTEA1 ÝTEA2 ÝTOPB ÝTOPW ÝTOPD ÝTOPi ÝL TCY BICPSRB Mnemonic 1 Ð 1 Ð Ð Ð Ð 18%22 BICPSRW 1 Ð Ð 1 Ð Ð Ð 30%34 BISPSRB 1 Ð 1 Ð Ð Ð Ð 18%22 30%34 Notes BISPSRW 1 Ð Ð 1 Ð Ð Ð BPT Ð Ð Ð 2 4 Ð Ð 40 BR Ð Ð Ð Ð Ð Ð Ð 4%10 BSR Ð Ð Ð Ð 1 Ð Ð 6%16 CASEi 1 Ð Ð Ð Ð 1 Ð 4%9 CBITi 1 1 1 Ð 2 Ð Ð Ð Ð Ð 1 1 Ð Ð 15 7 k xM l CBITIi 1 1 1 Ð 2 Ð Ð Ð Ð Ð 1 1 Ð Ð 15 7 k xM l CHECKi 1 1 1 1 1 1 Ð Ð Ð Ð Ð Ð Ð Ð Ð 3 3 3 Ð Ð Ð 7 10 11 high low ok CMPi 1 1 Ð 1 Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð 2 1 Ð Ð Ð Ð 3 3 3 1 1 Ð Ð Ð 2*n Ð 9 * n a 24 1 Ð Ð Ð Ð Ð Ð Ð Ð Ð 1 Ð Ð Ð 3 3 CMPMi CMPQi CMPSi Ð Ð CMPST Ð COMi 1 k xR l k xR l k xM l k MR l k RR l n e Ý of elements in block kMl kRl Ð 35 * n a 53 n e Ý of elements, not Translated 2*n Ð 38 * n a 53 Translated 2 Ð 7 Ð Ð Ð 2*n Ð n Ð Ð 1 Ð Ð Ð CVTP 1 1 Ð Ð 1 Ð Ð 7 CXP Ð Ð Ð 3 4 Ð Ð 16%21 CXPD 1 Ð Ð 3 3 Ð Ð 13%18 DEIi 1 1 1 Ð Ð Ð Ð Ð Ð Ð 5 1 16 16 38 31 DIA Ð Ð Ð Ð Ð Ð Ð DIVi 1 1 Ð Ð Ð 3 16 58 Ð Ð Ð Ð na1 Ð Ð 4 * n a 18 n e Ý of general registers saved Ð Ð Ð Ð na1 Ð Ð 5 * n a 17 n e Ý of general registers restored EXTi 1 1 1 1 Ð Ð Ð Ð 1 Ð 1 1 Ð Ð 19 17 EXTSi 1 1 Ð Ð 
1 1 Ð FFSi 1 1 2 Ð Ð 1 24 x 29 x 51 26 x 36 24 x 28 FLAG Ð Ð Ð Ð Ð Ð Ð 4 Ð 3 Ð Ð Ð Ð 6 44 no trap trap IBITi 1 1 1 Ð 2 Ð Ð Ð Ð Ð 1 Ð Ð Ð 17 9 k xM l ENTER EXIT 75 k xM l k xR l 3%7 x 68 field in memory field in register k xR l Appendix B: Instruction Execution Times (Continued) TABLE B-1. Basic Instructions (Continued) ÝTEA1 ÝTEA2 ÝTOPB ÝTOPW ÝTOPD ÝTOPi ÝL TCY INDEXi Mnemonic 1 1 Ð Ð Ð 2 16 25 INSi 1 1 1 Ð Ð Ð Ð Ð 2 Ð 1 1 Ð Ð INSSi 1 1 Ð Ð 2 1 Ð x 39 x 96 39 x 49 JSR 1 Ð Ð Ð 1 1 Ð 5%15 JUMP 1 Ð Ð Ð Ð Ð Ð LPRi 1 Ð Ð Ð Ð 1 Ð LSHi 1 1 1 Ð Ð 2 Ð MEIi 1 1 Ð Ð Ð 4 16 MODi 1 1 Ð Ð Ð 3 16 MOVi 1 1 Ð 1 Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð 2 1 Ð Ð Ð Ð MOVMi 29 28 Notes field in memory field in register 2%6 x 33 14 x 45 19 23 54 x 73 1 3 3 k xM l k MR l k RR l n e Ý of elements in block 1 1 Ð Ð Ð 2*n Ð 3 * n a 20 1 Ð Ð Ð Ð Ð Ð Ð Ð Ð 1 Ð Ð Ð 2 3 Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð 2*n 2*n Ð Ð 14 * n a 59 24 * n a 54 n e Ý elements no options B, W and/or U option in effect Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð 2*n 2*n Ð Ð 10 * n a 59 24 * n a 54 n e Ý of elements no options B, W and/or U option in effect MOVST Ð Ð n Ð Ð 2*n Ð 27 * n a 54 Translated MOVXBD 1 1 1 Ð 1 Ð Ð 6 MOVXBW 1 1 1 1 Ð Ð Ð 6 MOVXWD 1 1 Ð 1 1 Ð Ð 6 MOVZBD 1 1 1 Ð 1 Ð Ð 5 MOVZBW 1 1 1 1 Ð Ð Ð 5 MOVQi MOVSB, W MOVSD MOVZWD 1 1 Ð 1 1 Ð Ð 5 MULi 1 1 Ð Ð Ð 3 16 15 NEGi 1 1 Ð Ð Ð 2 Ð 5 NOP Ð Ð Ð Ð Ð Ð Ð 3 NOTi 1 1 Ð Ð Ð 2 Ð 5 ORi 1 1 Ð 1 Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð 3 1 Ð Ð Ð Ð 3 4 4 1 1 Ð Ð Ð 3 16 QUOi 76 49 x 55 kMl kRl k xM l k MR l k RR l Appendix B: Instruction Execution Times (Continued) TABLE B-1. 
Basic Instructions (Continued) Mnemonic ÝTEA1 ÝTEA2 ÝTOPB ÝTOPW ÝTOPD ÝTOPi ÝL 1 1 Ð Ð Ð 3 16 57 Ð Ð Ð Ð n Ð Ð 5 * n a 12 RET Ð Ð Ð Ð 1 Ð Ð 2%8 RETI Ð Ð Ð Ð 1 2 2 2 2 3 Ð Ð Ð Ð 60 60 RETT Ð Ð Ð 2 2 Ð Ð ROTi 1 1 1 Ð Ð 2 Ð RXP Ð Ð Ð 1 2 Ð Ð 2%6 Scondi 1 1 Ð Ð Ð Ð Ð Ð Ð Ð 1 1 Ð Ð 9 10 Ð Ð Ð Ð n Ð Ð 4 * n a 13 SBITi 1 1 1 Ð 2 Ð Ð Ð Ð Ð 1 1 Ð Ð 15 7 k xM l SBITIi 1 1 1 Ð 2 Ð Ð Ð Ð Ð 1 1 Ð Ð 15 7 k xM l Ð Ð Ð Ð Ð Ð Ð 15 Ð Ð Ð Ð Ð n Ð 27 * n a 51 n e Ý of elements, not Translated SKPST Ð Ð n Ð Ð n Ð 30 * n a 51 Translated SPRi 1 Ð Ð Ð Ð 1 Ð SUBi 1 1 Ð 1 Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð 3 1 Ð Ð Ð Ð 3 4 4 k xM l 1 1 Ð 1 Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð 3 1 Ð Ð Ð Ð 3 4 4 k xM l SUBPi 1 1 1 1 Ð Ð Ð Ð Ð Ð 3 3 Ð Ð 16 18 no carry carry SVC Ð Ð Ð 2 4 Ð Ð 40 TBITi 1 1 1 Ð 1 Ð Ð Ð Ð Ð 1 1 Ð Ð 14 4 REMi RESTORE SAVE SETCFG SKPSi SUBCi WAIT XORi Ð Ð Ð Ð Ð Ð Ð 1 1 Ð 1 Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð 3 1 Ð Ð Ð Ð 77 TCY Notes x 62 n e Ý of general registers restored Non-Cascaded Cascaded 45 14 21 6 x 45 False True n e Ý of general registers saved k xR l k xR l x 27 x? 3 4 4 k MR l k RR l k MR l k RR l k xM l k xR l ? e until an interrupt/reset k xM l k MR l k RR l Appendix B: Instruction Execution Times (Continued) TABLE B-2. 
Floating-Point Instructions: CPU Portion Mnemonic ÝTEA1 ÝTEA2 ÝTOPD ÝTOPi ÝTi ÝTf TCY TPR Notes Ð 1 Ð Ð Ð 1 Ð Ð Ð 1 1 1 Ð f Ð 2f 2f 3f Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð f f 2f 3f 3f 8 0 0 0 0 0 k FF l (25 (27 (13 17 x 17) 24 a f x 29) x 30) x 19) Ð 1 Ð Ð Ð 1 Ð Ð Ð Ð Ð Ð Ð f Ð f f 2f Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð f f f 2f 2f (14 x 17) a 3f Ð 1 Ð Ð Ð 1 Ð Ð Ð Ð Ð Ð Ð 1 Ð 2 2 3 Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð 1 1 2 3 3 Ð 1 Ð Ð Ð 1 Ð Ð Ð Ð Ð Ð Ð 2 Ð 1 1 3 Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð 2 2 1 3 3 Ð 1 Ð Ð Ð 1 Ð Ð Ð Ð Ð Ð Ð f Ð Ð Ð f Ð Ð Ð 1 1 1 1 1 1 1 1 1 Ð f f Ð f f Ð 1 Ð Ð Ð 1 Ð Ð Ð 1 1 1 Ð Ð Ð f f f Ð 1 Ð Ð Ð 1 1 1 1 1 1 1 Ð 1 Ð Ð Ð 1 Ð 1 Ð Ð Ð Ð 1 1 1 Ð Ð Ð Ð f Ð f f 2f Ð f Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð SFSR Ð 1 Ð Ð Ð 1 LFSR Ð 1 Ð Ð Ð 1 ADDf, SUBf, MULf, DIVf MOVf, ABSf, NEGf MOVFL MOVLF TRUNCfi, FLOORfi, ROUNDfi MOVif CMPf (14 a 3f a 6f a 3f a 9f 17 24 a f 23 a 3f 33 a f (20 x 23) a 6f 17 17 8 0 0 6 a TEA2 TEA2 b 3 TEA2 b 3 x 20 25 35 43 35 x 38 20 x 23 16 8 0 0 TEA2 a 6 TEA2 b 4 TEA2 b 3 26 32 42 35 6 0 0 6 a TEA2 TEA2 b 2 b f TEA2b3 x 38 k IF l k FM l k IM l k MM l k FF l k MF l k IF l k FM l k IM l k MM l k FF l k MF l k IF l k FM l k IM l k MM l k FF l k MF l k IF l k FM l k IM l k MM l k FR l 25 a f 20 26 a f (16 x 19) a 4f 9 0 0 TEA2 a 6 TEA2 b 2 TEA2 b 2 b f Ð Ð Ð f f f 25 b f 18 26 20 a 4f 22 a 5f (10 x 13) a 5f 0 0 0 0 0 0 k RF l Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð f f f 2f 2f f 2f 2f 23 x 23) a 31 a f x 30) a 29 x 21) a 37 a f x 29) a 35 a 2f 13 7 7 0 0 0 0 0 0 k FF l Ð Ð Ð Ð 1 1 19 20 7 TEA1 a 4 kRl Ð Ð Ð Ð 1 1 23 0 0 kRl 78 20 k MF l (17 (20 (27 (15 (21 x 20) a 3f 18 x 21 3f 3f 6f 8f k MR l k IR l k FM l k IM l k MM l k MF l k IF l k RM l k IM l k MM l k MF l k IF l k FM l k IM l k MM l k FI l k MI l k II l kMl kMl Appendix B: Instruction Execution Times (Continued) Twaitrdd The number of wait states applied for a Read operation on destination data. B.2 SPECIAL GRAPHICS INSTRUCTIONS This section provides the execution times for the special graphics instructions. 
Table B-3 lists the average instruction execution times for different shift values and for a no-wait-state system design. The "No Option" of each instruction is used.

The effect of wait states on the execution time is rather difficult to evaluate due to the pipelined nature of the read and write operations. Instructions that have shift amounts, such as BBOR, BBXOR, BBAND, BBFOR and BITWT, make use of the parallel nature of the Series 32000/EP processors by doing the actual shift during the reading of the double-word destination data. This means that if there are wait states on read operations, these instructions are able to shift further without impacting the overall execution time.

For example, the total execution time for a BBFOR operation, shifting 8 bits, with 2 wait states on read operations, is the same as for a BBFOR operation shifting by 12 bits. This is because a destination read takes 4 clock cycles longer than a no-wait-state double-word read does. Note that this effect is not valid for more than 4 wait states because at 4 wait states, all possible shift values (0–15) are "hidden" during the destination read.

Table B-4 shows the average execution times with wait states, assuming a shift value of eight unless stated otherwise.

The parameters used in the execution time equations are defined below.

Twaitrd:   The number of wait states applied for a Read operation.
Twaitwr:   The number of wait states applied for a Write operation.
Twaitrds:  The number of wait states applied for a Read operation on source data. This also refers to the number of wait states applied for a table memory access (in the SBITS instruction, for example).
Twaitwrd:  The number of wait states applied for a Write operation on destination data.
Twaitblt:  Twaitrds + Twaitrdd * 2 + Twaitwrd * 2, the value used for BITBLT timing.
width:     The width of a BITBLT operation, in words.
height:    The height of a BITBLT operation, in scan lines.
shift:     The number of bits of shift applied.
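As an illustrative sketch, the parameters defined above can be plugged into the BBOR equations given in Tables B-3 and B-4. The function names are hypothetical and the code simply transcribes those table equations. It makes no attempt to model the partial hiding of shift time behind read wait states, and the wait-state form carries Table B-4's assumption of a shift value of eight.

```python
# Hypothetical function names; equations transcribed from the BBOR rows of
# Table B-3 (no wait states) and Table B-4 (with wait states).


def twaitblt(twaitrds: int, twaitrdd: int, twaitwrd: int) -> int:
    """Combined BITBLT wait-state parameter: Twaitrds + 2*Twaitrdd + 2*Twaitwrd."""
    return twaitrds + 2 * twaitrdd + 2 * twaitwrd


def bbor_clocks_no_wait(width: int, height: int, shift: int) -> int:
    """BBOR clock cycles with no wait states (width in words, height in lines)."""
    clocks = 42 + (107 + 44 * (width - 2)) * height
    if shift > 8:
        # Shifts above 8 bits add (shift - 8) clocks per word per scan line.
        clocks += (shift - 8) * width * height
    return clocks


def bbor_clocks_with_wait(width: int, height: int, tw_blt: int) -> int:
    """BBOR clock cycles with wait states, assuming a shift value of eight."""
    return 42 + ((107 + 2 * tw_blt) + (44 + tw_blt) * (width - 2)) * height


# 10-word wide, 50-line high block, shift count of 15, no wait states.
clocks = bbor_clocks_no_wait(width=10, height=50, shift=15)
time_ms = clocks / 15e6 * 1e3          # at a 15 MHz clock
print(clocks, "clocks,", round(time_ms, 2), "ms")
```

For the 10 by 50 block this evaluates to 26,492 clocks, about 1.77 ms at 15 MHz. Calling bbor_clocks_with_wait with tw_blt = 0 reduces to the no-wait-state equation for shifts of 8 or less.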
B.2.1 Execution Time Calculation for Special Graphics Instructions

The execution time for a special graphics instruction is obtained by inserting the appropriate parameters into the equation for that instruction and evaluating it. For example, to calculate the execution time of the BBOR instruction applied to a 10-word wide and 50-line high data block, assuming a shift count of 15 and a no-wait-state system, the following equation from Table B-3 is used.

42 + (107 + 44 * (width - 2)) * height + ((shift - 8) * width * height)

Substituting the appropriate values for the shift, width and height parameters yields:

42 + (107 + 44 * (10 - 2)) * 50 + ((15 - 8) * 10 * 50)

or

42 + (107 + 352) * 50 + (7 * 500) = 26,492 clocks, or 1.77 ms @ 15 MHz

This represents the "worst case" time for this instruction, since a shift of greater than 15 bits can be handled by moving the source and destination pointers by 2 bytes and adjusting the shift amount. The "best case" and "average case" times for most instructions are the same, due to reading the destination data during the shifting of the source data.

TABLE B-3.
Average Instruction Execution Times with No Wait-States

Instruction  Number of Clock Cycles                                                    Notes
BBOR         42 + (107 + 44 * (width - 2)) * height                                    Shift = 0 → 8
             42 + (107 + 44 * (width - 2)) * height + ((shift - 8) * width * height)   Shift > 8
BBXOR        44 + (107 + 44 * (width - 2)) * height                                    Shift = 0 → 8
             44 + (107 + 44 * (width - 2)) * height + ((shift - 8) * width * height)   Shift > 8
BBAND        45 + (111 + 44 * (width - 2)) * height                                    Shift = 0 → 8
             45 + (111 + 44 * (width - 2)) * height + ((shift - 8) * width * height)   Shift > 8
BBFOR        48 + (61 + 25 * (width - 2)) * height                                     Shift = 0
             48 + (74 + 32 * (width - 2)) * height                                     Shift = 1 → 8
             48 + (74 + 32 * (width - 2)) * height + ((shift - 8) * width * height)    Shift > 8
BBSTOD       66 + (170 + 60 * (width - 2)) * height                                    Shift = 0 → 8
             66 + (170 + 60 * (width - 2)) * height + ((shift - 8) * width * height)   Shift > 8
BITWT        16                                                                        Shift = 0
             28                                                                        Shift = 1 → 8
             28 + (shift - 8)                                                          Shift > 8
EXTBLT       35 + (19 + 12 * width) * height                                           Shift = 0 → 8, Pre-Read
             35 + (13 + 12 * width) * height                                           Shift = 0 → 8, No Pre-Read
             35 + (17 + 13 * width) * height                                           Shift > 8, Pre-Read
             35 + (11 + 13 * width) * height                                           Shift > 8, No Pre-Read
MOVMPB,W     16 + 7 * R2
MOVMPD       16 + 8 * R2
SBITS        39                                                                        R2 ≤ 25
             42                                                                        R2 > 25
SBITP        8 + (34 * R2)

TABLE B-4.
Average Instruction Execution Times with Wait-States

Instruction  Number of Clock Cycles                                                              Notes
BBOR         42 + ((107 + 2 * Twaitblt) + (44 + Twaitblt) * (width - 2)) * height
BBXOR        44 + ((107 + 2 * Twaitblt) + (44 + Twaitblt) * (width - 2)) * height
BBAND        45 + ((111 + 2 * Twaitblt) + (44 + Twaitblt) * (width - 2)) * height
BBFOR        48 + ((74 + 2 * Twaitblt) + (32 + Twaitblt) * (width - 2)) * height
BBSTOD       66 + ((170 + 2 * Twaitblt) + (60 + Twaitblt) * (width - 2)) * height
BITWT        16 + Twaitrds + Twaitrdd + Twaitwrd                                                 Shift = 0
             28 + Twaitblt                                                                       Shift = 1 → 8
EXTBLT       35 + (19 + (12 + (Twaitrds + Twaitrdd + Twaitwrd)) * width) * height                Pre-Read
             35 + (13 + (12 + (Twaitrds + Twaitrdd + Twaitwrd)) * width) * height                No Pre-Read
MOVMPB,W     16 + 7 * R2 + (Twaitwr - 1) * R2                                                    Twaitwr > 1
             16 + 7 * R2                                                                         Twaitwr ≤ 1
MOVMPD       16 + 8 * R2 + Twaitwr * R2
SBITS        39 + (2 * Twaitrdd + 2 * Twaitwrd + 2 * Twaitrds)                                   R2 ≤ 25
             42 + (2 * Twaitrdd + 2 * Twaitrds)                                                  R2 > 25
SBITP        8 + (34 * R2) + ((Twaitrdd + Twaitwrd) * R2)

Physical Dimensions inches (millimeters)

Plastic Chip Carrier (V)
Order Number NS32CG16V-10 or NS32CG16V-15
NS Package Number V68A

National Semiconductor Corporation
1111 West Bardin Road
Arlington, TX 76017
Tel: 1(800) 272-9959
Fax: 1(800) 737-7018

LIFE SUPPORT POLICY

NATIONAL'S PRODUCTS ARE NOT AUTHORIZED FOR USE AS CRITICAL COMPONENTS IN LIFE SUPPORT DEVICES OR SYSTEMS WITHOUT THE EXPRESS WRITTEN APPROVAL OF THE PRESIDENT OF NATIONAL SEMICONDUCTOR CORPORATION. As used herein:

1. Life support devices or systems are devices or systems which, (a) are intended for surgical implant into the body, or (b) support or sustain life, and whose failure to perform, when properly used in accordance with instructions for use provided in the labeling, can be reasonably expected to result in a significant injury to the user.

2.
A critical component is any component of a life support device or system whose failure to perform can be reasonably expected to cause the failure of the life support device or system, or to affect its safety or effectiveness.

National Semiconductor Europe
Fax: (+49) 0-180-530 85 86
Email: cnjwge@tevm2.nsc.com
Deutsch Tel: (+49) 0-180-530 85 85
English Tel: (+49) 0-180-532 78 32
Français Tel: (+49) 0-180-532 93 58
Italiano Tel: (+49) 0-180-534 16 80

National Semiconductor Hong Kong Ltd.
13th Floor, Straight Block, Ocean Centre, 5 Canton Rd.
Tsimshatsui, Kowloon
Hong Kong
Tel: (852) 2737-1600
Fax: (852) 2736-9960

National Semiconductor Japan Ltd.
Tel: 81-043-299-2309
Fax: 81-043-299-2408

National does not assume any responsibility for use of any circuitry described, no circuit patent licenses are implied and National reserves the right at any time without notice to change said circuitry and specifications.