BookL64364PG.fm5 Page i Friday, January 28, 2000 4:58 PM

L64364 ATMizer® II+ ATM-SAR Chip
Programming Guide
January 2000
Order Number R14012

This document contains proprietary information of LSI Logic Corporation. The information contained herein is not to be used by or disclosed to third parties without the express written permission of an officer of LSI Logic Corporation.

Document number DB15-000072-01, First Edition (January 2000)

This document is a guide for system programmers involved in the development of application software for the LSI Logic L64364 ATMizer® II+ ATM-SAR Chip, and will remain the official reference source for all revisions/releases of this product until rescinded by an update.

To receive product literature, visit us at http://www.lsilogic.com.

LSI Logic Corporation reserves the right to make changes to any products herein at any time without notice. LSI Logic does not assume any responsibility or liability arising out of the application or use of any product described herein, except as expressly agreed to in writing by LSI Logic; nor does the purchase or use of a product from LSI Logic convey a license under any patent rights, copyrights, trademark rights, or any other of the intellectual property rights of LSI Logic or third parties.

Copyright © 1997 – 2000 by LSI Logic Corporation. All rights reserved.

TRADEMARK ACKNOWLEDGMENT
The LSI Logic logo design and ATMizer are trademarks or registered trademarks of LSI Logic Corporation. All other brand and product names may be trademarks of their respective companies.
Contents

Chapter 1 Introduction
1.1 Hardware Overview
1.2 Typical Application
1.3 Software Overview
1.3.1 Data Structures and Maintenance
1.3.2 Host Messaging
1.3.3 Scheduling
1.3.4 Hashing Function
1.3.5 Packet Aging
1.3.6 Interrupt Handling
1.3.7 OAM Cell Processing
1.3.8 AAL3/4 Processing
1.3.9 Initialization
1.3.10 Operating Software

Chapter 2 Host Messaging
2.1 Host Messaging Overview
2.2 Buffer Processing
2.2.1 Buffer Flow
2.2.2 FIFO Location
2.2.3 FIFO Contents
2.2.4 FIFO Implementations
2.3 Rings
2.3.1 Ring Structure
2.3.2 Ring Management
2.3.3 Ring Implementation (Initialization)

Chapter 3 Scheduling
3.1 Scheduling Invocation
3.1.1 Line Recovered Clock Synchronization
3.1.2 FIFO Full Synchronization
3.2 Scheduler Commands
3.2.1 SCD_Serv( ) Command
3.2.2 SCD_Sched( ) Command
3.2.3 SCD_Tic( ) Command
3.3 The Scheduling Process
3.3.1 A Simple Scheduling Function
3.3.2 Scheduling Lag
3.3.3 Rate Granularity
3.3.4 Time Comparisons
3.3.5 Stopping Connection Scheduling
3.3.6 Race Conditions and Hazards
3.3.7 Scheduling ABR Connections
3.4 UBR Connections
3.4.1 Managing the UBR List in Software
3.4.2 Managing UBR Connections Using the Scheduler
3.5 VBR Connections
3.5.1 PCR-Based Implementation
3.5.2 SCR-Based Implementation
3.6 ABR Connections
3.7 Local Congestion
3.7.1 Fairness
3.7.2 List Lengths
3.7.3 Detecting a Local Congestion
3.7.4 Minimum Cell Rate Guarantees
3.7.5 MultiPHY Operation
3.8 Source Code Listings
3.8.1 Macros and Types Header File (uTypes.h)
3.8.2 ATMizer II+ Header File (ATMizer2.h)
3.8.3 ATMizer II+ Hardware Header File (Hdr.h)
3.8.4 Extended Instructions Header File (Instr.h)
3.8.5 ABR Functions Header File (ABR.h)
3.8.6 TxCell() and RxCell() (Cell.c)
3.8.7 Transmit a CBR Cell (CBR.c)
3.8.8 Transmit a VBR Cell (VBR.c)
3.8.9 Transmit and Receive ABR Cells (ABR.c)

Chapter 4 Unschedule
4.1 Introduction
4.2 Unschedule Routine

Chapter 5 Hashing Function
5.1 Hashing Mechanism
5.2 Hashing Function
5.3 Hash Implementation

Chapter 6 Packet Aging
6.1 Introduction
6.2 Mailbox Processing
6.3 Packet Aging Routine

Chapter 7 Interrupt Handling
7.1 Introduction
7.2 Nonvectored Interrupt Handler
7.3 Vectored Interrupt Handler
7.3.1 Enable Interrupts
7.3.2 General Handler
7.3.3 Individual Handlers

Chapter 8 OAM Cell Processing
8.1 Introduction
8.2 F4 OAM Flow
8.2.1 Initialization of F4 Flow
8.2.2 F4 Flow Transmit
8.2.3 F4 Flow Receive
8.2.4 Host Processing of F4 Flow
8.3 F5 OAM Flow

Chapter 9 AAL3/4 Processing
9.1 Introduction
9.2 AAL3/4 Segmentation
9.3 AAL3/4 Reassembly

Chapter 10 Initialization
10.1 Initialization Overview
10.2 Booting Procedures
10.2.1 Default ATMizer II+ Chip Initialization
10.2.2 Secondary Port EPROM Boot Sequence
10.2.3 Cell Buffer Memory/Serial PROM Boot Sequence
10.2.4 Cell Buffer Memory/Primary Port Boot Sequence
10.3 C Preamble Execution
10.4 CPU Initialization and Configuration
10.4.1 Configuration and Cache Control Register
10.4.2 Cache Configuration
10.4.3 Dcache and D-RAM Configuration
10.4.4 Dcache and C-RAM Usage
10.4.5 Icache and I-RAM Configuration
10.4.6 Icache and I-RAM Usage
10.5 Configuration Header File
10.6 Host PCI Access
10.6.1 PCI Bus Configuration
10.6.2 PCI Access to the ATMizer II+ Memory Space
10.7 Memory Allocation
10.7.1 Receive Direction
10.7.2 Transmit Direction
10.7.3 Connection Descriptors
10.7.4 Buffer Descriptors
10.7.5 Data Exchanging Blocks
10.7.6 Related Issues
10.8 Hardware Registers Initialization
10.8.1 EDMA Registers
10.8.2 Scheduler Registers
10.8.3 ACI Registers
10.8.4 Timer Registers
10.8.5 APU Registers
10.9 Data Structures Initialization
10.9.1 VCD and ACD Initialization
10.9.2 BFD Initialization
10.9.3 Calendar Table Initialization
10.9.4 Ring Initialization
10.9.5 Free Cell List
10.9.6 Miscellaneous Data Structures

Chapter 11 Operating Software
11.1 Top Level Structure
11.2 APU Program
11.2.1 Cell Operation Flow
11.2.2 Buffer Operation Flow
11.2.3 Pseudocode
11.3 Host Program
11.3.1 Setting up a Configuration File
11.3.2 Host Tasks

Customer Feedback

Figures

1.1 L64364 Functional Block Diagram
1.2 ATMizer II+ Application Development Platform Block Diagram
1.3 Host Connection Descriptor Format
1.4 Buffer Descriptor and Buffer Relationship
1.5 ATMizer II+ Memory Organization
1.6 Mailbox Entry Format
1.7 Ring Message Format
2.1 Buffer Descriptor Layout
2.2 Transmit Flow
2.3 Receive Flow
2.4 FIFO Descriptor Declaration
2.5 PutFifo() Routine
2.6 GetFifo() Routine
2.7 FIFO Operations
2.8 Enhanced Fifo Descriptor Declaration
2.9 CBM Layout
2.10 TxFifo Descriptor Location
2.11 APU TxFifo Descriptor Initialization
2.12 Host TxFifo Descriptor Initialization
2.13 Modified PutFifo() and GetFifo() Routines
2.14 PutFifo() and GetFifo() without Rd Pointer Update
2.15 Ring Descriptors Declaration
2.16 CBM Ring Size
2.17 Primary Memory Ring Size
2.18 Ring Initialization
2.19 Host PutRing() Call
2.20 Host GetRing() Call
2.21 APU Ring Initialization
2.22 APU PutRing() Call
2.23 APU GetRing() Call
2.24 GetRing() and PutRing() Routines
3.1 A Simple Scheduling Function
3.2 Handling Scheduling Lag
3.3 Connection Scheduled with Rate 0.3
3.4 Calculating Fractional Service Time
3.5 Handling Time Comparisons
3.6 Stopping Connection Scheduling
3.7 Buff Completion Queue Interrupt Handler
3.8 Connection Rescheduling
3.9 Race Conditions
3.10 Interrupt Handler without Race Condition
3.11 Resetting the Connection Scheduled Flag
3.12 TxCell() Routine for Multiple Class Connections
3.13 TxCell() Routine Handling Out-of-Rate Cells
3.14 Implementing a UBR Connection List
3.15 Managing UBR Lists with the Scheduler
3.16 UBR_Send and CBR_Send Combined
3.17 A Leaky Bucket Routine
3.18 An SCR-Based Leaky Bucket Algorithm
3.19 A MultiPHY TxCell()
3.20 Enhanced MultiPHY Code
3.21 Buff Completion Queue Interrupt Handler for MultiCalendar Support
4.1 Unschedule Routine
5.1 Hashing Table Declarations
5.2 Hashing Table Initialization
5.3 Find Prime Routine
5.4 Inserting a Connection into the Hashing Table
6.1 HCD_Rx Structure Declarations
7.1 Nonvectored Interrupts General Handler
7.2 General Handler Exit to PMON
7.3 Vectored Interrupts Enabling Routine
7.4 Vectored Interrupts General Handler
7.5 IntRxMbx Interrupt Handler
8.1 OAM Cell Declarations
8.2 OAM Flow Connection Information
8.3 OAM Cell Initialization
8.4 OAM Cell Header Formation
8.5 OAM_Send() Routine
8.6 APU OAM_Receive() Routine
8.7 Host OAM_Receive() Routine
9.1 AAL3/4 Cell Layout
9.2 ACD_Ctrl_t Structure
9.3 SAR_PDU Header Declarations
9.4 AAL34_Send() Routine
9.5 AAL34_Receive() Routine
10.1 CCC Register and SDRAM Controller Initialization
10.2 Serial Boot Routine
10.3 Sample Initialization Code
10.4 CCC Register Layout
10.5 Tag Test Mode Loaded Data Format
10.6 Data RAM Configuration Code
10.7 Separating the Code with the Linker Script
10.8 Main Loop Example
10.9 Setting and Loading IRAM
10.10 PCI Configuration Space Registers
10.11 PCI Configuration Address Format
10.12 Programming the Latency Timer in the PCI Configuration Register
10.13 Allocating Memory to Data Structures
10.14 ATMizer Code Size Calculation
10.15 Memory-T Variables Initialization
10.16 Updating Memory Pointers
10.17 Loc_BuffPCI and Loc_BuffSec Format
10.18 Initializing EDMA Registers
10.19 Initializing Scheduler Registers
10.20 Initializing ACI Registers
10.21 Cascading Timers for a Long Watchdog Timeout
10.22 Clearing the APU_Reset Bit
10.23 Clearing VCD Fields
10.24 Initializing BFDs
10.25 Clearing the Calendar Table
10.26 Initializing Host Rings
10.27 Initializing APU Rings
10.28 Free Cell List Initialization
11.1 A Typical Configuration File
11.2 Opening Connections

Tables

1.1 Host Connection Descriptor Fields
1.2 Data Transfer Modes
1.3 Data Exchange with Host DMA
1.4 Data Exchange without Host DMA
1.5 Mailbox Messages
1.6 Statistics Result Fields
2.1 FIFOs between the Host and APU
2.2 Three-Way Messaging, Transmit Direction
2.3 Three-Way Messaging, Receive Direction
2.4 Two-Way Messaging, Transmit Direction
2.5 Two-Way Messaging, Receive Direction
3.1 Time Comparisons
3.2 Simulation Results for Class 0 and 1 Connections
3.3 Simulation Results for Class 2 Connections
3.4 Calendar List Length for Varying Link Utilizations
3.5 Initial Setup for MultiPHY Connections
3.6 PHY 0 Statistics at 155 Mbps with a Single Calendar
3.7 PHY 0 Statistics at 155 Mbps with Multiple Calendars
7.1 Nonvectored Interrupt Sources
7.2 Vectored Interrupt Sources
7.3 General Register Map
10.1 Data Section Allocation
10.2 Configuration Header File Contents
10.3 PCI Virtual Address vs. Base Addresses
10.4 ATMizer II+ External Memory Map
10.5 Secondary Bus Memory Map
10.6 BFD Number Allocation
10.7 Buffer Location in Secondary Memory
10.8 ATMizer II+ Hardware Registers to be Initialized
10.9 Data and BFD Transfer Modes
10.10 ACI Control Register Initialization
10.11 External Vectored Interrupts
10.12 Required Open Connection Parameters
10.13 ACD Field Calculations

Preface

This document provides information for system programmers and developers who have a need to evaluate or program the L64364 ATMizer® II+ ATM-SAR Chip.

Audience

This document assumes that you have some familiarity with ATM, microprocessors and related support devices. The people who benefit from this book are:

• engineers and managers who are evaluating the processor for possible use in a system, and
• software engineers who are designing the processor into a system.

Organization

This document has the following chapters and appendices:

• Chapter 1, Introduction, describes the general characteristics and features of the L64364 ATMizer II+ ATM-SAR chip, describes the data structures used by the chip, and provides an overview of the typical software.
• Chapter 2, Host Messaging, describes the exchange of control information between the host and the ATMizer II+ chip. It also discusses methods of transferring information over the PCI bus.
• Chapter 3, Scheduling, describes the scheduling process and its implementation in hardware by the ATMizer II+ chip, and includes sample scheduling code.
• Chapter 4, Unschedule, describes the unscheduling process and its implementation in hardware by the ATMizer II+ chip, and includes sample unscheduling code.
• Chapter 5, Hashing Function, describes the hashing mechanism and its implementation in the ATMizer II+ chip with sample code.
• Chapter 6, Packet Aging, presents an overview of the packet aging process and its relationship to the host processor.
• Chapter 7, Interrupt Handling, describes the external interrupts and resets and their interaction with the ATMizer II+ chip, and includes sample code for interrupt handlers.
• Chapter 8, OAM Cell Processing, describes the handling of F4 and F5 Operation and Maintenance (OAM) cells by the ATMizer II+ chip.
• Chapter 9, AAL3/4 Processing, describes ATM Adaptation Layer 3/4 and how the ATMizer II+ chip’s application code can be modified to support AAL3/4 processing.
• Chapter 10, Initialization, describes initialization, configuration, and booting procedures to prepare the ATMizer II+ chip for programming.
• Chapter 11, Operating Software, describes typical APU and host operations and includes code segments.

Related Publications

• L64364 ATMizer® II+ ATM-SAR Chip Technical Manual, LSI Logic Corporation, Order Number R14008.
• L64364 ATMizer® II+ Application Development Platform User’s Guide, Revision 1.0, Preliminary.
• ATM Forum Traffic Management Specifications
• MIPS Programmer’s Handbook

Conventions Used in This Manual

The first time a word or phrase is defined in this manual, it is italicized.

The word assert means to drive a signal true or active. The word deassert means to drive a signal false or inactive.

Hexadecimal numbers are indicated by the prefix “0x”, for example, 0x32CF.
Binary numbers are indicated by the prefix “0b”, for example, 0b0011.0010.1100.1111.

Chapter 1
Introduction

The application code developed in this manual is provided as a design example. It is distributed with the expectation that it will be useful, but without warranty of any kind. The code may be changed without further notice. The code supplied with your ATMizer II+ chip or Application Development Platform (ADP) was initially written for the ADP. You will need to modify it appropriately for your system design.

This chapter contains the following sections:

• Section 1.1, “Hardware Overview”
• Section 1.2, “Typical Application”
• Section 1.3, “Software Overview”

1.1 Hardware Overview

The L64364 ATMizer II+ ATM-SAR chip provides 155 Mbits/s of full-duplex operation while performing segmentation and reassembly (SAR) of ATM Adaptation Layer 5 (AAL5) Convergence Sublayer Protocol Data Units (CS-PDUs). Refer to the block diagram in Figure 1.1.

A specialized, hardwired AAL5 protocol SAR engine, called the Enhanced DMA (EDMA), assists the MIPS-based ATM Processing Unit (APU) in segmentation and reassembly tasks and memory management functions. Although the EDMA is responsible for all basic segmentation and reassembly functions, it operates under full control of the APU. The APU is responsible for traffic management, host messaging, and any other upper layer tasks. As an option, the advanced functions of the hardwired units may be switched off to give the APU full control of all operations. However, this impacts overall performance.

The APU is based on the LSI Logic MIPS II-compatible CW4011 RISC microprocessor core. The processor delivers 160 MIPS peak (110 MIPS sustained) when operating at 80 MHz. The APU instruction set is extended with ATM-specific instructions to enhance performance.
These instructions accelerate the cell rate calculations for Available Bit Rate (ABR) services by allowing direct arithmetic operations (add, subtract, and multiply) on rates expressed as ATM Forum 15-bit floating-point numbers.

Scheduling and policing of connections with different ATM Quality of Service (QoS) requirements can be achieved efficiently with the help of the integrated hardware Scheduler. The Scheduler supports six priority classes. It uses calendar tables to create arbitrary traffic schemes for up to 64 K virtual connections.

The ATM Cell Interface (ACI) handles the transfer of cells between the Cell Buffer Memory (CBM) and the Utopia Port. The Utopia Port complies with The ATM Forum Utopia Level 2, v1.0, multi-PHY specification. The port operates at 50 MHz with 8-bit data buses and cell-level handshaking.

The Timer Unit includes a set of hardware timers and registers that provide real-time events for the APU. There are seven general-purpose timers and a TimeStamp Counter implemented in a set of registers. The start count of each general-purpose timer can be set, and timers can be cascaded for longer timed intervals. The input clock of each timer is individually selectable between an external input and the L64364 system clock.

The primary host interface for the device is a 33 MHz, 32-bit wide Peripheral Component Interconnect (PCI) bus. As the bus master, the L64364 is able to autonomously access control and data structures located in the system memory. As a bus slave, the device provides external PCI bus masters with transparent access to secondary memory and to the internal CBM. The PCI interface implements four separate FIFOs to maximize the performance of simultaneous read/write operations as bus master or slave.

The L64364 integrates a Secondary Bus memory controller that provides a glueless interface to asynchronous SRAMs, synchronous SRAMs, and synchronous DRAMs for secondary memory.
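Returning to the ABR rate arithmetic described earlier in this section, the following is a minimal sketch of how a rate in the ATM Forum floating-point format can be decoded in software. It assumes the field layout from the ATM Forum Traffic Management Specification (a nonzero bit, a 5-bit exponent e, and a 9-bit mantissa m, with rate = 2^e × (1 + m/512) × nz cells/s). The names afr_pack() and afr_to_cells_per_sec() are hypothetical and are not part of the ATMizer II+ code or its extended instruction set.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout of the 15-bit ATM Forum rate format:
 * bit 14 = nz (nonzero flag), bits 13..9 = exponent, bits 8..0 = mantissa. */
#define AFR_NZ_BIT    14u
#define AFR_EXP_SHIFT 9u
#define AFR_EXP_MASK  0x1Fu
#define AFR_MANT_MASK 0x1FFu

/* Hypothetical helper: build a rate word from its three fields. */
uint16_t afr_pack(unsigned nz, unsigned e, unsigned m)
{
    assert(nz <= 1u && e <= AFR_EXP_MASK && m <= AFR_MANT_MASK);
    return (uint16_t)((nz << AFR_NZ_BIT) | (e << AFR_EXP_SHIFT) | m);
}

/* Hypothetical helper: decode a rate word to whole cells per second.
 * rate = 2^e * (1 + m/512) * nz, computed in integers as
 * ((512 + m) << e) >> 9; any fractional part is truncated. */
uint64_t afr_to_cells_per_sec(uint16_t w)
{
    unsigned nz = (w >> AFR_NZ_BIT) & 1u;
    unsigned e  = (w >> AFR_EXP_SHIFT) & AFR_EXP_MASK;
    uint64_t m  = w & AFR_MANT_MASK;

    if (!nz)
        return 0;               /* nz = 0 encodes a rate of zero */
    return ((512u + m) << e) >> 9;
}
```

On the chip itself, the extended instructions operate on this format directly; a software decode of this kind is mainly useful on the host or in test code.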
The memory controller can also serve as an interface to external physical layer devices such as framers. It allows APU booting from parallel, byte-wide EPROMs and from serial EPROMs.

The device includes a JTAG controller and boundary scan logic to simplify board-level tests.

Figure 1.1 L64364 Functional Block Diagram
[Block diagram. Blocks and buses shown: PCI Bus, Local Bus, Clock In, PCI Interface, Secondary Bus Memory Controller, Clock PLL, Secondary Port, Primary Port, JTAG Controller, ATM Processing Unit, 8 KB Instruction Cache, 4 KB Data Cache, Enhanced DMA, 4 KB Cell Buffer Memory, Scheduler Unit, Timer Unit, ATM Cell Interface, Utopia Bus.]

1.2 Typical Application

Figure 1.2 shows a block diagram of LSI Logic Corporation’s ATMizer II+ Application Development Platform (ADP). The main features of the system are:

• A MIPS 4011 RISC processor from LSI Logic working as the control (Host) processor, running at 80 MHz.
• Two ATMizer II+ ATM-SAR devices, with embedded MIPS 4011 RISC processors, running at up to 80 MHz.
• The host CPU and the two ATMizer II+ SAR devices interface with each other over a 33 MHz, 32-bit PCI bus, compliant with the PCI Local Bus Specification, Version 2.1.
• The host CPU interfaces with the PCI bus through a PCI bridge chip from V3 Semiconductor.
• Extra PCI motherboard connectors support up to two additional PCI cards in the system.
• All three processors execute the PROM-based debug monitor (PMON) from LSI Logic, which provides a command-line user interface over RS232 serial ports.
• 10BASE-T Ethernet interface with the host CPU, with Trivial File Transfer Protocol (TFTP) support.
• Four ATM physical layer devices (PHYs) supporting multiPHY Utopia Level 2 functionality.
• Utopia bus configuration (ATM-PHY interface) configurable through front panel DIP switches.
• Utopia frequency configurable through on-board jumpers; default is 33 MHz.
• All important nets in the design available on headers for probing and logic analysis.

The external interfaces of the ADP system include:

• Six RS232C serial ports, two per processor. One port is used for the command-line user interface, and the other port is used for code downloading.
• 10BASE-T Ethernet interface with the host CPU. The PMON monitor program on the host CPU supports TFTP for data transfer over Ethernet.
• Four OC-3 (SONET) ATM-UNI interfaces for ATM traffic. The aggregate throughput is limited by the ATMizer II+ Utopia interface.
• Front panel DIP switches for system configuration.
• Front and back panel LED indicators.
• 110/240 V, 50/60 Hz AC power supply.

Figure 1.2 ATMizer II+ Application Development Platform Block Diagram
[Block diagram. Components shown: Host CPU (LR4500), Local SDRAM (1M x 64), Host CPU Interface Controller EPLD, MBUS, LBUS, buffers and address/data latches, SONIC Ethernet Controller, SCN2681 DUART, Am29F040 FLASH (512K x 8), 82C55 PIO, PCI Bridge Controller EPLD, PCI Arbiter EPLD, V292PBC PCI Bridge, Shared ASRAM (128K x 32), and, on the PCI Bus, the first ATMizer II+ (SAR1), the second ATMizer II+ (SAR2), and the first and second spare PCI connectors.]

Figure 1.2 ATMizer II+ Application Development Platform Block Diagram (Cont.)
[Block diagram. Components shown: ATMizer II+ ATM-SAR on the PCI Bus, XC1736D Serial PROM, SAR Controller EPLD, ATMizer II+ Secondary Bus with Local SDRAM (1M x 32), Local SSRAM (32K x 32), Local ASRAM (128K x 32), Am29F040 FLASH (512K x 8), 82C54 TIMER, and SCN2681 DUART; ATMizer II+ Utopia Bus with S/UNI-LITE and CY7C955 ATM-UNI PHY devices, SONET optical transceivers, Zero Delay Buffer, and Utopia Controller EPLD connecting to the Utopia Bus of the other SAR.]

1.3 Software Overview

This section describes the data structures used for the ATMizer II+ chip and summarizes the contents of the remaining chapters in this manual.

1.3.1 Data Structures and Maintenance

The following sections describe the functions, format, and maintenance of connection numbers, Virtual Connection Descriptors (VCDs), APU Connection Descriptors (ACDs), Host Connection Descriptors (HCDs), Buffers, and Buffer Descriptors (BFDs).

1.3.1.1 Connection Numbers

The host processor needs to know which connection number to use when it opens a connection. If a previously used connection number has been freed, the host should reuse the most recently freed number for the next connection it opens. In the receive direction, the APU reads the cell header and uses a hashing table mechanism to determine which connection the cell belongs to. In the transmit direction, the host inserts a connection number into the VPI/VCI fields in the cell header. The connection numbers are limited to the range of 0 to MAX_CON_NUM − 1.

OAM cells are processed by using predefined connection numbers that are different from those associated with regular data flow.

All connections are statically opened during the initialization period.

1.3.1.2 Virtual Connection Descriptors

VCDs store control information about virtual connections and are typically created when connections are established. They are initialized by the APU and managed automatically by the EDMA.
Only the EDMA and the Scheduler can access VCDs during normal operation. VCDs can be located in secondary memory and/or in CBM. Generally, the VCDs are located in a non-cacheable secondary memory area to keep them consistent, since multiple modules can access and modify their contents. Refer to the L64364 ATMizer II+ ATM-SAR Chip Technical Manual for details.

1.3.1.3 Host Connection Descriptors

A Host Connection Descriptor (HCD) contains the parameters required by the APU for an open connection operation plus some other fields necessary for the host. The parameters needed by the APU depend on the connection traffic class. The HCDs are located in the 8 Mbyte host private memory. The host initializes and maintains an array containing one HCD per requested connection. The HCD size is 128 bytes. Figure 1.3 shows the format of the descriptor and Table 1.1 describes the fields of the descriptor.

Figure 1.3 Host Connection Descriptor Format
[Each row is one 32-bit word, bits 31 to 0. The fields at offsets 0 through 32 are passed from the host to the APU; of these, offsets 20 through 32 are class dependent. The fields from offset 36 onward are host maintenance fields.]

Offset  Contents
0       Connection Number
4       VCD_Ctrl
8       Cell Header (reserved)
12      Crc32
16      PCR
20      SCR / MCR
24      MBS / ICR
28      TBE
32      FRTT
36      Status
40      BytesRec
44      BytesSent
48      BadBuff
52      PDUSize
56      StartTime
60      TimeStamp
64      BFD_HT
68      Head_PDU
72      Curr_PDU
76      Tail_PDU

Table 1.1 Host Connection Descriptor Fields

Name       Addr  Class  Description                                    Init
ConNum     0     All    Connection Number                              Yes
VCD_Ctrl   4     All    VCD_Ctrl field                                 Yes
Reserved   8     All    Cell header (implemented later)                Yes
Crc32      12    All    CRC32 for AAL0 mode                            Yes
PCR        16    All    Peak Cell Rate in cells/s, 24-bit integer      Yes
MCR        21    ABR    Minimum Cell Rate in cells/s, 24-bit integer   Yes
SCR        22    VBR    Sustained Cell Rate in cells/s, 24-bit integer Yes
ICR        25    ABR    Initial Cell Rate in cells/s, 24-bit integer   Yes
MBS        26    VBR    Maximum Burst Size                             Yes
TBE        29    ABR    Transient Buffer Exposure                      Yes
FRTT       33    ABR    Fixed Round-Trip Time                          Yes
Status     36    Host   Connection Status                              Closed
BytesRec   40    Host   Number of bytes received                       0
BytesSent  44    Host   Number of bytes sent                           0
BadBuff    48    Host   Number of bad buffers received                 0
PDUSize    52    Host   Size of PDU                                    0
StartTime  56    Host   Start time of connection                       0
TimeStamp  60    Host   TimeStamp of last received BFD                 0
BFD_HT     64    Host   Head and tail of BFD list of PDU               0
Head_PDU   68    Host   Head of PDU list                               0
Curr_PDU   72    Host   Current PDU being sent to APU                  0
Tail_PDU   76    Host   Tail of PDU list                               0

The APU only needs the first 32 bytes of information in the HCD. Before the host issues an open connection command, it copies the first 32 bytes of the HCD for the connection to be opened from the host private memory to the primary memory. The open command tells the APU the memory location of these bytes. See Section 1.3.2.1, “Mailbox,” for details.

The rest of the fields in the HCD are for the host’s internal maintenance. For each connection, the host maintains the following fields:

Status – This field gives the status of the connection. It takes one of the following values:
    CLOSED – The connection is closed (the SAR acknowledged the close request).
    REQ_OPEN – The host requests the SAR to open the connection.
    OPEN – The SAR acknowledges the open request from the host.
    REQ_CLOSED – The host requests the SAR to close the connection.

BytesRec – This field stores the number of bytes received. It is updated each time a buffer is received from the SAR.

BytesSent – This field stores the number of bytes sent. It is updated each time a buffer is sent to the SAR.

BadBuff – This field stores the number of bad buffers received. It is updated according to the BFD_Ctrl field each time a buffer is received from the SAR, and it is included on the statistics display.

PDUSize – This is the size of the PDU accumulated by the host from the APU.
StartTime – This field contains the timer value at the time the connection was opened. It is used to calculate the actual transmission rate for that connection.

TimeStamp – The timestamp of the last BFD received from the APU.

Head_PDU, Curr_PDU and Tail_PDU – The pointers to the list of PDUs attached to the HCD that are to be sent to the APU.

1.3.1.4 APU Connection Descriptors

ACDs are used by the APU to hold connection-related parameters. They are only accessible by the APU and can be located in secondary memory, CBM, Dcache and/or D-RAM. There are different ACDs for different types of connections. To reduce access time, the size of an ACD should not exceed 32 bytes.

The APU needs to initialize the necessary fields in the ACD for each connection. At connection setup, the APU fetches these parameters from a known location in the host’s primary memory and manipulates them internally. Refer to Chapter 3, Scheduling, for details. The calculation of the ACD fields is described in Table 10.13.

1.3.1.5 Buffers

Buffers hold the actual data. The buffer data is transferred by the EDMA and generated/consumed by the host. Buffers can be located in secondary memory and/or primary memory. To simplify buffer management, buffer memory is allocated at initialization time. In this static memory allocation scheme, the free buffers are managed as a stack of pointers to preallocated memory.

1.3.1.6 Buffer Descriptors

Buffer Descriptors (BFDs) hold control information about buffers and are attached to VCDs when buffers are segmented or reassembled (see Figure 1.4). BFDs are mainly accessed by the EDMA. They may also be accessed by the APU in more specialized buffer memory management schemes. BFDs can be located in secondary memory and/or primary memory. Both the host and the APU have their own buffers. Refer to the L64364 ATMizer II+ ATM-SAR Chip Technical Manual for more details.
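The static buffer allocation scheme of Section 1.3.1.5 can be sketched as follows. This is a minimal illustration only, not the code supplied with the ADP: the pool size, the buffer size, and the names BuffPoolInit(), BuffGet(), and BuffPut() are all hypothetical. The same last-in, first-out discipline also fits the reuse of the most recently freed connection number described in Section 1.3.1.1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BUFFS 8u     /* hypothetical number of buffers in the pool */
#define BUFF_SIZE 2048u  /* hypothetical size of one buffer in bytes   */

static uint8_t  buff_mem[NUM_BUFFS][BUFF_SIZE]; /* allocated once, statically */
static uint8_t *free_stack[NUM_BUFFS];          /* pointers to free buffers   */
static size_t   free_top;                       /* number of free buffers     */

/* Push every buffer onto the free stack at initialization time. */
void BuffPoolInit(void)
{
    free_top = 0;
    for (size_t i = 0; i < NUM_BUFFS; i++)
        free_stack[free_top++] = buff_mem[i];
}

/* Pop the most recently freed buffer, or return NULL if none is free. */
uint8_t *BuffGet(void)
{
    return free_top ? free_stack[--free_top] : NULL;
}

/* Return a buffer to the pool; it becomes the next one handed out. */
void BuffPut(uint8_t *buff)
{
    assert(buff != NULL && free_top < NUM_BUFFS);
    free_stack[free_top++] = buff;
}
```

Because allocation and release are a single array access each, this kind of pool has a fixed, small cost per buffer, which is one reason a static scheme is attractive in the per-cell processing path.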
Figure 1.4 Buffer Descriptor and Buffer Relationship
[Diagram: a buffer list of BFDs, each of which points through pBuffData to a buffer holding the actual data. Buffer Descriptor fields shown: BFD_Ctrl, BFD_UU, ConNum, BuffSize, NextBFD, pBuffData_PCI, BFD_FreeSel, pBuffData_Sec.]

1.3.1.7 Buffer, PDU, and BFD Maintenance

The data communication between the host and the APU is performance critical because of the memory accesses involved. Figure 1.5 briefly illustrates the memory organization of the ATMizer II+.

Figure 1.5 ATMizer II+ Memory Organization
[Diagram: Primary Memory, L64364, Secondary Memory.]

Table 1.2 lists the different modes of transferring buffers and BFDs between primary memory (PM), secondary memory (SM), and cell buffer memory (CBM).

Table 1.2 Data Transfer Modes

Mode Type       Description
Cell Mode       Individual cells are exchanged between the CBM and PM or SM.
Packet Mode     Packets are exchanged between the SM and PM using the EDMA move processor.
BFD Far Mode    BFDs are located in PM.
BFD Local Mode  BFDs are located in SM.
BFD Copy Mode   BFDs are copied between PM and SM.

There are two methods of exchanging the cells/packets and BFDs between memories. In the first method, the host DMA writes the cells/packets and BFDs to be transmitted to secondary memory, and the EDMA writes the received cells/packets and BFDs back to primary memory. Because write operations over the PCI bus are faster than read operations, this saves PCI transfer time, provided that the host has DMA capability.

The second method assumes that the host does not have DMA capability. The EDMA performs the data exchange between primary memory and secondary memory, and the operation is transparent to the ATMizer II+ and the host. The ADP uses this method. The firmware operates in both Cell mode and Packet mode.
The data exchanges are summarized in Table 1.3 and Table 1.4.

Table 1.3 Data Exchange with Host DMA

Direction   Cell Mode                       Packet Mode
Tx          Host writes to SM (Optimum)     Move processor reads from PM
Rx          Host reads from SM              Move processor writes to PM (Optimum)

Table 1.4 Data Exchange without Host DMA

Direction   Cell Mode                       Packet Mode
Tx          Tx processor reads from PM      Move processor reads from PM (Optimum)
Rx          Host reads from PM              Move processor writes to PM (Optimum)

The location of the BFDs depends on how the following configuration fields in the EDMA_Ctrl register are defined:
• EDMA_TxBFD_Far
• EDMA_TxBFD_Copy
• EDMA_RxBFD_Far
• EDMA_RxBFD_Copy
Refer to the L64364 ATMizer II+ ATM-SAR Chip Technical Manual for details.

1.3.1.8 Calendar Table

The Calendar Table is a cell slot array managed by the Scheduler. Each entry in the Calendar Table corresponds to one cell slot and contains the connection numbers of the VCs to be serviced in that slot. It is implemented in secondary memory. Refer to the L64364 ATMizer II+ ATM-SAR Chip Technical Manual for details.

1.3.2 Host Messaging

The APU and host pass messages to each other in one of the following ways:
• Mailbox: command and feedback exchanges
• Ring: buffer number exchanges

1.3.2.1 Mailbox

The external primary port bus master (the host) issues statistics and connection commands to the ATMizer II+ through the PCI Mailbox (built into the ATMizer II+ chip). When the command actions are completed, the APU sends the acknowledgment back to the host by writing it into a predefined memory location. The content of the acknowledgment is exactly the same as that of the command. This fixed memory location functions much like a mailbox. Note that, to reduce PCI Bus traffic, the acknowledgment is not written into the host PCI Mailbox.
Refer to the L64364 ATMizer II+ ATM-SAR Chip Technical Manual for the detailed functionality and structure of the Mailbox.

Mailbox Message Data Structure – Each entry in the Mailbox is a 32-bit word. The general format of this word is shown in Figure 1.6. Only the least significant halfword (bits [15:0]) is used to decode the type of the message. The rest of the bits are message-type dependent. All the messages are defined in Table 1.5.

Figure 1.6 Mailbox Entry Format

Bits [31:16]   Message Dependent
Bits [15:0]    Message Type

Table 1.5 Mailbox Messages

Command                 Bits [15:0]   Bits [31:16]
Get Statistics Report   0x0001        Target Address
Open Connection         0x0002        Address to the Host Connection Descriptor
Close Connection        0x0003        Connection Number

Connection Commands – The open connection command transfers the host-negotiated/determined connection parameters to the ATMizer II+ in the following manner. The host allocates a block of memory, copies the part of the HCD associated with that connection to this block, and then passes the starting address of the block to the APU in the open connection command message. The APU can then read the desired parameters of the connection from the HCD. The size of this memory block is fixed at 32 bytes, but the content varies according to the traffic class of the connection. Refer to Section 1.3.1.3 for details.

Generally, the host statically opens all connections, one by one, during the initialization period. After retrieving an open connection command, the APU:
• initializes the VCD for that connection,
• reads the parameters from the memory block at the given address, and
• calculates the associated ACD.

For each connection, the APU needs to open two connections, a transmit (Tx) and a receive (Rx) connection. The connection number for the Tx connection is in the parameter memory block.
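
The mailbox word format of Figure 1.6 and the message types of Table 1.5 can be illustrated with a few helper functions. This is only a sketch: the function and enumerator names are invented for the example, and the message-dependent halfword is treated as an opaque 16-bit argument because this guide does not say how an address is encoded in it.

```c
#include <assert.h>
#include <stdint.h>

/* Message types from Table 1.5 (bits [15:0] of the mailbox word). */
enum mbox_type {
    MBOX_GET_STATS  = 0x0001,
    MBOX_OPEN_CONN  = 0x0002,
    MBOX_CLOSE_CONN = 0x0003
};

/* Build a mailbox word: argument in bits [31:16], type in bits [15:0]. */
static inline uint32_t mbox_pack(enum mbox_type type, uint16_t arg)
{
    return ((uint32_t)arg << 16) | (uint32_t)type;
}

/* Extract the message type (bits [15:0]). */
static inline enum mbox_type mbox_type_of(uint32_t word)
{
    return (enum mbox_type)(word & 0xFFFFu);
}

/* Extract the message-dependent field (bits [31:16]). */
static inline uint16_t mbox_arg_of(uint32_t word)
{
    return (uint16_t)(word >> 16);
}

/* The acknowledgment is an exact copy of the command, so the host can
 * detect completion by comparing the predefined location with the
 * command it issued. ack_loc is a hypothetical pointer to that location. */
static inline int mbox_acked(uint32_t cmd, const volatile uint32_t *ack_loc)
{
    return *ack_loc == cmd;
}
```

A host would typically call mbox_acked() in a polling loop before issuing the next command.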
If the MAXIMUM_CONNECTION_NUMBER is 1 K, the APU simply assigns the Tx connection number + 1 K as the Rx connection number. No specific parameters are needed for the Rx connection. After the creation of both connections, the APU acknowledges to the host by repeating the command message back to the predefined primary memory location. When the host receives the acknowledgment, it continues with the next open connection command, and so on.

In a system loopback configuration, the Rx connection number lookup method is not correct, since the number in the VPI/VCI fields of the cell header contains the Tx connection number. To correct this, the APU adds MAXIMUM_CONNECTION_NUMBER to the result retrieved from the VPI/VCI fields of the received cell. When two ADPs are connected back-to-back, the APU in one system modifies the VPI/VCI fields of the received cell header before it sends the cells back to the other system.

For the close connection command, the APU clears both the Tx connection VCD and the Rx connection VCD. The Tx connection number is in the command message and the Rx connection number is derived as described above. After that, the APU acknowledges by copying the close connection command back to the predefined location in primary memory. The host checks the content of this location to make sure that the APU has completed the command before issuing another command.

Get Statistics Command – Statistics results include the number of:
• cells sent,
• cells received,
• PDUs transmitted,
• PDUs received, and
• errors.

The three MSBs of the command indicate the command type and the rest of the bits point to the initial address of a 64-byte fixed block in the primary memory. The APU puts all of the statistics information in this block. Table 1.6 describes the fields of the statistics information.
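
The 64-byte statistics block can be mirrored on the host by a C structure whose members follow the byte offsets listed in Table 1.6. The structure name and the trailing Reserved word are assumptions: the table defines fields only up to byte offset 56, so the last four bytes of the block are treated here as unused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side view of the statistics block; one 32-bit counter per field. */
typedef struct {
    uint32_t RxCells;        /* offset  0 */
    uint32_t TxCells;        /* offset  4 */
    uint32_t RxPDU;          /* offset  8 */
    uint32_t TxPDU;          /* offset 12 */
    uint32_t ErrTimeout;     /* offset 16 */
    uint32_t ErrRxLost;      /* offset 20 */
    uint32_t ErrConNum;      /* offset 24 */
    uint32_t ErrCrc10;       /* offset 28 */
    uint32_t ErrCrc;         /* offset 32 */
    uint32_t ErrLength;      /* offset 36 */
    uint32_t ErrAbort;       /* offset 40 */
    uint32_t ErrLowMem;      /* offset 44 */
    uint32_t ErrNoContBuff;  /* offset 48 */
    uint32_t ErrNoMem;       /* offset 52 */
    uint32_t ErrNoData;      /* offset 56 */
    uint32_t Reserved;       /* offset 60: assumed padding to 64 bytes */
} StatsBlock;

/* Fail the build if the layout ever drifts from the 64-byte block. */
static_assert(sizeof(StatsBlock) == 64, "statistics block must be 64 bytes");
```

Byte order is not addressed by this sketch; a host whose endianness differs from that of the APU would need to swap each word after reading it.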
After filling the block, the APU acknowledges by copying the command back to the predefined location in primary memory. Again, the host checks this location before issuing another command to avoid a race condition.

Table 1.6 Statistics Result Fields

Field           Addr (Byte)   Description
RxCells         0             number of received cells
TxCells         4             number of transmitted cells
RxPDU           8             number of received PDUs
TxPDU           12            number of transmitted PDUs
ErrTimeout      16            number of received aborted (timeout) PDUs
ErrRxLost       20            number of received lost cells
ErrConNum       24            number of wrong connection numbers
ErrCrc10        28            number of errored (CRC10) RM/OAM cells
ErrCrc          32            number of received CRC32-errored PDUs
ErrLength       36            number of received length-errored PDUs
ErrAbort        40            number of received aborted (zero length) PDUs
ErrLowMem       44            number of times that one free buffer list was empty
ErrNoContBuff   48            number of transmitted partially-built PDUs
ErrNoMem        52            number of times that both free buffer lists were empty
ErrNoData       56            number of times that no buffer was attached to the VCD

1.3.2.2 Rings

Rings are data structures that support fast messaging between the host and the APU. They minimize control traffic over the shared PCI Bus and support the master write-only method of exchanging data. The rings are described in greater detail in Chapter 2. Figure 1.7 shows the format of a ring message.

In the receive direction, the ATMizer II+ reassembles cells into buffers. When a buffer is full, the APU places its buffer number, together with the status bits returned from the EDMA Buffer Completion Queue, into the RxRing. The host checks the RxRing, retrieves and consumes the data, updates the statistics, and then returns the free buffer to the ATMizer II+ by writing (copying) the BuffLarge, BuffFree, and Buffer Number fields into the TxRing with the BuffFree bit set.
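
The host-side handling of ring messages can be sketched as follows. The bit positions are those read from Figure 1.7 (Status Bits [31:21], FreeSel [20:18], BuffFree [17], BuffLarge [16], Buffer Number [15:0]) and should be checked against the Technical Manual; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RING_BUFFNUM_MASK   0x0000FFFFu   /* bits [15:0]  */
#define RING_BUFFLARGE      (1u << 16)    /* bit  16      */
#define RING_BUFFFREE       (1u << 17)    /* bit  17      */
#define RING_FREESEL_SHIFT  18            /* bits [20:18] */
#define RING_STATUS_SHIFT   21            /* bits [31:21] */

/* Buffer number carried by a ring message. */
uint16_t ring_buffnum(uint32_t msg)
{
    return (uint16_t)(msg & RING_BUFFNUM_MASK);
}

/* Status bits that the APU copied from the EDMA completion queue. */
uint16_t ring_status(uint32_t msg)
{
    return (uint16_t)(msg >> RING_STATUS_SHIFT);
}

/* TxRing message that returns a consumed receive buffer: copy BuffLarge
 * and the Buffer Number from the RxRing message and set BuffFree. */
uint32_t ring_return_buffer(uint32_t rx_msg)
{
    return (rx_msg & (RING_BUFFLARGE | RING_BUFFNUM_MASK)) | RING_BUFFFREE;
}

/* TxRing message that submits a buffer for segmentation: BuffFree clear. */
uint32_t ring_send_buffer(uint16_t buffnum)
{
    return (uint32_t)buffnum;
}
```

Keeping both message kinds in one 32-bit format is what allows a single TxRing to carry transmit requests and free-buffer returns.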
When the APU gets this TxRing message, it issues a Buff command to the EDMA. The EDMA returns the buffer to the large buffer free list if the BuffFree and the BuffLarge fields are set, or returns the buffer to the small buffer free list if the BuffFree field is set and the BuffLarge field is cleared.

In the transmit direction, the host writes the Buffer Number field into the TxRing with the BuffFree bit clear. Then the APU issues a Buff command to the EDMA by copying this message into the EDMA_Buff register. Since the BuffFree bit is cleared, the EDMA segments the buffer for transmission regardless of the BuffLarge bit. Refer to the L64364 ATMizer II+ ATM-SAR Chip Technical Manual for details. After the buffer is completely segmented, the APU puts the buffer number into the RxRing with the BuffFree bit clear. The host checks the RxRing and places this free buffer back in its own free buffer list.

Figure 1.7 Ring Message Format

Bits [31:21]   Status Bits
Bits [20:18]   FreeSel
Bit  17        BuffFree
Bit  16        BuffLarge
Bits [15:0]    Buffer Number

1.3.3 Scheduling

The primary objective of a scheduling task is to decide which connection should be serviced in the current ATM cell slot. The scheduling task in the ATMizer II+ chip is managed by the on-chip MIPS processor, the ATM Processing Unit (APU). The APU uses a hardware Scheduler module to minimize the processor load.

Chapter 3 discusses different approaches for writing the APU application code that performs the scheduling function. A complete application example is developed, starting from the simplest code and progressively enhancing it with additional features until the desired behavior is reached. At each step, the code is fully commented and various alternatives are discussed.

The hardware Scheduler allows the ATMizer II+ chip to manage a large number of connections with arbitrary data rates.
Other segmentation and reassembly (SAR) processors base connection scheduling on a set of timers. Since these processors include a limited number of timers, they can handle only a limited number of data rates, and every connection has to be assigned to one of those rates. This approach is practical with constant bit rate (CBR) sources whose rates can be approximated by that of one of the timers. Variable bit rate (VBR) sources are typically built by executing a leaky bucket algorithm at each peak cell rate (PCR) event to check whether a cell can be sent from a connection. Although inefficient, VBR sources can be serviced with a set of timers. With the advent and standardization of available bit rate (ABR) by the ATM Forum, the timer-based approach is no longer practical. An ABR source may have an arbitrary, time-varying rate that is difficult to match with a finite set of timers.

The ATMizer II+ chip uses the hardware Scheduler to effectively create arbitrary traffic patterns on a large number of connections. The Scheduler provides primitives that are executed under control of the APU, an 80 MHz, superscalar MIPS processor core capable of sustaining 110 MIPS performance. The Scheduler primitives act like software routines except that they are executed by dedicated hardware units and do not consume CPU bandwidth. However, since the management of the scheduling process is actually done in software, you can change device behavior by downloading new application code.

1.3.4 Hashing Function

ATM technology is connection oriented, and the data flow between two end-station entities is based on an established virtual connection between them. The routing information for the cells that carry the data is held in the cell header; the address space comprises 24 bits, subdivided into two fields (the VPI and the VCI). At the end-stations, the cells are processed based on a connection number.
Typically, the maximum number of connections that an end-station processes is much smaller than the address space available in the cell header. Therefore, a hashing mechanism is needed to obtain the connection number of a cell from the cell header value.

1.3.5 Packet Aging

Packet aging notifies the host of idle connections, that is, connections that have not received a cell for a predefined period. The ATM Processing Unit (APU) samples the Virtual Connection Descriptor (VCD) and examines its TimeStamp value to determine whether the connection has to be labeled as idle.

1.3.6 Interrupt Handling

The CW4010 processor used in the L64364 ATMizer II+ ATM-SAR Chip supports three types of interrupt signals:
• Cold/warm resets and nonmaskable interrupts,
• External nonvectored interrupts (6), and
• External vectored interrupts (16).

The six external nonvectored and 16 vectored interrupts require a general handler that passes control to a unique handler for each specific interrupt. The vectored interrupts also require an enabling routine at initialization. See Chapter 7 for handler code samples and a sample enabler routine.

1.3.7 OAM Cell Processing

Chapter 8 outlines the implementation of the Operations and Management (OAM) cell processing function. The OAM function is defined at the Physical and the ATM layers. The Physical Layer OAM cell processing is done by the Framer (for example, the SuniLite Framer chip). The software running on the ATM Processing Unit performs the ATM Layer OAM cell processing. The main goal of this software is to provide the means for you to perform the OAM cell processing.

1.3.8 AAL3/4 Processing

Chapter 9 describes the implementation of AAL3/4 processing in the ATMizer II+ chip. The EDMA in the ATMizer II+ chip is designed to implement the AAL5 CS-PDU processing and to support AAL0 type connections.
The segmentation and reassembly support for AAL0 connections provided by the EDMA can be used to implement the SAR-PDU segmentation and reassembly for AAL3/4 connections. The AAL3/4 processing can be implemented in the ATMizer II+ by the APU, which can preprocess the PDU data before it is segmented or reassembled by the EDMA.

1.3.9 Initialization

Chapter 10 discusses the following initialization steps for the ATMizer II+ chip and provides sample code.

1. Booting
   The booting step initializes the Configuration and Cache Control (CCC) register and the SDRAM controller, then copies the initialization and application code to an executable memory location.
2. C Preamble Execution
   C preamble execution includes .bss section clearing, stack allocation, and initialization of the global data pointer and stack pointer registers.
3. CPU Initialization and Configuration
   CPU initialization and configuration mainly includes cache configuring and flushing, and interrupt and exception handler setting.
4. Memory Allocation
   Memory allocation defines the maps for primary, secondary, and Cell Buffer Memory (CBM).
5. Hardware Registers Initialization
   ATMizer II+ chip hardware initialization and configuration includes all hardware module registers and mode setting.
6. Data Structures Initialization
   Data structures initialization includes Free Cell List initialization, clearing the Virtual Connection Descriptors (VCDs), and setting the Buffer Descriptors (BFDs) and Scheduler Calendar Table (SCDs).

1.3.10 Operating Software

Chapter 11 describes the functions performed by the ATM Processing Unit (APU) in the ATMizer II+ chip and those required of the host. The APU is responsible for traffic management, host messaging, OAM cell processing, and statistics collection. The host program allows you to send commands to the ATMizer II+ and to display the results of these commands.
This involves opening connections, transmitting and receiving data, and displaying statistics such as effective rate, errors received, etc.

Chapter 2
Host Messaging

This chapter describes the types of control information that need to be exchanged between a host and the ATM Processing Unit (APU) in the ATMizer II+ chip. It also discusses methods of efficiently moving that information over the shared interconnecting bus, the PCI Bus. Messaging application code is developed for each method. The chapter includes the following sections:
• Section 2.1, “Host Messaging Overview”
• Section 2.2, “Buffer Processing”
• Section 2.3, “Rings”

2.1 Host Messaging Overview

The ATMizer II+ chip is a Segmentation and Reassembly (SAR) Processor and thus is typically a slave in a system, in that it executes commands issued by an external bus master (host). In this chapter it is assumed that there is only one host in the system. You can easily extend the techniques developed here to cases where there are multiple hosts.

The term buffer in this manual denotes a memory location and the data in that location. It is used in this way to simplify the discussion, since the buffer location may hold a packet, part of a packet, or an ATM cell.

The communications between a host and the ATMizer II+ chip described in this chapter include the following host tasks:
1. sending buffers for segmentation
2. getting back completely sent (segmented) buffers
3. receiving reassembled buffers
4. returning received buffers to the free list
5. opening connections
6. closing connections
7. requesting statistics

Because of the flexibility of the ATMizer II+ chip, you may decide to implement additional commands or certain host tasks on the ATMizer II+ APU. In a typical system, tasks one through four involve buffer processing, are performed very often, and should be optimized for performance. Tasks five through seven are executed only occasionally and their impact on performance is minimal. Therefore, the two groups of tasks are examined separately.

Throughout this chapter simple type definitions are used:

    typedef unsigned long ulong;
    typedef unsigned short ushort;
    typedef unsigned char uchar;

2.2 Buffer Processing

In an ATMizer II+ system, a buffer holds payload data for segmentation or reassembly. Each buffer has an associated Buffer Descriptor (BFD) that holds control information as described in the L64364 ATMizer II+ ATM-SAR Chip Technical Manual. See Figure 2.1 for a summary of the BFD layout.

Figure 2.1 Buffer Descriptor Layout

Word 1   [31:24] BFD_Ctrl   [23:16] UU   [15:0] ConNum
Word 2   [31:16] BuffSize   [15:0] NextBFD
Word 3   [31:0] pBuffData_PCI
Word 4   [31:29] FreeSel   [28:27] R   [26:0] pBuffData_Sec

BFD_Ctrl – BFD Control Bits – Word 1, [31:24]
Refer to the L64364 ATMizer II+ ATM-SAR Chip Technical Manual for descriptions of these bits.

UU – AAL5 User-to-User – Word 1, [23:16]
Data byte reserved for the user’s use.

ConNum – Connection Number – Word 1, [15:0]
Each buffer has a 16-bit ConNum associated with it.

BuffSize – Buffer Size – Word 2, [31:16]
The BuffSize field specifies the number of bytes in the buffer payload.

NextBFD – Next BFD – Word 2, [15:0]
The NextBFD field is used to link BFDs. In the transmit direction, the EDMA ignores the NextBFD field when it executes the Buff command, so neither the host nor the APU needs to initialize this field. In the receive direction, in the case of a fragmented PDU, the EDMA places the number of the next buffer belonging to the same PDU in NextBFD. If the buffer is the last one for the PDU, the EDMA places zero in NextBFD.

pBuffData_PCI – Pointer to Buffer Data in PCI Memory – Word 3, [31:0]
The pBuffData_PCI field holds the pointer to the buffer payload in PCI memory. If this field is zero, then the value in the pBuffData_Sec field is used to point to the buffer payload in Secondary memory.

FreeSel – Free Select – Word 4, [31:29]
The APU uses this field to indicate to which Free List (0–5) the BFD belongs.

R – Reserved – Word 4, [28:27]
Do not modify this field.

pBuffData_Sec – Pointer to Buffer Data in Secondary Memory – Word 4, [26:0]
The pBuffData_Sec field holds the pointer to the buffer payload in Secondary memory. If this field is zero, then the value in the pBuffData_PCI field is used to point to the buffer payload in PCI memory. If both pBuffData_PCI and pBuffData_Sec are non-zero, the BFD is in packet mode.

Buffer Descriptors are referenced using 16-bit wide Buffer Numbers (BuffNum). A Buffer Number is an index into the Buffer Descriptor array. The base of the array is programmed using the EDMA_BFD_Base register. Since each entry in the array holds 16 bytes, the EDMA shifts a BuffNum left by four positions before adding it to EDMA_BFD_Base to obtain a BFD address.

2.2.1 Buffer Flow

Before describing the different methods of buffer messaging, it is important to analyze data and control flows for systems including the ATMizer II+ chip and an external host.

2.2.1.1 Transmit Flow

Since the PCI Bus is a shared bus, you need to construct some form of FIFO to handle data and control information for both transfer directions. The transmit flow using FIFOs between the host and APU is shown in Figure 2.2.
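
The BFD layout of Figure 2.1 and the BuffNum-to-address rule given above can be expressed as a small set of C helpers. This is a sketch under stated assumptions: the Bfd type and accessor names are invented for the example, word 1 of the figure is stored at array index 0, and the base address is passed as a plain parameter instead of being read from the EDMA_BFD_Base register.

```c
#include <assert.h>
#include <stdint.h>

/* One Buffer Descriptor: four 32-bit words, 16 bytes in total. */
typedef struct { uint32_t w[4]; } Bfd;

/* Address of a BFD: base + (BuffNum << 4), since each entry is 16 bytes. */
static inline uint32_t bfd_addr(uint32_t bfd_base, uint16_t buffnum)
{
    return bfd_base + ((uint32_t)buffnum << 4);
}

/* Word 1 fields. */
static inline uint8_t  bfd_ctrl(const Bfd *b)     { return (uint8_t)(b->w[0] >> 24); }
static inline uint8_t  bfd_uu(const Bfd *b)       { return (uint8_t)(b->w[0] >> 16); }
static inline uint16_t bfd_connum(const Bfd *b)   { return (uint16_t)b->w[0]; }

/* Word 2 fields. */
static inline uint16_t bfd_buffsize(const Bfd *b) { return (uint16_t)(b->w[1] >> 16); }
static inline uint16_t bfd_nextbfd(const Bfd *b)  { return (uint16_t)b->w[1]; }

/* Word 3 and word 4 fields. */
static inline uint32_t bfd_pdata_pci(const Bfd *b) { return b->w[2]; }
static inline uint8_t  bfd_freesel(const Bfd *b)   { return (uint8_t)(b->w[3] >> 29); }
static inline uint32_t bfd_pdata_sec(const Bfd *b) { return b->w[3] & 0x07FFFFFFu; }

/* Packet mode: both payload pointers are non-zero. */
static inline int bfd_is_packet_mode(const Bfd *b)
{
    return bfd_pdata_pci(b) != 0 && bfd_pdata_sec(b) != 0;
}
```

Explicit shifts and masks are used instead of C bit-fields because bit-field ordering is compiler dependent, whereas the descriptor layout is fixed by the hardware.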
Figure 2.2 Transmit Flow
[Figure: Host Messaging part: the host places buffers for transmission in the TxFifo; the APU moves them to the EDMA Buff Request Queue; the APU moves completed buffers from the EDMA TxCell Completion Queue to the TxDone FIFO, from which the host learns that a buffer was transmitted. Scheduling part: the APU feeds the EDMA TxCell Request Queue. The EDMA (VCD with linked BFDs) fills the ACI TxFifo in Cell Buffer Memory, and the ACI drives the Utopia Bus.]

When a host has a buffer ready for segmentation, it places an appropriate message in the TxFifo. The contents of the message and various implementations of the TxFifo are described in Section 2.2.3, “FIFO Contents” and Section 2.2.4, “FIFO Implementations.”

The APU retrieves the message from the TxFifo and, depending on the messaging scheme used, performs any required BFD formatting. In the simplest form, the host uses exactly the same BFD format as the EDMA, and the APU needs only to issue a Buff command that places a BuffNum in the EDMA Buff Request Queue. The EDMA retrieves the BuffNum from the Request Queue and links the referenced BFD to a VCD. The Connection Number to use is read from the first word of the BFD.

As a separate asynchronous process governed by the scheduling mechanism, the APU issues TxCell commands to the EDMA. An example scheduling mechanism is described in Chapter 3, Scheduling. The EDMA retrieves data payloads from buffers, creates cells in Cell Buffer Memory (CBM), and puts cells in the ACI TxFifo for transmission to the Utopia Bus.

Buffer completion occurs when all the data from the buffer payload has been successfully segmented into cells and placed in the ACI TxFifo. When a buffer is completed, the EDMA places the BuffNum in the EDMA TxCell Completion Queue. For the last buffer of a PDU, the EDMA may have to build a cell that holds no buffer payload, only the AAL5 trailer (padding, UU, CPI, Length, and CRC32). This situation occurs when the PDU length modulo 48 is equal to zero or greater than 40.
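
The condition for the trailer-only cell follows from the AAL5 format: each cell carries 48 payload bytes, and the last cell must reserve 8 bytes for the trailer fields (UU, CPI, Length, CRC32), leaving room for at most 40 bytes of PDU data. A short worked sketch, with function names chosen for this example only:

```c
#include <assert.h>
#include <stdint.h>

#define CELL_PAYLOAD  48u  /* payload bytes per ATM cell           */
#define AAL5_TRAILER   8u  /* UU (1) + CPI (1) + Length (2) + CRC32 (4) */

/* Non-zero when the last cell holds only padding and the AAL5 trailer:
 * the PDU length modulo 48 is zero or greater than 40. */
int aal5_needs_trailer_only_cell(uint32_t pdu_len)
{
    uint32_t rem = pdu_len % CELL_PAYLOAD;
    return rem == 0 || rem > CELL_PAYLOAD - AAL5_TRAILER;
}

/* Total number of cells for a PDU: data plus trailer, rounded up to a
 * whole number of 48-byte payloads. */
uint32_t aal5_cell_count(uint32_t pdu_len)
{
    return (pdu_len + AAL5_TRAILER + CELL_PAYLOAD - 1u) / CELL_PAYLOAD;
}
```

For example, a 40-byte PDU fits in one cell together with its trailer, whereas a 41-byte or a 48-byte PDU needs a second cell that carries no PDU data.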
When such a trailer-only cell is needed, the current Buffer Number is placed in the EDMA Completion Queue only after this last cell is built.

The APU retrieves the BuffNum from the TxCell Completion Queue and writes an appropriate message to the TxDone FIFO. Although different implementations are possible, the message typically consists of the Buffer Number.

The process above the dashed line in Figure 2.2 (Host Messaging) is the subject of this chapter, while the process in the lower left corner (Scheduling) is described in Chapter 3, Scheduling.

2.2.1.2 Receive Flow

The data and control flow for the receive direction is shown in Figure 2.3. This section describes the process to the right of and above the dashed line in Figure 2.3. Figure 2.3 describes only the case in which buffers are pulled from a free list by the EDMA as needed. It does not describe the case in which the APU explicitly attaches free buffers to a VCD in the receive direction.

Figure 2.3 Receive Flow
[Figure: cells arrive from the Utopia Bus through the ACI into the ACI RxFifo in Cell Buffer Memory; the APU performs cell header translation and feeds the EDMA RxCell Request Queue; the EDMA (VCD with linked BFDs, free BFD lists) fills buffers. Host Messaging part: the APU moves completed buffers from the EDMA RxCell Completion Queue to the RxFifo for the host; the host returns buffers through the RxDone FIFO, and the APU passes them to the EDMA Buff Request Queue.]

When a cell is received from the Utopia Bus, the ACI places it in the ACI RxFifo located in the CBM. The APU retrieves the cell from the ACI RxFifo and performs cell header lookup. This operation consists of classifying the cell based on the cell header and determining the Connection Number. The Connection Number identifies the Virtual Connection to which the cell belongs. There are numerous ways to perform this operation. The simplest one is to form the Connection Number from the appropriate bits of the cell header VPI/VCI fields by shift and mask operations. This method is appropriate when the VPI and VCI are assigned as contiguous numbers.
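
The shift-and-mask lookup can be sketched as follows. The field positions are those of the standard UNI cell header (VPI in bits [27:20] and VCI in bits [19:4] of the first four header bytes taken as a 32-bit word). The number of VPI and VCI bits used, and the function name, are assumptions made for illustration; a real system would choose them to match the way VPI/VCI values are assigned.

```c
#include <assert.h>
#include <stdint.h>

#define VPI_BITS 2u   /* hypothetical: low VPI bits used in the ConNum */
#define VCI_BITS 8u   /* hypothetical: low VCI bits used in the ConNum */

/* Form a Connection Number from a UNI cell header word:
 * GFC [31:28], VPI [27:20], VCI [19:4], PTI [3:1], CLP [0]. */
uint16_t connum_from_header(uint32_t hdr)
{
    uint32_t vpi = (hdr >> 20) & 0xFFu;
    uint32_t vci = (hdr >> 4) & 0xFFFFu;

    /* Concatenate the selected low-order VPI and VCI bits. */
    return (uint16_t)(((vpi & ((1u << VPI_BITS) - 1u)) << VCI_BITS) |
                      (vci & ((1u << VCI_BITS) - 1u)));
}
```

With these widths the result spans 10 bits, which matches a 1 K connection table; cells whose VPI or VCI falls outside the selected ranges alias onto valid numbers, so a production lookup would also need to reject them.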
If the VPI and VCI values are not contiguous, a more complex cell header lookup (for example, using hashing) should be performed.

When the APU determines the Connection Number, it places a command in the EDMA RxCell Request Queue. The EDMA retrieves the command from the queue and transfers the cell payload to a buffer. If there is no buffer available, the EDMA pulls a free buffer from one of the two free buffer lists and attaches it to the VCD. When a buffer is completed, the EDMA detaches it from the linked list and places the Buffer Number in the EDMA RxCell Completion Queue. Note that the linked list of buffers attached to a receive VCD has a maximum length of one when only buffers from free lists are used. The list can have more than one BFD only if buffers are attached explicitly using the EDMA Buff command for a receive VCD.

The APU retrieves the BuffNum from the EDMA RxCell Completion Queue and places an appropriate message in the RxFifo. The contents of the message and various implementations of the RxFifo are described in Section 2.2.3, “FIFO Contents” and Section 2.2.4, “FIFO Implementations.”

The host retrieves the message and may process the buffer payload. When the buffer processing is done, the host places a message in the RxDone FIFO. Although various implementations are possible, such a message typically consists of the Buffer Number with additional control bits.

The APU retrieves the message from the RxDone FIFO and issues the EDMA Buff command to place the BuffNum in the EDMA Buff Request Queue. The EDMA retrieves the BuffNum from the request queue and links the corresponding BFD to a large or small free buffer list.

2.2.2 FIFO Location

To maximize system performance, it is important to place the FIFOs in appropriate locations. The ATMizer II+ chip uses the PCI Bus as the shared bus to communicate with the host.
Since shared buses typically experience increased latencies, it is important to minimize the total number of bus requests. Once the bus is acquired, it is less expensive to continue an established burst than to start a new one. Therefore, a design goal for an ATMizer II+ chip host messaging system is to minimize the total number of bus accesses, if necessary at the expense of increasing burst lengths.

The second issue to consider is that write operations are typically much less expensive than read operations in terms of both bus utilization and master device processing power. This is because write operations are nonblocking while read operations are blocking. When a data element is written to an external target device over the PCI Bus, the initiating device typically writes the data into the PCI master FIFO and then continues with other tasks. From that perspective, a write operation is of the “shoot-and-forget” type. When the PCI controller acquires the bus, it has data available in the internal FIFO and may put the stream on the bus immediately.

When an initiating device has to read a data element from a target device over the PCI Bus, it has to request the bus, wait until the data is retrieved by the target device, and wait until the data is placed on the bus. This may take a relatively long time due to bus latencies. When the bus is acquired and the target is selected by address decoding logic, the target must respond. The PCI controller has to fetch data from the target location and place it on the bus. For example, a DRAM-based memory may require as much as 120 ns (four cycles) to place data on the bus, assuming the memory bus is not in use. These four cycles are then lost for bus utilization. If read latency is sufficiently high, better results may be achieved if the target device immediately disconnects after decoding its own address.
The master device is then required to retry the same operation. In the meantime, the target has time to fetch data and place it in its slave read FIFO. In between, the bus arbiter may decide to grant the bus to another master, effectively increasing the time for the target to fetch data without adversely affecting bus utilization. However, even this method increases bus utilization, since the bus has to be arbitrated twice for the same data, and the process does not help to resolve the master blocking issue. It may actually make things worse due to bus re-arbitration.

Consequently, the second design goal for an ATMizer II+ chip host messaging system should be to favor PCI write operations and discourage PCI read operations. Since PCI FIFOs hold data that is written by a sender and read by a receiver, this goal is achieved if the FIFOs are located in the receiver’s memory. Therefore, the RxFifo and TxDone FIFOs should be located in the PCI primary memory or host memory accessible from the PCI Bus, and the TxFifo and RxDone FIFO should be located in either the CBM or secondary memory. To reduce the secondary memory’s bandwidth requirements, it is recommended that you place both the TxFifo and the RxDone FIFO in the CBM.

2.2.3 FIFO Contents

As discussed in the previous section, four FIFOs are needed to exchange information between the host and APU. This section discusses which elements are actually placed in the FIFOs. FIFO implementation is described in a later section. See Table 2.1.

Table 2.1 FIFOs between the Host and APU

Name     Sender   Receiver   Contents
TxFifo   Host     APU        buffers for transmission (for segmentation)
TxDone   APU      Host       buffers sent
RxFifo   APU      Host       buffers received (from reassembly)
RxDone   Host     APU        processed buffers to a free list

The elements placed in the TxFifo and RxFifo may be either Buffer Descriptors or Buffer Numbers, leading to what is called 2-way or 3-way messaging, respectively.
Since the TxDone and RxDone FIFOs are used only to signal that a given buffer was completely processed, it is sufficient to store the Buffer Numbers in them.

2.2.3.1 Three-Way Messaging

In this case, the host writes Buffer Numbers into the TxFifo and the APU writes Buffer Numbers into the RxFifo. The sequence of events for the transmit direction is shown in Table 2.2.

Table 2.2 Three-Way Messaging, Transmit Direction

Step   Event                                                                    PCI Operation
1.     The host writes the BuffNum into the TxFifo.                             PCI write
2.     The APU reads the BuffNum from the TxFifo.
3.     The APU issues the Buff command.
4.     The EDMA copies the BFD to secondary memory.                             PCI read
5.     The EDMA processes the buffer.
6.     The EDMA places the completed BuffNum in the TxCell Completion Queue.
7.     The APU writes the BuffNum to the TxDone FIFO.                           PCI write

As shown, this method involves two write operations and one read operation over the shared PCI Bus, hence the name 3-way messaging. The sequence of events for the receive direction is shown in Table 2.3.

Table 2.3 Three-Way Messaging, Receive Direction

Step   Event                                                                    PCI Operation
1.     The EDMA completes a buffer and copies the BFD to primary memory.        PCI write
2.     The EDMA places the BuffNum in the RxCell Completion Queue.
3.     The APU writes the BuffNum to the RxFifo.                                PCI write
4.     The host reads the BuffNum and BFD, and processes the data.
5.     The host writes the processed BuffNum to the RxDone FIFO.                PCI write
6.     The APU issues the Buff command.
7.     The EDMA attaches the BFD to a free list.

There are three write operations over the shared PCI Bus in the receive direction.

Note that the same type of element (BuffNum) is used for both the TxFifo and the RxDone FIFO. These two FIFOs, both of which are read by the APU, can therefore be combined into one, using special tag bits that allow the receiver (APU) to tell the two kinds of message apart. Similarly, the RxFifo may be combined with the TxDone FIFO.
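
The idea of combining the TxFifo and the RxDone FIFO can be sketched as a dispatcher on the APU side. The tag used here is a single bit at position 17, chosen to match the BuffFree bit of the ring message in Figure 1.7; that choice, the names, and the use of counters in place of real EDMA Buff commands are all assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_FREE (1u << 17)  /* set: RxDone message; clear: TxFifo message */

typedef struct {
    unsigned to_segment;    /* buffers submitted for transmission   */
    unsigned to_free_list;  /* processed buffers returned as free   */
} DrainCounts;

/* Walk n messages taken from the combined FIFO and classify each one
 * by its tag bit. A real APU routine would issue a Buff command for
 * every message instead of counting. */
DrainCounts drain_combined_fifo(const uint32_t *fifo, size_t n)
{
    DrainCounts c = {0, 0};

    assert(fifo != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (fifo[i] & TAG_FREE)
            c.to_free_list++;
        else
            c.to_segment++;
    }
    return c;
}
```

Merging the two FIFOs halves the number of queues the APU has to poll, at the cost of reserving tag bits in every message.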
2.2.3.2 Two-Way Messaging

In this case, the host writes Buffer Descriptors into the TxFifo and the APU writes Buffer Descriptors into the RxFifo. The sequence of events for the transmit direction is shown in Table 2.4.

Table 2.4    Two-Way Messaging, Transmit Direction

Step                                                                             PCI Operation
1. The host writes the BFD into the TxFifo.                                      PCI write
2. The APU copies the BFD to secondary memory and allocates a local BuffNum.
3. The APU issues the Buff command.
4. The EDMA links the BFD to an appropriate VCD.
5. The EDMA processes the buffer.
6. The EDMA places the completed BuffNum in the TxCell Completion Queue.
7. The APU writes the BuffId to the TxDone FIFO.                                 PCI write

This method involves two operations over the shared PCI Bus, hence the name 2-way messaging.

The APU has to perform some additional steps in 2-way messaging as compared to 3-way messaging. First, the APU must allocate a free BFD into which it copies the contents of the BFD retrieved from the TxFifo. This is necessary since the TxFifo must be emptied rapidly to avoid overflow.

Second, the BFD in the TxFifo must carry some sort of identifier that can be returned later to the host (step 7 in the transmit events). Such an identifier should be allocated by the host and may be stored, for example, in the last unused halfword of the BFD. When the EDMA completes a buffer segmentation and returns a BuffNum into the TxCell Completion Queue, the APU has to read the BFD[BuffNum].BuffId field and place this field in the TxDone FIFO, instead of the BuffNum. Note that in this case, the BuffNum is purely a local number exchanged between the APU and the EDMA, while the BuffId is exchanged between the host and the APU.

Step 2 of the transmit sequence involves a copy operation, and its implementation depends on the location of the TxFifo. The highest performance can be achieved if the TxFifo is located in the Cell Buffer Memory.
In this case, the APU may simply copy the BFD word by word into the secondary memory, optionally performing some formatting operations if the Buffer Descriptor format used by the host differs from that shown in Figure 2.1.

Similar events happen in the receive direction. See Table 2.5.

Table 2.5    Two-Way Messaging, Receive Direction

Step                                                                                       PCI Operation
1. The EDMA completes a buffer and places the BuffNum in the RxCell Completion Queue.
2. The APU writes the BuffNum to the BuffId field.
3. The APU programs the Move processor to write the BFD into the RxFifo.                   PCI write
4. The host reads the BuffNum and BFD, and processes the data.
5. The host writes the processed BuffNum to the RxDone FIFO.                               PCI write
6. The APU issues the Buff command.
7. The EDMA attaches the BFD to a free list.

This time the APU has to explicitly store the BuffNum in the BuffId field so that the host may return it later to the RxDone FIFO.

Compared to 3-way messaging, 2-way messaging has the advantage of creating less traffic over the shared PCI Bus.

One disadvantage of 2-way messaging is that it requires the host to create a PCI burst to send the BFD to the TxFifo. Depending on the host PCI Bus controller, a PCI burst may require setting up a DMA transfer. If a DMA is required, setting it up may impose a high burden on the host processor. If the host bus controller is able to collect multiple single-word transactions at consecutive addresses into 4-word bursts, then 2-way messaging is much more efficient.

The second disadvantage of 2-way messaging is that it requires more involvement from the APU and may create a bottleneck if the APU performs many non-SAR related tasks. It also creates slightly more traffic to and from secondary memory.
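The BuffNum/BuffId bookkeeping that 2-way messaging asks of the APU in the transmit direction can be sketched as below. This is a minimal model under stated assumptions, not the chip's actual data layout: BFD_t is a placeholder with only the BuffId halfword meaningful, and the pool size, the free list, and the helpers InitBfdPool(), AcceptHostBfd(), and CompleteTxBuffer() are hypothetical names introduced for illustration.

```c
#include <assert.h>

typedef unsigned long  ulong;
typedef unsigned short ushort;

/* Placeholder descriptor: only BuffId matters here. The real BFD layout is
   defined by the chip (Figure 2.1 of the guide) and is not reproduced. */
typedef struct {
    ulong  Word[3];   /* stand-in for the hardware-defined fields */
    ushort Spare;     /* stand-in */
    ushort BuffId;    /* host-allocated identifier, last halfword */
} BFD_t;

#define NUM_BFD 8                   /* hypothetical pool size */

static BFD_t  Bfd[NUM_BFD];         /* models BFDs in secondary memory */
static ushort FreeList[NUM_BFD];    /* free local BuffNums */
static int    FreeCount;

void InitBfdPool(void)
{
    ushort n;
    FreeCount = 0;
    for (n = NUM_BFD - 1; n >= 1; n--)   /* BuffNum 0 is never used */
        FreeList[FreeCount++] = n;
}

/* Step 2: copy the BFD read from the TxFifo into a local BFD.
   Returns the local BuffNum, or 0 if no BFD is free. */
ushort AcceptHostBfd(const BFD_t *FromTxFifo)
{
    ushort BuffNum;
    if (FreeCount == 0)
        return 0;
    BuffNum = FreeList[--FreeCount];
    Bfd[BuffNum] = *FromTxFifo;
    return BuffNum;
}

/* Step 7: translate the BuffNum reported by the EDMA back into the host's
   BuffId, which is what goes into the TxDone FIFO. Releasing the local BFD
   at this point is an assumption of this sketch. */
ushort CompleteTxBuffer(ushort BuffNum)
{
    ushort BuffId;
    assert(BuffNum != 0 && BuffNum < NUM_BFD);
    BuffId = Bfd[BuffNum].BuffId;
    FreeList[FreeCount++] = BuffNum;
    return BuffId;
}
```

The host never sees the local BuffNum; it only gets back the BuffId it allocated, which is why the two numbering spaces can be chosen independently.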
2.2.4 FIFO Implementations

This section discusses various implementations of host messaging FIFOs, independently of the FIFOs' contents.

2.2.4.1 ATMizer II+ Chip Mailbox

The simplest implementation of a FIFO is to use existing hardware resources. The ATMizer II+ chip includes a 4-entry deep, 32-bit wide, bidirectional FIFO called the Mailbox that may be used for communications between the APU and an external PCI Bus master. However, there are significant drawbacks to using the Mailbox for buffer processing:

• The Mailbox is only four entries deep.
• The APU-to-host direction involves reading the data from the Mailbox over the PCI Bus. The reasons why PCI read operations should be avoided have already been described in Section 2.2.2, “FIFO Location.”

The APU is usually much faster than any host processor since its firmware has direct access to hardware resources without the burden of the multilevel function calls required by typical host operating systems. It is therefore tempting to assume that Mailbox overflow can be avoided easily. In the host-to-APU direction, the APU just has to read the Mailbox quickly enough. In the APU-to-host direction, the APU must check the Mailbox occupancy level before writing a new value.

This assumption would be true if the processing time of a command placed in the host-to-APU Mailbox depended only on the APU. In fact, the Buff processor takes care of Buff command processing. If the host writes back-to-back Buff commands to the Mailbox FIFO, the APU will quickly empty the Mailbox and put the appropriate commands in the Buff Request Queue. Since the Buff processor needs some time to execute a command, the Buff Request Queue will fill quickly. When this happens, the APU needs to buffer additional commands in a software-controlled FIFO and notify the host of an overflow condition.
If such a FIFO is required, the Mailbox may be bypassed and that FIFO used exclusively to reduce the total overhead.

2.2.4.2 Software Controlled FIFO – Shared Descriptor

Multiple software implementations of a FIFO are possible. Choose an implementation that suits the situation where the reader (APU or host) and the writer (APU or host) run as separate asynchronous processes, and that avoids the need for locking the FIFO descriptor. Assuming that the FIFO holds 32-bit unsigned integers, the FIFO descriptor is declared as shown in Figure 2.4.

Figure 2.4    FIFO Descriptor Declaration

    typedef struct {
        ulong *Rd;      /* current position to read from */
        ulong *Wr;      /* current position to write to */
        ulong *Base;    /* Fifo array base */
        ulong *End;     /* end of the Fifo array */
    } Fifo_t, *pFifo_t;

With this declaration, a routine (Figure 2.5) can be implemented to put a data element in the FIFO:

Figure 2.5    PutFifo() Routine

    int PutFifo(pFifo_t pFifo, ulong Data)
    {
        ulong *Ptr = (pFifo->Wr == pFifo->End) ? pFifo->Base : pFifo->Wr + 1;
        if (Ptr == pFifo->Rd)
            return 0;
        *pFifo->Wr = Data;
        pFifo->Wr = Ptr;
        return 1;
    }

In the above code, the incremented write pointer is first computed in the temporary variable Ptr, wrapping around to the FIFO base if necessary. If the new Wr pointer would reach the Rd pointer, the FIFO is full and the routine returns a failure code. Otherwise, the data element is placed at the current Wr pointer and the value of the temporary variable, Ptr, is assigned to the Wr pointer (that is, the Wr pointer is incremented).

The implementation of a routine to retrieve a data element from a FIFO is similar, as shown in Figure 2.6.

Figure 2.6    GetFifo() Routine

    int GetFifo(pFifo_t pFifo, ulong *Data)
    {
        if (pFifo->Rd == pFifo->Wr)
            return 0;
        *Data = *pFifo->Rd;
        pFifo->Rd = (pFifo->Rd == pFifo->End) ? pFifo->Base : pFifo->Rd + 1;
        return 1;
    }

A FIFO underflow condition is detected by comparing the Rd and Wr pointers. If they are equal, the FIFO is empty and the routine returns a failure status. Otherwise, a data element is retrieved and the Rd pointer is incremented. Figure 2.7 shows how these routines cooperate to control the FIFO.

Figure 2.7    FIFO Operations
[Six panels showing the positions of the Rd and Wr pointers between Base and End: a) Initial; b) After 1 PutFifo; c) After 2 PutFifos; d) After 2 PutFifos and 1 GetFifo; e) After 3 PutFifos and 1 GetFifo; f) After 4 PutFifos and 1 GetFifo.]

Note that with this approach there is always one FIFO element empty (Figure 2.7). To use it, additional boolean flags would have to be introduced to distinguish between the FIFO full and FIFO empty conditions (both of which would then have the Rd pointer matching the Wr pointer). Such flags would have to be manipulated both by the reader, GetFifo(), and the writer, PutFifo(). This, in turn, would require locking the FIFO descriptor.

In contrast, the implementation shown is completely safe from any race conditions due to the asynchronous and overlapping execution of the GetFifo() and PutFifo() routines by different CPUs. It is hazard free because only the reader can modify the Rd pointer and only the writer can modify the Wr pointer.

The main problem with this implementation is that it requires a shared descriptor, since both the reader and the writer need to access it. As discussed previously in Section 2.2.2, “FIFO Location,” transactions through the PCI Bus, particularly reads, are expensive in terms of time and should be avoided.
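As a quick check of the behavior described above, the following sketch exercises the shared-descriptor FIFO on a single processor. PutFifo() and GetFifo() follow Figures 2.5 and 2.6; InitFifo() and FillFifo() are hypothetical helpers added only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long ulong;

typedef struct {
    ulong *Rd;      /* current position to read from */
    ulong *Wr;      /* current position to write to */
    ulong *Base;    /* Fifo array base */
    ulong *End;     /* end of the Fifo array (last element) */
} Fifo_t, *pFifo_t;

/* Hypothetical helper: attach a descriptor to an array of Len elements. */
void InitFifo(pFifo_t pFifo, ulong *Array, size_t Len)
{
    assert(Len >= 2);               /* one element always stays empty */
    pFifo->Base = Array;
    pFifo->End  = Array + Len - 1;
    pFifo->Rd = pFifo->Wr = pFifo->Base;
}

int PutFifo(pFifo_t pFifo, ulong Data)
{
    ulong *Ptr = (pFifo->Wr == pFifo->End) ? pFifo->Base : pFifo->Wr + 1;
    if (Ptr == pFifo->Rd)
        return 0;                   /* full */
    *pFifo->Wr = Data;
    pFifo->Wr = Ptr;
    return 1;
}

int GetFifo(pFifo_t pFifo, ulong *Data)
{
    if (pFifo->Rd == pFifo->Wr)
        return 0;                   /* empty */
    *Data = *pFifo->Rd;
    pFifo->Rd = (pFifo->Rd == pFifo->End) ? pFifo->Base : pFifo->Rd + 1;
    return 1;
}

/* Hypothetical helper: write First, First+1, ... until the FIFO is full
   and return how many elements were accepted. */
size_t FillFifo(pFifo_t pFifo, ulong First)
{
    size_t n = 0;
    while (PutFifo(pFifo, First + n))
        n++;
    return n;
}
```

With a four-element array, FillFifo() reports three accepted elements, which is the "one element always empty" property noted for Figure 2.7.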

2.2.4.3 Software Controlled FIFO – Double Descriptors

You can avoid all read and many (but not all) write accesses to the PCI Bus when:

• the FIFO is located in the reader's memory.
• both the reader and the writer maintain their own local copies of a FIFO descriptor.

Since local memory is fast, each master can read its own descriptor quickly, and the only remaining PCI operations are:

• for the writer, one write operation to update the reader's local Wr pointer
• for the writer, one write operation to place data in the FIFO
• for the reader, one write operation to update the writer's local Rd pointer

The FIFO descriptor has to be enhanced with one additional field, as shown in Figure 2.8.

Figure 2.8    Enhanced Fifo Descriptor Declaration

    typedef struct {
        ulong  *Rd;      /* current position to read from */
        ulong  *Wr;      /* current position to write to */
        ulong **Other;   /* pointer to update in the other side's descriptor */
        ulong  *Base;    /* Fifo array base */
        ulong  *End;     /* end of the Fifo array */
    } Fifo_t, *pFifo_t;

Assuming that you implement the TxFifo from Figure 2.2 in Cell Buffer Memory and that the APU FIFO descriptor is also located in Cell Buffer Memory, the Cell Buffer Memory layout may look like Figure 2.9.

Figure 2.9    CBM Layout

    typedef struct {
        Fifo_t TxFifo;                      /* APU (reader) descriptor */
        ulong  TxFifoData[TX_FIFO_SIZE];    /* Fifo array */
        /* other stuff */
    } CBM_t, *pCBM_t;

Similarly, the host TxFifo Descriptor may be located in host memory (Figure 2.10):

Figure 2.10    TxFifo Descriptor Location

    typedef struct {
        Fifo_t TxFifo;                      /* host (writer) descriptor */
        /* other stuff */
    } SHM_t, *pSHM_t;

In the figures that follow, CBM and SHM denote the instances of these structures in Cell Buffer Memory and in host memory, respectively.

The following APU code initializes the APU TxFifo descriptor (Figure 2.11):

Figure 2.11    APU TxFifo Descriptor Initialization

    TxFifo->Base  = &TxFifoData[0];
    TxFifo->End   = &TxFifoData[TX_FIFO_SIZE - 1];
    TxFifo->Rd    = TxFifo->Wr = TxFifo->Base;
    TxFifo->Other = &SHM.TxFifo.Rd;

The following host code initializes the host TxFifo descriptor (Figure 2.12):

Figure 2.12    Host TxFifo Descriptor Initialization

    TxFifo->Base  = &CBM.TxFifoData[0];
    TxFifo->End   = &CBM.TxFifoData[TX_FIFO_SIZE - 1];
    TxFifo->Rd    = TxFifo->Wr = TxFifo->Base;
    TxFifo->Other = &CBM.TxFifo.Wr;

The reader and writer routines are modified as follows (Figure 2.13):

Figure 2.13    Modified PutFifo() and GetFifo() Routines

    int PutFifo(pFifo_t pFifo, ulong Data)
    {
        ulong *Ptr = (pFifo->Wr == pFifo->End) ? pFifo->Base : pFifo->Wr + 1;
        if (Ptr == pFifo->Rd)
            return 0;
        *pFifo->Wr = Data;              /* ---- PCI write ---- */
        pFifo->Wr = Ptr;
        *pFifo->Other = Ptr;            /* ---- PCI write ---- */
        return 1;
    }

    int GetFifo(pFifo_t pFifo, ulong *Data)
    {
        if (pFifo->Rd == pFifo->Wr)
            return 0;
        *Data = *pFifo->Rd;
        pFifo->Rd = (pFifo->Rd == pFifo->End) ? pFifo->Base : pFifo->Rd + 1;
        *pFifo->Other = pFifo->Rd;      /* ---- PCI write ---- */
        return 1;
    }

This implementation reduces PCI operations to three writes per data element. One write is for the data element transfer and the other two are for pointer updates.
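The three-writes-per-element count can be checked with a small single-process model. This is a sketch under stated assumptions, not firmware: both descriptors and the FIFO array live in ordinary memory, every access that would cross the PCI Bus in a real system simply increments a counter, and the names FifoData, ReaderDesc, WriterDesc, PciWrites, and InitPair() are hypothetical.

```c
#include <assert.h>

typedef unsigned long ulong;

typedef struct {
    ulong  *Rd;     /* current position to read from */
    ulong  *Wr;     /* current position to write to */
    ulong **Other;  /* peer's copy of the pointer this side advances */
    ulong  *Base;   /* Fifo array base */
    ulong  *End;    /* end of the Fifo array */
} Fifo_t, *pFifo_t;

#define FIFO_LEN 8                 /* hypothetical size */

static ulong  FifoData[FIFO_LEN];  /* FIFO array, in the reader's memory */
static Fifo_t ReaderDesc;          /* reader's local descriptor */
static Fifo_t WriterDesc;          /* writer's local descriptor */
static int    PciWrites;           /* simulated cross-bus write counter */

void InitPair(void)
{
    assert(FIFO_LEN >= 2);
    ReaderDesc.Base = WriterDesc.Base = &FifoData[0];
    ReaderDesc.End  = WriterDesc.End  = &FifoData[FIFO_LEN - 1];
    ReaderDesc.Rd = ReaderDesc.Wr = ReaderDesc.Base;
    WriterDesc.Rd = WriterDesc.Wr = WriterDesc.Base;
    ReaderDesc.Other = &WriterDesc.Rd;  /* reader publishes its Rd */
    WriterDesc.Other = &ReaderDesc.Wr;  /* writer publishes its Wr */
    PciWrites = 0;
}

int PutFifo(pFifo_t pFifo, ulong Data)
{
    ulong *Ptr = (pFifo->Wr == pFifo->End) ? pFifo->Base : pFifo->Wr + 1;
    if (Ptr == pFifo->Rd)
        return 0;
    *pFifo->Wr = Data;
    PciWrites++;                    /* data element into reader memory */
    pFifo->Wr = Ptr;
    *pFifo->Other = Ptr;
    PciWrites++;                    /* reader's copy of Wr */
    return 1;
}

int GetFifo(pFifo_t pFifo, ulong *Data)
{
    if (pFifo->Rd == pFifo->Wr)
        return 0;
    *Data = *pFifo->Rd;
    pFifo->Rd = (pFifo->Rd == pFifo->End) ? pFifo->Base : pFifo->Rd + 1;
    *pFifo->Other = pFifo->Rd;
    PciWrites++;                    /* writer's copy of Rd */
    return 1;
}
```

After one PutFifo(&WriterDesc, ...) followed by one GetFifo(&ReaderDesc, ...), the counter stands at three, and polling an empty FIFO costs no bus traffic at all because the emptiness test uses only the reader's local descriptor.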
Obviously, you cannot eliminate the data element transfer, although it is possible to reduce bandwidth requirements by grouping multiple data elements in one PCI burst at the expense of increased processing delay. However, there are ways to reduce the pointer update traffic.

2.2.4.4 Eliminating Rd Pointer Update

The master Rd pointer update may be eliminated if a special element value is used as a mark for an empty FIFO position. In the case of the TxFifo containing Buffer Numbers, zero can be safely used as a mark because Buffer Number zero is not used by the ATMizer II+ chip hardware. With this assumption, the PutFifo() and GetFifo() routines are rewritten as follows (Figure 2.14):

Figure 2.14    PutFifo() and GetFifo() without Rd Pointer Update

     1  int PutFifo(pFifo_t pFifo, ulong Data)
     2  {
     3      ulong *Ptr = (pFifo->Wr == pFifo->End) ? pFifo->Base : pFifo->Wr + 1;
     4      if (Ptr == pFifo->Rd)
     5          return 0;
     6      *pFifo->Wr = Data;              /* ---- PCI write ---- */
     7      pFifo->Wr = Ptr;
     8      return 1;
     9  }
    10  ulong GetFifo(pFifo_t pFifo)
    11  {
    12      ulong Data = *pFifo->Rd;
    13      if (Data == 0)
    14          return 0;
    15      *pFifo->Rd = 0;                 /* mark the position empty again (local write) */
    16      pFifo->Rd = (pFifo->Rd == pFifo->End) ? pFifo->Base : pFifo->Rd + 1;
    17      *pFifo->Other = pFifo->Rd;      /* ---- PCI write ---- */
    18      return Data;
    19  }

Note that the PutFifo() routine now requires only one PCI access, to write the data element at line 6. The writer no longer has to send its new Wr pointer to the reader, because the reader detects new data by finding a nonzero element. For the zero mark to remain valid after the pointers wrap around, the reader must clear each element it consumes (line 15), and all elements must initially be zero. GetFifo() still needs a PCI write at line 17. You cannot eliminate this second PCI write operation, but you can reduce its frequency by performing it once for a group of data elements instead of for each data element. If the group size is chosen to be half of the FIFO size, the resulting data structure is called a ring and is described in detail in the next section.

2.3 Rings

Rings are special data structures supporting very fast messaging between the host and the APU. At a minimum, two rings have to be maintained.
The APU-to-host ring is located in primary (PCI) memory and is used for both the RxFifo and the TxDone FIFO. The host-to-APU ring is located in Cell Buffer Memory and is used to implement both the TxFifo and the RxDone FIFO. As an alternative, four rings (one per FIFO) may be built. Only the two-ring method is discussed here.

2.3.1 Ring Structure

Rings are described by the following Ring Descriptors (Figure 2.15):

Figure 2.15    Ring Descriptors Declaration

    typedef struct {
        ulong  *Ptr;      /* current element to be read (retrieved from Fifo) */
        ulong  *End;      /* end of the payload */
        ushort *Credit;   /* pointer to Credit field in sender memory */
        ushort  Size;     /* size of the Ring, in words */
        ushort  Count;    /* total number of elements retrieved from the Fifo */
                          /* or current credit value */
    } RingDesc_t, *pRingDesc_t;

Each ring has an associated Ring Descriptor and Ring Credit field. A Ring Credit is an unsigned short integer located in the writer's space and updated by the reader to enable the writer to send more data. There are two Ring Credits:

• the APU_Host_Credit, located in the Cell Buffer Memory
• the Host_APU_Credit, located in primary memory

There are two Ring Descriptors per ring. One is owned and maintained by the writer and the other by the reader.
The CBM layout may look like the structure shown in Figure 2.16:

Figure 2.16    CBM Ring Size

    struct {
        ulong APU_Host_Credit;
        ulong Host_APU[HOST_APU_RING_SIZE];    /* for example 32 */
        /* Rx Fifo */
        /* Tx Fifo */
        /* Free cells */
    }

Similarly, in primary memory, the layout might look like the one shown in Figure 2.17:

Figure 2.17    Primary Memory Ring Size

    struct {
        ulong Host_APU_Credit;
        ulong APU_Host[APU_HOST_RING_SIZE];    /* may be big, like 256 */
        /* other stuff */
    }

The four Ring Descriptors are located in the memory of the master that uses them:

• Host Ring Descriptors are located in host memory
• APU Ring Descriptors are located in ATMizer II+ memory

The host cannot (and does not need to) access the APU Ring Descriptors and vice versa. We call the host-to-APU ring the TxRing and the APU-to-host ring the RxRing (although BuffNums of the other direction are placed in both).

2.3.2 Ring Management

The writer places a data element at the current ring pointer. The current ring pointer is the writer's local variable. It is incremented after each write operation and wraps around to the ring base address when the end of the ring is reached.

The reader periodically checks the ring element to which its own local ring pointer points. If the data element is not zero, the reader consumes the data, clears the element to zero, and increments its pointer. The reader's pointer also wraps around to the ring base address when the end of the ring is reached.

Since the reader and the writer keep separate pointers into the ring, the ring can overflow if the writer overruns the reader. We introduce credits to avoid overflow. The reader gives the writer credits to enable the writer to write more data. In order to reduce PCI Bus traffic, the reader gives credits in bursts equal to half of the ring size each time.
To avoid race conditions, the writer keeps a private copy of the number of words it has sent in the Count variable. Each time it has consumed half a ring of elements, the reader informs the writer that it is ready to get another batch by placing the number of received elements in the *Credit variable. To avoid ring overflow, the writer needs to verify that its Count value does not reach the *Credit value.

2.3.3 Ring Implementation (Initialization)

The assumptions in the following code are:

• the ring size is a power of two,
• the value zero is not used as a data element, and
• all data elements of the rings are initially set to zero.

The initialization for the host looks like the following (Figure 2.18):

Figure 2.18    Ring Initialization

    RingDesc_t TxRing, RxRing;

    TxRing.Ptr     = &Host_APU[0];          /* points to CBM */
    TxRing.Size    = HOST_APU_RING_SIZE;
    TxRing.End     = TxRing.Ptr + TxRing.Size;
    TxRing.Count   = 0;
    TxRing.Credit  = &Host_APU_Credit;      /* points to primary memory */

    RxRing.Ptr     = &APU_Host[0];          /* points to primary memory */
    RxRing.Size    = APU_HOST_RING_SIZE;
    RxRing.End     = RxRing.Ptr + RxRing.Size;
    RxRing.Count   = RxRing.Size;
    RxRing.Credit  = &APU_Host_Credit;      /* points to CBM */
    *RxRing.Credit = RxRing.Size;

When the host wants to put an element (Data) in the TxRing (host->APU), it calls the following (Figure 2.19):

Figure 2.19    Host PutRing() Call

    while (PutRing(&TxRing, Data) == 0)
        ;

The PutRing() routine takes a pointer to the Ring Descriptor and attempts to put a data element (ulong) into the ring. It returns one if it succeeds and zero if it fails. (It fails when the ring is full because the receiver has not removed the elements in time.) Note that if you have other tasks to perform, you can replace the while loop with an if statement and then try again later. As written above, the host waits until there is a place in the ring.
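The credit scheme and the "try again later" variant can be exercised with the following single-process sketch. It is a model under stated assumptions rather than host or APU code: the ring and its credit word live in ordinary memory, GetRing() and PutRing() are written from the description in Section 2.3.2 (consume, clear to zero, grant credit every half ring), and RingData, RingCredit, WrDesc, RdDesc, InitRingPair(), and SendSome() are hypothetical names.

```c
#include <assert.h>

typedef unsigned long  ulong;
typedef unsigned short ushort;

typedef struct {
    ulong  *Ptr;      /* current element */
    ulong  *End;      /* one past the last element */
    ushort *Credit;   /* credit word (in the writer's memory) */
    ushort  Size;     /* ring size in words, a power of two */
    ushort  Count;    /* elements written (writer) or credit base (reader) */
} RingDesc_t, *pRingDesc_t;

#define RING_SIZE 8                     /* hypothetical size */

static ulong      RingData[RING_SIZE];  /* ring, in the reader's memory */
static ushort     RingCredit;           /* credit, in the writer's memory */
static RingDesc_t WrDesc, RdDesc;       /* writer's and reader's descriptors */

void InitRingPair(void)
{
    int i;
    assert((RING_SIZE & (RING_SIZE - 1)) == 0);
    for (i = 0; i < RING_SIZE; i++)
        RingData[i] = 0;                /* zero marks an empty element */
    WrDesc.Ptr = RdDesc.Ptr = &RingData[0];
    WrDesc.End = RdDesc.End = &RingData[0] + RING_SIZE;
    WrDesc.Size = RdDesc.Size = RING_SIZE;
    WrDesc.Credit = RdDesc.Credit = &RingCredit;
    WrDesc.Count = 0;
    RdDesc.Count = RING_SIZE;
    *RdDesc.Credit = RING_SIZE;         /* reader grants one full ring */
}

ulong GetRing(pRingDesc_t Ring)
{
    ulong Result = *Ring->Ptr;
    if (Result != 0) {
        *Ring->Ptr = 0;
        if (++Ring->Ptr == Ring->End)
            Ring->Ptr = Ring->End - Ring->Size;
        if ((++Ring->Count & ((Ring->Size >> 1) - 1)) == 0)
            *Ring->Credit = Ring->Count;    /* half a ring consumed */
    }
    return Result;
}

int PutRing(pRingDesc_t Ring, ulong Data)
{
    if (*Ring->Credit == Ring->Count)
        return 0;                           /* no credit left */
    *Ring->Ptr = Data;
    if (++Ring->Ptr == Ring->End)
        Ring->Ptr = Ring->End - Ring->Size;
    Ring->Count++;
    return 1;
}

/* Non-blocking send: push as many of Src[0..n) as credit allows and
   return the number accepted; the caller retries the rest later. */
int SendSome(pRingDesc_t Ring, const ulong *Src, int n)
{
    int Sent = 0;
    while (Sent < n && PutRing(Ring, Src[Sent]))
        Sent++;
    return Sent;
}
```

With an eight-element ring, the writer is stopped after eight elements and stays stopped while the reader consumes the first three; only the fourth GetRing() call, which completes half a ring, publishes new credit and lets the writer continue.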

When the host wants to retrieve an element from the RxRing (APU->host), it calls (Figure 2.20):

Figure 2.20    Host GetRing() Call

    while ((n = GetRing(&RxRing)) == 0)
        ;

The GetRing() routine takes a pointer to the Ring Descriptor and returns an element from the ring, or returns zero if the ring is empty. Again, if you do not want to be stalled while waiting for data, replace the while statement with an if statement.

Similarly, initialization for the APU is (Figure 2.21):

Figure 2.21    APU Ring Initialization

    RingDesc_t TxRing, RxRing;

    RxRing.Ptr     = &APU_Host[0];          /* points to primary memory */
    RxRing.Size    = APU_HOST_RING_SIZE;
    RxRing.End     = RxRing.Ptr + RxRing.Size;
    RxRing.Count   = 0;
    RxRing.Credit  = &APU_Host_Credit;      /* points to CBM */

    TxRing.Ptr     = &Host_APU[0];          /* points to CBM */
    TxRing.Size    = HOST_APU_RING_SIZE;
    TxRing.End     = TxRing.Ptr + TxRing.Size;
    TxRing.Count   = TxRing.Size;
    TxRing.Credit  = &Host_APU_Credit;      /* points to primary memory */
    *TxRing.Credit = TxRing.Size;

When the APU wants to put an element in the RxRing, it calls (Figure 2.22):

Figure 2.22    APU PutRing() Call

    if (PutRing(&RxRing, Data) == 0) {
        /* here error code, this ring should never overflow */
    }

and when it wants to get an element from the TxRing, it calls (Figure 2.23):

Figure 2.23    APU GetRing() Call

    if ((n = GetRing(&TxRing)) != 0) {
        /* process the element */
    }

The source code for both routines follows (Figure 2.24):

Figure 2.24    GetRing() and PutRing() Routines

    ulong GetRing(pRingDesc_t Ring)
    {
        ulong Result = *Ring->Ptr;
        if (Result != 0) {
            *Ring->Ptr = 0;
            if (++Ring->Ptr == Ring->End)
                Ring->Ptr = Ring->End - Ring->Size;
            if ((++Ring->Count & ((Ring->Size >> 1) - 1)) == 0)   /* half of the ring */
                *Ring->Credit = Ring->Count;
        }
        return Result;
    }

    int PutRing(pRingDesc_t Ring, ulong Data)
    {
        if (*Ring->Credit == Ring->Count)
            return 0;
        *Ring->Ptr = Data;
        if (++Ring->Ptr == Ring->End)
            Ring->Ptr = Ring->End - Ring->Size;
        Ring->Count++;
        return 1;
    }

Chapter 3 Scheduling

This chapter discusses the ATMizer II+ chip scheduling task. Detailed information describing the Scheduler may be found in the L64364 ATMizer II+ ATM-SAR Chip Technical Manual. This chapter includes the following sections.

• Section 3.1, “Scheduling Invocation”
• Section 3.2, “Scheduler Commands”
• Section 3.3, “The Scheduling Process”
• Section 3.4, “UBR Connections”
• Section 3.5, “VBR Connections”
• Section 3.6, “ABR Connections”
• Section 3.7, “Local Congestion”
• Section 3.8, “Source Code Listings”

This chapter uses simple type definitions:

    typedef unsigned long  ulong;
    typedef unsigned short ushort;
    typedef unsigned char  uchar;

3.1 Scheduling Invocation

The primary task of a scheduling process is to decide which connection should be serviced in the current cell slot. The scheduling process is not concerned with any memory management or segmentation issues; these are handled by other processes or by dedicated hardware units. For the purposes of the scheduling process, each connection appears as a logical entity with an infinite and continuous data buffer attached to it.

The primitive time unit in the scheduling process is a cell slot. A cell slot is the time necessary to send one cell on a physical link (for example, 2.82 µs for OC-3 line rates).

There are many methods of synchronizing the ATMizer II+ chip's scheduling process to physical time. Two approaches, the Line Recovered Clock and the FIFO Full methods, are described in the following sections.

3.1.1 Line Recovered Clock Synchronization

In this method, a clock recovered from the line is applied to one of the ATMizer II+ chip's timers. The timer is then polled or generates an interrupt to trigger the scheduling process. If other tasks may prevent the scheduling process from running at every timer interrupt, the timer handler can increment a service counter instead. The scheduling process is then called as long as the counter is nonzero, and the counter is decremented after each call.

3.1.2 FIFO Full Synchronization

A simpler approach relies on the fact that, although the ATMizer II+ chip's transmit FIFO is not drained at a fixed clock rate due to UTOPIA start/stop boundary conditions, the PHY device FIFO is drained at the constant line rate. On average, the ATMizer II+ chip's transmit FIFO drain rate is equal to the line rate. Therefore, it is sufficient to fill the transmit FIFO as fast as possible until it becomes full. The scheduling process is called as long as the FIFO is not full. At each call, it puts exactly one cell in the FIFO and advances its internal time counter.

With this approach, be careful to correctly handle situations in which the scheduling process is unable to create a cell because there is no data to send. An idle cell must then be put in the FIFO. Otherwise, the internal time counter runs ahead of physical time and the connection service contract may be violated once data becomes available. Note that, in many practical situations, the violation is of a very short duration. It is proportional to the size of the transmit FIFO and is usually only of importance for CBR traffic.

Explicit generation of idle cells in the transmit FIFO makes this technique unusable in a multiPHY environment.

3.2 Scheduler Commands

The Scheduler offers three commands for scheduling connections. To increase the readability of the code, Scheduler commands have been encapsulated in C macros.
They are:

    N = SCD_Serv();
    SCD_Sched(N, T);
    SCD_Tic();

3.2.1 SCD_Serv( ) Command

The SCD_Serv() command retrieves the number of the connection to be serviced in the current cell slot from an internal register of the Scheduler. After reading the register, the macro returns the connection number immediately. No memory accesses are necessary. The Scheduler then automatically fetches the next connection number to be serviced.

3.2.2 SCD_Sched( ) Command

The SCD_Sched(N, T) command schedules connection N for service at cell slot T. The connection descriptor is inserted into a linked list if slot T is already occupied. The insertion position depends on the Scheduler mode.

In Flat mode, all connections get equal priority, so a new connection is always appended to the end of the list. Since the Scheduler maintains both a head and a tail pointer for each list in the calendar table, the SCD_Sched() command is executed in constant time.

In Priority mode, the SCD_Sched() command has to scan the list of connections present in slot T and insert connection N based on its class. The list is scanned until the Scheduler finds a connection, M, with a higher class (and thus lower priority) than the class of connection N. Then connection N is inserted before connection M. Due to the scanning of the list, the execution time of SCD_Sched() is variable.

Extensive simulations have shown that lists of connections scheduled for the same slot, except for the current cell slot, are very short. The average length is less than one. This observation is true even when there is local congestion and the aggregate rate of all connections exceeds the link rate. System behavior in the presence of local congestion is analyzed in detail in Section 3.7, “Local Congestion.”

The SCD_Sched(0, T) form of the command is an indication to the Scheduler that it should use the connection returned by the last SCD_Serv() command.
In Flat mode, the commands SCD_Sched(N, T) and SCD_Sched(0, T) are equivalent when N is the connection returned by the last SCD_Serv() command. However, the latter form is preferred in Priority mode as it does not require the Scheduler to fetch the connection class from memory.

3.2.3 SCD_Tic( ) Command

The SCD_Tic() command is used to advance the Scheduler to the next cell slot. As for the SCD_Sched() command, the execution time of SCD_Tic() depends on the Scheduler mode. It is constant in Flat mode and depends on the average length of the list in Priority mode.

3.3 The Scheduling Process

This section describes scheduling CBR, VBR, and ABR connections and develops progressive application code along with the discussion. Unspecified bit rate (UBR) connection scheduling is covered in Section 3.4, “UBR Connections.”

3.3.1 A Simple Scheduling Function

For an example of the use of Scheduler commands, refer to Figure 3.1.

Figure 3.1    A Simple Scheduling Function

     1  void TxCell(ulong aCell)
     2  {
     3      ulong N, T;
     4      N = SCD_Serv();
     5      EDMA_TxCell(N, aCell);
     6      T = TimeNow + ACD[N].ICG;
     7      SCD_Sched(0, T);
     8      SCD_Tic();
     9      TimeNow++;
    10  }

As explained in Section 3.1, “Scheduling Invocation,” the TxCell() routine (line 1 in Figure 3.1) is called when a cell should be sent. The routine's single parameter is a cell location in Cell Buffer Memory that will be used for building the cell. The calling routine must first get a free cell location before calling TxCell(). If there are no free cell locations, the transmit FIFO is full and a cell cannot be sent.

In line 4, the SCD_Serv() command retrieves the number of the connection to service in the current cell slot. Next (line 5), the APU issues a command to the EDMA to send a cell from that connection. The next service time is computed in line 6 by adding an intercell gap (ICG) to the current time value. The ICG is the inverse of the current connection rate.
It is stored in the APU Connection Descriptor (ACD) as ACD[N].ICG, where N is the connection number.

After a cell is sent and the new service time is computed, the SCD_Sched() command in line 7 schedules the connection for service at the new service time. Finally, the Scheduler time is advanced (line 8) by issuing an SCD_Tic() command and the internal time is advanced (line 9) by incrementing the TimeNow variable.

The code in Figure 3.1 can clearly be improved. It is used here as a starting point; the following sections identify and fix its flaws one by one until the desired behavior is achieved.

3.3.2 Scheduling Lag

One problem with the code in Figure 3.1 is that when a connection is scheduled for service at time T, it may not actually be serviced until some later time T1 > T. In Flat mode, if slot T already has n connections scheduled for service, then connection N will actually be serviced at time T1 = T + n. In Priority mode, even if slot T is initially empty, connection N may be pushed to some later time by higher priority connections. This scheduling lag effectively reduces the connection rate to a lower value.

To solve this problem, it is necessary to maintain the desired connection service time, T, in the ACD. Time T is stored as ThTxTime (theoretical transmit time). With these naming conventions, the scheduling function can be enhanced as shown in Figure 3.2.

Figure 3.2    Handling Scheduling Lag

     1  void TxCell(ulong aCell)
     2  {
     3      ulong N, T;
     4      N = SCD_Serv();
     5      EDMA_TxCell(N, aCell);
     6      ACD[N].ThTxTime += ACD[N].ICG;
     7      T = ACD[N].ThTxTime;
     8      if (T <= TimeNow)
     9          T = TimeNow + 1;
    10      SCD_Sched(0, T);
    11      SCD_Tic();
    12      TimeNow++;
    13  }

Now, the new service time (line 6) is computed by adding the ICG to the time when the cell should have been sent and not to the current time.
However, an additional precaution is necessary because the new service time may actually be in the past (that is, less than the current time). This situation may arise when a connection is delayed, due to a transient local congestion, by more than one ICG. Lines 8 and 9 check for this condition and, if it is true, schedule the connection at the next available time. Note that the Scheduler does not allow scheduling connections at the current slot. The earliest possible time is TimeNow + 1.

3.3.3 Rate Granularity

In addition to the issues discussed in the previous paragraphs, the code in Figure 3.2 has another problem. Storing the ICG as an integer restricts the achievable rates to LCR/n, where LCR is the line cell rate of the physical link in cells/second and n is an integer (the ICG value). This limitation would, in fact, remove all need for the Scheduler, since similar results are easy to achieve with a set of hardware timers.

To overcome this problem, the ICG has to be stored as either a floating-point or a fractional number. Although the floating-point format is easier to manipulate in software, it would introduce a severe performance bottleneck since the ATMizer II+ chip does not have a complete floating-point hardware unit. The APU has the necessary hardware module to execute arithmetic operations in the ATM Forum floating-point format for rate description, but this format has only nine bits for the mantissa.

In fact, the ICG calculations do not have to be performed with high precision. A simple fractional format, for example 24.8, is sufficient. In the 24.8 format, 24 bits are used to express the integer part and 8 bits to express the fractional part. This provides a resolution of 1/256 of a cell slot, or better than 0.4%. The theoretical transmit time should also use the same format.
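The 24.8 format can be illustrated with a short integer-only sketch. The helper names IcgFromRates() and SlotOfCell() are hypothetical, and computing the ICG from two cell rates at connection setup is an assumption made for this example; the guide only states that the ICG is the inverse of the connection rate.

```c
#include <assert.h>

typedef unsigned long ulong;

#define TIME_FRAC 8   /* fractional bits of the 24.8 format */

/* Hypothetical helper: ICG in 24.8 format for a connection running at
   ConnRate cells/s on a link of LineRate cells/s. Integer math only;
   assumes LineRate is below 2^24 so that the shift cannot overflow
   a 32-bit ulong. The result is truncated, not rounded. */
ulong IcgFromRates(ulong LineRate, ulong ConnRate)
{
    assert(ConnRate != 0 && ConnRate <= LineRate);
    return (LineRate << TIME_FRAC) / ConnRate;
}

/* Cell slot of the k-th cell when ThTxTime starts at zero and grows by
   Icg for each cell sent (ThTxTime += ICG), dropping the fraction. */
ulong SlotOfCell(ulong Icg, ulong k)
{
    return (k * Icg) >> TIME_FRAC;
}
```

For a connection at 3/10 of the line rate, IcgFromRates(10, 3) yields 853 (0x355, about 3.33 cell slots), and SlotOfCell() places the first five cells in slots 3, 6, 9, 13, and 16, so the fractional part carries over from cell to cell instead of being lost.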
Figure 3.3 shows an example of a connection scheduled with a fixed, normalized rate of 0.3 (normalized means that the LCR is equal to 1 by definition). The ICG is 1/0.3 = 3.333..., or 0x00000355 in hexadecimal (24.8 format).

Figure 3.3 Connection Scheduled with Rate 0.3
[Timeline of cell slots 0 through 18. Successive ThTxTime values: 0x0000, 0x0355, 0x06AA, 0x09FF, 0x0D54, 0x10A9.]

The code performing fractional service time calculations is shown in Figure 3.4.

Figure 3.4 Calculating Fractional Service Time

     1  #define TIME_FRAC 8
     2
     3  void TxCell(ulong aCell)
     4  {
     5      ulong N, T;
     6      N = SCD_Serv();
     7      EDMA_TxCell(N, aCell);
     8      ACD[N].ThTxTime += ACD[N].ICG;
     9      T = ACD[N].ThTxTime;
    10      if ( T <= TimeNow )
    11          T = TimeNow + (1 << TIME_FRAC);
    12      SCD_Sched(0, T >> TIME_FRAC);
    13      SCD_Tic();
    14      TimeNow += 1 << TIME_FRAC;
    15  }

TIME_FRAC is a constant number of fractional bits. Line 1 sets it to eight bits.

Note that the TimeNow variable in line 11 of Figure 3.4 is now incremented by (1 << TIME_FRAC). This is necessary in order to have both ThTxTime and TimeNow in the same units. An alternative approach would be to clear the eight most significant bits (TIME_FRAC) of TimeNow before the comparison at line 10 is made.

3.3.4 Time Comparisons

A careful reader of line 10 might ask if the comparison is always valid. TimeNow is continuously incremented and will eventually wrap around as the result of arithmetic overflow. An example for a scaled-down (8 bits instead of 32) version of ThTxTime and TimeNow is given next. Since the fact that both variables are fractional is irrelevant here, the integer values shown in Table 3.1 will be used.
Table 3.1 Time Comparisons

    TimeNow   ThTxTime   Delta   Delta (hex)   Result
        3         1         2      0x002       Past
      255       253         2      0x002       Past
        0       254      -254      0x102       Past
        1       255      -254      0x102       Past
        2         0         2      0x002       Past
        1         3        -2      0x1FE       Future
      253       255        -2      0x1FE       Future
      254         0       254      0x0FE       Future
      255         1       254      0x0FE       Future
        0         2        -2      0x1FE       Future

Although the real difference between TimeNow and ThTxTime varies depending on the absolute value of both, the difference truncated to eight bits remains the same. If the difference between TimeNow and ThTxTime is interpreted as an 8-bit signed number, positive values mean that ThTxTime is in the past and negative values mean that ThTxTime is in the future, compared to TimeNow.

The following equation may be used to determine if time T is in the past:

Equation 3.1
    (TimeNow - T) >= 0

The ‘>=’ is used in the comparison because the Scheduler doesn’t support scheduling connections in the current cell slot (when the difference is equal to zero). Equation 3.1 may be rewritten as follows:

Equation 3.2
    (TimeNow - T + 1) > 0

However, one time unit in our system is numerically equal to (1 << TIME_FRAC). The code in Figure 3.5 defines a special macro to render the comparison more readable.

Figure 3.5 Handling Time Comparisons

    #define TIME_FRAC 8
    #define IsInPast(T) ( ((long) (TimeNow - (T) + (1 << TIME_FRAC))) > 0)

    void TxCell(ulong aCell)
    {
        ulong N, T;
        N = SCD_Serv();
        EDMA_TxCell(N, aCell);
        ACD[N].ThTxTime += ACD[N].ICG;
        T = ACD[N].ThTxTime;
        if ( IsInPast(T) )
            T = TimeNow + (1 << TIME_FRAC);
        SCD_Sched(0, T >> TIME_FRAC);
        SCD_Tic();
        TimeNow += 1 << TIME_FRAC;
    }

The code in Figure 3.5 is immune to arithmetic overflow of TimeNow. Of course, if ThTxTime lags by more than 2^(31 - TIME_FRAC) cell slots behind TimeNow, the comparison yields invalid results. The value above represents 4.7 seconds for OC-3, far beyond what a local congestion may create.
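The signed-difference rule of Equation 3.1 can be exercised with the scaled-down 8-bit arithmetic used in this section. The following is an illustrative sketch only: the function names is_in_past8 and check_is_in_past8 are hypothetical, and the conversion from uint8_t to int8_t assumes the usual two's-complement behavior (implementation-defined in ISO C, but what common compilers do).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-bit model of Equation 3.1: returns 1 when time t is not
 * in the future relative to now, using only the difference modulo 256. */
int is_in_past8(uint8_t now, uint8_t t)
{
    uint8_t diff = (uint8_t)(now - t);   /* difference truncated to 8 bits */
    return (int8_t)diff >= 0;            /* reinterpret as a signed value  */
}

/* A few spot checks, including cases where one counter has wrapped. */
void check_is_in_past8(void)
{
    assert(is_in_past8(3, 1) == 1);      /* t is 2 slots behind now    */
    assert(is_in_past8(1, 3) == 0);      /* t is 2 slots ahead of now  */
    assert(is_in_past8(2, 250) == 1);    /* now wrapped; t is 8 behind */
    assert(is_in_past8(250, 2) == 0);    /* t wrapped; t is 8 ahead    */
    assert(is_in_past8(7, 7) == 1);      /* equal times satisfy >= 0   */
}
```

The result depends only on the truncated difference, not on the absolute counter values, which is why the comparison survives overflow as long as the two times are less than half the counter range apart.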
3.3.5 Stopping Connection Scheduling

So far, it has been assumed that there is always data to send from a connection when the connection is serviced. Since this is rarely true in a real application, it is necessary to handle situations when a connection should be serviced but there is no data to send.

The Scheduler is equipped to help the APU detect this situation. It reads the VCD_Ctrl.BuffPres bit from the Virtual Connection Descriptor (VCD) and returns it as SCD_BuffPres in bit position 31 (the sign bit) together with the connection number. This explains why the link field (NextVCD) used by the Scheduler is positioned just before the VCD_Ctrl field.

In addition, the EDMA is able to signal to the Scheduler, through a dedicated on-chip signaling path, that the BuffPres bit has changed for a given connection. This is necessary since the Scheduler reads the BuffPres bit and the NextVCD field in advance to have them ready for the APU when it issues the SCD_Serv() command. If the EDMA changes the bit later, it signals the change to the Scheduler. This feature is equivalent to cache snooping in microprocessor caches.

If there is no data to send, simply stop scheduling the connection and continue to issue the SCD_Serv() command until a connection with data to send is found or the list of connections scheduled for the current slot is exhausted. This is illustrated in Figure 3.6.
Figure 3.6 Stopping Connection Scheduling

     1  #define TIME_FRAC 8
     2  #define IsInPast(T) ( ((long) (TimeNow - (T) + (1 << TIME_FRAC))) > 0)
     3  #define ConHasNoData(N) ((long) N > 0)
     4  #define ConHasData(N) ((long) N < 0)
     5  void TxCell(ulong aCell)
     6  {
     7      ulong N, T;
     8      do {
     9          N = SCD_Serv();
    10      } while (ConHasNoData(N));
    11      if (ConHasData(N)) {
    12          EDMA_TxCell(N, aCell);
    13          ACD[N].ThTxTime += ACD[N].ICG;
    14          T = ACD[N].ThTxTime;
    15          if ( IsInPast(T) )
    16              T = TimeNow + (1 << TIME_FRAC);
    17          SCD_Sched(0, T >> TIME_FRAC);
    18      }
    19      else {
    20          pCell_t pCell = (pCell_t) &CBM[aCell];
    21          pCell->CDS = CDS_IDLE_CELL;
    22          pCell->CellHdr = 0;
    23          EDMA_TxCell(0, aCell);
    24      }
    25      SCD_Tic();
    26      TimeNow += 1 << TIME_FRAC;
    27  }

3.3.5.1 Connection Rescheduling

The code in Figure 3.6 has two interesting features. First, the connections that have no data to send are effectively removed from the calendar table, and a question immediately arises as to when the connections are reinserted in the calendar table (rescheduled).

Obviously, a connection should be rescheduled when it receives data to send. To implement that strategy, the APU would have to check the BuffPres bit each time it requests that a buffer be attached to a VCD, that is, when it issues a Buff command to the EDMA. Fortunately, this is unnecessary since the EDMA performs this task automatically. When the BuffPres bit goes from zero to one as a result of the buffer attachment, the EDMA puts the connection number in the Buff Completion Queue. This event might be polled or set up to generate an interrupt. In both cases, the APU should reschedule the connection at the next available slot. The code for this task may be installed as a vectored interrupt handler, as shown in Figure 3.7.
Figure 3.7 Buff Completion Queue Interrupt Handler

    1  void ServBuffComplQueue() {
    2      ulong N = EDMA_ComplQueue();
    3      ulong T = SCD_Now();
    4      SCD_Sched(N, T + 1);
    5  }

If this code is installed as an interrupt handler, additional code is required to implement register save and restore. In line 2, the connection to reschedule is retrieved from the EDMA Buff Completion Queue. The code in line 3 finds the current scheduler time. The connection is rescheduled to the next time slot (T + 1) in line 4.

Rescheduling the connection immediately when data becomes available might be questionable. Since the connection is rescheduled at the next available slot, it might seem that the traffic contract could be violated if data is absent for a very short time and then available again. However, detailed analysis in Figure 3.8 shows this is not a problem.

Figure 3.8 Connection Rescheduling
[Timing diagram: Data, BuffPres, and Scheduled signals against a time axis marked 0, 2, and T.]

A connection is scheduled at time 2 and a TxCell command sends the last cell for the connection. The BuffPres bit is cleared some time after and when the connection is serviced again at time 3, no data is available. Cells are not sent and the connection is removed from the calendar. Later, when data is available, the connection is rescheduled again at time T. Therefore, the minimum time between T and 2 is always at least one ICG.

3.3.5.2 Sending Idle Cells

As discussed in Section 3.1, “Scheduling Invocation,” when there is no cell to send, an explicit idle cell is sent to avoid future contract violations. This is easily accomplished by the code in lines 20-23 of Figure 3.6. The pointer to the free cell buffer location (&CBM[aCell]) is cast to a pointer to the ATMizer II+ chip’s cell structure (pCell_t). The structure contains a 4-byte cell descriptor followed by a 4-byte cell header and a 48-byte cell payload.
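The 56-byte layout just described can be pictured as a C structure. This is only a sketch under stated assumptions: the field names CDS and CellHdr are taken from the listing in Figure 3.6, while the type name cell_sketch_t, the Payload field name, and the check function are hypothetical. The actual definition is the one supplied with the header files listed in Section 3.8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the cell layout: 4-byte cell descriptor,
 * 4-byte cell header, 48-byte payload (56 bytes in total). */
typedef struct {
    uint32_t CDS;          /* cell descriptor                        */
    uint32_t CellHdr;      /* cell header; Figure 3.6 clears it to 0 */
    uint8_t  Payload[48];  /* cell payload; field name is assumed    */
} cell_sketch_t;

/* Layout checks that can be called from any test harness. */
void check_cell_layout(void)
{
    assert(sizeof(cell_sketch_t) == 56);
    assert(offsetof(cell_sketch_t, CellHdr) == 4);
    assert(offsetof(cell_sketch_t, Payload) == 8);
}
```

With such a layout, the cast in line 20 of Figure 3.6 lets lines 21 and 22 write the descriptor and the header by name instead of by byte offset.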
If a timer is used to derive the cell slot clock as discussed in Section 3.1.1, “Line Recovered Clock Synchronization,” the automatic idle cell generation of the ATM Cell Interface (ACI) in the ATMizer II+ chip or of the framer may be used.

The cell descriptor is set in such a way that the CDS_Tbytes field has the value 48, which, when the cell is sent, instructs the ACI to clear the entire cell payload on its way to the UTOPIA bus. The cell header is cleared, creating an explicit idle cell. The cell is put into the ACI transmit FIFO by specifying the null connection to the EDMA.

Note that using the EDMA to put the idle cell in the transmit FIFO guarantees correct cell ordering. If a cell is put directly into the transmit FIFO, it might get in front of other cells that are to be built by the EDMA through processing requests already present in the EDMA Request Queue.

The ACI module may send idle cells from cell location 0 automatically when the transmit FIFO is empty and the ACI_TxIdle bit is set in the ACI_Ctrl register. However, this feature cannot be used if the scheduling method described in Section 3.1.2, “FIFO Full Synchronization,” is used.

3.3.6 Race Conditions and Hazards

When there are two or more processing units using the same resource (in this case the Scheduler), it is important to analyze the system’s behavior for possible race conditions. Careful analysis of Figure 3.6 and Figure 3.7 reveals that a race condition exists. Refer to Figure 3.9.

Figure 3.9 Race Conditions
[Timing diagram: cell slots 0 through 7, showing the BuffPres signal and the events T1 and T2.]

After the last cell is sent at slot 1, the BuffPres bit is cleared at time T1. Next, a new buffer is attached at time T2 and the code from Figure 3.7 is invoked. The code attempts to reschedule the connection, which is still scheduled at time 2. This is a fatal failure and results in some lost connections. The solution in this case is to store an explicit flag in the ACD to signal if the connection is currently scheduled or not.
The code in Figure 3.7 is modified as shown in Figure 3.10 to check the flag before rescheduling the connection.

Figure 3.10 Interrupt Handler without Race Condition

    void ServBuffComplQueue() {
        ulong N = EDMA_ComplQueue();
        ulong T = SCD_Now();
        if (!ACD[N].Scheduled) {
            SCD_Sched(N, T + 1);
            ACD[N].Scheduled = 1;
        }
    }

The code from Figure 3.6 is also modified as in Figure 3.11 to reset the flag.

Figure 3.11 Resetting the Connection Scheduled Flag

     1  #define TIME_FRAC 8
     2  #define IsInPast(T) ( ((long) (TimeNow - (T) + (1 << TIME_FRAC))) > 0)
     3  #define ConHasNoData(N) ((long) N > 0)
     4  #define ConHasData(N) ((long) N < 0)
     5  void TxCell(ulong aCell)
     6  {
     7      ulong N, T;
     8      do {
     9          N = SCD_Serv();
    10          if (ConHasNoData(N))
    11              ACD[N].Scheduled = 0;
    12      } while (ConHasNoData(N));
    13      if (ConHasData(N)) {
    14          EDMA_TxCell(N, aCell);
    15          ACD[N].ThTxTime += ACD[N].ICG;
    16          T = ACD[N].ThTxTime;
    17          if ( IsInPast(T) )
    18              T = TimeNow + (1 << TIME_FRAC);
    19          SCD_Sched(0, T >> TIME_FRAC);
    20      }
    21      else {
    22          pCell_t pCell = (pCell_t) &CBM[aCell];
    23          pCell->CDS = CDS_IDLE_CELL;
    24          pCell->CellHdr = 0;
    25          EDMA_TxCell(0, aCell);
    26      }
    27      SCD_Tic();
    28      TimeNow += 1 << TIME_FRAC;
    29  }

If the Buffer Completion Queue handler is invoked using interrupts, the interrupts should be disabled immediately after entering the TxCell() routine (after line 7). They may be enabled after exiting the do {} while loop after line 12. This precaution is necessary to avoid the race condition due to the TxCell() routine clearing the Scheduled bit while the ServBuffComplQueue() routine is attempting to set it.

3.3.7 Scheduling ABR Connections

The code in Figure 3.10 and Figure 3.11 correctly handles Constant Bit Rate (CBR) and Variable Bit Rate (VBR) connections.
Of course, inverse leaky bucket calculations would have to be performed to compute new ICGs as described in Section 3.5.1, “PCR-Based Implementation,” but the scheduling process is now complete.

However, the situation is slightly more complex with Available Bit Rate (ABR) connections that may have Resource Management (RM) cells to send even if there is no data to send. The detailed algorithm for handling ABR connections is described in Section 3.6, “ABR Connections.” Here, the discussion will concentrate only on the scheduling task. The interface with the ABR specific code is handled using two functions:

• int ABR_Send(ConNum, aCell, ...)
• int ABR_Receive(ConNum, aCell, ...)

3.3.7.1 Sending a Cell from an ABR Connection

The ABR_Send() routine is called to send a cell from an ABR connection. The routine decides if a Forward Resource Management (FRM) cell, a Backward Resource Management (BRM) cell, or a data cell should be sent. Next, it builds the appropriate cell in the cell buffer and sends it out. The routine also updates the ICG in the ACD, increasing or decreasing the connection rate according to the information present in the RM cells.

Even if the connection has no data to send, it may still have an FRM or a BRM cell to send. This suggests a complete rewrite of the TxCell() routine since the ConHasNoData() test can no longer be relied upon to skip connections that do not require any processing. The new TxCell() routine is shown in Figure 3.12.
Figure 3.12 TxCell() Routine for Multiple Class Connections

     1  #define TIME_FRAC 8
     2  #define IsInPast(T) ( ((long) (TimeNow - (T) + (1 << TIME_FRAC))) > 0)
     3  #define ConHasData(N) ((long) N < 0)
     4  #define ConClass(N) ( ((N) >> 16) & 3)
     5
     6
     7  typedef int QoS_Send_t(ulong ConNum, ulong aCell);
     8  extern QoS_Send_t CBR_Send, VBR_Send, ABR_Send, UBR_Send;
     9
    10  QoS_Send_t *QoS_Send[] = { CBR_Send, VBR_Send, ABR_Send, UBR_Send };
    11
    12  void TxCell(ulong aCell)
    13  {
    14      ulong N, T;
    15
    16      do {
    17          N = SCD_Serv();
    18          if (N == 0) {
    19              pCell_t pCell = (pCell_t) &CBM[aCell];
    20              pCell->CDS = CDS_IDLE;
    21              pCell->CellHdr = 0;
    22              EDMA_TxCell(0, aCell);
    23              break;
    24          }
    25
    26          if (QoS_Send[ConClass(N)](N, aCell)) {
    27              ACD[N].ThTxTime += ACD[N].ICG;
    28              T = ACD[N].ThTxTime;
    29              if ( IsInPast(T) )
    30                  T = TimeNow + (1 << TIME_FRAC);
    31              SCD_Sched(0, T >> TIME_FRAC);
    32              break;
    33          }
    34          else ACD[N].Scheduled = 0;
    35      } while (1);
    36
    37      SCD_Tic();
    38      TimeNow += 1 << TIME_FRAC;
    39  }
    40
    41  int CBR_Send(ulong ConNum, ulong aCell)
    42  {
    43      if (ConHasData(ConNum)) {
    44          EDMA_TxCell(ConNum, aCell);
    45          return 1;
    46      }
    47      return 0;
    48  }

To process different traffic classes, four routines are defined in Figure 3.12:

• CBR_Send(...)
• VBR_Send(...)
• ABR_Send(...)
• UBR_Send(...)

Each routine returns a boolean status code indicating if the connection should be rescheduled. Status code 0 means that the connection should not be rescheduled, usually because there was no data to send.

The type of the CBR/VBR/ABR/UBR_Send routine is declared as QoS_Send_t in line 7. Also an array of four pointers to QoS functions is defined and initialized in line 10.

The code starts with the same do {} while loop at line 16 as before. This time, however, if the Scheduler does not have a cell to send, an idle cell is built and sent out.
Otherwise, the appropriate QoS_Send function is called to return a status code. The CBR_Send(...) function defined in line 41 simply checks if there is data to send, sends a cell, and returns a status code. More complex calculations involving recomputing the ICG are necessary for VBR and ABR connections.

If required, the connection is unscheduled (line 34) and the search continues for other connections to service in the current cell slot by reentering the do {} while loop. Otherwise, the new ThTxTime is computed and the connection is rescheduled. Finally, in line 37, the time is advanced.

3.3.7.2 Receiving an RM Cell

The ABR_Receive() routine is called when an RM cell has been received on an ABR connection. The routine first determines if this is an FRM or a BRM cell. In the case of a BRM cell, the routine updates the ACD by computing the new connection rate and ICG, and then discards the cell. In the case of an FRM cell, the required fields from the cell are stored in the ACD and the cell is also discarded. However, as described in The ATM Forum Traffic Management Specifications, v4.0, the ABR_Receive() routine may also choose to immediately send back (turn around) the FRM cell as an out-of-rate cell (CLP = 1).

The last requirement (turning around an FRM cell and sending it out of rate) is rather awkward to implement. At this time, the RM cell is present in the cell buffer. If it is put immediately in the transmit FIFO, it may result in a priority inversion since the out-of-rate cell may take the place of a higher priority (for example, CBR) cell in the transmit direction.

Note that you could choose to simply discard out-of-rate cells, as this is an allowed option according to the ATM Forum specifications. This is not an efficient solution (from the network perspective); there should at least be an attempt to send the cell out.
Another option would be to build a separate FIFO of out-of-rate cells and keep them in the cell buffer waiting for an empty slot. This approach is not recommended for two reasons. First, it reduces the size of the Receive FIFO which, in turn, increases the risk of cell loss. Second, before sending an ABR cell, you would have to scan this new FIFO to determine if an FRM cell from the same connection is waiting there. If this is not done, there is the risk that an out-of-rate cell may sit in the FIFO and its data age past usefulness.

The solution adopted is a compromise. Just one out-of-rate cell is allowed to be saved temporarily in the cell buffer. If another out-of-rate cell has to be saved, possibly from another connection, the previous cell is discarded. This strategy has the following advantages:

• avoids priority inversion, as the cells are sent only by the sending task, which can correctly decide on priorities
• keeps the Receive FIFO sufficiently large
• simplifies discarding outdated RM cells, as only one cell has to be checked

The disadvantage, of course, is that out-of-rate cells (already at their destination) will be discarded if the destination experiences a local congestion or is close to congestion. However, this is a small price compared with the simplifications listed above. Also, since the out-of-rate cells are tagged (CLP=1), they have a higher probability of being dropped by the network.

The expanded TxCell() code is shown in Figure 3.13.
Figure 3.13 TxCell() Routine Handling Out-of-Rate Cells

     1  #define TIME_FRAC 8
     2  #define IsInPast(T) ( ((long) (TimeNow - (T) + (1 << TIME_FRAC))) > 0)
     3  #define ConHasData(N) ((long) N < 0)
     4  #define ConClass(N) ( ((N) >> 16) & 3)
     5
     6
     7  typedef int QoS_Send_t(ulong ConNum, ulong aCell);
     8  extern QoS_Send_t CBR_Send, VBR_Send, ABR_Send, UBR_Send;
     9
    10  QoS_Send_t *QoS_Send[] = { CBR_Send, VBR_Send, ABR_Send, UBR_Send };
    11
    12  void TxCell(ulong aCell)
    13  {
    14      ulong N, T;
    15      do {
    16          N = SCD_Serv();
    17          if (N == 0) {
    18              if (aOutOfRateCell) {
    19                  EDMA_TxCell(0, aOutOfRateCell);
    20                  aOutOfRateCell = 0;
    21                  ACI_Free(aCell);
    22              }
    23              else {
    24                  pCell_t pCell = (pCell_t) &CBM[aCell];
    25                  pCell->CDS = CDS_IDLE;
    26                  pCell->CellHdr = 0;
    27                  EDMA_TxCell(0, aCell);
    28              }
    29              break;
    30          }
    31
    32          if (QoS_Send[ConClass(N)](N, aCell)) {
    33              ACD[N].ThTxTime += ACD[N].ICG;
    34              T = ACD[N].ThTxTime;
    35              if ( IsInPast(T) )
    36                  T = TimeNow + (1 << TIME_FRAC);
    37              SCD_Sched(0, T >> TIME_FRAC);
    38              break;
    39          }
    40          else ACD[N].Scheduled = 0;
    41      } while (1);
    42      SCD_Tic();
    43      TimeNow += 1 << TIME_FRAC;
    44  }

The modified code shows that, before an explicit idle cell is sent, a check is made (line 18) to see if there is an out-of-rate cell waiting for transmission. If there is one, it is sent and the current cell location is returned to the ACI free cell list (line 21).

Note that, after sending a regular cell (line 32), the out-of-rate cell must be checked to see if it has become outdated. However, since this can occur only for the ABR connection, it can be handled in the ABR_Send() routine. The CBR_Send() routine is not changed and is not repeated in the listing.

3.3.7.3 Unscheduled ABR Connections

An FRM cell may also be received when the connection in the opposite direction is not scheduled because it has no data to send.
In this case there are several options. The simplest option would be to immediately put the received cell in the transmit FIFO, possibly after verifying that the current ICG is respected. However, as explained above, this may result in priority inversion, in which the turned-around cell takes the place of a higher-priority cell in the other direction. Instead, as already shown in Figure 3.12, a slightly more resource-consuming method is used: the connection is scheduled, so that the correct priorities are respected and full ABR source behavior is implemented.

3.4 UBR Connections

Unspecified Bit Rate (UBR) connections are usually at the lowest priority and should be serviced when no cells from higher priority QoS connections can be sent. Within the UBR class, connections are serviced round robin.

The ATM Forum Traffic Management Specifications allow you the option of defining a Peak Cell Rate (PCR) for UBR connections. If link rates are collected during the signaling phase, it makes sense to set the PCR to the minimum link rate since cells at faster rates are discarded anyway. In this example, it is assumed that the PCR is not enforced. However, it will be seen that the UBR scheduling code developed here can handle PCR.

To implement the non-PCR service policy, the UBR connections are kept in a circular list. When a connection is serviced, the list pointer is advanced to the next connection. There are several possible implementations of this scheduling strategy. Two of them are to:

• manage the UBR list in software
• use the Scheduler to manage the UBR list

3.4.1 Managing the UBR List in Software

Managing the UBR list in software is an easy task but it consumes CPU resources. The natural place to put the connection list pointer is in the NextVCD field of the VCD.

Note that VCDs and ACDs are two different data structures. The EDMA uses VCDs to maintain SAR state variables.
The software uses ACDs to maintain rate-related variables. In both the demonstration software and in this manual, ACDs are used only for the transmit direction. Other applications also may require building ACDs for the receive direction.

Retrieving a connection from a UBR list requires reading the NextVCD field and the VCD_Ctrl.BuffPres bit. Since the EDMA may modify the BuffPres bit at any time, the VCD_Ctrl field has to be read through uncacheable memory space. Therefore, to avoid cache thrashing, it is probably also more efficient to read the NextVCD field from the uncacheable space. This results in two separate read operations from the secondary memory, slowing down the APU considerably. However, for the sake of the example, the code is developed in Figure 3.14.

Figure 3.14 Implementing a UBR Connection List

    ulong UBR_List;

    int UBR_Send(ulong aCell) {
        ulong Head = UBR_List;
        if (Head) do {
            ulong N = UBR_List;
            UBR_List = VCD[N].NextVCD;
            if (VCD[N].VCD_Ctrl.BuffPres) {
                EDMA_TxCell(N, aCell);
                return 1;
            }
        } while (UBR_List != Head);
        return 0;
    }

First, a static variable (UBR_List) is defined to hold the current pointer of the list. Next, if the list is not empty, the connection is retrieved from the current position and the list pointer is advanced. If the connection has data to send, a cell is sent and status code 1 is returned, signalling that a cell was sent. If not, then the pointer continues scanning for a full circle, coming back to the starting position.

3.4.2 Managing UBR Connections Using the Scheduler

If a fixed ICG of one cell slot is used, and all UBR connections are put in class 3 (lowest priority), the Scheduler can manage the UBR list in hardware.
It might seem that this approach has a drawback when the Scheduler is configured to operate in the Priority mode, where the calendar table holds only a head pointer to the list of connections scheduled for the slot. To schedule a new connection, the Scheduler has to scan the list in order to put the connection at the end of the list of connections with the same priority. This is not a problem for other QoS traffic sources, such as CBR or ABR, where connection rates are typically much lower than the link rate. For those classes, the ICG makes the lists quite short because connections to service are spread over multiple slots. For UBR, however, all connections are at the same slot and each time one is scheduled, the Scheduler has to scan the entire list. The list can be very long. Fortunately, this is not a serious problem. When an ICG of one is used for all UBR connections, the long list is built at the current cell slot and not at the next cell slot. To see how that occurs, assume that a long list is present at the current slot and the next slot is empty. When a connection is serviced, it is placed at the next slot. This is a fast operation since the next slot is empty. Since a connection was serviced, the time is advanced and the SCD_Tic() command is executed. The Scheduler advances to the next slot, grabs the previously scheduled connection and appends it at the end of the list for the current slot, which is quite long. Since the Scheduler returned to the starting point, this process repeats forever. The key factor to the execution efficiency is that the Scheduler keeps both head and tail pointers of lists for all four traffic classes in the current slot in internal registers. 
It means that the execution time of the SCD_Tic() command does not depend on the number of connections in the current slot (it does of course depend on the number of connections in the next slot) and the number of memory accesses is minimized.

Given these considerations, the Scheduler is used to manage the UBR connection list. The pointer to the UBR_Send() routine is already installed in the QoS_Send array (line 10 of Figure 3.13), so the only task remaining is to define the routine itself. See Figure 3.15.

Figure 3.15 Managing UBR Lists with the Scheduler

    int UBR_Send(ulong ConNum, ulong aCell)
    {
        if (ConHasData(ConNum)) {
            EDMA_TxCell(ConNum, aCell);
            return 1;
        }
        return 0;
    }

If the ICG of UBR connections is set to zero, the code at line 34 of Figure 3.13 takes care of rescheduling the connection to the next slot.

In fact, looking carefully at the UBR_Send() routine, it can be seen that it is exactly the same as the CBR_Send() routine. The only differences are the class and the ICG value. To save instruction memory, just one of them may be used, as shown in Figure 3.16.

Figure 3.16 UBR_Send and CBR_Send Combined

    #define UBR_Send CBR_Send

    QoS_Send_t *QoS_Send[] =
        { CBR_Send, VBR_Send, ABR_Send, UBR_Send };

Now it is easy to see that if PCR enforcement is required for a UBR connection, the ICG is simply set to 1/PCR instead of zero.

3.5 VBR Connections

VBR connections usually are scheduled using timers running at connection PCRs. Many existing SAR devices have a finite set of timers and a connection rate has to be approximated by the closest timer rate. Other SARs use a bandwidth table approach where connections to be serviced are kept in a table with each entry corresponding to a cell slot.
Bandwidth tables are primitive versions of the Scheduler calendar table with two important differences:

• Bandwidth tables are static. They are set at initialization time and not modified during run time. The calendar table is created dynamically as the connections are rescheduled.
• The bandwidth table holds just one entry while the calendar table holds a list of connections.

Since bandwidth tables are static, any conflicts are resolved at initialization. A conflict is a situation where more than one connection should be serviced in one cell slot. The calendar table resolves conflicts by delaying connection servicing during run time.

3.5.1 PCR-Based Implementation

The Scheduler module in the ATMizer II+ chip allows very easy implementation of VBR scheduling. A connection ICG is set to 1/PCR. When a connection is to be serviced, a leaky bucket test is performed. If the test is positive, a cell is sent. Otherwise, another connection is tested. The leaky bucket test can be implemented as shown in Figure 3.17.

Figure 3.17 A Leaky Bucket Routine

     1  int LeakyBucket(ulong N) {
     2      long X = ACD[N].Bucket;
     3      X -= TimeNow - ACD[N].LastComplTime;
     4      if ( X <= ACD[N].Limit) {
     5          if (X < 0)
     6              X = 0;
     7          ACD[N].Bucket = X + ACD[N].Increment;
     8          ACD[N].LastComplTime = TimeNow;
     9          return 1;
    10      }
    11      return 0;
    12  }

This routine is called to check if connection N (passed as a parameter) is allowed to send a cell. First (line 2), the current contents of the connection bucket are assigned to a temporary variable. Next (line 3), the difference between the current time and the last time a cell was sent from this connection is subtracted from the bucket. If the bucket value does not exceed the limit, a cell is sent, the bucket is incremented, and the routine returns 1 to signal that the test is positive.

This algorithm has to be executed at every PCR event to make sure that the cell under consideration is conforming.
This is inefficient in the following circumstances:

1. If the data is being sent into the network at a rate close to the Sustainable Cell Rate (SCR), which is the case under steady conditions, then the leaky bucket test would satisfy X > Limit at many of the PCR events. Hence, for every PCR/SCR executions of the leaky bucket algorithm, at most one cell is able to get through. In other words, the algorithm must be invoked PCR/SCR times to send one cell, leading to inefficient utilization of resources.
2. If an application temporarily has no data to send, the leaky bucket algorithm must continue to run at every PCR event, which wastes cycles.

The second problem can be resolved easily by suspending the scheduling of connections with no data to send, much the same as was done for CBR connections. To resolve the first problem, the leaky bucket algorithm must be revised.

3.5.2 SCR-Based Implementation

To avoid invocations at the PCR or SCR, every time a cell is successfully transmitted, the nearest time at which the next conforming cell can be transmitted is computed. Then, instead of executing the leaky bucket algorithm at every PCR event, the APU can wait until the earliest conforming time is reached to transmit the cell.

Before the APU code is developed, first look at what happens when a leaky bucket algorithm is used to check whether a received cell is conforming.
A cell is conforming if:

Equation 3.3    Bucket - (ArrivalTime - LastComplTime) <= Limit

In addition, each time a conforming cell is received, the bucket is updated as follows:

Equation 3.4    Bucket -= (ArrivalTime - LastComplTime)
                Bucket += Increment

Therefore, to compute the earliest time, T, when a conforming cell may be sent, it is necessary to solve the following equation for T:

Equation 3.5    Bucket - (T - TimeNow) = Limit

Equation 3.6    T = Bucket + TimeNow - Limit

Since the ICG is the difference between the current time and the next cell transmission time, it is computed easily as:

Equation 3.7    ICG = Bucket - Limit

Note that TimeNow is used in the above equations instead of ThTxTime as for other classes of service. The leaky bucket calculations automatically adjust for the lag.

Figure 3.18 is an enhanced version of Figure 3.17. If there is data to be sent, the code first updates (line 4) the contents of the bucket. The LastComplTime can be computed easily as the difference between ThTxTime and ICG. The code then computes the new ICG (line 6) corresponding to the earliest time a conforming cell from the same connection can be sent. The newly computed ICG is compared to 1/PCR (line 7) to avoid sending cells at rates exceeding PCR. This is a crude way to perform this check; a more elaborate way would be to execute a second leaky bucket calculation for PCR conformance. Note that ThTxTime is updated to the current time (line 8) so that the next cell is scheduled after the current ICG.
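As a cross-check on Equations 3.5 through 3.7, the host-side sketch below models one connection and counts how often each approach has to run. It is not ATMizer II+ code: the `Conn` structure, the function names, and the parameter values (Increment = 10, Limit = 20, 1/PCR = 2, all in cell slots) are assumptions chosen for illustration, and time is a plain integer instead of the chip's fixed-point format.

```c
/* Hypothetical host-side model of one VBR connection; times are in cell slots. */
typedef struct {
    long Bucket;        /* current bucket contents                 */
    long Increment;     /* bucket increment per cell sent (1/SCR)  */
    long Limit;         /* bucket limit                            */
    long IcgPcr;        /* 1/PCR                                   */
    long LastComplTime; /* time the last conforming cell was sent  */
} Conn;

/* PCR-based test in the style of Figure 3.17: 1 if a cell may go at 'now'. */
static int leaky_bucket(Conn *c, long now)
{
    long x = c->Bucket - (now - c->LastComplTime);
    if (x > c->Limit)
        return 0;
    if (x < 0)
        x = 0;
    c->Bucket = x + c->Increment;
    c->LastComplTime = now;
    return 1;
}

/* SCR-based step: send a cell at 'now', return the gap to the next send. */
static long scr_send(Conn *c, long now)
{
    long x = c->Bucket - (now - c->LastComplTime);
    long icg;
    if (x < 0)
        x = 0;
    c->Bucket = x + c->Increment;
    c->LastComplTime = now;
    icg = c->Bucket - c->Limit;               /* Equation 3.7 */
    return icg > c->IcgPcr ? icg : c->IcgPcr; /* never faster than PCR */
}

/* Run both approaches over 'horizon' slots and report the invocation counts. */
void compare(long horizon, long *pcr_calls, long *pcr_cells, long *scr_calls)
{
    Conn a = { 0, 10, 20, 2, 0 };  /* assumed example parameters */
    Conn b = a;
    long t;

    *pcr_calls = *pcr_cells = *scr_calls = 0;

    /* PCR-based: the test runs at every PCR event. */
    for (t = 0; t < horizon; t += a.IcgPcr) {
        (*pcr_calls)++;
        *pcr_cells += leaky_bucket(&a, t);
    }

    /* SCR-based: one invocation per transmitted cell. */
    t = 0;
    while (t < horizon) {
        t += scr_send(&b, t);
        (*scr_calls)++;
    }
}
```

With these assumed parameters and a horizon of 100 slots, both approaches transmit at the same instants (an initial burst at PCR, then one cell per 10 slots), but the PCR-based loop runs the test 50 times while the SCR-based loop runs once per cell, 12 times.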
Figure 3.18 An SCR-Based Leaky Bucket Algorithm

     1  int VBR_Send(ulong N, ulong aCell) {
     2      if (ConHasData(N)) {
     3          N = ConConNum(N);
     4          ACD[N].Bucket -= TimeNow - (ACD[N].ThTxTime - ACD[N].ICG);
     5          ACD[N].Bucket = maxi(0, ACD[N].Bucket) + ACD[N].Increment;
     6          ACD[N].ICG = ACD[N].Bucket - ACD[N].Limit;
     7          ACD[N].ICG = maxi(ACD[N].ICG_PCR, ACD[N].ICG);
     8          ACD[N].ThTxTime = TimeNow;
     9          return 1;
    10      }
    11      return 0;
    12  }

3.6 ABR Connections

For Available Bit Rate (ABR) service, the behavior of an end system is governed by a set of rules for both source and destination end systems. The rules are defined in The ATM Forum Traffic Management Specifications, v4.0 for rate-based flow control.

The basic idea of rate-based flow control is to send special cells at regular intervals. These special cells, called Resource Management (RM) cells, are used to probe the state of the network. As the RM cells travel through the network, following the same route as the data cells, ATM switches may change their contents. When an RM cell arrives at its destination, it is turned around and sent back to the source following the same route in the opposite direction. Finally, when the RM cell returns to the source, the source has to modify the connection rate based on the information inserted in the RM cell by the switches along the connection route.

Since the source and destination rules, and the associated pseudocode, are clearly described in the ATM Forum specifications, it is not necessary to build the ABR code progressively as was done for the preceding cases. Instead, fully commented C code implementing the ABR_Send() and the ABR_Receive() functions is given in Section 3.8, “Source Code Listings.” The C code is quite faithful to the pseudocode given in the Traffic Management Specifications except for the following differences:

• Variable name choices in the ATM Forum pseudocode are quite poor.
To make the code easier to read, the following variable names are used:
  – Count is replaced by InRateCell
  – Turn-around is replaced by PresBRM
  – First-turn is replaced by LastWasFRM
  – Unack is replaced by FRM_sinceBRM

• The following ABR parameters are set to the constant values specified below:
  – Nrm = 32
  – Trm = 100 ms
  – Mrm = 2

• There is no support for out-of-rate Forward Resource Management cells. Source behavior No. 11 specifies that FRM cells may be sent out of rate at a rate not exceeding TCR (Tagged Cell Rate). The pseudocode of the Traffic Management Specifications, v4.0, Appendix I, chose to implement this behavior by sending out-of-rate FRM cells only if the Allowed Cell Rate (ACR) is below TCR. This may not be the best course of action since it stops any data traffic. Moreover, it requires that a separate data structure be used to schedule the out-of-rate FRM cells, with its accompanying costs in greater memory usage. In light of this, the advantages of supporting out-of-rate FRM cells at TCR (TCR = 10 cells/s) were judged not to be worth the cost, and they are not supported in this implementation.

• Section 3.8.9, “Transmit and Receive ABR Cells (ABR.c),” contains the code that discards out-of-date BRM cells waiting for transmission. This is also a deviation from the ATM Forum pseudocode. A justification for the deviation was given in Section 3.3.7.2, “Receiving an RM Cell.”

• Section I.7 of Appendix I of the Traffic Management Specifications has a detailed discussion of the options available for turning around FRM cells at the destination. There are five distinct implementations that maintain compliance with the Traffic Management Specifications, namely:

  1. The newly arrived cell is sent as an out-of-rate BRM cell in addition to being scheduled for in-rate transmission.
  2. The old cell is sent as an out-of-rate BRM cell and the newly arrived cell is scheduled for in-rate transmission.
  3. The newly arrived cell is scheduled for in-rate transmission and the old cell is dropped.
  4. Two copies of the newly arrived cell are scheduled for in-rate transmission.
  5. Both the old cell and the newly arrived cell are scheduled for in-rate transmission.

  The implementation described in Section 3.3.7.2, “Receiving an RM Cell,” does not strictly fit in any of these five categories. It lies somewhere between options 1 and 3. If the link is lightly loaded, then the implementation approaches the behavior of option 1. If the link is heavily loaded, it approaches the behavior of option 3.

  The discussion in the Traffic Management Specifications, Section I.7.1 of Appendix I, discourages the use of option 3, since the analysis given there leads to the conclusion that the rate of BRM cells would be much lower than that of FRM cells and lead to a decrease in responsiveness. However, this analysis is believed to be inaccurate, particularly for the case when ACRbck = 0. The text claims that there will be no flow of BRM cells at all under option 3. As explained in the next bullet, this is not the case in our implementation. When ACRbck = 0, the rate of in-rate BRM cells should approach 1/32 of the rate of FRM cells. Under this condition, option 3 still leads to more acceptable performance, even in the presence of heavy link traffic.

• An aspect of destination behavior that is not very clear from the pseudocode given in Appendix I of the Traffic Management Specifications is handling the FRM cells that arrive when the connection is not scheduled for transmission. If the connection is immediately scheduled for transmission for the next cell slot, as recommended, this may lead to the flow of in-rate BRM cells at a rate that cannot be policed by the return path congestion-control mechanisms.
To avoid this, invoke the connection rescheduling rules described in Section 3.3.5.1, “Connection Rescheduling.” Under those rules, the connection is rescheduled for the next transmission (even though there may be no data to send) and, if another FRM cell arrives before the next transmission instant, it awaits its turn so that it can be sent in-rate.

To optimize the cache performance of the architecture, the contents of the VCD must fit within 32 bytes since this also happens to be the size of the cache line in the 4010 RISC processor. The code in Section 3.8.9, “Transmit and Receive ABR Cells (ABR.c),” contains the details of how this is achieved.

3.7 Local Congestion

Local congestion is the situation when the sum of the active connection rates exceeds the output link bandwidth. Local congestion may be transient, causing a small buildup of connection lists in the calendar table, or of long duration, causing significant delays in connection service times. In the presence of local congestion, some connections are serviced at rates lower than requested. In this section, system behavior is analyzed in the presence of local congestion.

3.7.1 Fairness

Normally, all connections are serviced according to their rates. In the presence of congestion, some connections have their actual rates reduced. It is important that the rate reductions satisfy the following requirements:

• Priorities are respected. Rates of lower priority connections are reduced, eventually to zero, before the rates of higher priority connections are reduced.
• Rates are reduced in a fair manner. An example of fairness criteria is the maximum-minimum fairness defined in the Traffic Management Specifications.

To verify that the scheduling algorithm described in this chapter satisfies the above requirements, a series of simulations was performed.
A very important result of these simulations is that priorities are respected and maximum-minimum fairness is achieved. One set of simulations with the achieved results is described in the following paragraphs.

In this set of simulations, there are two constant sets of connections assigned to class 0 and 1. Each set is composed of ten connections and uses 30% of the link bandwidth. The third set is composed of ten connections belonging to class 2 and is requesting an increasing share of the bandwidth. The normalized aggregate rates of the class 2 connections are set to 0, 0.3, 0.4, 0.5, 0.6, 0.9, and 1.2. The total requested link utilization is thus 0.6, 0.9, 1.0, 1.1, 1.2, 1.5, and 1.8. Since the actual link utilization cannot exceed 1.0, the actual rates of the class 2 connections are decreased by the system, while classes 0 and 1 are unaffected.

Table 3.2 shows the results of the simulations for classes 0 and 1. The column Actual represents the actual connection rate, the column Requested is the requested connection rate, and the column Error is the difference between them relative to the requested rate, (Requested - Actual) / Requested. For reference, the requested ICG (inverse of requested rate) is shown in the last column.

Table 3.2 Simulation Results for Class 0 and 1 Connections

    ConNum  Class  Actual  Requested  Error     Req. ICG
    1       0      0.0080  0.0071     -12.68%   140.85
    2       0      0.0100  0.0096     -4.17%    104.17
    3       0      0.0510  0.0511     0.20%     19.57
    4       0      0.0410  0.0404     -1.49%    24.75
    5       0      0.0410  0.0404     -1.49%    24.75
    6       0      0.0400  0.0392     -2.04%    25.51
    7       0      0.0150  0.0141     -6.38%    70.92
    8       0      0.0250  0.0250     0.00%     40.00
    9       0      0.0290  0.0289     -0.35%    34.60
    10      0      0.0440  0.0442     0.45%     22.62
    11      1      0.0130  0.0123     -5.69%    81.30
    12      1      0.0350  0.0347     -0.86%    28.82
    13      1      0.0060  0.0051     -17.65%   196.08
    14      1      0.0190  0.0183     -3.83%    54.64
    15      1      0.0380  0.0381     0.26%     26.25
    16      1      0.0190  0.0187     -1.60%    53.48
    17      1      0.0390  0.0397     1.76%     25.19
    18      1      0.0570  0.0573     0.52%     17.45
    19      1      0.0280  0.0284     1.41%     35.21
    20      1      0.0470  0.0475     1.05%     21.05

The differences between requested and actual rates are small. Connection 13 has a high error, which is actually a simulation artifact due to the short simulation time (1000 slots compared to an ICG of 196 slots). With that exception, all connections are serviced at actual rates close to the requested rates in spite of the local congestion experienced by lower priority connections in class 2.

The situation is different for class 2 connections that oversubscribe the link. See Table 3.3.

Table 3.3 Simulation Results for Class 2 Connections

            Set 0.9                              Set 1.0
    ConNum  Actual  Req.    Error    Req. ICG    Actual  Req.    Error    Req. ICG
    21      0.0250  0.0248  -0.81%   40.32       0.0340  0.0348  2.30%    28.74
    22      0.0200  0.0204  1.96%    49.02       0.0450  0.0464  3.02%    21.55
    23      0.0280  0.0279  -0.36%   35.84       0.0340  0.0351  3.13%    28.49
    24      0.0490  0.0498  1.61%    20.08       0.0310  0.0319  2.82%    31.35
    25      0.0300  0.0303  0.99%    33.00       0.0420  0.0429  2.10%    23.31
    26      0.0440  0.0451  2.44%    22.17       0.0590  0.0606  2.64%    16.50
    27      0.0120  0.0115  -4.35%   86.96       0.0510  0.0530  3.77%    18.87
    28      0.0510  0.0528  3.41%    18.94       0.0030  0.0026  -15.38%  384.62
    29      0.0150  0.0155  3.23%    64.52       0.0300  0.0311  3.54%    32.15
    30      0.0210  0.0218  3.67%    45.87       0.0600  0.0617  2.76%    16.21

            Set 1.1                              Set 1.2
    ConNum  Actual  Req.    Error    Req. ICG    Actual  Req.    Error    Req. ICG
    21      0.0550  0.0769  28.48%   13.00       0.0530  0.0658  19.45%   15.20
    22      0.0540  0.0617  12.48%   16.21       0.0530  0.0571  7.18%    17.51
    23      0.0550  0.0806  31.76%   12.41       0.0060  0.0058  -3.45%   172.41
    24      0.0020  0.0015  -33.33%  666.67      0.0550  0.1022  46.18%   9.78
    25      0.0340  0.0345  1.45%    28.99       0.0540  0.0846  36.17%   11.82
    26      0.0540  0.0741  27.13%   13.50       0.0040  0.0041  2.44%    243.90
    27      0.0200  0.0206  2.91%    48.54       0.0100  0.0103  2.91%    97.09
    28      0.0550  0.0827  33.49%   12.09       0.0530  0.0637  16.80%   15.70
    29      0.0400  0.0411  2.68%    24.33       0.0540  0.1303  58.56%   7.67
    30      0.0250  0.0262  4.58%    38.17       0.0530  0.0761  30.35%   13.14

            Set 1.5                              Set 1.8
    ConNum  Actual  Req.    Error    Req. ICG    Actual  Req.    Error    Req. ICG
    21      0.0550  0.1361  59.59%   7.35        0.0110  0.0107  -2.80%   93.46
    22      0.0550  0.1892  70.93%   5.29        0.0480  0.1597  69.94%   6.26
    23      0.0450  0.0467  3.64%    21.41       0.0470  0.1027  54.24%   9.74
    24      0.0540  0.1793  69.88%   5.58        0.0100  0.0095  -5.26%   105.26
    25      0.0540  0.1692  68.09%   5.91        0.0470  0.1766  73.39%   5.66
    26      0.0260  0.0270  3.70%    37.04       0.0470  0.0887  47.01%   11.27
    27      0.0540  0.0993  45.62%   10.07       0.0470  0.1918  75.50%   5.21
    28      0.0110  0.0111  0.90%    90.09       0.0460  0.0790  41.77%   12.66
    29      0.0200  0.0200  0.00%    50.00       0.0460  0.2087  77.96%   4.79
    30      0.0210  0.0220  4.55%    45.45       0.0460  0.1727  73.36%   5.79

When the link capacity is not exceeded (sets 0.9 and 1.0 in the table), class 2 connections are scheduled at rates close to those requested. However, in the presence of local congestion (sets 1.1 and above), rates of some connections are reduced.

It is interesting to analyze how the connection rates are reduced. The aggregate link bandwidth available for class 2 connections is 0.4 and, since there are ten class 2 connections, the fair share of the link is 0.04 per connection. All connections with rates below the fair share, which are called conforming connections, are satisfied at their requested rates. The rates of all conforming connections are then subtracted from the available link bandwidth (0.4 in this example). The nonconforming connections equally share the result of the subtraction.
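The allocation just described can be sketched as a small host-side routine. This is an illustration, not part of the ATMizer II+ code: the function name `fair_share` and its arguments are hypothetical. It also makes one assumption beyond the text: the conforming test is repeated with the recomputed share until no further connection qualifies, so that a connection whose request lies between the initial fair share and the final equal share is not given more than it asked for.

```c
#include <stddef.h>

/*
 * Hypothetical helper: max-min fair allocation of 'link' bandwidth
 * among n connections. req[] holds requested rates, actual[] receives
 * the allocated rates. Rates are normalized to the link rate.
 */
void fair_share(const double *req, double *actual, size_t n, double link)
{
    size_t i;
    size_t left = n;      /* connections still treated as nonconforming */
    double avail = link;  /* bandwidth not yet handed out               */
    int changed = 1;

    for (i = 0; i < n; i++)
        actual[i] = -1.0; /* -1 marks "not yet satisfied" */

    /* Repeat the conforming test until the set of conforming
     * connections stops growing (assumption, see lead-in). */
    while (changed && left > 0) {
        double share = avail / (double)left; /* current fair share */
        changed = 0;
        for (i = 0; i < n; i++) {
            if (actual[i] < 0.0 && req[i] <= share) {
                actual[i] = req[i];          /* served as requested */
                avail -= req[i];
                left--;
                changed = 1;
            }
        }
    }

    /* Nonconforming connections split what remains equally. */
    for (i = 0; i < n; i++)
        if (actual[i] < 0.0)
            actual[i] = avail / (double)left;
}
```

Applied to the requested rates of set 1.1 in Table 3.3 with 0.4 of the link available, connections 24, 25, 27, and 30 qualify in the first pass, connection 29 (0.0411) qualifies in the second, and the remaining five connections each receive about 0.0552, which is in line with the simulated actual rates of 0.0540 to 0.0550.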
A connection is conforming if and only if:

Equation 3.8    $r \le S = \dfrac{L}{N_C}$

where r is the actual rate of the connection, S is the fair share, L is the link’s total bandwidth, and N_C is the total number of connections on the link.

The actual rate of a nonconforming connection, j, may be calculated as follows:

Equation 3.9    $r_j = \dfrac{L - \sum_{i \in K} r_i}{N_C - N_K}$

where r_j is the actual rate of nonconforming connection j, r_i is the actual rate of connection i, K is the set of all conforming connections, and N_K is the number of conforming connections.

3.7.2 List Lengths

Another area of concern in the presence of local congestion is the length of the calendar lists, particularly in Priority mode. Intuitively, one may think that the lists become longer, slowing down Scheduler operations. Since the Scheduler has to scan the lists in Priority mode, the execution time is proportional to the average list length. Fortunately, in this case, the intuition is wrong. In fact, the list lengths average less than one, except for the current cell slot.

To understand how this works, first consider system behavior in the absence of local congestion and assume that the number of active connections does not change. This is equivalent to saying that all scheduled connections have infinite buffers of data to send. When the link utilization is less than 1.0, some cell slots are empty and the calendar table is sparse. As the link utilization increases, more and more slots are nonempty. When the link utilization exceeds 1.0 and the system enters a local congestion state, the list of connections at the current slot starts to grow, creating a wave propagating throughout the calendar table. Slots in front of the wave have the list length decreasing while the list length at the current slot increases.
Simulations described in the previous section were used to extract the average length of connection lists at the current cell slot and the next nine cell slots. As shown in Table 3.4, in the absence of local congestion, the list length at the current slot approaches 1.0 while subsequent slots have lengths proportional to the sum of all rates, called the actual link rate. When the actual link rate increases to the maximum link rate (1.0), the average list length also increases. In the presence of local congestion, the list length of the current slot increases further, while the list lengths at subsequent slots decrease.

Table 3.4 Calendar List Length for Varying Link Utilizations

    Link  Now+0  Now+1  Now+2  Now+3  Now+4  Now+5  Now+6  Now+7  Now+8  Now+9
    0.6   0.95   0.61   0.61   0.61   0.61   0.60   0.60   0.60   0.60   0.60
    0.9   2.39   0.91   0.90   0.90   0.90   0.90   0.90   0.90   0.90   0.89
    1.0   5.13   0.99   0.98   0.97   0.95   0.94   0.92   0.90   0.87   0.86
    1.1   7.62   0.76   0.75   0.75   0.75   0.74   0.73   0.72   0.72   0.71
    1.2   7.98   0.66   0.65   0.65   0.65   0.65   0.64   0.64   0.64   0.64
    1.5   7.96   0.73   0.73   0.72   0.71   0.71   0.71   0.70   0.70   0.69
    1.8   9.15   0.64   0.64   0.64   0.64   0.64   0.63   0.63   0.63   0.63
    2.1   10.10  0.64   0.64   0.64   0.63   0.63   0.63   0.63   0.63   0.63

3.7.3 Detecting a Local Congestion

In some applications, it may be necessary to detect local congestion. This is achieved easily by monitoring the connection lag, that is, the difference between the Theoretical Transmit Time stored in the ACD and the actual time. If the difference exceeds a threshold, the local congestion state is declared.

3.7.4 Minimum Cell Rate Guarantees

One of the ABR parameters is the Minimum Cell Rate (MCR). An ABR connection is guaranteed to be serviced at a rate of at least the MCR even in the presence of network congestion. The guarantee is possible due to the connection-oriented nature of ATM, which verifies that enough bandwidth is available during call setup.
If the network cannot satisfy MCR, the new call is rejected. An ATM end system might also perform a similar verification to ensure that the sum of all MCRs does not exceed the outgoing link bandwidth minus the rates of higher priority connections. Alternatively, this check may be performed by an ingress switch.

Even if the call setup verifications are performed, they are not sufficient to guarantee MCR during run time in the presence of local congestion. To understand that, group connections into two sets, set K with connections that have MCR > 0 and set M with connections that have MCR = 0. In the absence of local congestion, MCRs for set K are respected due to the appropriate rules in the Traffic Management Specifications. In the presence of local congestion, it is preferable that the rates of all nonconforming connections (as defined in Section 3.7.1, “Fairness”) be decreased in a fair manner unless this decrease results in a rate lower than MCR, in which case the rate should be set to MCR. However, the actual rates are limited by Equation 3.9 on page 3-35, which does not take MCR into account. In other words, connections with MCR = 0 take outgoing link bandwidth from connections with MCR > 0, resulting in MCR violations.

One possible solution to this problem is to put set K (connections with MCR > 0) in a separate class with higher priority than set M (connections with MCR = 0). Although simple, this solution results in an unfair situation for set M. The algorithm would then always satisfy set K in full and reduce the rates of set M, while the desired behavior is to reduce rates of both sets K and M unless the decrease results in a rate lower than MCR.

A more complex solution is to dynamically switch connections between two priority classes at run time. For that to work, the Real Cell Rate must be measured.
The Real Cell Rate (RCR) is defined as the rate observed on the outgoing link as opposed to the Allowed Cell Rate (ACR), which is governed by ABR source and destination rules. The measurement may be performed by computing inverses of real ICGs and averaging them over some time. If the Real Cell Rate drops below MCR, the connection is moved to a higher priority class. It returns to a lower priority class when the Real Cell Rate exceeds MCR. Sufficient hysteresis should be introduced to avoid oscillations.

This manual does not contain the C code necessary to implement the MCR guarantees.

3.7.5 MultiPHY Operation

The code described in the previous sections is well suited for a single PHY environment. It also can be used for multiPHY applications without ABR, where local congestion can be avoided by rejecting calls that would result in congestion. Since this is inherently impossible with ABR (there is no notion of average rate), a multiPHY environment with ABR requires enhancements to the basic code.

When a FIFO full strategy is used to pace the invocation of the TxCell() scheduling routine (see Section 3.1.2, “FIFO Full Synchronization”), the ATMizer II+ chip tries to send cells to a PHY FIFO as quickly as possible. When the PHY FIFO becomes full, it paces down the ACI TxFIFO which, in turn, paces down invocation of the scheduling routine. Recall that each invocation of TxCell() results in one cell placed in the ACI TxFIFO.

Consider a situation where there are two outgoing links, A and B. Link A is oversubscribed and link B is not. Since there is only one calendar, the scheduling algorithm puts more cells in the TxFIFO for link A than for link B. This, in turn, results in head-of-line blocking and under-utilization of link B. Note that it is not good enough to have a separate TxFIFO for each PHY device to avoid this problem. The root cause is that there is only one calendar.
To avoid the problem, one calendar table is needed per PHY device. If all PHY devices are synchronized to the same network clock, they can be served in a round-robin way, invoking the TxCell() routine with a different calendar each time.

The CalSwitch command can be used to change the calendar table of the ATMizer II+ hardware Scheduler. The Scheduler modifies the internal pointers such that all subsequent commands are performed on the new calendar. The TxCell() routine needs to be encapsulated in a wrapper routine, SendCell(), that issues the CalSwitch command as shown in Figure 3.19.

Figure 3.19 A MultiPHY TxCell()

    ushort aCell[PHY_NUM];      /* free cell addresses */
    ulong  SCD_Ctrl[PHY_NUM];   /* SCD_Ctrl registers for each calendar */
    ushort CurrentPHY;          /* current index into above tables */

    int SendCell()
    {
        int i;
        ulong SaveTime = TimeNow;

        do {
            aCell[CurrentPHY] = ACI_Free();
            if (aCell[CurrentPHY] == 0)
                return 0;
        } while (++CurrentPHY < PHY_NUM);

        for (i = 0; i < PHY_NUM; i++) {
            TimeNow = SaveTime;
            /* Cal_Switch() is a macro that issues CalSwitch Command */
            Cal_Switch(i);
            TxCell(aCell[i]);
        }
        CurrentPHY = 0;
        return 1;
    }

The code first tries to acquire one free cell location per PHY device and to store it in the aCell array. If this is not possible because the ACI TxFIFO does not have enough free locations, the routine exits with a failure status code. The next invocation then continues free cell acquisition at the place where the previous invocation failed. When enough free cells are acquired, the code invokes the TxCell() routine once per PHY device, each time modifying the calendar base address. The current time must be preserved as each invocation of TxCell() increments it.
In high-speed applications (for example, three DS-3 lines), the memory bandwidth requirements may be reduced if multiple cells for each PHY are built before the calendar table is changed. The code in Figure 3.20 implements this strategy.

Figure 3.20 Enhanced MultiPHY Code

    ushort aCell[PHY_NUM * PHY_BLOCK];     /* free cell addresses */
    ulong  SCD_Ctrl[PHY_NUM * PHY_BLOCK];  /* SCD_Ctrl registers for each calendar */
    ushort CurrentPHY;                     /* current index into above tables */

    int SendCell()
    {
        int i, j;
        ulong SaveTime = TimeNow;

        /* acquire PHY_BLOCK free cells for every PHY */
        do {
            aCell[CurrentPHY] = ACI_Free();
            if (aCell[CurrentPHY] == 0)
                return 0;
        } while (++CurrentPHY < PHY_NUM * PHY_BLOCK);

        for (i = 0; i < PHY_NUM; i++) {
            Cal_Switch(i);
            TimeNow = SaveTime;
            for (j = i * PHY_BLOCK; j < (i + 1) * PHY_BLOCK; j++)
                TxCell(aCell[j]);
        }
        CurrentPHY = 0;
        return 1;
    }

This code is very similar to the previous one with the exception that PHY_BLOCK cells are built for each PHY before the calendar base address is switched.

Since multiple calendars are used for the connections of different PHY devices, the connections need to be rescheduled after data is attached to the VCD in the corresponding calendar. Therefore, the rescheduling code from Section 3.3.5.1, “Connection Rescheduling,” is modified for multicalendar support as shown in Figure 3.21.
Figure 3.21 Buff Completion Queue Interrupt Handler for MultiCalendar Support

    void ServBuffComplQueue()
    {
        ulong N = EDMA_ComplQueue();
        int Cal_No = (ACD[N].ACD_Ctrl >> ACD_CalNo) & 0x3;
        int CurrCal = (Hdr->SCD.CalSwitch) >> 6;
        ulong T;

        if (CurrCal == Cal_No) {
            T = SCD_Now();
            SCD_Sched(N, T + 1);
        }
        else {
            /* Macro to switch calendar */
            CalSwitch(Cal_No);
            T = SCD_Now();
            SCD_Sched(N, T + 1);
            /* restore the calendar that was active on entry */
            CalSwitch(CurrCal);
        }
    }

If the PHY devices are not running on the same network clock and are of different rates, calendar switching can be done such that the number of cells serviced from each calendar (including idle cells) is proportional to the line rate of the PHY device corresponding to the calendar.

Multiple calendars are used in the multiPHY operation to improve the QoS for connections of different PHY devices. Using multiple calendars reduces the head-of-line blocking inherent with a single calendar in multiPHY operation. An analysis of the jitter of CBR connections is included here to illustrate the improvement in the variance of the intercell gap when multiple calendars are used for a multiPHY operation.

The CBR connections are opened on two PHY devices as shown in Table 3.5. The line rate of PHY 0 is OC-3 (155 Mbps) and the line rate of PHY 1 is DS3 (45 Mbps). Note that, for the connections of PHY 1, the intercell gap in the calendar is multiplied by 155/45 since the calendar slot time corresponds to the line rate of OC-3.
Table 3.5 Initial Setup for MultiPHY Connections

    Connection  Calendar  PHY Device  Rate in  Rate in  Intercell
    Number      Number    Number      Cells/s  Mbits/s  Gap (µs)
    1           0         0           176603   74.88    5.66
    2           0         0           176603   74.88    5.66
    3           0         1           7580     3.21     132.09
    4           0         1           7580     3.21     132.09
    5           0         1           7580     3.21     132.09
    6           0         1           7580     3.21     132.09
    7           0         1           7580     3.21     132.09
    8           0         1           7580     3.21     132.09
    9           0         1           7580     3.21     132.09
    10          0         1           7580     3.21     132.09
    11          0         1           7580     3.21     132.09
    12          0         1           7580     3.21     132.09
    13          0         1           7580     3.21     132.09
    14          0         1           7580     3.21     132.09
    15          0         1           7580     3.21     132.09
    16          0         1           7580     3.21     132.09

Two measurements were made on the two PHY 0 connections using an HP E1697A. Table 3.6 shows the variance in the intercell gap for the connections of PHY 0 when a single calendar is used to schedule cells to both the PHY devices.

Table 3.6 PHY 0 Statistics at 155 Mbps with a Single Calendar

    Connection  Calendar  PHY Device  Intercell  Intercell Gap  Rate
    Number      Number    Number      Gap (µs)   Variance (µs)  (Mb/s)
    1           0         0           8.7        15.35          48.74
    2           0         0           8.7        15.35          48.74

As described earlier, when multiple PHY devices are scheduled using the same calendar, the connections on the faster PHY device suffer a greater variance in the intercell gap (and thereby jitter in the rate) due to the head-of-line blocking by the cells belonging to the slower devices.

The same measurements were taken using two calendars, one for each PHY. The results are shown in Table 3.7.

Table 3.7 PHY 0 Statistics at 155 Mbps with Multiple Calendars

    Connection  Calendar  PHY Device  Intercell  Intercell Gap  Rate
    Number      Number    Number      Gap (µs)   Variance (µs)  (Mb/s)
    1           0         0           5.83       0.55           73.1
    2           0         0           5.83       0.55           73.1

It can be seen from the tables that the jitter is substantially improved when using two calendars. Note also that the transmission rate increased to near maximum.
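For PHY devices of different rates, this section states that calendars should be serviced in proportion to the line rates but gives no code for it. One possible way to pick the calendar for each cell slot is a credit-based weighted round robin, sketched below. The names `LineRate`, `Credit`, and `NextCalendar()` are hypothetical, the weights are the nominal OC-3 and DS3 rates in Mbps, and the selection scheme itself is an assumption made for illustration, not the method used by LSI Logic.

```c
#define PHY_NUM 2

/* Hypothetical weights: nominal line rates in Mbps (OC-3 and DS3). */
static const long LineRate[PHY_NUM] = { 155, 45 };

/* Running credit per calendar; starts at zero. */
static long Credit[PHY_NUM];

/*
 * Return the index of the calendar to service in the next cell slot.
 * Every calendar earns credit equal to its line rate each slot; the
 * calendar with the most credit is chosen and pays back the sum of
 * all rates. Over time each calendar is chosen in proportion to its
 * line rate, and the choices are interleaved rather than bunched.
 */
int NextCalendar(void)
{
    long total = 0;
    int i, best = 0;

    for (i = 0; i < PHY_NUM; i++) {
        Credit[i] += LineRate[i];
        total += LineRate[i];
        if (Credit[i] > Credit[best])
            best = i;
    }
    Credit[best] -= total;
    return best;
}
```

In a wrapper in the style of SendCell(), the returned index would be passed to the calendar switch macro before TxCell() is invoked. With the two weights above, 200 consecutive slots are divided into 155 slots for calendar 0 and 45 slots for calendar 1.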
The code listing provided in Section 3.8, “Source Code Listings,” does not include multiPHY operation. Local Congestion 3-43 BookL64364PG.fm5 Page 44 Friday, January 28, 2000 4:58 PM 3.8 Source Code Listings The remainder of this section provides sample listings for all of the ATMizer II+ code developed for topics described in this chapter. The code is composed of the following files: • uTypes.h defines basic types • ATMizer2.h main header file • Hdr.h all hardware definitions • Instr.h definitions of CW4010 extended instructions • ABR.h declarations specific to ABR • Cell.c main routine for sending and receiving cells • CBR.c handles CBR and UBR traffic • VBR.c handles VBR traffic • ABR.c handles ABR traffic 3.8.1 Macros and Types Header File (uTypes.h) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 3-44 /* $Id: uTypes.h,v 1.3 1996/06/04 22:16:59 zhifeng Exp $ */ /* -----------------------------------------------------------* ATMizer-2 * Copyright (C) 1995–1999 LSI Logic Corporation * * uTypes.h - Main include file defining basic types and macros * * -----------------------------------------------------------*/ #ifndef _UTYPES_H #define _UTYPES_H typedef unsigned long ulong; typedef unsigned short ushort; typedef unsigned char uchar; #define U16 0x0000ffff /* -----------------------------------------* Macros to access a data element of different type */ #define byte(x) ( *( (uchar *) &(x)) ) #define half(x) ( *( (ushort *) &(x)) ) #define word(x) ( *( (ulong *) &(x)) ) Scheduling BookL64364PG.fm5 Page 45 Friday, January 28, 2000 4:58 PM 26 27 28 29 30 31 /* -----------------------------------------* Macro to create bit masks */ #define one(x) ((ulong) 1 << (ulong) (x)) #endif 3.8.2 ATMizer II+ Header File (ATMizer2.h) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 /* $Id: ATMizer2.h,v 1.13 1996/08/06 22:15:42 thomasd Exp $ */ /* 
 * ------------------------------------------------------------
 * ATMizer-2
 * Copyright (C) 1995–1999 LSI Logic Corporation
 *
 * ATMizer2.h - Main include file for L64364
 * ------------------------------------------------------------*/
#ifndef _ATMIZER2_H
#define _ATMIZER2_H

/* ------------------------------------------------------------
 * MACRO DEFINITIONS
 */

/*
 * Fractional part of time, currently 24.8 is recommended
 * If you increase this value, make sure that your ICG
 * can accommodate the maximum value
 */
#define TIME_FRAC 8

/* Calendar size mask. This value can be adjusted according
 * to the calendar size.
 */
#define CAL_SIZE_MASK 63

/*
 * Value to use for Cell Descriptor to send an explicit
 * idle cell
 */
#define CDS_IDLE (4 << 10)

/*
 * Test to avoid scheduling connections in the past
 */
#define IsInPast(T) ((long)TimeNow - (long)T + (1 << TIME_FRAC) > 0)

/*
 * Extract SCD_BuffPres and SCD_Class fields from
 * a value returned by SCD_Serv()
 */
#define ConHasData(N) ((long) N < 0)
#define ConClass(N) ( ((N) >> 16) & 3)

/*
 * Check the PTI field of a cell header
 */
#define PTI_HDR (7 << 1)
#define PTI_RM (6 << 1)

/* ------------------------------------------------------------
 * TYPE DECLARATIONS
 */

/* ------------------------------------------------------------
 * Receive and transmit rings
 */
typedef struct {
    ulong *Ptr;
    ulong *Base;
    ulong *End;
} Ring_t, *pRing_t;

/* ------------------------------------------------------------
 * Statistics vector
 */
typedef struct {
    ulong RxCells;       /* received cells */
    ulong TxCells;       /* transmitted cells */
    ulong RxPDU;         /* received PDUs */
    ulong TxPDU;         /* transmitted PDUs */
    ulong ErrCrc;        /* received CRC-errored PDUs */
    ulong ErrLength;     /* received length-errored PDUs */
    ulong ErrAbort;      /* received aborted (zero length) PDUs */
    ulong ErrLowMem;     /* one free buffer list is empty */
    ulong ErrNoContBuff; /* received partially built PDUs */
    ulong ErrNoMem;      /* both free buffer lists are empty */
    ulong ErrNoData;     /* no buffer is attached to VCD */
    ulong ErrTimeout;    /* received aborted (timeout) PDUs */
    ulong ErrRxLost;     /* received lost cells */
    ulong ErrConNum;     /* wrong connection number */
    ulong ErrCrc10;      /* errored (crc10) RM cells */
} Stat_t, *pStat_t;

/* ------------------------------------------------------------
 * ATM cell in Cell Buffer. This declaration does not support tag bytes
 */
typedef struct {
    ulong CDS;
    ulong CellHdr;
    uchar Payd[48];
} Cell_t, *pCell_t;

/* ------------------------------------------------------------
 * APU Connection descriptor.
 * The size depends on the QoS supported.
 * CBR, UBR : 8 bytes
 * CBR, VBR, UBR : 20 bytes
 * CBR, VBR, UBR and ABR : 32 bytes
 *
 * To simplify the coding (and speed array index calculations)
 * we will always use 32 bytes per connection.
 *
 * ABR specific data is declared in ABR.h, here we only declare
 * descriptors for other QoS. In all cases, the first two words
 * are the same (ICG and ThTxTime)
 * The following 3 words are used by VBR only.
 *
 * The first word (ICG) also stores the connection ‘Scheduled’ flag
 * on bit 31, while the ICG uses bits 23:0 and bits 30:24 are unused
 * and all zero. Instead of using bitfields for this data structure
 * (for example like that:
 *     typedef struct {
 *         ulong Scheduled:1,
 *               ICG:30;
 * which is quite inefficient), we define access macros:
 * ACD_Sched(ACD[N]) - sets connection state to ‘Scheduled’
 * ACD_UnSched(ACD[N]) - sets connection state to ‘not Scheduled’
 * ACD_IsSched(ACD[N]) - returns true if connection is scheduled
 *
 * Since the value of ICG is only used when connection is scheduled
 * we store state ‘Scheduled’ as bit 31 = 0, which avoids clearing
 * this bit every time ICG is fetched.
 */
typedef struct {
    ulong ICG;         /* in 16.8 format */
    ulong ThTxTime;
    /* The following declarations are for VBR connections only */
    ulong Bucket;      /* Current Bucket contents */
    ulong Increment;   /* Bucket Increment each time a cell is sent */
    ulong Limit;       /* Bucket limit */
    ulong ICG_PCR;     /* 1/PCR */
    uchar Pad[32-6*4];
    ushort ACD_Ctrl;
} ACD_t, *pACD_t;

#define ACD_Sched(x)   byte(x) = 0
#define ACD_UnSched(x) byte(x) = 0x80
#define ACD_IsSched(x) ( *( (long *) &(x)) >= 0 )

/* ------------------------------------------------------------
 * Type of CBR/VBR/ABR/UBR_send functions
 */
typedef int QoS_Send_t(const ulong, const ulong);

/* ------------------------------------------------------------
 * Out of rate cell in cell buffer
 */
typedef struct {
    ushort aCell;
    ushort ConNum;
} OutOfRate_t;

/* ------------------------------------------------------------
 * EXTERNAL DECLARATIONS
 */

/* ------------------------------------------------------------
 * Declaration of data structures located in external memory
 */
extern pACD_t ACD;
extern pVCD_t VCD;
extern pBFD_t BFD;

/* ------------------------------------------------------------
 * Declaration of global
 * variables located in Data RAM
 */
extern ulong       TimeNow;
extern Ring_t      TxRing;
extern Ring_t      RxRing;
extern OutOfRate_t OutOfRate;
extern Stat_t      Stat;
extern ulong*      HCD_MsgBase;
extern ulong*      Stats_MsgBase;
extern ulong*      APU2Host_Mbx;

/* ------------------------------------------------------------
 * Declarations of global functions
 */
extern void Initialize(void);
extern QoS_Send_t CBR_Send, VBR_Send, ABR_Send, UBR_Send;
extern int ABR_Receive(const ulong, const ulong);

extern void ComplMsg(const ulong, const pRing_t);
extern void BuffMsg(void);
extern void HostMsg(const ulong);
extern void RxCell(const ulong);
extern void TxCell(const ulong);
extern void BFS_Error(const ulong);
extern ulong GetRing(pRing_t);
extern ulong PutRing(pRing_t, ulong);
extern ulong iramSize(ulong, ulong);
extern void setDram(const ulong, const ulong);
extern void setIram(const ulong, const ulong);
extern void loadIram(const ulong, const ulong, const ulong);

#endif

3.8.3 ATMizer II+ Hardware Header File (Hdr.h)

/* $Id: Hdr.h,v 1.4 1996/06/04 22:16:05 zhifeng Exp $ */
/* ------------------------------------------------------------
 * ATMizer-2
 * Copyright (C) 1995–1999 LSI Logic Corporation
 *
 * Hdr.h - Declarations for all the L64364 hardware resources
 *
 * ------------------------------------------------------------*/
#ifndef _HDR_H_
#define _HDR_H_

/*_____________________________________________________________
 *
 * HARDWARE REGISTERS MAP
 *_____________________________________________________________
 */

/* EDMA memory mapped registers */
typedef struct EDMA_Reg_s {
    /* ItemName;          Offs, Size, R/W, Description */
    ulong TxCompl;     /* 0x00, 32, R, Read Transmit Completion Queue. */
    ulong TxConNum;    /* 0x04, 32, R/W, Connection number for the TxCell command. */
    ulong TxCell;      /* 0x08, 32, R/W, Issue a TxCell command. */
    ulong pad1;        /* 0x0c, 32, N/A, Padding bits. */
    ulong TxConAct;    /* 0x10, 32, R, Current ConNum processed by TxCell processor. */
    ulong pad2[11];    /* 0x14, 11*32, N/A, Padding bits. */
    ulong RxCompl;     /* 0x40, 32, R, Read Receive Completion Queue. */
    ulong RxConNum;    /* 0x44, 32, R/W, Connection number for the RxCell command. */
    ulong RxCell;      /* 0x48, 32, R/W, Issue a RxCell command. */
    ulong pad3;        /* 0x4c, 32, N/A, Padding bits. */
    ulong RxConAct;    /* 0x50, 32, R, Current ConNum processed by RxCell processor. */
    ulong pad4;        /* 0x54, 32, N/A, Padding bits. */
    ushort RxBuffOffs; /* 0x58, 16, R/W, Offset for the receive Buffers payload. */
    ushort pad5;       /* 0x5a, 16, N/A, Padding bits. */
    ulong pad6[9];     /* 0x5c, 9*32, N/A, Padding bits. */
    ulong Buff;        /* 0x80, 32, R/W, Issue a buff command. */
    ulong pad7;        /* 0x84, 32, N/A, Padding bits. */
    ulong ConReAct;    /* 0x88, 32, R, Buff processor connection Reactivation message. */
    ulong pad8;        /* 0x8c, 32, N/A, Padding bits. */
    ulong BuffConAct;  /* 0x90, 32, R, Current ConNum processed by Buff processor. */
    ushort LBuff;      /* 0x94, 16, R/W, head of Large Free Buffer lists. */
    ushort SBuff;      /* 0x96, 16, R/W, head of Small Free Buffer lists. */
    ushort TxBuffOffs; /* 0x98, 16, R/W, Offset for the transmit Buffers payload. */
    ushort pad9;       /* 0x9a, 16, N/A, Padding bits. */
    ulong pad10;       /* 0x9c, 32, N/A, Padding bits. */
    ulong MoveSrc;     /* 0xa0, 32, R/W, Program source address for a move command. */
    ulong MoveDst;     /* 0xa4, 32, R/W, Program destination address for a move command. */
    ushort MoveCount;  /* 0xa8, 16, R/W, Program the byte count and issue a move command. */
    ushort pad11;      /* 0xaa, 16, N/A, Padding bits. */
    ulong pad12[5];    /* 0xac, 5*32, N/A, Padding bits.
                        */
    ushort Ctrl;       /* 0xc0, 16, R/W, EDMA control bits. */
    ushort pad13;      /* 0xc2, 16, N/A, Padding bits. */
    ushort Status;     /* 0xc4, 16, R, Check the EDMA status. */
    ushort pad14;      /* 0xc6, 16, N/A, Padding bits. */
    ushort LBuffSize;  /* 0xc8, 16, R/W, Size of large buffer in bytes. */
    ushort SBuffSize;  /* 0xca, 16, R/W, Size of small buffer in bytes. */
    ushort VCD_Base;   /* 0xcc, 16, R/W, Base address of the VC Descriptor Table. */
    ushort pad15;      /* 0xce, 16, N/A, Padding bits. */
    ushort BFD_LBase;  /* 0xd0, 16, R/W, Local Base address of Buffer Descriptor Table. */
    ushort BFD_FBase;  /* 0xd2, 16, R/W, Far Base address of Buffer Descriptor Table. */
} EDMA_Reg_t, *pEDMA_Reg_t;

/* ACI memory mapped registers */
typedef struct ACI_Reg_s {
    /* ItemName;        Offs, Size, R/W, Init, Description */
    ushort Ctrl;     /* 0x00, 16, R/W, Y, ACI Control field. */
    ushort FreeList; /* 0x02, 16, R/W, Y, Beginning of free cell list. */
    uchar TxTimer;   /* 0x04, 8, R/W, Y, Transmit time-out. */
    uchar TxSize;    /* 0x05, 8, R/W, Y, Maximum number of cells in Transmit Fifo. */
    uchar TxLimit;   /* 0x06, 8, R/W, Y, Num of cells in TxFifo to generate an interrupt. */
    uchar RxLimit;   /* 0x07, 8, R/W, Y, Num of cells in RxFifo to generate an interrupt. */
    ulong RxMask;    /* 0x08, 32, R/W, Y, Receive polling mask. */
    ushort Free;     /* 0x0c, 16, R/W, Get or return a free cell location. */
    ushort pad1;     /* 0x0e, 16, N/A, Padding bits. */
    ushort RxRead;   /* 0x10, 16, R, Get cell from Receive Fifo. */
    ushort pad2;     /* 0x12, 16, N/A, Padding bits. */
    ushort TxWrite;  /* 0x14, 16, W, Put cell in Transmit Fifo. */
    ushort pad3;     /* 0x16, 16, N/A, Padding bits. */
    uchar RxCels;    /* 0x18, 8, R, Number of cells in the Receive Fifo. */
    uchar pad4;      /* 0x19, 8, N/A, Padding bits. */
    uchar TxCels;    /* 0x1a, 8, R, Number of cells in the Transmit Fifo. */
    uchar pad5;      /* 0x1b, 8, N/A, Padding bits. */
    ushort Error;    /* 0x1c, 16, R, Get a cell from the Fifo. */
    ushort pad6;     /* 0x1e, 16, N/A, Padding bits. */
} ACI_Reg_t, *pACI_Reg_t;

/* Timer Unit memory mapped registers */
typedef struct TIM_Reg_s {
    /* ItemName;        Offs, Size, R/W, Init, Description */
    ulong TimeStamp;  /* 0x00, 32, R/W, 0, Time Stamp Counter. */
    uchar Timer1;     /* 0x04, 8, R/W, Y, Timer Value. */
    uchar pad1;       /* 0x05, 8, N/A, Padding bits. */
    uchar TimerInit1; /* 0x06, 8, R/W, 0, Timer Initialization value. */
    uchar pad2;       /* 0x07, 8, N/A, Padding bits. */
    uchar Timer2;     /* 0x08, 8, R/W, Y, Timer Value. */
    uchar pad3;       /* 0x09, 8, N/A, Padding bits. */
    uchar TimerInit2; /* 0x0a, 8, R/W, 0, Timer Initialization value. */
    uchar pad4;       /* 0x0b, 8, N/A, Padding bits. */
    uchar Timer3;     /* 0x0c, 8, R/W, Y, Timer Value. */
    uchar pad5;       /* 0x0d, 8, N/A, Padding bits. */
    uchar TimerInit3; /* 0x0e, 8, R/W, 0, Timer Initialization value. */
    uchar pad6;       /* 0x0f, 8, N/A, Padding bits. */
    uchar Timer4;     /* 0x10, 8, R/W, Y, Timer Value. */
    uchar pad7;       /* 0x11, 8, N/A, Padding bits. */
    uchar TimerInit4; /* 0x12, 8, R/W, 0, Timer Initialization value. */
    uchar pad8;       /* 0x13, 8, N/A, Padding bits. */
    uchar Timer5;     /* 0x14, 8, R/W, Y, Timer Value. */
    uchar pad9;       /* 0x15, 8, N/A, Padding bits. */
    uchar TimerInit5; /* 0x16, 8, R/W, 0, Timer Initialization value. */
    uchar pad10;      /* 0x17, 8, N/A, Padding bits. */
    uchar Timer6;     /* 0x18, 8, R/W, Y, Timer Value. */
    uchar pad11;      /* 0x19, 8, N/A, Padding bits. */
    uchar TimerInit6; /* 0x1a, 8, R/W, 0, Timer Initialization value. */
    uchar pad12;      /* 0x1b, 8, N/A, Padding bits. */
    uchar Timer7;     /* 0x1c, 8, R/W, Y, Timer Value. */
    uchar pad13;      /* 0x1d, 8, N/A, Padding bits. */
    uchar TimerInit7; /* 0x1e, 8, R/W, 0, Timer Initialization value. */
    uchar pad14;      /* 0x1f, 8, N/A, Padding bits. */
    uchar Enable;     /* 0x20, 6, R/W, Y, Timer-out enable. */
    uchar pad15[3];   /* 0x21, 3*8, N/A, Padding bits. */
    uchar Clear;      /* 0x24, 6, W, Timer-out clear. */
    uchar pad16[3];   /* 0x25, 3*8, N/A, Padding bits. */
    ulong ClockSel;   /* 0x28, 32, W, Y, Timer clock selection. */
} TIM_Reg_t, *pTIM_Reg_t;

/* Scheduler Unit memory mapped registers */
typedef struct SCD_Reg_s {
    /* ItemName;      Offs, Size, R/W, Description */
    ulong Ctrl;     /* 0x00, 32, R/W, Control register. */
    ushort pad1;    /* 0x04, 16, N/A, Padding bits. */
    ushort CalSize; /* 0x06, 16, R/W, Size of the Calendar Table. */
    ushort pad2;    /* 0x08, 16, N/A, Padding bits. */
    ushort Now;     /* 0x0a, 16, R/W, Current cell slot pointer. */
    ulong Serv;     /* 0x0c, 32, R, execute service command. */
    ulong Sched;    /* 0x10, 32, W, execute schedule command. */
    ulong pad3;     /* 0x14, 32, N/A, Padding bits. */
    ulong Tic;      /* 0x18, 32, W, execute tic command. */
} SCD_Reg_t, *pSCD_Reg_t;

/* APU memory mapped registers */
typedef struct APU_Reg_s {
    /* ItemName;       Offs, Size, R/W, Description */
    ulong AddrMap;   /* 0x00, 32, R/W, Memory mapping register. */
    ushort pad1;     /* 0x04, 16, N/A, Padding bits. */
    ushort Watchdog; /* 0x06, 16, R/W, APU watchdog timer value. */
    ulong Srl;       /* 0x08, 32, R/W, Read a word from a serial EPROM.
                      */
    ushort pad2;       /* 0x0c, 16, N/A, Padding bits. */
    ushort VIntEnable; /* 0x0e, 16, R/W, Interrupt mask. */
    ulong VIntBase;    /* 0x10, 32, R/W, Interrupt base address. */
    ulong Status;      /* 0x14, 32, R, System status bits. */
} APU_Reg_t, *pAPU_Reg_t;

/* PORT Controller memory mapped registers */
typedef struct PC_Reg_s {
    /* ItemName;        Offs, Size, R/W, Description */
    uchar pad1[3];    /* 0x00, 3*8, N/A, Padding bits. */
    uchar PP_Ctrl;    /* 0x03, 8, R/W, Primary Port control. */
    ulong PP_RxMbx;   /* 0x04, 32, R, Input Mailbox (host -> APU). */
    ulong PP_TxMbx;   /* 0x08, 32, R/W, Output Mailbox (APU -> host). */
    ulong pad2[29];   /* 0x0c, 29*32, N/A, Padding bits. */
    ulong SP_Ctrl;    /* 0x80, 32, R/W, Secondary Port control register. */
    ulong SP_SDRAM;   /* 0x84, 32, R/W, SDRAM control register. */
    ulong SP_Refresh; /* 0x88, 32, R/W, SDRAM refresh register. */
} PC_Reg_t, *pPC_Reg_t;

/* Reserved for hardware registers external to ATMizer-II+ CWM. */
typedef struct EXT_Reg_s {
    uchar Extern[1024]; /* Hardware registers for external modules. */
} EXT_Reg_t, *pEXT_Reg_t;

/* ATMizer-II+ CWM hardware register map */
typedef struct Hdr_Reg_s {
    /* ItemName;               Size, VirAddr, Description */
    EDMA_Reg_t volatile EDMA; /* 256 bytes, b8000000, Hardware registers for EDMA. */
    uchar pad1[256 - sizeof(EDMA_Reg_t)];
    ACI_Reg_t volatile ACI;   /* 256 bytes, b8000100, Hardware registers for ACI. */
    uchar pad2[256 - sizeof(ACI_Reg_t)];
    SCD_Reg_t volatile SCD;   /* 128 bytes, b8000200, Hardware registers for SCHEDULER. */
    uchar pad4[128 - sizeof(SCD_Reg_t)];
    TIM_Reg_t volatile TIM;   /* 128 bytes, b8000280, Hardware registers for TIMER. */
    uchar pad3[128 - sizeof(TIM_Reg_t)];
    APU_Reg_t volatile APU;   /* 256 bytes, b8000300, Hardware registers for APU. */
    uchar pad5[256 - sizeof(APU_Reg_t)];
    PC_Reg_t volatile PC;     /* 256 bytes, b8000400, Hardware registers for Port Controller. */
    uchar pad6[256 - sizeof(PC_Reg_t)];
    EXT_Reg_t volatile EXT;   /* 1k bytes, b8000800, Hardware registers for external modules. */
} Hdr_t, *pHdr_t;

/*_____________________________________________________________
 *
 * HARDWARE REGISTER DEFINITIONS
 *_____________________________________________________________
 */

/* -------------------------------------------------------
 * Buffer Status bits, returned in a Completion Queue
 */
#define BFS_ErrAll        31
#define BFS_ConNumRet     30
#define BFS_BuffCont      29
#define BFS_DirTx         28
#define BFS_ErrNoData     27
#define BFS_ErrNoMem      26
#define BFS_ErrNoContBuff 25
#define BFS_ErrLowMem     24
#define BFS_ErrAbort      23
#define BFS_ErrLength     22
#define BFS_ErrCrc        21
#define BFS_BuffFree      17
#define BFS_BuffLarge     16

/* ----------------------------------------
 * EDMA Control register EDMA_Ctrl
 */
typedef struct {
    ushort Res1:4,
           RxBFD_Far:1,
           RxBFD_Copy:1,
           TxBFD_Far:1,
           TxBFD_Copy:1,
           Res2:2,
           ByteSwap:1,
           OrHdr:1,
           ConReAct:1,
           UU:1,
           RxCopy:1,
           TxCopy:1;
} EDMA_Ctrl_t;

#define EDMA_RxBFD_Far   11
#define EDMA_RxBFD_Copy  10
#define EDMA_TxBFD_Far   9
#define EDMA_TxBFD_Copy  8
#define EDMA_ByteSwap    5
#define EDMA_OrHdr       4
#define EDMA_ConReAct    3
#define EDMA_UU          2
#define EDMA_RxBuffCopy  1
#define EDMA_TxBuffCopy  0

/* -------------------------------------------------------
 * Buffer Descriptor control bits
 */
typedef struct {
    ushort BuffCont:1,
           EFCI:1,
           CLP:1,
           BuffFree:1,
           BuffLarge:1,
           ErrAbort:1,
           ErrLength:1,
           ErrCrc:1,
           ConNumMSB:8;
} BFD_Ctrl_t;

#define BFD_BuffCont  15
#define BFD_EFCI      14
#define BFD_CLP       13
#define BFD_BuffFree  12
#define BFD_BuffLarge 11
#define BFD_ErrAbort  10
#define BFD_ErrLength 9
#define BFD_ErrCrc    8

/* ----------------------------------------
 * Buffer Descriptor (BFD)
 */
typedef struct {
    BFD_Ctrl_t BFD_Ctrl;
    ushort ConNum;    /* Connection number to which buffer belongs */
    ushort BuffSize;  /* Size of the buffer */
    ushort NextBFD;   /* Index to next BFD in the list */
    ulong pBuffData;  /* pointer to the payload */
    ushort UU_CPI;
    ushort BuffNum;
} BFD_t, *pBFD_t;

/* ------------------------
 * Control field of the VCD
 */
typedef struct {
    ushort BuffPres:1,  /* connection is open */
           ConAct:1,    /* connection active status */
           BuffCont:1,
           BuffFree:1,
           BuffLarge:1,
           BuffDone:1,
           EFCI:1,
           CLP:1,
           PHY:5,       /* address of physical device to use */
           CellHold:1,  /* do not send out cell */
           AAL0:1,      /* AAL0 mode of operation */
           DirTx:1;     /* set for Tx, cleared for Rx */
} VCD_Ctrl_t;

#define VCD_BuffPres  15
#define VCD_ConAct    14
#define VCD_BuffCont  13
#define VCD_BuffFree  12
#define VCD_BuffLarge 11
#define VCD_BuffDone  10
#define VCD_EFCI      9
#define VCD_CLP       8

#define VCD_CellHold  2
#define VCD_AALO      1
#define VCD_DirTx     0

/* -------------------------------
 * auxiliary control word for AAL0
 */
typedef struct {
    ulong Tbytes:6,
          Crc10:1,
          Reserved:1,
          Offs:6,
          Unused:18;
} AAL0_Ctrl_t;

/* --------------------------------
 * ATM cell header
 */
typedef struct {
    ulong VPI:12,
          VCI:16,
          PTI:1,   /* only one PTI bit is named as such */
          EFCI:1,
          EOM:1,
          CLP:1;
} CellHdr_t;

#define CELL_EFCI 2
#define CELL_EOM  1
#define CELL_CLP  0

/* --------------------------------
 * Virtual Circuit Descriptor (VCD)
 */
typedef struct {
    ushort Class;        /* used by the scheduler */
    ushort NextVCD;      /* used by the scheduler */
    VCD_Ctrl_t VCD_Ctrl; /* VCD control bits */
    ushort Nbytes;       /* num of bytes processed */
    ulong pBuffData;     /* pointer to curr buffer */
    AAL0_Ctrl_t Crc32;   /* partial CRC-32 result */
    CellHdr_t CellHdr;   /* Cell Header, tx only */
    ushort BuffSize;     /* size of the current buffer */
    ushort PayldLen;     /* total length of frame */
    ushort TailBFD;      /* index to tail BFD */
    ushort NextBFD;      /* index to next BFD */
    ushort CurrBFD;      /* index to current BFD */
    ushort UU_CPI;
} VCD_t, *pVCD_t;

/* --------------------------------
 * Cell Descriptor in Cell Buffer
 */
typedef struct {
    ulong Next:16,
          Tbytes:6,
          Crc10:1,
          Par:1,
          BOM:1,
          EOM:1,
          Len:1,
          PHY:5;
} CDS_t;

#define CDS_Tbytes 10
#define CDS_Crc10  9
#define CDS_Par    8
#define CDS_BOM    7
#define CDS_EOM    6
#define CDS_Len    5
#define CDS_PHY    0

/* --------------------------------
 * EDMA Status register
 */
typedef struct {
    ushort RxCellComplFull:1,
           TxCellComplFull:1,
           BuffComplFull:1,
           MoveRxPend:1,
           RxCellMsg:1,
           TxCellMsg:1,
           BuffMsg:1,
           MoveBuffPend:1,
           RxCellReqFull:1,
           TxCellReqFull:1,
           BuffReqFull:1,
           MoveReqFull:1,
           RxCellBusy:1,
           TxCellBusy:1,
           BuffBusy:1,
           MoveBusy:1;
} EDMA_Status_t;

#define EDMA_MoveBusy        0
#define EDMA_BuffBusy        1
#define EDMA_TxCellBusy      2
#define EDMA_RxCellBusy      3
#define EDMA_MoveReqFull     4
#define EDMA_BuffReqFull     5
#define EDMA_TxCellReqFull   6
#define EDMA_RxCellReqFull   7
#define EDMA_MoveBuffPend    8
#define EDMA_BuffMsg         9
#define EDMA_TxCellMsg       10
#define EDMA_RxCellMsg       11
#define EDMA_MoveRxPend      12
#define EDMA_BuffComplFull   13
#define EDMA_TxCellComplFull 14
#define EDMA_RxCellComplFull 15

/* ------------------------------------------------------------
 * APU_AddrMap register
 */
typedef struct {
    ulong Reset:1,
          Boot:2,
          SecMSB:5,
          Res1:3,
          PriMSB:5,
          IntAck:6,
          Res2:3,
          ExcMap:7;
} APU_AddrMap_t;

/* ------------------------------------------------------------
 * APU_Status register
 */
typedef struct {
    ulong MbxFull:1,
          Res1:3,
          NowBusy:1,
          TicBusy:1,
          SchedBusy:1,
          ServBusy:1,
          Res2:7,
          Watchdog:1,
          EDMA_RxCelFull:1,
          ACI_RxFull:1,
          RxMbx:1,
          EDMA_TxCellFull:1,
          EDMA_RxCell:1,
          ACI_Rx:1,
          EDMA_TxCell:1,
          EDMA_Buff:1,
          ACI_Err:1,
          ACI_Tx:1,
          IntExt:2,
          IntTim:4;
} APU_Status_t;

/*_____________________________________________________________
 *
 * ACCESS MACROS
 *_____________________________________________________________
 */
#define ACI_Send(x)    Hdr->ACI.TxWrite = (x)
#define ACI_GetFree()  Hdr->ACI.Free
#define ACI_Free(x)    Hdr->ACI.Free = (x)
#define ACI_RxRead()   Hdr->ACI.RxRead

#define EDMA_TxCell(x, y) {Hdr->EDMA.TxConNum = (x); Hdr->EDMA.TxCell = (y);}
#define EDMA_RxCell(x, y) {Hdr->EDMA.RxConNum = (x); Hdr->EDMA.RxCell = (y);}
#define EDMA_Status() Hdr->EDMA.Status
#define EDMA_TxCompl() Hdr->EDMA.TxCompl
#define EDMA_RxCompl() Hdr->EDMA.RxCompl
#define EDMA_BuffConReAct() Hdr->EDMA.ConReAct
#define EDMA_Buff(x) Hdr->EDMA.Buff = (x)

#define SCD_Sched(x, y) Hdr->SCD.Sched = ( (x) << 16 | (y) )
#define SCD_Serv()      Hdr->SCD.Serv
#define SCD_Tic()       Hdr->SCD.Tic = 0
#define SCD_Now()       Hdr->SCD.Now
#define SCD_SetNow(x)   Hdr->SCD.Now = (x)

#define PP_RxMbx()      Hdr->PC.PP_RxMbx

/*_____________________________________________________________
 *
 * EXTERNAL DECLARATIONS FOR HARDWARE RESOURCES
 *_____________________________________________________________
 */
extern pHdr_t  Hdr;
extern uchar * CBM;

#endif

3.8.4 Extended Instructions Header File (Instr.h)

/* $Id: Instr.h,v 1.2 1996/06/04 22:16:18 zhifeng Exp $ */
/* ------------------------------------------------------------
 * ATMizer-2
 * Copyright (C) 1995–1999 LSI Logic Corporation
 *
 * Instr.h - Declarations for the CW4010 extended instructions
 * suitable for ATM Forum defined rate calculation.
 * These declarations can be used with a GNU GCC compiler
 *
 * ------------------------------------------------------------*/
#ifndef _INSTR_H
#define _INSTR_H

#define maxi(a, b) \
({ int __z, __a = (a), __b = (b); \
   __asm__ ("maxi %0,%1,%2" : "=r" (__z) : "r" (__a), "r"(__b) ); \
   __z; })

#define mini(a, b) \
({ int __z, __a = (a), __b = (b); \
   __asm__ ("mini %0,%1,%2" : "=r" (__z) : "r" (__a), "r"(__b) ); \
   __z; })

#define rmul(a, b) \
({ int __z, __a = (a), __b = (b); \
   __asm__ ("rmul %1,%2\n\tmflo %0" : \
            "=r" (__z) : "r" (__a), "r"(__b) : "h", "l" ); \
   __z; })

#define radd(a, b) \
({ int __z, __a = (a), __b = (b); \
   __asm__ ("radd %1,%2\n\tmflo %0" : \
            "=r" (__z) : "r" (__a), "r"(__b) : "h", "l" ); \
   __z; })

#define rsub(a, b) \
({ int __z, __a = (a), __b = (b); \
   __asm__ ("rsub %1,%2\n\tmflo %0" : \
            "=r" (__z) : "r" (__a), "r"(__b) : "h", "l" ); \
   __z; })

#define r2u(a) \
({ int __z, __a = (a); \
   __asm__ ("r2u %1\n\tmflo %0" : \
            "=r" (__z) : "r" (__a) : "h", "l" ); \
   __z; })

#define u2r(a) \
({ int __z, __a = (a); \
   __asm__ ("u2r %1\n\tmflo %0" : \
            "=r" (__z) : "r" (__a) : "h", "l" ); \
   __z; })

#endif

3.8.5 ABR Functions Header File (ABR.h)

/* $Id: ABR.h,v 1.5 1996/07/03 01:04:30 zhifeng Exp $ */
/* ------------------------------------------------------------
 * ATMizer-2
 * Copyright (C) 1995 LSI Logic Corporation
 *
 * Available Bit Rate - Source and Destination End System Behavior
 *
 * ABR.h - Header file for ABR functions
 *
 * ------------------------------------------------------------*/
#ifndef _ABR_H
#define _ABR_H

typedef struct {
    /*  0 */ ulong ICG;           /* Inter-Cell-Gap, in fractional format */
    /*  4 */ ulong ThTxTime;      /* Theoretical Transmit Time in fract format */
    /*  8 */ ulong LastTimeFRM;   /* Last Time a Forward RM cell was sent */
    /* 12 */ uchar logRIF:4,
                   logRDF:4;
    /* 13 */ uchar CRM;           /* limit of FRM in absence of BRM */
    /* 14 */ uchar FRM_SinceBRM;  /* count of FRM since last received BRM */
    /* 15 */ uchar InRateCell;    /* Count of In-Rate cells since last FRM */
    /* 16 */ ushort ACR;          /* Allowed Cell Rate */
    /* 18 */ ushort MCR;          /* Minimum Cell Rate */
    /* 20 */ ushort ICR;          /* Initial Cell Rate */
    /* 22 */ ushort PCR;          /* Peak Cell Rate */
    /* 24 */ ushort PVec;         /* binary flags and some service parameters */
    /* 26 */ ushort BRM_ER;       /* BRM: Explicit Rate */
    /* 28 */ ushort BRM_CCR;      /* BRM: Current Cell Rate */
    /* 30 */ ushort BRM_MCR;      /* BRM: Minimum Cell Rate */
} ABR_t, *pABR_t;

#define LCR (149.76e6/8/53) /* Line Cell Rate */

/* ------------------------------------------------------------
 * ABR constant service parameters
 */
#define logNRM 5
#define NRM    (1 << logNRM)             /* default 32 */
#define TRM    ( (ulong) (0.1 * LCR) )   /* default 100 ms */
#define MRM    2
#define TCR    ((1 << RATE_NZ) | (3 << RATE_EXP) | 128 )
                                         /* decimal 10 in ABR rate format */

/* ------------------------------------------------------------
 * Definition of PVec bitfield.
 * Bits 1-0 store binary state variables
 * Bits 4-2 store 3 bits from MSG field of BRM
 * Bits 15-5 store optionally negotiable service parameters.
 *
 * 0    LastWasFRM
 * 1    PresBRM
 * 2    BRM_NI
 * 3    BRM_CI
 * 4    BRM_BN
 * 7:5  logCDF
 * 15:8 ADTF (TM specs require 10 bits here, only 8 fit)
 *
 * CDF has a default of 1/16 and is optionally negotiated
 * If you need to negotiate that value, use the definition below
 * #define logCDF ((pABR->PVec >> 6) & 3)
 * otherwise the default one is faster.
 */
#define logCDF 4

/*
 * ADTF has a default value of 0.5 s and is optionally negotiated
 * If you need to negotiate that value per VC, use the definition below.
 * This definition provides granularity of 80 ms which is not as good
 * as 10 ms required by TM specs, but it should be sufficient.
 * #define ADTF ( (ulong) ((pABR->PVec >> 9) * LCR * 10.23 / 128) )
 * otherwise the default one is (much) faster
 */
#define ADTF ( (ulong) (0.5 * LCR) )

/* ------------------------------------------------------------
 * Macros to manipulate binary state variables
 */
#define F_LastWasFRM  0x01
#define F_AllowIncACR 0x02
#define F_PresBRM     0x04
#define F_ALL         (F_LastWasFRM | F_AllowIncACR | F_PresBRM)

/*
 * Macros to manipulate Message Type field of a RM cell
 */
#define BRM_NI    0x10 /* No Increase */
#define BRM_CI    0x20 /* Congestion Indication */
#define BRM_BN    0x40 /* BECN Cell */
#define BRM_DIR   0x80 /* Direction, not stored */
#define BRM_ALL   (BRM_NI | BRM_CI | BRM_BN | BRM_DIR)
#define BRM_SHIFT 2    /* offset to shift BRM bits */

/* ------------------------------------------------------------
 * Private local macros
 */
#define InterCellGap(x) ((((ulong) LCR) / r2u(x)) << TIME_FRAC)

#define RATE_EXP 9                       /* Exponent part of ABR rate */
#define RATE_NZ  14                      /* Valid bit of ABR rate */
#define RM_CDS   ((8 << 10) | (1 << 9))  /* RM cell buff descriptor */
#define ABR_ID   0x01

#endif

3.8.6 TxCell() and RxCell() (Cell.c)

/* $Id: Cell.c,v 1.9 1996/06/25 01:16:26 zhifeng Exp $ */
/* ------------------------------------------------------------
 * ATMizer-2
 * Copyright (C) 1995–1999 LSI Logic Corporation
 *
 * Cell.c - Receive and Transmit a cell
 * ------------------------------------------------------------*/
#include "uTypes.h"
#include "Config.h"
#include "Hdr.h"
#include "ATMizer2.h"

/* ------------------------------------------------------------
 * Receive Cell
 *
 * Name: RxCell(const ulong aCell)
 *
 * Description: This function is called if the RxCell processor’s
 *              request queue is not full and there is a cell in
 *              the ACI Receive Fifo. APU gets this cell and checks
 *              the cell header. If it is a RM cell, process it.
 *              Otherwise, invoke RxCell processor to process it.
 *
 * Parameter: aCell - address of the cell in Cell Buffer
 *
 * Return value: None
 *
 * ------------------------------------------------------------*/
void RxCell(const ulong aCell)
{
    /* retrieve the header */
    ulong CellHdr = word(CBM[aCell + 4]);
#ifdef LOOP_BACK
    ulong ConNum = (CellHdr >> 4) + MAX_CON_NUM;
#else
    ulong ConNum = CellHdr >> 4;
#endif

    /*
     * Very Simple Cell Header Lookup:
     * take only VCI (must be in range 0 .. CON_NUM-1)
     * OAM cells and signalling VCI are not processed
     */
    if (ConNum >= MAX_CON_NUM) {
        Stat.ErrConNum++;
        ACI_Free(aCell);
    }
    else {
        /* Check if it is a RM cell */
        if ( (CellHdr & PTI_HDR) == PTI_RM ) {
            if (word(CBM[aCell]) & one(CDS_Crc10)) {
                Stat.ErrCrc10++;
                ACI_Free(aCell);
            }
            else
                ABR_Receive(ConNum, aCell);
        }
        else
            EDMA_RxCell(ConNum, (CellHdr << 16) | aCell );
    }
    Stat.RxCells++;
}

/* ------------------------------------------------------------
 * Transmit a Cell
 *
 * Name: TxCell(const ulong aCell)
 * Description: This function is called if the TxCell processor’s
 *              request queue is not full and there is a free cell
 *              location. APU gets the ConNum from the Scheduler,
 *              checks the class of this connection and calls the
 *              corresponding procedures.
* Parameters: aCell - address of a free cell location in Cell Buffer * Return value: None * -----------------------------------------------------------*/ #define UBR_Send CBR_Send QoS_Send_t *QoS_Send[] = { CBR_Send, VBR_Send, ABR_Send, UBR_Send }; void TxCell(const ulong aCell) { Source Code Listings 3-65 BookL64364PG.fm5 Page 66 Friday, January 28, 2000 4:58 PM 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 } ulong do { N, T; N = SCD_Serv(); if ((N & U16) == 0) { if (OutOfRate.aCell) { EDMA_TxCell(0, OutOfRate.aCell); OutOfRate.aCell = 0; OutOfRate.ConNum = 0; ACI_Free(aCell); } else { pCell_t pCell = (pCell_t) &CBM[aCell]; pCell->CDS = CDS_IDLE; pCell->CellHdr = 0; EDMA_TxCell(0, aCell); } break; } if (QoS_Send[ConClass(N)](N, aCell)) { N &= U16; ACD[N].ThTxTime += ACD[N].ICG; T = ACD[N].ThTxTime; if ( IsInPast(T) ) T = TimeNow + (1 << TIME_FRAC); SCD_Sched(0, (T >> TIME_FRAC) & CAL_SIZE_MASK); break; } ACD_UnSched(ACD[N]); } while (1); SCD_Tic(); TimeNow += 1 << TIME_FRAC; Stat.TxCells++; 3.8.7 Transmit a CBR Cell (CBR.c) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 3-66 /* $Id: CBR.c,v 1.2 1996/06/04 00:35:45 zhifeng Exp $ */ /* -----------------------------------------------------------* ATMizer-2 * Copyright (C) 1995–1999 LSI Logic Corporation * * CBR.c - Send a cell from a Constant Bit Rate connection. 
 * -----------------------------------------------------------*/
#include "uTypes.h"
#include "Hdr.h"
#include "ATMizer2.h"

/* -----------------------------------------------------------
 * Name:        CBR_Send()
 * Description: Behavior for a transmit CBR cell
 * parameters:  ConNum: BuffPres|Class|Connection Number (from Scheduler)
 *              aCell: cell address in Cell Buffer
 * returns status code:
 *              0 no cell sent
 *              1 data cell sent
 * -----------------------------------------------------------*/
int CBR_Send(const ulong ConNum, const ulong aCell)
{
    if (ConHasData(ConNum)) {
        EDMA_TxCell(ConNum, aCell);
        return 1;
    }
    return 0;
}

3.8.8 Transmit a VBR Cell (VBR.c)

/* $Id: VBR.c,v 1.6 1996/08/06 19:02:47 thomasd Exp $ */
/* -----------------------------------------------------------
 * ATMizer-2
 * Copyright (C) 1995–1999 LSI Logic Corporation
 *
 * VBR.c - Send a cell from a VBR connection
 * -----------------------------------------------------------*/
#include "uTypes.h"
#include "Hdr.h"
#include "ATMizer2.h"
#include "Instr.h"

/* -----------------------------------------------------------
 * Name:        VBR_Send()
 * Description: Behavior for a transmit VBR cell
 * parameters:  ConNum: BuffPres|Class|Connection Number (from Scheduler)
 *              aCell: cell address in Cell Buffer
 * returns status code:
 *              0 no cell sent
 *              1 data cell sent
 * -----------------------------------------------------------*/
int VBR_Send(const ulong ConNum, const ulong aCell)
{
    if (ConHasData(ConNum)) {
        ulong N = ConNum & U16;
        ACD[N].Bucket -= TimeNow - (ACD[N].ThTxTime - ACD[N].ICG);
        ACD[N].Bucket = maxi(0, ACD[N].Bucket) + ACD[N].Increment;
        ACD[N].ICG = maxi(ACD[N].ICG_PCR, ACD[N].Bucket - ACD[N].Limit);
        ACD[N].ThTxTime = TimeNow;
        EDMA_TxCell(N, aCell);
        return 1;
    }
    return 0;
}
3.8.9 Transmit and Receive ABR Cells (ABR.c)

/* $Id: ABR.c,v 1.5 1996/06/04 01:15:55 zhifeng Exp $ */
/* -----------------------------------------------------------
 * ATMizer-2
 * Copyright (C) 1995–1999 LSI Logic Corporation
 *
 * ABR.c - Available Bit Rate Source and Destination End System Behavior
 *         As per ATM Forum Traffic Management specifications v4.0
 *         (af-tm-0056.000)
 * The following simplifications that are believed to be realistic
 * were made:
 * 1. Following parameters are set to constant
 *    Nrm = 32, Trm = 100 ms, CDF = 1/16, ADTF = 0.5
 * 2. CRM range is 0 .. 255
 * 3. Sending out-of-rate cells at TCR is not implemented
 * -----------------------------------------------------------*/
#include "uTypes.h"
#include "ABR.h"
#include "Hdr.h"
#include "Instr.h"
#include "ATMizer2.h"

/* -----------------------------------------------------------
 * Name:        ABR_Send()
 * Description: Behavior for a transmit ABR cell
 * parameters:  ConNum: BuffPres|Class|Connection Number (from Scheduler)
 *              aCell: cell address in Cell Buffer
 * returns status code:
 *              0 no cell sent
 *              1 data cell sent
 *              2 backward RM cell sent
 *              3 forward RM cell sent
 * -----------------------------------------------------------*/
int ABR_Send(const ulong ConNum, const ulong aCell)
{
    register pABR_t pABR = (pABR_t) &ACD[ConNum & U16];
    /*
     * subtraction TimeNow-LastTimeFRM works even when TimeNow wraps around
     * It is mandatory that TimeNow is maintained as an ulong variable
     * incremented at each cell slot.
     */
    register ulong TimeDiff = (TimeNow - pABR->LastTimeFRM) >> TIME_FRAC;

    /* ----- Source Rule 3a -----
     * - After the first in-rate forward RM cell, in rate cells shall
     * - be sent in the following order:
     * - a.
The next in-rate cell shall be an in-rate forward RM cell
     * -    if and only if, since the last in-rate forward RM cell was sent, either:
     *      i.  at least Mrm in-rate cells have been sent and
     *          at least Trm time has elapsed
     *      or
     *      ii. Nrm-1 in-rate cells have been sent
     */
    if ( pABR->InRateCell >= NRM ||
         (TimeDiff > TRM && pABR->InRateCell > MRM) ) {
        /* ------ Source Rule 5 ------
         * - Before sending a forward in-rate RM-cell, if ACR > ICR
         * - and the time T that has elapsed since the last in-rate
         * - forward RM-cell was sent is greater than ADTF, then ACR
         * - shall be reduced to ICR.
         */
        if (TimeDiff > ADTF && pABR->ACR > pABR->ICR)
            pABR->ACR = pABR->ICR;

        /* ------ Source Rule 6 ------
         * - Before sending in-rate forward RM cell and after adjusting
         * - ACR according to Rule 5 above, if at least CRM in-rate
         * - forward RM-cells have been sent since the last backward
         * - RM-cell with BN = 0 was received, then ACR shall be
         * - reduced by at least ACR*CDF, unless this reduction would
         * - result in a rate below MCR, in which case ACR shall be
         * - set to MCR
         * Expression evaluation:
         * ACR = ACR - ACR * CDF = ACR - ACR / (1/CDF)
         *     = ACR - ( (ACR.exp - logCDF) | ACR.frac )
         *     = rsub(ACR, ACR - (logCDF << RATE_EXP))
         */
        if (pABR->FRM_SinceBRM >= pABR->CRM) {
            /* the subtraction below may underflow - but then the NZ bit
             * will be cleared, effectively resetting result to 0
             */
            pABR->ACR = rsub(pABR->ACR, pABR->ACR - (logCDF << RATE_EXP) );
            pABR->ACR = maxi(pABR->ACR, pABR->MCR);
        }

        /*
         * build and send in-rate forward RM cell according to
         * Source Rules 4, 7, 10.
         */
        word(CBM[aCell]) = RM_CDS;
        /* RM Cell Header with PTI set to 6 (PTI_RM) */
        word(CBM[aCell + 4]) = word(VCD[ConNum & U16].CellHdr) | PTI_RM;
        /* ID = 1, Msg = 0, ER = PCR */
        word(CBM[aCell + 8]) = (ABR_ID << 24) | pABR->PCR;
        /* CCR = , MCR = MCR */
        word(CBM[aCell + 12]) = word(pABR->ACR);
        EDMA_TxCell(0, aCell);
        pABR->FRM_SinceBRM++;
        pABR->InRateCell = 1;
        pABR->LastTimeFRM = TimeNow;
        pABR->PVec |= F_LastWasFRM;
        pABR->ICG = InterCellGap(pABR->ACR);
        return 3;
    }
    /* ------ Source Rule 3-b ------
     * - b. The next in-rate cell shall be a backward RM cell if
     *      condition a. above is not met, if a backward RM cell is
     *      waiting for transmission and if either:
     *      i.  no in-rate backward RM cell has been sent since the last
     *          in-rate forward RM cell
     *      ii. no data cell is waiting for transmission
     */
    else if ( (pABR->PVec & F_PresBRM) &&
              (!ConHasData(ConNum) || (pABR->PVec & F_LastWasFRM))) {
        /*
         * build and send in-rate backward RM cell
         */
        register ulong Msg = (ABR_ID << 8) | BRM_DIR |
                             ((pABR->PVec << BRM_SHIFT) & BRM_ALL);
        if (VCD[ConNum & U16].VCD_Ctrl.EFCI)
            Msg |= BRM_CI;
        /* Cell Descriptor */
        word(CBM[aCell]) = RM_CDS;
        /* Cell Header with PTI = 6 (PTI_RM) */
        word(CBM[aCell + 4]) = word(VCD[ConNum & U16].CellHdr) | PTI_RM;
        /* ID = 1, DIR=1, (BN, CI, NI) <- BRM, ER <- BRM_ER */
        word(CBM[aCell + 8]) = (Msg << 16) | pABR->BRM_ER;
        /* CCR<-BRM_CCR, MCR<-BRM<-MCR */
        word(CBM[aCell + 12]) = word(pABR->BRM_CCR);
        EDMA_TxCell(0, aCell );
        pABR->InRateCell++;
        pABR->PVec &= ~F_LastWasFRM & ~F_PresBRM;
        /*
         * If the waiting out of rate cell is from the same
         * connection, discard it because it is outdated.
         */
        if (OutOfRate.ConNum == (ConNum & U16)) {
            ACI_Free(OutOfRate.aCell);
            OutOfRate.aCell = 0;
            OutOfRate.ConNum = 0;
        }
        return 2;
    }
    /* ------ Source Rule 3-c ------
     * - c. The next in-rate cell sent shall be a data cell if neither
     *      condition a. nor condition b. above is met, and if a data
     *      cell is waiting for transmission
     */
    else if ( ConHasData(ConNum)) {
        /*
         * send a data cell.
         */
        EDMA_TxCell(ConNum, aCell);
        pABR->InRateCell++;
        return 1;
    }
    return 0;
}

/* -----------------------------------------------------------
 * Name:        ABR_Receive()
 * Description: Behavior for a received RM cell
 * parameters:  ConNum: Connection Number after header look-up
 *              aCell: cell address in Cell Buffer
 * return code:
 *              0 received backward RM cell
 *              1 received forward RM cell
 * -----------------------------------------------------------*/
int ABR_Receive( const ulong ConNum, const ulong aCell)
{
    register pABR_t pABR = (pABR_t) &ACD[ConNum & U16];
    /*
     * get ID, Msg, ER fields in RM cell
     */
    register ulong ER = word(CBM[aCell + 8]);
    register ulong Msg = ER >> 16;
    ER &= U16;
    /*
     * Test for bit DIR that occupies sign position after the shifting
     */
    if ( Msg & BRM_DIR ) { /* if DIR == Backward */
        /* ------ Source rule 8a -------
         * - When a backward RM cell is received with CI=1
         * - then ACR shall be reduced by at least ACR*RDF,
         * - unless that reduction would result in a rate below MCR
         * - in which case ACR shall be set to MCR.
         */
        if ( Msg & BRM_CI ) { /* if CI set in BRM */
            /*
             * Expression evaluation:
             * ACR = ACR - ACR * RDF = ACR - ACR / (1/RDF)
             *     = ACR - ( (ACR.exp - logRDF) | ACR.frac )
             *     = rsub(ACR, ACR - (logRDF << RATE_EXP))
             * RDF is power of 2 in range 1..1/32,768
             * logRDF is stored in PVec (4 bits)
             *
             * the subtraction below may underflow - but then the
             * NZ bit will be cleared, effectively resetting X to 0
             */
            register ulong X = pABR->ACR - (pABR->logRDF << RATE_EXP);
            pABR->ACR = rsub(pABR->ACR, X);
        }
        /* ------ Source rule 8b ------
         * - If the backward RM cell has both CI=0 and NI=0 then
         * - ACR may be increased by no more than RIF*PCR, to a rate
         * - not greater than PCR
         */
        else if ( (Msg & BRM_NI) == 0 ) {
            /*
             * ACR = ACR + RIF * PCR = ACR + PCR / (1/RIF)
             *     = ACR + ( (PCR.exp - logRIF) | PCR.frac )
             *     = ACR + PCR - (logRIF << RATE_EXP)
             * RIF is power of 2 in range 1/32768 .. 1
             */
            ulong X = pABR->PCR - (pABR->logRIF << RATE_EXP);
            pABR->ACR = radd(pABR->ACR, X);
            pABR->ACR = mini(pABR->ACR, pABR->PCR);
        }
        /* ------ Source rule 8b and 9 ------
         * - When a backward RM-cell is received and after ACR is
         * - adjusted according to Source Rule 8, if ACR is greater
         * - than ER from RM cell, then ACR shall be reduced to no
         * - greater than ER, unless ER is less than MCR, in which
         * - case ACR shall be set to MCR
         */
        pABR->ACR = mini(ER, pABR->ACR);
        pABR->ACR = maxi(pABR->ACR, pABR->MCR);
        pABR->ICG = InterCellGap(pABR->ACR);
        if (( Msg & BRM_BN) == 0) /* if it is source generated RM cell */
            pABR->FRM_SinceBRM = 0;
        ACI_Free(aCell);
        return 0;
    }
    else { /* if DIR == Forward */
        /* ------ Destination rule 3 ------
         * Destination rule 3 has 5 options. The option implemented is
         * described below.
This is believed to be the most reasonable
         * although it is not the cheapest one.
         *
         * - If a forward RM cell is received by the destination while
         * - another turned-around RM cell (on the same VC) is scheduled
         * - for in-rate transmission:
         * - a. The contents of the old cell are overwritten by the
         *      contents of the new cell
         * - b. The old cell (after being overwritten) shall be sent
         *      out-of-rate
         * - c. The new cell is scheduled for in-rate transmission
         */
        pABR->PVec = (pABR->PVec & ~(BRM_ALL >> BRM_SHIFT)) |
                     ((Msg & BRM_ALL) >> BRM_SHIFT);
        pABR->BRM_ER = ER;
        word(pABR->BRM_CCR) = word(CBM[aCell + 12]);
        /*
         * Check whether a BRM cell is already present for the current
         * connection. If so, replace the contents of the old one and
         * send it out-of-rate.
         */
        if (pABR->PVec & F_PresBRM) {
            if (OutOfRate.aCell)
                ACI_Free(OutOfRate.aCell);
            ER |= BRM_DIR;
            if (VCD[ConNum & U16].VCD_Ctrl.EFCI)
                ER |= BRM_CI;
            word(CBM[aCell]) = RM_CDS;
            word(CBM[aCell + 4]) |= CELL_CLP;
            half(CBM[aCell + 8]) = ER;
            OutOfRate.aCell = aCell;
            OutOfRate.ConNum = ConNum & U16;
        }
        else
            ACI_Free(aCell);
        pABR->PVec |= F_PresBRM;
        /*
         * Reschedule this connection if it is not currently scheduled
         */
        if (!ACD_IsSched(ACD[ConNum & U16])) {
            SCD_Sched(ConNum & U16,
                      ((TimeNow >> TIME_FRAC) + 1) & CAL_SIZE_MASK );
            ACD_Sched(ACD[ConNum & U16]);
        }
        return 1;
    }
}

Chapter 4
Unschedule

This chapter describes why and how to unschedule a connection that is already scheduled in the calendar of the ATMizer II+ chip. It includes the following sections:
• Section 4.1, “Introduction”
• Section 4.2, “Unschedule Routine”

4.1 Introduction

When a
connection needs to be unscheduled, it must be removed from the calendar table. A connection may need to be unscheduled for either of two reasons:

1. The rate for a scheduled connection increases. Because the intercell gap decreases, this can introduce jitter for the connection: a connection that is scheduled many slots down in the calendar now has to be rescheduled closer to the current slot. To ensure that the same connection does not appear twice in the calendar, unschedule the connection from its previous slot and then reschedule it in the new slot.

2. The connection is closed. When a connection is closed, it is necessary to unschedule (remove) the connection from the calendar so that no more cells are sent for the connection.

4.2 Unschedule Routine

The following code illustrates the unscheduling of a connection (Figure 4.1):

Figure 4.1 Unschedule Routine

 1  /* Unschedule if Connection in Calendar */
 2  UnscheduleCal(ConNum){
 3      ushort *Calendar;
 4
 5      uchar CalNo = (uchar) ((ACD[ConNum].ACD_Ctrl >> 13) & 0x03);
 6      Calendar = (ushort *)CalendarAddr[CalNo];
 7
 8      if ( ACD_IsSched(ACD[ConNum]) ) {
 9
10          if(ACD[ConNum].ThTxTime > SCD_Now()){
11              if(word(Read_SCD_Ctrl()) & ONE(SCD_FlatMode))
12                  CurrConNum = Calendar[(ACD[ConNum].ThTxTime)*2];
13              else
14                  CurrConNum = Calendar[ACD[ConNum].ThTxTime];
15
16              if(CurrConNum == ConNum){
17                  if(word(Read_SCD_Ctrl()) & ONE(SCD_FlatMode)){
18                      Calendar[(ACD[ConNum].ThTxTime)*2] = VCD[ConNum].NextVCD;
19                      /* if only one conn in slot */
20                      if(Calendar[((ACD[ConNum].ThTxTime)*2)+1] == ConNum)
21                          Calendar[((ACD[ConNum].ThTxTime)*2)+1] = 0;
22                  }
23                  else
24                      Calendar[ACD[ConNum].ThTxTime] = VCD[ConNum].NextVCD;
25              }
26              else{
27                  while(VCD[CurrConNum].NextVCD != ConNum){
28                      CurrConNum = VCD[CurrConNum].NextVCD;
29                  }
30                  VCD[CurrConNum].NextVCD = VCD[ConNum].NextVCD;
31
                if(word(Read_SCD_Ctrl()) & ONE(SCD_FlatMode))
32                  if(Calendar[((ACD[ConNum].ThTxTime)*2)+1] == ConNum)
33                      Calendar[((ACD[ConNum].ThTxTime)*2)+1] = CurrConNum;
34              }
35              ACD_UnSched(ACD[ConNum]);
36          }
37      }
38  }

The sample code takes the following steps to unschedule connection ConNum:

Line 6   Get the base address of the calendar in which the connection is scheduled (this may not be the current calendar).

Line 8   Make sure the connection is scheduled in the calendar by checking the ACD.

Line 10  Check the slot (ThTxTime) where the connection is supposed to be scheduled in the calendar. If ThTxTime is greater than the Now slot, the connection has not yet been merged into the internal cache of the scheduler. If ThTxTime is less than or equal to the current slot, the connection is already cached by the scheduler. The scheduler has no knowledge of the connection until the connection is cached.

Line 20  When you know that the connection is still in the calendar, traverse the linked list to find the connection and remove it from the list. First, get the head connection of the calendar slot (line 12 for Flat mode, line 14 for Priority mode). If the head of the slot is the connection that is to be unscheduled (line 16), make the NextVCD field of that connection the new head of the slot. In Flat mode, the scheduler holds both the head and the tail of the slot, so also check whether the connection is the only one in the slot. If so, clear the tail of the slot.

Line 26  If the connection is elsewhere in the slot, walk the list until you find the connection, then remove it from the list.

Line 35  Once the connection is removed from the list, update the ACD to reflect the change.
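The list manipulation in these steps can be sketched in isolation. The following is a minimal, hardware-independent model, not the chip's data layout: `slot_t`, `next_conn`, `slot_unlink`, `NIL`, and `MAX_CONN` are hypothetical names introduced for illustration, with `next_conn[i]` standing in for `VCD[i].NextVCD` and connection number 0 assumed to mean "no connection".

```c
#include <stdint.h>

#define NIL      0u    /* assumption: 0 marks "no connection" */
#define MAX_CONN 16u   /* illustrative table size only */

/* Hypothetical model of one Flat mode calendar slot (head and tail). */
typedef struct {
    uint16_t head;
    uint16_t tail;
} slot_t;

/* next_conn[i] plays the role of VCD[i].NextVCD in Figure 4.1. */
static uint16_t next_conn[MAX_CONN];

/* Unlink conn from the slot list.
 * Returns 1 if conn was found and removed, 0 if it was not in the slot. */
int slot_unlink(slot_t *slot, uint16_t conn)
{
    uint16_t prev = NIL;
    uint16_t cur  = slot->head;

    /* Walk the list, remembering the predecessor. */
    while (cur != NIL && cur != conn) {
        prev = cur;
        cur  = next_conn[cur];
    }
    if (cur == NIL)
        return 0;                       /* not in this slot */

    if (prev == NIL)
        slot->head = next_conn[conn];   /* conn was the head */
    else
        next_conn[prev] = next_conn[conn];

    if (slot->tail == conn)
        slot->tail = prev;              /* NIL when the slot becomes empty */

    next_conn[conn] = NIL;
    return 1;
}
```

Unlike the loop at line 27 of Figure 4.1, this sketch stops at the end of the list, so a connection that is missing from the slot is reported rather than searched for indefinitely. Whether that guard is needed on the target depends on how reliably the ACD scheduled flag tracks the calendar contents.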
If the connection is cached internally in the scheduler, it may not be worthwhile to uncache it. If the connection is now faster than before, it cannot be scheduled any sooner than the current slot; if the connection is to be closed, you can close it once it is served. By reading the head and tail registers, you can determine when the connection will be served and then take the appropriate action (either remove it so that it is closed, or schedule it with the new ICG).

Chapter 5
Hashing Function

This chapter describes the hashing function implemented in the ATMizer II+ chip and includes the following sections:
• Section 5.1, “Hashing Mechanism”
• Section 5.2, “Hashing Function”
• Section 5.3, “Hash Implementation”

5.1 Hashing Mechanism

ATM technology is connection oriented: data flows between two end stations over a virtual connection established between them. The routing information for the cells that carry the data is held in the cell header; the address space comprises 24 bits, subdivided into two fields (the VPI and the VCI). At the end stations, the cells are processed based on a connection number. Typically, the maximum number of connections that an end station processes is much smaller than the address space available in the cell header. Therefore, a hashing mechanism is needed to obtain the connection number of a cell from the cell header value. The input to the hashing mechanism is the cell header and the output is the connection number corresponding to that cell header. Thus, the hash table stores each cell header with its connection number and enables the retrieval of the connection number based on the cell header.
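As a rough illustration of this header-in, connection-number-out lookup, the sketch below searches a small table whose entries mirror the fields of the guide's Hash_Entry_t (connection number, VPI/VCI, next pointer). It assumes the modulo-prime index that Section 5.2 describes and the chaining of entries that share an index; `hash_entry_t`, `hash_lookup`, and `NO_CONNECTION` are hypothetical names, and treating connection number 0 as "empty" is an assumption taken from the insertion code in Figure 5.4.

```c
#include <stddef.h>

typedef unsigned long ulong;

/* Hypothetical stand-in for the guide's Hash_Entry_t. */
typedef struct hash_entry {
    ulong conn_num;            /* connection number, 0 = entry unused */
    ulong vpi_vci;             /* VPI/VCI field of the cell header */
    struct hash_entry *next;   /* next entry with the same index */
} hash_entry_t;

#define NO_CONNECTION 0UL      /* assumption: 0 means "no such connection" */

/* Return the connection number stored for vpi_vci, or NO_CONNECTION.
 * table must hold at least `prime` entries. */
ulong hash_lookup(const hash_entry_t *table, ulong prime, ulong vpi_vci)
{
    const hash_entry_t *e = &table[vpi_vci % prime];

    /* Several headers can share one index, so compare the stored header. */
    for (; e != NULL; e = e->next) {
        if (e->conn_num != NO_CONNECTION && e->vpi_vci == vpi_vci)
            return e->conn_num;
    }
    return NO_CONNECTION;
}
```

Comparing the stored VPI/VCI value is what distinguishes two headers that land on the same index; the index alone only selects the list to search.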
The hashing function uses the cell header to generate an index, and the table entry that the index points to is checked to yield the connection number for the cell header. Since the address space is much larger than the maximum number of connections (and hence the maximum size of the table), two distinct cell headers can give rise to the same index value. This is referred to as a “collision.” Therefore, all the cell headers that give rise to the same index value are linked together in a linked list. If the list at an index holds more than one entry, the entries are searched to find the one that matches the cell header, which yields the connection number. The hash table is structured as an array of the entries defined in Figure 5.1.

Figure 5.1 Hashing Table Declarations

/* ------------------------------------------------------
 * Hash Type/Function Type
 */
typedef struct Hash_Entry_t Hash_Entry_t, *pHash_Entry_t;
struct Hash_Entry_t {
    ulong ConNum;             /* Connection number */
    ulong VPI_VCI;            /* VPI_VCI field of the Cell header */
    pHash_Entry_t Hash_Next;  /* Pointer to next Hash entry */
};

The collision resolution described above is referred to as “chained addressing.” The size of the array that constitutes the hash table is determined by the maximum number of connections and by the hash function used to compute the index value from the cell header. The next section presents the details of the hashing function and discusses some implementation issues.

5.2 Hashing Function

The key step in obtaining the connection number from the cell header is computing the index of the hash table entry where the cell header and the connection number are stored. This function necessarily maps more than one cell header (or VPI/VCI value) to one index since it cannot be a one-to-one function.
However, the function is chosen such that the index values are spread statistically evenly across the range of cell header values, given that the cell header values are randomly chosen and equally likely. This is best achieved if the index is computed as the cell header value modulo a prime number. Therefore, the design variable in the choice of the hashing function, and hence of the hash table, is the prime number used as the modulus.

The total number of connections that can be processed by the ATMizer II+ chip is determined by the number of VCD structures that can be supported in the memory up to a maximum of 224. Let the total number of receive connections that are supported in an application be Nmax. The prime number used in the hash function must then be less than Nmax. The simplest choice is the largest prime less than Nmax, so the hashing function can be chosen to be the mod function with respect to the largest prime number less than the maximum number of connections.

5.3 Hash Implementation

The Hashing Table size is based on the total available memory space. As a trade-off, the larger the table, the more widely and evenly the hashed items are distributed, and the smaller the chance of a collision. Once the size of the Hashing Table is determined, the table needs to be initialized. Since each entry in the hash table is actually the head of a flat linked list, the user also needs to reserve and initialize a free entry pool. An entry is taken from this pool when a hash table entry is already occupied by one connection and another connection hashes to the same index. Figure 5.2 shows the C routine to accomplish this task.

Figure 5.2 Hashing Table Initialization

static void InitHash()
{
    pHash_Entry_t temp;
    ulong n;

    /*
     * HashTable entries are already cleared.
     * Need to create a free entry pool.
     */
    freeEntryPool = HashTable + sizeof(Hash_Entry_t) * MAX_CON_NUM;
    temp = freeEntryPool;
    for(n = 0; n < MAX_CON_NUM; n++)
        temp[n].Hash_Next = &temp[n + 1];
    temp[n - 1].Hash_Next = 0;
}

After the initialization of the hash table, the program needs to calculate the prime number. The C code segment for the calculation is shown in Figure 5.3.

Figure 5.3 Find Prime Routine

ulong FindPrime(ulong N)
{
    ulong i, j, temp;

    /* Find the largest prime less than the TableSize */
    for (i = 1; i < N; i++) {
        temp = N - i;
        for (j = temp / 2; j > 1; j--) {
            if ( (temp % j) == 0 ) {
                break;
            }
        }
        if (j == 1) {
            return temp;
        }
    }
    return 0;
}

Since each entry in the hash table is actually a flat linked list, the insertion procedure needs to check whether there is a connection already in this slot. If so, it needs to take one free entry from the free entry pool and link it into the list of that slot. Figure 5.4 shows the C code segments for these actions.

Figure 5.4 Inserting a Connection into the Hashing Table

/****************** Insertion *******************/
/*
 * Now insert this connection into the Hashing Table.
 * Note, only Rx direction needs hashing table.
 * On SDP we always loop Tx cells back as Rx cells, so the
 * cell header remains the same.
 */
/* Entry in the hashing table = VPI_VCI MOD prime */
{
    ulong n, VPI_VCI;
    pHash_Entry_t entry;

    VPI_VCI = Tx;
    n = VPI_VCI % prime;
    /* If HashTable[n].ConNum = 0, then insert into this entry */
    if (HashTable[n].ConNum == 0) {
        HashTable[n].ConNum = Rx;
        HashTable[n].VPI_VCI = VPI_VCI;
    }
    else {
        /* else, always insert the new one right after the head of the list */
        entry = freeEntryPool;
        freeEntryPool = entry->Hash_Next;
        entry->ConNum = Rx;
        entry->VPI_VCI = VPI_VCI;
        entry->Hash_Next = HashTable[n].Hash_Next;
        HashTable[n].Hash_Next = entry;
    }
}

Chapter 6
Packet Aging

This chapter describes the packet aging function in the ATMizer II+ chip and includes the following sections:
• Section 6.1, “Introduction”
• Section 6.2, “Mailbox Processing”
• Section 6.3, “Packet Aging Routine”

6.1 Introduction

Packet aging notifies the host of idle connections, that is, connections that have not received a cell for a predefined period. The ATM Processing Unit (APU) samples the Virtual Connection Descriptor (VCD) and examines the TimeStamp value in the VCD to determine whether the connection has to be labeled as an idle connection. In the present scheme, the APU terminates the unfinished buffer by issuing an EDMA_RxCell command together with an End of Message (EOM) cell. This terminates the buffer with (possibly) the ErrCRC, ErrLength, and ErrAbort bits set, and the EDMA places the terminated buffer in the EDMA_RxCompl queue. The APU places this buffer on the ring in sequence with the rest of the buffers that belong to the connection. The idle connection number and the terminated buffer number are sent to the host so that the connection status can be updated to the idle state.
The APU sends the host a message through the Mailbox with the address of the HCD_Rx structure in the PCI memory, where the APU has placed the connection number and the buffer number. The host samples the Mailbox and, if it is not empty, retrieves the message and processes it. The buffer that is retrieved from the ring is processed according to the buffer processing policy of the host. In the current scheme, the buffer status bits contain no identifier to indicate that the buffer was terminated by the APU, so such a buffer cannot be differentiated from other buffers that have CRC, length, or abort errors. In addition, the buffer may be retrieved from the ring before the Mailbox message is processed. The host's buffer and connection processing protocols should be designed to account for this possibility.

In summary, the APU performs the packet aging routine and informs the host of the idle connections based on the expiration of the TimeStamp values in the VCD.

6.2 Mailbox Processing

The APU uses the Mailbox to send the message about a connection that has been in the idle state. The connection number and the terminated buffer number are placed in the HCD_Rx structure in the PCI memory, whose structure is shown below in Figure 6.1.
Figure 6.1 HCD_Rx Structure Declarations

/* layout of the primary PCI memory */
typedef struct {
    ulong ConNum;      /* Connection Number */
    ulong CellHeader;  /* Cell Header in Tx direction; In Rx - BuffNum */
    ulong Class;       /* Class of the connection */
    ulong PCR;         /* PCR of the connection */
    ulong SCR_MCR;     /* SCR(VBR) or MCR(ABR) of the connection */
    ulong MBS_ICR;     /* MBS(VBR) and ICR(ABR) */
    ulong TBE;         /* TBE */
    ulong FRTT;        /* Round trip time */
} PCI_HCD_t, *pPCI_HCD_t;

typedef volatile struct {
    ulong CmdAck_APU;
    ulong CmdAck;
    ulong TxCredit;
    ulong RxRing[RX_RING_SIZE];
    ulong Ext_Msg;
    Stat_t Stat[2];
    Config_t Config;
    PCI_HCD_t HCD_Tx;
    PCI_HCD_t HCD_Rx;
    uchar Buff[4];
} PCI_t, *pPCI_t;

Since more than one connection may be in the idle state, the APU updates the HCD_Rx only after it has confirmed, by means of the CmdAck_APU field in the PCI memory, that the host has processed the previous message. This handshake mechanism prevents the APU from overwriting the HCD_Rx before the corresponding message is processed by the host.

6.3 Packet Aging Routine

The APU enters the packet aging procedure periodically and checks the connections in sequential order for idle connections. The APU checks whether the TimeStamp in the VCD has expired, based on a preprogrammed timeout value. If the connection has been idle, the APU checks whether there is a buffer that is being processed by the EDMA. In the current implementation, the VCD_BuffPres bit is set if a buffer attached to the VCD is currently being processed. If a buffer is present, the APU terminates the buffer by sending an EOM cell and puts the connection number and the buffer number in the HCD_Rx structure before issuing a message to the host. If there is no buffer present, the connection number and a zero buffer number are placed in HCD_Rx.
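The per-connection decision in this routine can be sketched as follows. This is a simplified model under stated assumptions, not the chip's firmware: `vcd_model_t`, `idle_report_t`, and `age_connection` are hypothetical names, the fields only imitate the TimeStamp and VCD_BuffPres information described above, and clearing `buff_present` stands in for terminating the buffer with an EOM cell. The Mailbox handshake through CmdAck_APU is not modeled.

```c
#include <stdint.h>

/* Hypothetical, reduced view of the per-connection state used for aging. */
typedef struct {
    uint32_t time_stamp;    /* time at which the last cell was received */
    uint32_t buff_num;      /* buffer currently attached to the connection */
    int      buff_present;  /* models the VCD_BuffPres bit */
} vcd_model_t;

/* Hypothetical model of what is written to HCD_Rx for the host. */
typedef struct {
    uint32_t conn_num;
    uint32_t buff_num;      /* 0 when no buffer was present */
} idle_report_t;

/* Check one connection. Returns 1 and fills *report if the connection
 * has been idle for at least `timeout` ticks, otherwise returns 0. */
int age_connection(vcd_model_t *vcd, uint32_t conn_num,
                   uint32_t now, uint32_t timeout, idle_report_t *report)
{
    /* Unsigned subtraction stays correct when the time counter wraps. */
    if ((uint32_t)(now - vcd->time_stamp) < timeout)
        return 0;

    report->conn_num = conn_num;
    report->buff_num = vcd->buff_present ? vcd->buff_num : 0u;

    /* Stand-in for terminating the unfinished buffer with an EOM cell. */
    vcd->buff_present = 0;
    return 1;
}
```

A periodic sweep would call this function for each connection in turn and, for every connection reported, wait for the host acknowledgment before writing the next report.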
Chapter 7
Interrupt Handling

This chapter describes the interrupt handler function implemented in the L64364 ATMizer II+ ATM-SAR Chip and includes the following sections:
• Section 7.1, “Introduction”
• Section 7.2, “Nonvectored Interrupt Handler”
• Section 7.3, “Vectored Interrupt Handler”

7.1 Introduction

The CW4011 processor used in the L64364 ATMizer II+ ATM-SAR Chip supports three types of interrupt signals:
• Cold/warm resets (CRESETn and WRESETn signals) and nonmaskable interrupts (NMIn signal)
• External nonvectored interrupts (EXiNTn[5:0])
• External vectored interrupt (EXViNTn)

This chapter focuses on the software side of external interrupt handling performed by the L64364 ATMizer II+ ATM-SAR Chip. It uses sample code developed by LSI Logic Corporation for the ATMizer II+ Application Development Platform (ADP) to illustrate the design steps. Refer to the L64364 ATMizer II+ ATM-SAR Chip Technical Manual for a detailed description of the interrupt handling mechanism.

7.2 Nonvectored Interrupt Handler

In the L64364 ATMizer II+ architecture, there are six nonvectored interrupts, which are used to handle catastrophic events. They are listed in Table 7.1, with number 5 being the highest priority.

Table 7.1 Nonvectored Interrupt Sources

Name            Number  IP Bit in   CW4011 Status  Description
                        Cause Reg.  Reg. Bit
IntPCIErr       5       7           15             PCI abort or parity error
IntSBErr        4       6           14             Secondary Bus error
IntRateExc      3       5           13             Rate calculation exception or OCA Bus timeout
IntRxMbxOvr     2       4           12             Receive Mailbox overflow
IntSCD_BusErr   1       3           11             Scheduler bus error
IntEDMA_BusErr  0       2           10             EDMA bus error

The general handler detects which interrupt occurred by checking the Cause register, then jumps to the specific handler that takes care of that event.
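The decoding step can be expressed compactly in C. The sketch below is only an illustration of the bit layout in Table 7.1: it assumes that the six interrupt pending bits and their mask bits occupy bits 10 through 15, as the mask values 0x0400 through 0x8000 in the sample handler suggest, and it picks the highest-numbered pending interrupt first, following the priority order stated for the table. `highest_pending`, `IP_FIRST_BIT`, and `IP_COUNT` are hypothetical names.

```c
#include <stdint.h>

#define IP_FIRST_BIT 10u   /* interrupt 0 (IntEDMA_BusErr) is at bit 10 */
#define IP_COUNT      6u   /* six nonvectored interrupts, numbers 0..5 */

/* Return the number (5..0) of the highest-priority interrupt that is both
 * pending in `cause` and enabled in `status`, or -1 if there is none. */
int highest_pending(uint32_t cause, uint32_t status)
{
    uint32_t pending = cause & status;   /* mask IP bits with the int mask */
    int n;

    for (n = (int)IP_COUNT - 1; n >= 0; n--) {
        if (pending & (1u << (IP_FIRST_BIT + (uint32_t)n)))
            return n;
    }
    return -1;
}
```

The returned number could then index a table of handler functions; the assembly handler in this section makes the same decision with a chain of compare-and-branch instructions.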
The following code is a sample of event handling (Figure 7.1):

Figure 7.1 Nonvectored Interrupts General Handler

#include <mips.h>
#include <lr64363.h>

#define int0 0x0400
#define int1 0x0800
#define int2 0x1000
#define int3 0x2000
#define int4 0x4000
#define int5 0x8000

        .text
        .globl  handler
        .ent    handler
        .set    noat
handler:
        # check to see whether it's for me
        .set    noreorder
        mfc0    k0,C0_CAUSE
        nop
        .set    reorder
        # first see whether EXCCODE = 0
        and     k1,k0,CAUSE_EXCMASK
        bne     k1,zero,trap
        # mask IP bits with int mask
        .set    noreorder
        mfc0    k1,C0_SR
        nop
        and     k0,k1
        .set    reorder
        # now check the intEDMA_BusErr IP bit
        and     k1,k0,int0
        bne     k1,zero,intEDMA_BusErr
        # now check the intSCD_BusErr IP bit
        and     k1,k0,int1
        bne     k1,zero,intSCD_BusErr
        # now check the intRxMbxOvr IP bit
        and     k1,k0,int2
        bne     k1,zero,intRxMbxOvr
        # now check the intRateExc IP bit
        and     k1,k0,int3
        bne     k1,zero,intRateExc
        # now check the intSBAddrErr IP bit
        and     k1,k0,int4
        bne     k1,zero,intSBAddrErr
        # now check the intPCIErr IP bit
        and     k1,k0,int5
        bne     k1,zero,intPCIErr
        # handle trap exceptions such as bus error
        # address error, etc ...
trap:
    b done

intRxMbxOvr:
intRateExc:
intSBAddrErr:
intPCIErr:
    # reset the hardware modules
    li k0,M_APU_AddrMap
    li k1,ADRM_RESET
    sw k0,(k1)
    b done

intEDMA_BusErr:
intSCD_BusErr:
    # reset the hardware modules
    li k0,M_APU_AddrMap
    li k1,ADRM_RESET
    sw k0,(k1)
    # reinitialize the SDRAM controller
    # since it is possible that the bus error
    # occurs on the SDRAM page
    b sdram_init

sdram_init:
    # Load base address
    li t2, SBC_BASE
    /* Precharge command */
    li t0, 0x4033b753
    sw t0, 4(t2) # control reg
    /* Mode register */
    li t0, 0x20228530
    sw t0, 4(t2) # control reg
    /* Set Mode */
    li t0, 0x0000eeee
    li t3, 0x80811800
    sw t0, (t3)
    /* Set Control */
    li t0, 0x10228530
    sw t0, 4(t2) # control reg
    /* Load refresh register */
    li t0, 0x00000300
    sw t0, 8(t2) # refresh reg

    .set noreorder
done:
    mfc0 k0,C0_EPC
    nop
    j k0
    rfe
    .set reorder
    .end handler
    .set at

If you use the LSI Logic PMON for debugging, the handler can print a message that describes the type of the interrupt. It then transfers control to PMON by calling the exit() function, as the following example code illustrates (Figure 7.2):

Figure 7.2 General Handler Exit to PMON

    .data
int0_msg:
    .asciiz "Interrupt: EDMA Bus Error\n"

    la a0,int0_msg # get message to print
    jal printf # jump to printf routine
    jal exit # exit to PMON

7.3 Vectored Interrupt Handler

In the L64364 ATMizer II+ architecture, there are 16 prioritized vectored interrupts, as shown in Table 7.2.
Table 7.2 Vectored Interrupt Sources

Name               Number (1)  Description
IntEDMA_ComplFull  15          TxCell, RxCell or Buff Completion Queue is full
IntACI_RxFull      14          ACI Receive FIFO full
IntRxMbx           13          Receive Mailbox FIFO nonempty
IntMove_Compl      12          Move complete
IntEDMA_RxCompl    11          RxCell Completion Queue nonempty
IntACI_RxThrld     10          ACI Receive FIFO exceeds threshold
IntEDMA_TxCompl    9           TxCell Completion Queue nonempty
IntEDMA_BuffCompl  8           Buff Completion Queue nonempty
IntACI_Err         7           Timeout, parity, or short-cell error
IntACI_TxThrld     6           ACI Transmit FIFO drops below threshold
IntExt[1:0]        5–4         External interrupt inputs (user-defined)
IntTim[3:1]        3–1         Timers 3–1 timeout
IntTim[0]          0           Timer 8 timeout

1. APU Status register interrupt bit numbers and APU_VIntEnable register bit numbers match the interrupt numbers shown here.

The L64364 ATMizer II+ ATM-SAR Chip Technical Manual has a detailed description of the interrupt mechanism. This section shows how to handle the vectored interrupts with sample code.

7.3.1 Enable Interrupts

The following code shows how to enable the vectored interrupts (Figure 7.3):

Figure 7.3 Vectored Interrupts Enabling Routine

#include "regdef.h"
#include "cp0_scobra.h"

    .text
    .set noreorder
    .globl VectEn
    .ent VectEn
VectEn:
    /* store return address on stack */
    subu sp,24
    sw ra,20(sp)
    /*
     * Map vector interrupt table through APU_VIntBase and
     * APU_VIntEnable registers.
     */
    /* Hardware Register Base address */
    la t1, 0xb8000000
    /* load address of vector interrupt table */
    la t0, V_Handler
    /* align vector interrupt table address */
    srl t0, t0, 7
    /* store address in the APU_VIntBase reg */
    sw t0, 0x310(t1)
    /* interrupt mask */
    /*
     * Don't enable IntAci_Tx here, it will jump
     * to its handler right away.
     * Enable it when
     * a connection is opened.
     */
#define APU_VINT_MASK \
    ((EDMA_COMPLFULL << 15) | \
     (ACI_RXFULL << 14) | \
     (RXMBX << 13) | \
     (EDMA_MOVE << 12) | \
     (EDMA_RXCELL << 11) | \
     (ACI_RX << 10) | \
     (EDMA_TXCELL << 9) | \
     (EDMA_BUFF << 8) | \
     (ACI_ERR << 7) | \
     (ACI_Tx << 6) | \
     (EXT1 << 5) | \
     (EXT0 << 4) | \
     (TIMER3 << 3) | \
     (TIMER2 << 2) | \
     (TIMER1 << 1) | \
     (TIMER8 << 0))
    li t0, APU_VINT_MASK
    /* enable interrupts at APU */
    sh t0, 0x30e(t1)
    /* enable Vectored Interrupts in CCC reg */
    /* t0 <- CCC Register */
    mfc0 t0, C0_CONFIG
    nop
    nop
    /* enable vectored interrupts (set EVI) */
    li t1, 0x02000000
    or t0, t0, t1
    mtc0 t0, C0_CONFIG
    nop
    nop
    /* enable Vectored Interrupts in C0_SR reg */
    mfc0 t0, C0_SR
    nop
    nop
    ori t0, t0, SR_IE
    mtc0 t0, C0_SR
    nop
    nop
    /* Return to the caller */
    lw ra, 20(sp)
    addu sp, 24
    j ra
    nop
    nop
    .set reorder
    .end VectEn

7.3.2 General Handler

The following code is a sample for the general handler (Figure 7.4):

Figure 7.4 Vectored Interrupts General Handler

    .text
    .align 7 # aligned for APU_VIntBase register.
    .set noreorder
V_Handler:
/* ################################################## */
/* Vectored table                                     */
/* ################################################## */
timer8: /* Timer #8 timed out */
    j timer8_handler /* Int 0 */
    nop
timer1: /* Timer #1 timed out */
    j timer1_handler /* Int 1 */
    nop
timer2: /* Timer #2 timed out */
    j timer2_handler /* Int 2 */
    nop
timer3: /* Timer #3 timed out */
    j timer3_handler /* Int 3 */
    nop
ext_int0: /* External interrupt #0 */
    j ext_int0_handler /* Int 4 */
    nop
ext_int1: /* External interrupt #1 */
    j ext_int1_handler /* Int 5 */
    nop
aci_tx: /* ACI Tx FIFO drops below threshold */
    j aci_tx_handler /* Int 6 */
    nop
aci_err: /* ACI Error FIFO non-empty */
    j aci_err_handler /* Int 7 */
    nop
edma_bcq_ne: /* EDMA buffer completion queue non-empty */
    j edma_bcq_ne_handler /* Int 8 */
    nop
edma_txcq_ne: /* EDMA TxCell completion queue non-empty */
    j edma_txcq_ne_handler /* Int 9 */
    nop
aci_rx: /* ACI Rx FIFO exceeds threshold */
    j aci_rx_handler /* Int a */
    nop
edma_rxcq_ne: /* EDMA RxCell completion queue non-empty */
    j edma_rxcq_ne_handler /* Int b */
    nop
edma_move: /* EDMA move completion */
    j edma_move_handler /* Int c */
    nop
rx_mbox_ne: /* Rx mailbox non-empty */
    j rx_mbox_ne_handler /* Int d */
    nop
aci_rx_full: /* ACI Rx FIFO full */
    j aci_rx_full_handler /* Int e */
    nop
edma_cq_full: /* EDMA completion queue full */
    j edma_cq_full_handler /* Int f */
    nop

7.3.3 Individual Handlers

Each interrupt event has its own interrupt handler. The handling routine is the same C or assembly code used in a regular polling-mode application, except that the application runs in interrupt-driven mode.
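As a side note on the vector table in Figure 7.4, its layout explains the `.align 7` directive and the 7-bit right shift applied before writing APU_VIntBase: 16 entries of two instructions each (a jump plus its delay-slot nop) occupy 128 bytes. The arithmetic can be sketched as follows. The helper names are hypothetical, and the assumption that the hardware reaches entry n at table base + 8 × n is inferred from the table layout, not taken from the Technical Manual.

```c
#include <stdint.h>

#define VINT_COUNT        16u  /* vectored interrupt sources (Table 7.2) */
#define VINT_ENTRY_BYTES   8u  /* one "j handler" plus one "nop" */
#define VINT_TABLE_BYTES  (VINT_COUNT * VINT_ENTRY_BYTES)  /* 128 = 1 << 7 */

/* Value the sample code stores in APU_VIntBase: table address >> 7. */
static uint32_t vint_base_field(uint32_t table_addr)
{
    return table_addr >> 7;
}

/* Nonzero if the table address satisfies the 128-byte alignment. */
static int vint_table_is_aligned(uint32_t table_addr)
{
    return (table_addr & (VINT_TABLE_BYTES - 1u)) == 0u;
}

/* ASSUMPTION: entry n of the table starts at table_addr + 8 * n. */
static uint32_t vint_entry_addr(uint32_t table_addr, unsigned n)
{
    return table_addr + (uint32_t)n * VINT_ENTRY_BYTES;
}
```

Under that assumption, the IntRxMbx entry (number 13) lies 104 bytes past the start of the table.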
Following is sample code for handling the IntRxMbx interrupt (Figure 7.5):

Figure 7.5 IntRxMbx Interrupt Handler

#include "regdef.h"
#include "cp0_scobra.h"
#include "Interrupt.h"

    .text
    .set noreorder
rx_mbox_ne_handler: /* Rx mailbox non-empty */
    /*
     * allocate some space on the stack prior to enabling ints.
     */
    subu sp,C_SIZE*4
    .set reorder
#if !REG_MAP | RXMBX_DEBUG
    /* now save the rest of the registers */
    sw AT,C_AT*4(sp)
    sw v0,C_V0*4(sp)
    sw v1,C_V1*4(sp)
    sw a0,C_A0*4(sp)
    sw a1,C_A1*4(sp)
    sw a2,C_A2*4(sp)
    sw a3,C_A3*4(sp)
    sw t0,C_T0*4(sp)
    sw t1,C_T1*4(sp)
    sw t2,C_T2*4(sp)
    sw t3,C_T3*4(sp)
    sw t4,C_T4*4(sp)
    sw t5,C_T5*4(sp)
    sw t6,C_T6*4(sp)
    sw t7,C_T7*4(sp)
    sw t8,C_T8*4(sp)
    sw t9,C_T9*4(sp)
#endif
    sw ra,C_RA*4(sp)
    mflo t6
    sw t6,C_LO*4(sp)
    mfhi t6
    sw t6,C_HI*4(sp)
    subu sp,24 # allocate min size context
    li k1, 0xb8000000 # atmizer hardware register base
    .set noreorder
#if RXMBX_DEBUG
    la a0, got_rx_mbox_ne
    jal printf
    nop
    la a0, apu_status_msg
    jal printf # list all active interrupts
    nop
    lw a0,0x314(k1) # read APU_Status register
    jal print_reg
    nop
#endif
#if !REG_MAP
    lw a0, 0x404(k1) # read PP_RxMbx register
#else
    lw s4, 0x404(k1) # read PP_RxMbx register
#endif
    jal HostMsg
    nop
#if RXMBX_DEBUG
    la a0, pass_msg
    jal printf
    nop
    nop
#endif
#if RXMBX_DEBUG
    la a0, EPC_msg
    jal printf
    nop
    #lw a0,(C_EPC*4+24)(sp)
    mfc0 k0,C0_EPC
    nop
    jal print_reg
    jal printf
    nop
#endif
    .set reorder
    addu sp,24 # deallocate
#if !REG_MAP | RXMBX_DEBUG
    lw AT,C_AT*4(sp)
    lw v0,C_V0*4(sp)
    lw v1,C_V1*4(sp)
    lw a0,C_A0*4(sp)
    lw a1,C_A1*4(sp)
    lw a2,C_A2*4(sp)
    lw a3,C_A3*4(sp)
    lw t0,C_T0*4(sp)
    lw t1,C_T1*4(sp)
    lw t2,C_T2*4(sp)
    lw t3,C_T3*4(sp)
    lw t4,C_T4*4(sp)
    lw t5,C_T5*4(sp)
    # t6 is restored later
    lw t7,C_T7*4(sp)
    lw t8,C_T8*4(sp)
    lw t9,C_T9*4(sp)
#endif
    lw ra,C_RA*4(sp)
    lw t6,C_LO*4(sp)
    mtlo t6
    lw t6,C_HI*4(sp)
    mthi t6
    .set noreorder
    /*
     restore t6, EPC and deallocate stack */
#if !REG_MAP | RXMBX_DEBUG
    lw t6,C_T6*4(sp)
#endif
    #lw k0,C_EPC*4(sp)
    mfc0 k0,C0_EPC
    nop
    addu sp,C_SIZE*4
    j k0 /* return from interrupt R3000 mode */
    rfe
    .set reorder

The above code is a regular interrupt handler. It must save the contents of the registers before jumping into the handling routine and restore them before exiting from the handler. Because vectored interrupt events occur so frequently, this saving and restoring reduces overall performance dramatically. To avoid it, you may separate the entire general register set into the three domains shown in Table 7.3.

Table 7.3 General Register Map

General      Domain 1     Domain 2
$zero ($0)   $at ($1)     $s8 ($30)
$ra ($31)    $v0 ($2)     $v1 ($3)
$sp ($29)    $a0 ($4)     $s4 ($20)
$gp ($28)    $a1 ($5)     $s5 ($21)
LO           $a2 ($6)     $s6 ($22)
HI           $a3 ($7)     $s7 ($23)
             $s0 ($16)    $k0 ($26)
             $s1 ($17)    $k1 ($27)
             $s2 ($18)    $t4 ($12)
             $s3 ($19)    $t5 ($13)
             $t0 ($8)     $t6 ($14)
             $t1 ($9)     $t7 ($15)
             $t2 ($10)    $t8 ($24)
             $t3 ($11)    $t9 ($25)

The idea behind this separation is to divide the general registers into two nonoverlapping sets: one used only by the regular routines and the other used only by the interrupt handlers. The handlers then do not need to save and restore the registers that are not shared. The shared registers still need to be saved and reloaded.

Many tools support an option that restricts register references to a specified set when generating assembly code. The following procedure uses GNU tools. As shown in Table 7.3, the registers in the General column are required by the compiler and assembler, and are used by both the regular routines and the interrupt handlers. The source files are separated into two sets, one called regular files and the other called handler files.

Step 1.
Compile the files with the -S and -ffixed-reg options to generate assembly files that use only the registers in domain 1.
Step 2. Convert the domain 1 registers in the handler code to domain 2 registers with a post-processing script.
Step 3. Call the assembler to assemble the files and generate the object and executable code.

The drawback of this scheme is that it violates the MIPS register convention, which makes debugging difficult (if not impossible) because the debugger is unaware of the register shuffle. You may prefer to follow the MIPS convention until the code is debugged and working. The sample code provided in this section reflects this choice.

Chapter 8
OAM Cell Processing

This chapter outlines the implementation of the Operation and Maintenance (OAM) cell processing function in ATMizer II+ software. The OAM function is defined at the Physical and the ATM layers. The Physical Layer OAM cell processing is done by the framer (for example, the SuniLite Framer chip). The software running on the ATM Processing Unit (APU) performs the ATM Layer OAM cell processing. The software examples in this chapter are provided for demonstration and evaluation purposes.

This chapter contains the following sections:
• Section 8.1, “Introduction”
• Section 8.2, “F4 OAM Flow”
• Section 8.3, “F5 OAM Flow”

8.1 Introduction

The OAM cells are defined by the International Telecommunication Union in specification ITU-T I.610. OAM cells are loaded by the host to the ATMizer II+ chip. These cells are transferred between two ATM end units to convey management information about the network. The OAM cell flows are defined at the Physical layer and the ATM layer. They have predefined header values to distinguish them from the regular data cells on the link. The OAM flows ‘F1’ and ‘F3’ are at the Physical Layer. The flows ‘F4’ and ‘F5’ are at the ATM Layer.
An ATM cell is identified as an F4 OAM cell by a 0x00000040 header, and as an F5 OAM cell by a 0x0000000A header. The F4/F5 OAM cells are treated as out-of-band cells: the APU processes them and passes them to the host directly, without involving the EDMA. The software support for OAM is used to create OAM cell flows and to filter the incoming OAM cells.

8.2 F4 OAM Flow

The F4 OAM flow is for the management of the Virtual Paths (VPs) between ATM end units. The hardware identifies each connection in the ATMizer II+ chip by a connection number. Since the F4 OAM cell flow for a VP is independent of the connections in that VP, a separate connection number needs to be reserved to support the OAM flow for each VP. The Scheduler uses this connection number to send the OAM cells in the Tx direction. In the Rx direction, the OAM cells are received and processed by the APU.

OAM cells are passed to the host through Mailbox messaging because:
• they are individually complete and do not have to be reassembled,
• they occur at a low rate, and
• they contain important information on the condition of the network, so the host needs to be notified immediately.

Since only one cell is to be sent, using a buffer (even a small buffer) to carry the cell is inefficient. Therefore, the received OAM cell contents for a VP are copied into the primary memory and the host is notified. A structure called OAM_VPC_t is defined in the primary memory to transfer the contents of the OAM cell to and from the host, as shown below in Figure 8.1.
Figure 8.1 OAM Cell Declarations

/* ------------------------------------------------------
 * OAM cell declaration
 */
typedef struct {
    ulong ConNum;
    Cell_t OAM_Cell;
} OAM_VPC_t, *pOAM_VPC_t;

typedef volatile struct {
    ulong CmdAck;
    ulong CmdAck_APU;
#if defined(OAM_F4) || defined(OAM_F5)
    ulong CmdAck_OAM_Tx;
    ulong CmdAck_OAM_Rx;
#endif
    ulong TxCredit;
    ulong RxRing[RX_RING_SIZE];
    ulong Ext_Msg;
    Stat_t Stat[2];
    Config_t Config;
    PCI_HCD_t HCD_Tx;
    PCI_HCD_t HCD_Rx;
#if defined(OAM_F4) || defined(OAM_F5)
    OAM_VPC_t OAM_VPC_Tx;
    OAM_VPC_t OAM_VPC_Rx;
#endif
    uchar Buff[4];
} PCI_t, *pPCI_t;

The host uses the OAM_VPC_Tx structure to send the contents of a VP OAM cell along with the connection number associated with the OAM flow. The APU uses OAM_VPC_Rx to send the contents of the OAM cell that it filtered to the host. Since several OAM flows may be open at the same time, the messaging between the host and the APU uses a handshake through the CmdAck_OAM_Tx and CmdAck_OAM_Rx fields in the primary memory. This prevents the structure from being overwritten before it is read. The host also maintains a structure of type OAM_VPC_t (OAM_VP) for each VP OAM flow. The connection information for the OAM flow is maintained in the HCD_PAR_t structure OAM_VP_HCD, as shown in Figure 8.2.

Figure 8.2 OAM Flow Connection Information

#ifdef OAM_F4
#define OAM_F4_COUNT 1
OAM_VPC_t OAM_VP[OAM_F4_COUNT];
HCD_PAR_t OAM_VP_HCD[OAM_F4_COUNT];
#endif

The ATM end unit that transmits the F4 OAM cells is denoted by F4_B_NT1, and the ATM end unit that receives and turns around the OAM cells is denoted by F4_B_NT2. The code for each end unit is selected at compile time with the preprocessor directives #ifdef F4_B_NT1 and #ifdef F4_B_NT2, respectively.

8.2.1 Initialization of F4 Flow

The F4_B_NT1 host opens an F4 flow for a VP by issuing an open connection command with a connection number.
The connection parameters are initialized in OAM_VP_HCD. The connection is initialized as a CBR connection with a specified cell rate, which sets the cell rate of the OAM cell flow. In the ATMizer II+ chip, the Scheduler schedules the connection, and the first cell is scheduled when a buffer is attached to the VCD corresponding to the connection. Since the OAM flow does not use the data from the buffer attached to the VCD, the host attaches a dummy buffer to the VCD to start the OAM cell transmission. Before attaching the dummy buffer, the host initializes the OAM cell in the primary memory for the VP and then issues a Buff command via the TxRing to attach the dummy buffer. This procedure is shown in the code below (Figure 8.3).

Figure 8.3 OAM Cell Initialization

#ifdef OAM_F4
#ifdef F4_B_NT1
    /* Open Connection for OAM_VP */
    ConNum = OAM_VP_OFFSET;
    for (i=0; i < pPCI->Config.OAM_VPCount; i++) {
        OAM_VP_HCD[i].ConNum = ConNum;
        OAM_VP_HCD[i].CellHeader = OAM_F4E_CellHdr |
            ((ConNum - OAM_VP_OFFSET) << CELL_VPI);
        OAM_VP_HCD[i].Class = Class_CBR;
        OAM_VP_HCD[i].PCR = OAM_F4_PCR;
        OAM_VP_HCD[i].Status = REQ_OPEN;
        if (Open_Connection(&OAM_VP_HCD[i]))
            Halt("Cannot open OAM_VP connection");
        OAM_VP_HCD[i].StartTime = mfc0(9);
        OAM_VP_HCD[i].Status = OPEN;

        /* Initialize the OAM Cell */
        OAM_Init(OAM_VP_HCD[i].ConNum, OAM_VP_HCD[i].CellHeader,
                 OAM_Perf_Mon, &OAM_VP[i]);

        /* Send OAM Cell to the APU */
        OAM_Send(&OAM_VP[i], MSG_OAM_F4);
        printf("Opened Connection OAM_VP with VP= %d.\n", OAM_VP_HCD[i].ConNum);

        /* Send dummy Buffer to start connection */
        half(BFD[BuffNum].BFD_Ctrl) = 0;
        BFD[BuffNum].ConNum = ConNum;
        n = 0;
        while (RingPut(&TxRing, BuffNum) == 0)
            if (++n == TIMEOUT_TX_RING)
                Halt("TxRing timeout in OpenConnection");
        ConNum++;
    }
#endif
#endif

In the code in Figure 8.3, OAM_VP_OFFSET is defined to
be the offset reserved for the connection numbers for the VP OAM flows. The OAM cell header is formed using the connection number for the VP and OAM_F4E_CellHdr, which is defined in Figure 8.4.

Figure 8.4 OAM Cell Header Formation

#define OAM_F4E_CellHdr 0x00000040 /* ATM Layer VPI OAM cell - F4 End-to-End */

The routine OAM_Init() initializes the OAM_VP structure and the routine OAM_Send() sends it to the APU. In OAM_Init(), the contents of the OAM cell are initialized based on the function of the OAM cell as defined in ITU-T I.610. Once the APU receives the message, it initializes the OAM cell contents in the secondary memory. From secondary memory, the contents are copied into the cell in Cell Buffer Memory (CBM) before the OAM cell is sent out.

8.2.2 F4 Flow Transmit

When the connection number assigned to the VP is serviced by the Scheduler, the OAM cell contents are copied from the secondary memory OAM_VP structure to the cell in CBM. The cell is then sent out. This is achieved by the routine OAM_Send() in the TxCell routine of the APU as shown below in Figure 8.5.
Figure 8.5 OAM_Send() Routine

/* -------------------------------------------------------
 * Name: OAM_Send()
 *
 * Description: Send OAM cell for F4 and F5 flows
 *
 * parameters: ConNum: connection number of the OAM flow
 *             OAM_Type: MSG_OAM_F4 or MSG_OAM_F5
 *             pCell: cell address in Cell Buffer
 *
 * -------------------------------------------------------*/
void OAM_Send(const ulong ConNum, const ulong OAM_Type, const pCell_t pCell)
{
#if defined(F4_B_NT1) || defined(F5_B_NT1)
    int i;
#endif
    uchar *Ptr = 0;

    switch (OAM_Type) {
#ifdef OAM_F4
    case MSG_OAM_F4:
        Ptr = (uchar *) &OAM_VP[ConNum-OAM_VP_OFFSET].OAM_Cell.Payld;
        break;
#endif
#ifdef OAM_F5
    case MSG_OAM_F5:
        Ptr = (uchar *) &OAM_VC[ConNum-OAM_VC_OFFSET].OAM_Cell.Payld;
        break;
#endif
    default:
        break;
    }

#if defined(F4_B_NT1) || defined(F5_B_NT1)
    /* Processing for B_NT1 - Send OAM Cell */
    pCell->CDS = ONE(CDS_Crc10);
    switch (OAM_Type) {
#ifdef OAM_F4
    case MSG_OAM_F4:
        pCell->CellHdr = OAM_VP[ConNum-OAM_VP_OFFSET].OAM_Cell.CellHdr;
        break;
#endif
#ifdef OAM_F5
    case MSG_OAM_F5:
        pCell->CellHdr = OAM_VC[ConNum-OAM_VC_OFFSET].OAM_Cell.CellHdr;
        break;
#endif
    default:
        break;
    }
    /* Copy the 48-byte OAM payload into the cell */
    for (i=0; i < 48; i++)
        pCell->Payld[i] = *Ptr++;
    EDMA_TxCell(0,pCell);
    return;
#endif

#if defined(F4_B_NT2) || defined(F5_B_NT2)
    Halt("Example code does not send OAM cells from B_NT2");
#endif
}

8.2.3 F4 Flow Receive

On the receiving side, the APU filters the F4 OAM cell using the cell header, and the RxCell processing calls the OAM_Receive() routine to process the OAM cell. In the OAM_Receive() code shown in the following example, the F4_B_NT2 end unit turns the cell around and retransmits it to F4_B_NT1. In other applications, the cell contents may be modified to notify F4_B_NT1 of changes in the network conditions. The same routine is used by F4_B_NT1 to process the OAM cell received back from F4_B_NT2.
The F4_B_NT1 end unit informs the host of the OAM cell by copying the cell contents to the primary memory and sending a message to the host through the Mailbox, as shown in Figure 8.6.

Figure 8.6 APU OAM_Receive() Routine

/* ------------------------------------------------------
 * Name: OAM_Receive()
 * Description: Processing of OAM cell received for F4 or F5 flows
 *
 * parameters: OAM_Type: MSG_OAM_F4 or MSG_OAM_F5
 *             pCell: cell address in Cell Buffer
 *
 * ------------------------------------------------------*/
void OAM_Receive(const ulong OAM_Type, const pCell_t pCell)
{
    /* In this example code we turn around the OAM cell and update
     * the outgoing OAM cell.
     */
#if defined(F4_B_NT1) || defined(F5_B_NT1)
    ulong Msg, Cmd;
    ulong *Src, *Dst;
    long Status;
    int i;
#endif
    ulong ConNum = 0;

    switch (OAM_Type) {
#ifdef OAM_F4
    case MSG_OAM_F4:
        /* Mask first, then shift: >> binds tighter than & */
        ConNum = OAM_VP_OFFSET + ((pCell->CellHdr & OAM_VP_MASK) >> CELL_VPI);
        break;
#endif
#ifdef OAM_F5
    case MSG_OAM_F5:
        ConNum = ((pCell->CellHdr & OAM_VC_MASK) >> CELL_VCI);
        break;
#endif
    default:
        break;
    }
    if (pCell->CDS & ONE(CDS_Crc10))
        Stat.ErrCrc10++;

#if defined(F4_B_NT1) || defined(F5_B_NT1)
    /* Test to see if the Mailbox is free */
    Cmd = pPCI->CmdAck_OAM_Rx;
    while (Cmd != MAILBOX_FREE)
        Cmd = pPCI->CmdAck_OAM_Rx;
    pPCI->CmdAck_OAM_Rx = MAILBOX_BUSY;

    /* Copy the contents of the cell to the OAM_VP */
    Src = (ulong *) pCell;
    Dst = (ulong *) &pPCI->OAM_VPC_Rx.OAM_Cell;
    pPCI->OAM_VPC_Rx.ConNum = ConNum;
    for (i=0; i < sizeof(Cell_t)/sizeof(ulong); i++)
        *Dst++ = *Src++;
    /* pPCI->OAM_VPC_Rx.OAM_Cell = *pCell; */

    /* Send the ConNum in the MailBox to host */
    switch (OAM_Type) {
#ifdef OAM_F4
    case MSG_OAM_F4:
        pPCI->Ext_Msg = MSG(MSG_OAM_F4, &pPCI->OAM_VPC_Rx);
        break;
#endif
#ifdef OAM_F5
    case MSG_OAM_F5:
        pPCI->Ext_Msg = MSG(MSG_OAM_F5, &pPCI->OAM_VPC_Rx);
        break;
#endif
    default:
        break;
    }
    Msg = MSG(MSG_ASYNC, &pPCI->Ext_Msg);
    Status = (long) APU_Status();
    while (Status < 0)
        Status = (long) APU_Status();
    PP_TxMbx(Msg);
    ACI_Free(pCell);
    return;
#endif

#if defined(F4_B_NT2) || defined(F5_B_NT2)
    /* Update using contents of OAM_Cell
     * Turn around the received OAM Cell
     */
    /* NULL Update in this example code */
    /* Send the cell with modified payload */
    while (EDMA_Status() & ONE(EDMA_TxCellReqFull)) {
        ;
    }
    EDMA_TxCell(0,pCell);
    switch (OAM_Type) {
#ifdef OAM_F4
    case MSG_OAM_F4:
        Stat.OAM_VP++;
        break;
#endif
#ifdef OAM_F5
    case MSG_OAM_F5:
        Stat.OAM_VC++;
        break;
#endif
    default:
        break;
    }
    return;
#endif
}

8.2.4 Host Processing of F4 Flow

The host processes the OAM cell for the VP with the OAM_Receive() routine. An example of that routine is shown below (Figure 8.7). You can modify the code or add to it for your specific application.

Figure 8.7 Host OAM_Receive() Routine

/* -------------------------------------------------------
 * Name: OAM_Receive()
 *
 * Description: Processing of OAM cell received for F4 and F5 flow
 *
 * parameters: Msg: Message with address of OAM Cell
 *
 * -------------------------------------------------------*/
void OAM_Receive(const ulong Msg)
{
    /* In this example code we update the contents of the OAM_VP */
    ulong *TargetAddr = (ulong *) MapFromAPU((ulong) MSG_PTR(Msg));
    ulong *Dst;
    ulong OAM_Type = MSG_TAG(Msg);
    ulong ConNum = *TargetAddr;
    pOAM_F45_t Payld;
    int i;

    switch (OAM_Type) {
#ifdef OAM_F4
    case MSG_OAM_F4:
        Payld = (pOAM_F45_t) &OAM_VP[ConNum - OAM_VP_OFFSET].OAM_Cell.Payld;
        Dst = (ulong *) &OAM_VP[ConNum - OAM_VP_OFFSET];
        /* Copy ConNum plus the cell (hence the + 1) */
        for (i=0; i < (sizeof(Cell_t)/sizeof(ulong) + 1); i++)
            *Dst++ = *TargetAddr++;
        break;
#endif
#ifdef OAM_F5
    case MSG_OAM_F5:
        Payld = (pOAM_F45_t) &OAM_VC[ConNum - OAM_VC_OFFSET].OAM_Cell.Payld;
        Dst = (ulong *) &OAM_VC[ConNum - OAM_VC_OFFSET];
        for (i=0; i < (sizeof(Cell_t)/sizeof(ulong) + 1); i++)
            *Dst++ = *TargetAddr++;
        break;
#endif
    default:
        break;
    }

    switch (Payld->OAM_Func_Type) {
    case
        OAM_Fault_AIS:
    case OAM_Fault_FERF: {
        pOAM_AIS_t OAM_AIS = (pOAM_AIS_t) Payld->R;
        /* Process the information */
#if defined(F4_B_NT1) || defined(F5_B_NT1)
        /* NULL BEHAVIOR */
#endif
#if defined(F4_B_NT2) || defined(F5_B_NT2)
        /* NULL BEHAVIOR FOR B_NT2 */
#endif
        break;
    }
    /* Placeholders for user modification */
    case OAM_Perf_Fwd:
    case OAM_Perf_Bck:
    case OAM_Perf_Mon: {
        pOAM_Perf_t OAM_Perf = (pOAM_Perf_t) Payld->R;
        /* Process the information */
#if defined(F4_B_NT1) || defined(F5_B_NT1)
#endif
#if defined(F4_B_NT2) || defined(F5_B_NT2)
        /* NULL BEHAVIOR FOR B_NT2 */
#endif
        break;
    }
    case OAM_Act_Perf:
    case OAM_Act_Cont: {
        pOAM_Act_t OAM_Act = (pOAM_Act_t) Payld->R;
        /* Process the information */
#if defined(F4_B_NT1) || defined(F5_B_NT1)
#endif
#if defined(F4_B_NT2) || defined(F5_B_NT2)
#endif
        break;
    }
    default:
        break;
    }

#if defined(F4_B_NT1) || defined(F5_B_NT1)
    /* Release the Rx mailbox so the APU can post the next OAM cell */
    pPCI->CmdAck_OAM_Rx = MAILBOX_FREE;
    return;
#endif
}

In summary, the OAM_VPC_t structure is used to pass the OAM cells from the host to the APU and back. The F4 flow for each VP is set up by opening a connection and initializing a VCD.

8.3 F5 OAM Flow

The F5 OAM flow is defined for a Virtual Connection (VC). As with the F4 flow, a separate connection number is assigned to the OAM flow for a VC. Therefore, a VC with an OAM flow is represented by two connection numbers in the ATMizer II+ chip. One connection number is for the regular data flow with the appropriate scheduling mechanism (CBR, VBR, ABR, and so on), and the other is for the OAM flow associated with the VC. The APU uses Mailbox messaging to convey the OAM cell contents to and from the host, as described in the previous section. The procedures for sending and receiving OAM cells remain the same as for the F4 flow, with the exception of the cell header filtering, which is different for the F5 flow as defined in ITU-T I.610.
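The difference in header filtering can be illustrated with the two header values quoted in Section 8.1 (0x00000040 for F4 and 0x0000000A for F5). The sketch below is not the ATMizer II+ filter code. It assumes the standard 4-byte UNI cell header without the HEC octet (GFC, VPI, 16-bit VCI, 3-bit PTI, CLP), in which 0x00000040 corresponds to VCI = 4 and 0x0000000A corresponds to PTI = 5. The type and function names are hypothetical, and only the end-to-end flows are recognized.

```c
#include <stdint.h>

/* Hypothetical classification result for a received cell. */
typedef enum { CELL_DATA, CELL_OAM_F4, CELL_OAM_F5 } cell_kind_t;

/* Field extraction for a 32-bit UNI header (HEC octet not included):
 * GFC[31:28] VPI[27:20] VCI[19:4] PTI[3:1] CLP[0]. */
#define HDR_VCI(h) (((h) >> 4) & 0xFFFFu)
#define HDR_PTI(h) (((h) >> 1) & 0x7u)

#define VCI_F4_END_TO_END 4u  /* matches header value 0x00000040 */
#define PTI_F5_END_TO_END 5u  /* matches header value 0x0000000A */

/* F4 is recognized by a reserved VCI inside any VP; F5 is recognized by
 * the PTI of a cell that travels on the user VC itself. */
static cell_kind_t classify_cell(uint32_t hdr)
{
    if (HDR_VCI(hdr) == VCI_F4_END_TO_END)
        return CELL_OAM_F4;
    if (HDR_PTI(hdr) == PTI_F5_END_TO_END)
        return CELL_OAM_F5;
    return CELL_DATA;
}
```

Because an F5 cell carries the VPI/VCI of the connection it monitors, the filter has to look at the PTI field rather than at a reserved VCI, which is why the F5 flow needs its own filtering path.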
Chapter 9
AAL3/4 Processing

This chapter describes the software for AAL3/4 processing in the ATMizer II+ chip. The EDMA in the ATMizer II+ chip is designed to process AAL5 CS-PDUs and to support AAL0 type connections. The segmentation and reassembly support that the EDMA provides for AAL0 connections can be used to implement the CS-PDU segmentation and reassembly for AAL3/4 connections. The ATM Processing Unit (APU) in the ATMizer II+ chip can preprocess AAL3/4 PDU data before it is segmented or reassembled by the EDMA.

This chapter contains the following sections:
• Section 9.1, “Introduction”
• Section 9.2, “AAL3/4 Segmentation”
• Section 9.3, “AAL3/4 Reassembly”

9.1 Introduction

The AAL3/4 CS-PDU defined by ITU-T I.363 is segmented into cells as shown in Figure 9.1. The Segment Type in the SAR-PDU header indicates whether the SAR-PDU is a Beginning of Message (BOM), a Continuation of Message (COM), an End of Message (EOM), or a Single Segment Message (SSM). The Sequence Number (SN) is incremented by the sender, starting at the BOM, so the receiver can detect missing or out-of-order cells. The Multiplex Identification (MID) allows up to 1024 logical connections to be multiplexed over a single ATM virtual channel. The Length Indicator in the trailer indicates the number of valid data octets in the payload (44 for BOM and COM segments; possibly fewer for EOM and SSM segments).
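The SAR-PDU header and trailer are each 16 bits wide, which the following sketch makes concrete. It uses the field widths given in Figure 9.1 (ST 2 bits, SN 4 bits, MID 10 bits, LI 6 bits, CRC 10 bits) and assumes that the first-listed field occupies the most significant bits of each 16-bit word. The function names are hypothetical and the CRC value is passed in rather than computed, since in the ATMizer II+ the ACI generates the CRC10.

```c
#include <stdint.h>

/* Pack the 2-octet SAR-PDU header: ST[15:14] SN[13:10] MID[9:0]. */
static uint16_t sar_pack_header(unsigned st, unsigned sn, unsigned mid)
{
    return (uint16_t)(((st & 0x3u) << 14) |
                      ((sn & 0xFu) << 10) |
                      (mid & 0x3FFu));
}

/* Pack the 2-octet SAR-PDU trailer: LI[15:10] CRC[9:0]. */
static uint16_t sar_pack_trailer(unsigned li, unsigned crc10)
{
    return (uint16_t)(((li & 0x3Fu) << 10) | (crc10 & 0x3FFu));
}

/* The 4-bit Sequence Number wraps from 15 back to 0. */
static unsigned sar_next_sn(unsigned sn)
{
    return (sn + 1u) & 0xFu;
}

/* Extract the Sequence Number from a packed header, as a receiver would
 * do when checking for missing or out-of-order cells. */
static unsigned sar_header_sn(uint16_t hdr)
{
    return (hdr >> 10) & 0xFu;
}
```

With these widths, a full 44-octet payload gives a trailer whose upper six bits hold the value 44, and the 10-bit MID field accounts for the 1024 multiplexed connections mentioned above.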
Figure 9.1 AAL3/4 Cell Layout

ATM AAL3/4 Cell
    Cell Header                  5 octets
    SAR-PDU
        SAR-PDU Header
            ST                   2 bits
            SN                   4 bits
            MID                  10 bits
        SAR-PDU Payload          44 octets
        SAR-PDU Trailer
            LI                   6 bits
            CRC                  10 bits

ST = Segment Type
SN = Sequence Number
MID = Multiplex Identification
LI = Length Indicator
CRC = Cyclic Redundancy Check

In the segmentation process, the APU has to assemble the SAR-PDU header and trailer before the cell is transmitted. The next section outlines the steps needed to support an AAL3/4 connection in the ATMizer II+ application code.

9.2 AAL3/4 Segmentation

The processing needed for AAL3/4 connections is different from that for AAL5 connections. To distinguish AAL3/4 connections from AAL5 connections, the AAL type needs to be a part of the ACD structure. The APU processes the AAL3/4 SAR-PDU header and trailer after the EDMA fills in the SAR-PDU data in the cell payload. To facilitate this, the VCD is set in the AAL0 mode (VCD_ALL0 control bit set) with Cell Hold turned on (VCD_CellHold control bit set). In the CellHold mode, the EDMA processes the cell but does not send the cell out to the ACI. Instead, the EDMA returns the cell address to the auxiliary completion queue with the Cell-Hold completion message. The connection number is returned in the Cell-Hold completion message. The APU determines the AAL type of the connection from the AAL_Type field in the ACD_Ctrl_t structure. The format of the ACD_Ctrl_t structure is shown in Figure 9.2.
Figure 9.2 ACD_Ctrl_t Structure

typedef struct {
    ushort Tx:1,
           CalNum:2,
           Class:3,
           EFCI:1,
           CLP:1,
           PHY:5,
           CellHold:1,
           AAL_Type:2;
} ACD_Ctrl_t;

The APU processes the Cell-Hold completion message to complete the header and the trailer of the SAR-PDU using the SAR_Hdr field, which holds the sequence number, segment type, and the MID of the SAR-PDU, as shown in Figure 9.3.

Figure 9.3 SAR-PDU Header Declarations

/* ----------------------------------------
 * SAR-PDU Header type
 */
typedef struct {
    ushort ST:2,   /* Segment Type */
           SN:4,   /* Sequence Number */
           MID:10; /* Multiplexing Identifier */
} SAR_HDR_t;

The ACD structure is modified to hold the SAR_Hdr field as shown in the following:

typedef struct {
    ulong ICG;          /* in 16.8 format */
    ulong ThTxTime;
    /* The following declarations are for VBR connections only */
    ulong Bucket;       /* Current Bucket contents */
    ulong Increment;    /* Bucket Increment each time a cell is sent */
    ulong Limit;        /* Bucket limit */
    ulong ICG_PCR;      /* 1/PCR */
    uchar Pad[32-6*4];
    ushort ACD_Ctrl;    /* ACD Control for the connection */
    SAR_HDR_t SAR_Hdr;  /* SAR-PDU Header */
} ACD_t, *pACD_t;

To leave space for the header and the trailer in the cell, an AAL0 connection provides the VCD_Offs and VCD_Tbytes fields, which occupy the CRC32 field of the VCD. VCD_Offs specifies the offset from the beginning of the cell to the start of the payload data, and VCD_Tbytes specifies the number of bytes to be filled in the payload of the cell. For a cell size of 52 bytes (cell header and payload), VCD_Offs is set to 10 (4 bytes for the CDS, 4 bytes for the cell header, and 2 bytes for the SAR-PDU header), and VCD_Tbytes is set to 44, which is the size of the SAR-PDU payload.
The EDMA returns the actual number of bytes copied into the cell in CDS_Tbytes, which can then be used to determine the LI field of the trailer.

In the TxCell routine of the application code, an EDMA_TxCell command for an AAL3/4 connection is issued in the same way as for an AAL5 connection. Therefore, the TxCell routine is not altered to support AAL3/4 processing. When the payload data has been copied from the buffer to the cell, the EDMA returns the connection number in the completion queue and the APU processes the message to complete the header and the trailer of the SAR-PDU.

The ST field of the SAR-PDU is based on the CDS_BOM and CDS_EOM bits, which the EDMA sets to indicate the beginning and end of a message. The SN field of the SAR-PDU is copied from the ACD structure. The trailer is formed by setting LI based on CDS_Tbytes, and CDS_Crc10 is set in the CDS so that the ACI generates the CRC10 over the contents of the cell. Finally, the SN is updated in the ACD for the connection. Note that CDS_Head is also updated to point to the next cell address that was sent to the EDMA for this connection. The SAR-PDU is then sent to the ACI by issuing an EDMA_TxCell with a zero connection number. This is shown in the following code (Figure 9.4):

Figure 9.4 AAL34_Send() Routine

    /* -------------------------------------------------------
     * Name: AAL34_Send()
     *
     * Description: Send AAL34 cell after it is returned by the
     *              EDMA in the TxCompl queue.
     *
     * parameters: ConNum: Connection number of the VC.
     *
     * -------------------------------------------------------*/
    void AAL34_Send(const ulong ConNum)
    {
        register pCell_t pCell = 0;
        ulong CDS_Head = (ulong) ACD[ConNum].CDS_Head;

        if (ACD[ConNum].CDS[CDS_Head]) {
            pCell = (pCell_t) ((ulong)pCBM + (ulong) ACD[ConNum].CDS[CDS_Head]);
        } else
            Halt("Empty Cell for AAL34 Tx Connection");

        ACD[ConNum].CDS[CDS_Head] = 0;
        ACD[ConNum].CDS_Head = (CDS_Head + 1) % 10;

        /* Set the Crc10 bit of the CDS */
        pCell->CDS |= ONE(CDS_Crc10);

        /* Set the Segment type of the cell */
        if (pCell->CDS & ONE(CDS_BOM)) {
            if (pCell->CDS & ONE(CDS_EOM)) {
                ACD[ConNum].SAR_Hdr.ST = ST_SSM;
            } else
                ACD[ConNum].SAR_Hdr.ST = ST_BOM;
        } else {
            if (pCell->CDS & ONE(CDS_EOM)) {
                ACD[ConNum].SAR_Hdr.ST = ST_EOM;
            } else
                ACD[ConNum].SAR_Hdr.ST = ST_COM;
        }

        /* Update the SAR_Hdr and SAR_Trailer fields of the cell */
        half(pCell->Payld[0]) = half(ACD[ConNum].SAR_Hdr);
        half(pCell->Payld[46]) = (ushort) pCell->CDS & 0x0000fc00;

        /* Update the ACD for the connection */
        ACD[ConNum].SAR_Hdr.SN = (ACD[ConNum].SAR_Hdr.SN + 1) & 0xf;

        EDMA_TxCell(0,pCell);
    }

9.3 AAL3/4 Reassembly

The processing of an AAL3/4 connection in the receive direction involves extracting the data from the SAR-PDU after verifying that the header and trailer are free of errors. Since the BFD_ErrLength and BFD_ErrCrc control bits in the BFD are not set by the EDMA, the APU needs to update them when the SAR-PDU is processed. The processing of the AAL3/4 cell is done by the AAL34_Receive() routine shown in Figure 9.5.
Figure 9.5 AAL34_Receive() Routine

    /* -------------------------------------------------------
     * Name: AAL34_Receive()
     *
     * Description: Processing of AAL34 cell received
     *
     * parameters: ConNum: Connection number of the VC.
     *             pCell: cell address in Cell Buffer
     *
     * -------------------------------------------------------*/
    void AAL34_Receive(const ulong ConNum, const pCell_t pCell)
    {
        pAAL34_Payld_t Payld = (pAAL34_Payld_t) &pCell->Payld;
        ulong CurrBFD = (ulong) VCD[ConNum].CurrBFD;
        ulong CellHdr = pCell->CellHdr;

        switch (Payld->SAR_Hdr >> ST_SHIFT ) {
        case ST_BOM:
            pCell->CDS |= ONE(CDS_BOM);
            break;
        case ST_SSM:
        case ST_EOM:
            pCell->CDS |= ONE(CDS_EOM);
            CellHdr |= ONE(CELL_EOM);
            break;
        case ST_COM:
            /* pCell->CDS &= ~( ONE(CDS_BOM) | ONE(CDS_EOM) ); */
            break;
        default:
            break;
        }

        /* Set the Tbytes of the cell */
        pCell->CDS &= 0xffff03ff;   /* Clear out the CDS_TBytes */
        pCell->CDS |= (ulong) (Payld->SAR_Tlr & LI_MASK);

        /* Error in the Crc10 of the cell */
        if (pCell->CDS & ONE(CDS_Crc10))
            half(BFD[CurrBFD].BFD_Ctrl) = half(BFD[CurrBFD].BFD_Ctrl) | ONE(BFD_ErrCrc);

        /* Sequence number error */
        if ((Payld->SAR_Hdr & SN_MASK) != (half(ACD[ConNum].SAR_Hdr) & SN_MASK) )
            half(BFD[CurrBFD].BFD_Ctrl) |= ONE(BFD_ErrLength);

        /* Update the SN of the ACD */
        ACD[ConNum].SAR_Hdr.SN = (ACD[ConNum].SAR_Hdr.SN + 1) & 0xf;

        /* Send the cell to the EDMA for processing */
        EDMA_RxCell(ConNum, CellHdr, pCell);
    }

Chapter 10
Initialization

This chapter describes initialization and configuration tasks, and provides sample code for the ATMizer II+ chip.
• Section 10.1, “Initialization Overview”
• Section 10.2, “Booting Procedures”
• Section 10.3, “C Preamble Execution”
• Section 10.4, “CPU Initialization and Configuration”
• Section 10.5, “Configuration Header File”
• Section 10.6, “Host PCI Access”
• Section 10.7, “Memory Allocation”
• Section 10.8, “Hardware Registers Initialization”
• Section 10.9, “Data Structures Initialization”

10.1 Initialization Overview

The following steps are typically required to initialize the ATMizer II+ chip:

Step 1. Booting
The booting step initializes the Configuration and Cache Control (CCC) register and the SDRAM controller, then copies the initialization and application code to an executable memory location.

Step 2. C Preamble Execution
C preamble execution includes .bss section clearing, stack allocation, and initialization of the global data pointer and stack pointer registers.

Step 3. CPU Initialization and Configuration
CPU initialization and configuration mainly includes configuring and flushing the caches and setting up the interrupt and exception handlers.

Step 4. Memory Allocation
Memory allocation defines the maps for primary, secondary, and Cell Buffer Memory (CBM).

Step 5. Hardware Registers Initialization
ATMizer II+ chip hardware initialization and configuration covers the registers and mode settings of all hardware modules.

Step 6. Data Structures Initialization
Data structures initialization includes Free Cell List initialization, clearing the Virtual Connection Descriptors (VCDs), and setting the Buffer Descriptors (BFDs) and Scheduler Calendar Table (SCDs).

Step 7. Interrupt Handler Initialization
When the interrupts are enabled, the interrupt handlers are used by the software to process the interrupts. See Chapter 7, “Interrupt Handling” for details.
The LSI Logic PROM Monitor (PMON) can be used for default initialization for steps 1 through 3 above and as a software application debugger. However, PMON is not required in your system. Following steps 1 through 3, you may load the application program (with the load command) to the desired memory location. To compile and link a program for execution, PMON provides a utility program called pmcc. Pmcc invokes the host’s C compiler with the correct arguments and flags, then generates FastFormat records of the code for downloading.

You may suppress the default initialization included in PMON and provide your own initialization code. Sample code is provided in Section 10.2, “Booting Procedures,” Section 10.3, “C Preamble Execution,” and Section 10.4, “CPU Initialization and Configuration.”

The remainder of this chapter follows the above steps and illustrates each in detail.

10.2 Booting Procedures

The L64364 ATMizer II+ chip supports the following booting procedures:

• booting from the Secondary Port’s byte-wide EPROM page
• booting from CBM/serial PROM
• booting from CBM/Primary Port

Each booting procedure requires a unique sequence of code before reading the actual application code. In the boot code, the ATM Processing Unit (APU) in the ATMizer II+ chip must first initialize the CCC register and the SDRAM controller. The SDRAM controller must be initialized before the SDRAM can be used; this is important for applications running code out of SDRAM. The APU also needs to remap the exception vector space to the location where the application program will reside. An example of this default initialization code from the LSI Logic PMON is shown in Section 10.2.1.

For applications that boot from CBM, the boot code should contain a routine that copies the application program to an executable memory location before jumping to that location for execution.
The remainder of this section assumes that the application code is stored in SDRAM.

10.2.1 Default ATMizer II+ Chip Initialization

The following code (Figure 10.1) initializes the CCC register and the SDRAM controller to default settings before jumping to the main loop of PMON:

Figure 10.1 CCC Register and SDRAM Controller Initialization

    #include <regdef.h>
    #include <cp0_scobra.h>

        .text
        .globl atmizer2Init
        .ent atmizer2Init
    atmizer2Init:
        # setup the CCC configuration register
        # enable: CMP, IIE, DIE, MUL, MAD, BGE, IPWE(1K), WB
        # Icache: 2 way set assoc, 4K set size
        # Dcache: 2 way set assoc, 4K set size
        # CCC <- 0000 0001 1111 0011 1011 0110 0010 0000
        li t0, 0x01f3b620

        .set noreorder
        # load the CP0 configuration register
        mtc0 t0, C0_CCC
        .set reorder

        # setup the wait states of secondary memory page 0, 1, 2
        li t1, 0xf8000888
        li t2, M_SBCR
        sw t1, (t2)

        # setup the AddrMap Register to direct exceptions
        # to the SDRAM.
        li t1,(0<<ADRM_EXCMAP_SHFT)
        li t2,M_APU_AddrMap
        sw t1,(t2)

        # setup the SDRAM config info
        # Load base address
        li t2, SBC_BASE

        /* Precharge command */
        li t0, 0x4033b753
        sw t0, 4(t2)        # control reg

        /* Mode register */
        li t0, 0x20228530
        sw t0, 4(t2)        # control reg

        /* Set Mode */
        li t0, 0x0000eeee
        li t3, 0x80810000
        sw t0, (t3)

        /* Set Control */
        li t0, 0x10228530
        sw t0, 4(t2)        # control reg

        /* Load refresh register */
        li t0, 0x00000300
        sw t0, 8(t2)        # refresh reg

        j ra
        .end atmizer2Init

The CCC register’s default setting can be overwritten with the desired configuration for the specific application after the APU jumps out of the boot sequence and into the application-specific code.
This is described in detail in Section 10.4, “CPU Initialization and Configuration.”

10.2.2 Secondary Port EPROM Boot Sequence

Booting from the Secondary Port’s 8-bit EPROM is selected if the SYS_BOOT[1:0] pins are 0b00 when the PCI_RSTn signal is deasserted. The boot exception address of 0xBFC0.0000 is mapped to physical address 0x0000.0000. There is no special boot code required in this case since the EPROM is mapped to a page on the Secondary Port and is used only for the firmware for the APU and data structures.

10.2.3 Cell Buffer Memory/Serial PROM Boot Sequence

When serial PROM boot is selected (SYS_BOOT[1:0] = 0b11), the APU remains in reset after the PCI_RSTn signal is deasserted until the first 256 bytes of data from the serial PROM are copied into CBM. When the copy is completed, the APU leaves the reset state and begins execution from CBM. The 64 instructions in CBM copy the remaining code from the serial PROM to a valid memory location (CBM, Primary Port memory, or Secondary Port memory) by reading the APU_SRL register and then jumping to that location to continue execution.

The boot code in CBM has to be located in a different address area from the actual application program. See Section 10.4.5, “Icache and I-RAM Configuration” for descriptions of how this is done. The CBM boot code cannot be compiled separately from the main application program because it requires the size of the application code to be copied.
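The copy step described above can be sketched in C for host-side experimentation. This is an illustration under stated assumptions, not the boot code itself: `srl_read_fn`, `serial_copy`, and the `fake_prom_*` stand-ins are hypothetical names, and on hardware the accessor would be a volatile read of the APU_SRL register. The point of the sketch is that the remaining byte count and the word just read are kept in separate variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor: returns the next 32-bit word from the serial PROM. */
typedef uint32_t (*srl_read_fn)(void);

/* Copy nbytes (rounded up to whole words) from the PROM stream to dst.
 * Returns the number of words stored. */
static size_t serial_copy(uint32_t *dst, size_t nbytes, srl_read_fn read_word)
{
    size_t words = (nbytes + 3u) / 4u;
    for (size_t i = 0; i < words; i++)
        dst[i] = read_word();   /* data word never overwrites the counter */
    return words;
}

/* Stand-in PROM for host-side testing: yields 0x1000, 0x1001, ... */
static uint32_t fake_prom_next = 0x1000u;
static uint32_t fake_prom_read(void)
{
    return fake_prom_next++;
}
```

A boot routine built along these lines would call the copy once for the .text image and once for the .data image, then jump to the destination address.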
The following code (Figure 10.2) is used to boot from a serial PROM:

Figure 10.2 Serial Boot Routine

    #include <regdef.h>
    #include <cp0_scobra.h>

    #define start_addr 0x80820000

        .extern start_addr 4
        .extern etext 4

        .text
        .globl serialBoot
        .ent serialBoot
    serialBoot:
        # set SR and CAUSE to something sensible
        li v0,SR_BEV
        .set noreorder
        .set noat
        mtc0 v0,C0_SR
        nop
        mtc0 zero,C0_CAUSE
        nop

        # set up the CPU and SDRAM controller
        # this is the routine described in Section 10.2.1
        jal atmizer2Init
        nop

        # copy code from Serial PROM to SDRAM
        # get size of code and data to be copied
        # assuming they are to be copied to SDRAM addr 0xa0820000
        li t1,start_addr        # beginning of .text section
                                # to be copied
        la t0,etext             # end of .text section
        subu t2,t0,t1           # number of instr bytes to copy
        li t1,0xa0820000        # SDRAM address where the program
                                # is to be copied to
    loop_text:
        lw t3,M_APU_SRL         # read Serial PROM
        sub t2,4                # decrement counter by 4 bytes (1 word)
        sw t3,(t1)              # store instr word from Serial PROM
                                # to SDRAM
        .set noreorder
        .set noat
        bnez t2,loop_text       # get next word
        addi t1,0x4             # point to next addr in SDRAM
        .set at
        .set reorder

        la t3,edata             # end of .data section
        subu t2,t3,t0           # number of data bytes to copy
        la t1,_fdata            # address where the .data section
                                # is to be copied to
    loop_data:
        lw t4,M_APU_SRL         # read Serial PROM
        sub t2,4                # decrement counter by 4 bytes (1 word)
        sw t4,(t1)              # store data word from Serial PROM
                                # to SDRAM
        .set noreorder
        .set noat
        bnez t2,loop_data       # get next data word
        addi t1,0x4             # point to next addr in SDRAM
        .set at
        .set reorder

        # jump to SDRAM for start of execution
        li t1,0xa0820000
        j t1
        nop
        .set at
        .set reorder
        .end serialBoot

This code consists of about 56 instructions after compilation. It fits completely in CBM and executes when the APU jumps to the reset exception vector. It is compiled with and linked to the application code. The link address for the boot code is 0xB000.0000 and the link address for the application code is 0x80B2.0000 if the application code is cache resident.

If the application code is separated into two parts, one in a noncacheable area and the other in I-RAM, the copying process is a little more complicated. The copying code has to be modified to copy the two sections to two different memory areas.

10.2.4 Cell Buffer Memory/Primary Port Boot Sequence

Booting from CBM/Primary Port is selected if the SYS_BOOT[1:0] pins are 0b10 when the PCI_RSTn signal is deasserted. In this boot sequence, the reset exception address is remapped to CBM address 0. However, the APU remains in reset until the XPP_APU_Reset bit of the XPP_Ctrl register is cleared. After the reset address is remapped, the external PCI master first configures the PCI configuration space of the ATMizer II+ chip and then puts the boot code (such as the one described above, with minor modification to the copy-from address) into CBM. When this is done, clear the XPP_APU_Reset bit so that the APU jumps to CBM and starts execution.

10.3 C Preamble Execution

The initialization code following the booting procedure prepares the processor for execution of an application program.
It typically performs the following tasks:

• initializes memory
• clears the .bss and .sbss sections
• flushes the caches
• copies program data from PROM to RAM
• initializes the stack pointer and the global data pointer registers
• enables interrupts
• switches from noncacheable to cacheable space

The sample initialization code follows (Figure 10.3):

Figure 10.3 Sample Initialization Code

    /* start-up code, cacheable, no exceptions */
    #include <regdef.h>
    #include <cp0_scobra.h>

    #define STKSIZE 8192
        .comm stack, STKSIZE

        .text
    /*
     * This is the entry point of the entire application
     */
    init:
        .set noreorder

        /*
         * setup the Status register (BEV=0, Kernel Mode, Rupts enabled)
         * enable access to CP0 in user mode
         * enable interrupts
         * enable HW interrupt 1
         */
        li a0, (SR_CU0 | SR_IE | 0x0000)
        mtc0 a0, C0_SR          /* load the CP0 status register */

        /* clear the SW interrupt bits in the Cause Register */
        nop
        mtc0 zero, C0_CAUSE     /* only SW interrupt bits are writeable */
        nop

        # clear bss
        la v0, _fbss
        la v1, end
    1:  sw $0, 0x0(v0)
        sw $0, 0x4(v0)
        sw $0, 0x8(v0)
        sw $0, 0xc(v0)
        addu v0,16
        blt v0, v1, 1b

        # flush the caches
        # first set up a K1seg sp & gp
        la sp, stack+STKSIZE-24
        or sp, K1BASE
        la gp, _gp
        or gp, K1BASE

        /* flush the caches */
        .set nowarn
        .word ( 0xbc030000 )    /* instruction CACHE_FLUSHID */
        .set warn

        # copy .data to RAM
        # src=etext dst=_fdata stop=edata
        la t0, etext
        la t1, _fdata
        la t2, edata
    1:  lw t3, (t0)
        sw t3, (t1)
        addu t0, 4
        addu t1, 4
        blt t1, t2, 1b

        # ok to use k0seg now, so initialize sp & gp
        la sp, stack+STKSIZE-24
        la gp, _gp

        # transfer to main program
        # reg indirect necessary to switch segments
        la t0, main
        jal t0

    _exit:
        b _exit
        .end init

The above example is a general initialization routine that can be adapted to different cases. For instance, you may choose your own $gp and $sp values to map the .sbss and .sdata sections and the stack to the data RAM instead of using the linker default $gp and $sp values. You may modify the linker script to achieve the same goal. This is described in detail in Section 10.4, “CPU Initialization and Configuration.” If the .sbss section is mapped to data RAM, then clearing the .sbss section in external memory is not necessary since that block of memory will not be referenced. Instead, the .sbss section in the data RAM should be cleared.

Note that both Icache and Dcache should be flushed before they are set and loaded. Also, the above code assumes that the program will run from the PROM, that is, the .text section is located in the PROM. If the application code is in I-RAM or SSRAM of secondary memory, it should be loaded to the corresponding destination address. Sample code for this is shown in Section 10.2.3, “Cell Buffer Memory/Serial PROM Boot Sequence.”

10.4 CPU Initialization and Configuration

CPU initialization includes secondary memory controller initialization, cache configuration, and interrupt settings. Secondary memory controller initialization and interrupt settings are described in the L64364 ATMizer II+ ATM-SAR Chip Technical Manual. This section discusses different choices of Icache and Dcache configuration, initialization, and utilization.

10.4.1 Configuration and Cache Control Register

The CCC register allows software to configure various aspects of the APU. Figure 10.4 shows the format of the CCC register in the ATMizer II+ chip. The paragraphs following the figure describe the fields and bits and their required settings. Refer to the L64364 ATMizer II+ ATM-SAR Chip Technical Manual for more detail.
Figure 10.4 CCC Register Layout

    31:29  28   27  26    25   24   23   22   21   20   19   18   17   16
    R      EWP  R   ISR1  EVI  CMP  IIE  DIE  MUL  MAD  TMR  BGE  IE0  IE1

    15:14    13   12   11:10    9    8:7       6   5   4    3    2    1    0
    IS[1:0]  DE0  DE1  DS[1:0]  IPW  IPS[1:0]  TE  WB  SR0  SR1  IsC  TAG  INV

R         Reserved                                      [31:29]
          These bits are not used in the L64364 and should be cleared.

EWP       External Write Priority                       28
          This bit defines SCBus arbitration priority between data reads and writes in the 4-level write buffer. Clearing EWP gives higher priority to data read requests if the read address does not match any of the write addresses in the write buffer. Setting EWP gives higher priority to data writes.

R         Reserved                                      27
          This bit is not used in the L64364 and should be cleared.

ISR1      Icache 1 Enable                               26
          Scratch-pad RAM mode enable (Icache set 1). Setting depends on application.

EVI       External Vectored Interrupt                   25
          1 = enable, 0 = disable. Setting depends on application.

CMP       R3000 Compatibility Mode                      24
          1 = enable, 0 = disable. Setting depends on application.

IIE       Icache Invalidate Request Enable              23
          1 = enable, 0 = disable. Setting depends on application.

DIE       Dcache Invalidate Request Enable              22
          1 = enable, 0 = disable. Setting depends on application.

MUL       Floating-Point Multiplier Unit Enable         21
          1 = enable, 0 = disable. Enabled in ATMizer II+ chip.

MAD       Multiplier Accumulate Extension Enable        20
          1 = enable, 0 = disable. Disabled in ATMizer II+ chip.

TMR       Timer Facility Enable                         19
          1 = enable, 0 = disable. Setting depends on application. When enabled and the value in the Count register equals the value in the Compare register, sets interrupt IP7 in the Cause register.

BGE       Bus Grant Enable                              18
          1 = enable, 0 = disable. Enabled in ATMizer II+ chip. When enabled, the ATMizer II+ chip recognizes external logic as the BIU Bus master.

IE0       Icache Set 0 Enable                           17
          1 = enable, 0 = disable. Setting depends on application.

IE1       Icache Set 1 Enable                           16
          1 = enable, 0 = disable. Setting depends on application.

Note:     For the Icache, IE1 MUST be set to enable operation of the cache memory, and ISR1 determines whether it is used in cache mode or scratch-pad mode.

IS[1:0]   Icache Size                                   [15:14]
          0b00 = 1 Kbyte, 0b01 = 2 Kbyte, 0b10 = 4 Kbyte, 0b11 = 8 Kbyte. Set to 0b10 in the ATMizer II+ chip for a 4 Kbyte Icache.

DE0       Dcache Set 0 Enable                           13
          1 = enable, 0 = disable. Setting depends on application.

DE1       Dcache Set 1 Enable                           12
          1 = enable, 0 = disable. Setting depends on application.

DS[1:0]   Dcache Size                                   [11:10]
          0b00 = 1 Kbyte, 0b01 = 2 Kbyte, 0b10 = 4 Kbyte, 0b11 = 8 Kbyte. Set to 0b01 in the ATMizer II+ chip for a 2 Kbyte Dcache set.

IPW       Internal Page Write Enable                    9
          1 = enable, 0 = disable. Setting depends on application.

IPS[1:0]  Internal Page Size                            [8:7]
          0b00 = 1 Kbyte, 0b01 = 2 Kbyte, 0b10 = 4 Kbyte, 0b11 = 8 Kbyte. Setting depends on application.

TE        Translation Buffer Enable                     6
          1 = enable, 0 = disable. Disabled in ATMizer II+ chip.

WB        Write Through/Write Back Cache Select         5
          0 = write through, 1 = write back. Defines cache operation for addresses not mapped by the Translation Buffer. Setting depends on application.

SR0       Scratch-pad RAM Mode Enable (Dcache Set 0)    4
          When this bit is set and the DE0 bit is cleared, Dcache Set 0 is configured as scratch-pad RAM. When this bit is cleared, the DE0 bit enables/disables Dcache mode for Set 0. Setting depends on application.

SR1       Scratch-pad RAM Mode Enable (Dcache Set 1)    3
          When this bit is set and the DE1 bit is cleared, Dcache Set 1 is configured as scratch-pad RAM. When this bit is cleared, the DE1 bit enables/disables Dcache mode for Set 1. Setting depends on application.

Note:     For the data cache, either SRx or DEx, but NOT both, is set to enable either scratch-pad or cache operation.

IsC       Isolate Cache Mode                            2
          When set, APU store operations go to the cache but do not propagate to external memory. Setting depends on application.

TAG       Tag Test Mode                                 1
          When set, load and store operations access the Tag RAMs and can be used for Tag RAM testing. Setting depends on application.

INV       Cache Invalidate Mode                         0
          When set, cache contents are invalidated. Used only for cache diagnostic and debug operations. Setting depends on application.

10.4.2 Cache Configuration

The ATMizer II+ Icache and Dcache organizations are as follows:

• The Icache consists of two sets of 4 Kbytes (8 Kbytes total). The Dcache consists of two sets of 2 Kbytes (4 Kbytes total).
• Direct mapped is selected when only one set of cache is enabled. Two-way set associative is selected when two sets are enabled.
• One cache line is eight words (4 double-words = 32 bytes = 256 bits). Refill address ordering is wrap-around from the missing address.
• Write back or write through is selectable by the WB bit in the CCC register.

Both Icache and Dcache can be configured as scratch-pad RAMs. Each scratch-pad RAM must be located in one specific physical address space such as a local or secondary data memory. The APU may load frequently referenced instructions and data structures into the scratch-pad RAMs to greatly reduce memory access time.

10.4.3 Dcache and D-RAM Configuration

Dcache can be configured as direct mapped if one set is enabled or two-way set associative if both sets are enabled. When configured either way, Dcache behaves like regular cache. It may also be configured as data RAM. Either Dcache set (0 or 1) can be configured as scratch-pad RAM by setting the SR0 or SR1 bit of the CCC register. The scratch-pad RAM must be located at a specific physical address, like a secondary data memory.
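The bit positions listed for the CCC register can be exercised with a small host-side sketch, shown here against the default value 0x01f3b620 loaded in Figure 10.1. The `CCCF_*` macros and the helper names are hypothetical and are not the definitions from cp0_scobra.h; only the bit numbers come from the field descriptions above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical masks built from the documented bit numbers. */
#define CCCF_BIT(n)  (UINT32_C(1) << (n))
#define CCCF_DE0     CCCF_BIT(13)
#define CCCF_DE1     CCCF_BIT(12)
#define CCCF_WB      CCCF_BIT(5)
#define CCCF_SR0     CCCF_BIT(4)
#define CCCF_SR1     CCCF_BIT(3)

/* Extract a field of 'width' bits whose least significant bit is 'lsb'. */
static uint32_t ccc_field(uint32_t ccc, unsigned lsb, unsigned width)
{
    return (ccc >> lsb) & ((UINT32_C(1) << width) - 1u);
}

/* Nonzero if both DEx and SRx are set for the given Dcache set (0 or 1),
 * which the note on the data cache rules out. */
static int ccc_dcache_conflict(uint32_t ccc, unsigned set)
{
    uint32_t de = set ? CCCF_DE1 : CCCF_DE0;
    uint32_t sr = set ? CCCF_SR1 : CCCF_SR0;
    return (ccc & de) && (ccc & sr);
}

/* Return a CCC value with the given Dcache set switched from cache mode to
 * scratch-pad RAM mode: DEx cleared, SRx set. */
static uint32_t ccc_dcache_scratchpad(uint32_t ccc, unsigned set)
{
    uint32_t de = set ? CCCF_DE1 : CCCF_DE0;
    uint32_t sr = set ? CCCF_SR1 : CCCF_SR0;
    return (ccc & ~de) | sr;
}
```

Decoding 0x01f3b620 this way gives IS = 0b10, DS = 0b01, both DE bits set, and WB set; switching Set 0 to scratch-pad mode yields 0x01f39630. On the chip, the tags still have to be programmed before the SR bit is set, as the following paragraphs describe.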
Since the ATMizer II+ chip has Dcache Tag RAMs, the Tags must be programmed by isolating the cache before setting the SR bit. To program a data Tag RAM, set the following bits in the CCC register using an MTC0 instruction (Figure 10.6 shows example code at the end of this section):

    CCC_ISC = 1             - Isolate Cache Mode Enable
    CCC_INV = 0             - Invalidate Mode Disable
    CCC_TAG = 1             - Tag Test Mode Enable
    CCC_DC0 or CCC_DC1 = 1  - Set 0 or Set 1 Enable

The MTC0 instruction has one delay slot and the instruction immediately following it should not be a load or a store. All load and store instructions following the MTC0 instruction access the data Tag RAM selected by the CCC_DC0 or CCC_DC1 bit using the format shown in Figure 10.5. Since the Dcache set size is 2 Kbytes, only the upper 21 bits of the data are for the tag. Also, the Valid (V) bit should always be 1 to initialize the tag field. The Hit (HT) bit is ignored during a store operation. For a load operation, the Hit bit is set if a match occurs.

Figure 10.5 Tag Test Mode Loaded Data Format

    Bits     Field
    31:10    Tag Data
    9:3      Reserved
    2        HT
    1        V
    0        WB

If the Dcache scratch-pad RAM is enabled, an access to the scratch-pad RAM area is a secondary memory access without any stall cycle.

You can choose one or two sets of Dcache, two sets of data RAM, or one set of Dcache and one set of data RAM. Note that only one set of cache can be set (that is, have its tag field specified) at a time. To map the data RAM to a physical memory area, isolate the Dcache (ISC = 1) and set the tag test mode (TAG = 1). In the tag test mode, all memory accesses go only to the Tag RAM and the APU stores the tags in the Dcache.
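The tag word layout of Figure 10.5 can be illustrated with a host-side sketch that composes one valid tag word per cache line of a 2 Kbyte set. The names are hypothetical, and the address mask follows the 31:10 tag field drawn in Figure 10.5; the sketch only builds the values and does not touch the CCC register or the Tag RAM.

```c
#include <assert.h>
#include <stdint.h>

#define TAGW_ADDR_MASK   UINT32_C(0xFFFFFC00)  /* bits 31:10, Tag Data */
#define TAGW_HT          UINT32_C(0x4)         /* bit 2, Hit           */
#define TAGW_V           UINT32_C(0x2)         /* bit 1, Valid         */
#define TAGW_WB          UINT32_C(0x1)         /* bit 0, Write Back    */

#define DSET_BYTES       2048u                 /* one Dcache set       */
#define LINE_BYTES       32u                   /* one cache line       */
#define DSET_LINES       (DSET_BYTES / LINE_BYTES)

/* Compose the word stored to the Tag RAM for the line containing addr. */
static uint32_t tag_word(uint32_t addr)
{
    return (addr & TAGW_ADDR_MASK) | TAGW_V;
}

/* A load in tag test mode reports a match through the HT bit. */
static int tag_is_hit(uint32_t loaded)
{
    return (loaded & TAGW_HT) != 0u;
}

/* Fill tags[] with one word per line of the set mapped at base.
 * Returns the number of entries written (at most max). */
static unsigned build_tags(uint32_t base, uint32_t *tags, unsigned max)
{
    unsigned n = 0;
    for (uint32_t off = 0; off < DSET_BYTES && n < max; off += LINE_BYTES)
        tags[n++] = tag_word(base + off);
    return n;
}
```

With a base of 0x80801000, the first entry is 0x80801002 and the set needs 64 entries, one for each 32-byte line.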
The following code is an example of data RAM configuration (Figure 10.6):

Figure 10.6 Data RAM Configuration Code

    /*****************************************************
     * setDram(addr, set)
     * set tag field of the selected set to map the D-ram
     * to addr, the addr should be a cachable virtual address.
     * This can be executed from kseg1 with interrupts disabled.
     */
        .text
        .globl setDram
        .ent setDram
    setDram:
        .set noreorder
        subu sp, 24             # allocate min size context
        sw s0, 4(sp)
        move s0, a0             # save a0 for later use

        # Fill the tags of the selected set with the incoming addr in a0
        mfc0 t3, C0_CONFIG      # save the original CP0 configuration register
        nop
        nop
        nop
        move t0, t3

        # Select set 0 or set 1?
        # a1 = 0 -> t1 = 1 << 13 (DE0)
        # a1 = 1 -> t1 = 1 << 12 (DE1)
        andi a1, 1
        xor a1, 1
        addiu a1, 1

        /*
         * disable cache mode
         * enable Tag test and Isolate cache mode for Dcache set0 or set1
         */
        and t0, ~(CCC_IE1 | CCC_IE0 | CCC_DE1 | CCC_DE0 | CCC_INV)
        or t0, (CCC_TAG | CCC_ISC)
        sll t1, a1, 12
        or t0, t1
        mtc0 t0, C0_CONFIG      # load to the CP0 configuration register
        nop
        nop
        nop

        and a0, 0xfffffc00      # Map to cachable virtual address
        or a0, 2                # 2 = valid
        li t1, 0
    LOOP4:
        sw a0, 0(t1)            # store the tag ram
        addiu a0, a0, 32        # advance the tag value by one line (32 bytes)
        addiu t1, t1, 32        # advance the tag position by one line (32 bytes)
        sltiu t2, t1, 2048      # continue if < 2K
        bne t2, zero, LOOP4
        nop

        mtc0 t3, C0_CONFIG      # restore the original CP0 configuration register
        nop
        nop
        nop

        .set reorder
        lw s0, 4(sp)
        addu sp, 24             # deallocate
        j ra

        .end setDram

10.4.4 Dcache and D-RAM Usage

When the compiler and linker generate executables, they place all data in one of the four sections listed in Table 10.1.
Table 10.1 Data Section Allocation

    Name     Description
    .data    The .data section contains memory that the linker can initialize
             to nonzero values before the program begins to execute. The
             assembler uses 32-bit addressing to access these symbols.
    .sdata   Similar to the .data section, except that the linker places it
             within a 64 Kbyte region pointed to by the $gp register so that
             the assembler can use economical 16-bit addressing to access it.
    .bss     The .bss section consists of noninitialized data, which should be
             initialized to zero by the C preamble before the program begins
             to execute. Its data size is greater than the value specified by
             the -G command line option. The assembler uses 32-bit addressing
             to access these symbols.
    .sbss    Similar to the .bss section, except that its data size is not
             greater than the value specified by the -G command line option
             and the linker places it within a 64 Kbyte region pointed to by
             the $gp register. The assembler can use economical 16-bit
             addressing to access it.

The combined size of the .sdata and .sbss sections must not exceed 64 Kbytes. Items equal to or smaller than the specified size go in the .sdata or the .sbss section. The -G command line option for each compiler or assembler can increase the size of the data items to be put into the .sdata and .sbss sections. If a -G value is not specified to the compiler, the default is eight. Here is an example:

    int a = 5;
    char b;

Variable a will be located in the .data or .sdata section and initialized to five. Variable b will be located in the .bss or .sbss section and initialized to zero. If the code is compiled with a -G 4 option, both variables will be put into the small sections since neither is larger than four bytes. If another value is chosen for the -G option, like -G 2, then variable a will be put in the .data section since it is larger than two bytes, while variable b will be put in the .sbss section since it is not larger than two bytes.
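The placement rule above can be written down as a small decision function. This is a model of the rule as stated in the text, with hypothetical names; an actual toolchain may take other factors into account.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SEC_DATA, SEC_SDATA, SEC_BSS, SEC_SBSS } section_t;

/* Choose the section for a global of 'size' bytes. 'initialized' is nonzero
 * for variables with an initializer; 'g_limit' is the -G value (default 8). */
static section_t place_global(size_t size, int initialized, size_t g_limit)
{
    int is_small = size <= g_limit;   /* "equal to or smaller than" rule */

    if (initialized)
        return is_small ? SEC_SDATA : SEC_DATA;
    return is_small ? SEC_SBSS : SEC_BSS;
}
```

For the example variables, assuming a 4-byte int and a 1-byte char, `place_global(4, 1, 4)` and `place_global(1, 0, 4)` select the small sections, while `place_global(4, 1, 2)` selects .data.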
The small data section (.sdata) and the small bss section (.sbss) are addressed relative to the Global Pointer register $gp. The assembler code looks like the following line:

    lw $v1, offset($gp)

The data section (.data) and the bss section (.bss) are absolutely addressed. The assembler code looks like the following two lines:

    lui $v1, 16-bit-upper-address
    lw $v1, 16-bit-offset($v1)

Because addressing items through $gp is faster than the general method, you should put as many items as possible in the .sdata or .sbss sections.

To optimize code execution, you can intentionally force the frequently referenced data structures (that is, the .sdata and .sbss sections) to be located in the data RAM area. During the design phase, part of the physical memory is allocated and the data RAM is mapped onto it by setting the corresponding tag address.

Note: When setting the tag address, use a cacheable virtual address.

The .sdata section, the .sbss section, and the stack are then forced into the data RAM range. This can be achieved in two ways as described below.

The first way is relatively simple. In the C preamble, as described in Section 10.3, $gp and $sp are set so that the .sbss section and the stack fall into the data RAM range. Note that with this method, you should declare all the global variables as noninitialized variables and do the initialization in the code with expressions. In this way, all the global variables will be in the .sbss section and none in the .sdata section. Normally, when the linker links the object files, it creates a _gp symbol whose value should be assigned to the $gp register. After you modify the $gp register, the whole .sbss section is allocated to a different memory (data RAM in this case).
Since .sbss holds only noninitialized variables and their default values should be zero, it does not matter where the .sbss section is located, provided it does not overwrite any other legal memory contents. The same reasoning applies to the stack, which holds all the local variables. Since the .sdata section contents are initialized, you cannot simply change the $gp value. To do so would result in incorrect data addressing and fetching.

Note: It is your responsibility to make sure that no actual data from the .sbss section and the stack overlap when they are put into the same cache set. You can always allocate one cache set to the .sbss section and the other set to the stack.

The second method is more complex. You can modify the linker script to set the starting and ending addresses of the .sdata and .sbss sections and the $gp value according to the design. Then let the compiler and linker put all global variables into these two sections with the proper -G option. The data RAM is now mapped exactly to the .sdata and .sbss sections. In the program, after setting the Dcache tags, you need to load the contents of the .sdata section into the Dcache before enabling the data RAM mode.

In this way, all global and local variables are intentionally put within the range of the data RAM. During normal operation, all references to these variables go to the data RAM area, which dramatically reduces the data fetching time.

10.4.5 Icache and I-RAM Configuration

The Icache, similar to the Dcache, may be configured as I-RAM. However, in contrast to the Dcache, only Set 1 may be configured as either Icache or I-RAM. Set 0 can be configured only as cache. The I-RAM set is a 4 Kbyte, single-cycle SRAM contained within the ATMizer II+ chip. If the Icache scratch-pad RAM mode is enabled, an access to the scratch-pad RAM mapped area is a secondary memory access without any stall cycle.
Set 1 of the Icache is configured as I-RAM by setting the IR1 bit of the CCC register. The procedure to configure an Icache as I-RAM is similar to that described in the Dcache section. Set or clear the following CCC register bits as indicated:

    CCC_IR1 = 1             Configure Icache Set 1 as scratch-pad RAM
    CCC_ISC = 1             Isolate Cache Mode Enable
    CCC_INV = 0             Invalidate Mode Disable
    CCC_TAG = 1             Tag Test Mode Enable
    CCC_IS0 or CCC_IS1 = 1  Set 0 or Set 1 Enable

The Icache tag setting is also similar to the Dcache tag setting with the data format shown in Figure 10.5, except that only the upper 20 bits are for the tag (since the Icache set size is 4 Kbytes) and the WB bit is ignored.

10.4.6 Icache and I-RAM Usage

The I-RAM set may hold up to 1 K instructions of the user-written firmware to power the APU. These instructions are frequently referenced and need to reside permanently in the I-RAM to reduce the instruction fetching time. The two sets of Icache can hold up to 2 K instructions. Based on different application code sizes, you may configure the Icache differently.

For code sizes smaller than 8 Kbytes, you may configure two 4 Kbyte Icache sets and let the APU automatically load the instructions into the Icaches after the first reference.

If the total code size is larger than 8 Kbytes, you may separate the code into two parts and configure Set 0 as regular Icache and Set 1 as I-RAM. The frequently referenced code part (for example, the interrupt handler) is preloaded into the I-RAM. The other part (for example, the initialization routine) is run regularly with only one Icache set.

Another choice is to dedicate all 8 Kbytes of Icache to one part of the code and locate the rest of the code in off-chip memory. Note that it is not possible to configure Icache Set 0 as I-RAM.
However, you can obtain the same effect by leaving the two sets as true instruction caches, mapping the code that should reside in the Icache into a cacheable memory area and all other code into the noncacheable area. It is not necessary to load instructions into the Icaches, since the APU does that after the first code access.

To separate the code into several parts, you can modify the linker script to create multiple, noncontiguous text sections. As an example, the following code is part of a linker script used by the GNU linker (Figure 10.7):

Figure 10.7 Separating the Code with the Linker Script

     1  SECTIONS
     2  {
     3    /* Read-only sections, merged into text segment: */
     4    .iram 0x80b00000 :
     5    {
     6      _sIram = . ;
     7      APU_Loop.o(.text)
     8      APU_CBR.o(.text)
     9      APU_VBR.o(.text)
    10      APU_ABR.o(.text)
    11      APU_Cell.o(.text)
    12      APU_Compl.o(.text)
    13      APU_HostMsg.o(.text)
    14      APU_Error.o(.text)
    15      Ring.o(.text)
    16      _eIram = . ;
    17    }
    18    .loader 0xa0b20000 :
    19    {
    20      _sLoader = .;
    21      APU_Ram.o(.text)
    22      _eLoader = .;
    23    }
    24    .rest 0x80b21000 :
    25    {
    26      *(.text)
    27    }

In the previous code example, three noncontiguous text sections were created: .iram, .loader, and .rest. The .iram section starts from symbol _sIram in line 6 and ends at symbol _eIram in line 16. In the following code example, the size of .iram is calculated by subtracting _sIram from _eIram. Then the Icache loading routine is called to load the .iram section into the I-RAM.

Note: A similar technique is also used to determine the size of the code when loading from serial PROM.

As in the above script, the Icache/Dcache tag setting routine and the Icache loading routine (APU_Ram.o) are linked to the noncacheable area (.loader section), since they must run from outside the caches. The .iram section is linked to the cacheable area since it should run from the Icache. The .rest section is also linked to a cacheable area.
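The cacheable and noncacheable link addresses in Figure 10.7 can be told apart by their upper address bits. The sketch below assumes the APU follows the conventional MIPS kseg0 (cached, 0x8000.0000 to 0x9FFF.FFFF) and kseg1 (uncached, 0xA000.0000 to 0xBFFF.FFFF) segments, which is consistent with the addresses in the script but is not stated in this section. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: conventional MIPS kseg0/kseg1 segment layout. */

/* Nonzero if va lies in the cached, unmapped segment (kseg0). */
int is_kseg0(uint32_t va)
{
    return (va & 0xE0000000u) == 0x80000000u;
}

/* Nonzero if va lies in the uncached, unmapped segment (kseg1). */
int is_kseg1(uint32_t va)
{
    return (va & 0xE0000000u) == 0xA0000000u;
}

/* Both segments map to physical memory by dropping the top three bits. */
uint32_t kseg_to_phys(uint32_t va)
{
    return va & 0x1FFFFFFFu;
}
```

Under this assumption, .iram (0x80b00000) and .rest (0x80b21000) are cacheable, .loader (0xa0b20000) is noncacheable, and all three fall in the same physical region.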
After the Icache is loaded and I-RAM mode is enabled, the program counter jumps to the starting routine in the .iram section. Since the program counter jumps from the noncacheable area to the cacheable area with a 32-bit address difference, the jalr instruction should be used for the jump rather than the jal instruction, whose target field is only 26 bits wide. Also, the external variables _sIram and _eIram should be declared as arrays of unknown length to avoid any relocation errors caused by the -G option. The constant definitions are left out of the example in Figure 10.8 for simplicity.

Figure 10.8 Main Loop Example

    /*________________________________________________________
     *
     * MAIN LOOP
     *________________________________________________________
     */
    int main(void)
    {
        Func *f = (Func *)IRAM_START;
        extern char _sIram[], _eIram[];
        char *src = _sIram;
        char *dst = _eIram;
        ulong iram_size = mini((dst - src), IRAM_SIZE);

        /*
         * set tag field to map the DRAM to the memory
         */
        setDram(DRAM_START & 0xffffff, 1);

        /*
         * set tag field to map the IRAM to the memory
         */
        setIram(IRAM_START & 0xffffff);

        /*
         * Load the code into the iram
         */
        loadIram(0, (ulong)src, iram_size);

        /*
         * Initialize all necessary configurations for ATMizer-II+
         */
        Initialize();

        (* f)();

        return 0;
    }

During the initialization period, the APU first needs to map the I-RAM to the designated physical memory area and then load the instructions into the Icache.

Mapping the I-RAM to a physical memory area is similar to mapping the D-RAM. Isolate the Icache (ISC = 1) and put it in the tag test mode (TAG = 1). The APU then stores the tag in the Icache.
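The jal versus jalr remark above can be checked numerically. A MIPS jal instruction forms its target from the upper four bits of the delay-slot address and a 26-bit word index, so it can only reach targets in the same 256 Mbyte region. The sketch below models that rule; the function name jal_can_reach is hypothetical.

```c
#include <stdint.h>

/*
 * Nonzero if a jal located at jal_addr can reach target.
 * jal keeps bits 31..28 of the delay-slot address (jal_addr + 4) and
 * replaces bits 27..2 with its 26-bit index, so the target must be
 * word aligned and share the same upper four address bits.
 */
int jal_can_reach(uint32_t jal_addr, uint32_t target)
{
    uint32_t delay_slot = jal_addr + 4u;

    return (target & 3u) == 0u &&
           (delay_slot & 0xF0000000u) == (target & 0xF0000000u);
}
```

For the addresses in Figure 10.7, a jal in the .loader section (0xa0b20000) cannot reach the .iram section (0x80b00000), which is why a register jump such as jalr, or the function-pointer call in Figure 10.8, is needed.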
To load the firmware into the I-RAM, enable the cache mechanism and put the Icache in the data test mode. The APU first disables the cache to load an instruction from the external memory. It then enables Icache Set 1 (IC1 = 1) and the cache isolated mode (ISC = 1), so that the following memory access goes only to the Icache and the fetched instruction is stored in the Icache. This procedure is repeated until the complete .iram section is loaded into the Icache. The Icache is then configured as I-RAM (IR1 = 1, IC1 = 1) and the .iram section now resides permanently in I-RAM. The following code (Figure 10.9) is an example of setting and loading the I-RAM:

Figure 10.9 Setting and Loading IRAM

    /****************************************************
     * setIram(addr)
     * set tag field of set 1 to map the iram to addr, the
     * addr should be the physical address.
     */
            .text
            .globl  setIram
            .ent    setIram

    setIram:
            .set    noreorder
            subu    sp, 24              # allocate min size context

            /* Fill the tags set1 with the coming addr in a0 */
            mfc0    t3, C0_CONFIG       # save the original CP0 configuration register
            nop
            nop
            nop
            move    t0, t3

            /*
             * disable cache mode
             * enable Tag test and Isolate cache mode for Icache set1
             */
            and     t0, ~(CCC_IE1 | CCC_IE0 | CCC_DE1 | CCC_DE0)
            or      t0, (CCC_TAG | CCC_ISC | CCC_IE1)
            mtc0    t0, C0_CONFIG       # load to the CP0 configuration register
            nop
            nop
            nop

            or      a0, 2               # 2 = valid
            li      t1, 0
    LOOP1:
            sw      a0, 0(t1)           # store the tag ram
            addiu   a0, a0, 32          # advance the tag value by 32 bytes (8 words)
            addiu   t1, t1, 32          # advance the tag position by 32 bytes (8 words)
            sltiu   t2, t1, 4096
            bne     t2, zero, LOOP1     # continue while position < 4 Kbytes
            nop

            mtc0    t3, C0_CONFIG       # restore the original CP0 configuration register
            nop
            nop
            nop

            .set    reorder
            addu    sp, 24              # deallocate
            j       ra
            .end    setIram

    /*************************************************************
     * loadIram(dst, src, n)
     * copy n bytes, one word at a time, from src into iram at dst
     */
            .text
            .globl  loadIram
            .ent    loadIram

    loadIram:
            .set    noreorder
            subu    sp, 24              # allocate min size context
            mfc0    t3, C0_CONFIG       # save the original CP0 configuration register
            nop
            nop
            nop

    LOOP2:
            /* disable IsC so that instructions can now be fetched */
            /* from the memory */
            /*
             * disable cache
             * disable Tag test and Isolate cache mode
             */
            move    t0, t3
            and     t0, ~(CCC_IE1 | CCC_IE0 | CCC_TAG | CCC_ISC | CCC_DE1 | CCC_DE0)
            mtc0    t0, C0_CONFIG       # load to the CP0 configuration register
            nop
            nop
            nop
            lw      t1, 0(a1)           # load an instruction from the memory

            /* Enable IsC so that instructions can now be written */
            /* into the data part of the iram */
            /*
             * enable Icache set1 with size 4k.
             * enable Isolate cache mode
             */
            or      t0, (CCC_IE1 | CCC_IS4 | CCC_ISC)
            mtc0    t0, C0_CONFIG       # load to the CP0 configuration register
            nop
            nop
            nop
            sw      t1, 0(a0)           # store the instruction into the iram

            addiu   a0, a0, 4           # advance the dst by 4 bytes
            addiu   a1, a1, 4           # advance the src by 4 bytes
            sub     a2, a2, 4           # decrement the size by 4 bytes
            bgez    a2, LOOP2           # continue if n >= zero
            nop

            /*
             * IRAM operation on Icache Bank 1 is enabled by setting both IE1
             * and IR1 bits.
             * Also, enable WB mode to speed up the performance.
             */
            or      t3, CCC_IR1 | CCC_IE1 | CCC_IS4 | CCC_WB
            mtc0    t3, C0_CONFIG       # restore the original CP0 configuration register
            nop
            nop
            nop

            .set    reorder
            addu    sp, 24              # deallocate
            /* change to non-cachable address */
            j       ra
            .end    loadIram

10.5 Configuration Header File

Table 10.2 describes the contents of the configuration header file (config.h).
All system configuration parameters, whether or not they are shared by the ATMizer II+ chip and the host, are defined in this file. The ATMizer II+ chip and the host initialize the system according to this header file. Changing the values of the parameters adjusts the configuration of the system.

Table 10.2 Configuration Header File Contents

Name                        APU (1)             Host (1)        Description
MaxOpenCon                  1024                1024            Maximum open connections allowed

APU -> Host RxRing
RxRing_Credit               16                  16              APU -> Host RxRing credit value
pRxRing_Credit_APU/Host     0xA806.8480         0xBA06.8480     APU -> Host RxRing credit address
RxRing_Base_APU/Host        0xA806.8400         0xBA06.8400     APU -> Host RxRing base
RxRing_Size                 32                  32              APU -> Host RxRing size

Host -> APU TxRing
TxRing_Credit               16                  16              Host -> APU TxRing credit value
pTxRing_Credit_APU/Host     0xB000.0080         0xB400.0080     Host -> APU TxRing credit address
TxRing_Base_APU/Host        0xB000.0000         0xB400.0000     Host -> APU TxRing base
TxRing_Size                 32                  32              Host -> APU TxRing size

Commands Related
HCD_MessBase_APU/Host       0xA806.8484         0xBA06.8484     Common location for open connection message data
Stats_MessBase_APU/Host     0xA806.8584         0xBA06.8584     Common location for statistics message data

Rx Direction
Rx_SBuffSize                64                  64              Size of small buffers for Rx data
Rx_LBuffSize                256                 256             Size of large buffers for Rx data
Rx_SBuffCount0              170                 170             Number of small buffers for Rx data, list 0
Rx_LBuffCount0              170                 170             Number of large buffers for Rx data, list 0
Rx_SBuffCount1              170                 170             Number of small buffers for Rx data, list 1
Rx_LBuffCount1              170                 170             Number of large buffers for Rx data, list 1
Rx_SBuffCount2              170                 170             Number of small buffers for Rx data, list 2
Rx_LBuffCount2              170                 170             Number of large buffers for Rx data, list 2
Rx_SBuffCount3              170                 170             Number of small buffers for Rx data, list 3
Rx_LBuffCount3              170                 170             Number of large buffers for Rx data, list 3
Rx_SBuffCount4              170                 170             Number of small buffers for Rx data, list 4
Rx_LBuffCount4              170                 170             Number of large buffers for Rx data, list 4
Rx_SBuffCount5              170                 170             Number of small buffers for Rx data, list 5
Rx_LBuffCount5              170                 170             Number of large buffers for Rx data, list 5
RxBFDCount                  2040                2040            Total RxBFDs in all 6 small and large lists
RxBFDBase_APU/Host          0xA800.0000         0xBA00.0000     Rx BFD table base address
RxSBuff_APU/Host            0xA801.0000         0xBA01.0000     Rx small buffers pool base address
RxLBuff_APU/Host            0xA802.0000         0xBA02.0000     Rx large buffers pool base address

Tx Direction
Tx_BuffSize0                1024                1024            Maximum size of buffers for Tx data, list 0
Tx_BuffSize1                1024                1024            Maximum size of buffers for Tx data, list 1
Tx_BuffSize2                1024                1024            Maximum size of buffers for Tx data, list 2
Tx_BuffSize3                1024                1024            Maximum size of buffers for Tx data, list 3
Tx_BuffSize4                1024                1024            Maximum size of buffers for Tx data, list 4
Tx_BuffSize5                1024                1024            Maximum size of buffers for Tx data, list 5
Tx_BuffSize6                1024                1024            Maximum size of buffers for Tx data, list 6
Tx_BuffSize7                1024                1024            Maximum size of buffers for Tx data, list 7
Tx_BuffCount0               256                 256             Number of buffers for Tx data, list 0
Tx_BuffCount1               256                 256             Number of buffers for Tx data, list 1
Tx_BuffCount2               256                 256             Number of buffers for Tx data, list 2
Tx_BuffCount3               256                 256             Number of buffers for Tx data, list 3
Tx_BuffCount4               256                 256             Number of buffers for Tx data, list 4
Tx_BuffCount5               256                 256             Number of buffers for Tx data, list 5
Tx_BuffCount6               256                 256             Number of buffers for Tx data, list 6
Tx_BuffCount7               256                 256             Number of buffers for Tx data, list 7
TxBFDCount                  2048                2048            Total count of TxBFDs in 8 lists
TxBFDBase_APU/Host          0xA800.8000         0xBA00.8000     Tx BFD table base address
TxBuff_APU/Host             0xA806.0000         0xBA06.0000     Tx buffers pool base address

EDMA Related
EDMA_BFD_FBase              0xA800.0000         n/a             Buffer Descriptor table in primary memory
EDMA_BFD_LBase              0xA080.0000         n/a             Buffer Descriptor table in secondary memory
EDMA_VCD_Base               0xA060.0000         n/a             Virtual Connection Descriptor table base
EDMA_TxBFD_Copy             1                   n/a             Tx BFD local or far mode
EDMA_RxBFD_Copy             1                   n/a             Rx BFD local or far mode
EDMA_TxBFD_Far              n/a                 n/a             Tx BFDs are copied to/from far address
EDMA_RxBFD_Far              n/a                 n/a             Rx BFDs are copied to/from far address
EDMA_ConReAct               0                   n/a             Enable connection reactivation in Tx direction
EDMA_ByteSwap               0                   n/a             Controls byte swapping for cell transferring
EDMA_Compat                 0                   n/a             Compatibility in byte swapping for off-word boundary buffers with L64364
EDMA_OrHdr                  0                   n/a             Controls the generation and extraction of the cell header
EDMA_Ctrl                   see note 2          n/a             EDMA control fields

Scheduler Related
SCD_CalBase0                0xA061.0000 >> 9    n/a             Calendar 0 base address
SCD_CalBase1                0xA061.0400         n/a             Calendar 1 base address
SCD_CalBase2                0xA061.0800         n/a             Calendar 2 base address
SCD_CalBase3                0xA061.0C00         n/a             Calendar 3 base address
SCD_FlatMode                see note 2          n/a             Scheduler's operating mode: flat or priority
SCD_VCDinCB                 0                   n/a             Number of VCDs in cell buffer
SCD_Cal_Size0               1024                n/a             Number of cell slots in calendar 0
SCD_Cal_Size1               1024                n/a             Number of cell slots in calendar 1
SCD_Cal_Size2               1024                n/a             Number of cell slots in calendar 2
SCD_Cal_Size3               1024                n/a             Number of cell slots in calendar 3
SCD_Ctrl                    see note 3          n/a             Scheduler control fields

ACI Related
ACI_TxSize                  16                  n/a             Maximum number of cells in Tx FIFO
ACI_RxLimit                 4                   n/a             Threshold in RxFIFO to generate an interrupt
ACI_TxLimit                 8                   n/a             Threshold in TxFIFO to generate an interrupt
ACI_RxMask                  0x00FF.FFFF         n/a             Rx polling mask
ACI_Phy                     0                   n/a             Phy physical address to respond to in slave mode
ACI_LoopBack                see note 2          n/a             Set for on-chip loop back
ACI_Parity                  0                   n/a             Enable for Utopia parity generation and error detection
ACI_CellSize                00                  n/a             The cell size
ACI_HEC                     1                   n/a             Set to generate or verify the HEC bit
ACI_TxIdle                  0                   n/a             Set to generate idle cells when Tx FIFO is empty
ACI_FixedPr                 0                   n/a             Set to enable the priority of Phy device in Rx direction
ACI_Slave                   0                   n/a             Controls the master/slave operation of the Utopia bus
ACI_DirectPoll              0                   n/a             Enable direct or multiplexed polling scheme
ACI_Reset                   0                   n/a             Set the Tx and Rx state machines to idle state
ACI_Ctrl                    see note 4          n/a             ACI control fields
ResvCB                      0                   n/a             In words, needed to calculate ACI_freelist

Host Private
HCD_Base                    n/a                 defined at compile time    Host Connection Descriptor table in private memory

1. n/a = not applicable

10.6 Host PCI Access

For the host, the accesses to the ATMizer II+ CBM, Mailbox FIFO, XPP_Control register, and secondary memory are handled through the PCI Bus. The following discussion assumes that the ATMizer II+ is a satellite.

10.6.1 PCI Bus Configuration

Before accessing the data for read or write, the software has to initialize and configure the PCI Bus. To do so, it has to set up the SAR PCI configuration space.

The ATMizer II+ chip supports type 0 configuration space access. PCI configuration space registers are shown in Figure 10.10. Shaded registers in the figure are not used by the ATMizer II+ chip. Configuration space writes to unused registers are completed normally, although data is ignored. Configuration space reads of unused registers are completed normally with all data bits 0. Refer to the L64364 ATMizer II+ ATM-SAR Chip Technical Manual for more detail.

Figure 10.10 PCI Configuration Space Registers

Offset   Bits 31-16                                Bits 15-0
0x00     Device ID                                 Vendor ID
0x04     Status                                    Command
0x08     Class Code (31-8)                         Revision ID (7-0)
0x0c     BIST (31-24), Header Type (23-16)         Latency Timer (15-8), Cache Line Size (7-0)
0x10     Base Address Register 1
0x14     Base Address Register 2
0x18     Base Address Register 3
0x1c     Base Address Register 4
0x20     Base Address Register 5
0x24     Base Address Register 6
0x28     Cardbus CIS Pointer
0x2c     Subsystem ID                              Subsystem Vendor ID
0x30     Expansion ROM Base Address
0x34     Reserved
0x38     Reserved
0x3c     Max Latency (31-24), Min Grant (23-16)    Interrupt Pin (15-8), Interrupt Line (7-0)

Note: The configuration space registers are documented in the PCI Bus little endian format (least significant byte is byte 0).

The PCI configuration space is accessed by the host with the address format described in Figure 10.11.
Figure 10.11 PCI Configuration Address Format

Bits 31-24    Bits 23-6     Bits 5-0
1011 0100     Don't Care    Offset

Hexadecimal base address: 0xB400.0000

The first thing the host must do is set the Command field in the PCI configuration space registers (offset 0x06, virtual address 0xB400.0006) to a default value. A reasonable setting is 0x0006. To enable the configuration write cycles to the SAR, the following sequence (Figure 10.12) is needed to enable the bridge chip.

Figure 10.12 Programming the Latency Timer in the PCI Configuration Register

    /* Program the Latency timer in the Configuration register */
    *((uchar *) 0xb800005e) = 0x0a;
    *((uchar *) 0xbd000000) = 0xe7;
    printf("Programming Command register; ");
    pPCI_Conf->Command = 0x06;
    printf("Programmed Command register.");
    *((uchar *) 0xbd000000) = 0xff;
    *((uchar *) 0xb800005e) = 0x06;

Further details on the programming of the PCI configuration registers can be obtained from the L64364 ATMizer II+ ATM-SAR Chip Technical Manual.

10.6.2 PCI Access to the ATMizer II+ Memory Space

Next, it is necessary to set up the PCI address space that will be used to access the SAR memory. This is done by writing the base address of the memory space you want to use into Base Address registers 1 and 2 (offsets 0x10 and 0x14) in the PCI configuration space. This address can be one of the four PCI base addresses described in Table 10.3. Note that there are four addresses, but you actually need only two of them. Only Base Address registers 1 and 2 are defined in the PCI configuration space of the ATMizer II+ chip. The other two addresses could be used if a second ATMizer II+ chip were connected to the same PCI Bus.

Base Address register 1 defines slave transfers to the ATMizer II+ CBM, Mailbox FIFO, and XPP_Control register. The memory map for this address range is shown in Table 10.4.

Base Address register 2 maps the ATMizer II+ secondary memory into PCI memory space.
Refer to Table 10.5.

Once the base address registers are set, the host must use the address format specified in Table 10.3 when it wants to access the ATMizer II+ memory space through the PCI bus.

Table 10.3 PCI Virtual Address vs. Base Addresses

PCI Base Address (1)    Host Virtual Base Address (2)
0xB500.0000             0xB500.0000
0xB600.0000             0xB600.0000
0xB580.0000             0xB580.0000
0xB700.0000             0xB700.0000

1. The base address the ATMizer II+ chip will scan on the PCI Bus, according to the "Base Register 1 or 2" value.
2. The base address used by the host to access the ATMizer II+ memory space.

Example: If Base Address register 1 is set to 0xB500.0000, the virtual base address the host must use to write to the ATMizer II+ CBM is 0xB500.0000. The CBM range is then 0xB500.0000 to 0xB500.0FFF.

The hardware registers are accessed by the host in little endian format starting at 0xB500.7000.

Table 10.4 ATMizer II+ External Memory Map

PCI Memory         Module                Size
0x0000-0x0FFF      Cell Buffer Memory    4 Kbytes
0x4000-0x400F      Mailbox FIFO          16 bytes

Table 10.5 Secondary Bus Memory Map

Start Address    End Address    Size        Device Type    Bus Size
0x0000.0000      0x000F.FFFF    1 Mbyte     EPROM/SRAM     8
0x0020.0000      0x002F.FFFF    1 Mbyte     PHY            8
0x0040.0000      0x005F.FFFF    2 Mbytes    EPROM/SRAM     32
0x0060.0000      0x007F.FFFF    2 Mbytes    SSRAM          32
0x0080.0000      0x00FF.FFFF    8 Mbytes    SDRAM          32

10.7 Memory Allocation

In the ATMizer II+ chip, MAX_CON_NUM (1024) connections can be opened simultaneously. This value has a direct impact on the allocation of different memory blocks. Assuming that only two buffers per connection will be used at a given time, this leads to a maximum of 2 K transmit buffers and 2 K receive buffers in system memory.

The memory space needed by the different modules of the ATMizer II+ chip needs to be allocated before the software can enter the main loop and the transfer of data occurs.
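The buffer budget above can be cross-checked against the per-list counts given in Table 10.2. The following sketch is illustrative arithmetic only; the function names are hypothetical and the numeric inputs are the values quoted in this chapter.

```c
/* Buffers needed when every connection holds per_connection buffers. */
unsigned buffers_needed(unsigned connections, unsigned per_connection)
{
    return connections * per_connection;
}

/* Total RxBFDs: each Rx free list has a small pool and a large pool. */
unsigned rx_bfd_total(unsigned lists, unsigned small_per_list,
                      unsigned large_per_list)
{
    return lists * (small_per_list + large_per_list);
}

/* Total TxBFDs: each Tx free list has a single pool. */
unsigned tx_bfd_total(unsigned lists, unsigned per_list)
{
    return lists * per_list;
}
```

With 1024 connections and two buffers each, buffers_needed gives 2048 in each direction. Six Rx lists of 170 small and 170 large buffers give the RxBFDCount of 2040 from Table 10.2, and eight Tx lists of 256 buffers give the TxBFDCount of 2048.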
To enable optimum use of the memory available on the ADP, the Memory_t structure shown in Figure 10.13 is used to allocate the memory to different data structures.

Figure 10.13 Allocating Memory to Data Structures

    /*
     * Memory_t structure for memory allocation
     */
    typedef struct {
        ulong Sram;
        ulong SramEnd;
        ulong Ssram;
        ulong SsramEnd;
        ulong Sdram;
        ulong SdramEnd;
        ulong Phy;
        ulong PhyEnd;
        ulong Shr;
        ulong ShrEnd;
    } Memory_t, *pMemory_t;

The size and location of the ATMizer code is determined by the compiler variables "_ftext" and "_end". The size of the code is computed as shown in Figure 10.14.

Figure 10.14 ATMizer Code Size Calculation

    #define CODE_SIZE _end - _ftext

The Memory_t variables are initialized to the starting and end points of the memory available for allocation as shown in Figure 10.15.

Figure 10.15 Memory_t Variables Initialization

    /*
     * Memory_t Initialization for memory allocation
     */
    Memory_t Memory;

    Memory.Sram     = APU_BASE_SEC + SRAM_OFFS + SRAM_PMON_OFF;
    Memory.SramEnd  = APU_BASE_SEC + SRAM_OFFS + SRAM_SIZE;

    Memory.Ssram    = APU_BASE_SEC + SSRAM_OFFS;
    Memory.SsramEnd = APU_BASE_SEC + SSRAM_OFFS + SSRAM_SIZE;

    Memory.Sdram    = APU_BASE_SEC + SDRAM_;
    Memory.SdramEnd = APU_BASE_SEC + SDRAM_OFFS + SDRAM_SIZE;

    Memory.Phy      = APU_BASE_SEC + PHY_OFFS;
    Memory.PhyEnd   = APU_BASE_SEC + PHY_OFFS ;

    Memory.Shr      = (ulong) MapForAPU((void *) &pPCI->Buff);
    Memory.ShrEnd   = APU_BASE_PCI + PCI_SIZE;

Since the memory allocation starts after the ATMizer code in the Secondary memory, the Memory pointers are updated based on the location of the code in the Secondary memory as shown in Figure 10.16.
Figure 10.16 Updating Memory Pointers

    switch ( ((ulong) _ftext >> 20) & 0xf ) {
        case 2: Memory.Phy += CODE_SPACE;
                break;
        case 4: Memory.Sram += CODE_S
                break;
        case 6: Memory.Ssram += CODE_SPACE;
                break;
        case 8: Memory.Sdram += CODE_SPACE;
                break;
        default : break;
    }

This allows the software to be compiled to SSRAM or SDRAM without modifying the initialization routines.

The initialization of the secondary memory is done by the InitSEC routine in the host code, which initializes the SCD calendar pointers, the VCD pointer, the BFD pointers, and the ACD pointer. These pointers are passed to the ATMizer II+ chip through the configuration structure and are used in programming the hardware registers.

The BFD numbers are allocated based on the EDMA_TxBFD_Far, EDMA_TxBFD_Copy, EDMA_RxBFD_Far, and EDMA_RxBFD_Copy bits in the EDMA_Ctrl register. The BFD numbers are allocated to optimize the usage of memory space in the primary and secondary memories, as shown in Table 10.6.

Table 10.6 BFD Number Allocation

TxBFD Location    RxBFD Location    TxBFD Number      RxBFD Number
Local             Local             1                 TxBFDCount + 1
Local             Far               1                 1
Local             Copy              RxBFDCount + 1    1
Far               Local             1                 1
Far               Far               1                 TxBFDCount + 1
Far               Copy              RxBFDCount + 1    1
Copy              Local             1                 TxBFDCount + 1
Copy              Far               1                 TxBFDCount + 1
Copy              Copy              1                 TxBFDCount + 1

When the TxBFDs are in local memory (secondary memory) and the RxBFDs are in the far memory (primary PCI memory), the BFD numbers for both can start at 1. On the other hand, if the RxBFD numbers start at (TxBFDCount + 1), as in the case when both TxBFDs and RxBFDs are in the secondary memory, the memory space corresponding to BFD numbers 1 to TxBFDCount is not used in the far memory.
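Table 10.6 can be expressed as a small lookup, sketched below. This is an illustration of the table as printed, not the InitSEC implementation; the type and function names are hypothetical.

```c
/* Where a BFD table lives, following the column values in Table 10.6. */
enum bfd_loc { BFD_LOCAL, BFD_FAR, BFD_COPY };

/* First BFD number assigned in each direction. */
struct bfd_first { unsigned tx; unsigned rx; };

/* How the two number ranges are laid out relative to each other. */
enum bfd_rule { BOTH_AT_1, RX_AFTER_TX, TX_AFTER_RX };

/* Rows are the TxBFD location, columns the RxBFD location. */
static const enum bfd_rule bfd_rules[3][3] = {
    /* Rx:          Local        Far          Copy        */
    /* Tx Local */ { RX_AFTER_TX, BOTH_AT_1,   TX_AFTER_RX },
    /* Tx Far   */ { BOTH_AT_1,   RX_AFTER_TX, TX_AFTER_RX },
    /* Tx Copy  */ { RX_AFTER_TX, RX_AFTER_TX, RX_AFTER_TX },
};

struct bfd_first bfd_first_numbers(enum bfd_loc tx_loc, enum bfd_loc rx_loc,
                                   unsigned tx_count, unsigned rx_count)
{
    struct bfd_first first = { 1u, 1u };

    switch (bfd_rules[tx_loc][rx_loc]) {
    case RX_AFTER_TX: first.rx = tx_count + 1u; break;  /* Rx follows Tx */
    case TX_AFTER_RX: first.tx = rx_count + 1u; break;  /* Tx follows Rx */
    case BOTH_AT_1:   break;                            /* separate memories */
    }
    return first;
}
```

With TxBFDCount = 2048 and RxBFDCount = 2040 from Table 10.2, the Local/Local case starts the RxBFD numbers at 2049, while the Local/Far case starts both ranges at 1.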
10.7.1 Receive Direction

For the receive direction, the buffers, the RxBFDs, and the RxRing should be in shared memory (primary memory). If the RxBFDs are in packet mode, a copy of the buffers and RxBFDs may also be in secondary memory (SDRAM).

RxBFDs - The BFD size is 16 bytes, so 32 Kbytes (16 x 2 K) of primary and secondary memory are required.

Buffer Pool - In the receive direction, 64 bytes are required for small buffers and 256 bytes for large buffers. So, the two available buffer pools are:

    Small buffers: 64 x 1 K = 64 Kbytes of primary and secondary memory
    Large buffers: 256 x 1 K = 256 Kbytes of primary and secondary memory

RxRing - The RxRing contains 32 buffer numbers. RX_RING_SIZE, therefore, is defined as 32. One more word is needed for TxRing credits, so the RxRing requires (4 x RX_RING_SIZE) + 4 bytes or 132 bytes of primary memory.

10.7.2 Transmit Direction

For the transmit direction, the buffers and the BFDs are located in primary memory and the TxRing is in Cell Buffer Memory. If the ATMizer II+ chip is in the packet mode, a copy of the buffers and TxBFDs may also be in secondary memory.

TxBFDs - The BFD is 16 bytes, so 32 Kbytes (16 x 2 K) of primary and secondary memory are required.

Buffer Pool - To send buffer data of 2 Kbytes and open all 1 K connections at initialization would require 2 Mbytes of primary memory for the pool. However, since the contents of the transmitted buffers are not important, the buffers are overlapped in memory. The 1 Kbyte buffers are overlapped every 16 bytes; that is, buffer n+1 starts 16 bytes after the beginning address of buffer n. The space required then is:

    (16 x 2 K) + 1024 - 16 = 33 Kbytes of primary and secondary memory

TxRing - The TxRing is located in CBM on the ATMizer II+ chip for fast accessing. The ring contains 32 buffer numbers.
TX_RING_SIZE, therefore, is defined as 32. One more word is needed for RxRing credits, so the TxRing requires (4 x TX_RING_SIZE) + 4 bytes or 132 bytes of Cell Buffer Memory.

10.7.3 Connection Descriptors

VCDs (32 bytes), ACDs (32 bytes), and SCDs (4 bytes) should be located in secondary memory. The memory allocation is as follows:

    VCDs: 32 x 1 K x 2 = 64 Kbytes in secondary memory
    ACDs: 32 x 1 K = 32 Kbytes in secondary memory
    SCDs: 4 x 1 K = 4 Kbytes in secondary memory

The host maintains an array containing one Host Connection Descriptor per requested connection in its private memory. The structure of the HCD is described in Section 1.3.1.3. The HCDs require 64 x 1 K = 64 Kbytes of host private memory.

10.7.4 Buffer Descriptors

The buffer pointers in the BFDs indicate whether the BFDs are in cell mode or in packet mode. The pBuffData.SEC and pBuffData.PCI fields of the BFD are initialized in the InitBFD routine.

The BFD_FreeList field of the BFD is used in the Rx direction to support six free lists. The software can take advantage of this field in the Tx direction to support up to eight free lists. Each list can then be put in cell or packet mode with different buffer sizes, enabling a more sophisticated buffer management scheme in the Tx direction. Similarly, eight BFD lists can be initialized for the PreAttach BFDs.

When the BFD list is in cell mode with the buffers in secondary memory, or in packet mode, the secondary memory for the buffers can be chosen to be in SSRAM, SDRAM, or SRAM. Furthermore, the starting address of the buffer location can be selected to be 0, 1, 2, or 3. Therefore, off-word boundary buffers can be supported by this initialization scheme.

The location of the buffers of the BFD list is determined by the configuration variables Loc_BuffPCI and Loc_BuffSec. The format of the variables is shown in Figure 10.17.
Figure 10.17 Loc_BuffPCI and Loc_BuffSec Format

Bits     Field
31-24    PreAttach 7 (bit 31) through PreAttach 0 (bit 24)
23-22    Reserved
21-16    Rx Large 5 (bit 21) through Rx Large 0 (bit 16)
15-14    Reserved
13-8     Rx Small 5 (bit 13) through Rx Small 0 (bit 8)
7-0      Tx BFD7 (bit 7) through Tx BFD0 (bit 0)

For each list, if the corresponding bit is set in Loc_BuffPCI, the buffer is located in PCI memory and the list is in cell mode in PCI memory. Similarly, if the bit is set in Loc_BuffSec, the buffer is located in the secondary memory. If both bits corresponding to a BFD list are set, the BFD list is in packet mode.

If a BFD list is in cell mode with the buffer in secondary memory, or in packet mode, the location of the buffer in secondary memory can be chosen. To do this, Sec_BuffLoc1 and Sec_BuffLoc0 are used. The format of these variables is the same as above. To determine the location of the buffers of TxBFD list 0, bit 0 of Sec_BuffLoc1 and Sec_BuffLoc0 is used as shown in Table 10.7.

Table 10.7 Buffer Location in Secondary Memory

Sec_BuffLoc1 bit 0 / Sec_BuffLoc0 bit 0    Buffer Location
00                                         N/A
01                                         SSRAM
10                                         SDRAM
11                                         SRAM

Similarly, for other lists, the corresponding bits from Sec_BuffLoc1 and Sec_BuffLoc0 determine the location of the buffer in secondary memory.

The offset of the buffer in PCI memory and secondary memory is determined in a similar manner for all the BFD lists, using the variables Off_BuffPCI1 and Off_BuffPCI0 for the PCI memory offset, and Off_BuffSec1 and Off_BuffSec0 for the secondary memory offset. Note that for packet mode BFDs, the PCI offset and secondary offset should be the same.
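The decoding of these configuration words can be sketched as follows. The bit positions follow Figure 10.17 and the two-bit location code follows Table 10.7; the enum and function names are hypothetical, and the LIST_UNUSED value (neither bit set) is an assumption, since that case is not described here.

```c
#include <stdint.h>

enum list_mode { LIST_UNUSED, LIST_CELL_PCI, LIST_CELL_SEC, LIST_PACKET };
enum sec_loc   { SEC_NA, SEC_SSRAM, SEC_SDRAM, SEC_SRAM };

/* Bit index of each list within the 32-bit configuration words. */
unsigned tx_list_bit(unsigned n)   { return n; }        /* Tx BFD0..7    */
unsigned rx_small_bit(unsigned n)  { return 8u + n; }   /* Rx Small 0..5 */
unsigned rx_large_bit(unsigned n)  { return 16u + n; }  /* Rx Large 0..5 */
unsigned preattach_bit(unsigned n) { return 24u + n; }  /* PreAttach 0..7 */

/* Mode of one list from Loc_BuffPCI and Loc_BuffSec. */
enum list_mode list_mode(uint32_t loc_pci, uint32_t loc_sec, unsigned bit)
{
    unsigned in_pci = (loc_pci >> bit) & 1u;
    unsigned in_sec = (loc_sec >> bit) & 1u;

    if (in_pci && in_sec) return LIST_PACKET;    /* both bits set */
    if (in_pci)           return LIST_CELL_PCI;  /* buffer in PCI memory */
    if (in_sec)           return LIST_CELL_SEC;  /* buffer in secondary memory */
    return LIST_UNUSED;                          /* assumption: not described */
}

/* Secondary memory device from Sec_BuffLoc1 (high bit) and Sec_BuffLoc0. */
enum sec_loc list_sec_location(uint32_t loc1, uint32_t loc0, unsigned bit)
{
    unsigned code = (((loc1 >> bit) & 1u) << 1) | ((loc0 >> bit) & 1u);

    return (enum sec_loc)code;  /* 00 N/A, 01 SSRAM, 10 SDRAM, 11 SRAM */
}
```

For example, setting bit 0 in both Loc_BuffPCI and Loc_BuffSec puts TxBFD list 0 in packet mode, and setting bit 0 of Sec_BuffLoc1 only selects SDRAM for its secondary-memory buffers.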
10.7.5 Data Exchanging Blocks

When issuing the open connection and get statistics commands, the host and the APU need to share a common fixed location in primary memory to exchange parameters and data.

open connection: When the host sends an open connection command to the APU, the host copies the first 64 bytes of the HCD from the HCD table in its private memory to a fixed location in primary memory. Refer to Section 1.3.1.3 for details. The space required for the host-to-APU connection parameters is 256 bytes (64 x 4 bytes) in primary memory.

get statistics: When the host requests statistics from the APU, the APU copies the relevant data to a fixed location in primary memory and sends an acknowledgment to the host. The data is described in Table 10.8. The space required for the statistics results is 256 bytes (64 x 4 bytes) in primary memory.

10.7.6 Related Issues

The following sections discuss other issues related to memory.

10.7.6.1 Cacheable and Noncacheable

In general, if multiple modules can access the same data structure, the structure should be located in the noncacheable memory area to ensure that the APU and the host always fetch the updated values. Only data structures that are manipulated internally by the APU or by the host (e.g., ACDs and HCDs) should be located in the cacheable area. All the other data structures (e.g., BFDs, VCDs, SCDs, Rings, Credits, Acknowledgments, Statistics Results, and host-to-APU parameters) should reside in the noncacheable area.

10.7.6.2 Memory Access

APU accesses to secondary memory depend on the following factors:

• The number of connections and the connection rates determine whether ACDs are in the data cache or a cache line needs to be fetched.
• Connection QoS determines ACD size. CBR and UBR traffic typically require less ACD access (fewer bytes). ABR traffic requires full ACD access (32 bytes) for RM cells (typically 2 out of 32 cells).
• Connection lookup can be done by masking and shifting the cell headers and does not require any memory access.

The number of SCD accesses to secondary memory depends on the scheduler mode (Flat or Priority). Priority mode execution time is variable because it depends on the length of the connection linked list in the calendar table. Flat mode has a constant connection-searching time.

10.7.6.3 Caching Policy

ACDs can be allocated to SSRAM, so cache write-through is used.

10.8 Hardware Registers Initialization

Hardware registers must be initialized correctly for each hardware module to work properly. Various hardware register configuration options are described in this section. Refer to Appendix A, “Register Summary,” of the L64364 ATMizer II+ ATM-SAR Chip Technical Manual for register layout and content information.

Table 10.8 lists and describes all of the hardware registers that need to be initialized.
Table 10.8 ATMizer II+ Hardware Registers to be Initialized

Register          Description
EDMA_Ctrl         EDMA Control fields
EDMA_BFD_FBase    Buffer Descriptor table in primary memory
EDMA_BFD_LBase    Buffer Descriptor table in secondary memory
EDMA_SBuffSize    Size of small buffers for Rx data
EDMA_LBuffSize    Size of large buffers for Rx data
EDMA_SBuff0       Head of small free buffer list 0
EDMA_LBuff0       Head of large free buffer list 0
EDMA_SBuff1       Head of small free buffer list 1
EDMA_LBuff1       Head of large free buffer list 1
EDMA_SBuff2       Head of small free buffer list 2
EDMA_LBuff2       Head of large free buffer list 2
EDMA_SBuff3       Head of small free buffer list 3
EDMA_LBuff3       Head of large free buffer list 3
EDMA_SBuff4       Head of small free buffer list 4
EDMA_LBuff4       Head of large free buffer list 4
EDMA_SBuff5       Head of small free buffer list 5
EDMA_LBuff5       Head of large free buffer list 5
EDMA_VCD_Base     Virtual Connection Descriptor table base
SCD_Ctrl          Scheduler Control register
SCD_CalBase1      Base address of calendar 1
SCD_CalBase2      Base address of calendar 2
SCD_CalBase3      Base address of calendar 3
SCD_CalSize0      Number of cell slots in calendar 0
SCD_CalSize1      Number of cell slots in calendar 1
SCD_CalSize2      Number of cell slots in calendar 2
SCD_CalSize3      Number of cell slots in calendar 3
ACI_Ctrl          ACI Control field
ACI_RxMask        Rx polling mask
ACI_FreeList      Beginning of the free cell list
ACI_TxSize        Maximum number of cells in Tx FIFO
ACI_RxSize        Maximum number of cells in Rx FIFO
ACI_RxLimit       Threshold in Rx FIFO to generate an interrupt
ACI_TxLimit       Threshold in Tx FIFO to generate an interrupt
ACI_TxTimer       Cell holding time in Tx FIFO
TM_TimeStamp      Timestamp Counter
TM_Timer1         Timer 1 value
TM_TimerInit1     Timer 1 initialization value
TM_Timer2         Timer 2 value
TM_TimerInit2     Timer 2 initialization value
TM_Timer3         Timer 3 value
TM_TimerInit3     Timer 3 initialization value
TM_Timer4         Timer 4 value
TM_TimerInit4     Timer 4 initialization value
TM_Timer5         Timer 5 value
TM_TimerInit5     Timer 5 initialization value
TM_Timer6         Timer 6 value
TM_TimerInit6     Timer 6 initialization value
TM_Timer7         Timer 7 value
TM_TimerInit7     Timer 7 initialization value
TM_Enable         Time-out enable
TM_ClockSel       Timer clock selection

10.8.1 EDMA Registers

The EDMA registers are all prefixed with EDMA_ and are described in the following paragraphs.

10.8.1.1 EDMA_Ctrl Register

Bits in this register determine the data and BFD transfer modes as described in Table 10.9.

Table 10.9 Data and BFD Transfer Modes

Mode Type      Description
Cell Mode      Individual cells are exchanged between CBM and primary or secondary memory.
Packet Mode    Complete packets are exchanged between primary memory and secondary memory.
Far Mode       BFDs are located in primary memory.
Local Mode     BFDs are copied between secondary memory and primary memory.
Since write operations through the PCI Bus of the ATMizer II+ chip are always faster than read operations, the best modes are Cell mode and Local mode. In these modes, the host writes transmit cells and BFDs to secondary memory, and the ATMizer II+ EDMA writes received cells and BFDs back to primary memory. This saves PCI Bus transmission time if the host also has DMA capability. If the host does not have DMA capability, use Packet and Far modes and let the ATMizer II+ EDMA exchange the data and BFDs between primary memory and secondary memory. The exchanges are then transparent to the APU and the host.

The pBuffData.SEC and pBuffData.PCI fields of the BFDs point to secondary memory and primary memory, respectively. If both fields are nonzero, the BFD is in Packet mode. If pBuffData.SEC is zero, the BFD is in Cell mode with the buffer in primary memory. If pBuffData.PCI is zero, the BFD is in Cell mode with the buffer in secondary memory.

The configuration of the following bits in the EDMA_Ctrl register determines where the BFDs are located:

• EDMA_TxBFD_Far
• EDMA_TxBFD_Copy
• EDMA_RxBFD_Far
• EDMA_RxBFD_Copy

Far mode is selected when the Far bits are set, and Local mode is selected when the Copy bits are set. The EDMA disregards the Far bits when the Copy bits are set.

10.8.1.2 EDMA_BFD_Base Registers

BFDs in either primary or secondary memory are referenced by adding the Buffer Number times the size of the BFD to the value in the EDMA_BFD_FBase (Far, or primary memory, BFD base address) register or to the value in the EDMA_BFD_LBase (Local, or secondary memory, BFD base address) register. The EDMA selects the register based on the settings of the Far and Copy bits in the EDMA_Ctrl register.
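The two rules above (how the pBuffData fields imply the buffer mode, and how a BFD address is derived from a base register) can be illustrated with a minimal C sketch. The names buf_mode_t, classify_bfd, and bfd_address are hypothetical and are not part of the ATMizer II+ software. The BFD size is passed as a parameter because this section does not state it, and the case where both pointer fields are zero is not covered by the text, so it is reported separately by assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only. */
typedef enum {
    BUF_UNSET,           /* both pointers zero (assumption: not described in the text) */
    BUF_CELL_PRIMARY,    /* pBuffData.SEC == 0: Cell mode, buffer in primary memory */
    BUF_CELL_SECONDARY,  /* pBuffData.PCI == 0: Cell mode, buffer in secondary memory */
    BUF_PACKET           /* both pointers nonzero: Packet mode */
} buf_mode_t;

/* Classify a BFD from its two buffer pointer fields. */
static buf_mode_t classify_bfd(uint32_t pbuff_sec, uint32_t pbuff_pci)
{
    if (pbuff_sec != 0 && pbuff_pci != 0) return BUF_PACKET;
    if (pbuff_pci != 0)                   return BUF_CELL_PRIMARY;
    if (pbuff_sec != 0)                   return BUF_CELL_SECONDARY;
    return BUF_UNSET;
}

/*
 * BFD address = base + (Buffer Number x size of BFD), where 'bfd_base' is the
 * value of either EDMA_BFD_FBase or EDMA_BFD_LBase, whichever the EDMA selects.
 */
static uint32_t bfd_address(uint32_t bfd_base, uint32_t buffer_number, uint32_t bfd_size)
{
    assert(bfd_size != 0);
    return bfd_base + buffer_number * bfd_size;
}
```

As a usage example with made-up values, a base of 0x8000, buffer number 3, and an assumed 16-byte BFD give the address 0x8030.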
10.8.1.3 EDMA_Buff Registers

The EDMA_LBuffSize and EDMA_SBuffSize registers specify the sizes of the large and small buffers to be used when a buffer is linked from a free buffer list. Both EDMA_LBuffSize and EDMA_SBuffSize must be equal to or larger than 48 for correct EDMA operation. The EDMA_LBuff and EDMA_SBuff registers point to the beginning of the large and small free buffer lists.

10.8.1.4 EDMA_VCD_Base Register

The EDMA_VCD_Base register is used to calculate the VCD address by adding its contents to the connection number multiplied by the size of the VCD.

10.8.1.5 EDMA Registers Initialization Code

In the following initialization code, Packet mode was chosen for data transfers and Local mode for BFD maintenance. The C code illustrates how to initialize the related EDMA registers (Figure 10.18).

Figure 10.18 Initializing EDMA Registers

/*
 * EDMA related parameters
 */
#define EDMA_BFD_FBase  RxBFD_FBase_APU  /* Buffer Descriptor table in primary memory */
#define EDMA_BFD_LBase  RxBFD_LBase_APU  /* Buffer Descriptor table in secondary memory */
#define EDMA_VCD_Base   VCD_Base         /* Virtual Connection Descriptor table base */

#define TxBFD_Copy  1  /* Tx BFD local mode (1) or far mode (0) */
#define RxBFD_Copy  1  /* Rx BFD local mode (1) or far mode (0) */
#define TxBFD_Far   0  /* Tx BFD are in Far base (1) or Local base (0) */
#define RxBFD_Far   0  /* Rx BFD are in Far base (1) or Local base (0) */
#define ConReAct    0  /* Enable connection reactivation in Tx direction */
#define ByteSwap    0  /* Ctrl byte swapping for cell transferring */
#define OrHdr       0  /* Ctrl the generation and extraction of the cell header */

/* EDMA control fields: each flag is shifted to its own bit position */
#define EDMA_Ctrl ( (ConReAct   << EDMA_ConReAct)   | \
                    (ByteSwap   << EDMA_ByteSwap)   | \
                    (OrHdr      << EDMA_OrHdr)      | \
                    (TxBFD_Copy << EDMA_TxBFD_Copy) | \
                    (TxBFD_Far  << EDMA_TxBFD_Far)  | \
                    (RxBFD_Copy << EDMA_RxBFD_Copy) | \
                    (RxBFD_Far  << EDMA_RxBFD_Far) )

/* EDMA related initialization */
Hdr->EDMA.SBuff     = (ushort)Head_Rx_SBuff;
Hdr->EDMA.LBuff     = (ushort)Head_Rx_LBuff;
Hdr->EDMA.VCD_Base  = (ushort)EDMA_VCD_Base;
Hdr->EDMA.BFD_LBase = (ushort)EDMA_BFD_LBase;
Hdr->EDMA.BFD_FBase = (ushort)EDMA_BFD_FBase;
Hdr->EDMA.Ctrl      = EDMA_Ctrl;

10.8.2 Scheduler Registers

The two Scheduler registers that need to be initialized are described in the following paragraphs.

10.8.2.1 SCD_Ctrl Register

The Scheduler Control register, SCD_Ctrl, provides information about the calendar table base address and the Scheduler mode of operation. The Scheduler operates in Flat mode when the SCD_FlatMode bit in the register is set and in Priority mode when the SCD_FlatMode bit is cleared. Flat mode gives all connections equal service priority; Priority mode services the connections with lower class-of-service values first.

The SCD_VCDinCB field in the SCD_Ctrl register determines the location of VCDs. All VCDs with connection numbers equal to or less than the value in SCD_VCDinCB are located in CBM. The addresses of VCDs with connection numbers greater than the value in SCD_VCDinCB are computed by adding the value in the EDMA_VCD_Base register to the connection number multiplied by the size of the VCD (see Section 10.8.1.4).

10.8.2.2 SCD_CalSize Register

The SCD_CalSize register is used to program the size of the calendar table in units of cell slots.
The memory required to store the calendar table is calculated as:

Equation 10.1
Memory (bytes) = SCD_CalSize * (2 + (2 * SCD_FlatMode))

That is, each cell slot takes 2 bytes when SCD_FlatMode is 0 and 4 bytes when SCD_FlatMode is 1.

10.8.2.3 Scheduler Registers Initialization Code

The following C code illustrates how to initialize the Scheduler registers (Figure 10.19):

Figure 10.19 Initializing Scheduler Registers

/*
 * Scheduler related parameters
 */
#define SCD_CalBase   (SCD_Base >> 9)  /* Calendar base address */
#define SCD_FlatMode  0                /* Scheduler's operating mode: flat (1) or priority (0) */
#define SCD_VCDinCB   0                /* Number of VCDs in cell buffer */
#define SCD_Cal_Size  MAX_CON_NUM      /* Number of cell slots in calendar */

/* Scheduler control fields */
#define SCD_Ctrl \
    ( (SCD_VCDinCB << 24) | (SCD_FlatMode << 23) | \
      (SCD_CalBase & 0xfffff) )

/* Scheduler related initialization */
Hdr->SCD.Ctrl    = SCD_Ctrl;
Hdr->SCD.CalSize = SCD_Cal_Size;

10.8.3 ACI Registers

The ACI registers determine the operation of the APU and ACI in relation to the Utopia Bus. Those that require initialization are described in the following paragraphs.

10.8.3.1 ACI_Ctrl Register

All of the assigned bits and fields in the ACI_Ctrl register must be initialized. The initialization code provided here sets the bits and fields as shown in Table 10.10. Refer to the L64364 ATMizer II+ ATM-SAR Chip Technical Manual for a detailed description of the register.
Table 10.10 ACI Control Register Initialization

Field/Bit             Initialization
ACI_PHY field         PHY address to respond to in slave mode
ACI_Loopback bit      Set to enable on-chip loopback
ACI_Parity bit        Set to enable Utopia parity generation and error detection
ACI_CellSize field    Depends on application as follows: 0b00 = 52/53 bytes, 0b01 = 56/57 bytes, 0b10 = 60/61 bytes, 0b11 = 64/65 bytes
ACI_HEC bit           Set to generate and verify HEC
ACI_TxIdle bit        Set to generate idle cells when the Tx FIFO is empty
ACI_FixedPr bit       Set to enable a fixed priority scheme in the receive direction (port 0 has the highest priority and port 23 the lowest priority)
ACI_Slave bit         Depends on application. When set, the APU is a Utopia Bus slave and responds to the address in the ACI_PHY field. When cleared, the APU is the Utopia Bus master.
ACI_DirectPoll bit    Depends on application and the ACI_Slave bit. When set, the APU uses a direct polling scheme and supports up to four slave devices on the Utopia Bus. When cleared, the APU assigns PHY addresses [3:0] to the CLAV[3:0] lines of the Utopia Bus and supports multiplexed polling of up to 24 slave devices. See also the ACI_RxMask register description.
ACI_Reset bit         Set to place the ACI Transmitter and Receiver state machines in their idle state.

10.8.3.2 ACI_RxMask Register

The ACI_RxMask register contains 24 bits, one for each PHY device supported in multiplexed polling. When bit N in the register is set, the ACI receiver includes PHY device N in its polling; otherwise, the device is skipped.

10.8.3.3 ACI_FreeList Register

The ACI_FreeList register is used only at initialization to set the beginning of the free cell list. The register is 16 bits wide. The calculation is:

Equation 10.2
ACI_FreeList = CBM_base + (SCD_VCDinCB * sizeof(VCD)) + (ResvCB * sizeof(long)) + sizeof(TxRing)

where

CBM_base is the base address of CBM.
SCD_VCDinCB is the total number of VCDs allocated to CBM.
ResvCB is the reserved space and might be 0. Cell number 0 is always reserved. It is used for idle cell generation when that feature is enabled. If the feature is disabled, the cell location may be used as regular cell memory.
TxRing is the transmit ring for messaging between the host and the APU.

10.8.3.4 ACI_TxSize Register

The 8-bit ACI_TxSize register sets the maximum size of the transmit FIFO to guarantee sufficient free cell locations for the receive FIFO, since both FIFOs share the same area in CBM. If the total number of transmit cells in CBM reaches ACI_TxSize, the CBM manager returns cell number 0 when the APU requests a free cell location.

10.8.3.5 ACI_RxSize Register

The 8-bit ACI_RxSize register sets the maximum size of the receive FIFO to guarantee sufficient free cell locations for the transmit FIFO, since both FIFOs share the same area in CBM.

10.8.3.6 ACI_Limit Registers

The ACI_TxLimit and ACI_RxLimit registers are used to program the threshold for the number of cells in the transmit or receive FIFO that generates an interrupt. When the actual number of cells exceeds ACI_RxLimit or drops below ACI_TxLimit, an interrupt is delivered to the APU (when enabled). Each register is eight bits wide.

10.8.3.7 ACI_TxTimer Register

The ACI_TxTimer register is used to set the cell holding time in the transmit FIFO, depending on the selected timer. The register is eight bits wide.

10.8.3.8 ACI_FreeCount Register

The ACI_FreeCount register is used to set the free cell count at initialization. The register is eight bits wide.
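As a cross-check on Equation 10.2 in Section 10.8.3.3, the following minimal C sketch computes the free cell list offset from its three size terms. It is an illustration under stated assumptions, not the vendor's code: the function name aci_free_list_offset is hypothetical, the result is expressed as an offset from the start of CBM (so the CBM_base term is left out), the VCD size of 32 bytes is taken from Section 10.7.3, and a 4-byte word is assumed for sizeof(long) on the APU.

```c
#include <assert.h>
#include <stdint.h>

#define VCD_SIZE_BYTES   32u  /* VCD size given in Section 10.7.3 */
#define WORD_SIZE_BYTES   4u  /* assumed sizeof(long) on the 32-bit APU */

/*
 * Offset of the free cell list from the start of CBM, following Equation 10.2
 * without the CBM_base term:
 *   (SCD_VCDinCB * sizeof(VCD)) + (ResvCB * sizeof(long)) + sizeof(TxRing)
 */
static uint32_t aci_free_list_offset(uint32_t vcd_in_cb,
                                     uint32_t resv_words,
                                     uint32_t tx_ring_bytes)
{
    uint32_t offset = vcd_in_cb * VCD_SIZE_BYTES
                    + resv_words * WORD_SIZE_BYTES
                    + tx_ring_bytes;

    /* The ACI_FreeList register is 16 bits wide, so the value must fit. */
    assert(offset <= 0xFFFFu);
    return offset;
}
```

For instance, with no VCDs in CBM, no reserved words, and a 132-byte TxRing (32 four-byte entries plus one credit word), the free cell list starts 132 bytes into CBM; with two VCDs in CBM and one reserved word it starts at 64 + 4 + 132 = 200 bytes.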
10.8.3.9 ACI Registers Initialization Code

The following C code illustrates how to initialize the ACI registers (Figure 10.20):

Figure 10.20 Initializing ACI Registers

/*
 * ACI related parameters
 */
#define ACI_TxSize      16          /* Maximum number of cells in Tx Fifo */
#define ACI_RxSize      16          /* Maximum number of cells in Rx Fifo */
#define ACI_RxLimit     4           /* Threshold in RxFifo to generate an interrupt */
#define ACI_TxLimit     8           /* Threshold in TxFifo to generate an interrupt */
#define ACI_RxMask      0x00ffffff  /* Rx polling mask */
#define ACI_Phy         0           /* Phy physical address to respond to in slave mode */
#define ACI_LoopBack    1           /* set for on-chip loop back */
#define ACI_Parity      0           /* enable for Utopia parity generation and error detection */
#define ACI_CellSize    0           /* the cell size code: 0 = 52/53, 1 = 56/57, 2 = 60/61, 3 = 64/65 bytes */
#define ACI_HEC         1           /* set to generate or verify the HEC */
#define ACI_TxIdle      0           /* set to generate idle cells when Tx Fifo is empty */
#define ACI_FixedPr     0           /* set to enable the priority of Phy device in Rx direction */
#define ACI_Slave       0           /* control the master/slave operation of the Utopia bus */
#define ACI_DirectPoll  0           /* enable direct or multiplexed polling scheme */
#define ACI_Reset       0           /* set the Tx and Rx state machines to idle state */

/* ACI control fields */
#define ACI_Ctrl \
    (ACI_Phy | (ACI_LoopBack << 5) | \
     (ACI_Parity << 6) | (ACI_CellSize << 8) | \
     (ACI_HEC << 10) | (ACI_TxIdle << 11) | \
     (ACI_FixedPr << 12) | (ACI_Slave << 13) | \
     (ACI_DirectPoll << 14) | (ACI_Reset << 15))

#define ResvCB 0  /* in words, needed to calculate ACI_FreeList */

#define ACI_FreeList \
    ( (SCD_VCDinCB << 5) + (ResvCB << 2) + \
      TxRing_Size * SizeOf_Ring_Entry + \
      SizeOf_Ring_Credit )

#define ACI_FreeCount \
    ( CELL_COUNT )

#define CellBuffSize \
    ( (SizeOf_CBM - ACI_FreeList) / SizeOf_Cell )

/* ACI related initialization */
Hdr->ACI.TxSize    = ACI_TxSize;
Hdr->ACI.TxLimit   = ACI_TxLimit;
Hdr->ACI.RxLimit   = ACI_RxLimit;
Hdr->ACI.RxMask    = ACI_RxMask;
Hdr->ACI.Ctrl      = ACI_Ctrl;
Hdr->ACI.FreeList  = ACI_FreeList;
Hdr->ACI.FreeCount = ACI_FreeCount;

10.8.4 Timer Registers

The ATMizer II+ Timer Unit implements a set of eight hardware timers and a Timestamp Counter in registers to provide the APU with real-time events.

The 32-bit TM_TimeStamp counter is incremented at each input clock event. It should be initialized to zero.

There are eight, 8-bit, general-purpose timers, TM_Timer1-7. They are individually initialized to the values in the corresponding TM_TimerInit1-7 registers. A timer, if enabled by the associated bit in the TM_Enable register, is decremented at each input clock event selected by the TM_ClockSel register. A timer time-out event occurs when a timer reaches zero. The timer then reloads the value in the corresponding TM_TimerInit1-7 register. The eight timers may be cascaded to achieve higher counts.

Time-out events for the Timestamp Counter and for Timers 1 through 3 and 8 are registered in the APU_Status register and may generate an interrupt. Timers 4 through 7 can be used only as part of a wider, cascaded timer.

Figure 10.21 shows how to cascade timers to extend the period of a watchdog timeout event.

Figure 10.21 Cascading Timers for a Long Watchdog Timeout

/*
 * To avoid APU hangs when it is stalled, enable the
 * Watchdog timer with a large value. If it times out,
 * something must be wrong.
 */
/* enable Timers */
/*
Hdr->TM.ClockSel = 0x65432100;
Hdr->TM.Timer[0].Value = 0xff;
Hdr->TM.Timer[0].Init = 0xff;
Hdr->TM.Timer[1].Value = 0xff;
Hdr->TM.Timer[1].Init = 0xff;
Hdr->TM.Timer[2].Value = 0xff;
Hdr->TM.Timer[2].Init = 0xff;
Hdr->TM.Timer[3].Value = 0xff;
Hdr->TM.Timer[3].Init = 0xff;
Hdr->TM.Timer[4].Value = 0xff;
Hdr->TM.Timer[4].Init = 0xff;
Hdr->TM.Timer[5].Value = 0xff;
Hdr->TM.Timer[5].Init = 0xff;
Hdr->TM.Timer[6].Value = 0xff;
Hdr->TM.Timer[6].Init = 0xff;
*/
/* enable watchdog timer on Timer2 */
Hdr->APU.Watchdog = 0x40ff;

10.8.5 APU Registers

The APU_VIntEnable register is cleared at system reset. Setting a bit in the register enables the corresponding interrupt, and clearing the bit disables (masks) the interrupt. The bit number in the register corresponds to the interrupt number shown in Table 10.11. Interrupt number 0 has the lowest priority.

Table 10.11 External Vectored Interrupts

Interrupt            Number    Description
IntEDMA_ComplFull    15        Completion Queue is full (Tx, Rx or Buff)
IntACI_RxFull        14        ACI Rx FIFO is full
IntRxMbx             13        Rx Mailbox FIFO not empty
IntEDMA_Move         12        EDMA Move is complete
IntEDMA_RxCell       11        RxCell Completion Queue not empty
IntACI_Rx            10        ACI Rx FIFO exceeds threshold (ACI_RxLimit)
IntEDMA_TxCell       9         TxCell Completion Queue not empty
IntEDMA_Buff         8         Buff Completion Queue not empty
IntACI_Err           7         Timeout, parity, or short-cell error
IntACI_Tx            6         ACI Tx FIFO is below threshold (ACI_TxLimit)
IntExt1-0            5-4       External interrupt inputs (user defined)
IntTim3-1            3-1       Timers 3-1 timeout
IntTim0              0         Timer 8 timeout

The contents of the APU_VIntBase register are used as bits [31:7], and the interrupt number as bits [6:3], of the vectored interrupt handler routine address. Bits [2:0] of the address are set to zero.

The APU_Reset bit in the APU_AddrMap register is set when the hardware PCI_RSTn signal is asserted.
All hardware modules remain in an idle state as long as this bit is set. After the APU initializes all hardware registers and memory-resident data structures, the APU_Reset bit should be cleared, as shown in Figure 10.22, to activate all the hardware modules on the ATMizer II+ chip:

Figure 10.22 Clearing the APU_Reset Bit

Hdr->APU.AddrMap &= 0x7fff00ff;

10.9 Data Structures Initialization

This section describes initialization of the following data structures:

• Virtual Connection Descriptors (VCDs) and APU Connection Descriptors (ACDs)
• Buffer Descriptors (BFDs)
• The Calendar Table
• The Tx and Rx Rings
• The Free Cell List

10.9.1 VCD and ACD Initialization

During the initialization period, the SDP application code clears all fields of all VCDs, as shown in Figure 10.23.

Figure 10.23 Clearing VCD Fields

/* clear all VCDs */
vcd = (ulong *)VCD;
for (i = 0; i < (MAX_CON_NUM * SizeOf_VCD / 4); i++)
    *vcd++ = 0;

The APU initializes the corresponding ACD and VCD when it receives an open connection command from the host. At the same time, the host also passes to the APU the initial address of a block of signaled parameters for the connection. This block is the first 32 bytes of the Host Connection Descriptor (HCD). Refer to Section 1.3.2.1, “Mailbox,” for how to issue the open connection command and pass the required parameters. Based on these parameters, the APU calculates its own ACD for that connection.

Table 10.12 lists the signaled parameters. Refer to the ATM Forum Traffic Management Specification 4.0 for the detailed meaning of each parameter. The calculation of the ACD is described in Table 10.13. The word “defined” in the Initialization column of the table means that the parameter is predefined in the header file (ConPar.h) as the default value (you may change it if it is signaled).
The undefined parameters are calculated at connection open time. Refer to Chapter 3, “Scheduling,” for more detail about the usage of ACDs.

Table 10.12 Required Open Connection Parameters

Name        Address    Class    Description
ConNum      0          All      Connection Number
Reserved    4          All      Cell header (to be implemented later)
Class       8          All      Class of traffic
PCR         12         All      Peak Cell Rate in Cells/Sec units, 24-bit integer
SCR         16         VBR      Sustained Cell Rate in Cells/Sec units, 24-bit integer
MCR         16         ABR      Minimum Cell Rate in Cells/Sec units, 24-bit integer
MBS         20         VBR      Maximum Burst Size
ICR         20         ABR      Initial Cell Rate in Cells/Sec units, 24-bit integer
TBE         24         ABR      Transient Buffer Exposure
FRTT        28         ABR      Fixed Round-Trip Time

Table 10.13 ACD Field Calculations

Name                                                                                                       Class       Initialization
ICG (Intercell Gap)                                                                                        CBR, UBR    LCR (Line Cell Rate) / PCR (Peak Cell Rate)
ICG                                                                                                        ABR         LCR/ICR (Initial Cell Rate)
ICG_PCR                                                                                                    VBR         LCR/PCR
ICG                                                                                                        VBR         ICG_PCR
Bucket                                                                                                     VBR         0, Variable
Increment                                                                                                  VBR         LCR/SCR
Limit                                                                                                      VBR         (MBS - 1)(Increment - ICG_PCR)
ThTxTime                                                                                                   All         ICG
NRM (maximum number of cells a source may send for each Forward Resource Management cell)                  ABR         32, defined
ICR                                                                                                        ABR         min(PCR, TBE/FRTT)
LastTimeFRM (last time a Forward Resource Management cell was sent)                                        ABR         Now - NRM/ICR
logRIF (Rate Increase Factor)                                                                              ABR         4, RIF = 1/16, defined
logRDF (Rate Decrease Factor)                                                                              ABR         4, RDF = 1/16, defined
CRM (limit of FRM cells in the absence of a Backward Resource Management cell)                             ABR         TBE/NRM
FRM_SinceBRM (the count of FRM cells since the last Backward Resource Management cell)                     ABR         0
InRateCell (the count of In-Rate cells since the last FRM cell)                                            ABR         NRM
ACR (Allowed Cell Rate)                                                                                    ABR         ICR
MCR                                                                                                        ABR         0
PCR                                                                                                        ABR         PCR
LastWasFRM (last RM cell sent was an FRM cell)                                                             ABR         0
PresBRM (presenting BRM cell)                                                                              ABR         0
BRM_NI (BRM No Increase)                                                                                   ABR         0
BRM_C (BRM Congestion Indicator)                                                                           ABR         0
BRM_BN (BRM Backward Explicit Congestion Notification cell)                                                ABR         0
logCDF (Cutoff Decrease Factor)                                                                            ABR         4, CDF = 1/16, defined
ADTF (ACR Decrease Time Factor)                                                                            ABR         (ulong) (0.5 * LCR), defined
BRM_ER (BRM Explicit Rate)                                                                                 ABR         0
BRM_CCR (BRM Current Cell Rate)                                                                            ABR         0
BRM_MCR (BRM Minimum Cell Rate)                                                                            ABR         0
TRM (the upper bound on the time between FRM cells)                                                        ABR         100 ms, defined
TCR (Total Cell Rate)                                                                                      ABR         10, defined

10.9.2 BFD Initialization

All TxBFDs and RxBFDs are initialized before starting to send or receive data. The NextBFD, pBuffData, and BuffSize fields of the BFDs are set correctly and the other fields are cleared. Figure 10.24 shows an example of C code to initialize all BFDs.

Figure 10.24 Initializing BFDs

TmpAddr = (ulong *) TxBuffPool;
for (n = 1; n < Tx_BuffCount; n++, TmpAddr += TxBuffSize)
{
    TxBFD[n].NextBFD = n+1;            /* points to next BFD */
    RxBFD[n].NextBFD = n+1;
    TxBFD[n].pBuffData = (ulong)TmpAddr;
    TxBFD[n].BuffSize = TxBuffSize;
    if (n <= MAX_CON_NUM)              /* BFDs 1..MAX_CON_NUM use small Rx buffers */
        RxBFD[n].pBuffData = (ulong)RxSmallBuffPool[n-1];
    else                               /* the remaining BFDs use large Rx buffers */
        RxBFD[n].pBuffData = \
            (ulong)RxLargeBuffPool[n-1-MAX_CON_NUM];
}
TxBFD[2*MAX_CON_NUM].NextBFD = 0;      /* last Tx BFD in the list */
RxBFD[MAX_CON_NUM].NextBFD = 0;        /* last small Rx BFD (index matches the loop test above) */
RxBFD[2*MAX_CON_NUM].NextBFD = 0;      /* last large Rx BFD */
NextFreeTxBFD = 1;                     /* first available Tx BFD */

There are two ways to initialize BFDs. One way is to let the host and/or the APU exchange buff commands with each other to attach BFDs to VCDs. For the RxBFDs, the host:

1. sets the BuffFree bit in all BFDs,
2. sets the BuffLarge bit in large BFDs, and
3. clears the BuffLarge bit in small BFDs
before sending the buff command to the APU. The APU passes the command to the EDMA, and the EDMA automatically puts the BFDs in the corresponding buffer free list. For TxBFDs, the APU clears the BuffFree bit and ignores the BuffLarge bit if only one-size buffers are used. When the host receives the buff command, it puts the BFD in its own free buffer list.

The other way to initialize BFDs is to simply let the host or the APU create the free BFD lists. In the ADP, the host initializes all the free BFD lists.

In the BFD copy mode, the BFDs are located in both primary and secondary memory, since they are copied back and forth. When accessing the BFDs, the EDMA uses either the Far BFD base (Fbase, the BFD base address in primary memory) or the Local BFD base (Lbase, the BFD base address in secondary memory) as follows:

• attach
  – read BFD
      if (TxBFD_Copy) use Fbase
      else if (TxBFD_Far) use Fbase
      else use Lbase
  – write partial BFD from VCD[tailBFD]
      if (TxBFD_Copy) use Lbase
      else if (TxBFD_Far) use Fbase
      else use Lbase
• free
      if (RxBFD_Copy) use Lbase
      else if (RxBFD_Far) use Fbase
      else use Lbase

From the above it can be seen that, in the BFD copy mode, the RxBFDs should be initialized in secondary memory and the TxBFDs should be initialized in primary memory. If not in the BFD copy mode, the BFDs should be located in the far (primary) memory or the local (secondary) memory according to the states of the TxBFD_Far and RxBFD_Far bits in the EDMA_Ctrl register. Refer to the L64364 ATMizer II+ ATM-SAR Chip Technical Manual for more detail.

10.9.3 Calendar Table Initialization

The Calendar Table is a cell-slot array managed by the Scheduler. Each entry in the Calendar Table corresponds to one cell slot and contains the connection numbers of the virtual connections to be serviced in that slot. All slots in the table are cleared initially to indicate that there are no connections scheduled.
The example code in Figure 10.25 clears the Calendar Table. Refer to Chapter 3, “Scheduling,” and the L64364 ATMizer II+ ATM-SAR Chip Technical Manual for more detail about the Calendar Table and Scheduler.

Figure 10.25 Clearing the Calendar Table

/* clear Calendar Table */
Calendar = (ulong *)SCD_Base;
for (i = 0; i < (MAX_CON_NUM * SizeOf_SCD / 4); i++)
    *Calendar++ = 0;

10.9.4 Ring Initialization

To minimize the traffic on the PCI Bus, both the APU and the host keep a separate set of pointers for the rings. The initialization for the host sets the RxRing count to the RxRing size and clears the TxRing count. It also points the TxRing credit and the RxRing starting pointer to primary memory, and points the TxRing starting pointer and the RxRing credit to CBM. The routine is shown in Figure 10.26.

Figure 10.26 Initializing Host Rings

Ring_Desc_t TxRing, RxRing;

TxRing.Ptr     = &Host_APU[0];       /* points to CBM */
TxRing.Size    = HOST_APU_RING_SIZE;
TxRing.End     = TxRing.Ptr + TxRing.Size;
TxRing.Count   = 0;
TxRing.Credit  = &Host_APU_Credit;   /* points to primary memory */

RxRing.Ptr     = &APU_Host[0];       /* points to primary memory */
RxRing.Size    = APU_HOST_RING_SIZE;
RxRing.End     = RxRing.Ptr + RxRing.Size;
RxRing.Count   = RxRing.Size;
RxRing.Credit  = &APU_Host_Credit;   /* points to CBM */
*RxRing.Credit = RxRing.Size;

Similarly, the initialization for the APU sets the TxRing count to the TxRing size and clears the RxRing count. It also points the TxRing credit and the RxRing starting pointer to primary memory, and points the TxRing starting pointer and the RxRing credit to CBM. The routine is shown in Figure 10.27.
10-64 Initialization BookL64364PG.fm5 Page 65 Friday, January 28, 2000 4:58 PM Figure 10.27 Initializing APU Rings 1 2 3 4 5 6 7 8 9 10 11 12 13 Ring_Desc_t RxRing.Ptr RxRing.Size RxRing.End RxRing.Count RxRing.Credit TxRing, RxRing; = &APU_Host[0]; /* points to primary memory */ = APU_HOST_RING_SIZE; = RxRing.Ptr +Rx Ring.Size; = 0; = &APU_Host_Credit; /* points to CBM */ TxRing.Ptr TxRing.Size TxRing.End TxRing.Count TxRing.Credit *TxRing.Credit = = = = = = &Host_APU[0]; /* points to CBM */ HOST_APU_RING_SIZE; TxRing.Ptr +Tx Ring.Size; TxRing.Size; &Host_APU_Credit; /* points to primary memory */ TxRing.Size; 10.9.5 Free Cell List Cell Buffer Memory is a 4 Kbyte sized, on-chip memory. It is mainly used for the Free Cell List, the Transmit FIFO, and the Receive FIFO, and occasionally may contain other data structures. The Cell Buffer Manager is responsible for the management of the CBM. The Cell Buffer Manager maintains the Free Cell List through the ACI_FreeList register. This register is initialized as the first address of the Free Cell List and is described in Section 10.8.3, “ACI Registers.” At initialization, the APU builds a list of free cells. Refer to the L64364 ATMizer II+ ATM-SAR Chip Technical Manual for more detail of the Cell Descriptor format and usage. The example code shown in Figure 10.28 builds a Free Cell List. Figure 10.28 Free Cell List Initialization 1 2 3 4 5 /* CBM Freelist initialization */ CellBuff = (pCell_t)((ulong)CBM + ACI_FreeList); for (i = 1; i < CellBuffSize - 1; i++) CellBuff[i].CDS = (i * sizeof(Cell_t) + ACI_FreeList) << 18; CellBuff[CellBuffSize - 1].CDS = 0; 10.9.6 Miscellaneous Data Structures The remaining variables and structures should be correctly set or cleared. For instance, all the statistic results should be cleared. The global variable TimeNow should be initialized to 0. 
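The Fbase/Lbase selection rules listed earlier in this section for the attach and free operations can be condensed into three small decision functions. The following is a minimal sketch, not code from the ATMizer II+ software: the enum and function names are hypothetical, and the boolean arguments stand for the TxBFD_Copy, TxBFD_Far, RxBFD_Copy, and RxBFD_Far control bits.

```c
#include <stdbool.h>

/* Hypothetical names: which BFD base address the EDMA would use. */
typedef enum { BASE_FAR, BASE_LOCAL } bfd_base_t;

/* attach, read BFD: copy mode and far mode both read from primary memory. */
static bfd_base_t attach_read_base(bool tx_copy, bool tx_far)
{
    return (tx_copy || tx_far) ? BASE_FAR : BASE_LOCAL;
}

/* attach, write partial BFD from VCD[tailBFD]:
 * copy mode writes the local copy, otherwise TxBFD_Far decides. */
static bfd_base_t attach_write_base(bool tx_copy, bool tx_far)
{
    if (tx_copy)
        return BASE_LOCAL;
    return tx_far ? BASE_FAR : BASE_LOCAL;
}

/* free: copy mode uses the local copy, otherwise RxBFD_Far decides. */
static bfd_base_t free_base(bool rx_copy, bool rx_far)
{
    if (rx_copy)
        return BASE_LOCAL;
    return rx_far ? BASE_FAR : BASE_LOCAL;
}
```

Written this way, the placement rule for copy mode is easy to read off: the attach read uses Fbase, so TxBFDs start in primary memory, and free uses Lbase, so RxBFDs start in secondary memory.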
Chapter 11
Operating Software

This chapter describes operating software for an ATMizer II+ system. The sections in this chapter are:

• Section 11.1, “Top Level Structure”
• Section 11.2, “APU Program”
• Section 11.3, “Host Program”

11.1 Top Level Structure

Software running on the ATMizer II+ chip interacts with each hardware module to realize the traffic flow control mechanism defined by the ATM Forum. The Segmentation and Reassembly (SAR) process can be split into four separate subprocesses, or threads:

• RxHAS
• RxCRT
• TxHAS
• TxCRT

The split follows two lines. First, the receive and transmit directions are handled by independent threads. Second, host-to-APU (ATM Processing Unit) signalling is handled independently of cell receiving and transmitting. The Host-to-APU Signalling thread (HAS) involves the EDMA buff command and, for packet mode only, the EDMA move command. The Cell Receive and Transmit thread (CRT) involves the EDMA cell command. The HAS thread is triggered by host commands and EDMA buffer completion events, while the CRT thread is triggered by cell arrival or Scheduler/timer events.

11.2 APU Program

The MIPS processor core in the APU may be considered the main control unit of the ATMizer II+ architecture. The APU is responsible for traffic management, host messaging, OAM cell processing, and statistics collection. All other hardware processing modules are slaves of the APU and execute commands when the appropriate hardware registers are written. From the software perspective, the hardware accelerators appear as predefined routines that execute faster than equivalent processor code and, more importantly, in parallel with the main processor.

The hardware modules include:

• a full AAL5 segmentation and reassembly engine with built-in memory management
• a calendar-based Scheduler unit
• a floating point accelerator for the ATM Forum 15-bit floating point format

11.2.1 Cell Operation Flow

The typical transmit and receive cell flows are listed below. Refer to the L64364 ATMizer II+ ATM-SAR Chip Technical Manual for details.

TxCRT – Transmit Cell Thread

1. The APU reads the ACI_TxFree register. The Cell Buffer Manager returns a free cell location.
2. The APU issues the SCD_Serv command. The Scheduler returns the connection number to be serviced. If the connection number is zero, no connection is scheduled for service during this cell slot; skip steps 3 and 4.
3. The APU computes a new Intercell Gap (ICG).
4. The APU issues the SCD_Sched command with the connection number and the computed service time (ServTime = ThTxTime + ICG).
5. The APU issues the SCD_Tic command.
6. The APU issues the TxCell command.
7. The EDMA executes the TxCell command and puts the cell in the TxFIFO.
8. The Cell Buffer Manager sends the cell out and links the cell to the free cell list.

RxCRT – Receive Cell Thread

1. The ACI places a cell in the ACI RxFIFO.
2. The APU reads the ACI_RxRead register to get the cell address. The ACI removes the cell from the RxFIFO.
3. The APU reads the cell descriptor and cell header from the cell buffer and computes the connection number.
4. The APU issues the RxCell command.
5. The EDMA executes the RxCell command and returns the cell to a free list.
6. The Cell Buffer Manager links the cell to a free list.

11.2.2 Buffer Operation Flow

The buffer operation flow in the ATMizer II+ chip is listed below. Refer to the L64364 ATMizer II+ ATM-SAR Chip Technical Manual for details.

TxHAS – Transmit Buffer Host-APU Signalling Thread

1. The host writes a BuffNum into the TxRing.
2. The APU reads this BuffNum from the TxRing and writes it into the Buff Request Queue if the queue is not full.
3. The EDMA gets the BuffNum from the Buff Request Queue, copies the BFD into secondary memory, and links it to the VCD. In packet mode, it optionally invokes the Move processor to copy the buffer contents from primary memory to secondary memory (through the EDMA_TxBFD_Copy and EDMA_TxBuffCopy control bits).
4. The EDMA completes the buffer and places the BuffNum in the Buff Completion Queue.
5. The APU reads the BuffNum from the completion queue.
6. The APU writes the BuffNum and two control bits (BFS_BuffLarge and BFS_BuffFree) into the RxRing.
7. The host may free its own buffer because the packet has been sent out.

RxHAS – Receive Buffer Host-APU Signalling Thread

1. The RxCell processor completes a buffer, places the BuffNum in the EDMA Completion Queue, and copies the BFD to primary memory. In packet mode, it optionally invokes the Move processor to copy the buffer contents to primary memory (through the EDMA_RxBFD_Copy and EDMA_RxBuffCopy control bits).
2. The APU retrieves the BuffNum from the completion queue and places the BuffNum in the RxRing.
3. The host consumes the data (some time later).
4. The host writes the BuffNum into the TxRing to free it.
5. The APU retrieves the BuffNum from the TxRing.
6. The APU issues a buff command to let the Buff processor link this BFD to a free list.

11.2.3 Pseudocode

The ATMizer II+ Application Pseudocode contains a pseudocode example that illustrates the APU software necessary to implement SAR functionality. The pseudocode performs the following tasks:

• host and APU messaging
  – command to segment a buffer (host->APU)
  – return of segmented (sent) buffer (APU->host)
  – notification of received buffer (APU->host)
  – return of a buffer to a free list (host->APU)
  – request to open a connection (host->APU)
  – request to close a connection (host->APU)
  – request to copy statistics vector (host->APU)
• receive cell header lookup
• scheduling connections for transmit
• ABR rate computations
• VBR leaky bucket computations
• collecting statistics
  – number of received and transmitted cells
  – number of received and transmitted PDUs
  – number and type of errors (CRC32, lost or misinserted cells, etc.)

11.3 Host Program

The host program allows you to send commands to the ATMizer II+ chip and to display the results of these commands. This involves opening connections, transmitting and receiving data, and displaying statistics such as effective rate and errors received.

11.3.1 Setting up a Configuration File

The host program does not accept commands while it is executing a session. It accepts several initialization commands after the program starts; once the go command is issued, it no longer scans for user input. No data consistency checking is performed on the received channel.

The initialization commands allow you to:

• set the size of all the buffers transmitted to the ATMizer II+ chip.
• request the ATMizer II+ chip to open connections with specific parameters:
  – the number of connections to open
  – class: CBR, UBR
  – PCR: rate to request for that connection

The go command starts the dialog between the host and the ATMizer II+ chip. Based on the preceding initialization commands, it opens the requested connections and starts transmitting and receiving data. See Section 11.3.2.2, “Read Command Line Options,” for syntax details.
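The command sequencing described above (initialization commands first, then go) can be sketched as a small line scanner. This is an illustrative sketch only, not the host program's parser: the type and function names are hypothetical, and only the buffsize, open, and go keywords and the # comment rule are taken from this guide.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of an initialization script. */
typedef struct {
    unsigned long buffsize;   /* last value given to buffsize, 0 if none */
    int           open_cmds;  /* number of open commands seen before go  */
    int           go_seen;    /* 1 once go has been read                 */
} script_summary_t;

/* Scan a script held in memory, one command per line.
 * Lines starting with '#' and empty lines are skipped.
 * Scanning stops at go: nothing after it is read. */
static script_summary_t scan_script(const char *text)
{
    script_summary_t s = {0, 0, 0};
    const char *p = text;

    while (*p != '\0' && !s.go_seen) {
        char line[128];
        size_t len = strcspn(p, "\n");
        size_t n = len < sizeof line - 1 ? len : sizeof line - 1;
        const char *cmd;

        memcpy(line, p, n);             /* copy one (possibly truncated) line */
        line[n] = '\0';
        p += len;
        if (*p == '\n')
            p++;

        cmd = line + strspn(line, " \t");   /* skip leading blanks */
        if (*cmd == '#' || *cmd == '\0')
            continue;
        if (sscanf(cmd, "buffsize %lu", &s.buffsize) == 1)
            continue;
        if (strncmp(cmd, "open ", 5) == 0)
            s.open_cmds++;
        else if (strncmp(cmd, "go", 2) == 0)
            s.go_seen = 1;
    }
    return s;
}
```

Stopping the loop at go mirrors the rule that the host program does not scan for further input once the session has started.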
Due to the sequencing of the commands (initialization commands first, and then go), it is easy to write all of these commands in a configuration (script) file. You can then send this file through the communications program you are using (for example, tip in Unix, or Crosstalk or Procomm Plus in DOS) instead of typing the commands one by one. The file is simply a list of the commands to issue. A typical example is shown in Figure 11.1.

Figure 11.1 A Typical Configuration File

    # Comments start with a “#”

    # buffsize buffer_size_in_bytes : set the buffer size
    buffsize 1024

    # open connections connection_class rate : open the different
    # connections with specified class and rate
    open 1-3,8 CBR 25e6
    open 3-5 CBR 12.5e6
    open 6,7 CBR 0.1e6

    # run !
    go

Again, see Section 11.3.2.2, “Read Command Line Options,” for syntax details.

11.3.2 Host Tasks

The tasks the host has to perform during a demonstration are described in the following paragraphs.

11.3.2.1 Initialize the Data Structure in Primary Memory

The host has to initialize the following structures:

TxRing – The TxRing holds the buffer numbers sent by the host to the ATMizer II+ chip for transmission. It is located in CBM and has credit-type flow control. At initialization, this ring is filled with zeros. Later the ring contains buffer numbers that allow the host to identify the corresponding BFD in the TxBFDList. It is maintained by a RingDesc_t structure.

TxBFDList – This is the array containing the BFDs used for transmission. Each BFD has a field (pBuffData) pointing to an actual buffer in the primary memory. At initialization, when the list is created, each NextBFD field is updated to point to the following BFD in the array, and each pBuffData field points to a buffer in the primary memory. The BuffSize field must be set to the size you defined for the transmit buffers (buffsize user command).

Tx Buffers – Tx Buffers are the buffers actually transmitted on the line. They are filled with random data at initialization. You define the size of the Tx Buffers for all of the connections that are opened. It is a fixed value less than or equal to 1024 bytes. As explained in Section 10.7, “Memory Allocation,” the transmit buffers are overlapped every 16 bytes.

RxRing – The RxRing holds buffer numbers identifying BFDs from either the RxSmallBFDList or the RxLargeBFDList, depending on the value of the BuffSize field. This ring also has credit-type flow control. It is located in the primary memory. The ring is filled with zeros at initialization.

RxSmallBFDList and RxLargeBFDList – RxSmallBFDList and RxLargeBFDList are two arrays containing BFDs pointing to small and large buffers in the primary memory. The ATMizer II+ chip uses these buffers to store the data it receives from the line and reassembles. At initialization, the host builds these arrays the same way it creates the TxBFDList, by assigning a small (64-byte) or large (256-byte) buffer to each BFD. Even though these lists are initialized by the host (they are in the primary memory), only the ATMizer II+ chip maintains them.

ConnectionList – This list is located in the host’s private memory and describes the parameters associated with each connection. There is one entry per connection requested, and each descriptor is created according to what you specified in the open connection command. See Section 1.3.1.1, “Connection Numbers,” for more details on the contents of the list.

11.3.2.2 Read Command Line Options

This procedure analyzes the command line arguments, interprets them, and executes the corresponding functions. Any line beginning with a # is a comment line and is ignored. The commands are described in the following paragraphs.

Set transmit buffer size:

    buffsize buffer_size

where buffer_size is the size in bytes to use for all the buffers transmitted to the ATMizer II+ chip.

Open connection:

    open connections class class_fields [min_buffer_size max_buffer_size]

where:

connections is the list of connections to open. The format is a,c,e-h to open connections a, c, e, f, g, h (e to h).

class is ABR, CBR, VBR, or UBR.

class_fields are fields that depend on the class type. See Section 1.3.1.3, “Host Connection Descriptors,” for details.

min_buffer_size max_buffer_size – These parameters will be available in a future version of the software. They set the size of the transmit buffers to use for that connection. If max = min, this size is used for all the buffers for that connection. If max > min, the software uses a random value between min and max.

Close connection:

    close connections

where connections is the list of connections to close. The format is a,c,e-h to close connections a, c, e, f, g, h (e to h). The listed connections are closed: a close_connection message is sent to the ATMizer II+ chip and no more buffers are transmitted for those connections.

Hold connections:

    hold connections

where connections is the list of connections to hold. The format is a,c,e-h to hold connections a, c, e, f, g, h (e to h). This command stops the transfer of buffers to the connections but keeps them open.

Refresh statistics display:

    stats S

where S is the number of seconds. This sets the time interval between two statistics screen updates.

11.3.2.3 Send Messages to the APU

The messages the host sends to the APU are:

• open connection
• close connection
• get statistics

See Section 1.3.2.1, “Mailbox,” for details on the messages. The SendMsgToSAR procedure simply writes the messages (32-bit words) to the memory-mapped mailbox register. No flow control is performed when accessing the mailbox because the host sends only one message at a time and waits for an acknowledgment before sending another message.

11.3.2.4 Receive Messages from the APU

This procedure scans the APU-to-host mailbox for new messages. Because the host has to scan the mailbox regularly (for example, while waiting for the acknowledgment of the Get Statistics message), it is better to locate the mailbox in the primary memory than to use the PCI-mapped APU-to-host mailbox. With the mailbox in the primary memory, writes to the mailbox by the APU use the PCI Bus and consume some bandwidth, but they do not occur very often. Reads from the mailbox are done regularly by the host, but they do not need access to the PCI Bus. Because the messages sent by the APU are only acknowledgments of the host commands (only one message at a time, so a previous message cannot be overwritten), the APU-to-host mailbox can be a 32-bit location at a fixed address in the primary memory. The procedure ReadMsg returns the message content.

11.3.2.5 Open Connections

All connections are opened at startup. The ConnectionList in the host’s private memory holds one Host Connection Descriptor (HCD) per connection. This descriptor contains data used by the APU to open the connection and data used by the host to maintain connection statistics. Because the APU needs only the first 32 bytes of the connection descriptor, the host copies the relevant bytes from its private memory to a fixed location in the primary memory. It then sends a message to the ATMizer II+ chip with the address of the connection descriptor in the primary memory and sets the HCD Status field to REQ_OPEN. The host then waits for the acknowledgment from the ATMizer II+ chip before sending data to that particular connection. When the acknowledgment is received, the host updates the Status field in the HCD from REQ_OPEN to OPEN. It is then possible to send data for that connection. The host executes the sequence shown in Figure 11.2.

Figure 11.2 Opening Connections

    For (n = 1 to Number of connections)
        Send message to SAR (OPEN_CONNECTION)
        Wait until (Read message from SAR == ACK_OPEN_CONNECTION)
        Connection[n].Status = OPEN

11.3.2.6 Close Connections

This command sends a “CLOSE_CONNECTION n” message to the APU. Once the message is sent, the HCD Status field is changed from OPEN to REQ_CLOSE. From then on, the host sends no more data for the connection to the ATMizer II+ chip, but received buffers are still taken into account. When the acknowledgment with the correct connection number(s) is received from the ATMizer II+ chip, the Status field is changed to CLOSED and the host discards any buffers corresponding to the closed connection.

11.3.2.7 Transmit Buffers to the ATMizer II+ Chip

After memory initialization at startup, and when all the open connection requests have been acknowledged by the APU, the host sends two buffers per connection to the TxRing. From then on, each time the host receives a TxDone notification from the APU in the RxRing, it sends the buffer that has just been “Done” back to the TxRing. Note that it is necessary to send more than one buffer per connection at startup to be sure that there is always data to send to the line interface. Also, each time a buffer is sent, the HCD BytesSent field is incremented by the size of the buffer.
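The transmit policy described in Section 11.3.2.7 (prime each connection with two buffers, then resend each buffer when its TxDone notification arrives) can be sketched as follows. This is a simplified illustration under stated assumptions, not the host program itself: tx_log_t is a hypothetical stand-in for the TxRing that merely records the BuffNums written, hcd_t keeps only the BytesSent field, and the mapping of buffer numbers to connections (two consecutive numbers per connection) is invented for the example.

```c
#include <stddef.h>

#define BUFFERS_PER_CONNECTION 2   /* two buffers per connection at startup */
#define TX_LOG_SIZE 64

/* Hypothetical stand-in for the TxRing: records BuffNums in order. */
typedef struct {
    unsigned slot[TX_LOG_SIZE];
    size_t   count;
} tx_log_t;

/* Hypothetical, simplified Host Connection Descriptor. */
typedef struct {
    unsigned long BytesSent;       /* running total of bytes handed to the chip */
} hcd_t;

/* Send one buffer: write its number to the ring and account for its size. */
static void send_buffer(hcd_t *hcd, unsigned buff_num,
                        unsigned long buff_size, tx_log_t *ring)
{
    if (ring->count < TX_LOG_SIZE)
        ring->slot[ring->count++] = buff_num;
    hcd->BytesSent += buff_size;
}

/* Startup: prime every open connection with two buffers. */
static void prime_connections(hcd_t *hcd, size_t n_conn,
                              unsigned long buff_size, tx_log_t *ring)
{
    for (size_t c = 0; c < n_conn; c++)
        for (unsigned b = 0; b < BUFFERS_PER_CONNECTION; b++)
            send_buffer(&hcd[c], (unsigned)(c * BUFFERS_PER_CONNECTION + b),
                        buff_size, ring);
}

/* Steady state: a TxDone notification returns buff_num, which is resent
 * on the connection that owns it (assumed mapping, see above). */
static void on_tx_done(hcd_t *hcd, unsigned buff_num,
                       unsigned long buff_size, tx_log_t *ring)
{
    send_buffer(&hcd[buff_num / BUFFERS_PER_CONNECTION], buff_num,
                buff_size, ring);
}
```

Keeping two buffers in flight per connection means one buffer can be segmented while the other is being recycled, which is the reason the text gives for sending more than one buffer at startup.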
11.3.2.8 Receive Buffers from the ATMizer II+ Chip

Each time the host receives a new buffer from the APU in the RxRing, it:

• extracts the connection number from the BFD,
• checks the status bits of the BFD_Ctrl field and updates the statistics field for the corresponding connection number (“BadBufs”),
• increments the number of bytes received for that connection (“BytesRec”), and
• writes the buffer number and three control bits (valid, BFS_BuffLarge, and BFS_BuffFree) into the RxMbx to free it.

No error checking is done on the data received. Note that, to maintain a constant flow, the received buffers are not looped back to the ATMizer II+ chip by the host. Only the Tx buffers are resent, as soon as the TxDone notification is found in the RxRing (see Section 11.3.2.7).

11.3.2.9 Request Statistics from the ATMizer II+ Chip

To request statistics from the ATMizer II+ chip, the host sends a Get Statistics command with a pointer to the memory location where the APU is to copy the data. See Section 1.3.2.1, “Mailbox.” The host issues this command regularly so that the statistics are displayed in real time.

11.3.2.10 Display the Statistics

When the host receives the acknowledgment of the Get Statistics command from the APU, this procedure calculates and displays the following values:

• effective rate per connection
• global rate, all connections considered
• bad buffers per connection
• bytes sent/received per connection
• cells sent/received per connection
• PDUs sent/received per connection
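The effective and global rates listed in Section 11.3.2.10 can be derived from two successive byte-counter snapshots taken one display interval apart. The sketch below is an assumption about how such a calculation might look, not the host program's code: the function names are hypothetical, and expressing the rate in bits per second is a choice made for the example.

```c
#include <stddef.h>

/* Effective rate in bits per second over one display interval.
 * bytes_now and bytes_prev are successive snapshots of a byte counter
 * such as BytesSent or BytesRec; interval_s is the interval in seconds. */
static double effective_rate_bps(unsigned long bytes_now,
                                 unsigned long bytes_prev,
                                 double interval_s)
{
    if (interval_s <= 0.0)
        return 0.0;                      /* no meaningful rate yet */
    return (double)(bytes_now - bytes_prev) * 8.0 / interval_s;
}

/* Global rate: sum of the per-connection effective rates. */
static double global_rate_bps(const unsigned long *now,
                              const unsigned long *prev,
                              size_t n_conn, double interval_s)
{
    double total = 0.0;
    for (size_t c = 0; c < n_conn; c++)
        total += effective_rate_bps(now[c], prev[c], interval_s);
    return total;
}
```

The interval would naturally be the value set with the stats command, so that each screen refresh shows the rate achieved since the previous refresh.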