RM0078
Reference manual
SPEAr1340 architecture and functionality

Introduction

The SPEAr1340 is a member of the SPEAr® (structured processor enhanced architecture) family of embedded microprocessors, targeting high-performance human-machine interface (HMI) applications. It offers an unprecedented combination of integer/floating-point CPU performance, media processing, security features, and aggressive power reduction control for next-generation products.

SPEAr1340 is based on ARM's latest multi-core technology (Cortex-A9 SMP/AMP, ARMv7 instruction set) and it is manufactured using ST's 55 nm HCMOS low-power silicon process.

This document provides technical details about the architecture and functionality of SPEAr1340, and is intended to be used by systems-level and board-level product designers, as well as software developers.

The SPEAr1340 address map and detailed register descriptions are provided in the companion reference manual: RM0089, Reference manual, SPEAr1340 address map and registers.

November 2012, Doc ID 018553 Rev 3, www.st.com

Contents

1 Device overview
  1.1 Simplified block diagram
  1.2 Summary of features
  1.3 IP groups
2 CPU subsystem (A9SM)
  2.1 Overview
  2.2 Pins
  2.3 Clocks
  2.4 Interrupts
  2.5 Functional description
    2.5.1 CORTEXA9INTEGRATION
    2.5.2 A9 CoreSight subsystem (A9CS)
    2.5.3 Clock manager (CMR)
    2.5.4 Snoop control unit (SCU)
    2.5.5 Global timer (GTIM)
    2.5.6 Timer and watchdog blocks (WDTIM)
    2.5.7 Generic interrupt controller (GIC)
  2.6 Programming
    2.6.1 Programming the global timer registers
3 Multilayer interconnect matrix (BUSMATRIX)
  3.1 Overview
  3.2 Pins
  3.3 Clocks
  3.4 Interrupts
  3.5 Functional description
    3.5.1 Crossbars (XB)
    3.5.2 Shared link (SL)
    3.5.3 S3220
    3.5.4 Masters (IAs) and slaves (TAs)
4 System configuration registers (MISC)
5 Reset and clock generator (RCG)
  5.1 Overview
  5.2 Pins
  5.3 Clocks
  5.4 Functional description
    5.4.1 Main clock sources
    5.4.2 PLLs
    5.4.3 Fractional clock generator (SSCG)
    5.4.4 XYSYNT clock divider
    5.4.5 AMBA clock configuration
    5.4.6 A9SM clock configuration
    5.4.7 GMAC clock configuration
    5.4.8 I2S clock configuration
    5.4.9 UART clock configuration
    5.4.10 C3 clock configuration
    5.4.11 CLCD clock configuration
    5.4.12 GPT clock configuration
    5.4.13 MPMC clock configuration
    5.4.14 Gate unit
    5.4.15 Reset generator
  5.5 Programming
    5.5.1 Programming PLLs
    5.5.2 Changing system modes
    5.5.3 Setting cpu_clk = 600 MHz and hclk = 166 MHz
    5.5.4 Configuring the fractional clock generator (SSCG)
6 Power management
  6.1 Overview
  6.2 Power domains
    6.2.1 Power domain management: power states
    6.2.2 Power domain management: configuration registers
    6.2.3 Power management procedures
  6.3 Clock power management
  6.4 IP power management
    6.4.1 Standard IP power management
    6.4.2 USBPHY power management
    6.4.3 MPMC/DDR PHY power management
    6.4.4 PCIE/SATA/MIPHY power management
    6.4.5 ADC power management
  6.5 Voltage regulators
  6.6 Power control module (PCM)
    6.6.1 Pins
    6.6.2 Functional description
7 BootROM
  7.1 Overview
    7.1.1 Useful terms and definitions
  7.2 Functional description
    7.2.1 Hardware components used
    7.2.2 OTP configuration
    7.2.3 Boot device selection
    7.2.4 Software architecture
    7.2.5 System initialization
    7.2.6 Boot device initialization and code-shadowing
    7.2.7 Xloader authentication and execution
    7.2.8 Image header authentication
    7.2.9 Default boot mode
  7.3 Secure boot
    7.3.1 Overview
    7.3.2 First stage secure boot process
    7.3.3 Life cycle
    7.3.4 Services
    7.3.5 Security table in BootROM
    7.3.6 BootROM and RAM layout
    7.3.7 OTP layout
    7.3.8 Usage examples
    7.3.9 BootROM signed image format
    7.3.10 Image signature cryptography
  7.4 Additional information
    7.4.1 BootROM on Core 1
    7.4.2 Error codes
    7.4.3 List of supported devices
    7.4.4 BootROM table
    7.4.5 Terminology
    7.4.6 References
8 Static RAMs (SRAM)
  8.1 Overview
9 One-time programmable antifuse (OTP)
  9.1 Overview
  9.2 Pins
  9.3 Clocks
  9.4 Functional description
    9.4.1 OTP banks
  9.5 Programming
    9.5.1 Writing
    9.5.2 Masking
10 General purpose timers (GPT)
  10.1 Overview
  10.2 Pins
  10.3 Clocks
  10.4 Interrupts
  10.5 Functional description
11 Real-time clock (RTC)
  11.1 Overview
  11.2 Pins
  11.3 Clocks
  11.4 Interrupts
  11.5 Functional description
12 Direct memory access controllers (DMAC)
  12.1 Overview
  12.2 Pins
  12.3 Clocks
  12.4 Interrupts
  12.5 Functional description
    12.5.1 DMAC wrapper
    12.5.2 DMAC multiplexing
    12.5.3 DMAC transfers
    12.5.4 Generating requests for the AHB master bus interface
    12.5.5 AHB master interface arbitration
    12.5.6 Scatter/Gather
    12.5.7 Endianness
  12.6 Programming
    12.6.1 DMAC transfer types
    12.6.2 Programming example
    12.6.3 Programming a channel
    12.6.4 Disabling a channel prior to transfer completion
    12.6.5 Defined-length burst support on DMAC
13 Cryptographic co-processor (C3)
  13.1 Overview
  13.2 Functional description
    13.2.1 AHB Master Interface (HIF)
    13.2.2 C3 RAM Buffer (MEMORY)
    13.2.3 Instruction dispatching subsystem (IDS)
    13.2.4 Couple and chaining module (CCM)
    13.2.5 AHB Slave interface (SIF)
    13.2.6 System registers (SYS)
    13.2.7 Reset logic (MRGEN)
    13.2.8 Channels
  13.3 Operation
    13.3.1 Generic flow type instructions
    13.3.2 Move channel instruction set
    13.3.3 DES/3DES channel instruction set
    13.3.4 AES (MPCM) channel instruction set
    13.3.5 Unified hash with HMAC (UHH) channel instruction set
    13.3.6 Unified hash with HMAC 2 (UHH2) channel instruction set
    13.3.7 Public key (PKA) channel instruction set
    13.3.8 RNG channel instruction set
14 Temperature sensor (THSENS)
  14.1 Overview
  14.2 Pins
  14.3 Clocks
  14.4 Interrupts
  14.5 Functional description
    14.5.1 Low power modes
15 Multiport DDR2/3 controller (MPMC)
  15.1 Overview
  15.2 Pins
  15.3 Clocks
    15.3.1 Changing the input clock frequency
  15.4 Interrupts
  15.5 Resets
  15.6 Functional description
    15.6.1 AXI interface
    15.6.2 AHB interface
    15.6.3 Initialization protocol
    15.6.4 Exclusive access
    15.6.5 Error responses
    15.6.6 Multiport arbiter
    15.6.7 Command queue with placement logic
    15.6.8 Other memory controller features
    15.6.9 Address mapping
16 Static memory controller (FSMC)
  16.1 Overview
  16.2 Pins
  16.3 Clocks
  16.4 Interrupts
  16.5 Functional description
    16.5.1 NAND Flash controller
    16.5.2 NOR Flash / SRAM controller
    16.5.3 Asynchronous operating modes
    16.5.4 ECC calculation
    16.5.5 Bus turn around
17 Serial NOR Flash controller (SMI)
  17.1 Overview
  17.2 Pins
  17.3 Clocks
    17.3.1 Latency
  17.4 Functional description
    17.4.1 AHB interface
    17.4.2 Memory device compatibility
    17.4.3 Hardware mode
    17.4.4 Software mode
    17.4.5 Booting from external memory
    17.4.6 External memory read request
    17.4.7 External memory write request
    17.4.8 Write burst mode
    17.4.9 Read while write
    17.4.10 Erasing and write status register
18 Memory card interface (MCIF)
  18.1 Overview
  18.2 Pins
  18.3 Clocks
  18.4 Functional description
    18.4.1 SD2.0/SDIO2.0/MMC4.3 AHB Host controller
    18.4.2 Not using DMA
    18.4.3 Using DMA
    18.4.4 Using ADMA
    18.4.5 Abort transaction
    18.4.6 Synchronization
    18.4.7 CF4.1/xD1.3 AHB Host controller
19 Giga/Fast Ethernet controller (GMAC)
  19.1 Overview
  19.2 Pins
  19.3 Clocks
  19.4 Functional description
    19.4.1 Descriptors
    19.4.2 Precision Time Protocol (PTP)
    19.4.3 Advanced Timestamps
    19.4.4 AV feature
    19.4.5 Energy efficient ethernet
  19.5 Programming
    19.5.1 Initializing DMA
    19.5.2 Initializing GMAC
    19.5.3 Performing normal receive and transmit operation
    19.5.4 Stopping and starting transmission
    19.5.5 GMII link transitions
    19.5.6 IEEE 1588 time stamping
    19.5.7 AV feature initialization steps
    19.5.8 Energy efficient ethernet initialization steps
20 USB 2.0 host controllers (UHC)
  20.1 Overview
  20.2 Pins
  20.3 Clocks
  20.4 Functional description
21 USB OTG controller (UOC)
  21.1 Overview
  21.2 Pins
  21.3 Clocks
22 PCI express controller (PCIe)
  22.1 Overview
  22.2 Pins
  22.3 Clocks
  22.4 Resets
  22.5 Interrupts
  22.6 Functional description
    22.6.1 AXI bridge interface
    22.6.2 Common xpress port logic (CXPL)
    22.6.3 Transmit application-dependent module (XADM)
    22.6.4 Receive application-dependent module (RADM)
    22.6.5 Configuration-dependent module (CDM)
    22.6.6 Power management control (PMC)
    22.6.7 Local bus controller (LBC) and data bus interface (DBI)
    22.6.8 Message generation
    22.6.9 Hot plug control (HOTPLUG_CTRL) module
  22.7 Operation
    22.7.1 Initialization
    22.7.2 Link establishment
    22.7.3 Transmit TLP processing
    22.7.4 Receive TLP processing
    22.7.5 Error handling
    22.7.6 Messages
    22.7.7 Address translation
    22.7.8 Outbound iATU operation: address match mode
    22.7.9 Inbound iATU operations
    22.7.10 Gen2 5.0GT/s operation
    22.7.11 Power management
  22.8 Programming
    22.8.1 Programming example 1
    22.8.2 Programming example 2
23 Serial ATA controllers (SATA)
  23.1 Overview
  23.2 Pins
  23.3 Clocks
  23.4 Resets
  23.5 Interrupts
  23.6 Functional description
    23.6.1 Bus interface unit (BIU)
    23.6.2 Generic registers (GCSR)
    23.6.3 Port
  23.7 Programming
    23.7.1 Software initialization
    23.7.2 Software manipulation of Port DMA
24 SATA/PCIe physical interface (MiPHY)
  24.1 Overview
  24.2 Pins
  24.3 Clocks
    24.3.1 Reference clock configuration
    24.3.2 Recommended clock frequencies
    24.3.3 SerDes clocks
  24.4 Resets
  24.5 Functional description
    24.5.1 PLL description
    24.5.2 SerDes description
    24.5.3 Compensation module (COMPENS) description
  24.6 Operation
25 Asynchronous serial ports (UART)
  25.1 Overview
  25.2 Pins
  25.3 Clocks
  25.4 Interrupts
  25.5 Functional description
    25.5.1 Main interfaces
    25.5.2 Modem operation
    25.5.3 Hardware flow control
    25.5.4 IrDA SIR ENDEC
    25.5.5 Baud rate generation and transmit logic
26 Synchronous serial port (SSP)
  26.1 Overview
  26.2 Pins
  26.3 Clocks
  26.4 Functional description
    26.4.1 Main interfaces
  26.5 Operation
    26.5.1 Bit rate generation
    26.5.2 Frame format
  26.6 Programming
    26.6.1 Defining the chip select
    26.6.3 Configuring SSP as master or slave
27
  27.1 Overview
  27.2 Pins
  27.3 Clocks
. . . . . . . . . . . . . . . . . . 421 27.4 Functional description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422 27.4.1 Main interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422 27.4.2 I2C terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423 27.4.3 I2C behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424 27.4.4 I2C protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426 27.4.5 Multiple master arbitration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430 27.4.6 Clock synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431 27.4.7 IC_CLK frequency configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432 27.4.8 SDA hold time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436 27.4.9 DMA controller interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437 Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445 27.5.1 Slave mode operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445 27.5.2 Master mode operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448 27.5.3 Disabling I2C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449 General purpose I/O (GPIOA-B) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450 28.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450 28.2 Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450 28.3 Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
450 28.4 Resets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451 28.5 Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451 28.6 Functional description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451 28.7 29 Enabling SSP operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418 I2C bus controllers (I2C) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421 27.5 28 26.6.2 28.6.1 APB interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451 28.6.2 Interrupt detection logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 452 Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454 28.7.1 Interrupt configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454 28.7.2 Operation of the input/output lines (I/O read/write) . . . . . . . . . . . . . . . 455 Extended general purpose I/O (XGPIO) . . . . . . . . . . . . . . . . . . . . . . . . 457 Doc ID 018553 Rev 3 RM0078 30 Contents 29.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457 29.2 Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457 29.3 Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457 29.4 Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457 29.5 Functional description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458 29.5.1 XGPIO IN read . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458 29.5.2 XGPIO OUT write . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . 458 29.5.3 Using an XGPIO pin as an interrupt . . . . . . . . . . . . . . . . . . . . . . . . . . 459 Keyboard controller (KBD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461 30.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461 30.2 Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461 30.3 Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462 30.4 Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462 30.5 Functional description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462 30.5.1 31 32 Operating modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463 A/D converter (ADC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465 31.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465 31.2 Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465 31.3 Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465 31.4 Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465 31.5 Functional description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466 31.5.1 Enhanced mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466 31.5.2 Touchscreen mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467 31.5.3 High-resolution mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467 31.5.4 DMA handshaking interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . 468 PWM generators (PWM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469 32.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469 32.1.1 Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469 32.2 Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469 32.3 Functional description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470 32.4 32.3.1 Prescaler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470 32.3.2 Pulse generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470 Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470 Doc ID 018553 Rev 3 13/590 Contents RM0078 32.4.1 33 34 Configuring a channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470 HDMI CEC interfaces (CEC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472 33.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472 33.2 Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473 33.3 Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473 33.4 Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473 33.5 Functional description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473 33.5.1 Control logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473 33.5.2 Bit timing logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479 33.5.3 Bit shaping logic (BSL) . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480 33.5.4 Prescaler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480 33.5.5 Normal functional behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480 33.5.6 Error conditions and error handling . . . . . . . . . . . . . . . . . . . . . . . . . . . 482 Display controller (CLCD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484 34.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484 34.2 Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484 34.3 Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485 34.4 Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485 34.5 Functional description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486 34.5.1 LCD controller core . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486 34.5.2 Master and slave bus interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487 34.5.3 Timing and control unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487 34.5.4 DMA controller & memory interface . . . . . . . . . . . . . . . . . . . . . . . . . . . 487 34.5.5 Frame buffer organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488 34.5.6 Input FIFOs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491 34.5.7 Pixel unpack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491 34.5.8 Palette lookup table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496 34.5.9 Output FIFO and formatter . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . 496 34.5.10 Power sequencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497 34.5.11 Pulse-width modulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497 34.5.12 Overlay windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497 35 Graphics processing unit (GPU) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499 35.1 14/590 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499 Doc ID 018553 Rev 3 RM0078 Contents 35.2 Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500 35.3 Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500 35.4 Resets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500 35.5 Functional description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501 35.6 36 35.5.1 Geometry processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502 35.5.2 Pixel processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503 35.5.3 Memory management unit (MMU) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505 Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505 35.6.1 3D system level operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506 35.6.2 2D system level operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509 35.6.3 Graphics pipeline level operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511 Video decoder (VDEC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512 36.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . 512 36.2 Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514 36.3 Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514 36.4 36.3.1 Decoder interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514 36.3.2 Post-processor interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517 Functional description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 518 36.4.1 H.264 decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519 36.4.2 MPEG-4 / H.263 / Sorenson Spark decoder . . . . . . . . . . . . . . . . . . . . 521 36.4.3 MPEG-2 / MPEG-1 decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523 36.4.4 JPEG decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525 36.4.5 VC-1 decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526 36.4.6 RV decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 528 36.4.7 VP6 decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 530 36.4.8 VP7/VP8 decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 532 36.4.9 AVS decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534 36.4.10 DivX decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536 36.4.11 Post processor (PP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536 36.4.12 Video frame storage formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548 37 Video encoder (VENC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555 37.1 Overview . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555 37.2 Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556 37.3 Functional description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556 Doc ID 018553 Rev 3 15/590 Contents 38 RM0078 40 16/590 Bus interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556 37.3.2 Video stabilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556 37.3.3 Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557 37.3.4 Multi-instance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558 Camera input interfaces (CAM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559 38.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559 38.2 Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559 38.3 Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559 38.4 Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 560 38.5 Functional description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 560 38.6 39 37.3.1 38.5.1 Data capture and conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 560 38.5.2 Data transfers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561 38.5.3 Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561 38.5.4 Performance levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563 38.5.5 Capture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . 564 Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564 38.6.1 Selecting synchronization type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564 38.6.2 Masking interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564 Video input parallel port (VIP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565 39.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565 39.2 Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566 39.3 Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566 39.4 Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566 39.5 Functional description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567 I2S digital audio interfaces (I2S) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569 40.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569 40.2 Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570 40.3 Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570 40.4 Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570 40.5 Functional description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 571 40.5.1 Transmit channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 571 40.5.2 Receive channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 571 40.5.3 Audio data interface . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . 571 Doc ID 018553 Rev 3 RM0078 Contents 40.5.4 40.6 41 External sclk gating and enable signal . . . . . . . . . . . . . . . . . . . . . . . . 571 Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 572 40.6.1 Using the I2S transmitter (Tx mode) . . . . . . . . . . . . . . . . . . . . . . . . . . 572 40.6.2 Using the I2S receiver (Rx mode) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 572 40.6.3 Configuring channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 572 40.6.4 Using interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573 40.6.5 Programming FIFO thresholds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573 40.6.6 Exchanging data with system memory . . . . . . . . . . . . . . . . . . . . . . . . 573 S/PDIF digital audio ports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574 41.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574 41.2 Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574 41.3 Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574 41.4 Functional description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574 41.4.1 SPDIF IN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574 41.4.2 SPDIF OUT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574 Appendix A Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 575 Appendix B Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579 Appendix C Copyright statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . 585 Revision history . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 586 Doc ID 018553 Rev 3 17/590 List of tables RM0078 List of tables Table 1. Table 2. Table 3. Table 4. Table 5. Table 6. Table 7. Table 8. Table 9. Table 10. Table 11. Table 12. Table 13. Table 14. Table 15. Table 16. Table 17. Table 18. Table 19. Table 20. Table 21. Table 22. Table 23. Table 24. Table 25. Table 26. Table 27. Table 28. Table 29. Table 30. Table 31. Table 32. Table 33. Table 34. Table 35. Table 36. Table 37. Table 38. Table 39. Table 40. Table 41. Table 43. Table 44. Table 45. Table 47. Table 48. Table 49. Table 50. 18/590 Summary of SPEAr1340 features. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 SPEAr1340 IP groups. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 CortexA9 subsystem clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 CORTEXA9INTEGRATION and PL310 configuration parameters. . . . . . . . . . . . . . . . . . . 36 A9CS memory map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 Interrupt output source selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 IA group organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 Connectivity matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 RCG clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 PLL source clocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 PLL output clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 PLL division factors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 Jitter at PLL output clock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 Selection of ? value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 PLL modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 SSCGn output frequencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 XYSYNT clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 A9SM clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 GMAC clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 Setting GMAC clocks to different modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 Reset sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 Allowed power states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 Allowed power states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 Allowed wakeup events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 Power management configuration registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 Clock power states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. 90 Standard IPs power states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 USBPHY power states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 USBPHY power management-related registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 ADC power state. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 ADC power management-related registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 PCM internal pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 OTP Bank M configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 Hardware boot selection (STRAP[0..3]) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 IP configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 USB device descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 USB configuration descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 USB interface descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 USB IN endpoint descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 USB OUT endpoint descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 USB string descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 Security parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . 135
Table 45. Error codes . . . 143
Table 46. Supported NAND devices . . . 143
Table 47. BANK 1/2 bit mapping . . . 150
Table 48. BANK M bit mapping . . . 150
Table 49. DMAC MUX - selecting the peripheral . . . 159
Table 50. DMAC MUX - selecting the peripheral . . . 159
Table 51. DMAC MUX - selecting the flow controller and data direction . . . 160
Table 52. DMAC MUX - selecting the DMAC core . . . 160
Table 53. Transfer types and flow controller combinations . . . 162
Table 54. Programming of transfer types and channel register update method . . . 168
Table 55. MOVE_INIT bit encoding . . . 208
Table 56. MOVE_INIT bits nn definition . . . 208
Table 57. MOVE_DATA bit encoding . . . 208
Table 58. DES START ECB bit encoding . . . 209
Table 59. Bit a definition . . . 209
Table 60. Bit b definition . . . 209
Table 61. DES START CBC bit encoding . . . 210
Table 62. DES APPEND ECB bit encoding . . . 210
Table 63. DES APPEND CBC bit encoding . . . 211
Table 64. HASH [MD5/SHA1/SHA2] INIT bit encoding . . . 218
Table 65. HASH [MD5/SHA1/SHA2] INIT bits aa definition . . . 218
Table 66. HASH [MD5/SHA1/SHA2] APPEND instruction . . . 218
Table 67. HASH [MD5/SHA1/SHA2] END bit encoding . . . 218
Table 68. HASH [MD5/SHA1/SHA2] END bit t definition . . . 219
Table 69. HASH CONTEXT SAVE bit encoding . . . 219
Table 70. HASH CONTEXT RESTORE bit encoding . . . 219
Table 71. HMAC [MD5/SHA1/SHA2] INIT bit encoding . . . 220
Table 72. HMAC [MD5/SHA1/SHA2] APPEND bit encoding . . . 220
Table 73. HMAC [MD5/SHA1/SHA2] END bit encoding . . . 221
Table 74. HMAC [MD5/SHA1/SHA2] END bit t definition . . . 221
Table 75. HMAC CONTEXT SAVE bit encoding . . . 221
Table 76. HMAC CONTEXT RESTORE bit encoding . . . 221
Table 77. HASH [SHA384/SHA512] INIT bit encoding . . . 222
Table 78. HASH [SHA384/SHA512] INIT bits aa definition . . . 222
Table 79. HASH [SHA384/SHA512] APPEND bit encoding . . . 223
Table 80. HASH [SHA384/SHA512] END bit encoding . . . 223
Table 81. HASH CONTEXT SAVE bit encoding . . . 224
Table 82. HASH CONTEXT RESTORE bit encoding . . . 224
Table 83. HMAC [SHA384/SHA512] INIT bit encoding . . . 224
Table 84. HMAC [SHA384/SHA512] APPEND bit encoding . . . 225
Table 85. HMAC [SHA384/SHA512] END bit encoding . . . 225
Table 86. HMAC CONTEXT SAVE bit encoding . . . 225
Table 87. HMAC CONTEXT RESTORE bit encoding . . . 226
Table 88. MONTY_EXP instruction data structure . . . 227
Table 89. MONTY_PAR bit encoding . . . 227
Table 90. Input data structure . . . 227
Table 91. Resulting data structure . . . 227
Table 92. MOD_EXP bit encoding . . . 228
Table 93. Input data structure . . . 228
Table 94. Resulting data structure . . . 228
Table 95. MONTY_EXP bit encoding . . . 229
Table 96. Input data structure . . . 229
Table 97. Resulting data structure . . . 229
Table 98. ECC_MUL bit encoding . . . 230
Table 99. Input data structure . . . 230
Table 100. Resulting data structure . . . 230
Table 101. ECC_MONTY_MUL bit encoding . . . 231
Table 102. Input data structure . . . 231
Table 103. Resulting data structure . . . 231
Table 104. GET_VAL instruction . . . 232
Table 105. AXI transfer type limitations . . . 238
Table 106. Configured AXI settings . . . 240
Table 107. Write response signals . . . 242
Table 108. AHB transfer type limitations . . . 243
Table 109. Relative priority example . . . 247
Table 110. System D specifications . . . 249
Table 111. System D operation . . . 249
Table 112. Out of range access parameters . . . 254
Table 113. NAND bank selection . . . 259
Table 114. NOR/SRAM bank selection . . . 260
Table 115. External memory address . . . 260
Table 116. FSMC asynchronous operating modes . . . 261
Table 117. Supported instruction set . . . 270
Table 118. Transmit descriptor words 0 through 3 (TDES0 - TDES3) . . . 293
Table 119. Transmit descriptor words 6 and 7 (TDES6 and TDES7) . . . 297
Table 120. Receive descriptor fields (RDES0 through RDES3) . . . 298
Table 121. Extended status - receive descriptor fields 4 (RDES4) . . . 301
Table 122. Time-stamp snapshot - receive descriptor fields 6 and 7 (RDES6 & RDES7) . . . 302
Table 123. AXI bridge DBI -> CDM / ELBI access details . . . 343
Table 124. Result of filtering rules applied to request TLPs and completion (CPL) TLPs: EP mode . . . 349
Table 125. Result of filtering rules to request TLPs and completions (CPL) TLPs: RC mode . . . 351
Table 126. Error message (Msg) format . . . 355
Table 127. Possible causes for typical errors . . . 357
Table 128. Message classes based on the message code . . . 358
Table 129. Message transmission . . . 360
Table 130. Message reception . . . 361
Table 131. Controlling the routing of received messages . . . 364
Table 132. Registers used for programming the iATU . . . 365
Table 133. PCIe core completion timeout ranges versus PCI express specification . . . 371
Table 134. p1_clk_osc selection truth table . . . 393
Table 135. UART interrupt summary with combined outputs . . . 401
Table 136. Meaning of modem input/output in DTE and DCE modes . . . 405
Table 137. Control bits to enable and disable hardware flow control . . . 407
Table 138. External CS selection . . . 418
Table 139. I2C definition of bits in first byte . . . 427
Table 140. ic_clk in relation to high and low counts . . . 434
Table 141. Triggering an interrupt from pin 2 . . . 453
Table 142. Block signals and external interconnection cross reference . . . 462
Table 143. Key-code table (hex values) . . . 464
Table 144. Mapping between external pins and PARDATAREG bits . . . 464
Table 145. RX_ERROR conditions, types, and actions . . . 475
Table 146. Wait loop . . . 476
Table 147. TX_ERROR conditions, types, and actions . . . 478
Table 148. Frame buffer support for palette load (PSS = 1) . . . 488
Table 149. Frame buffer organization, PSS = 0 or BPP = 16, 18, 24 bpp . . . 489
Table 150. Frame buffer organization, PSS = 1, BPP = 1 bpp . . . 489
Table 151. Frame buffer organization, PSS = 1, BPP = 2 bpp . . . 489
Table 152. Frame buffer organization, PSS = 1, BPP = 4 bpp . . . 489
Table 153. Frame buffer organization, PSS = 1, BPP = 8 bpp . . . 490
Table 154. LEB_LEP, Input FIFO Read Side bits [31:16] . . . 491
Table 155. LEB_LEP, Input FIFO Read Side bits [15:0] . . . 492
Table 156. BEB_BEP, Input FIFO Read Side bits [31:16] . . . 493
Table 157. BEB_BEP, Input FIFO Read Side bits [15:0] . . . 494
Table 158. LEB_BEP, Input FIFO Read Side bits [31:16] . . . 494
Table 159. LEB_BEP, Input FIFO Read Side bits [15:0] . . . 495
Table 160. Supported standards, profiles and levels . . . 513
Table 161. Deviations from the supported profiles and levels . . . 513
Table 162. Decoder interrupt register (SWREG1 OFFSET 0X4) . . . 515
Table 163. Post-processing interrupt register (swreg60 offset 0xf0) . . . 517
Table 164. H.264 / SVC decoder base layer features . . . 519
Table 165. MPEG-4 / H.263 / Sorenson Spark decoder features . . . 521
Table 166. MPEG-2 / MPEG-1 features . . . 523
Table 167. JPEG decoder features . . . 525
Table 168. VC-1 decoder features . . . 526
Table 169. RV decoder features . . . 528
Table 170. VP6 features . . . 530
Table 171. VP7/VP8 features . . . 532
Table 172. AVS features . . . 534
Table 173. DivX features . . . 536
Table 174. Post processor features . . . 537
Table 175. 64-bit data bus parameter divisibility requirements . . . 546
Table 176. 32-bit data bus parameter divisibility requirements . . . 546
Table 177. QCIF video frame luminance data pixel numbers . . . 549
Table 178. QCIF video frame luminance pixel data storage in raster-scan order. All pixels in a row are stored in consecutive memory locations . . . 549
Table 179. QCIF video frame luminance pixel data storage in tiled order. All pixels in a macroblock are stored in consecutive memory locations . . . 549
Table 180. Video stabilization features . . . 557
Table 181. Connectivity features . . . 557
Table 182. CAM interrupts summary . . . 560
Table 183. Maximum picture size according data format and buffer size . . . 563
Table 184. VIP internal pins . . . 566
Table 185. Single link 16-bit data storing format . . . 567
Table 186. Single link 24-bit data storing format . . . 567
Table 187. Single link 32-bit data storing format . . . 567
Table 188. Dual link 16-bit data storing format . . . 568
Table 189. Dual link 24-bit data storing format . . . 568
Table 190. I2S interrupts . . . 570
Table 191. Channel configurations . . . 572
Table 192. Interrupt configurations with respect to interrupt pins . . . 573
Table 193. SPEAr1340 external interrupts . . . 575
Table 194. List of acronyms . . . 579
Table 195. Document revision history . . . 586

List of figures
Figure 1. SPEAr1340 simplified block diagram . . . 28
Figure 2. CortexA9 subsystem top level block diagram . . . 33
Figure 3. CORTEXA9INTEGRATION internal block diagram . . . 35
Figure 4. A9CS block diagram with CORTEXA9INTEGRATION . . . 39
Figure 5. Secure and non-secure interrupt priority formats . . . 46
Figure 6. SPEAr1340 block diagram with BUSMATRIX topology details . . . 49
Figure 7. RCG block diagram . . . 56
Figure 8. RCG integration in SPEAr1340 . . . 61
Figure 9. PLL overview . . . 63
Figure 10. X=1, Y=4 (duty cycle < 50 %) . . . 67
Figure 11. System clock controller . . . 69
Figure 12. AMBA clock generation . . . 70
Figure 13. A9SM clock domain . . . 71
Figure 14. GMAC clock generation . . . 73
Figure 15. I2S_M clock generation . . . 75
Figure 16. UART clock generation . . . 76
Figure 17. C3 clock generation . . . 76
Figure 18. CLCD clock generation . . . 77
Figure 19. MPMC clocks scheme . . . 79
Figure 20. Reset generator . . . 80
Figure 21. Reset waveform . . . 81
Figure 22. SPEAr1340 power islands . . . 85
Figure 23. Power states transition graph . . . 86
Figure 24. PCM block diagram . . . 96
Figure 25. PCM core block diagram . . . 99
Figure 26. Relevant PCM core interface timing . . . 99
Figure 27. Configuration funnel block diagram . . . 100
Figure 28. Configuration funnel selection flow graph . . . 101
Figure 29. Domain checker block diagram . . . 102
Figure 30. BootROM start-up sequence . . . 105
Figure 31. SYSROM memory map . . . 108
Figure 32. SYSRAM0 memory map . . . 109
Figure 33. System initialization . . . 110
Figure 34. SD/MMC card detection sequence . . . 119
Figure 35. BootROM flowchart . . . 125
Figure 36. Header authentication flow . . . 127
Figure 37. Default boot mode flow . . . 128
Figure 38. First stage secure boot process . . . 130
Figure 39. BootROM and RAM layout . . . 135
Figure 40. Boot image format . . . 139
Figure 41. BootROM on Core 1 . . . 142
Figure 42. GPT block diagram . . . 152
Figure 43. DMAC block diagram . . . 156
Figure 44. DMAC wrapper . . . 157
Figure 45. DMAC handshaking lines allocation . . . 158
Figure 46. Multiblock transfer using linked lists when DMAH_CHx_STAT_SRC set to true . . . 167
Figure 47. Multiblock transfer using linked lists when DMAH_CHx_STAT_SRC set to false . . . 167
Figure 48. Mapping of block descriptor (LLI) in memory to channel registers when DMAH_CHx_STAT_SRC set to True . . . 169
Figure 49. Mapping of block descriptor (LLI) in memory to channel registers when DMAH_CHx_STAT_SRC set to False . . . 170
Figure 50. Flowchart for DMA programming example . . . 173
Figure 51. Multi-block with linked address for source and destination . . . 180
Figure 52. Multi-block with linked address for source and destination where SARx and DARx between successive blocks are contiguous . . . 181
Figure 53. DMA transfer flow for source and destination linked list address . . . 182
Figure 54. Multi-block dma transfer with source and destination address auto-reloaded . . . 184
Figure 55. DMA transfer flow for source and destination address auto-reloaded . . . 185
Figure 56. Multi-block DMA transfer with source address auto-reloaded and linked list destination address . . . 189
Figure 57. DMA transfer flow for source address auto-reloaded and linked list destination address . . . 190
Figure 58. Multi-block DMA transfer with source address auto-reloaded and contiguous destination address . . . 193
Figure 59. DMA transfer flow for source address auto-reloaded and contiguous destination address . . . 194
Figure 60. Multi-block DMA transfer with linked list source address and contiguous destination address . . . 198
Figure 61. DMA transfer for linked list source address and contiguous destination address . . . 198
Figure 62. C3 block diagram . . . 201
Figure 63. C3 channel architecture . . . 205
Figure 64. AES (MPCM) channel instruction set . . . 212
Figure 65. MPCM Core block RAM diagram . . . 214
Figure 66. MPCM vector tables . . . 215
Figure 67. THSENS block interface . . . 233
Figure 68. MPMC clocks scheme . . . 236
Figure 69. Multiport memory controller architecture . . . 238
Figure 70. AXI interface blocks . . . 241
Figure 71. Weighted round-robin priority group structure . . . 247
Figure 72. Memory controller memory map: maximum . . . 256
Figure 73. Alternate memory map . . . 257
Figure 74. FSMC and embedded MPU boundary . . . 258
Figure 75. SRAM asynchronous read access . . . 262
Figure 76. SRAM asynchronous write access . . . 262
Figure 77. SRAM asynchronous read access with FSMC_REn toggling . . . 263
Figure 78. SRAM asynchronous write access with FSMC_REn toggling . . . 263
Figure 79. NOR Flash asynchronous read access . . . 264
Figure 80. NOR Flash asynchronous write access . . . 264
Figure 81. NOR Flash asynchronous read access with FSMC_REn toggling . . . 265
Figure 82. NOR Flash asynchronous write access with FSMC_REn toggling . . . 265
Figure 83. Asynchronous read access with extended address . . . 266
Figure 84. Asynchronous write access with extended address . . . 266
Figure 85. SMI block diagram . . . 268
Figure 86. SD/SDIO/MMC Host controller block diagram . . . 274
Figure 87. Data transfer using DAT line sequence (not using DMA) . . . 278
Figure 88. Data transfer using DAT line sequence (using DMA) . . . 280
Figure 89. Data transfer using DAT line sequence (using ADMA) . . . 282
Figure 90. Synchronous abort sequence . . . 284
Figure 91. Data path synchronization . . . 285
Figure 92. CF/xD Host controller block diagram . . . 287
Figure 93. Transmitter descriptor fields - alternate (enhanced) format . . . 293
Figure 94. Transmit descriptor fetch (read) for alternate (enhanced) format . . . 293
Figure 95. Receive descriptor fields - alternate (enhanced) format . . . 297
Figure 96. Networked time synchronization . . . 303
Figure 97. System time update using fine method . . . 305
Figure 98. UHC block diagram . . . 318
Figure 99. USB open Host controller block diagram . . . 321
Figure 100. UOC module in SPEAr1340 . . . 324
Figure 101. PCIe port system block diagram . . . 327
Figure 102. PCIe integration in SPEAr1340 . . . 327
Figure 103. PCIe main interfaces . . . 329
Figure 104. DM core block diagram (with AHB/AXI bridge module) . . . 330
Figure 105. System level view of the PCIe AXI core . . . 332
Figure 106. PCIe AXI core top-level interfaces . . . 333
Figure 107. CXPL module block diagram . . . 334
Figure 108. XADM block diagram . . . 335
Figure 109. RADM block diagram . . . 337
Figure 110. LBC context . . . 340
Figure 111. LBC switch . . . 341
Figure 112. PCIe configuration space address map (per function) . . . 342
Figure 113. DBI access to LBC . . . 343
Figure 114. Receive TLP processing flow . . . 348
Figure 115. Default request TLP routing (assuming no TLPs with CA/CRS/UR completion status) . . . 353
Figure 116. Message transmission: EP mode . . . 359
Figure 117. Message transmission: RC mode . . . 360
Figure 118. Message reception: EP mode . . . 362
Figure 119. Message reception: RC mode . . . 363
Figure 120. iATU address region mapping: outbound and inbound (address match mode) . . . 366
Figure 121. iATU address region mapping: inbound (bar match mode) . . . 368
Figure 122. Relationship of power down states between link partners . . . 369
Figure 123. SATA block diagram . . . 374
Figure 124. Bus interface unit block diagram . . . 376
Figure 125. Transport layer functional block diagram . . . 381
Figure 126. Link layer functional block diagram . . . 385
Figure 127. Port power control module diagram . . . 388
Figure 128. MiPHY application diagram . . . 392
Figure 129. Reference clock selection circuitry . . . 394
Figure 130. SerDes clocks . . . 395
Figure 131. MiPHY functional block diagram . . . 396
Figure 132. MiPHY module in SPEAr1340 . . . 398
Figure 133. UART block diagram . . . 400
Figure 134. Hardware flow control between two similar devices . . . 406
Figure 135. Hardware flow control transfer diagram (start of transfer) . . . 406
Figure 136. Hardware flow control transfer diagram (end of transfer) . . . 407
Figure 137. UART/IrDA block diagram . . . 409
Figure 138. IrDA data modulation (3/16) . . . 410
Figure 139. UART character frame . . . 410
Figure 140. RXFIFO payload . . . 411
Figure 141. UART transfer bit diagram . . . 411
Figure 142. Baud rate divisor . . . 411
Figure 143. SSP block diagram . . . 413
Figure 144. I2C block diagram . . . 421
Figure 145. Master/slave and transmitter/receiver relationships . . . 423
Figure 166. Figure 167. Figure 168. Figure 169. Figure 170. Figure 171. Figure 172. Figure 173. Figure 174. Figure 175. Figure 176. Figure 177. Figure 178. Figure 179. Figure 180. Figure 181. Figure 182. Figure 183. Figure 184. Figure 185. Figure 186. Figure 187. Figure 188. Figure 189. Figure 190. Figure 191. Figure 192. Figure 193. Figure 194. Figure 195. Figure 196. Figure 197. List of figures Data transfer on the I2C bus. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425 START and STOP condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426 7-bit address format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426 10-bit address format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427 Master-transmitter protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428 Master-receiver protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429 START BYTE transfer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429 Multiple master arbitration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430 Multi-master clock synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431 I2C master implementing tHD;DAT when IC_SDA_HOLD = 3. . . . . . . . . . . . . . . . . . . . . 436 Breakdown of DMA transfer into burst transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438 Breakdown of DMA transfer into single and burst transactions . . . . . . . . . . . . . . . . . . . . 438 Case 1 watermark levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
439 Case 2 watermark levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440 I2C Receive FIFO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441 Burst transaction – pclk = hclk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442 Back-to-back burst transaction – hclk = 2*pclk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442 Single transaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444 Burst transaction + 3 back-to-back singles – hclk = 2*pclk. . . . . . . . . . . . . . . . . . . . . . . . 444 GPIOA and GPIOB block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450 GPIO detailed block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451 GPIO interrupt registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453 Example to write to address 0x098. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456 Example to read from address 0x0C4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456 XGPIO block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457 Mapping of XGPIO40 pad to XGPIO registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458 Interrupt detection logic on XGPIOs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459 Keyboard controller block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461 Timing diagram of ADC conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466 PWM block diagram . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469 Output pulse generation example (Duty = 3, Period = 7) . . . . . . . . . . . . . . . . . . . . . . . . . 471 CEC block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472 CEC control logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474 Example: a complete message reception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475 Example: RX_ERROR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476 Example: a complete message transmission . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477 Example: a TX_ERROR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478 Quanta counter timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479 Bit shaping logic timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480 Message description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481 Bit timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481 Signal-free time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482 Arbitration phase. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482 Bit error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482 LCD controller block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
484 A single overlay window over a background graphics window . . . . . . . . . . . . . . . . . . . . . 498 GPU top level block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499 GPU functional block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501 The GPU software architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505 Typical 3D graphics flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506 Geometry processor data structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507 Pixel processor data structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508 Doc ID 018553 Rev 3 25/590 List of figures Figure 198. Figure 199. Figure 200. Figure 201. Figure 202. Figure 203. Figure 204. Figure 205. Figure 206. Figure 207. Figure 208. Figure 209. Figure 210. Figure 211. Figure 212. Figure 213. Figure 214. Figure 215. Figure 216. Figure 217. Figure 218. Figure 219. Figure 220. Figure 221. Figure 222. Figure 223. Figure 224. Figure 225. Figure 226. Figure 227. Figure 228. Figure 229. Figure 230. Figure 231. Figure 232. Figure 233. Figure 234. Figure 235. Figure 236. Figure 237. 26/590 RM0078 2D graphics process flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509 GPU image filter process flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510 Typical graphics pipeline flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511 Decoder functional block diagrams. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512 Video decoder detailed block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. 518 H.264 decoder initialization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519 H.264 / SVC decoder basic process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520 MPEG-4 / H.263 / Sorenson Spark decoder initialization . . . . . . . . . . . . . . . . . . . . . . . . . 521 MPEG-4 / H.263 / Sorenson Spark decoder basic procces . . . . . . . . . . . . . . . . . . . . . . . 522 MPEG-2 / MPEG-1 decoder initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523 MPEG-2 / MPEG-1 decoder basic procces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524 JPEG decoder basic process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525 VC-1 decoder initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526 VC-1 decoder basic procces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527 RV decoder initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 528 RV decoder basic procces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529 VP6 decoder initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 530 VP6 decoder basic procces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531 VP7/VP8 decoder initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 532 VP7/VP8 decoder basic procces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533 AVS decoder initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534 AVS decoder basic procces . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535 Data flow and functional block diagram - standalone mode . . . . . . . . . . . . . . . . . . . . . . . 539 Data flow and functional block diagram - combined mode . . . . . . . . . . . . . . . . . . . . . . . . 540 Post processor flowchart - standalone mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541 Post processor flowchart - combined mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543 External memory use . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548 YCbCr 4:2:0 planar video frame storage format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 550 YCbCr 4:2:0 semi-planar video frame storage format . . . . . . . . . . . . . . . . . . . . . . . . . . . 551 YCbCr 4:2:2 interleaved video frame storage format . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552 AYCbCr 4:4:4 interleaved video frame storage format . . . . . . . . . . . . . . . . . . . . . . . . . . . 553 RGB 16bpp video frame storage format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553 RGB 32bpp video frame storage format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554 Encoder functional block diagram. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555 Stabilization picture dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557 CAM block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559 External VSYNC and HSYNC synchronization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 562 ITU656 embedded synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563 Video input block diagram . . . . . . . . . . . . . . . . . . . . . 
Figure 237. I2S block diagram

1 Device overview

The SPEAr1340 device is a system-on-chip belonging to the SPEAr® (Structured Processor Enhanced Architecture) family of embedded microprocessors. The product is suitable for consumer and professional applications that require an advanced human-machine interface (HMI) combined with high performance, such as low-cost tablets, thin clients, media phones and industrial/printer smart panels.

The hardware supports both real-time (RTOS) and high-level (HLOS) operating systems, such as Android, Linux and Windows Embedded Compact 7.

The architecture of SPEAr1340 is based on several internal components that communicate through a multilayer interconnection matrix (BUSMATRIX). This switching structure enables different data flows to be carried out concurrently, improving the overall platform efficiency. In particular, high-performance master agents are directly interconnected with the DDR memory controller in order to reduce access latency. The overall memory bandwidth assigned to each master port can be programmed and optimized through an internal weighted round-robin (WRR) arbitration scheme.

Figure 1 is the internal connectivity block diagram. Table 1 lists device features and capabilities. Table 2 lists device IP groups and their constituent IPs.

1.1 Simplified block diagram
Figure 1. SPEAr1340 simplified block diagram

[Block diagram: the dual-core Cortex-A9 MPCore (CPU0 and CPU1, each with FPU, PTM, 32 KB ICache and 32 KB DCache; SCU with snoop filtering and cache transfers; ACP; interrupt controller; global timer; per-CPU timer and watchdog; AXI bus masters 0 and 1; 512 KB L2 cache) and the memory, high-speed connectivity, low-speed connectivity, graphics/video/audio, system-level, and debug (JTAG, Trace, Coresight) blocks, all connected through the BUSMATRIX interconnect.]

1.2 Summary of features

Table 1. Summary of SPEAr1340 features

Category: Cortex A9 subsystem
  CPU cores: ARM Cortex A9 with FPU, dual-core, up to 600 MHz; 32 KB L1 ICache per core; 32 KB L1 DCache per core
  L2 cache: 512 KB, shared
  Debug & trace: Coresight subsystem, 2 x PTM debug I/F
  Other features:
    – Shared interrupt controller (GIC)
    – 1 x 64-bit global timer
    – 2 x 32-bit timers (one per core)
    – 2 x watchdog/timers (one per core)
    – Snoop control unit
    – ACP

Category: Interconnect
  Multilayer bus matrix: up to 166 MHz

Category: System-level
  Reset and clock generation: -
  System configuration registers (MISC): -
  One-time programmable antifuse: 510 + 209 bits
  Temperature sensor: -
  DMA controllers: 2 x DMAC modules, total 16 channels
  General purpose timers: 2 x GPT modules, total 8 timers (4 with capture mode)
  Real-time clock: -
  Power control module: -
  Security co-processor: HW acceleration for DES, 3DES, AES, universal hashing, SHA1/2, MD5, HMAC, PKA, True_RNG
Category: Internal / external memories
  BootROM: 32 KB; stores resident bootstrap firmware
  System SRAM: 32 KB
  Always-on SRAM: 4 KB
  DDR controller:
    – DDR2-1066/DDR3-1066, up to 533 MHz
    – 16-/32-bit
    – up to 2 GB address space
  Static memory controller: 16-bit interface. Supports:
    – NAND Flash
    – parallel NOR Flash
    – static RAM
  Serial Flash controller: supports serial NOR Flash, up to 2 banks, 16 MB each
  Memory card interface: supported standards:
    – SD/SDIO 2.0
    – SDHC
    – MMC 4.x
    – CF/CF+ 4.1
    – xD

Category: Graphics, video & audio
  2D/3D graphics processing unit: ARM MALI 200
  Video decoder: supported standards:
    – H.264 1080p
    – MPEG-1/2/4 1080p
    – H.263 SD
    – Sorenson Spark 1080p
    – WMV9/VC-1 1080p
    – RealVideo
    – DivX
    – VP6, VP7, VP8
    – AVS
    – JPEG 67 Mpixels
  Video encoder:
    – H.264 1080p
    – JPEG 64 Mpixels
  Display controller: up to 24 bpp, 1920x1080 @ 60 fps; embedded PWM
  Camera input interfaces: -
  Video input parallel port: -
  I2S digital audio interfaces: 2 modules for total 8 x input + 8 x output channels
  SPDIF digital audio interface: -
Category: High-speed connectivity
  USB 2.0 host controllers: 2 x USB 2.0 host ports
  USB OTG controller: -
  Ethernet controller: 1 x Giga/Fast Ethernet port (external GMII/RGMII/MII/RMII PHY)
  PCI Express controller: 1 port, alternative to SATA
  SATA gen-2 controller: 1 port, alternative to PCIe
  PCIe/SATA physical interface: -

Category: Low-speed connectivity
  General purpose IOs: 2 modules, total 16 IOs
  Extended general purpose IOs: -
  I2C bus controllers: 2 ports, master/slave
  Synchronous serial port: master/slave, 4 chip select signals
  Asynchronous serial ports: 2 x UART ports, also IrDA capable
  Keyboard controller: 6x6 matrix
  HDMI/CEC: 2 x interfaces
  Analog-to-digital converter: 10-bit, 1 Msps, 8 channels; also suitable for resistive touchscreen interface
  Pulse width modulators: 4 x PWM outputs

1.3 IP groups

Table 2. SPEAr1340 IP groups

IP group: Overview, processors, & busses
  CPU subsystem (A9SM)
  Multilayer interconnect matrix (BUSMATRIX)

IP group: General device resources
  BootROM
  Direct memory access controllers (DMAC)
  General purpose timers (GPT)
  One-time programmable antifuse (OTP)
  Power control module (PCM)
  Reset and clock generator (RCG)
  Real-time clock (RTC)
  Cryptographic co-processor (C3)
  Static RAMs (SRAM)
  System configuration registers (MISC)
  Temperature sensor (THSENS)

IP group: Memory interfaces
  Multiport DDR2/3 controller (MPMC)
  Memory card interface (MCIF)
  Serial NOR Flash controller (SMI)
  Static memory controller (FSMC)

IP group: Graphics, video, & audio
  Camera input interfaces (CAM)
  Display controller (CLCD)
  Graphics processing unit (GPU)
  I2S digital audio interfaces (I2S)
  S/PDIF digital audio ports
  Video decoder (VDEC)
  Video encoder (VENC)
  Video input parallel port (VIP)

IP group: High-speed connectivity
  Giga/Fast Ethernet controller (GMAC)
  PCI express controller (PCIe)
  Serial ATA controllers (SATA)
  SATA/PCIe physical interface (MiPHY)
  USB 2.0 host controllers (UHC)
  USB OTG controller (UOC)

IP group: Other connectivity
  A/D converter (ADC)
  Asynchronous serial port (UART0)
  Extended general purpose I/O (XGPIO)
  General purpose I/O (GPIOA-B)
  HDMI CEC interfaces (CEC)
  I2C bus controllers (I2C)
  Keyboard controller (KBD)
  PWM generators (PWM)
  Synchronous serial port (SSP)

2 CPU subsystem (A9SM)

This chapter focuses on the A9SM functionality and operation.

For the A9SM feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU

For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

2.1 Overview

The CPU subsystem is based on the ARM Cortex A9 processor, and has a dual core configuration. Figure 2 shows the main blocks of the CPU subsystem:
● a dual Cortex A9 core (CORTEXA9INTEGRATION)
● a CoreSight subsystem (A9CS)
● a clock manager (CMR)
● an L2 cache controller (PL310)

and the bus interfaces:
● two AXI masters for PL310 BUSMATRIX connections
● an AXI slave for the accelerator coherency port (ACP)
● two APB slave interfaces; one for access to the clock manager, and one for access to the internal CoreSight components

Figure 2. CortexA9 subsystem top level block diagram

[Block diagram: CORTEXA9INTEGRATION, CoreSight subsystem (A9CS), L2 cache controller (PL310), and clock manager (CMR), with these interfaces: AXI-ACP (Q0), the AXI slave for the accelerator coherency port; APB-SYS (B0), the APB slave interface for access to the internal CoreSight components; APB-CMR (B1), the APB slave interface for access to the clock manager; AXI-M0 (10) and AXI-M1 (20), the AXI masters for PL310 BUSMATRIX connections.]

2.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

2.3 Clocks

Note: This section gives a general presentation of the A9SM clocks.
For more details, refer to Chapter 5: Reset and clock generator (RCG).

Although all A9SM clocks are theoretically synchronous, for implementation reasons they are classified into different scopes.
● CLK_CORE and PERIPHCLK are considered fully asynchronous to the other A9SM clocks. This provides better tolerance to on-chip variations (OCV). The clock tree from CLK_CORE to CORTEXA9INTEGRATION must be as short as possible.
● ATCLK, PCLKDBG, TRACECLKIN, DAPCLK, CTMCLK, and CTICLK are the clocks inside A9CS. These clocks must be considered synchronous and equal to the main A9CS clock (ATCLK).
● TRACECLK (output) is one-half of TRACECLKIN.

Clock gating is present to reduce power consumption.

Table 3. CortexA9 subsystem clocks

Clock         Frequency (MHz)  Block                 Type    A9SM
CLK1GHz       1200             CMR                   Input   External
CLK_CORE      600              CORTEXA9INTEGRATION   Input   Internal
PERIPHCLK     250              CORTEXA9INTEGRATION   Input   Internal
ATCLK         250              A9CS                  Input   Internal
CTICLK        250              A9CS                  Input   Internal
CTMCLK        250              A9CS                  Input   Internal
TRACECLKIN    250              A9CS                  Input   Internal
PCLKDBG       250              A9CS                  Input   Internal
PCLKSYS       250              A9CS                  Input   Internal
TRACECLK      125              A9CS                  Output  External
PCLKDBG_SOC   100              A9CS                  Input   External
ATCLK_SOC     166              A9CS                  Input   External
ACLKS_SOC     166              CORTEXA9INTEGRATION   Input   External

2.4 Interrupts

Refer to Section 2.5.7: Generic interrupt controller (GIC) and Appendix A: Interrupts.

2.5 Functional description

This section describes the main blocks and functionalities of the CPU subsystem.
2.5.1 CORTEXA9INTEGRATION

The CORTEXA9INTEGRATION comprises:
● two Cortex-A9 processors in a cluster and a snoop control unit (SCU) that ensures coherency within the cluster
● a global timer (GTIM)
● a private timer and watchdog unit per processor (WDTIM)
● a generic interrupt controller (GIC) with 128 dedicated external lines
● a second master port with programmable address filtering capability (disabled at reset)
● an accelerator coherency port (ACP) suitable for coherent memory transfers

Figure 3. CORTEXA9INTEGRATION internal block diagram

[Block diagram: two Cortex-A9 MPCore CPUs (CPU0, CPU1), each with a timer and watchdog and four tag RAMs, connected over the instruction, data, and coherency buses (slave 0, slave 1) to the snoop control unit (SCU: tag control, cache-to-cache transfers, snoop filtering); a general timer; an interrupt controller; and master 0, master 1, and the accelerator coherency port (ACP), each on an AXI RW 64-bit bus.]

CORTEXA9INTEGRATION and PL310

CORTEXA9INTEGRATION and PL310 support full parity error detection. Table 4 lists the CORTEXA9INTEGRATION and PL310 configuration parameters. An RTL parameter is a register transfer level (RTL) static parameter (not changeable at runtime). A PIN parameter is a signal available at the CortexA9 subsystem boundary that is used to impose a predefined behavior (primarily to determine reset values).
Table 4. CORTEXA9INTEGRATION and PL310 configuration parameters

Type  Name                      Value   Description

CORTEXA9INTEGRATION
RTL   CORE_NUM                  2       2 CPUs in system
RTL   MP_MODE                   YES     Multiprocessor system (enable SCU)
RTL   ACP_PRESENT               YES     ACP port present
RTL   MASTER_NUM                2       Number of AXI masters
RTL   INT_NUM                   128     Number of external interrupt lines
RTL   POWER_DOMAIN_WRAPPER      NO      Internal power-down feature not enabled
RTL   PTM_INTERFACE_PRESENT     YES     PTM interface present for each CPU (for CS)
RTL   PARITY                    YES     Enabled parity fail signal generation on all internal RAMs
RTL   PRELOAD_ENGINE_PRESENT    YES     Enable the presence of a preload engine
RTL   PRELOAD_ENGINE_FIFO_SIZE  8       Number of entries on the preload engine
PIN   CFGSDISABLE               zero    No restrictions in writing GIC registers in secure mode at reset

[CPU0/1]
RTL   DCACHESIZE                32 K    Data cache size in bytes
RTL   ICACHESIZE                32 K    Instruction cache size in bytes
RTL   TLBSIZE                   128     Number of TLB entries
RTL   JAZELLE_PRESENT           YES     Java processor present
RTL   FPU_PRESENT               YES     Vectorized Floating Point Unit present
RTL   NEON_PRESENT              NO      NEON instruction set NOT available
PIN   CFGEND[1:0]               0x0     Little endian at reset
PIN   CFGNMFI[1:0]              0x0     NMFI bit in the CP15 c1 control register set to 0 at reset
PIN   CP15SDISABLE[1:0]         0x0     No restrictions in CP15 access at reset
PIN   VINITHI[1:0]              0x3     High vector table at reset
PIN   TEINIT                    0x0     Default exception handling state at reset: ARM

PL310
RTL   pl310_PARITY              YES     Enabled parity fail signal generation on all internal RAMs
RTL   pl310_S1                  YES     2 AXI slaves
RTL   pl310_M1                  YES     2 AXI masters
RTL   pl310_AXI_ID_MAX          4       AXI ID width on the PL310 slave ports: pl310_AXI_ID_MAX+1; AXI ID width on the PL310 master ports: pl310_AXI_ID_MAX+3
RTL   pl310_LOCKDOWN_BY_MASTER  YES     Enable lockdown by master support
RTL   pl310_LOCKDOWN_BY_LINE    YES     Enable lockdown by line support
RTL   pl310_ADDRESS_FILTERING   YES     Address filtering on 2nd AXI master enabled
RTL   pl310_TAG_SETUP_LAT       0       Setup time for tag RAM = 0 core clock cycles
RTL   pl310_TAG_READ_LAT        1       Tag RAM read latency = 1 core clock cycle
RTL   pl310_TAG_WRITE_LAT       1       Tag RAM write latency = 1 core clock cycle
RTL   pl310_DATA_SETUP_LAT      0       Setup time for data RAM = 0 core clock cycles
RTL   pl310_DATA_READ_LAT       2       Data RAM read latency = 2 core clock cycles
RTL   pl310_DATA_WRITE_LAT      2       Data RAM write latency = 2 core clock cycles
RTL   pl310_NB_WAYS             8       Number of ways = 8
RTL   pl310_SPECULATIVE_READ    YES     Enable the capability of emitting speculative reads
RTL   pl310_DATA_BANKING        NO      Allow a data reorganization in banks
PIN   CFGBIGEND                 zero    Little endian at reset
PIN   WAYSIZE                   3'b011  Way size of 64 KB

2.5.2 A9 CoreSight subsystem (A9CS)

The A9CS is dedicated to debugging and tracing. It is a modular and fully customizable subsystem. In normal functional mode, A9CS is powered off. These are the main A9CS components:

AMBA advanced trace bus (ATB)

The ATB transfers trace data through the CoreSight infrastructure in a SoC. Trace sources are ATB masters, and sinks are ATB slaves. Link components provide both master and slave interfaces.

Trace port interface unit (TPIU)

The TPIU is an ATB slave that drains trace data off the chip. It acts as a bridge between the on-chip trace data and a data stream that is captured by a Trace Port Analyzer (TPA). The Formatter within the TPIU combines the source data and IDs into a single data stream, to enable serialization of data, inserting trigger packets on trigger detection.

Embedded trace buffer (ETB)

The ETB is an ATB slave and provides on-chip storage of trace data using a configurable sized RAM. The ETB accepts trace data from CoreSight trace source components through an AMBA trace bus (ATB). The Formatter in the ETB combines the source data and IDs into a single data stream. It operates in an identical manner to the Formatter in the TPIU. In this implementation, the ETB size is 8 Kbytes.
Program trace macrocell (PTM)

The PTM for the Cortex-A9 processor is a module that performs real-time instruction flow tracing based on the Program Flow Trace (PFT) architecture. The PTM-A9 generates information that trace tools use to reconstruct the execution of all or part of a program.

Cross trigger interface (CTI)

The CTI combines and maps the trigger requests, and broadcasts them to all other interfaces on the ECT as channel events. When the CTI receives a channel event, it maps this onto a trigger output. This enables subsystems to cross trigger with each other. The receiving and transmitting of triggers is performed through the trigger interface.

Cross trigger matrix (CTM)

This block controls the distribution of channel events. It provides Channel Interfaces (CIs) for connection to either CTIs or CTMs. This enables multiple CTIs to be linked together.

Debug Access Port (DAP)

The DAP comprises a number of components supplied in a single configuration. All the supplied components fit into the various architectural components for Debug Ports (DPs), which are used to access the DAP from an external debugger, and Access Ports (APs), which are used to access on-chip system resources. The debug port and access ports together are referred to as the DAP. The DAP gives the debugger software real-time access to the JTAG scan chains in the chip and to all debug and trace configuration registers. For multicore systems, debug access is maintained even if one core is powered down or asleep.

Debug Access Port ROM table (DAP ROM)

The DAP provides an internal ROM table connected to the master Debug APB port of the APB Mux. The ROM table stores the locations of the components on the Debug APB. The ROM table is a read-only device; writes are ignored.

Figure 4 is the block diagram for both A9CS and CORTEXA9INTEGRATION.

An active power up request of the debug domain must be applied to the APB-CMR.
This request can be made either by the processor or by the debug access port (DAP) through the JTAG interface.

All CoreSight peripherals are mapped within a space of 128 Kbytes. CoreSight components are mapped within the memory space of the system, and each one has a 4-Kbyte address space reserved. Table 5 provides the complete list.

The two ROM tables, DAPROM and CortexA9ROM (see Table 5: A9CS memory map, and DAPROM register details in RM0089, Reference manual, SPEAr1340 address map and registers), contain all the entries required to perform a topology detection of the system by reading their contents. Starting from the DAPROM, it is possible to follow the link to the CortexA9ROM.

Figure 4. A9CS block diagram with CORTEXA9INTEGRATION

[Block diagram: within A9CS, the TPIU (driving the trace port), ETB, CTI, CTM, FUNNEL TPIU, FUNNEL ETB, DAP (JTAG), DAP ROM, and the ATB links between them; within CORTEXA9INTEGRATION, CPU0 and CPU1 with PTM0/PTM1 and CTI0/CTI1, and the CORTEXA9 ROM; APB-SYS interface.]

In both ROM tables, each entry has the following fields:
● Bit 0: component present (1) or not (0)
● Bit 1: component with 32-bit (1) or 8-bit (0) data
● Bits 11:2: always 0
● Bits 31:12: base address of the component

The system designer must define the external ROM table; for this purpose, dedicated inputs are provided at the A9SM boundary: EXTROMTABLEOFFSET[31:12] and EXTROMTABLEOFFSETV.
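The ROM table entry layout listed above can be decoded with a few bit masks. The following Python sketch is illustrative only: the function, type, and constant names are hypothetical, and the example word is made up (its address field happens to equal the TPIU base address in the A9CS memory map). Whether bits 31:12 are to be read as an absolute base address or as an offset from the ROM table is an assumption to check against RM0089; the sketch simply returns the field in place.

```python
from typing import NamedTuple

PRESENT_BIT = 1 << 0          # bit 0: component present (1) or not (0)
WIDTH_32_BIT = 1 << 1         # bit 1: 32-bit (1) or 8-bit (0) data
RESERVED_MASK = 0x00000FFC    # bits 11:2: always 0
ADDRESS_MASK = 0xFFFFF000     # bits 31:12: base address of the component


class RomEntry(NamedTuple):
    present: bool
    is_32bit: bool
    base_address: int


def decode_rom_entry(word: int) -> RomEntry:
    """Split one 32-bit ROM table entry into the fields listed above."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("ROM table entries are 32 bits wide")
    if word & RESERVED_MASK:
        raise ValueError("bits 11:2 are expected to read as 0")
    return RomEntry(
        present=bool(word & PRESENT_BIT),
        is_32bit=bool(word & WIDTH_32_BIT),
        base_address=word & ADDRESS_MASK,
    )


# Hypothetical entry word, not read from real hardware.
example = decode_rom_entry(0xE0781003)
print(example.present, example.is_32bit, hex(example.base_address))
```

A topology scan along these lines would read consecutive entries from the DAPROM, skip those with bit 0 cleared, and visit the address given by each remaining entry.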
A9CS memory map

CoreSight component   Base address   Offset from DAPROM
DAPROM                0xE0780000     0x00000
TPIU                  0xE0781000     0x01000
CTI                   0xE0782000     0x02000
ETB                   0xE0783000     0x03000
FUNNEL TPIU           0xE0784000     0x04000
FUNNEL ETB            0xE0785000     0x05000
RESERVED              0xE0786000     0x06000
CORTEXA9 ROM          0xE07A0000     0x20000
RESERVED              0xE07A1000     0x21000
CORE0 CP14            0xE07B0000     0x30000
CORE0 PMU             0xE07B1000     0x31000
CORE1 CP14            0xE07B2000     0x32000
CORE1 PMU             0xE07B3000     0x33000
RESERVED              0xE07B4000     0x34000
CORE0 CTI             0xE07B8000     0x38000
CORE1 CTI             0xE07B9000     0x39000
RESERVED              0xE07BA000     0x3A000
CORE0 PTM             0xE07BC000     0x3C000
CORE1 PTM             0xE07BD000     0x3D000
RESERVED              0xE07BE000     0x3E000

2.5.3 Clock manager (CMR)
The clock manager is the block that takes the clock coming from the system (the PLL is outside of A9SM) and divides it into clock signals for each internal block.

To enable the A9CS clocks, you can either:
● program the CMR register through the APB-CMR interface
–or–
● use the signals provided internally and managed by the CMR itself to enable the A9CS clocks through the JTAG interface

To disable the A9CS clocks, use only the first of the above methods.

You can use the clock manager to control the clock gating of the debug part, for instance the CoreSight subsystem (A9CS). This is the role of the APB-CMR interface, a standard APB3 interface whose main task is to provide a bus interface for A9CS clock enable/disable (for more detail, see Clock manager registers in RM0089, Reference manual, SPEAr1340 address map and registers).

2.5.4 Snoop control unit (SCU)
The SCU connects the two Cortex-A9 processors to the memory system through the AXI interfaces. The SCU functions are to:
● maintain data cache coherency between the Cortex-A9 processors
● initiate L2 AXI memory accesses
● arbitrate between Cortex-A9 processors requesting L2 accesses
● manage ACP accesses

Note: The A9 SCU does not support hardware management of coherency of the instruction cache.
Address filtering
In the two-master port configuration, the SCU can be given an address range that redirects all memory transactions within the range to the second master port. The SCU routes all other memory transactions to the first master port.

When filtering is off, exclusive accesses go to port M0; when filtering is on, exclusive accesses go to either port M0 or port M1, depending on the address. If the exclusive access is in the filtering range, it goes to M1; if not, it goes to M0.

The SCU register bank provides the filtering mode enable bits and the address range selection registers (see Filtering Start Address Register, Filtering End Address Register, and SCU Control Register in RM0089, Reference manual, SPEAr1340 address map and registers).

SCU event monitoring
The individual CPU event monitors can be configured to gather statistics on the operation of the SCU. Refer to the Cortex-A9 technical reference manual for more detail on monitoring events.

2.5.5 Global timer (GTIM)
The global timer is:
● a 64-bit incrementing counter with an auto-incrementing feature
● memory mapped in the same address space as the private timers
● accessible at reset in secure state only (using the SCU Access Control Register)
● accessible to all Cortex-A9 processors.

Each Cortex-A9 processor has a 64-bit comparator that is used to assert a private interrupt when the global timer has reached the comparator value. All the Cortex-A9 processors in a design use a common ID, ID[27], for this interrupt. This ID is sent to the interrupt controller as a private peripheral interrupt (see Interrupt distributor section).

Global timer interrupt
The global timer interrupt (ID[27]) is set as pending in the interrupt distributor when the counter register has the same value as the comparator register, after the event flag is set in the global timer interrupt status register.

See also Section 2.6.1: Programming the global timer registers.
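Because the 64-bit counter is read as two 32-bit words, software has to guard against the upper word changing between the two reads (the sequence is given in Section 2.6.1). The following is a minimal sketch of such a read, not the manual's reference code. The structure `gtim_counter_regs` and the function `gtim_read_counter` are hypothetical names that model only the two counter words; the real register addresses are given in RM0089. The sketch repeats the read until the upper word is stable, which is a slightly more defensive variant of the single re-read described in Section 2.6.1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the global timer counter:
 * only the lower and upper 32-bit counter words are modelled. */
typedef struct {
    volatile uint32_t counter_lo;  /* lower 32 bits of the counter */
    volatile uint32_t counter_hi;  /* upper 32 bits of the counter */
} gtim_counter_regs;

/* Read the 64-bit counter using 32-bit accesses only.
 * The upper word is read before and after the lower word; if it
 * changed, the lower word rolled over and the read is repeated. */
uint64_t gtim_read_counter(const gtim_counter_regs *regs)
{
    assert(regs != NULL);

    uint32_t hi = regs->counter_hi;
    for (;;) {
        uint32_t lo = regs->counter_lo;
        uint32_t hi_again = regs->counter_hi;
        if (hi_again == hi) {
            return ((uint64_t)hi << 32) | lo;
        }
        hi = hi_again;  /* upper word changed: retry with the new value */
    }
}
```

On the target, the pointer would refer to the memory-mapped timer registers; in a host-side test it can point at an ordinary structure.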
2.5.6 Timer and watchdog blocks (WDTIM)
The watchdog can be configured as a timer. Both the timer and watchdog blocks have the following features:
● a 32-bit counter that generates an interrupt when it reaches zero
● an 8-bit prescaler for better control of the interrupt period
● configurable single-shot or auto-reload modes
● configurable starting values for the counter
● same clock as the interrupt controller clock

Calculating timer intervals
Use the following equation to calculate the timer intervals; this equation can be used to calculate the period between two events generated by a timer or watchdog.

\[
\text{Interval} = \frac{(\text{Prescaler\_value} + 1) \times (\text{Load\_value} + 1)}{\text{PERIPHCLK}}
\]

Timer and watchdog interrupts
The timer interrupt ID[29] is set as pending in the interrupt distributor when the timer counter register reaches zero, after the event flag is set in the timer interrupt status register.

The watchdog interrupt ID[30] is set as pending in the interrupt distributor when the watchdog counter register reaches zero, after the event flag is set in the watchdog interrupt status register.

2.5.7 Generic interrupt controller (GIC)
The generic interrupt controller is a single functional unit located in a Cortex-A9 multiprocessor design. It is memory-mapped. The Cortex-A9 processors access it by using a private interface through the SCU.
The GIC collates interrupts from a large number of sources and provides:
● masking of interrupts
● prioritization of interrupts
● distribution of interrupts to the target Cortex-A9 processors
● tracking of the status of interrupts
● generation of interrupts by software
● support for security extensions

Interrupt sources can be of the following types:
● Software generated interrupts (SGI): interrupts generated by writing to the Software generated interrupt register (ICDSGIR). A maximum of 16 SGIs can be generated for each Cortex-A9 processor interface.
● Private peripheral interrupts (PPI): interrupts generated by a peripheral that is specific to a single Cortex-A9 processor. There are 5 PPIs for each Cortex-A9 processor interface.
● Shared peripheral interrupts (SPI): interrupts generated by a peripheral that the generic interrupt controller can route to any, or all, Cortex-A9 processor interfaces. The generic interrupt controller supports 128 SPIs.
● Lockable shared peripheral interrupts (LSPI): there are 31 LSPIs. You can configure and then lock these interrupts against further change using CFGSDISABLE. The LSPIs are present only if the SPIs are present.

The generic interrupt controller consists of an interrupt distributor and Cortex-A9 processor interfaces.

Interrupt distributor
The interrupt distributor consists of a register-based list of interrupts, their priorities and activation requirements, Cortex-A9 processor targets, and their pending and active status. The interrupt distributor centralizes all interrupt sources, determines the priority of each interrupt, and distributes the interrupt with the highest priority to the Cortex-A9 processor interfaces that connect to the processors in the system. The processor interface acknowledges interrupts and changes interrupt priority masks. Hardware ensures that an interrupt targeted at several processors can be taken by only one processor at a time.
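Software generated interrupts are raised by writing a request word to ICDSGIR in the distributor. The sketch below composes such a word. It assumes the field layout defined by the ARM Generic Interrupt Controller architecture (target list filter in bits [25:24], CPU target list in bits [23:16], SGI ID in bits [3:0]); this layout is not stated in this manual and should be checked against RM0089. The names `sgi_target_filter` and `sgi_compose` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Target list filter encodings (assumed from the ARM GIC architecture). */
enum sgi_target_filter {
    SGI_TARGET_LIST   = 0,  /* send to the CPUs named in cpu_target_list */
    SGI_TARGET_OTHERS = 1,  /* send to all CPUs except the requester */
    SGI_TARGET_SELF   = 2   /* send only to the requesting CPU */
};

/* Compose the value to write to ICDSGIR.
 * cpu_target_list: one bit per CPU interface (bit 0 = CPU0, bit 1 = CPU1).
 * sgi_id: software generated interrupt ID, 0 to 15. */
uint32_t sgi_compose(enum sgi_target_filter filter,
                     uint8_t cpu_target_list,
                     uint8_t sgi_id)
{
    assert(sgi_id < 16u);

    return ((uint32_t)filter << 24) |
           ((uint32_t)cpu_target_list << 16) |
           ((uint32_t)sgi_id & 0xFu);
}
```

For example, `sgi_compose(SGI_TARGET_LIST, 0x2, 5)` builds a request for SGI 5 targeted at CPU1; the caller would then store the result to the memory-mapped ICDSGIR.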
When the interrupt distributor detects an interrupt assertion, it sets the status of the interrupt for the targeted Cortex-A9 processors to pending. Level-triggered interrupts cannot be marked as pending if they are active for at least one Cortex-A9 processor. When an interrupt is triggered by the software interrupt register or the set-pending register, the status of that interrupt for the targeted Cortex-A9 processor or processors is set to pending. This interrupt then has the same behavior as a hardware interrupt. The distributor does not differentiate between software and hardware triggered interrupts. When multiple pending interrupts have the same priority, the selected interrupt is the one with the lowest ID. If there are multiple pending software-generated interrupts with the same ID, the lowest Cortex-A9 processor source is selected. For each processor the prioritization and selection block searches for the pending interrupt with the highest priority. This interrupt is then sent with its priority to the processor interface. The prioritization logic is physically duplicated to enable the simultaneous selection of the highest priority interrupt for each processor. The processor interface returns information to the distributor when the processor acknowledges (pending to active transition) or clears an interrupt (active to inactive transition). With the given interrupt ID, the interrupt distributor updates the status of this interrupt according to the information sent by the processor interface. Interrupt distributor interrupt sources. All interrupt sources are identified by a unique ID. They have their own configurable priority and a list of targeted Cortex-A9 processors, which is a list of processors that the interrupt is sent to when triggered by the interrupt distributor. 
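The selection rule described above (highest priority first, lowest ID among interrupts of equal priority) can be illustrated with a small software model. This is only a sketch of the rule, not of the hardware implementation; the type `irq_state` and the function `select_pending` are hypothetical. As elsewhere in the GIC, a numerically lower priority value means a higher priority.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of one interrupt line. */
typedef struct {
    uint16_t id;        /* interrupt ID */
    uint8_t  priority;  /* lower value = higher priority */
    uint8_t  pending;   /* non-zero when the interrupt is pending */
} irq_state;

/* Return the index of the pending interrupt that would be selected,
 * or -1 if nothing is pending. Ties on priority go to the lowest ID. */
int select_pending(const irq_state *irqs, size_t count)
{
    assert(irqs != NULL || count == 0);

    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (!irqs[i].pending) {
            continue;
        }
        if (best < 0 ||
            irqs[i].priority < irqs[best].priority ||
            (irqs[i].priority == irqs[best].priority &&
             irqs[i].id < irqs[best].id)) {
            best = (int)i;
        }
    }
    return best;
}
```

In hardware this search is performed per processor by duplicated prioritization logic; the model above covers a single processor's view.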
Interrupt sources can be of the following types:
● Software generated interrupts (SGI)
Each Cortex-A9 processor has private interrupts, ID[0:15], that can be triggered only by software. These interrupts are aliased so that there is no requirement for a requesting Cortex-A9 processor to determine its own CPU ID when it deals with SGIs. The priority of an SGI depends on the value set by the receiving Cortex-A9 processor in the banked SGI priority registers, not the priority set by the sending Cortex-A9 processor.
● A legacy nFIQ pin, PPI(0)
In legacy FIQ mode, the legacy nFIQ pin, on a per Cortex-A9 processor basis, bypasses the interrupt distributor logic and directly drives interrupt requests into the Cortex-A9 processor. When a Cortex-A9 processor uses the generic interrupt controller (rather than the legacy pin in the legacy mode) by enabling its own Cortex-A9 processor interface, the legacy nFIQ pin is treated like other interrupt lines and uses ID[28].
● Private timer, PPI(1)
Each Cortex-A9 processor has its own private timers that can generate interrupts, using ID[29].
● Watchdog timers, PPI(2)
Each Cortex-A9 processor has its own watchdog timers that can generate interrupts, using ID[30].
● A legacy nIRQ pin, PPI(3)
In legacy IRQ mode, the legacy nIRQ pin, on a per Cortex-A9 processor basis, bypasses the interrupt distributor logic and directly drives interrupt requests into the Cortex-A9 processor. When a Cortex-A9 processor uses the generic interrupt controller (rather than the legacy pin in the legacy mode) by enabling its own Cortex-A9 processor interface, the legacy nIRQ pin is treated like other interrupt lines and uses ID[31].
● Global timer, PPI(4)
The global timer uses ID[27].
● Shared peripheral interrupts (SPI)
SPIs are triggered by events generated on associated interrupt input lines. The interrupt controller can support up to 224 interrupt input lines.
The interrupt input lines can be configured as either edge sensitive (posedge) or level sensitive (high level). SPIs start at ID[32].

Cortex-A9 processor interfaces
The Cortex-A9 processor interfaces are slaves to the Cortex-A9 processors. They perform priority masking and preemption handling for a connected processor. There is one Cortex-A9 processor interface for each processor.

A pending interrupt is accepted only if its priority is higher than both the priority mask and the priority of the highest priority interrupt active on that Cortex-A9 processor.

If a pending interrupt is accepted, an interrupt request is made to the processor for interrupt exception entry. If the processor then reads its interrupt acknowledge register, the processor interface records the priority of this interrupt and marks it as active in the interrupt distributor for that processor.

If an interrupt is sent to several processors, only the first one gets this interrupt ID; the other processors read the spurious ID or another pending interrupt ID.

If the interrupt is cleared before the Cortex-A9 processor reads its interrupt acknowledge register, for example because of a priority mask change or a write to the interrupt pending clear register, the Cortex-A9 processor gets the interrupt ID value 1023, indicating a spurious interrupt.

The interrupt active to inactive transition is triggered by a Cortex-A9 processor writing the completed interrupt ID in its end of interrupt register.

Security extensions support
The generic interrupt controller enables all implemented interrupts to be individually defined as secure or non-secure. You can program secure interrupts to use either the IRQ or FIQ interrupt mechanism of a Cortex-A9 processor through the FIQEn bit in the ICPICR register. Non-secure interrupts are always signalled using the IRQ mechanism of a Cortex-A9 processor.
Note: A non-secure access to a register of a secure interrupt behaves as RAZ/WI.

Priority formats
The software view of priority fields depends on the status of the access request to the priority field (NS-prot), and the security status (NS-int) of the interrupt that the priority field refers to.

The priority space is partitioned to ensure that secure interrupts can always be given a priority higher than any non-secure interrupt. The non-secure domain observes a smaller available range of priority levels than the range available to the secure domain, as Figure 5 shows.

In Figure 5, priority format A shows the format this implementation uses for secure accesses. Priority format B shows the format this implementation uses for non-secure accesses. Bit D is the most significant bit (MSB) of the non-secure interrupt priority view. The least significant bit (LSB) is always zero. Priority format C shows the non-secure interrupt priority internal format as viewed by secure accesses. The MSB is usually one and it is automatically set for non-secure writes.

Note: Priority zero is the highest priority. The lowest priority is priority 0x1F.

Figure 5. Secure and non-secure interrupt priority formats
[Bit-field diagram of the 8-bit priority field (bit 7 = MSB, bit 0 = LSB):
– Priority format A (any interrupt security setting, secure access): priority bits A to E, remaining bits SBZ
– Priority format B (non-secure interrupt, non-secure access): priority bits A to D, remaining bits SBZ
– Priority format C (non-secure interrupt, secure access; non-secure writes as viewed by secure reads): a leading 1 followed by bits D, C, B, A, remaining bits SBZ
– Secure interrupt, non-secure access: RAZ/WI; the access silently fails and no exception is generated]

Interrupt output source selection
There are two legacy interrupt inputs, nFIQ[n] and nIRQ[n], for each Cortex-A9 processor.
When you use the legacy mode, the interrupt controller disables the corresponding Cortex-A9 processor interface and routes the legacy nFIQ[n] and nIRQ[n] inputs to the Cortex-A9 processor, generating FIQ and IRQ exceptions respectively. Otherwise, these pins are used as PPI(0) and PPI(3).

Table 6 shows the bits in the ICPICR register that enable you to select the signals that drive the interrupt outputs of a Cortex-A9 processor interface.

Table 6. Interrupt output source selection

ICPICR register bits                           Interrupt output signals
Bit[3] FIQEn   Bit[1] EnableNS   Bit[0] EnableS   FIQ exception generated by   IRQ exception generated by
0              0                 0                nFIQ[n]                      nIRQ[n]
0              0                 1                nFIQ[n]                      Secure interrupts
0              1                 0                nFIQ[n]                      Non-secure interrupts
0              1                 1                nFIQ[n]                      Secure and non-secure interrupts
1              0                 0                nFIQ[n]                      nIRQ[n]
1              0                 1                Secure interrupts            nIRQ[n]
1              1                 0                nFIQ[n]                      Non-secure interrupts
1              1                 1                Secure interrupts            Non-secure interrupts

Using CFGSDISABLE
The interrupt controller provides the facility to prevent write accesses to critical configuration registers when you assert CFGSDISABLE. This signal controls the read and write behavior for the secure control registers in the distributor and Cortex-A9 processor interfaces, and the lockable shared peripheral interrupts (LSPIs) in the interrupt controller.

If you use CFGSDISABLE, ARM recommends that you assert CFGSDISABLE during the system boot process, after the software has configured the registers. Ideally, the system must deassert CFGSDISABLE only if a hard reset occurs.
When CFGSDISABLE is HIGH, the interrupt controller prevents write accesses to the following registers:
● Distributor: the enable_set register
● Secure interrupts defined by the LSPI field in the ic_type register:
– Interrupt security registers
– Enable set registers
– Enable clear registers
– Pending set registers
– Pending clear registers
– Priority level registers
– SPI target registers
– Interrupt configuration register
● Cortex-A9 processor interface: the ICPICR register, except for the EnableNS bit

Note: When CFGSDISABLE is HIGH, the interrupt controller permits write access only to the EnableNS bit. All other bits are read-only.

After you assert CFGSDISABLE, it changes the register bits to read-only and therefore the behavior of these secure interrupts cannot change, even in the presence of rogue code executing in the secure domain.

2.6 Programming

2.6.1 Programming the global timer registers
This section provides information about how to program the global timer registers.

Programming the global timer counter register
1. Clear the timer enable bit in the timer control register
2. Write the lower 32-bit timer counter register
3. Write the upper 32-bit timer counter register
4. Set the timer enable bit

Note: You must use this register with 32-bit accesses. You cannot use the STRD/LDRD instructions.

Reading the global timer counter register
1. Read the upper 32-bit timer counter register
2. Read the lower 32-bit timer counter register
3. Read the upper 32-bit timer counter register again. If the value is different from the previously read upper 32-bit value, read the lower 32-bit timer counter register again. Otherwise, the 64-bit timer counter value is correct.

Programming the global timer compare register
Use the following steps to ensure that updates to this register do not set the timer interrupt status register.
1. Clear the COMPEN bit in the timer control register
2.
Write the lower 32-bit comparator value register
3. Write the upper 32-bit comparator value register
4. Set the COMPEN bit and, if necessary, the IRQ enable bit

3 Multilayer interconnect matrix (BUSMATRIX)
This chapter focuses on BUSMATRIX functionality and operation. For the BUSMATRIX feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU

For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

3.1 Overview
The multilayer interconnect matrix is the connectivity infrastructure that enables data exchange between the various blocks of the device. This structure supports parallel communications between master and slave components, and ensures the maximum level of system throughput.

Figure 6. SPEAr1340 block diagram with BUSMATRIX topology details
[Block diagram. It shows the A9 subsystem module (two Cortex-A9 CPUs with FPU, PTM interface, instruction and data caches, SCU with cache-to-cache transfers and snoop filtering, GIC, watchdogs/timers, ACP, CoreSight subsystem, and the 512 KB L2 cache), the crossbars SMX0 and SMX1, the shared link SMX2 to the S3220, the MPMC ports 0 to 5 (DDR2/3 at 533 MHz, 16/32 bits with ECC), and the initiator and target IPs with their IDs and bus protocols, as listed in Table 7 and Table 8.]

3.2 Pins
The BUSMATRIX does not have any off-chip signals.

3.3 Clocks
Refer to Chapter 5: Reset and clock generator (RCG).

3.4 Interrupts
Refer to Appendix A: Interrupts.

3.5 Functional description
Note: In this document, initiator agent (IA) and master are used synonymously, and target agent (TA) and slave are used synonymously.

3.5.1 Crossbars (XB)
SMX0 and SMX1 can enable full connectivity between all of the IAs and TAs that require it. Crossbars are meant for performance and latency control.

3.5.2 Shared link (SL)
SMX2 allows full connectivity between all IAs and TAs by maintaining a unique channel that uses time division to share its use.
A shared link is easier to implement, and alleviates the problem of a high frequency design by allowing a better implementation for IAs and TAs that do not require high bandwidth.

3.5.3 S3220
The S3220 can manage up to four transactions in parallel, and can easily adapt to serve a peripheral with slow register access due to low-speed data FIFOs.

3.5.4 Masters (IAs) and slaves (TAs)
Table 7 lists IAs by connectivity (group ID), and provides individual IDs, initiating IPs, and protocol types. Table 8 provides an IA and TA connectivity matrix.

Table 7. IA group organization

IA group ID   IA ID   Initiating IP   Protocol type
10            10      A9SM            AXI-64
20            20      A9SM            AXI-64
30            30      UHC0            AHB-32
30            31      UHC0            AHB-32
30            32      UHC1            AHB-32
30            33      UHC1            AHB-32
30            34      UOC             AHB-32
30            36      GMAC            AXI-32
35            35      PCIE/SATA0      AXI-64
40            40      DMAC0           AHB-64
50            50      DMAC1           AHB-64
55            55      VDEC            AXI-64
60            60      CLCD            AXI-64
70            70      C3              AHB-32
70            71      MCIF            AHB-32
70            72      VIP             AHB-64
75            75      VENC            AXI-64
100           100     GPU             AXI-64

Known limitations
● The AXI protocol supports sequences of locked transactions, which are restricted to a single read-modify-write sequence.
● Read and write transactions must be made to the same address.
● The bridge converts AXI exclusive transactions to an OCP2 ReadLinked/WriteConditional pair. OCP2 limits ReadLinked/WriteConditional requests to a single request per thread at the target core, and the bridge extends this restriction to AXI exclusive transactions.

Table 8.
RM0078 Connectivity matrix IA group index Slave index IP Comment 10 H 20 30 35 40 50 55 60 70 75 80 100 X K X X J X MPMC X X DDR memory access L X X X M X N X IO space and configuration space X X NOR/SRAM memory space X X X X Configuration registers X X X X NAND memory space X X PCIE/SATA0 DBI space X X A5_0 I2S_S X X X X A5_1 I2S_M Configuration and data registers X X X X CF/xD X X X X X X X SD/SDIO/MMC X X X X X X X Standard shared RAM (32 KB) X X X X X X X X X X X X X X X X C3 X X PCIE0 C6 X X X C5 A1_1 X X X ACP FSMC X X A9SM A0 X X Q A1_0 X X X X X X X X X MCIF C4 A8 SYSRAM0 D11_1 UART1 D11_0 I2C1 Configuration and DMA port X X E0 BUSMATRIX Configuration registers for SMX X X I_0 GPU Configuration registers X X X X X X X X X X I_1 VIP Configuration registers X X X X X X X X X X Configuration and data registers X X X X X X X X X X CEC X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X I_2 I_3 I_4 CAM0 I_5 CAM1 Configuration and data registers X X I_6 CAM2 I_7 CAM3 X X X X X X X X X X I_8 SPDIF OUT X X X X X X X X X X I_9 SPDIF IN X X X X X X X X X X 52/590 Doc ID 018553 Rev 3 X RM0078 Multilayer interconnect matrix (BUSMATRIX) Table 8. 
Connectivity matrix (continued) IA group index Slave index IP Comment 10 20 30 35 40 50 55 60 70 75 80 100 A2 UART0 Configuration and data registers X X X X X X X X A6 ADC Configuration and data registers X X X X X X X X A3 SSP Configuration and data registers X X X X X X X X A7 PWM Configuration and data registers X X X X X X X X A4 I2C0 Configuration and data registers X X X X X X X X A9 KBD Configuration registers X X X X X X X X B4 GPT0 X X X X X X X X B5 GPT1 X X X X X X X X Configuration registers B15 GPT2 X X X X X X X X B16 GPT3 X X X X X X X X B9 RTC X X X X X X X X B7 GPIOA X X X X X X X X X X X X X X X X Configuration registers X X X X X X X X APB-SYS for internal (A9SM) Coresight access X X X X X X X X APB-CMR for clock manager access X X X X X X X X Configuration registers Configuration registers B8 GPIOB B10 MISC B0 A9SM B1 A10 SYSRAM1 Memory for always-on support (4 KB) X X X X X X X X C0 CLCD Configuration registers X X X X X X X X C1 C3 Configuration registers X X X X X X X X D0 GMAC Configuration registers X X X X X X X X D6 XGPIO Registers X X X X X X X X D5 UOC Control and status registers programming interface X X X X X X X X OHCI X X X X X X X X EHCI X X X X X X X X OHCI X X X X X X X X EHCI X X X X X X X X D1 UHC0 D2 D3 UHC1 D4 Doc ID 018553 Rev 3 53/590 Multilayer interconnect matrix (BUSMATRIX) Table 8. 
Connectivity matrix (continued) IA group index Slave index IP B13 SMI B3 B6 Comment 10 20 NAND/NOR memory access X Configuration registers 75 80 100 X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X Configuration registers X X X X X X X X Internal peripherals (to be set using PERIPHBASE) X X L2CC configuration space (to be set using REGFILEBASE) X X DMAC0 30 35 40 50 55 X X X X X X X X X X X X X X 60 70 Configuration registers B14 DMAC1 B12_0 MIPHY B12_1 VENC Programming port B12_2 VDEC B2 MPMC P0 A9SM P1 E1 BUSMATRIX Configuration registers for S3220 X X X X X X X X B11 SYSROM Embedded ROM (32 KB) X X X X X X X X

Each time an IA accesses an address outside its allowed address map, either it receives a bus error, or an interrupt is raised through the BUSMATRIX interrupt line to signal an abnormal transaction.

READ operations always return a bus error. For WRITE operations, the interconnect distinguishes between posted and unposted transactions. Because there is no wait for a response for posted transactions, the bus signals the event through its interrupt line (sideband signaling) rather than transporting an in-band error. For information on how to handle this condition, refer to Appendix A: Interrupts.

Information on programming posted and unposted transactions is provided in the individual IP chapters.

4 System configuration registers (MISC)
Using a 32-bit APB interface, the miscellaneous registers configure the SPEAr1340 global parameters (such as clocks, resets, and pads) and peripherals.

SPEAr1340 registers are described in the companion reference manual: RM0089, Reference manual, SPEAr1340 address map and registers.

5 Reset and clock generator (RCG)
This chapter focuses on RCG functionality and operation.
For the RCG feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU

For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

5.1 Overview
The reset and clock generator (RCG) provides the system clocks and resets. It can be configured through the miscellaneous registers.

Figure 7. RCG block diagram
[Block diagram. Inputs osci1, osci3, XGPIO90 and XGPIO132 feed PLL1, PLL2 and PLL3, whose outputs are pll1out/vco1div2, pll2out/vco2div2 and pll3out/vco3div2. The clock control block generates clocks. The clock gate unit contains the gating cells (driven by MISC registers) that enable/disable clocks and outputs the system clocks. The reset generator, driven by MISC control signals, drives the system resets. Annotations in the figure also mention a 1 GHz and a 1.2 GHz clock for the AMBA subsystem.]

See also: Figure 8: RCG integration in SPEAr1340.

5.2 Pins
For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

5.3 Clocks
Table 9.
RCG clocks IP A9SM ADC RCG name CAM1 CAM2 CAM3 CAM4 CEC0 CEC1 Maximum frequency (MHz) CLK1GHZ pclk_a9sm PCLK 83 PERIP1_CLK_ENB[0] atclks_a9sm ACLKS_SOC 166 PERIP2_CLK_ENB[6] aclkm0_a9sm ACLKM0 166 aclkm1_a9sm ACLKM0 166 pclk_adc PCLK 83 PERIP1_CLK_ENB[30] clk_adc ADC_CLK 20 See Chapter 31 ahclkclk_i 166 PERIP1_CLK_ENB[0] hclk_c3 hclk 166 PERIP1_CLK_ENB[29] clk_c3 clk48m 48 CAM1_PIXCLK PIXCLK <=100 hclk_cam1 HCLK CAM2_PIXCLK PixCLK hclk_cam2 HCLK CAM3_PIXCLK PixCLK hclk_cam3 HCLK CAM4_PIXCLK PIXCLK hclk_cam4 HCLK 166 PERIP3_CLK_ENB[7] hclk_cec0 HCLK 166 PERIP3_CLK_ENB[5] ck_cec0 = hclk_cec0 ck 166 PERIP3_CLK_ENB[5] hclk_cec1 HCLK 166 PERIP3_CLK_ENB[4] ck_cec1 = hclk_cec1 ck 166 PERIP3_CLK_ENB[4] Doc ID 018553 Rev 3 1200 Configuration registers and references clk1ghz BUSMATRIX hclk_bus C3 IP native name 166 <=100 166 <=100 166 <=100 See A9SM clock configuration Connected to the PAD and enabled through PERIP3_CLK_ENB[10] PERIP3_CLK_ENB[10] Connected to the PAD and enabled through PERIP3_CLK_ENB[9] PERIP3_CLK_ENB[9] Connected to the PAD and enabled through PERIP3_CLK_ENB[8] PERIP3_CLK_ENB[8] Connected to the PAD and enabled through PERIP3_CLK_ENB[7] 57/590 Reset and clock generator (RCG) Table 9. 
RM0078 RCG clocks (continued) IP RCG name IP native name Maximum frequency (MHz) Configuration registers and references hclk_clcd hclk 166 PERIP1_CLK_ENB[27] aclk_clcd aclk 166 PERIP1_CLK_ENB[27] clk_clcd pclk_in 200 See CLCD clock configuration DMAC hclk_dma hclk_i 166 PERIP1_CLK_ENB[25] FSMC hclk_fsmc hclk_i 166 PERIP1_CLK_ENB[4] hclk_gmac hclk_i 166 PERIP1_CLK_ENB[8] clk_tx clk_tx_i 125 clk_rx clk_rx_i 125 clk_rmii clk_rmii_i 50 clk_ptp_ref = osci1 clk_ptp_ref_i 24 GPIOA pclk_gpioa PCLK 83 PERIP1_CLK_ENB[23] GPIOB pclk_gpiob PCLK 83 PERIP1_CLK_ENB[24] pclk_gpt0 pclk 83 PERIP1_CLK_ENB[21] clk_timer0 timer_clk pclk_gpt1 pclk clk_timer1 timer_clk pclk_gpt2 pclk clk_timer2 timer_clk pclk_gpt3 pclk clk_timer3 timer_clk hclk_gpu MALI_SUBSYS_AXI_ m_aclk clk_gpu = gen3_clk MALI_200Mhz_clk pclk_i2c0 pclk 83 clk_i2c0= hclk ic_clk 166 pclk_i2c1 pclk 83 clk_i2c1= hclk ic_clk 166 pclk_i2s_m pclk 83 i2s_m_sclk sclk <=12 i2s_m_sclk I2S_OUT_BITCLK <=12 CLCD GMAC GPT0 GPT1 GPT2 GPT3 GPU I2C0 I2C1 I2S_M <=83 83 <=83 83 <=83 83 <=83 166 <=200 See GMAC clock configuration See GPT clock configuration PERIP1_CLK_ENB[22] See GPT clock configuration PERIP2_CLK_ENB[4] See GPT clock configuration PERIP2_CLK_ENB[5] See GPT clock configuration PERIP3_CLK_ENB[6] See Fractional clock generator (SSCG) PERIP1_CLK_ENB[18] PERIP3_CLK_ENB[2] PERIP1_CLK_ENB[20] See I2S clock configuration I2S_OUT_OVRSAMP_CLK I2S _S KBD 58/590 pclk_i2s_s pclk_kbd pclk 83 sclk <=12 pclk 83 Doc ID 018553 Rev 3 PERIP1_CLK_ENB[19] See I2S clock configuration PERIP2_CLK_ENB[3] RM0078 Table 9. 
Reset and clock generator (RCG) RCG clocks (continued) IP RCG name IP native name Maximum frequency (MHz) Configuration registers and references hclk_sd hclk_sd 166 PERIP1_CLK_ENB[6] hclk_cf_xd hclk_cf_xd 166 PERIP1_CLK_ENB[7] clk_sd clk_sd 10<clk<83 clk_cf_xd clk_cf_xd 25<clk<166 See XYSYNT clock divider pclk_ao pclk 83 Always-on hclk_mpmc hclk 166 PERIP2_CLK_ENB[0] aclk_mpmc aclk 166 PERIP2_CLK_ENB[0] clk_mpmc_phy clk 533 PERIP2_CLK_ENB[1] clk_mpmc_ctrl clk_d2 266 PERIP2_CLK_ENB[1] clk_mpmc_ddr clk_ref 533 PERIP2_CLK_ENB[1] OTP pclk_o clk_i 83 Always-on PCIE aclk_pcie_sata aclk 166 PERIP1_CLK_ENB[12] PCM pclk_ao pclk 83 Always-on PWM pclk_pwm PCLK 83 PERIP3_CLK_ENB[3] pclk_rtc pclk 83 PERIP1_CLK_ENB[31] clk_32k clk32k aclk_pcie_sata aclk 166 PERIP1_CLK_ENB[12] hclk_smi hclk_i 166 PERIP1_CLK_ENB[5] clk_smi smi_clk 50 See Chapter 17 hclk_spdif_in HCLK_I 166 PERIP3_CLK_ENB[12] clk_spdif_in CLK_APPL hclk_spdif_out HCLK clk_spdif_out clk_appl pclk_ssp pclk 83 clk_ssp=pclk_ssp sspclk 83 SYSRAM0 hclk_sysram0 hclk 166 PERIP1_CLK_ENB[3] SYSRAM1 hclk_sysram1 hclk 166 PERIP1_CLK_ENB[2] SYSROM hclk_sysrom hclk 166 PERIP1_CLK_ENB[1] pclk_ao pclk 83 Always-on clk_thsens thsclk pclk_uart0 clk_uart0 MCIF MISC MPMC RTC SATA SMI SPDIF (in) SPDIF (out) SSP THSENS UART0 See XYSYNT clock divider 32 KHz <=200 166 <=147 PERIP_CLK_CFG[14] See Fractional clock generator (SSCG) PERIP3_CLK_ENB[13] PERIP_CLK_CFG[15] See Fractional clock generator (SSCG) PERIP1_CLK_ENB[17] 187.5 KHz PERIP2_CLK_ENB[8] pclk 83 PERIP1_CLK_ENB[15] uartclk 125 See UART clock configuration Doc ID 018553 Rev 3 59/590 Reset and clock generator (RCG) Table 9. 
RCG clocks (continued)

| IP | RCG name | IP native name | Maximum frequency (MHz) | Configuration registers and references |
|---|---|---|---|---|
| UART1 | pclk_uart1 | pclk | 83 | PERIP3_CLK_ENB[1] |
| | clk_uart1 | uartclk | 125 | See UART clock configuration |
| UHC0 | freeclk_usb | phy_clk_i | 30 | |
| | clk48_uhc0 | ohci_clk48_i | 48 | |
| | clk12_uhc0 | ohci_clk12_i | 12 | |
| | clk30_uhc0 | utmi_phy_clock_i | 30 | |
| | hclk_uhc0 | hclk_i | 166 | PERIP1_CLK_ENB[9] |
| UHC1 | freeclk_usb | phy_clk_i | 30 | |
| | clk48_uhc1 | ohci_clk48_i | 48 | |
| | clk12_uhc1 | ohci_clk12_i | 12 | |
| | clk30_uhc1 | utmi_phy_clock_i | 30 | |
| | hclk_uhc1 | hclk_i | 166 | PERIP1_CLK_ENB[10] |
| UOC | hclk_uoc | hclk | 166 | PERIP1_CLK_ENB[11] |
| | clk30_uoc | utmi_clk | 30 | |
| VDEC | clk_vdec = gen0_clk | DCLK | ≤200 | See Fractional clock generator (SSCG) |
| | hclk_vdec | HCLK | 166 | PERIP3_CLK_ENB[16] |
| | aclk_vdec | ACLK | 166 | PERIP3_CLK_ENB[16] |
| VENC | clk_venc = gen1_clk | ENC_CLK | ≤200 | See Fractional clock generator (SSCG) |
| | hclk_venc | HCLK | 166 | PERIP3_CLK_ENB[15] |
| | aclk_venc | ACLK | 166 | PERIP3_CLK_ENB[15] |
| VIP | VIP_PIXCLK pad | PIX_CLK_I | ≤193 | PERIP3_CLK_ENB[11] |
| | hclk_video_in | HCLK | 166 | PERIP3_CLK_ENB[11] |
| XGPIO | hclk_xgpio | Hclk_i | 166 | PERIP3_CLK_ENB[18] |

5.4 Functional description

This section describes the main blocks and functionality of the RCG. Figure 8 shows how the reset and clock generator is integrated in the device.

Figure 8. RCG integration in SPEAr1340 (block diagram: the 32 kHz osci2, 24 MHz osci1 and 25/100 MHz osci3 sources, the USB PHY PLL and DDR PHY PLL, and the RCG outputs clk1ghz, cpu_clk, hclk/pclk, usb_48 and usb_30, distributed to the IPs of the AlwaysON, ARM, BUS, GPU and CODEC power domains)

Note: AlwaysON, ARM, BUS, GPU and CODEC are the names of SPEAr1340 power domains. For more information, see Chapter 6: Power management.

5.4.1 Main clock sources

● osci1: 24 MHz clock from internal oscillator connected to external quartz
● osci2: 32 kHz clock from internal oscillator used for the RTC block (optional)
● osci3: 25/100 MHz clock from the MIPHY macro (optional)

For a complete list of RCG clocks see Section 5.3: Clocks.
5.4.2 PLLs

Note: See also Section 5.5.1: Programming PLLs.

PLL1, PLL2 and PLL3 in the RCG module, together with the PLL dedicated to the memory controller subsystem (PLL4), are the main sources of system clocks. Table 10 lists the PLL source clocks, and the fields of register PLL_CFG that configure them. At reset, osci1 is the default clock source for all PLLs.

Table 10. PLL source clocks

| PLL | Source | PLL_CFG register field |
|---|---|---|
| PLL1 | osci1, osci3, XGPIO90 | pll1_clk_sel |
| PLL2 | osci1, osci3, XGPIO132 | pll2_clk_sel |
| PLL3 | osci1, osci3, XGPIO132 | pll3_clk_sel |
| PLL4 | osci1 | |

PLL4 generates the memory controller clocks, and is always fed by clock osci1. Table 11 lists the PLL output clocks, their reset values, and the registers that configure them.

Table 11. PLL output clocks

| PLL | Frequency (after reset) | MISC register |
|---|---|---|
| PLL1 | pll1out at 1 GHz; vco1div2 at 500 MHz; vco1div4 at 250 MHz | PLL1_CTR, PLL1_FRQ, PLL1_MOD |
| PLL2 | pll2out at 125 MHz; vco2div2 at 500 MHz | PLL2_CTR, PLL2_FRQ, PLL2_MOD |
| PLL3 | pll3out at 65 MHz; vco3div2 at 520 MHz | PLL3_CTR, PLL3_FRQ, PLL3_MOD |
| PLL4 | pll4out at 533 MHz | PLL4_CTR, PLL4_FRQ, PLL4_MOD |

Figure 9 shows the PLL components.

Figure 9. PLL overview (block diagram: clk_in passes through the predivision factor (prediv_N) into the analog PLL, a classic phase-locked-loop circuit built around the CP & VCO block; the postdivision factor (postdiv_P) produces pllout and a div2 stage produces vcodiv2; clksel chooses the feedback between the internal divider (fbkdiv_M) and extfbclk from the external divider (dithering logic), which modulates the VCO frequency)

Analog PLL

The analog PLL features are:
● Input clock frequency range: 4 MHz to 350 MHz
● VCO frequency range: 800 MHz to 1600 MHz
● Output frequency range: 12.5 MHz to 1600 MHz
● Power-down mode: consumption is only due to leakage
● Maximum lock time: 150 µs

For the feedback reference clock, it is possible to choose between the output of an internal divider (fbkdiv_M) and an external divider (dithering logic).
Because the configuration of the internal divider is static, the output frequency is a constant value. The VCO frequency can be calculated as follows:

$$f_{VCO} = \frac{2 \cdot M[15:8] \cdot f_{in}}{N}$$

Where:
– fin/N is the reference clock after the prediv_N divider: the frequency range is (4 MHz, 50 MHz).
– fin is the frequency of the input clock listed in Table 10.

The output frequency can be calculated as follows:

$$f_{out} = \frac{2 \cdot M[15:8] \cdot f_{in}}{N \cdot 2^{P}}$$

Where:
– M[15:8] is the feedback division factor: pll_fbkdiv_M[15:8] field of PLLx_FRQ register
– N is the pre-division factor: pll_prediv_N field of PLLx_FRQ register
– P is the post-division factor: pll_postdiv_P field of PLLx_FRQ register

Table 12 lists division factor ranges.

Table 12. PLL division factors

| | M[15:8] | N | P |
|---|---|---|---|
| Range (decimal) | 8 to 255 | 1 to 7 | 0 to 6 |

The PLL also generates two other auxiliary clocks:
● vcodiv2: VCO frequency divided by 2
● vcodiv4: VCO frequency divided by 4

Table 13 can be used to evaluate the jitter introduced at the output. Note that the jitter introduced by the input source is not taken into account; only device and supply noise are considered here.

Table 13. Jitter at PLL output clock

| Jitter type | A: jitter due to supply noise (ps) | B: jitter due to device noise (% of PLL output time period) | Total jitter |
|---|---|---|---|
| Single-period jitter | 25 | 0.16 | +/- (A + σ * B) |
| Cycle-to-cycle jitter | 25 | 0.32 | +/- (A + σ * B) |

The σ value is chosen depending on the percentage of the samples exceeding the calculated jitter.

Table 14.
Selection of σ value

| σ value | Percentage of samples exceeding the jitter value (%) |
|---|---|
| 1 | 31.73 |
| 2 | 4.555 |
| 3 | 0.27 |
| 4 | 6.30 × 10^-3 |
| 5 | 5.63 × 10^-5 |
| 6 | 2.00 × 10^-7 |
| 7 | 2.82 × 10^-10 |

For example, if the output clock frequency is 1 GHz (period 1000 ps), the cycle-to-cycle jitter at 3σ is:

+/- (25 ps + 3 * 0.32/100 * 1000 ps) = +/- (25 ps + 9.6 ps) = +/- 34.6 ps

Only 0.27 % of samples (3σ) exceed the calculated jitter of +/- 34.6 ps.

Dithering logic

● Programmable modulation period
● Selectable modulation depth. Recommended range: 0-2.5%
● Selectable Sigma-Delta order. Recommended: 2nd order
● Maximum modulation frequency: fmod(max) = 100 kHz

To enable the dithering logic, set the clksel signal to 1 (PLLx_CTR register, field pll_control1[5]). In this mode the internal feedback divider is bypassed, and the external logic is used to generate the VCO reference signal. The external divider is driven in order to generate a triangular wave. The algorithm that performs this modulation is based on a sigma-delta converter fed by a triangular wave.

When the external feedback is enabled, the output signal frequency is calculated as:

$$f_{out} = \frac{2 \cdot M \cdot f_{in}}{256 \cdot N \cdot 2^{P}}$$

Where M is the feedback division factor of the external divider (PLLx_FRQ register, pll_fbkdiv_M field).

PLL modes

Register: PLLx_CTR. Field: pll_control1[2:1].

Table 15. PLL modes

| Mode | Description |
|---|---|
| Non dithered | The PLL behaves as a normal PLL (internal feedback divider). |
| Fractional-N | VCO frequencies can be selected that are not integer multiples of the reference frequency. In this mode the external divider is selected. |
| Dithering (double side modulation) | A triangular wave is added to the VCO frequency. |
| Dithering (single side modulation) | Similar to double side modulation, but the modulation only subtracts from the main frequency. |

In dithering mode, the PLLx_MOD registers configure the output clock modulation period and slope parameters.
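As a numeric cross-check of the PLL frequency and jitter formulas given earlier in this section, the following Python sketch evaluates them for a 24 MHz osci1 input. The function names and the divider values used (M = 125, N = 6 and M = 65, N = 3) are illustrative assumptions chosen to reproduce the reset frequencies of Table 11; they are not taken from the register reset values. The range limits are treated as inclusive, which is also an assumption.

```python
# Sketch of the PLL formulas from this section.
# All function names and example divider values are hypothetical.


def pll_vco_mhz(fin_mhz, m, n):
    """Internal feedback: f_VCO = 2 * M[15:8] * f_in / N."""
    return 2 * m * fin_mhz / n


def pll_out_mhz(fin_mhz, m, n, p):
    """Internal feedback: f_out = f_VCO / 2**P."""
    return pll_vco_mhz(fin_mhz, m, n) / 2 ** p


def pll_out_ext_mhz(fin_mhz, m16, n, p):
    """External feedback: f_out = 2 * M * f_in / (256 * N * 2**P),
    where m16 is the full pll_fbkdiv_M field value."""
    return 2 * m16 * fin_mhz / (256 * n * 2 ** p)


def range_violations(fin_mhz, m, n, p):
    """Name the documented limits a divider choice breaks (bounds assumed inclusive)."""
    checks = [
        ("M[15:8]", m, 8, 255),
        ("N", n, 1, 7),
        ("P", p, 0, 6),
        ("input clock (MHz)", fin_mhz, 4, 350),
        ("reference clock f_in/N (MHz)", fin_mhz / n, 4, 50),
        ("VCO (MHz)", pll_vco_mhz(fin_mhz, m, n), 800, 1600),
        ("output (MHz)", pll_out_mhz(fin_mhz, m, n, p), 12.5, 1600),
    ]
    return [name for name, value, low, high in checks if not low <= value <= high]


def total_jitter_ps(fout_mhz, sigma, supply_ps=25.0, device_pct=0.32):
    """Total jitter A + sigma * B, with B a percentage of the output period.
    The defaults are the cycle-to-cycle row of Table 13."""
    period_ps = 1e6 / fout_mhz
    return supply_ps + sigma * device_pct / 100 * period_ps


if __name__ == "__main__":
    print(pll_out_mhz(24, 125, 6, 0))            # 1000.0 (pll1out after reset)
    print(pll_out_mhz(24, 125, 6, 3))            # 125.0  (pll2out after reset)
    print(pll_out_mhz(24, 65, 3, 4))             # 65.0   (pll3out after reset)
    print(pll_out_ext_mhz(24, 125 * 256, 6, 0))  # 1000.0 (same point, external divider)
    print(range_violations(24, 10, 1, 0))        # ['VCO (MHz)']
    print(round(total_jitter_ps(1000, 3), 1))    # 34.6
```

The external-feedback call shows why the factor 256 appears in that formula: the full pll_fbkdiv_M field carries the integer factor scaled by 256, so M = 125 * 256 gives the same 1 GHz output as M[15:8] = 125 with the internal divider.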
Use the reference frequency (fref), the frequency of the modulation wave (fmod) and the modulation depth (md) to compute the field values:

$$\text{pll\_modperiod} = \frac{f_{ref}\,[\text{kHz}]}{4 \cdot f_{mod}\,[\text{kHz}]}$$

$$\text{pll\_slope} = \frac{8 \, f \, md \, M}{\text{pll\_modperiod}}$$

Where:
– $f_{ref} = f_{osci} / N$ is the frequency at the output of input divider N
– fmod is the frequency of the modulation wave
– M = pll_fbkdiv_M
– md is the modulation depth with respect to the nominal frequency of the undithered clock

5.4.3 Fractional clock generator (SSCG)

An SSCG is a clock synthesizer able to divide an input clock by a fractional factor.

Main features:
● Input frequency (fin): 250 to 500 MHz
● Output frequency (fout): fin/16 to fin
● Single period jitter: maximum value +/- 230 ps
● Output clock period resolution: 2^-13 / fin

The output clock period is calculated as follows:

Tout = 2 * To * Tin

where:
– Tout is the output period
– To is the division parameter; it is a 17-bit fixed point representation of the division factor, with the first 3 MSBs representing the integer part.
– Tin is the input clock period

Example

fin = 500 MHz, fout = 48 MHz

To = fin / (2 * fout) = 5.2083

The corresponding fixed point value (14 fractional bits) is calculated as:

To = 5.2083 * 2^14 = 85332 => To = 17'b10100110101010100

Table 16. SSCGn output frequencies

| SSCG | Input clock | Input frequency range (MHz) | IP | Register |
|---|---|---|---|---|
| SSCG0 | vco1div4, vco3div2, pll3out | 250-450 | VIDEO_DEC | GEN_CLK_SSCG0, PLL_CFG[28:27] |
| SSCG1 | vco1div4, vco3div2, pll3out | 250-450 | VIDEO_ENC | GEN_CLK_SSCG1, PLL_CFG[28:27] |
| SSCG2 | vco1div4, vco2div2, pll2out | 250-450 | SPDIF_OUT | GEN_CLK_SSCG2, PLL_CFG[30:29] |
| SSCG3 | vco1div4, vco2div2, pll2out | 250-450 | GPU, SPDIF_IN | GEN_CLK_SSCG3, PLL_CFG[30:29] |

Table 16.
SSCGn output frequencies (continued)

| SSCG | Input clock | Input frequency range (MHz) | IP | Register |
|---|---|---|---|---|
| SSCG4 | vco1div2 | 250-600 | CPU, AMBA Subsystem | SYS_CLK_SSCG |
| SSCG5 | vco1div4, pll2out | 250-450 | CLCD | CLCD_CLK_SSCG |
| SSCG6 | vco1div2 | 250-600 | AMBA Subsystem | AMBA_CLK_SSCG |

5.4.4 XYSYNT clock divider

XYSYNT is a clock divider based on an integer counter. The input clock can be divided by an integer value by setting the parameters X and Y. The output frequency is calculated as follows:

Formula 1

$$f_{out} = f_{in} \cdot \frac{X}{Y} \qquad \text{with } X \le Y/2$$

In this case, the output signal is high for only one input clock period (see Figure 10).

Figure 10. X = 1, Y = 4 (duty cycle < 50 %) (waveform: Tout = Tin * Y/X)

If a duty cycle of 50% is required, it is possible to use this formula:

Formula 2

$$f_{out} = f_{in} \cdot \frac{X}{2 \cdot Y}$$

Use the synt_clkout_sel field in the XYSYNT-related configuration registers to choose between the two formulas:
● synt_clkout_sel = 1 selects the first formula
● synt_clkout_sel = 0 selects the second one (DC = 50%).

Note: The maximum XYSYNT input frequency is 600 MHz. To have a fixed output period, the Y/X ratio should be an integer.

Table 17 lists XYSYNT input and output clocks, and the XYSYNT-related configuration registers.

Table 17. XYSYNT clocks

| XYSYNT | Input clock | Output clock | IP | Register |
|---|---|---|---|---|
| I2S_DIV1, I2S_DIV2 | vco1div2, pll2out, pll3out, I2S_OUT_REFCLK | i2s1_sclk, i2s1_refclk | I2S_M | I2S_CLK_CFG |
| C3_CLK_SYNT | vco1div2 | clk_c3_synt | C3 | C3_CLK_SYNT |
| UART0_CLK_SYNT | vco1div2 | clk_uart0_synt | UART0 | UART0_CLK_SYNT |
| UART1_CLK_SYNT | vco1div2 | clk_uart1_synt | UART1 | UART1_CLK_SYNT |
| GMAC_CLK_SYNT | MAC_GTXCLK125, pll2out, osci3 | clk_tx, clk_rx | GMAC | GMAC_CLK_SYNT |
| MCIF_SD_CLK_SYNT | vco1div2 | clk_sd | MCIF (SD) | MCIF_SD_CLK_SYNT |
| MCIF_CFXD_CLK_SYNT | vco1div2 | clk_cf_xd | MCIF (CF/XD) | MCIF_CFXD_CLK_SYNT |
| ADC_CLK_SYNT | hclk | clk_adc | ADC | ADC_CLK_SYNT |

5.4.5 AMBA clock configuration

The following clocks feed the AMBA subsystem:
● CPU_CLK: the CPU clock nominally running at 500 MHz (PLL1 source).
The maximum frequency is 600 MHz.
● HCLK/ACLK: the AHB/AXI clock, nominally running at 166 MHz
● PCLK: the APB clock, nominally running at 83.5 MHz

By default, all these clocks are generated from the same root, SYS_CLK (see Figure 12: AMBA clock generation), and the ratios between them are fixed:
● sys_clk:cpu_clk = 2:1
● cpu_clk:hclk = 3:1
● hclk:pclk = 2:1

SSCG4 can also be used as the source for the hclk/pclk clocks. This makes it possible to decouple the source of cpu_clk from the source of hclk/pclk; the remaining fixed ratios are then:
● sys_clk:cpu_clk = 2:1
● hclk:pclk = 2:1

Because the AMBA clocks feed most of the SoC registers, they are responsible for most of the dynamic power consumption. To optimize power consumption, the AMBA subsystems can be set to three different power modes, depending on the source of their clocks. A system clock controller (see Figure 11) defines the system clock (SYS_CLK) source.

See also: Section 5.5.2: Changing system modes

Figure 11. System clock controller (state diagram of the AMBA subsystems power modes)
● NORMAL: the system clocks' source is a PLL output or SSCG6. Nominally: PLL1 at 1 GHz, PLL3 at 1.2 GHz.
● SLOW: the system clocks are driven by the osci1 (default) clock or by its divided version; use the oscidiv_cfg and oscidiv_en fields of SYS_CLK_CTRL to enable and set the divisor.
● DOZE: reset state. The system clock is driven by a low frequency oscillator. After reset, osci1 is selected; clearing the osci2_dis bit of PERIP_CLK_CFG selects osci2 (32 kHz).
● Transitions shown: MRESET (entry to DOZE), XTAL_TIMOUT (DOZE to SLOW), PLL_TIMOUT (SLOW to NORMAL).

Notes:
– Switching between clock sources PLL2 and PLL3 can produce glitches; when switching from one to the other, use SLOW mode. All other clock switches are glitchless.
– If osci2_dis is set, it is not possible to switch from SLOW mode to DOZE mode.

Use the following registers to switch among system modes:
– SYS_CLK_CTRL
– SYS_CLK_OSCITIMER
– SYS_CLK_PLLTIMER

All of the clocks come either from external pads or from the internal signal clk_int.
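The fixed default ratios described above, and the decoupled case in which hclk comes from its own source, can be sketched numerically as follows. This is an illustrative Python model only: the function name and the hclk_src_mhz parameter are hypothetical, and the 166 MHz value passed in the last call simply reuses the nominal hclk figure quoted in this section.

```python
from fractions import Fraction


def amba_clocks(sys_clk_mhz, hclk_src_mhz=None):
    """Return cpu_clk, hclk and pclk in MHz for a given sys_clk.

    Default ratios: sys_clk:cpu_clk = 2:1, cpu_clk:hclk = 3:1, hclk:pclk = 2:1.
    If hclk_src_mhz is given, hclk is taken from that separate source
    (decoupled case) while hclk:pclk stays 2:1.
    """
    cpu_clk = Fraction(sys_clk_mhz) / 2
    hclk = cpu_clk / 3 if hclk_src_mhz is None else Fraction(hclk_src_mhz)
    pclk = hclk / 2
    return {"cpu_clk": cpu_clk, "hclk": hclk, "pclk": pclk}


if __name__ == "__main__":
    cases = [
        ("sys_clk = 1000 MHz, default ratios", amba_clocks(1000)),
        ("sys_clk = 1200 MHz, default ratios", amba_clocks(1200)),
        ("sys_clk = 1200 MHz, hclk decoupled at 166 MHz", amba_clocks(1200, 166)),
    ]
    for label, clocks in cases:
        # Exact fractions are kept internally; round only for display.
        print(label, {name: round(float(value), 2) for name, value in clocks.items()})
```

With the default ratios, a 1200 MHz sys_clk would put hclk at 200 MHz, above its nominal 166 MHz, which is why running the CPU at 600 MHz relies on decoupling hclk (see Section 5.5.3).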
Figure 12 illustrates the generation circuit for the system clocks (SYS_CLK, HCLK and PCLK).

In the boot process the system switches from DOZE to SLOW mode using the osci1 clock; once in SLOW mode, it is not possible to switch back from SLOW to DOZE mode if osci2_dis is set. All clock transitions between modes are performed without glitches.

Once in NORMAL mode, the clock can be switched without a glitch between pll1out, the SSCG6 output, and pll2out/pll3out; only a switch between pll2out and pll3out can cause glitches. To change between the pll2out and pll3out sources, the system must first leave NORMAL mode (for instance, by switching to SLOW mode), then change the clksys_src setting, and finally switch back to NORMAL mode.

To select the source in NORMAL mode, configure the clksys_src field of the SYS_CLK_CTRL register. To configure SSCG6, use the register SYS_CLK_SSCG; SSCG6 is fed by vco1div2 (nominally at 500 MHz).

To decouple the HCLK (ACLK) clock from SYS_CLK (and so from the CPU clock), program the hclk_sel field of register SYS_CLK_CTRL. Setting hclk_sel selects the SSCG4 output for HCLK, making it possible to set a different ratio between CPU_CLK and HCLK. The maximum frequency for HCLK is 166 MHz.

Figure 12. AMBA clock generation (block diagram; labelled elements: CLOCK CTRL driven by sys_mode_req, hclk_sel, oscidiv_cfg and clksys_src of SYS_CLK_CTRL; sources osci2 (DOZE), osci1 and osci1div through OSCIDIV (SLOW), and pll1out, pll2out and pll3out through GLM3, and SSCG6 fed by vco1div2 (NORMAL); sys_clk (clk1ghz), div6, HCLK MUX, SSCG4, div2, hclk and pclk; registers SYS_CLK_SSCG and AMBA_CLK_SSCG)

5.4.6 A9SM clock configuration

The main A9SM clock source is CLK1GHZ (see Figure 13). Because CLK1GHZ is connected to SYS_CLK (see Figure 12), its nominal value is 1 GHz when the PLL1 source is selected. The maximum value is 1.2 GHz, when cpu_clk = 600 MHz. For further details on configuring the SYS_CLK clock, see AMBA clock configuration on page 68.
Figure 13. A9SM clock domain (block diagram of CORTEXA9INTEGRATION; labelled elements: A9 Core #0 and #1 with their DBG, PMU, TRM, WD, PTM and CTI blocks, SCU, GIC (128 IRQs), PL310 (with address filtering), the CoreSight subsystem (CTM, CTI, ETB, TPIU, ROM, funnels, replicators, DAP), the clock manager fed by CLK1GHZ, and the external clocks PCLK, PCLKDBG_SOC, ATCLK_SOC, ACLKM0, ACLKM1, ACLKS_SOC and JTAG TCK; the legend distinguishes AXI, APB, debug APB, ATB, triggers, synchronizers and the 500 MHz block)

Table 18. A9SM clocks

| A9SM clock | Maximum frequency | MISC register | Description |
|---|---|---|---|
| CLK1GHZ | SYS_CLK at 1.2 GHz | NA | Main clock source. |
| CLK_CORE | (CLK1GHZ/2) at 600 MHz | NA | Used by the two internal CPUs; generated by dividing CLK1GHZ by two, and is nominally 600 MHz. |
| PERIPHCLK | (CLK1GHZ/4) at 300 MHz | NA | Used by internal peripherals (WD, GIC); generated by dividing CLK1GHZ by four, and is nominally 300 MHz. |
| ATCLK | (CLK1GHZ/4) at 300 MHz | NA | Feeds the TRACE unit; derived from CLK1GHZ, and runs at 300 MHz. |
| ACLKM0, ACLKM1 | aclk at 166 MHz | NA | The clocks for the two AXI interfaces; they are in phase with the system. |
| ACLKS_SOC | aclk at 166 MHz | PERIP2_CLK_ENB[6] | Used by the SCU, and runs at the same frequency as system clock HCLK. |
| PCLK | pclk at 83 MHz | PERIP1_CLK_ENB[0] | The APB interface clock, in phase with system clock pclk. |

5.4.7 GMAC clock configuration

The GMAC block supports the following PHY interfaces:
● GMII: Gigabit media independent interface
● RGMII: Reduced GMII
● MII: Media independent interface
● RMII: Reduced MII

To select the PHY interface, configure the macphy_sel field of the register GMAC_CLK_CFG (GMAC_CLK_CFG[5:3]).

Table 19 lists the GMAC clocks for all interfaces. All of the clocks come either from external pads or from the internal signal clk_int (see Figure 14).

Table 19.
GMAC clocks

| macphy_sel field | clk_tx (MHz) and source | clk_rx (MHz) and source | MAC_GTXCLK (MHz) |
|---|---|---|---|
| 000: MII | 25 / 2.5 (MAC_TXCLK) | 25 / 2.5 (MAC_RXCLK) | – |
| 000: GMII | 125 (clk_int) | 125 (MAC_RXCLK) | 125 |
| 001: RGMII | 125 / 25 / 2.5 (clk_int) | 125 / 25 / 2.5 (MAC_RXCLK) | 125 (both edges) |
| 100: RMII | 25 / 2.5 (clk_int) | 25 / 2.5 (clk_int) | 50 |

Figure 14. GMAC clock generation (block diagram: clk_sel selects the GMAC_CLK_SYNT input among MAC_GTXCLK125, pll2out and osci3; synth_en selects clk_int between MAC_GTXCLK125 and the GMAC_CLK_SYNT output; macphy_sel and mac_speed (GMAC_CLK_CFG) select clk_tx, clk_rx, clk_rmii and the MAC_GTXCLK output from clk_int, its divided versions, and the MAC_TXCLK and MAC_RXCLK pads)

Table 20. Setting GMAC clocks to different modes

MII
– Clocks: clk_rx, clk_tx
– Source: MAC_TXCLK, MAC_RXCLK pads
– Description: In this mode, there is no need to configure a divider or MUX.

GMII
– Clocks: clk_rx, clk_tx
– Source: clk_rx = pad (MAC_RXCLK); clk_tx = clk_int
– Description: GMAC_CLK_CFG and GMAC_CLK_SYNT must be programmed to set clk_int = 125 MHz. clk_int is also present on MAC_GTXCLK for the external PHY.

RMII
– Clocks: clk_rx, clk_tx
– Source: clk_int = clk_rmii
– Description: clk_int must be set to 50 MHz. Internal dividers generate 25 MHz and 2.5 MHz. clk_rmii is also present on MAC_GTXCLK for the external PHY. The clk_int clock can be selected between the pad MAC_GTXCLK125 and the output of the GMAC_CLK_SYNT divider using the synth_en field of register GMAC_CLK_CFG. The GMAC_CLK_CFG and GMAC_CLK_SYNT registers can be used to set the XYSYNT source (clk_sel), program the XYSYNT division factors, and enable XYSYNT (synth_en = 1).

RGMII
– Clocks: clk_rx, clk_tx
– Source: clk_tx = clk_int; clk_rx = MAC_RXCLK
– Description: clk_int must be set to 125 MHz. The internal divider generates the 25/2.5 MHz frequencies based on the mac_speed signal when in 100/10 Mbit/s mode.
To configure the clk_int clock, use the GMAC_CLK_SYNT and GMAC_CLK_CFG registers. There is no need to configure clk_rx; it is connected directly to the pad.

Examples:

In GMII and RGMII modes, clk_int = 125 MHz using PLL2 as source
1. Program PLL2 to generate a 500 MHz clock.
2. Select pll2out as the source for GMAC_CLK_SYNT by setting clk_sel = 2'b01.
3. Configure the GMAC_CLK_SYNT to divide by 4:
   a) synth_clkout_sel = 0
   b) synt_xdiv = 1
   c) synt_ydiv = 2
4. Enable the GMAC_CLK_SYNT source for clk_int by setting synth_en = 1.
5. Select the GMII/RGMII mode through the macphy_sel field of the GMAC_CLK_CFG register.

In RMII mode, clk_int = 50 MHz using MAC_GTXCLK125 as source
1. Select MAC_GTXCLK125 as the source for GMAC_CLK_SYNT by setting clk_sel = 2'b01.
2. Select pll2out as the source for GMAC_CLK_SYNT by setting clk_sel = 2'b00.
3. Disable the GMAC_CLK_SYNT by setting synth_en = 0 in the GMAC_CLK_CFG register.
4. Select the RMII mode by setting macphy_sel = 3'b100.

5.4.8 I2S clock configuration

The RCG generates two clocks for the I2S master block:
● I2S_M_SCLK to the internal I2S_M block and I2S_OUT_BITCLK to the external device
● I2S_OUT_OVRSAMP_CLK to the external device.

I2S_M_SCLK is generated from I2S_OUT_OVRSAMP_CLK, and it is synchronous to it. To configure the clock sources and the divider parameters, use the I2S_CLK_CFG register.

The serial clock of the I2S slave block (I2S_S) is provided by the on-board I2S master device, through the I2S_IN_BITCLK pad.

Figure 15.
I2S_M clock generation (block diagram; labelled elements: sources vco1div2, pll2out and pll3out, dividers I2S_DIV1 and I2S_DIV2, the I2S_OUT_REFCLK, I2S_OUT_OVRSAMP_CLK, I2S_OUT_BITCLK and I2S_IN_BITCLK pads, I2S_M_SCLK, the I2S_M and I2S_S blocks, and the I2S_CLK_CFG fields refout_div_en, refout_div_src, refout_div_x,y, refout_div_sel, sclk_div_x,y, sclk_div_sel and sclk_div_en)

5.4.9 UART clock configuration

The UART clock can be generated by three sources:
● 48 MHz clock from USBPHY (clk_usb48 in Figure 16)
● 24 MHz clock coming from the main oscillator (osci1 in Figure 16)
● vco1div2: through the UARTx_SYNT, the vco1div2 clock is divided to generate clk_uartx_synt (refer to the UARTx_CLK_SYNT register)

To select the source, use the MISC register PERIP_CLK_CFG. The clk_uartx clock is divided internally in the UART block to generate the desired baud rate (see Chapter 25: Asynchronous serial ports (UART)).

Figure 16. UART clock generation (block diagram: uartclkx_sel (PERIP_CLK_CFG) selects clk_uartx among clk_usb48, osci1 and clk_uartx_synt, which UARTx_SYNT, configured by UARTx_CLK_SYNT, derives from vco1div2)

5.4.10 C3 clock configuration

The C3 clock (clk_c3 in Figure 17) has two sources:
● 48 MHz clock from USBPHY (clk_usb48 in Figure 17)
● vco1div2: C3_SYNT divides the vco1div2 clock to generate the clk_c3_synt (see register C3_CLK_SYNT)

To select the source, use the MISC register PERIP_CLK_CFG.

Figure 17. C3 clock generation (block diagram: c3clk_sel (PERIP_CLK_CFG) selects clk_c3 between clk_usb48 and clk_c3_synt, which C3_SYNT, configured by C3_CLK_SYNT, derives from vco1div2)

5.4.11 CLCD clock configuration

To select the CLCD clock, you must configure the following registers:
● PLL_CFG[31] to select the SSCG input clock source.
● CLCD_CLK_SSCG register to configure SSCG5.
● PERIP_CLK_CFG[3:2] to select:
  – 48 MHz clock coming from the USB PHY (clk_usb48 in Figure 18)
  – the SSCG5 clock (clk_sscg5 in Figure 18)
  – XGPIO123 primary pad
  – pll3out

The clk_clcd clock is only one option for the CLCD panel clock.
For more information, see Chapter 34: Display controller (CLCD).

Figure 18. CLCD clock generation (block diagram: clcdclk_sel (PERIP_CLK_CFG) selects clk_clcd among clk_usb48, clk_sscg5, XGPIO132 and pll3out; SSCG5, configured by CLCD_CLK_SSCG, is fed by vco1div4 or pll2out as selected by clcd_synth_sel (PLL_CFG))

5.4.12 GPT clock configuration

All four GPT prescalers (see Chapter 10: General purpose timers (GPT)) are fed by the clock clk_timer.

To select the clk_timer source, use the PERIP_CLK_CFG register, field gpt_clk_sel:
● If gpt_clk_sel = 0 (default), osci1 is selected.
● If gpt_clk_sel = 1, pclk is selected. In this case, the clk_timer is synchronous with APB pclk.

Disabling the clk_timer

When the CPUs are in debug state, the gpt_dbg_en field of the SOC_CFG register can be configured to disable the clk_timer:
00: clk_timer is not gated when CPUs enter debug state.
01: clk_timer is gated when CPU0 enters debug state.
10: clk_timer is gated when CPU1 enters debug state.
11: clk_timer is gated when either CPU0 or CPU1 enters debug state.

5.4.13 MPMC clock configuration

The memory controller uses two clock sources:
● The first clock source, used for the six AXI data ports (aclk_MPMC) and the AHB register port (hclk_MPMC), is the same as that used in the system for the AMBA interconnect; the default frequency is 166 MHz. This clock can be enabled/disabled using the mpmc_amba_clken field of the PERIP2_CLK_ENB register. The source for this clock is hclk (see Figure 12).
● The second clock source is used by the memory controller, the physical (PHY) interface, the MIM structure, and the on-board DDR memory. The ratio between the memory controller frequency and the PHY interface frequency is fixed at 1:2. The maximum frequency for the PHY interface and memory interface is 533 MHz. The memory interface has a dedicated clean clock source (clk_MPMC_ddr). This clock runs asynchronously with respect to hclk and cpu_clk.
The value can be programmed through the MISC registers PLL4_FRQ and PLL4_CTR. The controller and the PHY clock can be enabled/disabled by the mpmc_ctrl_phy_clken field of the miscellaneous register PERIP2_CLK_ENB. The MIM is used to translate the controller frequency into the PHY frequency; Figure 19 shows the memory controller clock relationships.

Figure 19. MPMC clocks scheme (block diagram: PLL4, fed by osci1 (24 MHz), generates clk_mpmc_phy (533 MHz) for the DDR PHY and DDR; a /2 stage (clock_mod) derives clk_mpmc_ctrl (266 MHz) for the MPMC, which connects to the PHY through the DFI bridge; aclk_ddr_ctrl and hclk_ddr_ctrl clock the MPMC AMBA ports)

5.4.14 Gate unit

The IP clocks can be enabled and disabled through registers PERIP1_CLK_ENB, PERIP2_CLK_ENB, and PERIP3_CLK_ENB.

Note: For the complete clock list, see Table 9: RCG clocks.

The clock enable sequence must be performed at system start, before the IP software reset is released (see PERIPx_SW_RST register description). Disabling/enabling the clock when the IP is not in reset state could produce glitches on the clock line.

5.4.15 Reset generator

The main hardware reset is asserted by the MRESETn pad. As shown in Figure 20, the reset signal passes through a filter that suppresses glitches with widths less than 9 ns. To generate the hresetn root reset, the reset is first synchronized on the osci1 clock, and then on PCLK.

Most SoC resets are generated starting from the main hresetn through a reset module (dashed box in Figure 20): the reset is asserted asynchronously with respect to the relative clock, and it is released synchronously on the same clock. Figure 21 shows the reset sequence.

Figure 20.
Reset generator (block diagram; labelled signals: MRESET through a delay (dly), sw_reset (from MISC), cache_parity_fail (from A9SM), wdog_req (from A9SM), ack_power_state and ack_power_state[3] (from PCM), PERIP_SW_RST (from MISC) and die_id_valid; synchronizers on clk1ghz, osci1 and pclk and a count-60 block; outputs resetn_cpu (to A9SM), nresets (to clocksys), hresetn, hresetn_x, presetn_x and rstn_x)

Figure 21. Reset waveform (timing diagram of pclk (sys_clk/2) @ 2 MHz, MRESETn, sw_reset, cache_parity_fail and hresetn, with an interval of 60 PCLK pulses marked)

When a power island is switched off, all of the resets of that island are asserted. This is accomplished by the PCM module through the ack_power_state[3:0] signals. For more details, see Chapter 6: Power control module (PCM).

The A9SM module can assert a reset when the internal watchdog module timer expires. When this happens, the entire SoC except the RCG, MISC, PCM and A9SM is reset and the CPUs jump to the BOOT code. For more details, see Chapter 2: CPU subsystem (A9SM).

A software reset is available by programming the SYS_SW_RES register; this reset acts like the main hardware reset.

The PERIP1_SW_RST, PERIP2_SW_RST and PERIP3_SW_RST registers are used to assert a software reset to IPs.

Table 21 summarizes the reset sources and targets.

Table 21. Reset sources

| Reset source | Source | Target | Register & reference |
|---|---|---|---|
| MRESETn | External | SoC | |
| Ack_power_state[n] | PCM | Power Island[n] | PCM_CFG |
| Sw reset | MISC | SoC | SYS_SW_RES |
| Cache_parity_fail | A9SM | SoC | A9SM_PARITY_CFG |
| wdog_req (1) | A9SM | SoC except {RCG, PCM, MISC, A9SM, USBPHY, DDRPHY, MIPHY} | See Chapter 2 |
| PERIPx_SW_RST[n] | MISC | IP[n] | PERIP1_SW_RST, PERIP2_SW_RST, PERIP3_SW_RST |

1. Note that after a watchdog reset only the CPUs and other blocks are reset (not the entire SoC), hence a software reset has to be asserted to guarantee that the system works properly.

5.5 Programming

5.5.1 Programming PLLs

1. Start the analog PLL:
   a) Set the PLLx_FRQ.
   b) Enable the PLL: register PLLx_CTR field pll_enable = 1.
   c) Wait until the PLL is locked: register PLLx_CTR field pll_lock = 1.
2. Switch to an external divider:
   a) Change the feedback divider from internal to external: register PLLx_CTR field pll_control1[5] = 1. Everything but dithering mode can be changed here (modulation period, slope).
   b) Toggle pll_control1[0] from 1 to 0, and back to 1.
   c) Wait until the PLL is locked.
   d) Change to dither mode: pll_control1[2:1].
   e) Toggle pll_control1[0] from 1 to 0, and back to 1.
3. Make modulation changes:
   a) Program dither mode to off; change the modulation period and slope as needed.
   b) Toggle pll_control1[0] from 1 to 0, and back to 1.
   c) Wait until the PLL is locked.
   d) Change to dither mode again.
   e) Toggle pll_control1[0] from 1 to 0, and back to 1.

The lock signal is meaningless during modulation.

5.5.2 Changing system modes

1. Enable the xtal and pll counters by setting the xtaltimeout_en and plltimeout_en bits of register SYS_CLK_CTRL.
2. Set the SYS_CLK_OSCITIMER timeout value for the DOZE to SLOW transition. For example, SYS_CLK_OSCITIMER = 0x100.
3. Set the SYS_CLK_PLLTIMER timeout value for the SLOW to NORMAL transition. For example, SYS_CLK_PLLTIMER = 0x100.
4. Set SYS_CLK_CTRL[sys_mode_req] = 0x2 to switch to SLOW mode.
5. Configure PLL1 at 1 GHz (see Section 5.5.1: Programming PLLs).
6. Set SYS_CLK_CTRL[sys_mode_req] = 0x4 to switch to NORMAL mode.

5.5.3 Setting cpu_clk = 600 MHz and hclk = 166 MHz

1. Configure the system to SLOW mode following steps 1 to 4 described in Section 5.5.2: Changing system modes.
2. Set PLL1 to 1 GHz.
3. Set PLL2 to 1.2 GHz.
4. Configure the SSCG4 clock to 166 MHz and wait for AMBA_CLK_SSCG[lock] = 1. For example, AMBA_CLK_SSCG[T0] = 0x603B since vco1div2 = 500 MHz.
5. Select the SSCG4 source for hclk. For example, SYS_CLK_CTRL[hclk_sel] = 1.
6. Select the PLL2 for NORMAL mode.
For example, SYS_CLK_CTRL[clksys_src] = 3'b110.
7. Switch to NORMAL mode.

5.5.4 Configuring the fractional clock generator (SSCG)

This section gives a configuration example for SSCG:

fin = 500 MHz, fout = 48 MHz

The value for To is:

To = fin / (2 * fout) = 5.2083

The corresponding fixed point value is calculated as:

To = 5.2083 * 2^14 = 85332 => To = 17'b10100110101010100

Modulation example: fm = 100 kHz, Dt = 2.5% and fin = 500 MHz

Dt = 0.025
Since LSB = 2^-9 and the field is of 8 bits:
Dt = 0.025 * 2^9 = 12.8 = 8'b00001100

fmod = fm / fin = 0.0002
Since the fm field is encoded with 8 bits and LSB = 2^-16:
fmod = 0.0002 * 2^16 = 13.1 = 8'b00001101

6 Power management

This chapter focuses on SPEAr1340 power management. For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

6.1 Overview

The SPEAr1340 device enables you to choose from among a significant number of different configurations that can optimize the overall power consumption, depending on the target application. To this end, in addition to the usual technology-related and architecture-based solutions (LP libraries, high threshold-voltage cell usage, fine-grain clock gating and power-aware synthesis flow), SPEAr1340 also employs high-level mechanisms that enable savings both on leakage and dynamic power.

The leakage and power configuration-specific savings are obtained by deploying a power shutoff strategy over a hierarchical partitioning of the design in power domains that responds to both a logical and a physical rationale. These techniques lead to the definition of shutoff modes.

6.2 Power domains

SPEAr1340 is partitioned in power domains: sections of core logic with digital supplies that can be independently managed through the deployment of embedded switches.
SPEAr1340 has five power domains:
● AlwaysON
● ARM
● CODEC
● GPU
● BUS

Figure 22 shows these power domains and the IPs present in each domain. On the left side of the figure you can see the macros with dedicated supply pads and the IOs.

Note: The power domain names reflect only a part of the supported features.

Figure 22. SPEAr1340 power islands
[Block diagram. IPs shown in each power domain:
– AlwaysOn: Always-on RAM, GMAC, GPIO, MISC, PCM, RCG, RTC, UOC
– ARM: ARM Cortex-A9, L2 cache
– BUS: A9SM register slices, ADC, BootROM, BUSMATRIX, C3, CLCD, DMAC, FSMC, GPT, I2C, I2S, KBD, MCIF, MPMC, PWM, SMI, SSP, SRAM, UART, UHC
– CODEC: PCIE/SATA (MiPHY), VDEC, VENC
– GPU: CAM, CEC, GPU, S/PDIF, VIP
Hard macros with dedicated supply pads: DDR PHY (I/Os 1V5/1V8, logic 1V2, PLL 2.5V+1.2V), RTC OSCI 32 KHz (1.5V), USB2.0 PHY 3 ports (2.5V+1.2V_3V3), PCIE/SATA PHY (MiPHY, 1 port, 2.5V+1.2V), ADC (2.5V, +1.2V power_bus), System PLLs & OSCI (1.2V + 2.5V).
I/O groups: 3V3 TTL, 3V3 TTL/1V8 CMOS, 3V3 TTL/2V5 CMOS, 3V3 PCI/TTL.]

Note:
1 The RTC is battery operated; it has a non-switchable power supply.
2 The MiPHY has a dedicated, non-switchable power supply.
3 The DDR PHY has a dedicated, board-switchable power supply.

6.2.1 Power domain management: power states

The power islands can be configured in five different states, as shown below:

Table 23. Allowed power states

| ID(1) | Power state | Comment |
|---|---|---|
| 0 | AlwaysOn-Only | All switchable power islands are OFF. DDRPHY can selectively be switched (board switches) ON or OFF. |
| 4 | ARM + BUS | – |
| 5 | ARM + CODEC + BUS | – |
| 6 | ARM + GPU + BUS | – |
| 7 | ON | All switchable power islands are ON. |

1. These IDs refer to the power state numbering in Figure 23 below.

Figure 23 is a visual representation of the allowed states and transitions. The power states marked in blue are only used internally for transitions. The end user may set as target any of the allowed power states listed in the graphic. The hardware automatically sequences the power transitions as shown below.
Sequencing is applied to guarantee power supply integrity: two power domains are never switched on simultaneously.

Figure 23. Power states transition graph
[State graph. States shown: POWERDOWN, SLEEP, ARM, GPU, ARM+BUS, GPU+BUS, ARM+BUS+CODEC, GPU+BUS+CODEC, ARM+GPU+BUS. Transitions are labeled TURN ARM ON/OFF, TURN BUS ON/OFF, TURN CODEC ON/OFF and TURN GPU ON/OFF.]

For transitions starting from power states in which the ARM core is powered on, the target state is programmed by the CPU itself. For transitions starting from power states in which no processing core is available, an alternative mechanism is provided: a 'wakeup event' triggers the load of the desired configuration. The possible sources of wakeup events are listed in the table below.

Table 24. Allowed wakeup events

| Wakeup event | Description |
|---|---|
| USB wakeup trigger | Exit from USB SUSPEND mode |
| GPIO wakeup trigger | '0' to '1' event on the dedicated GPIO |
| RTC wakeup trigger | ALARM interrupt event |
| GMAC wakeup trigger | Interrupt event generated by the reception of a valid wakeup frame (magic packet or remote wakeup frame) |

6.2.2 Power domain management: configuration registers

This section lists the registers (both IP-specific and miscellaneous) related to the power management of switchable domains. The second column shows the state of these registers after SoC power-up/reset. These settings correspond to state "RESET".

Table 25.
Power management configuration registers

| Base address | Offset | Register name | Register fields |
|---|---|---|---|
| E0700 | 100 | PCM_CFG | All the register fields are used: wakeup_en, wakeup_trig, sw_config, config_ack, config_bad, ack_power_state, ddr_phy_no_shutoff |
| E0700 | 104 | PCM_WKUP_CFG | All the register fields are used: rtc_wkup_config, gpio_wkup_config, usbdev_wkup_config, ethernet_wkup_config |
| E0700 | 108 | SWITCH_CTR | All the register fields are used: pd1_ctrl, pd2_ctrl, pd3_ctrl, pd4_ctrl |
| E0700 | 310 | PERIP2_CLK_ENB | mpmc_amba_clken |
| E0700 | 31C | PERIP2_SW_RST | mpmc_amba_swrst |
| E0700 | 200 | SYS_CLK_CTRL | sys_mode_req |
| E0700 | 214 | PLL1_CTR | pll_enable |
| E0700 | 220 | PLL2_CTR | pll_enable |
| E0700 | 22C | PLL3_CTR | pll_enable |
| E0700 | 238 | PLL4_CTR | pll_enable |
| EC000 | 018 | MPMC_CTRL_REG_06 | pwrup_srefresh_exit |
| EC000 | 02C | MPMC_CTRL_REG_11 | selfrefresh, start |
| EC000 | 318 | MPMC_CTRL_REG_129 | cke_status |
| EC060 | 400 | GPIODIR | Fields 3 and 2 |
| EC060 | 030 | GPIODATA | Fields 3 and 2 |

6.2.3 Power management procedures

This section describes the software procedures for:
1. Activating transitions that do not require wake-up mechanisms (software-driven through the BUS)
2. Activating transitions that require wake-up mechanisms through one of the wake-up sources (GMAC, RTC, GPIO, USB)
3. Managing the SUSPEND-TO-RAM feature when the target state is State 0 (AlwaysOn-Only)

Procedure 1: Activating transitions that do not require wake-up mechanisms

Note: You can use this procedure to switch to any state other than the AlwaysOn-Only (0000) state (see Table 23).

1. Clear the fields of the PCM_CFG register by setting:
a) config_ack to '0'
b) config_bad to '0'
c) sw_config to a value equal to ack_power_state
2. Set field sw_config of register PCM_CFG to the desired value.
3. Poll config_ack for acknowledgment that the power state transition has been executed.
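The three steps of Procedure 1 can be sketched in C as shown below. This is only an illustration under stated assumptions: this manual does not give the bit positions of sw_config, ack_power_state, config_ack and config_bad inside PCM_CFG (they are defined in RM0089), so the shifts and masks are placeholders. The helper name pcm_request_state, its return codes and the polling bound are hypothetical. The sketch also assumes, as the procedure wording suggests, that config_ack and config_bad are cleared by writing '0'.

```c
#include <stdint.h>

/* PLACEHOLDER field layout (assumption): take the real PCM_CFG bit
 * positions from RM0089 before using anything like this on hardware. */
#define PCM_SW_CONFIG_SHIFT  0u
#define PCM_SW_CONFIG_MASK   (0xFu << PCM_SW_CONFIG_SHIFT)
#define PCM_ACK_STATE_SHIFT  4u
#define PCM_ACK_STATE_MASK   (0xFu << PCM_ACK_STATE_SHIFT)
#define PCM_CONFIG_ACK       (1u << 8)
#define PCM_CONFIG_BAD       (1u << 9)

/* Hypothetical helper. Returns 0 when config_ack is seen, -1 when
 * config_bad is seen, -2 when max_polls reads pass without either. */
int pcm_request_state(volatile uint32_t *pcm_cfg, uint32_t target,
                      uint32_t max_polls)
{
    uint32_t v, current, i;

    /* Step 1: clear config_ack and config_bad, and load sw_config with
     * the state currently reported in ack_power_state. */
    v = *pcm_cfg;
    current = (v & PCM_ACK_STATE_MASK) >> PCM_ACK_STATE_SHIFT;
    v &= ~(PCM_CONFIG_ACK | PCM_CONFIG_BAD | PCM_SW_CONFIG_MASK);
    v |= (current << PCM_SW_CONFIG_SHIFT) & PCM_SW_CONFIG_MASK;
    *pcm_cfg = v;

    /* Step 2: request the target power state through sw_config. */
    v &= ~PCM_SW_CONFIG_MASK;
    v |= (target << PCM_SW_CONFIG_SHIFT) & PCM_SW_CONFIG_MASK;
    *pcm_cfg = v;

    /* Step 3: poll for the acknowledge; the loop is bounded so that the
     * sketch cannot hang if the acknowledge never arrives. */
    for (i = 0; i < max_polls; i++) {
        uint32_t s = *pcm_cfg;
        if (s & PCM_CONFIG_BAD)
            return -1;
        if (s & PCM_CONFIG_ACK)
            return 0;
    }
    return -2;
}
```

Passing the register as a pointer keeps the sketch testable on a host: with an ordinary variable nothing sets config_ack, so the call returns the timeout code. On the device, the pointer would refer to the PCM_CFG register listed in Table 25.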
The config_bad field is used for debug purposes: it indicates that an illegal state has been requested.

Procedure 2: Activating transitions that require wake-up mechanisms

Note: You can use this procedure to switch to the AlwaysOn-Only (0000) state (see Table 23).

1. Clear the fields of the PCM_CFG register and enable peripherals for wake-up by setting:
a) the wakeup_en field of the desired wake-up peripheral to '1'
b) the wakeup_trig field to '0000'
c) config_ack to '0'
d) config_bad to '0'
e) sw_config to a value equal to ack_power_state
2. Prepare the desired configuration for wake-up by setting one or more of the following fields of register PCM_WKUP_CFG:
a) rtc_wkup_config if RTC is enabled for wake-up
b) gpio_wkup_config if GPIO is enabled for wake-up
c) usbdev_wkup_config if USBDEV is enabled for wake-up
d) ethernet_wkup_config if ETHERNET is enabled for wake-up
3. Set field sys_mode_req of register SYS_CLK_CTRL to '010' to switch the system to SLOW mode.
4. Set the pll_enable fields of registers PLL*_CTR to '0' to power down all the PLLs.
5. Set field sw_config of register PCM_CFG to '0000'.
6. To wake the system up, trigger the wake-up event from the chosen external source.

Procedure 3: Managing the SUSPEND-TO-RAM feature when the target state is State 0

Note: You can use this procedure to switch to the AlwaysOn-Only state (0000) (see Table 23) while preserving the contents of the external DDR module.

1. Put the DDR in self-refresh mode by setting field srefresh (offset 16) of MPMC register MPMC_CTRL_REG_11 to '1'.
2. Check the CKE signal by reading bit cke_status (offset 8) of MPMC_CTRL_REG_129: the bit must be '0'.
3.
To avoid corruption of the CKE and RESET signals of the external DDR module when the memory controller is powered off, activate control from GPIO by setting:
a) gpioa_clken of MISC register PERIP1_CLK_ENB to '1'
b) fields 3 and 2 of GPIOA register GPIODIR to '1' to configure the needed IOs as outputs
c) fields 3 and 2 of GPIOA register GPIODATA to '11' to drive a '1' on the selected IOs.
This enables the on-board logic that forces the CKE and RESET signals of the external DDR to the appropriate values, allowing the external DDR to remain in self-refresh mode regardless of the status of the memory controller (MPMC) inside the SoC.
4. Stop the memory controller by setting field start (offset 24) of MPMC register MPMC_CTRL_REG_11 to '0'.
5. Latch the MPMC register values in Always-on RAM, to save the result of the levelling procedure.
6. Reset and clock-gate the AMBA interface of the MPMC by setting:
a) mpmc_amba_clken of MISC register PERIP2_CLK_ENB to '0'
b) mpmc_amba_swrst of MISC register PERIP2_SW_RST to '1'
7. Apply Procedure 2 described above.
8. Set field sys_mode_req of register SYS_CLK_CTRL to '100' to switch the system to NORMAL mode.
9. Remove reset and clock gating from the AMBA interface of the MPMC by setting:
a) mpmc_amba_clken of MISC register PERIP2_CLK_ENB to '1'
b) mpmc_amba_swrst of MISC register PERIP2_SW_RST to '0'
10. Restore the values from Always-on RAM to reprogram the controller. During this operation, make sure that both the start and srefresh bits are set to '0'.
11. To drive the CKE signal to the external DDR, release the on-board logic by setting field 2 of GPIOA register GPIODATA (offset 0x30) to '0'.
12. Set bit pwrup_srefresh_exit (offset 8) of MPMC_CTRL_REG_06 to '1'.
13. Restart the memory controller by setting the start field of MPMC register MPMC_CTRL_REG_11 to '1'.
14. Check that bit cke_status of register MPMC_CTRL_REG_129 is set to '1'.
15.
To drive the RESET signal to the external DDR, release the on-board logic by setting field 3 of GPIOA register GPIODATA (offset 0x30) to '0'.

6.3 Clock power management

The reset and clock generator (RCG) provides the system clocks and resets. It is highly configurable through the miscellaneous registers. The RCG can be configured in three different states, as shown in Table 26.

Table 26. Clock power states

| ID | State | Comment |
|---|---|---|
| 1 | DOZE | The system clock source is the RTC oscillator (nominally @32 KHz). |
| 2 | SLOW | The system clock source is the main oscillator (nominally @24 MHz). |
| 3 | NORMAL | The system clock source is the PLL1 (nominally @1 GHz) or SSCG. |

To change the power state, configure the sys_mode_req field of the miscellaneous register SYS_CLK_CTRL as follows:
– 3'b001 for DOZE state
– 3'b010 for SLOW state
– 3'b100 for NORMAL state

To change the clock source within each state, configure the same register appropriately.

Note: For more information on clock configuration, refer to Chapter 5: Reset and clock generator (RCG).

6.4 IP power management

6.4.1 Standard IP power management

This section provides power management information for the IPs that do not have specific power management procedures.

For standard IPs, two power states are generally available: DISABLED and OPERATIVE (see Table 27 below). Both are programmable through the miscellaneous registers (MISC).

Table 27. Standard IPs power states

| ID | State | Comment |
|---|---|---|
| 1 | DISABLED | IP under reset, clock disabled |
| 2 | OPERATIVE | IP not under reset, clock enabled |

● To enable/disable the clock, configure the PERIP1_CLK_EN and PERIP2_CLK_EN registers.
● To activate/deactivate reset, configure the PERIP1_SW_RST and PERIP2_SW_RST registers.

Note: See also: RM0089, Reference manual, SPEAr1340 address map and registers for the description of the MISC registers.

6.4.2 USBPHY power management

The USBPHY is the physical interface of the USB subsystem.
It can be configured in two different states, as shown in Table 28.

Table 28. USBPHY power states

| ID | State | Comment |
|---|---|---|
| 1 | SUSPEND | In this mode the PHY clock, along with the 48 MHz clock, shuts off. |
| 2 | OPERATIONAL | Normal operational mode |

Table 29 lists the registers (both IP-specific and miscellaneous) related to the power management of USBPHY. The second column shows the state of these registers after SoC power-up/reset.

Table 29. USBPHY power management-related registers

| Address | Value after reset | Register name |
|---|---|---|
| 0xE0700314 | 0x0000183B | USBPHY_GEN_CFG |

USBPHY configuration procedure

This section describes how to configure power options for the USBPHY.

Changing the device state to SUSPEND

No register needs to be programmed to put the USBPHY in the SUSPEND state: the USBPHY enters SUSPEND automatically when there is no activity on the USB line. All three ports must receive the SUSPENDM signal from their attached controllers to reach this state. To further reduce the power consumption in the SUSPEND state, however, the following register setting is used.

    # MISC SETTINGS (USBPHY)
    1. Set address 0xE0700314 to value |= 0x1

This sets the COMMONONN signal to '1'. When the USBPHY is in the SUSPEND state, this setting powers down all the USBPHY internal blocks that are common to the three individual PHYs (XO bias, PLL).

When a wake-up occurs, the USBPHY leaves the SUSPEND state. However, to power up the PLL and XO bias again (that is, to make all clocks available), COMMONONN must be written back to '0'. This can be done as follows:

    Set address 0xE0700314 to value &= ~0x1

6.4.3 MPMC/DDR PHY power management

The JEDEC standard for DDR3 memories defines two main modes beyond the operational one: self-refresh and power-down. To enter these modes, the corresponding commands must be used.

The self-refresh command can be used to retain data in the DDR3 SDRAM even if the rest of the system is powered down.
When in self-refresh mode, the DDR3 SDRAM retains data without external clocking. Once the DDR3 SDRAM has entered self-refresh mode, all the external control signals, except for CKE and RESET#, are "don't care". To keep the DRAM in self-refresh mode, RESET# must be kept high and CKE low.

The self-refresh mode can be used to change the clock frequency at which the MPMC operates: the memory controller must stop processing requests, the clock must be adjusted, the memory controller's timing parameters must be reprogrammed, and then the memory controller can be restarted. To retain the data in DRAM during this process, the memory can be put in self-refresh mode via a self-refresh command.

The power-down mode is entered synchronously when CKE is registered low (along with a NOP or Deselect command). CKE is not allowed to go low while a Mode register set command, MPR operations, ZQCAL operations, DLL locking or read/write operations are in progress. CKE is allowed to go low while other operations, such as row activation, precharge or auto-precharge, and refresh, are in progress, but the power-down IDD specification does not apply until those operations have finished.

Entering the power-down mode disables all the input and output buffers, including CK, CK#, ODT, CKE, and RESET#, and the DRAM content will be lost.

Self-refresh mode is useful when the user wishes to power down or reset the MPMC without disturbing the contents of memory. For the memory not to be erased, the CKE and RESET# signals must remain constant. As the MPMC is not able to drive these signals while it is powered down or in reset, the system handles this responsibility through on-board logic driven by the GPIO IP. When the MPMC has been restored to the active state, it regains control over these signals.

When the CKE signal is de-asserted, the memory enters self-refresh. This must occur before the reset signal is asserted to the MPMC or the MPMC is powered down.
If the CKE signal is not released first, the memory may be left in an unknown state.

When power is re-applied to the MPMC or the reset signal is released, the MPMC must be informed of the type of wakeup required: a full initialization, or a memory wakeup in which the memory devices are just pulled out of self-refresh. This information is conveyed to the MPMC through the pwrup_srefresh_exit parameter.

If the pwrup_srefresh_exit parameter is cleared to 'b0, the MPMC performs a full memory initialization. If the pwrup_srefresh_exit parameter is set to 'b1, the controller exits power-down mode by executing a self-refresh exit instead of the full memory initialization. This parameter provides a means to skip full initialization when the DRAM devices are in a known self-refresh state.

The MPMC puts the attached DRAM device(s) in self-refresh mode via the srefresh parameter. When it is set, the current memory burst for the current transaction (if any) completes, all banks are closed, the self-refresh command is issued to the DRAM, and the memory clock enable signal is de-asserted. The system remains in self-refresh mode until this parameter is cleared to 'b0. The DRAM devices return to normal operating mode after the self-refresh exit delay of the device and any DLL initialization time for the DRAM have elapsed. The memory controller then resumes processing commands from the point of interruption.

When a self-refresh exit command is executed, an automatic refresh is requested. Setting the srefresh_exit_no_refresh bit inhibits this automatic refresh request.

The MPMC puts the attached DRAM device(s) in power-down mode via the power_down parameter. When this parameter is set to 'b1, the memory controller completes processing of the current memory burst for the current transaction (if any), issues a precharge all command and then disables the clock enable signal to the DRAM devices.
Any subsequent commands in the command queue are suspended until this parameter is cleared to 'b0. The DRAM command will be lost in this case.

6.4.4 PCIE/SATA/MIPHY power management

The PCIE/SATA controllers can be disabled (clock-gated and reset) for minimum dynamic consumption, as for standard IPs. This automatically implies the minimum power consumption state for the MIPHY (reset by the controller). This state is also reached when the PCIE power island is powered down.

The power configuration of the PCIE and SATA controllers and of the shared MIPHY physical layer can be set through the link interface. Refer to the standard procedures for link power management, which can be found in the following documents:
● PCI Express® Base Specification revision 2.0
● Serial ATA revision 3.0

6.4.5 ADC power management

The ADC can be configured in only one state, as shown in Table 30.

Table 30. ADC power state

| ID | State | Comment |
|---|---|---|
| 1 | POWER DOWN | ADC macro is in power down mode |

Table 31 lists the registers (both top level and IP level) related to the power management of the ADC. The second column shows the state of these registers after SoC power-up/reset.

Table 31.
ADC power management-related registers

| Address | Value after reset | Register name |
|---|---|---|
| 0xE0700274 | 0x0000003F | PERIP1_CLK_EN[30], adc_clken |
| 0xE070027C | 0xFFFFFFC0 | PERIP1_SW_RST[30], adc_swrst |
| 0xE0080000 | 0x00000000 | ADC_STATUS |

ADC configuration procedure

To force the device into the POWER DOWN state:

    # MISC SETTINGS (ADC)
    Set address 0xE070027C to value 0xBFFFFFC0
    Set address 0xE0700274 to value 0x4000003F
    # ADC STATUS SETTINGS
    Set address 0xE0080000 to value 0x00000000
    # MISC SETTINGS (ADC)
    Set address 0xE0700274 to value 0x0000003F

6.5 Voltage regulators

SPEAr1340 has three internal voltage regulators that generate a 2V5 supply output from a 3V3 supply input:
● MIPHY single-lane regulator: the voltage controlled by this regulator is internally connected to the MIPHY supply, but it is also externally visible on MIPHY_S_0_VDD2PLL2V5. This regulator is always active; it cannot be bypassed.
● VREG1 regulator: used only for USB. This regulator is always active (its power down pin is connected to constant 0).
● VREG2 regulator: used for all PLLs (PLL1, PLL2, PLL3, DDR PLL), ADC and OTP. This regulator is switchable; its power down pin is controlled through a hardwired connection to a dedicated PCM output (see the PCM core description in Section 6.6.2: Functional description).

Note: See also: Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU, Electrical characteristics chapter.

6.6 Power control module (PCM)

The PCM is the core of the SPEAr1340 leakage power management system. Its role is to properly manage the power supply shutoff of the switchable sections of the embedded MPU.

This section describes the structure and functionality of the PCM so that the end user can fully understand the effect of the power domain-related power management procedures on the hardware.
Note: For the PCM feature list, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

Figure 24. PCM block diagram
[Block diagram showing the three PCM sub-blocks (configuration funnel, PCM core, domain checker) with their MISC, PAD and direct connections. The signals shown are described in Table 32.]

Note: For the pin description, refer to Table 32: PCM internal pins.

6.6.1 Pins

Table 32 lists the PCM internal pins. For the description of the external pins, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

Table 32. PCM internal pins

| Pin name | Direction | Description |
|---|---|---|
| clk_i | In | – |
| resetn_i | In | – |
| slow_clk_i | In | – |
| slow_resetn_i | In | – |
| FW_config | In | Standard power configuration driven from registers belonging to MISC |
| {USB, GPIO, RTC, GETH}_config | In | Wakeup power configuration driven from registers belonging to MISC |
| wakeup_en | In | Wakeup enable signals driven from registers belonging to MISC |
| wakeup_trig | In | Wakeup triggers from peripherals (USB, GPIO, RTC, GMAC). The GPIO trigger comes from IO. |
| USBPHY_suspend | In | Indicates the SUSPEND status of the USB PHY |
| BUSMATRIX_ack | In | Acknowledges the request for interconnect matrix shutdown |
| BUSMATRIX_req | Out | Request for interconnect matrix shutdown |
| V_is_ok_vector_i | In | Outputs of the voltage detectors of the 4 supply domains. Each element indicates whether the power domain it represents has reached the functional voltage level ('1') or not ('0'). |
| DDR_1V2_ok_i | In | Output of the voltage detector of the DDRPHY power domain |
| acknowledge_o | Out | Indicates that the power configuration driven from the MISC registers has been acknowledged. The configuration should not change again until the previous one has been acknowledged. |
| shutoff_vector_o | Out | Vector that directly drives the power switches. One element for each power domain. '1' means powerdown (open the switch). |
| isolate_vector_o | Out | Vector that drives the isolation cells. One element for each power domain. '1' means the isolate functionality is active. |
| bad_o | Out | Indicates that the last configuration requested is a bad one (not belonging to the set of allowed configurations). |
| power_state_o | Out | Outputs the last power configuration that has been received AND served correctly. |
| DDR_1V2_shutoff_o | Out | Controls the (optional) external (i.e. board) power switch on the DDRPHY 1V2 supply line. |
| DDR_1V8_shutoff_o | Out | Controls the (optional) external (i.e. board) power switch on the DDRPHY 1V5/1V8 supply line. |
| reg_powerdown_o | Out | Controls the shutdown of the 2V5 voltage regulator that supplies the system PLLs and ADC. |

6.6.2 Functional description

The power control module coordinates the control activities related to shutoff mode management. It is divided into three main sub-blocks:
● the PCM core: the main state machine. This block takes as input a new power island configuration (a bit vector containing the desired status for each of the available power islands of SPEAr1340) and the current status of the power islands. It outputs the control signals for the power island switches and for the isolation cells (see the PCM core subsection below).
It also controls the shutdown of the VREG2 voltage regulator, related to VREG2_2V5_OUT. The regulator is automatically powered down when all of the switchable power islands are switched off, and automatically powered up upon wakeup. This control can be bypassed by programming the MISC register PCM_CFG (see MISC registers in RM0089, Reference manual, SPEAr1340 address map and registers).
● a configuration funnel: a muxing block that selects the source of the current configuration to be fed to the PCM core (see the Configuration funnel subsection below).
● a domain checker: this block processes (synchronizes and debounces) the outputs of the voltage detectors that report the current status of each power island (see the Domain checker subsection below).

PCM core

The PCM core (Figure 25) comprises the following:
● the main state machine: a small sequencer for controlling the external power supply switches (for the DDR physical layer power supplies), and
● a configuration sequencer: this component can internally limit the number of transitions between power states (where a power state is simply one of the shutoff modes mentioned before), while still giving the user full accessibility from any power state to any other.

Figure 25. PCM core block diagram
[Block diagram: the configuration sequencer receives config_vector_i and passes filtered_config_vector to the PCM FSM; an external switch control block, driven by shutoff(0), generates DDR1V2_OFF and DDR1V8_OFF. Other signals shown: clk_i, resetn_i, V_is_ok_vector_i, shutoff_vector_o, isolate_vector_o, ack_power_state_o, reg_powerdown_o, ack_o, bad_o.]

Figure 26 shows the relevant interface timing; these signals comply with the following rules:
● The external power configuration command must remain stable at least until acknowledged.
● On domain shutoff, acknowledge must be asserted after V_is_OK states that power has been effectively shut off (last internal event of the shutoff procedure).
● On domain power-up, acknowledge must be asserted after the isolation line has been de-asserted (last internal event of the power-up procedure).
● On shutdown, first issue the isolate command, then issue the shutoff command.
● When powering back up (shutoff back to 0), wait for V_is_OK before releasing the isolation logic.

Figure 26. Relevant PCM core interface timing

Configuration funnel

Figure 27 shows the configuration funnel sub-block structure.
Figure 28 describes the implemented selection function.

Figure 27. Configuration funnel block diagram
[Block diagram: a MUX selects among GETH_config, USB_config, GPIO_config, RTC_config and FW_config and feeds a register that outputs config_vector. The SELECTOR (selection function) is driven by wakeup_en, wakeup_trig, ack_power_state, USBPHY_suspend and BUSMATRIX_ack, and generates BUSMATRIX_req. Connections are marked as MISC, PCM or direct connections.]

Figure 28. Configuration funnel selection flow graph
[Flow graph starting at START. Decision nodes: Is device in ALWAYSON? Is FW requiring alwayson? Is FW requiring BUS OFF? Is BUS MATRIX idle? Is USB PHY suspended? Is USB ENABLED to wakeup? Actions: Propagate FW configuration; Propagate "0000" configuration; Send Request to BUSMATRIX; if an enabled peripheral triggers a wakeup event, propagate its preloaded configuration, else keep the current configuration stable.]

Domain checker

The domain checker sub-block provides a simple interface to the voltage detector outputs: it resynchronizes, debounces and, in general, processes the signals that report the current status (ON/OFF) of the related power supply rails.

Figure 29. Domain checker block diagram
[Block diagram: one debouncer for each of V_core_ok_1, V_core_ok_2, V_core_ok_3, V_core_ok_4 and DDR_1v2_ok_i, plus a DDR2/3 OK counter (for 1V8) driven by shutoff_vector_i(0); the outputs form domain_ok_vector_o. Each debouncer synchronizes Vok from the switch and propagates '1' if '1' has been stable for the last N cycles, '0' if '0' has been stable for the last N cycles; otherwise it holds the previous value.]

7 BootROM

This chapter describes the device startup sequence from power-on to bootloader execution, and provides an overview of the SoC internal mechanisms (both hardware and software) after the device resets.
For the BootROM feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU

For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

7.1 Overview

BootROM is the booting firmware prestored in the on-chip 32 KB ROM of SPEAr1340.

On power-on, the processor comes out of reset and fetches the first instruction from the RESET vector location. For ARM processors, the high order reset vector is at 0xFFFF0000, where the start of the BootROM memory is mapped. In this way, the ARM processor starts fetching and executing instructions from the BootROM.

The main tasks performed by the BootROM code are:
● performing basic hardware initialization
● selecting the booting device; BootROM selects the booting device after reset by reading the status of the STRAP[3:0] pins
● locating the next level bootloader on the booting device
● validating the next level bootloader image
● passing control to the next level bootloader (execution of the next level bootloader)
● taking care of error cases

7.1.1 Useful terms and definitions

This section lists some useful terms and their definitions.
● BootROM is the very first firmware code to be fetched and executed by the ARM Cortex A9 core when it comes out of reset.
● Bootloading is the activity of locating and loading the next level of software during the boot activity.
● X-Loader is a piece of code that corresponds to the first level of bootloading. It may reside in non-volatile memory (Flash), or be loaded from an external peripheral; it is located and authenticated by BootROM. Its primary task is to configure the DRAM controller with the proper settings according to the timing of the external memory part.
● XIP (execute in place) is a method in which firmware code can be fetched/executed by the CPU directly from the nonvolatile memory in which it resides.
● Code-shadowing is a software technique in which external firmware code is copied from a memory that does not support in-place execution (XIP), like NAND Flash, to a different memory that enables direct execution, like static RAM.
● OTP is one-time programmable memory that can be customized for the final application during the manufacturing process.
● SYSROM is the 32 KB read-only memory embedded in the device at address 0xFFFF0000, where the BootROM code resides.
● SYSRAM0 is the 32 KB static RAM embedded in the device at address 0xB3800000 that is used by the BootROM code to store its global and local variables during execution.
● ALWAYS-ON state is a power state in which the embedded MPU is almost completely powered off. The exceptions are a small island including the wake-up logic and SRAM memory, and the DDR, which is powered on in self-refresh mode. The OS state resides permanently in memory and the system can be resumed quickly.
● UOC is the USB On-The-Go controller.

7.2 Functional description

BootROM has several features, such as:
● Security feature: BootROM enables the security feature provided by the cryptographic co-processor (C3) through OTP. After that, BootROM receives images encrypted with a security key, reads the key from OTP and decrypts the images. If the image decryption is successful, execution proceeds. If the image is not encrypted using the security key, the decryption fails. For information on how to enable the security feature, see Section 7.2.2: OTP configuration.
● Data CRC (DCRC): BootROM validates the next-level bootloader images through data CRC. Data CRC is enabled through OTP. When enabled, a CRC is computed on the image data. If the calculated image CRC matches the one in the header, execution proceeds. For information on how to enable data CRC, see Section 7.2.2: OTP configuration.
● Wake-up triggering: In order to implement power control, BootROM supports wake-up triggering. When the SoC is powered up and BootROM execution starts, BootROM checks whether the power-up is the result of wake-up triggering. If any of the PCM_CFG[9:5] bits is set, BootROM jumps to "Always-on RAM" at address 0xE0800000.
● Pen Holding mechanism: When the Cortex A9 comes out of reset, both of its cores start fetching instructions from the reset vector location, that is, from address 0xFFFF0000, where the 32 KB SYSROM memory is located. The pen holding mechanism is used to stop Core 1 while Core 0 continues booting. See also: Section 7.4.1: BootROM on Core 1.

7.2.1 Hardware components used

BootROM uses SYSRAM0 to copy the first level bootloader, Xloader. Depending on the booting mode selected, the source of Xloader may be Serial NOR Flash, Parallel NOR Flash, NAND Flash, SD/MMC card, UART0 or USB OTG. For more information on supported booting devices, see Section 7.2.3: Boot device selection.

Figure 30 illustrates the BootROM start-up sequence.

Figure 30. BootROM start-up sequence
[Diagram of the boot stages. Labels: BootROM (high vectors, embedded in SPEAr), XLoader (SYSRAM), 2nd level bootloader, external code device.]

7.2.2 OTP configuration

The SPEAr1340 OTP module embeds three 255-bit banks: Bank 1, Bank 2 and Bank M. Bank M contains one predefined bit (255) used for test purposes, and dedicated bits controlled by the BootROM.

Table 33 shows how Bank M is mapped in the OTP.

Note: Refer to Chapter 9: One-time programmable antifuse (OTP) for information on OTP banks.

Table 33.
OTP Bank M configuration

Bits  Fields      Offset   Description
1     XXXX        255      1 bit reserved for blowing at final test
2     S1-S0       254-253  1 bit + 1 redundancy for encryption key section enable
2     V1-V0       252-251  1 bit + 1 redundancy for vendor ID section enable
2     J1-J0       250-249  1 bit + 1 redundancy for JTAG disable
2     T1-T0       248-247  1 bit + 1 redundancy for TEST disable
2     E1-E0       246-245  Reserved
2     C1-C0       244-243  1 bit + 1 redundancy for data CRC check enable
2     U1-U0       242-241  Reserved
137   Security    240-104  128 bits + 9 ECC for encryption key (see Note 1)
72    USB/PCI ID  103-32   64 bits + 8 ECC for USB/PCI vendor IDs (see Note 2)
16    WP B2       31-16    8 bits + 8 redundancy for masking OTP Bank 2
16    WP B1       15-0     8 bits + 8 redundancy for masking OTP Bank 1

Note:
1. To enable the security feature, set either of the security bits S0 or S1.
2. The OTP USB/PCI vendor IDs section can be programmed by users who wish to provide their own customized vendor and product IDs for USB. The following C structure highlights the fields of the 72-bit USB/PCI ID section (in little endian):

    /*
     * This OTP structure keeps all the USB and PCIe information that might be
     * customized by the final user.
     */
    struct otp_basic {
        u16 otp_usb_vid;
        u16 otp_usb_pid;
        u16 otp_pci_vid;
        u16 otp_pci_pid;
        u8 otp_ecc; /* an H(127, 120) code is required, with 7 bits of ECC */
    };

See also: Section : OTP section access on page 145.

7.2.3 Boot device selection
The device has seven external strapping pins (STRAP[6:0]) that are sampled by internal hardware logic during the power-on reset sequence and latched in the BOOTSTRAP_CFG miscellaneous register (0xE0700004).
After latching, the STRAP[6:0] pins can be reused for other purposes. When used as output pins, they require no special conditions. When used as input pins, the application must keep them in a non-driving (tri-state) mode for at least 2 µs after MRESETn is released.
Note: For the description of STRAP[6:0] pins, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

The pins STRAP[3:0] select the internal booting method (see Table 34 below). By reading the status of these pins, the BootROM selects one of the following boot devices:
● Serial NOR Flash
● Parallel NOR Flash
● NAND Flash
● SD/MMC card
● UART0
● OTG as USB device
Table 34 describes SPEAr1340 boot selection.
Note: For pad configuration details, refer to RM0089, Reference manual, SPEAr1340 address map and registers, Miscellaneous chapter, “Pad configuration options” table.

Table 34. Hardware boot selection (STRAP[0..3])

Primary source               Backup source(1)   STRAP3  STRAP2  STRAP1  STRAP0
Bypass                       na                 0       0       0       0
Serial NOR Flash             USB OTG (Device)   0       0       0       1
NAND Flash                   USB OTG (Device)   0       0       1       0
Parallel NOR Flash (8-bit)   USB OTG (Device)   0       0       1       1
Parallel NOR Flash (16-bit)  USB OTG (Device)   0       1       0       0
UART0                        na                 0       1       0       1
rfu(2)                       na                 0       1       1       0
rfu                          na                 0       1       1       1
USB OTG (Device)             na                 1       0       0       0
Serial NOR Flash             UART0              1       0       0       1
NAND Flash                   UART0              1       0       1       0
Parallel NOR Flash (8-bit)   UART0              1       0       1       1
Parallel NOR Flash (16-bit)  UART0              1       1       0       0
MMC/SD memory card           na                 1       1       0       1
rfu                          na                 1       1       1       0
rfu                          na                 1       1       1       1

1. The backup source is used if the primary source is not available.
2. Reserved for future use.

7.2.4 Software architecture
BootROM code is the first piece of software executed after the power-on phase. It is logically divided into three main parts:
● System initialization: this part is common to all boot modes.
● Boot device initialization and code-shadowing: this part is different for each boot mode. It consists of initializing the boot device and copying the Xloader image from that device to SRAM memory.
● Xloader authentication and execution: this part is common to all boot modes.
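Returning to the boot device selection of Section 7.2.3, the decoding in Table 34 can be sketched as a 16-entry lookup table. This is only an illustration: the enum and function names are hypothetical, and the function takes the four strap bits already packed as (STRAP3 << 3) | (STRAP2 << 2) | (STRAP1 << 1) | STRAP0, because the position of these bits inside BOOTSTRAP_CFG is not described in this chapter.

```c
#include <assert.h>
#include <stdint.h>

/* Boot sources named in Table 34. The identifiers are illustrative only. */
typedef enum {
    BOOT_SRC_NONE,     /* "na" in Table 34 */
    BOOT_SRC_BYPASS,
    BOOT_SRC_SNOR,
    BOOT_SRC_NAND,
    BOOT_SRC_PNOR8,
    BOOT_SRC_PNOR16,
    BOOT_SRC_UART0,
    BOOT_SRC_USB_OTG,
    BOOT_SRC_SDMMC,
    BOOT_SRC_RFU       /* reserved for future use */
} boot_src_t;

typedef struct {
    boot_src_t primary;
    boot_src_t backup;
} boot_sel_t;

/* One entry per STRAP[3:0] value, transcribed row by row from Table 34. */
static const boot_sel_t strap_table[16] = {
    /* 0000 */ { BOOT_SRC_BYPASS,  BOOT_SRC_NONE    },
    /* 0001 */ { BOOT_SRC_SNOR,    BOOT_SRC_USB_OTG },
    /* 0010 */ { BOOT_SRC_NAND,    BOOT_SRC_USB_OTG },
    /* 0011 */ { BOOT_SRC_PNOR8,   BOOT_SRC_USB_OTG },
    /* 0100 */ { BOOT_SRC_PNOR16,  BOOT_SRC_USB_OTG },
    /* 0101 */ { BOOT_SRC_UART0,   BOOT_SRC_NONE    },
    /* 0110 */ { BOOT_SRC_RFU,     BOOT_SRC_NONE    },
    /* 0111 */ { BOOT_SRC_RFU,     BOOT_SRC_NONE    },
    /* 1000 */ { BOOT_SRC_USB_OTG, BOOT_SRC_NONE    },
    /* 1001 */ { BOOT_SRC_SNOR,    BOOT_SRC_UART0   },
    /* 1010 */ { BOOT_SRC_NAND,    BOOT_SRC_UART0   },
    /* 1011 */ { BOOT_SRC_PNOR8,   BOOT_SRC_UART0   },
    /* 1100 */ { BOOT_SRC_PNOR16,  BOOT_SRC_UART0   },
    /* 1101 */ { BOOT_SRC_SDMMC,   BOOT_SRC_NONE    },
    /* 1110 */ { BOOT_SRC_RFU,     BOOT_SRC_NONE    },
    /* 1111 */ { BOOT_SRC_RFU,     BOOT_SRC_NONE    }
};

/* Returns the primary and backup boot source for a packed STRAP[3:0] value. */
static boot_sel_t decode_strap(uint8_t strap)
{
    assert(strap < 16u);
    return strap_table[strap];
}
```

For example, decode_strap(0x9) yields Serial NOR Flash as the primary source with UART0 as the backup. A table keeps the sketch in one-to-one correspondence with the rows of Table 34, which makes it easy to check by eye.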
SYSROM code memory map
The 32 KB SYSROM memory is located at address 0xFFFF0000, where the ARM Cortex A9 starts fetching just after reset. The BootROM code resides here. Figure 31 shows the memory map of the SYSROM.

Figure 31. SYSROM memory map (regions listed from the top address down, each with its start address)

0xFFFF7FFF  End of SYSROM
0xFFFF7F00  Version Table
0xFFFF7E00  Security Table
0xFFFF5000  Security Code
0xFFFF4C38  unused
0xFFFF4A4C  rw data
0xFFFF4398  ro data
0xFFFF0020  Text
0xFFFF0000  Vector Table

SYSRAM0 memory map
At start-up, the DDR memory is not initialized. Instead, SPEAr1340 has a 32 KB SYSRAM0 memory that BootROM uses for the following purposes:
● Stack
● BSS
● RW data
● Code-shadowing
● Others: Pen holding and internal watchdog reset
Figure 32 shows the memory map of the SYSRAM0.

Figure 32. SYSRAM0 memory map (regions listed from the top address down, each with its start address)

0xB3807FFF  End of SYSRAM0
0xB3801500  Code Shadowing Area
0xB38007F4  RAM Variables
0xB3800608  RW data
0xB3800600  Watchdog Reset and Pholding (0xB3800600 to 0xB3800607)
0xB3800000  Stack

7.2.5 System initialization
When the SoC is powered up and BootROM starts executing, it first performs basic system initialization. These steps are the same for every boot mode. Figure 33 shows the tasks performed during system initialization.

Figure 33. System initialization
(Flowchart not reproduced. Recoverable labels: SYSTEM RESET, BootROM entry point; System initialization: disables MMU, invalidates I-Cache and enables it; Wake-up triggered?; Jump to ALWAYS-ON RAM location; Watchdog reset?; Set bit 0 of SYS_SW_RES register to trigger complete SoC reset; Initialize PLL, set up stack, initialize data segment and BSS; Security enabled?; Execute security functions; Security execution successful?; Check boot type; Proceed to individual boot modes; HANG.)

Figure 33 is explained as follows:
1. As a first step, the memory management unit (MMU) is disabled. BootROM does not use virtual memory, so the MMU is not needed. In order to improve system performance the instruction cache (I-Cache) is initialized.
To do so, BootROM invalidates the I-Cache and then enables it.
2. If wake-up is triggered by any of the sources, the code jumps to the ALWAYS-ON RAM memory location (0xE0800000). This sequence is needed to resume from sleep. BootROM expects the wake-up code to be already present in the ALWAYS-ON RAM. This RAM is in the ALWAYS-ON domain and hence is not powered off during sleep. The wake-up code in the ALWAYS-ON RAM area is placed there by higher-layer software before going to sleep, and is responsible for restoring the system to its original state. Even if the security feature is enabled, BootROM simply passes control to the ALWAYS-ON RAM, on the assumption that the code comes from a trusted source. No security checks are performed.
3. If wake-up is not triggered, the reset may be an internal watchdog reset. In this case, BootROM writes the ID of the watchdog that caused the reset (WD0 or WD1) to the watchdog reset location in SYSRAM0 (0xB3800604) and triggers a complete SoC reset by setting bit 0 of the SYS_SW_RES miscellaneous register (0xE0700204).
4. If none of the above is true, this is a normal system start-up. BootROM initializes the PLL, sets up the stack, and initializes the BSS and data segments. Refer to Section : PLL initialization for more details.
5. If either of the security bits (S1 or S0) is set in OTP (see Section 7.2.2: OTP configuration), BootROM executes the initial security functions. If security execution is successful, BootROM proceeds to the next step. Otherwise, it hangs. For example, if bit 4 of the SOC_CFG miscellaneous register (0xE0700000) is set, it corresponds to an unsupported test mode, so the BootROM hangs. As the timer is required by almost all boot modes, BootROM also initializes the timer.
6. BootROM reads the BOOTSTRAP_CFG miscellaneous register (0xE0700004) to determine the STRAP pins configuration. According to the boot type, BootROM jumps to the corresponding boot mode.
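The PLL setup performed in step 4 follows the formulas given in the PLL initialization section below. As a hedged illustration, the arithmetic can be sketched in C. The field positions used by pll_decode (M in bits [31:24], P in bits [10:8], N in bits [7:0]) are an assumption inferred only from the example value 0x11000201 and the stated values M = 0x11, P = 0x2, N = 0x1; the register description in RM0089 is the authority. All function and type names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PLL_FIN_HZ 24000000u   /* Fin = 24 MHz (osci1) */

typedef struct {
    uint32_t m;   /* 8 to 255 */
    uint32_t n;   /* 1 to 7   */
    uint32_t p;   /* 0 to 6   */
} pll_cfg_t;

/* ASSUMED layout: M = bits [31:24], P = bits [10:8], N = bits [7:0]. */
static pll_cfg_t pll_decode(uint32_t frq)
{
    pll_cfg_t c;
    c.m = (frq >> 24) & 0xFFu;
    c.p = (frq >> 8) & 0x7u;
    c.n = frq & 0xFFu;
    return c;
}

/* Fref = Fin / N, Fvco = Fref * 2 * M, Fout = Fvco / 2^P. */
static uint64_t pll_fout_hz(pll_cfg_t c)
{
    assert(c.m >= 8u && c.m <= 255u);
    assert(c.n >= 1u && c.n <= 7u);
    assert(c.p <= 6u);

    uint64_t fref = PLL_FIN_HZ / c.n;
    uint64_t fvco = fref * 2u * c.m;   /* 64-bit: can exceed 2^32 for large M */
    return fvco >> c.p;
}
```

With the value 0x11000201 this gives Fvco = 816 MHz and Fout = 204 MHz, from which the CPU, AHB and APB clocks follow as Fout / 2, Fout / 6 and Fout / 12.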
PLL initialization
When the system comes out of reset, its clock source is osci1 (24 MHz). BootROM must take two factors into consideration while configuring the clocks of the various subsystems:
● The clock frequencies of the individual subsystems must be within their limits: for instance, the C3 maximum frequency is 48 MHz.
● The power consumed by the SoC must be the minimum possible.
The aim in programming the PLL is therefore to configure the minimum supported frequency. PLL programming is done through the M, N and P parameters, which can take a defined set of values.

Fout = Fvco / 2^P
Fvco = Fref * 2 * M
Fref = Fin / N
Fin = 24 MHz
where:
M can be from 8 to 255
N can be from 1 to 7
P can be from 0 to 6

On SPEAr1340, PLL1_FRQ is configured as 0x11000201, where:
● M = 0x11
● P = 0x2
● N = 0x1
Resulting in:
● Fref = 24 / 1 = 24 MHz
● Fvco = (24 * 2 * 17) = 816 MHz
● Fpll = (816 / (2^2)) = 204 MHz
This leads to:
● CPU Freq = Fpll / 2 = 102 MHz
● AHB Freq = Fpll / 6 = 34 MHz
● APB Freq = Fpll / 12 = 17 MHz

7.2.6 Boot device initialization and code-shadowing
After basic system initialization, BootROM initializes the IP used to access the boot device. The steps involved are:
● Pad configuration
SPEAr1340 pads are multiplexed between different IPs. By default, all multiplexed pads are in input mode. Hence, the pads related to a particular IP must be configured as that IP requires (essentially, setting the direction of each pad).
● Controller configuration
The controller used to access the device is initialized. Additionally, if the boot device uses a synthesizer for the device clock, BootROM configures it.
● Device access
To access the device and get Xloader, BootROM performs the following tasks:
a) Initializes the boot device (if required)
b) Copies the Xloader header (64 bytes) from the device to the stack
c) Authenticates the header
d) Copies the Xloader image from the boot device to SRAM

Table 35 summarizes the major configuration differences for each boot mode.

Table 35. IP configuration

Boot mode    Controller  Clock source (controller)         Clock frequency (MHz)  Image source address
Boot Bypass  SMI         SMI clock = HCLK/SMI_PRESCALER    8.5                    0xE6000000
SNOR         SMI         SMI clock = HCLK/SMI_PRESCALER    8.5                    0xE6000000
PNOR         FSMC        HCLK                              34                     0xA0000000
NAND         FSMC        HCLK                              34                     0xB0800000
SD/MMC       MCIF        MCIF XY Synthesizer               40.78                  The first primary partition must be a FAT partition. The image name must be xloader.img.
UART         UART0       Osci1                             24                     Sent by host using Kermit protocol
USB OTG      UDC         ohci_clk48_i                      48                     Sent by flashing utility

The following sections provide IP configuration details for each boot mode.

Boot Bypass
Boot Bypass uses the serial NOR device. In this mode, no validation or authentication is performed on the image data; the code jumps directly to the image load address found in the image header. The image header itself is still authenticated.
When using Boot Bypass, it is the sole responsibility of the user to ensure that the image being executed comes from a trusted source.
Note: In this mode, security should be disabled. If security is enabled and Boot Bypass is selected, the SoC hangs.

Pad configuration
To access the serial NOR device, the following pads are enabled:
● SMI_DATAIN
● SMI_DATAOUT
● SMI_CLK
● SMI_CS0n (this pin should be used for booting from serial NOR)
● SMI_CS1n

Controller configuration
The SMI controller is used to access serial NOR devices.
These are the configuration options:
● Bank used: Bank 0
● Controller clock: 8.5 MHz
● Chip deselect time: 300 ns

Device access
In Boot Bypass mode, the image header (first 64 bytes) is copied word-by-word from the SNOR Flash to the stack and authenticated. If the header authentication is successful, the code jumps directly to the image load address (which should lie in the SNOR memory area).

Serial NOR (SNOR)
The serial NOR boot mode is used to boot from serial NOR Flash.

Pad configuration
See Section : Boot Bypass.

Controller configuration
See Section : Boot Bypass.

Device access
Once the configuration is complete, the header (first 64 bytes) is copied word-by-word from the SNOR Flash to the stack and authenticated. If the header authentication is successful, the load address is extracted from it and the entire image is copied word-by-word from the SNOR Flash to the load address (in SRAM).

Parallel NOR (PNOR)
BootROM supports booting from 8-bit and 16-bit PNOR devices. The FSMC controller is used to access the PNOR Flash.
Pad configuration
Pads enabled for 8-bit PNOR devices:
● FSMC_AD0 - FSMC_AD25
● FSMC_RB0
● FSMC_ALE_AD17
● FSMC_CE0n (this pin should be used for booting from parallel NOR)
● FSMC_CE1n
● FSMC_CLE_AD16
● FSMC_REn
● FSMC_RSTPWDWN0
● FSMC_RSTPWDWN1
● FSMC_RWPRT0n
● FSMC_RWPRT1n
● FSMC_WEn
● FSMC_IO0 - FSMC_IO7
Pads enabled for 16-bit PNOR devices:
● FSMC_AD0 - FSMC_AD25
● FSMC_RB0
● FSMC_ALE_AD17
● FSMC_CE0n (this pin should be used for booting from parallel NOR)
● FSMC_CE1n
● FSMC_CLE_AD16
● FSMC_REn
● FSMC_RSTPWDWN0
● FSMC_RSTPWDWN1
● FSMC_RWPRT0n
● FSMC_RWPRT1n
● FSMC_WEn
● FSMC_IO0 - FSMC_IO15

Controller configuration
FSMC control register for Bank0 (GenMemCtrl0)
● Wait check during the first data access is enabled
● Reset / power-down signal is sent to the Flash memory
● Type of memory specified as PNOR
● Bank0 is enabled
Timing register (GenMemCtrl_tim0)
● Duration of address state phase: 5 HCLK cycles
● Duration of hold address phase: 5 HCLK cycles
● Duration of Data_ST phase: 67 HCLK cycles
● Burst turn around duration: 5 HCLK cycles
● Burst cycle length: 5 HCLK cycles
● Data latency used: 6 HCLK cycles

Device access
The Xloader header and image data are copied using the library copy function. The copy granularity depends on the device width:
● 8-bit PNOR: byte-by-byte
● 16-bit PNOR: half-word by half-word

NAND Flash
BootROM expects the Xloader to be present in one of the first four blocks (block 0 to block 3) of the NAND device. It uses a pure skip-block algorithm, in which bad blocks are skipped, starting from block 0 up to block 3. If BootROM does not find Xloader in any of these four blocks, it jumps to the default boot mode.
SPEAr1340 supports booting from a variety of NAND devices, both legacy NAND devices and newer Open NAND Flash Interface (ONFI) devices (see Section 7.4.3: List of supported devices).
The FSMC controller is used to access the NAND Flash.
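The skip-block search described above can be sketched as follows. This is a simplified model, not the BootROM code: the state of the four candidate blocks is passed in as two hypothetical bitmasks (bit b of bad_mask set means block b is marked bad; bit b of hdr_mask set means block b starts with a valid Xloader header), standing in for the real bad-block check and header authentication.

```c
#include <assert.h>
#include <stdint.h>

#define NAND_BOOT_BLOCKS 4u   /* Xloader is searched in blocks 0 to 3 only */

/*
 * Returns the index of the first good block that holds a valid Xloader
 * header, or -1 if none of blocks 0..3 qualifies (the caller would then
 * fall back to the default boot mode).
 */
static int nand_find_xloader(uint8_t bad_mask, uint8_t hdr_mask)
{
    for (unsigned b = 0; b < NAND_BOOT_BLOCKS; b++) {
        if (bad_mask & (1u << b)) {
            continue;               /* skip-block: bad blocks are never read */
        }
        if (hdr_mask & (1u << b)) {
            return (int)b;
        }
    }
    return -1;
}
```

For instance, with block 0 bad and a valid header in block 1, nand_find_xloader(0x1, 0x2) returns 1; with all four blocks bad it returns -1.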
Pad configuration
Pads enabled for all 8-bit NAND devices:
● FSMC_RB0
● FSMC_ALE_AD17
● FSMC_CE0n (this pin should be used for booting from NAND)
● FSMC_CE1n
● FSMC_CLE_AD16
● FSMC_REn
● FSMC_RSTPWDWN1
● FSMC_RWPRT0n
● FSMC_RWPRT1n
● FSMC_WEn
● FSMC_IO0 - FSMC_IO7
Pads enabled for all 16-bit NAND devices:
● FSMC_RB0
● FSMC_ALE_AD17
● FSMC_CE0n (this pin should be used for booting from NAND)
● FSMC_CE1n
● FSMC_CLE_AD16
● FSMC_REn
● FSMC_RSTPWDWN1
● FSMC_RWPRT0n
● FSMC_RWPRT1n
● FSMC_WEn
● FSMC_IO0 - FSMC_IO15

Controller configuration
FSMC control register for Bank0 (GenMemCtrl0):
● Wait sensitivity is activated
● NAND is selected as memory type
● CLE to RE delay is set as Tclk * 3 (see Note 1)
● ALE to RE delay is set as Tclk * 3 (see Note 1)
● The NAND device is enabled
Note: 1. Tclk = 29.4 ns (HCLK is set to 34 MHz).

Timing registers
To detect NAND devices reliably, the timing registers of the FSMC controller are initialized with appropriate timing values. The two timing registers to be configured are:
● GenMemCtrl_Comm0 (Timing register for common mode and NAND bank 0)
● GenMemCtrl_Attrib0 (Timing for PCcard attribute mode and wait mode for NAND bank 0)
BootROM initializes the following timing settings in these registers. The values are hardcoded in the BootROM code and cannot be tuned. They are deliberately relaxed so that booting is possible from any NAND device. In later stages, after BootROM, these values can be changed to suit the NAND device used.
● THIZ = 0x01; the resulting time is THIZ = 29.4 ns
● THOLD = 0x04; the resulting time is THOLD = 117.65 ns
● TWAIT = 0x06; the resulting time is TWAIT = 205.9 ns
● TSET = 0x00; the resulting time is TSET = 29.4 ns
Hence, both the GenMemCtrl_Comm0 and GenMemCtrl_Attrib0 registers are initialized with the value 0x01040600.
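As a small worked check of the value above, the four timing fields can be packed into one 32-bit word. The byte positions used here (THIZ in bits [31:24], THOLD in [23:16], TWAIT in [15:8], TSET in [7:0]) are an assumption inferred from the order of the fields and the combined value 0x01040600; the function names are hypothetical and the register description in RM0089 is the authority.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED layout: THIZ [31:24], THOLD [23:16], TWAIT [15:8], TSET [7:0]. */
static uint32_t fsmc_pack_timing(uint8_t thiz, uint8_t thold,
                                 uint8_t twait, uint8_t tset)
{
    return ((uint32_t)thiz << 24) |
           ((uint32_t)thold << 16) |
           ((uint32_t)twait << 8) |
           (uint32_t)tset;
}

/* HCLK period in picoseconds (truncated), e.g. 34 MHz -> about 29.4 ns. */
static uint64_t hclk_period_ps(uint32_t hclk_hz)
{
    assert(hclk_hz != 0u);
    return 1000000000000ULL / hclk_hz;
}
```

Packing THIZ = 0x01, THOLD = 0x04, TWAIT = 0x06 and TSET = 0x00 reproduces 0x01040600, and a 34 MHz HCLK gives a period of 29411 ps, matching the Tclk of 29.4 ns quoted in the note.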
Note: For a detailed description of FSMC timing requirements, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU, “Timing characteristics” chapter.

Device access
Once the controller has been configured, the next step is to detect the Flash chip present. The procedure is as follows:
1. The reset command (0xFF) is issued.
2. The device ID is read by issuing command 0x90.
3. The device ID is read again to rule out bus hold and interface problems. If the two ID reads match, a NAND device is present on the board.
4. The manufacturer ID is read as well.
5. The device ID is compared with the static table in BootROM memory. This table contains the IDs of old NAND devices. If the device ID read from the chip is found in the static table, the device parameters such as page_size, block_size, and so on are read from the table. Otherwise, the chip is probed as an ONFI-compliant Flash.
6. Once the chip parameters are known, the NAND chip is read page-by-page.
7. The first 64 bytes constitute the image header. Once the header is validated, the entire image is read page-by-page. Refer to Section 7.2.8: Image header authentication for details on header authentication.

SD/MMC
BootROM supports booting from SD and MMC cards.
Note: The file system used by the card is the FAT file system.
Pad configuration
Pads enabled for SD/MMC:
● MCIF_nCE_SD_MMC
● MCIF_DATA_DIR
● MCIF_SD_CMD
● MCIF_LEDS
● MCIF_ADDR1_CLE_CLK
● MCIF_nCD_SD_MMC
● MCIF_DMARQ_RnB_WP
● MCIF_DATA0
● MCIF_DATA[1:3]_SD
● MCIF_DATA[4:7]

Controller configuration
Peripheral register (PERIP_CFG miscellaneous register)
● Configure MCIF for SD/MMC card
MCIF clock synthesizer (MCIF_SD_CLK_SYNT miscellaneous register)
● Set X parameter as 1
● Set Y parameter as 8
● Output clock synthesizer frequency: Fout = Fin * X / (2 * Y) for 50% duty cycle
For instance, Fout = 12.65 MHz

Device access
The first step after configuring the controller is to determine the card type: SD or MMC. This is done by sending the card a command that exists in the SD specification but is not supported by MMC cards. BootROM uses the command CMD8 to check whether the SD card supports version 2 of the SD specification (the first version to define the high capacity bit), then tests for that bit with CMD41. If the card does not support version 2 of the SD specification, CMD41 is still issued (without requesting the high capacity bit) to differentiate between a standard capacity SD card and an MMC card. The following flowchart shows this procedure.

Figure 34. SD/MMC card detection sequence
(Flowchart not reproduced. Recoverable labels: Start; CMD (req HC); CMD (req HD); CMD, check bits; timeout; OK; bit cleared; bit set; SD v.x or MMC or no card; MMC or no card; SD standard; SD high capacity; MMC standard; MMC high density; No card; Proceed with card configuration. The command numbers were lost in extraction.)

Once the card type has been determined, image data is read from it using the standard SD/MMC card reading procedures. The protocol used for both SD and MMC is based on transactions in which the host initiates a transfer by sending a command and the card responds with status information, the actual data and a CRC.
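The SD branch of the card-type detection described above can be sketched as a pure decision function. This is a simplified model under stated assumptions: the three boolean inputs stand for the outcome of CMD8, the outcome of CMD41, and the high capacity bit returned by CMD41, and a CMD41 timeout is treated as "MMC or no card" without modeling the further MMC probing shown in Figure 34. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CARD_SD_STANDARD,
    CARD_SD_HIGH_CAPACITY,
    CARD_MMC_OR_NONE      /* needs further MMC probing, not modeled here */
} card_kind_t;

/*
 * cmd8_ok:  the card answered CMD8, so it supports SD specification v2
 *           and CMD41 is sent with the high capacity request.
 * cmd41_ok: the card answered CMD41 (a command MMC cards do not support).
 * hc_bit:   high capacity bit in the CMD41 response; only meaningful
 *           when the high capacity request was made (cmd8_ok is true).
 */
static card_kind_t classify_card(bool cmd8_ok, bool cmd41_ok, bool hc_bit)
{
    if (!cmd41_ok) {
        return CARD_MMC_OR_NONE;
    }
    if (cmd8_ok && hc_bit) {
        return CARD_SD_HIGH_CAPACITY;
    }
    return CARD_SD_STANDARD;
}
```

A card that answers both commands with the high capacity bit set is classified as SD high capacity; a card that times out on CMD8 but answers CMD41 is a standard capacity SD card.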
To read a particular FAT sector, the SD/MMC driver sets the start address to read from and the number of bytes to be read (for a FAT sector, this is always 512 bytes).

UART
When UART booting is selected, UART0 is configured for a baud rate of 115200 bps. In UART boot mode, BootROM supports data transfer using only the Kermit protocol.

Pad configuration
Pads enabled for UART0:
● UART0_RXD
● UART0_TXD

Controller configuration
● Select OSC1 (24 MHz) as UART0 clock source
● Baud rate divisor values are set for 115200 bps
● Line control registers are configured for:
– 8-bit word length
– No parity
– 1 stop bit
– FIFO enabled

Device access
As mentioned above, in UART boot mode the Kermit protocol is used to transfer data. Data is sent in the form of Kermit packets. The data in each packet is enclosed between a packet header and footer. Refer to the Kermit protocol manual for more details on data transmission: https://www-vs.informatik.uni-ulm.de/teach/ws05/rn1/Kermit%20Protocol.pdf
Each data byte transmitted by the host is received in UART0’s data register (UARTDR, 0xE0000000). As in the other boot modes, if the header authentication is successful, BootROM starts copying the image data to the SRAM memory.

USB OTG
USB OTG boot mode is selected by setting the STRAP[3] pin (STRAP[3:0] = 1000, see Table 34). USB boot mode is particularly useful because it is used (in conjunction with the Flashing utility) to burn the next-level bootloaders and/or the operating system onto the SNOR, PNOR and NAND Flash chips.
The USB OTG supports 2 modes of operation: slave mode and DMA mode. For USB booting, the SPEAr device is configured in slave mode to perform transfers over a bulk endpoint.

Pad configuration
There is no pad multiplexing for USB IP pins. No pad needs to be enabled for USB OTG boot mode.

Controller configuration
The following steps are performed during USB OTG initialization.
1. All test registers and state machines in the USB 2.0 nanoPHY are reset.
2.
UHC1 port’s transmit and receive logic are reset.
3. BootROM waits until the USB 2.0 nanoPHY PLL is locked.
4. OTG HCLK is reset and enabled.
5. Vendor ID (VID)/Product ID (PID) are initialized: if either of OTP bits 251 or 252 is set, BootROM reads the VID and PID from the OTP memory. Otherwise, it uses the default VID (0x0483) and PID (0x3802). Then, BootROM prepares USB string descriptors with the following data:
– MANUFACTURER - ST MICROELECTRONICS
– PRODUCT_NAME - ST SPEAr SoC Family
– DEVICE_ID - As read from the DIE_ID_3 and DIE_ID_4
After that, BootROM initializes the USB OTG controller registers and prepares the core for device mode. The following registers are configured for this:
● Global AHB configuration register (GAHBCFG)
– Periodic TxFIFO is marked completely empty
– IN Endpoint TxFIFO is marked completely empty
● All interrupts except the following are masked:
– OUT Endpoints Interrupt
– USB Reset
– Enumeration Done
– Receive FIFO Non-Empty
● Soft disconnect is removed by setting bit 1 in the Device control register (DCTL)
Once this configuration is done, BootROM waits for the USB RESET interrupt. The RESET interrupt is raised when the device is connected to the host using an OTG cable. Once the RESET interrupt is received, the generic USB device enumeration procedure is performed and the device is enumerated as a HIGH SPEED DEVICE. The following tables list the descriptors used.

Table 36. USB device descriptors

Field               Length (bits)  Offset (bits)  Hex Value  Description
bLength             8              0              0x12       Descriptor size is 18 bytes.
bDescriptorType     8              8              0x01       DEVICE descriptor type
bcdUSB              16             16             0x0200     USB Specification version 2.00
bDeviceClass        8              32             0x00       Each interface specifies its own class information.
bDeviceSubClass     8              40             0x00       Each interface specifies its own subclass information.
bDeviceProtocol     8              48             0x00       The device does not use class-specific protocols on a device basis.
bMaxPacketSize0     8              56             0x40       Maximum packet size for endpoint zero is 64.
idVendor            16             64             0x0483     Vendor ID is 1155: STMicroelectronics.
idProduct           16             80             0x3802     Product ID is 14338.
bcdDevice           16             96             0x0100     The device release number is 1.00.
iManufacturer       8              112            0x01       The manufacturer string descriptor index is 1.
iProduct            8              120            0x02       The product string descriptor index is 2.
iSerialNumber       8              128            0x03       The serial number string descriptor index is 3.
bNumConfigurations  8              136            0x01       The device has 1 possible configuration.

Table 37. USB configuration descriptors

Field                Length (bits)  Offset (bits)  Hex Value  Description
bLength              8              0              0x09       Descriptor size is 9 bytes.
bDescriptorType      8              8              0x02       CONFIGURATION descriptor type
wTotalLength         16             16             0x0020     The total length of data for this configuration is 32. This includes the combined length of all the descriptors returned. Warning: The value of wTotalLength is not equal to real length
bNumInterfaces       8              32             0x01       This configuration supports 1 interface.
bConfigurationValue  8              40             0x01       The value 1 should be used to select this configuration.
iConfiguration       8              48             0x00       The device does not have the string descriptor describing this configuration.
bmAttributes         8              56             0xC0       Configuration characteristics: Bit 7: Reserved (set to 1); Bit 6: Self-powered (set to 1); Bit 5: Remote Wakeup (set to 0). Note: The rest of the bits are reserved and set to 0.
bMaxPower            8              64             0x00       The maximum power consumption of the device in this configuration is 0 mA.

Table 38. USB interface descriptors

Field               Length (bits)  Offset (bits)  Hex Value  Description
bLength             8              72             0x09       Descriptor size is 9 bytes.
bDescriptorType     8              80             0x04       INTERFACE descriptor type
bInterfaceNumber    8              88             0x00       The number of this interface is 0.
bAlternateSetting   8              96             0x00       The value used to select the alternate setting for this interface is 0.
bNumEndpoints       8              104            0x02       The number of endpoints used by this interface is 2 (excluding endpoint zero).
bInterfaceClass     8              112            0x00       Unknown class
bInterfaceSubClass  8              120            0x00       The subclass code is 0.
bInterfaceProtocol  8              128            0x02       The protocol code is 2.
iInterface          8              136            0x00       The device does not have a string descriptor describing this interface.

Table 39. USB IN endpoint descriptors

Field             Length (bits)  Offset (bits)  Hex Value  Description
bLength           8              144            0x07       Descriptor size is 7 bytes.
bDescriptorType   8              152            0x05       ENDPOINT descriptor type
bEndpointAddress  8              160            0x81       This is an IN endpoint with endpoint number 1.
bmAttributes      8              168            0x02       Types: Transfer: BULK; Pkt Size Adjust: No
wMaxPacketSize    16             176            0x0200     Maximum packet size for this endpoint is 512 Bytes. If High-Speed, 0 additional transactions per frame.
bInterval         8              192            0x00       The polling interval value is every 0 Frames. Undefined for High-Speed.

Table 40. USB OUT endpoint descriptors

Field             Length (bits)  Offset (bits)  Hex Value  Description
bLength           8              200            0x07       Descriptor size is 7 bytes.
bDescriptorType   8              208            0x05       ENDPOINT descriptor type
bEndpointAddress  8              216            0x02       This is an OUT endpoint with endpoint number 2.
bmAttributes      8              224            0x02       Types: Transfer: BULK; Pkt Size Adjust: No
wMaxPacketSize    16             232            0x0200     Maximum packet size for this endpoint is 512 Bytes. If High-Speed, 0 additional transactions per frame.
bInterval         8              248            0x00       The polling interval value is every 0 Frames. If High-Speed, 0 uFrames/NAK.

Table 41. USB string descriptors

Field         Length (bits)  Offset (bits)  Hex Value  Description
String Descriptor 1
bLength       8              0              0x28       Unicode String Length is 40 bytes (19 chars).
bUnicodeType  8              8              0x03       Second Byte of Unicode
STRING        8              16             0x53       String: ST SPEAr SoC Family
String Descriptor 2
bLength       8              0              0x04       Descriptor size is 4 bytes.
bUnicodeType  8              8              0x03       Second Byte of this descriptor
wLANGID[0]    16             16             0x0409     Language Id: 1033
String Descriptor 3
bLength       8              0              0x24       Unicode String Length is 36 bytes (17 chars).
bUnicodeType  8              8              0x03       Second Byte of Unicode
STRING        8              16             0x30       String: SPEAr

Device access
In the USB boot mode, SPEAr acts as a USB device. BootROM uses the following custom protocol for data transmission between the host and the device:
1. The first packet sent by the host is a 64-byte packet. These 64 bytes constitute the image header.
2. The length of all other packets, except the last one, is equal to 512 bytes.
3. The last packet length is less than or equal to 512 bytes.
Note: BootROM validates the first packet (image header). If the validation is not successful, it ignores all subsequent packets. Otherwise, it stores the image at the load address (present in the image header).

7.2.7 Xloader authentication and execution
Once the image is loaded in the SRAM, it is authenticated. Depending upon the OTP configuration, the following checks are possible (in the order written below):
1. Data CRC verification (with Xloader devoted field) and RSA PUBLIC KEY signature verification. If data CRC is enabled, the CRC check is done against the image data.
2. Image decryption: if security is enabled, the image is decrypted using the security key from OTP.
If both of these checks are successful, the I-cache is invalidated and the code jumps to the load address present in the image header (ih_load).
It is possible for the Xloader to return to the BootROM code. If this happens, Xloader returns to the BootROM with the address of the next-level bootloader (possibly u-boot). This address lies in the memory map of the boot mode selected. For instance, if SNOR boot mode is selected and Xloader returns to the BootROM, it returns with an address lying in the SNOR memory location where the next-level bootloader is present.
Again, BootROM copies the image header, validates it, copies the image data, then validates and executes the image. In this way, it is possible to execute multiple images from BootROM.

Error scenarios
If the data CRC check or image decryption fails:
● For memory boot modes (SNOR, PNOR, NAND) the default boot mode is executed. Refer to Section 7.2.9: Default boot mode for details.
● For peripheral boot modes (UART, SD/MMC, USB) no action is taken. The SoC should be restarted by the user.
Note: Security checks are also applied in the default boot mode.

BootROM flow summary
The following flowchart summarizes the complete BootROM software design.

Figure 35. BootROM flowchart
(Flowchart not reproduced. It combines the system initialization flow of Figure 33 with the per-boot-mode handling: Boot Bypass check, header authentication, reception of the complete Xloader image, data CRC check, security validation, jump to the image load address, fallback to the default boot mode, and HANG.)

Here is a brief description of the flow:
1. BootROM does the basic system initialization.
2.
After initialization, the BOOTSTRAP_CFG miscellaneous register (0xE0700004) is read to get the boot mode selected.
3. If it is a Boot Bypass, the SMI controller is initialized and the image header is read from the SMI Flash (0xE6000000). Once the header authentication is successful, the code jumps to the load address (present in the header). Refer to Section 7.2.8: Image header authentication for details on header authentication.
4. If it is not a Boot Bypass, the image header is read from the source. The source can be any of the following: SNOR Flash, PNOR Flash, NAND Flash, MMC card, SD card, the Kermit protocol (UART boot) or USB packets (USB boot).
5. The image header is authenticated and the load address is extracted from it. BootROM then copies the complete image from the source to the load address.
6. If data CRC is enabled, the data CRC check is performed.
7. If security is enabled, the image is decrypted using the security key and further security checks are performed.
8. Finally, the code jumps to the image load address.

Error scenarios
● If the initial security function execution fails, the SoC hangs.
● Boot Bypass requires security to be disabled. If security is enabled and Boot Bypass is selected, the SoC hangs.
● If the header authentication fails, the default boot mode is triggered.
● If the DCRC check fails, the default boot mode is triggered.
● If the image decryption or any other security check fails, the default boot mode is triggered.

7.2.8 Image header authentication
The image header is 64 bytes long.
It has the following structure:

typedef struct image_header {
    uint32_t ih_magic;            /* Image Header Magic Number */
    uint32_t ih_hcrc;             /* Image Header CRC Checksum */
    uint32_t ih_time;             /* Image Creation Timestamp */
    uint32_t ih_size;             /* Image Data Size */
    uint32_t ih_load;             /* Data Load Address */
    uint32_t ih_ep;               /* Entry Point Address */
    uint32_t ih_dcrc;             /* Image Data CRC Checksum */
    uint8_t  ih_os;               /* Operating System */
    uint8_t  ih_arch;             /* CPU architecture */
    uint8_t  ih_type;             /* Image Type */
    uint8_t  ih_comp;             /* Compression Type */
    uint8_t  ih_name[IH_NMLEN];   /* Image Name */
} image_header_t;

Figure 36 describes the image header authentication logic.

Figure 36. Header authentication flow
[Flowchart not reproduced. Decisions shown, in order: ih_magic = IH_MAGIC?; calculated hcrc = ih_hcrc?; ih_load >= 0xB2800000 and < bootrom_mem_end?. The two outcomes are "Invalid header, return FAILURE" and "Valid header, return SUCCESS".]

The magic number is defined as:

#define IH_MAGIC 0x27051956

To be considered valid, the image header should follow the rules below:
1. ih_magic should match the value IH_MAGIC.
2. ih_hcrc should match the calculated CRC of image_header_t.
3. ih_load should not fall between the SYSRAM0 start address and bootrom_mem_end.

When data CRC is enabled, ih_dcrc is used to validate the complete X-Loader image. It should match the calculated CRC of the image.

Note: bootrom_mem_end is the SYSRAM0 address up to where the BSS region goes. It is less than 0xB3801500.

7.2.9 Default boot mode

BootROM executes the default boot mode only if one of the following primary boot modes fails:
● Boot Bypass
● SNOR boot
● NAND boot
● PNOR boot

The system needs to be reset if one of the following primary boot modes fails:
● UART boot
● SD/MMC boot
● USB boot

Therefore, depending upon the bootstrap pin configuration, USB booting or UART booting is triggered in case any failure occurs in the primary boot mode.
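As an illustration only, the three header rules of Section 7.2.8 and the default boot mode decision of this section can be sketched in C as follows. Several points are assumptions made for the sketch and are not stated in this manual: the CRC is taken to be a plain CRC-32 computed over the header with ih_hcrc zeroed, the header fields are read in host byte order, IH_NMLEN is taken as 32 (implied by the 64-byte header size), and every function and enum name is hypothetical. The memory bounds are passed in as parameters rather than hard-coded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IH_MAGIC 0x27051956u
#define IH_NMLEN 32 /* assumption: 64-byte header minus 32 bytes of fixed fields */

typedef struct image_header {
    uint32_t ih_magic, ih_hcrc, ih_time, ih_size, ih_load, ih_ep, ih_dcrc;
    uint8_t  ih_os, ih_arch, ih_type, ih_comp;
    uint8_t  ih_name[IH_NMLEN];
} image_header_t;

/* Assumption: bitwise CRC-32 (reflected polynomial 0xEDB88320). */
uint32_t crc32_sketch(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Assumption: the header CRC is computed with the ih_hcrc field zeroed. */
uint32_t header_crc(const image_header_t *hdr)
{
    image_header_t tmp = *hdr;
    tmp.ih_hcrc = 0;
    return crc32_sketch(&tmp, sizeof tmp);
}

/* Returns 1 when rules 1 to 3 hold, 0 otherwise. */
int header_is_valid(const image_header_t *hdr,
                    uint32_t sysram0_start, uint32_t bootrom_mem_end)
{
    if (hdr->ih_magic != IH_MAGIC)
        return 0;                               /* rule 1: magic number */
    if (hdr->ih_hcrc != header_crc(hdr))
        return 0;                               /* rule 2: header CRC */
    if (hdr->ih_load >= sysram0_start && hdr->ih_load < bootrom_mem_end)
        return 0;                               /* rule 3: BootROM RAM area */
    return 1;
}

/* Hypothetical names for the boot modes and fallback outcomes. */
typedef enum { BOOT_BYPASS, BOOT_SNOR, BOOT_PNOR, BOOT_NAND,
               BOOT_UART, BOOT_SDMMC, BOOT_USB } boot_mode_t;
typedef enum { FALLBACK_WAIT_FOR_RESET, FALLBACK_USB_BOOT,
               FALLBACK_UART_BOOT } fallback_t;

/* Peripheral boot modes wait for a user reset; the other modes fall back
 * to USB or UART boot depending on the bootstrap configuration. */
fallback_t default_boot_action(boot_mode_t failed, int usb_is_default)
{
    switch (failed) {
    case BOOT_UART:
    case BOOT_SDMMC:
    case BOOT_USB:
        return FALLBACK_WAIT_FOR_RESET;
    default:
        return usb_is_default ? FALLBACK_USB_BOOT : FALLBACK_UART_BOOT;
    }
}
```

A caller would run header_is_valid() on the header read from the boot source and, on failure, act on the value returned by default_boot_action().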
Figure 37 explains the default boot mode behavior.

Figure 37. Default boot mode flow
[Flowchart not reproduced. Default boot mode triggered: if the current boot mode is UART, USB or SD/MMC, do nothing and let the user reset the SoC; otherwise go to USB boot mode if USB is configured as the default boot mode, else go to UART boot mode.]

7.3 Secure boot

7.3.1 Overview

The first stage boot is the ROM code inside the SPEAr SoC. Once the SPEAr is provisioned with security information, and the secure mode is enabled in the OTP, the device requires cryptographically signed code for the second stage boot. It is up to the second stage boot to provide security for the following stages. Services are provided to the second stage that allow code to use the same cryptographic functions for the following stages. Alternatively, the second stage can provide its own security model.

7.3.2 First stage secure boot process

The boot process for secure boot requires a signed and encrypted image. Device boot will fail if the image supplied does not have a signature, or if it has a corrupted or invalid signature. The device will not retry the boot process if an invalid image is detected. A device reset is required to resume the boot process.

The boot image is protected by an RSA PKCS#1 v2.1 digital signature, and the boot code is protected from casual viewing with a key derived from OTP data and image broadcast descriptor (a user-defined field in the image header).

The following procedure outlines the boot process. Details of the cryptographic algorithms can be found in Section 7.3.10: Image signature cryptography.
1. Load the boot code into the SRAM on the device, using any boot source selected by the standard boot loader.
Note: RAM is required for the boot process due to in-place decryption.
2. Retrieve the digital signature from the end of the boot image.
3. Use C3 to generate a hash of public keys and check this against OTP public key signature.
Return error code if the public key hash is incorrect.
4. Using C3, hash the boot image.
5. Verify the signature held in Flash using the Master Public Key and PKCS#1 v2.1 EMSA-PSS. Return error code if it does not match.
6. Extract the OTP data, signature data and image broadcast descriptor into the C3 buffer.
7. Perform key derivation of data from step 6 to generate the AES key.
8. Decrypt the code in place (SRAM) using the KDF key.
9. Hand off the code execution to the boot code. Boot code has access to the cryptographic API outlined above and could repeat the process, or implement its own solution.

Figure 38. First stage secure boot process
[Flowchart not reproduced. Steps shown: after reset, the security interface table is read from 0xFFFF7E00 and security.init(SEC_NORMAL_SECURE_MODE) is called; an error ends in an infinite loop (for(;;)). The state is then read with security.get_state(). With Boot Bypass and no security, the CRC is verified and the image executes from Flash. Otherwise the normal boot loader process loads the image from a peripheral or Flash. In the secure path the header CRC is verified, the image is loaded, its digital signature is verified, and the image is decrypted in place and executed. In the non-secure path the header CRC is verified, the image is loaded, the image CRC is verified, and the image executes from eSRAM. A failed load or verification ends in an infinite loop.]

7.3.3 Life cycle

Life cycle is part of any security application. The ROM-based services and intrinsic chip capabilities are limited to a fixed set of functions, and are considered unchangeable. This does not prevent implementation of life cycle modifications in subsequent boot code.

Secure ROM life cycle states

From the device point of view there are two states:
● Development
  – Non-Secure ROM device shipped to customer
  – JTAG enabled
● Release
  – Secure ROM device shipped to customer
  – JTAG disabled and 'Secure' indicator provided to ROM
  – Provision OTP with security credentials

Once the second link of the chain of trust is up and running the customer code can implement any kind of life-cycle required for specific needs.
It is assumed that the first level boot does not change frequently, and implements a strong security model.

7.3.4 Services

BootROM services

typedef struct boot_rom_callbacks_{
    unsigned long table_version;
    void * (*get_soc_type)(void);
    unsigned long (*get_boot_type)(void);
    void *nand_info;
    int (*nand_read)(void *nand, unsigned int offset,
                     unsigned int *length, unsigned char *buffer);
    unsigned char * (*get_version)(void);
    int (*get_otpbits)(unsigned long bit_off, unsigned long bit_cnt,
                       unsigned long *buffer);
    unsigned long (*hamming_encode)(unsigned long parity, void *data,
                                    unsigned int d, unsigned int nr);
    void (*hamming_fix)(void *data, unsigned int d, unsigned int nr,
                        unsigned int fix);
} boot_rom_callbacks_t;

Secure ROM services

The secure ROM service table is located at 0xFFFF7E00, and can be verified by checking the table version and revision fields. These services are used by the BootROM to validate the second stage boot image that gets loaded into SRAM, before the BootROM transfers control to the SRAM code.

Error codes and definitions

/**
 * Security interface error code definition
 * \ingroup values
 */
typedef enum sec_err_{
    SEC_SUCCESS = 0,            /**< Regular success code */
    SEC_UNSUPPORTED = 1,        /**< The requested feature is not supported
                                     in this configuration */
    SEC_IMAGE_VERIF_FAILED = 2, /**< Indicate digital signature verification
                                     failed */
    SEC_INVALID_PARAMS = 3,     /**< Indicate an invalid parameter has been
                                     passed */
    SEC_ALREADY_PRESENT = 4,    /**< Indicate the operation has already been
                                     performed and cannot be performed another
                                     time (life cycle) */
    SEC_SELF_TEST_FAILED = 5,   /**< Indicate self test has failed.
                                     \note The caller shall enter infinite
                                     loop */
    SEC_OUT_OF_RESOURCES = 6,   /**< internal memory allocation failed,
                                     probably need more memory or there is a
                                     fragmentation issue */
    SEC_INVALID_BLOB = 7,       /**< The blob given as input is invalid */
    SEC_INVALID_KEY = 8,        /**< The keys given as parameter are not valid i.e.
                                     their hash didn't match */
    SEC_INVALID_STATE = 9,      /**< The operation requested is not supported
                                     in the current state of the device */
    SEC_OTP_CORRUPT = 10,       /**< The operation requested reported a
                                     corruption within the OTP memory */
}sec_err_t;

/**
 * \typedef sec_init_t
 * Defines the initialization mode:
 * \ingroup values
 */
typedef enum sec_init_{
    SEC_NORMAL_SECURE_MODE, /**< Regular operating mode */
    SEC_FAKE_SECURE_MODE,   /**< Test mode used in order to simulate secure
                                 mode in one of the following 2 cases:
                                 - a. ROM only device and bonding option is
                                      NOT SECURE
                                 - b. NV RAM device and fuse in life cycle
                                      hasn't been blown yet */
    SEC_ALWAYS_LAST,        /**< Sentinel for the enum type */
}sec_init_t;

typedef enum sec_state_{
    SEC_SECURITY_ENABLED,   /**< Indicates security is enabled at device
                                 level */
    SEC_SECURITY_DISABLED,  /**< Indicates security is disabled at device
                                 level */
}sec_state_t;

7.3.5 Security table in BootROM

The security services implemented in the BootROM are provided at a well known location (see BootROM and RAM layout). These services allow subsequent loader code to access the cryptographic algorithms and the C3 hardware. The services are called by C code and conform to the standard ARM C calling conventions. The services allow the caller to assess the state of the secure boot environment and implement the same cryptography used by the secure ROM code in the follow-on boot stages.

Table 42. Security table
Field | Definition
revision | Firmware revision - reflects which services are provided
table_rev | Table revision - reflects table structure
mem_size | Size of the memory required by the security module
init | Performs initialization (an array of callback functions is expected)
get_state | Returns the current security state of the device
verify_image | Verifies the current image
decrypt_image | Decrypts the current image using PKCS#1 v2.1
sign_challenge | Signs the incoming challenge using PKCS#1 v2.1 (future support)
create_rng_pool | Creates random pool - creates a pool of random numbers in OTP. Random pool (RNG_POOL) is a security parameter in OTP and is used as part of the KDF to generate the encryption key for the firmware.
provision | Flips life cycle state to next state (future support)
seal | Seals code or data (future support)
unseal | Unseals code or data (future support)
clear_lifecycle | De-commission the device - all secrets are lost (future support)

C function call information for security services

typedef struct sec_interface_table_{
    unsigned long revision;  /**< Firmware revision number */
    unsigned long table_rev; /**< API revision */
    unsigned long mem_size;  /**< Size of memory required by the security
                                  module */
    sec_err_t (*sec_init_fn)(sec_init_t flag, void * mem,
                             unsigned long mem_size,
                             boot_rom_callbacks_t * rom_cb);
    sec_state_t (*sec_get_state_fn)(void * mem);
    sec_err_t (*sec_verify_image_fn)(void * mem, unsigned char * image,
                                     unsigned long image_size);
    sec_err_t (*sec_decrypt_image_fn)(void * mem, unsigned char * image_src,
                                      unsigned char * image_dst,
                                      unsigned long image_size);
    sec_err_t (*sec_sign_challenge_fn)(void * mem, unsigned char * challenge,
                                       unsigned long challenge_size,
                                       unsigned char * response,
                                       unsigned long * response_length);
    sec_err_t (*sec_create_rnd_pool_fn)(void * mem, unsigned char * pub_keys,
                                        unsigned char * out_buf);
    sec_err_t (*sec_provision_fn)(void * mem, unsigned long * cycle);
    sec_err_t
        (*sec_seal_blob_fn)(void * mem, unsigned char * blob_in,
                            unsigned long sensitive_offset,
                            unsigned long blob_in_len,
                            unsigned char * blob_out);
    sec_err_t (*sec_unseal_blob_fn)(void * mem, unsigned char * blob_in,
                                    unsigned long sensitive_offset,
                                    unsigned long blob_in_len,
                                    unsigned char * blob_out);
    sec_err_t (*sec_clear_life_cycle_fn)(void * mem );
}sec_interface_table_t;

7.3.6 BootROM and RAM layout

Following is the final Flash layout, which includes all the BootROM sections, all the secure ROM sections and the tables.

Figure 39. BootROM and RAM layout
[Memory map not reproduced. Labels shown: ROM area and SRAM area; BootROM from 0xFFFF0000 to 0xFFFF4FFF; security extensions from 0xFFFF5000; security service table.]

7.3.7 OTP layout

SPEAr has three 256-bit banks (Bank 1, Bank 2 and Bank M) embedded into the OTP module, which is an array of one-time programmable anti-fuse memory cells reserved for system assigned purposes.

There are two types of data stored in OTP:

Modifiable data
This data can be updated from a 0 to a 1 at any time and the results affect the function of the device over time.

Unmodifiable (fixed) data
This data may never be modified and is protected from modification attempts by a CRC and ECC. Any modification of this data will either result in an ECC/CRC failure or ECC correction (if the modification is within the tolerance of the ECC correction capabilities). The purpose of this data is to remain constant for the entire life of the device, until the end of life, when it should be destroyed by zeroizing the data (setting all bits to 1, or burned state). This OTP area should be write protected (if possible) by hardware mechanisms.

The security parameters (all unmodifiable) are as follows:

Table 43. Security parameters
RSA PUBLIC KEY | 2048 bits
RNG_POOL | 128 bits (from bank M bit 104)
PUBLIC_KEY_HASH | 256 bits: 1 bit at Bank 1 (bit 177) and 255 bits at Bank 2 (bits 0 to 254). PUBLIC_KEY_HASH = SHA256 (RSA_PUBLIC_KEY)
DATA[256 bits] | SHA-256 (RNG_POOL + PUBLIC_KEY_HASH)
CRC[32 bits] | DATA[bits 0-31] xor DATA[bits 32-63] xor DATA[bits 64-95] xor DATA[bits 96-127] xor DATA[bits 128-159] xor DATA[bits 160-191] xor DATA[bits 192-223] xor DATA[bits 224-255]
ECC | hamming_code(PUBLIC_KEY_HASH + RNG_POOL + CRC)

The CRC protects the random pool (RNG_POOL) and the public key hash (PUBLIC_KEY_HASH) from modification. The CRC is generated by taking the result of the SHA-256 of this information (DATA defined above) and performing a 32-bit XOR of the resulting eight 32-bit words of DATA. The resulting 32 bits are broken up into two locations in Bank 1: 31 bits in one location and 1 bit (the high bit) in CRC MSB.

BANK 1 configuration for secure boot
Field | Width | Bits
xxxx | 1 bit | 0xff
Security CRC | 31 bits | 0xfe to 0xe0
VE1, ST1, VE0, ST0 | 8 bits each | 0xdf to 0xc0
Reserved (1), CRC MSB, Key MSB, Security ECC | 20 bits, 1 bit, 1 bit, 10 bits | 0xbf to 0xa0
Unused | 160 bits | 0x9f to 0

Note 1: These bits are not used for secure boot; they are reserved for other purposes.
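The CRC fold described above can be sketched in C as follows. This is a minimal illustration, not the BootROM code: the function names are hypothetical, the SHA-256 digest (DATA) is assumed to be already available as eight 32-bit words, and the way digest bytes map onto those words is not specified in this section.

```c
#include <stdint.h>

/* XOR the eight 32-bit words of DATA into one 32-bit CRC value. */
uint32_t otp_security_crc(const uint32_t data[8])
{
    uint32_t crc = 0;
    for (int i = 0; i < 8; i++)
        crc ^= data[i];
    return crc;
}

/* Low 31 bits, stored in the Bank 1 "Security CRC" field. */
uint32_t otp_crc_low31(uint32_t crc)
{
    return crc & 0x7FFFFFFFu;
}

/* Bit 31, stored in the Bank 1 "CRC MSB" field. */
uint32_t otp_crc_msb(uint32_t crc)
{
    return crc >> 31;
}
```

Splitting the result with otp_crc_low31() and otp_crc_msb() mirrors the two Bank 1 locations that hold the CRC.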
Field description
Security CRC: Low 31 bits of the CRC
VE0/VE1: Version number: 0
ST0/ST1: Security Lifecycle: 3 = SECURE_BOOT
CRC MSB: Bit 31 of the CRC
Key MSB: Bit 255 of PUBLIC_KEY_HASH
Security ECC: ECC code

BANK 2 configuration for secure boot
Field | Width | Bits
xxxx | 1 bit | 0xff
Public key hash | 255 bits | 0xfe to 0

Field description
Public key hash: Bits 0 to 254 of PUBLIC_KEY_HASH

BANK M configuration for secure boot
Field | Width
xxxx | 1 bit (0xff)
S1, S0, V1, V0, J1, J0, T1, T0, E1, E0 | 1 bit each
Reserved | 4 bits
Security | 137 bits
USB/PCI IDs | 72 bits
WP bits | 32 bits

Refer to the "OTP configuration" section for the field description of Bank M.

OTP life cycle identification

The life cycle of the OTP is controlled during initial provisioning. Since the ARM has complete access to the OTP bits, changes to the lifecycle cannot be controlled by the ROM. Security of the OTP must be enforced by the application and is out of scope for the ROM.

ST0 and ST1 should be programmed to the same value. The value is read from OTP as (value = ST0 or ST1). Therefore, if ST0 is 3 and ST1 is 4, the resulting value (used) would be 0x7.

STx is a walking set of bits defined as follows:
00000000 = No security
00000001 = Security option 1 (see Note 1)
0000001x = Security option 3 (see Note 1)
000001xx = Provisioned (security enabled)
00001xxx-1xxxxxxx = Decommissioned

Note 1: Security options 1 and 3 are used only for debugging. These modes allow you to pre-test code by enabling security checks and disabling CRC/ECC checks.

ST0 and ST1 are not part of the XOR calculation. Any bits that are defined twice are duplicated and when read, are read as a logical OR (value1 OR value2). That way, if any OTP bits did not get blown correctly, the secondary blown (1 value) bit will override the value and be the used value of 1.
Therefore for the lifecycle, a value of 1100 and a value of 0100 would be 1100. This should correct a single bit error in the lifecycle. Furthermore, the lifecycle is defined such that once one of the higher bits is set, there is no way to go back. Once it is provisioned, the only thing that can be done to change it is to move to decommissioned.

7.3.8 Usage examples

Examples of operating system integration

This section covers:
● Supervisor protection
● Hypervisor protection

Supervisor protection

Supervisor-only protection is implemented entirely by the OS. The boot ROM acts only as a root of trust and the OS must guarantee that the code in DDR cannot access any of the secrets held in internal SRAM. In that case the remapping of the vector table cannot be done in external DDR. The biggest drawback of this implementation is that it requires all the supervisor code to run out of internal memory.

Hypervisor protection

It is similar to the supervisor protection in the sense that it requires the hypervisor code to run entirely out of internal memory. The main advantage here is to allow the OS to run in external DDR (as it runs in user mode). The performance penalty is significant in that case, and the OS needs some modification as the hypervisor is actually implementing para-virtualization and not full virtualization. This adds latency on every interrupt and might not be acceptable in some cases. The hypervisor held in external flash, working in conjunction with the boot ROM, extends the trust model: all accesses to any of the keys are fully protected, handled by the hypervisor itself, and enforced by the MMU.

Expected flow for the deployment of secure boot

The following flow describes operations for development and production of secure mode boot code:
1. Device is shipped from ST Manufacturing plant (or TSMC) in a non-secure state, meaning the device always boots up in non-secure mode.
During development, it is useful to stay in this mode as long as the development is in progress.
2. After development, test a final image. To do this:
   a) Encrypt image using the secure ROM SDK
      – Create RSA keys
      – Sign image
      – Generate OTP provisioning data
   b) Write OTP provisioning data to OTP
3. Once secure-mode debug is completed, disable JTAG on the device by setting the JTAG disable OTP bits. Development has ceased, and the production OTP flashing solution (using the above OTP data) is deployed.
4. Boot in secure mode. From this point on the device boots in secure mode.
5. Firmware update
   a) The authentication is handled by customer code
   b) Signed image using same factory root keys (step 2a) is downloaded to the device
   c) Flash new image
6. Decommission by setting the OTP lifecycle bits (ST0 and ST1) to 0xf

7.3.9 BootROM signed image format

The BootROM requires that the second stage loader be wrapped in a U-Boot header format. The SecureROM SDK will take an unsigned image and cryptographic keys generated by the SDK, and sign the image. It will also provide the recommended OTP settings.

Note: A standard U-Boot image header is used to identify the image. It is not actually U-Boot, as U-Boot is too large to fit within SRAM. The entire second stage image, plus the BootROM's data area and the digital signature, must all fit in SRAM (32 KB) for secure boot to work properly.

The structure of the cryptographic trailer is defined as follows:

Figure 40. Boot image format
[Diagram not reproduced. Image layout, in order: U-Boot header (unencrypted); U-Boot image (encrypted), covered by the U-Boot payload size and padded to a 16-byte boundary; digital signature (unencrypted).]
Notes:
1) The U-Boot header's payload size includes the digital signature.
2) The load address for the image must be in SRAM or DDR memory. This is required due to the image decryption phase.

Since the digital signature is included in the U-Boot payload size, and the signature is required to be in SRAM when the signature verification is performed, the overall image size supported is reduced by the size of the digital signature (0x70 bytes). The digital signature must also be aligned on a 16-byte boundary, therefore the image must be padded with up to 15 bytes to align it correctly.

#define KEY_LEN_BYTES (256/8)

typedef struct image_crypto_header_{
    unsigned char pub_enc[KEY_LEN_BYTES];   // 0x00
    unsigned char pub_sig[KEY_LEN_BYTES];   // 0x20
    unsigned long hdr_revision;             // 0x40
    unsigned long broadcast_desc;
    unsigned long revision;
    unsigned long signature_properties;
    unsigned char signature[KEY_LEN_BYTES]; // 0x50-0x6f
}image_crypto_header_t;

7.3.10 Image signature cryptography

The boot image is encrypted (using NIST SP800-108 KDF of OTP data), and signed with a PKCS#1 v2.1 PSS algorithm using an RSA 2048 private key. The public key is included in the signed blob, and is verified by comparing the thumbprint of the RSA-2048 public keys used to sign code with OTP data. If they match, the public keys are considered valid, and are used to verify the PKCS#1 signature.

The KDF using RL and the security value from OTP allows the user to select an encryption key based on a device key, a version and security key.
This key is used independently from the validation, and decrypts the boot image before execution.

OTP parameters: security value, VE0, VE1, and public key hash
Cryptographic header parameters: broadcast_desc
A || B: the concatenation of binary strings A and B

Parameters
B: Blob of data to sign (boot code)
Bp: B filled to 16-byte boundary
Be: Bp encrypted
Ke: RSA private key exponent
Km: RSA private key modulus
KS: RSA public key of Ke/Km
KP: RSA public key value (future support for key-chaining)
PKD: SHA_256 (KP || KS)
RC: hexadecimal string 01020304050607
RL: (VE0 | VE1) XOR broadcast_desc
Rv: security value (128 bits from OTP "security" field)
RK: KDF(KI = Rv, Label = 8, Context = RC, L = RL)
Ri: hexadecimal string a6a6a6a6a6a6a6a6a6a6a6a6a6a6a6a6

Provisioning
1. Compute PKD and store in OTP public key hash
2. Create 128 bits of security data from random number generator and store in OTP security
3. Set OTP VE0 and VE1 to the same number
4. Set OTP ST0 and ST1 to 4 (provisioned)
5. Compute CRC and store in OTP
6. Compute ECC and store in OTP

Signature generation algorithm
1. Create Bp using B (boot image including U-Boot header). Round the size up to the nearest 16-byte boundary and fill with 0xa6.
2. Verify that the image fits in the SRAM (SRAM size minus BootROM SRAM usage).
3. Compute RK from OTP and cryptographic header data.
4. Encrypt Bp with AES CBC-128 using key RK and IV Ri. This creates Be.
5. Create signature using RSA PKCS#1 v2.1 EMSA-PSS with:
   – Message payload = Be
   – RSA public key KS
6. Store RSA public key KS in cryptographic header pub_sig field.
7. Store signature in cryptographic header.
8. Store broadcast_desc in cryptographic header.

Signature verification algorithm

Preconditions: bootloader loads Be and cryptographic header into SRAM, and OTP has been programmed. ST0 OR ST1 = 4, 5, 6, or 7.
1. Compute RK from OTP and cryptographic header data
2. Compute PKD from public keys stored in cryptographic header
3. Validate public keys by comparing PKD to OTP. If PKD from OTP (step 1 from provisioning) is the same as PKD calculated in step 2, then the RSA keys are valid
4. Using public key KS in the cryptographic header pub_sig field, verify the signature in the cryptographic header against Be using PKCS#1 v2.1 PSS
5. Decrypt Be in place (in SRAM) with RK computed in step 1
6. Transfer control to decrypted B

7.4 Additional information

7.4.1 BootROM on Core 1

SPEAr1340 is based on the ARM Cortex-A9 processor. For system startup, only one core is necessary. The other one must be stalled and continue only when the SMP OS has been loaded. Hence, it becomes essential for the BootROM to behave differently on the two cores. This can be easily done by finding the core ID at runtime using the following instruction:

MRC p15, 0, r0, c0, c0, 5 /* Get our cpu id */

Core 1 is stalled using the pen holding mechanism: Core 1 enters a tight read-loop on the fixed SYSRAM0 location (0xB3800600), waiting for an external event (holding pen release). This event eventually happens in the SMP OS, when the OS writes a valid address at the pen holding location. Figure 41 shows the BootROM flow on Core 1.

Figure 41. BootROM on Core 1
[Flowchart not reproduced. Steps shown: system reset (BootROM entry point); system initialization (disable the MMU, invalidate the I-Cache and enable it); if a wake-up was triggered, jump to the ALWAYSOn-RAM location (0xE0800000); on a watchdog reset, write "WD1" at 0xB3800604 and set bit 0 of the SYS_SW_RST register to trigger a complete SoC reset; otherwise check the address at the pen holding location (0xB3800600), test it against 0xFFFFFF, and jump to the address once it is valid.]

7.4.2 Error codes

If error reporting is enabled and BootROM encounters any error, it reports the error on the error reporting device. The following table summarizes the error codes and their meaning.

Table 44. Error codes
Error code | Possible boot mode | Definition
101 | na | This error signifies unsupported boot mode.
102 | Any | Image header is corrupted.
103 | Any | Image data is corrupted.
106 | NAND | BootROM is unable to initialize the NAND chip.
107 | NAND | BootROM is unable to read NAND chip. Possibly, the first four blocks are bad blocks.
114 | SD/MMC | BootROM is unable to find/initialize SD/MMC card.
115 | SD/MMC | BootROM is unable to read data from SD/MMC card.
116 | USB OTG | The first USB data packet is not 64 bytes long.
117 | USB OTG | Image data size is not equal to the one specified in the image header.

7.4.3 List of supported devices

SNOR
All devices whose read command is 0x03 are supported.

PNOR
TBD

NAND
All ONFI devices complying with the following versions:
● ONFI v1.0
● ONFI v2.0
● ONFI v2.1
● ONFI v2.2
● ONFI v2.3

Additionally, the following devices are supported.

Table 45. Supported NAND devices
Device part | Vendor | Density | Bus width | Page size
NAND02GW3B2CN6 | ST | 2 Gbit | x8 | 2048 bytes + 64 bytes
NAND02GW3A | Numonyx | 2 Gbit | x8 | 2048 bytes + 64 bytes
NAND08GW3B2CN6 | Numonyx | 8 Gbit | x8 | 2048 + 64 spare bytes
NAND512W3A2C2A6 | ST | 512 Mbit | x8 | 512 + 16 spare bytes
NAND01GW4B2AN6 | ST | 1 Gbit | x16 | 1024 words + 32 spare
NAND01GW3B2BN6 | ST | 1 Gbit | x8 | 2048 + 64 spare bytes
NAND04GW3B2BN6 | ST | 4 Gbit | x8 | 2048 + 64 spare bytes
NAND128W3A28N6 | ST | 128 Mbit | x8 | 512 + 16 spare bytes
NAND256W3A2BN6 | ST | 256 Mbit | x8 | 512 + 16 spare bytes
K9K8G08V0A | SAMSUNG | 8 Gbit | x8 | 512 + 16 spare bytes
K9F4G08V0A | SAMSUNG | 4 Gbit | x8 | 512 + 16 spare bytes
K9F2G08V0A | SAMSUNG | 2 Gbit | x8 | 512 + 16 spare bytes
K9F1208V0A | SAMSUNG | 64 Mbit | x8 |
K9F8G08V0M | SAMSUNG | 8 Gbit | x8 |
K9F1G16U0M | SAMSUNG | 1 Gbit | x16 | 1024 words + 32 spare
KM29U256 | SAMSUNG | 256 Mbit | x8 | 512 + 16 spare bytes
NAND01GR3B | ST | 1 Gbit | x8 | 2048 + 64 spare bytes

SD/MMC
● All SD cards complying with v2.0 and v1.0 are supported.
● All MMC cards are supported.
USB
High/Full-speed USB v2.0 Host is supported.

7.4.4 BootROM table

BootROM defines a table of re-entrant BootROM routines that can be used by the next bootloading levels as a library. The table format is the following one, represented as a C structure:

#define TABLE_VERSION_2_0 2
#define TABLE_VERSION_2_1 3

const __attribute__ ((section(".table"))) struct table_s spear_table = {
    .table_version  = TABLE_VERSION_2_1,   /* offset 0x00 */
    .get_boot_type  = getboottype,         /* offset 0x04 */
    .get_soc_type   = getsoctype,          /* offset 0x08 */
    .nand_info      = &nand_info[0],       /* offset 0x0C */
    .nand_read      = nand_read_skip_bad,  /* offset 0x10 */
    .get_version    = getversion,          /* offset 0x14 */
    .get_otpbits    = get_otpbits,         /* offset 0x18 */
    .hamming_encode = hamming_encode,      /* offset 0x1C */
    .hamming_fix    = hamming_fix          /* offset 0x20 */
};

The table fields are all pointers to functions or structures, and can be divided into a few groups according to their functionality.

Generic info

The purpose of this table section is to get BootROM or SoC generic information.

/* BootROM table version */
u32 table_version;

/* This routine returns a string containing the BootROM version
 *
 * Format:
 * BOOTROM_VERSION =
 * $(VERSION).$(PATCHLEVEL).$(SUBLEVEL)$(EXTRAVERSION)
 */
u8 * (*get_version)(void);

/* This routine returns the SOC type we are running onto.
 * It returns a pointer to following structure:
 * struct soc_type_s {
 *     u8 soc;
 *     u8 revision;
 * } ;
 */
struct soc_type_s * (*get_soc_type)(void);

/* This routine returns the boot type selected with the strapping
 * option (four bits)
 */
u8 (*get_boot_type)(void);

NAND Flash info and access

The code to access the NAND Flash is usually not negligible in terms of size. To decrease the second bootloader footprint, usage of this table section is suggested.

/* To read the NAND the nand_read() routine can be used. The
 * first argument of the nand_read(), the nand_info geometry, can be found
 * in the table itself.
 */
nand_info_t *nand_info;
int (*nand_read)(nand_info_t *nand, size_t offset, size_t *length,
                 u_char *buffer);

OTP section access

This section describes the routines exported by BootROM to access the OTP section. The OTP bits must be accessed and any possible error (due to the nature of OTP technology) must be corrected.

/* Following are the routines exported by BootROM to access the
 * OTP section.
 */
int (*get_otpbits)(u32 bit_off, u32 bit_cnt, u32 *buffer);
u32 (*hamming_encode)(u32 parity, void *data, unsigned int d,
                      unsigned int nr);
void (*hamming_fix)(void *data, unsigned int d, unsigned int nr,
                    unsigned int fix);

7.4.5 Terminology

Table 46. Useful terms
Term | Description
Authenticate | Prove the integrity or identity of an operator or an object.
Authorize | Grant an authenticated entity access to a service or an object.
Authorization session | Security protocol that enables the GPE to authenticate service requests from an authorized operator; this is a [FIPS140] requirement enforced by a GPE.
Blob | Binary large object; opaque data which is sealed.
Client | Consumer of secure ROM services.
Credential | Authentication value which provides proof of knowledge (password) or proof of ownership (biometrics or smartcard); analogous to [TPM] Auth and [FIPS140] authentication data.
Cryptographic boundary | From [FIPS140] - an explicitly defined continuous perimeter that establishes the physical bounds of a cryptographic module and contains all the hardware and software components of a cryptographic module.
Cryptographic module | From [FIPS140] - the set of hardware and/or software that implements NIST Approved security functions (including cryptographic algorithms and key generation) and is contained within the cryptographic boundary.
Cryptographic service | From [FIPS140] - an available GPE security command.
Endpoint | An operator capable of cryptographically exchanging information with the device; the exchange may provide authentication, confidentiality, and integrity.
HAL | Hardware Abstraction Layer
Identity | Derived from [FIPS140] - an operator which is uniquely and individually authenticated by the cryptographic module; an identity is associated with a role or roles for authorization; identity-based authentication is a [FIPS140-2] security level 3 requirement.
IV | Initialization vector, an input parameter to the AES encryption/decryption service.
KDF | Key derivation function, a cryptographic hash function which derives one or more secret keys from secret values and/or other known information.
Mechanism | A set of primitives used to implement any of multiple policies; in the context of security, often stated as 'protection mechanism'.
Operator | From [FIPS140] - a consumer of cryptographic services external to the GPE, which may be human or automation.
Permanent state | Lifecycle state variables that are in shielded locations and survive power cycles.
Policy | A particular organizational strategy which is implemented with mechanisms; in the context of security, a 'security policy' protects information resources using 'protection mechanisms'.
Role | Derived from [FIPS140] - a class of authenticated operators whose members are authorized to invoke specific cryptographic services and are not authorized to invoke others.
Root keys (ROOT_AES, ROOT_HMAC) | Private AES256 and HMAC256 keys used by the secure ROM to seal blobs for external storage; they are unique for each device.
Seal | A secure ROM activity which allows SPEAr secrets to be stored outside of the SPEAr; it produces a signed (SPEAR_HMAC) and encrypted (SPEAR_AES) blob for external storage.
TPM | Trusted platform module
User | A class of authenticated operators which consume cryptographic services; equivalent to the [FIPS140] User role.
Zeroize | Invalidate a critical security parameter in a shielded location.

7.4.6 References

● [AES Key Wrap] National Institute of Standards and Technology (NIST), AES Key Wrap Specification, 2001 November
● [ANS X9.31 1998] American National Standard for Financial Services, Digital Signature Using Reversible Public Key Cryptography for the Financial Services Industry (rDSA)
● [ANS X9.62 2005] American National Standard for Financial Services, Public Key Cryptography for the Financial Service Industry, ECDSA
● [FIPS140] see [FIPS140-2] and [FIPS140-3]
● [FIPS140-2] National Institute of Standards and Technology (NIST), Security Requirements for Cryptographic Modules, FIPS Pub 140-2, 2001 May
● [FIPS140-3] National Institute of Standards and Technology (NIST), Security Requirements for Cryptographic Modules, FIPS Pub 140-3 Draft, 2007
● [FIPS 186] National Institute of Standards and Technology (NIST), Digital Signature Standard, FIPS Pub 186, 2006 March
● [NIST SP800-56] National Institute of Standards and Technology (NIST), Recommendation for Pair-Wise Key Establishment Scheme
● [NIST SP800-57] National Institute of Standards and Technology (NIST), Recommendation for Key Management - Part 1: General, NIST Special Publication 800-57, 2007 March
● [NIST SP800-90] National Institute of Standards and Technology (NIST), Recommendation for Random Number Generation Using Deterministic Random Bit Generators
● [NIST 931 RNG Ext] National Institute of Standards and Technology (NIST), Random Number Generator Based on ANSI X9.31 Appendix A.2.4 Using TDES and AES
● [PKCS#1 v2.1] RSA Laboratories, RSA Cryptography Standard

8 Static RAMs (SRAM)

This chapter focuses on SRAM functionality and operation.
For the SRAM feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU

For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

8.1 Overview

The SPEAr1340 device integrates 2 instances of static RAM blocks, identified as SYSRAM0 (32 KB) and SYSRAM1 (4 KB).

SYSRAM0 is a single-port static RAM with 32 KB size. When all power islands are switched off, SYSRAM0 loses its data contents.

SYSRAM1 is a single-port static RAM with 4 KB size. When all power islands are switched off, SYSRAM1 maintains its data contents.

A part of these memory areas is used during the bootstrap phase by BootROM firmware. After booting, all SRAM areas are fully available for general purpose applications.

For the address space location of the two SRAMs, refer to the companion document: RM0089, Reference manual, SPEAr1340 address map and registers.

9 One-time programmable antifuse (OTP)

This chapter focuses on OTP functionality and operation.

For the OTP feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU

For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

9.1 Overview

The OTP block is an array of one-time programmable antifuse memory cells.

Because all OTP banks have an embedded charge pump that provides the high voltage required for antifuse programming sessions, no additional high voltage pad is required at the chip interface.

Because OTP is software programmable, no dedicated programming interface is required at the chip level.

9.2 Pins

OTP has no external pins.

9.3 Clocks

The OTP block receives a single clock: PCLK, the APB clock, nominally running at 83.5 MHz.
Write operations to OTP banks must be performed with the system running in SLOW mode (OSCI1 clock), so that PCLK runs at 2 MHz.

9.4 Functional description

The OTP block has three main functionalities:
● Write
● Read: DATA values can be read back from MISC (OTP outputs are mirrored to MISC registers) together with a valid bit. DATA is refreshed after each reset. The exact availability time is cell-dependent; the valid bit indicates whether data is already available.
● Masking: can inhibit write operations to some words. Use masking to prevent the contents of specific 32-bit words in Bank 1 (or 2) from being altered.

See also: Section 9.5: Programming.

9.4.1 OTP banks

OTP embeds three 255-bit banks, with the following features:
● BANK 1: 255-bit data bank with write-protect mechanism
● BANK 2: 255-bit data bank with write-protect mechanism
● BANK M: 255-bit bank, logically partitioned as described in the next section.

OTP banks bit mapping and usage

BANK 1/BANK 2

In BANK 1 and BANK 2, there is one predefined bit (255) used for test purposes. The rest of the bits are available for the user.

Table 47. BANK 1/2 bit mapping

Bits | Size | Field | Description
255 | 1 bit | XXXX | 1 bit reserved for blowing at final test
254...0 | 255 bits | Data | 255 bits available for data writing

BANK M

BANK M contains one predefined bit (255) used for test purposes, and dedicated bits controlled by the BootROM. For a detailed description of these bits, refer to Chapter 7: BootROM.

Table 48. BANK M bit mapping

Bits | Size | Field | Description
255 | 1 bit | XXXX | 1 bit reserved for blowing at final test
254...251 | 4 bits | Reserved | BootROM controlled (see Table 33: OTP Bank M configuration in Chapter 7: BootROM)
250, 249 | 1 bit each | J1, J0 | 1 bit + 1 redundancy for JTAG disable (both bits should be programmed at “1” in order to permanently disable the JTAG interface)
248, 247 | 1 bit each | T1, T0 | 1 bit + 1 redundancy for TEST disable (both bits should be programmed at “1” in order to permanently disable the TEST interface)
246, 245 | 1 bit each | E1, E0 | Reserved
244...32 | 213 bits | Reserved | BootROM controlled (see Table 33: OTP Bank M configuration in Chapter 7: BootROM)
31...16 | 16 bits | WP bits B2 | 8 bits + 8 redundancy for masking bank 2 (each couple of bits (0-1; 2-3; … 14-15) should be programmed at “11” in order to inhibit write operations to the corresponding 32-bit word of BANK 2)
15...0 | 16 bits | WP bits B1 | 8 bits + 8 redundancy for masking bank 1 (each couple of bits (0-1; 2-3; … 14-15) should be programmed at “11” in order to inhibit write operations to the corresponding 32-bit word of BANK 1)

9.5 Programming

9.5.1 Writing

1. Check that all previous write operations have finished: read the appropriate MISC register.
2. Program data bits to the appropriate MISC registers.
3. Start the OTP write: program the appropriate write bit to the MISC register.

Note:
1. The three banks must not be programmed in parallel.
2. It is strongly advised to apply either a redundancy scheme (OR function between 2 bits) or an ECC scheme to the data being written, because of the typical reliability of the fuse burning process.

Changing a programmed value

Under normal conditions (after a standard write with no masking applied), after data has been programmed (step 3, above):
● Overwriting a 0 with a 1 is possible, as shown in the following example.
● Overwriting a 1 with a 0 has no effect.
● Overwriting a 1 with a 1 can damage an antifuse.

Example: changing 0001 to 0101
1. Write 0001 (as described in Section 9.5.1: Writing above)
2. Write 0100
Result: 0101

Note: To avoid antifuse damage, the second write must not be 0101.

9.5.2 Masking

Writing a 1 in one of the first 32 bits of BANK M inhibits all write operations to the corresponding 32-bit word of BANK 1 (or 2). The first couple of bits of the MASK bank inhibits writing to the first 32-bit word of Bank 1; the second couple of bits of the MASK bank inhibits writing to the second 32-bit word of Bank 1, and so on, then moving to Bank 2.

Each mask bit has a redundant copy (OR function between the two).

10 General purpose timers (GPT)

This chapter focuses on GPT functionality and operation.

For the GPT feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU

For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

10.1 Overview

The SPEAr1340 device integrates 4 instances of a general purpose timer digital block, identified as GPT0, GPT1, GPT2, GPT3. Each instance is a dual timer, for a total of 8 independent timers.

Figure 42. GPT block diagram
[Figure not reproduced. Blocks shown: two MUTIMER timers, each with a TOGGLE_FF, plus DECODER and WRAP_APB. Signals shown include TIMER_CLK, CLK, RESETn, TIMER_DEBUG, MT_CAPT1, MT_CAPT2, MT_INT1, MT_INT2, MT_INT1_CLK, MT_INT2_CLK and the APB signals PSELgpt, PENABLE, PWRITE, PADDR, PWDATA, PRDATA.]

10.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

10.3 Clocks

Refer to Chapter 5: Reset and clock generator (RCG).

10.4 Interrupts

Refer to Appendix A: Interrupts.
10.5 Functional description

General purpose timers can be used for precise timing measurements, and for the measurement of input signal frequency.

GPTs are essentially counters that increment at a rate set by the clock and the timer prescaler, and that an application can monitor to determine elapsed time. GPT can have timer and capture mode capabilities.

The timer clock is generated by a programmable 4-bit prescaler unit that performs a clock division by 1, 2, 4, 8, 16, 32, 64, 128, and 256.

The following modes of operation are available:

● Auto-reload mode
When the timer is enabled, the counter is cleared and starts incrementing. When it reaches the compare register value, an interrupt source is activated, and the counter is automatically cleared and restarts incrementing. The process is repeated until the timer is disabled.

● Single-shot mode
When the timer is enabled, the counter is cleared and starts incrementing. When it reaches the compare register value, an interrupt source is activated, the counter is stopped and the timer is disabled.

● Capture function
This function is provided for the measurement of input timing signals. After initialization, when a rising transition occurs at the MT_CAPTx input, the actual counter value is stored into the rising edge capture register (TIMER_REDG_CAPTx). In the same way, when a falling edge transition occurs at the MT_CAPTx input, the actual counter value is stored into the falling edge capture register (TIMER_FEDG_CAPTx). You can read the value stored in the two capture registers and compute the duration of the rising to falling edge (or vice versa) time interval.

11 Real-time clock (RTC)

This chapter focuses on RTC functionality and operation.
For the RTC feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU

For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

11.1 Overview

The RTC is a block that keeps track of the real time of day. It also functions as an alarm and a calendar. The time is displayed in 24-hour format, and time/calendar values are stored in binary-coded decimal format.

The time of day, alarm and calendar, status and control registers can all be accessed via a standard 32-bit APB bus. All read/write operations last 2 cycles.

RTC provides a self-isolation mode that is activated during power down. This feature allows the RTC to continue working when power is not supplied to the rest of the circuit. It is realized by supplying separate power and clock connections.

A set of 16 general purpose registers (GP-Reg) is provided, which can be used to save data during the power-down state. The GP-Reg-set runs on the 32 kHz oscillator clock and is powered by the RTC battery. Each register is 32 bits wide and is address-mapped on the 32-bit APB bus. A bit in the status register reflects the status of any pending write to the GP-Reg-set. Write operations to the GP-Reg-set must therefore be sequential: wait for this pending status bit to be cleared before writing again to the GP-Reg-set.

11.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

11.3 Clocks

Refer to Chapter 5: Reset and clock generator (RCG).

11.4 Interrupts

Refer to Appendix A: Interrupts.

11.5 Functional description

The RTC block is composed of two sub-blocks: the timer (RTC_32K) and the APB interface (RTC_48M).

The timer block is powered by an external and separate battery and is clocked by a 32768 Hz clock. It provides two main functions.
● Time and calendar update
● Power monitoring and self-isolation

The APB interface is powered by the main chip power supply and is clocked by an 83 MHz clock. It provides the following functions.
● Synchronization between 32 kHz and 48 MHz domains
● Timer registers read and write
● Alarm programming
● Interrupt generation
● Isolation monitoring

12 Direct memory access controllers (DMAC)

This chapter focuses on DMAC functionality and operation.

For the DMAC feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU

For technical details about the programmable registers related to the DMAC, refer to the system configuration registers (MISC) in the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

12.1 Overview

The SPEAr1340 device integrates 2 instances of a DMA controller digital block, identified as DMAC0 and DMAC1.

The DMAC is an AHB-central DMA controller core that transfers data from a source peripheral to a destination peripheral over two AHB buses.

A wrapper is designed to instantiate 2 DMAC cores (each with 2 AHB master interfaces), 2 ICMs (which arbitrate the same master interface of each DMAC) and a MUX (which manages multiple peripheral handshaking interfaces).

Figure 43. DMAC block diagram
[Figure not reproduced. Blocks shown inside the DMAC: channels (up to Channel N) with FIFO, arbiter, master I/F, AHB slave I/F and DMA hardware request I/F.]

12.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

12.3 Clocks

The DMAC clock is HCLK, the AHB clock.

See also: Chapter 5: Reset and clock generator (RCG).
12.4 Interrupts

Each DMAC can generate 5 different types of interrupts to ARM (INT_FLAG):
● Error interrupt (IntErr): generated when an ERROR response is received from an AHB slave on the HRESP bus during a transfer
● Destination transaction complete interrupt (IntDstTran): generated after completion of the last AHB transfer of the requested transaction from the handshaking interface on the destination side
● Source transaction complete interrupt (IntSrcTran): generated after completion of the last AHB transfer of the requested transaction from the handshaking interface on the source side
● Block complete interrupt (IntBlock): generated on DMA block transfer completion to the destination peripheral
● Transfer complete interrupt (IntTfr): generated on DMA transfer completion to the destination peripheral

Also, the bitwise OR of all bits of the INT_FLAG bus is driven on the INT_COMBINED output.

See also: Appendix A: Interrupts.

12.5 Functional description

12.5.1 DMAC wrapper

SPEAr1340 provides a DMAC wrapper with 56 DMA lines. These lines are connected to the 32 hardware handshaking interfaces allowed by the 2 DMAC cores. Each core is configured with 16 handshaking interfaces.

The DMAC wrapper uses two interconnection modules (ICMs) to arbitrate the same master interface of each DMAC core. Figure 44 shows how the 2 ICMs are connected to the master interfaces of both DMACs.

Figure 44. DMAC wrapper
[Figure not reproduced. Blocks shown inside the DMAC wrapper: DMAC0, DMAC1, ICM0 and ICM1.]

12.5.2 DMAC multiplexing

The DMAC wrapper uses a multiplexer (MUX) to:
● select the peripheral
● select the DMAC core
● manage a peripheral with a handshaking interface

The DMAC multiplexing consists of the following steps.

Step 1: selecting the peripheral

Each DMAC is configured with 16 handshaking lines. The first 12 can be selected by configuring the miscellaneous registers.
The last 4 handshaking lines for each DMAC are always mapped in a fixed way.

Figure 45. DMAC handshaking lines allocation
[Figure not reproduced. Labels shown: DMAC0 with lines HS0' to HS15', DMAC1 with lines HS0" to HS15", and the example requests ADC_TX (HS0), I2S_RX (HS11), HS12, CAM1_EVEN (HS16), CAM1_ODD (HS17) and HS27.]

To select the peripheral, you must configure the miscellaneous register DMAC_HS_SEL according to the allocated handshaking interface (HS), as shown in Table 49.

Table 49. DMAC MUX - selecting the peripheral

Handshaking interface # | DMAC_HS_SEL bit | Bit value 0 | Bit value 1
0 | 0 | ADC_TX | Reserved
1 | 1 | Reserved | Reserved
2 | 2 | SPDIF_TX | Reserved
3 | 3 | SPDIF_RX | Reserved
4 | 4 | SSP_TX | Reserved
5 | 5 | SSP_RX | Reserved
6 | 6 | UART0_TX | Reserved
7 | 7 | UART0_TX | Reserved
8 | 8 | I2C0_TX | Reserved
9 | 9 | I2C0_TX | Reserved
10 | 10 | I2S_TX | Reserved
11 | 11 | I2S_RX | Reserved
12 | - | UART1_TX | -
13 | - | UART1_TX | -
14 | - | I2C1_TX | -
15 | - | I2C1_TX | -
16 | 12 | CAM1_EVEN | Reserved
17 | 13 | CAM1_ODD | Reserved
18 | 14 | CAM2_EVEN | Reserved
19 | 15 | CAM2_ODD | Reserved
20 | 16 | CAM3_EVEN | Reserved
21 | 17 | CAM3_ODD | Reserved
22 | 18 | CAM4_EVEN | Reserved
23 | 19 | CAM4_ODD | Reserved
24 | 20 | Reserved | Reserved
25 | 21 | Reserved | Reserved
26 | 22 | Reserved | Reserved
27 | 23 | Reserved | Reserved
28 | - | Reserved | -
29 | - | Reserved | -
30 | - | Reserved | -
31 | - | Reserved | -

Example: CAM1_EVEN corresponds to line HS#16. According to Table 49, HS#16 corresponds to bit 12. Therefore, to select CAM1_EVEN, set DMAC_HS_SEL[12] to 0.

Step 2: selecting the flow controller and the data direction

The DMAC controller is compatible with the ARM DMA controller. To select the flow controller and the data flow direction, configure the miscellaneous registers DMAC_FLOW_SEL and DMAC_DIR_SEL as follows:
– To select whether the flow controller is the DMAC or the peripheral, configure the DMAC_FLOW_SEL[HS#] register.
– To select the data direction (from or to the peripheral), configure the DMAC_DIR_SEL[HS#] register.

Table 51. DMAC MUX - selecting the flow controller and data direction

DMAC_FLOW_SEL[i] | DMAC_DIR_SEL[i] | Flow controller | Data direction
0 | 0 | DMAC | From the peripheral
0 | 1 | DMAC | To the peripheral
1 | x | Peripheral | Not needed

Step 3: selecting the DMAC core

To select which of the two DMAC cores the peripheral requests must be sent to, you must configure the miscellaneous register DMAC_SEL as shown in Table 52 below.

Table 52. DMAC MUX - selecting the DMAC core

DMAC_SEL[i] | DMAC core involved
0 | DMAC core 0
1 | DMAC core 1

Step 4: assigning a handshaking interface on a DMAC channel

To route a handshaking interface on a DMAC channel, you must configure the corresponding CFGx register (where x is the channel).
– To assign a HS interface as the source of a channel, set the SRC_PER bits of the corresponding CFGx register.
– To define a HS interface as the destination of a channel, set the DST_PER bits of the corresponding CFGx register.

Since both the SRC_PER and DST_PER fields are 4 bits wide ([42:39] and [46:43] respectively), you can write any value from 0 to 15.

DMAC configuration example

Here is an example of how to route a peripheral to a DMAC, for instance the camera interface on the DMAC0 core.

1. According to Table 49, CAM1_EVEN corresponds to line HS#16, which is controlled by DMAC_HS_SEL bit 12. To select CAM1_EVEN, set DMAC_HS_SEL[12] to 0.
2. According to the DMAC_SEL[0] register description:
hs0_16_map:
0 : hs0 on DMA0, hs16 on DMA1
1 : hs0 on DMA1, hs16 on DMA0
Therefore, CAM1_EVEN can be routed on the HS#0 of both DMACs:
– If DMAC_SEL[0] = 1, CAM1_EVEN is assigned to DMAC0 and ADC_TX to DMAC1.
– If DMAC_SEL[0] = 0, CAM1_EVEN is assigned to DMAC1 and ADC_TX to DMAC0.
3. Select whether the peripheral is source or destination:
– If the CAM is the source, set SRC_PER = CFGx[42:39] = 0x0.
– If the peripheral is the destination, set DST_PER = CFGx[46:43] = 0x0.

The same procedure should be followed for the other peripherals.
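The routing steps of the configuration example can be sketched in C as bit manipulations on software images of the registers. This is only an illustration: the structure `dmac_route_regs`, the helpers `write_bit` and `write_field`, and the function `route_cam1_even_to_dmac0` are hypothetical names, and no register addresses are assumed (they are given in RM0089). The bit positions are the ones stated in Table 49, in the DMAC_SEL[0] description and in Step 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software image of the registers involved (not a hardware map). */
struct dmac_route_regs {
    uint32_t dmac_hs_sel; /* image of MISC register DMAC_HS_SEL */
    uint32_t dmac_sel;    /* image of MISC register DMAC_SEL */
    uint64_t cfgx;        /* image of the channel CFGx register */
};

#define CFG_SRC_PER_SHIFT 39u   /* SRC_PER = CFGx[42:39] */
#define CFG_DST_PER_SHIFT 43u   /* DST_PER = CFGx[46:43] */
#define CFG_PER_MASK      0xFull /* both fields are 4 bits wide */

/* Return reg with the given bit forced to value (0 or 1). */
static uint32_t write_bit(uint32_t reg, unsigned bit, unsigned value)
{
    assert(bit < 32u && value <= 1u);
    return (reg & ~(UINT32_C(1) << bit)) | ((uint32_t)value << bit);
}

/* Return reg with the 4-bit field at shift replaced by value. */
static uint64_t write_field(uint64_t reg, unsigned shift, uint64_t value)
{
    assert(value <= CFG_PER_MASK);
    return (reg & ~(CFG_PER_MASK << shift)) | (value << shift);
}

/* Route CAM1_EVEN (HS#16) to DMAC0 as the source of channel x. */
static void route_cam1_even_to_dmac0(struct dmac_route_regs *r)
{
    /* Step 1: Table 49 maps HS#16 to DMAC_HS_SEL bit 12; value 0 selects CAM1_EVEN. */
    r->dmac_hs_sel = write_bit(r->dmac_hs_sel, 12u, 0u);
    /* Step 3: hs0_16_map = 1 puts hs16 on DMA0 (and hs0 on DMA1). */
    r->dmac_sel = write_bit(r->dmac_sel, 0u, 1u);
    /* Step 4: the request arrives on HS#0 of DMAC0, so SRC_PER = 0x0. */
    r->cfgx = write_field(r->cfgx, CFG_SRC_PER_SHIFT, 0x0u);
}
```

On real hardware the three images would be read from and written back to the corresponding registers; the sketch leaves the flow controller and direction selection (Step 2) untouched.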
Summarizing:

CAM1_EVEN | 0x0
CAM1_ODD | 0x1
CAM2_EVEN | 0x2
CAM2_ODD | 0x3
CAM3_EVEN | 0x4
CAM3_ODD | 0x5
CAM4_EVEN | 0x6
CAM4_ODD | 0x7

12.5.3 DMAC transfers

This section discusses how a single block transfer, made up of transactions, is performed.

The device that controls the length of a block is known as the flow controller. The DMAC, the source peripheral or the destination peripheral must be assigned as the flow controller.
● If the block size is known before the channel is enabled, then the DMAC should be programmed as the flow controller.
● If the block size is unknown when the DMAC channel is enabled, either the source or destination peripheral must be the flow controller.

Table 53 lists valid transfer types and flow controller combinations.

See also: Section 12.6.1: DMAC transfer types for programming information.

Table 53. Transfer types and flow controller combinations

Transfer type | Flow controller
Memory to memory | DMAC
Memory to peripheral | DMAC
Memory to peripheral | Peripheral
Peripheral to memory | DMAC
Peripheral to memory | Peripheral
Peripheral to peripheral | DMAC
Peripheral to peripheral | Source peripheral
Peripheral to peripheral | Destination peripheral

Handshaking interfaces are used at the transaction level to control the flow of single or burst transactions. The operation of the handshaking interface depends on whether the peripheral or the DMAC is the flow controller.

The peripheral uses the handshaking interface to indicate to the DMAC that it is ready to transfer or accept data over the AHB bus.
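The combinations in Table 53 follow one rule: the DMAC can always be the flow controller, while a source or destination can be the flow controller only if it is a (non-memory) peripheral. The C sketch below encodes that rule as a check; the enumerations and the function name `flow_controller_is_valid` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum endpoint  { EP_MEMORY, EP_PERIPHERAL };
enum flow_ctrl { FC_DMAC, FC_SRC_PERIPHERAL, FC_DST_PERIPHERAL };

/* Return true when (src, dst, fc) is one of the rows of Table 53. */
static bool flow_controller_is_valid(enum endpoint src, enum endpoint dst,
                                     enum flow_ctrl fc)
{
    switch (fc) {
    case FC_DMAC:
        return true;                 /* valid for every transfer type */
    case FC_SRC_PERIPHERAL:
        return src == EP_PERIPHERAL; /* memory can never be a flow controller */
    case FC_DST_PERIPHERAL:
        return dst == EP_PERIPHERAL;
    }
    assert(!"unknown flow controller value");
    return false;
}
```

A driver could run such a check before enabling a channel, so that a memory-to-memory transfer is never programmed with a peripheral flow controller.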
A non-memory peripheral can request a DMA transfer through the DMAC using one of two types of handshaking interfaces:
● hardware handshaking: it is accomplished using a dedicated handshaking interface
● software handshaking: it is accomplished through memory-mapped registers

Software selects between the hardware or software handshaking interface on a per-channel basis. The type of handshaking interface depends on whether the peripheral is a flow controller or not.

For a memory peripheral there is no handshaking interface with the DMAC, and therefore the memory peripheral can never be a flow controller. Once the channel is enabled, the transfer proceeds immediately without waiting for a transaction request.

Software handshaking

When the slave peripheral requires the DMAC to perform a DMA transaction, it communicates this request by sending an interrupt to the CPU or interrupt controller. The interrupt service routine then uses the software handshake registers to initiate and control a DMA transaction. This group of software registers is used to implement the software handshaking interface.

Handshaking interface – DMAC flow controller

When the peripheral is not the flow controller, the DMAC tries to transfer the data efficiently, using as little bus bandwidth as possible. Generally, the DMAC tries to transfer the data using burst transactions and, where possible, to fill or empty the channel FIFO in single bursts, provided that the software has not limited the burst length.

The DMAC can also lock the arbitration for the master bus interface so that a channel is permanently granted the master bus interface. Additionally, the DMAC can assert the AMBA HLOCK signal to lock the system arbiter.

Single transaction region

There are cases where a DMA block transfer cannot be completed using only burst transactions.
Typically this occurs when the block size is not a multiple of the burst transaction length. In these cases, the block transfer uses burst transactions up to the point where the amount of data left to complete the block is less than the amount of data in a burst transaction. At this point, the DMAC samples the “single” status flag and completes the block transfer using single transactions. The peripheral asserts a single status flag to indicate to the DMAC that there is enough data or space to complete a single transaction from or to the source/destination peripheral.

The single transaction region is the time interval where the DMAC uses single transactions to complete the block transfer; burst transactions are exclusively used outside this region.

Early-terminated burst transaction

When a source or destination peripheral is in the single transaction region, a burst transaction can still be requested. In that case, however, src_burst_size_bytes or dst_burst_size_bytes is greater than the number of bytes left to complete in the source/destination block transfer at the time that the burst transaction is triggered. The burst transaction is then started and “early-terminated” at block completion without transferring the programmed amount of data (that is, src_burst_size_bytes or dst_burst_size_bytes), but only the amount required to complete the block transfer. An early-terminated burst transaction occurs between the DMAC and the peripheral only when the peripheral is not the flow controller.

Handshaking interface – Peripheral flow controller

When the peripheral is the flow controller, it controls the length of the block and must communicate to the DMAC when the block transfer is completed. The peripheral does this by telling the DMAC that the current transaction – burst or single – is the last transaction in the block.
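As a rough illustration of the single transaction region described above, the sketch below splits a block into full burst transactions followed by single transactions. It assumes that sizes are counted in data items of the programmed transfer width and that each single transaction moves one item; the type `block_plan` and the function `plan_block` are hypothetical names. It does not model the early-terminated burst alternative.

```c
#include <assert.h>
#include <stdint.h>

/* Number of transactions needed to move one block. */
struct block_plan {
    uint32_t bursts;  /* full burst transactions */
    uint32_t singles; /* single transactions in the single transaction region */
};

/* Split a block of block_items data items into bursts of burst_items
 * items, completing the remainder with single transactions. */
static struct block_plan plan_block(uint32_t block_items, uint32_t burst_items)
{
    struct block_plan p;

    assert(burst_items > 0u);
    p.bursts = block_items / burst_items;  /* bursts used outside the region */
    p.singles = block_items % burst_items; /* data left is less than one burst */
    return p;
}
```

When the block size is a multiple of the burst length, `singles` is zero and the single transaction region is never entered.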
When the peripheral is the flow controller and the block size is not a multiple of the source/destination burst transaction length, then the peripheral must use single transactions to complete a block transfer.

When the peripheral is the flow controller, it indicates directly to the DMAC which type of transaction – single or burst – to perform. Where possible, the DMAC uses the maximum possible burst length. It can also lock the arbitration for the master bus so that a channel is permanently granted the master bus interface. The DMAC can also assert the HLOCK signal to lock the system arbiter.

Setting up transfers

Transfers are set up by programming fields of the CTLx and CFGx registers for that channel. A single block is made up of numerous transactions – single and burst – which are in turn composed of AHB transfers. A peripheral requests a transaction through the handshaking interface to the DMAC. The operation of the handshaking interface depends on what is acting as the flow controller.

12.5.4 Generating requests for the AHB master bus interface

Each channel has a source state machine and a destination state machine running in parallel. These state machines generate the request inputs to the arbiter, which arbitrates for the master bus interface (one arbiter per master bus interface).

When the source/destination state machine is granted control of the master bus interface, and when the master bus interface is granted control of the external AHB bus, then AHB transfers between the peripheral and the DMAC (on behalf of the granted state machine) can take place.

AHB transfers from the source peripheral or to the destination peripheral cannot proceed until the channel FIFO is ready. For burst transaction requests and for transfers involving memory peripherals, the criterion for “FIFO readiness” is controlled by the FIFO_MODE field of the CFGx register.
The definition of FIFO readiness is the same for:
● Single transactions
● Burst transactions, where CFGx.FIFO_MODE = 0
● Transfers involving memory peripherals, where CFGx.FIFO_MODE = 0

The channel FIFO is deemed ready when the space/data available is sufficient to complete a single AHB transfer of the specified transfer width. FIFO readiness for source transfers occurs when the channel FIFO contains enough room to accept at least a single transfer of CTLx.SRC_TR_WIDTH width. FIFO readiness for destination transfers occurs when the channel FIFO contains data to form at least a single transfer of CTLx.DST_TR_WIDTH width.

When CFGx.FIFO_MODE = 1, the criterion for FIFO readiness for burst transaction requests and transfers involving memory peripherals is as follows:
● A FIFO is ready for a source burst transfer when the FIFO is less than half empty;
● A FIFO is ready for a destination burst transfer when the FIFO is greater than or equal to half full.

When the source/destination peripheral is not memory, the source/destination state machine waits for a single/burst transaction request. Upon receipt of a transaction request, and only if the channel FIFO is “ready” for source/destination AHB transfers, a request for the master bus interface is made by the source/destination state machine.

When the source/destination peripheral is memory, the source/destination state machine must wait until the channel FIFO is “ready”. A request is then made for the master bus interface. There is no handshaking mechanism employed between a memory peripheral and the DMAC.

12.5.5 AHB master interface arbitration

Each DMAC channel has two request lines that request ownership of a particular master bus interface: channel source and channel destination request lines. Source and destination arbitrate separately for the bus.
Once a source/destination state machine gains ownership of the master bus interface and the master bus interface has ownership of the AHB bus, then AHB transfers can proceed between the peripheral and DMAC.

An arbitration scheme decides which of the request lines is granted the particular master bus interface. Each channel has a programmable priority. A request for the master bus interface can be made at any time, but is granted only after the current AHB transfer (burst or single) has completed. Therefore, if the master interface is transferring data for a lower priority channel and a higher priority channel requests service, then the master interface will complete the current burst for the lower priority channel before switching to transfer data for the higher priority channel.

12.5.6 Scatter/Gather

Scatter is relevant to a destination transfer. The destination address is incremented or decremented by a programmed amount – the destination scatter interval (DSI) field of the DSRx register – multiplied by the number of bytes in a single AHB transfer to the destination when a scatter boundary is reached. The number of destination transfers between successive scatter boundaries is programmed into the destination scatter count (DSC) field of the DSRx register.

Scatter is enabled by writing a 1 to the CTLx.DST_SCATTER_EN field. The CTLx.DINC field determines if the address is incremented, decremented, or remains fixed when a scatter boundary is reached. If the CTLx.DINC field indicates a fixed-address control throughout a DMA transfer, then the CTLx.DST_SCATTER_EN field is ignored, and the scatter feature is automatically disabled.

Gather is relevant to a source transfer.
The source address is incremented or decremented by a programmed amount – the source gather interval (SGI) field of the SGRx register – multiplied by the number of bytes in a single AHB transfer from the source when a gather boundary is reached. The number of source transfers between successive gather boundaries is programmed into the source gather count (SGC) field of the SGRx register.

Gather is enabled by writing a 1 to the CTLx.SRC_GATHER_EN field. The CTLx.SINC field determines if the address is incremented, decremented, or remains fixed when a gather boundary is reached. If the CTLx.SINC field indicates a fixed-address control throughout a DMA transfer, then the CTLx.SRC_GATHER_EN field is ignored, and the gather feature is automatically disabled.

12.5.7 Endianness

The endianness of the AHB slave interface is statically configured to little-endian for both the DMACs. Endianness of each AHB master interface for both DMACs can be dynamically configured by programming a miscellaneous register.

Because two DMACs are instantiated in the wrapper and each DMAC has two master interfaces, four pins are connected to MISC:
● DMA0_BIG_END_M1
● DMA0_BIG_END_M2
● DMA1_BIG_END_M1
● DMA1_BIG_END_M2

0 = Little-endian
1 = Big-endian
Default value: 0.

12.6 Programming

12.6.1 DMAC transfer types

A DMA transfer may consist of single or multi-block transfers. On successive blocks of a multi-block transfer, the SARx/DARx register in the DMAC is reprogrammed using one of the following methods:
● Block chaining using linked lists
● Auto-reloading
● Contiguous address between blocks

On successive blocks of a multi-block transfer, the CTLx register in the DMAC is reprogrammed using either of the following methods:
● Block chaining using linked lists
● Auto-reloading

When block chaining, using Linked Lists is the multi-block method of choice.
On successive blocks, the LLPx register in the DMAC is reprogrammed using block chaining with linked lists.

A block descriptor consists of six registers: SARx, DARx, LLPx, CTLx, SSTATx, and DSTATx. The first four registers, along with the CFGx register, are used by the DMAC to set up and describe the block transfer.

Note: The terms Linked List Item (LLI) and block descriptor are synonymous.

Multi-block transfers

Multi-block transfers are enabled by setting the DMAH_CHx_MULTI_BLK_EN configuration parameter to True.

Note: Multi-block transfers in which the source and destination are swapped during the transfer are not supported. In a multi-block transfer, the direction must not change for the duration of the transfer.

Block chaining using linked lists

To enable multi-block transfers using block chaining, you must set the configuration parameter DMAH_CHx_MULTI_BLK_EN to True and the DMAH_CHx_HC_LLP parameter to False. In this case, the DMAC reprograms the channel registers prior to the start of each block by fetching the block descriptor for that block from system memory. This is known as an LLI update.

DMAC block chaining uses a Linked List Pointer register (LLPx) that stores the address in memory of the next linked list item. Each LLI contains the corresponding block descriptor:
1. SARx
2. DARx
3. LLPx
4. CTLx
5. SSTATx
6. DSTATx

To set up block chaining, you program a sequence of Linked List Items in memory. LLI accesses are always 32-bit accesses (Hsize = 2) aligned to 32-bit boundaries and cannot be changed or programmed to anything other than 32-bit, even if the AHB master interface of the LLI supports more than a 32-bit data width.

The SARx, DARx, LLPx, and CTLx registers are fetched from system memory on an LLI update. If configuration parameter DMAH_CHx_CTL_WB_EN = True, then the updated contents of the CTLx, SSTATx, and DSTATx registers are written back to memory on block completion.
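The chaining rule above can be sketched in C. This is only an illustration under stated assumptions: the type `struct dmac_lli`, the function `lli_chain_link`, and its parameters are hypothetical names, the layout assumes that both status write-back slots are present, and the descriptor array is assumed to sit at a 32-bit bus address that the DMAC can reach. The CTLx words are left for the caller to fill in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory image of one LLI (block descriptor), in fetch order.
 * CTLx is 64 bits wide and is stored as two 32-bit words. */
struct dmac_lli {
    uint32_t sar;     /* LLI.SARx: source address of this block        */
    uint32_t dar;     /* LLI.DARx: destination address of this block   */
    uint32_t llp;     /* LLI.LLPx: address of the next LLI, 0 at end   */
    uint32_t ctl_lo;  /* LLI.CTLx[31:0]                                */
    uint32_t ctl_hi;  /* LLI.CTLx[63:32]                               */
    uint32_t sstat;   /* LLI.SSTATx write-back slot (assumed present)  */
    uint32_t dstat;   /* LLI.DSTATx write-back slot                    */
};

/* Link n descriptors into a chain that describes n equally sized blocks.
 * lli_bus_addr is the (assumed) bus address of lli[0] as seen by the DMAC. */
static void lli_chain_link(struct dmac_lli *lli, size_t n, uint32_t lli_bus_addr,
                           uint32_t src, uint32_t dst, uint32_t block_bytes)
{
    assert(n > 0);
    assert((lli_bus_addr & 0x3u) == 0u);  /* LLI accesses are 32-bit aligned */

    for (size_t i = 0; i < n; i++) {
        lli[i].sar = src + (uint32_t)i * block_bytes;
        lli[i].dar = dst + (uint32_t)i * block_bytes;
        /* Every item except the last points to the next item. */
        lli[i].llp = (i + 1 < n)
            ? lli_bus_addr + (uint32_t)((i + 1) * sizeof(struct dmac_lli))
            : 0u;
    }
}
```

The first LLPx value programmed into the channel would then be `lli_bus_addr` itself, and the last item carries a zero pointer so that the chain terminates.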
Figure 46 and Figure 47 show how you use chained linked lists in memory to define multi-block transfers using block chaining.

Figure 46. Multiblock transfer using linked lists when DMAH_CHx_STAT_SRC set to true
[Figure: LLI(0) and LLI(1) in system memory. Each item holds SARx, DARx, LLPx, CTLx[31:0], CTLx[63:32], and the write-back locations for SSTATx and DSTATx. LLPx(0) points to LLI(0); the LLPx(1) entry of LLI(0) points to LLI(1); the LLPx(2) entry of LLI(1) points to the next item.]

It is assumed that no allocation is made in system memory for the source status when the configuration parameter DMAH_CHx_STAT_SRC is set to False. If this parameter is False, then the order of a Linked List item is as follows:
1. SARx
2. DARx
3. LLPx
4. CTLx
5. DSTATx

Figure 47. Multiblock transfer using linked lists when DMAH_CHx_STAT_SRC set to false
[Figure: same chain as Figure 46, but each item holds only SARx, DARx, LLPx, CTLx[31:0], CTLx[63:32], and the write-back location for DSTATx.]

Note: In order to not confuse the SARx, DARx, LLPx, CTLx, SSTATx, and DSTATx register locations of the LLI with the corresponding DMAC memory mapped register locations, the LLI register locations are prefixed with LLI; that is, LLI.SARx, LLI.DARx, LLI.LLPx, LLI.CTLx, LLI.SSTATx, and LLI.DSTATx.

Figure 48 and Figure 49 show the mapping of a Linked List Item stored in memory to the channel registers block descriptor.

Rows 6 through 10 of Table 54 show the required values of LLPx, CTLx, and CFGx for multiblock DMA transfers using block chaining.

For rows 6 through 10 of Table 54, the LLI.CTLx, LLI.LLPx, LLI.SARx, and LLI.DARx register locations of the LLI are always affected at the start of every block transfer. The LLI.LLPx and LLI.CTLx locations are always used to reprogram the DMAC LLPx and CTLx registers.
However, depending on the Table 54 row number, the LLI.SARx/LLI.DARx address may or may not be used to reprogram the DMAC SARx/DARx registers.

Table 54. Programming of transfer types and channel register update method

Transfer type | LLP.LOC = 0 | LLP_SRC_EN (CTLx) | RELOAD_SRC (CFGx) | LLP_DST_EN (CTLx) | RELOAD_DST (CFGx) | CTLx, LLPx update method | SARx update method | DARx update method | Write back(1)
1. Single-block or last transfer of multi-block | Yes | 0 | 0 | 0 | 0 | None, user reprograms | None (single) | None (single) | No
2. Auto-reload multi-block transfer with contiguous SAR | Yes | 0 | 0 | 0 | 1 | CTLx, LLPx are reloaded from initial values | Contiguous | Auto-reload | No
3. Auto-reload multi-block transfer with contiguous DAR | Yes | 0 | 1 | 0 | 0 | CTLx, LLPx are reloaded from initial values | Auto-reload | Contiguous | No
4. Auto-reload multi-block transfer | Yes | 0 | 1 | 0 | 1 | CTLx, LLPx are reloaded from initial values | Auto-reload | Auto-reload | No
5. Single-block or last transfer of multi-block | No | 0 | 0 | 0 | 0 | None, user reprograms | None (single) | None (single) | Yes
6. Linked list multi-block transfer with contiguous SAR | No | 0 | 0 | 1 | 0 | CTLx, LLPx loaded from next Linked List item | Contiguous | Linked list | Yes
7. Linked list multi-block transfer with auto-reload SAR | No | 0 | 1 | 1 | 0 | CTLx, LLPx loaded from next Linked List item | Auto-reload | Linked list | Yes
8. Linked list multi-block transfer with contiguous DAR | No | 1 | 0 | 0 | 0 | CTLx, LLPx loaded from next Linked List item | Linked list | Contiguous | Yes
9. Linked list multi-block transfer with auto-reload DAR | No | 1 | 0 | 0 | 1 | CTLx, LLPx loaded from next Linked List item | Linked list | Auto-reload | Yes
10. Linked list multi-block transfer | No | 1 | 0 | 1 | 0 | CTLx, LLPx loaded from next Linked List item | Linked list | Linked list | Yes

1. This column assumes that the configuration parameter DMAH_CHx_CTL_WB_EN = True. If DMAH_CHx_CTL_WB_EN = False, then there is never writeback of the control and status registers regardless of transfer type, and all rows of this column are "No".

Figure 48. Mapping of block descriptor (LLI) in memory to channel registers when DMAH_CHx_STAT_SRC set to True (hsize = 32, fixed offsets)

LLI location | Address
LLI.DSTATx | {LLPx[31:2], 2'b00} + 0x18
LLI.SSTATx | {LLPx[31:2], 2'b00} + 0x14
LLI.CTLx[63:32] | {LLPx[31:2], 2'b00} + 0x10
LLI.CTLx[31:0] | {LLPx[31:2], 2'b00} + 0xc
LLI.LLPx | {LLPx[31:2], 2'b00} + 0x8
LLI.DARx | {LLPx[31:2], 2'b00} + 0x4
LLI.SARx | {LLPx[31:2], 2'b00}, the base address of the LLI (LLPx.LOC)

Figure 49. Mapping of block descriptor (LLI) in memory to channel registers when DMAH_CHx_STAT_SRC set to False (hsize = 32, fixed offsets)

LLI location | Address
LLI.DSTATx | {LLPx[31:2], 2'b00} + 0x14
LLI.CTLx[63:32] | {LLPx[31:2], 2'b00} + 0x10
LLI.CTLx[31:0] | {LLPx[31:2], 2'b00} + 0xc
LLI.LLPx | {LLPx[31:2], 2'b00} + 0x8
LLI.DARx | {LLPx[31:2], 2'b00} + 0x4
LLI.SARx | {LLPx[31:2], 2'b00}, the base address of the LLI (LLPx.LOC)

Note: Throughout this chapter, there are descriptions about fetching the LLI.CTLx register from the location pointed to by the LLPx register. This exact location is the LLI base address (stored in LLPx register) plus the fixed offset. For example, in Figure 48, the location of the LLI.CTLx register is LLPx.LOC + 0xc.
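As a minimal sketch, the fixed offsets of Figure 48 and Figure 49 can be expressed as a small address helper in C. The names `enum lli_field` and `lli_field_addr` are hypothetical and not part of any ST or DMAC API; the `stat_src` argument simply mirrors the DMAH_CHx_STAT_SRC configuration parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LLI fields in the order they appear in memory. */
enum lli_field {
    LLI_SAR, LLI_DAR, LLI_LLP, LLI_CTL_LO, LLI_CTL_HI, LLI_SSTAT, LLI_DSTAT
};

/*
 * Address of one LLI field for a given LLPx register value.
 * stat_src mirrors DMAH_CHx_STAT_SRC: when false, no SSTATx slot exists
 * and DSTATx moves down to offset 0x14.
 */
static uint32_t lli_field_addr(uint32_t llpx, enum lli_field field, bool stat_src)
{
    uint32_t base = llpx & ~(uint32_t)0x3u;   /* {LLPx[31:2], 2'b00} */

    switch (field) {
    case LLI_SAR:    return base + 0x00u;
    case LLI_DAR:    return base + 0x04u;
    case LLI_LLP:    return base + 0x08u;
    case LLI_CTL_LO: return base + 0x0Cu;
    case LLI_CTL_HI: return base + 0x10u;
    case LLI_SSTAT:
        assert(stat_src);                     /* slot is absent otherwise */
        return base + 0x14u;
    case LLI_DSTAT:  return base + (stat_src ? 0x18u : 0x14u);
    }
    assert(0 && "unknown LLI field");
    return 0u;
}
```

Masking the two low bits before adding the offset follows the {LLPx[31:2], 2'b00} notation used in the figures, so a helper like this could be used to locate LLI.CTLx[63:32] when polling for block completion.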
Referring to Table 54, if the Write Back column entry is "Yes" and the configuration parameter DMAH_CHx_CTL_WB_EN = True, then the CTLx[63:32] register is always written to system memory (to LLI.CTLx[63:32]) at the end of every block transfer.

The source status is fetched and written to system memory at the end of every block transfer if the Write Back column entry is "Yes", DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_SRC = True, and CFGx.SS_UPD_EN is enabled.

The destination status is fetched and written to system memory at the end of every block transfer if the Write Back column entry is "Yes", DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_DST = True, and CFGx.DS_UPD_EN is enabled.

Auto-reloading of channel registers

During auto-reloading, the channel registers are reloaded with their initial values at the completion of each block and the new values used for the new block. Depending on the row number in Table 54, some or all of the SARx, DARx, and CTLx channel registers are reloaded from their initial value at the start of a block transfer.

Contiguous address between blocks

In this case, the address between successive blocks is selected as a continuation from the end of the previous block. Enabling the source or destination address to be contiguous between blocks is a function of the CTLx.LLP_SRC_EN, CFGx.RELOAD_SRC, CTLx.LLP_DST_EN, and CFGx.RELOAD_DST fields (see Table 54).

Note: You cannot select both SARx and DARx updates to be contiguous. If you want this functionality, you should increase the size of the Block Transfer (CTLx.BLOCK_TS), or if this is at the maximum value, use Row 10 of Table 54 and set up the LLI.SARx address of the block descriptor to be equal to the end SARx address of the previous block. Similarly, set up the LLI.DARx address of the block descriptor to be equal to the end DARx address of the previous block.
For more information, refer to the section "Multi-block transfer with linked list for source and linked list for destination (Row 10)".

Suspension of transfers between blocks

At the end of every block transfer, an end-of-block interrupt is asserted if:
1. Interrupts are enabled, CTLx.INT_EN = 1, and
2. The channel block interrupt is unmasked, MaskBlock[n] = 1, where n is the channel number.

Note: The block-complete interrupt is generated at the completion of the block transfer to the destination.

For rows 6, 8, and 10 of Table 54, the DMA transfer does not stall between block transfers. For example, at the end of block N, the DMAC automatically proceeds to block N + 1.

For rows 2, 3, 4, 7, and 9 of Table 54 (SARx and/or DARx auto-reloaded between block transfers), the DMA transfer automatically stalls after the end-of-block interrupt is asserted, if the end-of-block interrupt is enabled and unmasked. The DMAC does not proceed to the next block transfer until a write to the ClearBlock[n] block interrupt clear register, done by software to clear the channel block-complete interrupt, is detected by hardware.

For rows 2, 3, 4, 7, and 9 of Table 54 (SARx and/or DARx auto-reloaded between block transfers), the DMA transfer does not stall if either:
● Interrupts are disabled, CTLx.INT_EN = 0, or
● The channel block interrupt is masked, MaskBlock[n] = 0, where n is the channel number.

Channel suspension between blocks is used to ensure that the end-of-block ISR (interrupt service routine) of the next-to-last block is serviced before the final block starts. This ensures that the ISR has cleared the CFGx.RELOAD_SRC and/or CFGx.RELOAD_DST bits before completion of the final block. The reload bits CFGx.RELOAD_SRC and/or CFGx.RELOAD_DST should be cleared in the end-of-block ISR for the next-to-last block transfer.

Ending multi-block transfers

All multi-block transfers must end as shown in either Row 1 or Row 5 of Table 54.
At the end of every block transfer, the DMAC samples the row number, and if the DMAC is in the Row 1 or Row 5 state, then the previous block transferred was the last block and the DMA transfer is terminated. Row 1 and Row 5 are used for single-block transfers or terminating multi-block transfers. Transfers initiated in rows 2, 3 or 4 can only end in row 1; similarly, transfers initiated in rows 6 through 10 can only end in row 5. Ending in the Row 5 state enables status fetch and write-back for the last block. Ending in the Row 1 state disables status fetch and write-back for the last block.

For rows 2, 3, and 4 of Table 54 (LLPx.LOC = 0 and CFGx.RELOAD_SRC and/or CFGx.RELOAD_DST is set), multi-block DMA transfers continue until both the CFGx.RELOAD_SRC and CFGx.RELOAD_DST bits are cleared by software. They should be programmed to 0 in the end-of-block interrupt service routine that services the next-to-last block transfer; this puts the DMAC into the Row 1 state.

For rows 6, 8, and 10 of Table 54 (both CFGx.RELOAD_SRC and CFGx.RELOAD_DST cleared), the user must set up the last block descriptor in memory so that both LLI.CTLx.LLP_SRC_EN and LLI.CTLx.LLP_DST_EN are 0.

The sampling of the LLPx.LOC bit takes place exclusively at the beginning of the transfer when the channel is enabled. This determines whether writeback is enabled throughout the complete transfer, and changing the value of this bit in subsequent blocks on the same transfer does not have any effect.

Note: The only allowed transitions between the rows of Table 54 are from any row into Row 1 or Row 5. As already stated, a transition into row 1 or row 5 is used to terminate the DMA transfer; all other transitions between rows are not allowed. Software must ensure that illegal transitions between rows do not occur between blocks of a multi-block transfer.
For example, if block N is in row 10, then the only allowed rows for block N + 1 are rows 10 or 5.

12.6.2 Programming example

The following flow diagram shows an overview of programming the DMA described in the section "Programming example for linked list multi-block transfer".

Figure 50. Flowchart for DMA programming example
[Flowchart: read ChEnReg until the channel is idle; clear pending interrupts; program the CTLx fields (DONE, BLOCK_TS, LLP_SRC_EN/LLP_DST_EN, TT_FC, SRC_TR_WIDTH, DST_TR_WIDTH, SMS/DMS, SINC/DINC, SRC_MSIZE/DEST_MSIZE, SRC_GATHER_EN/DST_SCATTER_EN, INT_EN); program the channel configuration fields (HS_SEL_SRC/HS_SEL_DST, SRC_PER/DEST_PER when hardware handshaking is enabled, SS_UPD_EN/DS_UPD_EN, LOCK_B and LOCK_B_L, LOCK_CH and LOCK_CH_L, FIFO_EMPTY, CH_SUSP, CH_PRIOR, PROTCTL, FIFO_MODE, FCMODE, RELOAD_SRC/RELOAD_DST, SRC_HS_POL/DST_HS_POL, MAX_ABRST); set the LLPx and SARx/DARx register locations of all LLI entries; program SGRx if gather is enabled and DSRx if scatter is enabled; clear pending interrupts; write to ChEnReg to enable the DMAC.]

Programming example for linked list multi-block transfer

This section explains the step-by-step programming of the DMAC. The example demonstrates row 10 of Table 54 for multi-block transfer with linked list for source and linked list for destination. This example uses the DMAC to move four blocks of contiguous data from source to destination memory using the linked list feature.

1. Set up the chain of linked list items (otherwise known as block descriptors) in memory. Write the control information in the LLI.CTLx register location of the block descriptor for each LLI in memory for Channel 1. In the LLI.CTLx register, the following is programmed:
a) Set up the transfer type for a memory-to-memory transfer:
   ctlx[22:20] = 3'b000;
b) Set up the transfer characteristics:
   - Transfer width for the source in the SRC_TR_WIDTH field
     ctlx[6:4] = 3'b001;
   - Transfer width for the destination in the DST_TR_WIDTH field
     ctlx[3:1] = 3'b001;
   - Source master layer in the SMS field where the source resides
     ctlx[26:25] = 2'b00;
   - Destination master layer in the DMS field where the destination resides
     ctlx[24:23] = 2'b00;
   - Incrementing address for the source in the SINC field
     ctlx[10:9] = 2'b00;
   - Incrementing address for the destination in the DINC field
     ctlx[8:7] = 2'b00;

2. Write the channel configuration information into the CFGx register for Channel 1:
a) The HS_SEL_SRC/HS_SEL_DST bits select which of the handshaking interfaces (hardware or software) is active for source and destination requests on this channel.
   cfgx[11] = 1'b0;
   cfgx[10] = 1'b0;
   These settings are ignored because both the source and destination are memory types.
b) If the hardware handshaking interface is activated for the source or destination peripheral, assign the handshaking interface to the source and destination peripheral by programming the SRC_PER and DEST_PER bits:
   cfgx[46:43] = 1'b0;
   cfgx[42:39] = 1'b0;
   These settings are ignored because both the source and destination are memory types.

3. The following For loop, shown as a programming example, sets the following:
– LLI.LLPx register locations of all LLI entries in memory (except the last) to nonzero and point to the base address of the next Linked List Item.
– LLI.SARx/LLI.DARx register locations of all LLI entries in memory point to the start source/destination block address preceding that LLI fetch.

The For statement below configures the LLPx entries:

    for (i = 0; i < 4; i = i + 1) begin
      if (i == 3)
        llpx = 0;               // end of LLI
      else
        llpx = llp_addr + 20;   // start of next LLI

      //-: Program SAR
      `AHB_MASTER.write(0, llp_addr, sarx, AhbWord32Attrb, handle[0]);
      //-: Program DAR
      `AHB_MASTER.write(0, (llp_addr + 4), darx, AhbWord32Attrb, handle[0]);
      //-: Program LLP
      `AHB_MASTER.write(0, (llp_addr + 8), llpx, AhbWord32Attrb, handle[0]);
      //-: Program CTL
      `AHB_MASTER.write(0, (llp_addr + 12), ctlx[31:0], AhbWord32Attrb, handle[0]);
      `AHB_MASTER.write(0, (llp_addr + 16), ctlx[63:32], AhbWord32Attrb, handle[0]);

      // update pointers
      llp_addr = llp_addr + 20; // start of next LLI
      // 4 16-bit words each with scatter/gather interval in each block
      // (will work only with scatter_gather count of 2)
      sarx = sarx + 24;
      darx = darx + 24;
    end

4. If Gather is enabled (DMAH_CHx_SRC_GAT_EN = True and CTLx.SRC_GATHER_EN is enabled), program the SGRx register for Channel 1.
5. If Scatter is enabled (DMAH_CHx_DST_SCA_EN = True and CTLx.DST_SCATTER_EN is enabled), program the DSRx register for Channel 1.
6. Clear any pending interrupts on the channel from the previous DMA transfer by writing to the Interrupt Clear registers.
7.
Finally, enable the channel by writing a 1 to the ChEnReg.CH_EN bit; the transfer is performed.

12.6.3 Programming a channel

Three registers – LLPx, CTLx, and CFGx – need to be programmed to determine whether single- or multi-block transfers occur, and which type of multi-block transfer is used. The different transfer types are shown in Table 54.

The DMAC can be programmed to fetch the status from the source or destination peripheral; this status is stored in the SSTATx and DSTATx registers. When the DMAC is programmed to fetch the status from the source or destination peripheral, it writes this status and the contents of the CTLx register back to memory at the end of a block transfer. The Write Back column of Table 54 shows when this occurs.

The "Update Method" columns indicate where the values of SARx, DARx, CTLx, and LLPx are obtained for the next block transfer when multi-block DMAC transfers are enabled.

Note: In Table 54, all other combinations of LLPx.LOC = 0, CTLx.LLP_SRC_EN, CFGx.RELOAD_SRC, CTLx.LLP_DST_EN, and CFGx.RELOAD_DST are illegal, and will cause indeterminate or erroneous behavior.

Programming examples
● Single-block transfer (Row 1)
● Multi-block transfer with linked list for source and linked list for destination (Row 10)
● Multi-block transfer with source address auto-reloaded and destination address auto-reloaded (Row 4)
● Multi-block transfer with source address auto-reloaded and linked list destination address (Row 7)
● Multi-block transfer with source address auto-reloaded and contiguous destination address (Row 3)
● Multi-block DMA transfer with linked list for source and contiguous destination address (Row 8)

Single-block transfer (Row 1)

This section describes a single-block transfer, Row 1 in Table 54.
Note: Row 5 in Table 54 is also a single-block transfer with write-back of control and status information enabled at the end of the single-block transfer.

1. Read the Channel Enable register to choose a free (disabled) channel; refer to the "ChEnReg" register.
2. Clear any pending interrupts on the channel from the previous DMA transfer by writing to the Interrupt Clear registers: ClearTfr, ClearBlock, ClearSrcTran, ClearDstTran, and ClearErr. Reading the Interrupt Raw Status and Interrupt Status registers confirms that all interrupts have been cleared.
3. Program the following channel registers:
a) Write the starting source address in the SARx register for channel x.
b) Write the starting destination address in the DARx register for channel x.
c) Program CTLx and CFGx according to Row 1, as shown in Table 54. Program the LLPx register with 0.
d) Write the control information for the DMA transfer in the CTLx register for channel x. For example, in the register, you can program the following:
- Set up the transfer type (memory or non-memory peripheral for source and destination) and flow control device by programming the TT_FC of the CTLx register.
- Set up the transfer characteristics, such as:
• Transfer width for the source in the SRC_TR_WIDTH field.
• Transfer width for the destination in the DST_TR_WIDTH field.
• Source master layer in the SMS field where the source resides.
• Destination master layer in the DMS field where the destination resides.
• Incrementing/decrementing or fixed address for the source in the SINC field.
• Incrementing/decrementing or fixed address for the destination in the DINC field.
e) Write the channel configuration information into the CFGx register for channel x.
- Designate the handshaking interface type (hardware or software) for the source and destination peripherals; this is not required for memory.
This step requires programming the HS_SEL_SRC/HS_SEL_DST bits, respectively. Writing a 0 activates the hardware handshaking interface to handle source/destination requests. Writing a 1 activates the software handshaking interface to handle source and destination requests.
- If the hardware handshaking interface is activated for the source or destination peripheral, assign a handshaking interface to the source and destination peripheral; this requires programming the SRC_PER and DEST_PER bits, respectively.
f) If gather is enabled (parameter DMAH_CHx_SRC_GAT_EN = True and CTLx.SRC_GATHER_EN is enabled), program the SGRx register for channel x.
g) If scatter is enabled (parameter DMAH_CHx_DST_SCA_EN = True and CTLx.DST_SCATTER_EN is enabled), program the DSRx register for channel x.
4. After the DMAC-selected channel has been programmed, enable the channel by writing a 1 to the ChEnReg.CH_EN bit. Ensure that bit 0 of the DmaCfgReg register is enabled.
5. The source and destination request single and burst DMA transactions in order to transfer the block of data (assuming non-memory peripherals). The DMAC acknowledges at the completion of every transaction (burst and single) in the block and carries out the block transfer.
6. Once the transfer completes, hardware sets the interrupts and disables the channel. At this time, you can respond to either the Block Complete or Transfer Complete interrupts, or poll the transfer complete raw interrupt status register (RawTfr[n], n = channel number) until it is set by hardware, in order to detect when the transfer is complete. Note that if this polling is used, the software must ensure that the transfer complete interrupt is cleared by writing to the Interrupt Clear register, ClearTfr[n], before the channel is enabled.
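The Row 1 sequence above can be sketched in C against a purely hypothetical register model. The structures `struct dmac_chan` and `struct dmac`, the channel count, and the one-bit-per-channel treatment of ChEnReg and the clear registers are assumptions made for illustration; the real address map and write protocol are given in RM0089. Only the CTLx[31:0] bit positions are taken from the programming example in this chapter, and CTLx[63:32] (which carries BLOCK_TS and DONE) is passed through as an opaque word.

```c
#include <assert.h>
#include <stdint.h>

#define DMAC_NUM_CH 8u   /* assumed channel count, for illustration only */

/* Hypothetical register model; not the real SPEAr1340 address map. */
struct dmac_chan {
    volatile uint32_t sar, dar, llp;
    volatile uint32_t ctl_lo, ctl_hi;   /* CTLx[31:0], CTLx[63:32] */
    volatile uint32_t cfg_lo, cfg_hi;   /* CFGx[31:0], CFGx[63:32] */
};

struct dmac {
    struct dmac_chan ch[DMAC_NUM_CH];
    volatile uint32_t raw_tfr;          /* RawTfr[n] */
    volatile uint32_t clear_tfr, clear_block, clear_src_tran,
                      clear_dst_tran, clear_err;
    volatile uint32_t dma_cfg;          /* DmaCfgReg, bit 0 = DMAC enable */
    volatile uint32_t ch_en;            /* ChEnReg, modeled as one bit per channel */
};

/* Pack CTLx[31:0] using the bit positions from the programming example. */
static uint32_t ctl_lo_pack(unsigned tt_fc, unsigned src_width, unsigned dst_width,
                            unsigned sms, unsigned dms, unsigned sinc, unsigned dinc)
{
    return ((sms & 0x3u) << 25) | ((dms & 0x3u) << 23) | ((tt_fc & 0x7u) << 20) |
           ((sinc & 0x3u) << 9) | ((dinc & 0x3u) << 7) |
           ((src_width & 0x7u) << 4) | ((dst_width & 0x7u) << 1);
}

/* Steps 1 to 4 of the single-block (Row 1) procedure. Returns 0 on success,
 * -1 if the channel number is invalid or the channel is still enabled. */
static int dmac_start_single_block(struct dmac *d, unsigned ch,
                                   uint32_t src, uint32_t dst,
                                   uint32_t ctl_lo, uint32_t ctl_hi)
{
    if (ch >= DMAC_NUM_CH)
        return -1;
    uint32_t bit = 1u << ch;
    if (d->ch_en & bit)                 /* step 1: channel must be free */
        return -1;

    d->clear_tfr = bit;                 /* step 2: clear pending interrupts */
    d->clear_block = bit;
    d->clear_src_tran = bit;
    d->clear_dst_tran = bit;
    d->clear_err = bit;

    d->ch[ch].sar = src;                /* step 3a */
    d->ch[ch].dar = dst;                /* step 3b */
    d->ch[ch].llp = 0u;                 /* step 3c: Row 1, LLPx.LOC = 0 */
    d->ch[ch].ctl_lo = ctl_lo;          /* step 3d: LLP_SRC_EN = LLP_DST_EN = 0 */
    d->ch[ch].ctl_hi = ctl_hi;
    d->ch[ch].cfg_lo = 0u;              /* step 3e: no reload, hardware handshake select */
    d->ch[ch].cfg_hi = 0u;

    d->dma_cfg |= 1u;                   /* step 4: DMAC enabled, then channel */
    d->ch_en |= bit;
    return 0;
}

/* Step 6, polling variant: nonzero once RawTfr[ch] has been set. */
static int dmac_transfer_done(const struct dmac *d, unsigned ch)
{
    return (int)((d->raw_tfr >> ch) & 1u);
}
```

With real hardware, `d` would point at the memory-mapped register block and the structure layout would have to match RM0089; here it is only a stand-in that keeps the ordering of the steps visible.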
Multi-block transfer with linked list for source and linked list for destination (Row 10)

Note: This type of multi-block transfer can only be enabled when either of the following parameters is set:
● DMAH_CHx_MULTI_BLK_TYPE = NO_HARDCODE, or
● DMAH_CHx_MULTI_BLK_TYPE = LLP_LLP

1. Read the Channel Enable register (ChEnReg) to choose a free (disabled) channel.
2. Set up the chain of Linked List Items (otherwise known as block descriptors) in memory. Write the control information in the LLI.CTLx register location of the block descriptor for each LLI in memory (see Figure 46) for channel x. For example, in the register, you can program the following:
a) Set up the transfer type (memory or non-memory peripheral for source and destination) and flow control device by programming the TT_FC of the CTLx register.
b) Set up the transfer characteristics, such as:
- Transfer width for the source in the SRC_TR_WIDTH field.
- Transfer width for the destination in the DST_TR_WIDTH field.
- Source master layer in the SMS field where the source resides.
- Destination master layer in the DMS field where the destination resides.
- Incrementing/decrementing or fixed address for the source in the SINC field.
- Incrementing/decrementing or fixed address for the destination in the DINC field.
3. Write the channel configuration information into the CFGx register for channel x.
a) Designate the handshaking interface type (hardware or software) for the source and destination peripherals; this is not required for memory. This step requires programming the HS_SEL_SRC/HS_SEL_DST bits, respectively. Writing a 0 activates the hardware handshaking interface to handle source/destination requests for the specific channel. Writing a 1 activates the software handshaking interface to handle source/destination requests.
b) If the hardware handshaking interface is activated for the source or destination peripheral, assign the handshaking interface to the source and destination peripheral. This requires programming the SRC_PER and DEST_PER bits, respectively.
4. Make sure that the LLI.CTLx register locations of all LLI entries in memory (except the last) are set as shown in Row 10 of Table 54. The LLI.CTLx register of the last Linked List Item must be set as described in Row 1 or Row 5 of Table 54. Figure 46 shows a Linked List example with two list items.
5. Make sure that the LLI.LLPx register locations of all LLI entries in memory (except the last) are non-zero and point to the base address of the next Linked List Item.
6. Make sure that the LLI.SARx/LLI.DARx register locations of all LLI entries in memory point to the start source/destination block address preceding that LLI fetch.
7. If parameter DMAH_CHx_CTL_WB_EN = True, ensure that the LLI.CTLx.DONE field of the LLI.CTLx register locations of all LLI entries in memory is cleared.
8. If source status fetching is enabled (DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_SRC = True, and CFGx.SS_UPD_EN is enabled), program the SSTATARx register so that the source status information can be fetched from the location pointed to by the SSTATARx register. For conditions under which the source status information is fetched from system memory, refer to the Write Back column of Table 54.
9. If destination status fetching is enabled (DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_DST = True, and CFGx.DS_UPD_EN is enabled), program the DSTATARx register so that the destination status information can be fetched from the location pointed to by the DSTATARx register. For conditions under which the destination status information is fetched from system memory, refer to the Write Back column of Table 54.
10.
If gather is enabled (DMAH_CHx_SRC_GAT_EN = True and CTLx.SRC_GATHER_EN is enabled), program the SGRx register for channel x.
11. If scatter is enabled (DMAH_CHx_DST_SCA_EN = True and CTLx.DST_SCATTER_EN is enabled), program the DSRx register for channel x.
12. Clear any pending interrupts on the channel from the previous DMA transfer by writing to the Interrupt Clear registers: ClearTfr, ClearBlock, ClearSrcTran, ClearDstTran, and ClearErr. Reading the Interrupt Raw Status and Interrupt Status registers confirms that all interrupts have been cleared.
13. Program the CTLx and CFGx registers according to Row 10, as shown in Table 54.
14. Program the LLPx register with LLPx(0), the pointer to the first linked list item.
15. Finally, enable the channel by writing a 1 to the ChEnReg.CH_EN bit; the transfer is performed.
16. The DMAC fetches the first LLI from the location pointed to by LLPx(0).
Note: The LLI.SARx, LLI.DARx, LLI.LLPx, and LLI.CTLx registers are fetched. The DMAC automatically reprograms the SARx, DARx, LLPx, and CTLx channel registers from the LLPx(0).
17. The source and destination request single and burst DMA transactions to transfer the block of data (assuming non-memory peripheral). The DMAC acknowledges at the completion of every transaction (burst and single) in the block and carries out the block transfer.
18. Once the block of data is transferred, the source status information is fetched from the location pointed to by the SSTATARx register and stored in the SSTATx register if DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_SRC = True, and CFGx.SS_UPD_EN is enabled. For conditions under which the source status information is fetched from system memory, refer to the Write Back column of Table 54. The destination status information is fetched from the location pointed to by the DSTATARx register and stored in the DSTATx register if DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_DST = True, and CFGx.DS_UPD_EN is enabled.
For conditions under which the destination status information is fetched from system memory, refer to the Write Back column of Table 54.
19. If DMAH_CHx_CTL_WB_EN = True, then the CTLx[63:32] register is written out to system memory. For conditions under which the CTLx[63:32] register is written out to system memory, refer to the Write Back column of Table 54. The CTLx[63:32] register is written out to the same location on the same layer (LLPx.LMS) where it was originally fetched; that is, the location of the CTLx register of the linked list item fetched prior to the start of the block transfer. Only the second word of the CTLx register is written out – CTLx[63:32] – because only the CTLx.BLOCK_TS and CTLx.DONE fields have been updated by the DMAC hardware. Additionally, the CTLx.DONE bit is asserted to indicate block completion. Therefore, software can poll the LLI.CTLx.DONE bit of the CTLx register in the LLI to ascertain when a block transfer has completed.
Note: Do not poll the CTLx.DONE bit in the DMAC memory map; instead, poll the LLI.CTLx.DONE bit in the LLI for that block. If the polled LLI.CTLx.DONE bit is asserted, then this block transfer has completed. This LLI.CTLx.DONE bit was cleared at the start of the transfer (Step 7).
20. The SSTATx register is now written out to system memory if DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_SRC = True, and CFGx.SS_UPD_EN is enabled. It is written to the SSTATx register location of the LLI pointed to by the previously saved LLPx.LOC register. The DSTATx register is now written out to system memory if DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_DST = True, and CFGx.DS_UPD_EN is enabled. It is written to the DSTATx register location of the LLI pointed to by the previously saved LLPx.LOC register. The end-of-block interrupt, int_block, is generated after the write-back of the control and status registers has completed.
21.
The write-back location for the control and status registers is the LLI pointed to by the previous value of the LLPx.LOC register, not the LLI pointed to by the current value of the LLPx.LOC register. next LLI from the memory location pointed to by the current LLPx register and automatically reprograms the SARx, DARx, LLPx, and CTLx channel registers. The DMA transfer continues until the DMAC determines that the CTLx and LLPx registers at the end of a block transfer match the ones described in Row 1 or Row 5 of Table 54 (as discussed earlier). The DMAC then knows that the previously transferred block was the last block in the DMA transfer. The DMA transfer might look like that shown in Figure 51. Figure 51. Multi-block with linked address for source and destination Address of Source Layer Address of Destination Layer Block 2 SAR(2) Block 2 DAR(2) Block 1 SAR(1) Block 1 DAR(1) Block 0 SAR(0) Block 0 DAR(0) Source Blocks 180/590 Doc ID 018553 Rev 3 Destination Blocks RM0078 Direct memory access controllers (DMAC) If the user needs to execute a DMA transfer where the source and destination address are contiguous, but where the amount of data to be transferred is greater than the maximum block size CTLx.BLOCK_TS, then this can be achieved using the type of multi-block transfer shown in Figure 52. Figure 52. Multi-block with linked address for source and destination where SARx and DARx between successive blocks are contiguous Address of Source Layer Address of Destination Layer Block3 DAR(3) Block3 Block2 DAR(2) SAR(3) Block2 Block1 DAR(1) SAR(2) Block1 SAR(1) Block0 DAR(0) Block0 SAR(0) Source Blocks Doc ID 018553 Rev 3 Destination Blocks 181/590 Direct memory access controllers (DMAC) RM0078 The DMA transfer flow is shown in Figure 53. Figure 53. 
DMA transfer flow for source and destination linked list address Channel enabled by software LLI fetch Hardware reprograms SARx, DARx, CTLx, and LLPx DMAC block transfer Source/destination status fetch Write-back of control and source/destination status to LLI Block-complete interrupt generated here Is DMAC in Row1 or Row5 of the “Programming of transfer types and channel register update method” table? DMAC transfer complete interrupt generated here no yes Channel disabled by hardware Multi-block transfer with source address auto-reloaded and destination address auto-reloaded (Row 4) Note: This type of multi-block transfer can only be enabled when either of the following parameters is set: ● DMAH_CHx_MULTI_BLK_TYPE = NO_HARDCODE or 182/590 ● DMAH_CHx_MULTI_BLK_TYPE = RELOAD_RELOAD 1. Read the Channel Enable register (ChEnReg) to choose an available (disabled) channel. 2. Clear any pending interrupts on the channel from the previous DMA transfer by writing to the Interrupt Clear registers: ClearTfr, ClearBlock, ClearSrcTran, ClearDstTran, and ClearErr. Reading the Interrupt Raw Status and Interrupt Status registers confirms that all interrupts have been cleared. Doc ID 018553 Rev 3 RM0078 Direct memory access controllers (DMAC) 3. Program the following channel registers: a) Write the starting source address in the SARx register for channel x. b) Write the starting destination address in the DARx register for channel x. c) Program CTLx and CFGx according to Row 4, as shown in Table 54. Program the LLPx register with 0. d) Write the control information for the DMA transfer in the CTLx register for channel x. For example, in the register, you can program the following: - Set up the transfer type (memory or non-memory peripheral for source and destination) and flow control device by programming the TT_FC of the CTLx register. - Set up the transfer characteristics, such as: • Transfer width for the source in the SRC_TR_WIDTH field. 
         • Transfer width for the destination in the DST_TR_WIDTH field.
         • Source master layer in the SMS field where the source resides.
         • Destination master layer in the DMS field where the destination resides.
         • Incrementing/decrementing or fixed address for the source in the SINC field.
         • Incrementing/decrementing or fixed address for the destination in the DINC field.
   e) If gather is enabled (DMAH_CHx_SRC_GAT_EN = True and CTLx.SRC_GATHER_EN is enabled), program the SGRx register for channel x.
   f) If scatter is enabled (DMAH_CHx_DST_SCA_EN = True and CTLx.DST_SCATTER_EN is enabled), program the DSRx register for channel x.
   g) Write the channel configuration information into the CFGx register for channel x. Ensure that the reload bits, CFGx.RELOAD_SRC and CFGx.RELOAD_DST, are enabled.
      - Designate the handshaking interface type (hardware or software) for the source and destination peripherals; this is not required for memory. This step requires programming the HS_SEL_SRC/HS_SEL_DST bits, respectively. Writing a 0 activates the hardware handshaking interface to handle source/destination requests for the specific channel. Writing a 1 activates the software handshaking interface to handle source/destination requests.
      - If the hardware handshaking interface is activated for the source or destination peripheral, assign the handshaking interface to the source and destination peripheral. This requires programming the SRC_PER and DEST_PER bits, respectively.
4. After the selected DMAC channel has been programmed, enable the channel by writing a 1 to the ChEnReg.CH_EN bit. Ensure that bit 0 of the DmaCfgReg register is enabled.
5. The source and destination request single and burst DMAC transactions to transfer the block of data (assuming non-memory peripherals). The DMAC acknowledges on completion of each burst/single transaction and carries out the block transfer.
6. When the block transfer has completed, the DMAC reloads the SARx, DARx, and CTLx registers. Hardware sets the block-complete interrupt. The DMAC then samples the row number, as shown in Table 54. If the DMAC is in Row 1, then the DMA transfer has completed. Hardware sets the transfer-complete interrupt and disables the channel. You can either respond to the Block Complete or Transfer Complete interrupts, or poll the transfer-complete raw interrupt status register (RawTfr[n], where n is the channel number) until it is set by hardware, in order to detect when the transfer is complete. Note that if this polling is used, software must ensure that the transfer-complete interrupt is cleared by writing to the Interrupt Clear register, ClearTfr[n], before the channel is enabled. If the DMAC is not in Row 1, the next step is performed.
7. The DMA transfer proceeds as follows:
   a) If interrupts are enabled (CTLx.INT_EN = 1) and the block-complete interrupt is unmasked (MaskBlock[x] = 1'b1, where x is the channel number), hardware sets the block-complete interrupt when the block transfer has completed. It then stalls until the block-complete interrupt is cleared by software. If the next block is to be the last block in the DMA transfer, then the block-complete ISR (interrupt service routine) should clear the CFGx.RELOAD_SRC and CFGx.RELOAD_DST reload bits. This puts the DMAC into Row 1, as shown in Table 54. If the next block is not the last block in the DMA transfer, then the reload bits should remain enabled to keep the DMAC in Row 4.
   b) If interrupts are disabled (CTLx.INT_EN = 0) or the block-complete interrupt is masked (MaskBlock[x] = 1'b0, where x is the channel number), then hardware does not stall waiting for a write to the block-complete interrupt clear register; instead, it immediately starts the next block transfer. In this case, software must clear the CFGx.RELOAD_SRC and CFGx.RELOAD_DST reload bits to put the DMAC into Row 1 of Table 54 before the last block of the DMA transfer has completed.
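The interrupt-driven Row 4 sequence (steps 4 to 7a) can be sketched as a small software model. This is a minimal sketch under stated assumptions, not driver code: the `chan_model` structure and the functions `hw_block_end`, `block_isr`, and `run_transfer` are hypothetical names introduced for illustration, and plain struct fields stand in for the memory-mapped register bits, whose addresses and layouts are documented in RM0089. The model covers only the case where the block-complete interrupt is enabled and unmasked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical software model of one DMAC channel in Row 4.
 * Plain fields stand in for register bits; this is NOT the real layout. */
typedef struct {
    bool reload_src;      /* models CFGx.RELOAD_SRC */
    bool reload_dst;      /* models CFGx.RELOAD_DST */
    bool ch_en;           /* models ChEnReg.CH_EN */
    bool raw_block;       /* models the block-complete raw status */
    bool raw_tfr;         /* models RawTfr[x] */
    unsigned blocks_done; /* bookkeeping for the model only */
} chan_model;

/* Models what hardware does when a block completes (step 6). */
void hw_block_end(chan_model *ch)
{
    ch->blocks_done++;
    ch->raw_block = true;                      /* block-complete interrupt set */
    if (!ch->reload_src && !ch->reload_dst) {  /* Row 1: this was the last block */
        ch->raw_tfr = true;                    /* transfer-complete interrupt set */
        ch->ch_en = false;                     /* channel disabled by hardware */
    }
}

/* Models the block-complete ISR described in step 7a. */
void block_isr(chan_model *ch, unsigned total_blocks)
{
    if (ch->blocks_done + 1 == total_blocks) { /* the next block is the last one */
        ch->reload_src = false;                /* move the channel to Row 1 */
        ch->reload_dst = false;
    }
    ch->raw_block = false;  /* models the ClearBlock write that ends the stall */
}

/* Runs a transfer of total_blocks blocks; returns the blocks transferred. */
unsigned run_transfer(unsigned total_blocks)
{
    assert(total_blocks >= 1);
    chan_model ch = {
        .reload_src = total_blocks > 1,  /* a single block starts in Row 1 */
        .reload_dst = total_blocks > 1,
        .ch_en = true,                   /* step 4: channel enabled */
    };
    while (ch.ch_en) {
        hw_block_end(&ch);                  /* steps 5 and 6 */
        if (ch.ch_en)
            block_isr(&ch, total_blocks);   /* step 7a */
    }
    assert(ch.raw_tfr);                  /* transfer-complete was raised */
    return ch.blocks_done;
}
```

In this model, `run_transfer(3)` leaves the reload bits set after the first block, clears them in the ISR after the second block, and ends the transfer when the third block completes. The reload bits are cleared one block early because the row is sampled at the end of each block.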
The transfer is similar to that shown in Figure 54.

Figure 54. Multi-block DMA transfer with source and destination address auto-reloaded
[Figure: source blocks 0 to N all start at the same SAR address on the source layer; destination blocks 0 to N all start at the same DAR address on the destination layer.]

The DMA transfer flow is shown in Figure 55.

Figure 55. DMA transfer flow for source and destination address auto-reloaded
[Flowchart: channel enabled by software -> block transfer -> reload SARx, DARx, and CTLx -> is the DMAC in Row 1 of the “Programming of transfer types and channel register update method” table? If yes: DMAC transfer-complete interrupt generated, channel disabled by hardware. If no: is CTLx.INT_EN = 1 and MaskBlock[x] = 1? If yes: block-complete interrupt generated, stall until the block-complete interrupt is cleared by software, then next block transfer. If no: next block transfer.]

Multi-block transfer with source address auto-reloaded and linked list destination address (Row 7)

Note: This type of multi-block transfer can only be enabled when either of the following parameters is set:
● DMAH_CHx_MULTI_BLK_TYPE = 0, or
● DMAH_CHx_MULTI_BLK_TYPE = RELOAD_LLP

1. Read the Channel Enable register (ChEnReg) to choose a free (disabled) channel.
2. Set up the chain of linked list items (otherwise known as block descriptors) in memory. Write the control information in the LLI.CTLx register location of the block descriptor for each LLI in memory (see Figure 46) for channel x. For example, in the register you can program the following:
   a) Set up the transfer type (memory or non-memory peripheral for source and destination) and flow control peripheral by programming the TT_FC field of the CTLx register.
   b) Set up the transfer characteristics, such as:
      - Transfer width for the source in the SRC_TR_WIDTH field.
      - Transfer width for the destination in the DST_TR_WIDTH field.
      - Source master layer in the SMS field where the source resides.
      - Destination master layer in the DMS field where the destination resides.
      - Incrementing/decrementing or fixed address for the source in the SINC field.
      - Incrementing/decrementing or fixed address for the destination in the DINC field.
3. Write the starting source address in the SARx register for channel x.
Note: The values in the LLI.SARx register locations of each of the linked list items (LLIs) set up in memory, although fetched during an LLI fetch, are not used.
4. Write the channel configuration information into the CFGx register for channel x.
   - Designate the handshaking interface type (hardware or software) for the source and destination peripherals; this is not required for memory. This step requires programming the HS_SEL_SRC/HS_SEL_DST bits. Writing a 0 activates the hardware handshaking interface to handle source/destination requests for the specific channel. Writing a 1 activates the software handshaking interface to handle source/destination requests.
   - If the hardware handshaking interface is activated for the source or destination peripheral, assign the handshaking interface to the source and destination peripheral; this requires programming the SRC_PER and DEST_PER bits, respectively.
5. Ensure that the LLI.CTLx register locations of all LLIs in memory (except the last) are set as shown in Row 7 of Table 54, while the LLI.CTLx register of the last linked list item must be set as described in Row 1 or Row 5 of Table 54. Figure 46 shows a linked list example with two list items.
6. Ensure that the LLI.LLPx register locations of all LLIs in memory (except the last) are non-zero and point to the next linked list item.
7. Ensure that the LLI.DARx register locations of all LLIs in memory point to the start destination block address preceding that LLI fetch.
8. If DMAH_CHx_CTL_WB_EN = True, ensure that the LLI.CTLx.DONE fields of the LLI.CTLx register locations of all LLIs in memory are cleared.
9. If source status fetching is enabled (DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_SRC = True, and CFGx.SS_UPD_EN is enabled), program the SSTATARx register so that the source status information can be fetched from the location pointed to by the SSTATARx register. For conditions under which the source status information is fetched from system memory, refer to the Write Back column of Table 54.
10. If destination status fetching is enabled (DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_DST = True, and CFGx.DS_UPD_EN is enabled), program the DSTATARx register so that the destination status information can be fetched from the location pointed to by the DSTATARx register. For conditions under which the destination status information is fetched from system memory, refer to the Write Back column of Table 54.
11. If gather is enabled (DMAH_CHx_SRC_GAT_EN = True and CTLx.SRC_GATHER_EN is enabled), program the SGRx register for channel x.
12. If scatter is enabled (DMAH_CHx_DST_SCA_EN = True and CTLx.DST_SCATTER_EN is enabled), program the DSRx register for channel x.
13. Clear any pending interrupts on the channel from the previous DMA transfer by writing to the Interrupt Clear registers: ClearTfr, ClearBlock, ClearSrcTran, ClearDstTran, and ClearErr. Reading the Interrupt Raw Status and Interrupt Status registers confirms that all interrupts have been cleared.
14. Program the CTLx and CFGx registers according to Row 7, as shown in Table 54.
15. Program the LLPx register with LLPx(0), the pointer to the first linked list item.
16. Finally, enable the channel by writing a 1 to the ChEnReg.CH_EN bit; the transfer is performed. Ensure that bit 0 of the DmaCfgReg register is enabled.
17. The DMAC fetches the first LLI from the location pointed to by LLPx(0).
Note: The LLI.SARx, LLI.DARx, LLI.LLPx, and LLI.CTLx registers are fetched.
The LLI.SARx register, although fetched, is not used.
18. The source and destination request single and burst DMAC transactions to transfer the block of data (assuming non-memory peripherals). The DMAC acknowledges at the completion of every transaction (burst and single) in the block and carries out the block transfer.
19. Once the block of data is transferred, the source status information is fetched from the location pointed to by the SSTATARx register and stored in the SSTATx register if DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_SRC = True, and CFGx.SS_UPD_EN is enabled. For conditions under which the source status information is fetched from system memory, refer to the Write Back column of Table 54. The destination status information is fetched from the location pointed to by the DSTATARx register and stored in the DSTATx register if DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_DST = True, and CFGx.DS_UPD_EN is enabled. For conditions under which the destination status information is fetched from system memory, refer to the Write Back column of Table 54.
20. If DMAH_CHx_CTL_WB_EN = True, then the CTLx[63:32] register is written out to system memory. For conditions under which the CTLx[63:32] register is written out to system memory, refer to the Write Back column of Table 54. The CTLx[63:32] register is written out to the same location on the same layer (LLPx.LMS) where it was originally fetched; that is, the location of the CTLx register of the linked list item fetched prior to the start of the block transfer. Only the second word of the CTLx register, CTLx[63:32], is written out, because only the CTLx.BLOCK_TS and CTLx.DONE fields have been updated by hardware within the DMAC. The LLI.CTLx.DONE bit is asserted to indicate block completion. Therefore, software can poll the LLI.CTLx.DONE bit of the CTLx register in the LLI to ascertain when a block transfer has completed.
Note: Do not poll the CTLx.DONE bit in the DMAC memory map. Instead, poll the LLI.CTLx.DONE bit in the LLI for that block. If the polled LLI.CTLx.DONE bit is asserted, then this block transfer has completed. This LLI.CTLx.DONE bit was cleared at the start of the transfer (Step 8).
21. The SSTATx register is now written out to system memory if DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_SRC = True, and CFGx.SS_UPD_EN is enabled. It is written to the SSTATx register location of the LLI pointed to by the previously saved LLPx.LOC register. The DSTATx register is now written out to system memory if DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_DST = True, and CFGx.DS_UPD_EN is enabled. It is written to the DSTATx register location of the LLI pointed to by the previously saved LLPx.LOC register. The end-of-block interrupt, int_block, is generated after the write-back of the control and status registers has completed.
Note: The write-back location for the control and status registers is the LLI pointed to by the previous value of the LLPx.LOC register, not the LLI pointed to by the current value of the LLPx.LOC register.
22. The DMAC reloads the SARx register from the initial value. Hardware sets the block-complete interrupt. The DMAC samples the row number, as shown in Table 54. If the DMAC is in Row 1 or Row 5, then the DMA transfer has completed. Hardware sets the transfer-complete interrupt and disables the channel. You can either respond to the Block Complete or Transfer Complete interrupts, or poll the transfer-complete raw interrupt status register (RawTfr[n], where n is the channel number) until it is set by hardware, in order to detect when the transfer is complete. Note that if this polling is used, software must ensure that the transfer-complete interrupt is cleared by writing to the Interrupt Clear register, ClearTfr[n], before the channel is enabled. If the DMAC is not in Row 1 or Row 5 as shown in Table 54, the following steps are performed.
23. The DMA transfer proceeds as follows:
   a) If interrupts are enabled (CTLx.INT_EN = 1) and the block-complete interrupt is unmasked (MaskBlock[x] = 1'b1, where x is the channel number), hardware sets the block-complete interrupt when the block transfer has completed. It then stalls until the block-complete interrupt is cleared by software. If the next block is to be the last block in the DMA transfer, then the block-complete ISR (interrupt service routine) should clear the CFGx.RELOAD_SRC source reload bit. This puts the DMAC into Row 1, as shown in Table 54. If the next block is not the last block in the DMA transfer, then the source reload bit should remain enabled to keep the DMAC in Row 7, as shown in Table 54.
   b) If interrupts are disabled (CTLx.INT_EN = 0) or the block-complete interrupt is masked (MaskBlock[x] = 1'b0, where x is the channel number), then hardware does not stall waiting for a write to the block-complete interrupt clear register; instead, it immediately starts the next block transfer. In this case, software must clear the source reload bit, CFGx.RELOAD_SRC, to put the DMAC into Row 1 of Table 54 before the last block of the DMA transfer has completed.
24. The DMAC fetches the next LLI from the memory location pointed to by the current LLPx register and automatically reprograms the DARx, CTLx, and LLPx channel registers. Note that if the next block is the last block of the DMA transfer, then the CTLx and LLPx registers just fetched from the LLI should match Row 1 or Row 5 of Table 54.

The DMA transfer might look like that shown in Figure 56.

Figure 56.
Multi-block DMA transfer with source address auto-reloaded and linked list destination address
[Figure: source blocks 0 to N all start at the same SAR address on the source layer; destination blocks 0 to N start at addresses DAR(0) to DAR(N) on the destination layer.]

The DMA transfer flow is shown in Figure 57.

Figure 57. DMA transfer flow for source address auto-reloaded and linked list destination address
[Flowchart: channel enabled by software -> LLI fetch -> hardware reprograms DARx, CTLx, and LLPx -> DMAC block transfer -> source/destination status fetch -> write-back of control and source/destination status to LLI -> reload SARx -> is the DMAC in Row 1 of the “Programming of transfer types and channel register update method” table? If yes: DMAC transfer-complete interrupt generated, channel disabled by hardware. If no: is CTLx.INT_EN = 1 and MaskBlock[x] = 1? If yes: block-complete interrupt generated, stall until the block-complete interrupt is cleared by software (as in step 23a), then next LLI fetch. If no: next LLI fetch.]

Multi-block transfer with source address auto-reloaded and contiguous destination address (Row 3)

Note: This type of multi-block transfer can only be enabled when either of the following parameters is set:
● DMAH_CHx_MULTI_BLK_TYPE = 0, or
● DMAH_CHx_MULTI_BLK_TYPE = RELOAD_CONT

1. Read the Channel Enable register (ChEnReg) to choose a free (disabled) channel.
2. Clear any pending interrupts on the channel from the previous DMA transfer by writing to the Interrupt Clear registers: ClearTfr, ClearBlock, ClearSrcTran, ClearDstTran, and ClearErr. Reading the Interrupt Raw Status and Interrupt Status registers confirms that all interrupts have been cleared.
3. Program the following channel registers:
   a) Write the starting source address in the SARx register for channel x.
   b) Write the starting destination address in the DARx register for channel x.
   c) Program CTLx and CFGx according to Row 3, as shown in Table 54.
Program the LLPx register with 0.
   d) Write the control information for the DMA transfer in the CTLx register for channel x. For example, in the register, you can program the following:
      - Set up the transfer type (memory or non-memory peripheral for source and destination) and flow control device by programming the TT_FC field of the CTLx register.
      - Set up the transfer characteristics, such as:
         • Transfer width for the source in the SRC_TR_WIDTH field.
         • Transfer width for the destination in the DST_TR_WIDTH field.
         • Source master layer in the SMS field where the source resides.
         • Destination master layer in the DMS field where the destination resides.
         • Incrementing/decrementing or fixed address for the source in the SINC field.
         • Incrementing/decrementing or fixed address for the destination in the DINC field.
   e) If gather is enabled (DMAH_CHx_SRC_GAT_EN = True and CTLx.SRC_GATHER_EN is enabled), program the SGRx register for channel x.
   f) If scatter is enabled (DMAH_CHx_DST_SCA_EN = True and CTLx.DST_SCATTER_EN is enabled), program the DSRx register for channel x.
   g) Write the channel configuration information into the CFGx register for channel x.
      - Designate the handshaking interface type (hardware or software) for the source and destination peripherals; this is not required for memory. This step requires programming the HS_SEL_SRC/HS_SEL_DST bits, respectively. Writing a 0 activates the hardware handshaking interface to handle source/destination requests for the specific channel. Writing a 1 activates the software handshaking interface to handle source/destination requests.
      - If the hardware handshaking interface is activated for the source or destination peripheral, assign the handshaking interface to the source and destination peripheral. This requires programming the SRC_PER and DEST_PER bits, respectively.
4. After the DMAC channel has been programmed, enable the channel by writing a 1 to the ChEnReg.CH_EN bit. Ensure that bit 0 of the DmaCfgReg register is enabled.
5. The source and destination request single and burst DMAC transactions to transfer the block of data (assuming non-memory peripherals). The DMAC acknowledges at the completion of every transaction (burst and single) in the block and carries out the block transfer.
6. When the block transfer has completed, the DMAC reloads the SARx register; the DARx register remains unchanged. Hardware sets the block-complete interrupt. The DMAC then samples the row number, as shown in Table 54. If the DMAC is in Row 1, then the DMA transfer has completed. Hardware sets the transfer-complete interrupt and disables the channel. You can either respond to the Block Complete or Transfer Complete interrupts, or poll the transfer-complete raw interrupt status register (RawTfr[n], where n is the channel number) until it is set by hardware, in order to detect when the transfer is complete. Note that if this polling is used, software must ensure that the transfer-complete interrupt is cleared by writing to the Interrupt Clear register, ClearTfr[n], before the channel is enabled. If the DMAC is not in Row 1, the next step is performed.
7. The DMA transfer proceeds as follows:
   a) If interrupts are enabled (CTLx.INT_EN = 1) and the block-complete interrupt is unmasked (MaskBlock[x] = 1'b1, where x is the channel number), hardware sets the block-complete interrupt when the block transfer has completed. It then stalls until the block-complete interrupt is cleared by software. If the next block is to be the last block in the DMA transfer, then the block-complete ISR (interrupt service routine) should clear the source reload bit, CFGx.RELOAD_SRC. This puts the DMAC into Row 1, as shown in Table 54.
If the next block is not the last block in the DMA transfer, then the source reload bit should remain enabled to keep the DMAC in Row 3, as shown in Table 54.
   b) If interrupts are disabled (CTLx.INT_EN = 0) or the block-complete interrupt is masked (MaskBlock[x] = 1'b0, where x is the channel number), then hardware does not stall waiting for a write to the block-complete interrupt clear register; instead, it starts the next block transfer immediately. In this case, software must clear the source reload bit, CFGx.RELOAD_SRC, to put the DMAC into Row 1 of Table 54 before the last block of the DMA transfer has completed.

The transfer is similar to that shown in Figure 58.

Figure 58. Multi-block DMA transfer with source address auto-reloaded and contiguous destination address
[Figure: source blocks 0 to 2 all start at the same SAR address on the source layer; destination blocks 0 to 2 start at contiguous addresses DAR(0) to DAR(2) on the destination layer.]

The DMA transfer flow is shown in Figure 59.

Figure 59. DMA transfer flow for source address auto-reloaded and contiguous destination address
[Flowchart: channel enabled by software -> block transfer -> reload SARx and CTLx -> is the DMAC in Row 1 of the “Programming of transfer types and channel register update method” table? If yes: DMAC transfer-complete interrupt generated, channel disabled by hardware. If no: is CTLx.INT_EN = 1 and MaskBlock[x] = 1? If yes: block-complete interrupt generated, stall until the block-complete interrupt is cleared by software, then next block transfer. If no: next block transfer.]

Multi-block DMA transfer with linked list for source and contiguous destination address (Row 8)

Note: This type of multi-block transfer can only be enabled when either of the following parameters is set:
● DMAH_CHx_MULTI_BLK_TYPE = 0, or
● DMAH_CHx_MULTI_BLK_TYPE = LLP_CONT

1. Read the Channel Enable register (ChEnReg) to choose a free (disabled) channel.
2. Set up the linked list in memory. Write the control information in the LLI.CTLx register location of the block descriptor for each LLI in memory (see Figure 46) for channel x. For example, in the register, you can program the following:
   a) Set up the transfer type (memory or non-memory peripheral for source and destination) and flow control device by programming the TT_FC field of the CTLx register.
   b) Set up the transfer characteristics, such as:
      - Transfer width for the source in the SRC_TR_WIDTH field.
      - Transfer width for the destination in the DST_TR_WIDTH field.
      - Source master layer in the SMS field where the source resides.
      - Destination master layer in the DMS field where the destination resides.
      - Incrementing/decrementing or fixed address for the source in the SINC field.
      - Incrementing/decrementing or fixed address for the destination in the DINC field.
3. Write the starting destination address in the DARx register for channel x.
Note: The values in the LLI.DARx register location of each linked list item (LLI) in memory, although fetched during an LLI fetch, are not used.
4. Write the channel configuration information into the CFGx register for channel x.
   a) Designate the handshaking interface type (hardware or software) for the source and destination peripherals; this is not required for memory. This step requires programming the HS_SEL_SRC/HS_SEL_DST bits. Writing a 0 activates the hardware handshaking interface to handle source/destination requests for the specific channel. Writing a 1 activates the software handshaking interface to handle source/destination requests.
   b) If the hardware handshaking interface is activated for the source or destination peripheral, assign the handshaking interface to the source and destination peripherals. This requires programming the SRC_PER and DEST_PER bits, respectively.
5. Ensure that the LLI.CTLx register locations of all LLIs in memory (except the last) are set as shown in Row 8 of Table 54, while the LLI.CTLx register of the last linked list item must be set as described in Row 1 or Row 5 of Table 54. Figure 46 shows a linked list example with two list items.
6. Ensure that the LLI.LLPx register locations of all LLIs in memory (except the last) are non-zero and point to the next linked list item.
7. Ensure that the LLI.SARx register locations of all LLIs in memory point to the start source block address preceding that LLI fetch.
8. If DMAH_CHx_CTL_WB_EN = True, ensure that the LLI.CTLx.DONE fields of the LLI.CTLx register locations of all LLIs in memory are cleared.
9. If source status fetching is enabled (DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_SRC = True, and CFGx.SS_UPD_EN is enabled), program the SSTATARx register so that the source status information can be fetched from the location pointed to by the SSTATARx register. For conditions under which the source status information is fetched from system memory, refer to the Write Back column of Table 54.
10. If destination status fetching is enabled (DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_DST = True, and CFGx.DS_UPD_EN is enabled), program the DSTATARx register so that the destination status information can be fetched from the location pointed to by the DSTATARx register. For conditions under which the destination status information is fetched from system memory, refer to the Write Back column of Table 54.
11. If gather is enabled (DMAH_CHx_SRC_GAT_EN = True and CTLx.SRC_GATHER_EN is enabled), program the SGRx register for channel x.
12. If scatter is enabled (DMAH_CHx_DST_SCA_EN = True and CTLx.DST_SCATTER_EN is enabled), program the DSRx register for channel x.
13.
Clear any pending interrupts on the channel from the previous DMA transfer by writing to the Interrupt Clear registers: ClearTfr, ClearBlock, ClearSrcTran, ClearDstTran, and ClearErr. Reading the Interrupt Raw Status and Interrupt Status registers confirms that all interrupts have been cleared. 14. Program the CTLx and CFGx registers according to Row 8, as shown in Table 54. 15. Program the LLPx register with LLPx(0), the pointer to the first Linked List item. 16. Finally, enable the channel by writing a 1 to the ChEnReg.CH_EN bit; the transfer is performed. Ensure that bit 0 of the DmaCfgReg register is enabled. 17. The DMAC fetches the first LLI from the location pointed to by LLPx(0). Note: The LLI.SARx, LLI.DARx, LLI.LLPx, and LLI.CTLx registers are fetched. The LLI.DARx register location of the LLI – although fetched – is not used. The DARx register in the DMAC remains unchanged. 18. Source and destination request single and burst DMAC transactions to transfer the block of data (assuming non-memory peripherals). The DMAC acknowledges at the completion of every transaction (burst and single) in the block and carries out the block transfer. 19. Once the block of data is transferred, the source status information is fetched from the location pointed to by the SSTATARx register and stored in the SSTATx register if DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_SRC = True, and CFGx.SS_UPD_EN is enabled. For conditions under which the source status information is fetched from system memory, refer to the Write Back column of Table 54. The destination status information is fetched from the location pointed to by the DSTATARx register and stored in the DSTATx register if DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_DST = True, and CFGx.DS_UPD_EN is enabled. For 196/590 Doc ID 018553 Rev 3 RM0078 Direct memory access controllers (DMAC) conditions under which the destination status information is fetched from system memory, refer to the Write Back column of Table 54. 20. 
If DMAH_CHx_CTL_WB_EN = True, then the CTLx[63:32] register is written out to system memory. For conditions under which the CTLx[63:32] register is written out to system memory, refer to the Write Back column of Table 54. The CTLx[63:32] register is written out to the same location on the same layer (LLPx.LMS) where it was originally fetched; that is, the location of the CTLx register of the linked list item fetched prior to the start of the block transfer. Only the second word of the CTLx register is written out, CTLx[63:32], because only the CTLx.BLOCK_TS and CTLx.DONE fields have been updated by hardware within the DMAC. Additionally, the CTLx.DONE bit is asserted to indicate block completion. Therefore, software can poll the LLI.CTLx.DONE bit field of the CTLx register in the LLI to ascertain when a block transfer has completed.
Note: Do not poll the CTLx.DONE bit in the DMAC memory map. Instead, poll the LLI.CTLx.DONE bit in the LLI for that block. If the polled LLI.CTLx.DONE bit is asserted, then this block transfer has completed. This LLI.CTLx.DONE bit was cleared at the start of the transfer (Step 8).
21. The SSTATx register is now written out to system memory if DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_SRC = True, and CFGx.SS_UPD_EN is enabled. It is written to the SSTATx register location of the LLI pointed to by the previously saved LLPx.LOC register. The DSTATx register is now written out to system memory if DMAH_CHx_CTL_WB_EN = True, DMAH_CHx_STAT_DST = True, and CFGx.DS_UPD_EN is enabled. It is written to the DSTATx register location of the LLI pointed to by the previously saved LLPx.LOC register. The end-of-block interrupt, int_block, is generated after the write-back of the control and status registers has completed.
Note: The write-back location for the control and status registers is the LLI pointed to by the previous value of the LLPx.LOC register, not the LLI pointed to by the current value of the LLPx.LOC register.
22.
The DMAC does not wait for the block interrupt to be cleared, but continues and fetches the next LLI from the memory location pointed to by the current LLPx register and automatically reprograms the SARx, CTLx, and LLPx channel registers. The DARx register is left unchanged. The DMA transfer continues until the DMAC samples that the CTLx and LLPx registers at the end of a block transfer match those described in Row 1 or Row 5 of the “CTLx.SRC_MSIZE and DST_MSIZE decoding” table(1). The DMAC then knows that the previously transferred block was the last block in the DMA transfer.

The DMAC transfer might look like that shown in Figure 60. Note that the destination address is decrementing.

1. For this table, refer to DMAC chapter in RM0089, Reference manual, SPEAr1340 address map and registers.

Figure 60. Multi-block DMA transfer with linked list source address and contiguous destination address
[Figure: address of the source layer, with source blocks Block0, Block1 and Block2 at SAR(0), SAR(1) and SAR(2); address of the destination layer, with destination blocks Block0, Block1 and Block2 at DAR(0), DAR(1) and DAR(2).]

The DMA transfer flow is shown in Figure 61.

Figure 61. DMA transfer for linked list source address and contiguous destination address
[Figure: flow chart with the following steps, in order:
- Channel enabled by software
- LLI fetch
- Hardware reprograms SARx, CTLx, and LLPx
- DMAC block transfer
- Source/destination status fetch
- Write-back of control and source/destination status to LLI
- Block-complete interrupt generated here
- Decision (yes/no): Is DMAC in Row 1 of the “Programming of transfer types and channel register update method” table?
- On yes: DMAC transfer complete interrupt generated here, then channel disabled by hardware]

12.6.4 Disabling a channel prior to transfer completion

Under normal operation, software enables a channel by writing a 1 to the channel enable register, ChEnReg.CH_EN, and hardware disables a channel on transfer completion by clearing the ChEnReg.CH_EN register bit. The recommended way for software to disable a channel without losing data is to use the CH_SUSP bit in conjunction with the FIFO_EMPTY bit in the Channel Configuration Register (CFGx).
1. If software wishes to disable a channel prior to the DMA transfer completion, then it can set the CFGx.CH_SUSP bit to tell the DMAC to halt all transfers from the source peripheral. Therefore, the channel FIFO receives no new data.
2. Software can now poll the CFGx.FIFO_EMPTY bit until it indicates that the channel FIFO is empty.
3. The ChEnReg.CH_EN bit can then be cleared by software once the channel FIFO is empty.

When CTLx.SRC_TR_WIDTH < CTLx.DST_TR_WIDTH and the CFGx.CH_SUSP bit is high, the CFGx.FIFO_EMPTY bit is asserted once the contents of the FIFO do not permit a single word of CTLx.DST_TR_WIDTH to be formed. However, there may still be data in the channel FIFO, but not enough to form a single transfer of CTLx.DST_TR_WIDTH. In this scenario, once the channel is disabled, the remaining data in the channel FIFO is not transferred to the destination peripheral.

It is permissible to remove the channel from the suspension state by writing a 0 to the CFGx.CH_SUSP bit. The DMA transfer then completes in the normal manner.

Note: If a channel is disabled by software, an active single or burst transaction is not guaranteed to receive an acknowledgement. If the DMAC is configured to use defined length bursts (DMAH_INCR_BURSTS = 0), disabling the channel via software prior to completing a transfer is not supported.
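The three-step disable sequence above can be sketched in C as follows. This is only an illustration under stated assumptions: the dmac_chan_view structure, the function name, and the bit masks passed in by the caller are hypothetical, and the real register addresses, bit positions, and ChEnReg write semantics must be taken from RM0089. A poll limit is added so that the sketch cannot spin forever.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of the registers used to stop one DMAC channel.
 * Addresses and bit masks are supplied by the caller; the real values
 * are documented in RM0089 and are not reproduced here. */
typedef struct {
    volatile uint32_t *cfg;    /* CFGx register of the channel          */
    volatile uint32_t *chen;   /* ChEnReg register                      */
    uint32_t ch_susp_mask;     /* mask of CFGx.CH_SUSP (assumed)        */
    uint32_t fifo_empty_mask;  /* mask of CFGx.FIFO_EMPTY (assumed)     */
    uint32_t ch_en_mask;       /* mask of ChEnReg.CH_EN (assumed)       */
} dmac_chan_view;

/* Returns 0 once CH_EN has been cleared, or -1 if the channel FIFO did
 * not report empty within max_polls reads (channel is left suspended). */
int dmac_chan_disable_graceful(const dmac_chan_view *ch, unsigned max_polls)
{
    assert(ch != 0 && ch->cfg != 0 && ch->chen != 0);

    /* Step 1: suspend the channel so the FIFO receives no new data. */
    *ch->cfg |= ch->ch_susp_mask;

    /* Step 2: poll FIFO_EMPTY until the channel FIFO has drained. */
    while ((*ch->cfg & ch->fifo_empty_mask) == 0u) {
        if (max_polls-- == 0u) {
            return -1;
        }
    }

    /* Step 3: clear CH_EN now that the FIFO is empty. */
    *ch->chen &= ~ch->ch_en_mask;
    return 0;
}
```

On a timeout the sketch leaves CH_SUSP set and CH_EN untouched, so the caller can decide whether to resume the channel or to terminate the transfer.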
Abnormal transfer termination

A DMAC DMA transfer may be terminated abruptly by software by clearing the channel enable bit, ChEnReg.CH_EN. You must not assume that the channel is disabled immediately after the ChEnReg.CH_EN bit is cleared over the AHB slave interface. Consider this as a request to disable the channel. You must poll ChEnReg.CH_EN and confirm that the channel is disabled by reading back 0.

A case where the channel is not disabled after a channel disable request is where either the source or destination has received a split or retry response. The DMAC must keep re-attempting the transfer to the system HADDR that originally received the split or retry response until an OKAY response is returned; to do otherwise is an AMBA protocol violation.

Software may terminate all channels abruptly by clearing the global enable bit in the DMAC Configuration Register (DmaCfgReg[0]). Again, you must not assume that all channels are disabled immediately after DmaCfgReg[0] is cleared over the AHB slave interface. Consider this as a request to disable all channels. You must poll ChEnReg and confirm that all channels are disabled by reading back 0.

Note: If the channel enable bit is cleared while there is data in the channel FIFO, this data is not sent to the destination peripheral and is not present when the channel is re-enabled. For read-sensitive source peripherals, such as a source FIFO, this data is therefore lost. When the source is not a read-sensitive device (such as memory), disabling a channel without waiting for the channel FIFO to empty may be acceptable, since the data is available from the source peripheral upon request and is not lost. If a channel is disabled by software, an active single or burst transaction is not guaranteed to receive an acknowledgement.
If the DMAC is configured to use defined length bursts (DMAH_INCR_BURSTS = 0), disabling the channel via software prior to completing a transfer is not supported.

12.6.5 Defined-length burst support on DMAC

By default, the DMAC supports incremental (INCR) bursts only. To achieve better performance, defined-length bursts, such as INCR4, INCR8 and INCR16, are required. The DMAC can be configured to use defined-length bursts by setting the configuration parameter DMAH_INCR_BURSTS to 0. In this mode, the DMAC selects the largest valid defined-length burst to complete the transfer.

13 Cryptographic co-processor (C3)

This chapter focuses on C3 functionality and operation. For the C3 feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU
For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

13.1 Overview

C3 is a set of macro-functions (channels) controlled by two instruction dispatchers. Instruction flows are created and stored in memory by the host processor: they are then read from memory and dispatched to the appropriate channels. Figure 62 shows the C3 block diagram.

Figure 62. C3 block diagram
[Figure: blocks shown are the Instruction dispatcher (ID) with its IRQ output; the DES/3DES, AES, UHH, UHH2, PKA, Move and RNG channels; the Coupling and chaining module (CCM); the AHB Master interface (HIF) on the initiator bus (M); the RAM Buffer (MEMORY); the AHB Slave interface (SIF) on the target bus (S); and the System registers (SYS).]

13.2 Functional description

The main blocks of C3 are described in the following sections.

13.2.1 AHB Master Interface (HIF)

The Master interface (HIF) interfaces channels and instruction dispatchers (ID) to the initiator bus (AMBA AHB) and to the internal RAM Buffer (MEMORY).
The purpose of the HIF is to allow read and write accesses generated by channels and instruction dispatchers to be transferred to the initiator bus or to the internal memory. An arbiter in the HIF prevents data access collisions from occurring. ID0 has the highest priority to perform accesses on this block, followed in order by ID1 and Channels #0 to #7 (lowest priority). Read transfers have higher priority than write transfers.

Every module attached to the HIF receives its own bus error signal. This signal is set by the HIF if a bus error condition is detected for a bus transaction initiated by the corresponding module.

The HIF is able to route requests to the internal memory instead of the bus. The HIF is also able to route write requests to a byte bucket (data written there is thrown away). Transactions can simultaneously occur on the bus, the internal memory and the byte bucket.

Before the internal memory and the byte bucket are used, the base address for transactions that must target the internal memory or the byte bucket instead of the bus must be programmed in the HIF. To program the memory and byte bucket base addresses, you must configure the related C3 HIF registers (see RM0089, Reference manual, SPEAr1340 address map and registers).

Write transaction requests coming from IDs or channels that are within an address window of 64 KB starting from the programmed byte bucket base address are routed to the byte bucket. This means that everything written to this address window is thrown away. Read transactions from this address window are not affected by the byte bucket: they are normally routed either to the internal memory or to the bus.

Transaction requests coming from IDs or channels that are within an address window of 64 KB starting from the programmed memory base address are routed to the internal memory. Higher addresses of the internal memory window are aliased if the internal memory is smaller than 64 KB.
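The routing rules above can be summarised in a small software model. This is a sketch for illustration only, not a description of the HIF implementation: the function and type names are hypothetical, the base addresses are plain parameters rather than the real HIF registers, and the aliasing of a memory smaller than 64 KB is assumed to be a simple wrap-around (modulo the memory size).

```c
#include <assert.h>
#include <stdint.h>

#define HIF_WINDOW 0x10000u  /* 64 KB address window */

typedef enum { HIF_BUS, HIF_MEMORY, HIF_BYTE_BUCKET } hif_target;

/* Model of where a request issued by an ID or channel is routed.
 * Unsigned subtraction makes "addr - base < window" false for
 * addresses below the base as well as for addresses above the window. */
hif_target hif_route(uint32_t addr, int is_write,
                     uint32_t mem_base, uint32_t bucket_base)
{
    /* Only writes are captured by the byte bucket window. */
    if (is_write && (uint32_t)(addr - bucket_base) < HIF_WINDOW) {
        return HIF_BYTE_BUCKET;
    }
    /* Reads and writes inside the memory window go to internal memory. */
    if ((uint32_t)(addr - mem_base) < HIF_WINDOW) {
        return HIF_MEMORY;
    }
    return HIF_BUS;
}

/* Offset inside the internal memory, ASSUMING that aliasing wraps
 * around at the memory size. */
uint32_t hif_mem_offset(uint32_t addr, uint32_t mem_base, uint32_t mem_size)
{
    assert(mem_size != 0u && mem_size <= HIF_WINDOW);
    return (uint32_t)(addr - mem_base) % mem_size;
}
```

For example, with a memory base of 0x10000000 and a byte bucket base of 0x20000000, a write to 0x20000010 is discarded, while a read from the same address falls through to the bus because it is outside the memory window.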
A burst transaction always completes on the initial target even if addresses span two different targets.

The Move Channel can be used to transfer data from the internal memory to the bus and vice versa. The content of the internal memory is undefined at startup or after an asynchronous master reset. The byte bucket has priority if both the byte bucket base address and the memory base address are programmed with the same address.

The internal memory content can also be accessed from the AHB Slave interface (SIF). The internal memory can be accessed by an ID or channel and simultaneously from the AHB slave interface (SIF).

13.2.2 C3 RAM Buffer (MEMORY)

The AHB Master interface is able to route requests to an internal memory instead of the bus. The size of the internal RAM is 16 KB; it is composed of 4096 words of 32 bits each.

13.2.3 Instruction dispatching subsystem (IDS)

The IDS is a structural block that instantiates up to 4 instruction dispatchers (ID) and an instruction dispatcher multiplexer (IDM). The OR logic port is drawn in the IDS hierarchical level for simplicity, although this logic is really located in the IDM. The IDs are connected in a daisy chain to propagate information about the allocation of channels and lane signals. IDs can be replaced with zero logic blocks if not all instruction dispatchers are needed.

Each ID interfaces to the HIF to fetch instructions, to the SIF to allow access to its registers, to the CCM to send coupling/chaining commands, to the SYS to communicate interrupt states, and indirectly to channels (via the IDM) to forward (dispatch) instructions.

SPEAr1340 implements 2 instruction dispatchers: ID0 and ID1. ID2 and ID3 are not available.

Instruction dispatcher (ID)

An ID requests instructions from the HIF to fill an instruction queue. It knows which channel each instruction must be dispatched to by decoding the higher bits of the first word of every instruction.
If the target of an instruction is channel 0 and the instruction decodes to a flow type instruction (NOP, NEXT, STOP, COUPLE, UNCOUPLE or WAIT), the instruction is not dispatched: it is executed by the ID. See Section 13.3: Operation for details about the encodings.

Channel selection

An ID must allocate a channel to take ownership of it. An ID is not allowed to dispatch instructions to a channel without having allocated it. This guarantees that only one ID at a time has control of a given channel. When a channel is allocated by an ID, it receives a select signal. An ID goes into error state if it tries to allocate a channel that is already allocated or if the channel is not in idle state. Two IDs could simultaneously allocate a channel. This situation remains unnoticed by them until they start dispatching instructions.

Instruction dispatching

Once channels are allocated by using the signals described above, the dispatching of instructions can begin. Typically, the first word of an instruction is dispatched simultaneously with the channel selection. Each instruction is composed of up to four 32-bit words. Words of multi-word instructions are dispatched sequentially to a channel using lanes. The instruction dispatcher multiplexer (IDM) multiplexes lanes coming from IDs to drive the final four lanes that are shared by all channels.

When a channel is allocated by an ID, the ID continuously monitors its state (CSTAT). If the channel reports an error, the ID also goes into error state, aborting the current dispatching and stopping program execution.

13.2.4 Couple and chaining module (CCM)

This is a switch matrix that can be used to interconnect two channels in a master/slave mode. This block is controlled by the IDs when they execute COUPLE and UNCOUPLE instructions.
The maximum number of channel pairs that can be simultaneously interconnected corresponds to the number of CCM data paths: one CCM data path is used for each master/slave interconnection. Channels can be cascaded (a channel can simultaneously be a master and a slave).

13.2.5 AHB Slave interface (SIF)

Most C3 blocks have configuration and/or status registers that can be accessed using the AHB slave interface. The SIF bridges AHB requests to a simpler set of signals for the different C3 blocks that need to map registers in the AHB address space. The SIF also decodes AHB addresses in order to select the correct C3 internal block.

The SIF interfaces to modules using a single-clock data transfer protocol to keep latencies on the AHB bus at a minimum. There is, however, a feature available to modules that permits them to introduce wait states in read cycles.

13.2.6 System registers (SYS)

The SYS block implements the system registers as described in RM0089, Reference manual, SPEAr1340 address map and registers. It collects status information about channels and instruction dispatchers. This information can be read through the slave interface. The SYS block is able to acknowledge ID interrupts and to issue an asynchronous reset command to the MRGEN block.

13.2.7 Reset logic (MRGEN)

This module drives the asynchronous reset buffer tree of the C3. It receives its input from two sources: asynchronously from the system (top-level HRESET_n port) and synchronously from the SYS module. The MRGEN module has two purposes: to synchronize the release of the external reset and to permit self-reset (software reset) of the C3.
13.2.8 Channels

SPEAr1340 implements the following channels:
● Channel 0: MOVE channel
● Channel 1: DES/3DES channel
● Channel 2: AES channel
● Channel 3: UHH channel
● Channel 4: UHH2 channel
● Channel 5: PKA channel
● Channel 6: RNG channel
● Channel 7: empty

The features of each channel are described in Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU. The instruction set for each channel is described in Section 13.3: Operation.

Each channel has four components. Two of them are common to all channels (CB_IF and CPIF) and the other two (CU and CB) are specific to the channel. A channel is connected to the Instruction dispatching subsystem (IDS), to the HIF, to the CCM and to the SIF. Figure 63 shows a typical C3 channel architecture. The main components are described below.

Figure 63. C3 channel architecture
[Figure: a channel containing the Block control unit (CU), two FIFOs, the Core block (CB) and the Core block interface (CB_IF), connected to the Instruction Dispatcher (ID), the Coupling/Chaining Module (CCM), the HIF and the target bus.]

The components of a channel are:
● Control unit (CU): This block decodes instructions coming from the IDS and, in accordance with them, configures the core block (CB) and, if necessary, activates it so that it begins data processing. The CU interfaces the FIFOs in the CB_IF block to the HIF and it must keep source, destination and count registers. The CU also drives a set of multiplexers inside the CB_IF for coupling and chaining operations driven by the CCM. The CU is tailored to the needs of each channel but most of its logic can be implemented by copying it from an existing channel. The MOVE channel may be used as a template to build new channels.
● Core block interface (CB_IF): This block contains the FIFOs of the read and write path of the channel. It also contains most of the coupling and chaining logic. The CB_IF creates an interface from the rich and complex signal set of the CU to the much simpler CB interface.
The CB_IF is the same for every channel; only the size of its FIFOs can be changed by modifying an RTL parameter.
● Core Block (CB): It performs the main task of the channel, that is, functional data processing.
● CPIF: Channel interface to the SIF for AHB slave register access: both the CU and the CB have an interface for register access. The CPIF does the address decoding and output signal multiplexing and interfaces to the main AHB slave interface, the SIF. The CPIF also stores the Channel ID value (see registers description); this is configured using an RTL parameter. The CPIF is the same for every channel.

13.3 Operation

This section describes the instruction encoding for the generic flow type and for each channel. In the following subsections:
1. [x] denotes the value of the Additional Instruction Words field for this instruction.
2. (x-y) denotes the acceptable values for this field.
3. Unused fields must be zero.
4. The opcodes for module 0 are shared between the Instruction Dispatchers and the channel 0 module. Two opcodes are available for channel 0 operations (bits 25-23 = 4 or 5). For example, the move channel can be allocated to channel 0 and the two move operations can be encoded in the operation field.
5. xxxx stands for “don’t care”.

13.3.1 Generic flow type instructions

This section specifies the encoding of the flow type instructions. Flow type instructions are decoded by the C3 dispatchers when the module number (bits 31-28) is 0. Any other value of the module number leads to the dispatching of the instruction and its arguments to the channel associated with that number.

Bits 31-28  Module Number (0)
Bits 27-26  Additional Instruction Words (0-3)
Bits 25-23  Operation (0-7)
    0 = STOP [0]
        If and only if Bits 27-26 = 1 (one additional word), the status register content is written to memory (at the address pointed to by this additional word) when the STOP instruction is executed.
        Bits 22-0 -> unused
    1 = WAIT [0]
        Bits 22-16 -> unused
        Bits 15-0 -> Number of clock cycles to wait (0-65535)
    2 = NEXT_Inst_List (*list_start) [1]
        Bits 22-0 -> unused
        *list_start -> 32-bit pointer to start of next instruction list
    3 = NOP
        Bits 22-0 -> unused
    4 = CHANNEL 0 SPECIFIC OPCODE (see Section 13.3.2: Move channel instruction set)
    5 = CHANNEL 0 SPECIFIC OPCODE (see Section 13.3.2: Move channel instruction set)
    6 = COUPLE [0]
        Bits 22-19 -> Master device (0-15)
        Bit 18 -> Coupling/Chaining selection
            0 = couple master device inputs (coupling)
            1 = couple master device outputs (chaining)
        Bits 17-14 -> Slave device (0-15) (This value should correspond to the Module Number)
        Bits 13-11 -> Coupling/Chaining Path Number
        Bits 10-0 -> unused
    7 = UNCOUPLE [0]
        Bits 22-14 -> unused
        Bits 13-11 -> Coupling/Chaining Path Number
        Bits 10-0 -> unused

13.3.2 Move channel instruction set

The Move channel executes MOVE_INIT and MOVE_DATA instructions as specified below.

Bits 31-28  Module Number (0) (Assumes that Move Channel is in Channel 0 instruction space)
Bits 27-26  Additional Instruction Words (0-3)
Bits 25-23  Operation (0-7)
    0-3 = Not Used [0]
    4 = MOVE_Init (data) [1]
        Bits 22-0 -> unused
        data -> 32-bit mask for logical operations
    5 = MOVE_Data (len, *src, *dest) [2]
        Bits 22-21 -> Logical operation
            0 = no operation
            1 = logical AND
            2 = logical OR
            3 = logical XOR
        Bits 20-16 -> unused
        Bits 15-0 -> Length of block to move (0-65535)
        *src -> 32-bit pointer to start of source data
        *dest -> 32-bit pointer to destination address
    6-7 = Not Used

The move channel supports a slight variation of the MOVE_INIT instruction that also accepts the function parameter used to set the operator (see bits nn below). Instructions that do not conform to the following bit encodings or to the ones mentioned above are unknown to the Move channel and cause an error state.
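The field layout above can be turned into a few bit-packing helpers. The following is a minimal sketch that only follows the bit positions listed in this section; the function names are hypothetical and are not part of any C3 driver, and unused fields are left at zero as required.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helpers only; names are not taken from any C3 software. */

/* Pack the fields common to every first instruction word:
 * bits 31-28 module number, bits 27-26 additional instruction words,
 * bits 25-23 operation, bits 22-0 operation-specific fields. */
uint32_t c3_word0(uint32_t module, uint32_t extra_words,
                  uint32_t op, uint32_t low23)
{
    assert(module <= 15u && extra_words <= 3u && op <= 7u);
    assert(low23 <= 0x7FFFFFu);
    return (module << 28) | (extra_words << 26) | (op << 23) | low23;
}

/* WAIT: operation 1, no additional word, cycle count in bits 15-0. */
uint32_t c3_wait(uint32_t cycles)
{
    assert(cycles <= 0xFFFFu);
    return c3_word0(0u, 0u, 1u, cycles);
}

/* COUPLE: operation 6, master in bits 22-19, coupling/chaining select
 * in bit 18, slave in bits 17-14, path number in bits 13-11. */
uint32_t c3_couple(uint32_t master, uint32_t chaining,
                   uint32_t slave, uint32_t path)
{
    assert(master <= 15u && chaining <= 1u && slave <= 15u && path <= 7u);
    return c3_word0(0u, 0u, 6u,
                    (master << 19) | (chaining << 18) |
                    (slave << 14) | (path << 11));
}

/* MOVE_Data: operation 5, two additional words (source, destination),
 * logical operation in bits 22-21, length in bits 15-0. */
void c3_move_data(uint32_t logic_op, uint32_t len,
                  uint32_t src, uint32_t dest, uint32_t out[3])
{
    assert(logic_op <= 3u && len <= 0xFFFFu);
    out[0] = c3_word0(0u, 2u, 5u, (logic_op << 21) | len);
    out[1] = src;
    out[2] = dest;
}
```

Keeping the common fields in a single helper means that each instruction builder only has to state its own operation number, word count and low-order fields.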
MOVE_INIT instruction

The MOVE_INIT instruction is 2 words long. This instruction is used to set the Function and Operand of the Move channel. The function is encoded in the first instruction word whereas the operand is represented by the second instruction word.

Table 55. MOVE_INIT bit encoding
W#   Bit encoding
1    0000 0110 0nn0 0000 xxxx xxxx xxxx xxxx
2    (32 bit operand)

Bits nn in Table 55 are used to set the function of the Move channel:

Table 56. MOVE_INIT bits nn definition
Bit 17, 16
nn   Function
00   null
01   AND
10   OR
11   XOR

MOVE_DATA instruction

The MOVE_DATA instruction is 3 words long. This instruction is used to set the Source Register, the Destination Register and the Count Register of the Move Channel (values of the MOVE_SRCR, MOVE_DSTR and MOVE_CNTR registers) and, optionally, to start a copy operation. The Function and the Count are encoded in the first instruction word, the second word represents the Source Address and the third word represents the Destination Address.

Table 57. MOVE_DATA bit encoding
W#   Bit encoding
1    0000 1010 1nn0 0000 cccc cccc cccc cccc
2    (32 bit Source Address)
3    (32 bit Destination Address)

Bits nn in Table 57 are used to set the Function of the Move Channel and have the same encoding as in the MOVE_INIT instruction. Bits 15 to 0 in the first instruction word (cccc in Table 57) represent the Count in bytes of data to be copied. If the Count is different from zero, the Move Channel begins to copy data. The Count must be a multiple of 4 bytes, and the Source and Destination Addresses must be 32-bit aligned; otherwise the Move Channel goes into error state.

13.3.3 DES/3DES channel instruction set

This channel can compute DES and 3DES encryption and decryption in ECB and CBC mode by executing DES START and APPEND instructions.
Instructions that do not conform to the following bit encodings or to the generic flow type instructions are unknown to the DES/3DES channel, which then goes into error state.

There are 2 different DES instructions:
● DES START: used for setting the operation parameters, such as the key and the initialization vector.
● DES APPEND: used for passing the data to encrypt or decrypt.

DES START instruction

The DES START instruction can be applied with 2 different modes of operation:
● ECB
● CBC

ECB

The DES START ECB instruction is 2 words long. This instruction is used to set the key for the following operations. The length of the key is encoded in the first instruction word, while the second word represents the Source Address for the key.

Table 58. DES START ECB bit encoding
W#   Bit encoding
1    0001 01ab 000x xxxx cccc cccc cccc cccc
2    (32 bit Source Address for the key)

Bit a in Table 58 is used to set the algorithm to use:

Table 59. Bit a definition
Bit 25
a    Operation
0    DES
1    3DES

Bit b in Table 58 is used to set the operation to perform:

Table 60. Bit b definition
Bit 24
b    Operation
0    Encryption
1    Decryption

Bits 15 to 0 in the first instruction word (cccc in Table 58) represent the length in bytes of the key.

CBC

The DES START CBC instruction is 3 words long. This instruction is used to set the key and the initialization vector for the following operations. The length of the key is encoded in the first instruction word, the second word represents the Source Address for the key and the third word represents the Source Address for the Initialization Vector (IV).

Table 61. DES START CBC bit encoding
W#   Bit encoding
1    0001 10ab 001x xxxx cccc cccc cccc cccc
2    (32 bit Source Address for the key)
3    (32 bit Source Address for the IV)

Bits a and b in Table 61 are used to set the algorithm and the operation to perform and have the same encoding as in the ECB instruction.
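As an illustration of the DES START bit patterns in Table 58 and Table 61, the first instruction word can be assembled as sketched below. The function name and its parameters are hypothetical; the sketch only reproduces the bit positions shown in the tables (bits 23-21 are 000 for ECB and 001 for CBC) and writes the “don’t care” bits as zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: builds word 1 of a DES START instruction.
 *   cbc        0 = ECB (1 additional word), 1 = CBC (2 additional words)
 *   triple_des bit a (bit 25): 0 = DES, 1 = 3DES
 *   decrypt    bit b (bit 24): 0 = encryption, 1 = decryption
 *   key_len    bits 15-0: length of the key in bytes */
uint32_t des_start_word0(int cbc, int triple_des, int decrypt,
                         uint32_t key_len)
{
    uint32_t w;

    assert(key_len <= 0xFFFFu);
    w  = 1u << 28;                        /* module number 1: DES/3DES */
    w |= (cbc ? 2u : 1u) << 26;           /* additional instruction words */
    w |= (triple_des ? 1u : 0u) << 25;    /* bit a */
    w |= (decrypt ? 1u : 0u) << 24;       /* bit b */
    w |= (cbc ? 1u : 0u) << 21;           /* bits 23-21: 000 ECB, 001 CBC */
    return w | key_len;
}
```

The remaining instruction words (the key address and, for CBC, the IV address) are plain 32-bit pointers and need no encoding.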
Bits 15 to 0 in the first instruction word (cccc in Table 61) represent the length in bytes of the key.

DES APPEND instruction

The DES APPEND instruction can be applied with 2 different modes of operation:
● ECB
● CBC

ECB

The DES APPEND ECB instruction is 3 words long. This instruction is used for passing the data to process (encrypt or decrypt). The length of the data to process is encoded in the first instruction word, the second word represents the Source Address and the third word represents the Destination Address.

Table 62. DES APPEND ECB bit encoding
W#   Bit encoding
1    0001 10ab 100x xxxx cccc cccc cccc cccc
2    (32 bit Source Address for the data)
3    (32 bit Destination Address for the data)

Bit a in Table 62 is used to set the algorithm to use, while bit b is used to set the operation to perform (see Table 59 and Table 60). Bits 15 to 0 in the first instruction word (cccc in Table 62) represent the length in bytes of the data to process.

CBC

The DES APPEND CBC instruction is 3 words long. This instruction is used for passing the data to process (encrypt or decrypt). The length of the data to process is encoded in the first instruction word, the second word represents the Source Address and the third word represents the Destination Address.

Table 63. DES APPEND CBC bit encoding
W#   Bit encoding
1    0001 10ab 101x xxxx cccc cccc cccc cccc
2    (32 bit Source Address for the data)
3    (32 bit Destination Address for the data)

Bits a and b in Table 63 are used to set the algorithm and the operation to perform and have the same encoding as in the ECB instruction. Bits 15 to 0 in the first instruction word (cccc in Table 63) represent the length in bytes of the data to process.

13.3.4 AES (MPCM) channel instruction set

The following figure lists all possible instruction encodings that the MPCM Channel understands.

Figure 64.
AES (MPCM) channel instruction set
[Figure: MPCM channel instruction set encoding table. Columns: mnemonic, channel number (CHN), number of additional words (WN), and the bit fields of the instruction words (p = micro-sequence number, t = vector table number, n = byte count; additional words carry PARAM, SRCPTR, DSTPTR or ADDRESS). Mnemonics listed: SETUP_P, EXEC_P, SETUP_D, EXEC_D, SETUP_S, EXEC_S, SETUP_PD, EXEC_PD, SETUP_PS, EXEC_PS, SETUP_SD, EXEC_SD, SETUP_PSD, EXEC_PSD, NOP, SWITCHVT, DOWNLOAD, SETUP and EXEC; the remaining entries are reserved.]

Bits 27-26 of Instruction Word #0 indicate the number of additional words this instruction has.
Bits 21-16 of Instruction Word #0 represent an MPCM micro-sequence when marked p in the table or an MPCM vector table when marked t in the table.
Bits 15-0 of Instruction Word #0 represent the number of bytes to be handled by the MPCM Channel when marked n in the table.

pppppp: MPCM micro-sequence number.
tttttt: MPCM vector table number.
nn..nnnn: Byte Count.

NOP Instruction

No operation. Do nothing.
The MPCM is not started.

DOWNLOAD Instruction

The DOWNLOAD instruction is used to program the MPCM RAM memory with micro-sequences. It has two additional instruction words:
● Word1 indicates at which MPCM RAM address the micro-sequence must be placed.
● Word2 indicates to the CU where to load the data from.

Bits 21-16 (pppppp) of the DOWNLOAD instruction word0 indicate which micro-sequence number is being downloaded; bits 15-0 (nn..nnnn) indicate the length in bytes of the micro-sequence.

The C prototype of this instruction is:
    mpcm_download(int prgno, int addr, const char *srcpt, int n);
        prgno: bits 21-16 of word0
        addr: word1
        srcpt: word2
        n: bits 15-0 of word0

Each micro-sequence instruction is 8 bytes wide, so n must be a multiple of 8 bytes. If n is not a multiple of 8 bytes, the CU reports an AERR error. Each address location of the MPCM RAM contains a complete 8-byte micro-sequence instruction.

Example: you want to download an 80-byte micro-sequence as program #3 at MPCM RAM address h71. The micro-sequence must be loaded by the C3 from address location h1000. The C function would be:
    mpcm_download(3, 0x71, 0x1000, 80);
which must be encoded in the MPCM instruction:
    29030050 00000071 00001000
After execution of this DOWNLOAD instruction, the MPCM channel is ready to execute micro-sequence #3. The micro-sequence is placed in the MPCM RAM locations h71-h7A.

The MPCM Core Block RAM is implicitly split into two sections. The initial address region contains Vector Tables for micro-sequence addresses whereas higher addresses contain the downloaded micro-sequences.

Figure 65. MPCM Core block RAM diagram

In Figure 65 you can see that the micro-sequence downloaded in the previous example has been placed at address h071 and that address location 3 is the vector to this micro-sequence.
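The DOWNLOAD example above can be reproduced with a short encoder, sketched here under stated assumptions. The function names are hypothetical, and the fixed upper bits of word0 (0x29000000) are taken from the worked example 29030050 rather than from the encoding figure, so they should be checked against the figure before any real use. The encoder also rejects, in software, a length that is not a multiple of 8 bytes, which is the condition under which the CU would report an AERR error.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed bits of DOWNLOAD word0, INFERRED from the example 29030050
 * (micro-sequence number 3, length 0x50); not taken from Figure 64. */
#define MPCM_DOWNLOAD_BASE 0x29000000u

/* Hypothetical encoder for the three words of a DOWNLOAD instruction.
 * Returns 0 on success, -1 if an argument cannot be encoded. */
int mpcm_encode_download(uint32_t prgno, uint32_t ram_addr,
                         uint32_t src_addr, uint32_t n, uint32_t out[3])
{
    if (prgno > 63u || n > 0xFFFFu || (n % 8u) != 0u) {
        return -1;  /* 6-bit program number, 16-bit count, 8-byte units */
    }
    out[0] = MPCM_DOWNLOAD_BASE | (prgno << 16) | n;
    out[1] = ram_addr;   /* word1: MPCM RAM address of the micro-sequence */
    out[2] = src_addr;   /* word2: where the CU loads the data from */
    return 0;
}

/* Last MPCM RAM location used: each location holds one 8-byte
 * micro-sequence instruction. */
uint32_t mpcm_last_ram_addr(uint32_t ram_addr, uint32_t n)
{
    assert(n >= 8u && (n % 8u) == 0u);
    return ram_addr + (n / 8u) - 1u;
}
```

With the values of the example (program 3, RAM address h71, source h1000, 80 bytes) the sketch yields the words 29030050, 00000071 and 00001000, and a last RAM location of h7A.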
When you request the MPCM channel to execute micro-sequence #3, it will execute it starting from address location h071. There can be up to 64 entries in the Vector Table (the pppppp field in MPCM channel instructions is 6 bits wide). The user is free to choose how to organize the memory. Note that it is not obligatory to allocate space for a full Vector Table. If you download only 4 micro-sequences, the Vector Table can have only 4 entries (h000-h003) and the first micro-sequence can already be placed at address h004.

SWITCHVT Instruction
The MPCM offers a mechanism to have multiple vector tables in case you need to download more than 64 micro-sequences. There can be up to 64 different vector tables in the MPCM Core Block RAM, leading to a theoretical maximum of 64 x 64 = 4096 downloadable micro-sequences.
The SWITCHVT is a single dword instruction. Bits 21-16 (tttttt) of the SWITCHVT instruction word0 indicate which vector table to use for all following DOWNLOAD, SETUP and EXECUTE instructions.
The C prototype of this instruction is:
    mpcm_switchvt(int vtno);
    vtno: bits 21-16 of word0
Example: you want to switch to Vector Table #1. The C function call would be:
    mpcm_switchvt(1);
which must be encoded in the MPCM instruction:
    21010000
After execution of this SWITCHVT instruction the MPCM will use Vector Table #1 for DOWNLOAD, SETUP and EXECUTE instructions. When the MPCM starts up, it uses Vector Table #0 by default. If you do not need to download more than 64 micro-sequences, you should never use the SWITCHVT instruction.
Vector tables are contiguous in the initial address region of the MPCM Core Block RAM.
Figure 66. MPCM vector tables

EXECUTE Instructions
EXECUTE instructions are used to set the Source Address, Destination Address and Count Register of the MPCM Channel’s CU and to run downloaded micro-sequences.
Since it is not always necessary to set all these registers, a variety of EXECUTE instructions is offered (for instance, there is no need to set the Source Address Register for a micro-sequence that does not need input data). One set of EXECUTE instructions can also forward a parameter to the MPCM Core Block before the micro-sequence is launched. This can be handy for more complex modes.
Previous values of the Source Address and Destination Address registers are used if you do not set them in an EXECUTE instruction.
The C prototypes of these instructions are:
    mpcm_execute(int prgno, int n);
    mpcm_execute_p(int prgno, unsigned int param, int n);
    mpcm_execute_d(int prgno, char *dstpt, int n);
    mpcm_execute_s(int prgno, const char *srcpt, int n);
    mpcm_execute_pd(int prgno, unsigned int param, char *dstpt, int n);
    mpcm_execute_ps(int prgno, unsigned int param, const char *srcpt, int n);
    mpcm_execute_sd(int prgno, const char *srcpt, char *dstpt, int n);
    mpcm_execute_psd(int prgno, unsigned int param, const char *srcpt, char *dstpt, int n);
See the previous paragraphs for the encoding of these instructions.
Each of these instructions sets up the MPCM Core Block to execute the micro-sequence prgno. If the EXECUTE instruction specifies param, it is forwarded to the MPCM Core Block before the micro-sequence is launched.
The Byte Count n specified in these EXECUTE instructions is also forwarded to the MPCM Core Block before it is launched. In some more complex modes the micro-sequence may need this information.
For each of these instructions, the CU of the MPCM Channel then loads n bytes from *srcpt and forwards them to the MPCM Core Block for processing. If n is zero, no data is loaded. If *srcpt is not set by this EXECUTE instruction, the previous value of *srcpt is used.
For each of these instructions the CU of the MPCM Channel will also store any data generated by the MPCM Core Block to *dstpt. If *dstpt is not set by this EXECUTE instruction, the previous value is used.
The CU of the MPCM Channel can handle any value of Byte Count (n). The CU always loads the minimum number of dwords needed to satisfy the request (i.e. if you request the processing of 2 bytes, the CU loads 1 dword). The minimum value of n is zero; the maximum value of n is 65,535 bytes (64 KB - 1 byte).

SETUP Instructions
SETUP instructions are similar to EXECUTE instructions, with the difference that the MPCM Core Block is not launched. They are primarily used to set up the MPCM Channel when it is going to be a Slave in Couple/Chaining operations: the MPCM Core Block will later be launched by a Master Channel.
SETUP instructions can be used to set the Source Address and the Destination Address of the MPCM Channel’s CU (note that the Count Register is not affected). The MPCM Core Block is set up to execute downloaded micro-sequences. As with the EXECUTE instructions, one set of SETUP instructions can also forward a parameter to the MPCM Core Block.
Previous values of the Source Address and Destination Address registers are retained if you do not set them in a SETUP instruction.
The C prototypes of these instructions are:
    mpcm_setup(int prgno, int n);
    mpcm_setup_p(int prgno, unsigned int param, int n);
    mpcm_setup_d(int prgno, char *dstpt, int n);
    mpcm_setup_s(int prgno, const char *srcpt, int n);
    mpcm_setup_pd(int prgno, unsigned int param, char *dstpt, int n);
    mpcm_setup_ps(int prgno, unsigned int param, const char *srcpt, int n);
    mpcm_setup_sd(int prgno, const char *srcpt, char *dstpt, int n);
    mpcm_setup_psd(int prgno, unsigned int param, const char *srcpt, char *dstpt, int n);
Each of these instructions sets up the MPCM Core Block to execute the micro-sequence prgno.
If the SETUP instruction specifies param, it is forwarded to the MPCM Core Block. Note that the micro-sequence is not started. The micro-sequence can be launched later by a Master Channel.
The Byte Count n specified in these SETUP instructions is forwarded to the MPCM Core Block. In some more complex modes the micro-sequence may need this information. The Byte Count does not affect the Count Register.
After SETUP instructions the MPCM Channel is ready to accept data from a Master Channel in coupling/chaining operations.

13.3.5 Unified hash with HMAC (UHH) channel instruction set
The UHH Channel executes HASH [MD5/SHA1/SHA2/CONTEXT] and HMAC [MD5/SHA1/SHA2/CONTEXT] instructions. Instructions that do not conform to the following bit encodings or to the generic flow type instructions are unknown to the UHH Channel, which will go into error state.

HASH instruction
There are 4 different HASH instructions:
● HASH MD5
● HASH SHA1
● HASH SHA2
● HASH CONTEXT
The first 3 instructions are used for computing the digest of a message and work in the same way. The last one is used for saving and restoring the context.

HASH [MD5/SHA1/SHA2] instructions
Each HASH [MD5/SHA1/SHA2] instruction is composed of 3 subinstructions:
● INIT
● APPEND
● END

INIT
The HASH [MD5/SHA1/SHA2] INIT instruction is 1 word long. This instruction is used to set the function.
Table 64. HASH [MD5/SHA1/SHA2] INIT bit encoding
W#   Bit encoding
1    0011 000a a00x xxxx xxxx xxxx xxxx xxxx
Bits aa in the above table are used to set the algorithm to use:
Table 65. HASH [MD5/SHA1/SHA2] INIT bits aa definition
Bit 24,23 (aa)   Algorithm
00               MD5
01               SHA-1
10               SHA-256
11               CONTEXT (see HASH CONTEXT instruction)

APPEND
The HASH [MD5/SHA1/SHA2] APPEND instruction is 2 words long. The length of the message is encoded in the first instruction word, while the second word represents the Source Address for the message.
Table 66.
HASH [MD5/SHA1/SHA2] APPEND instruction
W#   Bit encoding
1    0011 010a a01x xxxx cccc cccc cccc cccc
2    (32 bit Source Address for the message)
Bits aa in Table 66 are used to set the algorithm to use and have the same encoding as in the INIT instruction (see Table 65).
Bits 15 to 0 in the first instruction word (cccc in the above table) represent the Count in bytes of the input message.

END
The HASH [MD5/SHA1/SHA2] END instruction is 2 words long. The second word represents the Destination Address for the digest.
Table 67. HASH [MD5/SHA1/SHA2] END bit encoding
W#   Bit encoding
1    0011 010a a10t xxxx xxxx xxxx xxxx xxxx
2    (32 bit Destination Address for the digest)
Bits aa in Table 67 are used to set the algorithm to use and have the same encoding as in the INIT instruction.
Bit t in the above table is used to truncate the result to 96 bits:
Table 68. HASH [MD5/SHA1/SHA2] END bit t definition
Bit 20 (t)   Trunc
0            Full digest
1            Truncated 96-bit digest

HASH CONTEXT instruction
The HASH CONTEXT instruction is composed of 2 subinstructions:
● SAVE
● RESTORE

SAVE
The HASH CONTEXT SAVE instruction is 2 words long. The second word represents the Destination Address for the context.
Table 69. HASH CONTEXT SAVE bit encoding
W#   Bit encoding
1    0011 0101 10xx xxxx xxxx xxxx xxxx xxxx
2    (32 bit Destination Address for the context)

RESTORE
The HASH CONTEXT RESTORE instruction is 2 words long. The second word represents the Source Address for the context.
Table 70. HASH CONTEXT RESTORE bit encoding
W#   Bit encoding
1    0011 0101 11xx xxxx xxxx xxxx xxxx xxxx
2    (32 bit Source Address for the context)

HMAC instruction
There are four different HMAC instructions:
● HMAC MD5
● HMAC SHA1
● HMAC SHA2
● HMAC CONTEXT
The first 3 instructions are used for computing the HMAC of a message and work in the same way. The last one is used for saving and restoring the context.
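As an illustration of the HASH bit encodings listed earlier (Tables 64 to 68), the following C sketch builds the first instruction word of the HASH INIT, APPEND and END subinstructions. The function and macro names are hypothetical, and the sketch assumes that the don't-care bits (x) can be written as 0; the tables do not state a required value for them.

```c
#include <stdint.h>

/* Algorithm selector aa (bits 24 and 23), as defined in Table 65. */
enum uhh_alg { UHH_MD5 = 0, UHH_SHA1 = 1, UHH_SHA256 = 2 };

/* Fixed bits of word 1, with aa, t, cccc and all x bits cleared
 * (assumption: x bits may be written as 0). */
#define UHH_HASH_INIT_BASE   0x30000000u /* 0011 000a a00x ... */
#define UHH_HASH_APPEND_BASE 0x34200000u /* 0011 010a a01x ... */
#define UHH_HASH_END_BASE    0x34400000u /* 0011 010a a10t ... */

/* Word 1 of HASH INIT (a 1-word instruction). */
static uint32_t uhh_hash_init_word(enum uhh_alg alg)
{
    return UHH_HASH_INIT_BASE | ((uint32_t)alg << 23);
}

/* Word 1 of HASH APPEND; count is the message length in bytes (bits 15-0).
 * Word 2, the source address of the message, is supplied separately. */
static uint32_t uhh_hash_append_word(enum uhh_alg alg, uint16_t count)
{
    return UHH_HASH_APPEND_BASE | ((uint32_t)alg << 23) | (uint32_t)count;
}

/* Word 1 of HASH END; a non-zero truncate_96 sets bit 20 (t).
 * Word 2, the destination address of the digest, is supplied separately. */
static uint32_t uhh_hash_end_word(enum uhh_alg alg, int truncate_96)
{
    uint32_t t = truncate_96 ? (1u << 20) : 0u;
    return UHH_HASH_END_BASE | ((uint32_t)alg << 23) | t;
}
```

With these helpers, a SHA-1 INIT word is 0x30800000, and a SHA-256 APPEND word for a 64-byte message is 0x35200040. Keeping one base constant per subinstruction makes each bit pattern easy to compare against its table.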
HMAC [MD5/SHA1/SHA2] instructions
Each HMAC [MD5/SHA1/SHA2] instruction is composed of 3 subinstructions:
● INIT
● APPEND
● END

INIT
The HMAC [MD5/SHA1/SHA2] INIT instruction is 2 words long.
Table 71. HMAC [MD5/SHA1/SHA2] INIT bit encoding
W#   Bit encoding
1    0011 011a a00x xxxx cccc cccc cccc cccc
2    (32 bit Source Address for the key)
Bits aa in the above table are used to set the algorithm to use and have the same encoding as in the HASH INIT instruction.
Bits 15 to 0 in the first instruction word (cccc in the above table) represent the length in bytes of the key.

APPEND
The HMAC [MD5/SHA1/SHA2] APPEND instruction is 2 words long. This instruction is used to set the Source Address Register for the message and to start the computation of the HMAC.
The length of the message is encoded in the first instruction word, while the second word represents the Source Address for the message.
Table 72. HMAC [MD5/SHA1/SHA2] APPEND bit encoding
W#   Bit encoding
1    0011 011a a01x xxxx cccc cccc cccc cccc
2    (32 bit Source Address for the message)
Bits aa in the above table are used to set the algorithm to use and have the same encoding as in the INIT instruction.
Bits 15 to 0 in the first instruction word (cccc in the above table) represent the Count in bytes of the input message.

END
The HMAC [MD5/SHA1/SHA2] END instruction is 3 words long. The second word represents the Source Address for the key, while the third word represents the Destination Address for the HMAC.
Table 73. HMAC [MD5/SHA1/SHA2] END bit encoding
W#   Bit encoding
1    0011 101a a10t xxxx cccc cccc cccc cccc
2    (32 bit Source Address for the key)
3    (32 bit Destination Address for the HMAC)
Bits aa in the above table are used to set the algorithm to use and have the same encoding as in the INIT instruction.
Bits 15 to 0 in the first instruction word (cccc in the above table) represent the length in bytes of the key.
Bit t in the above table is used to truncate the result to 96 bits:
Table 74. HMAC [MD5/SHA1/SHA2] END bit t definition
Bit 20 (t)   Trunc
0            Full HMAC
1            Truncated 96-bit HMAC

HMAC CONTEXT instruction
The HMAC CONTEXT instruction is composed of 2 subinstructions:
● SAVE
● RESTORE

SAVE
The HMAC CONTEXT SAVE instruction is 2 words long. The second word represents the Destination Address for the context.
Table 75. HMAC CONTEXT SAVE bit encoding
W#   Bit encoding
1    0011 0111 10xx xxxx xxxx xxxx xxxx xxxx
2    (32 bit Destination Address for the context)

RESTORE
The HMAC CONTEXT RESTORE instruction is 2 words long. The second word represents the Source Address for the context.
Table 76. HMAC CONTEXT RESTORE bit encoding
W#   Bit encoding
1    0011 0111 11xx xxxx xxxx xxxx xxxx xxxx
2    (32 bit Source Address for the context)

13.3.6 Unified hash with HMAC 2 (UHH2) channel instruction set
Note: The channel described in this section (which supports SHA384 and SHA512) is called UHH2, to distinguish it from the UHH channel, which supports MD5, SHA1 and SHA256. A new channel has been developed for SHA384 and SHA512 because these algorithms operate on 64-bit words (instead of 32-bit words as for SHA1 and SHA256).
The use of the UHH2 channel is almost the same as for the UHH channel. There are 3 main differences:
– SHA384 replaces MD5 and SHA512 replaces SHA1 in the instruction encoding
– the digest size is 384 bits for SHA384 and 512 bits for SHA512
– the size of the context for saving/restoring is increased (see details in CONTEXT sections)
The UHH2 channel executes HASH [SHA384/SHA512/CONTEXT] and HMAC [SHA384/SHA512/CONTEXT] instructions. Instructions that do not conform to the following bit encodings or to the generic flow type instructions are unknown to the UHH2 channel, which will go into error state.
There are 3 different HASH instructions:
● HASH SHA384
● HASH SHA512
● HASH CONTEXT
The first 2 instructions are used for computing the digest of a message and work in the same way. The last one is used for saving and restoring the context.

HASH [SHA384/SHA512] instructions
Each HASH [SHA384/SHA512] instruction is composed of 3 subinstructions:
● INIT
● APPEND
● END

INIT
The HASH [SHA384/SHA512] INIT instruction is 1 word long. This instruction is used to set the function.
Table 77. HASH [SHA384/SHA512] INIT bit encoding
W#   Bit encoding
1    0100 000a a00x xxxx xxxx xxxx xxxx xxxx
Bits aa in the above table are used to set the algorithm to use:
Table 78. HASH [SHA384/SHA512] INIT bits aa definition
Bit 24,23 (aa)   Algorithm
00               SHA384
01               SHA512
10               not used
11               CONTEXT (see HASH CONTEXT instruction)

APPEND
The HASH [SHA384/SHA512] APPEND instruction is 2 words long. The length of the message is encoded in the first instruction word, while the second word represents the Source Address for the message.
Table 79. HASH [SHA384/SHA512] APPEND bit encoding
W#   Bit encoding
1    0100 010a a01x xxxx cccc cccc cccc cccc
2    (32 bit Source Address for the message)
Bits aa in the above table are used to set the algorithm to use and have the same encoding as in the INIT instruction.
Bits 15 to 0 in the first instruction word (cccc in the above table) represent the count in bytes of the input message.

END
The HASH [SHA384/SHA512] END instruction is 2 words long. The second word represents the Destination Address for the digest.
Table 80. HASH [SHA384/SHA512] END bit encoding
W#   Bit encoding
1    0100 010a a100 xxxx xxxx xxxx xxxx xxxx
2    (32 bit Destination Address for the digest)
Bits aa in the above table are used to set the algorithm to use and have the same encoding as in the INIT instruction.
HASH CONTEXT instruction
The HASH CONTEXT instruction is composed of 2 subinstructions:
● SAVE
● RESTORE

SAVE
The HASH CONTEXT SAVE instruction is 2 words long. The second word represents the Destination Address for the context.
Table 81. HASH CONTEXT SAVE bit encoding
W#   Bit encoding
1    0100 0101 10xx xxxx xxxx xxxx xxxx xxxx
2    (32 bit Destination Address for the context)

RESTORE
The HASH CONTEXT RESTORE instruction is 2 words long. The second word represents the Source Address for the context.
Table 82. HASH CONTEXT RESTORE bit encoding
W#   Bit encoding
1    0100 0101 11xx xxxx xxxx xxxx xxxx xxxx
2    (32 bit Source Address for the context)

HMAC instruction
There are 3 different HMAC instructions:
● HMAC SHA384
● HMAC SHA512
● HMAC CONTEXT
The first 2 instructions are used for computing the HMAC of a message and work in the same way. The last one is used for saving and restoring the context.

HMAC [SHA384/SHA512] instructions
Each HMAC [SHA384/SHA512] instruction is composed of 3 subinstructions:
● INIT
● APPEND
● END

INIT
The HMAC [SHA384/SHA512] INIT instruction is 2 words long.
Table 83. HMAC [SHA384/SHA512] INIT bit encoding
W#   Bit encoding
1    0100 011a a00x xxxx cccc cccc cccc cccc
2    (32 bit Source Address for the key)
Bits aa in the above table are used to set the algorithm to use and have the same encoding as in the HASH INIT instruction.
Bits 15 to 0 in the first instruction word (cccc in the above table) represent the length in bytes of the key.

APPEND
The HMAC [SHA384/SHA512] APPEND instruction is 2 words long. The length of the message is encoded in the first instruction word, while the second word represents the Source Address for the message.
Table 84.
HMAC [SHA384/SHA512] APPEND bit encoding
W#   Bit encoding
1    0100 011a a01x xxxx cccc cccc cccc cccc
2    (32 bit Source Address for the message)
Bits aa in the above table are used to set the algorithm to use and have the same encoding as in the INIT instruction.
Bits 15 to 0 in the first instruction word (cccc in the above table) represent the Count in bytes of the input message.

END
The HMAC [SHA384/SHA512] END instruction is 3 words long. The second word represents the Source Address for the key, while the third word represents the Destination Address for the HMAC.
Table 85. HMAC [SHA384/SHA512] END bit encoding
W#   Bit encoding
1    0100 101a a100 xxxx cccc cccc cccc cccc
2    (32 bit Source Address for the key)
3    (32 bit Destination Address for the HMAC)
Bits aa in the above table are used to set the algorithm to use and have the same encoding as in the INIT instruction.
Bits 15 to 0 in the first instruction word (cccc in the above table) represent the length in bytes of the key.

HMAC CONTEXT instruction
The HMAC CONTEXT instruction is composed of 2 subinstructions:
● SAVE
● RESTORE

SAVE
The HMAC CONTEXT SAVE instruction is 2 words long. The second word represents the Destination Address for the context.
Table 86. HMAC CONTEXT SAVE bit encoding
W#   Bit encoding
1    0100 0111 10xx xxxx xxxx xxxx xxxx xxxx
2    (32 bit Destination Address for the context)

RESTORE
The HMAC CONTEXT RESTORE instruction is 2 words long. The second word represents the Source Address for the context.
Table 87. HMAC CONTEXT RESTORE bit encoding
W#   Bit encoding
1    0100 0111 11xx xxxx xxxx xxxx xxxx xxxx
2    (32 bit Source Address for the context)

13.3.7 Public key (PKA) channel instruction set
The PKA Channel executes MONTY_PAR, MOD_EXP, MONTY_EXP, ECC_MUL and ECC_MONTY_MUL instructions. Instructions that do not conform to the following bit encodings or to the generic flow type instructions are unknown to the PKA Channel, which will go into error state.
Data structures and endianness
Each instruction for the PKA channel requires pointers to the input and output data structures. The input data structures vary in composition (depending on the operation to execute) and in size (depending on the size of the underlying finite field). All these structures must follow the size and order of the operands as described for each instruction.
The size of each field of the data structures is given in W or E (number of 32-bit words), where:
    W = (op_len) / 32
    E = (exp_len) / 32 for RSA/DH, and E = (k_len) / 32 for ECC
W depends only on the underlying finite field size (op_len), while E depends on the exponent to be used.
Note:
● If the maximum allowed length for RSA and DH is 2048 bits, then W and E are less than or equal to 64 words (corresponding to 256 bytes).
● If the maximum allowed length for ECC is 384 bits, then W and E are less than or equal to 12 words (corresponding to 48 bytes).
The input data structures must follow the specified order and size for all the parameters.
Both input and output data structures are big-endian. This means that the first word represents the most significant word of the first operand. The first byte of the word is also the most significant one.
For instance, in the case of the MONTY_EXP instruction, the input data structure is composed of the 4 operands in the following order, as described by the instruction specification:
1. op_len (one single 32-bit word)
2. mod (W 32-bit words, depending on the op_len value)
3. exp_len (one single 32-bit word)
4. exp (E 32-bit words, depending on the exp_len value)
The representation in memory of this data structure is:
Table 88. MONTY_EXP instruction data structure
MSB op_len    ...   ...   LSB op_len
MSB mod       ...   ...   -
-             ...   ...   -
-             ...   ...   LSB mod
MSB exp_len   ...   ...   LSB exp_len
MSB exp       ...   ...   -
-             -     -     -
-             -     -     LSB exp

MONTY_PAR instruction
The MONTY_PAR instruction is 3 words long. This instruction is used to set the Source Address Register, the Destination Address Register and the Count Register of the PKA Channel (values of PKA_SRCR, PKA_DSTR and PKA_CNTR registers) and to start the computation of Montgomery’s parameter. The value depends only on the underlying finite field, and therefore its computation is the same for both RSA/DH and ECC.
The Function and the Count are encoded in the first instruction word, the second word represents the Source Address and the Destination Address is represented by the third instruction word.
Table 89. MONTY_PAR bit encoding
W#   Bit encoding
1    0101 1000 1xxx xxxx cccc cccc cccc cccc
2    (32 bit Source Address)
3    (32 bit Destination Address)
Bits 15 to 0 in the first instruction word (cccc in the above table) represent the Count in bytes of the input data structure.
Count must be a multiple of 4 bytes, and the Source and Destination Addresses must be 32-bit aligned; otherwise the PKA Channel will go into error state.
The input data structure to pass is:
Table 90. Input data structure
Name     Size in words   Description
op_len   1               Length of the operands in bits
mod      W               Modulus
The resulting data structure is:
Table 91. Resulting data structure
Name          Size in words   Description
R^2 (mod n)   W               Montgomery’s parameter

MOD_EXP instruction
The MOD_EXP instruction is 4 words long. This instruction is used to set the Source Address Register for secret data, the Source Address Register for public data, the Destination Address Register and the Count Register of the PKA Channel (values of PKA_SRCR, PKA_PSRCR, PKA_DSTR and PKA_CNTR registers) and to start the computation of the modular exponentiation. In this case the input data structure has to include the Montgomery’s parameter to use for the computation.
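The big-endian layout described under "Data structures and endianness" applies to every PKA input structure, including the longer one used by MOD_EXP. As a minimal sketch, the C fragment below serialises the simplest case, the MONTY_PAR input (op_len followed by mod), and returns the byte Count to place in the instruction word. The helper names are hypothetical. The sketch also assumes that op_len is a multiple of 32 bits and that the caller supplies the modulus as 32-bit words with the most significant word first; neither assumption is stated in this form by the manual.

```c
#include <stddef.h>
#include <stdint.h>

/* Store one 32-bit value with its most significant byte first. */
static void pka_put_be32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v >> 24);
    dst[1] = (uint8_t)(v >> 16);
    dst[2] = (uint8_t)(v >> 8);
    dst[3] = (uint8_t)v;
}

/*
 * Hypothetical helper: writes op_len (1 word) followed by mod (W words)
 * into buf, big-endian, as in the MONTY_PAR input structure.
 * Assumptions: op_len_bits is a non-zero multiple of 32, and
 * mod_msw_first[0] is the most significant word of the modulus.
 * Returns the structure size in bytes (the Count), or 0 on invalid input.
 */
static size_t pka_build_monty_par_input(uint8_t *buf, size_t buf_size,
                                        uint32_t op_len_bits,
                                        const uint32_t *mod_msw_first)
{
    size_t w, total, i;

    if (op_len_bits == 0u || (op_len_bits % 32u) != 0u)
        return 0;

    w = op_len_bits / 32u;        /* W = op_len / 32 */
    total = 4u * (1u + w);        /* op_len word + W modulus words */
    if (total > buf_size || total > 0xFFFFu)
        return 0;                 /* Count is a 16-bit field */

    pka_put_be32(buf, op_len_bits);
    for (i = 0; i < w; i++)
        pka_put_be32(buf + 4u * (1u + i), mod_msw_first[i]);

    return total;
}
```

For a 64-bit modulus the helper produces 12 bytes: the op_len word 00 00 00 40 followed by the two modulus words. The result is always a multiple of 4 bytes, as required for Count.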
The Function and the Count are encoded in the first instruction word, the second word represents the Source Address for the secret data, the third word represents the Source Address for the public data and the Destination Address is represented by the fourth instruction word.
Table 92. MOD_EXP bit encoding
W#   Bit encoding
1    0101 1101 0xxx xxxx cccc cccc cccc cccc
2    (32 bit Source Address for secret data)
3    (32 bit Source Address for public data)
4    (32 bit Destination Address)
Bits 15 to 0 in the first instruction word (cccc in the above table) represent the Count in bytes of the input data structures (both secret and public).
Count must be a multiple of 4 bytes, and the Source and Destination Addresses must be 32-bit aligned; otherwise the PKA Channel will go into error state.
The input data structure with the secret data to pass is:
Table 93. Input data structure
Name          Size in words   Description
op_len        1               Length of the operands in bits
mod           W               Modulus
exp_len       1               Length of the exponent in bits
exp           E               Exponent
R^2 (mod n)   W               Montgomery’s parameter
base          W               Base
The resulting data structure is:
Table 94. Resulting data structure
Name                 Size in words   Description
Base^Exp (mod Mod)   W               Result of the modular exponentiation

MONTY_EXP instruction
The MONTY_EXP instruction is 4 words long. This instruction is used to set the Source Address Register for secret data, the Source Address Register for public data, the Destination Address Register and the Count Register of the PKA Channel (values of PKA_SRCR, PKA_PSRCR, PKA_DSTR and PKA_CNTR registers) and to start the computation of the modular exponentiation. In this case the Montgomery’s parameter to use for the operation is computed by the channel.
The Function and the Count are encoded in the first instruction word, the second word represents the Source Address for the secret data, the third word represents the Source Address for the public data and the Destination Address is represented by the fourth instruction word.
Table 95. MONTY_EXP bit encoding
W#   Bit encoding
1    0101 1101 1xxx xxxx cccc cccc cccc cccc
2    (32 bit Source Address for secret data)
3    (32 bit Source Address for public data)
4    (32 bit Destination Address)
Bits 15 to 0 in the first instruction word (cccc in the above table) represent the Count in bytes of the input data structures (both secret and public).
Count must be a multiple of 4 bytes, and the Source and Destination Addresses must be 32-bit aligned; otherwise the PKA Channel will go into error state.
The input data structure with the secret data to pass is:
Table 96. Input data structure
Name      Size in words   Description
op_len    1               Length of the operands in bits
mod       W               Modulus
exp_len   1               Length of the exponent in bits
exp       E               Exponent
base      W               Base
The resulting data structure is:
Table 97. Resulting data structure
Name                 Size in words   Description
Base^Exp (mod Mod)   W               Result of the modular exponentiation

ECC_MUL instruction
The ECC_MUL instruction is 3 words long. This instruction is used to set the Source Address Register, the Destination Address Register and the Count Register of the PKA Channel (values of PKA_SRCR, PKA_DSTR and PKA_CNTR registers) and to start the computation of the scalar multiplication of an EC point. In this case the input data structure has to include the Montgomery’s parameter to use for the computation.
The Function and the Count are encoded in the first instruction word, the second word represents the Source Address and the Destination Address is represented by the third instruction word.
Table 98.
ECC_MUL bit encoding
W#   Bit encoding
1    0101 1011 0xxx xxxx cccc cccc cccc cccc
2    (32 bit Source Address)
3    (32 bit Destination Address)
Bits 15 to 0 in the first instruction word (cccc in the above table) represent the Count in bytes of the input data structure.
Count must be a multiple of 4 bytes, and the Source and Destination Addresses must be 32-bit aligned; otherwise the PKA Channel will go into error state.
The input data structure with the secret data to pass is:
Table 99. Input data structure
Name          Size in words   Description
op_len        1               Length of the operands in bits
mod           W               Modulus of the finite field
a_sign        1               Sign of the a parameter
a             1               Parameter of the elliptic curve
Px            W               x-coordinate of the base point
Py            W               y-coordinate of the base point
k_len         1               Length of the scalar k
k             E               Scalar k
R^2 (mod n)   W               Montgomery’s parameter
The resulting data structure is:
Table 100. Resulting data structure
Name   Size in words   Description
kPx    W               x-coordinate of the result of the scalar multiplication
kPy    W               y-coordinate of the result of the scalar multiplication

ECC_MONTY_MUL instruction
The ECC_MONTY_MUL instruction is 3 words long. This instruction is used to set the Source Address Register, the Destination Address Register and the Count Register of the PKA Channel (values of PKA_SRCR, PKA_DSTR and PKA_CNTR registers) and to start the computation of the scalar multiplication of an EC point. In this case the Montgomery’s parameter to use for the operation is computed by the channel.
The Function and the Count are encoded in the first instruction word, the second word represents the Source Address and the Destination Address is represented by the third instruction word.
Table 101.
ECC_MONTY_MUL bit encoding
W#   Bit encoding
1    0101 1011 1xxx xxxx cccc cccc cccc cccc
2    (32 bit Source Address)
3    (32 bit Destination Address)
Bits 15 to 0 in the first instruction word (cccc in the above table) represent the Count in bytes of the input data structure.
Count must be a multiple of 4 bytes, and the Source and Destination Addresses must be 32-bit aligned; otherwise the PKA Channel will go into error state.
The input data structure with the secret data to pass is:
Table 102. Input data structure
Name     Size in words   Description
op_len   1               Length of the operands in bits
mod      W               Modulus of the finite field
a_sign   1               Sign of the a parameter
a        1               Parameter of the elliptic curve
Px       W               x-coordinate of the base point
Py       W               y-coordinate of the base point
k_len    1               Length of the scalar k
k        E               Scalar k
The resulting data structure is:
Table 103. Resulting data structure
Name   Size in words   Description
kPx    W               x-coordinate of the result of the scalar multiplication
kPy    W               y-coordinate of the result of the scalar multiplication

13.3.8 RNG channel instruction set
The RNG Channel executes the GET_VAL instruction. Instructions that do not conform to the following bit encoding or to the generic flow type instructions are unknown to the RNG Channel, which will go into error state.

GET_VAL instruction
The RNG channel can fill a memory area with generated random values, starting from a passed destination pointer. The size of the values to generate is passed in the instruction.
The GET_VAL instruction is 2 words long. This instruction is used to set the Destination Address Register of the RNG Channel (value of RNG_DSTR register).
The size of the values to generate is encoded in the first instruction word, while the second word represents the Destination Address, where the generated values have to be stored.
Table 104.
GET_VAL instruction
W#   Bit encoding
1    0110 010x xxxx xxxx cccc cccc cccc cccc
2    (32 bit Destination Address)
Bits 27 to 26 represent the number of additional words to pass with the instruction. Since there is only one additional word for the destination pointer, they must be "01" to obtain valid random numbers.
Bit 25 represents the GET_VAL instruction and has to be 0.
Bits 24 to 16 are unused and should be zero.
Bits 15 to 0 in the first instruction word (cccc in the above table) represent the size in bytes of the generated output values.
The size must be a multiple of 4 bytes and the Destination Address must be 32-bit aligned; otherwise the RNG Channel will go into error state.
The RNG channel can fill up to 64 Kbytes with a single instruction.

14 Temperature sensor (THSENS)
This chapter focuses on THSENS functionality and operation.
For the THSENS feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU
For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

14.1 Overview
The THSENS block is an embedded sensor for junction temperature monitoring.
Figure 67. THSENS block interface
[Figure residue. Sensor cell signals: RSTN, CLK, PDN, DCORRECT, DATA, DATAREADY, OVERFLOW. THSENS wrapper signals: pclk_i, presetn_i, hi_thresh_i, lo_thresh_i, int_hi_thresh_o, int_lo_thresh_o.]

14.2 Pins
For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

14.3 Clocks
The block receives the APB clock (wrapper logic) and a slower clock for the embedded THSENS_065LP library cell.
See also: Chapter 5: Reset and clock generator (RCG).
14.4 Interrupts

The THSENS block generates the following interrupts:
● The “hi” interrupt line is set to ‘1’ when the measured temperature is higher than the “hi” threshold value.
● The “lo” interrupt line is set to ‘1’ when the measured temperature is lower than the “lo” threshold value.

See also: Appendix A: Interrupts.

14.5 Functional description

The block provides access to the THSENS_065LP library cell for temperature measurement inside the SPEAr1340 embedded MPU. The output temperature value is provided on the DATA output, which is registered on PCLK and latched on a version of DATAREADY resynchronized to PCLK, because the THSENS_065LP cell runs on a slow clock.

Block inputs and outputs are accessed through a dedicated MISC register, THSENS_CFG (see the MISC chapter in RM0089, Reference manual, SPEAr1340 address map and registers).

The temperature measurement range is from 20 degrees to 125 degrees Celsius. The typical value to be set on the DCORRECT input is 10.

In addition to simple temperature measurement, the wrapper can generate two interrupts based on the comparison of the temperature read by the sensor with two threshold values fed as inputs to the wrapper. See Section 14.4: Interrupts for the description of these interrupts.

14.5.1 Low power modes

To reduce power consumption, use the PDN input pin to put the THSENS block in power-down mode.

15 Multiport DDR2/3 controller (MPMC)

This chapter focuses on MPMC functionality and operation.
For the MPMC feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU

For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

15.1 Overview

MPMC is a high-performance multichannel memory controller that supports DDR2 and DDR3 double data rate memory devices. The multiport architecture ensures that memory is shared efficiently among different high-bandwidth client modules.

15.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

15.3 Clocks

See also: Chapter 5: Reset and clock generator (RCG).

The memory controller uses two different clock sources from two different PLLs present in the system.

The first clock source, used for the six AXI data ports (axiY_ACLK) and the AHB register port (regHCLK), is the same one used in the system for the AMBA interconnect. Its frequency is fixed at 200 MHz. You can gate this clock through the mpmc_amba_clken parameter, bit 0 of the PERIP2_CLK_ENB register.

The second clock source is used by the memory controller, the physical (PHY) interface, the MIM structure and the memory. The ratio between the memory controller frequency and the PHY interface frequency is fixed at 1:2. The translation between the interconnect frequency and the controller frequency is accomplished by the FIFOs present for each AXI data port. The maximum frequency for the PHY interface and memory interface is 533 MHz. The memory interface has a dedicated clean clock source (I_DDRPHY_clk_ref). Its frequency can be programmed through the miscellaneous registers PLL4_FREQ and PLL4_CTR. The MIM is used to translate the controller frequency into the PHY frequency.

Figure 68 shows the relationship between the memory controller clocks.
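As a rough illustration of the fixed 1:2 ratio and the 533 MHz ceiling described above, the following Python sketch derives the controller clock from a chosen PHY clock and toggles the mpmc_amba_clken gate in a register image. The function names, and the idea of operating on a plain integer copy of PERIP2_CLK_ENB, are assumptions made for illustration only; the actual register address and access method are documented in RM0089.

```python
# Illustrative model only; the names below are hypothetical, not an ST API.
MAX_PHY_CLK_MHZ = 533      # maximum PHY/memory interface frequency (from the text)
MPMC_AMBA_CLKEN_BIT = 0    # mpmc_amba_clken is bit 0 of PERIP2_CLK_ENB (from the text)


def controller_clk_mhz(phy_clk_mhz):
    """Return the controller clock for a PHY clock, using the fixed 1:2 ratio."""
    if not 0 < phy_clk_mhz <= MAX_PHY_CLK_MHZ:
        raise ValueError("PHY clock must be greater than 0 and at most 533 MHz")
    return phy_clk_mhz / 2


def set_mpmc_amba_clken(perip2_clk_enb, enable):
    """Return a copy of a PERIP2_CLK_ENB value with bit 0 set or cleared."""
    mask = 1 << MPMC_AMBA_CLKEN_BIT
    return (perip2_clk_enb | mask) if enable else (perip2_clk_enb & ~mask)
```

For a 533 MHz PHY clock the sketch yields 266.5 MHz for the controller clock. The second helper only computes the new register value; writing it back to the device is left to the caller.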
Figure 68. MPMC clocks scheme
[Figure labels: Interconnect, PLL1, axiY_ACLK/regHCLK = 166 MHz; Controller core, PLL4, i_MPMC_clk = 266 MHz; axiY_ACLK/i_MPMC_clk = async; MIM (DFI to DFI bridge), i_MPMC_clk/i_DDRPHY_clk = sync /2; PHY, i_DDRPHY_clk = 533 MHz; i_DDRPHY_clk_ref (clean) = 533 MHz]

15.3.1 Changing the input clock frequency

The operating frequency of the memory controller depends on an ASIC-level input clock. There are situations in which you may wish to modify the frequency of the clock without resetting the memory controller. To change the clock frequency at which the memory controller operates, the memory controller must stop processing requests, the clock must be adjusted, the memory controller timing parameters must be reprogrammed, and then the memory controller can be restarted.

The procedure to change the clock frequency is as follows:

1. Ensure that the memory controller is idle, that is, that the controller_busy signal is low. You can check the status of the controller_busy signal through the miscellaneous register MPMC_CTR_STS.
2. Put the memory devices into self-refresh mode by setting the srefresh parameter to 'b1. Do not use any other means than this parameter. To check whether the devices have entered self-refresh mode, check the cke_status signal (register MPMC_CTRL_REG_129).
3. Stop the memory controller by writing 'b0 to the start parameter.
4. The clock frequency may now be changed. Once the clock frequency has stabilized, program the parameters with the updated values. Review any other parameters that may be affected by the frequency change, such as caslat, caslat_lin, caslat_lin_gate, any of the timing parameters, and so forth, and modify them as necessary.
5. After updating all parameters, restart the memory controller by writing 'b1 to the start parameter. This forces the DLL to lock to the new frequency.
6. Once the DLL has locked and the PHY has initialized, the memory controller input signal dfi_init_complete is asserted (check that bit 8 of the int_status parameter is set). At this point, you can bring the memory devices out of self-refresh by clearing the srefresh parameter to 'b0. You do not need to wait before sending commands to the memory controller after clearing the srefresh parameter; the memory controller will allow for the self-refresh exit time before processing memory commands.
7. If any of the memory mode registers require updating at this point, you must set the values in the EMRS parameters, and then write them to the memory devices by setting the write_modereg parameter to 'b1.

15.4 Interrupts

The memory controller has the following main interrupt line connected to the processor general interrupt circuit (refer to Chapter 2: CPU subsystem (A9SM)):
● The MPMC controller_int signal is connected to the interrupt line ID[92]

The controller_int signal is a level-sensitive signal that is asserted when the memory controller detects an interrupt condition.

15.5 Resets

There are two sets of reset logic within the multiport memory controller: the reset for the core/PHY and the reset for the AMBA ports.

The reset signal for the core/PHY is the asynchronous active-low reset (resetn_MPMC_ctrl/resetn_MPMC_phy) that resets all critical flip-flops in the system to ensure that the core emerges from reset in a known state. When the core is reset, all parameters are reset, and any commands within the core are lost. The core reset does not reset the AMBA ports, but resetting the core without resetting the AMBA ports will generate unknown behavior.

The AMBA port reset is the active-low asynchronous signal (aresetn_MPMC/hresetn_MPMC). When this reset is asserted, the associated AMBA port resets, the port FIFOs clear, and the pointers reset.
To prevent corruption within the memory controller, reset AXI ports only at initialization, or while a port is idle at the interface with no commands within the core.

It is not required that both resets be activated simultaneously, but it is required that both resets be asserted concurrently for at least five cycles. First remove the core reset, then reset the port.

15.6 Functional description

The multiport memory controller was designed for high memory bandwidth utilization and efficient arbitration for high priority requests. The architecture of the multiport system is shown in Figure 69 and consists of the following:
● 6 AXI interfaces
● Arbiter
● Command queue with placement logic
● Write data queue
● Read data queue
● DRAM command processing
● Register port with an AHB interface

Figure 69. Multiport memory controller architecture
[Figure blocks: AXI bus, AXI interfaces, write data interfaces, read data interfaces, arbiter, MPMC core (command queue with placement logic, write data queue, read data queue, DRAM command processing), PHY interface, AHB bus, register port, programmable register settings]

The interface blocks contain FIFOs for commands, read data and write data, and handle any clock domain crossings, if required. From the port interface blocks, commands are processed through an arbiter which feeds single commands to the command queue of the memory controller core. Write and read data are routed directly to the write and read data queues of the memory controller core independently of the arbiter. Each port has a distinct write data interface to the write data queue of the memory controller core. However, for read data, all ports share a single read data interface back to the port interface blocks.

15.6.1 AXI interface

The MPMC has 6 AXI data ports that function as AXI slaves to external AXI masters. Transfers are burst-based with variable byte counts. The transfer types INCR and WRAP are fully supported.
FIXED burst types are not supported.

Table 105. AXI transfer type limitations

Name: axiY_ARBURST / axiY_AWBURST
Description: AXI port Y read/write command burst type.
– 'b00 = Reserved (FIXED is not supported)
– 'b01 = INCR
– 'b10 = WRAP
– 'b11 = Reserved

For the ports connected to the AXI bus directly, the thread ID signals (axiY_ARID or axiY_AWID) identify which of the 16 thread IDs is associated with the command. This thread ID is combined with the originating port to create a source ID which is used in the core to maintain originator information. There are no restrictions on the mapping of thread IDs to AXI bus masters.

The AXI interfaces handle all communication between the AXI bus and the core. Each port always supports full-size transfers, where the full data port width is utilized on each beat. In addition, each port can be independently programmed to support narrow transfers, where the bytes-per-beat size is less than the port data width. There are no fixed timing requirements on a port between the traffic channels when the narrow transfer option is disabled, and in this case, write data may arrive before, with, or after the write command. When the narrow transfer option is enabled, a port does not accept write data until it has received the command and is aware of the total byte count associated with that command.

AXI transaction management

For optimization of the core, read commands from different thread IDs on a port, or read commands from different ports, may be automatically re-arranged in the core to execute out of order. When commands from different thread IDs are re-ordered, read data returned to the AXI port interfaces will also be out of order and may be interleaved. To avoid reordering within a port, the AXI bus master should use one thread ID for all commands from any port.
Note that read commands from the same thread ID on the same port will always execute in the same order as they were accepted into their port.

Write commands have more ordering restrictions. Write commands from different ports may be re-ordered, but write commands from the same port, from the same or different thread ID, must remain in order. Write data may not be interleaved. While the AXI interface does support multiple outstanding write instructions, the write data is expected to arrive in order.

Because the read and write channels are distinct, read and write commands from different thread IDs on a port, from the same thread ID on a port, or from different ports, may be re-ordered. These commands will be automatically re-arranged for optimal command execution, as long as there are no collisions between the commands.

An incoming AXI transaction is mapped into a core-level transaction, then synchronized from the AXI clock domain to the core clock domain and stored in the AXI port FIFOs. Each instruction consists of an address, size, length and thread ID. Because a port may utilize multiple thread IDs, the source ID that is used in the core is a combination of both the port and thread information. This concatenation occurs in the arbiter and this source ID is used in the placement logic. From the AXI FIFOs, the transaction is presented to the arbiter, which arbitrates requests from all ports and forwards a single transaction to the core.

AXI port configuration options

Each AXI port in the memory controller has been defined for the requirements of the intended system. The configuration options are:
● Datapath width: Each port has a data interface width of 64 bits.
● Width of the ID: Each port is configured with a thread ID of 4 bits.
● Priority definition: Command priority is defined based on the port and the command type.
For each port Y, there is an axiY_r_priority parameter which defines priorities for all read commands and an axiY_w_priority parameter which defines priorities for all write commands. Supported priority values range from 0 to 7, with 0 as the highest priority.
● Register port: An asynchronous AHB port is used to write the registers.
● Buffering: Each data port contains a command, a read and a write FIFO, and a response storage array. In addition, each programmable port contains an asynchronous response FIFO to synchronize the memory response to the port time domain when operating asynchronously. The depth of each buffer in each port is listed in Table 106.

Table 106. Configured AXI settings

Number port                          0-6
Port data width                      64
Command FIFO depth                   8
Write FIFO depth                     8
Read FIFO depth                      8
Write response FIFO depth            8
Write response storage array depth   8

● Exclusive access buffer depth: Exclusive access is an optional AXI feature that is only supported by the memory controller. This type of access will only be used if exclusive access commands are issued to the memory controller by driving the axiY_ARLOCK signal to 'b10 with a read command. Each port of this memory controller contains 1 exclusive buffer and therefore each port may monitor the exclusivity of up to 1 transaction at any time. Refer to Section 15.6.3: Initialization protocol for more information.
● Locked access: There may be an occasion where a particular user wishes to have access to the memory without interruption from other ports. The AXI locked access option allows this functionality. The process is completely controlled by the user through the types of commands sent to the memory controller.
● Error detection: When an illegal operational condition is detected on a new AXI transaction entering the port, the port responds through an AXI error signal and the controller interrupt signal, and the error signature is recorded in the register space.
The AXI error signal flagged depends on the type of transaction that caused the error (read or write). The controller interrupt and the signature information depend on the type of error (command or data).

AXI port FIFOs

Incoming transactions from the AXI interfaces are processed by the interface logic and mapped into equivalent transactions on the core bus. These transactions are queued into each port’s command FIFO.

Each programmable port contains four FIFOs: command, read data, write data and response synchronization. The response synchronization FIFOs are only used when operating in asynchronous mode. In addition to the FIFOs, each port contains a storage array to hold the read and write responses. The five channels of traffic and their relationship to the port FIFOs are shown in Figure 70.

Figure 70. AXI interface blocks
[Figure: for each port 0 to N, the AXI interface block contains a write response synchronization FIFO (used when asynchronous), a write response array, a write FIFO, a command FIFO fed by in-port arbitration of write and read commands, and a read FIFO. In the core, the ports feed the arbiter and the command queue with placement logic, the write data queue (N channels) and the read data queue (1 channel for all ports).]

● Command FIFO: The command FIFO accepts a single command from the in-port arbitration logic and holds the following command information: address, command type, encoded number of beats, encoded bytes-per-beat, bufferable/cacheable flag, coherent bufferable flag, thread ID, exclusive access / locked access status.
● Read FIFO: The read FIFO holds the data signals sent back from the memory controller, thread ID, last data byte and read data response. There is only one streaming read data interface out from the core for all AXI ports, regardless of the number of ports or the number of thread IDs for any port. The memory controller maps this data stream back to the proper port. With this single data interface, the AXI port must be ready to accept the read data as soon as it is available on the internal core bus to avoid stalling the memory controller. The read FIFO is also responsible for synchronizing the data to the AXI time domain.
● Write FIFO: The write FIFO holds the data to send to memory, thread ID and data mask. The purpose of the write FIFO is to allow the AXI bus to offload its write data completely before the data is transferred to the core buffers. Each port has a distinct write channel into the core write data queue. If there are multiple thread IDs for a port, they will all share the channel for that port.
● Write response interface: When a write request is accepted into the AXI interface, an entry will be created in the response storage array for that command. When a write response is ready, the array will verify that this is the oldest command for any thread ID on that port and if so, the response will be sent out. The timing of this response is dependent on the type of response requested (buffered or cached) and the contents of the command queue.
The AXI interface may not issue write responses until responses for all the older write commands have been issued; for cacheable responses, the entire system may be held off waiting for that response. This indication will be returned to the AXI master through the signal axiY_BRESP and its associated valid indicator, the axiY_BVALID signal.

Different masters may require this response at different stages of the write command. A master that needs to quickly release the bus would optimally receive the completion response as soon as the port has accepted the write command and all of the corresponding data. Another master may wish to wait until the data has been accepted into the memory controller core, or successfully written to memory.

Each data port is configured with two signals that work together to determine when an instruction is considered complete and the write completion response (axiY_BRESP) will be returned to the master. These signals are axiY_AWCACHE[0] and axiY_AWCOBUF (axiY_AWCOBUF with Y=5…0 is controllable by means of the miscellaneous register bits MPMC_CTR_STS[20:15]). Table 107 details the relationship between axiY_AWCACHE and axiY_AWCOBUF.

Table 107. Write response signals

axiY_AWCACHE[3:0] | axiY_AWCOBUF | Response information(1)
'b0000 | Irrelevant | Non-bufferable write command. Response will be ready when the write data has been committed to memory.
'b0001 | 0 | Standard bufferable write command. Response will be ready when the command and all associated data have been received by the AXI data port. There is no guarantee of data coherency across all AXI ports.
'b0001 | 1 | Coherent bufferable write command. Response will be ready when the command has been accepted by the command queue in the memory controller core. This guarantees data coherency across all ports, but reduces the overall write response latency relative to the non-bufferable option.
'bxxx- | – | All other settings are reserved.

1. The response will only be sent if all of the older write responses have been issued for any thread ID on that port.

Treat the axiY_AWCOBUF signals as any other write command control signal. If the system cannot generate these signals on a per-command basis, it is recommended that these signals be tied high or low.

15.6.2 AHB interface

The register interface is an independent AHB port to the memory controller. This port converts the AHB register addresses to core register addresses. This port operates asynchronously and contains a 4-deep asynchronous FIFO.

The register port only supports the AHB SINGLE burst type and the NONSEQ and IDLE transfer types. This port supports accesses with a byte-per-beat equal to or less than the width of the AHB register bus. There is no support for INCR or WRAP burst types, or for SEQ or BUSY transfer types.

Table 108. AHB transfer type limitations

Name | Description
regHBURST | AHB register burst size.
  – 'b000 = Single beat (SINGLE)
  – All other settings are reserved
regHTRANS | AHB register transaction type indicator.
  – 'b00 = Idle
  – 'b01 = Reserved (busy is not supported)
  – 'b10 = Non-sequential
  – 'b11 = Reserved (sequential is not supported)
regHRESP | AHB register transfer response. Only “Okay” and “Error” are supported for AHB.
  – 'b00 = OKAY
  – 'b01 = ERROR
  – 'b10 = Reserved (RETRY is not supported)
  – 'b11 = Reserved (SPLIT is not supported)

All parameters related to the AHB port operation are located in the core register map. These parameters are programmed during the initialization sequence along with all of the other device parameters.
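The AHB register port restrictions listed in Table 108 can be summarized as a small checker. The following Python sketch is an illustrative model, not part of any ST software; the function names and the bus_width_bytes argument are hypothetical, and the AHB register bus width must be supplied by the caller because it is not stated in this section.

```python
# Illustrative checker only; function and constant names are hypothetical.
HBURST_SINGLE = 0b000      # only supported regHBURST encoding
HTRANS_IDLE = 0b00         # supported regHTRANS encodings
HTRANS_NONSEQ = 0b10
HRESP_NAMES = {0b00: "OKAY", 0b01: "ERROR"}


def ahb_register_access_supported(hburst, htrans, bytes_per_beat, bus_width_bytes):
    """Return True if the register port supports this transfer per Table 108."""
    return (
        hburst == HBURST_SINGLE
        and htrans in (HTRANS_IDLE, HTRANS_NONSEQ)
        and 0 < bytes_per_beat <= bus_width_bytes
    )


def decode_hresp(hresp):
    """Map a regHRESP value to its name; RETRY and SPLIT encodings are reserved."""
    return HRESP_NAMES.get(hresp, "Reserved")
```

A bus master model could call ahb_register_access_supported() before issuing a register access, so that unsupported burst or transfer types are caught in simulation rather than on the port.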
A typical boot-up sequence includes a reset of the AXI ports as well as the core, followed by programming of the core through the AHB register port.

15.6.3 Initialization protocol

For correct operation, the memory controller requires a specific sequence after all power to the system and to the memory devices is stable. The memory controller does not include circuitry to control the activation of power and ground to the system. Once the power to the memory devices and the system is stable, the memory controller must be initialized, and it will then automatically initialize the memory devices.

Use the following procedure to initialize the memory controller:

1. Clear the resetn_MPMC_ctrl signal by driving it to 'b0. All programmable registers are cleared.
2. Set the resetn_MPMC_ctrl signal synchronously with the memory controller clock by driving the signal to 'b1.
3. Issue write register commands to configure the DRAM protocols. Keep the start parameter de-asserted during this initialization step.
4. Assert the start parameter. This triggers the memory controller to execute the initialization sequence using the parameters written into the registers. The memory controller waits for the PHY to assert the dfi_init_complete signal (bit 8 of int_status parameter), which indicates that the PHY and the memory devices are ready to accept commands.

15.6.4 Exclusive access

The exclusive access feature allows a master to monitor if a memory area has been altered since its last read. Exclusive access does not imply that the memory area is locked; other thread IDs of that port, or other ports, may access the area for reads or writes even though an exclusive access exists. If any writes occur to a memory area with a valid exclusive access request, the master will lose exclusivity and be informed of this status when it attempts to write to the area again.
A loss of exclusivity does not trigger an interrupt or any error conditions; however, the AXI protocol requires that the write data is not written to memory if an exclusive write fails its exclusivity check. The master that has lost exclusivity must determine whether to restart the sequence by requesting another exclusive read or to write the data to the memory regardless via a non-exclusive write.

15.6.5 Error responses

● AXI error response: When an illegal operational condition is detected on a new AXI transaction entering the port, the port responds with an error condition. Instructions that generate AXI errors result in unpredictable behavior, and may cause memory corruption and/or hang conditions. When the programmable ports are programmed to asynchronous mode, the error signature is serialized and sent to the memory controller as a single data stream. This eliminates the need for an asynchronous FIFO to capture error information, but adds a delay to interrupt generation that does not impact the timing of the responses.
● Write errors: Error responses on a write operation are sent on the write response channel through the axiY_BRESP bus. A single response is sent for each write command. Write error responses are generated if the decoded bytes-per-beat is less than the port data width when the narrow transfer option is not selected. This is an error for write commands only when the axiY_AWLEN is greater than 0 (Command Error).
● Read errors: For read commands, the error is sent on the read data channel through the axiY_RRESP bus. The response is sent along with each data word. Read error responses are generated if a double-bit ECC error is detected on a read, and reporting is enabled in the ctrl_raw parameter. In addition to the controller interrupt and controller-level status signals, double-bit ECC errors on read commands also trigger an AXI response. For default transfers, the error is sent with the beats that caused the error. If the error was associated with a narrow transfer, the error is sent with each beat of the erroneous data word.
● AXI error reporting: If an AXI command error occurs, a bit will be set in the int_status parameter and the address and source ID of the command are saved in the port_cmd_error_addr and port_cmd_error_id parameters, respectively. In addition, the access type or types that relate to the error are stored in the port_cmd_error_type parameter. Similarly, when a data error occurs, the source ID of the command is saved in the port_data_error_id parameter. The access type or types that relate to the error are stored in the port_data_error_type parameter. The bits in the error type parameters are not exclusive. Multiple bits may be set to indicate the type of errors that occurred. Reading the int_ack parameter allows future errors to be captured in these error parameters. Read these parameters if the axiY_BRESP or axiY_RRESP signals are set. If multiple errors occur prior to an acknowledgment of the first error, the parameters still represent the first error attributes. Other error signatures are lost. If multiple errors occur simultaneously on different ports, the error information represents the lowest numbered erring port.

Single-bit and double-bit ECC errors are also reported in the int_status parameter and the error signature parameters as detailed in Section 15.6.5: Error responses.

15.6.6 Multiport arbiter

The arbiter is responsible for arbitrating requests from the ports and sending requests to the memory controller core. This memory controller supports the weighted round-robin arbitration scheme, which is based on a three-step arbitration system. All commands are routed into priority groups based on the priority of the requests. Then, within each priority group, requests are serviced according to the “weight” (relative priority) of each port.
Finally, each priority group presents a single command to the priority select module, which passes the highest priority command on to the memory controller core.

This arbitration scheme also supports two additional features. First, for situations where the priority and the relative priority for multiple commands are identical, a port ordering system is included whereby the user may adjust the order in which the ports are considered. Secondly, for situations where two ports may be related, a mechanism is included which allows a pair of ports to share arbitration bandwidth for bandwidth efficiency.

Round-robin operation

Round-robin operation is the simplest form of arbitration and is ideal for systems that do not require requests to be treated preferentially to maintain bandwidth or minimize latency. This scheme uses a counter that rotates through the port numbers, incrementing every time a port request is granted. If the port that the counter is referencing has an active request, and the memory controller core command queue is not full, then this request will be sent to the memory controller core. If there is not an active request for that port, then the port will be skipped and the next port will be checked. The counter will increment by one whenever any request has been processed, regardless of which port’s request was arbitrated.

Round-robin arbitration ensures that each port’s requests can be successfully arbitrated into the memory controller core every N cycles, where N is the number of ports in the memory controller. No port will ever be locked out, and any port can have its requests serviced on every cycle as long as all other ports are quiet and the command queue is not full.

Port priority

For AXI ports, the priority is associated with a port, and each port has a separate priority parameter for reads and writes.
These values are stored into the programmable parameters axiY_r_priority and axiY_w_priority (where Y represents the port number) at controller initialization. Internally, the ports are organized into priority groups based on their priority setting. The priority value is also used by the placement logic inside the memory controller core when filling the command queue. A priority value of 0 is the highest priority, and a priority value of (decimal) 7 is the lowest priority in the memory controller.

Note: You can program priority level 0, but it is better to reserve this priority value so that the placement queue can elevate commands to this level through aging.

Relative priority

Inside each priority group, the relative priority is used to determine arbitration. The memory controller contains 8 identical priority groups with logic that selects between the requests from all commands at that priority level.

The relative priority parameters axiY_priorityZ_relative_priority (where Y is the port number and Z is the priority group) “weight” the ports for each level and determine how the priority group will be arbitrated. Figure 71 shows this type of arbitration system. By using the relative priority concept, the arbitration is skewed in favor of certain ports based on user programming.

Note: The relative priority parameters have a minimum acceptable value of 1 to prevent port lockout. A 0 value will cause an error condition.

Figure 71. Weighted round-robin priority group structure
[Figure: commands from ports 0-5 pass through priority sorting into priority groups 0 to 7. Priority group Z receives the priority Z commands and is weighted by axi0_priorityZ_relative_priority ... axi5_priorityZ_relative_priority from the programmable register settings. The priority select module forwards the selected command to the command queue of the MC core.]

If the relative priorities are all programmed to the same value within any priority group, then the arbitration will mimic a version of the simple round-robin scheme within that priority group. Instead of incrementing whenever any request is processed, the simple round-robin counter will only increment to the next port after the number of requests given by the axiY_priorityZ_relative_priority parameter has been processed.

Each port Y at priority level Z will be allocated a share equal to the ratio of that port’s relative priority parameter (axiY_priorityZ_relative_priority) to the sum of the relative priority values of all requesting ports. If a particular port is not requesting, then it is not included in the sum calculation, which means that the arbitration will be split in relative proportions among the requesting ports.

As an example, consider a system with 4 ports where all requests are at priority 0. This system is described in Table 109.

Table 109. Relative priority example

Parameter                          System A
axi0_priority0_relative_priority   1
axi1_priority0_relative_priority   2
axi2_priority0_relative_priority   3
axi3_priority0_relative_priority   4

For this system, port 0 will be serviced 1/(1+2+3+4) = 1/10 of the time and port 3 will be serviced 4/(1+2+3+4) = 4/10 of the time. However, if port 2 is not actively requesting, then port 0 will be serviced 1/(1+2+4) = 1/7 of the time and port 3 will be serviced 4/(1+2+4) = 4/7 of the time.

To ensure that relative priorities are maintained, there is a weight counter for each port within each priority group.
These counters track the number of transactions accepted for that port in that priority group. When any counter value reaches the programmed relative port priority, the scan order for that priority group will be internally modified. The port that has met its relative priority will be dynamically positioned to the bottom of the scan order (and its counter will be reset), allowing other ports a preferential position.

Note: For ports that are not expected to issue requests at a certain priority level, program the associated relative priority parameter to 0x1. This allows for minimum allocation without the risk of lockout in case a command appears.

Port ordering

With simple round-robin arbitration, the ports are scanned based on their port number in incrementing order in the system. Assuming that the command queue is not full, the port referenced by the counter is examined for valid incoming transactions. If there is an active request, it will be accepted. Otherwise, the next port in the scan order will be checked, and its request accepted.

For the memory controller with weighted round-robin arbitration, the user has the option of adjusting the order in which the ports are scanned. This is useful if requests from certain ports are more critical, or if a specific order may reduce contention between ports. The three-bit axiY_port_ordering parameters are used to set this new scan order. A value of ’b000 gives the highest listing in the scan order, and a value of ’b111 is the lowest listing in the scan order.

If the 6 axiY_port_ordering parameters are programmed with unique values, then the scan order will be modified to proceed sequentially in this new order. If any of the port ordering parameters share the same value, then those ports will still be equal in the arbitration test. In this case, the port number will select between these ports, with the lower-numbered port automatically being selected first.
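
The weight counters and scan order described above can be modeled in a few lines. The following Python sketch is illustrative only: the function names `expected_share` and `simulate` are hypothetical, a single priority group is modeled, the command queue is assumed never to be full, and the counter is reset in the same cycle in which it reaches the programmed weight.

```python
from fractions import Fraction


def expected_share(weights, requesting):
    """Share of grants per requesting port: weight / sum of requesting weights."""
    total = sum(weights[p] for p in requesting)
    return {p: Fraction(weights[p], total) for p in requesting}


def simulate(weights, port_ordering, request_pattern):
    """Model one priority group; return (winner per cycle, final scan order).

    weights[p]       -- assumed value of axiP_priorityZ_relative_priority
    port_ordering[p] -- assumed value of axiP_port_ordering
    request_pattern  -- one set of requesting port numbers per cycle
    """
    if any(w < 1 for w in weights):
        raise ValueError("relative priority must be at least 1")
    # Initial scan order: lowest port_ordering value first, ties by port number.
    scan_order = sorted(range(len(weights)), key=lambda p: (port_ordering[p], p))
    counters = [0] * len(weights)
    winners = []
    for requesting in request_pattern:
        # The first requesting port in the scan order wins the cycle.
        winner = next((p for p in scan_order if p in requesting), None)
        winners.append(winner)
        if winner is None:
            continue
        counters[winner] += 1
        if counters[winner] == weights[winner]:
            # Weight used up: reset the counter and move the port to the bottom.
            counters[winner] = 0
            scan_order.remove(winner)
            scan_order.append(winner)
    return winners, scan_order
```

With weights 4, 3, 2, 1 and all four ports requesting continuously, ten cycles of this model grant the ports in the ratio 4:3:2:1 and return the scan order to its starting position.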
Weighted round-robin arbitration summary

The memory controller weighted round-robin arbitration system combines the concepts of round-robin operation, priority, relative priority and port ordering. The incoming commands are separated into priority groups based on the priority of the associated port for that type of command. Within each priority group, the relative priority values are examined to determine the arbitration winner. If the relative priority values are identical and no individual command can be selected, then the scan order is used to select between the requests. In the end, the highest priority command, from the highest relative priority port, with the highest location in the scan order will be selected and sent to the memory controller core.

As an example, consider the system described in Table 110. The counters refer to the counters that exist for each port within each priority group to ensure that relative priorities are maintained. For simplification, the command queue is considered to never be full and commands are only received at priority level 0.

The behavior is shown in Table 111. The highest port in the scan order that is requesting always wins arbitration, and the scan order is dynamically modified when any port counter reaches its allocated relative priority value. Note that if the command queue were considered, then cycles where the command queue was full would not have any arbitration winner and therefore the counter values and scan order would not change on that cycle.

Table 110. System D specifications

Parameter                           Port 0   Port 1   Port 2   Port 3
axiY_priority0_relative_priority    4        3        2        1
axiY_port_ordering                  0        1        2        3

Table 111. System D operation

Initial scan order: P0-P1-P2-P3

Cycle   Ports requesting   Arbitration   Next counter    Next scan order
        (of P0-P3)         winner        P0 P1 P2 P3
0       Y Y                P0            1  0  0  0      P0-P1-P2-P3
1       Y Y Y              P0            2  0  0  0      P0-P1-P2-P3
2       Y Y Y Y            P0            3  0  0  0      P0-P1-P2-P3
3       Y Y Y Y            P0            4  0  0  0      P1-P2-P3-P0
4       Y Y Y Y            P1            0  1  0  0      P1-P2-P3-P0
5       Y Y Y Y            P1            0  2  0  0      P1-P2-P3-P0
6       Y Y Y Y            P1            0  3  0  0      P2-P3-P0-P1
7       Y Y Y              P2            0  0  1  0      P2-P3-P0-P1
8       Y Y Y              P2            0  0  2  0      P3-P0-P1-P2
9       Y Y                P3            0  0  0  1      P0-P1-P2-P3
10      Y Y Y              P0            1  0  0  0      P0-P1-P2-P3
11      Y Y                P2            1  0  1  0      P0-P1-P2-P3
12      Y Y                P2            1  0  2  0      P0-P1-P3-P2

Priority relaxing

A lower priority level will not win arbitration in weighted round-robin arbitration unless there are no higher priority requests. This could mean that, in a situation where high priority requests are being received continuously, lower priority requests could be locked out indefinitely. To avoid this scenario and control the arbitration latency for lower-priority commands, it is possible to disable priority groups temporarily. This is known as priority relaxing, and it is a time-controlled function. Each higher priority group will be temporarily disabled when the pre-set counter value for the lower priority group has been reached and a request is waiting.

The axiY_priority_relax parameters set the counter value for port Y at which the priority relax condition will be triggered. The timing counters inside each port are controlled by the weighted_round_robin_latency_control parameter.

When the latency control bit is set to ’b1, the timing counters are free-running. Any timing counter may hit its axiY_priority_relax value at any point. When this occurs, higher-priority groups are disabled to allow a waiting request for this port to be processed. This results in a random latency for each port, but the maximum latency is fixed at the axiY_priority_relax value.
If the current port does not have any commands waiting when the timing counter hits the relax value, then the counter will be reset and the arbiter will function normally.

When the weighted_round_robin_latency_control parameter is cleared to ’b0, the timing counters only count while that port has a waiting request that is not being processed. In this case, when the port’s axiY_priority_relax parameter value is reached, all priority groups at priority levels higher than the waiting request are disabled. This port’s command is granted arbitration and is moved through to the memory controller core.

Because the priority relax parameters and counters are associated with individual ports, it is possible that multiple priority relax counters could reach their specified value simultaneously. In this case, the lower priority command will be arbitrated first and then the higher priority command. This situation could alter the arbitration latency slightly, causing it to be longer than the expected value in the priority relax parameter.

Port pairing

The memory controller arbiter incorporates a feature which allows adjacent ports to be grouped together and considered jointly for arbitration. The weighted_round_robin_weight_sharing parameter controls this function, with one bit per pair of ports in the memory controller. Bit 0 controls ports 0 and 1, bit 1 controls ports 2 and 3, and so on.

Because the ports are grouped together, their relative priorities are not considered separately. As described under Relative priority, the general formula for port priority allocation is the ratio of that port’s relative priority parameter (axiY_priorityZ_relative_priority) to the sum of the relative priority values of all requesting ports. With port pairing, the relative priority value of only one of the paired ports is used for the sum calculation. This means that the bandwidth will be divided differently among the ports.
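
The effect of pairing on the sum calculation, and the two conditions that this document places on paired ports (equal relative priority values and sequential port order), can be sketched as follows. This is a hedged illustration, not the controller's implementation: the function names are hypothetical, "sequential" is assumed to mean port ordering values that differ by exactly 1, and a pair is assumed to contribute its weight once whenever at least one of its ports is requesting.

```python
def paired_weight_sum(weights, requesting, sharing_mask):
    """Sum of relative priorities of requesting ports, counting each pair once.

    sharing_mask models weighted_round_robin_weight_sharing:
    bit 0 pairs ports 0 and 1, bit 1 pairs ports 2 and 3, and so on.
    """
    total = 0
    seen_pairs = set()
    for port in sorted(requesting):
        pair = port // 2
        if (sharing_mask >> pair) & 1:
            if pair in seen_pairs:
                continue  # the partner port was already counted for this pair
            seen_pairs.add(pair)
        total += weights[port]
    return total


def pairing_errors(weights, port_ordering, sharing_mask):
    """List (pair, reason) tuples for paired ports that break the stated rules."""
    errors = []
    for pair in range(len(weights) // 2):
        if not (sharing_mask >> pair) & 1:
            continue
        a, b = 2 * pair, 2 * pair + 1
        if weights[a] != weights[b]:
            errors.append((pair, "relative priority values differ"))
        # Assumption: sequential means the ordering values differ by exactly 1.
        if abs(port_ordering[a] - port_ordering[b]) != 1:
            errors.append((pair, "port order not sequential"))
    return errors
```

For example, with weights 2, 2, 3, 4, 1, 1 and all six ports requesting, the unpaired sum is 13, while pairing ports 0 and 1 reduces it to 11, so every other port receives a larger share.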
Note: For port weight sharing to be used, the relative priority parameters for the port pair must be programmed to the same value, and the port order of the paired ports must be sequential. If either condition is not followed, an error bit is set to ’b1.

Error conditions

With the programming complexities of the weighted round-robin arbitration scheme, an error reporting mechanism is included to notify users of illegal programming scenarios. These error conditions generate a memory controller core interrupt and set a bit in the wrr_param_value_err parameter to ’b1 (see bits 16-19 of the Controller configuration register 47). The potential error conditions are:
● Bit 16 = The 6 axiY_port_ordering parameters do not all contain unique values.
● Bit 17 = Any of the axiY_priorityZ_relative_priority parameters has been programmed with a zero value. A 0 value leads to unknown behavior. The minimum allowable value is 1.
● Bit 18 = Any ports whose related bit of the weighted_round_robin_weight_sharing parameter is set to ’b1 do not have the same values in their axiY_priorityZ_relative_priority parameters.
● Bit 19 = For ports whose related bit of the weighted_round_robin_weight_sharing parameter is set to ’b1, the values of the axiY_port_ordering parameters are not sequential.

If bits 16, 18 or 19 are set to ’b1 in the wrr_param_value_err parameter, and any of the ports are paired in the weighted_round_robin_weight_sharing parameter, then all weight sharing data will be ignored during memory controller initialization and the ports will be prioritized by port number. If port pairing is not being used, but the bit 16 error condition is set to ’b1, then ports with a non-unique port ordering are prioritized by port number.

15.6.7 Command queue with placement logic

The memory controller core contains a command queue that accepts commands from the arbiter.
This command queue uses a placement algorithm to determine the order that commands execute in the memory controller core. The placement logic follows many rules to determine where new commands should be inserted into the queue, relative to the contents of the command queue at the time. Placement is determined by considering address collisions, source collisions, data collisions, command types and priorities. In addition, the placement logic attempts to maximize efficiency of the memory controller core through command grouping and bank splitting. Once placed into the command queue, the relative order of commands is constant.

Many of the rules used in placement may be individually enabled/disabled. In addition, the queue may be disabled by clearing the placement_en parameter, resulting in an in-line queue that services requests in the order they are received. If the placement_en parameter is cleared to ’b0, the placement algorithm will be ignored.

The rules of the placement algorithm are the following:
● Address collision/Data coherency violation: To avoid address collisions, reads or writes that access the same chip select, bank and row as a command already in the command queue will be inserted into the command queue after the original command, even if the new command is of a higher priority. This factor may be enabled/disabled through the addr_cmp_en parameter and should be disabled only if the system can guarantee coherency of reads and writes.
● Source ID collision: Each port is assigned a specific source ID that is a combination of the port and thread ID information, and identifies the source uniquely. This allows the memory controller to map data from/to the correct source/destination. In general, read commands from the same source ID will be placed in the command queue in order.
Therefore, a read command with the same source ID as a read command already in the command queue will be processed after the original read command. All write commands from a port, even with different source IDs, will be executed in order. If there are no address conflicts, a read command could be executed ahead of a write command with the same source ID, and likewise a write command could be executed ahead of a read command with the same source ID. This feature will always be enabled.
● Write buffer collision: Incoming write requests in the command queue are allocated to one of the 4 write buffers of the memory controller core automatically based on availability. New write commands will be designated to any available buffer. However, back-to-back write requests from a particular source ID will be allocated to the same write buffer as the previous command. Because the memory controller core must pull data out of the buffers in the order it was stored, if a write command is linked to a buffer that is associated with another command in the queue, then the new command will be placed in the command queue after that command, regardless of priority. This feature will always be enabled.
● Priority: The placement algorithm will attempt to place higher priority commands ahead of lower priority commands, as long as they have no source ID, write buffer or address collisions. Higher priority commands will be placed lower in the command queue if they access the same address, are from the same requestor or use the same buffer as lower priority commands already in the command queue. This feature is enabled through the priority_en parameter.
● Bank splitting: Before accesses can be made to two different rows within the same bank, the first active row must be closed (pre-charged) and the new row must be opened (activated).
Both activities require some timing overhead; therefore, for optimization, the placement queue will attempt to insert the new command into the command queue such that commands to other banks may execute during this timing overhead. The placement of the new commands will still follow priority, source ID, write buffer and address collision rules. The placement logic will also attempt to optimize the memory controller core by inserting a command to the same bank as an existing command in the command queue immediately after the original command. This reduces the overall timing overhead by potentially eliminating one pre-charging/activating cycle. This placement will only be possible if there are no priority, source ID, write buffer or address collisions or conflicts with other commands in the command queue. All bank splitting features are enabled through the bank_split_en parameter.
● Read/Write grouping: The memory suffers a small timing overhead when switching from read to write mode. For efficiency, the placement queue will attempt to place a new read command sequentially with other read commands in the command queue, or a new write command sequentially with other write commands in the command queue. Grouping will only be possible if no priority, source ID, write buffer or address collision rules are violated. This feature is enabled through the rw_same_en parameter.

Once a command has been placed in the command queue, its order relative to the other commands in the queue at that time is fixed. While this provides simplicity in the algorithm, there are drawbacks. For this reason, the memory controller offers two options that affect commands once they have been placed in the command queue:
● Command aging: Because commands can be inserted ahead of existing commands in the command queue, the situation could occur where a low priority command remains at the bottom of the queue indefinitely.
To avoid such a lockout condition, aging counters have been included in the placement logic that measure the number of cycles that each command has been waiting. If command aging is enabled through the active_aging parameter, then when an aging counter hits its maximum, the priority value of the associated command will be decremented by one (commands with lower priority values are executed first). This increases the likelihood that this command will move to the top of the command queue and be executed. Note that this command does not change its relative position in the command queue when it ages; the new priority will be considered when placing new commands into the command queue.
Aging is controlled through a master aging counter and command aging counters associated with each command in the command queue. The age_count and command_age_count parameters hold the initial values for each of these counters, respectively. When the master counter counts down the age_count value, a signal is sent to the command aging counters to decrement. When a command aging counter has completely decremented, the priority value of the associated command is decremented by one and the counter is reset. Therefore, a command does not age by a priority level until the total elapsed cycles has reached the product of the age_count and command_age_count values. The maximum number of cycles that any command can wait in the command queue until reaching the top priority level is the product of the age_count value, the command_age_count value, and the number of priority levels in the system.
● High-priority command swapping: Commands are assigned priority values to ensure that critical commands are executed more quickly in the memory controller than less important commands. Therefore, it is desirable that high-priority commands pass into the memory controller core as soon as possible.
The placement algorithm takes priority into account when determining the order of commands, but still allows a scenario in which a high-priority command sits waiting at the top of the command queue while another command, perhaps of a lower priority, is in process. The high-priority command swapping feature allows this new high-priority command to be executed more quickly. If the user has enabled the swapping function through the swap_en parameter, then the entry at the top of the command queue will be compared with the current command in progress. If the command queue’s top entry is of a higher priority (not the same priority), and it does not have an address, source ID or write buffer conflict with the current command being executed, then the original command will be interrupted.
For this memory controller, an additional check is performed before a read command is interrupted. If the read command in progress and the read command at the top of the command queue are from the same port, then the executing command will only be interrupted if the swap_port_rw_same_en parameter is set to ’b1. If this parameter is cleared to ’b0, a read command from the same port as a read command in progress, even with a higher priority and without any conflicts, would remain at the top of the command queue while the current command completes.

15.6.8 Other memory controller features

Out-of-range address checking

Because the master may attempt to write to an invalid address, all incoming addresses are always checked against the addressable physical memory space. If a transaction is addressed to an out-of-range memory location, bit 0 of the int_status parameter is set to 1’b1 to alert the user of this condition. The memory controller records the address, source ID, and the length and type of transaction that caused the out-of-range interrupt in the out_of_range_addr, out_of_range_source_id, out_of_range_length and out_of_range_type parameters.
Reading the out-of-range parameters causes the memory controller to clear them so that they can store out-of-range access information for future errors. The interrupt is acknowledged by setting bit 0 of the int_ack parameter to 1’b1, which in turn causes bit 0 of the int_status parameter to clear to 1’b0.

If a second out-of-range access occurs before the first out-of-range interrupt is acknowledged, bit 1 of the int_status parameter is set to 1’b1 to indicate that multiple out-of-range accesses occurred. If the out-of-range parameters have been read when the second out-of-range error occurs, the details of this transaction are stored in the out-of-range parameters. If they have not been read, the details of the second error are lost.

Even though the address has been identified as erroneous, the memory controller still processes the read or write transaction. A read transaction returns random data that the user must receive to avoid stalling the memory controller. A standard, non-exclusive write transaction will write the associated data to an unknown location in the memory array, potentially overwriting other stored data. A command cannot be aborted once accepted into the memory controller.

Table 112. Out of range access parameters

Parameter name                  Description
out_of_range_addr [34:0]        Transaction address
out_of_range_source_id [6:0]    Bits [6:4] = Port ID
                                Bits [3:0] = AXI Thread ID
out_of_range_length [6:0]       Total byte count of the transaction.
                                For write commands: (axiY_AWLEN + 1) x 2^axiY_AWSIZE.
                                For read commands: (axiY_ARLEN + 1) x 2^axiY_ARSIZE.
out_of_range_type [5:0]         6’b000000 = Non-exclusive write
                                6’b000001 = Non-exclusive read
                                6’b000010 = Non-exclusive masked write
                                6’b000100 = Wrapped write
                                6’b000101 = Wrapped read
                                6’b000110 = Wrapped masked write
                                6’b001000 = Exclusive write
                                6’b001001 = Exclusive read
                                6’b001010 = Exclusive masked write
                                6’b010000 = Flushed write
                                All other settings: Reserved

Self-refresh handshaking protocol

You may manually trigger the memory devices to enter self-refresh mode by setting the srefresh parameter to 1’b1 or by driving the user-interface signal srefresh_enter high. Either of these methods will cause the memory controller to complete the active processes inside the memory controller and then put the memory devices into self-refresh. The CKE input will be deasserted in this mode.

In some circumstances, you may require confirmation that the memory devices have entered self-refresh mode. For this, the srefresh_ack acknowledge signal has been implemented. This acknowledge is only available when self-refresh mode is triggered by driving the srefresh_enter pin, and only if the pin is held high until the acknowledge is received. Pulsing the srefresh_enter signal is enough to trigger entry to the self-refresh mode, but the acknowledge requires the signal to be held.

Once asserted, the srefresh_ack signal will be de-asserted when the srefresh_enter pin is de-asserted.

Data byte disable

In addition to the DFI signals, the memory controller provides a sideband signal, data_byte_disable, to the PHY indicating its data bus status. This signal’s width is the sum of the widths of the memory data bus and the ECC data bus. For each bit of the bus that the memory controller is not using, the data_byte_disable signal will be driven to ’b1. This signal is a concatenation of ECC functionality and data path reduction. The PHY should use this information to disable bits on the PHY/memory interface.
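
Returning to the out-of-range parameters of Table 112, the field layout lends itself to a short decoding example. The sketch below is illustrative: the helper names are hypothetical, the raw values are assumed to have already been read from the corresponding parameters, and no register addresses are implied.

```python
# Encodings of out_of_range_type [5:0] as listed in Table 112.
OUT_OF_RANGE_TYPES = {
    0b000000: "Non-exclusive write",
    0b000001: "Non-exclusive read",
    0b000010: "Non-exclusive masked write",
    0b000100: "Wrapped write",
    0b000101: "Wrapped read",
    0b000110: "Wrapped masked write",
    0b001000: "Exclusive write",
    0b001001: "Exclusive read",
    0b001010: "Exclusive masked write",
    0b010000: "Flushed write",
}


def decode_source_id(source_id):
    """Split out_of_range_source_id [6:0] into port ID and AXI thread ID."""
    return {"port": (source_id >> 4) & 0x7, "thread": source_id & 0xF}


def decode_type(value):
    """Name the transaction type; unlisted encodings are reserved."""
    return OUT_OF_RANGE_TYPES.get(value & 0x3F, "Reserved")


def transaction_bytes(axlen, axsize):
    """Total byte count: (AxLEN + 1) x 2^AxSIZE, for reads or writes."""
    return (axlen + 1) * (1 << axsize)
```

For instance, a source ID of 0b0101001 decodes to port 2, thread 9, and a burst with a length field of 3 and a size field of 2 covers 16 bytes.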

Half datapath option

This memory controller includes the option to reduce the usable size of the bus between the memory controller and the memory devices. This feature is useful when a different memory part, with a smaller data width, is utilized. To use a memory device with a smaller datapath, the half datapath option must be enabled by setting the programmable reduc parameter to 1’b1.

When the reduc parameter is set to 1’b1, only the lower half of the DFI data bus is used. In this setting, the upper half of the data_byte_disable signal will be driven high.

If the reduc parameter is cleared to 1’b0, the memory controller will ignore the half datapath option and function normally. In this case, the entire DFI data interface will be used.

Idle drive enable

For minimal power usage, the memory controller provides an option to disable the data and strobe buses when the memory controller is idle. If the user sets the drive_dq_dqs parameter to 1’b1 and the memory controller is idle, the idle_drive_enable signal will be driven high. The memory controller is considered idle when there are no transactions currently in progress and the memory controller core command queue is empty.

15.6.9 Address mapping

The memory controller automatically maps user addresses to the DRAM memory in a contiguous block. Addressing starts at user address 0 and ends at the highest available address according to the size and number of DRAM devices present. This mapping is dependent on how the memory controller was configured and how the parameters in the internal MC registers are programmed. The exact number and values of these parameters depend on the configuration and the type of memory for which the memory controller was designed.

The mapping of the address space to the internal data storage structure of the DRAM devices is based on the actual size of the DRAM devices available. The size is stored in user-programmable parameters that must be initialized at power up.
Certain DRAM devices allow for different mapping options to be chosen, while other DRAM devices depend on the burst length chosen.

DDR SDRAM address mapping options

The address structure of DDR SDRAM devices contains five fields. Each of these fields can be individually addressed when accessing the DRAM. The address map for this memory controller is ordered as follows:

Chip Select -- Row -- Bank -- Column -- Datapath

The maximum widths of the fields are based on the configuration settings. The actual widths of the fields may be smaller if the device address width parameters (addr_pins, eight_bank_mode and column_size) are programmed differently.

Maximum address space

The maximum user address range is determined by the width of the memory datapath, the number of chip select pins, and the address space of the DRAM device. The maximum amount of memory can be calculated by the following formula:

MaxMemBytes = ChipSelects x 2^Address x NumBanks x DPWidthBytes

For this memory controller, the maximum values for these fields are as follows:
● Chip selects = 2
● Device address = 15 + 14 (Row + Column)
● Number of banks per chip select = 8
● Memory datapath width in bytes = 4 bytes

As a result, the maximum accessible memory area is 1 GB or 2 GB depending on configuration.

Memory mapping to address space

The maximum allowable address space and mapping into the DRAM devices for the memory controller is shown in Figure 72. This map corresponds to a memory device with 15 row bits and 14 column bits.

Figure 72. Memory controller memory map: maximum

The addr_pins and column_size parameters can each range from the maximum configured for the memory controller to seven bits smaller than the maximum configured. This allows the memory controller to function with a wide variety of memory device sizes.
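
The MaxMemBytes formula and the Chip Select -- Row -- Bank -- Column -- Datapath ordering can be checked numerically. The sketch below is a hedged illustration: the function names are hypothetical, the field order is assumed to run from most significant to least significant bits, and one chip-select bit, three bank bits and two datapath bits are assumed from the maximum values listed above. With those maximum field values the formula evaluates to 2^35 bytes, the span of a 35-bit user address; the 1 GB or 2 GB figure is the limit this document states for the device.

```python
def max_mem_bytes(chip_selects, row_bits, col_bits, num_banks, dp_width_bytes):
    """MaxMemBytes = ChipSelects x 2^Address x NumBanks x DPWidthBytes."""
    return chip_selects * (1 << (row_bits + col_bits)) * num_banks * dp_width_bytes


def decode_address(addr, row_bits, col_bits, bank_bits=3, dp_bits=2):
    """Split a user address into its five fields, least significant first."""
    fields = {}
    fields["byte"] = addr & ((1 << dp_bits) - 1)
    addr >>= dp_bits
    fields["column"] = addr & ((1 << col_bits) - 1)
    addr >>= col_bits
    fields["bank"] = addr & ((1 << bank_bits) - 1)
    addr >>= bank_bits
    fields["row"] = addr & ((1 << row_bits) - 1)
    addr >>= row_bits
    fields["chip_select"] = addr & 0x1  # assumption: one bit for 2 chip selects
    return fields
```

As a usage example, `max_mem_bytes(2, 12, 12, 8, 4)` returns 2^30 bytes (1024 MB) for a device with 12 row bits and 12 column bits.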
The settings for the addr_pins and column_size parameters control how the address map is used to decode the user address to the DRAM chip selects and row and column addresses. The eight_bank_mode parameter controls the address when eight bank mode is supported. It is assumed that the values in these parameters never exceed the maximum values configured.

Using the example shown in Figure 72, if the memory controller is wired to devices with 12 row pins and 12 column bits, the maximum accessible memory space would be reduced. The accessible memory space for this configuration is 1024 MB. The address map for this configuration is shown in Figure 73.

Note that address bits 30 through 34 are listed as “don’t care” bits. These bits are ignored when the memory controller generates the address to the DRAM devices, but they are used to verify that the address lies within the usable address range of the memory controller. Therefore, the user should drive these bits to ’b0 to avoid the memory controller interpreting the command as being out-of-range and setting one or both of the out-of-range interrupt bits.

Figure 73. Alternate memory map

Note: 1 The Chip Select, Row, Bank, and Column fields are used to address an entire memory word, and the memory controller bits are used to address individual bytes within that user word. For example, for a read starting at byte address 0x2, the memory controller bits must be defined as 3’b010 in order to address this byte directly. Reads and writes are memory word-aligned if all the memory controller bits are 0.
2 The maximum accessible memory area is 1 GB or 2 GB, depending on configuration. When the 2 GB address space is enabled, the ACP function is not available.

16 Static memory controller (FSMC)

This chapter focuses on FSMC functionality and operation.
For the FSMC feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU

For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

16.1 Overview

The flexible static memory controller (FSMC) is an AHB peripheral that interfaces AHB masters to a wide variety of memories. A wrapper is designed to contain this IP and a multiplexer (MUX) which selects the appropriate signals to connect to the pads depending on the type of memory.

Figure 74. FSMC and embedded MPU boundary
[Figure: inside the SoC boundary, the FSMC connects to the AHB, the on-chip logic, the configuration registers, clocks and synchronization, and interrupts. Through the I/O pads (pad direction, in and out data buses, address/command/clock, interrupts and wait) it drives the on-board bus to external SRAM, NOR Flash and NAND Flash.]

16.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

16.3 Clocks

The FSMC receives HCLK, the AHB clock, running at 166 MHz.

See also: Chapter 5: Reset and clock generator (RCG).

16.4 Interrupts

Each NAND Flash connected to SPEAr1340 can generate an interrupt to signal the end of the BUSY state. When this happens, the FSMC can forward the interrupt to the ARM via pc*_int, sampled according to the configuration register GenMemCtrl_int.

See also: Appendix A: Interrupts.

16.5 Functional description

The flexible static memory controller is used to interface an AHB bus to external memories. The main purposes of FSMC are:
● Translate the AHB protocol into the appropriate external storage device protocol
● Meet the timing of the external devices, slowing down and counting an appropriate number of HCLK (AHB clock) cycles to complete the transaction to the external device.

16.5.1 NAND Flash controller

The following accesses are supported for NAND Flash:
● Common memory space access: this is the normal way of accessing the NAND Flash. The data size is specified in the DeviceWidth field of the GenMemCtrl_PCx registers and the corresponding timings must be specified in the GenMemCtrl_Commx registers.
● Attribute memory space access: this is the same as the common memory access mode, except that timings are specified in the GenMemCtrl_Attrib register.

FSMC can support up to 2 memory banks for NAND Flash. The following table lists the criteria used to select each bank.

Table 113. NAND bank selection

Address (HEX)   Region name              Bank selected   Chip select
0xB0800000      Common memory space      Bank 0          FSMC_CE0n
0xB0880000      Not used                 Bank 0          FSMC_CE0n
0xB0900000      Attribute memory space   Bank 0          FSMC_CE0n
0xB0980000      Not used                 Bank 0          FSMC_CE0n
0xB0A00000      Common memory space      Bank 1          FSMC_CE1n
0xB0A80000      Not used                 Bank 1          FSMC_CE1n
0xB0B00000      Attribute memory space   Bank 1          FSMC_CE1n
0xB0B80000      Not used                 Bank 1          FSMC_CE1n

16.5.2 NOR Flash / SRAM controller

FSMC can support up to 2 memory banks for NOR Flash and SRAM. The following table shows the criteria used to select each bank.

Table 114. NOR/SRAM bank selection

Address (HEX)   Bank selected   Chip select
0xA0000000      Bank 0          FSMC_CE0n
0xA4000000      Bank 1          FSMC_CE1n

The lower bits of HADDR are issued to the external memory taking into account that HADDR is expressed in bytes while the memory is addressed in memory words. The following table is used and does not depend on the actual bus data transfer size HSIZE.

Table 115. External memory address

Memory word size   HADDR bits issued to memory
1 byte             HADDR[25:0]
2 bytes            HADDR[25:1]

When the bus data size (HSIZE) is smaller than the memory data width and the memory is an SRAM, the controller uses the byte lanes (BLN outputs).
For instance, reading or writing a byte (HSIZE=00, generated by the ARM assembly instructions LDRB or STRB) from or to a 16-bit wide SRAM is managed automatically by the controller with the BLN outputs. If the memory is a Flash, the controller reads the whole memory word and uses only the information needed. There is no hardware mechanism that prevents a write of less than one full memory word to a Flash memory.

16.5.3 Asynchronous operating modes

The interface signals are synchronized by the internal clock HCLK. This clock is not output to the memory; however, it is shown in the following figures as a reference.

When the extended mode is enabled (ExtendMode bit set in the GenMemCtrlx register), four extended modes are available (A, B, C and D), and these modes can be mixed between read and write accesses. For example, read in mode A and write in mode B.

When the extended mode is disabled, the FSMC operates in Mode 1 or Mode 2 as follows:
● Mode 1 is the default mode when SRAM memory type is selected (Bits 3:2 MemoryType = 0x0 in the GenMemCtrlx register).
● Mode 2 is the default mode when NOR memory type is selected (Bits 3:2 MemoryType = 0x2 in the GenMemCtrlx register).

When the extended mode is disabled, it is not possible to mix modes in read and write access.

Table 116. FSMC asynchronous operating modes
                          Memory type
Asynchronous mode         SRAM      NOR
Extended mode disabled    Mode 1    Mode 2
Extended mode enabled     Mode A    Mode B, C, D

To select between the four asynchronous access modes, configure the AccessMode bits in the GenMemCtrl_timx register as follows:
– 00: Access mode A
– 01: Access mode B
– 10: Access mode C
– 11: Access mode D

The following sections describe the asynchronous access modes in more detail.

Note: For a detailed description of FSMC timing requirements, please refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.
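The mode selection rules above can be summarized in a few lines of code. The following is a minimal sketch in Python, not driver code: the function name, parameter names and constants are hypothetical, and only the MemoryType values (0x0 for SRAM, 0x2 for NOR) and the AccessMode encodings are taken from the text.

```python
# Illustrative sketch only: names are hypothetical, not register-map definitions.
MEMORY_TYPE_SRAM = 0x0  # MemoryType field value for SRAM (from the text)
MEMORY_TYPE_NOR = 0x2   # MemoryType field value for NOR (from the text)

# AccessMode encodings in GenMemCtrl_timx, as listed in the text.
EXTENDED_MODES = {0b00: "Mode A", 0b01: "Mode B", 0b10: "Mode C", 0b11: "Mode D"}


def async_mode(extend_mode: bool, memory_type: int, access_mode: int = 0) -> str:
    """Return the name of the asynchronous mode that applies to one access."""
    if extend_mode:
        # Extended mode: the AccessMode bits choose the mode directly.
        return EXTENDED_MODES[access_mode & 0b11]
    # Extended mode disabled: the memory type fixes the default mode.
    if memory_type == MEMORY_TYPE_SRAM:
        return "Mode 1"
    if memory_type == MEMORY_TYPE_NOR:
        return "Mode 2"
    raise ValueError("no default asynchronous mode for this memory type in this sketch")
```

Because the extended modes can differ between read and write, such a helper would be called once per direction, each time with the AccessMode value of the corresponding timing register.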
Mode 1 - SRAM asynchronous access

Figure 75 and Figure 76 show the timings for a typical SRAM access.

Figure 75. SRAM asynchronous read access (timing diagram: HCLK, FSMC_CExn, FSMC_ADx, FSMC_BLxn, FSMC_IO, FSMC_REn, FSMC_WEn; example with Addr_ST = 3, t1 = 4 cycles, and Data_ST = 4, t2 = 5 cycles; data strobe marked)

Figure 76. SRAM asynchronous write access (timing diagram: same signals; example with Addr_ST = 3, t1 = 4 cycles, and Data_ST = 4, t2 = 5 cycles; 1 HCLK cycle marked)

Mode A - SRAM asynchronous access with FSMC_REn toggling

Figure 77 and Figure 78 show the timings for a typical SRAM access with FSMC_REn toggling. This mode is similar to Mode 1, with the following differences:
● FSMC_REn toggling
● Independent read and write timings

Figure 77. SRAM asynchronous read access with FSMC_REn toggling (timing diagram: HCLK, FSMC_CExn, FSMC_ADx, FSMC_BLxn, FSMC_IO, FSMC_REn, FSMC_WEn; example with Addr_ST = 3, t1 = 4 cycles, and Data_ST = 4, t2 = 5 cycles; data strobe marked)

Figure 78. SRAM asynchronous write access with FSMC_REn toggling (timing diagram: same signals; example with Addr_ST = 3, t1 = 4 cycles, and Data_ST = 4, t2 = 5 cycles; 1 HCLK cycle marked)

Mode 2/Mode B - NOR Flash asynchronous access

The only difference between Mode 2 and Mode B is that read and write timings are the same when the extended mode is disabled (Mode 2), and can be different when the extended mode is enabled (Mode B). These modes are similar to Mode 1, with the following differences:
● FSMC_REn toggling
● Independent read and write timings when extended mode is set (Mode B).

Figure 79 and Figure 80 show the timings for a typical NOR Flash access.

Figure 79. NOR Flash asynchronous read access (timing diagram: HCLK, FSMC_CExn, FSMC_ADx, FSMC_AV, FSMC_IO, FSMC_REn, FSMC_WEn; example with Addr_ST = 3, t1 = 4 cycles, and Data_ST = 4, t2 = 5 cycles; data strobe marked)

Figure 80.
NOR Flash asynchronous write access (timing diagram: HCLK, FSMC_CExn, FSMC_ADx, FSMC_AV, FSMC_IO, FSMC_REn, FSMC_WEn; example with Addr_ST = 3, t1 = 4 cycles, and Data_ST = 4, t2 = 5 cycles; 1 HCLK cycle marked)

Mode C - NOR Flash asynchronous access with FSMC_REn toggling

Figure 81 and Figure 82 show the timings for a typical NOR Flash access with FSMC_REn toggling. This mode is similar to Mode 1, with the following differences:
● FSMC_AV toggling
● FSMC_REn toggling

Figure 81. NOR Flash asynchronous read access with FSMC_REn toggling (timing diagram: HCLK, FSMC_CExn, FSMC_ADx, FSMC_AV, FSMC_IO, FSMC_REn, FSMC_WEn; example with Addr_ST = 3, t1 = 4 cycles, and Data_ST = 4, t2 = 5 cycles; data strobe marked)

Figure 82. NOR Flash asynchronous write access with FSMC_REn toggling (timing diagram: same signals; example with Addr_ST = 3, t1 = 4 cycles, and Data_ST = 4, t2 = 5 cycles; 1 HCLK cycle marked)

Mode D - Asynchronous access with extended address

Figure 83 and Figure 84 show the timings for an asynchronous access with extended address. This mode is similar to Mode 1, with the following differences:
● FSMC_AV toggling
● FSMC_REn toggling extended beyond FSMC_AV change

Figure 83. Asynchronous read access with extended address (timing diagram: HCLK, FSMC_CExn, FSMC_ADx, FSMC_AV, FSMC_IO, FSMC_REn, FSMC_WEn; example with Addr_ST = 2, t1 = 3 cycles; Hold_addr = 2, th = 3 cycles; Data_ST = 2, t2 = 3 cycles; OEN_delay and data strobe marked)

Figure 84. Asynchronous write access with extended address (timing diagram: same signals; example with Addr_ST = 2, t1 = 3 cycles; Hold_addr = 2, th = 3 cycles; Data_ST = 2, t2 = 3 cycles; 1 HCLK cycle marked)

16.5.4 ECC calculation

FSMC has 2 hardware ECC calculator blocks, based on BCH coding. This solution corrects up to 8 errors in a 512-byte data block; the data block size is not programmable.
Each block serves a single NAND chip select; ECC hardware blocks are not shared among NAND memories.

After the 512 bytes of data have been written, the BCH encoder takes about 29 AHB clock cycles to calculate the ECC code. Bit 15 of the GenMemCtrl_Status register flags when the ECC calculation is completed. The ECC code is 104 bits long and is stored in the following registers:
● GenMemCtrl_ECCrx, where x is the Bank
● GenMemCtrl_ECC2rx, where x is the Bank
● GenMemCtrl_ECC3rx, where x is the Bank
● GenMemCtrl_Status[24:16]

All registers are read-only; attempts to write them are ignored.

When reading back, the 512 bytes of data must be stored temporarily in a RAM. The 13 bytes of ECC previously written must also be read; however, there is no need to store them in RAM, because the BCH decoder in the FSMC captures them automatically. After about 301 AHB clock cycles, and once all data has been read from the NAND, the BCH decoder provides the information in the registers mentioned above. The final correction must be done in software, by inverting the appropriate bit in the buffer RAM.

16.5.5 Bus turn around

External memories share the same address and data buses. During a read access, the data bus is driven by the selected memory. If the next access is made to a memory with multiplexed I/Os, a data bus turnaround condition occurs, because the controller needs to drive addresses on the data bus. This might lead to bus contention if the "previous" memory has not released the bus fast enough.

To prevent this, each time the FSMC performs a read access (random read, single access, or the last of a burst) to any kind of memory, a bus turnaround delay is introduced between the completion of the current read transaction (FSMC_REn and chip select disabled by the controller) and the next transaction. This delay lasts (BusTurn+1) AHB clock (HCLK) cycles, where BusTurn is the value programmed in the timing register GenMemCtrl_tim of the selected memory.
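As a worked example of the (BusTurn+1) rule, the sketch below converts a BusTurn value into a delay and, conversely, finds the smallest BusTurn that covers a required bus release time. It is written in Python for illustration only. The function names are hypothetical, the 166 MHz default is the HCLK frequency quoted in Section 16.3, and the width of the BusTurn field is not modelled.

```python
import math

HCLK_HZ = 166_000_000  # AHB clock frequency quoted in Section 16.3


def bus_turnaround_cycles(bus_turn: int) -> int:
    """Delay in HCLK cycles for a programmed BusTurn value: BusTurn + 1."""
    return bus_turn + 1


def bus_turnaround_ns(bus_turn: int, hclk_hz: int = HCLK_HZ) -> float:
    """Same delay expressed in nanoseconds."""
    return bus_turnaround_cycles(bus_turn) * 1e9 / hclk_hz


def min_bus_turn(required_ns: float, hclk_hz: int = HCLK_HZ) -> int:
    """Smallest BusTurn whose delay is at least required_ns (hypothetical helper).

    required_ns would come from the slowest memory's datasheet; the field
    width limit of BusTurn is not checked here.
    """
    cycles = math.ceil(required_ns * hclk_hz / 1e9)
    return max(cycles - 1, 0)
```

For example, under these assumptions a memory that needs 20 ns to release the bus requires 4 HCLK cycles at 166 MHz, that is, BusTurn = 3.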
If the memory system includes only non-muxed memories, BusTurn can be set to the minimum; otherwise, set it to fulfill the worst (slowest) memory turnaround time.

17 Serial NOR Flash controller (SMI)

This chapter focuses on SMI functionality and operation.

For the SMI feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU

For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

17.1 Overview

The serial memory interface integrated in SPEAr1340 acts as an AHB slave interface (32-, 16- or 8-bit) to SPI-compatible off-chip memories. SMI allows the CPU to use these serial memories either as data storage or for code execution.

Figure 85. SMI block diagram (blocks: AHB slave interface on the AMBA AHB bus, SMI clock prescaler (1 to 127), SMI data processing and control, transmit register, receive register/status register, control and status register; connections to the SPI-compatible memories: clock, bank select, data/command, data/status)

17.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

17.3 Clocks

The memory clock (smi_clk_o) is generated by SMI through its programmable prescaler unit. The incoming AHB bus frequency fAHB (HCLK signal) is divided by the value stored in the PRESC field of the CR1 register, resulting in the SMI clock frequency fSMICLK:

fSMICLK = fAHB / (PRESC value)

that is:

tSMICLK = tAHB • (PRESC value)

where tSMICLK and tAHB are the clock periods of the SMI clock and the AHB bus, respectively.

fSMICLK can be up to 20 MHz in normal mode and up to 50 MHz in fast read mode.

Note: If PRESC is an even value, the high time and low time of the SMI clock are both equal to half a tSMICLK.
In contrast, if PRESC is an odd value:

tSMICLK, high = tSMICLK • [(PRESC - 1) / 2] / PRESC
tSMICLK, low = tSMICLK • [(PRESC + 1) / 2] / PRESC

17.3.1 Latency

Assuming that SMI is not busy, the nominal latency for a 32-bit single read to a non-incrementing serial Flash address is:
● 73 tAHB maximum, if PRESC = 1 (that is, tAHB = tSMICLK);
● (68 tSMICLK + 5 tAHB) maximum, if PRESC > 1 (that is, tAHB ≠ tSMICLK, and specifically tSMICLK > tAHB),
taking into account up to 9 clock periods in addition to the 64 clock periods required both to send the command to the serial Flash memory (1-byte opcode + 3-byte address) and to receive back 32 bits.

Under the same assumption, the nominal latency for a 32-bit single write to a non-incrementing serial Flash address is:
● 5 tAHB maximum, if PRESC = 1 (that is, tAHB = tSMICLK);
● (2 tSMICLK + 3 tAHB) maximum, if PRESC > 1 (that is, tAHB ≠ tSMICLK, and specifically tSMICLK > tAHB).

For AHB read burst transfers, the maximum latency for all transfers after the first corresponds to the data size, that is (32 tSMICLK) for a word transfer, (16 tSMICLK) for a half-word, and (8 tSMICLK) for a byte, because there are no mandatory extra commands (instruction opcode and address).

For AHB write burst transfers, the maximum latency for the 2nd transfer is (data size + opcode + address bytes), and it corresponds to the data size for the transfers after that.
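The clock and latency formulas above lend themselves to a short worked example. The following Python sketch is illustrative only: all function names are hypothetical, the prescaler range of 1 to 127 is taken from the block diagram, and latencies are expressed in AHB clock periods using tSMICLK = PRESC • tAHB.

```python
PRESC_MIN, PRESC_MAX = 1, 127  # prescaler range shown in the SMI block diagram


def _check_presc(presc: int) -> None:
    if not PRESC_MIN <= presc <= PRESC_MAX:
        raise ValueError("PRESC out of range")


def smi_clock_hz(f_ahb_hz: float, presc: int) -> float:
    """fSMICLK = fAHB / PRESC."""
    _check_presc(presc)
    return f_ahb_hz / presc


def min_presc(f_ahb_hz: int, f_max_hz: int) -> int:
    """Smallest PRESC that keeps fSMICLK at or below f_max_hz (hypothetical helper)."""
    presc = -(-f_ahb_hz // f_max_hz)  # integer ceiling division
    presc = max(presc, PRESC_MIN)
    _check_presc(presc)
    return presc


def duty_fractions(presc: int):
    """High and low time of smi_clk_o as fractions of tSMICLK."""
    _check_presc(presc)
    if presc % 2 == 0:
        return (0.5, 0.5)
    return (((presc - 1) / 2) / presc, ((presc + 1) / 2) / presc)


def read_latency_ahb_periods(presc: int) -> int:
    """Maximum nominal latency of a 32-bit single read, in tAHB."""
    _check_presc(presc)
    return 73 if presc == 1 else 68 * presc + 5


def write_latency_ahb_periods(presc: int) -> int:
    """Maximum nominal latency of a 32-bit single write, in tAHB."""
    _check_presc(presc)
    return 5 if presc == 1 else 2 * presc + 3
```

Assuming a 166 MHz AHB clock (the HCLK value quoted for the FSMC, taken here as an assumption for the SMI), `min_presc(166_000_000, 20_000_000)` gives 9 for normal mode and `min_presc(166_000_000, 50_000_000)` gives 4 for fast read mode.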
Nominal latency can be increased by:
● On-going SMI transfer (read, write, read status register command or write enable command)
● Deselect time programming (field TCS in CR1 register), which adds (TCS + 1) • smi_clk_o periods
● Busy / idle transfer on AHB bus
● Fast read, which adds 1 dummy byte
● Hold programming (field HOLD in CR1 register)
● Boot delay time (see Section 17.4.5: Booting from external memory)
● Frequency change
● On-going programming

17.4 Functional description

17.4.1 AHB interface

The following rules apply to the access from the AHB to the SMI:
● Endianness is fixed to little-endian
● SPLIT/RETRY responses are not supported
● Bursts must not cross bank boundaries
● Size of data transfers for memories can be byte/half-word/word, otherwise ERROR response on HRESP
● Size of data transfers for registers must be 32-bit wide, otherwise ERROR response on HRESP
● Read requests: all types of BURST are supported. Wrapping bursts take more time than incrementing bursts, as there is a break in the address increment
● Write requests: wrapping bursts are not supported, and provoke an ERROR response on HRESP
● BUSY transfer: the SMI transfer is held until busy is inactive.

17.4.2 Memory device compatibility

The communication protocol used is SPI in CPOL = 1 and CPHA = 1 mode. The instructions supported are listed in Table 117.

Table 117. Supported instruction set
Opcode   Description
0x03     Read data bytes
0x0B     Read data high speed
0x05     Read status register
0x06     Write enable
0x02     Page program
0xAB     Release from deep power-down

17.4.3 Hardware mode

At reset, the SMI operates in hardware mode. In this mode, the TR transmit register and RR receive register must not be accessed. They are managed by the SMI hardware and used to communicate with the external memory devices whenever an AHB master reads or writes to an address in external memory.
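To make the hardware-mode traffic more concrete, the sketch below assembles the byte sequences that correspond to the read and page program opcodes of Table 117, following the sequences given in Section 17.4.6 and Section 17.4.7 (opcode, then address most significant byte first, then one dummy byte for high-speed reads or the data bytes for a program). It is a Python illustration with hypothetical function names, not a description of the SMI internals.

```python
OPCODE_READ = 0x03          # Read data bytes (Table 117)
OPCODE_FAST_READ = 0x0B     # Read data high speed (Table 117)
OPCODE_PAGE_PROGRAM = 0x02  # Page program (Table 117)


def _address_bytes(address: int, addr_bytes: int) -> bytes:
    # The address length (2 or 3 bytes) mirrors the ADDR_LENGTH setting.
    if addr_bytes not in (2, 3):
        raise ValueError("address length must be 2 or 3 bytes")
    if not 0 <= address < (1 << (8 * addr_bytes)):
        raise ValueError("address does not fit in the selected length")
    return address.to_bytes(addr_bytes, "big")  # most significant byte first


def read_command_frame(address: int, fast: bool = False, addr_bytes: int = 3) -> bytes:
    """Bytes shifted out before the memory starts returning data."""
    opcode = OPCODE_FAST_READ if fast else OPCODE_READ
    frame = bytes([opcode]) + _address_bytes(address, addr_bytes)
    if fast:
        frame += b"\x00"  # one dummy byte in high speed mode
    return frame


def page_program_frame(address: int, data: bytes, addr_bytes: int = 3) -> bytes:
    """Opcode, address and data bytes of a page program sequence."""
    return bytes([OPCODE_PAGE_PROGRAM]) + _address_bytes(address, addr_bytes) + bytes(data)
```

With a 3-byte address, a normal read frame is 4 bytes (32 clock periods) and is followed by 32 more clock periods for a 32-bit word, which matches the 64 clock periods quoted in Section 17.3.1.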
17.4.4 Software mode

In software mode, the TR transmit register and RR receive register are accessible. Direct AHB transfers to/from external memories are not allowed. You can enable software mode by setting the SW bit in the CR1 register.

Software mode is used to transfer any data or commands from the TR transmit register to external memory and to read data directly in the RR receive register. The transfer is started using the send bit in the CR2 register.

For example, software mode is used to erase Flash memory before writing. Erase cannot be managed in hardware mode due to incompatibilities that exist between Flash devices from different vendors.

In software mode, application code being executed by the core cannot be fetched from external memory. It must either reside in internal memory, or be previously loaded from external memory while the SMI is in hardware mode.

17.4.5 Booting from external memory

SPEAr1340 allows an external boot from a serial Flash located at Bank0 only (Bank0 is enabled after power-on reset). During the boot phase, the following instruction sequence is automatically sent to Bank0:
1. Release from deep power-down (opcode 0xAB), in order to be able to boot on this bank even if it was in deep power-down mode
2. 29 µs delay to ensure Bank0 is successfully released
3. Read status register (opcode 0x05), in order to check that Bank0 is neither in a write nor in an erase cycle
4. Read data bytes (opcode 0x03) at memory start location (that is, 0xE6000000) with a 19 MHz clock frequency.

Note:
1 All memory banks other than Bank0 are disabled at reset and they must be enabled by setting dedicated BE bits in CR1 register before they can be accessed.
2 If an AHB request occurs while either the WEN bit or the RSR bit (both in CR2 register) is set, the on-going command is first finished before the request from AHB is sent to the memory.

17.4.6
External memory read request

A read request to external memory is served only if the SMI is in hardware mode (CR1 register, SW = 0) and write burst mode is not selected (CR1 register, WBM = 0); otherwise the ERF1 flag in the SR register is set and an ERROR response is sent to AHB.

When a read request occurs in normal mode (CR1 register, FAST = 0), the following sequence is sent to the selected bank:
1. Read data bytes opcode (0x03)
2. 3- or 2-byte address from the most to the least significant bit (depending on the ADDR_LENGTH bit in the CR1 register)
3. The clock is sent until the end of the burst request from the master.

When a read request occurs in high speed mode (CR1 register, FAST = 1), the following sequence is sent to the selected bank:
1. Read data bytes at high speed opcode (0x0B)
2. 3- or 2-byte address from the most to the least significant bit (depending on the ADDR_LENGTH bit in the CR1 register)
3. 1 dummy byte (0x00)
4. The clock is sent until the end of the burst request from the master.

The external memory bank remains selected as long as there is no external memory address jump, and as long as no new commands are sent to the SMI (such as WEN, RSR, SW mode or WBM mode, write request, bank disable, prescaler configuration change or memory access error). It also remains selected when the address rolls over from 0xFFFFFF to 0x000000 in the same bank.

17.4.7 External memory write request

A write request from AHB is served only if the SMI is in hardware mode (CR1 register, SW = 0), otherwise the ERF1 flag in the SR register is set and an ERROR response is sent to AHB.

Wrapping bursts are not allowed, as serial memories do not support them. They generate an ERROR response to AHB.

When a write request occurs, it is sent to external memory if the following conditions are met:
● Bank in write mode: When a bank is in write mode, the corresponding WM flag is set in the SR register.
If this condition is not met when a write request occurs, the ERF2 flag in the SR register is set and an ERROR response is sent to AHB. To enable write mode, select the bank using the BS bits in the CR2 register and then set the WEN bit in the CR2 register.
● No write in progress: The WIP bit in the SR register must be cleared. If this condition is not met, AHB is stalled until WIP = 0.

When these two conditions are met, the following sequence is sent to the selected bank:
1. Page program opcode (0x02)
2. 3- or 2-byte address from the most to the least significant bit (depending on the ADDR_LENGTH bit in the CR1 register)
3. All the data bytes, each from bit 7 to bit 0, starting at the address given previously and incrementing it up to the last one, depending on the size of the write request.

Write capability must be used only if the write in progress/busy bit of the external memory status register is located in bit 0. Otherwise the system will become locked.

After a write request is sent to external memory, the write mode bit is reset and the read status register instruction is automatically sent to this bank until WIP = 0. Bits 7:0 of the SR register are refreshed every 8 smi_clk_o periods with the contents of the status register read from the selected external memory. When memory programming is finished, the WCF flag in the SR register is set and an interrupt is generated if the WCIE bit in the CR1 register is set.

In order to send a write request to a bank other than the one under programming, the software must wait for WIP = 1, otherwise the error ERF2 would be generated due to the non-incrementing address. The bank in the programming phase must not be disabled in order to write to another one.

17.4.8 Write burst mode

Write burst mode is used to keep the external memory selected after the AHB write request (CR1 register, WBM = 1).
In that case, the next AHB to external memory write request must be sent to the next incremented address, and it must be of the same size. Otherwise the ERF2 flag in the SR register is set and an ERROR response is sent to AHB.

The external memory selection is released by resetting WBM or disabling the bank, and then the external memory page program cycle starts. If the bank is enabled, the read status register instruction is automatically sent to this bank until WIP = 0. A memory access error (ERF1 or ERF2) generates the nCSx release and the start of the external memory page program.

If write burst mode is not selected, the next incrementing AHB write request will be sent to external memory if it occurs before the end of the previous serial transfer. Otherwise the ERF2 flag in the SR register is set and an ERROR response is sent to AHB. Consequently, it is mandatory to set the WBM bit in order to perform several write requests which are not sent in the same AHB incrementing burst.

If WBM = 0 and no other write request occurs, the external memory selection is released after sending the data, and the external memory page program cycle starts.

Read requests to external memory are forbidden when WBM = 1, otherwise the ERF1 flag in the SR register is set and an ERROR response is sent to AHB.

17.4.9 Read while write

If a read occurs to the same bank that is in the programming phase, the AHB is stalled until WIP = 0.

If a read to another bank occurs, the read status register sequence is stopped, the read request is served and then the read status register sequence is re-sent to the memory being programmed. So during a read while write, the external memory select is released after the read command, in order to send the read status register sequence.

17.4.10 Erasing and write status register

In case of serial Flash, an erase may be necessary before writing. Due to incompatibility between different serial Flash vendors, erase and write status register can be done only in software mode.
The write enable instruction must be sent beforehand, and through software mode only, so that the WM bit in the SR register is not corrupted: the end of an internal Flash erase or write status register operation cannot be checked by hardware (and consequently the write complete interrupt is not generated). The WIP bit can be checked by sending the RSR command continuously.

18 Memory card interface (MCIF)

This chapter focuses on MCIF functionality and operation.

For the MCIF feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU

For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

18.1 Overview

MCIF is a hardware IP that interfaces with the most common memory cards on the market:
● SD/SDIO 2.0
● CF/CF+ Rev 4.1
● SDHC
● MMC 4.2/4.3
● xD

The device interface multiplexes different memory cards on the same IOs; only one memory card is accessible at a given time. At the board level, discrete elements are required to handle hot-swap management.

Figure 86. SD/SDIO/MMC Host controller block diagram (blocks: AHB interface on the AHB bus, bus monitor, synchronizer, power management, SD registers, SD protocol unit, command control unit, data control unit, data FIFO 2 * 4K, clock control; connected to an SDIO2.0 / SD2.0 Mem / MMC 3.31, 4.2/4.3 device)

18.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

18.3 Clocks

See Chapter 5: Reset and clock generator (RCG).

18.4 Functional description

18.4.1 SD2.0/SDIO2.0/MMC4.3 AHB Host controller

The SD2.0/SDIO2.0/MMC4.3 Host controller:
● has an ARM processor interface that conforms to SD Host controller standard specification version 2.0.
● handles SDIO/SD protocol at the transmission level, packing data, adding cyclic redundancy check (CRC) and start/end bit, and checking for transaction format correctness.
● provides a programmed IO method and a DMA data transfer method. In the programmed IO method, the ARM processor transfers data using the buffer data port register. Host controller support for DMA can be determined by checking the DMA support in the capabilities register. DMA enables a peripheral to read or write to memory without intervention from the CPU. The system address register points to the first data address, and data is then accessed sequentially from that address.

The SD2.0/SDIO2.0/MMC4.3 Host controller comprises:
● The Host_AHB interface, which acts as the bridge between AHB and Host controller.
● Host controller registers: the SD/SDIO controller registers are programmed by the ARM Processor through AHB target interface. Interrupts to the ARM Processor are generated based on the values set in the Interrupt status register and Interrupt enable registers.
● Bus monitor, which checks for any violations occurring in the SD bus and time-out conditions.
● Clk_gen: the clock generation block generates the SD clock depending on the value programmed by the ARM Processor in the clock control register.
● CRC generator and checker (CRC7 and CRC16): the CRC7 and CRC16 generators calculate the CRC for command and Data respectively to send the CRC to the SD/SDIO card. The CRC7 and CRC16 checker checks for any CRC error in the Response and Data sent by the SD/SDIO card.

In order to detect data defects on the cards, the host may include error correction codes in the payload data. An ECC code is used to store data on the card. This ECC code is used by the host or application to decode the user data.

AHB interface

The SD2.0/SDIO2.0/MMC4.3 Host controller provides a programmed IO method in which the ARM Host driver transfers data using the buffer data port register.
The AHB target consists of the Host control registers, which are programmed by the ARM processor through the AHB target interface. In the programmed IO data transfer method, the data transaction is performed through the AHB target interface. If the data transaction is done using the DMA data transfer method, the AHB interface initiates a read or write transaction with memory.

Interrupt controller

If any of the interrupt bits are set in the interrupt status register, the SD2.0/SDIO2.0/MMC4.3 Host controller generates an interrupt to the ARM processor.

Data FIFO

The SD/SDIO Host controller uses two 4K dual port FIFOs to perform both read and write transactions.

For maximum throughput during a write transaction (a data transfer from the ARM Processor to the SD2.0/SDIO2.0/MMC4.3 card), the two FIFOs are used alternately to store data. As data from the first FIFO transfers to the SD2.0/SDIO2.0/MMC4.3 card, the second FIFO fills, and as data from the second FIFO transfers, the first FIFO fills.

Similarly, for maximum throughput during a read transaction (a data transfer from the SD2.0/SDIO2.0/MMC4.3 card to the ARM Processor), data from the SD2.0/SDIO2.0/MMC4.3 card is alternately written to each FIFO. As data from one FIFO transfers to the ARM Processor, the other FIFO fills.

If the Host controller cannot accept any data from the SD2.0/SDIO2.0/MMC4.3 card, it either issues a read wait (if the card supports the read wait mechanism) to stop the data coming from the card, or it stops the clock.

Note: FIFO depth is 4K. Two 4K FIFOs are used to support a ping pong mechanism (to increase the throughput).

DAT[0-7] control logic

The DAT[0-7] control logic block transmits data on the data lines during a write transaction and receives data on the data lines during a read transaction.
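The ping pong use of the two FIFOs described under Data FIFO can be modelled with a small toy class. The sketch below is a Python illustration only, not a model of the actual hardware: the class and method names are hypothetical, each FIFO holds at most one block, and the 4096-byte capacity assumes that "4K" means 4 Kbytes.

```python
FIFO_BYTES = 4096  # assumed meaning of the "4K" FIFO depth


class PingPongBuffer:
    """Two single-block buffers that are filled and drained alternately."""

    def __init__(self):
        self.slots = [None, None]  # None means the FIFO is empty
        self.fill_index = 0        # FIFO the producer writes next
        self.drain_index = 0       # FIFO the consumer reads next

    def can_fill(self) -> bool:
        return self.slots[self.fill_index] is None

    def can_drain(self) -> bool:
        return self.slots[self.drain_index] is not None

    def fill(self, block: bytes) -> None:
        if len(block) > FIFO_BYTES:
            raise ValueError("block larger than one FIFO")
        if not self.can_fill():
            # Both FIFOs are full: the producer has to wait (on a card read,
            # this is where read wait or clock stop would apply).
            raise BufferError("both FIFOs full")
        self.slots[self.fill_index] = block
        self.fill_index ^= 1  # switch to the other FIFO

    def drain(self) -> bytes:
        if not self.can_drain():
            raise BufferError("no block ready")
        block = self.slots[self.drain_index]
        self.slots[self.drain_index] = None
        self.drain_index ^= 1  # switch to the other FIFO
        return block
```

In this model, one side can fill the second FIFO while the other side is still draining the first, and blocks always come out in the order in which they were written.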
Command control logic

The Command control logic block sends the command on the cmd line, and receives the response coming from the SD2.0/SDIO2.0/MMC4.3 card.

Power control

The SD2.0/SDIO2.0/MMC4.3 Host controller supplies SD bus power depending on the value programmed in the power control register by the ARM Processor. The ARM processor supplies SD bus voltage according to card OCR and the supply voltage capabilities of the Host controller. If the SD bus power is set to 1 in the power control register, the Host controller supplies voltage to the card. If the Host driver selects an unsupported voltage in the SD bus voltage select field, the Host controller may ignore a write to SD bus power and retain a value of zero.

Stream write and read for both DMA and NON DMA transaction

WRITE_DAT_UNTIL_STOP(CMD20) writes a data stream from the host, beginning at the given address and ending at a STOP_TRANSMISSION.

READ_DAT_UNTIL_STOP(CMD11) reads a data stream from the card, beginning at the given address and ending at a STOP_TRANSMISSION.

The Host controller switches to the second FIFO after writing/reading a block of data to the first FIFO, but in a stream transaction the blk size is not programmed by the driver. Because of this, for both stream write and stream read transactions the host driver should write the maximum FIFO size value to the blk size register. For example, if FIFO size is 4 K bytes, the host driver should write 4 K bytes to the blk size register. This ensures that FIFO switching occurs after writing/reading the 4 K bytes of data (= FIFO size).

Host enumeration

The SD2.0/SDIO2.0/MMC4.3 host is enumerated by an external ARM processor. The processor is informed of card insertion or removal from the slot by means of interrupts. The cards in the slot are enumerated by the SD host controller as instructed by the processor, by means of target register sets.
Data transfer protocol

SD transfers are classified according to how the number of blocks is specified:
● Single block transfer
The number of blocks is specified to the host controller before the transfer. The number of blocks specified is always one.
● Multiple block transfer
The number of blocks is specified to the host controller before the transfer. The number of blocks specified is one or more.
● Infinite block transfer
The number of blocks is not specified to the host controller before the transfer. This transfer continues until an Abort transaction is executed. For an SD memory card, the abort transaction is performed by CMD12. For an SDIO card, the abort transaction is performed by CMD52.

18.4.2 Not using DMA

Figure 87 provides a flowchart of the data transfer procedure without using DMA.

Figure 87. Data transfer using DAT line sequence (not using DMA) (flowchart of steps 1 to 17, which are described in the following list)

1. In the Block Size register, set the value of the executed data byte length of one block.
2. In the Block Count register, set the value of the executed data block count.
3.
In the Argument register, set the value of the issued command.
4. Set the appropriate value to Multi / Single Block Select and Block Count Enable. Set the value appropriate for the issued command to Data Transfer Direction, Auto CMD12 Enable, and DMA Enable.
5. In the Command register, set the value appropriate for the issued command.
Note: When writing the upper byte of the Command register, an SD command is issued.
6. Wait for a Command Complete interrupt.
7. Clear this bit: In the Normal Interrupt Status register, write 1 to Command Complete.
8. Read the Response register, and get the necessary information for the issued command.
9. If writing to a card, go to step 10-W. If reading from a card, go to step 10-R.
10-W. Wait for a Buffer Write Ready interrupt.
Non DMA write transfer: On receipt of a Buffer Write Ready interrupt, the ARM processor acts as a master and begins to transfer data via the Buffer data port register (fifo_1). The transmitter begins sending data on the SD bus when a block of data is ready in fifo_1. While the data is being transmitted on the SD bus, the Buffer Write Ready interrupt is sent to the ARM Processor for the second block of data. The ARM processor acts as a master and begins sending the second block of data via the Buffer data port register to fifo_2. A Buffer Write Ready interrupt is asserted only when a FIFO is empty and can receive a block of data.
11-W. Clear this bit: In the Normal Interrupt Status register, write 1 to Buffer Write Ready.
12-W. Write block data (according to the number of bytes specified in step 1) to the Buffer Data Port register.
13-W. Repeat until all blocks are sent, and then go to step 14.
Non DMA read transfer: A Buffer Read Ready interrupt is asserted whenever a block of data is ready in one of the FIFOs. On receipt of a Buffer Read Ready interrupt, the ARM processor acts as a master and begins reading the data via the Buffer data port register (fifo_1).
The receiver begins reading data from the SD bus only when a FIFO is empty and can receive a block of data. When both FIFOs are full, the host controller stops the data flow from the card either by using a read wait mechanism (if the card supports read wait) or by stopping the clock.
10-R. Wait for a Buffer Read Ready interrupt.
11-R. Clear this bit: In the Normal Interrupt Status register, write 1 to Buffer Read Ready.
12-R. Read block data (according to the number of bytes specified in step 1) from the Buffer Data Port register.
13-R. Repeat the previous step until all blocks are received, and then go to step 14.
14. For a single or multiple block transfer, go to step 15. For an infinite block transfer, go to step 17.
15. Wait for a Transfer Complete interrupt.
16. Clear this bit: In the Normal Interrupt Status register, write 1 to Transfer Complete.
17. Perform the Abort transaction sequence.
Note: Steps 1 and 2 can be executed simultaneously; steps 4 and 5 can be executed simultaneously.

18.4.3 Using DMA

Data is moved to or from the system memory using single transfers, 4-beat incrementing bursts, or 8-beat incrementing bursts, primarily so that the master does not hold the AHB bus for long periods.
Figure 88 provides a flowchart of the data transfer procedure using DMA.

Figure 88. Data transfer using DAT line sequence (using DMA)

1. In the System Address register, set the system address for DMA.
2. In the Block Size register, set the byte length of one data block.
3. In the Block Count register, set the number of data blocks to transfer.
4. In the Argument register, set the argument value for the command to be issued.
5. In the Transfer Mode register, set the appropriate value to Multi / Single Block Select and Block Count Enable. Set the value corresponding to the issued command to Data Transfer Direction, Auto CMD12 Enable, and DMA Enable.
6. In the Command register, set the value of the issued command.
Note: When writing the upper byte of the Command register, an SD command is issued.
7. Wait for a Command Complete interrupt.
8. Clear this bit: In the Normal Interrupt Status register, write 1 to Command Complete.
9. Read the Response register and get the necessary information in accordance with the issued command.
DMA read transfer: On receipt of the response end bit from the card for the write command (data flowing from host to card), the SD host controller acts as the master and requests the AHB bus. After receiving the grant, the host controller begins reading a block of data from the system memory and fills the first FIFO. Whenever a block of data is ready, the transmitter begins sending the data on the SD bus. While the data is being transmitted on the SD bus, the host controller requests the bus to fill the second block in the second FIFO. Ping-pong FIFOs are used to increase the throughput. Similarly, the host controller reads a block of data from the system memory whenever a FIFO is empty. This continues until all blocks are read from the system memory. A Transfer Complete interrupt is set only after all the blocks of data have been transferred to the card.
DMA write transfer: The block of data received from the card (data flowing from card to host) is stored in the first FIFO. Whenever a block of data is ready, the SD host controller acts as the master and requests the AHB bus.
After receiving the grant, the host controller begins writing a block of data into the system memory from the first FIFO. While the data is being written into the system memory, the host controller receives the second block of data and stores it in the second FIFO. Similarly, the host controller writes a block of data into the system memory whenever data is ready. This continues until all blocks are transferred to the system memory. The Transfer Complete interrupt is set only after all blocks of data have been transferred to the system memory.
Note: The host controller receives a block of data from the card only when it has room to store a block of data in a FIFO. When both FIFOs are full, the host controller stops the data flow from the card either by using a read wait mechanism (if the card supports read wait) or by stopping the clock.
10. Wait for the Transfer Complete interrupt and the DMA interrupt.
11. If Transfer Complete = 1, go to step 14. If DMA Interrupt = 1, go to step 12. Transfer Complete has higher priority than DMA Interrupt.
12. Clear this bit: In the Normal Interrupt Status register, write 1 to DMA Interrupt.
13. In the System Address register, set the system address of the next data position and go to step 10.
14. Clear these bits: In the Normal Interrupt Status register, write 1 to Transfer Complete and DMA Interrupt.
Note: Steps 2 and 3 can be executed simultaneously; steps 5 and 6 can be executed simultaneously.

Example: the host wishes to transfer 4 KB of data to the card. Assuming that the maximum block size is 256 bytes, the host driver programs the Block Size register with the value 256 and the Block Count register with the value 16. The AHB master and the transmitter inside the SD2.0/SDIO2.0/MMC4.3 host controller get the information (how much data to transfer) from these registers.
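The block count in this example follows directly from the transfer length and the block size. The C fragment below is a minimal sketch of that arithmetic, not driver code: the helper name `mcif_block_count` and its "return 0 on error" convention are assumptions introduced here for illustration. Only the 4 KB length, the 256-byte block size, the resulting count of 16, and the 16-bit width of the Block Count register come from this manual.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of any ST driver).
 * Returns the value to program in the Block Count register for a transfer
 * of total_bytes using blocks of block_size bytes. Returns 0 if the length
 * is not a whole number of blocks or if the count does not fit in the
 * 16-bit Block Count register. */
static uint16_t mcif_block_count(uint32_t total_bytes, uint32_t block_size)
{
    assert(block_size != 0u);           /* a zero block size is a caller bug */

    if (total_bytes % block_size != 0u)
        return 0u;                      /* partial last block: not handled here */

    uint32_t blocks = total_bytes / block_size;
    if (blocks > 0xFFFFu)
        return 0u;                      /* exceeds the 16-bit register */

    return (uint16_t)blocks;
}
```

For the example above, `mcif_block_count(4096, 256)` gives the value 16 that the host driver writes to the Block Count register.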
Using this information, the AHB master acts as a master and initiates a data read transaction (to read a block of data, 256 bytes, from the system memory). The following types of burst are used, primarily to avoid a longer AHB bus hold by the master:
● Single transfer
● 4-beat incrementing burst
● 8-beat incrementing burst
The first block is received in the first FIFO and the second block in the second FIFO. Similarly, the remaining blocks are received in alternate FIFOs. Whenever a block of data is ready in a FIFO, the transmitter starts transmitting the block of data (256 bytes) on the SD bus. After transmitting the entire block of data to the card, the transmitter waits for a status response from the card. The transmitter sends the next block of data only when it receives a good status response from the card for the previous block of data; otherwise the transaction is aborted and the host starts a fresh transaction.

18.4.4 Using ADMA

Figure 89 provides a flowchart of the data transfer procedure using ADMA.

Figure 89. Data transfer using DAT line sequence (using ADMA)

1. In the system memory, create a Descriptor table for ADMA.
2. In the ADMA System Address register, set the Descriptor address for ADMA.
3. In the Block Size register, set the value of the executed data byte length of one block.
4. In the Block Count register, set the number of data blocks to transfer, as explained in RM0089, Reference manual, SPEAr1340 address map and registers, MCIF chapter.
If the Block Count Enable in the Transfer Mode register is set to 1, the total data length can be designated by the Block Count register and the Descriptor Table. These two parameters indicate the same data length, but the transfer length is limited by the 16-bit Block Count register.
If the Block Count Enable in the Transfer Mode register is set to 0, the total data length is designated not by the Block Count register, but by the Descriptor Table. In this case, ADMA reads more data from the SD card than the length programmed in the descriptor. The excess read operation is aborted asynchronously, and the extra read data is discarded when the ADMA completes.
5. In the Argument register, set the argument value.
6. In the Transfer Mode register, set the appropriate value. The host driver determines Multi / Single Block Select, Block Count Enable, Data Transfer Direction, Auto CMD12 Enable and DMA Enable. Multi / Single Block Select and Block Count Enable are determined as explained in RM0089, Reference manual, SPEAr1340 address map and registers, MCIF chapter.
7. In the Command register, set the appropriate value.
Note: When writing to the upper byte [3] of the Command register, an SD command is issued and DMA is started.
8. Wait for a Command Complete interrupt.
9. Clear this bit: In the Normal Interrupt Status register, write 1 to Command Complete.
10. Read the Response register and get the necessary information for the issued command.
11. Wait for a Transfer Complete interrupt and an ADMA Error interrupt.
12. If Transfer Complete = 1, go to step 13. If ADMA Error Interrupt = 1, go to step 14.
13. Clear this bit: In the Normal Interrupt Status register, write 1 to Transfer Complete Status.
14. Clear this bit: In the Error Interrupt Status register, write 1 to ADMA Error Interrupt Status.
15. Abort the ADMA operation. To stop SD card operation, issue an abort command. If necessary, the host driver checks the ADMA Error Status register to detect why an ADMA error was generated.
Note: Steps 3 and 4 can be executed simultaneously; steps 6 and 7 can be executed simultaneously.

18.4.5 Abort transaction

An abort transaction is performed using CMD12 for an SD memory card and CMD52 for an SDIO card. The host driver (HD) must perform an abort transaction in two cases: when it stops an infinite block transfer, and when it stops a multiple block transfer that is still executing.
There are two types of abort command:
● Asynchronous abort: the HD can issue an abort command at any time unless Command Inhibit (CMD) = 1 in the current state register.
● Synchronous abort: the HD uses a Stop At Block Gap request in the Block Gap Control register to issue an abort command after the data transfer stops.

Synchronous abort

Figure 90 provides a flowchart of this procedure.

Figure 90. Synchronous abort sequence

1. Stop SD transactions: In the Block Gap Control register, set the Stop At Block Gap Request to 1.
2. Wait for a Transfer Complete interrupt.
3. Clear this bit: In the Normal Interrupt Status register, write 1 to Transfer Complete.
4. Issue an Abort command.
5. Do a software reset: In the Software Reset register, set both Software Reset for DAT Line and Software Reset for CMD Line to 1.
6. In the Software Reset register, check both the Software Reset for DAT Line and the Software Reset for CMD Line. If both are 0, the abort is complete. If either is 1, repeat this step.

18.4.6 Synchronization

Data path synchronization

For both read and write transactions, dual-port RAM is used to store data using one clock domain and to retrieve data using another clock domain.
Signal flow from clock domain A to clock domain B

In clock domain A, the input pulse (in_pulse) is latched at clock A, and the latched signal (in_pulse_lat) is inverted whenever an input pulse (in_pulse) is detected. In clock domain B, the latched signal is triple-flopped. The output pulse of clock domain B is generated by XORing the outputs of the second and third stage synchronizers (flip-flops).

Figure 91. Data path synchronization

18.4.7 CF4.1/xD1.3 AHB Host controller

The CFHOST controller provides a control interface to connect a CompactFlash Storage or CF+ Card to the AMBA AHB slave interface. It has the following features:
● True IDE operating mode only. For TrueIDE mode support, the CFHOST controller provides direct access to the ATA Command/Control register set in the CF/CF+ device. For data transfers in TrueIDE mode (PIO), the CPU can directly read/write the ATA DataPort register transparently to transfer data, or let the CFHOST controller perform the PIO transfer protocol by monitoring INTRQ from the CompactFlash device for every block transferred, with the DRQ size set to 512 (default), 1024, 2048 or 4096 bytes.
● Ultra DMA transfer protocol, to transfer data between the host controller and the CF/CF+ device, for increased data transfer rates.
● Advanced timing modes when generating transfers in True IDE mode and Ultra DMA mode.
● Transfer sizes of 1 byte and up to 256 blocks (where a block size is 512 bytes) between the AHB bus and the CF/CF+ device. The transfer size is always 16-bit wide. Block data transfers increase the performance of the CPU by offloading the complexity of performing individual transactions on the CF/CF+ interface to the CFHOST controller, with the CPU performing data read/write in burst mode on the AHB bus.
Dual internal data FIFOs are used in ping-pong fashion during the block transfer, and the AHB interface operates in PIO mode for efficient movement of data between the host memory and the controller's FIFOs.
● Complete PIO transfer protocol by monitoring the INTRQ signal and BSY/DRQ for every block (512 bytes by default). For write transfers, the CFHOST controller monitors the INTRQ signal and DRQ status for every sector (512 bytes default) that is to be transferred to the CF/CF+ device. For read transfers, the CFHOST controller monitors the INTRQ signal and DRQ status for every sector (512 bytes default) that is to be read from the CF/CF+ device. In this mode, the performance is increased dramatically as the CPU is only transferring data to/from the Data Port FIFOs in the CFHOST controller in burst mode, while the CFHOST controller is performing the actual PIO transfer on the CF/CF+ interface.
● A dual-clock based architecture (one clock for the CF interface and one clock for the AHB interface). The dual-clock architecture provides flexibility in running the CompactFlash (CF+) interface at the higher speeds that are part of the advanced timing modes.

Block diagram

Figure 92. CF/xD Host controller block diagram (blocks shown: AHB slave interface, operations registers, 256 x 32 RAM, timing control, CF/CF+ interface controller, xD interface, ECC)

CF host control and status registers set block

This block:
● Contains a set of registers used to operate the CFHOST controller. These registers include the configuration and status registers, interrupt control register, transfer control registers, and the data port registers (read/write data port register).
● Provides access to the ATA register set in the CF/CF+ device.
● Generates an interrupt to the CPU by monitoring various events.
What follows is a brief description of the registers and functions in this block.
For a complete description, refer to the host controller registers in RM0089, Reference manual, SPEAr1340 address map and registers.

Configuration registers are used to configure the various modes in the CFHOST controller that control the behavior of the interface/card. The main modes in which the CFHOST controller can operate are: memory mode, I/O mode, True IDE mode and Ultra DMA mode. The timing mode is programmed in the timing mode register. The frequency of the cficlk is programmed in the CFI clock configuration register. The CFI status register contains the current CF/CF+ card interface status.

Interrupt control registers contain the IRQ register to report various events, and the Interrupt Enable register to control the generation of the ahbtarget_interrupt signal based on various events. The interrupt block monitors the CF/CF+ interface signals and generates an interrupt when an event occurs. The event status is returned through the interrupt register. The CPU can enable/disable the interrupt generation for each individual event separately. The interrupt block also generates an interrupt when the currently invoked transfer on the CF/CF+ interface completes or, during the transfer, when the buffer is available to transfer the next block of data.

Transfer control registers are a set of registers used to generate transactions on the CF/CF+ interface. Based on the values programmed and the current mode of the CFHOST controller, transfers are generated on the CF/CF+ interface in addition to the Ultra DMA mode and the True IDE modes. The data is transferred through the read/write data port registers.

Data port registers are front end registers of the Read FIFO or Write FIFO. The write data port register writes data into the Write FIFO to transfer data to the CF/CF+ card. The read data port register reads data from the Read FIFO when transferring data from the CF/CF+ card to the CPU.
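The way a data port fronts a FIFO can be sketched in C: every access goes to the same register address, and the FIFO behind it advances on each access. The fragment below is only an illustration under stated assumptions. The function names, the use of 32-bit accesses, and the idea of passing the port address as a parameter are introduced here and are not taken from RM0089; only the 512-byte default block size and the existence of separate write and read data port registers come from this manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CF_BLOCK_BYTES 512u                   /* default block size */
#define CF_BLOCK_WORDS (CF_BLOCK_BYTES / 4u)  /* 128 accesses, assuming 32-bit accesses */

/* Hypothetical helper: push one block into the Write FIFO.
 * wr_port is wherever the write data port register is mapped. */
static size_t cf_write_block(volatile uint32_t *wr_port, const uint32_t *src)
{
    assert(wr_port != NULL && src != NULL);

    size_t i;
    for (i = 0; i < CF_BLOCK_WORDS; i++)
        *wr_port = src[i];      /* fixed address: the port feeds the FIFO */
    return i;                   /* number of 32-bit words written */
}

/* Hypothetical helper: drain one block from the Read FIFO.
 * rd_port is wherever the read data port register is mapped. */
static size_t cf_read_block(volatile uint32_t *rd_port, uint32_t *dst)
{
    assert(rd_port != NULL && dst != NULL);

    size_t i;
    for (i = 0; i < CF_BLOCK_WORDS; i++)
        dst[i] = *rd_port;      /* fixed address: each read pops the FIFO */
    return i;                   /* number of 32-bit words read */
}
```

In a real driver the port pointers would be derived from the controller base address and the register offsets listed in RM0089, and each block would be moved only after the corresponding interrupt indicates that the buffer is available.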
Extended data port registers cover a range of addresses (512 bytes each for write and read) that acts as a front end to the Write FIFO or Read FIFO. The extended data port can be used when the CPU wishes to initiate a burst transfer with incrementing addresses. The extended write data port register space (offset from 0x0200 to 0x03FC) is used to write data into the Write FIFO, similarly to the write data port register at offset 0x0024. The extended read data port register space (offset from 0x0400 to 0x05FC) is used to read data from the Read FIFO, similarly to the read data port register at offset 0x0028.

TrueIDE registers provide a window with which the CPU can directly access the ATA registers in the CF/CF+ device. All TrueIDE registers are 8-bit registers except the DataPort register, which is a 16-bit register. Accesses to these registers are treated as non-posted for read/write operations, and the transfers on the AHB interface are extended until the transfer is completed on the CF/CF+ interface. Accesses to these registers are completed with an Error response when the interface is not operating in TrueIDE mode. The CPU can use the ATA DataPort if it wishes to directly transfer the data between the CPU and the CF/CF+ device using the TrueIDE PIO mode transfer protocol. The direct access of the ATA DataPort is treated as transparent mode, with the CFHOST controller providing only bus protocol translation between the AHB bus and the CF/CF+ interface.

CF host timing control block

The timing control block generates the timing information to the CF/CF+ interface controller block while the interface block is generating transfers on the CF/CF+ interface. The timing information is based on the cficlk frequency, and on the current CF/CF+ card timing mode.
The timing information is critical for the correct operation of the CFHOST controller to prevent violating the timing protocol when generating transfers on the CF/CF+ interface, and for the correct operation of the CF/CF+ card. The timing information is hard coded in this block based on the timing values called for in the specification. The timing information is divided based on the host controller's operating mode and transfer mode.

CF host transaction controller block

The transaction controller is the main control for the CFHOST controller that manages transaction sequence generation on the CF/CF+ interface. Based on the current transfer mode of the CFHOST controller, the controller generates the transfer sequence to the CF/CF+ interface controller when the Transfer Control register is programmed to initiate a transfer.

In TrueIDE PIO mode, the controller can perform PIO block transfers to the CF/CF+ device. The controller completely handles the PIO transfer protocol by monitoring the INTRQ signal and the BSY/DRQ bit for every block transfer until the complete data is transferred. The INTRQ signal is blocked from the CPU until the PIO transfer completes, at which point the INTRQ is passed through to the host CPU. The address is always fixed to 0b000 to address the Data Port register. The transfer count must be a multiple of the DRQ block size that is programmed in the CFHOST controller and the CF/CF+ device. Supported DRQ block sizes are 512 bytes (default), 1024 bytes, 2048 bytes and 4096 bytes. The DRQ block size is also used to monitor the INTRQ signal for every new block. Once the transfer completes, the INTRQ from the CF/CF+ device is forwarded to the AHB bus for the CPU to process the interrupts.

In DMA mode, the controller can perform either TrueIDE MultiWord DMA transfers or UltraDMA transfers to the CF/CF+ device. The transfer size can be from 2 to 65536 bytes.
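The size constraints stated above for PIO and DMA transfers can be expressed as simple checks that a driver might run before programming a transfer. The C fragment below is a hedged sketch: all function names are hypothetical and do not correspond to any register or API in RM0089. The supported DRQ block sizes, the "multiple of the DRQ block size" rule, and the 2 to 65536 byte DMA range are the only facts taken from this manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Supported DRQ block sizes: 512 (default), 1024, 2048, 4096 bytes. */
static bool cf_drq_size_supported(uint32_t drq_bytes)
{
    return drq_bytes == 512u || drq_bytes == 1024u ||
           drq_bytes == 2048u || drq_bytes == 4096u;
}

/* Number of DRQ blocks in a PIO transfer, or 0 if transfer_bytes is not a
 * multiple of the DRQ block size. Passing an unsupported DRQ size is
 * treated as a caller bug (assumption made for this sketch). */
static uint32_t cf_pio_block_count(uint32_t transfer_bytes, uint32_t drq_bytes)
{
    assert(cf_drq_size_supported(drq_bytes));

    if (transfer_bytes % drq_bytes != 0u)
        return 0u;
    return transfer_bytes / drq_bytes;
}

/* DMA transfer size must be in the range 2..65536 bytes. */
static bool cf_dma_size_valid(uint32_t transfer_bytes)
{
    return transfer_bytes >= 2u && transfer_bytes <= 65536u;
}
```

For instance, a 4096-byte PIO transfer with the default 512-byte DRQ size spans 8 DRQ blocks, so the controller monitors INTRQ eight times before forwarding it to the CPU.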
The DMA protocol on the CF/CF+ interface is completely handled by the CFHOST controller, which shields the host CPU from the complexities of the protocol.

CF host data FIFO block

The data FIFO block contains a FIFO that is used for read/write transfers from/to the CF/CF+ card. This is an asynchronous FIFO, 32 bits wide and 256 words deep, with one side operating on ahbclk while the other side operates on clk_xin. For write transfers to the CF/CF+ card, this FIFO is called the Write FIFO, and for read transfers from the CF/CF+ card, this FIFO is called the Read FIFO.

The Write FIFO is used when transferring data from the CPU to the CF/CF+ card (write transfers). Once the transfer control registers are programmed, the CPU writes the data to be transferred into the Write FIFO by means of the write data port register. The CFHOST controller translates the 32-bit data into 16-bit wide data. After each block of data is transferred into the Write FIFO, the CPU waits for a Buffer Available interrupt to transfer the next block of data (up to the value programmed in the Transfer Block Count register) into the Write FIFO.

The Read FIFO is used when transferring data from the CF/CF+ card to the CPU (read transfers). Once the transfer control registers are programmed, the CPU reads the data from the Read FIFO by means of the read data port register. The CFHOST controller assembles the 32-bit data from multiple 16-bit transfers on the CF interface. For each block of data read from the CF/CF+ card, the controller asserts a Buffer Available interrupt to indicate that there is data available in the FIFO.

CompactFlash/CF+ interface block

The CF/CF+ interface block interfaces with the CF/CF+ interface signals and performs the PIO/DMA transfers based on the transaction controller instructions. The interface block deals with one memory, I/O, PIO, or DMA burst transfer at a time, while the transaction controller maintains the overall transfer counts.
The interface controller gets the timing information from the timing controller block and uses this timing information when performing PIO or DMA transfers.

xD Interface Block

AHB interface. The AHB slave block houses the operational registers and handles the reading and writing of these registers by the ARM processor.

Synchronization module (SYNC) has handshake logic to communicate with the AHB interface on one side and with the xD card interface on the other side.

ECC detection & correction module includes the ECC detection and correction modules, and calculates the ECC code. The calculated ECC code is stored in the xD card ECC area during the xD card write command. The calculated ECC code is compared with the ECC received from the xD card. The error correction module corrects a 1-bit error in a byte.
Note: 3 bytes of ECC are calculated for every 256 bytes in the Main Area. ECC is not calculated for the Redundant Area.

xD controller handles all command, address, and data sequences, manages all hardware protocols, and enables users to access xD memory by reading or writing into the AHB slave operational registers.

19 Giga/Fast Ethernet controller (GMAC)

This chapter focuses on GMAC functionality and operation. For the GMAC feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU
For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

19.1 Overview

The GMAC IP provides the capability to transmit and receive data over Ethernet.

19.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

19.3 Clocks

See Chapter 5: Reset and clock generator (RCG).
19.4 Functional description

The Giga/Fast Ethernet controller supports the following interfaces:
● MII: Media independent interface
● GMII: Gigabit media independent interface
● RMII: Reduced MII
These interfaces are multiplexed on the device pads. To select between the available interfaces, you must configure the miscellaneous register GMAC_CLK_CFG[5:3], macphy_sel.
This section describes both normal and alternate/enhanced descriptor formats.
Note: In SPEAr1340, only the alternate/enhanced format is supported by hardware.

19.4.1 Descriptors

Alternate or enhanced descriptors

The alternate (or enhanced) descriptor structure has 8 DWORDS (32 bytes). The features of the alternate descriptor structure are:
● The alternate descriptor structure has been implemented to support buffers of up to 8 KB (useful for Jumbo frames).
● There is a reassignment of control and status bits in TDES0, TDES1, RDES0 (Advanced timestamp or IPC full offload configuration), and RDES1.
● The transmit descriptor stores the timestamp in TDES6 and TDES7 for the Advanced Timestamp.
● The receive descriptor structure is also used for storing the extended status (RDES4) and timestamp (RDES6 and RDES7) for the advanced timestamp feature or IPC full offload feature.
● For the Timestamp feature, the software needs to allocate 32 bytes (8 DWORDS) of memory for every descriptor. When Timestamping or the Receive IPC FullOffload engine are not enabled, the extended descriptors are not required and the software can use alternate descriptors with the default size of 16 bytes. The core also needs to be configured for this change using bit 7 (ATDS: alternate descriptor size) of the Bus Mode register.
● When the alternate descriptor is chosen without the Timestamp or Full IPC Offload feature, the descriptor size is always 4 DWORDs (DES0-DES3).
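The sizes listed above can be made concrete with a small C sketch. It is illustrative only: the type name `gmac_alt_desc`, the macro `GMAC_BUS_MODE_ATDS`, and the helper `gmac_desc_stride` are names invented here, not identifiers from RM0089 or from any ST driver. What comes from this manual is the 8-DWORD (32-byte) extended layout, the 16-byte (4-DWORD) default size, and the use of bit 7 (ATDS) of the Bus Mode register to select between them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Alternate (enhanced) descriptor, extended form: 8 DWORDs. */
struct gmac_alt_desc {
    uint32_t des0;  /* control and status bits, including OWN */
    uint32_t des1;  /* buffer 1 and buffer 2 byte counts */
    uint32_t des2;  /* buffer 1 address */
    uint32_t des3;  /* buffer 2 address or next descriptor address */
    uint32_t des4;  /* extended status in receive descriptors (RDES4) */
    uint32_t des5;  /* reserved in the transmit layout */
    uint32_t des6;  /* timestamp low */
    uint32_t des7;  /* timestamp high */
};

static_assert(sizeof(struct gmac_alt_desc) == 32u,
              "extended descriptor must be 8 DWORDs (32 bytes)");

/* Bit 7 of the Bus Mode register: ATDS, alternate descriptor size. */
#define GMAC_BUS_MODE_ATDS (1u << 7)

/* Hypothetical helper: distance in bytes between consecutive descriptors
 * that software must allocate for a given Bus Mode register value. */
static size_t gmac_desc_stride(uint32_t bus_mode)
{
    return (bus_mode & GMAC_BUS_MODE_ATDS) ? 32u : 16u;
}
```

A driver that enables timestamping would set ATDS and allocate descriptors 32 bytes apart; with ATDS clear, only DES0 to DES3 are valid and a 16-byte spacing is sufficient.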
The bit mapping of the alternate descriptor structure (in little-endian mode) is given below.
Note: The effect of big-endian mode (byte swap) applies to this descriptor structure as well.
When the alternate descriptor with only Full IPC Checksum Offload (Type 2) is selected, it is not backward compatible with respect to status bits [7, 5, 0] in RDES0. In this mode, you should enable the extended descriptor mode (8 DWORDS) to get the IPC checksum engine status in RDES4.

Transmit descriptors

Figure 93 shows the transmit descriptor structure. The application software must program the control bits TDES0[31:20] during descriptor initialization. When the DMA updates the descriptor, it writes back all the control bits except the OWN bit (which it clears) and updates the status bits [19:0]. Table 118 describes transmit descriptor word 0 (TDES0) through word 3 (TDES3).
With advanced timestamp support, the timestamp snapshot can be enabled for a given frame by setting TTSE: Transmit Timestamp Enable (TDES0 bit 25). When the descriptor is closed (when the OWN bit is cleared), the timestamp is written into TDES6 and TDES7. This is indicated by the status bit TTSS: Transmit Timestamp Status (TDES0 bit 17). The contents of TDES6 and TDES7 are listed in Table 119.
Note: When either the advanced timestamp or IPC offload (Type 2) feature is enabled, the software should set the DMA Bus Mode register[7], so that the DMA operates with extended descriptor size. When this control bit is reset, the TDES4-TDES7 descriptor spaces are not valid.

Figure 93. Transmitter descriptor fields - alternate (enhanced) format
TDES0: OWN [31], Ctrl [30:26], TTSE [25], RES [24], Ctrl [23:20], RES [19:18], TTSS [17], Status [16:0]
TDES1: RES, Buffer 2 Byte Count [28:16], RES, Buffer 1 Byte Count [12:0]
TDES2: Buffer 1 Address [31:0]
TDES3: Buffer 2 Address [31:0] or Next Descriptor Address [31:0]
TDES4: Reserved
TDES5: Reserved
TDES6: Transmit Time Stamp Low [31:0]
TDES7: Transmit Time Stamp High [31:0]

The DMA always reads or fetches four DWORDS of the descriptor from system memory to obtain the buffer and control information, as shown in Figure 94. For AV feature support, TDES0 has additional control bits [6:3] for channel 1. For channel 0, bits 6:3 are ignored. Table 118 describes bits 6:3.

Figure 94. Transmit descriptor fetch (read) for alternate (enhanced) format
TDES0: OWN [31], Ctrl [30:26], TTSE [25], RES [24], Ctrl [23:20], RES, Reserved for Status [17:7], SLOT Number [6:3], Reserved for Status [3:0]
TDES1: RES, Buffer 2 Byte Count [28:16], RES, Buffer 1 Byte Count [12:0]
TDES2: Buffer 1 Address [31:0]
TDES3: Buffer 2 Address [31:0] or Next Descriptor Address [31:0]

Table 118. Transmit descriptor words 0 through 3 (TDES0-TDES3)
Bit    Description
TDES0
31     OWN: Own Bit. When set, this bit indicates that the descriptor is owned by the DMA. When this bit is reset, it indicates that the descriptor is owned by the Host. The DMA clears this bit either when it completes the frame transmission or when the buffers allocated in the descriptor are read completely. To avoid a possible race condition between fetching a descriptor and the driver setting an ownership bit, set the ownership bit of the frame's first descriptor after all subsequent descriptors belonging to the same frame are set.
30     IC: Interrupt on Completion. When set, this bit sets the Transmit Interrupt (Register 5[0]) after the present frame has been transmitted.
29     LS: Last Segment. When set, this bit indicates that the buffer contains the last segment of the frame. When this bit is set, the TBS1: Transmit Buffer 1 Size or TBS2: Transmit Buffer 2 Size field in TDES1 should have a non-zero value.
28     FS: First Segment. When set, this bit indicates that the buffer contains the first segment of a frame.
27     DC: Disable CRC. When this bit is set, the GMAC does not append a cyclic redundancy check (CRC) to the end of the transmitted frame. This is valid only when the first segment (TDES0[28]) is set.
26     DP: Disable Pad. When set, the GMAC does not automatically add padding to a frame shorter than 64 bytes. When this bit is reset, the DMA automatically adds padding and CRC to a frame shorter than 64 bytes, and the CRC field is added regardless of the state of the DC (TDES0[27]) bit. This is valid only when the first segment (TDES0[28]) is set.
25     TTSE: Transmit Timestamp Enable. When set, this bit enables IEEE1588 hardware time stamping for the transmit frame referenced by the descriptor. This field is valid only when the First Segment control bit (TDES0[28]) is set.
24     Reserved
23:22  CIC: Checksum Insertion Control. These bits control the checksum calculation and insertion. Bit encodings are as shown below.
       2'b00: Checksum insertion disabled.
       2'b01: Only IP header checksum calculation and insertion are enabled.
       2'b10: IP header checksum and payload checksum calculation and insertion are enabled, but the pseudoheader checksum is not calculated in hardware.
       2'b11: IP header checksum and payload checksum calculation and insertion are enabled, and the pseudoheader checksum is calculated in hardware.
       When the configuration parameter IPC_FULL_OFFLOAD is not selected, this field is reserved.
21     TER: Transmit End of Ring. When set, this bit indicates that the descriptor list reached its final descriptor.
    The DMA returns to the base address of the list, creating a descriptor ring.
20  TCH: Second Address Chained
    When set, this bit indicates that the second address in the descriptor is the Next Descriptor address rather than the second buffer address. When TDES0[20] is set, TBS2 (TDES1[28:16]) is a "don't care" value. TDES0[21] takes precedence over TDES0[20].
19:18  Reserved
17  TTSS: Transmit Timestamp Status
    This field is used as a status bit to indicate that a timestamp was captured for the described transmit frame. When this bit is set, the timestamp value captured for the transmit frame is available in TDES6 and TDES7 (see Table 119). This field is only valid when the descriptor's Last Segment control bit (TDES0[29]) is set.
16  IHE: IP Header Error
    When set, this bit indicates that the GMAC transmitter detected an error in the IP datagram header. The transmitter checks the header length in the IPv4 packet against the number of header bytes received from the application and indicates an error status if there is a mismatch. For IPv6 frames, a header error is reported if the main header length is not 40 bytes. Furthermore, the Ethernet Length/Type field value for an IPv4 or IPv6 frame must match the IP header version received with the packet. For IPv4 frames, an error status is also indicated if the Header Length field has a value less than 0x5.
15  ES: Error Summary
    Indicates the logical OR of the following bits:
    – TDES0[14]: Jabber Timeout
    – TDES0[13]: Frame Flush
    – TDES0[11]: Loss of Carrier
    – TDES0[10]: No Carrier
    – TDES0[9]: Late Collision
    – TDES0[8]: Excessive Collision
    – TDES0[2]: Excessive Deferral
    – TDES0[1]: Underflow Error
    – TDES0[16]: IP Header Error
    – TDES0[12]: IP Payload Error
14  JT: Jabber Timeout
    When set, this bit indicates the GMAC transmitter has experienced a jabber time-out.
    This bit is only set when the GMAC configuration register's JD bit is not set.
13  FF: Frame Flushed
    When set, this bit indicates that the DMA/MTL flushed the frame due to a software Flush command given by the CPU.
12  IPE: IP Payload Error
    When set, this bit indicates that the GMAC transmitter detected an error in the TCP, UDP, or ICMP IP datagram payload. The transmitter checks the payload length received in the IPv4 or IPv6 header against the actual number of TCP, UDP, or ICMP packet bytes received from the application and issues an error status in case of a mismatch.
11  LC: Loss of Carrier
    When set, this bit indicates that a loss of carrier occurred during frame transmission (that is, the gmii_crs_i signal was inactive for one or more transmit clock periods during frame transmission). This is valid only for frames transmitted without collision when the GMAC operates in Half-Duplex mode.
10  NC: No Carrier
    When set, this bit indicates that the Carrier Sense signal from the PHY was not asserted during transmission.
9  LC: Late Collision
    When set, this bit indicates that frame transmission was aborted due to a collision occurring after the collision window (64 byte-times, including preamble, in MII mode and 512 byte-times, including preamble and carrier extension, in GMII mode). This bit is not valid if the Underflow Error bit is set.
8  EC: Excessive Collision
    When set, this bit indicates that the transmission was aborted after 16 successive collisions while attempting to transmit the current frame. If the DR (Disable Retry) bit in the GMAC Configuration register is set, this bit is set after the first collision, and the transmission of the frame is aborted.
7  VF: VLAN Frame
    When set, this bit indicates that the transmitted frame was a VLAN-type frame.
6:3  CC: Collision Count (Status field)
    These status bits indicate the number of collisions that occurred before the frame was transmitted. This count is not valid when the Excessive Collisions bit (TDES0[8]) is set. The core updates this status field only in half-duplex mode.
    -or-
    SLOTNUM: Slot Number Control Bits in AV Mode
    These bits indicate the slot interval in which the data should be fetched from the corresponding buffers addressed by TDES2 or TDES3. When the transmit descriptor is fetched, the DMA compares the slot number value in this field with the slot interval maintained in the core (Register 11xx). It fetches the data from the buffers only if the values match. These bits are valid only for AV channel 1 (not channel 0).
2  ED: Excessive Deferral
    When set, this bit indicates that the transmission has ended because of excessive deferral of over 24,288 bit times (155,680 bit times in 1,000-Mbps mode or if Jumbo Frame is enabled) if the Deferral Check (DC) bit in the GMAC Control register is set high.
1  UF: Underflow Error
    When set, this bit indicates that the GMAC aborted the frame because data arrived late from the Host memory. Underflow Error indicates that the DMA encountered an empty transmit buffer while transmitting the frame. The transmission process enters the Suspended state and sets both Transmit Underflow (Register 5[5]) and Transmit Interrupt (Register 5[0]).
0  DB: Deferred Bit
    When set, this bit indicates that the GMAC defers before transmission because of the presence of carrier. This bit is valid only in Half-Duplex mode.

TDES1

31:29  Reserved
28:16  TBS2: Transmit Buffer 2 Size
    These bits indicate the second data buffer size in bytes. This field is not valid if TDES0[20] is set.
15:13  Reserved
12:0  TBS1: Transmit Buffer 1 Size
    These bits indicate the first data buffer size in bytes.
    If this field is 0, the DMA ignores this buffer and uses Buffer 2 or the next descriptor, depending on the value of TCH (TDES0[20]).

TDES2

31:0  Buffer 1 Address Pointer
    These bits indicate the physical address of Buffer 1. There is no limitation on the buffer address alignment.

TDES3

31:0  Buffer 2 Address Pointer (Next Descriptor Address)
    Indicates the physical address of Buffer 2 when a descriptor ring structure is used. If the Second Address Chained (TDES0[20]) bit is set, this address contains the pointer to the physical memory where the Next Descriptor is present. The buffer address pointer must be aligned to the bus width only when TDES0[20] is set. (LSBs are ignored internally.)

Table 119. Transmit descriptor words 6 and 7 (TDES6 and TDES7)

Bit  Description

TDES6

31:0  TTSL: Transmit Frame Timestamp Low
    This field is updated by the DMA with the least significant 32 bits of the timestamp captured for the corresponding transmit frame. This field has the timestamp only if the Last Segment bit (LS) in the descriptor is set and the Timestamp Status (TTSS) bit is set.

TDES7

31:0  TTSH: Transmit Frame Timestamp High
    This field is updated by the DMA with the most significant 32 bits of the timestamp captured for the corresponding transmit frame. This field has the timestamp only if the Last Segment bit (LS) in the descriptor is set and the Timestamp Status (TTSS) bit is set.

Receive descriptors

Figure 95 shows the structure of the receive descriptor, which has 32 bytes of descriptor data (8 DWORDs) when the Advanced Timestamp or IPC Full Offload feature is used.

Note: For each of these features, the software should set the DMA Bus Mode register[7] so that the DMA operates with the extended descriptor size. When this control bit is reset, RDES0[7] and RDES0[0] are always cleared and the RDES4-RDES7 descriptor space is not valid.

Figure 95. Receive descriptor fields - alternate (enhanced) format

RDES0: OWN [31] | Status [30:0]
RDES1: CTRL [31] | RES [30:29] | Buffer 2 Byte Count [28:16] | CTRL [15:14] | RES [13] | Buffer 1 Byte Count [12:0]
RDES2: Buffer 1 Address [31:0]
RDES3: Buffer 2 Address [31:0] or Next Descriptor Address [31:0]
RDES4: Extended Status [31:0]
RDES5: Reserved
RDES6: Receive Time Stamp Low [31:0]
RDES7: Receive Time Stamp High [31:0]

● Table 120 describes RDES0 through RDES3.
● The extended status is written as shown in Table 121. The extended status is written only when status related to IPC or timestamp is available. The availability of extended status is indicated by bit 0 of RDES0. This status is available for the Advanced Timestamp or IPC Full Offload features.
● RDES6 and RDES7 contain a snapshot of the timestamp. The availability of that snapshot is indicated by bit 7 in the RDES0 descriptor. Table 122 lists the contents of RDES6 and RDES7.

Table 120. Receive descriptor fields (RDES0 through RDES3)

Bit  Description

RDES0

31  OWN: Own Bit
    When set, this bit indicates that the descriptor is owned by the DMA of the GMAC Subsystem. When this bit is reset, it indicates that the descriptor is owned by the Host. The DMA clears this bit either when it completes the frame reception or when the buffers that are associated with this descriptor are full.
30  AFM: Destination Address Filter Fail
    When set, this bit indicates a frame that failed in the DA Filter in the GMAC Core.
29:16  FL: Frame Length
    These bits indicate the byte length of the received frame that was transferred to host memory (including CRC). This field is valid when Last Descriptor (RDES0[8]) is set and either the Descriptor Error (RDES0[14]) or Overflow Error bits are reset. The frame length also includes the two bytes appended to the Ethernet frame when IP checksum calculation (Type 1) is enabled and the received frame is not a MAC control frame.
    This field is valid when Last Descriptor (RDES0[8]) is set. When the Last Descriptor and Error Summary bits are not set, this field indicates the accumulated number of bytes that have been transferred for the current frame.
15  ES: Error Summary
    Indicates the logical OR of the following bits:
    – RDES0[1]: CRC Error
    – RDES0[3]: Receive Error
    – RDES0[4]: Watchdog Timeout
    – RDES0[6]: Late Collision
    – RDES0[7]: Giant Frame
    – RDES4[4:3]: IP Header/Payload Error
    – RDES0[11]: Overflow Error
    – RDES0[14]: Descriptor Error
    This field is valid only when the Last Descriptor (RDES0[8]) is set.
14  DE: Descriptor Error
    When set, this bit indicates a frame truncation caused by a frame that does not fit within the current descriptor buffers, and that the DMA does not own the Next Descriptor. The frame is truncated. This field is valid only when the Last Descriptor (RDES0[8]) is set.
13  SAF: Source Address Filter Fail
    When set, this bit indicates that the SA field of the frame failed the SA Filter in the GMAC Core.
12  LE: Length Error
    When set, this bit indicates that the actual length of the received frame and the Length/Type field do not match. This bit is valid only when the Frame Type (RDES0[5]) bit is reset.
11  OE: Overflow Error
    When set, this bit indicates that the received frame was damaged due to buffer overflow in the MTL.
10  VLAN: VLAN Tag
    When set, this bit indicates that the frame pointed to by this descriptor is a VLAN frame tagged by the GMAC Core.
9  FS: First Descriptor
    When set, this bit indicates that this descriptor contains the first buffer of the frame. If the size of the first buffer is 0, the second buffer contains the beginning of the frame. If the size of the second buffer is also 0, the next descriptor contains the beginning of the frame.
8  LS: Last Descriptor
    When set, this bit indicates that the buffers pointed to by this descriptor are the last buffers of the frame.
7  Timestamp Available / IP Checksum Error (Type 1) / Giant Frame
    When the Advanced Timestamp feature is present: When set, this bit indicates that a snapshot of the timestamp is written in descriptor words 6 (RDES6) and 7 (RDES7). This is valid only when the Last Descriptor bit (RDES0[8]) is set.
    When IP Checksum Engine (Type 1) is selected: When set, this bit indicates that the 16-bit IPv4 Header checksum calculated by the core did not match the received checksum bytes.
    Otherwise: When set, this bit indicates the Giant Frame Status. Giant frames are larger-than-1,518-byte (or 1,522-byte for VLAN) normal frames, and larger-than-9,018-byte (9,022-byte for VLAN) jumbo frames (when Jumbo Frame processing is enabled).
6  LC: Late Collision
    When set, this bit indicates that a late collision has occurred while receiving the frame in Half-Duplex mode.
5  FT: Frame Type
    When set, this bit indicates that the Receive Frame is an Ethernet-type frame (the LT field is greater than or equal to 16'h0600). When this bit is reset, it indicates that the received frame is an IEEE 802.3 frame. This bit is not valid for Runt frames less than 14 bytes.
4  RWT: Receive Watchdog Timeout
    When set, this bit indicates that the Receive Watchdog Timer has expired while receiving the current frame and the current frame is truncated after the Watchdog Timeout.
3  RE: Receive Error
    When set, this bit indicates that the gmii_rxer_i signal is asserted while gmii_rxdv_i is asserted during frame reception. This error also includes carrier extension error in GMII and Half-Duplex mode. The error can be of less/no extension, or error (rxd ≠ 0f) during extension.
2  DE: Dribble Bit Error
    When set, this bit indicates that the received frame has a non-integer multiple of bytes (odd nibbles). This bit is valid only in MII Mode.
1  CE: CRC Error
    When set, this bit indicates that a Cyclic Redundancy Check (CRC) Error occurred on the received frame. This field is valid only when the Last Descriptor (RDES0[8]) is set.
0  Extended Status Available / Rx MAC Address
    When either Advanced Timestamp or IP Checksum Offload (Type 2) is present, this bit, when set, indicates that the extended status is available in descriptor word 4 (RDES4). This is valid only when the Last Descriptor bit (RDES0[8]) is set.
    When neither the Advanced Timestamp feature nor IPC Full Offload is selected, this bit indicates the Rx MAC Address status. When set, this bit indicates that the value of Rx MAC Address registers 1 to 31 matched the frame's DA field. When reset, this bit indicates that the Rx MAC Address Register 0 value matched the DA field.

RDES1

31  DIC: Disable Interrupt on Completion
    When set, this bit prevents setting the Status Register's RI bit (CSR5[6]) for the received frame ending in the buffer indicated by this descriptor. This, in turn, disables the assertion of the interrupt to the Host due to RI for that frame.
30:29  Reserved
28:16  RBS2: Receive Buffer 2 Size
    These bits indicate the second data buffer size, in bytes. The buffer size must be a multiple of 4, 8, or 16, depending on the bus width (32, 64, or 128, respectively), even if the value of RDES3 (Buffer 2 address pointer) is not aligned to the bus width. If the buffer size is not an appropriate multiple of 4, 8, or 16, the resulting behavior is undefined. This field is not valid if RDES1[14] is set.
15  RER: Receive End of Ring
    When set, this bit indicates that the descriptor list reached its final descriptor. The DMA returns to the base address of the list, creating a descriptor ring.
14  RCH: Second Address Chained
    When set, this bit indicates that the second address in the descriptor is the Next Descriptor address rather than the second buffer address. When this bit is set, RBS2 (RDES1[28:16]) is a "don't care" value. RDES1[15] takes precedence over RDES1[14].
13  Reserved
12:0  RBS1: Receive Buffer 1 Size
    Indicates the first data buffer size in bytes. The buffer size must be a multiple of 4, 8, or 16, depending on the bus width (32, 64, or 128, respectively), even if the value of RDES2 (Buffer 1 address pointer) is not aligned. When the buffer size is not a multiple of 4, 8, or 16, the resulting behavior is undefined. If this field is 0, the DMA ignores this buffer and uses Buffer 2 or the next descriptor, depending on the value of RCH (bit 14).

RDES2

31:0  Buffer 1 Address Pointer
    These bits indicate the physical address of Buffer 1. There are no limitations on the buffer address alignment except for the following condition: the DMA uses the configured value for its address generation when the RDES2 value is used to store the start of frame. Note that the DMA performs a write operation with the RDES2[3/2/1:0] bits as 0 during the transfer of the start of frame, but the frame data is shifted as per the actual buffer address pointer. The DMA ignores RDES2[3/2/1:0] (corresponding to a bus width of 128/64/32) if the address pointer is to a buffer where the middle or last part of the frame is stored.

RDES3

31:0  Buffer 2 Address Pointer (Next Descriptor Address)
    These bits indicate the physical address of Buffer 2 when a descriptor ring structure is used. If the Second Address Chained (RDES1[14]) bit is set, this address contains the pointer to the physical memory where the Next Descriptor is present. If RDES1[14] is set, the buffer (Next Descriptor) address pointer must be bus width-aligned (RDES3[3, 2, or 1:0] = 0, corresponding to a bus width of 128, 64, or 32; LSBs are ignored internally).
    However, when RDES1[14] is reset, there are no limitations on the RDES3 value, except for the following condition: the DMA uses the configured value for its buffer address generation when the RDES3 value is used to store the start of frame. The DMA ignores RDES3[3, 2, or 1:0] (corresponding to a bus width of 128, 64, or 32) if the address pointer is to a buffer where the middle or last part of the frame is stored.

Table 121. Extended status - receive descriptor field 4 (RDES4)

Bit  Description

31:21  Reserved
20:18  VLAN Tag Priority Value
    These bits give the VLAN tag's user value in the received packet. These bits are valid only when RDES4 bits 16 and 17 are set.
17  AV Tagged Packet Received
    When set, this bit indicates that an AV tagged packet is received. Otherwise, this bit indicates that an untagged AV packet is received. This bit is valid when bit 16 (AV Packet Received) is set.
16  AV Packet Received
    When set, this bit indicates that an AV packet is received.
15  Reserved
14  Timestamp Dropped
    When set, this bit indicates that the timestamp was captured for this frame but was dropped in the MTL RxFIFO because of overflow. This bit is available only when the Advanced Timestamp feature is selected. Otherwise, this bit is reserved.
13  PTP Version
    When set, this bit indicates that the received PTP message has the IEEE 1588 version 2 format. When reset, it has the version 1 format. This is valid only if the message type is non-zero. This bit is available only if the Advanced Timestamp feature is selected; otherwise, it is reserved.
12  PTP Frame Type
    When set, this bit indicates that the PTP message is sent directly over Ethernet. When this bit is not set and the message type is non-zero, it indicates that the PTP message is sent over UDP-IPv4 or UDP-IPv6. The information on IPv4 or IPv6 can be obtained from bits 6 and 7. This bit is available only if the Advanced Timestamp feature is selected.
11:8  Message Type
    These bits are encoded to give the type of the message received.
    0000: No PTP message received
    0001: SYNC (all clock types)
    0010: Follow_Up (all clock types)
    0011: Delay_Req (all clock types)
    0100: Delay_Resp (all clock types)
    0101: Pdelay_Req (in peer-to-peer transparent clock)
    0110: Pdelay_Resp (in peer-to-peer transparent clock)
    0111: Pdelay_Resp_Follow_Up (in peer-to-peer transparent clock)
    1000: Announce
    1001: Management
    1010: Signaling
    1011-1110: Reserved
7  IPv6 Packet Received
    When set, this bit indicates that the received packet is an IPv6 packet.
6  IPv4 Packet Received
    When set, this bit indicates that the received packet is an IPv4 packet.
5  IP Checksum Bypassed
    When set, this bit indicates that the checksum offload engine is bypassed.
4  IP Payload Error
    When set, this bit indicates that the 16-bit IP payload checksum (that is, the TCP, UDP, or ICMP checksum) that the core calculated does not match the corresponding checksum field in the received segment. It is also set when the TCP, UDP, or ICMP segment length does not match the payload length value in the IP Header field.
3  IP Header Error
    When set, this bit indicates either that the 16-bit IPv4 header checksum calculated by the core does not match the received checksum bytes, or that the IP datagram version is not consistent with the Ethernet Type value.
2:0  IP Payload Type
    These bits indicate the type of payload encapsulated in the IP datagram processed by the Receive Checksum Offload Engine (COE). The COE also sets these bits to 3'b000 if it does not process the IP datagram's payload due to an IP header error or fragmented IP.
    3'b000: Unknown or did not process IP payload
    3'b001: UDP
    3'b010: TCP
    3'b011: ICMP
    3'b1xx: Reserved

Table 122. Time-stamp snapshot - receive descriptor fields 6 and 7 (RDES6 and RDES7)

Bit  Description

RDES6

31:0  RTSL: Receive Frame Timestamp Low
    This field is updated by the DMA with the least significant 32 bits of the timestamp captured for the corresponding receive frame. This field is updated by the DMA only for the last descriptor of the receive frame, which is indicated by the Last Descriptor status bit (RDES0[8]).

RDES7

31:0  RTSH: Receive Frame Timestamp High
    This field is updated by the DMA with the most significant 32 bits of the timestamp captured for the corresponding receive frame. This field is updated by the DMA only for the last descriptor of the receive frame, which is indicated by the Last Descriptor status bit (RDES0[8]).

19.4.2 Precision Time Protocol (PTP)

The IEEE 1588-2002 standard defines a protocol, Precision Time Protocol (PTP), which enables precise synchronization of clocks in measurement and control systems implemented with technologies such as network communication, local computing, and distributed objects. The PTP applies to systems communicating by local area networks supporting multicast messaging, including Ethernet. This protocol enables heterogeneous systems and supports system-wide synchronization accuracy in the sub-microsecond range with minimal network and local clock computing resources. The PTP is transported over UDP/IP. The system or network is classified into Master and Slave nodes for distributing the timing and clock information. Figure 96 shows the process that PTP uses for synchronizing a slave node to a master node by exchanging PTP messages.

Figure 96. Networked time synchronization

(Timing diagram with a Master Clock Time axis and a Slave Clock Time axis. The Sync message leaves the master at t1 and reaches the slave at t2. The Follow_Up message carries the value of t1. The Delay_Req message leaves the slave at t3 and reaches the master at t4. The Delay_Resp message carries the value of t4. Data at Slave Clock accumulates in the order t2; t1, t2; t1, t2, t3; t1, t2, t3, t4.)

Figure 96 shows the PTP process:
1. The master broadcasts the PTP Sync messages to all its nodes. The Sync message contains the master's reference time information. The time at which this message leaves the master's system is t1. This time must be captured, for Ethernet ports, at GMII or MII.
2. The slave receives the Sync message and also captures the exact time, t2, using its timing reference.
3. The master sends a Follow_Up message to the slave, which contains t1 information for later use.
4. The slave sends a Delay_Req message to the master, noting the exact time, t3, at which this frame leaves the GMII/MII.
5. The master receives the message, capturing the exact time, t4, at which it enters its system.
6. The master sends the t4 information to the slave in the Delay_Resp message.
7. The slave uses the four values of t1, t2, t3, and t4 to synchronize its local timing reference to the master's timing reference.

Most of the PTP implementation is done in software above the UDP layer. However, hardware support is required to capture the exact time when specific PTP packets enter or leave the Ethernet port at the GMII/MII. To get a snapshot of the time, the MAC requires a reference time in 64-bit format. The GMAC provides the following two options for using the reference timing source in a node:
● External Timestamp Input Option, which takes an external 64-bit timing reference and its clock as inputs, used to synchronize the timing reference to the MAC clock domain.
The 64-bit timing reference is split into two 32-bit signals: the upper 32 bits (providing the time in seconds) and the lower 32 bits (providing the time in nanoseconds).
● Internal Reference Time Option, which takes only the reference clock input and uses it to generate the reference time (also called the System Time) internally and capture timestamps. The generation, update, and modification of the System Time are described in the next paragraph.

System time register module

The System Time Generator module is optional and is not available if external time updating is enabled. The 64-bit time is maintained and updated using the input reference clock (clk_ptp_ref_i). This time is the source for taking snapshots (timestamps) of Ethernet frames being transmitted or received at the GMII.

The System Time counter can be initialized or corrected using the coarse correction method. In this method, the initial value or the offset value is written to the Timestamp Update register (see RM0089, Reference manual, SPEAr1340 address map and registers). For initialization, the System Time counter is written with the value in the Timestamp Update registers, while for system time correction, the offset value is added to or subtracted from the system time.

In the fine correction method, the frequency drift of a slave clock (clk_ptp_ref_i) with respect to the master clock is corrected over a period of time instead of in one clock cycle, as is done in coarse correction. In this method, an accumulator sums up the contents of the Addend register, as shown in Figure 97. The arithmetic carry that the accumulator generates is used as a pulse to increment the system time counter (both the accumulator and the addend are 32-bit registers).

Figure 97. System time update using fine method

(Block diagram: addend_val[31:0] and addend_updt feed the Addend register, whose contents are added into the Accumulator register. The incr_sub_sec_reg signal adds a constant value to the Sub-second register, and the incr_sec_reg signal increments the Second register.)

Transmit path functions

The MAC captures a timestamp when the Start Frame Delimiter (SFD) of a frame is sent on GMII/MII. Each transmit frame can be marked to indicate whether a timestamp should be captured for that frame. The user must specify the frames for which a timestamp is to be captured, because the MAC does not process the transmitted frames to identify the PTP frames. The MAC returns the timestamp to the software inside the corresponding transmit descriptor, in the TDES2 and TDES3 fields. The TDES2 field holds the 32 least significant bits of the timestamp. In the case of the alternate (enhanced) descriptor, the MAC writes the 64-bit timestamp in TDES6 and TDES7, respectively.

Receive path functions

In the default mode, that is, when the Advanced Timestamp feature is not selected, the MAC captures the timestamp of all frames received on the GMII or MII interface and does not process the received frames to identify the PTP frames. The DMA returns the timestamp to the software in the corresponding receive descriptor, using the RDES2 and RDES3 fields. RDES2 holds the 32 least significant bits of the timestamp, except as mentioned in "Receive Timestamp" on page 500. The timestamp is written only to the receive descriptor for which the Last Descriptor status field has been set to 1 (the EOF marker). When the timestamp is not available, an all-ones pattern is written to the descriptors (RDES2 and RDES3), indicating that the timestamp is not correct. If the software uses a control register bit to disable timestamping, the DMA does not alter RDES2 or RDES3. In the case of the alternate (enhanced) descriptor, the MAC writes the 64-bit timestamp in RDES6 and RDES7, respectively.
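As an illustration of the receive-side rules above, the following sketch models how driver software might pull the timestamp out of an alternate (enhanced) receive descriptor. It is a minimal Python model, not driver code: the function name `rx_timestamp` and the representation of a descriptor as a list of eight 32-bit words (RDES0 first) are assumptions made for the example, and it assumes the Advanced Timestamp feature is present so that RDES0[7] carries the timestamp-available meaning given in Table 120.

```python
# Bit positions taken from Table 120 (RDES0).
RDES0_OWN = 1 << 31           # descriptor is still owned by the DMA
RDES0_LAST_DESC = 1 << 8      # LS: last descriptor of the frame
RDES0_TS_AVAILABLE = 1 << 7   # snapshot written to RDES6/RDES7


def rx_timestamp(rdes):
    """Return (high, low) timestamp words from an enhanced receive descriptor.

    `rdes` is a hypothetical in-memory copy of the descriptor: a sequence of
    eight 32-bit integers, RDES0 first. Returns None when this descriptor
    does not carry a usable timestamp.
    """
    if len(rdes) != 8:
        raise ValueError("enhanced descriptor must have 8 DWORDs")
    rdes0 = rdes[0]
    if rdes0 & RDES0_OWN:
        return None   # the DMA has not handed the descriptor back yet
    if not rdes0 & RDES0_LAST_DESC:
        return None   # the timestamp is written only to the last descriptor
    if not rdes0 & RDES0_TS_AVAILABLE:
        return None   # no snapshot was stored for this frame
    high = rdes[7] & 0xFFFFFFFF   # RDES7: most significant 32 bits
    low = rdes[6] & 0xFFFFFFFF    # RDES6: least significant 32 bits
    return high, low


if __name__ == "__main__":
    # Example descriptor: host-owned, last descriptor, timestamp available.
    desc = [RDES0_LAST_DESC | RDES0_TS_AVAILABLE, 0, 0, 0, 0, 0, 123456789, 42]
    print(rx_timestamp(desc))
```

The ownership check comes first because the host should not interpret status bits while RDES0[31] is still set.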
The RDES0[7] field indicates whether the timestamp is updated in RDES6 and RDES7.

Timestamp error margin

As mentioned in the previous paragraph, the timestamp must be captured at the SFD of the transmitted and received frames at the GMII or MII interface. Because the reference timing source (the PTP clock, clk_ptp_ref_i) is different from the GMII or MII clocks, a small error margin is introduced by the transfer of information across asynchronous clock domains.

In the transmit path, the captured and reported timestamp has a maximum error margin of 2 PTP clocks. This means that the captured timestamp has the reference timing source value that is given within 2 clocks after the SFD has been transmitted on the GMII.

Similarly, in the receive path, the error margin is 3 GMII or MII clocks, plus up to 2 PTP clocks. The error margin due to the 3 GMII or MII clocks can be ignored by assuming that this constant delay is present in the system before the SFD data reaches the GMII or MII interface of the MAC.

Frequency range of reference timing clock

The timestamp information is transferred across asynchronous clock domains, from the MAC clock domain to the application clock domain. Therefore, a minimum delay is required between two consecutive timestamp captures. This delay is 4 clock cycles of GMII or MII and 3 clock cycles of the PTP clock. If the delay between two timestamp captures is less than this delay, the MAC does not take a timestamp snapshot for the second frame.

The maximum PTP clock frequency is limited by the maximum resolution of the reference time. The minimum PTP clock frequency depends on the time required between two consecutive SFD bytes.

19.4.3 Advanced Timestamps

In addition to the basic timestamp features, the GMAC supports the following advanced timestamp features:
● Supports the IEEE 1588-2008 (version 2) timestamp format.
● Provides an option to take a snapshot of all frames or only of PTP type frames and event messages.
● Provides an option to take the snapshot based on the clock type: ordinary, boundary, end-to-end, and peer-to-peer.
● Provides an option to select the node to be a Master or Slave for ordinary and boundary clocks.
● Identifies the PTP message type, version, and PTP payload in frames sent directly over Ethernet and sends the status.
● Provides an option to measure sub-second time in digital or binary format.

Clock types

The GMAC supports the following clock types defined in the IEEE 1588-2008 standard:
● Ordinary Clock
● Boundary Clock
● End-to-End Transparent Clock
● Peer-to-Peer Transparent Clock

Ordinary clock

The ordinary clock in a domain supports a single copy of the protocol and has a single PTP state and a single physical port. It can be a grandmaster or a slave clock and supports the following features:
● Sends and receives PTP messages.
● Maintains the data sets such as timestamp values.

Boundary clock

The boundary clock is similar to the ordinary clock except for the following features:
● The clock data sets are common to all ports of the boundary clock.
● The local clock is common to all ports of the boundary clock.

End-to-end transparent clock

The end-to-end transparent clock supports the end-to-end delay measurement mechanism between slave clocks and the master clock. The end-to-end transparent clock forwards all messages like a normal bridge, router, or repeater. The residence time of a PTP packet is the time taken by the PTP packet to travel from the Ingress port to the Egress port.

The residence time of a SYNC packet inside the end-to-end transparent clock is updated in the correction field of the associated Follow_Up PTP packet before it is transmitted. Similarly, the residence time of a Delay_Req packet inside the end-to-end transparent clock is updated in the correction field of the associated Delay_Resp PTP packet before it is transmitted.
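The residence-time bookkeeping described above can be sketched in a few lines. This is a hedged illustration only: the function names are hypothetical, timestamps and correction values are plain integer nanoseconds (the correction field on the wire uses a scaled fixed-point format that is not modelled here), and the offset formula is the standard delay request-response arithmetic for the t1 to t4 exchange, assuming a symmetric path between master and slave.

```python
def residence_time_ns(ingress_ns, egress_ns):
    """Time a PTP packet spent between the Ingress and Egress ports."""
    if egress_ns < ingress_ns:
        raise ValueError("egress timestamp precedes ingress timestamp")
    return egress_ns - ingress_ns


def add_residence_time(correction_ns, ingress_ns, egress_ns):
    """Accumulate a residence time into the correction value carried by the
    associated Follow_Up (for Sync) or Delay_Resp (for Delay_Req) packet."""
    return correction_ns + residence_time_ns(ingress_ns, egress_ns)


def offset_and_delay(t1, t2, t3, t4, sync_corr_ns=0, delay_corr_ns=0):
    """Slave-side arithmetic on the four timestamps.

    sync_corr_ns and delay_corr_ns are the accumulated residence times for
    the Sync and Delay_Req directions. Returns (offset_from_master,
    one_way_delay), both in nanoseconds, assuming a symmetric path.
    """
    master_to_slave = (t2 - t1) - sync_corr_ns
    slave_to_master = (t4 - t3) - delay_corr_ns
    delay = (master_to_slave + slave_to_master) / 2
    offset = (master_to_slave - slave_to_master) / 2
    return offset, delay


if __name__ == "__main__":
    # Assumed scenario: slave runs 50 ns ahead, one-way delay is 200 ns,
    # Sync is held 30 ns and Delay_Req 40 ns inside a transparent clock.
    sync_corr = add_residence_time(0, ingress_ns=1100, egress_ns=1130)
    delay_corr = add_residence_time(0, ingress_ns=2050, egress_ns=2090)
    print(offset_and_delay(1000, 1280, 2000, 2190, sync_corr, delay_corr))
```

Without the correction terms, the 30 ns and 40 ns residence times would be counted as path delay, which is the error the transparent clock is meant to remove.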
Peer-to-peer transparent clock

The peer-to-peer transparent clock differs from the end-to-end transparent clock in the way it corrects and handles the PTP timing messages. In all other aspects, it is identical to the end-to-end transparent clock.

Reference timing source

The MAC supports the following reference timing source features:
● 48-bit seconds field
● Fixed pulse-per-second output

48-bit seconds field

The MAC supports an 80-bit timestamp with the following fields:
● UInteger48 secondsField
The seconds field is the integer portion of the timestamp in units of seconds and is 48 bits wide.
● UInteger32 nanosecondsField
The nanoseconds field is the fractional portion of the timestamp in units of nanoseconds. The nanoseconds field supports the following two modes:
– Digital rollover mode: the maximum value in the nanoseconds field is 0x3B9A_C9FF, that is, (10^9 - 1) nanoseconds.
– Binary rollover mode: the nanoseconds field rolls over and increments the seconds field after value 0x7FFF_FFFF.
You can set these modes by using bit 9 (TSCTRLSSR) of the Timestamp Control Register.

When the advanced timestamp feature is selected, the timestamp maintained in the MAC is still 64 bits wide.

Fixed pulse-per-second output

The GMAC supports a pulse-per-second (PPS) output that, by default, indicates a 1 second interval. The frequency of the PPS output can be changed by setting bits [3:0], PPSCTRL, in the PPS Control Register.

PPS start or stop time

The start time can initially be programmed in the Target Time registers. The start or stop time should be programmed with a time ahead of the current system time to ensure proper PPS signal output. If the application programs a start or stop time that has already elapsed, the MAC sets an error status bit indicating the programming error. If enabled, the MAC also sets the Target Time Reached interrupt event.
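Software can reduce the chance of the programming error described above by validating a start or stop time before writing it to the Target Time registers. The following C sketch is only an illustration under stated assumptions: sys_time_t and the function names are hypothetical, the current system time is assumed to have been read beforehand by the driver, and the limits are the digital and binary rollover maxima given earlier in this section.

```c
#include <stdbool.h>
#include <stdint.h>

#define SUBSEC_DIGITAL_MAX 0x3B9AC9FFu /* 999,999,999: digital rollover mode */
#define SUBSEC_BINARY_MAX  0x7FFFFFFFu /* binary rollover mode */

/* Hypothetical container for a seconds/sub-seconds time value. */
typedef struct {
    uint64_t sec;
    uint32_t subsec;
} sys_time_t;

/* True if the sub-second value is legal for the selected rollover mode. */
static bool subsec_valid(uint32_t subsec, bool digital_rollover)
{
    uint32_t max = digital_rollover ? SUBSEC_DIGITAL_MAX : SUBSEC_BINARY_MAX;
    return subsec <= max;
}

/* True if target is strictly later than now. */
static bool target_in_future(sys_time_t now, sys_time_t target)
{
    if (target.sec != now.sec)
        return target.sec > now.sec;
    return target.subsec > now.subsec;
}

/* Combined check to run before programming a PPS start or stop time. */
static bool pps_target_ok(sys_time_t now, sys_time_t target,
                          bool digital_rollover)
{
    return subsec_valid(target.subsec, digital_rollover) &&
           target_in_future(now, target);
}
```

Because the system time keeps advancing between the check and the register write, a driver would normally add a safety margin to the target time; the size of that margin is a design choice that this sketch leaves open.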
The application can cancel the start or stop request only if the corresponding start or stop time has not elapsed. If the time has elapsed, the cancel command has no effect.

PPS width and interval

The PPS width and interval are programmed as a number of units of the sub-second increment value.

Transmit path functions

The structure of the descriptor changes when you enable the advanced timestamp feature. The advanced timestamp feature is supported only through the Alternate (Enhanced) descriptor format. The descriptor is 32 bytes long (8 DWORDs) and the snapshot of the timestamp is written in descriptors TDES6 and TDES7.

Receive path functions

When the advanced timestamp feature is selected, the MAC processes the received frames to identify valid PTP frames. The DMA returns the timestamp to the software inside the corresponding Transmit and Receive Descriptor. The advanced timestamp feature is supported only with the 32-byte Alternate (Enhanced) descriptor. The extended status, containing the timestamp message status and the IPC status, is written in descriptor RDES4, and the snapshot of the timestamp is written in descriptors RDES6 and RDES7.

19.4.4 AV feature

The Audio Video (AV) feature enables transmission of time-sensitive traffic over bridged local area networks (LANs). The GMAC supports AV data transfer in 100 Mbps and 1000 Mbps modes, in full-duplex mode only.

A single master interface is connected to two DMA channels (channel 0, channel 1). A DMA arbiter arbitrates between all the paths (transmit and receive) in channel 0 and channel 1. Each channel has a separate Control and Status register (CSR) for managing the transmit and receive functions, descriptor handling, and interrupt handling.

Transmit path functions

The transmit path of channel 0 supports the strict priority algorithm and is used for best-effort traffic.
For a channel, the strict priority algorithm determines that a frame is available for transmission if the channel contains one or more frames. When the threshold mode for the MTL Tx FIFO is enabled, the strict priority algorithm determines that a frame is available for transmission if the channel contains a partial frame of size equal to the programmed threshold limit.

The transmit path of channel 1 supports traffic management by using the credit-based shaper algorithm. For a channel, the credit-based shaper algorithm determines that a frame is available for transmission if the following conditions are true:
● The channel contains one or more frames.
● The credit for the channel is positive as per the algorithm.

You can disable the credit-based shaper algorithm for channel 1. When the credit-based shaper algorithm is disabled for a channel, the channel uses the default strict priority algorithm.

Each transmit DMA has a separate descriptor chain for fetching the transmit data. The DMA arbiter determines which transmit channel gets access to the system bus. The transmit path has separate FIFOs (MTL layer) for each channel. The data fetched by the DMA is put in the respective FIFO. The traffic management and scheduler unit (TMS) controls which FIFO data is transmitted by the MAC.

If the credit-based shaper algorithm is enabled for channel 1, then the corresponding channel is selected for transmission if the following conditions are true:
● A frame is available in the channel and the channel has a positive or zero credit.
● The higher priority channel has no frame waiting in the FIFO.

If the credit-based shaper algorithm is disabled for channel 1, then the frame to be transmitted from a channel is selected based on the following priority scheme: channel 0 at priority 0 (low) and channel 1 at priority 1 (high).

Receive path functions

The receive paths of channels 0 and 1 are enabled by default.
The AV packets can be of the following two types:
● AV data packets: The AV data packets are always tagged. The tagged AV data packets are received based on the programmed priority value. You can program bits [18:16], AVP, in register 462 (AV MAC Control Register) to specify the channel to which an AV packet with a given priority must be sent.
● AV control packets: The AV control packets can be either tagged or untagged. The untagged AV control packets are received on Channel 0 by default. To receive these packets on Channel 1, you can program bits [25:24], AVCH, of register 462 (AV MAC Control Register) (offset 0x0738). Similar to the AV data packets, the tagged AV control packets are received based on the programmed priority value.

19.4.5 Energy efficient ethernet

Energy Efficient Ethernet (EEE) is an optional operational mode that enables the IEEE 802.3 Media Access Control (MAC) sublayer along with a family of physical layers to operate in the Low-Power Idle (LPI) mode. The EEE operational mode supports the IEEE 802.3 MAC operation at 100 Mbps, 1000 Mbps, and 10 Gbps.

The LPI mode allows power saving by switching off parts of the communication device functionality when there is no data to be transmitted and received. The systems on both sides of the link can disable some functionalities and save power during periods of low link utilization. The MAC controls whether the system should enter or exit the LPI mode and communicates this to the PHY.

The EEE specifies the capabilities negotiation methods that the link partners can use to determine whether EEE is supported and then select the set of parameters that are common to both devices.

Note: 1 Even if the MAC supports multiple PHY interfaces, you should activate the EEE mode only when the MAC is operating with the GMII or MII interface.
2 According to the Energy Efficient Ethernet standard (802.3az), the LPI mode is supported only in the full-duplex mode. Therefore, you should not enable the LPI mode when the MAC Transmitter is configured for the half-duplex mode.

19.5 Programming

This section describes how to initialize the DMA/GMAC registers in the proper sequence.
● Initializing DMA
● Initializing GMAC
● Performing normal receive and transmit operation
● Stopping and starting transmission
● GMII link transitions
● IEEE 1588 time stamping
● AV feature initialization steps
● Energy efficient ethernet initialization steps

19.5.1 Initializing DMA

Perform the following steps to initialize the DMA.
1. Provide a software reset to reset all GMAC internal registers and logic (Bus Mode Register, bit 0).
2. Wait for the completion of the reset process. Poll bit 0 of the Bus Mode Register, which is cleared only after the reset operation is completed.
3. Program the following fields in the Bus Mode Register:
a) Mixed Burst and AAL
b) Fixed burst or undefined burst
c) Burst length values and burst mode values
d) Descriptor Length (only valid if Ring Mode is used)
e) Tx and Rx DMA Arbitration scheme and two-level priority weight for the channel
4. Create a proper descriptor chain for transmit and receive. In addition, ensure that the DMA owns the receive descriptors by setting bit 31 of the descriptor. When OSF mode is used, at least two descriptors are required.
5. Make sure that your software creates three or more different transmit or receive descriptors in the chain before reusing any of the descriptors.
6. Initialize the receive and transmit descriptor list addresses with the base addresses of the receive and transmit descriptors (Receive Descriptor List Address Register and Transmit Descriptor List Address Register respectively).
7. Program the following fields in the DMA Operation Mode Register to initialize the mode of operation:
a) Receive and Transmit Store And Forward
b) Receive and Transmit Threshold Control (RTC and TTC)
c) Error Frame and undersized good frame forwarding enable
d) OSF Mode
8. Clear the interrupt requests by writing to those bits of the Status Register (interrupt bits only) that are set. For example, writing 1 into bit 16, the normal interrupt summary, clears this bit.
9. Enable the interrupts by programming the Interrupt Enable Register.
10. Repeat steps 3 through 9 for channel 1, which is dedicated to the AV feature.
11. Program the CBS control register and the idleSlope, sendSlope, hiCredit, and loCredit registers of channel 1.
12. Start the Receive and Transmit DMA by setting SR (bit 1) and ST (bit 13) of the control registers for all channels.

19.5.2 Initializing GMAC

The following GMAC initialization operations can be performed after DMA initialization. If the MAC initialization is done before the DMA is set up, enable the MAC receiver (last step below) only after the DMA is active. Otherwise, received frames fill the Rx FIFO and overflow it.

Note that step 1 differs depending on whether the RTBI PHY interface is enabled.
1. If the RTBI PHY interface is enabled:
a) Program the GMAC AN Control Register to enable Auto-negotiation, ANE (bit 12). Setting ELE (bit 14) of this register enables the PHY to loop back the transmit data, and RAN (bit 9) can be set to restart Auto-negotiation.
b) Check the GMAC AN Status Register for completion of the Auto-negotiation process. ANC (bit 5) should be set. The link status (bit 2), when set, indicates that the link is up.
If the RTBI PHY interface is not enabled:
a) Program the GMAC GMII Address Register to control the management cycles for the external PHY. For example, Physical Layer Address PA (bits 15-11).
In addition, set bit 0 (GMII Busy) for writing into PHY and reading from PHY.
b) Read the 16-bit data of GMII Data Register from the PHY for link up, speed of operation, and mode of operation, by specifying the appropriate address value in bits 15-11 of GMII Address Register.
2. Provide the MAC address registers (MAC Address0 High Register and MAC Address0 Low Register). Additional MAC addresses must be programmed appropriately.
3. Program the Hash Table High and Hash Table Low Registers.
4. Program the following fields in MAC Frame Filter to set the appropriate filters for the incoming frames:
a) Receive All
b) Promiscuous mode
c) Hash or Perfect Filter
d) Unicast, multicast, broadcast, and control frames filter settings
5. Program the following fields in Flow Control Register for proper flow control:
a) Pause time and other pause frame control bits
b) Receive and Transmit Flow control bits
c) Flow Control Busy/Backpressure Activate
6. Program the Interrupt Mask register bits, as required, and if applicable, for your configuration.
7. Program the appropriate fields in MAC Configuration Register, for example, the interframe gap during transmission and jabber disable. Based on the result of Auto-negotiation, you can set the Duplex mode (bit 11) or port select (bit 15).
8. Set the bits Transmit Enable (TE, bit 3) and Receive Enable (RE, bit 2) in MAC Configuration Register.

19.5.3 Performing normal receive and transmit operation

For normal operation, perform the following steps:
1. For normal transmit and receive interrupts, read the interrupt status. Then, poll the descriptors, reading the status of the descriptor owned by the Host (either transmit or receive).
2. Set appropriate values for the descriptors, ensuring that transmit and receive descriptors are owned by the DMA to resume the transmission and reception of data.
3. If the descriptors are not owned by the DMA (or no descriptor is available), the DMA goes into SUSPEND state. The transmission or reception can be resumed by freeing the descriptors and issuing a poll demand by writing 0 into the Tx/Rx poll demand register (Transmit Poll Demand Register and Receive Poll Demand Register).
4. The values of the current host transmit or receive descriptor address pointer can be read for debugging (Current Host Transmit Descriptor Register and Current Host Receive Descriptor Register).
5. The values of the current host transmit buffer address pointer and receive buffer address pointer can be read for debugging (Current Host Transmit Buffer Address Register and Current Host Receive Buffer Address Register).

19.5.4 Stopping and starting transmission

Perform the following steps to pause the transmission for some time:
1. Disable the Transmit DMA (if applicable), by clearing bit 13 (ST: Start/Stop Transmission Command) of Operation Mode Register.
2. Wait for any previous frame transmissions to complete. You can check this by reading the appropriate bits of Debug Register.
3. Disable the MAC transmitter and MAC receiver by clearing bit 3 (TE: Transmitter Enable) and bit 2 (RE: Receiver Enable) in MAC Configuration Register.
4. Disable the Receive DMA (if applicable), after making sure that the data in the Rx FIFO is transferred to the system memory (by reading Debug Register).
5. Make sure that both Tx FIFO and Rx FIFO are empty.
6. To restart the operation, first start the DMAs, and then enable the MAC Transmitter and Receiver.

19.5.5 GMII link transitions

Transmit and receive clocks are running when the link is down

Perform the following steps when the link is down but the Transmit and Receive clocks are running:
1. Disable the Transmit DMA (if applicable), by clearing bit 13 (ST) of Operation Mode Register.
2. Disable the MAC receiver by clearing bit 2 (RE) of MAC Configuration Register.
3. Wait for any previous frame transmissions to complete from the Tx FIFO. You can do this by reading the appropriate bits of Debug Register.
-or-
Flush the Tx FIFO for faster empty operation.
4. Disable the MAC transmitter by clearing bit 3 (TE) in MAC Configuration Register.
5. After the link is up, read the PHY registers to know the latest configuration and program the MAC registers accordingly.
6. Restart the operation by starting the Tx DMA, and then enabling the MAC Transmitter and Receiver. You do not need to disable the Rx DMA. Because the Receiver is disabled, the Rx FIFO does not receive any data.

Transmit and receive clocks are stopped when the link is down

1. Wait until the link is up and the Transmit and Receive clocks are active. When the Transmit and Receive clocks are stopped, disabling the transmit or receive operations does not have any effect. Therefore, the software must wait until the link is up again.
2. Disable the Transmit DMA (if applicable), by clearing bit 13 (ST) of the Operation Mode Register.
3. Disable the MAC receiver by clearing bit 2 (RE) of MAC Configuration Register.
4. Wait for any previous frame transmissions to complete from the Tx FIFO. You can do this by reading the appropriate bits of Debug Register.
-or-
Flush the Tx FIFO for faster empty operation.
5. Disable the MAC transmitter by clearing bit 3 (TE) in MAC Configuration Register.
6. After the link is up, read the PHY registers to know the latest configuration and program the MAC registers accordingly.
7. Restart the operation by starting the Tx DMA, and then enabling the MAC Transmitter and Receiver.

19.5.6 IEEE 1588 time stamping

Initializing system time generation

You can enable the timestamp feature by setting bit 0 of the Timestamp Control Register. However, the timestamp counter must be initialized after this bit is set.
Perform the following steps during GMAC core initialization:
1. Mask the Timestamp Trigger interrupt by setting bit 9 of Interrupt Mask Register.
2. Program bit 0 in Timestamp Control Register to enable time stamping.
3. Program the Sub-Second Increment Register based on the PTP clock frequency.
4. If you are using the Fine Correction approach, program the Timestamp Addend Register and set bit 5 of Timestamp Control Register.
5. Poll the Timestamp Control Register until bit 5 is cleared.
6. Program bit 1 of the Timestamp Control Register to select the Fine Update method (if required).
7. Program the System Time - Seconds Update Register and System Time - Nanoseconds Update Register with the appropriate time value.
8. Set bit 2 in Timestamp Control Register. The Timestamp counter starts operation as soon as it is initialized with the value written in the Timestamp Update registers.
9. Enable the MAC receiver and transmitter for proper time stamping.

Note: If timestamp operation is disabled by clearing bit 0 of Timestamp Control Register, you need to repeat all these steps to restart the timestamp operation.

System time correction

Use the following steps to synchronize or update the system time in one process (coarse correction method):
1. Set the offset (positive or negative) in the Timestamp Update registers.
2. Set bit 3 (TSUPDT) of the Timestamp Control Register.
3. The value in the Timestamp Update registers is added to or subtracted from the system time when the TSUPDT bit is cleared.

Use the following steps to synchronize or update the system time to reduce system-time jitter (fine correction method):
1. Calculate the rate by which you want to make the system time increments slower or faster.
2. Update the Timestamp Addend Register with the new value and set bit 5 of the Timestamp Control Register.
3. Wait for the time for which you want the new value of the Addend register to be active. You can do this by enabling the Timestamp Trigger interrupt after the system time reaches the target value.
4. Program the required target time in Target Time Seconds Register and Target Time Nanoseconds Register.
5. Unmask the Timestamp interrupt by clearing bit 9 of Interrupt Mask Register.
6. Set bit 4 in Timestamp Control Register.
7. When this trigger causes an interrupt, read the Interrupt Status Register.
8. Reprogram the Timestamp Addend Register with the old value and set bit 5 again.

19.5.7 AV feature initialization steps

Enabling slot number checking

You can use the slot number check feature to specify the intervals at which the channel 1 DMA fetches the frames from the AXI system bus. This feature is useful for a uniform and periodic transfer of the AV traffic from the host memory. The feature is available only when you enable time stamping and program the Sub-Second Increment Register.

Perform the following steps to enable the slot number checking:

Note: Perform these steps after Step 11 and before Step 12 of Section 19.5.1: Initializing DMA.

1. Enable time stamping by following the steps described in Initializing system time generation (Section 19.5.6).
2. Make sure that the SLOTNUM field (bits 6:3) of Transmit Descriptor Word 0 (TDES0) contains a valid slot number. You can read the current reference slot number from the Slot Function Control and Status register.
3. Set bit 0 (ESC: Enable Slot Comparison) of the Slot Function Control and Status register of a channel to enable the slot number checking.

Enabling average bits per slot reporting

The CBS Status register of the additional AV channels (channel 1 and channel 2) provides information about the average bits that are transmitted in a slot.
The software can asynchronously read this register to retrieve information about the average bits transmitted per slot.

Perform the following steps to enable average bits per slot reporting:
1. Enable time stamping by following the steps described in Initializing system time generation (Section 19.5.6).
2. Program bits 6:4 (SLC: Slot Count) of the CBS Control register of a channel with the number of slots over which the average transmitted bits per slot need to be computed.
3. Enable bit 17 (ABPSSIE: Average Bits Per Slot Interrupt Enable) of the CBS Control register of a channel to generate the average bits per slot interrupt.
Note: The frequency of this interrupt depends on the value programmed in Step 2. For example, when you program value 0 in the SLC field, the interrupt is generated every 125 microseconds. When not required, you can disable this interrupt to stop the interrupt flooding.
4. Read bits 16:0 (ABS: Average Bits per Slot) from the CBS Status register of a channel on each interrupt.

Note: The software can read the ABS bits in polling mode even if the ABPSSIE bit is not enabled. When high, bit 17 (ABSU: ABS Updated) of the CBS Status register indicates that a new value is updated in the ABS field.

19.5.8 Energy efficient ethernet initialization steps

You can configure the Energy Efficient Ethernet (EEE) feature in coreConsultant. Perform the following steps during GMAC core initialization:
1. Read the PHY register through the MDIO interface, check if the remote end has the EEE capability, and then negotiate the timer values.
2. Program the PHY registers through the MDIO interface (including the RX_CLK_stoppable bit that indicates to the PHY whether to stop the RX clock in LPI mode).
3. Program bits [25:16] (LIT: LPI LS TIMER) and bits [15:0] (TWT: LPI TW TIMER) in LPI Timers Control Register.
4. Read the link status of the PHY chip by using the MDIO interface and update bit 17 (PLS) of Register 12 (LPI Control and Status Register) accordingly.
This update should be done whenever the link status in the PHY chip changes.
5. Set bit 16 (LPIEN: LPI Enable) of LPI Control and Status Register to make the MAC enter the LPI state. The MAC enters the LPI mode after completing the transmission in progress and sets bit 0 (TLPIEN: Transmit LPI Entry).

Note: If you want to make the MAC enter the LPI state only after it completes the transmission of all queued frames in the Tx FIFO, you should stop the DMA before setting the LPIEN bit. For information about how to stop the DMA, see steps 1 and 2 in Section 19.5.4: Stopping and starting transmission.
If you want to switch off the CSR clock, GMII transmit clock, or power to the rest of the system during the LPI state, you should wait for the TLPIEN interrupt of LPI Control and Status Register to be generated. Restore the clocks before performing Step 6 when you want to come out of the LPI state.

6. Reset bit 16 (LPIEN: LPI Enable) of LPI Control and Status Register to bring the MAC out of the LPI state. The MAC waits for the time programmed in bits [15:0] (TWT: LPI TW TIMER) before setting the TLPIEX interrupt status bit and resuming the transmission.

20 USB 2.0 host controllers (UHC)

This chapter focuses on UHC functionality and operation. For the UHC feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU

For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

20.1 Overview

The SPEAr1340 device integrates 2 USB Host interfaces identified as UHC0 and UHC1. Each interface provides a high-speed Host controller (EHCI0, EHCI1) and a full-speed/low-speed Host controller (OHCI0, OHCI1).

Figure 98. UHC block diagram (blocks shown: AHB BIU on the AMBA AHB; USB 2.0 EHCI controller with EHCI operational block, list processor, SOF generator, packet buffer and root hub; two USB 1.1 OHCI controllers; UTMI+ PHY with Port0 and Port1 to the external pads)

20.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

20.3 Clocks

See Chapter 5: Reset and clock generator (RCG).

20.4 Functional description

AHB bus interface unit (BIU)

USB 2.0 Host access to the AHB bus is granted by the AHB bus interface unit (BIU), which consists of a master module and a slave module.

The AHB BIU slave module acts as a slave on the AHB and responds to all EHCI/OHCI operational register accesses from an AHB master. In particular, this module allows read/write access to its operational registers through the AHB bus.

Note: There is only a single AHB slave port in the AHB BIU slave module for access to both the EHCI and OHCI host controller registers.

The AHB BIU master module, acting as a master on the AHB, receives requests from the list processor block within the EHCI Host controller, and transfers data to and from system memory through the AHB bus. The AHB BIU master supports 8-, 16-, and 32-bit data transfers, and 32-bit address transfers.

Enhanced Host controller interface (EHCI)

The EHCI Host controller, compliant with the EHCI specification (version 1.0), is embedded within the UHC to support the 480 Mbps high-speed (HS) transactions of USB 2.0 HS devices. These are the EHCI main blocks:
● List processor
The list processor is the main block of the EHCI Host controller. The list processor is implemented with multiple state machines to perform the list service flow, which is set up by the host controller driver (HCD) according to the priority set in the operational registers.
In addition, the list processor contains a controller that interfaces with all the other EHCI Host controller blocks, such as the AHB BIU (master module), the packet buffer, the EHCI operational registers, the SOF generator and the root hub.
● Operational registers
This block stores the implemented EHCI capability and operational registers as defined in the USB EHCI specification. In addition, some specific registers are also implemented in this block, to enable the programming of parameters such as the packet buffer depth, break memory transfer, and frame length. The operational registers block interfaces with the AHB BIU (slave module), the list processor, and the root hub.
● Start-of-frame (SOF) generator
The SOF generator block implements the counter which generates the start-of-frame packets to supply micro-SOFs for each microframe. The SOF counter runs in the PHY clock domain. Microframe duration is derived from the EHCI frame length adjustment (FLADJ) register value. This ensures that the Host microframe duration and per-port microframe duration remain the same. This block interfaces with the list processor only.
● Packet buffer
The packet buffer (PBUF) block provides storage and control for IN/OUT data transactions, with a configured size of 1024 bytes (256 x 32 bits). The PBUF block interfaces with both the list processor and the root hub. Specifically, during an OUT transaction, the list processor fetches data from the system memory and writes it to the PBUF. During an IN transaction, the data is written to the PBUF by the root hub.
The packet buffer size depends on the system latency and bandwidth allocated to the EHCI Host controller. For example, if the PBUF size is programmed to 64 bytes, a 1024-byte IN transfer requires 1024/64 = 16 data transfers on the AHB bus.
If the system is not able to ensure EHCI access to the AHB bus for these 16 transfers with no breaks, then a buffer overrun occurs. In this case, to avoid buffer overrun or under-run, the PBUF size could be set to 1024 bytes.
● Root hub
The root hub (RH) block interfaces between the list processor and the USB PHY. It propagates reset and resume signals to downstream ports, and handles port connections and disconnections. The RH operates both on the local PHY clock (a free-running 30/60 MHz clock) and on the clock source from each physical port (30 MHz with a 16-bit interface).

Open Host controller interface (OHCI)

The OHCI Host controller, compliant with the OHCI specification (version 1.0a), is integrated in the UHC to support the 12 Mbps full-speed (FS) and the 1.5 Mbps low-speed (LS) operation of USB 1.1. An FS/LS device connected to port0 is managed by OHCI0, and one connected to port1 is managed by OHCI1.

The USB open Host controller is designed to be independent of the bus interface unit (BIU). The host bus is assumed to be at least 32 bits wide with adequate performance to support the data rate of the particular implementation (100 Mbit/s or higher plus overhead for DMA structures) as well as bounded latency so that the FIFOs can have a reasonable size.

The main blocks of the OHCI block are described below.

Figure 99. USB open Host controller block diagram (blocks shown: HCI slave block with OHCI registers, HCI master block, list processor block with ED and TD registers, 64 x 8 FIFO, root hub and Host SIE with DPLL, root hub config block with port state machines, and the external transceivers)

● HCI master block
The HCI master block is the interface between the HCI master interface logic block and the HCI bus. It converts all the cycles initiated by different blocks of the list processor through the HCI master interface logic block into HCI bus cycles according to the protocol defined for the HCI bus. In addition, it implements a state machine to read from and write to the DFIFO. When transferring the data returned by an endpoint, it reads the data from the DFIFO, merges it into DWORDs, and sends them to the application internal FIFO. Similarly, when reading the endpoint data from the system memory, after reading every DWORD from the application FIFO it splits the DWORD into 4 individual bytes and then sends them to the DFIFO. It also implements byte-alignment logic: when a write cycle is initiated by the FML block at an odd boundary (not a DWORD boundary), it ignores the lower 2 bits of the address (ties them to 0), so that the application always writes at a DWORD boundary, and manipulates the byte enables accordingly.
● HCI slave block
The HCI slave block is the slave on the HCI bus. It is the interface between the OHCI operational registers internal to the Host controller and the application. It updates the registers on writes and provides the register data on reads. All the slave accesses should be DWORD aligned.
Therefore, byte enables are not used in slave accesses.

● List processor block
The list processor block acts as the main controller of the entire controller. It has multiple state machines to implement List Service Flow, List Priority, USB-States, ED, TD Service, StatusWriteBack, TD Retirement, and so on, per the OHCI specification. In addition, this block implements a controller that interfaces with HCI_master and hsie, helping them in the data transfer from system memory to USB, and from USB to system memory. The following submodules are included:
– USB states
– List service flow
– ED-TD block
– HCI master interface logic
– Data read write logic

● RootHub and HSIE blocks
Because implementation varies, most of the functionality of the RootHub is implemented in the port configuration block. This logic is common to any user configuration. The logic in this block acts as a wrapper around the HSIE and interfaces with the Host controller list processor, FIFO and OHCI registers. This block also implements the control logic to synchronize the interface between the HSIE and the port S/M. This block implements the following submodules:
– Reset_Resume
– DPLL
– HSIE

Digital PLL block (DPLL)
The function of the DPLL block is to extract the clock and data information from the USB data received from the different transceivers. The digital PLL runs on a 48 MHz user-provided clock to extract the clock information from the USB for both full-speed and low-speed data. The two USB signals, D+ and D-, are passed through a differential receiver (external to the UHOSTC controller), and NRZI-formatted data is obtained from the output of the differential receiver. The output of the differential receiver is then used by the digital PLL to extract clock information. The PLL block also has SE0 detect logic to detect a single-ended zero (SE0) in the data stream.
The circuit in this module extracts the clock from either full-speed or low-speed data, as indicated by the SIE_Switch HCLK input from the SIETx state machine.

HSIE functionality
The function of the Host serial interface engine (HSIE) is to receive and transmit USB data over the D+ and D- lines in accordance with the USB protocol. During reception, the D+ and D- signals are passed through the differential receiver (which is external to the UHOSTC controller) to obtain a single-ended bit stream, which is passed through the PLL block to extract the clock and data information. The clock and data are passed to the SIE block to identify the sync pattern and for NRZI-to-NRZ conversion. The NRZ data is then passed through the bit stripper, which strips off the zeros inserted by bit stuffing. The data stream is initially passed through the PID decode and checker to identify the different PIDs. Depending on the type of PID, the HSIE block handles the protocol accordingly.

● RootHub port configuration
The port configuration block implements part of the RootHub logic. This block is separated from the main RootHub block to isolate the logic that varies with design requirements. In short, this block implements the part of the OHCI registers that is specific to the RootHub, and a state machine for every downstream port to control the port functional states. This block has the following submodules:
– RootHub port registers
– Port S/M
– Port receive
– Port resume
– Port MUX

21 USB OTG controller (UOC)
This chapter focuses on UOC functionality and operation.
For the UOC feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU
For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

21.1 Overview
UOC supports both device and host functions and complies fully with the On-The-Go Supplement to the USB 2.0 Specification, Revision 1.3a and Revision 2.0. It can be configured as a host-only or device-only controller, fully compliant with the USB 2.0 Specification. It supports high-speed (HS, 480 Mbps) transfers. UOC connects to the industry-standard AMBA High-Performance Bus (AHB) to communicate with the application and system memory, and is fully compliant with the AMBA Specification, Revision 2.0.

Figure 100. UOC module in SPEAr1340
[Block diagram. Blocks shown at SPEAr top level: UOC, USB PHY, RCG, MISC, PCM, OSCI, and an off-chip 5 V charge pump. Interfaces shown: AHB IF (master and slave), UTMI+ (parallel 16-bit IF and UOC IF). Signals shown: otg_hclk, otg_hreset_n, otg_utmi30_clk, otg_utmi_rst_n, otg_utmi_suspend_n, utmiotg_drvvbus, uoc_irq (ID[94]), usbphy_clkcore. Pins shown: USB_UOC_DP, USB_UOC_DM, USB_UOC_VBUS, USB_UOC_ID, USB_UOC_DRVVBUS, MCLK_XI, MCLK_XO.]

21.2 Pins
For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

21.3 Clocks
● utmi_clk: this is the UTMI+ clock. It is functionally used only when a UTMI PHY is selected, but is always used as the PHY domain clock during DFT Scan mode. Select utmi_clk as a test clock even when the core is configured for a non-UTMI PHY.
● hclk: this is the AHB clock. hclk is the scan clock for the core's AHB domain.
See also Chapter 5: Reset and clock generator (RCG).

22 PCI express controller (PCIe)
This chapter focuses on PCIe functionality and operation.
For the PCIe feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU
For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

22.1 Overview
The PCI Express (PCIe) core incorporates a dual mode (DM) core which can implement a PCIe interface for a PCIe Root Complex (RC) or Endpoint (EP). The dual mode core can operate in EP or RC port modes, depending on the value written in a register during PCIe configuration. The DM core can be switched between modes at runtime by applying a power-on reset.

The PCIe core is compliant with the PCI Express Base 2.0 specification and also with the PCIe 1.1 specification. The core features a proprietary, user-configurable, high-performance application interface for generating and receiving PCIe traffic. It is available with standard AMBA 3 AXI interfaces.

The PCIe core implements the three PCI Express protocol layers (Transaction Layer, Data Link Layer, and the MAC portion of the Physical Layer). It also implements the mode-specific functionality of the PCI Express Transaction Layer (XADM/RADM) for packet transmission, which sits between the application logic and the CXPL core.

As shown in Figure 101, a complete PCI Express Port solution includes the core, an analog PHY macro, and application logic to source and sink data.

Figure 101. PCIe port system block diagram
[Block diagram. From top to bottom: PCIe application (application logic, application registers, CPU or EEPROM), application interfaces, core (application-dependent part of the transaction layer, transaction layer, data link layer, physical layer MAC), PHY interface (PIPE), PIPE-compliant PHY, PCI Express Link.]

Figure 102.
PCIe integration in SPEAr1340
[Block diagram. Blocks shown at SPEAr top level: PCIe0, SATA0, demux, MIPHY single lane (with PLL), RCG, MISC, and the A9SM interrupt controller. Interfaces shown: pcie_axi_dbi IF, pcie_sata_axi_master IF, pcie_sata_axi_slave IF. Signals shown: pcie_p0_int, sata_p0_int, pcie_sata_p0_int (to the A9SM interrupt controller), pcie_p0_power_up_rst_n, pcie_p0_aux_clk_en, pcie_p0_device_present, pcie_p0_core_clken, pcie_aux_clk, pcie_miphy_p0_rst_phy_n, pcie_miphy_p0_clk_tx, pcie_miphy_p0_clk_rx, pcie_miphy_p0_data_tx, pcie_miphy_p0_data_rx, p1_rst_phy_n, p1_clk_auxi, p1_clk_rx, p1_clk_tx, p1_data_in, p1_data_out, pcie_sata_0_aclk, pcie_sata_0_aresetn, Pcie_sata_sel[0]. Pins shown: MIPHY_S_0_TXp, MIPHY_S_0_TXn, MIPHY_S_0_RXp, MIPHY_S_0_RXn, MIPHY_S_XTAL1, MIPHY_S_XTAL2. The MIPHY_single PLL control signal comes from a MISC register.]

22.2 Pins
For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

22.3 Clocks
The PCIe controller operates in the following clock domains:
● Application clock (ACLK): this is the AXI bus interface unit clock. It is used for the AXI master and the two AXI slave interfaces.
● RX clock (p1_clk_rx): PHY receive clock; recovered RX domain clock coming from the PHY. It runs at 125 MHz or 250 MHz depending on the selected speed and is asynchronous with ACLK.
● TX clock (p1_clk_tx): PHY transmit clock; this clock is generated by the PHY for clocking the PCIe core transmit section. It runs at 125 MHz or 250 MHz depending on the selected speed and is asynchronous with ACLK.

22.4 Resets
● ARESETn: AXI reset for the AXI master and the two AXI slave interfaces
● Reset_rx: PHY receive clock domain reset, asynchronous power-on reset input for the RX clock domain
● Reset_tx: PHY transmit clock domain reset, asynchronous power-on reset input for the TX clock domain

22.5 Interrupts
There is one interrupt output from the PCIe controller; it is the logical OR of the individual interrupt status bits (CR6_Register) in the PCIe application control registers.
This register reports interrupt and error conditions internal to the PCIe controller itself, as well as interrupt messages coming from the PCIe link. See also Appendix A: Interrupts.

The application logic in a PCI express endpoint may use one of three methods to signal an interrupt across the link:

● PCI legacy interrupt
PCI includes up to four virtual interrupt wires, referred to as INTA, INTB, INTC, and INTD. These wires are shared by all the PCI devices in the system. The PCI Express standard emulates this capability by providing Assert_INTx and Deassert_INTx Message packets sent through the PCI Express serial Link.

● MSI
A PCI express endpoint may signal an MSI by sending a standard PCI Express Posted Write packet towards the Root Port. The packet must contain a specific address and one of up to 32 data values. The varying data values, together with the address value, provide more detailed identification of interrupt events than legacy interrupts.

● MSI-X
An MSI-X interrupt is identical to an MSI, except that an Endpoint may use one of up to 2048 address and data pairs in the MSI-X Posted Write packet. Endpoints with MSI-X capability also include application logic to mask and hold pending interrupts, as well as a memory table for the address and data pairs. The large number of address values available to each Endpoint allows MSI-X Messages to be routed to different interrupt consumers in a system, as compared to the single address available to MSI packets. Upstream Switch Ports can send MSI-X packets; Root Ports cannot. In complex systems, MSI-X packets could be routed to devices other than the RC, including other Endpoints, based on the multiple address/data pairs available.

Only one of these capabilities is available at a time. When host software clears the MSI Enable bit, you may only use legacy interrupts. When host software sets the MSI Enable bit, you may only use MSI.
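
As an illustration, the selection rules given in this section can be written as a small decision function. This is only a sketch: the names IntMode and interrupt_mode are hypothetical, and the two arguments stand for the MSI Enable and MSI-X Enable bits written by host software; they are not register or driver names taken from this manual.

```python
from enum import Enum


class IntMode(Enum):
    """Interrupt mechanisms an endpoint may use (illustrative names)."""
    LEGACY = "legacy INTx"
    MSI = "MSI"
    MSIX = "MSI-X"
    UNDEFINED = "undefined"


def interrupt_mode(msi_enable: bool, msix_enable: bool) -> IntMode:
    """Return the mechanism allowed for a given pair of enable bits.

    Hypothetical helper: the arguments model the MSI Enable and
    MSI-X Enable bits set or cleared by host software.
    """
    if msi_enable and msix_enable:
        # Both enabled at once: functionality is stated to be undefined.
        return IntMode.UNDEFINED
    if msi_enable:
        return IntMode.MSI
    if msix_enable:
        return IntMode.MSIX
    # Neither capability enabled: only legacy interrupts may be used.
    return IntMode.LEGACY
```

Treating the "both enabled" case as an explicit value makes it easy for calling code to reject that configuration instead of silently picking one mechanism.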
If host software enables MSI or MSI-X, legacy interrupts are automatically disabled. Functionality is undefined if both MSI and MSI-X are enabled.

When the PCIe controller is set in RC mode, it accepts ASSERT_INTX and DEASSERT_INTX messages from the downstream component. CR6_Register contains eight bits dedicated to this kind of interrupt (4 related to assert and 4 to deassert). They are set depending on the packets coming from the link.

When the PCIe controller is set in EP mode, it can generate ASSERT_INTA and DEASSERT_INTA messages by setting the sys_int bit of CR0_Register.

22.6 Functional description
The main PCIe interfaces are shown in Figure 103, while the top-level structure of the PCIe core is shown in Figure 104 (the red line indicates the boundary of the PCIe controller).

Figure 103. PCIe main interfaces
[Block diagram. The PCIe controller is shown with: AXI Master IF, AXI Slave IF, AXI Slave IF for registers R/W, RST and CLK inputs, and the RX PIPE and TX PIPE connections to the PHY.]

Figure 104. DM core block diagram (with AHB/AXI bridge module)
[Block diagram. Blocks shown: DM core containing RADM, XADM, CXPL core, CDM (core registers), LBC, PMC and MSG_GEN; AHB/AXI bridge module; application logic and application registers; PHY; RAM. Interfaces shown: RCPL/RBYP, RTRGT1, RTRGT0, RTLI, XTLI, XALI, DBI, ELBI, SII, RAMI, VMI, MSI, MSI-X, RX PIPE, TX PIPE, Rx and Tx vendor messages, optional system status/control, CLK/RST.
Notes:
– Optional internal (iATU) and external (xATU) address translation unit interfaces are shown.
– Narrow arrows represent response signal paths to requests (broad arrows).
– In RC mode, the ELBI pins are present but not operational.]

The Common Xpress Port Logic (CXPL) module implements the basic functionality for the PCI Express Physical, Link, and Transaction Layers.
In addition to the CXPL, there are several top-level modules that provide the configuration and mode-specific features:
● Transmit application-dependent module (XADM)
● Receive application-dependent module (RADM)
● Configuration-dependent module (CDM)
● Power management controller (PMC)
● Local bus controller (LBC)
● Message generation (MSG_GEN)
● Hot plug control (HOTPLUG_CTRL)
The following sections describe in detail the main interfaces of the PCIe controller.

22.6.1 AXI bridge interface
The AXI bridge module acts as a bridge between the standard AXI interfaces and the Synopsys DesignWare PCIe core native interfaces. The bridge interconnects the AXI interfaces within an AMBA-embedded system with a remote PCIe link, as either a root complex port or an endpoint port.

The bridge supports three AXI interfaces: one for an AXI master, one for an AXI slave, and one for DBI access to the native PCIe core. The AXI master interface enables a remote PCIe device to read and write to an AXI slave connected to the AXI bridge. The AXI slave interface enables an AXI master to read and write through the AXI bridge to a remote PCIe device. The slave DBI enables an AXI master to read and write to registers inside the native PCIe core, or to the device-specific registers attached to the PCIe native core's ELBI (see Section 22.6.7: Local bus controller (LBC) and data bus interface (DBI)).

Throughout this document, the terms inbound and outbound are defined with respect to the AXI fabric. That is, inbound transactions are defined as the transactions presented by the native PCIe core's AXI master interface. Outbound transactions are defined as the transactions generated by an AXI master that targets a remote PCIe device.

Figure 105.
System level view of the PCIe AXI core
[Block diagram. On the core application side: application hardware and software (AXI master, AXI slave), AXI interconnect, AXI bridge (AXI slave DBI, AXI slave, AXI master), application logic on the SII interface, native PCIe core. On the core wire side: PHY and the PCIe remote link partner. OUTBOUND traffic flows from the application towards the link; INBOUND traffic flows from the link towards the application.]

Figure 106 shows the PCIe AXI core top-level interfaces (the red line indicates the boundary of the PCIe controller).

Figure 106. PCIe AXI core top-level interfaces
[Block diagram. The PCIe AXI core contains the AXI bridge module and the native PCIe core. Application AXI master (outbound requests) connects to an AXI bridge slave; application AXI master (register accesses) connects to an AXI bridge slave (DBI); application AXI slave (inbound requests) connects to the AXI bridge master. Other interfaces shown: XALI, RCPL/RBYP, RTRGT1, DBI, SII (application logic), RAMI (external bridge RAM), PIPE (PHY).]

Features
● AXI master and slave interfaces for inbound and outbound PCI express requests
● 32-bit address width for AXI master and slave interfaces
● 64-bit data width for AXI master and AXI slave interfaces for inbound and outbound requests
● 32-bit data width for AXI slave interface for register accesses (DBI)
● All types of PCI express transactions supported through the AXI bridge
● Little-endian operation

22.6.2 Common xpress port logic (CXPL)
The CXPL module implements a large portion of the transaction layer logic, all of the data link layer logic, and the MAC portion of the physical layer, including the Link Training and Status State Machine (LTSSM). The CXPL connects to the external PHY through the PIPE. Important aspects of the CXPL and overall core implementation include:
● Layer 3 (transaction layer) functionality is split between the XADM, RADM, CDM, and CXPL.
● Layer 1 (physical layer) is split across the PIPE such that the MAC functionality is in the core and the PHY functionality is implemented in the PIPE-compliant PHY.
● Receive and transmit path functionality is decoupled except where communication between the two is required (such as flow control and other low-level link management functions).
● CXPL contains six modules, three for transmission and three for reception, as shown in Figure 107.
– RTLH: Receive Transaction Layer Handler
– XTLH: Transmit Transaction Layer Handler
– RDLH: Receive Data Link Layer Handler
– XDLH: Transmit Data Link Layer Handler
– RMLH: Receive MAC Layer Handler
– XMLH: Transmit MAC Layer Handler
CXPL is compliant with the PCI express 2.0 specification with regard to the physical layer, data link layer and transaction layer.

Figure 107. CXPL module block diagram
[Block diagram. Receive path: Rx PIPE, RMLH, RDLH, RTLH, RTLI (to RADM). Transmit path: XTLI (from XADM), XTLH, XDLH, XMLH, Tx PIPE. The retry buffer control logic is attached to the transmit path.]

22.6.3 Transmit application-dependent module (XADM)
The XADM sits between the application logic and the CXPL core and implements the mode-specific functionality of the PCI express transaction layer for packet transmission. Figure 108 is a block diagram of the XADM. Its functions include arbitration, TLP formation, and credit checking. The transmit path uses a cut-through architecture. It does not implement transmit buffering or queues (other than the retry buffer).

Figure 108. XADM block diagram
[Block diagram. Application clients (requester and completer) connect over the XALI interfaces to the XADM, which contains Tx FC checking (using the transmit FC credits), transmit arbitration, TLP formation and an output MUX module towards the CXPL. MSG_GEN (MSG) and LBC (CPL) are internal sources of TLPs.]

Arbitration
XADM provides the arbitration of TLP transmission between the following:
● The transmit client interfaces (XALI0, XALI1). (XALI2 shown in Figure 108 is not present).
● Internally generated Messages from the MSG_GEN, triggered by PME, INTx (EP mode), errors, or application logic
● Internally generated completions:
– EP mode: Internally generated completions are responses for type 0 configuration read and write requests from upstream components, memory or I/O-mapped application register space read and write requests, or responses to error conditions (unsupported requests).
– RC mode: Internally generated completions are unsupported request or completer abort, as required by the incoming request filtering function of the RADM.
In general, all internally generated TLP requests have higher priority than client interfaces.

Usage models for the client interfaces include:
● A master is connected to each client interface (EP mode only). The XADM arbitrates among the client interfaces. There is no guarantee that order will be preserved among client interfaces. In some cases, a requester may consider implementing some ordering rules in the application logic, for example, holding off a Memory Read transaction until the Memory Write transaction is completed.
● A master is connected to Client1. A target completer is connected to Client0. The XADM arbitrates among the client interfaces. There is no guarantee that order will be preserved among client interfaces.
● A master for posted traffic is connected to Client0. A master for non-posted traffic is connected to Client1. This is a model with one type of TLP per client (Posted, Non-Posted, Completion).

Credit checking
The core checks that enough FC credits are available in the remote device for the specific type of transaction (P, NP, CPL) before allowing the transmission of a TLP. TLPs that pass the credit check are arbitrated according to the supported arbitration method.
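
The credit check can be pictured with a small behavioral model. The sketch below is a simplification under stated assumptions, not a description of the core's logic: the names Credits, data_credits_needed, can_transmit and transmit are hypothetical, credits are held as plain counters per TLP type (P, NP, CPL) instead of the modular counters used on a real link, and infinite credits are not modeled. It relies on the PCI Express rule that one data credit covers 4 DWORDs (16 bytes) of payload and that each TLP consumes one header credit.

```python
from dataclasses import dataclass

DATA_CREDIT_BYTES = 16  # one data credit covers 4 DWORDs of payload


@dataclass
class Credits:
    """Credits currently available at the remote device for one TLP type."""
    header: int
    data: int


def data_credits_needed(payload_bytes: int) -> int:
    """Round the payload size up to a whole number of data credits."""
    return -(-payload_bytes // DATA_CREDIT_BYTES)


def can_transmit(pools: dict, tlp_type: str, payload_bytes: int) -> bool:
    """Credit check: one header credit plus enough data credits."""
    pool = pools[tlp_type]  # tlp_type is "P", "NP" or "CPL"
    return pool.header >= 1 and pool.data >= data_credits_needed(payload_bytes)


def transmit(pools: dict, tlp_type: str, payload_bytes: int) -> bool:
    """Consume credits if the TLP passes the check; return False if blocked."""
    if not can_transmit(pools, tlp_type, payload_bytes):
        return False
    pool = pools[tlp_type]
    pool.header -= 1
    pool.data -= data_credits_needed(payload_bytes)
    return True
```

Because each TLP type has its own pool in this model, a blocked Posted request does not by itself consume or reduce the credits of the other types.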
Internally generated completions and messages are also gated by the arbitration logic, though at highest priority, and must also pass the FC credit test before they are accepted for transmission.

If the application is using a single transmit client interface for more than one Request type (for example, Posted and Non-Posted), and the current request (for example, a Posted Request) is being blocked due to lack of available FC credits, then that client interface is effectively blocked from sending other requests (for example, Non-Posted) even though credits may be available for that type. To avoid this situation, the application can use different transmit client interfaces for different request types.

22.6.4 Receive application-dependent module (RADM)
The RADM sits between the application logic and the CXPL core and implements the mode-specific functionality of the PCI Express Transaction Layer for TLP packet reception. Figure 109 shows a block diagram of the RADM. The RADM has four major functions:
● Sort and filter received TLPs
● Maintain the completion lookup table (CLT), which is used for completion tracking and completion timeout monitoring of transmitted Non-Posted requests
● Provide queuing (or bypass) of the received TLPs
● Output received TLPs to the core's receive interface (demux function)
The filtering rules and routing for all TLP receive options are configurable for all TLPs received.

Figure 109.
RADM block diagram
[Block diagram. Blocks shown inside the RADM: TLP filtering (requester, completer, filter), message processing (MSG), received CPL processing with the completion lookup table, queue, DEMUX, receive FC update, and trash (discarded TLPs). Interfaces shown: RTRGT1 and RCPL (to the application), RTRGT0 (to the LBC), MSG, ERR, lbc_CPL (to XADM), CFG data (CDM), ELBI, DBI, and the connection from the CXPL.
Notes:
– For RC, MSI and MSI-X are not available.
– For RC and DM (RC mode), the ELBI is not available.]

Posted and non-posted request and completion TLP processing
The RADM filter passes the Posted and Non-Posted Request and completion transactions (such as write transactions and memory reads) directly to the application through the RTRGT1 interface, or to RTRGT0 for internal modules, as determined by the filtering and routing rules for the current operating mode. The RADM filter segregates Posted and Non-Posted TLPs into valid supported and valid unsupported Requests, and forwards them to the queue. The filter processes each request and determines each TLP's destination along with other controls that may be needed to generate TLPs.

For requests that the core forwards to the RTRGT1 or Bypass interface, the application must process the request and generate the completion. For requests that the core forwards to RTRGT0, the core automatically generates the completion. The core automatically executes any required ELBI access before generating the completion.

The RADM demux is designed to mux out a received TLP to the RTRGT1 and RCPL/RBYP interfaces from single queue or multiple queue (DM/RC/EP) configurations. The filter determines the destination and the action for each TLP, then sends this to the queue. The demux decides whether to discard the TLP or forward it onto the RTRGT1, RTRGT0, RCPL or RBYP interfaces.

Received completion TLP processing
Received completions are filtered against the completion lookup table content before presenting the completion to the queue.
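
A completion lookup table of this kind can be sketched as a map from outstanding request tags to the time each request was sent. The model below is a hypothetical illustration, not the core's implementation: the class and method names are invented, time is passed in explicitly as a number, and each Non-Posted request is assumed to be answered by a single completion (split completions are not modeled).

```python
class CompletionLookupTable:
    """Toy model: tracks outstanding Non-Posted requests by tag."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self._pending = {}  # tag -> time at which the request was sent

    def request_sent(self, tag: int, now: float) -> None:
        """Record a transmitted Non-Posted request."""
        self._pending[tag] = now

    def completion_received(self, tag: int) -> bool:
        """Return True if the completion matches an outstanding request.

        A False result corresponds to a failed lookup (unexpected
        completion), which the caller would treat as an abort.
        """
        return self._pending.pop(tag, None) is not None

    def expired(self, now: float) -> list:
        """Return, and stop tracking, the tags whose completion is overdue."""
        late = sorted(tag for tag, sent in self._pending.items()
                      if now - sent > self.timeout)
        for tag in late:
            del self._pending[tag]
        return late
```

In this sketch the same table serves both purposes named in the text: matching received completions against transmitted requests, and detecting requests whose completion never arrived.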
The RADM also implements a completion time-out mechanism (via the completion lookup table) and notifies the application when an expected completion, corresponding to a transmitted Non-Posted TLP, does not arrive within a specified time.

Typically, infinite completion credits are advertised and the received completion is configured in bypass mode, which means that there is no queue in the core to store completions. Completions can be configured in store-and-forward mode if the application chooses to do so.

If a completion lookup or other completion filtering has failed, the core asserts an abort signal at the end of the transaction. If the core is configured to have completions in bypass mode, it is the application's responsibility to roll back any actions at the application's queue when an abort signal is asserted. If the core is configured with completions enqueued, the completion is discarded by the core and flow control credits are updated, as necessary, when an abort signal is detected.

Message processing
The RADM filter provides a message interface (grouped as part of the SII) to handle the message TLPs received from the upstream component. By default, the RADM filter processes the message and decodes the header before sending it to the application logic on the SII. You can also write to the Filter Mask registers to change this default and send the entire message TLP to the application, in addition to providing the decoded message on the SII.

22.6.5 Configuration-dependent module (CDM)
The CDM implements the standard PCI express configuration space and the core-specific register space. The CDM also requests the message generation module to send messages, as required, including MSI and interrupts.
The specific PCI Express configuration structures implemented in the CDM include the following:
● PCI-Compatible configuration registers
– RC mode: Type 1 header
– EP mode: Type 0 header
● PCI Capability Structures:
– PCI power management capability structure
– MSI capability structure
– MSI-X capability structure
– VPD (vital product data) capability
● PCI express capability structure
● PCI express extended capabilities:
– Advanced error reporting capability
– Virtual channel capability
– Device serial number capability
– Power budgeting extended capability

The configured device type (determined by CR0_Register[28:25], see PCIe application control registers) affects the behavior of the message generation engine and the error reporting mechanism, as well as some PCI express configuration space registers.

The CDM communicates with the application's host bus controller through the DBI.

22.6.6 Power management control (PMC)
The PMC module supports PCI software-compatible Power Management (PM) mechanisms and the native PCI Express Active State Power Management (ASPM). The PMC is the only module that must be powered by auxiliary power (Vaux) in the core when the core is in a lower power state. It is also the only module containing contexts that are resettable only at power-up.
The following features are implemented in the PMC module:
● ASPM support: L0s and L1
● Control of the LTSSM to perform link power management: L0, L1, L2 and L3
● Software-controlled device PM states: D0, D1, and D3hot/cold
● Generation of PM Message transmission requests
● Control of beacon generation
● Side-band wake mechanism:
– Supports application wake-up (for example, WOL and WAKE# signal support from the platform system)
– Generates wake to request the system to restore power and clock
● Power management event (PME) generation
● Output of current power state status to the application

22.6.7 Local bus controller (LBC) and data bus interface (DBI)
The LBC module provides a mechanism for a link partner PCIe device (in EP mode only) or a local CPU (through the DBI) to access:
● internal registers (in the CDM)
● external application registers connected externally to the ELBI.
Note: In RC mode, the application can access CDM registers or ELBI through the DBI. PCIe wire access (through RTRGT0) to the CDM registers or ELBI is not possible.

Figure 110 shows the location of the LBC within the PCIe core and its role in routing transactions.

Figure 110.
LBC context
[Block diagram. Blocks shown inside the PCIe core: RADM, XADM, CXPL core, LBC, CDM (core registers). Interfaces shown: RCPL/RBYP, RTRGT1, RTRGT0, RTLI, XTLI, RX PIPE, TX PIPE, DBI (from CPU or EEPROM), ELBI (to application registers), PHY. The request and response paths marked in the figure are:
– Incoming request is received from the PCIe remote link partner.
– Request is filtered and routed by the RADM via RTRGT0 to the local bus controller (LBC).
– LBC forwards the request to external registers via ELBI or to internal registers in the CDM.
– LBC forms a Cpl/CplD TLP with the response received from ELBI or CDM.
– PCIe core transmits the response (Cpl/CplD) to the remote link partner.
– Local CPU may also generate a register read/write request via DBI, and the response from ELBI or CDM is returned to it immediately.
Note: For a downstream port, the ELBI pins are present but not operational.]

The LBC provides a switched access function to internal registers (in the CDM) or external registers (via ELBI) from the local application processor (CPU) via the DBI, or from the remote application software (off the PCIe RX wire) via RTRGT0. Figure 111 illustrates the four possible request paths through the LBC.

Figure 111. LBC switch
[Block diagram. The LBC switches two request sources (inbound request through RTRGT0, and DBI) onto two targets (CDM access to the core's registers, and ELBI external local bus interface). The four paths are:
– Inbound PCIe request to RD/WR the PCIe core's internal config space registers
– Inbound PCIe request to RD/WR external application-specific registers
– Local CPU request (DBI) to RD/WR the PCIe core's internal config space registers
– Local CPU request (DBI) to RD/WR external application-specific registers]

The LBC also generates PCIe completions for requests coming from the PCIe wire through RTRGT0.

Simultaneous transactions
● The LBC is single-threaded and therefore the DBI and RTRGT0 cannot use the LBC at the same time. For example, during a RTRGT0 <-> ELBI transaction, a request on the DBI is not accepted until both parts of that transaction, [1] the request and [2] the response (completion generation), are completed. Therefore, it is not permissible to use the ELBI to drive the DBI.
● If the DBI and RTRGT0 present a request at the same time (regardless of the target/destination of each request), then the LBC grants access to RTRGT0.
Application registers are connected to the ELBI. These can be accessed by PCIe request TLPs over the PCIe link or by the DBI. The ELBI can only be accessed by CFG access (from DBI or PCIe wire).

CDM / ELBI register space layout
The core has 4096 bytes(2) of PCI Express configuration space per function, distributed as per Figure 112. This address space is fully accessible from the DBI without any restrictions. In EP mode it can be accessed from the PCIe wire using CFG requests.
2. A CFG TLP has a 6-bit Register Number field and a 4-bit Extended Register Number field, allowing 1024 DWORDs (4096 bytes) to be accessed.

Figure 112. PCIe configuration space address map (per function)
[Address map, shown by DWORD address and byte address. From the lowest address upwards: PCI configuration space, made up of the PCI configuration header space and the PCI standard capability structures (PM, MSI, PCIE, MSI-X, VPD) located through CapPtr; PCI Express extended configuration space, containing the PCIe extended capability structures (AER, VC, SN, PB, ARI, SR-IOV, SPCIE); CDM port logic registers [optional]; customer application registers on the ELBI [optional], bounded by CONFIG_LIMIT.]

PCI configuration header and capability registers (in CDM)
The PCI configuration header and capability registers in Figure 112 are PCIe core configuration registers specified by the PCI express 2.0 specification. Access from the PCIe wire is possible with CFG requests (in EP mode only). These registers are fully accessible from the DBI without any restrictions.

Port logic (PL) registers (in CDM)
The port logic registers in Figure 112 are PCIe core configuration registers not specified by the PCI express 2.0 specification, but are specific to the configuration and operation of the PCIe controller integrated in SPEAr.
In EP mode, access from the PCIe wire is with CFG requests. There is no access from the PCIe wire in RC mode. These registers are fully accessible from the DBI without any restrictions.

Customer application registers in ELBI
The customer application registers in Figure 112 are the customer's application registers, specific to the operation of the customer's application IP. They are external to the core and are connected to the ELBI. These registers cannot be directly accessed through the remote link. These registers are fully accessible from the DBI without any restrictions.

Accessibility summary
● From the PCIe wire (through RTRGT0) in EP mode only:
– You can Memory-Map the Port Logic (PL) register space.
– You cannot Memory-Map the PCI and PCIe configuration register spaces.
– You must always access with a CFG request.
– You cannot access the customer application registers.
● In RC mode:
– PCIe wire access (through RTRGT0) to the CDM registers is not possible.
– PCIe wire access (through RTRGT0) to the application registers is not possible.
● From the DBI:
– You can access the port logic (PL) register space without any restriction.
– You can access the PCI and PCIe configuration register spaces without any restriction.
– You can access the application registers without any restriction.
– CONFIG_LIMIT is not used in DBI routing to CDM/ELBI.

Table 123. AXI bridge DBI -> CDM / ELBI access details
Access type   Address bits
              31-14          13   12         11-2                        1   0
CDM           Not used (1)   0    CS2 (2)    1 K-DWORD register access   0   0
ELBI          Not used (1)   1    Not used   1 K-DWORD register access   0   0
1. But forced to 0 internally.
2. This bit must be asserted to write "BAR Mask registers".

Figure 113.
DBI access to LBC !("!8)!PPLICATION 0#)E#ORE !("!8) "53 !PPLICATION MASTER #05 !PPLICATION SLAVE !("!8)"RIDGE -ASTER 242'4 $")3LAVE $") 3LAVE 8!,) Doc ID 018553 Rev 3 ,"# 343/590 PCI express controller (PCIe) 22.6.8 RM0078 Message generation The message generation module works on message generation and message processing. 22.6.9 Hot plug control (HOTPLUG_CTRL) module In RC mode devices, the hot plug logic supports generation of hot plug interrupts on the following hot plug events: ● Power fault detected ● MRL sensor changed ● Presence detect changed ● Command completed ● Attention button pressed ● Electromechanical interlock status changed ● Data link layer state changed When MSI or MSI-X mode is enabled, the core notifies the application of hot plug events using the hp_msi bit in CR6_Register. When INTx interrupt mode is enabled, the core notifies the application of hot plug events using the hp_int bit in CR6_Register. If PME is enabled, the hot plug logic generates a hot plug wake-up signal on hp_pme, triggered by the above hot plug events. The RC Core does not check if the PM state is D1, D2, or D3hot. It is up to the application to check the value on pm_dstate to make sure the device is in D1, D2, or D3hot. 22.7 Operation This section describes the operations of the PCI Express core. 
The topics for this section are:
● Initialization
● Link establishment
● Transmit TLP processing
– Transmit TLP arbitration
– Transmit retry
– Transmit DLLP priorities
● Receive TLP processing
– Receive filtering
– Receive routing
– Receive queuing
● Error handling
● Messages
● Interrupts
● Address translation
● Gen2 5.0 GT/s operation
● Power management
● Completion timeout ranges

22.7.1 Initialization

Immediately after reset the DM core goes into either EP mode or RC mode depending on the state of the device_type setting (CR0_Register[28:25], see PCIe application control registers in RM0089, Reference manual, SPEAr1340 address map and registers). The internal configuration registers in the CDM assume their default reset values as listed in the PCIe core registers section.

The application must keep the app_ltssm_enable signal deasserted after reset until the application is ready to establish a Link and start receiving and transmitting TLPs. If the application needs to update configuration registers in the CDM as part of the initialization process, then the application must keep app_ltssm_enable deasserted until it has programmed all the necessary configuration registers through the DBI. After initializing the necessary configuration registers, the application can assert app_ltssm_enable to allow the LTSSM to begin Link establishment.

The LTSSM begins link negotiation once reset is deasserted, miphy initialization is complete, and the app_ltssm_enable bit (CR0_Register[3]) is asserted.

22.7.2 Link establishment

The core and a PCI Express compliant PHY combine to provide a complete solution for setting up and maintaining a compliant PCI express link. The core implements the LTSSM function according to the PCI express 2.0 specification.

In general, the process for establishing a Link is as follows:
1. Upon power-up (or directly out of reset), it is assumed that the power supply becomes stable and the ASIC/SoC and SerDes PLLs reach frequency lock before the devices attempt to establish a valid Link. Once in a valid state, the SerDes either communicates a ready status to the core or simply begins transmitting and receiving valid data.
2. Per the PCI express 2.0 specification, once bit and symbol synchronization are complete, the core initiates the following sequence to establish a link (assuming a valid and properly functioning link partner):
a) Receiver detection on available lanes for the port.
b) Exchange of training sequences to determine link configuration (for example, link speed, number of lanes, and order).
c) Once both partners reach a valid negotiated state, the link state is set up and the LTSSM is in L0.
3. Once link up is achieved, the data link modules take over to manage the link and initialize flow control.
4. After flow control initialization is complete, the data link modules signal the transaction layer modules that the link is ready to allow transmission/reception of TLP traffic.
5. During normal operation, the LTSSM and data link modules continue to manage the underlying Link integrity while data traffic is communicated across the PCI express link.

22.7.3 Transmit TLP processing

Generally, all types of transmit TLPs (Posted, Non-Posted, and Completion) generated by the application travel through the core in the following flow:
The application presents a transaction transmission request with header information and payload (if applicable) on one of the transmit client interfaces (for example, XALI0).
1. The XADM forms the transaction into a TLP and checks the TLP against the current Flow Control credit availability. If the TLP passes the flow control checks and wins the arbitration with TLPs from the other client interfaces, then the TLP goes to the CXPL.
2.
The XTLH module inserts an ECRC (if applicable) and snoops/stores the necessary TLP information for completion lookup (for Non-Posted requests only). 3. The XDLH inserts the sequence number and LCRC into the TLP and the retry buffer stores the TLP. 4. The XMLH inserts start and end delimiters and performs data scrambling. 5. The XMLH presents the packet to the PHY through the PIPE interface. 6. The PHY receives the packet, performs 8b10b encoding, and serialization, then sends the packet for transmission on the Link Transmit arbitration The XADM arbitrates transmit TLPs using round-robin method between the two transmit client interfaces. Regardless of the TLP transmit arbitration, messages (both internally-generated and messages requested through the VMI) always have the highest priority, followed by internally-generated completions. The priority order for all transmitted TLPs is: 1. Internally generated messages 2. Internally-generated completions 3. Transmit TLPs from Client0 and Client1 according to the selected arbitration method Transmit retry There is a Retry Buffer (RB) in the core that stores a copy of each transmitted TLP until an Ack is received. The RB consists of two buffers: retry buffer and start-of-TLP (SOT) buffer. The retry buffer is implemented with a single port RAM. The SOT buffer stores the starting address of each unacknowledged TLP stored in the retry buffer. The SOT buffer is implemented with a single port RAM and is indexed by the Sequence Number of the TLP whose starting address is being stored or retrieved. When a Nak is received or the replay timer times out, a replay is initiated. A replay is terminated by two conditions: ● When the replay of all TLPs in the retry buffer is finished, or ● An Ack DLLP is received that acknowledges all TLPs in the retry buffer The replay timer tracks the TLP replay time. 
It stays at 0 when every TLP has received an Ack and starts to count when a TLP is transmitted and the LTSSM is not in the training state. The replay timer is reset to 0 when an Ack or Nak is received that acknowledges a TLP that is in the retry buffer.

Note: The retry buffer does not function as a transmit queue. The core transmits TLPs immediately after they pass arbitration. The copy in the retry buffer is only sent in the event that the TLP must be re-transmitted.

Transmit DLLP priorities

The order of priority to transmit pending DLLPs is:
● High-priority DLLPs
● TLPs
● Low-priority DLLPs

22.7.4 Receive TLP processing

Generally, received transactions travel through the core in the following flow:
1. The PHY receives a stream of bits and aligns/forms them into 10-bit symbols.
2. The PHY decodes the 10b stream into an 8b stream.
3. The PHY crosses the clock domain from RX to TX and presents the stream to the PIPE.
4. The RMLH descrambles and deskews the incoming data, checks for receiver errors, then extracts packets.
5. The RDLH strips off the LCRC and sequence number.
6. The RTLH strips off the ECRC (if applicable), checks for a malformed TLP, and forms a transaction across the RTLI interface to the RADM.
7. The RADM filters the transaction based on the transaction type (Posted, Non-Posted, or completion) and the rules described in Receive filtering below.
8. Filtered transactions are sent to RADM queues.
9. Transactions residing in the RADM queues are presented to the application or locally handled by the LBC module, depending upon the filter result.

Receive filtering

The core contains a filter module that is responsible for the following tasks:
● Determine the status of a received TLP using filtering rules.
● Determine the destination interfaces of a received TLP based on the status from applying the filter rules.
● Signal the application for the status of the received TLP by driving signals such as DLLP abort, TLP abort and ECRC error.
● Report errors to Advanced Error reporting registers (ADERR_STRUC address block) based on filter results. If more than one type of error is detected, Section 6.2.3.2.3 “Error Pollution” of the PCI express 2.0 specification is followed.

The core filters and routes received TLPs according to a set of rules determined by the TLP type based on the PCI express base 2.0 specification and user-configurable filtering options. The filtering rules for a received TLP are affected by I/O signals and register values. The application can mask some of the filtering and error handling rules by setting the corresponding bits in Symbol Timer and Filter Mask register 1 (SYMB_T_R) and Filter Mask register 2 (FL_MSK_R2).

There are three types of filtering rules in the core:
● rules that are applicable for all TLPs received
● rules that are dependent on the type of the TLP based on the PCIe specification
● rules that are not from the PCIe specification but requested by specific applications.

Figure 114. Receive TLP processing flow
[Figure: block diagram. TLPs arriving from the CXPL pass through TLP filtering (filter, queue, trash) and a demux to RTRGT, to received CPL processing (RCPL, ERR) and to message processing (MSG); CFG data is routed through the LBC to the CDM and the ELBI, with the DBI as the second input of the LBC.]

Filtering rules applicable for all TLPs received

The following general rules apply to all incoming TLPs:
● The core discards all incoming TLPs that have an invalid Type field. This TLP is treated as a “TLP ABORT”.
● A request TLP with the poison bit set is considered an unsupported request (UR) only when the UR poison rule mask bit is not set. Applications can control the end result of a poisoned TLP filter through the corresponding filter mask bit. If the filter mask bit is not set, all request TLPs with poison bit set will be discarded.
● A locally terminated TLP with ECRC error detected is discarded in store-and-forward mode and an ECRC error reported only when the filter mask CX_FLT_MASK_ECRC_DISCARD bit is not set. ● Filter rules have no effect on received TLPs when “DLLP ABORT” signal is asserted. ● If a completion of a non-posted request is not received within a completion timeout period, this request will be treated as a completion timeout, and a non-advisory error will be reported. ● For messages to be accepted and decoded, the incoming message must be one of the valid Message types with the correct payload length based on PCIe 2.0 specification. Valid Messages will be decoded and passed onto the SII interface as necessary. See Section 22.7.5: Error handling for more details. 348/590 Doc ID 018553 Rev 3 RM0078 PCI express controller (PCIe) Filtering rules based on TLP type defined in PCIe specification PCIe TLPs are categorized as requests and completions. The next table describes the filtering rules for request and completion TLPs and the results of the core's filter. If a received TLP passes all of the filter rules for request and completion TLPs, then it is considered to have no errors, and the TLP will be routed to the destination that is configured. Details on routing are provided in Receive routing section. Notation of filter results: UR = Unsupported Request CA = Completion Abort CRS = Configuration Retry Request SU = Successful UC = Unexpected Completion MLF = Malformed "-" = Filtering rule does not apply to TLP type MA = Master Abort TA = Target Abort EP mode filtering rules Table 124. 
Result of filtering rules applied to request TLPs and completion (CPL) TLPs: EP mode TLP type CFG MSG CPL with UR/CA/CR S status CPL with SU status UR SU SU UC UC UR UR - - - - TLP header poison bit is set and the filter mask CX_FLT_MASK_UR_POIS bit is not set UR UR UR UR SU SU Address within a BAR that is configured to RTRGT0 and TLP DW length > 1 CA CA - - - - MRd with lock and filter mask CX_FLT_MASK_LOCKED_RD_AS _UR bit is not set UR - - - - - Filtering rule MRd MWr IORd IOWr PowerState is not in D0 UR Address is not within any configured Memory BAR or IO BAR if it is an IO request Doc ID 018553 Rev 3 349/590 PCI express controller (PCIe) RM0078 Table 124. Result of filtering rules applied to request TLPs and completion (CPL) TLPs: EP mode (continued) TLP type CFG MSG CPL with UR/CA/CR S status CPL with SU status - UR - - - - - UR - - - Application requests the core filter to return CRS by asserting signal app_req_retry_en - - CRS - - - Not valid message for EP device - - - UR/MLF - - Illegal payload length of a message - - - UR - - Vendor MSG Type0 with filter mask CX_FLT_MASK_VENMSG0_DROP bit not set - - - UR - - Vendor MSG Type1 with r[2:0] to 3'b010 and {Bus#, Dev#, Func#} mismatch - - - UR - - CA CA CA - - - Requester ID mismatch - - - - MA/TA MLF Requester TAG mismatch - - - - MA/TA MLF TAG error (non-pad zero for reserved TAG bits - - - - MA/TA MLF Byte count mismatch (PCIe Gen2) - - - - MA/TA UC/MLF Completion received with status of UR - - - - MA - Completion received with status of CA - - - - TA - Completion received with status of CRS - - - - CRS - Completion received with CRS status and completion is not a pending configuration request - - - - MLF - Filtering rule MRd MWr IORd IOWr The function number of a completer ID within a CFG request does not match an implemented function within the receiver device and the filter mask CX_FLT_MASK_UR_FUNC_MISM ATCH bit is not set - Configuration type1 TLP request and the filter mask CX_FLT_MASK_CFG_TYPE1_RE 
Q_AS_UR is not set TLP with ECRC error detected 350/590 Doc ID 018553 Rev 3 RM0078 PCI express controller (PCIe) A complete list of the filtering checks can be referenced at Symbol Timer and Filter Mask register 1 (SYMB_T_R) and Filter Mask register 2 (FL_MSK_R2) in PCIe core registers section (Endpoint register bank) of RM0089, Reference manual, SPEAr1340 address map and registers. RC mode filtering rules Table 125. Result of filtering rules to request TLPs and completions (CPL) TLPs: RC mode TLP type IO MSG CPL with UR/CA status - - - - - - UR - UR - - - - UR UR UR UR UR - - - UR - - - - - - CFG Request received and the filter mask CX_FLT_MASK_RC_CFG_DISCARD is not set - - UR - - - - - IO Request received and the filter mask CX_FLT_MASK_RC_IO_DISCARD is not set - - - UR - - - - Filtering rule MRd MWr Address does not satisfy any of the following conditions: 1. Within any configured memory BAR. 2. Outside of the memory range AND prefetchable memory range as determined by the corresponding base and limit fields in the Type-1 header. 3. The filter mask CX_FLT_MASK_UR_OUTSIDE _BAR bit is set, which treats outof-bar TLPs as supported requests and indicates a special application requirement UR UR Any address bit, above bit position MASTER_BUS_ADDR_WIDTH-1 is set to '1' UR TLP header poison bit is set and the filter mask CX_FLT_MASK_UR_POIS bit is not set MRdLk request received and filter mask CX_FLT_MASK_LOCKED_RD_AS_ UR bit is set, which indicates that customer prefer to filter out the MRdLk (1) CFG Doc ID 018553 Rev 3 CPL with CRS status CPL with SU status 351/590 PCI express controller (PCIe) RM0078 Table 125. 
Result of filtering rules to request TLPs and completions (CPL) TLPs: RC mode TLP type IO MSG CPL with UR/CA status - - UR - - - - - - UR/ MLF - - - CA CA CA CA - - - - Requester ID mismatch - - - - - MA/TA - MLF Requester TAG mismatch - - - - - MA/TA - MLF TAG error (non-pad zero for reserved TAG bits) - - - - - MA/TA - MLF Byte count mismatch - - - - - MA/TA - MLF Completion received with status of UR - - - - - MA - - Completion received with status of CA - - - - - TA - - Completion received with CRS status and completion is not a pending configuration request - - - - - - MLF - Filtering rule MRd MWr Vendor MSG Type0 with filter mask CX_FLT_MASK_VENMSG0_DROP bit not set - - Not valid message for RC device - TLP with ECRC error detected (1) CFG CPL with CRS status CPL with SU status 1. DM (in RC mode) should not expect to receive a CFG or IO request. A complete list of the filtering checks can be referenced at Symbol Timer and Filter Mask register 1 (SYMB_T_R) and Filter Mask register 2 (FL_MSK_R2) in PCIe core registers section (Endpoint register bank) of RM0089, Reference manual, SPEAr1340 address map and registers. Filtering rules not defined in PCIe specification There are additional filtering rules that are designed to provide enhanced filter support for certain applications. ● Core to handle the received posted or non-posted requests with zero byte length When a zero-byte request TLP is received, also called "flush" command, the core can drop the zero-byte request (it means that the core service internally the request but doesn't pass it to the application). This is designed to support some applications that cannot handle a zero-byte request. Applications can dynamically program a bit in the filter mask CX_FLT_MASK_HANDLE_FLUSH bit to turn on/off this rule. If the core is programmed to handle the flush, it will be the completer's task to return completion status. 
● Core to detect oversize read request and return UR for the read request
Some applications may have a buffer limit and are not able to handle lengthy read requests. The core over-size read request detection rule can be turned on when an application can identify a maximum read request size that it can tolerate. This feature is enabled when the PCIe AHB/AXI bridge is enabled.

Receive routing

● EP mode
The possible destinations of a posted or non-posted request TLP are RTRGT1 interface, RTRGT0 interface and core discard (dropped or terminated). By default:
– CFG requests are routed to RTRGT0 and then to CDM via LBC.
– BAR-matched MEM/IO requests are routed to RTRGT1.
– MSG requests are decoded internally, signalled on the SII interface and then terminated.

Figure 115. Default request TLP routing (assuming no TLPs with CA/CRS/UR completion status)
[Figure: block diagram. After the address/type check at the CXPL (using the BAR settings), CFG requests go through RTRGT0 and the LBC to the CDM (core config data), MEM/IO requests go to RTRGT1, and MSG requests go to the SII.]

The possible destinations of a completion TLP are RCPL interface, RBYP interface, RTRGT1 interface, and core discard. In general, a TLP type that is configured as bypass will be sent to either the RBYP interface, or RCPL interface if it is a completion. A TLP type that is configured as a cut-through or store-forward will be sent to RTRGT1 interface.

● RC mode
The possible destinations of a posted or non-posted request TLP are RTRGT1 interface and core discard (dropped or terminated). By default:
– MEM requests outside of the memory range AND pre-fetchable memory range as determined by the corresponding base and limit fields in the Type-1 header, are routed to RTRGT1.
– MSG requests are decoded internally, signalled on the SII interface and then terminated.
– An RC does not expect to receive CFG or IO requests.
– BARs should be disabled and not used.
The possible destinations of a completion TLP are RCPL interface, RBYP interface, RTRGT1 interface, and core discard. In general, a TLP type that is configured as bypass will be sent to the RBYP interface. A TLP type that is configured as a cut-through or store-forward will be sent to RTRGT1 interface.

Receive queuing

A segmented buffer queuing method is used: a memory pair (header and data) is used for all TLP types and all VCs. The memory is divided into segments for Posted, Non-Posted and Completion queues for each VC. The depth of each segment can be controlled dynamically by writing the buffer depth related registers in Port Logic registers (PRT_LOG_R).

Posted and non-posted TLPs use the store-forward mode: TLPs are stored into the queue, and an available TLP is advertised only after the entire TLP is stored into the queue. To deliver these requests, the RTRGT0 or RTRGT1 interface is used, depending on the BAR setup and on whether the TLP is of CFG type or not. RTRGT1 is connected to the AXI/AHB bridge master interface (request) channel.

Completion TLPs use bypass mode: there is no receive queue in this mode, so the application must be able to accept all traffic, as back-pressure is disabled in this mode. To deliver these TLPs, the RBYP interface is used, which is connected to the AXI/AHB bridge slave interface (response) channel.

22.7.5 Error handling

Errors are classified into two levels:
● Correctable error (CORR). This means that the PCIe core has a way of automatically handling the error. There is no loss of information. An example is a Link CRC (LCRC) error, which is fixed by a replay at the data link layer.
● Uncorrectable error (UNCORR). The PCIe core cannot fix these and they are classified as:
– Fatal error (FATAL). The link is not functioning correctly and may require a link reset.
– Non-fatal error (NONFATAL). The problem is not related to link operation.

The core implements the following types of error handling.
● PCIe baseline capability. These reporting capabilities are a minimum set, and are required of all PCI Express devices. Error notification takes two forms:
– Messages sent to root complex (RC).
– Completion status errors.
This also covers mapping of PCIe errors to legacy PCI generic error handling such as PERR# and SERR#. Many of the PCIe errors are mapped into the Status register in the PCI compatible configuration space header.
● PCIe Advanced Error Reporting (AER) capability. Allows more sophisticated error reporting, control, masking and logging using the PCIe extended AER capability register structure (ADERR_STRUC address block).

The PCIe core supports advisory reporting for both the baseline and AER capabilities, that is, the configurable withholding of reporting for non-fatal errors (NONFATAL).

For an RC port, the reporting of most errors is internal to the root port. No external error notifications are generated. One exception to this is, for example, the unsupported request (UR) completion status.

PCIe baseline capability

Reporting of errors is achieved by sending a notification to the RC (a CPL with UR/CA/CRS status for non-posted requests, and optionally an error Msg). The decision whether or not to send an error Msg is controlled by a complex set of associated control and status bits. The status is also logged in the Device status register (DEV_CAS[31:16]) for the following errors: unsupported request (UR), FATAL, NONFATAL and CORR.

The flow diagram in Section 6.2.5, Sequence of Device Error Signaling and Logging Operations of the PCI express specification shows the sequence of operations related to signaling and logging of errors detected by a PCIe device.

Table 126 shows the error message format sent to the RC.

Table 126.
Error message (Msg) format
Message code | Note
0011_00xx | ERR_CORR, ERR_NONFATAL, ERR_FATAL are encoded using 30h, 31h, 33h

Completion status errors:

Completion status errors for non-posted requests may be any of the following:
● Unsupported request (UR)
● Configuration request retry status (CRS)
● Completion abort (CA)

Reporting through the Device control and status registers (DEV_CAS):

The PCI express capability register structure provides the following support for baseline error reporting.
● Enable/disable error reporting (Device control register, DEV_CAS[15:0]).
● Provide error status (Device status register, DEV_CAS[31:16]) for:
– UR
– Correctable error (CORR).
– Fatal error (FATAL).
– Non-fatal error (NONFATAL).
● A method for software to force link retraining (Device control register, DEV_CAS[15:0]).

Advanced error reporting (AER)

AER allows more sophisticated error reporting, control, masking and logging using the optional extended AER capability register structure (ADERR_STRUC address block).

Advanced error reporting registers:

All possible errors are enabled, masked and assigned a severity. There are two sets of registers, one for correctable and one for uncorrectable errors, built from the following register types:
● Error enable register
● Error severity register
● Error mask register

The correctable set of registers handles (for example) errors arising from bad DLLPs or TLPs. The uncorrectable set of registers handles (for example) errors arising from UR, ECRC, malformed TLPs, buffer overflow, UC, CA, completion timeout and poisoned TLP.

Severity programming:

The uncorrectable error severity register allows each uncorrectable error to be programmed to fatal or non-fatal. The transmission of these error messages by class (correctable, non-fatal, fatal) is enabled using the Reporting Enable fields of the Device control register, DEV_CAS[15:0], or the SERR Enable bit in the PCI status and command register (PCI_CONFIG_HEADER registers, address 0x04).
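The class-level enabling described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not driver code for this device: the bit positions are those of the PCI Express base specification (Device control bits 0 to 2 are the correctable, non-fatal and fatal reporting enables; Command register bit 8 is SERR# Enable), and the rule that SERR# Enable applies to the non-fatal and fatal classes only is also taken from that specification rather than from this manual. All macro and function names are hypothetical. Here dev_ctrl stands for DEV_CAS[15:0] and pci_cmd for the lower 16 bits of the register at address 0x04. Per-error masking through the AER mask registers is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined by the PCI Express base specification. */
#define DEVCTL_CORR_EN      (1u << 0)  /* Correctable Error Reporting Enable */
#define DEVCTL_NONFATAL_EN  (1u << 1)  /* Non-Fatal Error Reporting Enable   */
#define DEVCTL_FATAL_EN     (1u << 2)  /* Fatal Error Reporting Enable       */
#define PCI_CMD_SERR_EN     (1u << 8)  /* SERR# Enable in the Command register */

/* Hypothetical names for the three error message classes. */
enum err_class { ERR_CLASS_CORR, ERR_CLASS_NONFATAL, ERR_CLASS_FATAL };

/* Returns true if messages of the given class may be sent at all.
 * Individual AER mask bits are deliberately ignored in this sketch. */
static bool err_msg_class_enabled(enum err_class cls,
                                  uint16_t dev_ctrl, uint16_t pci_cmd)
{
    bool serr = (pci_cmd & PCI_CMD_SERR_EN) != 0;

    switch (cls) {
    case ERR_CLASS_CORR:
        /* SERR# Enable does not cover the correctable class. */
        return (dev_ctrl & DEVCTL_CORR_EN) != 0;
    case ERR_CLASS_NONFATAL:
        return serr || (dev_ctrl & DEVCTL_NONFATAL_EN) != 0;
    case ERR_CLASS_FATAL:
        return serr || (dev_ctrl & DEVCTL_FATAL_EN) != 0;
    }
    return false;
}

/* Message code for each class, matching the 30h/31h/33h encoding of Table 126. */
static uint8_t err_msg_code(enum err_class cls)
{
    switch (cls) {
    case ERR_CLASS_CORR:     return 0x30;
    case ERR_CLASS_NONFATAL: return 0x31;
    case ERR_CLASS_FATAL:    return 0x33;
    }
    return 0;
}
```

With dev_ctrl = 0x0000 and pci_cmd = 0x0100 (only SERR# Enable set), the sketch allows the fatal and non-fatal classes but not the correctable class.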
The Uncorrectable Error Mask register and Correctable Error Mask register allow each error condition to be masked independently. If messages for a particular class of error are not enabled by the combined settings in the Device control register (DEV_CAS[15:0]) and the PCI status and command register (PCI_CONFIG_HEADER registers, address 0x04), then no messages of that class will be sent regardless of the values for the corresponding mask register.

If an individual error is masked when it is detected, its error status bit is still affected, but no error reporting message is sent to the root complex, and the header log and first error pointer registers are unmodified.

Advisory non-fatal messages

The PCIe core supports advisory reporting, which is the configurable withholding of reporting for non-fatal errors.
● During baseline error reporting, the core produces no error message.
● During AER, the core can instead signal a non-fatal error with ERR_COR, which serves as an advisory notification to software. It will always signal a fatal error with ERR_FATAL.

UR/CA advisory

The PCIe core generally sends a CPL with UR/CA status to signal an uncorrectable error for a non-posted request. If the severity of the UR/CA error is non-fatal, the PCIe core will handle this case as an advisory non-fatal error. By default, the PCIe core will signal the non-fatal error (if enabled) by sending an ERR_COR message.

UC advisory

When the PCIe core receives a UC and the severity of the UC error is non-fatal, the PCIe core will handle this case as an advisory non-fatal error. By default, the PCIe core will signal the error (if enabled) by sending an ERR_COR message.

Error source classification

The following table indicates how some of the more common low-level errors are classified.

Table 127.
Possible causes for typical errors
Error type | Possible cause
UR (unsupported request) | Poisoned TLP (EP=1); no BAR match; MRd length > max read request size
UC (unexpected completion) | TAG mismatch; Requester ID (RID) mismatch
CPL timeout | Remote device hung
CA (completion abort) | ECRC
Malformed TLP | Bad TLP header caused by bad link
Buffer overflow | Credit miscalculation by some PCIe device
Bad DLLP | LCRC

For a full analysis of what error conditions contribute towards a UR or CA status, see Receive filtering. In many cases, the standard operation may be 'masked' or ignored by setting the corresponding bit in the “Symbol Timer and Filter Mask Register 1”.

Error detection

Built into the core are all mandatory error detections, some optional error detections, and the error report mechanism based on the PCI express specification. The core also has an option for the application to turn off the filter rules and perform its own error checking.

The following general rules apply to all incoming TLPs:
● The core discards all incoming TLPs that have an invalid type field. This TLP is treated as a “TLP-ABORT”.
● A locally terminated TLP with ECRC error detected is discarded in store-and-forward mode and an ECRC error reported only when the filter mask CX_FLT_MASK_ECRC_DISCARD bit is not set.
● Filter rules have no effect on received TLPs when the “DLLP-ABORT” signal is asserted.
● If a completion of a non-posted request is not received within a completion timeout period, this request will be treated as a completion timeout, and a non-advisory error will be reported. See Advanced error reporting (AER) for more details.
● “DLLP-ABORT” is asserted as a result of one of two conditions:
– A data link layer error is detected (for example, LCRC). A retry from a remote device will occur.
– UC or completion with ECRC error is detected. This condition is valid only when the application has configured the core with infinite credits. Because the completion buffer of the core or application has limited resources defined for expected completions, it is necessary to avoid overflowing the completion buffer by unexpected completions. Therefore “DLLP-ABORT” is asserted to notify the core completion buffer (if completion is in store-forward mode) or the application's completion buffer to rewind their buffer pointers when a completion with ECRC error or unexpected completion is detected.
● “TLP-ABORT” is asserted as a result of one of three conditions:
– Malformed TLP
– UC
– ECRC

22.7.6 Messages

Similar to MWr, messages (Msg/MsgD) are posted transactions. The 8-bit “Message Code” field defines what class of message the TLP is. Some examples of typical message classes are given below:

Table 128. Message classes based on the message code
Message code [7:0] | Message class | TLP type | Note
0001_xxxx | Power management | Msg |
0010_0xxx | Legacy PCI interrupt | Msg | Assert/Deassert for each of INT A/B/C/D
0011_00xx | Error signaling | Msg | ERR_CORR, ERR_NONFATAL, ERR_FATAL are encoded using 30h, 31h, 33h
0111_11xx | Vendor defined | Msg/MsgD |

Other classes used by the PCIe core include locked transaction and slot power limit.

Note: Message signalled interrupts (MSI/MSI-X) are not messages (Msg/MsgD) but MWr TLPs.

Message generation

Messages that are transmitted by the PCI express core can potentially be derived from the following seven sources. Referring to the circled numbers in the following diagrams, outbound messages can be created either by:
● The core automatically as follows:
– Power management messages.
– Error signaling messages.
or
● The customer application as follows:
– Direct supply of message TLPs at AXI bridge master
– Vendor defined messages through the Vendor Message Interface (VMI).
– Locked transaction messages through the SII Message interface [RC mode], Legacy PCI interrupt messages through the SII Interrupt interface.
– Error signaling messages through the SII Transmit Control interface (app_err* I/O).

Figure 116. Message transmission: EP mode
[Block diagram. Labelled blocks: PCIe core, PHY, RXPIPE, TXPIPE, CXPL core, RADM, XADM, MSG_GEN, XALI, ATU, AHB/AXI bridge slave, PMC. Message sources shown: error signaling, application generated messages, vendor defined (VMI), application error signalling (SII Transmit Control), legacy PCI interrupt (SII Interrupt).]

Figure 117. Message transmission: RC mode
[Block diagram. Labelled blocks: PCIe core, PHY, RXPIPE, TXPIPE, CXPL core, RADM, XADM, MSG_GEN, XALI, ATU, AHB/AXI bridge slave, DBI, CDM registers, PMC. Message sources shown: application generated messages, vendor defined (VMI), application error signalling (SII Transmit Control), locked transaction (SII Messages).]

Table 129. Message transmission
Index(1) | Message source (type) | EP mode | RC mode
1 | Power management controller in the core (Msg) | PM_PME(2) | PME_Turn_off(3)
2 | Error signaling inside the core (Msg) | COR_ERR / ERR_NONFATAL / ERR_FATAL. See Section 22.7.5: Error handling for more details. | n/a
3 | Direct supply of any class of message (Msg/MsgD) | Access through AXI interface (both modes)
3a | Indirect supply of any class of message (Msg/MsgD) | See Section 22.7.7: Address translation for more details on generating Msg/MsgD from MWr/IOWr using the address translation unit (ATU) (both modes)
4 | Vendor defined (Msg(4)) | The core generates vendor defined messages in response to requests on the VMI (see application registers which manage this interface) (both modes)
5 | Locked transaction (Msg) | n/a | Unlock message, triggered by the root complex by setting the app_unlock_msg bit in the application register
6 | Legacy PCI interrupt (Msg) | Setting sys_int bit in application register (see Section 22.5: Interrupts) | n/a
7 | Error signaling from the application (Msg) | n/a | n/a
1. The “Index” refers to the numbers in the previous graphics.
2. Triggered by your EP application through the outband_pwrup_cmd or apps_pm_xmt_pme bits in CR1_Register (see RM0089, Reference manual, SPEAr1340 address map and registers)
3. Triggered by your RC application through the apps_pm_xmt_turnoff bits in CR1_Register (see RM0089, Reference manual, SPEAr1340 address map and registers)
4. MsgD not possible on VMI.

Message reception

The PCI express core can receive the following types of messages. The index in the first column refers to the circled numbers in the following diagrams.

Table 130. Message reception
Index(1) | Message source (type) | EP mode | RC mode
1 | Power management (Msg) | PME_Turn_Off | PM_PME, PME_TO_Ack
1a | Slot power limit (Msg) | Set_Slot_Power_Limit Support Message. | n/a
2 | Error signaling from downstream component (Msg) | n/a | COR_ERR / ERR_NONFATAL / ERR_FATAL
3 | Vendor defined (Msg/MsgD) | |
4 | Locked transaction (Msg) | Unlock message | n/a
5 | Legacy PCI interrupts from downstream devices (Msg) | n/a | See PCI legacy interrupt in Section 22.5: Interrupts.
1. The “Index” refers to the numbers in the previous graphics.

Figure 118. Message reception: EP mode
[Block diagram. Labelled blocks: PCIe core, PHY, RXPIPE, TXPIPE, CXPL core, RADM, RTRGT, AHB/AXI bridge master, SII Messages, PMC. Message paths shown: vendor defined, power management and some locked transaction messages towards the application; power management messages towards the PMC.]

Figure 119. Message reception: RC mode
[Block diagram. Labelled blocks: PCIe core, PHY, RXPIPE, TXPIPE, CXPL core, RADM, RTRGT, AHB/AXI bridge master, SII Messages, SII Interrupt, PMC. Message paths shown: vendor defined, power management, some locked transaction and error signaling messages; legacy PCI interrupt; power management.]

The RADM filter processes every received message and decodes the header before sending it to the application logic on the System Information Interface (SII). In addition, power management messages are processed by the PCIe core Power Management Controller (PMC). By default, all received messages are dropped (serviced internally) and not passed to the application (through AXI bridge master). To have all decoded messages also sent to the application interface, the register fields outlined in Table 131 must be set to “1”.

Table 131. Controlling the routing of received messages
Register | Bit | Function | Default value
Filter Mask register 1 | 29 | Mask the dropping of non-vendor messages. 0: Drop, 1: Do not drop | DEFAULT_FILTER_MASK_1[13]
Filter Mask register 2 | 0 | Mask the dropping of vendor type 0 messages. 0: Drop(1), 1: Do not drop | DEFAULT_FILTER_MASK_2[0] = 0
Filter Mask register 2 | 1 | Mask the dropping of non-vendor type 1 messages. 0: Drop, 1: Do not drop | DEFAULT_FILTER_MASK_2[1] = 0
1. Vendor TYPE0 messages are dropped with UR error reporting

For the masking (of the dropping) of vendor messages, it is not possible to differentiate between “Vendor Message without Payload (Msg)” and “Vendor Message with Payload (MsgD)”.

Note: See RM0089, Reference manual, SPEAr1340 address map and registers for full details of the Filter Mask registers.

When a message request is filtered with UR/CA/CRS status, the TLP is always dropped. Only message requests filtered with SC status can potentially be forwarded to the application on AXI bridge master.

22.7.7 Address translation

Address translation is used for mapping different address ranges to different memory spaces supported by the application.
A typical example maps the AMBA memory space to PCIe memory space. The translation can be configured (by software) to implement a customer-defined address (and TYPE/FORMAT) translation scheme without the need for additional external hardware.

Outbound (TX) features
● Address match mode operation for MEM/IO/CFG/MSG TLPs. No address translation for CPL.
● Supports TYPE translation via TLP TYPE header field replacement for MEM types to MSG/CFG types.
– This includes translation from posted to non posted (for example, MWr to CfgWr0).
– No TYPE translation from CPL TLPs.
● Programmable TLP header per region for the following fields for TLP field replacement.
– TYPE / TD / TC / AT / ATTR / MSG Code
– Function Number (Physical and Virtual).
● 8 address regions based on programmable registers for location and size.
● Programmable enable/disable per region.
● Automatic format (FMT) field translation between 3 DW and 4 DW for 64-bit addresses.
● Invert address matching mode to translate accesses outside of a successful address match.
● ECAM configuration shift mode to allow a 256 MB CFG space to be located anywhere in the 64-bit address space.
● Supports regions from 64 kB to 4 GB in size.

Inbound (RX) features
● Address Match Mode operation for MEM/IO/CFG/MSG TLPs. No address translation for CPL.
● Selectable BAR match mode operation for IO/MEM TLPs.
– TLPs destined for RTRGT0 (internal CDM or ELBI) will not be translated.
– TLPs that are not error-free (ECRC, malformed and so on) will not be translated.
● Programmable TLP header per region for the following fields for matching.
– TYPE / TD / TC / AT / ATTR / MSG code
– Function number (physical and virtual).
● 8 address regions based on programmable registers for location and size.
● Programmable enable/disable per region.
● Automatic format (FMT) field translation between 3 DW and 4 DW for 64-bit addresses.
● Invert address matching mode to translate accesses outside of a successful address match.
● Configuration shift mode. Optimizes the memory footprint of CFG accesses destined for the AXI interface in multi-function devices.
● Response Code defines the CPL completion status to return for accesses matching a region.
● Supports regions from 64 kB to 4 GB in size.

The iATU registers are in the PCIe core port logic register space (see Port logic (PL) registers (in CDM) on page 342). They may be accessed locally via the DBI interface or via PCIe configuration accesses. The following registers are used for programming the iATU.

Table 132. Registers used for programming the iATU
Byte offset | Description
+0x200 | iATU viewport register
+0x204 | iATU region control 1 register
+0x208 | iATU region control 2 register
+0x20C | iATU region lower base address register
+0x210 | iATU region upper base address register
+0x214 | iATU region limit address register
+0x218 | iATU region lower target address register
+0x21C | iATU region upper target address register

22.7.8 Outbound iATU operation: address match mode

The address field of each request MEM/IO TLP is checked to see if it falls into any of the enabled(3) address regions defined by the 'Start' and 'End' addresses shown in Figure 120: iATU address region mapping: outbound and inbound (address match mode). If an address match is found, then the TLP address field is modified as follows:

Address = Address - Base Address + Target Address

and the TYPE, TD, TC, AT and ATTR TLP header fields are replaced with the corresponding fields in the iATU Region Control 1 register. If the application address field matches more than one of the eight address regions, then the first enabled region to be matched (the lowest-numbered, from 0 to 7) is used. If there is no address match then the address is untranslated.
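The address match rule above can be modelled in a few lines. The following Python sketch is illustrative only: the Region tuple and the translate_outbound function are hypothetical names that do not correspond to any hardware or ST software interface, and the base and limit are treated as full 64-bit values for simplicity.

```python
from typing import NamedTuple, Optional, Sequence


class Region(NamedTuple):
    """Hypothetical software model of one iATU region (not a hardware layout)."""
    enabled: bool
    base: int    # start address (upper:lower base address registers)
    limit: int   # end address, inclusive
    target: int  # target address (upper:lower target address registers)


def translate_outbound(address: int, regions: Sequence[Optional[Region]]) -> int:
    """Return the translated address, or the input address if nothing matches."""
    for region in regions:  # list order models the region 0..7 priority
        if region is None or not region.enabled:
            continue  # disabled regions take no part in address matching
        if region.base <= address <= region.limit:
            # Address = Address - Base Address + Target Address
            return address - region.base + region.target
    return address  # no match: the address is left untranslated
```

For instance, with region 1 covering 0x80000000_D0000000 to 0x80000000_D000FFFF and a target of 0x00010000, the address 0x80000000_D0001234 is translated to 0x00011234, while an address outside every enabled region is returned unchanged.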
Figure 120: iATU address region mapping: outbound and inbound (address match mode) provides more details on this translation process.

Figure 120. iATU address region mapping: outbound and inbound (address match mode)
[Diagram mapping the untranslated address map to the translated address map for Region n. Labels: start address and end address of Region n; region size = end address – start address; lower base address, limit address, lower target address and upper target address (Region n registers); CX_ATU_MIN_REGION_SIZE.]

The upper 32 bits of the target address register will always form the upper 32 bits of the translated address because:
● The maximum region size is 4 GB.
● A region may not cross a 4 GB boundary.

3. If the region enable bit of the Region Control register is '0' then that region is not used for address matching.

22.7.9 Inbound iATU operations

The main difference between Inbound and Outbound iATU operation is that the TLP TYPE is never changed in the inbound direction. Instead, the TYPE field is used for more precise matching. Other fields may also be optionally used to further refine the matching process. Another difference is that for MEM/IO TLPs, you can select between Address matching (as used in Outbound Operation) or BAR matching. Normally an End Point (EP) will use BAR match mode and a Root Complex (RC) will use address match mode, as an RC normally has no BARs implemented. Lastly, for CFG0 TLPs, you can select between routing ID matching or accept mode. If there is no match then the address is untranslated. In addition,
● TLPs destined for RTRGT0 (internal CDM or ELBI) will not be translated.
● TLPs that are not error-free (ECRC, malformed and so on) will not be translated.
● Address translation of all TLP types (MEM/IO/CFG/MSG) except CPL is supported in Address match mode. In BAR match mode only translation of IO/MEM is supported.
IO/MEM match modes

Inbound address translation for IO/MEM TLPs will operate in one of two matching modes as determined by the 'Inbound match mode' field in the iATU Region Control 2 register.

● Address match mode
The operation is similar to Section 22.7.8: Outbound iATU operation: address match mode. The address field of each request TLP is checked to see if it falls into any of the enabled address regions defined by the 'Start' and 'End' addresses as defined in Figure 120. If an address match is found, then the TLP address field is modified as follows:

Address = Address - Base Address + Target Address

If the TLP address field matches more than one of the eight address regions, then the first enabled region to be matched (the lowest-numbered, from 0 to 7) is used. Address match mode should always be used to match MSG transactions as these will never generate a match against a BAR.

● BAR match mode
Looking for an address match is a two-step process. The address field of MEM/IO (only) request TLPs is checked by the standard internal PCI Express BAR matching mechanism to see if it falls into any address region defined by the enabled BAR addresses and masks. If a matched BAR is found, then that matched BAR ID is compared by the iATU to the 'BAR Number' field in the iATU Region Control 2 register for all enabled regions. Figure 121: iATU address region mapping: inbound (bar match mode) provides more details on inbound translation in BAR match mode. BAR match mode can only be used for MEM/IO transactions.

Normally an EP will use BAR match mode and an RC will use address match mode, as an RC normally has no BARs implemented or at least must handle requests which do not match any of its BARs. However, the user has the freedom to implement any mode in their device.
For example, an EP device may use address match mode, but should be aware that if the address range does not match one of its BAR ranges in an EP, the device will reject the request with Unsupported Request (UR) completion status and no translation will occur.

When the PCIe core is operating with 32-bit BARs, the operation is defined as in Figure 121: iATU address region mapping: inbound (bar match mode).

Figure 121. iATU address region mapping: inbound (bar match mode)
[Diagram mapping the untranslated address map to the translated address map for Region x. Labels: matched BAR number; BAR x; start address; region size set by the BAR mask of the matched BAR (determined by the BAR x Mask register); lower target address and upper target address (Region x registers); CX_ATU_MIN_REGION_SIZE.]

22.7.10 Gen2 5.0 GT/s operation

The PCI express core supports all of the non-optional Gen2 5.0 GT/s features defined in the PCI express 2.0 specification. The core operates at 125 MHz at the Gen1 rate. When operating at the Gen2 rate, the core's clock frequency is changed to 250 MHz.

Software configuration of Gen2 5.0 GT/s operation is available through the Gen2 Related register. If bit 17 “Directed Speed Change” of the Gen2 Related register is set to '1', then the LTSSM will initiate a speed change after the link is initialized. The PCIe core changes the rate signal and waits for a pulse on the phy_mac_phystatus signal to confirm that the PHY has accepted the requested rate.

22.7.11 Power management

An architectural overview of the power management controller is given in Section 22.6.6: Power management control (PMC). There are two types of power management operations:
● Software controlled PCI power management operations
● Active state power management operation (ASPM) for PCIe device only

The L0s link state is controlled by the ASPM L0s enter condition met state.
The L1 link state is controlled either by the ASPM L1 enter condition met state, or by the D-state (D1, D2, or D3) of the PCIe device. The D-state of the PCIe device is programmable by software. The L2/L3 ready state is controlled by D-state and power turn-off event. The power saving increases as the link state number gets larger. Figure 122: Relationship of power down states between link partners shows the link states of PCIe devices and the relationships of power down states between link partners.

Figure 122. Relationship of power down states between link partners
[Timing diagrams of the link states of PCIe device A and PCIe device B as enter and exit conditions are met on either device (idle, enter negotiation, received enter request, received exit request, exit), with a legend ranking the states from highest power to lowest power.]

L0s power down

L0s is a low power state enabled by Active State Power Management (ASPM). ASPM enabled devices can only control L0s entrance of the transmitter. The receiver L0s is controlled by the remote device. To enter this state, all of the following conditions must be met:
● ASPM L0s is enabled.
● The L0s enter conditions defined by the PCI express specification are met for a duration of time and no higher stage of power down is requested.
● The timeout value is controlled by the DEFAULT_L0S_ENTR_LATENCY constant which is set to 4 µs.
To exit from this state, any of the following conditions should occur:
1. Any DLLP or TLP pending to be sent.
2. L1 enter condition met.
3. PCIe link partner request to enter into link recovery.

L1 power down

L1 is a power down state enabled either by ASPM or by the software controlled D1, D2 or D3 state (which is programmed by the system power management unit). L1 state is a bidirectional link power down state. Both link partners must negotiate to go to L1 state.

To enter L1 state due to ASPM there are three possible scenarios (all conditions of a scenario must be met):

Scenario 1: L1 idle timeout from L0s
1. ASPM L1 and L0s are enabled.
2. Link state is in L0s for both transmitter and receiver of the link, and bit 30 of the “Ack Frequency and L0-L1 ASPM Control register” is set to 0 (default setting), OR link state is in L0s of transmitter and bit 30 of the “Ack Frequency and L0-L1 ASPM Control register” is set to 1.
3. The L1 enter conditions defined by the PCIe specification are met for a duration of time and no higher stage of power down is requested.
4. The timeout value is controlled by the DEFAULT_L1_ENTR_LATENCY constant which is set to 8 µs.

Scenario 2: L1 idle timeout from L0
1. ASPM L1 is enabled and L0s is not enabled.
2. Link state is in L0.
3. The L1 enter conditions defined by the PCIe specification are met for a duration of time and no higher stage of power down is requested.
4. The timeout value is controlled by the DEFAULT_L1_ENTR_LATENCY constant which is set to 8 µs.

Scenario 3: Application controlled
1. ASPM L1 is enabled.
2. The application requests to enter L1 by asserting signal app_req_entr_l1.
3. The L1 enter conditions defined by the PCIe specification are met.

To enter L1 state due to D1/D2/D3 states (all conditions must be met):
● All functions are programmed to D1, D2 or D3 states.
● Always enter L1 when L2/L3 PM turn-off negotiation has not yet been done.
To exit from L1 state, any of the following conditions should be met:
● Software requests a higher stage of power down.
● Any DLLP or TLP pending to be sent.
● Application requesting exit of L1 by asserting signal app_req_exit_l1.
● Link partner requesting exit of L1.

Once L1 has exited, another L1 entry will not be initiated for 10 µs if the enter L1 condition is due to ASPM. If the enter L1 condition is due to a lower power D-state, the core will enter L1 again after a wait time of cfg_cpl_sent_count cycles defined in the PL register. This wait time ensures that the exit conditions have been served.

L2/L3 power down

The core has control over the L2 or L3 ready link state. After the L2/L3 ready state is entered, the downstream device will begin preparation for the power and clock removal. After main power has been removed, the link will transition to L2 if Vaux is provided, or it will transition to L3 if no Vaux is provided. L2/L3 ready is a bi-directional link power down state.

To enter L2/L3 state, all of the following conditions should be met:
● PME_Turn_Off/PME_TO_Ack handshake has been completed at any of D0, D1, D2, D3 states.
● Application is ready to be turned off by asserting signal app_ready_entr_l23.

To exit from L2/L3 state, any of the following conditions should be met:
● Device is programmed with capability to support PME and application requests wakeup by asserting the apps_pm_xmt_pme signal or by triggering a native hot-plug event when D-state is in D1, D2 or D3.
● Link partner requesting exit of L2/L3.

The core supports beacon signaling by asserting signal pm_phy_beacongen or wake when a wake-up event is initiated by a PCIe device.

Completion timeout ranges

Timeout ranges are supported as defined in the PCI express 2.0 Specification. The Device Capabilities 2 register (offset 24h) shows support for all ranges.
The Device Control 2 register (offset 28h) will have a reset value equal to the default value in the specification: "0000b Default range: 50 us to 50 ms". If the default value is used then the timeout will be in "Range B: 0101b: 16ms to 55ms." This range was chosen for the default because the PCI express 2.0 specification states "It is strongly recommended that the Completion Timeout mechanism not expire in less than 10 ms." The following table illustrates the specification values versus the PCI Express core values for the ranges.

Table 133. PCIe core completion timeout ranges versus PCI express specification
Range | Encoding | Spec minimum | Spec maximum | PCIe core minimum | PCIe core maximum
Default | 0000b | 50 µs | 50 ms | 28 ms | 44 ms
A | 0001b | 50 µs | 100 µs | 65 µs | 99 µs
A | 0010b | 1 ms | 10 ms | 4.1 ms | 6.2 ms
B | 0101b | 16 ms | 55 ms | 28 ms | 44 ms
B | 0110b | 65 ms | 210 ms | 86 ms | 131 ms
C | 1001b | 260 ms | 900 ms | 260 ms | 390 ms
C | 1010b | 1 s | 3.5 s | 1.8 s | 2.8 s
D | 1101b | 4 s | 13 s | 5.4 s | 8.2 s
D | 1110b | 17 s | 64 s | 38 s | 58 s

22.8 Programming

The programming sequence to configure the PCIe controller is as follows (refer to the MISC registers section in RM0089, Reference manual, SPEAr1340 address map and registers):
● Enable the AXI clock to PCIe (by writing PERIP1_CLK_ENB)
● Release AXI reset to PCIe (by writing PERIP1_SW_RST)
● Set the PCIE_SATA_CFG register to work with PCIe and to enable clock and release power up reset (by writing PCIE_SATA_CFG)
● Configure the PCIe module as an endpoint, a legacy endpoint or a root complex (by writing CR0[28:25] bits)
● Set the app_ltssm_enable bit to allow the LTSSM to continue link establishment (by writing CR0[3] bit)
● Wait for the end of the Link Up sequence by polling the xmlh_link_up bit of the CR3 register (CR3[6])
● Wait for the LTSSM to reach the L0 state by polling the xmlh_ltssm_state bits of the CR3 register (CR3[4:0] = 0x11 is the expected value)

After this sequence the link is up and ready to start communication.
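The last three steps of the sequence can be sketched as follows. This Python model is a hedged illustration, not driver code: read_reg and write_reg are hypothetical callables standing in for whatever register access the platform provides, the register names are used only as lookup keys, and the clock, reset and device type steps are omitted because their bit encodings are documented in RM0089 rather than here. The two polling steps are folded into a single check.

```python
from typing import Callable

APP_LTSSM_ENABLE = 1 << 3   # CR0[3]: app_ltssm_enable
XMLH_LINK_UP = 1 << 6       # CR3[6]: xmlh_link_up
LTSSM_STATE_MASK = 0x1F     # CR3[4:0]: xmlh_ltssm_state
LTSSM_STATE_L0 = 0x11       # expected value once the LTSSM is in L0


def link_ready(cr3: int) -> bool:
    """True when CR3 reports link up and the LTSSM in the L0 state."""
    return bool(cr3 & XMLH_LINK_UP) and (cr3 & LTSSM_STATE_MASK) == LTSSM_STATE_L0


def start_link(read_reg: Callable[[str], int],
               write_reg: Callable[[str, int], None],
               max_polls: int = 1000) -> bool:
    """Set app_ltssm_enable, then poll CR3 until the link is ready or polls run out."""
    write_reg("CR0", read_reg("CR0") | APP_LTSSM_ENABLE)  # read-modify-write of CR0
    for _ in range(max_polls):
        if link_ready(read_reg("CR3")):
            return True
    return False  # the caller decides how to handle a link that never came up
```

Bounding the number of polls is a design choice of this sketch; the manual itself does not specify a timeout for link establishment.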
After this, address translation can be configured as shown below.

22.8.1 Programming example 1

Define Outbound Region 1 as: IO region from 0x80000000_d0000000 - 0x80000000_d000ffff (64k) mapped to 0x00010000 in PCIe IO space.
1. Set up the viewport register
Write 0x00000001 to address {0x700 + 0x200} to set outbound region 1 as the current region.
2. Set up the region base and limit address registers
– Write 0xd0000000 to address {0x700 + 0x20C} to set the lower base address.
– Write 0x80000000 to address {0x700 + 0x210} to set the upper base address.
– Write 0xd000ffff to address {0x700 + 0x214} to set the limit address.
3. Set up the target address registers
– Write 0x00010000 to address {0x700 + 0x218} to set the lower target address.
– Write 0x00000000 to address {0x700 + 0x21C} to set the upper target address.
4. Configure the region via the region control 1 register
Write 0x00000002 to address {0x700 + 0x204} to define the type of the region to be IO.
5. Enable the region
Write 0x80000000 to address {0x700 + 0x208} to enable the region.

22.8.2 Programming example 2

Define Inbound region 2 as: MEM region matching BAR4 (BAR Match mode) mapping to 0x8000000020000000 in the application memory space.
1. Set up the viewport register
Write 0x80000002 to address {0x700 + 0x200} to set inbound region 2 as the current region.
2. Set up the target address registers
– Write 0x20000000 to address {0x700 + 0x218} to set the lower target address.
– Write 0x80000000 to address {0x700 + 0x21C} to set the upper target address.
3. Configure the region via the region control 1 register
Write 0x00000000 to address {0x700 + 0x204} to define the type of the region to be MEM.
4. Enable the region for BAR match mode
Write 0xC0000400 to address {0x700 + 0x208} to enable the region for BAR match mode for BAR#4.
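The two examples above follow the same pattern, which can be captured by small helpers that build the list of register writes. The Python sketch below is an assumption-laden illustration: the function and constant names are hypothetical, and the bit positions (direction in viewport bit 31, region enable in control 2 bit 31, match mode in bit 30, BAR number starting at bit 8) are inferred from the values used in the examples, so they should be checked against RM0089 before use.

```python
PL_BASE = 0x700  # port logic base used in the programming examples
VIEWPORT, CTRL1, CTRL2 = 0x200, 0x204, 0x208
LOWER_BASE, UPPER_BASE, LIMIT = 0x20C, 0x210, 0x214
LOWER_TARGET, UPPER_TARGET = 0x218, 0x21C

INBOUND = 1 << 31         # viewport direction bit (inferred from the examples)
REGION_ENABLE = 1 << 31   # region control 2 (inferred)
BAR_MATCH_MODE = 1 << 30  # region control 2 (inferred)
TYPE_MEM, TYPE_IO = 0x0, 0x2


def _lo(value):
    return value & 0xFFFFFFFF


def _hi(value):
    return (value >> 32) & 0xFFFFFFFF


def outbound_address_match(region, base, limit, target, tlp_type):
    """Return the (address, value) writes for an outbound address match region."""
    return [
        (PL_BASE + VIEWPORT, region),
        (PL_BASE + LOWER_BASE, _lo(base)),
        (PL_BASE + UPPER_BASE, _hi(base)),
        (PL_BASE + LIMIT, _lo(limit)),
        (PL_BASE + LOWER_TARGET, _lo(target)),
        (PL_BASE + UPPER_TARGET, _hi(target)),
        (PL_BASE + CTRL1, tlp_type),
        (PL_BASE + CTRL2, REGION_ENABLE),  # enable last, after the region is described
    ]


def inbound_bar_match(region, bar, target, tlp_type):
    """Return the (address, value) writes for an inbound BAR match region."""
    return [
        (PL_BASE + VIEWPORT, INBOUND | region),
        (PL_BASE + LOWER_TARGET, _lo(target)),
        (PL_BASE + UPPER_TARGET, _hi(target)),
        (PL_BASE + CTRL1, tlp_type),
        (PL_BASE + CTRL2, REGION_ENABLE | BAR_MATCH_MODE | (bar << 8)),
    ]
```

Calling outbound_address_match(1, 0x80000000_D0000000, 0x80000000_D000FFFF, 0x00010000, TYPE_IO) reproduces the write sequence of programming example 1, and inbound_bar_match(2, 4, 0x80000000_20000000, TYPE_MEM) reproduces that of programming example 2. Returning the writes as data rather than performing them keeps the sketch independent of any particular register access method.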
Define Inbound Region 0 as: MEM region matching TLPs with addresses in the range 0x00010000 - 0x0005ffff mapped to 0x1000000020000000 - 0x100000002004ffff in the application memory space.
1. Set up the viewport register
Write 0x80000000 to address {0x700 + 0x200} to set inbound region 0 as the current region.
2. Set up the region base and limit address registers
– Write 0x00010000 to address {0x700 + 0x20C} to set the lower base address.
– Write 0x00000000 to address {0x700 + 0x210} to set the upper base address.
– Write 0x0005ffff to address {0x700 + 0x214} to set the limit address.
3. Set up the target address registers
– Write 0x20000000 to address {0x700 + 0x218} to set the lower target address.
– Write 0x10000000 to address {0x700 + 0x21C} to set the upper target address.
4. Configure the region via the region control 1 register
Write 0x00000000 to address {0x700 + 0x204} to define the type of the region to be MEM.
5. Enable the region
Write 0x80000000 to address {0x700 + 0x208} to enable the region in address match mode.

23 Serial ATA controllers (SATA)

This chapter focuses on SATA functionality and operation. For the SATA feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU

For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

23.1 Overview

The Serial ATA controller implements the serial advanced technology attachment (SATA) storage interface for physical storage devices.
It is compliant with Serial ATA, AMBA and AHCI standards:
● The Serial ATA specifications can be found at the following website: http://sata-io.org
● The AMBA specification can be found at the following website: http://www.arm.com/products/system-ip/amba/amba-open-specifications.php
● The AHCI specification can be found at the following website: http://www.intel.com/technology/serialata/ahci.htm

The SATA controller consists of three main blocks:
● Bus interface unit (BIU)
● Generic registers (GCSR)
● Port

Figure 123. SATA block diagram
[Block diagram. Port: PHY I/F, DS FIFO, link layer, RX FIFO, transport layer, TX FIFO, Port DMA (PDMA), Port registers (PCSR), port power control module, DMA I/F, REG I/F. Bus interface unit (BIU): BIU Master (Master I/F), BIU Slave (Slave I/F). Generic registers (GCSR). Clocks: application clock, RX clock, TX clock, keep-alive clock.]

23.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

23.3 Clocks

The Serial ATA controller operates in four clock domains:
● Application clock (ACLK): this is the AXI bus interface unit clock, used for the AXI master and slave interfaces.
● RX clock (p1_clk_rx): PHY receive clock; synchronous clock used to receive data from the MIPHY.
Note: According to SATA specifications, this clock must never exceed the TX clock frequency by more than 350 ppm.
● TX clock (p1_clk_tx): PHY transmit clock; this clock is generated by the PHY for clocking the Port link and Transport layers (TX clock domain): 37.5 MHz, 75 MHz, and 150 MHz.
● Power module keep-alive clock (ref_clk): this free-running clock is used by the link layer power module to facilitate power management. This clock has an allowable range of 20–150 MHz.

23.4 Resets

● ARESET: AXI BIU reset, AXI reset for AXI master and slave interfaces. The application must reset the BIU interface when it is asserting a reset to the core.
● Reset_rx: PHY receive clock domain reset, asynchronous power-on reset input for the RX clock domain.
● Reset_tx: PHY transmit clock domain reset, asynchronous power-on reset input for the PHY TX clock domain.
● p1_rst_phy_n: Power module keep-alive clock domain reset, asynchronous power-on reset input for the power module clock domain.

23.5 Interrupts

See Appendix A: Interrupts.

23.6 Functional description

23.6.1 Bus interface unit (BIU)

The bus interface unit provides two AXI interfaces:
● AXI Master: this interface enables the SATA AHCI DMA engine to read and write to an AXI slave connected to the AXI BIU.
● AXI Slave: this interface enables an AXI master to read and write through the AXI BIU to the SATA AHCI registers.

The Port DMA (PDMA) module implements the following per-port functions:
● Connection to the AXI BIU Master using an AXI-specific DMA interface
● PRD prefetch capability of the SATA core using a small PRD FIFO

The PDMA PRD prefetch logic utilizes separate AXI bus IDs for PRD data and DMA data to enhance performance. Figure 124 shows a detailed block diagram of the BIU.

Figure 124. Bus interface unit block diagram
[Block diagram. Slave side: AXI GS (write address, write data, write response, read address, read response channels), Core GS, GIF, BIU register read MUX, read response stall, error response handling, register write address and data towards the generic registers (GCSR). Master side: AXI GM (write address, write data, write response, read address, read data channels), AXI GM data converter, BIU DMA arbiter, request and response channels towards the Port.]

The AXI BIU Module provides an interface between the DesignWare SATA AHCI IP’s application interface and the AXI interconnect.
It enables a SATA AHCI Host to be connected to an AXI slave and AXI master, thus enabling SATA-compliant devices to be connected to the system (when the Host IP is combined with a SATA-compliant PHY). This module includes the following:
● AXI master and slave protocol handlers
● Internal slave and master control for generic request and response interfaces
● Data converters
● Register read MUX
● DMA request arbiter

The slave and master protocol handlers support the AXI protocol conversion between an AXI transfer and a generic transfer within the BIU, which is converted to master and slave requests and responses. The slave also requires a Read Response Stall module to break an input/output timing through-path, as well as an Error Response Handling module.

Feature limitations

The following list identifies the limitations when using the AXI bridge with the SATA core.
● For a burst transaction, if an external AXI slave is going to respond with DECERR or SLVERR for any data beats, it must respond with DECERR or SLVERR for all data beats in the transfer.
● The AXI slave interface returns register data in order, as data is immediately available.
● Writes to the AXI Slave must be performed with awid and wid “in-order”. In other words, the ID for address and data writes must contain the same ID.
● No support for AXI exclusive transfer.
● SATA AHCI will always perform DMA transactions with the same request IDs for each of PRD and Data type requests, so that completions of AXI master read requests are always returned in-order.
● AXI bus interleave is not supported. Writes to the AXI Slave must be performed with awid and wid “in-order”. In other words, the ID for address and data writes must contain the same ID, and the entire write data must be written (until wlast is asserted) before another master can access the slave. Read response data to the AXI Master must also not be interleaved.
In other words, once a read response for a particular request has begun, it must complete before rid can be asserted for a different response.
● Order enforcement in the AXI BIU slave for the register response data is first come, first served only, but data is available within a few clock cycles.

AHCI Core operations

Supported AXI transfer type

The AXI BIU Module is compliant with the AMBA 3.0 AXI specification.

Supported AXI burst operations

● For AXI Slave transfers (accessing AHCI registers): The AXI BIU supports only the incremental burst type (INCR) and not the WRAP and FIXED burst types. INCR is used in conjunction with ARLEN and AWLEN to define any length of burst. If an attempt is made to read or write slave addresses beyond the register boundary, an error response is provided for the beats that are not legal. Addresses within the AHCI address range that are not populated or defined return 0x0 as response data. No holes are supported in bursts, including zero-write bytes in any beat.
● For AXI Master transfers (AHCI DMA transactions): The AXI BIU supports only the incremental burst type (INCR) and not the WRAP and FIXED burst types. INCR is used in conjunction with ARLEN and AWLEN to define any length of burst. The application can configure the master maximum request size that the system slaves can take. If the slave’s maximum request size is smaller than the AHCI maximum transfer size, then the AXI BIU will split the DMA request into two or more requests. The responses of the split requests are required to be returned in-order, because the request ID is fixed for each of PRD and Data type requests. Finally, when a request does not start at an AXI data bus width address boundary, the AXI Master first issues one request for the data up to the next data bus width boundary, in order to internally align and optimize all subsequent requests.
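The splitting behaviour described for AXI Master transfers can be illustrated with a short model. This Python sketch is an approximation written for explanation only: split_dma_request and its parameters are hypothetical names, the exact hardware algorithm is not documented here, and the sketch additionally applies the rule, stated under "Maximum AXI transfer burst length", that no single transfer crosses a 4 KB boundary. It assumes max_req_bytes is a multiple of bus_bytes.

```python
def split_dma_request(addr, length, max_req_bytes, bus_bytes):
    """Split one DMA request into a list of (address, size) AXI requests."""
    assert length >= 0 and max_req_bytes > 0 and bus_bytes > 0
    requests = []
    end = addr + length

    # An unaligned start is handled by one short request up to the next
    # data bus width boundary, so that all later requests start aligned.
    misalign = addr % bus_bytes
    if misalign and addr < end:
        first = min(bus_bytes - misalign, end - addr)
        requests.append((addr, first))
        addr += first

    # Remaining data: limited by the maximum request size and by the
    # distance to the next 4 KB boundary.
    while addr < end:
        room_to_4k = 0x1000 - (addr % 0x1000)
        size = min(max_req_bytes, room_to_4k, end - addr)
        requests.append((addr, size))
        addr += size
    return requests
```

For example, a 256-byte request starting at 0x1004 with a 64-byte maximum request size and an 8-byte data bus is split into a 4-byte alignment request followed by three 64-byte requests and a final 60-byte request.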
Supported AXI transfer size
The AXI master performs precise DMA writes via write strobes to the intended write request locations and supports all AXI transfer sizes. DMA requests are limited to a minimum 16-bit transfer size (see Section 23.6.3: Port). The AXI Master in some cases performs reads that extend beyond the actual data request. If an AXI bus request length does not end at a bus-aligned address, the request is performed with full bus width beats and the last beat may over-read up to the full bus data width to complete the request. This excess data is discarded internally and not written to a connected disk. The AXI slave supports all sizes and burst lengths of an incremental type, within the AHCI address space. The AXI BIU slave supports all non-aligned starting addresses for both read and write register access.
DMA transaction order enforcement through the AXI BIU Master
Order enforcement through the AXI BIU DMA is handled differently for read and write transactions.
AXI Master bus write transfers: The AXI BIU master uses the same ID for all DMA write transactions, thus enforcing correct order for the write and response. PRDs are not written to the AXI bus.
AXI Master bus read transfers: The AXI BIU master uses one ID for all DMA read transactions and another ID for all PRD read transactions (per AHCI port). This enforces correct order on the returned data. There can be no ordering issue between PRDs and Data, because Data requests are the result of particular PRD read requests.
AXI Slave bus transfers: The AXI BIU slave returns all data and responses in order, responding with the appropriate ID for all read and write transactions. All register data is available within a few clock cycles, and is handled first come, first served.
Note: The AXI Master ID is an encoded version of the Port’s one-bit ID (Data/PRD) and the Port number; its width is determined by the number of ports. The AXI Slave ID width is set by the maximum number of Masters in the system.
AXI write with data gaps (“Holes”)
The AXI Master never performs write accesses with “holes” (writes only, reads N/A). The AXI slave allows non-contiguous write byte enables such as 4’b0101, as long as the AXI protocol is followed. Reads must always be performed on contiguous data, as AXI does not provide control over individual bytes via strobes for reads.
Maximum AXI transfer burst length
The SATA AHCI core is programmed to support certain maximum transfer sizes. The AXI BIU supports sequential burst transfers with a maximum burst length determined by the AXI interconnect. The AXI BIU can support more than the traditional 16-beat maximum burst length, such that bursts of up to 4 KB may be supported. The maximum AXI transfer length is limited by the data bus width, the request address, and the maximum AXI burst length, in beats. The AXI BIU supports mismatches that occur when the AXI maximum transfer length is different from the AHCI DMA maximum request size. The DMA engine automatically accounts for the difference and limits the requests such that the maximum AXI request size is not exceeded, and ensures that no 4 KB boundary is crossed in any single transfer. The application must define its system requirements such as maximum DMA request size and maximum AXI burst length. Resources within the AXI BIU are allocated based on these system parameters.
23.6.2 Generic registers (GCSR)
This module implements the generic registers. For a detailed register description, refer to the companion document: RM0089, Reference manual, SPEAr1340 address map and registers.
23.6.3 Port
The Port instantiates the following modules:
● Port DMA
● Port registers
● Transport layer
● Link layer
● Port power control module
Port DMA
The port DMA (PDMA) module implements the following functions:
● Monitors commands posted by system software using the P#CI register. When any of the command slots becomes active, PDMA downloads the corresponding Register FIS from the Command List structure and passes it to the Transport Layer TxFIFO for transmission to the Device.
● Controls data transfer between the Transport Layer FIFOs and system memory using Physical Region Descriptor Tables (PRDT).
● During Data FIS reception, PDMA requests an AMBA write transfer of P#DMACR.RXTS size from the BIU Master when the RxFIFO contains data of at least this size.
● During Data FIS transmission, PDMA requests an AMBA read transfer of P#DMACR.TXTS size from the BIU Master when the TxFIFO contains space of at least this size.
Note: PDMA requests read transactions of P#DMACR.TXTS size regardless of the TxFIFO space up to the PRD or Data FIS limit (whichever is smaller). When the read data is returned by the AXI BIU Master, the data flow is controlled by the PDMA-BIU interface.
● Transfers non-Data FISes received from the device to system memory using the Received FIS Structure.
Most of the communication between the PDMA and software is done using two system memory descriptors that are constructed by software prior to initiating the transfer: the FIS descriptor, which contains FISes received from the device, and the Command List, which contains a list of 1 to 32 commands available for the Port to execute and the pointers for data transfers. Some additional communication is done via registers located in the GCSR and PCSR modules. System memory structures are described in the SATA AHCI specification.
The PDMA module operates in the application clock (aclk) domain and has a 32-bit-wide data path.
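Two of the PDMA decisions described above lend themselves to a short illustration: finding which command slots software has posted in P#CI, and deciding when enough received data is buffered to request a write transfer. The sketch below is a behavioral model only. The function names are hypothetical, both quantities in the threshold check are expressed in DWORDs as a simplifying assumption (the actual P#DMACR.RXTS field encoding is defined in RM0089 and is not modeled), and no claim is made about the order in which the hardware services active slots.

```python
def active_command_slots(ci_value, num_slots=32):
    """Return the command slot numbers whose bit is set in a P#CI value.

    The Command List holds 1 to 32 commands, so num_slots defaults to 32.
    """
    return [slot for slot in range(num_slots) if ci_value & (1 << slot)]


def rx_write_transfer_ready(rx_fifo_dwords, rxts_dwords):
    """True when the RxFIFO holds at least one RXTS-sized block of data.

    Both arguments are in DWORDs (an assumption made for this sketch).
    """
    return rx_fifo_dwords >= rxts_dwords


if __name__ == "__main__":
    print(active_command_slots(0b1010_0001))
    print(rx_write_transfer_ready(rx_fifo_dwords=48, rxts_dwords=64))
```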
Port registers
The port registers (PCSR) module implements all port-specific registers:
● Command list and FIS base addresses
● Interrupt status/enable
● Port command/status
● Task file data/signature/serial ATA
● DMA status/control
● PHY status/control
Transport layer
The transport layer functional block diagram is shown in Figure 125. The Transport Layer consists of the following five main modules:
● Receive FIFO (RxFIFO)
● Transmit FIFO (TxFIFO)
● Transport check module (TCHK)
● Transport state machine module (TSM)
● Synchronization module (APP_ASIC)
Figure 125. Transport layer functional block diagram
[Block diagram: Link Layer (LL), Transport Check (TCHK), Receive FIFO (RxFIFO), Transmit FIFO (TxFIFO), Transport State Machine (TSM), Sync Module (SYN), and Port DMA (PDMA), split between the TX clock domain (clk_asic#) and the application clock domain (clk_app). RX Data[32:0] and TX Data[32:0] pass through the FIFOs; RX Data[31:0] is delivered to the PDMA.]
The transport layer operates in two clock domains: transmit and application. The transmit clock is generated in the PHY and depends on the Link Layer data path width (valid frequency values are: 37.5 MHz, 75 MHz, and 150 MHz). The application clock is sourced from the system bus and depends on the software. Both transmit and receive data paths are 32 bits wide.
The Transport Layer block provides the FIS reception and transmission functions of the SATA Transport Layer. During reception the Transport Layer receives a new FIS from the link layer through the RxFIFO, decodes the FIS type, and instructs the PDMA to route the FIS payload data to the appropriate location in system memory.
During transmission the Transport Layer instructs the PDMA to construct the appropriate FIS, and then passes it to the Link Layer through the TxFIFO. The transport layer block receives all the PHY/Link errors from the Link Layer, detects Transport errors, and passes them to the PCSR for setting the corresponding error bits.
The Transport Layer processes one FIS at a time on the transmit side, meaning only one FIS is allowed in the TxFIFO at a time. On the receive side, the RxFIFO can potentially contain more than one FIS at a time. For example, when the device transmits several DMA Data FISes back-to-back with minimal delay, the RxFIFO might still hold the previous Data FIS while the next FIS is being received.
Transport layer FIS reception
The FIS reception process is described as follows:
● The Link Layer starts frame reception and passes FIS content to the transport layer TCHK. The RxFIFO “almost full” flag notifies the link layer to send HOLDp to the device to prevent RxFIFO overflow. Upon detecting EOFp, the link layer asserts an “End status” signal to indicate the end of the FIS. All link layer/PHY errors are valid at this time.
● The TCHK module checks for transport layer protocol errors, passes FIS data to the RxFIFO, then appends the “End status” DWORD at the end with all the link/PHY and transport errors.
● The TSM module receives the FIS from the RxFIFO and passes it to the PDMA/PCSR. When any Link/PHY/Transport error is detected, the FIS is either ignored (non-Data FIS) or the transfer is aborted (Data FIS), and the corresponding bits are set in the P#SERR register.
Transport layer FIS transmission
The FIS transmission process is described as follows:
● The PDMA detects a request from the system software and notifies the TSM to enter a transmit state.
Note: The DMA data transmission is activated by the TSM after it receives a DMA Activate FIS from the device.
● The PDMA receives the appropriate FIS from the BIU Master and pushes it into the TxFIFO. The following FIS types are supported:
– Register FIS - Control or Command type.
– Data FIS - PIO or DMA type.
– BIST Activate FIS
● The link layer uses negation of the TxFIFO “empty” flag to generate SOFp and begin frame transmission. Bit 32 of the TxFIFO is used to indicate the FIS “last DWORD” to the Link Layer. When the Link Layer sees this bit valid, it closes the frame with CRC and EOFp.
● The TSM waits for either positive or negative frame transmission acknowledgement from the Link Layer (Link Layer “handshake” error). Both of these conditions are passed from the Link Layer to the TSM in the “End Status” DWORD. Negative acknowledgement is generated when the device detects an error during the frame reception and signals it to the host Link Layer. In this case any non-data FIS is resent to the device using the Transport Layer retry logic. When the error is detected during Data FIS transmission, the transfer is aborted and the FIS is not resent. When neither positive nor negative acknowledgement is received from the Link Layer following frame transmission, host software times out and resets the interface.
Receive/Transmit FIFO (Rx/TxFIFO)
Both receive and transmit FIFOs are used as temporary FIS buffers and for clock domain crossing. The RxFIFO is 33 bits wide: 32 bits are used to transfer data and the 33rd bit is used to indicate the End Status DWORD, so that the Transport Layer can detect the end of the previous FIS and the start of the next FIS when more than one FIS is in the RxFIFO. The TxFIFO is 33 bits wide: 32 bits are used to transfer data and the 33rd bit is used to indicate the last FIS DWORD to the Link Layer. Both FIFOs are reset on power-up by the system bus reset signal, by software setting SControl register bit 0 (COMRESET), or by the COMINIT condition.
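The 33-bit RxFIFO word format described above can be modeled in a few lines. The following is a behavioral sketch, not RTL: the helper names `pack_rx_word` and `split_fises` are hypothetical, and the sample DWORD values are arbitrary. It shows how bit 32 lets a reader separate several FISes that are queued back-to-back in the RxFIFO.

```python
DATA_MASK = 0xFFFFFFFF      # bits 31:0 carry the DWORD
END_STATUS_BIT = 1 << 32    # bit 32 set marks the End Status DWORD


def pack_rx_word(dword, end_status=False):
    """Build one 33-bit RxFIFO word from a 32-bit DWORD and the flag."""
    word = dword & DATA_MASK
    if end_status:
        word |= END_STATUS_BIT
    return word


def split_fises(fifo_words):
    """Split a stream of RxFIFO words into (fis_dwords, end_status) pairs.

    Each FIS is terminated by a word with bit 32 set; the low 32 bits of
    that word carry the accumulated link/PHY/transport status.
    """
    fises = []
    current = []
    for word in fifo_words:
        dword = word & DATA_MASK
        if word & END_STATUS_BIT:
            fises.append((current, dword))
            current = []
        else:
            current.append(dword)
    return fises


if __name__ == "__main__":
    stream = [
        pack_rx_word(0x00000046), pack_rx_word(0x11223344),
        pack_rx_word(0x0, end_status=True),
        pack_rx_word(0x00000046),
        pack_rx_word(0x1, end_status=True),
    ]
    print(split_fises(stream))
```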
Based on the system bus software requirements, a value of 1024 (2048 DWORDS) was selected for the FIFO. An RxFIFO “almost full” flag is set to comply with the SATA HOLDp latency requirement: the Link Layer sends HOLDp on the back channel when this flag is asserted to prevent RxFIFO overflow. Data is read from the RxFIFO or written into the TxFIFO by the PDMA when there is enough data in the RxFIFO or room in the TxFIFO for a given DMA transaction size.
Transport check (TCHK)
The TCHK module provides the following functions:
● Detects new FIS reception by the Link Layer based on the received control signals.
● Decodes the FIS type located in the least-significant byte of the first DWORD and checks its validity. The following FIS types are supported:
– Register FIS
– Set Device Bits FIS
– PIO Setup FIS
– DMA Activate FIS
– DMA Setup FIS
– Data FIS
– BIST Activate FIS
– Unknown FIS (length is less than or equal to 64 bytes)
● Checks for all the Transport Layer errors (unrecognized FIS, protocol, transition, etc.).
● Detects an “End Status” signal assertion indicating the end of the current FIS from the Link Layer and passes all Link Layer/PHY/Transport Layer errors to the RxFIFO and to the PCSR module.
● Provides “Good FIS/Bad FIS” status acknowledgement to the Link Layer at the end of the received FIS.
The TCHK module receives 32-bit FIS DWORD data from the Link Layer and adds one bit (bit 32) before writing it to the RxFIFO. This bit indicates either FIS data, when cleared, or the “End Status” DWORD, when set.
The following Transport Layer errors are checked in the TCHK (assuming no errors were detected in the Link/PHY):
1. FIS length:
– Non-data FIS according to the FIS type
– Data FIS should be between 2 and 2049 DWORDs
– Unknown FIS should be between 1 and 16 DWORDs
2. PIO Setup FIS transfer count - should be a non-zero, even byte count that does not exceed 8192 bytes
3. PIO Data FIS following the PIO Setup FIS with D=1 (PIO read) DWORD count - should match the transfer count
4. PIO read protocol FIS sequence - only a Data FIS, or end status on error, is expected after the PIO Setup FIS with D=1; any other FIS is negatively acknowledged to the Link Layer
5. DMA Setup FIS buffer offset - bits 0 and 1 should be cleared and the transfer count should be an even, non-zero number
6. First Party DMA read protocol - DMA Setup FIS with D=1 is followed either by a Data FIS, by a Set Device Bits FIS, or by end status on error
7. First Party DMA write protocol - DMA Setup FIS with D=0 is followed by a DMA Activate FIS (when A=0) or by a Set Device Bits FIS or end status (when A=1)
8. BIST Activate FIS is supported type only
9. RxFIFO push error for Data FIS - detected when the Link has valid data and the RxFIFO is “full” (for example, the device violates the HOLD latency requirement)
The Transport Transition Error P#SERR.DIAG_T bit is set when errors 1 to 8 are detected. The Unknown FIS P#SERR.DIAG_F bit is set when the Unknown FIS length does not exceed 64 bytes. The Protocol Error P#SERR.ERR_P bit is set on detection of error 9.
Transport state machine (TSM)
The TSM module provides the following functions:
● Implements the host Transport Layer state machine according to the SATA specification, with the exception of the FIS checking and error handling functions.
● Decodes the FIS type by reading the least-significant byte of the first DWORD of the FIS.
● Detects the “End status” DWORD and checks for any Link Layer/PHY/Transport Layer errors. When any error is detected:
– On a non-data FIS, the received FIS is discarded, the transmitted FIS is retried indefinitely, and the corresponding P#SERR register ERR_I bit is set.
– On a data FIS, it can be passed to the system memory before the final status is reflected in the P#SERR register ERR_T bit.
● Generates/receives the appropriate control signals to/from the PDMA based on the received FIS and its state.
● Handles transfer termination requests originated from the Link Layer or PDMA module.
Sync module (APP_ASIC)
This module is used to synchronize several control signals between the Link Layer and the Transport Layer clock domains.
Link layer
The Link layer functional block diagram is shown in Figure 126.
Figure 126. Link layer functional block diagram
[Block diagram. Receive path: RX PHY control signal decode, 8b10b decoding, data alignment, clk_rbc#/clk_asic0 synchronization (optional), data conversion, descrambler, repeat primitive drop, deframer, RX CRC check and output register, BIST data checker. Transmit path: framer, CRC calculator, shared data and RPD scrambler, BIST data generation, data converter, 8b10b encoding. Control: main link state machine, PHY/Link initialization state machine, RX OOB detection (optional, from SigDetect), TX OOB generation (optional).]
On power-up, system reset or device hot-plug, the following sequence occurs:
1. The Link Layer transmits sequences of control data and ALIGN Primitives to the PHY.
2. They are then forwarded to a device PHY as OOB signaling.
3. In addition, the Link Layer detects OOB sequences.
These OOB sequences bring the host controller, PHY, and device to an initialized condition. Once this occurs:
1. The Link Layer passes a PHY Ready status to the Transport Layer and normal communication begins.
2. The Link Layer receives requests from the Transport Layer to transmit data, in the form of a Frame Information Structure (FIS) comprised of DWORDs, to a device via the local PHY.
3. The Link Layer in turn transmits the FIS by inserting Primitives, scrambling and optionally encoding the data, sending it to the PHY and waiting for status.
4. When a status FIS is received, the Link Layer optionally decodes, aligns and descrambles the data, removes Primitives and forwards the data to the Transport Layer.
5. The Link Layer then notifies the Transport Layer of the ending transfer status. The Link Layer has no notion of the FIS content, other than its beginning and end points and CRC.
6. Data alignment is performed on received FIS data via ALIGN Primitives. Flow control is also achieved on FISes going in either direction via HOLD Primitives.
7. In addition, the Link Layer receives requests from the Transport and PHY Layers to go into and out of power management modes. Power management is achieved by notifying the PHY of a partial or slumber condition and then disabling normal data transmission on the PHY RX and TX interfaces until a wake-up request from the Transport Layer or from the remote device via the PHY is seen from the Power Control Module.
Power management is controlled via Partial and Slumber requests as described in the SATA specifications. The Initialization State Machine controls the Link Layer, PHY and device system initialization. The main Link Layer State Machine controls FIS traffic, flow control, error detection and status reporting. FIS traffic is generated and disassembled via the Framer and Deframer modules. The Link Layer also performs CRC calculations on FISes, as well as scrambling and optionally encoding the data.
● Optional decoding of the received FIS is performed in the Rx clock domain, because the incoming FIS is on an asynchronous but frequency-locked clock of the same rate as the Tx clock domain.
● 8b/10b encoding and decoding are performed in the Link Layer.
The Link Layer receives data on either the Rx clock, recovered from the incoming data stream by the PHY, or on the Tx clock. This single receive clock is then used in this module to decode data and control signals from the PHY and pass them to the rest of the Link Layer. Data is passed through a synchronizing Datastream FIFO.
ALIGN Primitives are also detected and dropped in the front end of the receiver as a means of guaranteeing no Datastream FIFO overruns, when a Datastream FIFO is included. ALIGN Primitives are also used to synchronize to the data stream in the PHY by triggering data realignment where necessary. Finally, ALIGNs are required by the TX OOB initialization state machine to complete initialization, following the SATA specifications. For this reason, the PHY must indicate the presence of at least two ALIGNs after the Link Layer detects the release of COMWAKE. Otherwise, the Link Layer is not able to complete initialization and begin normal operation. This is required regardless of whether the PHY drops ALIGNs at any other time.
Note: Even if the PHY drops ALIGNs, data indicating the comma character must be present on phy_rx_data, in the corresponding phy_comma_det slot. This is required to invalidate comma characters before they are stable.
Link layer features
The SATA Link Layer features are as follows:
● Highly configurable PHY interface with selectable data widths
● Optional RX Data Buffer for recovered clock systems
● Optional OOB signaling and system initialization
● 1.5 Gb/s and 3.0 Gb/s speed negotiation when TX OOB signaling is selected
● Frame negotiation and arbitration
● Envelope framing/deframing
● CRC calculation, insertion and checking
● Optional 8b/10b encoding/decoding
● Flow control
● Frame acknowledgement and status reporting
● Data width conversions
● Data scrambling/de-scrambling for EMI reduction
● Repeat Primitive data transmission and reception handling
● ALIGN Primitive detection, dropping and data alignment
● Power management support
Configurable PHY interface
Many of the SATA features are detailed in the SATA specifications, and are not repeated here.
RX and TX Data
The SATA PHY interface data width is 20 bits for both RX and TX (16+4 for the 8b/10b encoding).
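As a worked example, the relationship between the serial line rate, the encoded PHY interface width, and the parallel clock frequency can be checked with simple arithmetic. The relation used here (clock = line rate / encoded interface width) is an inference that happens to reproduce the transmit clock values quoted for the transport layer (37.5 MHz, 75 MHz, and 150 MHz); it is not a formula stated in this manual. The function name is hypothetical, and the 40-bit width is shown only because it yields 37.5 MHz, not because this device is known to use it.

```python
def parallel_clock_mhz(line_rate_mbps, encoded_width_bits):
    """Parallel interface clock, in MHz, for a given serial line rate.

    Each clock cycle moves encoded_width_bits of 8b/10b-encoded data,
    so the clock is the line rate divided by that width.
    """
    return line_rate_mbps / encoded_width_bits


if __name__ == "__main__":
    # 20-bit interface (16 data bits + 4 bits of 8b/10b overhead)
    print(parallel_clock_mhz(1500, 20))   # 1.5 Gb/s
    print(parallel_clock_mhz(3000, 20))   # 3.0 Gb/s
    # Hypothetical 40-bit interface at 1.5 Gb/s
    print(parallel_clock_mhz(1500, 40))
```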
The Link Layer can receive data on a clock recovered from the incoming data stream (Rx clock), or the data can already be synchronized into the Tx clock domain. When data is presented to the Link Layer on a recovered clock, the Link Layer synchronizes data into the Tx clock domain via a Datastream FIFO.
Port power control module
The port power control module (PCM) implements the following functions:
● Monitors Transport, Link and PHY ready/not ready conditions, as well as Device and Host power requests.
● Systematically controls the Link and Transport Layer transitions into and out of offline conditions (system reset, COMRESET and power modes).
● Allows Tx clock and Rx clock to be stopped during Slumber and Partial power modes.
The PCM main function is to allow disabling Tx clock and Rx clock in SATA power down modes.
Note: If Tx clock or Rx clock are stopped, Near End Analog Loopback mode is not supported when a device is connected to the system. Therefore, it is recommended to only stop clocks in Slumber mode, in order to support Near End Analog Loopback mode when a device is connected.
In order to support Host-initiated power modes where Rx clock and Tx clock are removed, the PMACK received from the Device must be able to make it through the Rx clock domain, synchronization, and the Link Layer Tx clock domain RX Data path to the Link state machine, before the clocks can be removed. The SATA specifications allow a Device to transmit 4 to 16 PMACKs before going into power down. While 16 PMACKs are enough to guarantee receipt by the Link state machine, 4 are not. In the cases where a Device does not send enough PMACKs, the clocks will need to be kept running long enough for the Link state machine to detect the PMACK, or the Host will not go completely into power down mode and a Host COMRESET would be required to exit the failed power mode.
Figure 127 shows a high-level state diagram of the power control module.
Figure 127. Port power control module diagram
[Block diagram: RX data from the PHY enters the RX front end in the Rx clock domain and crosses into the Tx clock domain through the Rx clock/Tx clock synchronization data stream FIFOs. RX OOB detection (from sigdet) runs in its own clock domain and supplies COMWAKE/COMINIT. The Link layer drives TX data to the PHY and exchanges Tx clock wake-up, partial, slumber and power mode request signals with the power control module, which sits in the always-alive clock domain. The power control module exchanges power mode request and enable, wake-up, and Partial/Slumber signals with the transport layer, and drives PHY Slumber; once this is asserted, the PHY can remove Tx clock. The module systematically controls power-down and wake-up, allowing the PHY to remove clk_asic.]
Note: Regardless of whether OOB detection is in the PHY or SATA, the power control module always uses COMWAKE and COMINIT.
Note: There are clock-crossing synchronizers on all power control module I/O.
The power control module exists in the “always alive” power module keep-alive clock domain. The power module keep-alive clock must always be present and must never change frequency. All signals into and out of the PCM are synchronized between the power module keep-alive clock, Tx clock, Rx clock, and the application clock (aclk) domains with one or more synchronizers. The Power Control Module ensures that all SATA Layers and the PHY move correctly between inactive and active states in unison.
Note: Within the core there is no difference between going into and out of Partial and Slumber power modes, even if a system disables Tx clock and Rx clock in one mode, but not the other. Clocks do not have to be removed in either mode.
23.7 Programming
23.7.1 Software initialization
The SATA software initialization consists of two independent phases: a firmware phase (platform BIOS) and a system software phase.
This section contains the following topics:
● Firmware specific initialization
● System software specific initialization
Firmware specific initialization
The firmware initialization is done on power-up. The following registers should be initialized to values that reflect the capabilities supported by the platform:
● CAP.SSS = support for staggered spin-up
● CAP.SMPS = support for mechanical presence switches
● PI = ports implemented
● P#CMD.HPCP = whether the Port is hot plug capable. P#CMD.HPCP should be set to 1 when P#CMD.MPSP or P#CMD.CPD is set to 1 for the Port.
● P#CMD.MPSP = whether a mechanical presence switch is attached to the Port.
● P#CMD.CPD = whether cold presence detect logic is attached to the Port.
Note: Firmware should initialize the HPCP, MPSP, and CPD bits for each port implemented on the platform as defined by the PI register.
After firmware has initialized the above mentioned registers, it should then perform the following steps to complete the staggered spin-up process (when applicable to the platform) on each port implemented (as indicated by the PI register):
1. Ensure that P#CMD.ST=0, P#CMD.CR=0, P#CMD.FRE=0, P#CMD.FR=0, and P#SCTL.DET=0.
2. Allocate memory for the command list and the FIS receive area. Set P#CLB and P#CLBU to the physical address of the allocated command list. Set P#FB and P#FBU to the physical address of the allocated FIS receive area. Then set P#CMD.FRE to 1.
3. Initiate a spin-up of the SATA drive attached to the Port by setting P#CMD.SUD to 1.
4. Wait for a positive indication that a device is attached to the Port (the maximum time to wait for presence indication is specified in the Serial ATA specification). This is done by polling P#SSTS.DET. When P#SSTS.DET returns a value of 1h or 3h, the firmware should continue to the next step; otherwise, when the polling process times out, it moves to the next implemented Port and returns to Step 1.
5. Clear the P#SERR register by writing ones to each implemented bit location.
6. Wait for an indication that the SATA drive is ready. This is determined through examination of P#TFD.STS. When P#TFD.STS.BSY, P#TFD.STS.DRQ, and P#TFD.STS.ERR are all 0, prior to the maximum allowed time as specified in the ATA/ATAPI-7 specification, the device is ready.
System software specific initialization
Software may perform the SATA global reset prior to initializing by setting GHC.HR to 1 when desired. When firmware (BIOS) has already allocated memory and initialized the appropriate registers for the command list and FIS receive area, the software may skip this step in the process.
The following steps allow system software to place the SATA into a minimally initialized state:
1. Determine which ports are implemented by the SATA, by reading the PI register. This bit map value helps the software determine how many ports are available and which Port registers need to be initialized.
2. Ensure that the SATA is not in the running state by reading and examining each implemented Port’s P#CMD register. When P#CMD.ST, P#CMD.CR, P#CMD.FRE and P#CMD.FR are all cleared, the Port is in an idle state. Otherwise, the Port is not idle and should be placed in the idle state prior to manipulating the SATA global and Port specific registers. System software places a Port into the idle state by clearing P#CMD.ST and waiting for P#CMD.CR to return 0 when read. Software should wait at least 500 ms for this to occur. When P#CMD.FRE is set to 1, software should clear it to 0 and wait at least 500 ms for P#CMD.FR to return 0 when read. When P#CMD.CR or P#CMD.FR do not clear to 0 correctly, software may attempt a Port reset or a global reset to recover.
3. Determine how many command slots the HBA supports, by reading CAP.NCS.
4. For each implemented Port, system software should allocate memory for and program:
– P#CLB and P#CLBU (when CAP.S64A is set to 1)
– P#FB and P#FBU (when CAP.S64A is set to 1)
It is good practice for system software to zero-out the memory allocated and referenced by P#CLB and P#FB. After setting P#FB and P#FBU to the physical address of the FIS receive area, system software should set P#CMD.FRE to 1.
5. For each implemented Port, clear the P#SERR register, by writing ones to each implemented bit location.
6. Determine which events should cause an interrupt, and set each implemented Port’s P#IE register with the appropriate enables. To enable the SATA to generate interrupts, system software must also set GHC.IE to 1.
Note: Due to the multi-tiered nature of the SATA interrupt architecture, system software must always ensure that the P#IS (clear this first) and IS.IPS (clear this second) registers are cleared to ‘0’ before programming the P#IE and GHC.IE registers. This prevents any residual bits set in these registers from causing an interrupt to be asserted.
Note: Software should not set P#CMD.ST to 1 until it is determined that a functional device is present on the Port, as indicated by the P#TFD.STS.BSY, P#TFD.STS.DRQ, and P#TFD.STS.ERR bits all being cleared and P#SSTS.DET=3h.
To enable the P#TFD register to be updated with the initial Register FIS for a Port, the P#SERR.DIAG_X bit must be cleared to 0.
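Step 2 of the system software sequence (placing a Port into the idle state) and the "functional device present" check can be sketched as follows. This is a minimal illustration, not driver code. The bit positions are those defined by the AHCI specification for the port command, task file data, and SATA status registers and should be confirmed against RM0089. The `FakePort` class is a hypothetical stand-in for memory-mapped register access that simply models hardware clearing CR and FR once ST and FRE are cleared; the function names are likewise illustrative.

```python
import time

# Bit positions per the AHCI specification (verify against RM0089).
PXCMD_ST = 1 << 0     # Start
PXCMD_FRE = 1 << 4    # FIS Receive Enable
PXCMD_FR = 1 << 14    # FIS Receive Running
PXCMD_CR = 1 << 15    # Command List Running
IDLE_MASK = PXCMD_ST | PXCMD_CR | PXCMD_FRE | PXCMD_FR


class FakePort:
    """Hypothetical register model; real code would access MMIO."""

    def __init__(self, cmd):
        self.cmd = cmd

    def read_cmd(self):
        return self.cmd

    def write_cmd(self, value):
        # Model the hardware: CR follows ST, FR follows FRE.
        if not value & PXCMD_ST:
            value &= ~PXCMD_CR
        if not value & PXCMD_FRE:
            value &= ~PXCMD_FR
        self.cmd = value


def wait_clear(port, mask, timeout_s=0.5, poll_s=0.001):
    """Poll P#CMD until the masked bits read 0 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if not port.read_cmd() & mask:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def port_to_idle(port, timeout_s=0.5):
    """Place a Port in the idle state; False means a reset is needed."""
    if not port.read_cmd() & IDLE_MASK:
        return True                                   # already idle
    port.write_cmd(port.read_cmd() & ~PXCMD_ST)       # clear ST
    if not wait_clear(port, PXCMD_CR, timeout_s):     # wait for CR=0
        return False
    if port.read_cmd() & PXCMD_FRE:
        port.write_cmd(port.read_cmd() & ~PXCMD_FRE)  # clear FRE
        if not wait_clear(port, PXCMD_FR, timeout_s): # wait for FR=0
            return False
    return True


def device_functional(tfd, ssts):
    """BSY (bit 7), DRQ (bit 3), ERR (bit 0) all 0 and P#SSTS.DET == 3h."""
    return (tfd & 0x89) == 0 and (ssts & 0xF) == 0x3
```

A caller would treat a `False` return from `port_to_idle` as the cue to attempt the Port reset or global reset mentioned in step 2, and would set P#CMD.ST only after `device_functional` returns `True`.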
23.7.2 Software manipulation of Port DMA
This section contains the following topics:
● Start (P#CMD.ST)
● FIS Receive Enable (P#CMD.FRE)
Start (P#CMD.ST)
When P#CMD.ST is set to 1, software is not allowed to perform the following actions:
● Manipulate P#CMD.POD to power on or off a device through cold presence detect logic (when supported by the platform and enabled in the SATA);
● Manipulate P#SCTL.DET to change the PHY state;
● Manipulate P#CMD.SUD to spin-up the device (when supported by the platform)
The above actions are only allowed while the Port is in the Not Running state, indicated by both P#CMD.ST and P#CMD.CR being 0.
Software should set P#CMD.ST only after the following conditions become true:
● P#CMD.CR is verified to be cleared to 0 and P#CMD.FRE has been set to 1;
● A functional device is present on the Port (as determined by P#TFD.STS.BSY=0, P#TFD.STS.DRQ=0, and P#SSTS.DET=3h) and P#CLB/P#CLBU are programmed to valid values.
FIS Receive Enable (P#CMD.FRE)
When P#CMD.FRE is set (causing P#CMD.FR to be set to 1), the Port receives FISes from the device and copies them into system memory. When P#CMD.FRE is cleared (causing P#CMD.FR to be cleared to 0), received FISes are held in the RxFIFO, and when it is full, further FIS reception is blocked.
Software is allowed to manipulate P#CMD.FRE so that it may move the FIS receive area to a new location. After clearing this bit to 0, software must first wait for P#CMD.FR to clear to 0, indicating that the Port DMA engine for FIS reception is in an idle condition. When P#CMD.FR and P#CMD.FRE are both cleared to 0, software may update the values of P#FB and P#FBU. Prior to setting P#CMD.FRE to 1, software should ensure that P#FB and P#FBU are set to valid values. Software should not write P#FB/P#FBU while P#CMD.FRE is set to 1.
Software should set P#CMD.FRE to 1 prior to setting P#CMD.ST to 1.
Software should not clear P#CMD.FRE while P#CMD.ST or P#CMD.CR is set to 1. Upon global or Port reset, the P#CMD.FRE bit is cleared. The D2H Register FIS containing the device signature is accepted by the Port, and the signature field is updated.

When the SATA Port stops running due to an error (e.g., P#IS.IFS is set to 1), FISes may not be posted until the P#CMD.ST bit is cleared to 0 to recover from the error.

24 SATA/PCIe physical interface (MiPHY)

This chapter focuses on MiPHY functionality and operation.
For the MiPHY feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU
For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

24.1 Overview

The MiPHY macrocell implements the lower (physical) layer protocols providing data transmission and reception over a dual differential pair cable. The TX (transmit) and RX (receive) serial channels operate plesiochronously (NRZ). The macrocell can be used in Host or Device applications.

Figure 128. MiPHY application diagram
[Diagram: System 1 (Host) and System 2 (Device), each with a system clock and a controller connected to its MiPHY over 20-bit buses; the two MiPHYs exchange serial data over copper (cable, PCB).]

24.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

24.3 Clocks

The macrocell has an embedded PLL that can be configured to use either internal or external reference clocks. The PLL internal divider can be programmed through dedicated registers in the miscellaneous module.

The macrocell uses the following clocks:
● ref_clk: reference clock for the internal PLL
● p1_clk_tx: Serializer output clock. It feeds the PCIe or SATA controller TX path, depending on the pcie_sata_sel value (see miscellaneous register PCIE_SATA_CFG). This clock is asynchronous with respect to the ACLK clock (PCIe/SATA AXI interface clock).
● p1_clk_rx: Deserializer output clock. It is asynchronous with respect to the ACLK clock.

24.3.1 Reference clock configuration

The reference clock has a direct impact on the frequency accuracy. Its precision must be in line with the respective specifications (SATA/PCI Express). The frequency of the reference clock also affects the calibration time.

The reference clock may be configured as described in Table 134. Whatever the PLL reference clock, the macrocell provides it on the p1_clk_osc pin.

Figure 129 shows the reference clock selection circuitry. To select the reference clock, you must configure the miscellaneous register PCIE_MIPHY_CFG.

Table 134. p1_clk_osc selection truth table
p1_osc_bypass  p1_osc_force_ext  osc_ext_sel  pll_ref_div[1:0]  p1_clk_osc
0              0                 X            00                Crystal reference clock
0              0                 X            01                Crystal reference clock divided by 2
0              0                 X            10                Crystal reference clock divided by 4
0              0                 X            11                Crystal reference clock divided by 8
1              0                 X            00                Differential external reference clock
1              0                 X            01                Differential external reference clock divided by 2
1              0                 X            10                Differential external reference clock divided by 4
1              0                 X            11                Differential external reference clock divided by 6
X              1                 0            00                Internal SoC reference clock (clk_pll_ref_zi)
X              1                 0            01                Internal SoC reference clock (clk_pll_ref_zi) divided by 2
X              1                 0            10                Internal SoC reference clock (clk_pll_ref_zi) divided by 4
X              1                 0            11                Internal SoC reference clock (clk_pll_ref_zi) divided by 6
X              1                 1            00                Internal SoC reference clock (clk_pll_ref_2V5_zi)
X              1                 1            01                Internal SoC reference clock (clk_pll_ref_2V5_zi) divided by 2
X              1                 1            10                Internal SoC reference clock (clk_pll_ref_2V5_zi) divided by 4
X              1                 1            11                Internal SoC reference clock (clk_pll_ref_2V5_zi) divided by 6

Figure 129. Reference clock selection circuitry
[Diagram: crystal or differential clock on xtal1/xtal2 feeding the oscillator circuit; selection controls p1_osc_bypass, p1_osc_force_ext and osc_ext_sel; internal reference inputs clk_pll_ref_zi/clk_pll_ref_nzi and clk_pll_ref_2v5_zi/clk_pll_ref_2v5_nzi; divider pll_ref_div[1:0] producing fref for the macrocell PLL (VCO, PLL high frequency clock); outputs p1_clk_osc, clk_osc_zo/clk_osc_nzo (enabled by clk_osc_zo_en) and clk_osc_2v5_zo/clk_osc_2v5_nzo.]

24.3.2 Recommended clock frequencies

Default configuration:
● PCIe selected
● PLL reference clock @ 100 MHz
● PLL output clock @ 2.5 GHz
● p1_clk_tx and p1_clk_rx @ 125 MHz when in gen1, and @ 250 MHz in gen2

Configuration for SATA selection:
● SATA selected (by configuring pcie_sata_sel)
● PLL reference clock @ 25 MHz
● PLL ratio set to 0x78 (PCIE_MIPHY_CFG[7:0]), PLL output clock @ 3 GHz
● p1_clk_tx and p1_clk_rx @ 75 MHz when in gen1, and @ 150 MHz in gen2
–or–
● SATA selected (by configuring pcie_sata_sel)
● PLL reference clock @ 100 MHz
● PLL ratio set to 0x3C (PCIE_MIPHY_CFG[7:0]), PLL output clock @ 3 GHz
● p1_clk_tx and p1_clk_rx @ 75 MHz when in gen1, and @ 150 MHz in gen2

24.3.3 SerDes clocks

The SerDes generates the px_clk_tx and px_clk_rx clocks from the high frequency PLL clock. The px_tx_spdsel, px_rx_spdsel, px_tx_lspd and px_power_mode[2:0] clocks are directly managed by the PCIe or SATA controllers.

Note: Regarding the pin naming convention px_yyy:
– p is the port.
– x is the macrocell port number (x is always 1).
– yyy is the pin name.

Figure 130. SerDes clocks
[Diagram: SerDes receiving clk/clkb from the PLL (high frequency) and clk_ref from the PLL (low frequency), with clock recovery and dividers (/5 or 10, /2, 4 or 8); signals shown: fref_ready, px_tck, px_rx_lspd, px_clk_rx, px_rx_spdsel, px_power_mode[2:0], px_tx_lspd, px_tx_spdsel, px_clk_tx, p1_clk_osc.]

24.4 Resets

● p1_rst_phy_n: global reset for the macrocell (internal PLL included). When this reset is asserted, the macrocell is in minimum power mode (everything is off).
● p1_rst_tx: serializer data path reset
● p1_rst_rx: deserializer data path reset

24.5 Functional description

As shown in Figure 131, the macrocell contains the following blocks:
● PLL: provides the high-speed reference clock for transmit and receive channels.
● SerDes: includes the standard-compliant transmit and receive functions:
– SER: transmitter module
– DES: clock and data recovery module
– I-DLL: oversampling clocks generator module
– PMC: power management controller module
● Compensation: performs TX and RX buffer 100-ohm impedance compensation (see Section 24.5.3).

Figure 131. MiPHY functional block diagram
[Diagram: 1-port macrocell with PLL reference clock input, sigma delta PLL, SerDes (SER driving txp/txn, DES receiving rxp/rxn, I-DLL, PMC, DOC, DIC) and the Compensation block with its reference resistor.]

24.5.1 PLL description

A sigma delta PLL provides the bit stream clock to all SerDes modules of the macrocell through a propagation line. The PLL has the following properties:
● harmonic PLL
● differential generated clock provided to all SerDes through a propagation line
● frequency range from 2.5 GHz to 3.0 GHz
● fractional PLL with 1 ppm frequency precision
● SSC modulation feature included

The PLL is set to a dedicated frequency for each standard:
● SATA: 3 GHz
● PCI Express: 2.5 GHz

The PLL is controlled by only one SerDes (the first on the PLL right side). However, the PLL provides its status (PLL locked flag signal) to all SerDes.

24.5.2 SerDes description

One SerDes module has all the circuitry needed to support one port of SATA or PCI Express.
SER module
● differential output signal TXP/TXN with programmable
– swing
– pre-emphasis
– slew-rate
● detection of a peer transceiver on TXP/TXN pads (PCI Express feature only with low swing TX buffer)
● de-emphasis depending on TX buffer choice
● 8b10b encoder

I-DLL module
This module generates, from the single PLL clock, the multiple clocks that allow the deserializer to sample the incoming data stream.

DES module
● equalization of RXP/RXN input
● signal detection circuitry on RXP/RXN input (both synchronous and asynchronous for OOB sequence and wake-up circuitry)
● clock and data recovery (CDR)
● 8b10b decoder with error detection

PMC module
This module manages the following:
● macrocell wake up procedure
● macrocell power modes.

24.5.3 Compensation module (COMPENS) description

This block compensates the RX buffer input impedance, TX buffer output impedance, and TX buffer output slew rate over process, voltage and temperature variations, using an external reference resistor.

24.6 Operation

Figure 132 shows how the MiPHY is integrated in the SPEAr1340 device.

To choose whether the multiplexer (MUX) selects PCIe or SATA, configure bit 0 (pcie_sata_sel) of the miscellaneous register PCIE_SATA_CFG:
● Set pcie_sata_sel = 0 to select the PCIe block.
● Set pcie_sata_sel = 1 to select the SATA block.

Figure 132. MiPHY module in SPEAr1340
[Diagram: SPEAr top level with MISC, RCG, PCIe0 and SATA0 controllers, a MUX controlled by pcie_sata_sel[0], and the single-lane MiPHY with its PLL; signals shown: p1_rst_phy_n, p1_clk_auxi, p1_clk_rx, p1_clk_tx, p1_data_in, p1_data_out; pads MIPHY_S_0_TXp/TXn, MIPHY_S_0_RXp/RXn, MIPHY_S_XTAL1/XTAL2. The MiPHY single PLL control signal comes from a MISC register.]

25 Asynchronous serial ports (UART)

This chapter focuses on UART functionality and operation.
For the UART feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU
For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

25.1 Overview

The SPEAr1340 device integrates 2 instances of an asynchronous serial port digital block, identified as UART0 and UART1.

Asynchronous serial ports (commonly referred to as UARTs) perform the main task in computer serial communication: they convert outgoing parallel data into a serial stream that can be sent on a communication line connected to an external peripheral device, and convert incoming serial data back into parallel data.

Typical UART use cases include the connection of SPEAr-based platforms to debugging consoles, the communication with modems, and the interfacing of Bluetooth, DECT or ZigBee chipsets.

UART ports usually do not directly generate or receive the external signals sent between different pieces of equipment. External interface devices convert the logic level signals of the UART to and from the external signal levels. External signals can take many different forms, such as RS-232, infrared, and wireless radio. In particular, the SPEAr1340 UART interfaces directly support (by software selection) the IrDA-compliant SIR (Serial InfraRed) protocol.

The SPEAr1340 UART offers functionality similar to the industry-standard 16C650 UART device. The UART supports standard asynchronous communication bits (start, stop, and parity), which are added prior to transmission and removed on reception.

Figure 133. UART block diagram
[Block diagram: APB interface and register block (PCLK, PRESETn, PSEL, PENABLE, PWRITE, PADDR[11:2], PWDATA[15:0], PRDATA[15:0]), baud rate generator (UARTCLK, nUARTRST, baud rate divisor, Baud16), 16x8 transmit FIFO and transmitter (UARTx_TXD, SIROUT (1)), 16x12 receive FIFO and receiver (UARTx_RXD, SIRIN), control and status, FIFO status and interrupt generation (UARTTXINTR, UARTRXINTR, UARTMSINTR, UARTRTINTR, UARTEINTR, UARTINTR), DMA interface (UARTRXDMASREQ, UARTRXDMABREQ, UARTRXDMACLR, UARTTXDMASREQ, UARTTXDMABREQ, UARTTXDMACLR), and modem signals (UART0_RIn, UART0_CTSn, UART0_DSRn, UART0_DCDn, UART0_DTRn, UART0_RTSn, nUARTOut1, nUARTOut2).]

Note: (1) For more information on this signal, refer to Section 25.5.4: IrDA SIR ENDEC.

25.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

25.3 Clocks

The UART uses the PCLK clock for APB bus transactions and the reference clock UARTCLK for internal operations as well as for baud rate generation.

UARTCLK has certain constraints with regard to PCLK. See Section 25.5.5: Baud rate generation and transmit logic for more information.

Note: For details on UART clock configuration, see also Chapter 5: Reset and clock generator (RCG).

25.4 Interrupts

See also: Appendix A: Interrupts.

Table 135 shows a summary of the 11 maskable interrupts generated within the UART. These interrupts are combined to form four individual interrupt outputs and one which is the logical OR of the individual outputs.

Any individual interrupt can be enabled or disabled by changing the corresponding mask bit in the UARTIMSC register. The status of the individual interrupt sources can be read either from the UARTRIS register for raw status, or from the UARTMIS register for the masked status.

Table 135. UART interrupt summary with combined outputs
Name          Source                                      Combined outputs
UARTRXINTR    Receive FIFO                                UARTRXINTR, UARTINTR
UARTTXINTR    Transmit FIFO                               UARTTXINTR, UARTINTR
UARTRTINTR    Receive time-out in Receive FIFO            UARTRTINTR, UARTINTR
UARTCTSINTR   Clear to send                               UARTMSINTR, UARTINTR
UARTDCDINTR   Data carrier detect                         UARTMSINTR, UARTINTR
UARTDSRINTR   Data set ready                              UARTMSINTR, UARTINTR
UARTRIINTR    Ring indicator modem status                 UARTMSINTR, UARTINTR
UARTOEINTR    Overrun error                               UARTEINTR, UARTINTR
UARTBEINTR    Break error (in reception)                  UARTEINTR, UARTINTR
UARTPEINTR    Parity error in the received character      UARTEINTR, UARTINTR
UARTFEINTR    Framing error in the received character     UARTEINTR, UARTINTR

UARTRXINTR
This interrupt is asserted when one of the following events occurs:
● If the FIFOs are enabled and the receive FIFO reaches the programmed trigger level. To clear this interrupt, either read data from the receive FIFO until its fill level drops below the trigger level, or write 1'b1 to the corresponding bit of the UARTICR register.
● If the FIFOs are disabled and data is received, thereby filling the location. To clear this interrupt, either perform a single read of the receive FIFO, or write 1'b1 to the corresponding bit of the UARTICR register.

UARTTXINTR
This interrupt is asserted when one of the following events occurs:
● If the FIFOs are enabled (FEN bit set to 1'b1 in the UARTLCR_H register) and the transmit FIFO reaches the programmed trigger level (TXIFLSEL in the UARTIFLS register). To clear this interrupt, either write data to the transmit FIFO until its fill level rises above the trigger level, or write 1'b1 to the corresponding bit of the UARTICR register.
● If the FIFOs are disabled and there is no data in the transmitter single location. To clear this interrupt, either perform a single write to the transmit FIFO, or write 1'b1 to the corresponding bit of the UARTICR register.

UARTRTINTR
This interrupt is asserted when the receive FIFO is not empty, and no further data is received over a 32-bit period.
This interrupt clears either when the receive FIFO becomes empty through reading all the data (or by reading the holding register), or when 1'b1 is written to the corresponding bit of the UARTICR register.

UARTMSINTR
This interrupt is asserted if any of the modem status lines change:
● UARTRIINTR, because of a change in the nUARTRI modem status.
● UARTCTSINTR, because of a change in the nUARTCTS modem status.
● UARTDCDINTR, because of a change in the nUARTDCD modem status.
● UARTDSRINTR, because of a change in the nUARTDSR modem status.

UARTEINTR
This error interrupt is triggered when there is an error in the reception of the data. The interrupt can be caused by a number of different error conditions, such as overrun, break, parity and framing.

UARTINTR
This is the logical OR of all the individual masked interrupt sources. This interrupt is asserted if any of the individual interrupts are asserted and enabled.

25.5 Functional description

25.5.1 Main interfaces

APB interface
The APB interface block generates read and write decodes for accesses to control and status registers (CSRs) as well as to transmit/receive FIFO memories.

Register block
The register block stores data written, or to be read, across the APB interface.

Baud rate generator
The baud rate generator contains free-running counters that generate the internal x16 clocks and the Baud16 signal. Baud16 provides timing information for UART transmit and receive control. It consists of a stream of pulses with a width of one UARTCLK clock period and a frequency of 16 times the baud rate.

Transmit FIFO
The transmit FIFO is an 8-bit wide, 16-location deep FIFO memory buffer. CPU data written across the APB interface is stored in this FIFO until read out by the transmit logic.
Note: The transmit FIFO block can be disabled to act like a one-byte holding register.
Receive FIFO
The receive FIFO is a 12-bit wide, 16-location deep FIFO memory buffer. Received data and corresponding error bits are stored in the receive FIFO by the receive logic until read out by the CPU across the APB interface.
Note: The receive FIFO block can be disabled to act like a one-byte holding register.

Transmit logic
The transmit logic performs parallel-to-serial conversion on the data read from the transmit FIFO. The control logic outputs the serial bit stream beginning with a start bit, followed by the data bits (LSB first), and ended by the parity bit and stop bit(s), according to the configuration programmed in the control registers.
See also: Section 25.5.5: Baud rate generation and transmit logic.

Receive logic
The receive logic performs serial-to-parallel conversion on the received serial bit stream after a valid start pulse has been detected. The receive logic also checks for overrun, parity and framing errors and detects line breaks; the resulting status accompanies the data that is written to the receive FIFO.
See also: Section 25.5.5: Baud rate generation and transmit logic.

Interrupt generation logic
The UART generates individual maskable active HIGH interrupts. A combined interrupt output is also generated as an OR function of the individual interrupt requests.
A single combined interrupt can be used in a system interrupt controller that provides another level of masking on a per-peripheral basis. This enables the use of modular device drivers that always know where to find the interrupt source control register bits.
See also: Section 25.4: Interrupts.

DMA interface
The UART provides a DMA interface to connect to a DMA controller. The DMA operation of the UART is controlled through the UART DMA control register (UARTDMACR). The DMA interface includes the following signals:
● For receive:
– UARTRXDMASREQ: Single character DMA transfer request, asserted by the UART. For receive, one character consists of up to 12 bits. This signal is asserted when the receive FIFO contains at least one character.
– UARTRXDMABREQ: Burst DMA transfer request, asserted by the UART. This signal is asserted when the receive FIFO contains more characters than the programmed watermark level. You can program the watermark level for each FIFO using the Interrupt FIFO Level Select Register, UARTIFLS.
– UARTRXDMACLR: DMA request clear, asserted by a DMA controller to clear the receive request signals. If DMA burst transfer is requested, the clear signal is asserted during the transfer of the last data in the burst.
● For transmit:
– UARTTXDMASREQ: Single character DMA transfer request, asserted by the UART. For transmit, one character consists of up to eight bits. This signal is asserted when there is at least one empty location in the transmit FIFO.
– UARTTXDMABREQ: Burst DMA transfer request, asserted by the UART. This signal is asserted when the transmit FIFO contains fewer characters than the watermark level. You can program the watermark level for each FIFO using the Interrupt FIFO Level Select Register, UARTIFLS.
– UARTTXDMACLR: DMA request clear, asserted by a DMA controller to clear the transmit request signals. If DMA burst transfer is requested, the clear signal is asserted during the transfer of the last data in the burst.

The burst transfer and single transfer request signals are not mutually exclusive, so they can both be asserted at the same time.

When the UART is in the FIFO disabled mode (where both FIFOs act like a one-byte holding register), only the DMA single transfer mode can operate, because only one character can be transferred to or from the FIFO at any time.

When the UART is in the FIFO enabled mode, data transfers can be made by either single or burst transfers depending on the programmed watermark level and the amount of data in the FIFO.
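The request conditions listed above can be summarized in a small software model. This is a behavioural sketch for illustration only, not a description of the hardware implementation: the struct and function names are hypothetical, and the watermark arguments are assumed to be character counts already decoded from UARTIFLS.

```c
#include <stdbool.h>

#define UART_FIFO_DEPTH 16u  /* both FIFOs are 16 locations deep */

/* Hypothetical container for the four request outputs. */
struct uart_dma_req {
    bool rx_single;  /* models UARTRXDMASREQ */
    bool rx_burst;   /* models UARTRXDMABREQ */
    bool tx_single;  /* models UARTTXDMASREQ */
    bool tx_burst;   /* models UARTTXDMABREQ */
};

/* rx_fill / tx_fill: characters currently held in each FIFO.
 * rx_wm / tx_wm: watermark levels in characters (ASSUMED decoding). */
static struct uart_dma_req uart_dma_requests(unsigned rx_fill, unsigned tx_fill,
                                             unsigned rx_wm, unsigned tx_wm,
                                             bool fifo_enabled)
{
    /* With FIFOs disabled, each FIFO acts as a one-byte holding register. */
    unsigned depth = fifo_enabled ? UART_FIFO_DEPTH : 1u;
    struct uart_dma_req r;

    r.rx_single = rx_fill >= 1u;        /* at least one character received */
    r.tx_single = tx_fill < depth;      /* at least one empty location     */
    /* Burst requests only operate in FIFO enabled mode. */
    r.rx_burst = fifo_enabled && rx_fill > rx_wm;
    r.tx_burst = fifo_enabled && tx_fill < tx_wm;
    return r;
}
```

The model makes it easy to see that single and burst requests can be active at the same time, for example when the receive FIFO holds more characters than its watermark.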
In addition, the DMAONERR bit in the DMA Control Register (UARTDMACR) supports the use of the receive error interrupt, UARTEINTR. It enables the DMA receive request outputs, UARTRXDMASREQ or UARTRXDMABREQ, to be masked out when the UART error interrupt, UARTEINTR, is asserted. The DMA receive request outputs remain inactive until UARTEINTR is cleared. The DMA transmit request outputs are unaffected.

Note: The two UART receive and transmit DMA interfaces, as shown in Table 49: DMAC MUX selecting the peripheral, are called UARTx_RX (comprising UARTRXDMABREQ, UARTRXDMASREQ and UARTRXDMACLR) and UARTx_TX (comprising UARTTXDMABREQ, UARTTXDMASREQ and UARTTXDMACLR), where x is the instance number.

Synchronization registers and logic
Because the UART supports both asynchronous and synchronous operation of the clocks PCLK and UARTCLK, synchronization registers and handshaking logic have been implemented and are active at all times. Control signals are synchronized in both directions of data flow.

25.5.2 Modem operation

You can use the UART in data terminal equipment (DTE) mode. Figure 133: UART block diagram shows the modem signals in the DTE mode, while the following table shows the meaning of the signals.

Table 136. Meaning of modem input/output in DTE and DCE modes
Port name   Meaning in DTE         Meaning in DCE
nUARTCTS    Clear to send          Request to send
nUARTDSR    Data set ready         Data terminal ready
nUARTDCD    Data carrier detect    –
nUARTRI     Ring indicator         –
nUARTRTS    Request to send        Clear to send
nUARTDTR    Data terminal ready    Data set ready

25.5.3 Hardware flow control

The hardware flow control feature is fully selectable, and enables you to control the serial data flow by using the nUARTRTS output and nUARTCTS input signals. Figure 134 shows how two devices can communicate using hardware flow control.

Figure 134. Hardware flow control between two similar devices
[Diagram: UART 1 and UART 2, each with RX FIFO and flow control and TX FIFO and flow control; the nUARTRTS output of each UART is connected to the nUARTCTS input of the other.]

When the RTS flow control is enabled, the nUARTRTS signal is asserted until the receive FIFO is filled up to the programmed watermark level. When the CTS flow control is enabled, the transmitter can only transmit data when the nUARTCTS signal is asserted.

The hardware flow control is selectable through bits 14 (RTSEn) and 15 (CTSEn) of the UART control register (UARTCR). Table 137 lists how these bits must be set to enable RTS and CTS flow control, either simultaneously or independently.

When RTS flow control is enabled, the software cannot control the nUARTRTS line through bit 11 of the UART control register.

Figure 135. Hardware flow control transfer diagram (start of transfer)
[Timing diagram: UART1 RX nUARTRTS and UARTx_RXD frames, against UART2 TX nUARTCTS and transmitted frames; each frame consists of start bit, B0 to B7, Pbit and stop bit.]

Figure 136. Hardware flow control transfer diagram (end of transfer)
[Timing diagram: UART1 RX nUARTRTS and received frames, against UART2 TX nUARTCTS and transmitted frames; each frame consists of start bit, B0 to B7, Pbit and stop bit.]
No further frame is transferred after the frame shown, because CTS is high. UART2 completes the ongoing frame and then stops transmitting.

Table 137. Control bits to enable and disable hardware flow control
UARTCR bit 15 (CTSEn)  UARTCR bit 14 (RTSEn)  Description
1                      1                      Both RTS and CTS flow control enabled
1                      0                      Only CTS flow control enabled
0                      1                      Only RTS flow control enabled
0                      0                      Both RTS and CTS flow control disabled

RTS flow control
The RTS flow control logic is linked to the programmable receive FIFO watermark levels. When RTS flow control is enabled, the nUARTRTS is asserted until the receive FIFO is filled up to the watermark level.
When the receive FIFO watermark level is reached, the nUARTRTS signal is deasserted, indicating that there is no more room to receive any more data. The transmission of data is expected to cease after the current character has been transmitted.

The nUARTRTS signal is reasserted when data has been read out of the receive FIFO so that it is filled to less than the watermark level. If RTS flow control is disabled and the UART is still enabled, then data is received until the receive FIFO is full, or no more data is transmitted to it.

CTS flow control
If CTS flow control is enabled, the transmitter checks the nUARTCTS signal before transmitting the next byte. If the nUARTCTS signal is asserted, the byte is transmitted, otherwise transmission does not occur.

Data continues to be transmitted while nUARTCTS is asserted and the transmit FIFO is not empty. If the transmit FIFO is empty and the nUARTCTS signal is asserted, no data is transmitted.

If the nUARTCTS signal is deasserted and CTS flow control is enabled, the current character transmission is completed before stopping. If CTS flow control is disabled and the UART is enabled, the data continues to be transmitted until the transmit FIFO is empty.

25.5.4 IrDA SIR ENDEC

The IrDA SIR ENDEC comprises:
● an IrDA SIR transmit encoder and
● an IrDA SIR receive decoder

The transmit encoder modulates the non return-to-zero (NRZ) transmit bit stream output from the UART. The IrDA SIR physical layer specifies use of a return to zero inverted (RZI) modulation scheme that represents logic 0 as an infrared light pulse. The modulated output pulse stream is transmitted to an external output driver and infrared light emitting diode (LED).

In normal mode the transmitted pulse width is specified as three times the period of the internal x16 clock (Baud16), that is, 3/16 of a bit period.
In low-power mode the transmit pulse width is specified as 3/16 of a 115.2 kbit/s bit period. This is implemented as three times the period of a nominal 1.8432 MHz clock (IrLPBaud16) derived by dividing down the UARTCLK clock. The frequency of IrLPBaud16 is set up by writing the appropriate divisor value to UARTILPR.

The active low encoder output is normally LOW for the marking state (no light pulse). The encoder outputs a high pulse to generate an infrared light pulse representing a logic 0 or spacing state.

In normal and low-power IrDA modes, when the fractional baud rate divider is used, the transmitted SIR pulse stream includes an increased amount of jitter. This jitter arises because the Baud16 pulses cannot be generated at regular intervals when fractional division is used; that is, the Baud16 cycles contain different numbers of UARTCLK cycles. It can be shown that the worst case jitter in the SIR pulse stream can be up to three UARTCLK cycles. This is within the limits of the SIR IrDA Specification, where the maximum amount of jitter allowed is 13%, as long as UARTCLK is > 3.6864 MHz and the maximum baud rate used for normal mode SIR is <= 115.2 kbit/s. Under these conditions, the jitter is less than 9%.

The receive decoder demodulates the return-to-zero bit stream from the infrared detector and outputs the received NRZ serial bit stream to the UART received data input. The decoder input is normally HIGH (marking state) in the idle state. The output polarity of the transmit encoder is opposite to that of the decoder input.

A start bit is detected when the decoder input is LOW. Regardless of the power mode (normal or low-power), a start bit is deemed valid if the decoder input is still LOW one period of IrLPBaud16 after the LOW was first detected. This enables a normal-mode UART to receive data from a low-power mode UART that can transmit pulses as small as 1.41 µs.

IrDA operation
The IrDA SIR block (see Figure 137) contains an IrDA SIR protocol ENDEC.
The SIR protocol ENDEC can be enabled for serial communication through signals nSIROUT and SIRIN to an infrared transducer, instead of using the UART signals UARTTXD and UARTRXD.

Figure 137. UART/IrDA block diagram
[Block diagram: wrapper around the UART (APB, TXD, RXD, SIREN), SIR transmit encoder (UARTTXD in, nSIROUT out), SIR receive decoder (nSIRIN in, UARTRXD out), and OR/AND/MUX logic controlled by UART_SIR_SEL that drives UART0_TXD and takes UART0_RXD.]

To enable the SIR interface, you must configure the miscellaneous register PERIP_CFG, bit uart*_sir_uart_sel.

If the SIR protocol is enabled, the UARTTXD line is held in the passive state (HIGH) and transitions of the modem status or the UARTRXD line have no effect; this protocol can receive and transmit, but it is half-duplex only.

The IrDA SIR ENDEC provides functionality that converts between an asynchronous UART data stream and a half-duplex serial SIR interface. No analog processing is performed on-chip. The role of the SIR ENDEC is to provide a digital encoded output and a decoded input to the UART. There are two modes of operation:
● In normal IrDA mode, a zero logic level is transmitted as a high pulse lasting 3/16 of the selected baud rate bit period on the nSIROUT signal, while logic one levels are transmitted as a static LOW signal. These levels control the driver of an infrared transmitter, sending a pulse of light for each zero. On the reception side, the incoming light pulses energize the photo transistor base of the receiver, pulling its output LOW. This drives the SIRIN signal LOW.
● In low-power IrDA mode, the width of the transmitted infrared pulse is set to three times the period of the internally generated IrLPBaud16 signal (1.63 µs, assuming a nominal 1.8432 MHz frequency) by changing the appropriate bit in UARTCR.
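The pulse widths of the two modes can be checked numerically with a short sketch. The helper names below are hypothetical and the code only restates the arithmetic of this section: 3/16 of a bit period in normal mode, three IrLPBaud16 periods in low-power mode, with IrLPBaud16 obtained by dividing UARTCLK by the UARTILPR divisor and kept within the 1.42 MHz to 2.12 MHz range that this chapter gives for it.

```c
#include <stdbool.h>

#define IRLPBAUD16_NOMINAL_HZ 1843200.0  /* nominal 1.8432 MHz */

/* Normal mode: pulse width is 3/16 of one bit period, in microseconds. */
static double sir_pulse_normal_us(double baud_rate)
{
    return (3.0 / 16.0) / baud_rate * 1e6;
}

/* Low-power mode: pulse width is three IrLPBaud16 periods, in microseconds. */
static double sir_pulse_low_power_us(double f_irlpbaud16_hz)
{
    return 3.0 / f_irlpbaud16_hz * 1e6;
}

/* IrLPBaud16 frequency obtained from UARTCLK and the UARTILPR divisor. */
static double irlpbaud16_hz(double uartclk_hz, unsigned divisor)
{
    return uartclk_hz / (double)divisor;
}

/* Range stated for IrLPBaud16: 1.42 MHz < F < 2.12 MHz. */
static bool irlpbaud16_in_range(double f_hz)
{
    return f_hz > 1.42e6 && f_hz < 2.12e6;
}
```

For example, a 48 MHz UARTCLK with a divisor of 26 gives an IrLPBaud16 of about 1.846 MHz, which lies inside the allowed range. The upper limit of 2.12 MHz corresponds to a pulse of about 1.41 µs.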
In both normal and low-power IrDA modes:
● during transmission, the UART data bit is used as the base for encoding
● during reception, the decoded bits are transferred to the UART receive logic

The IrDA SIR physical layer specifies a half-duplex communication link, with a minimum 10 ms delay between transmission and reception. This delay must be generated by software because it is not supported by the UART. The delay is required because the infrared receiver electronics might become biased, or even saturated, from the optical power coupled from the adjacent transmitter LED. This delay is known as latency, or receiver setup time.

The IrLPBaud16 signal is generated by dividing down the UARTCLK signal according to the low-power divisor value written to UARTILPR. The low-power divisor value is calculated as follows:
Low-power divisor = F_UARTCLK / F_IrLPBaud16
where F_IrLPBaud16 is nominally 1.8432 MHz. The divisor must be chosen so that 1.42 MHz < F_IrLPBaud16 < 2.12 MHz.

IrDA data modulation
Figure 138 shows the effect of IrDA 3/16 data modulation:

Figure 138. IrDA data modulation (3/16)
[Waveform: TXD, nSIROUT (pulses of 3/16 bit period), SIRIN and RXD for one frame consisting of start bit, data bits and stop bit.]

25.5.5 Baud rate generation and transmit logic

UART character frame
This is the frame format transmitted on the UARTx_TXD pin and received on the UARTx_RXD pin.

Figure 139. UART character frame
[Frame on UARTx_TXD/UARTx_RXD: Start bit, B0, B1, ..., B7, Pbit, Stop bit]
Legend:
B: bits 0-7 (number depends on configuration)
Pbit: Parity bit if parity is enabled
Stop bits: 1 or 2 (number depends on configuration)

UART transmission
UART transmits and receives the frame in Figure 139 in the following form, assuming the baud rate is 19200 bps. Data of 0x15 is received at RXFIFO in the following form.
For example, 00010101:

Figure 140. RXFIFO payload (bit sequence shown in the figure: 1 0 1 0 1 0 0 0)

The figure shows the 8-bit RXFIFO payload, where the toggling rate is 19200. According to the standard protocol, the UART needs to generate a bit width of 52.08 µs.

Figure 141. UART transfer bit diagram

The UART generates bits with an error percentage of up to 1.56 %. See the example in the following section.

Baud rate generation
The baud rate divisor is a 22-bit number consisting of a 16-bit integer and a 6-bit fractional part. This is used by the baud rate generator to determine the bit period. The fractional baud rate divider enables the use of any clock with a frequency > 3.6864 MHz to act as UARTCLK, while it is still possible to generate all the standard baud rates.

The baud rate divisor (BRD) has the following relationship to UARTCLK in MHz:

Formula 1
$$\mathrm{BRD} = \mathrm{BRDI} + \mathrm{BRDF} = \frac{\mathrm{UARTCLK} \times 10^{6}}{16 \times \text{Baud rate}}$$

where BRDI is the integer part and BRDF is the fractional part, separated by a decimal point as shown in the next figure:

Figure 142. Baud rate divisor (16-bit integer part, 6-bit fractional part)

You can calculate the 6-bit number (m) by taking the fractional part of the required baud rate divisor, multiplying it by 64 (that is, 2^n, where n is the width of the UARTFBRD register) and adding 0.5 to account for rounding errors:

m = integer(BRDF × 2^n + 0.5)

Note:
1 The contents of the integer and fractional value registers (UARTIBRD and UARTFBRD) are not updated until the current frame is transferred.
2 An integer value of 0 is invalid, and the fractional value is ignored in this case.
3 If the integer value is 0xffff, the fractional value must not be greater than zero.

Example: how to calculate BRD values
Assume that UARTCLK = 48 MHz and the required baud rate is 115.2 kbaud.
Using Formula 1 above:

BRD = (48 × 10^6) / (16 × 115.2 × 10^3) = 26.0417, so BRDI = 26
BRDF = integer((0.0417 × 64) + 0.5) = integer(3.1688) = 3
Generated baud divider = BRDI + (BRDF / 2^n) = 26 + 3/64 = 26 + 0.0469 = 26.0469
Generated baud rate = (48 × 10^6) / (16 × 26.0469) = 115176.85
Error = (115200 − 115176.85) / 115200 × 100 = 0.02 %

The maximum error using a 6-bit UARTFBRD register = 1/64 × 100 = 1.56 %. This occurs when m = 1, and the error is cumulative over 64 clock ticks.

Frequency and baud rate constraints
The UART has certain constraints regarding frequency range and baud rates. The frequency selected for UARTCLK must accommodate the required range of baud rates:

FUARTCLK(min) ≥ 16 × baud_rate(max)
FUARTCLK(max) ≤ 16 × 65535 × baud_rate(min)

For instance, for a range of baud rates from 110 baud to 460800 baud, the UARTCLK frequency must be between 7.3728 MHz and 115.34 MHz. If the required baud rate is high, the minimum UARTCLK frequency must be high enough to satisfy the formulas above.

Another constraint concerns the clock frequency of PCLK in relation to UARTCLK. The frequency of UARTCLK must be no more than 5/3 times the frequency of PCLK:

FUARTCLK ≤ 5/3 × FPCLK

26 Synchronous serial port (SSP)

This chapter focuses on SSP functionality and operation. For the SSP feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU
For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

26.1 Overview
The synchronous serial port (SSP) block includes a master or slave interface to enable synchronous serial communication with slave or master peripherals.

Figure 143.
SSP block diagram (blocks shown: AMBA APB interface, register block, Tx FIFO (16x8), Rx FIFO (16x8), FIFO status and interrupt generation, clock prescaler, transmit/receive logic, DMA interface; signals shown: PCLK, SSPCLK, SSPINTR, SSPTXD, SSPRXD, SSPCLKOUT, SSPCLKIN)

26.2 Pins
For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

26.3 Clocks
The SSP clocks are:
● PCLK: the APB clock
● SSP0_SCK (external), which is:
– SSPCLKOUT when SSP works as a master
– SSPCLKIN when SSP works as a slave
● SSPCLK: main SSP clock input (internal)
See also Chapter 5: Reset and clock generator (RCG).

26.4 Functional description

26.4.1 Main interfaces

APB slave interface
The AMBA APB interface generates read and write decodes for accesses to status and control registers, and transmit and receive FIFO memories.
The AMBA APB is a local secondary bus that provides a low-power extension to the higher bandwidth AMBA advanced high-performance bus (AHB) within the AMBA system hierarchy. The AMBA APB groups narrow-bus peripherals to avoid loading the system bus and provides an interface using memory-mapped registers, which are accessed under programmed control.

Register block
The register block stores data written or to be read across the AMBA APB interface.

Clock prescaler
When configured as a master, an internal prescaler, comprising two free-running reloadable serially linked counters, is used to provide the serial output clock SSPCLKOUT.
You can program the clock prescaler, through the SSPCPSR register, to divide SSPCLK by a factor of 2 to 254 in steps of two. Because the least significant bit of the SSPCPSR register is not used, division by an odd number is not possible, which ensures that a symmetrical (equal mark space ratio) clock is generated.
The output of the prescaler is further divided by a factor of 1 to 256, through the programming of the SSPCR0 control register, to give the final master output clock SSPCLKOUT.

Transmit FIFO
The common transmit FIFO is a 16-bit wide, 8-location deep, first-in, first-out memory buffer. CPU data written across the AMBA APB interface are stored in the buffer until read out by the transmit logic.
When configured as a master or a slave, parallel data is written into the transmit FIFO prior to serial conversion and transmission to the attached slave or master respectively, through the SSPTXD pin.

Receive FIFO
The common receive FIFO is a 16-bit wide, 8-location deep, first-in, first-out memory buffer. Received data from the serial interface are stored in the buffer until read out by the CPU across the AMBA APB interface.
When configured as a master or slave, serial data received through the SSPRXD pin is registered prior to parallel loading into the attached slave or master receive FIFO respectively.

Transmit and receive logic
When configured as a master, the clock to the attached slaves is derived from a divided down version of SSPCLK through the prescaler operations described previously. The master transmit logic successively reads a value from its transmit FIFO and performs parallel to serial conversion on it. Then the serial data stream and frame control signal, synchronized to SSPCLKOUT, are output through the SSPTXD pin to the attached slaves. The master receive logic performs serial to parallel conversion on the incoming synchronous SSPRXD data stream, extracting and storing values into its receive FIFO, for subsequent reading through the APB interface.
When configured as a slave, the SSPCLKIN clock is provided by an attached master and used to time its transmission and reception sequences.
The slave transmit logic, under control of the master clock, successively reads a value from its transmit FIFO, performs parallel to serial conversion, then outputs the serial data stream and frame control signal through the slave SSPTXD pin. The slave receive logic performs serial to parallel conversion on the incoming SSPRXD data stream, extracting and storing values into its receive FIFO, for subsequent reading through the APB interface.

Interrupt generation logic
The SSP generates four individual maskable, active HIGH interrupts. A combined interrupt output is also generated as an OR function of the individual interrupt requests.
You can use the single combined interrupt with a system interrupt controller that provides another level of masking on a per-peripheral basis. This allows use of modular device drivers that always know where to find the interrupt source control register bits.
The individual interrupt requests can also be used with a system interrupt controller that provides masking for the outputs of each peripheral. In this way, a global interrupt controller service routine can read the entire set of sources from one wide register in the system interrupt controller. This is attractive where the time to read from the peripheral registers is significant compared to the CPU clock speed in a real-time system.
The peripheral supports both of the above methods.
The transmit and receive dynamic data-flow interrupts, TXINTR and RXINTR, are separated from the status interrupts so that data can be read or written in response to the FIFO trigger levels.

DMA interface
This block manages the DMA interface. It can work in single transfer mode or in burst transfer mode. The DMA operation of the PrimeCell SSP is controlled through the DMA control register, SSPDMACR.
The DMA interface includes the following signals:
● For receive
– SSPRXDMASREQ: Single-character DMA transfer request, asserted by the SSP. This signal is asserted when the receive FIFO contains at least one character.
– SSPRXDMABREQ: Burst DMA transfer request, asserted by the SSP. This signal is asserted when the receive FIFO contains four or more characters.
– SSPRXDMACLR: DMA request clear, asserted by the DMA controller to clear the receive request signals. If DMA burst transfer is requested, the clear signal is asserted during the transfer of the last data in the burst.
● For transmit
– SSPTXDMASREQ: Single-character DMA transfer request, asserted by the SSP. This signal is asserted when there is at least one empty location in the transmit FIFO.
– SSPTXDMABREQ: Burst DMA transfer request, asserted by the SSP. This signal is asserted when the transmit FIFO contains four or fewer characters.
– SSPTXDMACLR: DMA request clear, asserted by the DMA controller to clear the transmit request signals. If DMA burst transfer is requested, the clear signal is asserted during the transfer of the last data in the burst.

The burst transfer and single transfer request signals are not mutually exclusive. They can both be asserted at the same time. For example, when there is more data than the watermark level of four in the receive FIFO, the burst transfer request and the single transfer request are asserted. When the amount of data left in the receive FIFO is less than the watermark level, only the single request is asserted.
Each request signal remains asserted until the relevant DMA clear signal is asserted. After the request clear signal is deasserted, a request signal can become active again, depending on the conditions described above.
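The assertion conditions listed above can be summarized as a function of the FIFO fill level. The following is a minimal sketch that models only those conditions; it does not model the latching of a request until the clear signal is asserted, and the function and constant names are hypothetical.

```python
FIFO_DEPTH = 8   # locations in each FIFO, as stated for the SSP FIFOs
WATERMARK = 4    # burst watermark level given in the text


def rx_dma_requests(rx_level: int) -> tuple:
    """Return (single, burst) request states for a receive FIFO level."""
    single = rx_level >= 1          # at least one character available
    burst = rx_level >= WATERMARK   # four or more characters available
    return single, burst


def tx_dma_requests(tx_level: int) -> tuple:
    """Return (single, burst) request states for a transmit FIFO level."""
    single = tx_level < FIFO_DEPTH  # at least one empty location
    burst = tx_level <= WATERMARK   # four or fewer characters queued
    return single, burst


if __name__ == "__main__":
    for level in range(FIFO_DEPTH + 1):
        print(level, "rx:", rx_dma_requests(level), "tx:", tx_dma_requests(level))
```

The printed table makes it easy to see the levels at which both requests are active at the same time.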
Note: The two SSP receive and transmit DMA interfaces, as shown in Table 50: DMAC MUX selecting the peripheral, are called SSPn_RX (consisting of SSPRXDMABREQ, SSPRXDMASREQ and SSPRXDMACLR) and SSPn_TX (consisting of SSPTXDMABREQ, SSPTXDMASREQ and SSPTXDMACLR).
For more detail on this interface, refer to Chapter 12: Direct memory access controllers (DMAC).

Synchronizing registers and logic
The SSP supports both asynchronous and synchronous operation of the clocks, PCLK and SSPCLK. Synchronization registers and handshaking logic have been implemented, and are active at all times. This has a minimal impact on performance or area. Synchronization of control signals is performed in both directions of data flow, that is, from the PCLK to the SSPCLK domain and from the SSPCLK to the PCLK domain.

26.5 Operation
This section describes the operation of the SSP block.
After reset, the SSP logic is disabled and must be configured. The SSP can be configured as master or slave (see Section 26.6.3: Configuring SSP as master or slave).
The bit rate, derived from the APB clock (PCLK), requires the programming of the clock prescale register SSPCPSR (refer to RM0089, Reference manual, SPEAr1340 address map and registers, MISC registers, for the PCLK frequency).

26.5.1 Bit rate generation
The serial bit rate is derived by dividing down the input clock SSPCLK. The clock is first divided by an even prescale value CPSDVSR from 2 to 254, which is programmed in SSPCPSR. The clock is further divided by a value from 1 to 256, which is 1 + SCR, where SCR is the value programmed in SSPCR0.
The frequency of the output signal bit clock SSPCLKOUT is:

$$F_{\mathrm{SSPCLKOUT}} = \frac{F_{\mathrm{SSPCLK}}}{\mathrm{CPSDVSR} \times (1 + \mathrm{SCR})}$$

For example, if SSPCLK is 3.6864 MHz, and CPSDVSR = 2, then SSPCLKOUT has a frequency range from 7.2 kHz to 1.8432 MHz.
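The bit rate formula can be exercised with a few lines of code. The following is a minimal sketch; the function names are hypothetical, and the divider search is just one possible strategy (highest bit rate that does not exceed the target), not a method prescribed by this manual.

```python
import math


def ssp_bit_rate(f_sspclk_hz: float, cpsdvsr: int, scr: int) -> float:
    """SSPCLKOUT frequency for an even CPSDVSR (2..254) and SCR (0..255)."""
    if cpsdvsr < 2 or cpsdvsr > 254 or cpsdvsr % 2:
        raise ValueError("CPSDVSR must be an even value from 2 to 254")
    if not 0 <= scr <= 255:
        raise ValueError("SCR must be in the range 0 to 255")
    return f_sspclk_hz / (cpsdvsr * (1 + scr))


def find_dividers(f_sspclk_hz: float, target_hz: float):
    """Return (CPSDVSR, SCR, rate) giving the highest rate <= target_hz."""
    best = None
    for cpsdvsr in range(2, 255, 2):
        # Smallest SCR that keeps the resulting rate at or below the target.
        scr = math.ceil(f_sspclk_hz / (cpsdvsr * target_hz)) - 1
        if not 0 <= scr <= 255:
            continue
        rate = ssp_bit_rate(f_sspclk_hz, cpsdvsr, scr)
        if best is None or rate > best[2]:
            best = (cpsdvsr, scr, rate)
    return best


if __name__ == "__main__":
    print(ssp_bit_rate(3_686_400, 2, 0))    # fastest rate with CPSDVSR = 2
    print(ssp_bit_rate(3_686_400, 2, 255))  # slowest rate with CPSDVSR = 2
    print(find_dividers(3_686_400, 1_843_200))
```

The first two calls reproduce the range quoted in the example above for SSPCLK = 3.6864 MHz and CPSDVSR = 2.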
26.5.2 Frame format
Each data frame is between 4 and 16 bits long, depending on the size of data programmed, and is transmitted starting with the MSB. There are three basic frame types that can be selected:
● Texas Instruments synchronous serial
● Motorola SPI
● National Semiconductor Microwire

For all three formats, the serial clock (SSPCLKOUT) is held inactive while the SSP is idle, and transitions at the programmed frequency only during active transmission or reception of data. The idle state of SSPCLKOUT is used to provide a receive timeout indication that occurs when the receive FIFO still contains data after a timeout period.

For Motorola SPI and National Semiconductor Microwire frame formats, the serial frame (SSPFSSOUT) pin is active LOW, and is asserted (pulled down) during the entire transmission of the frame.

For Texas Instruments synchronous serial frame format, the SSPFSSOUT pin is pulsed for one serial clock period starting at its rising edge, prior to the transmission of each frame. For this frame format, both the SSP and the off-chip slave device drive their output data on the rising edge of SSPCLKOUT, and latch data from the other device on the falling edge.

Unlike the full-duplex transmission of the other two frame formats, the National Semiconductor Microwire format uses a special master-slave messaging technique, which operates at half-duplex. In this mode, when a frame begins, an 8-bit control message is transmitted to the off-chip slave. During this transmission no incoming data is received by the SSP. After the message has been sent, the off-chip slave decodes it and, after waiting one serial clock after the last bit of the 8-bit control message has been sent, responds with the requested data. The returned data can be 4 to 16 bits in length, making the total frame length anywhere from 13 to 25 bits.
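The Microwire frame length quoted above follows directly from its three parts: the 8-bit control message, the one-clock wait, and the returned data. The following is a minimal sketch with hypothetical function names that also derives the frame duration at a given bit rate.

```python
CONTROL_BITS = 8   # control message sent by the master
WAIT_CLOCKS = 1    # one serial clock before the slave responds


def microwire_frame_bits(data_bits: int) -> int:
    """Total Microwire frame length in serial clocks for 4..16 data bits."""
    if not 4 <= data_bits <= 16:
        raise ValueError("data size must be between 4 and 16 bits")
    return CONTROL_BITS + WAIT_CLOCKS + data_bits


def microwire_frame_time(data_bits: int, bit_rate_hz: float) -> float:
    """Frame duration in seconds at the given serial bit rate."""
    return microwire_frame_bits(data_bits) / bit_rate_hz


if __name__ == "__main__":
    print(microwire_frame_bits(4), microwire_frame_bits(16))
    print(microwire_frame_time(16, 1_843_200))
```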
26.6 Programming

26.6.1 Defining the chip select
Four chip select lines are available to verify the real availability of the external signal; however, only one can be operational at a time. To select the active one, you have to program the ssp_cs_en bit of the PERIPH_CFG miscellaneous register.
The chip select can also be driven by software with hs_ssp_sw_cs; this feature is enabled by hs_ssp_en. Driving the chip select by software is mandatory when the prescaler is set to 2 (that is, SCR = 0 and CPSDVSR = 2).
All the bits mentioned above (ssp_cs_en, hs_ssp_sw_cs and hs_ssp_en) belong to the PERIPH_CFG miscellaneous register.

Table 138. External CS selection
ssp_cs_en [0,1] (from MISC)    CS
00                             SSPFSSOUT_0
01                             SSPFSSOUT_1
10                             SSPFSSOUT_2
11                             SSPFSSOUT_3

26.6.2 Enabling SSP operation
To enable SSP, you can either:
● prime the transmit FIFO, by writing up to eight 16-bit values when the SSP is disabled,
-or-
● allow the transmit FIFO service request to interrupt the CPU.
Once enabled, transmission or reception of data begins on the transmit (SSPTXD) and receive (SSPRXD) pins respectively.

Clock ratios
There is a constraint on the ratio of the frequencies of PCLK to SSPCLK. The frequency of SSPCLK must be less than or equal to that of PCLK. This ensures that control signals from the SSPCLK domain to the PCLK domain are certain to get synchronized before one frame duration: FSSPCLK ≤ FPCLK.
In slave operation, the SSPCLKIN signal from the external master is double synchronized and then delayed to detect an edge. It takes three SSPCLKs to detect an edge on SSPCLKIN. SSPTXD has less setup time to the falling edge of SSPCLKIN on which the master is sampling the line.
The setup and hold times on SSPRXD with reference to SSPCLKIN must be more conservative to ensure that it is at the right value when the actual sampling occurs within the SSPMS.
To ensure correct device operation, SSPCLK must be at least 12 times faster than the maximum expected frequency of SSPCLKIN.
The frequency selected for SSPCLK must accommodate the desired range of bit clock rates. The ratio of minimum SSPCLK frequency to SSPCLKOUT maximum frequency in the case of the slave mode is 12 and for the master mode it is two.
To generate a maximum bit rate of 1.8432 Mbps in the master mode, the frequency of SSPCLK must be at least 3.6864 MHz. With an SSPCLK frequency of 3.6864 MHz, the SSPCPSR register must be programmed with a value of two and the SCR[7:0] field in the SSPCR0 register needs to be programmed as zero.
To work with a maximum bit rate of 1.8432 Mbps in the slave mode, the frequency of SSPCLK must be at least 22.12 MHz. With an SSPCLK frequency of 22.12 MHz, the SSPCPSR register can be programmed with a value of 12 and the SCR[7:0] field in the SSPCR0 register can be programmed as zero. Similarly, the ratio of SSPCLK maximum frequency to SSPCLKOUT minimum frequency is 254 x 256.
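The ratios described above, together with the FSSPCLK ≤ FPCLK constraint, can be checked numerically. The following is a minimal sketch with hypothetical function names; it only restates the ratios given in this section (2 for master mode, 12 for slave mode, and 254 x 256 for the maximum).

```python
MAX_DIVISION = 254 * 256  # largest CPSDVSR times largest (1 + SCR)


def sspclk_range(bit_rate_max_hz: float, bit_rate_min_hz: float, slave: bool):
    """Return (f_min, f_max) allowed for SSPCLK for a range of bit rates."""
    ratio = 12 if slave else 2
    return ratio * bit_rate_max_hz, MAX_DIVISION * bit_rate_min_hz


def sspclk_ok(f_sspclk_hz: float, f_pclk_hz: float,
              bit_rate_max_hz: float, bit_rate_min_hz: float,
              slave: bool) -> bool:
    """True if SSPCLK satisfies both the bit-rate and the PCLK constraints."""
    f_min, f_max = sspclk_range(bit_rate_max_hz, bit_rate_min_hz, slave)
    return f_min <= f_sspclk_hz <= f_max and f_sspclk_hz <= f_pclk_hz


if __name__ == "__main__":
    # Minimum SSPCLK for 1.8432 Mbps in master mode and in slave mode.
    print(sspclk_range(1_843_200, 7_200, slave=False))
    print(sspclk_range(1_843_200, 7_200, slave=True))
```

The slave-mode minimum evaluates to 22.1184 MHz, which the text rounds to 22.12 MHz.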
The minimum frequency of SSPCLK is governed by the following equations, both of which have to be satisfied:

FSSPCLK(min) ≥ 2 × FSSPCLKOUT(max) [for master mode]
FSSPCLK(min) ≥ 12 × FSSPCLKIN(max) [for slave mode]

The maximum frequency of SSPCLK is governed by the following equations, both of which have to be satisfied:

FSSPCLK(max) ≤ 254 × 256 × FSSPCLKOUT(min) [for master mode]
FSSPCLK(max) ≤ 254 × 256 × FSSPCLKIN(min) [for slave mode]

26.6.3 Configuring SSP as master or slave
Through the control registers SSPCR0 and SSPCR1, you can configure the peripheral as a master or slave operating under one of the following protocols:
● Motorola SPI
● Texas Instruments SSI
● National Semiconductor Microwire

Programming the SSPCR0 control register
The SSPCR0 register is used to:
● program the serial clock rate
● select one of the three protocols
● select the data word size (where applicable)
The serial clock rate (SCR) value, in conjunction with the SSPCPSR clock prescale divisor value (CPSDVSR), is used to derive the SSP transmit and receive bit rate from the external SSPCLK.
The frame format is programmed through the FRF bits and the data word size through the DSS bits. Bit phase and polarity, applicable to Motorola SPI format only, are programmed through the SPH and SPO bits.

Programming the SSPCR1 control register
The SSPCR1 register is used to:
● select master or slave mode
● enable a loop back test feature
● enable the SSP peripheral
To configure the SSP as a master, clear the SSPCR1 register master or slave selection bit (MS) to 0, which is the default value on reset.
To configure the SSP as a slave, set the SSPCR1 register MS bit to 1. In this configuration, to enable or disable the SSP SSPTXD signal, use the SSPCR1 slave mode SSPTXD output disable bit (SOD). This can be used in some multislave environments where masters might parallel broadcast.
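The fields named above can be combined into register values as sketched below. This is an illustration only: the bit positions and FRF encodings are assumptions taken from the standard ARM PrimeCell SSP layout and are not stated in this manual, so they must be verified against RM0089 before use. The function names are hypothetical.

```python
# Assumed FRF encodings (standard PrimeCell SSP layout; verify in RM0089).
FRF_MOTOROLA_SPI = 0
FRF_TI_SSI = 1
FRF_NATIONAL_MICROWIRE = 2


def sspcr0_value(data_bits: int, frf: int, scr: int,
                 spo: int = 0, sph: int = 0) -> int:
    """Compose SSPCR0 from DSS, FRF, SPO, SPH and SCR (assumed positions)."""
    if not 4 <= data_bits <= 16:
        raise ValueError("data size must be between 4 and 16 bits")
    if frf not in (FRF_MOTOROLA_SPI, FRF_TI_SSI, FRF_NATIONAL_MICROWIRE):
        raise ValueError("unknown frame format")
    if not 0 <= scr <= 255:
        raise ValueError("SCR must be in the range 0 to 255")
    # Assumed: SCR[15:8], SPH[7], SPO[6], FRF[5:4], DSS[3:0] = size - 1.
    return ((scr << 8) | ((sph & 1) << 7) | ((spo & 1) << 6)
            | (frf << 4) | (data_bits - 1))


def sspcr1_value(slave: bool = False, sod: bool = False,
                 loopback: bool = False, enable: bool = False) -> int:
    """Compose SSPCR1 (assumed: SOD[3], MS[2], SSE[1], loop back[0])."""
    return (int(sod) << 3) | (int(slave) << 2) | (int(enable) << 1) | int(loopback)


if __name__ == "__main__":
    print(hex(sspcr0_value(8, FRF_MOTOROLA_SPI, scr=0)))
    print(hex(sspcr1_value(slave=False, enable=True)))
```

Keeping the composition in small functions makes it easy to correct a field position in one place if the register description differs from the assumed layout.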
To enable the operation of the SSP, set the synchronous serial port enable (SSE) bit to 1.

27 I2C bus controllers (I2C)

This chapter focuses on I2C functionality and operation. For the I2C feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU
For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

27.1 Overview
The SPEAr1340 device integrates one instance of an I2C controller, identified as I2C0.
The I2C controller acts as an APB slave interface to the two-wire serial I2C bus.

Figure 144. I2C block diagram (blocks shown: AMBA bus interface unit, register file, slave state machine, master state machine, clock generator, Rx shift, Tx shift, Rx filter, toggle, synchronizer, DMA interface, interrupt controller, RX FIFO, TX FIFO)

27.2 Pins
For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

27.3 Clocks
See Chapter 5: Reset and clock generator (RCG).

27.4 Functional description
The I2C bus is a two-wire serial interface, consisting of a serial data line (SDA) and a serial clock (SCL). These wires carry information between the devices connected to the bus. Each device is recognized by a unique address and can operate as either a "transmitter" or "receiver," depending on the function of the device. Devices can also be considered as masters or slaves when performing data transfers. A master is a device that initiates a data transfer on the bus and generates the clock signals to permit that transfer. At that time, any device addressed is considered a slave.
Note: The I2C must be programmed to operate in either master or slave mode only. Operating as a master and slave simultaneously is not supported.
The I2C module can operate in standard mode (with data rates up to 100 Kb/s), fast mode (with data rates up to 400 Kb/s), and high-speed mode (with data rates up to 3.4 Mb/s). The I2C can communicate only with devices that support these modes, as long as they are attached to the bus. Additionally, high-speed mode and fast mode devices are downward compatible. For instance, high-speed mode devices can communicate with fast mode and standard mode devices in a mixed-speed bus system; fast mode devices can communicate with standard mode devices in a 0 to 100 Kb/s I2C bus system. However, standard mode devices are not upward compatible and should not be incorporated in a fast-mode I2C bus system, as they cannot follow the higher transfer rate and unpredictable states would occur.

Examples of high-speed mode devices are LCD displays, high-bit count ADCs, and high capacity EEPROMs. These devices typically need to transfer large amounts of data. Most maintenance and control applications, the common use for the I2C bus, typically operate at 100 kHz (in standard and fast modes). Any I2C device can be attached to an I2C bus and every device can talk with any master, passing information back and forth. There needs to be at least one master (such as a microcontroller or DSP) on the bus, but there can be multiple masters, which requires them to arbitrate for ownership. Multiple masters and arbitration are explained later in this chapter.

27.4.1 Main interfaces
The I2C is made up of an AMBA APB slave interface, an I2C interface, and FIFO logic to maintain coherency between the two interfaces. A simplified block diagram of the component is illustrated in Figure 144.
The following list defines the main functions of the I2C blocks.
● AMBA bus interface unit: takes the APB interface signals and translates them into a common generic interface that allows the register file to be bus protocol-agnostic.
● Register file: contains configuration registers and is the interface with software.
● Slave state machine: follows the protocol for a slave and monitors the bus for an address match.
● Master state machine: generates the I2C protocol for the master transfers.
● Clock generator: calculates the required timing to do the following:
– generate the SCL clock when configured as a master
– check for bus idle
– generate a START and a STOP
– set up the data and hold the data
● Rx shift: takes data into the design and extracts it in byte format.
● Tx shift: presents data supplied by CPU for transfer on the I2C bus.
● Rx filter: detects the events in the bus; for example, start, stop and arbitration lost.
● Toggle: generates pulses on both sides and toggles to transfer signals across clock domains.
● Synchronizer: transfers signals from one clock domain to another.
● DMA interface: generates the handshaking signals to the central DMA controller in order to automate the data transfer without CPU intervention.
● Interrupt controller: generates the raw interrupt and interrupt flags, allowing them to be set and cleared.
● RX FIFO/TX FIFO: holds the RX FIFO and TX FIFO register banks and controllers, along with their status levels.

27.4.2 I2C terminology
The following terms are used throughout this manual and are defined as follows.

I2C bus terms
The following terms relate to the role of the I2C device and how it interacts with other I2C devices on the bus.
● Transmitter: the device that sends data to the bus. A transmitter can either be a device that initiates the data transmission to the bus (a master-transmitter) or responds to a request from the master to send data to the bus (a slave-transmitter).
● Receiver: the device that receives data from the bus. A receiver can either be a device that receives data on its own request (a master-receiver) or in response to a request from the master (a slave-receiver).
● Master: the component that initializes a transfer (START command), generates the clock (SCL) signal and terminates the transfer (STOP command). A master can be either a transmitter or a receiver.
● Slave: the device addressed by the master. A slave can be either receiver or transmitter.
These concepts are illustrated in Figure 145.

Figure 145. Master/slave and transmitter/receiver relationships

● Multi-master: the ability for more than one master to co-exist on the bus at the same time without collision or data loss.
● Arbitration: the predefined procedure that authorizes only one master at a time to take control of the bus. For more information about this behavior, refer to Section 27.4.5: Multiple master arbitration.
● Synchronization: the predefined procedure that synchronizes the clock signals provided by two or more masters. For more information about this feature, refer to Section 27.4.6: Clock synchronization.
● SDA: data signal line (Serial DAta)
● SCL: clock signal line (Serial CLock)

Bus transfer terms
The following terms are specific to data transfers that occur to/from the I2C bus.
● START (RESTART): data transfer begins with a START or RESTART condition. The level of the SDA data line changes from high to low, while the SCL clock line remains high. When this occurs, the bus becomes busy.
Note: START and RESTART conditions are functionally identical.
● STOP: data transfer is terminated by a STOP condition. This occurs when the level on the SDA data line passes from the low state to the high state, while the SCL clock line remains high. When the data transfer has been terminated, the bus is free or idle once again. The bus stays busy if a RESTART is generated instead of a STOP condition.
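The START and STOP definitions above can be expressed as a check on consecutive samples of the two bus lines. The following is a minimal sketch that works on a list of sampled (SCL, SDA) levels; it is an illustration of the definitions, not a description of the Rx filter hardware, and the function name is hypothetical.

```python
def detect_conditions(samples):
    """Return a list of (index, 'START' or 'STOP') from (scl, sda) samples.

    A START (or RESTART) is an SDA high-to-low change while SCL stays high;
    a STOP is an SDA low-to-high change while SCL stays high.
    """
    events = []
    previous = None
    for index, (scl, sda) in enumerate(samples):
        if previous is not None:
            prev_scl, prev_sda = previous
            if prev_scl == 1 and scl == 1:
                if prev_sda == 1 and sda == 0:
                    events.append((index, "START"))
                elif prev_sda == 0 and sda == 1:
                    events.append((index, "STOP"))
        previous = (scl, sda)
    return events


if __name__ == "__main__":
    # Idle bus, START, one data bit with SDA high, then STOP.
    trace = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1),
             (0, 1), (0, 0), (1, 0), (1, 1)]
    print(detect_conditions(trace))
```

SDA changes that occur while SCL is low are ordinary data changes and produce no event.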
27.4.3 I2C behavior
The I2C can be controlled via software to be either:
● An I2C master only, communicating with other I2C slaves; OR
● An I2C slave only, communicating with one or more I2C masters.

The master is responsible for generating the clock and controlling the transfer of data. The slave is responsible for either transmitting or receiving data to/from the master. The acknowledgement of data is sent by the device that is receiving data, which can be either a master or a slave. As mentioned previously, the I2C protocol also allows multiple masters to reside on the I2C bus and uses an arbitration procedure to determine bus ownership.

Each slave has a unique address that is determined by the system designer. When a master wants to communicate with a slave, the master transmits a START/RESTART condition that is then followed by the slave's address and a control bit (R/W) to determine if the master wants to transmit data or receive data from the slave. The slave then sends an acknowledge (ACK) pulse after the address.

If the master (master-transmitter) is writing to the slave (slave-receiver), the receiver gets one byte of data. This transaction continues until the master terminates the transmission with a STOP condition. If the master is reading from a slave (master-receiver), the slave transmits (slave-transmitter) a byte of data to the master, and the master then acknowledges the transaction with the ACK pulse. This transaction continues until the master terminates the transmission by not acknowledging (NACK) the transaction after the last byte is received, and then the master issues a STOP condition or addresses another slave after issuing a RESTART condition. This behavior is illustrated in Figure 146.

Figure 146.
Data transfer on the I2C bus (the figure shows the SDA and SCL lines from a START or RESTART condition (S or R), through data bits MSB to LSB on clocks 1 to 8 and the ACK on clock 9, with the byte complete interrupt within the slave and SCL held low while servicing interrupts, to a STOP or RESTART condition (P or R))

The I2C is a synchronous serial interface. The SDA line is a bidirectional signal and changes only while the SCL line is low, except for STOP, START, and RESTART conditions. The output drivers are open-drain or open-collector to perform wire-AND functions on the bus. The maximum number of devices on the bus is limited only by the maximum capacitance specification of 400 pF. Data is transmitted in byte packages.

Putting data into the FIFO generates a START, and emptying the FIFO generates a STOP. For more information, refer to START and STOP generation.
The I2C protocols implemented in the I2C are described in more detail in Section 27.4.4: I2C protocols.

START and STOP generation
When operating as an I2C master, putting data into the transmit FIFO causes the I2C to generate a START condition on the I2C bus. Allowing the transmit FIFO to empty causes the I2C to generate a STOP condition on the I2C bus.
When operating as a slave, the I2C does not generate START and STOP conditions, as per the protocol. However, if a read request is made to the I2C, it holds the SCL line low until read data has been supplied to it. This stalls the I2C bus until read data is provided to the slave I2C, or the I2C slave is disabled by writing a 0 to IC_ENABLE.

Combined formats
The I2C supports mixed read and write combined format transactions in both 7-bit and 10-bit addressing modes.
The I2C does not support combined format transactions with mixed address formats, that is, a 7-bit address transaction followed by a 10-bit address transaction or vice versa.
To initiate combined format transfers, IC_CON.IC_RESTART_EN should be set to 1.
With this value set and operating as a master, when the I2C completes an I2C transfer, it checks the transmit FIFO and executes the next transfer. If the direction of this transfer differs from the previous transfer, the combined format is used to issue the transfer. If the transmit FIFO is empty when the current I2C transfer completes, a STOP is issued and the next transfer is issued following a START condition.

27.4.4 I2C protocols
The I2C has the protocols discussed in this section.

START and STOP conditions
When the bus is idle, both the SCL and SDA signals are pulled high through external pull-up resistors on the bus. When the master wants to start a transmission on the bus, the master issues a START condition. This is defined to be a high-to-low transition of the SDA signal while SCL is 1. When the master wants to terminate the transmission, the master issues a STOP condition. This is defined to be a low-to-high transition of the SDA line while SCL is 1. Figure 147 shows the timing of the START and STOP conditions. When data is being transmitted on the bus, the SDA line must be stable when SCL is 1.

Figure 147. START and STOP condition (the figure shows SDA and SCL between the start condition S and the stop condition P, marking where the data line is stable and data is valid, and where a change of data is allowed)

Note: The signal transitions for the START/STOP conditions, as depicted in Figure 147, reflect those observed at the output signals of the master driving the I2C bus. Care should be taken when observing the SDA/SCL signals at the input signals of the slave(s), because unequal line delays may result in an incorrect SDA/SCL timing relationship.

Addressing slave protocol
There are two address formats: the 7-bit address format and the 10-bit address format.

7-bit address format
During the 7-bit address format, the first seven bits (bits 7:1) of the first byte set the slave address and the LSB bit (bit 0) is the R/W bit as shown in Figure 148.
When bit 0 (R/W) is set to 0, the master writes to the slave. When bit 0 (R/W) is set to 1, the master reads from the slave.

Figure 148. 7-bit address format
[Figure: S | A6 A5 A4 A3 A2 A1 A0 (slave address, MSB first) | R/W (LSB) | ACK (sent by slave). S = START condition; R/W = Read/Write pulse; ACK = Acknowledge.]

10-bit address format

During 10-bit addressing, two bytes are transferred to set the 10-bit address. The first byte is defined as follows: the first five bits (bits 7:3) notify the slaves that this is a 10-bit transfer, the next two bits (bits 2:1) set slave address bits 9:8, and the LSB (bit 0) is the R/W bit. The second byte transferred sets bits 7:0 of the slave address. Figure 149 shows the 10-bit address format, and Table 139 defines the special purpose and reserved first byte addresses.

Figure 149. 10-bit address format
[Figure: S | '1' '1' '1' '1' '0' (reserved for 10-bit address) | A9 A8 | R/W | ACK (sent by slave) | A7 A6 A5 A4 A3 A2 A1 A0 | ACK (sent by slave). S = START condition; R/W = Read/Write pulse; ACK = Acknowledge.]

Table 139. I2C definition of bits in first byte

Slave address   R/W bit   Description
-------------   -------   -----------
0000 000        0         General call address. I2C places the data in the receive buffer and issues a general call interrupt.
0000 000        1         START byte. For more information, refer to START BYTE transfer protocol on page 429.
0000 001        X         CBUS address. I2C ignores these accesses.
0000 010        X         Reserved
0000 011        X         Reserved
0000 1xx        X         High-speed master code. For more information, refer to Section 27.4.5: Multiple master arbitration.
1111 1xx        X         Reserved
1111 0xx        X         10-bit slave addressing

I2C does not restrict you from using these reserved addresses. However, if you use these reserved addresses, you may run into incompatibilities with other I2C components.
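The two address formats can be illustrated with a minimal sketch that packs the address bytes as described for Figure 148 and Figure 149. The function and constant names are hypothetical and are not part of any SPEAr1340 driver; the sketch only encodes bytes and does not model the bus sequence.

```python
from typing import Tuple

# Hypothetical constant: bits 7:3 = '11110' mark a 10-bit transfer.
TEN_BIT_PREFIX = 0b11110000


def addr7_first_byte(addr: int, read: bool) -> int:
    """Pack a 7-bit slave address into bits 7:1 and the R/W bit into bit 0."""
    if not 0 <= addr <= 0x7F:
        raise ValueError("7-bit address out of range")
    return (addr << 1) | int(read)


def addr10_bytes(addr: int, read: bool) -> Tuple[int, int]:
    """Return (first byte, second byte) for a 10-bit slave address."""
    if not 0 <= addr <= 0x3FF:
        raise ValueError("10-bit address out of range")
    # First byte: '11110', then address bits 9:8, then the R/W bit.
    first = TEN_BIT_PREFIX | (((addr >> 8) & 0x3) << 1) | int(read)
    # Second byte: address bits 7:0.
    second = addr & 0xFF
    return first, second


print(hex(addr7_first_byte(0x50, read=False)))           # 0xa0
print([hex(b) for b in addr10_bytes(0x2A5, read=True)])  # ['0xf5', '0xa5']
```

Rejecting out-of-range addresses keeps a 10-bit value from being silently truncated into a 7-bit field.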
Transmitting and receiving protocol

The master can initiate data transmission and reception to/from the bus, acting as either a master-transmitter or master-receiver. A slave responds to requests from the master to either transmit data or receive data to/from the bus, acting as either a slave-transmitter or slave-receiver, respectively.

Master-transmitter and slave-receiver

All data is transmitted in byte format, with no limit on the number of bytes transferred per data transfer. After the master sends the address and R/W bit, or after the master transmits a byte of data to the slave, the slave-receiver must respond with the acknowledge signal (ACK). When a slave-receiver does not respond with an ACK pulse, the master aborts the transfer by issuing a STOP condition. The slave must leave the SDA line high so that the master can abort the transfer.

If the master-transmitter is transmitting data as shown in Figure 150, then the slave-receiver responds to the master-transmitter with an acknowledge pulse after every byte of data is received.

Figure 150. Master-transmitter protocol
[Figure: For 7-bit address: S | Slave Address | R/W = '0' (write) | A | DATA | A | DATA | A or /A | P.
For 10-bit address: S | Slave Address first 7 bits ('11110xxx') | R/W = '0' (write) | A | Slave Address second byte | A | DATA | A or /A | P.
Legend: A = Acknowledge (SDA low); /A = No acknowledge (SDA high); S = START condition; P = STOP condition. Shading distinguishes master-to-slave from slave-to-master fields.]

Master-receiver and slave-transmitter

If the master is receiving data as shown in Figure 151, then the master responds to the slave-transmitter with an acknowledge pulse after a byte of data has been received, except for the last byte. This is the way the master-receiver notifies the slave-transmitter that this is the last byte. The slave-transmitter relinquishes the SDA line after detecting the No Acknowledge (NACK) so that the master can issue a STOP condition.
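The acknowledge rule for a master-receiver can be sketched as a list of bus events for a 7-bit read. This is only an illustration of the rule stated above (ACK after every byte except the last); the event labels and the function name are hypothetical.

```python
from typing import List


def master_read_frame(n_bytes: int) -> List[str]:
    """List the bus events of a 7-bit master read of n_bytes data bytes."""
    if n_bytes < 1:
        raise ValueError("a read transfers at least one byte")
    # START, address byte with R/W = 1, then the slave acknowledges.
    frame = ["S", "ADDR+R", "ACK(slave)"]
    for i in range(n_bytes):
        last = i == n_bytes - 1
        frame.append("DATA")
        # The master NACKs only the last byte to end the transfer.
        frame.append("NACK(master)" if last else "ACK(master)")
    frame.append("P")
    return frame


print(master_read_frame(2))
```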
When a master does not want to relinquish the bus with a STOP condition, the master can issue a RESTART condition. This is identical to a START condition except that it occurs after the ACK pulse. Operating in master mode, the I2C can then communicate with the same slave using a transfer of a different direction. For a description of the combined format transactions that the I2C supports, refer to Combined formats on page 425.

Note: The I2C must be inactive on the serial port (if I2C_DYNAMIC_TAR_UPDATE = 1) before the target slave address register (IC_TAR) can be reprogrammed.

Figure 151. Master-receiver protocol
[Figure: For 7-bit address: S | Slave Address | R/W = '1' (read) | A | DATA | A | DATA | /A | P.
For 10-bit address: S | Slave Address first 7 bits ('11110xxx') | R/W = '0' (write) | A | Slave Address second byte | A | Sr | Slave Address first 7 bits ('11110xxx') | R/W = '1' (read) | A | DATA | /A | P.
Legend: A = Acknowledge (SDA low); /A = No acknowledge (SDA high); S = START condition; R (Sr) = RESTART condition; P = STOP condition. Shading distinguishes master-to-slave from slave-to-master fields.]

START BYTE transfer protocol

The START BYTE transfer protocol is intended for systems that do not have an on-board dedicated I2C hardware module. When the I2C is addressed as a slave, it always samples the I2C bus at the highest speed supported, so it never requires a START BYTE transfer. However, when the I2C is a master, it supports the generation of START BYTE transfers at the beginning of every transfer in case a slave device requires it. This protocol consists of seven zeros being transmitted followed by a 1, as illustrated in Figure 152. This allows the processor that is polling the bus to under-sample the address phase until a 0 is detected. Once the microcontroller detects a 0, it switches from the under-sampling rate to the correct rate of the master.

Figure 152. START BYTE transfer
[Figure labels: SDA, SCL; S; start byte 00000001 on clocks 1 to 8; dummy acknowledge (HIGH) on clock 9 (ACK); Sr.]

The START BYTE procedure is as follows:
1. Master generates a START condition.
2. Master transmits the START byte (0000 0001).
3. Master transmits the ACK clock pulse. (Present only to conform with the byte handling format used on the bus.)
4. No slave sets the ACK signal to 0.
5. Master generates a RESTART (R) condition.

A hardware receiver does not respond to the START BYTE because it is a reserved address, and it resets after the RESTART condition is generated.

27.4.5 Multiple master arbitration

The I2C bus protocol allows multiple masters to reside on the same bus. If there are two masters on the same I2C bus, an arbitration procedure takes place if both try to take control of the bus by generating a START condition at the same time. Once a master (for example, a microcontroller) has control of the bus, no other master can take control until the first master sends a STOP condition and places the bus in an idle state.

Arbitration takes place on the SDA line, while the SCL line is 1. The master that transmits a 1 while the other master transmits a 0 loses arbitration and turns off its data output stage. The master that lost arbitration can continue to generate clocks until the end of the byte transfer. If both masters are addressing the same slave device, the arbitration can continue into the data phase.

Upon detecting that it has lost arbitration to another master, the I2C stops generating SCL (ic_clk_oe).

Figure 153 illustrates the timing when two masters are arbitrating on the bus.

Figure 153. Multiple master arbitration
[Figure labels: DATA1, DATA2, SDA, SCL. After the START condition, both masters send matching data (MSB first) and SDA lines up with DATA1. When DATA1 is '1' while DATA2 is '0', DATA1 loses arbitration and SDA mirrors DATA2.]

For high-speed mode, the arbitration cannot go into the data phase because each master is programmed with a unique high-speed master code. This 8-bit code is defined by the system designer and is set by writing to the High Speed Master Mode Code Address Register, IC_HS_MADDR.
Because the codes are unique, only one master can win arbitration, which occurs by the end of the transmission of the high-speed master code.

Control of the bus is determined by the address or master code and the data sent by competing masters, so there is no central master nor any order of priority on the bus.

Arbitration is not allowed between the following conditions:
● A RESTART condition and a data bit
● A STOP condition and a data bit
● A RESTART condition and a STOP condition

Slaves are not involved in the arbitration process.

27.4.6 Clock synchronization

When two or more masters try to transfer information on the bus at the same time, they must arbitrate and synchronize the SCL clock. All masters generate their own clock to transfer messages. Data is valid only during the high period of the SCL clock. Clock synchronization is performed using the wired-AND connection to the SCL signal. When the master transitions the SCL clock to 0, the master starts counting the low time of the SCL clock and transitions the SCL clock signal to 1 at the beginning of the next clock period. However, if another master is holding the SCL line at 0, then the master goes into a HIGH wait state until the SCL clock line transitions to 1.

All masters then count off their high time, and the master with the shortest high time transitions the SCL line to 0. The masters then count out their low time, and the one with the longest low time forces the other masters into a HIGH wait state. Therefore, a synchronized SCL clock is generated, which is illustrated in Figure 154. Optionally, slaves may hold the SCL line low to slow down the timing on the I2C bus.

Figure 154. Multi-master clock synchronization
[Figure labels: CLKA, CLKB, SCL. An SCL LOW transition resets all CLKs to start counting their LOW periods; a CLK that finishes early enters a wait state; SCL transitions HIGH when all CLKs are in the HIGH state, at which point the HIGH period count starts.]

27.4.7 IC_CLK frequency configuration

When the I2C is configured as a master, the *CNT registers must be set before any I2C bus transaction can take place in order to ensure proper I/O timing. The *CNT registers are:
● IC_SS_SCL_HCNT
● IC_SS_SCL_LCNT
● IC_FS_SCL_HCNT
● IC_FS_SCL_LCNT
● IC_HS_SCL_HCNT
● IC_HS_SCL_LCNT

Note: It is not necessary to program any of the *CNT registers if the I2C is enabled to operate only as an I2C slave, since these registers are used only to determine the SCL timing requirements for operation as an I2C master.

Minimum high and low counts

When the I2C operates as an I2C master, in both transmit and receive transfers:
● The minimum value that can be programmed in the *_LCNT registers is 8
● The minimum value allowed for the *_HCNT registers is 6

The minimum value of 8 for the *_LCNT registers is due to the time required for the I2C to drive SDA after a negative edge of SCL. The minimum value of 6 for the *_HCNT registers is due to the time required for the I2C to sample SDA during the high period of SCL.

The I2C adds one cycle to the programmed *_LCNT value in order to generate the low period of the SCL clock, because the SCL low counting logic counts to (*_LCNT+1).

The I2C adds eight cycles to the programmed *_HCNT value in order to generate the high period of the SCL clock. This is due to the following factors:
● The counting logic for SCL high counts to (*_HCNT+1).
● The digital filtering applied to the SCL line incurs a delay of four ic_clk cycles. This filtering includes metastability removal and 2-out-of-3 majority vote processing on SDA and SCL edges.
● Whenever SCL is driven from 1 to 0 by the I2C (that is, on completing the SCL high time), an internal logic latency of three ic_clk cycles is incurred.

Consequently, the minimum SCL low time of which the I2C is capable is nine (9) ic_clk periods (8+1), while the minimum SCL high time is fourteen (14) ic_clk periods (6+1+4+3).

Minimum IC_CLK frequency

This section describes the minimum ic_clk frequencies that the I2C supports for each speed mode, and the associated high and low count values. These limits apply to the I2C in both master and slave modes. The limits for slave mode are required so that the I2C does not violate the maximum tHD;DAT timing requirement of the I2C protocol.

Standard and fast modes

This section details how to derive a minimum ic_clk value for standard and fast modes of the I2C. Although the following method shows the fast mode calculations, the same method can be used for standard mode.
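The cycle overheads listed under Minimum high and low counts can be checked with a short sketch that converts programmed *_LCNT and *_HCNT values into actual SCL low and high times. The helper names are hypothetical; the constants come from the text (one extra cycle on the low side, 1 + 4 + 3 = 8 extra cycles on the high side).

```python
from typing import Tuple

LOW_OVERHEAD = 1    # low counter counts to *_LCNT + 1
HIGH_OVERHEAD = 8   # (*_HCNT + 1) + 4 filter cycles + 3 latency cycles
MIN_LCNT = 8        # minimum programmable *_LCNT value
MIN_HCNT = 6        # minimum programmable *_HCNT value


def scl_cycles(lcnt: int, hcnt: int) -> Tuple[int, int]:
    """Return (low, high) SCL durations in ic_clk cycles."""
    if lcnt < MIN_LCNT or hcnt < MIN_HCNT:
        raise ValueError("programmed count below the documented minimum")
    return lcnt + LOW_OVERHEAD, hcnt + HIGH_OVERHEAD


def scl_times_ns(lcnt: int, hcnt: int, ic_clk_hz: float) -> Tuple[float, float]:
    """Return (low, high) SCL durations in nanoseconds."""
    low, high = scl_cycles(lcnt, hcnt)
    period_ns = 1e9 / ic_clk_hz
    return low * period_ns, high * period_ns


print(scl_cycles(8, 6))              # (9, 14)
print(scl_times_ns(15, 6, 12e6))     # roughly 1333 ns low, 1167 ns high
```

With the minimum programmed values the sketch reproduces the 9-cycle low time and 14-cycle high time quoted in the text.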
Given conditions and calculations for the minimum I2C ic_clk value in fast mode:
● Fast mode has a data rate of 400 kb/s, which implies an SCL period of 1/400 kHz = 2.5 µs
● Minimum hcnt value of 14 as a seed value; IC_HCNT_FS = 14
● Protocol minimum SCL high and low times:
– MIN_SCL_LOWtime_FS = 1300 ns
– MIN_SCL_HIGHtime_FS = 600 ns

Derived equations:

\[ \frac{\mathrm{SCL\_PERIOD\_FS}}{\mathrm{IC\_HCNT\_FS} + \mathrm{IC\_LCNT\_FS}} = \mathrm{IC\_CLK\_PERIOD} \]

\[ \mathrm{IC\_LCNT\_FS} \times \mathrm{IC\_CLK\_PERIOD} = \mathrm{MIN\_SCL\_LOWtime\_FS} \]

Combined, the previous equations produce the following:

\[ \mathrm{IC\_LCNT\_FS} \times \frac{\mathrm{SCL\_PERIOD\_FS}}{\mathrm{IC\_LCNT\_FS} + \mathrm{IC\_HCNT\_FS}} = \mathrm{MIN\_SCL\_LOWtime\_FS} \]

Solving for IC_LCNT_FS:

\[ \mathrm{IC\_LCNT\_FS} \times \frac{2.5\ \mu\mathrm{s}}{\mathrm{IC\_LCNT\_FS} + 14} = 1.3\ \mu\mathrm{s} \]

The previous equation gives:

\[ \mathrm{IC\_LCNT\_FS} = \mathrm{roundup}(15.166) = 16 \]

These calculations produce IC_LCNT_FS = 16 and IC_HCNT_FS = 14, giving an ic_clk period and frequency of:

\[ \frac{2.5\ \mu\mathrm{s}}{16 + 14} = 83.3\ \mathrm{ns} \quad\Rightarrow\quad 12\ \mathrm{MHz} \]

Testing these results shows that protocol requirements are satisfied.

High-speed modes

The method used for standard and fast modes is not sufficient to derive correct ic_clk values for the high-speed modes. For example, given a high-speed mode with a 100 pF bus loading, using the standard and fast modes method produces the following:
● IC_LCNT_HS = 17
● IC_HCNT_HS = 14
● ic_clk = 105.4 MHz

Depending on glitch suppression, the I2C can take up to nine ic_clk cycles to drive SDA after a negative edge of SCL; however, the protocol requires a maximum tHD;DAT of 70 ns for this mode. For example:
● 105 MHz => IC_CLK_PERIOD = 9.48 ns
● 9.48 ns * 9 = 85.32 ns
● 85.32 ns exceeds the maximum tHD;DAT of 70 ns

Thus, these values cannot be used.
To satisfy this rule, IC_CLK_PERIOD can be derived as follows:

\[ \frac{70\ \mathrm{ns}}{9} = 7.77\ \mathrm{ns} \]

From this value, high and low count values can be derived:

\[ \mathrm{IC\_LCNT\_HS} \times \mathrm{IC\_CLK\_PERIOD} \ge \mathrm{MIN\_SCL\_LOWtime\_HS} \]

\[ \mathrm{IC\_LCNT\_HS} \times 7.77\ \mathrm{ns} \ge 160\ \mathrm{ns} \]

\[ \mathrm{IC\_LCNT\_HS} \ge 21 \]

The minimum value of 14 for IC_HCNT_HS easily accommodates the MIN_SCL_HIGHtime_HS requirement of 60 ns for this mode. Therefore:
IC_HCNT_HS = 14
IC_LCNT_HS = 21

This derivation gives a baud rate higher than the allowed 3.4 Mb/s, but the high or low count can be scaled up to give the desired baud rate.

Given:
SCL_PERIOD = 1/3.4 MHz = 294 ns
IC_CLK_PERIOD = 7.77 ns
Required:
roundup(294/7.77) = 38 ic_clk periods for a baud rate of 3.4 Mb/s

To achieve this, the low count must be scaled up by 3 to give:
IC_HCNT_HS = 14
IC_LCNT_HS = 24

The values for HS mode with a bus loading of 400 pF can be derived in the same way. Table 140 lists the minimum ic_clk values for all modes with high and low count values.

Table 140. ic_clk in relation to high and low counts

Speed mode    ic_clk freq (MHz)   SCL low count   SCL low program value   SCL low time   SCL high count   SCL high program value   SCL high time
-----------   -----------------   -------------   ---------------------   ------------   --------------   ----------------------   -------------
SS            2.7                 13              12                      4.7 µs         14               6                        5.2 µs
FS            12.0                16              15                      1.33 µs        14               6                        1.16 µs
HS (400 pF)   60.2                22              21                      365 ns         14               6                        232 ns
HS (100 pF)   128.5               24              23                      186 ns         14               6                        108 ns

Note: The IC_*_SCL_LCNT and IC_*_SCL_HCNT registers are programmed using the SCL low and high program values in Table 140, which are calculated as the SCL low count minus 1 and the SCL high count minus 8, respectively.

Calculating high and low counts

The calculations below show how to calculate SCL high and low counts for each speed mode in the I2C. For the calculations to work, the ic_clk frequencies used must not be less than the minimum ic_clk frequencies specified in Table 140.

The I2C coreConsultant GUI can automatically calculate SCL high and low count values.
By specifying an integer ic_clk period value in nanoseconds for the IC_CLK_PERIOD parameter, SCL high and low count values are automatically calculated for each speed mode. The ic_clk period must not specify a clock of a lower frequency than required for all supported speed modes. It is possible that the automatically calculated values result in a baud rate higher than the maximum rate specified by the protocol. If this happens, either the low or high count values can be scaled up to reduce the baud rate. The minimum IC_CLK calculations for high-speed mode show how to do this; for details, refer to High-speed modes on page 433.

The equation to calculate the number of ic_clk cycles required for setting the proper SCL clock high and low times is as follows:

IC_xCNT = (ROUNDUP(MIN_SCL_xxxtime*OSCFREQ,0))

ROUNDUP is an explicit Excel function call that is used to convert a real number to its equivalent integer number.

MIN_SCL_HIGHtime = Minimum High Period
MIN_SCL_HIGHtime =
  4000 ns for 100 kb/s
  600 ns for 400 kb/s
  60 ns for 3.4 Mb/s, bus loading = 100 pF
  160 ns for 3.4 Mb/s, bus loading = 400 pF

MIN_SCL_LOWtime = Minimum Low Period
MIN_SCL_LOWtime =
  4700 ns for 100 kb/s
  1300 ns for 400 kb/s
  120 ns for 3.4 Mb/s, bus loading = 100 pF
  320 ns for 3.4 Mb/s, bus loading = 400 pF

OSCFREQ = ic_clk Clock Frequency (Hz).

For example:
OSCFREQ = 100 MHz
I2Cmode = fast, 400 kbit/s
MIN_SCL_HIGHtime = 600 ns
MIN_SCL_LOWtime = 1300 ns
IC_xCNT = (ROUNDUP(MIN_SCL_HIGH_LOWtime*OSCFREQ,0))
IC_HCNT = (ROUNDUP(600 ns * 100 MHz,0))
IC_HCNT = 60
IC_LCNT = (ROUNDUP(1300 ns * 100 MHz,0))
IC_LCNT = 130
Actual MIN_SCL_HIGHtime = 60*(1/100 MHz) = 600 ns
Actual MIN_SCL_LOWtime = 130*(1/100 MHz) = 1300 ns

27.4.8 SDA hold time

The I2C protocol specification requires 300 ns of hold time on the SDA signal (tHD;DAT) in standard and fast speed modes, and a hold time long enough to bridge the undefined part between logic 1 and logic 0 of the falling edge of SCL in high-speed mode.

Board delays on the SCL and SDA signals can mean that the hold-time requirement is met at the I2C master, but not at the I2C slave (or vice versa). Because each application encounters different board delays, the I2C contains a software-programmable register (IC_SDA_HOLD) to enable dynamic adjustment of the SDA hold time.

The IC_SDA_HOLD register can be used to alter the timing of the SDA (ic_data_oe) signal generated by the I2C. Each value in the IC_SDA_HOLD register represents a unit of one ic_clk period.

When the I2C is operating in master mode, the minimum tHD;DAT timing is one ic_clk period. Therefore, even when IC_SDA_HOLD has a value of zero, the I2C drives SDA (ic_data_oe) one ic_clk cycle after driving SCL (ic_clk_oe) to logic 0. For all other values of IC_SDA_HOLD, the following is true:
● Drive on SDA (ic_data_oe) occurs IC_SDA_HOLD ic_clk cycles after driving SCL (ic_clk_oe) to logic 0

When the I2C operates in slave mode, the minimum tHD;DAT timing is eight ic_clk periods. This delay allows for synchronization and filtering on the SCL (ic_clk_in) sample. Therefore, even when IC_SDA_HOLD has a value less than 8, the I2C drives SDA (ic_data_oe) eight ic_clk cycles after SCL (ic_clk_in) has transitioned to logic 0.
For all other values of IC_SDA_HOLD, the following is true:
● Drive on SDA (ic_data_oe) occurs IC_SDA_HOLD ic_clk cycles after SCL (ic_clk_in) has transitioned to logic 0

If different SDA hold times are required for different speed modes, the IC_SDA_HOLD register must be reprogrammed when the speed mode is changed. The IC_SDA_HOLD register can be programmed only when the I2C is disabled (IC_ENABLE = 0).

The reset value of the IC_SDA_HOLD register can be set via the coreConsultant parameter IC_DEFAULT_SDA_HOLD.

Figure 155 shows the tHD;DAT timing generated by the I2C operating in master mode when IC_SDA_HOLD = 3.

Figure 155. I2C master implementing tHD;DAT when IC_SDA_HOLD = 3
[Figure labels: ic_clk, ic_data_oe, ic_clk_oe; IC_SDA_HOLD = 3.]

27.4.9 DMA controller interface

The I2C has an optional built-in DMA capability that can be selected at configuration time; it has a handshaking interface to a DMA controller to request and control transfers. The APB bus is used to perform the data transfer to or from the DMA. Although the I2C DMA operation is designed in a generic way to fit any DMA controller as easily as possible, it works best with the DMA controller described in this manual, the DMAC. The settings of the DMAC that are relevant to the operation of the I2C are discussed here, mainly bit fields in the DMAC channel control register, CTLx, where x is the channel number.

When the I2C interfaces to the DMAC, the DMAC is always the flow controller; that is, it controls the block size. This must be programmed by software in the DMAC. The DMAC always transfers data using DMA burst transactions if possible, for efficiency. For more information, refer to Chapter 12: Direct memory access controllers (DMAC). Other DMA controllers act in a similar manner.

The DMA output dma_finish is a status signal to indicate that the DMA block transfer is complete.
The I2C does not use this status signal, and it therefore does not appear in the I/O port list.

The I2Cn has two DMA handshaking interfaces called I2Cn_TX and I2Cn_RX, as shown in Table 49: DMAC MUX - selecting the peripheral.
● The I2Cn_TX interface consists of the following signals: dma_tx_req (burst request), dma_tx_single (single request) and dma_tx_ack (clear).
● The I2Cn_RX interface consists of the following signals: dma_rx_req (burst request), dma_rx_single (single request) and dma_rx_ack (clear).

The relevant DMA settings are discussed in the following sections.

Enabling the DMA controller interface

To enable the DMA controller interface on the I2C, you must write the DMA Control Register (IC_DMA_CR). Writing a 1 into the TDMAE bit field of the IC_DMA_CR register enables the I2C transmit handshaking interface. Writing a 1 into the RDMAE bit field of the IC_DMA_CR register enables the I2C receive handshaking interface.

Overview of operation

As a block flow control device, the DMA controller is programmed by the processor with the number of data items (block size) that are to be transmitted or received by the I2C; this is programmed into the BLOCK_TS field of the DMAC CTLx register.

The block is broken into a number of transactions, each initiated by a request from the I2C. The DMA controller must also be programmed with the number of data items (in this case, I2C FIFO entries) to be transferred for each DMA request. This is also known as the burst transaction length and is programmed into the SRC_MSIZE/DEST_MSIZE fields of the DMAC CTLx register for source and destination, respectively.

Figure 156 shows a single block transfer, where the block size programmed into the DMA controller is 12 and the burst transaction length is set to 4. In this case, the block size is a multiple of the burst transaction length. Therefore, the DMA block transfer consists of a series of burst transactions. If the I2C makes a transmit request to this channel, four data items are written to the I2C TX FIFO.
Similarly, if the I2C makes a receive request to this channel, four data items are read from the I2C RX FIFO. Three separate requests must be made to this DMA channel before all 12 data items are written or read.

Figure 156. Breakdown of DMA transfer into burst transactions
[Figure: DMA multi-block transfer level: 12 data items. DMA block level: 12 data items. DMA burst transactions 1 to 3: 4 data items each.
Block size: DMA.CTLx.BLOCK_TS = 12
Number of data items per source burst transaction: DMA.CTLx.SRC_MSIZE = 4
I2C receive FIFO watermark level: I2C.IC_DMA_RDLR + 1 = DMA.CTLx.SRC_MSIZE = 4]

When the block size programmed into the DMA controller is not a multiple of the burst transaction length, as shown in Figure 157, a series of burst transactions followed by single transactions is needed to complete the block transfer.

Figure 157. Breakdown of DMA transfer into single and burst transactions
[Figure: DMA multi-block transfer level: 15 data items. DMA block level: 15 data items. DMA burst transactions 1 to 3: 4 data items each. DMA single transactions 1 to 3: 1 data item each.
Block size: DMA.CTLx.BLOCK_TS = 15
Number of data items per burst transaction: DMA.CTLx.DEST_MSIZE = 4
I2C transmit FIFO watermark level: I2C.IC_DMA_TDLR = DMA.CTLx.DEST_MSIZE = 4]

Transmit watermark level and transmit FIFO underflow

During I2C serial transfers, transmit FIFO requests are made to the DMAC whenever the number of entries in the transmit FIFO is less than or equal to the DMA Transmit Data Level Register (IC_DMA_TDLR) value; this is known as the watermark level. The DMAC responds by writing a burst of data to the transmit FIFO buffer, of length CTLx.DEST_MSIZE.
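The block breakdown of Figure 156 and Figure 157, and the transmit request condition just described, can be sketched as follows. The function names are hypothetical; the sketch only reproduces the arithmetic in the text and does not model the DMAC.

```python
from typing import Tuple


def split_block(block_ts: int, msize: int) -> Tuple[int, int]:
    """Return (burst transactions, single transactions) for one block."""
    if block_ts < 0 or msize < 1:
        raise ValueError("invalid block size or burst length")
    # Full bursts first; the remainder is moved as single transactions.
    return divmod(block_ts, msize)


def tx_dma_request(tx_fifo_entries: int, tdlr: int) -> bool:
    """A transmit request is raised at or below the watermark level."""
    return tx_fifo_entries <= tdlr


print(split_block(12, 4))    # (3, 0): Figure 156
print(split_block(15, 4))    # (3, 3): Figure 157
print(tx_dma_request(2, 2))  # True
```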
Data should be fetched from the DMA often enough for the transmit FIFO to perform serial transfers continuously; that is, when the FIFO begins to empty, another DMA request should be triggered. Otherwise, the FIFO runs out of data, causing a STOP to be inserted on the I2C bus. To prevent this condition, the user must set the watermark level correctly.

Choosing the transmit watermark level

Consider the example where the following assumption is made:

DMA.CTLx.DEST_MSIZE = FIFO_DEPTH - I2C.IC_DMA_TDLR

Here the number of data items to be transferred in a DMA burst is equal to the empty space in the transmit FIFO. Consider two different watermark level settings.

Case 1: IC_DMA_TDLR = 2
● Transmit FIFO watermark level = I2C.IC_DMA_TDLR = 2
● DMA.CTLx.DEST_MSIZE = FIFO_DEPTH - I2C.IC_DMA_TDLR = 6
● I2C transmit FIFO_DEPTH = 8
● DMA.CTLx.BLOCK_TS = 30

Figure 158. Case 1 watermark levels
[Figure: I2C transmit FIFO with FIFO_DEPTH = 8, filled by the DMA controller (data in) and drained onto the bus (data out). The transmit FIFO watermark level sits at I2C.IC_DMA_TDLR = 2 entries above EMPTY, leaving FIFO_DEPTH - I2C.IC_DMA_TDLR = 6 entries up to FULL.]

Therefore, the number of burst transactions needed equals the block size divided by the number of data items per burst:

DMA.CTLx.BLOCK_TS/DMA.CTLx.DEST_MSIZE = 30/6 = 5

The number of burst transactions in the DMA block transfer is 5. But the watermark level, I2C.IC_DMA_TDLR, is quite low. Therefore, the probability of an I2C underflow is high: the I2C serial transmit line needs to transmit data, but there is no data left in the transmit FIFO. This occurs because the DMA has not had time to service the DMA request before the transmit FIFO becomes empty.

Case 2: IC_DMA_TDLR = 6
● Transmit FIFO watermark level = I2C.IC_DMA_TDLR = 6
● DMA.CTLx.DEST_MSIZE = FIFO_DEPTH - I2C.IC_DMA_TDLR = 2
● I2C transmit FIFO_DEPTH = 8
● DMA.CTLx.BLOCK_TS = 30

Figure 159. Case 2 watermark levels
[Figure: I2C transmit FIFO with FIFO_DEPTH = 8, filled by the DMA controller (data in) and drained onto the bus (data out). The transmit FIFO watermark level sits at I2C.IC_DMA_TDLR = 6 entries above EMPTY, leaving FIFO_DEPTH - I2C.IC_DMA_TDLR = 2 entries up to FULL.]

Number of burst transactions in the block:

DMA.CTLx.BLOCK_TS/DMA.CTLx.DEST_MSIZE = 30/2 = 15

In this block transfer, there are 15 destination burst transactions. But the watermark level, I2C.IC_DMA_TDLR, is high. Therefore, the probability of an I2C underflow is low because the DMA controller has plenty of time to service the destination burst transaction request before the I2C transmit FIFO becomes empty.

Thus, the second case has a lower probability of underflow at the expense of more burst transactions per block. This potentially produces a greater number of AMBA bursts per block and worse bus utilization than the former case.

Therefore, the goal in choosing a watermark level is to minimize the number of transactions per block, while at the same time keeping the probability of an underflow condition to an acceptable level. In practice, this is a function of the ratio of the rate at which the I2C transmits data to the rate at which the DMA can respond to destination burst requests.

For example, promoting the channel to the highest priority channel in the DMA, and promoting the DMA master interface to the highest priority master in the AMBA layer, increases the rate at which the DMA controller can respond to burst transaction requests. This in turn allows the user to decrease the watermark level, which improves bus utilization without compromising the probability of an underflow occurring.

Selecting DEST_MSIZE and transmit FIFO overflow

As can be seen from Figure 159, programming DMA.CTLx.DEST_MSIZE to a value greater than the free space available at the watermark level that triggers the DMA request may cause overflow, because there is not enough space in the I2C transmit FIFO to service the destination burst request.
Therefore, the following equation must be adhered to in order to avoid overflow:

DMA.CTLx.DEST_MSIZE <= I2C.FIFO_DEPTH - I2C.IC_DMA_TDLR (1)

In Case 2: IC_DMA_TDLR = 6, the amount of space in the transmit FIFO at the time the burst request is made is equal to the destination burst length, DMA.CTLx.DEST_MSIZE. Thus, the transmit FIFO may be full, but not overflowed, at the completion of the burst transaction.

Therefore, for optimal operation, DMA.CTLx.DEST_MSIZE should be set to the free space in the FIFO at the level that triggers a transmit DMA request; that is:

DMA.CTLx.DEST_MSIZE = I2C.FIFO_DEPTH - I2C.IC_DMA_TDLR (2)

This is the setting used in Figure 157.

Adhering to equation (2) reduces the number of DMA bursts needed for a block transfer, and this in turn improves AMBA bus utilization.

Note: The transmit FIFO will not be full at the end of a DMA burst transfer if the I2C has successfully transmitted one data item or more on the I2C serial transmit line during the transfer.

Receive watermark level and receive FIFO overflow

During I2C serial transfers, receive FIFO requests are made to the DMAC whenever the number of entries in the receive FIFO is at or above the DMA Receive Data Level Register value plus one; that is, IC_DMA_RDLR+1. This is known as the watermark level. The DMAC responds by reading a burst of data of length CTLx.SRC_MSIZE from the receive FIFO buffer.

Data should be fetched by the DMA often enough for the receive FIFO to accept serial transfers continuously; that is, when the FIFO begins to fill, another DMA transfer is requested. Otherwise, the FIFO fills with data (overflow). To prevent this condition, the user must correctly set the watermark level.

Choosing the receive watermark level

Similar to choosing the transmit watermark level described earlier, the receive watermark level, IC_DMA_RDLR+1, should be set to minimize the probability of overflow, as shown in Figure 160.
It is a trade-off between the number of DMA burst transactions required per block and the probability of an overflow occurring.

Selecting SRC_MSIZE and receive FIFO underflow

As can be seen in Figure 160, programming a source burst transaction length greater than the watermark level may cause underflow when there is not enough data to service the source burst request. Therefore, equation (3) below must be adhered to in order to avoid underflow.

If the number of data items in the receive FIFO is equal to the source burst length, DMA.CTLx.SRC_MSIZE, at the time the burst request is made, the receive FIFO may be emptied, but not underflowed, at the completion of the burst transaction. For optimal operation, DMA.CTLx.SRC_MSIZE should be set at the watermark level; that is:

DMA.CTLx.SRC_MSIZE = I2C.IC_DMA_RDLR + 1 (3)

Adhering to equation (3) above reduces the number of DMA bursts in a block transfer, which in turn can avoid underflow and improve AMBA bus utilization.

Note: The receive FIFO will not be empty at the end of the source burst transaction if the I2C has successfully received one data item or more on the I2C serial receive line during the burst.

Figure 160. I2C Receive FIFO
[Figure: I2C receive FIFO, filled from the bus (data in) and drained by the DMA controller (data out). The receive FIFO watermark level sits at I2C.IC_DMA_RDLR + 1 entries above EMPTY.]

Handshaking interface operation

The following sections discuss the handshaking interface.

Note: For I2C0, dma_tx_req = I2C0_TX and dma_rx_req = I2C0_RX

dma_tx_req, dma_rx_req

The request signals for source and destination, dma_tx_req and dma_rx_req, are activated when their corresponding FIFOs reach the watermark levels as discussed earlier.

The DMAC uses rising-edge detection of the dma_tx_req/dma_rx_req signal to identify a request on the channel.
Upon reception of the dma_tx_ack/dma_rx_ack signal from the DMAC to indicate the burst transaction is complete, the I2C de-asserts the burst request signals, dma_tx_req/dma_rx_req, until dma_tx_ack/dma_rx_ack is de-asserted by the DMAC. When the I2C samples that dma_tx_ack/dma_rx_ack is de-asserted, it can re-assert the dma_tx_req/dma_rx_req request line if the corresponding FIFO exceeds its watermark level (back-to-back burst transaction). If this is not the case, the DMA request lines remain de-asserted.

Figure 161 shows a timing diagram of a burst transaction where pclk = hclk. Figure 162 shows two back-to-back burst transactions where the hclk frequency is twice the pclk frequency.

Figure 161. Burst transaction – pclk = hclk
[Timing diagram: pclk, hclk, dma_tx_req (burst transaction request), dma_tx_ack (burst transaction complete), dma_tx_single (not sampled by the DW_ahb_dmac for burst transactions).]

Figure 162. Back-to-back burst transaction – hclk = 2*pclk
[Timing diagram: hclk, pclk, dma_rx_req (two burst transaction requests), dma_rx_ack (two burst transaction completions), dma_rx_single (not sampled by the DW_ahb_dmac for burst transactions).]

The handshaking loop is as follows:
– dma_tx_req/dma_rx_req asserted by I2C
– dma_tx_ack/dma_rx_ack asserted by DMAC
– dma_tx_req/dma_rx_req de-asserted by I2C
– dma_tx_ack/dma_rx_ack de-asserted by DMAC
– dma_tx_req/dma_rx_req re-asserted by I2C, if a back-to-back transaction is required

Note: The burst transaction request signals, dma_tx_req and dma_rx_req, are generated in the I2C off pclk and sampled in the DMAC by hclk. The acknowledge signals, dma_tx_ack and dma_rx_ack, are generated in the DMAC off hclk and sampled in the I2C by pclk. The handshaking mechanism between the DMAC and the I2C supports quasi-synchronous clocks; that is, hclk and pclk must be phase-aligned, and the hclk frequency must be a multiple of the pclk frequency.
Two things to note here:
1. The burst request lines, dma_tx_req/dma_rx_req, once asserted remain asserted until their corresponding dma_tx_ack/dma_rx_ack signal is received, even if the respective FIFOs drop below their watermark levels during the burst transaction.
2. The dma_tx_req/dma_rx_req signals are de-asserted when their corresponding dma_tx_ack/dma_rx_ack signals are asserted, even if the respective FIFOs exceed their watermark levels.

dma_tx_single, dma_rx_single

The dma_tx_single signal is a status signal. It is asserted when there is at least one free entry in the transmit FIFO and cleared when the transmit FIFO is full. The dma_rx_single signal is a status signal. It is asserted when there is at least one valid data entry in the receive FIFO and cleared when the receive FIFO is empty.

These signals are needed by the DMAC only for the case where the block size, CTLx.BLOCK_TS, that is programmed into the DMAC is not a multiple of the burst transaction length, CTLx.SRC_MSIZE or CTLx.DEST_MSIZE, as shown in Figure 157. In this case, the DMA single outputs inform the DMAC that it is still possible to perform single data item transfers, so it can access all data items in the transmit/receive FIFO and complete the DMA block transfer. The DMA single outputs from the I2C are not sampled by the DMAC otherwise. This is illustrated in the following example.

Consider first an example where the receive FIFO channel of the I2C is as follows:

DMA.CTLx.SRC_MSIZE = I2C.IC_DMA_RDLR + 1 = 4
DMA.CTLx.BLOCK_TS = 12

For the example in Figure 156, with the block size set to 12, the dma_rx_req signal is asserted when four data items are present in the receive FIFO. The dma_rx_req signal is asserted three times during the I2C serial transfer, ensuring that all 12 data items are read by the DMAC. Every DMA request reads a burst of data items, and no single DMA transactions are required. This block transfer is made up of three burst transactions.
Now, for the following block transfer:

DMA.CTLx.SRC_MSIZE = I2C.IC_DMA_RDLR + 1 = 4
DMA.CTLx.BLOCK_TS = 15

The first 12 data items are transferred as already described using three burst transactions. But when the last three data frames enter the receive FIFO, the dma_rx_req signal is not activated because the FIFO level is below the watermark level. The DMAC samples dma_rx_single and completes the DMA block transfer using three single transactions. The block transfer is made up of three burst transactions followed by three single transactions.

Figure 163 shows a single transaction. The handshaking loop is as follows:
– dma_tx_single/dma_rx_single asserted by I2C
– dma_tx_ack/dma_rx_ack asserted by DMAC
– dma_tx_single/dma_rx_single de-asserted by I2C
– dma_tx_ack/dma_rx_ack de-asserted by DMAC

Figure 163. Single transaction
[Timing diagram: pclk (edges m0 to m2), hclk (edges n0 to n4), dma_rx_req, dma_rx_ack (single transaction complete), dma_rx_single.]

Figure 164 shows a burst transaction, followed by three back-to-back single transactions, where the hclk frequency is twice the pclk frequency.

Figure 164. Burst transaction + 3 back-to-back singles – hclk = 2*pclk
[Timing diagram: hclk, pclk, dma_tx_req (burst transaction request), dma_tx_ack (burst transaction complete, then three single transaction completions), dma_tx_single.]

Note: The single transaction request signals, dma_tx_single and dma_rx_single, are generated in the I2C on the pclk edge and sampled in the DMAC on hclk. The acknowledge signals, dma_tx_ack and dma_rx_ack, are generated in the DMAC on the hclk edge and sampled in the I2C on pclk. The handshaking mechanism between the DMAC and the I2C supports quasi-synchronous clocks; that is, hclk and pclk must be phase-aligned and the hclk frequency must be a multiple of the pclk frequency.
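The burst-length equations (2) and (3) and the split of a block into burst and single transactions can be sketched in C as follows. This is an illustrative model only: the function and type names are hypothetical, and the values are counted in data items, as in the equations above, rather than in whatever encoding the DMAC register fields use.

```c
#include <assert.h>
#include <stdint.h>

/* Equation (2): destination burst length that fills, but cannot overflow,
 * the transmit FIFO. All arguments are counted in data items. */
static inline uint32_t i2c_dma_dest_msize(uint32_t fifo_depth, uint32_t tdlr)
{
    assert(tdlr <= fifo_depth);
    return fifo_depth - tdlr;
}

/* Equation (3): source burst length equal to the receive watermark level. */
static inline uint32_t i2c_dma_src_msize(uint32_t rdlr)
{
    return rdlr + 1u;
}

/* Hypothetical helper type: how a block of BLOCK_TS items is serviced. */
struct dma_block_plan {
    uint32_t bursts;   /* transactions requested through dma_*_req    */
    uint32_t singles;  /* transactions requested through dma_*_single */
};

/* Split a block into full bursts plus leftover single transactions. */
static inline struct dma_block_plan i2c_dma_plan_block(uint32_t block_ts,
                                                       uint32_t msize)
{
    assert(msize > 0u);
    struct dma_block_plan plan = { block_ts / msize, block_ts % msize };
    return plan;
}
```

With a burst length of 4, a block of 12 items yields three bursts and no singles, while a block of 15 items yields three bursts and three singles, matching the two examples above.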
27.5 Programming

Note: The I2C should only be set to operate as an I2C master or as an I2C slave, but not both simultaneously. This is achieved by ensuring that bit 6 (IC_SLAVE_DISABLE) and bit 0 (IC_MASTER_MODE) of the IC_CON register are never set to 0 and 1, respectively.

27.5.1 Slave mode operation

This section discusses slave mode procedures.

Initial configuration

To use the I2C as a slave, perform the following steps:
1. Disable the I2C by writing a ‘0’ to bit 0 of the IC_ENABLE register.
2. Write to the IC_SAR register (bits 9:0) to set the slave address. This is the address to which the I2C responds.
3. Write to the IC_CON register to specify which type of addressing is supported (7- or 10-bit, by setting bit 3). Enable the I2C in slave-only mode by writing a ‘0’ into bit 6 (IC_SLAVE_DISABLE) and a ‘0’ to bit 0 (MASTER_MODE).
Note: Slaves and masters do not have to be programmed with the same type of addressing (7- or 10-bit). For instance, a slave can be programmed with 7-bit addressing and a master with 10-bit addressing, and vice versa.
4. Enable the I2C by writing a ‘1’ in bit 0 of the IC_ENABLE register.
Note: Depending on the reset values chosen, steps 2 and 3 may not be necessary because the reset values can be configured. For instance, if the device is only going to be a master, there would be no need to set the slave address because you can configure I2C to have the slave disabled after reset and to enable the master after reset. The values stored are static and do not need to be reprogrammed if the I2C is disabled.

Slave-transmitter operation for a single byte

When another I2C master device on the bus addresses the I2C and requests data, the I2C acts as a slave-transmitter and the following steps occur:
1. The other I2C master device initiates an I2C transfer with an address that matches the slave address in the IC_SAR register of the I2C.
2.
The I2C acknowledges the sent address and recognizes the direction of the transfer to indicate that it is acting as a slave-transmitter.
3. The I2C asserts the RD_REQ interrupt (bit 5 of the IC_RAW_INTR_STAT register) and holds the SCL line low. It is in a wait state until software responds. If the RD_REQ interrupt has been masked, due to IC_INTR_MASK[5] (M_RD_REQ bit field) being set to 0, then it is recommended that a hardware and/or software timing routine be used to instruct the CPU to perform periodic reads of the IC_RAW_INTR_STAT register.
a) Reads that indicate IC_RAW_INTR_STAT[5] (R_RD_REQ bit field) being set to 1 must be treated as the equivalent of the RD_REQ interrupt being asserted.
b) Software must then act to satisfy the I2C transfer.
c) The timing interval used should be in the order of 10 times the fastest SCL clock period the I2C can handle. For example, for 400 kb/s, the timing interval is 25 us.
Note: The value of 10 is recommended here because this is approximately the amount of time required for a single byte of data transferred on the I2C bus.
4. If there is any data remaining in the TX FIFO before receiving the read request, then the I2C asserts a TX_ABRT interrupt (bit 6 of the IC_RAW_INTR_STAT register) to flush the old data from the TX FIFO.
Note: Because the I2C’s TX FIFO is forced into a flushed/reset state whenever a TX_ABRT event occurs, it is necessary for software to release the I2C from this state by reading the IC_CLR_TX_ABRT register before attempting to write into the TX FIFO. See register IC_RAW_INTR_STAT for more details.
If the TX_ABRT interrupt has been masked, due to IC_INTR_MASK[6] (M_TX_ABRT bit field) being set to 0, then it is recommended that the timing routine described in the previous step, or a similar one, be re-used to read the IC_RAW_INTR_STAT register.
a) Reads that indicate bit 6 (R_TX_ABRT) being set to 1 must be treated as the equivalent of the TX_ABRT interrupt being asserted.
b) There is no further action required from software.
c) The timing interval used should be similar to that described in the previous step for the IC_RAW_INTR_STAT[5] register.
5. Software writes to the IC_DATA_CMD register with the data to be written (by writing a ‘0’ in bit 8).
6. Software must clear the RD_REQ and TX_ABRT interrupts (bits 5 and 6, respectively) of the IC_RAW_INTR_STAT register before proceeding. If the RD_REQ and/or TX_ABRT interrupts have been masked, then clearing of the IC_RAW_INTR_STAT register will have already been performed when either the R_RD_REQ or R_TX_ABRT bit has been read as 1.
7. The I2C releases the SCL and transmits the byte.
8. The master may hold the I2C bus by issuing a RESTART condition or release the bus by issuing a STOP condition.

Slave-receiver operation for a single byte

When another I2C master device on the bus addresses the I2C and is sending data, the I2C acts as a slave-receiver and the following steps occur:
1. The other I2C master device initiates an I2C transfer with an address that matches the I2C’s slave address in the IC_SAR register.
2. The I2C acknowledges the sent address and recognizes the direction of the transfer to indicate that the I2C is acting as a slave-receiver.
3. I2C receives the transmitted byte and places it in the receive buffer.
Note: If the RX FIFO is completely filled with data when a byte is pushed, then an overflow occurs and the I2C continues with subsequent I2C transfers. Because a NACK is not generated, software must recognize the overflow when indicated by the I2C (by the R_RX_OVER bit in the IC_INTR_STAT register) and take appropriate actions to recover from lost data. Hence, there is a real-time constraint on software to service the RX FIFO before it overflows, as there is no way to apply back pressure to the remote transmitting master.
You must select an RX FIFO depth that is deep enough to satisfy the interrupt service interval of your system.
4. I2C asserts the RX_FULL interrupt (IC_RAW_INTR_STAT[2]). If the RX_FULL interrupt has been masked, due to setting IC_INTR_MASK[2] to 0 or setting IC_RX_TL to a value larger than 0, then it is recommended that a timing routine (described in Slave-transmitter operation for a single byte on page 445) be implemented for periodic reads of the IC_STATUS register. Reads of the IC_STATUS register, with bit 3 (RFNE) set at 1, must then be treated by software as the equivalent of the RX_FULL interrupt being asserted.
5. Software may read the byte from the IC_DATA_CMD register (bits 7:0).
6. The other master device may hold the I2C bus by issuing a RESTART condition or release the bus by issuing a STOP condition.

Slave-transfer operation for bulk transfers

In the standard I2C protocol, all transactions are single byte transactions and the programmer responds to a remote master read request by writing one byte into the slave’s TX FIFO. When a slave (slave-transmitter) is issued with a read request (RD_REQ) from the remote master (master-receiver), at least one entry should be placed into the slave-transmitter’s TX FIFO. I2C is designed to handle more data in the TX FIFO so that subsequent read requests can take that data without raising an interrupt to get more data. Ultimately, this eliminates the significant latencies that would be incurred by raising the interrupt for each byte if only one entry could be placed in the TX FIFO at a time. This mode only occurs when I2C is acting as a slave-transmitter.
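Where the RD_REQ, TX_ABRT or RX_FULL interrupts are masked and a timing routine is used instead, the recommended polling interval (about 10 SCL periods at the fastest supported speed) can be computed as in the sketch below. The function and macro names are hypothetical and not part of any SPEAr1340 driver.

```c
#include <assert.h>
#include <stdint.h>

/* Roughly the number of SCL periods needed to move one byte on the bus,
 * as recommended in the slave-transmitter procedure. */
#define I2C_POLL_SCL_PERIODS 10u

/* Polling interval in nanoseconds for a given SCL frequency in Hz. */
static inline uint32_t i2c_poll_interval_ns(uint32_t scl_hz)
{
    assert(scl_hz > 0u);
    return (uint32_t)((I2C_POLL_SCL_PERIODS * 1000000000ull) / scl_hz);
}
```

For 400 kb/s this gives 25000 ns, which is the 25 us figure quoted in the slave-transmitter procedure.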
If the remote master acknowledges the data sent by the slave-transmitter and there is no data in the slave’s TX FIFO, the I2C holds the I2C SCL line low while it raises the read request interrupt (RD_REQ) and waits for data to be written into the TX FIFO before it can be sent to the remote master.

If the RD_REQ interrupt is masked, due to bit 5 (M_RD_REQ) of the IC_INTR_MASK register being set to 0, then it is recommended that a timing routine be used to activate periodic reads of the IC_RAW_INTR_STAT register. Reads of IC_RAW_INTR_STAT that return bit 5 (R_RD_REQ) set to 1 must be treated as the equivalent of the RD_REQ interrupt referred to in this section. This timing routine is similar to that described in Slave-transmitter operation for a single byte on page 445.

The RD_REQ interrupt is raised upon a read request and, like other interrupts, must be cleared when exiting the interrupt service routine (ISR). The ISR allows you to write either 1 byte or more than 1 byte into the TX FIFO. During the transmission of these bytes to the master, if the master acknowledges the last byte, then the slave must raise RD_REQ again because the master is requesting more data.

If the programmer knows in advance that the remote master is requesting a packet of n bytes, then when another master addresses I2C and requests data, the TX FIFO could be written with n bytes and the remote master receives them as a continuous stream of data. For example, the I2C slave continues to send data to the remote master as long as the remote master is acknowledging the data sent and there is data available in the TX FIFO. There is no need to hold the SCL line low or to issue RD_REQ again.

If the remote master is to receive n bytes from the I2C but the programmer wrote a number of bytes larger than n to the TX FIFO, then when the slave finishes sending the requested n bytes, it clears the TX FIFO and ignores any excess bytes.
The I2C generates a transmit abort (TX_ABRT) event to indicate the clearing of the TX FIFO in this example. At the time an ACK/NACK is expected, if a NACK is received, then the remote master has all the data it wants. At this time, a flag is raised within the slave’s state machine to clear the leftover data in the TX FIFO. This flag is transferred to the processor bus clock domain where the FIFO exists, and the contents of the TX FIFO are cleared at that time.

27.5.2 Master mode operation

This section discusses master mode procedures.

Initial configuration

The initial configuration procedure for master mode operation depends on the configuration parameter I2C_DYNAMIC_TAR_UPDATE. When set to “Yes” (1), the target address and address format can be changed dynamically without having to disable I2C. This parameter only applies when I2C is acting as a master because the slave requires the component to be disabled before any changes can be made to the address. The procedures are very similar and differ only with regard to where the IC_10BITADDR_MASTER bit is set (either bit 4 of the IC_CON register or bit 12 of the IC_TAR register).

I2C_DYNAMIC_TAR_UPDATE = 1

To use the I2C as a master when the I2C_DYNAMIC_TAR_UPDATE configuration parameter is set to “Yes” (1), perform the following steps:
1. Disable the I2C by writing 0 to the IC_ENABLE register.
2. Write to the IC_CON register to set the maximum speed mode supported for slave operation (bits 2:1) and to specify whether the I2C starts its transfers in 7/10 bit addressing mode when the device is a slave (bit 3).
3. Write to the IC_TAR register the address of the I2C device to be addressed. It also indicates whether a General Call or a START BYTE command is going to be performed by I2C. The addressing mode of the I2C master-initiated transfers, either 7-bit or 10-bit, is controlled by the IC_10BITADDR_MASTER bit field (bit 12).
4.
Only applicable for high-speed mode transfers. Write to the IC_HS_MADDR register the desired master code for the I2C. The master code is programmer-defined.
5. Enable the I2C by writing a 1 in the IC_ENABLE register.
6. Now write the transfer direction and data to be sent to the IC_DATA_CMD register. If the IC_DATA_CMD register is written before the I2C is enabled, the data and commands are lost as the buffers are kept cleared when I2C is not enabled.
Note: For multiple I2C transfers, perform additional writes to the TX FIFO such that the TX FIFO does not become empty during the I2C transaction. If the TX FIFO is completely emptied at any stage, then further writes to the TX FIFO result in an independent I2C transaction.

Dynamic IC_TAR or IC_10BITADDR_MASTER update

The I2C supports dynamic updating of the IC_TAR (bits 9:0) and IC_10BITADDR_MASTER (bit 12) bit fields of the IC_TAR register. In order to perform a dynamic update of the IC_TAR register, the I2C_DYNAMIC_TAR_UPDATE configuration parameter must be set to “Yes” (1). You can dynamically write to the IC_TAR register provided one of the following conditions is met:
1. I2C is not enabled (IC_ENABLE=0); OR
2. I2C is enabled (IC_ENABLE=1); AND I2C is NOT engaged in any Master (tx, rx) operation (IC_STATUS[5]=0); AND I2C is enabled to operate in Master mode (IC_CON[0]=1); AND there are NO entries in the TX FIFO (IC_STATUS[2]=1)

Master transmit and master receive

The I2C supports switching back and forth between reading and writing dynamically. To transmit data, write the data to the lower byte of the I2C Rx/Tx Data Buffer and Command Register (IC_DATA_CMD). The CMD bit [8] should be written to 0 for I2C write operations. Subsequently, a read command may be issued by writing “don’t cares” to the lower byte of the IC_DATA_CMD register, and a 1 should be written to the CMD bit.
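As a minimal sketch, the IC_DATA_CMD encoding just described and the dynamic IC_TAR update conditions listed earlier can be written as small C helpers. The helper names are hypothetical; only the bit positions (CMD bit 8, IC_ENABLE[0], IC_CON[0], IC_STATUS[5] and IC_STATUS[2]) come from the text, and the helpers operate on register values passed in by the caller rather than touching hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define IC_DATA_CMD_READ (1u << 8)  /* CMD bit [8]: 1 = read, 0 = write */

/* Value to write to IC_DATA_CMD to transmit one byte (CMD = 0). */
static inline uint32_t ic_data_cmd_write(uint8_t data)
{
    return (uint32_t)data;
}

/* Value to write to IC_DATA_CMD to request one byte from the slave.
 * The lower byte is "don't care"; zero is used here. */
static inline uint32_t ic_data_cmd_read(void)
{
    return IC_DATA_CMD_READ;
}

/* True when IC_TAR may be written, given snapshots of the registers. */
static inline bool ic_tar_update_allowed(uint32_t ic_enable,
                                         uint32_t ic_con,
                                         uint32_t ic_status)
{
    bool enabled       = (ic_enable & 1u) != 0u;
    bool master_mode   = (ic_con & 1u) != 0u;
    bool master_busy   = (ic_status & (1u << 5)) != 0u;
    bool tx_fifo_empty = (ic_status & (1u << 2)) != 0u;

    return !enabled || (!master_busy && master_mode && tx_fifo_empty);
}
```

A caller would typically read IC_ENABLE, IC_CON and IC_STATUS, check ic_tar_update_allowed(), and only then write the new target address.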
The I2C master continues to initiate transfers as long as there are commands present in the transmit FIFO. If the transmit FIFO becomes empty, the I2C inserts a STOP condition after completing the current transfers.

27.5.3 Disabling I2C

The register IC_ENABLE_STATUS is added to allow software to unambiguously determine when the hardware has completely shut down in response to the IC_ENABLE register being set from 1 to 0. Only one register is required to be monitored, as opposed to monitoring two registers (IC_STATUS and IC_RAW_INTR_STAT), which is a requirement for I2C versions 1.05a or earlier.

Procedure

1. Define a timer interval (ti2c_poll) equal to 10 times the signaling period for the highest I2C transfer speed used in the system and supported by I2C. For example, if the highest I2C transfer mode is 400 kb/s, then ti2c_poll is 25 us.
2. Define a maximum time-out parameter, MAX_T_POLL_COUNT, such that if any repeated polling operation exceeds this maximum value, an error is reported.
3. Execute a blocking thread/process/function that prevents any further I2C master transactions from being started by software, but allows any pending transfers to be completed.
Note: This step can be ignored if I2C is programmed to operate as an I2C slave only.
4. The variable POLL_COUNT is initialized to zero.
5. Set IC_ENABLE to 0.
6. Read the IC_ENABLE_STATUS register and test the IC_EN bit (bit 0). Increment POLL_COUNT by one. If POLL_COUNT >= MAX_T_POLL_COUNT, exit with the relevant error code.
7. If IC_ENABLE_STATUS[0] is 1, then sleep for ti2c_poll and proceed to the previous step. Otherwise, exit with a relevant success code.

28 General purpose I/O (GPIOA-B)

This chapter focuses on GPIO functionality and operation.
For the GPIO feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU

For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

28.1 Overview

The SPEAr1340 device integrates 2 instances of a general purpose I/O digital block, identified as GPIOA and GPIOB. The GPIO block provides 8 programmable inputs or outputs. Each input/output can be controlled through an APB interface.

Figure 165. GPIOA and GPIOB block diagram
[Block diagram: GPIOA/B block with an APB slave interface, PCLK and PRESETn inputs, GPIOINTR output, and nGPEN[7:0], GPOUT[7:0] and GPIN[7:0] interfaced with GPIO_A/B[7:0].]

See also: Figure 166: GPIO detailed block diagram

28.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

28.3 Clocks

The GPIO block uses PCLK, the APB clock.
See also: Chapter 5: Reset and clock generator (RCG).

28.4 Resets

The APB reset, PRESETn, is used to reset the GPIO block. All block registers are cleared during power-on-reset (LOW). This disables the output drivers for the GPIO lines, so that the pins are configured as inputs.

28.5 Interrupts

GPIO sends a single interrupt, GPIOINTR, to the interrupt controller.
See also: Section 28.6.2: Interrupt detection logic

28.6 Functional description

Figure 166 shows a block diagram of the GPIO block with its main interfaces.

Figure 166. GPIO detailed block diagram
[Block diagram: APB interface (PSEL, PENABLE, PWRITE, PADDR[11:2], PWDATA[7:0], PRDATA[7:0], PRESETn, PCLK), register block (nGPIODIR[7:0], GPIODATA[7:0], ID), input/output control with input/output multiplexor (GPINSync2[7:0], nGPEN[7:0], GPIN[7:0], GPOUT[7:0], GPIO_A/B[7:0]), and interrupt control with interrupt detection logic (GPIOINTR).]

28.6.1 APB interface

The APB interface generates read and write decodes for accesses to control, interrupt, and data registers.
A read-only decode is provided to access the ID codes.

28.6.2 Interrupt detection logic

The interrupt section of the GPIO is controlled by a set of seven registers, each controlling a different feature or condition of the interrupt triggering chain. You can select the source of the interrupt, its polarity, and edge properties.

GPIO has the ability to generate mask-programmable interrupts based on the level, or transitional value, of any of its GPIO lines. When one or more GPIO lines cause an interrupt, a single interrupt output GPIOINTR is sent to the interrupt controller. Refer to Appendix A: Interrupts for the GPIO interrupt lines.

You can configure interrupts so that they are generated either on a change in the level, or on an edge of the GPIO line. The edge and level on which the interrupt must be generated are programmable.

The set of seven registers in the APB interface allows the following functionality:
● interrupt generation either on a change in the level, one edge, or both edges of the GPIO line
● reading raw and masked interrupt status
● reading from and writing to the interrupt enable
● interrupt clear (write-only).

Each input/output line has a corresponding masked interrupt output line. Setting the appropriate mask bit HIGH enables the interrupt. For edge-triggered interrupts, software must clear the interrupt to enable any further interrupts. For a level case, it is assumed that the external source holds the level constant for the interrupt to be recognized by the processor.

Three registers are required to define the edge or sense that causes an interrupt:
● GPIOIS (Interrupt sense register)
● GPIOIBE (Interrupt both edges register)
● GPIOIEV (Interrupt event register)

Figure 167 shows how the bits of the three registers combine to select an interrupt source event.

Note: Each bit of the interrupt registers corresponds to a GPIO pin.
Figure 167. GPIO interrupt registers
[Flowchart: from Start, the decision boxes are GPIOIE (masked? 0 = interrupt masked, 1 = continue), GPIOIS (edge/level?), GPIOIBE (both edges?) and GPIOIEV (rising/falling edge, or HIGH/LOW level?).]

Registers to be programmed

Table 141 shows how an interrupt is triggered by a rising edge detected on input pin 2.

Table 141. Triggering an interrupt from pin 2

Register   Desired trigger                        7 6 5 4 3 2 1 0
GPIOIS     0 = edge                               x x x x x 0 x x
           1 = level
GPIOIBE    0 = single edge                        x x x x x 0 x x
           1 = both edges
GPIOIEV    0 = LOW level, or negative edge        x x x x x 1 x x
           1 = HIGH level, or positive edge
GPIOIE     0 = masked                             0 0 0 0 0 1 0 0
           1 = not masked

If any GPIOIE register bit is 0, the interrupt triggering on the associated line is disabled. In Table 141 an x indicates that the value of the associated bit is irrelevant, a consequence of the bit being masked by the GPIOIE register setting.

Program the interrupt control registers only while the respective interrupts are not enabled. Writing to interrupt control registers can generate spurious interrupts if the corresponding bits are enabled.

See also: Section : Recommendations on page 454.

28.7 Operation

28.7.1 Interrupt configuration

On application of PRESETn as LOW:
● interrupts in the desired line are disabled by clearing the corresponding bit in GPIOIE
● all registers are cleared to zero
● input and output pins are configured as inputs
● interrupts to the external world are all masked as disabled
● raw interrupts are cleared to zero
● edge-triggered interrupts are selected as source

Recommendations

If you want to generate edge-triggered interrupts, you must perform the following initialization sequence to avoid spurious interrupts being interpreted by the system.
1. Program GPIOIBE appropriately as individual or both-edge detection
2.
Program GPIOIEV, if you have selected individual edge detection previously
3. Program GPIOIS to select edge-triggered path
4. Apply three clock pulses to clean interrupt pipeline (wait for three PCLK periods)
5. Ensure GPIN[7:0] bus remains stable throughout this operation
6. Clear all interrupts by writing 0xFF to GPIOIC
7. Program GPIOIE to enable interrupts

For example, to detect an interrupt on a rising edge of the signal on pin 2, you should configure the GPIO registers as follows:

    GPIOIBE &= 0xFB;   // pin 2: single-edge detection
    GPIOIEV |= 0x4;    // pin 2: rising edge
    GPIOIS &= 0xFB;    // pin 2: edge-triggered
    // Waiting for 3 PCLK cycles, assuming ARM core clock is 4 times PCLK
    // (volatile keeps the compiler from removing the empty delay loop)
    for (volatile int i = 0; i < 12; i++);
    GPIOIC = 0x4;      // clear any pending interrupt on pin 2
    GPIOIE |= 0x4;     // enable the interrupt on pin 2

28.7.2 Operation of the input/output lines (I/O read/write)

The GPIO block comprises eight programmable input/output lines. Data and control for these lines are provided by the data register GPIODATA and the data direction register GPIODIR. On reads, the data register contains the current status of the GPIO pins, whether they are configured as input or output. Writing to the data register only affects the pins that are configured as outputs.

Data register (GPIODATA)

The address bus is used as a mask on read/write operations of the data register GPIODATA. The eight address lines used as a mask are PADDR[9:2]. Therefore, the GPIODATA register effectively covers 256 locations in the address space.

Data direction register (GPIODIR)

The data direction register operates in the following manner:
● 0 indicates the corresponding I/O pin is defined as an input
● 1 indicates the corresponding I/O pin is defined as an output

Write operation

1. Set desired bits of GPIODIR to '1', making IO pin an output.
2. Choose the write address for the GPIODATA register so that the corresponding bits of PADDR[9:2] are set to '1', unmasking the write operation to the desired IO pins.
3. Write to the address chosen in step 2.
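The address-as-mask scheme for GPIODATA can be modelled in plain C as shown below. This is a software model for illustration, not driver code: the function names are hypothetical, the 8-bit mask is assumed to sit on PADDR[9:2] as stated above, and masked-out bits are assumed to read as 0, as described under Read operation.

```c
#include <stdint.h>

/* Byte offset from the GPIODATA base for an 8-bit pin mask: the mask is
 * placed on PADDR[9:2], with PADDR[1:0] = 00. */
static inline uint32_t gpiodata_offset(uint8_t mask)
{
    return (uint32_t)mask << 2;
}

/* Model of a write to GPIODATA + offset: only unmasked bits change. */
static inline uint8_t gpiodata_masked_write(uint8_t current, uint8_t value,
                                            uint32_t offset)
{
    uint8_t mask = (uint8_t)((offset >> 2) & 0xFFu);
    return (uint8_t)((current & (uint8_t)~mask) | (value & mask));
}

/* Model of a read from GPIODATA + offset: masked bits read as 0. */
static inline uint8_t gpiodata_masked_read(uint8_t pins, uint32_t offset)
{
    uint8_t mask = (uint8_t)((offset >> 2) & 0xFFu);
    return (uint8_t)(pins & mask);
}
```

For instance, a mask selecting bits 1, 2 and 5 (0x26) gives offset 0x098, and a mask selecting bits 0, 4 and 5 (0x31) gives offset 0x0C4.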
During a write, PADDR[9:2] bits behave as follows:
● If the address bit associated with the data bit is HIGH (unmasked), the value of the associated GPIODATA register bit is altered.
● If it is LOW (masked), the associated GPIODATA register bit is left unchanged.

For example:
PADDR[9:2] = 'b00000000 -> all bits of GPIODATA are masked.
PADDR[9:2] = 'b11111111 -> all bits of GPIODATA are unmasked.
PADDR[9:2] = 'b00001110 -> bits 1, 2 and 3 of GPIODATA are unmasked.

If bits 1, 2 and 5 of GPIODATA need to be written, leaving the remaining bits (0, 3, 4, 6 and 7) unchanged, the PADDR[9:2] bits (used as mask) should be 'b00100110.
PADDR[9:0] = 'b0010011000 (bits 1:0 are appended as "00").
Therefore, the address which should be accessed for the write operation is:
GPIODATA + 'b0010011000 (or GPIODATA + 0x098)

When a value of 0xFB is written to the address GPIODATA + 0x098 then:
● bits 5 and 1 of the GPIO pins are set to 1, and bit 2 is set to 0
● the other bits are not changed.

Figure 168 shows the above effect of the address value of 0x098 operating on the data value of 0xFB.

Figure 168. Example to write to address 0x098

PADDR[9:2]   9 8 7 6 5 4 3 2
0x098        0 0 1 0 0 1 1 0   (PADDR[1:0] = 0 0)
0xFB         1 1 1 1 1 0 1 1
GPIODATA     u u 1 u u 0 1 u
             7 6 5 4 3 2 1 0

Note: In Figure 168 “u” indicates that the bit value is unchanged.

Read operation

During a read, PADDR[9:2] bits behave as follows:
● If the bit is HIGH, the associated data bit value is read.
● If the bit is LOW, the data bit is read as '0'.

For example:
If bits 0, 4 and 5 need to be read, PADDR[9:2] = 'b00110001.
Therefore, the address which should be accessed for the read = GPIODATA + 'b0011000100 (or GPIODATA + 0x0C4).

When reading from GPIODATA + 0x0C4 then:
● bits 5, 4, and 0 of the GPIO pins are returned
● the values of bits 7, 6, 3, 2, and 1 are returned as zero, regardless of their state.

Figure 169 shows a read from the address 0x0C4 and the output on the PRDATA[7:0] lines.

Figure 169.
Example to read from address 0x0C4

PADDR[9:2]    9 8 7 6 5 4 3 2
0x0C4         0 0 1 1 0 0 0 1   (PADDR[1:0] = 0 0)
GPIN[7:0]     1 1 1 1 1 1 1 0
PRDATA[7:0]   0 0 1 1 0 0 0 0
              7 6 5 4 3 2 1 0

29 Extended general purpose I/O (XGPIO)

This chapter focuses on XGPIO functionality and operation. For the XGPIO feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU

For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

29.1 Overview

Extended general purpose I/Os are individually programmable input/output pins (outputs by default), controlled through an AHB slave interface.

Figure 170. XGPIO block diagram
[Block diagram: XGPIO block with an AHB slave interface, HCLK and HRESETn inputs, GPIO_INT output, and GPIO_EN[249:0], GPIO_OUT[249:0] and GPIO_IN[249:0] signals.]

29.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

29.3 Clocks

The XGPIO module uses the system AHB clock, HCLK.
See also: Chapter 5: Reset and clock generator (RCG).

29.4 Interrupts

The signal rising or falling edge from all IO pins is recorded and ORed to provide an interrupt. Refer to Appendix A: Interrupts for the XGPIO interrupt line.

29.5 Functional description

The XGPIO module contains a set of registers mapped to external general purpose IOs. The registers are accessible through an AHB slave interface. The module can also generate interrupts on the falling or rising edge (programmable) of the signal from any of the IOs.

For reading, writing, or generating an interrupt from the IOs, the pins must be enabled as XGPIOs. As shown in the example of Figure 171, to enable SPEAr1340 pads as XGPIOs, you must configure the PAD_FUNCTION_EN_* miscellaneous registers. GPIO_IN*, GPIO_OUT* and GPIO_EN* are XGPIO register names.
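The relationship between an XGPIO number and its register and bit position can be sketched as below. The mapping of 32 XGPIOs per register is inferred from the examples in this chapter (XGPIO40 sits at bit 8 of the GPIO_IN1, GPIO_OUT1 and GPIO_EN1 registers, and GPIO_IN1 covers XGPIO32 to XGPIO63); the helper names and the struct are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define XGPIO_COUNT   250u  /* GPIO_EN[249:0] in Figure 170 */
#define XGPIO_PER_REG 32u   /* one bit per XGPIO in each 32-bit register */

/* Hypothetical helper type: register index x (as in GPIO_INx) and bit. */
struct xgpio_pos {
    uint32_t reg;
    uint32_t bit;
};

/* Locate XGPIO n within the GPIO_INx / GPIO_OUTx / GPIO_ENx registers. */
static inline struct xgpio_pos xgpio_locate(uint32_t n)
{
    assert(n < XGPIO_COUNT);
    struct xgpio_pos pos = { n / XGPIO_PER_REG, n % XGPIO_PER_REG };
    return pos;
}

/* Single-bit mask for XGPIO n inside its register. */
static inline uint32_t xgpio_bit_mask(uint32_t n)
{
    return 1u << (n % XGPIO_PER_REG);
}
```

For XGPIO40 this yields register index 1 and mask 0x100, whose complement 0xFFFFFEFF is the constant used in the register examples of this chapter.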
All these registers are documented in RM0089, Reference manual, SPEAr1340 address map and registers.

Figure 171. Mapping of XGPIO40 pad to XGPIO registers
(Diagram: the XGPIO40 pad is connected through mux logic, enabled by PAD_FUNCTION_EN_2[3], to bit 8 of GPIO_IN1 (IN), GPIO_OUT1 (OUT) and GPIO_EN1 (EN).)

29.5.1 XGPIO IN read

When enabled as XGPIO, the status of the signal on the IO pins is always available in the GPIO_IN0 to GPIO_IN7 registers. An AHB read of GPIO_IN0 through GPIO_IN7 returns the status of the signals on the XGPIOs. For example, a read from register GPIO_IN1 returns the status of XGPIO32 to XGPIO63 in bits 0 to 31 respectively.

29.5.2 XGPIO OUT write

To drive a signal on the IO, the XGPIO must be enabled as an output: in the GPIO_ENx registers, clear the bit that corresponds to the desired IO. For example, to set XGPIO40 as an output, bit 8 of the GPIO_EN1 register should be reset to '0'. By default, all of the XGPIOs are enabled as outputs (GPIO_ENx registers default to 0x0).

Once the XGPIO is programmed as an output, the values of the bits from the GPIO_OUTx registers are present on the XGPIO. The XGPIO can then be driven as required by writing to the GPIO_OUTx register.

If an XGPIO is in input mode, the value of the corresponding bit in the GPIO_OUTx register has no effect on it.

The following example illustrates driving XGPIO40 out to '1':

    PAD_FUNCTION_EN_2 &= 0xFFFFFDFF;
    GPIO_EN1 &= 0xFFFFFEFF;
    GPIO_OUT1 |= 0x100;

29.5.3 Using an XGPIO pin as an interrupt

An interrupt can be generated on the rising or falling edge of the signal on an XGPIO.

Figure 172. Interrupt detection logic on XGPIOs
(Diagram elements: GPIO_EN, GPIO_IN, two register stages, edge detector, IRQ_EDGE, GPIO_IRQ, GPIO_IRQ_MASK, GPIO_INT.)

Enabling interrupt generation

1. Initialize interrupt ID[139] (GPIO_IRQx).
2. Enable as XGPIOs the pins on which the interrupt is to be received (PAD_FUNCTION_EN_x).
3. Enable the XGPIOs in step 2 as inputs (GPIO_ENx).
4. Program the desired edge for interrupt capture (rising or falling edge) in register IRQ_EDGEx; by default, the falling edge is captured as the interrupt.
5. Disable the interrupt masks on the enabled XGPIOs (GPIO_IRQ_MASKx).

The following example illustrates setting XGPIO40 for interrupt detection on the rising edge of the signal:

    PAD_FUNCTION_EN_2 &= 0xFFFFFDFF;
    GPIO_EN1 |= 0x100;
    IRQ_EDGE1 |= 0x100;
    GPIO_IRQ_MASK1 &= 0xFFFFFEFF;

Checking XGPIO interrupt status

● Use registers GPIO_IRQ0 through GPIO_IRQ7.
For example, reading from GPIO_IRQ1 gives the interrupt status of XGPIO32 through XGPIO63.

Clearing an interrupt

● In the appropriate GPIO_IRQx register, write 0 to the interrupt bit to be cleared, keeping the remaining bits at 1.
For example, to clear the interrupt from XGPIO40:

    GPIO_IRQ1 = 0xFFFFFEFF;

30 Keyboard controller (KBD)

This chapter focuses on KBD functionality and operation.
For the KBD feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU
For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

30.1 Overview

The GPIO keyboard controller integrated in SPEAr1340 offers a 3-mode input and output port. It provides a 12-bit GPIO, a 6x6 keyboard, or a 2x2 keyboard plus 8-bit GPIO, and offers an interface to the industry-standard APB bus.

Figure 173. Keyboard controller block diagram
(Diagram elements: APB + wrapper, key switch matrix with strobes ST_1-ST_2 / ST_1-ST_6 and inputs KBD_1-KBD_2 / KBD_1-KBD_6, parallel/KBD port, 20-bit REG OUT with Reg_Out_Enable and DOUT, 20-bit REG IN with Reg_In_Enable and DIN, LOAD/LATCH, clock, reset, interrupts.)

30.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.
30.3 Clocks

The KBD uses the APB clock, PCLK. The KBD scan rate varies across PCLK dividers.

The PCLK frequency value must be programmed to match the input clock frequency of the KBD: if the clock frequency is X MHz, the programmed PCLK frequency value should be X. The KBD uses this value to measure microseconds: when its counter reaches the value X, one microsecond has elapsed. The programmed value must therefore be aligned with the input clock.

See also: Chapter 5: Reset and clock generator (RCG).

30.4 Interrupts

An interrupt is produced when a pressed key is scanned and properly validated (described in Section 30.5.1: Operating modes). The software interrupt service routine handles this interrupt by reading and writing STATUSREG. Writing zero (0) to this register is required to drive the interrupt line low.

30.5 Functional description

The keyboard controller provides an APB bus interface with a test wrapper, intra-chip signals, and 20 programmable I/O pins, of which only 12 are exposed to the SoC interface.

The wrapper converts internal chip signals into AMBA-compatible signals (non-AMBA signals are clock, reset, and interrupts). Control, status and data signals are all accessible through the APB bus interface.

Table 142 cross-references block signals and external interconnections.

Note: Different modes require that different pins be connected to the key switch matrix. Section 30.5.1: Operating modes provides a description of each operating mode.

Table 142. Block signals and external interconnection cross reference

PORT PIN   GPIO     KEYBOARD6x6               KEYBOARD2x2
ROW0       GPIO0    Keyboard Output (ROW0)    Keyboard Output (ROW0)
ROW1       GPIO1    Keyboard Output (ROW1)    Keyboard Output (ROW1)
ROW2       GPIO2    Keyboard Output (ROW2)    GPIO2
ROW3       GPIO3    Keyboard Output (ROW3)    GPIO3
ROW4       GPIO4    Keyboard Output (ROW4)    GPIO4
ROW5       GPIO5    Keyboard Output (ROW5)    GPIO5
COL0       GPIO9    Keyboard Input (COL0)     Keyboard Input (COL0)
COL1       GPIO10   Keyboard Input (COL1)     Keyboard Input (COL1)
COL2       GPIO11   Keyboard Input (COL2)     GPIO11
COL3       GPIO12   Keyboard Input (COL3)     GPIO12
COL4       GPIO13   Keyboard Input (COL4)     GPIO13
COL5       GPIO14   Keyboard Input (COL5)     GPIO14

30.5.1 Operating modes

General purpose input output interface mode (GPIO)

At power-on, all pins are inputs by default. In GPIO mode, each of the available 12 signals can be individually programmed to be an output through the APB bus. Once programmed, each pin maintains its identity as an input or an output. To enable the GPIO mode, set the mode control bits in the mode control register to [01].

The ARM may read or write the data register at any time. Writing to a pin that has been programmed as an input has no effect. Reading this register provides the status/values of all of the pins, both inputs and outputs.

Keyboard interface mode

In keyboard mode, the value of an externally connected keyboard (scanned at a programmed rate) can be read from the APB bus.

If the key number select bits in the mode control register are set to [01], the keyboard contains up to 36 keys. Twelve port pins provide a 6x6 scanning matrix; six of the pins are strobes, and six of the pins are inputs.

If the key number select bits in the mode control register are set to [10], the keyboard contains up to four keys.
Four port pins provide a 2x2 scanning matrix; two of the pins are strobes, and two of the pins are inputs. In this case the remaining 8 pins can be used as GPIO.

The circuitry scans the keys at a rate of 10, 20, 40 or 80 ms, controlled by the software. Two successive cycles are required to validate a key. Only one key is allowed down in a scan cycle. Once validated as being down, the “no key down” condition must be validated for two complete cycles when the key is released. Every valid key condition causes the value of the key to be written to a register and an interrupt to be set.

The key value is coded on eight bits; the lower nibble gives the column number (0, 1, 2…8), and the higher nibble gives the row number (0, 1, 2…8) of the key pressed.

Control register bits b3 and b2 determine the keyboard scanning rate. Each time the timer expires, the keyboard is scanned. Each strobe is active for 60 microseconds, so a 6x6 keyboard is scanned in 360 microseconds and a 2x2 keyboard in 120 microseconds. If only one key down is detected and it is the same key as on the previous scan, a bit is set in the Status register indicating new key data. The code for the key is written to the Keyboard value register. Key release is signaled only once.

The keypad encoder is initialized once when the application starts (prescaler load value, keyboard enable, scan rate, keyboard operation mode); the software then handles the interrupt line in order to process keyboard interrupts.

Table 143. Key-code table (hex values)

         COL(0)  COL(1)  COL(2)  COL(3)  COL(4)  COL(5)
Row(0)   0x00    0x01    0x02    0x03    0x04    0x05
Row(1)   0x10    0x11    0x12    0x13    0x14    0x15
Row(2)   0x20    0x21    0x22    0x23    0x24    0x25
Row(3)   0x30    0x31    0x32    0x33    0x34    0x35
Row(4)   0x40    0x41    0x42    0x43    0x44    0x45
Row(5)   0x50    0x51    0x52    0x53    0x54    0x55

Note:
1 For the value above, if the PCLK frequency does not exactly equal an integer, use the function below to calculate the new value:
  New value = {Pclk_frequency}*{original value}/{int(Pclk_frequency)}
2 In KBD mode, IOs used as ROWs (such as strobe signals) are connected as OPEN DRAIN, and IOs used as COLs (such as key pressed) are connected as normal bidirectional. In GPIO mode, all IOs are connected as normal bidirectional.
3 PARDATAREG bits are not mapped one-to-one to external IOs. The mapping between IOs and PARDATAREG bits is shown in Table 144.

Table 144. Mapping between external pins and PARDATAREG bits(1)

IOs    PARDATAREG bits   IOs    PARDATAREG bits
ROW0   PARDATAREG[0]     COL0   PARDATAREG[9]
ROW1   PARDATAREG[1]     COL1   PARDATAREG[10]
ROW2   PARDATAREG[2]     COL2   PARDATAREG[11]
ROW3   PARDATAREG[3]     COL3   PARDATAREG[12]
ROW4   PARDATAREG[4]     COL4   PARDATAREG[13]
ROW5   PARDATAREG[5]     COL5   PARDATAREG[14]

1. When KBD is used in 2x2 keyboard configuration, the IOs {ROW0, ROW1} and {COL0, COL1} are connected to the 2x2 keyboard matrix. The remaining IOs {ROW2, ROW3, ROW4, ROW5} and {COL2, COL3, COL4, COL5} can be used in GPIO mode.

31 A/D converter (ADC)

This chapter focuses on ADC functionality and operation.
For the ADC feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU
For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

31.1 Overview

SPEAr1340 integrates a 10-bit resolution analog-to-digital converter.
The resolution can be extended up to 17 bits by programming the controller.

31.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

31.3 Clocks

The ADC uses the following clocks:
● PCLK: the APB clock, 83 MHz
● CLK: test clock, 2.5 MHz to 20 MHz
● ADC_CLK: used to program the ADC clock frequency. The duty cycle is set by the ratio of the ADC_CLK_H and ADC_CLK_L values, while the frequency is the APB clock frequency divided by the sum of these values. The maximum frequency of CLK_ADC is 20 MHz and the minimum is 2.5 MHz; with the 83 MHz APB clock, this implies:
  5 ≤ (ADC_CLK_H + ADC_CLK_L) ≤ 33

31.4 Interrupts

The ADC generates a conversion ready interrupt to the ARM, which indicates that:
● In normal mode, the single conversion on the selected channel is finished
● In enhanced mode, the conversions on all the enabled channels are finished

31.5 Functional description

The ADC requires 50 µs to enter functional mode from the power-down state. To enable the ADC, the POWERUP bit (bit 4) of the ADC_STATUS register should be set to 1.

The conversion starts when the ENABLE bit (bit 0) of the ADC_STATUS register is set to 1. When the conversion is completed, an interrupt signal is generated (bit 8 of the ADC_STATUS register is set to 1). At this point, the reading of the data begins. When the reading is finished, the interrupt is cleared.
● When POWERUP = 0, the ADC is inactive and the output latches contain the last conversion.
● After setting POWERUP = 1, the ADC enters functional mode after 50 µs. To start the conversion you need to set ENABLE = 1.

A dedicated circuit controls the internal start signal START (see Figure 174 below). When START = 1, the end of conversion signal (EOC) is reset to 0, the conversion data field of the AVERAGE register is reset to the value 0x0 and the acquisition occurs. Then, the finite-state machine (FSM) switches START = 0 and the conversion phase takes place.
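The power-up and start sequence described above can be sketched as follows. This is a minimal illustration only: the bit positions come from the text, while the function name, the caller-supplied delay routine and the polling limit are hypothetical, and the register is passed in as a pointer so that no base address is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit positions of ADC_STATUS as given in the text. */
#define ADC_STATUS_ENABLE   (1u << 0)   /* starts a conversion */
#define ADC_STATUS_POWERUP  (1u << 4)   /* powers up the ADC */
#define ADC_STATUS_CONV_RDY (1u << 8)   /* set when the conversion is completed */

/* Hypothetical delay hook supplied by the caller; may be NULL. */
typedef void (*delay_us_fn)(unsigned us);

/*
 * Power up the ADC if necessary, start one conversion and poll bit 8.
 * Returns 0 when the conversion-ready bit is seen, or -1 if it is not
 * seen within max_polls reads of the register.
 */
int adc_single_conversion(volatile uint32_t *adc_status,
                          delay_us_fn delay_us, unsigned max_polls)
{
    if (!(*adc_status & ADC_STATUS_POWERUP)) {
        *adc_status |= ADC_STATUS_POWERUP;
        if (delay_us != NULL)
            delay_us(50);               /* 50 us start-up time */
    }
    *adc_status |= ADC_STATUS_ENABLE;   /* ENABLE = 1 starts the conversion */

    while (max_polls--) {
        if (*adc_status & ADC_STATUS_CONV_RDY)
            return 0;
    }
    return -1;
}
```

An interrupt-driven driver would replace the polling loop with the conversion ready interrupt of Section 31.4; polling is used here only to keep the sketch short.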
The number of clock cycles required to complete a conversion is 13.

At the end of the conversion and after the reading, if the POWERUP bit is 1, the EN signal to the ADC is kept at 1. In this way, the next conversion needs only a write to ADC_STATUS and the FSM does not wait for the start-up time. If the POWERUP bit is 0, the FSM switches off the ADC and the next conversion again requires the start-up time.

Figure 174. Timing diagram of ADC conversions
(Timing diagram: signals CLK, EN, START, EOC and DATA; each conversion takes 13 cycles, and DATA changes from the data of the last conversion to DATA VALID after each conversion.)

31.5.1 Enhanced mode

If the ENM bit (bit 10) of the ADC_STATUS register is set to 1, conversions can be performed continuously on the selected channels. The start of conversions may be external (EXTSCANRATE bit = 1) or internal. In the first case an external signal is needed to start the conversions, while in the second case the SCAN_RATE register must be configured to set the number of APB clock cycles between the start of two consecutive scan conversions.

To read the conversion results, read the CHx_DATA registers. Bit 17 is the VALID bit. This bit is 1 when the read data is valid, and 0 in the following cases:
● ENM = 0
● Bit 0 of CHx_CTRL = 0
● The controller is writing the result to the register.

Starting from channel 0, it is possible to select a request to DMA (DmaEn = 1) in order to transfer the converted data on channels 0 to DmaLastCh, which is a programmable value (ADC_STATUS register). When the conversion on DmaLastCh is completed, the controller issues a burst request to the DMA to start transferring the converted data. In the meantime, the ADC continues converting the remaining channels. At the next conversion on channel #0, the ADC checks whether the last DMA transfer is completed. If it is completed, the controller starts a new conversion on all the channels in the programmed range. If the DMA transfer is not completed yet, the ADC continues the conversion starting from DmaLastCh+1.
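The channel selection rule described above can be modelled in a few lines. The sketch below is a software model for illustration, not controller code; the function name and its parameters are hypothetical, and it only captures which enabled channels are converted in a scan depending on whether the previous DMA transfer is still in progress.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of one enhanced-mode scan.
 *   enabled[]    enabled channel numbers, in ascending order
 *   n_enabled    number of entries in enabled[]
 *   dma_last_ch  the programmed DmaLastCh value
 *   dma_busy     non-zero if the previous DMA transfer is not completed
 *   out[]        receives the channels converted in this scan
 * Returns the number of channels written to out[].
 */
size_t adc_next_scan(const uint8_t *enabled, size_t n_enabled,
                     uint8_t dma_last_ch, int dma_busy, uint8_t *out)
{
    size_t n = 0;

    for (size_t i = 0; i < n_enabled; i++) {
        /* While the DMA is still busy, channels 0..DmaLastCh are skipped. */
        if (dma_busy && enabled[i] <= dma_last_ch)
            continue;
        out[n++] = enabled[i];
    }
    return n;
}
```

When the DMA is idle the function returns every enabled channel; when it is busy only the channels above DmaLastCh are returned.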
Example:
If the range of the enabled channels is [0, 1, 2, 3, 6, 7] and DmaLastCh = 3, at the end of the conversion on channel #3 the ADC sends a burst request to the DMA. While the DMA is transferring the converted data from channels #0 to #3, the ADC continues the conversion on the remaining channels: #6 and #7. At the end of the conversion on channel #7, the ADC should restart from channel #0. However, if the DMA is still transferring data, it starts from channel #6 (the first enabled channel from DmaLastCh + 1 onwards).

Note: If all the enabled channels are selected for DMA transfers, the conversions will stall.

31.5.2 Touchscreen mode

Each CHx_CTRL register has a TOUCHSCREEN bit (bit 4) which indicates that the related channel is used for the touchscreen feature. By enabling this bit, you can dedicate 2 or 4 channels to a single or a double touchscreen. When selecting 2 channels, the 1st channel converts the X value and the 2nd one the Y value; when selecting 4 channels, the 1st and 2nd channels convert the 2 X values and the 3rd and 4th channels convert the 2 Y values. In both cases, a signal (XY_SEL) is generated to allow switching of the X/Y axis: it is high when converting the X values, and low when converting the Y values.

To enable the touchscreen, it is also mandatory to enable the channel by setting both bits 0 and 4 of the CHx_CTRL register. Exactly 2 or 4 channels must be selected: any other number causes incorrect behavior of the ADC controller.

31.5.3 High-resolution mode

The resolution of the ADC analog cell is 10 bits, but the resolution can be extended in high resolution mode. High resolution mode is enabled by setting the HIGHRESOLUTION bit in the ADC_STATUS register.

In high resolution mode, the ADC performs oversampling. The number of samples is programmable via the NSAMPLES bits (bits 7:5) in the ADC_STATUS register. The sum of the converted results can be read from the AVERAGE register.
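As a hedged illustration of how software might post-process the accumulated sum, the sketch below applies the usual decimation rule: 4^n samples are summed and the sum is shifted right by n bits, giving a result with 10 + n bits. The function names are hypothetical, and the way the NSAMPLES field encodes the sample count is deliberately not assumed here.

```c
#include <stdint.h>

/* Number of samples to accumulate for extra_bits additional bits: 4^extra_bits. */
uint32_t adc_oversample_count(unsigned extra_bits)
{
    return (uint32_t)1u << (2u * extra_bits);
}

/*
 * Decimate the sum of 4^extra_bits 10-bit samples (as read from the
 * AVERAGE register) into a result of (10 + extra_bits) bits.
 */
uint32_t adc_decimate(uint32_t sum, unsigned extra_bits)
{
    return sum >> extra_bits;
}
```

For instance, four samples that all read 1023 sum to 4092, and shifting by one bit gives the 11-bit value 2046.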
By reading the sum of the conversion results, software can use decimation or interpolation averaging methods to obtain higher resolution (more than 10 bits). By oversampling 4 times, the resolution can be increased by 1 bit; by oversampling 16 times, it can be increased by 2 bits, and so on. Instead of dividing the sum of the converted results by NSAMPLES as in normal averaging, the sum of the samples read from the AVERAGE register should be right shifted by n bits, where n is the desired number of additional bits of resolution.

In enhanced mode (ENM = 1), if high resolution mode is enabled (HIGHRESOLUTION = 1), the number of samples can be defined individually for each channel in the CHx_CTRL registers and the sum of the converted results can be read from the CHx_DATA registers.

31.5.4 DMA handshaking interface

The ADC has a DMA handshaking interface called ADC_TX, as shown in Table 49: DMAC MUX - selecting the peripheral. It consists of only 2 signals: DMA_SREQ (single request) and DMA_CLR (clear). The ADC uses this interface to transfer the converted digital data to the DMAC, so it is always used in peripheral-to-memory mode and the flow controller is always the DMAC.

32 PWM generators (PWM)

This chapter focuses on PWM functionality and operation.
For the PWM feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU
For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

32.1 Overview

The PWM module has three configuration registers for each of the four independent channels (PWM1, PWM2, PWM3, PWM4), and one additional register that holds the master enable bit for synchronous channel operation.

Figure 175. PWM block diagram
(Diagram: four channels, each made of a prescaler and a pulse generator driving outputs PWM1 to PWM4; PCLK, PRESETn and the APB interface connect to the configuration registers.)

32.1.1 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

32.2 Clocks

The PWM module uses the system APB clock, PCLK.
See also: Chapter 5: Reset and clock generator (RCG).

32.3 Functional description

32.3.1 Prescaler

The prescaler contains a 14-bit counter which generates an enable signal for the pulse generators. When the corresponding PWM channel is enabled, the prescale counter is compared against the programmed prescale value (bits 15:2 of Control_Regx). If the counter matches the programmed value, the PWM generator’s enable signal becomes HIGH; otherwise it is kept LOW. If the programmed value is ‘0’, the PWM generator’s enable signal always remains HIGH; this allows the PWMs to operate at the maximum frequency of PCLK (83 MHz).

32.3.2 Pulse generator

The pulse generator consists of a 16-bit counter which is clocked by PCLK but incremented only when enabled by the prescaler. The counter value is compared to the programmed duty and period values to generate PWM waves. The counter is incremented periodically, and is reset to ‘0’ every time it reaches the programmed period value. The PWM output is driven to ‘1’ for counter values ranging from 0 to (duty - 1), and to ‘0’ for values ranging from duty to period.

The duty and period values are configured through the registers Duty_Regx and Period_Regx respectively.

32.4 Programming

32.4.1 Configuring a channel

Each channel of the PWM module is configured through three configuration registers:
● Control_Regx
● Duty_Regx
● Period_Regx

To configure a channel:
1. Set a prescale value (Control_Regx)
   If required, prescale the input clock to the PWM counter (see Note 1 at the end of this section).
   The clock can be scaled down from PCLK (83 MHz).
2. Set the desired period (Period_Regx)
   This value corresponds to the number of prescaled clock cycles.
3. Set the desired duty cycle (Duty_Regx)
   This value also corresponds to the number of prescaled clock cycles. Make the duty cycle less than or equal to the period (set in step 2), or the output will remain high. For more information, see the notes at the end of this section.
4. Enable the output channel(s) (Control_Regx bit 0 and Master_Ctrl bit 0)
   If the channels must work synchronously:
   a) Enable bit 0 of all of the required Control_Regx.
   b) Enable Master_Ctrl bit 0.
   If no synchronous operation is required:
   a) Enable Master_Ctrl bit 0.
   b) Configure and enable the desired channels.

See also: Figure 176: Output pulse generation example (Duty = 3, Period = 7)

Note:
1 Without any prescaling, the minimum output pulse frequency is 83000 / (2^16 + 1) ~= 1.266 kHz. For an output pulse below this frequency, the input clock to the PWM counter must be prescaled; any combination of prescaling and period factors that results in the desired output frequency can be used.
  Example of generating a 4 kHz pulse with a 50% duty cycle:
  Prescale = 0x0; Period = 0x510D; Duty = 0x2887
  or
  Prescale = 0x19; Period = 0x33D; Duty = 0x19F
2 If the duty value is greater than the period value, the corresponding channel output remains high.
3 The required minimum value of the duty register is 1 (0 = LOW output). Because of this, the maximum output frequency is PCLK/2 with a duty cycle of 50%, achieved by setting Duty = 1, Period = 1, and Prescaler = 0.
  The maximum PCLK frequency = 83 MHz.
  Duty cycle = Duty/(Period + 1) * 100 %, where the duty setting is less than or equal to the period setting.
  Duty cycle resolution = 1/(Period + 1) (the duty cycle is defined in terms of prescaler output clock pulses).
  Minimum duty cycle = 100 / (Period + 1) % = 100 / (2^16 + 1) % ~= 0.0015 % (for the maximum period setting of 0xFFFF).
  Maximum duty cycle = 100 % (HIGH output, no pulse, duty value greater than period value).

Figure 176. Output pulse generation example (Duty = 3, Period = 7)
(Waveform: prescaled counter clock and PWMx output; PWM_duty = 3 (Duty Reg), PWM_period = 8 (Period Reg + 1).)

33 HDMI CEC interfaces (CEC)

This chapter focuses on CEC functionality and operation.
For the CEC feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU
For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

33.1 Overview

CEC is an asynchronous transfer mode adaptation layer (AAL) protocol that provides high-level control functions among the various audiovisual products in a user’s environment. CEC operates at low speeds, with minimal processing and memory overhead.

Figure 177. CEC block diagram

33.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.

33.3 Clocks

The CEC is operated with a single AMBA clock, HCLK.
See also: Chapter 5: Reset and clock generator (RCG)

33.4 Interrupts

CEC generates the interrupt cec_it, the status of which can be found by reading register CEC_CTL. This interrupt is cleared when the corresponding bit is cleared in register CEC_CTL.

33.5 Functional description

The CEC interface handles complete messages, but requires that the host CPU provides or unloads the data bytes one by one.

33.5.1 Control logic

The CEC interface assumes one of the following states:
● STANDBY
● IDLE
● RX
● RX_ERROR
● TX
● TX_ERROR

Figure 178. CEC control logic

STANDBY

STANDBY is entered on a chip reset, or on a reset of CEC_CFG.P_EN, and exited by setting the P_EN bit. In STANDBY:
● Any on-going transmission or reception operation is not interrupted and completes normally. The interface is actually in STANDBY mode when the P_EN bit is read back as 0.
● Activity on the CEC line is ignored, and the clock prescaler is stopped for minimum power consumption.

IDLE

IDLE is entered whenever a message has been transmitted or received successfully, or whenever an error has been processed. In IDLE, the CEC interface looks for either a transmit request (the TSOM bit is set) or a start bit.

RX

RX is entered when a start bit is detected while no message is pending for transmission.

Once the header is received, the destination address is compared with the value programmed in the own address register CEC_OAR. If a match is not found and the address is not the broadcast address 0xF, the block is not acknowledged and the controller reverts to the IDLE state. Otherwise, the controller remains in the RX state, where the host CPU is requested to retrieve all message bytes from the RX buffer one by one.

The RBTF bit set in the control register CEC_CTL signals an available byte. The host CPU can become aware of this either by polling that register or by enabling interrupts in the configuration register CEC_CFG.

If the RBTF bit is not cleared by the time a new block is received, the new block is not acknowledged, forcing the initiator to restart the message transmission and giving the host CPU another chance to retrieve all message bytes on time.

Note: It is the responsibility of the software driver to ignore messages where the number of operands is less than the number specified for that opcode.

Figure 179. Example: a complete message reception

Note: Because a message may have been queued for transmission but arbitration lost, two different values can be read from the control register CEC_CTL.

RX_ERROR

The interface enters the RX_ERROR state when a condition listed in Table 145 occurs. The RX_ERROR state is not left until the receive error flag RX_ERR is cleared; when the error state is then left depends on the selected error resync mode:
● Default mode waits for an inter-frame spacing of at least five bit times
● Advanced mode leaves immediately

Table 145. RX_ERROR conditions, types, and actions

Error condition: A broadcast message is negatively acknowledged
Error type: Acknowledge
Action: No specific action is taken

Error condition: The RBTF bit is not cleared while a new byte is ready to be written to the RX buffer
Error type: RBTF
Action: Directly-addressed messages are not acknowledged, and broadcast messages are negatively acknowledged

Error condition: A start bit is detected before the end-of-message flag
Error type: Start bit
Action: No specific action is taken

Error condition: A rising edge on the CEC line is detected outside the applicable window
Error type: Bit timing
Action: The CEC line is pulled low for 70 time quanta

Error condition: A falling edge on the CEC line is detected outside the applicable window
Error type: Bit period
Action: The CEC line is pulled low for 70 time quanta

Figure 180. Example: RX_ERROR

Note: Because a message may have been queued for transmission but arbitration lost, two different values can be read from the control register CEC_CTL.

TX

The interface enters the TX state when the TSOM bit in the control register CEC_CTL is set. Once in this state, the interface ensures that the required signal-free time has elapsed before generating a start bit: it waits for the quanta counter of the bit timing logic to exceed the value listed in Table 146, unless another device emits a start bit (in which case the arbitration phase begins and lasts until the initiator address is fully transmitted).
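The wait values of Table 146 can be expressed as a small lookup, shown here as a sketch. The enum and function names are hypothetical; the conversion to microseconds assumes the 0.05 ms time quantum used by the bit timing logic.

```c
/* Hypothetical names for the three cases distinguished by Table 146. */
enum cec_prev_state {
    CEC_PREV_TX_ERROR,    /* previous state was TX_ERROR */
    CEC_PREV_RECEIVING,   /* the device was receiving */
    CEC_PREV_OTHER        /* any other previous state */
};

#define CEC_QUANTUM_US 50u   /* one time quantum = 0.05 ms */

/* Quanta counter value that must be exceeded before sending a start bit. */
unsigned cec_wait_quanta(enum cec_prev_state prev)
{
    switch (prev) {
    case CEC_PREV_TX_ERROR:
        return 192u;
    case CEC_PREV_RECEIVING:
        return 288u;
    default:
        return 384u;
    }
}

/* The same wait expressed in microseconds. */
unsigned cec_wait_us(enum cec_prev_state prev)
{
    return cec_wait_quanta(prev) * CEC_QUANTUM_US;
}
```

With a 50 µs quantum the three waits correspond to 9.6 ms, 14.4 ms and 19.2 ms respectively.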
Table 146. Wait loop

Previous state             Wait value
TX_ERROR                   192
The device was receiving   288
Any other                  384

Note: It is the responsibility of the software driver to send an initiator address consistent with the logical address programmed in the own address register CEC_OAR.

The arbitration is lost if the received initiator address, contained in the least significant nibble of the shift register, differs from the initiator address still present in the TX buffer. In this case, the controller switches to the RX state immediately, but continues to try to transmit after the receive phase until it is granted ownership of the bus.

If arbitration is not lost, a new byte is requested to be written to the TX buffer each time the TBTF bit is set in the control register. The host CPU can become aware of this by polling the control register or by enabling interrupts in the configuration register CEC_CFG. If the host CPU does not write the byte in time, the transmit error flag TX_ERR is set.

The message is transmitted successfully when the TEOM bit is set, but it should be considered lost as soon as the TX_ERR bit is set.

Figure 181. Example: a complete message transmission

TX_ERROR

The interface enters the TX_ERROR state when a condition listed in Table 147 occurs. The TX_ERROR state is not left until the transmit error flag TX_ERR is cleared; when the error state is then left depends on the selected error resync mode:
● The default mode waits for an inter-frame spacing of at least three bit times
● The advanced mode leaves immediately

Table 147. TX_ERROR conditions, types, and actions

Error condition: A directly-addressed message block is not acknowledged, or a broadcast message block is negatively acknowledged
Error type: Acknowledge

Error condition: The TBTF bit is not cleared when the requested byte must be transmitted
Error type: RBTF

Error condition: The bit timing logic senses an unexpected bit
Error type: Line

Action (all error conditions): Because no error signaling mechanism is specified for the initiator, no specific action is undertaken apart from aborting the current message and clearing the transmit request flag TSOM. The error handler decides whether retransmission is possible, depending on whether transmission has already failed six times or not, and sets the transmit request flag if required.

Figure 182. Example: a TX_ERROR

33.5.2 Bit timing logic

The bit timing logic (BTL) is in charge of extracting valid bits from the CEC line and of signaling line errors. It operates at a 0.05 ms time quantum, because the bit timings in the specification are expressed with this level of precision.

The Rx data is resynchronized on the system clock, and a 2/3 majority voter removes high frequency spikes before processing at the time-quantum rate. Also, to improve immunity to transition bounces and positive spikes, transitions are ignored for one time quantum period following a valid edge.

On a valid Rx falling edge, the quanta counter is captured and reset. If the captured value is outside valid bounds (see Figure 183), a bit period error has been detected and is signaled by pulling the line low for 70 time quanta.

On a valid Rx rising edge, the quanta counter is captured and compared against valid windows. If the edge is found to be outside these windows, a line error is signaled unless the device has been programmed not to report such violations.

Note: If a line error occurs while a start bit is expected, the whole message is ignored and no error is reported.

In the absence of a rising edge, the quanta counter is left counting up to 511.
Retransmission is allowed when the counter value is above 192. A new initiator may transmit when the counter is above 288, but the same initiator must wait until the counter reaches 384.

Figure 183. Quanta counter timing

33.5.3 Bit shaping logic (BSL)

The bit shaping logic generates the proper line waveform to signal a start bit, a logical 1 data bit, or an error bit. The same time quantum is used as for the bit timing logic.

Figure 184. Bit shaping logic timing

33.5.4 Prescaler

The prescaler defines the time quantum for the bit timing logic and the bit shaping logic, and provides a time quantum reference for complying with the required signal-free time. A 12-bit counter provides the necessary 50 µs timebase, allowing for system clocks up to 82 MHz. The counter resets at the beginning of every bit to enable the bit timing logic to operate with maximum precision.

33.5.5 Normal functional behavior

Message description

All transactions on the CEC line consist of an initiator and one or more followers. The initiator sends the message structure and the data. The follower receives any data and sets any acknowledgement bits.

A message is conveyed in a single frame, which consists of a start bit followed by a header block and, optionally, an opcode and a variable number of operand blocks.

All these blocks are made of an 8-bit payload (with the most significant bit transmitted first) followed by an end-of-message (EOM) bit and an acknowledge (ACK) bit. The EOM bit is set in the last block of a message and kept reset in all others. If a message contains additional blocks after an EOM is indicated, those additional blocks should be ignored. The EOM bit may be set in the header block to ‘ping’ other devices, to ascertain whether they are active.
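As an illustration of the block structure described above, the following minimal Python sketch builds the bit sequence of each block of a frame (8-bit payload sent MSB first, then the EOM bit, then the ACK bit). The function names are hypothetical. The header nibble order (initiator address in the upper nibble, destination address in the lower nibble) is taken from the CEC specification rather than from this section, and the start bit is deliberately not modeled.

```python
def encode_block(payload: int, eom: bool) -> list:
    """Return the 10 bits of one CEC block as sent by the initiator.

    Hypothetical helper for illustration only.
    """
    if not 0 <= payload <= 0xFF:
        raise ValueError("payload must fit in 8 bits")
    # 8-bit payload, most significant bit first
    bits = [(payload >> i) & 1 for i in range(7, -1, -1)]
    # EOM is set only in the last block of the message
    bits.append(1 if eom else 0)
    # ACK: the initiator leaves the line at high impedance (reads as 1)
    # so that a follower can drive it low
    bits.append(1)
    return bits


def encode_frame(initiator: int, destination: int, data: bytes = b"") -> list:
    """Return the list of blocks (header first) of one frame.

    Assumption: header = initiator nibble (upper), destination nibble (lower).
    The start bit is not part of the returned structure.
    """
    if not (0 <= initiator <= 0xF and 0 <= destination <= 0xF):
        raise ValueError("logical addresses are 4 bits wide")
    header = (initiator << 4) | destination
    payloads = [header, *data]
    last = len(payloads) - 1
    return [encode_block(p, eom=(i == last)) for i, p in enumerate(payloads)]


# A 'ping' is a header block with EOM set and no further blocks
ping = encode_frame(initiator=4, destination=0)
```

With no data bytes, the header block is the only block and carries EOM = 1, which is the ‘ping’ case mentioned above.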
The acknowledge bit is always set to high impedance by the initiator so that it can be driven low either by the follower that has read its own address in the header, or by a follower that needs to reject a broadcast message.

The header comprises the source logical address field and the destination logical address field. The special address 0xF is used for broadcast messages.

Figure 185. Message description

Bit timing

The format of the start bit is unique and identifies the start of a message. It should be validated by its low duration and by its total duration.

All the remaining data bits in the message, after the start bit, have consistent timing. The high-to-low transition at the end of a data bit is the start of the next data bit, except for the final bit, where the CEC line remains high.

Figure 186. Bit timing

Line use

Devices that wish to transmit or retransmit a message onto the CEC line must ensure that the CEC line has been inactive for a number of bit periods. This signal-free time is defined as the time since the final bit of the previous frame, and depends on the initiating device and on the current status, as shown in Figure 187.

Figure 187. Signal-free time

Because only one initiator is allowed at any one time, an arbitration mechanism is provided to avoid conflict when more than one initiator begins transmitting at the same time.

CEC line arbitration starts with the leading edge of the start bit and continues until the end of the initiator address bits within the header block. During this period, the initiator monitors the CEC line. If it reads the line back as 0 while driving it to high impedance, the initiator assumes it has lost arbitration and stops transmitting. It then becomes a follower.

Figure 188. Arbitration phase

33.5.6 Error conditions and error handling

Bit error

A data bit (excluding the start bit) is considered invalid if the period between falling edges is less than the minimum bit period. A follower is expected to note such errors by generating a low bit period on the CEC line of 1.4 to 1.6 times the normal data bit period (nominally 3.6 ms).

Figure 189. Bit error

Message error

A message is considered to be lost, and can therefore be retransmitted, if:
● A message is not acknowledged in a directly-addressed message
● A message is negatively acknowledged in a broadcast message
● A low impedance is detected on the CEC line when this condition is not expected (line error)

Attempt retransmission at least once, and up to five times.

34 Display controller (CLCD)

This chapter focuses on CLCD functionality and operation.

For the CLCD feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU

For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

34.1 Overview

The TFT LCD controller provides all of the necessary control signals to interface directly to a variety of TFT LCD panels. The following figure shows the display controller block diagram.

Figure 190. LCD controller block diagram

[Block diagram. Blocks shown: AHB slave interface; processor status & control registers; LCD timing and pixel clock generation (LCD timing & control); pulse width modulation generator (LCD PWM); DMA controller; AXI master interface; input FIFO (2048 x 64); pixel unpack; palette (256 x 16); output FIFO (16 words x 18/24-bit); output formatter (LCD data); interrupt status & mask registers (interrupt).]

34.2 Pins

For a complete pin description, refer to Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU.
34.3 Clocks

The LCD controller core has the following clock domains:
● Bus clock (HCLK) domain
– AXI master and AHB slave interfaces
– Control and status registers
– DMA controller
– Write side of the palette two-port RAM
– Write side of the input FIFO
– Interrupt controller
● Pixel clock (PCLK) domain
– Read side of the input FIFO
– Read side of the palette two-port RAM
– Pixel unpack
– Timing & control unit
– Output formatter

Within the pixel clock domain, there are two versions of PCLK: the internal pixel clock, PCLK, which serves as the on-chip clock for the LCD pixel pipeline logic; and the external pixel clock, lcd_pclk, which serves as the off-chip clock to the LCD panel pixel clock input.

The clock generator derives PCLK and lcd_pclk from input HCLK or pclk_in. HCLK is the slave bus clock input, and pclk_in is a separate LCD input clock. The clock generator outputs are determined by the pixel clock timing register (PCTR) programming parameters. The generator can derive PCLK by dividing HCLK by a ratio from 1 (bypass) to 128.

Because it might not be convenient to derive PCLK from HCLK, a separate clock input, pclk_in, is provided. The pclk_in clock can then be set to the exact value required by the LCD panel using MISC registers.

Finally, note that lcd_pclk differs from PCLK in that lcd_pclk can be held inactive while the LCE bit in control register 1 (CR1) is inactive. This facilitates the power sequencing requirements of the panel.

See also: Chapter 5: Reset and clock generator (RCG).

34.4 Interrupts

There are three coordinated interrupt registers: the interrupt status register (ISR), the interrupt mask register (IMR), and the interrupt vector register (IVR). The ISR and IMR are both read/write registers while the IVR is read-only. Any of the internally generated interrupts sets a corresponding bit in the ISR.
If the interrupt’s corresponding mask bit is set in the IMR, then the corresponding bit in the IVR is set, generating an interrupt to the processor. The processor interrupt handler can respond by reading the IVR to determine the particular interrupt to process. At the end of an interrupt response, software can reset the interrupt by writing logic 1 to the corresponding interrupt bit in the ISR. Through the IMR, software has full control over which interrupts to enable.

34.5 Functional description

34.5.1 LCD controller core

The LCD controller is first initialized by the processor by means of the AHB slave bus interface. This is a read/write interface on which the LCD controller can only respond to bus transactions, not initiate them. The minimal setup of the control and status registers consists of programming the timing registers for the horizontal and vertical timing signals (registers HTR, VTR1, VTR2 and HVTER) and the DMA base address register (register DBAR). After that, the control bit LCE in the control register must be set, and the LCD controller runs, accessing frame buffer memory and processing and piping the data through to the display.

If you use the palette, you must first load it. The palette load source is selected by the PSS bit of CR1. There are two options for loading the palette:
● Statically: by the processor via the AHB slave interface, or
● Dynamically: with each frame, via the AXI master interface and the frame buffer.

Each frame begins with an internal start sync pulse from the timing and control unit (see Section 34.5.3: Timing and control unit), coincident with the vertical synchronization signal. This start sync pulse causes the DMA controller to start accessing data from frame buffer memory via the master interface; it also causes the pixel unpack to start accepting data from its side of the input FIFO.
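The minimal bring-up order described in this section (timing registers, then DBAR, then LCE) can be sketched as follows. This is only an illustration under stated assumptions: the FakeClcd class is a stand-in that records register writes by name, the bit position used for LCE is a placeholder, and the register values are arbitrary. The real offsets and bit fields are documented in RM0089 and are not reproduced here.

```python
class FakeClcd:
    """Hypothetical stand-in for the CLCD register file.

    Registers are addressed by name and every write is logged, so that
    the programming order can be inspected. Not a real driver.
    """

    def __init__(self):
        self.regs = {}
        self.log = []

    def write(self, name, value):
        self.regs[name] = value
        self.log.append(name)

    def read(self, name):
        return self.regs.get(name, 0)


# Placeholder only: the actual position of LCE in CR1 is given in RM0089
CR1_LCE = 1 << 0


def bring_up(dev, timing, frame_base):
    """Program the minimal register set, then enable the controller."""
    # 1. Horizontal and vertical timing registers
    for name in ("HTR", "VTR1", "VTR2", "HVTER"):
        dev.write(name, timing[name])
    # 2. DMA base address of the frame buffer
    dev.write("DBAR", frame_base)
    # 3. Set LCE last, so the controller starts with valid settings
    dev.write("CR1", dev.read("CR1") | CR1_LCE)


dev = FakeClcd()
# Arbitrary example values, not taken from any panel datasheet
bring_up(dev, {"HTR": 1, "VTR1": 2, "VTR2": 3, "HVTER": 0}, frame_base=0x1000)
```

Setting LCE as the final write reflects the order given in the text. If the palette is used, it would also have to be loaded before this point, which the sketch omits.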
The master interface initiates read transactions on the bus. Burst read lengths of 4, 8, or 16 words can be programmed to improve bus utilization. Received frame data is written into the input FIFO. The FIFO bridges the two clock domains.

The pixel unpack extracts 1, 2, 4, 8, 16, 18, or 24 bit-per-pixel (bpp) pixels from the frame buffer word. Depending on the bpp programming and whether a palette is used, the pixel data is sent to either the palette or the output formatter. The output formatter contains an output FIFO which queues ready pixels for synchronization with the LCD panel timing signals.

The LCD controller supports 2-port interfaces through the addition of a second pixel processing pipeline. The additional pipeline consists of a palette, an output FIFO, and an output formatter. Control bit LPS in control register 1 (register CR1) directs the pixel flow within the pixel unpack module either to the port/link 1 pipeline only, or to both pipelines, by unpacking adjacent pixels and presenting them in parallel to port/link 1 and port/link 2.

34.5.2 Master and slave bus interfaces

The LCD controller supports the following bus interfaces:
● AMBA 2.0 AHB slave interface, which connects the processor to the LCD controller’s control & status registers, including the palette RAM. It is characterized by:
– 32-bit data interface
– SINGLE word burst
– OKAY response only
● AMBA 3.0 AXI master, which connects the frame buffer memory to the LCD controller’s DMA controller and input FIFO. It is characterized by:
– 64-bit data interface
– 4, 8, 16 word bursts
– Incrementing-address burst (INCR) only
– Aligned transfers only
– Outstanding read is supported (the maximum number of outstanding memory read requests is 4, depending on the MRR register)
– Three error monitors, generating a maskable interrupt: read burst length error, return ID error, response signal error
– Overlap read burst is not supported
– Out-of-order transaction completion is not supported
– No FIXED or WRAP burst types allowed

34.5.3 Timing and control unit

The timing and control unit uses the horizontal timing register (HTR), the vertical timing registers 1 & 2 (VTR1, VTR2) and the horizontal/vertical timing extension register (HVTER) to generate the timing signals lcd_vsync, lcd_hsync, and lcd_de to the LCD panel.

The timing unit remains inactive until control bit LCE in control register 1 (CR1) goes active. From that point, the timing & control unit runs until LCE is de-asserted. The timing unit then keeps running until the end of the current frame, and shuts down in an orderly manner. The timing unit can be reactivated by re-asserting LCE, but often power to the display must be cycled. This can be accomplished by control bit LPE in control register 1 (CR1), connected as an enable to an external power source for the LCD panel.

Control bit LCE also plays a role in power sequencing. On startup, while LCE is inactive, the timing signals lcd_pclk, lcd_hsync, lcd_vsync, lcd_de and the data signals lcd_r[7:0], lcd_g[7:0], lcd_b[7:0] are held at logic zero. On LCE shutdown, after the current frame being displayed completes and the timing unit halts, these same signals are forced to logic zero. At that point power can safely be removed from the LCD panel.

The timing unit provides the interrupt VCT, which triggers on one of four timing trigger points during the vertical scan period. The trigger point is programmable via the interrupt scan compare register (ISCR).
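The LCE start and stop behavior described above can be illustrated with a small behavioral model. This is a sketch, not a description of the hardware implementation: the class and method names are hypothetical, and the behavior when LCE is re-asserted before the current frame ends (the unit simply keeps running) is an assumption that the text does not spell out.

```python
class TimingUnitModel:
    """Hypothetical behavioral model of the LCE-controlled timing unit."""

    def __init__(self):
        self.lce = False       # state of control bit LCE
        self.running = False   # True while timing signals are generated

    def set_lce(self, on):
        """Model a write to the LCE control bit."""
        self.lce = on
        if on:
            # The unit starts (or keeps) running as soon as LCE is active
            self.running = True
        # When LCE is cleared, the unit keeps running until end of frame

    def end_of_frame(self):
        """Model the end of the frame currently being displayed."""
        if self.running and not self.lce:
            # Orderly shutdown happens only at a frame boundary
            self.running = False

    def outputs_forced_low(self):
        """Timing and data signals are held at logic zero when halted."""
        return not self.running


unit = TimingUnitModel()
states = [unit.outputs_forced_low()]   # before LCE is set
unit.set_lce(True)
states.append(unit.outputs_forced_low())
unit.set_lce(False)                    # shutdown requested mid-frame
states.append(unit.outputs_forced_low())
unit.end_of_frame()                    # current frame completes
states.append(unit.outputs_forced_low())
```

The key point the model captures is that clearing LCE does not stop the outputs immediately: they are forced low only once the current frame has completed.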
34.5.4 DMA controller & memory interface

On the internal frame start pulse, the DMA controller initializes by copying the DMA base address register (DBAR) to the DMA current address register (DCAR) and starting the first memory transfer transaction. The number of words in a burst is programmed by FDW in control register 1 (CR1). Based on FDW and the number of empty words in the FIFO, a service request by the DMA controller to the master interface initiates a frame buffer read. The DMA controller keeps track of the total number of words per frame that are fetched from frame buffer memory.

If the PSS bit in CR1 is set, indicating palette load from the frame buffer, the DMA load is divided into two segments. First, the palette is loaded directly from frame buffer memory (bypassing the input FIFO), based on the number of words indicated by the bits-per-pixel control bits BPP in CR1. Second, after the palette is loaded, the DMA controller loads the appropriate number of frame buffer words for each frame through the input FIFO.

If, on the other hand, the PSS bit in CR1 is not set, indicating palette load from the processor via the slave interface, or if no palette is required (for 16, 18, 24 bpp), the DMA controller only loads the appropriate number of frame buffer words for each frame through the input FIFO.

The software must program the DMA end address register (DEAR) with the frame buffer end address. The DMA controller keeps reading frame buffer words until the current address in DCAR equals DEAR. At that point, the DMA controller halts until the next frame, when it reads from the frame buffer starting with the address in DBAR.

In case of outstanding memory read requests (that is, MRR register equal to 01 or 10), the register DEAR_MRR overrides the DEAR register.
When MRR = 01 or 10, DEAR_MRR serves as a lookahead address, preventing the DMA controller from generating a memory read request while overlapping read requests are outstanding.

34.5.5 Frame buffer organization

The frame buffer memory is not included in the LCD controller core. The frame buffer attached to the master interface provides encoded or unencoded pixels for display on the LCD panel.

If the PSS bit in CR1 is set, indicating palette load from the frame buffer, the lowest memory locations of the frame buffer must contain the contents to be loaded into the palette by the DMA controller. Table 148 lists the number of frame buffer memory words that need to be allocated for each palette, based on the bits-per-pixel control bits BPP in CR1.

Table 148. Frame buffer support for palette load (PSS = 1)

Frame buffer bits-per-pixel (bpp) | Palette size required | Number of required 32-bit frame buffer words
1 | 2 entries by 16-bit | 1
2 | 4 entries by 16-bit | 2
4 | 16 entries by 16-bit | 8
8 | 256 entries by 16-bit | 128

In both the 1-port and 2-port CLCD configurations, both palettes have to be loaded from the frame buffer (even though one palette is not used in the 1-port case). The number of frame buffer memory words that need to be allocated is therefore doubled.

The frame buffer format for no palette load from frame buffer memory, or for bits-per-pixel of 16, 18, 24, which require no palette, is shown in Table 149.

Table 149. Frame buffer organization, PSS = 0 or BPP = 16, 18, 24 bpp

Frame buffer base address offset | Frame buffer contents
0x0 | Start of pixel data

The frame buffer format for palette load from frame buffer memory, with bits-per-pixel of 1, is shown in Table 150. Addresses from 0x08 to 0x1C do not contain any specific data.

Table 150. Frame buffer organization, PSS = 1, BPP = 1 bpp

FB base address offset | Frame buffer contents
0x0 | Palette1 Entry 1 | Palette1 Entry 0
0x4 | Palette2 Entry 1 | Palette2 Entry 0
... | ... | ...
0x20 | Start of encoded pixel data

The frame buffer format for palette load from frame buffer memory, with bits-per-pixel of 2, is shown in Table 151. Addresses from 0x10 to 0x1C do not contain any specific data.

Table 151. Frame buffer organization, PSS = 1, BPP = 2 bpp

FB base address offset | Frame buffer contents
0x0 | Palette1 Entry 1 | Palette1 Entry 0
0x4 | Palette2 Entry 1 | Palette2 Entry 0
0x08 | Palette1 Entry 3 | Palette1 Entry 2
0x0C | Palette2 Entry 3 | Palette2 Entry 2
... | ... | ...
0x20 | Start of encoded pixel data

The frame buffer format for palette load from frame buffer memory, with bits-per-pixel of 4, is shown in Table 152.

Table 152. Frame buffer organization, PSS = 1, BPP = 4 bpp

FB base address offset | Frame buffer contents
0x00 | Palette1 Entry 1 | Palette1 Entry 0
0x04 | Palette2 Entry 1 | Palette2 Entry 0
0x08 | Palette1 Entry 3 | Palette1 Entry 2
0x0C | Palette2 Entry 3 | Palette2 Entry 2
0x10 | Palette1 Entry 5 | Palette1 Entry 4
0x14 | Palette2 Entry 5 | Palette2 Entry 4
0x18 | Palette1 Entry 7 | Palette1 Entry 6
0x1C | Palette2 Entry 7 | Palette2 Entry 6
0x20 | Palette1 Entry 9 | Palette1 Entry 8
0x24 | Palette2 Entry 9 | Palette2 Entry 8
0x28 | Palette1 Entry 11 | Palette1 Entry 10
0x2C | Palette2 Entry 11 | Palette2 Entry 10
0x30 | Palette1 Entry 13 | Palette1 Entry 12
0x34 | Palette2 Entry 13 | Palette2 Entry 12
0x38 | Palette1 Entry 15 | Palette1 Entry 14
0x3C | Palette2 Entry 15 | Palette2 Entry 14
0x40 | Start of encoded pixel data

The frame buffer format for palette load from frame buffer memory, with bits-per-pixel of 8, is shown in Table 153.

Table 153. Frame buffer organization, PSS = 1, BPP = 8 bpp

FB base address offset | Frame buffer contents
0x000 | Palette1 Entry 1 | Palette1 Entry 0
0x004 | Palette2 Entry 1 | Palette2 Entry 0
0x008 | Palette1 Entry 3 | Palette1 Entry 2
0x00C | Palette2 Entry 3 | Palette2 Entry 2
0x010 | Palette1 Entry 5 | Palette1 Entry 4
0x014 | Palette2 Entry 5 | Palette2 Entry 4
0x018 | Palette1 Entry 7 | Palette1 Entry 6
0x01C | Palette2 Entry 7 | Palette2 Entry 6
0x020 | Palette1 Entry 9 | Palette1 Entry 8
0x024 | Palette2 Entry 9 | Palette2 Entry 8
0x028 | Palette1 Entry 11 | Palette1 Entry 10
0x02C | Palette2 Entry 11 | Palette2 Entry 10
0x030 | Palette1 Entry 13 | Palette1 Entry 12
0x034 | Palette2 Entry 13 | Palette2 Entry 12
0x038 | Palette1 Entry 15 | Palette1 Entry 14
0x03C | Palette2 Entry 15 | Palette2 Entry 14
... | ... | ...
0x3F8 | Palette1 Entry 255 | Palette1 Entry 254
0x3FC | Palette2 Entry 255 | Palette2 Entry 254
0x400 | Start of encoded pixel data

34.5.6 Input FIFOs

There are 3 input FIFOs: one for the base screen resolution (always active) and two for the overlay screens (if overlays are enabled). Each FIFO is 2k (2048) words deep by 64 bits wide. The DMA controller and master interface control the write side on the bus clock (ACLK) domain, while the pixel unpack controls the read side on the pixel clock (PCLK) domain. Gray-encoded address pointers are used for the FIFO empty and full flag calculations, as well as for detecting when there are 4, 8 or 16 empty word locations.

Based on the FDW programming bits in control register 1 (CR1) and the number of empty FIFO locations, a service request for a 4, 8, or 16-word burst from memory is issued by the DMA controller to the master interface. Note that 16-word bursts should only be used for optional FIFO implementations larger than N=16 words so as not to unnecessarily starve the pixel unpack of data.
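Gray-encoded pointers are a common choice for FIFOs that bridge two clock domains, because consecutive pointer values differ in exactly one bit, so a pointer sampled in the other domain is never off by more than one position. The following Python sketch illustrates the encoding and one way an empty-word count could be derived from it. It is a generic illustration, not the controller's actual logic: the pointer width (one wrap bit more than the 2048-word depth requires) and the function names are assumptions.

```python
DEPTH = 2048              # words per input FIFO, as stated in the text
PTR_RANGE = 2 * DEPTH     # assumption: one extra wrap bit per pointer


def to_gray(n):
    """Convert a binary count to its Gray-code representation."""
    return n ^ (n >> 1)


def from_gray(g):
    """Convert a Gray-code value back to a binary count."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def empty_words(wr_gray, rd_gray):
    """Number of free FIFO locations, from Gray-coded pointers."""
    level = (from_gray(wr_gray) - from_gray(rd_gray)) % PTR_RANGE
    return DEPTH - level


def can_request_burst(wr_gray, rd_gray, burst_words):
    """Assumed rule: request a burst only if it fits in the free space."""
    return empty_words(wr_gray, rd_gray) >= burst_words


# Example: write pointer at 10, read pointer at 4, so 6 words are queued
free = empty_words(to_gray(10), to_gray(4))
```

The extra wrap bit lets the full condition (pointers 2048 apart) be distinguished from the empty condition (pointers equal), which is a widely used convention and is assumed here rather than taken from this manual.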
The interrupts IFO (input FIFO overrun) and IFU (input FIFO underrun) trigger whenever there is a FIFO write with no empty locations or a FIFO read with no valid data. By design, the write side cannot overrun the FIFO. Although this has been tested, the overrun detection remains available for potential error analysis. The read side unpack logic can cause an underrun, but this is due to insufficient master bus bandwidth or slow frame buffer memory response, causing the input FIFO to go empty while there is a request for data by the unpack logic.

34.5.7 Pixel unpack

The pixel unpack reads 32-bit data from the input FIFO and extracts 1, 2, 4, 8, 16, 18, or 24 bits-per-pixel data depending on the BPP programming bits in CR1. Note that 1, 2, 4, 8 bpp are encoded pixels that index an entry in the palette, while 16, 18, 24 bpp are unencoded pixels that directly drive the LCD panel via the output formatter. The CLCD supports big-endian, little-endian, and Windows CE data formats.

With each frame, the internal start sync pulse from the timing & control unit initializes the pixel unpack to start de-queuing words from the input FIFO as they appear on the read side.

The following tables list the structure of the data in each frame buffer word in the input FIFO for each combination of endian and BPP programming. For each of the three supported data formats, the pixel unpack extracts the appropriate display pixel from the data word. The three data types, with their assigned mnemonics, are:
● LEB_LEP: EBO = 0, EPO = 0; little endian frame buffer byte, placed in little endian pixel byte
● BEB_BEP: EBO = 1, EPO = X; big endian frame buffer byte, placed in big endian pixel byte
● LEB_BEP (1, 2, 4 bpp only): EBO = 0, EPO = 1; little endian frame buffer byte, placed in big endian pixel byte (Windows CE format)

In Tables 154 to 159, the notation Pn [a:b] means that pixel n occupies input FIFO read side output bits a down to b.

Table 154. LEB_LEP, Input FIFO Read Side bits [31:16]

BPP | Input FIFO read side output bits [31:16]
1 | P31 [31], P30 [30], ..., P16 [16]
2 | P15 [31:30], P14 [29:28], P13 [27:26], P12 [25:24], P11 [23:22], P10 [21:20], P9 [19:18], P8 [17:16]
4 | P7 [31:28], P6 [27:24], P5 [23:20], P4 [19:16]
8 | P3 [31:24], P2 [23:16]
16 | P1 [31:16]
18 | P0 pixel bits 17:16 in [17:16]; [31:18] unused
24 | P0 pixel bits 23:16 in [23:16]; [31:24] unused

Table 155. LEB_LEP, Input FIFO Read Side bits [15:0]

BPP | Input FIFO read side output bits [15:0]
1 | P15 [15], P14 [14], ..., P0 [0]
2 | P7 [15:14], P6 [13:12], P5 [11:10], P4 [9:8], P3 [7:6], P2 [5:4], P1 [3:2], P0 [1:0]
4 | P3 [15:12], P2 [11:8], P1 [7:4], P0 [3:0]
8 | P1 [15:8], P0 [7:0]
16 | P0 [15:0]
18 | P0 pixel bits 15:0 in [15:0]
24 | P0 pixel bits 15:0 in [15:0]

Table 156. BEB_BEP, Input FIFO Read Side bits [31:16]

BPP | Input FIFO read side output bits [31:16]
1 | P0 [31], P1 [30], ..., P15 [16]
2 | P0 [31:30], P1 [29:28], P2 [27:26], P3 [25:24], P4 [23:22], P5 [21:20], P6 [19:18], P7 [17:16]
4 | P0 [31:28], P1 [27:24], P2 [23:20], P3 [19:16]
8 | P0 [31:24], P1 [23:16]
16 | P0 [31:16]
18 | P0 pixel bits 17:16 in [17:16]; [31:18] unused
24 | P0 pixel bits 23:16 in [23:16]; [31:24] unused

Table 157. BEB_BEP, Input FIFO Read Side bits [15:0]

BPP | Input FIFO read side output bits [15:0]
1 | P16 [15], P17 [14], ..., P31 [0]
2 | P8 [15:14], P9 [13:12], P10 [11:10], P11 [9:8], P12 [7:6], P13 [5:4], P14 [3:2], P15 [1:0]
4 | P4 [15:12], P5 [11:8], P6 [7:4], P7 [3:0]
8 | P2 [15:8], P3 [7:0]
16 | P1 [15:0]
18 | P0 pixel bits 15:0 in [15:0]
24 | P0 pixel bits 15:0 in [15:0]

Table 158. LEB_BEP, Input FIFO Read Side bits [31:16]

BPP | Input FIFO read side output bits [31:16]
1 | P24 [31], P25 [30], ..., P31 [24], P16 [23], P17 [22], ..., P23 [16]
2 | P12 [31:30], P13 [29:28], P14 [27:26], P15 [25:24], P8 [23:22], P9 [21:20], P10 [19:18], P11 [17:16]
4 | P6 [31:28], P7 [27:24], P4 [23:20], P5 [19:16]
8 | P3 [31:24], P2 [23:16]
16 | P1 [31:16]
18 | P0 pixel bits 17:16 in [17:16]; [31:18] unused
24 | P0 pixel bits 23:16 in [23:16]; [31:24] unused

Table 159. LEB_BEP, Input FIFO Read Side bits [15:0]

BPP | Input FIFO read side output bits [15:0]
1 | P8 [15], P9 [14], ..., P15 [8], P0 [7], P1 [6], ..., P7 [0]
2 | P4 [15:14], P5 [13:12], P6 [11:10], P7 [9:8], P0 [7:6], P1 [5:4], P2 [3:2], P3 [1:0]
4 | P2 [15:12], P3 [11:8], P0 [7:4], P1 [3:0]
8 | P1 [15:8], P0 [7:0]
16 | P0 [15:0]
18 | P0 pixel bits 15:0 in [15:0]
24 | P0 pixel bits 15:0 in [15:0]

34.5.8 Palette lookup table

The palette is a 256 entry by 16-bit lookup table implemented as a two-port 128 entry by 32-bit RAM.
One port ties in with the bus clock (HCLK) domain. Depending on control bit PSS in control register 1 (CR1), the palette can be filled either by the processor via the slave interface, or by the DMA controller via the master interface and the frame buffer memory. Regardless of the PSS setting, the processor can always read the palette RAM contents via the slave interface. The second port ties in with the pixel clock (PCLK) domain, enabling the palette RAM contents to be indexed by the encoded pixel output of the pixel unpack. The palette’s output flows to the output formatter.

Which 16-bit half of the 32-bit palette entry is selected depends on the endian setting and on the least significant bit of the indexing encoded pixel. Control bit EBO in CR1 determines the endian setting:
● In little endian mode, when the least significant bit of the indexing encoded pixel is zero, the lower 16-bit palette entry is selected
● In big endian mode, when the least significant bit of the indexing encoded pixel is zero, the upper 16-bit palette entry is selected

34.5.9 Output FIFO and formatter

The output formatter contains an output FIFO which consists of a 16 word by 24-bit memory. Depending on the bits-per-pixel programming, the incoming pixel source is either the pixel unpack (16, 18, 24 bpp) or the palette (1, 2, 4, 8 bpp). The output FIFO is a slave to the pixel unpack, which drives pixels to it either directly or through the palette. The output FIFO provides back-pressure pipeline freeze capability when it cannot accept another pixel for queuing. This allows the LCD controller to prefetch frame buffer data at the start of a frame, filling up both the input and output FIFOs and then freezing until the first line is ready to display. Once the first data enable (lcd_de) signal from the timing & control unit is active, the output FIFO read side reads continuously for the remainder of each active horizontal line period.
These reads in turn reactivate the pixel unpack and subsequently the DMA controller to access frame buffer data on a demand basis. The output formatter interprets the pixel read from the output FIFO according to control bits BPP, OPS and RGB defined in CR1.

Both the write side and the read side of the output FIFO are on the pixel clock (PCLK) domain. Gray-encoded address pointers are used for the FIFO empty and full flag calculations as well as for the lookahead pipeline freeze signal. Treating the output FIFO in this way allows the same design to be used for both the input and output FIFOs, and enables the output FIFO to be read on a different clock domain in future designs.

The interrupts OFO (output FIFO overrun) and OFU (output FIFO underrun) trigger whenever there is a FIFO write with no empty locations or a FIFO read with no valid data. By design, the write side logic cannot overrun the FIFO (because of the back-pressure pipeline freeze capability). Although this has been tested, the overrun detection remains available for potential error analysis. The read side can cause an OFU interrupt, which could be caused by inadequate bus bandwidth when accessing frame buffer data.

34.5.10 Power sequencing

The LCD controller provides the following power-up sequencing support:
1. Power is applied to the VLSI device containing the LCD controller core and to the LCD panel. Internally the LCD controller core holds the following signals at logic zero:
– lcd_vsync
– lcd_hsync
– lcd_de
– lcd_pclk
– lcd_r[7:0]
– lcd_g[7:0]
– lcd_b[7:0]
2. After a pre-determined amount of time specified by the LCD panel and controlled by a processor timer, the control bit LCE in control register 1 is set to on. With LCE on, the signals to the LCD panel listed in step 1 are free to drive to their programmed active levels.

The LCD controller provides the following power-down sequencing support:
a) Control bit LCE in control register 1 is set to off.
b) After the current frame being displayed completes, the signals to the LCD panel listed above are forced to zero. c) At the time the signals to the LCD panel are forced to zero, interrupt LDD is generated, signaling frame completion. After a pre-determined amount of time specified by the LCD panel, power to the display can be removed. Note: The control bit LPE in CR1, connected as an enable to an external power source for the LCD panel, can be used for enabling and disabling power to the LCD panel. 34.5.11 Pulse-width modulation In order to support TFT LCD panels with LED for backlighting, a pulse-width modulation (PWM) module is added to the CLCD. Typically, a DC-DC converter provides the constant current to the LEDs, and the converter contains a brightness input. Modulating the brightness input with a PWM signal trades-off power consumed by the panels versus brightness. The PWM module has two sources for a clock: the slave bus HCLK or the pclk_in. The selected clock is pre-scaled to the desired PWM frequency, which in turn is modulated in pulse width by the PWM duty cycle register (PWMDCR). 34.5.12 Overlay windows The CLCD supports up to 2 overlay windows. These overlay windows overlay the background graphics screen. A key feature of an overlay window is that the window is read from a separate section of memory in substitution of the background window. Thus, there is no increase of the master bus bandwidth required when activating an overlay window. Doc ID 018553 Rev 3 497/590 Display controller (CLCD) RM0078 Each overlay window contains the following register definitions: Note: ● A control bit in register overlay window enable register (OWER), which enables or disables the overlay window. ● Overlay window X-Coordinates X_START, X_END in register overlay window X start / end register x (OWXSER_x). Note that there are 2 of these registers (x = 0 to 1), one for each potential overlay window. 
● Overlay window Y-Coordinates Y_START, Y_END in register overlay window Y start / end register x (OWYSER_x). Note that there are 2 of these registers (x = 0 to 1), one for each potential overlay window. ● Start address of frame buffer memory for overlay window in register overlay window DMA base address register x (OWDBAR_x). The contents for the overlay window are located in a separate frame buffer memory section, pointed to by OWDBAR_x. ● The current address of overlay window x within the DMA controller is in register overlay window DMA current address register x (OWDCAR_x). ● End address of frame buffer memory for overlay window in register overlay window DMA end address register x (OWDEAR_x). It is compared to register OWDCAR_x to determine last frame buffer memory address to read from for overlay window x. There are 2 register sets (x = 0 to 1), one set for each potential overlay window, with each set consisting of the following registers: OWXSER_x, OWYSER_x, OWDBAR_x, OWDEAR_x and OWDCAR_x. When a overlay window is properly programmed (when its registers OWXSER_x, OWYSER_x, OWDBAR_x, OWDEAR_x and OWDCAR_x are written and the corresponding OWE bit in register OWER is set to 1), the bandwidth requirement increases for CLCD master in the following way: ● For the base window only: the bandwidth is for the base window only ● For the base window + 1 overlay: the bandwidth is for the base window + the size of 1st overlay window ● For the base window + 2 overlays: the bandwidth is for the base window + the size of 1st overlay window + the size of 1st overlay window A single overlay window over a background graphics window (Figure 191 below) depicts an overlay window with its X_START, Y_START, X_END, Y_END coordinates. The origin is defined as the upper-left point in the screen. Figure 191. 
A single overlay window over a background graphics window
[Figure: overlay window rectangle on the background window, bounded by X start, X end, Y start and Y end]

35 Graphics processing unit (GPU)
This chapter focuses on GPU functionality and operation.
For the GPU feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU
For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

35.1 Overview
The GPU is a complex accelerator. It is mainly intended to be used through the binary device driver and high-level 3D libraries (OpenGL) that are made available by the IP vendor (ARM Ltd) and can be obtained from STMicroelectronics.
Figure 192 shows a typical graphics system. Figure 193 on page 501 shows the functional blocks within the GPU.
Figure 192. GPU top level block diagram
[Figure: blocks shown: geometry processor, pixel processor, memory management unit (MMU), system bus, display controller, display]
● Geometry processor: Geometry processing refers to the various tasks the geometry processor performs. During geometry processing, the geometry processor converts geometric descriptions of each object to be drawn into a list of polygons for rendering, and passes this list to the pixel processor. See: Functional description on page 501
● Pixel processor: Produces a final image from the list of primitives generated by the geometry processor. Collectively, the tasks that the pixel processor performs are referred to as rendering. See: Pixel processor on page 503
● Memory management unit (MMU): Enables access checking and translation for all pixel and geometry processor memory accesses. All memory accesses from the pixel and geometry processor use the MMU for access checking and translation. See: Memory management unit (MMU) on page 505

35.2 Clocks
See also: Chapter 5: Reset and clock generator (RCG).
● PCLK is the system APB clock. This clock is primarily used to program GPU registers.
● MALI_SUBSYS_AXI_m_aclk is the system AXI clock. This clock is primarily used by the GPU's internal DMA to read and write external memory.
● MALI_200Mhz_clk is the main GPU clock. The various GPU logic blocks run on this clock (except for interface logic such as APB or AXI). Synthesizer SYNTH3 generates this clock.

35.3 Interrupts
The GPU provides the following interrupt request signals:
● IRQ_m200 for the pixel processor
● IRQ_mgp2 for the geometry processor
● IRQ_mmu for the MMU
In addition to the physical interrupt lines listed above, each unit has several logical interrupts.
See also: Appendix A: Interrupts

35.4 Resets
● PRESET_n is the APB reset.
● MALI_SUBSYS_AXI_m_RESETNn is the AXI reset.
● MALI_200Mhz_rstn is generated by the synthesizers, and is primarily used to reset MALI internal logic, including the pixel processor and geometry processor.

35.5 Functional description
Figure 193. GPU functional block diagram
[Figure: block labels shown: system bus interface, vertex shader command processor, vertex shader core (vertex loader, vertex shader, vertex storer), PLBU command processor, polygon list builder unit (PLBU), configuration registers, on-chip bus, memory management unit (MMU), pixel processor, system bus interface, polygon list reader, RSW, vertex loader, triangle setup unit, rasterizer, fragment shader, blending unit, tile buffers, tile writeback unit, configuration registers]

35.5.1 Geometry processor
The following are the geometry processor's primary tasks:
● Transform and Lighting (T&L). The input to the geometry processor is raw geometric descriptions of every object to be drawn in a scene. During T&L, the geometry processor scales, rotates, and positions the geometry of objects in the scene, and also calculates and assigns values to the vertices.
These values are called varyings, and are required for rendering. The most common varyings are texture co-ordinates and colors.
● Primitive assembly. Primitive assembly involves the PLBU linking vertices together to form different primitives. Primitive assembly can be implicit by the order of the vertices, or explicit by an additional index array. When the vertices are specified for a triangle primitive, the rotation order of the vertices is important because this implicitly specifies a front and back face of the primitive.
● Automatic back face culling. After primitive assembly, back face culling removes all the primitives on the back side of the object, which would not be visible because only their back face is turned towards the viewing plane.
● Primitive list assembly (optional; can be enabled or disabled). Because the pixel processor is a tile-based renderer, the geometry processor must prepare a list of all the primitives required for the pixel processor to render a tile. For each primitive, the PLBU writes a list entry for each tile in which part of the primitive might be visible.
When using the geometry processor, a user-specified program called a vertex shader runs on every vertex in a frame. The vertex shader performs:
● geometry transformations
● projection correction
● lighting calculations
● other per-vertex calculations.

Vertex shader command processor
The vertex shader command processor reads and executes commands from a command list stored in memory. The command list is a list of commands intended to set up and configure execution of the vertex shader core. This enables the vertex shader core to execute multiple jobs without CPU intervention.

Vertex shader core
The vertex loader is a DMA unit that loads per-vertex data for processing. It can accept data from up to 16 distinct streams, each corresponding to one of 16 input registers in the vertex shader.
The vertex shader is the most important single unit of the geometry processor.
This unit performs most of the required calculations for each vertex. The vertex shader runs a program on each vertex of a 3D model, typically performing T&L for the model.
The vertex storer stores data from the output registers of the vertex shader in memory. The vertex storer can export data as integer or floating-point numbers of different sizes.

Polygon list builder unit (PLBU)
The PLBU creates lists of the polygons that the pixel processor must draw. For each polygon in a scene, the list builder decides which tiles the polygon covers, and adds the polygon to the lists that draw those tiles. The PLBU only adds a polygon to lists where the polygon might have to be drawn, reducing the work involved when the pixel processor renders the scene. The PLBU discards polygons that are certain not to be visible.
The PLBU can handle up to 300 lists to support the tile-based rendering mode of the pixel processor efficiently.

PLBU command processor
The PLBU command processor reads and executes commands from a command list stored in memory. The command list sets up and configures execution of the PLBU. This enables the PLBU to execute multiple jobs without CPU intervention. In most cases, the PLBU is used on vertices produced by the vertex shader.

35.5.2 Pixel processor
Rendering is the term that describes the various tasks that the pixel processor performs. During rendering, the pixel processor uses the information from the polygon list to produce a final framebuffer image.
The pixel processor renders the scene by processing each tile individually. One tile is a 16x16 pixel section of the rendered frame. The processor renders each tile completely before rendering the next tile.
The pixel processor performs the following rendering operations:
Triangle setup. This prepares the primitive for rendering by calculating various data that is required to rasterize and shade the primitive.
Rasterization.
The primitive is divided into independent fragments. In general, a fragment is a pixel-sized piece of primitive that the shader pipeline processes, and that might become a pixel or part of a pixel in the final framebuffer. Fragments that might be visible proceed to the fragment shading stage, and fragments that are certain not to be visible are discarded.
Fragment shading. This stage determines how the fragment actually looks. In general, the processor calculates a color for the fragment. The fragment shader takes varying variables as input and uses them to interpolate data across the primitive. The fragment shader typically performs texture lookups to calculate the color of the fragment.
Blending. The fragment is blended into the framebuffer to produce the final image. The fragment is only included in the framebuffer after thorough testing. During blending, various options enable you to decide how the fragment is combined with the existing framebuffer. For example, you can make the fragment partially translucent so that the final color of the fragment is a combination of its color and the existing color in the framebuffer. You can also include anti-aliasing as an optional stage of the blending process.
Producing the framebuffer contents. After blending, the fragment becomes a pixel at a certain position in the tile buffer. If no other fragment overwrites that position, the fragment becomes a pixel in the final framebuffer. Multi-sampling techniques to obtain sharper final images can be applied to the pixel at this stage. When the internal tile buffer is completely rendered, it is written to the framebuffer in main memory.

Polygon list reader
The polygon list reader reads the polygon lists from main memory and executes commands from the lists. Each primitive in the polygon list contains a pointer to the corresponding RSW and vertex data for that primitive. The polygon list reader passes on information about the primitives and controls the operation of the GPU.
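The tile-based rendering and blending steps described above can be illustrated with a short host-side sketch in C. This is only a model of the arithmetic, not GPU driver code: the function names blend_channel and tiles_for_frame are hypothetical, and the 8-bit channel and 0 to 255 alpha representation is an assumption made for the example. The 16x16 tile size is the one stated for the pixel processor.

```c
#include <stdint.h>

/* Hypothetical helper: blend one 8-bit color channel of a translucent
 * fragment (src) over the value already in the framebuffer (dst).
 * alpha = 255 keeps only the fragment, alpha = 0 keeps only the
 * framebuffer value. The +127 rounds to the nearest integer. */
static uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t alpha)
{
    unsigned mixed = (unsigned)src * alpha + (unsigned)dst * (255u - alpha);
    return (uint8_t)((mixed + 127u) / 255u);
}

/* Hypothetical helper: number of 16x16 tiles needed to cover a frame.
 * Partially covered tiles at the right and bottom edges count as
 * whole tiles, so both dimensions are rounded up. */
static unsigned tiles_for_frame(unsigned width, unsigned height)
{
    unsigned tiles_x = (width + 15u) / 16u;
    unsigned tiles_y = (height + 15u) / 16u;
    return tiles_x * tiles_y;
}
```

For instance, under these assumptions a 1920 x 1080 frame is covered by 120 x 68 tiles, because the last row of tiles is only partially used.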

RSW behavior
The RSW is a data structure in main memory that contains the render state of polygons. This render state conforms to the definition in the OpenGL ES API. The RSW defines how to rasterize and render the polygon.
The GPU keeps a local cache of RSWs for immediate processing. The different pipeline stages in the renderer each reference the RSWs to determine how to process the primitives. Therefore RSW data must be available to the renderer for all the primitives currently in the pipeline. Because the GPU permits RSW data for many primitives to be active at the same time, there is no requirement to stall or flush the pipeline for a change of renderer state.

Vertex loader
For each primitive in the polygon list, the vertex loader fetches the required vertices from memory. The vertices must be fully transformed to screen co-ordinates, typically by running a vertex shader program in the geometry processor. When all the vertices required by a primitive are available, the full vertex set is sent to the triangle setup unit.

Triangle setup unit
The triangle setup unit takes data from the vertex loader and polygon list reader and uses vertex data to compute coefficients for edge equations and varying interpolation equations. The unit passes the results of its computation to the rasterizer.

Rasterizer
The rasterizer takes coefficients and equations from the triangle setup unit and uses these to divide polygons into fragments. The rasterizer generates fragments that align with pixels in the tile and passes the fragments on to the fragment shader and then to the blending unit.

Fragment shader
The fragment shader is a programmable unit that calculates how each fragment of a primitive looks. The fragment shader program specified in the RSW for the primitive is executed for each fragment produced by the rasterizer.
The fragment shader program consists of very long instruction words (VLIW), and can use all of the functional units of the fragment shader core in a single instruction.

Blending unit
When a fragment successfully exits the fragment shader, the blending unit blends the calculated fragment value into the current framebuffer value at that position. The current RSW selects the blend operation to use.

Tile buffers
The tile buffers take inputs from the fragment shader and perform various tests on the fragments, such as Z tests and stencil tests. When the tile is fully rendered it is written to the framebuffer. Four subpixel values are stored for each visible pixel, to support 4x anti-aliasing without performance degradation. The tile buffers include:
● an 8-bit stencil buffer that stores stencil values
● a 24-bit Z buffer that stores depth values
● a 32-bit color buffer

Tile writeback unit
The writeback unit writes the content of the tile buffer to system memory after the tile has been completely rendered.

35.5.3 Memory management unit (MMU)
The MMU controls and translates memory accesses initiated by the GPU. The MMU controls data through data structures based on pages and tables. The MMU connects to the bus infrastructure.

35.6 Operation
Figure 194. The GPU software architecture
[Figure: layers shown: graphics application, EGL driver, graphics drivers, graphics standards, shared layer, base driver, operating system, device driver, GPU hardware]

35.6.1 3D system level operation
Figure 195. Typical 3D graphics flow
[Figure: flowchart; its steps are listed here in order]
● Start scene
● Graphics application makes API calls to start the process and initialize context
● Base driver allocates memory structures for geometry processing
● Graphics driver fills in the memory structures for geometry processing
See also Figure 196: Geometry processor data structure on page 507, and Figure 197: Pixel processor data structure on page 508.
● Graphics driver submits its rendering job to the device driver
● Device driver starts geometry processing in hardware
● For each vertex, the hardware processes geometry data: vertex shading, primitive assembly, tile list generation
● Base driver allocates memory structures for pixel processing
● Graphics driver fills in the memory structures for pixel processing
● Graphics driver submits its rendering job to the device driver
● Device driver configures and starts the pixel processor
● For each fragment, the hardware processes polygons and writes to the framebuffer: rasterization, texturing and fragment shading, blending, write to framebuffer. The pixel processor reads in the polygon lists, render states, and textures that the geometry processor has defined. From this information, the final graphical image is created. The pixel processor writes the image to the framebuffer. (Pixel processor on page 503)
● Operating system updates display
● End scene

Figure 196. Geometry processor data structure
[Figure: memory regions shown: vertex data block memory in (vertex data blocks 1 to n), vertex shader command list memory (VS commands 1 to n), vertex data block memory out (vertex data blocks 1 to n), vertex list memory (vertex indices 1 to n), polygon list builder command list memory (PLB commands 1 to n), polygon list memory (polygon lists for tiles 1 to n, each with polygon list commands 1 to n); processing steps shown: vertex shading, polygon list building]

Figure 197. Pixel processor data structure
[Figure: memory regions shown: polygon list memory (polygon lists per tile, each with polygon list commands and polygon list primitives), vertex data block memory (vertex data blocks 1 to n), render state word memory (render state words 1 to n), shader program memory (shader programs 1 to n), remap table memory (uniform remap table, uniforms, texture descriptor remap table, textures)]

35.6.2 2D system level operation
Figure 198. 2D graphics process flow
[Figure: flowchart; its steps are listed here in order]
● Start scene
● Graphics application makes API calls to start the process and initialize context. See Initial graphics API calls
● Graphics driver processes input and creates geometry data: stroked and filled path geometry generation, transformation
● Base driver allocates memory structures for geometry processing
● Graphics driver fills in the memory structures for geometry processing
● Graphics driver submits its rendering job to the device driver
● Device driver starts geometry processing in hardware
● Hardware processes geometry data: cached geometry transformation, tile list generation
● Base driver allocates memory structures for pixel processing
● Graphics driver fills in the memory structures for pixel processing
● Graphics driver submits its rendering job to the device driver
● Device driver configures and starts the pixel processor
● Hardware processes pixel data:
Rasterization and clipping. The pixel processor rasterizes triangles that are inside the drawing area and within the clipping rectangles.
Paint generation. The GPU draws interior fills for the rasterized geometry. The fill style is determined by the specified drawing style and paint objects.
Image interpolation. The GPU samples image colors and combines them with the generated paint, depending on the image mode.
More image processing needed? If yes, see also 2D filter processing.
If no further image processing is needed, the flow continues with:
Blending. The GPU blends the color with the destination color using the blend function specified by the graphic driver.
Masking and anti-aliasing. The result from the blending stage is merged with the destination framebuffer for display, using the alpha mask as the blend factor.
Write to framebuffer.
● Operating system updates display
● End scene

Initial graphics API calls
● drawing style
● transformations
● paints
● paths
● images
● mask buffer and scissor rectangles initialization

Stroked and filled path geometry generation
● Transformed path divided into line-loops
● Line-loops tessellated into sets of triangles that represent the filled path
● Triangles generated that represent the stroke. The driver generates widened stroke geometry from the path data and stroke style settings.

Transformation
● Path transformed from user space to surface space
● Image transformed from user space to surface space

2D filter processing
The filtered image is processed at the image interpolation stage shown in Figure 198 on page 509, using the destination image format.
Figure 199. GPU image filter process flow
[Figure: flow from source image to destination image; its stages are listed here in order]
● Source image. The source image is converted to a format compatible with the destination image.
● Source image normalized.
● Filtering. The GPU performs an image filter operation in the software driver on the normalized source image.
● Conversion to destination format. The result from the filtering operation is converted to the destination image format.
● Destination image.
Example: applying a blur filter to an image and creating a fade between the original image and the blurred version of the image:
1. Access the source image, then blur the image using the software driver to produce a destination image.
2. Draw the source image in the framebuffer. See also Figure 198 on page 509.
3. Draw the blurred image with a specific blend function enabled. See also Figure 198 on page 509.
4. Repeat steps above as needed.
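The blur-and-fade example above can be modeled on the CPU with a minimal C sketch. It is an illustration of the arithmetic only and does not use the GPU driver API: the function names box_blur_row and cross_fade_row are hypothetical, the 3-tap box filter stands in for whatever blur the software driver applies, and a single row of 8-bit samples stands in for an image.

```c
#include <stddef.h>
#include <stdint.h>

/* Step 1 (assumed filter): 3-tap box blur of one row of 8-bit samples.
 * At the row edges the missing neighbor is replaced by the edge sample. */
static void box_blur_row(const uint8_t *src, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t left = (i == 0) ? 0 : i - 1;
        size_t right = (i + 1 < n) ? i + 1 : n - 1;
        unsigned sum = (unsigned)src[left] + src[i] + src[right];
        dst[i] = (uint8_t)((sum + 1u) / 3u); /* rounded average */
    }
}

/* Steps 2 and 3: draw the source, then draw the blurred row over it
 * with a blend factor. fade = 0 shows only the source row and
 * fade = 255 shows only the blurred row. */
static void cross_fade_row(const uint8_t *src, const uint8_t *blurred,
                           uint8_t *out, size_t n, uint8_t fade)
{
    for (size_t i = 0; i < n; i++) {
        unsigned mixed = (unsigned)src[i] * (255u - fade)
                       + (unsigned)blurred[i] * fade;
        out[i] = (uint8_t)((mixed + 127u) / 255u);
    }
}
```

Step 4 of the example corresponds to calling cross_fade_row repeatedly with an increasing fade value, one call per displayed frame.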

35.6.3 Graphics pipeline level operation
Figure 200. Typical graphics pipeline flow
[Figure: flow from Start processing through Initial processing, Modeling transformation, Viewing transformation, Per-vertex lighting, Projection transformation, Rasterization, Fragment shading, Blending and Write to framebuffer to End processing]
Initial processing:
The API level drivers create data structures for the GPU and configure the hardware for each scene. The software generates data structures for render state words (RSWs) and texture descriptors.
Modeling transformation, viewing transformation, per-vertex lighting and projection transformation:
The geometry processor runs a vertex shader program for each vertex. This shader program can perform transform, lighting, viewport transformation, and perspective transformation. Vertices are then assembled into graphics primitives, and polygon lists are built for the pixel processor.
The functional blocks used: Vertex shader command processor; Vertex shader core; Polygon list builder unit (PLBU); PLBU command processor.
Rasterization and fragment shading:
The pixel processor:
• Reads in polygon list data and commands from the polygon list. The polygon list entries point to the appropriate RSWs.
• Reads in the RSWs to internal memory.
• Reads in vertices for each primitive. When all required vertices are read, coefficients and equations for rasterization are calculated in a process called triangle setup.
• Rasterizes the polygons, and runs fragment shaders. The rasterizer takes the coefficients and equations from the triangle setup unit and creates fragments. A fragment shader program is then run on each fragment to calculate the color of the fragment.
The functional blocks used: RSW behavior; Polygon list reader; Vertex loader; Triangle setup unit; Rasterizer; Fragment shader.
Blending and write to framebuffer:
To produce the final display data for the frame buffer, the pixel processor:
• Creates blended fragments in a blending unit. The blending unit takes configuration information from the RSW and applies the corresponding blending functions to the fragments.
The blending unit blends the fragments with the color already present at the corresponding location in the frame buffer.
• Tests the fragments and updates the frame buffer. The pixel processor stores fragments in tile buffers. The tile buffer calculates which fragments are visible and which are hidden and passes the visible fragments to the frame buffer.
• Writes the content of the tile buffer to system memory after the tile has been completely rendered.
The functional blocks used: Blending unit; Tile buffers; Tile writeback unit.

36 Video decoder (VDEC)
This chapter focuses on VDEC functionality and operation.
For the VDEC feature list, refer to the SPEAr1340 datasheet:
● Doc ID 023063, Data sheet, SPEAr1340, Dual-core Cortex A9 HMI embedded MPU
For technical details about the programmable registers, refer to the following companion document:
● RM0089, Reference manual, SPEAr1340 address map and registers

36.1 Overview
The VDEC functional block is a complex subsystem. It is mainly intended to be used through the binary device driver and low-level software layers that can be obtained from STMicroelectronics.
Figure 202 shows the video decoder block diagram.
Figure 201. Decoder functional block diagrams
[Figure: decoder control software (application programming interface; stream header decode for MPEG-2, MPEG-4, H.264, VC-1, RV, JPEG, VP6/7/8 and AVS; hardware drivers) connected through the system bus and external memory to the decoder and post-processor hardware (bus interface, entropy decode, RLC decode, MV decode, AC/DC prediction, inter / intra prediction, inverse transform, deblocking filter, deinterlace, scaling, rotation, RGB conversion, dithering, alpha blending)]

Supported standards, profiles and levels
Table 160.
Supported standards, profiles and levels
Standard | Decoder support
H.264 | Baseline Profile, levels 1 - 4.2; Main Profile, levels 1 - 4.2; High Profile, levels 1 - 4.2; Image size up to 1080p at level 4.2
SVC | Scalable Baseline Profile, base layer only; Scalable High Profile, base layer only
MPEG-4 | Simple Profile, levels 0 - 6; Advanced Simple Profile, levels 0 - 5
MPEG-2 | Main Profile, low, medium and high levels
MPEG-1 | Main Profile, low, medium and high levels
H.263 | Profile 0, levels 10-70. Image size up to 720x576
Sorenson Spark | Bitstream version 0 and 1
VC-1 | Simple Profile, low, medium and high levels; Main Profile, low, medium and high levels; Advanced Profile, levels 0-3
JPEG | Baseline interleaved
RV | RV8; RV9; RV10
VP6 | VP6.0 (Simple Profile); VP6.1; VP6.2 (Advanced Profile)
VP7 | VP7 versions 0-3
VP8 | VP8 version 2 (WebM)
AVS | P2 Jizhun Profile, level 6.0 and 6.2
DivX | DivX Home Theater Profile Qualified TM; DivX3/4/5/6
Possible deviations from the tools specified by these levels, and other points to notice, are listed in Table 161.

Table 161. Deviations from the supported profiles and levels
Standard | Tool | Decoder support
AVS | 4:2:2 sampling | Not supported
H.263 | Time code extensions | Not supported
H.264 | Slice groups (FMO) | If more than one slice group is used, software performs entropy decoding.
H.264 | Arbitrary slice order | Supported, software performs entropy decoding.
H.264 | Redundant slices | Supported, but not utilized; redundant slices are skipped by software.
H.264 | Image cropping | Not performed by the decoder, cropping parameters are returned to the application.
SVC | Enhancement layers | Not supported
MPEG-4 | Data partitioning | Supported, software performs entropy decoding.
MPEG-4 | Global motion compensation | Not supported
VC-1 | Multi-resolution | Supported, upscaling will be performed by the post-processor.
VC-1 | Range mapping | Supported, range mapping will be performed by the post-processor.
JPEG | Non-interleaved data order | Not supported

36.2 Clocks
See also: Chapter 5: Reset and clock generator (RCG)
The following clocks are used inside the decoder wrapper:
● ACLK(4) is the system AXI clock. It is used by the asynchronous bridge to interface the decoder AXI Master with the AXI bus.
● HCLK(4) is the system AHB clock. It is used by the asynchronous bridge to interface the decoder AHB Slave with the AHB bus.
● DCLK is the decoder's core clock (235 MHz). It is sourced from clock synthesizer SYNT0. The decoder AXI Master and AHB Slave both use this clock.
4. ACLK and HCLK are the same clock; they are connected to the system bus clock AHCLK.

36.3 Interrupts
The decoder and post-processor share a common interrupt line (XINTDec, interrupt line ID[113]) for all the interrupts generated.

36.3.1 Decoder interrupts
When the decoder hardware requires software attention, it sets the interrupt bit high, with one of the status flags providing information about the reason for the interrupt. When the software has handled the interrupt it must reset all status flags to zero. The interrupt bit stays high until software has reset it.
The interrupt method can be set to interrupting or polling.

Table 162. Decoder interrupt register (SWREG1 OFFSET 0x4)
Bit | Name | Function
31:25 | - | Not used
24 | sw_dec_pic_inf | B slice detected. This signal is driven high during picture ready interrupt if B-type slice is found. This bit does not launch an interrupt but is used to inform software about H.264 tools. DIVX3: For DIVX3 this bit tells the value of extension header flag (flag called FLIPFLOP)
23:19 | - | Not used
18 | sw_dec_timeout | Interrupt status bit decoder timeout.
When high, decoder has made no bus transactions in 2^18-1 clock cycles and has not set an interrupt. Possible only if timeout interrupting is enabled. This should be considered as an encountered error in the input stream. Note: Post-processor transactions affect this feature; running stand-alone post-processing while decoding may prevent decoder timeout interrupts.
17 | sw_dec_slice_int | Interrupt status bit dec_slice_decoded. When high, software must set new base addresses for sw_dec_out_base and sw_jpg_ch_out_base before resetting this status bit. Used for JPEG and VP8 web-p modes.
16 | sw_dec_error_int | Interrupt status bit input stream error. When high, an error is found in input data stream decoding, and software must perform error concealment. HW will self-reset.
15 | sw_dec_aso_int | Interrupt status bit ASO (Arbitrary Slice Ordering) detected. When high, hardware has encountered Arbitrary Slice Order tool in the input H.264 stream data, and software must perform entropy decoding. Hardware will self-reset.
14 | sw_dec_buffer_int | Interrupt status bit input buffer empty. When high, the input stream buffer is empty but the picture is not ready. Software must provide a new stream pointer to hardware. Hardware will not self-reset.
13 | sw_dec_bus_int | Interrupt status bit - Error response from bus. When high, hardware has received an error response from the bus while accessing external memory. This is a fatal error possibly caused by the incorrect allocation of decoder linear memory. Hardware will self-reset.
12 | sw_dec_rdy_int | Interrupt status bit decoder. When this bit is high, decoder has decoded a picture. HW will self-reset.
11:9 | - | Not used
8 | sw_dec_irq | Decoder IRQ. This bit drives the interrupt line, OR gated with the post-processor interrupt bit. Software will reset this after the interrupt is handled. The interrupt line is not used for the decoder if the interrupt disable bit for decoder is high.
7:5 | - | Not used
4 | sw_dec_irq_dis | Decoder IRQ disable.
When high, there are no interrupts concerning decoder from HW. Polling must be used to see the interrupt status.
3:1 | - | Not used
0 | sw_dec_e | Decoder enable. Setting this bit high will start the decoding operation. HW will reset this when picture is processed or ASO or stream error is detected or bus error or timeout interrupt is given.

36.3.2 Post-processor interrupts
The post-processing interrupt register contains information for the post-processor.

Table 163. Post-processing interrupt register (swreg60 offset 0xf0)
Bit | Name | Function
13 | sw_pp_bus_int | Interrupt status bit - Error response from bus. When high, hardware has received an error response from the bus while accessing external memory. This is a fatal error possibly caused by the incorrect allocation of post-processor linear memory. Hardware will self-reset. In pipeline mode this bit is not used.
12 | sw_pp_rdy_int | Interrupt status bit pp. When this bit is high, post-processor has processed a picture in external mode. In pipeline mode this bit is not used.
11:9 | - | Not used
8 | sw_pp_irq | Post-processor IRQ. This bit drives the interrupt line, OR gated with the decoder interrupt bit. Software will reset this after the interrupt is handled. The interrupt line is not used if the interrupt disable bit for post-processor is high.
7:5 | - | Not used
4 | sw_pp_irq_dis | Post-processor IRQ disable. When high, there are no interrupts from HW concerning post-processing. Polling must be used to see the interrupt status.
3:2 | - | Not used
1 | sw_pp_pipeline_e | Decoder – post-processing pipeline enable: 0 = Post-processor is processing different picture than decoder or is disabled; 1 = Post-processing is performed in pipeline with decoder
0 | sw_pp_e | External mode post-processing enable. This bit will start the post-processing operation.
Not to be used if PP is in pipeline with decoder (sw_pp_pipeline_e = 1). HW will reset this when picture is post-processed.

36.4 Functional description
Figure 202. Video decoder detailed block diagram
[Figure: block labels shown: hwg1core in the DCLK (235 MHz) domain with Fuse Decoder (streamd, scd, bsd, Mvd, acdcd, transd, pred, refbufferd, filterd), Fuse Post Processor (ppd), busifd, axiwmfid, ahbwsifd, clkctrld, Hwg1swr (swrdec & swrpp), ramd and romd (all 6 instances); axiahbg1dec wrapper with AXI master interface through an X2X bridge to the AXI bus in the ACLK (166 MHz) domain, and AHB slave interface through H2HA sync 32 to the AHB bus in the HCLK (166 MHz) domain; XINT (Interrupt) output]
Decoders are operated using the application programming interface (API).
● H.264 decoder on page 519
● MPEG-4 / H.263 / Sorenson Spark decoder on page 521
● MPEG-2 / MPEG-1 decoder on page 523
● JPEG decoder on page 525
● VC-1 decoder on page 526
● RV decoder on page 528
● VP6 decoder on page 530
● VP7/VP8 decoder on page 532
● AVS decoder on page 534
● DivX decoder on page 536

36.4.1 H.264 decoder
Table 164. H.264 / SVC decoder base layer features
Feature | Decoder support
Input data format | H.264 byte or NAL unit stream / SVC stream
Decoding scheme | Frame by frame (or field by field); Slice by slice
Output data format | YCbCr 4:2:0 semi-planar format(1); YCbCr 4:0:0 (monochrome)
Supported image size | 48 x 48 to 1920 x 1088(2); Step size 16 pixels(3)
Maximum frame rate | 30 fps at 1080p(4)
Maximum bit rate | As specified by H.264 HP level 4.2
Error detection and concealment | Supported
1. In semi-planar format, the Cb and Cr components are interleaved pixel by pixel in a separate plane. This allows more efficient bus usage compared to the planar YCbCr format due to longer bursts that can be used in chrominance data transferring.
2. The maximum decoder output size is configurable up to 1920 x 1088. Internal memory size is affected by the selected configuration.
3.
The decoder crops video fields that are a multiple of eight pixels in the vertical direction.
4. The achievable resolution and frame rate depend on the specific stream content and system load.

Figure 203. H.264 decoder initialization
[Sequence diagram not reproduced. It shows the following exchange between the H.264 application and the H.264 decoder:]
1. The application calls H264DecInit(&decInst, 0, 0, 0). The decoder initializes and returns H264DEC_OK.
2. The application receives the start of the H.264 stream and calls H264DecDecode(decInst, &decInput, &decOutput). The decoder decodes the H.264 parameter sets and returns H264DEC_STRM_PROCESSED.
3. The application receives the first H.264 coded data slice and calls H264DecDecode(decInst, &decInput, &decOutput). The decoder activates the parameter sets based on the information contained in the first picture slice (IDR picture) and returns H264DEC_HDRS_RDY.
4. To get information about the decoded stream (such as picture dimensions and cropping information), the application calls H264DecGetInfo(decInst, &decInfo). The decoder returns H264DEC_OK.

Figure 204. H.264 / SVC decoder basic process

36.4.2 MPEG-4 / H.263 / Sorenson Spark decoder

Table 165. MPEG-4 / H.263 / Sorenson Spark decoder features

| Feature | Decoder support |
|---|---|
| Input data format | MPEG-4 / H.263 / Sorenson Spark elementary video stream |
| Decoding scheme | Frame by frame (or field by field); video packet by video packet |
| Output data format | YCbCr 4:2:0 semi-planar |
| Supported image size | 48 x 48 to 1920 x 1088 (MPEG-4, Sorenson Spark)(1); 48 x 48 to 720 x 576 (H.263); step size 16 pixels(2) |
| Maximum frame rate | 30 fps at 1080p(3) |
| Maximum bit rate | As specified by MPEG-4 ASP level 5 |
| Error detection and concealment | Supported |

1. The maximum decoder output size is configurable up to 1920 x 1088. Internal memory size is affected by the selected configuration.
2. The decoder crops video fields that are a multiple of eight pixels in the vertical direction.
3. The achievable resolution and frame rate depend on the specific stream content and system load.

Figure 205.
MPEG-4 / H.263 / Sorenson Spark decoder initialization

Figure 206. MPEG-4 / H.263 / Sorenson Spark decoder basic process

36.4.3 MPEG-2 / MPEG-1 decoder

Table 166. MPEG-2 / MPEG-1 features

| Feature | Decoder support |
|---|---|
| Input data format | MPEG-2 / MPEG-1 elementary video stream |
| Decoding scheme | Frame by frame (or field by field); video packet by video packet |
| Output data format | YCbCr 4:2:0 semi-planar format |
| Supported image size | 48 x 48 to 1920 x 1088(1); step size 16 pixels |
| Maximum frame rate | 30 fps at 1080p(2) |
| Maximum bit rate | As specified by MPEG-2 MP high level |
| Error detection and concealment | Supported |

1. The maximum decoder output size is configurable up to 1920 x 1088. Internal memory size is affected by the selected configuration.
2. The achievable resolution and frame rate depend on the specific stream content and system load.

Figure 207. MPEG-2 / MPEG-1 decoder initialization

Figure 208. MPEG-2 / MPEG-1 decoder basic process

36.4.4 JPEG decoder

Table 167. JPEG decoder features

| Feature | Decoder support |
|---|---|
| Input data format | JFIF file format 1.02; YCbCr 4:0:0, 4:2:0, 4:2:2, 4:4:0, 4:1:1 and 4:4:4 video frame storage formats |
| Decoding scheme | Input: buffer by buffer, from 5 kB to 8 MB at a time(1); output: from 1 MB (macroblock) row to 16 Mpixels at a time(2) |
| Output data format | YCbCr 4:0:0, 4:2:0, 4:2:2, 4:4:0, 4:1:1 and 4:4:4 semi-planar |
| Supported image size | 48 x 48 to 8176 x 8176 (66.8 Mpixels); step size 8 pixels(3) |
| Maximum data rate | Up to 76 million pixels per second(4) |
| Thumbnail decoding | JPEG compressed thumbnails supported |
| Error detection | Supported |

1. Programmable buffer size for optimizing performance and memory consumption. An interrupt is issued when the buffer runs empty, and the control software then loads more of the stream into external memory.
2.
Programmable output slice size for optimizing performance and memory consumption. An interrupt is issued when the requested area is decoded. The control software can switch the decoder output picture base address each time.
3. Resolutions that are not divisible by 16 x 16 are filled to the 16-pixel boundary.
4. The actual maximum data rate depends on the logic clock frequency and the JPEG compression rate. The given figure applies to high-quality JPEG with the logic running at 200 MHz.

Figure 209. JPEG decoder basic process

36.4.5 VC-1 decoder

Table 168. VC-1 decoder features

| Feature | Decoder support |
|---|---|
| Input data format | VC-1 stream |
| Decoding scheme | Frame by frame (or field by field); slice by slice |
| Output data format | YCbCr 4:2:0 semi-planar format |
| Supported image size | 48 x 48 to 1920 x 1088(1); step size 16 pixels(2) |
| Maximum frame rate | 30 fps at 1080p(3) |
| Maximum bit rate | As specified by VC-1 AP level 3 |
| Error detection and concealment | Supported |

1. The maximum decoder output size is configurable up to 1920 x 1088. Internal memory size is affected by the selected configuration. For interlaced sequences, the field size must be at least 48 x 48.
2. The decoder crops video fields that are a multiple of eight pixels in the vertical direction.
3. The achievable resolution and frame rate depend on the specific stream content and system load.

Figure 210. VC-1 decoder initialization

Figure 211. VC-1 decoder basic process

36.4.6 RV decoder

Table 169. RV decoder features

| Feature | Decoder support |
|---|---|
| Input data format | RV8, RV9, or RV10 stream |
| Decoding scheme | Frame by frame; slice by slice |
| Output data format | YCbCr 4:2:0 semi-planar format |
| Supported image size | 48 x 48 to 1920 x 1088(1); step size 16 pixels |
| Maximum frame rate | 30 fps at 1080p(2) |
| Maximum bit rate | As specified by the RV specification |
| Error detection and concealment | Supported |

1.
The maximum decoder output size is configurable up to 1920 x 1088. Internal memory size is affected by the selected configuration.
2. Achievable resolution and frame rate depend on specifi