Using SDRAM on AT91SAM9 Microcontrollers

AT91 ARM Thumb Microcontrollers Application Note
6256A–ATARM–19-Sep-06

1. Scope

The Atmel® AT91SAM9 ARM® Thumb® based microcontroller family features an AHB high-performance SDRAM controller for connecting 16-bit or 32-bit wide external SDRAM memories.

The purpose of this document is to help the developer design a system using SDRAM memories. It describes the performance characteristics of the SDRAM controller and the associated techniques for optimizing SDRAM performance and power consumption.

The associated zip file, AN-SDRAM_software_example.zip, contains the elements required in Section 9.3 “Software Example”.

2. SDRAM Controller Overview

The SDRAM Controller (SDRAMC) extends the memory capabilities of a chip by providing the interface to an external 16-bit or 32-bit SDRAM device. The page size ranges from 2048 to 8192 and the number of columns from 256 to 2048. It supports byte (8-bit), half-word (16-bit) and word (32-bit) accesses.

The SDRAM Controller is word write burst oriented. It does not support byte read/write bursts or half-word write bursts. It keeps track of the active row in each bank, thus maximizing SDRAM performance; for example, the application may be placed in one bank and data in the other banks. To optimize performance, it is advisable to avoid accessing different rows in the same bank (Open Bank Policy).

The SDRAM controller supports a CAS latency of 1, 2 or 3, so that the read access can be optimized for the operating frequency.

Self refresh, power down and deep power down modes minimize the consumption of the SDRAM device.

3. SDRAM Controller Signals Definition

The SDRAM Controller is capable of managing up to four-bank, 32-bit wide SDRAM devices. The signals generated by the controller are defined in Table 3-1. Refer to the chapter “External Bus Interface (EBI)” in the product datasheet.

Table 3-1. SDRAM Controller Signals

Controller Name   Description                    Microcontroller Signal   Type     Active Level
SDCK              SDRAM Clock                    SDCK                     Output   -
SDCKE             SDRAM Clock Enable             SDCKE                    Output   High
SDCS              SDRAM Controller Chip Select   NCS1/SDCS                Output   Low
BA[1:0]           Bank Select Signals            A16/BA0; A17/BA1         Output   -
RAS               Row Signal                     RAS                      Output   Low
CAS               Column Signal                  CAS                      Output   Low
SDWE              SDRAM Write Enable             SDWE                     Output   Low
NBS[3:0]          Data Mask Enable Signals       NBS[3:0]                 Output   Low
SDRAMC_A[12:0]    Address Bus                    A[14:2]                  Output   -
D[31:0]           Data Bus                       D[31:0]                  I/O      -

• SDCK is the clock signal that feeds the SDRAM device and to which all the other signals are referenced. All SDRAM input signals are sampled on the positive edge of SDCK. To reach 100 MHz on the SDCK pin loaded with a 50 pF equivalent capacitance, a dedicated high-speed pin is necessary; the SDCK pin is therefore not multiplexed with a (lower-frequency) PIO line.
• SDCKE acts as an inhibit signal to the SDRAM device. SDCKE remains high during valid SDRAM accesses (Read, Write, Precharge). It goes low when the device is in power down mode or in self refresh mode, so that a self refresh command can be issued by the controller. For more information, refer to the section “Self-refresh Mode” in the chapter “SDRAM Controller (SDRAMC)” in the product datasheet.
• SDCS: When the chip select SDCS is low, command input is valid. When it is high, commands are ignored but the operation in progress continues.
• RAS, CAS, SDWE: The row address strobe (RAS) and column address strobe (CAS) are asserted to indicate that the corresponding address is present on the bus. Their combination with write enable (SDWE) and chip select (SDCS) at the rising edge of the clock (SDCK) determines the SDRAM operation.
• BA0, BA1 select the bank to address when a command is input. Read/write or precharge is applied to the bank selected by BA0 and BA1.
• NBS[3:0]: Data is accessed in 8, 16 or 32 bits by means of NBS[3:0], which are, respectively, the highest to lowest mask bits for the SDRAM data on the bus.
• SDRAMC_A[12:0]: SDRAM controller address lines are connected, respectively, to A[14:2] of the microcontroller, except for SDRAMC_A10 (SDA10), which is not connected to A12. SDRAMC_A[12:0] addresses up to 11 column address bits and 13 row address bits.
• SDA10: Acts as an SDRAM address line but is also used as the auto-precharge command bit. AT91 products output a dedicated SDA10 signal that allows the system to enable the auto-precharge feature without address bus influence.

4. SDRAM Connection on AT91SAM9

The AT91 microcontrollers support 16-bit and 32-bit SDRAM devices on one Chip Select area (NCS1). The DW bit in the SDRAM configuration register selects the 16-bit or 32-bit bus width. The 32-bit interface can be achieved with a single 32-bit SDRAM device or with two 16-bit SDRAM devices.

Each SDRAM device must have sufficient decoupling to provide efficient filtering on the power supply rails.

4.1 SDRAM 16-bit Connection

4.1.1 Hardware Configuration

[Figure: 16-bit SDRAM connection schematic. One MT48LC16M16A2 device (U1, 256 Mbits, TSOP54 package) is connected to D[0..15], the address lines A[0..14] (A12 not used), SDA10, BA0, BA1, SDCKE, SDCK, NBS0 (A0), NBS1 (CFIOR_NBS1_NWR1), CAS, RAS, SDWE and SDCS_NCS1, with seven 100 nF decoupling capacitors (C1 to C7) on the 3V3 supply.]

4.1.2 Software Configuration

The following configuration must be performed:
• Assign the EBI CS1 to the SDRAM controller by setting the bit EBI_CS1A in the EBI Chip Select Assignment Register located in the bus matrix memory space.
• Initialize the SDRAM Controller according to the SDRAM device and the system bus frequency.
• Program the Data Bus Width to 16 bits.

The SDRAM initialization sequence is described in Section 9.1 “Initialization Sequence”.

4.2 SDRAM 32-bit Connection

4.2.1 Hardware Configuration

[Figure: 32-bit SDRAM connection schematic. Two MT48LC16M16A2 devices (256 Mbits each, TSOP54 package) share the address lines A[0..14] (A12 not used), SDA10, BA0, BA1, SDCKE, SDCK, CAS, RAS, SDWE and SDCS_NCS1. U1 is connected to D[0..15] with NBS0 (A0) and NBS1 (CFIOR_NBS1_NWR1); U2 is connected to D[16..31] with NBS2 (A1) and NBS3 (CFIOW_NBS3_NWR3). Each device has seven 100 nF decoupling capacitors (C1 to C7 and C8 to C14) on the 3V3 supply.]

4.2.2 Software Configuration

The following configuration must be performed:
• Assign the EBI CS1 to the SDRAM controller by setting the bit EBI_CS1A in the EBI Chip Select Assignment Register located in the bus matrix memory space.
• Initialize the SDRAM Controller according to the SDRAM device and the system bus frequency. The Data Bus Width must be programmed to 32 bits.

The data lines D[16..31] may be multiplexed with PIO lines. In this case, the dedicated PIOs must be programmed in peripheral mode in the PIO controller.

The SDRAM initialization sequence is described in Section 9.1 “Initialization Sequence”.

5. SDRAM Signal Routing Considerations

The critical high-speed signals are those associated with the SDRAM. The following are general guidelines for designing an SDRAM interface with AT91SAM9 products with a targeted speed of 100 MHz on SDCK.
• Layout for the SDRAM should begin by placing the SDRAM devices as close as possible to the processor. A longer trace increases the rise and fall time of the signals. The setup time of signals generated by the AT91 microcontroller decreases with increased trace length.
• Keep the SDRAM clock (SDCK) and the SDRAM control lines as short as possible.
• Keep the address and data lines as short as possible.
• For proper SDRAM operation at 100 MHz, 10 to 30 Ohm series resistors can be placed on all the switching signals to limit the current flowing into each of the outputs. The resistors should be placed near the processor. The need for series termination resistors on the signals, and their specific value, is best determined by simulation using IBIS models and the PCB layout of the specific design.
• To support maximum speeds, reasonable SDRAM loading constraints must be followed. For high-speed operation, the maximum load cannot exceed 50 pF on address and data buses and 10 pF on SDCK. The user must consider all the devices connected on the different buses to calculate the system load.
• Use a sufficient decoupling scheme for memory devices. It is recommended to use low ESR 0.01 µF and 0.1 µF decoupling capacitors in parallel.
An additional 0.001 µF decoupling capacitor is recommended to minimize ground bounce and to filter high frequency noise.

6. SDRAM Access Definition

6.1 SDRAM Controller Write Cycle

The SDRAM Controller allows burst access or single access. In both cases, the SDRAM controller keeps track of the active row in each bank, thus maximizing performance.

To initiate a burst access, the SDRAM Controller uses the transfer type signal provided by the master requesting the access. If the next access is a sequential write access, writing to the SDRAM device is carried out. If the next access is a sequential write access but the current access is at a page boundary, or if the next access is in another row, then the SDRAM Controller generates a precharge command, activates the new row and initiates a write command. To comply with SDRAM timing parameters, additional clock cycles are inserted between the precharge and active commands (tRP) and between the active and write commands (tRCD).

6.2 SDRAM Controller Read Cycle

The SDRAM Controller allows burst access or single access. In all cases, the SDRAM Controller keeps track of the active row in each bank, thus maximizing performance.

If the row and bank addresses do not match the previous row/bank address, then the SDRAM controller automatically generates a precharge command, activates the new row and starts the read command. To comply with SDRAM timing parameters, additional clock cycles on SDCK are inserted between precharge and active commands (tRP) and between active and read commands (tRCD). These two parameters are set in the configuration register of the SDRAM Controller. After a read command, additional wait states are generated to comply with the CAS latency (1, 2 or 3 clock delays specified in the configuration register).

6.3 Border Management

When the memory row boundary has been reached, an automatic page break is inserted.
In this case, the SDRAM controller generates a precharge command, activates the new row and initiates a read or write command. To comply with SDRAM timing parameters, additional clock cycles are inserted between the precharge and active commands (tRP) and between the active and read/write commands (tRCD).

Figure 6-1. Read/Write General Access
[Waveform: SDCS, SDCK, SDRAMC_A[12:0], Cmd and D[31:0] for a READ of columns a to d with CAS = 2, followed by a WRITE of columns a to d.]

Figure 6-2. Read/Write Access After a Refresh
[Waveform: SDCS, SDCK, SDRAMC_A[12:0], Cmd and D[31:0] for ACT (row n), NOP, NOP, READ of columns a to d with tRCD = 3 and CAS = 2, then ACT (row m), NOP, NOP, WRITE of columns a to d with tRCD = 3.]

Figure 6-3. Read/Write Access After a Bank Opening
[Waveform: SDCS, SDCK, SDRAMC_A[12:0], Cmd and D[31:0] for PRE, NOP, NOP, ACT (row n), NOP, NOP, READ of columns a to d with tRP = 3, tRCD = 3 and CAS = 2, and for PRE, NOP, NOP, ACT (row m), NOP, NOP, WRITE of columns a to d with tRP = 3 and tRCD = 3.]

7. SDRAM Performance Definition

The SDRAM interface operates at the system bus clock, up to a maximum frequency of 100 MHz. Using a 198 MHz AT91SAM9261 system as an example, the ARM926™ core runs at 198 MHz and the system bus operates at one-half the core frequency, so the SDRAM interface operates at 99 MHz.

The performance of the SDRAM interface is measured in throughput, which is the amount of data that can be transferred to and from the SDRAM in a given time period. The throughput in bytes/second can be expressed by:

T = bytes/s = (bytes/cycle) * (cycles/s)

This depends on the SDRAM clock frequency (SDCK), the number of bytes per transfer (BPT) and the number of cycles.
If the numbers of cycles per read and per write (CPR and CPW) are different, this results in:
• read contribution TR = SDCK * BPT / CPR
• write contribution TW = SDCK * BPT / CPW

Finally, let us introduce the ratios of reads and writes (RR and WR), equal to the fraction of accesses that are reads and writes, respectively. In all cases, RR + WR must equal 1.0.

Formally, the SDRAM throughput (T) can be estimated by the sum of the amount of data transferred by each mode (Read and Write) divided by the sum of access cycles, assuming the SDRAM is doing nothing else during the delay:

T = (RR*TR*CPR + WR*TW*CPW) / (RR*CPR + WR*CPW)

Assuming the BPT is the same for reads and for writes:

T = SDCK * BPT * (RR + WR) / (RR*CPR + WR*CPW)

As RR + WR equals 1.0, finally:

T = SDCK * BPT / (RR*CPR + WR*CPW)

with:
• SDCK: The SDRAM clock is the main factor in determining the SDRAM throughput. As all the accesses are paced by it, the higher the frequency of the SDRAM clock, the higher the SDRAM throughput.
• BPT: The SDRAM Controller allows burst access or single access. In all cases, the SDRAM Controller keeps track of the active row in each bank, thus maximizing performance of the SDRAM.
• Number of cycles per read/write (CPR and CPW): The cycles per read (CPR) equal the CAS latency cycles + N cycles for a burst of N words + 1 cycle for synchronizing with the internal system bus. The cycles per write (CPW) equal N cycles for a burst of N words. Additional cycles are included on a memory boundary or after a refresh command.
• Ratio of reads and writes (RR and WR): This ratio depends on the application and can vary from 99%/1% to 50%/50%.

8. Influence of SDRAM Parameters

8.1 SDRAM Access Type

8.1.1 Single Access

Single accesses occur when a single memory location is accessed per SDRAM access. If the access is a non-cached read, it is the least efficient access possible.
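
The throughput estimate from Section 7 can be sketched in C to compare access types. This is a minimal illustration under stated assumptions, not part of the Atmel software example: the function names sdram_throughput, cycles_per_read and cycles_per_write are hypothetical, and the model ignores refreshes, row changes and bus master overhead.

```c
#include <assert.h>

/* Hypothetical helper: cycles per read for a burst of burst_words words.
 * CPR = CAS latency + N cycles for the burst + 1 bus synchronization cycle. */
unsigned cycles_per_read(unsigned cas_latency, unsigned burst_words)
{
    return cas_latency + burst_words + 1u;
}

/* Hypothetical helper: cycles per write, CPW = N cycles for N words. */
unsigned cycles_per_write(unsigned burst_words)
{
    return burst_words;
}

/* Hypothetical helper: estimated throughput in bytes per second,
 *   T = SDCK * BPT / (RR*CPR + WR*CPW), with WR = 1 - RR.
 * sdck_hz : SDRAM clock frequency in Hz
 * bpt     : bytes per transfer
 * rr      : fraction of accesses that are reads, between 0.0 and 1.0
 * cpr,cpw : cycles per read and per write */
double sdram_throughput(double sdck_hz, double bpt, double rr,
                        double cpr, double cpw)
{
    double wr = 1.0 - rr;

    assert(rr >= 0.0 && rr <= 1.0);
    assert(cpr > 0.0 && cpw > 0.0);

    return sdck_hz * bpt / (rr * cpr + wr * cpw);
}
```

For instance, at SDCK = 99 MHz with a CAS latency of 2 and RR = WR = 0.5, a single-word access (BPT = 4, CPR = 4, CPW = 1) evaluates to about 158 Mbytes/s, whereas an 8-word burst (BPT = 32, CPR = 11, CPW = 8) evaluates to about 333 Mbytes/s.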
8.1.2 Burst Access

Since the number of cycles to access SDRAM is pre-determined, the setup time cannot be reduced, but it can be amortized to minimize its impact. The longer the burst, the higher the SDRAM throughput.

In this typical case:
• SDCK is 99 MHz (cycles/second).
• Case 1: The number of bytes per transfer (BPT) is 32, corresponding to 8 words per transfer.
• Case 2: The number of bytes per transfer (BPT) is 4, corresponding to one single access.
• No memory boundary is reached and no bank is to be opened.
• CAS is 2 (cycles), i.e., CPR is 11 (cycles) and CPW is 8 (cycles) for an 8-word burst access; CPR is 4 (cycles) and CPW is 1 (cycle) for a single access.

T1 = 99M * 32 / ((RR*11) + (WR*8))
T2 = 99M * 4 / ((RR*4) + (WR*1))

Table 8-1. Results for Different Application Cases

RR/WR            80/20   50/50   20/80
T1 (Mbytes/s)    305     333     368
T2 (Mbytes/s)    116     158     247

The user should avoid single accesses for best performance.

In the rest of the document, Case 1 with an RR/WR of 50/50 is the reference.

8.2 SDRAM CAS

The CAS latency can be 1, 2 or 3 cycles. As CAS latency cycles are additional delays, an SDRAM device with a CAS latency of 1 yields better throughput than an SDRAM with a CAS latency of 3. The CAS latency should be selected depending on the operating frequency.

Under the same conditions, selecting an SDRAM device with a CAS latency of 3 means that CPR is 12 (cycles) and CPW is 8 (cycles). Thus:

T = 99M * 32 / ((0.5*12) + (0.5*8)) = 316 Mbytes/s

Relative to the CAS latency 2 SDRAM, this is a permanent throughput reduction of about 17 Mbytes/s (5%). Using an SDRAM with a low CAS latency is desirable.

8.3 SDRAM Refreshes

SDRAM requires periodic refreshes to ensure the integrity of the data arrays. During an SDRAM refresh, accesses by the core are stalled until the refresh completes.

SDRAM requires periodic refresh of all rows every 64 milliseconds.
For SDRAM devices with 4096 rows, this gives a refresh cycle every 15.625 microseconds (64 ms / 4096), or 64,000 refreshes per second.

An auto-refresh command is used to refresh the SDRAM device. Refresh addresses are generated internally by the SDRAM device and incremented automatically after each auto-refresh. The SDRAM Controller generates these auto-refresh commands periodically. An internal timer is loaded with the value in the register SDRAMC_TR, which indicates the number of clock cycles between refresh cycles.

An auto-refresh phase typically requires 11 cycles (TRP + TRC), therefore:

64,000 * 11 = 704,000

SDRAM clock cycles are consumed each second by refreshes, which are not data accesses. While this number is a small fraction of the available 99 MHz SDRAM clock cycles (0.7%), it does represent a throughput reduction of about 2 Mbytes/second.

8.4 Bus Masters

8.4.1 ARM926EJ-S™

The ARM926EJ-S core includes additional address-synchronization cycles in the access. These cycles do not include the 1-cycle data bus synchronization.

Delays occur when performing the initial access to memory due to cache overheads. These include cache lookup failure (potential MMU table walks), checks for write buffer draining, bus granting, etc. Cached cores are designed to perform at their best when operating from the cache; there are always penalty cycles before accesses to external memory occur.

When the read is performed, a burst occurs on the bus and the data is read into ARM registers. For the subsequent write, the data is sent directly into the core’s write buffer, which drains in parallel with the core operation. If the subsequent operation is a read from external memory, then a delay occurs until the write buffer drain completes.

In summary, read and write actions mask the time for the write buffer to drain. A write saturates the buffer, and only a read can see the cache miss penalty for each LDM operation.

8.4.2 DMA

A DMA can access the SDRAM without any additional cycles.
The throughput is maximized.

8.5 SDRAM Memory Boundaries

When the memory row boundary has been reached, the row is closed (PRECHARGE command) before opening the new row. TRP and TRCD, which are 2 cycles long each, are to be added, so CPR is 15 (cycles) and CPW is 12 (cycles). The throughput becomes:

T = 99M * 32 / ((0.5*15) + (0.5*12)) = 234 Mbytes/s

This event occurs each time a burst reaches a memory row boundary. In the worst case, with an SDRAM page size of 256 bytes, this event occurs 8 / 256 = 3% of the time; 1.5% with an SDRAM page size of 512 bytes.

Over time, the throughput reduction is about:

(333 - 234) * 3% = about 3 Mbytes/s = 0.9%

This influence is too small to be considered.

8.6 Conclusion

To summarize the influence of SDRAM parameters:
• As the influence of the SDRAM clock is essential, it must be set appropriately.
• SDRAM CAS latency impacts the throughput. The CAS latency must be set to the lowest value matching the SDRAM frequency.
• SDRAM page size has no measurable influence.
• The SDRAM refresh register should be set to an optimal value. A refresh delay shorter than necessary penalizes the throughput without any positive influence.

9. AT91SAM9261 SDRAM Controller Configuration

9.1 Initialization Sequence

The initialization sequence is generated by software. The SDRAM devices are initialized by the following sequence:
1. SDRAM features must be set in the configuration register: asynchronous timings (TRC, TRAS, etc.), number of columns, number of rows, CAS latency, and the data bus width.
2. The SDRAM memory type must be set in the Memory Device Register.
3. A minimum pause of 200 µs is provided to precede any signal toggle.
4. An All Banks Precharge command is issued to the SDRAM devices. The application must set Mode to 2 in the Mode Register and perform a write access to any SDRAM address.
5. Eight auto-refresh (CBR) cycles are provided.
The application must set Mode to 4 in the Mode Register and perform a write access to any SDRAM location eight times.
6. A Mode Register set (MRS) cycle is issued to program the parameters of the SDRAM devices, in particular CAS latency and burst length. The application must set Mode to 3 in the Mode Register and perform a write access to the SDRAM. The write address must be chosen so that BA[1:0] are set to 0. For example, with a 16-bit 128-Mbit SDRAM (12 rows, 9 columns, 4 banks), the SDRAM write access should be done at the address 0x20000000.
7. The application must go into Normal Mode, setting Mode to 0 in the Mode Register and performing a write access at any location in the SDRAM.
8. Write the refresh rate into the count field in the SDRAMC Refresh Timer register (refresh rate = delay between refresh cycles). The SDRAM device requires a refresh every 15.625 µs or 7.81 µs. With a 100 MHz frequency, the Refresh Timer Counter Register must be set to the value 1562 (15.625 µs x 100 MHz) or 781 (7.81 µs x 100 MHz).

After initialization, the SDRAM devices are fully functional. Initialization can only be carried out once.

9.2 Micron® 48LC16M16A2-75

The Micron 48LC16M16A2-75 devices are 256-Mbit devices arranged as 4 Meg x 16 x 4 banks with a CAS latency of 2 at 100 MHz. These devices are mounted on the AT91SAM9261-EK evaluation kits.

Table 9-1 gives the settings for two 16-bit SDRAM devices connected in 32-bit mode. This configuration is represented in Section 4.2 “SDRAM 32-bit Connection”. Table 9-1 describes only software-related settings. Additionally, the PIOC PC16-PC31 lines have to be configured as D[31:16] for a 32-bit wide data bus.

Table 9-1. Settings for Two 16-bit SDRAM Devices Connected in 32-bit Mode

Description                                              Register/Field       Settings        Value
System
  PLL Frequency                                          PMC_PLLAR            198 MHz         0x20603F09
  Processor / Bus Clock                                  PMC_MCKR             198 / 99 MHz    0x00000102
  EBI Chip Select Assignment                             EBI_CSA, EBI_CS1A    SDRAMC          b10
48LC16M16A2-75                                           SDRAMC_CR                            0x85227258
  SDRAM Device Databus Width (16 bits)                   DBW                  32 bits         0
  Number of Columns (9)                                  NC                   9               b01
  Number of Rows (13)                                    NR                   13              b10
  Number of Banks (4)                                    NB                   4               b1
  CAS Latency (2 cycles)                                 CAS                  2 cycles        b10
  Last DATA-IN to PRECHARGE time (15 ns)                 TWR                  2 cycles        2
  REFRESH to ACTIVATE time (66 ns)                       TRC                  7 cycles        7
  PRECHARGE to ACTIVATE time (20 ns)                     TRP                  2 cycles        2
  ACTIVATE to READ/WRITE time (20 ns)                    TRCD                 2 cycles        2
  ACTIVATE to PRECHARGE time (44 ns)                     TRAS                 5 cycles        5
  SELF REFRESH write to ACTIVATE time (75 ns)            TXSR                 8 cycles        8
  SDRAM Refresh Timer Register - timer count (7 µs)      SDRAMC_TR            7 µs            0x2b5

9.3 Software Example

The code below is based on the Atmel libV3 definitions.

#include "AT91SAM9261.h"
#include "lib_AT91SAM9261.h"

#define AT91C_MASTER_CLOCK 100000000
/* Base address of the SDRAM */
#define AT91C_SDRAM ((volatile unsigned int *)0x20000000)

//*----------------------------------------------------------------------------
//* Function Name     : AT91F_InitSDRAM16
//* Object            : Initialize the SDRAM in 16-bit mode
//* Input Parameters  :
//* Output Parameters :
//*----------------------------------------------------------------------------
void AT91F_InitSDRAM16 (void)
{
    /* The 200 µs power-up pause required by the initialization sequence
       is assumed to have elapsed before this function is called */

    /* Assign the CS1 to SDRAM function */
    (*AT91C_MATRIX_EBICSA) |= AT91C_MATRIX_CS1A_SDRAMC;

    /* Set the SDRAM features */
    *AT91C_SDRAMC_CR = AT91C_SDRAMC_NC_9 |
                       AT91C_SDRAMC_NR_13 |
                       AT91C_SDRAMC_CAS_2 |
                       AT91C_SDRAMC_NB_4_BANKS |
                       AT91C_SDRAMC_DBW_16_BITS |
                       AT91C_SDRAMC_TWR_2 |
                       AT91C_SDRAMC_TRC_7 |
                       AT91C_SDRAMC_TRP_2 |
                       AT91C_SDRAMC_TRCD_2 |
                       AT91C_SDRAMC_TRAS_5 |
                       AT91C_SDRAMC_TXSR_8;

    /* Perform an All Banks Precharge command (Mode = 2) */
    *AT91C_SDRAMC_MR = 0x00000002;
    *AT91C_SDRAM = 0;

    /* Perform 8 auto-refresh (CBR) cycles */
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_RFSH_CMD;
    *AT91C_SDRAM = 0;
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_RFSH_CMD;
    *AT91C_SDRAM = 0;
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_RFSH_CMD;
    *AT91C_SDRAM = 0;
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_RFSH_CMD;
    *AT91C_SDRAM = 0;
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_RFSH_CMD;
    *AT91C_SDRAM = 0;
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_RFSH_CMD;
    *AT91C_SDRAM = 0;
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_RFSH_CMD;
    *AT91C_SDRAM = 0;
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_RFSH_CMD;
    *AT91C_SDRAM = 0;

    /* Perform a Mode Register set (MRS) cycle to program the parameters
       of the SDRAM devices */
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_LMR_CMD;
    *AT91C_SDRAM = 0;

    /* Set refresh rate into the SDRAMC Refresh Timer register (7.8 µs) */
    *AT91C_SDRAMC_TR = 780; /* 780 = AT91C_MASTER_CLOCK * 7.8 µs */

    /* Set Normal mode */
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_NORMAL_CMD;
    *AT91C_SDRAM = 0;
}

//*----------------------------------------------------------------------------
//* Function Name     : AT91F_InitSDRAM32
//* Object            : Initialize the SDRAM in 32-bit mode
//* Input Parameters  :
//* Output Parameters :
//*----------------------------------------------------------------------------
void AT91F_InitSDRAM32 (void)
{
    /* Assign the CS1 to SDRAM function */
    (*AT91C_MATRIX_EBICSA) |= AT91C_MATRIX_CS1A_SDRAMC;

    /* Configure the PIO lines multiplexed with data[31:16] in peripheral mode */
    AT91F_SDRAMC_CfgPIO();

    /* Set the SDRAM features */
    *AT91C_SDRAMC_CR = AT91C_SDRAMC_NC_9 |
                       AT91C_SDRAMC_NR_13 |
                       AT91C_SDRAMC_CAS_2 |
                       AT91C_SDRAMC_NB_4_BANKS |
                       AT91C_SDRAMC_DBW_32_BITS |
                       AT91C_SDRAMC_TWR_2 |
                       AT91C_SDRAMC_TRC_7 |
                       AT91C_SDRAMC_TRP_2 |
                       AT91C_SDRAMC_TRCD_2 |
                       AT91C_SDRAMC_TRAS_5 |
                       AT91C_SDRAMC_TXSR_8;

    /* Perform an All Banks Precharge command (Mode = 2) */
    *AT91C_SDRAMC_MR = 0x00000002;
    *AT91C_SDRAM = 0;

    /* Perform 8 auto-refresh (CBR) cycles */
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_RFSH_CMD;
    *AT91C_SDRAM = 0;
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_RFSH_CMD;
    *AT91C_SDRAM = 0;
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_RFSH_CMD;
    *AT91C_SDRAM = 0;
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_RFSH_CMD;
    *AT91C_SDRAM = 0;
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_RFSH_CMD;
    *AT91C_SDRAM = 0;
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_RFSH_CMD;
    *AT91C_SDRAM = 0;
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_RFSH_CMD;
    *AT91C_SDRAM = 0;
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_RFSH_CMD;
    *AT91C_SDRAM = 0;

    /* Perform a Mode Register set (MRS) cycle to program the parameters
       of the SDRAM devices */
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_LMR_CMD;
    *AT91C_SDRAM = 0;

    /* Set refresh rate into the SDRAMC Refresh Timer register (7.8 µs) */
    *AT91C_SDRAMC_TR = 780; /* 780 = AT91C_MASTER_CLOCK * 7.8 µs */

    /* Set Normal mode */
    *AT91C_SDRAMC_MR = AT91C_SDRAMC_MODE_NORMAL_CMD;
    *AT91C_SDRAM = 0;
}

9.4 ARM926EJ-S Access Results without MMU

The system is configured as described in Table 9-1, “Settings for Two 16-bit SDRAM Devices Connected in 32-bit Mode”. Only the Icache is enabled, to minimize the impact of code execution on measurements. The ARM Data Master is set as the fixed default master for the EBI.

The code copies 32 Mbytes from 0x20000000 to 0x22000000 with different methods. This means 64 Mbytes are transferred.

As SDCK is 99 MHz, one cycle corresponds to about 10 ns in all the waveforms.

9.4.1 Single Access

9.4.1.1 Description

The code performs read and write accesses of one word consecutively. Under these conditions, the ARM core adds 3 cycles between Write and Read and 6 cycles between Read and Write. These cycles include the code fetching in internal SRAM. This is the major effort of the ARM core in this case.

Figure 9-1. SDRAM CAS Signal for a Single Access without MMU

The cycles per read (CPR) equal 2 CAS latency cycles + 1 cycle for 1 word + 6 cycles for the ARM core + 1 cycle for bus synchronization.
The cycles per write (CPW) equal 1 cycle for a burst of 1 word + 3 cycles for the ARM core. The RR equals WR.

Theoretically, the throughput equals:

T = 99M * 4 / (0.5*10 + 0.5*4) = 57 Mbytes/s

9.4.1.2 Result

The 64 Mbytes are transferred in 1183 ms including code execution time. This gives a throughput of about 54 Mbytes/s.

9.4.2 4-word Burst Access

9.4.2.1 Description

The code performs read and write accesses consecutively with 4-word long ldmia and stmia instructions. Under these conditions, the ARM core adds 3 cycles between Write and Read and 5 cycles between Read and Write. These cycles include the code fetching in internal SRAM.

Figure 9-2. SDRAM CAS Signal for a 4-word Burst Access without MMU

The cycles per read (CPR) equal 2 CAS latency cycles + 4 cycles for a burst of 4 words + 5 cycles for the ARM core + 1 cycle for bus synchronization. The cycles per write (CPW) equal 4 cycles for a burst of 4 words + 3 cycles for the ARM core. The RR equals WR.

Theoretically, the throughput equals:

T = 99M * 16 / (0.5*12 + 0.5*7) = 166 Mbytes/s

9.4.2.2 Result

The 64 Mbytes are transferred in 422 ms including code execution time. This gives a throughput of about 152 Mbytes/s.

9.4.3 8-word Burst Access

9.4.3.1 Description

The code performs read and write accesses consecutively with 8-word long ldmia and stmia instructions. Under these conditions, the ARM core adds 4 cycles between Write and Read and 5 cycles between Read and Write. These cycles include the code fetching in internal SRAM.

Figure 9-3. SDRAM CAS Signal for an 8-word Burst Access without MMU

The cycles per read (CPR) equal 2 CAS latency cycles + 8 cycles for a burst of 8 words + 5 cycles for the ARM core + 1 cycle for bus synchronization. The cycles per write (CPW) equal 8 cycles for a burst of 8 words + 4 cycles for the ARM core. The RR equals WR.
Theoretically, the throughput equals:

T = 99M * 32 / (0.5*16 + 0.5*12) = 226 Mbytes/s

9.4.3.2 Result

The 64 Mbytes are transferred in 295 ms including code execution time. This gives a throughput of about 217 Mbytes/s.

9.5 ARM926EJ-S Access Results with MMU

The system is configured as described in Table 9-1. The MMU, Icache and Dcache are enabled to optimize SDRAM accesses and minimize the impact of code execution on measurements. The ARM Data Master is set as the fixed default master for the EBI.

9.5.1 Single Access

9.5.1.1 Description

The code performs read and write accesses of one word consecutively. With the MMU and the Data Cache, the ARM core optimizes the access and performs a burst access to drain the write buffer.

Figure 9-4. SDRAM CAS Signal for a Single Access with MMU

As it is difficult to separate Read and Write accesses, use the number of cycles for a 16-word long transfer (8 words are read and 8 words are written) as shown on the waveforms. Thus the number of cycles CPR + CPW + ARM core cycles is:

4 + 1 + 1 + 1 + 1 + 1 + 3 + 1 + 2 + 1 + 3 + 1 + 2 + 1 + 3 + 9 = 35

Theoretically, the throughput equals:

T = 99M * 64 / 35 = 181 Mbytes/s

9.5.1.2 Result

The 64 Mbytes are transferred in 369 ms including code execution time. This gives a throughput of about 173 Mbytes/s.

9.5.2 4-word Burst Access

9.5.2.1 Description

The code performs read and write accesses consecutively with 4-word long ldmia and stmia instructions. With the MMU and the Data Cache, the ARM core optimizes the access.

Figure 9-5. SDRAM CAS Signal for a 4-word Access with MMU

As it is difficult to separate Read and Write accesses, use the number of cycles for a 16-word long transfer (8 words are read and 8 words are written) as shown on the waveforms.
Thus the number of cycles (CPR + CPW + ARM core cycles) is: 12 + 4 + 4 + 3 = 23. Theoretically, the throughput equals:

T = 99M * 64 / 23 = 275 Mbytes/s

9.5.2.2 Result

The 64 Mbytes are transferred in 242 ms, including code execution time. This gives a throughput of about 264 Mbytes/s.

9.5.3 8-word Burst Access

9.5.3.1 Description

The code performs read and write accesses consecutively with 8-word ldmia and stmia instructions. With the MMU and the Data Cache, the ARM core optimizes the access and performs 16-word burst accesses.

Figure 9-6. SDRAM CAS Signal for an 8-word Access with MMU

As it is difficult to separate Read and Write accesses, use the number of cycles for a 16-word transfer (8 words are read and 8 words are written) as shown on the waveforms. Thus the number of cycles (CPR + CPW + ARM core cycles) is: 16 + 7 = 23. Theoretically, the throughput equals:

T = 99M * 64 / 23 = 275 Mbytes/s

9.5.3.2 Result

The 64 Mbytes are transferred in 242 ms, including code execution time. This gives a throughput of about 264 Mbytes/s.

9.6 SDRAM Performance Conclusion

9.6.1 Theory versus Real Life

The difference of about 5% between theory and measurement is due to the following factors, from the most important to the least important:
1. The ARM core executes code from the Instruction Cache, and the transfer time includes this code execution time.
2. The PRECHARGE command and bank opening are ignored in the theory.
3. The PIT accuracy is 1 ms.

9.6.2 Checklist

• The SDRAM clock has an essential influence on throughput and must be set appropriately.
• SDRAM CAS latency impacts the throughput. The CAS latency must be set to a value matching the SDRAM frequency.
• SDRAM page size has no measurable influence.
• The SDRAM refresh register must be set to an optimal value. A refresh period shorter than necessary only reduces the throughput, with no benefit.
• Software should take advantage of the SDRAM open-bank policy by locating code, data, etc. on separate SDRAM bank and row boundaries.
• Software should avoid single-beat accesses for best performance.
• Use the MMU, Icache and Dcache whenever possible for best performance and minimum penalty in code running time.
• 8-word burst accesses are not necessary when the MMU and caches are enabled, as the same results are obtained with 4-word bursts (about 264 Mbytes/s in both cases) because of ARM optimization. 4-word accesses use fewer registers and are easier to manage in software.
• Software should assign each Bus Master to its own SDRAM bank to save bank opening time, especially for high-bandwidth peripherals such as the LCD DMA.

Disclaimer: The information in this document is provided in connection with Atmel products. No license, express or implied, by estoppel or otherwise, to any intellectual property right is granted by this document or in connection with the sale of Atmel products. EXCEPT AS SET FORTH IN ATMEL’S TERMS AND CONDITIONS OF SALE LOCATED ON ATMEL’S WEB SITE, ATMEL ASSUMES NO LIABILITY WHATSOEVER AND DISCLAIMS ANY EXPRESS, IMPLIED OR STATUTORY WARRANTY RELATING TO ITS PRODUCTS INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. IN NO EVENT SHALL ATMEL BE LIABLE FOR ANY DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE, SPECIAL OR INCIDENTAL DAMAGES (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION, OR LOSS OF INFORMATION) ARISING OUT OF THE USE OR INABILITY TO USE THIS DOCUMENT, EVEN IF ATMEL HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Atmel makes no representations or warranties with respect to the accuracy or completeness of the contents of this document and reserves the right to make changes to specifications and product descriptions at any time without notice. Atmel does not make any commitment to update the information contained herein.
Unless specifically provided otherwise, Atmel products are not suitable for, and shall not be used in, automotive applications. Atmel’s products are not intended, authorized, or warranted for use as components in applications intended to support or sustain life.

© 2006 Atmel Corporation. All rights reserved. Atmel®, logo and combinations thereof, Everywhere You Are® and others are registered trademarks or trademarks of Atmel Corporation or its subsidiaries. ARM®, the ARMPowered® logo, Thumb® and others are registered trademarks or trademarks of ARM Ltd. Other terms and product names may be trademarks of others.