Application Note AC422
SmartFusion2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.4

Table of Contents

Purpose
Introduction
References
Design Requirements
Optimization Techniques
    Frequency of Operation
    Burst Length
    AXI Master without Write Response State
    Read Address Queuing
    Series of Writes or Reads
    DDR Configuration Tuning
Implementation on SmartFusion2 Device
    Design Description
    Hardware Implementation
        Configuring the System Builder
        Simulation using Micron DDR3 SDRAM Model
        Simulation using Microsemi DDR3 SDRAM VIP Model
    Software Implementation
    Running the Design
        Board Jumper Settings
        Host PC to Board Connections
        USB Driver Installation
        Steps to Run the Design
        DDR3 SDRAM Bandwidth
Conclusion
Appendix A – Design Files
List of Changes

Purpose

This document describes techniques for improving the efficiency of the double data rate (DDR) controller, using an example design for the SmartFusion®2 Development Kit board. It also provides details about implementing the DDR SDRAM simulation flow using the Micron DDR3 SDRAM model and the Microsemi DDR3 SDRAM verification IP (VIP) model.

August 2014
© 2014 Microsemi Corporation

Introduction

The SmartFusion2 device has two high-speed hardened ASIC memory controllers, the microcontroller subsystem (MSS) DDR (MDDR) and the fabric DDR (FDDR), for interfacing with DDR2, DDR3, and LPDDR1 SDRAM memories. The MDDR and FDDR subsystems are used to access high-speed DDR memories for high-speed data transfer and code execution.

The DDR memory connected to the MDDR subsystem can be accessed by the MSS masters and by master logic implemented in the FPGA fabric (FPGA fabric master), whereas the DDR memory connected to the FDDR subsystem can be accessed only by an FPGA fabric master. The FPGA fabric masters communicate with the MDDR and FDDR subsystems through the AXI or AHB interfaces. Figure 1 illustrates the MDDR data path for AXI/AHB interfaces.

Figure 1 • MDDR Data Path for AXI/AHB Interfaces

The AXI interface is typically used for burst transfers that provide an efficient access path and high throughput.
Though the throughput depends on many system-level parameters, it can be improved by applying specific optimization techniques. This application note describes a few DDR SDRAM controller optimization techniques with an example design for the SmartFusion2 Development Kit board. Refer to the SmartFusion2 SoC FPGA High Speed DDR Interfaces User Guide for more information on the MDDR and FDDR subsystems.

The sample design consists of an AXI master, LSRAM, and counters for throughput measurement. During the write operation, the AXI master reads the LSRAM, writes to the DDR3 memory, and measures the throughput. During the read operation, the AXI master reads the DDR3 memory, writes to the LSRAM, and measures the throughput. The throughput values are displayed on the host PC using the UART interface.

Revision 3

The following types of memory simulation models can be used:
• Microsemi-provided generic DDR memory simulation model (VIP): Libero® System-on-Chip (SoC) includes a JEDEC-compliant VIP model. This VIP model is attached to the pin side of the MDDR/FDDR subsystem and simulates the functionality of a DDR memory device. It can be configured for DDR2, DDR3, and LPDDR SDRAM memories. It is intended to complement vendor models or to act as a substitute when a vendor model is not available.
• Vendor-specific memory model: Memory vendors such as Micron, Samsung, and Hynix provide downloadable simulation models for specific memory devices. Make sure that the downloaded simulation model is JEDEC compliant.

This document also describes the DDR SDRAM simulation flow using the Micron DDR3 SDRAM and Microsemi DDR3 SDRAM VIP models.

References

The following list of references is used in this document.
These references complement and help in understanding the relevant Microsemi® SmartFusion2 System-on-Chip (SoC) field programmable gate array (FPGA) device features and flows that are described in this document:
• SmartFusion2 SoC FPGA High Speed DDR Interfaces User Guide
• Connecting User Logic to AXI Interfaces of High-Performance Communication Blocks-SmartFusion2
• Connecting User Logic to the SmartFusion Microcontroller Subsystem
• DDR Controller and Serial High Speed Controller Initialization Methodology
• SmartFusion2 Development Kit User Guide

Design Requirements

Table 1 lists the design requirements.

Table 1 • Design Requirements

Design Requirements                 Description
Hardware Requirements
  SmartFusion2 Development Kit      Rev D or later
  Host PC or Laptop                 Any 64-bit Windows Operating System
Software Requirements
  Libero SoC                        v11.4
  SoftConsole                       v3.4
  One of the following serial terminal emulation programs:
    • HyperTerminal
    • TeraTerm
    • PuTTY

Optimization Techniques

This section describes the following optimization techniques:
• Frequency of Operation
• Burst Length
• AXI Master without Write Response State
• Read Address Queuing
• Series of Writes or Reads
• DDR Configuration Tuning

Frequency of Operation

The MDDR and FDDR subsystems support clock management dividers directly inside the embedded block. The divider ratios can be selected directly from the Clock Configurator for the DDR clocks (MDDR_CLK/FDDR_CLK) and the DDR_FIC clock. The best overall throughput is obtained with a 2:1 ratio, that is, with the DDR_FIC clock running at half the DDR clock frequency. Many other ratios are possible to provide flexibility to the FPGA design. To show the optimal data throughput, all examples in this application note use the 2:1 ratio. The design example uses a 64-bit AXI interface as the FPGA fabric interface and is configured to use 333.33 MHz as the DDR clock frequency and 166.66 MHz as the AXI clock.
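As a rough cross-check of why the 2:1 ratio suits this design, the peak data rate on each side of the controller can be compared. The sketch below is illustrative only: the function names are hypothetical, and it assumes the 16-bit DDR3 data width used by the example design, two transfers per DDR clock, and one AXI beat per AXI clock, ignoring all protocol overhead.

```c
/* Peak data rate on the DDR pins in MB/s (10^6 bytes per second).
 * Assumption: two data transfers per DDR clock cycle. */
double ddr_peak_mbps(double ddr_clk_mhz, unsigned bus_width_bits)
{
    return ddr_clk_mhz * 2.0 * ((double)bus_width_bits / 8.0);
}

/* Peak data rate on the AXI data channel in MB/s.
 * Assumption: one data beat per AXI clock cycle. */
double axi_peak_mbps(double axi_clk_mhz, unsigned bus_width_bits)
{
    return axi_clk_mhz * ((double)bus_width_bits / 8.0);
}
```

With a 333.33 MHz clock on a 16-bit DDR bus and a 166.66 MHz clock on a 64-bit AXI bus, both functions return roughly 1333 MB/s, which suggests that at this ratio neither side limits the other.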
166.66 MHz is the highest clock frequency available to run the MDDR_CLK, as this is the limit of the MSS CLK_BASE.

Burst Length

The MDDR and FDDR subsystems support DRAM burst lengths of 4, 8, or 16, depending on the configured bus width and the DDR type. The AXI transaction controller in the MDDR and FDDR subsystems supports burst reads and writes of up to 16 beats. Both the AXI burst length (write and read) and the DRAM burst length affect performance; setting the maximum supported burst length for both the DDR SDRAM and the AXI interface gives the best performance. The design example uses a DDR SDRAM burst length of 8 and an AXI write and read burst length of 16 beats.

Note: The design example is designed to run on the SmartFusion2 Development Kit board, which has the SmartFusion2 M2S050 device and a Micron DDR3 SDRAM with the part number MT41J256M8HX-15E. Both devices support a maximum burst length of 8.

AXI Master without Write Response State

When the AXI master sends the last data item (D(A15)), the WLAST signal goes HIGH, indicating the last transfer of the first write burst. When the AXI slave in the DDR subsystem has accepted all the data items, it drives a write response (BVALID) back to the master to indicate that the write transaction is complete. In the normal AXI write sequence, the AXI master waits for the write response before initiating the next write transaction. However, the time spent waiting for the write response wastes clock cycles and reduces the overall throughput. The AXI master can instead send the second burst write address (B) without waiting for the write response of the first burst. This improves the write throughput by reducing the wait states. This application note is focused on optimal throughput and therefore the write response channel is not verified.
It is recommended that, when using this technique, the write response channel is monitored concurrently with starting the next transfer to ensure that the previous write data has been fully accepted. The AXI protocol defines a methodology for handling the termination of a write burst transaction, which should be followed if the write response channel returns a non-OKAY value.

Figure 2 illustrates the write transaction timing diagram without the write response state.

Figure 2 • Write Transaction Timing Diagram without Write Response State
(Signals shown: ACLK, AWADDR, AWVALID, WDATA, WLAST, BRESP, BVALID)

Read Address Queuing

The MDDR and FDDR subsystems support up to four outstanding read transactions. Figure 3 illustrates the burst read address queuing timing diagram. With the 2:1 clock ratio, the MDDR controller starts the burst read transaction before the command FIFO is full, which allows the AXI master to send five burst read addresses.

Figure 3 • Read Transaction Timing Diagram with Burst Read Address Queuing
(Signals shown: ACLK, ARADDR, ARVALID, ARREADY, ARID)

The AXI master increments the burst read address as long as the AXI slave in the DDR subsystem asserts the ARREADY signal. Burst read address queuing significantly increases the read throughput compared to the normal AXI read sequence. Table 7 and Table 8 show this significant improvement.

Read address queuing does not reduce the initial latency associated with a DDR memory read access. However, when multiple reads are issued in sequence, the initial latency is incurred only for the first read. After the first read data is returned, the remainder of the requested data is returned in sequence without the large read access penalty associated with the first read.
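The effect of queuing can be approximated with a simple cycle-count model: without queuing, every burst pays the initial read latency, whereas with queuing the latency is paid once and the data beats of successive bursts follow back to back. The sketch below is an idealized model under stated assumptions (hypothetical function names, a fixed latency in AXI clocks, no refresh or row-change penalties, and no limit on outstanding transactions); it is not a description of the MDDR controller's internal behavior.

```c
/* Idealized AXI clock count for reading `bursts` bursts of `beats` beats
 * when each read address is issued only after the previous burst completes:
 * every burst pays the full initial latency. */
unsigned long cycles_unqueued(unsigned bursts, unsigned beats, unsigned latency)
{
    return (unsigned long)bursts * ((unsigned long)latency + beats);
}

/* Idealized AXI clock count when read addresses are queued ahead of the
 * data: the initial latency is paid once, then data beats stream
 * back to back. */
unsigned long cycles_queued(unsigned bursts, unsigned beats, unsigned latency)
{
    return (unsigned long)latency + (unsigned long)bursts * beats;
}
```

With 16 bursts of 16 beats and an assumed latency of 20 clocks, the model gives 576 clocks without queuing and 276 clocks with queuing. Measured values will differ because the real controller adds effects that the model leaves out.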
Series of Writes or Reads

The performance of the MDDR and FDDR subsystems depends on the method of data transfer between the DDR SDRAM and the AXI master. The following methods of data transfer reduce performance:
1. Single-beat burst read and write operations
2. Random read and write operations
3. Switching between read and write operations

The performance of the MDDR and FDDR subsystems increases when a series of reads or writes is performed on the same bank and row. Figure 4 illustrates the AXI to DDR3 address mapping for the DDR3 SDRAM on the SmartFusion2 Development Kit board.

Figure 4 • AXI to DDR3 Address Mapping

When the AXI address crosses 0x0800, the DDR subsystem activates Row 0 of Bank 1. Row 1 of Bank 0 is activated only when the AXI address crosses 0x4000. Each time a new row is accessed, the bank must be precharged first. This means that additional time is needed before the row can be accessed, which reduces the overall throughput. Understanding the internal memory layout of the DDR and how it maps to the AXI address makes it possible to arrange accesses so that row changes are minimized and the overall throughput is increased.

DDR Configuration Tuning

The DDR SDRAM datasheet specifies, in units of time, the timing parameters required for proper operation. These timings must match the configuration registers in the MDDR/FDDR controller. The controller requires the timing parameters as a number of DDR clock cycles, and they are entered in the DDR Configurator GUI. Selecting the minimum write or read delay values can result in optimal performance. Implementing this approach requires extensive memory testing to ensure that the memory transfers are stable. A default configuration file for setting up the MDDR controller for the SmartFusion2 Development Kit DDR3 memory is available on the kit's documentation web page.
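Converting a datasheet time into DDR clock cycles amounts to dividing by the clock period and rounding up, so that the programmed delay is never shorter than the datasheet minimum. A minimal sketch, assuming times are expressed in picoseconds; the function name and the input values in the usage note are hypothetical and are not taken from the Micron datasheet:

```c
#include <stdint.h>

/* Convert a minimum timing requirement into whole DDR clock cycles.
 * Both arguments are in picoseconds. The result is rounded up so the
 * programmed delay is never shorter than the required time.
 * Returns 0 if the clock period is 0 (invalid input). */
uint32_t timing_to_clocks(uint32_t time_ps, uint32_t clk_period_ps)
{
    if (clk_period_ps == 0u) {
        return 0u;
    }
    return (time_ps + clk_period_ps - 1u) / clk_period_ps;
}
```

For a clock period of 3000 ps (approximately the 333.33 MHz DDR clock of the example design), a 15000 ps requirement converts to 5 clocks, and a 13500 ps requirement also rounds up to 5 clocks.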
Table 2 lists the parameters that were tuned to give better performance than the default configuration file.

Table 2 • Tuned DDR Timing Parameters

Parameters    Default Values    Tuned Values
CAS           6 (clk)           5
RAS min       15                12
RAS max       8192              22528
RCD           6 (clk)           5
RP            7 (clk)           5
REFI          3104              2592
RC            51                17
RFC           79                54
WR            6                 5
FAW           32                10

Implementation on SmartFusion2 Device

The optimization techniques described in the previous section have been implemented and validated on the SmartFusion2 Development Kit board. This section describes the following:
• Design Description
• Hardware Implementation
• Software Implementation
• Running the Design

Design Description

The design consists of the MSS, CoreConfigP IP, CoreResetP IP, SYSRESET_POR macro, on-chip 25/50 MHz RC oscillator, fabric CCC (FCCC), AXI master (AXI_IF), AHB master (AHB_IF), and a command decoder (CMD_Decoder). Figure 5 shows the block diagram of the design.

Figure 5 • Top-Level Block Diagram of the Design

The MSS is configured to use one UART interface (MMUART_0), the MSS clock conditioning circuit (MSS_CCC), the RESET controller, eight GPIOs, one instance of the fabric interface (FIC_0), FIC_2 (peripheral initialization), and the MDDR. The FIC_0 interface is configured to use a slave interface with the AHB-Lite (AHBL) interface type.
The FIC_2 is configured to initialize the MSS DDR using the ARM® Cortex™-M3 processor along with the CoreConfigP, CoreResetP, and SYSRESET_POR macro. The MMUART_0 is used as an interface for writing to HyperTerminal. Eight GPIOs are configured as outputs and routed to the FPGA fabric. The Cortex-M3 processor initiates the AXI write and read operations using these GPIOs. The MDDR is configured to use the DDR3 interface, and its AXI interface is routed to the FPGA fabric.

The FCCC is configured to provide the 166.6 MHz reference clock to the MSS_CCC and the fabric logic. The on-chip 25 MHz/50 MHz RC oscillator is the reference clock source for the FCCC. Table 3 lists the MSS_CCC generated clocks.

Table 3 • MSS_CCC Generated Clocks

Clock Name          Frequency in MHz
M3_CLK              166.6
MDDR_CLK            333.2
DDR_SMC_FIC_CLK     166.6
APB_0               83.3
APB_1               83.3
FIC_0_CLK           166.6

The command decoder receives the AXI transaction control from the Cortex-M3 processor through the GPIOs and generates the write, read, write size, and read size signals. Figure 6 illustrates the command decoding.

Figure 6 • Command Decoding

The AXI master block consists of an AXI read channel, an AXI write channel, a write throughput counter, a read throughput counter, and a 512x64 LSRAM. It performs the write or read operation (see Footnote 1) based on the input signals from the command decoder. During the write operation, the AXI master reads the LSRAM and writes into the DDR3 memory, and then measures the write throughput. During the read operation, the AXI master reads the DDR3 memory and writes into the LSRAM, and then measures the read throughput. The write throughput counter counts the AXI clocks from AWVALID of the first data to WLAST of the last data. Similarly, the read throughput counter counts the AXI clocks from ARVALID of the first data to RLAST of the last data.
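The cycle counts produced by these counters can be converted into a throughput figure once the transfer size and the AXI clock frequency are known. The sketch below shows one way to do the conversion; the function name is hypothetical, MB is taken as 10^6 bytes, and this is not necessarily how the example firmware computes the values it prints.

```c
#include <stdint.h>

/* Convert a throughput counter value into MB/s (10^6 bytes per second).
 *   bytes         : number of bytes moved during the counted interval
 *   axi_clk_count : AXI clock cycles counted for that interval
 *   axi_clk_mhz   : AXI clock frequency in MHz
 * Elapsed time in microseconds is axi_clk_count / axi_clk_mhz, and
 * bytes per microsecond is numerically equal to MB/s.
 * Returns 0.0 for a zero count to avoid dividing by zero. */
double throughput_mbps(uint32_t bytes, uint32_t axi_clk_count, double axi_clk_mhz)
{
    if (axi_clk_count == 0u) {
        return 0.0;
    }
    return (double)bytes * axi_clk_mhz / (double)axi_clk_count;
}
```

For instance, 2048 bytes transferred in 256 clocks of a 166.66 MHz AXI clock (one 64-bit beat on every clock, an ideal case) corresponds to about 1333 MB/s.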
After the write or read operation is triggered, the AXI master performs it eight times to obtain the average throughput. During the write operation, the write address (AWADDR) starts from 0x00000000 and is incremented by 128 (one 16-beat burst of 64-bit data). During the read operation, the read address (ARADDR) starts from 0x01000000 and is incremented by 128.

After each write or read operation, the AXI master sends the throughput count value and an eSRAM address, starting from 0x20008104, to the AHBL master. The AHBL master then writes the throughput values into the eSRAM. After that, the Cortex-M3 processor reads the values and sends them to the host PC using the UART interface.

Refer to the Connecting User Logic to AXI Interfaces of High-Performance Communication Blocks in the SmartFusion2 Devices application note for information on creating a custom AXI interface on user logic.

Refer to the Connecting User Logic to the SmartFusion Microcontroller Subsystem application note for information on creating a custom AHB interface on user logic.

Footnote 1: The write or read operation depends on the size of the write or read data. For example, if the write size is selected as 2 KB, then one AXI write operation equals 16 bursts of 16 beats each (16 x 16 x 64 bits).

Hardware Implementation

The hardware implementation involves:
• Configuring the System Builder
• Connecting with the user logic: an AXI master (AXI_IF), an AHB master (AHB_IF), and a command decoder (CMD_Decoder)

Figure 7 shows the top-level SmartDesign of the example design.

Figure 7 • Top-Level SmartDesign

Configuring the System Builder

This section describes how to configure the MDDR and other device features and then build a complete system using the System Builder graphical design wizard in the Libero SoC software.
For details on how to launch and use the System Builder wizard, refer to the SmartFusion2 System Builder User Guide.

The following steps describe how to configure the MDDR and access it from the AXI master in the FPGA fabric:

1. Go to the System Builder - Device Features tab and select the MDDR check box. Leave the rest of the check boxes unchecked. Figure 8 shows the System Builder - Device Features tab.

Figure 8 • System Builder - Device Features Tab

2. Configure the MDDR in the Memories tab as shown in Figure 9. In this example, the design is created to access the DDR3 memory with a 16-bit data width and no ECC.

3. Set the DDR memory settling time to 200 µs and click Import Configuration file to initialize the DDR memory. The configuration file is stored in eNVM. The MDDR subsystem registers must be initialized before the DDR memory is accessed through the MDDR subsystem. The MDDR configuration register file is provided along with the design files (refer to "Appendix A – Design Files").

Figure 9 • Memory Configuration

4. Drag the Fabric AMBA Master in the Peripherals tab and drop it onto the MSS DDR FIC Subsystem. The AMBA_MASTER_0 is added to the subsystem. Configure the Interface Type as AXI. Figure 10 shows the Peripherals tab.

Figure 10 • Selecting MMUART_0 and MSS GPIO in Peripherals Tab

5. Drag the Fabric AMBA Master and drop it onto the MSS FIC_0 - Fabric Master Subsystem. The AMBA_MASTER_1 is added to the subsystem and configured with AHBLite.

6. The design uses the MMUART and GPIO MSS peripherals. Select MM_UART_0 and MSS_GPIO, and uncheck all other peripherals.

7. Select Fabric under the Connect To option in the MM_UART_0 Configuration dialog.
Figure 11 • MM_UART_0 Configuration Dialog

8. Use the settings in the MSS_GPIO Configurator tab as shown in Figure 12 and keep the rest at their default states. Eight GPIOs are configured as outputs and routed to the FPGA fabric.

Figure 12 • MSS GPIO Configuration

9. Configure the system clock and subsystem clocks in the Clocks tab as listed in Table 4.

Table 4 • System and Subsystem Clocks

Clock Name          Frequency in MHz
System clock        On-chip 25 MHz/50 MHz RC oscillator
M3_CLK              166.6
MDDR_CLK            333.2
DDR/SMC_FIC_CLK     166.6
APB_0_CLK           83.3
APB_1_CLK           83.3
FIC_0_CLK           166.6

Figure 13 shows the clocks configuration dialog.

Figure 13 • System and Subsystem Clocks Configuration

10. Follow the rest of the steps with default settings and generate the design.

11. Instantiate the custom logic (AXI master, AHB master, and a command decoder) and make the connections as shown in Figure 7.

Simulation using Micron DDR3 SDRAM Model

Setting Up the Simulation Model

Setting up and running the simulation involves the following steps:

1. Obtain the Micron DDR3 memory model files. The SmartFusion2 Development Kit board has a Micron DDR3 SDRAM with the part number MT41J256M8HX-15E. The memory model used in the example design supports this device (refer to "Appendix A – Design Files").

2. Copy the ddr3.v and ddr3_parameters.vh simulation model files to the \<Libero SoC project directory>\stimulus directory.

3. Instantiate and connect the DDR3 memory model in the testbench as shown in Figure 14.

Figure 14 • Instantiating Simulation Model

4. Ensure that the ddr3.v file is included at the top of the testbench file. The example design uses two instances of the DDR3 model, each with a device width of eight.
5. Set the testbench in which the DDR3 memory model is instantiated as the active stimulus. Figure 15 shows the settings under Stimulus Hierarchy.

Figure 15 • Stimulus Settings

6. Click Project > Project Settings > Simulation Options > Waveforms. Figure 16 shows the Waveforms settings on the right.

Figure 16 • Waveforms Settings

7. Select the Include DO file check box and enter wave.do in the box as displayed in Figure 16.

The timing diagrams in Figure 17 through Figure 19 illustrate the write operation. Figure 17 illustrates the AXI master signals, the command from CMD_Decoder, and the address and data to the AHB master.

Figure 17 • AXI Master (AXI_IF) Signals for Write Operation

Figure 18 illustrates the MDDR signals. The AXI master reads 2 KB of data from LSRAM and writes to DDR3 SDRAM. The write operation is repeated eight times. The data is written into Row 0 of all banks (Bank 0 – Bank 7).

Figure 18 • MDDR Signals for Write Operation

Figure 19 illustrates the AHB master signals. The AHB master receives the address and data from the AXI master and writes into eSRAM.

Figure 19 • AHB Master Signals

The timing diagrams in Figure 20 through Figure 22 illustrate the read operation. Figure 20 illustrates the AXI master signals, the command from CMD_Decoder, and the address and data to the AHB master.

Figure 20 • AXI Master (AXI_IF) Signals for Read Operation

Figure 21 illustrates the MDDR signals. The AXI master reads 2 KB of data from DDR3 SDRAM and writes to LSRAM. The read operation is repeated eight times. The data is read from Row 0 of all banks (Bank 0 – Bank 7).
Figure 21 • MDDR Signals for Read Operation

Figure 22 illustrates the AHB master signals. The AHB master receives the address and data from the AXI master and writes to eSRAM.

Figure 22 • AHB Master Signals

Simulation using Microsemi DDR3 SDRAM VIP Model

Libero SoC includes a generic DDR memory simulation model (VIP). This VIP is attached to the pin side of the MDDR or FDDR subsystem and simulates the functionality of a DDR memory device. It can be configured for DDR2, DDR3, and LPDDR SDRAM memories.

Setting Up Simulation Model

Setting up and running the simulation involves the following steps:

1. Click the Catalog tab in Libero SoC.

2. Select the Simulation Mode check box.

3. Under Memory and Controller, select Generic DDR Memory Simulation model and drag it onto the SmartDesign testbench canvas. Figure 23 shows the Simulation mode.

Figure 23 • Generic DDR Memory Simulation Model

4. Enter the Generic DDR Memory Simulation model configuration details as shown in Figure 24. The example design uses two instances of SimDRAM (VIP model), each with a device width of eight.

Figure 24 • Configuring SimDRAM

5. Make the connections as described in the "Simulation using Microsemi DDR3 SDRAM VIP Model" section. The connections are the same as for the Micron model. Figure 25 shows the SmartDesign testbench for the example design with the Microsemi DDR3 SDRAM VIP model.

Figure 25 • SmartDesign Testbench for Example Design with Microsemi DDR3 SDRAM VIP

6. Generate the design by clicking SmartDesign > Generate Component or by clicking Generate Component on the SmartDesign toolbar.

7. Add the following code above endmodule in the generated SmartDesign testbench file, MDDR_VIP_Simulation.v.
    wire [1:0]  MDDR_DM_RDQS;
    wire [15:0] MDDR_DQ;
    wire [1:0]  MDDR_DQS;
    wire [2:0]  COMMAND;

    assign COMMAND      = {MDDR_TA_top_0_MDDR_RAS_N, MDDR_TA_top_0_MDDR_CAS_N, MDDR_TA_top_0_MDDR_WE_N};
    assign MDDR_DM_RDQS = MDDR_DM_RDQS_net_0;
    assign MDDR_DQ      = MDDR_DQ_net_0;
    assign MDDR_DQS     = MDDR_DQS_net_0;

    initial
    begin
        $display ("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++");
        $display ("Loading LSRAM from lsram.mem file");
        $display ("");
        $readmemh("lsram_512x64.mem", MDDR_VIP_Simulation.MDDR_TA_top_0.AXI_IF_0.Rdata_mem);
        $display (" Completed Loading LSRAM");
        $display ("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++");

        @(posedge MDDR_VIP_Simulation.MDDR_TA_top_0.AXI_IF_0.RESETn);

        /* 2KB write */
        repeat(9500) @(posedge MDDR_VIP_Simulation.MDDR_TA_top_0.AXI_IF_0.CLK);
        force MDDR_VIP_Simulation.MDDR_TA_top_0.CMD_Decoder_0.command = 8'b001_001_01;

        /* Disable Write */
        repeat(15) @(posedge MDDR_VIP_Simulation.MDDR_TA_top_0.AXI_IF_0.CLK);
        force MDDR_VIP_Simulation.MDDR_TA_top_0.CMD_Decoder_0.command = 8'b000_000_00;

        /* 2KB Read */
        repeat(5000) @(posedge MDDR_VIP_Simulation.MDDR_TA_top_0.AXI_IF_0.CLK);
        force MDDR_VIP_Simulation.MDDR_TA_top_0.CMD_Decoder_0.command = 8'b001_001_10;

        /* Disable Read */
        repeat(15) @(posedge MDDR_VIP_Simulation.MDDR_TA_top_0.AXI_IF_0.CLK);
        force MDDR_VIP_Simulation.MDDR_TA_top_0.CMD_Decoder_0.command = 8'b000_000_00;
    end

Figure 26 shows the SmartDesign generated testbench file under the Files tab.

Figure 26 • SmartDesign Generated Testbench File

8. Under the Stimulus Hierarchy tab, set the SmartDesign testbench as the active stimulus (Set as active stimulus). Figure 27 shows the Stimulus Hierarchy settings.

Figure 27 • Stimulus Settings

9. Change the default DO file name to wave_vip.do in Project > Project Settings > Simulation Options > Waveforms.

Figure 28 • Waveforms Settings
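The 8-bit values forced onto CMD_Decoder_0.command in the testbench code of step 7 appear to follow a 3-3-2 bit layout: two 3-bit size fields followed by a 2-bit operation field (01 for write, 10 for read, 00 for no operation). The sketch below builds such a value in C under that assumption. The names are hypothetical, and the mapping of size codes 1 to 4 onto 2 KB to 16 KB is inferred from the firmware command values listed in the Software Implementation section. Because every documented value sets both size fields to the same code, the sketch does the same rather than guessing which field is the write size and which is the read size.

```c
#include <stdint.h>

/* Operation field, bits [1:0] of the command byte (assumed layout). */
enum cmd_op {
    CMD_NOP   = 0x0,
    CMD_WRITE = 0x1,
    CMD_READ  = 0x2
};

/* Build an 8-bit command value.
 * size_code: 1 = 2 KB, 2 = 4 KB, 3 = 8 KB, 4 = 16 KB (inferred mapping).
 * The same code is placed in both 3-bit size fields, bits [7:5] and [4:2],
 * matching every value shown in the testbench and firmware listings. */
uint8_t make_command(enum cmd_op op, uint8_t size_code)
{
    uint8_t size = (uint8_t)(size_code & 0x7u);

    return (uint8_t)((size << 5) | (size << 2) | ((uint8_t)op & 0x3u));
}
```

Under these assumptions, make_command(CMD_WRITE, 1) gives 0x25, the 2 KB write value 8'b001_001_01 used in the testbench, and make_command(CMD_READ, 1) gives 0x26, the 2 KB read value 8'b001_001_10.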
Figure 28 shows the Waveforms settings.

The timing diagrams in Figure 29 through Figure 31 illustrate the write operation. Figure 29 illustrates the AXI master signals, the command from CMD_Decoder, and the address and data to the AHB master.

Figure 29 • AXI Master (AXI_IF) Signals for Write Operation

Figure 30 illustrates the MDDR subsystem signals. The AXI master reads 2 KB of data from LSRAM and writes into DDR3 SDRAM. The write operation is repeated eight times. The data is written into Row 0 of all banks (Bank 0 – Bank 7).

Figure 30 • MDDR Signals for Write Operation

Figure 31 illustrates the AHB master signals. The AHB master receives the address and data from the AXI master and writes into eSRAM.

Figure 31 • AHB Master Signals

The timing diagrams in Figure 32 through Figure 34 illustrate the read operation. Figure 32 illustrates the AXI master signals, the command from CMD_Decoder, and the address and data to the AHB master.

Figure 32 • AXI Master (AXI_IF) Signals for Read Operation

Figure 33 illustrates the MDDR signals. The AXI master reads 2 KB of data from DDR3 SDRAM and writes into LSRAM. The read operation is repeated eight times. The data is read from Row 0 of all banks (Bank 0 – Bank 7).

Figure 33 • MDDR Signals for Read Operation

Figure 34 illustrates the AHB master signals. The AHB master receives the address and data from the AXI master and writes into eSRAM.

Figure 34 • AHB Master Signal

Software Implementation

The software design example performs the following operations:
• Initializing and configuring the MMUART_0 with a 115200 baud rate, 8 data bits, 1 stop bit, no parity, and no flow control.
  This is done by adding the MICROSEMI_STDIO_THRU_MMUART0 symbol in the project settings, as shown in Figure 35.
  Figure 35 • MICROSEMI_STDIO_THRU_MMUART0 Symbol Settings
• Initializing and configuring the GPIOs (MSS_GPIO_0 to MSS_GPIO_7 are configured in the output mode).
• Initializing the DDR3 SDRAM:
  – 16777216x4 locations, starting from address 0xA0000000, are filled with zeros.
  – 8x1024x4 locations, starting from address 0xA1000000, are filled with incremental patterns.
• Initializing the eSRAM:
  – 8x4 locations, starting from address 0x20008104, are filled with zeros.
• Performing the data integrity checks.
• Sending a command to the AXI master for reading operation through GPIOs.
• Sending a command to the AXI master for writing operation through GPIOs.

List of firmware drivers used in this application:
• SmartFusion2 MSS GPIO driver
• SmartFusion2 MSS MMUART driver:
  – To communicate with the serial terminal program running on the host PC

In this design example, the application software performs the following steps:
1. Performing the following data integrity checks:
   a. The Cortex-M3 processor initializes the 8x1024x4 (8 repetitions x 1024 locations x 4 bytes) locations of DDR3 SDRAM, starting from address 0xA1000000, with incremental patterns. The pattern increments from 0 to 1023, and is repeated eight times.
   b. The AXI master reads 4 KB of data from DDR3 SDRAM, starting from the address 0x01000000, that is, 0xA1000000 (see Note), and writes it into LSRAM. The read operation is repeated eight times. The last 4 KB of data is fetched from the address 0x01007000, that is, 0xA1007000 (see Note).
   c. The AXI master reads 4 KB of data from LSRAM (512x64) and writes it into DDR3 SDRAM, starting from address 0x00000000, that is, 0xA0000000 (see Note). The write operation is repeated eight times. The last 4 KB of data is written at the address 0x00007000, that is, 0xA0007000.
   d. The Cortex-M3 processor compares the 4 KB of data at addresses 0xA0007000 and 0xA1007000. The status is printed on HyperTerminal with the error count, if any.
   Note: The address map to access the DDR memory from MSS masters through MDDR is 0xA0000000-0xDFFFFFFF.
2. Initializing the DDR3 SDRAM again.
3. Performing the read operation. Uncomment any of the following lines based on the size of data to be read from DDR3 SDRAM. The default size is 2 KB.

    /* DDR3 SDRAM READ OPERATION */
    //MSS_GPIO_set_outputs(0x26); // 2KB
    //MSS_GPIO_set_outputs(0x4A); // 4KB
    //MSS_GPIO_set_outputs(0x6E); // 8KB
    //MSS_GPIO_set_outputs(0x92); // 16KB

4. Printing the read throughput values on HyperTerminal.
5. Performing the write operation. Uncomment any of the following lines based on the size of data to be written into DDR3 SDRAM. The default size is 2 KB.

    /* DDR3 SDRAM WRITE OPERATION */
    //MSS_GPIO_set_outputs(0x25); // 2KB
    //MSS_GPIO_set_outputs(0x49); // 4KB
    //MSS_GPIO_set_outputs(0x6D); // 8KB
    //MSS_GPIO_set_outputs(0x91); // 16KB

6. Printing the write throughput values on HyperTerminal.

Running the Design

The design example is designed to run on the SmartFusion2 Development Kit board. Refer to http://www.microsemi.com/products/fpga-soc/design-resources/dev-kits/smartfusion2/smartfusion2development-kit for more detailed board information.

Board Jumper Settings

Table 5 lists the jumpers that need to be connected on the SmartFusion2 Development Kit board.

Table 5 • SmartFusion2 SoC FPGA Development Kit Jumper Settings
Jumper                                          Pin (From)   Pin (To)   Comments
J70, J93, J94, J117, J123, J142, J157, J160,    1            2          Default
J167, J225, J226, J227
J23                                             2            3          Default
J129, J133                                      2            3          Default

Note: Set the SW7 power switch on the board to the OFF position while making the jumper connections.

Host PC to Board Connections

1. Connect the FlashPro4 programmer to the FP4 HEADER J59 connector of the SmartFusion2 Development Kit board.
2. Connect one end of the USB mini-B (FTDI interface) cable to the J24 connector provided on the SmartFusion2 Development Kit board and connect the other end of the USB cable to the host PC.

USB Driver Installation

Install the FTDI D2XX driver for serial terminal communication through the FTDI mini USB cable. The drivers and installation guide can be downloaded from www.microsemi.com/soc/documents/CDM_2.08.24_WHQL_Certified.zip.

Ensure that the USB to UART bridge drivers are detected (this can be verified in Device Manager on the host PC), as shown in Figure 36.

Note: Copy the COM port number for serial port configuration. Ensure that the COM port Location is specified as on USB Serial Converter D, as shown in Figure 36.

Figure 36 • USB to UART Bridge Drivers

Steps to Run the Design

1. Connect the power supply to the J18 connector and FlashPro programmer.
2. Press the SW7 power supply switch to ON.
3. Program the SmartFusion2 Development Kit board with the generated or provided *.stp file (refer to "Appendix A – Design Files") using FlashPro.
4. Invoke the SoftConsole 3.4 integrated design environment (IDE) and launch the debugger.
5. Start the HyperTerminal program with the baud rate set to 57600, 8 data bits, 1 stop bit, no parity, and no flow control. If the PC does not have HyperTerminal, use any free serial terminal emulation program, such as PuTTY or Tera Term. Refer to the Configuring Serial Terminal Emulation Programs Tutorial for configuring HyperTerminal, Tera Term, and PuTTY.

When the debugger runs in SoftConsole, the HyperTerminal window displays the data integrity check status followed by the read and write throughputs.
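The data integrity status printed on the terminal comes from the comparison in step 1d of the application software, where the Cortex-M3 processor compares two 4 KB regions of DDR3 SDRAM. A minimal, host-testable sketch of that comparison follows. It is not the firmware shipped with the design files: the helper names ddr_to_mss_addr and count_mismatches are hypothetical, and the word-by-word compare is an assumption about how the error count is formed. Only the 0xA0000000 base is taken from the address map Note.

```c
#include <stddef.h>
#include <stdint.h>

/* Base of the MDDR window as seen by MSS masters (from the address map Note). */
#define MDDR_MSS_BASE 0xA0000000u

/* Hypothetical helper: map a DDR-relative address used by the AXI master
 * (for example 0x01007000) to the address the Cortex-M3 uses (0xA1007000). */
uint32_t ddr_to_mss_addr(uint32_t ddr_addr)
{
    return MDDR_MSS_BASE + ddr_addr;
}

/* Hypothetical helper: compare two buffers word by word and return the
 * number of mismatching 32-bit words (0 means the check passed). */
size_t count_mismatches(const volatile uint32_t *expected,
                        const volatile uint32_t *actual,
                        size_t n_words)
{
    size_t errors = 0;
    for (size_t i = 0; i < n_words; ++i) {
        if (expected[i] != actual[i]) {
            ++errors;
        }
    }
    return errors;
}
```

On the target, the 4 KB check would correspond to calling count_mismatches with pointers to 0xA1007000 and 0xA0007000 and a length of 1024, since 4 KB is 1024 32-bit words.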
Figure 37 shows the total number of AXI clocks taken for 2 KB of data transferred from LSRAM to DDR3 SDRAM and from DDR3 SDRAM to LSRAM.
Figure 37 • Throughput for 2 KB Data

Figure 38 shows the total number of AXI clocks taken for 4 KB of data transferred from LSRAM to DDR3 SDRAM and from DDR3 SDRAM to LSRAM.
Figure 38 • Throughput for 4 KB Data

Figure 39 shows the total number of AXI clocks taken for 8 KB of data transferred from LSRAM to DDR3 SDRAM and vice versa.
Figure 39 • Throughput for 8 KB Data

Figure 40 shows the total number of AXI clocks taken for 16 KB of data transferred from LSRAM to DDR3 SDRAM and vice versa.
Figure 40 • Throughput for 16 KB Data

DDR3 SDRAM Bandwidth

Table 6 provides the total number of 16 beat bursts corresponding to the write or read size.

Table 6 • Total Number of 16 Beat Bursts
Write or Read Data Size    Total Number of 16 Beat Bursts
2 KB                       16
4 KB                       32
8 KB                       64
16 KB                      128

The following equation is applied to calculate the throughput:

Bandwidth (MB/s) = (16 ÷ (Total number of AXI clocks ÷ Total number of 16 beat bursts)) × 8 × AXI Clock (MHz)

Simulation Result

Table 7 lists the write and read bandwidth of DDR3 SDRAM in simulation. An incremental pattern, with size varying from 2 KB to 16 KB, is transferred from LSRAM to DDR3 SDRAM and vice versa.
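As a cross-check of the bandwidth equation, the following C sketch evaluates it for a given cycle count. The function names are hypothetical, and the 8 bytes per beat is read off the ×8 factor in the equation. With 16 beats of 8 bytes each, one burst moves 128 bytes, which reproduces the burst counts in Table 6. For the 2 KB base write case of Table 7 (529 AXI clocks at 160 MHz), the result is about 619 MB/s.

```c
#include <stddef.h>

#define BEATS_PER_BURST 16u  /* 16-beat AXI bursts, as in Table 6 */
#define BYTES_PER_BEAT  8u   /* the x8 factor in the bandwidth equation */

/* Number of 16-beat bursts needed for a transfer of the given size. */
unsigned bursts_for_bytes(size_t bytes)
{
    return (unsigned)(bytes / (BEATS_PER_BURST * BYTES_PER_BEAT));
}

/* Bandwidth in MB/s = (16 / (clocks / bursts)) * 8 * AXI clock in MHz.
 * The caller must pass non-zero clock and burst counts. */
double ddr_bandwidth_mbps(unsigned total_axi_clocks, unsigned total_bursts,
                          double axi_clock_mhz)
{
    double clocks_per_burst = (double)total_axi_clocks / (double)total_bursts;
    return ((double)BEATS_PER_BURST / clocks_per_burst)
           * (double)BYTES_PER_BEAT * axi_clock_mhz;
}
```

For example, ddr_bandwidth_mbps(529, bursts_for_bytes(2048), 160.0) evaluates the 2 KB base write case.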
Table 7 • DDR3 SDRAM Bandwidth
SI No  Optimization Techniques          Size   Write        Write      Read         Read       Write            Read
                                               No of Cycle  Bandwidth  No of Cycle  Bandwidth  Improvement      Improvement
                                                            (MB/Sec)                (MB/Sec)
Base   160 MHz                          2 KB   529          619        737          444        avg: 630         avg: 440
                                        4 KB   1041         629        1476         444
                                        8 KB   2065         634        2954         443
                                        16 KB  4113         637        6233         420
1      1. 166 MHz                       2 KB   529          642        737          461        avg: 650         avg: 460
                                        4 KB   1041         653        1476         460        4.7%             4.5%
                                        8 KB   2065         658        2954         460
                                        16 KB  4401         617        6236         436
2      1. 166 MHz                       2 KB   501          678        737          461        avg: 680         avg: 460
       2. Without Write Response State  4 KB   981          693        1476         460        7.9%             4.5%
                                        8 KB   1941         700        2954         460
                                        16 KB  4151         655        6236         436
3      1. 166 MHz                       2 KB   502          677        721          471        avg: 700         avg: 470
       2. Without Write Response State  4 KB   982          692        1444         470        11%              6.8%
       3. Tuned DDR Configuration       8 KB   1942         700        2890         470
                                        16 KB  3862         704        5788         470
4      1. 166 MHz                       2 KB   477          712        526          646        avg: 710         avg: 645
       2. Without Write Response State  4 KB   957          710        1054         645        12%              46.6%
       3. Tuned DDR Configuration       8 KB   1917         709        2110         644
       4. Read Command Queuing          16 KB  3837         708        4222         644

Board Test Result

Table 8 lists the write and read bandwidth of DDR3 SDRAM on the SmartFusion2 Development Kit board. An incremental pattern, with size varying from 2 KB to 16 KB, is transferred from LSRAM to DDR3 SDRAM and vice versa.

Table 8 • DDR3 SDRAM Bandwidth
SI No  Optimization Techniques          Size   Write        Write      Read         Read       Write            Read
                                               No of Cycle  Bandwidth  No of Cycle  Bandwidth  Improvement      Improvement
                                                            (MB/Sec)                (MB/Sec)
Base   160 MHz                          2 KB   507          646        736          445        avg: 640         avg: 440
                                        4 KB   1019         643        1475         444
                                        8 KB   2043         641        2956         443
                                        16 KB  4091         640        5915         443
1      1. 166.6 MHz                     2 KB   507          672        736          463        avg: 670         avg: 460
                                        4 KB   1019         669        1475         462        4.7%             4.5%
                                        8 KB   2043         668        2956         461
                                        16 KB  4091         667        5912         461
2      1. 166.6 MHz                     2 KB   492          693        736          463        avg: 690         avg: 460
       2. Without Write Response State  4 KB   957          713        1475         462        7.8%             4.5%
                                        8 KB   1980         689        2953         462
                                        16 KB  3963         688        5912         461
3      1. 166.6 MHz                     2 KB   477          715        720          473        avg: 710         avg: 470
       2. Without Write Response State  4 KB   957          713        1443         472        11%              6.8%
       3. Tuned DDR Configuration       8 KB   1980         689        2889         472
                                        16 KB  3837         711        5799         470
4      1. 166.6 MHz                     2 KB   477          715        526          648        avg: 710         avg: 640
       2. Without Write Response State  4 KB   957          713        1054         647        11%              45%
       3. Tuned DDR Configuration       8 KB   1917         711        2110         646
       4. Read Command Queuing          16 KB  3837         711        4427         616

Conclusion

This application note describes the DDR SDRAM bandwidth optimization techniques with an example design on the SmartFusion2 Development Kit board. It also shows the DDR SDRAM simulation flow using the Micron DDR3 SDRAM model and the Microsemi DDR3 SDRAM VIP model.

Appendix A – Design Files

The design files can be downloaded from the Microsemi SoC Products Group website: http://soc.microsemi.com/download/rsc/?f=M2S_AC422_11p4_DF

The design file consists of the Libero SoC Verilog project, SoftConsole software project, MDDR configuration files, simulation model files, and programming files (*.stp) for the SmartFusion2 Development Kit board. Refer to the Readme.txt file included in the design file for the directory structure and description.

List of Changes

The following table lists the critical changes that are made in the current version:

Date                        Changes                                                                   Page
Revision 3 (August, 2014)   Rearranged a few sections. No change in content.                          NA
Revision 2 (August, 2014)   Updated the document for Libero SoC v11.4 software release (SAR 59944).   NA
Revision 1 (June, 2014)     Initial release.                                                          NA

Microsemi Corporate Headquarters
One Enterprise, Aliso Viejo CA 92656 USA
Within the USA: +1 (800) 713-4113
Outside the USA: +1 (949) 380-6100
Sales: +1 (949) 380-6136
Fax: +1 (949) 215-4996
E-mail: [email protected]

Microsemi Corporation (Nasdaq: MSCC) offers a comprehensive portfolio of semiconductor and system solutions for communications, defense and security, aerospace, and industrial markets. Products include high-performance and radiation-hardened analog mixed-signal integrated circuits, FPGAs, SoCs, and ASICs; power management products; timing and synchronization devices and precise time solutions, setting the world's standard for time; voice processing devices; RF solutions; discrete components; security technologies and scalable anti-tamper products; Power-over-Ethernet ICs and midspans; as well as custom design capabilities and services. Microsemi is headquartered in Aliso Viejo, Calif. and has approximately 3,400 employees globally. Learn more at www.microsemi.com.

© 2014 Microsemi Corporation. All rights reserved. Microsemi and the Microsemi logo are trademarks of Microsemi Corporation. All other trademarks and service marks are the property of their respective owners.

51900290-3/08.14