Application Note AC424
IGLOO2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.5

Table of Contents

Purpose
Introduction
References
Reference Design Requirements and Details
Optimization Techniques
    Frequency of Operation
    Burst Length
    AXI Master without Write Response State
    Read Address Queuing
    Series of Writes or Reads
    DDR Configuration Tuning
Implementation on IGLOO2 Device
    Design Description
    Hardware Implementation
        Configuring the System Builder
        Simulation using Micron LPDDR SDRAM model
        Simulation using Microsemi LPDDR SDRAM VIP Model
Running the Design
    Setting up the Hardware
    Running the Performance Measurement Utility
        Steps to Run the Utility
        LPDDR SDRAM Bandwidth
    Simulation Result
    Board Test Result
Conclusion
Appendix A – Design Files
List of Changes

Purpose

This application note describes techniques for improving the efficiency of the double data rate (DDR) controller, using an example design for the IGLOO®2 Evaluation Kit board. It also describes how to implement the DDR SDRAM simulation flow using the Micron low-power DDR (LPDDR) SDRAM model and the Microsemi® LPDDR SDRAM verification IP (VIP) model.

Introduction

The IGLOO2 device has two high-speed hardened application-specific integrated circuit (ASIC) memory controllers, the memory subsystem DDR (MDDR) and the fabric DDR (FDDR), for interfacing with DDR2, DDR3, and LPDDR1 SDRAM memories. The MDDR and FDDR subsystems are used to access high-speed DDR memories for high-speed data transfer and code execution.
The DDR memory connected to the MDDR subsystem can be accessed by the high-performance memory subsystem (HPMS) masters and by master logic implemented in the FPGA fabric (FPGA fabric master), whereas the DDR memory connected to the FDDR subsystem can be accessed only by a field programmable gate array (FPGA) fabric master.

May 2015 © 2015 Microsemi Corporation

The FPGA fabric masters communicate with the MDDR and FDDR subsystems through the AXI or AHB interfaces. Figure 1 shows the MDDR data path for the advanced extensible interface (AXI) and advanced high-performance bus (AHB) interfaces.

Figure 1 • MDDR Data Path for AXI/AHB Interfaces
(Block diagram. Blocks shown: DDR SDRAM, DDR I/O, DDR PHY, DDR controller, AXI transaction controller (64-bit AXI), APB configuration registers (16-bit APB), HPMS DDR bridge, DDR_FIC (64-bit AXI, single 32-bit AHBL, or dual 32-bit AHBL), AHB bus matrix, HPDMA, FIC_0, FIC_1, FIC_2, and the AXI/AHB master and APB master in the FPGA fabric.)

The AXI interface is typically used for burst transfers, which provide an efficient access path and high throughput. Although the throughput depends on many system-level parameters, it can be improved by applying specific optimization techniques. This application note describes a few DDR SDRAM controller optimization techniques with an example design for the IGLOO2 Evaluation Kit board. For more information on the MDDR and FDDR subsystems, refer to the IGLOO2 FPGA High Speed DDR Interfaces User Guide.

The sample design consists of an AXI master, LSRAM, counters for throughput measurement, and CoreUART interface logic. During the write operation, the AXI master reads the LSRAM, writes to the LPDDR memory, and measures the throughput. During the read operation, the AXI master reads the LPDDR memory, writes to the LSRAM, and measures the throughput. The throughput values are displayed on the Host PC using the CoreUART interface.
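As a rough illustration of how a cycle count from such a counter translates into a throughput figure, the following simulation-only sketch converts a number of bytes and a number of AXI clock cycles into MB/s. It assumes the 83 MHz AXI clock that the example design uses; the module name, the function name, and the cycle count of 300 are hypothetical placeholders, not values from the design files.

```verilog
`timescale 1ns/1ps
// Simulation-only sketch (not part of the reference design).
// Converts a transfer size and an AXI clock-cycle count into MB/s.
module throughput_sketch;
  // Assumption: AXI clock of the example design, in MHz
  localparam real AXI_CLK_MHZ = 83.0;

  // bytes / (cycles / f_MHz) = bytes per microsecond = MB/s (10^6 bytes/s)
  function real throughput_mbytes_per_s(input integer num_bytes,
                                        input integer clk_cycles);
    throughput_mbytes_per_s = (num_bytes * AXI_CLK_MHZ) / clk_cycles;
  endfunction

  initial begin
    // Hypothetical measurement: a 2 KB transfer that took 300 AXI clocks
    $display("Throughput: %0.1f MB/s", throughput_mbytes_per_s(2048, 300));
    $finish;
  end
endmodule
```

The fewer AXI clocks a fixed-size transfer takes, the higher the reported throughput, which is the quantity the optimization techniques in this note aim to improve.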
There are two types of memory simulation models that can be used:

• Microsemi-provided verification intellectual property (VIP): The Libero® System-on-Chip (SoC) software includes a JEDEC-compliant VIP model. This VIP model is attached to the pin side of the MDDR/FDDR subsystem and simulates the functionality of a DDR memory device. It can be configured for DDR2, DDR3, and LPDDR SDRAM memories. It is intended to complement vendor models or to act as a substitute when a vendor model is not available.
• Vendor-specific memory model: Memory vendors such as Micron, Samsung, and Hynix provide downloadable simulation models for specific memory devices. The downloaded simulation model must be JEDEC compliant.

This application note also describes the DDR SDRAM simulation flow using the Micron LPDDR SDRAM model and the Microsemi LPDDR SDRAM VIP model.

Revision 3

References

The following references are used in this document. They complement and help in understanding the relevant Microsemi IGLOO2 device features and flows that are described in this document:

• IGLOO2 FPGA High Speed DDR Interfaces User Guide
• Connecting User Logic to AXI Interfaces of High-Performance Communication Blocks in the SmartFusion2 Devices
• Connecting User Logic to the SmartFusion®2 Microcontroller Subsystem
• DDR Controller and Serial High Speed Controller Initialization Methodology
• IGLOO2 Evaluation Kit User Guide
• IGLOO2 System Builder User Guide

Reference Design Requirements and Details

Table 1 lists the reference design requirements and details.

Table 1 • Reference Design Requirements and Details

    Reference Design Requirements and Details    Description
    Hardware Requirements
      IGLOO2 Evaluation Kit                      Rev C or later. Refer to the IGLOO2 FPGA Evaluation Kit User Guide for more information.
      Desktop or Laptop                          Any 64-bit Windows Operating System
    Software Requirements
      Libero SoC                                 v11.5
      Microsoft .NET Framework                   4 Client Profile

Optimization Techniques

This section describes the following optimization techniques:

• Frequency of Operation
• Burst Length
• AXI Master without Write Response State
• Read Address Queuing
• Series of Writes or Reads
• DDR Configuration Tuning

Frequency of Operation

The MDDR and FDDR subsystems support clock management dividers directly inside the embedded block. The user can select the divider ratios for the DDR clocks (MDDR_CLK/FDDR_CLK) and the DDR_FIC clock from the Clock Configurator. The best overall throughput is obtained with a 2:1 ratio, that is, with the DDR_FIC clock at half the DDR clock frequency. Many other ratios are possible to provide flexibility to the FPGA design. To show the optimal data throughput, all examples in this application note use the 2:1 ratio.

The design example uses a 64-bit AXI interface to the FPGA fabric and is configured with a 166 MHz DDR clock (see footnote 1) and an 83 MHz AXI clock.

1. The IGLOO2 MDDR subsystem supports a maximum DDR clock frequency of 200 MHz for the LPDDR1 memory type, but the LPDDR memory on the IGLOO2 Evaluation Kit board supports only 166 MHz.

Burst Length

The MDDR and FDDR subsystems support DRAM burst lengths of 4, 8, or 16, depending on the configured bus width and the DDR type. The AXI transaction controller in the MDDR and FDDR subsystems supports burst reads and writes of up to 16 beats. Both the AXI burst length (write and read) and the DRAM burst length affect performance; setting each to its maximum supported value gives the best performance.

The design example uses a DDR SDRAM burst length of 16 and an AXI write and read burst length of 16 beats.
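The arithmetic behind the 2:1 clock ratio and the burst lengths can be sketched as follows. This is a simulation-only illustration, not part of the reference design: the clock frequencies, AXI width, and burst lengths are taken from this note, the 16-bit LPDDR data width is the one configured later in the System Builder, and the module and parameter names are hypothetical. The results are theoretical peaks that ignore refresh, activate, and precharge overhead.

```verilog
`timescale 1ns/1ps
// Simulation-only sketch: compares the theoretical peak rate of the AXI side
// and the DRAM side for the configuration used in this note.
module ddr_bw_sketch;
  localparam integer DDR_CLK_MHZ    = 166; // MDDR_CLK
  localparam integer AXI_CLK_MHZ    = 83;  // AXI (DDR_FIC) clock, 2:1 ratio
  localparam integer AXI_WIDTH      = 64;  // AXI data width in bits
  localparam integer DRAM_WIDTH     = 16;  // LPDDR data width in bits
  localparam integer AXI_BURST_LEN  = 16;  // beats per AXI burst
  localparam integer DRAM_BURST_LEN = 16;  // DRAM burst length

  // Peak rates in Mbit/s; a DDR device transfers data on both clock edges
  localparam integer DRAM_PEAK_MBITS = DDR_CLK_MHZ * 2 * DRAM_WIDTH;
  localparam integer AXI_PEAK_MBITS  = AXI_CLK_MHZ * AXI_WIDTH;

  // Bytes moved by one burst on each side
  localparam integer AXI_BURST_BYTES  = AXI_BURST_LEN  * AXI_WIDTH  / 8;
  localparam integer DRAM_BURST_BYTES = DRAM_BURST_LEN * DRAM_WIDTH / 8;

  initial begin
    $display("DRAM peak: %0d MB/s", DRAM_PEAK_MBITS / 8);
    $display("AXI peak:  %0d MB/s", AXI_PEAK_MBITS / 8);
    $display("AXI burst: %0d bytes, DRAM burst: %0d bytes",
             AXI_BURST_BYTES, DRAM_BURST_BYTES);
    $display("DRAM bursts per AXI burst: %0d",
             AXI_BURST_BYTES / DRAM_BURST_BYTES);
    $finish;
  end
endmodule
```

With these values both sides work out to the same peak rate, which is why a 64-bit AXI interface at half the DDR clock frequency neither starves nor stalls a 16-bit DDR device. Each 16-beat AXI burst also corresponds to a whole number of DRAM bursts.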
AXI Master without Write Response State

When the AXI master sends the last data item (D(A15)), the WLAST signal goes HIGH to indicate the last transfer in the first write burst. When the AXI slave in the DDR subsystem has accepted all the data items, it drives a write response (BVALID) back to the master to indicate that the write transaction is complete. A conservative AXI master waits for this write response before initiating the next write transaction, but the clock cycles spent waiting reduce the overall throughput. Instead, the AXI master can send the second burst write address (B) without waiting for the write response of the first burst, which improves the write throughput by removing these wait states.

This application note focuses on optimal throughput, so the write response channel is not verified in the example design. When using this technique, it is recommended to monitor the write response channel concurrently with starting the next transfer, to ensure that the previous write data has been fully accepted. The AXI protocol defines how to handle the termination of a write burst transaction; follow it if the write response channel returns a non-OKAY value. Figure 2 shows the write transaction timing diagram without the write response state.

Figure 2 • Write Transaction Timing Diagram Without Write Response State
(Signals shown: ACLK, AWADDR (A, B), AWVALID, WDATA (D(A0) to D(A15), D(B0)), WLAST, BRESP (OKAY), BVALID.)

This technique is implemented in the example design. Comment or uncomment the following line of code in the AXI master interface (AXI_IF.v) to validate this technique.

    `define WITHOUT_WRITE_RESPONSE /* Comment this line to define With Write Response state */

Read Address Queuing

The MDDR and FDDR subsystems support up to four outstanding read transactions. Figure 3 shows the burst read address queuing timing diagram.
In the 2:1 clock ratio, the MDDR controller starts the burst read transaction before the command FIFO is full, which allows the AXI master to send five burst read addresses.

Figure 3 • Read Transaction Timing Diagram with Burst Read Address Queuing
(Signals shown: ACLK, ARADDR (A1 to A7), ARVALID, ARREADY, ARID (0).)

The AXI master keeps incrementing the burst read address as long as the AXI slave in the DDR subsystem asserts the ARREADY signal. Burst read address queuing significantly increases the read throughput compared to the normal AXI read sequence, as Table 6 and Table 7 show. Read address queuing does not reduce the initial latency of a DDR memory read access. However, when multiple reads are issued in sequence, the initial latency is incurred only for the first read. After the first read data is returned, the remaining requested data is returned in sequence without the large access penalty of the first read.

This technique is implemented in the example design. Comment or uncomment the following line of code in the AXI master interface (AXI_IF.v) to validate this technique.

    `define READ_ADDRESS_QUEUING /* Comment this line to define Without Read Address Queuing */

Series of Writes or Reads

The performance of the MDDR and FDDR subsystems depends on the method of data transfer between the DDR SDRAM and the AXI master. The following methods of data transfer reduce performance:

1. Single-beat burst read and write operations
2. Random read and write operations
3. Switching between read and write operations

The performance of the MDDR and FDDR subsystems increases when a series of reads or writes is performed within the same bank and row. Figure 4 shows the AXI to LPDDR address mapping for the LPDDR SDRAM on the IGLOO2 Evaluation Kit board.
Figure 4 • AXI to LPDDR Address Mapping

    AXI start address    LPDDR location
    0x0000               Bank 0, Row 0
    0x0800               Bank 1, Row 0
    0x1000               Bank 2, Row 0
    0x1800               Bank 3, Row 0

(The last address marked in the figure is 0x1F80.)

When the AXI address crosses 0x0800, the DDR subsystem activates Row 0 of Bank 1. Row 1 of Bank 0 is activated only when the AXI address crosses 0x2000. Whenever a different row in the same bank is accessed, the bank must be precharged before the new row can be activated. This additional time before the row can be accessed reduces the overall throughput. Understanding the internal memory layout of the DDR device and how it maps to the AXI address makes it possible to arrange accesses so that row changes are minimized and the overall throughput increases.

DDR Configuration Tuning

The DDR SDRAM datasheet specifies the timing parameters required for proper operation in units of time. The configuration registers in the MDDR/FDDR controller must match these timings. The registers take the timing parameters as a number of DDR clock cycles, which are entered in the DDR Configurator GUI. Selecting the minimum write or read delay values can give the best performance, but this approach requires extensive memory testing to ensure that the memory transfers are stable.

A default configuration file that sets up the MDDR controller for the IGLOO2 Evaluation Kit LPDDR is available on the kit's documentation web page. Table 2 lists the parameters that were tuned for better performance than this default configuration file provides.

Table 2 • Tuned DDR Timing Parameters

    Parameters    Default Values    Tuned Values
    MRD           4                 2
    RAS min       8                 7
    RAS max       8192              11264
    RCD           6                 3
    RP            7                 3
    REFI          3104              1280
    RC            3                 10
    XP            3                 1
    CKE           3                 1
    RFC           79                25

Implementation on IGLOO2 Device

The optimization techniques described in the preceding section have been implemented and validated using the IGLOO2 Evaluation Kit board.
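When checking in simulation which bank and row a given burst touches, the address map in Figure 4 can be expressed as a small decode helper. The following is a minimal sketch under stated assumptions: the 2 KB span per bank and the bank order follow Figure 4, the row number is assumed to occupy the address bits directly above the bank bits, and ROW_BITS, the module name, and the function names are hypothetical placeholders rather than values from the design files.

```verilog
`timescale 1ns/1ps
// Simulation-only sketch of the AXI-to-LPDDR mapping in Figure 4.
// Assumptions: bits [10:0] address 2 KB within one row of one bank,
// bits [12:11] select the bank, and the row number sits above them.
module axi_addr_map_sketch;
  localparam integer ROW_BITS = 13; // hypothetical row address width

  function [1:0] bank_of(input [31:0] axi_addr);
    bank_of = axi_addr[12:11];
  endfunction

  function [ROW_BITS-1:0] row_of(input [31:0] axi_addr);
    row_of = axi_addr[13 +: ROW_BITS];
  endfunction

  task show(input [31:0] axi_addr);
    $display("AXI 0x%08h -> bank %0d, row %0d",
             axi_addr, bank_of(axi_addr), row_of(axi_addr));
  endtask

  initial begin
    show(32'h0000_0000); // start of the map
    show(32'h0000_0800); // this note: Row 0 of Bank 1
    show(32'h0000_1000);
    show(32'h0000_1800);
    show(32'h0000_2000); // this note: Row 1 of Bank 0
    $finish;
  end
endmodule
```

A linear sequence of 128-byte bursts therefore stays in one row of one bank for 2 KB, moves to the next bank, and opens a new row in Bank 0 only after 8 KB. This access pattern keeps row changes to a minimum.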
This section describes the following:

• Design Description
• Hardware Implementation
• Running the Design

Design Description

The design consists of the HPMS, a DDR initialization subsystem, an AXI master (AXI_IF), a command decoder (CMD_Decoder), and a COM interface (COM_Interface) block. Figure 5 shows the block diagram of the design.

Figure 5 • Top-Level Block Diagram of the Design
(Block diagram. HPMS: MDDR (DDR PHY, DDR controller, AXI transaction controller, APB configuration registers), HPMS DDR bridge, HPDMA, eNVM, AHB bus matrix, DDR_FIC (64-bit AXI), FIC_0, FIC_1, FIC_2 (16-bit APB). FPGA fabric: DDR initialization subsystem (CoreConfigMaster, CoreConfigP, CoreAHBLite, CoreResetP, SYSRESET_POR), AXI master (AXI_IF) with AXI read and write channels, write and read throughput counters, and LSRAM, command decoder (CMD_Decoder), and COM interface (COM_Interface) with Control_Logic, TPSRAM, and CoreUART (Rx/Tx) connected to the Host PC. The MDDR connects to the DDR SDRAM through the DDR I/O.)

The MDDR in the HPMS is configured for the LPDDR interface, and its AXI interface is routed to the FPGA fabric. The DDR initialization subsystem consists of the CoreConfigMaster and CoreConfigP IPs, which initialize the MDDR controller. The initialization process consists of the following actions:

• CoreConfigMaster (AHBL master) accesses the DDR configuration data stored in eNVM through FIC_0.
• The configuration data is then sent to CoreConfigP through the FIC_2 master port.
• CoreConfigP sends the configuration data to the advanced peripheral bus (APB) of the MDDR subsystem.

The command decoder receives the AXI transaction control byte from the COM interface block and generates the write, read, write size, and read size signals. Figure 6 shows the command decoding.
Figure 6 • Command Decoding

    Bits [7:5] Write Size    Bits [4:2] Read Size    Bits [1:0] Command (W/R)
    000  NOP                 000  NOP                00  NOP
    001  2KB                 001  2KB                01  Write
    010  4KB                 010  4KB                10  Read
    011  8KB                 011  8KB
    100  16KB                100  16KB
    101  32KB                101  32KB

The AXI master block consists of the AXI read channel, the AXI write channel, a write throughput counter, a read throughput counter, and a 512x64 LSRAM. It performs the write or read operation (see footnote 2) based on the input signals from the command decoder. During the write operation, the AXI master reads the LSRAM, writes to the LPDDR memory, and then measures the write throughput. During the read operation, the AXI master reads the LPDDR memory, writes to the LSRAM, and then measures the read throughput. The write throughput counter counts the AXI clocks between AWVALID of the first data and WLAST of the last data. Similarly, the read throughput counter counts the AXI clocks between ARVALID of the first data and RLAST of the last data.

Once triggered, the AXI master performs the write or read operation eight times to obtain the average throughput and to ACTIVATE all banks. During the write operation, the write address (AWADDR) starts from 0x00000000 and is incremented by 128 (one 16-beat burst). During the read operation, the read address (ARADDR) starts from 0x00000000 and is incremented by 128. After each write or read operation, the AXI master sends the throughput count value and an address starting from 0x0 to the COM interface block. The COM interface block then writes the throughput values into the TPSRAM. The control logic in the COM interface block reads the values and sends them to the Host PC using the CoreUART interface.

For information on creating a custom AXI interface in user logic, refer to the Connecting User Logic to AXI Interfaces of High-Performance Communication Blocks in the SmartFusion2 Devices application note.

2. The write or read operation depends on the size of the write or read data.
For example, if the write size is selected as 2KB, then one AXI write operation equals sixteen 16-beat bursts (16 x 16 x 64 bits).

Hardware Implementation

The hardware implementation involves:

• Configuring the System Builder
• Connecting the custom logic (AXI master, command decoder, and COM interface)

Figure 7 shows the top-level SmartDesign of the example design.

Figure 7 • Top-Level SmartDesign

Configuring the System Builder

This section describes how to configure the MDDR and other device features, and then build a complete system using the System Builder graphical design wizard in the Libero SoC software. For details on how to launch and use the System Builder wizard, refer to the IGLOO2 System Builder User Guide.

The following steps describe how to configure the MDDR and access it from the AXI master in the FPGA fabric:

1. Go to the System Builder - Device Features tab, select the HPMS External Memory (MDDR) check box, and leave the rest of the check boxes cleared. Figure 8 shows the System Builder - Device Features tab.

Figure 8 • System Builder - Device Features Tab

2. Configure the MDDR in the Memories tab as shown in Figure 9. In this example, the design is created to access the LPDDR memory with a 16-bit data width and no ECC.

3. Set the DDR memory settling time to 200 us and click Import Configuration file to initialize the DDR memory. The configuration file is stored in eNVM. The MDDR subsystem registers must be initialized before the DDR memory is accessed through the MDDR subsystem. The MDDR configuration register file is provided along with the design files (refer to the "Appendix A – Design Files" section).
Figure 9 • Memory Configuration

4. In the Peripherals tab, drag the Fabric AMBA Master and drop it on the HPMS DDR FIC Subsystem. AMBA_MASTER_0 is added to the subsystem. Configure its Interface Type as AXI. Figure 10 shows the Peripherals tab with AMBA_MASTER_0 added.

Figure 10 • Peripherals Tab with the Fabric AMBA Master Added

5. Configure the System Clock and Subsystem clocks in the Clocks tab as listed in Table 3.

Table 3 • System and Subsystem Clocks

    Clock Name         Frequency in MHz
    System Clock       On-chip 25/50 MHz RC oscillator
    HPMS_CLK           83
    MDDR_CLK           166
    DDR/SMC_FIC_CLK    83
    FIC_0_CLK          20.750

Figure 11 shows the Clocks configuration dialog.

Figure 11 • System and Subsystem Clocks Configuration

6. Follow the rest of the steps with default settings and generate the design.

7. Instantiate the custom logic (AXI master, command decoder, and COM interface) and make the connections as shown in Figure 7.

Figure 12 shows the SmartDesign of the COM interface block. The COM_Interface SmartDesign component handles the UART communication between the Host PC software utility and the AXI master logic.

Figure 12 • SmartDesign of the COM Interface Block

COREUART_0 receives the UART signals from the Host PC user interface. Control_Logic_0 collects the command from COREUART_0 and sends it to the AXI master through the command decoder, which triggers the write/read operation. After the write/read operation, Control_Logic_0 reads the throughput count values from TPSRAM_0 and sends them to the Host PC through COREUART_0. The configurations of CoreUART and TPSRAM are given below:

• CoreUART IP has the following configuration:
  – Baud Rate: 115200
  – Data Bits: 8
  – Parity: None
• TPSRAM IP has the following configuration:
  – Write port depth: 8
  – Write port width: 16
  – Read port depth: 16
  – Read port width: 8

Simulation using Micron LPDDR SDRAM model

Setting up the Simulation Model

Setting up and running the simulation involves the following steps:

1. Obtain the Micron LPDDR SDRAM model files. The IGLOO2 Evaluation Kit board has an LPDDR DRAM from Micron with the part number MT46H32M16LFBF-6 IT:C TR. The memory model used in the example design supports this device (refer to the "Appendix A – Design Files" section).

2. Copy the dram.v and dram_parameters.vh simulation model files to the <Libero SoC project directory>\stimulus directory.

3. Instantiate and connect the LPDDR SDRAM memory model in the testbench as shown in Figure 13.

Figure 13 • Instantiation of Simulation Model

4. Ensure that the dram.v file is included at the top of the testbench file. The example design uses one instance of the LPDDR model with a device width of sixteen.

5. Set the testbench in which the LPDDR memory model is instantiated as the active stimulus. Figure 14 shows the settings under Stimulus Hierarchy.

Figure 14 • Stimulus Settings

6. Click Project > Project Settings > Simulation Options > Waveforms. Figure 15 shows the Waveforms settings on the right.

Figure 15 • Waveforms Settings

7. Select the Include DO File check box and enter wave.do in the box as shown in Figure 15.

Timing Diagrams

The timing diagrams in Figure 16 through Figure 18 illustrate the write operation. Figure 16 shows the control logic signals in the COM interface block.
Figure 16 • Control Logic Signals in the COM Interface Block for Write Operation

After reset is de-asserted, the control logic receives the handshake command (0x63) through the CoreUART RX port. The control logic then sends the acknowledgment (0x61) through the CoreUART TX port and waits for the write command. Once the write command is received, the control logic sends it to the AXI master through the command decoder, which triggers the write operation. After the write operation, the control logic reads the throughput count values from the TPSRAM and sends them to the CoreUART TX port.

Figure 17 shows the MDDR signals. The AXI master reads 2 KB of data from the LSRAM and writes it to the LPDDR SDRAM. The write operation is repeated eight times. The data is written into Row 0 and Row 1 of all banks (Bank 0 - Bank 3).

Figure 17 • MDDR Signals for Write Operation

Figure 18 shows the AXI master signals. The AXI master sends the throughput count value and an address starting from 0x0 to the COM interface block.

Figure 18 • AXI Master Signals for Write Operation

The timing diagrams in Figure 19 through Figure 21 show the read operation. Figure 19 shows the control logic signals in the COM interface block.

Figure 19 • Control Logic Signals in the COM Interface Block for Read Operation

After the write operation, the control logic receives the handshake command (0x63) through the CoreUART RX port. The control logic then sends the acknowledgment (0x61) through the CoreUART TX port and waits for the read command. Once the read command is received, the control logic sends it to the AXI master through the command decoder, which triggers the read operation. After the read operation, the control logic reads the throughput count values from the TPSRAM and sends them to the CoreUART TX port.
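The bytes exchanged in this handshake can be sketched as follows. This is a minimal simulation-only illustration, not code from the design files. The handshake and acknowledgment values come from this note, and the bit-field layout (write size in bits [7:5], read size in bits [4:2], command in bits [1:0]) is an assumption inferred from Figure 6 and from the command bytes forced in the VIP testbench later in this note. The module, parameter, and function names are hypothetical.

```verilog
`timescale 1ns/1ps
// Simulation-only sketch of the UART bytes used by the COM interface.
// Assumed layout: {write_size[2:0], read_size[2:0], command[1:0]}.
module cmd_byte_sketch;
  localparam [7:0] HANDSHAKE = 8'h63; // ASCII 'c', sent by the Host PC
  localparam [7:0] ACK       = 8'h61; // ASCII 'a', returned by the control logic

  localparam [1:0] CMD_NOP   = 2'b00;
  localparam [1:0] CMD_WRITE = 2'b01;
  localparam [1:0] CMD_READ  = 2'b10;

  localparam [2:0] SIZE_2KB  = 3'b001;
  localparam [2:0] SIZE_4KB  = 3'b010;
  localparam [2:0] SIZE_8KB  = 3'b011;
  localparam [2:0] SIZE_16KB = 3'b100;
  localparam [2:0] SIZE_32KB = 3'b101;

  // Packs the three fields into one command byte
  function [7:0] cmd_byte(input [2:0] wr_size,
                          input [2:0] rd_size,
                          input [1:0] cmd);
    cmd_byte = {wr_size, rd_size, cmd};
  endfunction

  initial begin
    // Self-check against the byte values forced in the VIP testbench
    if (cmd_byte(SIZE_2KB, SIZE_2KB, CMD_WRITE) !== 8'b00100101)
      $display("MISMATCH: 2KB write command byte");
    if (cmd_byte(SIZE_2KB, SIZE_2KB, CMD_READ) !== 8'b00100110)
      $display("MISMATCH: 2KB read command byte");
    $display("handshake=0x%02h ack=0x%02h", HANDSHAKE, ACK);
    $display("2KB write=%b 2KB read=%b",
             cmd_byte(SIZE_2KB, SIZE_2KB, CMD_WRITE),
             cmd_byte(SIZE_2KB, SIZE_2KB, CMD_READ));
    $finish;
  end
endmodule
```

Each transfer thus takes two bytes from the Host PC: the handshake byte, answered by the acknowledgment, followed by one command byte that selects the direction and the transfer size.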
Figure 20 shows the MDDR signals. The AXI master reads 2 KB of data from the LPDDR SDRAM and writes it to the LSRAM. The read operation is repeated eight times. The data is read from Row 0 and Row 1 of all banks (Bank 0 - Bank 3).

Figure 20 • MDDR Signals for Read Operation

Figure 21 shows the AXI master signals. The AXI master sends the throughput count value and an address starting from 0x0 to the COM interface block.

Figure 21 • AXI Master Signals for Read Operation

Simulation using Microsemi LPDDR SDRAM VIP Model

Libero SoC includes a generic DDR memory simulation model, also called verification intellectual property (VIP). This VIP is attached to the pin side of the MDDR or FDDR subsystem and simulates the functionality of a DDR memory device. It can be configured for DDR2, DDR3, and LPDDR SDRAM memories.

Setting up the Simulation Model

Setting up and running the simulation involves the following steps:

1. Click the Catalog tab in Libero SoC.

2. Select the Simulation Mode check box.

3. Under Memory and Controller, select the Generic DDR Memory Simulation model and drag it onto the SmartDesign testbench canvas. Figure 22 shows the simulation model.

Figure 22 • Generic DDR Memory Simulation Model

4. Enter the Generic DDR Memory Simulation model configuration details as shown in Figure 23. The example design uses one instance of SimDRAM (VIP model) with a device width of sixteen.

Figure 23 • Configuring SimDRAM

5. Make the connections as described in the "Simulation using Micron LPDDR SDRAM model" section. The connections are the same as for the Micron model. Figure 24 shows the SmartDesign testbench for the example design with the Microsemi LPDDR SDRAM VIP model.
Figure 24 • SmartDesign Testbench for Example Design with Microsemi LPDDR SDRAM VIP

6. Generate the design by clicking SmartDesign > Generate Component or by clicking Generate Component on the SmartDesign toolbar.

7. Open the generated SmartDesign testbench file, LPDDR_VIP_Simulation.v. Figure 25 shows the SmartDesign generated testbench file under the Files tab.

Figure 25 • SmartDesign Generated Testbench File

8. Replace `timescale 1 ns/100 ps with `timescale 1ps/1fs.

9. Add the following code above endmodule.

    // Monitor wires for the DDR interface and the control logic state
    wire        MDDR_CLK;
    wire        MDDR_CKE;
    wire        MDDR_CS_N;
    wire [15:0] MDDR_ADDR;
    wire [2:0]  MDDR_BA;
    wire [3:0]  fsm;
    wire [1:0]  MDDR_DM_RDQS;
    wire [15:0] MDDR_DQ;
    wire [1:0]  MDDR_DQS;
    wire [2:0]  COMMAND;
    reg         fsm_en;

    // Baud-rate reference clock (period in ps with the 1ps timescale)
    reg         BRCLK;
    parameter   BRCLK_PERIOD = 8680500; // 115200Hz

    assign MDDR_DM_RDQS = net_2;
    assign MDDR_DQ      = net_1;
    assign MDDR_DQS     = net_0;
    assign MDDR_CLK     = MDDR_TA_top_0_MDDR_CLK;
    assign MDDR_CKE     = MDDR_TA_top_0_MDDR_CKE;
    assign MDDR_CS_N    = MDDR_TA_top_0_MDDR_CS_N;
    assign MDDR_ADDR    = MDDR_TA_top_0_MDDR_ADDR;
    assign MDDR_BA      = MDDR_TA_top_0_MDDR_BA;

    assign COMMAND = {MDDR_TA_top_0_MDDR_RAS_N,MDDR_TA_top_0_MDDR_CAS_N,MDDR_TA_top_0_MDDR_WE_N};
    assign fsm = LPDDR_VIP_Simulation.MDDR_TA_top_0.COM_Interface_0.Control_Logic_0.fsm;

    // Generate BRCLK after the first AXI clock edge
    initial
    begin
        BRCLK = 1'b0;
        @(posedge LPDDR_VIP_Simulation.MDDR_TA_top_0.AXI_IF_0.CLK);
        repeat(2000)
        begin
            #(BRCLK_PERIOD / 2.0) BRCLK <= !BRCLK;
        end
    end

    // Load the LSRAM, then drive the UART receive data with the commands
    initial
    begin
        $display ("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++");
        $display ("Loading LSRAM from lsram.mem file");
        $display ("");
        $readmemh("lsram_512x64.mem",LPDDR_VIP_Simulation.MDDR_TA_top_0.AXI_IF_0.Rdata_mem);
        $display (" Completed Loading LSRAM");
        $display ("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++");

        @(posedge LPDDR_VIP_Simulation.MDDR_TA_top_0.AXI_IF_0.RESETn);
        force LPDDR_VIP_Simulation.MDDR_TA_top_0.COM_Interface_0.COREUART_0.DATA_OUT = 8'b1100011; /* Handshaking Command 'c' */

        @(posedge LPDDR_VIP_Simulation.MDDR_TA_top_0.COM_Interface_0.Control_Logic_0.RX_RDY);
        repeat(5) @(posedge BRCLK);
        force LPDDR_VIP_Simulation.MDDR_TA_top_0.COM_Interface_0.COREUART_0.DATA_OUT = 8'b00100101; /* 2KB Write */

        @(posedge fsm_en);
        repeat(40) @(posedge BRCLK);
        force LPDDR_VIP_Simulation.MDDR_TA_top_0.COM_Interface_0.COREUART_0.DATA_OUT = 8'b1100011; /* Handshaking Command 'c' */

        @(posedge LPDDR_VIP_Simulation.MDDR_TA_top_0.COM_Interface_0.Control_Logic_0.RX_RDY);
        repeat(5) @(posedge BRCLK);
        force LPDDR_VIP_Simulation.MDDR_TA_top_0.COM_Interface_0.COREUART_0.DATA_OUT = 8'b00100110; /* 2KB Read */
    end

    // fsm_en is HIGH while the control logic is in state 4'b1001
    always @(posedge LPDDR_VIP_Simulation.MDDR_TA_top_0.AXI_IF_0.CLK)
    begin
        if(fsm == 4'b1001)
        begin
            fsm_en <= 1'b1;
        end
        else
        begin
            fsm_en <= 1'b0;
        end
    end

10. Under the Stimulus Hierarchy tab, set the SmartDesign testbench as the active stimulus (Set as active stimulus). Figure 26 shows the Stimulus Hierarchy settings.

Figure 26 • Stimulus Settings

11. Change the default DO file name to wave_vip.do in Project > Project Settings > Simulation Options > Waveforms. Figure 27 shows the Waveforms settings.

Figure 27 • Waveforms Settings

Timing Diagrams

The timing diagrams in Figure 28 through Figure 30 show the write operation. Figure 28 shows the control logic signals in the COM interface block.

Figure 28 • Control Logic Signals in the COM Interface Block for Write Operation

After reset is de-asserted, the control logic receives the handshake command (0x63) through the CoreUART RX port. The control logic then sends the acknowledgment (0x61) through the CoreUART TX port and waits for the write command. Once the write command is received, the control logic sends it to the AXI master through the command decoder, which triggers the write operation.
After the write operation, the control logic reads the throughput count values from TPSRAM and sends them to the CoreUART TX port.

Figure 29 shows the MDDR signals. The AXI master reads 2 KB of data from LSRAM and writes it to LPDDR SDRAM. The write operation is repeated eight times. The data is written into Row 0 and Row 1 of all banks (Bank 0 to Bank 3).

Figure 29 • MDDR Signals for Write Operation

Figure 30 shows the AXI master signals. The AXI master sends the throughput count value and an address starting from 0x0 to the COM interface block.

Figure 30 • AXI Master Signals for Write Operation

The timing diagrams in Figure 31 through Figure 33 show the read operation. Figure 31 shows the control logic signals in the COM interface block.

Figure 31 • Control Logic Signals in the COM Interface Block for Read Operation

After the write operation, the control logic receives the handshake command (0x63) through the CoreUART RX port. The control logic then sends the acknowledgment (0x61) through the CoreUART TX port and waits for the read command. Once the read command is received, the control logic sends it to the AXI master through the command decoder, which triggers the read operation. After the read operation, the control logic reads the throughput count values from TPSRAM and sends them to the CoreUART TX port.

Figure 32 shows the MDDR signals. The AXI master reads 2 KB of data from LPDDR SDRAM and writes it to LSRAM. The read operation is repeated eight times. The data is read from Row 0 and Row 1 of all banks (Bank 0 to Bank 3).

Figure 32 • MDDR Signals for Read Operation

Figure 33 shows the AXI master signals. The AXI master sends the throughput count value and an address starting from 0x0 to the COM interface block.
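The handshake sequence described for the write and read operations can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the byte values 0x63, 0x61, 0x25, and 0x26 come from this note and its testbench stimulus, while the class ControlLogicModel, the function run_session, and the two-state structure are hypothetical and are not taken from the design files. The model ignores any byte received before a handshake (an assumption) and does not model the throughput counts returned after each operation, because their format is not given here.

```python
# Command bytes taken from the text and the testbench stimulus in this note.
HANDSHAKE = 0x63   # 'c' (8'b1100011)
ACK = 0x61         # 'a', acknowledgment sent on the CoreUART TX port
WRITE_2KB = 0x25   # 8'b00100101
READ_2KB = 0x26    # 8'b00100110


class ControlLogicModel:
    """Hypothetical model: wait for a handshake, acknowledge, then wait for a command."""

    def __init__(self):
        self.waiting_for_command = False
        self.operations = []  # operations passed on to the (modelled) AXI master

    def receive(self, byte):
        """Process one byte from the UART RX port; return the byte to transmit or None."""
        if not self.waiting_for_command:
            if byte == HANDSHAKE:
                self.waiting_for_command = True
                return ACK
            return None  # assumption: bytes before a handshake are ignored
        # A command byte ends the exchange; the next transfer needs a new handshake.
        self.waiting_for_command = False
        if byte == WRITE_2KB:
            self.operations.append("write 2 KB")
        elif byte == READ_2KB:
            self.operations.append("read 2 KB")
        return None


def run_session(model, rx_bytes):
    """Feed a byte sequence to the model and collect every byte it transmits."""
    replies = (model.receive(b) for b in rx_bytes)
    return [tx for tx in replies if tx is not None]


if __name__ == "__main__":
    model = ControlLogicModel()
    sent = run_session(model, [HANDSHAKE, WRITE_2KB, HANDSHAKE, READ_2KB])
    print([hex(b) for b in sent], model.operations)
```

The byte sequence in the usage lines mirrors the order of the force statements in the testbench: handshake, 2 KB write, handshake, 2 KB read.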
Figure 33 • AXI Master Signals for Read Operation

Running the Design

The design example is designed to run on the IGLOO2 Evaluation Kit board. For more detailed board information, refer to www.microsemi.com/products/fpga-soc/design-resources/dev-kits/igloo2/igloo2evaluation-kit.

Setting up the Hardware

Use the following steps to set up the hardware:

1. Connect the jumpers on the IGLOO2 Evaluation Kit board as listed in Table 4.

Table 4 • IGLOO2 FPGA Evaluation Kit Jumper Settings

Jumper  Pin (from)  Pin (to)  Comments
J22     1           2         Default
J23     1           2         Default
J24     1           2         Default
J8      1           2         Default
J3      1           2         Default

CAUTION: While making the jumper connections, the power supply switch SW7 must be switched off.

2. Connect the power supply to the J6 connector and switch on the power supply switch, SW7.
3. Connect the FlashPro4 programmer to the PROG HEADER J5 connector of the IGLOO2 Evaluation Kit board.
4. Connect the Host PC USB port to the IGLOO2 Evaluation Kit board’s J18 (FTDI) USB connector using the USB mini-B cable.
5. Ensure that the USB to UART bridge drivers are automatically detected. This can be verified in the Device Manager of the Host PC. If the USB to UART bridge drivers are not installed, download and install the drivers from www.microsemi.com/soc/documents/CDM_2.08.24_WHQL_Certified.zip.
6. Program the IGLOO2 Evaluation Kit board with the generated or provided *.stp file (refer to the "Appendix A – Design Files" section) using FlashPro.

Running the Performance Measurement Utility

The example design provides a performance measurement utility, IGL2_LPDDR_BW, that runs on the Host PC and communicates with the IGLOO2 Evaluation Kit board. UART is the underlying communication protocol between the Host PC and the IGLOO2 Evaluation Kit board. Figure 34 shows the initial screen of the IGL2_LPDDR_BW utility.
Figure 34 • IGL2_LPDDR_BW Utility

The IGL2_LPDDR_BW utility consists of the following sections:

• Transfer Type: Write or Read.
• Data Size: The write or read data size can be selected from the drop-down box. The data size varies from 2 KB to 16 KB.
• LPDDR Throughput: Displays the number of AXI clocks and the corresponding throughput values in MB/s.
• Buttons:
  – Connect button to connect or disconnect the serial port communication between the Host PC and the IGLOO2 Evaluation Kit board.
  – Start button to start the performance measurement.
  – Exit button to exit the application.

Steps to Run the Utility

1. Launch the utility. The default location is: <download_folder>\M2GL_AC424_DF\Windows_Utility\IGL2_LPDDR_BW.exe
2. Click Connect and wait a few seconds for the utility to connect to the proper FTDI COM port. The connection status, along with the COM port and baud rate, is shown in the bottom-left corner of the screen. Figure 35 shows the connection status of the utility.

Figure 35 • IGL2_LPDDR_BW Connection Status

3. Select Write or Read as Transfer Type.
4. Select the write or read data size from the drop-down box and click Start. Figure 36 shows the write throughput measurement for a 2 KB data transfer.

Figure 36 • Write Throughput Measurement

The number of AXI clocks may differ from run to run because PRE-CHARGE, ACTIVATE, or REFRESH cycles can occur between the memory transactions. Table 7 lists the write and read bandwidth for data sizes varying from 2 KB to 16 KB.

LPDDR SDRAM Bandwidth

Table 5 provides the total number of 16-beat bursts corresponding to the write or read size.
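The burst counts in Table 5 and the bandwidth equation that accompanies it can be checked numerically. The sketch below assumes that each beat carries 8 bytes (the factor 8 in the equation) and that 1 KB is 1024 bytes; both assumptions reproduce the burst counts in Table 5. The function names are hypothetical. The improvement_percent helper reflects an inferred reading of the improvement columns in Table 6 and Table 7, namely the relative change against the base average, which is not stated explicitly in this note.

```python
BEATS_PER_BURST = 16   # 16-beat AXI bursts, as stated in this note
BYTES_PER_BEAT = 8     # the factor 8 in the bandwidth equation (assumed bytes per beat)


def bursts_for(size_kb):
    """Number of 16-beat bursts needed to move size_kb kilobytes (1 KB = 1024 bytes)."""
    return (size_kb * 1024) // (BEATS_PER_BURST * BYTES_PER_BEAT)


def bandwidth_mb_s(axi_clocks, size_kb, axi_clk_mhz):
    """Bandwidth = (16 / (AXI clocks / bursts)) * 8 * AXI clock in MHz."""
    clocks_per_burst = axi_clocks / bursts_for(size_kb)
    return (BEATS_PER_BURST / clocks_per_burst) * BYTES_PER_BEAT * axi_clk_mhz


def improvement_percent(base_avg, new_avg):
    """Relative change against the base average (inferred, not stated in the note)."""
    return (new_avg - base_avg) / base_avg * 100.0


if __name__ == "__main__":
    # Base configuration, 2 KB write: 507 AXI clocks at 80 MHz.
    print(bursts_for(2), bandwidth_mb_s(507, 2, 80))
    # Average write bandwidth 320 MB/s (base) against 332 MB/s (83 MHz AXI clock).
    print(improvement_percent(320, 332))
```

The tabulated bandwidth values appear to be truncated to whole numbers rather than rounded, so comparing against them works best after taking the integer part of the computed result.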
Table 5 • Total Number of 16 Beat Bursts

Write or Read Data Size   Total Number of 16 Beat Bursts
2 KB                      16
4 KB                      32
8 KB                      64
16 KB                     128

The following equation is applied to calculate the throughput:

Bandwidth (MB/s) = (16 ÷ (Total number of AXI clocks ÷ Total number of 16 beat bursts)) × 8 × AXI Clock (MHz)

Simulation Result

Table 6 lists the write and read bandwidth of the LPDDR SDRAM in simulation. An incremental data pattern, with sizes varying from 2 KB to 16 KB, is transferred from LSRAM to LPDDR SDRAM and vice versa.

Table 6 • LPDDR SDRAM Bandwidth

Sl. No.  Optimization Techniques                  Size  Write    Write      Read     Read       Write             Read
                                                  (KB)  No. of   Bandwidth  No. of   Bandwidth  Improvement       Improvement
                                                        Cycles   (MB/s)     Cycles   (MB/s)
Base     AXI CLK 80 MHz                           2     507      323        721      227        avg: 320          avg: 225
                                                  4     1019     321        1450     225
                                                  8     2043     320        2904     225
                                                  16    4091     320        5809     225
1        1) AXI CLK 83 MHz                        2     507      335        721      235        avg: 332, 3.75%   avg: 234, 4%
                                                  4     1019     333        1450     234
                                                  8     2043     332        2905     234
                                                  16    4091     332        5800     234
2        1) AXI CLK 83 MHz                        2     477      356        721      235        avg: 354, 10.6%   avg: 234, 4%
         2) Without Write Response State          4     957      355        1450     234
                                                  8     1917     354        2905     234
                                                  16    3837     354        5809     234
3        1) AXI CLK 83 MHz                        2     477      356        719      236        avg: 354, 10.6%   avg: 235, 4.4%
         2) Without Write Response State          4     957      355        1440     236
         3) Tuned DDR Configuration               8     1917     354        2886     235
                                                  16    3906     348        5883     231
4        1) AXI CLK 83 MHz                        2     477      356        526      323        avg: 354, 10.6%   avg: 322, 43%
         2) Without Write Response State          4     957      355        1054     322
         3) Tuned DDR Configuration               8     1917     354        2110     322
         4) Read Command Queuing                  16    3907     348        4317     315
5        1) AXI CLK 100 MHz (MDDR CLK 200 MHz)    2     477      429        526      389        avg: 428, 33.75%  avg: 388, 72%
         2) Without Write Response State          4     957      428        1054     388
         3) Tuned DDR Configuration               8     1917     427        2110     388
         4) Read Command Queuing                  16    3907     419        4317     379

Board Test Result

Table 7 lists the write and read bandwidth of the LPDDR SDRAM on the IGLOO2 Evaluation Kit board. An incremental data pattern, with sizes varying from 2 KB to 16 KB, is transferred from LSRAM to LPDDR SDRAM and vice versa.

Table 7 • LPDDR SDRAM Bandwidth

Sl. No.  Optimization Techniques                  Size  Write    Write      Read     Read       Write             Read
                                                  (KB)  No. of   Bandwidth  No. of   Bandwidth  Improvement       Improvement
                                                        Cycles   (MB/s)     Cycles   (MB/s)
Base     AXI CLK 80 MHz                           2     507      323        721      227        avg: 320          avg: 225
                                                  4     1019     321        1450     225
                                                  8     2043     320        2905     225
                                                  16    4091     320        5812     225
1        1) AXI CLK 83 MHz                        2     507      335        721      235        avg: 332, 3.75%   avg: 234, 4%
                                                  4     1019     333        1450     234
                                                  8     2043     332        2905     234
                                                  16    4091     332        5809     234
2        1) AXI CLK 83 MHz                        2     477      356        721      235        avg: 354, 10.6%   avg: 234, 4%
         2) Without Write Response State          4     957      355        1450     234
                                                  8     1917     354        2905     234
                                                  16    3837     354        5809     234
3        1) AXI CLK 83 MHz                        2     477      356        719      236        avg: 354, 10.6%   avg: 235, 4.4%
         2) Without Write Response State          4     957      355        1444     235
         3) Tuned DDR Configuration               8     1917     354        2886     235
                                                  16    3907     348        5883     231
4        1) AXI CLK 83 MHz                        2     477      356        526      323        avg: 354, 10.6%   avg: 322, 43%
         2) Without Write Response State          4     957      355        1054     322
         3) Tuned DDR Configuration               8     1917     354        2110     322
         4) Read Command Queuing                  16    3907     348        4313     315

Conclusion

This application note describes the DDR SDRAM bandwidth optimization techniques with an example design on the IGLOO2 Evaluation Kit board. It also shows the LPDDR SDRAM simulation flow using the Micron LPDDR SDRAM model and the Microsemi LPDDR SDRAM VIP model.

Appendix A – Design Files

The design files can be downloaded from the Microsemi SoC Products Group website: http://soc.microsemi.com/download/rsc/?f=m2gl_ac424_liberov11p5_df

The design file consists of the Libero SoC Verilog project, MDDR configuration files, simulation model files, and programming files (*.stp) for the IGLOO2 Evaluation Kit board. Refer to the Readme.txt file included in the design file for the directory structure and description.

List of Changes

The following table lists critical changes that were made in each revision.

Date                       Changes                                                                Page
Revision 3 (May 2015)      Updated the document for Libero v11.5 software release (SAR 67502).    NA
Revision 2 (August 2014)   Updated the document for Libero v11.4 software release (SAR 59677).    NA

Microsemi Corporation (MSCC) offers a comprehensive portfolio of semiconductor and system solutions for communications, defense & security, aerospace and industrial markets.
© 2015 Microsemi Corporation. All rights reserved. 51900292-3/05-15