AC424: IGLOO2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.5 Application Note

Application Note AC424
IGLOO2 - Optimizing DDR Controller for Improved
Efficiency - Libero SoC v11.5
Table of Contents
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
rs
ed
Frequency of Operation . . . . . . . . .
Burst Length . . . . . . . . . . . . . . .
AXI Master without Write Response State
Read Address Queuing . . . . . . . . .
Series of Writes or Reads . . . . . . . .
DDR Configuration Tuning . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
ed
Purpose . . . . . . . . . . . . . . . . . . .
Introduction . . . . . . . . . . . . . . . . .
References . . . . . . . . . . . . . . . . .
Reference Design Requirements and Details
Optimization Techniques . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1
1
3
3
3
.3
.4
.4
.5
.5
.6
Implementation on IGLOO2 Device . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Design Description . . . . . . . . . . . . . . . . . . . . . . .
Hardware Implementation . . . . . . . . . . . . . . . . . . .
Configuring the System Builder . . . . . . . . . . . . .
Simulation using Micron LPDDR SDRAM model . . . .
Simulation using Microsemi LPDDR SDRAM VIP Model
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.7
.9
10
15
20
Running the Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
pe
Setting up the Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Running the Performance Measurement Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Steps to Run the Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
LPDDR SDRAM Bandwidth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Simulation Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Board Test Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Su
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Appendix A – Design Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
List of Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Purpose
This application note describes the techniques for improving the efficiency of double data rate (DDR)
controller using an example design for the IGLOO®2 Evaluation Kit board. It also provides details about
implementing the DDR SDRAM simulation flow using the Micron low power DDR (LPDDR) SDRAM
model and Microsemi® LPDDR SDRAM verification IP (VIP) model.
Introduction
The IGLOO2 device has two high-speed hardened application-specific integrated circuit (ASIC) memory
controllers such as memory subsystem DDR (MDDR) and fabric DDR (FDDR) for interfacing with the
DDR2, DDR3, and LPDDR1 SDRAM memories. The MDDR and FDDR subsystems are used to access
high-speed DDR memories for high-speed data transfer and code execution.
The DDR memory connected to the MDDR subsystem can be accessed by the high performance
memory subsystem (HPMS) masters and the master logic implemented in the FPGA fabric (FPGA fabric
master), whereas the DDR memory connected to the FDDR subsystem can be accessed only by a field
programmable gate array (FPGA) fabric master.
May 2015
© 2015 Microsemi Corporation
1
IGLOO2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.5
The FPGA fabric masters communicate with the MDDR and FDDR subsystems through the AXI or AHB
interfaces. Figure 1 shows the MDDR data path for advanced extensible Interface (AXI)/advanced highperformance bus (AHB) interfaces.
DDR
SDRAM
I
O
HPMS
MDDR
D
D
R
P
H
Y
DDR
Controller
64-bit AXI
AXI
Transaction
Controller
APB Config
Reg
16-bit APB
HPMS DDR
Bridge
DDR_FIC
64-bit AXI/ single
32- bit AHBL/
Dual 32-bit AHBL
AXI/ AHB
Master
AHB Bus Matrix
FIC_2
FIC_1
FIC_0
rs
ed
APB
Master
HPDMA
ed
D
D
R
FPGA FABRIC
IGLOO2
Figure 1 • MDDR Data Path for AXI/AHB Interfaces
pe
The AXI interface is typically used for burst transfers that provide an efficient access path and high
throughput. Though the throughput is dependent on many system level parameters, it can be improved
by applying specific optimization techniques. This application note describes a few DDR SDRAM
controller optimization techniques with an example design for IGLOO2 Evaluation Kit board. For more
information on MDDR and FDDR subsystems, refer to IGLOO2 FPGA High Speed DDR Interfaces User
Guide.
Su
The sample design consists of an AXI master, LSRAM, counters for throughput measurement, and
CoreUART interface logic. During the write operation, the AXI master reads the LSRAM and writes to the
LPDDR memory and measures the throughput. During the read operation, the AXI master reads the
LPDDR memory and writes to LSRAM and measures the throughput. The throughput values are
displayed on the Host PC using the CoreUART interface.
There are two types of memory simulation models that can be used:
•
Microsemi provided Verification Intellectual Property (VIP): The Libero® System-on-Chip
(SoC) includes a JEDEC compliant VIP model. This VIP model is attached to the pin side of the
MDDR/FDDR subsystem and simulates the functionality of a DDR memory device. This VIP
model can be configured for DDR2, DDR3, and LPDDR SDRAM memories. This VIP model is
intended to complement vendor models or to act as a substitute in case a vendor model is not
available.
•
Vendor-specific memory model: Memory vendors such as Micron, Samsung, and Hynix
provide downloadable simulation models for specific memory devices. The downloaded
simulation model must be JEDEC compliant.
This application note also describes the DDR SDRAM simulation flow using the Micron LPDDR SDRAM
model and Microsemi LPDDR SDRAM VIP model.
2
R e vi s i o n 3
References
References
The following list of references is used in this document. These references complement and help in
understanding the relevant Microsemi IGLOO2 device features and flows that are described in this
document:
IGLOO2 FPGA High Speed DDR Interfaces User Guide
•
Connecting User Logic to AXI Interfaces of High-Performance Communication BlocksSmartFusion®2
•
Connecting User Logic to the SmartFusion Microcontroller Subsystem
•
DDR Controller and Serial High Speed Controller Initialization Methodology
•
IGLOO2 Evaluation Kit User Guide
•
IGLOO2 System Builder User Guide
ed
•
Reference Design Requirements and Details
Table 1 lists the reference design requirements and details.
Table 1 • Reference Design Requirements and Details
Description
rs
ed
Hardware Requirements
IGLOO2 evaluation kit. Refer the IGLOO2 FPGA Rev C or later
Evaluation Kit User Guide for more information.
Desktop or Laptop
Any 64-bit Windows Operating System
Software Requirements
Libero SoC
v11.5
Microsoft .NET Framework 4 Client Profile
Optimization Techniques
pe
This section describes the following optimization techniques:
Frequency of Operation
•
Burst Length
•
AXI Master without Write Response State
•
Read Address Queuing
•
Series of Writes or Reads
•
DDR Configuration Tuning
Su
•
Frequency of Operation
The MDDR and FDDR subsystems support clock management dividers directly inside the embedded
block. The user can select the divider ratios from the Clock Configurator for DDR clocks
(MDDR_CLK/FDDR_CLK) and DDR_FIC clock. The best overall throughput ratio is 2:1, that is, half the
DDR clock frequency. Many other ratios are possible to provide flexibility to the FPGA design. To show
the optimal data throughput, this application note shows all examples using the 2:1 ratio. The design
example uses 64-bit AXI as a FPGA fabric interface and configured to use 166 MHz as DDR clock
frequency1 and 83 MHz as AXI clock.
1.
IGLOO2 MDDR subsystem supports maximum of 200 MHz as DDR clock frequency for LPDDR1 memory type. But
LPDDR memory on IGLOO2 Evaluation Kit board supports 166 MHz only.
Revision 3
3
IGLOO2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.5
Burst Length
The MDDR and FDDR subsystems support the DRAM burst lengths of 4, 8, or 16, depending on the
configured bus-width and the DDR type. The AXI transaction controller in the MDDR and FDDR
subsystem supports up to 16-beat burst read and writes. The AXI beat burst length (write and read) and
burst length of DRAM affect the optimal performance, but by setting the maximum supported burst length
for DDR SDRAM and AXI interface achieve the optimal performance. The design example uses a DDR
SDRAM burst length of 16 and an AXI write and read beat burst length of 16.
AXI Master without Write Response State
ed
When the AXI master sends the last data (D (A15)), the WLAST signal goes HIGH which indicates that
the last transfer in the first write burst. When the AXI slave in DDR subsystem accepts all the data items,
it drives a write response (BVALID) back to the master to indicate that the write transaction is complete.
By AXI protocol, the AXI master should wait for the write response before initiating the next write
transaction. However, the time spent waiting for the write response will waste clock cycles and reduce
overall throughput. The AXI master can then send the second burst write address (B) without waiting for
the write response of the first burst, which improves the write throughput.
ACLK
A
pe
AWADDR
rs
ed
This improves the write throughput by decreasing the wait states. This application note is focused on
optimal throughput and therefore the write response channel is not verified. It is recommended that when
using this technique, the write response channel is used concurrently with starting the next transfer to
ensure that the previous write data has been fully accepted. The AXI protocol has a defined methodology
on handling the termination of write burst transaction; this should be followed if the write response
channel returns a non-OKAY value. Figure 2 shows the write transaction timing diagram without the write
response state.
B
AWVALID
WDATA
D(A0)
D(A15)
D(B0)
Su
WLAST
OKAY
BRESP
BVALID
Figure 2 • Write Transaction Timing Diagram Without Write Response State
This technique is implemented in the example design. Comment or uncomment the following line of code
in the AXI Master Interface (AXI_IF.v) to validate this technique.
define WITHOUT_WRITE_RESPONSE /* Comment this line to define With Write Response state */
4
R e vi s i o n 3
Optimization Techniques
Read Address Queuing
The MDDR and FDDR subsystems support up to four outstanding read transactions. Figure 3 shows the
burst read address queuing timing diagram. In 2:1 clock ratio, the MDDR controller starts the burst read
transaction before command FIFO full which allows the AXI master to send 5 burst read address.
ACLK
A1
A2
A4
A3
ARVALID
A6
A7
rs
ed
ARREADY
ARID
A5
ed
ARADDR
0
Figure 3 • Read Transaction Timing Diagram with Burst Read Address Queuing
pe
The AXI master increments the burst read address as long as the AXI slave in the DDR subsystem
asserts the ARREADY signal. The burst read address queuing significantly increases the read
throughput compared to the normal AXI read sequence. Table 6 on page 31 and Table 7 on page 32
show this significant improvement. Read address queuing does not reduce the initial latency associated
with a DDR memory read access. By issuing multiple reads in sequence the initial latency is only
accounted for in the first read. After the first read data is returned the remainder of the requested data is
returned in sequence without a large read access penalty associated with the first read.
This technique is implemented in the example design. Comment or uncomment the following line of code
in the AXI Master Interface (AXI_IF.v) to validate this technique.
define READ_ADDRESS_QUEUING /* Comment this line to define Without Read Address Queuing */
Su
Series of Writes or Reads
The MDDR and FDDR subsystems' performance depends on the method of data transfer between the
DDR SDRAM and AXI master. The following methods of data transfer reduce optimal performance:
1. Single beat burst read and write operation
2. Random read and write operation
3. Switching between read and write operation
Revision 3
5
IGLOO2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.5
The MDDR and FDDR subsystems' performance increases while performing a series of reads or writes
from the same bank and row. Figure 4 shows the AXI to LPDDR address mapping for the LPDDR
SDRAM on IGLOO2 Evaluation Kit board.
BANK 0
0
ROW 0
BANK 1
ROW 0
0x0800
BANK 2
0x1000
ROW 0
BANK 3
ROW 0
0x1800
0x1F80
Figure 4 • AXI to LPDDR the Address Mapping
ed
When the AXI address crosses 0x0800, the DDR subsystem activates Row 0 of Bank 1. Row 1 of Bank 0
is activated only when the AXI address crosses 0x2000. If a new row is accessed every time, it must be
pre-charged first. This means that additional time is needed before a row can be accessed and this
reduces the overall throughput. Understanding the internal memory layout of the DDR and how it maps to
the AXI address enables the accesses to minimize the row changes and increase the overall throughput.
DDR Configuration Tuning
rs
ed
The DDR SDRAM datasheet provides the timings parameters required for the proper operation in terms
of time units. These timings should match with the configuration registers in the MDDR/FDDR Controller.
The timing parameters are required as number of DDR clock cycles and these are entered in the DDR
Configurator GUI. The selection of minimum write or read delay values can result in optimal
performance. Implementing this approach requires extensive memory testing to ensure that the memory
transfers are stable. The IGLOO2 Evaluation Kit LPDDR is supplied with a default configuration file to
setup the MDDR controller which is available on its documentation web page.
Table 2 lists the tuned parameter for better performance than that default configuration file.
Default Values
Tuned Values
4
2
pe
Table 2 • Tuned DDR Timing Parameters
Parameters
8
7
8192
11264
6
3
7
3
3104
1280
RC
3
10
XP
3
1
CKE
3
1
RFC
79
25
MRD
RAS min
RAS max
RCD
RP
Su
REFI
6
R e vi s i o n 3
Implementation on IGLOO2 Device
Implementation on IGLOO2 Device
The optimization techniques that are mentioned in the above section have been implemented and
validated using the IGLOO2 Evaluation Kit board. This section describes the following:
•
Design Description
•
Hardware Implementation
•
Running the Design
Design Description
ed
The design consists of HPMS, DDR initialization subsystem, AXI master (AXI_IF), Command decoder
(CMD_Decoder), and a COM interface (COM_Interface) block. Figure 5 shows the block diagram of the
design.
MDDR
D
D
R
P
H
Y
64-bit AXI
AXI
Transaction
Controller
rs
ed
DDR SDRAM
DDR IO
HPMS
DDR
Controller
APB Config
Reg
HPMS DDR
Bridge
HPDMA
eNVM
DDR_FIC
AHB Bus Matrix
16-bit APB
FIC_1
64-bit AXI
FIC_0
DDR initialization
subsystem CoreConfigMaster
FIC_2
CoreConfigP
Su
pe
CoreAHBLite
SYSRESET_POR
AXI Read
Channels
AXI Master (AXI_IF)
AXI Write
Channels
Write throughput counter
LSRAM
Read throughput counter
CoreResetP
Command Decoder
(CMD_Decoder)
Control_Logic
Rx
TPSRAM
FPGA FABRIC
COM Interface
(COM_Interface)
CoreUART
Tx
Host PC
IGLOO2
Figure 5 • Top-Level Block Diagram of the Design
MDDR in the HPMS is configured to use the LPDDR interface and routed the AXI interface to the FPGA
fabric. The DDR initialization subsystem consists of CoreConfigMaster and CoreConfigP IPs that
initializes the MDDR controller. The initialization process consists of following actions:
•
CoreConfigMaster (AHBL Master) accesses the DDR configuration data stored in eNVM through
FIC_0.
•
The configuration data is then sent to CoreConfigP through the FIC_2 master port.
•
CoreConfigP sends the configuration data to advanced peripheral bus (APB) of the MDDR
subsystem.
Revision 3
7
IGLOO2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.5
The command decoder receives the AXI transaction control from the COM interface block and generates
write, read, write size, and read size signals. Figure 6 shows the command decoding.
Write Size
Read Size
command
6
5
4
2
3
0
1
0
0
0
0
0
NOP
0
0
1
2KB
0
1
Write
0
1
0
4KB
1
0
Read
0
1
1
8KB
1
0
0
16KB
1
0
1
32KB
ed
0
1
0
0
0
0
0
1
2KB
0
1
0
4KB
0
1
1
8KB
1
0
0
16KB
1
0
1
32KB
1
NOP
0
rs
ed
Figure 6 • Command Decoding
7
W/R
pe
The AXI master block consists of AXI read channel, AXI write channel, write throughput counter, read
throughput counter, and 512x64 LSRAM. It performs the write or read operation2 based on the input
signals from the command decoder. During the write operation, the AXI master reads the LSRAM and
writes into the LPDDR memory, and then measures the write throughput. During the read operation, the
AXI master reads the LPDDR memory and writes into LSRAM, and then measures the read throughput.
The write throughput counter counts the AXI clocks between AWVALID of first data and WLAST of last
data. Similarly, the read throughput counter counts the AXI clocks between ARVALID of first data and
RLAST of last data.
Su
After triggering the write or read operation, the AXI master performs the write or read operation eight
times to get the average throughput and to ACTIVATE all banks. During the write operation, the write
address (AWADDR) starts from 0x00000000, and is incremented by 128 (16-beat burst). During the read
operation, the read address (ARADDR) starts from 0x00000000, and is incremented by 128.
After each write or read operation, the AXI master sends the throughput count value and an address
starting from 0x0 to the COM interface block. Then, the COM interface block writes the throughput values
into TPSRAM. The control logic in the COM interface block reads the values and sends to the host PC
using the CoreUART interface.
For information on creating a custom AXI interface on user logic, refer to Connecting User Logic to AXI
Interfaces of High-Performance Communication Blocks in the SmartFusion2 Devices application note.
2.
8
The write or read operation depends on the size of the write or read data. For example, if the write size is selected as 2KB,
then one AXI write operation equals to 16x16-beat burst (16x16x64).
R e vi s i o n 3
Implementation on IGLOO2 Device
Hardware Implementation
The hardware implementation involves:
•
Configuring the System Builder
•
Connecting with custom logic (AXI master, Command decoder and COM interface).
pe
rs
ed
ed
Figure 7 shows the top-level SmartDesign of the example design.
Su
Figure 7 • Top-Level SmartDesign
Revision 3
9
IGLOO2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.5
Configuring the System Builder
This section describes how to configure the MDDR and other device features and then build a complete
system using the System Builder graphical design wizard in the Libero SoC Software. For details on how
to launch the System Builder wizard and detailed information on how to use it, refer the IGLOO2 System
Builder User Guide.
The following steps describe how to configure the MDDR and access it from AXI master in the FPGA
fabric:
pe
rs
ed
ed
1. Go to the System Builder - Device Features tab and check the HPMS External Memory
(MDDR) check box and leave the rest of the check boxes unchecked. Figure 8 shows the System
Builder - Device Features tab.
Su
Figure 8 • System Builder - Device Features Tab
10
R e visio n 3
Implementation on IGLOO2 Device
2. Configure the MDDR in Memories tab as shown in Figure 9. In this example, the design is
created to access the LPDDR memory with a 16-bit data width and no ECC.
pe
rs
ed
ed
3. Set the DDR memory settling time to 200 us and click Import Configuration file to initialize the
DDR memory. The configuration file is stored in eNVM. The MDDR subsystem registers should
be initialized before accessing DDR memory through the MDDR subsystem. The MDDR
configuration register file is provided along with the design file (Refer to "Appendix A – Design
Files" section on page 33).
Su
Figure 9 • Memory Configuration
Revision 3
11
IGLOO2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.5
rs
ed
ed
4. In the Peripherals tab, drag the Fabric AMBA Master and drop on to the HPMS DDR FIC
Subsystem. The AMBA_MASTER_0 is added to the subsystem and configured the Interface
Type as AXI. Figure 10 shows the Peripherals tab with the AMBA_MASTER_0 added.
Figure 10 • Peripherals Tab with the Fabric AMBA Master Added
5. Configure the System Clock and Subsystem clocks in Clocks tab as listed in Table 3.
Clock Name
pe
Table 3 • System and Subsystem Clocks
Frequency in MHz
System Clock
On-chip 25/50 MHz RC oscillator
HPMS_CLK
83
MDDR_CLK
166
83
FIC_0_CLK
20.750
Su
DDR/SMC_FIC_CLK
12
R e visio n 3
Implementation on IGLOO2 Device
rs
ed
ed
Figure 11 shows the Clocks configuration dialog.
Figure 11 • System and Subsystem Clocks Configuration
6. Follow the rest of the steps with default settings and generate the design.
Su
pe
7. Instantiate the custom logic (AXI master, Command decoder, and COM interface) and make the
connections as shown in Figure 7 on page 9.
Revision 3
13
IGLOO2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.5
rs
ed
ed
Figure 12 shows the SmartDesign of the COM interface block. The COM_interface SmartDesign
component handles the UART communication between Host PC software utility and the AXI master
logic.
Figure 12 • SmartDesign of the COM Interface Block
pe
The COREUART_0 IP receives the UART signals from the Host PC user interface. The Control_Logic_0
collects the command from the COREUART_0 and sends to the AXI master through the Command
decoder which triggers the write/read operation. After the write/read operation, the Control_logic_0 reads
the throughput count values from TPSRAM_0 and sends to the Host PC through COREUART_0.
The configurations of CoreUART and TPSRAM are given below:
•
–
Baud Rate: 115200
–
Data Bits: 8
–
Parity: None.
PSRAM IP has the following configuration:
Su
•
CoreUART IP has the following configuration:
14
–
Write port depth: 8
–
Write port width: 16
–
Read port depth: 16
–
Read port width: 8
R e visio n 3
Implementation on IGLOO2 Device
Simulation using Micron LPDDR SDRAM model
Setting up the Simulation Model
Setting up and running the simulation involves the following steps:
1. Obtain the Micron LPDDR SDRAM model files - The IGLOO2 Evaluation Kit board has the
LPDDR DRAM from Micron with the part number; MT46H32M16LFBF-6 IT:C TR. The memory
model used in the example design supports this device (Refer to "Appendix A – Design Files"
section on page 33).
2. Copy the dram.v and dram_parameters.vh simulation model files to the \<Libero SoC project
directory>\stimulus directory.
rs
ed
ed
3. Instantiate and connect the LPDDR SDRAM memory model in the testbench as shown in
Figure 13.
Figure 13 • Instantiation of Simulation Model
Su
pe
4. Ensure that dram.v file is included at the top of the testbench file. The example design uses one
instance of LPDDR model with the device width sixteen.
Revision 3
15
IGLOO2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.5
Figure 14 • Stimulus Settings
rs
ed
ed
5. Set the testbench in which LPDDR memory model is instantiated as active stimulus. Figure 14
shows the settings under Stimulus Hierarchy.
Su
pe
6. Click Project > Project Settings > Simulation Options > Waveforms. Figure 15 shows the
Waveforms settings on the right.
Figure 15 • Waveforms Settings
7. Select the Include DO File check box and enter wave.do in the box as shown in Figure 15.
16
R e visio n 3
Implementation on IGLOO2 Device
Timing Diagrams
ed
The timing diagrams shown from Figure 16 through Figure 18 on page 18 illustrate the write operation.
Figure 16 shows the control logic signals in the COM interface block.
rs
ed
Figure 16 • Control Logic Signals in the COM Interface Block for Write Operation
Su
pe
After reset de-asserted, the control logic receives the handshake (0x63) command through the
CoreUART RX port. Then control logic sends the acknowledgment (0x61) through the CoreUART TX
port and wait for the write command. Once the write command receives, the control logic sends the write
command to the AXI master through the Command decoder which triggers the write operation. After the
write operation, the control logic reads the throughput count values from TPSRAM and sends to the
CoreUART TX port.
Revision 3
17
IGLOO2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.5
ed
Figure 17 shows the MDDR signals. The AXI master reads 2 KB of data from LSRAM and writes to
LPDDR SDRAM. The write operation is repeated eight times. The data is written into Row 0 and Row 1
of all banks (Bank 0 - Bank 3).
Figure 17 • MDDR Signals for Write Operation
Su
pe
rs
ed
Figure 18 shows the AXI master signals. The AXI master sends the throughput count value and an
address starting from 0x0 to the COM interface block.
Figure 18 • AXI Master Signals for Write Operation
18
R e visio n 3
Implementation on IGLOO2 Device
ed
The timing diagrams shown from Figure 19 through Figure 21 on page 20 show the read operation.
Figure 19 shows the control logic signals in the COM interface block.
Figure 19 • Control Logic Signals in the COM Interface Block for Read Operation
rs
ed
After the write operation, the control logic receives the handshake (0x63) command through the
CoreUART RX port. Then control logic sends the acknowledgment (0x61) through the CoreUART TX
port and wait for the read command. Once the read command receives, the control logic sends the read
command to the AXI master through the Command decoder which triggers the read operation. After the
read operation, the control logic reads the throughput count values from TPSRAM and sends to the
CoreUART TX port.
Su
pe
Figure 20 shows the MDDR signals. The AXI master reads 2 KB of data from LPDDR SDRAM and writes
to LSRAM. The read operation is repeated eight times. The data is read from Row 0 and Row 1 of all
banks (Bank 0 - Bank 3).
Figure 20 • MDDR Signals for Read Operation
Revision 3
19
IGLOO2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.5
rs
ed
ed
Figure 21 shows the AXI master signals. The AXI master sends the throughput count value and an
address starting from 0x0 to the COM interface block.
Figure 21 • AXI Master Signals for Read Operation
Simulation using Microsemi LPDDR SDRAM VIP Model
Libero SoC includes a generic DDR memory simulation model, also called Verification Intellectual
Property (VIP).This VIP is attached to the pin side of the MDDR or FDDR subsystem, and simulates the
functionality of a DDR memory device. It can be configured for DDR2, DDR3, and LPDDR SDRAM
memories as well.
pe
Setting up the Simulation Model
Setting up and running the simulation involves the followings steps:
1. Click Catalog tab in the Libero SoC.
Su
2. Select the Simulation Mode check box.
20
R e visio n 3
Implementation on IGLOO2 Device
rs
ed
ed
3. Under Memory and Controller, select Generic DDR Memory Simulation model to drag into the
SmartDesign testbench canvas. Figure 22 shows the Simulation model.
Figure 22 • Generic DDR Memory Simulation Model
Su
pe
4. Enter the Generic DDR Memory Simulation model configuration details as shown in Figure 23.
The example design uses one instance of SimDRAM (VIP model) with the device width size of
sixteen.
Figure 23 • Configuring SimDRAM
Revision 3
21
IGLOO2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.5
ed
5. Make the connections as described in "Simulation using Micron LPDDR SDRAM model" section
on page 15. The connections are same as the Micron model. Figure 24 Shows the SmartDesign
testbench for the example design with Microsemi LPDDR SDRAM VIP model.
Figure 24 • SmartDesign Testbench for Example Design with Microsemi LPDDR SDRAM VIP
rs
ed
6. Generate the design by clicking SmartDesign > Generate Component or by clicking Generate
Component on the SmartDesign tool bar.
Su
pe
7. Open the generated SmartDesign testbench file, LPDDR_VIP_Simulation.v. Figure 25 shows the
SmartDesign generated testbench file under Files tab.
Figure 25 • SmartDesign Generated Testbench File
8. Replace timescale 1 ns/100 ps with timescale 1ps /1fs.
9. Add the following code above endmodule.
wire
MDDR_CLK;
wire
MDDR_CKE;
wire
MDDR_CS_N;
wire [15:0] MDDR_ADDR;
wire [2:0] MDDR_BA;
wire [3:0]
wire [1:0]
22
fsm;
MDDR_DM_RDQS;
R e visio n 3
Implementation on IGLOO2 Device
wire [15:0] MDDR_DQ;
wire [1:0] MDDR_DQS;
wire [2:0] COMMAND;
reg fsm_en;
assign
assign
assign
assign
assign
assign
assign
assign
MDDR_DM_RDQS
MDDR_DQ
MDDR_DQS
MDDR_CLK
MDDR_CKE
MDDR_CS_N
MDDR_ADDR
MDDR_BA
=
=
=
=
=
=
=
=
= 8680500;
// 115200Hz
net_2;
net_1;
net_0;
MDDR_TA_top_0_MDDR_CLK;
MDDR_TA_top_0_MDDR_CKE;
MDDR_TA_top_0_MDDR_CS_N;
MDDR_TA_top_0_MDDR_ADDR;
MDDR_TA_top_0_MDDR_BA;
ed
reg BRCLK;
parameter BRCLK_PERIOD
assign COMMAND =
{MDDR_TA_top_0_MDDR_RAS_N,MDDR_TA_top_0_MDDR_CAS_N,MDDR_TA_top_0_MDDR_WE_N};
assign fsm
= LPDDR_VIP_Simulation.MDDR_TA_top_0.COM_Interface_0.Control_Logic_0.fsm;
rs
ed
initial
begin
BRCLK
= 1'b0;
@(posedge LPDDR_VIP_Simulation.MDDR_TA_top_0.AXI_IF_0.CLK);
repeat(2000)
begin
#(BRCLK_PERIOD / 2.0) BRCLK <= !BRCLK;
end
end
initial
begin
$display ("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++");
$display ("Loading LSRAM from lsram.mem file");
$display ("");
pe
$readmemh("lsram_512x64.mem",LPDDR_VIP_Simulation.MDDR_TA_top_0.AXI_IF_0.Rdata_mem);
$display (" Completed Loading LSRAM");
$display ("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++");
@(posedge LPDDR_VIP_Simulation.MDDR_TA_top_0.AXI_IF_0.RESETn);
force LPDDR_VIP_Simulation.MDDR_TA_top_0.COM_Interface_0.COREUART_0.DATA_OUT
8'b1100011;
/* Handshaking Commmand 'c' */
Su
@(posedge
LPDDR_VIP_Simulation.MDDR_TA_top_0.COM_Interface_0.Control_Logic_0.RX_RDY);
repeat(5) @(posedge BRCLK);
force LPDDR_VIP_Simulation.MDDR_TA_top_0.COM_Interface_0.COREUART_0.DATA_OUT
8'b00100101; /* 2KB Write */
@(posedge fsm_en);
repeat(40) @(posedge BRCLK);
force LPDDR_VIP_Simulation.MDDR_TA_top_0.COM_Interface_0.COREUART_0.DATA_OUT
8'b1100011; /* Handshaking Commmand 'c' */
@(posedge
LPDDR_VIP_Simulation.MDDR_TA_top_0.COM_Interface_0.Control_Logic_0.RX_RDY);
repeat(5) @(posedge BRCLK);
force LPDDR_VIP_Simulation.MDDR_TA_top_0.COM_Interface_0.COREUART_0.DATA_OUT
8'b00100110; /* 2KB Read */
=
=
=
=
end
always @(posedge LPDDR_VIP_Simulation.MDDR_TA_top_0.AXI_IF_0.CLK)
begin
Revision 3
23
IGLOO2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.5
if(fsm == 4'b1001)
begin
fsm_en <= 1'b1;
end
else
begin
fsm_en <= 1'b0;
end
end
pe
rs
ed
ed
10. Under the Stimulus Hierarchy tab, set the SmartDesign testbench as Set as active stimulus.
Figure 26 shows the Stimulus Hierarchy settings.
Figure 26 • Stimulus Settings
Su
11. Change the default DO file name to wave_vip.do file in Project > Project Settings > Simulation
Options > Waveforms. Figure 27 shows the Waveforms settings.
Figure 27 • Waveforms Settings
24
R e visio n 3
Implementation on IGLOO2 Device
Timing Diagrams
ed
The timing diagrams shown from Figure 28 through Figure 30 on page 26 show the write operation.
Figure 28 shows the control logic signals in the COM interface block.
Figure 28 • Control Logic Signals in the COM Interface Block for Write Operation
rs
ed
After reset de-asserted, the control logic receives the handshake (0x63) command through the
CoreUART RX port. Then control logic sends the acknowledgment (0x61) through the CoreUART TX
port and wait for the write command. Once the write command receives, the control logic sends the write
command to the AXI master through the Command decoder which triggers the write operation. After the
write operation, the control logic reads the throughput count values from TPSRAM and sends to the
CoreUART TX port.
Su
pe
Figure 29 shows the MDDR signals. The AXI master reads 2KB of data from LSRAM and writes to
LPDDR SDRAM. The write operation is repeated eight times. The data is written into Row 0 and Row 1
of all banks (Bank 0 - Bank 3).
Figure 29 • MDDR Signals for Write Operation
Revision 3
25
IGLOO2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.5
rs
ed
ed
Figure 30 shows the AXI master signals. The AXI master sends the throughput count value and an
address starting from 0x0 to the COM interface block.
Figure 30 • AXI Master Signals for Write Operation
Su
pe
The timing diagrams shown from Figure 31 through Figure 33 on page 27 show the read operation.
Figure 31 shows the control logic signals in the COM interface block.
Figure 31 • Control Logic Signals in the COM Interface Block for Read Operation
After the write operation, the control logic receives the handshake (0x63) command through the
CoreUART RX port. Then control logic sends the acknowledgment (0x61) through the CoreUART TX
port and wait for the read command. Once the read command receives, the control logic sends the read
command to the AXI master through the Command decoder which triggers the read operation. After the
read operation, the control logic reads the throughput count values from TPSRAM and sends to the
CoreUART TX port.
26
R e visio n 3
Implementation on IGLOO2 Device
ed
Figure 32 shows the MDDR signals. The AXI master reads 2 KB of data from LPDDR SDRAM and writes
to LSRAM. The read operation is repeated eight times. The data is read from Row 0 and Row 1 of all
banks (Bank 0 - Bank 3).
Figure 32 • MDDR Signals for Read Operation
Su
pe
rs
ed
Figure 33 shows the AXI master signals. The AXI master sends the throughput count value and an
address starting from 0x0 to the COM interface block.
Figure 33 • AXI Master Signals for Read Operation
Revision 3
27
IGLOO2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.5
Running the Design
The design example is designed to run on the IGLOO2 Evaluation Kit board. For more detailed board
information, refer to www.microsemi.com/products/fpga-soc/design-resources/dev-kits/igloo2/igloo2evaluation-kit.
Setting up the Hardware
Use the following steps to setup the hardware:
1. Connect the jumpers on the IGLOO2 Evaluation Kit board as listed in Table 4.
Table 4 • IGLOO2 FPGA Evaluation Kit Jumper Settings
Pin (from)
Pin (to)
Comments
J22
1
2
Default
J23
1
2
Default
J24
1
2
Default
J8
1
2
Default
J3
1
2
Default
ed
Jumper
rs
ed
CAUTION: While making the jumper connections, the power supply switch SW7 must be switched off.
2. Connect the Power supply to the J6 connector; switch on the power supply switch, SW7.
3. Connect the FlashPro4 programmer to the PROG HEADER J5 connector of the IGLOO2
Evaluation Kit board.
4. Connect the Host PC USB port to the IGLOO2 Evaluation Kit board’s J18 (FTDI) USB connector
using the USB mini-B cable.
5. Ensure that the USB to UART bridge drivers are automatically detected. This can be verified in
the Device Manager of the Host PC.
If the USB to UART bridge drivers are not installed, download and install the drivers from
www.microsemi.com/soc/documents/CDM_2.08.24_WHQL_Certified.zip.
Su
pe
6. Program the IGLOO2 Evaluation Kit board with the generated or provided *.stp file (Refer to
"Appendix A – Design Files" section on page 33) using FlashPro.
28
R e visio n 3
Running the Design
Running the Performance Measurement Utility
rs
ed
ed
The example design provides performance measurement utility, IGL2_LPDDR_BW that runs on the Host
PC to communicate with the IGLOO2 Evaluation Kit board. The UART protocol is used as the underlying
communication protocol between the Host PC and the IGLOO2 Evaluation Kit board. Figure 34 shows
initial screen of the IGL2_LPDDR_BW Utility.
Figure 34 • IGL2_LPDDR_BW Utility
The IGL2_LPDDR_BW utility consists of following sections:
Transfer Type: Write or Read
•
Data Size: Write data size or Read data size can be selected from the drop down box. The data
size varies from 2 KB to 16 KB.
•
LPDDR Throughput: It displays the number of AXI clocks and corresponding throughput values
in MB/s
•
Buttons:
pe
•
Connect button for to connect or disconnect the serial port communication between the Host
PC and the IGLOO2 Evaluation Kit board.
–
Start button for to start the performance measurement.
–
Exit button to exit the application.
Su
–
Revision 3
29
IGLOO2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.5
Steps to Run the Utility
1. Launch the utility. The default location is:
<download_folder>\M2GL_AC424_DF\Windows_Utility\IGL2_LPDDR_BW.exe
rs
ed
ed
2. Click Connect and wait for few seconds to connect the proper FDTI COM port. The connection
status along with the COM Port and Baud rate is shown in the left bottom corner of the screen.
Figure 35 shows the connection status of the utility.
Figure 35 • IGL2_LPDDR_BW Connection Status
3. Select Write or Read as Transfer Type.
Su
pe
4. Select Write data size or Read data size from the drop down box and click Start. Figure 36 shows
the write throughput measurement for 2 KB data transfer.
Figure 36 • Write Throughput Measurement
The number of AXI clocks may differ for different run. It is due to PRE-CHARGE, ACTIAVTE or
REFRESH cycle which runs between the memory transactions.
Table 7 lists the write and read bandwidth for data size varies from 2 KB to 16 KB.
30
R e visio n 3
LPDDR SDRAM Bandwidth
LPDDR SDRAM Bandwidth
Table 5 provides the total number of 16 beat bursts corresponding to the write or read size.
Table 5 • Total Number of 16 Beat Bursts
Write or Read Data Size
Total Number of 16 Beat Bursts
2 KB
16
4 KB
32
8 KB
64
16 KB
128
The following equation is applied to calculate the throughput:
ed
Bandwidth (MB/s) = (16 ÷ (Total number of AXI clocks ÷ Total number of 16 beat bursts))×8×AXI Clock (MHz)
Simulation Result
Table 6 lists the write and read bandwidth of LPDDR SDRAM simulation. The incremental pattern of size
varies from 2 KB to 16 KB, which is transferred from LSRAM to LPDDR SDRAM and vice-versa.
rs
ed
Table 6 • LPDDR SDRAM Bandwidth
Write
SI No
Optimization
Techniques
Base
AXI CLK 80 MHz.
No of
cycle
Bandwidth
Write
Read
(MB/Sec) Improvement Improvement
2
507
323
721
227
4
1019
321
1450
225
8
2043
320
2904
225
16
4091
320
5809
225
2
507
335
721
235
4
1019
333
1450
234
pe
1) AXI CLK 83 MHz.
No of Bandwidth
cycle (MB/Sec)
8
2043
332
2905
234
16
4091
332
5800
234
2
477
356
721
235
4
957
355
1450
234
8
1917
354
2905
234
Su
1
Size
(KB)
Read
16
3837
354
5809
234
2
477
356
719
236
4
957
355
1440
236
8
1917
354
2886
235
16
3906
348
5883
231
2
477
356
526
323
4
957
355
1054
322
8
1917
354
2110
322
16
3907
348
4317
315
2
3
4
1) AXI CLK 83 MHz.
2) Without Write
Response State
1) AXI CLK 83 MHz.
2) Without Write
Response State
3) Tuned DDR
Configuration
1) AXI CLK 83 MHz
2) Without Write
Response State
3) Tuned DDR
Configuration
4) Read Command
Queuing
Revision 3
avg:320
avg:225
avg:332
3.75%
avg:234
4%
avg:354
10.6%
avg:234
4%
avg:354
10.6%
avg:235
4.4%
avg:354
10.6%
avg:322
43%
31
IGLOO2 - Optimizing DDR Controller for Improved Efficiency - Libero SoC v11.5
Table 6 • LPDDR SDRAM Bandwidth (continued)
Write
5
Optimization
Techniques
Size
(KB)
1) AXI CLK 100 MHz
(MDDR CLK 200
MHZ)
2) Without Write
Response State
3) Tuned DDR
Configuration
4) Read Command
Queuing
No of Bandwidth
cycle (MB/Sec)
No of
cycle
Bandwidth
Write
Read
(MB/Sec) Improvement Improvement
2
477
429
526
389
4
957
428
1054
388
8
1917
427
2110
388
16
3907
419
4317
379
avg:428
33.75%
avg:388
72%
ed
SI No
Read
Board Test Result
Table 7 lists the write and read bandwidth of LPDDR SDRAM on IGLOO2 Evaluation kit board. The
incremental pattern of size varies from 2 KB to 16 KB, which is transferred from LSRAM to LPDDR
SDRAM and vice-versa.
rs
ed
Table 7 • LPDDR SDRAM Bandwidth
Write
Read
SI No
Optimization
Techniques
Size
(KB)
No of
cycle
Bandwidth
(MB/Sec)
No of
cycle
Bandwidth
(MB/Sec)
Write
Improvement
Read
Improvement
Base
AXI CLK 80 MHz.
2
507
323
721
227
avg:320
avg:225
1019
321
1450
225
2043
320
2905
225
4091
320
5812
225
507
335
721
235
1019
333
1450
234
avg:332
3.75%
avg:234
4%
8
2043
332
2905
234
16
4091
332
5809
234
2
477
356
721
235
4
957
355
1450
234
avg:354
10.6%
avg:234
4%
8
1917
354
2905
234
16
3837
354
5809
234
2
477
356
719
236
4
957
355
1444
235
avg:354
10.6%
avg:235
4.4%
8
1917
354
2886
235
16
3907
348
5883
231
4
8
16
2
4
1) AXI CLK 83
MHz.
2) Without Write
Response State
Su
2
1) AXI CLK 83
MHz.
pe
1
3
32
1) AXI CLK 83
MHz.
2) Without Write
Response State
3) Tuned DDR
Configuration
R e visio n 3
Conclusion
Table 7 • LPDDR SDRAM Bandwidth (continued)
Write
4
Optimization
Techniques
1)AXI CLK 83
MHz
2) Without Write
Response State
3) Tuned DDR
Configuration
4) Read
Command
Queuing
Size
(KB)
No of
cycle
Bandwidth
(MB/Sec)
No of
cycle
Bandwidth
(MB/Sec)
Write
Improvement
Read
Improvement
2
477
356
526
323
4
957
355
1054
322
avg:354
10.6%
avg:322
43%
8
1917
354
2110
322
16
3907
348
4313
315
ed
SI No
Read
Conclusion
rs
ed
This application note describes the DDR SDRAM bandwidth optimization techniques with an example
design on IGLOO2 Evaluation Kit board. It also shows the LPDDR SDRAM simulation flow using the
Micron LPDDR SDRAM model and Microsemi LPDDR SDRAM VIP model.
Appendix A – Design Files
The design files can be downloaded from the Microsemi SoC Products Group website:
http://soc.microsemi.com/download/rsc/?f=m2gl_ac424_liberov11p5_df
The design file consists of Libero SoC Verilog project, MDDR Configuration files, Simulation model files
and programming files (*.stp) for IGLOO2 Evaluation Kit board. Refer to the Readme.txt file included in
the design file for the directory structure and description.
List of Changes
Date
Revision 3
(May 2015)
Changes
Page
Updated the document for Libero v11.5 software release (SAR 67502).
NA
Updated the document for Libero v11.4 software release (SAR 59677).
NA
Su
Revision 2
pe
The following table lists critical changes that were made in each revision.
(August 2014)
Revision 3
33
ed
rs
ed
pe
Su
Microsemi Corporation (MSCC) offers a comprehensive portfolio of semiconductor and system
solutions for communications, defense & security, aerospace and industrial markets. Products
include high-performance and radiation-hardened analog mixed-signal integrated circuits,
FPGAs, SoCs and ASICs; power management products; timing and synchronization devices
and precise time solutions, setting the world’s standard for time; voice processing devices; RF
solutions; discrete components; security technologies and scalable anti-tamper products;
Ethernet solutions; Power-over-Ethernet ICs and midspans; as well as custom design
capabilities and services. Microsemi is headquartered in Aliso Viejo, Calif., and has
approximately 3,600 employees globally. Learn more at www.microsemi.com.
Microsemi Corporate Headquarters
One Enterprise, Aliso Viejo,
CA 92656 USA
Within the USA: +1 (800) 713-4113
Outside the USA: +1 (949) 380-6100
Sales: +1 (949) 380-6136
Fax: +1 (949) 215-4996
E-mail: [email protected]
© 2015 Microsemi Corporation. All
rights reserved. Microsemi and the
Microsemi logo are trademarks of
Microsemi Corporation. All other
trademarks and service marks are the
property of their respective owners.
Microsemi makes no warranty, representation, or guarantee regarding the information contained herein or
the suitability of its products and services for any particular purpose, nor does Microsemi assume any
liability whatsoever arising out of the application or use of any product or circuit. The products sold
hereunder and any other products sold by Microsemi have been subject to limited testing and should not
be used in conjunction with mission-critical equipment or applications. Any performance specifications are
believed to be reliable but are not verified, and Buyer must conduct and complete all performance and
other testing of the products, alone and together with, or installed in, any end-products. Buyer shall not rely
on any data and performance specifications or parameters provided by Microsemi. It is the Buyer's
responsibility to independently determine suitability of any products and to test and verify the same. The
information provided by Microsemi hereunder is provided "as is, where is" and with all faults, and the entire
risk associated with such information is entirely with the Buyer. Microsemi does not grant, explicitly or
implicitly, to any party any patent rights, licenses, or any other IP rights, whether with regard to such
information itself or anything described by such information. Information provided in this document is
proprietary to Microsemi, and Microsemi reserves the right to make any changes to the information in this
document or to any products and services at any time without notice.
51900292-3/05-15