AN203230
Cypress S25FL-S Multi-I/O DDR DLP Optimizes Read Performance and Reliability

This application note highlights the SPI flash DDR Quad I/O access protocol and how the use of a data learning pattern (DLP) can optimize performance and reliability.

1 Abstract

Today's embedded systems typically require larger code and data densities, faster start-up, and higher application performance, while at the same time reducing overall system cost. Cypress understands these conflicting constraints and continues to innovate to offer best-in-class NVM solutions that address next-generation system requirements. Over the past decade, Cypress has broadened its portfolio beyond the asynchronous, page, and burst interfaces of parallel NOR to lower-pin-count, high-read-performance, low-cost SPI-based NOR solutions.

SPI interfaces have progressed from a single-bit, SDR, unidirectional input and output (x1) interface to an SDR/DDR four-bit, bidirectional (x4) interface. Several leading NOR-based SPI memories today achieve SDR clock rates of 133 MHz and offer DDR Quad I/O interfaces, providing up to 66 MB/s of continuous read throughput with legacy SPI timing modes. With the addition of DDR timing approaches and the application of a DLP, the Cypress S25FL-S SPI family improves read throughput by an additional 20 percent.

2 Multi-I/O SPI Data Learning Pattern

Today's high-speed embedded systems have more complex OS and application requirements, which typically increase the read bandwidth needed to deliver acceptable performance at the same or a lower cost. Multi-I/O SPI flash is becoming a de facto industry standard; its key features are a low-pin-count serial interface and bandwidth comparable to that of today's higher-pin-count parallel flash devices.
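The throughput figures above follow from simple arithmetic: the byte rate equals the clock rate times the number of I/Os times the bits each I/O carries per clock (one for SDR, two for DDR), divided by eight. The Python sketch below reproduces the quoted numbers under that assumption, using the 80-MHz DDR clock from the example later in this note. It ignores command, address, and dummy-cycle overhead, so it gives continuous burst rates only; the function name is illustrative.

```python
def read_throughput_mb_s(clock_mhz: float, io_width: int, ddr: bool) -> float:
    """Continuous read throughput in MB/s (command/address/dummy overhead ignored)."""
    bits_per_clock_per_io = 2 if ddr else 1  # DDR transfers on both SCK edges
    return clock_mhz * io_width * bits_per_clock_per_io / 8


sdr_quad = read_throughput_mb_s(133, io_width=4, ddr=False)  # legacy SDR Quad I/O
ddr_quad = read_throughput_mb_s(80, io_width=4, ddr=True)    # DDR Quad I/O
gain = ddr_quad / sdr_quad - 1  # relative improvement of DDR over SDR

print(f"SDR quad @133 MHz: {sdr_quad:.1f} MB/s")  # 66.5 MB/s
print(f"DDR quad @ 80 MHz: {ddr_quad:.1f} MB/s")  # 80.0 MB/s
print(f"Gain: {gain:.0%}")                        # 20%
```

At 80 MHz DDR each I/O carries 160 Mbit/s, compared with 133 Mbit/s per I/O for SDR at 133 MHz, which is where the roughly 20 percent gain comes from.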
The Cypress S25FL-S Eclipse™ family of SPI-based NOR flash offers Multi-I/O read access, providing an excellent tradeoff between interface pin count and best-in-class read bandwidth. The S25FL-S improves on baseline Multi-I/O SPI read performance by adding DDR timing and a data learning pattern (DLP), which together raise read throughput by an additional 20 percent. The following sections describe SDR Multi-I/O Quad read operation and compare it with DDR Multi-I/O Quad read with DLP.

www.cypress.com Document No. 001-03230 Rev. *A

2.1 Quad Read Overview (SDR)

The Quad I/O SPI interface retains backward compatibility with legacy x1 and x2 peripheral products (see Figure 1).

Figure 1. High-Level Quad I/O Master/Slave Interface (SoC/MCU containing the CPU and SPI controller; Quad SPI flash; signals CS#, SCK, IO0, IO1, IO2, IO3)

In SDR (single data rate) Multi-I/O SPI read timing mode, the flash outputs a new data value on each falling edge of SCK. After a delay known as the clock-to-data-out time (tV), the data becomes valid and remains valid until shortly after the next falling SCK edge (see Figure 2). The host typically uses this next falling SCK edge to capture the data output by the SPI flash. The hold time (tHO) defines how long data remains valid after a falling SCK edge.

Figure 2. Legacy SPI Timing (SCK with period PSCK; IO[3:0] carrying D1 Valid and D2 Valid, bounded by tV and tHO; tDV = PSCK - tV + tHO)

Evaluating Data-Valid Time

With legacy SPI timing values, the data-valid window (tDV) is the clock period (PSCK) minus the time until data becomes valid (tV), plus the hold time (tHO) after the next falling clock edge:

tDV = PSCK - tV + tHO

Example legacy SDR data-valid window:
– Clock period: PSCK = 7.5 ns (133 MHz)
– Window opens: tV = 6.5 ns; window closes: tHO = 0 ns
– Data valid = PSCK - tV + tHO = 7.5 ns - 6.5 ns + 0 ns ~ 1 ns

If tV and tHO are assumed to be fixed, the data-valid window shrinks as the SCK frequency increases, which limits SCK to about 133 MHz. In an SPI device, however, tV and tHO track each other: a device with a longer tV has a longer tHO, and a device with a shorter tV has a shorter tHO. The position of the data-valid window relative to the next falling clock edge therefore varies with tV. Because tV and tHO track one another, the data-valid window can be defined as:

tDV = PSCK - tO_SKEW - tOTT

where tO_SKEW is the output timing skew and tOTT is the output transition (rise/fall) time.

The key benefit of understanding the tV/tHO timing relationship is that it allows the flash data-valid window to be optimized. Consider a 3-V flash using a 133-MHz clock (PSCK = 7.5 ns), a tO_SKEW of 600 ps, and an output slew rate of 2 V/ns. The output rise/fall time (tOTT) is:

tOTT = Voutput_swing / Output_slew_rate = 3 V / (2 V/ns) = 1.5 ns
tDV = PSCK - tO_SKEW - tOTT = 7.5 ns - 600 ps - 1.5 ns = 5.4 ns

This 5.4-ns data-valid window is a significant improvement over the roughly 1-ns window obtained with legacy SPI timing.

2.2 Multi-I/O Quad Read DDR

These insights about the data-valid window also apply to SPI DDR mode. In DDR mode, the tO_SKEW and tOTT values do not change, but new data is output every half clock cycle rather than every full clock cycle as in SDR mode (see Figure 3).
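The data-valid arithmetic above can be collected into a few helper functions, since the tracked formula differs between SDR and DDR only in the usable time per data value. The Python sketch below reproduces the worked examples (values in nanoseconds and volts; the function names are illustrative). For DDR it passes half the clock period, which matches the tCLH-based definition given next under the assumption of a 50% SCK duty cycle.

```python
def legacy_sdr_tdv(p_sck_ns: float, t_v_ns: float, t_ho_ns: float) -> float:
    """Legacy SPI window: tDV = PSCK - tV + tHO (ns)."""
    return p_sck_ns - t_v_ns + t_ho_ns


def output_transition_ns(v_swing: float, slew_v_per_ns: float) -> float:
    """Output rise/fall time: tOTT = output swing / output slew rate (ns)."""
    return v_swing / slew_v_per_ns


def tracked_tdv(period_ns: float, t_o_skew_ns: float, t_ott_ns: float) -> float:
    """Window when tV and tHO track each other: tDV = period - tO_SKEW - tOTT.

    For SDR pass the full clock period PSCK; for DDR pass tCLH,
    which is PSCK / 2 at a 50% SCK duty cycle.
    """
    return period_ns - t_o_skew_ns - t_ott_ns


t_ott = output_transition_ns(3.0, 2.0)  # 3 V swing at 2 V/ns -> 1.5 ns
print(f"Legacy SDR, 133 MHz:  {legacy_sdr_tdv(7.5, 6.5, 0.0):.2f} ns")    # 1.00 ns
print(f"Tracked SDR, 133 MHz: {tracked_tdv(7.5, 0.6, t_ott):.2f} ns")     # 5.40 ns
print(f"Tracked DDR, 80 MHz:  {tracked_tdv(12.5 / 2, 0.6, t_ott):.2f} ns")  # 4.15 ns
```

Keeping tO_SKEW and tOTT as separate parameters makes it easy to see how much window a faster output slew rate or tighter skew would recover at a given clock.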
In DDR mode, tDV is defined as:

tDV = tCLH - tO_SKEW - tOTT

where tCLH is the SCK high or low time (half of PSCK at a 50% duty cycle).

Figure 3. DDR Quad SPI Timing

The DDR SPI data-valid period at a given clock speed is identical to the data-valid period of an SDR system running at twice that clock speed. A DDR SPI device can therefore reliably achieve the same read data rate at a significantly slower clock speed. For example, a Quad DDR SPI device clocked at 80 MHz can achieve 80 MB/s. Note that operating at a slower clock speed increases the data-valid time.

Consider a 3-V flash DDR read access using an 80-MHz clock (PSCK/2 = 6.25 ns), a tO_SKEW of 600 ps, and an output slew rate of 2 V/ns. The output rise/fall time is again:

tOTT = Voutput_swing / Output_slew_rate = 3 V / (2 V/ns) = 1.5 ns

Assuming a 50% SCK duty cycle:

tDV = PSCK/2 - tO_SKEW - tOTT = 6.25 ns - 600 ps - 1.5 ns ~ 4.15 ns

SPI-DDR Read Operation: Data Learning Patterns

In a DDR implementation, tV can be longer than half a clock period, which means the master cannot use a specific SCK edge to reliably capture the data coming from the flash. Instead, the master must skew its data-capture point relative to each SCK edge in order to capture data reliably.

Before describing how the master can generate appropriate sampling skews, consider the six new SPI-DDR read operations of the S25FL-S. The new DDR read protocol is available for x1, x2, and x4 interfaces, each with either three- or four-byte addressing. Consider an SPI-DDR read performed over a Quad I/O interface with three-byte addressing (command EDh). The command sequence is much like a standard Quad I/O read, except that the address, mode, and data bits are transferred on both rising and falling clock edges (DDR) rather than with the standard Quad I/O SDR protocol. The SPI-DDR read protocol proceeds as follows:

1. The instruction (command operation code) is transferred in SDR fashion for compatibility with all other legacy SPI instructions. After the instruction is sent, all remaining transfers are DDR.
2. The target address is transferred (DDR).
3. The mode bits are loaded (DDR).
4. Read latency (dummy) cycles are issued while the target data is fetched from the array.
5. The SPI NOR flash outputs a data learning pattern (DLP) during the last four dummy cycles (DDR).
6. The SPI NOR device outputs the target data (DDR).

As shown in Figure 4, the new SPI-DDR protocol adds an 8-bit DLP that the SPI flash outputs using DDR on the four dummy clock cycles immediately before the target data (four cycles with two edges each give eight bits per data signal). The DLP provides a known data sequence on each data signal so that the host controller can determine the optimal capture timing for the read data. Because the dummy cycles that carry the DLP fall in the idle period of the legacy Quad I/O read protocol, while the target data is being retrieved from the memory array, the DLP is added without impacting performance.

The DLP has the same timing, phase delay, and skew characteristics as the target data. The clock-to-data-output delay (tV in Figure 3) is the same for the DLP and the target data on each individual data signal. Data phase delay and skew arise from both the memory device and the system environment. Memory device timing variations are caused by process, voltage, temperature, and output-to-output skew. System-level variations are introduced by PCB parasitics, trace-length mismatches, and bus capacitive loading. Collectively, these phase and skew relationships are represented in Figure 3. Because the DLP is a known pattern with these same timing characteristics, the host controller can use it to compensate for both device-level and system-level phase and skew offsets when valid data is present on the bus.

Figure 4.
Quad I/O SPI-DDR Read Transaction Showing DLP

Key points:
– Data learning pattern output by memory
– Oversampled by host
– Optimal data-capture point determined
– Data read from device
– Calibration upon every read transaction
– Provides compensation for process, voltage, and temperature
– 160 Mbps (per I/O) data rate today
– Strategy extendable to higher rates

2.3 Data Learning Pattern Storage and Definition

The non-volatile data learning register (NVDLR) and the volatile data learning register (VDLR) define the DLP values (8 bits on each of the four I/Os) used during an SPI-DDR read operation to train the host controller (see Figure 5). The NVDLR can be programmed one time (OTP) with a customer-specific DLP value. During power-up or reset, the NVDLR value is loaded into the VDLR, and during SPI-DDR read operations the DLP is taken from the VDLR. The host system can read and write the VDLR directly. When the VDLR is 00h, no DLP is output during DDR reads, which provides an option to turn the DLP off.

The choice of DLP is up to the system developer, but the pattern should be chosen to expose the largest skews that depend on the bit-stream sequence. The most significant pattern-dependent skew is bounded by transitions out of states that have been stable for extended periods and transitions out of states that have existed for shorter periods. For example, at higher frequencies, the High-to-Low transition of a signal behaves slightly differently when the bit stream is xx110 than when it is xx010. The High reached in the xx110 pattern is usually at a higher voltage than the High reached in the xx010 pattern, and the higher starting voltage means that the signal takes slightly longer to reach a valid Low during the High-to-Low transition.
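The stable-versus-recently-changed distinction can be checked mechanically for a candidate DLP byte. The Python sketch below labels each level change using the strong/weak naming this note applies to these transitions: a level held for two or more bit times before it changes is treated as "strong" (as in xx110), and a level held for a single bit time as "weak" (as in xx010). The two-bit threshold, the MSB-first bit order, and the function name are assumptions for illustration, not part of the device definition.

```python
def classify_transitions(pattern: int, width: int = 8) -> list[str]:
    """Label every level change in a DLP bit stream as strong/weak 0/1.

    Assumptions: bits are taken MSB first; a level held for >= 2 bit times
    before changing is 'strong', a level held for 1 bit time is 'weak'.
    The level before the first bit is unknown, so the first run is judged
    only by its own length.
    """
    bits = [(pattern >> (width - 1 - i)) & 1 for i in range(width)]
    labels = []
    run_length = 1  # how many bit times the current level has been held
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            run_length += 1
            continue
        strength = "strong" if run_length >= 2 else "weak"
        labels.append(f"{strength} {prev}")  # named after the level being left
        run_length = 1
    return labels


REQUIRED = {"strong 0", "weak 0", "strong 1", "weak 1"}

for candidate in (0x34, 0x55, 0x0F):
    found = classify_transitions(candidate)
    print(f"{candidate:02X}h: {found} covers all four: {REQUIRED <= set(found)}")
```

For 34h the sketch returns strong 0, strong 1, weak 0, weak 1, the sequence listed for that pattern in this section, while an alternating pattern such as 55h yields only weak transitions and would never exercise the slower strong edges.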
The xx110 High-to-Low transition demonstrates a ‘strong 1,’ while the xx010 transition demonstrates a ‘weak 1.’ The DLP should be chosen to include at least one instance each of a weak 0, strong 0, weak 1, and strong 1. One pattern that meets these requirements is 34h (00110100b), whose edges step through the following transitions:

Strong 0 -> Strong 1 -> Weak 0 -> Weak 1

Any DLP that includes these four transition types should exercise the full range of pattern-dependent skew.

2.4 Host Capture Strategy

The host memory controller's overall data-capture strategy is to use the incoming DLP as a test sequence that characterizes the system response and determines tV and tDV. Once the data eye has been identified during the DLP portion of the DDR read sequence, the controller selects the data-capture point that maximizes the timing margin for the read data.

A common way to build the master's data-capture logic is with a series of skewed data-capture points that span the data-valid window. The implementation for a single I/O might consist of five channels with a fixed sampling delay between adjacent channels. The five delayed strobes (A through E) could be generated with a delay-locked loop (DLL) or with an oversampling clock derived from a higher-frequency internal clock. The host controller samples the target I/O while the DLP is being output, and the phase-delayed strobes (A through E) are triggered by the eight SCK edges on which the DLP is output.

Figure 5.
Host Capture Strategy

Key points:
– Data oversampled
– Samples from taps B, C, and D always capture the DLP successfully
– In this case, tap C provides the greatest margin
– Use tap C to capture data for the remainder of this read transaction
– Recalibration can be performed before every read transaction, providing more robust and reliable operation across operating conditions

3 Conclusion

As embedded applications continue to demand higher performance, the legacy SPI interface and protocols must continue to accommodate higher read speeds. The DLP approach enables SPI-DDR NOR flash to move beyond today's 133-Mbps (per pin) data rate. This feature gives embedded designers another attractive nonvolatile memory solution that maximizes read throughput while minimizing pin count, PCB complexity, package size, and cost. These principles and feature improvements apply across many markets, whether industrial, automotive graphics, or consumer; DLP-enabled SPI-DDR NOR flash offers another way to improve system design and performance at a reasonable cost.

4 References

– Cypress, S25FL256S Data Sheet.
– C. Zitlaw, "Data learning and SPI boost NOR flash performance," Cypress article.

Document History

Document Title: AN203230 - Cypress S25FL-S Multi-I/O DDR DLP Optimizes Read Performance and Reliability
Document Number: 001-03230

Rev. | ECN No. | Orig. of Change | Submission Date | Description of Change
** | – | – | 05/20/2015 | Initial version
*A | 5041698 | MSWI | 12/08/2015 | Updated in Cypress template
© Cypress Semiconductor Corporation, 2015.