Aeroflex Colorado Springs Application Note UT699-AN-07 UT699 L1 Data and Instruction Cache Organization Table 1: Cross Reference of Applicable Products Product Name: Manufacturer Part Number SMD # Device Type Internal PIC Number: UT699 5962-08228 01, 02 WG07 UT699 32-bit Fault-Tolerant SPARC V8/LEON 3FT Processor * PIC = Product Identification Code 1 Introduction Cache memory is an important element in microprocessors. In the UT699, each instruction and data access from external memory can take up to three clock cycles during random accesses and two clock cycles during burst instruction fetches. Accesses to cache memory in a processor such as the UT699 take only a single clock cycle. Microprocessor designers usually place cache memory on the same die as the central processing unit in order to achieve this fast access time. During code execution, the processor fetches instructions and data from external memory and stores it into on-chip cache memory. Subsequent accesses to cached instructions or data will then take only a single CPU clock cycle per access. This results in higher system performance as a processor utilizing cache requires fewer clock cycles to execute code as the same processor without cache. This application note explains the cache organization of the UT699, how the UT699 determines cache addresses, and the use of cache tags. Finally, Section 6 provides assembly code examples that the software programmer can utilize to access cache data and tags. 2 Cache Organization The UT699 Leon 3FT microprocessor has 8kB of L1 instruction cache and 8kB of L1 data cache. Both cache units are organized as two-way, set associative, resulting in a logical configuration of 2x4kB for both instruction cache and data cache. The instruction cache is organized as 128 lines with 32 bytes per line for each set. The data cache is organized as 256 lines with 16 bytes per line for each set. In the event of a cache miss, i.e., a cache location does not contain valid data, the cache controller replaces an entire cache line using a least-recently used (LRU) replacement policy. The instruction and data cache are organized as shown in Tables 2 and 3. Table 2: Instruction Cache Organization Set Line 0 0 31 30 ... 4 3 2 1 0 0 ... 31 30 ... 4 3 2 1 0 0 127 31 30 ... 4 3 2 1 0 1 0 31 30 ... 4 3 2 1 0 1 ... 31 30 ... 4 3 2 1 0 1 127 31 30 ... 4 3 2 1 0 Creation Date: 9/9/10 Byte Page 1 of 9 Modification Date: 7/5/11 Aeroflex Colorado Springs Application Note UT699-AN-07 . Table 3: Data Cache Organization Creation Date: 9/9/10 Set Line Byte 0 0 15 ... 1 0 0 ... 15 ... 1 0 0 255 15 ... 1 0 1 0 15 ... 1 0 1 ... 15 ... 1 0 1 255 15 ... 1 0 Page 2 of 9 Modification Date: 7/5/11 Aeroflex Colorado Springs Application Note 3 UT699-AN-07 Cache Addresses A unique address identifies each cache location. Accesses to either cache data or cache tags make use of these addresses. Section 5 explains the use of cache tags and their relationship to external addresses. Instruction and data cache addresses are word aligned. Instruction cache addresses range from 000016 to 0FFC16 for set 0, and from 100016 to 1FFC16 for set 1. Data cache addresses range from 000016 to 0FFC16 for set 0, and from 100016 to 1FFC16 for set 1. Since cache addresses are always aligned on 32-bit word boundaries, they must end in 0016, 0416, 0816, or 0C16. Table 4 shows an example of the addresses for the words in line 2 of set 1 of the instruction cache. For example, instruction cache address 104016 is the address of word 0 of line 2 of set 1 of the instruction cache. Table 4: Logical Representation of Instruction Cache Address Cache Address Set Line “00”1 Word 104016 x x x 1 0 0 0 0 0 1 0 0 0 0 0 0 104416 x x x 1 0 0 0 0 0 1 0 0 0 1 0 0 ... x x x 1 0 0 0 0 0 1 0 0 0 105C16 x x x 1 0 0 0 0 0 1 0 0 0 ... 1 1 1 Table 5 shows an example of the addresses for the words in line 2 of set 1 of the data cache. Table 5: Logical Representation of Data Cache Address Cache Address Set Line Word “00”1 102016 x x x 1 0 0 0 0 0 0 1 0 0 0 0 0 102416 x x x 1 0 0 0 0 0 0 1 0 0 1 0 0 102816 x x x 1 0 0 0 0 0 0 1 0 1 0 0 0 102C16 x x x 1 0 0 0 0 0 0 1 0 1 1 0 0 Notes: 1. The two least-significant bits for both instruction and data cache addresses are always “00”, indicating word alignment. Creation Date: 9/9/10 Page 3 of 9 Modification Date: 7/5/11 Aeroflex Colorado Springs Application Note 4 UT699-AN-07 Data Caching The following section provides an example of how external data is stored in cache memory, demonstrates the case where two external addresses are mapped to the same cache location, and explains how cache sets are used. Each set of the data cache contains 4096 bytes, or 1024 words, of cache memory that map to the entire 1GB external address space. Therefore, each individual cache location maps to 256k locations in external memory. Conversely, there are 256k locations of external address locations that map to a single location in cache memory. Now consider the case where two variables are written to external data, the first to address 4000200016 and the second to address 4000300016. Both variables are aligned on a 4kB boundary, which is the size of each data cache set. Therefore, they necessarily map to the same cache location. Specifically, they both map to the data cache at address 000016. This is shown in Figure 1 below. . Set Line Word 0 0 3 2 1 0 0 ... 3 2 1 0 0 255 3 2 1 0 1 0 3 2 1 0 1 ... 3 2 1 0 1 255 3 2 1 0 4000200016 4000300016 Figure 1. Example of External Data being Written to Cache In this example, the first write to external address 4000200016 results in a write to cache location 000016, which is the first word of the first row of set 0. The valid bit for this cache location will be set, indicating that the cache location contains valid data. Valid bits are discussed further in Section 5. Next, data is written to external address 4000300016. If the valid bit for cache location 000016 were not set, this would result in a write to that cache location. However, since the valid bit is set, the write occurs to cache address 100016,which is the first word of the first row of set 1. The cache controller uses a least-recently used (LRU) replacement policy. This means that a subsequent update to external memory on the same 4kB boundary results in a write to cache location 000016, assuming location 100016 was the most recently accessed location. The cache data will be overwritten with the new data. Creation Date: 9/9/10 Page 4 of 9 Modification Date: 7/5/11 Aeroflex Colorado Springs Application Note 5 UT699-AN-07 Cache Tags and Data Each cache memory location has an associated cache data and a cache tag. The cache tag of a particular cache location contains information that identifies the address of the associated data in external memory. The cache data of a particular cache location contains the data corresponding to the data in external memory. Refer to the tag layouts in Figures 2 and 4. These figures show the fields of the instruction and data cache tags. The actual physical layout of the cache tags is explained in Section 2.6.3 of the UT699 Functional Manual. The ITAG and DTAG fields contain the most-significant 20 bits of the address of the data in external memory. The least-significant 12 address bits directly correspond to the cache address and are used to access cache tags and data using the load and store instructions lda and sta. The IVAL and DVAL fields identify whether or not the corresponding word in a cache line is valid. A ‘1’ indicates that the word is valid, and accesses to the data or instruction at that address result in a valid cache hit. Note: The valid bits are shared with all cache tags for a given cache line. For more information on the instruction and data tag layouts, please refer to the UT699 Functional Manual. The cache data fields are represented in Figures 3 and 5. These are 32-bit fields that contain the same data as the referenced address in external memory when the cache is valid, i.e., the valid bit is set for that cache location. 31 12 11 8 ITAG 7 0 0000 IVAL Figure 2. Instruction Cache Tag Layout 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 4 3 2 1 0 DATA Figure 3. Instruction Cache Data Layout 31 12 11 DTAG 00000000 0 DVAL Figure 4. Data Cache Tag Layout 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 DATA Figure 5. Data Cache Data Layout Creation Date: 9/9/10 Page 5 of 9 Modification Date: 7/5/11 Aeroflex Colorado Springs Application Note UT699-AN-07 Consider the previous example in Figure 1 of a write to external memory at address 4000200016. It is assumed that prior to the write, the entire cache line does not contain valid data, i.e., the DVAL field is 00002. Following the write to external memory, the data cache tag contains the following information (in hexadecimal): 4000216 Base address is 4000216 0016 116 Always 0016 Valid bits (0-F) To reconstruct the external address from the cache tag, the DTAG or ITAG field is concatenated with the cache address. In this example, the DTAG field is 4000216 and the cache address is 000016. Therefore, the referenced external address is: 40002xxx16 + 000016 = 4000200016. Following this write, the data cache at location 000016 contains the same data as external address 4000200016. Creation Date: 9/9/10 Page 6 of 9 Modification Date: 7/5/11 Aeroflex Colorado Springs Application Note 6 UT699-AN-07 Accessing Cache Memory Using Alternate Space Identifier (ASI) Instructions Accesses to the cache tags and cache data are handled automatically by the LEON 3FT core. However, they can be accessed using lda and sta instructions. These commands are similar to the load and store instructions ld and st, except that they access memory in an alternate memory space using an alternate space identifier (ASI). The following table shows the ASI usage for the UT699 microprocessor. Table 6: ASI Usage ASI Usage 0116 Forced cache miss 0216 System (cache control) registers 0816, 0916, 0A16, 0B16 Normal instruction and data access 0C16 Instruction cache tags 0D16 Instruction cache data 0E16 Data cache tags 0F16 Data cache data 1016 Flush entire instruction cache 1116 Flush entire data cache For example, to access the data cache tag and cache data at a particular cache location, the programmer must use ASI 0E16 and 0F16 with an lda and sta instruction using inline assembly code. The most efficient way to access memory in the standard or an alternate memory space is to create inline assembly procedures called as C routines. The following four functions show examples of inline assembly code. The first two show how to perform a store and load operation in standard memory space. inline void storemem(int addr, int val) { asm volatile (" st %0, [%1] " : : "r" (val), "r" (addr) ); } inline int loadmem(int addr) { int tmp; asm volatile (" ld [%1], %0 " : "=r" (tmp) : "r" (addr) ); return tmp; } Creation Date: 9/9/10 // store val to addr // output // inputs // // // // Page 7 of 9 used for returned value load tmp from addr output input Modification Date: 7/5/11 Aeroflex Colorado Springs Application Note UT699-AN-07 The next two functions are used to read the values of the data cache tags and data cache data in alternate spaces 0E16 and 0F16, respectively. inline int loadmem_asi_0e(int addr) { int tmp; asm volatile (" lda [%1] 0x0e, %0 " : "=r" (tmp) : "r" (addr) ); return tmp; } inline int loadmem_asi_0f(int addr) { int tmp; asm volatile (" lda [%1] 0x0f, %0 " : "=r" (tmp) : "r" (addr) ); return tmp; } // // // // used for returned value load tmp from addr at ASI 0x0e output input // // // // used for returned value load tmp from addr at ASI 0x0f output input We can now make use of our inline assembly routines using a C function call. An example is the following write to data memory at locations 4000200016 and 4000300016 in standard memory space using the following C code: storemem(0x40002000, 0x55555555); storemem(0x40003000, 0xaaaaaaaa); In the event of a flushed data cache line or a cleared valid bit at cache location 000016, both physical memory locations would map to data cache address 000016, i.e., word 0 of line 0 of set 0. However, this example shows that the first store operation results in the writing of data 5555555516 to set 0, with the second store operation writing data to set 1. The C functions and resultant returned values are shown below: dcache_data = loadmem_asi_0f(0x0000); dcache_tag = loadmem_asi_0e(0x0000); The instruction passes the 12-bit cache address as a function parameter. The first function returns the cache data 5555555516. The second function returns the data cache tag 4000200116. The five most-significant hex digits of the tag indicate the upper 20 address bits of the data stored in physical memory space. The least-significant hex digit corresponds to the valid bits for the cache line. In this example, the value of ‘1’ in the least-significant digit indicates that word 0 is now valid as a result of the first storemem operation. The first storemem operation resulted in an update of the data cache memory at location 000016. Therefore, any access to data in physical memory at an address with the same three least-significant hex digits results in either a replacement at cache location 000016, or the data being written to set 1 at address 100016. Since only cache location 000016 has been updated, the Creation Date: 9/9/10 Page 8 of 9 Modification Date: 7/5/11 Aeroflex Colorado Springs Application Note UT699-AN-07 data will be written to set 1. To illustrate this, the data cache data and data cache tag at cache location 100016 are accessed using the following instructions: dcache_data = loadmem_asi_0f(0x1000); dcache_tag = loadmem_asi_0e(0x1000); These instructions return values of AAAAAAAA16 and 4000300116 for the data cache data and tag, respectively, showing that the data was stored in set 1. As before, the ‘1’ in the least-significant digit of the data cache tag indicates that the first word in the cache line is valid. Note: Diagnostic accesses to instruction cache (ASI 0C16 and 0D16) fail unless the instruction cache is disabled in the cache control register. 7 Conclusion Updates and accesses to data and instruction cache during the execution of application code are automatically handled by the LEON 3FT processor core logic. However, the contents of the cache tags and data are readily available with memory accesses using alternate space identifier (ASI) instructions. This can be particularly useful during code debug when confirmation of cache accesses is required or to compare performance in a system where cache could be either enabled or disabled. Creation Date: 9/9/10 Page 9 of 9 Modification Date: 7/5/11