The Feasibility of Memory Encryption and Authentication

Donald Owen, Jr.
Laboratory for Computer Architecture
Department of Electrical and Computer Engineering
The University of Texas at Austin
Austin, TX 78705, USA

FastPath 2013: April 21, 2013

D. Owen 1/32

Outline

- Introduction
  - Motivation
  - Background
- Solution Characterization
  - Software
  - Hardware
- Results
- Conclusion

Motivation

Digital data is increasingly put on mobile devices and remote servers. This data needs to be protected.

Security Problems:
- Health records
- SSN
- Private communication (emails)
- Corporate documents
- Crypto keys
- ... more

Security Solutions?
- Disk encryption
- Strong passwords
- Hardened software

A major component left unprotected: DRAM.

Existing Protections Not Enough

Attacks abound:
- Passive
  - Bus sniffing*
- Active
  - Spoofing
  - Splicing
  - Replay
  - Cold boot*

Xbox™ hacked by Andrew “bunnie” Huang (MIT) using a bus sniffer to read a secret key / decryption code.

[Image: Huang [1]]

The “cold boot” attack, and variants thereof, exploit DRAM remanence to extract data from RAM.
1. Interrupt power of a running system
2. Reboot into custom OS
3. Dump contents of DRAM to permanent storage
4. Mine dumped data for keys, files, fragments
5. Use recovered data to exploit, for example, encrypted disk

Memory Encryption & Authentication

What do we want?      How do we get it?
C: Confidentiality    Encryption
I: Integrity          Hash/Tag/Signature
A: Authentication     Tag/Signature

In short, encrypt and tag data to RAM; decrypt and authenticate data from RAM.

- Approach 1: Encrypt and tag each cache line. Store tags on the chip. Verify on reading back from RAM.
  - Problem: Storage space.
- Approach 2: Use a tree structure!
  Store tags in DRAM with the root stored on the chip. Verify up the tree on reading back.
  - Problem: Speed.
  - Variants: Use a dedicated tag cache on the chip. Verify until you hit in the cache.

[Image: Elbaz et al. [2]]

Galois Counter Mode

[Image: Wikipedia [3]]

Solution Characterization

Previous work assumed the existence of hardware adequate to the task of encryption, decryption, tagging, and verifying fast enough to meet performance demands.

We evaluated how feasible those assumptions are under different implementation characteristics.

- Software
  - Pure C on x86
  - C + x86 Assembly
  - C + x86 Assembly + ISA Extensions
- RTL on FPGA
- RTL on synthesized ASIC

Experimental Setup

Table: Experimental Setup

Processor         Intel Core i7 2620M
OS                Fedora Linux 17, GNU/Linux 3.7 x86_64
Compiler          GCC 4.7.2
FPGA Synthesis    Xilinx ISE v. 14.3, Kintex 7-325T
ASIC Synthesis    Synopsys Design Vision v. E2010-12, FreePDK 45 nm Library

Pure C Implementation

MiBench has an AES (Rijndael) benchmark. We modified this benchmark to suit the implementation requirements.

Modifications:
- Convert AES-CBC to AES-GCM.
- Convert file I/O to in-memory operations.
- Profile at cache line sizes.

Table: Cycles per Byte Measurements for Pure C Implementation of AES-GCM

Buffer Size    Encrypt    Decrypt
32B            52.2       74.2
64B            37.8       50.4
128B           35.6       39.8
256B           28.3       35.8
512B           25.4       33.0

C + Assembly

We can do better!
- The same code in the pure C implementation has optional Assembly routines.
- OpenSSL uses Assembly optimizations.
- Modern x86 processors have ISA extensions for AES and GCM.

Table: Cycles per Byte Measurements for Assembly-Optimized AES-GCM (64B Buffer)

Method                Encrypt    Decrypt
Pure C                37.8       50.4
C + Opt.              22         30
OpenSSL               ~25        ~25
ISA Extensions [4]    3.5        ~3.5

RTL Module

Most previous work assumes the existence of hardware modules. We adapted an open-source AES-GCM module to be suitable for both FPGA and ASIC synthesis.

RTL Module Characteristics - FPGA

Table: Open-Source vs. Representative Commercial AES-GCM RTL Core on Kintex 7 FPGA

Metric                  Open Source    Commercial
Startup                 19 clocks      0 clocks
16B Enc/Dec             22 clocks      12 clocks
16B Tag (Hash)          17 clocks      12 clocks
64B Cache Line + Tag    123 clocks     60 clocks
Freq. Max               212 MHz        256 MHz
Logic Slices            ~800           ~1000
Block RAMs              8              12

FPGA RTL Synthesis Results

Linear fit: 193 mW/instance.

RTL Module Characteristics - ASIC

FreePDK 45 Implementation
- Freq. Max: 250 MHz
- Area: 89k µm²
- Power: 12 mW

ASIC RTL Synthesis Results

Linear fit: 11.05 mW/instance.

Results - Summary

Table: Summary of Different Implementation Methods

Method          Clock (Hz)    Cycles/Byte    Throughput    Typ. Power    Typ. Area    Mbps/mW
ASIC            225 M         1.9            936.6 Mbps    11.05 mW      74.1k µm²    84.7
FPGA            200 M         1.9            882.5 Mbps    192.9 mW      -            4.57
x86 C           2.7 G         44             490.9 Mbps    ~35 W         -            1.40 × 10^-2
x86 Assembly    2.7 G         22             981.8 Mbps    ~35 W         -            2.81 × 10^-2
x86 ISA Ext.    2.7 G         3.5            6.17 Gbps     ~35 W         -            1.76 × 10^-1

Implementation Feasibility

Table: Peak Memory Bandwidth of Several Modern Systems

System      BW (GB/s)
Nexus 7     5.3
Nexus 10    12.8
iPhone 5    8.5
iPad 3      12.8
Intel i7    25.6
AMD FX      21

Table: Number of Instances to Meet Peak BW

            C      C + Opt.    C + ISA    FPGA    ASIC
Intel i7    590    210         34         230     220

220 ASIC modules ≈ 16 mm² at 45 nm.
220 ASIC modules ≈ 2.4 W at 45 nm.

Summary

What Have We Shown?
- Software solutions require too much power.
- Software solutions require too much area.
- Software solutions are too slow.
- FPGA solution may be useful for existing designs.
- ASIC solution may be feasible for implementation in a real system.

Thank you! Questions?

Backup Slides

References

[1] A. Huang. [Online]. Available: http://www.xenatera.com/bunnie/proj/anatak/xboxmod.html#ldt
[2] R. Elbaz, D. Champagne, C. Gebotys, R. B. Lee, N. Potlapally, and L. Torres, “Hardware mechanisms for memory authentication: A survey of existing techniques and engines,” in Transactions on Computational Science IV, M. L. Gavrilova, C. J. Tan, and E. D. Moreno, Eds. Berlin, Heidelberg: Springer-Verlag, 2009, pp. 1–22. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-01004-0_1
[3] Wikipedia, “Galois/Counter Mode.” [Online]. Available: https://en.wikipedia.org/wiki/Galois/Counter_Mode
[4] S. Gueron and M. E. Kounavis, “Intel® carry-less multiplication instruction and its usage for computing the GCM mode,” White Paper, 2010.
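
Backup: Reproducing the Feasibility Arithmetic

The instance counts and the ASIC power/area totals on the Implementation Feasibility slides can be approximated from the summary table. The sketch below is not from the original deck. The function and constant names are hypothetical, and it assumes 1 GB = 10^9 bytes, exact ceiling rounding, and linear scaling of power and area with instance count. The pure C row is left out because the slides do not say which cycles-per-byte figure underlies its instance count.

```python
import math

# Figures copied from the summary and feasibility tables in the slides.
I7_PEAK_BYTES_PER_S = 25.6e9   # Intel i7 peak bandwidth; assumes 1 GB = 1e9 bytes
THROUGHPUT_BPS = {             # per-instance throughput, bits per second
    "C + Opt.": 981.8e6,
    "C + ISA": 6.17e9,
    "FPGA": 882.5e6,
    "ASIC": 936.6e6,
}
ASIC_POWER_W = 11.05e-3        # linear-fit power per ASIC instance
ASIC_AREA_UM2 = 74.1e3         # typical area per ASIC instance, square micrometers


def cycles_to_bps(clock_hz, cycles_per_byte):
    """Convert a cycles-per-byte cost at a given clock into bits per second."""
    return clock_hz / cycles_per_byte * 8


def instances_needed(peak_bytes_per_s, throughput_bps):
    """Smallest whole number of instances whose combined throughput covers the peak."""
    return math.ceil(peak_bytes_per_s * 8 / throughput_bps)


def asic_budget(n_instances):
    """Total (watts, mm^2) for n ASIC instances, assuming linear scaling."""
    watts = n_instances * ASIC_POWER_W
    mm2 = n_instances * ASIC_AREA_UM2 * 1e-6  # 1 mm^2 = 1e6 um^2
    return watts, mm2


if __name__ == "__main__":
    for name, bps in THROUGHPUT_BPS.items():
        print(name, instances_needed(I7_PEAK_BYTES_PER_S, bps))
    print(asic_budget(220))
```

Under these assumptions the exact counts land within a few instances of the figures on the slide, which appear to be rounded, and 220 ASIC instances come to roughly 2.4 W and 16 mm², matching the slide.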