The Feasibility of Memory Encryption and
Authentication
Donald Owen, Jr.
Laboratory for Computer Architecture
Department of Electrical and Computer Engineering
The University of Texas at Austin
Austin, TX 78705, USA
FastPath 2013: April 21, 2013
D. Owen
1/32
Outline
Introduction
Motivation
Background
Solution Characterization
Software
Hardware
Results
Conclusion
Motivation
Digital data is increasingly put on mobile devices and remote
servers. This data needs to be protected.
Security Problems
- Health records
- SSN
- Private communication (emails)
- Corporate documents
- Crypto keys
- ... more
Security Solutions?
- Disk encryption
- Strong passwords
- Hardened software
A major component left unprotected: DRAM.
Existing Protections Not Enough
Attacks abound
- Passive
  - Bus Sniffing*
- Active
  - Spoofing
  - Splicing
  - Replay
  - Cold boot*
Existing Protections Not Enough
The Xbox™ was hacked by Andrew “bunnie” Huang (MIT), who used a bus
sniffer to read a secret key / decryption code.
Image: Huang [1]
Existing Protections Not Enough
The “cold boot” attack, and variants thereof, exploit DRAM
remanence to extract data from RAM.
1. Interrupt power to a running system
2. Reboot into a custom OS
3. Dump the contents of DRAM to permanent storage
4. Mine the dumped data for keys, files, fragments
5. Use the recovered data to exploit, for example, an encrypted disk
Memory Encryption & Authentication
What do we want?       How do we get it?
C: Confidentiality     Encryption
I: Integrity           Hash/Tag/Signature
A: Authentication      Tag/Signature
In short, encrypt and tag data to RAM; decrypt and authenticate
data from RAM.
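The write and read paths described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheme evaluated in the talk: the Python standard library has no AES, so a SHA-256 counter-mode keystream stands in for the cipher and a truncated HMAC-SHA-256 stands in for the GCM tag. The names `keystream`, `write_line`, and `read_line`, and the per-line `counter` argument, are hypothetical.

```python
import hashlib
import hmac


def keystream(key: bytes, address: int, counter: int, length: int) -> bytes:
    """Stand-in keystream (SHA-256 in counter mode). This is NOT AES."""
    out = b""
    block = 0
    while len(out) < length:
        seed = (key + address.to_bytes(8, "big")
                + counter.to_bytes(8, "big") + block.to_bytes(4, "big"))
        out += hashlib.sha256(seed).digest()
        block += 1
    return out[:length]


def write_line(enc_key, mac_key, address, counter, plaintext):
    """Encrypt a cache line and tag it before it leaves the chip."""
    ks = keystream(enc_key, address, counter, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, ks))
    header = address.to_bytes(8, "big") + counter.to_bytes(8, "big")
    # The tag covers address and counter as well as the ciphertext.
    tag = hmac.new(mac_key, header + ciphertext, hashlib.sha256).digest()[:16]
    return ciphertext, tag


def read_line(enc_key, mac_key, address, counter, ciphertext, tag):
    """Authenticate a line fetched from RAM, then decrypt it."""
    header = address.to_bytes(8, "big") + counter.to_bytes(8, "big")
    expected = hmac.new(mac_key, header + ciphertext, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed")
    ks = keystream(enc_key, address, counter, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))
```

Binding the address into the tag is what lets a check like this reject a line moved to a different location (splicing). Rejecting a replayed line additionally requires that the counter itself be trusted, for example by keeping it on the chip.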
Memory Encryption & Authentication
- Approach 1: Encrypt and tag each cache line. Store tags on
  the chip. Verify on reading back from RAM.
  - Problem: Storage space.
- Approach 2: Use a tree structure! Store tags in DRAM with
  the root stored on the chip. Verify up the tree on reading
  back.
  - Problem: Speed.
- Variants: Use a dedicated tag cache on the chip. Verify until
  you hit in the cache.
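Approach 2 and the tag-cache variant can be sketched as a hash tree. This is a minimal sketch under assumptions the slides do not state: SHA-256 as the hash, a binary tree, and a power-of-two number of lines. The names `build_tree`, `verify_line`, and `tag_cache` are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(lines):
    """Return the tree as a list of levels, leaves first.

    Assumes len(lines) is a power of two. Every level lives in untrusted
    DRAM; only the single root hash is kept on the chip.
    """
    level = [h(line) for line in lines]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def verify_line(index, line, levels, trusted_root, tag_cache=None):
    """Recompute hashes from the leaf toward the root.

    tag_cache maps (depth, index) to a hash held on the chip. A cached
    node is trusted, so the walk stops there instead of reaching the root.
    """
    tag_cache = tag_cache or {}
    node = h(line)
    for depth in range(len(levels) - 1):
        cached = tag_cache.get((depth, index))
        if cached is not None:
            return cached == node
        sibling = levels[depth][index ^ 1]  # read from untrusted DRAM
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == trusted_root
```

With N lines, a full walk costs about log2(N) extra hash computations and DRAM reads per access, which is the speed problem noted above. A cache hit at depth d cuts the walk to d steps.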
Memory Encryption & Authentication
Image: Elbaz et al. [2]
Galois Counter Mode
Image: Wikipedia [3]
Solution Characterization
Previous work assumed the existence of hardware adequate to the
task of encryption, decryption, tagging, and verifying fast enough
to meet performance demands.
We evaluated how feasible those assumptions are under different
implementation characteristics.
- Software
  - Pure C on x86
  - C + x86 Assembly
  - C + x86 Assembly + ISA Extensions
- RTL on FPGA
- RTL on synthesized ASIC
Experimental Setup
Table: Experimental Setup

Processor        Intel Core i7 2620M
OS               Fedora Linux 17, GNU/Linux 3.7 x86_64
Compiler         GCC 4.7.2
FPGA Synthesis   Xilinx ISE v. 14.3, Kintex 7-325T
ASIC Synthesis   Synopsys Design Vision v. E2010-12, FreePDK 45 nm Library
Pure C Implementation
MiBench has an AES (Rijndael) benchmark. We modified this
benchmark to suit the implementation requirements.
Modifications
- Convert AES-CBC to AES-GCM.
- Convert File I/O to in-memory operations.
- Profile at cache line sizes.
Pure C Implementation
Table: Cycles per Byte Measurements for Pure C Implementation of AES-GCM

Buffer Size   Encrypt   Decrypt
32B           52.2      74.2
64B           37.8      50.4
128B          35.6      39.8
256B          28.3      35.8
512B          25.4      33.0
C + Assembly
We can do better!
- The same code used for the pure C implementation has optional
  Assembly routines.
- OpenSSL uses Assembly optimizations.
- Modern x86 processors have ISA extensions for AES and
  GCM.
C + Assembly
Table: Cycles per Byte Measurements for Assembly-Optimized AES-GCM (64B Buffer)

Method               Encrypt   Decrypt
Pure C               37.8      50.4
C + Opt.             22        30
OpenSSL              ~25       ~25
ISA Extensions [4]   3.5       ~3.5
RTL Module
Most previous work assumes the existence of hardware modules.
We adapted an open-source AES-GCM module to be suitable for
both FPGA and ASIC synthesis.
RTL Module Characteristics - FPGA
Table: Open-Source vs. Representative Commercial AES-GCM RTL Core on Kintex 7 FPGA

Metric                 Open Source   Commercial
Startup                19 clocks     0 clocks
16B Enc/Dec            22 clocks     12 clocks
16B Tag (Hash)         17 clocks     12 clocks
64B Cache Line + Tag   123 clocks    60 clocks
Freq. Max              212 MHz       256 MHz
Logic Slices           ~800          ~1000
Block RAMs             8             12
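The clock counts and maximum frequencies above translate into per-line latency and sustained throughput with simple arithmetic. The sketch below assumes one 64B line is processed at a time with no overlap between lines. The function names are hypothetical, and the printed values are derived from the table, not separately measured.

```python
def line_latency_ns(freq_hz: float, clocks_per_line: int) -> float:
    """Time to encrypt and tag one cache line, in nanoseconds."""
    return clocks_per_line / freq_hz * 1e9


def line_throughput_mbps(freq_hz: float, clocks_per_line: int,
                         line_bytes: int = 64) -> float:
    """Sustained throughput in Mbps if lines are processed back to back."""
    return freq_hz * line_bytes * 8 / clocks_per_line / 1e6


# Values taken from the FPGA table: (Freq. Max, clocks per 64B line + tag).
cores = {"Open Source": (212e6, 123), "Commercial": (256e6, 60)}
for name, (freq, clocks) in cores.items():
    print(f"{name}: {line_latency_ns(freq, clocks):.0f} ns per line, "
          f"{line_throughput_mbps(freq, clocks):.1f} Mbps")
```

For the open-source core this gives roughly 882.5 Mbps, which matches the FPGA throughput reported in the results summary.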
FPGA RTL Synthesis Results
Linear fit: 193 mW/instance.
RTL Module Characteristics - ASIC
FreePDK 45 Implementation
- Freq. Max: 250 MHz
- Area: 89k µm²
- Power: 12 mW
ASIC RTL Synthesis Results
Linear fit: 11.05 mW/instance.
Results - Summary
Table: Summary of Different Implementation Methods

Method         Clock (Hz)   Cycles/Byte   Throughput   Typ. Power   Typ. Area   Mbps/mW
ASIC           225 M        1.9           936.6 Mbps   11.05 mW     74.1k µm²   84.7
FPGA           200 M        1.9           882.5 Mbps   192.9 mW     -           4.57
x86 C          2.7 G        44            490.9 Mbps   ~35 W        -           1.40 × 10^-2
x86 Assembly   2.7 G        22            981.8 Mbps   ~35 W        -           2.81 × 10^-2
x86 ISA Ext.   2.7 G        3.5           6.17 Gbps    ~35 W        -           1.76 × 10^-1
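The x86 throughput figures follow from clock rate and cycles per byte, and the efficiency column is throughput divided by typical power. The sketch below reproduces that arithmetic. The function names are hypothetical, and the ~35 W figure is the approximate power quoted in the table.

```python
def throughput_mbps(clock_hz: float, cycles_per_byte: float) -> float:
    """Bytes per second = clock / (cycles per byte); times 8 for bits."""
    return clock_hz / cycles_per_byte * 8 / 1e6


def efficiency_mbps_per_mw(mbps: float, power_mw: float) -> float:
    return mbps / power_mw


# x86 rows: 2.7 GHz clock, cycles/byte from the table, about 35 W.
for name, cpb in [("C", 44), ("Assembly", 22), ("ISA Ext.", 3.5)]:
    mbps = throughput_mbps(2.7e9, cpb)
    print(f"x86 {name}: {mbps:.1f} Mbps, "
          f"{efficiency_mbps_per_mw(mbps, 35_000):.2e} Mbps/mW")

# Hardware rows: throughput and power taken directly from the table.
print(f"ASIC: {efficiency_mbps_per_mw(936.6, 11.05):.2f} Mbps/mW")
print(f"FPGA: {efficiency_mbps_per_mw(882.5, 192.9):.2f} Mbps/mW")
```

The results agree with the table to within rounding. They show the ASIC and FPGA modules delivering two to four orders of magnitude more throughput per milliwatt than the software paths.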
Implementation Feasibility
Table: Peak Memory Bandwidth of Several Modern Systems

System      BW (GB/s)
Nexus 7     5.3
Nexus 10    12.8
iPhone 5    8.5
iPad 3      12.8
Intel i7    25.6
AMD FX      21
Implementation Feasibility
Table: Number of Instances to Meet Peak BW (Intel i7)

Method      Instances
C           590
C + Opt.    210
C + ISA     34
FPGA        230
ASIC        220

220 ASIC Modules ≈ 16 mm² at 45 nm.
220 ASIC Modules ≈ 2.4 W at 45 nm.
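The hardware instance counts and the ASIC totals can be checked with a short calculation. This sketch assumes 1 GB/s = 8000 Mbps, perfectly parallel instances, and the per-instance throughput, area (74.1k µm²), and power (11.05 mW) from the results summary. The function name is hypothetical.

```python
import math


def instances_needed(peak_gbytes_per_s: float, instance_mbps: float) -> int:
    """Smallest number of parallel instances that covers the peak bandwidth."""
    peak_mbps = peak_gbytes_per_s * 8 * 1000
    return math.ceil(peak_mbps / instance_mbps)


INTEL_I7_PEAK_GBS = 25.6  # from the peak memory bandwidth table

print("ASIC:", instances_needed(INTEL_I7_PEAK_GBS, 936.6))
print("FPGA:", instances_needed(INTEL_I7_PEAK_GBS, 882.5))

# Totals for the 220 ASIC modules quoted on the slide.
n = 220
total_area_mm2 = n * 74.1e3 / 1e6   # µm² to mm²
total_power_w = n * 11.05 / 1000    # mW to W
print(f"{n} modules: {total_area_mm2:.1f} mm², {total_power_w:.2f} W")
```

The exact counts come out slightly below the slide's 220 and 230, which appear to be rounded up. The totals are about 16.3 mm² and 2.43 W, in line with the ≈ 16 mm² and ≈ 2.4 W quoted.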
Summary
What Have We Shown?
- Software solutions require too much power.
- Software solutions require too much area.
- Software solutions are too slow.
- FPGA solution may be useful for existing designs.
- ASIC solution may be feasible for implementation in a real
  system.
Thank you!
Questions?
References
[1] [Online]. Available: http://www.xenatera.com/bunnie/proj/anatak/xboxmod.html#ldt
[2] R. Elbaz, D. Champagne, C. Gebotys, R. B. Lee, N. Potlapally, and L. Torres, “Hardware mechanisms for
    memory authentication: A survey of existing techniques and engines,” in Transactions on Computational
    Science IV, M. L. Gavrilova, C. J. Tan, and E. D. Moreno, Eds. Berlin, Heidelberg: Springer-Verlag, 2009,
    pp. 1–22. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-01004-0_1
[3] [Online]. Available: https://en.wikipedia.org/wiki/Galois/Counter_Mode
[4] S. Gueron and M. E. Kounavis, “Intel® carry-less multiplication instruction and its usage for computing
    the GCM mode,” White Paper, 2010.