Presentation

IBM HRL
Data Layout Optimizations in GCC
Olga Golovanevsky
Razya Ladelsky
IBM Haifa Research Lab, Israel
4/23/2007
© 2007 IBM Corporation
IBM HRL
Agenda
General idea
In scope of GCC
Structure reorganization optimizations
– Transformations
– Decision making mechanism
– Type–escape analysis
Matrix reorganization optimizations
– Matrix flattening
– Matrix transposing
– Combining: flattening + transposing
Current Status
Experimental results
2
4/23/2007
© 2007 IBM Corporation
IBM HRL
General Idea
General idea: to adapt the layout of a data structure to its access
patterns
Principle: if data elements are accessed close in time, they should
be allocated close in memory to gain better cache utilization
Optimizations - profile-based/static:
– matrix flattening, matrix transposing
– structure peeling, splitting and reordering
Flow:
– analyze data accesses
– change allocations, deallocations and access sites
For structure:
– fields are manipulated
For matrix:
– dimensions are manipulated
3
4/23/2007
© 2007 IBM Corporation
IBM HRL
In scope of GCC
Presented:
– matrix flattening/transposing:
•
gcc summit 2006
– struct-reorg:
•
•
gcc summit 2005 by M.Hagog and C.Tice
gcc summit 2007
Based on tree-ssa IR
Use ipa infrastructure
Jan Hubicka/Suse-Novell
Require whole program
– -fwhole-program, -combine
Potential for LTO
Developed on:
– matrix optimizations – ipa-branch
– struct-reorg – struct-reorg-branch
Scheduled for gcc4.3 mainline
4
4/23/2007
© 2007 IBM Corporation
IBM HRL
Structure peeling
typedef struct {int a; float b;} str;
str *arr = (str *) malloc(N*sizeof(str));
foo1 ()
foo2 ()
{
{
for (i=0; i<N; i++)
for (i=0; i<N; i++)
arr[i].a = …;
}
… = arr[i].b;
}
a b a b a b a b a b a b a b a b
5
4/23/2007
© 2007 IBM Corporation
IBM HRL
Structure peeling
typedef
{int{int
a;} str1;
typedefstruct
struct
a; float b;} str;
typedef struct {float b;} str2;
str1
*arr1== (str
(str1*)*) malloc(N*sizeof(str));
malloc(N*sizeof(str1));
str *arr
str2 *arr2 = (str2 *) malloc(N*sizeof(str2));
foo1 ()
foo2 ()
{
{
for (i=0; i<N; i++)
for (i=0; i<N; i++)
arr[i].a ==…;
arr1[i].a
…;
}
arr[i].b;
… = arr2[i].b;
}
a a
b a a
b a a
b a a
b a bb ba bb ba bb ba bb b
6
4/23/2007
© 2007 IBM Corporation
IBM HRL
Structure Splitting
typeset struct node_struct {
typedef struct node_struct {
struct node_struct *parent;
Node_0 * Node_0_ptr;
Expression
left;
Node_1 * Node_1_ptr;
Operator *
bin_op;
Node_2 * Node_2_ptr;
Expression
right;
} Node;
} Node;
Node
typedef struct {
Node
Node
Node
Node
Expression
Node_0
Node_0
Node_0
Node_0
Node_0
Node_1
Node_1
Node_1
Node_1
Node_1
Node_2
Node_2
Node_2
Node_2
Node_2
right; } Node_0;
typedef struct {
Expression
Operator *
left;
bin_op; } Node_1;
typedef struct {
struct node_struct *parent; } Node_2;
7
4/23/2007
© 2007 IBM Corporation
IBM HRL
Structure Reordering
typedef struct Operator_struct
{
typedef struct Operator_struct
{
n_int
precedence;
OpType
type;
const n_char *name;
OpMode
mode;
OpMode
n_char
symbol;
OpType
type;
n_char
symbol;
n_int
mode;
precedence;
struct Operator_struct *lower;
struct Operator_struct *
higher;
const char*
struct Operator_struct *
lower;
struct Operator_struct *higher;
} Operator;
name;
} Operator;
Less aggressive optimization
Does not change allocation/access sites, only type definition
8
4/23/2007
© 2007 IBM Corporation
IBM HRL
Decision Making Mechanism
profile-based; make use of CFG
Fields Reference Graph:
– vertices – field accesses
– edges
•
•
follow CFG
weighted with amount of data accessed between two field accesses
Rebuild Fields Reference Graph into Close Proximity Graph
– vertices – fields
– Edges
•
•
9
potentially all vertices are interconnected
average amount of data accessed between two fields
4/23/2007
© 2007 IBM Corporation
IBM HRL
Decision Making Mechanism
struct str {int f1; int f2; int f3};
foo ()
bb1
}
10
(100#,10)
100
(10#,25)
acc(str->f1)
{
…
acc(str->f1);
mem(10);
for (i=0; i<N; i++)
{
mem(5);
acc(str->f2);
mem(10);
acc(str->f1)
mem(7);
}
mem(8);
acc (str->f3);
…
(99#,12)
(209#,11)
(10.15)
f1
f2
mem(10)
(10#,15)
(90#,18)
(90#,18)
10
mem(5)
acc(str->f2)
acc(str->f2)
90(100#,10)
mem(10) (99#,12)
acc(str->f1)
mem(7)
bb3
bb2
10
(10#,15)
mem(8)
100
(10#,15)
f3
CPG
str1:{f1,f2}
srt2:{f3}
100
acc(str->f3)
acc(str->f3)
cfg
FRG
4/23/2007
© 2007 IBM Corporation
IBM HRL
Type escape analysis
structure escapes when:
–
–
–
–
–
pointer to structure is parameter/return type of global function
structure is global variable
one of the fields is escaped
cast to pointer to structure
operations other than plus, minus or times are applied to pointer to
structure
ipa-type-escape
–
–
–
–
11
ipa pass, by Kenneth Zadeck
initially written on gimple
recently updated to tree-ssa
by default with –O2
4/23/2007
© 2007 IBM Corporation
IBM HRL
Future plans
Support transformation for substructure
Analyze/reorg more that one structure at a time:
– consider the hierarchy of structures as a whole unite
– mixed data structures: arrays within structures
Spec2000/2006/mcf benchmark
•
new type of transformation
Analyze more data, tune decision make algorithm
Type escape analysis
•
•
12
more extensions with tree-ssa
check real world applications with big portion of the code is in libraries
4/23/2007
© 2007 IBM Corporation
IBM HRL
Agenda
General idea
In scope of GCC
Structure reorganization optimizations
– Transformations
– Decision making mechanism
– Type–escape analysis
Matrix reorganization optimizations
– Matrix flattening
– Matrix transposing
– Combining: flattening + transposing
Current status
Experimental results
13
4/23/2007
© 2007 IBM Corporation
IBM HRL
Matrix Flattening
Before flattening:
A
A[0]
A[0][0] A[0][1] A[0][2] A[0][3]
Access A[ i ][ j ] :
A[1]
A[1][0] A[1][1] A[1][2] A[1][3]
A[2]
A[2][0] A[2][1] A[2][2] A[2][3]
1. base = A + i * size(ptr)
2. A [ i ] [ j ] = *(*base + j * size(el))
After flattening:
A’
A[0][0] A[0][1] A[0][2] A[0][3] A[1][0] A[1][1] A[1][2] A[1][3] A[2][0] A[2][1] A[2][2] A[2][3]
One memory allocation
Access A[ i ][ j ] :
14
A [ i ] [ j ] = *(A’ + (i * 4 + j) * size(el))
4/23/2007
© 2007 IBM Corporation
IBM HRL
Flattening - the benefit
What is the benefit?
– Each matrix access involves less
indirections -> less memory references
– The rows are laid one after the other
-> contiguous space allocation
-> increase cache locality
15
4/23/2007
© 2007 IBM Corporation
IBM HRL
2.1 Matrix flattening: partial flattening
Flattening dim0 and
dim1
flattened
dim2
dim0
i
16
foo (a[i][j])
dim1
j
4/23/2007
© 2007 IBM Corporation
IBM HRL
Matrix Transposing
for (i=0; i<N; i++)
for (j=0; j<M; j++)
access to A[ ji ][ ij ]
A[0]
A[0][0]
A[0][0] A[0][1]
A[0][1] A[0][2]
A[0][2] A[0][3]
A[0][3]
A[1]
A[1][0]
A[1][0] A[1][1]
A[1][1] A[1][2]
A[1][2] A[1][3]
A[1][3]
A[2]
A[2][0]
A[2][0] A[2][1]
A[2][1] A[2][2]
A[2][2] A[2][3]
A[2][3]
The elements of the columns
rows are accessed
sequentially
A[0][1] A[1][1] A[2][1]
Iterated according to the columns
rows
A[0][2] A[1][2] A[2][2]
Matrix transposing: Interchanging the dimensions
17
A[0][0] A[1][0] A[2][0]
4/23/2007
A[0][3] A[1][3] A[2][3]
© 2007 IBM Corporation
IBM HRL
Matrix Transposing: decision making - example
for (i=0; i<N; i++)
-> row major matrix
for (j=0; j<M; j++)
acc1:
access to A[ i ][ j ]
for (i=0; i<N; i++)
-> column major matrix
for (j=0; j<M; j++)
acc2:
18
access to A[ j ][ i ]
4/23/2007
© 2007 IBM Corporation
IBM HRL
Matrix Transposing: decision making
Multi dimensional ->
all possible permutations for allocation
– How to decide the best permutation?
– We consider:
• Which dimensions are iterated in the innermost loops
• Profiling information: how often the accesses were executed
– The hotness of dimension determines how close in time
were accesses to the elements of this dimension,
normalized by the hotness of access execution.
19
4/23/2007
© 2007 IBM Corporation
IBM HRL
Combining Flattening and Transposing
Original matrix :
dim0
dim1
Flattened matrix
Flattening mechanism
Transposed matrix :
Virtual mapping
(Choose
permutation)
20
dim0 -> dim1
dim1 -> dim0
Flattened transposed matrix
Flattening mechanism
4/23/2007
© 2007 IBM Corporation
IBM HRL
Combining Flattening and Transposing
transposing + flattening ->
flatten the transposed matrix
A[3,4] => A[4,3]
A[0][0] A[0][1] A[0][2] A[0][3]
A[1][0] A[1][1] A[1][2] A[1][3]
A[2][0] A[2][1] A[2][2] A[2][3]
21
A[0][0] A[1][0] A[2][0]
=>
A[0][1] A[1][1] A[2][1]
A[0][2] A[1][2] A[2][2]
A[0][3] A[1][3] A[2][3]
4/23/2007
© 2007 IBM Corporation
IBM HRL
status
Matrix flattening and transposing:
– Initially developed on the ipa branch.
– To be included in GCC4.3; patch submitted recently
– Enabled by -fipa-mreorg, -fdump-ipa-mreorg for dump
– Transposing enabled when profiling information is available
Struct reorg:
– Initially developed on struct reorg branch
– To be included in GCC4.3
– Enabled by -fipa-reorg-structs
Thanks! Mustafa Hagog, Revital Eres, Daniel Berlin, Kenneth Zadeck, Zdenek
Dvorak, Jan Hubicka, Sebastian Pop
22
4/23/2007
© 2007 IBM Corporation
IBM HRL
power5/linux
Performance Results
180%
Original
160%
Matrix Flattening
140%
Flattening+Transposing
120%
Struct Reorg
100%
80%
60%
40%
20%
0%
power970/linux
art
equake
Original
600%
Matix Flattening
Flattening+Transposing
500%
Struct Reorg
Matrix+Struct Opt
400%
300%
200%
100%
0%
art
23
equake
4/23/2007
© 2007 IBM Corporation
IBM HRL
Questions?
24
4/23/2007
© 2007 IBM Corporation