A Proposal of Operation History Management System for Source-to-Source Optimization of HPC Programs

19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
A Proposal of Operation History Management System
for Source-to-Source Optimization of HPC Programs
Yasushi Negishi, Hiroki Murata and Takao Moriyama
Deep Computing, Tokyo Research Laboratory, IBM Research
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Outline of this Presentation
1.Proposal of an algorithm for managing operation
history of source-to-source optimization.
2.Prototype system with new user interface for
managing operation history explicitly.
2
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Outline of this Presentation
1.Proposal of an algorithm for managing operation
history of source-to-source optimization.
2.Prototype system with new user interface for
managing operation history explicitly.
3
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Background

Improvement of single processor performance is stopping, and architectures of
supercomputers is becoming more complex.
– Architecture-specific optimizations are needed to utilize various kinds of
network and processor architectures to achieve reasonable performance.
 Application areas for numerical simulations continue to expand.
–We need solve performance issues more effectively and more easily.
 Source-to-source optimization tools are becoming important.
–Automatic conversion (a.k.a. refactoring) for optimization
–Support typical architecture-specific and application-specific performance
optimization patterns.
–Reduce programmer’s time and human errors by supporting routine but
troublesome optimization.
4
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Typical Source-to-Source Optimization Steps

Strength reduction
Optimization steps are combinations of automatic conversion and manual editing
– Replace costly operation with an equivalent but less expensive operation
• E.g. x = r ** (-1)  x = 1 / r
– Steps
1. Modify the code to use less expensive operation by manual editing
Loop unrolling & SIMDization
– Use SIMD instructions If compiler does not generate optimal SIMD instructions in a loop
• E.g. x(i) = a(i) + b(i) * c(i)
 x(i) = FPMADD(a(i), b(i), c(i))
•
x(i+1) = a(i+1) + b(i+1) * c(i+1)
– Steps
1. Unroll the loop by automatic conversion with specifying the range and unroll factor.
2. Modify the unrolled loop body with in-line assemble code for SIMD by manual editing
Loop tiling (a.k.a. loop blocking, strip mine and interchange)
– Change loop structure to increase memory access locality and cache hit ratio.
• E.g.
for (i=0; i<N; i+= Bi)
for (i=0; i<N; i++)
for (j=0; j<N; j+= Bj)
for (j=0; j<N; j++)
for (ii=i; ii<min(i+Bi,N); ii++)
c[i] = c[i]+ a[i,j]*b[j];
for (jj=j; jj<min(j+Bj,N); jj++)
c[ii] =c[ii]+ a[ii,jj]*b[jj];
– Steps
1. Modify the loop by automatic conversion with specifying the range and blocking factors.



5
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
“Reapplication Conflict”
 Because of trial-and-error nature of optimization work, it is sometimes required to
undo an operation in the past or to insert or change operation in the past even if a
single user manages the code.
 We call this conflict caused by a single user as “Reapplication Conflict”.
 System for supporting Source-to-Source optimization should handle this conflict
correctly.
6
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Issues of Existing Version Management Systems Handling
“Reapplication Conflict”
 Because of trial-and-error nature of optimization work, it is sometimes
required to undo an operation in the past or to insert or change operation
in the past even if a single user manages the code.
–We call this conflict caused by a single user as “Reapplication Conflict”.
System should handle this conflict correctly.
 Existing version management systems use algorithm of “patch” command
or similar one to handle conflicts.
 But the patch algorithm has a issue.
–As for modification by manual editing, the patch algorithm works fine.
• The algorithm applies difference by an operation on different base
code, with adjusting target range to be applied.
–As for modification by automatic conversion, the patch algorithm may
generate unexpected results.
 Scenario in which existing system does not work expectedly is shown.
7
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Example Scenario of “Reapplication Conflict” (original)
program sample
implicit none
integer i, n
parameter(n=10000000)
real*8 a, b, pi, x(n), sin, s, t1, t2, t3, rtc
a=0
b=0
pi = 3.14159265d0
s = rtc()
do i = 1, n
x(i) = i * sin(i / (pi * 4.0d0))
enddo
t1 = rtc() - s
s = rtc()
do i = 1, n
a = a + x(i) ** (-1)
enddo
t2 = rtc() - s
s = rtc()
do i = 2, n
b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
enddo
t3 = rtc() - s
write(*,*) 'a=', a, 'b=', b
write(*,*) 'time=', t1, t2, t3
end
8
Original
Original code is checked out.
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Example Scenario of “Reapplication Conflict” (Step 1)
program sample
implicit none
integer i, n
parameter(n=10000000)
real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
a=0
b=0
pi = 3.14159265d0
s = rtc()
fourpi = pi * 4.0d0
do i = 1, n
x(i) = i * sin(i / fourpi)
enddo
t1 = rtc() - s
s = rtc()
do i = 1, n
a = a + x(i) ** (-1)
enddo
t2 = rtc() - s
s = rtc()
do i = 2, n
b = b + ((x(i) + a) / fourpi + 1.0d0)
enddo
t3 = rtc() - s
write(*,*) 'a=', a, 'b=', b
write(*,*) 'time=', t1, t2, t3
end
9
Original
Operation A
Original:
Step 1:
Step 1: Do loop invariant code motion by
manual editing, and check it in
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Example Scenario of “Reapplication Conflict” (Step 2)
program sample
implicit none
integer i, n
parameter(n=10000000)
real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
a=0
b=0
pi = 3.14159265d0
s = rtc()
fourpi = pi * 4.0d0
do i = 1, n
x(i) = i * sin(i / fourpi)
enddo
t1 = rtc() - s
s = rtc()
do i = 1, n
a = a + 1.0d0 / x(i)
enddo
t2 = rtc() - s
s = rtc()
do i = 2, n
b = b + ((x(i) + a) / fourpi + 1.0d0)
enddo
t3 = rtc() - s
write(*,*) 'a=', a, 'b=', b
write(*,*) 'time=', t1, t2, t3
end
10
Original
A
B
Original:
Step 1:
Step 2:
Step 2: Do strength reduction by manual
editing, and check it in.
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Example Scenario of “Reapplication Conflict” (Step 3)
program sample
implicit none
integer i, n
parameter(n=10000000)
real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
a=0
b=0
pi = 3.14159265d0
s = rtc()
fourpi = pi * 4.0d0
do i = 1, n
x(i) = i * sin(i / fourpi)
enddo
t1 = rtc() - s
s = rtc()
do i = 1, n
a = a + 1.0d0 / x(i)
enddo
t2 = rtc() - s
s = rtc()
do i = 2, n, 4
b = b + ((x(i) + a) / fourpi + 1.0d0)
b = b + ((x(i+1) + a) / fourpi + 1.0d0)
b = b + ((x(i+2) + a) / fourpi + 1.0d0)
b = b + ((x(i+3) + a) / fourpi + 1.0d0)
enddo
t3 = rtc() - s
write(*,*) 'a=', a, 'b=', b
write(*,*) 'time=', t1, t2, t3
end
11
Original
A
B
C
Original:
Step 1:
Step 2:
Step 3:
Step 3: Do loop unrolling by automatic
conversion, and check it in.
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Example Scenario of “Reapplication Conflict” (Step 4)
program sample
implicit none
integer i, n
parameter(n=10000000)
real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
a=0
b=0
pi = 3.14159265d0
s = rtc()
fourpi = pi * 4.0d0
do i = 1, n
x(i) = i * sin(i / fourpi)
enddo
t1 = rtc() - s
s = rtc()
do i = 1, n
a = a + 1.0d0 / x(i)
enddo
t2 = rtc() - s
s = rtc()
do i = 2, n, 4
b = b + ((x(i) + a) / fourpi + 1.0d0)
b = b + ((x(i+1) + a) / fourpi + 1.0d0)
b = b + ((x(i+2) + a) / fourpi + 1.0d0)
b = b + ((x(i+3) + a) / fourpi + 1.0d0)
enddo
t3 = rtc() - s
write(*,*) 'a=', a, 'b=', b
write(*,*) 'time=', t1, t2, t3
end
12
Original
A
B
C
Original:
Step 1:
Step 2:
Step 3:
N.G.
O.K.
O.K.
Step 4: Compile and execute the code,
and analyze effects of optimizations
 Find the following results
Optimization A: not effective
Optimization B: effective
Optimization C: effective
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Example Scenario of “Reapplication Conflict” (Step 5)
program sample
implicit none
integer i, n
parameter(n=10000000)
real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
13
Original
A
B
C
Original:
a=0
Step 1:
b=0
pi = 3.14159265d0
s = rtc()
fourpi = pi * 4.0d0
Step 2:
do i = 1, n
x(i) = i * sin(i / fourpi)
Target of optimization A
enddo
t1 = rtc() - s
Step 3:
s = rtc()
do i = 1, n
a = a + 1.0d0 / x(i)
enddo
Step 5:
t2 = rtc() - s
s = rtc()
do i = 2, n, 4
b = b + ((x(i) + a) / fourpi + 1.0d0)
Step 5: Undo the optimization A by
b = b + ((x(i+1) + a) / fourpi + 1.0d0)
“patch” command
b = b + ((x(i+2) + a) / fourpi + 1.0d0)
b = b + ((x(i+3) + a) / fourpi + 1.0d0)
enddo
t3 = rtc() - s
Not target of optimization A, but influenced
write(*,*) 'a=', a, 'b=', b
write(*,*) 'time=', t1, t2, t3
end
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Example Scenario of “Reapplication Conflict” (Final Results)
program sample
implicit none
integer i, n
parameter(n=10000000)
real*8 a, b, pi, x(n), sin, s, t1, t2, t3, rtc
a=0
b=0
pi = 3.14159265d0
s = rtc()
do i = 1, n
x(i) = i * sin(i / (pi * 4.0d0))
enddo
t1 = rtc() - s
s = rtc()
do i = 1, n
a = a + 1 / x(i)
enddo
t2 = rtc() - s
s = rtc()
do i = 2, n, 4
b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
b = b + ((x(i+1) + a) / fourpi + 1.0d0)
b = b + ((x(i+2) + a) / fourpi + 1.0d0)
b = b + ((x(i+3) + a) / fourpi + 1.0d0)
enddo
t3 = rtc() - s
write(*,*) 'a=', a, 'b=', b
write(*,*) 'time=', t1, t2, t3
end
14
Problem:
The wrong line is unrolled !!
Because “patch” does not actually
apply the automatic conversion
operation again, but does just apply
difference of the results by automatic
conversion operation.
System for managing automatic
conversion operations needed.
(1) Adjust the target range
(2) Apply the automatic operation
actually again.
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Proposed Algorithm for saving/applying automatic operations
 Manual editing handled by the patch algorithm
Saving an operation
Manual
Editing
Original
code
Applying an saved operation
Modified
code
Patch
Optimization
results
Optimized results
on modified code
algorithm
Context difference file
Context difference file
Operation log
Operation log
 Automatic conversion handled by our proposed algorithm
Specify
Range
Original
Code
15
Specify
Conversion ID
and arguments
Pseudo
change
file
Optimization
results
Modified
Code
Patch
algorithm
Pseudo
change
file
Optimization
results
Context difference file
Apply automatic
Context difference file conversion
Conversion ID
Conversion ID
Arguments
Arguments
Operation log
Operation log
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Scenario of Proposed Algorism to Save Automatic Operations
program sample
program
implicit
nonesample
implicit
integer
i, n none
integer
i, n
parameter(n=10000000)
Identifier
of
automatic conversion Operation
parameter(n=10000000)
real*8
a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
“loop unrolling”
a=0
b = a0 = 0
=0
pi =b3.14159265d0
parameter
= 3.14159265d0
s = pi
rtc()
s = rtc()
4 fourpi = pi * 4.0d0
do ifourpi
= 1, n= pi * 4.0d0
n / fourpi)
x(i)do= ii =* 1,
sin(i
x(i)
=
i
*
sin(i / fourpi)
enddo difference
context
file
enddo
t1 = rtc() - s
= rtc()
- sJul 11 11:36:34 2009
*** opeB.F
Sat
s = t1
rtc()
s
=
rtc()
--- opeC2.F
do i = 1, n Sun Jul 12 13:36:10 2009
n / x(i)
***************
ado
= ai =+1,
1.0d0
a
=
a
+
1.0d0 / x(i)
*** 19,27
enddo****
--- 19,29
---- - s
t2 =enddo
rtc()
t2
= rtc() - s
enddo
s = rtc()
s
=
t2i == rtc()
do
2,rtc()
n- s
$BEGIN
s b= =rtc()
b + ((x(i) + a) / fourpi + 1.0d0)
do i = 2, n
+ $BEGIN
enddo
=-bns+ ((x(i) + a) / fourpi + 1.0d0)
=b 2,
t3do
= irtc()
enddo
b = b 'a=',
+ ((x(i)
+ a) b/ fourpi + 1.0d0)
write(*,*)
a, 'b=',
$END
enddo 'time=', t1, t2, t3
write(*,*)
+ $END
endt3 = rtc() - s
t3 write(*,*)
= rtc() - s'a=', a, 'b=', b
write(*,*)
'time=',
t1,bt2, t3
write(*,*)
'a=',
a, 'b=',
end 'time=', t1, t2, t3
write(*,*)
16
Algorithm for saving operation history
log Step 1: Generate pseudo change file
pseudo change file
by inserting special lines to specify
range for the automatic operation.
Step 2: Create context difference file
between the file before editing and the
pseudo change file
By saving this context difference file,
range-adjust algorithm of “patch”
command can be used for identifying the
target range of automatic conversion.
Step 3: Save identifier of automatic
conversion operation (e.g. “loop
unrolling”), its parameter (e.g. “4”), and
the context difference file as its operation
log.
© 2009 IBM Corporation
Identifier of automatic conversion Operation log
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
“loopAutomatic
unrolling”
Scenario of Proposed Algorism to Apply
Operation (Step 1)
program sample
program
implicit
nonesample
implicit
integer
i, n none
integer
i, n
parameter(n=10000000)
parameter(n=10000000)
real*8
a, b, pi, x(n), sin, s, t1, t2, t3, rtc
real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
a=0
b = a0 = 0
=0
pi =b3.14159265d0
= 3.14159265d0
s = pi
rtc()
do is==1,rtc()
n
= pi */ 4.0d0
x(i)fourpi
= i * sin(i
(pi * 4.0d0))
do
i
=
1,
n
enddo
x(i) =- is* sin(i / fourpi)
t1 = rtc()
s = enddo
rtc()
t1
do i = =1,rtc()
n -s
s
=
rtc()
a = a + x(i) ** (-1)
do i = 1, n
enddo
a =-as+ 1.0d0 / x(i)
t2 = rtc()
s = enddo
rtc()
t2
do i = =2,rtc()
n -s
s
=
rtc()
b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
$BEGIN
enddo
i = -2,s n
t3 =do
rtc()
b = 'a=',
b + ((x(i)
+ a)
write(*,*)
a, 'b=',
b / fourpi + 1.0d0)
enddo'time=', t1, t2, t3
write(*,*)
$END
end
t3 = rtc() - s
write(*,*) 'a=', a, 'b=', b
2:
Ignore
the
starting
and
ending
Trialwrite(*,*)
3:
4:
1: Apply
outer
history
one
thelines
line
same
'time=',the
t1, t2,
t3mostattwo
end
line
numbersthe modification
position
before/after
Not Match
Match
17
parameter
4
context difference file
*** opeB.F
Sat Jul 11 11:36:34 2009
--- opeC2.F Sun Jul 12 13:36:10 2009
***************
*** 19,27 ****
--- 19,29 ---enddo
t2 = rtc() - s
s = rtc()
+ $BEGIN
do i = 2, n
b = b + ((x(i) + a) / fourpi + 1.0d0)
enddo
+ $END
t3 = rtc() - s
write(*,*) 'a=', a, 'b=', b
write(*,*) 'time=', t1, t2, t3
Algorithm for applying operation history
on modified target code
Step1: Apply the context diff file to the
target program by using algorithm used
© 2009 IBM Corporation
pseudo change file by the “patch” command.
Identifier of automatic conversion Operation log
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
“loopAutomatic
unrolling”
Scenario of Proposed Algorism to Apply
Operation (Step 2)
program sample
implicit none
integer i, n
parameter(n=10000000)
real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
a=0
b=0
pi = 3.14159265d0
s = rtc()
fourpi = pi * 4.0d0
do i = 1, n
x(i) = i * sin(i / fourpi)
enddo
t1 = rtc() - s
s = rtc()
do i = 1, n
a = a + 1.0d0 / x(i)
enddo
Redo
“loop unrolling” “4” times
t2 = rtc() - s
s = rtc()
$BEGIN
do i = 2, n
b = b + ((x(i) + a) / fourpi + 1.0d0)
enddo
$END
t3 = rtc() - s
write(*,*) 'a=', a, 'b=', b
write(*,*) 'time=', t1, t2, t3
end
18
parameter
4
context difference file
on “the loop”
pseudo change file
*** opeB.F
Sat Jul 11 11:36:34 2009
--- opeC2.F Sun Jul 12 13:36:10 2009
***************
*** 19,27 ****
--- 19,29 ---enddo
t2 = rtc() - s
s = rtc()
+ $BEGIN
do i = 2, n
b = b + ((x(i) + a) / fourpi + 1.0d0)
enddo
+ $END
t3 = rtc() - s
write(*,*) 'a=', a, 'b=', b
write(*,*) 'time=', t1, t2, t3
Algorithm for applying operation history
on modified target code
Step2: Redo automatic conversion with
its parameter saved in the operation log.
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Proposed Algorism to Apply Automatic Operation (Final Results)
program sample
implicit none
integer i, n
parameter(n=10000000)
real*8 a, b, pi, x(n), sin, s, t1, t2, t3, rtc
a=0
b=0
pi = 3.14159265d0
s = rtc()
do i = 1, n
x(i) = i * sin(i / (pi * 4.0d0))
enddo
t1 = rtc() - s
s = rtc()
do i = 1, n
a = a + 1 / x(i)
enddo
t2 = rtc() - s
s = rtc()
do i = 2, n, 4
b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
b = b + ((x(i+1) + a) / (pi * 4.0d0) + 1.0d0)
b = b + ((x(i+2) + a) / (pi * 4.0d0) + 1.0d0)
b = b + ((x(i+3) + a) / (pi * 4.0d0) + 1.0d0)
enddo
t3 = rtc() - s
write(*,*) 'a=', a, 'b=', b
write(*,*) 'time=', t1, t2, t3
end
19
Problem solved.
The correct line is unrolled !!
The proposed system can reapply
automatic conversion operations
correctly.
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Outline of this Presentation
1.Proposal of an algorithm for managing operation
history of source-to-source optimization.
2.Prototype system with new user interface for
managing operation history explicitly.
20
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
User defined
Transformation
rules
User defined
Transformation
rules
Pre-defined
Transformation
rules
Prototype Implementation of the Proposed System
HPC refactoring module
Photran
module
(Fortran)
CDT
module
(C)
Open
Open
Source
Source
Eclipse
 Implemented as an Eclipse plug-in module
– Worked with open source CDT/Photran modules
– Use CDT/Photran’s C/Fortran parser
21
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Proposal of user interface for operation history management system
1. Operation History is displayed as a
sequence, and user can select and
modify any point of source code.
2. The succeeding operations are
automatically reapplied as needed
to
Operation
produce a new version according
to
history view
the user’s instructions.
Source code view
3. Operations are categorized into
the following three categories
and
according
status and
Source
code to theInformation
console
output view and
tree
view of the
necessity
reapplication,
are displayed by using three colors.
Green: Applied
Yellow: Not tried to applied
Operation
Red: Tried to applied, but
fail. view
22
history
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Conclusion
1.Explained proposal of an algorithm for managing
operation history of source-to-source optimization.
2.Explained Prototype system with new user interface
for managing operation history explicitly.
23
© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
Questions ?
24
© 2009 IBM Corporation
Similar pages