March 29, 2000 12:56 pm
The MIRV SimpleScalar/PISA Compiler
Matthew Postiff, David Greene, Charles Lefurgy, Dave Helder and Trevor Mudge
{postiffm,greened,lefurgy,dhelder,tnm}@eecs.umich.edu
EECS Department, University of Michigan
1301 Beal Ave., Ann Arbor, MI 48109-2122
Abstract
We introduce a new experimental C compiler in this report. The compiler, called MIRV, is
designed to enable research that explores the interaction between the compiler and
microarchitecture. This introductory paper compares MIRV with GCC.
We identify trends across compilers and optimization levels on SPECint95 and
SPEC2000. Finally, we provide a set of SimpleScalar/PISA binaries to the research community. As we improve the compiler, we encourage architecture researchers to use these
optimized binaries as reference programs for architecture research.
1. Introduction
The design of computers in general and microprocessors in particular has shown a steady
increase in both performance and complexity. Advanced techniques such as pipelining and out-of-order execution have increased the design and verification effort required to create a viable product. To overcome some of these problems, hardware designers have been exploring ways to move
functionality into the compiler. From RISC to current designs such as Intel’s IA64, the compiler
has played a greater role in simplifying the hardware while maintaining the current trend of performance improvement.
The MIRV compiler is designed to analyze trade-offs between compile-time and run-time
knowledge of program behavior. MIRV enables research into this area in four ways. First, the
compiler is built with a modular filter architecture. This allows the researcher to easily write optimizations and explore their placement in the phase ordering. Second, the retargetable code generator and low-level optimizer support both commercially available microprocessors and the
popular SimpleScalar simulation environment. This allows both realistic performance evaluation
and exploration of next-generation instruction set architectures. Third, MIRV
provides an interface for program instrumentation and profile back-annotation. This allows studies into runtime behavior as well as profile-guided optimizations. Fourth, the compiler environment that we have developed around MIRV provides easy regression testing, debugging, and
extraction of performance characteristics of both the compiler and the compiled code.
In this report we introduce MIRV and compare its performance to the GCC compiler. This
report also introduces a package of SPEC binary executables which are compiled with GCC and
MIRV at various optimization levels. The purpose of this document is to explain the compilation
and simulation environment in which the binaries were produced and to summarize the performance differences between the compiled code. Several notable results are presented.
The organization of the rest of this paper is as follows. Section 2 describes the compilation
environment that we used to generate the results shown. Similarly, Section 3 outlines the simulation environment. Section 4 introduces the performance graphs shown in the appendices and Section 5 describes some interesting observations made from the performance graphs. Section 6 explains how to obtain the binaries and Section 7 concludes. The appendices contain detailed compilation and simulation results and provide additional detail on the optimizations that were performed during compilation.
2. Compilation Environment
We tested seven compiler configurations. The first, labeled ‘SSsup’, is the SimpleScalar-supplied binary, available at the SimpleScalar web site [2]. The next three configurations were compiled in our test environment with the GCC 2.7.2.3 port to the PISA instruction set.
This tool is available from UC-Davis [7]. We also used a pre-release version of binutils 2.9.5 for
the assembler and linker. These were slightly modified from sources at Cygnus [8]. The final
three configurations were compiled with MIRV and used the same assembler and linker as the
GCC builds.
The MIRV compiler implements the most common optimization passes. The exact order
of application of the optimization filters is given in Table 4 in Appendix A. For comparison,
Appendix B contains the optimizations applied in the GCC compiler.
MIRV always applies register coalescing and graph coloring register allocation in the
backend, regardless of the optimization level. The allocator is implemented with the standard
graph coloring algorithm except that it does not implement live range splitting or rematerialization [3]. This means that it is not fair to compare GCC -O0 with MIRV -O0 since GCC does not
perform register allocation at the -O0 optimization level.
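As a rough illustration of what graph coloring allocation without splitting or rematerialization looks like, the following Python sketch colors an interference graph with a fixed number of registers. It is not taken from MIRV: the function name, the dictionary-based graph representation, and the lowest-degree-first removal order are all assumptions made for this example, and a virtual register that cannot be colored is simply marked as spilled.

```python
def color_graph(interference, num_regs):
    """Map each virtual register to a register index, or None if it is spilled.

    `interference` is assumed to be a symmetric adjacency mapping:
    if b is in interference[a], then a is in interference[b].
    """
    graph = {v: set(ns) for v, ns in interference.items()}
    remaining = set(graph)
    stack = []

    # Simplify: repeatedly remove the node with the fewest remaining neighbors.
    while remaining:
        node = min(remaining, key=lambda v: (len(graph[v] & remaining), v))
        stack.append(node)
        remaining.remove(node)

    # Select: pop nodes and give each the lowest register not used by a
    # neighbor that has already been colored. No splitting is attempted.
    colors = {}
    while stack:
        node = stack.pop()
        used = {colors[n] for n in graph[node] if colors.get(n) is not None}
        free = [r for r in range(num_regs) if r not in used]
        colors[node] = free[0] if free else None  # None marks a spill
    return colors


# Hypothetical interference graph: a, b, c are mutually live; d overlaps a.
example = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
allocation = color_graph(example, 3)
```

With three registers every node in the example receives a register; with two, one node of the a-b-c triangle has to be spilled, which is the situation in which live range splitting or rematerialization would otherwise help.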
3. Simulation Environment
The SimpleScalar 3.0 sim-outorder simulator was used with default parameters [4]. Table
1 shows the relevant default parameter values. All simulations were performed in little-endian
mode.
We used the SPEC95 integer benchmarks and several of the SPEC00 benchmarks [5, 6].
All benchmarks were run to completion on the data set indicated in the table; we modified the
supplied input sets to allow the simulations to complete in a reasonable amount of time (about 100
million instructions). The benchmarks are described in Table 2 and the exact input sets are shown
in Table 3.
4. SPEC Performance Graphs
The full set of graphs comparing MIRV to GCC can be found in Appendices C and D.
These graphs show various metrics for each of the eight SPEC95 benchmarks and selected
SPEC00 benchmarks. Table 6 explains each of the graphs and any special notes on how the data
was gathered. For the SPEC95 benchmarks, we include the PISA binary supplied on the SimpleScalar website [2] as a comparison point. These benchmarks were compiled with the arguments “-O2 -funroll-loops”. There are no supplied binaries for SPEC00 benchmarks, so no SSsup data appears for those in our graphs. The full set of results is given in Appendix F.
SimpleScalar parameter   Value
fetch queue size         4
fetch speed              1
decode width             4
issue width              4 out-of-order, wrong-path issue included
commit width             4
RUU (window) size        16
LSQ                      8
FUs                      alu:4, mult:1, memport:2, fpalu:4, fpmult:1
branch prediction        2048-entry table of 2-bit counters, 4-way 512-set BTB, 3 cycle extra mispredict latency, non-speculative update, 8-entry return address stack
L1 D-cache               128-set, 4-way, 32-byte lines, LRU, 1-cycle hit, total of 16KB
L1 I-cache               512-set, direct-mapped, 32-byte line, LRU, 1-cycle hit, total of 16KB
L2 unified cache         1024-set, 4-way, 64-byte line, 6-cycle hit, total of 256KB
memory latency           18 cycles for first chunk, 2 thereafter
memory width             8 bytes
Instruction TLB          16-set, 4096 byte page, 4-way, LRU, 30 cycle miss penalty
Data TLB                 32-set, 4096 byte page, 4-way, LRU, 30 cycle miss penalty
Table 1. Simulation parameters for sim-outorder (the defaults).
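The cache totals quoted in Table 1 follow from sets × associativity × line size. The short Python check below reproduces them; the helper name `cache_bytes` and the constant names are ours, chosen only for this illustration.

```python
def cache_bytes(sets, ways, line_bytes):
    """Total data capacity of a set-associative cache in bytes."""
    return sets * ways * line_bytes


# Geometry taken from Table 1.
L1_DCACHE = cache_bytes(sets=128, ways=4, line_bytes=32)    # 16 KB
L1_ICACHE = cache_bytes(sets=512, ways=1, line_bytes=32)    # 16 KB (direct-mapped)
L2_UNIFIED = cache_bytes(sets=1024, ways=4, line_bytes=64)  # 256 KB

for name, size in [("L1 D", L1_DCACHE), ("L1 I", L1_ICACHE), ("L2", L2_UNIFIED)]:
    print(f"{name}: {size // 1024} KB")
```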
Category      Benchmark   Description
SPECint95     compress    An in-memory version of the common UNIX utility.
SPECint95     gcc         Based on the GNU C compiler version 2.5.3.
SPECint95     go          An internationally ranked go-playing program.
SPECint95     ijpeg       Image compression/decompression on in-memory images.
SPECint95     li          Xlisp interpreter.
SPECint95     m88ksim     A chip simulator for the Motorola 88100 microprocessor.
SPECint95     perl        An interpreter for the Perl language.
SPECint95     vortex      An object oriented database.
SPECfp2000    art         Recognizes objects in a thermal image using a neural network.
SPECfp2000    equake      Simulation of seismic wave propagation in large basins.
SPECint2000   gzip        Data compression program that uses Lempel-Ziv coding (LZ77) as its compression algorithm.
SPECint2000   mcf         A benchmark derived from a program used for single-depot vehicle scheduling in public mass transportation.
SPECint2000   vortex      A single-user object-oriented database transaction benchmark which exercises a system kernel coded in integer C.
SPECint2000   vpr         Performs placement and routing in Field-Programmable Gate Arrays.
Table 2. Descriptions of the benchmarks used in this study.
The only anomalous behavior we observed during simulations was in the vortex benchmark, where we discovered that the SimpleScalar-supplied binary had been compiled with the flag ‘-DOPTIMIZE’. The GCC and MIRV binaries that we initially built were not compiled with this flag because we did not know about it. The flag is a preprocessor definition that turns on various optimizations in the vortex code itself. We added ‘-DOPTIMIZE’ to our builds and the anomaly was resolved.
Category      Benchmark   Input
SPECint95     compress    30000 q 2131
SPECint95     gcc         regclass.i
SPECint95     go          9 9 null.in
SPECint95     ijpeg       specmun.ppm, -compression.quality 25, other args as in training run
SPECint95     li          boyer.lsp (reference input)
SPECint95     m88ksim     ctl.lit (train input)
SPECint95     perl        jumble.pl < jumble.in, dictionary up to ’angeline’ only
SPECint95     vortex      250 parts and 1000 people, other variables scaled accordingly
SPECfp2000    art         -scanfile c756hel.in -trainfile1 a10.img -stride 2 -startx 134 -starty 220 -endx 139 -endy 225 -objects 1 (test input)
SPECfp2000    equake      < inp.in (test input)
SPECint2000   gzip        input.compressed 1 (test input)
SPECint2000   mcf         inp.in (test input)
SPECint2000   vortex      250 parts and 1000 people, other variables scaled accordingly
SPECint2000   vpr         net.in arch.in place.in route.out -nodisp -route_only -route_chan_width 15 -pres_fac_mult 2 -acc_fac 1 -first_iter_pres_fac 4 -initial_pres_fac 8 (test input)
Table 3. Description of benchmark inputs.
5. Performance Observations
Several interesting observations can be made from the data shown in Appendices C and D.
These observations fall into several categories, which are examined in the following subsections. It is important to keep in mind the simulator configuration shown in Table 1.
5.1 Comparing MIRV to GCC
GCC has no register allocation in -O0. MIRV has graph coloring allocation and register
coalescing (simple copy propagation). Since GCC and MIRV unoptimized code is otherwise very
similar, we can use these two bars to show an estimate of the importance of register allocation.
For example, MIRV -O0 execution times are often 20% faster than GCC -O0 and sometimes
much faster. This benefit is solely due to register allocation. MIRV -O1 and -O2 perform a little
worse than GCC. This is borne out in the graphs on cycles and dynamic counts of instructions,
memory references and branches. The dynamic instruction mix graphs point out that MIRV is uniformly higher than GCC in all categories of instructions (Appendix E), particularly in memory
operations. When MIRV produces better code than GCC, it is often because it has reduced the
number of ‘other’ instructions (this happens in go, ijpeg, vortex, and vortex00).
The graphs show that dynamic instruction count is often a very good indication of the
number of cycles the benchmark will take to execute. However, there are several counter-examples. For instance, the mirv-O2 instruction count for perl is 2% worse than for GCC-O2 but the
binary executes 9.6% faster. The opposite happens on go.
5.2 Comparing SPEC95 to SPEC00
There are several characteristics that differentiate SPEC95 from SPEC00. IPC ranges
from 1 to 2 for SPEC95 and 0.6 to 1.8 for SPEC00. The average number of instructions per
branch is 4 to 6 for SPEC95 and 4 to 8 for SPEC00 (ignoring ijpeg and the unoptimized binaries).
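Both metrics are simple ratios of the counts reported in Appendix F: IPC is retired instructions divided by execution cycles, and instructions per branch (IPB) is retired instructions divided by retired branches. The Python sketch below computes them for a few gccO2 entries copied from Tables F.1, F.2, and F.6; the function and variable names are ours.

```python
def ipc(instructions, cycles):
    """Instructions retired per cycle."""
    return instructions / cycles


def ipb(instructions, branches):
    """Average number of instructions per branch."""
    return instructions / branches


# gccO2 counts from Appendix F (instructions, cycles or branches).
compress95_ipc = ipc(123_989_867, 74_120_871)     # about 1.67
equake00_ipc = ipc(1_459_964_520, 1_129_167_092)  # about 1.29
gcc95_ipb = ipb(122_048_567, 24_768_175)          # about 4.93

print(f"compress95 IPC {compress95_ipc:.2f}")
print(f"equake00   IPC {equake00_ipc:.2f}")
print(f"gcc95      IPB {gcc95_ipb:.2f}")
```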
SPEC00 instruction cache miss rates are very low except for the vortex benchmark. The
instruction cache simulated in this work is 16KB. The floating point benchmarks art and equake
have very small source code: each is a single source file, with 1270 and 1513 lines of source
code, respectively. The integer benchmark mcf is similarly small at 2412 lines of code. These
benchmarks are similar to compress, ijpeg, and li95 in the SPEC95 suite. The other SPEC95
benchmarks have a much higher miss ratio than SPEC00. SPEC00 vortex has a slightly higher miss
rate than the SPEC95 version of vortex.
SPEC00 data cache miss rates are much higher than SPEC95. Whereas SPEC95 miss rates
are generally less than 2% (5% for compress), SPEC00 miss rates are usually around 4%. art is a
particularly notable example with up to a 40% miss rate. Within a given compiler, optimization
generally makes the data miss rate worse. This is to be expected as optimizations cause more efficient use of registers, thus eliminating the “easy” load and store operations and leaving those that
are essential to the algorithm. A prime example of this is the art benchmark, where the data cache
miss rate increases from 15% to 40% as optimizations are enabled from -O0 to -O2. At the same
time, however, the number of data references is cut by a factor of three. The low-hanging fruit has been
harvested and the “essential” memory accesses remain in the benchmark. The unified L2 cache
suffers a higher miss rate in SPEC00 as well.
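A quick calculation shows why a rising miss rate is compatible with fewer total misses. The Python sketch below uses only the approximate art figures quoted above (15% and 40% miss rates, references cut by a factor of three) with the -O0 reference count normalized to 1; the variable names are ours and the result is illustrative rather than measured.

```python
refs_O0 = 1.0               # data references at -O0, normalized
refs_O2 = refs_O0 / 3       # references are cut by a factor of three at -O2

misses_O0 = 0.15 * refs_O0  # 15% miss rate at -O0
misses_O2 = 0.40 * refs_O2  # 40% miss rate at -O2

# The absolute number of misses changes little (it falls slightly);
# the miss rate rises mainly because the denominator shrinks.
print(f"misses at -O0: {misses_O0:.3f}")
print(f"misses at -O2: {misses_O2:.3f}")
print(f"ratio -O2 / -O0: {misses_O2 / misses_O0:.2f}")
```

Under these rounded inputs the optimized binary takes slightly fewer misses in total, which matches the interpretation that the removed references were mostly cache hits.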
The SPEC00 binaries presented here are much smaller than the binaries for SPEC95. This
is one reason that the instruction cache performs so much better for SPEC00. On the other hand,
the instruction window is much busier in SPEC00 than it is in SPEC95, as shown in the register-update-unit utilization graph. One might expect smaller programs to make less use of the
instruction window, but because of the high data cache miss rates it appears that instructions are
held up longer in the window.
To summarize the differences between SPEC00 and SPEC95, we saw that IPC and data
cache performance were lower for the newer benchmarks, but that these programs exercised the
instruction cache less because of their smaller code size. This points out the importance of selecting the appropriate set of benchmarks for a given architectural study. Instruction cache studies
should probably avoid many of the SPEC00 benchmarks because they do not stress the instruction
cache. On the other hand, data cache studies would emphasize SPEC00 because it strains the data
side of the caching system much more than SPEC95. SPEC00 also seems to require a bigger
instruction window to avoid window-full stalls. The two suites together seem to provide a nice
complement of characteristics; most studies should use both suites.
5.3 Comparing Optimization Characteristics
MIRV and GCC optimizations exhibit similar characteristics across most of the benchmarks but there are exceptions. For example, -O2 optimization usually produces code that runs
slightly faster than -O1 code. However, in the case of the vortex benchmark, -O2 code is slightly
worse than -O1 code for MIRV. This is due to register promotion, which in this case increases the
register pressure to the point of introducing additional spill code.
Branch prediction accuracy is generally much worse for unoptimized binaries. One reason
for this is simply the larger number of branches that are executed (20% fewer branches are executed in -O2 than in -O0). For both SPEC95 and SPEC00, prediction accuracies range from
roughly 82% to 98% and usually optimizations increase prediction accuracy by 4% or more.
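The reduction in executed branches can be checked directly against the dynamic branch counts in Appendix F. As a small worked example, the Python sketch below uses the gcc95 row of Table F.6; the helper name `percent_fewer` is ours, and a single benchmark is of course not representative of the whole suite.

```python
def percent_fewer(base, optimized):
    """Percentage reduction of `optimized` relative to `base`."""
    return 100.0 * (base - optimized) / base


# gcc95 dynamic branch counts from Table F.6 (-O0 versus -O2).
gcc_reduction = percent_fewer(32_075_384, 24_768_175)   # about 22.8%
mirv_reduction = percent_fewer(31_987_107, 26_912_338)  # about 15.9%

print(f"GCC:  {gcc_reduction:.1f}% fewer branches at -O2")
print(f"MIRV: {mirv_reduction:.1f}% fewer branches at -O2")
```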
GCC optimizations usually increase the number of instructions retired per cycle (IPC) but
for MIRV the opposite is the case.
Both compilers typically demonstrate a reduction in instruction-cache miss rate with optimizations enabled. For vortex, however, MIRV optimizations increase the instruction cache
miss rate while GCC optimizations improve instruction cache performance. For the li benchmark, the reverse occurs.
6. Obtaining and Installing the Binaries
The version 1 binaries used to produce the data in this report are available on the MIRV
website [1], including the binaries supplied on the SimpleScalar website [2]. The README file
there explains how to install the binaries.
7. Conclusion
This report has introduced the MIRV compiler. As its performance improves, we encourage architecture researchers to use these binaries in conjunction with the SimpleScalar simulation
environment as examples of highly optimized programs. As they evolve, these will include
advanced optimizations that are not available in GCC and so should be more representative of
state-of-the-art compilation techniques.
Acknowledgments
This work was supported by DARPA grant DABT63-97-C-0047. Simulations were performed on computers donated through the Intel Education 2000 Grant.
References
[1] http://www.eecs.umich.edu/mirv
[2] ftp://ftp.cs.wisc.edu/sohi/Code/simplescalar/simplebench.little.tar
[3] Preston Briggs. Register Allocation via Graph Coloring. Rice University, Houston, Texas, USA. Tech. Report. April, 1992.
[4] Douglas C. Burger and Todd M. Austin. The SimpleScalar Tool Set, Version 2.0. University of Wisconsin, Madison Tech. Report. June, 1997.
[5] Standard Performance Evaluation Corporation. SPEC CPU95. http://www.spec.org/osg/cpu95/, Warrenton, Virginia, 1995.
[6] Standard Performance Evaluation Corporation. SPEC CPU2000. http://www.spec.org/osg/cpu2000/, Warrenton, Virginia, 2000.
[7] http://arch.cs.ucdavis.edu/RAD/gcc-2.7.2.3.ss.tar.gz
[8] http://sourceware.cygnus.com/binutils/
Appendix A. MIRV Optimizations
Frontend
Optimize Level   Filter Applied
-O2              -fscalReplAggr
-O3              -fcallGraph
-O3              -finline
-O3              -ffunctCleaner
-O2              -floopUnroll
-O1              -farrayToPointer
-O1              -floopInversion
-O1              -fconstantFold
-O1              -fpropagation
-O1              -freassociation
-O1              -fconstantFold
-O1              -farithSimplify
-O2              -fregPromote
-O1              -fdeadCode
-O1              -floopInduction
-O1              -fLICodeMotion
-O1              -fCSE
-O1              -fpropagation
-O1              -fCSE
-O1              -farithSimplify
-O1              -fconstantFold
-O1              -fpropagation
-O4              -fLICodeMotion
-O1              -farithSimplify
-O1              -fconstantFold
-O1              -fstrengthReduction
-O2              -fscalReplAggr
-O1              -farithSimplify
-O1              -fdeadCode
-O1              -fcleaner
Backend
Optimize Level   Filter Applied
-O1              -fpeephole0
-O1              -fpeephole1
-O1              -fblockClean
-O1              -fcse
-O1              -fcopy_propagation
-O1              -fconstant_propagation
-O1              -fdead_code_elimination
-O1              -fpeephole0
-O1              -fpeephole1
-O1              -fcse
-O1              -fcopy_propagation
-O1              -fconstant_propagation
-O1              -fdead_code_elimination
-O1              -fpeephole0
-O1              -fpeephole1
-O1              -flist_scheduler
-O0              -freg_alloc
-O1              -flist_scheduler_aggressive
-O1              -fpeephole0
-O1              -fpeephole1
-O1              -fcselocal
-O1              -fcopy_propagation
-O1              -fdead_code_elimination
-O1              -fpeephole1
-O1              -fblockClean
-O1              -fleafopt
Table 4. Order of optimization filter application in MIRV. Since the system is based on MIRV-to-MIRV filters, filters can easily be run more than once, as the table shows. The frontend filters
operate on the MIRV high-level IR while the backend filters operate on a quad-type low-level IR.
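One way to read Table 4 is as an ordered schedule in which each entry carries the lowest optimization level at which it runs. The Python sketch below illustrates that reading on the first nine frontend entries. The interpretation that a filter tagged -On also runs at every higher level is our assumption, and the names `FRONTEND_SCHEDULE` and `filters_for_level` are hypothetical; MIRV's actual driver may select filters differently.

```python
# (minimum level, filter flag), in application order: first nine frontend rows.
FRONTEND_SCHEDULE = [
    (2, "-fscalReplAggr"),
    (3, "-fcallGraph"),
    (3, "-finline"),
    (3, "-ffunctCleaner"),
    (2, "-floopUnroll"),
    (1, "-farrayToPointer"),
    (1, "-floopInversion"),
    (1, "-fconstantFold"),
    (1, "-fpropagation"),
]


def filters_for_level(schedule, level):
    """Return the filters enabled at `level`, preserving schedule order.

    Assumes a filter tagged with minimum level n runs whenever level >= n.
    """
    return [name for min_level, name in schedule if level >= min_level]


print(filters_for_level(FRONTEND_SCHEDULE, 1))
print(filters_for_level(FRONTEND_SCHEDULE, 2))
```

Because the schedule is a plain ordered list, running a filter more than once or moving it in the phase ordering amounts to adding or reordering entries.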
Appendix B. GCC Optimizations
The table shows the optimization sequence when ‘-O3 -funroll-loops’ is turned on. The
following flags are enabled: ‘-fdefer-pop -fomit-frame-pointer -fcse-follow-jumps -fcse-skip-blocks -fexpensive-optimizations -fthread-jumps -fstrength-reduce -funroll-loops -fpeephole -fforce-mem -ffunction-cse functions -finline -fcaller-saves -fpcc-struct-return -frerun-cse-after-loop -fschedule-insns -fschedule-insns2 -fcommon -fgnu-linker -mgas -mgpOPT -mgpopt’. The
table is somewhat incomplete because of the lack of documentation on GCC internal operations.
Optimization Applied
jump optimization
cse
jump optimization
loop invariant code motion
strength reduction (induction variables)
loop unroll
cse
coalescing
scheduling (first pass)
register allocation (local, then global)
insert prologue and epilogue code
scheduling (second pass)
branch optimizations (delayed and shortening)
jump optimization
dead-code elimination
Table 5. Optimization flags in GCC 2.7.2.3/PISA. The GCC compiler is
flag based, meaning that an optimization is either on or off. Multiple
invocations of an optimization require a special flag (e.g. -frerun-cse-after-loop).
Appendix C. SPEC95 Results
[Figures: bar charts comparing the seven configurations (SSsup, gccO0, gccO1, gccO2, mirvO0, mirvO1, mirvO2) on the SPEC95 benchmarks compress95, gcc95, go, ijpeg, li95, m88ksim, perl, and vortex. Panels: Execution Cycles (millions); Retired Dynamic Instructions (millions); Retired Dynamic Memory References (millions); Retired Dynamic Loads (millions); Retired Dynamic Stores (millions); Retired Dynamic Branches (millions); IPC; Average Instructions Per Branch (IPB); L1 Instruction Cache Miss Rate; Branch Prediction Accuracy; L1 Data Cache Miss Rate; Text Size 2 (bytes, millions); L2 Miss Rate; Percent Cycles RUU Full.]
Graph                               Special Notes
Execution Cycles                    sim_cycle
Retired Dynamic Instructions        sim_num_insn
Retired Dynamic Memory References   sim_num_refs
Retired Dynamic Loads               sim_num_loads
Retired Dynamic Stores              sim_num_stores
Retired Dynamic Branches            sim_num_branches
IPC                                 sim_IPC
Average Instructions Per Branch     sim_IPB
Branch Prediction Accuracy          bpred_bimod.bpred_dir_rate
L1 Instruction Cache Miss Rate      il1.miss_rate
L1 Data Cache Miss Rate             dl1.miss_rate
Text Size 2                         This is computed as bfd_section_size(abfd, sect) of the “.text” section in the binary. This is slightly more accurate than ld_text_size. SimpleScalar instructions are 64 bits each.
L2 Miss Rate                        ul2.miss_rate
Percent Cycles RUU Full             ruu_full
Table 6. Explanation of the graphs in Appendices C and D. Statistics without further
explanation are simply the statistic that is produced by the default sim-outorder simulator.
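Because every SimpleScalar instruction occupies 64 bits, a text size in bytes converts directly into a static instruction count, and the same arithmetic shows how many instructions fit in the 16KB, 32-byte-line L1 instruction cache of Table 1. The Python sketch below is only an illustration; its names are ours and the 1,000,000-byte text size is a made-up input, not a measured value.

```python
INSTRUCTION_BYTES = 8  # SimpleScalar instructions are 64 bits each


def static_instructions(text_bytes):
    """Number of instructions in a .text section of the given size."""
    return text_bytes // INSTRUCTION_BYTES


ICACHE_BYTES = 16 * 1024  # L1 instruction cache size from Table 1
LINE_BYTES = 32           # L1 instruction cache line size from Table 1

instructions_per_line = LINE_BYTES // INSTRUCTION_BYTES  # 4
icache_capacity = static_instructions(ICACHE_BYTES)      # 2048 instructions

# Hypothetical 1,000,000-byte text section, for illustration only.
print(static_instructions(1_000_000))
print(instructions_per_line, icache_capacity)
```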
Appendix D. SPEC00 Results
[Figures: bar charts comparing the configurations (SSsup, gccO0, gccO1, gccO2, mirvO0, mirvO1, mirvO2) on the SPEC00 benchmarks art00, equake00, gzip00, mcf00, vortex00, and vpr00. Panels: Execution Cycles (millions); Retired Dynamic Instructions (millions); Retired Dynamic Memory References (millions); Retired Dynamic Loads (millions); Retired Dynamic Stores (millions); Retired Dynamic Branches (millions); IPC; Average Instructions Per Branch (IPB); L1 Instruction Cache Miss Rate; Branch Prediction Accuracy; L1 Data Cache Miss Rate; Text Size 2 (bytes, millions); L2 Miss Rate; Percent Cycles RUU Full.]
Appendix E. Dynamic Instruction Mix Results
[Figures: bar charts titled “Dynamic Instruction Mix” showing dynamic instructions (millions) broken down into stores, loads, other, and branches for each benchmark and configuration (SSsup, gccO0, gccO1, gccO2, mirvO0, mirvO1, mirvO2). The charts cover the SPEC95 benchmarks (compress95, gcc95, go, ijpeg, li95, m88ksim, perl, vortex) and the SPEC00 benchmarks (art00, equake00, gzip00, mcf00, vortex00, vpr00).]
Appendix F. Detailed Results
Table F.1. Number of execution cycles.
cycles               SSsup          gccO0          gccO1          gccO2         mirvO0         mirvO1         mirvO2
gcc95          133,198,943    200,481,213    137,146,750    135,925,630    199,353,822    153,676,261    155,570,866
compress95      74,128,332    108,056,529     76,954,251     74,120,871    101,465,965     73,727,346     71,995,564
go             144,963,348    289,772,154    153,516,403    143,099,654    177,020,674    136,487,401    144,615,493
ijpeg           60,867,889    116,708,870     62,016,032     60,362,407     68,882,890     59,562,890     59,092,026
li95           111,990,825    198,753,658    123,987,875    122,404,069    171,851,043    126,223,376    119,688,571
m88ksim         73,873,102    203,182,527     87,329,609     73,359,777    143,770,331     97,886,738    100,992,063
perl            91,785,556    118,283,857    104,616,089     94,904,097     99,226,691     84,818,970     88,070,759
vortex         144,218,250    222,066,885    150,370,197    159,279,949    211,559,470    164,531,061    167,160,492
art00                    0  8,183,059,761  3,932,008,214  3,691,442,317  5,664,033,174  3,896,938,836  3,978,797,881
equake00                 0  1,983,885,959  1,228,292,079  1,129,167,092  1,592,383,199  1,246,462,154  1,097,503,110
gzip00                   0  1,458,535,132    741,395,794    729,502,893    995,964,507    789,611,147    830,910,020
mcf00                    0    300,632,079    185,845,520    180,043,497    216,663,192    176,769,988    176,698,141
vortex00                 0    222,071,242    150,275,946    159,022,773    211,142,086    165,024,308    166,842,914
vpr00                    0    960,620,460    497,678,504    497,481,148    667,190,527    513,963,243    505,800,410
Table F.2. Number of dynamic instructions.
dynInsn              SSsup          gccO0          gccO1          gccO2         mirvO0         mirvO1         mirvO2
gcc95          121,291,882    173,501,299    124,358,856    122,048,567    165,479,820    136,597,068    135,902,282
compress95     124,007,203    202,848,107    127,320,796    123,989,867    182,370,014    128,062,400    124,117,434
go             132,918,691    268,225,327    146,032,235    134,765,723    179,700,961    131,900,396    132,506,089
ijpeg          123,953,291    221,597,997    125,493,959    124,300,853    138,774,682    114,940,431    114,701,757
li95           173,968,882    277,800,863    176,326,885    173,607,213    234,372,535    177,251,019    177,236,548
m88ksim        119,317,263    214,992,826    124,426,497    119,670,756    184,779,644    122,942,402    123,731,658
perl           108,713,654    129,120,457    110,119,889    109,608,239    127,682,949    111,669,505    111,731,966
vortex         153,682,491    207,233,844    157,864,689    157,918,608    205,467,402    168,991,471    168,981,011
art00                    0  6,269,718,143  2,270,057,265  2,024,827,366  3,883,917,755  2,131,438,659  2,137,494,813
equake00                 0  2,984,585,045  1,502,806,229  1,459,964,520  1,967,861,216  1,519,585,190  1,382,962,709
gzip00                   0  2,149,944,982  1,308,126,550  1,276,220,491  1,840,361,191  1,448,658,876  1,531,669,686
mcf00                    0    405,724,021    209,934,857    202,016,959    277,494,737    200,422,476    200,424,704
vortex00                 0    207,393,055    158,021,459    158,075,450    205,626,182    169,148,892    169,138,288
vpr00                    0  1,510,303,511    710,968,666    710,087,319  1,013,762,258    722,299,249    708,193,105
Table F.3. Number of dynamic memory references.
dynRefs              SSsup          gccO0          gccO1          gccO2         mirvO0         mirvO1         mirvO2
gcc95           49,334,757     73,873,363     49,352,753     49,361,737     58,200,460     54,892,416     55,461,383
compress95      43,664,405     71,087,446     45,453,686     43,616,577     63,318,786     46,089,721     43,413,349
go              36,716,606     82,177,020     39,763,447     38,071,493     44,760,506     42,363,336     43,109,104
ijpeg           31,856,721     86,688,154     31,571,618     31,708,996     34,770,808     34,330,551     34,393,731
li95            74,677,881    141,546,686     74,055,835     72,648,246     96,516,071     77,473,882     77,450,411
m88ksim         37,052,641     94,852,549     37,360,061     37,214,551     54,389,895     39,072,683     39,001,729
perl            49,025,762     59,632,004     49,364,718     49,074,054     54,951,417     48,596,001     48,677,929
vortex          81,564,028    117,310,095     84,338,484     84,884,654    111,006,867     93,174,770     93,165,111
art00                    0  2,953,356,736    572,664,757    562,834,944  1,561,619,480    748,113,677    668,403,628
equake00                 0  1,146,908,255    485,738,408    494,960,252    636,607,655    526,424,173    510,722,043
gzip00                   0    760,957,701    410,885,840    402,533,982    522,098,333    460,619,365    533,499,622
mcf00                    0    191,667,384     77,491,719     77,874,104    100,389,133     76,623,066     76,623,066
vortex00                 0    117,424,086     84,450,640     84,996,918    111,120,145     93,287,517     93,277,714
vpr00                    0    769,568,893    290,026,962    285,158,550    463,760,059    320,832,075    308,995,984
Table F.4. Number of dynamic load instructions.
loads                SSsup          gccO0          gccO1          gccO2         mirvO0         mirvO1         mirvO2
gcc95           31,872,008     51,867,054     32,314,881     31,743,080     39,225,924     35,149,405     35,430,480
compress95      26,561,283     44,501,502     28,354,482     26,517,372     39,664,588     28,257,749     26,602,114
go              27,464,121     64,392,437     30,374,262     28,144,373     34,362,737     30,791,030     31,143,981
ijpeg           22,283,318     63,619,861     22,431,106     22,204,793     25,129,726     24,116,600     24,135,665
li95            45,493,244     95,161,631     45,743,456     44,478,428     60,466,443     47,584,005     47,570,254
m88ksim         22,795,190     67,768,522     23,053,841     22,877,056     34,470,734     23,987,331     23,949,875
perl            29,091,584     36,461,407     29,386,498     29,019,370     33,099,127     28,844,092     28,905,023
vortex          43,439,863     70,196,006     45,404,268     45,113,080     65,182,812     50,703,090     50,701,232
art00                    0  2,487,176,786    427,202,096    417,372,265  1,405,139,878    598,432,397    518,482,248
equake00                 0    960,634,992    369,058,638    371,111,611    517,248,426    393,834,619    381,554,065
gzip00                   0    538,118,773    293,914,705    281,592,756    377,580,781    323,275,044    360,690,739
mcf00                    0    129,482,527     44,316,226     44,692,447     56,798,038     43,412,222     43,412,222
vortex00                 0     70,242,839     45,449,812     45,158,651     65,229,268     50,749,024     50,747,094
vpr00                    0    611,078,869    211,472,979    206,577,571    370,396,557    241,442,330    227,526,190
Table F.5. Number of dynamic store instructions.
stores               SSsup          gccO0          gccO1          gccO2         mirvO0         mirvO1         mirvO2
gcc95           17,462,749     22,006,309     17,037,872     17,618,657     18,974,536     19,743,011     20,030,903
compress95      17,103,122     26,585,944     17,099,204     17,099,205     23,654,198     17,831,972     16,811,235
go               9,252,485     17,784,583      9,389,185      9,927,120     10,397,769     11,572,306     11,965,123
ijpeg            9,573,403     23,068,293      9,140,512      9,504,203      9,641,082     10,213,951     10,258,066
li95            29,184,637     46,385,055     28,312,379     28,169,818     36,049,628     29,889,877     29,880,157
m88ksim         14,257,451     27,084,027     14,306,220     14,337,495     19,919,161     15,085,352     15,051,854
perl            19,934,178     23,170,597     19,978,220     20,054,684     21,852,290     19,751,909     19,772,906
vortex          38,124,165     47,114,089     38,934,216     39,771,574     45,824,055     42,471,680     42,463,879
art00                    0    466,179,950    145,462,661    145,462,679    156,479,602    149,681,280    149,921,380
equake00                 0    186,273,263    116,679,770    123,848,641    119,359,229    132,589,554    129,167,978
gzip00                   0    222,838,928    116,971,135    120,941,226    144,517,552    137,344,321    172,808,883
mcf00                    0     62,184,857     33,175,493     33,181,657     43,591,095     33,210,844     33,210,844
vortex00                 0     47,181,247     39,000,828     39,838,267     45,890,877     42,538,493     42,530,620
vpr00                    0    158,490,024     78,553,983     78,580,979     93,363,502     79,389,745     81,469,794
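The counts in Tables F.3 through F.5 can be cross-checked against one another: in the entries sampled below, dynRefs equals loads plus stores. A minimal Python sketch of that check follows. The dictionary layout and the helper name check_refs are illustrative assumptions and are not part of the MIRV or SimpleScalar tooling; only four table entries are copied in.

```python
# Sample entries copied from Tables F.3 (dynRefs), F.4 (loads) and F.5 (stores).
# Keys are (benchmark, configuration). The layout is hypothetical, chosen for this sketch.
loads = {
    ("gcc95", "SSsup"): 31_872_008,
    ("gcc95", "gccO0"): 51_867_054,
    ("compress95", "SSsup"): 26_561_283,
    ("vpr00", "gccO0"): 611_078_869,
}
stores = {
    ("gcc95", "SSsup"): 17_462_749,
    ("gcc95", "gccO0"): 22_006_309,
    ("compress95", "SSsup"): 17_103_122,
    ("vpr00", "gccO0"): 158_490_024,
}
dyn_refs = {
    ("gcc95", "SSsup"): 49_334_757,
    ("gcc95", "gccO0"): 73_873_363,
    ("compress95", "SSsup"): 43_664_405,
    ("vpr00", "gccO0"): 769_568_893,
}


def check_refs(loads, stores, dyn_refs):
    """Return the entries where loads + stores differs from dynRefs.

    The result maps each mismatching key to (loads + stores, dynRefs).
    An empty dict means every sampled entry is consistent.
    """
    mismatches = {}
    for key, expected in dyn_refs.items():
        total = loads[key] + stores[key]
        if total != expected:
            mismatches[key] = (total, expected)
    return mismatches


if __name__ == "__main__":
    print(check_refs(loads, stores, dyn_refs))
```

A check of this kind is also a convenient way to confirm that a value has been assigned to the right row and column when the tables are transcribed.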
Table F.6. Number of dynamic branch instructions.
branches             SSsup          gccO0          gccO1          gccO2         mirvO0         mirvO1         mirvO2
gcc95           24,419,621     32,075,384     24,911,740     24,768,175     31,987,107     27,208,037     26,912,338
compress95      22,449,938     29,619,589     22,447,525     22,447,525     29,375,803     23,225,364     22,761,010
go              20,226,745     26,490,233     20,469,933     20,427,880     25,320,029     20,972,690     20,919,284
ijpeg           11,147,615     16,411,284     11,200,098     11,196,216     16,144,036     11,361,376     10,258,101
li95            39,563,998     56,677,574     40,721,926     40,494,851     51,263,724     41,047,678     41,047,678
m88ksim         23,229,288     32,048,286     23,644,444     23,633,461     33,171,563     24,432,348     22,807,367
perl            20,807,945     24,539,960     21,121,515     21,051,780     24,802,448     21,758,211     21,701,676
vortex          24,386,128     29,994,803     23,970,139     23,940,982     31,815,320     27,718,552     27,718,547
art00                    0    496,824,799    305,175,270    286,672,217    497,835,745    340,464,793    245,074,081
equake00                 0    245,887,275    194,340,192    194,287,976    237,936,428    196,680,682    172,950,878
gzip00                   0    315,190,379    237,097,158    237,096,998    304,468,516    246,300,098    246,142,642
mcf00                    0     60,507,881     44,050,368     43,662,048     56,701,362     44,567,160     44,566,440
vortex00                 0     30,006,871     23,981,934     23,952,777     31,827,429     27,730,466     27,730,461
vpr00                    0    134,367,823    102,717,592    102,668,076    131,409,609    102,297,479    101,097,016
Table F.7. Total number of instructions executed (speculative and non-speculative).
totalinsn            SSsup          gccO0          gccO1          gccO2         mirvO0         mirvO1         mirvO2
gcc95          142,731,598    197,785,893    146,101,963    143,256,471    191,364,706    160,882,289    159,878,583
compress95     145,313,204    229,548,689    152,356,955    145,305,225    209,208,842    151,114,523    147,488,798
go             164,694,798    302,160,340    178,262,378    166,919,747    213,106,429    164,904,141    164,675,171
ijpeg          132,673,455    230,974,241    134,451,425    133,029,864    147,785,767    123,363,879    123,029,721
li95           212,072,618    320,561,618    214,215,027    207,172,196    277,786,734    225,526,301    225,799,749
m88ksim        127,989,542    222,049,480    132,583,890    128,182,216    194,087,300    139,146,769    140,645,885
perl           124,387,461    141,813,436    124,294,619    125,604,111    143,100,557    125,829,147    125,925,468
vortex         157,204,562    211,428,230    161,684,192    161,403,187    209,951,844    172,990,766    172,832,047
art00                    0  6,599,861,689  2,601,964,628  2,358,665,668  4,086,289,721  2,298,462,334  2,417,414,263
equake00                 0  3,079,987,564  1,585,588,414  1,541,774,603  2,063,294,135  1,621,353,296  1,432,807,184
gzip00                   0  2,373,936,872  1,485,265,785  1,445,983,553  2,055,284,906  1,617,370,836  1,700,150,401
mcf00                    0    446,599,029    250,260,853    242,563,503    312,157,139    238,801,643    238,804,394
vortex00                 0    211,582,549    161,787,572    161,568,409    210,097,534    173,132,592    173,290,978
vpr00                    0  1,633,462,350    825,282,672    823,844,499  1,128,487,567    839,225,111    825,515,677
Table F.8. Total number of memory references executed (speculative and non-speculative).
totalrefs            SSsup          gccO0          gccO1          gccO2         mirvO0         mirvO1         mirvO2
gcc95           57,644,574     82,971,897     57,441,491     57,468,393     65,405,756     63,848,148     64,271,568
compress95      51,242,289     79,852,930     55,180,437     51,199,467     72,576,616     55,207,752     52,843,584
go              44,987,931     92,231,388     47,675,386     46,637,129     51,533,660     52,652,047     53,144,911
ijpeg           33,743,078     88,824,590     33,280,042     33,610,579     36,390,406     36,239,829     36,373,765
li95            89,978,781    165,736,202     87,894,372     85,331,607    114,404,632     96,624,795     96,718,638
m88ksim         40,537,475     98,421,283     40,590,766     40,650,549     57,198,876     44,358,129     44,540,365
perl            54,572,928     64,639,181     54,224,137     54,952,336     60,183,338     54,290,123     54,119,775
vortex          83,117,340    119,568,855     85,989,966     86,394,959    112,849,860     94,888,095     94,848,785
art00                    0  3,122,895,929    626,774,507    640,161,117  1,658,947,251    801,274,286    751,495,150
equake00                 0  1,185,730,979    504,513,549    520,254,297    654,951,428    560,047,648    524,313,794
gzip00                   0    839,817,109    464,332,540    454,703,182    577,101,119    516,330,530    596,370,658
mcf00                    0    210,989,597     89,090,863     91,249,222    109,810,364     88,765,109     88,766,011
vortex00                 0    119,689,422     86,104,426     86,506,518    112,958,574     94,990,488     95,078,399
vpr00                    0    832,443,022    336,006,141    331,069,790    526,463,967    374,287,250    362,455,251
Table F.9. Instructions per cycle.
IPC            SSsup    gccO0    gccO1    gccO2   mirvO0   mirvO1   mirvO2
gcc95           0.91     0.87     0.91     0.90     0.83     0.89     0.87
compress95      1.67     1.88     1.65     1.67     1.80     1.74     1.72
go              0.92     0.93     0.95     0.94     1.02     0.97     0.92
ijpeg           2.04     1.90     2.02     2.06     2.01     1.93     1.94
li95            1.55     1.40     1.42     1.42     1.36     1.40     1.48
m88ksim         1.62     1.06     1.42     1.63     1.29     1.26     1.23
perl            1.18     1.09     1.05     1.15     1.29     1.32     1.27
vortex          1.07     0.93     1.05     0.99     0.97     1.03     1.01
art00           0.00     0.77     0.58     0.55     0.69     0.55     0.54
equake00        0.00     1.50     1.22     1.29     1.24     1.22     1.26
gzip00          0.00     1.47     1.76     1.75     1.85     1.83     1.84
mcf00           0.00     1.35     1.13     1.12     1.28     1.13     1.13
vortex00        0.00     0.93     1.05     0.99     0.97     1.03     1.01
vpr00           0.00     1.57     1.43     1.43     1.52     1.41     1.40
Table F.10. Instructions per branch.
IPB            SSsup    gccO0    gccO1    gccO2   mirvO0   mirvO1   mirvO2
gcc95           4.97     5.41     4.99     4.93     5.17     5.02     5.05
compress95      5.52     6.85     5.67     5.52     6.21     5.51     5.45
go              6.57    10.13     7.13     6.60     7.10     6.29     6.33
ijpeg          11.12    13.50    11.20    11.10     8.60    10.12    11.18
li95            4.40     4.90     4.33     4.29     4.57     4.32     4.32
m88ksim         5.14     6.71     5.26     5.06     5.57     5.03     5.43
perl            5.22     5.26     5.21     5.21     5.15     5.13     5.15
vortex          6.30     6.91     6.59     6.60     6.46     6.10     6.10
art00           0.00    12.62     7.44     7.06     7.80     6.26     8.72
equake00        0.00    12.14     7.73     7.51     8.27     7.73     8.00
gzip00          0.00     6.82     5.52     5.38     6.04     5.88     6.22
mcf00           0.00     6.71     4.77     4.63     4.89     4.50     4.50
vortex00        0.00     6.91     6.59     6.60     6.46     6.10     6.10
vpr00           0.00    11.24     6.92     6.92     7.71     7.06     7.01
Table F.11. Branch prediction accuracy.
BPrate         SSsup    gccO0    gccO1    gccO2   mirvO0   mirvO1   mirvO2
gcc95         88.70%   87.39%   89.10%   88.96%   85.69%   88.41%   88.22%
compress95    90.00%   90.76%   90.00%   90.00%   83.99%   90.40%   90.16%
go            81.68%   84.42%   81.87%   81.84%   82.31%   81.95%   81.72%
ijpeg         92.61%   93.80%   92.66%   92.66%   93.10%   92.74%   92.02%
li95          92.48%   86.91%   92.58%   92.51%   85.47%   92.40%   92.40%
m88ksim       96.18%   89.45%   95.98%   96.40%   88.97%   94.34%   94.02%
perl          93.34%   93.05%   93.87%   93.77%   91.61%   94.38%   94.38%
vortex        96.80%   87.57%   96.28%   96.98%   88.57%   96.92%   97.12%
art00          0.00%   89.58%   83.95%   82.84%   92.24%   89.77%   84.28%
equake00       0.00%   94.93%   93.65%   93.65%   94.71%   93.69%   95.84%
gzip00         0.00%   90.85%   93.41%   93.41%   88.35%   93.15%   93.14%
mcf00          0.00%   85.12%   90.88%   90.88%   83.51%   90.81%   90.81%
vortex00       0.00%   87.57%   96.27%   96.98%   88.58%   96.92%   96.73%
vpr00          0.00%   88.46%   90.91%   90.90%   84.22%   90.71%   90.31%
Table F.12. L1 instruction-cache miss rate.
IL1miss        SSsup    gccO0    gccO1    gccO2   mirvO0   mirvO1   mirvO2
gcc95          6.58%    7.41%    6.71%    6.84%    7.65%    6.92%    7.11%
compress95     0.01%    0.01%    0.01%    0.01%    0.01%    0.01%    0.01%
go             4.91%    5.07%    4.50%    4.59%    4.42%    4.19%    4.82%
ijpeg          0.42%    0.29%    0.28%    0.39%    0.46%    0.49%    0.31%
li95           0.58%    0.54%    1.35%    1.80%    1.47%    1.28%    0.77%
m88ksim        2.67%    7.47%    4.03%    2.72%    4.64%    3.98%    4.10%
perl           4.48%    5.70%    6.12%    4.69%    3.72%    3.54%    3.88%
vortex         6.98%    8.23%    7.12%    8.19%    7.89%    7.68%    7.95%
art00          0.00%    0.00%    0.00%    0.00%    0.01%    0.01%    0.00%
equake00       0.00%    0.36%    2.09%    1.26%    3.14%    1.75%    0.57%
gzip00         0.00%    2.41%    0.00%    0.00%    0.01%    0.02%    0.01%
mcf00          0.00%    0.09%    0.19%    0.20%    0.13%    0.19%    0.18%
vortex00       0.00%    8.23%    7.12%    8.19%    7.89%    7.68%    7.77%
vpr00          0.00%    0.15%    0.13%    0.14%    0.16%    0.15%    0.25%
Table F.13. L1 data-cache miss rate.
DL1miss        SSsup    gccO0    gccO1    gccO2   mirvO0   mirvO1   mirvO2
gcc95          1.54%    1.11%    1.53%    1.54%    1.39%    1.47%    1.46%
compress95     5.23%    3.32%    4.85%    5.19%    3.62%    4.90%    5.18%
go             2.01%    1.00%    1.93%    2.05%    1.71%    1.85%    1.88%
ijpeg          0.90%    0.36%    0.91%    0.91%    0.82%    0.84%    0.84%
li95           1.81%    1.02%    1.82%    1.86%    1.58%    1.88%    1.88%
m88ksim        0.72%    0.31%    0.73%    0.71%    0.50%    0.69%    0.68%
perl           0.60%    0.61%    0.58%    0.58%    0.50%    0.55%    0.55%
vortex         1.81%    1.28%    1.75%    1.76%    1.41%    1.61%    1.62%
art00          0.00%    8.96%   40.73%   41.97%   15.18%   32.12%   34.55%
equake00       0.00%    1.94%    4.38%    4.29%    3.35%    4.05%    4.18%
gzip00         0.00%    2.50%    4.48%    4.54%    3.58%    4.01%    3.47%
mcf00          0.00%    6.35%   12.99%   12.77%   10.31%   13.10%   13.10%
vortex00       0.00%    1.29%    1.76%    1.77%    1.42%    1.62%    1.63%
vpr00          0.00%    1.68%    4.02%    4.07%    2.42%    3.65%    3.71%
Table F.14. Program text size (measurement 1).
textSize          SSsup       gccO0       gccO1       gccO2      mirvO0      mirvO1      mirvO2
gcc95         2,166,768   2,934,576   2,000,448   1,962,672   2,830,848   2,279,776   2,538,240
compress95      103,840     109,584     105,456     105,264     107,712     105,232     107,552
go              621,600     934,112     581,824     566,400     678,432     561,280     620,144
ijpeg           396,976     520,848     364,752     365,904     414,704     377,280     474,784
li95            180,640     207,792     176,528     176,160     199,536     182,640     183,680
m88ksim         286,864     383,024     289,712     286,608     354,784     308,736     328,192
perl            535,584     627,024     506,992     503,008     621,392     559,184     568,320
vortex          990,928   1,195,328     977,424     966,704   1,132,080   1,017,072   1,017,200
art00                 0     131,504     120,384     119,456     123,328     120,384     144,304
equake00              0     154,048     125,904     126,624     134,976     129,232     159,888
gzip00                0     230,448     201,264     200,768     214,720     200,512     213,632
mcf00                 0     127,056     114,176     114,400     117,872     115,056     115,600
vortex00              0   1,195,328     977,424     966,704   1,132,080   1,017,072   1,017,152
vpr00                 0     439,328     322,768     314,320     363,040     328,336     437,968
Table F.15. Program text size (measurement 2, as described in Table 6).
textSize2         SSsup       gccO0       gccO1       gccO2      mirvO0      mirvO1      mirvO2
gcc95         2,166,320   2,934,128   2,000,000   1,962,224   2,830,400   2,279,328   2,537,792
compress95      103,392     109,136     105,008     104,816     107,264     104,784     107,104
go              621,152     933,664     581,376     565,952     677,984     560,832     619,696
ijpeg           396,528     520,400     364,304     365,456     414,256     376,832     474,336
li95            180,192     207,344     176,080     175,712     199,088     182,192     183,232
m88ksim         286,416     382,576     289,264     286,160     354,336     308,288     327,744
perl            535,136     626,576     506,544     502,560     620,944     558,736     567,872
vortex          990,480   1,194,880     976,976     966,256   1,131,632   1,016,624   1,016,752
art00                 0     131,056     119,936     119,008     122,880     119,936     143,856
equake00              0     153,600     125,456     126,176     134,528     128,784     159,440
gzip00                0     230,000     200,816     200,320     214,272     200,064     213,184
mcf00                 0     126,608     113,728     113,952     117,424     114,608     115,152
vortex00              0   1,194,880     976,976     966,256   1,131,632   1,016,624   1,016,704
vpr00                 0     438,880     322,320     313,872     362,592     327,888     437,520
Table F.16. Unified L2 miss rate.
UL2miss        SSsup    gccO0    gccO1    gccO2   mirvO0   mirvO1   mirvO2
gcc95          2.17%    2.14%    2.02%    2.01%    2.28%    2.24%    2.35%
compress95     9.55%    9.36%    9.69%    9.67%    9.60%    9.63%    9.60%
go             5.77%   11.20%    6.79%    5.89%    7.44%    5.97%    6.13%
ijpeg          6.31%    9.06%    7.67%    6.37%    5.82%    6.03%    8.68%
li95           0.10%    0.10%    0.07%    0.06%    0.06%    0.07%    0.09%
m88ksim        3.04%    0.67%    2.01%    3.00%    1.25%    1.93%    1.85%
perl           0.75%    0.54%    0.62%    0.73%    0.80%    0.94%    0.86%
vortex         2.70%    1.99%    2.51%    2.18%    2.41%    2.10%    2.02%
art00          0.00%   48.12%   48.12%   48.12%   48.10%   48.10%   48.12%
equake00       0.00%   19.08%   11.51%   15.20%    7.37%   12.55%   20.65%
gzip00         0.00%    1.08%    3.09%    3.07%    3.32%    3.09%    3.29%
mcf00          0.00%   18.57%   18.52%   18.48%   18.61%   18.48%   18.51%
vortex00       0.00%    1.98%    2.47%    2.11%    2.37%    2.29%    2.38%
vpr00          0.00%    9.42%   10.22%   10.22%    9.72%   10.19%    9.74%
Table F.17. Percentage of time RUU is full.
RUUFull        SSsup    gccO0    gccO1    gccO2   mirvO0   mirvO1   mirvO2
gcc95         10.85%   10.67%   11.22%   10.18%   12.79%   10.27%   10.03%
compress95    52.81%   52.03%   51.37%   52.79%   41.33%   49.48%   51.12%
go            23.26%   27.97%   27.20%   24.06%   22.93%   21.98%   20.38%
ijpeg         58.00%   49.83%   61.33%   57.70%   60.23%   43.31%   47.60%
li95          28.32%   21.41%   25.25%   24.63%   22.34%   23.61%   24.52%
m88ksim       20.21%    7.28%   21.58%   18.90%   23.88%   21.37%   23.52%
perl          10.44%    8.43%    8.19%   10.70%   12.80%   11.60%   10.33%
vortex         9.90%    4.34%    9.19%    7.65%    4.96%    5.90%    5.63%
art00          0.00%   27.55%   88.72%   90.29%   78.48%   74.73%   81.46%
equake00       0.00%   61.10%   43.80%   40.57%   33.83%   36.70%   40.26%
gzip00         0.00%   32.27%   64.87%   62.00%   53.08%   53.88%   46.69%
mcf00          0.00%   31.31%   58.99%   55.55%   32.76%   55.36%   55.40%
vortex00       0.00%    4.30%    9.05%    7.54%    4.84%    5.86%    5.74%
vpr00          0.00%   39.11%   60.04%   61.02%   37.09%   46.48%   44.41%