ASHLI Multipass for Lower Precision Targets

Avi Bleiweiss
Arcot J. Preetham
Derek Whiteman
Joshua Barczak
ATI Research Silicon Valley, Inc
Background
• Shading hardware virtualization is key to Ashli technology
• Segmenting complex shaders is still vital
– Less so for instruction limits, more for registers and the limited number of vertex iterators
• Multipass rendering incorporates the shader partitions
• Intermediate buffer allocation criterion: retain ultimate precision
ASHLI
Siggraph 2004
Motivation
• Multipass is memory demanding
– Each pass and each output needs its own window-sized buffer of IEEE float components
• Video memory is a scarce resource after all
– Temporary buffers share it with textures and state
• Make multipass pervasive on mobile and handheld platforms
• Challenge the uniform quest for ultimate precision
– Use lower precision targets (LPT) when possible
Shader Range
• A pre-compiled shader value range is essential for determining LPT usage
• Both intermediate and final results are delimited by the range
• Range calculation has offline and inline options
– Apply numerical analysis based on the inputs
• Use the generated Cpp from any of the shading languages
– Examine the IEEE float buffer of each pass, cumulatively
• Inline range computation
– Resolve the pass data into a single min/max buffer
– Use a reduction operation for the final range values
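As a rough CPU emulation of the inline option, the sketch below folds each pass's float output into a per-pixel min/max buffer and then collapses that buffer by pairwise halving, the way a log2(n)-pass GPU reduction would. The function names and the flat-list buffer layout are hypothetical; this is an illustration of the technique, not Ashli's implementation.

```python
from typing import List, Tuple

Range = Tuple[float, float]  # (min, max)


def accumulate_pass(minmax: List[Range], pass_output: List[float]) -> List[Range]:
    """Fold one pass's per-pixel float output into the running min/max buffer."""
    return [(min(lo, v), max(hi, v)) for (lo, hi), v in zip(minmax, pass_output)]


def reduce_minmax(minmax: List[Range]) -> Range:
    """Collapse the min/max buffer by repeated pairwise halving."""
    buf = list(minmax)
    while len(buf) > 1:
        nxt = []
        for i in range(0, len(buf) - 1, 2):
            (a_lo, a_hi), (b_lo, b_hi) = buf[i], buf[i + 1]
            nxt.append((min(a_lo, b_lo), max(a_hi, b_hi)))
        if len(buf) % 2:  # an odd element carries over unchanged
            nxt.append(buf[-1])
        buf = nxt
    return buf[0]


def shader_range(pass_outputs: List[List[float]]) -> Range:
    """Cumulative range over every pass output (intermediate and final)."""
    n = len(pass_outputs[0])
    minmax: List[Range] = [(float("inf"), float("-inf"))] * n
    for out in pass_outputs:
        minmax = accumulate_pass(minmax, out)
    return reduce_minmax(minmax)
```

For two passes over a three-pixel window, `shader_range([[0.5, -2.0, 3.0], [1.0, 7.5, -0.25]])` yields `(-2.0, 7.5)`, the single range that would then delimit both intermediate and final results.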
Shader Range (cnt’d)
• Per-buffer sample shaders (figures)
• The first implementation caters to a pixel shader with a single color output
API for LPT
• Shader value range
– A program shader optionally takes delimiter parameters, e.g. addShader(processor, entry, min, max)
• Compile-time flags normalize temporary and final results
– Into either an unsigned (0 to 1) or a signed (-1 to +1) buffer
• The shader source is invariant to the final render target format
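To illustrate how an optional range plus a compile-time flag can decide the normalization target without touching the shader source, here is a small mock. Everything in it (`Program`, `add_shader`, `Normalize`, `target_interval`) is hypothetical; only the `addShader(processor, entry, min, max)` shape comes from the slide, and Ashli's real types are not given there.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]


class Normalize(Enum):
    """Hypothetical stand-ins for the compile-time normalization flags."""
    FLOAT = "float"        # no LPT normalization: full-precision buffers
    UNSIGNED = "unsigned"  # results mapped into 0 to 1
    SIGNED = "signed"      # results mapped into -1 to +1


TARGETS: Dict[Normalize, Interval] = {
    Normalize.UNSIGNED: (0.0, 1.0),
    Normalize.SIGNED: (-1.0, 1.0),
}


@dataclass
class Program:
    flag: Normalize = Normalize.FLOAT
    shaders: List[Tuple[str, str, Optional[Interval]]] = field(default_factory=list)

    def add_shader(self, processor: str, entry: str,
                   range_min: Optional[float] = None,
                   range_max: Optional[float] = None) -> None:
        # Range delimiters are optional; without both, no LPT mapping is possible.
        rng = None if range_min is None or range_max is None else (range_min, range_max)
        self.shaders.append((processor, entry, rng))

    def target_interval(self, index: int) -> Optional[Interval]:
        """Interval the shader's results are normalized into, or None to stay float."""
        _, _, rng = self.shaders[index]
        if rng is None:
            return None
        return TARGETS.get(self.flag)  # None for Normalize.FLOAT
```

The same shader entry compiles unchanged under either flag; only the normalization applied around its output writes differs.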
Normalization
• LPT normalization pairs are derived from the shader range
– Each pair is composed of a scale and an offset
• Normalization is applied to each shader segment right before it writes its output result
• The LPT-stored result is de-normalized before use in every partition but the first, which reads no intermediate buffer
• The LPT normalization mad instructions are counted in the subdivision cost metrics
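For a shader range [m, M] with M > m, one consistent choice of the scale s and offset o (derived here from the target intervals, not taken from the slides) is shown below. Both directions are a single multiply-add, which is why they fit naturally into the per-partition cost.

```latex
\begin{aligned}
\text{unsigned } [0, 1]:&\quad s = \frac{1}{M - m}, \qquad o = -\frac{m}{M - m} \\
\text{signed } [-1, 1]:&\quad s = \frac{2}{M - m}, \qquad o = -\frac{M + m}{M - m} \\
\text{normalize (before the write)}:&\quad n = s\,x + o \\
\text{de-normalize (after the read)}:&\quad x = \frac{1}{s}\,n - \frac{o}{s}
\end{aligned}
```

In the unsigned case the de-normalization reduces to x = (M - m) n + m.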
Normalization (cnt’d)
• Code example:
– Range <-1000, +1000>, into an unsigned buffer
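A minimal sketch of the <-1000, +1000> case, assuming a scale/offset pair as described on the previous slide: it derives the pair, applies one multiply-add before the write and one after the read, and emulates storage in an unsigned integer buffer by rounding to the nearest level. The helper names (`normalization_pair`, `normalize`, `denormalize`, `quantize_unorm`) are hypothetical.

```python
def normalization_pair(lo: float, hi: float, signed: bool = False):
    """(scale, offset) mapping [lo, hi] onto [0, 1], or onto [-1, 1] if signed."""
    if hi <= lo:
        raise ValueError("empty shader range")
    span = hi - lo
    if signed:
        return 2.0 / span, -(hi + lo) / span
    return 1.0 / span, -lo / span


def normalize(x: float, scale: float, offset: float) -> float:
    # One mad, applied right before a segment writes its output.
    return x * scale + offset


def denormalize(n: float, scale: float, offset: float) -> float:
    # One mad, applied before a later partition uses the stored value.
    return n * (1.0 / scale) + (-offset / scale)


def quantize_unorm(n: float, bits: int) -> float:
    """Emulate storing n in an unsigned integer buffer of the given width."""
    levels = (1 << bits) - 1
    return round(min(max(n, 0.0), 1.0) * levels) / levels


scale, offset = normalization_pair(-1000.0, 1000.0)   # 1/2000 and 0.5
stored = quantize_unorm(normalize(250.0, scale, offset), 16)
recovered = denormalize(stored, scale, offset)
```

With this range, a 16-bit unsigned buffer resolves about 2000/65535 ≈ 0.03 shader units per level, while an 8-bit one resolves only about 7.8.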
Footprint
• Target format choices for one, two, and four components:
– 32-bit floating point (IEEE)
– 16-bit float (10-bit mantissa, 5-bit exponent)
– 16-bit unsigned integer
– 8-bit unsigned integer
• Consider a 1K by 1K pixel window; per-pass, per-output buffer cost (MB):

Format    Single Component   Two Components   Four Components
32 bits   4                  8                16
16 bits   2                  4                8
8 bits    1                  N/A              4
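The table entries follow from width × height × components × bits / 8 bytes. The sketch below reproduces them; the helper names are hypothetical, and MB is taken as 2^20 bytes, which matches the table values.

```python
from typing import Iterable, Tuple


def buffer_cost_mb(width: int, height: int, components: int, bits: int) -> float:
    """Size in MB (2**20 bytes) of one window-sized render target."""
    return width * height * components * bits / 8 / 2**20


def pass_footprint_mb(width: int, height: int,
                      outputs: Iterable[Tuple[int, int]]) -> float:
    """Total for one pass; outputs lists (components, bits) per output buffer."""
    return sum(buffer_cost_mb(width, height, c, b) for c, b in outputs)
```

For example, a 1K by 1K pass with four four-component outputs costs 64 MB at 32 bits but 16 MB at 8 bits.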
Footprint (cnt’d)
• Multiple-output partitions may increase the footprint relative to their single-output mates
– Fewer passes, but more buffers per pass
• Use one- or two-component intermediate buffers when possible
• Reclaim inactive output buffers from previous passes
• Reorder pass execution to avoid buffer dependencies across passes
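One way to reclaim inactive buffers is a liveness pass similar to register allocation: record the last pass that reads each intermediate, and once that pass has run, return its physical buffer to a free list. The sketch below assumes each pass is described by the logical names it writes and reads; the names are hypothetical, and Ashli's own allocator may work differently.

```python
from typing import Dict, List, Sequence, Tuple

# One entry per pass, in execution order: (outputs written, inputs read).
Pass = Tuple[Sequence[str], Sequence[str]]


def allocate_buffers(passes: List[Pass]) -> Tuple[Dict[str, int], int]:
    """Map each logical intermediate to a physical buffer, reusing buffers
    that no later pass reads. Returns (assignment, buffer_count)."""
    last_read: Dict[str, int] = {}
    for i, (_, inputs) in enumerate(passes):
        for name in inputs:
            last_read[name] = i

    assignment: Dict[str, int] = {}
    free: List[int] = []  # reclaimed physical buffers
    count = 0
    for i, (outputs, inputs) in enumerate(passes):
        # Allocate outputs first: a pass must not render into a buffer it samples.
        for name in outputs:
            if free:
                assignment[name] = free.pop()
            else:
                assignment[name] = count
                count += 1
        # Inputs whose last reader is this pass become free for later passes.
        for name in sorted(set(inputs)):
            if name in assignment and last_read[name] == i:
                free.append(assignment[name])
    return assignment, count
```

For a four-pass chain where each pass reads only the previous result, two physical buffers suffice instead of four; pass reordering, the last bullet above, shortens buffer lifetimes further before this allocation runs.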
Performance
• LPT performance impact matters for pixel-shading-limited scenes
• Single (SRT) and multiple (MRT) render targets were evaluated for memory behavior
• SRT:
– Memory bandwidth is a lesser factor for compute-intensive passes
– Figures are in frames per second (ratio relative to 32 bits in parentheses)

Instructions per Pass   32 bits       16 bits       8 bits
8                       10.7 (1.00)   16.5 (1.54)   20.2 (1.88)
16                      21.8 (1.00)   29.8 (1.36)   33.1 (1.52)
32                      27.0 (1.00)   34.5 (1.27)   35.9 (1.33)
64                      44.1 (1.00)   45.7 (1.04)   45.7 (1.04)
Performance (cnt’d)
• MRT:
– A more involved memory access pattern with less locality
– Scene performance model: evaluated in increments of 5 distant lights, with four outputs per pass throughout, each a four-component buffer
– Figures are in frames per second (ratio relative to 32 bits in parentheses)

Lights   Passes   Buffers   32 bits        16 bits        8 bits
10       14       24        11.90 (1.00)   18.45 (1.55)   25.74 (2.16)
15       20       34        7.76 (1.00)    11.86 (1.52)   18.35 (2.39)
20       27       44        5.97 (1.00)    8.88 (1.48)    13.69 (2.29)
25       34       54        4.86 (1.00)    7.08 (1.46)    10.98 (2.26)
30       40       64        3.73 (1.00)    6.13 (1.64)    9.29 (2.49)
Quality
• A shader range maps onto a minimal LPT format
• Perceptual quantization error is the determining factor
• Any LPT format above the lowest match is expected to yield comparable render quality
• The following samples depict normalization to a shader range of <0, 40>
– The minimal format is 16-bit unsigned integer
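A rough way to map a range onto a minimal format is to compare each unsigned integer format's quantization step, (max - min) / (2^bits - 1), against an error threshold. In the sketch below the `tolerance` parameter is a hypothetical stand-in for the perceptual criterion, whose actual value the slides do not give, and the 16-bit float format is left out for simplicity because its step varies with magnitude.

```python
from typing import List, Tuple

# Unsigned integer LPT candidates, smallest first (16-bit float omitted).
FORMATS: List[Tuple[str, int]] = [
    ("8-bit unsigned integer", 8),
    ("16-bit unsigned integer", 16),
]


def quantization_step(lo: float, hi: float, bits: int) -> float:
    """Spacing, in shader units, between adjacent levels of a bits-wide buffer."""
    return (hi - lo) / ((1 << bits) - 1)


def minimal_format(lo: float, hi: float, tolerance: float) -> str:
    """Smallest format whose step is within tolerance, else full IEEE float."""
    for name, bits in FORMATS:
        if quantization_step(lo, hi, bits) <= tolerance:
            return name
    return "32-bit floating point (IEEE)"
```

For the <0, 40> range, the 8-bit step is about 0.157 shader units and the 16-bit step about 0.0006, so any threshold between those two values selects the 16-bit unsigned integer format.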
Quality (cnt’d)
• LPT format matches the range: barely noticeable loss
Quality (cnt’d)
• LPT format below the range: significant banding
Summary
• Ashli decouples final render target precision from the shader source
• Using LPT makes multipass rendering more affordable on low-cost platforms
• Ashli main download link and contact:
– http://www.ati.com/developer/ashli.html
– [email protected]