ASHLI Multipass for Lower Precision Targets
Avi Bleiweiss, Arcot J. Preetham, Derek Whiteman, Joshua Barczak
ATI Research Silicon Valley, Inc
ASHLI Siggraph 2004

Background
• Shading hardware virtualization is key to Ashli technology
• Segmenting complex shaders still vital
  – Less so for instructions, more for registers and limited vertex iterators
• Multipass rendering incorporates shader partitions
• Intermediate buffer allocation criteria
  – retain ultimate precision

Motivation
• Multipass is memory demanding
  – Expensive per-pass per-output, window sized, IEEE float component buffer
• Video memory a scarce resource after all
  – Temporary buffers shared with textures and state
• Make multipass pervasive for mobile and hand held platforms
• Challenge unified ultimate precision quest
  – Use lower precision targets (LPT), when possible

Shader Range
• Pre-compiled shader value range essential for determining LPT usage
• Both intermediate and final results delimited by range
• Range calculation, off and in line options
  – Apply numerical analysis based on inputs
    • Use generated Cpp from any of the languages
  – Examine IEEE float buffer for each pass cumulatively
• Inline range computation
  – Resolve pass data into a single min/max buffer
  – Use reduction operation for final range values

Shader Range (cnt’d)
• Per-buffer sample shaders:
• First implementation caters to a single color output pixel shader

API for LPT
• Shader value range
  – A program shader optionally takes in delimiter parameters e.g.
    addShader(processor, entry, min, max)
• Compile time flags for normalizing temporary and final results
  – Into either an unsigned (0 to 1) or a signed (-1 to +1) buffer
• Shader source invariant of final render target format

Normalization
• LPT normalization pairs derived from shader range
  – Each composed of a scale and an offset
• Apply normalization to each shader segment, right before writing output result
• De-normalize LPT stored result, before being used, for all partitions but the first
• LPT normalization mad’s part of subdivision cost metrics

Normalization (cnt’d)
• Code example:
  – Range <-1000,+1000>, into an unsigned buffer

Footprint
• Target format choices for one, two, and four components:
  – 32-bit floating point (IEEE)
  – 16-bit float (10-bit mantissa, 5-bit exponent)
  – 16-bit unsigned integer
  – 8-bit unsigned integer
• Consider a 1K by 1K pixels window, per-pass per-output buffer cost (MB):

  Format     Single Component   Two Components   Four Components
  32 bits    4                  8                16
  16 bits    2                  4                8
  8 bits     1                  N/A              4

Footprint (cnt’d)
• Multiple output partitions may increase single output mate footprint
  – Lesser passes, more buffers per-pass
• Use one or two component intermediate buffers, when possible
• Reclaim inactive output buffer(s) from previous passes
• Reorder pass execution to avoid buffer dependencies across passes

Performance
• LPT performance impact is of interest for pixel shading limited scenes
• Single (SRT) and multiple (MRT) render targets evaluated for memory behavior
• SRT:
  • Memory bandwidth lesser factor for compute intensive passes:
  • Figures are in frames-per-second (32 bit relative in parenthesis)

  Instructions per Pass   32 bits       16 bits       8 bits
  8                       10.7 (1.00)   16.5 (1.54)   20.2 (1.88)
  16                      21.8 (1.00)   29.8 (1.36)   33.1 (1.52)
  32                      27.0 (1.00)   34.5 (1.27)   35.9 (1.33)
  64                      44.1 (1.00)   45.7 (1.04)   45.7 (1.04)

Performance (cnt’d)
• MRT:
  • A more involved
    memory access pattern with lesser locality
• Scene performance model:
  • Evaluated in increments of 5 distant lights
  • Consistent four outputs per pass, a four component output buffer
  • Figures are in frames-per-second (32 bit relative in parenthesis)

  Lights   Passes   Buffers   32 bits        16 bits        8 bits
  10       14       24        11.90 (1.00)   18.45 (1.55)   25.74 (2.16)
  15       20       34        7.76 (1.00)    11.86 (1.52)   18.35 (2.39)
  20       27       44        5.97 (1.00)    8.88 (1.48)    13.69 (2.29)
  25       34       54        4.86 (1.00)    7.08 (1.46)    10.98 (2.26)
  30       40       64        3.73 (1.00)    6.13 (1.64)    9.29 (2.49)

Quality
• A shader range maps onto a minimal LPT format
• Perception quantization error determining factor
• Any LPT format above the lowest match expects to yield comparable render quality
• Following samples depict normalization to a shader range of <0, 40>
  – 16 bit unsigned integer minimal format

Quality (cnt’d)
• LPT format matches range, hardly noticed loss

Quality (cnt’d)
• LPT format less than range, significant banding

Summary
• Ashli decouples final render target precision from shader source
• Multipass rendering more affordable on low cost platforms, using LPT
• Ashli main download link and contact:
  – http://www.ati.com/developer/ashli.html
  – [email protected]
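
Note on Normalization
The Normalization slides describe a scale and offset pair derived from the shader range, applied right before a segment writes its result and undone before a later partition reads it. The slide's own code example did not survive, so the following is only a minimal sketch of that arithmetic, assuming a plain linear mapping. All function names here are hypothetical and are not part of the Ashli API; the quantization helper merely imitates storage in an unsigned integer target.

```python
def normalization_pair(range_min, range_max, signed=False):
    """Return a (scale, offset) pair mapping [range_min, range_max]
    onto [0, 1] (unsigned buffer) or [-1, +1] (signed buffer).
    Hypothetical helper, assuming a linear mapping."""
    if range_max <= range_min:
        raise ValueError("range_max must exceed range_min")
    lo, hi = (-1.0, 1.0) if signed else (0.0, 1.0)
    scale = (hi - lo) / (range_max - range_min)
    offset = lo - range_min * scale
    return scale, offset


def normalize(value, scale, offset):
    """Applied right before a segment writes its output (one multiply-add)."""
    return value * scale + offset


def denormalize(stored, scale, offset):
    """Applied before a later partition uses the stored result."""
    return (stored - offset) / scale


def quantize_unorm(x, bits):
    """Imitate storing x in an unsigned integer target of the given depth."""
    levels = (1 << bits) - 1
    x = min(max(x, 0.0), 1.0)
    return round(x * levels) / levels


# Range <-1000, +1000> into an unsigned buffer, as on the slide.
scale, offset = normalization_pair(-1000.0, 1000.0)
for bits in (16, 8):
    stored = quantize_unorm(normalize(250.0, scale, offset), bits)
    print(bits, denormalize(stored, scale, offset))
```

The round trip through fewer bits returns a value further from 250, which is the quantization error the Quality slides refer to. De-normalization can also be folded into a single multiply-add by precomputing 1/scale and -offset/scale.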
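
Note on Footprint
The per-pass per-output costs in the Footprint table follow from width x height x components x bytes per component. The sketch below reproduces them, assuming that 1K means 1024 and that MB means 2^20 bytes; the function name is illustrative only.

```python
def buffer_cost_mb(width, height, components, bits_per_component):
    """Size in MB of one window-sized output buffer.
    Assumes tightly packed components and MB = 2**20 bytes."""
    total_bytes = width * height * components * bits_per_component // 8
    return total_bytes / (1024 * 1024)


# Rows: component depth; columns: one, two, and four components.
for bits in (32, 16, 8):
    row = [buffer_cost_mb(1024, 1024, c, bits) for c in (1, 2, 4)]
    print(f"{bits} bits:", row)
```

The arithmetic alone gives 2 MB for a two component 8-bit buffer, whereas the table lists N/A for that case. Presumably no such target format was available, but the slides do not say.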
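
Note on Shader Range
The Shader Range slide mentions resolving pass data into a single min/max buffer and then using a reduction operation for the final range values. A CPU-side sketch of one such reduction follows, assuming a square buffer with a power-of-two side that is repeatedly collapsed in 2x2 blocks. The slides do not specify the actual reduction scheme, and both function names are hypothetical.

```python
def resolve_pass(values):
    """Turn a 2D grid of pass results into a grid of (min, max) pairs."""
    return [[(v, v) for v in row] for row in values]


def reduce_min_max(buffer):
    """Collapse a square, power-of-two sized grid of (min, max) pairs
    into a single pair by halving the grid in 2x2 blocks."""
    level = buffer
    while len(level) > 1:
        half = len(level) // 2
        level = [
            [
                (
                    min(level[2 * y + dy][2 * x + dx][0]
                        for dy in (0, 1) for dx in (0, 1)),
                    max(level[2 * y + dy][2 * x + dx][1]
                        for dy in (0, 1) for dx in (0, 1)),
                )
                for x in range(half)
            ]
            for y in range(half)
        ]
    return level[0][0]


pass_output = [
    [0.5, 12.0, 3.0, 7.5],
    [39.0, 1.0, 0.0, 8.0],
    [2.0, 2.5, 40.0, 6.0],
    [9.0, 4.0, 5.0, 11.0],
]
shader_min, shader_max = reduce_min_max(resolve_pass(pass_output))
print(shader_min, shader_max)
```

The resulting pair is the kind of value that could be supplied as the min and max delimiters when a shader is added.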