Link opens in new browser window.

Revision 1.5
June 8, 2010
Radeon R5xx Acceleration
© 2010 Advanced Micro Devices, Inc.
Proprietary
1
Revision 1.5
June 8, 2010
Trademarks
AMD, the AMD Arrow logo, Athlon, and combinations thereof, ATI, ATI logo, Radeon, and Crossfire are trademarks of Advanced
Micro Devices, Inc.
Microsoft and Windows are registered trademarks of Microsoft Corporation.
Other product names used in this publication are for identification purposes only and may be trademarks of their respective
companies.
Disclaimer
The contents of this document are provided in connection with Advanced Micro Devices, Inc. ("AMD") products. AMD makes no
representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the
right to make changes to specifications and product descriptions at any time without notice. No license, whether express, implied,
arising by estoppel, or otherwise, to any intellectual property rights are granted by this publication. Except as set forth in AMD's
Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty,
relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or
infringement of any intellectual property right. AMD's products are not designed, intended, authorized or warranted for use as
components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or
in any other application in which the failure of AMD's product could create a situation where personal injury, death, or severe
property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time
without notice.
© 2010 Advanced Micro Devices, Inc. All rights reserved.
© 2010 Advanced Micro Devices, Inc.
Proprietary
2
Revision 1.5
1.
INTRODUCTION ............................................................................................................................................. 6
1.1
1.2
1.3
1.4
2.
June 8, 2010
INTRODUCING THE R5XX FAMILY ..........................................................................................................................6
FEATURE HIGHLIGHTS ........................................................................................................................................6
FEATURES IN DETAIL ..........................................................................................................................................6
CHANGES FROM R3XX/4XX .................................................................................................................................7
TILING............................................................................................................................................................ 9
2.1
2.2
2.3
OVERVIEW .......................................................................................................................................................9
MICRO BLOCKS.................................................................................................................................................9
MACRO BLOCKS ................................................................................................................................................9
3.
SURFACE FORMATS ..................................................................................................................................... 11
4.
TEXTURE MEMORY LAYOUT ........................................................................................................................ 13
4.1
4.2
4.3
4.4
4.5
4.6
4.7
5.
COMMAND PROCESSOR .............................................................................................................................. 18
5.1
5.2
5.3
5.4
5.5
5.6
5.7
5.8
5.9
5.10
5.11
5.12
5.13
5.14
6.
OVERVIEW .....................................................................................................................................................18
HOST PROGRAMMING MODEL DESCRIPTION ........................................................................................................18
PUSH VS PULL MODEL .....................................................................................................................................18
RING BUFFER MANAGEMENT ............................................................................................................................19
CHIPSET COHERENCY ISSUES..............................................................................................................................21
INDIRECT BUFFER MANAGEMENT .......................................................................................................................21
OVERVIEW OF DMA OPERATION .......................................................................................................................22
RESETTING THE COMMAND PROCESSOR ..............................................................................................................24
COMMAND STREAM SYNCHRONIZATION ..............................................................................................................24
STARTING THE INDIRECT STREAMS ......................................................................................................................25
WRITING HOST DATA TO THE COMMAND STREAM QUEUE .....................................................................................26
WRITING TO THE MICROENGINE RAM ...............................................................................................................27
READING FROM THE MICROENGINE RAM ...........................................................................................................27
STARTING A DMA OPERATION ..........................................................................................................................28
PM4 ............................................................................................................................................................. 29
6.1
6.2
7.
MACRO - LINEAR / MICRO - LINEAR ...................................................................................................................13
MACRO - LINEAR / MICRO - TILED......................................................................................................................13
MACRO - TILED / MICRO - LINEAR......................................................................................................................13
MACRO - TILED / MICRO - TILED........................................................................................................................14
MIPMAPS .....................................................................................................................................................15
CUBE MAPS ...................................................................................................................................................15
3D TEXTURES .................................................................................................................................................16
PACKET TYPES ................................................................................................................................................29
DEFINITION OF TYPE-3 PACKETS .........................................................................................................................33
VERTEX SHADERS ........................................................................................................................................ 59
© 2010 Advanced Micro Devices, Inc.
Proprietary
3
Revision 1.5
7.1
7.2
7.3
7.4
7.5
7.6
7.7
8.
June 8, 2010
INTRODUCTION ...............................................................................................................................................59
INPUT ...........................................................................................................................................................59
VECTOR ORDER AND VECTOR ID’S......................................................................................................................64
VAP REGISTERS ..............................................................................................................................................65
R3XX-R5XX PROGRAMMABLE VERTEX SHADER DESCRIPTION ..................................................................................71
SETTING-UP AND STARTING THE VAP ...............................................................................................................101
METHODS OF PASSING VERTEX DATA ...............................................................................................................102
FRAGMENT SHADERS ................................................................................................................................ 103
8.1
8.2
8.3
8.4
8.5
8.6
8.7
8.8
9.
INTRODUCTION .............................................................................................................................................103
INSTRUCTIONS ..............................................................................................................................................103
INSTRUCTION WORDS ....................................................................................................................................104
ALU INSTRUCTIONS .......................................................................................................................................105
TEXTURE INSTRUCTIONS .................................................................................................................................113
FLOW CONTROL ............................................................................................................................................115
FLOATING POINT ISSUES .................................................................................................................................121
WRITING TO US REGISTERS.............................................................................................................................124
HIZ ............................................................................................................................................................. 126
9.1
9.2
9.3
9.4
9.5
9.6
10.
10.1
10.2
10.3
10.4
10.5
10.6
10.7
10.8
10.9
11.
11.1
11.2
11.3
11.4
11.5
11.6
11.7
INTRODUCTION .............................................................................................................................................126
ENABLING HIZ ..............................................................................................................................................126
CONFIGURING HIZ ........................................................................................................................................126
HIZ CLEAR WITH PM4 PACKET ........................................................................................................................128
EXAMPLE: PUTTING IT ALL TOGETHER ...............................................................................................................128
STATE CHANGES THAT INVALIDATE HIZ .............................................................................................................129
DRIVER NOTES....................................................................................................................................... 130
R5XX CHANGES ............................................................................................................................................130
INTERFACE NOTES .........................................................................................................................................132
REGISTER NOTES ...........................................................................................................................................133
FEATURE NOTES ...........................................................................................................................................138
BLEND OPTIMIZATION NOTES ...........................................................................................................................141
TEXTURE NOTES............................................................................................................................................141
GA POINT/LINE/POLYGON SETUP ....................................................................................................................142
CB AA/CLEAR SETUP ....................................................................................................................................143
ERRATA .......................................................................................................................................................145
REGISTERS ............................................................................................................................................. 147
COMMAND PROCESSOR REGISTERS ..................................................................................................................147
COLOR BUFFER REGISTERS ..............................................................................................................................156
FOG REGISTERS.............................................................................................................................................172
GEOMETRY ASSEMBLY REGISTERS ....................................................................................................................175
GRAPHICS BACKEND REGISTERS .......................................................................................................................186
RASTERIZER REGISTERS ...................................................................................................................................199
CLIPPING REGISTERS ......................................................................................................................................202
© 2010 Advanced Micro Devices, Inc.
Proprietary
4
Revision 1.5
11.8
11.9
11.10
11.11
11.12
June 8, 2010
SETUP UNIT REGISTERS ..................................................................................................................................210
TEXTURE REGISTERS.......................................................................................................................................218
FRAGMENT SHADER REGISTERS ...................................................................................................................229
VERTEX REGISTERS ....................................................................................................................................254
Z BUFFER REGISTERS .................................................................................................................................279
© 2010 Advanced Micro Devices, Inc.
Proprietary
5
Revision 1.5
June 8, 2010
1. Introduction
1.1 Introducing the R5xx Family
The R5xx family provides the fastest and most advanced 2D, 3D, and multimedia graphics performance for desktop
PCs in the performance mainstream markets. The R5xx family supports Shader Model 3.0, advanced memory
interface technology, a brand new display controller and a consumer electronics (CE) quality TV (NTSC/PAL)
encoder. The R5xx family represents AMD‟s 2nd generation PCI Express technology product and leverages a brand
new graphics architecture. The R5xx family builds on the R3xx architecture. As such, much of this guide is
applicable to R3xx and R4xx chips as well with some caveats. Where applicable, generational differences are noted.
1.2 Feature Highlights
1.2.1 Shader Technology






Support for Microsoft® DirectX® 9.0 programmable vertex and pixel shaders in hardware.
Shader Model 3.0 vertex and pixel shader support.
Full speed 32-bit floating point processing.
High dynamic range rendering with floating point blending and anti-aliasing support.
High performance dynamic branching and flow control.
Complete feature set also supported in OpenGL® 2.0.
1.2.2 Anti-Aliasing





2x/4x/6x Anti-Aliasing modes.
Sparse multi-sample algorithm with gamma correction, programmable sample patterns, and centroid
sampling.
New Adaptive Anti-Aliasing mode.
Temporal Anti-Aliasing.
Lossless Color Compression (up to 6:1) at all resolutions, up to and including widescreen HDTV.
1.2.3 New Ring Bus Memory Controller







Programmable arbitration logic maximizes memory efficiency, software upgradeable.
New fully associative texture, color, and Z cache design.
Hierarchical Z-Buffer with Early Z Test.
Lossless Z-Buffer Compression (up to 48:1).
Fast Z-Buffer Clear.
Z Cache optimized for real-time shadow rendering.
Optimized for performance at high display resolutions, up to and including widescreen HDTV.
1.3 Features in Detail
1.3.1 2D Acceleration Features

A highly optimized 128-bit engine, capable of processing multiple pixels/clock.
© 2010 Advanced Micro Devices, Inc.
Proprietary
6
Revision 1.5






June 8, 2010
Hardware acceleration provided for BitBLT, line drawing, polygon and rectangle fills, bit masking,
monochrome expansion, panning and scrolling, scissoring, and full ROP support (including ROP3).
Optimized handling of fonts and text using ATI proprietary techniques.
Game acceleration including support for Microsoft's DirectDraw: Double Buffering, Virtual Sprites,
Transparent BLT, and Masked BLT.
Acceleration in 8/15/16/32-bpp modes.
Support for WIN 2000 & WIN XP GDI extensions: Alpha BLT, Transparent BLT, Gradient Fill.
Hardware cursor support up to 64x64x32-bpp, with alpha channel for direct support of WIN 2000 & WIN
XP alpha cursor standard.
1.3.2 3D Acceleration Features







Fully DirectX 9.0 compliant, including full speed 32-bit floating point per component operations.
Shader Model 3.0 support with programmable vertex shaders (full operand and operation support) allowing
up to 1024 instructions and 256 vectors of constant store. This includes vertex shader loops, branches, and
subroutines, which allow support of the following:
o 1024 vertex shader instruction store.
o 261,888 instructions with a single loop.
o 4+ trillion instructions with nested loops.
o Dynamic flow control.
o 8 full vertex processing units.
Advanced pixel shaders with the following features:
o New advanced shader design, with ultra-threading sequencer for high efficiency operations.
o Full Pixel Shader 3.0 support.
o Advanced, high performance branching support.
o 32-bit floating point support for high dynamic range computations.
Full anti-aliasing on render surfaces up to and including 64-bit floating point formats.
Support for 2xAA, 4xAA and 6xAA subsamples, with little performance loss in most cases.
Advanced AA quality algorithms, generating visuals that are superior to other solutions with an equivalent
number of samples.
New adaptive anti-aliasing modes dynamically select between fast multi-sampling and high quality supersampling per polygon, delivering the benefits of both techniques.
1.4 Changes from R3xx/4xx
Changes from R3xx to R4xx












Support for 1, 2, 3 and 4 quad pixel pipes
Support for 1 to 6 vertex shader pipes
HDTV resolution support for HiZ
Support of 16x16 and 32x32 pixel tile sizes (32x32 should now be the preferred amount)
Vastly redesigned Memory controller, with new client interfaces
Support for 8b of subpixel precision
Native support of 4Kx4K raster target
PS instruction support now at 512 each for Scalar, Vec3 and Texture (1536 total instructions)
VS native support for Sin/Cos
TX Component swizzling
Enhanced texture performance
MRT and wide pixel performance fixes
© 2010 Advanced Micro Devices, Inc.
Proprietary
7
Revision 1.5





June 8, 2010
Fog alpha rounding matches RGB
Line stipple fixes; SU texture stuffing improvements
LOD Clamp/bias re-order
2D support for larger pixels (Pitch at 16b)
4x AA buffer tiling is changed when memory mapping is not used
Changes from R4xx to R5xx















New Memory controller
Support of VS3.0 features, except Vertex fetch
Support of all PS3.0 features, including extended GPRs and Constants, all branching and predication
New FP32 US, including most IEEE NANs, INFs behavior corrected (still TRUNC rounding mode)
Support of new Z range [-2,2], with per pixel clamping in SC
Support of up to 11 texture sets (10 explicit), or 44 iterators
Support of color to texture mappings, and texture to color mappings (for performance improvements)
New IS_IP for better mapping of components from VS to PS
Color now in FP20 mode, instead of S3.12 mode
New HiZ compression mode, allows high precision Z values to be stored
New FP16 render surfaces support, including blending and all backend functions, but not texture filtering
Fully set associative caches for Texture, Color, and Z
New more efficient fifos for all MC clients
New Filter4 mode for Texture unit
New 1b texture mode for texture unit
© 2010 Advanced Micro Devices, Inc.
Proprietary
8
Revision 1.5
June 8, 2010
2. Tiling
2.1 Overview
R3xx-R5xx support two types of blocks


Micro block
Macro block
Each block type can either be linear or tiled.
2.2 Micro Blocks
A micro block refers to a 32-byte consecutive data in memory. It is aligned to a 32-byte boundary, which means that
the 5 LSBs of a micro-block address are zeros. Micro blocks can be linear or tiled. Linear maps a 1D area of an
image to the block. Tiled maps a 2D area of an image to a block. The following table shows the different type of
micro blocks and the region of the 2D image that maps to it (x X y)
Micro-linear
Micro-tiled
8 bit pixel
32x1 pixels (x=32 , y=1)
8x4 pixels (x=8 , y=4) supported by : tx/cb/hdp
16 bit pixel
16x1 pixels (x=16 , y=1)
4x4 pixels (x=4 , y=4) supported by : tx/cb/zb/hdp
16 bit pixel
16x1 pixels (x=16 , y=1)
8x2 pixels (x=8 , y= 2) supported by: tx/cb/hdp/disp
32 bit pixel
8x1 pixels (x=8 , y=1)
4x2 pixels (x=4 , y=2) supported by: tx/cb/zb/hdp/disp
64 bit pixel
4x1 pixels (x=4 , y=1)
2x2 pixels (x=2 , y=2) supported by: tx/hdp
128 bit pixel
2x1 pixels (x=2, y=1)
2.3 Macro blocks
A macro block refers to a 2K-byte consecutive data in memory. Macro-blocks loosely refer to the size a DRAM
page. How micro tiles are arranged in a macro-tile is controlled by whether the macro-block is linear or tiled. Linear
macro block maps x-order sequential array of micro-blocks to a macro-block. When the end of the current scan is
reached, the macro-block continues with data from the next micro-tile in the next scan. The alignment for Linear
macro-blocks is 32 bytes. An image can generally be more compact using macro-linear, but it is typically slower in
rendering performance. Tiled macro-blocks map a 2D region of micro-blocks into a macro-block. Tiled macroblocks are aligned to a 2K-byte boundary, which means that the 11 LSBs of a macro-block address are zeros
There are 64 micro-blocks in a macro-block (2k divided by 32 bytes). In a tiled macro-block these 64 micro-blocks
are arranged as an 8x8. The number of pixels in x and y that map into a tiled macro-block is based on pixel size and
micro-block type. Multiplying the data from the previous table by 8 can do this:
© 2010 Advanced Micro Devices, Inc.
Proprietary
9
Revision 1.5
8 bit pixel
Macro-tiled
Micro-linear
256x8
Macro-tiled
Micro-tiled
64x32
16 bit pixel (8x2)
128x8
64x16
16 bit pixel (4x4)
128x8
32x32
32 bit pixel
64x8
32x16
64 bit pixel
32x8
16x16
© 2010 Advanced Micro Devices, Inc.
Proprietary
June 8, 2010
10
Revision 1.5
June 8, 2010
3. Surface Formats
This section describes all of the surface formats used by the R3xx-R5xx texture units and frame buffers. These
formats are first listed in summary, together with a list of features (fog, blend etc.) supported by each format.
8-bit Formats
Format
Layout
C_8
7
6
C2_4
7
C_3_3_2
7
5
4 3
C0
2
1
0
6 5
C1
4
2 1
C0
0
6 5
C2
4
3
3 2
C1
Range
1 0
C0
Blend
Fog
0.0 to 1.0 (unsigned)
-1.0 to +1.0 (signed)
0.0 to 1.0
Yes
Display
Yes
No
Dither Filter
Yes
Yes
Yes
No
No
No
Yes
0.0 to 1.0
Yes
No
No
No
Yes
16-bit Formats
Format
Layout
Range
C_16
15 14 13 12 11 10 9
8 7
C0
6
5
4
3
2
1
0
C_16_MPEG
15 14 13 12 11 10 9
8 7
C0
6
5
4
3
2
1
0
0.0 to 1.0 (unsigned)
-1.0 to +1.0 (signed)
-1.0 to +1.0
16
Blend
Fog
No
Display
No
No
Dither Filter
No
Yes
No
No
No
No
Yes
16
No
No
No
No
No
C_16_FP
15 14 13 12 11 10 9
8 7
C0
6
5
4
3
2
1
0
-2 to +2
C2_8
15 14 13 12 11 10 9
C1
8
7
6
5
4 3
C0
2
1
0
Yes
Yes
No
Yes
Yes
C_5_6_5
15 14 13 12 11 10 9
C2
8 7
C1
6
5
4
3
2 1
C0
0
0.0 to 1.0 (unsigned)
-1.0 to +1.0 (signed)
0.0 to 1.0
Yes
Yes
Yes
Yes
Yes
C_6_5_5
15 14 13 12 11 10 9
C2
8
7 6
C1
5
4
3
2 1
C0
0
No
No
No
No
Yes
C4_4
15 14 13 12 11 10 9
C3
C2
8
6 5
C1
4
3
2 1
C0
0
0.0 to 1.0 (unsigned)
-1.0 to +1.0 (signed)
0.0 to 1.0
Yes
Yes
Yes
Yes
Yes
C_1_5_5_5
15 14 13 12 11 10 9
C3
C2
8
4
3
2 1
C0
0
0.0 to 1.0
Yes
Yes
Yes
Yes
Yes
7
7 6
C1
5
32-bit Formats
Format
Layout
24
C4_8
C3
C3
C2
24
C_10_11_11
C2
C3
24
C2
8
C1
16
C2
24
C_11_11_10
C_2_10_10_10
C2
24
C4_8_GAMMA
Range
16
0
C0
8
C1
0
C0
16
C1
8
16
C1
8
16
C1
8
0
C0
0
C0
0
C0
C2_16
24
C1
16
8
C0
0
C2_16_MPEG
24
C1
16
8
C0
0
© 2010 Advanced Micro Devices, Inc.
Proprietary
Blend
Fog
0.0 to 1.0 (unsigned)
-1.0 to +1.0 (signed)
0.0 to 1.0
Yes
Display
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
0.0 to 1.0 (unsigned)
-1.0 to +1.0 (signed)
0.0 to 1.0 (unsigned)
-1.0 to +1.0 (signed)
0.0 to 1.0 (unsigned)
-1.0 to +1.0 (signed)
0.0 to 1.0 (unsigned)
-1.0 to +1.0 (signed)
-1.0 to +1.0
No
No
No
No
Yes
No
No
No
No
Yes
Yes
No
No
No
Yes
No
No
No
No
Yes
No
No
No
No
Yes
11
Dither Filter
Revision 1.5
June 8, 2010
C2_16_FP
24
C1
16
8
C0
0
-216 to +216
No
No
No
No
No
C_32_FP
24
16
C0
8
0
-2127 to +2127
No
No
No
No
No
0
0.0 to 1.0
Yes
Yes
Yes
Yes
Yes
0
0.0 to 1.0
Yes
Yes
Yes
Yes
Yes
0
0.0 to 1.0
Yes
Yes
Yes
Yes
Yes
Blend
Fog
0.0 to 1.0 (unsigned)
-1.0 to +1.0 (signed)
-216 to +216
No
No
No
No
Yes
No
No
No
No
No
No
No
No
No
No
C_AVYU
C_VYUY
C_YVYU
24
A
16
V
24
V
16
Y1
U
8
V
24
Y1
8
Y
16
V
Y0
8
Y0
U
64-bit Formats
Format
Layout
Range
C4_16
56
C3
48
40
C2
32
24
C1
16
8
C0
0
C4_16_FP
56
C3
48
40
C2
32
24
C1
16
8
C0
0
C2_32_FP
56
48
C1
40
32
24
16
C0
8
0
-2127 to +2127
112
C3
96
80
C2
48
C1
32
16
C0
0
-2127 to +2127
Display
Dither Filter
128-bit Formats
Format
C4_32_FP
Layout
64
Range
Display
No
Blend
Fog
No
No
Depth Formats
Format
Layout
W_24
24
W_24_FP
24
© 2010 Advanced Micro Devices, Inc.
Proprietary
Range
Read
8
0
0 to 224-1
Yes
No
16
8
0
-263 to +263
Yes
Yes
DEPTH
DEPTH
Write
16
12
Dither Filter
No
No
Revision 1.5
June 8, 2010
4. Texture Memory Layout
4.1 Macro - Linear / Micro - Linear
The starting address of an image is aligned to a 32-Byte boundary specified by register TX_OFFSET[31:5]. The
texels that make up the image are stored in row-column order. Each row of an image is aligned to 32 Bytes. The
image is stored contiguously in memory. This is illustrated in the following figure.
Image
0
S
Memory
N
32B Alligned
0
0
S
T=0
T
N
0
32B Alligned
S
T=1
M
N
Texel
4.2 Macro - Linear / Micro - Tiled
The starting address of an image is aligned to a 32-Byte boundary specified by register TX_OFFSET[31:5].
The Micro-Tiles that make up the image are stored in row-column order. Each row of Micro-Tiles is aligned to
32 Bytes. The image is stored contiguously in memory. This format is very similar to Linear/Linear with the
exception that Micro-Tiles are stored in row-column order, while texels are tiled within each Micro-Tile. This is
illustrated in the following figure.
Image
0
SMICRO
Memory
A
32B Alligned
0
TMICRO = 0
TMICRO
32B Alligned
TMICRO = 1
B
0
SMICRO
A
0
SMICRO
A
Texel
MicroTile
4.3 Macro - Tiled / Micro - Linear
The starting address of an image is aligned to a 2K-Byte boundary specified by register TX_OFFSET[31:5].
The Macro-Tiles that make up the image are stored in row-column order. Each row of Macro-Tiles is aligned to
2K Bytes. Each image is stored contiguously in memory. This is illustrated in the following figures. Micro-Tiles
are re-ordered within a Macro-Tile to improve dram locality.
© 2010 Advanced Micro Devices, Inc.
Proprietary
13
Revision 1.5
S
June 8, 2010
N-1
MacroTile : MxN Texels
T
MxN-1
8 x 256 x 8b
8 x 128 x 16b
8 x 64 x 32b
8 x 32 x 64b
0
Byte0
Byte2K-1
M-1
Image
SMACRO
0
Memory
C
2KB Alligned
0
TMACRO = 0
TMACRO
2KB Alligned
TMACRO = 1
D
0
SMACRO
C
0
SMACRO
C
MacroTile = MxN Texels
Texel
4.4 Macro - Tiled / Micro - Tiled
The starting address of an image is aligned to a 2K-Byte boundary specified by register TX_OFFSET[31:5].
8x8 Micro-Tiles are stored within a Macro-Tile. The Macro-Tiles that make up the image are then stored in rowcolumn order. Each row of Macro-Tiles is aligned to 2K Bytes. Each image is stored contiguously in memory.
This is illustrated in the following figures. Micro-Tiles are re-ordered within a Macro-Tile to improve dram
locality.
S
MacroTile : 8x8 MicroTiles
T
3F
Byte2K-1
0
Byte0
MicroTile
© 2010 Advanced Micro Devices, Inc.
Proprietary
14
Revision 1.5
Image
SMACRO
0
June 8, 2010
Memory
C
2KB Alligned
0
TMACRO = 0
TMACRO
2KB Alligned
TMACRO = 1
D
0
SMACRO
C
0
SMACRO
C
MacroTile = 8x8 MicroTiles
MicroTile
4.5 MipMaps
For a MipMap pyramids, the levels are stored contiguously in memory. The ordering of the images is from
largest to smallest. Each level of a mipmap pyramid must follow the same alignment and padding restrictions as
a planar image. If Macro-Tiled, once image size drops below size of Macro-Tile, the hardware switches to
Macro-Linear to minimize memory use.
Memory
MipMaps stored contiguously from largest to smallest.
Level 0
Level 1
Level 2
Level 3
* Not Drawn To Scale
4.6 Cube Maps
Cube map faces must be power of two in width and height, and must be square. Cube maps can be planar or
mipmapped. All six cube faces must have the same dimensions as Face0.
The faces of a cube map are stored contiguously in memory from Face0 to Face5. If mipmapped, levels 1 thru N
are then stored from largest to smallest. If Macro-Tiled, once image size drops below size of Macro-Tile, the
hardware switches to Macro-Linear to minimize memory use.
© 2010 Advanced Micro Devices, Inc.
Proprietary
15
Revision 1.5
June 8, 2010
Memory
Face0
Face1
Cube faces stored contiguously from 0 to 5.
MipMaps stored contiguously from largest to smallest.
Face2
MipLevel 0
Face3
Face4
Face5
Face0
Face1
Face2
MipLevel 1
Face3
Face4
Face5
* Not Drawn To Scale
4.7 3D Textures
3D textures must be power of two in width, height, and depth, however they can be non-square. 3D textures can
be planar or mipmapped.
The layers of a 3D texture are stored contiguously in memory from Layer0 to LayerM. If mipmapped, levels 1
thru N are then stored from largest to smallest. If Macro-Tiled, once image size drops below size of Macro-Tile,
the hardware switches to Macro-Linear to minimize memory use.
© 2010 Advanced Micro Devices, Inc.
Proprietary
16
Revision 1.5
June 8, 2010
Memory
Layer0
Layers stored contiguously from 0 to N.
MipMaps stored contiguously from largest to smallest.
Layer1
MipLevel 0
Layer2
Layer3
Layer0
Layer1
MipLevel 1
* Not Drawn To Scale
© 2010 Advanced Micro Devices, Inc.
Proprietary
17
Revision 1.5
June 8, 2010
5. Command Processor
5.1 Overview
The Command Processor is a programmable processor that is meant to provide some on-chip intelligence for a
Graphics Controller device. The CP architecture has been approached as a special-purpose computing engine,
targeted at fetching and interpreting a PROMO4 command stream.
The Command Processor takes on several tasks in a typical Graphics Controller:

Acts as a receiver of command streams from the video and graphics device driver(s) running on the host
CPU. These command streams are either read from system memory using bus-mastering on the PCI or
AGP bus, or directly written to the CP from the host CPU using the PCI or AGP (fast-write) bus. Three
streams are supported – one Ring Buffer and two Indirect Buffers.

Parses and interprets a command stream, and writes the parsed data to internal “Feature” modules of the
Graphics Controller device; for example, a 3D graphics processor, a 2D graphics processor, a Video
Processor, or an MPEG Decoder. The data writes can be 32, 64, 96, or 128 bits per clock. The 64, 96, and
128 bit writes will occur for “Vector Write Mode”. Vector write mode is valid when the stream (PQ, IQ1,
IQ2) is in Pull Mode. Push mode will only write DWORDs (i.e. Lower 32-bits of the 128-bit data bus will
be valid with a DWORD_Enable = “0001”. The 64 and 96-bit writes will only occur while the alignment of
the data is not on a 128-bit boundary.

There are two general-purpose DMA engines inside the CP, one for GUI-related tasks, and one intended for
Video Capture tasks. The DMA engines do byte alignment between the source and destination surfaces.
5.2 Host Programming Model Description
This section describes the manner in which the host CPU communicates with the graphics controller chip.
5.3 Push vs Pull Model
The Push Model is also referred to as Programmed I/O (PIO). In this model the host CPU is writing to the graphics
controller chip across either the PCI or AGP bus. That is, the host is “pushing” command information to the
graphics controller. This information is in one of two forms:
1) A sequence of register writes to setup the state of a processing engine on the graphics controller, and
then starting the engine running. Typically, engines are started as a side-effect of writing to a special
“trigger” or “initiator” register.
2) A sequence of Command Packets, which are a “compressed” way of conveying the command
information to the graphics controller, relying on an intelligent processor in the graphics controller to
convert the command packets into register writes to other processing engines in the graphics controller.
It is expected that option (1) above will only be used for debug purposes.
The Pull Model utilizes bus-mastering on the part of the graphics controller, as it actively goes out and reads from an
area of system memory in which the host CPU has previously placed command information. An important part of
the pull model is how the host and the graphics controller manage access to the shared buffer in system memory.
This is discussed in the following section.
The pull model allows more slip between the CPU and the graphics controller than does the push model, assuming
that the command buffer for the push model is limited to on-chip storage.
The push model may have some advantage when the overall system performance is taken into account as it lightens
the bandwidth demand on system memory as compared to the pull model. The push model may be able to make-up
for its limited slip by implementing an on-chip command buffer that “spills-over” into the frame buffer; however,
this of course begins to place a demand on the frame buffer bandwidth to write and read the command buffer.
© 2010 Advanced Micro Devices, Inc.
Proprietary
18
Revision 1.5
June 8, 2010
The Command Processor will support both the push and pull models; however, switching between these two models
must be carefully controlled. It is intended that switching is not done often; most likely the model is chosen at reset
time, and never changed once the system is running. The pull model is the preferred choice for systems that allow
bus-mastering, and whose API allows concurrent processing between the host CPU and the graphics controller,
primarily because of its superior capability for overlapped processing. The push model is available for systems that
are not well-suited to using the pull model.
5.4 Ring Buffer Management
When the Graphics Controller is set to operate in the bus-mastering mode (pull model), the host application, say a
driver, has to allocate a block of system memory as a buffer for the command packets it issues to the Graphics
Controller. The command packets, or simply packets, instruct the Graphics Controller to carry out operations such
as drawing objects on the screen. This memory block is treated as if it is a ring that allows the packets to be placed
into and taken away from the memory in a circular manner, thus the name Ring Buffer.
The Ring Buffer is a shared memory space between two cooperating processors. It is used to implement one-way
communication from the Host processor (the Writer) to the Graphics Controller (the Reader). Each processor must
maintain the state that it believes that the Ring Buffer is in. The state is composed of:

Buffer Base: The address of the beginning of the buffer.

Buffer Size: The size of the buffer.

Write Pointer: The address that the Host is writing to.

Read Pointer: The address that the Graphics Controller is reading from.
In order for the Ring Buffer to work properly, both processors must maintain a consistent view of this state. The
Buffer Base and Buffer Size are generally initialized when the system is first brought-up, and rarely changed after
that point. It is a simple task to initialize both the Reader‟s and the Writer‟s copies of this state. The Read and
Write Pointers, on the other hand, change quite frequently as the Ring Buffer is in operation. In order to achieve
consistency, when the Writer (the host) updates the Write Pointer, he must send that value to the Reader‟s (the
Graphics Controller‟s) copy of the Write Pointer. And similarly, when the Reader updates the Read Pointer, he must
send that value to the Writer‟s copy of the Read Pointer.
Packets are placed into the memory block, or buffer, from the beginning towards the end, i.e., from lower addresses
toward higher addresses. Once the data placement hits the end, it starts from the beginning again. Meanwhile, the
packets are consumed from the head of the queue in a manner similar to how they were placed.
Figure illustrates how the ring buffer operates when combined with the bus-mastering operation.
© 2010 Advanced Micro Devices, Inc.
Proprietary
19
Revision 1.5
start of buffer
Host
June 8, 2010
end of buffer
Graphics
Controller
Write Pointer Address
Buffer Base
Write Pointer
Buffer Size
Read Pointer
Write Pointer
Buffer Base
Ring Buffer
Server
PN-1
P2
P1
PN
Packets
Driver(s)
Buffer Size
Bus
Mastering
Unit
Ring Buffer
Com m and Packet
Buffer
Read Pointer
free area
Read Pointer
Address
Legend:
Register
Execution
Unit
Memory
data flow
Figure: Ring Buffer and its Control Structure
In the figure, packets are placed into the buffer in a counter-clockwise order, forming a packet queue. The first
packet in the queue is denoted by P1 , and the last by Pn . The start of the queue, P1 , is pointed to by the Read
Pointer(s). The memory portion that is not occupied by packets is called the free area, and it is pointed to by the
Write Pointer(s).
Initially, both the read and write pointers may point to the same location of the ring buffer, e.g. the start of the
memory block. The two pointers pointing to the same location of the ring buffer generally implies one of two
situations. One is that the buffer is empty, and the other is that the buffer is full. We want to define this situation as
an empty buffer. To resolve the ambiguity of both pointers being equal, we must prevent the case of a full buffer
from ever happening. It is the Host‟s responsibility to ensure that there is at least one free location in the buffer.
On the host side, the driver places command packets into the free area of the ring buffer, and informs the Graphics
Controller of any changes to the Write Pointer by writing directly to the Write Pointer register inside the Graphics
Controller. The host tracks free-space in the buffer by comparing its Read and Write Pointers, and suspends writing
if the buffer becomes (almost) full.
On the Graphics Controller side, packets are taken away one-by-one from the head of the packet queue, pointed to
by its Read Pointer, through the Host Bus Interface, and placed into the Command Packet Buffer. As the Graphics
Controller updates its copy of the Read Pointer, it uses a bus-mastering write to update the Host‟s copy of the Read
Pointer, residing in a shared memory location. The Graphics Controller has a register that holds the memory address
of where the Host‟s Read Pointer resides, and uses that for the address of the bus-mastering write. The Graphics
Controller tracks free-space in the buffer by comparing its Read and Write Pointers, and suspends reading if the
buffer becomes empty (i.e., Read Pointer == Write Pointer).
To reduce traffic on the system memory bus, the Graphics Controller should not update the Host‟s copy of the Read
Pointer every time it changes on the Graphics Controller side. To facilitate this, we have adopted a concept of a
© 2010 Advanced Micro Devices, Inc.
Proprietary
20
Revision 1.5
June 8, 2010
block of dwords in the packet queue. The Graphics Controller will update the host‟s copy of the Read Pointer every
time it has consumed a “block‟s-worth” of data from the ring buffer. The other time when the Graphics Controller
will update the Read Pointer is when it thinks that the packet queue is empty. The size of the block is
programmable, to allow the programmer to trade-off the amount of time the system bus spends doing real data
transfer vs the amount of time it spends on the communication overhead of updating read/write pointers. Larger
block sizes tend to reduce communication overhead, at the “expense” of reducing the number of blocks in the queue,
which reduces the amount of “slip” (or de-coupling) between the Host and the Graphics Controller.
To reduce traffic on the system memory bus, the driver may want to minimize the frequency of accesses to its copies
of the Read and Write Pointers. To minimize reads of the Read Pointer, it can check them once, calculate an amount
of free space, and then decrement a local copy of the amount of free space as it adds packets to the queue. When it
sees that the free-space is small (queue nearly full), it can start this procedure over again. (Its copy of the Read
Pointer may have changed since the last time he read it.) The host also has the option of updating the Graphics
Controller‟s Write Pointer on a less-frequent basis than with every write he does to the packet queue, possibly on a
block-basis similar to the Graphics Controller‟s mechanism. However, if the buffer is running close to empty, any
delay in updating the Graphics Controller‟s Write Pointer may add latency to the Graphics Controller‟s response to
this command packet. Also, the host must be careful to update the Graphics Controller‟s copy of the Write Pointer
if it wants the Graphics Controller to read from the queue until it is empty.
When the queue has become (almost) full, the host will have to poll the Read Pointer until space becomes available.
In certain systems (Pentium II for example), this polling will stay within the processor cache, thus avoiding traffic
on the system bus, and the snoop logic of the host CPU will take care of maintaining consistency between the main
memory and the processor cache when the Graphics Controller performs its bus-mastering write of the Read Pointer.
It is important to note that the Read Pointer must reside in PCI space in order for this snoop technique to work.
AGP writes are not snooped.
5.5 Chipset Coherency Issues
The Rage128 product revealed a weakness in some motherboard chipsets in that there is no mechanism to guarantee
that data written by the CPU to memory is actually in a readable state before the Graphics Controller receives an
update to its copy of the Write Pointer. In an effort to alleviate this problem, we‟ve introduced a mechanism into the
Graphics Controller that will delay the actual write to the Write Pointer for some programmable amount of time, in
order to give the chipset time to flush its internal write buffers to memory.
There are two register fields that control this mechanism: PRE_WRITE_TIMER and PRE_WRITE_LIMIT. There
is also a staging register placed “in front of” the actual Write Pointer register of the CP. All host writes go into the
staging register and are held there until one of two events occurs: the down-counter of PRE_WRITE_TIMER has
expired; or the host has written the staging register PRE_WRITE_LIMIT-times, forcing the contents of the staging
register into the actual Write Pointer register. The down-counter is seeded with PRE_WRITE_TIMER every time
the host writes to the Write Pointer register address, and expires when it reaches zero. This implementation does not
guarantee a certain time-delay between the host write to the Write Pointer, and the Graphics Controller read of the
system memory; because the host could flood the Graphics Controller with multiple writes (more than the
PRE_WRITE_LIMIT) in a short amount of time, thus overriding the time-delay imposed by the
PRE_WRITE_TIMER. However, since the normal operation of this system is to increase the Write Pointer by some
significant amount with each write, it is likely that by the time the PRE_WRITE_LIMIT has been reached, the data
has in fact been “pushed” through the chipset‟s write buffer by subsequent writes to the ring buffer in system
memory.
Note that programming the PRE_WRITE_TIMER and PRE_WRITE_LIMIT to zero allows the chip to behave just
as the Rage128 did.
The above solution is based on a time delay, the assumption being that if the chipset is given enough time, the write
buffer will be flushed to memory, and become available for a coherent read.
5.6 Indirect Buffer Management
The Command Processor has the capability to read commands from other locations in memory, outside of the Ring
© 2010 Advanced Micro Devices, Inc.
Proprietary
21
Revision 1.5
June 8, 2010
Buffer. These locations are known as Indirect Buffer1 and Indirect Buffer2. This is accomplished as follows: there
is a packet in the Primary command stream (being read from the ring buffer) which sets up the Indirect Buffer1
Address and Size registers of the Command Processor. The writing of the Indirect Buffer1 Size register triggers the
Command Processor to begin fetching the new stream from the provided address. The last packet to be parsed from
the Primary stream is the one that sets the Indirect Buffer1 Address and Size registers. The CP then begins fetching
data from Indirect Buffer1. The data stream in Indirect Buffer1 may set up the Indirect Buffer2 Address and Size
registers of the Command Processor. As before, writing of the Indirect Buffer1 Size register triggers the Command
Processor to begin fetching the new stream from the provided address. The last packet to be parsed from the
Indirect Buffer1 stream is the one that sets the Indirect Buffer2 Address and Size registers. The CP fetches the
correct amount of data from Indirect Buffer2 until The Buffer2 Size is exhausted; it then returns to its interpretation
of packets from Indirect Buffer1. The CP fetches the correct amount of data from Indirect Buffer1 until the Buffer1
Size is exhausted; it then returns to its interpretation of packets from the Primary Stream (being read from the ring
buffer).
5.7 Overview of DMA Operation
The DMA engines in the Command Processor fetch commands from the frame buffer memory which tell them what
to do. The command in memory is stored in a structure known as a Descriptor, having a four-doubleword
(DWORD) format as shown below:
Ordinal
Name
Bit
Function
0
SRC_ADDR
31:0
Source address
1
DST_ADDR
31:0
Destination address
2
COMMAND
31:0
Command word. (See description below)
3
(Reserved)
31:0
The COMMAND word has the following format:
31
30
29
28
27
26
25:24
23:22
20:0
EOL
INTDIS
DAIC
SAIC
DAS
SAS
DST_SWAP
SRC_SWAP
BYTE_COUNT[20:0]
End Of List Marker
Interrupt Disable
Destination Address Increment Control
Source Address Increment Control
Destination Address Space
Source Address Space
Destination Endian Swap Control
Source Endian Swap Control
Byte Count of Transfer
There are some constraints on the programming of the Descriptor, as follows: If either the Source or the Destination
is in the register address space, or is programmed to be non-incrementing, then the atomic transfer unit is assumed to
be a DWORD. Namely, the bottom two-bits of the BYTE_COUNT and the Address will be ignored (assumed
“00”).
Note that a BYTE_COUNT of zero will perform no operation.
Multiple Descriptors may be stored contiguously in memory to make up a Descriptor Table (DT) (see Figure). The
last Descriptor in the Descriptor Table must be marked as such so that the DMA engine knows when to stop
consuming commands.
The programmer provides the DMA engine with a pointer to the beginning of the Descriptor Table, and the DMA
© 2010 Advanced Micro Devices, Inc.
Proprietary
22
Revision 1.5
June 8, 2010
engine fetches one Descriptor at a time, interprets the command to carry out a transfer, and then moves on to the
next Descriptor in the table. As mentioned above, the DMA engine will stop when it reaches the last Descriptor in
the table.
There is a bit called CP_SYNC in the Descriptor Address register (DMA_xxx_TABLE_ADDR). If this bit is set,
the DMA will “lock-out” the microengine from performing any writes on the register backbone while the DMA is
active. This mechanism can be used to synchronize a DMA-driven stream of register writes to the command FIFO.
among other things.
A DMA channel may have its operation aborted by writing a „1‟ to the ABORT_EN bit of the DMA_xxx_STATUS
register. It is important that the programmer then poll the ACTIVE bit of that same register, waiting for a value of
„0‟, before writing a „0‟ to the ABORT_EN bit. Once the ACTIVE bit is „0‟, the programmer is guaranteed to readback stable state from all DMA registers.
Memory Space
TABLE_ADDR Register
Dword 0
Dword 1
Dword 2
Dword 3
Dword 4
Dword 5
Dword 6
Dword 7
Descriptor 0
Descriptor 1
.
.
.
Dword (n*4)
Dword (n*4)+1
Dword (n*4)+2
Dword (n*4)+3
Descriptor n (Last)
Figure: Descriptor Table Layout in Memory
An alternate method to writing the DMA_XXX_TABLE_ADDR register to initiate a DMA operation is to write the
descriptors directly to the CP. This saves the fetching of the descriptor table from memory.
Three registers are provided for each of the DMA engines (CP_XXX_SRC_ADDR, CP_XXX_DST_ADDR,
CP_XXX_COMMAND). The contents of these registers have the same fields as the SRC_ADDR, DST_ADDR, and
COMMAND DWORDs of the descriptor table entry described above. Except that the EOL is hard-coded TRUE in
the COMMAND DWORD. Writing to the CP_XXX_COMMAND register initiates a DMA operation using the
descriptor described in all three registers. A table of descriptors can be built from multiple Type-0 packets each
containing the SRC, DST, and COMMAND data.
© 2010 Advanced Micro Devices, Inc.
Proprietary
23
Revision 1.5
June 8, 2010
5.8 Resetting the Command Processor
To support recovery from a power-down state the read pointer (CP_RB_RPTR) is writable. The read pointer is
initialized by writing the writable read pointer (CP_RB_RPTR_WR). Then, when the write pointer
(CP_RB_WPTR) is subsequently written the contents of the writable read pointer (CP_RB_RPTR_WR) are
transferred to the active read pointer (CP_RB_RPTR). As a precaution, an enable bit must be set in the control
register (CP_RB_CNTL) to allow the contents to transfer to the active read pointer (CP_RB_RPTR). Note that the
read pointer still resets to zero to ensure starting at the beginning of the buffer if the host does not initialize the
writable read pointer (CP_RB_RPTR_WR).
Therefore, a certain sequence of actions is required of the host in order to perform a “clean” soft reset of the CP:
1) Write CP_CSQ_CNTL and CP_CSQ_MODE to zero, effectively disabling the CP.
2) Write to the proper RBBM register to assert and then de-assert the Soft Reset signal to the CP.
3) Set the RB_RPTR_WR_ENA bit to enable writing of the RPTR if desired not to start from the
beginning of the buffer.
4) Write the CP_RB_RPTR_WR register if it is desired not to start at the beginning of the buffer.
5) Write CP_RB_WPTR, to make it match the RPTR, causing the ring buffer to appear to be empty.
6) Clear the RB_RPTR_WR_ENA bit if no further writes of the RPTR are desired.
7) Write CP_CSQ_CNTL or CP_CSQ_MODE to set the mode back to whatever you want.
5.9 Command Stream Synchronization
In the RBBM, there is an event engine that can be used to synchronize the sending of transactions to the Register
Backbone based on status signals from its clients. The CP however has a mechanism that can directly provide the
Host with knowledge of command status. This mechanism is the eight “SCRATCH” registers and their associated
functionality.
Associated with the eight “SCRATCH” registers in the CP are a scratch address register and a write mask. When a
scratch register is written, the CP will subsequently write its value to a location equal to what is programmed in the
SCRATCH_ADDR register plus the number (0 to 7) of the scratch register. The writing of the scratch register‟s
value by the CP is qualified by the register‟s write mask (SCRATCH_UMSK).
So, at the end of processing an Indirect Buffer, for example, a Type-0 packet can be inserted that writes a data
pattern to SCRATCH_REG1. The driver software can poll the external location SCRATCH_ADDR+1 and when it
changes to the value that was inserted in the Type-0 packet, the Driver will “know” that the CP has completed
parsing the indirect buffer up to that point. Note that this status only indicates that the CP is done to that point, the
data still may be being used by the rest of the pipeline.
© 2010 Advanced Micro Devices, Inc.
Proprietary
24
Revision 1.5
June 8, 2010
For R5xx an interrupt is added associated with the scratch registers, which is asserted when the scratch register pair
selected is written to memory and is greater than or equal to the pair of values written by the Driver.
The CP can receive sync pulses from the back-end of the pipeline (CBA_CP_SYNC, CBB_CP_SYNC,
CBC_CP_SYNC, and CBD_CP_SYNC). When a pulse from each is received (pulse pair), the CP will write the
targeted scratch register with the corresponding CP_RESYNC_DATA value. The targeted scratch register is
determined by the 3-bit CP_RESYNC_ADDR which is a scratch register offset from the SCRATCH_ADDR base
address.
Because this function uses the SCRATCH_ADDR and SCRATCH_UMSK values, they must be initialized prior to
its use. The CP_RESYNC_ADDR and CP_RESYNC_DATA registers must also be programmed with the target
scratch register offset and the appropriate data respectively before the pulses are received. Both the
CP_RESYNC_ADDR and CP_RESYNC_DATA values are written into 8-deep FIFOs so that multiple
synchronization events can be en-queued in the CP.
If the sync pulses from the CB are asserted before programming the CP_RESYNC_ADDR and
CP_RESYNC_DATA, the logic will still work providing that Dynamic Clocking for the CP is disabled. Receipt of
the sync pulses by the CP does not cause the clocks to be enabled to the CP, so knowledge of these pulses may not
be remembered if Dynamic Clocking is enabled. Writing the CP_RESYNC_ADDR and CP_RESYNC_DATA
registers does enable the clocks to the CP. The “busy” signal to the CG will remain asserted as long as there is
RESYNC data in the ADDR and DATA FIFOs – keeping the clock enabled to the CP.
5.10 Starting the Indirect Streams
A write to the CP_IB_BUFSZ register triggers the Command Processor to start fetching the command stream from
the Indirect1 buffer, instead of from the Primary buffer. The CP will continue to fetch from the Indirect1 buffer,
starting at the address in the CP_IB_BASE register, and continuing until the CP_IB_BUFSZ amount is exhausted.
Then it will switch back to the Primary stream.
A write to the CP_IB2_BUFSZ register triggers the Command Processor to start fetching the command stream from
the Indirect2 buffer, instead of from the Indirect1 buffer. The CP will continue to fetch from the Indirect2 buffer,
starting at the address in the CP_IB2_BASE register, and continuing until the CP_IB2_BUFSZ amount is exhausted.
Then it will switch back to the Indirect1 stream.
Note that there are some important rules to follow when starting an indirect stream. Firstly, the write to the
CP_IB_BUFSZ or CP_IB2_BUFSZ register must be the last register-write of a Type 0 or Type 1 packet. The very
next packet that is delivered to the Command Stream Interpreter is the first packet of the respective indirect buffer.
The second rule is that the respective CP_IB_BASE or CP_IB2_BASE register must have been setup with the
proper value before the appropriate CP_IB_BUFSZ or CP_IB_BUFSZ register is written.
In PIO mode, the BUFSZ register still needs to be written with the size of the indirect buffer. Care must be taken to
write this register before the command queue fills in the CP.
© 2010 Advanced Micro Devices, Inc.
Proprietary
25
Revision 1.5
June 8, 2010
5.11 Writing Host Data to the Command Stream Queue
Either or all of the Primary, Indirect1 and Indirect2 streams can be delivered to the Command Processor via hostprogrammed writes to the Graphics Controller device. There is a range of register-space addresses assigned to each
of the three streams, that is, one aperture for the Primary Stream, one for the Indirect1 Stream, and one for the
Indirect2 Stream. The act of writing to a location in the aperture causes that data to be enqueued to the Command
Stream Queue. Note that the actual address of the written data is inconsequential; the data will be enqueued into the
Command Stream Queue in the order in which it was received from the host.
Note that each of the three streams can be in one of three delivery modes, resulting in nine possible combinations.
The three modes are:
1) OFF: The stream is disabled.
2) PUSH: The host is writing the stream data to the Command Processor. (also known as Programmed
I/O, or PIO mode)
3) PULL: The Command Processor is actively fetching the command stream from memory. (also known
as Bus Master, or BM mode)
Note that the BUFSZ register must be written to initiate indirect buffer parsing in the “PUSH” mode.
© 2010 Advanced Micro Devices, Inc.
Proprietary
26
Revision 1.5
June 8, 2010
5.12 Writing to the MicroEngine RAM
In order to change a location in the MicroEngine RAM, first load the CP_ME_RAM_ADDR Register with the
address of the RAM into which data is to be written. Next, the host performs two writes; the first must be to the
CP_ME_RAM_DATAH port, and the second to the CP_ME_RAM_DATAL port. Internally, the Command
Processor maintains a 40-bit holding registers which concatenates the lower 8-bits of the DATAH value to the top of
the 32-bit DATAL value, and at the end of the write of the DATAL value, the 40-bit value is written to the RAM at
the location specified by the RAM Address Register. The RAM Address Register is then auto-incremented to point
to the next location in the RAM. This process of writing two data values may be repeated to write to successive
RAM locations without re-loading the RAM Address Register.
5.13 Reading from the MicroEngine RAM
In order to read a location in the MicroEngine RAM, first load the CP_ME_RAM_RADDR Register with the
address of the RAM from which data is to be read. This write triggers the Command Processor to read the 40-bit
data value at that RAM location and transfer it to an internal 40-bit holding register. Also, the RAM Address
Register is auto-incremented to point to the next location in the RAM. Next, the host performs two read cycles, the
first from the DATAH port, and the second from the DATAL port. At the end of the DATAL cycle, the next
location of the RAM is transferred to the 40-bit holding register, and the RAM Address Register is again autoincremented. This process of reading two values may be repeated to read from successive RAM locations without
re-loading the RAM Address Register.
© 2010 Advanced Micro Devices, Inc.
Proprietary
27
Revision 1.5
June 8, 2010
5.14 Starting a DMA Operation
There are two methods to initiate a DMA operation – Descriptor Tables or Direct Descriptor Entry Register Writes.
To program a DMA operation via Descriptor Tables, the programmer has to build the table in the frame buffer first,
being sure to mark the last entry of the list as “End Of List”. Then, the programmer can write the starting address of
the descriptor table into the Descriptor Table Address Queue (DTAQ) through the xxx_DMA_TABLE_ADDR port.
The action of writing the first starting address into the DTAQ will trigger the DMA operation.
The type of transfer operation depends on the DMA_COMMAND DWORD in the Descriptor. It controls such
variables as: the length of the transfer, whether the Source/Destination addresses are in memory-space or registerspace, whether the Source/Destination addresses auto-increment with each transfer, and whether an interrupt is
generated when the entire Descriptor Table has been processed.
The second method - Direct Descriptor Entry Register Writes – involves writing the three DMA Entry registers.
Three registers are provided for each of the DMA engines (CP_XXX_SRC_ADDR, CP_XXX_DST_ADDR,
CP_XXX_COMMAND). The contents of these registers have the same fields as the SRC_ADDR, DST_ADDR, and
COMMAND DWORDs of the descriptor table entry. Except that the EOL is hard-coded TRUE in the COMMAND
DWORD. Writing to the CP_XXX_COMMAND register initiates a DMA operation using the descriptor described
in all three registers. A table of descriptors can be built from multiple Type-0 packets each containing the SRC,
DST, and COMMAND data.
© 2010 Advanced Micro Devices, Inc.
Proprietary
28
Revision 1.5
June 8, 2010
6. PM4
6.1
Packet Types
When programming in the PM4 mode, we do not need to write directly to registers to carry out drawing operations
on the screen. Instead, what we need to do is to prepare data in the format of PM4 Command Packets in the system
memory, and let the hardware (Microengine) to do the rest of the job.
Four types of PM4 command packets are currently defined. They are types 0, 1, 2 and 3 as shown in the following
figure. A PM4 command packet consists of a packet header, identified by field HEADER, and an information body,
identified by IT_BODY, that follows the header. The packet header defines the operations to be carried out by the
PM4 micro-engine, and the information body contains the data to be used by the engine in carrying out the
operation. In the following, we use brackets [.] to denote a 32-bit field (referred to as DWORD) in a packet, and
braces {.} to denote a size-varying field that may consist of a number of DWORDs. If a DWORD is shared by more
than one field, the fields are separated by „|‟. The field that appears on the far left takes the most significant bits, and
the field that appears on the far right takes the least significant bits. For example, DWORD [HI_WORD |
LO_WORD] denotes that HI_WORD is defined on bits 16-31, and LO_WORD on bits 0-15. A C-style notation of
referencing an element of a structure is used to refer to a subfield of a main field. For example,
MAIN_FIELD.SUBFIELD refers to the subfield SUBFIELD of MAIN_FIELD.
Type-0 packet
Bit position
Packet header
3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
9 8 7 6 5 4 3 2 1 0
1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
00
COUNT
a
BASE_INDEX
REG_DATA_1
REG_DATA_2
IT_BODY
...
REG_DATA_n
Type-1 packet
Bit position
Packet header
3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
9 8 7 6 5 4 3 2 1 0
1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
01
Reserved
REG_INDEX2
REG_INDEX1
REG_DATA_1
IT_BODY
REG_DATA_2
© 2010 Advanced Micro Devices, Inc.
Proprietary
29
Revision 1.5
June 8, 2010
Type-2 packet
Bit position
Packet header
3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
9 8 7 6 5 4 3 2 1 0
1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
10
Reserved
Type-3 packet
Bit position
Packet header
3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
9 8 7 6 5 4 3 2 1 0
1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
11
COUNT
IT_OPCODE
Reserved
DATA_1
DATA_2
IT_BODY
...
Data_n
6.1.1 Type-0 Packet
Functionality
Write N DWORDs in the information body to the N consecutive registers, or to the register, pointed to by the
BASE_INDEX field of the packet header.
Format
Ordinal
1
2
3
Field Name
[ HEADER ]
[REG_DATA_1]
[REG_DATA_2]
...
[REG_DATA_N]
N+1
Header Fields
Bit(s)
12:0
Field Name
BASE_INDEX
14:13
15
Reserved
ONE_REG_WR
29:16
COUNT
31:30
TYPE
Description
The BASE_INDEX[12:0] correspond to byte address bits [14:2]. So the
BASE_INDEX is the DWORD Memory-mapped address.
The BASE_INDEX field width supports up to DWORD address: 0x7FFF.
Reserved for future expansion of address space.
0:- Write the data to N consecutive registers.
1:- Write all the data to the same register.
Count of DWORDs in the information body. Its value should be N-1 if there
are N DWORDs in the information body.
Packet identifier. It should be zero.
Note: Symbol „:-‟ reads “defined as.”
Information Body
© 2010 Advanced Micro Devices, Inc.
Proprietary
30
Revision 1.5
Bit(s)
31:0
Field Name
REG_DATA _x
June 8, 2010
Description
The bits correspond to those defined for the relevant register. Note the suffix x
of REG_DATA_x stands for an integer ranging from 1 to N.
Comment
The use of this packet requires the complete understanding of the registers to be written.
6.1.2
Type-1 Packet
Functionality
Write REG_DATA_1 and REG_DATA_2 in the information body respectively to the registers pointed to by
REG_INDEX1 and REG_INDEX2. Note that this packet cannot address the entire address space. It is recommended
that Type 0 packets be used instead.
Format
Ordinal
1
2
3
Field Name
[ HEADER ]
[REG_DATA_1]
[REG_DATA_2]
Header fields
Bit(s)
10:0
21:11
29:22
31:30
Field Name
REG_INDEX1
REG_INDEX2
Reserved
TYPE
Description
The field points to a memory-mapped register that REG_DATA_1 is written to.
The field points to a memory-mapped register that REG_DATA_2 is written to.
Packet identifier. It should be 1 (one).
Information Body
Bit(s)
31:0
6.1.3
Field Name
REG_DATA_x
Description
The bits correspond to those defined for the relevant register.
Type-2 Packet
Functionality
This is a filler packet. It has only the header, and its content is not important except for bits 30 and 31. It is used to
fill up the trailing space left when the allocated buffer for a packet, or packets, is not fully filled. This allows the
microengine to skip the trailing space and to fetch the next packet.
Format
Ordinal
1
Field Name
[ HEADER ]
Header fields
Bit(s)
29:0
Field Name
reserved
© 2010 Advanced Micro Devices, Inc.
Proprietary
Description
31
Revision 1.5
31:30
6.1.4
TYPE
June 8, 2010
Packet identifier. It should be 2.
Type-3 Packet
Functionality
Carry out the operation indicated by field IT_OPCODE.
Format
Ordinal
1
2
Field Name
[ HEADER ]
{IT_BODY}
Header fields
Bit(s)
7:0
15:8
29:16
Field Name
Reserved
IT_OPCODE
COUNT
31:30
TYPE
Description
This field is undefined, and is set to zero by default.
Operation to be carried out. See section B.2 for details.
Number of DWORDs -1 in the information body. It is N-1 if the information body
contains N DWORDs.
Packet identifier. It should be 3.
Information Body
The information body IT_BODY will be described extensively in the following section.
© 2010 Advanced Micro Devices, Inc.
Proprietary
32
Revision 1.5
6.2
June 8, 2010
Definition of Type-3 packets
Type-3 packets has a common format in their headers. However, the size of their information body may vary
depending on the value of field IT_OPCODE. The size of the information body is indicated by field COUNT. If the
size of the information is N DWORDs, the value of COUNT is N-1. In the following packet definitions, we will
describe the field IT_BODY for each packet with respect to a given IT_OPCODE, and omit the header. The MSB
of the IT_OPCODE identifies whether this packet requires the GUI_CONTROL field (described later). A 1 in the
MSB of the IT_OPCODE indicates that GUI control is required. A 0 in the MSB of the IT_OPCODE indicates that
the GUI_CONTROL should be omitted.
© 2010 Advanced Micro Devices, Inc.
Proprietary
33
Revision 1.5
June 8, 2010
6.2.1 Summary of packets
Packet Name
NOP
PAINT
IT_OPCODE
0x10
0x91
Description
Skip N DWORDs to get to the next packet.
Paint a number of rectangles with a colour brush.
BITBLT
0x92
Copy a source rectangle to a destination rectangle.
HOSTDATA_BLT
0x94
POLYLINE
POLYSCANLINES
NEXTCHAR
0x95
0x98
0x19
PAINT_MULTI
0x9A
BITBLT_MULTI
0x9B
TRANS_BITBLT
PLY_NEXTSCAN
SET_SCISSORS
PRED_EXEC
COND_EXEC
WAIT_SEMAPHORE
WAIT_MEM
0x9C
0x1D
0x1E
0x20
0x21
0x22
0x23
3D_DRAW_VBUF
3D_DRAW_IMMD
3D_DRAW_INDX
0x28
0x29
0x2A
LOAD_PALETTE
3D_LOAD_VBPNTR
INDX_BUFFER
3D_DRAW_VBUF_2
0x2C
0x2F
0x33
0x34
3D_DRAW_IMMD_2
0x35
3D_DRAW_INDX_2
0x36
3D_CLEAR_HIZ
3D_DRAW_128
MPEG_INDEX
0x37
0x39
0x3A
Draw a string of large characters on the screen, or copy a
number of bitmaps to the video memory.
Draw a polyline (lines connected with their ends).
Draw polyscanlines or scanlines.
Print a character at a given screen location using the
default foreground and background colours.
Paint a number of rectangles on the screen with one
colour. The difference between this function and PAINT is
the representation of parameters.
Copy a number of source rectangles to destination
rectangles of the screen respectively.
2D transparent bitblt operation.
Draw polyscanlines using current settings.
Set up scissors.
Predicated execute wrapper for a sequence of packets
Conditional execute wrapper for a sequence of packets
Wait in the CP micro-engine for semaphore to be zero
Wait in the CP micro-engine for GPU-accessible memory
semaphore to be zero
Draw primitives using vertex buffer
Draw primitives using immediate vertices in this packet
Draw primitives using vertex buffer and indices in this
packet
Load a palette for 2D scaling.
Load pointers to vertex buffers
Load Indices Using Indirect Buffer #2
Same as 3D_DRAW_VBUF, but without
VAP_VTX_FMT
Same as 3D_DRAW_IMMD, but without
VAP_VTX_FMT
Same as 3D_DRAW_INDX, but without
VAP_VTX_FMT
Clear portion of the Hierarchal Z RAM
Draw packet to write to 128-bit VAP data port.
MPEG Packet Registers and Index Generation
6.2.2
2D Packets
The information body IT_BODY of 2-D packets may have the following format:
Ordinal
1
2
Field Name
{SETTINGS}
{DATA_BLOCK}
© 2010 Advanced Micro Devices, Inc.
Proprietary
34
Revision 1.5
June 8, 2010
SETTINGS
This field consists of 2 subfields, GUI_CONTROL and SETUP_BODY.
Ordinal
1
2

Field Name
[ GUI_CONTROL ]
{SETUP_BODY}
SETTINGS.GUI_CONTROL
This field will be used to setup the register DP_GUI_MASTER_CNTL, and it also decides the content of
SETTINGS.SETUP_BODY.
Bit(s)
0
Field Name
SRC_PITCH_OFF
1
DST_PITCH_OFF
2
SRC_CLIPPING
3
DST_CLIPPING
7:4
BRUSH_TYPE
11:8
DST_TYPE
{Not Used by uCode}
© 2010 Advanced Micro Devices, Inc.
Proprietary
Description
The bit controls the pitch and offset of the blitting source.
0:- Use the default pitch and offset, and no datum
[SRC_PITCH_OFFSET] is supplied in SETUP_BODY.
1:- Use the datum [SRC_PITCH_OFFSET] supplied in SETUP_BODY
to set up a new pitch offset.
The bit controls the pitch and offset of the blitting destination.
0:- Use the default pitch and offset, and no datum
[DST_PITCH_OFFSET] is supplied in SETUP_BODY.
1:- Use the datum [DST_PITCH_OFFSET] supplied in SETUP_BODY.
The pitch may mean the bitmap pitch and the offset may points the offscreen area of the video memory.
This bit controls the clipping parameters of the blitting source.
0:- Use the default clipping parameters, and no relevant clipping data
supplied in SETUP_BODY.
1:- Use datum [SRC_SC_BOT_RITE] supplied in SETUP_BODY to set
up the bottom and right edges of the clipping rectangle.
This bit controls the clipping parameters of the blitting destination.
0:- Use the default clipping parameters, and no relevant clipping data
supplied in SETUP_BODY.
1:- Use data [SC_TOP_LEFT] and [SC_BOTTOM_RIGHT] supplied in
SETUP_BODY to set up a new clipping rectangle.
Types of brush used in drawing. The type code determines how to supply
data to the subfield BRUSH_PACKET in SETUP_BODY. See detailed
definition of BRUSH_TYPE in the following.
The pixel type of the destination.
0--1 :- (reserved)
2 :- 8 bpp pseudocolor
3 :- 16 bpp aRGB 1555
4 :- 16 bpp RGB 565
5 :- reserved
6 :- 32 bpp aRGB 8888
7 :- 8 bpp RGB 332
8 :- Y8 greyscale
9 :- RGB8 greyscale (8 bit intensity, duplicated for all 3 channels. Green
channel is used on writes)
10 :- (reserved)
11 :- YUV 422 packed (VYUY)
12 :- YUV 422 packed (YVYU)
13 :- (reserved)
Status
7 through 15
not supported in
3D pipe
35
Revision 1.5
13:12
SRC_TYPE
{Not Used by uCode}
14
PIX_ORDER
{Not Used by uCode}
15
COLOR_CONVT
{Not Used by uCode}
23:16
WIN31_ROP
{Not Used by uCode}
26:24
SRC_LOAD
{Not Used by uCode}
27
SRC_TYPE
{Not Used by uCode}
28
GMC_CLR_CMP_FCN
_DIS
{Not Used by uCode}
Reserved
{Not Used by uCode}
GMC_WR_MSK_DIS
{Not Used by uCode}
29
30
© 2010 Advanced Micro Devices, Inc.
Proprietary
June 8, 2010
14 :- aYUV 444 (8:8:8:8)
15 :- aRGB4444 (intermediate format only. Not understood by the
Display Controller)
Note: choices 7-15 only valid in 3D mode.
The field indicates the pixel type of blitting source.
0:- The source data type is mono opaque, and the fore- and back-ground
colours need to be redefined.
1:- The source data type is mono transparent, and only the foreground
colour needs to be redefined.
2:- Reserved.
3:- The source pixel type is the same as that given in field DST_TYPE.
If bit 27 (SRC_TYPE) is one then the following new sources are
available:
4:- 4bpp source clut translation (May not be supported, value reserved)
5:- 8bpp source clut translation
6:- 32 bpp source clut translation (gamma correction)
7:- 64 bpp Obuffer blit
The bit decides the order of bits (or pixels) in DWORD to be consumed.
Only applicable to the monochrome mode.
0 :- Bits to be consumed from the Most Significant Bit (MSB) to the Least
Significant Bit (LSB).
1 :- Bits to be consumed from LSB to MSB.
Reserved
This field tells the GUI engine how the raster operation to be carried out.
The code of this field follows the ROP3 code defined by Microsoft. See
WIN31 DDK for reference.
The field indicates where the source data come from.
0,1 :- Reserved
2 :- loaded from the video memory (rectangular trajectory)
3 :- loaded through the HOSTDATA registers (linear trajectory)
4 :- loaded through the HOSTDATA registers (linear trajectory & bytealigned)
Note that during 3D/Scale Operations (whenever
SCALE_3D_FCN@MISC_3D_STATE_REG is non-zero), this field is
ignored and data is always loaded from the 3D/Scaler pipeline.
Third bit of SRC_TYPE
Not supported
in 2D pipe
Compatible 128
code must write
zero to this
register.
0 :- No change to CLR_CMP_FCN_SRC and CLR_CMP_FCN_DST
1 :- clear CLR_CMP_FCN_DST and CLR_CMP_FCN_SRC to 0
TBD
Reserved
Reserved
0 :- No Change to DP_WR_MSK/CLR_CMP_MSK
1 :- Set DP_WR_MSK/CLR_CMP_MSK to 0xffffffff
36
Revision 1.5
31

BRUSH_FLAG
June 8, 2010
This field indicates whether there is a field BRUSH_Y_X field in the
SETTINGS.SETUP_BODY.
0:- No such a field in SETTINGS.SETUP_BODY.
1:- There is a field in SETTINGS.SETUP_BODY.
SETTINGS.SETUP_BODY
This field may contain the following subfields. Their presence depends on the bits 0-7 of
SETTINGS.GUI_CONTROL.

Ordinal
1
Field Name
[SRC_PITCH_OFFSET]
2
[DST_PITCH_OFFSET]
3
[SRC_SC_BOT_RITE]
4
[SC_TOP_LEFT]
[SC_BOT_RITE]
5
{ BRUSH_PACKET }
6
[BRUSH_Y_X]
Description
Bit 30:
Select between untiled(0) and tiled (1)
Bit 31:
select between no microtiling(0) and microtiling(1)
Bits 29:22
Pitch in units of 64 bytes, 64 to 16384 bytes across
bits 21:0 Offset in units of 1KB, 0 to 4GB-1K
Bit 30:
Select between untiled(0) and tiled (1)
Bit 31:
select between no microtiling(0) and microtiling(1)
Bits 29:22
Pitch in units of 64 bytes, 64 to 16384 bytes across
bits 21:0
Offset in units of 1KB, 0 to 4GB-1K
The parameters are used to setup the clipping area of the source.
The implied coordinates of the top-left corner of the clipping
rectangle is the same as the source.
[13:0] :- x-coordinate of the right edge of the clipping rectangle (in
number of pixels).
[29:16] :- y-coordinate of the bottom edge of the clipping
rectangle (in number of scanlines).
The parameters are used to setup the clipping area of destination.
SC_TOP_LEFT:
[13:0] :- x-coordinate of the left edge of the clipping rectangle (in
number of pixels).
[29:16] :- y-coordinate of the top edge of the clipping rectangle (in
number of scanlines).
SC_BOT_RITE:
[13:0] :- x-coordinate of the right edge of the clipping rectangle (in
number of pixels).
[29:16] :- y-coordinate of the bottom edge of the clipping
rectangle (in number of scanlines).
The content of this field is determined by field
SETTINGS.GUI_CONTROL.BRUSH_TYPE. See the following
table for the possible content.
[4:0] :- x-coordinate for brush alignment.
[12:8] :- y-coordinate for brush alignment.
[20:16] :- Initial value used for BRUSH_X pointer in drawing
Lines. When POLY_LINE is off, it is reloaded from BRUSH_X
at the end of the line. When POLY_LINE is on, it is reloaded
from the current Brush pointer at the end of the line. Whenever
BRUSH_X is updated, the field should be written with the same
value.
SETTINGS.SETUP_BODY.BRUSH_PACKET
© 2010 Advanced Micro Devices, Inc.
Proprietary
37
Revision 1.5
June 8, 2010
Note that all but 6 and 7 are not available for lines, and 6 and 7 are only usable for lines.
BRUSH_TYPE Description of the brush
Packet size
0
A 8 x 8 mono pattern with the foreground
4 DWORDs
and background colours specified in the
packet. Here the matrix is represented in the
format column-by-row.
1
A 8 x 8 mono pattern with the foreground
3 DWORDs
colour specified in the packet and the
background colour the same as that of the
area to be painted.
2
Reserved
not applicable
Packet content
[BKGRD_COLOR]
[FRGRD_COLOR]
[MONO_BMP_1]
[MONO_BMP_2]
[FRGRD_COLOR]
[MONO_BMP_1]
[MONO_BMP_2]
3
Reserved
not applicable
4
Reserved
not applicable
5
Reserved
not applicable
6
A 32 x 1 mono pattern with the foreground
and background colours specified in the
packet. This pattern corresponds to the PEN
of Win95 DDK. And is only usable for lines.
A 32x1 mono pattern with the foreground
colour specified in the packet and the
background colour the same as that of the
area to be painted. This is PEN as well. And
is only usable for lines.
Removed, see 32x32 in 3D pipe
Removed, see 32x32 in 3D pipe
A 8x8 colour pattern. The pixel type is given
by field
SETTINGS.GUI_CONTROL.
DST_TYPE.
3 DWORDs
[BKGRD_COLOR]
[FRGRD_COLOR]
[MONO_BMP_1]
2 DWORDs
[FRGRD_COLOR]
[MONO_BMP_1]
7
8
9
10
11
Reserved
not applicable
not applicable
16* N DWORDs,
where N stands
for the number of
bytes per pixel
with exception
that a 24-BPP
pixel is still
represented by 4
bytes.
not applicable
12
Reserved
not applicable
13
Use the colour specified in the packet as the
solid (plain) colour for the brush, i.e. a colour
brush without pattern.
Use the colour specified in the packet as the
solid (plain) colour for the brush, i.e. a colour
brush without pattern.
No brush used.
1 DWORD
[FRGRD_COLOR]
1 DWORD
[FRGRD_COLOR]
14
15
© 2010 Advanced Micro Devices, Inc.
Proprietary
[COLOR_BMP_1]
[COLOR_BMP_2]
...
[COLOR_BMP_16*N]
0
38
Revision 1.5
June 8, 2010
Brush packet content
Field Name
[FRGRD_COLOR]
[BKGRD_COLOR]
[MONO_BMP_x]
[COLOR_BMP_x]
Description
The foreground colour of the text in the RGBQUAD format.
bits [7:0] :- intensity of Blue;
bits [15:8] :- intensity of Green; and
bits [23:16] :- intensity of Red.
bits [31:25] :- reserved.
The background colour of the text in the RGBQUAD format.
bits [7:0] :- intensity of Blue;
bits [15:8] :- intensity of Green; and
bits [23:16] :- intensity of Red.
bits [31:25] :- reserved.
Raster data of monochrome pixels. One bit represents one pixel. If the
number of pixels for the field is less than 32, the pixels take the lower bits.
The remaining bits should be filled with 0‟s.
Raster data of colour pixels. The representation depends on the pixel type.
DATA_BLOCK
The composition of this field depends on the operation code IT_OPCODE given in the header. Section B.2 gives
details of DATA_BLOCK with respect to IT_OPCODE. In the following, the field SETTINGS may appear in the
definition of a packet, but will not be described further.
6.2.2.1 NOP
Functionality
Skip a number of DWORDs to get to the next packet.
Format
Ordinal
1
2
Field Name
[ HEADER ]
{DATA_BLOCK}
DATA_BLOCK
This field may consist of a number of DWORDs, and the content may be anything.
6.2.2.2 PAINT
Functionality
Paint a number of rectangles with a colour brush.
Format
Ordinal
1
2
3
Field Name
[ HEADER ]
{SETTINGS}
{DATA_BLOCK}
© 2010 Advanced Micro Devices, Inc.
Proprietary
39
Revision 1.5
June 8, 2010
DATA_BLOCK
Ordinal
1
Field Name
[TOP_1 | LEFT_1]
2
[BOTM_1| RITE_1]
...
2n-1
2n
Description
The coordinates of the top-left corner of the 1st rectangle to be painted.
LEFT_1: [15:0]:- x-coordinate, ranging from -8192 to 8191. Bits 14 and 15
should be copies of bit 13.
TOP_1: [31:16]:- y-coordinate, ranging from -8192 to 8191. Bits 30 and 31
should be copies of bit 29.
The coordinates of the bottom-right corner of the 1st rectangle to be painted.
RITE_1: [15:0]:- x-coordinate, ranging from -8192 to 8191. Bits 14 and 15
should be copies of bit 13.
BOTM_1: [31:16]:- y-coordinate, ranging from -8192 to 8191. Bits 30 and
31 should be copies of bit 29.
[TOP_n| LEFT_n]
The coordinates of the top-left corner of the n-th rectangle to be painted.
[BOTM_n| RITE_n]
The coordinates of the bottom-right corner of the n-th rectangle to be
painted.
6.2.2.3 HOSTDATA_BLT
Functionality
Copy a number of bit-packed bitmaps to the video memory. It can be used to print a string of large characters on the
screen. In other words, the function supports the LARGEBITGLYPH structure of Windows95 DDK.
Format
Ordinal
1
2
3
Field Name
[ HEADER ]
{SETTINGS}
{DATA_BLOCK}
DATA_BLOCK
Ordinal
1
Field Name
[FRGD_COLOUR]
2
[BKGD_COLOUR]
3
...
m+2

{BIGCHAR_1}
Description
Foreground colour in the RGBQUAD format. For mono-to colour expansion
only. The field is ineffective if field SRC_TYPE at
SETTINGS.GUI_CONTROL is set to a type other than mono opaque or
mono transparent (0 or 1 ).
Background colour in the RGBQUAD format. For mono-to colour
expansion only. The field is ineffective if field SRC_TYPE at
SETTINGS.GUI_CONTROL is set to a type other than mono opaque or
mono transparent (0 or 1).
Data block of the 1st character.
{BIGCHAR _m}
Data block of the m-th character.
DATA_BLOCK.BIGCHAR_x
Ordinal
1
Field Name
[BaseY | BaseX]
© 2010 Advanced Micro Devices, Inc.
Proprietary
Description
The coordinate of the top-left corner of the character‟s bitmap.
40
Revision 1.5
[RASTER_1]
BaseX: [15:0] :- x-coordinate.
BaseY: [31:16] :- y-coordinate.
The geometry of the bitmap.
WIDTH: [15:0] :- width of the bitmap.
HEIGHT: [31:16] :- height of the bitmap.
The number of DWORDs in the bitmap. It should be m in this case. The max
value is 0x3FFF.
The 1st DWORD of the mono bitmap data.
[RASTER_m ]
The m-th DWORD of the mono bitmap data.
2
[HEIGHT | WIDTH]
3
[ NUMBER[13:0] ]
4
...
m+3
June 8, 2010
6.2.2.4 POLYLINE
Functionality
( x0 , y0 ) , ( x1 , y1 ) , ..., ( xn , yn ) , where coordinate ( x0 , y0 ) is the
beginning of the polyline, and coordinate ( xn , yn ) is the end.
Draw a polyline specified by a set of coordinates
Format
Ordinal
1
2
3
Field Name
[ HEADER ]
{SETTINGS}
{DATA_BLOCK}
DATA_BLOCK
Ordinal
1
2
...
n+1
Field Name
[Y0 | X0]
[Y1 | X1]
[Yn | Xn]
Description
The starting coordinate of the polyline.
X0: [15:0] :- x-component of the coordinate. Y0: [31:16]:- y-component.
The 2nd coordinate of the polyline. Definition of bits is the same as above.
The ending coordinate of the polyline. Definition of bits is the same as
above.
6.2.2.5 POLYSCANLINES
Functionality
Draw a number of scanlines and polyscanlines. The number can be one. The difference between a scanline and a
polyscanline is that a scanline has only one starting x-coordinate and one ending x-coordinate while a polyscanline
has a number of starting-ending x-coordinate pairs.
Format
Ordinal
1
2
Field Name
[ HEADER ]
{SETTINGS}
© 2010 Advanced Micro Devices, Inc.
Proprietary
41
Revision 1.5
3
June 8, 2010
{DATA_BLOCK}
DATA_BLOCK
Ordinal
1
2
...
n+1

Field Name
[SCAN_COUNT]
{ SCAN_1 }
Description
The number of scan subpackets identified by SCAN_x, where x denotes the
ordinal number of a SCAN subpacket.
The 1st scanline/polyscanline.
{ SCAN_n }
The n-th scanline/polyscanline.
DATA_BLOCK.SCAN_x
Ordinal
1
2
Field Name
[ NUM_LINE[13:0] ]
[HEIGHT | TOP ]
3
[END_1 | START_1]
...
n+2
[END_n |START_n]
Description
The number of line segments in a polyscanline. Maximum is 0x3fff.
TOP: [15:0] :- y-coordinate of the polyscanline.
HEIGHT: [31:16] :- The thickness of the line measured in pixels.
START_1: [15:0] :- the starting x-coordinate of the 1st line segment.
END_1: [31:16]:- the ending x-coordinate of the 1st line segment.
START_n: [15:0] :- the starting x-coordinate of the n-th line segment.
END_n: [31:16]:- the ending x-coordinate of the n-th line segment.
6.2.2.6 NEXTCHAR
Functionality
Print a character at a given screen location using the default foreground and background colours.
Format
Ordinal
1
2
Field Name
[ HEADER ]
{DATA_BLOCK}
DATA_BLOCK
Ordinal
1
Field Name
[DST_Y | DST_X]
2
[DST_H | DST_W]
3
...
[BITMAP_DATA_1]
© 2010 Advanced Micro Devices, Inc.
Proprietary
Description
The coordinates of the top-left corner of the destination bitmap.
DST_X: [15:0]:- x-coordinate, ranging from -8192 to 8191. Bits 14 and 15
should be copies of bit 13.
DST_Y: [31:16]:- y-coordinate, ranging from -8192 to 8191. Bits 30 and 31
should be copies of bit 29.
The width and height of the destination bitmap, expressed in unsigned
integers.
DST_W: [15:0]:- width. DST_H [31:16]:- height.
The 1st DWORD of the bitmap data.
42
Revision 1.5
N+2
[BITMAP_DATA_n]
June 8, 2010
The n-th DWORD of the bitmap data.
6.2.2.7 PAINT_MULTI
Functionality
Paint a number of rectangles on the screen with one colour. The colour used is specified in field SETTINGS while
the location and geometry of the rectangles are specified in field DATA_BLOCK.
Format
Ordinal
1
2
3
Field Name
[ HEADER ]
{SETTINGS}
{DATA_BLOCK}
DATA_BLOCK
Ordinal
1
Field Name
[DST_X1 | DST_Y1]
2
[DST_W1 | DST_H1]
...
2n-1
[DST_Xn | DST_Yn]
2n
[DST_Wn | DST_Hn]
Description
The coordinates of the top-left corner of the 1st rectangle.
DST_Y1: [15:0]:- y-coordinate, ranging from -8192 to 8191. Bits 14 and 15
should be copies of bit 13.
DST_X1: [31:16]:- x-coordinate, ranging from -8192 to 8191. Bits 30 and
31 should be copies of bit 29.
The width and height of the 1st rectangle, expressed in unsigned integers.
DST_H1: [15:0]:- height.
DST_W1: [31:16]:- width.
The coordinates of the top-left corner of the n-th rectangle.
DST_Yn: [15:0]:- y-coordinate, ranging from -8192 to 8191. Bits 14 and 15
should be copies of bit 13.
DST_Xn: [31:16]:- x-coordinate, ranging from -8192 to 8191. Bits 30 and
31 should be copies of bit 29.
The width and height of the n-th rectangle, expressed in unsigned integers.
DST_Hn: [15:0]:- height.
DST_Wn: [31:16]:- width.
6.2.2.8 BITBLT
Functionality
Copy a source rectangle to a destination rectangle of the screen. It is assumed that the geometry of the destination is
identical to its source.
Format
Ordinal
1
2
Field Name
[ HEADER ]
{SETTINGS}
© 2010 Advanced Micro Devices, Inc.
Proprietary
43
Revision 1.5
3
June 8, 2010
{DATA_BLOCK}
DATA_BLOCK
Ordinal
1
Field Name
[SRC_X1 | SRC_Y1]
2
[DST_X1 | DST_Y1]
3
[SRC_W1| SRC_H1]
Description
The coordinates of the top-left corner of the 1st source bitmap.
SRC_Y1: [15:0]:- y-coordinate, ranging from -8192 to 8191. Bits 14 and 15
should be copies of bit 13.
SRC_X1: [31:16]:- x-coordinate, ranging from -8192 to 8191. Bits 30 and
31 should be copies of bit 29.
The coordinates of the top-left corner of the 1st destination.
The definition of bits is the same as SRC_X1 and SRC_Y1.
The width and height of the 1st source bitmap, expressed in unsigned
integers.
SRC_H1: [13:0]:- height.
SRC_W1: [29:16]:- width.
6.2.2.9 BITBLT_MULTI
Functionality
Copy a number of source rectangles to destination rectangles of the screen respectively. It is assumed that the
geometry of the destination is identical to its source.
Format
Ordinal
1
2
3
Field Name
[ HEADER ]
{SETTINGS}
{DATA_BLOCK}
DATA_BLOCK
Ordinal
1
Field Name
[SRC_X1 | SRC_Y1]
2
[DST_X1 | DST_Y1]
3
[SRC_W1| SRC_H1]
Description
The coordinates of the top-left corner of the 1st source bitmap.
SRC_Y1: [15:0]:- y-coordinate, ranging from -8192 to 8191. Bits 14 and 15
should be copies of bit 13.
SRC_X1: [31:16]:- x-coordinate, ranging from -8192 to 8191. Bits 30 and
31 should be copies of bit 29.
The coordinates of the top-left corner of the 1st destination.
The definition of bits is the same as SRC_X1 and SRC_Y1.
The width and height of the 1st source bitmap, expressed in unsigned
integers.
SRC_H1: [13:0]:- height.
SRC_W1: [29:16]:- width.
...
© 2010 Advanced Micro Devices, Inc.
Proprietary
44
Revision 1.5
3n-1
[SRC_Xn | SRC_Yn]
3n-2
[DST_Xn | DST_Yn]
3n
[SRC_Wn| SRC_Hn]
June 8, 2010
The coordinates of the top-left corner of the n-th source bitmap.
SRC_Yn: [15:0]:- y-coordinate, ranging from -8192 to 8191. Bits 14 and 15
should be copies of bit 13.
SRC_Xn: [31:16]:- x-coordinate, ranging from -8192 to 8191. Bits 30 and
31 should be copies of bit 29.
The coordinates of the top-left corner of the n-th destination.
The definition of bits is the same as SRC_Xn and SRC_Yn.
The width and height of the n-th source bitmap, expressed in unsigned
integers.
SRC_Hn: [13:0]:- height.
SRC_Wn: [29:16]:- width.
6.2.2.10 TRANS_BITBLT
Functionality
Copy pixels from the source rectangle to the destination with transparency.
Format
Ordinal
1
2
3
Field Name
[ HEADER ]
{SETTINGS}
{DATA_BLOCK}
DATA_BLOCK
Ordinal
1

Field Name
[CLR_CMP_ CNTL]
2
[SRC_REF_CLR]
3
[DST_REF_CLR]
4
[SRC_X1 | SRC_Y1]
5
[DST_X1 | DST_Y1]
6
[SRC_W1| SRC_H1]
Description
This field decides how the transparent blitting is done. See following for
details.
Source reference colour in the RGBQUAD format. This is the colour to be
stripped off from the source.
Destination reference colour in the RGBQUAD format. This is the colour to
be preserved at the destination.
The coordinates of the top-left corner of the 1st source bitmap.
SRC_Y1: [15:0]:- y-coordinate, ranging from -8192 to 8191. Bits 14 and 15
should be copies of bit 13.
SRC_X1: [31:16]:- x-coordinate, ranging from -8192 to 8191. Bits 30 and
31 should be copies of bit 29.
The coordinates of the top-left corner of the 1st destination.
The definition of bits is the same as SRC_X1 and SRC_Y1.
The width and height of the 1st source bitmap, expressed in unsigned
integers.
SRC_H1: [13:0]:- height.
SRC_W1: [29:16]:- width.
DATA_BLOCK.CLR_CMP_CNTL
This field controls how the source pixels are written to the destination, depending on the source and destination
reference colours and comparison settings. The source pixels may be filtered against the source reference colour,
and the destination pixels with a specific colour may be preserved according to field CLR_CMP_DST.
Bit(s)
2:0
Bit-Field Name
CLR_CMP_SRC
© 2010 Advanced Micro Devices, Inc.
Proprietary
Description
Strip off the source reference colour from the source pixels.
45
Revision 1.5
June 8, 2010
0 :- Do not strip off source pixels. All source pixels are written to the destination.
1 :- Block the blitting source. No source pixel is written to the destination.
2, 3 :- reserved.
4 :- The source pixels whose colour is equal to the reference colour are written to the
destination.
5 :- The source pixels whose colour is NOT equal to the reference colour are written
to the destination.
6 :- Reserved.
7 :- The source pixels whose colour is equal to the reference colour will be XORed
with the foreground colour of a mono bitmap, and then written to the destination.
That is, destPixel = srcPixel XOR foregrndColor if srcPixel is equal to the
foreground colour of a mono bitmap, specifically text. This is referred to as flipping
sometimes.
7:3
10:8
Reserved
CLR_CMP_DST
23:11
25:24
Reserved
CMP_ENABLE
31:26
Reserved
Preserve pixels at the destination.
0 :- Do not preserve the destination pixels. All pixels from the source are written to
the destination.
1 :- Preserve all the destination pixels. No source pixel is written to the destination.
2, 3 :- Reserved.
4 :- The destination pixels whose colour is equal to the reference colour are
preserved. No source pixel is written on top of the pixels.
5 :- The destination pixels whose colour is NOT equal to the reference colour are
preserved.
6, 7 :- Reserved.
The bits controls what type of operation to be carried out.
0 :- Enable function CLR_CMP_DST.
1 :- Enable function CLR_CMP_SRC
2 :- Enable both CLR_CMP_SRC and CLR_CMP_DST. The final decision is based
on the agreement between decisions made separately.
3 :- Reserved.
6.2.2.11 PLY_NEXTSCAN
Functionality
Draw a number of scanlines or polyscanlines using the current settings.
Format
Ordinal
1
2
Field Name
[HEADER]
[HEIGHT | TOP ]
3
[END_1 | START_1]
...
n+2
[END_n |START_n]
© 2010 Advanced Micro Devices, Inc.
Proprietary
Description
The packet header
TOP: [15:0] :- y-coordinate of the scanline/polyscanline.
HEIGHT: [31:16] :- The thickness of the line measured in pixels.
START_1: [15:0] :- the starting x-coordinate of the 1st dash.
END_1: [31:16]:- the ending x-coordinate of the 1st dash.
START_n: [15:0] :- the starting x-coordinate of the 1st dash.
END_n: [31:16]:- the ending x-coordinate of the 1st dash.
46
Revision 1.5
June 8, 2010
6.2.2.12 LOAD_PALETTE
Functionality
Set up the 3D engine scaler and load a palette for a consequent 2D scaling operation.
Format
Ordinal
1
2
Field Name
[HEADER]
[SCALE_DATATYPE ]
3
[ COLOR_1]
4
...
n+2
[ COLOR_2]
Description
The packet header
1:- The palette has 16 entries (4 bpp palette).
2:- The palette has 256 entries (8 bpp palette).
The 1st entry of the palette.
Data is in destination format (i.e. ARGB8888, RGB565, RGB555,…)
The 2nd entry of the palette. Bits are defined as above.
[ COLOR_n]
The n-th entry of the palette. n = 16 (4bpp) or 256 (8bpp)
6.2.2.13 SET_SCISSORS
Functionality
Set the scissors to the given parameters.
Format
Ordinal
1
2
3
Field Name
[HEADER]
[TOP_LEFT]
[ BOTTOM_RIGHT]
© 2010 Advanced Micro Devices, Inc.
Proprietary
Description
The packet header
[13:0] :- x-coordinate of the left edge of the clipping rectangle (in number of
pixels).
[29:16] :- y-coordinate of the top edge of the clipping rectangle (in number
of scanlines).
[13:0] :- x-coordinate of the right edge of the clipping rectangle (in number
of pixels).
[29:16] :- y-coordinate of the bottom edge of the clipping rectangle (in
number of scanlines).
47
Revision 1.5
June 8, 2010
6.2.3 3D Packets
6.2.3.1 3D_DRAW_VBUF
Functionality
Draws a set of primitives using a vertex buffer(s) pointed to by state data.
Format
Ordinal
1
2
3
Field Name
[ HEADER ]
[VAP_VTX_FMT]
[VAP_VF_CNTL]
Description
Header of the packet
** Not Written to Hardware, Microcode Throws Away **
Primitive type and other control (See VAP_VF_CNTL register in register spec)
Number of Vertices is bits: 31:16
6.2.3.2 3D_DRAW_IMMD
Functionality
Draws a set of primitives using vertices stored in packet.
Format
Ordinal
1
2
3
Field Name
[ HEADER ]
[VAP_VTX_FMT]
[VAP_VF_CNTL]
Description
Header of the packet
** Not Written to Hardware, Microcode Throws Away **
4 to end
Vertex data
Up to 16,380 DWORDs of vertex data.
© 2010 Advanced Micro Devices, Inc.
Proprietary
Primitive type and other control (See VAP_VF_CNTL register in register spec)
Number of Vertices is bits: 31:16
48
Revision 1.5
June 8, 2010
6.2.3.3 3D_DRAW_INDX
Functionality
Draws a set of primitives using a vertex buffer(s) pointed to by state data, index from indices in packet. Indices are
either 16-bit or 32-bit.
Format
Ordinal
1
2
3
Field Name
4 to end
[indx16 #2 | indx16 #1]
or [indx32]
[ HEADER ]
[VAP_VTX_FMT]
[VAP_VF_CNTL]
Description
Header of the packet
** Not Written to Hardware, Microcode Throws Away **
Primitive type and other control (See VAP_VF_CNTL register in register spec)
Number of Vertices is bits: 31:16
Up to or 32,760 16-bit indices or 16,380 32-bit indices to vertex data pointed
to by state registers. The INDEX_SIZE field in the VAP_VF_CNTL register
indicates whether the indices are 16-bit or 32-bit. See INDX_BUFFER
packet for support of more indices.
6.2.3.4 3D_DRAW_VBUF_2
Functionality
Draws a set of primitives using a vertex buffer(s) pointed to by state data.
Format
Ordinal
1
2
Field Name
[ HEADER ]
[VAP_VF_CNTL]
© 2010 Advanced Micro Devices, Inc.
Proprietary
Description
Header of the packet
Primitive type and other control (See VAP_VF_CNTL register in register spec)
Number of Vertices is bits: 31:16
49
Revision 1.5
June 8, 2010
6.2.3.5 3D_DRAW_IMMD_2
Functionality
Draws a set of primitives using vertices stored in packet.
Format
Ordinal
1
2
Field Name
[ HEADER ]
[VAP_VF_CNTL]
Header of the packet
Description
3 to end
Vertex data
Up to 16,381 DWORDs of vertex data
Primitive type and other control (See VAP_VF_CNTL register in register spec)
Number of Vertices is bits: 31:16
6.2.3.6 3D_DRAW_INDX_2
Functionality
Draws a set of primitives using a vertex buffer(s) pointed to by state data, index from indices in packet.
Format
Ordinal
1
2
Field Name
[ HEADER ]
[VAP_VF_CNTL]
Header of the packet
3 to end
[indx16 #2 | indx16 #1]
or [indx32 #1]
Up to or 32762 16-bit indices or 16,381 32-bit indices to vertex data pointed
to by state registers. The INDEX_SIZE field in the VAP_VF_CNTL register
indicates whether the indices are 16-bit or 32-bit. See INDX_BUFFER
packet for support of more indices.
© 2010 Advanced Micro Devices, Inc.
Proprietary
Description
Primitive type and other control (See VAP_VF_CNTL register in register spec)
Number of Vertices is bits: 31:16
50
Revision 1.5
June 8, 2010
6.2.3.7 3D_DRAW_128
Functionality
Draws a set of primitives using a vertex buffer(s) pointed to by state data, index from indices in packet. Data/Indices
are written to 128-bit VAP vector data port to take advantage of the 128-bit data path for sending data. The packet
should only be used in bus master mode.
Vector mode operates as follows:
1.
Data will be written to the destination register (VAP_POR_DATA_IDX_128) one DWORD at a time until
the source address of the data is aligned to a vector (128 bits).
2.
Once aligned, the data will be written 128-bits per clock to the destination register. The CP does grouping
of the data such that it will wait until a full vector is available if the MC is slow in returning the data that
was requested.
3.
If the last DWORDs of a packet do not fill a vector, they will still be written in one clock, but the DWORD
write mask will be set accordingly.
Format
Ordinal
1
2
Field Name
[ HEADER ]
[VAP_VF_CNTL]
Header of the packet
Description
3 to end
Data or Indices
See other 3D_DRAW packets for details.
Primitive type and other control (See VAP_VF_CNTL register in register spec)
Number of Vertices is bits: 31:16
6.2.3.8
© 2010 Advanced Micro Devices, Inc.
Proprietary
51
Revision 1.5
June 8, 2010
6.2.3.9 3D_LOAD_VBPNTR
Functionality
Load the vertex arrays pointers.
Format
Ordinal
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Field Name
[ HEADER ]
VTX_NUM_ARRAYS
VTX_AOS_ATTR01
VTX_AOS_ADDR0
VTX_AOS_ADDR1
VTX_AOS_ATTR23
VTX_AOS_ADDR2
VTX_AOS_ADDR3
VTX_AOS_ATTR45
VTX_AOS_ADDR4
VTX_AOS_ADDR5
VTX_AOS_ATTR67
VTX_AOS_ADDR6
VTX_AOS_ADDR7
VTX_AOS_ATTR89
VTX_AOS_ADDR8
VTX_AOS_ADDR9
VTX_AOS_ATTR1011
VTX_AOS_ADDR10
VTX_AOS_ADDR11
Description
Header of the packet
Number of arrays
Control for the first two arrays
Pointer to first array
Pointer to second array
And so on….
6.2.3.10 3D_CLEAR_HIZ
Functionality
Clear HIZ RAM.
Format
Ordinal
1
2
3
4
Field Name
[ HEADER ]
START
COUNT[13:0]
CLEAR_VALUE
Description
Header of the packet
Start
Count[13:0] – Maximum is 0x3FFF.
The value to write into the HIZ RAM.
6.2.3.11 INDX_BUFFER
Functionality
Initiates Indirect Buffer #2 (IB #2) to fetch data that is written to the destination address. The main reason for this
packet is to fetch indices from an index buffer. The packet however can be used to fetch any type of data and write it
to destination address(s) in the chip.
To process an index buffer, first issue a 3D_DRAW_INDX packet with only the VAP_VTX_FMT and
VAP_VF_CNTL DWORDs (i.e. count = 1). Then process an INDX_BUFFER packet to supply the indices that
would have otherwise been in the 3D_DRAW_INDX packet. Note: For a 3D_DRAW_INDX_2 packet, the
VAP_VTX_FMT is not present and the count in the header should be zero.
© 2010 Advanced Micro Devices, Inc.
Proprietary
52
Revision 1.5
June 8, 2010
The maximum size of the Indirect #2 Buffer is 8,192K DWORDs – as determined by the BUFFER_SIZE field. So
the maximum number of indices supported is 8,192K 32-bit or 16,384K 16-bit indices. These maximums may be
further limited by the design of the Vertex Fetcher/Vertex Cache. See the VAP specification for details.
Format
Ordinal
1
2
3
4
Field Name
[ HEADER ]
[ONE_REG_WR |
SKIP_COUNT |
DESTINATION]
BUFFER_BASE[31:2]
BUFFER_SIZE[22:0]
© 2010 Advanced Micro Devices, Inc.
Proprietary
Description
Header of the packet
ONE_REG_WR – Bit 31 (Set for upper-word-aligned buffers)
SKIP_COUNT – Bits 18:16: Number of DWORDs to discard at start of data buffer
DESTINATION Address – Bits 12:0
Base Address of Buffer – Written to CP_IB2_BASE
Size of Buffer in DWORDs – Written to CP_IB2_BUFSZ to initiate the Indirect
Buffer #2. Note that the (BUFFER_SIZE – 1) also overwrites the CNT register in
the micro engine so that the parser will not finish with this packet until all the data
from the IB #2 is transferred. For misaligned data, this number must be increased by
1.
53
Revision 1.5
June 8, 2010
6.2.3.12 MPEG_INDEX
Functionality
Packed register writes for MPEG and Generation of Indices.
Format
1
Ordinal
Field Name
[ HEADER ]
2
[MASK]
Description
Header field of the packet.
DWORD write Mask: Bits 15:0 are “present” bits to indicate whether to
write the register:
bit[0]
VAP_PVS_CODE_CNTL_0 present
bit[1]
VAP_PVS_CODE_CNTL_1 present
bit[2]
bit[3]
VAP_PROG_STREAM_CNTL_0 present
VAP_PROG_STREAM_CNTL_1 present
bit[4]
VAP_PROG_STREAM_CNTL_2 present
bit[5]
VAP_PROG_STREAM_CNTL_3 present
bit[6]
VAP_OUT_VTX_FMT_0 present
bit[7]
VAP_OUT_VTX_FMT_1 present
bit[8]
VAP_VTX_NUM_ARRAYS present
bit[9]
RS_COUNT present
bit[10] RS_INST_COUNT present
bit[11] TX_ENABLE present
bit[12] US_CODE_ADDR_0 present
bit[13] US_CODE_ADDR_1 present
bit[14] US_CODE_ADDR_2 present
bit[15] US_CODE_ADDR_3 present
bit[16] US_CONFIG present
bit[17] RB3D_DSTCACHE_CTLSTAT present
bit[18] RB3D_COLOROFFSET0 present
bit[19] RB3D_COLORPITCH0 present
Conditional
[Register Values]
© 2010 Advanced Micro Devices, Inc.
Proprietary
Values to Write into Registers. Only present in packet if corresponding
54
Revision 1.5
June 8, 2010
“present” bit is set in the MASK.
3 up to 22
Next
[VF_CNTL]
Written Unconditional to VAP_VF_CNTL register
Next+1
[NUM_INDICES]
Number of Index Base Values (0x3FFF Maximum)
Next+2 to
Next+2+
[FIRST_INDEX]
First Index of Quad. (0x0000 to 0xFFFC)
For each “First Index”, CP will generate the other 3 indices and output:
NUM_INDIC
ES
FIRST_INDEX
FIRST_INDEX+1
FIRST_INDEX+2
FIRST_INDEX+3
Last Values
[DUMMY]
© 2010 Advanced Micro Devices, Inc.
Proprietary
Any value is fine. Any number of dummy values are supported.
55
Revision 1.5
6.2.4
June 8, 2010
PRED_EXEC
Functionality
Perform a predicated execution of a sequence of packets (type 0, 2, and type 3) on select devices.
Format
Ordinal
1
2
Field Name
[ HEADER ]
[DEVICE_SELECT |
EXEC_COUNT]
Description
Header field of the packet.
DEVICE_SELECT: [31:24] – bitfield to select one or more device upon
which the subsequent predicated packets will be executed
EXEC_COUNT: [22:0] – total number of DWORDs of subsequent
predicated packets. This count wraps the packets that will be predicated
by the device select.
6.2.4.1 WAIT_SEMAPHORE
Functionality
Wait for a semaphore to be zero before continuing to process the subsequent command stream. There are four
microcode ram slots set aside for use as semaphores. These are at offset 0xFC-0xFF.
Notes
The driver/application executing on the CPU can write non-zero values at any time to semaphore memory. The
application can write a non-zero value to cause the CP micro-engine to pause at the next WAIT_SEMAPHORE
packet in the command stream. This has the affect of pausing all GPU rendering that is queued in the indirect and
ring buffers. The application can then write a zero to the semaphore to allow the micro-engine to proceed.
The application can write to the semaphore memory by a direct (PIO) register write to two registers:
1.
Write the semaphore offset (0xFC, 0xFD, 0xFE, or 0xFF) to the CP_ME_RAM_ADDR register.
2.
Write the semaphore value (zero or non-zero) to the CP_ME_RAM_DATAL register.
Format
Ordinal
1
2
Field Name
[ HEADER ]
Semaphore offset
3
Semaphore reset
© 2010 Advanced Micro Devices, Inc.
Proprietary
Description
Header field of the packet.
This is the desired semaphore to test in the wait loop. This can be any one of
0xFC, 0xFD, 0xFE, 0xFF.
Optional. This value, if present, is written to the semaphore offset once the
wait loop has been satisfied (i.e., once the semaphore is zero).
56
Revision 1.5
6.2.5
June 8, 2010
Miscellaneous Packets
6.2.5.1 COND_EXEC
Functionality
Perform a conditional execution of a sequence of packets (type 0, 2, and type 3) based on a boolean stored in GPUaccessible video memory.
This packet use the Indirect Buffer #2 (IB2) to read the boolean in memory. Therefore, this packet can not be
initiated from an IB2.
Notes
Care must be taken to make certain that EXEC_COUNT contains the exact number of DWORDs for the subsequent
packets that are to be conditionally executed. The microengine will start parsing the DWORD immediately
following EXEC_COUNT DWORDs. If this is not a packet header, the device will encounter corruption or hang.
Format
Ordinal
1
2
3
Field Name
[ HEADER ]
TWO
EXEC_COUNT
Description
Header field of the packet.
This value must be 2
EXEC_COUNT: [22:0] – total number of DWORDs of subsequent
conditional packets. This count wraps the packets that will be
conditionally executed.
6.2.5.2 WAIT_MEM
Functionality
Wait for a GPU-accessible memory semaphore to be zero before continuing to process the subsequent command
stream. The semaphore can reside in any GPU-accessible memory (local or non-local). The base address of the
semaphore must be aligned to a DWORD boundary. The semaphore in memory consists of two DWORDs.
This packet has no ability to increment, decrement or otherwise change the contents of the memory semaphore.
The memory semaphore consists of two DWORDs: the actual semaphore and an extra DWORD with a fixed value
of two. The extra DWORD is required and guarantees that the command processor micro-engine can loop properly
in order to repeatedly test the semaphore value as necessary. The semaphore is organized as follows:
Semaphore value
Fixed value of 2
This packet use the Indirect Buffer #2 (IB2) to read the memory semaphore. Therefore, this packet can not be
initiated from an IB2.
Notes
If both ordinal 3 (SEM_LEN) and the DWORD in memory following the semaphore value is not equal to two, the
CP micro-engine will become confused and ultimately hang the hardware.
The driver/application executing on the CPU can write non-zero values at any time to semaphore memory. The
application can write a non-zero value to cause the CP micro-engine to pause at the next WAIT_MEM packet in the
© 2010 Advanced Micro Devices, Inc.
Proprietary
57
Revision 1.5
June 8, 2010
command stream. This has the affect of pausing all GPU rendering that is queued in the indirect and ring buffers.
The application can then write a zero to the semaphore to allow the micro-engine to proceed.
Format
Ordinal
1
2
3
Field Name
[ HEADER ]
SEM_ADDR[31:2]
SEM_LEN
© 2010 Advanced Micro Devices, Inc.
Proprietary
Description
Header field of the packet.
Memory semaphore device address (DWORD aligned)
This value is written to the CP_IB2_BASE in order to read the semaphore
Memory semaphore length
This value MUST be 2
This value is written to the CP_IB2_BUFSIZ in order to read the semaphore
the first time
58
Revision 1.5
June 8, 2010
7. Vertex Shaders
7.1 Introduction
The VAP includes the Vertex Fetcher and Vertex Cache which take commands and vertex data from a command
stream and formats it into vertices and primitives. Typically, the commands are stored in a ring buffer and the
vertex data is stored as a separate array in memory, although there are other possibilities described later. The VAP
begins operation when a command to render a set of primitives is received. Depending on the command, the VAP
will either expect vertex data to be sent, or it will perform the memory accesses to read the vertex data on its own.
The format of the vertex data is described later in this section.
The VAP includes a Programmable Vertex Shader (PVS) Engine which performs programmable operations on
vertices which are then subsequently assembled and clipped. This programmable processing path will also be used
to perform all Fixed-Function vertex processing after driver generation of a shader from fixed-function state
settings.
The VAP includes a Clip Engine which will clip primitives (using the PVS-processed vertices) to the 6 frustum
planes as well as to 6 User-Defined Clip Planes. The VAP includes a Viewport Transform Engine (VTE) which
performs the perspective divide and viewport transformation operations on the vertex data and a Reciprocal Engine
(RCP) which performs an IEEE 23-bit mantissa accurate 1/X function.
7.2 Input
The input to the VAP is a Command Packet which contains two parts: a command to render some set of primitives
(like a list of triangles), and a set of vertex data. As described later, the vertex data may be sent to the VAP or the
Vertex Fetcher may fetch the data. There are a number of different data formats which are possible. Data may be
stored as an array of structures (AOS), a structure of arrays (SOA), or in a strided vertex format. The AOS mode is
what has been used up to DX6. In AOS mode, all of the data for a vertex is stored sequentially as one contiguous
block of memory as shown in Figure. In SOA mode, the data for each parameter (like x or w) is stored as a separate
array. To get all of the data for a vertex, one must look into several different arrays. For example, assume that we
have eight vertices which have the parameters X, Y, W, S, and T. In SOA mode the data would be stored in five
different arrays as shown in Figure. In the strided vertex format, data is stored in several different arrays. Each
array holds a variable number of parameters. For example, the first array might hold the x, y, and z coordinates. A
second array might hold the diffuse color, a third array might hold the S and T coordinates for a texture map. Figur
shows how a strided vertex with x, y, z, w, S, and T might be stored. The holes in the xyz array are not required but
are shown to indicate the flexibility allowed with the strided vertex format.
dword
Base Address 0
0
1
2
3
4
X0
X1
X2
X3
Y0
Y1
Y2
Y3
W0
W1
W2
W3
S0
S1
S2
S3
T0
T1
T2
T3
Figure: AOS Vertex Data Storage
© 2010 Advanced Micro Devices, Inc.
Proprietary
59
Revision 1.5
dword
June 8, 2010
0
1
2
3
Base Address 0
X0 X1 X2 X3
Base Address 1
Y0 Y1 Y2 Y3
Base Address 2
W0 W1 W2 W3
Base Address 3
S0
S1
S2
S3
Base Address 4
T0
T1
T2
T3
Figure: SOA Vertex Data Storage
dword
0
1
2
3
Base Address 0
X0 Y0 Z0
Base Address 1
W0 W1 W2 W3
Base Address 2
S0
T0
S1
4
5
6
7
X1 Y1 Z1
T1
S2
T2
S3
8
9
10
X2 Y2 Z2
11
12
13
14
15
X3 Y3 Z3
T3
Figure: Strided Vertex Data Storage
To represent all of these formats, the Vertex Fetcher architecture allows for a vertex to be described as multiple
arrays of structures. Each array is described with a base address, a count and a stride. The base address points to
the beginning of the array. The count indicates the number of dwords of vertex data in this array. The stride gives
the number of dwords to the next structure in the array of structures. The AOS vertex from Figure with 5 parameters
would be represented with a single array which consists of 5 dwords with a stride of 5 dwords. The SOA vertex
from Figure would be represented with 5 arrays. Each array would have a count of 1 and a stride of 1. The strided
vertex from Figur would be represented with three arrays. The first array would have a count of 3 and a stride of 4.
The second array would have a count of 1 and a stride of 1. The third array would have a count of 2 and a stride of
2. A given implementation of this architecture may have a different maximum number of arrays of structures. If
only AOS is supported, then only one array is required. To support a strided vertex format with three textures, 7
arrays would be required (xyz, w, diffuse, specular, S0T0, S1T1, S2T2.) To support a true SOA mode, each
parameter would require its own array.
The access to vertex data may be immediate or by an index. In immediate mode, the base address of an array of
vertex data is provided. The vertex data should be read in the order in which it is stored to produce the desired
primitives. In indexed mode, a base address to the beginning of the vertex data is provided along with a set of
indices. The indices are used to access vertices in any order.
The vertex indices are clamped between a minimum and maximum state value which is supplied by the driver. This
prevents making requests to illegal or unavailable memory addresses.
Finally, the vertex data can be embedded as part of the command stream, or it can be stored in a separate array. The
figure below shows all of the possible vertex data storage modes along with implementation details for each mode.
© 2010 Advanced Micro Devices, Inc.
Proprietary
60
Revision 1.5
June 8, 2010
The table below describes the parameters that may be in a vertex, as supplied to the graphics controller
device.
NOTE: With the R300 PVS-only vertex processing path and PSC-only input vertex data mapping path, the
TCL (or PVS) input memories have no pre-defined mapping to vertex values. This is completely determined
by the driver FF->PVS conversion process. Due to this fact, the table below is fairly meaningless to the vertex
process. It is retained as a guide to help describe the fixed-function possibilities for vertex data.
Type
Param
eter
Description
Format
Applicable
Interface
(PRE-TNL /
POST-TNL
/ BOTH)**
Position0 XY
X0
The x coordinate of the vertex
IEEE floating point
BOTH
Y0
The y coordinate of the vertex
IEEE floating point
BOTH
Position0 Z
Z0
The z coordinate of the vertex
IEEE floating point
BOTH
Position0 W
W0
W or RHW (1/Homog W) coordinate of the
vertex
IEEE floating point
BOTH
Vertex Blending
Weight(s)
BW0-4
0-4 Blend Weights
IEEE floating point
PRE-TNL
Per-Vertex Matrix
Select
PVMS
Vertex Blending Matrix Selects
8888 packed fixed point
PRE-TNL
Vertex Normal 0
Nx0
The x coordinate of the vertex normal
IEEE floating point
PRE-TNL
Ny0
The y coordinate of the vertex normal
IEEE floating point
PRE-TNL
Nz0
The z coordinate of the vertex normal
IEEE floating point
PRE-TNL
Point Size Modifier
PS
Point Size Modifier – Point Sprites – PostTCL only
IEEE floating point
BOTH
Discrete Fog
F
Fog value – Post TCL only
IEEE floating point
POST-TNL
Shininess0
Shine0
Used for GL Material Per-Vertex Support
IEEE floating point
PRE-TNL
Shininess1
Shine1
Used for GL Material Per-Vertex Support
IEEE floating point
PRE-TNL
Color 0
ARGB
Typically Diffuse color and alpha weight
Usually 8888, but can be
three or four separate IEEE
floating point values **
BOTH
© 2010 Advanced Micro Devices, Inc.
Proprietary
61
Revision 1.5
June 8, 2010
Color 1
ARGB
Typically Specular color and fog/alpha weight
Usually 8888, but can be
three or four separate IEEE
floating point values **
BOTH
Color 2
ARGB
Typically Used for GL Material Per-Vertex
Support
Usually 8888, but can be
three or four separate IEEE
floating point values **
PRE-TNL
Color 3
ARGB
Typically Used for GL Material Per-Vertex
Support
Usually 8888, but can be
three or four separate IEEE
floating point values **
PRE-TNL
Color 4
ARGB
Typically Used for GL Material Per-Vertex
Support
Usually 8888, but can be
three or four separate IEEE
floating point values **
PRE-TNL
Color 5
ARGB
Typically Used for GL Material Per-Vertex
Support
Usually 8888, but can be
three or four separate IEEE
floating point values **
PRE-TNL
Color 6
ARGB
Typically Used for GL Material Per-Vertex
Support
Usually 8888, but can be
three or four separate IEEE
floating point values **
PRE-TNL
Color 7
ARGB
Typically Used for GL Material Per-Vertex
Support
Usually 8888, but can be
three or four separate IEEE
floating point values **
PRE-TNL
Texture Coordinate
Set 0
S0
The 1st coordinate for texture number 0
IEEE floating point
BOTH
IEEE floating point
BOTH
IEEE floating point
BOTH
IEEE floating point
BOTH
IEEE floating point
BOTH
(usually the single dimension horizontal
component S )
T0
The 2nd coordinate for texture number 0
(usually the two dimension vertical
component T )
R0
The 3rd coordinate for texture number 0
(The 3rd & 4th components can have many
uses)*
Q0
The 4th coordinate for texture number 0
(The 3rd & 4th components can have many
uses)*
Texture Coordinate
S1
The 1st coordinate for texture number 1
© 2010 Advanced Micro Devices, Inc.
Proprietary
62
Revision 1.5
June 8, 2010
(usually the single dimension horizontal
component S )
Set 1
T1
The 2nd coordinate for texture number 1
IEEE floating point
BOTH
IEEE floating point
BOTH
IEEE floating point
BOTH
(usually the two dimension vertical
component T )
R1
The 3rd coordinate for texture number 1
(The 3rd & 4th components can have many
uses)*
Q1
The 4th coordinate for texture number 1
(The 3rd & 4th components can have many
uses)*
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Texture Coordinate
Set 5
S5
The 1st coordinate for texture number 5
IEEE floating point
BOTH
IEEE floating point
BOTH
IEEE floating point
BOTH
IEEE floating point
BOTH
(usually the single dimension horizontal
component S )
T5
The 2nd coordinate for texture number 5
(usually the two dimension vertical
component T )
R5
The 3rd coordinate for texture number 5
(The 3rd & 4th components can have many
uses)*
Q5
The 4th coordinate for texture number 5
(The 3rd & 4th components can have many
uses)*
Position1 XY
Position1 Z
X1
The x coordinate of the vertex for blending
IEEE floating point
PRE-TNL
Y1
The y coordinate of the vertex for blending
IEEE floating point
PRE-TNL
Z1
The z coordinate of the vertex for blending
IEEE floating point
PRE-TNL
© 2010 Advanced Micro Devices, Inc.
Proprietary
63
Revision 1.5
June 8, 2010
Position1 W
W1
W or RHW (1/Homog W) coordinate of the
vertex for blending
IEEE floating point
PRE-TNL
Vertex Normal 1
Nx1
The x coordinate of the vertex normal
IEEE floating point
PRE-TNL
Ny1
The y coordinate of the vertex normal
IEEE floating point
PRE-TNL
Nz1
The z coordinate of the vertex normal
IEEE floating point
PRE-TNL
Figure: Vertex Parameters
** The Applicable Interface column is provided to specify which values are inputs to the TCL process and/or the
Raster Process. All of the values can appear in the FVF at the same time, but PRE-TNL values are ignored by the
raster process and POST-TNL values are ignored by the TCL process. In the unlikely circumstance that POST-TNL
values are provided in the FVF as inputs to the TCL process, there will be the ability to pass these values around the
TCL process.
7.3 Vector Order and Vector ID‟s
With the move to a PSC-only and PVS-only Vertex Process, there is no fixed definition of data (or location of data)
in the input vertex memory. Therefore, the destination vector locations in the PSC are fully flexible and map
directly into the corresponding location in the input vertex memory. The PSC also allows for write_mask and
swizzle capabilities to allow for more complex fixed-function and/or shader usage.
The special vector known as the NULL vector is used to keep the pipeline flow the same when there are no vectors
to be processed. It is a sort-of “special” vector that each engine knows to ignore it as far as processing is concerned,
but it is used because we need to send some kind of token down the pipeline for synchronization purposes.
The NULL vector is a vector that is not to undergo vector processing, but which will carry information in its
associated flags, such as endOfPacket . It is used when a vertex has been deleted (for culling, clipping, or other
potential reasons) and there is no valid vertex to be sent with the control information.
For the case of TCL_BYPASS (or when there is no TCL present in the HW), the PSC destination vector locations
shall map directly to the semantically defined locations of the GA input memories. In this mode, the discrete fog
and point size terms can use the write_enable and or swizzle capabilities of PSC to get the terms into the appropriate
channels.
© 2010 Advanced Micro Devices, Inc.
Proprietary
64
Revision 1.5
June 8, 2010
7.4 VAP Registers
7.4.1 VAP Vertex Data Port Registers
The DATA and IDX PORT registers are written with either primitive vertex data or primitive vertex indices after a
“trigger” write has occurred. A “trigger” write is a write to the VAP_VF_CNTL register with a non-zero prim_type.
The correct (expected) number of data words or index words must be written to these registers or undefined
behavior will result.
For R300, there is a new DATA/IDX port register added for 128-bit access. This register is only accessible via a
PM4 Type3 packet and can only be used for indexed TRI_LIST and LINE_LIST. Other than the prim-type
limitations, using this 128-bit register (or PM4 Type 3 packet opcode) is identical to using the standard method.
The PRIM_WALK field in the VAP_VF_CNTL register defines what method of vertex data or indx updates are to
occur.
1 = Indexes (Indices embedded in command stream; vertex data to be fetched from memory)
In this mode, vertex indices are written to the DATA/IDX port registers. Data is fetched using the AOS registers
corresponding to the indices in the input list. The number of indices expected is
VAP_VF_CNTL.NUM_VERTICES – 1. This mode does not use the VAP_VTX_SIZE register. The size of the
vertices is determined by the AOS register setup.
2 = Vertex List (Vertex data to be fetched from memory)
This mode does not require any vertex data or vertex indices written to the DATA/IDX port registers. Data is
fetched using the AOS registers for the indices from 0 to VAP_VF_CNTL.NUM_VERTICES – 1. Identical to
Indexes mode, except indices are internally generated.
3 = Vertex Data (Vertex data embedded in command stream)
In this mode, the vertex data is written to the DATA port registers. The number of DWORDS expected is
VAP_VTX_SIZE.DWORDS_PER_VTX * (VAP_VF_CNTL.NUM_VERTICES – 1). The VAP_VTX_SIZE
register is new to R300. In R100 / R200, this size was derived from the VAP_VTX_FMT_0/1.
7.4.2 VAP Control Register
The PVS_NUM_SLOTS should be set to the minimum of

the MAX_SLOTS, (POR is 10)

the INPUT_VTX_MEM_SIZE / INPUT_VECTORS_PER_VTX (POR is 128 / Var)

the OUPUT_VTX_MEM_SIZE / OUTPUT_VECTORS_PER_VTX (POR is 128 / Var)
These equations assume the input and output vertex data has been packed. If not, use the
MAX_INPUT_VECTOR_USED instead of INPUT_VECTORS_PER_VTX
The PVS_NUM_CNTLRS should be set to the minimum of

the MAX_CNTLRS, (POR is 6)

the TEMP_VTX_MEM_SIZE / TEMP_VECTORS_PER_VTX (POR is 128 / Var)
© 2010 Advanced Micro Devices, Inc.
Proprietary
65
Revision 1.5
June 8, 2010
These equations assume the temp vertex data has been packed. If not, use the MAX_TEMP_VECTOR_USED
instead of TEMP_VECTORS_PER_VTX.
When modifying either of PVS_NUM_SLOTS or PVS_NUM_CNTLRS, a flush must be inserted prior to the
update.
The PVS_NUM_FPUS will typically remain constant for a given chip, but can be used for performance testing.
The Shader HW will support up to a max of 32 vectors-per-vertex of input data and 32-vectors-per-vertex of temp
data as long as the NUM_SLOTS and NUM_CNTLRS are set to obey the above-described rules.
New R5xx Fields
The TCL_STATE_OPTIMIZATION bit enables a hardware optimization to improve small batch and
multiple instance performance. The TCL_STATE_OPTIMIZATION is a bit which should be set all the time.
The bit can be reset to return operation to pre-R5xx status.
7.4.3 R300 Edge Flag Support Description
Edge Flags refers to the bits which are provided, generated and/or modified during the primitive process which
affect which edges (lines) or points of a triangle are drawn when in a wireframe or point fill mode. Edge Flags are
not applicable to line or point primitive types, but are applicable to all 3 or more-sided primitives (i.e quad, polygon,
etc). R300 will support edge flags for wireframe rendering as follows :
1.
Prim Type initialization of edge flags is done by the vertex fetcher logic. Edge flags are initialized
by the vertex fetcher based on the VAP_VF_CNTL.PRIM_TYPE field. The edge flags values for
points and lines are not used during the triangle fill process, so are irrelevant. The edge flags for all
triangle primitive types are all 3 set. For more complex prim types like quads and polygons, only
the exterior of the primitive is supposed to be drawn, so the vertex fetcher applies the edge flags in a
way which only sets the bits which correspond to an external edge of the supplied primitive.
2.
Clipping modification of edge flags is done by the clipping processor according to the OpenGL
specification. Basically, the rule is that edges introduced by clipping (which would lie along a clip
plane) will always have thier corresponding edge flag set and edges which are fragments of initial
edges would retain thier original edge flags. The boundary edges introduced by clipping may be
either always set or never based on the VAP_CLIP_CNTL.BOUNDARY_EDGE_FLAG_ENA bit.
7.4.4 Input Vertex Format Registers
The VAP_VTX_FMT_0 and VAP_VTX_FMT_1 registers were used for 2 reasons on R200:
1.
Decoding / Data Conversion / Data Direction of Vertex Stream Data from output of Cache to Vector
ID‟s
2.
Computation of Dwords/Vtx for Command Stream load of vertex data.
These registers will no longer exist for R300. They are replaced as follows:

The Decoding / Data Conversion / Data Direction will be controlled completely by the Programmable
Stream Control logic. R300 will contain the additional functionality of component swizzle and writemask specification to ensure full control of input stream.

The computation of Dwords/Vtx will be replaced by the VAP_VTX_SIZE register which must be
loaded by the driver when using command stream vertex data.
7.4.5 TCL Output Vertex Format Registers
The purpose of these controls is to indicate which vertex data should be transmitted from the PVS output vertex
© 2010 Advanced Micro Devices, Inc.
Proprietary
66
Revision 1.5
June 8, 2010
memories, and from which vector locations they come. The PVS output vertex memories are not directly mapped to
semantic values to enable the split-vertex mode described later. The RASTER_VTX_FMT_0/1registers define
which values will be transmitted from PVS to CLIP/Setup to GA to Raster.
The locations of the vectors in the PVS output memory must be packed based on the VAP_OUT_VTX_FMT_0/1
register settings. Only the fields which are present in the OVFRs should be packed in the output memory. The
packing order is as follows: Position is always in location 0, Point Size (if present) is next (Point Size consumes an
entire vector in the memory, the X-channel is the value used by the raster), Colors (0-3) are next (if present), and
Textures (0-7) are next (if present). For example if the OVFR specified POS, PNT_SIZE, C0, C2, T1 and T5, these
vectors should be mapped (by the shader output operand offsets) to Output Memory locations 0-5 respectively.
For Points (Sprites) using Tex Gen (GB_ENABLE.TEX#_SOURCE == STUFF), the
VAP_OUT_VTX_FMT_1.TEX# should not be set. This is because, in general, there is no texture coordinate data
transferred from VAP to GA for this case. In the case of point clipping with tex gen, VAP will send these texture
coordinates to the GA even though the OVFR bit is not set, as follows:
OVFR COUNT
STUFF TEX
DESIRED CHANGED RESULT
0
0
No texture ever
0
>0
Clipper and/or GA creates (stuffs)
texture
>0
0
Normal texture (use vertex/pvs
texture)
>0
>0
Normal texture (use vertex/pvs
texture)
There is also the ability to pack 2 2-dimensional textures into a single 4-component texture for the VAP->GA
interface by only specifying one texture and mapping the raster state to think it is two textures.
7.4.6 Vertex State Control
This vector controls how the per-vertex state is processed. This input method is designed for OpenGL Immediate
Mode and Display List Processing.
UPDATE_USER_COLOR_0/1_ENA are deleted from R300 since they are not needed, only one user color is
required.
The COLOR_#_ASSEMBLY_CNTL change from 2-bit fields on R200 to 1-bit fields on R300 since there is only 1
USER COLOR.
7.4.7 Programmable Input Stream Control Registers
These registers control the post-vertex-cache mapping of input vertex stream data to the vector ids for TCL or SE
input memories. These registers replace the R200 Input Vertex Format Registers. Terminology: A vertex is
composed of multiple (up to 16 for R300) streams. A stream can be composed of multiple elements (where an
element is pos or norm or texcoord). The control data is arranged as 16 sets of element control data. There is not
necessarily a one-for-one mapping of stream to element. The stream control shall be set up in the order that the data
is received (or fetched).
The DataType specifies the number of DWORDS and format for each input element.
The SkipDwords specifies the number of DWORDS to skip (discard from the input stream) after the corresponding
© 2010 Advanced Micro Devices, Inc.
Proprietary
67
Revision 1.5
June 8, 2010
element has been processed. This allows multiple non-contiguous elements to reside within one stream. NOTE:
There is not support for skipping DWORDS prior to the first element, the assumption is that the driver can
prevent this from occurring.
There are two sets of PSC control registers, the VAP_PROG_STREAM_CNTL_0-7 are identical to the R200
registers of the same name. R300 adds VAP_PROG_STREAM_CNTL_EXT_0-7 which are extensions to the first
set of registers to allow a swizzle and write_mask capability. The expectation is that the EXT registers will not be
updated frequently, but they must be updated at least once to provide default control.
The DstVecLoc specifies the destination vector location (TCL / SE input vector address) for the given element.
The data type of FLOAT_8 has been added to R300 to permit using input vertices greater than 16 vectors. By
making sure that the VAP_CNTL.PVS_NUM_SLOTS and VAP_CNTL.PVS_NUM_CNTLRS are appropriately
sized, it is possible to use up to 32 vectors for the input vertex representation.
7.4.8 PVS State Flush Register
Since the driver is given control over multi-state updates to PVS Code and Constant memories, there is the need for
the driver to be able to force a “flush” of the state data. When this register address is written, the State Block will
force a flush of TCL processing so that both versions of TCL state are available before updates are processed. This
register is write only, and the data that is written is unused.
7.4.9 PVS Vertex Timeout Register
A condition can occur in the HW, in pathological vertex reuse cases, where when many primitives are sent which do
not use any new verts, the HW could hang. The solution for this hang is to wait a programmable number of clocks
when in the condition of primitive buffer full and waiting on vertices. After this number of clocks has passed
without receiving any new vertex data, the accumulated vertex data (less than 4 vertices) will be submitted to the
PVS engines. This register defaults to 0xFFFFFFFF.
7.4.10 VECTOR Indx/Data Update Register Pair
The Vector Indx Data pair is used to update all TCL vector state memories.
There are basically 2 vector memories, the PVS Constant Memory and the PVS Code Memory.
The index register contains the octword offset to write to (or read from) on the subsequent DATA_REG write/read. All
writes/reads must start octword aligned. An internal Dword counter is incremented each time a write or read occurs
to/from the DATA_REG. The Dword counter is reset when the index register is written (or read). When the dword
counter rolls from 3 back to 0, the index register value (octword address) is incremented. (Writes to the DATA_REG_128
register do not use or affect the dword counter. The DATA_REG_128 register is not readable.
The VAP_PVS_VECTOR_DATA_REG_128 register is very similar to the VAP_TCL_VECTOR_DATA_REG, but
allows 128-bit writes into the vector memory. There may be some restrictions when writing to this register (i.e. only 128bit aligned, 128-bit updates allowed).
The vertex shader instruction store increased from 256 to 1024 for R5xx VS3.0.
To account for the increased shader instruction store, the Offsets Used to get to the various memories (and elements of
memories) are as follows:
#define VERTEX_SHADER_CONST_VECS
#define VERTEX_SHADER_CODE_LINES
#define PVS_CODE_START
© 2010 Advanced Micro Devices, Inc.
Proprietary
256
1024
// R300 256
0
68
Revision 1.5
#define PVS_CONST_START
#define UCP_START_OFFSET
#define POINT_VPORT_SCALE_OFFSET
#define POINT_GEN_TEX_OFFSET
1536
June 8, 2010
1024 // R300 512
// R300 1024
1542
// R300 1030
1543
// R300 1031
7.4.11 State-Vector Engine State Data
The input vector state data required for TCL is listed in the table below. Each entry will consist of 4 single precision
IEEE floating-point vector values. The entire StVe_Vector memory is accessed via an index/data register pair. When
updating multiple DWORDS through this path, the PM4 packet bit which prevents auto-incrementation should be used so
that all words are written to the data register.
UCP0
XYZW
User clip plane 0
4 IEEE fp
UCP1
XYZW
User clip plane 1
4 IEEE fp
:
:
UCP5
XYZW
User clip plane 5
4 IEEE fp
XYZW
Viewport scaling parameters for Point Sprite Expansion in Clip Coords
4 IEEEfp
Point Sprite
Viewport Scale /
Misc
:
X = X-Radius Expansion
Y = Y-Radius Expansion
Z = State Size Multiply Constant
W = Culling Radius Expansion (SQRT(XRadExp ^2 + YradExp ^2)
Point Tex Gen
Corner Values
XYZW
Texture values to apply to points when tex gen is on
4 IEEEfp
X = Lower Left Corner S-Value
Y = Lower Left Corner T-Value
Z = Upper Right Corner S-Value
W = Upper Right Corner T-Value
** These values may be updated using the
VAP_PVS_VECTOR_DATA_REG or via the
GA_POINT_S0,T0,S1,T1 Registers. Note that updates using the
VAP_PVS_VECTOR_DATA_REG will not update the GA registers.
VECTOR MEMORY DESCRIPTIONS
There are two vector memories.
The vertex shader instruction store increased from 256 to 1024 for R5xx VS3.0.
© 2010 Advanced Micro Devices, Inc.
Proprietary
69
Revision 1.5
June 8, 2010
The PVS_CODE memory which will be 1024 entries deep and can operate as a ring (similar to R200), is linearly
addressed using offsets 0-1023. Auto-incrementing writes to this memory segment will auto-wrap back to 0 from
1023.
The PVS_CONST memory will be 256+8 entries deep. The first 256 entries of this memory will operate as a ring
(similar to R200/R300), and are linearly addressed using offsets 1024-1535. Auto-incrementing writes to this
memory segment will auto-wrap back to 1024 from 1535.
The last 8 entries of this memory are used for Clipping data which currently includes the User-Clip Planes , Point
Sprite Viewport Scale vector, and Point Sprite Gen Tex Corner values. These entries will be updated starting at
address 1536 through 1543. Since the PVS_CONST will auto-wrap at 1535 for constant updates, the UCP writes
must start with an index update to 1536 or above. Auto-incrementing writes will auto-wrap back to 1536 from 1542
(NOT 1543). This wrap-around probably will never be used, but, note that the wrap-around intentionally excludes
the Point Gen Tex vector since it is considered raster state.
These memories are not double-buffered in the code and constant range of addresses. For the code and const
memories, it is expected that the driver will insert a flush if the currently being-loaded shader code or const overlaps
the immediately preceding shader code or const. Updates to the UCP / PS_VPORT_SCALE / Point Gen Tex values
are double-buffered and therefore no flush is required.
7.4.12 Scalar Indx / Data Registers
These memories and registers no longer exist for R300. The only data in them that is still relevant is the guard band
data which now resides in dedicated registers as described below.
7.4.13 VAP_GB_VERT_CLIP_ADJ
The VAP_GB_* registers will only be single-buffered which means that a VAP_PVS_STATE_FLUSH_REG
write must precede updates to these registers.
7.4.14 Programmable Vertex Shader Control Registers
The VAP_PVS_CNTL register allows control over which instructions in the PVS code store are executed with
respect to the current shader.
The VAP_PVS_CONST_CNTL register allows control over which address ranges in the PVS const store (STVE)
are used with respect to the current shader.
7.4.15 Vertex Blending Control Register
The COLOR2_IS_TEXTURE and COLOR3_IS_TEXTURE bits enable the R5xx VAP VS3.0 to support 10 general
output vectors. For pre-R5xx, VAP supported 4 color vectors and 8 texture vectors to output to the pixel shader.
During new clip vertex generation, the color interpolation supported color clamping and flat shading and the texture
interpolation supported point texture coordinate generation and cylindrical wrap. In order to create general output
vectors, color vectors required point texture coordinate generation and cylindrical wrap processing while texture
vectors required color clamping and flat shading.
© 2010 Advanced Micro Devices, Inc.
Proprietary
70
Revision 1.5
June 8, 2010
7.4.16 Texture to Color Control Registers
The TEX_RGB_SHADE_FUNC_(0-7), TEX_ALPHA_SHADE_FUNC_(0-7), and TEX_RGBA_CLAMP_(0-7)
bits enable the R5xx VAP VS3.0 to support 10 general output vectors. For pre-R5xx, VAP supported 4 colors and 8
textures to output to the pixel shader. During new clip vertex generation, the color interpolation supported color
clamping and flat shading and the texture interpolation supported point texture coordinate generation and cylindrical
wrap. In order to create general output vectors, color vectors required point texture coordinate generation and
cylindrical wrap processing while texture vectors required color clamping and flat shading.
The TEX_RGB_SHADE_FUNC_(0-7), TEX_ALPHA_SHADE_FUNC_(0-7), and TEX_RGBA_CLAMP_(0-7)
bits enable the R5xx VAP VS3.0 to support color type interpolation during clipping on texture vectors. The bits
enable flat shading or color clamping selectively on all 8 texture vectors. These bits only support clipper
functionality of flat shading. The rasterizer has separate register bits to enable flat shading at pixel interpolation.
7.4.17 VAP_VTE_CNTL
This register is used to control the functionality of the VAP Viewport Transform Engine.
7.4.18 GA_COLOR_CONTROL
This register is used by the clipper to control flat shading of all 4 colors and alphas based off of the provoking
vertex.
7.4.19 GA_ROUND_MODE
This register specifies the rouding mode for geometry & color SPFP to FP conversions. Only the RGB and
ALPHA_CLAMP fields are used by VAP.
7.4.20 GA_POINT_S0/T0/S1/T1
These registers are used to control the texture coordinates for texture coordinate generation. These are only used by
VAP for point clipping.
7.4.21 GB_ENABLE
This register is used by VAP to control when and how point textures are generated for clipping.
7.4.22 SU_TEX_WRAP
This register is used by VAP when clipping in order to perform cylindrical wrap clipping calculations.
7.5 R3xx-R5xx Programmable Vertex Shader Description
7.5.1 OVERVIEW
The R300 PVS model is a superset of the R200 PVS model. Differences are noted below.
R200->R300 Notable Shader Model Differences at Shader Definition Level
© 2010 Advanced Micro Devices, Inc.
Proprietary
71
Revision 1.5
1.
2.
3.
4.
5.
6.
7.
June 8, 2010
Constant Store Size Increase from 192 to 256
Code Store Size Increase from 128 to 256
Ability to increase Input Size from 16 to 32 vectors-per-vertex
Ability to increase Temp Register Size from 12 to 32 vectors-per-vertex
Increase support from 6 Output Textures to 8
Increase support from 2 Output Colors to 4 (4 th color only used for 2-sided lighting)
Ability to perform flow control instructions of jump, loop and subroutine
R200->R300 Notable Shader Model Differences at Driver Compilation Level
1.
2.
3.
4.
5.
6.
7.
Requirement to Manage NUM_SLOTS & NUM_CONTROLLERS based on Input, Output
and Temp Register sizes relative to the respective vectors-per-vertex.
Requirement to “pack” output vectors based on OVFR.
Discrete Fog resides in one of Color 0-3 alpha.
Addition of Alternate Temp Memory. Can be used as additional standard Temp Memory.
Addition of Dual-Op Vector/Math Capability along with Alternate Temp Reg Memory
Ability to write back into Input Memory from Shader (For HOS Evaluation Shader)
Ability to use address register with Input, Output, and Temp registers as src and dest
operands. There is not a current known use for this, but it was simple to add.
The R5xx VS3.0 PVS model is a superset of the R300 PVS VS2.0 model. Differences are noted below:
1.
Ability to support dynamic flow control through the use of predication opcodes, predication bit,
predicated writes, and a nested false count maintained in a temporary memory location.
2. Ability to support predication register through predication opcodes, predication bit, and
predicated writes or use CONDITIONAL vector opcodes where sources are conditionally written
or conditionally selected.
3. Code store size increase from 256 to 1024.
4. Temporary memory size increase from 72 to 128 (supports 4 threads and 32 vectors per thread).
5. Input memory size increase from 72 to 128 (supports 4 threads and 32 vectors per thread).
6. Output memory size increase from 72 to 128.
7. Static control flow nested loops and subroutines (4 deep loops and 4 deep subroutines)
8. Ability to access input, temporary, and output memories with inner most loop index.
9. Added new loop repeat type where the fixed-point loop index is not loaded at loop initialization.
FLI is inherited from parent loop.
10. Added new source input modifier (absolute value).
11. Added new instruction modifier saturate to clamp outputs between 0 and 1.
The programmable vertex shader (PVS) is a model which replaces the standard DirectX / OGL vertex processing
pipeline. It replaces only the per-vertex operations (i.e. transformation, lighting, texture coordinate generation,
texture transform, fog), but does not replace any of the primitive operations (i.e. primitive assembly, clipping,
backface culling, 2-sided lighting. The functional model for the PVS HW is as shown in the following diagram. For
R300, 2-sided lighting is achieved by writing up to 4 output colors (both front and back color results) and allowing
the setup engine to select the appropriate color(s) based on the facedness of the triangle.
The general model of the PVS is that all operands are of a vector type (4 floating point values). When there are
scalar operations, generally they emit the scalar result on all 4 channels of the output vector.
The input vertex memory (IVM) represents the data which is provided on a per-vertex basis (i.e. position, normal,
color, etc). This vertex data does not have any semantic attachment from the perspective of the shader HW. All
© 2010 Advanced Micro Devices, Inc.
Proprietary
72
Revision 1.5
June 8, 2010
vertex attributes are generic. There is a total of 128 vectors of IVM memory where up to 32 vectors (16 is
typical) may be used per vertex. (See description of slot/controller dependencies below).
The constant state memory (CSM) represents the constant values which are used in the shader process (i.e rotation
matrices, light positions, etc). This data also has no semantic attachment from the perspective of the shader HW.
There are 256 vectors of constant memory available.
The temporary register memory (TRM) represents the intermediate storage of temporary values computed during the
shader process. There are a total of 128 vectors of TRM memory where up to 32 vectors (12 is typical) may be
used per vertex. (See description of slot/controller dependencies below).
The alternate temporary register memory (ATRM) was added to R300 to allow both a vector engine operation and a
math engine operation to output unique results simultaneously. The ATRM can be used in the same manner as the
TRM for regular vector operations except there is only a single read port on the ATRM memory, thus only 1 unique
source operand of an instruction may come from ATRM memory. The ATRM memory is the only memory that the
math portion of a dual-math operation can write. There are a total of 20 vectors of ATRM memory where up to
20 vectors (4 is typical) may be used per vertex. (See description of slot/controller dependencies below). (See
description of dual math op for ATRM limitations).
There are 4 address registers arranged as a vector (A0.x,y,z,w) which are signed integer fixed point values. The
address registers can only be used as an offset to the address into the constant memory. The address registers are
loaded using a MOV instruction from any of the IVM, CSM, TRM or ATRM. This special MOV instruction will
perform a floating point to fixed point conversion of the selected source vector. There are two separate MOV
instructions for unique float to fix conversion. One is a truncate to minus infinity (the floor() C function), the other
is a round and truncate to minus infinity ( val + 0.5f, followed by floor() C function. The value is clamped between
the range of –256 and 255. When this value is added to the constant address of the current operation, the result is
tested for in the range of 0 to MAX_SHADER_CONST where MAX_SHADER_CONST is determined by the
driver as the maximum constant address provided by the shader declaration. If the resultant address is out of the
range 0 to MAX_SHADER_CONST, (0,0,0,0) is returned on the data path. There is a 2-bit address register select
for each source operand which is used to select between the x,y,z,w components of the address register vector. Only
a single address register (component) may be used for CSM offsets across all of the source operands of a given
instruction. If the address registers are used for offsets to IVM, TRM, ATRM, or OVM, there is no limitation on the
number of address registers which can be used.
The output vertex memory (OVM) represents the data that is computed or passed by the shader program. These
locations have semantics attached since they are passed through the clipping, viewport transform, rasterization
process. The locations in the OVM are as follows:
PVS_OUT_POS
The output x,y,z,w position. This output vector must be written to by all
shaders.
PVS_OUT_PT_SIZE
The output scalar point sprite size modifier. X-comp only.
PVS_OUT_CLR(0-3)
The output r,g,b,a colors. Support for 4.
PVS_OUT_TEX(0-7)
The output s,t,r,q textures. Support for 8.
PVS_OUT_FOG
The output scalar discrete fog. X-component only.
© 2010 Advanced Micro Devices, Inc.
Proprietary
73
Revision 1.5
June 8, 2010
There are a total of 128 vectors of OUT memory. These values are mapped based on the compression
described below. (See description of slot/controller dependencies below).
For R300, the driver must remap the shader output memory attributes to be “packed” into the first sequential output
vectors based on the OVFR register definition. For example, if the only attributes present in the OVFR are Pos,
Pt_Size, Clr1 and Tex 2, then these values must be written to output vectors 0-3. The order of the vectors, when
present, is as listed above. Note that Fog does not have an associated vector, it can be placed in any of color 0-3
alpha channel. There is a GB_SELECT.FOG_SELECT setting in the raster to control where fog comes from.
Operations are defined generally as
PVS_OP DST_OP.write_mask
SRC_OP_A.modifier
SRC_OP_C.modifier
SRC_OP_B.modifier
Different PVS ops have differing numbers of source operands. The number of source operands for each instruction
is specified below with the function descriptions.
One strict limitation of the PVS model is that a single operation may only use one unique address from the IVM,
CSM, or ATRM. One, Two, or Three addresses may be used from the TRM (although 3 unique addresses from the
TRM on a single instruction will take 2 cycles in the HW). More than one source operand may utilize the IVM,
CSM, or ATRM memory as long as they all access the same vector address.
Each source operand has a modifier which can be applied on a per-component basis. There are two basic types of
source operand modification, Swizzle and negation. The swizzle operation is performed first. For each component
x,y,z,w it is possible to define independently which component gets mapped to these components, including a 0.0 or
1.0 value. So for each component you can select from (X, Y, Z, W, 0.0, 1.0). Following the swizzle operation, it is
possible to specify a negation of the value on a per-component basis.
The destination operand has a write mask which allows any or all of the vector components to be updated. This is
particularly useful when performing scalar output operations to pack the result into a single component of a vector
value (since the scalar results are generally emitted on all component channels).
7.5.2 SLOT AND CONTROLLER MANAGEMENT
For R5xx, the input memory size, the temporary memory size, and the output memory size have been increased from
72 to 128 vectors. As stated below, with larger memories, the PVS design can run more efficiently with more
NUM_SLOTS and more NUM_CNTRS.
The R300 PVS design has a degree of flexibility which allows the driver to increase the effective per-vertex sizes of
the IVM, TRM, ATRM, and OVM memories at the expense of reduced performance. There are two variables in this
performance tradeoff for R300: (NOTE: a vertex group is 8 vertices per group for R5xx since 8 vector engines)
a.
b.
© 2010 Advanced Micro Devices, Inc.
Proprietary
the number of slots (NUM_SLOTS): the max number of vertex groups that can
reside from the input of vertex data to the IVM to the output of vertex data
from the OVM, and
the number of controllers (NUM_CNTLRS): the max number of vertex groups
that are available for vector engine processing at any given time.
74
Revision 1.5
June 8, 2010
The IVM and OVM memory flexibility is affected by NUM_SLOTS, while the TRM and ATRM memory flexibility
is affected by the NUM_CNTLRS. In general, the higher the values for NUM_SLOTS and NUM_CNTLRS, the
more efficient (higher performance) the PVS engine will run. The values for NUM_SLOTS and NUM_CNTLRS
are restricted by the vectors-per-vertex required for the active vertex shader program.
The equations for determining valid values for these terms are as follows:
NUM_SLOTS <= MIN(10, IVM_SIZE / IVM_VEC_PER_VTX, OVM_SIZE /
OVM_VEC_PER_VTX)
Where IVM_SIZE = 128, OVM_SIZE = 128 and IVM_VEC_PER_VTX and
OVM_VEC_PER_VTX are vertex shader dependent values.
NUM_CNTLRS <= MIN(5, TRM_SIZE / TRM_VEC_PER_VTX, ATRM_SIZE /
ATRM_VEC_PER_VTX)
Where TRM_SIZE = 128, ATRM_SIZE = 20, and TRM_VEC_PER_VTX and
ATRM_VEC_PER_VTX are vertex shader dependent values.
Note that NUM_SLOTS and NUM_CNTLRS are permitted to be set too low, but there is a performance penalty for
setting them lower.
Note that when changing NUM_SLOTS or NUM_CNTLRS, a flush of the PVS engine is required by writing the
VAP_PVS_STATE_FLUSH_REG.
7.5.3 VS3.0 DYNAMIC FLOW CONTROL USING R5xx PREDICATION LOGIC
VS3.0 dynamic flow control is implemented on R5xx in a manner similar to R400 where vector engine operations
and math engine operations are used to manipulate a predication bit to mask writes to the temporary memory, the
output memory, the input memory, the alternate temporary memory, and the address register. The operations are
designed to use a temporary memory location as a stack counter to keep the count of false branches. For nested
if/else/endif branches, the operations receive as input the stack counter as well as the boolean operation to determine
whether the predication bit is set and whether the stack counter is incremented or decremented. Within the
if/else/endif construct, the ALU operations are predicated which kills the writes if the predication bit is not set.
A possible implementation of nested if/else/endif constructs is as follows:
if (
A.x == 0 ) {
if ( A.y > 0 ) {
B = C;
} else {
B = D;
}
} else {
If ( A.z >= 0 ) {
B = E;
} else {
B = F;
}
}
TEMP.w = ME_PRED_SET_EQ
A.xxxx
TEMP.w = VE_PRED_SET_GT_PUSH TEMP.000w, A.000y
B = C
with pred_enable = 1 and pred_sense =1
TEMP.w = ME_PRED_SET_INV
TEMP.000w
B = D
with pred_enable = 1 and pred_sense =1
TEMP.w = ME_PRED_SET_POP
TEMP.000w
TEMP.w = ME_PRED_SET_INV
TEMP.000w
TEMP.w = VE_PRED_SET_GTE_PUSH TEMP.000w, A.000z
B = E
with pred_enable = 1 and pred_sense =1
TEMP.w = ME_PRED_SET_INV
TEMP.000w
B = F
with pred_enable = 1 and pred_sense =1
TEMP.w = ME_PRED_SET_POP
TEMP.000w
TEMP.w = ME_PRED_SET_POP
TEMP.000w
First level “if” statements turn in to ME_PRED_SET_EQ, ME_PRED_SET_GT, ME_PRED_SET_GTE, or
ME_PRED_SET_NEQ depending on the boolean expression. The first level “If” statements appropriately initialize
© 2010 Advanced Micro Devices, Inc.
Proprietary
75
Revision 1.5
June 8, 2010
the predication bit and false branch counter to 0 or 1 depending on the result of the boolean expression. Second
level or deeper “If” statements turn in to VE_PRED_SET_EQ_PUSH, VE_PRED_SET_GT_PUSH,
VE_PRED_SET_GTE_PUSH, or VE_PRED_SET_NEQ_PUSH. These “If” statements require the false branch
counter as an additional input to determine the final status of the predication bit and the output false branch counter.
For these “If” statements, the predication bit will only be set if the input false branch counter is 0 and the boolean
expression is true. “Else” statements turn into ME_PRED_SET_INV, which also require the false branch counter as
an input and only set the predication bit if this counter is 1. If the input false branch counter is 0, the
ME_PRED_SET_INV sets the output false branch counter to 1 for later nesting and resets the predication bit.
“Endif” statements turn into ME_PRED_SET_POP, which decrement and clamp the false branch counter to 0 if
negative.
The ME_PRED_SET_CLR and ME_PRED_SET_RESTORE operations can be used for loop break statements.
The ME_PRED_SET_CLR resets the predication bit and outputs maximum float to set the false branch counter to
an extremely high number to disable successive operations in a breaked loop. The ME_PRED_SET_RESTORE
operation can be used to restore the predication bit and the false branch counter after exiting a breaked loop.
In the R300 architecture, the best performance is achieved by trying to interlace computations so that an operations
source is not the destination of the preceding operation. In the above example, the false branch stack counter stored
in TEMP.w is a very popular source and destination operand, and R5xx performance would be better optimized by
finding other operations to interlace between them.
7.5.4 VS3.0 PREDICATION AND SIMPLE DYNAMIC FLOW CONTROL USING R5xx CONDITIONAL
OPCODES
In a manner similar to R400, R5xx has conditional moves, writes, or muxes to support VS3.0 predication and simple
dynamic flow control. For predication support in VS3.0, a temporary memory vector can be used in place of a
predication bit. VE_COND_WRITE_EQ, VE_COND_WRITE_GT, VE_COND_WRITE_GTE, and
VE_COND_WRITE_NEQ have two input vector source operands where the first source operand is a conditional
component write mask for the writing of the second source vector into the destination vector. An example of VS3.0
predication being supported with a conditional move or write is as follows:
P = pred_set_gt(A.xyzw,Bxyzw);
(P) Cxyzw = Dxyzw;
(!P) Cxyzw = Exyzw;
TEMPxyzw = VE_SET_GREATER_THAN(A.xyzw,Bxyzw);
Cxyzw = VE_COND_WRITE_NEQ(TEMPxyzw,Dxyzw);
Cxyzw = VE_COND_WRITE_EQ(TEMPxyzw,Exyzw);
Conditional mux opcodes include VE_COND_MUX_EQ, VE_COND_MUX_GT, and VE_COND_MUX_GTE
have three input vector source operands where the first source operand is a component mux select selecting between
the second and third source vectors to write the destination vector. The above example can simplified to the
following:
TEMPxyzw = VE_SET_GREATER_THAN(A.xyzw,Bxyzw);
Cxyzw = VE_COND_MUX_EQ(TEMPxyzw,Exyzw,Dxyzw);
The primary limitation of the conditional mux opcodes is that only two of the three source operands can come from
temporary memory since the temporary memory has only two read ports. A possible solution is using the input
memory as a temporary location for one of the three source operands (the input memory can be written by the vector
and math engine). Also, VE_COND_MUX operations could be reverted into two VE_COND_WRITE
opcoderations as above.
© 2010 Advanced Micro Devices, Inc.
Proprietary
76
Revision 1.5
June 8, 2010
7.5.5 PVS FLOW CONTROL CAPABILITY
R300 adds the DX9 support for Vertex Shader Flow control. There are 3 types of flow control instructions: JMP,
LOOP and JSR. Up to 16 total JMP, LOOP, and JSR instructions are allowed for any one shader program.
A JMP is a simple conditional JMP from one instruction to another instruction. Only forward jumps are allowed by
DX9. The hardware is capable of backward jumps, but they are not recommended. There is not actually a
conditional jump in R300, if the Boolean jump bit is not set, the the driver should disable the JMP.
A JSR instruction is a conditional Jump to Subroutine. Similar to the JMP, if the JSR Boolean control is disabled,
the driver should disable the JSR. Upon reaching the activation instruction, (the JSR), a jump is made to the
subroutine label (the jump address). The RET instruction is temporarily “activated” in the HW such that when the
RET instruction is reached, it jumps back to the location specified in the VAP_PVS_FLOW_CNTL_ADDRS#
register.
A LOOP instruction allows a set of instructions to be executed multiple times. Upon reaching the loop start
instruction, the loop count is initialized and the fixed-point loop index register is initialized. The Loop End
instruction address is temporarily “activated” such that when that instruction is reached, the loop count is
decremented, the fixed-point loop index register is incremented (by inc_value) and it jumps back to the location
specified in the VAP_PVS_FLOW_CNTL_ADDRS# register. When loop count is decremented to 0, the
LOOP_END instruction is taken out of the temporarily activated list.
R5xx VS3.0 required the following changes to the PVS flow control capability:
1.
Loops and subroutines can be nested up to four levels deep. The official definition is 4 levels of loops and
4 levels of subroutines. The actual R5xx implementation supports 8 total between loops and subroutines
(any combination not to exceed 8). Some special points with regard to loop and subroutine nesting:
o
Only the inner-most fixed-point loop index register is accessible for memory addressing.
o
The inner-most fixed-point loop index is visible within all nested subroutines.
o
The fixed-point loop index is initialized for a loop on the activation address for the loop.
2.
R5xx support VS3.0 capability for fixed-point loop index addressing for constant memory, input memory,
output memory and temporary memory. VS3.0 requires support for constant memory, input memory, and
output memory. Address clamping is only provided for constant memory, and therefore shader validation
should verify all fixed-point loop index register addressing is within input, output, or temporary boundaries
for that vertex and loop.
3.
R5xx supports VS3.0 capability for the loop repeat construct. The loop repeat is similar to a general loop
except the fixed-point loop index is not initialized at the activation of the loop. The loop repeat inherits the
fixed-point loop index from the above nested loop. Though the init value is not used, the loop step value is
still used for the loop repeat. This enables the possibility for creative dual loop indexing of memories, but
the general VS3.0 functionality would set the step value to 0. Upon loop repeat completion, the original
fixed-point loop index is popped back to its pre-loop repeat value. Loop repeats can be nested and use the
fixed-point loop index under a general loop.
4.
R5xx VS3.0 supports 16 flow control instructions. VS3.0 treats flow control instructions in the same
manner as ALU instructions and therefore has a logical maximum of 512 flow control instructions if no
ALU instructions were used. However, the 16 R5xx flow control registers can really equate to
approximately 32 VS3.0 flow control instructions since an R5xx loop instruction includes the loop begin
and the loop end and a R5xx subroutine call includes the call, the subroutine start, and the subroutine
return.
*NOTE: When a loop count is set to 0, the driver must change the loop instruction to a jump instruction to jump
over the loop, since the control flow in the HW is done at the end of the loop.
Details on the language syntax are described below.
Caveats:
© 2010 Advanced Micro Devices, Inc.
Proprietary
77
Revision 1.5
June 8, 2010
When a loop count is changed to 0, the driver must change this loop to be a jump to the end-of-loop label.
Jump Instruction
jump b#, labelname;
1.
2.
3.
4.
b# is a boolean flow control constant register signified by "b" and "#" can range
from 0 to 15
labelname must be defined downstream and terminated with a ":"
There are 16 flow control constant registers of 1bit boolean type
Jumps are conditional (the jump will only occur if the value in the specified
boolean flow control constant is '1')
Example
mul
mad
jump b2, end;
mad
rcp
end:
mul
out
Subroutine Call Instruction
call b#, labelname;
1.
2.
3.
4.
5.
6.
7.
8.
b# is a boolean flow control constant register signified by "b" and "#" can range
from 0 to 15
labelname must be defined downstream and terminated with a ":"
There are 16 flow control constant registers of 1bit boolean type
Subroutine calls are conditional (the call will only occur if the value in the
specified flow control constant is non-zero)
A subroutine block is defined as the code between the label referenced when called
to the return from subroutine instruction
Loop instructions are allowed inside the subroutine block as long as the end of loop
label is also within the same subroutine block
Nested subroutines and loops are allowed to a depth of 8 total.
A parent fixed-point index is visible through all subroutine nesting.
Example
call b5 normalize;
Return from Subroutine
© 2010 Advanced Micro Devices, Inc.
Proprietary
78
Revision 1.5
June 8, 2010
ret;
1. The "ret" instruction is used to indicate the end of a subroutine
Example
normalize:
dp3 r0.w, r0, r0;
rsq r0.w r0.w;
mul r0, r0, r0.w;
ret;
Loop Instruction
loop i#, labelname;
1.
2.
3.
4.
5.
6.
7.
8.
9.
i# is an integer flow control constant register signified by "i" and "#" can range from
0 to 15
The 'i' register is comprised of three components i#.c loop count (range 0 to 255), i#.i
initial value (range from 0 to 255), and i#.s step value (range from -128 to 127)
which when referenced as i# is an integer scalar defined by i# = i#.i + n*i#.s where n
is the number of times the loop has been traversed The loop value is clamped to be
in the range (–256 – 255) if it over/underflows.
For the "loop" instruction, only the first component (initial value) of the "i" register
is used and the i#.s step value is ignored and treated as '1'
labelname must be defined downstream and terminated with a ":"
The loop will be traversed i#.c times regardless of the i#.i and i#.s values
A zero value i#.c loop count is treated as??? so may not be supported (the driver may
be required to preprocess this case to be a jump to the end-of-loop label)
Jump instructions are allowed within a loop block as long as the jump target label is
also within the same loop block
Jump Subroutine instructions are allowed within a loop block
Nested subroutines and loops are allowed to a depth of 8 total.
Example
mul
mad
loop i13, endloop;
mad
mul
endloop:
mul
out
Loop Instruction With Auto-Increment
iloop i#, labelname;
1.
i# is an integer flow control constant register signified by "i" and "#" can range from
© 2010 Advanced Micro Devices, Inc.
Proprietary
79
Revision 1.5
June 8, 2010
0 to 15
2.
The 'i' register is comprised of three components i#.c loop count (range 0 to 255), i#.i
initial value (range from 0 to 255), and i#.s step value (range from -128 to 127)
which when referenced as i# is an integer scalar defined by i# = i#.i + n*i#.s where n
is the number of times the loop has been traversed The loop value is clamped to be
in the range (–256 – 255) if it over/underflows.
3.
labelname must be defined downstream and terminated with a ":"
4.
The loop will be traversed i#.c times regardless of the i#.i and i#.s values
5.
A zero value i#.c loop count is treated as??? so may not be supported (the driver may
be required to preprocess this case to be a jump to the end-of-loop label)
6.
Jump instructions are allowed within an iloop block as long as the jump target label
is also within the same iloop block
7.
Jump Subroutine instructions are allowed within an iloop block
8.
Nested subroutines and loops are allowed to a depth of 8 total.
9.
With nested loops, only the inner-most fixed-point loop index is accessible for ALU
source operand addressing. The resulting address is not clamped for the input,
output, and temporary memories so shader validation is required to ensure all
addressing using the fixed-point loop index is within the boundaries for that vertex
and loop.
10. A loop repeat construct does not initialize the fixed-point loop index. The loop
repeat inherits the fixed-point loop index from the above nested loop. Though the
init value is not used, the loop step value is still used for the loop repeat. This
enables the possibility for creative dual loop indexing of memories, but the general
VS3.0 functionality would set the step value to 0. Upon loop repeat completion, the
original fixed-point loop index is popped to its pre-loop repeat value.
Example
mul
mad
iloop i5, endloop;
mul
mad r0, r0, c[i5];
add
endloop:
mul
out
// faster to use loop counter than a0
7.5.6 DUAL MATH OP USAGE
The R300 PVS design enables the ability to use both the Vector Engine and the Math Engine on the same clock. An
instruction which combines a Vector Engine and a Math Engine instruction will be termed a Dual-Math Instruction.
A Dual-Math Instruction has the following restrictions:
The Vector Instruction of a Dual-Math Inst must not use more than 2 source operands because the Math Instruction
definition is stored in the 3rd source operand bits of the instruction field.
© 2010 Advanced Micro Devices, Inc.
Proprietary
80
Revision 1.5
June 8, 2010
The Math Instruction of a Dual-Math Inst must have 2 or less source scalar operands which must both come from a
single source vector. Swizzles enable the two scalar operands to come from any components of the single source
vector.
The Vector Instruction of a Dual-Math Inst cannot have the destination operand use the ATRM memory.
The Math Instruction of a Dual-Math Inst can only use the ATRM memory as the destination operand and can only
write to locations 0-3 and cannot use relative addressing (address register).
The combined instructions source operands must conform to the same memory restrictions as a single op (1 unique
src from CSM, IVM, ATRM, 2 unique src from TRM (3 unique src from TRM only allowed for single op Vector
Macro inst)).
7.5.7 VECTOR INSTRUCTIONS
VE_DOT_PRODUCT: 2 VECTOR SOURCE OPERANDS
OUT.X = ((IN_A.X * IN_B.X) + (IN_A.Y * IN_B.Y)
+ (IN_A.Z * IN_B.Z) + (IN_A.W * IN_B.W));
OUT.Y = OUT.Z = OUT.W = OUT.X
VE_MULTIPLY: 2 VECTOR SOURCE OPERANDS
OUT.X = IN_A.X * IN_B.X;
OUT.Y = IN_A.Y * IN_B.Y;
OUT.Z = IN_A.Z * IN_B.Z;
OUT.W = IN_A.W * IN_B.W;
VE_ADD: 2 VECTOR SOURCE OPERANDS
OUT.X = IN_A.X + IN_B.X;
OUT.Y = IN_A.Y + IN_B.Y;
OUT.Z = IN_A.Z + IN_B.Z;
OUT.W = IN_A.W + IN_B.W;
VE_MULTIPLY_ADD: 3 VECTOR SOURCE OPERANDS (MACRO IF 3 UNIQUE TEMPS)
OUT.X = (IN_A.X * IN_B.X) + IN_C.X;
OUT.Y = (IN_A.Y * IN_B.Y) + IN_C.Y;
OUT.Z = (IN_A.Z * IN_B.Z) + IN_C.Z;
OUT.W = (IN_A.W * IN_B.W) + IN_C.W;
© 2010 Advanced Micro Devices, Inc.
Proprietary
81
Revision 1.5
June 8, 2010
VE_DISTANCE_VECTOR: 2 VECTOR SOURCE OPERANDS
OUT.X = 1.0;
OUT.Y = IN_A.Y * IN_B.Y;
OUT.Z = IN_A.Z;
OUT.W = IN_B.W;
Potentially useful as follows (XX = Don‟t Care, D = Depth)
IN_A = (XX, D * D, D * D, XX)
IN_B = (XX, 1 / D, XX, 1 / D)
OUT = (1.0, D, D*D, 1/D) for light attenuation multiply.
VE_FRACTION: 1 VECTOR SOURCE OPERAND
OUT.X = IN_A.X – FLOOR(IN_A.X);
OUT.Y = IN_A.Y – FLOOR(IN_A.Y);
OUT.Z = IN_A.Z – FLOOR(IN_A.Z);
OUT.W = IN_A.W – FLOOR(IN_A.W);
This function returns the positive difference between a floating point number and the
largest integer number less than the floating point number.
VE_MAXIMUM:
2 VECTOR SOURCE OPERANDS
OUT.X = MAX(IN_A.X, IN_B.X);
OUT.Y = MAX(IN_A.Y, IN_B.Y);
OUT.Z = MAX(IN_A.Z, IN_B.Z);
OUT.W = MAX(IN_A.W, IN_B.W);
VE_MINIMUM:
2 VECTOR SOURCE OPERANDS
OUT.X = MIN(IN_A.X, IN_B.X);
OUT.Y = MIN(IN_A.Y, IN_B.Y);
OUT.Z = MIN(IN_A.Z, IN_B.Z);
© 2010 Advanced Micro Devices, Inc.
Proprietary
82
Revision 1.5
June 8, 2010
OUT.W = MIN(IN_A.W, IN_B.W);
VE_SET_GREATER_THAN_EQUAL: 2 VECTOR SOURCE OPERANDS
OUT.X = (IN_A.X >= IN_B.X) ? 1.0 : 0.0;
OUT.Y = (IN_A.Y >= IN_B.Y) ? 1.0 : 0.0;
OUT.Z = (IN_A.Z >= IN_B.Z) ? 1.0, 0.0;
OUT.W = (IN_A.W >= IN_B.W) ? 1.0, 0.0;
VE_SET_LESS_THAN:
2 VECTOR SOURCE OPERANDS
OUT.X = (IN_A.X < IN_B.X) ? 1.0, 0.0;
OUT.Y = (IN_A.Y < IN_B.Y) ? 1.0, 0.0;
OUT.Z = (IN_A.Z < IN_B.Z) ? 1.0, 0.0;
OUT.W = (IN_A.W < IN_B.W) ? 1.0, 0.0;
VE_MULTIPLYX2_ADD:
TEMPS)
3 VECTOR SOURCE OPERANDS (MACRO IF 3 UNIQUE
OUT.X = (2.0 * (IN_A.X * IN_B.X)) + IN_C.X;
OUT.Y = (2.0 * (IN_A.Y * IN_B.Y)) + IN_C.Y;
OUT.Z = (2.0 * (IN_A.Z * IN_B.Z)) + IN_C.Z;
OUT.W = (2.0 * (IN_A.W * IN_B.W)) + IN_C.W;
VE_MULTIPLY_CLAMP:
UNIQUE TEMPS)
3 VECTOR SOURCE OPERANDS (NO MACRO -> NO 3
IF(C.W < (A.W * B.W)) {
OUT.X = C.W;
} ELSE IF(C.X >= (A.X * B.X)) {
OUT.X = C.X;
} ELSE {
OUT.X = A.X * B.X;
}
© 2010 Advanced Micro Devices, Inc.
Proprietary
83
Revision 1.5
June 8, 2010
OUT.Y = OUT.Z = OUT.W = OUT.X;
This function is used for point sprite clamping. May or may not be useful for other
functions.
VE_FLT2FIX_DX:
1 VECTOR SOURCE OPERAND
OUT.X = FLOOR(IN_A.X);
OUT.Y = FLOOR(IN_A.Y);
OUT.Z = FLOOR(IN_A.Z);
OUT.W = FLOOR(IN_A.W);
This function is a component-wise float to fixed conversion which returns the largest
integer less than the input value. This function is used to load the address register.
VE_FLT2FIX_DX_RND: 1 VECTOR SOURCE OPERAND
OUT.X = FLOOR(IN_A.X + 0.5);
OUT.Y = FLOOR(IN_A.Y + 0.5);
OUT.Z = FLOOR(IN_A.Z + 0.5);
OUT.W = FLOOR(IN_A.W + 0.5);
This function is a component-wise float to fixed conversion which returns the nearest
integer to the input value. This function is used to load the address register.
VE_PRED_SET_EQ_PUSH: 2 VECTOR SOURCE OPERANDS
IF( (IN_B.W==0) && (IN_A.W==0) ) {
PREDICATE_BIT = 1;
OUT.W = 0;
} ELSE {
PREDICATE_BIT = 0;
OUT.W = IN_A.W + 1.0;
}
OUT.X = OUT.Y = OUT.Z = OUT.W;
VE_PRED_SET_GT_PUSH: 2 VECTOR SOURCE OPERANDS
IF( (IN_B.W>0) && (IN_A.W==0) ) {
© 2010 Advanced Micro Devices, Inc.
Proprietary
84
Revision 1.5
June 8, 2010
PREDICATE_BIT = 1;
OUT.W = 0;
} ELSE {
PREDICATE_BIT = 0;
OUT.W = IN_A.W + 1.0;
}
OUT.X = OUT.Y = OUT.Z = OUT.W;
VE_PRED_SET_GTE_PUSH: 2 VECTOR SOURCE OPERANDS
IF( (IN_B.W>=0) && (IN_A.W==0) ) {
PREDICATE_BIT = 1;
OUT.W = 0;
} ELSE {
PREDICATE_BIT = 0;
OUT.W = IN_A.W + 1.0;
}
OUT.X = OUT.Y = OUT.Z = OUT.W;
VE_PRED_SET_NEQ_PUSH: 2 VECTOR SOURCE OPERANDS
IF( (IN_B.W!=0) && (IN_A.W==0) ) {
PREDICATE_BIT = 1;
OUT.W = 0;
} ELSE {
PREDICATE_BIT = 0;
OUT.W = IN_A.W + 1.0;
}
OUT.X = OUT.Y = OUT.Z = OUT.W;
VE_COND_WRITE_EQ4 : 2 VECTOR SOURCE OPERANDS
WRITE_ENABLE[0] = ( IN_A.X==0 ) ? 1 : 0;
© 2010 Advanced Micro Devices, Inc.
Proprietary
85
Revision 1.5
June 8, 2010
WRITE_ENABLE[1] = ( IN_A.Y==0 ) ? 1 : 0;
WRITE_ENABLE[2] = ( IN_A.Z==0 ) ? 1 : 0;
WRITE_ENABLE[3] = ( IN_A.W==0 ) ? 1 : 0;
OUT = IN_B;
VE_COND_WRITE_GT4 : 2 VECTOR SOURCE OPERANDS
WRITE_ENABLE[0] = ( IN_A.X>0 ) ? 1 : 0;
WRITE_ENABLE[1] = ( IN_A.Y>0 ) ? 1 : 0;
WRITE_ENABLE[2] = ( IN_A.Z>0 ) ? 1 : 0;
WRITE_ENABLE[3] = ( IN_A.W>0 ) ? 1 : 0;
OUT = IN_B;
VE_COND_WRITE_GTE4 : 2 VECTOR SOURCE OPERANDS
WRITE_ENABLE[0] = ( IN_A.X>=0 ) ? 1 : 0;
WRITE_ENABLE[1] = ( IN_A.Y>=0 ) ? 1 : 0;
WRITE_ENABLE[2] = ( IN_A.Z>=0 ) ? 1 : 0;
WRITE_ENABLE[3] = ( IN_A.W>=0 ) ? 1 : 0;
OUT = IN_B;
VE_COND_WRITE_NEQ4 : 2 VECTOR SOURCE OPERANDS
WRITE_ENABLE[0] = ( IN_A.X!=0 ) ? 1 : 0;
WRITE_ENABLE[1] = ( IN_A.Y!=0 ) ? 1 : 0;
WRITE_ENABLE[2] = ( IN_A.Z!=0 ) ? 1 : 0;
WRITE_ENABLE[3] = ( IN_A.W!=0 ) ? 1 : 0;
OUT = IN_B;
VE_COND_MUX_EQ4 : 3 VECTOR SOURCE OPERANDS
// only 2 unique input vectors can be from temporary storage
OUT.X = ( IN_A.X==0 ) ? IN_B.X : IN_C.X;
OUT.Y = ( IN_A.Y==0 ) ? IN_B.Y : IN_C.Y;
OUT.Z = ( IN_A.Z ==0 ) ? IN_B.Z : IN_C.Z;
© 2010 Advanced Micro Devices, Inc.
Proprietary
86
Revision 1.5
June 8, 2010
OUT.W = ( IN_A.W==0 ) ? IN_B.W : IN_C.W;
VE_COND_MUX_GT4 : 3 VECTOR SOURCE OPERANDS
// only 2 unique input vectors can be from temporary storage
OUT.X = ( IN_A.X>0 ) ? IN_B.X : IN_C.X;
OUT.Y = ( IN_A.Y>0 ) ? IN_B.Y : IN_C.Y;
OUT.Z = ( IN_A.Z >0 ) ? IN_B.Z : IN_C.Z;
OUT.W = ( IN_A.W>0 ) ? IN_B.W : IN_C.W;
VE_COND_MUX_GTE4 : 3 VECTOR SOURCE OPERANDS
// only 2 unique input vectors can be from temporary storage
OUT.X = ( IN_A.X>=0 ) ? IN_B.X : IN_C.X;
OUT.Y = ( IN_A.Y>=0 ) ? IN_B.Y : IN_C.Y;
OUT.Z = ( IN_A.Z >=0 ) ? IN_B.Z : IN_C.Z;
OUT.W = ( IN_A.W>=0 ) ? IN_B.W : IN_C.W;
VE_SET_GREATER_THAN: 2 VECTOR SOURCE OPERANDS
OUT.X = (IN_A.X > IN_B.X) ? 1.0 : 0.0;
OUT.Y = (IN_A.Y > IN_B.Y) ? 1.0 : 0.0;
OUT.Z = (IN_A.Z > IN_B.Z) ? 1.0, 0.0;
OUT.W = (IN_A.W > IN_B.W) ? 1.0, 0.0;
VE_SET_EQUAL:
2 VECTOR SOURCE OPERANDS
OUT.X = (IN_A.X == IN_B.X) ? 1.0 : 0.0;
OUT.Y = (IN_A.Y== IN_B.Y) ? 1.0 : 0.0;
OUT.Z = (IN_A.Z == IN_B.Z) ? 1.0, 0.0;
OUT.W = (IN_A.W == IN_B.W) ? 1.0, 0.0;
VE_SET_NOT_EQUAL:
2 VECTOR SOURCE OPERANDS
OUT.X = (IN_A.X != IN_B.X) ? 1.0 : 0.0;
OUT.Y = (IN_A.Y != IN_B.Y) ? 1.0 : 0.0;
OUT.Z = (IN_A.Z != IN_B.Z) ? 1.0, 0.0;
© 2010 Advanced Micro Devices, Inc.
Proprietary
87
Revision 1.5
June 8, 2010
OUT.W = (IN_A.W != IN_B.W) ? 1.0, 0.0;
NOTES
* A Vector Move Instruction can be accomplished via a VE_ADD with other source operand set to (0,0,0,0).
* A 3-Component Dot Product can be accomplished via a VE_DOT_PRODUCT with 4 th components forced to 0.0.
7.5.8 SCALAR INSTRUCTIONS
The scalar (math) instructions have changed their src operands somewhat for R300. The general rules are as
follows:
1.
Only w channels of src operands are available for math ops
2.
For all 1 source operand instructions, the input is IN_A.W (except for ME_EXP_BASEE_FF
because of rule 3 below)
3.
All source operands which are powers (e^x, 2^x, x^y, etc) will be on IN_C.W, all source operands
which are bases will be on IN_A.W and all sources which are clamps will be on IN_B.W. As long
as the compiler (driver) replicates the last valid src operand to all unused src operands, the
behavior looks clean as follows:
i. 1 source operand instructions (like e^x), the x would be in IN_C.W, but it can appear as
if in IN_A.W as long as this value is replicated
ii. 2 source operand instructions (like x^y), the base is in the IN_A.W, and the pow is in
IN_C.W, but it can appear as if in IN_B.W as long as this value is replicated.
All of the function definitions below are written with the assumption that the last valid source operand is replicated
to the “unused” source operands. The HW does not always use the source operands specified, sometimes it relies on
the replication. These will be noted in comments below.
ME_EXP_BASE2_DX:
1 SCALAR SOURCE OPERAND
OUT.X = 2 ^ FLOOR(IN_A.W);
IF (IN_A.W > 128.0) {
OUT.Y = 0.0; //NOTE: THIS IS NOT EQUIV TO DX BEHAVIOR
} ELSE {
OUT.Y = FRAC(IN_A.W);
}
OUT.Z = 2 ^ (IN_A.W);
OUT.W = 1.0;
ME_LOG_BASE2_DX:
1 SCALAR SOURCE OPERAND
IF(IN_A.W == 0.0) {
OUT.X = MINUS_MAX_FLOAT;
© 2010 Advanced Micro Devices, Inc.
Proprietary
88
Revision 1.5
June 8, 2010
OUT.Y = 1.0;
OUT.Z = MINUS_MAX_FLOAT;
OUT.W = 1.0;
} ELSE {
OUT.X = Unbiased exponent of ABS(IN_A.W) as float(i.e. 4.0 -> 2.0);
OUT.Y = mantissa of IN_A.W as float (1.0 <= OUT.Y < 2.0);
OUT.Z = LOG2(ABS(IN_A.W));
OUT.W = 1.0;
}
ME_EXP_BASEE_FF: 1 SCALAR SOURCE OPERAND
OUT.X = e ^ (IN_A.W); //NOTE WAS IN_A.X FOR R200 *FROM C.W, IN_A.W if operand
replicate
OUT.Y = OUT.Z = OUT.W = OUT.X;
ME_LIGHT_COEFF_DX: 3 SCALAR SOURCE OPERANDS (NO MACRO -> NO 3 UNIQUE
TEMPS)
This function was a single vector source operand for R200. Now it uses 3 vector source operands
(w components only).
The 3 operands may be the same vector using different swizzles to emulate R200 behavior.
OUT.X = 1.0;
OUT.Y = MAX(IN_B.W, 0.0);
IF(IN_B.W > 0) {
IN_C.W = CLAMP(IN_C.W, -128.0, 128.0);
OUT.Z = (MAX(IN_A.W, 0.0)) ^ IN_C.W;
} ELSE {
OUT.Z = 0.0;
}
OUT.W = 1.0;
ME_POWER_FUNC_FF:
© 2010 Advanced Micro Devices, Inc.
Proprietary
2 SCALAR SOURCE OPERANDS (IN ONE VECTOR)
89
Revision 1.5
June 8, 2010
IF(IN_A.W < 0.0) {
OUT.X = - (ABS(IN_A.W) ^ IN_B.W); //IN_B.W is from IN_C.W, but same if operand
replicate
} ELSE {
OUT.X = IN_A.W ^ IN_B.W;
}
OUT.Y = OUT.Z = OUT.W = OUT.X;
Special cases (in order of detection) are (using x^n notation):
0.0^-n  Plus Infinity
0.0^n  0.0
x ^ 0.0  1.0
Inf ^-n  0.0
Inf ^n -> Inf
IF (x >1.0 and n == -Inf)  0.0
IF (x <1.0 and n == -Inf)  Inf
IF (x >1.0 and n == Inf)  Inf
IF (x <1.0 and n == Inf)  0.0
ME_RECIP_DX:
1 SCALAR SOURCE OPERAND
OUT.X = 1.0 / IN_A.W
OUT.Y = OUT.Z = OUT.W = OUT.X;
An input of 0.0 yields a result of MAX_FLOAT.
ME_RECIP_FF:
1 SCALAR SOURCE OPERAND
OUT.X = 1.0 / IN_A.W
OUT.Y = OUT.Z = OUT.W = OUT.X;
An input of 0.0 yields a result of zero.
ME_RECIP_SQRT_DX:
1 SCALAR SOURCE OPERAND
OUT.X = 1.0 / SQRT(ABS(IN_A.W))
OUT.Y = OUT.Z = OUT.W = OUT.X;
© 2010 Advanced Micro Devices, Inc.
Proprietary
90
Revision 1.5
June 8, 2010
An input of 0.0 yields a result of MAX_FLOAT.
ME_RECIP_SQRT_FF:
1 SCALAR SOURCE OPERAND
OUT.X = 1.0 / SQRT(ABS(IN_A.W))
OUT.Y = OUT.Z = OUT.W = OUT.X;
An input of 0.0 yields a result of zero.
ME_MULTIPLY:
2 SCALAR SOURCE OPERANDS (IN ONE VECTOR)
OUT.X = IN_A.W * IN_B.W;
OUT.Y = OUT.Z = OUT.W = OUT.X;
ME_EXP_BASE2:
1 SCALAR SOURCE OPERAND
OUT.X = 2.0 ^ (IN_A.W); //*FROM C.W, IN_A.W if operand replicate
OUT.Y = OUT.Z = OUT.W = OUT.X;
ME_LOG_BASE2:
1 SCALAR SOURCE OPERAND
OUT.X = LOG2(ABS(IN_A.W));
OUT.Y = OUT.Z = OUT.W = OUT.X;
An input of 0.0 yields a result of MINUS_MAX_FLOAT.
ME_POWER_FUNC_FF_CLAMP_B: 3 SCALAR SOURCE OPERANDS (NO MACRO)
IF (IN_A.W < IN_B.W) { //IN_B.W is the clamp value.
OUT.X = 0.0;
} ELSE {
SAME BEHAVIOR AS ME_POWER_FUNC_FF WITH IN_A.W as base and IN_C.W as
power (not IN_B.W).
}
OUT.Y = OUT.Z = OUT.W = OUT.X;
ME_POWER_FUNC_FF_CLAMP_B1: 3 SCALAR SOURCE OPERANDS (NO MACRO)
IF (IN_A.W < IN_B.W) { //IN_B.W is the clamp value.
OUT.X = 0.0;
} ELSE IF (IN_A.W > 1.0) {
© 2010 Advanced Micro Devices, Inc.
Proprietary
91
Revision 1.5
June 8, 2010
OUT.X = 1.0;
} ELSE {
SAME BEHAVIOR AS ME_POWER_FUNC_FF WITH IN_A.W as base and IN_C.W as
power (not IN_B.W).
}
OUT.Y = OUT.Z = OUT.W = OUT.X;
ME_POWER_FUNC_FF_CLAMP_01: 2 SCALAR SOURCE OPERANDS
IF (IN_A.W <= 0.0) {
OUT.X = 0.0;
} ELSE IF (IN_A.W > 1.0) {
OUT.X = 1.0;
} ELSE {
SAME BEHAVIOR AS ME_POWER_FUNC_FF
}
OUT.Y = OUT.Z = OUT.W = OUT.X;
ME_SIN:
1 SCALAR SOURCE OPERAND
OUT.X = SIN(IN_A.W);
OUT.Y = OUT.Z = OUT.W = OUT.X;
The hardware implementation of SIN/COS clamps the input, including nans and infs, to -pi to +pi
before computing the output, so for any inputs outside that range, cos(x) = -1 and sin(x) = 0. Except
for inputs of zero where sin(0) = 0, the minimum value that this function will output is +/0x33800000. In other words, the absolute value of the output is clamped to 0x33800000 minimum
except for sin(0) and sin(+/-pi).
ME_COS:
1 SCALAR SOURCE OPERAND
OUT.X = COS(IN_A.W);
OUT.Y = OUT.Z = OUT.W = OUT.X;
The hardware implementation of SIN/COS clamps the input, including nans and infs, to -pi to +pi
before computing the output, so for any inputs outside that range, cos(x) = -1 and sin(x) = 0. Except
for inputs of zero where sin(0) = 0, the minimum value that this function will output is +/0x33800000. In other words, the absolute value of the output is clamped to 0x33800000 minimum
except for sin(0) and sin(+/-pi).
ME_LOG_BASE2_IEEE:
© 2010 Advanced Micro Devices, Inc.
Proprietary
1 SCALAR SOURCE OPERAND
92
Revision 1.5
June 8, 2010
OUT.X = LOG2(ABS(IN_A.W));
OUT.Y = OUT.Z = OUT.W = OUT.X;
An input of 0.0 yields a result of minus infinity.
ME_RECIP_IEEE:
1 SCALAR SOURCE OPERAND
OUT.X = 1.0 / IN_A.W
OUT.Y = OUT.Z = OUT.W = OUT.X;
An input of 0.0 yields a result of infinity.
ME_RECIP_SQRT_IEEE:
1 SCALAR SOURCE OPERAND
OUT.X = 1.0 / SQRT(ABS(IN_A.W))
OUT.Y = OUT.Z = OUT.W = OUT.X;
An input of 0.0 yields a result of infinity.
ME_PRED_SET_EQ:
1 SCALAR SOURCE OPERAND
IF(IN_A.W==0) {
PREDICATE_BIT = 1;
OUT.X = OUT.Y = OUT.Z = OUT.W = 0;
} ELSE {
PREDICATE_BIT = 0;
OUT.X = OUT.Y = OUT.Z = OUT.W = 1;
}
ME_PRED_SET_GT:
1 SCALAR SOURCE OPERAND
IF(IN_A.W > 0) {
PREDICATE_BIT = 1;
OUT.X = OUT.Y = OUT.Z = OUT.W = 0;
} ELSE {
PREDICATE_BIT = 0;
OUT.X = OUT.Y = OUT.Z = OUT.W = 1;
}
© 2010 Advanced Micro Devices, Inc.
Proprietary
93
Revision 1.5
ME_PRED_SET_GTE:
June 8, 2010
1 SCALAR SOURCE OPERAND
IF(IN_A.W >= 0) {
PREDICATE_BIT = 1;
OUT.X = OUT.Y = OUT.Z = OUT.W = 0;
} ELSE {
PREDICATE_BIT = 0;
OUT.X = OUT.Y = OUT.Z = OUT.W = 1;
}
ME_PRED_SET_NEQ:
1 SCALAR SOURCE OPERAND
IF(IN_A.W != 0) {
PREDICATE_BIT = 1;
OUT.X = OUT.Y = OUT.Z = OUT.W = 0;
} ELSE {
PREDICATE_BIT = 0;
OUT.X = OUT.Y = OUT.Z = OUT.W = 1;
}
ME_PRED_SET_CLR:
0 SCALAR SOURCE OPERANDS
PREDICATE_BIT = 1;
OUT.X = OUT.Y = OUT.Z = OUT.W = MAX_FLOAT;
ME_PRED_SET_INV:
1 SCALAR SOURCE OPERAND
IF(IN_A.W==1) {
PREDICATE_BIT = 1;
OUT.X = OUT.Y = OUT.Z = OUT.W = 0;
} ELSE {
PREDICATE_BIT = 0;
IF(IN_A.W==0) {
OUT.X = OUT.Y = OUT.Z = OUT.W = 1;
© 2010 Advanced Micro Devices, Inc.
Proprietary
94
Revision 1.5
June 8, 2010
} ELSE {
OUT.X = OUT.Y = OUT.Z = OUT.W = IN_A.W;
}
}
ME_PRED_SET_POP:
1 SCALAR SOURCE OPERAND
OUT.W = IN_A.W – 1.0;
IF(OUT.W < 0) {
PREDICATE_BIT = 1;
OUT.W = 0;
} ELSE {
PREDICATE_BIT = 0;
}
OUT.X = OUT.Y = OUT.Z = OUT.W;
ME_PRED_SET_RESTORE:
1 SCALAR SOURCE OPERAND
IF(IN_A.W==0) {
PREDICATE_BIT = 1;
OUT.X = OUT.Y = OUT.Z = OUT.W = 0;
} ELSE {
PREDICATE_BIT = 0;
OUT.X = OUT.Y = OUT.Z = OUT.W = IN_A.W;
}
7.5.9 PVS INSTRUCTION DEFINITION
© 2010 Advanced Micro Devices, Inc.
Proprietary
95
Revision 1.5
June 8, 2010
PVS INSTRUCTION
Description of PVS 128-bit Instruction for Vector Memory
Field Name
PVS_OP_DST_OPERAND
PVS_SRC_OPERAND_0
PVS_SRC_OPERAND_1
PVS_SRC_OPERAND_2
Bit(s)
31:0
63:32
95:64
127:96
Description
Defines the opcode and destination operand.
Defines the first source operand for the instruction.
Defines the first source operand for the instruction.
Defines the first source operand for the instruction.
PVS Source Operand Description
Applies to PVS_SRC_OPERAND_0,1 & 2
Field Name
PVS_SRC_REG_TYPE
SPARE_0
PVS_SRC_ABS_XYZW
PVS_SRC_ADDR_MODE_0
PVS_SRC_OFFSET
PVS_SRC_SWIZZLE_X
PVS_SRC_SWIZZLE_Y
PVS_SRC_SWIZZLE_Z
PVS_SRC_SWIZZLE_W
PVS_SRC_MODIFIER_X
PVS_SRC_MODIFIER_Y
PVS_SRC_MODIFIER_Z
PVS_SRC_MODIFIER_W
PVS_SRC_ADDR_SEL
PVS_SRC_ADDR_MODE_1
Bit(s)
1:0
2
3
4
12:5
15:13
18:16
21:19
24:22
25
26
27
28
30:29
31
Description
Defines the Memory Select (Register Type) for the Source Operand. See
Below.
If set, Take absolute value of all 4 components of input vector.
Combine ADDR_MODE_1 (msb) with ADDR_MODE_0 (lsb) to form 2-bit
ADDR_MODE as follows:
0 = Absolute addressing
1 = Relative addressing using A0 register
2 = Relative addressing using I0 register (loop index)
Vector Offset into selected memory (Register Type)
X-Component Swizzle Select. See Below
Y-Component Swizzle Select. See Below
Z-Component Swizzle Select. See Below
W-Component Swizzle Select. See Below
If set, Negate X Component of input vector.
If set, Negate Y Component of input vector.
If set, Negate Z Component of input vector.
If set, Negate W Component of input vector.
When PVS_SRC_ADDR_MODE is set, this selects which component of the
4-component address register to use.
Combine ADDR_MODE_1 (msb) with ADDR_MODE_0 (lsb) to form 2-bit
ADDR_MODE as follows:
0 = Absolute addressing
1 = Relative addressing using A0 register
2 = Relative addressing using I0 register (loop index)
The memory selects (or register type) valid selections are as follows:
© 2010 Advanced Micro Devices, Inc.
Proprietary
96
Revision 1.5
June 8, 2010
SOURCE REG_TYPES:
PVS_SRC_REG_TEMPORARY
= 0; //Intermediate storage
PVS_SRC_REG_INPUT
= 1; //Input Vertex Storage
PVS_SRC_REG_CONSTANT
= 2; //Constant State Storage
PVS_SRC_REG_ALT_TEMPORARY
= 3; //Alternate Intermediate Storage
The valid swizzle selects are as follows:
PVS_SRC_SELECT_X
= 0; //Select X Component
PVS_SRC_SELECT_Y
= 1; //Select Y Component
PVS_SRC_SELECT_Z
= 2; //Select Z Component
PVS_SRC_SELECT_W
= 3; //Select W Component
PVS_SRC_SELECT_FORCE_0
= 4; //Force Component to 0.0
PVS_SRC_SELECT_FORCE_1
= 5; //Force Component to 1.0
For R5xx VS3.0, the PVS_SRC_ABS_XYZW bits enables the absolute value for the four components of the source
vector.
© 2010 Advanced Micro Devices, Inc.
Proprietary
97
Revision 1.5
June 8, 2010
PVS Opcode & Destination Operand Description
Field Name
PVS_DST_OPCODE
PVS_DST_MATH_INST
PVS_DST_MACRO_INST
PVS_DST_REG_TYPE
PVS_DST_ADDR_MODE_1
Bit(s)
5:0
6
7
11:8
12
PVS_DST_OFFSET
PVS_DST_WE_X
PVS_DST_WE_Y
PVS_DST_WE_Z
PVS_DST_WE_W
PVS_DST_VE_SAT
19:13
20
21
22
23
24
PVS_DST_ME_SAT
PVS_DST_PRED_ENABLE
25
26
PVS_DST_PRED_SENSE
27
PVS_DST_DUAL_MATH_OP
PVS_DST_ADDR_SEL
PVS_DST_ADDR_MODE_0
28
30:29
31
Description
Selects the Operation which is to be performed.
Specifies a Math Engine Operation
Specifies a Macro Operation
Defines the Memory Select (Register Type) for the Dest Operand.
Combine ADDR_MODE_1 (msb) with ADDR_MODE_0 (lsb) to form 2-bit
ADDR_MODE as follows:
0 = Absolute addressing
1 = Relative addressing using A0 register
2 = Relative addressing using I0 register (loop index)
Vector Offset into the Selected Memory
Write Enable for X Component
Write Enable for Y Component
Write Enable for Z Component
Write Enable for W Component
Vector engine operation is saturate clamped between 0 and 1 (all
components)
Math engine operation is saturate clamped between 0 and 1 (all components)
Operation is predicated – Operation writes if predicate bit matches predicate
sense.
Operation predication sense – If set, operation writes if predicate bit is set. If
reset, operation writes if predicate bit is reset.
Set to describe a dual-math op.
When PVS_DST_ADDR_MODE is set, this selects which component of the
4-component address register to use.
Combine ADDR_MODE_1 (msb) with ADDR_MODE_0 (lsb) to form 2-bit
ADDR_MODE as follows:
0 = Absolute addressing
1 = Relative addressing using A0 register
2 = Relative addressing using I0 register (loop index)
For R5xx VS3.0, the PVS_DST_VE_SAT and PVS_DST_ME_SAT bits enable a zero to one saturate clamp for all
four component of the output.
For R5xx VS3.0, the PVS_DST_PRED_ENABLE and PVS_DST_PRED_SENSE bits enable predicated writes for
the temporary memory, the output memory, the alternate temporary memory, the address register, and the input
memory. The PVS_DST_PRED_ENABLE enables the feature while PVS_DST_PRED_SENSE determines the
polarity of the predication bit for the write to be enabled. When the predication bit matches the predication sense,
the predicated write is enabled. For dual vector/math engine operations, both operations are predicated.
The PVS_DST_MACRO_INST bit was meant to be used for MACROS such as a vector-matrix multiply, but
currently is only set for the following cases:
© 2010 Advanced Micro Devices, Inc.
Proprietary
98
Revision 1.5
June 8, 2010

A VE_MULTIPLY_ADD or VE_MULTIPLYX2_ADD instruction with all 3 source operands using
unique PVS_REG_TEMPORARY vector addresses. Since R300 only has two read ports on the temporary
memory, this special case of these instructions is broken up (by the HW) into 2 operations.

When the MACRO enable bit is set, the opcode (lower 6 bits is remapped as follows:
PVS_MACRO_OP_2CLK_MADD
=0
PVS_MACRO_OP_2CLK_M2X_ADD = 1
The PVS_DST_MATH_INST is used to identify whether the instruction is a Vector Engine instruction or a Math
Engine instruction.
The PVS_DST_OPCODE values are listed below:
VECTOR_NO_OP
= 0
VE_DOT_PRODUCT
= 1
VE_MULTIPLY
= 2
VE_ADD
= 3
VE_MULTIPLY_ADD
= 4
VE_DISTANCE_VECTOR
= 5
VE_FRACTION
= 6
VE_MAXIMUM
= 7
VE_MINIMUM
= 8
VE_SET_GREATER_THAN_EQUAL
VE_SET_LESS_THAN
= 10
VE_MULTIPLYX2_ADD
= 11
VE_MULTIPLY_CLAMP
= 12
VE_FLT2FIX_DX
= 13
VE_FLT2FIX_DX_RND
= 14
// NEW R5xx OPCODES - below
VE_PRED_SET_EQ_PUSH
= 15
VE_PRED_SET_GT_PUSH
= 16
VE_PRED_SET_GTE_PUSH
= 17
VE_PRED_SET_NEQ_PUSH
= 18
VE_COND_WRITE_EQ
= 19
VE_COND_WRITE_GT
= 20
VE_COND_WRITE_GTE
= 21
VE_COND_WRITE_NEQ
= 22
VE_COND_MUX_EQ
= 23
VE_COND_MUX_GT
= 24
VE_COND_MUX_GTE
= 25
VE_SET_GREATER_THAN
= 26
VE_SET_EQUAL
= 27
VE_SET_NOT_EQUAL
= 28
MATH_NO_OP
ME_EXP_BASE2_DX
ME_LOG_BASE2_DX
ME_EXP_BASEE_FF
ME_LIGHT_COEFF_DX
ME_POWER_FUNC_FF
ME_RECIP_DX
© 2010 Advanced Micro Devices, Inc.
Proprietary
=
=
=
=
=
=
=
= 9
0
1
2
3
4
5
6
99
Revision 1.5
June 8, 2010
ME_RECIP_FF
= 7
ME_RECIP_SQRT_DX
= 8
ME_RECIP_SQRT_FF
= 9
ME_MULTIPLY
= 10
ME_EXP_BASE2_FULL_DX
= 11
ME_LOG_BASE2_FULL_DX
= 12
ME_POWER_FUNC_FF_CLAMP_B = 13
ME_POWER_FUNC_FF_CLAMP_B1 = 14
ME_POWER_FUNC_FF_CLAMP_01 = 15
ME_SIN
= 16
ME_COS
= 17
// NEW R5xx OPCODES - below
ME_LOG_BASE2_IEEE
= 18
ME_RECIP_IEEE
= 19
ME_RECIP_SQRT_IEEE
= 20
ME_PRED_SET_EQ
= 21
ME_PRED_SET_GT
= 22
ME_PRED_SET_GTE
= 23
ME_PRED_SET_NEQ
= 24
ME_PRED_SET_CLR
= 25
ME_PRED_SET_INV
= 26
ME_PRED_SET_POP
= 27
ME_PRED_SET_RESTORE
= 28
DEST REG_TYPES:
PVS_DST_REG_TEMPORARY
= 0; //Intermediate storage
PVS_DST_REG_A0
PVS_DST_REG_OUT
= 1; //Address Register Storage
= 2; //Output Memory. Used for all outputs
PVS_DST_REG_OUT_REPL_X
= 3; //Output Memory & Replicate X to all channels
PVS_DST_REG_ALT_TEMPORARY
= 4; //Alternate Intermediate Storage
PVS_DST_REG_INPUT
= 5; //Output Memory & Replicate X to all channels
The PVS_REG_A0 may only be used as the destination operand register type when using the VE_FLT2FIX_DX or
the VE_FLT2FIX_DX_RND opcodes.
For R300, PVS_REG_OUT_* is replaced by the single PVS_REG_OUT and the PVS_DST_OFFSET field will be
used to place data in the appropriate vectors. This allows the PVS Output Vertex memories to be variable format for
the variable vertex methodology. The PVS_REG_OUT_REPL_X is equivalent to PVS_REG_OUT except that it
forces the X channel to be replicated onto all 4 output channels. This capability is used to allow the mapping of
Point-Sprite and Discrete Fog to any output memory channel from an instruction with a unique x-channel output.
The PVS_DST_DUAL_MATH_OP bit must be set when combining Vector and Math Engine operations.
The PVS_DST_ADDR_MODE and DST_ADDR_SEL are the same as the SRC operand definitions.
© 2010 Advanced Micro Devices, Inc.
Proprietary
100
Revision 1.5
June 8, 2010
Dual Math Instruction (Replaces PVS_SRC_OPERAND_2)
Field Name
PVS_SRC_REG_TYPE
PVS_DST_OPCODE_MSB
PVS_SRC_ABS_XYZW
PVS_SRC_ADDR_MODE_0
PVS_SRC_OFFSET
PVS_SRC_SWIZZLE_X
PVS_SRC_SWIZZLE_Y
DUAL_MATH_DST_OFFSET
PVS_DST_OPCODE
PVS_SRC_MODIFIER_X
PVS_SRC_MODIFIER_Y
PVS_DST_WE_SEL
PVS_SRC_ADDR_SEL
PVS_SRC_ADDR_MODE_1
Bit(s)
1:0
2
3
4
12:5
15:13
18:16
20:19
24:21
25
26
28:27
30:29
31
Description
Defines the Memory Select (Register Type) for the Source Operand. See
Below.
Math Opcode MSB for Dual Math Inst.
If set, Take absolute value of both components of Dual Math input vector.
Combine ADDR_MODE_1 (msb) with ADDR_MODE_0 (lsb) to form 2-bit
ADDR_MODE as follows:
0 = Absolute addressing
1 = Relative addressing using A0 register
2 = Relative addressing using I0 register (loop index)
Vector Offset into selected memory (Register Type)
X-Component Swizzle Select. See Below
Y-Component Swizzle Select. See Below
Selects Dest Address ATRM 0-3 for Math Inst.
Math Opcode for Dual Math Inst.
If set, Negate X Component of input vector.
If set, Negate Y Component of input vector.
Encoded Write Enable for Dual Math Op Inst (0 = X, 1 = Y, 2 = Z, 3 = W)
When PVS_SRC_ADDR_MODE is set, this selects which component of the
4-component address register to use.
Combine ADDR_MODE_1 (msb) with ADDR_MODE_0 (lsb) to form 2-bit
ADDR_MODE as follows:
0 = Absolute addressing
1 = Relative addressing using A0 register
2 = Relative addressing using I0 register (loop index)
The PVS_DST_OPCODE_MSB is the most significant bit of the PVS_DST_OPCODE field to be used for the math
engine for dual ops. This enables math engine operations 16 through 28 to be used during dual ops.
For R5xx VS3.0, a PVS_SRC_ABS_XYZW bits enables the absolute value for the two components of the dual op
math engine source vector.
7.6 Setting-Up and Starting the VAP
The following method of programming is required in order to get the VAP to run.
The format and storage method for vertex data must be conveyed to the VAP by loading the set of Address and
Attribute registers for the Multiple Arrays of Structures paradigm. The Vertex Format register also must be loaded.
After all of the registers have been set-up, the VAP is started by a single write to the Vertex Fetcher Control
Register (VF_CNTL). This register is said to be an “initiator”, or “trigger” register, because of its characteristic of
causing the VAP to begin running. A single primitive or a group of primitives can be processed as a result of the
single trigger; the exact number of primitives being controlled by the NUM_VERTICES field of the Vertex Fetcher
Control Register.
Depending on the data-flow configuration of the VAP (controlled by the VTX_AMODE and VTX_LOCN fields of
© 2010 Advanced Micro Devices, Inc.
Proprietary
101
Revision 1.5
June 8, 2010
the Vertex Control Register), the VAP may expect an external entity (the host, or Command Processor) to deliver
data for the current operation. It is the responsibility of the external entity to perform the exact number of register
writes in accordance with the value set in the NUM_VERTICES field; otherwise the VAP will hang. For Index
data, the host must write to any dword in the PORT_IDX range; and for parameter data, the host must write to any
dword in the PORT_DATA range.
Once the VAP has completed processing the number of vertices specified in the NUM_VERTICES field, it goes
back to an idle state, waiting for another trigger.
7.7 Methods of Passing Vertex Data
There are three parameters that characterize the passing of vertex data for 3D primitives to the Graphics Controller.
1) Location: Embedded vs. Separate.
In Embedded mode, the vertex information is present directly in the command packet.
In Separate Mode, the command packet contains a pointer to another memory area containing the
vertex information.
2) Addressing Mode: Immediate vs. Indexed.
The vertex information can be expressed as either the vertex data itself (Immediate Mode), or a list of
indices into a buffer of vertices (Indexed Mode).
3) Format: Examples are: StructureOfArrays(SOA), ArrayOfStructures(AOS), Strided Vertex Format.
The format of the vertex data is conveyed to the Setup Engine via the flexible vertex format register, as
well as the address and attribute registers for the Multiple Array of Structures.
The Location and Addressing Mode fields control the “data-flow configuration” of the VAP, specifying what type of
information will be flowing on the register backbone and on the memory backbone while the VAP is processing a
command packet.
© 2010 Advanced Micro Devices, Inc.
Proprietary
102
Revision 1.5
June 8, 2010
8. Fragment Shaders
8.1 Introduction
This section describes the functional behavior of the Universal Shaders of on R5xx.
8.2 Instructions
There are 512 instruction slots. A program can begin execution at any address. In the absence of flow control,
programs will increment the program counter after each instruction. The program counter wraps at 512
automatically, so it is valid to load shader programs which utilize the bottommost and topmost regions of the
instruction store.
Each instruction can be one of four types:
US_INST_TYPE_ALU
Arithmetic and Logic Unit instruction
US_INST_TYPE_OUTPUT
Output instruction (with ALU functionality)
US_INST_TYPE_FC
Flow Control instruction
US_INST_TYPE_TEX
Texture instruction
ALU and OUTPUT instructions both have full RGB and Alpha math functionality. The only functional difference
between them is that ALU instructions can set the predicate bits, and OUTPUT instructions can write to the output
registers. There is no way to do both in the same instruction. Internally, the sequencer must treat instructions that
have potential outputs specially for scheduling. The last executed instruction of the shader program must also be an
OUTPUT instruction, even if it's not outputting anything interesting.
The first OUTPUT instruction will reserve space in the output register fifo. This space is limited, therefore issuing
an OUTPUT earlier than necessary may cause threads to stall earlier than necessary. You should not set an ALU
instruction as type OUTPUT unless it is actually writing to an output register, or it is the last instruction of the
program.
Flow control instructions and texture instructions each have their own interpretation of the bits in the instruction
word.
The active shader should reside in the range US_CODE_RANGE.CODE_ADDR to
US_CODE_RANGE.CODE_ADDR + US_CODE_RANGE.CODE_SIZE, inclusive (note that
US_CODE_RANGE.CODE_SIZE is the size of the shader program, minus one). You may setup additional shaders
in advance outside of this range, but the current shader should not attempt to execute code outside of this range.
The shader has an offset, US_CODE_OFFSET.OFFSET_ADDR, associated with it that is added to various
instruction addresses, minimizing the number of registers you may need to update when relocating a shader. Each
pixel starts the shader at instruction US_CODE_ADDR.START_ADDR + US_CODE_OFFSET.OFFSET_ADDR
(instruction addresses are always modulo 512). Execution continues until the program counter reaches
US_CODE_SIZE.END_ADDR + US_CODE_OFFSET.OFFSET_ADDR. It does not matter how many pixels in
the group are active (even none), the program will end after that instruction is executed. The instruction at the end
© 2010 Advanced Micro Devices, Inc.
Proprietary
103
Revision 1.5
June 8, 2010
address must be an OUTPUT instruction (even if the output mask is zero), and should always wait for the texture
unit semaphore by setting the TEX_SEM_WAIT bit (see below). At the time of termination, the contents of the
output registers are sent to the render targets.
Multiple shaders can be loaded into the instruction memory. Switching between them only requires changing global
registers like US_CODE_ADDR, US_CODE_RANGE, US_CODE_OFFSET, US_PIXSIZE, and US_FC_CTRL.
Updates to shader code outside the currently active program are safe, and do not stall the pipeline. If you intend to
overwrite the active shader, however, the pixel shader pipe must be flushed so that pixels running the old shader get
out before the update. Register writes to US_CODE_ADDR, US_CODE_RANGE, US_CODE_OFFSET, and/or
US_PIXSIZE should flush the pixel shader pipe.
The US instruction and ALU constant registers cannot be written to directly, due to addressing limitations elsewhere
in the pipe. A vector mechanism is provided in the GA block for writing to the US registers. Details on writing the
US registers are provided toward the end of this document.
8.3 Instruction Words
US_INST_TYPE_ALU / US_INST_TYPE_OUTPUT (6 registers):
 US_CMN_INST_*
 US_ALU_RGB_ADDR_*
 US_ALU_ALPHA_ADDR_*
 US_ALU_RGB_INST_*
 US_ALU_ALPHA_INST_*
 US_ALU_RGBA_INST_*
US_INST_TYPE_FC
 US_CMN_INST_*
 US_FC_INST_*
 US_FC_ADDR_*
(3 registers):
US_INST_TYPE_TEX
(4 registers):
 US_CMN_INST_*
 US_TEX_INST_*
 US_TEX_ADDR_*
 US_TEX_ADDR_DXDY_*
The FC and TEX words overlap with the ALU/OUTPUT words in instruction memory. The unused memory
locations for FC and TEX are ignored by US; they may be left uninitialized, or set to zero, with no ill effect.
However, the driver should take care to write to all registers that are required by each instruction type.
Within US_CMN_INST_*, the fields effective for each instruction type are indicated by *s:
TYPE
TEX_SEM_WAIT
RGB_PRED_SEL
RGB_PRED_INV
ALPHA_PRED_SEL
ALPHA_PRED_INV
ALU
*
*
*
*
*
*
© 2010 Advanced Micro Devices, Inc.
Proprietary
OUTPUT
*
*
*
*
*
*
FC
*
*
*
*
TEX
*
*
*
*
*
*
104
Revision 1.5
WRITE_INACTIVE
LAST
NOP
RGB_WMASK
ALPHA_WMASK
RGB_OMASK
ALPHA_OMASK
RGB_CLAMP
ALPHA_CLAMP
ALU_RESULT_SEL
ALU_RESULT_OP
ALU_WAIT
STAT_WE
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
June 8, 2010
*
*
*
*
*
*
*
*
8.3.1 Synchronization of instruction streams
The US allows you to freely intermix instructions of multiple types. It will process the three types (ALU/Output,
Texture, and FC) in parallel whenever possible. Instructions need to be synchronized when an instruction of one
type depends on the output of another type. The cases where explicit synchronization may be required are:



TEX instruction dependent on ALU for source register or predicate. Synchronized with the ALU_WAIT
bit.
FC instruction dependent on ALU for predicate or ALU result. Synchronized with the ALU_WAIT bit.
ALU instruction dependent on TEX for lookup result. Synchronized using the texture semaphore.
A texture or FC instruction that uses a result computed by a prior ALU instruction should set the ALU_WAIT bit.
This forces processing for the thread to stall until pending ALU instructions are complete. A latency of about 30
cycles is imposed on the thread.
Note that a static FC instruction never needs to set ALU_WAIT since it never depends on a result computed within
the shader. Also, an ALU instruction never needs to set ALU_WAIT -- dependencies amongst ALU instructions are
resolved internally.
The texture semaphore is used to synchronize the output of a texture instruction with a subsequent ALU or texture
instruction that uses that result. Since the latency for a texture fetch is difficult to anticipate in advance, the texture
semaphore mechanism is more complex than ALU_WAIT. The texture semaphore is described in more detail
below.
8.4 ALU Instructions
An ALU instruction actually consists of an RGB vector instruction and an Alpha scalar instruction.
There are only a few operations that only one or the other unit can compute, but in each case there is a special
instruction the other engine can use to copy the result.
8.4.1 Sources
© 2010 Advanced Micro Devices, Inc.
Proprietary
105
Revision 1.5
June 8, 2010
Each instruction can specify the addresses for 6 different sources – 3 RGB vectors and 3 Alpha scalars. Each source
can either come from one of 128 temporary registers (which can be modified during the shader, and be different for
each pixel), or from one of 256 constant registers (which can only be changed between geometry packets). In
addition, a source can be an inline constant. The loop variable (aL) may be added to any combination of source
addresses, but may not be added to an inline constant.
Each color register (temporary and constant) consists of a 3-component RGB vector and a scalar Alpha value.
Inline constants are unsigned floating-point values with 4 bits of exponent (with bias 7) and 3 bits mantissa. Inline
constants represent finite values only -- there is no representation for NaN or infinity. Inline constants can express
denormal values though. Also, the bit pattern 0x0 represents 2^-10, rather than zero. Example values are shown
below:
EXPONENT
0x0
0x0
0x0
0x1
0x7
0xf
0xf
0xf
2^-10
2^-9
2^-8
2^-7
2^-6
1
256
480
MANTISSA
0x0
0x1
0x2
0x4
0x0
0x0
0x0
0x7
You can obtain negative inline constants and the value zero using the input modifiers and swizzles, described below.
Each source is specified with three fields. Valid encodings of these fields are shown below (for source 0, in this
example):
register N
register N + aL
constant N
constant N + aL
inline const X
ADDR0[7]
0
0
N / 128
N / 128
1
ADDR0[6:0]
N
N
N % 128
N % 128
X
ADDR0_CONST
0
0
1
1
0
ADDR0_REL
0
1
0
1
0
Note that inline constants set the MSB of ADDR0 and clear ADDR0_CONST.
8.4.2 Presubtract
Each RGB and Alpha instruction has a presubtract operation, which does some extra math on incoming data from
the first or from the first and second sources. The available operations are:
US_SRCP_OP_BIAS
US_SRCP_OP_SUB
US_SRCP_OP_ADD
US_SRCP_OP_INV
1 – 2 * src0
src1 - src0
src1 + src0
1 - src0
The RGB presubtract happens on all three components in parallel. The Alpha presubtract is scalar.
© 2010 Advanced Micro Devices, Inc.
Proprietary
106
Revision 1.5
June 8, 2010
If any presubtract result is used in the instruction, and one of the sources being used in a presubtract is written in the
previous instruction, and the previous instruction is an ALU or output instruction, a NOP needs to be inserted
between the two instructions. Do this by setting the NOP flag in the previous instruction, so the NOP does not
consume an instruction slot. This allows the HW the extra cycle necessary to resolve the dependencies involved in
doing this extra math (there are additional cases where NOP may need to be set, noted below).
NOP is never required if the previous instruction is a texture lookup.
8.4.3 Inputs
Each math operation has zero to three inputs. Each input can be configured to select a source and swizzle its
channels. There are fields to configure 6 inputs per instruction: 3 for RGB and 3 for Alpha. An instruction can read
in at most 12 independent colour components (9 RGB components and 3 alpha components).
8.4.3.1 Select
Each input selects from src0, src1, src2, or the presubtract result ("srcp"). One can conceive of the selects
assembling a 4-component vector as seen below. The swizzle selects (see next section) determine which of the four
values are chosen to actually take part in the computations.
{ rgb_addr0->r
src0 ={ rgb_addr0->g
{ rgb_addr0->b
{ alpha_addr0->a
{ rgb_addr1->r
src1 ={ rgb_addr1->g
{ rgb_addr1->b
{ alpha_addr1->a
{ rgb_addr2->r
src2 ={ rgb_addr2->g
{ rgb_addr2->b
{ alpha_addr2->a
{ rgb_srcp_result.r = rgb_srcp_op(rgb_addr0->r, rgb_addr1->r)
srcp ={ rgb_srcp_result.g = rgb_srcp_op(rgb_addr0->g, rgb_addr1->g)
{ rgb_srcp_result.b = rgb_srcp_op(rgb_addr0->b, rgb_addr1->b)
{ alpha_srcp_result.a = alpha_srcp_op(alpha_addr0->a, alpha_addr1->a)
The RGB and alpha units each take three operands, A, B, and C. These operands are selected with the RGB_SEL_x
and ALPHA_SEL_x fields. Note that src0, src1 and src2 are fetched from a combination of the RGB and alpha
source addresses. If the RGB unit swizzles in an alpha component, the alpha component will always come from
alpha_addr*. Similarly, if the alpha unit swizzles in an RGB component, it will always come from rgb_addr*.
8.4.3.2 Swizzle
© 2010 Advanced Micro Devices, Inc.
Proprietary
107
Revision 1.5
June 8, 2010
Each component of each input can specify one of seven values. Each component can select R, G, B, or A from the
selected source, or it can choose 0, 0.5, or 1. The RGB unit has 3 components, so there are three swizzle select fields
per input. The Alpha unit only has 1 swizzle select per input.
The RGB unit always uses the RGB selectors (RGB_SEL_x) and, except for one case noted below, the red
(RED_SWIZ_x), green (GREEN_SWIZ_x), and blue (BLUE_SWIZ_x) swizzle selects. The alpha unit always uses
the alpha selectors (ALPHA_SEL_x) and the alpha (ALPHA_SWIZ_x) swizzle selects.
DP4 is a special case in that it is an RGB operation which operates on 4 components instead of 3. The fourth input
component is configured with the Alpha's select (ALPHA_SEL_x) and swizzle (ALPHA_SWIZ_x). This is the only
case where the Alpha's swizzle has an effect on the RGB computation's input.
8.4.3.3 Input Modifier
Each input has a modifier applied to it. The modifier can be one of:
US_IMOD_OFF
US_IMOD_NEG
US_IMOD_ABS
US_IMOD_NAB
No modification
Negate
Take absolute value
Take negative of absolute value
8.4.4 The Operation
Following are the possible math operations the ALU can perform. The three inputs are denoted by A, B, and C.
US_OP_RGB_SOP / US_OP_ALPHA_DP
US_OP_RGB_MAD / US_OP_ALPHA_MAD
US_OP_RGB_MIN / US_OP_ALPHA_MIN
US_OP_RGB_MAX / US_OP_ALPHA_MAX
US_OP_RGB_CND / US_OP_ALPHA_CND
US_OP_RGB_CMP / US_OP_ALPHA_CMP
US_OP_RGB_FRC / US_OP_ALPHA_FRC
US_OP_RGB_MDH / US_OP_ALPHA_MDH
© 2010 Advanced Micro Devices, Inc.
Proprietary
Get results from the other unit's unique ops. In the case
of RGB_SOP, the result is replicated to all three
channels. RGB's unique ops all have scalar results, so
ALPHA_DP simply copies that scalar result to its alpha
destination.
RGB_SOP is only valid if the alpha operation is a
transcendental operation: EX2, LN2, RCP, RSQ, SIN,
COS. ALPHA_DP is only valid if the RGB operation is
a dot product: DP3, DP4, D2A.
A*B+C
A<B?A:B
Minimum of A and B.
A >= B ? A : B
Maximum of A and B.
C > 0.5 ? A : B
C >= 0 ? A : B
A - floor(A)
floor(A) is the largest integer value less than or equal to
A.
A*B+C
Where:
A is forced to topleft.src0 (source select and
swizzles ignored)
108
Revision 1.5
June 8, 2010
C is forced to topright.src0 (source select and
swizzles ignored)
MDH operates on a quad of pixels at a time; A and C
will be the same value for each pixel within a quad, and
the result will also be the same if B is a constant value.
Used to computes change in horizontal direction
between neighboring pixels. For example, to get the
difference (topright.r0 - topleft.r0)
set:
src0 = r0 B = -1
Note that input modifiers work on all three inputs.
US_OP_RGB_MDV / US_OP_ALPHA_MDV
If src0 is computed in the previous instruction, then a
NOP needs to be inserted between the two instructions.
Do this by setting the NOP flag in the previous
instruction. This is not required if the previous
instruction is a texture lookup.
A*B+C
Where:
A is forced to topleft.src0 (source select and
swizzles ignored)
C is forced to bottomleft.src0 (source select and
swizzles ignored)
MDV operates on a quad of pixels at a time; A and C
will be the same value for each pixel within a quad, and
the result will also be the same if B is a constant value.
Used to computes change in vertical direction between
neighboring pixels. For example, to get the difference
(bottomleft.r0 - topleft.r0) set:
src0 = r0 B = -1
Note that input modifiers work on all three inputs.
US_OP_RGB_DP3
US_OP_RGB_DP4
US_OP_RGB_D2A
US_OP_ALPHA_EX2
© 2010 Advanced Micro Devices, Inc.
Proprietary
If src0 is computed in the previous instruction, then a
NOP needs to be inserted between the two instructions.
Do this by setting the NOP flag in the previous
instruction. This is not required if the previous
instruction is a texture lookup.
A.r*B.r + A.g*B.g + A.b*B.b
Results are broadcast to all 3 channels.
Use US_OP_ALPHA_DP to get result into Alpha.
A.r*B.r + A.g*B.g + A.b*B.b + A.a*B.a
Results are broadcast to all 3 channels.
Use US_OP_ALPHA_DP to get result into Alpha.
Note that ".a" actually comes from the alpha instruction's
swizzle and select (see the section on swizzle above).
A.r*B.r + A.g*B.g + C.b
Results are broadcast to all 3 channels.
Use US_OP_ALPHA_DP to get result into Alpha.
2^A
109
Revision 1.5
US_OP_ALPHA_LN2
US_OP_ALPHA_RCP
US_OP_ALPHA_RSQ
US_OP_ALPHA_SIN
US_OP_ALPHA_COS
June 8, 2010
Use US_OP_RGB_SOP to get result into RGB.
log2(A)
Use US_OP_RGB_SOP to get result into RGB.
1/A
Use US_OP_RGB_SOP to get result into RGB.
1 / squareRoot(A)
Use US_OP_RGB_SOP to get result into RGB.
Note that the SM3 specification defines reciprocal
square root as 1 / squareRoot(abs(A)) -- this can be
achieved by using the input modifier for A.
sin(A * 2pi)
Use US_OP_RGB_SOP to get result into RGB.
cos(A * 2pi)
Use US_OP_RGB_SOP to get result into RGB.
8.4.5 Instruction modifiers
Each instruction can have an output modifier applied to its result:
US_OMOD_U1
US_OMOD_U2
US_OMOD_U4
US_OMOD_U8
US_OMOD_D2
US_OMOD_D4
US_OMOD_D8
US_OMOD_DISABLED
Multiply by 1
Multiply by 2
Multiply by 4
Multiply by 8
Divide by 2
Divide by 4
Divide by 8
No modification
Each instruction can also be optionally clamped to the range 0 to 1. This happens after the above output modifier.
8.4.5.1 Disabling the output modifier
The multiply/divide output modifiers all convert NaN values into a standardized NaN (0x7fffffff) and squash any
denormal values to plus or minus zero. For most ALU operations this is acceptable, however a MOV instruction
needs to preserve the source exactly. For this, you can disable the output modifier for the MIN, MAX, CMP and
CND instructions. With US_OMOD_DISABLED, the result is not modified at all; the value is neither multiplied
nor divided, and clamping is not applied.
This allows a MOV to be implemented using any of the following instructions, with US_OMOD_DISABLED set:
MIN(src, src)
MAX(src, src)
CND(src, src, 0)
CMP(src, src, 0)
US_OMOD_DISABLED is not valid with any other ALU operation.
© 2010 Advanced Micro Devices, Inc.
Proprietary
110
Revision 1.5
June 8, 2010
8.4.6 Writemasks
There are a number of writemasks for each instruction:
RGB_WMASK
ALPHA_WMASK
RGB_OMASK
ALPHA_OMASK
W_OMASK
WRITE_INACTIVE
STAT_WE
RGB_PRED_SEL
RGB_PRED_INV
ALPHA_PRED_SEL
ALPHA_PRED_INV
IGNORE_UNCOVERED
ALU_WMASK
3 bits; write R,G,B to register destination.
1 bit; write A to register destination.
bits; write R,G,B to output or to predicate bits.
1 bit; write A to output or to predicate bits.
1 bit; write A to W output.
1 bit; if set, ignores flow control pixel mask when
writing. Affects ALU and texture instructions. If in
doubt, this bit should be cleared.
4 bits; Mask R,G,B,A to increment sign-count
performance counter.
3 bits; Sets one of six modes that specify which of the 4
predicate bit(s) to AND with the RGB writemask (and
output mask when applicable). One of:
NONE - no predication
RGBA - normal predication
RRRR - replicate R predicate bit
GGGG - replicate G predicate bit
BBBB - replicate B predicate bit
AAAA - replicate A predicate bit
1 bit; Inverts selected RGB predicate bit(s). Should be
zero if RGB_PRED_SEL is set to NONE.
3 bits; like RGB_PRED_SEL, but used to control
predication for the alpha unit's write mask.
1 bit; Inverts selected alpha unit predicate bit. Should be
zero if ALPHA_PRED_SEL is set to NONE.
1 bit; if set, excludes uncovered pixels (outside triangle
or killed via TEXKILL) from TEX lookups and flow
control decisions. Affects texture and flow control
instructions. If in doubt, this bit should be cleared.
1 bit; if set, update the ALU result. Similar to the
predicate write mask.
Flow control instructions only have one predicate select, using the RGB_PRED_SEL and RGB_PRED_INV fields.
ALU/Output instructions can use different predicate selects for the RGB (vector) computation and the alpha (scalar)
computation. For texture instructions, the RGB results from the texture unit will be influenced by
RGB_PRED_SEL/RGB_PRED_INV, and the alpha result from the texture unit will be influenced by the
ALPHA_PRED_SEL/ALPHA_PRED_INV fields.
8.4.7 Destination
The destination address refers to a temporary register. The loop variable (aL) may optionally be added to the address
before writing. The predicate select in RGB_PRED_SEL, RGB_PRED_INV, ALPHA_PRED_SEL, and
ALPHA_PRED_INV will be applied when writing to the destination.
© 2010 Advanced Micro Devices, Inc.
Proprietary
111
Revision 1.5
June 8, 2010
8.4.8 Output
With OUTPUT instructions, the TARGET field indicates where the result of the instruction should be written.
When in cached write mode (the default mode), the following options are available:
US_RNDR_TGT_A
US_RNDR_TGT_B
US_RNDR_TGT_C
US_RNDR_TGT_D
Write to render target A register
Write to render target B register
Write to render target C register
Write to render target D register
The US_OUT_FMT_* registers describe render targets A through D. The results are stored and the final value is
sent out when the program terminates. If a channel in an output target is written more than once, the final value
written is what will be sent out. The RGB and alpha unit may write to different targets in the same instruction.
The output may be predicated using PRED_SEL and PRED_INV.
8.4.9 Setting Predicate Bits
Each instruction may optionally set one or more predicate bits. ALU instructions (as opposed to OUTPUT
instructions) interpret the OMASK fields as a predicate writemask. The TARGET field determines when to set the
bits associated with each channel:
US_PRED_OP_EQUAL
US_PRED_OP_LESS
US_PRED_OP_GREATER_EQUAL
US_PRED_OP_NOT_EQUAL
Set when channel is zero
Set when channel is negative
Set when channel is non-negative
Set when channel is non-zero
The enumeration's names are based on the assumption that they will be primarily used after a subtraction of two
values. That's not the only possible use, of course. The RGB and alpha units may use different functions to set the
predicate in the same instruction.
In order to achieve the remaining common comparisons, <= and >, one can simply reverse the order of the values
being subtracted, or reverse both signs, and use the >= and < operations respectively.
You can simultaneously write to the predicate register and a temporary register, and you can perform a predicated
temporary register write if you are also writing the predicate register. However, the old value of the predicate will
only be applied to the temporary register's write mask; it will not be applied to the predicate write mask. In other
words, if the predicate is 0x7, your temporary write mask is 0xf and your predicate write mask is 0xf, you will write
only RGB components to the temporary register, but you will write to all 4 predicate bits.
If the instruction result is clamped, the comparison happens on the post-clamped result. If output modifier is
disabled, denormals may be compared -- denormals are equivalent to zero.
8.4.10 ALU Result
Every instruction has an "ALU result." In order to use it, an ALU instruction must write an ALU result, and a it must
be consumed by the next flow control instruction. The ALU result is preserved across other ALU/texture
© 2010 Advanced Micro Devices, Inc.
Proprietary
112
Revision 1.5
June 8, 2010
instructions that do not write a new ALU result, but is NOT preserved across flow control instructions; therefore the
ALU result must be consumed by the first flow control statement after it is written.
The ALU result is a single bit. The channel source for the ALU result is selected by the ALU_RESULT_SEL field:
US_ALU_RESULT_SEL_RED
US_ALU_RESULT_SEL_ALPHA
How to interpret the floating point result to set the ALU result bit is specified by the ALU_RESULT_OP field,
which is similar to the interpretation of the TARGET field for setting the predicate bits:
US_ALU_RESULT_OP_EQUAL
US_ALU_RESULT_OP_LESS
US_ALU_RESULT_OP_GREATER_EQUAL
US_ALU_RESULT_OP_NOT_EQUAL
Set when channel is zero
Set when channel is negative
Set when channel is non-negative
Set when channel is non-zero
The ALU instruction that updates the ALU result must set the ALU_WMASK bit.
If the instruction result is clamped, the comparison happens on the post-clamped result. If output modifier is
disabled, denormals may be compared -- denormals are equivalent to zero.
8.5 Texture Instructions
Texture instructions are simpler than ALU or flow control instructions. Texture instructions have one destination
temporary address, 1 to 3 source temporary addresses, a sampler ID, and an opcode and control bits specifying how
to lookup the texture. Most texture configuration is handled in the per-sampler configuration.
As with ALU temporary addresses, the loop variable (aL) may be added to any texture temporary address (source
and destination). Texture source addresses allow arbitrary swizzles from RGBA to STRQ coordinate space, and the
RGBA result from the texture unit may also be swizzled. Unlike with ALU instructions, the texture swizzles cannot
be used to select constant inputs (0, 0.5, 1). Texture source addresses always read from the temporary registers; they
cannot read from the constant bank.
Texture instructions feature a texture semaphore mechanism to synchronize texture lookup with instructions using
the result of the lookup. See below for more information.
You may choose to limit which channels of a texture lookup are written by using the write masks RGB_WMASK
and ALPHA_WMASK. These write masks may be predicated; the RGB results from the texture unit are predicated
with RGB_PRED_SEL and RGB_PRED_INV, while the alpha result from the texture unit is predicated with
ALPHA_PRED_SEL and ALPHA_PRED_INV.
Texture instructions have an UNSCALED bit that to control whether the texture coordinates are scaled by the
texture dimensions before lookup. In typical usage, this bit is cleared for normal texture lookups which supply
coordinates in the range [0.0, 1.0], and set for texture lookups which supply coordinates that are prescaled to the
texture dimensions.
8.5.1 Operations
There are currently 7 texture operations available.
© 2010 Advanced Micro Devices, Inc.
Proprietary
113
Revision 1.5
US_TEX_INST_NOP
US_TEX_INST_LOOKUP
US_TEX_INST_KILL_LT_0
US_TEX_INST_LOOKUP_PROJ
US_TEX_INST_LOOKUP_LODBIAS
US_TEX_INST_LOOKUP_LOD
US_TEX_INST_LOOKUP_DXDY
June 8, 2010
Perform no operation. The source addresses are ignored,
and nothing is written to the destination address. A
texture NOP may acquire the texture semaphore, so
NOP can be used for synchronization purposes.
A standard texture lookup. Reads the coordinates from
SRC_ADDR and writes the results of the lookup to
DST_ADDR.
Kill the pixel if any components in SRC_ADDR are less
than zero. Note that the source swizzles are ignored in
this case; if you want to limit which channels are
examined, you may use the write masks in
WMASK_RGB, WMASK_ALPHA, and/or predication.
Nothing is written to the destination address, but the
coverage mask may be updated.
Lookup a projected texture. Q is used for the projective
divide.
Lookup a texture, biasing the LOD that is computed.
Lookup a texture, using the value specified in the Q
coordinate of the input as an explicit LOD value.
Lookup a texture, computing a LOD based on slopes
given. This is the only opcode that uses the DX_ADDR
and DY_ADDR source addresses. These registers
contain the slope values the texture unit should use when
determining the slope.
8.5.2 Semaphore
The semaphore is used to synchronize texture lookups with their subsequent use in the shader program.
Each texture instruction has a bit, TEX_SEM_ACQUIRE, specifying whether it should hold the texture semaphore
until the looked-up data comes back and is written to the destination temporary register. All shader instructions have
another semaphore bit, TEX_SEM_WAIT, that specifies whether to wait on the semaphore so its (dependent) source
data is up to date. You may take advantage of the texture semaphore to perform various independent computations
while waiting on the texture operation to complete.
Hardware disallows more than one ACQUIRE operation at a time, so if you set TEX_SEM_ACQUIRE on a lookup,
you must also set TEX_SEM_WAIT for that instruction. WAIT has no cost if there are no outstanding ACQUIRE
operations. For an instruction with TEX_SEM_WAIT and TEX_SEM_ACQUIRE both set, the wait happens first.
There is only one texture semaphore, however you may use it to protect multiple texture lookups, as long as the
lookups are themselves independent. When a texture instruction sets TEX_SEM_ACQUIRE, the texture unit
ensures that that particular lookup, and all prior lookups, have completed before releasing the semaphore. Therefore,
to protect several texture lookups, you may set TEX_SEM_ACQUIRE only on the last texture lookup, and set
TEX_SEM_WAIT on the first instruction that uses any of the results. This example illustrates the usage:
0:
1:
2:
INSTRUCTION
r4 = TEXLD(s0, r1)
r5 = TEXLD(s1, r2)
r6 = TEXLD(s2, r3)
© 2010 Advanced Micro Devices, Inc.
Proprietary
TEX_SEM_WAIT
0
0
1
TEX_SEM_AQUIRE
0
0
1
114
Revision 1.5
3:
4:
5:
6:
r1 = r1 + 1
r2 = r2 + 1
r3 = r3 + 1
r4 = r4 + 1
June 8, 2010
0
0
0
1
In the above example, note that instruction 2 waits for the semaphore to ensure the semaphore is available before
acquiring it.
Remember that the last instruction of the shader program must set TEX_SEM_WAIT, to ensure that the texture unit
is ready to process the next quad. It is invalid to terminate the shader while holding the texture semaphore from a
texture lookup.
8.6 Flow Control
Each flow control instruction is essentially a conditional jump. Various optional stack operations allow all the
different kinds of traditional flow control statements. In particular, flow control instructions allow branch statements
(if/else/endif blocks), loop statements (with an optional loop register, aL), and subroutine calls. Optimizers may be
able to combine these basic types of instructions, and utilize more esoteric flow control modes.
HW supports two flow control modes, "partial" and "full". Partial flow control mode enables twice as many
contexts as full mode, but partial flow control mode has a limited nesting depth of branch statements, and does not
support loops or subroutine calls. Partial flow control mode should be used unless the program requires branch
statements nested more than 6 deep, or the program requires loops or subroutines. If full flow control mode is used,
then your shader must declare at least two temporary registers (the US_PIXSIZE.PIXSIZE field must be greater than
or equal to 1). The US_FC_CTRL register, described below, controls the behaviour of all flow control statements in
a program including whether to use partial or full flow control mode.
See the Fields section below for descriptions of fields that affect the jump condition and the various flow control
stacks. Following that are the values of those fields for the most common types of flow control operations.
8.6.1 Dynamic Flow Control
As the US is a SIMD engine, applying the same instruction to a group of pixels, dynamic flow control must be
implemented with pixel masks. If a pixel wants to take a jump because it failed an IF condition, but its neighbors in
the pixel group don't want to jump, the pixel must be masked off for a time until that branch of the IF statement is
completed. Only if all pixels fail the IF condition would the program counter actually be changed. Conversely, if
some pixels don't want to jump to a subroutine, they must be masked off for the entire subroutine. Only if none of
the pixels want to jump would the call be skipped. A break statement within a loop masks off passing pixels until
the loop is complete, and the program counter is only changed if all pixels want to jump.
These pixel masks are organized into stacks so flow control blocks may be nested. The operations on these stacks
are encoded in the flow control instructions as flags, instead of having one set of opcodes which hard-wire the stack
behavior. This orthogonality allows for more creative control of the shader's behavior, and provides opportunity for
optimizations in shaders that use a lot of flow control.
Jump conditions can be based off of a boolean constant, the result of the previous ALU operation, and/or a predicate
bit. Booleans are constant across all pixels, so dynamic flow control is only achieved with predicates and
conditionals (ALU result). Any ALU instruction can specify whether to write the ALU result and what channel
supplies the data for the result. The ALU result is only valid until another ALU instruction writes to the result, or a
© 2010 Advanced Micro Devices, Inc.
Proprietary
115
Revision 1.5
June 8, 2010
flow control instruction is encountered. The predicate bits can be set anywhere and are preserved across flow
control instructions, but there are only 4 of them.
Flow control predication cannot be per-channel. One of the replicate swizzles must be used for predication of flow
control instructions (all other types of instructions can be predicated per channel). Flow control instructions use the
RGB_PRED_SEL and RGB_PRED_INV fields to compute the predicate.
8.6.2 The Stacks, and Branch Counters
The HW maintains two separate stacks for flow control.
Address Stack
Purely an address stack. No other state is maintained.
Popping the address stack overrides the instruction
address field of the flow control instruction. The address
stack will only be modified if the flow control
instruction decides to jump.
Stores an internal iteration count, loop variable (aL), and
a pixel mask per frame. The only way to access the
iteration count is with the LOOP/ENDLOOP and
REP/ENDREP operations. The only way to alter the aL
variable is with the LOOP/ENDLOOP ops. The only
way to read the aL variable is with relative addressing.
The only way to alter the pixel mask is with the BREAK
or CONTINUE instruction.
Loop Stack
Each stack's size is dependent on whether the program is in partial or full flow control mode. Stack overflows and
underflows produce undefined behaviour in the hardware. The stack sizes are:
Loop stack
Address stack
PARTIAL
n/a
n/a
FULL
4
4
The loop stack is maintained in such a way that an inner REP block will continue to see the loop variable from an
outer LOOP block. Nested LOOP blocks will shadow the loop variable. The loop variable is not valid if you are not
in at least one LOOP block.
In addition to the two stacks, hardware maintains an Active Bit and a Branch Counter for each pixel that indicate
whether the pixel is active and, if it was disabled by a conditional statement (if, else), how long before it can be
reactivated. If the active bit is unset, the pixel is inactive and the branch counter indicates the number of conditional
blocks we must exit before the pixel can be activated again. The maximum value of this counter is dependent on
whether the program is in partial or full flow control mode. The limits (which determine maximum safe nesting
depth) are:
Branch counter
Maximum depth
© 2010 Advanced Micro Devices, Inc.
Proprietary
PARTIAL
0..3
4
FULL
0..31
32
116
Revision 1.5
June 8, 2010
The branch counter can be incremented and decremented directly by any flow control instruction based on whether
the pixel agrees with the jump decision. Manipulating the branch counter may affect the active bit. Incrementing
the counter on an active pixel will disable the pixel by clearing the active bit, and set the branch counter to zero.
Decrementing the counter of an inactive pixel to a negative value will set the active bit, reactivating the pixel. The
branch counter is ignored in hardware while the active bit is set.
Pixels disabled by looping statements (BREAKLOOP, BREAKREP, and CONTINUE) are also tracked with "loop
inactive" counters, however unlike the branch counter, the loop counters cannot be manipulated directly.
Since only conditional (if, else) and loop statements maintain active pixel masks, to call a function based on a
condition requires the shader to use the branch counters on CALL and RETURN so the pixel active mask will be
updated on the conditional call. If you know ahead of time that *all* calls to a particular subroutine will be
unconditional calls, you can omit the branch counter manipulation on that subroutine's return and on any calls to that
subroutine. The benefit of this is unclear, unless you are nearing the upper limit on the branch counter.
Returns within dynamic branches and/or loops (nested in the subroutine) are not supported. A return can be made
conditional (by incremeneting the branch stack counter on stay), but the hardware does not support returning within
other conditional blocks that might partially mask it. If a branch is entirely static (based on a constant boolean), you
may put a return within a branch (just get the branch counter decrement right). This cannot be done inside loops,
however.
8.6.3 Fields
8.6.3.1 Fields controlling conditions on the jump
JUMP_FUNC
Bit 0
Bit 1
Bit 2
Bit 3
Bit 4
Bit 5
Bit 6
Bit 7
2x2x2 table indicating when to jump
= Jump when (!alu_result && !predicate && !boolean).
= Jump when (!alu_result && !predicate && boolean).
= Jump when (!alu_result && predicate && !boolean).
= Jump when (!alu_result && predicate && boolean).
= Jump when ( alu_result && !predicate && !boolean).
= Jump when ( alu_result && !predicate && boolean).
= Jump when ( alu_result && predicate && !boolean).
= Jump when ( alu_result && predicate && boolean).
Common JUMP_FUNC values:
0x00
0x0f
0x33
0x55
0xaa
0xcc
0xf0
0xff
= Never jump
= Jump iff alu_result is false.
= Jump iff predicate is false.
= Jump iff boolean is false.
= Jump iff boolean is true.
= Jump iff predicate is true.
= Jump iff alu_result is true.
= Always jump
JUMP_ANY
false
How to treat partially passing groups of pixels
= Don't jump unless all pixels want to jump.
© 2010 Advanced Micro Devices, Inc.
Proprietary
117
Revision 1.5
true
June 8, 2010
= Jump if at least one active pixel wants to jump.
When JUMP_ANY is false, the instruction behaves like a universal quantifier, and will decide jump if there are no
active pixels. When JUMP_ANY is true, the instruction behaves like an existential quantifier, and will never decide
to jump if there are no active pixels. Looping statements may override the jump decision made by the pixels based
on the loop counter.
8.6.3.2 Fields controlling optional stack operation
OP
US_FC_OP_JUMP
US_FC_OP_LOOP
US_FC_OP_ENDLOOP
US_FC_OP_REP
US_FC_OP_ENDREP
US_FC_OP_BREAKLOOP
US_FC_OP_BREAKREP
US_FC_OP_CONTINUE
Loop Stack Operations
None
Initialize counter and aL, and push loop stack if stay
Increment counter and aL if jump, else pop loop stack
Initialize counter, and push loop stack if stay
Increment counter if jump, else pop loop stack
Pop loop stack if jump
Pop loop stack if jump
Disable pixels until end of current loop
You should use US_FC_OP_BREAKLOOP if the innermost looping construct is LOOP, and
US_FC_OP_BREAKREP if the innermost looping construct is REP.
A_OP
US_FC_A_OP_NONE
US_FC_A_OP_POP
US_FC_A_OP_PUSH
B_OP0
US_FC_B_OP_NONE
US_FC_B_OP_DECR
US_FC_B_OP_INCR
B_OP1
US_FC_B_OP_NONE
US_FC_B_OP_DECR
US_FC_B_OP_INCR
Address Stack Operations
= None
= Pop address stack if jump (overrides JUMP_ADDR
given in instruction)
= Push address stack if jump
Branch stack Operations if stay
= None
= Decrement branch counter for inactive pixels by
amount in B_POP_CNT. Activate pixels which go
negative.
= Increment branch counter for inactive pixels by 1.
Deactivate pixels which disagree with the jump decision
(by deciding to jump) and set their branch counter to 0.
Branch stack Operations if jump
= None
= Decrement branch counter for inactive pixels by
amount in B_POP_CNT. Activate pixels which go
negative.
= Increment branch counter for inactive pixels by 1.
Deactivate pixels which disagree with the jump decision
(by deciding not to jump) and set their branch counter to
0.
B_POP_CNT
Branch Stack Pop Count
How much to decrement the branch counters by when appropriate B_OP* field says to decrement.
© 2010 Advanced Micro Devices, Inc.
Proprietary
118
Revision 1.5
B_ELSE
false
true
June 8, 2010
Branch Stack Else
= None
= Activate pixels whose branch count is zero (pixels
deactivated by the innermost conditional block), and
deactivate all pixels that were active.
Special Cases:
 When the iteration count is zero, LOOP/REP ignore JUMP_FUNC and jump.
 When the iteration count is zero, ENDLOOP/ENDREP ignore JUMP_FUNC and don't jump.
 Any pixels deactivated by B_ELSE "want to jump" regardless of JUMP_FUNC.
 Any pixels deactivated by a branching statement (if, else) will inhibit a decision to jump by a BREAK or
CONTINUE statement.
 Any pixels deactivated by a CONTINUE statement will inhibit a decision to jump by a BREAK statement;
they will not inhibit a decision to jump by another CONTINUE statement.
 Pixels deactivated by other flow control are indifferent to the decision to jump by a BREAK or
CONTINUE statement.
8.6.3.3 Address Fields
BOOL_ADDR
INT_ADDR
JUMP_ADDR
JUMP_GLOBAL
Which of 32 constant booleans to use for jump condition
Which of 32 constant integers to use for loop initialization (the red channel is used for
iteration count, green for aL initialization, and blue for aL increment)
Which instruction to jump to if conditions pass
Whether JUMP_ADDR is global, or if OFFSET_ADDR should be added to JUMP_ADDR.
8.6.3.4 Global Configuration
FULL_FC_EN
false
true
Whether to enable full flow control support.
= No loops or calls, limited branching. Better performance.
= All flow control functionality enabled.
8.6.4 Common Flow Control Statements
JUMP_FUNC
0x55
0xff
JUMP_ANY
0
0
OP
JUMP
JUMP
A_OP
NONE
NONE
B_OP0
NONE
NONE
B_OP1
NONE
NONE
B_POP_CNT
0
0
B_ELSE
0
0
JUMP_ADDR
ELSE+1
ENDIF
IF p
ELSE
ENDIF
0x33
0x00
0x00
0
0
1
JUMP
JUMP
JUMP
NONE
NONE
NONE
INCR
NONE
DECR
INCR
DECR
NONE
0
1
1
0
1
0
ELSE+1
ENDIF+1
0
IF c
ELSE
ENDIF
0x0f
0x00
0x00
0
0
1
JUMP
JUMP
JUMP
NONE
NONE
NONE
INCR
NONE
DECR
INCR
DECR
NONE
0
1
1
0
1
0
ELSE+1
ENDIF+1
0
IF b
0x55
0
JUMP
NONE
NONE
NONE
0
0
ENDIF
IF b
ELSE
ENDIF
© 2010 Advanced Micro Devices, Inc.
Proprietary
119
Revision 1.5
June 8, 2010
ENDIF
IF p
ENDIF
0x33
0x00
0
1
JUMP
JUMP
NONE
NONE
INCR
DECR
NONE
NONE
0
1
0
0
ENDIF+1
0
IF c
ENDIF
0x0f
0x00
0
1
JUMP
JUMP
NONE
NONE
INCR
DECR
NONE
NONE
0
1
0
0
ENDIF+1
0
LOOP
ENDLOOP
0x00
0xff
0
1
LOOP
ENDLOOP
NONE
NONE
NONE
NONE
NONE
NONE
0
0
0
0
ENDLOOP+1
LOOP+1
REP
ENDREP
0x00
0xff
0
1
REP
ENDREP
NONE
NONE
NONE
NONE
NONE
NONE
0
0
0
0
ENDREP+1
REP+1
BREAK
BREAK b
BREAK p
BREAK c
0xff
0xaa
0xcc
0xf0
0
0
0
0
BREAK
BREAK
BREAK
BREAK
NONE
NONE
NONE
NONE
NONE
NONE
NONE
NONE
DECR
DECR
DECR
DECR
n
n
n
n
0
0
0
0
END+1
END+1
END+1
END+1
CONTINUE
CONTINUE
b
CONTINUE
p
CONTINUE
c
0xff
0xaa
0
0
CONTINUE
CONTINUE
NONE
NONE
NONE
NONE
DECR
DECR
n
n
0
0
END
END
0xcc
0
CONTIUNE
NONE
NONE
DECR
n
0
END
0xf0
0
CONTINUE
NONE
NONE
DECR
n
0
END
CALL
CALL b
CALL p
CALL c
RETURN
0xff
0xaa
0xcc
0xf0
0xff
1
1
1
1
0
JUMP
JUMP
JUMP
JUMP
JUMP
PUSH
PUSH
PUSH
PUSH
POP
NONE
NONE
NONE
NONE
NONE
INCR
INCR
INCR
INCR
DECR
0
0
0
0
1
0
0
0
0
0
Subroutine
Subroutine
Subroutine
Subroutine
0
* n indicates how many branch stack frames the BREAK is inside within the current loop.
* Lines with no fields filled out indicate no FC instruction is necessary in that spot.
8.6.5 Optimizations
Clearly, not all the possible combinations are explored above. The flexibility of the flow control instruction allows
for more creative flow control operations, or (more likely) optimizations.
One of the easiest optimizations makes use of the B_POP_CNT to merge consecutive ENDIF statements:
IF c
[…]
IF c
[…]
IF c
[…]
ENDIF
ENDIF
ENDIF
JUMP_FUNC
0x0f
JUMP_ANY
0
OP
JUMP
A_OP
NONE
B_OP0
INCR
B_OP1
NONE
B_POP_CNT
0
B_ELSE
0
JUMP_ADDR
ENDIF_0+1
0x0f
0
JUMP
NONE
INCR
NONE
0
0
ENDIF_1+1
0x0f
0
JUMP
NONE
INCR
NONE
0
0
ENDIF_2+1
0x00
0x00
0x00
1
1
1
JUMP
JUMP
JUMP
NONE
NONE
NONE
DECR
DECR
DECR
NONE
NONE
NONE
1
1
1
0
0
0
0
0
0
JUMP_FUNC
0x0f
JUMP_ANY
0
OP
JUMP
A_OP
NONE
B_OP0
INCR
B_OP1
NONE
B_POP_CNT
0
B_ELSE
0
JUMP_ADDR
ENDIF+1
Becomes:
IF c
© 2010 Advanced Micro Devices, Inc.
Proprietary
120
Revision 1.5
[…]
IF c
[…]
IF c
[…]
ENDIF
ENDIF
ENDIF
June 8, 2010
0x0f
0
JUMP
NONE
INCR
DECR
1
0
ENDIF+1
0x0f
0
JUMP
NONE
INCR
DECR
2
0
ENDIF+1
0x00
1
JUMP
NONE
DECR
NONE
3
0
0
8.6.6 LAST Bit
The LAST bit in the US_CMN_INST instruction word allows shaders to terminate before reaching the address
indicated by US_CODE_SIZE.END_ADDR. The LAST bit can be indicated for any instruction type. Any active
pixel for an instruction of any type (FC, ALU, OUTPUT or TEX) marked "last" will be considered "done" for that
instruction and all future instructions that the shader might execute for that thread. Future instructions may or may
not be executed, according to the hardware implementation.
In the R5xx hardware implementation, when all pixels are "done" in a thread and we hit an OUTPUT instruction
that is marked as "last" (and has a texture semaphore wait! -- this is required), we will stop the thread, even if this
isn't the instruction specified by END_ADDR. Also, pixels that are "done" behave the same as pixels considered
"inactive" when encountering flow control instructions, meaning that code that would have been skipped over if all
pixels were "inactive" would also be skipped over if the only pixels marked as "active" were also marked as "done."
8.7 Floating Point Issues
The US is designed to be compliant with the Shader Model 3, which does not officially support IEEE special values
(denormal, infinity, NaN), and allows for leniency in various corner cases.
The US strives to provide a more complete IEEE floating point implementation. US supports the IEEE 32-bit
floating point format, with 23 bits mantissa, 8 bits biased exponent (bias 127), and 1 bit sign. The US also supports
the special IEEE values (denormal, infinity, NaN), but there are some important caveats in the implementation
which are noted below. There is no distinction between an sNaN and a qNaN.
8.7.1 Deviations from IEEE
The most pervasive caveat is that denormals are flushed to an appropriately signed zero throughout US. There is no
gradual underflow, and identities are not preserved for denormal values. This will be apparent in comparison
operations where a denormal is treated as equivalent to zero.
Also pervasive, the internal rounding mode is not configurable and is not exact to the IEEE standard. It could best
be said that rounding is random; operations in and near US round with differing standards and it is infeasible to
specify a uniform rounding mode at this stage of design. Most ALU operations are accurate to within one bit on
each input; transcendental functions have larger tolerances.
The lack of separable multiply and add instructions has consequences on rounding and sign preservation; when
using MAD to perform only a multiply or addition, keep in mind that the other operation may influence the result
despite apparent identities. For example, the obvious instructions to use for moving a value from one register to
another both utilize MAD, either with the additive identity "0 * 0 + r1", or a combination of additive and
multiplicative identities, "r0 * 1 + 0". Neither these instructions will correctly copy -0.0, because the adder cannot
generate -0.0 except with two negative inputs. In this case, a more accurate move instruction would be "-0 * 0 + r1".
(the ideal MOV instruction is described below).
© 2010 Advanced Micro Devices, Inc.
Proprietary
121
Revision 1.5
June 8, 2010
US only supports comparisons against zero (predication, ALU result, and CMP) and +0.5 (CND), and this has
consequences for implementing a general compare function with special values. It is tempting to implement a
general comparison between values A and B by subtracting the results, but this will not have the desired effect for
special values. In IEEE, an infinite value is equivalent to itself, but NaN is never equivalent to NaN. Yet (infinity infinity) = (NaN - NaN) = NaN, and the results are indistinguishable. The limited operator set further complicates
issues, since (A > B) is not equivalent to !(A <= B) when either input is NaN.
The behaviour for CMP and CND is described below. When using the predicate comparison operators, the
following hold for special values:
VALUE
+0.0
-0.0
+Inf
-Inf
NaN
X<0
0
0
0
1
0
X>=0
1
1
1
0
0
X==0
1
1
0
0
0
X!=0
0
0
1
1
1
* Denormals compare as equivalent to zero. Note that the only way a denormal may be involved in a comparison
for predicate/alu result is if the output modifier is disabled with US_OMOD_DISABLED.
8.7.2 ALU Non-Transcendental Floating Point
Non-transcendental ALU operations maintain extra precision to represent computations where an intermediate result
exceeds IEEE's finite range. For example, if a MAD generates a result outside the finite range, but the output
modifier brings the value back into range, the ALU will generate a finite value, not infinity.
The ALU accepts denormal values, but denormals are flushed to zero, preserving sign. It is possible for a
multiplicative output modifier to bring a denormal intermediate result into the normal range; in this case, the ALU
will generate a normal nonzero value.
The ALU MAD operation, which many ALU operations are based on, follows standard IEEE rules when handling
special input values, for example:
OPERATION
x * NaN
0.0 * Inf
Inf * Inf
Inf * -Inf
0.0 * -0.0
x + NaN
Inf + -Inf
Inf + Inf
Inf + -1.0
0.0 + -0.0
-0.0 + -0.0
RESULT
NaN
NaN
Inf
-Inf
-0.0
NaN
NaN
Inf
Inf
0.0
-0.0
NOTE
X is any value
X is any value
Dot products may lose precision in cases where the values to be added differ greatly in magnitude. For example, if
the two largest values to be added cancel exactly, and the next-largest value has a magnitude smaller by a factor of
© 2010 Advanced Micro Devices, Inc.
Proprietary
122
Revision 1.5
June 8, 2010
2^25 or more, US will emit +0.0 rather than the sum of the two remaining components. IEEE is silent on the
behavior of such fused operations, and it seems unlikely that this condition will manifest very often.
MIN and MAX operations return the second argument if either input is NaN (this is consistent with IEEE and SM3
specifications); infinite values compare as usual. If both inputs are +-0.0, MIN and MAX will return the second
input (consistent with IEEE and the SM3 spec) – as a result, MIN(+0, -0) == -0, and MIN(-0, +0) == +0.
CND and CMP operations return the second argument if either input is NaN; infinite values compare as usual. As
with the predicate compare operators, +0.0 and -0.0 are both "equal" to 0.
MIN, MAX, CND, and CMP are guaranteed to return one of their first two arguments. If you use
US_OMOD_DISABLED as well, then you will get a bit-exact representation of one of the first two arguments.
ALU operations usually enable the output modifier, which in turn standardizes NaN values and flushes denormal
results to zero. A MOV instruction which preserves the source bits may be implemented by setting
US_OMOD_DISABLED for the instruction and using the MAX(src, src) instruction. The output modifier cannot be
disabled for a saturated MOV (MOV with clamping enabled).
8.7.3 ALU Transcendental Floating Point
In US, transcendental operations are EX2, LN2, RCP, RSQ, SIN, and COS (mathematically speaking, one of these
functions does not belong). Transcendentals do not maintain extra internal precision; as a result, if the result of the
transcendental operation exceeds the IEEE finite range, the ALU will generate infinity even if the output modifier
would bring the result back into range. Similarly if the result is denormal, the ALU will generate a pure zero
(preserving sign) even if the output modifier would bring the result back into the normal range.
Special values are computed as shown in the following table:
INPUT
+0.0
-0.0
+Inf
-Inf
NaN
EX2
+1.0
+1.0
+Inf
+0.0
NaN
LN2
-Inf
-Inf
+Inf
NaN
NaN
RCP
+Inf
-Inf
+0.0
-0.0
NaN
RSQ
+Inf
+Inf *
+0.0
NaN
NaN
SIN
+0.0
-0.0
NaN
NaN
NaN
COS
+1.0
+1.0
NaN
NaN
NaN
* For RSQ, recall that the square root occurs first. IEEE specifies sqrt(-0.0) -> -0.0; the US deviates from this,
however this does not affect SM3 compliance since RSQ is always used with the absolute value input modifier for
SM3 shaders.
8.7.4 Texture Floating Point
Projected and cubemapped texture coordinates are processed in the US block before being sent to the texture unit.
The texture unit does not accept NaN, so NaN coordinates are converted to +infinity before being sent to the texture
unit. As with the ALU, denormal inputs and denormal results are converted to pure zero, preserving sign.
The multiplier used for projection and cubemapping does not follow IEEE rules when handling special values. This
will become apparent only when you attempt to project or cubemap a coordinate that contains an infinite or NaN
component.
© 2010 Advanced Micro Devices, Inc.
Proprietary
123
Revision 1.5
June 8, 2010
You should use caution when generating very large values for use as coordinates in a texture lookup. These values
may generate infinite values when scaled by the texture dimensions, projected, or cubemapped.
8.7.5 Legacy multiply behaviour
By default multiplication by zero is IEEE-compliant for any ALU instruction. To support legacy (SM1.x) shaders
which did not have an IEEE-compliant multiplier, set
US_CONFIG.ZERO_TIMES_ANYTHING_EQUALS_ZERO. Setting this bit will cause the multiplier used by
MAD, dot products, MDH and MDV to treat "+-0 * x == +0" for all values x. Note that IEEE deviates from this
behaviour when x is infinity or NaN. Modern shaders should not set this bit.
8.8 Writing to US Registers
The US configuration, integer constant, and boolean constant registers can be written to directly. However due to
addressing limitations elsewhere in the pipe, the US instruction and ALU constant registers cannot be written
directly; they must be programmed via a vector mechanism provided in the GA block. You write to the vector in
two parts; first, you program the write destination in GA_US_VECTOR_INDEX, then you write data to
GA_US_VECTOR_DATA until you have set all the values of interest.
8.8.1 Writing instructions
To write one or more shader instructions, set GA_US_VECTOR_INDEX.TYPE to GA_US_VECTOR_INST and
GA_US_VECTOR_INDEX.INDEX to the address of the first shader instruction you want to write (from 0 to 511).
Then write each instruction register to GA_US_VECTOR_DATA (usually, a total of 6 writes per instruction), in the
following order:
0:
1:
2:
3:
4:
5:
ALU/OUTPUT
US_CMN_INST
US_ALU_RGB_ADDR
US_ALU_ALPHA_ADDR
US_ALU_RGB_INST
US_ALU_ALPHA_INST
US_ALU_RGBA_INST
TEX
US_CMN_INST
US_TEX_INST
US_TEX_ADDR
US_TEX_ADDR_DXDY
0
0
FC
US_CMN_INST
0
US_FC_INST
US_FC_ADDR
0
0
A few notes:
 If you are writing an FC or TEX instruction, you may need to pad the vector with zeros; note that a zero
dword must be written in the middle of the FC instruction.
 You can write to multiple instructions without updating the index. After you write 6 values to
GA_US_VECTOR_DATA, the GA will automatically increment the instruction index. The index wraps at
512.
 If the last instruction you write to is a TEX or FC instruction, you do not need to write the last two zero
dwords that are used for padding.
 Similarly, if you do not need to update all instruction registers for the last instruction you write, you do not
need to write the registers that follow it.
 You should always write to GA_US_VECTOR_INDEX before writing a sequence of instructions, to
ensure the GA is setup appropriately.
© 2010 Advanced Micro Devices, Inc.
Proprietary
124
Revision 1.5
June 8, 2010
8.8.2 Writing ALU constants
To write one or more ALU constants, set GA_US_VECTOR_INDEX.TYPE to GA_US_VECTOR_CONST and
GA_US_VECTOR_INDEX.INDEX to the address of the first constant you want to write (from 0 to 255). Then
write each constant register to GA_US_VECTOR_DATA (usually, a total of 4 writes per constant), in the following
order:
0:
1:
2:
3:
US_ALU_CONST_R
US_ALU_CONST_G
US_ALU_CONST_B
US_ALU_CONST_A
A few notes:
 You can write to multiple constants without updating the index. After you write 4 values to
GA_US_VECTOR_DATA, the GA will automatically increment the constant index.
 If you do not need to update all components of the last constant you write, you do not need to write the
components that follow it.
 You should always write to GA_US_VECTOR_INDEX before writing a sequence of constants, to ensure
the GA is setup appropriately.
© 2010 Advanced Micro Devices, Inc.
Proprietary
125
Revision 1.5
June 8, 2010
9. HiZ
9.1 Introduction
The R5xx HiZ (Hierarchical Z) unit performs a coarse z occlusion test on a tile of pixels to generate a mask
indicating whether a set of quads within the tile is potentially visible. The Scan Converter (SC) block uses this mask
to determine which quads will be passed on to the Rasterizer (RS) and which will be pruned. In this manner, HiZ
provides an early-out mechanism for dropping quads.
This section presents an overview of the operation of the HiZ unit and a guide on how to program it.
9.2 Enabling HiZ
HiZ operation must be enabled in both the SC and ZB. It is enabled or disabled in the SC by setting the HZ_EN field
in the SC_HYPERZ_EN field to 1 or 0. Similarly, it is enabled or disabled in the ZB by setting the HIZ_ENABLE
field in the ZB_BW_CNTL register to 1 or 0.
9.3 Configuring HiZ
The following registers must be set to configure the HiZ unit for operation.
The ZB_HIZ_PITCH register specifies the pitch of the HiZ buffer in HiZ RAM. The host writes the pitch in pixels.
The register interprets bits [13:4] as the 16 pixel-aligned HIZ_PITCH field. This field is used as pitch_mux in
formula 1 in section 2.2, which calculates the DWORD address in HiZ RAM where z floor updates are written
during z cache line evictions.
The ZB_HIZ_OFFSET register specifies a base offset into HiZ RAM. Bits [16:2] of this register are the DWORDaligned HIZ_OFFSET field.
The HZ_MAX field in the SC_HYPERZ_EN register specifies whether the minimum or maximum z in the 8x8 tile
is interpreted as the closest z whose floor is sent to the HiZ unit. The definition of which is the closest depends on
the sense of the z function. For instance, if the z function is LESS, the minimum value is the closest. The
programmer should set this field according to the z comparison function that is set in the ZFUNC field of the
ZB_ZSTENCILCNTL register. Setting SC_HYPERZ_EN.HZ_MAX to 0 sends the floor of the minimum, and
setting it to 1 sends the floor of the maximum.
The HIZ_MIN field of the ZB_BW_CNTL register specifies whether the HiZ unit updates the HiZ RAM with the
floor of the minimum or maximum z value during z cache line evictions. As with the SC_HYPERZ_EN.HZ_MAX
field, this field is also dependant on the z function set in the ZB_ZSTENCILCNTL. Setting HIZ_MIN to 0 updates
HiZ RAM with the floor of the maximum z, and 1 updates with the floor of the minimum.
The following table shows how the SC_HYPERZ_EN.HZ_MAX and ZB_BW_CNTL. HIZ_MIN fields should be
set according to ZFUNC. It also shows what the HiZ RAM should be initially cleared to, and what action the HiZ
© 2010 Advanced Micro Devices, Inc.
Proprietary
126
Revision 1.5
June 8, 2010
comparison takes. The „Z_MINMAX‟ column corresponds to the SC_HYPERZ_EN.HZ_MAX setting, and the „ZB
write to HiZ(X, Y)‟ corresponds to the ZB_BW_CNTL. HIZ_MIN setting.
ZFUNC
HiZ Clear
Value
Z_MINMAX
HZ 2nd Level Z Function
ZB write to HIZ(X,Y)
0 - Never
Don‟t Care
Min(Z0, Z1,
Z2)
Prune the Block
Don‟t care
1 - Less
Floor(Z_Clear)
Min(Z0, Z1,
Z2)
If (floor(Z_MINMAX) >
HiZ(X,Y))
Floor(Maximum(Z))
Prune the Block
Else
Pass the Block
2 - Less or Equal
Floor(Z_Clear)
Min(Z0, Z1,
Z2)
If (floor(Z_MINMAX) >
HiZ(X,Y))
Floor(Maximum(Z))
Prune the Block
Else
Pass the Block
3 - Equal
Don‟t Care
Min(Z0, Z1,
Z2)
Pass the Block
Don‟t care
4 - Greater or
Equal
Floor(Z_Clear)
Max(Z0, Z1,
Z2)
If (floor(Z_MINMAX) <
HiZ(X,Y))
Floor(Minimum(Z))
Prune the Block
Else
Pass the Block
5 - Greater Than
Floor(Z_Clear)
Max(Z0, Z1,
Z2)
If (floor(Z_MINMAX) <
HiZ(X,Y))
Floor(Minimum(Z))
Prune the Block
Else
Pass the Block
© 2010 Advanced Micro Devices, Inc.
Proprietary
127
Revision 1.5
June 8, 2010
6 - Not Equal
Don‟t Care
Max(Z0, Z1,
Z2)
Pass the Block
Don‟t Care
7 - Always
Don‟t Care
Max(Z0, Z1,
Z2)
Pass the Block
Don‟t Care
9.4 HiZ Clear with PM4 Packet
The most efficient manner for a driver to clear HiZ RAM is to use the 3D_CLEAR_HIZ Type-3 PM4 packet. The
3D_CLEAR_HIZ packet is described below.
3D_CLEAR_HIZ
Functionality
Clear HIZ RAM.
Format
Ordinal
1
2
3
4
Field Name
[ HEADER ]
START
COUNT[13:0]
CLEAR_VALUE
Description
Header of the packet
Start
Count[13:0] – Maximum is 0x3FFF.
The value to write into the HIZ RAM.
9.5 Example: Putting it All Together
Here is a simple example that demonstrates typical steps in setting up the HiZ unit:
// enable z buffering
regwrite (ZB_CNTL, Z_ENABLE, 1);
// set the ZFUNC to LESS
regwrite (ZB_ZSTENCILCNTL, ZFUNC, 1); // 1 = LESS
// enable HiZ in the SC
regwrite (SC_HYPERZ_EN, HZ_EN, 1);
// enable HiZ in the ZB
regwrite (ZB_BW_CNTL, HZ_EN, 1);
// set HZ_MAX in SC_HYPERZ_EN to MIN for ZFUNC=LESS
regwrite (SC_HYPERZ_EN, HZ_MAX, 0);
// set HIZ_MIN in ZB_BW_CNTL to MAX for ZFUNC=LESS
regwrite (ZB_BW_CNTL, HZ_MIN, 0);
// set HIZ_OFFSET to 0
regwrite (ZB_HIZ_OFFSET, HIZ_OFFSET, 0);
// set HIZ_PITCH to 1024
regwrite (ZB_HIZ_PITCH, HIZ_PITCH, 1024 >> 4);
// initialize the HiZ RAM to a clear value of 0xff
// for all the bytes in a 1024x768 area:
// set initial write index. It will auto-increment
// after each write to ZB_HIZ_DWORD
regwrite (ZB_HIZ_WINDEX, HIZ_WINDEX, 0);
© 2010 Advanced Micro Devices, Inc.
Proprietary
128
Revision 1.5
June 8, 2010
// write floors for one 8x8 tile with each DWORD.
// this example assumes a dual-pipeline configuration.
// since half the screen is owned by the second pipeline,
// and host writes are broadcast to both pipeline RAMS
// at the same address, we write the clear DWORD for
// half of 1024>>3. In a single-pipeline configuration,
// we would write the clear DWORD for 1024>>3.
for (int y = 0; y < (768 >> 3); y++)
{
for (int x = 0; x < ((1024 >> 3)>1); x++)
{
regwrite (ZB_HIZ_DWORD, HIZ_DATA, 0xffffffffL);
}
}
// read back a DWORD in pipeline 1 at address 0
regwrite (SU_REG_DEST, SELECT, 1);
regwrite (ZB_HIZ_RINDEX, 0);
DWORD dwGetHiZValue = regread (ZB_HIZ_DWORD);
9.6 State Changes That Invalidate HiZ
This section describes the conditions that invalidate HiZ RAM and those that have no effect.
Disabling Z testing or disabling Z writes does not invalidate HiZ RAM, so no special action is required in these
cases. Because both of these states result in no new z data being written to the z buffer, there are no z cache
evictions that update the contents of HiZ RAM. Therefore, HiZ RAM is preserved and can continue to be used after
Z buffering or Z writes are re-enabled.
Certain ZFUNC transitions can invalidate the contents of HiZ RAM. As a general rule, the safest approach when
ZFUNC is changed is to disable HiZ testing until the contents of HiZ RAM are reset, e.g. until the start of the next
frame where HiZ RAM is re-initialised. Having said that, there are transitions where either HiZ does not need to be
disabled, or it may be re-enabled before the end of the frame:
1) HiZ does not need to be turned off when transitioning back and forth between LESS and LESSEQUAL.
HiZ must be disabled when transitioning from either LESS or LESSEQUAL to EQUAL, but may be reenabled when transitioning back from EQUAL to LESS or LESSEQUAL.
2) HiZ does not need to be turned off when transitioning back and forth between GREATER and
GREATEREQUAL. HiZ must be disabled when transitioning from either GREATER or
GREATEREQUAL to EQUAL, but may be re-enabled when transitioning back from EQUAL to
GREATER or GREATEREQUAL.
All other transitions invalidate the contents of HiZ RAM with respect to the new sense of the z comparison.
© 2010 Advanced Micro Devices, Inc.
Proprietary
129
Revision 1.5
June 8, 2010
10. Driver notes
10.1 R5xx Changes
10.1.1 PS3.0
R520 TX supports pixel shader model 3.0. Support for 32-bit IEEE input coordinates from the shader and 32-bit
IEEE output colors to the shader. Support for per pixel (or per quad) TEXLDB, TEXLDL, and TEXLDD
instructions.
10.1.2 Filter4
R520 can support limited Filter4 filtering. The kernel is 4x4 symmetric and separable with 16 phases. The kernel
weight precision is S,1.9. There is one global kernel shared by all textures. The kernel is loaded using the global
TX_FILTER4 register. Filter4 can be enabled per texture using the MAG and MIN filter registers. Only one of four
8-bit components can have Filter4 applied at a time. That component is selected using FORMAT2.SEL_FILTER4.
10.1.3 Maximum Image Extents
R520 supports up to 4K texels in width, height, or depth.
10.1.4 Trilinear Interpolation Precision
R520 supports 6-bits of trilinear precision. R420 supported 5-bits.
10.1.5 Image Formats
New image formats over R420 : ATI1N, 10, 10_10, 10_10_10_10, 1, 1_REVERSED
10.1.6 Border Color
Added border color support for FAT formats, specifically 16_16_16_16, 16f_16f_16f_16f, 32f_32f,
32f_32f_32f_32f. Border color is now supported for all image formats.
10.1.7 Non-Square mipmaps with border color
Added mode register FILTER1.BORDER_FIX which when asserted will stop right shifting the texture coordinate
once the image size has been right shifted to one. BORDER_FIX only needs to be asserted when the clamp mode is
a border mode and mipmapping is enabled and the mipmap is non-square. However it should be safe to assert
BORDER_FIX anytime.
10.1.8 POW2FIX2FLT
Added mode register FORMAT2.POW2FIX2FLT which when asserted the TX will divide by pow2 instead of
pow2-1 when doing fix2float conversion of the filtered texture color.
10.1.9 GA_IDLE
R520 has a new status register called GA_IDLE which can be used to get information about back-end hangs. To
read this register, the following procedure may be used:


Read RBBM_STATUS to make sure the HW is hung. If GA bit is busy, this may indicate a
back-end hang.
Write 0x32005 to the RBBM_SOFTRESET register. This is to reset GA, CP and VAP.
© 2010 Advanced Micro Devices, Inc.
Proprietary
130
Revision 1.5





June 8, 2010
Read RBBM_SOFTRESET to make sure the write went through.
Write 0 to RBBM_SOFTRESET. This is necessary to get VAP to go idle.
RBBM_STATUS should now show that VAP and CP are idle but GA still busy. If GA is not busy,
then GA_IDLE should be readable at this point.
If GA was still hung, write 0x200 to GA_SOFTRESET
Now GA_IDLE can be read. See the register spec for details on what each bit means. Note that a “1”
indicates an idle unit.
10.1.10 HDP surface0 upper bound 64 byte alignment requirement
HDP surface 0 upper bound needs at least 64 byte alignment. This applies only to surface 0 and not to surface 1 to
7, which can be programmed as specified (32 byte aligned).
10.1.11 New Soft resets for CP
CP now has total of 3 soft resets:
CP_SOFT_RESET => as before (for backward compatibility).
CP_SOFT_RESET_NO_DMA => soft reset CP except DMA engine.
CP_SOFT_RESET_DMA => soft reset only DMA engine of CP.
10.1.12 CP STOP_CONTEXT
Once SC/CB informs CP to stop_context, CP will not fetch/process any further read requests from command
buffers.
10.1.13 Updated CP Scratch compare logic
Scratch register interrupt functions as follows:
(a) Driver programs two 32bit registers with timestamp for comparisons with a pair of scratch registers. We can call
this as DRV_REGS
(b) Driver programs PM4 stream with writes to two consecutive scratch registers (paired as 0-1,2-3,4-5,6-7) to be
compared with DRV_REGS.
(c) In due course of time PM4 pkt would get executed , this address/data would sit in the input fifo of CP , ready to
program both the scratch registers.
(d) As soon as CB (color buffer) sends two sets of RESYNC pulses (4 of them from each pipe with mask), CP
allows the FIFO contents to get transferred to scratch registers for further action. (RBBM transactions are stalled at
this time)
(e) SCR_REGS data gets compared with DRV_REGS data for preprogrammed condition of either "equality" or
"non-equality" or "greater than" or "less than " or "greater than or equal" or "less than or equal".
© 2010 Advanced Micro Devices, Inc.
Proprietary
131
Revision 1.5
June 8, 2010
(f) If the condition is satisfied then an interrupt is generated informing driver/system to wake-up and proceed for the
next command.
(g) The scratch register data gets written to system memory (if umask is set) at premapped address to be read back
by the system/driver.
10.1.14 Host requests (GFX, ISYNC_CNTL, RBBM_GUICNTL, WAIT_UNTIL)
Pre-R5xx, requests made within the aperture range 0x1400 - 0x1EFF and 0x2000 – 0xFFFF were queued. From
R5xx, onwards these requests will not be queued. ISYNC_CNTL, RBBM_GUICNTL and WAIT_UNTIL can be
programmed only for queued requests. As none of the host (PIO) requests are queued, host cannot program above
three registers through PIO.
10.1.15 Double Z
RV530 has two Z pipes, but a single raster pipe. In the past, SU_REGDEST was used to select which raster pipe
you want to select. On RV530, you use FG_ZBREG_DEST. Because the pipe selection happens in the FG, you
must be in Z bottom mode. This mainly applies to occlusion queries where you want to get Z pass data from each
Z unit.
10.1.16 FP16 AA support
R5xx-family chips support FP16 AA. However, there is an issue with the blend optimizations while FP16 AA is
enabled. Because of this, RB3D_BLENDCNTL.DISCARD_SRC_PIXELS must be set to
CB_DISCARD_SRC_DISABLE while FP 16 AA is enabled.
10.1.17 FP16 Blending
FP16 (64bit pixel) blending is added in R5xx parts. FP16 Blend bandwidth is half the rate of 32 bit pixels; i.e. 8
pixels/clk in a 16 pipe system. FP16 blending uses the new 64 bit clear color register and constant color registers.
Setting the FP16 blend equation to multiply by 1.0 is subtly different from disabling blending. A negative zero
(0x8000) will be converted to zero (0x0000) if it is blended but 0x8000 will be drawn if blending is disabled. The
driver should distinguish between FP16 and 16 bit integer formats and never enable blending for 16 bit integer
formats. The CB FP16 implementation supports denorms but does not support NaNs and Infs. Only a 4 component
(ARGB16161616) format is supported. There are no I16 or IA1616 formats.
10.2 Interface Notes
10.2.1 Raster Reset
The proper sequence for a full raster reset is the following:


Perform a RBBM reset with the GA RBBM client flag set
Perform a register write to the GA_SOFT_RESET register, with a value of 0x200 or higher
In the above sequence, the first item causes the GA to delete all pending register reads & writes and resets the
RBBM interface. If the GA status is idle, then the RBBM reset is not required. After this reset, the GA is ready to
accept register read and write commands. However, the 3D pipe could be in a hung state, which would prevent it
© 2010 Advanced Micro Devices, Inc.
Proprietary
132
Revision 1.5
June 8, 2010
from accepting 3D commands or register commands.
The second operations (GA_SOFT_RESET) causes a soft reset of the 3D pipe. This reset causes a loss of all state in
the 3D, except in the GA & SU blocks. Shadow register values are not reset. The 3D pipe should then switch to the
idle state after the reset. It will take 0x200+ cycles for the idle state to be re-asserted (should be less than 0x200 +
64). The value of 0x200 is a suggestion, which should be enough to reset all the pipelines. A larger value can be
used (up to 16b), but should not offer any benefit.
10.2.2 Non-textured, non-colored primitives
The R300 always does at least one 2D texture and one color per primitive. The RS_COUNT has a baseline value of
1, which indicates up to 1 color and 1 texture are to be rasterized. The other registers used to specify the colors and
textures are the VAP_RASTER_VTX_FMT_0 and RASTER_VTX_FMT_1 registers. These registers can be set to
have no color and no texture. So if one wants to specify a non-textured and non-color primitive, one should set the
RASTER_VTX_FMT registers to no color and no texture, and set the RS_COUNT to 0. The raster will still
rasterize the extra colors and textures, but the rasterized values will be wrong. The shader code should then be set to
ignore the texture coordinates and colors and to setup a constant color, or the CB could be disabled so no color
writes occur (to setup the ZB, for example).
10.2.3 Flushing primitives out of the SC
All 3D operations need to be terminated with a register write to the SC, US or some down stream register. Unless
this is done, the SC/RS will never assert idle (which will be reflected as GA_BUSY). The final polygon rendered
should still drain out of the pipe.
10.3 Register Notes
10.3.1 Update to register reads
R520 and follow-on chips now support simultaneous G3D register reads and writes. Coherency of reads and writes
is not guaranteed (reads can occur before writes). However, switching from write/cmd mode to read mode (PIO
through RBBM) does not require idling the G3D pipe anymore. However, this mode is not enabled by default. The
following fields have been added to the GA_ENHANCE register:
REG_READWRITE 2:2
REG_NOSTALL 3:3
When the REG_READWRITE field is set, this enables the GA to support simultaneous register reads and writes.
However, simply enabling this mode allows the GA to receive both read and write commands (and to deal with
both), but it still tells the GA to wait for register return before continuing. Consequently, the GA will cause a stall
bubble, of (n) cycles to be injected, where (n) is the latency for register read back. If the register is shadowed, that
value is very small (A few cycles). If not, then it can be hundreds of cycles
When REG_NOSTALL field is set, this enables GA to support mixing the G3D pipe with reads and other activity;
in this mode, the register read is simply part of the pipeline data. This mode would allow for no performance hit at
all, when doing register reads, since the GA will not cause a stall bubble (it will not wait for the register data to
return). It does not permit the GA to have multiple outstanding read requests, but it allows for minimal performance
impact.
10.3.2 Registers that cause stalls
© 2010 Advanced Micro Devices, Inc.
Proprietary
133
Revision 1.5
June 8, 2010
10.3.2.1 ZB Registers
Unpipelined registers
Writes to these registers causes a stall in the pipe. The stall is on as long as there are any quads in the ZB block.
Once the ZB block is empty the register is updated and the stall is removed. If multiple unpipelined registers are
updated with no quads in the middle, then the first one will cause a stall to drain the ZB, but the following
unpipelined writes will go at full speed…
ZB_FORMAT
ZB_ZCACHE_CTLSTAT
ZB_BW_CNTL
ZB_DEPTHOFFSET
ZB_DEPTHPITCH
ZB_DEPTHCLEARVALUE
ZB_HIZ_OFFSET
ZB_ZPASS_DATA
ZB_ZPASS_ADDR
ZB_DEPTHXY_OFFSET
Pipelined Registers
ZB_CNTL
ZB_ZSTENCILCNTL
ZB_STENCILREFMASK
ZB_HIZ_DWORD
Special register ZTOP
Whenever ZTOP register is switched from 1 to 0 or 0 to 1 a stall occurs at the SC stage of the pipe and it goes away
when all the quads between the SC and CB are drained from the pipe. Then the Zbuffer is moved in the pipe-lined.
Writing to Ztop a value that it currently holds (0 to 0 or 1 to 1) has no performance penalty.
10.3.2.2 CB Registers
Unpipelined registers
Writes to unpipelined registers cause the CB to stall until all previous quads, pipelined registers, and partially
pipelined registers have finished processing. Once an unpipelined register has been written, a write to another
unpipelined register will not cause more stalls as long as there are no intervening quads, pipelined registers, or
partially pipelined registers. The unpipelined CB registers are the following:
RB3D_CCTL
RB3D_COLOR_CLEAR_VALUE
RB3D_COLOROFFSET(0, 1, 2, 3)
RB3D_COLORPITCH(0, 1, 2, 3)
RB3D_DSTCACHE_CTLSTAT
RB3D_AARESOLVE_OFFSET
RB3D_AARESOLVE_PITCH
RB3D_AARESOLVE_CTL
GB_TILE_CONFIG
GB_AA_CONFIG
Partially pipelined registers
Partially pipelined registers are pipelined everywhere in the CB except in one module. That module must stall until
all the quads that it is currently processing have finished. The number of stall cycles should not exceed about 15
cycles. The partially pipelined CB registers are the following:
© 2010 Advanced Micro Devices, Inc.
Proprietary
134
Revision 1.5
June 8, 2010
RB3D_ROPCNTL
RB3D_CLRCMP_FLIPE
RB3D_CLRCMP_CLR
RB3D_CLRCMP_MSK
Pipelined registers
These registers are fully pipelined and may be freely intermixed with quads without causing stalls. The pipelined
registers are the following:
RB3D_BLENDCNTL
RB3D_ABLENDCNTL
RB3D_COLOR_CHANNEL_MASK
RB3D_CONSTANT_COLOR
RB3D_DITHER_CTL
CB register ordering
Because unpipelined registers can stall on preceding pipelined or partially pipelined registers, it is recommended
that all unpipelined registers are written first. Pipelined and partially pipelined registers may be freely intermixed
without penalty.
10.3.2.3 TX Registers
Global registers
Global registers are registers that affect all texture stages. On a write to any global texture register, the US will wait
for the TX to flush completely before passing the register to the TX. This could take on the order of a couple
hundred clocks worst case. Obviously writes to these registers should be minimized. There are two global registers
that cause the TX to flush : TX_INVALTAGS and TX_PERF.
Stage registers
Stage registers are registers that only affect 1 of the 16 possible texture stages. On a write to a Stage register, the US
will wait until that texture stage is inactive in the TX pipe, and only then will it pass the register to the TX. It is
therefore important to rotate through the 16 sets of registers to avoid a register write to a stage that is still being
processed in the TX. Otherwise unnecessary stalls will occur.
10.3.3 Registers that affect performance
10.3.3.1 US_W_FMT
When the W value is not being used (FG_DEPTH_SRC does not select discrete W), then this register should be set
to specify that the source is the US and the format is always 0. Specifying that W comes from the rasterizer causes
stalls inside the US.
10.3.4 Other Registers
10.3.4.1 GB_TILE_CONFIG
The GB_TILE_CONFIG contains multiple raster pipe control fields. Some of these need a soft reset afterwards to
apply the change. All of them require the pipe to be idle before performing the change. As well, in the R5xx, this
register is simply shadowed in the shadow RAM, except for the PIPE_COUNT field, which always indicates the
internal value of this field. This might or might not match the written value, depending on bad_pipes and max_pipes.
All fields after Hard reset will show the default values shown below. The fields all hard reset to the default values.
© 2010 Advanced Micro Devices, Inc.
Proprietary
135
Revision 1.5
June 8, 2010
Soft reset (GA_SOFT_RESET) does not affect this register.
Here are the fields, with the default values, the reset status and a slight comment:
Fields
Enable [0:0]
Possible values
0: Disable tiling
1: Enable tiling
Defaults
Enabled (1)
Pipe_count [3:1]
0: RV350
3: R300
6: R420 (3 pipes)
7: R420 (4 pipes)
0: 8x8 pixels
1: 16x16 pixels
2: 32x32 pixels
Depends on fuses
Super_size [8:6]
0: 1x1 tile
1: two 1x1 A,B tiles
2: one 2x2 tile
3: two 2x2 A,B tiles
0: 1x1 tile
Super_X, Super_Y,
Super_Tile [15:9]
7b ID identifies
unique location of
chip in multi-chip
board
0: 1/12 subpixel
1: 1/16 subpixel
0
No reset required
0: 1/12
Selects the 1/12 or
1/16 subpixel mode
0: 4 quads
1: 8 quads
2: 16 quads
3: 32 quads
0: Use intercept scan
conv.
1: Use bounding box
scan conv.
0: 4 quads
Can be changed
whenever pipe is
idle without Reset
No reset required
0: Intercept
No reset required
0: Do Z type scan
conversion
1: Do S type scan
conversion
0: Z type
Can be changed
when pipe idle.
Intercept method is
new and higher
performance.
Bounding box is
traditional & slower,
but “guaranteed” to
work. Should only be
changed if raster
issues come up.
RV350 and R420
support S scan
conversion, which
maintains local
coherence from scan
line to scan line,
instead of Z type
Tile_size [5:4]
Subpixel [16:16]
Quads_per_ras
[18:17]
Bb_scan [19:19]
Alt_scan_en[20:20]
© 2010 Advanced Micro Devices, Inc.
Proprietary
1 : 16x16
Reset
If changed, soft
reset should be
applied
If changed, soft
reset should be
applied
No reset required
Comments
The default value of
(1) should never be
changed
Should be
programmed with 4P
(7), 3P (6), 2P (3) or
1P (0).
R5xx supports 16x16
or 32x32 only.
32x32 should be
used, in 3p or 4p
cases, as performance
testing determines
Only 1x1 mode
guaranteed – Feature
only used in multichip boards
Only support super
tiling with 1, 2 or 4
pipes (not in 3P
config)
When in single chip,
value should be 0.
Reserved for R350 –
Leave at 0 for R300,
RV350
136
Revision 1.5
June 8, 2010
Alt_offset[21:21]
0: Use 1440/1088
offset for SC
1: Use 672/1088
offset for SC
0: 1440/1088 mode
Should be
switched when
pipe is idle.
Subprecision [22:22]
0: Uses 4b of sub
pixel precision
1: Uses 8b of sub
pixel precision
0: 4b
Should be
changed when
pipe is idle.
Alt_tiling [23:23]
0: Use regular tiling
for 3P mode
1: Use alternate
tiling for 3P mode
0: Regular tiling
No reset required
Z_extended[24:24]
0: Use [0,1] Z clamp
range
1: Use [-2,2] Z
range
0: R3xx/R4xx mode
Should be
changed when
pipe is idle
which “goes back” to
the left on every scan
line
When in mode (1),
allows for a render
target of 4k x 4k,
only for 1/12
subpixel mode. The
X,Y offsets in the GA
are not affected, so
that the viewport
should be loaded with
a value of (6721440=-768) to match.
Allows for 4 extra
bits of subpixel
precision. All
computations done in
higher precision
when in use. Should
always be enabled.
Empirical testing
needs to be done to
determine which has
higher performance.
Either tiling mode is
possible.
Should allow us to
increase guardband.
Per pixel clamping to
[0,1] still occurs in
SC
10.3.4.2 GB_PIPE_SELECT
GB_PIPE_SELECT controls the physical and logical pipe mapping, as well as the total number of active pipes. It
works with GB_TILE_CONFIG to configure the pipelines. It is procedural and not shadowed; if you read the
register back after hard reset, you should get the default values. Changing this register is generally not required, if
the fuses are set correctly (i.e. max_pipes reflects total number of working and desired pipes; bad_pipes indicates
which of the 4 pipes are bad). The MAX_PIPES and BAD_PIPES fields are read-only, and reflect what the SU unit
receives from the fuse unit. The fuse unit can be programmed to alter the max_pipes/bad_pipes, but not contrary to
the actual fuse settings (can never set, through SW, internally max_pipes to higher than the fuse setting).
Fields
PIPE0_ID [1:0]
Possible Values
0, 1, 2, 3
Defaults
Depends on fuses
– Often 0
PIPE1_ID [3:2]
0, 1, 2, 3
Depends on fuses
– Often 1
PIPE2_ID [5:4]
0, 1, 2, 3
Depends on fuses
– Often 2
© 2010 Advanced Micro Devices, Inc.
Proprietary
Reset
Pipe should be
soft reset after
changing
Pipe should be
soft reset after
changing
Pipe should be
soft reset after
Comments
Determines the logical mapping
of physical pipe 0
Determines the logical mapping
of physical pipe 0
Determines the logical mapping
of physical pipe 0
137
Revision 1.5
changing
Pipe should be
soft reset after
changing
Pipe should be
soft reset after
changing
PIPE3_ID [7:6]
0, 1, 2, 3
Depends on fuses
– Often 3
Pipe_mask [11:8]
0 through 16
Depends on fuses
– Max is 4
Max_pipes [13:12]
Read Only
0: 1 good pipe
1: 2 good pipes
2: 3 good pipes
3: 4 sweet pipes
Depends on fuses
Read only field
Bad_pipes [17:14]
0 through 16
Depends on fuses
Read only field
Config_pipes
[18:18]
0: Do nothing
1: Force autoconfig
N/A
Should be soft
reset after
writing, if fields
are changed
June 8, 2010
Determines the logical mapping
of physical pipe 0
Each bit of the mask identifies
if a physical pipe is good (1) or
not (0). A value of 0xf
indicates 4 good pipes.
Indicates the fuse state for the
number of good pipes.
GB_TILE_CONFIG.pipe_count
should not try to use more than
this number of pipes. HW will
ignore any programming that
tries to override this value.
Returns a (1) for each good
pipe. Matches pipe_mask
format. You cannot enable more
pipes than max_pipes.
Causes the HW to ignore the
pipe#_ID and pipe_mask fields,
and to generate those values
based on the fuse state.
The GB_PIPE_SELECT configures the pipes to match the desired configuration. SW should not attempt to
configure the pipes in a way that contradicts the max_pipes value, which is programmed through on-die fuses at die
test time. SW will be ignored if it contradicts the fuses. However, the bad_pipes can be programmed to enable a
“marked bad” pipe, but it must then disable a good pipe, since the total number of active pipes must be equal or less
than max_pipes, otherwise the HW will ignore the bad_pipes register.
10.4 Feature Notes
10.4.1 Switching Pipeline configuration / Resetting 3D pipe
The raster pipeline can be switched from single pipe to dual pipe and back through the use of the
GB_TILE_CONFIG register. As well, the GB_TILE_SELECT should be used to select the physical pipes to use.
Switching from one mode to another requires the following sequence:





The 3D pipe must be idle (WAIT For 3D IDLE)
The GB_PIPE_SELECT register should then be read, to determine the current max_pipes and bad_pipes.
The SW can then program it with those values or new values.
The GB_TILE_CONFIG register‟s PIPE_COUNT field should be written with the appropriate value (use
PIO):
o 0x0 for single pipe (RV350)
o 0x3 for dual pipe (R300)
o 0x6 for triple pipe (R420-3P)
o 0x7 for quad pipe (R420)
The 3D pipe & GUI must be idle again after writing the registers
The GA_SOFT_RESET register must be written with 0x100 or greater (use PIO)
© 2010 Advanced Micro Devices, Inc.
Proprietary
138
Revision 1.5



June 8, 2010
Wait for ~1 ms (prevents race conditions between GA_SOFT_RESET And 3d idle status read)
The 3D pipe & GUI must be idle again to permit any other activity (register or data) (read RBBM status for
GA idle)
If the fuses are set to limit the number of active pipes to a given level (1,2,3 or 4), then GB_TILE_CONFIG
and GB_PIPE_SELECT settings will not be able to override those values. A hang or other problem could
actually occur if SW tries to enable “bad pipes”.
The above sequence will invalidate the state of the pipe as well as switching it.
For resetting the pipe, the same process as above is followed:
 The 3D pipe must be idle (WAIT for 3D IDLE) or hung
 The RBBM soft reset of GA must be done, if chip is not idle
 The GA_SOFT_RESET register must be written with 0x100 or greater (use PIO)
 Wait for ~1ms
 The 3D pipe & GUI must be idle again to permit any other activity (read RBBM status for GA idle)
10.4.2 Switching vertex data rounding mode
The GA_ROUND_MODE register can be used to select between round to nearest and truncate (round to 0) for both
vertex geometry (X,Y) and color conversions. The default is to truncate. This register should only be changed when
the 3D pipe is idle. Otherwise, switching can occur in the middle of primitives, which could cause visual anomalies.
This register, once set, should never be changed again.
10.4.3 Switching from 1/12th to 1/16th subpixel mode
Switching from 1/12 to 1/16 subpixel mode is done through the use of the GB_TILE_CONFIG register. Normally,
changing this register requires the use of a soft reset afterwards. However, changing the subpixel field does not
require a reset. However, it does require that the 3D pipe be idle. Also, the Z buffer can become incompatible after
switching the subpixel mode. Basically, if Z compression is enabled, the values contained in the Z buffer are
incompatible between subpixel modes, so that the buffer needs to be re-initialized after each switch.
10.4.4 Fastfill and compression in Z
Fast fill and compression only works in micro-tiled mode. The following table shows the valid combinations of fast
fill and rd/wr compression :
Fast FIll
0
1
RdCompression
0
0
WrCompression
0
0
1
1
1
1
1
0
description
no fast-fill or compression, the Z buffer has to be cleared explicitly.
fastfill, Z buffer does not need to be cleared explicitly, The zmask should be
set to 2‟b00 for all for all 4x4 tiles on the drawing window. The zb_clearvalue
will hold the cleared Z value
Same as above , with compression turned on.
Used to decompress , a compressed Z buffer …
Note that all other combinations in the above table are invalid. The emulator is programmed to generate an assert in
these cases. Compression does not work with all 16-bit formats. For 16-bit integer buffering, compression causes a
hung with one or two samples and should not be used.
© 2010 Advanced Micro Devices, Inc.
Proprietary
139
Revision 1.5
June 8, 2010
10.4.5 Z-Top
It is beneficial for performance to have Z buffer at the top of the pipe, since the quads that do not pass Z buffer do
not have to be sent to the shader. Depending on how many instructions the shader executes, this could gain you a lot
of advantage. There are several cases in which the Z buffer has to be at the bottom:
1234-
Alpha threshold (afunction) is turned on
Shader uses texkill instructions.
Chroma key cull enabled.
W-buffering
Cases 1,2 and 3 can kill a pixel before Z buffering . However, if the contents of the Z/stencil buffer will not be
modified, then ztop can remain enabled (1). This implies that the following state is in effect:
12-
Z-buffering is disabled or Zwrite-mask is off .
Stencil is disabled or stencil-wrmask is off or SFAIL/ZPASS/ZFAIL are all set to KEEP.
W values are always generated at the bottom of the pipe, so for w-buffering, ztop should be set to 0.
There is penalty in moving the Z buffer from top to bottom or vice versa. The pipe will be stalled at the sc and all
the quads that are in the pipe between the sc and cb have to be processed before the switch occurs. This is all done in
HW. If the ztop =0 and you write another 0 to it, there is no performance penalty. If it is 1 and you write a 1 to it,
there is no performance penalty. The penalty is only incurred when you switch from top to bottom or bottom to top.
10.4.6 Sub-sample locations
In point sample mode, POS0 defines the X,Y of the upper left pixel of the quad. POS1 defines the X,Y of the upper
right pixel of a quad. POS2 defines the X,Y of the lower left pixel in a quad and POS3 defines the X,Y of the lower
right pixel in a quad. This is done so that in R200 style super-sampling mode, the sample locations for the pixels can
be jittered. Hierarcical Z has to be shut off when the 4 pixels in the quad have different locations in point sample
mode.
In multi-sample mode , samples 0,1,2,3,4,5 of pixels 0,1,2,3 of a quad are defines by pos0,1,2,3,4,5 .., so all pixels in
the quad have the same sub-sample pattern.
There is a quirk when setting the MSPOS0.msbd0_x. The value represents the distance from the left edge of the
pixel quad to the first sample in subpixels. All values less than eight should use the actual value, but „7‟ should be
used for the distance „8‟. The hardware will convert 7 into 8 internally.
It is also important that when using less than 6 multisample positions, the unused samples must be set to the position
of other valid samples.
10.4.7 Dithered Clears
Fast cmask clears of a subsampled buffer will not be dithered.
The ZB doesn‟t do color dithering so ZBCB clears will not be dithered.
When doing clears in 16 bit mode with dithering enabled the driver should examine the clear color value and
determine if it would be affected by dithering. For example a color value of zero when dithered will remain zero for
all dither factors. If the color would not be affected by dithering either fast clears or ZBCB clears can be used,
otherwise a full window rectangle write should be used to clear the buffer. This is only an issue for 16 bit buffers
with some clear color values so hardware support is not provided.
© 2010 Advanced Micro Devices, Inc.
Proprietary
140
Revision 1.5
June 8, 2010
10.4.8 4x AA tiling
R420 introduced a new tiling mode for 4x AA buffers. Each 4x4 block of pixels occupies 8 cache lines of memory
(32 bytes per cache line). When the block is decompressed, the color samples are grouped together. Thus, all 16
sample 0s are in one chunk, all 16 sample 1s are in another, etc. On R300, decompressed blocks where organized
with sample 0s being first, then sample 1s, then 2s then 3s. On R420, groups of 8 cache lines have the top and
bottom halves interchanged when the block address is odd in the x dimension. For example, block (0,0) is organized
just like R300, but block (1,0) would have samples 2 and 3 before samples 0 and 1. Block (2, 0) would be just like
R300 again. Note: This new tiling mode only applies when memory mapping is disabled.
10.4.9 8x8 Z plane compression
Chips based on the RV350 and beyond support a new 8x8 Z plane compression mode specified in the
GB_Z_PEQ_CONFIG register. When compression is not enabled, the Z plane compression mode has to be set to
4x4 in order for the GA and ZB to agree on the Z plane equation format and avoid visual corruption.
10.5 Blend optimization notes
10.5.1 Disabling reads during blending
The destination color is not necessary for some blending operations. The cb has a read enable called
RB3D_BLENDCNTL.READ_ENABLE to control whether the destination color is read or not during blending
operations. Reads must be enabled during blending operations that require the destination color. Failure to do so
will result in incorrect results. Leaving the register enabled when blending is disabled does not have any adverse
affects.
10.5.2 Discarding pixels based upon the source color
There are cases where blend operations do not change the contents of the frame buffer. For example, adding zero to
the frame buffer does not change the frame buffer contents. Although the operations do nothing to the frame buffer,
they still take bandwidth. The cb can discard pixels based on the source color to eliminate some useless blend
operations. The RB3D_BLENDCNTL. DISCARD_SRC_PIXELS register controls the functionality. When to use
this feature is under driver control. The cb will not override this register if it is not safe to use under the current
blending mode.
10.5.3 ZB/CB cache flushes
ZB/CB cache flushes take hundreds of cycles to complete, so they should be avoided if possible. Performing a
cache flush when the cache is already clean only takes a cycle, so there isn‟t any penalty for flushing a cache
multiple times as long as there are no intervening quads.
10.6 Texture Notes
TX_CHROMA_KEY must be the same format as the texture being keyed with any unused msb‟s zero‟d. And
should be AVYU for all YUV formats.
TX_FMT_*_MPEG formats are implicitely signed. However the TX_FORMAT1_*_SIGNED_COMP* bits must
still be explicitely set. It is a bug to use an MPEG format and indicate that the components are unsigned.
© 2010 Advanced Micro Devices, Inc.
Proprietary
141
Revision 1.5
June 8, 2010
10.7 GA Point/Line/Polygon Setup
10.7.1
Wide & Anti-aliased points
All points in the GA are converted to parallelograms that have width and height. “Wide” points are just points with
larger height and width and so are not different than other points. AA points are identical to regular points,
dimension wise. However, AA points do have at least 1 texture coordinate. The AA texture coordinates will be
“stuffed” into the indicated texture coordinates. The values to stuff are loaded from registers. The geometry for the
point (height and width) will be used to compute the screen coordinates of the vertices, based on the incoming V0
vertex.
To compute the geometry, the half height and half width of a point are supplied in a register, or can be supplied per
vertex. Note that the ½ height and ½ width represent 16b values in 1/12 or 1/16 pixel increments (since they are ½
size, the minimum point width and height are 1/6). When supplied per vertex, the ½ height and ½ width are equal.
Per vertex size is always square.
The (min_s, min_t, max_s, max_t) are loaded from registers in the GA. The third dimension for the AA texture will
be stuffed with 0.0, indicating an AA point.
Note: If texture AA/Stipple stuffing is enabled for a set of texture coordinates, but AA points are not, the specified
texture coordinates will be stuffed with (0.0, 0.0, 0.0).
10.7.2
Wide & AA lines geometry
For wide lines, the width is programmed in a register that indicates ½ width of the line (in 16.0/12 or 16.0/16
format).
10.7.3
Anti-Aliased and Stippled lines’ texture
For lines that are stippled and/or anti-aliased, the setup will stuff the indicated texture coordinate with procedural
texture coordinate values. It is to be noted that the pipe must be setup to handle (n+1) texture coordinates in this
mode (where n is the number of replicated texture coordinates). The generated texture coordinate will be 3
dimensions. The S component will be used for Anti-Aliased lines. The (min_s, max_s) values for AA lines are
loaded from registers in the GA. The stipple uses the t coordinate for lines.
The third coordinate will be stuffed with 1.0, which indicates to the texture unit that the texture to be used is for
lines (stipple, AA or AA & Stipple).
Note: If AA/stipple texture stuffing is enabled, but AA lines and stippled lines are disabled, then (0.0, 0.0, 1.0) will
be stuffed in the specified texture(s). Also, if texture stuffing is disabled but line stippling is enabled, then
accumulation of stipple pattern will still be done, even though no texture coordinates will be outputted.
10.7.4
Stipple Polygon
For stippled polygons, the GA unit will stuff the indicated texture coordinate with a 3D texture. The first two
coordinates will be computed based on the screen coordinates of the triangle. The third component of the stuffed
texture will be 2.0, which indicates to the texture unit that the stippled polygon texture should be used.
10.7.5
Texture Stuffing
The GA has the ability to stuff any of the texture coordinates with the following items:
© 2010 Advanced Micro Devices, Inc.
Proprietary
142
Revision 1.5
Encoding
Replicate
PointTexture
StippleAA
June 8, 2010
Texture Coordinate Source
VAP Supplied texture coordinates
Point (S,T) GA supplied texture coordinates
Stipple and or AA GA supplied texture coordinates
A texture is active if the VAP FMT_1 register enables that texture and its stuff option is Replicate, or if its stuff
option is other than Replicate.
10.7.6
GA Fog stuffing (R5xx)
The GA supports the stuffing of texture coordinates with the current fog value. A single texture component of a
single texture can be selected. The GB_SELECT register controls the stuffing of the texture. The
FOG_STUFF_TEX selects which texture, while the FOG_STUFF_COMP selects the component. FP20 values of
A0,A1,A2 or A3 can be selected, as well as FP32 values of 1/W and Z. This could also be used as a way to get 1/W
buffering into the pixel shader, which can then be sent instead of Z.
10.8 CB AA/Clear Setup
10.8.1 Anti-aliasing
Only 32-bit color modes can be anti-aliased. 16-bit and 8-bit color modes are not supported. The CB supports
1,2,3,4, or 6 subsamples per pixel. The CB also has support for supersampling in the form of using coverage mask
bits s00, s11, s22, and s33 for pixels 0, 1, 2, and 3, respectively, in point sampled mode. The effect is that
subsample locations 0, 1, 2, and 3 are used for pixels 0, 1, 2, and 3, respectively. This makes it possible to
individually change the sample location of any of the 4 pixels in a quad for jittered supersampling. There isn‟t an
enable bit for this feature, so in point sampled mode, the first four subsample locations should be set to the same
location.
For anti-aliasing a special subsampled frame buffer is allocated with enough storage for all the samples. Only the
subsamples selected by the incoming coverage mask will be updated. Coverage mask bits are used in numeric order:
2 subsample uses mask bits 0, and 1; 3 subsample uses mask bits 0,1,and 2; 4 subsamples uses mask bits 0,1,2, and
3; 6 subsamples uses mask bits 0,1,2,3,4, and 5.
The storage format for subsampled pixels is optimized for and proprietary to the CB. The
RB3D_COLORPITCH0.COLORTILE and RB3D_COLROPITCH0.COLORMICROTILE control the tiling format
of the resolve buffer. They have no effect on the tiling of the subsample buffer.
10.8.2 Anti-aliased Buffer Resolve
The subsample buffer can neither be displayed nor used as a texture. The CB must perform a resolve pass to create
a point sampled buffer, hereafter called the resolve buffer, which can then be displayed or used as a texture. The
resolve buffer may be microtiled and/or macrotiled. The resolve operation cannot be performed in place – the
subsample buffer and the resolve buffer must not overlap.
The following is the procedure for performing a resolve:
1.
2.
3.
Flush the CB cache. The cache does not have to be flushed if it is already known that the cache is
clean.
Set the CB registers to allow rendering to the subsample buffer. This includes things like the pitch,
offset, cmask enable, etc…
Set RB3D_COLORPITCH0.COLORTILE and RB3D_COLORPITCH0.COLORMICROTILE to the
desired tiling format of the resolve buffer.
© 2010 Advanced Micro Devices, Inc.
Proprietary
143
Revision 1.5
June 8, 2010
4.
Set RB3D_AARESOLVE_OFFSET.AARESOLVE_OFFSET and
RB3D_AARESOLVE_PITCH.AARESOLVE_PITCH to the offset and pitch of the resolve buffer.
5. Set RB3D_AARESOLVE_CTL.AARESOLVE_GAMMA to the desired resolve buffer gamma.
6. Enable resolve mode through RB3D_AARESOLVE_CTL.AARESOLVE_MODE.
7. Set subsample sample locations 0, 1, 2, and 3 to the pixel centers or wherever the desired sample
location would be in point sampled mode.
8. Disable z-buffering.
9. Enable the screen door mask on all the subsamples of all the pixels.
10. Draw triangles over the region(s) that should be resolved. The CB will resolve any pixels that would
be considered visible in point sampled rendering. The fragment colors are ignored, so all bandwidth
consuming or performance degrading features should be disabled to maximize rasterization speed. If
the desire is to resolve a rectangle, then it is best to draw a rectangular point. This is more efficient
than drawing two triangles.
11. Flush the CB cache.
The resolve operation consists of summing all of the subsamples that make up a pixel and dividing by the number of
samples. No samples outside of the pixel can be included in the resolved result. The alpha channel data is not
averaged, the sample 0 data is returned as the resolved value. The resolve can be gamma corrected.
Anti-aliased buffers must be resolved for display.
10.8.3 Subsample buffer addressing
The subsample buffer format used by the CB is proprietary to the CB, but it is exactly the same as the one used by
the ZB down to a 4x4 pixel granularity.
Format
32-bit 2 ss
32-bit 3 ss
32-bit 4 ss
32-bit 6 ss
Formula
32*(4 *cat(x[11:2], y[2]) + 8*y[11:3]*pitch[13:2] + cache_word_offset +
RB3D_COLOROFFSET[31:5])
32*(6 *cat(x[11:2], y[2]) + 12*y[11:3]*pitch[13:2] + cache_word_offset +
RB3D_COLOROFFSET[31:5])
32*(8 *cat(x[11:2], y[2]) + 16*y[11:3]*pitch[13:2] + cache_word_offset^cat(x[2], 2‟b00) +
RB3D_COLOROFFSET[31:5])
32*(12*cat(x[11:2], y[2]) + 24*y[11:3]*pitch[13:2] + cache_word_offset +
RB3D_COLOROFFSET[31:5])
cache_word_offset refers to the 256-bit word number within a cacheline. The given equations generate the byte
address for a 256-bit cache word for the cache line with x=x[11:2] and y=y[11:2].
10.8.4 CBZB Color Clear
In point sampled modes, the ZB can use zmask fast clears to clear the z-buffer. The CB does not have this option
and must clear the buffer the conventional way. The ZB would be idle during a color buffer clear and bandwidth
would be wasted. By having the ZB clear half of the color buffer while the CB clears the other half, the R300 can
achieve 100% memory bandwidth utilization. Except for the ZB_BW_CNTL.ZB_CB_CLEAR bit in the ZB, there
are no special provisions for this in the ZB. The ZB_DEPTHOFFSET must be set to the midpoint of the color
buffer and the ZB_DEPTHPITCH settings must match that of the CB in order for this to work.
The ZB does not have dither support, so this technique cannot be used when a dithered result is required.
10.8.5 Frame Buffer Granularity and Alignment
For various implementation reasons, there are X and Y granularity and alignment restrictions on the frame buffer.
© 2010 Advanced Micro Devices, Inc.
Proprietary
144
Revision 1.5
June 8, 2010
The width and height of the color buffer must be a multiple of the x and y granularity, respectively.
Color
depth
8-bit
8-bit
8-bit
8-bit
16-bit
16-bit
16-bit
16-bit
16-bit
16-bit
32-bit
32-bit
32-bit
32-bit
32-bit
64-bit
64-bit
64-bit
64-bit
128-bit
128-bit
Macrotiled
Microtiled
Misc.
X
X
X
X
X
X
X
X
X
4x4 microtiles
X
X
4x4 microtiles
X
X
X
X
AA mode
X
X
X
X
X
X granularity
(pixels)
32
8
256
64
16
8
4
128
64
32
8
4
64
32
4
4
2
32
16
2
16
Y granularity
(pixels)
2
4
8
32
2
2
4
8
16
32
2
4
8
16
8
2
2
8
16
1
8
Alignment
(bytes)
32
32
2048
2048
32
32
32
2048
2048
2048
32
32
2048
2048
32
32
32
2048
2048
32
2048
10.8.6 Multiple buffers
The CB can write to up to 4 different buffers with either different data or the same data to each buffer. To write
different data to N buffers, set RB3D_CCTL.NUM_MULTIWRITES to 1 buffer and have the pixel shader output N
colors per packet. The Jth color in the packet will be written to the Jth buffer. To write the same data to N buffers,
set RB3D_CCTL.NUM_MULTIWRITES to N buffers and have the pixel shader output 1 color per packet.
There is no way to skip buffers – it is not possible to write to buffer 0 and 2 and skip buffer 1.
Usage of the multiple buffer mechanism will result in performance degradation. The degradation is due to the
increased number of color writes (and reads) and due to the effective size of the cache being decreased by sharing it
between the different memory buffer targets.
Resolving multiple buffers simultaneously is not supported.
10.9 Errata
10.9.1 Facing bit with Polymode & colors
In R5xx, just as R4xx, when lines are sent from the setup to the rasterizer, the setup‟s facing information is lost,
since no facing information is sent between the SU and SC. This implies that lines will always be treated as
“forward facing” in the scan converter. This facing information is passed to the shader as the “facing bit”, which can
be used as a conditional.
Consequently, in polygon outline mode, where lines have front and back meaning, when rendering a line polygon
(for either front or back), the facing bit will always be marked as front facing, regardless of the facing of the original
© 2010 Advanced Micro Devices, Inc.
Proprietary
145
Revision 1.5
June 8, 2010
triangle. Back / Front culling does occur correctly here (i.e. if the front render is line and front face culling is
enabled, then no front facing lines will get drawn), but the facing bit for rendered lines or points will be always front
facing.
The R5xx contain a work-around for this problem, in the form of a special mode. This mode is enabled by setting
the bits of SU_PERF.PERF3_SEL to all 1‟s (31). When enabled, this will force the sign bits of the components of
the colors to be set to (0) for front facing, or (1) for back facing. All colors in a primitive will get their sign bit
changed, based on the facing of the primitive, or of its provoking vertex (in the case of polymode). If source colors
are positive, then, in the pixel shader, back facing polygons will have negative colors, while front facing polygons
will have positive colors. This mode will work, regardless of PS2 or PS3 mode in the pipe.
10.9.2 PS3 Polymode textures
In the R5xx mode, polymode texture coordinates are not computed correctly when the pipe is in PS3 mode. To fix
this, a polymode_ps3 fix has been implemented. This mode is enabled by setting the GA_PERF.PERF3_SEL[4] bit
to 0x1. This mode should only be set when in PS3 mode. As well, when set and in PS3 mode, colors will not
longer be computed correctly in polymode for polygons, but that is acceptable, since colors are not naturally
available in PS3 mode.
10.9.3 GA Fog stuffing
The GA supports stuffing the fog value (either an FP20 from C0a->C3a, or W or Z) into a texture component. The
limitation for R5xx, is that the GA can only stuff the component of the first active texture. It can only stuff any one
of the first 2 active components of the first active coordinate set.
10.9.4 Line rendering
When subpixel precision is enabled, there is a possibility that the rendering hardware will determine an incorrect
dominating direction, when the start and end X values of the line have the same 1/12 or 1/16 pixel value, but
different subpixel values. This can cause double pixel hits or missing pixels in continuous line drawing. The workaround, is to disable subpixel precision rendering when drawing lines.
10.9.5 PS3_VTX_FMT & PS3_TEX_SOURCE
Writes to the PS3_VTX_FMT and PS3_TEX_SOURCE register can cause bad textures or hangs in R5xx chips, if
followed immediately by VF_CNTL writes (i.e. draw command). Following any of these 2 registers with 2 register
writes (to GA or any block below) will always avoid the problem, before the next VF_CNTL.
© 2010 Advanced Micro Devices, Inc.
Proprietary
146
Revision 1.5
June 8, 2010
11. Registers
11.1 Command Processor Registers
CP:CP_CSQ2_STAT · [R] · 32 bits · Access: 8/16/32 · MMReg:0x7fc
DESCRIPTION: (RO) Command Stream Indirect Queue 2 Status
Field Name
Bits
Default
Description
CSQ_WPTR_INDIRECT
9:0
none
Current Write Pointer into the Indirect Queue. Default =
0.
CSQ_RPTR_INDIRECT2
19:10
none
Current Read Pointer into the Indirect Queue. Default =
0.
CSQ_WPTR_INDIRECT2
29:20
none
Current Write Pointer into the Indirect Queue. Default =
0.
CP:CP_CSQ_ADDR · [W] · 32 bits · Access: 8/16/32 · MMReg:0x7f0
DESCRIPTION: (WO) Command Stream Queue Address
Field Name
Bits
Default
Description
CSQ_ADDR
11:2
none
Address into the Command Stream Queue which is to be
read from. Used for debug, to read the contents of the
Command Stream Queue.
CP:CP_CSQ_APER_INDIRECT · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x1300-0x13fc
DESCRIPTION: IB1 Aperture map in RBBM - PIO
Field Name
Bits
Default
Description
CP_CSQ_APER_INDIRECT
(Access: W)
31:0
none
IB1 Aperture
CP:CP_CSQ_APER_INDIRECT2 · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x1200-0x12fc
DESCRIPTION: IB2 Aperture map in RBBM - PIO
Field Name
Bits
Default
Description
CP_CSQ_APER_INDIRECT2
(Access: W)
31:0
none
IB2 Aperture
CP:CP_CSQ_APER_PRIMARY · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x1000-0x11fc
DESCRIPTION: Primary Aperture map in RBBM - PIO
Field Name
© 2010 Advanced Micro Devices, Inc.
Proprietary
Bits
Default
Description
147
Revision 1.5
CP_CSQ_APER_PRIMARY
(Access: W)
31:0
none
June 8, 2010
Primary Aperture
CP:CP_CSQ_AVAIL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x7b8
DESCRIPTION: Command Stream Queue Available Counts
Field Name
Bits
Default
Description
CSQ_CNT_PRIMARY
(Access: R)
9:0
none
Count of available dwords in the queue for the Primary
Stream. Read Only.
CSQ_CNT_INDIRECT
(Access: R)
19:10
none
Count of available dwords in the queue for the Indirect
Stream. Read Only.
CSQ_CNT_INDIRECT2
(Access: R)
29:20
none
Count of available dwords in the queue for the Indirect
Stream. Read Only.
CP:CP_CSQ_CNTL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x740
DESCRIPTION: Command Stream Queue Control
Field Name
Bits
Default
Description
CSQ_MODE
31:28
0x0
Command Stream Queue Mode. Controls whether each
command stream is enabled, and whether it is in push
mode (Programmed I/O), or pull mode (Bus-Master).
Encodings are chosen to be compatible with Rage128. 0=
Primary Disabled, Indirect Disabled. 1= Primary PIO,
Indirect Disabled. 2= Primary BM, Indirect Disabled.
3,5,7= Primary PIO, Indirect BM. 4,6,8= Primary BM,
Indirect BM. 9-14= Reserved. 15= Primary PIO, Indirect
PIO Default = 0
CP:CP_CSQ_DATA · [R] · 32 bits · Access: 8/16/32 · MMReg:0x7f4
DESCRIPTION: (RO) Command Stream Queue Data
Field Name
Bits
Default
Description
CSQ_DATA
31:0
none
Data from the Command Stream Queue, from location
pointed to by the CP_CSQ_ADDR register. Used for
debug, to read the contents of the Command Stream
Queue.
CP:CP_CSQ_MODE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x744
DESCRIPTION: Alternate Command Stream Queue Control
Field Name
Bits
Default
Description
INDIRECT2_START
6:0
none
Start location of Indirect Queue #2 in the command
cache. This value also sets the size in double octwords of
the Indirect Queue #1 cache that will reside in locations
INDIRECT1_START to (INDIRECT2_START - 1). The
Indirect Queue #2 will reside in locations
© 2010 Advanced Micro Devices, Inc.
Proprietary
148
Revision 1.5
June 8, 2010
INDIRECT2_START to 0x5f. The minimum size of the
Indirect Queues must be at least twice the MAX_FETCH
size as programmed in the CP_RB_CNTL register.
INDIRECT1_START
14:8
none
Start location of Indirect Queue #1 in the command
cache. This value is also the size in double octwords of
the Primary Queue cache that will reside in locations 0 to
(INDIRECT1_START - 1). The minimum size of the
Primary Queue cache must be at least twice the
MAX_FETCH size as programmed in the
CP_RB_CNTL register.
CSQ_INDIRECT2_MODE
26
0x0
0=>PIO, 1=>BM
CSQ_INDIRECT2_ENABLE
27
0x0
Enables Indirect Buffer #2. If this bit is set, the
CP_CSQ_MODE register overrides the operation of the
CSQ_MODE variable in the CP_CSQ_CNTL register.
CSQ_INDIRECT1_MODE
28
0x0
0=>PIO, 1=>BM
CSQ_INDIRECT1_ENABLE
29
0x0
Enables Indirect Buffer #1. If this bit is set, the
CP_CSQ_MODE register overrides the operation of the
CSQ_MODE variable in the CP_CSQ_CNTL register.
CSQ_PRIMARY_MODE
30
0x0
0=>PIO, 1=>BM
CSQ_PRIMARY_ENABLE
31
0x0
Enables Primary Buffer. If this bit is set, the
CP_CSQ_MODE register overrides the operation of the
CSQ_MODE variable in the CP_CSQ_CNTL register.
CP:CP_CSQ_STAT · [R] · 32 bits · Access: 8/16/32 · MMReg:0x7f8
DESCRIPTION: (RO) Command Stream Queue Status
Field Name
Bits
Default
Description
CSQ_RPTR_PRIMARY
9:0
none
Current Read Pointer into the Primary Queue. Default =
0.
CSQ_WPTR_PRIMARY
19:10
none
Current Write Pointer into the Primary Queue. Default =
0.
CSQ_RPTR_INDIRECT
29:20
none
Current Read Pointer into the Indirect Queue. Default =
0.
CP:CP_GUI_COMMAND · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x728
DESCRIPTION: Command for PIO GUI DMAs
Field Name
Bits
Default
Description
CP_GUI_COMMAND
31:0
none
Command for PIO DMAs to the GUI DMA. Only
DWORD access is allowed to this register.
CP:CP_GUI_DST_ADDR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x724
DESCRIPTION: Destination Address for PIO GUI DMAs
Field Name
© 2010 Advanced Micro Devices, Inc.
Proprietary
Bits
Default
Description
149
Revision 1.5
CP_GUI_DST_ADDR
31:0
none
June 8, 2010
Destination address for PIO DMAs to the GUI DMA.
Only DWORD access is allowed to this register.
CP:CP_GUI_SRC_ADDR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x720
DESCRIPTION: Source Address for PIO GUI DMAs
Field Name
Bits
Default
Description
CP_GUI_SRC_ADDR
31:0
none
Source address for PIO DMAs to the GUI DMA. Only
DWORD access is allowed to this register.
CP:CP_IB2_BASE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x730
DESCRIPTION: Indirect Buffer 2 Base
Field Name
Bits
Default
Description
IB2_BASE
31:2
none
Indirect Buffer 2 Base. Address of the beginning of the
indirect buffer. Only DWORD access is allowed to this
register.
CP:CP_IB2_BUFSZ · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x734
DESCRIPTION: Indirect Buffer 2 Size
Field Name
Bits
Default
Description
IB2_BUFSZ
22:0
0x0
Indirect Buffer 2 Size. This size is expressed in dwords.
This field is an initiator to begin fetching commands
from the Indirect Buffer. Only DWORD access is
allowed to this register. Default = 0
CP:CP_IB_BASE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x738
DESCRIPTION: Indirect Buffer Base
Field Name
Bits
Default
Description
IB_BASE
31:2
none
Indirect Buffer Base. Address of the beginning of the
indirect buffer. Only DWORD access is allowed to this
register.
CP:CP_IB_BUFSZ · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x73c
DESCRIPTION: Indirect Buffer Size
Field Name
Bits
Default
Description
IB_BUFSZ
22:0
0x0
Indirect Buffer Size. This size is expressed in dwords.
This field is an initiator to begin fetching commands
from the Indirect Buffer. Only DWORD access is
allowed to this register. Default = 0
© 2010 Advanced Micro Devices, Inc.
Proprietary
150
Revision 1.5
June 8, 2010
CP:CP_ME_CNTL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x7d0
DESCRIPTION: Micro Engine Control
Field Name
Bits
Default
Description
ME_STAT
(Access: R)
15:0
none
Status of MicroEngine internal registers. This value
depends on the current value of the ME_STATMUX
field. Read Only.
ME_STATMUX
20:16
0x0
Selects which status is to be returned on the ME_STAT
field.
ME_BUSY
(Access: R)
29
none
Busy indicator for the MicroEngine. 0 = MicroEngine
not busy. 1 = MicroEngine is active. Read Only.
ME_MODE
30
0x1
Run-Mode of MicroEngine. 0 = Single-Step Mode. 1 =
Free-running Mode. Default = 1
ME_STEP
(Access: W)
31
0x0
Step the MicroEngine by one instruction. Writing a `1` to
this field causes the MicroEngine to step by one
instruction, if and only if the ME_MODE bit is a `0`.
Write Only.
CP:CP_ME_RAM_ADDR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x7d4
DESCRIPTION: MicroEngine RAM Address
Field Name
Bits
Default
Description
ME_RAM_ADDR
(master with mirrors)
7:0
none
MicroEngine RAM Address (Write Mode) Writing this
register puts the RAM access circuitry into `Write Mode`
, which allows the address to auto-increment as data is
written into the RAM.
CP:CP_ME_RAM_DATAH · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x7dc
DESCRIPTION: MicroEngine RAM Data High
Field Name
Bits
Default
Description
ME_RAM_DATAH
7:0
none
MicroEngine RAM Data High Used to load the
MicroEngine RAM.
CP:CP_ME_RAM_DATAL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x7e0
DESCRIPTION: MicroEngine RAM Data Low
Field Name
Bits
Default
Description
ME_RAM_DATAL
31:0
none
MicroEngine RAM Data Low Used to load the
MicroEngine RAM.
CP:CP_ME_RAM_RADDR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x7d8
DESCRIPTION: MicroEngine RAM Read Address
Field Name
© 2010 Advanced Micro Devices, Inc.
Proprietary
Bits
Default Description
151
Revision 1.5
ME_RAM_RADDR
7:0
(mirror of
CP_ME_RAM_ADDR:ME_RAM_ADDR)
(Access: W)
none
June 8, 2010
MicroEngine RAM Address (Read Mode) Writing
this register puts the RAM access circuitry into `Read
Mode` , which allows the address to auto-increment
as data is read from the RAM. Write Only.
CP:CP_RB_BASE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x700
DESCRIPTION: Ring Buffer Base
Field Name
Bits
Default
Description
RB_BASE
31:2
none
Ring Buffer Base. Address of the beginning of the ring
buffer.
CP:CP_RB_CNTL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x704
DESCRIPTION: Ring Buffer Control
Field Name
Bits
Default
Description
RB_BUFSZ
5:0
0x0
Ring Buffer Size. This size is expressed in log2 of the
actual size. Values 0 and 1 are clamped to an 8 DWORD
ring buffer. A value of 2 to 22 will give a ring buffer:
2^(RB_BUFSZ+1). Values greater than 22 will clamp to
22. Default = 0
RB_BLKSZ
13:8
0x0
Ring Buffer Block Size. This defines the number of
quadwords that the Command Processor will read
between updates to the host`s copy of the Read Pointer.
This size is expressed in log2 of the actual size (in 64-bit
quadwords). For example, for a block of 1024
quadwords, you would program this field to 10(decimal).
Default = 0
BUF_SWAP
17:16
0x0
Endian Swap Control for Ring Buffer and Indirect
Buffer. Only affects the chip behavior if the buffer
resides in system memory. 0 = No swap 1 = 16-bit swap:
0xAABBCCDD becomes 0xBBAADDCC 2 = 32-bit
swap: 0xAABBCCDD becomes 0xDDCCBBAA 3 =
Half-dword swap: 0xAABBCCDD becomes
0xCCDDAABB Default = 0
MAX_FETCH
19:18
0x0
Maximum Fetch Size for any read request that the CP
makes to memory. 0 = 1 double octword. (32 bytes) 1 =
2 double octwords. (64 bytes) 2 = 4 double octwords.
(128 bytes) 3 = 8 double octwords. (256 bytes). Default
=0
RB_NO_UPDATE
27
0x0
Ring Buffer No Write to Read Pointer 0= Write to Host`s
copy of Read Pointer in system memory. 1= Do not write
to Host`s copy of Read pointer. The purpose of this
control bit is to have a fall-back position if the busmastered write to system memory doesn`t work, in which
case the driver will have to read the Graphics
Controller`s copy of the Read Pointer directly, with some
performance penalty. Default = 0
© 2010 Advanced Micro Devices, Inc.
Proprietary
152
Revision 1.5
RB_RPTR_WR_ENA
31
0x0
June 8, 2010
Ring Buffer Read Pointer Write Transfer Enable. When
set the contents of the CP_RB_RPTR_WR register is
transferred to the active read pointer (CP_RB_RPTR)
whenever the CP_RB_WPTR register is written. Default
=0
CP:CP_RB_RPTR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x710
DESCRIPTION: Ring Buffer Read Pointer Address (RO)
Field Name
Bits
Default
Description
RB_RPTR
(Access: R)
22:0
none
Ring Buffer Read Pointer. This is an index (in dwords)
of the current element being read from the ring buffer.
CP:CP_RB_RPTR_ADDR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x70c
DESCRIPTION: Ring Buffer Read Pointer Address
Field Name
Bits
Default
Description
RB_RPTR_SWAP
1:0
0x0
Swap control of the reported read pointer address. See
CP_RB_CNTL.BUF_SWAP for the encoding.
RB_RPTR_ADDR
31:2
0x0
Ring Buffer Read Pointer Address. Address of the Host`s
copy of the Read Pointer. CP_RB_RPTR (RO) Ring
Buffer Read Pointer
CP:CP_RB_RPTR_WR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x71c
DESCRIPTION: Writable Ring Buffer Read Pointer Address
Field Name
Bits
Default
Description
RB_RPTR_WR
22:0
0x0
Writable Ring Buffer Read Pointer. Writable for
updating the RB_RPTR after an ACPI.
CP:CP_RB_WPTR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x714
DESCRIPTION: (RO) Ring Buffer Write Pointer
Field Name
Bits
Default
Description
RB_WPTR
22:0
0x0
Ring Buffer Write Pointer. This is an index (in dwords)
of the last known element to be written to the ring buffer
(by the host).
CP:CP_RB_WPTR_DELAY · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x718
DESCRIPTION: Ring Buffer Write Pointer Delay
Field Name
Bits
Default
Description
PRE_WRITE_TIMER
27:0
0x0
Pre-Write Timer. The number of clocks that a write to
the CP_RB_WPTR register will be delayed until actually
taking effect. Default = 0
© 2010 Advanced Micro Devices, Inc.
Proprietary
153
Revision 1.5
PRE_WRITE_LIMIT
31:28
0x0
June 8, 2010
Pre-Write Limit. The number of times that the
CP_RB_WPTR register can be written (while the
PRE_WRITE_TIMER has not expired) before the
CP_RB_WPTR register is forced to be updated with the
most recently written value. Default = 0
CP:CP_RESYNC_ADDR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x778
DESCRIPTION: Raster Engine Sync Address (WO)
Field Name
Bits
Default
Description
RESYNC_ADDR
(Access: W)
2:0
0x0
Scratch Register Offset Address.
CP:CP_RESYNC_DATA · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x77c
DESCRIPTION: Raster Engine Sync Data (WO)
Field Name
Bits
Default
Description
RESYNC_DATA
(Access: W)
31:0
none
Data written to selected Scratch Register when a sync
pulse pair is received from the CBA and CBB.
CP:CP_STAT · [R] · 32 bits · Access: 8/16/32 · MMReg:0x7c0
DESCRIPTION: (RO) Busy Status Signals
Field Name
Bits
Default
Description
MRU_BUSY
0
none
Memory Read Unit Busy.
MWU_BUSY
1
none
Memory Write Unit Busy.
RSIU_BUSY
2
none
Register Backbone Input Interface Busy.
RCIU_BUSY
3
none
RBBM Output Interface Busy.
CSF_PRIMARY_BUSY
9
none
Primary Command Stream Fetcher Busy.
CSF_INDIRECT_BUSY
10
none
Indirect #1 Command Stream Fetcher Busy.
CSQ_PRIMARY_BUSY
11
none
Data in Command Queue for Primary Stream.
CSQ_INDIRECT_BUSY
12
none
Data in Command Queue for Indirect #1 Stream.
CSI_BUSY
13
none
Command Stream Interpreter Busy.
CSF_INDIRECT2_BUSY
14
none
Indirect #2 Command Stream Fetcher Busy.
CSQ_INDIRECT2_BUSY
15
none
Data in Command Queue for Indirect #2 Stream.
GUIDMA_BUSY
28
none
GUI DMA Engine Busy.
VIDDMA_BUSY
29
none
VID DMA Engine Busy.
CMDSTRM_BUSY
30
none
Command Stream Busy.
CP_BUSY
31
none
CP Busy.
CP:CP_VID_COMMAND · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x7cc
© 2010 Advanced Micro Devices, Inc.
Proprietary
154
Revision 1.5
June 8, 2010
DESCRIPTION: Command for PIO VID DMAs
Field Name
Bits
Default
Description
CP_VID_COMMAND
31:0
none
Command for PIO DMAs to the VID DMA. Only
DWORD access is allowed to this register.
CP:CP_VID_DST_ADDR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x7c8
DESCRIPTION: Destination Address for PIO VID DMAs
Field Name
Bits
Default
Description
CP_VID_DST_ADDR
31:0
none
Destination address for PIO DMAs to the VID DMA.
Only DWORD access is allowed to this register.
CP:CP_VID_SRC_ADDR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x7c4
DESCRIPTION: Source Address for PIO VID DMAs
Field Name
Bits
Default
Description
CP_VID_SRC_ADDR
31:0
none
Source address for PIO DMAs to the VID DMA. Only
DWORD access is allowed to this register.
CP:CP_VP_ADDR_CNTL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x7e8
DESCRIPTION: Virtual vs Physical Address Control - Selects whether the address corresponds to a physical or
virtual address in memory.
Field Name
Bits
Default
Description
SCRATCH_ALT_VP_WR
0
0x0
0=Physical (Default), 1=Virtual
SCRATCH_VP_WR
1
0x0
0=Physical (Default), 1=Virtual
RPTR_VP_UPDATE
2
0x0
0=Physical (Default), 1=Virtual
VIDDMA_VP_WR
3
0x0
0=Physical (Default), 1=Virtual
VIDDMA_VP_RD
4
0x0
0=Physical (Default), 1=Virtual
GUIDMA_VP_WR
5
0x0
0=Physical (Default), 1=Virtual
GUIDMA_VP_RD
6
0x0
0=Physical (Default), 1=Virtual
INDR2_VP_FETCH
7
0x0
0=Physical (Default), 1=Virtual
INDR1_VP_FETCH
8
0x0
0=Physical (Default), 1=Virtual
RING_VP_FETCH
9
0x0
0=Physical (Default), 1=Virtual
© 2010 Advanced Micro Devices, Inc.
Proprietary
155
Revision 1.5
June 8, 2010
11.2 Color Buffer Registers
CB:RB3D_AARESOLVE_CTL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4e88
DESCRIPTION: Resolve Buffer Control. Unpipelined
Field Name
Bits
Default
Description
AARESOLVE_MODE
0
0x0
Specifies if the color buffer is in resolve mode. The
cache must be empty before changing this register.
POSSIBLE VALUES:
00 - Normal operation.
01 - Resolve operation.
AARESOLVE_GAMMA
1
none
Specifies the gamma and degamma to be applied to the
samples before and after filtering, respectively.
POSSIBLE VALUES:
00 - 1.0
01 - 2.2
AARESOLVE_ALPHA
2
0x0
Controls whether alpha is averaged in the resolve. 0 =>
the resolved alpha value is selected from the sample 0
value. 1=> the resolved alpha value is a filtered (average)
result of of the samples.
POSSIBLE VALUES:
00 - Resolved alpha value is taken from sample 0.
01 - Resolved alpha value is the average of the
samples. The average is not gamma corrected.
CB:RB3D_AARESOLVE_OFFSET · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4e80
DESCRIPTION: Resolve buffer destination address. The cache must be empty before changing this register if the
cb is in resolve mode. Unpipelined
Field Name
Bits
Default
Description
AARESOLVE_OFFSET
31:5
none
256-bit aligned 3D resolve destination offset.
CB:RB3D_AARESOLVE_PITCH · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4e84
DESCRIPTION: Resolve Buffer Pitch and Tiling Control. The cache must be empty before changing this register if
the cb is in resolve mode. Unpipelined
Field Name
Bits
Default
Description
AARESOLVE_PITCH
13:1
none
3D destination pitch in multiples of 2-pixels.
CB:RB3D_ABLENDCNTL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4e08
DESCRIPTION: Alpha Blend Control for Alpha Channel. Pipelined through the blender.
Field Name
© 2010 Advanced Micro Devices, Inc.
Proprietary
Bits
Default
Description
156
Revision 1.5
COMB_FCN
14:12
none
June 8, 2010
Combine Function , Allows modification of how the
SRCBLEND and DESTBLEND are combined.
POSSIBLE VALUES:
00 - Add and Clamp
01 - Add but no Clamp
02 - Subtract Dst from Src, and Clamp
03 - Subtract Dst from Src, and don`t Clamp
04 - Minimum of Src, Dst (the src and dst blend
functions are forced to D3D_ONE)
05 - Maximum of Src, Dst (the src and dst blend
functions are forced to D3D_ONE)
06 - Subtract Src from Dst, and Clamp
07 - Subtract Src from Dst, and don`t Clamp
SRCBLEND
21:16
none
Source Blend Function , Alpha blending function (SRC).
POSSIBLE VALUES:
00 - RESERVED
01 - D3D_ZERO
02 - D3D_ONE
03 - D3D_SRCCOLOR
04 - D3D_INVSRCCOLOR
05 - D3D_SRCALPHA
06 - D3D_INVSRCALPHA
07 - D3D_DESTALPHA
08 - D3D_INVDESTALPHA
09 - D3D_DESTCOLOR
10 - D3D_INVDESTCOLOR
11 - D3D_SRCALPHASAT
12 - D3D_BOTHSRCALPHA
13 - D3D_BOTHINVSRCALPHA
14 - RESERVED
15 - RESERVED
16 - RESERVED
17 - RESERVED
18 - RESERVED
19 - RESERVED
20 - RESERVED
21 - RESERVED
22 - RESERVED
23 - RESERVED
24 - RESERVED
25 - RESERVED
26 - RESERVED
27 - RESERVED
28 - RESERVED
29 - RESERVED
30 - RESERVED
31 - RESERVED
32 - GL_ZERO
33 - GL_ONE
34 - GL_SRC_COLOR
© 2010 Advanced Micro Devices, Inc.
Proprietary
157
Revision 1.5
June 8, 2010
35 - GL_ONE_MINUS_SRC_COLOR
36 - GL_DST_COLOR
37 - GL_ONE_MINUS_DST_COLOR
38 - GL_SRC_ALPHA
39 - GL_ONE_MINUS_SRC_ALPHA
40 - GL_DST_ALPHA
41 - GL_ONE_MINUS_DST_ALPHA
42 - GL_SRC_ALPHA_SATURATE
43 - GL_CONSTANT_COLOR
44 - GL_ONE_MINUS_CONSTANT_COLOR
45 - GL_CONSTANT_ALPHA
46 - GL_ONE_MINUS_CONSTANT_ALPHA
47 - RESERVED
48 - RESERVED
49 - RESERVED
50 - RESERVED
51 - RESERVED
52 - RESERVED
53 - RESERVED
54 - RESERVED
55 - RESERVED
56 - RESERVED
57 - RESERVED
58 - RESERVED
59 - RESERVED
60 - RESERVED
61 - RESERVED
62 - RESERVED
63 - RESERVED
DESTBLEND
29:24
none
Destination Blend Function , Alpha blending function
(DST).
POSSIBLE VALUES:
00 - RESERVED
01 - D3D_ZERO
02 - D3D_ONE
03 - D3D_SRCCOLOR
04 - D3D_INVSRCCOLOR
05 - D3D_SRCALPHA
06 - D3D_INVSRCALPHA
07 - D3D_DESTALPHA
08 - D3D_INVDESTALPHA
09 - D3D_DESTCOLOR
10 - D3D_INVDESTCOLOR
11 - RESERVED
12 - RESERVED
13 - RESERVED
14 - RESERVED
15 - RESERVED
16 - RESERVED
17 - RESERVED
18 - RESERVED
19 - RESERVED
© 2010 Advanced Micro Devices, Inc.
Proprietary
158
Revision 1.5
June 8, 2010
20 - RESERVED
21 - RESERVED
22 - RESERVED
23 - RESERVED
24 - RESERVED
25 - RESERVED
26 - RESERVED
27 - RESERVED
28 - RESERVED
29 - RESERVED
30 - RESERVED
31 - RESERVED
32 - GL_ZERO
33 - GL_ONE
34 - GL_SRC_COLOR
35 - GL_ONE_MINUS_SRC_COLOR
36 - GL_DST_COLOR
37 - GL_ONE_MINUS_DST_COLOR
38 - GL_SRC_ALPHA
39 - GL_ONE_MINUS_SRC_ALPHA
40 - GL_DST_ALPHA
41 - GL_ONE_MINUS_DST_ALPHA
42 - RESERVED
43 - GL_CONSTANT_COLOR
44 - GL_ONE_MINUS_CONSTANT_COLOR
45 - GL_CONSTANT_ALPHA
46 - GL_ONE_MINUS_CONSTANT_ALPHA
47 - RESERVED
48 - RESERVED
49 - RESERVED
50 - RESERVED
51 - RESERVED
52 - RESERVED
53 - RESERVED
54 - RESERVED
55 - RESERVED
56 - RESERVED
57 - RESERVED
58 - RESERVED
59 - RESERVED
60 - RESERVED
61 - RESERVED
62 - RESERVED
63 - RESERVED
CB:RB3D_BLENDCNTL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4e04
DESCRIPTION: Alpha Blend Control for Color Channels. Pipelined through the blender.
Field Name
Bits
Default
Description
ALPHA_BLEND_ENABLE
0
0x0
Allow alpha blending with the destination.
© 2010 Advanced Micro Devices, Inc.
Proprietary
159
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - Disable
01 - Enable
SEPARATE_ALPHA_ENABLE 1
0x0
Enables use of RB3D_ABLENDCNTL
POSSIBLE VALUES:
00 - Disabled (Use RB3D_BLENDCNTL)
01 - Enabled (Use RB3D_ABLENDCNTL)
READ_ENABLE
2
0x1
When blending is enabled, this enables memory reads.
Memory reads will still occur when this is disabled if
they are for reasons not related to blending.
POSSIBLE VALUES:
00 - Disable reads
01 - Enable reads
DISCARD_SRC_PIXELS
5:3
0x0
Discard pixels when blending is enabled based on the src
color.
POSSIBLE VALUES:
00 - Disable
01 - Discard pixels if src alpha <=
RB3D_DISCARD_SRC_PIXEL_LTE_THRESHOLD
02 - Discard pixels if src color <=
RB3D_DISCARD_SRC_PIXEL_LTE_THRESHOLD
03 - Discard pixels if src argb <=
RB3D_DISCARD_SRC_PIXEL_LTE_THRESHOLD
04 - Discard pixels if src alpha >=
RB3D_DISCARD_SRC_PIXEL_GTE_THRESHOLD
05 - Discard pixels if src color >=
RB3D_DISCARD_SRC_PIXEL_GTE_THRESHOLD
06 - Discard pixels if src argb >=
RB3D_DISCARD_SRC_PIXEL_GTE_THRESHOLD
07 - (reserved)
COMB_FCN
14:12
none
Combine Function , Allows modification of how the
SRCBLEND and DESTBLEND are combined.
POSSIBLE VALUES:
00 - Add and Clamp
01 - Add but no Clamp
02 - Subtract Dst from Src, and Clamp
03 - Subtract Dst from Src, and don`t Clamp
04 - Minimum of Src, Dst (the src and dst blend
functions are forced to D3D_ONE)
05 - Maximum of Src, Dst (the src and dst blend
functions are forced to D3D_ONE)
06 - Subtract Src from Dst, and Clamp
07 - Subtract Src from Dst, and don`t Clamp
SRCBLEND
© 2010 Advanced Micro Devices, Inc.
Proprietary
21:16
none
Source Blend Function , Alpha blending function (SRC).
160
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - RESERVED
01 - D3D_ZERO
02 - D3D_ONE
03 - D3D_SRCCOLOR
04 - D3D_INVSRCCOLOR
05 - D3D_SRCALPHA
06 - D3D_INVSRCALPHA
07 - D3D_DESTALPHA
08 - D3D_INVDESTALPHA
09 - D3D_DESTCOLOR
10 - D3D_INVDESTCOLOR
11 - D3D_SRCALPHASAT
12 - D3D_BOTHSRCALPHA
13 - D3D_BOTHINVSRCALPHA
14 - RESERVED
15 - RESERVED
16 - RESERVED
17 - RESERVED
18 - RESERVED
19 - RESERVED
20 - RESERVED
21 - RESERVED
22 - RESERVED
23 - RESERVED
24 - RESERVED
25 - RESERVED
26 - RESERVED
27 - RESERVED
28 - RESERVED
29 - RESERVED
30 - RESERVED
31 - RESERVED
32 - GL_ZERO
33 - GL_ONE
34 - GL_SRC_COLOR
35 - GL_ONE_MINUS_SRC_COLOR
36 - GL_DST_COLOR
37 - GL_ONE_MINUS_DST_COLOR
38 - GL_SRC_ALPHA
39 - GL_ONE_MINUS_SRC_ALPHA
40 - GL_DST_ALPHA
41 - GL_ONE_MINUS_DST_ALPHA
42 - GL_SRC_ALPHA_SATURATE
43 - GL_CONSTANT_COLOR
44 - GL_ONE_MINUS_CONSTANT_COLOR
45 - GL_CONSTANT_ALPHA
46 - GL_ONE_MINUS_CONSTANT_ALPHA
47 - RESERVED
48 - RESERVED
49 - RESERVED
50 - RESERVED
© 2010 Advanced Micro Devices, Inc.
Proprietary
161
Revision 1.5
June 8, 2010
51 - RESERVED
52 - RESERVED
53 - RESERVED
54 - RESERVED
55 - RESERVED
56 - RESERVED
57 - RESERVED
58 - RESERVED
59 - RESERVED
60 - RESERVED
61 - RESERVED
62 - RESERVED
63 - RESERVED
DESTBLEND
29:24
none
Destination Blend Function , Alpha blending function
(DST).
POSSIBLE VALUES:
00 - RESERVED
01 - D3D_ZERO
02 - D3D_ONE
03 - D3D_SRCCOLOR
04 - D3D_INVSRCCOLOR
05 - D3D_SRCALPHA
06 - D3D_INVSRCALPHA
07 - D3D_DESTALPHA
08 - D3D_INVDESTALPHA
09 - D3D_DESTCOLOR
10 - D3D_INVDESTCOLOR
11 - RESERVED
12 - RESERVED
13 - RESERVED
14 - RESERVED
15 - RESERVED
16 - RESERVED
17 - RESERVED
18 - RESERVED
19 - RESERVED
20 - RESERVED
21 - RESERVED
22 - RESERVED
23 - RESERVED
24 - RESERVED
25 - RESERVED
26 - RESERVED
27 - RESERVED
28 - RESERVED
29 - RESERVED
30 - RESERVED
31 - RESERVED
32 - GL_ZERO
33 - GL_ONE
34 - GL_SRC_COLOR
35 - GL_ONE_MINUS_SRC_COLOR
© 2010 Advanced Micro Devices, Inc.
Proprietary
162
Revision 1.5
June 8, 2010
36 - GL_DST_COLOR
37 - GL_ONE_MINUS_DST_COLOR
38 - GL_SRC_ALPHA
39 - GL_ONE_MINUS_SRC_ALPHA
40 - GL_DST_ALPHA
41 - GL_ONE_MINUS_DST_ALPHA
42 - RESERVED
43 - GL_CONSTANT_COLOR
44 - GL_ONE_MINUS_CONSTANT_COLOR
45 - GL_CONSTANT_ALPHA
46 - GL_ONE_MINUS_CONSTANT_ALPHA
47 - RESERVED
48 - RESERVED
49 - RESERVED
50 - RESERVED
51 - RESERVED
52 - RESERVED
53 - RESERVED
54 - RESERVED
55 - RESERVED
56 - RESERVED
57 - RESERVED
58 - RESERVED
59 - RESERVED
60 - RESERVED
61 - RESERVED
62 - RESERVED
63 - RESERVED
SRC_ALPHA_0_NO_READ
30
0x0
Enables source alpha zero performance optimization to
skip reads.
POSSIBLE VALUES:
00 - Disable source alpha zero performance
optimization to skip reads
01 - Enable source alpha zero performance
optimization to skip reads
SRC_ALPHA_1_NO_READ
31
0x0
Enables source alpha one performance optimization to
skip reads.
POSSIBLE VALUES:
00 - Disable source alpha one performance
optimization to skip reads
01 - Enable source alpha one performance
optimization to skip reads
CB:RB3D_DISCARD_SRC_PIXEL_GTE_THRESHOLD · [R/W] · 32 bits · Access: 8/16/32 ·
MMReg:0x4ea4
DESCRIPTION: Discard src pixels greater than or equal to threshold.
Field Name
© 2010 Advanced Micro Devices, Inc.
Proprietary
Bits
Default
Description
163
Revision 1.5
BLUE
7:0
0xFF
Blue
GREEN
15:8
0xFF
Green
RED
23:16
0xFF
Red
ALPHA
31:24
0xFF
Alpha
June 8, 2010
CB:RB3D_DISCARD_SRC_PIXEL_LTE_THRESHOLD · [R/W] · 32 bits · Access: 8/16/32 ·
MMReg:0x4ea0
DESCRIPTION: Discard src pixels less than or equal to threshold.
Field Name
Bits
Default
Description
BLUE
7:0
0x0
Blue
GREEN
15:8
0x0
Green
RED
23:16
0x0
Red
ALPHA
31:24
0x0
Alpha
CB:RB3D_CCTL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4e00
DESCRIPTION: Unpipelined.
Field Name
Bits Default Description
NUM_MULTIWRITES
6:5 0x0
A quad is replicated and written to this
many buffers.
POSSIBLE VALUES:
00 - 1 buffer. This is the only mode
where the cb processes the end of packet
command.
01 - 2 buffers
02 - 3 buffers
03 - 4 buffers
CLRCMP_FLIPE_ENABLE
7
0x0
Enables equivalent of rage128
CMP_EQ_FLIP color compare mode.
This is used to ensure 3D data does not
get chromakeyed away by logic in the
backend.
POSSIBLE VALUES:
00 - Disable color compare.
01 - Enable color compare.
AA_COMPRESSION_ENABLE
9
none
Enables AA color compression. Cmask
must also be enabled when aa
compression is enabled. The cache must
be empty before this is changed.
POSSIBLE VALUES:
00 - Disable AA compression
01 - Enable AA compression
© 2010 Advanced Micro Devices, Inc.
Proprietary
164
Revision 1.5
CMASK_ENABLE
10
June 8, 2010
none
Enables use of the cmask ram. The cache
must be empty before this is changed.
POSSIBLE VALUES:
00 - Disable
01 - Enable
Reserved
11
0x0
Set to 0
INDEPENDENT_COLOR_CHANNEL_MASK_ENABLE 12
0x0
Enables indepedent color channel masks
for the MRTs. Disabling this feature will
cause all the MRTs to use color channel
mask 0.
POSSIBLE VALUES:
00 - Disable
01 - Enable
WRITE_COMPRESSION_DISABLE
13
none
Disables write compression.
POSSIBLE VALUES:
00 - Enable write compression
01 - Disable write compression
INDEPENDENT_COLORFORMAT_ENABLE
14
0x0
Enables independent color format for the
MRTs. Disabling this feature will cause
all the MRTs to use color format 0.
POSSIBLE VALUES:
00 - Disable
01 - Enable
CB:RB3D_CLRCMP_CLR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4e20
DESCRIPTION: Color Compare Color. Stalls the 2d/3d datapath until it is idle.
Field Name
Bits
Default
Description
CLRCMP_CLR
31:0
none
Like RB2D_CLRCMP_CLR, but a separate register is
provided to keep 2D and 3D state separate.
CB:RB3D_CLRCMP_FLIPE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4e1c
DESCRIPTION: Color Compare Flip. Stalls the 2d/3d datapath until it is idle.
Field Name
Bits
Default
Description
CLRCMP_FLIPE
31:0
none
Like RB2D_CLRCMP_FLIPE, but a separate register is
provided to keep 2D and 3D state separate.
CB:RB3D_CLRCMP_MSK · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4e24
DESCRIPTION: Color Compare Mask. Stalls the 2d/3d datapath until it is idle.
Field Name
© 2010 Advanced Micro Devices, Inc.
Proprietary
Bits
Default
Description
165
Revision 1.5
CLRCMP_MSK
31:0
none
June 8, 2010
Like RB2D_CLRCMP_CLR, but separate registers
provided to keep 2D and 3D state separate.
CB:RB3D_COLOROFFSET[0-3] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4e28-0x4e34
DESCRIPTION: Color Buffer Address Offset of multibuffer 0. Unpipelined.
Field Name
Bits
Default
Description
COLOROFFSET
31:5
none
256-bit aligned 3D destination offset address. The cache
must be empty before this is changed.
CB:RB3D_COLORPITCH[0-3] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4e38-0x4e44
DESCRIPTION: Color buffer format and tiling control for all the multibuffers and the pitch of multibuffer 0.
Unpipelined. The cache must be empty before any of the registers are changed.
Field Name
Bits
Default
Description
COLORPITCH
13:1
none
3D destination pitch in multiples of 2-pixels.
COLORTILE
16
none
Denotes whether the 3D destination is in macrotiled
format.
POSSIBLE VALUES:
00 - 3D destination is not macrotiled
01 - 3D destination is macrotiled
COLORMICROTILE
18:17
none
Denotes whether the 3D destination is in microtiled
format.
POSSIBLE VALUES:
00 - 3D destination is no microtiled
01 - 3D destination is microtiled
02 - 3D destination is square microtiled. Only
available in 16-bit
03 - (reserved)
COLORENDIAN
20:19
none
Specifies endian control for the color buffer.
POSSIBLE VALUES:
00 - No swap
01 - Word swap (2 bytes in 16-bit)
02 - Dword swap (4 bytes in a 32-bit)
03 - Half-Dword swap (2 16-bit in a 32-bit)
COLORFORMAT
24:21
0x6
3D destination color format.
POSSIBLE VALUES:
00 - ARGB10101010
01 - UV1010
02 - CI8 (2D ONLY)
03 - ARGB1555
04 - RGB565
05 - ARGB2101010
06 - ARGB8888
© 2010 Advanced Micro Devices, Inc.
Proprietary
166
Revision 1.5
June 8, 2010
07 - ARGB32323232
08 - (Reserved)
09 - I8
10 - ARGB16161616
11 - YUV422 packed (VYUY)
12 - YUV422 packed (YVYU)
13 - UV88
14 - I10
15 - ARGB4444
CB:RB3D_COLOR_CHANNEL_MASK · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4e0c
DESCRIPTION: 3D Color Channel Mask. If all the channels used in the current color format are disabled, then
the cb will discard all the incoming quads. Pipelined through the blender.
Field Name
Bits
Default
Description
BLUE_MASK
0
0x1
mask bit for the blue channel
POSSIBLE VALUES:
00 - disable
01 - enable
GREEN_MASK
1
0x1
mask bit for the green channel
POSSIBLE VALUES:
00 - disable
01 - enable
RED_MASK
2
0x1
mask bit for the red channel
POSSIBLE VALUES:
00 - disable
01 - enable
ALPHA_MASK
3
0x1
mask bit for the alpha channel
POSSIBLE VALUES:
00 - disable
01 - enable
BLUE_MASK1
4
0x1
mask bit for the blue channel of MRT 1
POSSIBLE VALUES:
00 - disable
01 - enable
GREEN_MASK1
5
0x1
mask bit for the green channel of MRT 1
POSSIBLE VALUES:
00 - disable
01 - enable
RED_MASK1
6
0x1
mask bit for the red channel of MRT 1
POSSIBLE VALUES:
© 2010 Advanced Micro Devices, Inc.
Proprietary
167
Revision 1.5
June 8, 2010
00 - disable
01 - enable
ALPHA_MASK1
7
0x1
mask bit for the alpha channel of MRT 1
POSSIBLE VALUES:
00 - disable
01 - enable
BLUE_MASK2
8
0x1
mask bit for the blue channel of MRT 2
POSSIBLE VALUES:
00 - disable
01 - enable
GREEN_MASK2
9
0x1
mask bit for the green channel of MRT 2
POSSIBLE VALUES:
00 - disable
01 - enable
RED_MASK2
10
0x1
mask bit for the red channel of MRT 2
POSSIBLE VALUES:
00 - disable
01 - enable
ALPHA_MASK2
11
0x1
mask bit for the alpha channel of MRT 2
POSSIBLE VALUES:
00 - disable
01 - enable
BLUE_MASK3
12
0x1
mask bit for the blue channel of MRT 3
POSSIBLE VALUES:
00 - disable
01 - enable
GREEN_MASK3
13
0x1
mask bit for the green channel of MRT 3
POSSIBLE VALUES:
00 - disable
01 - enable
RED_MASK3
14
0x1
mask bit for the red channel of MRT 3
POSSIBLE VALUES:
00 - disable
01 - enable
ALPHA_MASK3
15
0x1
mask bit for the alpha channel of MRT 3
POSSIBLE VALUES:
00 - disable
01 - enable
© 2010 Advanced Micro Devices, Inc.
Proprietary
168
Revision 1.5
June 8, 2010
CB:RB3D_COLOR_CLEAR_VALUE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4e14
DESCRIPTION: Clear color that is used when the color mask is set to 00. Unpipelined. Program this register with
a 32-bit value in ARGB8888 or ARGB2101010 formats, ignoring the fields.
Field Name
Bits
Default
Description
BLUE
7:0
none
blue clear color
GREEN
15:8
none
green clear color
RED
23:16
none
red clear color
ALPHA
31:24
none
alpha clear color
CB:RB3D_COLOR_CLEAR_VALUE_AR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x46c0
DESCRIPTION: Alpha and red clear color values that are used when the color mask is set to 00 in FP16 per
component mode. Unpipelined.
Field Name
Bits
Default
Description
RED
15:0
none
red clear color
ALPHA
31:16
none
alpha clear color
CB:RB3D_COLOR_CLEAR_VALUE_GB · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x46c4
DESCRIPTION: Green and blue clear color values that are used when the color mask is set to 00 in FP16 per
component mode. Unpipelined.
Field Name
Bits
Default
Description
BLUE
15:0
none
blue clear color
GREEN
31:16
none
green clear color
CB:RB3D_CONSTANT_COLOR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4e10
DESCRIPTION: Constant color used by the blender. Pipelined through the blender.
Field Name
Bits
Default
Description
BLUE
7:0
none
blue constant color (For R520, this field is ignored, use
RB3D_CONSTANT_COLOR_GB__BLUE instead)
GREEN
15:8
none
green constant color (For R520, this field is ignored, use
RB3D_CONSTANT_COLOR_GB__GREEN instead)
RED
23:16
none
red constant color (For R520, this field is ignored, use
RB3D_CONSTANT_COLOR_AR__RED instead)
ALPHA
31:24
none
alpha constant color (For R520, this field is ignored, use
RB3D_CONSTANT_COLOR_AR__ALPHA instead)
CB:RB3D_CONSTANT_COLOR_AR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4ef8
DESCRIPTION: Constant color used by the blender. Pipelined through the blender.
Field Name
Bits
Default
Description
RED
15:0
none
red constant color in 0.10 fixed or FP16 format
© 2010 Advanced Micro Devices, Inc.
Proprietary
169
Revision 1.5
ALPHA
31:16
none
June 8, 2010
alpha constant color in 0.10 fixed or FP16 format
CB:RB3D_CONSTANT_COLOR_GB · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4efc
DESCRIPTION: Constant color used by the blender. Pipelined through the blender.
Field Name
Bits
Default
Description
BLUE
15:0
none
blue constant color in 0.10 fixed or FP16 format
GREEN
31:16
none
green constant color in 0.10 fixed or FP16 format
CB:RB3D_DITHER_CTL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4e50
DESCRIPTION: Dithering control register. Pipelined through the blender.
Field Name
Bits
Default
Description
DITHER_MODE
1:0
0x0
Dither mode
POSSIBLE VALUES:
00 - Truncate
01 - Round
02 - LUT dither
03 - (reserved)
ALPHA_DITHER_MODE
3:2
0x0
POSSIBLE VALUES:
00 - Truncate
01 - Round
02 - LUT dither
03 - (reserved)
CB:RB3D_DSTCACHE_CTLSTAT · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4e4c
DESCRIPTION: Destination Color Buffer Cache Control/Status. If the cb is in e2 mode, then a flush or free will
not occur upon a write to this register, but a sync will be immediately sent if one is requested. If both DC_FLUSH
and DC_FREE are zero but DC_FINISH is one, then a sync will be sent immediately -- the cb will not wait for all
the previous operations to complete before sending the sync. Unpipelined except when DC_FINISH and DC_FREE
are both set to zero.
Field Name
Bits
Default
Description
DC_FLUSH
1:0
0x0
Setting this bit flushes dirty data from the 3D Dst Cache.
Unless the DC_FREE bits are also set, the tags in the
cache remain valid. A purge is achieved by setting both
DC_FLUSH and DC_FREE.
POSSIBLE VALUES:
00 - No effect
01 - No effect
02 - Flushes dirty 3D data
03 - Flushes dirty 3D data
DC_FREE
© 2010 Advanced Micro Devices, Inc.
Proprietary
3:2
0x0
Setting this bit invalidates the 3D Dst Cache tags. Unless
the DC_FLUSH bit is also set, the cache lines are not
written to memory. A purge is achieved by setting both
170
Revision 1.5
June 8, 2010
DC_FLUSH and DC_FREE.
POSSIBLE VALUES:
00 - No effect
01 - No effect
02 - Free 3D tags
03 - Free 3D tags
DC_FINISH
4
0x0
POSSIBLE VALUES:
00 - do not send a finish signal to the CP
01 - send a finish signal to the CP after the end of
operation
CB:RB3D_FIFO_SIZE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4ef4
DESCRIPTION: Sets the fifo sizes
Field Name
Bits
Default
Description
OP_FIFO_SIZE
1:0
0x0
Determines the size of the op fifo
POSSIBLE VALUES:
00 - Full size
01 - 1/2 size
02 - 1/4 size
03 - 1/8 size
CB:RB3D_ROPCNTL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4e18
DESCRIPTION: 3D ROP Control. Stalls the 2d/3d datapath until it is idle.
Field Name
Bits
Default
Description
ROP_ENABLE
2
0x0
POSSIBLE VALUES:
00 - Disable ROP. (Forces ROP2 to be 0xC).
01 - Enabled
ROP
11:8
none
ROP2 code for 3D fragments. This value is replicated
into 2 nibbles to form the equivalent ROP3 code to
control the ROP3 logic. These are the GDI ROP2 codes.
© 2010 Advanced Micro Devices, Inc.
Proprietary
171
Revision 1.5
June 8, 2010
11.3 Fog Registers
FG:FG_ALPHA_FUNC · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4bd4
DESCRIPTION: Alpha Function
Field Name
Bits
Default
Description
AF_VAL
7:0
0x0
Specifies the 8-bit alpha compare value when
AF_EN_8BIT is enabled
AF_FUNC
10:8
0x0
Specifies the alpha compare function.
POSSIBLE VALUES:
00 - AF_NEVER
01 - AF_LESS
02 - AF_EQUAL
03 - AF_LE
04 - AF_GREATER
05 - AF_NOTEQUAL
06 - AF_GE
07 - AF_ALWAYS
AF_EN
11
0x0
Enables/Disables alpha compare function.
POSSIBLE VALUES:
00 - Disable alpha function.
01 - Enable alpha function.
AF_EN_8BIT
12
0x0
Enable 8-bit alpha compare function.
POSSIBLE VALUES:
00 - Default 10-bit alpha compare.
01 - Enable 8-bit alpha compare.
AM_EN
16
0x0
Enables/Disables alpha-to-mask function.
POSSIBLE VALUES:
00 - Disable alpha to mask function.
01 - Enable alpha to mask function.
AM_CFG
17
0x0
Specfies number of sub-pixel samples for alpha-to-mask
function.
POSSIBLE VALUES:
00 - 2/4 sub-pixel samples.
01 - 3/6 sub-pixel samples.
DITH_EN
20
0x0
Enables/Disables RGB Dithering (Not supported in
R520)
POSSIBLE VALUES:
00 - Disable Dithering
01 - Enable Dithering.
ALP_OFF_EN
© 2010 Advanced Micro Devices, Inc.
Proprietary
24
0x0
Alpha offset enable/disable (Not supported in R520)
172
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - Disables alpha offset of 2 (default r300 & rv350
behavior)
01 - Enables offset of 2 on alpha coming in from the
US
DISCARD_ZERO_MASK_QUAD 25
0x0
Enable/Disable discard zero mask coverage quad to ZB
POSSIBLE VALUES:
00 - No discard of zero coverage mask quads
01 - Discard zero coverage mask quads
FP16_ENABLE
28
0x0
Enables/Disables FP16 alpha function
POSSIBLE VALUES:
00 - Default 10-bit alpha compare and alpha-to-mask
function
01 - Enable FP16 alpha compare and alpha-to-mask
function
FG:FG_ALPHA_VALUE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4be0
DESCRIPTION: Alpha Compare Value
Field Name
Bits
Default
Description
AF_VAL
15:0
0x0
Specifies the alpha compare value, 0.10 fixed or FP16
format
FG:FG_DEPTH_SRC · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4bd8
DESCRIPTION: Where does depth come from?
Field Name
Bits
Default
Description
DEPTH_SRC
0
0x0
POSSIBLE VALUES:
00 - Depth comes from scan converter as plane
equation.
01 - Depth comes from shader as four discrete values.
FG:FG_FOG_BLEND · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4bc0
DESCRIPTION: Fog Blending Enable
Field Name
Bits
Default
Description
ENABLE
0
0x0
Enable for fog blending
POSSIBLE VALUES:
00 - Disables fog (output matches input color).
01 - Enables fog.
FN
© 2010 Advanced Micro Devices, Inc.
Proprietary
2:1
0x0
Fog generation function
173
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - Fog function is linear
01 - Fog function is exponential
02 - Fog function is exponential squared
03 - Fog is derived from constant fog factor
FG:FG_FOG_COLOR_B · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4bd0
DESCRIPTION: Blue Component of Fog Color
Field Name
Bits
Default
Description
BLUE
9:0
0x0
Blue component of fog color; (0.10) fixed format.
FG:FG_FOG_COLOR_G · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4bcc
DESCRIPTION: Green Component of Fog Color
Field Name
Bits
Default
Description
GREEN
9:0
0x0
Green component of fog color; (0.10) fixed format.
FG:FG_FOG_COLOR_R · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4bc8
DESCRIPTION: Red Component of Fog Color
Field Name
Bits
Default
Description
RED
9:0
0x0
Red component of fog color; (0.10) fixed format.
FG:FG_FOG_FACTOR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4bc4
DESCRIPTION: Constant Factor for Fog Blending
Field Name
Bits
Default
Description
FACTOR
9:0
0x0
Constant fog factor; fixed (0.10) format.
© 2010 Advanced Micro Devices, Inc.
Proprietary
174
Revision 1.5
June 8, 2010
11.4 Geometry Assembly Registers
GA:GA_COLOR_CONTROL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4278
DESCRIPTION: Specifies per RGB or Alpha shading method.
Field Name
Bits
Default
Description
RGB0_SHADING
1:0
0x0
Specifies solid, flat or Gouraud shading.
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
ALPHA0_SHADING
3:2
0x0
Specifies solid, flat or Gouraud shading.
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
RGB1_SHADING
5:4
0x0
Specifies solid, flat or Gouraud shading.
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
ALPHA1_SHADING
7:6
0x0
Specifies solid, flat or Gouraud shading.
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
RGB2_SHADING
9:8
0x0
Specifies solid, flat or Gouraud shading.
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
ALPHA2_SHADING
11:10
0x0
Specifies solid, flat or Gouraud shading.
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
RGB3_SHADING
13:12
0x0
Specifies solid, flat or Gouraud shading.
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
© 2010 Advanced Micro Devices, Inc.
Proprietary
175
Revision 1.5
ALPHA3_SHADING
15:14
0x0
June 8, 2010
Specifies solid, flat or Gouraud shading.
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
PROVOKING_VERTEX
17:16
0x0
Specifies, for flat shaded polygons, which vertex holds
the polygon color.
POSSIBLE VALUES:
00 - Provoking is first vertex
01 - Provoking is second vertex
02 - Provoking is third vertex
03 - Provoking is always last vertex
GA:GA_COLOR_CONTROL_PS3 · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4258
DESCRIPTION: Specifies color properties and mappings of textures.
Field Name
Bits
Default
Description
TEX0_SHADING_PS3
1:0
0x0
Specifies undefined(0), flat(1) and Gouraud(2/def)
shading for each texture.
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
TEX1_SHADING_PS3
3:2
0x0
Specifies undefined(0), flat(1) and Gouraud(2/def)
shading for each texture.
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
TEX2_SHADING_PS3
5:4
0x0
Specifies undefined(0), flat(1) and Gouraud(2/def)
shading for each texture.
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
TEX3_SHADING_PS3
7:6
0x0
Specifies undefined(0), flat(1) and Gouraud(2/def)
shading for each texture.
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
TEX4_SHADING_PS3
© 2010 Advanced Micro Devices, Inc.
Proprietary
9:8
0x0
Specifies undefined(0), flat(1) and Gouraud(2/def)
shading for each texture.
176
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
TEX5_SHADING_PS3
11:10
0x0
Specifies undefined(0), flat(1) and Gouraud(2/def)
shading for each texture.
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
TEX6_SHADING_PS3
13:12
0x0
Specifies undefined(0), flat(1) and Gouraud(2/def)
shading for each texture.
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
TEX7_SHADING_PS3
15:14
0x0
Specifies undefined(0), flat(1) and Gouraud(2/def)
shading for each texture.
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
TEX8_SHADING_PS3
17:16
0x0
Specifies undefined(0), flat(1) and Gouraud(2/def)
shading for each texture.
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
TEX9_SHADING_PS3
19:18
0x0
Specifies undefined(0), flat(1) and Gouraud(2/def)
shading for each texture.
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
TEX10_SHADING_PS3
21:20
0x0
Specifies undefined(0), flat(1) and Gouraud(2/def)
shading for tex10 components.
POSSIBLE VALUES:
00 - Solid fill color
01 - Flat shading
02 - Gouraud shading
COLOR0_TEX_OVERRIDE
© 2010 Advanced Micro Devices, Inc.
Proprietary
25:22
0x0
Specifies if each color should come from a texture and
which one.
177
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - No override
01 - Stuff texture 0
02 - Stuff texture 1
03 - Stuff texture 2
04 - Stuff texture 3
05 - Stuff texture 4
06 - Stuff texture 5
07 - Stuff texture 6
08 - Stuff texture 7
09 - Stuff texture 8/C2
10 - Stuff texture 9/C3
COLOR1_TEX_OVERRIDE
29:26
0x0
Specifies if each color should come from a texture and
which one.
POSSIBLE VALUES:
00 - No override
01 - Stuff texture 0
02 - Stuff texture 1
03 - Stuff texture 2
04 - Stuff texture 3
05 - Stuff texture 4
06 - Stuff texture 5
07 - Stuff texture 6
08 - Stuff texture 7
09 - Stuff texture 8/C2
10 - Stuff texture 9/C3
GA:GA_ENHANCE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4274
DESCRIPTION: GA Enhancement Register
Field Name
Bits
Default
Description
DEADLOCK_CNTL
0
0x0
TCL/GA Deadlock control.
POSSIBLE VALUES:
00 - No effect.
01 - Prevents TCL interface from deadlocking on GA
side.
FASTSYNC_CNTL
1
0x1
Enables Fast register/primitive switching
POSSIBLE VALUES:
00 - No effect.
01 - Enables high-performance register/primitive
switching.
REG_READWRITE
2
0x0
R520+: When set, GA supports simultaneous register
reads & writes
POSSIBLE VALUES:
© 2010 Advanced Micro Devices, Inc.
Proprietary
178
Revision 1.5
June 8, 2010
00 - No effect.
01 - Enables GA support of simultaneous register
reads and writes.
REG_NOSTALL
3
0x0
POSSIBLE VALUES:
00 - No effect.
01 - Enables GA support of no-stall reads for register
read back.
GA:GA_FIFO_CNTL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4270
DESCRIPTION: GA Input fifo high water marks
Field Name
Bits
Default
Description
VERTEX_FIFO
2:0
0x0
Number of words remaining in input vertex fifo before
asserting nearly full
INDEX_FIFO
5:3
0x0
Number of words remaining in input primitive fifo
before asserting nearly full
REG_FIFO
13:6
0x0
Number of words remaining in input register fifo before
asserting nearly full
GA:GA_FILL_A · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x422c
DESCRIPTION: Alpha fill color
Field Name
Bits
Default
Description
COLOR_ALPHA
31:0
0x0
FP20 format for alpha fill.
GA:GA_FILL_B · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4228
DESCRIPTION: Blue fill color
Field Name
Bits
Default
Description
COLOR_BLUE
31:0
0x0
FP20 format for blue fill.
GA:GA_FILL_G · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4224
DESCRIPTION: Green fill color
Field Name
Bits
Default
Description
COLOR_GREEN
31:0
0x0
FP20 format for green fill.
GA:GA_FILL_R · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4220
DESCRIPTION: Red fill color
Field Name
Bits
Default
Description
COLOR_RED
31:0
0x0
FP20 format for red fill.
© 2010 Advanced Micro Devices, Inc.
Proprietary
179
Revision 1.5
June 8, 2010
GA:GA_FOG_OFFSET · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4298
DESCRIPTION: Specifies the offset to apply to fog.
Field Name
Bits
Default
Description
VALUE
31:0
0x0
32b SPFP scale value.
GA:GA_FOG_SCALE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4294
DESCRIPTION: Specifies the scale to apply to fog.
Field Name
Bits
Default
Description
VALUE
31:0
0x0
32b SPFP scale value.
GA:GA_IDLE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x425c
DESCRIPTION: Returns idle status of various G3D block, captured when GA_IDLE written or when hard or soft
reset asserted.
Field Name
Bits
Default
Description
PIPE3_Z_IDLE
0
0x0
Idle status of physical pipe 3 Z unit
PIPE2_Z_IDLE
1
0x0
Idle status of physical pipe 2 Z unit
PIPE3_CB_IDLE
2
0x0
Idle status of physical pipe 3 CB unit
PIPE2_CB_IDLE
3
0x0
Idle status of physical pipe 2 CB unit
PIPE3_FG_IDLE
4
0x0
Idle status of physical pipe 3 FG unit
PIPE2_FG_IDLE
5
0x0
Idle status of physical pipe 2 FG unit
PIPE3_US_IDLE
6
0x0
Idle status of physical pipe 3 US unit
PIPE2_US_IDLE
7
0x0
Idle status of physical pipe 2 US unit
PIPE3_SC_IDLE
8
0x0
Idle status of physical pipe 3 SC unit
PIPE2_SC_IDLE
9
0x0
Idle status of physical pipe 2 SC unit
PIPE3_RS_IDLE
10
0x0
Idle status of physical pipe 3 RS unit
PIPE2_RS_IDLE
11
0x0
Idle status of physical pipe 2 RS unit
PIPE1_Z_IDLE
12
0x0
Idle status of physical pipe 1 Z unit
PIPE0_Z_IDLE
13
0x0
Idle status of physical pipe 0 Z unit
PIPE1_CB_IDLE
14
0x0
Idle status of physical pipe 1 CB unit
PIPE0_CB_IDLE
15
0x0
Idle status of physical pipe 0 CB unit
PIPE1_FG_IDLE
16
0x0
Idle status of physical pipe 1 FG unit
PIPE0_FG_IDLE
17
0x0
Idle status of physical pipe 0 FG unit
PIPE1_US_IDLE
18
0x0
Idle status of physical pipe 1 US unit
PIPE0_US_IDLE
19
0x0
Idle status of physical pipe 0 US unit
PIPE1_SC_IDLE
20
0x0
Idle status of physical pipe 1 SC unit
PIPE0_SC_IDLE
21
0x0
Idle status of physical pipe 0 SC unit
PIPE1_RS_IDLE
22
0x0
Idle status of physical pipe 1 RS unit
PIPE0_RS_IDLE
23
0x0
Idle status of physical pipe 0 RS unit
© 2010 Advanced Micro Devices, Inc.
Proprietary
180
Revision 1.5
SU_IDLE
24
0x0
Idle status of SU unit
GA_IDLE
25
0x0
Idle status of GA unit
GA_UNIT2_IDLE
26
0x0
Idle status of GA unit2
June 8, 2010
GA:GA_LINE_CNTL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4234
DESCRIPTION: Line control
Field Name
Bits
Default
Description
WIDTH
15:0
0x0
1/2 width of line, in subpixels (1/12 or 1/16 only, even in
8b subprecision); (16.0) fixed format.
END_TYPE
17:16
0x0
Specifies how ends of lines should be drawn.
POSSIBLE VALUES:
00 - Horizontal
01 - Vertical
02 - Square (horizontal or vertical depending upon
slope)
03 - Computed (perpendicular to slope)
SORT
18
0x0
R520+: When enabled, all lines are sorted so that V0 is
vertex with smallest X, or if X equal, smallest Y.
POSSIBLE VALUES:
00 - No sorting (default)
01 - Sort on minX than MinY
GA:GA_LINE_S0 · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4264
DESCRIPTION: S Texture Coordinate Value for Vertex 0 of Line (stuff textures -- i.e. AA)
Field Name
Bits
Default
Description
S0
31:0
0x0
S texture coordinate value generated for vertex 0 of an
antialiased line; 32-bit IEEE float format. Typical 0.0.
GA:GA_LINE_S1 · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4268
DESCRIPTION: S Texture Coordinate Value for Vertex 1 of Lines (V2 of parallelogram -- stuff textures -- i.e. AA)
Field Name
Bits
Default
Description
S1
31:0
0x0
S texture coordinate value generated for vertex 1 of an
antialiased line; 32-bit IEEE float format. Typical 1.0.
GA:GA_LINE_STIPPLE_CONFIG · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4238
DESCRIPTION: Line Stipple configuration information.
Field Name
Bits
Default
Description
LINE_RESET
1:0
0x0
Specify type of reset to use for stipple accumulation.
© 2010 Advanced Micro Devices, Inc.
Proprietary
181
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - No reseting
01 - Reset per line
02 - Reset per packet
STIPPLE_SCALE
31:2
0x0
Specifies, in truncated (30b) floating point, scale to apply
to generated texture coordinates.
GA:GA_LINE_STIPPLE_VALUE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4260
DESCRIPTION: Current value of stipple accumulator.
Field Name
Bits
Default
Description
STIPPLE_VALUE
31:0
0x0
24b Integer, measuring stipple accumulation in subpixels
(1/12 or 1/16, even in 8b precision). (note: field is 32b,
but only lower 24b used)
GA:GA_OFFSET · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4290
DESCRIPTION: Specifies x & y offsets for vertex data after conversion to FP.
Field Name
Bits
Default
Description
X_OFFSET
15:0
0x0
Specifies X offset in S15 format (subpixels -- 1/12 or
1/16, even in 8b subprecision).
Y_OFFSET
31:16
0x0
Specifies Y offset in S15 format (subpixels -- 1/12 or
1/16, even in 8b subprecision).
GA:GA_POINT_MINMAX · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4230
DESCRIPTION: Specifies maximum and minimum point & sprite sizes for per vertex size specification.
Field Name
Bits
Default
Description
MIN_SIZE
15:0
0x0
Minimum point & sprite radius (in subsamples) size to
allow.
MAX_SIZE
31:16
0x0
Maximum point & sprite radius (in subsamples) size to
allow.
GA:GA_POINT_S0 · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4200
DESCRIPTION: S Texture Coordinate of Vertex 0 for Point texture stuffing (LLC)
Field Name
Bits
Default
Description
S0
31:0
0x0
S texture coordinate of vertex 0 for point; 32-bit IEEE
float format.
GA:GA_POINT_S1 · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4208
DESCRIPTION: S Texture Coordinate of Vertex 2 for Point texture stuffing (URC)
© 2010 Advanced Micro Devices, Inc.
Proprietary
182
Revision 1.5
June 8, 2010
Field Name
Bits
Default
Description
S1
31:0
0x0
S texture coordinate of vertex 2 for point; 32-bit IEEE
float format.
GA:GA_POINT_SIZE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x421c
DESCRIPTION: Dimensions for Points
Field Name
Bits
Default
Description
HEIGHT
15:0
0x0
1/2 Height of point; fixed (16.0), subpixel format (1/12
or 1/16, even if in 8b precision).
WIDTH
31:16
0x0
1/2 Width of point; fixed (16.0), subpixel format (1/12 or
1/16, even if in 8b precision)
GA:GA_POINT_T0 · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4204
DESCRIPTION: T Texture Coordinate of Vertex 0 for Point texture stuffing (LLC)
Field Name
Bits
Default
Description
T0
31:0
0x0
T texture coordinate of vertex 0 for point; 32-bit IEEE
float format.
GA:GA_POINT_T1 · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x420c
DESCRIPTION: T Texture Coordinate of Vertex 2 for Point texture stuffing (URC)
Field Name
Bits
Default
Description
T1
31:0
0x0
T texture coordinate of vertex 2 for point; 32-bit IEEE
float format.
GA:GA_POLY_MODE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4288
DESCRIPTION: Polygon Mode
Field Name
Bits
Default
Description
POLY_MODE
1:0
0x0
Polygon mode enable.
POSSIBLE VALUES:
00 - Disable poly mode (render triangles).
01 - Dual mode (send 2 sets of 3 polys with specified
poly type).
02 - Reserved
FRONT_PTYPE
6:4
0x0
Specifies how to render front-facing polygons.
POSSIBLE VALUES:
00 - Draw points.
01 - Draw lines.
02 - Draw triangles.
03 - Reserved 3 - 7.
© 2010 Advanced Micro Devices, Inc.
Proprietary
183
Revision 1.5
BACK_PTYPE
9:7
0x0
June 8, 2010
Specifies how to render back-facing polygons.
POSSIBLE VALUES:
00 - Draw points.
01 - Draw lines.
02 - Draw triangles.
03 - Reserved 3 - 7.
GA:GA_ROUND_MODE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x428c
DESCRIPTION: Specifies the rouding mode for geometry & color SPFP to FP conversions.
Field Name
Bits
Default
Description
GEOMETRY_ROUND
1:0
0x0
Trunc (0) or round to nearest (1) for geometry (XY).
POSSIBLE VALUES:
00 - Round to trunc
01 - Round to nearest
COLOR_ROUND
3:2
0x0
When set, FP32 to FP20 using round to nearest;
otherwise trunc
POSSIBLE VALUES:
00 - Round to trunc
01 - Round to nearest
RGB_CLAMP
4
0x0
Specifies SPFP color clamp range of [0,1] or FP20 for
RGB.
POSSIBLE VALUES:
00 - Clamp to [0,1.0] for RGB
01 - RGB is FP20
ALPHA_CLAMP
5
0x0
Specifies SPFP alpha clamp range of [0,1] or FP20.
POSSIBLE VALUES:
00 - Clamp to [0,1.0] for Alpha
01 - Alpha is FP20
GEOMETRY_MASK
9:6
0x0
4b negative polarity mask for subpixel precision.
Inverted version gets ANDed with subpixel X, Y masks.
GA:GA_SOLID_BA · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4280
DESCRIPTION: Specifies blue & alpha components of fill color -- S312 format -- Backwards comp.
Field Name
Bits
Default
Description
COLOR_ALPHA
15:0
0x0
Component alpha value. (S3.12)
COLOR_BLUE
31:16
0x0
Component blue value. (S3.12)
GA:GA_SOLID_RG · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x427c
© 2010 Advanced Micro Devices, Inc.
Proprietary
184
Revision 1.5
June 8, 2010
DESCRIPTION: Specifies red & green components of fill color -- S312 format -- Backwards comp.
Field Name
Bits
Default
Description
COLOR_GREEN
15:0
0x0
Component green value (S3.12).
COLOR_RED
31:16
0x0
Component red value (S3.12).
GA:GA_TRIANGLE_STIPPLE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4214
DESCRIPTION: Specifies amount to shift integer position of vertex (screen space) before converting to float for
triangle stipple.
Field Name
Bits
Default
Description
X_SHIFT
3:0
0x0
Amount to shift x position before conversion to SPFP.
Y_SHIFT
19:16
0x0
Amount to shift y position before conversion to SPFP.
GA:GA_US_VECTOR_DATA · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4254
DESCRIPTION: Data register for loading US instructions and constants
Field Name
Bits
Default
Description
DATA
31:0
0x0
32 bit dword
GA:GA_US_VECTOR_INDEX · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4250
DESCRIPTION: Used to load US instructions and constants
Field Name
Bits
Default
Description
INDEX
8:0
0x0
Instruction (TYPE == GA_US_VECTOR_INST) or
constant (TYPE == GA_US_VECTOR_CONST)
number at which to start loading. The GA will then
expect n*6 (instructions) or n*4 (constants) writes to
GA_US_VECTOR_DATA. The GA will self-increment
until this register is written again. For instructions, the
GA expects the dwords in the following order:
US_CMN_INST, US_ALU_RGB_ADDR,
US_ALU_ALPHA_ADDR, US_ALU_ALPHA,
US_RGB_INST, US_ALPHA_INST, US_RGBA_INST.
For constants, the GA expects the dwords in RGBA
order.
TYPE
16
0x0
Specifies if the GA should load instructions or constants.
POSSIBLE VALUES:
00 - Load instructions - INDEX is an instruction
index
01 - Load constants - INDEX is a constant index
CLAMP
© 2010 Advanced Micro Devices, Inc.
Proprietary
17
0x0
POSSIBLE VALUES:
00 - No clamping of data - Default
01 - Clamp to [-1.0,1.0] constant data
185
Revision 1.5
June 8, 2010
11.5 Graphics Backend Registers
GB:GB_AA_CONFIG · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4020
DESCRIPTION: Specifies the graphics pipeline configuration for antialiasing.
Field Name
Bits
Default
Description
AA_ENABLE
0
0x0
Enables antialiasing.
POSSIBLE VALUES:
00 - Antialiasing disabled(def)
01 - Antialiasing enabled
NUM_AA_SUBSAMPLES
2:1
0x0
Specifies the number of subsamples to use while
antialiasing.
POSSIBLE VALUES:
00 - 2 subsamples
01 - 3 subsamples
02 - 4 subsamples
03 - 6 subsamples
GB:GB_ENABLE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4008
DESCRIPTION: Specifies top of Raster pipe specific enable controls.
Field Name
Bits
Default
Description
POINT_STUFF_ENABLE
0
0x0
Specifies if points will have stuffed texture coordinates.
POSSIBLE VALUES:
00 - Disable point texture stuffing.
01 - Enable point texture stuffing.
LINE_STUFF_ENABLE
1
0x0
Specifies if lines will have stuffed texture coordinates.
POSSIBLE VALUES:
00 - Disable line texture stuffing.
01 - Enable line texture stuffing.
TRIANGLE_STUFF_ENABLE
2
0x0
Specifies if triangles will have stuffed texture
coordinates.
POSSIBLE VALUES:
00 - Disable triangle texture stuffing.
01 - Enable triangle texture stuffing.
STENCIL_AUTO
5:4
0x0
Specifies if the auto dec/inc stencil mode should be
enabled, and how.
POSSIBLE VALUES:
00 - Disable stencil auto inc/dec (def).
01 - Enable stencil auto inc/dec based on triangle
cw/ccw, force into dzy low bit.
02 - Force 0 into dzy low bit.
© 2010 Advanced Micro Devices, Inc.
Proprietary
186
Revision 1.5
TEX0_SOURCE
17:16
0x0
June 8, 2010
Specifies the sources of the texture coordinates for each
texture.
POSSIBLE VALUES:
00 - Replicate VAP source texture coordinates
(S,T,[R,Q]).
01 - Stuff with source texture coordinates (S,T).
02 - Stuff with source texture coordinates (S,T,R).
TEX1_SOURCE
19:18
0x0
Specifies the sources of the texture coordinates for each
texture.
POSSIBLE VALUES:
00 - Replicate VAP source texture coordinates
(S,T,[R,Q]).
01 - Stuff with source texture coordinates (S,T).
02 - Stuff with source texture coordinates (S,T,R).
TEX2_SOURCE
21:20
0x0
Specifies the sources of the texture coordinates for each
texture.
POSSIBLE VALUES:
00 - Replicate VAP source texture coordinates
(S,T,[R,Q]).
01 - Stuff with source texture coordinates (S,T).
02 - Stuff with source texture coordinates (S,T,R).
TEX3_SOURCE
23:22
0x0
Specifies the sources of the texture coordinates for each
texture.
POSSIBLE VALUES:
00 - Replicate VAP source texture coordinates
(S,T,[R,Q]).
01 - Stuff with source texture coordinates (S,T).
02 - Stuff with source texture coordinates (S,T,R).
TEX4_SOURCE
25:24
0x0
Specifies the sources of the texture coordinates for each
texture.
POSSIBLE VALUES:
00 - Replicate VAP source texture coordinates
(S,T,[R,Q]).
01 - Stuff with source texture coordinates (S,T).
02 - Stuff with source texture coordinates (S,T,R).
TEX5_SOURCE
27:26
0x0
Specifies the sources of the texture coordinates for each
texture.
POSSIBLE VALUES:
00 - Replicate VAP source texture coordinates
(S,T,[R,Q]).
01 - Stuff with source texture coordinates (S,T).
02 - Stuff with source texture coordinates (S,T,R).
TEX6_SOURCE
© 2010 Advanced Micro Devices, Inc.
Proprietary
29:28
0x0
Specifies the sources of the texture coordinates for each
texture.
187
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - Replicate VAP source texture coordinates
(S,T,[R,Q]).
01 - Stuff with source texture coordinates (S,T).
02 - Stuff with source texture coordinates (S,T,R).
TEX7_SOURCE
31:30
0x0
Specifies the sources of the texture coordinates for each
texture.
POSSIBLE VALUES:
00 - Replicate VAP source texture coordinates
(S,T,[R,Q]).
01 - Stuff with source texture coordinates (S,T).
02 - Stuff with source texture coordinates (S,T,R).
GB:GB_FIFO_SIZE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4024
DESCRIPTION: Specifies the sizes of the various FIFO`s in the sc/rs/us. This register must be the first one written
Field Name
Bits
Default
Description
SC_IFIFO_SIZE
1:0
0x0
Size of scan converter input FIFO (XYZ)
POSSIBLE VALUES:
00 - 32 words
01 - 64 words
02 - 128 words
03 - 256 words
SC_TZFIFO_SIZE
3:2
0x0
Size of scan converter top-of-pipe Z FIFO
POSSIBLE VALUES:
00 - 16 words
01 - 32 words
02 - 64 words
03 - 128 words
SC_BFIFO_SIZE
5:4
0x0
Size of scan converter input FIFO (B)
POSSIBLE VALUES:
00 - 32 words
01 - 64 words
02 - 128 words
03 - 256 words
RS_TFIFO_SIZE
7:6
0x0
Size of ras input FIFO (Texture)
POSSIBLE VALUES:
00 - 64 words
01 - 128 words
02 - 256 words
03 - 512 words
RS_CFIFO_SIZE
© 2010 Advanced Micro Devices, Inc.
Proprietary
9:8
0x0
Size of ras input FIFO (Color)
188
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - 64 words
01 - 128 words
02 - 256 words
03 - 512 words
US_RAM_SIZE
11:10
0x0
Size of us RAM
POSSIBLE VALUES:
00 - 64 words
01 - 128 words
02 - 256 words
03 - 512 words
US_OFIFO_SIZE
13:12
0x0
Size of us output FIFO (RGBA)
POSSIBLE VALUES:
00 - 16 words
01 - 32 words
02 - 64 words
03 - 128 words
US_WFIFO_SIZE
15:14
0x0
Size of us output FIFO (W)
POSSIBLE VALUES:
00 - 16 words
01 - 32 words
02 - 64 words
03 - 128 words
RS_HIGHWATER_COL
18:16
0x0
High water mark for RS colors` fifo -- NOT USED
RS_HIGHWATER_TEX
21:19
0x0
High water mark for RS textures` fifo -- NOT USED
US_OFIFO_HIGHWATER
23:22
0x0
High water mark for US output fifo
POSSIBLE VALUES:
00 - 0 words
01 - 4 words
02 - 8 words
03 - 12 words
US_CUBE_FIFO_HIGHWATER 28:24
0x0
High water mark for US cube map fifo
GB:GB_FIFO_SIZE1 · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4070
DESCRIPTION: Specifies the sizes of the various FIFO`s in the sc/rs.
Field Name
Bits
Default
Description
SC_HIGHWATER_IFIFO
5:0
0x0
High water mark for SC input fifo
SC_HIGHWATER_BFIFO
11:6
0x0
High water mark for SC input fifo (B)
RS_HIGHWATER_COL
17:12
0x0
High water mark for RS colors` fifo
RS_HIGHWATER_TEX
23:18
0x0
High water mark for RS textures` fifo
© 2010 Advanced Micro Devices, Inc.
Proprietary
189
Revision 1.5
June 8, 2010
GB:GB_MSPOS0 · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4010
DESCRIPTION: Specifies the position of multisamples 0 through 2
Field Name
Bits
Default
Description
MS_X0
3:0
0x0
Specifies the x and y position (in subpixels) of
multisample 0
MS_Y0
7:4
0x0
Specifies the x and y position (in subpixels) of
multisample 0
MS_X1
11:8
0x0
Specifies the x and y position (in subpixels) of
multisample 1
MS_Y1
15:12
0x0
Specifies the x and y position (in subpixels) of
multisample 1
MS_X2
19:16
0x0
Specifies the x and y position (in subpixels) of
multisample 2
MS_Y2
23:20
0x0
Specifies the x and y position (in subpixels) of
multisample 2
MSBD0_Y
27:24
0x0
Specifies the minimum x and y distance (in subpixels)
between the pixel edge and the multisamples. These
values are used in the first (coarse) scan converter
MSBD0_X
31:28
0x0
Specifies the minimum x and y distance (in subpixels)
between the pixel edge and the multisamples. These
values are used in the first (coarse) scan converter
GB:GB_MSPOS1 · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4014
DESCRIPTION: Specifies the position of multisamples 3 through 5
Field Name
Bits
Default
Description
MS_X3
3:0
0x0
Specifies the x and y position (in subpixels) of
multisample 3
MS_Y3
7:4
0x0
Specifies the x and y position (in subpixels) of
multisample 3
MS_X4
11:8
0x0
Specifies the x and y position (in subpixels) of
multisample 4
MS_Y4
15:12
0x0
Specifies the x and y position (in subpixels) of
multisample 4
MS_X5
19:16
0x0
Specifies the x and y position (in subpixels) of
multisample 5
MS_Y5
23:20
0x0
Specifies the x and y position (in subpixels) of
multisample 5
MSBD1
27:24
0x0
Specifies the minimum distance (in subpixels) between
the pixel edge and the multisamples. This value is used
in the second (quad) scan converter
GB:GB_PIPE_SELECT · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x402c
DESCRIPTION: Selects which of 4 pipes are active.
© 2010 Advanced Micro Devices, Inc.
Proprietary
190
Revision 1.5
June 8, 2010
Field Name
Bits
Default
Description
PIPE0_ID
1:0
0x0
Maps physical pipe 0 to logical pipe ID (def 0).
PIPE1_ID
3:2
0x1
Maps physical pipe 1 to logical pipe ID (def 1).
PIPE2_ID
5:4
0x2
Maps physical pipe 2 to logical pipe ID (def 2).
PIPE3_ID
7:6
0x3
Maps physical pipe 3 to logical pipe ID (def 3).
PIPE_MASK
11:8
0x0
4b mask, indicates which physical pipes are enabled (def
none=0x0) -- B3=P3, B2=P2, B1=P1, B0=P0. -- 1:
enabled, 0: disabled
MAX_PIPE
13:12
0x3
2b, indicates, by the fuses, the max number of allowed
pipes. 0 = 1 pipe ... 3 = 4 pipes -- Read Only
BAD_PIPES
17:14
0xF
4b, indicates, by the fuses, the bad pipes: B3=P3, B2=P2,
B1=P1, B0=P0 -- 1: bad, 0: good -- Read Only
CONFIG_PIPES
18
0x0
If this bit is set when writing this register, the logical
pipe ID values are assigned automatically based on the
values that are read back in the MAX_PIPE and
BAD_PIPES fields. This field is always read back as 0.
POSSIBLE VALUES:
00 - Do nothing
01 - Force self-configuration
GB:GB_SELECT · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x401c
DESCRIPTION: Specifies various polygon specific selects (fog, depth, perspective).
Field Name
Bits
Default
Description
FOG_SELECT
2:0
0x0
Specifies source for outgoing (GA to SU) fog value.
POSSIBLE VALUES:
00 - Select C0A
01 - Select C1A
02 - Select C2A
03 - Select C3A
04 - Select 1/(1/W)
05 - Select Z
DEPTH_SELECT
3
0x0
Specifies source for outgoing (GA/SU & SU/RAS) depth
value.
POSSIBLE VALUES:
00 - Select Z
01 - Select 1/(1/W)
W_SELECT
4
0x0
Specifies source for outgoing (1/W) value, used to
disable perspective correct colors/textures.
POSSIBLE VALUES:
00 - Select (1/W)
01 - Select 1.0
FOG_STUFF_ENABLE
© 2010 Advanced Micro Devices, Inc.
Proprietary
5
0x0
Controls enabling of fog stuffing into texture coordinate.
191
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - Disable fog texture stuffing
01 - Enable fog texture stuffing
FOG_STUFF_TEX
9:6
0x0
Controls which texture gets fog value
FOG_STUFF_COMP
11:10
0x0
Controls which component of texture gets fog value
GB:GB_TILE_CONFIG · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4018
DESCRIPTION: Specifies the graphics pipeline configuration for rasterization
Field Name
Bits
Default
Description
ENABLE
0
0x1
Enables tiling, otherwise all tiles receive all polygons.
POSSIBLE VALUES:
00 - Tiling disabled.
01 - Tiling enabled (def).
PIPE_COUNT
3:1
0x0
Specifies the number of active pipes and contexts (up to
4 pipes, 1 ctx). When this field is written, it is
automatically reduced by hardware so as not to use more
pipes than the number indicated in
GB_PIPE_SELECT.MAX_PIPES or the number of
pipes left unmasked GB_PIPE_SELECT.BAD_PIPES.
The potentially altered value is read back, rather than the
original value written by software.
POSSIBLE VALUES:
00 - RV350 (1 pipe, 1 ctx)
03 - R300 (2 pipes, 1 ctx)
06 – R420-3P (3 pipes, 1 ctx)
07 – R420 (4 pipes, 1 ctx)
TILE_SIZE
5:4
0x1
Specifies width & height (square), in pixels (only 16, 32
available).
POSSIBLE VALUES:
00 - 8 pixels.
01 - 16 pixels.
02 - 32 pixels.
SUPER_SIZE
8:6
0x0
Specifies number of tiles and config in super chip
configuration.
POSSIBLE VALUES:
00 - 1x1 tile (one 1x1).
01 - 2 tiles (two 1x1 : ST-A,B).
02 - 4 tiles (one 2x2).
03 - 8 tiles (two 2x2 : ST-A,B).
04 - 16 tiles (one 4x4).
05 - 32 tiles (two 4x4 : ST-A,B).
06 - 64 tiles (one 8x8).
07 - 128 tiles (two 8x8 : ST-A,B).
© 2010 Advanced Micro Devices, Inc.
Proprietary
192
Revision 1.5
June 8, 2010
SUPER_X
11:9
0x0
X Location of chip within super tile.
SUPER_Y
14:12
0x0
Y Location of chip within super tile.
SUPER_TILE
15
0x0
Tile location of chip in a multi super tile config (Super
size of 2,8,32 or 128).
POSSIBLE VALUES:
00 - ST-A tile.
01 - ST-B tile.
SUBPIXEL
16
0x0
Specifies the precision of subpixels wrt pixels (12 or 16).
POSSIBLE VALUES:
00 - Select 1/12 subpixel precision.
01 - Select 1/16 subpixel precision.
QUADS_PER_RAS
18:17
0x0
Specifies the number of quads to be sent to each
rasterizer in turn when in RV300B or R300B mode
POSSIBLE VALUES:
00 - 4 Quads
01 - 8 Quads
02 - 16 Quads
03 - 32 Quads
BB_SCAN
19
0x0
Specifies whether to use an intercept or bounding box
based calculation for the first (coarse) scan converter
POSSIBLE VALUES:
00 - Use intercept based scan converter
01 - Use bounding box based scan converter
ALT_SCAN_EN
20
0x0
Specifies whether to use an altenate scan pattern for the
coarse scan converter
POSSIBLE VALUES:
00 - Use normal left-right scan
01 - Use alternate left-right-left scan
ALT_OFFSET
21
0x0
Not used -- should be 0
POSSIBLE VALUES:
00 - Not used
01 - Not used
SUBPRECISION
22
0x0
Set to 0
ALT_TILING
23
0x0
Support for 3x2 tiling in 3P mode
POSSIBLE VALUES:
00 - Use default tiling in all tiling modes
01 - Use alternative 3x2 tiling in 3P mode
Z_EXTENDED
24
0x0
Support for extended setup Z range from [0,1] to [-2,2]
with per pixel clamping
POSSIBLE VALUES:
© 2010 Advanced Micro Devices, Inc.
Proprietary
193
Revision 1.5
June 8, 2010
00 - Use (24.1) Z format, with vertex clamp to
[1.0,0.0]
01 - Use (S25.1) format, with vertex clamp to [2.0,2.0] and per pixel [1.0,0.0]
GB:GB_Z_PEQ_CONFIG · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4028
DESCRIPTION: Specifies the z plane equation configuration.
Field Name
Bits
Default
Description
Z_PEQ_SIZE
0
0x0
Specifies the z plane equation size.
POSSIBLE VALUES:
00 - 4x4 z plane equations (point-sampled or aa)
01 - 8x8 z plane equations (point-sampled only)
GB:PS3_ENABLE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4118
DESCRIPTION: PS3 mode enable register
Field Name
Bits
Default
Description
PS3_MODE
0
0x0
When reset (default), follows R300/PS2 mode; when set,
allows for new ps3 mode.
POSSIBLE VALUES:
00 - Default PS2 mode
01 - New PS3 mode
GB:PS3_TEX_SOURCE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4120
DESCRIPTION: Specifies source for texture components in PS3 mode
Field Name
Bits
Default
Description
TEX0_SOURCE
1:0
0x0
Specifies VAP source, or GA (ST) or GA (STR) stuffing
for each texture.
POSSIBLE VALUES:
00 - Replicate VAP source texture coordinates
(S,T,[R,Q]).
01 - Stuff with source texture coordinates (S,T).
02 - Stuff with source texture coordinates (S,T,R).
TEX1_SOURCE
3:2
0x0
Specifies VAP source, or GA (ST) or GA (STR) stuffing
for each texture.
POSSIBLE VALUES:
00 - Replicate VAP source texture coordinates
(S,T,[R,Q]).
© 2010 Advanced Micro Devices, Inc.
Proprietary
194
Revision 1.5
June 8, 2010
01 - Stuff with source texture coordinates (S,T).
02 - Stuff with source texture coordinates (S,T,R).
TEX2_SOURCE
5:4
0x0
Specifies VAP source, or GA (ST) or GA (STR) stuffing
for each texture.
POSSIBLE VALUES:
00 - Replicate VAP source texture coordinates
(S,T,[R,Q]).
01 - Stuff with source texture coordinates (S,T).
02 - Stuff with source texture coordinates (S,T,R).
TEX3_SOURCE
7:6
0x0
Specifies VAP source, or GA (ST) or GA (STR) stuffing
for each texture.
POSSIBLE VALUES:
00 - Replicate VAP source texture coordinates
(S,T,[R,Q]).
01 - Stuff with source texture coordinates (S,T).
02 - Stuff with source texture coordinates (S,T,R).
TEX4_SOURCE
9:8
0x0
Specifies VAP source, or GA (ST) or GA (STR) stuffing
for each texture.
POSSIBLE VALUES:
00 - Replicate VAP source texture coordinates
(S,T,[R,Q]).
01 - Stuff with source texture coordinates (S,T).
02 - Stuff with source texture coordinates (S,T,R).
TEX5_SOURCE
11:10
0x0
Specifies VAP source, or GA (ST) or GA (STR) stuffing
for each texture.
POSSIBLE VALUES:
00 - Replicate VAP source texture coordinates
(S,T,[R,Q]).
01 - Stuff with source texture coordinates (S,T).
02 - Stuff with source texture coordinates (S,T,R).
TEX6_SOURCE
13:12
0x0
Specifies VAP source, or GA (ST) or GA (STR) stuffing
for each texture.
POSSIBLE VALUES:
00 - Replicate VAP source texture coordinates
(S,T,[R,Q]).
01 - Stuff with source texture coordinates (S,T).
02 - Stuff with source texture coordinates (S,T,R).
TEX7_SOURCE
© 2010 Advanced Micro Devices, Inc.
Proprietary
15:14
0x0
Specifies VAP source, or GA (ST) or GA (STR) stuffing
for each texture.
195
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - Replicate VAP source texture coordinates
(S,T,[R,Q]).
01 - Stuff with source texture coordinates (S,T).
02 - Stuff with source texture coordinates (S,T,R).
TEX8_SOURCE
17:16
0x0
Specifies VAP source, or GA (ST) or GA (STR) stuffing
for each texture.
POSSIBLE VALUES:
00 - Replicate VAP source texture coordinates
(S,T,[R,Q]).
01 - Stuff with source texture coordinates (S,T).
02 - Stuff with source texture coordinates (S,T,R).
TEX9_SOURCE
19:18
0x0
Specifies VAP source, or GA (ST) or GA (STR) stuffing
for each texture.
POSSIBLE VALUES:
00 - Replicate VAP source texture coordinates
(S,T,[R,Q]).
01 - Stuff with source texture coordinates (S,T).
02 - Stuff with source texture coordinates (S,T,R).
GB:PS3_VTX_FMT · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x411c
DESCRIPTION: PS3 vertex format register
Field Name
Bits
Default
Description
TEX_0_COMP_CNT
2:0
0x0
How many active components (0,1,2,3,4) are in each
texture.
POSSIBLE VALUES:
00 - Not active
01 - 1 component (VAP/GA), 2 component (GA/SU)
02 - 2 component (VAP/GA), 2 component (GA/SU)
03 - 3 component (VAP/GA), 3 component (GA/SU)
04 - 4 component (VAP/GA), 4 component (GA/SU)
TEX_1_COMP_CNT
5:3
0x0
How many active components (0,1,2,3,4) are in each
texture.
POSSIBLE VALUES:
00 - Not active
01 - 1 component (VAP/GA), 2 component (GA/SU)
02 - 2 component (VAP/GA), 2 component (GA/SU)
03 - 3 component (VAP/GA), 3 component (GA/SU)
© 2010 Advanced Micro Devices, Inc.
Proprietary
196
Revision 1.5
June 8, 2010
04 - 4 component (VAP/GA), 4 component (GA/SU)
TEX_2_COMP_CNT
8:6
0x0
How many active components (0,1,2,3,4) are in each
texture.
POSSIBLE VALUES:
00 - Not active
01 - 1 component (VAP/GA), 2 component (GA/SU)
02 - 2 component (VAP/GA), 2 component (GA/SU)
03 - 3 component (VAP/GA), 3 component (GA/SU)
04 - 4 component (VAP/GA), 4 component (GA/SU)
TEX_3_COMP_CNT
11:9
0x0
How many active components (0,1,2,3,4) are in each
texture.
POSSIBLE VALUES:
00 - Not active
01 - 1 component (VAP/GA), 2 component (GA/SU)
02 - 2 component (VAP/GA), 2 component (GA/SU)
03 - 3 component (VAP/GA), 3 component (GA/SU)
04 - 4 component (VAP/GA), 4 component (GA/SU)
TEX_4_COMP_CNT
14:12
0x0
How many active components (0,1,2,3,4) are in each
texture.
POSSIBLE VALUES:
00 - Not active
01 - 1 component (VAP/GA), 2 component (GA/SU)
02 - 2 component (VAP/GA), 2 component (GA/SU)
03 - 3 component (VAP/GA), 3 component (GA/SU)
04 - 4 component (VAP/GA), 4 component (GA/SU)
TEX_5_COMP_CNT
17:15
0x0
How many active components (0,1,2,3,4) are in each
texture.
POSSIBLE VALUES:
00 - Not active
01 - 1 component (VAP/GA), 2 component (GA/SU)
02 - 2 component (VAP/GA), 2 component (GA/SU)
03 - 3 component (VAP/GA), 3 component (GA/SU)
04 - 4 component (VAP/GA), 4 component (GA/SU)
TEX_6_COMP_CNT
20:18
0x0
How many active components (0,1,2,3,4) are in each
texture.
POSSIBLE VALUES:
00 - Not active
01 - 1 component (VAP/GA), 2 component (GA/SU)
02 - 2 component (VAP/GA), 2 component (GA/SU)
© 2010 Advanced Micro Devices, Inc.
Proprietary
197
Revision 1.5
June 8, 2010
03 - 3 component (VAP/GA), 3 component (GA/SU)
04 - 4 component (VAP/GA), 4 component (GA/SU)
TEX_7_COMP_CNT
23:21
0x0
How many active components (0,1,2,3,4) are in each
texture.
POSSIBLE VALUES:
00 - Not active
01 - 1 component (VAP/GA), 2 component (GA/SU)
02 - 2 component (VAP/GA), 2 component (GA/SU)
03 - 3 component (VAP/GA), 3 component (GA/SU)
04 - 4 component (VAP/GA), 4 component (GA/SU)
TEX_8_COMP_CNT
26:24
0x0
How many active components (0,1,2,3,4) are in each
texture.
POSSIBLE VALUES:
00 - Not active
01 - 1 component (VAP/GA), 2 component (GA/SU)
02 - 2 component (VAP/GA), 2 component (GA/SU)
03 - 3 component (VAP/GA), 3 component (GA/SU)
04 - 4 component (VAP/GA), 4 component (GA/SU)
TEX_9_COMP_CNT
29:27
0x0
How many active components (0,1,2,3,4) are in each
texture.
POSSIBLE VALUES:
00 - Not active
01 - 1 component (VAP/GA), 2 component (GA/SU)
02 - 2 component (VAP/GA), 2 component (GA/SU)
03 - 3 component (VAP/GA), 3 component (GA/SU)
04 - 4 component (VAP/GA), 4 component (GA/SU)
TEX_10_COMP_CNT
31:30
0x0
How many active components (0,2,3,4) are in texture 10.
POSSIBLE VALUES:
00 - Not active
01 - 2 component (GA/SU)
02 - 3 component (GA/SU)
03 - 4 component (GA/SU)
© 2010 Advanced Micro Devices, Inc.
Proprietary
198
Revision 1.5
June 8, 2010
11.6 Rasterizer Registers
RS:RS_COUNT · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4300
DESCRIPTION: This register specifies the rasterizer input packet configuration
Field Name
Bits
Default
Description
IT_COUNT
6:0
0x0
Specifies the total number of texture address components
contained in the rasterizer input packet (0:32).
IC_COUNT
10:7
0x0
Specifies the total number of colors contained in the
rasterizer input packet (0:4).
W_ADDR
17:12
0x0
Specifies the relative rasterizer input packet location of w
(if w_count==1)
HIRES_EN
18
0x0
Enable high resolution texture coordinate output when q
is equal to 1
RS:RS_INST_[0-15] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4320-0x435c
DESCRIPTION: This table specifies what happens during each rasterizer instruction
Field Name
Bits
Default
Description
TEX_ID
3:0
0x0
Specifies the index (into the RS_IP table) of the texture
address output during this rasterizer instruction
TEX_CN
4
0x0
Write enable for texture address
POSSIBLE VALUES:
00 - No write - texture coordinate not valid
01 - write - texture valid
TEX_ADDR
11:5
0x0
Specifies the destination address (within the current pixel
stack frame) of the texture address output during this
rasterizer instruction
COL_ID
15:12
0x0
Specifies the index (into the RS_IP table) of the color
output during this rasterizer instruction
COL_CN
17:16
0x0
Write enable for color
POSSIBLE VALUES:
00 - No write - color not valid
01 - write - color valid
02 - write fbuffer - XY00->RGBA
03 - write backface - B000->RGBA
COL_ADDR
24:18
0x0
Specifies the destination address (within the current pixel
stack frame) of the color output during this rasterizer
instruction
TEX_ADJ
25
0x0
Specifies whether to sample texture coordinates at the
real or adjusted pixel centers
POSSIBLE VALUES:
00 - Sample texture coordinates at real pixel centers
© 2010 Advanced Micro Devices, Inc.
Proprietary
199
Revision 1.5
June 8, 2010
01 - Sample texture coordinates at adjusted pixel
centers
W_CN
26
0x0
Specifies that the rasterizer should output w
POSSIBLE VALUES:
00 - No write - w not valid
01 - write - w valid
RS:RS_INST_COUNT · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4304
DESCRIPTION: This register specifies the number of rasterizer instructions
Field Name
Bits
Default
Description
INST_COUNT
3:0
0x0
Number of rasterizer instructions (1:16)
TX_OFFSET
7:5
0x0
Indicates range of texture offset to minimize peroidic
errors on texels sampled right on their edges
RS:RS_IP_[0-15] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4074-0x40b0
DESCRIPTION: This table specifies the source location and format for up to 16 texture addresses (i[0]:i[15]) and
four colors (c[0]:c[3])
Field Name
Bits
Default
Description
TEX_PTR_S
5:0
0x0
Specifies the relative rasterizer input packet location of
each component (S, T, R, and Q) of texture address (i[i]).
The values 62 and 63 select constant inputs for the
component: 62 selects K0 (0.0), and 63 selects K1 (1.0).
TEX_PTR_T
11:6
0x0
Specifies the relative rasterizer input packet location of
each component (S, T, R, and Q) of texture address (i[i]).
The values 62 and 63 select constant inputs for the
component: 62 selects K0 (0.0), and 63 selects K1 (1.0).
TEX_PTR_R
17:12
0x0
Specifies the relative rasterizer input packet location of
each component (S, T, R, and Q) of texture address (i[i]).
The values 62 and 63 select constant inputs for the
component: 62 selects K0 (0.0), and 63 selects K1 (1.0).
TEX_PTR_Q
23:18
0x0
Specifies the relative rasterizer input packet location of
each component (S, T, R, and Q) of texture address (i[i]).
The values 62 and 63 select constant inputs for the
component: 62 selects K0 (0.0), and 63 selects K1 (1.0).
COL_PTR
26:24
0x0
Specifies the relative rasterizer input packet location of
the color (c[i]).
COL_FMT
30:27
0x0
Specifies the format of the color (c[i]).
POSSIBLE VALUES:
00 - Four components (R,G,B,A)
01 - Three components (R,G,B,0)
02 - Three components (R,G,B,1)
04 - One component (0,0,0,A)
05 - Zero components (0,0,0,0)
© 2010 Advanced Micro Devices, Inc.
Proprietary
200
Revision 1.5
June 8, 2010
06 - Zero components (0,0,0,1)
08 - One component (1,1,1,A)
09 - Zero components (1,1,1,0)
10 - Zero components (1,1,1,1)
OFFSET_EN
31
0x0
Enable application of the TX_OFFSET in
RS_INST_COUNT
POSSIBLE VALUES:
00 - Do not apply the TX_OFFSET in
RS_INST_COUNT
01 - Apply the TX_OFFSET specified by
RS_INST_COUNT
© 2010 Advanced Micro Devices, Inc.
Proprietary
201
Revision 1.5
June 8, 2010
11.7 Clipping Registers
SC:SC_CLIP_0_A · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x43b0
DESCRIPTION: OpenGL Clip rectangles
Field Name
Bits
Default
Description
XS0
12:0
0x0
Left hand edge of clip rectangle
YS0
25:13
0x0
Upper edge of clip rectangle
SC:SC_CLIP_0_B · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x43b4
DESCRIPTION: OpenGL Clip rectangles
Field Name
Bits
Default
Description
XS1
12:0
0x0
Right hand edge of clip rectangle
YS1
25:13
0x0
Lower edge of clip rectangle
SC:SC_CLIP_1_A · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x43b8
Field Name
Bits
Default
XS0
12:0
0x0
YS0
25:13
0x0
Description
SC:SC_CLIP_1_B · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x43bc
Field Name
Bits
Default
XS1
12:0
0x0
YS1
25:13
0x0
Description
SC:SC_CLIP_2_A · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x43c0
Field Name
Bits
Default
XS0
12:0
0x0
YS0
25:13
0x0
Description
SC:SC_CLIP_2_B · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x43c4
Field Name
Bits
Default
XS1
12:0
0x0
YS1
25:13
0x0
Description
SC:SC_CLIP_3_A · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x43c8
Field Name
© 2010 Advanced Micro Devices, Inc.
Proprietary
Bits
Default
Description
202
Revision 1.5
XS0
12:0
0x0
YS0
25:13
0x0
June 8, 2010
SC:SC_CLIP_3_B · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x43cc
Field Name
Bits
Default
XS1
12:0
0x0
YS1
25:13
0x0
Description
SC:SC_CLIP_RULE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x43d0
DESCRIPTION: OpenGL Clip boolean function
Field Name
Bits
Default
Description
CLIP_RULE
15:0
0x0
OpenGL Clip boolean function. The `inside` flags for
each of the four clip rectangles form a 4-bit binary
number. The corresponding bit in this 16-bit number
specifies whether the pixel is visible.
SC:SC_EDGERULE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x43a8
DESCRIPTION: Edge rules - what happens when an edge falls exactly on a sample point
Field Name
Bits
Default
Description
ER_TRI
4:0
0x0
Edge rules for triangles, points, left-right lines, right-left
lines, upper-bottom lines, bottom-upper lines. For values
0 to 15, bit 0 specifies whether a sample on a horizontalbottom edge is in, bit 1 specifies whether a sample on a
horizontal-top edge is in, bit 2 species whether a sample
on a right edge is in, bit 3 specifies whether a sample on
a left edge is in. For values 16 to 31, bit 0 specifies
whether a sample on a vertical-right edge is in, bit 1
specifies whether a sample on a vertical-left edge is in,
bit 2 species whether a sample on a bottom edge is in, bit
3 specifies whether a sample on a top edge is in
POSSIBLE VALUES:
00 - L-in,R-in,HT-in,HB-in
01 - L-in,R-in,HT-in,HB-out
02 - L-in,R-in,HT-out,HB-in
03 - L-in,R-in,HT-out,HB-out
04 - L-in,R-out,HT-in,HB-in
05 - L-in,R-out,HT-in,HB-out
06 - L-in,R-out,HT-out,HB-in
07 - L-in,R-out,HT-out,HB-out
08 - L-out,R-in,HT-in,HB-in
09 - L-out,R-in,HT-in,HB-out
10 - L-out,R-in,HT-out,HB-in
11 - L-out,R-in,HT-out,HB-out
12 - L-out,R-out,HT-in,HB-in
© 2010 Advanced Micro Devices, Inc.
Proprietary
203
Revision 1.5
June 8, 2010
13 - L-out,R-out,HT-in,HB-out
14 - L-out,R-out,HT-out,HB-in
15 - L-out,R-out,HT-out,HB-out
16 - T-in,B-in,VL-in,VR-in
17 - T-in,B-in,VL-in,VR-out
18 - T-in,B-in,VL,VR-in
19 - T-in,B-in,VL-out,VR-out
20 - T-out,B-in,VL-in,VR-in
21 - T-out,B-in,VL-in,VR-out
22 - T-out,B-in,VL-out,VR-in
23 - T-out,B-in,VL-out,VR-out
24 - T-in,B-out,VL-in,VR-in
25 - T-in,B-out,VL-in,VR-out
26 - T-in,B-out,VL-out,VR-in
27 - T-in,B-out,VL-out,VR-out
28 - T-out,B-out,VL-in,VR-in
29 - T-out,B-out,VL-in,VR-out
30 - T-out,B-out,VL-out,VR-in
31 - T-out,B-out,VL-out,VR-out
ER_POINT
9:5
0x0
Edge rules for triangles, points, left-right lines, right-left
lines, upper-bottom lines, bottom-upper lines. For values
0 to 15, bit 0 specifies whether a sample on a horizontalbottom edge is in, bit 1 specifies whether a sample on a
horizontal-top edge is in, bit 2 species whether a sample
on a right edge is in, bit 3 specifies whether a sample on
a left edge is in. For values 16 to 31, bit 0 specifies
whether a sample on a vertical-right edge is in, bit 1
specifies whether a sample on a vertical-left edge is in,
bit 2 species whether a sample on a bottom edge is in, bit
3 specifies whether a sample on a top edge is in
POSSIBLE VALUES:
00 - L-in,R-in,HT-in,HB-in
01 - L-in,R-in,HT-in,HB-out
02 - L-in,R-in,HT-out,HB-in
03 - L-in,R-in,HT-out,HB-out
04 - L-in,R-out,HT-in,HB-in
05 - L-in,R-out,HT-in,HB-out
06 - L-in,R-out,HT-out,HB-in
07 - L-in,R-out,HT-out,HB-out
08 - L-out,R-in,HT-in,HB-in
09 - L-out,R-in,HT-in,HB-out
10 - L-out,R-in,HT-out,HB-in
11 - L-out,R-in,HT-out,HB-out
12 - L-out,R-out,HT-in,HB-in
13 - L-out,R-out,HT-in,HB-out
14 - L-out,R-out,HT-out,HB-in
15 - L-out,R-out,HT-out,HB-out
16 - T-in,B-in,VL-in,VR-in
17 - T-in,B-in,VL-in,VR-out
18 - T-in,B-in,VL,VR-in
19 - T-in,B-in,VL-out,VR-out
20 - T-out,B-in,VL-in,VR-in
© 2010 Advanced Micro Devices, Inc.
Proprietary
204
Revision 1.5
June 8, 2010
21 - T-out,B-in,VL-in,VR-out
22 - T-out,B-in,VL-out,VR-in
23 - T-out,B-in,VL-out,VR-out
24 - T-in,B-out,VL-in,VR-in
25 - T-in,B-out,VL-in,VR-out
26 - T-in,B-out,VL-out,VR-in
27 - T-in,B-out,VL-out,VR-out
28 - T-out,B-out,VL-in,VR-in
29 - T-out,B-out,VL-in,VR-out
30 - T-out,B-out,VL-out,VR-in
31 - T-out,B-out,VL-out,VR-out
ER_LINE_LR
14:10
0x0
Edge rules for triangles, points, left-right lines, right-left
lines, upper-bottom lines, bottom-upper lines. For values
0 to 15, bit 0 specifies whether a sample on a horizontalbottom edge is in, bit 1 specifies whether a sample on a
horizontal-top edge is in, bit 2 species whether a sample
on a right edge is in, bit 3 specifies whether a sample on
a left edge is in. For values 16 to 31, bit 0 specifies
whether a sample on a vertical-right edge is in, bit 1
specifies whether a sample on a vertical-left edge is in,
bit 2 species whether a sample on a bottom edge is in, bit
3 specifies whether a sample on a top edge is in
POSSIBLE VALUES:
00 - L-in,R-in,HT-in,HB-in
01 - L-in,R-in,HT-in,HB-out
02 - L-in,R-in,HT-out,HB-in
03 - L-in,R-in,HT-out,HB-out
04 - L-in,R-out,HT-in,HB-in
05 - L-in,R-out,HT-in,HB-out
06 - L-in,R-out,HT-out,HB-in
07 - L-in,R-out,HT-out,HB-out
08 - L-out,R-in,HT-in,HB-in
09 - L-out,R-in,HT-in,HB-out
10 - L-out,R-in,HT-out,HB-in
11 - L-out,R-in,HT-out,HB-out
12 - L-out,R-out,HT-in,HB-in
13 - L-out,R-out,HT-in,HB-out
14 - L-out,R-out,HT-out,HB-in
15 - L-out,R-out,HT-out,HB-out
16 - T-in,B-in,VL-in,VR-in
17 - T-in,B-in,VL-in,VR-out
18 - T-in,B-in,VL,VR-in
19 - T-in,B-in,VL-out,VR-out
20 - T-out,B-in,VL-in,VR-in
21 - T-out,B-in,VL-in,VR-out
22 - T-out,B-in,VL-out,VR-in
23 - T-out,B-in,VL-out,VR-out
24 - T-in,B-out,VL-in,VR-in
25 - T-in,B-out,VL-in,VR-out
26 - T-in,B-out,VL-out,VR-in
27 - T-in,B-out,VL-out,VR-out
28 - T-out,B-out,VL-in,VR-in
© 2010 Advanced Micro Devices, Inc.
Proprietary
205
Revision 1.5
June 8, 2010
29 - T-out,B-out,VL-in,VR-out
30 - T-out,B-out,VL-out,VR-in
31 - T-out,B-out,VL-out,VR-out
ER_LINE_RL
19:15
0x0
Edge rules for triangles, points, left-right lines, right-left
lines, upper-bottom lines, bottom-upper lines. For values
0 to 15, bit 0 specifies whether a sample on a horizontalbottom edge is in, bit 1 specifies whether a sample on a
horizontal-top edge is in, bit 2 species whether a sample
on a right edge is in, bit 3 specifies whether a sample on
a left edge is in. For values 16 to 31, bit 0 specifies
whether a sample on a vertical-right edge is in, bit 1
specifies whether a sample on a vertical-left edge is in,
bit 2 species whether a sample on a bottom edge is in, bit
3 specifies whether a sample on a top edge is in
POSSIBLE VALUES:
00 - L-in,R-in,HT-in,HB-in
01 - L-in,R-in,HT-in,HB-out
02 - L-in,R-in,HT-out,HB-in
03 - L-in,R-in,HT-out,HB-out
04 - L-in,R-out,HT-in,HB-in
05 - L-in,R-out,HT-in,HB-out
06 - L-in,R-out,HT-out,HB-in
07 - L-in,R-out,HT-out,HB-out
08 - L-out,R-in,HT-in,HB-in
09 - L-out,R-in,HT-in,HB-out
10 - L-out,R-in,HT-out,HB-in
11 - L-out,R-in,HT-out,HB-out
12 - L-out,R-out,HT-in,HB-in
13 - L-out,R-out,HT-in,HB-out
14 - L-out,R-out,HT-out,HB-in
15 - L-out,R-out,HT-out,HB-out
16 - T-in,B-in,VL-in,VR-in
17 - T-in,B-in,VL-in,VR-out
18 - T-in,B-in,VL,VR-in
19 - T-in,B-in,VL-out,VR-out
20 - T-out,B-in,VL-in,VR-in
21 - T-out,B-in,VL-in,VR-out
22 - T-out,B-in,VL-out,VR-in
23 - T-out,B-in,VL-out,VR-out
24 - T-in,B-out,VL-in,VR-in
25 - T-in,B-out,VL-in,VR-out
26 - T-in,B-out,VL-out,VR-in
27 - T-in,B-out,VL-out,VR-out
28 - T-out,B-out,VL-in,VR-in
29 - T-out,B-out,VL-in,VR-out
30 - T-out,B-out,VL-out,VR-in
31 - T-out,B-out,VL-out,VR-out
ER_LINE_TB
© 2010 Advanced Micro Devices, Inc.
Proprietary
24:20
0x0
Edge rules for triangles, points, left-right lines, right-left
lines, upper-bottom lines, bottom-upper lines. For values
0 to 15, bit 0 specifies whether a sample on a horizontalbottom edge is in, bit 1 specifies whether a sample on a
206
Revision 1.5
June 8, 2010
horizontal-top edge is in, bit 2 species whether a sample
on a right edge is in, bit 3 specifies whether a sample on
a left edge is in. For values 16 to 31, bit 0 specifies
whether a sample on a vertical-right edge is in, bit 1
specifies whether a sample on a vertical-left edge is in,
bit 2 species whether a sample on a bottom edge is in, bit
3 specifies whether a sample on a top edge is in
POSSIBLE VALUES:
00 - L-in,R-in,HT-in,HB-in
01 - L-in,R-in,HT-in,HB-out
02 - L-in,R-in,HT-out,HB-in
03 - L-in,R-in,HT-out,HB-out
04 - L-in,R-out,HT-in,HB-in
05 - L-in,R-out,HT-in,HB-out
06 - L-in,R-out,HT-out,HB-in
07 - L-in,R-out,HT-out,HB-out
08 - L-out,R-in,HT-in,HB-in
09 - L-out,R-in,HT-in,HB-out
10 - L-out,R-in,HT-out,HB-in
11 - L-out,R-in,HT-out,HB-out
12 - L-out,R-out,HT-in,HB-in
13 - L-out,R-out,HT-in,HB-out
14 - L-out,R-out,HT-out,HB-in
15 - L-out,R-out,HT-out,HB-out
16 - T-in,B-in,VL-in,VR-in
17 - T-in,B-in,VL-in,VR-out
18 - T-in,B-in,VL,VR-in
19 - T-in,B-in,VL-out,VR-out
20 - T-out,B-in,VL-in,VR-in
21 - T-out,B-in,VL-in,VR-out
22 - T-out,B-in,VL-out,VR-in
23 - T-out,B-in,VL-out,VR-out
24 - T-in,B-out,VL-in,VR-in
25 - T-in,B-out,VL-in,VR-out
26 - T-in,B-out,VL-out,VR-in
27 - T-in,B-out,VL-out,VR-out
28 - T-out,B-out,VL-in,VR-in
29 - T-out,B-out,VL-in,VR-out
30 - T-out,B-out,VL-out,VR-in
31 - T-out,B-out,VL-out,VR-out
ER_LINE_BT
© 2010 Advanced Micro Devices, Inc.
Proprietary
29:25
0x0
Edge rules for triangles, points, left-right lines, right-left
lines, upper-bottom lines, bottom-upper lines. For values
0 to 15, bit 0 specifies whether a sample on a horizontalbottom edge is in, bit 1 specifies whether a sample on a
horizontal-top edge is in, bit 2 species whether a sample
on a right edge is in, bit 3 specifies whether a sample on
a left edge is in. For values 16 to 31, bit 0 specifies
whether a sample on a vertical-right edge is in, bit 1
specifies whether a sample on a vertical-left edge is in,
bit 2 species whether a sample on a bottom edge is in, bit
3 specifies whether a sample on a top edge is in
207
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - L-in,R-in,HT-in,HB-in
01 - L-in,R-in,HT-in,HB-out
02 - L-in,R-in,HT-out,HB-in
03 - L-in,R-in,HT-out,HB-out
04 - L-in,R-out,HT-in,HB-in
05 - L-in,R-out,HT-in,HB-out
06 - L-in,R-out,HT-out,HB-in
07 - L-in,R-out,HT-out,HB-out
08 - L-out,R-in,HT-in,HB-in
09 - L-out,R-in,HT-in,HB-out
10 - L-out,R-in,HT-out,HB-in
11 - L-out,R-in,HT-out,HB-out
12 - L-out,R-out,HT-in,HB-in
13 - L-out,R-out,HT-in,HB-out
14 - L-out,R-out,HT-out,HB-in
15 - L-out,R-out,HT-out,HB-out
16 - T-in,B-in,VL-in,VR-in
17 - T-in,B-in,VL-in,VR-out
18 - T-in,B-in,VL,VR-in
19 - T-in,B-in,VL-out,VR-out
20 - T-out,B-in,VL-in,VR-in
21 - T-out,B-in,VL-in,VR-out
22 - T-out,B-in,VL-out,VR-in
23 - T-out,B-in,VL-out,VR-out
24 - T-in,B-out,VL-in,VR-in
25 - T-in,B-out,VL-in,VR-out
26 - T-in,B-out,VL-out,VR-in
27 - T-in,B-out,VL-out,VR-out
28 - T-out,B-out,VL-in,VR-in
29 - T-out,B-out,VL-in,VR-out
30 - T-out,B-out,VL-out,VR-in
31 - T-out,B-out,VL-out,VR-out
SC:SC_HYPERZ_EN · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x43a4
DESCRIPTION: Hierarchical Z Enable
Field Name
Bits
Default
Description
HZ_EN
0
0x0
Enable for hierarchical Z.
POSSIBLE VALUES:
00 - Disables Hyper-Z.
01 - Enables Hyper-Z.
HZ_MAX
1
0x0
Specifies whether to compute min or max z value
POSSIBLE VALUES:
00 - HZ block computes minimum z value
01 - HZ block computes maximum z value
HZ_ADJ
© 2010 Advanced Micro Devices, Inc.
Proprietary
4:2
0x0
Specifies adjustment to get added or subtracted from
208
Revision 1.5
June 8, 2010
computed z value
POSSIBLE VALUES:
00 - Add or Subtract 1/256 << ze
01 - Add or Subtract 1/128 << ze
02 - Add or Subtract 1/64 << ze
03 - Add or Subtract 1/32 << ze
04 - Add or Subtract 1/16 << ze
05 - Add or Subtract 1/8 << ze
06 - Add or Subtract 1/4 << ze
07 - Add or Subtract 1/2 << ze
HZ_Z0MIN
5
0x0
Specifies whether vertex 0 z contains minimum z value
POSSIBLE VALUES:
00 - Vertex 0 does not contain minimum z value
01 - Vertex 0 does contain minimum z value
HZ_Z0MAX
6
0x0
Specifies whether vertex 0 z contains maximum z value
POSSIBLE VALUES:
00 - Vertex 0 does not contain maximum z value
01 - Vertex 0 does contain maximum z value
SC:SC_SCISSOR0 · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x43e0
DESCRIPTION: Scissor rectangle specification
Field Name
Bits
Default
Description
XS0
12:0
0x0
Left hand edge of scissor rectangle
YS0
25:13
0x0
Upper edge of scissor rectangle
SC:SC_SCISSOR1 · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x43e4
DESCRIPTION: Scissor rectangle specification
Field Name
Bits
Default
Description
XS1
12:0
0x0
Right hand edge of scissor rectangle
YS1
25:13
0x0
Lower edge of scissor rectangle
SC:SC_SCREENDOOR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x43e8
DESCRIPTION: Screen door sample mask
Field Name
Bits
Default
Description
SCREENDOOR
23:0
0x0
Screen door sample mask - 1 means sample may be
covered, 0 means sample is not covered
© 2010 Advanced Micro Devices, Inc.
Proprietary
209
Revision 1.5
June 8, 2010
11.8 Setup Unit Registers
SU:SU_CULL_MODE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x42b8
DESCRIPTION: Culling Enables
Field Name
Bits
Default
Description
CULL_FRONT
0
0x0
Enable for front-face culling.
POSSIBLE VALUES:
00 - Do not cull front-facing triangles.
01 - Cull front-facing triangles.
CULL_BACK
1
0x0
Enable for back-face culling.
POSSIBLE VALUES:
00 - Do not cull back-facing triangles.
01 - Cull back-facing triangles.
FACE
2
0x0
X-Ored with cross product sign to determine positive
facing
POSSIBLE VALUES:
00 - Positive cross product is front (CCW).
01 - Negative cross product is front (CW).
SU:SU_DEPTH_OFFSET · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x42c4
DESCRIPTION: SU Depth Offset value
Field Name
Bits
Default
Description
OFFSET
31:0
0x0
SPFP Floating point applied to depth before conversion
to FXP.
SU:SU_DEPTH_SCALE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x42c0
DESCRIPTION: SU Depth Scale value
Field Name
Bits
Default
Description
SCALE
31:0
0x3F800000 SPFP Floating point applied to depth before conversion
to FXP.
SU:SU_POLY_OFFSET_BACK_OFFSET · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x42b0
DESCRIPTION: Back-Facing Polygon Offset Offset
Field Name
Bits
Default
Description
OFFSET
31:0
0x0
Specifies polygon offset offset for back-facing polygons;
32b IEEE float format; applied after Z scale & offset (0
to 2^24-1 range)
© 2010 Advanced Micro Devices, Inc.
Proprietary
210
Revision 1.5
June 8, 2010
SU:SU_POLY_OFFSET_BACK_SCALE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x42ac
DESCRIPTION: Back-Facing Polygon Offset Scale
Field Name
Bits
Default
Description
SCALE
31:0
0x0
Specifies polygon offset scale for back-facing polygons;
32-bit IEEE float format; applied after Z scale & offset
(0 to 2^24-1 range); slope computed in subpixels (1/12 or
1/16)
SU:SU_POLY_OFFSET_ENABLE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x42b4
DESCRIPTION: Enables for polygon offset
Field Name
Bits
Default
Description
FRONT_ENABLE
0
0x0
Enables front facing polygon`s offset.
POSSIBLE VALUES:
00 - Disable front offset.
01 - Enable front offset.
BACK_ENABLE
1
0x0
Enables back facing polygon`s offset.
POSSIBLE VALUES:
00 - Disable back offset.
01 - Enable back offset.
PARA_ENABLE
2
0x0
Forces all parallelograms to have FRONT_FACING for
poly offset -- Need to have FRONT_ENABLE also set to
have Z offset for parallelograms.
POSSIBLE VALUES:
00 - Disable front offset for parallelograms.
01 - Enable front offset for parallelograms.
SU:SU_POLY_OFFSET_FRONT_OFFSET · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x42a8
DESCRIPTION: Front-Facing Polygon Offset Offset
Field Name
Bits
Default
Description
OFFSET
31:0
0x0
Specifies polygon offset offset for front-facing polygons;
32b IEEE float format; applied after Z scale & offset (0
to 2^24-1 range)
SU:SU_POLY_OFFSET_FRONT_SCALE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x42a4
DESCRIPTION: Front-Facing Polygon Offset Scale
Field Name
Bits
Default
Description
SCALE
31:0
0x0
Specifies polygon offset scale for front-facing polygons;
32b IEEE float format; applied after Z scale & offset (0
to 2^24-1 range); slope computed in subpixels (1/12 or
1/16)
© 2010 Advanced Micro Devices, Inc.
Proprietary
211
Revision 1.5
June 8, 2010
SU:SU_REG_DEST · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x42c8
DESCRIPTION: SU Raster pipe destination select for registers
Field Name
Bits
Default
Description
SELECT
3:0
0xF
Register read/write destination select: b0: logical pipe0,
b1: logical pipe1, b2: logical pipe2 and b3: logical pipe3
SU:SU_TEX_WRAP · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x42a0
DESCRIPTION: Enables for Cylindrical Wrapping
Field Name
Bits
Default
Description
T0C0
0
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T0C1
1
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T0C2
2
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T0C3
3
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T1C0
4
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T1C1
5
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
© 2010 Advanced Micro Devices, Inc.
Proprietary
212
Revision 1.5
June 8, 2010
01 - Enable cylindrical wrapping.
T1C2
6
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T1C3
7
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T2C0
8
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T2C1
9
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T2C2
10
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T2C3
11
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T3C0
12
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T3C1
13
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
© 2010 Advanced Micro Devices, Inc.
Proprietary
213
Revision 1.5
T3C2
14
0x0
June 8, 2010
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T3C3
15
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T4C0
16
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T4C1
17
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T4C2
18
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T4C3
19
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T5C0
20
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T5C1
21
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T5C2
© 2010 Advanced Micro Devices, Inc.
Proprietary
22
0x0
tNcM -- Enable texture wrapping on component M
214
Revision 1.5
June 8, 2010
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T5C3
23
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T6C0
24
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T6C1
25
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T6C2
26
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T6C3
27
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T7C0
28
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T7C1
29
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T7C2
© 2010 Advanced Micro Devices, Inc.
Proprietary
30
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
215
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T7C3
31
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
SU:SU_TEX_WRAP_PS3 · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4114
DESCRIPTION: Specifies texture wrapping for new PS3 textures.
Field Name
Bits
Default
Description
T9C0
0
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T9C1
1
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T9C2
2
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T9C3
3
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T8C0
4
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
© 2010 Advanced Micro Devices, Inc.
Proprietary
216
Revision 1.5
June 8, 2010
01 - Enable cylindrical wrapping.
T8C1
5
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T8C2
6
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
T8C3
7
0x0
tNcM -- Enable texture wrapping on component M
(S,T,R,Q) of texture N.
POSSIBLE VALUES:
00 - Disable cylindrical wrapping.
01 - Enable cylindrical wrapping.
© 2010 Advanced Micro Devices, Inc.
Proprietary
217
Revision 1.5
June 8, 2010
11.9 Texture Registers
TX:TX_BORDER_COLOR_[0-15] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x45c0-0x45fc
DESCRIPTION: Border Color
Field Name
Bits
Default
Description
BORDER_COLOR
31:0
none
Color used for borders. Format is the same as the texture
being bordered.
TX:TX_CHROMA_KEY_[0-15] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4580-0x45bc
DESCRIPTION: Texture Chroma Key
Field Name
Bits
Default
Description
CHROMA_KEY
31:0
none
Color used for chroma key compare. Format is the same
as the texture being keyed.
TX:TX_ENABLE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4104
DESCRIPTION: Texture Enables for Maps 0 to 15
Field Name
Bits
Default
Description
TEX_0_ENABLE
0
none
Texture Map Enables.
POSSIBLE VALUES:
00 - Disable, ARGB = 1,0,0,0
01 - Enable
TEX_1_ENABLE
1
none
Texture Map Enables.
POSSIBLE VALUES:
00 - Disable, ARGB = 1,0,0,0
01 - Enable
TEX_2_ENABLE
2
none
Texture Map Enables.
POSSIBLE VALUES:
00 - Disable, ARGB = 1,0,0,0
01 - Enable
TEX_3_ENABLE
3
none
Texture Map Enables.
POSSIBLE VALUES:
00 - Disable, ARGB = 1,0,0,0
01 - Enable
TEX_4_ENABLE
4
none
Texture Map Enables.
POSSIBLE VALUES:
00 - Disable, ARGB = 1,0,0,0
01 - Enable
TEX_5_ENABLE
© 2010 Advanced Micro Devices, Inc.
Proprietary
5
none
Texture Map Enables.
218
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - Disable, ARGB = 1,0,0,0
01 - Enable
TEX_6_ENABLE
6
none
Texture Map Enables.
POSSIBLE VALUES:
00 - Disable, ARGB = 1,0,0,0
01 - Enable
TEX_7_ENABLE
7
none
Texture Map Enables.
POSSIBLE VALUES:
00 - Disable, ARGB = 1,0,0,0
01 - Enable
TEX_8_ENABLE
8
none
Texture Map Enables.
POSSIBLE VALUES:
00 - Disable, ARGB = 1,0,0,0
01 - Enable
TEX_9_ENABLE
9
none
Texture Map Enables.
POSSIBLE VALUES:
00 - Disable, ARGB = 1,0,0,0
01 - Enable
TEX_10_ENABLE
10
none
Texture Map Enables.
POSSIBLE VALUES:
00 - Disable, ARGB = 1,0,0,0
01 - Enable
TEX_11_ENABLE
11
none
Texture Map Enables.
POSSIBLE VALUES:
00 - Disable, ARGB = 1,0,0,0
01 - Enable
TEX_12_ENABLE
12
none
Texture Map Enables.
POSSIBLE VALUES:
00 - Disable, ARGB = 1,0,0,0
01 - Enable
TEX_13_ENABLE
13
none
Texture Map Enables.
POSSIBLE VALUES:
00 - Disable, ARGB = 1,0,0,0
01 - Enable
TEX_14_ENABLE
14
none
Texture Map Enables.
POSSIBLE VALUES:
00 - Disable, ARGB = 1,0,0,0
© 2010 Advanced Micro Devices, Inc.
Proprietary
219
Revision 1.5
June 8, 2010
01 - Enable
TEX_15_ENABLE
15
none
Texture Map Enables.
POSSIBLE VALUES:
00 - Disable, ARGB = 1,0,0,0
01 - Enable
TX:TX_FILTER0_[0-15] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4400-0x443c
DESCRIPTION: Texture Filter State
Field Name
Bits
Default
Description
CLAMP_S
2:0
none
Clamp mode for texture coordinates
POSSIBLE VALUES:
00 - Wrap (repeat)
01 - Mirror
02 - Clamp to last texel (0.0 to 1.0)
03 - MirrorOnce to last texel (-1.0 to 1.0)
04 - Clamp half way to border color (0.0 to 1.0)
05 - MirrorOnce half way to border color (-1.0 to 1.0)
06 - Clamp to border color (0.0 to 1.0)
07 - MirrorOnce to border color (-1.0 to 1.0)
CLAMP_T
5:3
none
Clamp mode for texture coordinates
POSSIBLE VALUES:
00 - Wrap (repeat)
01 - Mirror
02 - Clamp to last texel (0.0 to 1.0)
03 - MirrorOnce to last texel (-1.0 to 1.0)
04 - Clamp half way to border color (0.0 to 1.0)
05 - MirrorOnce half way to border color (-1.0 to 1.0)
06 - Clamp to border color (0.0 to 1.0)
07 - MirrorOnce to border color (-1.0 to 1.0)
CLAMP_R
8:6
none
Clamp mode for texture coordinates
POSSIBLE VALUES:
00 - Wrap (repeat)
01 - Mirror
02 - Clamp to last texel (0.0 to 1.0)
03 - MirrorOnce to last texel (-1.0 to 1.0)
04 - Clamp half way to border color (0.0 to 1.0)
05 - MirrorOnce half way to border color (-1.0 to 1.0)
06 - Clamp to border color (0.0 to 1.0)
07 - MirrorOnce to border color (-1.0 to 1.0)
MAG_FILTER
10:9
none
Filter used when texture is magnified
POSSIBLE VALUES:
00 - Filter4
01 - Point
© 2010 Advanced Micro Devices, Inc.
Proprietary
220
Revision 1.5
June 8, 2010
02 - Linear
03 - Reserved
MIN_FILTER
12:11
none
Filter used when texture is minified
POSSIBLE VALUES:
00 - Filter4
01 - Point
02 - Linear
03 - Reserved
MIP_FILTER
14:13
none
Filter used between mipmap levels
POSSIBLE VALUES:
00 - None
01 - Point
02 - Linear
03 - Reserved
VOL_FILTER
16:15
none
Filter used between layers of a volume
POSSIBLE VALUES:
00 - None (no filter specifed, select from MIN/MAG
filters)
01 - Point
02 - Linear
03 - Reserved
MAX_MIP_LEVEL
20:17
none
Reserved
23:21
none
ID
31:28
none
LOD index of largest (finest) mipmap to use (0 is
largest). Ranges from 0 to NUM_LEVELS.
Logical id for this physical texture
TX:TX_FILTER1_[0-15] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4440-0x447c
DESCRIPTION: Texture Filter State
Field Name
Bits
Default
Description
CHROMA_KEY_MODE
1:0
none
Chroma Key Mode
POSSIBLE VALUES:
00 - Disable
01 - ChromaKey (kill pixel if any sample matches
chroma key)
02 - ChromaKeyBlend (set sample to 0 if it matches
chroma key)
MC_ROUND
2
none
Bilinear rounding mode
POSSIBLE VALUES:
00 - Normal rounding on all components (+0.5)
01 - MPEG4 rounding on all components (+0.25)
LOD_BIAS
© 2010 Advanced Micro Devices, Inc.
Proprietary
12:3
none
(s4.5). Ranges from -16.0 to 15.99. Mipmap LOD bias
measured in mipmap levels. Added to the signed,
221
Revision 1.5
June 8, 2010
computed LOD before the LOD is clamped.
Reserved
13
none
MC_COORD_TRUNCATE
14
none
MPEG coordinate truncation mode
POSSIBLE VALUES:
00 - Dont truncate coordinate fractions.
01 - Truncate coordinate fractions to 0.0 and 0.5 for
MPEG
TRI_PERF
16:15
none
Apply slope and bias to trilerp fraction to reduce the
number of 2-level fetches for trilinear. Should only be
used if MIP_FILTER is LINEAR.
POSSIBLE VALUES:
00 - Breakpoint=0/8. lfrac_out = lfrac_in
01 - Breakpoint=1/8. lfrac_out = clamp(4/3*lfrac_in 1/6)
02 - Breakpoint=1/4. lfrac_out = clamp(2*lfrac_in 1/2)
03 - Breakpoint=3/8. lfrac_out = clamp(4*lfrac_in 3/2)
Reserved
19:17
none
Set to 0
Reserved
20
none
Set to 0
Reserved
21
none
Set to 0
MACRO_SWITCH
22
none
If enabled, addressing switches to macro-linear when
image width is <= 8 micro-tiles. If disabled, functionality
is same as RV350, switch to macro-linear when image
width is < 8 micro-tiles.
POSSIBLE VALUES:
00 - RV350 mode
01 - Switch from macro-tiled to macro-linear when
(width <= 8 micro-tiles)
Reserved
28:23
none
Reserved
29
none
Reserved
30
none
BORDER_FIX
31
none
To fix issues when using non-square mipmaps, with
border_color, and extreme minification.
POSSIBLE VALUES:
00 - R3xx R4xx mode
01 - Stop right shifting coord once mip size is pinned
to one
TX:TX_FILTER4 · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4110
DESCRIPTION: Filter4 Kernel
Field Name
© 2010 Advanced Micro Devices, Inc.
Proprietary
Bits
Default
Description
222
Revision 1.5
June 8, 2010
WEIGHT_1
10:0
none
(s1.9). Bottom or Right weight of pair.
WEIGHT_0
21:11
none
(s1.9). Top or Left weight of pair.
WEIGHT_PAIR
22
none
Indicates which pair of weights within phase to load.
POSSIBLE VALUES:
00 - Top or Left
01 - Bottom or Right
PHASE
26:23
none
Indicates which of 9 phases to load
DIRECTION
27
none
Indicates whether to load the horizontal or vertical
weights
POSSIBLE VALUES:
00 - Horizontal
01 - Vertical
TX:TX_FORMAT0_[0-15] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4480-0x44bc
DESCRIPTION: Texture Format State
Field Name
Bits
Default
Description
TXWIDTH
10:0
none
Image width - 1. The largest image is 4096 texels. When
wrapping or mirroring, must be a power of 2. When
mipmapping, must be a power of 2 or padded to a power
of 2 in memory. Can always be non-square, except for
cube maps which must be square.
TXHEIGHT
21:11
none
Image height - 1. The largest image is 4096 texels. When
wrapping or mirroring, must be a power of 2. When
mipmapping, must be a power of 2 or padded to a power
of 2 in memory. Can always be non-square, except for
cube maps which must be square.
TXDEPTH
25:22
none
LOG2(depth) of volume texture
NUM_LEVELS
29:26
none
Number of mipmap levels minus 1. Ranges from 0 to 12.
Equivalent to LOD index of smallest (coarsest) mipmap
to use.
PROJECTED
30
none
Specifies whether texture coords are projected.
POSSIBLE VALUES:
00 - Non-Projected
01 - Projected
TXPITCH_EN
31
none
Indicates when TXPITCH should be used instead of
TXWIDTH for image addressing
POSSIBLE VALUES:
00 - Use TXWIDTH for image addressing
01 - Use TXPITCH for image addressing
TX:TX_FORMAT1_[0-15] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x44c0-0x44fc
© 2010 Advanced Micro Devices, Inc.
Proprietary
223
Revision 1.5
June 8, 2010
DESCRIPTION: Texture Format State
Field Name
Bits
Default
Description
TXFORMAT
4:0
none
Texture Format. Components are numbered right to left.
Parenthesis indicate typical uses of each format.
POSSIBLE VALUES:
00 - TX_FMT_8 or TX_FMT_1 (if
TX_FORMAT2.TXFORMAT_MSB is set)
01 - TX_FMT_16 or TX_FMT_1_REVERSE (if
TX_FORMAT2.TXFORMAT_MSB is set)
02 - TX_FMT_4_4 or TX_FMT_10 (if
TX_FORMAT2.TXFORMAT_MSB is set)
03 - TX_FMT_8_8 or TX_FMT_10_10 (if
TX_FORMAT2.TXFORMAT_MSB is set)
04 - TX_FMT_16_16 or TX_FMT_10_10_10_10 (if
TX_FORMAT2.TXFORMAT_MSB is set)
05 - TX_FMT_3_3_2 or TX_FMT_ATI1N (if
TX_FORMAT2.TXFORMAT_MSB is set)
06 - TX_FMT_5_6_5 or TX_FMT_24_8 (if
TX_FORMAT2.TXFORMAT_MSB is set)
07 - TX_FMT_6_5_5
08 - TX_FMT_11_11_10
09 - TX_FMT_10_11_11
10 - TX_FMT_4_4_4_4
11 - TX_FMT_1_5_5_5
12 - TX_FMT_8_8_8_8
13 - TX_FMT_2_10_10_10
14 - TX_FMT_16_16_16_16
15 - Reserved
16 - Reserved
17 - Reserved
18 - TX_FMT_Y8
19 - TX_FMT_AVYU444
20 - TX_FMT_VYUY422
21 - TX_FMT_YVYU422
22 - TX_FMT_16_MPEG
23 - TX_FMT_16_16_MPEG
24 - TX_FMT_16f
25 - TX_FMT_16f_16f
26 - TX_FMT_16f_16f_16f_16f
27 - TX_FMT_32f
28 - TX_FMT_32f_32f
29 - TX_FMT_32f_32f_32f_32f
30 - TX_FMT_W24_FP
31 - TX_FMT_ATI2N
SIGNED_COMP0
5
none
Component filter should interpret texel data as signed or
unsigned. (Ignored for Y/YUV formats.)
POSSIBLE VALUES:
00 - Component filter should interpret texel data as
unsigned
01 - Component filter should interpret texel data as
© 2010 Advanced Micro Devices, Inc.
Proprietary
224
Revision 1.5
June 8, 2010
signed
SIGNED_COMP1
6
none
Component filter should interpret texel data as signed or
unsigned. (Ignored for Y/YUV formats.)
POSSIBLE VALUES:
00 - Component filter should interpret texel data as
unsigned
01 - Component filter should interpret texel data as
signed
SIGNED_COMP2
7
none
Component filter should interpret texel data as signed or
unsigned. (Ignored for Y/YUV formats.)
POSSIBLE VALUES:
00 - Component filter should interpret texel data as
unsigned
01 - Component filter should interpret texel data as
signed
SIGNED_COMP3
8
none
Component filter should interpret texel data as signed or
unsigned. (Ignored for Y/YUV formats.)
POSSIBLE VALUES:
00 - Component filter should interpret texel data as
unsigned
01 - Component filter should interpret texel data as
signed
SEL_ALPHA
11:9
none
Specifies swizzling for each channel at the input of the
pixel shader. (Ignored for Y/YUV formats.)
POSSIBLE VALUES:
00 - Select Texture Component0.
01 - Select Texture Component1.
02 - Select Texture Component2.
03 - Select Texture Component3.
04 - Select the value 0.
05 - Select the value 1.
SEL_RED
14:12
none
Specifies swizzling for each channel at the input of the
pixel shader. (Ignored for Y/YUV formats.)
POSSIBLE VALUES:
00 - Select Texture Component0.
01 - Select Texture Component1.
02 - Select Texture Component2.
03 - Select Texture Component3.
04 - Select the value 0.
05 - Select the value 1.
SEL_GREEN
17:15
none
Specifies swizzling for each channel at the input of the
pixel shader. (Ignored for Y/YUV formats.)
POSSIBLE VALUES:
00 - Select Texture Component0.
© 2010 Advanced Micro Devices, Inc.
Proprietary
225
Revision 1.5
June 8, 2010
01 - Select Texture Component1.
02 - Select Texture Component2.
03 - Select Texture Component3.
04 - Select the value 0.
05 - Select the value 1.
SEL_BLUE
20:18
none
Specifies swizzling for each channel at the input of the
pixel shader. (Ignored for Y/YUV formats.)
POSSIBLE VALUES:
00 - Select Texture Component0.
01 - Select Texture Component1.
02 - Select Texture Component2.
03 - Select Texture Component3.
04 - Select the value 0.
05 - Select the value 1.
GAMMA
21
none
Optionally remove gamma from texture before passing to
shader. Only apply to 8bit or less components.
POSSIBLE VALUES:
00 - Disable gamma removal
01 - Enable gamma removal
YUV_TO_RGB
23:22
none
YUV to RGB conversion mode
POSSIBLE VALUES:
00 - Disable YUV to RGB conversion
01 - Enable YUV to RGB conversion (with clamp)
02 - Enable YUV to RGB conversion (without
clamp)
SWAP_YUV
24
none
POSSIBLE VALUES:
00 - Disable swap YUV mode
01 - Enable swap YUV mode (hw inverts upper bit of
U and V)
TEX_COORD_TYPE
26:25
none
Specifies coordinate type.
POSSIBLE VALUES:
00 - 2D
01 - 3D
02 - Cube
03 - Reserved
CACHE
31:27
none
This field is ignored on R520 and RV510.
POSSIBLE VALUES:
00 - WHOLE
01 - Reserved
02 - HALF_REGION_0
03 - HALF_REGION_1
04 - FOURTH_REGION_0
05 - FOURTH_REGION_1
06 - FOURTH_REGION_2
07 - FOURTH_REGION_3
© 2010 Advanced Micro Devices, Inc.
Proprietary
226
Revision 1.5
June 8, 2010
08 - EIGHTH_REGION_0
09 - EIGHTH_REGION_1
10 - EIGHTH_REGION_2
11 - EIGHTH_REGION_3
12 - EIGHTH_REGION_4
13 - EIGHTH_REGION_5
14 - EIGHTH_REGION_6
15 - EIGHTH_REGION_7
16 - SIXTEENTH_REGION_0
17 - SIXTEENTH_REGION_1
18 - SIXTEENTH_REGION_2
19 - SIXTEENTH_REGION_3
20 - SIXTEENTH_REGION_4
21 - SIXTEENTH_REGION_5
22 - SIXTEENTH_REGION_6
23 - SIXTEENTH_REGION_7
24 - SIXTEENTH_REGION_8
25 - SIXTEENTH_REGION_9
26 - SIXTEENTH_REGION_A
27 - SIXTEENTH_REGION_B
28 - SIXTEENTH_REGION_C
29 - SIXTEENTH_REGION_D
30 - SIXTEENTH_REGION_E
31 - SIXTEENTH_REGION_F
TX:TX_FORMAT2_[0-15] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4500-0x453c
DESCRIPTION: Texture Format State
Field Name
Bits
Default
Description
TXPITCH
13:0
none
Used instead of TXWIDTH for image addressing when
TXPITCH_EN is asserted. Pitch is given as number of
texels minus one. Maximum pitch is 16K texels.
TXFORMAT_MSB
14
none
Specifies the MSB of the texture format to extend the
number of formats to 64.
TXWIDTH_11
15
none
Specifies bit 11 of TXWIDTH to extend the largest
image to 4096 texels.
TXHEIGHT_11
16
none
Specifies bit 11 of TXHEIGHT to extend the largest
image to 4096 texels.
POW2FIX2FLT
17
none
Optionally divide by 256 instead of 255 during fix2float.
Can only be asserted for 8-bit components.
POSSIBLE VALUES:
00 - Divide by pow2-1 for fix2float (default)
01 - Divide by pow2 for fix2float
SEL_FILTER4
19:18
none
If filter4 is enabled, specifies which texture component
to apply filter4 to.
POSSIBLE VALUES:
00 - Select Texture Component0.
© 2010 Advanced Micro Devices, Inc.
Proprietary
227
Revision 1.5
June 8, 2010
01 - Select Texture Component1.
02 - Select Texture Component2.
03 - Select Texture Component3.
TX:TX_INVALTAGS · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4100
DESCRIPTION: Invalidate texture cache tags
Field Name
Bits
Default
Description
RESERVED
31:0
none
Unused
TX:TX_OFFSET_[0-15] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4540-0x457c
DESCRIPTION: Texture Offset State
Field Name
Bits
Default
Description
ENDIAN_SWAP
1:0
none
Endian Control
POSSIBLE VALUES:
00 - No swap
01 - 16 bit swap
02 - 32 bit swap
03 - Half-DWORD swap
MACRO_TILE
2
none
Macro Tile Control
POSSIBLE VALUES:
00 - 2KB page is linear
01 - 2KB page is tiled
MICRO_TILE
4:3
none
Micro Tile Control
POSSIBLE VALUES:
00 - 32 byte cache line is linear
01 - 32 byte cache line is tiled
02 - 32 byte cache line is tiled square (only applies to
16-bit texel)
03 - Reserved
TXOFFSET
© 2010 Advanced Micro Devices, Inc.
Proprietary
31:5
none
32-byte aligned pointer to base map
228
Revision 1.5
June 8, 2010
11.10 Fragment Shader Registers
US:US_ALU_ALPHA_INST_[0-511] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0xa800-0xaffc
DESCRIPTION: ALU Alpha Instruction
Field Name
Bits
Default
Description
ALPHA_OP
3:0
0x0
Specifies the opcode for this instruction.
POSSIBLE VALUES:
00 - OP_MAD: Result = A*B + C
01 - OP_DP: Result = dot product from RGB ALU
02 - OP_MIN: Result = min(A,B)
03 - OP_MAX: Result = max(A,B)
04 - reserved
05 - OP_CND: Result = cnd(A,B,C) = (C>0.5)?A:B
06 - OP_CMP: Result = cmp(A,B,C) =
(C>=0.0)?A:B
07 - OP_FRC: Result = A-floor(A)
08 - OP_EX2: Result = 2^^A
09 - OP_LN2: Result = log2(A)
10 - OP_RCP: Result = 1/A
11 - OP_RSQ: Result = 1/sqrt(A)
12 - OP_SIN: Result = sin(A*2pi)
13 - OP_COS: Result = cos(A*2pi)
14 - OP_MDH: Result = A*B + C; A is always
topleft.src0, C is always topright.src0 (source select and
swizzles ignored). Input modifiers are respected for all
inputs.
15 - OP_MDV: Result = A*B + C; A is always
topleft.src0, C is always bottomleft.src0 (source select
and swizzles ignored). Input modifiers are respected for
all inputs.
ALPHA_ADDRD
10:4
0x0
Specifies the address of the pixel stack frame register to
which the Alpha result of this instruction is to be written.
ALPHA_ADDRD_REL
11
0x0
Specifies whether the loop register is added to the value
of ALPHA_ADDRD before it is used. This implements
relative addressing.
POSSIBLE VALUES:
00 - NONE: Do not modify destination address.
01 - RELATIVE: Add aL to address before write.
ALPHA_SEL_A
13:12
0x0
Specifies the operands for Alpha inputs A and B.
POSSIBLE VALUES:
00 - src0
01 - src1
02 - src2
03 - srcp
ALPHA_SWIZ_A
© 2010 Advanced Micro Devices, Inc.
Proprietary
16:14
0x0
Specifies the channel sources for Alpha inputs A and B.
229
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - Red
01 - Green
02 - Blue
03 - Alpha
04 - Zero
05 - Half
06 - One
07 - Unused
ALPHA_MOD_A
18:17
0x0
Specifies the input modifiers for Alpha inputs A and B.
POSSIBLE VALUES:
00 - NOP: Do not modify input
01 - NEG: Negate input
02 - ABS: Take absolute value of input
03 - NAB: Take negative absolute value of input
ALPHA_SEL_B
20:19
0x0
Specifies the operands for Alpha inputs A and B.
POSSIBLE VALUES:
00 - src0
01 - src1
02 - src2
03 - srcp
ALPHA_SWIZ_B
23:21
0x0
Specifies the channel sources for Alpha inputs A and B.
POSSIBLE VALUES:
00 - Red
01 - Green
02 - Blue
03 - Alpha
04 - Zero
05 - Half
06 - One
07 - Unused
ALPHA_MOD_B
25:24
0x0
Specifies the input modifiers for Alpha inputs A and B.
POSSIBLE VALUES:
00 - NOP: Do not modify input
01 - NEG: Negate input
02 - ABS: Take absolute value of input
03 - NAB: Take negative absolute value of input
OMOD
28:26
0x0
Specifies the output modifier for this instruction.
POSSIBLE VALUES:
00 - Result * 1
01 - Result * 2
02 - Result * 4
03 - Result * 8
04 - Result / 2
© 2010 Advanced Micro Devices, Inc.
Proprietary
230
Revision 1.5
June 8, 2010
05 - Result / 4
06 - Result / 8
07 - Disable output modifier and clamping (result is
copied exactly; only valid for MIN/MAX/CMP/CND)
TARGET
30:29
0x0
This specifies which (cached) frame buffer target to write
to. For non-output ALU instructions, this specifies how
to compare the results against zero when setting the
predicate bits.
POSSIBLE VALUES:
00 - A: Output to render target A. Predicate ==
(ALU)
01 - B: Output to render target B. Predicate < (ALU)
02 - C: Output to render target C. Predicate >=
(ALU)
03 - D: Output to render target D. Predicate != (ALU)
W_OMASK
31
0x0
Specifies whether or not to write the Alpha component of
the result of this instuction to the depth output fifo.
POSSIBLE VALUES:
00 - NONE: Do not write output to w.
01 - A: Write the alpha channel only to w.
US:US_ALU_ALPHA_ADDR_[0-511] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x9800-0x9ffc
DESCRIPTION: This table specifies the Alpha source addresses and pre-subtract operation for up to 512 ALU
instruction. The ALU expects 6 source operands - three for color (rgb0, rgb1, rgb2) and three for alpha (a0, a1, a2).
The pre-subtract operation creates two more (rgbp and ap).
Field Name
Bits
Default
Description
ADDR0
7:0
0x0
Specifies the identity of source operands a0, a1, and a2.
If the const field is set, this number ranges from 0 to 255
and specifies a location within the constant register bank.
Otherwise: If the most significant bit is cleared, this field
specifies a location within the current pixel stack frame
(ranging from 0 to 127). If the most significant bit is set,
then the lower 7 bits specify an inline unsigned floatingpoint constant with 4 bit exponent (bias 7) and 3 bit
mantissa, including denormals but excluding
infinite/NaN.
ADDR0_CONST
8
0x0
Specifies whether the associated address is a constant
register address or a temporary address / inline constant.
POSSIBLE VALUES:
00 - TEMPORARY: Address temporary register or
inline constant value.
01 - CONSTANT: Address constant register.
ADDR0_REL
© 2010 Advanced Micro Devices, Inc.
Proprietary
9
0x0
Specifies whether the loop register is added to the value
of the associated address before it is used. This
implements relative addressing.
231
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - NONE: Do not modify source address.
01 - RELATIVE: Add aL before lookup.
ADDR1
17:10
0x0
Specifies the identity of source operands a0, a1, and a2.
If the const field is set, this number ranges from 0 to 255
and specifies a location within the constant register bank.
Otherwise: If the most significant bit is cleared, this field
specifies a location within the current pixel stack frame
(ranging from 0 to 127). If the most significant bit is set,
then the lower 7 bits specify an inline unsigned floatingpoint constant with 4 bit exponent (bias 7) and 3 bit
mantissa, including denormals but excluding
infinite/NaN.
ADDR1_CONST
18
0x0
Specifies whether the associated address is a constant
register address or a temporary address / inline constant.
POSSIBLE VALUES:
00 - TEMPORARY: Address temporary register or
inline constant value.
01 - CONSTANT: Address constant register.
ADDR1_REL
19
0x0
Specifies whether the loop register is added to the value
of the associated address before it is used. This
implements relative addressing.
POSSIBLE VALUES:
00 - NONE: Do not modify source address.
01 - RELATIVE: Add aL before lookup.
ADDR2
27:20
0x0
Specifies the identity of source operands a0, a1, and a2.
If the const field is set, this number ranges from 0 to 255
and specifies a location within the constant register bank.
Otherwise: If the most significant bit is cleared, this field
specifies a location within the current pixel stack frame
(ranging from 0 to 127). If the most significant bit is set,
then the lower 7 bits specify an inline unsigned floatingpoint constant with 4 bit exponent (bias 7) and 3 bit
mantissa, including denormals but excluding
infinite/NaN.
ADDR2_CONST
28
0x0
Specifies whether the associated address is a constant
register address or a temporary address / inline constant.
POSSIBLE VALUES:
00 - TEMPORARY: Address temporary register or
inline constant value.
01 - CONSTANT: Address constant register.
ADDR2_REL
29
0x0
Specifies whether the loop register is added to the value
of the associated address before it is used. This
implements relative addressing.
POSSIBLE VALUES:
© 2010 Advanced Micro Devices, Inc.
Proprietary
232
Revision 1.5
June 8, 2010
00 - NONE: Do not modify source address.
01 - RELATIVE: Add aL before lookup.
SRCP_OP
31:30
0x0
Specifies how the pre-subtract value (SRCP) is
computed.
POSSIBLE VALUES:
00 - 1.0-2.0*A0
01 - A1-A0
02 - A1+A0
03 - 1.0-A0
US:US_ALU_RGBA_INST_[0-511] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0xb000-0xb7fc
DESCRIPTION: ALU Shared RGBA Instruction
Field Name
Bits
Default
Description
RGB_OP
3:0
0x0
Specifies the opcode for this instruction.
POSSIBLE VALUES:
00 - OP_MAD: Result = A*B + C
01 - OP_DP3: Result = A.r*B.r + A.g*B.g + A.b*B.b
02 - OP_DP4: Result = A.r*B.r + A.g*B.g + A.b*B.b
+ A.a*B.a
03 - OP_D2A: Result = A.r*B.r + A.g*B.g + C.b
04 - OP_MIN: Result = min(A,B)
05 - OP_MAX: Result = max(A,B)
06 - reserved
07 - OP_CND: Result = cnd(A,B,C) = (C>0.5)?A:B
08 - OP_CMP: Result = cmp(A,B,C) =
(C>=0.0)?A:B
09 - OP_FRC: Result = A-floor(A)
10 - OP_SOP: Result = ex2,ln2,rcp,rsq,sin,cos from
Alpha ALU
11 - OP_MDH: Result = A*B + C; A is always
topleft.src0, C is always topright.src0 (source select and
swizzles ignored). Input modifiers are respected for all
inputs.
12 - OP_MDV: Result = A*B + C; A is always
topleft.src0, C is always bottomleft.src0 (source select
and swizzles ignored). Input modifiers are respected for
all inputs.
RGB_ADDRD
10:4
0x0
Specifies the address of the pixel stack frame register to
which the RGB result of this instruction is to be written.
RGB_ADDRD_REL
11
0x0
Specifies whether the loop register is added to the value
of RGB_ADDRD before it is used. This implements
relative addressing.
POSSIBLE VALUES:
00 - NONE: Do not modify destination address.
01 - RELATIVE: Add aL to address before write.
© 2010 Advanced Micro Devices, Inc.
Proprietary
233
Revision 1.5
RGB_SEL_C
13:12
0x0
June 8, 2010
Specifies the operands for RGB and Alpha input C.
POSSIBLE VALUES:
00 - src0
01 - src1
02 - src2
03 - srcp
RED_SWIZ_C
16:14
0x0
Specifies, per channel, the sources for RGB and Alpha
input C.
POSSIBLE VALUES:
00 - Red
01 - Green
02 - Blue
03 - Alpha
04 - Zero
05 - Half
06 - One
07 - Unused
GREEN_SWIZ_C
19:17
0x0
Specifies, per channel, the sources for RGB and Alpha
input C.
POSSIBLE VALUES:
00 - Red
01 - Green
02 - Blue
03 - Alpha
04 - Zero
05 - Half
06 - One
07 - Unused
BLUE_SWIZ_C
22:20
0x0
Specifies, per channel, the sources for RGB and Alpha
input C.
POSSIBLE VALUES:
00 - Red
01 - Green
02 - Blue
03 - Alpha
04 - Zero
05 - Half
06 - One
07 - Unused
RGB_MOD_C
24:23
0x0
Specifies the input modifiers for RGB and Alpha input
C.
POSSIBLE VALUES:
00 - NOP: Do not modify input
01 - NEG: Negate input
02 - ABS: Take absolute value of input
03 - NAB: Take negative absolute value of input
© 2010 Advanced Micro Devices, Inc.
Proprietary
234
Revision 1.5
ALPHA_SEL_C
26:25
0x0
June 8, 2010
Specifies the operands for RGB and Alpha input C.
POSSIBLE VALUES:
00 - src0
01 - src1
02 - src2
03 - srcp
ALPHA_SWIZ_C
29:27
0x0
Specifies, per channel, the sources for RGB and Alpha
input C.
POSSIBLE VALUES:
00 - Red
01 - Green
02 - Blue
03 - Alpha
04 - Zero
05 - Half
06 - One
07 - Unused
ALPHA_MOD_C
31:30
0x0
Specifies the input modifiers for RGB and Alpha input
C.
POSSIBLE VALUES:
00 - NOP: Do not modify input
01 - NEG: Negate input
02 - ABS: Take absolute value of input
03 - NAB: Take negative absolute value of input
US:US_ALU_RGB_INST_[0-511] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0xa000-0xa7fc
DESCRIPTION: ALU RGB Instruction
Field Name
Bits
Default
Description
RGB_SEL_A
1:0
0x0
Specifies the operands for RGB inputs A and B.
POSSIBLE VALUES:
00 - src0
01 - src1
02 - src2
03 - srcp
RED_SWIZ_A
4:2
0x0
Specifies, per channel, the sources for RGB inputs A and
B.
POSSIBLE VALUES:
00 - Red
01 - Green
02 - Blue
03 - Alpha
04 - Zero
05 - Half
© 2010 Advanced Micro Devices, Inc.
Proprietary
235
Revision 1.5
June 8, 2010
06 - One
07 - Unused
GREEN_SWIZ_A
7:5
0x0
Specifies, per channel, the sources for RGB inputs A and
B.
POSSIBLE VALUES:
00 - Red
01 - Green
02 - Blue
03 - Alpha
04 - Zero
05 - Half
06 - One
07 - Unused
BLUE_SWIZ_A
10:8
0x0
Specifies, per channel, the sources for RGB inputs A and
B.
POSSIBLE VALUES:
00 - Red
01 - Green
02 - Blue
03 - Alpha
04 - Zero
05 - Half
06 - One
07 - Unused
RGB_MOD_A
12:11
0x0
Specifies the input modifiers for RGB inputs A and B.
POSSIBLE VALUES:
00 - NOP: Do not modify input
01 - NEG: Negate input
02 - ABS: Take absolute value of input
03 - NAB: Take negative absolute value of input
RGB_SEL_B
14:13
0x0
Specifies the operands for RGB inputs A and B.
POSSIBLE VALUES:
00 - src0
01 - src1
02 - src2
03 - srcp
RED_SWIZ_B
17:15
0x0
Specifies, per channel, the sources for RGB inputs A and
B.
POSSIBLE VALUES:
00 - Red
01 - Green
02 - Blue
03 - Alpha
04 - Zero
05 - Half
© 2010 Advanced Micro Devices, Inc.
Proprietary
236
Revision 1.5
June 8, 2010
06 - One
07 - Unused
GREEN_SWIZ_B
20:18
0x0
Specifies, per channel, the sources for RGB inputs A and
B.
POSSIBLE VALUES:
00 - Red
01 - Green
02 - Blue
03 - Alpha
04 - Zero
05 - Half
06 - One
07 - Unused
BLUE_SWIZ_B
23:21
0x0
Specifies, per channel, the sources for RGB inputs A and
B.
POSSIBLE VALUES:
00 - Red
01 - Green
02 - Blue
03 - Alpha
04 - Zero
05 - Half
06 - One
07 - Unused
RGB_MOD_B
25:24
0x0
Specifies the input modifiers for RGB inputs A and B.
POSSIBLE VALUES:
00 - NOP: Do not modify input
01 - NEG: Negate input
02 - ABS: Take absolute value of input
03 - NAB: Take negative absolute value of input
OMOD
28:26
0x0
Specifies the output modifier for this instruction.
POSSIBLE VALUES:
00 - Result * 1
01 - Result * 2
02 - Result * 4
03 - Result * 8
04 - Result / 2
05 - Result / 4
06 - Result / 8
07 - Disable output modifier and clamping (result is
copied exactly; only valid for MIN/MAX/CMP/CND)
TARGET
© 2010 Advanced Micro Devices, Inc.
Proprietary
30:29
0x0
This specifies which (cached) frame buffer target to write
to. For non-output ALU instructions, this specifies how
to compare the results against zero when setting the
predicate bits.
237
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - A: Output to render target A. Predicate ==
(ALU)
01 - B: Output to render target B. Predicate < (ALU)
02 - C: Output to render target C. Predicate >=
(ALU)
03 - D: Output to render target D. Predicate != (ALU)
ALU_WMASK
31
0x0
Specifies whether to update the current ALU result.
POSSIBLE VALUES:
00 - Do not modify the current ALU result.
01 - Modify the current ALU result based on the
settings of ALU_RESULT_SEL and
ALU_RESULT_OP.
US:US_ALU_RGB_ADDR_[0-511] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x9000-0x97fc
DESCRIPTION: This table specifies the RGB source addresses and pre-subtract operation for up to 512 ALU
instructions. The ALU expects 6 source operands - three for color (rgb0, rgb1, rgb2) and three for alpha (a0, a1,
a2). The pre-subtract operation creates two more (rgbp and ap).
Field Name
Bits
Default
Description
ADDR0
7:0
0x0
Specifies the identity of source operands rgb0, rgb1, and
rgb2. If the const field is set, this number ranges from 0
to 255 and specifies a location within the constant
register bank. Otherwise: If the most significant bit is
cleared, this field specifies a location within the current
pixel stack frame (ranging from 0 to 127). If the most
significant bit is set, then the lower 7 bits specify an
inline unsigned floating-point constant with 4 bit
exponent (bias 7) and 3 bit mantissa, including
denormals but excluding infinite/NaN.
ADDR0_CONST
8
0x0
Specifies whether the associated address is a constant
register address or a temporary address / inline constant.
POSSIBLE VALUES:
00 - TEMPORARY: Address temporary register or
inline constant value.
01 - CONSTANT: Address constant register.
ADDR0_REL
9
0x0
Specifies whether the loop register is added to the value
of the associated address before it is used. This
implements relative addressing.
POSSIBLE VALUES:
00 - NONE: Do not modify source address.
01 - RELATIVE: Add aL before lookup.
ADDR1
© 2010 Advanced Micro Devices, Inc.
Proprietary
17:10
0x0
Specifies the identity of source operands rgb0, rgb1, and
rgb2. If the const field is set, this number ranges from 0
to 255 and specifies a location within the constant
238
Revision 1.5
June 8, 2010
register bank. Otherwise: If the most significant bit is
cleared, this field specifies a location within the current
pixel stack frame (ranging from 0 to 127). If the most
significant bit is set, then the lower 7 bits specify an
inline unsigned floating-point constant with 4 bit
exponent (bias 7) and 3 bit mantissa, including
denormals but excluding infinite/NaN.
ADDR1_CONST
18
0x0
Specifies whether the associated address is a constant
register address or a temporary address / inline constant.
POSSIBLE VALUES:
00 - TEMPORARY: Address temporary register or
inline constant value.
01 - CONSTANT: Address constant register.
ADDR1_REL
19
0x0
Specifies whether the loop register is added to the value
of the associated address before it is used. This
implements relative addressing.
POSSIBLE VALUES:
00 - NONE: Do not modify source address.
01 - RELATIVE: Add aL before lookup.
ADDR2
27:20
0x0
Specifies the identity of source operands rgb0, rgb1, and
rgb2. If the const field is set, this number ranges from 0
to 255 and specifies a location within the constant
register bank. Otherwise: If the most significant bit is
cleared, this field specifies a location within the current
pixel stack frame (ranging from 0 to 127). If the most
significant bit is set, then the lower 7 bits specify an
inline unsigned floating-point constant with 4 bit
exponent (bias 7) and 3 bit mantissa, including
denormals but excluding infinite/NaN.
ADDR2_CONST
28
0x0
Specifies whether the associated address is a constant
register address or a temporary address / inline constant.
POSSIBLE VALUES:
00 - TEMPORARY: Address temporary register or
inline constant value.
01 - CONSTANT: Address constant register.
ADDR2_REL
29
0x0
Specifies whether the loop register is added to the value
of the associated address before it is used. This
implements relative addressing.
POSSIBLE VALUES:
00 - NONE: Do not modify source address.
01 - RELATIVE: Add aL before lookup.
SRCP_OP
31:30
0x0
Specifies how the pre-subtract value (SRCP) is
computed.
POSSIBLE VALUES:
00 - 1.0-2.0*RGB0
© 2010 Advanced Micro Devices, Inc.
Proprietary
239
Revision 1.5
June 8, 2010
01 - RGB1-RGB0
02 - RGB1+RGB0
03 - 1.0-RGB0
US:US_CMN_INST_[0-511] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0xb800-0xbffc
DESCRIPTION: Shared instruction fields for all instruction types
Field Name
Bits
Default
Description
TYPE
1:0
0x0
Specifies the type of instruction. Note that output
instructions write to render targets.
POSSIBLE VALUES:
00 - US_INST_TYPE_ALU: This instruction is an
ALU instruction.
01 - US_INST_TYPE_OUT: This instruction is an
output instruction.
02 - US_INST_TYPE_FC: This instruction is a flow
control instruction.
03 - US_INST_TYPE_TEX: This instruction is a
texture instruction.
TEX_SEM_WAIT
2
0x0
Specifies whether to wait for the texture semaphore.
POSSIBLE VALUES:
00 - This instruction may issue immediately.
01 - This instruction will not issue until the texture
semaphore is available.
RGB_PRED_SEL
5:3
0x0
Specifies whether the instruction uses predication. For
ALU/TEX/Output this specifies predication for the RGB
channels only. For FC this specifies the predicate for the
entire instruction.
POSSIBLE VALUES:
00 - US_PRED_SEL_NONE: No predication
01 - US_PRED_SEL_RGBA: Independent Channel
Predication
02 - US_PRED_SEL_RRRR: R-Replicate
Predication
03 - US_PRED_SEL_GGGG: G-Replicate
Predication
04 - US_PRED_SEL_BBBB: B-Replicate
Predication
05 - US_PRED_SEL_AAAA: A-Replicate
Predication
RGB_PRED_INV
6
0x0
Specifies whether the predicate should be inverted. For
ALU/TEX/Output this specifies predication for the RGB
channels only. For FC this specifies the predicate for the
entire instruction.
POSSIBLE VALUES:
00 - Normal predication
© 2010 Advanced Micro Devices, Inc.
Proprietary
240
Revision 1.5
June 8, 2010
01 - Invert the value of the predicate
WRITE_INACTIVE
7
0x0
Specifies which pixels to write to.
POSSIBLE VALUES:
00 - Only write to channels of active pixels
01 - Write to channels of all pixels, including inactive
pixels
LAST
8
0x0
Specifies whether this is the last instruction.
POSSIBLE VALUES:
00 - Do not terminate the shader after executing this
instruction (unless this instruction is at END_ADDR).
01 - All active pixels are willing to terminate after
executing this instruction. There is no guarantee that the
shader will actually terminate here. This feature is
provided as a performance optimization for tests where
pixels can conditionally terminate early.
NOP
9
0x0
Specifies whether to insert a NOP instruction after this.
This would get specified in order to meet dependency
requirements for the pre-subtract inputs, and dependency
requirements for src0 of an MDH/MDV instruction.
POSSIBLE VALUES:
00 - Do not insert NOP instruction after this one.
01 - Insert a NOP instruction after this one.
ALU_WAIT
10
0x0
Specifies whether to wait for pending ALU instructions
to complete before issuing this instruction.
POSSIBLE VALUES:
00 - Do not wait for pending ALU instructions to
complete before issuing the current instruction.
01 - Wait for pending ALU instructions to complete
before issuing the current instruction.
RGB_WMASK
13:11
0x0
Specifies which components of the result of the RGB
instruction are written to the pixel stack frame.
POSSIBLE VALUES:
00 - NONE: Do not write any output.
01 - R: Write the red channel only.
02 - G: Write the green channel only.
03 - RG: Write the red and green channels.
04 - B: Write the blue channel only.
05 - RB: Write the red and blue channels.
06 - GB: Write the green and blue channels.
07 - RGB: Write the red, green, and blue channels.
ALPHA_WMASK
14
0x0
Specifies whether the result of the Alpha instruction is
written to the pixel stack frame.
POSSIBLE VALUES:
00 - NONE: Do not write register.
© 2010 Advanced Micro Devices, Inc.
Proprietary
241
Revision 1.5
June 8, 2010
01 - A: Write the alpha channel only.
RGB_OMASK
17:15
0x0
Specifies which components of the result of the RGB
instruction are written to the output fifo if this is an
output instruction, and which predicate bits should be
modified if this is an ALU instruction.
POSSIBLE VALUES:
00 - NONE: Do not write any output.
01 - R: Write the red channel only.
02 - G: Write the green channel only.
03 - RG: Write the red and green channels.
04 - B: Write the blue channel only.
05 - RB: Write the red and blue channels.
06 - GB: Write the green and blue channels.
07 - RGB: Write the red, green, and blue channels.
ALPHA_OMASK
18
0x0
Specifies whether the result of the Alpha instruction is
written to the output fifo if this is an output instruction,
and whether the Alpha predicate bit should be modified
if this is an ALU instruction.
POSSIBLE VALUES:
00 - NONE: Do not write output.
01 - A: Write the alpha channel only.
RGB_CLAMP
19
0x0
Specifies RGB and Alpha clamp mode for this
instruction.
POSSIBLE VALUES:
00 - Do not clamp output.
01 - Clamp output to the range [0,1].
ALPHA_CLAMP
20
0x0
Specifies RGB and Alpha clamp mode for this
instruction.
POSSIBLE VALUES:
00 - Do not clamp output.
01 - Clamp output to the range [0,1].
ALU_RESULT_SEL
21
0x0
Specifies which component of the result of this
instruction should be used as the `ALU result` by a
subsequent flow control instruction.
POSSIBLE VALUES:
00 - RED: Use red as ALU result for FC.
01 - ALPHA: Use alpha as ALU result for FC.
ALPHA_PRED_INV
22
0x0
Specifies whether the predicate should be inverted. For
ALU/TEX/Output this specifies predication for the alpha
channel only. This field has no effect on FC instructions.
POSSIBLE VALUES:
00 - Normal predication
01 - Invert the value of the predicate
© 2010 Advanced Micro Devices, Inc.
Proprietary
242
Revision 1.5
ALU_RESULT_OP
24:23
0x0
June 8, 2010
Specifies how to compare the ALU result against zero
for the `alu_result` bit in a subsequent flow control
instruction.
POSSIBLE VALUES:
00 - Equal to
01 - Less than
02 - Greater than or equal to
03 - Not equal
ALPHA_PRED_SEL
27:25
0x0
Specifies whether the instruction uses predication. For
ALU/TEX/Output this specifies predication for the alpha
channel only. This field has no effect on FC instructions.
POSSIBLE VALUES:
00 - US_PRED_SEL_NONE: No predication
01 - US_PRED_SEL_RGBA: A predication
(identical to US_PRED_SEL_AAAA)
02 - US_PRED_SEL_RRRR: R Predication
03 - US_PRED_SEL_GGGG: G Predication
04 - US_PRED_SEL_BBBB: B Predication
05 - US_PRED_SEL_AAAA: A Predication
STAT_WE
31:28
0x0
Specifies which components (R,G,B,A) contribute to the
stat count
US:US_CODE_ADDR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4630
DESCRIPTION: Code start and end instruction addresses.
Field Name
Bits
Default
Description
START_ADDR
8:0
0x0
Specifies the address of the first instruction to execute in
the shader program. This address is relative to the shader
program offset given in
US_CODE_OFFSET.OFFSET_ADDR.
END_ADDR
24:16
0x0
Specifies the address of the last instruction to execute in
the shader program. This address is relative to the shader
program offset given in
US_CODE_OFFSET.OFFSET_ADDR. Shader program
execution will always terminate after the instruction at
this address is executed.
US:US_CODE_OFFSET · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4638
DESCRIPTION: Offsets used for relative instruction addresses in the shader program, including START_ADDR,
END_ADDR, and any non-global flow control jump addresses.
Field Name
Bits
Default
Description
OFFSET_ADDR
8:0
0x0
Specifies the offset to add to relative instruction
addresses, including START_ADDR, END_ADDR, and
some flow control jump addresses.
© 2010 Advanced Micro Devices, Inc.
Proprietary
243
Revision 1.5
June 8, 2010
US:US_CODE_RANGE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4634
DESCRIPTION: Range of instructions that contains the current shader program.
Field Name
Bits
Default
Description
CODE_ADDR
8:0
0x0
Specifies the start address of the current code window.
This address is an absolute address.
CODE_SIZE
24:16
0x0
Specifies the size of the current code window, minus
one. The last instruction in the code window is given by
CODE_ADDR + CODE_SIZE.
US:US_CONFIG · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4600
DESCRIPTION: Shader Configuration
Field Name
Bits
Default Description
Reserved
0
0x0
Set to 0
ZERO_TIMES_ANYTHING_EQUALS_ZERO 1
0x0
Control how ALU multiplier behaves when one
argument is zero. This affects the multiplier used in
MAD and dot product calculations.
POSSIBLE VALUES:
00 - Default behaviour (0*inf=nan,0*nan=nan)
01 - Legacy behaviour for shader model 1
(0*anything=0)
US:US_FC_ADDR_[0-511] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0xa000-0xa7fc
DESCRIPTION: Flow Control Instruction Address Fields
Field Name
Bits
Default
Description
BOOL_ADDR
4:0
0x0
The address of the static boolean register to use in the
jump function.
INT_ADDR
12:8
0x0
The address of the static integer register to use for
loop/rep and endloop/endrep.
JUMP_ADDR
24:16
0x0
The address to jump to if the jump function evaluates to
true.
JUMP_GLOBAL
31
0x0
Specifies whether to interpret JUMP_ADDR as a global
address.
POSSIBLE VALUES:
00 - Add the shader program offset in
US_CODE_OFFSET.OFFSET_ADDR when calculating
the destination address of a jump
01 - Don`t use the shader program offset when
calculating the destination address jump
US:US_FC_BOOL_CONST · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4620
DESCRIPTION: Static Boolean Constants for Flow Control Branching Instructions. Quad-buffered.
© 2010 Advanced Micro Devices, Inc.
Proprietary
244
Revision 1.5
June 8, 2010
Field Name
Bits
Default
Description
KBOOL
31:0
0x0
Specifies the boolean value for constants 0-31.
US:US_FC_CTRL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4624
DESCRIPTION: Flow Control Options. Quad-buffered.
Field Name
Bits
Default
Description
TEST_EN
30
0x0
Specifies whether test mode is enabled. This flag
currently has no effect in hardware.
POSSIBLE VALUES:
00 - Normal mode
01 - Test mode (currently unused)
FULL_FC_EN
31
0x0
Specifies whether full flow control functionality is
enabled.
POSSIBLE VALUES:
00 - Use partial flow-control (enables twice the
contexts). Loops and subroutines are not available in
partial flow-control mode, and the nesting depth of
branch statements is limited.
01 - Use full pixel shader 3.0 flow control, including
loops and subroutines.
US:US_FC_INST_[0-511] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x9800-0x9ffc
DESCRIPTION: Flow Control Instruction
Field Name
Bits
Default
Description
OP
2:0
0x0
Specifies the type of flow control instruction.
POSSIBLE VALUES:
00 - US_FC_OP_JUMP: (if, endif, call, etc)
01 - US_FC_OP_LOOP: same as jump except
always take the jump if the static counter is 0. If we don`t
take the jump, push initial loop counter and loop register
(aL) values onto the loop stack.
02 - US_FC_OP_ENDLOOP: same as jump but
decrement the loop counter and increment the loop
register (aL), and don`t take the jump if the loop counter
becomes zero.
03 - US_FC_OP_REP: same as loop but don`t push
the loop register aL.
04 - US_FC_OP_ENDREP: same as endloop but
don`t update/pop the loop register aL.
05 - US_FC_OP_BREAKLOOP: same as jump but
pops the loop stacks if a pixel stops being active.
06 - US_FC_OP_BREAKREP: same as breakloop
but don`t pop the loop register if it jumps.
© 2010 Advanced Micro Devices, Inc.
Proprietary
245
Revision 1.5
June 8, 2010
07 - US_FC_OP_CONTINUE: used to disable pixels
that are ready to jump to the ENDLOOP/ENDREP
instruction.
B_ELSE
4
0x0
Specifies whether to perform an else operation on the
active and branch-inactive pixels before executing the
instruction.
POSSIBLE VALUES:
00 - Don`t alter the branch state before executing the
instruction.
01 - Perform an else operation on the branch state
before executing the instruction; pixels in the active state
are moved to the branch inactive state with zero counter,
and vice versa.
JUMP_ANY
5
0x0
If set, jump if any active pixels want to take the jump
(otherwise the instruction jumps only if all active pixels
want to).
POSSIBLE VALUES:
00 - Jump if ALL active pixels want to take the jump
(for if and else). If no pixels are active, jump.
01 - Jump if ANY active pixels want to take the jump
(for call, loop/rep and endrep/endloop). If no pixels are
active, do not jump.
A_OP
7:6
0x0
The address stack operation to perform if we take the
jump.
POSSIBLE VALUES:
00 - US_FC_A_OP_NONE: Don`t change the
address stack
01 - US_FC_A_OP_POP: If we jump, pop the
address stack and use that value for the jump target
02 - US_FC_A_OP_PUSH: If we jump, push the
current address onto the address stack
JUMP_FUNC
15:8
0x0
A 2x2x2 table of boolean values indicating whether to
take the jump. The table index is indexed by {ALU
Compare Result, Predication Result, Boolean Value
(from the static boolean address in
US_FC_ADDR.BOOL)}. To determine whether to jump,
look at bit ((alu_result<<2) | (predicate<<1) | bool).
B_POP_CNT
20:16
0x0
The amount to decrement the branch counter by if
US_FC_B_OP_DECR operation is performed.
B_OP0
25:24
0x0
The branch state operation to perform if we don`t take
the jump.
POSSIBLE VALUES:
00 - US_FC_B_OP_NONE: If we don`t jump, don`t
alter the branch counter for any pixel.
01 - US_FC_B_OP_DECR: If we don`t jump,
decrement branch counter by B_POP_CNT for inactive
© 2010 Advanced Micro Devices, Inc.
Proprietary
246
Revision 1.5
June 8, 2010
pixels. Activate pixels with negative counters.
02 - US_FC_B_OP_INCR: If we don`t jump,
increment branch counter by 1 for inactive pixels.
Deactivate pixels that decided to jump and set their
counter to zero.
B_OP1
27:26
0x0
The branch state operation to perform if we do take the
jump.
POSSIBLE VALUES:
00 - US_FC_B_OP_NONE: If we do jump, don`t
alter the branch counter for any pixel.
01 - US_FC_B_OP_DECR: If we do jump,
decrement branch counter by B_POP_CNT for inactive
pixels. Activate pixels with negative counters.
02 - US_FC_B_OP_INCR: If we do jump, increment
branch counter by 1 for inactive pixels. Deactivate pixels
that decided not to jump and set their counter to zero.
IGNORE_UNCOVERED
28
0x0
If set, uncovered pixels will not participate in flow
control decisions.
POSSIBLE VALUES:
00 - Include uncovered pixels in jump decisions
01 - Ignore uncovered pixels in making jump
decisions
US:US_FC_INT_CONST_[0-31] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4c00-0x4c7c
DESCRIPTION: Integer Constants used by Flow Control Loop Instructions. Single buffered.
Field Name
Bits
Default
Description
KR
7:0
0x0
Specifies the number of iterations. Unsigned 8-bit integer
in [0, 255].
KG
15:8
0x0
Specifies the initial value of the loop register (aL).
Unsigned 8-bit integer in [0, 255].
KB
23:16
0x0
Specifies the increment used to change the loop register
(aL) on each iteration. Signed 7-bit integer in [-128,
127].
US:US_FORMAT0_[0-15] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4640-0x467c
Field Name
Bits
Default
TXWIDTH
10:0
0x0
TXHEIGHT
21:11
0x0
TXDEPTH
25:22
0x0
© 2010 Advanced Micro Devices, Inc.
Proprietary
Description
POSSIBLE VALUES:
13 - width > 2048, height <= 2048
14 - width <= 2048, height > 2048
15 - width > 2048, height > 2048
247
Revision 1.5
June 8, 2010
US:US_OUT_FMT_[0-3] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x46a4-0x46b0
Field Name
Bits
Default
Description
OUT_FMT
4:0
0x0
POSSIBLE VALUES:
00 - C4_8 (S/U)
01 - C4_10 (U)
02 - C4_10_GAMMA - (U)
03 - C_16 - (S/U)
04 - C2_16 - (S/U)
05 - C4_16 - (S/U)
06 - C_16_MPEG - (S)
07 - C2_16_MPEG - (S)
08 - C2_4 - (U)
09 - C_3_3_2 - (U)
10 - C_6_5_6 - (S/U)
11 - C_11_11_10 - (S/U)
12 - C_10_11_11 - (S/U)
13 - C_2_10_10_10 - (S/U)
14 - reserved
15 - UNUSED - Render target is not used
16 - C_16_FP - (S10E5)
17 - C2_16_FP - (S10E5)
18 - C4_16_FP - (S10E5)
19 - C_32_FP - (S23E8)
20 - C2_32_FP - (S23E8)
21 - C4_32_FP - (S23E8)
C0_SEL
9:8
0x0
POSSIBLE VALUES:
00 - Alpha
01 - Red
02 - Green
03 - Blue
C1_SEL
11:10
0x0
POSSIBLE VALUES:
00 - Alpha
01 - Red
02 - Green
03 - Blue
C2_SEL
13:12
0x0
POSSIBLE VALUES:
00 - Alpha
01 - Red
02 - Green
03 - Blue
C3_SEL
15:14
0x0
POSSIBLE VALUES:
00 - Alpha
01 - Red
02 - Green
03 - Blue
OUT_SIGN
19:16
0x0
ROUND_ADJ
20
0x0
© 2010 Advanced Micro Devices, Inc.
Proprietary
POSSIBLE VALUES:
00 - Normal rounding
01 - Modified rounding of fixed-point data
248
Revision 1.5
June 8, 2010
US:US_PIXSIZE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4604
DESCRIPTION: Shader pixel size. This register specifies the size and partitioning of the current pixel stack frame
Field Name
Bits
Default
Description
PIX_SIZE
6:0
0x0
Specifies the total size of the current pixel stack frame
(1:128)
US:US_TEX_ADDR_[0-511] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x9800-0x9ffc
DESCRIPTION: Texture addresses and swizzles
Field Name
Bits
Default
Description
SRC_ADDR
6:0
0x0
Specifies the location (within the shader pixel stack
frame) of the texture address for this instruction
SRC_ADDR_REL
7
0x0
Specifies whether the loop register is added to the value
of the associated address before it is used. This
implements relative addressing.
POSSIBLE VALUES:
00 - NONE: Do not modify source address
01 - RELATIVE: Add aL before lookup.
SRC_S_SWIZ
9:8
0x0
Specify which colour channel of src_addr to use for S
coordinate
POSSIBLE VALUES:
00 - Use R channel as S coordinate
01 - Use G channel as S coordinate
02 - Use B channel as S coordinate
03 - Use A channel as S coordinate
SRC_T_SWIZ
11:10
0x0
Specify which colour channel of src_addr to use for T
coordinate
POSSIBLE VALUES:
00 - Use R channel as T coordinate
01 - Use G channel as T coordinate
02 - Use B channel as T coordinate
03 - Use A channel as T coordinate
SRC_R_SWIZ
13:12
0x0
Specify which colour channel of src_addr to use for R
coordinate
POSSIBLE VALUES:
00 - Use R channel as R coordinate
01 - Use G channel as R coordinate
02 - Use B channel as R coordinate
03 - Use A channel as R coordinate
SRC_Q_SWIZ
15:14
0x0
Specify which colour channel of src_addr to use for Q
coordinate
POSSIBLE VALUES:
00 - Use R channel as Q coordinate
© 2010 Advanced Micro Devices, Inc.
Proprietary
249
Revision 1.5
June 8, 2010
01 - Use G channel as Q coordinate
02 - Use B channel as Q coordinate
03 - Use A channel as Q coordinate
DST_ADDR
22:16
0x0
Specifies the location (within the shader pixel stack
frame) of the returned texture data for this instruction
DST_ADDR_REL
23
0x0
Specifies whether the loop register is added to the value
of the associated address before it is used. This
implements relative addressing.
POSSIBLE VALUES:
00 - NONE: Do not modify destination address
01 - RELATIVE: Add aL before lookup.
DST_R_SWIZ
25:24
0x0
Specify which colour channel of the returned texture data
to write to the red channel of dst_addr
POSSIBLE VALUES:
00 - Write R channel to R channel
01 - Write G channel to R channel
02 - Write B channel to R channel
03 - Write A channel to R channel
DST_G_SWIZ
27:26
0x0
Specify which colour channel of the returned texture data
to write to the green channel of dst_addr
POSSIBLE VALUES:
00 - Write R channel to G channel
01 - Write G channel to G channel
02 - Write B channel to G channel
03 - Write A channel to G channel
DST_B_SWIZ
29:28
0x0
Specify which colour channel of the returned texture data
to write to the blue channel of dst_addr
POSSIBLE VALUES:
00 - Write R channel to B channel
01 - Write G channel to B channel
02 - Write B channel to B channel
03 - Write A channel to B channel
DST_A_SWIZ
31:30
0x0
Specify which colour channel of the returned texture data
to write to the alpha channel of dst_addr
POSSIBLE VALUES:
00 - Write R channel to A channel
01 - Write G channel to A channel
02 - Write B channel to A channel
03 - Write A channel to A channel
US:US_TEX_ADDR_DXDY_[0-511] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0xa000-0xa7fc
DESCRIPTION: Additional texture addresses and swizzles for DX/DY inputs
Field Name
© 2010 Advanced Micro Devices, Inc.
Proprietary
Bits
Default
Description
250
Revision 1.5
June 8, 2010
DX_ADDR
6:0
0x0
Specifies the location (within the shader pixel stack
frame) of the DX value for this instruction
DX_ADDR_REL
7
0x0
Specifies whether the loop register is added to the value
of the associated address before it is used. This
implements relative addressing.
POSSIBLE VALUES:
00 - NONE: Do not modify source address
01 - RELATIVE: Add aL before lookup.
DX_S_SWIZ
9:8
0x0
Specify which colour channel of dx_addr to use for S
coordinate
POSSIBLE VALUES:
00 - Use R channel as S coordinate
01 - Use G channel as S coordinate
02 - Use B channel as S coordinate
03 - Use A channel as S coordinate
DX_T_SWIZ
11:10
0x0
Specify which colour channel of dx_addr to use for T
coordinate
POSSIBLE VALUES:
00 - Use R channel as T coordinate
01 - Use G channel as T coordinate
02 - Use B channel as T coordinate
03 - Use A channel as T coordinate
DX_R_SWIZ
13:12
0x0
Specify which colour channel of dx_addr to use for R
coordinate
POSSIBLE VALUES:
00 - Use R channel as R coordinate
01 - Use G channel as R coordinate
02 - Use B channel as R coordinate
03 - Use A channel as R coordinate
DX_Q_SWIZ
15:14
0x0
Specify which colour channel of dx_addr to use for Q
coordinate
POSSIBLE VALUES:
00 - Use R channel as Q coordinate
01 - Use G channel as Q coordinate
02 - Use B channel as Q coordinate
03 - Use A channel as Q coordinate
DY_ADDR
22:16
0x0
Specifies the location (within the shader pixel stack
frame) of the DY value for this instruction
DY_ADDR_REL
23
0x0
Specifies whether the loop register is added to the value
of the associated address before it is used. This
implements relative addressing.
POSSIBLE VALUES:
00 - NONE: Do not modify source address
© 2010 Advanced Micro Devices, Inc.
Proprietary
251
Revision 1.5
June 8, 2010
01 - RELATIVE: Add aL before lookup.
DY_S_SWIZ
25:24
0x0
Specify which colour channel of dy_addr to use for S
coordinate
POSSIBLE VALUES:
00 - Use R channel as S coordinate
01 - Use G channel as S coordinate
02 - Use B channel as S coordinate
03 - Use A channel as S coordinate
DY_T_SWIZ
27:26
0x0
Specify which colour channel of dy_addr to use for T
coordinate
POSSIBLE VALUES:
00 - Use R channel as T coordinate
01 - Use G channel as T coordinate
02 - Use B channel as T coordinate
03 - Use A channel as T coordinate
DY_R_SWIZ
29:28
0x0
Specify which colour channel of dy_addr to use for R
coordinate
POSSIBLE VALUES:
00 - Use R channel as R coordinate
01 - Use G channel as R coordinate
02 - Use B channel as R coordinate
03 - Use A channel as R coordinate
DY_Q_SWIZ
31:30
0x0
Specify which colour channel of dy_addr to use for Q
coordinate
POSSIBLE VALUES:
00 - Use R channel as Q coordinate
01 - Use G channel as Q coordinate
02 - Use B channel as Q coordinate
03 - Use A channel as Q coordinate
US:US_TEX_INST_[0-511] · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x9000-0x97fc
DESCRIPTION: Texture Instruction
Field Name
Bits
Default
Description
TEX_ID
19:16
0x0
Specifies the id of the texture map used for this
instruction
INST
24:22
0x0
Specifies the operation taking place for this instruction
POSSIBLE VALUES:
00 - NOP: Do nothing
01 - LD: Do Texture Lookup (S,T,R)
02 - TEXKILL: Kill pixel if any component is < 0
03 - PROJ: Do projected texture lookup
(S/Q,T/Q,R/Q)
04 - LODBIAS: Do texture lookup with lod bias
© 2010 Advanced Micro Devices, Inc.
Proprietary
252
Revision 1.5
June 8, 2010
05 - LOD: Do texture lookup with explicit lod
06 - DXDY: Do texture lookup with lod calculated
from DX and DY
TEX_SEM_ACQUIRE
25
0x0
Whether to hold the texture semaphore until the data is
written to the temporary register.
POSSIBLE VALUES:
00 - Don`t hold the texture semaphore
01 - Hold the texture semaphore until the data is
written to the temporary register.
IGNORE_UNCOVERED
26
0x0
If set, US will not request data for pixels which are
uncovered. Clear this bit for indirect texture lookups.
POSSIBLE VALUES:
00 - Fetch texels for uncovered pixels
01 - Don`t fetch texels for uncovered pixels
UNSCALED
27
0x0
Whether to scale texture coordinates when sending them
to the texture unit.
POSSIBLE VALUES:
00 - Scale the S, T, R texture coordinates from
[0.0,1.0] to the dimensions of the target texture
01 - Use the unscaled S, T, R texture coordates.
US:US_W_FMT · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x46b4
DESCRIPTION: Specifies the source and format for the Depth (W) value output by the shader
Field Name
Bits
Default
Description
W_FMT
1:0
0x0
Format for W
POSSIBLE VALUES:
00 - W0 - W is always zero
01 - W24 - 24-bit fixed point
02 - W24_FP - 24-bit floating point. The floating
point values are a special format that preserve sorting
order when values are compared as integers, allowing
higher precision in W without additional logic in other
blocks.
03 - Reserved
W_SRC
2
0x0
Source for W
POSSIBLE VALUES:
00 - WSRC_US - W comes from shader instruction
01 - WSRC_RAS - W comes from rasterizer
© 2010 Advanced Micro Devices, Inc.
Proprietary
253
Revision 1.5
June 8, 2010
11.11 Vertex Registers
VAP:VAP_ALT_NUM_VERTICES · [R/W] · 32 bits · Access: 32 · MMReg:0x2088
DESCRIPTION: Alternate Number of Vertices to allow > 16-bits of Vertex count
Field Name
Bits
Default
Description
NUM_VERTICES
23:0
0x0
24-bit vertex count for command packet. Used instead of
bits 31:16 of VAP_VF_CNTL if
VAP_VF_CNTL.USE_ALT_NUM_VERTS is set.
VAP:VAP_CLIP_CNTL · [R/W] · 32 bits · Access: 32 · MMReg:0x221c
DESCRIPTION: Control Bits for User Clip Planes and Clipping
Field Name
Bits
Default
Description
UCP_ENA_0
0
0x0
Enable User Clip Plane 0
UCP_ENA_1
1
0x0
Enable User Clip Plane 1
UCP_ENA_2
2
0x0
Enable User Clip Plane 2
UCP_ENA_3
3
0x0
Enable User Clip Plane 3
UCP_ENA_4
4
0x0
Enable User Clip Plane 4
UCP_ENA_5
5
0x0
Enable User Clip Plane 5
PS_UCP_MODE
15:14
0x0
0 = Cull using distance from center of point
1 = Cull using radius-based distance from center of
point
2 = Cull using radius-based distance from center of
point, Expand and Clip on intersection
3 = Always expand and clip as trifan
CLIP_DISABLE
16
0x0
Disables clip code generation and clipping process for
TCL
UCP_CULL_ONLY_ENA
17
0x0
Cull Primitives against UCPS, but don`t clip
BOUNDARY_EDGE_FLAG_ENA 18
0x0
If set, boundary edges are highlighted, else they are not
highlighted
COLOR2_IS_TEXTURE
20
0x0
If set, color2 is used as texture8 by GA (PS3.0
requirement)
COLOR3_IS_TEXTURE
21
0x0
If set, color3 is used as texture9 by GA (PS3.0
requirement)
VAP:VAP_CNTL · [R/W] · 32 bits · Access: 32 · MMReg:0x2080
DESCRIPTION: Vertex Assembler/Processor Control Register
Field Name
Bits
Default
Description
PVS_NUM_SLOTS
3:0
0x0
Specifies the number of vertex slots to be used in the
VAP PVS process. A slot represents a single vertex
storage location1 across multiple engines (one vertex per
engine). By decreasing the number of slots, there is more
memory for each vertex, but less parallel processing.
© 2010 Advanced Micro Devices, Inc.
Proprietary
254
Revision 1.5
June 8, 2010
Similarly, by increasing the number of slots, there is less
memory per vertex but more vertices being processed in
parallel.
PVS_NUM_CNTLRS
7:4
0x0
Specifies the maximum number of controllers to be
processing in parallel. In general should be set to max
value of TBD. Can be changed for performance analysis.
PVS_NUM_FPUS
11:8
0x0
Specifies the number of Floating Point Units
(Vector/Math Engines) to use when processing vertices.
VAP_NO_RENDER
17
0x0
If set, VAP will not process any draw commands (i.e.
writes to VAP_VF_CNTL, the INDX and DATAPORT
and Immediate mode writes are ignored.
VF_MAX_VTX_NUM
21:18
0x9
This field controls the number of vertices that the vertex
fetcher manages for the TCL and Setup Vertex Storage
memories (and therefore the number of vertices that can
be re-used). This value should be set to 12 for most
operation, This number may be modified for
performance evaluation. The value is the maximum
vertex number used which is one less than the number of
vertices (i.e. a 12 means 13 vertices will be used)
DX_CLIP_SPACE_DEF
22
0x0
Clip space is defined as:
0: -W < X < W, -W < Y < W, -W < Z < W (OpenGL
Definition)
1: -W < X < W, -W < Y < W, 0 < Z < W (DirectX
Definition)
TCL_STATE_OPTIMIZATION 23
0x0
If set, enables the TCL state optimization, and the new
state is used only if there is a change in TCL state,
between VF_CNTL (triggers)
VAP:VAP_CNTL_STATUS · [R/W] · 32 bits · Access: 32 · MMReg:0x2140
DESCRIPTION: Vertex Assemblen/Processor Control Status
Field Name
Bits
Default
Description
VC_SWAP
1:0
0x0
Endian-Swap Control.
0 = No swap 1 = 16-bit swap: 0xAABBCCDD becomes
0xBBAADDCC
2 = 32-bit swap: 0xAABBCCDD becomes
0xDDCCBBAA
3 = Half-dword swap: 0xAABBCCDD becomes
0xCCDDAABB
Default = 0
PVS_BYPASS
8
0x0
The TCL engine is logically or physically removed from
the circuit.
PVS_BUSY
(Access: R)
11
0x0
Transform/Clip/Light (TCL) Engine is Busy. Read-only.
MAX_MPS
(Access: R)
19:16
0x0
Maximum number of MPs fused for this chip. Readonly.
For A11, fusemask is fixed to 1XXX.
For A12,
© 2010 Advanced Micro Devices, Inc.
Proprietary
255
Revision 1.5
June 8, 2010
CG.CC_COMBINEDSTRAPS.MAX_MPS[7:5] = 000
=> max_mps[3:0] = 1XXX => 8 MPs
CG.CC_COMBINEDSTRAPS.MAX_MPS[7:5] = 001
=> max_mps[3:0] = 0110 => 6 MPs
CG.CC_COMBINEDSTRAPS.MAX_MPS[7:5] = 010
=> max_mps[3:0] = 0101 => 5 MPs
CG.CC_COMBINEDSTRAPS.MAX_MPS[7:5] = 011
=> max_mps[3:0] = 0100 => 4 MPs
CG.CC_COMBINEDSTRAPS.MAX_MPS[7:5] = 100
=> max_mps[3:0] = 0011 => 3 MPs
CG.CC_COMBINEDSTRAPS.MAX_MPS[7:5] = 101
=> max_mps[3:0] = 0010 => 2 MPs
CG.CC_COMBINEDSTRAPS.MAX_MPS[7:5] = 110
=> max_mps[3:0] = 0001 => 1 MP
CG.CC_COMBINEDSTRAPS.MAX_MPS[7:5] = 111
=> max_mps[3:0] = 0000 => 0 MP
Note that max_mps[3:0] = 0111 = 7 MPs is not available
VS_BUSY
(Access: R)
24
0x0
Vertex Store is Busy. Read-only.
RCP_BUSY
(Access: R)
25
0x0
Reciprocal Engine is Busy. Read-only.
VTE_BUSY
(Access: R)
26
0x0
ViewPort Transform Engine is Busy. Read-only.
MIU_BUSY
(Access: R)
27
0x0
Memory Interface Unit is Busy. Read-only.
VC_BUSY
(Access: R)
28
0x0
Vertex Cache is Busy. Read-only.
VF_BUSY
(Access: R)
29
0x0
Vertex Fetcher is Busy. Read-only.
REGPIPE_BUSY
(Access: R)
30
0x0
Register Pipeline is Busy. Read-only.
VAP_BUSY
(Access: R)
31
0x0
VAP Engine is Busy. Read-only.
VAP:VAP_GB_HORZ_CLIP_ADJ · [R/W] · 32 bits · Access: 32 · MMReg:0x2228
DESCRIPTION: Horizontal Guard Band Clip Adjust Register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
32-bit floating point value. Should be set to 1.0 for no
guard band.
VAP:VAP_GB_HORZ_DISC_ADJ · [R/W] · 32 bits · Access: 32 · MMReg:0x222c
DESCRIPTION: Horizontal Guard Band Discard Adjust Register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
32-bit floating point value. Should be set to 1.0 for no
guard band.
© 2010 Advanced Micro Devices, Inc.
Proprietary
256
Revision 1.5
June 8, 2010
VAP:VAP_GB_VERT_CLIP_ADJ · [R/W] · 32 bits · Access: 32 · MMReg:0x2220
DESCRIPTION: Vertical Guard Band Clip Adjust Register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
32-bit floating point value. Should be set to 1.0 for no
guard band.
VAP:VAP_GB_VERT_DISC_ADJ · [R/W] · 32 bits · Access: 32 · MMReg:0x2224
DESCRIPTION: Vertical Guard Band Discard Adjust Register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
32-bit floating point value. Should be set to 1.0 for no
guard band.
VAP:VAP_INDEX_OFFSET · [R/W] · 32 bits · Access: 32 · MMReg:0x208c
DESCRIPTION: Offset Value added to index value in both Indexed and Auto-indexed modes. Disabled by setting
to 0
Field Name
Bits
Default
Description
INDEX_OFFSET
24:0
0x0
25-bit signed 2`s comp offset value
VAP:VAP_OUT_VTX_FMT_0 · [R/W] · 32 bits · Access: 32 · MMReg:0x2090
DESCRIPTION: VAP Out/GA Vertex Format Register 0
Field Name
Bits
Default
Description
VTX_POS_PRESENT
0
0x0
Output the Position Vector
VTX_COLOR_0_PRESENT
1
0x0
Output Color 0 Vector
VTX_COLOR_1_PRESENT
2
0x0
Output Color 1 Vector
VTX_COLOR_2_PRESENT
3
0x0
Output Color 2 Vector
VTX_COLOR_3_PRESENT
4
0x0
Output Color 3 Vector
VTX_PT_SIZE_PRESENT
16
0x0
Output Point Size Vector
VAP:VAP_OUT_VTX_FMT_1 · [R/W] · 32 bits · Access: 32 · MMReg:0x2094
DESCRIPTION: VAP Out/GA Vertex Format Register 1
Field Name
Bits
Default
Description
TEX_0_COMP_CNT
2:0
0x0
Number of words in texture
0 = Not Present
1 = 1 component
2 = 2 components
3 = 3 components
4 = 4 components
© 2010 Advanced Micro Devices, Inc.
Proprietary
257
Revision 1.5
TEX_1_COMP_CNT
5:3
0x0
Number of words in texture
0 = Not Present
1 = 1 component
2 = 2 components
3 = 3 components
4 = 4 components
TEX_2_COMP_CNT
8:6
0x0
Number of words in texture
0 = Not Present
1 = 1 component
2 = 2 components
3 = 3 components
4 = 4 components
TEX_3_COMP_CNT
11:9
0x0
Number of words in texture
0 = Not Present
1 = 1 component
2 = 2 components
3 = 3 components
4 = 4 components
TEX_4_COMP_CNT
14:12
0x0
Number of words in texture
0 = Not Present
1 = 1 component
2 = 2 components
3 = 3 components
4 = 4 components
TEX_5_COMP_CNT
17:15
0x0
Number of words in texture
0 = Not Present
1 = 1 component
2 = 2 components
3 = 3 components
4 = 4 components
TEX_6_COMP_CNT
20:18
0x0
Number of words in texture
0 = Not Present
1 = 1 component
2 = 2 components
3 = 3 components
4 = 4 components
TEX_7_COMP_CNT
23:21
0x0
Number of words in texture
0 = Not Present
1 = 1 component
2 = 2 components
3 = 3 components
4 = 4 components
June 8, 2010
VAP:VAP_PORT_DATA[0-15] · [W] · 32 bits · Access: 32 · MMReg:0x2000-0x203c
DESCRIPTION: Setup Engine Data Port 0 through 15.
Field Name
Bits
Default
Description
DATAPORT0
(master with mirrors)
31:0
0x0
1st of 16 consecutive dwords for writing vertex data
information.
© 2010 Advanced Micro Devices, Inc.
Proprietary
258
Revision 1.5
June 8, 2010
Write-only.
VAP:VAP_PORT_DATA_IDX_128 · [W] · 32 bits · Access: 32 · MMReg:0x20b8
DESCRIPTION: 128-bit Data Port for Indexed Primitives.
Field Name
Bits
Default
Description
DATA_IDX_PORT_128
31:0
0x0
128-bit Data Port for Indexed Primitives.
Write-only.
VAP:VAP_PORT_IDX[0-15] · [W] · 32 bits · Access: 32 · MMReg:0x2040-0x207c
DESCRIPTION: Setup Engine Index Port 0 through 15.
Field Name
Bits
Default
Description
IDXPORT0
(master with mirrors)
31:0
0x0
1st of 16 consecutive dwords for writing vertex index
information, in the format of:
15:0 Index 0
31:16 Index 1
Write-only.
VAP:VAP_PROG_STREAM_CNTL_[0-7] · [R/W] · 32 bits · Access: 32 · MMReg:0x2150-0x216c
DESCRIPTION: Programmable Stream Control Word 0
Field Name
Bits
Default
Description
DATA_TYPE_0
3:0
0x0
The data type for element 0
0 = FLOAT_1 (Single IEEE Float)
1 = FLOAT_2 (2 IEEE floats)
2 = FLOAT_3 (3 IEEE Floats)
3 = FLOAT_4 (4 IEEE Floats)
4 = BYTE * (1 DWORD w 4 8-bit fixed point values)
(X = [7:0], Y = [15:8], Z = [23:16], W = [31:24])
5 = D3DCOLOR * (Same as BYTE except has X->Z,Z>X swap for D3D color def)
(Z = [7:0], Y = [15:8], X = [23:16], W = [31:24])
6 = SHORT_2 * (1 DWORD with 2 16-bit fixed point
values)
(X = [15:0], Y = [31:16], Z = 0.0, W = 1.0)
7 = SHORT_4 * (2 DWORDS with 4(2 per dword) 16bit fixed point values)
(X = DW0 [15:0], Y = DW0 [31:16], Z = DW1 [15:0],
W = DW1 [31:16])
8 = VECTOR_3_TTT * (1 DWORD with 3 10-bit fixed
point values)
(X = [9:0], Y = [19:10], Z = [29:20], W = 1.0)
9 = VECTOR_3_EET * (1 DWORD with 2 11-bit and 1
10-bit fixed point values)
(X = [10:0], Y = [21:11], Z = [31:22], W = 1.0)
10 = FLOAT_8 (8 IEEE Floats)
Sames as 2 FLOAT_4 but must use consecutive
© 2010 Advanced Micro Devices, Inc.
Proprietary
259
Revision 1.5
June 8, 2010
DST_VEC_LOC. Used to allow > 16 PSC for OGL path.
11 = FLT16_2 (1 DWORD with 2 16-bit floating point
values (SE5M10 exp bias of 15, supports denormalized
numbers))
(X = [15:0], Y = [31:16], Z = 0.0, W = 1.0)
12 = FLT16_4 (2 DWORDS with 4(2 per dword) 16-bit
floating point values (SE5M10 exp bias of 15, supports
denormalized numbers)))
(X = DW0 [15:0], Y = DW0 [31:16], Z = DW1 [15:0],
W = DW1 [31:16])
* These data types use the SIGNED and NORMALIZE
flags described below.
SKIP_DWORDS_0
7:4
0x0
The number of DWORDS to skip (discard) after
processing the current element.
DST_VEC_LOC_0
12:8
0x0
The vector address in the input memory to write this
element
LAST_VEC_0
13
0x0
If set, indicates the last vector of the current vertex
stream
SIGNED_0
14
0x0
Determines whether fixed point data types are unsigned
(0) or 2`s complement signed (1) data types. See
NORMALIZE for complete description of affect
NORMALIZE_0
15
0x0
Determines whether the fixed to floating point
conversion will normalize the value (i.e. fixed point
value is all fractional bits) or not (i.e. fixed point value is
all integer bits).
This table describes the fixed to float conversion results
SIGNED NORMALIZE FLT RANGE
0 0 0.0 - (2^n - 1) (i.e. 8-bit -> 0.0 - 255.0)
0 1 0.0 - 1.0
1 0 -2^(n-1) - (2^(n-1) - 1) (i.e. 8-bit -> -128.0 - 127.0)
1 1 -1.0 - 1.0
where n is the number of bits in the associated fixed
point value
For signed, normalize conversion, since the fixed point
range is not evenly distributed around 0, there are 3
different methods supported by R300. See the
VAP_PSC_SGN_NORM_CNTL description for details.
DATA_TYPE_1
19:16
0x0
Similar to DATA_TYPE_0
SKIP_DWORDS_1
23:20
0x0
See SKIP_DWORDS_0
DST_VEC_LOC_1
28:24
0x0
See DST_VEC_LOC_0
LAST_VEC_1
29
0x0
See LAST_VEC_0
SIGNED_1
30
0x0
See SIGNED_0
NORMALIZE_1
31
0x0
See NORMALIZE_0
VAP:VAP_PROG_STREAM_CNTL_EXT_[0-7] · [R/W] · 32 bits · Access: 32 · MMReg:0x21e0-0x21fc
DESCRIPTION: Programmable Stream Control Extension Word 0
Field Name
© 2010 Advanced Micro Devices, Inc.
Proprietary
Bits
Default
Description
260
Revision 1.5
June 8, 2010
SWIZZLE_SELECT_X_0
2:0
0x0
X-Component Swizzle Select
0 = SELECT_X
1 = SELECT_Y
2 = SELECT_Z
3 = SELECT_W
4 = SELECT_FP_ZERO (Floating Point 0.0)
5 = SELECT_FP_ONE (Floating Point 1.0)
6,7 RESERVED
SWIZZLE_SELECT_Y_0
5:3
0x0
Y-Component Swizzle Select (See Above)
SWIZZLE_SELECT_Z_0
8:6
0x0
Z-Component Swizzle Select (See Above)
SWIZZLE_SELECT_W_0
11:9
0x0
W-Component Swizzle Select (See Above)
WRITE_ENA_0
15:12
0x0
4-bit write enable.
Bit 0 maps to X
Bit 1 maps to Y
Bit 2 maps to Z
Bit 3 maps to W
SWIZZLE_SELECT_X_1
18:16
0x0
See SWIZZLE_SELECT_X_0
SWIZZLE_SELECT_Y_1
21:19
0x0
See SWIZZLE_SELECT_Y_0
SWIZZLE_SELECT_Z_1
24:22
0x0
See SWIZZLE_SELECT_Z_0
SWIZZLE_SELECT_W_1
27:25
0x0
See SWIZZLE_SELECT_W_0
WRITE_ENA_1
31:28
0x0
See WRITE_ENA_0
VAP:VAP_PSC_SGN_NORM_CNTL · [R/W] · 32 bits · Access: 32 · MMReg:0x21dc
DESCRIPTION: Programmable Stream Control Signed Normalize Control
Field Name
Bits
Default
Description
SGN_NORM_METHOD_0
1:0
0x0
There are 3 methods of normalizing signed numbers:
0: SGN_NORM_ZERO : value / (2^(n-1)-1), so 128/127 will be less that -1.0, -127/127 will yeild -1.0,
0/127 will yield 0, and 127/127 will yield 1.0 for 8-bit
numbers.
1: SGN_NORM_ZERO_CLAMP_MINUS_ONE: Same
as SGN_NORM_ZERO except -128/127 will yield -1.0
for 8-bit numbers.
2: SGN_NORM_NO_ZERO: (2 * value + 1)/2^n, so 128 will yield -255/255 = -1.0, 127 will yield 255/255 =
1.0, but 0 will yield 1/255 != 0.
SGN_NORM_METHOD_1
3:2
0x0
See SGN_NORM_METHOD_0
SGN_NORM_METHOD_2
5:4
0x0
See SGN_NORM_METHOD_0
SGN_NORM_METHOD_3
7:6
0x0
See SGN_NORM_METHOD_0
SGN_NORM_METHOD_4
9:8
0x0
See SGN_NORM_METHOD_0
SGN_NORM_METHOD_5
11:10
0x0
See SGN_NORM_METHOD_0
SGN_NORM_METHOD_6
13:12
0x0
See SGN_NORM_METHOD_0
SGN_NORM_METHOD_7
15:14
0x0
See SGN_NORM_METHOD_0
SGN_NORM_METHOD_8
17:16
0x0
See SGN_NORM_METHOD_0
© 2010 Advanced Micro Devices, Inc.
Proprietary
261
Revision 1.5
SGN_NORM_METHOD_9
19:18
0x0
See SGN_NORM_METHOD_0
SGN_NORM_METHOD_10
21:20
0x0
See SGN_NORM_METHOD_0
SGN_NORM_METHOD_11
23:22
0x0
See SGN_NORM_METHOD_0
SGN_NORM_METHOD_12
25:24
0x0
See SGN_NORM_METHOD_0
SGN_NORM_METHOD_13
27:26
0x0
See SGN_NORM_METHOD_0
SGN_NORM_METHOD_14
29:28
0x0
See SGN_NORM_METHOD_0
SGN_NORM_METHOD_15
31:30
0x0
See SGN_NORM_METHOD_0
June 8, 2010
VAP:VAP_PVS_CODE_CNTL_0 · [R/W] · 32 bits · Access: 32 · MMReg:0x22d0
DESCRIPTION: Programmable Vertex Shader Code Control Register 0
Field Name
Bits
Default
Description
PVS_FIRST_INST
9:0
0x0
First Instruction to Execute in PVS.
PVS_XYZW_VALID_INST
19:10
0x0
The PVS Instruction which updates the clip coordinate
position for the last time. This value is used to lower the
processing priority while trivial clip and back-face
culling decisions are made. This field must be set to valid
instruction.
PVS_LAST_INST
29:20
0x0
Last Instruction (Inclusive) for the PVS to execute.
VAP:VAP_PVS_CODE_CNTL_1 · [R/W] · 32 bits · Access: 32 · MMReg:0x22d8
DESCRIPTION: Programmable Vertex Shader Code Control Register 1
Field Name
Bits
Default
Description
PVS_LAST_VTX_SRC_INST
9:0
0x0
The PVS Instruction which uses the Input Vertex
Memory for the last time. This value is used to free up
the Input Vertex Slots ASAP. This field must be set to a
valid instruction.
VAP:VAP_PVS_CONST_CNTL · [R/W] · 32 bits · Access: 32 · MMReg:0x22d4
DESCRIPTION: Programmable Vertex Shader Constant Control Register
Field Name
Bits
Default
Description
PVS_CONST_BASE_OFFSET
7:0
0x0
Vector Offset into PVS constant memory to the start of
the constants for the current shader
PVS_MAX_CONST_ADDR
23:16
0x0
The maximum constant address which should be
generated by the shader (Inst Const Addr + Addr
Register). If the address which is generated by the shader
is outside the range of 0 to PVS_MAX_CONST_ADDR,
then (0,0,0,0) is returned as the source operand data.
VAP:VAP_PVS_FLOW_CNTL_ADDRS_[0-15] · [R/W] · 32 bits · Access: 32 · MMReg:0x2230-0x226c
DESCRIPTION: Programmable Vertex Shader Flow Control Addresses Register 0
© 2010 Advanced Micro Devices, Inc.
Proprietary
262
Revision 1.5
June 8, 2010
Field Name
Bits
Default
Description
PVS_FC_ACT_ADRS_0
7:0
0x0
This field defines the last PVS instruction to execute
prior to the control flow redirection.
JUMP - The last instruction executed prior to the jump
LOOP - The last instruction executed prior to the loop
(init loop counter/inc)
JSR - The last instruction executed prior to the jump to
the subroutine.
PVS_FC_LOOP_CNT_JMP_INST_0 15:8
0x0
This field has multiple definitions as follows:
JUMP - The instruction address to jump to.
LOOP - The loop count. *Note loop count of 0 must be
replaced by a jump.
JSR - The instruction address to jump to (first inst of
subroutine).
PVS_FC_LAST_INST_0
23:16
0x0
This field has multiple definitions as follows:
JUMP - Not Applicable
LOOP - The last instruction of the loop.
JSR - The last instruction of the subroutine.
PVS_FC_RTN_INST_0
31:24
0x0
This field has multiple definitions as follows:
JUMP - Not Applicable
LOOP - First Instruction of Loop (Typically
ACT_ADRS + 1)
JSR - First Instruction After JSR (Typically
ACT_ADRS + 1)
VAP:VAP_PVS_FLOW_CNTL_ADDRS_LW_[0-15] · [R/W] · 32 bits · Access: 32 · MMReg:0x25000x2578
DESCRIPTION: For VS3.0 - To support more PVS instructions, increase the address range - Programmable
Vertex Shader Flow Control Lower Word Addresses Register 0
Field Name
Bits
Default
Description
PVS_FC_ACT_ADRS_0
15:0
0x0
This field defines the last PVS instruction to execute
prior to the control flow redirection.
JUMP - The last instruction executed prior to the jump
LOOP - The last instruction executed prior to the loop
(init loop counter/inc)
JSR - The last instruction executed prior to the jump to
the subroutine.
(Addrss_Range:1K=[9:0];512=[8:0];256=[7:0])
0x0
This field has multiple definitions as follows:
JUMP - The instruction address to jump to.
LOOP - The loop count. *Note loop count of 0 must be
replaced by a jump.
JSR - The instruction address to jump to (first inst of
subroutine).
(Addrss_Range:1K=[24:15];512=[23:15];256=[22:15])
PVS_FC_LOOP_CNT_JMP_INST_0 31:16
VAP:VAP_PVS_FLOW_CNTL_ADDRS_UW_[0-15] · [R/W] · 32 bits · Access: 32 · MMReg:0x2504© 2010 Advanced Micro Devices, Inc.
Proprietary
263
Revision 1.5
June 8, 2010
0x257c
DESCRIPTION: For VS3.0 - To support more PVS instructions, increase the address range - Programmable
Vertex Shader Flow Control Upper Word Addresses Register 0
Field Name
Bits
Default
Description
PVS_FC_LAST_INST_0
15:0
0x0
This field has multiple definitions as follows:
JUMP - Not Applicable
LOOP - The last instruction of the loop.
JSR - The last instruction of the subroutine.
(Addrss_Range:1K=[9:0];512=[8:0];256=[7:0])
PVS_FC_RTN_INST_0
31:16
0x0
This field has multiple definitions as follows:
JUMP - Not Applicable
LOOP - First Instruction of Loop (Typically
ACT_ADRS + 1)
JSR - First Instruction After JSR (Typically ACT_ADRS
+ 1).
(Addrss_Range:1K=[24:15];512=[23:15];256=[22:15])
VAP:VAP_PVS_FLOW_CNTL_LOOP_INDEX_[0-15] · [R/W] · 32 bits · Access: 32 · MMReg:0x22900x22cc
DESCRIPTION: Programmable Vertex Shader Flow Control Loop Index Register 0
Field Name
Bits
Default
Description
PVS_FC_LOOP_INIT_VAL_0
7:0
0x0
This field stores the automatic loop index register init
value. This is an 8-bit unsigned value 0-255. This field
is only used if the corresponding control flow
instruction is a loop.
PVS_FC_LOOP_STEP_VAL_0
15:8
0x0
This field stores the automatic loop index register step
value. This is an 8-bit 2`s comp signed value -128-127.
This field is only used if the corresponding control
flow instruction is a loop.
0x0
When this field is set, the automatic loop index register
init value is not used at loop activation. The intial loop
index is inherited from outer loop. The loop index
register step value is used at the end of each loop
iteration ; after loop completion, the outer loop index
register is restored
PVS_FC_LOOP_REPEAT_NO_FLI_0 31
VAP:VAP_PVS_FLOW_CNTL_OPC · [R/W] · 32 bits · Access: 32 · MMReg:0x22dc
DESCRIPTION: Programmable Vertex Shader Flow Control Opcode Register
Field Name
Bits
Default
Description
PVS_FC_OPC_0
1:0
0x0
This opcode field determines what type of control flow
instruction to execute.
0 = NO_OP
1 = JUMP
2 = LOOP
3 = JSR (Jump to Subroutine)
© 2010 Advanced Micro Devices, Inc.
Proprietary
264
Revision 1.5
PVS_FC_OPC_1
3:2
0x0
See PVS_FC_OPC_0.
PVS_FC_OPC_2
5:4
0x0
See PVS_FC_OPC_0.
PVS_FC_OPC_3
7:6
0x0
See PVS_FC_OPC_0.
PVS_FC_OPC_4
9:8
0x0
See PVS_FC_OPC_0.
PVS_FC_OPC_5
11:10
0x0
See PVS_FC_OPC_0.
PVS_FC_OPC_6
13:12
0x0
See PVS_FC_OPC_0.
PVS_FC_OPC_7
15:14
0x0
See PVS_FC_OPC_0.
PVS_FC_OPC_8
17:16
0x0
See PVS_FC_OPC_0.
PVS_FC_OPC_9
19:18
0x0
See PVS_FC_OPC_0.
PVS_FC_OPC_10
21:20
0x0
See PVS_FC_OPC_0.
PVS_FC_OPC_11
23:22
0x0
See PVS_FC_OPC_0.
PVS_FC_OPC_12
25:24
0x0
See PVS_FC_OPC_0.
PVS_FC_OPC_13
27:26
0x0
See PVS_FC_OPC_0.
PVS_FC_OPC_14
29:28
0x0
See PVS_FC_OPC_0.
PVS_FC_OPC_15
31:30
0x0
See PVS_FC_OPC_0.
June 8, 2010
VAP:VAP_PVS_STATE_FLUSH_REG · [R/W] · 32 bits · Access: 32 · MMReg:0x2284
Field Name
Bits
Default
Description
DATA_REGISTER
(Access: W)
31:0
0x0
This register is used to force a flush of the PVS block
when single-buffered updates are performed. The multistate control of PVS Code and Const memories by the
driver is primarily for more flexible PVS state control
and for performance testing. When this register address
is written, the State Block will force a flush of PVS
processing so that both versions of PVS state are
available before updates are processed. This register is
write only, and the data that is written is unused.
VAP:VAP_PVS_VECTOR_DATA_REG · [R/W] · 32 bits · Access: 32 · MMReg:0x2204
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
32-bit data to write to Vector Memory. Used for PVS
code and Constant updates.
VAP:VAP_PVS_VECTOR_DATA_REG_128 · [W] · 32 bits · Access: 32 · MMReg:0x2208
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
128-bit data path to write to Vector Memory. Used for
PVS code and Constant updates.
VAP:VAP_PVS_VECTOR_INDX_REG · [R/W] · 32 bits · Access: 32 · MMReg:0x2200
© 2010 Advanced Micro Devices, Inc.
Proprietary
265
Revision 1.5
Field Name
Bits
Default
Description
OCTWORD_OFFSET
10:0
0x0
Octword offset to begin writing.
June 8, 2010
VAP:VAP_PVS_VTX_TIMEOUT_REG · [R/W] · 32 bits · Access: 32 · MMReg:0x2288
Field Name
Bits
Default
Description
CLK_COUNT
31:0
0xFFFFFFFF This register is used to define the number of core clocks
to wait for a vertex to be received by the VAP input
controller (while the primitive path is backed up) before
forcing any accumulated vertices to be submitted to the
vertex processing path.
VAP:VAP_TEX_TO_COLOR_CNTL · [R/W] · 32 bits · Access: 32 · MMReg:0x2218
DESCRIPTION: For VS3.0 color2texture - flat shading on textures - limitation: only first 8 vectors can have
clipping with wrap shortest or point sprite generated textures
Field Name
Bits
Default
TEX_RGB_SHADE_FUNC_0
0
0x0
Description
Default = 0
TEX_ALPHA_SHADE_FUNC_0 1
0x0
Default = 0
TEX_RGBA_CLAMP_0
2
0x0
TEX_RGB_SHADE_FUNC_1
4
0x0
Default = 0
Default = 0
TEX_ALPHA_SHADE_FUNC_1 5
0x0
Default = 0
TEX_RGBA_CLAMP_1
6
0x0
Default = 0
TEX_RGB_SHADE_FUNC_2
8
0x0
TEX_ALPHA_SHADE_FUNC_2 9
0x0
Default = 0
Default = 0
TEX_RGBA_CLAMP_2
10
0x0
Default = 0
TEX_RGB_SHADE_FUNC_3
12
0x0
Default = 0
TEX_ALPHA_SHADE_FUNC_3 13
0x0
TEX_RGBA_CLAMP_3
0x0
Default = 0
14
Default = 0
TEX_RGB_SHADE_FUNC_4
16
0x0
Default = 0
TEX_ALPHA_SHADE_FUNC_4 17
0x0
Default = 0
© 2010 Advanced Micro Devices, Inc.
Proprietary
266
Revision 1.5
TEX_RGBA_CLAMP_4
18
0x0
TEX_RGB_SHADE_FUNC_5
20
0x0
June 8, 2010
Default = 0
Default = 0
TEX_ALPHA_SHADE_FUNC_5 21
0x0
Default = 0
TEX_RGBA_CLAMP_5
22
0x0
Default = 0
TEX_RGB_SHADE_FUNC_6
24
0x0
Default = 0
TEX_ALPHA_SHADE_FUNC_6 25
0x0
Default = 0
TEX_RGBA_CLAMP_6
26
0x0
Default = 0
TEX_RGB_SHADE_FUNC_7
28
0x0
Default = 0
TEX_ALPHA_SHADE_FUNC_7 29
0x0
Default = 0
TEX_RGBA_CLAMP_7
30
0x0
Default = 0
VAP:VAP_VF_CNTL · [R/W] · 32 bits · Access: 32 · MMReg:0x2084
DESCRIPTION: Vertex Fetcher Control
Field Name
Bits
Default
Description
PRIM_TYPE
3:0
0x0
Primitive Type
0 : None (will not trigger Setup Engine to run)
1 : Point List
2 : Line List
3 : Line Strip
4 : Triangle List
5 : Triangle Fan
6 : Triangle Strip
7 : Triangle with wFlags (aka, Rage128 `Type-2`
triangles) *
8-11 : Unused
12 : Line Loop
13 : Quad List
14 : Quad Strip
15 : Polygon
*Encoding 7 indicates whether a 16-bit word of wFlags
is present in the stream of indices arriving when the
VTX_AMODE is programmed as a `0`. The Setup
Engine just steps over the wFlags word; ignoring it.
0 = Stream contains just indices, as:
[ Index1, Index0]
[ Index3, Index2]
[ Index5, Index4 ]
© 2010 Advanced Micro Devices, Inc.
Proprietary
267
Revision 1.5
June 8, 2010
etc...
1 = Stream contains indices and wFlags:
[ Index1, Index0]
[ wFlags,Index 2 ]
[ Index4, Index3]
[ wFlags, Index5 ]
etc...
PRIM_WALK
5:4
0x0
Method of Passing Vertex Data.
0 : State-Based Vertex Data. (Vertex data and tokens
embedded in command stream.)
1 = Indexes (Indices embedded in command stream;
vertex data to be fetched from memory.)
2 = Vertex List (Vertex data to be fetched from
memory.)
3 = Vertex Data (Vertex data embedded in command
stream.)
RSVD_PREV_USED
10:6
0x0
Reserved bits
INDEX_SIZE
11
0x0
When set, vertex indices are 32-bits/indx, otherwise, 16bits/indx.
VTX_REUSE_DIS
12
0x0
When set, vertex reuse is disabled. DO NOT SET unless
PRIM_WALK is Indexes.
DUAL_INDEX_MODE
13
0x0
When set, the incoming index is treated as two separate
indices. Bits 23-16 are used as the index for AOS 0
(These are 0 for 16-bit indices) Bits 15-0 are used as the
index for AOS 1-15. This mode was added specifically
for HOS usage
USE_ALT_NUM_VERTS
14
0x0
When set, the number of vertices in the command packet
is taken from VAP_ALT_NUM_VERTICES register
instead of bits 31:16 of VAP_VF_CNTL
NUM_VERTICES
31:16
0x0
Number of vertices in the command packet.
VAP:VAP_VF_MAX_VTX_INDX · [R/W] · 32 bits · Access: 32 · MMReg:0x2134
DESCRIPTION: Maximum Vertex Indx Clamp
Field Name
Bits
Default
Description
MAX_INDX
23:0
0xFFFFFF If index to be fetched is larger than this value, the fetch
indx is set to MAX_INDX
VAP:VAP_VF_MIN_VTX_INDX · [R/W] · 32 bits · Access: 32 · MMReg:0x2138
DESCRIPTION: Minimum Vertex Indx Clamp
Field Name
Bits
Default
Description
MIN_INDX
23:0
0x0
If index to be fetched is smaller than this value, the fetch
indx is set to MIN_INDX
VAP:VAP_VPORT_XOFFSET · [R/W] · 32 bits · Access: 32 · MMReg:0x1d9c, MMReg:0x209c
© 2010 Advanced Micro Devices, Inc.
Proprietary
268
Revision 1.5
June 8, 2010
DESCRIPTION: Viewport Transform X Offset
Field Name
Bits
Default
Description
VPORT_XOFFSET
31:0
0x0
Viewport Offset for X coordinates. An IEEE float.
VAP:VAP_VPORT_XSCALE · [R/W] · 32 bits · Access: 32 · MMReg:0x1d98, MMReg:0x2098
DESCRIPTION: Viewport Transform X Scale Factor
Field Name
Bits
Default
Description
VPORT_XSCALE
31:0
0x0
Viewport Scale Factor for X coordinates. An IEEE float.
VAP:VAP_VPORT_YOFFSET · [R/W] · 32 bits · Access: 32 · MMReg:0x1da4, MMReg:0x20a4
DESCRIPTION: Viewport Transform Y Offset
Field Name
Bits
Default
Description
VPORT_YOFFSET
31:0
0x0
Viewport Offset for Y coordinates. An IEEE float.
VAP:VAP_VPORT_YSCALE · [R/W] · 32 bits · Access: 32 · MMReg:0x1da0, MMReg:0x20a0
DESCRIPTION: Viewport Transform Y Scale Factor
Field Name
Bits
Default
Description
VPORT_YSCALE
31:0
0x0
Viewport Scale Factor for Y coordinates. An IEEE float.
VAP:VAP_VPORT_ZOFFSET · [R/W] · 32 bits · Access: 32 · MMReg:0x1dac, MMReg:0x20ac
DESCRIPTION: Viewport Transform Z Offset
Field Name
Bits
Default
Description
VPORT_ZOFFSET
31:0
0x0
Viewport Offset for Z coordinates. An IEEE float.
VAP:VAP_VPORT_ZSCALE · [R/W] · 32 bits · Access: 32 · MMReg:0x1da8, MMReg:0x20a8
DESCRIPTION: Viewport Transform Z Scale Factor
Field Name
Bits
Default
Description
VPORT_ZSCALE
31:0
0x0
Viewport Scale Factor for Z coordinates. An IEEE float.
VAP:VAP_VTE_CNTL · [R/W] · 32 bits · Access: 32 · MMReg:0x20b0
DESCRIPTION: Viewport Transform Engine Control
Field Name
Bits
Default
Description
VPORT_X_SCALE_ENA
0
0x0
Viewport Transform Scale Enable for X component
VPORT_X_OFFSET_ENA
1
0x0
Viewport Transform Offset Enable for X component
VPORT_Y_SCALE_ENA
2
0x0
Viewport Transform Scale Enable for Y component
VPORT_Y_OFFSET_ENA
3
0x0
Viewport Transform Offset Enable for Y component
© 2010 Advanced Micro Devices, Inc.
Proprietary
269
Revision 1.5
June 8, 2010
VPORT_Z_SCALE_ENA
4
0x0
Viewport Transform Scale Enable for Z component
VPORT_Z_OFFSET_ENA
5
0x0
Viewport Transform Offset Enable for Z component
VTX_XY_FMT
8
0x0
Indicates that the incoming X, Y have already been
multiplied by 1/W0.
If OFF, the Setup Engine will bultiply the X, Y
coordinates by 1/W0.,
VTX_Z_FMT
9
0x0
Indicates that the incoming Z has already been multiplied
by 1/W0.
If OFF, the Setup Engine will multiply the Z coordinate
by 1/W0.
VTX_W0_FMT
10
0x0
Indicates that the incoming W0 is not 1/W0.
If ON, the Setup Engine will perform the reciprocal to
get 1/W0.
SERIAL_PROC_ENA
11
0x0
If set, x,y,z viewport transform are performed serially
through a single pipeline instead of in parallel. Used to
mimic RL300 design.
VAP:VAP_VTX_AOS_ADDR[0-15] · [R/W] · 32 bits · Access: 32 · MMReg:0x20c8-0x2120
DESCRIPTION: Array-of-Structures Address 0
Field Name
Bits
Default
Description
VTX_AOS_ADDR0
31:2
0x0
Base Address of the Array of Structures.
VAP:VAP_VTX_AOS_ATTR[01-1415] · [R/W] · 32 bits · Access: 32 · MMReg:0x20c4-0x2118
DESCRIPTION: Array-of-Structures Attributes 0 & 1
Field Name
Bits
Default
Description
VTX_AOS_COUNT0
6:0
0x0
Number of dwords in this structure.
VTX_AOS_STRIDE0
14:8
0x0
Number of dwords from one array element to the next.
VTX_AOS_COUNT1
22:16
0x0
Number of dwords in this structure.
VTX_AOS_STRIDE1
30:24
0x0
Number of dwords from one array element to the next.
VAP:VAP_VTX_NUM_ARRAYS · [R/W] · 32 bits · Access: 32 · MMReg:0x20c0
DESCRIPTION: Vertex Array of Structures Control
Field Name
Bits
Default
Description
VTX_NUM_ARRAYS
4:0
0x0
The number of arrays required to represent the current
vertex type.
Each Array is described by the following three fields:
VTX_AOS_ADDR, VTX_AOS_COUNT,
VTX_AOS_STRIDE.
VC_FORCE_PREFETCH
5
0x0
Force Vertex Data Pre-fetching. If this bit is set, then a
256-bit word will always be fetched, regardless of which
dwords are needed. Typically useful when
VAP_VF_CNTL.PRIM_WALK is set to Vertex List
© 2010 Advanced Micro Devices, Inc.
Proprietary
270
Revision 1.5
June 8, 2010
(Auto-incremented indices).
VC_DIS_CACHE_INVLD
(Access: R)
6
0x0
If set, the vertex cache is not invalidated between draw
packets. This allows vertex cache hits to occur from
packet to packet. This must be set with caution with
respect to multiple contexts in the driver.
AOS_0_FETCH_SIZE
16
0x0
Granule Size to Fetch for AOS 0.
0 = 128-bit granule size
1 = 256-bit granule size
This allows the driver to program the fetch size based on
DWORDS/VTX/AOS combined with AGP vs. LOC
Memory. The general belief is that the granule size
should always be 256-bits for LOC memory and AGP8X
data, but should be 128-bit for AGP2X/4X data if the
DWORDS/VTX/AOS is less than TBD (128?) bits.
AOS_1_FETCH_SIZE
17
0x0
See AOS_0_FETCH_SIZE
AOS_2_FETCH_SIZE
18
0x0
See AOS_0_FETCH_SIZE
AOS_3_FETCH_SIZE
19
0x0
See AOS_0_FETCH_SIZE
AOS_4_FETCH_SIZE
20
0x0
See AOS_0_FETCH_SIZE
AOS_5_FETCH_SIZE
21
0x0
See AOS_0_FETCH_SIZE
AOS_6_FETCH_SIZE
22
0x0
See AOS_0_FETCH_SIZE
AOS_7_FETCH_SIZE
23
0x0
See AOS_0_FETCH_SIZE
AOS_8_FETCH_SIZE
24
0x0
See AOS_0_FETCH_SIZE
AOS_9_FETCH_SIZE
25
0x0
See AOS_0_FETCH_SIZE
AOS_10_FETCH_SIZE
26
0x0
See AOS_0_FETCH_SIZE
AOS_11_FETCH_SIZE
27
0x0
See AOS_0_FETCH_SIZE
AOS_12_FETCH_SIZE
28
0x0
See AOS_0_FETCH_SIZE
AOS_13_FETCH_SIZE
29
0x0
See AOS_0_FETCH_SIZE
AOS_14_FETCH_SIZE
30
0x0
See AOS_0_FETCH_SIZE
AOS_15_FETCH_SIZE
31
0x0
See AOS_0_FETCH_SIZE
VAP:VAP_VTX_SIZE · [R/W] · 32 bits · Access: 32 · MMReg:0x20b4
DESCRIPTION: Vertex Size Specification Register
Field Name
Bits
Default
Description
DWORDS_PER_VTX
6:0
0x0
This field specifies the number of DWORDS per vertex
to expect when VAP_VF_CNTL.PRIM_WALK is set to
Vertex Data (vertex data embedded in command stream).
This field is not used for any other PRIM_WALK
settings. This field replaces the usage of the
VAP_VTX_FMT_0/1 for this purpose in prior
implementations.
VAP:VAP_VTX_STATE_CNTL · [R/W] · 32 bits · Access: 32 · MMReg:0x2180
DESCRIPTION: VAP Vertex State Control Register
© 2010 Advanced Micro Devices, Inc.
Proprietary
271
Revision 1.5
June 8, 2010
Field Name
Bits
Default
Description
COLOR_0_ASSEMBLY_CNTL
1:0
0x0
0 : Select Color 0
1 : Select User Color 0
2 : Select User Color 1
3 : Reserved
COLOR_1_ASSEMBLY_CNTL
3:2
0x0
0 : Select Color 1
1 : Select User Color 0
2 : Select User Color 1
3 : Reserved
COLOR_2_ASSEMBLY_CNTL
5:4
0x0
0 : Select Color 2
1 : Select User Color 0
2 : Select User Color 1
3 : Reserved
COLOR_3_ASSEMBLY_CNTL
7:6
0x0
0 : Select Color 3
1 : Select User Color 0
2 : Select User Color 1
3 : Reserved
COLOR_4_ASSEMBLY_CNTL
9:8
0x0
0 : Select Color 4
1 : Select User Color 0
2 : Select User Color 1
3 : Reserved
COLOR_5_ASSEMBLY_CNTL
11:10
0x0
0 : Select Color 5
1 : Select User Color 0
2 : Select User Color 1
3 : Reserved
COLOR_6_ASSEMBLY_CNTL
13:12
0x0
0 : Select Color 6
1 : Select User Color 0
2 : Select User Color 1
3 : Reserved
COLOR_7_ASSEMBLY_CNTL
15:14
0x0
0 : Select Color 7
1 : Select User Color 0
2 : Select User Color 1
3 : Reserved
UPDATE_USER_COLOR_0_ENA 16
0x0
0 : User Color 0 State is NOT updated when User Color
0 is written.
1 : User Color 1 State IS updated when User Color 0 is
written.
Reserved
0x0
Set to 0
18
VAP:VAP_VTX_ST_BLND_WT_[0-3] · [R/W] · 32 bits · Access: 32 · MMReg:0x2430-0x243c
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
BLND_WT_0
VAP:VAP_VTX_ST_CLR_[0-7]_A · [R/W] · 32 bits · Access: 32 · MMReg:0x232c-0x239c
© 2010 Advanced Micro Devices, Inc.
Proprietary
272
Revision 1.5
June 8, 2010
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
CLR_0_A
VAP:VAP_VTX_ST_CLR_[0-7]_B · [R/W] · 32 bits · Access: 32 · MMReg:0x2328-0x2398
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
CLR_0_B
VAP:VAP_VTX_ST_CLR_[0-7]_G · [R/W] · 32 bits · Access: 32 · MMReg:0x2324-0x2394
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
CLR_0_G
VAP:VAP_VTX_ST_CLR_[0-7]_PKD · [W] · 32 bits · Access: 32 · MMReg:0x2470-0x248c
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
CLR_0_PKD
VAP:VAP_VTX_ST_CLR_[0-7]_R · [R/W] · 32 bits · Access: 32 · MMReg:0x2320-0x2390
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
CLR_0_R
VAP:VAP_VTX_ST_DISC_FOG · [R/W] · 32 bits · Access: 32 · MMReg:0x2424
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
DISC_FOG
VAP:VAP_VTX_ST_EDGE_FLAGS · [R/W] · 32 bits · Access: 32 · MMReg:0x245c
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
0
0x0
EDGE_FLAGS
VAP:VAP_VTX_ST_END_OF_PKT · [W] · 32 bits · Access: 32 · MMReg:0x24ac
© 2010 Advanced Micro Devices, Inc.
Proprietary
273
Revision 1.5
June 8, 2010
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
END_OF_PKT
VAP:VAP_VTX_ST_NORM_0_PKD · [W] · 32 bits · Access: 32 · MMReg:0x2498
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
NORM_0_PKD
VAP:VAP_VTX_ST_NORM_0_X · [R/W] · 32 bits · Access: 32 · MMReg:0x2310
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
NORM_0_X
VAP:VAP_VTX_ST_NORM_0_Y · [R/W] · 32 bits · Access: 32 · MMReg:0x2314
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
NORM_0_Y
VAP:VAP_VTX_ST_NORM_0_Z · [R/W] · 32 bits · Access: 32 · MMReg:0x2318
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
NORM_0_Z
VAP:VAP_VTX_ST_NORM_1_X · [R/W] · 32 bits · Access: 32 · MMReg:0x2450
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
NORM_1_X
VAP:VAP_VTX_ST_NORM_1_Y · [R/W] · 32 bits · Access: 32 · MMReg:0x2454
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
NORM_1_Y
VAP:VAP_VTX_ST_NORM_1_Z · [R/W] · 32 bits · Access: 32 · MMReg:0x2458
© 2010 Advanced Micro Devices, Inc.
Proprietary
274
Revision 1.5
June 8, 2010
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
NORM_1_Z
VAP:VAP_VTX_ST_PNT_SPRT_SZ · [R/W] · 32 bits · Access: 32 · MMReg:0x2420
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
PNT_SPRT_SZ
VAP:VAP_VTX_ST_POS_0_W_4 · [R/W] · 32 bits · Access: 32 · MMReg:0x230c
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
POS_0_W
VAP:VAP_VTX_ST_POS_0_X_2 · [W] · 32 bits · Access: 32 · MMReg:0x2490
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
POS_0_X_2
VAP:VAP_VTX_ST_POS_0_X_3 · [W] · 32 bits · Access: 32 · MMReg:0x24a0
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
POS_0_X_3
VAP:VAP_VTX_ST_POS_0_X_4 · [R/W] · 32 bits · Access: 32 · MMReg:0x2300
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
POS_0_X
VAP:VAP_VTX_ST_POS_0_Y_2 · [W] · 32 bits · Access: 32 · MMReg:0x2494
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
POS_0_Y_2
VAP:VAP_VTX_ST_POS_0_Y_3 · [W] · 32 bits · Access: 32 · MMReg:0x24a4
© 2010 Advanced Micro Devices, Inc.
Proprietary
275
Revision 1.5
June 8, 2010
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
POS_0_Y_3
VAP:VAP_VTX_ST_POS_0_Y_4 · [R/W] · 32 bits · Access: 32 · MMReg:0x2304
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
POS_0_Y
VAP:VAP_VTX_ST_POS_0_Z_3 · [W] · 32 bits · Access: 32 · MMReg:0x24a8
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
POS_0_Z_3
VAP:VAP_VTX_ST_POS_0_Z_4 · [R/W] · 32 bits · Access: 32 · MMReg:0x2308
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
POS_0_Z
VAP:VAP_VTX_ST_POS_1_W · [R/W] · 32 bits · Access: 32 · MMReg:0x244c
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
POS_1_W
VAP:VAP_VTX_ST_POS_1_X · [R/W] · 32 bits · Access: 32 · MMReg:0x2440
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
POS_1_X
VAP:VAP_VTX_ST_POS_1_Y · [R/W] · 32 bits · Access: 32 · MMReg:0x2444
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
POS_1_Y
VAP:VAP_VTX_ST_POS_1_Z · [R/W] · 32 bits · Access: 32 · MMReg:0x2448
© 2010 Advanced Micro Devices, Inc.
Proprietary
276
Revision 1.5
June 8, 2010
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
POS_1_Z
VAP:VAP_VTX_ST_PVMS · [R/W] · 32 bits · Access: 32 · MMReg:0x231c
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
PVMS
VAP:VAP_VTX_ST_SHININESS_0 · [R/W] · 32 bits · Access: 32 · MMReg:0x2428
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
SHININESS_0
VAP:VAP_VTX_ST_SHININESS_1 · [R/W] · 32 bits · Access: 32 · MMReg:0x242c
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
SHININESS_1
VAP:VAP_VTX_ST_TEX_[0-7]_Q · [R/W] · 32 bits · Access: 32 · MMReg:0x23ac-0x241c
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
TEX_0_Q
VAP:VAP_VTX_ST_TEX_[0-7]_R · [R/W] · 32 bits · Access: 32 · MMReg:0x23a8-0x2418
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
TEX_0_R
VAP:VAP_VTX_ST_TEX_[0-7]_S · [R/W] · 32 bits · Access: 32 · MMReg:0x23a0-0x2410
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
TEX_0_S
VAP:VAP_VTX_ST_TEX_[0-7]_T · [R/W] · 32 bits · Access: 32 · MMReg:0x23a4-0x2414
© 2010 Advanced Micro Devices, Inc.
Proprietary
277
Revision 1.5
June 8, 2010
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
TEX_0_T
VAP:VAP_VTX_ST_USR_CLR_A · [R/W] · 32 bits · Access: 32 · MMReg:0x246c
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
USR_CLR_A
VAP:VAP_VTX_ST_USR_CLR_B · [R/W] · 32 bits · Access: 32 · MMReg:0x2468
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
USR_CLR_B
VAP:VAP_VTX_ST_USR_CLR_G · [R/W] · 32 bits · Access: 32 · MMReg:0x2464
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
USR_CLR_G
VAP:VAP_VTX_ST_USR_CLR_PKD · [W] · 32 bits · Access: 32 · MMReg:0x249c
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
USR_CLR_PKD
VAP:VAP_VTX_ST_USR_CLR_R · [R/W] · 32 bits · Access: 32 · MMReg:0x2460
DESCRIPTION: Data register
Field Name
Bits
Default
Description
DATA_REGISTER
31:0
0x0
USR_CLR_R
© 2010 Advanced Micro Devices, Inc.
Proprietary
278
Revision 1.5
June 8, 2010
11.12 Z Buffer Registers
ZB:ZB_BW_CNTL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4f1c
DESCRIPTION: Z Buffer Band-Width Control
Field Name
Bits
Defa
Description
ult
HIZ_ENABLE
0
0x0
Enables hierarchical Z.
POSSIBLE VALUES:
00 - Hierarchical Z Disabled
01 - Hierarchical Z Enabled
HIZ_MIN
1
0x0
POSSIBLE VALUES:
00 - Update Hierarchical Z with Max value
01 - Update Hierarchical Z with Min value
FAST_FILL
2
0x0
POSSIBLE VALUES:
00 - Fast Fill Disabled
01 - Fast Fill Enabled (ZB_DEPTHCLEARVALUE )
RD_COMP_ENABLE
3
0x0
Enables reading of compressed Z data from memory to the
cache.
POSSIBLE VALUES:
00 - Z Read Compression Disabled
01 - Z Read Compression Enabled
WR_COMP_ENABLE
4
0x0
Enables writing of compressed Z data from cache to memory,
POSSIBLE VALUES:
00 - Z Write Compression Disabled
01 - Z Write Compression Enabled
ZB_CB_CLEAR
5
0x0
This bit is set when the Z buffer is used to help the CB in
clearing a region. Part of the region is cleared by the color
buffer and part will be cleared by the Z buffer. Since the Z
buffer does not have any write masks in the cache, full microtiles need to be written. If a partial micro-tile is touched, then
the un-touched part will be unknowns. The cache will operate
in write-allocate mode and quads will be accumulated in the
cache and then evicted to main memory. The color value is
supplied through the ZB_DEPTHCLEARVALUE register.
POSSIBLE VALUES:
00 - Z unit cache controller does RMW
01 - Z unit cache controller does cache-line granular Write
only
FORCE_COMPRESSED_STENCIL_V 6
ALUE
© 2010 Advanced Micro Devices, Inc.
Proprietary
0x0
Enabling this bit will force all the compressed stencil values
to be equal to
old_stencil_value&~ZB_STENCILREFMASK.stencilwritem
ask |
ZB_STENCILREFMASK.stencilref&ZB_STENCILREFMA
279
Revision 1.5
June 8, 2010
SK.stencilwritemask. This should be enabled during stencil
clears to avoid needless decompression.
POSSIBLE VALUES:
00 - Do not force the compressed stencil value.
01 - Force the compressed stencil value.
ZEQUAL_OPTIMIZE_DISABLE
7
0x0
By default this is 0 (enabled). When NEWZ=OLDZ, then
writes do not occur to save BW.
POSSIBLE VALUES:
00 - Enable not updating the Z buffer if NewZ=OldZ
01 - Disable above feature (in case there is a bug)
SEQUAL_OPTIMIZE_DISABLE
8
0x0
By default this is 0 (enabled). When
NEW_STENCIL=OLD_STENCIL, then writes do not occur
to save BW.
POSSIBLE VALUES:
00 - Enable not updating the Stencil buffer if NewS=OldS
01 - Disable above feature (in case there is a bug)
BMASK_DISABLE
10
0x0
Controls whether bytemasking is used or not.
POSSIBLE VALUES:
00 - Enable bytemasking
01 - Disable bytemasking
HIZ_EQUAL_REJECT_ENABLE
11
0x0
Enables hiz rejects when the z function is equals.
POSSIBLE VALUES:
00 - Disable
01 - Enable
HIZ_FP_EXP_BITS
14: 0x0
12
Number of exponent bits to use for the hiz floating point
format. Values 0 to 5 are legal. 0 will disable the floating
point hiz encoding.
HIZ_FP_INVERT
15
Determines whether leading zeros or ones are eliminated.
0x0
POSSIBLE VALUES:
00 - Count leading 1s
01 - Count leading 0s
TILE_OVERWRITE_RECOMPRESSI
ON_DISABLE
16
0x0
The zb tries to detect single plane equations that completely
overwrite a compressed tile. This allows the tile to jump from
the decompressed state to the fully compressed state.
POSSIBLE VALUES:
00 - Enable tile overwrite recompression
01 - Disable tile overwrite recompression
CONTIGUOUS_6XAA_SAMPLES_DI 17
SABLE
0x0
This disables storing samples contiguously in 6xaa.
POSSIBLE VALUES:
00 - Enable contiguous samples
01 - Disable contiguous samples
© 2010 Advanced Micro Devices, Inc.
Proprietary
280
Revision 1.5
PEQ_PACKING_ENABLE
18
0x0
June 8, 2010
Enables packing of the plane equations to eliminate wasted
peq slots.
POSSIBLE VALUES:
00 - Disable
01 - Enable
COVERED_PTR_MASKING_ENABL 19
E
0x0
Enables discarding of pointers from pixels that are going to be
covered. This reduces the apparent number of plane equations
in use.
POSSIBLE VALUES:
00 - Disable
01 - Enable
ZB:ZB_CNTL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4f00
DESCRIPTION: Z Buffer Control
Field Name
Bits
Default Description
STENCIL_ENABLE
0
0x0
Enables stenciling.
POSSIBLE VALUES:
00 - Disabled
01 - Enabled
Z_ENABLE
1
0x0
Enables Z functions.
POSSIBLE VALUES:
00 - Disabled
01 - Enabled
ZWRITEENABLE
2
0x0
Enables writing of the Z buffer.
POSSIBLE VALUES:
00 - Disable
01 - Enable
ZSIGNED_COMPARE
3
0x0
Enable signed Z buffer comparison , for W-buffering.
POSSIBLE VALUES:
00 - Disable
01 - Enable
STENCIL_FRONT_BACK
© 2010 Advanced Micro Devices, Inc.
Proprietary
4
0x0
When STENCIL_ENABLE is set, setting
STENCIL_FRONT_BACK bit to one specifies that
stencilfunc/stencilfail/stencilzpass/stencilzfail registers are
used if the quad is generated from front faced primitive
and
stencilfunc_bf/stencilfail_bf/stencilzpass_bf/stencilzfail_bf
are used if the quad is generated from a back faced
primitive. If the STENCIL_FRONT_BACK is not set,
then stencilfunc/stencilfail/stencilzpass/stencilzfail
registers determine the operation independent of the
front/back face state of the quad.
281
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - Disable
01 - Enable
ZSIGNED_MAGNITUDE
5
0x0
Specifies the signed number type to use for the Z buffer
comparison. This only has an effect when
ZSIGNED_COMPARE is enabled.
POSSIBLE VALUES:
00 - Twos complement
01 - Signed magnitude
STENCIL_REFMASK_FRONT_BACK 6
0x0
POSSIBLE VALUES:
00 - Disable
01 - Enable
ZB:ZB_DEPTHCLEARVALUE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4f28
DESCRIPTION: Z Buffer Clear Value
Field Name
Bits
Default
Description
DEPTHCLEARVALUE
31:0
0x0
When a block has a Z Mask value of 0, all Z values in
that block are cleared to this value. In 24bpp, the stencil
value is also updated regardless of whether it is enabled
or not.
ZB:ZB_DEPTHOFFSET · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4f20
DESCRIPTION: Z Buffer Address Offset
Field Name
Bits
Default
Description
DEPTHOFFSET
31:5
0x0
2K aligned Z buffer address offset for macro tiles.
ZB:ZB_DEPTHPITCH · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4f24
DESCRIPTION: Z Buffer Pitch and Endian Control
Field Name
Bits
Default
Description
DEPTHPITCH
13:2
0x0
Z buffer pitch in multiples of 4 pixels.
DEPTHMACROTILE
16
0x0
Specifies whether Z buffer is macro-tiled. macro-tiles are
2K aligned
POSSIBLE VALUES:
00 - macro tiling disabled
01 - macro tiling enabled
DEPTHMICROTILE
18:17
0x0
Specifies whether Z buffer is micro-tiled. micro-tiles is
32 bytes
POSSIBLE VALUES:
00 - 32 byte cache line is linear
© 2010 Advanced Micro Devices, Inc.
Proprietary
282
Revision 1.5
June 8, 2010
01 - 32 byte cache line is tiled
02 - 32 byte cache line is tiled square (only applies to
16-bit pixels)
03 - Reserved
DEPTHENDIAN
20:19
0x0
Specifies endian control for the Z buffer.
POSSIBLE VALUES:
00 - No swap
01 - Word swap
02 - Dword swap
03 - Half Dword swap
ZB:ZB_DEPTHXY_OFFSET · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4f60
DESCRIPTION: Depth buffer X and Y coordinate offset
Field Name
Bits
Default
Description
DEPTHX_OFFSET
11:1
0x0
X coordinate offset. multiple of 32 . Bits 4:0 have to be
zero
DEPTHY_OFFSET
27:17
0x0
Y coordinate offset. multiple of 32 . Bits 4:0 have to be
zero
ZB:ZB_FIFO_SIZE · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4fd0
DESCRIPTION: Sets the fifo sizes
Field Name
Bits
Default
Description
OP_FIFO_SIZE
1:0
0x0
Determines the size of the op fifo
POSSIBLE VALUES:
00 - Full size
01 - 1/2 size
02 - 1/4 size
03 - 1/8 size
ZB:ZB_FORMAT · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4f10
DESCRIPTION: Format of the Data in the Z buffer
Field Name
Bits
Default
Description
DEPTHFORMAT
3:0
0x0
Specifies the format of the Z buffer.
POSSIBLE VALUES:
00 - 16-bit Integer Z
01 - 16-bit compressed 13E3
02 - 24-bit Integer Z, 8 bit Stencil (LSBs)
03 - RESERVED
04 - RESERVED
05 - RESERVED
06 - RESERVED
© 2010 Advanced Micro Devices, Inc.
Proprietary
283
Revision 1.5
June 8, 2010
07 - RESERVED
08 - RESERVED
09 - RESERVED
10 - RESERVED
11 - RESERVED
12 - RESERVED
13 - RESERVED
14 - RESERVED
15 - RESERVED
INVERT
4
0x0
POSSIBLE VALUES:
00 - in 13E3 format , count leading 1`s
01 - in 13E3 format , count leading 0`s.
PEQ8
5
0x0
This bit is unused
ZB:ZB_HIZ_DWORD · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4f4c
DESCRIPTION: Hierarchical Z Data
Field Name
Bits
Default
Description
HIZ_DWORD
31:0
0x0
This DWORD contains 8-bit values for 4 blocks..
Reading this register causes a read from the address
pointed to by RDINDEX. Writing to this register causes
a write to the address pointed to by WRINDEX.
ZB:ZB_HIZ_OFFSET · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4f44
DESCRIPTION: Hierarchical Z Memory Offset
Field Name
Bits
Default
Description
HIZ_OFFSET
17:2
0x0
DWORD offset into HiZ RAM.
ZB:ZB_HIZ_PITCH · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4f54
DESCRIPTION: Hierarchical Z Pitch
Field Name
Bits
Default
Description
HIZ_PITCH
13:4
0x0
Pitch used in HiZ address computation.
ZB:ZB_HIZ_RDINDEX · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4f50
DESCRIPTION: Hierarchical Z Read Index
Field Name
Bits
Default
Description
HIZ_RDINDEX
17:2
0x0
Read index into HiZ RAM.
ZB:ZB_HIZ_WRINDEX · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4f48
DESCRIPTION: Hierarchical Z Write Index
Field Name
© 2010 Advanced Micro Devices, Inc.
Proprietary
Bits
Default
Description
284
Revision 1.5
HIZ_WRINDEX
17:2
0x0
June 8, 2010
Self-incrementing write index into the HiZ RAM.
Starting write index must start on a DWORD boundary.
Each time ZB_HIZ_DWORD is written, this index will
autoincrement. HIZ_OFFSET and HIZ_PITCH are not
used to compute read/write address to HIZ ram, when it
is accessed through WRINDEX and DWORD
ZB:ZB_STENCILREFMASK · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4f08
DESCRIPTION: Stencil Reference Value and Mask
Field Name
Bits
Default
Description
STENCILREF
7:0
0x0
Specifies the reference stencil value.
STENCILMASK
15:8
0x0
This value is ANDed with both the reference and the
current stencil value prior to the stencil test.
STENCILWRITEMASK
23:16
0x0
Specifies the write mask for the stencil planes.
ZB:ZB_STENCILREFMASK_BF · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4fd4
DESCRIPTION: Stencil Reference Value and Mask for backfacing quads
Field Name
Bits
Default
Description
STENCILREF
7:0
0x0
Specifies the reference stencil value.
STENCILMASK
15:8
0x0
This value is ANDed with both the reference and the
current stencil value prior to the stencil test.
STENCILWRITEMASK
23:16
0x0
Specifies the write mask for the stencil planes.
ZB:ZB_ZCACHE_CTLSTAT · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4f18
DESCRIPTION: Z Buffer Cache Control/Status
Field Name
Bits
Default
Description
ZC_FLUSH
0
0x0
Setting this bit flushes the dirty data from the Z cache.
Unless ZC_FREE bit is also set, the tags in the cache
remain valid. A purge is achieved by setting both
ZC_FLUSH and ZC_FREE. This is a sticky bit and it
clears itself at the end of the operation.
POSSIBLE VALUES:
00 - No effect
01 - Flush and Free Z cache lines
ZC_FREE
1
0x0
Setting this bit invalidates the Z cache tags. Unless
ZC_FLUSH bit is also set, the cachelines are not written
to memory. A purge is achieved by setting both
ZC_FLUSH and ZC_FREE. This is a sticky bit that
clears itself at the end of the operation.
POSSIBLE VALUES:
00 - No effect
© 2010 Advanced Micro Devices, Inc.
Proprietary
285
Revision 1.5
June 8, 2010
01 - Free Z cache lines (invalidate)
ZC_BUSY
31
0x0
This bit is unused ...
POSSIBLE VALUES:
00 - Idle
01 - Busy
ZB:ZB_ZPASS_ADDR · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4f5c
DESCRIPTION: Z Buffer Z Pass Counter Address
Field Name
Bits
Default
Description
ZPASS_ADDR
31:2
0x0
Writing this location with a DWORD address causes the
value in ZB_ZPASS_DATA to be written to main
memory at the location pointed to by this address.
NOTE: R300 has 2 pixel pipes. Broadcasting this address
causes both pipes to write their ZPASS value to the same
address. There is no guarantee which pipe will write last.
So when writing to this register, the GA needs to be
programmed to send the write command to pipe 0. Then
a different address needs to be written to pipe 1. Then
both pipes should be enabled for further register writes.
ZB:ZB_ZPASS_DATA · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4f58
DESCRIPTION: Z Buffer Z Pass Counter Data
Field Name
Bits
Default
Description
ZPASS_DATA
31:0
0x0
Contains the number of Z passed pixels since the last
write to this location. Writing this location resets the
count to the value written.
ZB:ZB_ZSTENCILCNTL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4f04
DESCRIPTION: Z and Stencil Function Control
Field Name
Bits
Default
Description
ZFUNC
2:0
0x0
Specifies the Z function.
POSSIBLE VALUES:
00 - Never
01 - Less
02 - Less or Equal
03 - Equal
04 - Greater or Equal
05 - Greater Than
06 - Not Equal
07 - Always
STENCILFUNC
© 2010 Advanced Micro Devices, Inc.
Proprietary
5:3
0x0
Specifies the stencil function.
286
Revision 1.5
June 8, 2010
POSSIBLE VALUES:
00 - Never
01 - Less
02 - Less or Equal
03 - Equal
04 - Greater or Equal
05 - Greater
06 - Not Equal
07 - Always
STENCILFAIL
8:6
0x0
Specifies the stencil value to be written if the stencil test
fails.
POSSIBLE VALUES:
00 - Keep: New value = Old value
01 - Zero: New value = 0
02 - Replace: New value = STENCILREF
03 - Increment: New value++ (clamp)
04 - Decrement: New value-- (clamp)
05 - Invert new value: New value = !Old value
06 - Increment: New value++ (wrap)
07 - Decrement: New value-- (wrap)
STENCILZPASS
11:9
0x0
Same encoding as STENCILFAIL. Specifies the stencil
value to be written if the stencil test passes and the Z test
passes (or is not enabled).
STENCILZFAIL
14:12
0x0
Same encoding as STENCILFAIL. Specifies the stencil
value to be written if the stencil test passes and the Z test
fails.
STENCILFUNC_BF
17:15
0x0
Same encoding as STENCILFUNC. Specifies the stencil
function for back faced quads , if
STENCIL_FRONT_BACK = 1.
STENCILFAIL_BF
20:18
0x0
Same encoding as STENCILFAIL. Specifies the stencil
value to be written if the stencil test fails for back faced
quads, if STENCIL_FRONT_BACK = 1
STENCILZPASS_BF
23:21
0x0
Same encoding as STENCILFAIL. Specifies the stencil
value to be written if the stencil test passes and the Z test
passes (or is not enabled) for back faced quads, if
STENCIL_FRONT_BACK = 1
STENCILZFAIL_BF
26:24
0x0
Same encoding as STENCILFAIL. Specifies the stencil
value to be written if the stencil test passes and the Z test
fails for back faced quads, if STENCIL_FRONT_BACK
=1
ZERO_OUTPUT_MASK
27
0x0
Zeroes the zb coverage mask output. This does not affect
the updating of the depth or stencil values.
POSSIBLE VALUES:
00 - Disable
01 - Enable
© 2010 Advanced Micro Devices, Inc.
Proprietary
287
Revision 1.5
June 8, 2010
ZB:ZB_ZTOP · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x4f14
Field Name
Bits
Default
Description
ZTOP
0
0x0
POSSIBLE VALUES:
00 - Z is at the bottom of the pipe, after the fog unit.
01 - Z is at the top of the pipe, after the scan unit.
© 2010 Advanced Micro Devices, Inc.
Proprietary
288