Freescale Semiconductor, Inc.
ALTIVECWP/D
Motorola’s AltiVec™ Technology
Sam Fuller
System Architecture & Product Planning Manager,
Networking & Computing Core Technologies
Motorola Inc.
Semiconductor Product Sector
6501 William Cannon Drive West, Austin, Texas 78735
Introduction
Over the last 25 years, microprocessors have enjoyed a continuous increase in performance and attendant reduction in
price/performance. Current best-of-breed microprocessors
operate at frequencies in excess of 300 MHz and offer superscalar instruction dispatch, sophisticated branch prediction
techniques and support for high performance memory systems including external second level cache controllers.
As general purpose microprocessors have continued to
become more powerful, they have been asked to perform
increasingly complex tasks. In fact, the trend of doubling
system performance every 1.5 to 2 years has not met the
requirements of the networking and telecommunications
infrastructure industry due to several emerging applications
and trends. Examples include the explosive
growth of the Internet; the emergence of new digital communications technologies, including digital cellular phones
employing CDMA, TDMA and PCS technologies; IP-based
telephony, fax and multimedia; and wireless messaging. A
general trend in the industry is the use of programmable processors to implement adaptive filters, modulators/demodulators, and other functions once only possible in hardware.
These trends and applications have created tremendous
opportunities for high-performance, high-bandwidth processors. These demanding new applications, along with the continually increasing needs of the computing market, necessitate a new approach to maximizing performance,
one that gives customers the order-of-magnitude
increase in key application performance they demand.
To meet these needs, a new class of microprocessor product is
called for: one that offers, in a single-chip solution, the highest
level of processing performance while expanding the
processor’s capabilities to concurrently address high-bandwidth data processing and the algorithmically intensive computations that today are typically handled off-chip by other
devices, such as dedicated hardware, DSP farms or custom
ASICs. Motorola is introducing a new technology that provides for this convergence in capabilities: AltiVec technology.
Figure 1. High-level structural overview for PowerPC with
AltiVec technology (diagram labels: Branch Unit; Integer Unit with GPRs; Floating-Point Unit with FPRs; Vector Unit with VRs; INST, ADDR and DATA paths; Memory)
AltiVec technology is Motorola’s high-performance vector
parallel processing expansion to the PowerPC™ RISC
processor architecture. Motorola microprocessors offering
AltiVec technology will represent a new class of product. In
addition to providing 100% compatibility with the industry-standard PowerPC Architecture™, AltiVec technology will
also provide product designers and customers with a new
“one part—one code base” approach to product design
which simplifies design and support while simultaneously
providing a tremendous jump in performance.
For More Information On This Product,
Go to: www.freescale.com
Figure 2. Generic presentation of a four operand, 16-element, intra-element operation (diagram: source registers vA, vB and vC feed 16 parallel "op" lanes, each writing the corresponding element of the destination register vT)
AltiVec Technology
Motorola's AltiVec technology expands the current
PowerPC architecture through the addition of a 128-bit
vector execution unit, which operates concurrently with the
existing integer and floating-point units. This new engine
provides for highly parallel operations, allowing for the
simultaneous execution of up to 16 operations in a single
clock cycle.
AltiVec technology is a short vector parallel architecture.
Depending on data size, vectors are 4, 8 or 16 elements long.
This can be contrasted with the long vector architectures of
supercomputers that were popular in the 1980s. Vector sizes
for those machines ranged to hundreds of elements. The long
vector approach of supercomputers, while useful for scientific calculations, is not optimal for the communications,
multimedia and other performance-driven applications targeted by Motorola with AltiVec technology.
AltiVec technology operations are performed on multiple
data elements by a single instruction. This is often referred
to as SIMD (single instruction, multiple data) parallel processing. AltiVec technology offers support for:
• 16-way parallelism for 8-bit signed and unsigned integers
and characters,
• 8-way parallelism for 16-bit signed and unsigned integers,
• 4-way parallelism for 32-bit signed and unsigned integers
and IEEE floating-point numbers.
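The relationship between element width and degree of parallelism can be sketched in portable C. The union below is only an illustrative model of one 128-bit register, not an AltiVec data type; the names vreg128 and lanes_for_width are hypothetical, and the model assumes a platform where float is a 32-bit IEEE single-precision value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit vector register (not an AltiVec type).
 * Every member overlays the same 16 bytes of storage. */
typedef union {
    uint8_t  u8[16];   /* 16 lanes of 8 bits */
    uint16_t u16[8];   /*  8 lanes of 16 bits */
    uint32_t u32[4];   /*  4 lanes of 32 bits */
    float    f32[4];   /*  4 lanes of single-precision floating point
                          (assumes a 32-bit float) */
} vreg128;

/* Number of lanes that fit in 128 bits for a given element width. */
unsigned lanes_for_width(unsigned element_bits)
{
    assert(element_bits == 8 || element_bits == 16 || element_bits == 32);
    return 128u / element_bits;
}
```

For example, lanes_for_width(16) returns 8, matching the 8-way parallelism listed above for 16-bit integers.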
Figure 3. Sum Across: an inter-element arithmetic operation (diagram: source registers vA and vB, an adder, and destination register vT)
AltiVec technology also includes a separate register file
containing 32 entries, each 128 bits wide. These 128-bit
wide registers hold the data sources for the AltiVec technology execution units. The registers are loaded and
unloaded through vector load and vector store instructions,
each of which transfers the contents of a single 128-bit register
from or to memory.
AltiVec technology can be most accurately thought of as a
set of registers and execution units added to the PowerPC
architecture in an analogous manner to the addition of floating-point units. Floating-point units were added to most
mainstream microprocessor architectures several years ago
to provide better support for high-precision scientific calculations. AltiVec technology is being added to the PowerPC
architecture to dramatically accelerate the next level of performance-driven, high-bandwidth communications and
computing applications.
Each AltiVec instruction specifies up to three source
operands and a single destination operand. All operands are
vector registers, with the exception of the load and store
instructions and a few instruction types that provide
operands from immediate fields within the instruction. 162
new unique instructions are defined for the AltiVec technology. These instructions fall into the following major classes.
1. Intra-Element Arithmetic Operations
Intra-element arithmetic operations perform independent
parallel computations on the elements contained in the
source vector registers and place the results in the corresponding fields of the destination vector register. Both signed
and unsigned integers and floating-point data types are supported by the intra-element operations. The operations support both saturation and modulo arithmetic. A variety of
powerful intra-element operations are defined in the AltiVec
technology: addition, subtraction, multiply, and multiply-add.
Additional instructions perform min, max and average,
as well as conversion between floating-point and 32-bit integer numerical formats.
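The difference between modulo and saturation arithmetic is easiest to see on a small case. The following is a minimal scalar sketch in portable C of a 16-lane unsigned byte addition in both forms; the function names are hypothetical, and each loop iteration stands in for one lane that the vector unit would process in parallel.

```c
#include <assert.h>
#include <stdint.h>

/* Modulo add: each 8-bit lane wraps around on overflow (result mod 256). */
void add_modulo_u8x16(const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
    assert(a && b && out);
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)(a[i] + b[i]);
}

/* Saturating add: each 8-bit lane clamps at the largest value, 255. */
void add_saturate_u8x16(const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
    assert(a && b && out);
    for (int i = 0; i < 16; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        out[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Adding 200 and 100 in a lane gives 44 in the modulo form and 255 in the saturating form. Saturation is usually the better fit for pixel and audio samples, where wrap-around would show up as a visible or audible artifact.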
2. Intra-Element Non-Arithmetic Operations
Intra-element non-arithmetic operations include various
forms of compare, shift, and rotate. The following logical
operations are also supported: AND, OR, NOT, XOR,
AND-NOT. A select instruction is also provided. This
instruction is designed to select or choose source data from
one of two source registers and transfer that data to the
results register. The combination of compare and select provides a powerful way to mask and replace data elements
across the entire 16-byte field of the vector registers with
very few instructions.
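As an illustration of the compare-and-select idiom, the sketch below clamps 16 unsigned bytes to a ceiling without any branch on the data. It is a scalar model in portable C with hypothetical function names, and it assumes that a compare writes an all-ones or all-zeros mask per element and that select chooses bit by bit between its two sources under control of that mask.

```c
#include <assert.h>
#include <stdint.h>

/* Compare greater-than: each lane of mask becomes 0xFF where a > b, else 0x00. */
void cmpgt_u8x16(const uint8_t a[16], const uint8_t b[16], uint8_t mask[16])
{
    assert(a && b && mask);
    for (int i = 0; i < 16; i++)
        mask[i] = (a[i] > b[i]) ? 0xFFu : 0x00u;
}

/* Select: take bits from b where mask is 1 and from a where mask is 0. */
void select_u8x16(const uint8_t a[16], const uint8_t b[16],
                  const uint8_t mask[16], uint8_t out[16])
{
    assert(a && b && mask && out);
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)((a[i] & (uint8_t)~mask[i]) | (b[i] & mask[i]));
}

/* Clamp every lane of x to at most the matching lane of limit. */
void clamp_max_u8x16(const uint8_t x[16], const uint8_t limit[16], uint8_t out[16])
{
    uint8_t mask[16];
    cmpgt_u8x16(x, limit, mask);        /* which lanes exceed the limit? */
    select_u8x16(x, limit, mask, out);  /* replace only those lanes */
}
```

On a vector unit the two steps would each be a single instruction covering all 16 lanes, which is what makes the pattern attractive for thresholding and conditional replacement.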
3. Inter-Element Arithmetic Operations
A few special inter-element arithmetic operations are provided in the AltiVec technology: sum of
products and sum across. These operations allow for elements within a single vector register to be summed in combination with a separate accumulation register. These operations are valuable for generating dot products, which are the
most common vector operation.
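A dot product built from these two steps can be sketched as follows. This is a simplified scalar model in portable C, not the exact instruction semantics: the function names are hypothetical, the lane pairing in the sum of products is an assumption, the accumulation wraps modulo 2^32, and the final sum across saturates to the 32-bit signed range.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of products: each 32-bit lane accumulates two adjacent 16-bit products.
 * The addition is done in unsigned arithmetic so it wraps modulo 2^32. */
void multiply_sum_s16x8(const int16_t a[8], const int16_t b[8],
                        const int32_t acc[4], int32_t out[4])
{
    assert(a && b && acc && out);
    for (int w = 0; w < 4; w++) {
        int32_t p0 = (int32_t)a[2 * w] * b[2 * w];
        int32_t p1 = (int32_t)a[2 * w + 1] * b[2 * w + 1];
        out[w] = (int32_t)((uint32_t)acc[w] + (uint32_t)p0 + (uint32_t)p1);
    }
}

/* Sum across: fold four 32-bit lanes and a running total into one value,
 * saturating to the 32-bit signed range. */
int32_t sum_across_s32x4(const int32_t v[4], int32_t running_total)
{
    assert(v);
    int64_t sum = running_total;
    for (int i = 0; i < 4; i++)
        sum += v[i];
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Dot product of two 8-element vectors of 16-bit integers. */
int32_t dot_s16x8(const int16_t a[8], const int16_t b[8])
{
    const int32_t zero[4] = {0, 0, 0, 0};
    int32_t partial[4];
    multiply_sum_s16x8(a, b, zero, partial);
    return sum_across_s32x4(partial, 0);
}
```

For longer vectors, the partial sums would be carried from one multiply_sum_s16x8 call to the next through the acc argument, with a single sum across at the very end.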
4. Inter-Element Non-Arithmetic Operations
In addition to the powerful intra-element and inter-element
arithmetic operations, AltiVec technology also defines a
group of very powerful inter-element non-arithmetic operations. These inter-element operations include wide field shift
operations, pack and unpack operations, including a special
operation to handle the 1/5/5/5 pixel format common for
16-bit color pixels. Merge operations are also provided that
can interleave data at the byte, halfword and word level.
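A byte-level merge can be modeled in a few lines of portable C. The sketch below interleaves the first eight bytes of two 16-byte sources; the function name is hypothetical, and which half of each source is merged is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a byte merge: interleave the first eight bytes of a and b.
 * out must not overlap a or b. */
void merge_first_u8x16(const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
    assert(a && b && out && out != a && out != b);
    for (int i = 0; i < 8; i++) {
        out[2 * i]     = a[i];  /* even positions come from a */
        out[2 * i + 1] = b[i];  /* odd positions come from b */
    }
}
```

A typical use would be combining two separately stored channels, such as left and right audio samples, into one interleaved stream.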
Perhaps the most powerful inter-element operation offered
in the AltiVec technology is the permute operation. The permute operation is capable of arbitrarily selecting data with
the granularity of a byte from two 16-byte source registers
into a single 16-byte destination register.
For operations where 8- and 16-bit data items must be
reorganized in memory before or after computations, permute can save significant time. In many instances a single
permute operation can operate on 16 bytes of data and
replace the 4 or 5 operations per byte that a traditional RISC
or DSP instruction set would require.
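The behavior of permute can be modeled as a table lookup over the 32 bytes of the two sources. The following is a minimal scalar sketch in portable C; the function name is hypothetical, and it assumes that the low five bits of each control byte index the concatenation of the two source registers, first source first.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a byte permute: out[i] is the byte of the 32-byte
 * concatenation (a followed by b) named by the low 5 bits of control[i].
 * out must not overlap a or b. */
void permute_u8x16(const uint8_t a[16], const uint8_t b[16],
                   const uint8_t control[16], uint8_t out[16])
{
    assert(a && b && control && out && out != a && out != b);
    for (int i = 0; i < 16; i++) {
        unsigned index = control[i] & 0x1Fu;  /* 5 bits address 32 source bytes */
        out[i] = (index < 16u) ? a[index] : b[index - 16u];
    }
}
```

If a holds the bytes 0x00 to 0x0F and b holds 0x10 to 0x1F, the output equals the control vector itself, which makes a convenient self-check. The same routine covers byte reversal, de-interleaving and table lookup simply by changing the control vector.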
The powerful inter-element operations of AltiVec technology
define a microprocessor not just capable of operating on 8,
16 and 32-bit data elements in parallel but of operating on
data 128 bits (16 bytes) at a time.
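Figure 4. The inter-element Permute operation (diagram: each byte of the control vector vC, shown as 01 14 18 10 16 15 19 1A 1C 1C 1C 13 08 1D 1B 0E, selects one byte from vA, bytes 00 to 0F, or vB, bytes 10 to 1F, for the corresponding byte of vT)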
Applications of AltiVec Technology
The initial target applications for AltiVec technology
include: IP telephony gateways, multi-channel modems,
speech processing systems, echo cancelers, image and video
processing systems, scientific array processing systems, as
well as network infrastructure such as Internet routers and
virtual private network servers.
In addition to accelerating next-generation applications,
AltiVec technology can, through its wide datapaths and wide
field operations, also accelerate many time-consuming traditional computing and embedded processing operations such
as memory copies, string compares and page clears.
Figure 5. Typical controller plus DSP system (diagram labels: a controller and nine DSPs on a common bus, with interface circuits and memory; regions marked Communication, Control and Computation)
Figure 6. System using multiple PowerPC processors with
AltiVec technology, sharing a common bus bridged to
shared memory (diagram labels: two PowerPC processors with AltiVec technology, interface circuits such as the Motorola MPC860 PowerQUICC™ controller, and memory; regions marked Communication and Control & Computation)
Unlike fixed function solutions, which are most often implemented as application specific integrated circuits, AltiVec
technology will offer a programmable solution that can easily migrate via software upgrades to follow changing standards and customer requirements. The preferred programming environment is the C and C++ languages favored by
embedded systems developers. To more easily express the
parallelism presented by AltiVec technology, Motorola has
developed a standardized set of C/C++ language extensions.
These language extensions allow a software developer to use
their preferred C/C++ development environment and language syntax while explicitly taking advantage of the parallel functional units and other facilities offered by the AltiVec
technology. Motorola is working with leading tools
providers to develop simulators, assemblers, linkers and
compilers to assure full support for the AltiVec technology.
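To give a flavor of what code written with such language extensions can look like, the sketch below performs a saturating add of eight 16-bit lanes. The names altivec.h, vec_ld, vec_adds, vec_st and the __ALTIVEC__ macro are not taken from this paper; they are assumptions based on the AltiVec C interface as later shipped in compilers such as GCC, and the function name add_sat_s16x8 is hypothetical. A plain C fallback is used when the compiler does not target AltiVec, so the same source builds on any platform.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __ALTIVEC__
#include <altivec.h>   /* assumed header for the C/C++ vector extensions */
#endif

/* Saturating add of eight signed 16-bit lanes: out[i] = sat(a[i] + b[i]).
 * On the AltiVec path all three pointers must be 16-byte aligned. */
void add_sat_s16x8(const int16_t *a, const int16_t *b, int16_t *out)
{
    assert(a && b && out);
#ifdef __ALTIVEC__
    vector signed short va = vec_ld(0, a);  /* load 128 bits from a */
    vector signed short vb = vec_ld(0, b);  /* load 128 bits from b */
    vec_st(vec_adds(va, vb), 0, out);       /* add with saturation, store */
#else
    /* Portable fallback: one loop iteration per lane. */
    for (int i = 0; i < 8; i++) {
        int32_t sum = (int32_t)a[i] + (int32_t)b[i];
        if (sum > INT16_MAX) sum = INT16_MAX;
        if (sum < INT16_MIN) sum = INT16_MIN;
        out[i] = (int16_t)sum;
    }
#endif
}
```

Keeping the vector path behind a single function boundary is one way to preserve a common code base across processors with and without the vector unit.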
While the initial PowerPC microprocessor utilizing AltiVec
technology will target very high-performance applications in
networking and computing, subsequent Motorola processors with AltiVec technology could address markets and
applications in which performance must be balanced with
power, price and peripheral integration.
A New Design Model
The introduction of processors containing AltiVec technology creates a new model of system design for high-performance embedded systems. Historically, many high-performance
embedded applications have contained a combination
of a single RISC processor performing the system control
function and one or more DSPs or ASICs performing specialized computations.
The single RISC processor plus multiple DSP system has a
number of disadvantages, including two different architectures, code bases, hardware types, and debug environments.
Additionally, DSPs have not been on the same performance growth curve as general purpose processors; for
example, they often require users to switch to newer, non-compatible architectures from generation to generation.
As a result, even minor upgrades in a customer’s product performance
have often required major hardware redesigns, frequently including
a change of DSP or controller architecture, with the attendant
cost and time-to-market impact.
AltiVec technology-based systems can provide more capable
single architecture systems, often at lower cost, power budget, and physical area than controller plus DSP solutions.
The use of a single high-performance device for controller
and signal processing functions results in quicker time to
market and lower overall engineering cost. A single architecture solution simplifies the development task for both
the hardware and software engineer.
Summary
With the introduction of AltiVec technology, Motorola is
demonstrating its commitment to the PowerPC architecture
and to meeting the requirements of next generation networking, communications and computing applications.
AltiVec technology will expand the PowerPC microprocessor capability by providing leading edge general purpose
processing performance while concurrently addressing high-bandwidth data processing and algorithmically intensive computations in a single-chip solution. This new class of
processor will provide an aggressive performance growth
path for embedded and computing systems designers, while
lowering development barriers inherent in multiple architecture designs, thereby reducing the time to market and total
system development expense.
©1998 Motorola, Inc. All rights reserved. Printed in the U.S.A. Motorola and the
are registered trademarks and AltiVec and the AltiVec logo are trademarks of Motorola, Inc. PowerPC, the PowerPC logo and PowerPC Architecture are trademarks of
International Business Machines Corporation and used by Motorola, Inc. under license therefrom. This document contains information on a new product under development. Specifications and information herein are subject to change without notice.