ETC CPPM00WP100

Freescale Semiconductor, Inc.
1HWZRUN3URFHVVRU3URJUDPPLQJ0RGHOV
7KH.H\WR$FKLHYLQJ)DVWHU7LPHWR0DUNHW
DQG([WHQGLQJ3URGXFW/LIH
Freescale Semiconductor, Inc...
The design of the network equipment powering the Internet revolution has undergone profound
changes over the last decade. Today, with network equipment vendors racing to provide the
new converged voice/video/data communications infrastructure, designers require both speed
and flexibility to deliver within the highest time-to-market pressures the industry has ever seen.
Powerful new network processors are challenging traditional network device design methodologies by enabling software implementations of virtually all key communications functions at hardware speeds. Key to this revolution is the programming models that enable designers to
implement the communications processing tasks on these processors. This paper explores
these models, and their effects on delivering on the promise of a new and better network
device design process.
,QGXVWU\,PSHUDWLYHV7LPHWR0DUNHW$QG7LPHLQ0DUNHW
%\'DYLG+XVDN
&3RUW&IRXQGHUDQG
&KLHI7HFKQLFDO2IILFHU
DQG
5REHUW*RKQ
&3RUW9LFH3UHVLGHQW
0DUNHWLQJ
Just as the Internet revolution is forever changing the face of public communication networks,
the way products that make up these networks are designed is also changing. Network equipment developers have consistently faced a difficult trade-off: performance requirements
demand hardware implementations of data forwarding functions, while new features, such as
advanced Quality of Service (QoS), require flexibility that only software can deliver. Designers
have been forced to revisit the fundamental hardware/ software trade-off with each new
product (or even line card) they develop, sacrificing software reuse between product lines and
product generations along the way. The result has been longer time-to-market, higher development costs, and shorter product lifetimes. Companies trying to compete in “internet time” can
no longer afford this type of product development.
The network processor, a new type of semiconductor device, is changing the dynamics of the
speed versus flexibility trade-off by enabling virtually all communications functions to be software programmable without sacrificing “hardware” speeds. These processors eliminate the
high-risk, long development cycles of custom hardware by enabling advanced product features
to be delivered completely in software, even long after initial product introduction. This allows
network equipment vendors to concentrate precious development resources on delivering
advanced services to their customers, rather than just the latest “feeds and speeds”.
The best network processors form the foundation of a “communications platform” that
contains the key elements required to radically transform the network device design process.
For example, Motorola’s Smart Networks Platform combines advanced network processor technology, “standard” programming interfaces, communications software components (from
C-Port and Motorola alliances) and a comprehensive development environment. This enables
network equipment vendors to quickly bring to market a wide array of different products based
on the same hardware and software architecture. The result is significantly faster time-tomarket for new products, and dramatically longer time-in-market (through the use of software
upgrades to deliver new, advanced services that extend the product life cycle). See Figure 1.
)LJXUH ASICs versus Network Processors Product Life Cycles
Point
Product
World
(ASICs)
:KLWH
3DSHU
Point
Product
Development
Point
Product
Lifetime
Time
Open
Platform
World
(Network
Processors)
Product
Develop.
Open Platform Product Lifetime
S/W
S/W
S/W
Time
For More Information On This Product,
Go to: www.freescale.com
S/W
S/W
Freescale Semiconductor, Inc.
2
But not all network processor architectures can support the
platform model. The communications platform requires more
than a reasonable “merchant silicon” point-product alternative
to ASIC design. With so much of the platform value riding on
the programmability of the devices, the network processor
programming model is a key metric by which these solutions
must be evaluated.
Network processors are specifically designed to bring programmability to the forwarding plane functions (layer 2 and higher of
the ISO model) required by the LAN and WAN devices that
make up today’s networks. These forwarding functions include:
•
0HGLDDFFHVVFRQWURO — Implementation of low-layer
protocols, such as Ethernet, SONET framing, ATM cell
processing, and so on. These protocols define how the data
is represented on the communications channel, and the
rules governing how that channel is accessed. Paradoxically,
this is the area of the greatest standardization among
network devices (due to standards-based protocol
definitions), and also the area of greatest diversity (due to
the wide and ever growing variety of protocols). These
include: Ethernet (with three different flavors at 10Mbps,
100Mbps and 1000Mbps), SONET supporting both data
packets and ATM cells at a wide range of standard rates
(OC-3, OC-12 OC-48, and so on), legacy T/E-carrier
interfaces from the existing public voice infrastructure, and
a variety of emerging optical interfaces all must coexist and
interact.
•
'DWDSDUVLQJ — Parsing cell or packet headers containing
addresses, protocol information, and so on. In the past,
parsing functions were fixed based on the type of device
being constructed (for example, LAN bridges, by definition,
only needed to look at the layer 2 Ethernet header). Today,
switching devices need the flexibility to gain access to and
examine a wide variety of information at all layers of the ISO
model — in real time and on a conditional packet-by-packet
basis.
•
&ODVVLILFDWLRQ — Identifying a packet or cell against a set of
criteria defined at layers 2, 3, 4, or higher of the ISO model.
Once data is parsed, it must be classified in order to
determine the required action. Actions might include such
basic functions as a filtering/forwarding decision, as well as
advanced QoS and accounting functions based on a specific
end-to-end traffic flow. This is an area of rapidly changing
requirements.
•
'DWDWUDQVIRUPDWLRQ — Modification or translation of data
within or between protocols. The variety of low-layer
transport protocols is matched only by the diversity of
protocol combinations and services. Transformation
requirements can range from address translation within a
given protocol (such as IP) to full protocol encapsulation or
conversion (such as between IP and ATM).
•
7UDIILFPDQDJHPHQW— Including the queuing, policing, and
scheduling of data traffic through the device according to
defined QoS parameters, based on the results of
classification and established policies. These functions are
key to supporting convergence of voice, video, and data in
next-generation networks.
7KH1DWXUHRI&RPPXQLFDWLRQV3URFHVVLQJ7DVNV
Freescale Semiconductor, Inc...
To evaluate network processor programming models, the
nature of the tasks to be programmed must be understood.
There are two broad categories of communications tasks (see
Figure 2):
•
)RUZDUGLQJ3ODQHWDVNV — Consisting of operations on
forwarding path communications data that occur in
real-time. These constitute the core device operations, and
hence are performance critical. In a switch or router, these
are the functions that receive, process, and transmit
packets into and out of the device.
•
&RQWURO3ODQHWDVNV — Consisting of less time-critical control
and management functions that determine general device
operation. In a switch or router, these functions control
routing table maintenance, port states, and higher-level
management.
In traditional designs, the forwarding plane functions are
divided between fixed-function hardware (usually custom
ASICs) and software running on a general-purpose CPU. Control
plane functions are implemented in software either on the
same CPU or another, dedicated “host” CPU.
)LJXUH Communications Processing Tasks
Policy Applications
Control
Plane
Network Management
Signaling
Topology Management
Queuing / Scheduling
Data Transformation
Forwarding
Plane
Classification
Data Parsing
Media Access Control
Physical layer
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Network Processor Programming Model Choices
Today, each of these functions presents the challenge of a wide
diversity of possible implementations, rapid evolution based on
continuing innovation, strong interdependencies between functions, and a need for interworking between the diverse protocols. Delivering programmability and integration of these
functions represents a major evolution in network device
design.
Freescale Semiconductor, Inc...
1HWZRUN3URFHVVRU3URJUDPPLQJ0RGHO &KRLFHV
The computing world has always debated about what is the
best processor hardware architectures: CISC versus RISC,
single CPU versus multi-CPU, coprocessors versus faster
clocks, and so on. However, it is the software that determines
the success of computing platforms, both in terms of performance and programming ease. The limited success of
symmetric, parallel computing architectures proved that raw
computing power was not the decisive factor, but rather how
that power could be harnessed by software. The same is true
for network processors — the decisive factor is how the
programming model serves the platform requirements of fast,
simple, and flexible programmability.
There are two primary metrics for evaluating network processor
programming models. The first is the level of programmability
offered, both in terms of which functions can be programmed
(see Figure 2), as well as the extent that these various functions can be programmed. While the physical space, cost, and
power benefits of high functional integration into a single
processor is well understood, there is a forgotten benefit to the
programming model. Processor architectures that assume a
“bag of parts” approach provide programmability for a subset
of the forwarding plane functions, limiting the ability of
programmers to effectively deal with the diversity within each
level and the often complex interactions between them. Likewise, providing appropriate programmability within each level is
crucial to accommodating these interactions. Hence a fully integrated, fully programmable network processor architecture is a
major prerequisite for an effective programming model.
The second, and most important metric, is the actual programming method for the processor. Perhaps the largest struggle in
traditional network system design with ASICs has been a
“hardware first” architectural mentality, with software engineers designing around a less-than-ideal hardware/software
partitioning. Network processor programming models need to
turn this around, providing a hardware processor platform that
serves the requirements of the software functions and, in the
end, the software designers themselves. The key criteria is a
simple programming paradigm, using well known methods,
without sacrificing product performance.
May 2, 2001
3
The network processors available today fall into three broad
categories of programming methods, with a spectrum of capabilities within each. These categories are discussed below.
0LFURFRGH(QJLQH3URJUDPPLQJ
These devices implement virtually all the forwarding plane functions in custom designed, low-level microcode machines. All
tasks including data parsing, search algorithms, data transformation, queuing, and scheduling algorithms must be specifically programmed by the designer. These machines maximize
performance through multi-threading, which generally requires
the microcode writer to consider everything from memory
access times to thread interactions when optimizing each function.
Also, these architectures implement multiple instances (typically 6 to12) of these machines in parallel, with fixed hardware
schedulers assigning incoming data to a given machine based
on availability. This is similar to traditional symmetric multiprocessing computing models, which places additional constraints
on the microcode writer to assure proper interactions and
ordering of forwarded data.
Microcode’s strength lies in the efficiency of the code once it is
written. The code can be compact and fast. However, the
downside of programming in low level microcode, comprehending everything from memory access latencies to multiprocessing dependencies, is the lack of portability of these code
designs to other products based on same processor and to
new, faster versions or new generations of processors. This
code tends to be “one time use”, which when combined with
the inherent difficulty of writing microcode, might make these
processors suitable for point product designs, but seriously
compromises the value of “software programmability” for an
overall communications platform.
*/3URJUDPPLQJ
Another programming model focuses on leveraging proprietary
search and pattern-matching algorithms to the communications
processing task, specifically for parsing and classification. A
number of these algorithms use custom “fourth generation
languages” (4GLs) to describe the parsing and
pattern-matching requirements for a number of applications.
These 4GLs provide a concise method of “programming” the
classification function, and processors that implement these
algorithms provide a partial solution to this piece of the communications processing task.
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
4
To effectively support the requirements of a communications
platform, the programming model must support the ability to
write effective programs in a higher-level standard language.
The means the RISC cores need to have enough horsepower
within a rich coprocessing architecture to support an API
abstraction layer that insulates the operating code from low
level chip implementation details without sacrificing performance. This is the key to providing a simple programming
model environment and extending the life of the software.
The algorithms implemented by these processors typically
trade-off memory size for search speed, which may or may not
be an issue for the system design. There are, however, larger
impacts on the programming model against the two main
criteria outlined above. First, these processors focus almost
exclusively on the parsing and classification tasks, providing
only one piece of a “bag of parts” solution. The designer must
either build the required external hardware (and associated software) around this part, or, if available, use other piece parts
provided by the processor vendor (sometimes configurable
with microcode as described above). In either case, the
programming domain is disjoint, compromising what functions
are actually programmable and the depth of that programmability.
The implementation of the coprocessing architecture is critical,
as the coprocessors must off-load the RISC processor from the
communications tasks that are notoriously poor in standard
CPUs (such as the bit manipulation typically required in parsing
and data transformation tasks).
Even if the other functions are ignored, using a proprietary
description language for the classification requires new skills
and tools, not just for the coding tasks but for debug, analysis,
and maintenance. Good tools can mitigate some of this cost,
but the inconsistency between the other forwarding plane functions and the control plane functions will remain.
7KH)RXQGDWLRQ$1HWZRUN3URFHVVRU'HVLJQHGIRU
&RPPXQLFDWLRQV7DVNV
C-Port’s C-5 Network Processor (NP) is an example of a
network processor designed from the ground up to provide a
simple and robust programming model. The C-5 NP provides
complete programmability for each of the forwarding plane
tasks using standard C/C++ programming, enabling universal
applications in a wide variety of network devices. The C-5 NP
combines multiple RISC cores, specialized coprocessors, and
microcode engines within a single integrated circuit to offer a
full range of programmability at high performance. Figure 3
shows a block diagram of the C-5 NP.
6WDQGDUG/DQJXDJH3URJUDPPLQJ
The “standard language” programming model leverages
existing languages (such as C and C++ with their inherent
benefits such as readily available skilled programmers and
industry standard programming tools), usually combined with
special coprocessors, to implement the various communications processing tasks. These use multiple embedded RISC
cores as a key processing element to support the execution of
standard C/C++ programs implementing the desired behavior.
)LJXUH C-5 NP Software-optimized Architecture
SRAM
Note that the use of RISC cores in a
network processor does not automatically mean that the processor
was designed to support a
higher-level programming language
paradigm. Many “RISC-based”
network processors implement
proprietary instruction sets (or
proprietary extensions), which,
while expedient from a hardware
design perspective, force programmers to write all or significant
portions of their code in RISC
assembly language. Similarly, the
processing capacity may not be
adequate to support reasonable
implementations in a higher-level
language. Thus, programming
these processors can be just as
complex as writing in low-level
microcode.
External
PROM
(optional)
External
Host CPU
(optional)
SRAM
Fabric
SDRAM
Control
Logic
(optional)
Table
Lookup
Unit
Fabric
Processor
Queue
Mgmt
Unit
PCI
Serial
PROM
Executive Processor
Buffer
Mgmt
Unit
Buses (60Gpbs Bandwidth)
Cluster
C-5
NP
Cluster
CP-0
CP-1
CP-2
CP-3
CP-12
CP-13
CP-14
CP-15
PHY
PHY
PHY
PHY
PHY
PHY
PHY
PHY
Channel
Processors
Processor Boundary
Serial
Data Proc
32-bit
RISC
Serial
Data Proc
Core
Channel Processor
PHY Interface Examples:
10/100 Ethernet
OC-3
Gigabit Ethernet
OC-12
OC-48
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Making Programming More Simple Through a “Communications” API
5
&KDQQHO3URFHVVRUV)OH[LEOH%XLOGLQJ%ORFNV
'HGLFDWHG&RSURFHVVRUV
The fundamental building blocks of the C-5 NP are the 16
embedded Channel Processors (CPs). Each CP consists of a
dedicated RISC CPU and dual Serial Data Processors (SDPs).
The CP structure combines the best attributes of specialized
configurable state-machine architectures with a fully programmable RISC core. CPs can be assigned to physical interfaces,
aggregated together to support higher-bandwidth I/O streams,
or assigned internally as a dedicated internal coprocessor.
The C-5 NP also provides five coprocessors optimized for
common tasks and used by the CPs. These coprocessors
handle shared tasks including table lookup, queue management, buffer management, fabric interfacing, and supervisory
processing. Each unit is highly configurable and offers performance and capabilities that, if packaged as stand-alone devices,
would be considered best-in-class communications components. For example, the Table Lookup Unit (TLU) enables a wide
range of traffic classification functions and supports multiple,
different search algorithms.
The SDPs handle data encoding/decoding, framing, formatting,
parsing, error checking (CRCs), and data movement. The SDPs
also control programmable external pin logic, allowing them to
implement virtually any layer 1 interface including connection to
T/E-Carrier framers, 10/100 Ethernet PHY (RMII), Gigabit
Ethernet PHY (GMII or TBI), OC-3 PHY, OC-12 PHY, and OC-48
framers/PHY. At layer 2, the SDPs can be independently configured to support Ethernet, PoS, HDLC streams, ATM, Frame
Relay, FibreChannel, or virtually any format including various
encapsulations such as MPLS. The programmability of the
SDPs support the diversity of media access control interfaces,
as well as first-order parsing requirements, and can support the
“mix-and-match” requirements of different implementations on
a port-by-port basis. This efficiently supports the needs of
various interworking applications.
The SDPs are programmed in microcode, which is provided by
C-Port for the vast majority of applications (all flavors of
Ethernet, IP and ATM over SONET, T/E carrier serial data
streams, and so on). All the tools necessary for equipment
vendors to program the SDPs (including assembler and simulator support) are available. Support for MAC level diversity is
available without any user coding.
The CP’s RISC core, programmed in C or C++, is available to
focus on higher-level tasks such as final switching / forwarding
decision making, scheduling, statistics gathering, or other tasks
required for higher-level services. The RISC core in each CP
operates at the core clock rate of the C-5 NP, has dedicated
internal instruction and data memory, and implements an
industry standard instruction subset, avoiding the issues associated with proprietary instructions. With the SDPs off-loading
the “bit level” tasks from the RISC core, the capacity of the
RISC machine can be dedicated to the tasks that benefit the
most from high-level language implementations.
May 2, 2001
'HVLJQHGIRU+LJKOHYHO3URJUDPPLQJ
The CPs, supported by the coprocessors, provide the fundamental building blocks from which multiple applications can be
supported through high-level programming. For example, the
CPs can take on different personalities to support ATM,
Ethernet/IP, PPP/IP, Frame Relay, Channelized HDLC, or even
proprietary protocols through a combination of microcode in
the SDPs and C/C++ code running on the RISC core. The data
paths through the CPs can be configured for external connection (to PHYs) or looped back internally, for use as an applications “coprocessor”.
Although there are 16 CPs per C-5 NP, each CP is independently
programmable, avoiding the limitations typical of traditional
symmetric multiprocessor designs. With the flexibility provided
by the CP architectures, it is a straight forward task to write
software for the CP to perform a given function.
0DNLQJ3URJUDPPLQJ0RUH6LPSOH7KURXJKD
|&RPPXQLFDWLRQV}$3,
Hardware flexibility is usually accompanied with complexity
driven by the number of possible functional permutations. By
adapting the concept of standard Application Programming
Interfaces (APIs) to communications processing, this
complexity can be put at the service of the programmer. The
C-5 NP supports C-Ware Application Programming Interfaces
(APIs), a set of open, efficient interfaces that abstract common
functions from the underlying hardware. See Figure 4.
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
6
)LJXUH C-Ware APIs
•
5REXVWVLPXODWLRQHQYLURQPHQW — Most network processor
vendors provide extensive simulation environments that
allow completion of forwarding plane code development
and performance characterization before hardware
integration. A key differentiator is the speed and accuracy of
the network processor simulation. Those based on a full
software implementation can be as accurate as a hardware
model (for example, based on Verilog/VHDL models), but
orders of magnitude faster, allowing more simulation
bandwidth.
•
'HYHORSPHQWV\VWHPDYDLODELOLW\ — A hardware development
system, offering the ability to execute software on the
“real” network processor, is also generally available from
most vendors. While not a replacement for a good simulator
(a simulator can always be better instrumented than real
hardware), it is invaluable for starting final integration in
advance of prototypes. A system that can be assembled to
closely match the target system configuration (types of
physical interfaces, and so on) is a great asset.
•
2WKHUVRIWZDUHWRROV — Software tools, such as compilers,
debuggers, performance analyzers, and so on are also key
elements of the software development environment.
Seamless integration of these tools across both the
simulation and hardware development platforms is an often
overlooked, but important, aspect of accelerating
time-to-market.
•
+RVWSURFHVVRULQWHJUDWLRQ — As described earlier, the
control plane functions are supported in a traditional
embedded CPU. The hardware integration of this processor
with the network processor is straight forward, but the
software integration requires some considerable thought.
Hence a software and hardware development environment
that comprehends the host processor, including drivers for
the leading real-time operating systems, host-level APIs,
and some number of fully integrated applications, should be
a key consideration.
C-5 Network Processor
Proc.
Fabric
Proc.
PDU Services
Freescale Semiconductor, Inc...
Kernel Services
Fabric
Services
Protocol Services
Table
Lookup
Unit
Queue
Mgmt.
Unit
Buffer
Mgmt.
Unit
Buffering
Services
CP15 Executive
Queuing
Services
CP1
Table Lookup
Services
CP0
For programmers accustomed to tweaking hundreds of lines of
assembly code to squeeze out the last bit of performance from
a CPU, the concept of using an API in forwarding plane code
would appear odd. However, the C-5 NP computing power (over
3,000 MIPS total) was sized from the beginning to accommodate any overhead imposed by an API. This, combined with
standard C/C++ programming, is the key to delivering on a
simple programming model.
In an effort to leverage the power of this concept throughout
the industry, a group of network processor, software, and
equipment vendors (with C-Port, IBM, and Lucent as charter
members) initiated the Common Programming Interface (CPIX)
Forum (www.cpixforum.org). By defining a common framework
and API, network processor vendors and communications software vendors can offer more portable and flexible solutions for
network equipment designers.
3URJUDPPLQJ(QYLURQPHQW5HTXLUHPHQWV
The use of a true communications platform in network device
design changes the typical design process. A much larger
percentage of the intellectual property of a product is delivered
in software, hence the network processor development tools
environment is critical to project success. In addition to the
basic programming model, other factors influence the speed at
which products can be brought to market.
These factors include:
•
6RIWZDUHUHIHUHQFHGHVLJQDYDLODELOLW\ — Most network
processor vendors provide examples of forwarding plane
software for some number of functions. The extent, quality,
and breadth of these applications (as well as available
implementations from software partners) can help make or
break a project schedule.
For example, C-Port provides a complete communications
development environment, consisting of a full software toolset
(including simulator), and a development system. The development system consists of network processor modules, physical
interface modules (for Ethernet, Gigabit Ethernet, OC-3, OC-12,
and so on), and a host processor module based on a PowerPC
CPU running the VxWorks RTOS. The vast majority of an application can be integrated and tested prior to integration with the
target product hardware design, significantly reducing the time
and risks of the product integration phase.
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Conclusion
&RQFOXVLRQ
Freescale Semiconductor, Inc...
Network processors offer a significant opportunity to improve
the architecture, design, and maintenance of today’s
networking devices. The opportunity, however, extends beyond
the standard benefits of off-the-shelf merchant silicon. Processors that form the foundation of complete communications
platforms, based on a simple programming model, promise to
radically improve the way networking technology is brought to
market. This adds up to better product features, faster
time-to-market, and better reliability for network equipment
vendors and their customers.
May 2, 2001
For More Information On This Product,
Go to: www.freescale.com
7
Freescale Semiconductor, Inc...
Freescale Semiconductor, Inc.
C-5, C-Port, C-Ware, and the C-Port logo are all trademarks of C-Port
Corporation.
© 1999, 2000, 2001 C-Port Corporation
PM00WP100
For More Information On This Product,
Go to: www.freescale.com