XL C/C++ compilers - IBM

XL C/C++ compilers
Overview
This paper details what's new in the IBM® XL C/C++ compiler family. IBM XL C/C++ is the successor to
IBM's VisualAge® C++ compiler.
Compiler features vary slightly by operating system platform, and platform-specific features are
described in the appropriate sections. All versions of IBM XL C/C++ share the common features
described below unless otherwise noted.
IBM XL C/C++ for AIX®, V10.1, IBM XL C/C++ for Linux®, V10.1, and IBM XL C/C++ for Multicore
Acceleration for Linux, V10.1 are part of a multi-platform XL compiler family derived from a common
code base optimized to run on IBM Power Architecture®.
IBM XL C/C++ for AIX is an industry-leading optimizing compiler that supports IBM Power systems
capable of running IBM AIX V5.3 and IBM AIX V6.1, and IBM i V6.1 PASE. IBM XL C/C++ fully exploits
POWER4™, POWER5™, POWER5+™, and POWER6™ architectures including the Power 970 and Power
970MP as used in the IBM BladeCenter® JS21 and IBM BladeCenter JS22 systems.
The POWER6 processor is the very latest member of the IBM Power family, announced May 2007.
POWER6 is currently the fastest microprocessor ever built. Announced at the same time, the IBM Power
Systems 570 is an ultra-powerful server that leverages the many breakthroughs in both energy
conservation and virtualization technology of the POWER6. The IBM Power 570 server is the first UNIX®
server to hold all four major benchmark speed records at once (as of May 2007): SPECint2006,
SPECfp2006, SPECjbb2005, and TPC-C (an online transaction processing benchmark).
Note: For more information about these benchmarks see:
v www.ibm.com/systems/power (IBM Power Systems)
v www.spec.org/
v www.tpc.org/tpcc/
New features and enhancements in IBM XL C/C++ for AIX, V10.1 and IBM XL C/C++ for Linux, V10.1:
v Partial support for C++0x
v OpenMP API V3.0
v Enhancements to -qstrict
v New and changed compiler options and directives
Features and enhancements that were introduced in IBM XL C/C++, V9.0:
v Decimal floating-point
v C99 support
v Improved ASM support
v Improved GCC usability
v Most of Technical Report 1 (TR1) for C++
v Thread-local storage (TLS)
v PDF without use of IPA at the object level
v Tested with Boost 1.34.0 and achieved over 95% pass-rate (C++)
IBM XL C/C++ for Linux, V10.1 is available on selected Linux distributions running on IBM BladeCenter
JS20, IBM BladeCenter JS21, and IBM Power technology-based systems. You can run the compiler on Red
Hat Enterprise Linux 5.2 (RHEL5.2) and SUSE Linux Enterprise Server 10 Service Pack 2 (SLES 10 SP2).
For more information, see: www.ibm.com/software/awdtools/xlcpp/linux/
IBM XL C/C++ Advanced Edition for Blue Gene® (enabling support for IBM Blue Gene supercomputer
systems) provides a set of built-in functions that are specifically optimized for the Power 440 and Power
440d’s Double Hummer dual FPU. These are in addition to the family wide set of built-in functions
optimized for the Power architecture. These built-in functions provide an almost one-to-one
correspondence with Blue Gene’s Double Hummer instruction set. It also exploits the performance
capabilities of the PowerPC® 440d processor and its Double Hummer floating-point unit used in Blue
Gene®/L™ systems, and the PowerPC 450d processor and its Double Hummer floating-point unit used in
Blue Gene®/P™ systems. For more information, see:
www.ibm.com/software/awdtools/xlcpp/features/bg/xlcpp-bg.html
IBM XL C/C++ for Multicore Acceleration for Linux, V10.1 adopts proven high-performance compiler
technologies used in its compiler family predecessors, and adds new features tailored to exploit the
unique performance capabilities of processors compliant with the new Cell Broadband Engine™
architecture. It also introduces another compiler invocation allowing compilation and linking of Power
Processor Unit (PPU) and Synergistic Processor Unit (SPU) code segments with a single compiler
invocation. For more information, see: www.ibm.com/software/awdtools/xlcpp/multicore/
Other members of the IBM XL C/C++ compiler family include:
v z/OS® XL C/C++ (an optional priced feature of the z/OS operating system)
For more information, see: www.ibm.com/software/awdtools/czos/
v XL C/C++ for z/VM®
For more information, see: www.ibm.com/software/awdtools/czvm/
IBM XL C/C++ compilers comply with the latest C/C++ international standards and industry
specifications, facilitating application porting across hardware platforms and operating systems. The
compilers support a large array of common language features.
The increased compatibility with GNU C/C++ gives you the versatility to build different parts of your
application with either the IBM or GNU compiler, and still bind the parts together into a single
application. One common use of this functionality is to build an application with IBM XL C/C++ that
interacts with the GNU-built dynamic libraries, without recompiling the library source code. Applications
built with this functionality can integrate with the GNU assembler, and also provide full support for
debugging through gdb, the GNU debugger.
IBM XL C/C++ compilers on AIX and Linux also offer support for the IBM XL Fortran compiler on AIX
and Linux through interlanguage calls.
IBM XL C/C++ offers developers the opportunity to create and optimize 32-bit and 64-bit applications for
the AIX and Linux platforms. On operating systems and architectures supporting the VMX instruction
set, the IBM XL C/C++ compilers allow you to take advantage of the AltiVec programming model and
APIs. They also allow you to improve the performance of your data and CPU intensive applications by
exploiting the cutting edge IBM XL C/C++ automatic SIMD vectorization technology.
IBM XL C/C++ compilers continue to make strides in the development of multiplatform, shared-memory
parallel applications by providing a technology showcase of the Unified Parallel C (UPC) V1.2 language
specification. You can download this technology showcase as a separate free-of-charge add-on. For more
information, see: www.alphaworks.ibm.com/tech/upccompiler/
Standards conformance
On Linux platforms the compilers use the GNU C and C++ headers, and the resulting application is
linked with the C and C++ runtime libraries provided by the GNU compiler shipped with the operating
system. IBM ships an implementation of some header files with the product to override the
corresponding GNU header files. These header files are functionally equivalent to the corresponding
GNU implementation. Other IBM headers are wrappers that include the corresponding GNU header files.
IBM compilers strive to maximize the performance of scientific, technical, and commercial applications on
server platforms. Multiple operating system availability ensures cross-platform portability, augmented by
standards compliance. The IBM XL compilers conform to the following language standards:
v The IBM XL C compiler conforms with the ISO C90 and C99 standards.
v IBM XL C++ supports a limited form of C99 due to its usefulness in mixed C and C++ code and
header file inclusion. It also supports C++98 with the 2003 Technical Corrigendum 1
updates.
IBM XL C/C++ compilers also conform to these specifications:
v AltiVec (excluding z/OS, z/VM, and Blue Gene XL C/C++)
v OpenMP V3.0
– IBM XL C for AIX, V10.1 and IBM XL C/C++ for AIX, V10.1
– IBM XL C/C++ for Linux, V10.1
v OpenMP V2.5
– IBM XL C/C++ for Multicore Acceleration for Linux, V10.1
– IBM XL C/C++ Advanced Edition for Blue Gene, V9.0
v Unified Parallel C V1.2 (C) support by XL UPC alphaWorks® compiler
v IEEE POSIX 1003.2
The C99 standard has been updated with a technical corrigendum (known as TC2), which contains bug
fixes. These updates were first incorporated into IBM XL C V9.0.
C++0x
IBM XL C/C++, V10.1 introduces support for the upcoming revision of the standard for the C++
programming language, codenamed C++0x. This standard has not yet been officially adopted,
but the compiler is beginning to support some of its features. These features might change or be
removed in the future, depending on what is finally ratified in the standard.
Specifically, in this release:
v A new language level has been created.
v New integer promotion rules for arithmetic conversions, with added support for C++0x long long data
types, have been introduced.
v The C++ preprocessor now fully supports C99 features according to C++0x.
New language level - extended0x
The default -qlanglvl compiler option remains extended when invoking the C++ compiler. A new
suboption has been added to the -qlanglvl option in this release. -qlanglvl=extended0x is used to allow
users to try out early implementations of any features of C++0x that are currently supported by XL
C/C++.
C99 long long under C++
Expected compiler behavior is different with XL C/C++, V10.1 when performing certain arithmetic
operations with integral literal data types. Specifically, the integer promotion rules have changed.
Starting with this release, when compiling with -qlanglvl=extended0x, the compiler promotes
an unsuffixed integral literal to the first type in this list into which it fits:
v int
v long int
v long long int
v unsigned long long
Note: Like our implementation of the C99 Standard in the C compiler, C++ will allow promotions from
long long to unsigned long long if a value cannot fit into a long long type, but can fit in an
unsigned long long. In this case, a message will be generated.
The macro __C99_LLONG has been added for compatibility with C99. This macro is defined to 1 with
-qlanglvl=extended0x and is otherwise undefined.
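The promotion order above can be observed through overload resolution. The following is a minimal sketch, assuming a compiler that applies the C99/C++0x literal rules (for XL C/C++ that would mean -qlanglvl=extended0x; for other compilers, a C++11 mode) and a 32-bit int. The names literal_kind, c99_llong_enabled, and kind_of_three_billion are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: one overload per type in the promotion list.
// Overload resolution selects the overload that matches the literal's type.
const char* literal_kind(int)                { return "int"; }
const char* literal_kind(long)               { return "long int"; }
const char* literal_kind(long long)          { return "long long int"; }
const char* literal_kind(unsigned long long) { return "unsigned long long"; }

// Reports whether __C99_LLONG is defined. The macro is specific to XL C/C++
// (defined to 1 with -qlanglvl=extended0x); other compilers leave it undefined.
bool c99_llong_enabled() {
#ifdef __C99_LLONG
    return true;
#else
    return false;
#endif
}

// 3000000000 does not fit in a 32-bit int, so it takes the next type in the
// list that can hold it: long int where long is 64 bits, else long long int.
const char* kind_of_three_billion() { return literal_kind(3000000000); }
```

Calling literal_kind(1) selects the int overload, while kind_of_three_billion() reports one of the wider types, depending on the width of long on the target platform.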
Preprocessor changes
The following changes to the C++ preprocessor make it easier to port code from C to C++.
v Regular string literals can now be concatenated with wide-string literals.
v The #line <integer> preprocessor directive has a larger upper limit. It has been increased from 32767 to
2147483647 for C++.
v C++ now supports the _Pragma operator.
v These macros now apply to C++ as well as C:
– __C99_MACRO_WITH_VA_ARGS (also available with -qlanglvl=extended)
– __C99_MAX_LINE_NUMBER (also available with -qlanglvl=extended)
– __C99_PRAGMA_OPERATOR
– __C99_MIXED_STRING_CONCAT
Note: Except as noted, these C++ preprocessor changes are only available when compiling with
-qlanglvl=extended0x.
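As an illustration of the preprocessor features listed above, here is a minimal sketch that uses a variadic macro, the _Pragma operator, mixed string concatenation, and a large #line value. It assumes a compiler mode in which these C99 preprocessor features are enabled for C++ (C++11 or later for most compilers). All macro and function names are hypothetical, and the OpenMP pragma is only an example payload for _Pragma.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <cwchar>

// Variadic macro (C99 feature): forwards any number of arguments to snprintf.
#define FORMAT_INTO(buf, ...) std::snprintf((buf), sizeof(buf), __VA_ARGS__)

// _Pragma operator: lets a macro emit a pragma, which #pragma itself cannot do.
// The pragma is ignored when OpenMP support is not enabled.
#define PARALLEL_FOR _Pragma("omp parallel for")

// Mixed concatenation: a wide literal followed by a regular literal forms
// one wide string.
const wchar_t* mixed_concat() { return L"wide " "narrow"; }

// Formats two integers into a fixed-size buffer through the variadic macro.
int format_pair(char (&buf)[32], int a, int b) {
    return FORMAT_INTO(buf, "%d-%d", a, b);
}

// Doubles every element; the loop carries the pragma produced by the macro.
void double_all(int* data, int n) {
    PARALLEL_FOR
    for (int i = 0; i < n; ++i) data[i] *= 2;
}

// #line accepts values above the old 32767 limit. The line that follows the
// directive is numbered 100000, so the function returns that value.
#line 100000
int line_after_directive() { return __LINE__; }
```

Because the #line directive renumbers everything after it, a real source file would normally use it only in generated code.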
For additional information about the language standards supported by XL C/C++, see Language levels
and language extensions.
Other XL C/C++ language-related updates
Vector data types
Vector data types can now use some of the operators that can be used with base data types such as:
v unary operators
v binary operators
v relational operators
Thread local storage
The thread local storage support has been enhanced to include __attribute__((tls_model("string")))
where string is one of local-exec, initial-exec, local-dynamic, or global-dynamic.
Operating system support
IBM XL C/C++ for AIX, V10.1 supports AIX V6.1 as well as AIX V5.3 TL5.
This version of the compiler does not support AIX V5.2.
Decimal floating-point support for XL C/C++
Decimal floating point arithmetic offers greater computational performance and precision in business and
financial applications where numeric data I/O is usually performed in decimal form. Data conversions
from decimal type to binary floating-point type and back are avoided, as are inherent rounding errors
accumulated during data conversions.
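The rounding problem described above can be seen without any decimal floating-point support. The following is a minimal sketch, assuming IEEE 754 binary double arithmetic; it does not use the decimal types themselves, and the function names are hypothetical. It contrasts a sum of ten 0.10 amounts in binary floating point with the same sum kept in whole cents.

```cpp
#include <cassert>

// Adds 0.10 ten times using binary double arithmetic. 0.10 has no exact
// binary representation, so the additions accumulate rounding error.
double sum_ten_dimes_binary() {
    double total = 0.0;
    for (int i = 0; i < 10; ++i) total += 0.10;
    return total;
}

// The same sum kept in integer cents, which is exact.
long sum_ten_dimes_cents() {
    long cents = 0;
    for (int i = 0; i < 10; ++i) cents += 10;
    return cents;
}
```

Under these assumptions the binary sum is not equal to 1.0, whereas the cent count is exactly 100. Decimal floating-point types are intended to give the exact behavior while keeping floating-point notation.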
IBM XL C/C++ Enterprise Edition for AIX, V9.0 added support for decimal floating-point arithmetic with
two new compiler options:
Table 1. Decimal floating-point compiler options

Option/Directive                     Description
-qdfp | -qnodfp                      Specifying -qdfp enables compiler support for decimal
                                     floating-point data types and literals.
-qfloat=dfpemulate | nodfpemulate    Specifying -qfloat=dfpemulate instructs the compiler to
                                     use software emulation when handling decimal
                                     floating-point computations.
-y                                   There are suboptions specific to decimal floating-point
                                     arithmetic for the -y option to control rounding of
                                     constant expressions.

Note: Compiler support for decimal floating-point operations requires AIX 5L™ for POWER™ V5.3
with the 5300-06 Technology Level or higher. For more information, see Extension for the
programming language C to support decimal floating-point arithmetic: TR 24732 and Decimal
Types for C++: Draft 4.
C99 support
The default -qlanglvl compiler option setting is extc99 when invoking the C compiler with the xlc
invocation. This change allows you to use C99 features and headers without having to explicitly specify
the extc99 suboption.
You might encounter issues with the following when compiling with the new default -qlanglvl=extc99
setting:
v Pointers can be qualified with restrict in C99, so restrict cannot be used as an identifier.
v C99 treatment of long long data differs from the way long long data is handled in C89.
v C99 header files define new macros: LLONG_MAX in limits.h, and va_copy in stdarg.h.
v The value of the macro __STDC_VERSION__ changes from 199409 to 199901.
To revert to previous xlc behavior, specify -qlanglvl=extc89 when invoking the compiler.
Support for C++ Technical Report 1 (TR1)
IBM XL C/C++ Enterprise Edition for AIX, V9.0 introduced support for numerous extensions to the C++
language as defined by the Draft Technical Report on C++ Library Extensions (TR1).
For more information on these language extensions, see Draft Technical Report on C++ Library
Extensions (TR1) at www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1836.pdf
Enhanced Unicode and NLS support
As recommended in a recent report from the C Standard committee, the C compiler extends C99 to add
new data types to support UTF-16 and UTF-32 literals. The data types are u-literals and U-literals. To
enable support for UTF literals in your source code, you must compile with the option -qutf enabled. The
C++ compiler also supports these new data types for compatibility with C. The C++ runtime can use the
ability of the AIX V5.3 and V6.1 operating systems to load multiple locales if the application runs on such
a system.
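The difference between the two literal kinds shows up in how many code units a character occupies. The following is a minimal sketch, assuming a compiler that accepts u and U string literals (with XL C/C++ that requires -qutf; in C++11 and later they are part of the language). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Counts code units up to, but not including, the terminating zero.
template <typename CharT>
std::size_t code_units(const CharT* s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// U+1F600 lies outside the Basic Multilingual Plane.
// In UTF-16 it is encoded as a surrogate pair (two code units).
std::size_t utf16_units() { return code_units(u"\U0001F600"); }
// In UTF-32 every code point is a single code unit.
std::size_t utf32_units() { return code_units(U"\U0001F600"); }
```

For plain ASCII text both forms use one code unit per character, so the distinction only matters for characters beyond U+FFFF.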
IBM is a corporate member of the Unicode Consortium. For more information regarding Unicode, see:
v www.unicode.org
Porting from open source and other platforms
The cross-platform portability of gcc and g++ has ensured GNU a place in the Open Source community.
GNU has excelled in educational and compiler research arenas as a test bed for new language syntax.
IBM compilers are built on a platform of reliability, customer service, and cutting-edge optimization. In
recent years, the IBM XL compilers have been evolving to gain some of the additional flexibility and
portability of the GNU compilers, while still retaining the strengths that have built the IBM XL C/C++
compiler’s reputation in the industry.
GNU source compatibility
XL C/C++ supports a subset of the GNU compiler command options to facilitate porting applications
developed with gcc and g++ compilers. This support is available when the gxlc or gxlc++ invocation
command is used together with select GNU compiler options. Where possible, the XL C/C++ compiler
maps GNU options to their XL C/C++ compiler option counterparts before invoking the XL C/C++
compiler. These invocation commands use a plain text configuration file to control GNU-to-XL C/C++
option mappings and defaults. You can customize this configuration file to better meet the needs of any
unique compilation requirements you may have. See ″Reusing GNU C/C++ compiler options with gxlc
and gxlc++″ for more information.
GNU Binary Compatibility (Linux only)
The IBM XL C/C++ compilers achieve a high degree of binary compatibility with GNU-built objects,
archives, and shared objects. The compiler achieves this by adhering to the system ABI and calling
conventions, and by closely following the GNU behavior where alignment modifiers like the attributes
aligned and packed are used.
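To make the role of the alignment modifiers concrete, here is a minimal sketch of how the GNU aligned and packed attributes change structure layout. It assumes a compiler that accepts the GNU attribute syntax and a platform where int has a stricter alignment than char. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Natural layout: padding is inserted after 'tag' so 'value' is aligned.
struct Natural { char tag; int value; };

// packed: members are laid out with no padding between them.
struct Packed { char tag; int value; } __attribute__((packed));

// aligned(16): objects of this type start on a 16-byte boundary, and the
// size is rounded up to a multiple of that alignment.
struct Aligned16 { char tag; } __attribute__((aligned(16)));

std::size_t natural_size() { return sizeof(Natural); }
std::size_t packed_size()  { return sizeof(Packed); }
std::size_t aligned_size() { return sizeof(Aligned16); }
```

Two compilers can only exchange such structures between object files if they agree on these sizes and offsets, which is why matching the GNU behavior for these attributes matters for binary compatibility.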
C++ interoperability is somewhat more difficult to achieve due to differing conventions for name
mangling, object model, and exception handling. However, the GNU C++ compiler, since V3.2, has
adopted a common vendor C++ ABI that defines a way to allow interoperability of C++ object model,
name mangling, and exception handling. This common C++ ABI is supported in the IBM XL C++
compilers. IBM XL C/C++, V10.1 compilers on Linux have been fully tested with GNU C/C++ 4.1.2 on
SLES10 and on RHEL5, and offer a high degree of binary compatibility in addition to source
compatibility.
As the C++ ABI improves through bug fixes, some incompatibility can be found even between GNU C++
versions. Significant changes were introduced in GNU C++ V3.4. Most of the changes had to do with
subtle corner cases of empty bases and bitfield placement and alignment. These changes were encoded in
a new C++ ABI version. To maintain portability, a new option was introduced to allow backward
compatibility as well as selecting the new ABI. This option is -qabi_version=n and is available in the IBM
XL C++ compiler for Linux.
The XL C++ compiler for Linux also has an option to display the class layouts, the Virtual Function
Table entries, and all the intermediate object model tables, such as the Construction Virtual
Function Table and the Virtual Function Table. These help you to ensure binary compatibility through
verification of internal table layouts, and significantly enhance debugging of incompatibility problems.
Boost C++ Library support
Boost C++ Libraries are an Open Source set of libraries that takes you beyond the C++ Standard Library.
Boost makes C++ programming more elegant, robust, and productive. The Boost license grants
permission to copy, use, and modify the software for any commercial or non-commercial use. With the
non-restrictive licensing, these libraries are used directly by many commercial applications. Many of the
libraries are planned for inclusion in the next version of the C++ Standard Library.
Boost libraries allow you to be more productive through software reuse. The ability to compile and
execute the Boost Library properly demonstrates IBM's support of the latest C++ idioms and paradigms,
specifically generic programming and template metaprogramming.
Boost C++ libraries are coded by leading C++ experts, many of whom are long-time
members of the C++ Standard Committee. They use Boost as a test bed for cutting-edge C++
programming techniques and codify discoveries and best practices without the long delay that it takes for
a library to be formally accepted into the C++ Standard. However, the Boost community subjects each
submission to rigorous peer review. This free sharing of knowledge exposes a submission to a larger
audience, which helps C++ evolve and grow.
The IBM XL C++ compiler has attained a high degree of compatibility with Boost since V7.0 and
continues to support Boost as new releases appear. Each version of the compiler is fully tested on one
version of Boost, usually the latest. The following table shows the Boost support in each version of the
compiler.
Table 2. IBM XL C++ compiler and Boost supported versions

IBM XL C++ Compiler Version    Boost Release Version
10.1                           1.34.1
9.0                            1.34.0
8.0                            1.32.0
7.0                            1.30.2

A patch file is available that modifies the Boost 1.34.1 C++ libraries so that they can be built and used
with XL C/C++ applications. The patch or modification file does not extend or otherwise provide
additional functionality to the Boost C++ libraries. To download the patch file and for more information
on support for these libraries see the relevant links on the XL C/C++ Library page.
You should check the IBM XL C/C++ webpages for information regarding modifications that apply to the
supported version of Boost.
For a summary of the results of regression tests, see (Boost Library Regression Test Summaries):
v www.ibm.com/support/docview.wss?uid=swg27006911
For more information on portable C++ source libraries from Boost, see:
v www.boost.org
C++ Templates
Templates are an area of the C++ language that provides a great deal of flexibility for developers. The
ISO C++ standard defines the language facilities and features for templates.
The IBM XL C++ compiler provides several methods to compile templates:
v Simple layout method. This results in code bloat and longer compile time, but it is easy to use and
requires no specific structuring by programmers.
v Automatic instantiation using -qtempinc. This requires user code structuring but it addresses the long
compile time problem inherent in the simple layout method.
v Automatic instantiation using -qtemplateregistry. This requires no user code restructuring and
addresses both the long compile time and code bloat issues.
The instantiation mechanisms are the external mechanisms that allow C++ implementations to create
instantiations correctly. These mechanisms may be constrained by requirements of the linker and other
software building tools.
IBM XL C++ compilers have two queried instantiation mechanisms: -qtempinc, available before V7.0, and
-qtemplateregistry, available since V7.0. One of the differences between them is that -qtempinc
delays the instantiation until link time, whereas -qtemplateregistry performs the
instantiation in the first compilation unit that uses it.
-qtempinc and -qtemplateregistry compiler options are mutually exclusive.
Here is how you get the various instantiation models on our compiler:
v Greedy instantiation:
default is -qtmplinst=auto -qnotemplateregistry -qnotempinc or -qtmplinst=always
v Queried instantiation:
-qtemplateregistry or -qtempinc (for example -qtmplinst=auto)
v Manual instantiation:
-qtmplinst=none with explicit instantiations in your code.
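For the manual instantiation model, the explicit instantiations mentioned above look like the following. This is a minimal sketch in standard C++; the template larger_of is hypothetical, and in a real project the template definition would normally live in a header while the explicit instantiations sit in exactly one source file.

```cpp
#include <cassert>

// Template definition, normally kept in a header.
template <typename T>
T larger_of(T a, T b) { return (a < b) ? b : a; }

// Explicit instantiation definitions: they force the compiler to generate
// code for these specializations in this translation unit, which is what a
// build needs when implicit instantiation is switched off.
template int larger_of<int>(int, int);
template double larger_of<double>(double, double);
```

Any specialization that is used but not explicitly instantiated somewhere would then surface as an unresolved symbol at link time.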
Parallel programming
IBM provides parallel programming through AltiVec/VMX, OpenMP, and UPC as well as internal
automatic parallelization and autosimdization.
IBM XL C for AIX, V10.1 and IBM XL C/C++ for AIX, V10.1 added thread-specific variable support
through the Thread local storage (TLS) feature. TLS has been included in IBM XL C/C++ for Linux since
V8.0.
Thread-local storage (TLS)
Multi-threaded applications need thread-specific data: data that is unique to a thread, held in what is
called thread-local storage. Thread-local storage is a GNU extension that has been commonly adopted by
many vendors and is similar to the POSIX pthread_getspecific and pthread_setspecific functions. However,
the POSIX functions are slow and are not useful for converting single-threaded applications to
multi-threaded applications. This feature provides thread-local storage through a new storage class
specifier that indicates a variable has thread storage duration.
Thread-local storage (TLS) is enabled by the __thread storage class specifier, or the threadprivate directive
in OpenMP. -qtls enables recognition of the __thread storage class specifier. Thread-local variables are
global-lifetime memory locations (variables with linkage) that are replicated one per thread. At runtime, a
copy of the variable is created for each thread that accesses it. Use of thread-local storage prevents race
conditions on global data, without the need for low-level synchronization of threads. A simple example
demonstrating a practical use of thread-local storage is the C error code variable errno.
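The per-thread replication described above can be sketched as follows. This is a minimal illustration, assuming a compiler that accepts the __thread specifier and the GNU tls_model attribute (with XL C/C++, -qtls enables __thread). It uses std::thread from C++11 purely for brevity, which is an assumption beyond the compiler release discussed here; POSIX threads would serve equally well. All names are hypothetical.

```cpp
#include <cassert>
#include <thread>

// One copy of this counter exists per thread. The tls_model attribute is
// optional; initial-exec is one of the four models listed earlier.
static __thread int calls_on_this_thread
    __attribute__((tls_model("initial-exec"))) = 0;

// Increments the calling thread's own copy and returns the new value.
int bump() { return ++calls_on_this_thread; }

// Runs bump() 'times' times on a fresh thread and returns that thread's
// final count. The new thread starts from its own zero-initialized copy.
int count_on_new_thread(int times) {
    int result = 0;
    std::thread worker([&result, times] {
        for (int i = 0; i < times; ++i) result = bump();
    });
    worker.join();
    return result;
}
```

No lock is needed around the counter because no two threads ever touch the same copy. The count seen by the new thread is independent of any calls made earlier on the calling thread.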
AltiVec support
IBM XL C/C++ supports the AltiVec programming model through non-orthogonal language extensions.
These language extensions can be used on operating systems and hardware supporting the VMX
instruction set. The IBM implementation of the AltiVec Programming Interface specification is an
extended syntax that allows type qualifiers and storage class specifiers to precede the keyword vector (or
alternately, __vector) in a declaration.
Although not strictly required by the AltiVec Programming Interface specification, the vector keyword is
recognized in a declaration context only when used as a type specifier (and when you compile the
application with -qaltivec). The other AltiVec keywords, pixel and bool (for C), are recognized as valid
type specifiers only when used in a vector declaration context. This approach has an important
advantage: it allows your application to continue to use ″vector″ and ″pixel″ as variable and function names.
To ensure maximum portability, use the underscore versions of the specifiers vector and pixel (__vector
and __pixel) in declarations.
VMX support was first delivered on V7.0 Linux compilers, and is now available on V10.1 AIX compilers
where the target environment is running AIX V5.3 and AIX V6.1 on architectures that support the Single
Instruction Multiple Data (SIMD) instruction set.
OpenMP support
IBM XL C for AIX, V10.1 and IBM XL C/C++, V10.1 include support for the OpenMP API V3.0
specification for shared memory parallel programming. OpenMP provides a simple and flexible interface
for parallel application development. OpenMP is comprised of three components: compiler directives,
runtime library functions, and environment variables. Applications that conform to the OpenMP
specification are easily ported to other platforms from desktop to super computer that support the
specification.
OpenMP will support applications that run both as parallel programs (multiple threads of execution and
a full OpenMP support library) and as sequential programs (directives will be ignored and a stub library
will be linked).
The main differences between OpenMP API V2.5 and OpenMP API V3.0 are:
v Addition of task level parallelization. The new OpenMP constructs TASK and TASKWAIT give users
the ability to parallelize irregular algorithms, such as pointer chasing or recursive algorithms for which
the existing OpenMP constructs were not adequate.
v New variable types in FOR loops. In addition to signed int, the loop variable (var) of a FOR loop
can now be of unsigned int or pointer type. It can also be of a C++ class type that satisfies the
random access iterator requirements.
v Stack size control. You can now control the size of the stack for threads created by the OMP runtime
library using the new environment variable OMP_STACKSIZE.
v New environment variables. Users can give hints to the desired behavior of waiting threads using new
environment variables OMP_WAIT_POLICY and OMP_SET_POLICY.
v Storage reuse. Some restrictions on the PRIVATE clause have been removed. A list item that appears in
the reduction clause of a parallel construct can now also appear in a private clause on a work-sharing
construct.
v Scheduling. A new SCHEDULE attribute, auto, allows the compiler and runtime system to control
scheduling.
v STATIC schedule - Consecutive loop constructs with STATIC schedule can now use nowait.
v Nesting support. A COLLAPSE clause has been added to the DO, FOR, PARALLEL FOR, and
PARALLEL DO directives to allow parallelization of perfect loop nests. This means that multiple loops
in a nest can be parallelized.
v THREADPRIVATE directives. THREADPRIVATE directives can now apply to variables at class scope
in addition to file and block scope.
v Iterator loops. Iterator loops of canonical form, including those with random access
iterators, can now be parallelized.
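Two of the V3.0 additions listed above, the TASK and TASKWAIT constructs and the COLLAPSE clause, can be sketched as follows. This is a minimal illustration with hypothetical function names. The code is written so that it gives the same results whether the directives are honored (OpenMP enabled) or ignored (sequential build).

```cpp
#include <cassert>

// Recursive Fibonacci using tasks: each recursive call becomes a task, and
// taskwait blocks until both child tasks have produced their results.
long fib_task(int n) {
    if (n < 2) return n;
    long a = 0, b = 0;
    #pragma omp task shared(a) firstprivate(n)
    a = fib_task(n - 1);
    #pragma omp task shared(b) firstprivate(n)
    b = fib_task(n - 2);
    #pragma omp taskwait
    return a + b;
}

// Starts a parallel region; one thread creates the root of the task tree
// and the other threads in the team pick up the generated tasks.
long parallel_fib(int n) {
    long result = 0;
    #pragma omp parallel
    {
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}

// collapse(2) merges the two perfectly nested loops into one iteration
// space, so rows * cols iterations are divided among the threads.
long grid_sum(int rows, int cols) {
    long sum = 0;
    #pragma omp parallel for collapse(2) reduction(+:sum)
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            sum += static_cast<long>(i) * j;
    return sum;
}
```

A production version of fib_task would normally stop creating tasks below some problem size, since a task per call is far more overhead than the addition it performs.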
For a more in-depth discussion of application parallelization using OpenMP, see the following IBM
Redbooks®, Developing and Porting C and C++ Applications on AIX:
v www.redbooks.ibm.com/abstracts/SG245674.html?Open
For more information on OpenMP, see:
v www.openmp.org
Unified Parallel C
Unified Parallel C (UPC) is a specification for distributed shared memory parallelism. Unlike OpenMP,
which uses pragma directives to achieve a common syntax that is bolted on to C, C++, and Fortran, UPC
embeds the syntax directly in the C language. Currently, the specification is only available for C.
Available as a separate downloadable add-on package to the IBM XL C and IBM XL C/C++ compiler, the
IBM XL UPC Alpha Edition compiler is a technology showcase of the Unified Parallel C (UPC) language,
V1.1.1 and V1.2, supporting IBM Power systems running AIX and selected Linux solutions.
IBM’s XL UPC alpha compiler is an optimizing compiler providing extensive diagnostics and
compilation-time syntax checking of UPC constructs. As opposed to a source-to-source translator, a full
compiler offers the advantage of carrying the language semantics on from parsing through different levels
of optimization and all the way to the code generator.
Partitioned Global Address Space (PGAS) languages such as UPC are increasingly seen as a convenient
way to enhance programmer productivity for High Performance Computing (HPC) applications on
large-scale machines. As the Defense Advanced Research Projects Agency (DARPA) High Productivity
Computing Systems (HPCS) initiative illustrates, the cost of programming large-scale machines is
becoming increasingly important; thus, programmer productivity is a major factor in procurement
decisions by many HPC customers. This technology is tangible evidence of IBM’s continued commitment
to the HPC community.
The XL UPC runtime system has been designed for scalability to large, parallel machines, such as IBM’s
Blue Gene/L supercomputer. It exposes to the compiler an API that is uniform across several
implementations: shared memory (pthreads) and two types of distributed memory (LAPI and the Blue
Gene/L message layer). An experimental version of the IBM XL UPC alpha compiler was used on a Blue
Gene/L system to participate in the HPC Challenge Class II Competition (www.hpcchallenge.org). Two of
the HPC Challenge benchmarks, Random Access and EP Stream Triad, were implemented in UPC. Using
the same compiler technology present in the IBM XL UPC Alpha Edition, these programs were scaled to
the unprecedented number of 131072 threads — the full Blue Gene/L machine.
The IBM XL UPC submission was selected as one of the winners of the 2006 HPC Challenge Class II
Award. The results: 28.30 GUPS for Random Access and 91,627.49 GB/s for Stream Triad. For more
information, see:
v www.hpcchallenge.org/custom/index.html?lid=103&slid=220
The IBM XL UPC Alpha Edition compiler add-on package is available for download from the alphaWorks
website at www.alphaworks.ibm.com/tech/upccompiler/.
Power built-in functions
A number of built-in functions that map directly to Power hardware instructions were introduced in the
IBM XL C/C++ V9.0 compilers. These functions provide source-level access to powerful hardware
operations, such as cache prefetching and direct insertion of arithmetic hardware operations. Built-in
functions can be used in all the IBM XL C/C++ compilers, allowing you to port your code between AIX
and Linux and still exploit the hardware.
For POWER6, stream built-in functions were added. Experienced users may want to exploit patterns of
data accesses by setting up data streams. POWER6 has instructions that bring data into cache lines as
data is accessed in a regular stream access pattern. The new built-in functions can be used to exploit this.
dcbst and dcbf are two new built-in functions that copy the content of a modified block from the data
cache to main memory. dcbf also flushes the copy from the data cache.
The POWER6 processor has cache control and stream prefetch extensions with support for store stream
prefetch and prefetch depth control. IBM XL C/C++ provides the following new built-in functions to
provide you direct access to these instructions.
Table 3.

Built-in function                                          Description
void __dcbfl (const void* addr)                            POWER6 - Data Cache Block Flush from L1
                                                           data cache only
void __protected_unlimited_stream_set (unsigned int        Supported by POWER5 and POWER6
direction, const void* addr, unsigned int ID)
void __protected_unlimited_store_stream_set (unsigned      Supported by POWER6
int direction, const void* addr, unsigned int ID)
void __protected_store_stream_set (unsigned int            Supported by POWER6
direction, const void* addr, unsigned int ID)
void __protected_stream_count_depth (unsigned int          Supported by POWER6
unit_cnt, unsigned int prefetch_depth, unsigned int ID)

New built-in functions for floating-point division give you more control, rather than leaving the
compiler to select between hardware and software division code. Refer to the compiler
documentation for the list of supported built-in functions.
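As a rough illustration of how one of the cache control built-in functions listed above might be used, the following C sketch fills a buffer and then asks the hardware to flush each of its cache blocks from the L1 data cache with __dcbfl. Only the __dcbfl signature comes from this paper. The helper name fill_and_flush, the 128-byte block size, and the use of the __IBMC__, __IBMCPP__, and _ARCH_PWR6 macros to detect an XL compiler targeting POWER6 are assumptions; check the compiler documentation (including whether a header such as builtins.h is needed) before relying on them. On any other compiler the flush is simply skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: an XL compiler targeting POWER6 defines these macros. */
#if (defined(__IBMC__) || defined(__IBMCPP__)) && defined(_ARCH_PWR6)
#define HAVE_DCBFL 1
#else
#define HAVE_DCBFL 0
#endif

/* Assumption: cache block size in bytes on the target processor. */
#define CACHE_BLOCK_BYTES 128

/* Hypothetical helper: write 'value' into every byte of buf, then touch one
   address per cache block so the block can be flushed from L1. Assumes buf
   is aligned to a cache block boundary. Returns the number of blocks visited. */
size_t fill_and_flush(unsigned char *buf, size_t len, unsigned char value)
{
    size_t i;
    size_t blocks = 0;

    assert(buf != NULL);

    for (i = 0; i < len; i++)
        buf[i] = value;

    for (i = 0; i < len; i += CACHE_BLOCK_BYTES) {
#if HAVE_DCBFL
        __dcbfl(buf + i);   /* flush this block from the L1 data cache only */
#endif
        blocks++;
    }
    return blocks;
}
```

Keeping the built-in call behind a preprocessor guard lets the same source build with compilers that do not provide it, at the cost of losing the cache hint there.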
Optimization Capabilities
One of the key strengths of IBM XL C/C++ is optimization. These compilers offer the benefit of
optimization technology that has been evolving at IBM since the late 1980s, combining extensive
hardware knowledge with a comprehensive understanding of compiler technology and what users look
for in a compiler when building end-user applications. The optimizations can decrease execution time
by producing code that is highly tuned for execution on Power Architecture platforms. Improving
optimization is a key goal of the IBM compiler team, and one that will continue to be a major focus
with each iteration of the IBM XL C/C++ compilers.
The optimizer includes five base optimization levels: -O0, -O2, -O3, -O4, and -O5. These levels allow you
to choose from minimal optimization to intense program analysis that provides benefits even across
programming languages. Optimization analyses range from local basic block to subprogram to file-level
to whole-program analysis. The higher the optimization level, the more intense the program analysis
becomes as increasingly sophisticated optimization techniques are applied to your code.
At any optimization level, the optimizer performs transformations that result in performance
improvements, while still executing your code the way it was written. At higher levels, the optimizer can
trade numeric precision for execution speed. If this effect is not desired, you can specify compiler options
such as -qstrict to prevent such trade-offs. Other options such as -qsmallstack or -qcompact allow you to
bias optimization decisions in favor of smaller stack space or program size.
The IBM XL C/C++ compilers do not limit your optimization choices unnecessarily. All of the
optimization capabilities, including those discussed above, can be combined. You choose the levels and
types of optimizations best suited to your application and build constraints, putting ultimate control of
how your application builds and runs firmly in your hands.
For more information on optimization, please see the Code optimization with the IBM XL Compilers
whitepaper.
v www.ibm.com/support/docview.wss?uid=swg27005174
Enhancements to -qstrict
In IBM XL C/C++ V10.1, many suboptions have been added to the -qstrict option to allow more
fine-grained control over optimizations and transformations that violate strict program semantics. In
previous releases, the -qstrict option disabled all transformations controlled by the STRICT option. This is
still the behavior if you use -qstrict without suboptions. Likewise, in previous releases -qnostrict allowed
transformations that could change program semantics. Because higher optimization levels may require
relaxing strict program semantics, the suboptions allow you to relax selected rules in order to get
the specific benefits of faster code without turning off all semantic verification. There are 16 new
suboptions that can be used separately or through a suboption group. The groups are:
all
    Disables all semantics-changing transformations, including those controlled by the other
    suboptions.
ieeefp
    Controls whether individual operations conform to IEEE 754 semantics.
order
    Controls whether or not individual operations can be reordered in a way that may violate
    program language semantics.
precision
    Controls optimizations and transformations that may affect the precision of program results.
exceptions
    Controls optimizations and transformations that may affect the runtime exceptions generated by
    the program.
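To see why reordering and precision matter, consider the following small C sketch. It is not taken from the compiler documentation; the function names are hypothetical. It evaluates the same three-term sum in two mathematically equivalent orders and shows that the floating-point results can differ, which is the kind of change that strict semantics are meant to prevent. The volatile temporaries are only there so that the example itself is not re-associated by an optimizer.

```c
#include <assert.h>

/* Evaluate (a + b) + c, the order written in the source. */
double sum_as_written(double a, double b, double c)
{
    volatile double t = a + b;   /* volatile pins the evaluation order */
    return t + c;
}

/* Evaluate a + (b + c), a mathematically equivalent reordering. */
double sum_reassociated(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}

/* Self-check for one input where the two orders disagree: 1.0 is too small
   to survive being added to -1e20, so the reordered sum loses it. */
void check_reassociation_example(void)
{
    assert(sum_as_written(1e20, -1e20, 1.0) == 1.0);
    assert(sum_reassociated(1e20, -1e20, 1.0) == 0.0);
}
```

An optimizer that is free to reorder operations may turn the first form into the second. Whether that is acceptable depends on the application, which is why finer-grained control is useful.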
New and changed compiler options and directives
Compiler options can be specified on the command line or through directives embedded in your
application source files. See the XL C/C++ Compiler Reference for detailed descriptions and usage
information for these and other compiler options.
Table 4. New or changed compiler options and directives
Option/directive
    Description
-qstrict
    Many suboptions have been added to the -qstrict option to allow more control over
    optimizations and transformations that violate strict program semantics. See
    Performance and optimization for more information.
-qshowmacros
    When used in conjunction with the -E option, the -qshowmacros option replaces
    preprocessed output with macro definitions. Suboptions are provided to control the
    emission of predefined and user-defined macros more precisely.
-qreport
    When used together with compiler options that enable automatic parallelization or
    vectorization, the -qreport option now reports the number of streams in a loop and
    produces information when loops cannot be SIMD vectorized due to non-stride-one
    references.
-qnamemangling
    There are minor refinements to the mangling scheme, and there is a new suboption to
    provide backwards compatibility for the rare cases where it is needed.
-qsmp=omp
    XL C/C++ now supports some features of OpenMP 3.0. For more information, see
    OpenMP 3.0.
#pragma init and #pragma fini
    Programmers can use #pragma init and #pragma fini to specify a list of functions to run
    before or after main() or when shared libraries are loaded or unloaded. These functions
    can be used to do initialization and cleanup.
    Note: For C applications, a C++ invocation, such as xlC or the redistributable tools
    linkxlC or makeC++SharedLib, must be used at link time.
-qtimestamps
    This option can be used to remove timestamps from generated binaries.
-qtls
    The thread local storage support has been enhanced to include
    __attribute__((tls_model("string"))) where string is one of local-exec, initial-exec,
    local-dynamic, or global-dynamic.
-qinfo
    The suboptions als and noals have been added to the -qinfo option to report (or not
    report) possible violations of the ANSI aliasing rule.
-qpriority
    -qpriority is now supported in C. Also refer to #pragma init and #pragma fini listed
    above.
-qunique
    -qunique now applies to both C and C++. Also refer to #pragma init and #pragma fini
    listed above.
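As an illustration of the thread local storage attribute mentioned under -qtls, the following C sketch declares a per-thread counter and selects the initial-exec model for it. It assumes the GCC-compatible spelling tls_model for the attribute name and the __thread storage specifier. The variable and function names are hypothetical, and with XL C/C++ the -qtls option would presumably need to be in effect.

```c
#include <assert.h>

/* Per-thread running total. Each thread that calls add_to_thread_total
   sees and updates its own copy of this variable. */
static __thread long thread_total __attribute__((tls_model("initial-exec")));

/* Add a positive amount to the calling thread's total and return the
   new value. */
long add_to_thread_total(long amount)
{
    assert(amount > 0);
    thread_total += amount;
    return thread_total;
}
```

The choice of model is a trade-off between access speed and flexibility. This sketch picks initial-exec only as an example and makes no claim about which model suits a given application.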
IBM Mathematical Acceleration Subsystem (MASS) libraries
Starting with V7.0, the IBM XL C/C++ compilers for AIX and Linux ship the IBM Mathematical
Acceleration Subsystem (MASS) libraries of mathematical intrinsic functions, specifically
tuned for optimum performance on IBM Power Architecture.
The XL C/C++ V9.0 compilers for AIX and Linux introduced a new library, libmassvp6.a.
The MASS libraries include scalar and vector functions, are thread-safe, support both 32-bit and 64-bit
compilations, and offer improved performance.
The MASS scalar library, libmass.a, contains accelerated versions of a set of frequently used math
intrinsic functions found in the AIX system library libm.a.
Table 5. Libraries included in the MASS library
MASS vector library    Tuned for processor
libmassv.a             All models (not processor-specific)
libmassvp6.a           POWER6
libmassvp5.a           POWER5
libmassvp4.a           POWER4
libmassvp3.a           POWER3™
The MASS vector libraries libmassv.a, libmassvp3.a, libmassvp4.a, libmassvp5.a, and libmassvp6.a
contain tuned and threadsafe intrinsic functions that can be used with either Fortran or C applications.
libmassv.a contains vector functions that will run on all models in the IBM Power Systems family, while
libmassvp3.a and libmassvp4.a each contain a subset of libmassv.a functions that have been specifically
tuned for the POWER3 and POWER4 processors, respectively. libmassvp5.a contains functions that have
been tuned for POWER5, and libmassvp6.a contains functions tuned for POWER6.
Basic Linear Algebra Subprograms (BLAS)
IBM XL C/C++ Enterprise Edition V8.0 for AIX introduced the BLAS set of high-performance algebraic
functions. There are four BLAS functions shipped with IBM XL C/C++ in the libxlopt library. The
functions are:
v sgemv (single-precision) and dgemv (double-precision), which compute the matrix-vector product for a
general matrix or its transpose.
v sgemm (single-precision) and dgemm (double-precision), which perform combined matrix multiplication
and addition for general matrices or their transposes.
Because the BLAS routines are written in Fortran, all parameters are passed to them by reference, and all
arrays are stored in column-major order.
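The two conventions just mentioned, parameters passed by reference and column-major storage, can be sketched in plain C as follows. This is a hypothetical reference routine written for illustration, not the dgemv shipped in libxlopt, and its name and parameter list are assumptions. It computes y = alpha*A*x + beta*y for an m-by-n matrix A whose element (i, j) is stored at a[j*lda + i].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine: y = alpha*A*x + beta*y.
   A is m-by-n, stored column by column with leading dimension lda.
   Scalars are passed by reference, mirroring the Fortran convention. */
void ref_gemv_colmajor(const int *m, const int *n, const double *alpha,
                       const double *a, const int *lda,
                       const double *x, const double *beta, double *y)
{
    int i, j;

    assert(*lda >= *m);

    /* Scale the existing contents of y by beta. */
    for (i = 0; i < *m; i++)
        y[i] *= *beta;

    /* Walk A one column at a time, which is the contiguous direction
       in column-major storage. */
    for (j = 0; j < *n; j++) {
        for (i = 0; i < *m; i++)
            y[i] += *alpha * a[(size_t)j * (size_t)*lda + (size_t)i] * x[j];
    }
}
```

For example, the 2-by-3 matrix with rows (1, 2, 3) and (4, 5, 6) is laid out in memory as 1, 4, 2, 5, 3, 6. A C programmer who passes a row-major array unchanged would in effect be passing the transpose.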
Shared memory parallelization
XL C/C++ supports application development for multiprocessor system architectures. You can use any of
the following methods to develop your parallelized applications with XL C/C++:
v Directive-based shared memory parallelization (OpenMP, SMP).
v Instructing the compiler to automatically generate shared memory parallelization.
v Message passing based shared or distributed memory parallelization (MPI).
v POSIX threads (Pthreads) parallelization.
v Low-level UNIX parallelization using fork() and exec().
The parallel programming facilities of the AIX operating system are based on the concept of threads.
Parallel programming exploits the advantages of multiprocessor systems while maintaining full binary
compatibility with existing uniprocessor systems. This means that a multithreaded program that works
on a uniprocessor system can take advantage of a multiprocessor system without recompiling. For more
information, see “Parallelizing your programs” in the XL C/C++ Optimization and Programming Guide.
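As a minimal sketch of the directive-based approach listed above, the following C function uses an OpenMP work-sharing directive with a reduction to sum the integers from 1 to n. The function name is hypothetical. The directive takes effect only when OpenMP support is enabled at compile time (this paper names -qsmp=omp for XL C/C++). Otherwise a conforming compiler ignores the pragma and the loop runs serially with the same result.

```c
#include <assert.h>

/* Sum the integers 1..n. With OpenMP enabled, iterations are divided
   among threads and the per-thread partial sums are combined by the
   reduction clause. */
long sum_to_n(int n)
{
    long total = 0;
    int i;

    assert(n >= 0);

#pragma omp parallel for reduction(+:total)
    for (i = 1; i <= n; i++)
        total += i;

    return total;
}
```

Because the directive is a pragma rather than a library call, the same source serves as both the serial and the parallel version.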
IBM Debugger for AIX
The IBM Debugger for AIX can help you detect and diagnose errors in programs that are running locally
or remotely. You can control the execution of your programs by setting compiled language-specific
breakpoints, suspending execution, stepping through your code, and examining and changing the
contents of variables. The debugger contains views and functionality specific to a given programming
language. With the compiled language views, you can monitor variables, expressions, registers, memory,
and application modules of the application you are debugging.
Source-code migration and conformance checking
XL C/C++ helps protect your investment in your existing C/C++ source code by providing compiler
invocation commands that instruct the compiler to compile your application code to a specific language
level. You can also use the -qlanglvl compiler option to specify a given language level, and the compiler
will issue warnings, errors, and severe error messages if language or language extension elements in your
program source do not conform to that language level. See -qlanglvl for more information.
Support for third-party C++ runtime libraries
The IBM XL C++ compiler on AIX can compile C++ applications so that the application supports only the
core language, thus enabling it to link with C++ runtime libraries from third-party vendors. The
following archive files enable this functionality.
Table 6. Core Language libraries
Library name
    Content (see -qheapdebug for more information)
lib*C*core.a
    Contains exception handling, RTTI, static initialization, new and delete operators. Does not
    contain any of the following libraries: Input/Output, Localization, STL Containers, Iterators,
    Algorithms, Numerics, Strings.
libCcore.a
    The core language version of the C++ runtime library, libC.a.
libC128core.a
    The core language version of libC128.a.
libhCcore.a
    The core language version of libhC.a.
Invocation commands have been added to facilitate using these libraries:
v xlc++core
v xlCcore
Equivalent special invocations:
v xlc++core_r, xlc++core_r7, xlc++core128, xlc++core128_r, xlc++core128_r7
v xlCcore_r, xlCcore_r7, xlC128core, xlC128core_r, xlC128core_r7
Explanation of suffixes for special invocations:
v 128-suffixed invocations - All 128-suffixed invocation commands are functionally similar to their
corresponding base compiler invocations. They specify the -qldbl128 option, which increases the length
of long double types in your program from 64 to 128 bits. They also link with the 128-bit versions of
the C and C++ runtime libraries.
v _r suffixed invocations - All _r suffixed invocations allow for threadsafe compilation and you can use
them to link the programs that use multithreading. Use these commands if you want to create threaded
applications.
The _r7 invocations are provided to help migrate programs based on POSIX Draft 7 to POSIX Draft 10.
Command-line compatibility and other utilities
When you are porting GNU makefiles to IBM XL C/C++, the gxlc and gxlc++ invocation commands are
available to translate a GNU compiler invocation command into the corresponding IBM XL C/C++
command where applicable, and invoke the IBM XL C/C++ compiler. This facilitates the transition to
IBM XL C/C++ while minimizing the number of changes to makefiles built with a GNU compiler. A new
xlc++ command line utility was added in V8.0 to enable compatibility with other platforms.
To fully exploit the capabilities of IBM XL C/C++, you should use the IBM XL C/C++ invocation
commands and their associated options.
Rational PurifyPlus for Linux and UNIX
IBM Rational® PurifyPlus™ is a runtime analysis solution designed to help developers write faster, more
reliable code. Runtime analysis includes four basic functions: memory corruption detection, memory leak
detection, application performance profiling, and code coverage analysis. Rational PurifyPlus packages
support all four of these functions in a single product with a common install and licensing system.
Rational PurifyPlus for Linux and UNIX supports AIX, HP UNIX, Linux, and Sun UNIX.
For more information, see:
v www.ibm.com/software/awdtools/purifyplus/
Diagnostic listings
The compiler output listing can provide important information to help you develop and debug your
applications more efficiently. Listing information is organized into optional sections that you can include
or omit. For more information about the applicable compiler options and the listing itself, refer to
“Compiler messages and listings” in the XL C/C++ Compiler Reference.
Symbolic debugger support
You can instruct the XL C/C++ compilers to include debugging information in your compiled objects.
On AIX, this debugging information can be examined by dbx, the IBM Debugger for AIX, or any other
symbolic debugger that supports the AIX XCOFF executable format to help you debug your programs.
On Linux, you can use gdb, the IBM Debugger for Linux, or any other symbolic debugger to step
through and inspect the behavior of your compiled application.
Documentation and online help
IBM XL C/C++ uses a fully searchable HTML-based information center, the IBM Eclipse Help System.
The information center allows you to search and browse online information. The information center is
built upon open source software developed by the Eclipse project. For more information on Eclipse, see:
v www.eclipse.org
PDF versions of the IBM XL C/C++ manuals are available with the installation media (either product CD
or electronic package).
An extensive collection of technical material, trials and demos, support information, and features and
benefits of IBM XL C/C++ can be found at the following URL:
v www.ibm.com/awdtools/xlcpp/aix/library/
The IBM XL C/C++ compilers also include man pages for all utilities and compiler invocation
commands.
Premier customer service
The IBM XL C/C++ compilers come with IBM’s premier service and support. The IBM Service and
Support organization is made up of a team dedicated to providing you with responsive platform and
cross-platform software support. For complex or code-related problems, IBM employs specialized service
teams with access to compiler development experts. The vision of IBM Service and Support is to achieve
a level of support excellence that exceeds customer expectations and differentiates IBM in the
marketplace. You will always have access to the right level of IBM expertise when you need it.
Summary
IBM XL C/C++ compilers are stable and flexible, providing industry-leading optimization techniques that
can address your compiler needs for everything from small applications to large, computationally
intensive programs.
The extensive cross-platform availability of the IBM XL C/C++ compilers eases the porting process
between AIX, z/OS, and Linux. Standards conformance and GNU compatibility improve portability of
source code from GNU compilers to IBM compilers. The binary compatibility feature allows direct
linkage with objects, shared libraries, and archives built by either the GNU or IBM compilers. This allows
you to take advantage of the features offered by both suites of compiler products.
IBM is also deeply involved in the High Performance Computing effort. Three of the top ten entries in
the TOP 500 Supercomputing List are IBM systems using IBM XL C/C++ compiler optimizations. The
IBM XL C/C++ compiler team is deeply involved in parallel computing and supporting different parallel
memory models. Other new features support customer requests and enable middleware applications.
Finally, optimization through chip-specific instruction generation and tuning, parallelization,
vectorization, interprocedural analysis, and profile-directed feedback offers an increase in performance
without sacrificing stability or flexibility. Coupled with IBM’s excellent service and support, IBM XL
C/C++ compilers are robust, versatile, and capable of delivering mission-critical applications on AIX and
Linux.
Trial versions and purchasing
Trial versions of IBM XL C/C++ compilers can be downloaded at:
v www.ibm.com/software/awdtools/xlcpp/
Information on how to buy IBM XL C/C++ is also available at this web site.
Contacting IBM
IBM welcomes your comments. You can send them to [email protected].
November 2008
References in this document to IBM products, programs, or services do not imply that IBM intends to make these
available in all countries in which IBM operates. Any reference to an IBM program product in this publication is not
intended to state or imply that only IBM’s program product may be used. Any functionally equivalent program may
be used instead.
IBM, the IBM logo, and ibm.com® are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on
their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common
law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered
or common law trademarks in other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or
both and is used under license therefrom.
© Copyright International Business Machines Corporation 1999, 2008.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.