XL C/C++ compilers

Overview

This paper details what's new in the IBM® XL C/C++ compiler family. IBM XL C/C++ is the successor to IBM's VisualAge® C++ compiler. Compiler features vary slightly by operating system platform, and platform-specific features are described in the appropriate sections. All versions of IBM XL C/C++ share the common features described below unless otherwise noted.

IBM XL C/C++ for AIX®, V10.1, IBM XL C/C++ for Linux®, V10.1, and IBM XL C/C++ for Multicore Acceleration for Linux, V10.1 are part of a multi-platform XL compiler family derived from a common code base optimized to run on IBM Power Architecture®.

IBM XL C/C++ for AIX is an industry-leading optimizing compiler that supports IBM Power systems capable of running IBM AIX V5.3, IBM AIX V6.1, and IBM i V6.1 PASE. IBM XL C/C++ fully exploits the POWER4™, POWER5™, POWER5+™, and POWER6™ architectures, including the Power 970 and Power 970MP as used in the IBM BladeCenter® JS21 and IBM BladeCenter JS22 systems.

The POWER6 processor, announced in May 2007, is the latest member of the IBM Power family. POWER6 is currently the fastest microprocessor ever built. Announced at the same time, the IBM Power Systems 570 is an ultra-powerful server that leverages the many breakthroughs in both energy conservation and virtualization technology of the POWER6. The IBM Power 570 server is the first UNIX® server ever to hold all four major benchmark speed records at once (as of May 2007): SPECint2006, SPECfp2006, SPECjbb2005, and TPC-C (an on-line transaction processing benchmark).
Note: For more information about these benchmarks see:
- www.ibm.com/systems/power (IBM Power Systems)
- www.spec.org/
- www.tpc.org/tpcc/

New features and enhancements in IBM XL C/C++ for AIX, V10.1 and IBM XL C/C++ for Linux, V10.1:
- Partial support for C++0x
- OpenMP API V3.0
- Enhancements to -qstrict
- New and changed compiler options and directives

Features and enhancements that were introduced in IBM XL C/C++, V9.0:
- Decimal floating-point
- C99 support
- Improved ASM support
- Improved GCC usability
- Most of Technical Report 1 (TR1) for C++
- Thread-local storage (TLS)
- Profile-directed feedback (PDF) without use of interprocedural analysis (IPA) at the object level
- Tested with Boost 1.34.0 and achieved over 95% pass-rate (C++)

IBM XL C/C++ for Linux, V10.1 is available on selected Linux distributions running on IBM BladeCenter JS20, IBM BladeCenter JS21, and IBM Power technology-based systems. You can run the compiler on Red Hat Enterprise Linux 5.2 (RHEL5.2) and SUSE Linux Enterprise Server 10 Service Pack 2 (SLES 10 SP2). For more information, see: www.ibm.com/software/awdtools/xlcpp/linux/

IBM XL C/C++ Advanced Edition for Blue Gene® (enabling support for IBM Blue Gene supercomputer systems) provides a set of built-in functions that are specifically optimized for the Power 440 and Power 440d’s Double Hummer dual FPU. These are in addition to the family-wide set of built-in functions optimized for the Power architecture. These built-in functions provide an almost one-to-one correspondence with Blue Gene’s Double Hummer instruction set. The compiler also exploits the performance capabilities of the PowerPC® 440d processor and its Double Hummer floating-point unit used in Blue Gene®/L™ systems, and the PowerPC 450d processor and its Double Hummer floating-point unit used in Blue Gene®/P™ systems.
For more information, see: www.ibm.com/software/awdtools/xlcpp/features/bg/xlcpp-bg.html

IBM XL C/C++ for Multicore Acceleration for Linux, V10.1 adopts proven high-performance compiler technologies used in its compiler family predecessors, and adds new features tailored to exploit the unique performance capabilities of processors compliant with the new Cell Broadband Engine™ architecture. It also introduces an additional invocation command that compiles and links Power Processor Unit (PPU) and Synergistic Processor Unit (SPU) code segments in a single compiler invocation. For more information, see: www.ibm.com/software/awdtools/xlcpp/multicore/

Other members of the IBM XL C/C++ compiler family include:
- z/OS® XL C/C++ (an optional priced feature of the z/OS operating system). For more information, see: www.ibm.com/software/awdtools/czos/
- XL C/C++ for z/VM®. For more information, see: www.ibm.com/software/awdtools/czvm/

IBM XL C/C++ compilers comply with the latest C/C++ international standards and industry specifications, facilitating application porting across hardware platforms and operating systems. The compilers support a large array of common language features.

The increased compatibility with GNU C/C++ gives you the versatility to build different parts of your application with either the IBM or GNU compiler, and still bind the parts together into a single application. One common use of this functionality is to build an application with IBM XL C/C++ that interacts with GNU-built dynamic libraries, without recompiling the library source code. Applications built with this functionality can integrate with the GNU assembler, and also provide full support for debugging through gdb, the GNU debugger.

IBM XL C/C++ compilers on AIX and Linux also offer support for the IBM XL Fortran compiler on AIX and Linux through interlanguage calls.
IBM XL C/C++ offers developers the opportunity to create and optimize 32-bit and 64-bit applications for the AIX and Linux platforms. On operating systems and architectures supporting the VMX instruction set, the IBM XL C/C++ compilers allow you to take advantage of the AltiVec programming model and APIs. They also allow you to improve the performance of your data-intensive and CPU-intensive applications by exploiting the IBM XL C/C++ automatic SIMD vectorization technology.

IBM XL C/C++ compilers continue to make strides in the development of multiplatform, shared-memory parallel applications by providing a technology showcase of the Unified Parallel C (UPC) V1.2 language specification. You can download this technology showcase as a separate free-of-charge add-on. For more information, see: www.alphaworks.ibm.com/tech/upccompiler/

Standards conformance

On Linux platforms the compilers use the GNU C and C++ headers, and the resulting application is linked with the C and C++ runtime libraries provided by the GNU compiler shipped with the operating system. IBM ships an implementation of some header files with the product to override the corresponding GNU header files. These header files are functionally equivalent to the corresponding GNU implementation. Other IBM headers are wrappers that include the corresponding GNU header files.

IBM compilers strive to maximize the performance of scientific, technical, and commercial applications on server platforms. Multiple operating system availability ensures cross-platform portability, augmented by standards compliance.

IBM XL compilers conform to the following standards:
- The IBM XL C compiler conforms to the ISO C90 and C99 standards.
- The IBM XL C++ compiler supports a limited form of C99 because of its usefulness in mixed C and C++ code and header file inclusion. It also supports C++98 with the 2003 Technical Corrigendum 1 updates.
IBM XL C/C++ compilers also conform to these specifications:
- AltiVec (excluding z/OS, z/VM, and Blue Gene XL C/C++)
- OpenMP V3.0
  - IBM XL C for AIX, V10.1 and IBM XL C/C++ for AIX, V10.1
  - IBM XL C/C++ for Linux, V10.1
- OpenMP V2.5
  - IBM XL C/C++ for Multicore Acceleration for Linux, V10.1
  - IBM XL C/C++ Advanced Edition for Blue Gene, V9.0
- Unified Parallel C V1.2 (C), supported by the XL UPC alphaWorks® compiler
- IEEE POSIX 1003.2

The C99 standard has been updated with a technical corrigendum (known as TC2). TC2 contains bug fixes. These updates were first incorporated into IBM XL C V9.0.

C++0x

IBM XL C/C++, V10.1 introduces support for the upcoming release of the standard for the C++ programming language, codenamed C++0x. This standard has not yet been officially adopted, but we are beginning to support some of its features. However, these features might change or be removed in the future, depending on what is finally ratified in the standard. Specifically, in this release:
- A new language level has been created.
- New integer promotion rules for arithmetic conversions, with added support for C++0x long long data types, have been introduced.
- The C++ preprocessor now fully supports C99 features according to C++0x.

New language level - extended0x

The default -qlanglvl compiler option remains extended when invoking the C++ compiler. A new suboption has been added to the -qlanglvl option in this release. -qlanglvl=extended0x allows users to try out early implementations of any features of C++0x that are currently supported by XL C/C++.

C99 long long under C++

Expected compiler behavior is different with XL C/C++, V10.1 when performing certain arithmetic operations with integral literal data types. Specifically, the integer promotion rules have changed.
Starting with this release and when compiling with -qlanglvl=extended0x, the compiler promotes an unsuffixed integral literal to the first type in this list into which it fits:
- int
- long int
- long long int
- unsigned long long

Note: Like our implementation of the C99 Standard in the C compiler, C++ will allow promotions from long long to unsigned long long if a value cannot fit into a long long type, but can fit in an unsigned long long. In this case, a message will be generated.

The macro __C99_LLONG has been added for compatibility with C99. This macro is defined to 1 with -qlanglvl=extended0x and is otherwise undefined.

Preprocessor changes

The following changes to the C++ preprocessor make it easier to port code from C to C++.
- Regular string literals can now be concatenated with wide-string literals.
- The #line <integer> preprocessor directive has a larger upper limit. It has been increased from 32767 to 2147483647 for C++.
- C++ now supports the _Pragma operator.
- These macros now apply to C++ as well as C:
  - __C99_MACRO_WITH_VA_ARGS (also available with -qlanglvl=extended)
  - __C99_MAX_LINE_NUMBER (also available with -qlanglvl=extended)
  - __C99_PRAGMA_OPERATOR
  - __C99_MIXED_STRING_CONCAT

Note: Except as noted, these C++ preprocessor changes are only available when compiling with -qlanglvl=extended0x.

For additional information about the language standards supported by XL C/C++, see Language levels and language extensions.

Other XL C/C++ language-related updates

Vector data types

Vector data types can now use some of the operators that can be used with base data types, such as:
- unary operators
- binary operators
- relational operators

Thread local storage

The thread local storage support has been enhanced to include __attribute__((tls_model("string"))) where string is one of local-exec, initial-exec, local-dynamic, or global-dynamic.

Operating system support

IBM XL C/C++ for AIX, V10.1 supports AIX V6.1 as well as AIX V5.3 TL5.
This version of the compiler does not support AIX V5.2.

Decimal floating-point support for XL C/C++

Decimal floating-point arithmetic offers greater computational performance and precision in business and financial applications where numeric data I/O is usually performed in decimal form. Data conversions from decimal type to binary floating-point type and back are avoided, as are the rounding errors that accumulate during those conversions.

IBM XL C/C++ Enterprise Edition for AIX, V9.0 added support for decimal floating-point arithmetic with two new compiler options:

Table 1. Decimal floating-point compiler options

Option/Directive
    Description
-qdfp | -qnodfp
    Specifying -qdfp enables compiler support for decimal floating-point data types and literals.
-qfloat=dfpemulate | nodfpemulate
    Specifying -qfloat=dfpemulate instructs the compiler to use software emulation when handling decimal floating-point computations.
-y
    There are suboptions specific to decimal floating-point arithmetic for the -y option to control rounding of constant expressions.

Note: Compiler support for decimal floating-point operations requires AIX 5L™ for POWER™ V5.3 with the 5300-06 Technology Level or higher.

For more information, see Extension for the programming language C to support decimal floating-point arithmetic: TR 24732 and Decimal Types for C++: Draft 4.

C99 support

The default -qlanglvl compiler option setting is extc99 when invoking the C compiler with the xlc invocation. This change allows you to use C99 features and headers without having to explicitly specify the extc99 suboption.

You might encounter issues with the following when compiling with the new default -qlanglvl=extc99 setting:
- Pointers can be qualified with restrict in C99, so restrict cannot be used as an identifier.
- C99 treatment of long long data differs from the way long long data is handled in C89.
- C99 header files define new macros: LLONG_MAX in limits.h, and va_copy in stdarg.h.
- The value of macro __STDC_VERSION__ changes from 199409 to 199901.

To revert to previous xlc behavior, specify -qlanglvl=extc89 when invoking the compiler.

Support for C++ Technical Report 1 (TR1)

IBM XL C/C++ Enterprise Edition for AIX, V9.0 introduced support for numerous extensions to the C++ language as defined by the Draft Technical Report on C++ Library Extensions (TR1).

For more information on these language extensions, see Draft Technical Report on C++ Library Extensions (TR1) at www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1836.pdf

Enhanced Unicode and NLS support

As recommended in a recent report from the C Standard committee, the C compiler extends C99 to add new data types to support UTF-16 and UTF-32 literals. These are referred to as u-literals and U-literals. To enable support for UTF literals in your source code, you must compile with the -qutf option enabled. The C++ compiler also supports these new data types for compatibility with C.

The C++ runtime can take advantage of the ability of the AIX V5.3 and V6.1 operating systems to load multiple locales when the application runs on such a system.

IBM is a corporate member of the Unicode Consortium. For more information regarding Unicode, see:
- www.unicode.org

Porting from open source and other platforms

The cross-platform portability of gcc and g++ has ensured GNU a place in the Open Source community. GNU has excelled in educational and compiler research arenas as a test bed for new language syntax. IBM compilers are built on a platform of reliability, customer service, and cutting-edge optimization. In recent years, the IBM XL compilers have been evolving to gain some of the additional flexibility and portability of the GNU compilers, while still retaining the strengths that have built the IBM XL C/C++ compiler’s reputation in the industry.

GNU source compatibility

XL C/C++ supports a subset of the GNU compiler command options to facilitate porting applications developed with gcc and g++ compilers.
This support is available when the gxlc or gxlc++ invocation command is used together with select GNU compiler options. Where possible, the XL C/C++ compiler maps GNU options to their XL C/C++ compiler option counterparts before invoking the XL C/C++ compiler. These invocation commands use a plain text configuration file to control GNU-to-XL C/C++ option mappings and defaults. You can customize this configuration file to better meet the needs of any unique compilation requirements you may have. See "Reusing GNU C/C++ compiler options with gxlc and gxlc++" for more information.

GNU Binary Compatibility (Linux only)

The IBM XL C/C++ compilers achieve a high degree of binary compatibility with GNU-built objects, archives, and shared objects. The compiler achieves this by adhering to the system ABI and calling conventions, and by closely following the GNU behavior where alignment modifiers like the attributes aligned and packed are used. C++ interoperability is somewhat more difficult to achieve due to differing conventions for name mangling, object model, and exception handling. However, the GNU C++ compiler, since V3.2, has adopted a common vendor C++ ABI that defines a way to allow interoperability of C++ object model, name mangling, and exception handling. This common C++ ABI is supported in the IBM XL C++ compilers.

IBM XL C/C++, V10.1 compilers on Linux have been fully tested with GNU C/C++ 4.1.2 on SLES10 and on RHEL5, and offer a high degree of binary compatibility in addition to source compatibility.

As the C++ ABI improves through bug fixes, some incompatibility can be found even between GNU C++ versions. Significant changes were introduced in GNU C++ V3.4. Most of the changes had to do with subtle corner cases of empty bases and bitfield placement and alignment. These changes were encoded in a new C++ ABI version. To maintain portability, a new option was introduced to allow backward compatibility as well as selecting the new ABI.
This option is -qabi_version=n and is available in the IBM XL C++ compiler for Linux.

The XL C++ compiler for Linux also has an option to display the class layouts, the Virtual Function Table entries, and all the intermediate object model tables such as the Construction Virtual Function Table and the Virtual Function Table. These help you to ensure binary compatibility through verification of internal table layouts, and significantly enhance debugging of incompatibility problems.

Boost C++ Library support

Boost C++ Libraries are an Open Source set of libraries that takes you beyond the C++ Standard Library. Boost makes C++ programming more elegant, robust, and productive. The Boost license grants permission to copy, use, and modify the software for any commercial or non-commercial use. With the non-restrictive licensing, these libraries are used directly by many commercial applications. Many of the libraries are planned for inclusion in the next version of the C++ Standard Library.

Boost libraries allow you to be more productive through software reuse. The ability to compile and execute the Boost Library properly demonstrates IBM's support of the latest C++ idioms and paradigms, specifically generic programming and template metaprogramming.

Boost C++ libraries are coded by the leading C++ experts in the world, many of whom are long-time members of the C++ Standard Committee. They use Boost as a test bed for cutting-edge C++ programming techniques and codify discoveries and best practices without the long delay that it takes for a library to be formally accepted into the C++ Standard. However, the Boost community subjects each submission to rigorous peer review. This free sharing of knowledge exposes a submission to a larger audience, which helps C++ evolve and grow.

The IBM XL C++ compiler has attained a high degree of compatibility with Boost since V7.0 and continues to support Boost as new releases appear.
Each version of the compiler is fully tested on one version of Boost, usually the latest. The following table shows the Boost support in each version of the compiler.

Table 2. IBM XL C++ compiler and Boost supported versions

IBM XL C++ Compiler Version    Boost Release Version
10.1                           1.34.1
9.0                            1.34.0
8.0                            1.32.0
7.0                            1.30.2

A patch file is available that modifies the Boost 1.34.1 C++ libraries so that they can be built and used with XL C/C++ applications. The patch or modification file does not extend or otherwise provide additional functionality to the Boost C++ libraries. To download the patch file and for more information on support for these libraries, see the relevant links on the XL C/C++ Library page.

You should check the IBM XL C/C++ webpages for information regarding modifications that apply to the supported version of Boost. For a summary of the results of regression tests, see (Boost Library Regression Test Summaries):
- www.ibm.com/support/docview.wss?uid=swg27006911

For more information on portable C++ source libraries from Boost, see:
- www.boost.org

C++ Templates

Templates are an area of the C++ language that provides a great deal of flexibility for developers. The ISO C++ standard defines the language facilities and features for templates. The IBM XL C++ compiler provides several methods to compile templates:
- Simple layout method. This results in code bloat and longer compile time, but it is easy to use and requires no specific structuring by programmers.
- Automatic instantiation using -qtempinc. This requires user code structuring but it addresses the long compile time problem inherent in the simple layout method.
- Automatic instantiation using -qtemplateregistry. This requires no user code restructuring and addresses both the long compile time and code bloat issues.

The instantiation mechanisms are the external mechanisms that allow C++ implementations to create instantiations correctly.
These mechanisms may be constrained by requirements of the linker and other software building tools. IBM XL C++ compilers have two queried instantiation mechanisms: -qtempinc, available before V7.0, and -qtemplateregistry, available since V7.0. One of the differences between them is that -qtempinc delays the instantiation until link time, whereas -qtemplateregistry does the instantiation in the first compilation unit that uses it. The -qtempinc and -qtemplateregistry compiler options are mutually exclusive.

Here is how you get the various instantiation models on our compiler:
- Greedy instantiation: default is -qtmplinst=auto -qnotemplateregistry -qnotempinc or -qtmplinst=always
- Queried instantiation: -qtemplateregistry or -qtempinc (for example -qtmplinst=auto)
- Manual instantiation: -qtmplinst=none with explicit instantiations in your code.

Parallel programming

IBM provides parallel programming through AltiVec/VMX, OpenMP, and UPC as well as internal automatic parallelization and autosimdization.

IBM XL C for AIX, V10.1 and IBM XL C/C++ for AIX, V10.1 added thread-specific variable support through the Thread local storage (TLS) feature. TLS has been included in IBM XL C/C++ for Linux since V8.0.

Thread-local storage (TLS)

Multi-threaded applications need support for thread-specific data. This is data that is unique to a thread and is called thread-local storage. It is a GNU extension that has been commonly adopted by many vendors and is similar to the POSIX pthread_getspecific and pthread_setspecific functions. However, the POSIX functions are slow and not useful for converting single-threaded applications to multi-threaded applications. This feature allows thread-local storage using a new storage class specifier that indicates a variable has thread storage duration.

Thread-local storage (TLS) is enabled by the __thread storage class specifier, or the threadprivate directive in OpenMP. -qtls enables recognition of the __thread storage class specifier.
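As a minimal sketch of the __thread specifier described above, the following C fragment gives each thread its own copy of a counter. It assumes a POSIX threads environment; the names tls_counter, count_up, and tls_demo are hypothetical and chosen only for illustration. With XL C/C++ the -qtls option mentioned above would apply, while GNU compilers accept __thread without an extra option.

```c
#include <pthread.h>
#include <stddef.h>

/* One private copy of this counter exists in every thread that touches it. */
static __thread int tls_counter = 0;

/* Thread body: increment the calling thread's own copy *n times, then
   report the final value back through the argument. */
static void *count_up(void *arg)
{
    int *n = (int *)arg;
    int limit = *n;
    for (int i = 0; i < limit; i++)
        tls_counter++;          /* no lock needed: the copy is per-thread */
    *n = tls_counter;
    return NULL;
}

/* Hypothetical helper: runs two worker threads that count to n1 and n2.
   Stores each worker's final count in out1/out2 and returns the calling
   thread's own copy of tls_counter, or -1 if a thread cannot be created. */
int tls_demo(int n1, int n2, int *out1, int *out2)
{
    pthread_t t1, t2;
    *out1 = n1;
    *out2 = n2;
    if (pthread_create(&t1, NULL, count_up, out1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, count_up, out2) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return tls_counter;
}
```

Because each worker increments only its own copy, the two counts do not interfere with each other, and the copy belonging to the thread that called tls_demo is left unchanged. If tls_counter were an ordinary global, the unsynchronized increments would race.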
Thread-local variables are global-lifetime memory locations (variables with linkage) that are replicated one per thread. At runtime, a copy of the variable is created for each thread that accesses it. Use of thread-local storage prevents race conditions on global data, without the need for low-level synchronization of threads. A simple example demonstrating a practical use of thread-local storage is the C error code variable errno.

AltiVec support

IBM XL C/C++ supports the AltiVec programming model through non-orthogonal language extensions. These language extensions can be used on operating systems and hardware supporting the VMX instruction set. The IBM implementation of the AltiVec Programming Interface specification is an extended syntax that allows type qualifiers and storage class specifiers to precede the keyword vector (or alternately, __vector) in a declaration.

Although not strictly required by the AltiVec Programming Interface specification, the vector keyword is recognized in a declaration context only when used as a type specifier (and when you compile the application with -qaltivec). The other AltiVec keywords, pixel and bool (for C), are recognized as valid type specifiers only when used in a vector declaration context. This approach has an important advantage: it allows your application to continue to use "vector" and "pixel" as variable and function names. To ensure maximum portability, use the underscore versions of the specifiers vector and pixel (__vector and __pixel) in declarations.

VMX support was first delivered on V7.0 Linux compilers, and is now available on V10.1 AIX compilers where the target environment is running AIX V5.3 or AIX V6.1 on architectures that support the Single Instruction Multiple Data (SIMD) instruction set.

OpenMP support

IBM XL C for AIX, V10.1 and IBM XL C/C++, V10.1 include support for the OpenMP API V3.0 specification for shared memory parallel programming.
OpenMP provides a simple and flexible interface for parallel application development. OpenMP consists of three components: compiler directives, runtime library functions, and environment variables. Applications that conform to the OpenMP specification are easily ported to other platforms that support the specification, from desktop to supercomputer. OpenMP supports applications that run both as parallel programs (multiple threads of execution and a full OpenMP support library) and as sequential programs (directives are ignored and a stub library is linked).

The main differences between OpenMP API V2.5 and OpenMP API V3.0 are:
- Addition of task level parallelization. The new OpenMP constructs TASK and TASKWAIT give users the ability to parallelize irregular algorithms, such as pointer chasing or recursive algorithms for which the existing OpenMP constructs were not adequate.
- New variable types in FOR loops. In addition to signed int, FOR loops can now contain var values of unsigned int and pointer type. The FOR loops can also contain var values that are C++ classes that satisfy the random access iterator requirements.
- Stack size control. You can now control the size of the stack for threads created by the OMP runtime library using the new environment variable OMP_STACKSIZE.
- New environment variables. Users can give hints about the desired behavior of waiting threads using the new environment variables OMP_WAIT_POLICY and OMP_SET_POLICY.
- Storage reuse. Some restrictions on the PRIVATE clause have been removed. A list item that appears in the reduction clause of a parallel construct can now also appear in a private clause on a work-sharing construct.
- Scheduling. A new SCHEDULE attribute, auto, allows the compiler and runtime system to control scheduling.
- STATIC schedule. Consecutive loop constructs with STATIC schedule can now use nowait.
- Nesting support. A COLLAPSE clause has been added to the DO, FOR, PARALLEL FOR, and PARALLEL DO directives to allow parallelization of perfect loop nests. This means that multiple loops in a nest can be parallelized.
- THREADPRIVATE directives. THREADPRIVATE directives can now apply to variables at class scope in addition to file and block scope.
- Iterator loops. Parallelization of iterator loops of canonical form, including those with random access iterators.

For a more in-depth discussion of application parallelization using OpenMP, see the following IBM Redbooks®, Developing and Porting C and C++ Applications on AIX:
- www.redbooks.ibm.com/abstracts/SG245674.html?Open

For more information on OpenMP, see:
- www.openmp.org

Unified Parallel C

Unified Parallel C (UPC) is a specification for distributed shared memory parallelism. Unlike OpenMP, which uses pragma directives to achieve a common syntax that is bolted on to C, C++, and Fortran, UPC embeds the syntax directly in the C language. Currently, the specification is only available for C.

Available as a separate downloadable add-on package to the IBM XL C and IBM XL C/C++ compiler, the IBM XL UPC Alpha Edition compiler is a technology showcase of the Unified Parallel C (UPC) language, V1.1.1 and V1.2, supporting IBM Power systems running AIX and selected Linux solutions.

IBM’s XL UPC alpha compiler is an optimizing compiler providing extensive diagnostics and compilation-time syntax checking of UPC constructs. As opposed to a source-to-source translator, a full compiler offers the advantage of carrying the language semantics on from parsing through different levels of optimization and all the way to the code generator.

Partitioned Global Address Space (PGAS) languages such as UPC are increasingly seen as a convenient way to enhance programmer productivity for High Performance Computing (HPC) applications on large-scale machines.
As the Defense Advanced Research Projects Agency (DARPA) High Productivity Computing Systems (HPCS) initiative illustrates, the cost of programming large-scale machines is becoming increasingly important; thus, programmer productivity is a major factor in procurement decisions by many HPC customers. This technology is tangible evidence of IBM’s continued commitment to the HPC community.

The XL UPC runtime system has been designed for scalability to large, parallel machines, such as IBM’s Blue Gene/L supercomputer. It exposes to the compiler an API that is uniform across several implementations: shared memory (pthreads) and two types of distributed memory (LAPI and the Blue Gene/L message layer).

An experimental version of the IBM XL UPC alpha compiler was used on a Blue Gene/L system to participate in the HPC Challenge Class 2 Competition (www.hpcchallenge.org). Two of the HPC Challenge benchmarks, Random Access and EP Stream Triad, were implemented in UPC. Using the same compiler technology present in the IBM XL UPC Alpha Edition, these programs were scaled to the unprecedented number of 131072 threads (the full Blue Gene/L machine).

The IBM XL UPC submission was selected as one of the winners of the 2006 HPC Challenge Class 2 Award. The results: 28.30 GUPS for Random Access and 91,627.49 GB/s for Stream Triad. For more information, see:
- www.hpcchallenge.org/custom/index.html?lid=103&slid=220

The IBM XL UPC Alpha Edition compiler add-on package is available for download from the alphaWorks website at www.alphaworks.ibm.com/tech/upccompiler/.

Power built-in functions

The IBM XL C/C++ V9.0 compilers introduced a number of built-in functions that map directly to Power hardware instructions. These functions provide source-level access to powerful hardware operations such as cache prefetching and direct insertion of arithmetic hardware operations.
Built-in functions can be used in all the IBM XL C/C++ compilers, allowing you to port your code between AIX and Linux and still exploit the hardware.

For POWER6, stream built-in functions were added. Experienced users may want to exploit patterns of data accesses by setting up data streams. POWER6 has instructions that bring data into cache lines as data is accessed in a regular stream access pattern. The new built-in functions can be used to exploit this.

dcbst and dcbf are two new built-in functions that copy the content of a modified block from the data cache to main memory. dcbf also flushes the copy from the data cache.

The POWER6 processor has cache control and stream prefetch extensions with support for store stream prefetch and prefetch depth control. IBM XL C/C++ provides the following new built-in functions to give you direct access to these instructions.

Table 3.

Built-in function
void __dcbfl (const void* addr)
    POWER6 - Data Cache Block Flush from L1 data cache only
void __protected_unlimited_stream_set (unsigned int direction, const void* addr, unsigned int ID)
    Supported by POWER5 and POWER6
void __protected_unlimited_store_stream_set (unsigned int direction, const void* addr, unsigned int ID)
    Supported by POWER6
void __protected_store_stream_set (unsigned int direction, const void* addr, unsigned int ID)
    Supported by POWER6
void __protected_stream_count_depth (unsigned int unit_cnt, unsigned int prefetch_depth, unsigned int ID)
    Supported by POWER6

New built-in functions for floating-point division give you more control, rather than leaving the compiler to make the selection between hardware and software division code.

Refer to the compiler documentation for specific details of the list of supported built-ins.

Optimization Capabilities

One of the key strengths of IBM XL C/C++ is optimization.
These compilers offer the benefit of optimization technology that has been evolving at IBM since the late 1980s, combining extensive hardware knowledge with a comprehensive understanding of compiler technology and what users look for in a compiler when building end-user applications. The optimizations can decrease execution time and make your applications run faster, producing code that is highly tuned for execution on Power Architecture platforms. Improving optimization is a key goal of the IBM compiler team, and one that will continue to be a major focus with each iteration of the IBM XL C/C++ compilers.

The optimizer includes five base optimization levels: -O0, -O2, -O3, -O4, and -O5. These levels allow you to choose from minimal optimization to intense program analysis that provides benefits even across programming languages. Optimization analyses range from local basic block to subprogram to file-level to whole-program analysis. The higher the optimization level, the more intense the program analysis becomes as increasingly sophisticated optimization techniques are applied to your code.

At any optimization level, the optimizer performs transformations that result in performance improvements, while still executing your code the way it was written. At higher levels, the optimizer can trade numeric precision for execution speed. If this effect is not desired, you can specify compiler options such as -qstrict to prevent such trade-offs. Other options such as -qsmallstack or -qcompact allow you to bias optimization decisions in favor of smaller stack space or program size.

The IBM XL C/C++ compilers do not limit your optimization choices unnecessarily. All of the optimization capabilities, including those discussed above, can be combined. You choose the levels and types of optimizations best suited to your application and build constraints, putting ultimate control of how your application builds and runs firmly in your hands.
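To illustrate why trading numeric precision for speed can change results, the following C sketch evaluates the same three-term sum in two orders. It does not show actual XL C/C++ output or any particular optimization. The function names are hypothetical, and the volatile temporaries are used only to pin each evaluation order so that the difference is visible regardless of compiler settings.

```c
/* Evaluates (a + b) + c, the order in which the source is written. */
double sum_as_written(double a, double b, double c)
{
    volatile double t = a + b;   /* volatile pins this evaluation order */
    return t + c;
}

/* Evaluates a + (b + c), a reassociation that an optimizer might apply
   when strict floating-point semantics are relaxed. */
double sum_reassociated(double a, double b, double c)
{
    volatile double t = b + c;   /* volatile pins this evaluation order */
    return a + t;
}
```

With a = 1e20, b = -1e20, and c = 1.0, the first function returns 1.0, while the second returns 0.0 because adding 1.0 to -1e20 is lost to rounding in double precision. Reordering of this kind is one example of a transformation that can affect the precision of program results.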
For more information on optimization, see the Code optimization with the IBM XL Compilers whitepaper:
• www.ibm.com/support/docview.wss?uid=swg27005174

Enhancements to -qstrict

In IBM XL C/C++ V10.1, many suboptions have been added to the -qstrict option that allow more fine-grained control over optimizations and transformations that violate strict program semantics. In previous releases, the -qstrict option disabled all transformations controlled by the STRICT option. This is still the behavior if you use -qstrict without suboptions. Likewise, in previous releases -qnostrict allowed transformations that could change program semantics. Because higher levels of optimization may require relaxing strict program semantics, the suboptions allow you to relax selected rules to get the specific benefits of faster code without turning off all semantic verification.

There are 16 new suboptions that can be used separately or by using a suboption group. The groups are:

all
    Disables all semantics-changing transformations, including those controlled by the other suboptions.
ieeefp
    Controls whether individual operations conform to IEEE 754 semantics.
order
    Controls whether individual operations can be reordered in a way that may violate program language semantics.
precision
    Controls optimizations and transformations that may affect the precision of program results.
exceptions
    Controls optimizations and transformations that may affect the runtime exceptions generated by the program.

New and changed compiler options and directives

Compiler options can be specified on the command line or through directives embedded in your application source files. See the XL C/C++ Compiler Reference for detailed descriptions and usage information for these and other compiler options.

Table 4.
New or changed compiler options and directives
Option/directive    Description

-qstrict
    Many suboptions have been added to the -qstrict option to allow more control over optimizations and transformations that violate strict program semantics. See Performance and optimization for more information.
-qshowmacros
    When used in conjunction with the -E option, the -qshowmacros option replaces preprocessed output with macro definitions. Suboptions are provided to control the emission of predefined and user-defined macros more precisely.
-qreport
    When used together with compiler options that enable automatic parallelization or vectorization, the -qreport option now reports the number of streams in a loop and produces information when loops cannot be SIMD vectorized due to non-stride-one references.
-qnamemangling
    There are minor refinements to the mangling scheme, and there is a new suboption to provide backwards compatibility for the rare cases where it is needed.
-qsmp=omp
    XL C/C++ now supports some features of OpenMP 3.0. For more information, see OpenMP 3.0.
#pragma init and #pragma fini
    Programmers can use #pragma init and #pragma fini to specify a list of functions to run before or after main() or when shared libraries are loaded or unloaded. These functions can be used to do initialization and cleanup.
    Note: For C applications, a C++ invocation, such as xlC or the redistributable tools linkxlC or makeC++SharedLib, must be used at link time.
-qtimestamps
    This option can be used to remove timestamps from generated binaries.
-qtls
    The thread-local storage support has been enhanced to include __attribute__((tls_model("string"))) where string is one of local-exec, initial-exec, local-dynamic, or global-dynamic.
-qinfo
    The suboptions als and noals have been added to the -qinfo option to report (or not report) possible violations of the ANSI aliasing rule.
-qpriority
    -qpriority is now supported in C. Also refer to #pragma init and #pragma fini listed above.
-qunique
    -qunique now applies to both C and C++. Also refer to #pragma init and #pragma fini listed above.

IBM Mathematical Acceleration Subsystem (MASS) libraries

Starting with V7.0, the IBM XL C/C++ compilers for AIX and Linux ship the IBM Mathematical Acceleration Subsystem (MASS) libraries of mathematical intrinsic functions specifically tuned for optimum performance on IBM Power Architectures. The XL C/C++ V9.0 compilers for AIX and Linux introduced a new library, libmassvp6.a.

The MASS libraries include scalar and vector functions, are thread-safe, support both 32-bit and 64-bit compilations, and offer improved performance.

The MASS scalar library, libmass.a, contains an accelerated set of frequently used math intrinsic functions in the AIX system library libm.a.

Table 5. Libraries included in the MASS library
MASS vector library    Tuned for processor
libmassv.a
libmassvp6.a           POWER6
libmassvp5.a           POWER5
libmassvp4.a           POWER4
libmassvp3.a           POWER3™

The MASS vector libraries libmassv.a, libmassvp3.a, libmassvp4.a, libmassvp5.a, and libmassvp6.a contain tuned and thread-safe intrinsic functions that can be used with either Fortran or C applications. libmassv.a contains vector functions that will run on all models in the IBM Power Systems family, while libmassvp3.a and libmassvp4.a each contain a subset of libmassv.a functions that have been specifically tuned for the POWER3 and POWER4 processors, respectively. libmassvp5.a contains functions that have been tuned for POWER5, and libmassvp6.a contains functions tuned for POWER6.

Basic Linear Algebra Subprograms (BLAS)

IBM XL C/C++ Enterprise Edition V8.0 for AIX introduced the BLAS set of high-performance algebraic functions. There are four BLAS functions shipped with IBM XL C/C++ in the libxlopt library. The functions are:
• sgemv (single-precision) and dgemv (double-precision), which compute the matrix-vector product for a general matrix or its transpose.
• sgemm (single-precision) and dgemm (double-precision), which perform combined matrix multiplication and addition for general matrices or their transposes.

Because the BLAS routines are written in Fortran, all parameters are passed to them by reference, and all arrays are stored in column-major order.

Shared memory parallelization

XL C/C++ supports application development for multiprocessor system architectures. You can use any of the following methods to develop your parallelized applications with XL C/C++:
• Directive-based shared memory parallelization (OpenMP, SMP).
• Instructing the compiler to automatically generate shared memory parallelization.
• Message passing based shared or distributed memory parallelization (MPI).
• POSIX threads (Pthreads) parallelization.
• Low-level UNIX parallelization using fork() and exec().

The parallel programming facilities of the AIX operating system are based on the concept of threads. Parallel programming exploits the advantages of multiprocessor systems, while maintaining full binary compatibility with existing uniprocessor systems. This means that a multithreaded program that works on a uniprocessor system can take advantage of a multiprocessor system without recompiling.

For more information, see “Parallelizing your programs” in the XL C/C++ Optimization and Programming Guide.

IBM Debugger for AIX

The IBM Debugger for AIX can help you detect and diagnose errors in programs that are running locally or remotely. You can control the execution of your programs by setting compiled language-specific breakpoints, suspending execution, stepping through your code, and examining and changing the contents of variables.

The debugger contains views and functionality specific to a given programming language. With the compiled language views, you can monitor variables, expressions, registers, memory, and application modules of the application you are debugging.
Source-code migration and conformance checking

XL C/C++ helps protect your investment in your existing C/C++ source code by providing compiler invocation commands that instruct the compiler to compile your application code to a specific language level. You can also use the -qlanglvl compiler option to specify a given language level, and the compiler will issue warnings, errors, and severe error messages if language or language extension elements in your program source do not conform to that language level.

See -qlanglvl for more information.

Support for third-party C++ runtime libraries

The IBM XL C++ compiler on AIX can compile C++ applications so that the application supports only the core language, thus enabling it to link with C++ runtime libraries from third-party vendors. The following archive files enable this functionality.

Table 6. Core Language libraries
Library name    Content (see -qheapdebug for more information)

lib*C*core.a
    Contains exception handling, RTTI, static initialization, and the new and delete operators. Does not contain any of the following libraries: Input/Output, Localization, STL Containers, Iterators, Algorithms, Numerics, Strings.
libCcore.a
    The core language version of the C++ runtime library, libC.a.
libC128core.a
    The core language version of libC128.a.
libhCcore.a
    The core language version of libhC.a.

Invocation commands have been added to facilitate using these libraries:
• xlc++core
• xlCcore

Equivalent special invocations:
• xlc++core_r, xlc++core_r7, xlc++core128, xlc++core128_r, xlc++core128_r7
• xlCcore_r, xlCcore_r7, xlC128core, xlC128core_r, xlC128core_r7

Explanation of suffixes for special invocations:
• 128-suffixed invocations - All 128-suffixed invocation commands are functionally similar to their corresponding base compiler invocations. They specify the -qldbl128 option, which increases the length of long double types in your program from 64 to 128 bits. They also link with the 128-bit versions of the C and C++ runtime libraries.
• _r suffixed invocations - All _r suffixed invocations allow for threadsafe compilation, and you can use them to link programs that use multithreading. Use these commands if you want to create threaded applications. The _r7 invocations are provided to help migrate programs based on POSIX Draft 7 to POSIX Draft 10.

Command-line compatibility and other utilities

When you are porting GNU makefiles to IBM XL C/C++, the gxlc and gxlc++ invocation commands are available to translate a GNU compiler invocation command into the corresponding IBM XL C/C++ command where applicable, and invoke the IBM XL C/C++ compiler. This facilitates the transition to IBM XL C/C++ while minimizing the number of changes to makefiles built with a GNU compiler.

A new xlc++ command line utility was added in V8.0 to enable compatibility with other platforms.

To fully exploit the capabilities of IBM XL C/C++, you should use the IBM XL C/C++ invocation commands and their associated options.

Rational PurifyPlus for Linux and UNIX

IBM Rational® PurifyPlus™ is a runtime analysis solution designed to help developers write faster, more reliable code. Runtime analysis includes four basic functions: memory corruption detection, memory leak detection, application performance profiling, and code coverage analysis. Rational PurifyPlus packages support all four of these functions in a single product with a common install and licensing system. Rational PurifyPlus for Linux and UNIX supports AIX, HP UNIX, Linux, and Sun UNIX. For more information, see:
• www.ibm.com/software/awdtools/purifyplus/

Diagnostic listings

The compiler output listing can provide important information to help you develop and debug your applications more efficiently. Listing information is organized into optional sections that you can include or omit. For more information about the applicable compiler options and the listing itself, refer to “Compiler messages and listings” in the XL C/C++ Compiler Reference.
Symbolic debugger support

You can instruct the XL C for AIX and XL C/C++ for AIX compilers to include debugging information in your compiled objects. This debugging information can be examined by dbx, the IBM Debugger for AIX, or any other symbolic debugger that supports the AIX XCOFF executable format to help you debug your programs. On Linux, you can use gdb, the IBM Debugger for Linux, or any other symbolic debugger to step through and inspect the behavior of your compiled application.

Documentation and online help

IBM XL C/C++ uses a fully searchable HTML-based information center, the IBM Eclipse Help System. The information center allows you to search and browse online information. The information center is built upon open source software developed by the Eclipse project. For more information on Eclipse, see:
• www.eclipse.org

PDF versions of the IBM XL C/C++ manuals are available with the installation media (either product CD or electronic package).

An extensive collection of technical material, trials and demos, support information, and features and benefits of IBM XL C/C++ can be found at the following URL:
• www.ibm.com/awdtools/xlcpp/aix/library/

The IBM XL C/C++ compilers also include man pages for all utilities and compiler invocation commands.

Premier customer service

The IBM XL C/C++ compilers come with IBM’s premier service and support. The IBM Service and Support organization is made up of a team dedicated to providing you with responsive platform and cross-platform software support. For complex or code-related problems, IBM employs specialized service teams with access to compiler development experts. The vision of IBM Service and Support is to achieve a level of support excellence that exceeds customer expectations and differentiates IBM in the marketplace. You will always have access to the right level of IBM expertise when you need it.
Summary

IBM XL C/C++ compilers are stable and flexible, providing industry leading optimization techniques that can address your compiler needs for everything from small applications to large, computationally intensive programs.

The extensive cross-platform availability of the IBM XL C/C++ compilers eases the porting process between AIX, z/OS, and Linux. Standards conformance and GNU compatibilities improve portability of source code from GNU compilers to IBM compilers. The binary compatibility feature allows direct linkage with objects, shared libraries, and archives built by either the GNU or IBM compilers. This allows you to take advantage of the features offered by both suites of compiler products.

IBM is also deeply involved in the High Performance Computing effort. Three of the top ten entries in the TOP 500 Supercomputing List are IBM systems using IBM XL C/C++ compiler optimizations. The IBM XL C/C++ compiler team is deeply involved in parallel computing and supporting different parallel memory models.

Other new features support customer requests and enable middleware applications.

Finally, optimization through chip-specific instruction generation and tuning, parallelization, vectorization, interprocedural analysis, and profile-directed feedback offers an increase in performance without sacrificing stability or flexibility. Coupled with IBM’s excellent service and support, IBM XL C/C++ compilers are robust, versatile, and capable of delivering mission critical applications on AIX and Linux.

Trial versions and purchasing

Trial versions of IBM XL C/C++ compilers can be downloaded at:
• www.ibm.com/software/awdtools/xlcpp/

Information on how to buy IBM XL C/C++ is also available at this web site.

Contacting IBM

IBM welcomes your comments. You can send them to [email protected].

November 2008

References in this document to IBM products, programs, or services do not imply that IBM intends to make these available in all countries in which IBM operates.
Any reference to an IBM program product in this publication is not intended to state or imply that only IBM’s program product may be used. Any functionally equivalent program may be used instead. IBM, the IBM logo, and ibm.com® are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. Linux is a trademark of Linus Torvalds in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. © Copyright International Business Machines Corporation 1999, 2008. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.