Use the IBM z13 SIMD unit and the IBM z/OS XL C/C++
compiler to add parallelism to your C/C++ programs
Process data efficiently with vector data types and operations
Rajan Bhakta
Technical Architect, z/OS XL C/C++ compilers
IBM
14 July 2015
The IBM z13 hardware provides a new SIMD unit. This article describes how to use the IBM
z/OS XL C/C++ language to take advantage of the new processor and exploit the enhanced
parallelism it offers. This article also provides an overview of the new data types, the operations
that can be done on those data types, and the built-in functions to make vector programming
easier.
Parallelism in C and C++ languages
The latest C and C++ language standards provide threading support and atomics, but even with
this support, parallelism is still not exploited as deeply as it can be. For example, the standards
prescribe no way to write single instruction/multiple data (SIMD) style code.
The IBM z/OS XL C/C++ V2.1.1 compiler adds support for SIMD data-level parallelism in three ways:
new data types that fit into the existing C and C++ type system, existing operators that work on
the new types in a natural and intuitive way, and a number of new built-in functions (BIFs)
that exploit SIMD functionality at the hardware level.
Writing efficient code that applies a series of identical or similar transformations to a large
amount of data is a common task. As long as the data does not have dependencies during the
transformations, it is usually a prime candidate for parallelism.
Encapsulating the data in a form that can be manipulated efficiently and transformed quickly
requires a strong connection between the hardware and the software. Without hardware support,
the software overhead often results in poorly performing code. Without software support, the
code is usually hard to maintain and hard to read.
© Copyright IBM Corporation 2015
developerWorks®
ibm.com/developerWorks/
The new IBM z13 processors provide the hardware for increased parallelism, and the IBM z/OS XL
C/C++ V2.1.1 compiler provides the software to exploit that hardware, allowing efficient and fast
data processing in fields such as business analytics.
SIMD principles
Large data sets (the multiple data part of SIMD) that require a series of operations lend
themselves well to data types that encapsulate chunks of the data and allow the operations to take
effect on the chunks.
The IBM z13 SIMD unit handles 16-byte chunks of data in various element sizes. It supports
instructions (the single instruction part of SIMD) that operate on 16 individual one-byte
elements, 8 individual two-byte elements, 4 individual four-byte elements, or 2 individual
eight-byte elements (again, the multiple data part of SIMD). There are also instructions that
operate on a single sixteen-byte element, but because that does not really fit the SIMD model,
this article does not discuss them.
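The element counts above follow from dividing the 16-byte register width by the element size. A minimal sketch of that arithmetic in portable C is shown here; the names VECTOR_BYTES and lanes_for_size are hypothetical and used only for illustration, and nothing in the sketch uses the SIMD unit itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Width of one z13 SIMD chunk in bytes, as stated in the text. */
#define VECTOR_BYTES 16

/* Number of elements of the given size that fit in one 16-byte chunk.
   lanes_for_size is a hypothetical helper name used only in this sketch. */
size_t lanes_for_size(size_t element_bytes) {
    return VECTOR_BYTES / element_bytes;
}
```

For example, lanes_for_size(sizeof(uint32_t)) evaluates to 4, which matches the "4 individual four-byte elements" case.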
New data types
The element sizes supported by the SIMD unit map naturally to some of the basic C and C++ data
types. You can chunk the data into various sized groups of normal C and C++ types. This chunking
is represented by a new family of data types called the vector data types. The data types are the
same for both C and C++ allowing interoperability between the two languages.
The vector data types share a common syntax: each begins with the vector or __vector keyword,
followed by the element type. The vector keyword is made available by the VECTOR option. This
article uses the vector keyword for simplicity, though __vector is valid wherever the vector
keyword is accepted.
The elements that are supported by the vector family of types are the normal integral types (char
to long long int) and the double type, along with boolean versions of each of the integral types
as seen in Table 1.
Table 1. Vector data types
Type                        Interpretation of content   Range of values
vector unsigned char        16 unsigned char            0..255
vector signed char          16 signed char              -128..127
vector bool char            16 unsigned char            0 (FALSE), 255 (TRUE)
vector unsigned short,      8 unsigned short            0..65535
vector unsigned short int
vector signed short,        8 signed short              -32768..32767
vector signed short int
vector bool short,          8 unsigned short            0 (FALSE), 65535 (TRUE)
vector bool short int
vector unsigned int         4 unsigned int              0..2^32-1
vector signed int           4 signed int                -2^31..2^31-1
vector bool int             4 unsigned int              0 (FALSE), 2^32-1 (TRUE)
vector unsigned long long   2 unsigned long long        0..2^64-1
vector signed long long     2 signed long long          -2^63..2^63-1
vector bool long long       2 unsigned long long        0 (FALSE), 2^64-1 (TRUE)
vector double               2 double                    IEEE-754 double (64 bit) precision floating-point values
Note that the long long int types are available because the VECTOR option implies
LANGLVL(LONGLONG). The double type is available only in IEEE mode, which is selected by
specifying the FLOAT(IEEE) option.
The data types in the vector family are all aligned on an 8 byte boundary.
Literals of these data types are written much like C99 compound literals, with the vector type
as the cast part and the vector element values in a brace-enclosed (or parenthesized) list.
The brace-enclosed or parenthesized element list can also be used to initialize the vector types.
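As a rough illustration of the C99 mechanism that the vector literal syntax resembles, the following portable sketch uses a hypothetical struct, u32x4, as a stand-in for vector unsigned int so that it compiles without the VECTOR option. Following the description above, the corresponding vector literal would presumably be written (vector unsigned int){1, 2, 3, 4}.

```c
/* Hypothetical stand-in for "vector unsigned int": four unsigned int
   elements. This is an ordinary struct, not a real vector type. */
typedef struct {
    unsigned int lane[4];
} u32x4;

/* Initialization from a brace-enclosed element list. */
u32x4 initialized_ramp(void) {
    u32x4 v = { {1, 2, 3, 4} };
    return v;
}

/* C99 compound literal: the type as the cast part, then the brace list. */
u32x4 literal_ramp(void) {
    return (u32x4){ {1, 2, 3, 4} };
}
```

Both functions return the same four element values; the only difference is whether the brace list initializes a named variable or forms an unnamed literal.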
Operations on vectors
The vector data types are not very useful without a way to transform or operate on the data.
Many of the normal C and C++ operators can be applied to the vector data types just as if they
were basic arithmetic types, which gives a concise and readable way to write SIMD algorithms.
Arithmetic operations
Common arithmetic operations such as addition, subtraction, increment, decrement, negation,
and others are supported. These operations, like most of the vector operations, act element by
element, following the SIMD model. For example, in the following code the variable result holds
four integers, all with the value 55, which are displayed to stdout.
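#include <stdio.h>

int main(void) {
    vector unsigned int a = {1, 2, 3, 4};
    vector unsigned int b = {54, 53, 52, 51};
    /* Element-by-element addition: every element of result is 55 */
    vector unsigned int result = a + b;
    printf("Result: %vld\n", result);
    /* [] selects a single element, so the return value is 55 */
    return result[2];
}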
To compile this code under z/OS UNIX System Services (USS), use the following command:
xlc -qvector -qarch=11 getResult.c -o getResult
The example above also uses the [] vector element selection operator. It allows access to
individual vector elements for both read and write purposes and acts just like the array index
operator. Note that since vector double types were not used, the -qfloat=ieee option did not need
to be specified.
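To make the element-by-element behavior concrete, here is a scalar sketch in portable C of what the a + b expression in the example computes. The array-based helper add_elements is a hypothetical name and is not part of the compiler's vector support; with the VECTOR option, the whole loop is expressed by the single + operator.

```c
#define ELEMENTS 4  /* a vector unsigned int holds four elements */

/* Scalar model of "result = a + b" for two vectors of four unsigned ints:
   each element of the result is the sum of the corresponding elements. */
void add_elements(const unsigned int a[ELEMENTS],
                  const unsigned int b[ELEMENTS],
                  unsigned int result[ELEMENTS]) {
    for (int i = 0; i < ELEMENTS; i++) {
        result[i] = a[i] + b[i];
    }
}
```

Called with {1, 2, 3, 4} and {54, 53, 52, 51}, every element of result is 55, which matches the example.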
Other operations
Along with the arithmetic and logical operators, there is the usual set of data type operators
such as sizeof, __alignof__, address-of, typeof, casts, and assignment. A new vec_step operator
that gives the number of elements in a vector is also available. This operator can help, for
example, when iterating over arrays of the underlying scalar type.
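One way to picture the iteration use case is the following portable sketch, which walks a scalar array in groups of STEP elements and then handles the leftover tail. STEP is a hypothetical constant standing in for the element count that vec_step would report for vector unsigned int (16 bytes divided by the element size), and the inner loop stands in for a single vector addition. The sketch does not use the vector types themselves.

```c
#include <stddef.h>

#define VECTOR_BYTES 16
/* Stand-in for the element count of "vector unsigned int" (an assumption
   of this sketch; real code would ask vec_step for the value). */
#define STEP (VECTOR_BYTES / sizeof(unsigned int))

/* Sums an array by accumulating STEP partial sums, one per element
   position, then folding them together and adding the scalar tail. */
unsigned int sum_in_chunks(const unsigned int *data, size_t count) {
    unsigned int partial[STEP] = {0};
    size_t i = 0;

    /* Full chunks: the inner loop models one element-by-element add. */
    for (; i + STEP <= count; i += STEP) {
        for (size_t lane = 0; lane < STEP; lane++) {
            partial[lane] += data[i + lane];
        }
    }

    /* Reduce the partial sums to a single value. */
    unsigned int total = 0;
    for (size_t lane = 0; lane < STEP; lane++) {
        total += partial[lane];
    }

    /* Leftover elements that do not fill a whole chunk. */
    for (; i < count; i++) {
        total += data[i];
    }
    return total;
}
```

Deriving the stride from the element count, rather than hard-coding 4, keeps the loop correct if the element type changes.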
For comparison purposes, the relational operators are also supported. For use in conditionals, the
relational operators result in either a 1 or 0 value based on the relation between all the elements of
one operand and the other. For more information on how to get element by element results and an
OR reduced result, see the "Built-in functions" section.
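The sentence above can be read as requiring the relation to hold for every pair of corresponding elements. A scalar sketch of that reading is shown below for == and <. This interpretation, and the helper names all_equal and all_less, are assumptions made for illustration; consult the compiler documentation for the exact rule that each relational operator follows.

```c
#define ELEMENTS 4  /* models a vector signed int */

/* Models "a == b" in a conditional under the assumed interpretation:
   1 only if every pair of corresponding elements is equal. */
int all_equal(const int a[ELEMENTS], const int b[ELEMENTS]) {
    for (int i = 0; i < ELEMENTS; i++) {
        if (a[i] != b[i]) {
            return 0;
        }
    }
    return 1;
}

/* Models "a < b" in a conditional under the same assumption:
   1 only if every element of a is less than its counterpart in b. */
int all_less(const int a[ELEMENTS], const int b[ELEMENTS]) {
    for (int i = 0; i < ELEMENTS; i++) {
        if (!(a[i] < b[i])) {
            return 0;
        }
    }
    return 1;
}
```

Under this model a single element that fails the relation makes the whole comparison 0, which is why a separate mechanism is needed for element-by-element or OR-reduced results.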
The default argument promotion rules that C and C++ provide do not take effect on vector types.
This means that the operands of a vector operation might have to be more closely related than
the operands of the equivalent scalar operation. For example, the bit-wise left shift operator
requires as its right operand either the same vector type, ignoring signedness (for
element-by-element shifting), or an unsigned long value (to shift every element by the same
number of bits). It also does not allow vector boolean types, because the only logical values
for those are 0 and the maximum value of the element type.
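The two accepted forms of the right operand can be modeled in scalar C as follows. This is an illustrative sketch with hypothetical helper names, using arrays in place of vector unsigned int. It assumes that every shift count is smaller than the element width, because larger counts are undefined for the scalar << operator.

```c
#define ELEMENTS 4  /* models a vector unsigned int */

/* Vector right operand: each element is shifted by its own count.
   Callers must keep every count below the bit width of unsigned int. */
void shift_left_each(const unsigned int v[ELEMENTS],
                     const unsigned int counts[ELEMENTS],
                     unsigned int out[ELEMENTS]) {
    for (int i = 0; i < ELEMENTS; i++) {
        out[i] = v[i] << counts[i];
    }
}

/* unsigned long right operand: every element is shifted by the same count.
   Callers must keep count below the bit width of unsigned int. */
void shift_left_all(const unsigned int v[ELEMENTS],
                    unsigned long count,
                    unsigned int out[ELEMENTS]) {
    for (int i = 0; i < ELEMENTS; i++) {
        out[i] = v[i] << count;
    }
}
```

With v = {1, 2, 3, 4}, shifting by the counts {1, 2, 3, 4} gives {2, 8, 24, 64}, while shifting all elements by 4 gives {16, 32, 48, 64}.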
Built-in functions
For the most efficient access to machine instructions, and for operations that do not lend
themselves to normal C and C++ operators, the IBM z/OS XL C/C++ compiler adds a number of
built-in functions (BIFs).
These vector BIFs are prototyped in builtins.h, just like most of the other BIFs. They are
active only when the VECTOR option is specified.
The BIFs that are available include specialized arithmetic operations (which are not available with
traditional C and C++ operators), compare, range compare, finding elements, gather/scatter, mask
generation, copy until zero, load and store, specialized logical operations, merge, pack/unpack,
replicate, rotate and shift, rounding and conversion, test, and the ANY and ALL predicates.
These BIFs usually correspond to the machine SIMD instructions to allow efficient and high
performing code. For example, in the following program code, for vec_any_lt you see the VFCH
(VECTOR FP COMPARE HIGH) instruction being generated in the pseudo-assembly listing.
#include <builtins.h>
#include <math.h>
#include <stdio.h>

// Returns the smallest double found in an array of vector double values
double getMinimumValue(vector double *list, int numberOfArrayElements) {
    vector double currentMinimum = { INFINITY, INFINITY }; // Infinities
    for (int i = 0; i < numberOfArrayElements; i++) {
        if (vec_any_lt(list[i], currentMinimum)) {
            // At least one element in the list element is smaller
            if (list[i][0] < list[i][1]) {
                currentMinimum = vec_splat(list[i], 0);
            } else {
                currentMinimum = vec_splat(list[i], 1);
            }
        }
    }
    // Both elements of currentMinimum hold the same value
    return currentMinimum[0];
}

#define NUM_ELEMENTS 3

int main(void) {
    vector double array[NUM_ELEMENTS] = {
        { 5.5, 6.6 },
        { 7.7, -3.14 },
        { 2.73, 0.0 }
    };
    double minValue = getMinimumValue(array, NUM_ELEMENTS);
    printf("Minimum Value: %e\n", minValue);
    return 55;
}
To compile this code under z/OS UNIX System Services (USS), use the following command:
xlc -qvector -qarch=11 -qfloat=ieee -qlist -qlanglvl=extc99 getMinimum.c -o getMinimum
The BIFs often translate directly into the associated machine instruction, but the compiler can still
optimize the generated code (perhaps even removing the instruction) if it finds a more efficient sequence.
Although the example above is simple, it does show the use of array indexing with vector element
indexing, for both initialization and for selecting elements of the vector. It also shows vector BIFs
and vector types being used as parameters and arguments to functions.
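For readers without access to the compiler, the logic of getMinimumValue can be followed with this scalar sketch in portable C. The f64x2 struct and the any_lt and splat helpers are hypothetical stand-ins for vector double, vec_any_lt, and vec_splat. They model the behavior that the example relies on and are not the BIFs themselves.

```c
#include <math.h>

/* Hypothetical stand-in for "vector double": two double elements. */
typedef struct {
    double lane[2];
} f64x2;

/* Models vec_any_lt: 1 if any element of a is less than the
   corresponding element of b, otherwise 0. */
int any_lt(f64x2 a, f64x2 b) {
    return a.lane[0] < b.lane[0] || a.lane[1] < b.lane[1];
}

/* Models vec_splat: copies the selected element into every position. */
f64x2 splat(f64x2 v, int index) {
    f64x2 r = { { v.lane[index], v.lane[index] } };
    return r;
}

/* Same control flow as getMinimumValue in the example above. */
double get_minimum_value(const f64x2 *list, int count) {
    f64x2 current = { { INFINITY, INFINITY } };
    for (int i = 0; i < count; i++) {
        if (any_lt(list[i], current)) {
            /* Keep the smaller of the two elements in both positions. */
            int smaller = (list[i].lane[0] < list[i].lane[1]) ? 0 : 1;
            current = splat(list[i], smaller);
        }
    }
    return current.lane[0];
}
```

Because both positions of the running minimum always hold the same value, an "any element is smaller" test is enough to decide whether the current pair contains a new minimum.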
Conclusion
The IBM z13 SIMD unit provides a rich mechanism for processing data faster and for writing code
that exploits parallelism at the data level. With the new data types, operations, and BIFs in
C and C++, programs that exploit the new unit can be written in a natural and intuitive way.
Along with the other features of the IBM z/OS XL C/C++ compiler, such as architecture sections,
inline assembly, and high optimization levels, programs can be written to fully exploit the new
processor, providing a huge boost to data analytics and processing.
READ: For more information on vector programming, read the "IBM z/OS XL C/C++ V2R1
Programming Guide" (SC14-7315-01)
About the author
Rajan Bhakta
Rajan Bhakta has six years of development experience for IBM XL C. He is currently
the ISO C Standards representative for Canada, and the C representative for IBM in
INCITS. He is also the Technical Architect for z/OS XL C/C++.
© Copyright IBM Corporation 2015
(www.ibm.com/legal/copytrade.shtml)
Trademarks
(www.ibm.com/developerworks/ibm/trademarks/)