Official OpenMP API Specification, Version 3.1 (July 2011)

OpenMP Application Program Interface
Version 3.1 July 2011
Copyright © 1997-2011 OpenMP Architecture Review Board.
Permission to copy without fee all or part of this material is granted,
provided the OpenMP Architecture Review Board copyright notice and
the title of this document appear. Notice is given that copying is by
permission of OpenMP Architecture Review Board.
CONTENTS

1. Introduction
    1.1 Scope
    1.2 Glossary
        1.2.1 Threading Concepts
        1.2.2 OpenMP Language Terminology
        1.2.3 Tasking Terminology
        1.2.4 Data Terminology
        1.2.5 Implementation Terminology
    1.3 Execution Model
    1.4 Memory Model
        1.4.1 Structure of the OpenMP Memory Model
        1.4.2 The Flush Operation
        1.4.3 OpenMP Memory Consistency
    1.5 OpenMP Compliance
    1.6 Normative References
    1.7 Organization of this document

2. Directives
    2.1 Directive Format
        2.1.1 Fixed Source Form Directives
        2.1.2 Free Source Form Directives
    2.2 Conditional Compilation
        2.2.1 Fixed Source Form Conditional Compilation Sentinels
        2.2.2 Free Source Form Conditional Compilation Sentinel
    2.3 Internal Control Variables
        2.3.1 ICV Descriptions
        2.3.2 Modifying and Retrieving ICV Values
        2.3.3 How the Per-Data Environment ICVs Work
        2.3.4 ICV Override Relationships
    2.4 parallel Construct
        2.4.1 Determining the Number of Threads for a parallel Region
    2.5 Worksharing Constructs
        2.5.1 Loop Construct
        2.5.2 sections Construct
        2.5.3 single Construct
        2.5.4 workshare Construct
    2.6 Combined Parallel Worksharing Constructs
        2.6.1 Parallel Loop Construct
        2.6.2 parallel sections Construct
        2.6.3 parallel workshare Construct
    2.7 Tasking Constructs
        2.7.1 task Construct
        2.7.2 taskyield Construct
        2.7.3 Task Scheduling
    2.8 Master and Synchronization Constructs
        2.8.1 master Construct
        2.8.2 critical Construct
        2.8.3 barrier Construct
        2.8.4 taskwait Construct
        2.8.5 atomic Construct
        2.8.6 flush Construct
        2.8.7 ordered Construct
    2.9 Data Environment
        2.9.1 Data-sharing Attribute Rules
        2.9.2 threadprivate Directive
        2.9.3 Data-Sharing Attribute Clauses
        2.9.4 Data Copying Clauses
    2.10 Nesting of Regions

3. Runtime Library Routines
    3.1 Runtime Library Definitions
    3.2 Execution Environment Routines
        3.2.1 omp_set_num_threads
        3.2.2 omp_get_num_threads
        3.2.3 omp_get_max_threads
        3.2.4 omp_get_thread_num
        3.2.5 omp_get_num_procs
        3.2.6 omp_in_parallel
        3.2.7 omp_set_dynamic
        3.2.8 omp_get_dynamic
        3.2.9 omp_set_nested
        3.2.10 omp_get_nested
        3.2.11 omp_set_schedule
        3.2.12 omp_get_schedule
        3.2.13 omp_get_thread_limit
        3.2.14 omp_set_max_active_levels
        3.2.15 omp_get_max_active_levels
        3.2.16 omp_get_level
        3.2.17 omp_get_ancestor_thread_num
        3.2.18 omp_get_team_size
        3.2.19 omp_get_active_level
        3.2.20 omp_in_final
    3.3 Lock Routines
        3.3.1 omp_init_lock and omp_init_nest_lock
        3.3.2 omp_destroy_lock and omp_destroy_nest_lock
        3.3.3 omp_set_lock and omp_set_nest_lock
        3.3.4 omp_unset_lock and omp_unset_nest_lock
        3.3.5 omp_test_lock and omp_test_nest_lock
    3.4 Timing Routines
        3.4.1 omp_get_wtime
        3.4.2 omp_get_wtick

4. Environment Variables
    4.1 OMP_SCHEDULE
    4.2 OMP_NUM_THREADS
    4.3 OMP_DYNAMIC
    4.4 OMP_PROC_BIND
    4.5 OMP_NESTED
    4.6 OMP_STACKSIZE
    4.7 OMP_WAIT_POLICY
    4.8 OMP_MAX_ACTIVE_LEVELS
    4.9 OMP_THREAD_LIMIT

A. Examples
    A.1 A Simple Parallel Loop
    A.2 The OpenMP Memory Model
    A.3 Conditional Compilation
    A.4 Internal Control Variables (ICVs)
    A.5 The parallel Construct
    A.6 Controlling the Number of Threads on Multiple Nesting Levels
    A.7 Interaction Between the num_threads Clause and omp_set_dynamic
    A.8 Fortran Restrictions on the do Construct
    A.9 Fortran Private Loop Iteration Variables
    A.10 The nowait clause
    A.11 The collapse clause
    A.12 The parallel sections Construct
    A.13 The firstprivate Clause and the sections Construct
    A.14 The single Construct
    A.15 Tasking Constructs
    A.16 The taskyield Directive
    A.17 The workshare Construct
    A.18 The master Construct
    A.19 The critical Construct
    A.20 worksharing Constructs Inside a critical Construct
    A.21 Binding of barrier Regions
    A.22 The atomic Construct
    A.23 Restrictions on the atomic Construct
    A.24 The flush Construct without a List
    A.25 Placement of flush, barrier, taskwait and taskyield Directives
    A.26 The ordered Clause and the ordered Construct
    A.27 The threadprivate Directive
    A.28 Parallel Random Access Iterator Loop
    A.29 Fortran Restrictions on shared and private Clauses with Common Blocks
    A.30 The default(none) Clause
    A.31 Race Conditions Caused by Implied Copies of Shared Variables in Fortran
    A.32 The private Clause
    A.33 Fortran Restrictions on Storage Association with the private Clause
    A.34 C/C++ Arrays in a firstprivate Clause
    A.35 The lastprivate Clause
    A.36 The reduction Clause
    A.37 The copyin Clause
    A.38 The copyprivate Clause
    A.39 Nested Loop Constructs
    A.40 Restrictions on Nesting of Regions
    A.41 The omp_set_dynamic and omp_set_num_threads Routines
    A.42 The omp_get_num_threads Routine
    A.43 The omp_init_lock Routine
    A.44 Ownership of Locks
    A.45 Simple Lock Routines
    A.46 Nestable Lock Routines

B. Stubs for Runtime Library Routines
    B.1 C/C++ Stub Routines
    B.2 Fortran Stub Routines

C. OpenMP C and C++ Grammar
    C.1 Notation
    C.2 Rules

D. Interface Declarations
    D.1 Example of the omp.h Header File
    D.2 Example of an Interface Declaration include File
    D.3 Example of a Fortran Interface Declaration module
    D.4 Example of a Generic Interface for a Library Routine

E. OpenMP Implementation-Defined Behaviors

F. Features History
    F.1 Version 3.0 to 3.1 Differences
    F.2 Version 2.5 to 3.0 Differences

Index

CHAPTER 1
Introduction

The collection of compiler directives, library routines, and environment variables
described in this document collectively define the specification of the OpenMP
Application Program Interface (OpenMP API) for shared-memory parallelism in C, C++
and Fortran programs.
This specification provides a model for parallel programming that is portable across
shared memory architectures from different vendors. Compilers from numerous vendors
support the OpenMP API. More information about the OpenMP API can be found at the
following web site:

http://www.openmp.org

The directives, library routines, and environment variables defined in this document
allow users to create and manage parallel programs while permitting portability. The
directives extend the C, C++ and Fortran base languages with single program multiple
data (SPMD) constructs, tasking constructs, worksharing constructs, and
synchronization constructs, and they provide support for sharing and privatizing data.
The functionality to control the runtime environment is provided by library routines and
environment variables. Compilers that support the OpenMP API often include a
command line option to the compiler that activates and allows interpretation of all
OpenMP directives.
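
As an illustration of how a directive extends the base language, the
following minimal C sketch applies a parallel loop directive to an ordinary
for loop. The function name scale_array is hypothetical and not part of the
OpenMP API. The sketch assumes a compiler that ignores the pragma when its
OpenMP option is not enabled (GCC, for example, uses -fopenmp), in which case
the loop simply runs sequentially.

```c
/* Hypothetical helper: multiplies every element of a[0..n-1] by factor.
   With OpenMP enabled, the iterations are divided among the threads of
   the team; otherwise the pragma is ignored and the loop is sequential. */
void scale_array(double *a, int n, double factor)
{
    int i;

#pragma omp parallel for
    for (i = 0; i < n; i++) {
        a[i] = a[i] * factor;
    }
}
```

Because each iteration touches a different element of a, the result is the
same whether the loop runs on one thread or on many.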

1.1 Scope
The OpenMP API covers only user-directed parallelization, wherein the programmer
explicitly specifies the actions to be taken by the compiler and runtime system in order
to execute the program in parallel. OpenMP-compliant implementations are not required
to check for data dependencies, data conflicts, race conditions, or deadlocks, any of
which may occur in conforming programs. In addition, compliant implementations are
not required to check for code sequences that cause a program to be classified as
non-conforming. Application developers are responsible for correctly using the OpenMP API
to produce a conforming program. The OpenMP API does not cover compiler-generated
automatic parallelization and directives to the compiler to assist such parallelization.

1.2 Glossary

1.2.1 Threading Concepts

thread
An execution entity with a stack and associated static memory, called
threadprivate memory.

OpenMP thread
A thread that is managed by the OpenMP runtime system.

thread-safe routine
A routine that performs the intended function even when executed
concurrently (by more than one thread).

1.2.2 OpenMP Language Terminology

base language
A programming language that serves as the foundation of the OpenMP
specification.
COMMENT: See Section 1.6 on page 17 for a listing of current base
languages for the OpenMP API.

base program
A program written in a base language.

structured block
For C/C++, an executable statement, possibly compound, with a single entry
at the top and a single exit at the bottom, or an OpenMP construct.
For Fortran, a block of executable statements with a single entry at the top and
a single exit at the bottom, or an OpenMP construct.
COMMENTS:
For all base languages,
• Access to the structured block must not be the result of a branch.
• The point of exit cannot be a branch out of the structured block.
For C/C++:
• The point of entry must not be a call to setjmp().
• longjmp() and throw() must not violate the entry/exit criteria.
• Calls to exit() are allowed in a structured block.
• An expression statement, iteration statement, selection statement,
  or try block is considered to be a structured block if the
  corresponding compound statement obtained by enclosing it in {
  and } would be a structured block.
For Fortran:
• STOP statements are allowed in a structured block.

enclosing context
In C/C++, the innermost scope enclosing an OpenMP construct.
In Fortran, the innermost scoping unit enclosing an OpenMP construct.

directive
In C/C++, a #pragma, and in Fortran, a comment, that specifies OpenMP
program behavior.
COMMENT: See Section 2.1 on page 22 for a description of OpenMP
directive syntax.
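
The definitions of directive and structured block above can be pictured with
a small C sketch. It is an illustration under stated assumptions rather than
text from the specification: the function name count_negative is
hypothetical, and the parallel, for, and reduction syntax used here is
described in Chapter 2.

```c
/* Hypothetical helper: returns how many elements of a[0..n-1] are negative. */
int count_negative(const int *a, int n)
{
    int count = 0;

    /* The directive: in C, a #pragma that specifies OpenMP behavior. */
#pragma omp parallel
    {   /* Structured block: single entry at the top ... */
        int i;

#pragma omp for reduction(+:count)
        for (i = 0; i < n; i++) {
            if (a[i] < 0) {
                count++;
            }
            /* A "return" or "goto" leaving this block here would be a
               branch out of the structured block, which is not allowed. */
        }
    }   /* ... and single exit at the bottom. */

    return count;
}
```

The compound statement after the parallel directive is entered only at its
opening brace and left only at its closing brace, which is what makes it a
structured block in the sense defined above.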

white space
A non-empty sequence of space and/or horizontal tab characters.

OpenMP program
A program that consists of a base program, annotated with OpenMP directives
and runtime library routines.

conforming program
An OpenMP program that follows all the rules and restrictions of the
OpenMP specification.

declarative directive
An OpenMP directive that may only be placed in a declarative context. A
declarative directive has no associated executable user code, but instead has
one or more associated user declarations.
COMMENT: Only the threadprivate directive is a declarative directive.

executable directive
An OpenMP directive that is not declarative. That is, it may be placed in an
executable context.
COMMENT: All directives except the threadprivate directive are
executable directives.

stand-alone directive
An OpenMP executable directive that has no associated executable user code.

loop directive
An OpenMP executable directive whose associated user code must be a loop
nest that is a structured block.
COMMENTS:
For C/C++, only the for directive is a loop directive.
For Fortran, only the do directive and the optional end do directive
are loop directives.

associated loop(s)
The loop(s) controlled by a loop directive.
COMMENT: If the loop directive contains a collapse clause then there
may be more than one associated loop.

construct
An OpenMP executable directive (and for Fortran, the paired end directive, if
any) and the associated statement, loop or structured block, if any, not
including the code in any called routines. That is, in the lexical extent of an
executable directive.

region
All code encountered during a specific instance of the execution of a given
construct or of an OpenMP library routine. A region includes any code in
called routines as well as any implicit code introduced by the OpenMP
implementation. The generation of a task at the point where a task directive
is encountered is a part of the region of the encountering thread, but the
explicit task region associated with the task directive is not.
COMMENTS:
A region may also be thought of as the dynamic or runtime extent of a
construct or of an OpenMP library routine.
During the execution of an OpenMP program, a construct may give
rise to many regions.

active parallel region
A parallel region that is executed by a team consisting of more than one
thread.

inactive parallel region
A parallel region that is executed by a team of only one thread.

sequential part
All code encountered during the execution of an OpenMP program that is not
part of a parallel region corresponding to a parallel construct or a
task region corresponding to a task construct.
COMMENTS:
The sequential part executes as if it were enclosed by an inactive
parallel region.
Executable statements in called routines may be in both the sequential
part and any number of explicit parallel regions at different points
in the program execution.
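
The distinction between active and inactive parallel regions defined above
can be observed with a short C sketch. It relies on the if clause of the
parallel construct and on omp_get_num_threads, both described later in this
specification. The function name team_size and the fallback definition used
when OpenMP is not enabled are assumptions made for illustration only.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed fallback so the sketch also compiles without OpenMP support. */
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: returns the number of threads in the team that
   executes a parallel region whose if clause evaluates to `active`. */
int team_size(int active)
{
    int size = 0;

#pragma omp parallel if(active)
    {
        /* Only the master thread records the team size. */
#pragma omp master
        size = omp_get_num_threads();
    }

    return size;
}
```

When the if clause evaluates to false, the region is executed by a team of
only one thread, so team_size(0) reports 1 (an inactive parallel region).
With a true condition the reported size depends on the implementation and
the runtime settings.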

master thread
The thread that encounters a parallel construct, creates a team, generates
a set of tasks, then executes one of those tasks as thread number 0.

parent thread
The thread that encountered the parallel construct and generated a
parallel region is the parent thread of each of the threads in the team of
that parallel region. The master thread of a parallel region is the
same thread as its parent thread with respect to any resources associated with
an OpenMP thread.

ancestor thread
For a given thread, its parent thread or one of its parent thread’s ancestor
threads.

team
A set of one or more threads participating in the execution of a parallel
region.
COMMENTS:
For an active parallel region, the team comprises the master thread
and at least one additional thread.
For an inactive parallel region, the team comprises only the master
thread.

initial thread
The thread that executes the sequential part.

implicit parallel region
The inactive parallel region that encloses the sequential part of an OpenMP
program.

nested construct
A construct (lexically) enclosed by another construct.

nested region
A region (dynamically) enclosed by another region. That is, a region
encountered during the execution of another region.
COMMENT: Some nestings are conforming and some are not. See
Section 2.10 on page 111 for the restrictions on nesting.

closely nested region
A region nested inside another region with no parallel region nested
between them.

all threads
All OpenMP threads participating in the OpenMP program.

current team
All threads in the team executing the innermost enclosing parallel region.

encountering thread
For a given region, the thread that encounters the corresponding construct.

all tasks
All tasks participating in the OpenMP program.

current team tasks
All tasks encountered by the corresponding team. Note that the implicit tasks
constituting the parallel region and any descendant tasks encountered
during the execution of these implicit tasks are included in this binding task
set.

generating task
For a given region, the task whose execution by a thread generated the region.

binding thread set
The set of threads that are affected by, or provide the context for, the
execution of a region.
The binding thread set for a given region can be all threads, the current team,
or the encountering thread.
COMMENT: The binding thread set for a particular region is described in its
corresponding subsection of this specification.

binding task set
The set of tasks that are affected by, or provide the context for, the execution
of a region.
The binding task set for a given region can be all tasks, the current team
tasks, or the generating task.
COMMENT: The binding task set for a particular region (if applicable) is
described in its corresponding subsection of this specification.

binding region
The enclosing region that determines the execution context and limits the
scope of the effects of the bound region is called the binding region.
Binding region is not defined for regions whose binding thread set is all
threads or the encountering thread, nor is it defined for regions whose binding
task set is all tasks.
COMMENTS:
The binding region for an ordered region is the innermost enclosing
loop region.
The binding region for a taskwait region is the innermost enclosing
task region.
For all other regions for which the binding thread set is the current
team or the binding task set is the current team tasks, the binding
region is the innermost enclosing parallel region.
For regions for which the binding task set is the generating task, the
binding region is the region of the generating task.
A parallel region need not be active nor explicit to be a binding
region.
A task region need not be explicit to be a binding region.
A region never binds to any region outside of the innermost enclosing
parallel region.

orphaned construct
A construct that gives rise to a region whose binding thread set is the current
team, but is not nested within another construct giving rise to the binding
region.

worksharing construct
A construct that defines units of work, each of which is executed exactly once
by one of the threads in the team executing the construct.
For C/C++, worksharing constructs are for, sections, and single.
For Fortran, worksharing constructs are do, sections, single and
workshare.

sequential loop
A loop that is not associated with any OpenMP loop directive.

barrier
A point in the execution of a program encountered by a team of threads,
beyond which no thread in the team may execute until all threads in the team
have reached the barrier and all explicit tasks generated by the team have
executed to completion.
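
The defining property of a worksharing construct, that each unit of work is
executed exactly once by one thread of the team, can be checked with a small
C sketch. The names MAX_ITERS and iterations_run_once are hypothetical, and
the sketch assumes the for construct syntax described in Chapter 2.

```c
#define MAX_ITERS 64  /* Assumed upper bound for this illustration. */

/* Hypothetical helper: runs n loop iterations under a worksharing for
   construct and returns 1 if every iteration was executed exactly once. */
int iterations_run_once(int n)
{
    int hits[MAX_ITERS] = {0};
    int ok = 1;
    int k;

    if (n < 0 || n > MAX_ITERS) {
        return 0;
    }

#pragma omp parallel
    {
        int i;

        /* Each iteration is a unit of work given to one thread. */
#pragma omp for
        for (i = 0; i < n; i++) {
            hits[i]++;
        }
    }

    /* Back in the sequential part: verify the per-iteration counters. */
    for (k = 0; k < n; k++) {
        if (hits[k] != 1) {
            ok = 0;
        }
    }
    return ok;
}
```

Each iteration updates a different element of hits, so no synchronization is
needed inside the loop. The counters are inspected only after the parallel
region has ended.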

1.2.3 Tasking Terminology

task
A specific instance of executable code and its data environment, generated
when a thread encounters a task construct or a parallel construct.

task region
A region consisting of all code encountered during the execution of a task.
COMMENT: A parallel region consists of one or more implicit task
regions.

explicit task
A task generated when a task construct is encountered during execution.

implicit task
A task generated by the implicit parallel region or generated when a
parallel construct is encountered during execution.

initial task
The implicit task associated with the implicit parallel region.

current task
For a given thread, the task corresponding to the task region in which it is
executing.

child task
A task is a child task of the region of its generating task. A child task region
is not part of its generating task region.

descendant task
A task that is the child task of a task region or of one of its descendant task
regions.

task completion
Task completion occurs when the end of the structured block associated with
the construct that generated the task is reached.
COMMENT: Completion of the initial task occurs at program exit.

task scheduling point
A point during the execution of the current task region at which it can be
suspended to be resumed later; or the point of task completion, after which the
executing thread may switch to a different task region.
COMMENT:
Within tied task regions, task scheduling points only appear in the
following:
• encountered task constructs
• encountered taskyield constructs
• encountered taskwait constructs
• encountered barrier directives
• implicit barrier regions
• at the end of the tied task region

task switching
The act of a thread switching from the execution of one task to another task.
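
The terms explicit task, child task, and task scheduling point can be related
to code with the following C sketch, which sums an array by recursively
generating child tasks and waiting for them at a taskwait. The function names
sum_range and task_sum and the cutoff of 4 elements are assumptions chosen
for illustration. The task, taskwait, and single constructs are described in
Chapter 2.

```c
/* Hypothetical helper: sums a[lo..hi-1], splitting large ranges into two
   explicit tasks that are child tasks of the current task. */
static long sum_range(const int *a, int lo, int hi)
{
    long left = 0;
    long right = 0;
    int mid;
    int i;

    if (hi - lo <= 4) {
        /* Small range: sum directly inside the current task. */
        long s = 0;
        for (i = lo; i < hi; i++) {
            s += a[i];
        }
        return s;
    }

    mid = lo + (hi - lo) / 2;

    /* Each task construct generates an explicit (child) task. */
#pragma omp task shared(left)
    left = sum_range(a, lo, mid);

#pragma omp task shared(right)
    right = sum_range(a, mid, hi);

    /* Wait for the child tasks generated above to complete. */
#pragma omp taskwait
    return left + right;
}

/* Hypothetical entry point: one thread of the team generates the tasks. */
long task_sum(const int *a, int n)
{
    long total = 0;

#pragma omp parallel
    {
#pragma omp single
        total = sum_range(a, 0, n);
    }
    return total;
}
```

The variables left and right are listed as shared so that the child tasks
write into the storage of the generating task. The taskwait ensures both
values are complete before they are added.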

tied task
A task that, when its task region is suspended, can be resumed only by the
same thread that suspended it. That is, the task is tied to that thread.

untied task
A task that, when its task region is suspended, can be resumed by any thread
in the team. That is, the task is not tied to any thread.

undeferred task
A task for which execution is not deferred with respect to its generating task
region. That is, its generating task region is suspended until execution of the
undeferred task is completed.

included task
A task for which execution is sequentially included in the generating task
region. That is, it is undeferred and executed immediately by the encountering
thread.

merged task
A task whose data environment, inclusive of ICVs, is the same as that of its
generating task region.

final task
A task that forces all of its child tasks to become final and included tasks.

task synchronization construct
A taskwait or a barrier construct.

1.2.4 Data Terminology

variable
A named data storage block, whose value can be defined and redefined during
the execution of a program.
Array sections and substrings are not considered variables.

private variable
With respect to a given set of task regions that bind to the same parallel
region, a variable whose name provides access to a different block of storage
for each task region.
A variable that is part of another variable (as an array or structure element)
cannot be made private independently of other components.

shared variable
With respect to a given set of task regions that bind to the same parallel
region, a variable whose name provides access to the same block of storage
for each task region.
A variable that is part of another variable (as an array or structure element)
cannot be shared independently of the other components, except for static data
members of C++ classes.
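
The difference between a private variable and a shared variable, as defined
above, is sketched below in C. The function name sum_of_squares is
hypothetical. The private clause and the atomic construct used here are
described in Chapter 2, and the example is only one possible way to
illustrate the two terms.

```c
/* Hypothetical helper: returns the sum of i*i for i = 0..n-1. */
long sum_of_squares(int n)
{
    long total = 0;  /* shared: the name refers to one block of storage */
    long square;     /* made private below: one block per task region   */
    int i;

#pragma omp parallel for private(square)
    for (i = 0; i < n; i++) {
        /* Each thread writes its own copy of square. */
        square = (long)i * i;

        /* total is shared, so concurrent updates must be protected. */
#pragma omp atomic
        total += square;
    }

    return total;
}
```

Without the private clause, every thread would write to the same storage for
square, which could lead to a race condition. The shared variable total needs
the atomic construct for the same reason.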

threadprivate variable
A variable that is replicated, one instance per thread, by the OpenMP
implementation. Its name then provides access to a different block of storage
for each thread.
A variable that is part of another variable (as an array or structure element)
cannot be made threadprivate independently of the other components, except
for static data members of C++ classes.

threadprivate memory
The set of threadprivate variables associated with each thread.

data environment
All the variables associated with the execution of a given task. The data
environment for a given task is constructed from the data environment of the
generating task at the time the task is generated.

defined
For variables, the property of having a valid value.
For C:
For the contents of variables, the property of having a valid value.
For C++:
For the contents of variables of POD (plain old data) type, the property of
having a valid value.
For variables of non-POD class type, the property of having been constructed
but not subsequently destructed.
For Fortran:
For the contents of variables, the property of having a valid value. For the
allocation or association status of variables, the property of having a valid
status.
COMMENT: Programs that rely upon variables that are not defined are
non-conforming programs.

class type
For C++: Variables declared with one of the class, struct, or union keywords.

1.2.5 Implementation Terminology

supporting n levels of parallelism
Implies allowing an active parallel region to be enclosed by n-1 active
parallel regions.

1
supporting the
OpenMP API
2
supporting nested
parallelism
3
4
internal control
variable
compliant
implementation
A conceptual variable that specifies run-time behavior of a set of threads or
tasks in an OpenMP program.
An implementation of the OpenMP specification that compiles and executes
any conforming program as defined by the specification.
COMMENT: A compliant implementation may exhibit unspecified behavior
when compiling or executing a non-conforming program.
9
10
11
12
Supporting more than one level of parallelism.
COMMENT: The acronym ICV is used interchangeably with the term internal
control variable in the remainder of this specification.
5
6
7
8
Supporting at least one level of parallelism.
unspecified behavior
A behavior or result that is not specified by the OpenMP specification or not
known prior to the compilation or execution of an OpenMP program.
13
Such unspecified behavior may result from:
14
15
• Issues documented by the OpenMP specification as having unspecified
behavior.
16
• A non-conforming program.
17
• A conforming program exhibiting an implementation defined behavior.
18
19
20
21
22
23
implementation
defined
Behavior that must be documented by the implementation, and is allowed to
vary among different compliant implementations. An implementation is
allowed to define this behavior as unspecified.
COMMENT: All features that have implementation defined behavior are
documented in Appendix E.
Chapter 1
Introduction
11
1
1.3 Execution Model
The OpenMP API uses the fork-join model of parallel execution. Multiple threads of
execution perform tasks defined implicitly or explicitly by OpenMP directives. The
OpenMP API is intended to support programs that will execute correctly both as parallel
programs (multiple threads of execution and a full OpenMP support library) and as
sequential programs (directives ignored and a simple OpenMP stubs library). However,
it is possible and permitted to develop a program that executes correctly as a parallel
program but not as a sequential program, or that produces different results when
executed as a parallel program compared to when it is executed as a sequential program.
Furthermore, using different numbers of threads may result in different numeric results
because of changes in the association of numeric operations. For example, a serial
addition reduction may have a different pattern of addition associations than a parallel
reduction. These different associations may change the results of floating-point addition.
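Note – The effect of a changed association can be reproduced without any OpenMP directives. The following C sketch is non-normative; the function names sum_serial and sum_two_halves and the input values mentioned afterwards are assumptions made for illustration. It adds the same values once strictly left to right and once as two partial sums that are then combined, which is one association a reduction split across two threads might use.

```c
#include <stddef.h>

/* Hypothetical helper: adds x[0..n-1] strictly left to right,
 * as a sequential loop would. */
double sum_serial(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Hypothetical helper: adds each half separately and then combines the
 * two partial sums, one possible association for a reduction that is
 * divided between two threads. */
double sum_two_halves(const double *x, size_t n)
{
    size_t mid = n / 2;
    return sum_serial(x, mid) + sum_serial(x + mid, n - mid);
}
```

With IEEE double precision arithmetic and the input {1e20, 1.0, -1e20, 1.0}, sum_serial yields 1.0 while sum_two_halves yields 0.0, because each 1.0 is lost when it is added to a value of magnitude 1e20.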
An OpenMP program begins as a single thread of execution, called the initial thread.
The initial thread executes sequentially, as if enclosed in an implicit task region, called
the initial task region, that is defined by an implicit inactive parallel region
surrounding the whole program.
When any thread encounters a parallel construct, the thread creates a team of itself
and zero or more additional threads and becomes the master of the new team. A set of
implicit tasks, one per thread, is generated. The code for each task is defined by the code
inside the parallel construct. Each task is assigned to a different thread in the team
and becomes tied; that is, it is always executed by the thread to which it is initially
assigned. The task region of the task being executed by the encountering thread is
suspended, and each member of the new team executes its implicit task. There is an
implicit barrier at the end of the parallel construct. Only the master thread resumes
execution beyond the end of the parallel construct, resuming the task region that
was suspended upon encountering the parallel construct. Any number of
parallel constructs can be specified in a single program.
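Note – A minimal, non-normative C sketch of the behavior described above. The function name count_implicit_tasks is an assumption made for illustration. Each thread of the new team executes the structured block once as its implicit task, so the returned count equals the team size. It is 1 when the directive is ignored by a compiler without OpenMP support.

```c
/* Hypothetical helper: returns how many implicit tasks executed the
 * parallel region, which equals the size of the team. */
int count_implicit_tasks(void)
{
    int count = 0;

    /* The encountering thread becomes the master of a new team. Every
     * thread in the team executes the block once as its implicit task. */
    #pragma omp parallel reduction(+:count)
    {
        count += 1;   /* updates this task's private copy of count */
    }

    /* After the implicit barrier only the master thread continues here;
     * the private copies have been combined into the original count. */
    return count;
}
```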
parallel regions may be arbitrarily nested inside each other. If nested parallelism is
disabled, or is not supported by the OpenMP implementation, then the new team that is
created by a thread encountering a parallel construct inside a parallel region
will consist only of the encountering thread. However, if nested parallelism is supported
and enabled, then the new team can consist of more than one thread.
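Note – A non-normative C sketch of a nested parallel region. The function name count_inner_implicit_tasks and the request for two threads at each level are assumptions made for illustration. The result depends on whether nested parallelism is supported and enabled: it lies between 1 (directives ignored) and 4 (two outer threads, each creating an inner team of two).

```c
/* Hypothetical helper: counts the implicit tasks of all inner teams. */
int count_inner_implicit_tasks(void)
{
    int total = 0;

    #pragma omp parallel num_threads(2) reduction(+:total)
    {
        int inner = 0;   /* private: declared inside the outer region */

        /* Each outer thread encounters this construct. If nested
         * parallelism is disabled or unsupported, the inner team
         * consists only of the encountering thread. */
        #pragma omp parallel num_threads(2) reduction(+:inner)
        {
            inner += 1;
        }
        total += inner;
    }
    return total;
}
```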
When any team encounters a worksharing construct, the work inside the construct is
divided among the members of the team, and executed cooperatively instead of being
executed by every thread. There is a default barrier at the end of each worksharing
construct unless the nowait clause is present. Redundant execution of code by every
thread in the team resumes after the end of the worksharing construct.
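Note – A non-normative C sketch of a worksharing loop inside a parallel region. The function name sum_to_n is an assumption made for illustration. The iterations are divided among the team rather than executed by every thread, and the default barrier at the end of the loop construct applies because no nowait clause is given.

```c
/* Hypothetical helper: returns 1 + 2 + ... + n. */
long sum_to_n(int n)
{
    long total = 0;

    #pragma omp parallel
    {
        /* Worksharing construct: the loop iterations are divided among
         * the members of the team instead of being run by every thread. */
        #pragma omp for reduction(+:total)
        for (int i = 1; i <= n; ++i)
            total += i;
        /* Implicit barrier here, since nowait is not specified. */
    }
    return total;
}
```

Integer addition is associative, so this sketch returns the same value for any team size, for example 5050 for n = 100.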
When any thread encounters a task construct, a new explicit task is generated.
Execution of explicitly generated tasks is assigned to one of the threads in the current
team, subject to the thread's availability to execute work. Thus, execution of the new
task could be immediate, or deferred until later. Threads are allowed to suspend the
current task region at a task scheduling point in order to execute a different task. If the
suspended task region is for a tied task, the initially assigned thread later resumes
execution of the suspended task region. If the suspended task region is for an untied
task, then any thread may resume its execution. Completion of all explicit tasks bound
to a given parallel region is guaranteed before the master thread leaves the implicit
barrier at the end of the region. Completion of a subset of all explicit tasks bound to a
given parallel region may be specified through the use of task synchronization
constructs. Completion of all explicit tasks bound to the implicit parallel region is
guaranteed by the time the program exits.
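Note – A non-normative C sketch of explicit tasks and a task synchronization construct. The function names fib and fib_with_tasks are assumptions made for illustration, and the recursion is deliberately naive. Each task construct generates an explicit task that may run immediately or be deferred; taskwait ensures the two child tasks have completed before their results are read.

```c
/* Hypothetical helper: Fibonacci number computed with explicit tasks. */
static int fib(int n)
{
    int a, b;

    if (n < 2)
        return n;

    /* Each task construct generates a new explicit task. a and b are
     * private to the generating task and are shared with its children. */
    #pragma omp task shared(a)
    a = fib(n - 1);
    #pragma omp task shared(b)
    b = fib(n - 2);

    /* Wait for both child tasks, so that a and b are complete and are
     * still alive while the child tasks use them. */
    #pragma omp taskwait
    return a + b;
}

/* Hypothetical entry point: one thread generates the root of the task
 * tree, and any thread in the team may execute the generated tasks. */
int fib_with_tasks(int n)
{
    int result = 0;

    #pragma omp parallel
    {
        #pragma omp single
        result = fib(n);
    }
    return result;
}
```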
Synchronization constructs and library routines are available in the OpenMP API to
coordinate tasks and data access in parallel regions. In addition, library routines
and environment variables are available to control or to query the runtime environment
of OpenMP programs.
The OpenMP specification makes no guarantee that input or output to the same file is
synchronous when executed in parallel. In this case, the programmer is responsible for
synchronizing input and output statements (or routines) using the provided
synchronization constructs or library routines. For the case where each thread accesses a
different file, no synchronization by the programmer is necessary.
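Note – A non-normative C sketch of output to one file from several threads, synchronized with a critical construct. The function name write_greetings and the message text are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: every thread in the team writes one line to the
 * same stream. Returns the number of lines written. */
int write_greetings(FILE *out)
{
    int lines = 0;

    #pragma omp parallel
    {
        /* Only one thread at a time executes the critical region, so
         * the output statements and the shared counter update are
         * synchronized. */
        #pragma omp critical
        {
            fprintf(out, "hello from one thread\n");
            lines += 1;
        }
    }
    return lines;
}
```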
1.4 Memory Model
1.4.1 Structure of the OpenMP Memory Model
The OpenMP API provides a relaxed-consistency, shared-memory model. All OpenMP
threads have access to a place to store and to retrieve variables, called the memory. In
addition, each thread is allowed to have its own temporary view of the memory. The
temporary view of memory for each thread is not a required part of the OpenMP
memory model, but can represent any kind of intervening structure, such as machine
registers, cache, or other local storage, between the thread and the memory. The
temporary view of memory allows the thread to cache variables and thereby to avoid
going to memory for every reference to a variable. Each thread also has access to
another type of memory that must not be accessed by other threads, called threadprivate
memory.
A directive that accepts data-sharing attribute clauses determines two kinds of access to
variables used in the directive’s associated structured block: shared and private. Each
variable referenced in the structured block has an original variable, which is the variable
by the same name that exists in the program immediately outside the construct. Each
reference to a shared variable in the structured block becomes a reference to the original
variable. For each private variable referenced in the structured block, a new version of
the original variable (of the same type and size) is created in memory for each task that
contains code associated with the directive. Creation of the new version does not alter
the value of the original variable. However, the impact of attempts to access the original
variable during the region associated with the directive is unspecified; see
Section 2.9.3.3 on page 96 for additional details. References to a private variable in the
structured block refer to the current task’s private version of the original variable. The
relationship between the value of the original variable and the initial or final value of the
private version depends on the exact clause that specifies it. Details of this issue, as well
as other issues with privatization, are provided in Section 2.9 on page 84.
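Note – A non-normative C sketch of shared and private access. The function name square_all and its parameters are assumptions made for illustration. The variable tmp is listed in a private clause, so each task works on its own version, while in and out are shared and refer to the original variables. The sketch does not rely on the value of the original tmp after the region.

```c
/* Hypothetical helper: stores the square of in[i] into out[i]. */
void square_all(const int *in, int *out, int n)
{
    int tmp = -1;   /* the original variable */
    int i;          /* loop iteration variable, private in the loop */

    #pragma omp parallel for private(tmp) shared(in, out)
    for (i = 0; i < n; ++i) {
        tmp = in[i];          /* the current task's private version */
        out[i] = tmp * tmp;   /* in and out are the original variables */
    }
}
```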
The minimum size at which a memory update may also read and write back adjacent
variables that are part of another variable (as array or structure elements) is
implementation defined but is no larger than required by the base language.
A single access to a variable may be implemented with multiple load or store
instructions, and hence is not guaranteed to be atomic with respect to other accesses to
the same variable. Accesses to variables smaller than the implementation defined
minimum size or to C or C++ bit-fields may be implemented by reading, modifying, and
rewriting a larger unit of memory, and may thus interfere with updates of variables or
fields in the same unit of memory.
If multiple threads write without synchronization to the same memory unit, including
cases due to atomicity considerations as described above, then a data race occurs.
Similarly, if at least one thread reads from a memory unit and at least one thread writes
without synchronization to that same memory unit, including cases due to atomicity
considerations as described above, then a data race occurs. If a data race occurs then the
result of the program is unspecified.
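Note – A non-normative C sketch of one way to avoid such a data race. The function name count_hits is an assumption made for illustration. Without the atomic directive, the unsynchronized updates of the shared variable hits by several threads would constitute a data race.

```c
/* Hypothetical helper: increments a shared counter n times. */
int count_hits(int n)
{
    int hits = 0;   /* shared by all threads in the team */

    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        /* The atomic directive makes each update of the shared
         * variable atomic with respect to the other updates. */
        #pragma omp atomic
        hits += 1;
    }
    return hits;
}
```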
A private variable in a task region that eventually generates an inner nested parallel
region is permitted to be made shared by implicit tasks in the inner parallel region.
A private variable in a task region can be shared by an explicit task region generated
during its execution. However, it is the programmer’s responsibility to ensure through
synchronization that the lifetime of the variable does not end before completion of the
explicit task region sharing it. Any other access by one task to the private variables of
another task results in unspecified behavior.
1.4.2 The Flush Operation
The memory model has relaxed-consistency because a thread’s temporary view of
memory is not required to be consistent with memory at all times. A value written to a
variable can remain in the thread’s temporary view until it is forced to memory at a later
time. Likewise, a read from a variable may retrieve the value from the thread’s
temporary view, unless it is forced to read from memory. The OpenMP flush operation
enforces consistency between the temporary view and memory.
The flush operation is applied to a set of variables called the flush-set. The flush
operation restricts reordering of memory operations that an implementation might
otherwise do. Implementations must not reorder the code for a memory operation for a
given variable, or the code for a flush operation for the variable, with respect to a flush
operation that refers to the same variable.
If a thread has performed a write to its temporary view of a shared variable since its last
flush of that variable, then when it executes another flush of the variable, the flush does
not complete until the value of the variable has been written to the variable in memory.
If a thread performs multiple writes to the same variable between two flushes of that
variable, the flush ensures that the value of the last write is written to the variable in
memory. A flush of a variable executed by a thread also causes its temporary view of the
variable to be discarded, so that if its next memory operation for that variable is a read,
then the thread will read from memory when it may again capture the value in the
temporary view. When a thread executes a flush, no later memory operation by that
thread for a variable involved in that flush is allowed to start until the flush completes.
The completion of a flush of a set of variables executed by a thread is defined as the
point at which all writes to those variables performed by the thread before the flush are
visible in memory to all other threads and that thread’s temporary view of all variables
involved is discarded.
The flush operation provides a guarantee of consistency between a thread’s temporary
view and memory. Therefore, the flush operation can be used to guarantee that a value
written to a variable by one thread may be read by a second thread. To accomplish this,
the programmer must ensure that the second thread has not written to the variable since
its last flush of the variable, and that the following sequence of events happens in the
specified order:
1. The value is written to the variable by the first thread.
2. The variable is flushed by the first thread.
3. The variable is flushed by the second thread.
4. The value is read from the variable by the second thread.
Note – OpenMP synchronization operations, described in Section 2.8 on page 67 and in
Section 3.3 on page 141, are recommended for enforcing this order. Synchronization
through variables is possible but is not recommended because the proper timing of
flushes is difficult as shown in Section A.2 on page 162.
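The recommended approach can be sketched in C as follows. This is a non-normative illustration; the function name publish_then_read and the value 42 are assumptions. A barrier implies a flush, so the write before the barrier and the read after it follow the required order without any explicit flush directive.

```c
/* Hypothetical helper: one thread writes a value and, after a barrier,
 * some thread of the team reads it. */
int publish_then_read(void)
{
    int value = 0;
    int seen = -1;

    #pragma omp parallel shared(value, seen)
    {
        #pragma omp master
        value = 42;           /* written by the first (master) thread */

        /* Every thread flushes at the barrier, and no thread proceeds
         * until all threads of the team have arrived. */
        #pragma omp barrier

        #pragma omp single
        seen = value;         /* read by whichever thread runs single */
    }
    return seen;
}
```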
1.4.3 OpenMP Memory Consistency
The restrictions in Section 1.4.2 on page 15 on reordering with respect to flush
operations guarantee the following:
• If the intersection of the flush-sets of two flushes performed by two different threads
is non-empty, then the two flushes must be completed as if in some sequential order,
seen by all threads.
• If two operations performed by the same thread either access, modify, or flush the
same variable, then they must be completed as if in that thread's program order, as
seen by all threads.
• If the intersection of the flush-sets of two flushes is empty, the threads can observe
these flushes in any order.
The flush operation can be specified using the flush directive, and is also implied at
various locations in an OpenMP program: see Section 2.8.6 on page 78 for details. For
an example illustrating the memory model, see Section A.2 on page 162.
Note – Since flush operations by themselves cannot prevent data races, explicit flush
operations are only useful in combination with atomic directives.
OpenMP programs that:
• do not use atomic directives,
• do not rely on the accuracy of a false result from omp_test_lock and
omp_test_nest_lock, and
• correctly avoid data races as required in Section 1.4.1 on page 13
behave as though operations on shared variables were simply interleaved in an order
consistent with the order in which they are performed by each thread. The relaxed
consistency model is invisible for such programs, and any explicit flush operations in
such programs are redundant.
Implementations are allowed to relax the ordering imposed by implicit flush operations
when the result is only visible to programs using atomic directives.
1.5 OpenMP Compliance
An implementation of the OpenMP API is compliant if and only if it compiles and
executes all conforming programs according to the syntax and semantics laid out in
Chapters 1, 2, 3 and 4. Appendices A, B, C, D, E and F and sections designated as Notes
(see Section 1.7 on page 18) are for information purposes only and are not part of the
specification.
The OpenMP API defines constructs that operate in the context of the base language that
is supported by an implementation. If the base language does not support a language
construct that appears in this document, a compliant OpenMP implementation is not
required to support it, with the exception that for Fortran, the implementation must
allow case insensitivity for directive and API routines names, and must allow identifiers
of more than six characters.
All library, intrinsic and built-in routines provided by the base language must be
thread-safe in a compliant implementation. In addition, the implementation of the base
language must also be thread-safe. For example, ALLOCATE and DEALLOCATE
statements must be thread-safe in Fortran. Unsynchronized concurrent use of such
routines by different threads must produce correct results (although not necessarily the
same as serial execution results, as in the case of random number generation routines).
In both Fortran 90 and Fortran 95, variables with explicit initialization have the SAVE
attribute implicitly. This is not the case in Fortran 77. However, a compliant OpenMP
Fortran implementation must give such a variable the SAVE attribute, regardless of the
underlying base language version.
Appendix E lists certain aspects of the OpenMP API that are implementation defined. A
compliant implementation is required to define and document its behavior for each of
the items in Appendix E.
1.6 Normative References
• ISO/IEC 9899:1990, Information Technology - Programming Languages - C.
This OpenMP API specification refers to ISO/IEC 9899:1990 as C90.
• ISO/IEC 9899:1999, Information Technology - Programming Languages - C.
This OpenMP API specification refers to ISO/IEC 9899:1999 as C99.
• ISO/IEC 14882:1998, Information Technology - Programming Languages - C++.
This OpenMP API specification refers to ISO/IEC 14882:1998 as C++.
• ISO/IEC 1539:1980, Information Technology - Programming Languages - Fortran.
This OpenMP API specification refers to ISO/IEC 1539:1980 as Fortran 77.
• ISO/IEC 1539:1991, Information Technology - Programming Languages - Fortran.
This OpenMP API specification refers to ISO/IEC 1539:1991 as Fortran 90.
• ISO/IEC 1539-1:1997, Information Technology - Programming Languages - Fortran.
This OpenMP API specification refers to ISO/IEC 1539-1:1997 as Fortran 95.
Where this OpenMP API specification refers to C, C++ or Fortran, reference is made to
the base language supported by the implementation.
1.7 Organization of this document
The remainder of this document is structured as follows:
• Chapter 2: Directives
• Chapter 3: Runtime Library Routines
• Chapter 4: Environment Variables
• Appendix A: Examples
• Appendix B: Stubs for Runtime Library Routines
• Appendix C: OpenMP C and C++ Grammar
• Appendix D: Interface Declarations
• Appendix E: OpenMP Implementation Defined Behaviors
• Appendix F: Features History
Some sections of this document only apply to programs written in a certain base
language. Text that applies only to programs whose base language is C or C++ is shown
as follows:
C/C++
C/C++ specific text....
C/C++
Text that applies only to programs whose base language is Fortran is shown as follows:
Fortran
Fortran specific text......
Fortran
Where an entire page consists of, for example, Fortran specific text, a marker is shown
at the top of the page like this:
Fortran (cont.)
Some text is for information only, and is not part of the normative specification. Such
text is designated as a note, like this:
Note – Non-normative text....
CHAPTER 2
Directives
This chapter describes the syntax and behavior of OpenMP directives, and is divided
into the following sections:
• The language-specific directive format (Section 2.1 on page 22)
• Mechanisms to control conditional compilation (Section 2.2 on page 26)
• Control of OpenMP API ICVs (Section 2.3 on page 28)
• Details of each OpenMP directive (Section 2.4 on page 33 to Section 2.10 on page 111)
C/C++
In C/C++, OpenMP directives are specified by using the #pragma mechanism provided
by the C and C++ standards.
C/C++
Fortran
In Fortran, OpenMP directives are specified by using special comments that are
identified by unique sentinels. Also, a special comment form is available for conditional
compilation.
Fortran
Compilers can therefore ignore OpenMP directives and conditionally compiled code if
support of the OpenMP API is not provided or enabled. A compliant implementation
must provide an option or interface that ensures that underlying support of all OpenMP
directives and OpenMP conditional compilation mechanisms is enabled. In the
remainder of this document, the phrase OpenMP compilation is used to mean a
compilation with these OpenMP features enabled.
Fortran
Restrictions
The following restriction applies to all OpenMP directives:
• OpenMP directives may not appear in PURE or ELEMENTAL procedures.
Fortran
2.1 Directive Format
C/C++
OpenMP directives for C/C++ are specified with the pragma preprocessing directive.
The syntax of an OpenMP directive is formally specified by the grammar in
Appendix C, and informally as follows:
#pragma omp directive-name [clause[ [,] clause]...] new-line
Each directive starts with #pragma omp. The remainder of the directive follows the
conventions of the C and C++ standards for compiler directives. In particular, white
space can be used before and after the #, and sometimes white space must be used to
separate the words in a directive. Preprocessing tokens following the #pragma omp
are subject to macro replacement.
Directives are case-sensitive.
An OpenMP executable directive applies to at most one succeeding statement, which
must be a structured block.
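Note – A non-normative C sketch of the directive format. The function name scale_in_place and its parameters are assumptions made for illustration. The directive-name is the combined directive parallel for, followed by three clauses; the comma between the second and third clause shows the optional separator allowed by the syntax above. The directive applies to the single for statement that follows it.

```c
/* Hypothetical helper: multiplies each element of data by factor. */
void scale_in_place(double *data, int n, double factor)
{
    int i;

    /* #pragma omp directive-name [clause[ [,] clause]...] new-line */
    #pragma omp parallel for shared(data) firstprivate(factor), schedule(static)
    for (i = 0; i < n; ++i)
        data[i] *= factor;
}
```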
C/C++
Fortran
OpenMP directives for Fortran are specified as follows:
sentinel directive-name [clause[[,] clause]...]
All OpenMP compiler directives must begin with a directive sentinel. The format of a
sentinel differs between fixed and free-form source files, as described in Section 2.1.1
on page 23 and Section 2.1.2 on page 24.
Directives are case-insensitive. Directives cannot be embedded within continued
statements, and statements cannot be embedded within directives.
In order to simplify the presentation, free form is used for the syntax of OpenMP
directives for Fortran in the remainder of this document, except as noted.
Fortran
Only one directive-name can be specified per directive (note that this includes combined
directives, see Section 2.6 on page 55). The order in which clauses appear on directives
is not significant. Clauses on directives may be repeated as needed, subject to the
restrictions listed in the description of each clause.
Some data-sharing attribute clauses (Section 2.9.3 on page 92), data copying clauses
(Section 2.9.4 on page 107), the threadprivate directive (Section 2.9.2 on page 88)
and the flush directive (Section 2.8.6 on page 78) accept a list. A list consists of a
comma-separated collection of one or more list items.
C/C++
A list item is a variable name, subject to the restrictions specified in each of the sections
describing clauses and directives for which a list appears.
C/C++
Fortran
A list item is a variable name or a common block name (enclosed in slashes), subject to
the restrictions specified in each of the sections describing clauses and directives for
which a list appears.
Fortran
Fortran
2.1.1 Fixed Source Form Directives
The following sentinels are recognized in fixed form source files:
!$omp | c$omp | *$omp
Sentinels must start in column 1 and appear as a single word with no intervening
characters. Fortran fixed form line length, white space, continuation, and column rules
apply to the directive line. Initial directive lines must have a space or zero in column 6,
and continuation directive lines must have a character other than a space or a zero in
column 6.
Comments may appear on the same line as a directive. The exclamation point initiates a
comment when it appears after column 6. The comment extends to the end of the source
line and is ignored. If the first non-blank character after the directive sentinel of an
initial or continuation directive line is an exclamation point, the line is ignored.
Note – in the following example, the three formats for specifying the directive are
equivalent (the first line represents the position of the first 9 columns):
c23456789
!$omp parallel do shared(a,b,c)

c$omp parallel do
c$omp+shared(a,b,c)

c$omp paralleldoshared(a,b,c)
2.1.2 Free Source Form Directives
The following sentinel is recognized in free form source files:
!$omp
The sentinel can appear in any column as long as it is preceded only by white space
(spaces and tab characters). It must appear as a single word with no intervening
character. Fortran free form line length, white space, and continuation rules apply to the
directive line. Initial directive lines must have a space after the sentinel. Continued
directive lines must have an ampersand (&) as the last nonblank character on the line,
prior to any comment placed inside the directive. Continuation directive lines can have
an ampersand after the directive sentinel with optional white space before and after the
ampersand.
Comments may appear on the same line as a directive. The exclamation point (!)
initiates a comment. The comment extends to the end of the source line and is ignored.
If the first nonblank character after the directive sentinel is an exclamation point, the
line is ignored.
One or more blanks or horizontal tabs must be used to separate adjacent keywords in
directives in free source form, except in the following cases, where white space is
optional between the given pair of keywords:
end atomic
end critical
end do
end master
end ordered
end parallel
end sections
end single
end task
end workshare
parallel do
parallel sections
parallel workshare
Note – in the following example the three formats for specifying the directive are
equivalent (the first line represents the position of the first 9 columns):
!23456789
!$omp parallel do &
!$omp shared(a,b,c)

!$omp parallel &
!$omp&do shared(a,b,c)

!$omp paralleldo shared(a,b,c)
Fortran
2.2 Conditional Compilation
In implementations that support a preprocessor, the _OPENMP macro name is defined to
have the decimal value yyyymm where yyyy and mm are the year and month designations
of the version of the OpenMP API that the implementation supports.
If this macro is the subject of a #define or a #undef preprocessing directive, the
behavior is unspecified.
For examples of conditional compilation, see Section A.3 on page 169.
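Note – A non-normative C sketch of conditional compilation with the _OPENMP macro. The function name openmp_version_or_zero is an assumption made for illustration. In an OpenMP compilation the function returns the yyyymm value of the supported version (201107 would correspond to this version of the specification); otherwise it returns 0.

```c
/* Hypothetical helper: reports the OpenMP version date, or 0 when the
 * translation unit is not compiled as an OpenMP compilation. */
long openmp_version_or_zero(void)
{
#ifdef _OPENMP
    return _OPENMP;   /* decimal yyyymm, defined by the implementation */
#else
    return 0L;        /* OpenMP support is not provided or not enabled */
#endif
}
```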
Fortran
The OpenMP API requires Fortran lines to be compiled conditionally, as described in
the following sections.
2.2.1 Fixed Source Form Conditional Compilation Sentinels
The following conditional compilation sentinels are recognized in fixed form source
files:
!$ | *$ | c$
To enable conditional compilation, a line with a conditional compilation sentinel must
satisfy the following criteria:
• The sentinel must start in column 1 and appear as a single word with no intervening
white space.
• After the sentinel is replaced with two spaces, initial lines must have a space or zero
in column 6 and only white space and numbers in columns 1 through 5.
• After the sentinel is replaced with two spaces, continuation lines must have a
character other than a space or zero in column 6 and only white space in columns 1
through 5.
If these criteria are met, the sentinel is replaced by two spaces. If these criteria are not
met, the line is left unchanged.
Note – in the following example, the two forms for specifying conditional compilation
in fixed source form are equivalent (the first line represents the position of the first 9
columns):
c23456789
!$ 10 iam = omp_get_thread_num() +
!$   &          index

#ifdef _OPENMP
   10 iam = omp_get_thread_num() +
     &            index
#endif
2.2.2 Free Source Form Conditional Compilation Sentinel
The following conditional compilation sentinel is recognized in free form source files:
!$
To enable conditional compilation, a line with a conditional compilation sentinel must
satisfy the following criteria:
• The sentinel can appear in any column but must be preceded only by white space.
• The sentinel must appear as a single word with no intervening white space.
• Initial lines must have a space after the sentinel.
• Continued lines must have an ampersand as the last nonblank character on the line,
prior to any comment appearing on the conditionally compiled line. Continued lines
can have an ampersand after the sentinel, with optional white space before and after
the ampersand.
If these criteria are met, the sentinel is replaced by two spaces. If these criteria are not
met, the line is left unchanged.
Note – in the following example, the two forms for specifying conditional compilation
in free source form are equivalent (the first line represents the position of the first 9
columns):
c23456789
 !$ iam = omp_get_thread_num() +     &
 !$&    index

#ifdef _OPENMP
    iam = omp_get_thread_num() +     &
        index
#endif
Fortran
2.3 Internal Control Variables
An OpenMP implementation must act as if there were internal control variables (ICVs)
that control the behavior of an OpenMP program. These ICVs store information such as
the number of threads to use for future parallel regions, the schedule to use for
worksharing loops and whether nested parallelism is enabled or not. The ICVs are given
values at various times (described below) during the execution of the program. They are
initialized by the implementation itself and may be given values through OpenMP
environment variables and through calls to OpenMP API routines. The program can
retrieve the values of these ICVs only through OpenMP API routines.
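Note – A non-normative C sketch of giving an ICV a value and retrieving it through OpenMP API routines. The function name request_threads is an assumption made for illustration; omp_set_num_threads and omp_get_max_threads are the runtime library routines that modify and retrieve nthreads-var. The include and the calls are guarded by _OPENMP so that the sketch also compiles when OpenMP support is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: requests n threads for future parallel regions
 * and returns the value that the runtime reports afterwards. */
int request_threads(int n)
{
#ifdef _OPENMP
    omp_set_num_threads(n);         /* gives nthreads-var a new value */
    return omp_get_max_threads();   /* retrieves nthreads-var */
#else
    (void)n;                        /* no ICVs without OpenMP support */
    return 1;
#endif
}
```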
For purposes of exposition, this document refers to the ICVs by certain names, but an
implementation is not required to use these names or to offer any way to access the
variables other than through the ways shown in Section 2.3.2 on page 29.
25
2.3.1
ICV Descriptions
The following ICVs store values that affect the operation of parallel regions.
• dyn-var - controls whether dynamic adjustment of the number of threads is enabled
  for encountered parallel regions. There is one copy of this ICV per data
  environment.
• nest-var - controls whether nested parallelism is enabled for encountered parallel
  regions. There is one copy of this ICV per data environment.
• nthreads-var - controls the number of threads requested for encountered parallel
  regions. There is one copy of this ICV per data environment.
• thread-limit-var - controls the maximum number of threads participating in the
  OpenMP program. There is one copy of this ICV for the whole program.
• max-active-levels-var - controls the maximum number of nested active parallel
  regions. There is one copy of this ICV for the whole program.
The following ICVs store values that affect the operation of loop regions.
• run-sched-var - controls the schedule that the runtime schedule clause uses for
  loop regions. There is one copy of this ICV per data environment.
• def-sched-var - controls the implementation defined default scheduling of loop
  regions. There is one copy of this ICV for the whole program.
The following ICVs store values that affect the program execution.
• bind-var - controls the binding of threads to processors. If binding is enabled, the
  execution environment is advised not to move OpenMP threads between processors.
  There is one copy of this ICV for the whole program.
• stacksize-var - controls the stack size for threads that the OpenMP implementation
  creates. There is one copy of this ICV for the whole program.
• wait-policy-var - controls the desired behavior of waiting threads. There is one copy
  of this ICV for the whole program.
2.3.2 Modifying and Retrieving ICV Values
The following table shows the scope of each ICV, the ways to modify and retrieve its
value, and its initial value:
ICV                    Scope             Ways to modify value                                Way to retrieve value        Initial value
---------------------  ----------------  --------------------------------------------------  ---------------------------  ----------------------
dyn-var                data environment  OMP_DYNAMIC, omp_set_dynamic()                      omp_get_dynamic()            See comments below
nest-var               data environment  OMP_NESTED, omp_set_nested()                        omp_get_nested()             false
nthreads-var           data environment  OMP_NUM_THREADS, omp_set_num_threads()              omp_get_max_threads()        Implementation defined
run-sched-var          data environment  OMP_SCHEDULE, omp_set_schedule()                    omp_get_schedule()           Implementation defined
def-sched-var          global            (none)                                              (none)                       Implementation defined
bind-var               global            OMP_PROC_BIND                                       (none)                       Implementation defined
stacksize-var          global            OMP_STACKSIZE                                       (none)                       Implementation defined
wait-policy-var        global            OMP_WAIT_POLICY                                     (none)                       Implementation defined
thread-limit-var       global            OMP_THREAD_LIMIT                                    omp_get_thread_limit()       Implementation defined
max-active-levels-var  global            OMP_MAX_ACTIVE_LEVELS, omp_set_max_active_levels()  omp_get_max_active_levels()  See comments below
Comments:
• The value of the nthreads-var ICV is a list. The runtime call
  omp_set_num_threads() sets the value of the first element of this list, and
  omp_get_max_threads() retrieves the value of the first element of this list.
• The initial value of dyn-var is implementation defined if the implementation supports
  dynamic adjustment of the number of threads; otherwise, the initial value is false.
• The initial value of max-active-levels-var is the number of levels of parallelism that
  the implementation supports. See the definition of supporting n levels of parallelism
  in Section 1.2.5 on page 10 for further details.
After the initial values are assigned, but before any OpenMP construct or OpenMP API
routine executes, the values of any OpenMP environment variables that were set by the
user are read and the associated ICVs are modified accordingly. After this point, no
changes to any OpenMP environment variables will affect the ICVs.
Clauses on OpenMP constructs do not modify the values of any of the ICVs.
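As an illustration of retrieving ICV values through API routines, the following C sketch reads a few of the ICVs discussed in this section. The icv_snapshot type and the read_icvs function are hypothetical names introduced for this example, and the serial values in the #else branch are an assumption about how such code might be built without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical container for this example; not part of the OpenMP API. */
typedef struct {
    int dyn;               /* dyn-var, normalized to 0 or 1 */
    int nthreads;          /* first element of nthreads-var */
    int thread_limit;      /* thread-limit-var */
    int max_active_levels; /* max-active-levels-var */
} icv_snapshot;

/* Read ICVs visible to the calling task through the API routines. */
icv_snapshot read_icvs(void)
{
    icv_snapshot s;
#ifdef _OPENMP
    s.dyn = omp_get_dynamic() != 0;
    s.nthreads = omp_get_max_threads();
    s.thread_limit = omp_get_thread_limit();
    s.max_active_levels = omp_get_max_active_levels();
#else
    /* Assumed serial values when compiled without OpenMP support. */
    s.dyn = 0;
    s.nthreads = 1;
    s.thread_limit = 1;
    s.max_active_levels = 0;
#endif
    return s;
}
```

ICVs such as bind-var, stacksize-var, and wait-policy-var have no retrieval routine in this version of the API, so the sketch does not attempt to read them.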
2.3.3 How the Per-Data Environment ICVs Work
Each data environment has its own copies of internal variables dyn-var, nest-var,
nthreads-var, and run-sched-var.
Calls to omp_set_num_threads(), omp_set_dynamic(),
omp_set_nested(), and omp_set_schedule() modify only the ICVs in the
data environment of their binding task.
When a task construct or parallel construct is encountered, the generated task(s)
inherit the values of dyn-var, nest-var, and run-sched-var from the generating task's ICV
values.
When a task construct is encountered, the generated task inherits the value of
nthreads-var from the generating task's nthreads-var value. When a parallel
construct is encountered, and the generating task's nthreads-var list contains a single
element, the generated task(s) inherit that list as the value of nthreads-var. When a
parallel construct is encountered, and the generating task's nthreads-var list contains
multiple elements, the generated task(s) inherit the value of nthreads-var as the list
obtained by deletion of the first element from the generating task's nthreads-var value.
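The inheritance rule for nthreads-var can be modeled in a few lines of C. This is a minimal sketch, not the data structure of any actual implementation: the nthreads_list type, the MAX_LEVELS bound, and both function names are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LEVELS 8 /* assumed upper bound on the list length */

/* Hypothetical model of the nthreads-var list. */
typedef struct {
    int values[MAX_LEVELS];
    size_t len; /* number of valid elements, at least 1 */
} nthreads_list;

/* List inherited by the implicit tasks of a parallel construct:
   a single-element list is inherited unchanged, a longer list
   loses its first element. */
nthreads_list inherit_for_parallel(const nthreads_list *parent)
{
    nthreads_list child = *parent;
    if (parent->len > 1) {
        memcpy(child.values, parent->values + 1,
               (parent->len - 1) * sizeof(int));
        child.len = parent->len - 1;
    }
    return child;
}

/* List inherited by a task generated by a task construct: unchanged. */
nthreads_list inherit_for_task(const nthreads_list *parent)
{
    return *parent;
}
```

Starting from the list 4,3,2, successive nested parallel constructs in this model see 3,2 and then 2, after which the single remaining element is inherited unchanged.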
When encountering a loop worksharing region with schedule(runtime), all
implicit task regions that constitute the binding parallel region must have the same value
for run-sched-var in their data environments. Otherwise, the behavior is unspecified.
2.3.4 ICV Override Relationships
The override relationships among various construct clauses, OpenMP API routines,
environment variables, and the initial values of ICVs are shown in the following table:
construct clause,   overrides call to              overrides setting of     overrides initial
if used             API routine                    environment variable     value of
-----------------   ---------------------------    ---------------------    ---------------------
(none)              omp_set_dynamic()              OMP_DYNAMIC              dyn-var
(none)              omp_set_nested()               OMP_NESTED               nest-var
num_threads         omp_set_num_threads()          OMP_NUM_THREADS          nthreads-var *
schedule            omp_set_schedule()             OMP_SCHEDULE             run-sched-var
(none)              (none)                         OMP_PROC_BIND            bind-var
schedule            (none)                         (none)                   def-sched-var
(none)              (none)                         OMP_STACKSIZE            stacksize-var
(none)              (none)                         OMP_WAIT_POLICY          wait-policy-var
(none)              (none)                         OMP_THREAD_LIMIT         thread-limit-var
(none)              omp_set_max_active_levels()    OMP_MAX_ACTIVE_LEVELS    max-active-levels-var
* The num_threads clause and omp_set_num_threads() override the value of
the OMP_NUM_THREADS environment variable and the initial value of the first element
of the nthreads-var ICV.
Cross References:
• parallel construct, see Section 2.4 on page 33.
• num_threads clause, see Section 2.4.1 on page 36.
• schedule clause, see Section 2.5.1.1 on page 47.
• Loop construct, see Section 2.5.1 on page 39.
• omp_set_num_threads routine, see Section 3.2.1 on page 116.
• omp_get_max_threads routine, see Section 3.2.3 on page 118.
• omp_set_dynamic routine, see Section 3.2.7 on page 123.
• omp_get_dynamic routine, see Section 3.2.8 on page 124.
• omp_set_nested routine, see Section 3.2.9 on page 125.
• omp_get_nested routine, see Section 3.2.10 on page 126.
• omp_set_schedule routine, see Section 3.2.11 on page 128.
• omp_get_schedule routine, see Section 3.2.12 on page 130.
• omp_get_thread_limit routine, see Section 3.2.13 on page 131.
• omp_set_max_active_levels routine, see Section 3.2.14 on page 132.
• omp_get_max_active_levels routine, see Section 3.2.15 on page 134.
• OMP_SCHEDULE environment variable, see Section 4.1 on page 154.
• OMP_NUM_THREADS environment variable, see Section 4.2 on page 155.
• OMP_DYNAMIC environment variable, see Section 4.3 on page 156.
• OMP_PROC_BIND environment variable, see Section 4.4 on page 156.
• OMP_NESTED environment variable, see Section 4.5 on page 157.
• OMP_STACKSIZE environment variable, see Section 4.6 on page 157.
• OMP_WAIT_POLICY environment variable, see Section 4.7 on page 158.
• OMP_MAX_ACTIVE_LEVELS environment variable, see Section 4.8 on page 159.
• OMP_THREAD_LIMIT environment variable, see Section 4.9 on page 160.
2.4 parallel Construct
Summary
This fundamental construct starts parallel execution. See Section 1.3 on page 12 for a
general description of the OpenMP execution model.
Syntax
C/C++
The syntax of the parallel construct is as follows:
    #pragma omp parallel [clause[ [, ]clause] ...] new-line
        structured-block
where clause is one of the following:
    if(scalar-expression)
    num_threads(integer-expression)
    default(shared | none)
    private(list)
    firstprivate(list)
    shared(list)
    copyin(list)
    reduction(operator: list)
C/C++
Fortran
The syntax of the parallel construct is as follows:
    !$omp parallel [clause[[,] clause]...]
        structured-block
    !$omp end parallel
where clause is one of the following:
    if(scalar-logical-expression)
    num_threads(scalar-integer-expression)
    default(private | firstprivate | shared | none)
    private(list)
    firstprivate(list)
    shared(list)
    copyin(list)
    reduction({operator|intrinsic_procedure_name}:list)
The end parallel directive denotes the end of the parallel construct.
Fortran
Binding
The binding thread set for a parallel region is the encountering thread. The
encountering thread becomes the master thread of the new team.
Description
When a thread encounters a parallel construct, a team of threads is created to
execute the parallel region (see Section 2.4.1 on page 36 for more information about
how the number of threads in the team is determined, including the evaluation of the if
and num_threads clauses). The thread that encountered the parallel construct
becomes the master thread of the new team, with a thread number of zero for the
duration of the new parallel region. All threads in the new team, including the
master thread, execute the region. Once the team is created, the number of threads in the
team remains constant for the duration of that parallel region.
Within a parallel region, thread numbers uniquely identify each thread. Thread
numbers are consecutive whole numbers ranging from zero for the master thread up to
one less than the number of threads in the team. A thread may obtain its own thread
number by a call to the omp_get_thread_num library routine.
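The thread numbering described above can be observed with a small C sketch. The function name mark_thread_numbers, the MAX_TEAM bound, and the serial fallback used when the code is built without OpenMP support are assumptions made for this example; the caller is assumed to pass a zero-initialized array and a positive request no larger than MAX_TEAM.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_TEAM 64 /* assumed upper bound on the team size */

/* Each thread marks the slot that matches its thread number.
   Returns the number of threads in the team. */
int mark_thread_numbers(int seen[MAX_TEAM], int requested)
{
    int team = 1;
    #pragma omp parallel num_threads(requested)
    {
        int tid = 0; /* serial fallback: the only thread is number 0 */
#ifdef _OPENMP
        tid = omp_get_thread_num();
        if (tid == 0) /* only the master thread records the team size */
            team = omp_get_num_threads();
#endif
        if (tid < MAX_TEAM)
            seen[tid] = 1;
    }
    return team;
}
```

After the call, exactly the slots 0 through team-1 are set, which reflects that thread numbers are consecutive and start at zero for the master thread.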
A set of implicit tasks, equal in number to the number of threads in the team, is
generated by the encountering thread. The structured block of the parallel construct
determines the code that will be executed in each implicit task. Each task is assigned to
a different thread in the team and becomes tied. The task region of the task being
executed by the encountering thread is suspended and each thread in the team executes
its implicit task. Each thread can execute a path of statements that is different from that
of the other threads.
The implementation may cause any thread to suspend execution of its implicit task at a
task scheduling point, and switch to execute any explicit task generated by any of the
threads in the team, before eventually resuming execution of the implicit task (for more
details see Section 2.7 on page 61).
There is an implied barrier at the end of a parallel region. After the end of a
parallel region, only the master thread of the team resumes execution of the
enclosing task region.
If a thread in a team executing a parallel region encounters another parallel
directive, it creates a new team, according to the rules in Section 2.4.1 on page 36, and
it becomes the master of that new team.
If execution of a thread terminates while inside a parallel region, execution of all
threads in all teams terminates. The order of termination of threads is unspecified. All
work done by a team prior to any barrier that the team has passed in the program is
guaranteed to be complete. The amount of work done by each thread after the last
barrier that it passed and before it terminates is unspecified.
For an example of the parallel construct, see Section A.5 on page 172. For an
example of the num_threads clause, see Section A.7 on page 177.
Restrictions
Restrictions to the parallel construct are as follows:
• A program that branches into or out of a parallel region is non-conforming.
• A program must not depend on any ordering of the evaluations of the clauses of the
  parallel directive, or on any side effects of the evaluations of the clauses.
• At most one if clause can appear on the directive.
• At most one num_threads clause can appear on the directive. The num_threads
  expression must evaluate to a positive integer value.
C/C++
• A throw executed inside a parallel region must cause execution to resume
  within the same parallel region, and the same thread that threw the exception
  must catch it.
C/C++
Fortran
• Unsynchronized use of Fortran I/O statements by multiple threads on the same unit
  has unspecified behavior.
Fortran
Cross References
• default, shared, private, firstprivate, and reduction clauses, see
  Section 2.9.3 on page 92.
• copyin clause, see Section 2.9.4 on page 107.
• omp_get_thread_num routine, see Section 3.2.4 on page 119.
2.4.1 Determining the Number of Threads for a parallel Region
When execution encounters a parallel directive, the value of the if clause or
num_threads clause (if any) on the directive, the current parallel context, and the
values of the nthreads-var, dyn-var, thread-limit-var, max-active-levels-var, and nest-var
ICVs are used to determine the number of threads to use in the region.
Note that using a variable in an if or num_threads clause expression of a
parallel construct causes an implicit reference to the variable in all enclosing
constructs. The if clause expression and the num_threads clause expression are
evaluated in the context outside of the parallel construct, and no ordering of those
evaluations is specified. It is also unspecified whether, in what order, or how many times
any side-effects of the evaluation of the num_threads or if clause expressions occur.
When a thread encounters a parallel construct, the number of threads is determined
according to Algorithm 2.1.
Algorithm 2.1
let ThreadsBusy be the number of OpenMP threads currently executing;
let ActiveParRegions be the number of enclosing active parallel regions;
if an if clause exists
    then let IfClauseValue be the value of the if clause expression;
    else let IfClauseValue = true;
if a num_threads clause exists
    then let ThreadsRequested be the value of the num_threads clause expression;
    else let ThreadsRequested = value of the first element of nthreads-var;
let ThreadsAvailable = (thread-limit-var - ThreadsBusy + 1);
if (IfClauseValue = false)
    then number of threads = 1;
else if (ActiveParRegions >= 1) and (nest-var = false)
    then number of threads = 1;
else if (ActiveParRegions = max-active-levels-var)
    then number of threads = 1;
else if (dyn-var = true) and (ThreadsRequested <= ThreadsAvailable)
    then number of threads = [ 1 : ThreadsRequested ];
else if (dyn-var = true) and (ThreadsRequested > ThreadsAvailable)
    then number of threads = [ 1 : ThreadsAvailable ];
else if (dyn-var = false) and (ThreadsRequested <= ThreadsAvailable)
    then number of threads = ThreadsRequested;
else if (dyn-var = false) and (ThreadsRequested > ThreadsAvailable)
    then behavior is implementation defined;
Note – Since the initial value of the dyn-var ICV is implementation defined, programs
that depend on a specific number of threads for correct execution should explicitly
disable dynamic adjustment of the number of threads.
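Algorithm 2.1 can be transcribed into C roughly as follows. The team_request and team_size types, the team_kind enumeration, and the decide_team_size name are hypothetical; the sketch reports a range or the implementation defined case instead of picking a value, since the algorithm leaves those choices to the implementation.

```c
#include <stdbool.h>

/* Hypothetical result categories for this sketch. */
typedef enum { TEAM_EXACT, TEAM_UP_TO, TEAM_IMPL_DEFINED } team_kind;

typedef struct {
    team_kind kind;
    int threads; /* exact count, or upper bound of the range [1 : threads] */
} team_size;

/* Hypothetical bundle of the inputs that Algorithm 2.1 consults. */
typedef struct {
    bool has_if, if_value;     /* if clause, when present */
    bool has_num_threads;      /* num_threads clause, when present */
    int num_threads;
    int nthreads_first;        /* first element of nthreads-var */
    bool dyn, nest;            /* dyn-var, nest-var */
    int thread_limit;          /* thread-limit-var */
    int max_active_levels;     /* max-active-levels-var */
    int threads_busy;          /* ThreadsBusy */
    int active_par_regions;    /* ActiveParRegions */
} team_request;

team_size decide_team_size(const team_request *r)
{
    bool if_value = r->has_if ? r->if_value : true;
    int requested = r->has_num_threads ? r->num_threads : r->nthreads_first;
    int available = r->thread_limit - r->threads_busy + 1;
    team_size result = { TEAM_EXACT, 1 };

    if (!if_value)
        return result;
    if (r->active_par_regions >= 1 && !r->nest)
        return result;
    if (r->active_par_regions == r->max_active_levels)
        return result;
    if (r->dyn) {
        result.kind = TEAM_UP_TO;
        result.threads = requested <= available ? requested : available;
    } else if (requested <= available) {
        result.threads = requested;
    } else {
        result.kind = TEAM_IMPL_DEFINED;
        result.threads = 0;
    }
    return result;
}
```

For example, with thread-limit-var equal to 16, one busy thread, and no clauses, a first nthreads-var element of 8 yields exactly 8 threads when dyn-var is false.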
Cross References
• nthreads-var, dyn-var, thread-limit-var, max-active-levels-var, and nest-var ICVs, see
  Section 2.3 on page 28.
2.5 Worksharing Constructs
A worksharing construct distributes the execution of the associated region among the
members of the team that encounters it. Threads execute portions of the region in the
context of the implicit tasks each one is executing. If the team consists of only one
thread then the worksharing region is not executed in parallel.
A worksharing region has no barrier on entry; however, an implied barrier exists at the
end of the worksharing region, unless a nowait clause is specified. If a nowait
clause is present, an implementation may omit the barrier at the end of the worksharing
region. In this case, threads that finish early may proceed straight to the instructions
following the worksharing region without waiting for the other members of the team to
finish the worksharing region, and without performing a flush operation (see
Section A.10 on page 182 for an example).
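The effect of the nowait clause can be sketched in C as follows. The function name scale_and_offset and the N_ELEMS size are assumptions made for this example. The two loops touch different arrays, so a thread that finishes its share of the first loop can safely start on the second one without waiting at a barrier.

```c
#define N_ELEMS 1000 /* assumed array size for the example */

void scale_and_offset(double a[N_ELEMS], double b[N_ELEMS])
{
    #pragma omp parallel
    {
        /* No barrier at the end of this loop: threads that finish
           early proceed directly to the next worksharing region. */
        #pragma omp for nowait
        for (int i = 0; i < N_ELEMS; i++)
            a[i] = 2.0 * a[i];

        /* Independent of a[], so the missing barrier is harmless.
           The implied barrier at the end of this loop still applies. */
        #pragma omp for
        for (int i = 0; i < N_ELEMS; i++)
            b[i] = b[i] + 1.0;
    }
}
```

If the second loop read elements of a written by other threads, removing the barrier would introduce a race, so nowait is appropriate only when such dependences are absent or otherwise guaranteed to be satisfied.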
The OpenMP API defines the following worksharing constructs, and these are described
in the sections that follow:
• loop construct
• sections construct
• single construct
• workshare construct
Restrictions
The following restrictions apply to worksharing constructs:
• Each worksharing region must be encountered by all threads in a team or by none at
  all.
• The sequence of worksharing regions and barrier regions encountered must be the
  same for every thread in a team.
2.5.1 Loop Construct
Summary
The loop construct specifies that the iterations of one or more associated loops will be
executed in parallel by threads in the team in the context of their implicit tasks. The
iterations are distributed across threads that already exist in the team executing the
parallel region to which the loop region binds.
Syntax
C/C++
The syntax of the loop construct is as follows:
    #pragma omp for [clause[[,] clause] ... ] new-line
        for-loops
where clause is one of the following:
    private(list)
    firstprivate(list)
    lastprivate(list)
    reduction(operator: list)
    schedule(kind[, chunk_size])
    collapse(n)
    ordered
    nowait
The for directive places restrictions on the structure of all associated for-loops.
Specifically, all associated for-loops must have the following canonical form:
    for (init-expr; test-expr; incr-expr) structured-block
init-expr
    One of the following:
        var = lb
        integer-type var = lb
        random-access-iterator-type var = lb
        pointer-type var = lb
test-expr
    One of the following:
        var relational-op b
        b relational-op var
incr-expr
    One of the following:
        ++var
        var++
        --var
        var--
        var += incr
        var -= incr
        var = var + incr
        var = incr + var
        var = var - incr
var
    One of the following:
        A variable of a signed or unsigned integer type.
        For C++, a variable of a random access iterator type.
        For C, a variable of a pointer type.
    If this variable would otherwise be shared, it is implicitly made
    private in the loop construct. This variable must not be
    modified during the execution of the for-loop other than in
    incr-expr. Unless the variable is specified lastprivate on
    the loop construct, its value after the loop is unspecified.
relational-op
    One of the following:
        <
        <=
        >
        >=
lb and b
    Loop invariant expressions of a type compatible with the type
    of var.
incr
    A loop invariant integer expression.
The canonical form allows the iteration count of all associated loops to be computed
before executing the outermost loop. The computation is performed for each loop in an
integer type. This type is derived from the type of var as follows:
• If var is of an integer type, then the type is the type of var.
• For C++, if var is of a random access iterator type, then the type is the type that
  would be used by std::distance applied to variables of the type of var.
• For C, if var is of a pointer type, then the type is ptrdiff_t.
The behavior is unspecified if any intermediate result required to compute the iteration
count cannot be represented in the type determined above.
There is no implied synchronization during the evaluation of the lb, b, or incr
expressions. It is unspecified whether, in what order, or how many times any side effects
within the lb, b, or incr expressions occur.
Note – Random access iterators are required to support random access to elements in
constant time. Other iterators are precluded by the restrictions since they can take linear
time or offer limited functionality. It is therefore advisable to use tasks to parallelize
those cases.
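The iteration count implied by the canonical form can be computed as sketched below for a loop variable of type long and a test-expr of the form var relational-op b. The rel_op enumeration and the iteration_count function are names invented for this example; step stands for the signed change that incr-expr applies to var on each iteration. The sketch assumes a nonzero step that is positive for < and <= and negative for > and >=, and it does not guard against intermediate results that overflow long, a case in which the behavior is unspecified.

```c
/* Hypothetical encoding of relational-op for this sketch. */
typedef enum { OP_LT, OP_LE, OP_GT, OP_GE } rel_op;

/* Number of iterations of: for (var = lb; var op b; var += step) */
long iteration_count(long lb, rel_op op, long b, long step)
{
    switch (op) {
    case OP_LT: /* step > 0 assumed */
        return lb < b ? (b - lb + step - 1) / step : 0;
    case OP_LE: /* step > 0 assumed */
        return lb <= b ? (b - lb) / step + 1 : 0;
    case OP_GT: /* step < 0 assumed */
        return lb > b ? (lb - b - step - 1) / (-step) : 0;
    case OP_GE: /* step < 0 assumed */
        return lb >= b ? (lb - b) / (-step) + 1 : 0;
    }
    return 0;
}
```

Because lb, b, and incr are loop invariant, a count of this kind is available before the first iteration executes, which is what allows iterations to be divided among threads in advance.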
C/C++
Fortran
The syntax of the loop construct is as follows:
    !$omp do [clause[[,] clause] ... ]
        do-loops
    [!$omp end do [nowait] ]
where clause is one of the following:
    private(list)
    firstprivate(list)
    lastprivate(list)
    reduction({operator|intrinsic_procedure_name}:list)
    schedule(kind[, chunk_size])
    collapse(n)
    ordered
If an end do directive is not specified, an end do directive is assumed at the end of the
do-loop.
All associated do-loops must be do-constructs as defined by the Fortran standard. If an
end do directive follows a do-construct in which several loop statements share a DO
termination statement, then the directive can only be specified for the outermost of these
DO statements. See Section A.8 on page 179 for examples.
If any of the loop iteration variables would otherwise be shared, they are implicitly
made private on the loop construct. See Section A.9 on page 181 for examples. Unless
the loop iteration variables are specified lastprivate on the loop construct, their
values after the loop are unspecified.
Fortran
Binding
The binding thread set for a loop region is the current team. A loop region binds to the
innermost enclosing parallel region. Only the threads of the team executing the
binding parallel region participate in the execution of the loop iterations and the
implied barrier of the loop region if the barrier is not eliminated by a nowait clause.
Description
The loop construct is associated with a loop nest consisting of one or more loops that
follow the directive.
There is an implicit barrier at the end of a loop construct unless a nowait clause is
specified.
The collapse clause may be used to specify how many loops are associated with the
loop construct. The parameter of the collapse clause must be a constant positive
integer expression. If no collapse clause is present, the only loop that is associated
with the loop construct is the one that immediately follows the loop directive.
If more than one loop is associated with the loop construct, then the iterations of all
associated loops are collapsed into one larger iteration space that is then divided
according to the schedule clause. The sequential execution of the iterations in all
associated loops determines the order of the iterations in the collapsed iteration space.
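A minimal C sketch of the collapse clause follows. The function name fill_logical_numbers and the ROWS and COLS sizes are assumptions made for this example. Each element is assigned the logical iteration number that the pair (i, j) would have in the collapsed iteration space, that is, its position in the sequential execution order of the two loops.

```c
#define ROWS 4 /* assumed extents for the example */
#define COLS 6

void fill_logical_numbers(int grid[ROWS][COLS])
{
    #pragma omp parallel
    {
        /* Both loops are associated with the loop construct, so the
           ROWS * COLS iterations form one iteration space that is
           divided among the threads of the team. */
        #pragma omp for collapse(2)
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                grid[i][j] = i * COLS + j;
    }
}
```

Without the collapse clause only the ROWS iterations of the outer loop would be distributed, which limits the available parallelism when ROWS is smaller than the number of threads.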
The iteration count for each associated loop is computed before entry to the outermost
loop. If execution of any associated loop changes any of the values used to compute any
of the iteration counts, then the behavior is unspecified.
The integer type (or kind, for Fortran) used to compute the iteration count for the
collapsed loop is implementation defined.
A worksharing loop has logical iterations numbered 0,1,...,N-1 where N is the number of
loop iterations, and the logical numbering denotes the sequence in which the iterations
would be executed if the associated loop(s) were executed by a single thread. The
schedule clause specifies how iterations of the associated loops are divided into
contiguous non-empty subsets, called chunks, and how these chunks are distributed
among threads of the team. Each thread executes its assigned chunk(s) in the context of
its implicit task. The chunk_size expression is evaluated using the original list items of
any variables that are made private in the loop construct. It is unspecified whether, in
what order, or how many times, any side-effects of the evaluation of this expression
occur. The use of a variable in a schedule clause expression of a loop construct
causes an implicit reference to the variable in all enclosing constructs.
Different loop regions with the same schedule and iteration count, even if they occur in
the same parallel region, can distribute iterations among threads differently. The only
exception is for the static schedule as specified in Table 2-1. Programs that depend
on which thread executes a particular iteration under any other circumstances are
non-conforming.
See Section 2.5.1.1 on page 47 for details of how the schedule for a worksharing loop is
determined.
The schedule kind can be one of those specified in Table 2-1.
TABLE 2-1 schedule clause kind values
static
When schedule(static, chunk_size) is specified, iterations are divided
into chunks of size chunk_size, and the chunks are assigned to the threads in
the team in a round-robin fashion in the order of the thread number.
When no chunk_size is specified, the iteration space is divided into chunks that
are approximately equal in size, and at most one chunk is distributed to each
thread. Note that the size of the chunks is unspecified in this case.
A compliant implementation of the static schedule must ensure that the
same assignment of logical iteration numbers to threads will be used in two
loop regions if the following conditions are satisfied: 1) both loop regions have
the same number of loop iterations, 2) both loop regions have the same value
of chunk_size specified, or both loop regions have no chunk_size specified, and
3) both loop regions bind to the same parallel region. A data dependence
between the same logical iterations in two such loops is guaranteed to be
satisfied allowing safe use of the nowait clause (see Section A.10 on page
182 for examples).
dynamic
When schedule(dynamic, chunk_size) is specified, the iterations are
distributed to threads in the team in chunks as the threads request them. Each
thread executes a chunk of iterations, then requests another chunk, until no
chunks remain to be distributed.
Each chunk contains chunk_size iterations, except for the last chunk to be
distributed, which may have fewer iterations.
When no chunk_size is specified, it defaults to 1.
guided
When schedule(guided, chunk_size) is specified, the iterations are
assigned to threads in the team in chunks as the executing threads request
them. Each thread executes a chunk of iterations, then requests another chunk,
until no chunks remain to be assigned.
For a chunk_size of 1, the size of each chunk is proportional to the
number of unassigned iterations divided by the number of threads in the team,
decreasing to 1. For a chunk_size with value k (greater than 1), the
size of each chunk is determined in the same way, with the restriction
that the chunks do not contain fewer than k iterations (except for the last chunk
to be assigned, which may have fewer than k iterations).
When no chunk_size is specified, it defaults to 1.
auto
When schedule(auto) is specified, the decision regarding scheduling is
delegated to the compiler and/or runtime system. The programmer gives the
implementation the freedom to choose any possible mapping of iterations to
threads in the team.
runtime
When schedule(runtime) is specified, the decision regarding scheduling
is deferred until run time, and the schedule and chunk size are taken from the
run-sched-var ICV. If the ICV is set to auto, the schedule is implementation
defined.
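The round-robin assignment described for schedule(static, chunk_size) in Table 2-1 can be written as a one-line formula and compared with what a loop region actually does. In this sketch the names static_owner, record_owners, and N_ITER are hypothetical, and the serial fallback for builds without OpenMP support is an assumption.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define N_ITER 16 /* assumed iteration count for the example */

/* Thread number expected to execute logical iteration i when chunks of
   chunk_size iterations are dealt out round-robin to nthreads threads. */
int static_owner(long i, long chunk_size, int nthreads)
{
    return (int)((i / chunk_size) % nthreads);
}

/* Records which thread executed each iteration of a static loop.
   Returns the number of threads in the team. */
int record_owners(int owner[N_ITER], int chunk_size)
{
    int team = 1;
    #pragma omp parallel
    {
        int tid = 0; /* serial fallback: a single thread numbered 0 */
#ifdef _OPENMP
        tid = omp_get_thread_num();
        if (tid == 0) /* only the master thread records the team size */
            team = omp_get_num_threads();
#endif
        #pragma omp for schedule(static, chunk_size)
        for (int i = 0; i < N_ITER; i++)
            owner[i] = tid;
    }
    return team;
}
```

For a chunk_size of 2 and a team of 3 threads, the formula assigns iterations 0 and 1 to thread 0, iterations 2 and 3 to thread 1, iterations 4 and 5 to thread 2, and iterations 6 and 7 to thread 0 again.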
Note – For a team of p threads and a loop of n iterations, let ⌈n/p⌉ be the integer q
that satisfies n = p*q - r, with 0 ≤ r < p. One compliant implementation of the static
schedule (with no specified chunk_size) would behave as though chunk_size had been
specified with value q. Another compliant implementation would assign q iterations to
the first p-r threads, and q-1 iterations to the remaining r threads. This illustrates why a
conforming program must not rely on the details of a particular implementation.
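The two compliant static distributions mentioned in the Note can be compared with a short C sketch. The function names are hypothetical, and the sketch assumes n >= 0, p >= 1, and 0 <= tid < p.

```c
/* Ceiling of a / b for a >= 0 and b > 0. */
static long ceil_div(long a, long b)
{
    return (a + b - 1) / b;
}

/* Variant A: behaves as though chunk_size were q = ceil(n / p).
   Returns the number of iterations executed by thread tid. */
long static_count_a(long n, long p, long tid)
{
    long q = ceil_div(n, p);
    long start = tid * q;
    long end = start + q;
    if (start >= n)
        return 0;
    return (end > n ? n : end) - start;
}

/* Variant B: q iterations for the first p - r threads and q - 1
   iterations for the remaining r threads, where n = p * q - r. */
long static_count_b(long n, long p, long tid)
{
    long q = ceil_div(n, p);
    long r = p * q - n; /* 0 <= r < p */
    return tid < p - r ? q : q - 1;
}
```

For n = 10 and p = 4, variant A gives the threads 3, 3, 3, and 1 iterations, while variant B gives 3, 3, 2, and 2. Both cover all 10 iterations, yet a given iteration may run on a different thread in each variant.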
A compliant implementation of the guided schedule with a chunk_size value of k
would assign q = ⌈n/p⌉ iterations to the first available thread and set n to the larger of
n-q and p*k. It would then repeat this process until q is greater than or equal to the
number of remaining iterations, at which time the remaining iterations form the final
chunk. Another compliant implementation could use the same method, except with
q = ⌈n/(2p)⌉, and set n to the larger of n-q and 2*p*k.
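The first guided method described above can be simulated to see how the chunk sizes shrink. This is a sketch of that one described method only, not of any particular runtime; the function name guided_chunks and the output buffer convention are assumptions, and p >= 1 and k >= 1 are assumed.

```c
#include <stddef.h>

/* Ceiling of a / b for a >= 0 and b > 0. */
static long ceil_div(long a, long b)
{
    return (a + b - 1) / b;
}

/* Writes successive chunk sizes into chunks[] (at most cap entries)
   and returns how many chunks were produced. */
size_t guided_chunks(long n, long p, long k, long *chunks, size_t cap)
{
    long remaining = n; /* iterations not yet assigned */
    long m = n;         /* working value that the text calls n */
    size_t count = 0;

    while (remaining > 0 && count < cap) {
        long q = ceil_div(m, p);
        if (q >= remaining) {
            chunks[count++] = remaining; /* final chunk */
            remaining = 0;
        } else {
            chunks[count++] = q;
            remaining -= q;
            m = (m - q > p * k) ? m - q : p * k; /* larger of n-q and p*k */
        }
    }
    return count;
}
```

With n = 20, p = 2, and k = 3 the simulated chunks are 10, 5, 3, and 2: sizes decrease toward k, and only the final chunk is smaller than k.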
Restrictions
Restrictions to the loop construct are as follows:
• All loops associated with the loop construct must be perfectly nested; that is, there
  must be no intervening code nor any OpenMP directive between any two loops.
• The values of the loop control expressions of the loops associated with the loop
  construct must be the same for all the threads in the team.
• Only one schedule clause can appear on a loop directive.
• Only one collapse clause can appear on a loop directive.
• chunk_size must be a loop invariant integer expression with a positive value.
• The value of the chunk_size expression must be the same for all threads in the team.
• The value of the run-sched-var ICV must be the same for all threads in the team.
• When schedule(runtime) or schedule(auto) is specified, chunk_size must
  not be specified.
• Only one ordered clause can appear on a loop directive.
• The ordered clause must be present on the loop construct if any ordered region
  ever binds to a loop region arising from the loop construct.
• The loop iteration variable may not appear in a threadprivate directive.
C/C++
• The associated for-loops must be structured blocks.
• Only an iteration of the innermost associated loop may be curtailed by a continue
  statement.
• No statement can branch to any associated for statement.
• Only one nowait clause can appear on a for directive.
• If test-expr is of the form var relational-op b and relational-op is < or <= then
  incr-expr must cause var to increase on each iteration of the loop. If test-expr is of
  the form var relational-op b and relational-op is > or >= then incr-expr must cause
  var to decrease on each iteration of the loop.
• If test-expr is of the form b relational-op var and relational-op is < or <= then
  incr-expr must cause var to decrease on each iteration of the loop. If test-expr is of
  the form b relational-op var and relational-op is > or >= then incr-expr must cause
  var to increase on each iteration of the loop.
• A throw executed inside a loop region must cause execution to resume within the
  same iteration of the loop region, and the same thread that threw the exception must
  catch it.
C/C++
Fortran
• The associated do-loops must be structured blocks.
• Only an iteration of the innermost associated loop may be curtailed by a CYCLE
  statement.
• No statement in the associated loops other than the DO statements can cause a branch
  out of the loops.
• The do-loop iteration variable must be of type integer.
• The do-loop cannot be a DO WHILE or a DO loop without loop control.
Fortran
Cross References
• private, firstprivate, lastprivate, and reduction clauses, see
  Section 2.9.3 on page 92.
• OMP_SCHEDULE environment variable, see Section 4.1 on page 154.
• ordered construct, see Section 2.8.7 on page 82.
2.5.1.1 Determining the Schedule of a Worksharing Loop
When execution encounters a loop directive, the schedule clause (if any) on the
directive, and the run-sched-var and def-sched-var ICVs are used to determine how loop
iterations are assigned to threads. See Section 2.3 on page 28 for details of how the
values of the ICVs are determined. If the loop directive does not have a schedule
clause then the current value of the def-sched-var ICV determines the schedule. If the
loop directive has a schedule clause that specifies the runtime schedule kind then
the current value of the run-sched-var ICV determines the schedule. Otherwise, the
value of the schedule clause determines the schedule. Figure 2-1 describes how the
schedule for a worksharing loop is determined.
Cross References
• ICVs, see Section 2.3 on page 28.
FIGURE 2-1 Determining the schedule for a worksharing loop. The flowchart first asks whether a schedule clause is present: if not, the def-sched-var schedule kind is used. If a clause is present and its schedule kind is runtime, the run-sched-var schedule kind is used; otherwise the schedule kind specified in the schedule clause is used.
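The three outcomes of this decision can be sketched in C as follows. This is a minimal illustration, not normative text: the function names scale_default, scale_runtime and scale_static are hypothetical, and the chunk size of 4 is an arbitrary choice.

```c
/* No schedule clause: the def-sched-var ICV determines the schedule. */
void scale_default(double *x, int n, double factor)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        x[i] *= factor;
}

/* schedule(runtime): the run-sched-var ICV determines the schedule,
 * for example as set through the OMP_SCHEDULE environment variable. */
void scale_runtime(double *x, int n, double factor)
{
    int i;
    #pragma omp parallel for schedule(runtime)
    for (i = 0; i < n; i++)
        x[i] *= factor;
}

/* Any other schedule kind: the clause itself determines the schedule. */
void scale_static(double *x, int n, double factor)
{
    int i;
    #pragma omp parallel for schedule(static, 4)
    for (i = 0; i < n; i++)
        x[i] *= factor;
}
```

The schedule only affects how iterations are assigned to threads; all three functions are expected to produce the same array contents.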
2.5.2 sections Construct
Summary
The sections construct is a noniterative worksharing construct that contains a set of
structured blocks that are to be distributed among and executed by the threads in a team.
Each structured block is executed once by one of the threads in the team in the context
of its implicit task.
Syntax
The syntax of the sections construct is as follows:
C/C++
#pragma omp sections [clause[[,] clause] ...] new-line
{
[#pragma omp section new-line]
structured-block
[#pragma omp section new-line
structured-block ]
...
}
where clause is one of the following:
private(list)
firstprivate(list)
lastprivate(list)
reduction(operator: list)
nowait
C/C++
Fortran
The syntax of the sections construct is as follows:
!$omp sections [clause[[,] clause] ...]
[!$omp section]
structured-block
[!$omp section
structured-block ]
...
!$omp end sections [nowait]
where clause is one of the following:
private(list)
firstprivate(list)
lastprivate(list)
reduction({operator|intrinsic_procedure_name}:list)
Fortran
Binding
The binding thread set for a sections region is the current team. A sections
region binds to the innermost enclosing parallel region. Only the threads of the team
executing the binding parallel region participate in the execution of the structured
blocks and the implied barrier of the sections region if the barrier is not eliminated
by a nowait clause.
Description
Each structured block in the sections construct is preceded by a section directive
except possibly the first block, for which a preceding section directive is optional.
The method of scheduling the structured blocks among the threads in the team is
implementation defined.
There is an implicit barrier at the end of a sections construct unless a nowait
clause is specified.
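A minimal C sketch of the sections construct follows, assuming a hypothetical helper min_and_max that is not part of this specification and an input array with at least one element. Each structured block is executed once by some thread of the team, and each block writes to a different shared variable, so no further synchronization is needed.

```c
/* Computes the minimum and maximum of a[0..n-1], n >= 1.
 * The two scans are independent, so each is placed in its own section. */
void min_and_max(const int *a, int n, int *lo, int *hi)
{
    int found_lo = a[0];   /* shared, written only by the first section  */
    int found_hi = a[0];   /* shared, written only by the second section */

    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                int i;
                for (i = 1; i < n; i++)
                    if (a[i] < found_lo)
                        found_lo = a[i];
            }
            #pragma omp section
            {
                int i;
                for (i = 1; i < n; i++)
                    if (a[i] > found_hi)
                        found_hi = a[i];
            }
        } /* implicit barrier: no nowait clause was specified */
    }

    *lo = found_lo;
    *hi = found_hi;
}
```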
Restrictions
Restrictions to the sections construct are as follows:
• Orphaned section directives are prohibited. That is, the section directives must appear within the sections construct and must not be encountered elsewhere in the sections region.
• The code enclosed in a sections construct must be a structured block.
• Only a single nowait clause can appear on a sections directive.
C/C++
• A throw executed inside a sections region must cause execution to resume within the same section of the sections region, and the same thread that threw the exception must catch it.
C/C++
Cross References
• private, firstprivate, lastprivate, and reduction clauses, see Section 2.9.3 on page 92.
2.5.3 single Construct
Summary
The single construct specifies that the associated structured block is executed by only
one of the threads in the team (not necessarily the master thread), in the context of its
implicit task. The other threads in the team, which do not execute the block, wait at an
implicit barrier at the end of the single construct unless a nowait clause is specified.
Syntax
The syntax of the single construct is as follows:
C/C++
#pragma omp single [clause[[,] clause] ...] new-line
structured-block
where clause is one of the following:
private(list)
firstprivate(list)
copyprivate(list)
nowait
C/C++
Fortran
The syntax of the single construct is as follows:
!$omp single [clause[[,] clause] ...]
structured-block
!$omp end single [end_clause[[,] end_clause] ...]
where clause is one of the following:
private(list)
firstprivate(list)
and end_clause is one of the following:
copyprivate(list)
nowait
Fortran
Binding
The binding thread set for a single region is the current team. A single region
binds to the innermost enclosing parallel region. Only the threads of the team
executing the binding parallel region participate in the execution of the structured
block and the implied barrier of the single region if the barrier is not eliminated by a
nowait clause.
Description
The method of choosing a thread to execute the structured block is implementation
defined. There is an implicit barrier at the end of the single construct unless a
nowait clause is specified.
For an example of the single construct, see Section A.14 on page 192.
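The following C sketch, which is not the example from Section A.14, shows one way the single construct might be combined with the copyprivate clause to broadcast a value to the team. The function name broadcast_with_single and the variable names are hypothetical.

```c
/* Returns 1 if every thread of the team observed the broadcast value. */
int broadcast_with_single(int value)
{
    int all_match = 1;

    #pragma omp parallel reduction(&&:all_match)
    {
        int token = 0;   /* declared inside the region, so it is private */

        /* One thread executes the block; copyprivate then copies its
         * token to the private token of every other thread. */
        #pragma omp single copyprivate(token)
        {
            token = value;
        }

        /* The single construct has no nowait clause, so all threads have
         * passed its implicit barrier before reading token here. */
        all_match = all_match && (token == value);
    }

    return all_match;
}
```

Note that copyprivate and nowait cannot be combined on the same single construct.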
Restrictions
Restrictions to the single construct are as follows:
• The copyprivate clause must not be used with the nowait clause.
• At most one nowait clause can appear on a single construct.
C/C++
• A throw executed inside a single region must cause execution to resume within the same single region, and the same thread that threw the exception must catch it.
C/C++
Cross References
• private and firstprivate clauses, see Section 2.9.3 on page 92.
• copyprivate clause, see Section 2.9.4.2 on page 109.
Fortran
2.5.4 workshare Construct
Summary
The workshare construct divides the execution of the enclosed structured block into
separate units of work, and causes the threads of the team to share the work such that
each unit is executed only once by one thread, in the context of its implicit task.
Syntax
The syntax of the workshare construct is as follows:
!$omp workshare
structured-block
!$omp end workshare [nowait]
The enclosed structured block must consist of only the following:
• array assignments
• scalar assignments
• FORALL statements
• FORALL constructs
• WHERE statements
• WHERE constructs
• atomic constructs
• critical constructs
• parallel constructs
Statements contained in any enclosed critical construct are also subject to these
restrictions. Statements in any enclosed parallel construct are not restricted.
Binding
The binding thread set for a workshare region is the current team. A workshare
region binds to the innermost enclosing parallel region. Only the threads of the team
executing the binding parallel region participate in the execution of the units of
work and the implied barrier of the workshare region if the barrier is not eliminated
by a nowait clause.
Description
There is an implicit barrier at the end of a workshare construct unless a nowait
clause is specified.
An implementation of the workshare construct must insert any synchronization that is
required to maintain standard Fortran semantics. For example, the effects of one
statement within the structured block must appear to occur before the execution of
succeeding statements, and the evaluation of the right hand side of an assignment must
appear to complete prior to the effects of assigning to the left hand side.
The statements in the workshare construct are divided into units of work as follows:
• For array expressions within each statement, including transformational array intrinsic functions that compute scalar values from arrays:
  • Evaluation of each element of the array expression, including any references to ELEMENTAL functions, is a unit of work.
  • Evaluation of transformational array intrinsic functions may be freely subdivided into any number of units of work.
• For an array assignment statement, the assignment of each element is a unit of work.
• For a scalar assignment statement, the assignment operation is a unit of work.
• For a WHERE statement or construct, the evaluation of the mask expression and the masked assignments are each a unit of work.
• For a FORALL statement or construct, the evaluation of the mask expression, expressions occurring in the specification of the iteration space, and the masked assignments are each a unit of work.
• For an atomic construct, the atomic operation on the storage location designated as x is the unit of work.
• For a critical construct, the construct is a single unit of work.
• For a parallel construct, the construct is a unit of work with respect to the workshare construct. The statements contained in the parallel construct are executed by a new thread team.
• If none of the rules above apply to a portion of a statement in the structured block, then that portion is a unit of work.
The transformational array intrinsic functions are MATMUL, DOT_PRODUCT, SUM,
PRODUCT, MAXVAL, MINVAL, COUNT, ANY, ALL, SPREAD, PACK, UNPACK,
RESHAPE, TRANSPOSE, EOSHIFT, CSHIFT, MINLOC, and MAXLOC.
It is unspecified how the units of work are assigned to the threads executing a
workshare region.
If an array expression in the block references the value, association status, or allocation
status of private variables, the value of the expression is undefined, unless the same
value would be computed by every thread.
If an array assignment, a scalar assignment, a masked array assignment, or a FORALL
assignment assigns to a private variable in the block, the result is unspecified.
The workshare directive causes the sharing of work to occur only in the workshare
construct, and not in the remainder of the workshare region.
For examples of the workshare construct, see Section A.17 on page 213.
Restrictions
The following restrictions apply to the workshare construct:
• All array assignments, scalar assignments, and masked array assignments must be intrinsic assignments.
• The construct must not contain any user defined function calls unless the function is ELEMENTAL.
Fortran
2.6 Combined Parallel Worksharing Constructs
Combined parallel worksharing constructs are shortcuts for specifying a worksharing
construct nested immediately inside a parallel construct. The semantics of these
directives are identical to those of explicitly specifying a parallel construct containing
one worksharing construct and no other statements.
The combined parallel worksharing constructs allow certain clauses that are permitted
both on parallel constructs and on worksharing constructs. If a program would have
different behavior depending on whether the clause were applied to the parallel
construct or to the worksharing construct, then the program’s behavior is unspecified.
The following sections describe the combined parallel worksharing constructs:
• The parallel loop construct.
• The parallel sections construct.
• The parallel workshare construct.
2.6.1 Parallel Loop Construct
Summary
The parallel loop construct is a shortcut for specifying a parallel construct
containing one or more associated loops and no other statements.
Syntax
The syntax of the parallel loop construct is as follows:
C/C++
#pragma omp parallel for [clause[[,] clause] ...] new-line
for-loop
where clause can be any of the clauses accepted by the parallel or for directives,
except the nowait clause, with identical meanings and restrictions.
C/C++
Fortran
The syntax of the parallel loop construct is as follows:
!$omp parallel do [clause[[,] clause] ...]
do-loop
[!$omp end parallel do]
where clause can be any of the clauses accepted by the parallel or do directives,
with identical meanings and restrictions.
If an end parallel do directive is not specified, an end parallel do directive is
assumed at the end of the do-loop. nowait may not be specified on an end
parallel do directive.
Fortran
Description
C/C++
The semantics are identical to explicitly specifying a parallel directive immediately
followed by a for directive.
C/C++
Fortran
The semantics are identical to explicitly specifying a parallel directive immediately
followed by a do directive, and an end do directive immediately followed by an end
parallel directive.
Fortran
Restrictions
The restrictions for the parallel construct and the loop construct apply.
Cross References
• parallel construct, see Section 2.4 on page 33.
• loop construct, see Section 2.5.1 on page 39.
• Data attribute clauses, see Section 2.9.3 on page 92.
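The equivalence between the combined construct and its expanded form can be sketched in C as follows. The function names saxpy_combined and saxpy_expanded are hypothetical and serve only to place the two spellings side by side.

```c
/* Combined form: one directive creates the team and shares the loop. */
void saxpy_combined(float a, const float *x, float *y, int n)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Expanded form: a parallel construct containing only a loop construct. */
void saxpy_expanded(float a, const float *x, float *y, int n)
{
    #pragma omp parallel
    {
        int i;
        #pragma omp for
        for (i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }
}
```

The nowait clause is not accepted on the combined form; in the expanded form it could be placed on the for directive, although the barrier at the end of the parallel region would still apply.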
2.6.2 parallel sections Construct
Summary
The parallel sections construct is a shortcut for specifying a parallel
construct containing one sections construct and no other statements.
Syntax
The syntax of the parallel sections construct is as follows:
C/C++
#pragma omp parallel sections [clause[[,] clause] ...] new-line
{
[#pragma omp section new-line]
structured-block
[#pragma omp section new-line
structured-block ]
...
}
where clause can be any of the clauses accepted by the parallel or sections
directives, except the nowait clause, with identical meanings and restrictions.
C/C++
Fortran
The syntax of the parallel sections construct is as follows:
!$omp parallel sections [clause[[,] clause] ...]
[!$omp section]
structured-block
[!$omp section
structured-block ]
...
!$omp end parallel sections
where clause can be any of the clauses accepted by the parallel or sections
directives, with identical meanings and restrictions.
The last section ends at the end parallel sections directive. nowait cannot be
specified on an end parallel sections directive.
Fortran
Description
C/C++
The semantics are identical to explicitly specifying a parallel directive immediately
followed by a sections directive.
C/C++
Fortran
The semantics are identical to explicitly specifying a parallel directive immediately
followed by a sections directive, and an end sections directive immediately
followed by an end parallel directive.
Fortran
For an example of the parallel sections construct, see Section A.12 on page 189.
Restrictions
The restrictions for the parallel construct and the sections construct apply.
Cross References
• parallel construct, see Section 2.4 on page 33.
• sections construct, see Section 2.5.2 on page 48.
• Data attribute clauses, see Section 2.9.3 on page 92.
Fortran
2.6.3 parallel workshare Construct
Summary
The parallel workshare construct is a shortcut for specifying a parallel
construct containing one workshare construct and no other statements.
Syntax
The syntax of the parallel workshare construct is as follows:
!$omp parallel workshare [clause[[,] clause] ...]
structured-block
!$omp end parallel workshare
where clause can be any of the clauses accepted by the parallel directive, with
identical meanings and restrictions. nowait may not be specified on an end
parallel workshare directive.
Description
The semantics are identical to explicitly specifying a parallel directive immediately
followed by a workshare directive, and an end workshare directive immediately
followed by an end parallel directive.
Restrictions
The restrictions for the parallel construct and the workshare construct apply.
Cross References
• parallel construct, see Section 2.4 on page 33.
• workshare construct, see Section 2.5.4 on page 52.
• Data attribute clauses, see Section 2.9.3 on page 92.
Fortran
2.7 Tasking Constructs
2.7.1 task Construct
Summary
The task construct defines an explicit task.
Syntax
The syntax of the task construct is as follows:
C/C++
#pragma omp task [clause[[,] clause] ...] new-line
structured-block
where clause is one of the following:
if(scalar-expression)
final(scalar-expression)
untied
default(shared | none)
mergeable
private(list)
firstprivate(list)
shared(list)
C/C++
Fortran
The syntax of the task construct is as follows:
!$omp task [clause[[,] clause] ...]
structured-block
!$omp end task
where clause is one of the following:
if(scalar-logical-expression)
final(scalar-logical-expression)
untied
default(private | firstprivate | shared | none)
mergeable
private(list)
firstprivate(list)
shared(list)
Fortran
Binding
The binding thread set of the task region is the current team. A task region binds to
the innermost enclosing parallel region.
Description
When a thread encounters a task construct, a task is generated from the code for the
associated structured block. The data environment of the task is created according to the
data-sharing attribute clauses on the task construct, per-data environment ICVs, and
any defaults that apply.
The encountering thread may immediately execute the task, or defer its execution. In the
latter case, any thread in the team may be assigned the task. Completion of the task can
be guaranteed using task synchronization constructs. A task construct may be nested
inside an outer task, but the task region of the inner task is not a part of the task
region of the outer task.
When an if clause is present on a task construct, and the if clause expression
evaluates to false, an undeferred task is generated, and the encountering thread must
suspend the current task region, for which execution cannot be resumed until the
generated task is completed. Note that the use of a variable in an if clause expression
of a task construct causes an implicit reference to the variable in all enclosing
constructs.
When a final clause is present on a task construct and the final clause expression
evaluates to true, the generated task will be a final task. All task constructs
encountered during execution of a final task will generate final and included tasks. Note
that the use of a variable in a final clause expression of a task construct causes an
implicit reference to the variable in all enclosing constructs.
The if clause expression and the final clause expression are evaluated in the context
outside of the task construct, and no ordering of those evaluations is specified.
A thread that encounters a task scheduling point within the task region may
temporarily suspend the task region. By default, a task is tied and its suspended task
region can only be resumed by the thread that started its execution. If the untied
clause is present on a task construct, any thread in the team can resume the task
region after a suspension. The untied clause is ignored if a final clause is present
on the same task construct and the final clause expression evaluates to true, or if a
task is an included task.
The task construct includes a task scheduling point in the task region of its generating
task, immediately following the generation of the explicit task. Each explicit task
region includes a task scheduling point at its point of completion. An implementation
might add task scheduling points anywhere in untied task regions.
When a mergeable clause is present on a task construct, and the generated task is
an undeferred task or an included task, the implementation might generate a merged task
instead.
Note – When storage is shared by an explicit task region, it is the programmer's
responsibility to ensure, by adding proper synchronization, that the storage does not
reach the end of its lifetime before the explicit task region completes its execution.
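As a rough illustration of task generation, the C sketch below creates one explicit task per element of a linked list. The struct node type and the function double_all are hypothetical. The list nodes are owned by the caller and outlive the parallel region, which is one way to satisfy the lifetime requirement in the note above.

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Doubles the value stored in every node of the list. */
void double_all(struct node *head)
{
    #pragma omp parallel
    {
        /* One thread walks the list and generates the tasks;
         * any thread in the team may execute them. */
        #pragma omp single
        {
            struct node *p;
            for (p = head; p != NULL; p = p->next) {
                /* firstprivate captures the pointer value that p holds
                 * at the moment the task is generated. */
                #pragma omp task firstprivate(p)
                p->value *= 2;
            }
        }
        /* The implicit barrier at the end of the single construct
         * guarantees that all generated tasks have completed. */
    }
}
```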
Restrictions
Restrictions to the task construct are as follows:
• A program that branches into or out of a task region is non-conforming.
• A program must not depend on any ordering of the evaluations of the clauses of the task directive, or on any side effects of the evaluations of the clauses.
• At most one if clause can appear on the directive.
• At most one final clause can appear on the directive.
C/C++
• A throw executed inside a task region must cause execution to resume within the same task region, and the same thread that threw the exception must catch it.
C/C++
Fortran
• Unsynchronized use of Fortran I/O statements by multiple tasks on the same unit has unspecified behavior.
Fortran
2.7.2 taskyield Construct
Summary
The taskyield construct specifies that the current task can be suspended in favor of
execution of a different task.
Syntax
The syntax of the taskyield construct is as follows:
C/C++
#pragma omp taskyield new-line
Because the taskyield construct is a stand-alone directive, there are some
restrictions on its placement within a program. The taskyield directive may be
placed only at a point where a base language statement is allowed. The taskyield
directive may not be used in place of the statement following an if, while, do,
switch, or label. See Appendix C for the formal grammar. The examples in
Section A.25 on page 236 illustrate these restrictions.
C/C++
Fortran
The syntax of the taskyield construct is as follows:
!$omp taskyield
Because the taskyield construct is a stand-alone directive, there are some
restrictions on its placement within a program. The taskyield directive may be
placed only at a point where a Fortran executable statement is allowed. The
taskyield directive may not be used as the action statement in an if statement or as
the executable statement following a label if the label is referenced in the program. The
examples in Section A.25 on page 236 illustrate these restrictions.
Fortran
Binding
A taskyield region binds to the current task region. The binding thread set of the
taskyield region is the current team.
Description
The taskyield region includes an explicit task scheduling point in the current task
region.
Cross References
• Task scheduling, see Section 2.7.3 on page 65.
2.7.3 Task Scheduling
Whenever a thread reaches a task scheduling point, the implementation may cause it to
perform a task switch, beginning or resuming execution of a different task bound to the
current team. Task scheduling points are implied at the following locations:
• the point immediately following the generation of an explicit task
• after the last instruction of a task region
• in taskyield regions
• in taskwait regions
• in implicit and explicit barrier regions.
In addition, implementations may insert implementation defined task scheduling points
in untied tasks anywhere that they are not specifically prohibited in this specification.
When a thread encounters a task scheduling point it may do one of the following,
subject to the Task Scheduling Constraints (below):
• begin execution of a tied task bound to the current team
• resume any suspended task region, bound to the current team, to which it is tied
• begin execution of an untied task bound to the current team
• resume any suspended untied task region bound to the current team.
If more than one of the above choices is available, it is unspecified as to which will be
chosen.
Task Scheduling Constraints are as follows:
1. An included task is executed immediately after generation of the task.
2. Scheduling of new tied tasks is constrained by the set of task regions that are currently tied to the thread, and that are not suspended in a barrier region. If this set is empty, any new tied task may be scheduled. Otherwise, a new tied task may be scheduled only if it is a descendant of every task in the set.
3. When an explicit task is generated by a construct containing an if clause for which the expression evaluated to false, and the previous constraint is already met, the task is executed immediately after generation of the task.
A program relying on any other assumption about task scheduling is non-conforming.
Note – Task scheduling points dynamically divide task regions into parts. Each part is
executed uninterrupted from start to end. Different parts of the same task region are
executed in the order in which they are encountered. In the absence of task
synchronization constructs, the order in which a thread executes parts of different
schedulable tasks is unspecified.
A correct program must behave correctly and consistently with all conceivable
scheduling sequences that are compatible with the rules above.
For example, if threadprivate storage is accessed (explicitly in the source code or
implicitly in calls to library routines) in one part of a task region, its value cannot be
assumed to be preserved into the next part of the same task region if another schedulable
task exists that modifies it (see Example A.15.7c on page 202, Example A.15.7f on page
202, Example A.15.8c on page 203 and Example A.15.8f on page 203).
As another example, if a lock acquire and release happen in different parts of a task
region, no attempt should be made to acquire the same lock in any part of another task
that the executing thread may schedule. Otherwise, a deadlock is possible. A similar
situation can occur when a critical region spans multiple parts of a task and another
schedulable task contains a critical region with the same name (see Example A.15.9c on
page 204, Example A.15.9f on page 205, Example A.15.10c on page 206 and Example
A.15.10f on page 207).
The use of threadprivate variables and the use of locks or critical sections in an explicit
task with an if clause must take into account that when the if clause evaluates to
false, the task is executed immediately, without regard to Task Scheduling Constraint 2.
2.8 Master and Synchronization Constructs
The following sections describe:
• the master construct.
• the critical construct.
• the barrier construct.
• the taskwait construct.
• the atomic construct.
• the flush construct.
• the ordered construct.
2.8.1 master Construct
Summary
The master construct specifies a structured block that is executed by the master thread
of the team.
Syntax
The syntax of the master construct is as follows:
C/C++
#pragma omp master new-line
structured-block
C/C++
Fortran
The syntax of the master construct is as follows:
!$omp master
structured-block
!$omp end master
Fortran
Binding
The binding thread set for a master region is the current team. A master region
binds to the innermost enclosing parallel region. Only the master thread of the team
executing the binding parallel region participates in the execution of the structured
block of the master region.
Description
Other threads in the team do not execute the associated structured block. There is no
implied barrier either on entry to, or exit from, the master construct.
For an example of the master construct, see Section A.18 on page 217.
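A very small C sketch, separate from the example in Section A.18, can make the behavior concrete. The function name count_master_executions is hypothetical.

```c
/* Counts how many times the master block runs in one parallel region. */
int count_master_executions(void)
{
    int hits = 0;   /* shared; written only by the master thread */

    #pragma omp parallel
    {
        #pragma omp master
        {
            hits++;
        }
        /* There is no implied barrier here: the other threads do not
         * wait for the master thread to finish the block. */
    }

    /* The barrier at the end of the parallel region makes the update
     * visible before hits is read. */
    return hits;
}
```

Under these assumptions the function is expected to return 1 regardless of the team size.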
Restrictions
C/C++
• A throw executed inside a master region must cause execution to resume within the same master region, and the same thread that threw the exception must catch it.
C/C++
2.8.2 critical Construct
Summary
The critical construct restricts execution of the associated structured block to a
single thread at a time.
Syntax
The syntax of the critical construct is as follows:
C/C++
#pragma omp critical [(name)] new-line
structured-block
C/C++
Fortran
The syntax of the critical construct is as follows:
!$omp critical [(name)]
structured-block
!$omp end critical [(name)]
Fortran
Binding
The binding thread set for a critical region is all threads. Region execution is
restricted to a single thread at a time among all the threads in the program, without
regard to the team(s) to which the threads belong.
Description
An optional name may be used to identify the critical construct. All critical
constructs without a name are considered to have the same unspecified name. A thread
waits at the beginning of a critical region until no thread is executing a critical
region with the same name. The critical construct enforces exclusive access with
respect to all critical constructs with the same name in all threads, not just those
threads in the current team.
C/C++
Identifiers used to identify a critical construct have external linkage and are in a
name space that is separate from the name spaces used by labels, tags, members, and
ordinary identifiers.
C/C++
Fortran
The names of critical constructs are global entities of the program. If a name
conflicts with any other entity, the behavior of the program is unspecified.
Fortran
For an example of the critical construct, see Section A.19 on page 219.
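Independently of the example in Section A.19, the following C sketch shows a named critical construct guarding a shared maximum. The function name max_with_critical and the critical name update_best are hypothetical, and the input array is assumed to hold at least one element.

```c
/* Returns the largest element of a[0..n-1], n >= 1. */
int max_with_critical(const int *a, int n)
{
    int best = a[0];   /* shared among the threads of the team */
    int i;

    #pragma omp parallel for
    for (i = 1; i < n; i++) {
        /* Only one thread at a time may execute a critical region named
         * update_best, so the compare-and-store cannot be interleaved. */
        #pragma omp critical (update_best)
        {
            if (a[i] > best)
                best = a[i];
        }
    }

    return best;
}
```

Serializing every iteration in this way is simple but slow; the sketch is meant to show the exclusion semantics, not an efficient reduction.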
Restrictions
C/C++
• A throw executed inside a critical region must cause execution to resume within the same critical region, and the same thread that threw the exception must catch it.
C/C++
Fortran
The following restrictions apply to the critical construct:
• If a name is specified on a critical directive, the same name must also be specified on the end critical directive.
• If no name appears on the critical directive, no name can appear on the end critical directive.
Fortran
2.8.3 barrier Construct
Summary
The barrier construct specifies an explicit barrier at the point at which the construct
appears.
Syntax
The syntax of the barrier construct is as follows:
C/C++
#pragma omp barrier new-line
Because the barrier construct is a stand-alone directive, there are some restrictions
on its placement within a program. The barrier directive may be placed only at a
point where a base language statement is allowed. The barrier directive may not be
used in place of the statement following an if, while, do, switch, or label. See
Appendix C for the formal grammar. The examples in Section A.25 on page 236
illustrate these restrictions.
C/C++
Fortran
The syntax of the barrier construct is as follows:
!$omp barrier
Because the barrier construct is a stand-alone directive, there are some restrictions
on its placement within a program. The barrier directive may be placed only at a
point where a Fortran executable statement is allowed. The barrier directive may not
be used as the action statement in an if statement or as the executable statement
following a label if the label is referenced in the program. The examples in Section A.25
on page 236 illustrate these restrictions.
Fortran
Binding
The binding thread set for a barrier region is the current team. A barrier region
binds to the innermost enclosing parallel region. See Section A.21 on page 222 for
examples.
Description
All threads of the team executing the binding parallel region must execute the
barrier region and complete execution of all explicit tasks generated in the binding
parallel region up to this point before any are allowed to continue execution beyond
the barrier.
The barrier region includes an implicit task scheduling point in the current task
region.
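One situation in which an explicit barrier might be needed is sketched below in C. The function name reverse_doubled and the use of a caller-provided scratch array tmp are assumptions made for illustration. The first loop uses nowait, so without the barrier a thread could start the second loop and read elements of tmp that another thread has not yet written.

```c
/* Writes out[i] = 2 * in[n - 1 - i] using tmp as scratch space. */
void reverse_doubled(const int *in, int *tmp, int *out, int n)
{
    #pragma omp parallel
    {
        int i;

        /* Phase 1: nowait removes the implicit barrier of this loop. */
        #pragma omp for nowait
        for (i = 0; i < n; i++)
            tmp[i] = 2 * in[i];

        /* Every thread of the team encounters this barrier, so all of
         * tmp has been written before any thread continues. */
        #pragma omp barrier

        /* Phase 2: reads elements that other threads may have written. */
        #pragma omp for
        for (i = 0; i < n; i++)
            out[i] = tmp[n - 1 - i];
    }
}
```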
Restrictions
The following restrictions apply to the barrier construct:
• Each barrier region must be encountered by all threads in a team or by none at all.
• The sequence of worksharing regions and barrier regions encountered must be the same for every thread in a team.
2.8.4 taskwait Construct
Summary
The taskwait construct specifies a wait on the completion of child tasks of the
current task.
8
Syntax
9
The syntax of the taskwait construct is as follows:
C/C++
#pragma omp taskwait newline
Because the taskwait construct is a stand-alone directive, there are some restrictions
on its placement within a program. The taskwait directive may be placed only at a
point where a base language statement is allowed. The taskwait directive may not be
used in place of the statement following an if, while, do, switch, or label. See
Appendix C for the formal grammar. The examples in Section A.25 on page 236
illustrate these restrictions.
10
11
12
13
14
15
C/C++
Fortran
The syntax of the taskwait construct is as follows:
16
!$omp taskwait
Because the taskwait construct is a stand-alone directive, there are some restrictions
on its placement within a program. The taskwait directive may be placed only at a
point where a Fortran executable statement is allowed. The taskwait directive may
not be used as the action statement in an if statement or as the executable statement
following a label if the label is referenced in the program. The examples in Section A.25
on page 236 illustrate these restrictions.
17
18
19
20
21
22
Fortran
72
OpenMP API • Version 3.1 July 2011
1
Binding
2
3
A taskwait region binds to the current task region. The binding thread set of the
taskwait region is the current team.
4
Description
5
6
7
The taskwait region includes an implicit task scheduling point in the current task
region. The current task region is suspended at the task scheduling point until execution
of all its child tasks generated before the taskwait region are completed.
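The following C sketch shows one way the taskwait construct might be used; it is an illustration under stated assumptions, not an example taken from this specification. The names fib_task and fib_parallel are hypothetical. Each call generates two child tasks and then waits for them at the taskwait directive before combining their results.

```c
/* Hypothetical recursive helper: each call creates two child tasks. */
static long fib_task(int n)
{
    long a, b;

    if (n < 2)
        return n;

    /* a and b must be shared so the child tasks write the parent's copies. */
    #pragma omp task shared(a)
    a = fib_task(n - 1);

    #pragma omp task shared(b)
    b = fib_task(n - 2);

    /* Suspend the current task until both child tasks generated above have
       completed; only then are a and b safe to read. */
    #pragma omp taskwait
    return a + b;
}

/* Hypothetical entry point: one thread starts the recursion and the other
   threads of the team may execute the generated tasks. */
long fib_parallel(int n)
{
    long result = 0;

    #pragma omp parallel
    {
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

Note that taskwait waits only for child tasks of the current task, so each recursion level needs its own taskwait. The sketch favours clarity over performance; a practical version would stop creating tasks below some problem size.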
2.8.5 atomic Construct
Summary
The atomic construct ensures that a specific storage location is accessed atomically,
rather than exposing it to the possibility of multiple, simultaneous reading and writing
threads that may result in indeterminate values.
Syntax
The syntax of the atomic construct takes either of the following forms:
C/C++
#pragma omp atomic [read | write | update | capture ] new-line
expression-stmt
or:
#pragma omp atomic capture new-line
structured-block
where expression-stmt is an expression statement with one of the following forms:
• If clause is read:
v = x;
• If clause is write:
x = expr;
• If clause is update or not present:
x++;
x--;
++x;
--x;
x binop= expr;
x = x binop expr;
• If clause is capture:
v = x++;
v = x--;
v = ++x;
v = --x;
v = x binop= expr;
and where structured-block is a structured block with one of the following forms:
{v = x; x binop= expr;}
{x binop= expr; v = x;}
{v = x; x = x binop expr;}
{x = x binop expr; v = x;}
{v = x; x++;}
{v = x; ++x;}
{++x; v = x;}
{x++; v = x;}
{v = x; x--;}
{v = x; --x;}
{--x; v = x;}
{x--; v = x;}
In the preceding expressions:
• x and v (as applicable) are both l-value expressions with scalar type.
• During the execution of an atomic region, multiple syntactic occurrences of x must
designate the same storage location.
• Neither of v and expr (as applicable) may access the storage location designated by x.
• Neither of x and expr (as applicable) may access the storage location designated by v.
• expr is an expression with scalar type.
• binop is one of +, *, -, /, &, ^, |, <<, or >>.
• binop, binop=, ++, and -- are not overloaded operators.
• For forms that allow multiple occurrences of x, the number of times that x is
evaluated is unspecified.
C/C++
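To make the capture forms listed above more concrete, the following C sketch uses the form v = x++; to hand out unique slot numbers. The function claim_slots and the variable names are hypothetical and serve only as an illustration. Here next plays the role of x and mine plays the role of v.

```c
/* Hypothetical helper: each iteration claims the next free slot of order
   and records its own iteration number there. */
int claim_slots(int n, int *order)
{
    int next = 0;   /* shared ticket counter, plays the role of x */
    int i;

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        int mine;   /* private to the iteration, plays the role of v */

        /* Capture form "v = x++;": the old value is read and the counter
           is incremented in one atomic step, so no two iterations obtain
           the same ticket. */
        #pragma omp atomic capture
        mine = next++;

        order[mine] = i;   /* each slot is written by exactly one iteration */
    }
    return next;
}
```

The same effect could be written with the structured-block form {mine = next; next++;}. The order in which iterations obtain tickets is not specified, so only the fact that order ends up holding each iteration number exactly once can be relied on.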
Fortran
The syntax of the atomic construct takes any of the following forms:
!$omp atomic read
capture-statement
[!$omp end atomic]
or
!$omp atomic write
write-statement
[!$omp end atomic]
or
!$omp atomic [update]
update-statement
[!$omp end atomic]
or
!$omp atomic capture
update-statement
capture-statement
!$omp end atomic
or
!$omp atomic capture
capture-statement
update-statement
!$omp end atomic
where write-statement has the following form (if clause is write):
x = expr
where capture-statement has the following form (if clause is capture or read):
v = x
and where update-statement has one of the following forms (if clause is update,
capture, or not present):
x = x operator expr
x = expr operator x
x = intrinsic_procedure_name (x, expr_list)
x = intrinsic_procedure_name (expr_list, x)
In the preceding statements:
• x and v (as applicable) are both scalar variables of intrinsic type.
• During the execution of an atomic region, multiple syntactic occurrences of x must
designate the same storage location.
• None of v, expr and expr_list (as applicable) may access the same storage location as
x.
• None of x, expr and expr_list (as applicable) may access the same storage location as
v.
• expr is a scalar expression.
• expr_list is a comma-separated, non-empty list of scalar expressions. If
intrinsic_procedure_name refers to IAND, IOR, or IEOR, exactly one expression
must appear in expr_list.
• intrinsic_procedure_name is one of MAX, MIN, IAND, IOR, or IEOR.
• operator is one of +, *, -, /, .AND., .OR., .EQV., or .NEQV.
• The operators in expr must have precedence equal to or greater than the precedence
of operator, x operator expr must be mathematically equivalent to x operator (expr),
and expr operator x must be mathematically equivalent to (expr) operator x.
• intrinsic_procedure_name must refer to the intrinsic procedure name and not to other
program entities.
• operator must refer to the intrinsic operator and not to a user-defined operator.
• All assignments must be intrinsic assignments.
• For forms that allow multiple occurrences of x, the number of times that x is
evaluated is unspecified.
Fortran
Binding
The binding thread set for an atomic region is all threads. atomic regions enforce
exclusive access with respect to other atomic regions that access the same storage
location x among all the threads in the program without regard to the teams to which the
threads belong.
Description
The atomic construct with the read clause forces an atomic read of the location
designated by x regardless of the native machine word size.
The atomic construct with the write clause forces an atomic write of the location
designated by x regardless of the native machine word size.
The atomic construct with the update clause forces an atomic update of the location
designated by x using the designated operator or intrinsic. Note that when no clause is
present, the semantics are equivalent to atomic update. Only the read and write of the
location designated by x are performed mutually atomically. The evaluation of expr or
expr_list need not be atomic with respect to the read or write of the location designated
by x. No task scheduling points are allowed between the read and the write of the
location designated by x.
The atomic construct with the capture clause forces an atomic update of the
location designated by x using the designated operator or intrinsic while also capturing
the original or final value of the location designated by x with respect to the atomic
update. The original or final value of the location designated by x is written in the
location designated by v depending on the form of the atomic construct structured
block or statements following the usual language semantics. Only the read and write of
the location designated by x are performed mutually atomically. Neither the evaluation
of expr or expr_list, nor the write to the location designated by v need be atomic with
respect to the read or write of the location designated by x. No task scheduling points
are allowed between the read and the write of the location designated by x.
For all forms of the atomic construct, any combination of two or more of these
atomic constructs enforces mutually exclusive access to the locations designated by x.
To avoid race conditions, all accesses of the locations designated by x that could
potentially occur in parallel must be protected with an atomic construct.
atomic regions do not guarantee exclusive access with respect to any accesses outside
of atomic regions to the same storage location x even if those accesses occur during a
critical or ordered region, while an OpenMP lock is owned by the executing
task, or during the execution of a reduction clause.
However, other OpenMP synchronization can ensure the desired exclusive access. For
example, a barrier following a series of atomic updates to x guarantees that subsequent
accesses do not form a race with the atomic accesses.
A compliant implementation may enforce exclusive access between atomic regions
that update different storage locations. The circumstances under which this occurs are
implementation defined.
For an example of the atomic construct, see Section A.22 on page 224.
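As a small sketch of the update semantics described above (separate from the example in Section A.22), the following C function builds a four-bin histogram. The name histogram4 and the choice of four bins are assumptions made for illustration. The index expression is evaluated outside the atomic operation; only the read and write of the selected array element are performed atomically.

```c
/* Hypothetical helper: counts how many inputs fall into each of 4 bins,
   selected by the two low-order bits of each input byte. */
void histogram4(int n, const unsigned char *data, int bins[4])
{
    int i;

    for (i = 0; i < 4; i++)
        bins[i] = 0;

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        /* x is the array element bins[data[i] & 3]. The index is computed
           first; only the increment of that element is atomic. */
        #pragma omp atomic update
        bins[data[i] & 3]++;
    }
    /* The implicit barrier at the end of the parallel region orders the
       atomic updates before any later non-atomic read of bins. */
}
```

Updates to different bins designate different storage locations, so they do not need to exclude one another, although an implementation is permitted to serialize them.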
Restrictions
C/C++
The following restriction applies to the atomic construct:
• All atomic accesses to the storage locations designated by x throughout the program
are required to have a compatible type. See Section A.23 on page 230 for examples.
C/C++
Fortran
The following restriction applies to the atomic construct:
• All atomic accesses to the storage locations designated by x throughout the program
are required to have the same type and type parameters. See Section A.23 on page
230 for examples.
Fortran
Cross References
• critical construct, see Section 2.8.2 on page 68.
• barrier construct, see Section 2.8.3 on page 70.
• flush construct, see Section 2.8.6 on page 78.
• ordered construct, see Section 2.8.7 on page 82.
• reduction clause, see Section 2.9.3.6 on page 103.
• lock routines, see Section 3.3 on page 141.
2.8.6 flush Construct
Summary
The flush construct executes the OpenMP flush operation. This operation makes a
thread’s temporary view of memory consistent with memory, and enforces an order on
the memory operations of the variables explicitly specified or implied. See the memory
model description in Section 1.4 on page 13 for more details.
Syntax
The syntax of the flush construct is as follows:
C/C++
#pragma omp flush [(list)] new-line
Because the flush construct is a stand-alone directive, there are some restrictions on
its placement within a program. The flush directive may be placed only at a point
where a base language statement is allowed. The flush directive may not be used in
place of the statement following an if, while, do, switch, or label. See Appendix
C for the formal grammar. See Section A.25 on page 236 for an example that illustrates
these placement restrictions.
C/C++
Fortran
The syntax of the flush construct is as follows:
!$omp flush [(list)]
Because the flush construct is a stand-alone directive, there are some restrictions on
its placement within a program. The flush directive may be placed only at a point
where a Fortran executable statement is allowed. The flush directive may not be used
as the action statement in an if statement or as the executable statement following a
label if the label is referenced in the program. The examples in Section A.25 on page
236 illustrate these restrictions.
Fortran
Binding
The binding thread set for a flush region is the encountering thread. Execution of a
flush region affects the memory and the temporary view of memory of only the thread
that executes the region. It does not affect the temporary view of other threads. Other
threads must themselves execute a flush operation in order to be guaranteed to observe
the effects of the encountering thread’s flush operation.
Description
A flush construct without a list, executed on a given thread, operates as if the whole
thread-visible data state of the program, as defined by the base language, is flushed. A
flush construct with a list applies the flush operation to the items in the list, and does
not return until the operation is complete for all specified list items. Use of a flush
construct with a list is extremely error prone and users are strongly discouraged from
attempting it. An implementation may implement a flush with a list by ignoring the
list, and treating it the same as a flush without a list.
C/C++
If a pointer is present in the list, the pointer itself is flushed, not the memory block to
which the pointer refers.
C/C++
Fortran
If the list item or a subobject of the list item has the POINTER attribute, the allocation
or association status of the POINTER item is flushed, but the pointer target is not. If the
list item is a Cray pointer, the pointer is flushed, but the object to which it points is not.
If the list item has the ALLOCATABLE attribute and the list item is allocated, the
allocated array is flushed; otherwise the allocation status is flushed.
Fortran
For examples of the flush construct, see Section A.25 on page 236.
Note – the following examples illustrate the ordering properties of the flush operation.
In the following incorrect pseudocode example, the programmer intends to prevent
simultaneous execution of the protected section by the two threads, but the program
does not work properly because it does not enforce the proper ordering of the operations
on variables a and b. Any shared data accessed in the protected section is not
guaranteed to be current or consistent during or after the protected section. The atomic
notation in the pseudocode in the following two examples indicates that the accesses to
a and b are ATOMIC writes and captures. Otherwise both examples would contain data
races and automatically result in unspecified behavior.
Incorrect example:
a = b = 0
thread 1
    atomic(b = 1)
    flush(b)
    flush(a)
    atomic(tmp = a)
    if (tmp == 0) then
        protected section
    end if
thread 2
    atomic(a = 1)
    flush(a)
    flush(b)
    atomic(tmp = b)
    if (tmp == 0) then
        protected section
    end if
The problem with this example is that operations on variables a and b are not ordered
with respect to each other. For instance, nothing prevents the compiler from moving the
flush of b on thread 1 or the flush of a on thread 2 to a position completely after the
protected section (assuming that the protected section on thread 1 does not reference b and
the protected section on thread 2 does not reference a). If either re-ordering happens, both
threads can simultaneously execute the protected section.
The following pseudocode example correctly ensures that the protected section is executed
by not more than one of the two threads at any one time. Notice that execution of the
protected section by neither thread is considered correct in this example. This occurs if
both flushes complete prior to either thread executing its if statement.
Correct example:
a = b = 0
thread 1
    atomic(b = 1)
    flush(a,b)
    atomic(tmp = a)
    if (tmp == 0) then
        protected section
    end if
thread 2
    atomic(a = 1)
    flush(a,b)
    atomic(tmp = b)
    if (tmp == 0) then
        protected section
    end if
The compiler is prohibited from moving the flush at all for either thread, ensuring that the
respective assignment is complete and the data is flushed before the if statement is
executed.
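The correct pseudocode example above could be rendered in C roughly as follows. This is only a sketch under stated assumptions: the function name run_handshake, the use of a parallel sections construct to obtain the two threads, and the counter entered (which stands in for the protected section) are all hypothetical. The atomic write and atomic read directives correspond to the atomic notation of the pseudocode, and a single flush naming both variables keeps the write and the read ordered.

```c
/* Hypothetical driver: returns how many of the two sections entered the
   protected section. By the reasoning above this is 0 or 1, never 2. */
int run_handshake(void)
{
    int a = 0, b = 0;   /* shared flags, as in the pseudocode */
    int entered = 0;    /* stands in for the protected section */

    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {
            int tmp;
            #pragma omp atomic write
            b = 1;
            #pragma omp flush(a, b)
            #pragma omp atomic read
            tmp = a;
            if (tmp == 0) {
                /* protected section */
                #pragma omp atomic update
                entered++;
            }
        }
        #pragma omp section
        {
            int tmp;
            #pragma omp atomic write
            a = 1;
            #pragma omp flush(a, b)
            #pragma omp atomic read
            tmp = b;
            if (tmp == 0) {
                /* protected section */
                #pragma omp atomic update
                entered++;
            }
        }
    }
    return entered;
}
```

Each flush directive sits on its own line inside a compound statement, in keeping with the placement restrictions for stand-alone directives. A result of 0 is as valid as a result of 1, so a caller should not depend on the protected section having run.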
A flush region without a list is implied at the following locations:
• During a barrier region.
• At entry to and exit from parallel, critical, and ordered regions.
• At exit from worksharing regions unless a nowait is present.
• At entry to and exit from combined parallel worksharing regions.
• During omp_set_lock and omp_unset_lock regions.
• During omp_test_lock, omp_set_nest_lock, omp_unset_nest_lock
and omp_test_nest_lock regions, if the region causes the lock to be set or
unset.
• Immediately before and immediately after every task scheduling point.
A flush region with a list is implied at the following locations:
• At entry to and exit from the atomic operation (read, write, update, or capture)
performed in an atomic region, where the list contains only the storage location
designated as x according to the description of the syntax of the atomic construct
in Section 2.8.5 on page 73.
Note – A flush region is not implied at the following locations:
• At entry to worksharing regions.
• At entry to or exit from a master region.
2.8.7 ordered Construct
Summary
The ordered construct specifies a structured block in a loop region that will be
executed in the order of the loop iterations. This sequentializes and orders the code
within an ordered region while allowing code outside the region to run in parallel.
Syntax
The syntax of the ordered construct is as follows:
C/C++
#pragma omp ordered new-line
structured-block
C/C++
Fortran
The syntax of the ordered construct is as follows:
!$omp ordered
structured-block
!$omp end ordered
Fortran
Binding
The binding thread set for an ordered region is the current team. An ordered region
binds to the innermost enclosing loop region. ordered regions that bind to different
loop regions execute independently of each other.
Description
The threads in the team executing the loop region execute ordered regions
sequentially in the order of the loop iterations. When the thread executing the first
iteration of the loop encounters an ordered construct, it can enter the ordered
region without waiting. When a thread executing any subsequent iteration encounters an
ordered region, it waits at the beginning of that ordered region until execution of
all the ordered regions belonging to all previous iterations has completed.
For examples of the ordered construct, see Section A.26 on page 239.
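The following C sketch, which is separate from the examples in Section A.26, illustrates the behaviour described above. The function ordered_squares is hypothetical. The squaring may run in parallel and in any order, while the ordered region appends results strictly in iteration order. Note the ordered clause on the loop construct, which the ordered region requires.

```c
/* Hypothetical helper: stores i*i for i = 0..n-1 into out, in order. */
int ordered_squares(int n, int *out)
{
    int pos = 0;   /* shared write position, touched only inside ordered */
    int i;

    #pragma omp parallel for ordered schedule(dynamic)
    for (i = 0; i < n; i++) {
        int sq = i * i;   /* outside the ordered region: may run in parallel */

        /* Executed in the order of the loop iterations; an iteration waits
           here until the ordered regions of all previous ones are done. */
        #pragma omp ordered
        {
            out[pos] = sq;
            pos++;
        }
    }
    return pos;
}
```

Each iteration executes exactly one ordered region, which satisfies the restriction that a thread must not execute more than one ordered region per iteration for the same loop region.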
Restrictions
Restrictions to the ordered construct are as follows:
• The loop region to which an ordered region binds must have an ordered clause
specified on the corresponding loop (or parallel loop) construct.
• During execution of an iteration of a loop or a loop nest within a loop region, a
thread must not execute more than one ordered region that binds to the same loop
region.
C/C++
• A throw executed inside an ordered region must cause execution to resume within
the same ordered region, and the same thread that threw the exception must catch
it.
C/C++
Cross References
• loop construct, see Section 2.5.1 on page 39.
• parallel loop construct, see Section 2.6.1 on page 56.
2.9 Data Environment
This section presents a directive and several clauses for controlling the data environment
during the execution of parallel, task, and worksharing regions.
• Section 2.9.1 on page 84 describes how the data-sharing attributes of variables
referenced in parallel, task, and worksharing regions are determined.
• The threadprivate directive, which is provided to create threadprivate memory,
is described in Section 2.9.2 on page 88.
• Clauses that may be specified on directives to control the data-sharing attributes of
variables referenced in parallel, task, or worksharing constructs are described
in Section 2.9.3 on page 92.
• Clauses that may be specified on directives to copy data values from private or
threadprivate variables on one thread to the corresponding variables on other threads
in the team are described in Section 2.9.4 on page 107.
2.9.1 Data-sharing Attribute Rules
This section describes how the data-sharing attributes of variables referenced in
parallel, task, and worksharing regions are determined. The following two cases
are described separately:
• Section 2.9.1.1 on page 84 describes the data-sharing attribute rules for variables
referenced in a construct.
• Section 2.9.1.2 on page 87 describes the data-sharing attribute rules for variables
referenced in a region, but outside any construct.
2.9.1.1 Data-sharing Attribute Rules for Variables Referenced in a Construct
The data-sharing attributes of variables that are referenced in a construct can be
predetermined, explicitly determined, or implicitly determined, according to the rules
outlined in this section.
Specifying a variable on a firstprivate, lastprivate, or reduction clause
of an enclosed construct causes an implicit reference to the variable in the enclosing
construct. Such implicit references are also subject to the data-sharing attribute rules
outlined in this section.
Certain variables and objects have predetermined data-sharing attributes as follows:
C/C++
• Variables appearing in threadprivate directives are threadprivate.
• Variables with automatic storage duration that are declared in a scope inside the
construct are private.
• Objects with dynamic storage duration are shared.
• Static data members are shared.
• The loop iteration variable(s) in the associated for-loop(s) of a for or parallel
for construct is (are) private.
• Variables with const-qualified type having no mutable member are shared.
• Variables with static storage duration that are declared in a scope inside the construct
are shared.
C/C++
Fortran
• Variables and common blocks appearing in threadprivate directives are
threadprivate.
• The loop iteration variable(s) in the associated do-loop(s) of a do or parallel do
construct is (are) private.
• A loop iteration variable for a sequential loop in a parallel or task construct is
private in the innermost such construct that encloses the loop.
• Implied-do indices and forall indices are private.
• Cray pointees inherit the data-sharing attribute of the storage with which their Cray
pointers are associated.
• Assumed-size arrays are shared.
Fortran
Variables with predetermined data-sharing attributes may not be listed in data-sharing
attribute clauses, except for the cases listed below. For these exceptions only, listing a
predetermined variable in a data-sharing attribute clause is allowed and overrides the
variable’s predetermined data-sharing attributes.
C/C++
• The loop iteration variable(s) in the associated for-loop(s) of a for or parallel
for construct may be listed in a private or lastprivate clause.
• Variables with const-qualified type having no mutable member may be listed in a
firstprivate clause.
C/C++
Fortran
• The loop iteration variable(s) in the associated do-loop(s) of a do or parallel do
construct may be listed in a private or lastprivate clause.
• Variables used as loop iteration variables in sequential loops in a parallel or
task construct may be listed in data-sharing clauses on the construct itself, and on
enclosed constructs, subject to other restrictions.
• Assumed-size arrays may be listed in a shared clause.
Fortran
Additional restrictions on the variables that may appear in individual clauses are
described with each clause in Section 2.9.3 on page 92.
Variables with explicitly determined data-sharing attributes are those that are referenced
in a given construct and are listed in a data-sharing attribute clause on the construct.
Variables with implicitly determined data-sharing attributes are those that are referenced
in a given construct, do not have predetermined data-sharing attributes, and are not
listed in a data-sharing attribute clause on the construct.
Rules for variables with implicitly determined data-sharing attributes are as follows:
• In a parallel or task construct, the data-sharing attributes of these variables are
determined by the default clause, if present (see Section 2.9.3.1 on page 93).
• In a parallel construct, if no default clause is present, these variables are
shared.
• For constructs other than task, if no default clause is present, these variables
inherit their data-sharing attributes from the enclosing context.
• In a task construct, if no default clause is present, a variable that in the
enclosing context is determined to be shared by all implicit tasks bound to the current
team is shared.
Fortran
• In an orphaned task construct, if no default clause is present, dummy arguments
are firstprivate.
Fortran
• In a task construct, if no default clause is present, a variable whose data-sharing
attribute is not determined by the rules above is firstprivate.
Additional restrictions on the variables for which data-sharing attributes cannot be
implicitly determined in a task construct are described in Section 2.9.3.4 on page 98.
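The rules above can be illustrated with two small C functions. This is a sketch for orientation only; the names sum_below and task_snapshot are hypothetical, and the comments state which rule is assumed to apply to each variable when no default clause is present.

```c
/* Hypothetical helper: returns 0 + 1 + ... + (n - 1). */
int sum_below(int n)
{
    int total = 0;   /* declared outside: implicitly shared in parallel */
    int i;           /* loop iteration variable: private in the for construct */

    #pragma omp parallel
    {
        int local = 0;   /* automatic, declared inside the construct: private */

        #pragma omp for
        for (i = 0; i < n; i++)
            local += i;

        #pragma omp atomic update
        total += local;   /* shared, so the update is synchronized */
    }
    return total;
}

/* Hypothetical helper: shows an implicitly firstprivate variable in a task. */
int task_snapshot(void)
{
    int result = 0;   /* shared in the parallel region, hence shared in the task */

    #pragma omp parallel
    {
        #pragma omp single
        {
            int x = 1;   /* private to the thread executing the single region */

            /* x is not shared in the enclosing context, so it is implicitly
               firstprivate: the task receives a copy holding the value 1. */
            #pragma omp task
            result = x + 41;

            x = 100;   /* changes the original only, not the task's copy */
            #pragma omp taskwait
        }
    }
    return result;
}
```

In task_snapshot the later assignment to x does not change the value seen by the task, because the firstprivate copy is initialized when the task is generated.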
2.9.1.2 Data-sharing Attribute Rules for Variables Referenced in a Region but not in a Construct
The data-sharing attributes of variables that are referenced in a region, but not in a
construct, are determined as follows:
C/C++
• Variables with static storage duration that are declared in called routines in the region
are shared.
• Variables with const-qualified type having no mutable member, and that are
declared in called routines, are shared.
• File-scope or namespace-scope variables referenced in called routines in the region
are shared unless they appear in a threadprivate directive.
• Objects with dynamic storage duration are shared.
• Static data members are shared unless they appear in a threadprivate directive.
• Formal arguments of called routines in the region that are passed by reference inherit
the data-sharing attributes of the associated actual argument.
• Other variables declared in called routines in the region are private.
C/C++
Fortran
• Local variables declared in called routines in the region and that have the save
attribute, or that are data initialized, are shared unless they appear in a
threadprivate directive.
• Variables belonging to common blocks, or declared in modules, and referenced in
called routines in the region are shared unless they appear in a threadprivate
directive.
• Dummy arguments of called routines in the region that are passed by reference inherit
the data-sharing attributes of the associated actual argument.
• Cray pointees inherit the data-sharing attribute of the storage with which their Cray
pointers are associated.
• Implied-do indices, forall indices, and other local variables declared in called
routines in the region are private.
Fortran
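For the C/C++ rules above, a minimal sketch may help. The names calls, scaled, and fill_scaled are hypothetical. The routine scaled is called from inside a parallel region but contains no construct that lists its variables, so the attributes of its variables follow the rules of this section.

```c
static int calls;   /* file-scope variable: shared when referenced in a called routine */

/* Hypothetical routine called from inside a parallel region. */
static int scaled(int i)
{
    int tmp = 3 * i;   /* other variable declared in a called routine: private */

    #pragma omp atomic update
    calls++;           /* shared, so concurrent updates need synchronization */

    return tmp;
}

/* Hypothetical driver: fills out[i] with 3*i and returns the call count. */
int fill_scaled(int n, int *out)
{
    int i;

    calls = 0;
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        out[i] = scaled(i);
    return calls;
}
```

If calls were listed in a threadprivate directive, each thread would update its own copy instead, and the atomic construct would no longer be needed for that purpose.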
2.9.2 threadprivate Directive
Summary
The threadprivate directive specifies that variables are replicated, with each thread
having its own copy.
Syntax
The syntax of the threadprivate directive is as follows:
C/C++
#pragma omp threadprivate(list) new-line
where list is a comma-separated list of file-scope, namespace-scope, or static
block-scope variables that do not have incomplete types.
C/C++
Fortran
The syntax of the threadprivate directive is as follows:
!$omp threadprivate(list)
where list is a comma-separated list of named variables and named common blocks.
Common block names must appear between slashes.
Fortran
Description
Each copy of a threadprivate variable is initialized once, in the manner specified by the
program, but at an unspecified point in the program prior to the first reference to that
copy. The storage of all copies of a threadprivate variable is freed according to how
static variables are handled in the base language, but at an unspecified point in the
program.
A program in which a thread references another thread’s copy of a threadprivate variable
is non-conforming.
The content of a threadprivate variable can change across a task scheduling point if the
executing thread switches to another task that modifies the variable. For more details on
task scheduling, see Section 1.3 on page 12 and Section 2.7 on page 61.
In parallel regions, references by the master thread will be to the copy of the
variable in the thread that encountered the parallel region.
During the sequential part, references will be to the initial thread’s copy of the variable.
The values of data in the initial thread’s copy of a threadprivate variable are guaranteed
to persist between any two consecutive references to the variable in the program.
The values of data in the threadprivate variables of non-initial threads are guaranteed to
persist between two consecutive active parallel regions only if all the following
conditions hold:
• Neither parallel region is nested inside another explicit parallel region.
• The number of threads used to execute both parallel regions is the same.
• The value of the dyn-var internal control variable in the enclosing task region is false
at entry to both parallel regions.
If these conditions all hold, and if a threadprivate variable is referenced in both regions,
then threads with the same thread number in their respective regions will reference the
same copy of that variable.
C/C++
If the above conditions hold, the storage duration, lifetime, and value of a thread’s copy
of a threadprivate variable that does not appear in any copyin clause on the second
region will be retained. Otherwise, the storage duration, lifetime, and value of a thread’s
copy of the variable in the second region is unspecified.
If the value of a variable referenced in an explicit initializer of a threadprivate variable
is modified prior to the first reference to any instance of the threadprivate variable, then
the behavior is unspecified.
The order in which any constructors for different threadprivate variables of class type
are called is unspecified. The order in which any destructors for different threadprivate
variables of class type are called is unspecified.
C/C++
Fortran
A variable is affected by a copyin clause if the variable appears in the copyin clause
or it is in a common block that appears in the copyin clause.
If the above conditions hold, the definition, association, or allocation status of a thread’s
copy of a threadprivate variable or a variable in a threadprivate common
block, that is not affected by any copyin clause that appears on the second region, will
be retained. Otherwise, the definition and association status of a thread’s copy of the
variable in the second region is undefined, and the allocation status of an allocatable
array will be implementation defined.
If a threadprivate variable or a variable in a threadprivate common block is
not affected by any copyin clause that appears on the first parallel region in which
it is referenced, the variable or any subobject of the variable is initially defined or
undefined according to the following rules:
• If it has the ALLOCATABLE attribute, each copy created will have an initial
allocation status of not currently allocated.
• If it has the POINTER attribute:
    • if it has an initial association status of disassociated, either through explicit
    initialization or default initialization, each copy created will have an association
    status of disassociated;
    • otherwise, each copy created will have an association status of undefined.
• If it does not have either the POINTER or the ALLOCATABLE attribute:
    • if it is initially defined, either through explicit initialization or default
    initialization, each copy created is so defined;
    • otherwise, each copy created is undefined.
Fortran
For examples of the threadprivate directive, see Section A.27 on page 244.
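As a brief sketch that is separate from the examples in Section A.27, the following C code gives each thread its own copy of a file-scope counter. The names hits and count_with_private_copies are hypothetical. The directive follows the declaration of the variable and lexically precedes every reference to it.

```c
static int hits = 0;   /* hypothetical counter, replicated per thread below */
#pragma omp threadprivate(hits)

/* Hypothetical helper: counts n loop iterations using per-thread copies
   of hits, then combines the copies into one shared total. */
int count_with_private_copies(int n)
{
    int total = 0;   /* shared accumulator */
    int i;

    #pragma omp parallel
    {
        hits = 0;    /* each thread resets its own copy */

        #pragma omp for
        for (i = 0; i < n; i++)
            hits++;  /* no synchronization needed: the copy is per thread */

        #pragma omp atomic update
        total += hits;   /* combining into shared storage is synchronized */
    }
    return total;
}
```

Each thread resets its copy at the start of the region, so the result does not depend on whether values persist from an earlier parallel region.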
Restrictions
The restrictions to the threadprivate directive are as follows:
• A threadprivate variable must not appear in any clause except the copyin,
copyprivate, schedule, num_threads, and if clauses.
• A program in which an untied task accesses threadprivate storage is non-conforming.
C/C++
• A variable that is part of another variable (as an array or structure element) cannot
appear in a threadprivate clause unless it is a static data member of a C++
class.
• A threadprivate directive for file-scope variables must appear outside any
definition or declaration, and must lexically precede all references to any of the
variables in its list.
• A threadprivate directive for static class member variables must appear in the
class definition, in the same scope in which the member variables are declared, and
must lexically precede all references to any of the variables in its list.
• A threadprivate directive for namespace-scope variables must appear outside
any definition or declaration other than the namespace definition itself, and must
lexically precede all references to any of the variables in its list.
• Each variable in the list of a threadprivate directive at file, namespace, or class
scope must refer to a variable declaration at file, namespace, or class scope that
lexically precedes the directive.
• A threadprivate directive for static block-scope variables must appear in the
scope of the variable and not in a nested scope. The directive must lexically precede
all references to any of the variables in its list.
10
11
12
• Each variable in the list of a threadprivate directive in block scope must refer to
13
14
15
• If a variable is specified in a threadprivate directive in one translation unit, it
16
• The address of a threadprivate variable is not an address constant.
17
• A threadprivate variable must not have an incomplete type or a reference type.
18
• A threadprivate variable with class type must have:
a variable declaration in the same scope that lexically precedes the directive. The
variable declaration must use the static storage-class specifier.
must be specified in a threadprivate directive in every translation unit in which
it is declared.
19
20
• an accessible, unambiguous default constructor in case of default initialization
without a given initializer;
21
22
• an accessible, unambiguous constructor accepting the given argument in case of
direct initialization;
23
24
• an accessible, unambiguous copy constructor in case of copy initialization with an
explicit initializer.
C/C++
Fortran
• A variable that is part of another variable (as an array or structure element) cannot
  appear in a threadprivate clause.
• The threadprivate directive must appear in the declaration section of a scoping
  unit in which the common block or variable is declared. Although variables in
  common blocks can be accessed by use association or host association, common
  block names cannot. This means that a common block name specified in a
  threadprivate directive must be declared to be a common block in the same
  scoping unit in which the threadprivate directive appears.
• If a threadprivate directive specifying a common block name appears in one
  program unit, then such a directive must also appear in every other program unit that
  contains a COMMON statement specifying the same name. It must appear after the last
  such COMMON statement in the program unit.
• A blank common block cannot appear in a threadprivate directive.
• A variable can only appear in a threadprivate directive in the scope in which it
  is declared. It must not be an element of a common block or appear in an
  EQUIVALENCE statement.
• A variable that appears in a threadprivate directive must be declared in the
  scope of a module or have the SAVE attribute, either explicitly or implicitly.
Fortran

Cross References:
• dyn-var ICV, see Section 2.3 on page 28.
• number of threads used to execute a parallel region, see Section 2.4.1 on page 36.
• copyin clause, see Section 2.9.4.1 on page 107.

2.9.3 Data-Sharing Attribute Clauses

Several constructs accept clauses that allow a user to control the data-sharing attributes
of variables referenced in the construct. Data-sharing attribute clauses apply only to
variables for which the names are visible in the construct on which the clause appears.

Not all of the clauses listed in this section are valid on all directives. The set of clauses
that is valid on a particular directive is described with the directive.

Most of the clauses accept a comma-separated list of list items (see Section 2.1 on page
22). All list items appearing in a clause must be visible, according to the scoping rules
of the base language. With the exception of the default clause, clauses may be
repeated as needed. A list item that specifies a given variable may not appear in more
than one clause on the same directive, except that a variable may be specified in both
firstprivate and lastprivate clauses.

C/C++
If a variable referenced in a data-sharing attribute clause has a type derived from a
template, and there are no other references to that variable in the program, then any
behavior related to that variable is unspecified.
C/C++

Fortran
A named common block may be specified in a list by enclosing the name in slashes.
When a named common block appears in a list, it has the same meaning as if every
explicit member of the common block appeared in the list. An explicit member of a
common block is a variable that is named in a COMMON statement that specifies the
common block name and is declared in the same scoping unit in which the clause
appears.

Although variables in common blocks can be accessed by use association or host
association, common block names cannot. As a result, a common block name specified
in a data-sharing attribute clause must be declared to be a common block in the same
scoping unit in which the data-sharing attribute clause appears.
When a named common block appears in a private, firstprivate,
lastprivate, or shared clause of a directive, none of its members may be declared
in another data-sharing attribute clause in that directive (see Section A.29 on page 251
for examples). When individual members of a common block appear in a private,
firstprivate, lastprivate, or reduction clause of a directive, the storage of
the specified variables is no longer associated with the storage of the common block
itself (see Section A.33 on page 260 for examples).
Fortran

2.9.3.1 default clause

Summary

The default clause explicitly determines the data-sharing attributes of variables that
are referenced in a parallel or task construct and would otherwise be implicitly
determined (see Section 2.9.1.1 on page 84).

Syntax

C/C++
The syntax of the default clause is as follows:
default(shared | none)
C/C++

Fortran
The syntax of the default clause is as follows:
default(private | firstprivate | shared | none)
Fortran

Description

The default(shared) clause causes all variables referenced in the construct that
have implicitly determined data-sharing attributes to be shared.

Fortran
The default(firstprivate) clause causes all variables in the construct that have
implicitly determined data-sharing attributes to be firstprivate.

The default(private) clause causes all variables referenced in the construct that
have implicitly determined data-sharing attributes to be private.
Fortran

The default(none) clause requires that each variable that is referenced in the
construct, and that does not have a predetermined data-sharing attribute, must have its
data-sharing attribute explicitly determined by being listed in a data-sharing attribute
clause. See Section A.30 on page 253 for examples.
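
As a non-normative illustration, the following C sketch uses default(none) so that every
variable referenced in the construct must be listed in a data-sharing attribute clause.
The function name sum_array is hypothetical. The loop iteration variable already has a
predetermined attribute, so listing it in the private clause is optional and is done here
only for explicitness.

```c
#include <assert.h>

/* Sums n doubles. With default(none), omitting a, n or total from the
   clauses below would make the program non-conforming. */
double sum_array(const double *a, int n)
{
    double total = 0.0;
    int i;

    assert(n >= 0);
    #pragma omp parallel for default(none) shared(a, n) private(i) reduction(+:total)
    for (i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}
```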

Restrictions

The restrictions to the default clause are as follows:
• Only a single default clause may be specified on a parallel or task directive.

2.9.3.2 shared clause

Summary

The shared clause declares one or more list items to be shared by tasks generated by
a parallel or task construct.

Syntax

The syntax of the shared clause is as follows:
shared(list)

Description

All references to a list item within a task refer to the storage area of the original variable
at the point the directive was encountered.
It is the programmer's responsibility to ensure, by adding proper synchronization, that
storage shared by an explicit task region does not reach the end of its lifetime before
the explicit task region completes its execution.
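
As a non-normative illustration, the following C sketch shows one way to meet this
responsibility: a taskwait keeps a block-scope variable alive until the explicit task
that shares it has completed. The function name product_in_task and the variable names
are hypothetical.

```c
#include <assert.h>

/* Computes x * y inside an explicit task that shares a block-scope variable. */
int product_in_task(int x, int y)
{
    int answer = 0;

    #pragma omp parallel shared(answer)
    {
        #pragma omp single
        {
            int local = 0;        /* storage ends when this block exits */

            #pragma omp task shared(local)
            {
                local = x * y;
            }
            /* Wait for the child task so that local outlives the task region. */
            #pragma omp taskwait
            assert(local == x * y);
            answer = local;
        }
    }
    return answer;
}
```

Without the taskwait, the block could exit while the task is still pending, and the task
would then write to storage whose lifetime has ended.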
Fortran
The association status of a shared pointer becomes undefined upon entry to and on exit
from the parallel or task construct if it is associated with a target or a subobject of
a target that is in a private, firstprivate, lastprivate, or reduction
clause inside the construct.
Under certain conditions, passing a shared variable to a non-intrinsic procedure may
result in the value of the shared variable being copied into temporary storage before the
procedure reference, and back out of the temporary storage into the actual argument
storage after the procedure reference. It is implementation defined when this situation
occurs. See Section A.31 on page 255 for an example of this behavior.

Note – Use of intervening temporary storage may occur when the following three
conditions hold regarding an actual argument in a reference to a non-intrinsic procedure:
a. The actual argument is one of the following:
   • A shared variable.
   • A subobject of a shared variable.
   • An object associated with a shared variable.
   • An object associated with a subobject of a shared variable.
b. The actual argument is also one of the following:
   • An array section.
   • An array section with a vector subscript.
   • An assumed-shape array.
   • A pointer array.
c. The associated dummy argument for this actual argument is an explicit-shape array
   or an assumed-size array.

These conditions effectively result in references to, and definitions of, the temporary
storage during the procedure reference. Any references to (or definitions of) the shared
storage that is associated with the dummy argument by any other task must be
synchronized with the procedure reference to avoid possible race conditions.
Fortran

2.9.3.3 private clause

Summary

The private clause declares one or more list items to be private to a task.

Syntax

The syntax of the private clause is as follows:
private(list)

Description

Each task that references a list item that appears in a private clause in any statement
in the construct receives a new list item whose language-specific attributes are derived
from the original list item. Inside the construct, all references to the original list item are
replaced by references to the new list item. In the rest of the region, it is unspecified
whether references are to the new list item or the original list item. Therefore, if an
attempt is made to reference the original item, its value after the region is also
unspecified. If a task does not reference a list item that appears in a private clause, it
is unspecified whether that task receives a new list item.

The value and/or allocation status of the original list item will change only:
• if accessed and modified via pointer,
• if possibly accessed in the region but outside of the construct, or
• as a side effect of directives or clauses.

List items that appear in a private, firstprivate, or reduction clause in a
parallel construct may also appear in a private clause in an enclosed parallel,
task, or worksharing construct. List items that appear in a private or
firstprivate clause in a task construct may also appear in a private clause in
an enclosed parallel or task construct. List items that appear in a private,
firstprivate, lastprivate, or reduction clause in a worksharing construct
may also appear in a private clause in an enclosed parallel or task construct.
See Section A.32 on page 256 for an example.

C/C++
A new list item of the same type, with automatic storage duration, is allocated for the
construct. The storage and thus lifetime of these list items lasts until the block in which
they are created exits. The size and alignment of the new list item are determined by the
type of the variable. This allocation occurs once for each task generated by the
construct, if the task references the list item in any statement.

The new list item is initialized, or has an undefined initial value, as if it had been locally
declared without an initializer. The order in which any default constructors for different
private variables of class type are called is unspecified. The order in which any
destructors for different private variables of class type are called is unspecified.
C/C++
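
As a non-normative illustration, the following C sketch gives each task its own scratch
variable through the private clause. The function name square_all and the variable tmp
are hypothetical. Since the new list item has an undefined initial value, the sketch
assigns tmp before every read and does not rely on the value of the original tmp after
the construct.

```c
#include <assert.h>

/* Writes in[i] * in[i] to out[i] for 0 <= i < n. */
void square_all(const int *in, int *out, int n)
{
    int tmp = -1;   /* original list item; private copies do not inherit this value */
    int i;

    assert(n >= 0);
    #pragma omp parallel for private(tmp)
    for (i = 0; i < n; i++) {
        tmp = in[i];          /* assign the private copy before reading it */
        out[i] = tmp * tmp;
    }
}
```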
Fortran
A new list item of the same type is allocated once for each implicit task in the
parallel region, or for each task generated by a task construct, if the construct
references the list item in any statement. The initial value of the new list item is
undefined. Within a parallel, worksharing, or task region, the initial status of a
private pointer is undefined.

For a list item with the ALLOCATABLE attribute:
• if the list item is "not currently allocated", the new list item will have an initial state
  of "not currently allocated";
• if the list item is allocated, the new list item will have an initial state of allocated
  with the same array bounds.

A list item that appears in a private clause may be storage-associated with other
variables when the private clause is encountered. Storage association may exist
because of constructs such as EQUIVALENCE or COMMON. If A is a variable appearing
in a private clause and B is a variable that is storage-associated with A, then:
• The contents, allocation, and association status of B are undefined on entry to the
  parallel or task region.
• Any definition of A, or of its allocation or association status, causes the contents,
  allocation, and association status of B to become undefined.
• Any definition of B, or of its allocation or association status, causes the contents,
  allocation, and association status of A to become undefined.

For examples, see Section A.33 on page 260.
Fortran

For examples of the private clause, see Section A.32 on page 256.

Restrictions

The restrictions to the private clause are as follows:
• A variable that is part of another variable (as an array or structure element) cannot
  appear in a private clause.
C/C++
• A variable of class type (or array thereof) that appears in a private clause requires
  an accessible, unambiguous default constructor for the class type.
• A variable that appears in a private clause must not have a const-qualified type
  unless it is of class type with a mutable member. This restriction does not apply to
  the firstprivate clause.
• A variable that appears in a private clause must not have an incomplete type or a
  reference type.
C/C++
Fortran
• A variable that appears in a private clause must either be definable, or an
  allocatable array. This restriction does not apply to the firstprivate clause.
• Variables that appear in namelist statements, in variable format expressions, and in
  expressions for statement function definitions, may not appear in a private clause.
Fortran

2.9.3.4 firstprivate clause

Summary

The firstprivate clause declares one or more list items to be private to a task, and
initializes each of them with the value that the corresponding original item has when the
construct is encountered.

Syntax

The syntax of the firstprivate clause is as follows:
firstprivate(list)

Description

The firstprivate clause provides a superset of the functionality provided by the
private clause.

A list item that appears in a firstprivate clause is subject to the private clause
semantics described in Section 2.9.3.3 on page 96, except as noted. In addition, the new
list item is initialized from the original list item existing before the construct. The
initialization of the new list item is done once for each task that references the list item
in any statement in the construct. The initialization is done prior to the execution of the
construct.
For a firstprivate clause on a parallel or task construct, the initial value of
the new list item is the value of the original list item that exists immediately prior to the
construct in the task region where the construct is encountered. For a firstprivate
clause on a worksharing construct, the initial value of the new list item for each implicit
task of the threads that execute the worksharing construct is the value of the original list
item that exists in the implicit task immediately prior to the point in time that the
worksharing construct is encountered.
To avoid race conditions, concurrent updates of the original list item must be
synchronized with the read of the original list item that occurs as a result of the
firstprivate clause.
If a list item appears in both firstprivate and lastprivate clauses, the update
required for lastprivate occurs after all the initializations for firstprivate.

C/C++
For variables of non-array type, the initialization occurs by copy assignment. For an
array of elements of non-array type, each element is initialized as if by assignment from
an element of the original array to the corresponding element of the new array. For
variables of class type, a copy constructor is invoked to perform the initialization. The
order in which copy constructors for different variables of class type are called is
unspecified.
C/C++
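
As a non-normative illustration, the following C sketch relies on firstprivate to give
each task a private copy of offset that is already initialized with the value the
original variable had when the construct was encountered. The function name add_offset
is hypothetical.

```c
#include <assert.h>

/* Writes in[i] + offset to out[i] for 0 <= i < n. */
void add_offset(const int *in, int *out, int n, int offset)
{
    int i;

    assert(n >= 0);
    #pragma omp parallel for firstprivate(offset)
    for (i = 0; i < n; i++) {
        out[i] = in[i] + offset;   /* reads the task's own initialized copy */
    }
}
```

With private(offset) in place of firstprivate(offset), the copies would have undefined
initial values and the results would be unspecified.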
Fortran
If the original list item does not have the POINTER attribute, initialization of the new
list items occurs as if by intrinsic assignment, unless the original list item has the
allocation status of not currently allocated, in which case the new list items will have the
same status.
If the original list item has the POINTER attribute, the new list items receive the same
association status of the original list item as if by pointer assignment.
Fortran

Restrictions

The restrictions to the firstprivate clause are as follows:
• A variable that is part of another variable (as an array or structure element) cannot
  appear in a firstprivate clause.
• A list item that is private within a parallel region must not appear in a
  firstprivate clause on a worksharing construct if any of the worksharing
  regions arising from the worksharing construct ever bind to any of the parallel
  regions arising from the parallel construct.
• A list item that appears in a reduction clause of a parallel construct must not
  appear in a firstprivate clause on a worksharing or task construct if any of
  the worksharing or task regions arising from the worksharing or task construct
  ever bind to any of the parallel regions arising from the parallel construct.
• A list item that appears in a reduction clause in a worksharing construct must not
  appear in a firstprivate clause in a task construct encountered during execution
  of any of the worksharing regions arising from the worksharing construct.
C/C++
• A variable of class type (or array thereof) that appears in a firstprivate clause
  requires an accessible, unambiguous copy constructor for the class type.
• A variable that appears in a firstprivate clause must not have an incomplete
  type or a reference type.
C/C++
Fortran
• Variables that appear in namelist statements, in variable format expressions, and in
  expressions for statement function definitions, may not appear in a firstprivate
  clause.
Fortran

2.9.3.5 lastprivate clause

Summary

The lastprivate clause declares one or more list items to be private to an implicit
task, and causes the corresponding original list item to be updated after the end of the
region.

Syntax

The syntax of the lastprivate clause is as follows:
lastprivate(list)

Description

The lastprivate clause provides a superset of the functionality provided by the
private clause.

A list item that appears in a lastprivate clause is subject to the private clause
semantics described in Section 2.9.3.3 on page 96. In addition, when a lastprivate
clause appears on the directive that identifies a worksharing construct, the value of each
new list item from the sequentially last iteration of the associated loops, or the lexically
last section construct, is assigned to the original list item.

C/C++
For an array of elements of non-array type, each element is assigned to the
corresponding element of the original array.
C/C++

Fortran
If the original list item does not have the POINTER attribute, its update occurs as if by
intrinsic assignment.
If the original list item has the POINTER attribute, its update occurs as if by pointer
assignment.
Fortran
List items that are not assigned a value by the sequentially last iteration of the loops, or
by the lexically last section construct, have unspecified values after the construct.
Unassigned subcomponents also have unspecified values after the construct.
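
As a non-normative illustration, the following C sketch uses lastprivate on a combined
parallel loop construct so that the original variable receives the value assigned in the
sequentially last iteration. The function name last_square is hypothetical, and the
sketch assumes n > 0 because a loop with no iterations would leave the value unspecified.

```c
#include <assert.h>

/* Returns (n - 1) * (n - 1), the value assigned in the sequentially last iteration. */
int last_square(int n)
{
    int last = 0;
    int i;

    assert(n > 0);
    #pragma omp parallel for lastprivate(last)
    for (i = 0; i < n; i++) {
        last = i * i;     /* every iteration assigns its private copy */
    }
    return last;          /* updated from the iteration with i == n - 1 */
}
```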
The original list item becomes defined at the end of the construct if there is an implicit
barrier at that point. To avoid race conditions, concurrent reads or updates of the original
list item must be synchronized with the update of the original list item that occurs as a
result of the lastprivate clause.
If the lastprivate clause is used on a construct to which nowait is applied,
accesses to the original list item may create a data race. To avoid this, synchronization
must be inserted to ensure that the sequentially last iteration or lexically last section
construct has stored and flushed that list item.
If a list item appears in both firstprivate and lastprivate clauses, the update
required for lastprivate occurs after all initializations for firstprivate.
For an example of the lastprivate clause, see Section A.35 on page 264.

Restrictions

The restrictions to the lastprivate clause are as follows:
• A variable that is part of another variable (as an array or structure element) cannot
  appear in a lastprivate clause.
• A list item that is private within a parallel region, or that appears in the
  reduction clause of a parallel construct, must not appear in a lastprivate
  clause on a worksharing construct if any of the corresponding worksharing regions
  ever binds to any of the corresponding parallel regions.
C/C++
• A variable of class type (or array thereof) that appears in a lastprivate clause
  requires an accessible, unambiguous default constructor for the class type, unless the
  list item is also specified in a firstprivate clause.
• A variable of class type (or array thereof) that appears in a lastprivate clause
  requires an accessible, unambiguous copy assignment operator for the class type. The
  order in which copy assignment operators for different variables of class type are
  called is unspecified.
• A variable that appears in a lastprivate clause must not have a const-qualified
  type unless it is of class type with a mutable member.
• A variable that appears in a lastprivate clause must not have an incomplete type
  or a reference type.
C/C++
Fortran
• A variable that appears in a lastprivate clause must be definable.
• An original list item with the ALLOCATABLE attribute must be in the allocated state
  at entry to the construct containing the lastprivate clause. The list item in the
  sequentially last iteration or lexically last section must be in the allocated state upon
  exit from that iteration or section with the same bounds as the corresponding original
  list item.
• Variables that appear in namelist statements, in variable format expressions, and in
  expressions for statement function definitions, may not appear in a lastprivate
  clause.
Fortran

2.9.3.6 reduction clause

Summary

The reduction clause specifies an operator and one or more list items. For each list
item, a private copy is created in each implicit task, and is initialized appropriately for
the operator. After the end of the region, the original list item is updated with the values
of the private copies using the specified operator.

Syntax

C/C++
The syntax of the reduction clause is as follows:
reduction(operator:list)

The following table lists the operators that are valid and their initialization values. The
actual initialization value depends on the data type of the reduction list item.

Operator    Initialization value
+           0
*           1
-           0
&           ~0
|           0
^           0
&&          1
||          0
max         Least representable value in the reduction list item type
min         Largest representable value in the reduction list item type
C/C++

Fortran
The syntax of the reduction clause is as follows:
reduction({operator | intrinsic_procedure_name}:list)

The following table lists the operators and intrinsic_procedure_names that are valid and
their initialization values. The actual initialization value depends on the data type of the
reduction list item.

Operator/Intrinsic    Initialization value
+                     0
*                     1
-                     0
.and.                 .true.
.or.                  .false.
.eqv.                 .true.
.neqv.                .false.
max                   Least representable number in the reduction list item type
min                   Largest representable number in the reduction list item type
iand                  All bits on
ior                   0
ieor                  0
Fortran

Description

The reduction clause can be used to perform some forms of recurrence calculations
(involving mathematically associative and commutative operators) in parallel.
A private copy of each list item is created, one for each implicit task, as if the private
clause had been used. The private copy is then initialized to the initialization value for
the operator, as specified above. At the end of the region for which the reduction
clause was specified, the original list item is updated by combining its original value
with the final value of each of the private copies, using the operator specified. (The
partial results of a subtraction reduction are added to form the final value.)
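
As a non-normative illustration, the following C sketch shows two reductions. In the
first, the original value of the list item takes part in the final combination; in the
second, each private copy starts from the ~0 initialization value of the & operator. The
function names sum_with_initial and and_all are hypothetical.

```c
#include <assert.h>

/* Returns initial + a[0] + ... + a[n-1]. */
long sum_with_initial(const int *a, int n, long initial)
{
    long total = initial;   /* combined with the private partial sums at the end */
    int i;

    assert(n >= 0);
    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++) {
        total += a[i];      /* each private copy starts at 0 */
    }
    return total;
}

/* Returns the bitwise AND of a[0] ... a[n-1], or all bits set when n == 0. */
unsigned and_all(const unsigned *a, int n)
{
    unsigned bits = ~0u;
    int i;

    assert(n >= 0);
    #pragma omp parallel for reduction(&:bits)
    for (i = 0; i < n; i++) {
        bits &= a[i];       /* each private copy starts at ~0 */
    }
    return bits;
}
```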

For max and min operators, the final values of the private copies are combined with the
original list item value using the following expressions:
C/C++
max    original_list_item =
           original_list_item < private_copy ? private_copy : original_list_item;
min    original_list_item =
           original_list_item > private_copy ? private_copy : original_list_item;
C/C++
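
As a non-normative illustration, the following C sketch applies a max reduction to an
array of int. The function name max_of is hypothetical, the sketch assumes n >= 1, and it
assumes a compiler that accepts the max operator in C reduction clauses as described in
this version of the specification.

```c
#include <assert.h>

/* Returns the largest element of a[0] ... a[n-1]. */
int max_of(const int *a, int n)
{
    int best;
    int i;

    assert(n >= 1);
    best = a[0];            /* original value takes part in the combination */
    #pragma omp parallel for reduction(max:best)
    for (i = 1; i < n; i++) {
        best = best < a[i] ? a[i] : best;
    }
    return best;
}
```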
If nowait is not used, the reduction computation will be complete at the end of the
construct; however, if the reduction clause is used on a construct to which nowait is
also applied, accesses to the original list item will create a race and, thus, have
unspecified effect unless synchronization ensures that they occur after all threads have
executed all of their iterations or section constructs, and the reduction computation
has completed and stored the computed value of that list item. This can most simply be
ensured through a barrier synchronization.
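
As a non-normative illustration, the following C sketch combines reduction with nowait
on a loop construct and then uses an explicit barrier before the original list item is
read. The function name sum_then_use and the variable seen are hypothetical.

```c
#include <assert.h>

/* Returns a[0] + ... + a[n-1], read inside the parallel region after a barrier. */
int sum_then_use(const int *a, int n)
{
    int total = 0;
    int seen = 0;

    assert(n >= 0);
    #pragma omp parallel shared(total, seen)
    {
        int i;

        #pragma omp for reduction(+:total) nowait
        for (i = 0; i < n; i++) {
            total += a[i];
        }
        /* Without this barrier, reading total below would race with the
           reduction update performed by threads that are still working. */
        #pragma omp barrier

        #pragma omp single
        seen = total;
    }
    return seen;
}
```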
The location in the OpenMP program at which the values are combined and the order in
which the values are combined are unspecified. Therefore, when comparing sequential
and parallel runs, or when comparing one parallel run to another (even if the number of
threads used is the same), there is no guarantee that bit-identical results will be obtained
or that side effects (such as floating point exceptions) will be identical or take place at
the same location in the OpenMP program.
To avoid race conditions, concurrent reads or updates of the original list item must be
synchronized with the update of the original list item that occurs as a result of the
reduction computation.

Restrictions

The restrictions to the reduction clause are as follows:
• A list item that appears in a reduction clause of a worksharing construct must be
  shared in the parallel regions to which any of the worksharing regions arising
  from the worksharing construct bind.
• A list item that appears in a reduction clause of the innermost enclosing
  worksharing or parallel construct may not be accessed in an explicit task.
• Any number of reduction clauses can be specified on the directive, but a list item
  can appear only once in the reduction clauses for that directive.
C/C++
• The type of a list item that appears in a reduction clause must be valid for the
  reduction operator. For a max or min reduction in C, the type of the list item must be
  an allowed arithmetic data type: char, int, float, double, or _Bool, possibly
  modified with long, short, signed, or unsigned. For a max or min reduction
  in C++, the type of the list item must be an allowed arithmetic data type: char,
  wchar_t, int, float, double, or bool, possibly modified with long, short,
  signed, or unsigned.
• Aggregate types (including arrays), pointer types and reference types may not appear
  in a reduction clause.
• A list item that appears in a reduction clause must not be const-qualified.
C/C++
Fortran
• The type of a list item that appears in a reduction clause must be valid for the
  reduction operator or intrinsic.
• A list item that appears in a reduction clause must be definable.
• A list item that appears in a reduction clause must be a named variable of
  intrinsic type.
• An original list item with the ALLOCATABLE attribute must be in the allocated state
  at entry to the construct containing the reduction clause. Additionally, the list item
  must not be deallocated and/or allocated within the region.
• Fortran pointers and Cray pointers may not appear in a reduction clause.
• Operators specified must be intrinsic operators and any intrinsic_procedure_name
  must refer to one of the allowed intrinsic procedures. Assignment to the reduction list
  items must be via intrinsic assignment. See Section A.36 on page 266 for examples.
Fortran

2.9.4 Data Copying Clauses

This section describes the copyin clause (allowed on the parallel directive and
combined parallel worksharing directives) and the copyprivate clause (allowed on
the single directive).
These clauses support the copying of data values from private or threadprivate variables
on one implicit task or thread to the corresponding variables on other implicit tasks or
threads in the team.
The clauses accept a comma-separated list of list items (see Section 2.1 on page 22). All
list items appearing in a clause must be visible, according to the scoping rules of the
base language. Clauses may be repeated as needed, but a list item that specifies a given
variable may not appear in more than one clause on the same directive.

2.9.4.1 copyin clause

Summary

The copyin clause provides a mechanism to copy the value of the master thread’s
threadprivate variable to the threadprivate variable of each other member of the team
executing the parallel region.
17
Syntax
18
The syntax of the copyin clause is as follows:
copyin(list)
19
Description
C/C++
The copy is done after the team is formed and prior to the start of execution of the
associated structured block. For variables of non-array type, the copy occurs by copy
assignment. For an array of elements of non-array type, each element is copied as if by
assignment from an element of the master thread’s array to the corresponding element of
the other thread’s array. For class types, the copy assignment operator is invoked. The
order in which copy assignment operators for different variables of class type are called
is unspecified.
C/C++
Fortran
The copy is done, as if by assignment, after the team is formed and prior to the start of
execution of the associated structured block.
On entry to any parallel region, each thread’s copy of a variable that is affected by
a copyin clause for the parallel region will acquire the allocation, association, and
definition status of the master thread’s copy, according to the following rules:
• If the original list item has the POINTER attribute, each copy receives the same
association status of the master thread’s copy as if by pointer assignment.
• If the original list item does not have the POINTER attribute, each copy becomes
defined with the value of the master thread's copy as if by intrinsic assignment,
unless it has the allocation status of not currently allocated, in which case each copy
will have the same status.
Fortran
12
For an example of the copyin clause, see Section A.37 on page 271.
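The following non-normative C sketch illustrates the copyin behavior described above. The variable seed and the function copyin_reaches_all_threads are hypothetical names chosen for illustration; they do not appear in this specification. The sketch assumes the function is called from the sequential part of the program.

```c
/* Hypothetical per-thread state; each thread has its own copy. */
static int seed = 0;
#pragma omp threadprivate(seed)

/* Returns 1 if every thread in the team observed the master thread's
 * value of seed at the start of the parallel region, 0 otherwise. */
int copyin_reaches_all_threads(int master_value)
{
    int all_match = 1;

    seed = master_value;   /* sets the calling (master) thread's copy */

    /* copyin copies the master thread's seed to each other thread's
     * copy after the team is formed and before the block executes. */
    #pragma omp parallel copyin(seed) reduction(&&:all_match)
    {
        all_match = all_match && (seed == master_value);
    }
    return all_match;
}
```

Without the copyin clause, the other threads' copies of seed would not be expected to hold master_value on entry, so the function could return 0 when more than one thread is used.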
13
Restrictions
The restrictions to the copyin clause are as follows:
C/C++
• A list item that appears in a copyin clause must be threadprivate.
• A variable of class type (or array thereof) that appears in a copyin clause requires
an accessible, unambiguous copy assignment operator for the class type.
C/C++
Fortran
• A list item that appears in a copyin clause must be threadprivate. Named variables
appearing in a threadprivate common block may be specified: it is not necessary to
specify the whole common block.
• A common block name that appears in a copyin clause must be declared to be a
common block in the same scoping unit in which the copyin clause appears.
• If an array with the ALLOCATABLE attribute is allocated, then each thread's copy of
that array must be allocated with the same bounds.
Fortran
2.9.4.2 copyprivate clause
Summary
3
4
5
The copyprivate clause provides a mechanism to use a private variable to broadcast
a value from the data environment of one implicit task to the data environments of the
other implicit tasks belonging to the parallel region.
6
7
8
To avoid race conditions, concurrent reads or updates of the list item must be
synchronized with the update of the list item that occurs as a result of the
copyprivate clause.
9
Syntax
10
The syntax of the copyprivate clause is as follows:
copyprivate(list)
11
Description
12
13
14
15
The effect of the copyprivate clause on the specified list items occurs after the
execution of the structured block associated with the single construct (see
Section 2.5.3 on page 50), and before any of the threads in the team have left the barrier
at the end of the construct.
C/C++
In all other implicit tasks belonging to the parallel region, each specified list item
becomes defined with the value of the corresponding list item in the implicit task whose
thread executed the structured block. For variables of non-array type, the definition
occurs by copy assignment. For an array of elements of non-array type, each element is
copied by copy assignment from an element of the array in the data environment of the
implicit task associated with the thread that executed the structured block to the
corresponding element of the array in the data environment of the other implicit tasks.
For class types, a copy assignment operator is invoked. The order in which copy
assignment operators for different variables of class type are called is unspecified.
C/C++
Fortran
If a list item does not have the POINTER attribute, then in all other implicit tasks
belonging to the parallel region, the list item becomes defined as if by intrinsic
assignment with the value of the corresponding list item in the implicit task associated
with the thread that executed the structured block.
If the list item has the POINTER attribute, then, in all other implicit tasks belonging to
the parallel region, the list item receives, as if by pointer assignment, the same
association status of the corresponding list item in the implicit task associated with the
thread that executed the structured block.
Fortran
5
For examples of the copyprivate clause, see Section A.38 on page 273.
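The following non-normative C sketch illustrates the broadcast performed by copyprivate. The function name broadcast_with_copyprivate and the variable local are hypothetical and are used only for illustration.

```c
/* Returns 1 if every thread in the team received value through the
 * copyprivate clause, 0 otherwise. */
int broadcast_with_copyprivate(int value)
{
    int all_match = 1;

    #pragma omp parallel reduction(&&:all_match)
    {
        int local = 0;   /* private: declared inside the parallel region */

        /* One thread executes the block. Its local is then copied to the
         * local of every other thread before any thread leaves the
         * barrier at the end of the single construct. */
        #pragma omp single copyprivate(local)
        {
            local = value;
        }

        all_match = all_match && (local == value);
    }
    return all_match;
}
```

Because local is declared inside the parallel region, it is private in the enclosing context, which is what the restrictions on copyprivate list items require.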
6
7
8
Note – The copyprivate clause is an alternative to using a shared variable for the
value when providing such a shared variable would be difficult (for example, in a
recursion requiring a different variable at each level).
9
Restrictions
10
The restrictions to the copyprivate clause are as follows:
• All list items that appear in the copyprivate clause must be either threadprivate
or private in the enclosing context.
• A list item that appears in a copyprivate clause may not appear in a private or
firstprivate clause on the single construct.
C/C++
• A variable of class type (or array thereof) that appears in a copyprivate clause
requires an accessible unambiguous copy assignment operator for the class type.
C/C++
Fortran
• A common block that appears in a copyprivate clause must be threadprivate.
• An array with the ALLOCATABLE attribute must be in the allocated state with the
same bounds in all threads affected by the copyprivate clause.
Fortran
2.10 Nesting of Regions
This section describes a set of restrictions on the nesting of regions. The restrictions on
nesting are as follows:
• A worksharing region may not be closely nested inside a worksharing, explicit task,
critical, ordered, atomic, or master region.
• A barrier region may not be closely nested inside a worksharing, explicit task,
critical, ordered, atomic, or master region.
• A master region may not be closely nested inside a worksharing, atomic, or
explicit task region.
• An ordered region may not be closely nested inside a critical, atomic, or
explicit task region.
• An ordered region must be closely nested inside a loop region (or parallel loop
region) with an ordered clause.
• A critical region may not be nested (closely or otherwise) inside a critical
region with the same name. Note that this restriction is not sufficient to prevent
deadlock.
• parallel, flush, critical, atomic, taskyield, and explicit task
regions may not be closely nested inside an atomic region.
For examples illustrating these rules, see Section A.20 on page 221, Section A.39 on
page 278, Section A.40 on page 281, and Section A.15 on page 193.
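The following non-normative C sketch shows one nesting that the rules above permit: an ordered region closely nested inside a parallel loop region that has an ordered clause. The function name squares_in_order is hypothetical.

```c
/* Stores i*i for i = 0..n-1 into out in iteration order and returns the
 * number of values stored. The caller provides room for n values. */
int squares_in_order(int n, int *out)
{
    int count = 0;
    int i;

    /* The ordered clause on the loop construct is what allows the
     * ordered region below to be closely nested inside the loop region. */
    #pragma omp parallel for ordered
    for (i = 0; i < n; i++) {
        int sq = i * i;      /* may be computed concurrently */

        #pragma omp ordered
        {
            out[count] = sq; /* executed one iteration at a time, in loop order */
            count++;
        }
    }
    return count;
}
```

Removing the ordered clause from the loop directive while keeping the ordered region would violate the fifth rule in the list.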
CHAPTER 3
Runtime Library Routines
This chapter describes the OpenMP API runtime library routines and is divided into the
following sections:
• Runtime library definitions (Section 3.1 on page 114).
• Execution environment routines that can be used to control and to query the parallel
execution environment (Section 3.2 on page 115).
• Lock routines that can be used to synchronize access to data (Section 3.3 on page
141).
• Portable timer routines (Section 3.4 on page 148).
Throughout this chapter, true and false are used as generic terms to simplify the
description of the routines.
C/C++
true means a nonzero integer value and false means an integer value of zero.
C/C++
Fortran
15
true means a logical value of .TRUE. and false means a logical value of .FALSE..
Fortran
Fortran
16
Restrictions
17
The following restriction applies to all OpenMP runtime library routines:
18
19
• OpenMP runtime library routines may not be called from PURE or ELEMENTAL
procedures.
Fortran
3.1 Runtime Library Definitions
For each base language, a compliant implementation must supply a set of definitions for
the OpenMP API runtime library routines and the special data types of their parameters.
The set of definitions must contain a declaration for each OpenMP API runtime library
routine and a declaration for the simple lock, nestable lock and schedule data types. In
addition, each set of definitions may specify other implementation specific values.
C/C++
The library routines are external functions with “C” linkage.
Prototypes for the C/C++ runtime library routines described in this chapter shall be
provided in a header file named omp.h. This file defines the following:
10
• The prototypes of all the routines in the chapter.
11
• The type omp_lock_t.
12
• The type omp_nest_lock_t.
13
• The type omp_sched_t.
14
See Section D.1 on page 326 for an example of this file.
C/C++
Fortran
15
16
The OpenMP Fortran API runtime library routines are external procedures. The return
values of these routines are of default kind, unless otherwise specified.
17
18
19
20
Interface declarations for the OpenMP Fortran runtime library routines described in this
chapter shall be provided in the form of a Fortran include file named omp_lib.h or
a Fortran 90 module named omp_lib. It is implementation defined whether the
include file or the module file (or both) is provided.
21
These files define the following:
22
• The interfaces of all of the routines in this chapter.
23
• The integer parameter omp_lock_kind.
24
• The integer parameter omp_nest_lock_kind.
25
• The integer parameter omp_sched_kind.
26
27
28
29
30
• The integer parameter openmp_version with a value yyyymm where yyyy
and mm are the year and month designations of the version of the OpenMP Fortran
API that the implementation supports. This value matches that of the C preprocessor
macro _OPENMP, when a macro preprocessor is supported (see Section 2.2 on page
26).
See Section D.2 on page 328 and Section D.3 on page 330 for examples of these files.
2
3
4
It is implementation defined whether any of the OpenMP runtime library routines that
take an argument are extended with a generic interface so arguments of different KIND
type can be accommodated. See Appendix D.4 for an example of such an extension.
Fortran
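The following non-normative C sketch decodes the yyyymm value of the _OPENMP preprocessor macro mentioned above. The function names openmp_spec_year and openmp_spec_month are hypothetical. Returning 0 when the macro is undefined is an assumption made for illustration, not behavior required by this specification.

```c
/* Returns the year encoded in _OPENMP (yyyymm), or 0 if this translation
 * unit was compiled without OpenMP support. */
int openmp_spec_year(void)
{
#ifdef _OPENMP
    return _OPENMP / 100;
#else
    return 0;
#endif
}

/* Returns the month encoded in _OPENMP (yyyymm), or 0 if this translation
 * unit was compiled without OpenMP support. */
int openmp_spec_month(void)
{
#ifdef _OPENMP
    return _OPENMP % 100;
#else
    return 0;
#endif
}
```

The same #ifdef _OPENMP test is commonly used to guard the inclusion of omp.h so that a program can still be built by a compiler that ignores OpenMP directives.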
3.2 Execution Environment Routines
The routines described in this section affect and monitor threads, processors, and the
parallel environment.
8
• the omp_set_num_threads routine.
9
• the omp_get_num_threads routine.
10
• the omp_get_max_threads routine.
11
• the omp_get_thread_num routine.
12
• the omp_get_num_procs routine.
13
• the omp_in_parallel routine.
14
• the omp_set_dynamic routine.
15
• the omp_get_dynamic routine.
16
• the omp_set_nested routine.
17
• the omp_get_nested routine.
18
• the omp_set_schedule routine.
19
• the omp_get_schedule routine.
20
• the omp_get_thread_limit routine.
21
• the omp_set_max_active_levels routine.
22
• the omp_get_max_active_levels routine.
23
• the omp_get_level routine.
24
• the omp_get_ancestor_thread_num routine.
25
• the omp_get_team_size routine.
26
• the omp_get_active_level routine.
27
• the omp_in_final routine.
3.2.1 omp_set_num_threads
Summary
3
4
5
The omp_set_num_threads routine affects the number of threads to be used for
subsequent parallel regions that do not specify a num_threads clause, by setting the
value of the first element of the nthreads-var ICV of the current task.
6
Format
C/C++
void omp_set_num_threads(int num_threads);
C/C++
7
Fortran
subroutine omp_set_num_threads(num_threads)
integer num_threads
Fortran
8
Constraints on Arguments
9
10
11
The value of the argument passed to this routine must evaluate to a positive integer, or
else the behavior of this routine is implementation defined.
12
Binding
13
The binding task set for an omp_set_num_threads region is the generating task.
14
Effect
15
16
The effect of this routine is to set the value of the first element of the nthreads-var ICV
of the current task to the value specified in the argument.
17
18
See Section 2.4.1 on page 36 for the rules governing the number of threads used to
execute a parallel region.
For an example of the omp_set_num_threads routine, see Section A.41 on page
288.
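The following non-normative C sketch requests a team size with omp_set_num_threads and reports the size actually used. The function name run_with_requested_team is hypothetical, and the definitions in the #else branch are hypothetical serial stand-ins so that the sketch also builds without OpenMP support. The number of threads actually used is governed by the rules in Section 2.4.1 and may be smaller than the number requested.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial fallbacks; not part of any OpenMP implementation. */
static void omp_set_dynamic(int flag) { (void)flag; }
static void omp_set_num_threads(int n) { (void)n; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Requests a team of wanted threads (wanted must be positive) and
 * returns the team size observed in the next parallel region. */
int run_with_requested_team(int wanted)
{
    int team_size = 0;

    omp_set_dynamic(0);            /* disable dynamic adjustment where supported */
    omp_set_num_threads(wanted);   /* sets the first element of nthreads-var */

    #pragma omp parallel
    {
        #pragma omp single
        team_size = omp_get_num_threads();
    }
    return team_size;
}
```

A caller should treat the return value as the authoritative team size rather than assuming it equals wanted.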
3
Cross References
4
• nthreads-var ICV, see Section 2.3 on page 28.
5
• OMP_NUM_THREADS environment variable, see Section 4.2 on page 155.
6
• omp_get_max_threads routine, see Section 3.2.3 on page 118.
7
• parallel construct, see Section 2.4 on page 33.
8
• num_threads clause, see Section 2.4 on page 33.
3.2.2 omp_get_num_threads
Summary
11
12
The omp_get_num_threads routine returns the number of threads in the current
team.
13
Format
C/C++
int omp_get_num_threads(void);
C/C++
14
Fortran
integer function omp_get_num_threads()
Fortran
15
16
Binding
17
18
The binding region for an omp_get_num_threads region is the innermost enclosing
parallel region.
Effect
2
3
4
5
The omp_get_num_threads routine returns the number of threads in the team
executing the parallel region to which the routine region binds. If called from the
sequential part of a program, this routine returns 1. For examples, see Section A.42 on
page 289.
6
7
See Section 2.4.1 on page 36 for the rules governing the number of threads used to
execute a parallel region.
8
Cross References
9
• parallel construct, see Section 2.4 on page 33.
10
• omp_set_num_threads routine, see Section 3.2.1 on page 116.
11
• OMP_NUM_THREADS environment variable, see Section 4.2 on page 155.
3.2.3 omp_get_max_threads
Summary
14
15
16
The omp_get_max_threads routine returns an upper bound on the number of
threads that could be used to form a new team if a parallel region without a
num_threads clause were encountered after execution returns from this routine.
17
Format
C/C++
int omp_get_max_threads(void);
C/C++
18
Fortran
integer function omp_get_max_threads()
Fortran
Binding
2
The binding task set for an omp_get_max_threads region is the generating task.
3
Effect
4
5
6
7
The value returned by omp_get_max_threads is the value of the first element of
the nthreads-var ICV of the current task. This value is also an upper bound on the
number of threads that could be used to form a new team if a parallel region without a
num_threads clause were encountered after execution returns from this routine.
8
9
See Section 2.4.1 on page 36 for the rules governing the number of threads used to
execute a parallel region.
10
11
12
Note – The return value of the omp_get_max_threads routine can be used to
dynamically allocate sufficient storage for all threads in the team formed at the
subsequent active parallel region.
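The use described in the note can be sketched in C as follows. This is a non-normative illustration: the function name count_team_with_slots is hypothetical, and the definitions in the #else branch are hypothetical serial stand-ins so that the sketch also builds without OpenMP support.

```c
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial fallbacks; not part of any OpenMP implementation. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Gives each thread of the next team its own slot and returns the number
 * of slots that were written, or -1 if the allocation fails. */
int count_team_with_slots(void)
{
    int capacity = omp_get_max_threads();   /* upper bound on the team size */
    int *slots = calloc((size_t)capacity, sizeof *slots);
    int i, total = 0;

    if (slots == NULL)
        return -1;

    #pragma omp parallel
    {
        slots[omp_get_thread_num()] = 1;    /* thread numbers lie in 0..team size - 1 */
    }

    for (i = 0; i < capacity; i++)
        total += slots[i];

    free(slots);
    return total;
}
```

The allocation is sized before the parallel region is encountered, which is the point of the routine: the team size is not yet known, but its upper bound is.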
13
Cross References
14
• nthreads-var ICV, see Section 2.3 on page 28.
15
• parallel construct, see Section 2.4 on page 33.
16
• num_threads clause, see Section 2.4 on page 33.
17
• omp_set_num_threads routine, see Section 3.2.1 on page 116.
18
• OMP_NUM_THREADS environment variable, see Section 4.2 on page 155.
3.2.4 omp_get_thread_num
Summary
21
22
The omp_get_thread_num routine returns the thread number, within the current
team, of the calling thread.
Format
C/C++
int omp_get_thread_num(void);
C/C++
2
Fortran
integer function omp_get_thread_num()
Fortran
3
4
Binding
5
6
7
The binding thread set for an omp_get_thread_num region is the current team. The
binding region for an omp_get_thread_num region is the innermost enclosing
parallel region.
8
Effect
9
10
11
12
13
The omp_get_thread_num routine returns the thread number of the calling thread,
within the team executing the parallel region to which the routine region binds. The
thread number is an integer between 0 and one less than the value returned by
omp_get_num_threads, inclusive. The thread number of the master thread of the
team is 0. The routine returns 0 if it is called from the sequential part of a program.
14
15
16
Note – The thread number may change at any time during the execution of an untied
task. The value returned by omp_get_thread_num is not generally useful during the
execution of such a task region.
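The following non-normative C sketch uses the thread number and the team size to divide loop indices among threads by hand. The function name cyclic_sum is hypothetical, and the definitions in the #else branch are hypothetical serial stand-ins so that the sketch also builds without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial fallbacks; not part of any OpenMP implementation. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Returns 0 + 1 + ... + (n - 1), with indices dealt out to the threads
 * of the team in round-robin fashion. */
long cyclic_sum(int n)
{
    long total = 0;

    #pragma omp parallel reduction(+:total)
    {
        int tid = omp_get_thread_num();        /* 0 <= tid < nthreads */
        int nthreads = omp_get_num_threads();
        int i;

        for (i = tid; i < n; i += nthreads)
            total += i;
    }
    return total;
}
```

In practice a loop construct would normally be preferred for this kind of work distribution; the manual form is shown only to make the range of thread numbers concrete.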
17
Cross References
18
• omp_get_num_threads routine, see Section 3.2.2 on page 117.
3.2.5 omp_get_num_procs
Summary
3
4
The omp_get_num_procs routine returns the number of processors available to the
program.
5
Format
C/C++
int omp_get_num_procs(void);
C/C++
6
Fortran
integer function omp_get_num_procs()
Fortran
7
8
Binding
9
10
11
The binding thread set for an omp_get_num_procs region is all threads. The effect
of executing this routine is not related to any specific region corresponding to any
construct or API routine.
12
Effect
13
14
15
16
17
The omp_get_num_procs routine returns the number of processors that are available
to the program at the time the routine is called. Note that this value may change between
the time that it is determined by the omp_get_num_procs routine and the time that it
is read in the calling context due to system actions outside the control of the OpenMP
implementation.
3.2.6 omp_in_parallel
Summary
3
4
The omp_in_parallel routine returns true if the call to the routine is enclosed by an
active parallel region; otherwise, it returns false.
5
Format
C/C++
int omp_in_parallel(void);
C/C++
6
Fortran
logical function omp_in_parallel()
Fortran
7
Binding
8
9
10
11
The binding thread set for an omp_in_parallel region is all threads. The effect of
executing this routine is not related to any specific parallel region but instead
depends on the state of all enclosing parallel regions.
12
Effect
13
14
15
omp_in_parallel returns true if any enclosing parallel region is active. If the
routine call is enclosed by only inactive parallel regions (including the implicit
parallel region), then it returns false.
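The distinction between active and inactive regions can be sketched in C as follows. This is a non-normative illustration that assumes both functions are called from the sequential part of the program. The function names are hypothetical, and the definition in the #else branch is a hypothetical serial stand-in so that the sketch also builds without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial fallback; not part of any OpenMP implementation. */
static int omp_in_parallel(void) { return 0; }
#endif

/* Returns the value of omp_in_parallel() observed inside a parallel
 * region whose if clause evaluates to false, making the region inactive. */
int in_parallel_inside_inactive_region(void)
{
    int result = -1;

    #pragma omp parallel if(0)
    {
        result = omp_in_parallel();   /* executed by a team of one thread */
    }
    return result;
}

/* Returns the value of omp_in_parallel() observed outside any explicit
 * parallel region. */
int in_parallel_in_sequential_part(void)
{
    return omp_in_parallel();
}
```

Both functions are expected to return false (zero): the first is enclosed only by an inactive region, and the second only by the implicit parallel region.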
3.2.7 omp_set_dynamic
Summary
3
4
5
The omp_set_dynamic routine enables or disables dynamic adjustment of the
number of threads available for the execution of subsequent parallel regions by
setting the value of the dyn-var ICV.
6
Format
C/C++
void omp_set_dynamic(int dynamic_threads);
C/C++
7
Fortran
subroutine omp_set_dynamic (dynamic_threads)
logical dynamic_threads
Fortran
8
9
Binding
10
The binding task set for an omp_set_dynamic region is the generating task.
11
Effect
12
13
14
15
16
For implementations that support dynamic adjustment of the number of threads, if the
argument to omp_set_dynamic evaluates to true, dynamic adjustment is enabled for
the current task; otherwise, dynamic adjustment is disabled for the current task. For
implementations that do not support dynamic adjustment of the number of threads, this
routine has no effect: the value of dyn-var remains false.
17
For an example of the omp_set_dynamic routine, see Section A.41 on page 288.
18
19
See Section 2.4.1 on page 36 for the rules governing the number of threads used to
execute a parallel region.
Cross References
2
• dyn-var ICV, see Section 2.3 on page 28.
3
• omp_get_num_threads routine, see Section 3.2.2 on page 117.
4
• omp_get_dynamic routine, see Section 3.2.8 on page 124.
5
• OMP_DYNAMIC environment variable, see Section 4.3 on page 156.
3.2.8 omp_get_dynamic
Summary
8
9
The omp_get_dynamic routine returns the value of the dyn-var ICV, which
determines whether dynamic adjustment of the number of threads is enabled or disabled.
Format
10
C/C++
int omp_get_dynamic(void);
C/C++
11
Fortran
logical function omp_get_dynamic()
Fortran
12
13
Binding
14
The binding task set for an omp_get_dynamic region is the generating task.
15
Effect
16
17
18
This routine returns true if dynamic adjustment of the number of threads is enabled for
the current task; it returns false, otherwise. If an implementation does not support
dynamic adjustment of the number of threads, then this routine always returns false.
See Section 2.4.1 on page 36 for the rules governing the number of threads used to
execute a parallel region.
3
Cross References
4
• dyn-var ICV, see Section 2.3 on page 28.
5
• omp_set_dynamic routine, see Section 3.2.7 on page 123.
6
• OMP_DYNAMIC environment variable, see Section 4.3 on page 156.
3.2.9 omp_set_nested
Summary
10
The omp_set_nested routine enables or disables nested parallelism, by setting the
nest-var ICV.
11
Format
C/C++
void omp_set_nested(int nested);
12
C/C++
Fortran
subroutine omp_set_nested (nested)
logical nested
13
Fortran
Binding
2
The binding task set for an omp_set_nested region is the generating task.
3
Effect
4
5
6
7
8
For implementations that support nested parallelism, if the argument to
omp_set_nested evaluates to true, nested parallelism is enabled for the current task;
otherwise, nested parallelism is disabled for the current task. For implementations that
do not support nested parallelism, this routine has no effect: the value of nest-var
remains false.
9
10
See Section 2.4.1 on page 36 for the rules governing the number of threads used to
execute a parallel region.
11
Cross References
12
• nest-var ICV, see Section 2.3 on page 28.
13
• omp_set_max_active_levels routine, see Section 3.2.14 on page 132.
14
• omp_get_max_active_levels routine, see Section 3.2.15 on page 134.
15
• omp_get_nested routine, see Section 3.2.10 on page 126.
16
• OMP_NESTED environment variable, see Section 4.5 on page 157.
3.2.10 omp_get_nested
Summary
19
20
The omp_get_nested routine returns the value of the nest-var ICV, which
determines if nested parallelism is enabled or disabled.
Format
C/C++
int omp_get_nested(void);
C/C++
2
Fortran
logical function omp_get_nested()
Fortran
3
4
Binding
5
The binding task set for an omp_get_nested region is the generating task.
6
Effect
7
8
9
This routine returns true if nested parallelism is enabled for the current task; it returns
false, otherwise. If an implementation does not support nested parallelism, this routine
always returns false.
10
11
See Section 2.4.1 on page 36 for the rules governing the number of threads used to
execute a parallel region.
12
Cross References
13
• nest-var ICV, see Section 2.3 on page 28.
14
• omp_set_nested routine, see Section 3.2.9 on page 125.
15
• OMP_NESTED environment variable, see Section 4.5 on page 157.
3.2.11 omp_set_schedule
Summary
3
4
The omp_set_schedule routine affects the schedule that is applied when runtime
is used as schedule kind, by setting the value of the run-sched-var ICV.
5
Format
6
C/C++
void omp_set_schedule(omp_sched_t kind, int modifier);
7
C/C++
8
Fortran
subroutine omp_set_schedule(kind, modifier)
integer (kind=omp_sched_kind) kind
integer modifier
Fortran
9
10
Constraints on Arguments
11
12
13
14
15
The first argument passed to this routine can be one of the valid OpenMP schedule kinds
(except for runtime) or any implementation specific schedule. The C/C++ header file
(omp.h) and the Fortran include file (omp_lib.h) and/or Fortran 90 module file
(omp_lib) define the valid constants. The valid constants must include the following,
which can be extended with implementation specific values:
C/C++
typedef enum omp_sched_t {
    omp_sched_static = 1,
    omp_sched_dynamic = 2,
    omp_sched_guided = 3,
    omp_sched_auto = 4
} omp_sched_t;
C/C++
2
Fortran
integer(kind=omp_sched_kind), parameter :: omp_sched_static = 1
integer(kind=omp_sched_kind), parameter :: omp_sched_dynamic = 2
integer(kind=omp_sched_kind), parameter :: omp_sched_guided = 3
integer(kind=omp_sched_kind), parameter :: omp_sched_auto = 4
Fortran
3
4
Binding
5
The binding task set for an omp_set_schedule region is the generating task.
6
Effect
7
8
9
10
11
12
13
14
The effect of this routine is to set the value of the run-sched-var ICV of the current task
to the values specified in the two arguments. The schedule is set to the schedule type
specified by the first argument kind. It can be any of the standard schedule types or
any other implementation specific one. For the schedule types static, dynamic, and
guided the chunk_size is set to the value of the second argument, or to the default
chunk_size if the value of the second argument is less than 1; for the schedule type
auto the second argument has no meaning; for implementation specific schedule types,
the values and associated meanings of the second argument are implementation defined.
Cross References
2
• run-sched-var ICV, see Section 2.3 on page 28.
3
• omp_get_schedule routine, see Section 3.2.12 on page 130.
4
• OMP_SCHEDULE environment variable, see Section 4.1 on page 154.
5
• Determining the schedule of a worksharing loop, see Section 2.5.1.1 on page 47.
3.2.12 omp_get_schedule
Summary
8
9
The omp_get_schedule routine returns the schedule that is applied when the
runtime schedule is used.
Format
10
11
C/C++
void omp_get_schedule(omp_sched_t * kind, int * modifier );
C/C++
12
Fortran
subroutine omp_get_schedule(kind, modifier)
integer (kind=omp_sched_kind) kind
integer modifier
Fortran
13
14
Binding
15
The binding task set for an omp_get_schedule region is the generating task.
Effect
2
3
4
5
6
This routine returns the run-sched-var ICV in the task to which the routine binds. The
first argument kind returns the schedule to be used. It can be any of the standard
schedule types as defined in Section 3.2.11 on page 128, or any implementation specific
schedule type. The second argument is interpreted as in the omp_set_schedule call,
defined in Section 3.2.11 on page 128.
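The following non-normative C sketch sets the run-sched-var ICV with omp_set_schedule, reads it back with omp_get_schedule, and then runs a loop that uses the runtime schedule kind. The function name sum_with_runtime_schedule is hypothetical. The definitions in the #else branch are hypothetical serial stand-ins so that the sketch also builds without OpenMP support; they reuse the enumerator values listed in Section 3.2.11.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial fallbacks; not part of any OpenMP implementation. */
typedef enum omp_sched_t {
    omp_sched_static = 1,
    omp_sched_dynamic = 2,
    omp_sched_guided = 3,
    omp_sched_auto = 4
} omp_sched_t;
static omp_sched_t stub_kind = omp_sched_static;
static int stub_modifier = 0;
static void omp_set_schedule(omp_sched_t kind, int modifier)
{
    stub_kind = kind;
    stub_modifier = modifier;
}
static void omp_get_schedule(omp_sched_t *kind, int *modifier)
{
    *kind = stub_kind;
    *modifier = stub_modifier;
}
#endif

/* Selects dynamic scheduling with the given chunk size (chunk >= 1) for
 * loops that use schedule(runtime), stores the schedule read back through
 * kind_out and chunk_out, and returns 0 + 1 + ... + (n - 1). */
long sum_with_runtime_schedule(int n, int chunk,
                               omp_sched_t *kind_out, int *chunk_out)
{
    long total = 0;
    int i;

    omp_set_schedule(omp_sched_dynamic, chunk);
    omp_get_schedule(kind_out, chunk_out);

    #pragma omp parallel for schedule(runtime) reduction(+:total)
    for (i = 0; i < n; i++)
        total += i;

    return total;
}
```

The schedule affects only how iterations are distributed among threads, so the returned sum is the same for any valid kind and chunk size.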
7
Cross References
8
• run-sched-var ICV, see Section 2.3 on page 28.
9
• omp_set_schedule routine, see Section 3.2.11 on page 128.
10
• OMP_SCHEDULE environment variable, see Section 4.1 on page 154.
11
• Determining the schedule of a worksharing loop, see Section 2.5.1.1 on page 47.
3.2.13 omp_get_thread_limit
Summary
14
15
The omp_get_thread_limit routine returns the maximum number of OpenMP
threads available to the program.
16
Format
17
C/C++
int omp_get_thread_limit(void);
C/C++
18
Fortran
integer function omp_get_thread_limit()
19
Fortran
Binding
2
3
4
The binding thread set for an omp_get_thread_limit region is all threads. The
effect of executing this routine is not related to any specific region corresponding to any
construct or API routine.
5
Effect
6
7
The omp_get_thread_limit routine returns the maximum number of OpenMP
threads available to the program as stored in the ICV thread-limit-var.
8
Cross References
9
• thread-limit-var ICV, see Section 2.3 on page 28.
• OMP_THREAD_LIMIT environment variable, see Section 4.9 on page 160.
3.2.14 omp_set_max_active_levels
Summary
13
14
The omp_set_max_active_levels routine limits the number of nested active
parallel regions, by setting the max-active-levels-var ICV.
15
Format
16
C/C++
void omp_set_max_active_levels (int max_levels);
C/C++
17
Fortran
subroutine omp_set_max_active_levels (max_levels)
integer max_levels
Fortran
2
3
Constraints on Arguments
4
5
The value of the argument passed to this routine must evaluate to a non-negative integer;
otherwise, the behavior of this routine is implementation defined.
6
Binding
7
8
9
10
When called from the sequential part of the program, the binding thread set for an
omp_set_max_active_levels region is the encountering thread. When called
from within any explicit parallel region, the binding thread set (and binding region, if
required) for the omp_set_max_active_levels region is implementation defined.
11
Effect
12
13
The effect of this routine is to set the value of the max-active-levels-var ICV to the value
specified in the argument.
14
15
16
If the number of parallel levels requested exceeds the number of levels of parallelism
supported by the implementation, the value of the max-active-levels-var ICV will be set
to the number of parallel levels supported by the implementation.
17
18
19
This routine has the described effect only when called from the sequential part of the
program. When called from within an explicit parallel region, the effect of this
routine is implementation defined.
20
Cross References
21
• max-active-levels-var ICV, see Section 2.3 on page 28.
22
• omp_get_max_active_levels routine, see Section 3.2.15 on page 134.
23
• OMP_MAX_ACTIVE_LEVELS environment variable, see Section 4.8 on page 159.
3.2.15 omp_get_max_active_levels
Summary
3
4
5
The omp_get_max_active_levels routine returns the value of the
max-active-levels-var ICV, which determines the maximum number of nested active
parallel regions.
6
Format
7
C/C++
int omp_get_max_active_levels(void);
C/C++
8
Fortran
integer function omp_get_max_active_levels()
Fortran
9
10
Binding
11
12
13
14
When called from the sequential part of the program, the binding thread set for an
omp_get_max_active_levels region is the encountering thread. When called
from within any explicit parallel region, the binding thread set (and binding region, if
required) for the omp_get_max_active_levels region is implementation defined.
15
Effect
16
17
18
The omp_get_max_active_levels routine returns the value of the
max-active-levels-var ICV, which determines the maximum number of nested active
parallel regions.
Cross References
2
• max-active-levels-var ICV, see Section 2.3 on page 28.
3
• omp_set_max_active_levels routine, see Section 3.2.14 on page 132.
4
• OMP_MAX_ACTIVE_LEVELS environment variable, see Section 4.8 on page 159.
3.2.16 omp_get_level
Summary
7
8
The omp_get_level routine returns the number of nested parallel regions
enclosing the task that contains the call.
9
Format
10
C/C++
int omp_get_level(void);
C/C++
11
Fortran
integer function omp_get_level()
Fortran
12
13
Binding
14
15
16
The binding task set for an omp_get_level region is the generating task. The
binding region for an omp_get_level region is the innermost enclosing parallel
region.
Effect
2
3
4
5
The omp_get_level routine returns the number of nested parallel regions
(whether active or inactive) enclosing the task that contains the call, not including the
implicit parallel region. The routine always returns a non-negative integer, and returns 0
if it is called from the sequential part of the program.
6
Cross References
7
• omp_get_active_level routine, see Section 3.2.19 on page 139.
8
• OMP_MAX_ACTIVE_LEVELS environment variable, see Section 4.8 on page 159.
3.2.17 omp_get_ancestor_thread_num
Summary
11
12
The omp_get_ancestor_thread_num routine returns, for a given nested level of
the current thread, the thread number of the ancestor or the current thread.
13
Format
14
C/C++
int omp_get_ancestor_thread_num(int level);
C/C++
15
Fortran
integer function omp_get_ancestor_thread_num(level)
integer level
Fortran
16
Binding
2
3
4
The binding thread set for an omp_get_ancestor_thread_num region is the
encountering thread. The binding region for an omp_get_ancestor_thread_num
region is the innermost enclosing parallel region.
5
Effect
6
7
8
9
The omp_get_ancestor_thread_num routine returns the thread number of the
ancestor at a given nest level of the current thread or the thread number of the current
thread. If the requested nest level is outside the range of 0 and the nest level of the
current thread, as returned by the omp_get_level routine, the routine returns -1.
10
11
12
Note – When the omp_get_ancestor_thread_num routine is called with a value
of level=0, the routine always returns 0. If level=omp_get_level(), the routine
has the same effect as the omp_get_thread_num routine.
Cross References
• omp_get_level routine, see Section 3.2.16 on page 135.
• omp_get_thread_num routine, see Section 3.2.4 on page 119.
• omp_get_team_size routine, see Section 3.2.18 on page 137.
3.2.18 omp_get_team_size
Summary
The omp_get_team_size routine returns, for a given nested level of the current
thread, the size of the thread team to which the ancestor or the current thread belongs.
Format
C/C++
int omp_get_team_size(int level);
C/C++
Fortran
integer function omp_get_team_size(level)
integer level
Fortran
Binding
The binding thread set for an omp_get_team_size region is the encountering
thread. The binding region for an omp_get_team_size region is the innermost
enclosing parallel region.
Effect
The omp_get_team_size routine returns the size of the thread team to which the
ancestor or the current thread belongs. If the requested nested level is outside the range
from 0 to the nested level of the current thread (inclusive), as returned by the omp_get_level
routine, the routine returns -1. Inactive parallel regions are regarded like active parallel
regions executed with one thread.
Note – When the omp_get_team_size routine is called with a value of level=0,
the routine always returns 1. If level=omp_get_level(), the routine has the same
effect as the omp_get_num_threads routine.
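One way to use the routine is to walk from level 0 up to the current nest level and record the team size at each step. The sketch below does this under stated assumptions: collect_team_sizes, team_size_identities_hold and the bound MAX_LEVELS are hypothetical names introduced for illustration, and the #else branch provides serial fallbacks so that the code also builds without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks: an assumption for builds without OpenMP support. */
static int omp_get_level(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
static int omp_get_team_size(int level) { return level == 0 ? 1 : -1; }
#endif

#define MAX_LEVELS 8   /* arbitrary bound chosen for this sketch */

/* Stores the team size at each nest level 0..omp_get_level() in sizes[]
   and returns the number of entries written (at most MAX_LEVELS). */
int collect_team_sizes(int sizes[MAX_LEVELS])
{
    int level, depth = omp_get_level();
    for (level = 0; level <= depth && level < MAX_LEVELS; level++)
        sizes[level] = omp_get_team_size(level);
    return level;
}

/* Returns 1 if the identities from the note hold for the calling thread. */
int team_size_identities_hold(void)
{
    int level = omp_get_level();
    return omp_get_team_size(0) == 1
        && omp_get_team_size(level) == omp_get_num_threads()
        && omp_get_team_size(level + 1) == -1;
}
```

Called from the sequential part of a program, collect_team_sizes writes the single entry 1. Inside nested parallel regions it would record one entry per enclosing region, with 1 for each inactive region.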
Cross References
• omp_get_num_threads routine, see Section 3.2.2 on page 117.
• omp_get_level routine, see Section 3.2.16 on page 135.
• omp_get_ancestor_thread_num routine, see Section 3.2.17 on page 136.
3.2.19 omp_get_active_level
Summary
The omp_get_active_level routine returns the number of nested, active
parallel regions enclosing the task that contains the call.
Format
C/C++
int omp_get_active_level(void);
C/C++
Fortran
integer function omp_get_active_level()
Fortran
Binding
The binding task set for an omp_get_active_level region is the generating
task. The binding region for an omp_get_active_level region is the innermost
enclosing parallel region.
Effect
The omp_get_active_level routine returns the number of nested, active parallel
regions enclosing the task that contains the call. The routine always returns a
non-negative integer, and returns 0 if it is called from the sequential part of the program.
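The difference between omp_get_level and omp_get_active_level shows up inside an inactive parallel region. The sketch below forces such a region with an if clause that evaluates to false. The struct levels type and the function levels_in_inactive_region are hypothetical names, and the #else branch holds serial fallbacks (an assumption of this sketch) for builds without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks: an assumption for builds without OpenMP support. */
static int omp_get_level(void) { return 0; }
static int omp_get_active_level(void) { return 0; }
#endif

struct levels { int total; int active; };

/* Queries both routines from inside a parallel region whose if clause is
   false, so the region is inactive (executed by a team of one thread). */
struct levels levels_in_inactive_region(void)
{
    struct levels r = { -1, -1 };
    #pragma omp parallel if(0) shared(r)
    {
        r.total = omp_get_level();          /* counts inactive regions too */
        r.active = omp_get_active_level();  /* counts active regions only */
    }
    return r;
}
```

When called from the sequential part of a program compiled with OpenMP enabled, the region contributes to total but not to active, so total is 1 while active stays 0. With the serial fallbacks both values are 0.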
Cross References
• omp_get_level routine, see Section 3.2.16 on page 135.
3.2.20 omp_in_final
Summary
The omp_in_final routine returns true if the routine is executed in a final task
region; otherwise, it returns false.
Format
C/C++
int omp_in_final(void);
C/C++
Fortran
logical function omp_in_final()
Fortran
Binding
The binding task set for an omp_in_final region is the generating task.
Effect
omp_in_final returns true if the enclosing task region is final. Otherwise, it returns
false.
3.3 Lock Routines
The OpenMP runtime library includes a set of general-purpose lock routines that can be
used for synchronization. These general-purpose lock routines operate on OpenMP locks
that are represented by OpenMP lock variables. OpenMP lock variables must be
accessed only through the routines described in this section; programs that otherwise
access OpenMP lock variables are non-conforming.
An OpenMP lock can be in one of the following states: uninitialized, unlocked, or
locked. If a lock is in the unlocked state, a task can set the lock, which changes its state
to locked. The task that sets the lock is then said to own the lock. A task that owns a
lock can unset that lock, returning it to the unlocked state. A program in which a task
unsets a lock that is owned by another task is non-conforming.
Two types of locks are supported: simple locks and nestable locks. A nestable lock can
be set multiple times by the same task before being unset; a simple lock cannot be set if
it is already owned by the task trying to set it. Simple lock variables are associated with
simple locks and can only be passed to simple lock routines. Nestable lock variables are
associated with nestable locks and can only be passed to nestable lock routines.
Constraints on the state and ownership of the lock accessed by each of the lock routines
are described with the routine. If these constraints are not met, the behavior of the
routine is unspecified.
The OpenMP lock routines access a lock variable in such a way that they always read
and update the most current value of the lock variable. It is not necessary for an
OpenMP program to include explicit flush directives to ensure that the lock variable’s
value is consistent among different tasks.
See Section A.45 on page 294 and Section A.46 on page 297, for examples of using the
simple and the nestable lock routines, respectively.
Binding
The binding thread set for all lock routine regions is all threads. As a consequence, for
each OpenMP lock, the lock routine effects relate to all tasks that call the routines,
without regard to which teams the threads executing the tasks belong.
Simple Lock Routines
C/C++
The type omp_lock_t is a data type capable of representing a simple lock. For the
following routines, a simple lock variable must be of omp_lock_t type. All simple
lock routines require an argument that is a pointer to a variable of type omp_lock_t.
C/C++
Fortran
For the following routines, a simple lock variable must be an integer variable of
kind=omp_lock_kind.
Fortran
The simple lock routines are as follows:
• The omp_init_lock routine initializes a simple lock.
• The omp_destroy_lock routine uninitializes a simple lock.
• The omp_set_lock routine waits until a simple lock is available, and then sets it.
• The omp_unset_lock routine unsets a simple lock.
• The omp_test_lock routine tests a simple lock, and sets it if it is available.
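The life cycle of a simple lock (initialize, set, unset, destroy) can be sketched as follows. This is an illustration under stated assumptions, not a normative example: the function count_with_lock is hypothetical, and the #else branch defines serial fallbacks for the lock type and routines so that the code also builds without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks: an assumption for builds without OpenMP support. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { *l = -1; }
#endif

/* Adds 1 to a shared counter n times, guarding each update with a
   simple lock, and returns the final counter value. */
long count_with_lock(int n)
{
    omp_lock_t lock;
    long counter = 0;
    int i;

    omp_init_lock(&lock);            /* uninitialized -> unlocked */
    #pragma omp parallel for shared(counter, lock)
    for (i = 0; i < n; i++) {
        omp_set_lock(&lock);         /* waits until the lock is available */
        counter++;                   /* executed only by the owning task */
        omp_unset_lock(&lock);       /* locked -> unlocked */
    }
    omp_destroy_lock(&lock);         /* unlocked -> uninitialized */
    return counter;
}
```

Because every increment happens while the lock is owned, count_with_lock(n) returns n regardless of the number of threads. For a plain counter an atomic construct would usually be the lighter choice; the lock is used here only to show the routine sequence.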
Nestable Lock Routines
C/C++
The type omp_nest_lock_t is a data type capable of representing a nestable lock.
For the following routines, a nestable lock variable must be of omp_nest_lock_t type.
All nestable lock routines require an argument that is a pointer to a variable of type
omp_nest_lock_t.
C/C++
Fortran
For the following routines, a nestable lock variable must be an integer variable of
kind=omp_nest_lock_kind.
Fortran
The nestable lock routines are as follows:
• The omp_init_nest_lock routine initializes a nestable lock.
• The omp_destroy_nest_lock routine uninitializes a nestable lock.
• The omp_set_nest_lock routine waits until a nestable lock is available, and then
sets it.
• The omp_unset_nest_lock routine unsets a nestable lock.
• The omp_test_nest_lock routine tests a nestable lock, and sets it if it is
available.
3.3.1 omp_init_lock and omp_init_nest_lock
Summary
These routines provide the only means of initializing an OpenMP lock.
Format
C/C++
void omp_init_lock(omp_lock_t *lock);
void omp_init_nest_lock(omp_nest_lock_t *lock);
C/C++
Fortran
subroutine omp_init_lock(svar)
integer (kind=omp_lock_kind) svar
subroutine omp_init_nest_lock(nvar)
integer (kind=omp_nest_lock_kind) nvar
Fortran
Constraints on Arguments
A program that accesses a lock that is not in the uninitialized state through either routine
is non-conforming.
Effect
The effect of these routines is to initialize the lock to the unlocked state; that is, no task
owns the lock. In addition, the nesting count for a nestable lock is set to zero.
For an example of the omp_init_lock routine, see Section A.43 on page 292.
3.3.2 omp_destroy_lock and omp_destroy_nest_lock
Summary
These routines ensure that the OpenMP lock is uninitialized.
Format
C/C++
void omp_destroy_lock(omp_lock_t *lock);
void omp_destroy_nest_lock(omp_nest_lock_t *lock);
C/C++
Fortran
subroutine omp_destroy_lock(svar)
integer (kind=omp_lock_kind) svar
subroutine omp_destroy_nest_lock(nvar)
integer (kind=omp_nest_lock_kind) nvar
Fortran
Constraints on Arguments
A program that accesses a lock that is not in the unlocked state through either routine is
non-conforming.
Effect
The effect of these routines is to change the state of the lock to uninitialized.
3.3.3 omp_set_lock and omp_set_nest_lock
Summary
These routines provide a means of setting an OpenMP lock. The calling task region is
suspended until the lock is set.
Format
C/C++
void omp_set_lock(omp_lock_t *lock);
void omp_set_nest_lock(omp_nest_lock_t *lock);
C/C++
Fortran
subroutine omp_set_lock(svar)
integer (kind=omp_lock_kind) svar
subroutine omp_set_nest_lock(nvar)
integer (kind=omp_nest_lock_kind) nvar
Fortran
Constraints on Arguments
A program that accesses a lock that is in the uninitialized state through either routine is
non-conforming. A simple lock accessed by omp_set_lock that is in the locked state
must not be owned by the task that contains the call or deadlock will result.
Effect
Each of these routines causes suspension of the task executing the routine until the
specified lock is available and then sets the lock.
A simple lock is available if it is unlocked. Ownership of the lock is granted to the task
executing the routine.
A nestable lock is available if it is unlocked or if it is already owned by the task
executing the routine. The task executing the routine is granted, or retains, ownership of
the lock, and the nesting count for the lock is incremented.
3.3.4 omp_unset_lock and omp_unset_nest_lock
Summary
These routines provide the means of unsetting an OpenMP lock.
Format
C/C++
void omp_unset_lock(omp_lock_t *lock);
void omp_unset_nest_lock(omp_nest_lock_t *lock);
C/C++
Fortran
subroutine omp_unset_lock(svar)
integer (kind=omp_lock_kind) svar
subroutine omp_unset_nest_lock(nvar)
integer (kind=omp_nest_lock_kind) nvar
Fortran
Constraints on Arguments
A program that accesses a lock that is not in the locked state or that is not owned by the
task that contains the call through either routine is non-conforming.
Effect
For a simple lock, the omp_unset_lock routine causes the lock to become unlocked.
For a nestable lock, the omp_unset_nest_lock routine decrements the nesting
count, and causes the lock to become unlocked if the resulting nesting count is zero.
For either routine, if the lock becomes unlocked, and if one or more task regions were
suspended because the lock was unavailable, the effect is that one task is chosen and
given ownership of the lock.
3.3.5 omp_test_lock and omp_test_nest_lock
Summary
These routines attempt to set an OpenMP lock but do not suspend execution of the task
executing the routine.
Format
C/C++
int omp_test_lock(omp_lock_t *lock);
int omp_test_nest_lock(omp_nest_lock_t *lock);
C/C++
Fortran
logical function omp_test_lock(svar)
integer (kind=omp_lock_kind) svar
integer function omp_test_nest_lock(nvar)
integer (kind=omp_nest_lock_kind) nvar
Fortran
Constraints on Arguments
A program that accesses a lock that is in the uninitialized state through either routine is
non-conforming. The behavior is unspecified if a simple lock accessed by
omp_test_lock is in the locked state and is owned by the task that contains the call.
Effect
These routines attempt to set a lock in the same manner as omp_set_lock and
omp_set_nest_lock, except that they do not suspend execution of the task
executing the routine.
For a simple lock, the omp_test_lock routine returns true if the lock is successfully
set; otherwise, it returns false.
For a nestable lock, the omp_test_nest_lock routine returns the new nesting count
if the lock is successfully set; otherwise, it returns zero.
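The nesting count returned by omp_test_nest_lock can be observed when the task that already owns a nestable lock tests it again. The following sketch assumes a hypothetical function name, nesting_count_after_retest, and uses serial fallbacks in the #else branch (an assumption of this sketch) that model the lock by its nesting count alone.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks: the lock is modelled by its nesting count alone. */
typedef int omp_nest_lock_t;
static void omp_init_nest_lock(omp_nest_lock_t *l)    { *l = 0; }
static void omp_set_nest_lock(omp_nest_lock_t *l)     { ++*l; }
static int  omp_test_nest_lock(omp_nest_lock_t *l)    { return ++*l; }
static void omp_unset_nest_lock(omp_nest_lock_t *l)   { --*l; }
static void omp_destroy_nest_lock(omp_nest_lock_t *l) { *l = 0; }
#endif

/* Sets a nestable lock once, then acquires it again with
   omp_test_nest_lock. Returns the value that routine reported. */
int nesting_count_after_retest(void)
{
    omp_nest_lock_t lock;
    int count;

    omp_init_nest_lock(&lock);          /* unlocked, nesting count 0 */
    omp_set_nest_lock(&lock);           /* nesting count 1, caller owns it */
    count = omp_test_nest_lock(&lock);  /* owner succeeds: new count is 2 */
    omp_unset_nest_lock(&lock);         /* count 1, lock still locked */
    omp_unset_nest_lock(&lock);         /* count 0, lock becomes unlocked */
    omp_destroy_nest_lock(&lock);       /* unlocked -> uninitialized */
    return count;
}
```

The lock must be unset once for every successful set or test before it returns to the unlocked state, which is why two omp_unset_nest_lock calls precede omp_destroy_nest_lock.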
3.4 Timing Routines
The routines described in this section support a portable wall clock timer.
• the omp_get_wtime routine.
• the omp_get_wtick routine.
3.4.1 omp_get_wtime
Summary
The omp_get_wtime routine returns elapsed wall clock time in seconds.
Format
C/C++
double omp_get_wtime(void);
C/C++
Fortran
double precision function omp_get_wtime()
Fortran
Binding
The binding thread set for an omp_get_wtime region is the encountering thread. The
routine’s return value is not guaranteed to be consistent across any set of threads.
Effect
The omp_get_wtime routine returns a value equal to the elapsed wall clock time in
seconds since some “time in the past”. The actual “time in the past” is arbitrary, but it is
guaranteed not to change during the execution of the application program. The time
returned is a “per-thread time”, so it is not required to be globally consistent across all
the threads participating in an application.
Note – It is anticipated that the routine will be used to measure elapsed times as shown
in the following example:
C/C++
double start;
double end;
start = omp_get_wtime();
/* ... work to be timed ... */
end = omp_get_wtime();
printf("Work took %f seconds\n", end - start);
C/C++
Fortran
DOUBLE PRECISION START, END
START = omp_get_wtime()
! ... work to be timed ...
END = omp_get_wtime()
PRINT *, "Work took", END - START, "seconds"
Fortran
3.4.2 omp_get_wtick
Summary
The omp_get_wtick routine returns the precision of the timer used by
omp_get_wtime.
Format
C/C++
double omp_get_wtick(void);
C/C++
Fortran
double precision function omp_get_wtick()
Fortran
Binding
The binding thread set for an omp_get_wtick region is the encountering thread. The
routine’s return value is not guaranteed to be consistent across any set of threads.
Effect
The omp_get_wtick routine returns a value equal to the number of seconds between
successive clock ticks of the timer used by omp_get_wtime.
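A measured interval is only meaningful if it is at least as long as one timer tick. The sketch below combines the two timing routines to flag measurements that fall below the timer precision. The function time_sum is a hypothetical name, and the #else branch contains serial fallbacks built on the C library clock() function (an assumption of this sketch, and only an approximation, because clock() reports processor time).

```c
#ifdef _OPENMP
#include <omp.h>
#else
#include <time.h>
/* Serial fallbacks built on clock(): an assumption of this sketch. Unlike
   omp_get_wtime, clock() reports processor time, not wall clock time. */
static double omp_get_wtime(void) { return (double)clock() / CLOCKS_PER_SEC; }
static double omp_get_wtick(void) { return 1.0 / CLOCKS_PER_SEC; }
#endif

/* Sums the integers 0..n-1, times the loop, and stores the elapsed seconds
   in *elapsed. Returns 1 if the interval is at least one timer tick long,
   0 if it is below the precision reported by omp_get_wtick. */
int time_sum(long n, double *elapsed)
{
    volatile long sum = 0;   /* volatile keeps the loop from being removed */
    long i;
    double start = omp_get_wtime();

    for (i = 0; i < n; i++)
        sum += i;
    *elapsed = omp_get_wtime() - start;
    return *elapsed >= omp_get_wtick();
}
```

Both omp_get_wtime calls are made by the same thread, which matters because the returned time is a per-thread time. A return value of 0 suggests repeating the work in a loop until the interval is well above one tick.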
CHAPTER 4
Environment Variables
This chapter describes the OpenMP environment variables that specify the settings of
the ICVs that affect the execution of OpenMP programs (see Section 2.3 on page 28).
The names of the environment variables must be upper case. The values assigned to the
environment variables are case insensitive and may have leading and trailing white
space. Modifications to the environment variables after the program has started, even if
modified by the program itself, are ignored by the OpenMP implementation. However,
the settings of some of the ICVs can be modified during the execution of the OpenMP
program by the use of the appropriate directive clauses or OpenMP API routines.
The environment variables are as follows:
• OMP_SCHEDULE sets the run-sched-var ICV that specifies the runtime schedule type
and chunk size. It can be set to any of the valid OpenMP schedule types.
• OMP_NUM_THREADS sets the nthreads-var ICV that specifies the number of threads
to use for parallel regions.
• OMP_DYNAMIC sets the dyn-var ICV that specifies the dynamic adjustment of
threads to use for parallel regions.
• OMP_PROC_BIND sets the bind-var ICV that controls whether threads are bound to
processors.
• OMP_NESTED sets the nest-var ICV that enables or disables nested parallelism.
• OMP_STACKSIZE sets the stacksize-var ICV that specifies the size of the stack for
threads created by the OpenMP implementation.
• OMP_WAIT_POLICY sets the wait-policy-var ICV that controls the desired behavior
of waiting threads.
• OMP_MAX_ACTIVE_LEVELS sets the max-active-levels-var ICV that controls the
maximum number of nested active parallel regions.
• OMP_THREAD_LIMIT sets the thread-limit-var ICV that controls the maximum
number of threads participating in the OpenMP program.
The examples in this chapter only demonstrate how these variables might be set in Unix
C shell (csh) environments. In Korn shell (ksh) and DOS environments the actions are
similar, as follows:
• csh:
setenv OMP_SCHEDULE "dynamic"
• ksh:
export OMP_SCHEDULE="dynamic"
• DOS:
set OMP_SCHEDULE=dynamic
4.1 OMP_SCHEDULE
The OMP_SCHEDULE environment variable controls the schedule type and chunk size
of all loop directives that have the schedule type runtime, by setting the value of the
run-sched-var ICV.
The value of this environment variable takes the form:
type[,chunk]
where
• type is one of static, dynamic, guided, or auto
• chunk is an optional positive integer that specifies the chunk size
If chunk is present, there may be white space on either side of the “,”. See Section 2.5.1
on page 39 for a detailed description of the schedule types.
The behavior of the program is implementation defined if the value of OMP_SCHEDULE
does not conform to the above format.
Implementation specific schedules cannot be specified in OMP_SCHEDULE. They can
only be specified by calling omp_set_schedule, described in Section 3.2.11 on page
128.
Example:
setenv OMP_SCHEDULE "guided,4"
setenv OMP_SCHEDULE "dynamic"
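The type[,chunk] format can be made concrete with a small parser. The sketch below is illustrative only and does not describe how any OpenMP implementation reads the variable: the names sched_type, keyword_eq and parse_omp_schedule are hypothetical, and rejecting a malformed value is a choice made for this sketch, since the specification leaves that case implementation defined.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum sched_type { SCHED_INVALID = -1, SCHED_STATIC, SCHED_DYNAMIC,
                  SCHED_GUIDED, SCHED_AUTO };

/* Case-insensitive comparison of the first n characters of s with the
   lower-case keyword kw; kw must be exactly n characters long. */
static int keyword_eq(const char *s, size_t n, const char *kw)
{
    size_t i;
    if (strlen(kw) != n) return 0;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char)s[i]) != kw[i]) return 0;
    return 1;
}

/* Parses "type[,chunk]". On success returns the schedule type and stores
   the chunk size in *chunk (0 when no chunk was given). */
enum sched_type parse_omp_schedule(const char *value, long *chunk)
{
    static const char *const names[] = { "static", "dynamic", "guided", "auto" };
    const char *p = value, *start;
    char *end;
    int t, type = SCHED_INVALID;

    *chunk = 0;
    while (isspace((unsigned char)*p)) p++;        /* leading white space */
    start = p;
    while (isalpha((unsigned char)*p)) p++;        /* the type keyword */
    for (t = 0; t < 4; t++)
        if (keyword_eq(start, (size_t)(p - start), names[t])) type = t;
    if (type == SCHED_INVALID) return SCHED_INVALID;

    while (isspace((unsigned char)*p)) p++;        /* space before "," or end */
    if (*p == ',') {                               /* optional chunk */
        p++;
        while (isspace((unsigned char)*p)) p++;    /* space after "," */
        if (!isdigit((unsigned char)*p)) return SCHED_INVALID;
        *chunk = strtol(p, &end, 10);
        if (*chunk <= 0) return SCHED_INVALID;     /* must be positive */
        p = end;
        while (isspace((unsigned char)*p)) p++;    /* trailing white space */
    }
    return *p == '\0' ? (enum sched_type)type : SCHED_INVALID;
}
```

With this sketch, "guided,4" yields SCHED_GUIDED with a chunk of 4, and "dynamic" yields SCHED_DYNAMIC with the chunk left at 0 to indicate that none was given.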
Cross References
• run-sched-var ICV, see Section 2.3 on page 28.
• Loop construct, see Section 2.5.1 on page 39.
• Parallel loop construct, see Section 2.6.1 on page 56.
• omp_set_schedule routine, see Section 3.2.11 on page 128.
• omp_get_schedule routine, see Section 3.2.12 on page 130.
4.2 OMP_NUM_THREADS
The OMP_NUM_THREADS environment variable sets the number of threads to use for
parallel regions by setting the initial value of the nthreads-var ICV. See Section 2.3
on page 28 for a comprehensive set of rules about the interaction between the
OMP_NUM_THREADS environment variable, the num_threads clause, the
omp_set_num_threads library routine and dynamic adjustment of threads, and
Section 2.4.1 on page 36 for a complete algorithm that describes how the number of
threads for a parallel region is determined.
The value of this environment variable must be a list of positive integer values. The
values of the list set the number of threads to use for parallel regions at the
corresponding nested level.
The behavior of the program is implementation defined if any value of the list specified
in the OMP_NUM_THREADS environment variable leads to a number of threads which is
greater than an implementation can support, or if any value is not a positive integer.
Example:
setenv OMP_NUM_THREADS 4,3,2
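To illustrate the list form, the sketch below turns a value such as 4,3,2 into one thread count per nested level. It is a hypothetical helper, not the behavior of any particular implementation: the name parse_omp_num_threads is invented for this example, and accepting white space around the commas and rejecting malformed lists are assumptions made for the sketch.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

/* Parses a comma-separated list of positive integers into out[0..max-1];
   out[i] is the thread count for nested level i + 1. Returns the number
   of values, or -1 if the list is malformed or has more than max entries. */
int parse_omp_num_threads(const char *value, int *out, int max)
{
    const char *p = value;
    char *end;
    int n = 0;

    for (;;) {
        long v;
        while (isspace((unsigned char)*p)) p++;
        if (!isdigit((unsigned char)*p) || n == max) return -1;
        v = strtol(p, &end, 10);
        if (v <= 0 || v > INT_MAX) return -1;   /* must be a positive int */
        out[n++] = (int)v;
        p = end;
        while (isspace((unsigned char)*p)) p++;
        if (*p == '\0') return n;               /* end of the list */
        if (*p != ',') return -1;
        p++;                                    /* step over the comma */
    }
}
```

For the value 4,3,2 the helper reports three entries: 4 threads for the outermost parallel region, 3 for regions nested one level inside it, and 2 for the next level.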
Cross References
• nthreads-var ICV, see Section 2.3 on page 28.
• num_threads clause, see Section 2.4 on page 33.
• omp_set_num_threads routine, see Section 3.2.1 on page 116.
• omp_get_num_threads routine, see Section 3.2.2 on page 117.
• omp_get_max_threads routine, see Section 3.2.3 on page 118.
• omp_get_team_size routine, see Section 3.2.18 on page 137.
4.3 OMP_DYNAMIC
The OMP_DYNAMIC environment variable controls dynamic adjustment of the number
of threads to use for executing parallel regions by setting the initial value of the
dyn-var ICV. The value of this environment variable must be true or false. If the
environment variable is set to true, the OpenMP implementation may adjust the
number of threads to use for executing parallel regions in order to optimize the use
of system resources. If the environment variable is set to false, the dynamic
adjustment of the number of threads is disabled. The behavior of the program is
implementation defined if the value of OMP_DYNAMIC is neither true nor false.
Example:
setenv OMP_DYNAMIC true
Cross References
• dyn-var ICV, see Section 2.3 on page 28.
• omp_set_dynamic routine, see Section 3.2.7 on page 123.
• omp_get_dynamic routine, see Section 3.2.8 on page 124.
4.4 OMP_PROC_BIND
The OMP_PROC_BIND environment variable sets the value of the global bind-var ICV.
The value of this environment variable must be true or false. If the environment
variable is set to true, the execution environment should not move OpenMP threads
between processors. If the environment variable is set to false, the execution
environment may move OpenMP threads between processors. The behavior of the
program is implementation defined if the value of OMP_PROC_BIND is neither true
nor false.
Example:
setenv OMP_PROC_BIND true
Cross References
• bind-var ICV, see Section 2.3 on page 28.
4.5 OMP_NESTED
The OMP_NESTED environment variable controls nested parallelism by setting the
initial value of the nest-var ICV. The value of this environment variable must be true
or false. If the environment variable is set to true, nested parallelism is enabled; if
set to false, nested parallelism is disabled. The behavior of the program is
implementation defined if the value of OMP_NESTED is neither true nor false.
Example:
setenv OMP_NESTED false
Cross References
• nest-var ICV, see Section 2.3 on page 28.
• omp_set_nested routine, see Section 3.2.9 on page 125.
• omp_get_nested routine, see Section 3.2.10 on page 126.
4.6 OMP_STACKSIZE
The OMP_STACKSIZE environment variable controls the size of the stack for threads
created by the OpenMP implementation, by setting the value of the stacksize-var ICV.
The environment variable does not control the size of the stack for the initial thread.
The value of this environment variable takes the form:
size | sizeB | sizeK | sizeM | sizeG
where:
• size is a positive integer that specifies the size of the stack for threads that are created
by the OpenMP implementation.
• B, K, M, and G are letters that specify whether the given size is in Bytes, Kilobytes
(1024 Bytes), Megabytes (1024 Kilobytes), or Gigabytes (1024 Megabytes),
respectively. If one of these letters is present, there may be white space between
size and the letter.
If only size is specified and none of B, K, M, or G is specified, then size is assumed to be
in Kilobytes.
The behavior of the program is implementation defined if OMP_STACKSIZE does not
conform to the above format, or if the implementation cannot provide a stack with the
requested size.
Examples:
setenv OMP_STACKSIZE 2000500B
setenv OMP_STACKSIZE "3000 k "
setenv OMP_STACKSIZE 10M
setenv OMP_STACKSIZE " 10 M "
setenv OMP_STACKSIZE "20 m "
setenv OMP_STACKSIZE " 1G"
setenv OMP_STACKSIZE 20000
Cross References
• stacksize-var ICV, see Section 2.3 on page 28.
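The unit rules for OMP_STACKSIZE can be worked through with a short conversion routine. The sketch below is a hypothetical helper, not taken from any implementation: the name parse_omp_stacksize is invented here, returning -1 for a malformed value is a choice made for the sketch (the specification leaves that case implementation defined), and overflow of very large sizes is not checked.

```c
#include <ctype.h>
#include <stdlib.h>

/* Parses size | sizeB | sizeK | sizeM | sizeG and returns the stack size
   in bytes, or -1 if the value does not match the format. */
long long parse_omp_stacksize(const char *value)
{
    const char *p = value;
    char *end;
    long long size, unit = 1024;              /* default unit: Kilobytes */

    while (isspace((unsigned char)*p)) p++;   /* leading white space */
    if (!isdigit((unsigned char)*p)) return -1;
    size = strtoll(p, &end, 10);
    if (size <= 0) return -1;                 /* must be positive */
    p = end;
    while (isspace((unsigned char)*p)) p++;   /* space before the letter */

    switch (toupper((unsigned char)*p)) {     /* values are case insensitive */
    case 'B': unit = 1LL;                  p++; break;
    case 'K': unit = 1024LL;               p++; break;
    case 'M': unit = 1024LL * 1024;        p++; break;
    case 'G': unit = 1024LL * 1024 * 1024; p++; break;
    default:  break;                          /* no letter: keep Kilobytes */
    }
    while (isspace((unsigned char)*p)) p++;   /* trailing white space */
    return *p == '\0' ? size * unit : -1;
}
```

Applied to the examples of this section, "2000500B" gives 2000500 bytes, "3000 k " gives 3072000 bytes, and the bare value 20000 is read as Kilobytes and gives 20480000 bytes.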
4.7 OMP_WAIT_POLICY
The OMP_WAIT_POLICY environment variable provides a hint to an OpenMP
implementation about the desired behavior of waiting threads by setting the
wait-policy-var ICV. A compliant OpenMP implementation may or may not abide by the
setting of the environment variable.
The value of this environment variable takes the form:
ACTIVE | PASSIVE
The ACTIVE value specifies that waiting threads should mostly be active, consuming
processor cycles, while waiting. An OpenMP implementation may, for example, make
waiting threads spin.
The PASSIVE value specifies that waiting threads should mostly be passive, not
consuming processor cycles, while waiting. For example, an OpenMP implementation
may make waiting threads yield the processor to other threads or go to sleep.
The details of the ACTIVE and PASSIVE behaviors are implementation defined.
Examples:
setenv OMP_WAIT_POLICY ACTIVE
setenv OMP_WAIT_POLICY active
setenv OMP_WAIT_POLICY PASSIVE
setenv OMP_WAIT_POLICY passive
Cross References
• wait-policy-var ICV, see Section 2.3 on page 28.
4.8 OMP_MAX_ACTIVE_LEVELS
The OMP_MAX_ACTIVE_LEVELS environment variable controls the maximum number
of nested active parallel regions by setting the initial value of the max-active-levels-var
ICV.
The value of this environment variable must be a non-negative integer. The behavior of
the program is implementation defined if the requested value of
OMP_MAX_ACTIVE_LEVELS is greater than the maximum number of nested active
parallel levels an implementation can support, or if the value is not a non-negative
integer.
Cross References
• max-active-levels-var ICV, see Section 2.3 on page 28.
• omp_set_max_active_levels routine, see Section 3.2.14 on page 132.
• omp_get_max_active_levels routine, see Section 3.2.15 on page 134.
4.9 OMP_THREAD_LIMIT
The OMP_THREAD_LIMIT environment variable sets the number of OpenMP threads
to use for the whole OpenMP program by setting the thread-limit-var ICV.
The value of this environment variable must be a positive integer. The behavior of the
program is implementation defined if the requested value of OMP_THREAD_LIMIT is
greater than the number of threads an implementation can support, or if the value is not
a positive integer.
Cross References
• thread-limit-var ICV, see Section 2.3 on page 28.
• omp_get_thread_limit routine
APPENDIX A
Examples
The following are examples of the constructs and routines defined in this document.
C/C++
A statement following a directive is compound only when necessary, and a
non-compound statement is indented with respect to a directive preceding it.
C/C++
A.1 A Simple Parallel Loop
The following example demonstrates how to parallelize a simple loop using the parallel
loop construct (Section 2.6.1 on page 56). The loop iteration variable is private by
default, so it is not necessary to specify it explicitly in a private clause.
C/C++
Example A.1.1c
void simple(int n, float *a, float *b)
{
    int i;
    #pragma omp parallel for
    for (i=1; i<n; i++) /* i is private by default */
        b[i] = (a[i] + a[i-1]) / 2.0;
}
C/C++
Fortran
Example A.1.1f
      SUBROUTINE SIMPLE(N, A, B)
      INTEGER I, N
      REAL B(N), A(N)
!$OMP PARALLEL DO !I is private by default
      DO I=2,N
          B(I) = (A(I) + A(I-1)) / 2.0
      ENDDO
!$OMP END PARALLEL DO
      END SUBROUTINE SIMPLE
Fortran
A.2 The OpenMP Memory Model
In the following example, at Print 1, the value of x could be either 2 or 5, depending on
the timing of the threads, and the implementation of the assignment to x. There are two
reasons that the value at Print 1 might not be 5. First, Print 1 might be executed before
the assignment to x is executed. Second, even if Print 1 is executed after the assignment,
the value 5 is not guaranteed to be seen by thread 1 because a flush may not have been
executed by thread 0 since the assignment.
The barrier after Print 1 contains implicit flushes on all threads, as well as a thread
synchronization, so the programmer is guaranteed that the value 5 will be printed by
both Print 2 and Print 3.
C/C++
Example A.2.1c
#include <stdio.h>
#include <omp.h>
int main(){
    int x;
    x = 2;
    #pragma omp parallel num_threads(2) shared(x)
    {
        if (omp_get_thread_num() == 0) {
            x = 5;
        } else {
            /* Print 1: the following read of x has a race */
            printf("1: Thread# %d: x = %d\n", omp_get_thread_num(),x );
        }
        #pragma omp barrier
        if (omp_get_thread_num() == 0) {
            /* Print 2 */
            printf("2: Thread# %d: x = %d\n", omp_get_thread_num(),x );
        } else {
            /* Print 3 */
            printf("3: Thread# %d: x = %d\n", omp_get_thread_num(),x );
        }
    }
    return 0;
}
C/C++
Fortran
Example A.2.1f
      PROGRAM MEMMODEL
      INCLUDE "omp_lib.h" ! or USE OMP_LIB
      INTEGER X
      X = 2
!$OMP PARALLEL NUM_THREADS(2) SHARED(X)
      IF (OMP_GET_THREAD_NUM() .EQ. 0) THEN
          X = 5
      ELSE
          ! PRINT 1: The following read of x has a race
          PRINT *,"1: THREAD# ", OMP_GET_THREAD_NUM(), "X = ", X
      ENDIF
!$OMP BARRIER
      IF (OMP_GET_THREAD_NUM() .EQ. 0) THEN
          ! PRINT 2
          PRINT *,"2: THREAD# ", OMP_GET_THREAD_NUM(), "X = ", X
      ELSE
          ! PRINT 3
          PRINT *,"3: THREAD# ", OMP_GET_THREAD_NUM(), "X = ", X
      ENDIF
!$OMP END PARALLEL
      END PROGRAM MEMMODEL
Fortran
The following example demonstrates why synchronization is difficult to perform
correctly through variables. The value of flag is undefined in both prints on thread 1 and
the value of data is only well-defined in the second print.
C/C++
Example A.2.2c
#include <omp.h>
#include <stdio.h>
int main()
{
    int data;
    int flag=0;
    #pragma omp parallel num_threads(2)
    {
        if (omp_get_thread_num()==0)
        {
            /* Write to the data buffer that will be
               read by thread 1 */
            data = 42;
            /* Flush data to thread 1 and strictly order
               the write to data
               relative to the write to the flag */
            #pragma omp flush(flag, data)
            /* Set flag to release thread 1 */
            flag = 1;
            /* Flush flag to ensure that thread 1 sees
               the change */
            #pragma omp flush(flag)
        }
        else if(omp_get_thread_num()==1)
        {
            /* Loop until we see the update to the flag */
            #pragma omp flush(flag, data)
            while (flag < 1)
            {
                #pragma omp flush(flag, data)
            }
            /* Values of flag and data are undefined */
            printf("flag=%d data=%d\n", flag, data);
            #pragma omp flush(flag, data)
            /* Value of data will be 42, value of flag
               still undefined */
            printf("flag=%d data=%d\n", flag, data);
        }
    }
    return 0;
}
C/C++
Fortran
Example A.2.2f
      PROGRAM EXAMPLE
      INCLUDE "omp_lib.h" ! or USE OMP_LIB
      INTEGER DATA
      INTEGER FLAG
      FLAG = 0
!$OMP PARALLEL NUM_THREADS(2)
      IF(OMP_GET_THREAD_NUM() .EQ. 0) THEN
          ! Write to the data buffer that will be read by thread 1
          DATA = 42
          ! Flush DATA to thread 1 and strictly order the write to DATA
          ! relative to the write to the FLAG
!$OMP FLUSH(FLAG, DATA)
          ! Set FLAG to release thread 1
          FLAG = 1
          ! Flush FLAG to ensure that thread 1 sees the change
!$OMP FLUSH(FLAG)
      ELSE IF(OMP_GET_THREAD_NUM() .EQ. 1) THEN
          ! Loop until we see the update to the FLAG
!$OMP FLUSH(FLAG, DATA)
          DO WHILE(FLAG .LT. 1)
!$OMP FLUSH(FLAG, DATA)
          ENDDO
          ! Values of FLAG and DATA are undefined
          PRINT *, 'FLAG=', FLAG, ' DATA=', DATA
!$OMP FLUSH(FLAG, DATA)
          ! Value of DATA will be 42, value of FLAG still undefined
          PRINT *, 'FLAG=', FLAG, ' DATA=', DATA
      ENDIF
!$OMP END PARALLEL
      END
Fortran
The next example demonstrates why synchronization is difficult to perform correctly
through variables. Because the write(1)-flush(1)-flush(2)-read(2) sequence cannot be
guaranteed in the example, the statements on thread 0 and thread 1 may execute in either
order.
35
36
37
38
166
OpenMP API • Version 3.1 July 2011
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
C/C++
Example A.2.3c
#include <omp.h>
#include <stdio.h>
int main()
{
int flag=0;
#pragma omp parallel num_threads(3)
{
if(omp_get_thread_num()==0)
{
/* Set flag to release thread 1 */
#pragma omp atomic update
flag++;
/* Flush of flag is implied by the atomic directive */
}
else if(omp_get_thread_num()==1)
{
/* Loop until we see that flag reaches 1*/
#pragma omp flush(flag)
while(flag < 1)
{
#pragma omp flush(flag)
}
printf("Thread 1 awoken\n");
/* Set flag to release thread 2 */
#pragma omp atomic update
flag++;
/* Flush of flag is implied by the atomic directive */
}
else if(omp_get_thread_num()==2)
{
/* Loop until we see that flag reaches 2 */
#pragma omp flush(flag)
while(flag < 2)
{
#pragma omp flush(flag)
}
printf("Thread 2 awoken\n");
}
}
return 0;
}
C/C++
Fortran
Example A.2.3f
PROGRAM EXAMPLE
INCLUDE "omp_lib.h" ! or USE OMP_LIB
INTEGER FLAG
FLAG = 0
!$OMP PARALLEL NUM_THREADS(3)
IF(OMP_GET_THREAD_NUM() .EQ. 0) THEN
! Set flag to release thread 1
!$OMP ATOMIC UPDATE
FLAG = FLAG + 1
!Flush of FLAG is implied by the atomic directive
ELSE IF(OMP_GET_THREAD_NUM() .EQ. 1) THEN
! Loop until we see that FLAG reaches 1
!$OMP FLUSH(FLAG, DATA)
DO WHILE(FLAG .LT. 1)
!$OMP FLUSH(FLAG, DATA)
ENDDO
PRINT *, 'Thread 1 awoken'
! Set FLAG to release thread 2
!$OMP ATOMIC UPDATE
FLAG = FLAG + 1
!Flush of FLAG is implied by the atomic directive
ELSE IF(OMP_GET_THREAD_NUM() .EQ. 2) THEN
! Loop until we see that FLAG reaches 2
!$OMP FLUSH(FLAG, DATA)
DO WHILE(FLAG .LT. 2)
!$OMP FLUSH(FLAG, DATA)
ENDDO
PRINT *, 'Thread 2 awoken'
ENDIF
!$OMP END PARALLEL
END
Fortran
A.3
Conditional Compilation
C/C++
The following example illustrates the use of conditional compilation using the OpenMP
macro _OPENMP (Section 2.2 on page 26). With OpenMP compilation, the _OPENMP
macro becomes defined.
Example A.3.1c
#include <stdio.h>
int main()
{
# ifdef _OPENMP
printf("Compiled by an OpenMP-compliant implementation.\n");
# endif
return 0;
}
C/C++
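A common use of the _OPENMP macro is to let one source file build both with and without OpenMP support. The following is a minimal sketch of that idea, not part of the specification; the wrapper names example_thread_num and example_openmp_version are hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>   /* only included when OpenMP compilation is enabled */
#endif

/* Hypothetical wrapper: the calling thread's number, or 0 in a serial build. */
int example_thread_num(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Hypothetical wrapper: the value of _OPENMP, or 0 when it is not defined. */
long example_openmp_version(void)
{
#ifdef _OPENMP
    return (long)_OPENMP;
#else
    return 0L;
#endif
}
```

Called outside any parallel region, example_thread_num returns 0 in both kinds of build, so callers do not need their own #ifdef guards.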
Fortran
The following example illustrates the use of the conditional compilation sentinel (see
Section 2.2 on page 26). With OpenMP compilation, the conditional compilation
sentinel !$ is recognized and treated as two spaces. In fixed form source, statements
guarded by the sentinel must start after column 6.
Example A.3.1f
      PROGRAM EXAMPLE
C234567890
!$    PRINT *, "Compiled by an OpenMP-compliant implementation."
      END PROGRAM EXAMPLE
Fortran
A.4
Internal Control Variables (ICVs)
According to Section 2.3 on page 28, an OpenMP implementation must act as if there
are ICVs that control the behavior of the program. This example illustrates two ICVs,
nthreads-var and max-active-levels-var. The nthreads-var ICV controls the number of
threads requested for encountered parallel regions; there is one copy of this ICV per
task. The max-active-levels-var ICV controls the maximum number of nested active
parallel regions; there is one copy of this ICV for the whole program.
In the following example, the nest-var, max-active-levels-var, dyn-var, and nthreads-var
ICVs are modified through calls to the runtime library routines omp_set_nested,
omp_set_max_active_levels, omp_set_dynamic, and
omp_set_num_threads respectively. These ICVs affect the operation of
parallel regions. Each implicit task generated by a parallel region has its own
copy of the nest-var, dyn-var, and nthreads-var ICVs.
In the following example, the new value of nthreads-var applies only to the implicit
tasks that execute the call to omp_set_num_threads. There is one copy of the
max-active-levels-var ICV for the whole program and its value is the same for all tasks.
This example assumes that nested parallelism is supported.
The outer parallel region creates a team of two threads; each of the threads will
execute one of the two implicit tasks generated by the outer parallel region.
Each implicit task generated by the outer parallel region calls
omp_set_num_threads(3), assigning the value 3 to its respective copy of
nthreads-var. Then each implicit task encounters an inner parallel region that
creates a team of three threads; each of the threads will execute one of the three implicit
tasks generated by that inner parallel region.
Since the outer parallel region is executed by 2 threads, and the inner by 3, there
will be a total of 6 implicit tasks generated by the two inner parallel regions.
Each implicit task generated by an inner parallel region will execute the call to
omp_set_num_threads(4), assigning the value 4 to its respective copy of
nthreads-var.
The print statement in the outer parallel region is executed by only one of the
threads in the team. So it will be executed only once.
The print statement in an inner parallel region is also executed by only one of the
threads in the team. Since we have a total of two inner parallel regions, the print
statement will be executed twice -- once per inner parallel region.
Example A.4.1c
C/C++
#include <stdio.h>
#include <omp.h>
int main (void)
{
omp_set_nested(1);
omp_set_max_active_levels(8);
omp_set_dynamic(0);
omp_set_num_threads(2);
#pragma omp parallel
{
omp_set_num_threads(3);
#pragma omp parallel
{
omp_set_num_threads(4);
#pragma omp single
{
/*
* The following should print:
* Inner: max_act_lev=8, num_thds=3, max_thds=4
* Inner: max_act_lev=8, num_thds=3, max_thds=4
*/
printf ("Inner: max_act_lev=%d, num_thds=%d, max_thds=%d\n",
omp_get_max_active_levels(), omp_get_num_threads(),
omp_get_max_threads());
}
}
#pragma omp barrier
#pragma omp single
{
/*
* The following should print:
* Outer: max_act_lev=8, num_thds=2, max_thds=3
*/
printf ("Outer: max_act_lev=%d, num_thds=%d, max_thds=%d\n",
omp_get_max_active_levels(), omp_get_num_threads(),
omp_get_max_threads());
}
}
return 0;
}
C/C++
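The count of implicit tasks described above (2 outer threads times 3 inner threads gives 6) can be checked at run time. The sketch below is an illustration under the same assumptions as the example (nested parallelism supported, enough threads available); the helper name count_inner_implicit_tasks is hypothetical. In a build without OpenMP, or without nesting support, it returns a smaller number.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: counts the implicit tasks generated by all inner
 * parallel regions. Each implicit task adds one to a shared counter. */
int count_inner_implicit_tasks(void)
{
    int count = 0;
#ifdef _OPENMP
    omp_set_nested(1);
    omp_set_max_active_levels(8);
    omp_set_dynamic(0);
    omp_set_num_threads(2);          /* outer team: 2 threads */
#endif
    #pragma omp parallel shared(count)
    {
#ifdef _OPENMP
        omp_set_num_threads(3);      /* per-task copy of nthreads-var */
#endif
        #pragma omp parallel shared(count)
        {
            #pragma omp atomic
            count++;                 /* one increment per inner implicit task */
        }
    }
    return count;
}
```

With nesting enabled and the requested team sizes granted, the function returns 6; callers should treat any value from 1 to 6 as possible.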
Fortran
Example A.4.1f
      program icv
      use omp_lib
      call omp_set_nested(.true.)
      call omp_set_max_active_levels(8)
      call omp_set_dynamic(.false.)
      call omp_set_num_threads(2)
!$omp parallel
      call omp_set_num_threads(3)
!$omp parallel
      call omp_set_num_threads(4)
!$omp single
! The following should print:
! Inner: max_act_lev= 8 , num_thds= 3 , max_thds= 4
! Inner: max_act_lev= 8 , num_thds= 3 , max_thds= 4
      print *, "Inner: max_act_lev=", omp_get_max_active_levels(),
     &         ", num_thds=", omp_get_num_threads(),
     &         ", max_thds=", omp_get_max_threads()
!$omp end single
!$omp end parallel
!$omp barrier
!$omp single
! The following should print:
! Outer: max_act_lev= 8 , num_thds= 2 , max_thds= 3
      print *, "Outer: max_act_lev=", omp_get_max_active_levels(),
     &         ", num_thds=", omp_get_num_threads(),
     &         ", max_thds=", omp_get_max_threads()
!$omp end single
!$omp end parallel
      end
Fortran
A.5
The parallel Construct
The parallel construct (Section 2.4 on page 33) can be used in coarse-grain parallel
programs. In the following example, each thread in the parallel region decides what
part of the global array x to work on, based on the thread number:
Example A.5.1c
C/C++
#include <omp.h>
void subdomain(float *x, int istart, int ipoints)
{
    int i;
    for (i = 0; i < ipoints; i++)
        x[istart+i] = 123.456;
}
void sub(float *x, int npoints)
{
    int iam, nt, ipoints, istart;
    #pragma omp parallel default(shared) private(iam,nt,ipoints,istart)
    {
        iam = omp_get_thread_num();
        nt = omp_get_num_threads();
        ipoints = npoints / nt;    /* size of partition */
        istart = iam * ipoints;    /* starting array index */
        if (iam == nt-1)           /* last thread may do more */
            ipoints = npoints - istart;
        subdomain(x, istart, ipoints);
    }
}
int main()
{
    float array[10000];
    sub(array, 10000);
    return 0;
}
C/C++
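The index arithmetic in sub can be checked in isolation. The sketch below repeats the same partition formula in plain C, with no OpenMP calls, and verifies that the blocks tile the array without gaps or overlap. The helper names partition and partition_covers are hypothetical.

```c
/* Hypothetical helper: block of thread iam out of nt threads, using the
 * same arithmetic as sub() in the example above. */
void partition(int npoints, int nt, int iam, int *istart, int *ipoints)
{
    *ipoints = npoints / nt;            /* size of partition */
    *istart = iam * (*ipoints);         /* starting array index */
    if (iam == nt - 1)                  /* last thread takes the remainder */
        *ipoints = npoints - *istart;
}

/* Returns 1 if the nt blocks cover [0, npoints) exactly once, else 0. */
int partition_covers(int npoints, int nt)
{
    int iam, next = 0;
    for (iam = 0; iam < nt; iam++) {
        int istart, ipoints;
        partition(npoints, nt, iam, &istart, &ipoints);
        if (istart != next || ipoints < 0)
            return 0;
        next = istart + ipoints;
    }
    return next == npoints;
}
```

For 10 points and 3 threads the blocks are 3, 3 and 4 elements long: the last thread absorbs the remainder, which can unbalance the load when npoints is not much larger than nt.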
Fortran
Example A.5.1f
SUBROUTINE SUBDOMAIN(X, ISTART, IPOINTS)
INTEGER ISTART, IPOINTS
REAL X(*)
INTEGER I
DO 100 I=1,IPOINTS
X(ISTART+I) = 123.456
100 CONTINUE
END SUBROUTINE SUBDOMAIN
SUBROUTINE SUB(X, NPOINTS)
INCLUDE "omp_lib.h" ! or USE OMP_LIB
REAL X(*)
INTEGER NPOINTS
INTEGER IAM, NT, IPOINTS, ISTART
!$OMP PARALLEL DEFAULT(PRIVATE) SHARED(X,NPOINTS)
IAM = OMP_GET_THREAD_NUM()
NT = OMP_GET_NUM_THREADS()
IPOINTS = NPOINTS/NT
ISTART = IAM * IPOINTS
IF (IAM .EQ. NT-1) THEN
IPOINTS = NPOINTS - ISTART
ENDIF
CALL SUBDOMAIN(X,ISTART,IPOINTS)
!$OMP END PARALLEL
END SUBROUTINE SUB
PROGRAM PAREXAMPLE
REAL ARRAY(10000)
CALL SUB(ARRAY, 10000)
END PROGRAM PAREXAMPLE
Fortran
A.6
Controlling the Number of Threads on
Multiple Nesting Levels
The following examples demonstrate how to use the OMP_NUM_THREADS environment
variable (Section 2.3.2 on page 29) to control the number of threads on multiple nesting
levels:
Example A.6.1c
C/C++
#include <stdio.h>
#include <omp.h>
int main (void)
{
omp_set_nested(1);
omp_set_dynamic(0);
#pragma omp parallel
{
#pragma omp parallel
{
#pragma omp single
{
/*
* If OMP_NUM_THREADS=2,3 was set, the following should print:
* Inner: num_thds=3
* Inner: num_thds=3
*
* If nesting is not supported, the following should print:
* Inner: num_thds=1
* Inner: num_thds=1
*/
printf ("Inner: num_thds=%d\n", omp_get_num_threads());
}
}
#pragma omp barrier
omp_set_nested(0);
#pragma omp parallel
{
#pragma omp single
{
/*
* Even if OMP_NUM_THREADS=2,3 was set, the following should
* print, because nesting is disabled:
* Inner: num_thds=1
* Inner: num_thds=1
*/
printf ("Inner: num_thds=%d\n", omp_get_num_threads());
}
}
#pragma omp barrier
#pragma omp single
{
/*
* If OMP_NUM_THREADS=2,3 was set, the following should print:
* Outer: num_thds=2
*/
printf ("Outer: num_thds=%d\n", omp_get_num_threads());
}
}
return 0;
}
C/C++
Fortran
Example A.6.1f
      program icv
      use omp_lib
      call omp_set_nested(.true.)
      call omp_set_dynamic(.false.)
!$omp parallel
!$omp parallel
!$omp single
! If OMP_NUM_THREADS=2,3 was set, the following should print:
! Inner: num_thds= 3
! Inner: num_thds= 3
! If nesting is not supported, the following should print:
! Inner: num_thds= 1
! Inner: num_thds= 1
      print *, "Inner: num_thds=", omp_get_num_threads()
!$omp end single
!$omp end parallel
!$omp barrier
      call omp_set_nested(.false.)
!$omp parallel
!$omp single
! Even if OMP_NUM_THREADS=2,3 was set, the following should print,
! because nesting is disabled:
! Inner: num_thds= 1
! Inner: num_thds= 1
      print *, "Inner: num_thds=", omp_get_num_threads()
!$omp end single
!$omp end parallel
!$omp barrier
!$omp single
! If OMP_NUM_THREADS=2,3 was set, the following should print:
! Outer: num_thds= 2
      print *, "Outer: num_thds=", omp_get_num_threads()
!$omp end single
!$omp end parallel
      end
Fortran
A.7
Interaction Between the num_threads
Clause and omp_set_dynamic
The following example demonstrates the num_threads clause (Section 2.4 on page
33) and the effect of the omp_set_dynamic routine (Section 3.2.7 on page 123) on
it.
The call to the omp_set_dynamic routine with argument 0 in C/C++, or .FALSE.
in Fortran, disables the dynamic adjustment of the number of threads in OpenMP
implementations that support it. In this case, 10 threads are provided. Note that in case
of an error the OpenMP implementation is free to abort the program or to supply any
number of threads available.
C/C++
Example A.7.1c
#include <omp.h>
int main()
{
omp_set_dynamic(0);
#pragma omp parallel num_threads(10)
{
/* do work here */
}
return 0;
}
C/C++
Fortran
Example A.7.1f
      PROGRAM EXAMPLE
      INCLUDE "omp_lib.h" ! or USE OMP_LIB
      CALL OMP_SET_DYNAMIC(.FALSE.)
!$OMP PARALLEL NUM_THREADS(10)
      ! do work here
!$OMP END PARALLEL
      END PROGRAM EXAMPLE
Fortran
The call to the omp_set_dynamic routine with a non-zero argument in C/C++, or
.TRUE. in Fortran, allows the OpenMP implementation to choose any number of
threads between 1 and 10 (see also Algorithm 2.1 in Section 2.4.1 on page 36).
Example A.7.2c
C/C++
#include <omp.h>
int main()
{
omp_set_dynamic(1);
#pragma omp parallel num_threads(10)
{
/* do work here */
}
return 0;
}
C/C++
Fortran
Example A.7.2f
      PROGRAM EXAMPLE
      INCLUDE "omp_lib.h" ! or USE OMP_LIB
      CALL OMP_SET_DYNAMIC(.TRUE.)
!$OMP PARALLEL NUM_THREADS(10)
      ! do work here
!$OMP END PARALLEL
      END PROGRAM EXAMPLE
Fortran
It is good practice to set the dyn-var ICV explicitly by calling the omp_set_dynamic
routine, as its default setting is implementation defined.
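The effect of the two settings can be observed by recording the team size that a parallel region actually receives. The following is a minimal sketch, not part of the specification; the helper name observed_team_size is hypothetical, and in a build without OpenMP it simply reports one thread.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: sets dyn-var, requests `requested` threads through
 * the num_threads clause, and returns the team size actually obtained. */
int observed_team_size(int dynamic, int requested)
{
    int team_size = 1;               /* serial build: a single thread */
#ifdef _OPENMP
    omp_set_dynamic(dynamic);        /* set dyn-var explicitly */
#else
    (void)dynamic;
    (void)requested;
#endif
    #pragma omp parallel num_threads(requested)
    {
        #pragma omp single
        {
#ifdef _OPENMP
            team_size = omp_get_num_threads();
#endif
        }
    }
    return team_size;
}
```

With dynamic adjustment disabled, the result is normally equal to the requested count; with it enabled, any value between 1 and the requested count is conforming.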
Fortran
A.8
Fortran Restrictions on the do Construct
If an end do directive follows a do-construct in which several DO statements share a
DO termination statement, then a do directive can only be specified for the outermost of
these DO statements. For more information, see Section 2.5.1 on page 39. The following
example contains correct usages of loop constructs:
Example A.8.1f
      SUBROUTINE WORK(I, J)
      INTEGER I,J
      END SUBROUTINE WORK
      SUBROUTINE DO_GOOD()
        INTEGER I, J
        REAL A(1000)
        DO 100 I = 1,10
!$OMP     DO
          DO 100 J = 1,10
            CALL WORK(I,J)
100     CONTINUE      ! !$OMP ENDDO implied here
!$OMP   DO
        DO 200 J = 1,10
200       A(I) = I + 1
!$OMP   ENDDO
!$OMP   DO
        DO 300 I = 1,10
          DO 300 J = 1,10
            CALL WORK(I,J)
300     CONTINUE
!$OMP   ENDDO
      END SUBROUTINE DO_GOOD
The following example is non-conforming because the matching do directive for the
end do does not precede the outermost loop:
Example A.8.2f
      SUBROUTINE WORK(I, J)
      INTEGER I,J
      END SUBROUTINE WORK
      SUBROUTINE DO_WRONG
        INTEGER I, J
        DO 100 I = 1,10
!$OMP     DO
          DO 100 J = 1,10
            CALL WORK(I,J)
100     CONTINUE
!$OMP   ENDDO
      END SUBROUTINE DO_WRONG
Fortran
Fortran
A.9
Fortran Private Loop Iteration Variables
In general, loop iteration variables are private when used in the do-loop of a do or
parallel do construct, or in sequential loops in a parallel construct (see
Section 2.5.1 on page 39 and Section 2.9.1 on page 84). In the following example of a
sequential loop in a parallel construct, the loop iteration variable I is private.
Example A.9.1f
SUBROUTINE PLOOP_1(A,N)
INCLUDE "omp_lib.h" ! or USE OMP_LIB
REAL A(*)
INTEGER I, MYOFFSET, N
!$OMP PARALLEL PRIVATE(MYOFFSET)
MYOFFSET = OMP_GET_THREAD_NUM()*N
DO I = 1, N
A(MYOFFSET+I) = FLOAT(I)
ENDDO
!$OMP END PARALLEL
END SUBROUTINE PLOOP_1
In exceptional cases, loop iteration variables can be made shared, as in the following
example:
Example A.9.2f
      SUBROUTINE PLOOP_2(A,B,N,I1,I2)
      REAL A(*), B(*)
      INTEGER I1, I2, N
!$OMP PARALLEL SHARED(A,B,I1,I2)
!$OMP SECTIONS
!$OMP SECTION
      DO I1 = I1, N
        IF (A(I1).NE.0.0) EXIT
      ENDDO
!$OMP SECTION
      DO I2 = I2, N
        IF (B(I2).NE.0.0) EXIT
      ENDDO
!$OMP END SECTIONS
!$OMP SINGLE
      IF (I1.LE.N) PRINT *, 'ITEMS IN A UP TO ', I1, 'ARE ALL ZERO.'
      IF (I2.LE.N) PRINT *, 'ITEMS IN B UP TO ', I2, 'ARE ALL ZERO.'
!$OMP END SINGLE
!$OMP END PARALLEL
      END SUBROUTINE PLOOP_2
Note however that the use of shared loop iteration variables can easily lead to race
conditions.
Fortran
A.10
The nowait clause
If there are multiple independent loops within a parallel region, you can use the
nowait clause (see Section 2.5.1 on page 39) to avoid the implied barrier at the end of
the loop construct, as follows:
Example A.10.1c
C/C++
#include <math.h>
void nowait_example(int n, int m, float *a, float *b, float *y, float *z)
{
int i;
#pragma omp parallel
{
#pragma omp for nowait
for (i=1; i<n; i++)
b[i] = (a[i] + a[i-1]) / 2.0;
#pragma omp for nowait
for (i=0; i<m; i++)
y[i] = sqrt(z[i]);
}
}
C/C++
Fortran
Example A.10.1f
SUBROUTINE NOWAIT_EXAMPLE(N, M, A, B, Y, Z)
INTEGER N, M
REAL A(*), B(*), Y(*), Z(*)
INTEGER I
!$OMP PARALLEL
!$OMP DO
DO I=2,N
B(I) = (A(I) + A(I-1)) / 2.0
ENDDO
!$OMP END DO NOWAIT
!$OMP DO
DO I=1,M
Y(I) = SQRT(Z(I))
ENDDO
!$OMP END DO NOWAIT
!$OMP END PARALLEL
END SUBROUTINE NOWAIT_EXAMPLE
Fortran
In the following example, static scheduling distributes the same logical iteration
numbers to the threads that execute the three loop regions. This allows the nowait
clause to be used, even though there is a data dependence between the loops. The
dependence is satisfied as long as the same thread executes the same logical iteration
numbers in each loop.
Note that the iteration count of the loops must be the same. The example satisfies this
requirement, since the iteration space of the first two loops is from 0 to n-1 (from 1 to
N in the Fortran version), while the iteration space of the last loop is from 1 to n (2 to
N+1 in the Fortran version).
Example A.10.2c
C/C++
#include <math.h>
void nowait_example2(int n, float *a, float *b, float *c, float *y, float *z)
{
int i;
#pragma omp parallel
{
#pragma omp for schedule(static) nowait
for (i=0; i<n; i++)
c[i] = (a[i] + b[i]) / 2.0f;
#pragma omp for schedule(static) nowait
for (i=0; i<n; i++)
z[i] = sqrtf(c[i]);
#pragma omp for schedule(static) nowait
for (i=1; i<=n; i++)
y[i] = z[i-1] + a[i];
}
}
C/C++
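One way to gain confidence in this pattern is to run the same three loops once with a team of threads and once with a single thread, then compare the results. The sketch below is an illustration only: the names chain and nowait_matches_serial are hypothetical, the sqrtf call is replaced by a multiplication so that no math library is needed, and the arrays are allocated with n+1 elements because the last loop touches index n.

```c
#include <stdlib.h>

/* Same loop structure as the example above; sqrtf is replaced by a
 * multiplication (an assumption made for this sketch). The if clause
 * selects between a normal team and a team of one thread. */
static void chain(int n, const float *a, const float *b,
                  float *c, float *y, float *z, int use_team)
{
    int i;
    (void)use_team;                  /* unused in a build without OpenMP */
    #pragma omp parallel if(use_team)
    {
        #pragma omp for schedule(static) nowait
        for (i = 0; i < n; i++)
            c[i] = (a[i] + b[i]) / 2.0f;
        #pragma omp for schedule(static) nowait
        for (i = 0; i < n; i++)
            z[i] = 3.0f * c[i];
        #pragma omp for schedule(static) nowait
        for (i = 1; i <= n; i++)
            y[i] = z[i-1] + a[i];
    }
}

/* Returns 1 if the team run and the single-thread run agree on y[1..n]. */
int nowait_matches_serial(int n)
{
    size_t len = (size_t)n + 1;
    float *a = malloc(len * sizeof *a), *b = malloc(len * sizeof *b);
    float *c = malloc(len * sizeof *c), *z = malloc(len * sizeof *z);
    float *y_par = malloc(len * sizeof *y_par);
    float *y_ser = malloc(len * sizeof *y_ser);
    int i, ok = 0;

    if (a && b && c && z && y_par && y_ser) {
        for (i = 0; i <= n; i++) {
            a[i] = (float)i;
            b[i] = (float)(2 * i);
        }
        chain(n, a, b, c, y_par, z, 1);   /* team of threads */
        chain(n, a, b, c, y_ser, z, 0);   /* team of one thread */
        ok = 1;
        for (i = 1; i <= n; i++)
            if (y_par[i] != y_ser[i])
                ok = 0;
    }
    free(a); free(b); free(c); free(z); free(y_par); free(y_ser);
    return ok;
}
```

A passing comparison does not prove the absence of a race, but a mismatch would show that the static schedules of the three loops were not aligned.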
Fortran
Example A.10.2f
SUBROUTINE NOWAIT_EXAMPLE2(N, A, B, C, Y, Z)
INTEGER N
REAL A(*), B(*), C(*), Y(*), Z(*)
INTEGER I
!$OMP PARALLEL
!$OMP DO SCHEDULE(STATIC)
DO I=1,N
C(I) = (A(I) + B(I)) / 2.0
ENDDO
!$OMP END DO NOWAIT
!$OMP DO SCHEDULE(STATIC)
DO I=1,N
Z(I) = SQRT(C(I))
ENDDO
!$OMP END DO NOWAIT
!$OMP DO SCHEDULE(STATIC)
DO I=2,N+1
Y(I) = Z(I-1) + A(I)
ENDDO
!$OMP END DO NOWAIT
!$OMP END PARALLEL
END SUBROUTINE NOWAIT_EXAMPLE2
Fortran
A.11
The collapse clause
For the following three examples, see Section 2.5.1 on page 39 for a description of the
collapse clause, Section 2.8.7 on page 82 for a description of the ordered
construct, and Section 2.9.3.5 on page 101 for a description of the lastprivate
clause.
In the following example, the k and j loops are associated with the loop construct. So
the iterations of the k and j loops are collapsed into one loop with a larger iteration
space, and that loop is then divided among the threads in the current team. Since the i
loop is not associated with the loop construct, it is not collapsed, and the i loop is
executed sequentially in its entirety in every iteration of the collapsed k and j loop.
The variable j can be omitted from the private clause when the collapse clause
is used since it is implicitly private. However, if the collapse clause is omitted then
j will be shared if it is omitted from the private clause. In either case, k is implicitly
private and could be omitted from the private clause.
Example A.11.1c
C/C++
void bar(float *a, int i, int j, int k);
int kl, ku, ks, jl, ju, js, il, iu,is;
void sub(float *a)
{
    int i, j, k;
    #pragma omp for collapse(2) private(i, k, j)
    for (k=kl; k<=ku; k+=ks)
        for (j=jl; j<=ju; j+=js)
            for (i=il; i<=iu; i+=is)
                bar(a,i,j,k);
}
C/C++
Fortran
Example A.11.1f
subroutine sub(a)
real a(*)
integer kl, ku, ks, jl, ju, js, il, iu, is
common /csub/ kl, ku, ks, jl, ju, js, il, iu, is
integer i, j, k
!$omp do collapse(2) private(i,j,k)
do k = kl, ku, ks
do j = jl, ju, js
do i = il, iu, is
call bar(a,i,j,k)
enddo
enddo
enddo
!$omp end do
end subroutine
Fortran
In the next example, the k and j loops are associated with the loop construct. So the
iterations of the k and j loops are collapsed into one loop with a larger iteration space,
and that loop is then divided among the threads in the current team.
The sequential execution of the iterations in the k and j loops determines the order of
the iterations in the collapsed iteration space. This implies that in the sequentially last
iteration of the collapsed iteration space, k will have the value 2 and j will have the
value 3. Since klast and jlast are lastprivate, their values are assigned by the
sequentially last iteration of the collapsed k and j loop. This example prints: 2 3.
Example A.11.2c
C/C++
#include <stdio.h>
void test()
{
int j, k, jlast, klast;
#pragma omp parallel
{
#pragma omp for collapse(2) lastprivate(jlast, klast)
for (k=1; k<=2; k++)
for (j=1; j<=3; j++)
{
jlast=j;
klast=k;
}
#pragma omp single
printf("%d %d\n", klast, jlast);
}
}
C/C++
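The order of the collapsed iteration space can be made explicit by mapping each logical iteration number back to k and j. The sketch below does this for the loop bounds of the example (k from 1 to 2, j from 1 to 3); the helper name collapsed_to_kj is hypothetical.

```c
/* Hypothetical helper: maps logical iteration n (0-based, 0..5) of the
 * collapsed k/j nest from the example back to the values of k and j. */
void collapsed_to_kj(int n, int *k, int *j)
{
    *k = 1 + n / 3;   /* 3 iterations of the j loop per k iteration */
    *j = 1 + n % 3;
}
```

The sequentially last logical iteration is n = 5, which maps to k = 2 and j = 3, the values that the lastprivate clause copies out.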
Fortran
Example A.11.2f
program test
!$omp parallel
!$omp do private(j,k) collapse(2) lastprivate(jlast, klast)
do k = 1,2
do j = 1,3
jlast=j
klast=k
enddo
enddo
!$omp end do
!$omp single
print *, klast, jlast
!$omp end single
!$omp end parallel
end program test
Fortran
The next example illustrates the interaction of the collapse and ordered clauses.
In the example, the loop construct has both a collapse clause and an ordered
clause. The collapse clause causes the iterations of the k and j loops to be collapsed
into one loop with a larger iteration space, and that loop is divided among the threads in
the current team. An ordered clause is added to the loop construct, because an
ordered region binds to the loop region arising from the loop construct.
According to Section 2.8.7 on page 82, a thread must not execute more than one ordered
region that binds to the same loop region. So the collapse clause is required for the
example to be conforming. With the collapse clause, the iterations of the k and j
loops are collapsed into one loop, and therefore only one ordered region will bind to the
collapsed k and j loop. Without the collapse clause, there would be two ordered
regions that bind to each iteration of the k loop (one arising from the first iteration of
the j loop, and the other arising from the second iteration of the j loop).
The code prints
0 1 1
0 1 2
0 2 1
1 2 2
1 3 1
1 3 2
Example A.11.3c
C/C++
#include <omp.h>
#include <stdio.h>
void work(int a, int j, int k);
void sub()
{
    int j, k, a;
    #pragma omp parallel num_threads(2)
    {
        #pragma omp for collapse(2) ordered private(j,k) schedule(static,3)
        for (k=1; k<=3; k++)
            for (j=1; j<=2; j++)
            {
                #pragma omp ordered
                printf("%d %d %d\n", omp_get_thread_num(), k, j);
                /* end ordered */
                work(a,j,k);
            }
    }
}
C/C++
Fortran
Example A.11.3f
      program test
      include 'omp_lib.h'
!$omp parallel num_threads(2)
!$omp do collapse(2) ordered private(j,k) schedule(static,3)
      do k = 1,3
        do j = 1,2
!$omp ordered
          print *, omp_get_thread_num(), k, j
!$omp end ordered
          call work(a,j,k)
        enddo
      enddo
!$omp end do
!$omp end parallel
      end program test
Fortran
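The printed thread numbers follow from the schedule(static,3) clause: chunks of three logical iterations are assigned to the threads round-robin in thread-number order. The sketch below reproduces that assignment for the collapsed loop of Example A.11.3 (k from 1 to 3, j from 1 to 2, two threads); the helper names static_owner and iteration_info are hypothetical.

```c
/* Thread that executes logical iteration n (0-based) under
 * schedule(static, chunk) with nthreads threads: chunks are handed out
 * round-robin in the order of the thread number. */
int static_owner(int n, int chunk, int nthreads)
{
    return (n / chunk) % nthreads;
}

/* Hypothetical helper for the example's loop nest: fills in the thread
 * number and the k and j values of logical iteration n (0..5). */
void iteration_info(int n, int *thread, int *k, int *j)
{
    *thread = static_owner(n, 3, 2);   /* chunk size 3, two threads */
    *k = 1 + n / 2;                    /* 2 iterations of j per k */
    *j = 1 + n % 2;
}
```

Iterations 0 to 2 belong to thread 0 and iterations 3 to 5 to thread 1, which matches the thread numbers in the output of the example.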
A.12
The parallel sections Construct
In the following example (for Section 2.6.2 on page 57) routines XAXIS, YAXIS, and
ZAXIS can be executed concurrently. The first section directive is optional. Note
that all section directives need to appear in the parallel sections construct.
Example A.12.1c
C/C++
void XAXIS();
void YAXIS();
void ZAXIS();
void sect_example()
{
#pragma omp parallel sections
{
#pragma omp section
XAXIS();
#pragma omp section
YAXIS();
#pragma omp section
ZAXIS();
}
}
C/C++
Fortran
Example A.12.1f
SUBROUTINE SECT_EXAMPLE()
!$OMP PARALLEL SECTIONS
!$OMP SECTION
CALL XAXIS()
!$OMP SECTION
CALL YAXIS()
!$OMP SECTION
CALL ZAXIS()
!$OMP END PARALLEL SECTIONS
END SUBROUTINE SECT_EXAMPLE
Fortran
A.13
The firstprivate Clause and the
sections Construct
In the following example of the sections construct (Section 2.5.2 on page 48) the
firstprivate clause is used to initialize the private copy of section_count of
each thread. The problem is that the section constructs modify section_count,
which breaks the independence of the section constructs. When different threads
execute each section, both sections will print the value 1. When the same thread
executes the two sections, one section will print the value 1 and the other will print the
value 2. Since the order of execution of the two sections in this case is unspecified, it is
unspecified which section prints which value.
Example A.13.1c
C/C++
#include <omp.h>
#include <stdio.h>
#define NT 4
int main( ) {
int section_count = 0;
omp_set_dynamic(0);
omp_set_num_threads(NT);
#pragma omp parallel
#pragma omp sections firstprivate( section_count )
{
#pragma omp section
{
section_count++;
/* may print the number one or two */
printf( "section_count %d\n", section_count );
}
#pragma omp section
{
section_count++;
/* may print the number one or two */
printf( "section_count %d\n", section_count );
}
}
return 0;
}
C/C++
Fortran
Example A.13.1f
program section
use omp_lib
integer :: section_count = 0
integer, parameter :: NT = 4
call omp_set_dynamic(.false.)
call omp_set_num_threads(NT)
!$omp parallel
!$omp sections firstprivate ( section_count )
!$omp section
section_count = section_count + 1
! may print the number one or two
print *, 'section_count', section_count
!$omp section
section_count = section_count + 1
! may print the number one or two
print *, 'section_count', section_count
!$omp end sections
!$omp end parallel
end program section
Fortran
A.14
The single Construct
The following example demonstrates the single construct (Section 2.5.3 on page 50).
In the example, only one thread prints each of the progress messages. All other threads
will skip the single region and stop at the barrier at the end of the single construct
until all threads in the team have reached the barrier. If other threads can proceed
without waiting for the thread executing the single region, a nowait clause can be
specified, as is done in the third single construct in this example. The user must not
make any assumptions as to which thread will execute a single region.
Example A.14.1c
C/C++
#include <stdio.h>
void work1() {}
void work2() {}
void single_example()
{
#pragma omp parallel
{
#pragma omp single
printf("Beginning work1.\n");
work1();
#pragma omp single
printf("Finishing work1.\n");
#pragma omp single nowait
printf("Finished work1 and beginning work2.\n");
work2();
}
}
C/C++
Fortran
Example A.14.1f
SUBROUTINE WORK1()
END SUBROUTINE WORK1
SUBROUTINE WORK2()
END SUBROUTINE WORK2
PROGRAM SINGLE_EXAMPLE
!$OMP PARALLEL
!$OMP SINGLE
print *, "Beginning work1."
!$OMP END SINGLE
CALL WORK1()
!$OMP SINGLE
print *, "Finishing work1."
!$OMP END SINGLE
!$OMP SINGLE
print *, "Finished work1 and beginning work2."
!$OMP END SINGLE NOWAIT
CALL WORK2()
!$OMP END PARALLEL
END PROGRAM SINGLE_EXAMPLE
Fortran
A.15
Tasking Constructs
The following example shows how to traverse a tree-like structure using explicit tasks
(see Section 2.7 on page 61). Note that the traverse function should be called from
within a parallel region for the different specified tasks to be executed in parallel. Also
note that the tasks will be executed in no specified order because there are no
synchronization directives. Thus, assuming that the traversal will be done in post order,
as in the sequential code, is wrong.
Example A.15.1c
C/C++
struct node {
struct node *left;
struct node *right;
};
extern void process(struct node *);
void traverse( struct node *p ) {
if (p->left)
#pragma omp task
// p is firstprivate by default
traverse(p->left);
if (p->right)
#pragma omp task
// p is firstprivate by default
traverse(p->right);
process(p);
}
C/C++
Fortran
Example A.15.1f
RECURSIVE SUBROUTINE traverse ( P )
TYPE Node
TYPE(Node), POINTER :: left, right
END TYPE Node
TYPE(Node) :: P
IF (associated(P%left)) THEN
!$OMP TASK
! P is firstprivate by default
call traverse(P%left)
!$OMP END TASK
ENDIF
IF (associated(P%right)) THEN
!$OMP TASK
! P is firstprivate by default
call traverse(P%right)
!$OMP END TASK
ENDIF
CALL process ( P )
END SUBROUTINE
Fortran
In the next example, we force a postorder traversal of the tree by adding a taskwait
directive (see Section 2.8.4 on page 72). Now, we can safely assume that the left and
right children have been processed before we process the current node.
Example A.15.2c
C/C++
struct node {
struct node *left;
struct node *right;
};
extern void process(struct node *);
void postorder_traverse( struct node *p ) {
if (p->left)
#pragma omp task
// p is firstprivate by default
postorder_traverse(p->left);
if (p->right)
#pragma omp task
// p is firstprivate by default
postorder_traverse(p->right);
#pragma omp taskwait
process(p);
}
C/C++
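The ordering guarantee given by taskwait can be observed with a small driver. The sketch below is an illustration, not part of the specification: the done and children_done fields, the process body, and the function postorder_holds are all hypothetical additions used to record whether each node's children were finished before the node itself was processed.

```c
struct node {
    struct node *left;
    struct node *right;
    int done;            /* set once the node has been processed */
    int children_done;   /* 1 if both children were finished first */
};

/* Hypothetical process(): records whether the children are already done. */
static void process(struct node *p)
{
    p->children_done = (!p->left  || p->left->done) &&
                       (!p->right || p->right->done);
    p->done = 1;
}

static void postorder_traverse(struct node *p)
{
    if (p->left) {
        #pragma omp task   /* p is firstprivate by default */
        postorder_traverse(p->left);
    }
    if (p->right) {
        #pragma omp task   /* p is firstprivate by default */
        postorder_traverse(p->right);
    }
    #pragma omp taskwait   /* wait for both child tasks */
    process(p);
}

/* Builds a three-node tree, traverses it from a single thread of a
 * parallel region, and returns 1 if postorder was respected. */
int postorder_holds(void)
{
    struct node l = {0, 0, 0, 0}, r = {0, 0, 0, 0};
    struct node root = {&l, &r, 0, 0};
    #pragma omp parallel
    #pragma omp single
    postorder_traverse(&root);
    return root.children_done && l.children_done && r.children_done;
}
```

Removing the taskwait directive would allow process to run on the root before its children, in which case the function could return 0.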
Fortran
Example A.15.2f
RECURSIVE SUBROUTINE traverse ( P )
TYPE Node
TYPE(Node), POINTER :: left, right
END TYPE Node
TYPE(Node) :: P
IF (associated(P%left)) THEN
!$OMP TASK
! P is firstprivate by default
call traverse(P%left)
!$OMP END TASK
ENDIF
IF (associated(P%right)) THEN
!$OMP TASK
! P is firstprivate by default
call traverse(P%right)
!$OMP END TASK
ENDIF
!$OMP TASKWAIT
CALL process ( P )
END SUBROUTINE
Fortran
The following example demonstrates how to use the task construct to process elements
of a linked list in parallel. The thread executing the single region generates all of the
explicit tasks, which are then executed by the threads in the current team. The pointer p
is firstprivate by default on the task construct so it is not necessary to specify it
in a firstprivate clause (see page 86).
Example A.15.3c
C/C++
typedef struct node node;
struct node {
int data;
node * next;
};
void process(node * p)
{
/* do work here */
}
void increment_list_items(node * head)
{
#pragma omp parallel
{
#pragma omp single
{
node * p = head;
while (p) {
#pragma omp task
// p is firstprivate by default
process(p);
p = p->next;
}
}
}
}
C/C++
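A complete, checkable variant of the list example might look as follows. This is a sketch under stated assumptions: the body of process (adding one to data) and the driver incremented_sum are hypothetical, chosen so that the result can be verified after the parallel region ends.

```c
#include <stddef.h>

typedef struct node node;
struct node {
    int data;
    node *next;
};

/* Hypothetical work: each task modifies only the node it was given. */
static void process(node *p)
{
    p->data += 1;
}

static void increment_list_items(node *head)
{
    #pragma omp parallel
    {
        #pragma omp single
        {
            node *p = head;
            while (p) {
                #pragma omp task   /* p is firstprivate by default */
                process(p);
                p = p->next;
            }
        }
    }   /* all tasks are complete at the end of the parallel region */
}

/* Builds the list 0 -> 1 -> 2 -> 3, increments every item, and returns
 * the sum of the items afterwards. */
int incremented_sum(void)
{
    node n[4];
    int i, sum = 0;
    for (i = 0; i < 4; i++) {
        n[i].data = i;
        n[i].next = (i < 3) ? &n[i + 1] : NULL;
    }
    increment_list_items(n);
    for (i = 0; i < 4; i++)
        sum += n[i].data;
    return sum;
}
```

Because each task receives its own copy of p at task creation, advancing p in the generating loop does not affect tasks that were already created.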
Fortran
Example A.15.3f
MODULE LIST
TYPE NODE
INTEGER :: PAYLOAD
TYPE (NODE), POINTER :: NEXT
END TYPE NODE
CONTAINS
SUBROUTINE PROCESS(p)
TYPE (NODE), POINTER :: P
! do work here
END SUBROUTINE
SUBROUTINE INCREMENT_LIST_ITEMS (HEAD)
TYPE (NODE), POINTER :: HEAD
TYPE (NODE), POINTER :: P
!$OMP PARALLEL PRIVATE(P)
!$OMP SINGLE
P => HEAD
DO
!$OMP TASK
! P is firstprivate by default
CALL PROCESS(P)
!$OMP END TASK
P => P%NEXT
IF ( .NOT. ASSOCIATED (P) ) EXIT
END DO
!$OMP END SINGLE
!$OMP END PARALLEL
END SUBROUTINE
END MODULE
Fortran
The fib() function should be called from within a parallel region for the different
specified tasks to be executed in parallel. Also, only one thread of the parallel
region should call fib() unless multiple concurrent Fibonacci computations are
desired.
Example A.15.4c
C/C++
int fib(int n) {
    int i, j;
    if (n<2)
        return n;
    else {
        #pragma omp task shared(i)
        i=fib(n-1);
        #pragma omp task shared(j)
        j=fib(n-2);
        #pragma omp taskwait
        return i+j;
    }
}
C/C++
Example A.15.4f
Fortran
RECURSIVE INTEGER FUNCTION fib(n) RESULT(res)
  INTEGER n, i, j
  IF ( n .LT. 2) THEN
    res = n
  ELSE
    !$OMP TASK SHARED(i)
    i = fib( n-1 )
    !$OMP END TASK
    !$OMP TASK SHARED(j)
    j = fib( n-2 )
    !$OMP END TASK
    !$OMP TASKWAIT
    res = i+j
  END IF
END FUNCTION
Fortran
Note: There are more efficient algorithms for computing Fibonacci numbers. This
classic recursion algorithm is for illustrative purposes.
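The calling convention recommended for fib() above can be sketched as follows. This is an illustrative, non-normative sketch: the wrapper name fib_driver is hypothetical and is not one of the examples in this appendix. It opens a parallel region and lets exactly one thread start the recursion from a single region, so that the remaining threads of the team are available to execute the generated tasks.

```c
/* Same recursion as Example A.15.4c. */
static int fib(int n)
{
    int i, j;
    if (n < 2)
        return n;
    #pragma omp task shared(i)
    i = fib(n - 1);
    #pragma omp task shared(j)
    j = fib(n - 2);
    #pragma omp taskwait
    return i + j;
}

/* Hypothetical driver (name chosen for illustration only). */
int fib_driver(int n)
{
    int result = 0;
    #pragma omp parallel shared(result)
    {
        /* Only one thread starts the computation; the other threads
           wait at the implicit barrier of the single region and may
           execute the tasks generated by fib(). */
        #pragma omp single
        result = fib(n);
    }
    return result;
}
```

If the pragmas are ignored (for example, when compiling without OpenMP support), the sketch degenerates to the ordinary sequential recursion and returns the same value.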
The following example demonstrates a way to generate a large number of tasks with one
thread and execute them with the threads in the team (see Section 2.7.3 on page 65).
While generating these tasks, the implementation may reach its limit on unassigned
tasks. If it does, the implementation is allowed to cause the thread executing the task
generating loop to suspend its task at the task scheduling point in the task directive,
and start executing unassigned tasks. Once the number of unassigned tasks is
sufficiently low, the thread may resume execution of the task generating loop.
Example A.15.5c
C/C++
#define LARGE_NUMBER 10000000
double item[LARGE_NUMBER];
extern void process(double);

int main() {
    #pragma omp parallel
    {
        #pragma omp single
        {
            int i;
            for (i=0; i<LARGE_NUMBER; i++)
                #pragma omp task
                // i is firstprivate, item is shared
                process(item[i]);
        }
    }
}
C/C++
Example A.15.5f
Fortran
real*8 item(10000000)
integer i
!$omp parallel
!$omp single ! loop iteration variable i is private
do i=1,10000000
  !$omp task
  ! i is firstprivate, item is shared
  call process(item(i))
  !$omp end task
end do
!$omp end single
!$omp end parallel
end
Fortran
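A small variation on the pattern above makes the effect of the generated tasks observable. The following sketch is illustrative only; the function name sum_with_tasks and the use of an atomic update to combine results are assumptions made for this sketch, not part of the example above. One thread generates a task per element, any thread of the team may execute each task, and the implicit barrier at the end of the single region guarantees that all tasks have completed before total is read.

```c
/* Hypothetical helper: adds up n elements, one explicit task per element. */
double sum_with_tasks(const double *item, int n)
{
    double total = 0.0;
    #pragma omp parallel shared(total)
    {
        #pragma omp single
        {
            int i;
            for (i = 0; i < n; i++) {
                /* i is captured by value when the task is generated;
                   total and item refer to the original variables. */
                #pragma omp task firstprivate(i) shared(total, item)
                {
                    #pragma omp atomic update
                    total += item[i];
                }
            }
        }   /* implicit barrier: all generated tasks are complete here */
    }
    return total;
}
```

Creating one task per element is rarely efficient in practice; the sketch only illustrates the generation and completion rules.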
The following example is the same as the previous one, except that the tasks are
generated in an untied task (see Section 2.7 on page 61). While generating the tasks, the
implementation may reach its limit on unassigned tasks. If it does, the implementation is
allowed to cause the thread executing the task generating loop to suspend its task at the
task scheduling point in the task directive, and start executing unassigned tasks. If
that thread begins execution of a task that takes a long time to complete, the other
threads may complete all the other tasks before it is finished.
In this case, since the loop is in an untied task, any other thread is eligible to resume the
task generating loop. In the previous examples, the other threads would be forced to idle
until the generating thread finishes its long task, since the task generating loop was in a
tied task.
Example A.15.6c
C/C++
#define LARGE_NUMBER 10000000
double item[LARGE_NUMBER];
extern void process(double);

int main() {
    #pragma omp parallel
    {
        #pragma omp single
        {
            int i;
            #pragma omp task untied
            // i is firstprivate, item is shared
            {
                for (i=0; i<LARGE_NUMBER; i++)
                    #pragma omp task
                    process(item[i]);
            }
        }
    }
    return 0;
}
C/C++
Example A.15.6f
Fortran
real*8 item(10000000)
!$omp parallel
!$omp single
!$omp task untied
! loop iteration variable i is private
do i=1,10000000
  !$omp task ! i is firstprivate, item is shared
  call process(item(i))
  !$omp end task
end do
!$omp end task
!$omp end single
!$omp end parallel
end
Fortran
The following two examples demonstrate how the scheduling rules illustrated in
Section 2.7.3 on page 65 affect the usage of threadprivate variables in tasks. A
threadprivate variable can be modified by another task that is executed by the
same thread. Thus, the value of a threadprivate variable cannot be assumed to be
unchanged across a task scheduling point. In untied tasks, task scheduling points may be
added in any place by the implementation.
A task switch may occur at a task scheduling point. A single thread may execute both of
the task regions that modify tp. The parts of these task regions in which tp is modified
may be executed in any order so the resulting value of var can be either 1 or 2.
Example A.15.7c
C/C++
int tp;
#pragma omp threadprivate(tp)
int var;

void work()
{
    #pragma omp task
    {
        /* do work here */
        #pragma omp task
        {
            tp = 1;
            /* do work here */
            #pragma omp task
            {
                /* no modification of tp */
            }
            var = tp; //value of tp can be 1 or 2
        }
        tp = 2;
    }
}
C/C++
Example A.15.7f
Fortran
module example
  integer tp
  !$omp threadprivate(tp)
  integer var
contains
  subroutine work
    !$omp task
      ! do work here
      !$omp task
        tp = 1
        ! do work here
        !$omp task
          ! no modification of tp
        !$omp end task
        var = tp    ! value of var can be 1 or 2
      !$omp end task
      tp = 2
    !$omp end task
  end subroutine
end module
Fortran
In this example, scheduling constraints (see Section 2.7.3 on page 65) prohibit a thread
in the team from executing a new task that modifies tp while another such task region
tied to the same thread is suspended. Therefore, the value written will persist across the
task scheduling point.
Example A.15.8c
C/C++
int tp;
#pragma omp threadprivate(tp)
int var;

void work()
{
    #pragma omp parallel
    {
        /* do work here */
        #pragma omp task
        {
            tp++;
            /* do work here */
            #pragma omp task
            {
                /* do work here but don't modify tp */
            }
            var = tp; //Value does not change after write above
        }
    }
}
C/C++
Example A.15.8f
Fortran
module example
  integer tp
  !$omp threadprivate(tp)
  integer var
contains
  subroutine work
    !$omp parallel
      ! do work here
      !$omp task
        tp = tp + 1
        ! do work here
        !$omp task
          ! do work here but don't modify tp
        !$omp end task
        var = tp    ! value does not change after write above
      !$omp end task
    !$omp end parallel
  end subroutine
end module
Fortran
The following two examples demonstrate how the scheduling rules illustrated in
Section 2.7.3 on page 65 affect the usage of locks and critical sections in tasks. If a lock
is held across a task scheduling point, no attempt should be made to acquire the same
lock in any code that may be interleaved. Otherwise, a deadlock is possible.
In the example below, suppose the thread executing task 1 defers task 2. When it
encounters the task scheduling point at task 3, it could suspend task 1 and begin task 2
which will result in a deadlock when it tries to enter critical region 1.
Example A.15.9c
C/C++
void work()
{
    #pragma omp task
    { //Task 1
        #pragma omp task
        { //Task 2
            #pragma omp critical //Critical region 1
            {/*do work here */ }
        }
        #pragma omp critical //Critical Region 2
        {
            //Capture data for the following task
            #pragma omp task
            { /* do work here */ } //Task 3
        }
    }
}
C/C++
Example A.15.9f
Fortran
module example
contains
  subroutine work
    !$omp task
      ! Task 1
      !$omp task
        ! Task 2
        !$omp critical
          ! Critical region 1
          ! do work here
        !$omp end critical
      !$omp end task
      !$omp critical
        ! Critical region 2
        ! Capture data for the following task
        !$omp task
          ! Task 3
          ! do work here
        !$omp end task
      !$omp end critical
    !$omp end task
  end subroutine
end module
Fortran
In the following example, lock is held across a task scheduling point. However,
according to the scheduling restrictions outlined in Section 2.7.3 on page 65, the
executing thread can't begin executing one of the non-descendant tasks that also acquires
lock before the task region is complete. Therefore, no deadlock is possible.
Example A.15.10c
C/C++
#include <omp.h>

void work() {
    omp_lock_t lock;
    omp_init_lock(&lock);
    #pragma omp parallel
    {
        int i;
        #pragma omp for
        for (i = 0; i < 100; i++) {
            #pragma omp task
            {
                // lock is shared by default in the task
                omp_set_lock(&lock);
                // Capture data for the following task
                #pragma omp task
                // Task Scheduling Point 1
                { /* do work here */ }
                omp_unset_lock(&lock);
            }
        }
    }
    omp_destroy_lock(&lock);
}
C/C++
Example A.15.10f
Fortran
module example
  include 'omp_lib.h'
  integer (kind=omp_lock_kind) lock
  integer i
contains
  subroutine work
    call omp_init_lock(lock)
    !$omp parallel
    !$omp do
    do i=1,100
      !$omp task
        ! Outer task
        call omp_set_lock(lock)   ! lock is shared by
                                  ! default in the task
        ! Capture data for the following task
        !$omp task
          ! Task Scheduling Point 1
          ! do work here
        !$omp end task
        call omp_unset_lock(lock)
      !$omp end task
    end do
    !$omp end parallel
    call omp_destroy_lock(lock)
  end subroutine
end module
Fortran
The following examples illustrate the use of the mergeable clause in the task
construct. In this first example, the task construct has been annotated with the
mergeable clause (see Section 2.7.1 on page 61). The addition of this clause allows
the implementation to reuse the data environment (including the ICVs) of the parent task
for the task inside foo if the task is included or undeferred (see Section 1.2.3 on page
8). Thus, the result of the execution may differ depending on whether the task is merged
or not. Therefore the mergeable clause needs to be used with caution. In this example,
the use of the mergeable clause is safe. As x is a shared variable the outcome does not
depend on whether or not the task is merged (that is, the task will always increment the
same variable and will always compute the same value for x).
Example A.15.11c
C/C++
#include <stdio.h>

void foo ( )
{
    int x = 2;
    #pragma omp task shared(x) mergeable
    {
        x++;
    }
    #pragma omp taskwait
    printf("%d\n",x); // prints 3
}
C/C++
Example A.15.11f
Fortran
subroutine foo()
  integer :: x
  x = 2
  !$omp task shared(x) mergeable
  x = x + 1
  !$omp end task
  !$omp taskwait
  print *, x    ! prints 3
end subroutine
Fortran
This second example shows an incorrect use of the mergeable clause. In this
example, the created task will access different instances of the variable x if the task is
not merged, as x is firstprivate, but it will access the same variable x if the task
is merged. As a result, the behavior of the program is unspecified and it can print two
different values for x depending on the decisions taken by the implementation.
Example A.15.12c
C/C++
#include <stdio.h>

void foo ( )
{
    int x = 2;
    #pragma omp task mergeable
    {
        x++;
    }
    #pragma omp taskwait
    printf("%d\n",x); // prints 2 or 3
}
C/C++
Example A.15.12f
Fortran
subroutine foo()
  integer :: x
  x = 2
  !$omp task mergeable
  x = x + 1
  !$omp end task
  !$omp taskwait
  print *, x    ! prints 2 or 3
end subroutine
Fortran
The following example shows the use of the final clause (see Section 2.7.1 on page
61) and the omp_in_final API call (see Section 3.2.20 on page 140) in a recursive
binary search program. To reduce overhead, once a certain depth of recursion is reached
the program uses the final clause to create only included tasks, which allow
additional optimizations.
The use of the omp_in_final API call allows programmers to optimize their code by
specifying which parts of the program are not necessary when a task can create only
included tasks (that is, the code is inside a final task). In this example, a
different state variable is not necessary once the program reaches the part of the
computation that is finalized, so copying from the parent state to the new state is
eliminated. The allocation of new_state on the stack could also be avoided, but doing so
would make this example less clear. The final clause is most effective when used in
conjunction with the mergeable clause, since all tasks created in a final task region
are included tasks that can be merged if the mergeable clause is present.
Example A.15.13c
C/C++
#include <string.h>
#include <omp.h>
#define LIMIT 3 /* arbitrary limit on recursion depth */

void check_solution(char *);

void bin_search (int pos, int n, char *state)
{
    if ( pos == n ) {
        check_solution(state);
        return;
    }
    #pragma omp task final( pos > LIMIT ) mergeable
    {
        char new_state[n];
        if (!omp_in_final() ) {
            memcpy(new_state, state, pos );
            state = new_state;
        }
        state[pos] = 0;
        bin_search(pos+1, n, state );
    }
    #pragma omp task final( pos > LIMIT ) mergeable
    {
        char new_state[n];
        if (! omp_in_final() ) {
            memcpy(new_state, state, pos );
            state = new_state;
        }
        state[pos] = 1;
        bin_search(pos+1, n, state );
    }
    #pragma omp taskwait
}
C/C++
Example A.15.13f
Fortran
recursive subroutine bin_search(pos, n, state)
  use omp_lib
  integer :: pos, n
  character, pointer :: state(:)
  character, target, dimension(n) :: new_state1, new_state2
  integer, parameter :: LIMIT = 3
  if (pos .eq. n) then
    call check_solution(state)
    return
  endif
  !$omp task final(pos > LIMIT) mergeable
  if (.not. omp_in_final()) then
    new_state1(1:pos) = state(1:pos)
    state => new_state1
  endif
  state(pos+1) = 'z'
  call bin_search(pos+1, n, state)
  !$omp end task
  !$omp task final(pos > LIMIT) mergeable
  if (.not. omp_in_final()) then
    new_state2(1:pos) = state(1:pos)
    state => new_state2
  endif
  state(pos+1) = 'y'
  call bin_search(pos+1, n, state)
  !$omp end task
  !$omp taskwait
end subroutine
Fortran
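The depth cutoff used above can also be shown on a smaller problem. The following sketch is illustrative only and rests on stated assumptions: the names tree_sum and CUTOFF_DEPTH are hypothetical, and the cutoff value is arbitrary. Once the recursion depth reaches the cutoff, the final clause makes every descendant task an included task. The mergeable clause is believed to be safe here for the reason given for Example A.15.11c: the only variables written inside the tasks (left and right) are shared, and the remaining variables are only read.

```c
#define CUTOFF_DEPTH 4   /* hypothetical, arbitrary cutoff */

/* Hypothetical helper: sums a[lo] .. a[hi-1] by recursive halving. */
long tree_sum(const int *a, int lo, int hi, int depth)
{
    long left = 0, right = 0;
    int mid;

    if (hi <= lo)
        return 0;
    if (hi - lo == 1)
        return a[lo];

    mid = lo + (hi - lo) / 2;

    /* Below the cutoff depth the tasks are final, so all tasks
       generated inside them are included tasks. */
    #pragma omp task shared(left) final(depth >= CUTOFF_DEPTH) mergeable
    left = tree_sum(a, lo, mid, depth + 1);

    #pragma omp task shared(right) final(depth >= CUTOFF_DEPTH) mergeable
    right = tree_sum(a, mid, hi, depth + 1);

    #pragma omp taskwait
    return left + right;
}
```

As with fib(), the function would normally be called by one thread from within a parallel region; called outside any parallel region it still computes the same result.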
The following example illustrates the difference between the if and the final
clauses. The if clause has a local effect. In the first nest of tasks, the one that has the
if clause will be undeferred but the task nested inside that task will not be affected by
the if clause and will be created as usual. Alternatively, the final clause affects all
task constructs in the final task region but not the final task itself. In the second
nest of tasks, the nested tasks will be created as included tasks. Note also that the
conditions for the if and final clauses are usually the opposite.
Example A.15.14c
C/C++
void bar(void);

void foo ( )
{
    int i;
    #pragma omp task if(0) // This task is undeferred
    {
        #pragma omp task // This task is a regular task
        for (i = 0; i < 3; i++) {
            #pragma omp task // This task is a regular task
            bar();
        }
    }
    #pragma omp task final(1) // This task is a regular task
    {
        #pragma omp task // This task is included
        for (i = 0; i < 3; i++) {
            #pragma omp task // This task is also included
            bar();
        }
    }
}
C/C++
Example A.15.14f
Fortran
subroutine foo()
  integer i
  !$omp task if(.FALSE.) ! This task is undeferred
    !$omp task ! This task is a regular task
      do i = 1, 3
        !$omp task ! This task is a regular task
          call bar()
        !$omp end task
      enddo
    !$omp end task
  !$omp end task
  !$omp task final(.TRUE.) ! This task is a regular task
    !$omp task ! This task is included
      do i = 1, 3
        !$omp task ! This task is also included
          call bar()
        !$omp end task
      enddo
    !$omp end task
  !$omp end task
end subroutine
Fortran
A.16 The taskyield Directive
The following example illustrates the use of the taskyield directive (see
Section 2.7.2 on page 64). The tasks in the example compute something useful and then
do some computation that must be done in a critical region. By using taskyield
when a task cannot get access to the critical region the implementation can suspend
the current task and schedule some other task that can do something useful.
Example A.16.1c
C/C++
#include <omp.h>

void something_useful ( void );
void something_critical ( void );

void foo ( omp_lock_t * lock, int n )
{
    int i;
    for ( i = 0; i < n; i++ )
        #pragma omp task
        {
            something_useful();
            while ( !omp_test_lock(lock) ) {
                #pragma omp taskyield
            }
            something_critical();
            omp_unset_lock(lock);
        }
}
C/C++
Example A.16.1f
Fortran
subroutine foo ( lock, n )
  use omp_lib
  integer (kind=omp_lock_kind) :: lock
  integer n
  integer i
  do i = 1, n
    !$omp task
      call something_useful()
      do while ( .not. omp_test_lock(lock) )
        !$omp taskyield
      end do
      call something_critical()
      call omp_unset_lock(lock)
    !$omp end task
  end do
end subroutine
Fortran
Fortran
A.17 The workshare Construct
The following are examples of the workshare construct (see Section 2.5.4 on page
52).
In the following example, workshare spreads work across the threads executing the
parallel region, and there is a barrier after the last statement. Implementations must
enforce Fortran execution rules inside of the workshare block.
Example A.17.1f
SUBROUTINE WSHARE1(AA, BB, CC, DD, EE, FF, N)
  INTEGER N
  REAL AA(N,N), BB(N,N), CC(N,N), DD(N,N), EE(N,N), FF(N,N)
  !$OMP PARALLEL
  !$OMP WORKSHARE
  AA = BB
  CC = DD
  EE = FF
  !$OMP END WORKSHARE
  !$OMP END PARALLEL
END SUBROUTINE WSHARE1
In the following example, the barrier at the end of the first workshare region is
eliminated with a nowait clause. Threads doing CC = DD immediately begin work on
EE = FF when they are done with CC = DD.
Example A.17.2f
SUBROUTINE WSHARE2(AA, BB, CC, DD, EE, FF, N)
  INTEGER N
  REAL AA(N,N), BB(N,N), CC(N,N)
  REAL DD(N,N), EE(N,N), FF(N,N)
  !$OMP PARALLEL
  !$OMP WORKSHARE
  AA = BB
  CC = DD
  !$OMP END WORKSHARE NOWAIT
  !$OMP WORKSHARE
  EE = FF
  !$OMP END WORKSHARE
  !$OMP END PARALLEL
END SUBROUTINE WSHARE2
The following example shows the use of an atomic directive inside a workshare
construct. The computation of SUM(AA) is workshared, but the update to R is atomic.
Example A.17.3f
SUBROUTINE WSHARE3(AA, BB, CC, DD, N)
  INTEGER N
  REAL AA(N,N), BB(N,N), CC(N,N), DD(N,N)
  REAL R
  R=0
  !$OMP PARALLEL
  !$OMP WORKSHARE
  AA = BB
  !$OMP ATOMIC UPDATE
  R = R + SUM(AA)
  CC = DD
  !$OMP END WORKSHARE
  !$OMP END PARALLEL
END SUBROUTINE WSHARE3
Fortran WHERE and FORALL statements are compound statements, made up of a control
part and a statement part. When workshare is applied to one of these compound
statements, both the control and the statement parts are workshared. The following
example shows the use of a WHERE statement in a workshare construct.
Each task gets worked on in order by the threads:
AA = BB then
CC = DD then
EE .ne. 0 then
FF = 1 / EE then
GG = HH
Example A.17.4f
SUBROUTINE WSHARE4(AA, BB, CC, DD, EE, FF, GG, HH, N)
  INTEGER N
  REAL AA(N,N), BB(N,N), CC(N,N)
  REAL DD(N,N), EE(N,N), FF(N,N)
  REAL GG(N,N), HH(N,N)
  !$OMP PARALLEL
  !$OMP WORKSHARE
  AA = BB
  CC = DD
  WHERE (EE .ne. 0) FF = 1 / EE
  GG = HH
  !$OMP END WORKSHARE
  !$OMP END PARALLEL
END SUBROUTINE WSHARE4
In the following example, an assignment to a shared scalar variable is performed by one
thread in a workshare while all other threads in the team wait.
Example A.17.5f
SUBROUTINE WSHARE5(AA, BB, CC, DD, N)
  INTEGER N
  REAL AA(N,N), BB(N,N), CC(N,N), DD(N,N)
  INTEGER SHR
  !$OMP PARALLEL SHARED(SHR)
  !$OMP WORKSHARE
  AA = BB
  SHR = 1
  CC = DD * SHR
  !$OMP END WORKSHARE
  !$OMP END PARALLEL
END SUBROUTINE WSHARE5
The following example contains an assignment to a private scalar variable, which is
performed by one thread in a workshare while all other threads wait. It is
non-conforming because the private scalar variable is undefined after the assignment
statement.
Example A.17.6f
SUBROUTINE WSHARE6_WRONG(AA, BB, CC, DD, N)
  INTEGER N
  REAL AA(N,N), BB(N,N), CC(N,N), DD(N,N)
  INTEGER PRI
  !$OMP PARALLEL PRIVATE(PRI)
  !$OMP WORKSHARE
  AA = BB
  PRI = 1
  CC = DD * PRI
  !$OMP END WORKSHARE
  !$OMP END PARALLEL
END SUBROUTINE WSHARE6_WRONG
Fortran execution rules must be enforced inside a workshare construct. In the
following example, the program fragment produces the same result regardless of
whether the code is executed sequentially or inside an OpenMP program with multiple
threads:
Example A.17.7f
SUBROUTINE WSHARE7(AA, BB, CC, N)
  INTEGER N
  REAL AA(N), BB(N), CC(N)
  !$OMP PARALLEL
  !$OMP WORKSHARE
  AA(1:50) = BB(11:60)
  CC(11:20) = AA(1:10)
  !$OMP END WORKSHARE
  !$OMP END PARALLEL
END SUBROUTINE WSHARE7
Fortran
A.18 The master Construct
The following example demonstrates the master construct (Section 2.8.1 on page 67). In
the example, the master keeps track of how many iterations have been executed and
prints out a progress report. The other threads skip the master region without waiting.
Example A.18.1c
C/C++
#include <stdio.h>

extern float average(float,float,float);

void master_example( float* x, float* xold, int n, float tol )
{
    int c, i, toobig;
    float error, y;
    c = 0;
    #pragma omp parallel
    {
        do{
            #pragma omp for private(i)
            for( i = 1; i < n-1; ++i ){
                xold[i] = x[i];
            }
            #pragma omp single
            {
                toobig = 0;
            }
            #pragma omp for private(i,y,error) reduction(+:toobig)
            for( i = 1; i < n-1; ++i ){
                y = x[i];
                x[i] = average( xold[i-1], x[i], xold[i+1] );
                error = y - x[i];
                if( error > tol || error < -tol ) ++toobig;
            }
            #pragma omp master
            {
                ++c;
                printf( "iteration %d, toobig=%d\n", c, toobig );
            }
        }while( toobig > 0 );
    }
}
C/C++
Example A.18.1f
Fortran
SUBROUTINE MASTER_EXAMPLE( X, XOLD, N, TOL )
  REAL X(*), XOLD(*), TOL
  INTEGER N
  INTEGER C, I, TOOBIG
  REAL ERROR, Y, AVERAGE
  EXTERNAL AVERAGE
  C = 0
  TOOBIG = 1
  !$OMP PARALLEL
  DO WHILE( TOOBIG > 0 )
    !$OMP DO PRIVATE(I)
    DO I = 2, N-1
      XOLD(I) = X(I)
    ENDDO
    !$OMP SINGLE
    TOOBIG = 0
    !$OMP END SINGLE
    !$OMP DO PRIVATE(I,Y,ERROR), REDUCTION(+:TOOBIG)
    DO I = 2, N-1
      Y = X(I)
      X(I) = AVERAGE( XOLD(I-1), X(I), XOLD(I+1) )
      ERROR = Y-X(I)
      IF( ERROR > TOL .OR. ERROR < -TOL ) TOOBIG = TOOBIG+1
    ENDDO
    !$OMP MASTER
    C = C + 1
    PRINT *, 'Iteration ', C, 'TOOBIG=', TOOBIG
    !$OMP END MASTER
  ENDDO
  !$OMP END PARALLEL
END SUBROUTINE MASTER_EXAMPLE
Fortran
A.19 The critical Construct
The following example includes several critical constructs (Section 2.8.2 on page
68). The example illustrates a queuing model in which a task is dequeued and worked
on. To guard against multiple threads dequeuing the same task, the dequeuing operation
must be in a critical region. Because the two queues in this example are
independent, they are protected by critical constructs with different names, xaxis
and yaxis.
Example A.19.1c
C/C++
int dequeue(float *a);
void work(int i, float *a);

void critical_example(float *x, float *y)
{
    int ix_next, iy_next;
    #pragma omp parallel shared(x, y) private(ix_next, iy_next)
    {
        #pragma omp critical (xaxis)
        ix_next = dequeue(x);
        work(ix_next, x);
        #pragma omp critical (yaxis)
        iy_next = dequeue(y);
        work(iy_next, y);
    }
}
C/C++
Example A.19.1f
Fortran
SUBROUTINE CRITICAL_EXAMPLE(X, Y)
  REAL X(*), Y(*)
  INTEGER IX_NEXT, IY_NEXT
  !$OMP PARALLEL SHARED(X, Y) PRIVATE(IX_NEXT, IY_NEXT)
  !$OMP CRITICAL(XAXIS)
  CALL DEQUEUE(IX_NEXT, X)
  !$OMP END CRITICAL(XAXIS)
  CALL WORK(IX_NEXT, X)
  !$OMP CRITICAL(YAXIS)
  CALL DEQUEUE(IY_NEXT,Y)
  !$OMP END CRITICAL(YAXIS)
  CALL WORK(IY_NEXT, Y)
  !$OMP END PARALLEL
END SUBROUTINE CRITICAL_EXAMPLE
Fortran
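The effect of giving independent critical constructs different names can be illustrated with two counters in place of the two queues. The following is a minimal sketch under stated assumptions: the function name tally_two and the critical names tally_a and tally_b are hypothetical. Updates of a are mutually exclusive with each other, as are updates of b, but a thread updating a does not block a thread updating b.

```c
/* Hypothetical helper: performs n protected updates on each of two
   independent counters and returns the totals through the pointers. */
void tally_two(int n, long *first, long *second)
{
    int i;
    long a = 0, b = 0;

    #pragma omp parallel for shared(a, b)
    for (i = 0; i < n; i++) {
        #pragma omp critical (tally_a)
        a += 1;                 /* serialized only against other tally_a regions */

        #pragma omp critical (tally_b)
        b += 2;                 /* serialized only against other tally_b regions */
    }

    *first = a;
    *second = b;
}
```

Had both constructs been unnamed, they would share the same unspecified name, and every update would be serialized against every other.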
A.20 worksharing Constructs Inside a critical Construct
The following example demonstrates using a worksharing construct inside a critical
construct (see Section 2.8.2 on page 68). This example is conforming because the
worksharing single region is not closely nested inside the critical region (see
Section 2.10 on page 111). A single thread executes the one and only section in the
sections region, and executes the critical region. The same thread encounters
the nested parallel region, creates a new team of threads, and becomes the master of
the new team. One of the threads in the new team enters the single region and
increments i by 1. At the end of this example i is equal to 2.
Example A.20.1c
C/C++
void critical_work()
{
    int i = 1;
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            #pragma omp critical (name)
            {
                #pragma omp parallel
                {
                    #pragma omp single
                    {
                        i++;
                    }
                }
            }
        }
    }
}
C/C++
Example A.20.1f
Fortran
SUBROUTINE CRITICAL_WORK()
  INTEGER I
  I = 1
  !$OMP PARALLEL SECTIONS
  !$OMP SECTION
  !$OMP CRITICAL (NAME)
  !$OMP PARALLEL
  !$OMP SINGLE
  I = I + 1
  !$OMP END SINGLE
  !$OMP END PARALLEL
  !$OMP END CRITICAL (NAME)
  !$OMP END PARALLEL SECTIONS
END SUBROUTINE CRITICAL_WORK
Fortran
A.21 Binding of barrier Regions
The binding rules call for a barrier region to bind to the closest enclosing
parallel region (see Section 2.8.3 on page 70).
In the following example, the call from the main program to sub2 is conforming because
the barrier region (in sub3) binds to the parallel region in sub2. The call from
the main program to sub1 is conforming because the barrier region binds to the
parallel region in subroutine sub2.
The call from the main program to sub3 is conforming because the barrier region
binds to the implicit inactive parallel region enclosing the sequential part. Also note
that the barrier region in sub3 when called from sub2 only synchronizes the team of
threads in the enclosing parallel region and not all the threads created in sub1.
Example A.21.1c
C/C++
void work(int n) {}

void sub3(int n)
{
    work(n);
    #pragma omp barrier
    work(n);
}

void sub2(int k)
{
    #pragma omp parallel shared(k)
    sub3(k);
}

void sub1(int n)
{
    int i;
    #pragma omp parallel private(i) shared(n)
    {
        #pragma omp for
        for (i=0; i<n; i++)
            sub2(i);
    }
}

int main()
{
    sub1(2);
    sub2(2);
    sub3(2);
    return 0;
}
C/C++
Example A.21.1f
Fortran
SUBROUTINE WORK(N)
  INTEGER N
END SUBROUTINE WORK

SUBROUTINE SUB3(N)
  INTEGER N
  CALL WORK(N)
  !$OMP BARRIER
  CALL WORK(N)
END SUBROUTINE SUB3

SUBROUTINE SUB2(K)
  INTEGER K
  !$OMP PARALLEL SHARED(K)
  CALL SUB3(K)
  !$OMP END PARALLEL
END SUBROUTINE SUB2

SUBROUTINE SUB1(N)
  INTEGER N
  INTEGER I
  !$OMP PARALLEL PRIVATE(I) SHARED(N)
  !$OMP DO
  DO I = 1, N
    CALL SUB2(I)
  END DO
  !$OMP END PARALLEL
END SUBROUTINE SUB1

PROGRAM EXAMPLE
  CALL SUB1(2)
  CALL SUB2(2)
  CALL SUB3(2)
END PROGRAM EXAMPLE
Fortran
A.22 The atomic Construct
The following example avoids race conditions (simultaneous updates of an element of x
by multiple threads) by using the atomic construct (Section 2.8.5 on page 73).
The advantage of using the atomic construct in this example is that it allows updates
of two different elements of x to occur in parallel. If a critical construct (see
Section 2.8.2 on page 68) were used instead, then all updates to elements of x would be
executed serially (though not in any guaranteed order).
Note that the atomic directive applies only to the statement immediately following it.
As a result, elements of y are not updated atomically in this example.
Example A.22.1c
C/C++
float work1(int i)
{
    return 1.0 * i;
}

float work2(int i)
{
    return 2.0 * i;
}

void atomic_example(float *x, float *y, int *index, int n)
{
    int i;
    #pragma omp parallel for shared(x, y, index, n)
    for (i=0; i<n; i++) {
        #pragma omp atomic update
        x[index[i]] += work1(i);
        y[i] += work2(i);
    }
}

int main()
{
    float x[1000];
    float y[10000];
    int index[10000];
    int i;
    for (i = 0; i < 10000; i++) {
        index[i] = i % 1000;
        y[i]=0.0;
    }
    for (i = 0; i < 1000; i++)
        x[i] = 0.0;
    atomic_example(x, y, index, 10000);
    return 0;
}
C/C++
Example A.22.1f
Fortran
REAL FUNCTION WORK1(I)
  INTEGER I
  WORK1 = 1.0 * I
  RETURN
END FUNCTION WORK1

REAL FUNCTION WORK2(I)
  INTEGER I
  WORK2 = 2.0 * I
  RETURN
END FUNCTION WORK2

SUBROUTINE SUB(X, Y, INDEX, N)
  REAL X(*), Y(*)
  INTEGER INDEX(*), N
  INTEGER I
  !$OMP PARALLEL DO SHARED(X, Y, INDEX, N)
  DO I=1,N
    !$OMP ATOMIC UPDATE
    X(INDEX(I)) = X(INDEX(I)) + WORK1(I)
    Y(I) = Y(I) + WORK2(I)
  ENDDO
END SUBROUTINE SUB

PROGRAM ATOMIC_EXAMPLE
  REAL X(1000), Y(10000)
  INTEGER INDEX(10000)
  INTEGER I
  DO I=1,10000
    INDEX(I) = MOD(I, 1000) + 1
    Y(I) = 0.0
  ENDDO
  DO I = 1,1000
    X(I) = 0.0
  ENDDO
  CALL SUB(X, Y, INDEX, 10000)
END PROGRAM ATOMIC_EXAMPLE
Fortran
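The same pattern applies to any indirect update in which several iterations may address the same element. The following histogram sketch is illustrative only; the function name histogram and its parameters are assumptions made for this sketch. Each increment is protected individually, so increments of different bins may proceed in parallel, while increments of the same bin cannot interfere with one another.

```c
/* Hypothetical helper: bin_of[i] gives the bin (0 <= bin_of[i] < number
   of bins) for item i; bins[] must be zero-initialized by the caller. */
void histogram(const int *bin_of, int n, int *bins)
{
    int i;
    #pragma omp parallel for shared(bin_of, bins, n)
    for (i = 0; i < n; i++) {
        /* The atomic construct applies only to the statement that
           immediately follows it. */
        #pragma omp atomic update
        bins[bin_of[i]] += 1;
    }
}
```

A critical construct in place of the atomic construct would give the same counts but would serialize all increments, including those of unrelated bins.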
The following example illustrates the read and write clauses for the atomic
directive. These clauses ensure that the given variable is read or written, respectively, as
a whole. Otherwise, some other thread might read or write part of the variable while the
current thread was reading or writing another part of the variable. Note that most
hardware provides atomic reads and writes for some set of properly aligned variables of
specific sizes, but not necessarily for all the variable types supported by the OpenMP
API.
Example A.22.2c
C/C++
int atomic_read(const int *p)
{
    int value;
    /* Guarantee that the entire value of *p is read atomically. No part of
     * *p can change during the read operation.
     */
    #pragma omp atomic read
    value = *p;
    return value;
}

void atomic_write(int *p, int value)
{
    /* Guarantee that value is stored atomically into *p. No part of *p can change
     * until after the entire write operation is completed.
     */
    #pragma omp atomic write
    *p = value;
}
C/C++
Example A.22.2f
Fortran
function atomic_read(p)
  integer :: atomic_read
  integer, intent(in) :: p
  ! Guarantee that the entire value of p is read atomically. No part of
  ! p can change during the read operation.
  !$omp atomic read
  atomic_read = p
  return
end function atomic_read

subroutine atomic_write(p, value)
  integer, intent(out) :: p
  integer, intent(in) :: value
  ! Guarantee that value is stored atomically into p. No part of p can change
  ! until after the entire write operation is completed.
  !$omp atomic write
  p = value
end subroutine atomic_write
Fortran
The following example illustrates the capture clause for the atomic directive. In
this case the value of a variable is captured, and then the variable is incremented. These
operations occur atomically. This particular example could be implemented using the
fetch-and-add instruction available on many kinds of hardware. The example also shows
a way to implement a spin lock using the capture and read clauses.
Example A.22.3c
C/C++
int fetch_and_add(int *p)
{
  /* Atomically read the value of *p and then increment it. The previous value is
   * returned. This can be used to implement a simple lock as shown below.
   */
  int old;
  #pragma omp atomic capture
  { old = *p; (*p)++; }
  return old;
}
/*
* Use fetch_and_add to implement a lock
*/
struct locktype {
int ticketnumber;
int turn;
};
void do_locked_work(struct locktype *lock)
{
int atomic_read(const int *p);
void work();
// Obtain the lock
int myturn = fetch_and_add(&lock->ticketnumber);
while (atomic_read(&lock->turn) != myturn)
;
// Do some work. The flush is needed to ensure visibility of
// variables not involved in atomic directives
#pragma omp flush
work();
#pragma omp flush
// Release the lock
fetch_and_add(&lock->turn);
}
C/C++
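The ticket lock above can be exercised with a small driver. The following is a minimal sketch, not part of the specification: the counter shared_counter, the helper locked_increment (which stands in for do_locked_work with a trivial body), the function run_ticket_lock_demo, and the thread count of 4 are all assumptions made for illustration.

```c
struct locktype {
    int ticketnumber;   /* next ticket to hand out */
    int turn;           /* ticket currently allowed to enter */
};

/* Hypothetical shared data protected by the lock (not in the original). */
static int shared_counter = 0;

static int fetch_and_add(int *p)
{
    int old;
    #pragma omp atomic capture
    { old = *p; (*p)++; }
    return old;
}

static int atomic_read(const int *p)
{
    int value;
    #pragma omp atomic read
    value = *p;
    return value;
}

/* Same locking protocol as do_locked_work, with a trivial body. */
static void locked_increment(struct locktype *lock)
{
    int myturn = fetch_and_add(&lock->ticketnumber);
    while (atomic_read(&lock->turn) != myturn)
        ;
    #pragma omp flush
    shared_counter++;            /* plain update, protected by the lock */
    #pragma omp flush
    fetch_and_add(&lock->turn);  /* release: admit the next ticket */
}

/* Hypothetical driver: performs `iterations` locked increments. */
int run_ticket_lock_demo(int iterations)
{
    struct locktype lock = {0, 0};
    int i;
    shared_counter = 0;
    #pragma omp parallel for num_threads(4)
    for (i = 0; i < iterations; i++)
        locked_increment(&lock);
    return shared_counter;
}
```

If the lock serializes the increments as intended, run_ticket_lock_demo(n) should return n regardless of how many threads the implementation actually provides.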
Fortran
Example A.22.3f
function fetch_and_add(p)
integer:: fetch_and_add
integer, intent(inout) :: p
! Atomically read the value of p and then increment it. The previous value is
! returned. This can be used to implement a simple lock as shown below.
!$omp atomic capture
fetch_and_add = p
p = p + 1
!$omp end atomic
end function fetch_and_add
! Use fetch_and_add to implement a lock
module m
interface
function fetch_and_add(p)
integer :: fetch_and_add
integer, intent(inout) :: p
end function
function atomic_read(p)
integer :: atomic_read
integer, intent(in) :: p
end function
end interface
type locktype
integer ticketnumber
integer turn
end type
contains
subroutine do_locked_work(lock)
type(locktype), intent(inout) :: lock
integer myturn
integer junk
! obtain the lock
myturn = fetch_and_add(lock%ticketnumber)
do while (atomic_read(lock%turn) .ne. myturn)
continue
enddo
! Do some work. The flush is needed to ensure visibility of variables
! not involved in atomic directives
!$omp flush
call work
!$omp flush
! Release the lock
junk = fetch_and_add(lock%turn)
end subroutine
end module
Fortran
A.23 Restrictions on the atomic Construct
The following non-conforming examples illustrate the restrictions on the atomic
construct given in Section 2.8.5 on page 73.
Example A.23.1c
C/C++
void atomic_wrong ()
{
  union {int n; float x;} u;
#pragma omp parallel
{
#pragma omp atomic update
u.n++;
#pragma omp atomic update
u.x += 1.0;
/* Incorrect because the atomic constructs reference the same location
through incompatible types */
}
}
C/C++
Fortran
Example A.23.1f
      SUBROUTINE ATOMIC_WRONG()
      INTEGER:: I
      REAL:: R
      EQUIVALENCE(I,R)
!$OMP PARALLEL
!$OMP ATOMIC UPDATE
      I = I + 1
!$OMP ATOMIC UPDATE
      R = R + 1.0
! incorrect because I and R reference the same location
! but have different types
!$OMP END PARALLEL
      END SUBROUTINE ATOMIC_WRONG
Fortran
Example A.23.2c
C/C++
void atomic_wrong2 ()
{
  int x;
  int *i;
  float *r;
i = &x;
r = (float *)&x;
#pragma omp parallel
{
#pragma omp atomic update
*i += 1;
#pragma omp atomic update
*r += 1.0;
/* Incorrect because the atomic constructs reference the same location
through incompatible types */
}
}
C/C++
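For contrast, the sketch below shows one way the same storage location might be updated atomically in a conforming manner: every atomic construct accesses it through the same type. This is an illustrative assumption, not an example from the specification; the function name atomic_same_type and the second alias j are hypothetical.

```c
/* Hypothetical conforming counterpart to atomic_wrong2: both aliases
 * have type int *, so the atomic constructs reference the location
 * through compatible types. */
int atomic_same_type(int iterations)
{
    int x = 0;
    int *i = &x;
    int *j = &x;    /* second alias, same type as the first */
    int k;
    #pragma omp parallel for
    for (k = 0; k < iterations; k++) {
        #pragma omp atomic update
        *i += 1;
        #pragma omp atomic update
        *j += 1;
    }
    return x;       /* two increments per iteration */
}
```

Each iteration performs two atomic increments of x, so the function is expected to return twice the iteration count.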
Fortran
The following example is non-conforming because I and R reference the same location
but have different types.
Example A.23.2f
      SUBROUTINE SUB()
      COMMON /BLK/ R
      REAL R
!$OMP ATOMIC UPDATE
      R = R + 1.0
      END SUBROUTINE SUB
      SUBROUTINE ATOMIC_WRONG2()
      COMMON /BLK/ I
      INTEGER I
!$OMP PARALLEL
!$OMP ATOMIC UPDATE
      I = I + 1
      CALL SUB()
!$OMP END PARALLEL
      END SUBROUTINE ATOMIC_WRONG2
Although the following example might work on some implementations, it is also non-conforming:
Example A.23.3f
      SUBROUTINE ATOMIC_WRONG3
      INTEGER:: I
      REAL:: R
      EQUIVALENCE(I,R)
!$OMP PARALLEL
!$OMP ATOMIC UPDATE
      I = I + 1
! incorrect because I and R reference the same location
! but have different types
!$OMP END PARALLEL
!$OMP PARALLEL
!$OMP ATOMIC UPDATE
      R = R + 1.0
! incorrect because I and R reference the same location
! but have different types
!$OMP END PARALLEL
      END SUBROUTINE ATOMIC_WRONG3
Fortran
A.24 The flush Construct without a List
The following example (for Section 2.8.6 on page 78) distinguishes the shared variables
affected by a flush construct with no list from the shared objects that are not affected:
Example A.24.1c
C/C++
int x, *p = &x;
void f1(int *q)
{
*q = 1;
#pragma omp flush
/* x, p, and *q are flushed */
/* because they are shared and accessible */
/* q is not flushed because it is not shared. */
}
void f2(int *q)
{
  #pragma omp barrier
  *q = 2;
  #pragma omp barrier
  /* a barrier implies a flush */
  /* x, p, and *q are flushed */
  /* because they are shared and accessible */
  /* q is not flushed because it is not shared. */
}
int g(int n)
{
int i = 1, j, sum = 0;
*p = 1;
#pragma omp parallel reduction(+: sum) num_threads(10)
{
f1(&j);
/* i, n and sum were not flushed */
/* because they were not accessible in f1 */
/* j was flushed because it was accessible */
sum += j;
f2(&j);
/* i, n, and sum were not flushed */
/* because they were not accessible in f2 */
/* j was flushed because it was accessible */
sum += i + j + *p + n;
}
return sum;
}
int main()
{
int result = g(7);
return result;
}
C/C++
Fortran
Example A.24.1f
      SUBROUTINE F1(Q)
      COMMON /DATA/ X, P
      INTEGER, TARGET :: X
      INTEGER, POINTER :: P
      INTEGER Q
      Q = 1
!$OMP FLUSH
      ! X, P and Q are flushed
      ! because they are shared and accessible
      END SUBROUTINE F1
      SUBROUTINE F2(Q)
      COMMON /DATA/ X, P
      INTEGER, TARGET :: X
      INTEGER, POINTER :: P
      INTEGER Q
!$OMP BARRIER
      Q = 2
!$OMP BARRIER
      ! a barrier implies a flush
      ! X, P and Q are flushed
      ! because they are shared and accessible
      END SUBROUTINE F2
      INTEGER FUNCTION G(N)
      COMMON /DATA/ X, P
      INTEGER, TARGET :: X
      INTEGER, POINTER :: P
      INTEGER N
      INTEGER I, J, SUM
      I = 1
      SUM = 0
      P = 1
!$OMP PARALLEL REDUCTION(+: SUM) NUM_THREADS(10)
      CALL F1(J)
      ! I, N and SUM were not flushed
      ! because they were not accessible in F1
      ! J was flushed because it was accessible
      SUM = SUM + J
      CALL F2(J)
      ! I, N, and SUM were not flushed
      ! because they were not accessible in F2
      ! J was flushed because it was accessible
      SUM = SUM + I + J + P + N
!$OMP END PARALLEL
      G = SUM
      END FUNCTION G
      PROGRAM FLUSH_NOLIST
      COMMON /DATA/ X, P
      INTEGER, TARGET :: X
      INTEGER, POINTER :: P
      INTEGER RESULT, G
      P => X
      RESULT = G(7)
      PRINT *, RESULT
      END PROGRAM FLUSH_NOLIST
Fortran
A.25 Placement of flush, barrier, taskwait and taskyield Directives
The following example is non-conforming, because the flush, barrier, taskwait,
and taskyield directives are stand-alone directives and cannot be the immediate
substatement of an if statement. See Section 2.8.3 on page 70, Section 2.8.6 on page
78, Section 2.8.4 on page 72, and Section 2.7.2 on page 64.
Example A.25.1c
C/C++
void standalone_wrong()
{
int a = 1;
if (a != 0)
#pragma omp flush(a)
/* incorrect as flush cannot be immediate substatement
of if statement */
if (a != 0)
#pragma omp barrier
/* incorrect as barrier cannot be immediate substatement
of if statement */
if (a!=0)
#pragma omp taskyield
/* incorrect as taskyield cannot be immediate substatement of if statement */
if (a != 0)
#pragma omp taskwait
/* incorrect as taskwait cannot be immediate substatement
of if statement */
}
C/C++
The following example is non-conforming, because the flush, barrier, taskwait,
and taskyield directives are stand-alone directives and cannot be the action
statement of an if statement or a labeled branch target.
Fortran
Example A.25.1f
SUBROUTINE STANDALONE_WRONG()
INTEGER A
A = 1
! the FLUSH directive must not be the action statement
! in an IF statement
IF (A .NE. 0) !$OMP FLUSH(A)
! the BARRIER directive must not be the action statement
! in an IF statement
IF (A .NE. 0) !$OMP BARRIER
! the TASKWAIT directive must not be the action statement
! in an IF statement
IF (A .NE. 0) !$OMP TASKWAIT
! the TASKYIELD directive must not be the action statement
! in an IF statement
IF (A .NE. 0) !$OMP TASKYIELD
GOTO 100
! the FLUSH directive must not be a labeled branch target
! statement
100 !$OMP FLUSH(A)
GOTO 200
! the BARRIER directive must not be a labeled branch target
! statement
200 !$OMP BARRIER
GOTO 300
! the TASKWAIT directive must not be a labeled branch target
! statement
300 !$OMP TASKWAIT
GOTO 400
! the TASKYIELD directive must not be a labeled branch target
! statement
400 !$OMP TASKYIELD
END SUBROUTINE
Fortran
The following version of the above example is conforming because the flush,
barrier, taskwait, and taskyield directives are enclosed in a compound
statement.
Example A.25.2c
C/C++
void standalone_ok()
{
  int a = 1;
  #pragma omp parallel
  {
    if (a != 0) {
      #pragma omp flush(a)
    }
    if (a != 0) {
      #pragma omp barrier
    }
    if (a != 0) {
      #pragma omp taskwait
    }
    if (a != 0) {
      #pragma omp taskyield
    }
  }
}
C/C++
The following example is conforming because the flush, barrier, taskwait, and
taskyield directives are enclosed in an if construct or follow the labeled branch
target.
Fortran
Example A.25.2f
SUBROUTINE STANDALONE_OK()
INTEGER A
A = 1
IF (A .NE. 0) THEN
!$OMP FLUSH(A)
ENDIF
IF (A .NE. 0) THEN
!$OMP BARRIER
ENDIF
IF (A .NE. 0) THEN
!$OMP TASKWAIT
ENDIF
IF (A .NE. 0) THEN
!$OMP TASKYIELD
ENDIF
GOTO 100
100 CONTINUE
!$OMP FLUSH(A)
GOTO 200
200 CONTINUE
!$OMP BARRIER
GOTO 300
300 CONTINUE
!$OMP TASKWAIT
GOTO 400
400 CONTINUE
!$OMP TASKYIELD
END SUBROUTINE
Fortran
A.26 The ordered Clause and the ordered Construct
Ordered constructs (Section 2.8.7 on page 82) are useful for sequentially ordering the
output from work that is done in parallel. The following program prints out the indices
in sequential order:
Example A.26.1c
C/C++
#include <stdio.h>
void work(int k)
{
#pragma omp ordered
printf(" %d\n", k);
}
void ordered_example(int lb, int ub, int stride)
{
int i;
#pragma omp parallel for ordered schedule(dynamic)
for (i=lb; i<ub; i+=stride)
work(i);
}
int main()
{
ordered_example(0, 100, 5);
return 0;
}
C/C++
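The sequential ordering can also be checked programmatically rather than by reading printed output. The sketch below is an illustrative variant, not from the specification: it assumes a hypothetical buffer recorded, a helper record that replaces the printf call, and a checker ordered_is_sequential; it also assumes a positive stride and at most MAX_RECORDED iterations.

```c
#define MAX_RECORDED 64          /* illustrative capacity, an assumption */

static int recorded[MAX_RECORDED];
static int nrecorded = 0;

/* Appends k to the buffer inside an ordered region, so appends happen
 * in the order of the loop iterations. */
static void record(int k)
{
    #pragma omp ordered
    {
        if (nrecorded < MAX_RECORDED)
            recorded[nrecorded++] = k;
    }
}

/* Returns 1 if the recorded indices match sequential loop order. */
int ordered_is_sequential(int lb, int ub, int stride)
{
    int i, n, ok = 1;
    nrecorded = 0;
    #pragma omp parallel for ordered schedule(dynamic)
    for (i = lb; i < ub; i += stride)
        record(i);
    for (n = 0; n < nrecorded; n++)
        if (recorded[n] != lb + n * stride)
            ok = 0;
    return ok && nrecorded > 0;
}
```

With the same arguments as the program above, ordered_is_sequential(0, 100, 5) is expected to report that the 20 recorded indices appear in ascending order, even though the iterations themselves are scheduled dynamically.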
Fortran
Example A.26.1f
SUBROUTINE WORK(K)
INTEGER k
!$OMP ORDERED
WRITE(*,*) K
!$OMP END ORDERED
END SUBROUTINE WORK
SUBROUTINE SUB(LB, UB, STRIDE)
INTEGER LB, UB, STRIDE
INTEGER I
!$OMP PARALLEL DO ORDERED SCHEDULE(DYNAMIC)
DO I=LB,UB,STRIDE
CALL WORK(I)
END DO
!$OMP END PARALLEL DO
END SUBROUTINE SUB
PROGRAM ORDERED_EXAMPLE
CALL SUB(1,100,5)
END PROGRAM ORDERED_EXAMPLE
Fortran
It is possible to have multiple ordered constructs within a loop region with the
ordered clause specified. The first example is non-conforming because all iterations
execute two ordered regions. An iteration of a loop must not execute more than one
ordered region:
Example A.26.2c
C/C++
void work(int i) {}
void ordered_wrong(int n)
{
int i;
#pragma omp for ordered
for (i=0; i<n; i++) {
/* incorrect because an iteration may not execute more than one
ordered region */
#pragma omp ordered
work(i);
#pragma omp ordered
work(i+1);
}
}
C/C++
Fortran
Example A.26.2f
      SUBROUTINE WORK(I)
      INTEGER I
      END SUBROUTINE WORK
      SUBROUTINE ORDERED_WRONG(N)
      INTEGER N
      INTEGER I
!$OMP DO ORDERED
      DO I = 1, N
! incorrect because an iteration may not execute more than one
! ordered region
!$OMP ORDERED
        CALL WORK(I)
!$OMP END ORDERED
!$OMP ORDERED
        CALL WORK(I+1)
!$OMP END ORDERED
      END DO
      END SUBROUTINE ORDERED_WRONG
Fortran
The following is a conforming example with more than one ordered construct. Each
iteration will execute only one ordered region:
Example A.26.3c
C/C++
void work(int i) {}
void ordered_good(int n)
{
int i;
#pragma omp for ordered
for (i=0; i<n; i++) {
if (i <= 10) {
#pragma omp ordered
work(i);
}
if (i > 10) {
#pragma omp ordered
work(i+1);
}
}
}
C/C++
Fortran
Example A.26.3f
      SUBROUTINE ORDERED_GOOD(N)
      INTEGER N
!$OMP DO ORDERED
      DO I = 1,N
        IF (I <= 10) THEN
!$OMP ORDERED
          CALL WORK(I)
!$OMP END ORDERED
        ENDIF
        IF (I > 10) THEN
!$OMP ORDERED
          CALL WORK(I+1)
!$OMP END ORDERED
        ENDIF
      ENDDO
      END SUBROUTINE ORDERED_GOOD
Fortran
A.27 The threadprivate Directive
The following examples demonstrate how to use the threadprivate directive
(Section 2.9.2 on page 88) to give each thread a separate counter.
Example A.27.1c
C/C++
int counter = 0;
#pragma omp threadprivate(counter)
int increment_counter()
{
counter++;
return(counter);
}
C/C++
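The per-thread behavior of the counter can be observed with a short driver. The following is a minimal sketch under stated assumptions: the function count_threads_with_wrong_total, the explicit reset of counter inside the region, and the thread count of 4 are hypothetical additions for illustration and do not appear in the specification.

```c
static int counter = 0;
#pragma omp threadprivate(counter)

static int increment_counter(void)
{
    counter++;
    return counter;
}

/* Hypothetical driver: each thread calls increment_counter `calls`
 * times and checks that its own copy ends at `calls`. Returns the
 * number of threads that observed a different total. */
int count_threads_with_wrong_total(int calls)
{
    int wrong = 0;
    #pragma omp parallel reduction(+: wrong) num_threads(4)
    {
        int k, last = 0;
        counter = 0;             /* reset this thread's copy only */
        for (k = 0; k < calls; k++)
            last = increment_counter();
        if (last != calls)
            wrong++;
    }
    return wrong;
}
```

Because every thread increments its own copy, no thread should see increments made by another, and the function is expected to return 0. If counter were an ordinary shared variable, the unsynchronized increments would race and the per-thread totals could differ.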
Fortran
Example A.27.1f
      INTEGER FUNCTION INCREMENT_COUNTER()
      COMMON/INC_COMMON/COUNTER
!$OMP THREADPRIVATE(/INC_COMMON/)
      COUNTER = COUNTER +1
      INCREMENT_COUNTER = COUNTER
      RETURN
      END FUNCTION INCREMENT_COUNTER
Fortran
C/C++
The following example uses threadprivate on a static variable:
Example A.27.2c
int increment_counter_2()
{
static int counter = 0;
#pragma omp threadprivate(counter)
counter++;
return(counter);
}
The following example demonstrates unspecified behavior for the initialization of a
threadprivate variable. A threadprivate variable is initialized once at an
unspecified point before its first reference. Because a is constructed using the value of x
(which is modified by the statement x++), the value of a.val at the start of the
parallel region could be either 1 or 2. This problem is avoided for b, which uses an
auxiliary const variable and a copy-constructor.
Example A.27.3c
class T {
public:
int val;
T (int);
T (const T&);
};
T :: T (int v){
val = v;
}
T :: T (const T& t) {
val = t.val;
}
void g(T a, T b){
a.val += b.val;
}
int x = 1;
T a(x);
const T b_aux(x); /* Capture value of x = 1 */
T b(b_aux);
#pragma omp threadprivate(a, b)
void f(int n) {
x++;
#pragma omp parallel for
/* In each thread:
* a is constructed from x (with value 1 or 2?)
* b is copy-constructed from b_aux
*/
for (int i=0; i<n; i++) {
g(a, b); /* Value of a is unspecified. */
}
}
C/C++
Fortran
The following examples show non-conforming uses and correct uses of the
threadprivate directive. For more information, see Section 2.9.2 on page 88 and
Section 2.9.4.1 on page 107.
The following example is non-conforming because the common block is not declared
local to the subroutine that refers to it:
Example A.27.2f
      MODULE INC_MODULE
      COMMON /T/ A
      END MODULE INC_MODULE
      SUBROUTINE INC_MODULE_WRONG()
      USE INC_MODULE
!$OMP THREADPRIVATE(/T/)
      !non-conforming because /T/ not declared in INC_MODULE_WRONG
      END SUBROUTINE INC_MODULE_WRONG
The following example is also non-conforming because the common block is not
declared local to the subroutine that refers to it:
Example A.27.3f
      SUBROUTINE INC_WRONG()
      COMMON /T/ A
!$OMP THREADPRIVATE(/T/)
      CONTAINS
        SUBROUTINE INC_WRONG_SUB()
!$OMP PARALLEL COPYIN(/T/)
        !non-conforming because /T/ not declared in INC_WRONG_SUB
!$OMP END PARALLEL
        END SUBROUTINE INC_WRONG_SUB
      END SUBROUTINE INC_WRONG
The following example is a correct rewrite of the previous example:
Example A.27.4f
      SUBROUTINE INC_GOOD()
      COMMON /T/ A
!$OMP THREADPRIVATE(/T/)
      CONTAINS
        SUBROUTINE INC_GOOD_SUB()
        COMMON /T/ A
!$OMP THREADPRIVATE(/T/)
!$OMP PARALLEL COPYIN(/T/)
!$OMP END PARALLEL
        END SUBROUTINE INC_GOOD_SUB
      END SUBROUTINE INC_GOOD
The following is an example of the use of threadprivate for local variables:
Example A.27.5f
      PROGRAM INC_GOOD2
      INTEGER, ALLOCATABLE, SAVE :: A(:)
      INTEGER, POINTER, SAVE :: PTR
      INTEGER, SAVE :: I
      INTEGER, TARGET :: TARG
      LOGICAL :: FIRSTIN = .TRUE.
!$OMP THREADPRIVATE(A, I, PTR)
      ALLOCATE (A(3))
      A = (/1,2,3/)
      PTR => TARG
      I = 5
!$OMP PARALLEL COPYIN(I, PTR)
!$OMP CRITICAL
      IF (FIRSTIN) THEN
        TARG = 4 ! Update target of ptr
        I = I + 10
        IF (ALLOCATED(A)) A = A + 10
        FIRSTIN = .FALSE.
      END IF
      IF (ALLOCATED(A)) THEN
        PRINT *, 'a = ', A
      ELSE
        PRINT *, 'A is not allocated'
      END IF
      PRINT *, 'ptr = ', PTR
      PRINT *, 'i = ', I
      PRINT *
!$OMP END CRITICAL
!$OMP END PARALLEL
      END PROGRAM INC_GOOD2
The above program, if executed by two threads, will print one of the following two sets
of output:
a = 11 12 13
ptr = 4
i = 15

A is not allocated
ptr = 4
i = 5

or

A is not allocated
ptr = 4
i = 15

a = 1 2 3
ptr = 4
i = 5

The following is an example of the use of threadprivate for module variables:
Example A.27.6f
      MODULE INC_MODULE_GOOD3
      REAL, POINTER :: WORK(:)
      SAVE WORK
!$OMP THREADPRIVATE(WORK)
      END MODULE INC_MODULE_GOOD3
      SUBROUTINE SUB1(N)
      USE INC_MODULE_GOOD3
!$OMP PARALLEL PRIVATE(THE_SUM)
      ALLOCATE(WORK(N))
      CALL SUB2(THE_SUM)
      WRITE(*,*)THE_SUM
!$OMP END PARALLEL
      END SUBROUTINE SUB1
      SUBROUTINE SUB2(THE_SUM)
      USE INC_MODULE_GOOD3
      WORK(:) = 10
      THE_SUM=SUM(WORK)
      END SUBROUTINE SUB2
      PROGRAM INC_GOOD3
      N = 10
      CALL SUB1(N)
      END PROGRAM INC_GOOD3
Fortran
C/C++
The following example illustrates initialization of threadprivate variables of
class type T. t1 is default constructed, t2 is constructed by a constructor that accepts
one argument of integer type, and t3 is copy constructed with argument f():
Example A.27.4c
static T t1;
#pragma omp threadprivate(t1)
static T t2( 23 );
#pragma omp threadprivate(t2)
static T t3 = f();
#pragma omp threadprivate(t3)
The following example illustrates the use of threadprivate for static class
members. The threadprivate directive for a static class member must be placed
inside the class definition.
Example A.27.5c
class T {
public:
static int i;
#pragma omp threadprivate(i)
};
C/C++
C/C++
A.28 Parallel Random Access Iterator Loop
The following example shows a parallel random access iterator loop.
Example A.28.1c
#include <vector>
void iterator_example()
{
std::vector<int> vec(23);
std::vector<int>::iterator it;
#pragma omp parallel for default(none) shared(vec)
for (it = vec.begin(); it < vec.end(); it++)
{
// do work with *it //
}
}
C/C++
Fortran
A.29 Fortran Restrictions on shared and private Clauses with Common Blocks
When a named common block is specified in a private, firstprivate, or
lastprivate clause of a construct, none of its members may be declared in another
data-sharing attribute clause on that construct. The following examples illustrate this
point. For more information, see Section 2.9.3 on page 92.
The following example is conforming:
Example A.29.1f
      SUBROUTINE COMMON_GOOD()
      COMMON /C/ X,Y
      REAL X, Y
!$OMP PARALLEL PRIVATE (/C/)
      ! do work here
!$OMP END PARALLEL
!$OMP PARALLEL SHARED (X,Y)
      ! do work here
!$OMP END PARALLEL
      END SUBROUTINE COMMON_GOOD
The following example is also conforming:
Example A.29.2f
      SUBROUTINE COMMON_GOOD2()
      COMMON /C/ X,Y
      REAL X, Y
      INTEGER I
!$OMP PARALLEL
!$OMP DO PRIVATE(/C/)
      DO I=1,1000
        ! do work here
      ENDDO
!$OMP END DO
!
!$OMP DO PRIVATE(X)
      DO I=1,1000
        ! do work here
      ENDDO
!$OMP END DO
!$OMP END PARALLEL
      END SUBROUTINE COMMON_GOOD2
The following example is conforming:
Example A.29.3f
      SUBROUTINE COMMON_GOOD3()
      COMMON /C/ X,Y
!$OMP PARALLEL PRIVATE (/C/)
      ! do work here
!$OMP END PARALLEL
!$OMP PARALLEL SHARED (/C/)
      ! do work here
!$OMP END PARALLEL
      END SUBROUTINE COMMON_GOOD3
The following example is non-conforming because x is a constituent element of c:
Example A.29.4f
      SUBROUTINE COMMON_WRONG()
      COMMON /C/ X,Y
! Incorrect because X is a constituent element of C
!$OMP PARALLEL PRIVATE(/C/), SHARED(X)
      ! do work here
!$OMP END PARALLEL
      END SUBROUTINE COMMON_WRONG
The following example is non-conforming because a common block may not be
declared both shared and private:
Example A.29.5f
      SUBROUTINE COMMON_WRONG2()
      COMMON /C/ X,Y
! Incorrect: common block C cannot be declared both
! shared and private
!$OMP PARALLEL PRIVATE (/C/), SHARED(/C/)
      ! do work here
!$OMP END PARALLEL
      END SUBROUTINE COMMON_WRONG2
Fortran
A.30 The default(none) Clause
The following example distinguishes the variables that are affected by the
default(none) clause from those that are not. For more information on the
default clause, see Section 2.9.3.1 on page 93.
Example A.30.1c
C/C++
#include <omp.h>
int x, y, z[1000];
#pragma omp threadprivate(x)

void default_none(int a) {
  const int c = 1;
  int i = 0;
  #pragma omp parallel default(none) private(a) shared(z)
  {
    int j = omp_get_num_threads();
              /* O.K. - j is declared within parallel region */
    a = z[j]; /* O.K. - a is listed in private clause */
              /*      - z is listed in shared clause */
    x = c;    /* O.K. - x is threadprivate */
              /*      - c has const-qualified type */
    z[i] = y; /* Error - cannot reference i or y here */
    #pragma omp for firstprivate(y)
    /* Error - Cannot reference y in the firstprivate clause */
    for (i=0; i<10 ; i++) {
      z[i] = i; /* O.K. - i is the loop iteration variable */
    }
    z[i] = y; /* Error - cannot reference i or y here */
  }
}
C/C++
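For comparison, the sketch below shows a region in which every referenced variable has an explicitly determined data-sharing attribute, so default(none) raises no errors. It is an illustrative assumption rather than an example from the specification; the names y_shared, z_shared, and default_none_ok are hypothetical stand-ins.

```c
/* Hypothetical stand-ins for the file-scope variables y and z. */
int y_shared = 7;
int z_shared[1000];

/* Every variable referenced in the region appears in a clause, so
 * nothing relies on an implicitly determined attribute. */
int default_none_ok(void)
{
    int i;
    #pragma omp parallel default(none) shared(y_shared, z_shared) private(i)
    {
        #pragma omp for
        for (i = 0; i < 10; i++)
            z_shared[i] = y_shared + i;
    }
    return z_shared[9];
}
```

Listing y_shared and z_shared in a shared clause and i in a private clause is one way to satisfy default(none); other combinations of clauses may be equally valid depending on the intended semantics.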
Fortran
Example A.30.1f
      SUBROUTINE DEFAULT_NONE(A)
      INCLUDE "omp_lib.h" ! or USE OMP_LIB
      INTEGER A
      INTEGER X, Y, Z(1000)
      COMMON/BLOCKX/X
      COMMON/BLOCKY/Y
      COMMON/BLOCKZ/Z
!$OMP THREADPRIVATE(/BLOCKX/)
      INTEGER I, J
      i = 1
!$OMP PARALLEL DEFAULT(NONE) PRIVATE(A) SHARED(Z) PRIVATE(J)
      J = OMP_GET_NUM_THREADS();
               ! O.K. - J is listed in PRIVATE clause
      A = Z(J) ! O.K. - A is listed in PRIVATE clause
               !      - Z is listed in SHARED clause
      X = 1    ! O.K. - X is THREADPRIVATE
      Z(I) = Y ! Error - cannot reference I or Y here
!$OMP DO firstprivate(y)
      ! Error - Cannot reference y in the firstprivate clause
      DO I = 1,10
        Z(I) = I ! O.K. - I is the loop iteration variable
      END DO
      Z(I) = Y ! Error - cannot reference I or Y here
!$OMP END PARALLEL
      END SUBROUTINE DEFAULT_NONE
Fortran
Fortran
A.31 Race Conditions Caused by Implied Copies of Shared Variables in Fortran
The following example contains a race condition, because the shared variable, which is
an array section, is passed as an actual argument to a routine that has an assumed-size
array as its dummy argument (see Section 2.9.3.2 on page 94). The subroutine call
passing an array section argument may cause the compiler to copy the argument into a
temporary location prior to the call and copy from the temporary location into the
original variable when the subroutine returns. This copying would cause races in the
parallel region.
Example A.31.1f
SUBROUTINE SHARED_RACE
INCLUDE "omp_lib.h"
! or USE OMP_LIB
REAL A(20)
INTEGER MYTHREAD
!$OMP PARALLEL SHARED(A) PRIVATE(MYTHREAD)
MYTHREAD = OMP_GET_THREAD_NUM()
IF (MYTHREAD .EQ. 0) THEN
CALL SUB(A(1:10)) ! compiler may introduce writes to A(6:10)
ELSE
A(6:10) = 12
ENDIF
!$OMP END PARALLEL
END SUBROUTINE SHARED_RACE
SUBROUTINE SUB(X)
REAL X(*)
X(1:5) = 4
END SUBROUTINE SUB
Fortran
A.32 The private Clause
In the following example, the values of original list items i and j are retained on exit
from the parallel region, while the private list items i and j are modified within the
parallel construct. For more information on the private clause, see
Section 2.9.3.3 on page 96.
Example A.32.1c
C/C++
#include <stdio.h>
#include <assert.h>
int main()
{
int i, j;
int *ptr_i, *ptr_j;
i = 1;
j = 2;
ptr_i = &i;
ptr_j = &j;
#pragma omp parallel private(i) firstprivate(j)
{
i = 3;
j = j + 2;
assert (*ptr_i == 1 && *ptr_j == 2);
}
assert(i == 1 && j == 2);
return 0;
}
C/C++
Fortran
Example A.32.1f
      PROGRAM PRIV_EXAMPLE
      INTEGER I, J
      I = 1
      J = 2
!$OMP PARALLEL PRIVATE(I) FIRSTPRIVATE(J)
      I = 3
      J = J + 2
!$OMP END PARALLEL
      PRINT *, I, J ! I .eq. 1 .and. J .eq. 2
      END PROGRAM PRIV_EXAMPLE
Fortran
In the following example, all uses of the variable a within the loop construct in the
routine f refer to a private list item a, while it is unspecified whether references to a in
the routine g are to a private list item or the original list item.
Example A.32.2c
C/C++
int a;
void g(int k) {
a = k; /* Accessed in the region but outside of the construct;
* therefore unspecified whether original or private list
* item is modified. */
}
void f(int n) {
int a = 0;
#pragma omp parallel for private(a)
for (int i=1; i<n; i++) {
a = i;
g(a*2);
/* Private copy of "a" */
}
}
C/C++
Fortran
Example A.32.2f
      MODULE PRIV_EXAMPLE2
      REAL A
      CONTAINS
        SUBROUTINE G(K)
        REAL K
        A = K ! Accessed in the region but outside of the
              ! construct; therefore unspecified whether
              ! original or private list item is modified.
        END SUBROUTINE G
        SUBROUTINE F(N)
        INTEGER N
        REAL A
        INTEGER I
!$OMP PARALLEL DO PRIVATE(A)
        DO I = 1,N
          A = I
          CALL G(A*2)
        ENDDO
!$OMP END PARALLEL DO
        END SUBROUTINE F
      END MODULE PRIV_EXAMPLE2
Fortran
The following example demonstrates that a list item that appears in a private clause
in a parallel construct may also appear in a private clause in an enclosed
worksharing construct, which results in an additional private copy.
Example A.32.3c
C/C++
#include <assert.h>
void priv_example3()
{
int i, a;
#pragma omp parallel private(a)
{
a = 1;
#pragma omp parallel for private(a)
for (i=0; i<10; i++)
{
a = 2;
}
assert(a == 1);
}
}
C/C++
Fortran
Example A.32.3f
      SUBROUTINE PRIV_EXAMPLE3()
      INTEGER I, A
!$OMP PARALLEL PRIVATE(A)
      A = 1
!$OMP PARALLEL DO PRIVATE(A)
      DO I = 1, 10
        A = 2
      END DO
!$OMP END PARALLEL DO
      PRINT *, A ! Outer A still has value 1
!$OMP END PARALLEL
      END SUBROUTINE PRIV_EXAMPLE3
Fortran
Fortran
A.33 Fortran Restrictions on Storage Association with the private Clause
The following non-conforming examples illustrate the implications of the private
clause rules with regard to storage association (see Section 2.9.3.3 on page 96).
Example A.33.1f
      SUBROUTINE SUB()
      COMMON /BLOCK/ X
      PRINT *,X ! X is undefined
      END SUBROUTINE SUB
      PROGRAM PRIV_RESTRICT
      COMMON /BLOCK/ X
      X = 1.0
!$OMP PARALLEL PRIVATE (X)
      X = 2.0
      CALL SUB()
!$OMP END PARALLEL
      END PROGRAM PRIV_RESTRICT
Example A.33.2f
      PROGRAM PRIV_RESTRICT2
      COMMON /BLOCK2/ X
      X = 1.0
!$OMP PARALLEL PRIVATE (X)
      X = 2.0
      CALL SUB()
!$OMP END PARALLEL
      CONTAINS
        SUBROUTINE SUB()
        COMMON /BLOCK2/ Y
        PRINT *,X ! X is undefined
        PRINT *,Y ! Y is undefined
        END SUBROUTINE SUB
      END PROGRAM PRIV_RESTRICT2
Example A.33.3f
      PROGRAM PRIV_RESTRICT3
      EQUIVALENCE (X,Y)
      X = 1.0
!$OMP PARALLEL PRIVATE(X)
      PRINT *,Y ! Y is undefined
      Y = 10
      PRINT *,X ! X is undefined
!$OMP END PARALLEL
      END PROGRAM PRIV_RESTRICT3
Example A.33.4f
PROGRAM PRIV_RESTRICT4
INTEGER I, J
INTEGER A(100), B(100)
EQUIVALENCE (A(51), B(1))
!$OMP PARALLEL DO DEFAULT(PRIVATE) PRIVATE(I,J) LASTPRIVATE(A)
DO I=1,100
DO J=1,100
B(J) = J - 1
ENDDO
DO J=1,100
A(J) = J
ENDDO
! B becomes undefined at this point
DO J=1,50
B(J) = B(J) + 1 ! B is undefined
! A becomes undefined at this point
ENDDO
ENDDO
!$OMP END PARALLEL DO
! The LASTPRIVATE write for A has
! undefined results
PRINT *, B
! B is undefined since the LASTPRIVATE
! write of A was not defined
END PROGRAM PRIV_RESTRICT4
Example A.33.5f
      SUBROUTINE SUB1(X)
      DIMENSION X(10)
      ! This use of X does not conform to the
      ! specification. It would be legal Fortran 90,
      ! but the OpenMP private directive allows the
      ! compiler to break the sequence association that
      ! A had with the rest of the common block.
      FORALL (I = 1:10) X(I) = I
      END SUBROUTINE SUB1
      PROGRAM PRIV_RESTRICT5
      COMMON /BLOCK5/ A
      DIMENSION B(10)
      EQUIVALENCE (A,B(1))
      ! the common block has to be at least 10 words
      A = 0
!$OMP PARALLEL PRIVATE(/BLOCK5/)
      ! Without the private clause,
      ! we would be passing a member of a sequence
      ! that is at least ten elements long.
      ! With the private clause, A may no longer be
      ! sequence-associated.
      CALL SUB1(A)
!$OMP MASTER
      PRINT *, A
!$OMP END MASTER
!$OMP END PARALLEL
      END PROGRAM PRIV_RESTRICT5
Fortran
C/C++
A.34 C/C++ Arrays in a firstprivate Clause
The following example illustrates the size and value of list items of array or pointer type
in a firstprivate clause (Section 2.9.3.4 on page 98). The size of new list items is
based on the type of the corresponding original list item, as determined by the base
language.
In this example:
• The type of A is array of two arrays of two ints.
• The type of B is adjusted to pointer to array of n ints, because it is a function parameter.
• The type of C is adjusted to pointer to int, because it is a function parameter.
• The type of D is array of two arrays of two ints.
• The type of E is array of n arrays of n ints.
Note that B and E involve variable length array types.
The new items of array type are initialized as if each integer element of the original
array is assigned to the corresponding element of the new array. Those of pointer type
are initialized as if by assignment from the original item to the new item.
Example A.34.1c
#include <assert.h>
int A[2][2] = {1, 2, 3, 4};
void f(int n, int B[n][n], int C[])
{
int D[2][2] = {1, 2, 3, 4};
int E[n][n];
assert(n >= 2);
E[1][1] = 4;
#pragma omp parallel firstprivate(B, C, D, E)
{
assert(sizeof(B) == sizeof(int (*)[n]));
assert(sizeof(C) == sizeof(int*));
assert(sizeof(D) == 4 * sizeof(int));
assert(sizeof(E) == n * n * sizeof(int));
/* Private B and C have values of original B and C. */
assert(&B[1][1] == &A[1][1]);
assert(&C[3] == &A[1][1]);
assert(D[1][1] == 4);
assert(E[1][1] == 4);
}
}
int main() {
f(2, A, A[0]);
return 0;
}
C/C++
A.35 The lastprivate Clause
Correct execution sometimes depends on the value that the last iteration of a loop
assigns to a variable. Such programs must list all such variables in a lastprivate
clause (Section 2.9.3.5 on page 101) so that the values of the variables are the same as
when the loop is executed sequentially.
C/C++
Example A.35.1c
void lastpriv (int n, float *a, float *b)
{
int i;
#pragma omp parallel
{
#pragma omp for lastprivate(i)
for (i=0; i<n-1; i++)
a[i] = b[i] + b[i+1];
}
a[i]=b[i];
/* i == n-1 here */
}
C/C++
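The same clause also applies to variables other than the loop index. The following is a minimal sketch, not one of the specification's examples: the function name last_pair_sum and the variable s are hypothetical. It assumes the loop runs at least once, so that the sequentially last iteration assigns s.

```c
#include <assert.h>

/* Hypothetical helper, not part of the specification's examples:
 * returns b[n-2] + b[n-1], the value computed by the sequentially
 * last iteration of the loop. */
float last_pair_sum(int n, const float *b)
{
    int i;
    float s = 0.0f;

    assert(n >= 2);   /* the loop must run at least once so that s is assigned */

    #pragma omp parallel for lastprivate(s)
    for (i = 0; i < n - 1; i++)
        s = b[i] + b[i + 1];

    /* s now holds the value assigned in iteration i == n-2 */
    return s;
}
```

Without the lastprivate clause, s would be shared and every thread would write to it, so its final value would depend on thread timing.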
Fortran
Example A.35.1f
SUBROUTINE LASTPRIV(N, A, B)
INTEGER N
REAL A(*), B(*)
INTEGER I
!$OMP PARALLEL
!$OMP DO LASTPRIVATE(I)
DO I=1,N-1
A(I) = B(I) + B(I+1)
ENDDO
!$OMP END PARALLEL
A(I) = B(I)
! I has the value of N here
END SUBROUTINE LASTPRIV
Fortran
A.36 The reduction Clause
The following example demonstrates the reduction clause (Section 2.9.3.6 on page
103); note that some reductions can be expressed in the loop in several ways, as shown
for the max and min reductions below:
Example A.36.1c
C/C++
#include <math.h>
void reduction1(float *x, int *y, int n)
{
int i, b, c;
float a, d;
a = 0.0;
b = 0;
c = y[0];
d = x[0];
#pragma omp parallel for private(i) shared(x, y, n) \
reduction(+:a) reduction(^:b) \
reduction(min:c) reduction(max:d)
for (i=0; i<n; i++) {
a += x[i];
b ^= y[i];
if (c > y[i]) c = y[i];
d = fmaxf(d,x[i]);
}
}
C/C++
Fortran
Example A.36.1f
SUBROUTINE REDUCTION1(A, B, C, D, X, Y, N)
REAL :: X(*), A, D
INTEGER :: Y(*), N, B, C
INTEGER :: I
A = 0
B = 0
C = Y(1)
D = X(1)
!$OMP PARALLEL DO PRIVATE(I) SHARED(X, Y, N) REDUCTION(+:A) &
!$OMP& REDUCTION(IEOR:B) REDUCTION(MIN:C) REDUCTION(MAX:D)
DO I=1,N
A = A + X(I)
B = IEOR(B, Y(I))
C = MIN(C, Y(I))
IF (D < X(I)) D = X(I)
END DO
END SUBROUTINE REDUCTION1
Fortran
A common implementation of the preceding example is to treat it as if it had been
written as follows:
Example A.36.2c
C/C++
#include <limits.h>
#include <math.h>
void reduction2(float *x, int *y, int n)
{
int i, b, b_p, c, c_p;
float a, a_p, d, d_p;
a = 0.0f;
b = 0;
c = y[0];
d = x[0];
#pragma omp parallel shared(a, b, c, d, x, y, n) \
private(a_p, b_p, c_p, d_p)
{
a_p = 0.0f;
b_p = 0;
c_p = INT_MAX;
d_p = -HUGE_VALF;
#pragma omp for private(i)
for (i=0; i<n; i++) {
a_p += x[i];
b_p ^= y[i];
if (c_p > y[i]) c_p = y[i];
d_p = fmaxf(d_p,x[i]);
}
#pragma omp critical
{
a += a_p;
b ^= b_p;
if( c > c_p ) c = c_p;
d = fmaxf(d,d_p);
}
}
}
C/C++
Fortran
Example A.36.2f
SUBROUTINE REDUCTION2(A, B, C, D, X, Y, N)
REAL :: X(*), A, D
INTEGER :: Y(*), N, B, C
REAL :: A_P, D_P
INTEGER :: I, B_P, C_P
A = 0
B = 0
C = Y(1)
D = X(1)
!$OMP PARALLEL SHARED(X, Y, A, B, C, D, N) &
!$OMP& PRIVATE(A_P, B_P, C_P, D_P)
A_P = 0.0
B_P = 0
C_P = HUGE(C_P)
D_P = -HUGE(D_P)
!$OMP DO PRIVATE(I)
DO I=1,N
A_P = A_P + X(I)
B_P = IEOR(B_P, Y(I))
C_P = MIN(C_P, Y(I))
IF (D_P < X(I)) D_P = X(I)
END DO
!$OMP CRITICAL
A = A + A_P
B = IEOR(B, B_P)
C = MIN(C, C_P)
D = MAX(D, D_P)
!$OMP END CRITICAL
!$OMP END PARALLEL
END SUBROUTINE REDUCTION2
The following program is non-conforming because the reduction is on the intrinsic
procedure name MAX but that name has been redefined to be the variable named MAX.
Example A.36.3f
PROGRAM REDUCTION_WRONG
MAX = HUGE(0)
M = 0
!$OMP PARALLEL DO REDUCTION(MAX: M)
! MAX is no longer the intrinsic so this is non-conforming
DO I = 1, 100
CALL SUB(M,I)
END DO
END PROGRAM REDUCTION_WRONG
SUBROUTINE SUB(M,I)
M = MAX(M,I)
END SUBROUTINE SUB
The following conforming program performs the reduction using the intrinsic procedure
name MAX even though the intrinsic MAX has been renamed to REN.
Example A.36.4f
MODULE M
  INTRINSIC MAX
END MODULE M

PROGRAM REDUCTION3
  USE M, REN => MAX
  N = 0
!$OMP PARALLEL DO REDUCTION(REN: N)     ! still does MAX
  DO I = 1, 100
    N = MAX(N,I)
  END DO
END PROGRAM REDUCTION3

The following conforming program performs the reduction using intrinsic procedure
name MAX even though the intrinsic MAX has been renamed to MIN.

Example A.36.5f
MODULE MOD
  INTRINSIC MAX, MIN
END MODULE MOD

PROGRAM REDUCTION4
  USE MOD, MIN=>MAX, MAX=>MIN
  REAL :: R
  R = -HUGE(0.0)

!$OMP PARALLEL DO REDUCTION(MIN: R)     ! still does MAX
  DO I = 1, 1000
    R = MIN(R, SIN(REAL(I)))
  END DO
  PRINT *, R
END PROGRAM REDUCTION4
Fortran
The following example is non-conforming because the initialization (a = 0) of the
original list item a is not synchronized with the update of a as a result of the reduction
computation in the for loop. Therefore, the example may print an incorrect value for a.
To avoid this problem, the initialization of the original list item a should complete
before any update of a as a result of the reduction clause. This can be achieved by
adding an explicit barrier after the assignment a = 0, or by enclosing the assignment
a = 0 in a single directive (which has an implied barrier), or by initializing a before
the start of the parallel region.
Example A.36.3c
C/C++
#include <stdio.h>
int main (void)
{
int a, i;
#pragma omp parallel shared(a) private(i)
{
#pragma omp master
a = 0;
// To avoid race conditions, add a barrier here.
#pragma omp for reduction(+:a)
for (i = 0; i < 10; i++) {
a += i;
}
#pragma omp single
printf ("Sum is %d\n", a);
}
}
C/C++
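Of the remedies listed above, initializing the original list item before the parallel region starts is the simplest. The following is a minimal sketch of that variant; the function name sum_first_ten is hypothetical and is used only so that the result can be returned and checked.

```c
/* Hypothetical helper name; returns 0 + 1 + ... + 9. */
int sum_first_ten(void)
{
    int a = 0;   /* initialized before the parallel region starts */
    int i;

    #pragma omp parallel shared(a) private(i)
    {
        #pragma omp for reduction(+:a)
        for (i = 0; i < 10; i++) {
            a += i;
        }
        /* the implied barrier at the end of the loop guarantees
           that the reduction into a is complete at this point */
    }
    return a;
}
```

Because the assignment a = 0 completes before any thread of the team exists, no explicit barrier is needed between the initialization and the reduction.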
Fortran
Example A.36.6f
INTEGER A, I
!$OMP PARALLEL SHARED(A) PRIVATE(I)
!$OMP MASTER
A = 0
!$OMP END MASTER
! To avoid race conditions, add a barrier here.
!$OMP DO REDUCTION(+:A)
DO I= 0, 9
A = A + I
END DO
!$OMP SINGLE
PRINT *, "Sum is ", A
!$OMP END SINGLE
!$OMP END PARALLEL
END
Fortran
A.37 The copyin Clause
The copyin clause (see Section 2.9.4.1 on page 107) is used to initialize threadprivate
data upon entry to a parallel region. The value of the threadprivate variable in the
master thread is copied to the threadprivate variable of each other team member.
Example A.37.1c
C/C++
#include <stdlib.h>
float* work;
int size;
float tol;
#pragma omp threadprivate(work,size,tol)
void build()
{
int i;
work = (float*)malloc( sizeof(float)*size );
for( i = 0; i < size; ++i ) work[i] = tol;
}
void copyin_example( float t, int n )
{
tol = t;
size = n;
#pragma omp parallel copyin(tol,size)
{
build();
}
}
C/C++
Fortran
Example A.37.1f
MODULE M
  REAL, POINTER, SAVE :: WORK(:)
  INTEGER :: SIZE
  REAL :: TOL
!$OMP THREADPRIVATE(WORK,SIZE,TOL)
END MODULE M

SUBROUTINE COPYIN_EXAMPLE( T, N )
  USE M
  REAL :: T
  INTEGER :: N
  TOL = T
  SIZE = N
!$OMP PARALLEL COPYIN(TOL,SIZE)
  CALL BUILD
!$OMP END PARALLEL
END SUBROUTINE COPYIN_EXAMPLE

SUBROUTINE BUILD
  USE M
  ALLOCATE(WORK(SIZE))
  WORK = TOL
END SUBROUTINE BUILD
Fortran
A.38 The copyprivate Clause
The copyprivate clause (see Section 2.9.4.2 on page 109) can be used to broadcast
values acquired by a single thread directly to all instances of the private variables in the
other threads. In this example, if the routine is called from the sequential part, its
behavior is not affected by the presence of the directives. If it is called from a
parallel region, then the actual arguments with which a and b are associated must
be private.
The thread that executes the structured block associated with the single construct
broadcasts the values of the private variables a, b, x, and y from its implicit task's
data environment to the data environments of the other implicit tasks in the thread team.
The broadcast completes before any of the threads have left the barrier at the end of the
construct.
Example A.38.1c
C/C++
#include <stdio.h>
float x, y;
#pragma omp threadprivate(x, y)
void init(float a, float b ) {
#pragma omp single copyprivate(a,b,x,y)
{
scanf("%f %f %f %f", &a, &b, &x, &y);
}
}
C/C++
Fortran
Example A.38.1f
SUBROUTINE INIT(A,B)
  REAL A, B
  COMMON /XY/ X,Y
!$OMP THREADPRIVATE (/XY/)

!$OMP SINGLE
  READ (11) A,B,X,Y
!$OMP END SINGLE COPYPRIVATE (A,B,/XY/)

END SUBROUTINE INIT
Fortran
In this example, assume that the input must be performed by the master thread. Since the
master construct does not support the copyprivate clause, it cannot broadcast the
input value that is read. However, copyprivate is used to broadcast an address where
the input value is stored.
Example A.38.2c
C/C++
#include <stdio.h>
#include <stdlib.h>
float read_next( ) {
float * tmp;
float return_val;
#pragma omp single copyprivate(tmp)
{
tmp = (float *) malloc(sizeof(float));
} /* copies the pointer only */
#pragma omp master
{
scanf("%f", tmp);
}
#pragma omp barrier
return_val = *tmp;
#pragma omp barrier
#pragma omp single nowait
{
free(tmp);
}
return return_val;
}
C/C++
Fortran
Example A.38.2f
REAL FUNCTION READ_NEXT()
  REAL, POINTER :: TMP

!$OMP SINGLE
  ALLOCATE (TMP)
!$OMP END SINGLE COPYPRIVATE (TMP)     ! copies the pointer only

!$OMP MASTER
  READ (11) TMP
!$OMP END MASTER

!$OMP BARRIER
  READ_NEXT = TMP
!$OMP BARRIER

!$OMP SINGLE
  DEALLOCATE (TMP)
!$OMP END SINGLE NOWAIT
END FUNCTION READ_NEXT
Fortran
Suppose that the number of lock variables required within a parallel region cannot
easily be determined prior to entering it. The copyprivate clause can be used to
provide access to shared lock variables that are allocated within that parallel region.
Example A.38.3c
C/C++
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
omp_lock_t *new_lock()
{
omp_lock_t *lock_ptr;
#pragma omp single copyprivate(lock_ptr)
{
lock_ptr = (omp_lock_t *) malloc(sizeof(omp_lock_t));
omp_init_lock( lock_ptr );
}
return lock_ptr;
}
C/C++
Fortran
Example A.38.3f
FUNCTION NEW_LOCK()
  USE OMP_LIB     ! or INCLUDE "omp_lib.h"
  INTEGER(OMP_LOCK_KIND), POINTER :: NEW_LOCK

!$OMP SINGLE
  ALLOCATE(NEW_LOCK)
  CALL OMP_INIT_LOCK(NEW_LOCK)
!$OMP END SINGLE COPYPRIVATE(NEW_LOCK)
END FUNCTION NEW_LOCK

Note that the effect of the copyprivate clause on a variable with the allocatable
attribute is different than on a variable with the pointer attribute. The value of A is
copied (as if by intrinsic assignment) and the pointer B is copied (as if by pointer
assignment) to the corresponding list items in the other implicit tasks belonging to the
parallel region.

Example A.38.4f
SUBROUTINE S(N)
  INTEGER N

  REAL, DIMENSION(:), ALLOCATABLE :: A
  REAL, DIMENSION(:), POINTER :: B

  ALLOCATE (A(N))
!$OMP SINGLE
  ALLOCATE (B(N))
  READ (11) A,B
!$OMP END SINGLE COPYPRIVATE(A,B)
  ! Variable A is private and is
  ! assigned the same value in each thread
  ! Variable B is shared

!$OMP BARRIER
!$OMP SINGLE
  DEALLOCATE (B)
!$OMP END SINGLE NOWAIT
END SUBROUTINE S
Fortran
A.39 Nested Loop Constructs
The following example of loop construct nesting (see Section 2.10 on page 111) is
conforming because the inner and outer loop regions bind to different parallel
regions:
Example A.39.1c
C/C++
void work(int i, int j) {}
void good_nesting(int n)
{
int i, j;
#pragma omp parallel default(shared)
{
#pragma omp for
for (i=0; i<n; i++) {
#pragma omp parallel shared(i, n)
{
#pragma omp for
for (j=0; j < n; j++)
work(i, j);
}
}
}
}
C/C++
Fortran
Example A.39.1f
SUBROUTINE WORK(I, J)
  INTEGER I, J
END SUBROUTINE WORK

SUBROUTINE GOOD_NESTING(N)
  INTEGER N

  INTEGER I
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP DO
  DO I = 1, N
!$OMP PARALLEL SHARED(I,N)
!$OMP DO
    DO J = 1, N
      CALL WORK(I,J)
    END DO
!$OMP END PARALLEL
  END DO
!$OMP END PARALLEL
END SUBROUTINE GOOD_NESTING
Fortran
The following variation of the preceding example is also conforming:
Example A.39.2c
C/C++
void work(int i, int j) {}
void work1(int i, int n)
{
int j;
#pragma omp parallel default(shared)
{
#pragma omp for
for (j=0; j<n; j++)
work(i, j);
}
}
void good_nesting2(int n)
{
int i;
#pragma omp parallel default(shared)
{
#pragma omp for
for (i=0; i<n; i++)
work1(i, n);
}
}
C/C++
Fortran
Example A.39.2f
SUBROUTINE WORK(I, J)
INTEGER I, J
END SUBROUTINE WORK
SUBROUTINE WORK1(I, N)
INTEGER J
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP DO
DO J = 1, N
CALL WORK(I,J)
END DO
!$OMP END PARALLEL
END SUBROUTINE WORK1
SUBROUTINE GOOD_NESTING2(N)
INTEGER N
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP DO
DO I = 1, N
CALL WORK1(I, N)
END DO
!$OMP END PARALLEL
END SUBROUTINE GOOD_NESTING2
Fortran
A.40 Restrictions on Nesting of Regions
The examples in this section illustrate the region nesting rules. For more information on
region nesting, see Section 2.10 on page 111.
The following example is non-conforming because the inner and outer loop regions are
closely nested:
Example A.40.1c
C/C++
void work(int i, int j) {}
void wrong1(int n)
{
#pragma omp parallel default(shared)
{
int i, j;
#pragma omp for
for (i=0; i<n; i++) {
/* incorrect nesting of loop regions */
#pragma omp for
for (j=0; j<n; j++)
work(i, j);
}
}
}
C/C++
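When the goal is to share the iterations of both loops among one team, one conforming alternative is to associate both loops with a single loop construct through the collapse clause, which is available from OpenMP 3.0 onward. The following is a sketch rather than one of the specification's examples: the names collapsed_nest, visits, and MAX_N are hypothetical, and work is redefined here only to count its calls.

```c
#include <assert.h>

#define MAX_N 8                      /* hypothetical bound for the bookkeeping array */
static int visits[MAX_N][MAX_N];     /* visits[i][j] counts calls to work(i, j) */

void work(int i, int j) { visits[i][j]++; }

void collapsed_nest(int n)
{
    assert(n >= 0 && n <= MAX_N);

    #pragma omp parallel default(shared)
    {
        int i, j;
        /* one loop region covers the whole n*n iteration space,
           so no loop region is nested inside another */
        #pragma omp for collapse(2)
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
                work(i, j);
    }
}
```

Each (i, j) pair is executed by exactly one thread, so every element of visits is written by at most one thread and the counting is free of races.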
Fortran
Example A.40.1f
SUBROUTINE WORK(I, J)
  INTEGER I, J
END SUBROUTINE WORK

SUBROUTINE WRONG1(N)
  INTEGER N

  INTEGER I,J
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP DO
  DO I = 1, N
!$OMP DO     ! incorrect nesting of loop regions
    DO J = 1, N
      CALL WORK(I,J)
    END DO
  END DO
!$OMP END PARALLEL
END SUBROUTINE WRONG1
Fortran
The following orphaned version of the preceding example is also non-conforming:
Example A.40.2c
C/C++
void work(int i, int j) {}
void work1(int i, int n)
{
int j;
/* incorrect nesting of loop regions */
#pragma omp for
for (j=0; j<n; j++)
work(i, j);
}
void wrong2(int n)
{
#pragma omp parallel default(shared)
{
int i;
#pragma omp for
for (i=0; i<n; i++)
work1(i, n);
}
}
C/C++
Fortran
Example A.40.2f
SUBROUTINE WORK1(I,N)
  INTEGER I, N
  INTEGER J
!$OMP DO     ! incorrect nesting of loop regions
  DO J = 1, N
    CALL WORK(I,J)
  END DO
END SUBROUTINE WORK1

SUBROUTINE WRONG2(N)
  INTEGER N
  INTEGER I
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP DO
  DO I = 1, N
    CALL WORK1(I,N)
  END DO
!$OMP END PARALLEL
END SUBROUTINE WRONG2
Fortran
The following example is non-conforming because the loop and single regions are
closely nested:
Example A.40.3c
C/C++
void work(int i, int j) {}
void wrong3(int n)
{
#pragma omp parallel default(shared)
{
int i;
#pragma omp for
for (i=0; i<n; i++) {
/* incorrect nesting of regions */
#pragma omp single
work(i, 0);
}
}
}
C/C++
Fortran
Example A.40.3f
SUBROUTINE WRONG3(N)
  INTEGER N

  INTEGER I
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP DO
  DO I = 1, N
!$OMP SINGLE     ! incorrect nesting of regions
    CALL WORK(I, 1)
!$OMP END SINGLE
  END DO
!$OMP END PARALLEL
END SUBROUTINE WRONG3
Fortran
The following example is non-conforming because a barrier region cannot be closely
nested inside a loop region:
Example A.40.4c
C/C++
void work(int i, int j) {}
void wrong4(int n)
{
#pragma omp parallel default(shared)
{
int i;
#pragma omp for
for (i=0; i<n; i++) {
work(i, 0);
/* incorrect nesting of barrier region in a loop region */
#pragma omp barrier
work(i, 1);
}
}
}
C/C++
Fortran
Example A.40.4f
SUBROUTINE WRONG4(N)
  INTEGER N

  INTEGER I
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP DO
  DO I = 1, N
    CALL WORK(I, 1)
    ! incorrect nesting of barrier region in a loop region
!$OMP BARRIER
    CALL WORK(I, 2)
  END DO
!$OMP END PARALLEL
END SUBROUTINE WRONG4
Fortran
The following example is non-conforming because the barrier region cannot be
closely nested inside the critical region. If this were permitted, it would result in
deadlock because only one thread at a time can enter the critical region:
Example A.40.5c
C/C++
void work(int i, int j) {}
void wrong5(int n)
{
#pragma omp parallel
{
#pragma omp critical
{
work(n, 0);
/* incorrect nesting of barrier region in a critical region */
#pragma omp barrier
work(n, 1);
}
}
}
C/C++
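A conforming restructuring places the barrier between two critical regions instead of inside one. The following sketch assumes that the intent is for every thread to finish the first phase before any thread starts the second; the names two_phases and phase_calls are hypothetical, and work is redefined here only to count its calls per phase.

```c
static int phase_calls[2];   /* hypothetical bookkeeping: calls seen per phase */

void work(int i, int j) { (void)i; phase_calls[j]++; }

void two_phases(int n)
{
    #pragma omp parallel
    {
        #pragma omp critical
        work(n, 0);

        /* every thread of the team reaches this barrier,
           and none of them holds the critical region while waiting */
        #pragma omp barrier

        #pragma omp critical
        work(n, 1);
    }
}
```

Both critical regions are unnamed and therefore mutually exclusive with each other, which keeps the updates to phase_calls serialized.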
Fortran
Example A.40.5f
SUBROUTINE WRONG5(N)
  INTEGER N

!$OMP PARALLEL DEFAULT(SHARED)
!$OMP CRITICAL
  CALL WORK(N,1)
  ! incorrect nesting of barrier region in a critical region
!$OMP BARRIER
  CALL WORK(N,2)
!$OMP END CRITICAL
!$OMP END PARALLEL
END SUBROUTINE WRONG5
Fortran
The following example is non-conforming because the barrier region cannot be
closely nested inside the single region. If this were permitted, it would result in
deadlock because only one thread executes the single region:
Example A.40.6c
C/C++
void work(int i, int j) {}
void wrong6(int n)
{
#pragma omp parallel
{
#pragma omp single
{
work(n, 0);
/* incorrect nesting of barrier region in a single region */
#pragma omp barrier
work(n, 1);
}
}
}
C/C++
Fortran
Example A.40.6f
SUBROUTINE WRONG6(N)
  INTEGER N

!$OMP PARALLEL DEFAULT(SHARED)
!$OMP SINGLE
  CALL WORK(N,1)
  ! incorrect nesting of barrier region in a single region
!$OMP BARRIER
  CALL WORK(N,2)
!$OMP END SINGLE
!$OMP END PARALLEL
END SUBROUTINE WRONG6
Fortran
A.41 The omp_set_dynamic and omp_set_num_threads Routines
Some programs rely on a fixed, prespecified number of threads to execute correctly.
Because the default setting for the dynamic adjustment of the number of threads is
implementation defined, such programs can choose to turn off the dynamic threads
capability and set the number of threads explicitly to ensure portability. The following
example shows how to do this using omp_set_dynamic (Section 3.2.7 on page 123),
and omp_set_num_threads (Section 3.2.1 on page 116).

In this example, the program executes correctly only if it is executed by 16 threads. If
the implementation is not capable of supporting 16 threads, the behavior of this example
is implementation defined (see Algorithm 2.1 on page 36). Note that the number of
threads executing a parallel region remains constant during the region, regardless of
the dynamic threads setting. The dynamic threads mechanism determines the number of
threads to use at the start of the parallel region and keeps it constant for the duration
of the region.

Example A.41.1c
C/C++
#include <omp.h>
#include <stdlib.h>
void do_by_16(float *x, int iam, int ipoints) {}
void dynthreads(float *x, int npoints)
{
int iam, ipoints;
omp_set_dynamic(0);
omp_set_num_threads(16);
#pragma omp parallel shared(x, npoints) private(iam, ipoints)
{
if (omp_get_num_threads() != 16)
abort();
iam = omp_get_thread_num();
ipoints = npoints/16;
do_by_16(x, iam, ipoints);
}
}
C/C++
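Because the team size is constant for the duration of a parallel region, a program can also divide its work by the team size it observes inside the region rather than relying on a fixed count of 16. The following is a sketch under that assumption, not one of the specification's examples: the routine name bump_all is hypothetical, and the serial fallbacks are included only so that the sketch also builds without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks in the spirit of the Appendix B stubs. */
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void)  { return 0; }
#endif

/* Hypothetical routine: adds 1 to every element of x exactly once. */
void bump_all(float *x, int npoints)
{
    #pragma omp parallel shared(x, npoints)
    {
        int nthreads = omp_get_num_threads();   /* constant for the whole region */
        int iam = omp_get_thread_num();
        /* contiguous block [lo, hi) owned by this thread; the blocks of
           consecutive threads meet exactly, so no element is skipped or repeated */
        int lo = (int)((long long)npoints * iam / nthreads);
        int hi = (int)((long long)npoints * (iam + 1) / nthreads);
        int k;

        for (k = lo; k < hi; k++)
            x[k] += 1.0f;
    }
}
```

This version no longer needs omp_set_dynamic or omp_set_num_threads for correctness, at the cost of not knowing in advance how many blocks the data will be split into.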
Fortran
Example A.41.1f
SUBROUTINE DO_BY_16(X, IAM, IPOINTS)
  REAL X(*)
  INTEGER IAM, IPOINTS
END SUBROUTINE DO_BY_16

SUBROUTINE DYNTHREADS(X, NPOINTS)

  INCLUDE "omp_lib.h"     ! or USE OMP_LIB

  INTEGER NPOINTS
  REAL X(NPOINTS)

  INTEGER IAM, IPOINTS

  CALL OMP_SET_DYNAMIC(.FALSE.)
  CALL OMP_SET_NUM_THREADS(16)

!$OMP PARALLEL SHARED(X,NPOINTS) PRIVATE(IAM, IPOINTS)

  IF (OMP_GET_NUM_THREADS() .NE. 16) THEN
    STOP
  ENDIF

  IAM = OMP_GET_THREAD_NUM()
  IPOINTS = NPOINTS/16
  CALL DO_BY_16(X,IAM,IPOINTS)

!$OMP END PARALLEL

END SUBROUTINE DYNTHREADS
Fortran
A.42 The omp_get_num_threads Routine
In the following example, the omp_get_num_threads call (see Section 3.2.2 on
page 117) returns 1 in the sequential part of the code, so np will always be equal to 1.
To determine the number of threads that will be deployed for the parallel region, the
call should be inside the parallel region.
Example A.42.1c
C/C++
#include <omp.h>
void work(int i);

void incorrect()
{
int np, i;
np = omp_get_num_threads();  /* misplaced */
#pragma omp parallel for schedule(static)
for (i=0; i < np; i++)
work(i);
}
C/C++
Fortran
Example A.42.1f
SUBROUTINE WORK(I)
  INTEGER I
  I = I + 1
END SUBROUTINE WORK

SUBROUTINE INCORRECT()
  INCLUDE "omp_lib.h"     ! or USE OMP_LIB
  INTEGER I, NP

  NP = OMP_GET_NUM_THREADS()     !misplaced: will return 1
!$OMP PARALLEL DO SCHEDULE(STATIC)
  DO I = 0, NP-1
    CALL WORK(I)
  ENDDO
!$OMP END PARALLEL DO
END SUBROUTINE INCORRECT
Fortran
The following example shows how to rewrite this program without including a query for
the number of threads:
Example A.42.2c
C/C++
#include <omp.h>
void work(int i);
void correct()
{
int i;
#pragma omp parallel private(i)
{
i = omp_get_thread_num();
work(i);
}
}
C/C++
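When the team size itself is needed, the query can be placed inside the parallel region, as the text at the start of this section recommends. The following is a minimal sketch; the function name observed_team_size is hypothetical, and the serial fallback is included only so that the sketch also builds without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback in the spirit of the Appendix B stubs. */
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: reports the size of the team that executed the region. */
int observed_team_size(void)
{
    int np = 0;

    #pragma omp parallel shared(np)
    {
        /* one thread records the team size; the implied barrier at the
           end of single makes the value visible to the whole team */
        #pragma omp single
        np = omp_get_num_threads();
    }
    return np;
}
```

The returned value describes the region that has just ended; a later parallel region may be executed by a team of a different size.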
Fortran
Example A.42.2f
SUBROUTINE WORK(I)
  INTEGER I

  I = I + 1

END SUBROUTINE WORK

SUBROUTINE CORRECT()
  INCLUDE "omp_lib.h"     ! or USE OMP_LIB
  INTEGER I

!$OMP PARALLEL PRIVATE(I)
  I = OMP_GET_THREAD_NUM()
  CALL WORK(I)
!$OMP END PARALLEL

END SUBROUTINE CORRECT
Fortran
A.43 The omp_init_lock Routine
The following example demonstrates how to initialize an array of locks in a parallel
region by using omp_init_lock (Section 3.3.1 on page 143).
Example A.43.1c
C/C++
#include <omp.h>
omp_lock_t *new_locks()
{
int i;
omp_lock_t *lock = new omp_lock_t[1000];
#pragma omp parallel for private(i)
for (i=0; i<1000; i++)
{
omp_init_lock(&lock[i]);
}
return lock;
}
C/C++
Fortran
Example A.43.1f
FUNCTION NEW_LOCKS()
  USE OMP_LIB     ! or INCLUDE "omp_lib.h"
  INTEGER(OMP_LOCK_KIND), DIMENSION(1000) :: NEW_LOCKS

  INTEGER I

!$OMP PARALLEL DO PRIVATE(I)
  DO I=1,1000
    CALL OMP_INIT_LOCK(NEW_LOCKS(I))
  END DO
!$OMP END PARALLEL DO

END FUNCTION NEW_LOCKS
Fortran
A.44 Ownership of Locks
Ownership of locks has changed since OpenMP 2.5. In OpenMP 2.5, locks are owned by
threads; so a lock released by the omp_unset_lock routine must be owned by the
same thread executing the routine. With OpenMP 3.0, locks are owned by task regions;
so a lock released by the omp_unset_lock routine in a task region must be owned by
the same task region.
This change in ownership requires extra care when using locks. The following program
is conforming in OpenMP 2.5 because the thread that releases the lock lck in the
parallel region is the same thread that acquired the lock in the sequential part of the
program (master thread of parallel region and the initial thread are the same). However,
it is not conforming in OpenMP 3.0 and 3.1, because the task region that releases the
lock lck is different from the task region that acquires the lock.
Example A.44.1c
C/C++
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>
int main()
{
int x;
omp_lock_t lck;
omp_init_lock (&lck);
omp_set_lock (&lck);
x = 0;
#pragma omp parallel shared (x)
{
#pragma omp master
{
x = x + 1;
omp_unset_lock (&lck);
}
/* Some more stuff. */
}
omp_destroy_lock (&lck);
return 0;
}
C/C++
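One way to make such a program conform to OpenMP 3.0 and 3.1 is to acquire and release the lock within the same task region. The following sketch does both inside the master construct; the function name locked_increment is hypothetical, and the serial stand-ins for the lock routines are assumptions added only so that the sketch also builds without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Minimal serial stand-ins, assumed here only so the sketch builds without OpenMP. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { *l = 0; }
#endif

/* Hypothetical helper: increments x once under the lock and returns it. */
int locked_increment(void)
{
    int x = 0;
    omp_lock_t lck;

    omp_init_lock(&lck);

    #pragma omp parallel shared(x, lck)
    {
        #pragma omp master
        {
            omp_set_lock(&lck);     /* acquired by the master thread's implicit task */
            x = x + 1;
            omp_unset_lock(&lck);   /* released by that same task region */
        }
        /* Some more stuff. */
    }

    omp_destroy_lock(&lck);
    return x;
}
```

Initializing and destroying the lock in the sequential part remains valid, because only setting and unsetting are tied to the owning task region.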
Fortran
Example A.44.1f
program lock
use omp_lib
integer :: x
integer (kind=omp_lock_kind) :: lck
call omp_init_lock (lck)
call omp_set_lock(lck)
x = 0
!$omp parallel shared (x)
!$omp master
x = x + 1
call omp_unset_lock(lck)
!$omp end master
! Some more stuff.
!$omp end parallel
call omp_destroy_lock(lck)
end
Fortran
A.45 Simple Lock Routines
In the following example (for Section 3.3 on page 141), the lock routines cause the
threads to be idle while waiting for entry to the first critical section, but to do other work
while waiting for entry to the second. The omp_set_lock function blocks, but the
omp_test_lock function does not, allowing the work in skip to be done.
C/C++
Note that the argument to the lock routines should have type omp_lock_t, and that
there is no need to flush it.
Example A.45.1c
#include <stdio.h>
#include <omp.h>
void skip(int i) {}
void work(int i) {}
int main()
{
omp_lock_t lck;
int id;
omp_init_lock(&lck);
#pragma omp parallel shared(lck) private(id)
{
id = omp_get_thread_num();
omp_set_lock(&lck);
/* only one thread at a time can execute this printf */
printf("My thread id is %d.\n", id);
omp_unset_lock(&lck);
while (! omp_test_lock(&lck)) {
skip(id);
/* we do not yet have the lock,
so we must do something else */
}
work(id);
/* we now have the lock
and can do the work */
omp_unset_lock(&lck);
}
omp_destroy_lock(&lck);
return 0;
}
C/C++
Fortran
Note that there is no need to flush the lock variable.
Example A.45.1f
SUBROUTINE SKIP(ID)
END SUBROUTINE SKIP
SUBROUTINE WORK(ID)
END SUBROUTINE WORK
PROGRAM SIMPLELOCK
INCLUDE "omp_lib.h"
! or USE OMP_LIB
INTEGER(OMP_LOCK_KIND) LCK
INTEGER ID
CALL OMP_INIT_LOCK(LCK)
!$OMP PARALLEL SHARED(LCK) PRIVATE(ID)
ID = OMP_GET_THREAD_NUM()
CALL OMP_SET_LOCK(LCK)
PRINT *, 'My thread id is ', ID
CALL OMP_UNSET_LOCK(LCK)
DO WHILE (.NOT. OMP_TEST_LOCK(LCK))
CALL SKIP(ID)
! We do not yet have the lock
! so we must do something else
END DO
CALL WORK(ID)
! We now have the lock
! and can do the work
CALL OMP_UNSET_LOCK( LCK )
!$OMP END PARALLEL
CALL OMP_DESTROY_LOCK( LCK )
END PROGRAM SIMPLELOCK
Fortran
A.46 Nestable Lock Routines
The following example (for Section 3.3 on page 141) demonstrates how a nestable lock
can be used to synchronize updates both to a whole structure and to one of its members.
Example A.46.1c
C/C++
#include <omp.h>
typedef struct {
int a,b;
omp_nest_lock_t lck; } pair;
int work1();
int work2();
int work3();
void incr_a(pair *p, int a)
{
/* Called only from incr_pair, no need to lock. */
p->a += a;
}
void incr_b(pair *p, int b)
{
/* Called both from incr_pair and elsewhere, */
/* so need a nestable lock. */
omp_set_nest_lock(&p->lck);
p->b += b;
omp_unset_nest_lock(&p->lck);
}
void incr_pair(pair *p, int a, int b)
{
omp_set_nest_lock(&p->lck);
incr_a(p, a);
incr_b(p, b);
omp_unset_nest_lock(&p->lck);
}
void nestlock(pair *p)
{
#pragma omp parallel sections
{
#pragma omp section
incr_pair(p, work1(), work2());
#pragma omp section
incr_b(p, work3());
}
}
C/C++
Fortran
Example A.46.1f
MODULE DATA
USE OMP_LIB, ONLY: OMP_NEST_LOCK_KIND
TYPE LOCKED_PAIR
INTEGER A
INTEGER B
INTEGER (OMP_NEST_LOCK_KIND) LCK
END TYPE
END MODULE DATA
SUBROUTINE INCR_A(P, A)
! called only from INCR_PAIR, no need to lock
USE DATA
TYPE(LOCKED_PAIR) :: P
INTEGER A
P%A = P%A + A
END SUBROUTINE INCR_A
SUBROUTINE INCR_B(P, B)
! called from both INCR_PAIR and elsewhere,
! so we need a nestable lock
USE OMP_LIB
! or INCLUDE "omp_lib.h"
USE DATA
TYPE(LOCKED_PAIR) :: P
INTEGER B
CALL OMP_SET_NEST_LOCK(P%LCK)
P%B = P%B + B
CALL OMP_UNSET_NEST_LOCK(P%LCK)
END SUBROUTINE INCR_B
SUBROUTINE INCR_PAIR(P, A, B)
USE OMP_LIB
! or INCLUDE "omp_lib.h"
USE DATA
TYPE(LOCKED_PAIR) :: P
INTEGER A
INTEGER B
CALL OMP_SET_NEST_LOCK(P%LCK)
CALL INCR_A(P, A)
CALL INCR_B(P, B)
CALL OMP_UNSET_NEST_LOCK(P%LCK)
END SUBROUTINE INCR_PAIR
SUBROUTINE NESTLOCK(P)
USE OMP_LIB
! or INCLUDE "omp_lib.h"
USE DATA
TYPE(LOCKED_PAIR) :: P
INTEGER WORK1, WORK2, WORK3
EXTERNAL WORK1, WORK2, WORK3
!$OMP PARALLEL SECTIONS
!$OMP SECTION
CALL INCR_PAIR(P, WORK1(), WORK2())
!$OMP SECTION
CALL INCR_B(P, WORK3())
!$OMP END PARALLEL SECTIONS
END SUBROUTINE NESTLOCK
Fortran
APPENDIX B
Stubs for Runtime Library Routines
This section provides stubs for the runtime library routines defined in the OpenMP API.
The stubs are provided to enable portability to platforms that do not support the
OpenMP API. On these platforms, OpenMP programs must be linked with a library
containing these stub routines. The stub routines assume that the directives in the
OpenMP program are ignored. As such, they emulate serial semantics.
Note that the lock variable that appears in the lock routines must be accessed
exclusively through these routines. It should not be initialized or otherwise modified in
the user program.
In an actual implementation the lock variable might be used to hold the address of an
allocated memory block, but here it is used to hold an integer value. Users should not
make assumptions about mechanisms used by OpenMP implementations to implement
locks based on the scheme used by the stub procedures.
Fortran
Note – In order to be able to compile the Fortran stubs file, the include file
omp_lib.h was split into two files, omp_lib_kinds.h and omp_lib.h, and the
omp_lib_kinds.h file is included where needed. There is no requirement for the
implementation to provide separate files.
Fortran
B.1 C/C++ Stub Routines
#include <stdio.h>
#include <stdlib.h>
#include "omp.h"
void omp_set_num_threads(int num_threads)
{
}
int omp_get_num_threads(void)
{
return 1;
}
int omp_get_max_threads(void)
{
return 1;
}
int omp_get_thread_num(void)
{
return 0;
}
int omp_get_num_procs(void)
{
return 1;
}
int omp_in_parallel(void)
{
return 0;
}
void omp_set_dynamic(int dynamic_threads)
{
}
int omp_get_dynamic(void)
{
return 0;
}
void omp_set_nested(int nested)
{
}
int omp_get_nested(void)
{
return 0;
}
void omp_set_schedule(omp_sched_t kind, int modifier)
{
}
void omp_get_schedule(omp_sched_t *kind, int *modifier)
{
*kind = omp_sched_static;
*modifier = 0;
}
int omp_get_thread_limit(void)
{
return 1;
}
void omp_set_max_active_levels(int max_active_levels)
{
}
int omp_get_max_active_levels(void)
{
return 0;
}
int omp_get_level(void)
{
return 0;
}
int omp_get_ancestor_thread_num(int level)
{
if (level == 0)
{
return 0;
}
else
{
return -1;
}
}
int omp_get_team_size(int level)
{
if (level == 0)
{
return 1;
}
else
{
return -1;
}
}
int omp_get_active_level(void)
{
return 0;
}
int omp_in_final(void)
{
return 1;
}
struct __omp_lock
{
int lock;
};
enum { UNLOCKED = -1, INIT, LOCKED };
void omp_init_lock(omp_lock_t *arg)
{
struct __omp_lock *lock = (struct __omp_lock *)arg;
lock->lock = UNLOCKED;
}
void omp_destroy_lock(omp_lock_t *arg)
{
struct __omp_lock *lock = (struct __omp_lock *)arg;
lock->lock = INIT;
}
void omp_set_lock(omp_lock_t *arg)
{
struct __omp_lock *lock = (struct __omp_lock *)arg;
if (lock->lock == UNLOCKED)
{
lock->lock = LOCKED;
}
else if (lock->lock == LOCKED)
{
fprintf(stderr,
"error: deadlock in using lock variable\n");
exit(1);
}
else
{
fprintf(stderr, "error: lock not initialized\n");
exit(1);
}
}
void omp_unset_lock(omp_lock_t *arg)
{
struct __omp_lock *lock = (struct __omp_lock *)arg;
if (lock->lock == LOCKED)
{
lock->lock = UNLOCKED;
}
else if (lock->lock == UNLOCKED)
{
fprintf(stderr, "error: lock not set\n");
exit(1);
}
else
{
fprintf(stderr, "error: lock not initialized\n");
exit(1);
}
}
int omp_test_lock(omp_lock_t *arg)
{
struct __omp_lock *lock = (struct __omp_lock *)arg;
if (lock->lock == UNLOCKED)
{
lock->lock = LOCKED;
return 1;
}
else if (lock->lock == LOCKED)
{
return 0;
}
else
{
fprintf(stderr, "error: lock not initialized\n");
exit(1);
}
}
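The three states used by the simple lock stubs can be summarized as a small state machine. The sketch below is not the stub code itself: it uses a hypothetical `demo_lock_t` type and returns -1 in the places where the stubs print an error and call `exit`, so that the transitions can be exercised without terminating the program.

```c
/* States mirror the stub's enumeration: 0 = not initialized (or destroyed),
 * -1 = initialized but not set, 1 = set. */
enum demo_state { DEMO_INIT = 0, DEMO_UNLOCKED = -1, DEMO_LOCKED = 1 };

typedef struct { int state; } demo_lock_t;   /* hypothetical type */

static void demo_init(demo_lock_t *l)    { l->state = DEMO_UNLOCKED; }
static void demo_destroy(demo_lock_t *l) { l->state = DEMO_INIT; }

/* Returns 0 on success, -1 where the stub would report an error. */
static int demo_set(demo_lock_t *l)
{
    if (l->state != DEMO_UNLOCKED)
        return -1;       /* already set (deadlock) or not initialized */
    l->state = DEMO_LOCKED;
    return 0;
}

/* Returns 0 on success, -1 where the stub would report an error. */
static int demo_unset(demo_lock_t *l)
{
    if (l->state != DEMO_LOCKED)
        return -1;       /* not set or not initialized */
    l->state = DEMO_UNLOCKED;
    return 0;
}

/* Returns 1 if the lock was acquired, 0 if it was already set,
 * and -1 if it was not initialized. */
static int demo_test(demo_lock_t *l)
{
    if (l->state == DEMO_UNLOCKED) {
        l->state = DEMO_LOCKED;
        return 1;
    }
    return (l->state == DEMO_LOCKED) ? 0 : -1;
}
```

Because only one thread exists under the stubs, setting a lock that is already set can never succeed later, which is why the stub treats that case as a deadlock rather than waiting.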
struct __omp_nest_lock
{
short owner;
short count;
};
enum { NOOWNER = -1, MASTER = 0 };
void omp_init_nest_lock(omp_nest_lock_t *arg)
{
struct __omp_nest_lock *nlock=(struct __omp_nest_lock *)arg;
nlock->owner = NOOWNER;
nlock->count = 0;
}
void omp_destroy_nest_lock(omp_nest_lock_t *arg)
{
struct __omp_nest_lock *nlock=(struct __omp_nest_lock *)arg;
nlock->owner = NOOWNER;
nlock->count = UNLOCKED;
}
void omp_set_nest_lock(omp_nest_lock_t *arg)
{
struct __omp_nest_lock *nlock=(struct __omp_nest_lock *)arg;
if (nlock->owner == MASTER && nlock->count >= 1)
{
nlock->count++;
}
else if (nlock->owner == NOOWNER && nlock->count == 0)
{
nlock->owner = MASTER;
nlock->count = 1;
}
else
{
fprintf(stderr,
"error: lock corrupted or not initialized\n");
exit(1);
}
}
void omp_unset_nest_lock(omp_nest_lock_t *arg)
{
struct __omp_nest_lock *nlock=(struct __omp_nest_lock *)arg;
if (nlock->owner == MASTER && nlock->count >= 1)
{
nlock->count--;
if (nlock->count == 0)
{
nlock->owner = NOOWNER;
}
}
else if (nlock->owner == NOOWNER && nlock->count == 0)
{
fprintf(stderr, "error: lock not set\n");
exit(1);
}
else
{
fprintf(stderr,
"error: lock corrupted or not initialized\n");
exit(1);
}
}
int omp_test_nest_lock(omp_nest_lock_t *arg)
{
struct __omp_nest_lock *nlock=(struct __omp_nest_lock *)arg;
omp_set_nest_lock(arg);
return nlock->count;
}
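The nesting count kept by the nestable lock stubs can be traced with the following sketch. It follows the same owner/count scheme as the routines above, but the type `demo_nest_lock_t` and the `demo_nest_*` functions are hypothetical, and -1 is returned where the stubs would print an error and exit.

```c
typedef struct { short owner; short count; } demo_nest_lock_t; /* hypothetical */
enum { DEMO_NOOWNER = -1, DEMO_MASTER = 0 };

static void demo_nest_init(demo_nest_lock_t *n)
{
    n->owner = DEMO_NOOWNER;
    n->count = 0;
}

/* Returns the new nesting count, or -1 if the lock is corrupted
 * or not initialized. */
static int demo_nest_set(demo_nest_lock_t *n)
{
    if (n->owner == DEMO_MASTER && n->count >= 1) {
        n->count++;                 /* re-acquired by the owner */
    } else if (n->owner == DEMO_NOOWNER && n->count == 0) {
        n->owner = DEMO_MASTER;     /* first acquisition */
        n->count = 1;
    } else {
        return -1;
    }
    return n->count;
}

/* Returns the remaining nesting count, or -1 if the lock is not set. */
static int demo_nest_unset(demo_nest_lock_t *n)
{
    if (n->owner != DEMO_MASTER || n->count < 1)
        return -1;
    if (--n->count == 0)
        n->owner = DEMO_NOOWNER;    /* fully released */
    return n->count;
}
```

Setting the lock twice yields counts 1 and 2; two matching unset calls bring the count back to 0 and clear the owner.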
double omp_get_wtime(void)
{
/* This function does not provide a working
* wallclock timer. Replace it with a version
* customized for the target machine.
*/
return 0.0;
}
double omp_get_wtick(void)
{
/* This function does not provide a working
* clock tick function. Replace it with
* a version customized for the target machine.
*/
return 365. * 86400.;
}
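The comments in the two timing stubs ask for a replacement customized for the target machine. One possible replacement for the wall clock timer is sketched below, assuming a C11 library that provides `timespec_get`. The name `demo_wtime` is hypothetical. Note that the `TIME_UTC` clock can be adjusted while a program runs, so a platform-specific monotonic clock may be a better choice where one is available.

```c
#include <time.h>

/* Hypothetical replacement for the omp_get_wtime stub: returns the
 * seconds elapsed since the TIME_UTC epoch as a double. */
static double demo_wtime(void)
{
    struct timespec ts;

    if (timespec_get(&ts, TIME_UTC) != TIME_UTC)
        return 0.0;      /* on failure, fall back to the stub's value */
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1.0e-9;
}
```

An elapsed time is then the difference between two calls, for example `double t0 = demo_wtime(); /* work */ double dt = demo_wtime() - t0;`.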
B.2 Fortran Stub Routines
C23456
subroutine omp_set_num_threads(num_threads)
integer num_threads
return
end subroutine
integer function omp_get_num_threads()
omp_get_num_threads = 1
return
end function
integer function omp_get_max_threads()
omp_get_max_threads = 1
return
end function
integer function omp_get_thread_num()
omp_get_thread_num = 0
return
end function
integer function omp_get_num_procs()
omp_get_num_procs = 1
return
end function
logical function omp_in_parallel()
omp_in_parallel = .false.
return
end function
subroutine omp_set_dynamic(dynamic_threads)
logical dynamic_threads
return
end subroutine
logical function omp_get_dynamic()
omp_get_dynamic = .false.
return
end function
subroutine omp_set_nested(nested)
logical nested
return
end subroutine
logical function omp_get_nested()
omp_get_nested = .false.
return
end function
subroutine omp_set_schedule(kind, modifier)
include 'omp_lib_kinds.h'
integer (kind=omp_sched_kind) kind
integer modifier
return
end subroutine
subroutine omp_get_schedule(kind, modifier)
include 'omp_lib_kinds.h'
integer (kind=omp_sched_kind) kind
integer modifier
kind = omp_sched_static
modifier = 0
return
end subroutine
integer function omp_get_thread_limit()
omp_get_thread_limit = 1
return
end function
subroutine omp_set_max_active_levels( level )
integer level
end subroutine
integer function omp_get_max_active_levels()
omp_get_max_active_levels = 0
return
end function
integer function omp_get_level()
omp_get_level = 0
return
end function
integer function omp_get_ancestor_thread_num( level )
integer level
if ( level .eq. 0 ) then
omp_get_ancestor_thread_num = 0
else
omp_get_ancestor_thread_num = -1
end if
return
end function
integer function omp_get_team_size( level )
integer level
if ( level .eq. 0 ) then
omp_get_team_size = 1
else
omp_get_team_size = -1
end if
return
end function
integer function omp_get_active_level()
omp_get_active_level = 0
return
end function
logical function omp_in_final()
omp_in_final = .true.
return
end function
subroutine omp_init_lock(lock)
! lock is  0 if the simple lock is not initialized
!         -1 if the simple lock is initialized but not set
!          1 if the simple lock is set
include 'omp_lib_kinds.h'
integer(kind=omp_lock_kind) lock
lock = -1
return
end subroutine
subroutine omp_destroy_lock(lock)
include 'omp_lib_kinds.h'
integer(kind=omp_lock_kind) lock
lock = 0
return
end subroutine
subroutine omp_set_lock(lock)
include 'omp_lib_kinds.h'
integer(kind=omp_lock_kind) lock
if (lock .eq. -1) then
lock = 1
elseif (lock .eq. 1) then
print *, 'error: deadlock in using lock variable'
stop
else
print *, 'error: lock not initialized'
stop
endif
return
end subroutine
subroutine omp_unset_lock(lock)
include 'omp_lib_kinds.h'
integer(kind=omp_lock_kind) lock
if (lock .eq. 1) then
lock = -1
elseif (lock .eq. -1) then
print *, 'error: lock not set'
stop
else
print *, 'error: lock not initialized'
stop
endif
return
end subroutine
logical function omp_test_lock(lock)
include 'omp_lib_kinds.h'
integer(kind=omp_lock_kind) lock
if (lock .eq. -1) then
lock = 1
omp_test_lock = .true.
elseif (lock .eq. 1) then
omp_test_lock = .false.
else
print *, 'error: lock not initialized'
stop
endif
return
end function
subroutine omp_init_nest_lock(nlock)
! nlock is
! 0 if the nestable lock is not initialized
! -1 if the nestable lock is initialized but not set
! 1 if the nestable lock is set
! no use count is maintained
include 'omp_lib_kinds.h'
integer(kind=omp_nest_lock_kind) nlock
nlock = -1
return
end subroutine
subroutine omp_destroy_nest_lock(nlock)
include 'omp_lib_kinds.h'
integer(kind=omp_nest_lock_kind) nlock
nlock = 0
return
end subroutine
subroutine omp_set_nest_lock(nlock)
include 'omp_lib_kinds.h'
integer(kind=omp_nest_lock_kind) nlock
if (nlock .eq. -1) then
nlock = 1
elseif (nlock .eq. 0) then
print *, 'error: nested lock not initialized'
stop
else
print *, 'error: deadlock using nested lock variable'
stop
endif
return
end subroutine
subroutine omp_unset_nest_lock(nlock)
include 'omp_lib_kinds.h'
integer(kind=omp_nest_lock_kind) nlock
if (nlock .eq. 1) then
nlock = -1
elseif (nlock .eq. 0) then
print *, 'error: nested lock not initialized'
stop
else
print *, 'error: nested lock not set'
stop
endif
return
end subroutine
integer function omp_test_nest_lock(nlock)
include 'omp_lib_kinds.h'
integer(kind=omp_nest_lock_kind) nlock
if (nlock .eq. -1) then
nlock = 1
omp_test_nest_lock = 1
elseif (nlock .eq. 1) then
omp_test_nest_lock = 0
else
print *, 'error: nested lock not initialized'
stop
endif
return
end function
double precision function omp_get_wtime()
! this function does not provide a working
! wall clock timer. replace it with a version
! customized for the target machine.
omp_get_wtime = 0.0d0
return
end function
double precision function omp_get_wtick()
! this function does not provide a working
! clock tick function. replace it with
! a version customized for the target machine.
double precision one_year
parameter (one_year=365.d0*86400.d0)
omp_get_wtick = one_year
return
end function
APPENDIX C
OpenMP C and C++ Grammar
C.1 Notation
The grammar rules consist of the name for a non-terminal, followed by a colon,
followed by replacement alternatives on separate lines.
The syntactic expression term_opt (term with the subscript "opt") indicates that the
term is optional within the replacement.
The syntactic expression term_optseq is equivalent to term-seq_opt with the following
additional rules:
term-seq :
    term
    term-seq term
    term-seq , term
C.2 Rules
The notation is described in Section 6.1 of the C standard. This grammar appendix
shows the extensions to the base language grammar for the OpenMP C and C++
directives.
/* in C++ (ISO/IEC 14882:1998) */
statement-seq:
    statement
    openmp-directive
    statement-seq statement
    statement-seq openmp-directive

/* in C90 (ISO/IEC 9899:1990) */
statement-list:
    statement
    openmp-directive
    statement-list statement
    statement-list openmp-directive

/* in C99 (ISO/IEC 9899:1999) */
block-item:
    declaration
    statement
    openmp-directive

statement:
    /* standard statements */
    openmp-construct

openmp-construct:
    parallel-construct
    for-construct
    sections-construct
    single-construct
    parallel-for-construct
    parallel-sections-construct
    task-construct
    master-construct
    critical-construct
    atomic-construct
    ordered-construct

openmp-directive:
    barrier-directive
    taskwait-directive
    taskyield-directive
    flush-directive

structured-block:
    statement

parallel-construct:
    parallel-directive structured-block

parallel-directive:
    # pragma omp parallel parallel-clause_optseq new-line

parallel-clause:
    unique-parallel-clause
    data-default-clause
    data-privatization-clause
    data-privatization-in-clause
    data-sharing-clause
    data-reduction-clause

unique-parallel-clause:
    if ( expression )
    num_threads ( expression )
    copyin ( variable-list )

for-construct:
    for-directive iteration-statement

for-directive:
    # pragma omp for for-clause_optseq new-line

for-clause:
    unique-for-clause
    data-privatization-clause
    data-privatization-in-clause
    data-privatization-out-clause
    data-reduction-clause
    nowait

unique-for-clause:
    ordered
    schedule ( schedule-kind )
    schedule ( schedule-kind , expression )
    collapse ( expression )

schedule-kind:
    static
    dynamic
    guided
    auto
    runtime

sections-construct:
    sections-directive section-scope

sections-directive:
    # pragma omp sections sections-clause_optseq new-line

sections-clause:
    data-privatization-clause
    data-privatization-in-clause
    data-privatization-out-clause
    data-reduction-clause
    nowait

section-scope:
    { section-sequence }

section-sequence:
    section-directive_opt structured-block
    section-sequence section-directive structured-block

section-directive:
    # pragma omp section new-line

single-construct:
    single-directive structured-block

single-directive:
    # pragma omp single single-clause_optseq new-line

single-clause:
    unique-single-clause
    data-privatization-clause
    data-privatization-in-clause
    nowait

unique-single-clause:
    copyprivate ( variable-list )

task-construct:
    task-directive structured-block

task-directive:
    # pragma omp task task-clause_optseq new-line

task-clause:
    unique-task-clause
    data-default-clause
    data-privatization-clause
    data-privatization-in-clause
    data-sharing-clause

unique-task-clause:
    if ( scalar-expression )
    final ( scalar-expression )
    untied
    mergeable

parallel-for-construct:
    parallel-for-directive iteration-statement

parallel-for-directive:
    # pragma omp parallel for parallel-for-clause_optseq new-line

parallel-for-clause:
    unique-parallel-clause
    unique-for-clause
    data-default-clause
    data-privatization-clause
    data-privatization-in-clause
    data-privatization-out-clause
    data-sharing-clause
    data-reduction-clause

parallel-sections-construct:
    parallel-sections-directive section-scope

parallel-sections-directive:
    # pragma omp parallel sections parallel-sections-clause_optseq new-line

parallel-sections-clause:
    unique-parallel-clause
    data-default-clause
    data-privatization-clause
    data-privatization-in-clause
    data-privatization-out-clause
    data-sharing-clause
    data-reduction-clause

master-construct:
    master-directive structured-block

master-directive:
    # pragma omp master new-line

critical-construct:
    critical-directive structured-block

critical-directive:
    # pragma omp critical region-phrase_opt new-line

region-phrase:
    ( identifier )

barrier-directive:
    # pragma omp barrier new-line

taskwait-directive:
    # pragma omp taskwait new-line

taskyield-directive:
    # pragma omp taskyield new-line

atomic-construct:
    atomic-directive expression-statement
    atomic-directive structured-block

atomic-directive:
    # pragma omp atomic atomic-clause_opt new-line

atomic-clause:
    read
    write
    update
    capture

flush-directive:
    # pragma omp flush flush-vars_opt new-line

flush-vars:
    ( variable-list )

ordered-construct:
    ordered-directive structured-block

ordered-directive:
    # pragma omp ordered new-line

declaration:
    /* standard declarations */
    threadprivate-directive

threadprivate-directive:
    # pragma omp threadprivate ( variable-list ) new-line

data-default-clause:
    default ( shared )
    default ( none )

data-privatization-clause:
    private ( variable-list )

data-privatization-in-clause:
    firstprivate ( variable-list )

data-privatization-out-clause:
    lastprivate ( variable-list )

data-sharing-clause:
    shared ( variable-list )

data-reduction-clause:
    reduction ( reduction-operator : variable-list )

reduction-operator:
    One of: + * - & ^ | && || max min

/* in C */
variable-list:
    identifier
    variable-list , identifier

/* in C++ */
variable-list:
    id-expression
    variable-list , id-expression
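
As an illustration of how the rules above combine, the following C sketch shows one directive that matches the parallel-for-directive rule, with a schedule clause (a unique-for-clause) and a reduction clause (a data-reduction-clause). The function name `sum_squares` is hypothetical. A compiler without OpenMP support ignores the pragma and runs the loop serially, producing the same result.

```c
/* Sums the squares 1*1 + 2*2 + ... + n*n. */
static long sum_squares(int n)
{
    long total = 0;
    int i;

    /* parallel-for-directive: "# pragma omp parallel for" followed by
     * a schedule clause and a reduction clause, then new-line. */
    #pragma omp parallel for schedule(static) reduction(+ : total)
    for (i = 1; i <= n; i++)
        total += (long)i * i;

    return total;
}
```

The loop that follows the directive is the iteration-statement required by the parallel-for-construct rule.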
APPENDIX D
Interface Declarations
This appendix gives examples of the C/C++ header file, the Fortran include file and
Fortran module that shall be provided by implementations as specified in Chapter 3. It
also includes an example of a Fortran 90 generic interface for a library routine. This is a
non-normative section; implementation files may differ.
D.1 Example of the omp.h Header File
#ifndef _OMP_H_DEF
#define _OMP_H_DEF
/*
* define the lock data types
*/
typedef void *omp_lock_t;
typedef void *omp_nest_lock_t;
/*
* define the schedule kinds
*/
typedef enum omp_sched_t
{
omp_sched_static = 1,
omp_sched_dynamic = 2,
omp_sched_guided = 3,
omp_sched_auto = 4
/* , Add vendor specific schedule constants here */
} omp_sched_t;
/*
* exported OpenMP functions
*/
#ifdef __cplusplus
extern "C"
{
#endif
extern void omp_set_num_threads(int num_threads);
extern int omp_get_num_threads(void);
extern int omp_get_max_threads(void);
extern int omp_get_thread_num(void);
extern int omp_get_num_procs(void);
extern int omp_in_parallel(void);
extern void omp_set_dynamic(int dynamic_threads);
extern int omp_get_dynamic(void);
extern void omp_set_nested(int nested);
extern int omp_get_nested(void);
extern int omp_get_thread_limit(void);
extern void omp_set_max_active_levels(int max_active_levels);
extern int omp_get_max_active_levels(void);
extern int omp_get_level(void);
extern int omp_get_ancestor_thread_num(int level);
extern int omp_get_team_size(int level);
extern int omp_get_active_level(void);
extern int omp_in_final(void);
extern void omp_set_schedule(omp_sched_t kind, int modifier);
extern void omp_get_schedule(omp_sched_t *kind, int *modifier);
extern void omp_init_lock(omp_lock_t *lock);
extern void omp_destroy_lock(omp_lock_t *lock);
extern void omp_set_lock(omp_lock_t *lock);
extern void omp_unset_lock(omp_lock_t *lock);
extern int omp_test_lock(omp_lock_t *lock);
extern void omp_init_nest_lock(omp_nest_lock_t *lock);
extern void omp_destroy_nest_lock(omp_nest_lock_t *lock);
extern void omp_set_nest_lock(omp_nest_lock_t *lock);
extern void omp_unset_nest_lock(omp_nest_lock_t *lock);
extern int omp_test_nest_lock(omp_nest_lock_t *lock);
extern double omp_get_wtime(void);
extern double omp_get_wtick(void);
#ifdef __cplusplus
}
#endif
#endif
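The schedule kinds declared in the example header can be mapped to readable names as in the sketch below. The function `sched_name` is hypothetical. The fallback enumeration is a copy of the example values above and is used only when the compiler does not define `_OPENMP`; a real implementation's header may add vendor-specific constants, which the default case covers.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback copy of the example enumeration, for compilers without OpenMP. */
typedef enum omp_sched_t
{
    omp_sched_static = 1,
    omp_sched_dynamic = 2,
    omp_sched_guided = 3,
    omp_sched_auto = 4
} omp_sched_t;
#endif

/* Hypothetical helper: returns a printable name for a schedule kind. */
static const char *sched_name(omp_sched_t kind)
{
    switch (kind) {
    case omp_sched_static:  return "static";
    case omp_sched_dynamic: return "dynamic";
    case omp_sched_guided:  return "guided";
    case omp_sched_auto:    return "auto";
    default:                return "implementation specific";
    }
}
```

A typical use would pass the kind obtained from `omp_get_schedule` to `sched_name` when logging the schedule in effect.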
D.2 Example of an Interface Declaration include File
omp_lib_kinds.h:
integer omp_lock_kind
integer omp_nest_lock_kind
! this selects an integer that is large enough to hold a 64 bit integer
parameter ( omp_lock_kind = selected_int_kind( 10 ) )
parameter ( omp_nest_lock_kind = selected_int_kind( 10 ) )
integer omp_sched_kind
! this selects an integer that is large enough to hold a 32 bit integer
parameter ( omp_sched_kind = selected_int_kind( 8 ) )
integer ( omp_sched_kind ) omp_sched_static
parameter ( omp_sched_static = 1 )
integer ( omp_sched_kind ) omp_sched_dynamic
parameter ( omp_sched_dynamic = 2 )
integer ( omp_sched_kind ) omp_sched_guided
parameter ( omp_sched_guided = 3 )
integer ( omp_sched_kind ) omp_sched_auto
parameter ( omp_sched_auto = 4 )
omp_lib.h:
! default integer type assumed below
! default logical type assumed below
! OpenMP API v3.1
include 'omp_lib_kinds.h'
integer openmp_version
parameter ( openmp_version = 201107 )
external omp_set_num_threads
external omp_get_num_threads
integer omp_get_num_threads
external omp_get_max_threads
integer omp_get_max_threads
external omp_get_thread_num
integer omp_get_thread_num
external omp_get_num_procs
integer omp_get_num_procs
external omp_in_parallel
logical omp_in_parallel
external omp_set_dynamic
external omp_get_dynamic
logical omp_get_dynamic
external omp_set_nested
external omp_get_nested
logical omp_get_nested
external omp_set_schedule
external omp_get_schedule
external omp_get_thread_limit
integer omp_get_thread_limit
external omp_set_max_active_levels
external omp_get_max_active_levels
integer omp_get_max_active_levels
external omp_get_level
integer omp_get_level
external omp_get_ancestor_thread_num
integer omp_get_ancestor_thread_num
external omp_get_team_size
integer omp_get_team_size
external omp_get_active_level
integer omp_get_active_level
external omp_in_final
logical omp_in_final
external omp_init_lock
external omp_destroy_lock
external omp_set_lock
external omp_unset_lock
external omp_test_lock
logical omp_test_lock
external omp_init_nest_lock
external omp_destroy_nest_lock
external omp_set_nest_lock
external omp_unset_nest_lock
external omp_test_nest_lock
integer omp_test_nest_lock
external omp_get_wtick
double precision omp_get_wtick
external omp_get_wtime
double precision omp_get_wtime
D.3 Example of a Fortran Interface Declaration module
!     the "!" of this comment starts in column 1
!23456
      module omp_lib_kinds
      integer, parameter :: omp_lock_kind = selected_int_kind( 10 )
      integer, parameter :: omp_nest_lock_kind = selected_int_kind( 10 )
      integer, parameter :: omp_sched_kind = selected_int_kind( 8 )
      integer(kind=omp_sched_kind), parameter ::
     &  omp_sched_static = 1
      integer(kind=omp_sched_kind), parameter ::
     &  omp_sched_dynamic = 2
      integer(kind=omp_sched_kind), parameter ::
     &  omp_sched_guided = 3
      integer(kind=omp_sched_kind), parameter ::
     &  omp_sched_auto = 4
      end module omp_lib_kinds
module omp_lib
use omp_lib_kinds
! OpenMP API v3.1
integer, parameter :: openmp_version = 201107
interface
subroutine omp_set_num_threads (number_of_threads_expr)
integer, intent(in) :: number_of_threads_expr
end subroutine omp_set_num_threads
function omp_get_num_threads ()
integer :: omp_get_num_threads
end function omp_get_num_threads
function omp_get_max_threads ()
integer :: omp_get_max_threads
end function omp_get_max_threads
function omp_get_thread_num ()
integer :: omp_get_thread_num
end function omp_get_thread_num
function omp_get_num_procs ()
integer :: omp_get_num_procs
end function omp_get_num_procs
function omp_in_parallel ()
logical :: omp_in_parallel
end function omp_in_parallel
subroutine omp_set_dynamic (enable_expr)
logical, intent(in) :: enable_expr
end subroutine omp_set_dynamic
function omp_get_dynamic ()
logical :: omp_get_dynamic
end function omp_get_dynamic
subroutine omp_set_nested (enable_expr)
logical, intent(in) :: enable_expr
end subroutine omp_set_nested
function omp_get_nested ()
logical :: omp_get_nested
end function omp_get_nested
subroutine omp_set_schedule (kind, modifier)
use omp_lib_kinds
integer(kind=omp_sched_kind), intent(in) :: kind
integer, intent(in) :: modifier
end subroutine omp_set_schedule
subroutine omp_get_schedule (kind, modifier)
use omp_lib_kinds
integer(kind=omp_sched_kind), intent(out) :: kind
integer, intent(out) :: modifier
end subroutine omp_get_schedule
function omp_get_thread_limit()
integer :: omp_get_thread_limit
end function omp_get_thread_limit
subroutine omp_set_max_active_levels(var)
integer, intent(in) :: var
end subroutine omp_set_max_active_levels
function omp_get_max_active_levels()
integer :: omp_get_max_active_levels
end function omp_get_max_active_levels
function omp_get_level()
integer :: omp_get_level
end function omp_get_level
function omp_get_ancestor_thread_num(level)
integer, intent(in) :: level
integer :: omp_get_ancestor_thread_num
end function omp_get_ancestor_thread_num
function omp_get_team_size(level)
integer, intent(in) :: level
integer :: omp_get_team_size
end function omp_get_team_size
function omp_get_active_level()
integer :: omp_get_active_level
end function omp_get_active_level
function omp_in_final()
logical omp_in_final
end function omp_in_final
subroutine omp_init_lock (var)
use omp_lib_kinds
integer (kind=omp_lock_kind), intent(out) :: var
end subroutine omp_init_lock
subroutine omp_destroy_lock (var)
use omp_lib_kinds
integer (kind=omp_lock_kind), intent(inout) :: var
end subroutine omp_destroy_lock
subroutine omp_set_lock (var)
use omp_lib_kinds
integer (kind=omp_lock_kind), intent(inout) :: var
end subroutine omp_set_lock
subroutine omp_unset_lock (var)
use omp_lib_kinds
integer (kind=omp_lock_kind), intent(inout) :: var
end subroutine omp_unset_lock
function omp_test_lock (var)
use omp_lib_kinds
logical :: omp_test_lock
integer (kind=omp_lock_kind), intent(inout) :: var
end function omp_test_lock
subroutine omp_init_nest_lock (var)
use omp_lib_kinds
integer (kind=omp_nest_lock_kind), intent(out) :: var
end subroutine omp_init_nest_lock
subroutine omp_destroy_nest_lock (var)
use omp_lib_kinds
integer (kind=omp_nest_lock_kind), intent(inout) :: var
end subroutine omp_destroy_nest_lock
subroutine omp_set_nest_lock (var)
use omp_lib_kinds
integer (kind=omp_nest_lock_kind), intent(inout) :: var
end subroutine omp_set_nest_lock
subroutine omp_unset_nest_lock (var)
use omp_lib_kinds
integer (kind=omp_nest_lock_kind), intent(inout) :: var
end subroutine omp_unset_nest_lock
function omp_test_nest_lock (var)
use omp_lib_kinds
integer :: omp_test_nest_lock
integer (kind=omp_nest_lock_kind), intent(inout) :: var
end function omp_test_nest_lock
function omp_get_wtick ()
double precision :: omp_get_wtick
end function omp_get_wtick
function omp_get_wtime ()
double precision :: omp_get_wtime
end function omp_get_wtime
end interface
end module omp_lib
D.4 Example of a Generic Interface for a Library Routine
Any of the OpenMP runtime library routines that take an argument may be extended
with a generic interface so arguments of different KIND type can be accommodated.
The OMP_SET_NUM_THREADS interface could be specified in the omp_lib module
as follows:
!     the "!" of this comment starts in column 1
      interface omp_set_num_threads
        subroutine omp_set_num_threads_1 ( number_of_threads_expr )
          use omp_lib_kinds
          integer ( kind=selected_int_kind( 8 ) ), intent(in) :: &
     &      number_of_threads_expr
        end subroutine omp_set_num_threads_1
        subroutine omp_set_num_threads_2 ( number_of_threads_expr )
          use omp_lib_kinds
          integer ( kind=selected_int_kind( 3 ) ), intent(in) :: &
     &      number_of_threads_expr
        end subroutine omp_set_num_threads_2
      end interface omp_set_num_threads
APPENDIX E
OpenMP Implementation-Defined Behaviors
This appendix summarizes the behaviors that are described as implementation defined in
this API. Each behavior is cross-referenced back to its description in the main
specification. An implementation is required to define and document its behavior in
these cases.
• Memory model: the minimum size at which a memory update may also read and
  write back adjacent variables that are part of another variable (as array or structure
  elements) is implementation defined but is no larger than required by the base
  language (see Section 1.4.1 on page 13).
• Internal control variables: the initial values of nthreads-var, dyn-var, run-sched-var,
  def-sched-var, bind-var, stacksize-var, wait-policy-var, thread-limit-var, and
  max-active-levels-var are implementation defined (see Section 2.3.2 on page 29).
• Dynamic adjustment of threads: providing the ability to dynamically adjust the
  number of threads is implementation defined. Implementations are allowed to deliver
  fewer threads (but at least one) than indicated in Algorithm 2-1 even if dynamic
  adjustment is disabled (see Section 2.4.1 on page 36).
• Loop directive: the integer type or kind used to compute the iteration count of a
  collapsed loop is implementation defined. The effect of the schedule(runtime)
  clause when the run-sched-var ICV is set to auto is implementation defined. See
  Section 2.5.1 on page 39.
• sections construct: the method of scheduling the structured blocks among threads
  in the team is implementation defined (see Section 2.5.2 on page 48).
• single construct: the method of choosing a thread to execute the structured block
  is implementation defined (see Section 2.5.3 on page 50).
• Task scheduling points: where task scheduling points occur in untied task regions is
  implementation defined (see Section 2.7.3 on page 65).
• atomic construct: a compliant implementation may enforce exclusive access
5
6
• omp_set_num_threads routine: if the argument is not a positive integer the
7
8
9
• omp_set_schedule routine: for implementation specific schedule types, the
between atomic regions which update different storage locations. The
circumstances under which this occurs are implementation defined (see Section 2.8.5
on page 73).
behavior is implementation defined (see Section 3.2.1 on page 116).
values and associated meanings of the second argument are implementation defined.
(see Section 3.2.11 on page 128).
• omp_set_max_active_levels routine: when called from within any explicit
parallel region the binding thread set (and binding region, if required) for the
omp_set_max_active_levels region is implementation defined and the
behavior is implementation defined. If the argument is not a non-negative integer
then the behavior is implementation defined (see Section 3.2.14 on page 132).
• omp_get_max_active_levels routine: when called from within any explicit
parallel region the binding thread set (and binding region, if required) for the
omp_get_max_active_levels region is implementation defined (see
Section 3.2.15 on page 134).
• OMP_SCHEDULE environment variable: if the value of the variable does not
conform to the specified format then the result is implementation defined (see
Section 4.1 on page 154).
• OMP_NUM_THREADS environment variable: if any value of the list specified in the
OMP_NUM_THREADS environment variable leads to a number of threads that is
greater than the implementation can support, or if any value is not a positive integer,
then the result is implementation defined (see Section 4.2 on page 155).
• OMP_PROC_BIND environment variable: if the value is neither true nor false the
behavior is implementation defined (see Section 4.4 on page 156).
• OMP_DYNAMIC environment variable: if the value is neither true nor false the
behavior is implementation defined (see Section 4.3 on page 156).
• OMP_NESTED environment variable: if the value is neither true nor false the
behavior is implementation defined (see Section 4.5 on page 157).
• OMP_STACKSIZE environment variable: if the value does not conform to the
specified format or the implementation cannot provide a stack of the specified size
then the behavior is implementation defined (see Section 4.6 on page 157).
• OMP_WAIT_POLICY environment variable: the details of the ACTIVE and
PASSIVE behaviors are implementation defined (see Section 4.7 on page 158).
• OMP_MAX_ACTIVE_LEVELS environment variable: if the value is not a non-negative
integer or is greater than the number of parallel levels an implementation
can support then the behavior is implementation defined (see Section 4.8 on page
159).
• OMP_THREAD_LIMIT environment variable: if the requested value is greater than
the number of threads an implementation can support, or if the value is not a positive
integer, the behavior of the program is implementation defined (see Section 4.9 on
page 160).
Fortran
• threadprivate directive: if the conditions for values of data in the threadprivate
objects of threads (other than the initial thread) to persist between two consecutive
active parallel regions do not all hold, the allocation status of an allocatable array in
the second region is implementation defined (see Section 2.9.2 on page 88).
• shared clause: passing a shared variable to a non-intrinsic procedure may result in
the value of the shared variable being copied into temporary storage before the
procedure reference, and back out of the temporary storage into the actual argument
storage after the procedure reference. Situations where this occurs other than those
specified are implementation defined (see Section 2.9.3.2 on page 94).
• Runtime library definitions: it is implementation defined whether the include file
omp_lib.h or the module omp_lib (or both) is provided. It is implementation
defined whether any of the OpenMP runtime library routines that take an argument
are extended with a generic interface so arguments of different KIND type can be
accommodated (see Section 3.1 on page 114).
Fortran
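Two of the behaviors listed in this appendix concern thread counts: a non-positive argument to omp_set_num_threads is implementation defined, and an implementation may deliver fewer threads (but at least one) than requested. The following C sketch shows one defensive way a program might cope with both. The helper name delivered_team_size is hypothetical, and the _OPENMP guard is an assumption added here so that the code also builds as a serial program when OpenMP support is not enabled.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: requests `requested` threads and returns the size
 * of the team that the implementation actually delivered. */
int delivered_team_size(int requested)
{
    int delivered = 1; /* serial fallback: a team of one thread */

    /* A non-positive argument to omp_set_num_threads is implementation
     * defined, so reject it before it reaches the runtime. */
    assert(requested > 0);

#ifdef _OPENMP
    omp_set_num_threads(requested);
    #pragma omp parallel
    {
        /* One thread records the team size; the implicit barrier at the
         * end of the single region orders the write before the return. */
        #pragma omp single
        delivered = omp_get_num_threads();
    }
#else
    (void)requested; /* no OpenMP: the request has no effect */
#endif
    return delivered;
}
```

A caller that partitions work by hand would use the returned value rather than the requested one, since the two may differ.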
APPENDIX F
Features History
This appendix summarizes the major changes between the OpenMP API Version 2.5 and
Version 3.0, and between Version 3.0 and Version 3.1.
F.1 Version 3.0 to 3.1 Differences
• The final and mergeable clauses (see Section 2.7.1 on page 61) were added to
the task construct to support optimization of task data environments.
• The taskyield construct (see Section 2.7.2 on page 64) was added to allow
user-defined task switching points.
• The atomic construct (see Section 2.8.5 on page 73) was extended to include
read, write, and capture forms, and an update clause was added to apply
the already existing form of the atomic construct.
• Data environment restrictions were changed to allow intent(in) and
const-qualified types for the firstprivate clause (see Section 2.9.3.4 on page 98).
• Data environment restrictions were changed to allow Fortran pointers in
firstprivate (see Section 2.9.3.4 on page 98) and lastprivate (see
Section 2.9.3.5 on page 101).
• New reduction operators min and max were added for C and C++ (see
Section 2.9.3.6 on page 103 and page 105).
• The nesting restrictions in Section 2.10 on page 111 were clarified to disallow
closely-nested OpenMP regions within an atomic region. This allows an atomic
region to be consistently defined with other OpenMP regions so that they include all
the code in the atomic construct.
• The omp_in_final runtime library routine (see Section 3.2.20 on page 140) was
added to support specialization of final task regions.
• The nthreads-var ICV has been modified to be a list of the number of threads to use
at each nested parallel region level. The value of this ICV is still set with the
OMP_NUM_THREADS environment variable (see Section 4.2 on page 155), but the
algorithm for determining the number of threads used in a parallel region has been
modified to handle a list (see Section 2.4.1 on page 36).
• The bind-var ICV has been added, which controls whether or not threads are bound
to processors (see Section 2.3.1 on page 28). The value of this ICV can be set with
the OMP_PROC_BIND environment variable (see Section 4.4 on page 156).
• Descriptions of examples (see Appendix A on page 161) were expanded and clarified.
• Replaced incorrect use of omp_integer_kind in Fortran interfaces (see
Section D.3 on page 330 and Section D.4 on page 334) with
selected_int_kind(8).
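As an illustration of two of the Version 3.1 additions listed above, the following C sketch uses the max reduction operator and the capture form of the atomic construct. The names array_max, take_ticket and next_ticket are hypothetical and chosen only for this example. When the code is compiled without OpenMP support the pragmas are ignored and both functions behave as ordinary serial code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest element of a non-empty array, computed
 * with the max reduction operator. */
int array_max(const int *a, int n)
{
    int best;

    assert(a != NULL && n > 0);
    best = a[0];

    /* Each thread keeps a private maximum; the private values are
     * combined with the original value of best at the end of the loop. */
    #pragma omp parallel for reduction(max:best)
    for (int i = 1; i < n; i++) {
        if (a[i] > best)
            best = a[i];
    }
    return best;
}

/* Hypothetical shared counter used by take_ticket. */
static int next_ticket = 0;

/* Hypothetical helper: returns consecutive ticket numbers. The capture
 * form reads the old value and increments the counter atomically. */
int take_ticket(void)
{
    int ticket;

    #pragma omp atomic capture
    ticket = next_ticket++;

    return ticket;
}
```

Before Version 3.1 the same ticket counter would typically have needed a critical construct, because the earlier atomic form could update a location but not return its value.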
F.2 Version 2.5 to 3.0 Differences
The concept of tasks has been added to the OpenMP execution model (see Section 1.2.3
on page 8 and Section 1.3 on page 12).
• The task construct (see Section 2.7 on page 61) has been added, which provides a
mechanism for creating tasks explicitly.
• The taskwait construct (see Section 2.8.4 on page 72) has been added, which
causes a task to wait for all its child tasks to complete.
• The OpenMP memory model now covers atomicity of memory accesses (see
Section 1.4.1 on page 13). The description of the behavior of volatile in terms of
flush was removed.
• In Version 2.5, there was a single copy of the nest-var, dyn-var, nthreads-var and
run-sched-var internal control variables (ICVs) for the whole program. In Version
3.0, there is one copy of these ICVs per task (see Section 2.3 on page 28). As a result,
the omp_set_num_threads, omp_set_nested and omp_set_dynamic
runtime library routines now have specified effects when called from inside a
parallel region (see Section 3.2.1 on page 116, Section 3.2.7 on page 123 and
Section 3.2.9 on page 125).
• The definition of active parallel region has been changed: in Version 3.0 a
parallel region is active if it is executed by a team consisting of more than one
thread (see Section 1.2.2 on page 2).
• The rules for determining the number of threads used in a parallel region have
been modified (see Section 2.4.1 on page 36).
• In Version 3.0, the assignment of iterations to threads in a loop construct with a
static schedule kind is deterministic (see Section 2.5.1 on page 39).
• In Version 3.0, a loop construct may be associated with more than one perfectly
nested loop. The number of associated loops may be controlled by the collapse
clause (see Section 2.5.1 on page 39).
• Random access iterators, and variables of unsigned integer type, may now be used as
loop iterators in loops associated with a loop construct (see Section 2.5.1 on page 39).
• The schedule kind auto has been added, which gives the implementation the
freedom to choose any possible mapping of iterations in a loop construct to threads in
the team (see Section 2.5.1 on page 39).
• Fortran assumed-size arrays now have predetermined data-sharing attributes (see
Section 2.9.1.1 on page 84).
• In Fortran, firstprivate is now permitted as an argument to the default
clause (see Section 2.9.3.1 on page 93).
• For list items in the private clause, implementations are no longer permitted to use
the storage of the original list item to hold the new list item on the master thread. If
no attempt is made to reference the original list item inside the parallel region, its
value is well defined on exit from the parallel region (see Section 2.9.3.3 on page
96).
• In Version 3.0, Fortran allocatable arrays may appear in private,
firstprivate, lastprivate, reduction, copyin and copyprivate
clauses (see Section 2.9.2 on page 88, Section 2.9.3.3 on page 96, Section 2.9.3.4 on
page 98, Section 2.9.3.5 on page 101, Section 2.9.3.6 on page 103, Section 2.9.4.1 on
page 107 and Section 2.9.4.2 on page 109).
• In Version 3.0, static class member variables may appear in a threadprivate
directive (see Section 2.9.2 on page 88).
• Version 3.0 makes clear where, and with which arguments, constructors and
destructors of private and threadprivate class type variables are called (see
Section 2.9.2 on page 88, Section 2.9.3.3 on page 96, Section 2.9.3.4 on page 98,
Section 2.9.4.1 on page 107 and Section 2.9.4.2 on page 109).
• The runtime library routines omp_set_schedule and omp_get_schedule
have been added; these routines respectively set and retrieve the value of the
run-sched-var ICV (see Section 3.2.11 on page 128 and Section 3.2.12 on page 130).
• The thread-limit-var ICV has been added, which controls the maximum number of
threads participating in the OpenMP program. The value of this ICV can be set with
the OMP_THREAD_LIMIT environment variable and retrieved with the
omp_get_thread_limit runtime library routine (see Section 2.3.1 on page 28,
Section 3.2.13 on page 131 and Section 4.9 on page 160).
• The max-active-levels-var ICV has been added, which controls the number of nested
active parallel regions. The value of this ICV can be set with the
OMP_MAX_ACTIVE_LEVELS environment variable and the
omp_set_max_active_levels runtime library routine, and it can be retrieved
with the omp_get_max_active_levels runtime library routine (see
Section 2.3.1 on page 28, Section 3.2.14 on page 132, Section 3.2.15 on page 134 and
Section 4.8 on page 159).
• The stacksize-var ICV has been added, which controls the stack size for threads that
the OpenMP implementation creates. The value of this ICV can be set with the
OMP_STACKSIZE environment variable (see Section 2.3.1 on page 28 and
Section 4.6 on page 157).
• The wait-policy-var ICV has been added, which controls the desired behavior of
waiting threads. The value of this ICV can be set with the OMP_WAIT_POLICY
environment variable (see Section 2.3.1 on page 28 and Section 4.7 on page 158).
• The omp_get_level runtime library routine has been added, which returns the
number of nested parallel regions enclosing the task that contains the call (see
Section 3.2.16 on page 135).
• The omp_get_ancestor_thread_num runtime library routine has been added,
which returns, for a given nested level of the current thread, the thread number of the
ancestor (see Section 3.2.17 on page 136).
• The omp_get_team_size runtime library routine has been added, which returns,
for a given nested level of the current thread, the size of the thread team to which the
ancestor belongs (see Section 3.2.18 on page 137).
• The omp_get_active_level runtime library routine has been added, which
returns the number of nested, active parallel regions enclosing the task that
contains the call (see Section 3.2.19 on page 139).
• In Version 3.0, locks are owned by tasks, not by threads (see Section 3.3 on page
141).
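The task and taskwait constructs introduced in Version 3.0 can be illustrated with a small recursive computation. The following C sketch is only an illustration under stated assumptions: the function names fib_task and fib are hypothetical, and the naive Fibonacci recursion is chosen for brevity rather than efficiency. When compiled without OpenMP support the pragmas are ignored and the code runs serially with the same result.

```c
#include <assert.h>

/* Hypothetical helper: naive Fibonacci in which each recursive call is
 * generated as an explicit task. */
static long fib_task(int n)
{
    long a, b;

    if (n < 2)
        return n;

    /* n is firstprivate in each task by default; a and b must be shared
     * so that the results written by the child tasks are visible here. */
    #pragma omp task shared(a)
    a = fib_task(n - 1);

    #pragma omp task shared(b)
    b = fib_task(n - 2);

    /* Wait for both child tasks to complete before reading a and b. */
    #pragma omp taskwait
    return a + b;
}

/* Hypothetical entry point: one thread of the team generates the root of
 * the task tree and the other threads help execute the tasks. */
long fib(int n)
{
    long result = 0;

    assert(n >= 0);

    #pragma omp parallel
    {
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

A practical version would stop creating tasks below some problem size, since a task per call is far more expensive than the addition it performs.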
Index
Symbols
_OPENMP macro, 2-26
A
atomic, 2-73
attributes, data-sharing, 2-84
auto, 2-44
B
barrier, 2-70
C
capture, atomic, 2-73
clauses
  collapse, 2-42
  copyin, 2-107
  copyprivate, 2-109
  data-sharing, 2-92
  default, 2-93
  firstprivate, 2-98
  lastprivate, 2-101
  private, 2-96
  reduction, 2-103
  schedule, 2-43
  shared, 2-94
collapse, 2-42
compliance, 1-17
conditional compilation, 2-26
constructs
  atomic, 2-73
  barrier, 2-70
  critical, 2-68
  do, Fortran, 2-41
  flush, 2-78
  for, C/C++, 2-39
  loop, 2-39
  master, 2-67
  ordered, 2-82
  parallel, 2-33
  parallel for, C/C++, 2-56
  parallel sections, 2-57
  parallel workshare, Fortran, 2-59
  sections, 2-48
  single, 2-50
  task, 2-61
  taskwait, 2-72
  taskyield, 2-64
  workshare, 2-52
  worksharing, 2-38
copyin, 2-107
copyprivate, 2-109
critical, 2-68
D
data sharing, 2-84
data-sharing clauses, 2-92
default, 2-93
directives, 2-21
  format, 2-22
  threadprivate, 2-88
  see also constructs
do, Fortran, 2-41
dynamic, 2-44
E
environment variables, 4-153
  modifying ICVs, 2-29
  OMP_DYNAMIC, 4-156
  OMP_MAX_ACTIVE_LEVELS, 4-159
  OMP_NESTED, 4-157
  OMP_NUM_THREADS, 4-155
  OMP_SCHEDULE, 4-154
  OMP_STACKSIZE, 4-157
  OMP_THREAD_LIMIT, 4-160
  OMP_WAIT_POLICY, 4-158
Examples, A-161
execution model, 1-12
F
firstprivate, 2-98
flush, 2-78
flush operation, 1-15
for, C/C++, 2-39
G
glossary, 1-2
grammar rules, C-316
guided, 2-44
H
header files, 3-114, D-325
I
ICVs (internal control variables), 2-28
implementation, E-335
include files, 3-114, D-325
internal control variables (ICVs), 2-28
L
lastprivate, 2-101
loop, scheduling, 2-47
M
master, 2-67
memory model, 1-13
model
  execution, 1-12
  memory, 1-13
N
nested parallelism, 1-12, 2-28, 3-125
nesting, 2-111
number of threads, 2-36
O
omp_destroy_lock, 3-144
omp_destroy_nest_lock, 3-144
OMP_DYNAMIC, 4-156
omp_get_active_level, 3-139
omp_get_ancestor_thread_num, 3-136
omp_get_dynamic, 3-124
omp_get_level, 3-135
omp_get_max_active_levels, 3-134
omp_get_max_threads, 3-118
omp_get_nested, 3-126
omp_get_num_procs, 3-121
omp_get_num_threads, 3-117
omp_get_schedule, 3-130
omp_get_team_size, 3-137
omp_get_thread_limit, 3-131
omp_get_thread_num, 3-119
omp_get_wtick, 3-150
omp_get_wtime, 3-148
omp_in_final, 3-140
omp_in_parallel, 3-122
omp_init_lock, 3-143
omp_init_nest_lock, 3-143
omp_lock_kind, 3-142
omp_lock_t, 3-142
OMP_MAX_ACTIVE_LEVELS, 4-159
omp_nest_lock_kind, 3-142
omp_nest_lock_t, 3-142
OMP_NESTED, 4-157
OMP_NUM_THREADS, 4-155
OMP_SCHEDULE, 4-154
omp_set_dynamic, 3-123
omp_set_lock, 3-145
omp_set_max_active_levels, 3-132
omp_set_nest_lock, 3-145
omp_set_nested, 3-125
omp_set_num_threads, 3-116
omp_set_schedule, 3-128
OMP_STACKSIZE, 4-157
omp_test_lock, 3-147
omp_test_nest_lock, 3-147
OMP_THREAD_LIMIT, 4-160
omp_unset_lock, 3-146
omp_unset_nest_lock, 3-146
OMP_WAIT_POLICY, 4-158
OpenMP
  compliance, 1-17
  examples, A-161
  features history, F-339
  implementation, E-335
ordered, 2-82
P
parallel, 2-33
parallel do, 2-56
parallel for, C/C++, 2-56
parallel sections, 2-57
parallel workshare, Fortran, 2-59
pragmas
  see constructs
private, 2-96
R
read, atomic, 2-73
reduction, 2-103
references, 1-17
regions, nesting, 2-111
runtime, 2-45
runtime library
  interfaces and prototypes, 3-114
S
schedule, 2-43
scheduling
  loop, 2-47
  tasks, 2-65
sections, 2-48
shared, 2-94
single, 2-50
static, 2-44
stubs for runtime library routines
  C/C++, B-302
  Fortran, B-309
synchronization, locks
  constructs, 2-67
  routines, 3-141
T
task
  scheduling, 2-65
task, 2-61
tasking, 2-61
taskwait, 2-72
taskyield, 2-64
terminology, 1-2
threadprivate, 2-88
timer, 3-148
timing routines, 3-148
U
update, atomic, 2-73
V
variables, environment, 4-153
W
wall clock timer, 3-148
website
  www.openmp.org
workshare, 2-52
worksharing
  constructs, 2-38
  parallel, 2-55
  scheduling, 2-47
write, atomic, 2-73