PDF

Workshop on the State of Art in Software Research, at UC Irvine
Programming Language X10 on Multiple JVMs
2013/05/15
Kiyokuni (KIYO) Kawachiya and Mikio Takeuchi
IBM Research - Tokyo
© 2013 IBM Corporation
IBM Research - Tokyo
This Talk Focuses on “Managed X10”
Managed X10 = X10 implementation on Java (i.e. managed runtime)
– X10 program is translated into Java source code and compiled into
Java bytecode, then executed on multiple Java VMs
X10 Compiler Frontend
X10
Source
Parsing /
Type Check
X10 AST
AST Optimizations
AST Lowering
AST: Abstract Syntax Tree
XRX: X10 Runtime in X10
XRJ: X10 Runtime in Java
XRC: X10 Runtime in C++
X10RT: X10 Comm. Runtime
X10 AST
C++
Backend
XRC
C++ Code
Generation
Java Code
Generation
C++ Source
Java Source
C++ Compiler
XRX
Java Compiler
Native Code
2
2013/05/15
Kiyokuni Kawachiya
XRJ
Bytecode
Native X10
Managed X10
JNI
Native Env
Java
Backend
X10RT
Java VMs
© 2013 IBM Corporation
IBM Research - Tokyo
“Hello World” in X10
ƒ X10 Program
Parallel distributed execution
by creating an activity at each
place
class MyHello {
public static def main(Array[String]) {
finish for (pl in Place.places()) {
async at (pl) Console.OUT.println("Hello World from place " + here.id);
}}}
ƒ Compile and Execute
$ x10c MyHello.x10
# compile
$ ls
MyHello.x10
MyHello$$Main.class
MyHello.java
MyHello$$Closure$0.class
MyHello.class
$ X10_NPLACES=4 x10 MyHello # execute
Hello World from place 3
Hello World from place 0
Hello World from place 1
Hello World from place 2
3
2013/05/15
Kiyokuni Kawachiya
X10 program is translated into
Java (or C+) source code and
further compiled
Executed asynchronously on
each place
For C++ back-end:
$ x10c++ -o MyHello MyHello.x10
$ X10_NPLACES=4 runx10 MyHello
© 2013 IBM Corporation
IBM Research - Tokyo
Distributed Execution Model of X10
APGAS: Asynchronous Partitioned Global Address Space
Address space
Activity
Place 0
Place 1
at
async
asyn
c
Creation
at
Place MAX_PLACES-1
...
c
yn
as
at
Place shift
DistArray
Object
DistArray
Local ref
Remote ref
Immutable data, class, struct, function
ƒ A global address space is divided into multiple places (≒computer)
– Place can contain activities and data (objects, structs, functions)
– In Managed X10, each place is implemented by a Java VM
ƒ An object belongs to a place where it is created
– Object cannot be accessed from other places, but
– Object can be remotely referenced from other places, using GlobalRef
4
2013/05/15
Kiyokuni Kawachiya
© 2013 IBM Corporation
IBM Research - Tokyo
X10 Characteristics
Java-like syntax, but uses var, val, def, ...
Rich array class
X10 24 /* Array example */
X10
1 class Sample[T] implements (String)=>String {
2 var data:T;
25 val pt = [2,4] as Point;
// Point{rank==2}
Operators
can
3 def this(d:T) { data = d; } // constructor
26 val R1 = (1..2)*(3..5);
// Region{rank==2}
4 public operator this(str:String) = str + data; also be defined
27 val arr = new Array[Int](R1); // Array[Int]{rank==2}
5
28 arr(2,4) = 8;
6 static struct MyPair[T,U](fst:T,snd:U) {
29 Console.OUT.println(arr(pt)); // -> 8
7 public def toString() = "(" + fst + "," + snd +")";
30 /* Parallel processing */
8 }
31 var m:Int = 0; val i = 1;
// mutable/immutable data
9
32 finish async { m = i * 2; }
10 public static def main(args:Array[String](1)) {
33 Console.OUT.println(m);
// -> 2
11 /* Class example */
34 /* Distributed processing */
Parallel/distributed
12 val o = new Sample[Double](1.2);
35 at (here.next()) o.data = 3.4; // copy of o is modified
processing
13 Console.OUT.println(o.data);
// -> 1.2
36 Console.OUT.println(o.data); // -> 1.2
14 Console.OUT.println(o("Data is ")); // -> Data is 1.2
37 /* GlobalRef example */
15 var a:Any = o;
38 val g = GlobalRef(o);
16 Console.OUT.println(o.typeName());// -> Sample[x10.lang.Double]
39 at (here.next()) { at (g.home) g().data = 5.6; }
17 /* Struct example */
40 Console.OUT.println(o.data); // -> 5.6
18 val p = MyPair[Int,Double](1,2.3);
41 }
Global data access
19 Console.OUT.println(p);
// -> (1,2.3)
42 }
20 val x = 4;
Strong type system – type
21 /* Function example */
22 val q = MyPair[(Int)=>Int, Int]((i:Int)=>i*x, 5);
parameters are not erased
23 Console.OUT.println(q.fst(q.snd)); // -> 20
New 1st class data types,
structs and functions
5
2013/05/15
Kiyokuni Kawachiya
Primitive types (Int, Double, ...)
are defined as structs
© 2013 IBM Corporation
IBM Research - Tokyo
Compiling X10 to Java
1 class B {
2 def w() = this;
3}
4 class C(p:Int) extends B {
5 val q:Int; var r:Int;
6 def u() { }
7 private def v() { }
8 def w() = super.w();
9 def this(p:Int) {
10 property(p);
11 q = 1;
12 }
13 static val s = Place.MAX_PLACES;
14 static def t() { }
15 }
(in X10 2.3.0)
X10
Class/field names are basically same
Constructor is separated into Java
constructor and field initializer
– to support inlining by front-end
For private instance method and nonvirtual call, additional bridge methods
are generated
x10c
1 import x10.core.concurrent.AtomicInteger;
Java
2 import x10.lang.ExceptionInInitializer;
3 import x10.lang.Place;
4 import x10.runtime.impl.java.InitDispatcher;
5 public class C extends B {
6 public int p; // property
7 public int q, r; // instance fields
8 // constructor just for allocation
9 public C(System[] $dummy) { super($dummy); }
10 // instance field initializer
11 final public C C$$init$S(int p) {
12 this.p = p;
13 this.r = 0;
14 this.q = 1;
15 return this;
16 }
17 // 1-phase constructor for Java interop programmers
18 public C(int p) {
19 this((System[]) null); C$$init$S(p);
20 }
21 public void u() {} // instance methods
22 private void v() {}
23 // bridge for private instance method
24 public static void v$P(C C) { C.v(); }
25 public B w() { return super.w(); }
26 // bridge for non-virtual call
27 public B B$w$S() { return super.w(); }
28 public static void t() {} // static method
(some lines are omitted)
– to support inlining by front-end
6
2013/05/15
Kiyokuni Kawachiya
© 2013 IBM Corporation
IBM Research - Tokyo
Distributed Execution using Multiple JVMs
ƒ Multi-JVM is supported in X10 2.1.2
– Each X10 “Place” uses its own Java VM
– Uses common X10RT (in C++) through JNI
“x10” command
Place P
JVM
Initiate Places via SSH
x10.runtime.impl.java.Runtime
Run activity at place P
Invoke function
Serialize function into byte array
Deserialize function from byte array
Send byte array
Receive byte array from place P
JNI (Java->C++)
JNI (Java->C++)
x10rt_send_msg
x10rt_probe
Communication Physical Layer
(e.g. socket, MPI)
Communication Physical Layer
(e.g. socket, MPI)
Kiyokuni Kawachiya
Msg Buffer
Byte array message
Network
2013/05/15
Place Q
x10.runtime.impl.java.Runtime
Byte array message
7
JVM
Uses backend unique
wire format in X10
2.2.0
© 2013 IBM Corporation
IBM Research - Tokyo
Remote Reference using GlobalRef
X10
1 class GRefExample {
2 static class ResultBox { var value:Int = 0; }
3 public static def main(Array[String]) {
4 val place1 = here.next();
5 val o = new ResultBox();
6 val g = GlobalRef[ResultBox](o);
7 finish {
8
at (place1) async {
// create an activity at place1
9
val r = do_long_calculation(g);
10
at (g) g().value = r; // set the result through GlobalRef
11
}
12
do_some_calculation_locally();
13 } // end of finish
14 Console.OUT.println(o.value);
15 } }
Figure. Distributed Processing with GlobalRef.
Place 0
Place 1
o
g (at Line 6)
at
g()
g (at Line 9)
at
g (at Line 10)
This object must not be collected
while its GlobalRefs (g) exist in
other places
ƒ GlobalRef is a special struct to hold a global reference to an object
– Created by “GlobalRef[T](obj)” and cannot be modified
– The object can be accessed by “at (g) g()...”
8
2013/05/15
Kiyokuni Kawachiya
© 2013 IBM Corporation
IBM Research - Tokyo
Garbage Collection in Managed X10
ƒ X10 data is represented by Java objects and collected by each JVM’s GC
– However, remote reference is not a reference in the JVM level
Place 0
Place 1
Even if A disappears,
C must not be collected
D (GlobalRef)
C
A
X10 environment
on multiple JVMs
This remote reference is not
visible to underlying JVMs
B
123
JVM0
JVM1
GC
Computer 0
GC
Computer 1
Network
Figure. Remote References across JVMs.
ƒ In old X10, remotely-referenced (globalized) objects were registered into
a management table and never collected
Î We needed better implementation
9
2013/05/15
Kiyokuni Kawachiya
© 2013 IBM Corporation
IBM Research - Tokyo
Behavior of the Distributed GC
XRJ: X10 Runtime in Java
GOT: Globalized Object Tracker
RRT: Remote Reference Tracker
Creating and using a GlobalRef
X10
1 class GRefExample {
2 static class ResultBox { var value:Int = 0; }
3 public static def main(Array[String]) {
4 val place1 = here.next();
5 val o = new ResultBox();
6 val g = GlobalRef[ResultBox](o); // g
7 finish {
8
at (place1) async {
// g’
9
val r = do_long_calculation(g);
10
at (g) g().value = r;
// g’’
11
}
12
do_some_calculation_locally();
13 } // end of finish
14 Console.OUT.println(o.value);
15 } }
Place 0
g’’
Place 1
obj
home=P0
id=123
g
g’ (at Place 1)
o
obj
obj=null
home=P0
home=P0
id=0
id=123
id=123
GOT for o
id2got
123
got2id
123
XRJ
RRT for g’
weakRef
weakRef
strongRef
home=P0
rCnt=0
rCnt=10
rCnt=8
id=123
id=123
wCnt=8
wCnt=10
Computer 0
rrtTable
XRJ
Computer 1
Network
Figure. Data Structures for the Distributed GC.
10
2013/05/15
Kiyokuni Kawachiya
© 2013 IBM Corporation
IBM Research - Tokyo
Behavior of the Distributed GC
Collecting the globalized object, and related data
X10
1 class GRefExample {
2 static class ResultBox { var value:Int = 0; }
3 public static def main(Array[String]) {
4 val place1 = here.next();
5 val o = new ResultBox();
6 val g = GlobalRef[ResultBox](o); // g
7 finish {
8
at (place1) async {
// g’
9
val r = do_long_calculation(g);
10
at (g) g().value = r;
// g’’
11
}
12
do_some_calculation_locally();
13 } // end of finish
14 Console.OUT.println(o.value);
15 } }
Place 0
g’’
XRJ: X10 Runtime in Java
GOT: Globalized Object Tracker
RRT: Remote Reference Tracker
Place 1
obj
home=P0
id=123
g
g’ (at Place 1)
o
obj
obj=null
home=P0
home=P0
id=123
id=123
GOT for o
id2got
123
got2id
123
XRJ
RRT for g’
weakRef
weakRef
strongRef
home=P0
rCnt=0
rCnt=8
id=123
id=123
wCnt=8
Computer 0
rrtTable
XRJ
Computer 1
Network
Figure. Data Structures for the Distributed GC.
ƒ Extra inter-place comm. is performed only when a remote ref. is removed
11
2013/05/15
Kiyokuni Kawachiya
© 2013 IBM Corporation
IBM Research - Tokyo
Evaluation: Distributed Fibonacci
X10
1 class DistFib {
2 static val atomicI = new x10.util.concurrent.AtomicInteger(0);
3 static def getAvailPlace() { // returns next available place
4 val i = at (Place.FIRST_PLACE) atomicI.incrementAndGet();
5 return Place(i % Place.MAX_PLACES);
6 }
7
8 private val root = GlobalRef[DistFib](this); // ref to root obj
9 transient var v:Int; // in-out param, exists only in the root
10 def this(n:Int) { v = n; }
11 def compute() { // global method, can be invoked at any place
12 val n = at (root) root().v; // get the input from the root object
13 if (n < 2) return;
// Fib(0)==0, Fib(1)==1
14 val f1 = new DistFib(n-1), f2 = new DistFib(n-2);
15 finish {
16
at (getAvailPlace()) async f1.compute(); // compute remotely
17
f2.compute();
// compute here
18 }
19 val r = f1.v + f2.v;
// result = Fib(n-1) + Fib(n-2)
20 at (root) root().v = r;
// set the result to the root object
21 }
22
23 public static def main(args:Array[String](1)) {
24 var n:Int = 10; if (args.size > 0) n = Int.parseInt(args(0));
25 val f = new DistFib(n); f.compute(); // this line is measured
26 Console.OUT.println(f.v);
27 }
Figure. Distributed Fibonacci.
28 }
12
2013/05/15
Kiyokuni Kawachiya
ƒ Calculates n-th Fibonacci number
recursively using multiple places
– Field root is a GlobalRef that
points to the “root object”
– Field v of the root object is
accessed by “at (root) root().v ...”
Place 0
Place 1
DistFib(3)
Place 2
DistFib(2), copied
root
DistFib(1), copied
root
v=3
root
v = undef
DistFib(2)
at
DistFib(1)
root
root
v=2
v=1
DistFib(1)
v = undef
at
DistFib(0)
root
root
v=1
v=0
© 2013 IBM Corporation
Environment: 6-core Xeon X5670 blades (HT disabled)
32GB memory, 40Gbps Infiniband
Red Hat Enterprise Server Linux 5.5 x86_64
IBM J9 Java VM for Linux x86_64, Version 6.0 SR9
Generational GC, -Xms4m –Xmx512m
16 places by starting 4 JVMs on each of 4 blades
IBM Research - Tokyo
Result: Distributed Fibonacci
Execution times to calculate n-th Fibonacci numbers using 16 places
Elapsed time in msec (smaller is better)
* Logarithmic scale
100,000
Base
Dist. GC
10,000
1,000
100
10
n = 5 (5)
10 (55)
15 (610)
20 (6765)
25 (75025)
Index (and result)
Figure. Execution Times of Distributed Fibonacci.
ƒ For very small n’s, base X10 is faster by about 10%
– Because of the small overhead in the Dist. GC version
ƒ For n=25, distributed GC version is 13% faster
– Because heap is not consumed by uncollectable objects
– Heap size of Place 0 reaches 468MB in base, but 148MB in Dist. GC
13
2013/05/15
Kiyokuni Kawachiya
© 2013 IBM Corporation
IBM Research - Tokyo
Summary of the Talk
Explained implementation details of Managed X10 (X10 on Java VMs)
ƒ Challenges in compiling X10 to Java
– For performance and functionality
ƒ Distributed GC over Multiple Java VMs
– No modification to the underlying JVMs
– Introduced two data structures for monitoring globalized objects and
remote references
14
2013/05/15
Kiyokuni Kawachiya
© 2013 IBM Corporation