Slides

IBM, Armonk, 2004–09–16
Late Binding and
Dynamic Implementation
Ian Piumarta, HP Labs
[email protected]
http://www-sor.inria.fr/projects/vvm
1
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
VMs from the language designer’s perspective
a VM provides:
• idealised abstract instruction set
– abstract representation of program code closely matched to source language
semantics
– compact representation (bytecodes, . . . )
– architecture-neutral “engine” hides underlying CPU/OS
• idealised memory abstraction
– execution engine “sees” application objects, not bits/bytes/words/. . .
– object semantics match source language semantics
– transparent services: GC, weak pointers, finalisation, . . .
• idealised execution environment
– virtual “OS” services insulate app from OS
– memory, files, threads, windows, . . .
– platform-neutral apps: “write once, run anywhere” a
a
still pretty much a holy grail, although several dialects of Smalltalk come very close
2
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
background: problems with virtual machines
VMs are a pretty neat idea, but what’s the problem with them today?
• rigidity
it’s difficult to specialise them for a new application or domain
• adding new abstractions is complex, and in the domain of a few experts
• reusability is depressingly low
• specialisation is complex and inherently static
– much work repeated for each new application
• portability is almost non-existent
and yet there are precedents that clearly indicate the need for specialisation. . .
• OS “Kits”, [Personal,Embedded]Java[Card], OA scripting, . . .
. . . and evolution . . .
• GUI, protocols, algorithms (e.g., set-top boxes), . . .
3
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
virtual virtual machines
circa 1999
VM specs
Mister Toast
formats
incinerate
raw
objects
instructions
operating system
file!
primitives
net
VMlets
rdr
primitives
VMlet
loader
program
primitive
glue
executive
PBR
application
loader
‘‘VM’’
file
obmem
glue
bytecode
glue
network
card
reader
VVEE
application
reader
GC
memory
object
memory
virtual processor
VVM
CPU
4
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
contentious statements
without justification:
• it is much easier to implement highly-dynamic languages/systems on top of a
highly-dynamic language/system (than it is, e.g., on top of an entirely static
language/system)
in other words: reuse (with/without modification) is easier than (re)implementation
corollary:
• the more dynamic the base language/system, the easier it is to build a given
dynamic language/system feature within
influences:
• BBC Basic: built-in 6502 asm
• ANDF
5
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
conventional programming languages
Source
Syntax
Semantics
Pragmatics
Application
Compiler
Runtime
UDP
Language
Environment
Libraries
System
malleable (under programmer control)
Hardware
rigid (imposed from outside)
"black box" (hermetically sealed)
6
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
conventional programming languages
two black boxes:
• language
• environment
rigidity
• syntax (DSLs?)
• semantics (atomic test-and-set?)
• pragmatics (primitives?)
• libraries (protocol evolution / in-field servicing?)
• system (drivers, scheduling, resource management, . . . ?)
7
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
some observations about conventional languages
without much justification (for lack of time)
• disasterously early-bound
• insufficient meta-data
• no reification of implementation
• artificially-closed access to internals
(characteristics which are inherited by their application “offspring”)
⇒
• late-bind everything (including language implementation)
• make everything 1st-class (including input [‘programs’] and output [native code])
• leave the entire compilation chain open by default (but make it trivial to close it
off entirely, selectively, or according to artbitrary verification policies as needed)
for some suitable definition of “everything”
(traditionally: the simplest possible one)
8
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
the simplest possible definition of “everything”
lowest common (implementation) denominator:
• platform’s native code
• C ABI
• connection to libc / posix / whatever . . .
assume nothing about anything else
expose this “everything” to the application (or the app’s runtime)
• in the simplest, least-constrained form possible
then go have (lots of) fun building maximally-dynamic systems on top of it
9
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
a universal dynamic compiler: YNVM
• dynamic compilation (to native code, C ABI)
• structured, language- / architecture- independent input
– object structures (∼ “parse” trees) describing C semantics, built from a small
number of object types (symbols, literals and lists)
• simple, persistent meta-data (program structure, environment)
– using the same objects as the structured input
⇒ programs = data = metan data = programs
• customisable semantics
– application-defined transformations on input data structures
– applied at runtime (during compilation) & can be redefined on-the-fly
• access to all levels of compilation & code generation
– the compiler itself is reified, 1st-class data in the application’s address space
(don’t like the compiler? redefine it on-the-fly. . . )
• single point of access to the host system: dlsym (or equivalent)
– hence (by transitivity) arbitrary & unlimited access to all OS services
10
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
architecture
keyboard
console interface
text
external
file, etc...
parser
object memory
parse trees,
object
structures
program-generated
structures
meta-data
tree compiler
GC
asbtract
machine insns
program-generated
abstract insns
stack compiler
optimizer
code generator
concrete
machine insns
program-generated
concrete insns
heap
malloc()
dynamic assembler
native code
11
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
architecture
(define x 42)
memoire objet
for (;;) {
Object *exp= yyparse();
void *impl= exp->compile();
impl();
free(impl);
gc::collect();
}
parseur
cell: #
list:
sym: define
tas C (malloc() + free())
int: 42
compilateur
sym: x
0x1234
0x5678
cell: 42
ccg
vpu->Enter(0, 0);
vpu->LdInt(42);
vpu->StW(0x1234);
vpu->Ret();
tous les symboles
VPU
li r3, 42
li r4, 0x1234
stw r3, 0(r4)
blr
LIri(3, 42)
LIri(4, 0x1234)
STWrr(3, 4)
BLR()
gen
PPC
racines:
gen
i386
ramasse miettes
libc
gen
Sparc
gen
MIPS
malloc()
free()
12
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
symbols
contain three kinds of value:
• primitive
• object
• syntactic
hierarchical namespaces
• any symbol’s object value can be a list of symbols (another namespace)
⇒ hierarchy
named semantics
• syntactic value = closure invoked during compilation
⇒ context governs meaning
13
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
implementation is reified in its own metadata
implementation
C heap
sym: read
parser
implementation
sym: lambda
compiler
0x5678
li r3, 42
li r4, 0x1234
stw r3, 0(r4)
blr
sym: and
implementation
ccg
sym: lis_i
sym: blr
implementation
sym: LdI
sym: Call
VPU
sym: Ret
libc
...
14
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
infinite extensibility
runtime implements dynamic compiler over its own implementation domain
every intrinsic semantic operation is (pre)defined as an (user-defined) extension
• no “privileged” meaning
• anything can be modified/removed/added on-the-fly
• unlimited malleability, under program control
15
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
unconventional programming languages
Source
Syntax
Semantics
Pragmatics
Application
Dynamic
Compiler
Runtime
Libraries
System
malleable (under programmer control)
Hardware
delicate (but not impossible ;-)
16
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
virtual machines
Java, for example. . .
• modify the compilation chain on-the-fly (3 KLOC)
– YNVM replaces a vertical section of a “traditional” JIT compiler
– then introspects and intercedes on the entire JIT+JVM+etc. . .
• new object formats, real runtime MOP, dynamically-defined native methods,
etc. . .
jdk.vvm
HelloWorld.class
parser
object memory
class loader
parse trees,
meta-data
tree compiler
GC
GC
JIT
object
memory
stack compiler
compile
optimizer
relink
code generator
dynamic asm
JVM
method
lookup
compiled
methods
17
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
active networks
interesting
design space
extremely interesting design space
interesting
design space
capsules (e.g. ants)
hybrid active networks
active packets (e.g. plan)
packet
data
preloaded protocol
implementation
packet
proto
data
code
dynamically-loaded
protocol implementation
treatment
node
treatment
node
capsule
env
code
code
capsule
env
code
env
environment
interpreter
interpreter
18
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
embedded systems
application(s)
µ-libc
YNVM
standard interfaces
OS services
C
drivers
component
pool
YNVM
drivers
YNVM
YNVM
h/w post-initialization
YNVM
YNVM
YNVM
boot script
C
C+asm
h/w pre-initialization
hardware
19
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
example: inline cache detail
(define (syntax ->)
(lambda (form)
; (-> object (selector args))
(syntax.match form (? ,recv ( ,[object.symbol? sel] ,@args ))
(let ([posic (:posic.malloc)])
‘(let ([_recv ,recv])
((if (= (:class.class-of _recv)
(:posic.prev-class ,(object.integer posic)))
(:posic.dest-meth ,(object.integer posic))
(relink ,(object.integer posic)
_recv
,(object.integer sel)))
_recv
,@args)))]))
≈ 1.98 × static C function call
(various demos offline for anyone interested)
20
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
the catch(es)
people are terrified of too much flexibility
need for stable intermediate forms (“layers” of abstraction)
• in deployment
• in “vertical” system / language design
– code gen, intrinsic forms, etc. . .
– where are the borders of the VM?
– APIs accessible only to certain groups
c.f., Herbert Simon, parable of the watchmakers
• complex systems will evolve from simple systems much more rapidly if there are
stable intermediate forms than if there are not; the resulting complex systems in
the former case will be hierarchic.
21
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
the catch(es)
C++ intrinsics
• vtbl is not 1st class
• modification of object (memory) behaviour is ugly
• modification of stack compiler is ugly
⇒
• pure C (macro assembler) implementation
• proto + delegate (50 LOC in bootstrap)
• everything 1st class
• 100% malleable down to the metal
22
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
the catch(es)
type safety
• dynamically-typed objects, messages
• statically-typed native code (platform headers)
self-implementing bootstrap
• dynamic – static compilation “slider”
repotimisation for performance (llvm, . . . )
dynamic specialisation (FDO) vs. static specialisation (DSL, ...)
isolates for domain protection
23
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
metrics?
buzzwords
• configurability, extensibility, evolution;
• in-field update (dynamic upgrade of running engine code)
• administration, response to change, reduced total cost of ownership
• interoperability: implementation languages with fine-grained communication
language designers/implementors: freedom to innovate
• expressivity; application, language, runtime;
• distribution, runtime adaptability (dynamic proxies, interfaces, ...)
target-specific dynamic compilation
dynamic optimisations: self-modifying IR code
performance: dynamic vs. static compilation
security:
erights.org
24
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
metrics?
compatibility and startup costs
• C ABI: performance loss?
• mixed dynamic/static code? (cf., gcj trick)
• dynamic – static slider
extending the runtime environment:
• increasing granularity (higher level abstrations → language)
• vs. decreasing granularity (lightweight transactions → language/runtime prim w/
associated syntax/semantics)
how much of system is written in managed code?
• microkernels
• lkms
25
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
metrics?
new language with features that don’t fit?
• impossible to predict the future
• must not legislate paradigms/models
• policies are separate from implementation/mechanisms
• can you do continuations in it?
right tool for job:
• right language for job
• right level (abstraction) in architecture for the (implementation) job
26
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
conclusion
algebra
longevity, generality
lambda
B
C
features, complexity
J*
C++
27
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
conclusion
• choice of flexibility / complexity “compromise”
5 (well-defined, in reality even more) levels of entry
• dynamic compilation overhead seen by applications
>= 3 MB native code per second (on current hardware)
• symbolic space affects semantic space
domain-specific compiler customisation
design & optimise your solution space, then use it
• agility
programs are (meta) data are programs are . . .
semantics are dynamic (interpret or compile the same code)
the compiler’s implementation is just part of the client’s namespace
system state created incrementally, described dynamically, changes felt
immediately
• simplicity: ∼ 10 KLOC, 250 KB code, for everything
• performance: 0.85 – 1.15 static (optimised) C programs
28
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
conclusion-in-a-picture
programme.ynvm
compiler
modifier
recompiler
etendre
mapper
mmap(fichier)
biblio.so
charger
modifier
recompiler
etendre
YNVM
YNVM
attacher
shmat(autreProcessus)
29
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
recommendations
ideally everyone shares a single, malleable “implementation engine”
otherwise:
• include a C semantics (and ABI) JIT
• maximise the potential for reimplementation of static parts of system at runtime
• design out all the barriers to innovation etc. . .
• adopt and apply (rigorously) simple (but entirely sufficient) safety practices
(then put it in your (OS) kernel!)
30
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.
recommendations
safety/security:
erights.org
• principle of least authority
• capabilities as (object) references: good OO practice (nothing more, nothing
less, bulletproof )
live inside the kind of system you build
reduce to the max: simplicity
parse trees (not bytecodes) as an archival/interchange format
open source (and freely licensed)
31
Copyright 2004 Ian Piumarta
Reproduction permitted for personal and educational use.