
ADVCOMP 2008
Valencia (Spain), September 30, 2008
Dynamic Deployment of Custom Execution
Environments in Grids
R.S. Montero, E. Huedo and I.M. Llorente
Distributed Systems Architecture Research Group
Universidad Complutense de Madrid
Contents
1. Introduction
2. Straightforward Deployment of VMs
3. Dynamic Provisioning of Computing Elements
4. Related Work
5. Conclusions
1. Introduction
The Problem
• The growing heterogeneity (both hardware and software) of grids severely hampers application porting:
– It increases the cost and length of the application porting or development cycle (mainly due to testing in the great variety of environments)
– It limits the effective number of resources available to a user/application (only sites bound to a given VO will invest the effort to install, configure and maintain custom software configurations)
– It raises the operational costs of the infrastructure
Example: XMM-Newton Science Analysis Software (SAS)
• Analyzes the data provided by XMM-Newton
• New versions are released frequently
• Supports several platforms (OS, hardware)
• Has strong software requirements (libraries)
• Must be deployed on all resources
• Imposes a significant effort on:
– System administration staff
– Developers
– Users, who may need specific versions
[Figure: the XMM-Newton satellite]
Possible Solutions
• Software-environment configuration systems
– Let users define which applications they need
– Let administrators make applications available to users
– Example: SoftEnv
– They do not completely solve any of the previous problems
• Deployment of software-environment overlays
– Deploy custom software configurations in user space
– Example: Condor GlideIn (to deploy Condor pools)
– Software must be installable in user space
– Compatibility issues remain
• Virtual machine technologies
– A natural way to deal with the heterogeneity of the infrastructure
– Allow partitioning and isolation of physical resources
– Support execution of legacy applications and scientific codes
– Examples: In-VIGO, VWS
2. Straightforward Deployment of VMs
Main Idea
• Encapsulate a virtual machine in a grid job (see the template sketch below)
– Incorporates the functionality of a general-purpose metascheduler
– Requires no new middleware
– The underlying LRMS is not aware of the nature of the job
– Only suitable for medium/coarse-grained HTC applications
• A generalization of previous overlays for grids:
– Condor GlideIn
– GridWay & BOINC
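As an illustration, such a job could be described with an ordinary GridWay job template along the following lines; the wrapper script and file names are hypothetical, not part of GridWay itself:

# vm_job.jt - hypothetical GridWay job template wrapping a VM
EXECUTABLE   = vm_wrapper.sh       # illustrative script: boot the VM image, run the task inside it, shut the VM down
ARGUMENTS    = sas_task.cfg
INPUT_FILES  = vm_wrapper.sh, sas_task.cfg
OUTPUT_FILES = results.tar.gz
REQUIREMENTS = HOSTNAME = "*.ucm.es"   # illustrative resource filter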
The GridWay Metascheduler
[Figure: GridWay architecture. Applications (C/Java codes via DRMAA, sketched below, or the command-line interface) submit jobs to the GridWay metascheduler, which uses the Globus grid middleware to dispatch them to resources managed by LRMSs such as PBS and SGE.]
• Advanced scheduling
– Supports different application profiles
• Fault detection & recovery
• Job execution management
– Prolog (stage-in)
– Wrapper (execution)
– Epilog (stage-out)
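Since GridWay implements the DRMAA standard, applications can submit and synchronize jobs programmatically. A minimal C sketch, assuming a GridWay installation with its DRMAA library (the remote command name is illustrative):

/* Submit a job through GridWay's DRMAA C binding and wait for it. */
#include <stdio.h>
#include "drmaa.h"

int main(void)
{
    char error[DRMAA_ERROR_STRING_BUFFER];
    char jobid[DRMAA_JOBNAME_BUFFER];
    char waited[DRMAA_JOBNAME_BUFFER];
    drmaa_job_template_t *jt = NULL;
    const char *args[] = { "input.fits", NULL };
    int status = 0;

    if (drmaa_init(NULL, error, sizeof(error)) != DRMAA_ERRNO_SUCCESS) {
        fprintf(stderr, "drmaa_init: %s\n", error);
        return 1;
    }

    drmaa_allocate_job_template(&jt, error, sizeof(error));
    drmaa_set_attribute(jt, DRMAA_REMOTE_COMMAND, "sas_pipeline.sh",
                        error, sizeof(error));
    drmaa_set_vector_attribute(jt, DRMAA_V_ARGV, args, error, sizeof(error));

    /* GridWay executes the job through its prolog/wrapper/epilog cycle. */
    drmaa_run_job(jobid, sizeof(jobid), jt, error, sizeof(error));
    drmaa_wait(jobid, waited, sizeof(waited), &status,
               DRMAA_TIMEOUT_WAIT_FOREVER, NULL, error, sizeof(error));

    drmaa_delete_job_template(jt, error, sizeof(error));
    drmaa_exit(error, sizeof(error));
    return 0;
}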
[Figure: execution of a SAS job. GridWay on the client machine stages input data from the XMM-Newton Science Archive (XSA) and the VM image from the VM Image Repository via GridFTP, and submits the job through GRAM to the cluster front-end, whose LRMS allocates a virtual worker node (WN).]
1. Prolog (stage-in)
2. Wrapper (execution), sketched below:
2.1 Stage-in to the virtual WN
2.2 Execution in the virtual WN
2.3 Stage-out to the cluster file system
3. Epilog (stage-out)
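The wrapper phase and its sub-steps can be pictured with the following sketch; the real wrapper is part of GridWay, so this C program, with its hypothetical host name, paths and commands, is purely illustrative:

/* Illustrative wrapper-phase sketch: stage-in, execute and stage-out on a
 * virtual worker node (host vwn01; paths and commands are made up). */
#include <stdio.h>
#include <stdlib.h>

static int run(const char *cmd)
{
    printf("+ %s\n", cmd);   /* trace each sub-step */
    return system(cmd);
}

int main(void)
{
    /* 2.1 Stage-in: copy input files from the cluster FS to the virtual WN */
    if (run("scp input.fits vwn01:/scratch/job/") != 0) return 1;
    /* 2.2 Execution: run the SAS task inside the virtual WN */
    if (run("ssh vwn01 'cd /scratch/job && ./sas_task input.fits'") != 0) return 1;
    /* 2.3 Stage-out: copy the results back to the cluster FS */
    return run("scp vwn01:/scratch/job/output.fits .") != 0;
}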
Experiments
• XMM-Newton SAS application
• Overhead analysis
Testbed characteristics:
Host      CPU         Memory   OS              Service
ursa      P4 3.2GHz   512MB    Fedora Core 4   GW
draco     P4 3.2GHz   512MB    Debian Etch     GT4, PBS
draco WN  P4 3.2GHz   2GB      Debian Etch     Xen 3.0
Overhead Analysis
[Figure: execution times without VMs vs. with persistent VMs, broken down into VM save, restore, start and stop phases.]
3. Dynamic Provisioning of Computing Elements
Main Idea
• A new infrastructure layer that separates resource provisioning from job management
• Seamless integration with the existing middleware stacks
• Completely transparent to the computing service, and thus to end users
[Figure: the LRMS keeps handling job management on virtual cluster nodes, which a distributed virtualization layer deploys on the VMMs of the physical worker nodes; a template sketch for such a node follows.]
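For illustration, such a virtual cluster node could be described with an OpenNebula VM template roughly like the one below; the name, image paths and values are made up, and the exact attribute set depends on the OpenNebula version:

NAME   = vo-wn01
CPU    = 1
MEMORY = 512
OS     = [ KERNEL = "/boot/vmlinuz", INITRD = "/boot/initrd.img", ROOT = "sda1" ]
DISK   = [ SOURCE = "/srv/images/vo-wn.img", TARGET = "sda1", READONLY = "no" ]
NIC    = [ BRIDGE = "eth0" ]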
Features
User Requests
• Typical LRMS interface
• Virtualization overhead
[Figure: the cluster front-end dispatches requests to both virtualized cluster nodes (run by the distributed virtualizer on top of the VMMs) and dedicated cluster nodes.]
Features
Cluster Consolidation
• Multiple worker nodes on a single physical resource
• Dynamic provision rules (infrastructure adaptation)
• VMM functionality (e.g. live migration)
Features
Cluster Partitioning
• Performance partitioning (dedicated worker nodes)
• Isolation of the cluster workload
• Dedicated HA partitions
Features
Heterogeneous Workloads
• Dynamic provision of cluster configurations
• Example: on-demand VO worker nodes in grids
Grid Integration
• Unmodified grid applications; grid interfaces preserved (DRMAA…)
• Virtual resources are exposed by the Globus Toolkit (MDS, GRAM, GridFTP)
• Dynamic scheduling, fault detection & recovery (GridWay)
• WN images register to a different queue
• VO-specific appliances for the WNs
• Virtual WNs coexist with other services
[Figure: layered architecture: applications on top of the grid middleware layer (GridWay, MDS, GRAM, GridFTP), the computing service layer (SGE cluster front-end) and the infrastructure layer (OpenNebula over the VMMs).]
Grid Integration
• The Infrastructure Manager adapts the grid infrastructure to its workload
• WN deployment policies (e.g. VO shares), as sketched below
• Fault detection & recovery
[Figure: the Infrastructure Manager monitors GridWay and MDS and drives VWS/OpenNebula on the SGE cluster front-end to deploy worker nodes from a VO appliance repository.]
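A share-based deployment policy could look, in spirit, like the sketch below; this is a hypothetical illustration, not the actual Infrastructure Manager code (the VO names and numbers are made up):

/* Hypothetical VO-share policy: deploy worker nodes per VO in proportion to
 * its share, bounded by its pending jobs and the physical nodes available. */
#include <stdio.h>

struct vo { const char *name; double share; int pending_jobs; };

static int wns_for_vo(struct vo v, double total_share, int total_nodes)
{
    int by_share = (int)(total_nodes * v.share / total_share);
    return v.pending_jobs < by_share ? v.pending_jobs : by_share;
}

int main(void)
{
    struct vo vos[] = { { "vo-a", 0.6, 10 }, { "vo-b", 0.4, 2 } };
    double total = vos[0].share + vos[1].share;
    int nodes = 8;   /* physical nodes available for virtual WNs */

    for (int i = 0; i < 2; i++)
        printf("%s: deploy %d WNs\n", vos[i].name,
               wns_for_vo(vos[i], total, nodes));
    return 0;
}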
Experiments
• Interaction between the components
• Overhead induced by each component
Testbed characteristics:
Host     CPU           Memory   OS              Service
UCM      P4 3.2GHz     1GB      Debian Etch     GT, SGE, VWS
ESA      Xeon 2.2GHz   2GB      Fedora Core 6   GT, SGE, VWS
Manager  P4 3.2GHz     2GB      Debian Etch     GW, GT
Overhead Analysis
[Figure: timing of the six numbered provisioning steps across GridWay, the Infrastructure Manager, the MDS/GRAM/GridFTP services, the SGE cluster front-end, VWS, OpenNebula and the VMMs, with the VO appliance staged from its repository; the delays annotated in the figure are 2 s, 2 s, 18 s, 90 s, 96 s and 170 s.]
• Around 10% overhead for computational tasks
Overhead Analysis: Shutdown
• Same steps as deployment
• Similar overhead
• The metascheduler must be able to recover from failures
[Figure: same components and step sequence as in the previous figure.]
4. Related Work
Renewed Interest in Virtualization Technologies
• COD (Cluster on Demand) is cluster management software
• Edge Services uses VWS to deploy VO-dedicated servers
• In-VIGO uses VMs to deploy different middleware stacks
• Amazon EC2 (Elastic Compute Cloud) provides a remote VM execution environment through a simple WS interface
• And many more…
5. Conclusions
Grids and Virtual Machines
• Both alternatives:
– Reduce application porting times (mainly testing time)
– Increase the effective number of resources available to a user/application
– Reduce the operational costs of the infrastructure (simple on-demand provision of custom configurations)
• Straightforward deployment of VMs:
– Almost ready to work on existing infrastructures, with limited overhead for some deployments
– Does not fully exploit virtualization
– Limited to medium/coarse-grained batch applications
• Dynamic provisioning of computing elements:
– A new infrastructure layer that separates resource provisioning from job management
– Dynamically adapts the infrastructure to support different VOs
– Seamless integration of remote providers (Amazon EC2, VWS…)
– Different VO policies to adapt the infrastructure (future work)
THANK YOU FOR YOUR ATTENTION!
More info, downloads and mailing lists at www.opennebula.org
This work is partially funded by the "RESERVOIR – Resources and Services Virtualization without Barriers" project, EU grant agreement 215605, www.reservoir-fp7.eu/