- Introduction
- How to use RHOC
- Preparation steps
- Installing RHOC
- Use cases
- Options and parameters
- Examples
- This is a Pre-Alpha Engineering Version
RHOC is a software tool that allows you to set up Intel® HPC Platform Specification compliant cloud-based clusters and other types of clusters, including single-node usage, in a cloud-independent way. RHOC can create an Intel HPC Platform compatible cluster on a variety of cloud infrastructures, including public and private clouds, easing development of hybrid- and multi-cloud HPC environments. It does not contain any scheduler, orchestration, or resource manager, but these can be added to manage the cluster.
The Intel HPC Platform Specification defines both software and hardware requirements that form a foundation for high performance computing solutions. Systems that comply with the specification have enhanced compatibility and performance across a range of popular HPC workloads.
- HPC cluster capacity is typically fixed, but demand is typically variable. Cloud-based HPC can provide additional capacity on demand.
- Cloud-based HPC clusters can simplify and accelerate access for new HPC users and new businesses, resulting in faster time to results.
- Cloud provides a means to access massive or specialized resources for short periods, addressing temporary or intermittent business needs.
- Cloud provides access to the newest technologies, allowing evaluation and use ahead of long-term ownership.
RHOC takes parameters from a user-provided `*.json` file and from the command line. All parameters from the file and the command line are combined into a single structure. RHOC then reads templates from the `templates/{provider}/` folder and replaces their default variables with the user's parameters, saving the resulting configuration files to `.RHOC/`. The generated files are used by the Terraform and Packer tools to create machine images and the cluster.
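For instance, a minimal parameters file might look like the sketch below; the keys come from the parameter lists later in this document, while the flat key/value layout and the exact values are illustrative assumptions:

```sh
# Sketch of a minimal parameters file (illustrative values;
# see "Options and parameters" below for the documented keys).
cat > my-parameters.json <<'EOF'
{
  "image_name": "zyme-worker-node",
  "cluster_name": "sample-cloud-cluster",
  "worker_count": "2",
  "disk_size": "20"
}
EOF
```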
First, you need to have an account with a supported provider (GCP or AWS). Then you need a credentials file for that provider; download it and put it into any folder (e.g. `user_credentials/gcp/gcp_credentials.json`).
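For example, assuming you downloaded a GCP service-account key (the download path and file name below are placeholders):

```sh
# Place the downloaded key where you will point RHOC at it later.
mkdir -p user_credentials/gcp
cp ~/Downloads/my-gcp-key.json user_credentials/gcp/gcp_credentials.json
```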
You will need the following distributions: Intel® Cluster Checker (version: 2019 initial release) and Intel® Parallel Studio XE Cluster Edition (version: 2018 update 3). Download both and obtain their licenses.
- Put the Parallel Studio XE Cluster Edition distribution and its license into the `distrib/psxe_cluster_edition` folder.
- Put the Cluster Checker distribution into the `distrib/clck` folder.

NOTICE: The names of the distributions should end with `.tgz`. If they do not, rename them.
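As a sketch, assuming archive names like the ones below (use whatever you actually downloaded, renamed to satisfy the `.tgz` requirement):

```sh
# Hypothetical archive and license names; adjust to match your downloads.
mkdir -p distrib/psxe_cluster_edition distrib/clck
mv ~/Downloads/parallel_studio_xe_cluster_edition.tgz distrib/psxe_cluster_edition/
mv ~/Downloads/psxe.lic distrib/psxe_cluster_edition/   # license file
mv ~/Downloads/intel-clck-2019.tgz distrib/clck/
```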
You need to install:
- Go
- make (available for both Windows and Linux)
- Clone the RHOC repository from GitHub.
- Build the project by issuing `make` in the root project folder:

```sh
make GOOS=windows
```

The `GOOS` parameter defines the platform (`windows`, `linux`) you want binaries for (default: `"linux"`). You should get a `package-{GOOS}-amd64` folder with the built binaries/executables. The whole package is also archived into `package-{GOOS}-amd64-{version}-{hash}.tar.gz`.
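For example, a default Linux build might look like this (the repository URL is not reproduced here; use the one published on GitHub):

```sh
git clone <RHOC-repository-URL>
cd RHOC
make                      # defaults to GOOS=linux
ls package-linux-amd64    # built binaries/executables
```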
- First, create a configuration file with your customized parameters. For example, see `examples/linpack/linpack-cluster.json`.

```sh
rhoc run task.sh --parameters path/to/parameters.json
```
This command will instantiate a cloud-based cluster and run the specified task. On first use, the machine image will be automatically created. After the task completes, the cluster will be destroyed, but the machine image will be left intact for future use.
```sh
rhoc run task.sh --parameters path/to/parameters.json --keep-cluster
```

This command will instantiate the requested cluster and run the specified task, but with `--keep-cluster` the cluster is kept running after the task completes, so it can be reused by later runs. As before, the machine image will be created automatically on first use. Destroy the cluster with `rhoc destroy` when you no longer need it.
You can create a persistent cluster without running a task. For this, just use the create cluster command.
```sh
rhoc run task.sh --parameters path/to/parameters.json --use-storage
```

This command will instantiate the requested cluster and storage and then run the specified task. As before, the required images will be created on first use. The `--use-storage` option allows you to access data living on the storage node. NOTICE: make sure you don't change any configuration parameters except `storage_disk_size`; otherwise, a new storage will be created once the parameters change. Currently, changing `storage_disk_size` has no effect and the disk keeps its previous size; to force a resize, destroy the storage node and delete the disk in the cloud provider's interface.
You can create storage without running a task. For this, just use the create storage command.
```sh
rhoc destroy destroyObjectID
```

You can destroy a cluster or storage by its `destroyObjectID`, which can be found by checking `rhoc state`. NOTICE: The disk is kept when the storage is destroyed. Only the VM instances will be removed, and the "storage" RHOC entity will change its status from XXXX to configured. You can delete the disk manually through the cloud provider's interface if you want to.
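A typical teardown sequence, with a placeholder ID (the real ID comes from the `rhoc state` output):

```sh
rhoc state                       # find the destroyObjectID of the entity
rhoc destroy <destroyObjectID>   # placeholder; use the ID reported above
```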
```sh
rhoc create image --parameters path/to/parameters.json
```

This command tells RHOC to create a VM image from a single configuration file. You can check for the created images in the cloud provider's interface if you want to.
```sh
rhoc create cluster --parameters path/to/parameters.json
```

This command tells RHOC to spawn VM instances and form a cluster. It also creates the needed image if it doesn't exist yet.
```sh
rhoc create storage --parameters path/to/parameters.json
```

This command tells RHOC to create a VM instance backed by a disk that holds your data. You can use storage to organize your data and control access to it. The storage is located in the `/storage` folder on the VM instance. The command also creates the needed image if it doesn't exist yet. Uploading data into the storage is outside the scope of RHOC; RHOC only provides the information needed to connect to the storage via the `rhoc state` command.
```sh
rhoc state
```

This command enumerates all manageable entities (images, clusters, storages, etc.) and their respective status. For cluster and storage entities, additional information about the SSH/SCP connection (user name, address, and security keys) is provided to facilitate access to these resources.
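With the connection details reported by `rhoc state`, access is plain SSH/SCP. The address and key file below are placeholders (the key file name is an assumption based on the `ssh_key_pair_path` and `key_name` defaults documented later); the user name is the documented `user_name` default:

```sh
# Substitute the user, key, and address reported by `rhoc state`.
ssh -i private_keys/hello.pem ec2-user@<login-node-address>
scp -i private_keys/hello.pem results.txt ec2-user@<login-node-address>:~/
```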
```sh
rhoc version
```

This command prints RHOC version information.

```sh
rhoc print-vars image
```

Use this command with one of the additional arguments: image, cluster, or task. You can use the `--provider` flag to check parameters specific to a certain provider (default: GCP).
```sh
rhoc help
```

This command prints a short help summary. Each RHOC command also has a `--help` switch that provides command-related help.
Use the `-v` or `--verbose` flag with any command to get extended info.

Use the `-s` or `--simulate` flag with any command to simulate the execution without actually running any commands that could modify anything in the cloud or locally. This is useful for checking what RHOC would perform without actually performing it.
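For example, to preview the actions of a run without touching the cloud:

```sh
rhoc run task.sh --parameters path/to/parameters.json --simulate --verbose
```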
- `-p, --provider`: select provider (default: `gcp`)
  - `gcp`: Google Cloud Platform
  - `aws`: Amazon Web Services
- `-c, --credentials`: path to credentials file (default: `user_credentials/credentials.json`)
- `-r, --region`: location of your cluster for the selected provider (default: `us-central1`)
- `-z, --zone`: location of your cluster for the selected provider (default: `a`)
- `--parameters`: path to file with user parameters
You can define the above parameters only via the command line.

The parameters presented below can be used in the configuration file and on the command line. When specified on the command line, they override the parameters from the configuration file. To apply them on the command line, use `--vars`, a list of user variables (example: `"image_name=RHOC,disk_size=30"`).
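For example, to override two values from the configuration file on the command line:

```sh
rhoc create image --parameters path/to/parameters.json --vars "image_name=RHOC,disk_size=30"
```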
A task combines the parameters of all entities it might need to create. For the individual entities, see the image, cluster, and storage parameter lists below; a combined sketch follows those lists.
- `--keep-cluster`: keep the cluster running after the script is done
- `--use-storage`: allow access to storage data
- `--newline-conversion`: enable conversion of DOS/Windows newlines to UNIX newlines for the uploaded script (useful if you're running RHOC on Windows)
- `--overwrite`: overwrite the content of the remote file with the content of the local file
- `--remote-path`: name for the uploaded script on the remote machine (default: `"./RHOC-script"`)
- `--upload-files`: files to copy into the cluster (into the `~/RHOC-upload` folder, with the same names)
- `--download-files`: files to copy from the cluster (into the `./RHOC-download` folder, with the same names)
- `project_name` (default: `"zyme-cluster"`)
- `user_name`: user name for SSH access (default: `"ec2-user"`)
- `image_name`: name of the image of the machine being created (default: `"zyme-worker-node"`)
- `disk_size`: size of the image boot disk, in GB (default: `"20"`)
- `project_name` (default: `"zyme-cluster"`)
- `user_name`: user name for SSH access (default: `"ec2-user"`)
- `cluster_name`: name of the cluster being created (default: `"sample-cloud-cluster"`)
- `image_name`: name of the image which will be used (default: `"zyme-worker-node"`)
- `worker_count`: count of worker nodes (default: `"2"`). **NOTICE**: *Must be greater than 1.*
- `login_node_root_size`: boot disk size for the login node, in GB (default: `"20"`). **NOTICE**: *Must be no less than `disk_size`.*
- `instance_type_login_node`: machine type of the login node (default: `"f1-micro"` for GCP)
- `instance_type_worker_node`: machine type of the worker nodes (default: `"f1-micro"` for GCP)
- `ssh_key_pair_path` (default: `"private_keys"`)
- `key_name` (default: `"hello"`)
- `project_name` (default: `"zyme-cluster"`)
- `user_name`: user name for SSH access (default: `"ec2-user"`)
- `storage_name`: name of the storage being created (default: `"zyme-storage"`)
- `image_name`: name of the image which will be used (default: `"zyme-worker-node"`)
- `storage_disk_size`: size of the permanent disk, in GB (default: `"50"`)
- `storage_instance_type`: machine type of the storage node (default: `"f1-micro"` for GCP)
- `ssh_key_pair_path` (default: `"private_keys"`)
- `storage_key_name` (default: `"hello-storage"`)
Let's create your first cluster.
- Take the first two steps (1 and 2) from How to use RHOC if you haven't completed them yet.
- Acquire a GCP credentials file and save it as `user_credentials/credentials.json`.
- Complete the preparation steps from Example preparation steps
- Run the LINPACK benchmark:

```sh
./RHOC run examples/linpack/linpack-cluster.sh --parameters examples/linpack/linpack-cluster.json --upload-files examples/linpack/HPL.dat
```
- The end of your output should look like this:

```
Finished 1 tests with the following results:
1 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
End of Tests.
```
- That's it! You have just successfully run LINPACK in the cloud.
- Complete the preparation steps from Example preparation steps
- Create storage:

```sh
./RHOC create storage --parameters=examples/lammps/lammps-single-node.json
```
- Consult `rhoc state` for the connection details of the storage node, then SSH into it using the provided private key and IP address.
- Prepare the `/storage/lammps/` folder for the data upload:

```sh
sudo mkdir /storage/lammps/
sudo chown lammps-user /storage/lammps/
```
- Upload the `lammps.avx512.simg` container into `/storage/lammps/`, e.g. by:

```sh
scp -i path/to/private_key.pem path/to/lammps.avx512.simg lammps-user@storage-address:/storage/lammps/
```
- Run the LAMMPS benchmark:

```sh
./RHOC run examples/lammps/lammps-single-node.sh --parameters=examples/lammps/lammps-single-node.json --use-storage --download-files=lammps.log
```
- The content of your `RHOC-download/lammps.log` file should look like this (note: this output was obtained by running on 4 cores):

```
args: 2
OMP_NUM_THREADS=1
NUMCORES=4
mpiexec.hydra -np 4 ./lmp_intel_cpu_intelmpi -in WORKLOAD -log none -pk intel 0 omp 1 -sf intel -v m 0.2 -screen
Running: airebo
Performance: 1.208 timesteps/sec
Running: dpd
Performance: 9.963 timesteps/sec
Running: eam
Performance: 9.378 timesteps/sec
Running: lc
Performance: 1.678 timesteps/sec
Running: lj
Performance: 19.073 timesteps/sec
Running: rhodo
Performance: 1.559 timesteps/sec
Running: sw
Performance: 14.928 timesteps/sec
Running: tersoff
Performance: 7.026 timesteps/sec
Running: water
Performance: 7.432 timesteps/sec
Output file lammps-cluster-login_lammps_2019_11_17.results and all the logs for each workload lammps-cluster-login_lammps_2019_11_17 ... are located at /home/lammps-user/lammps
```
- That's it! You have just successfully run LAMMPS in the cloud.
- Don't forget to destroy the storage.
- Complete the preparation steps from Example preparation steps
- Run the OpenFOAM benchmark, where `7` is the `endTime` of the benchmark computation:

```sh
./RHOC run -r us-east1 -z b --parameters examples/openfoam/openfoam-single-node.json --download-files DrivAer/log.simpleFoam --overwrite examples/openfoam/openfoam-single-node.sh 7
```
- The full log of the OpenFOAM run should be available as `RHOC-download/log.simpleFoam`.
- That's it! You have just successfully run OpenFOAM in the cloud.