General
-------
Hydra is a process management system for starting parallel jobs. Hydra
is designed to work natively with multiple launchers and resource
managers such as ssh, rsh, pbs, slurm and sge. However, in the current
release, only ssh, rsh and fork are supported, with a preliminary
version of slurm support available.

More detailed documentation of the internal workings of Hydra is
available here:
http://wiki.mcs.anl.gov/mpich2/index.php/Hydra_Process_Management_Framework


Quick Start
-----------
To use Hydra, mpich2 needs to be configured with the option
--with-pm=hydra.
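
For example, an illustrative build sequence might look like this (the
install prefix below is only an example, not a required path):

```shell
# Configure mpich2 to use Hydra as its process manager
# (the prefix is an example; use any directory you like)
./configure --with-pm=hydra --prefix=/home/you/mpich2-install
make
make install
```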

Once built, the Hydra executables are in mpich2/bin, or in the bin
subdirectory of the install directory if you have done an install. For
convenience, you should add this bin directory to your PATH in your
.cshrc or .bashrc:

Put in .cshrc:  setenv PATH /home/you/mpich2/bin:$PATH

Put in .bashrc: export PATH=/home/you/mpich2/bin:$PATH

To compile your application use mpicc:

 $ mpicc app.c -o app

Create a file with the names of the machines that you want to run your
job on. This file may or may not include the local machine.

 $ cat hosts

   donner
   foo
   shakey
   terra

To run your application on these nodes, use mpiexec:

 $ mpiexec -f hosts -n 4 ./app

The host file can also specify the number of processes to run on each
host:

 $ cat hosts

   donner:2
   foo:3
   shakey:2

In this case, the first 2 processes are scheduled on "donner", the
next 3 on "foo" and the last 2 on "shakey". Comments in the
host file start with a "#" character.

 $ cat hosts

   # This is a sample host file
   donner:2     # The first 2 procs are scheduled to run here
   foo:3        # The next 3 procs run on this host
   shakey:2     # The last 2 procs run on this host


Environment settings
--------------------
HYDRA_HOST_FILE: This variable points to the default host file to use
when the "-f" option is not provided to mpiexec.

  For bash:
    export HYDRA_HOST_FILE=<path_to_host_file>/hosts

  For csh/tcsh:
    setenv HYDRA_HOST_FILE <path_to_host_file>/hosts


HYDRA_DEBUG: Setting this to "1" enables debug mode; setting it to "0"
disables it.

HYDRA_ENV: Setting this to "all" will pass all the environment to the
application processes.

HYDRA_PROXY_PORT: The port to use for the proxies.
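
For example, in bash these might be set as follows (the port number
9899 is only an illustration, not a default):

```shell
# Enable Hydra's debug output
export HYDRA_DEBUG=1

# Pass the entire environment to the application processes
export HYDRA_ENV=all

# Use a fixed port for the proxies (example value)
export HYDRA_PROXY_PORT=9899
```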


Bootstrap servers
-----------------
Hydra supports SSH and FORK bootstrap servers to launch processes. You
can select one with the mpiexec option -bootstrap:

 $ mpiexec -bootstrap ssh -f hosts -n 4 ./app

 (or)

 $ mpiexec -bootstrap fork -f hosts -n 4 ./app

This can also be controlled by using the HYDRA_BOOTSTRAP environment
variable.

The default bootstrap server is ssh.

The executable to use as the bootstrap server can be specified using
the option -bootstrap-exec:

 $ mpiexec -bootstrap ssh -bootstrap-exec /usr/bin/ssh -f hosts -n 4 ./app

This can also be specified using the HYDRA_BOOTSTRAP_EXEC environment
variable. If the bootstrap executable is not specified, the default
path (specific to each bootstrap server) is chosen.
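
The equivalent environment settings, in bash, would look like this
(using the same values as the mpiexec example above):

```shell
# Select the ssh bootstrap server and the ssh executable it should use
export HYDRA_BOOTSTRAP=ssh
export HYDRA_BOOTSTRAP_EXEC=/usr/bin/ssh
```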


Process-core binding
--------------------
To configure Hydra with process-core binding support, use the
configure option --enable-hydra-procbind.

We support three classes of allocation strategies:

1. Basic allocation strategies: these simply allocate processes using
the OS-specified processor IDs. Currently, only a round-robin scheme
is provided.

2. Topology-aware allocation strategies: these are more intelligent in
that they try to understand the system topology and assign processes
accordingly. Currently, "buddy" and "pack" schemes are provided. The
"buddy" scheme loops over all the available sockets, allocating one
process per socket; this tries to minimize inter-process resource
sharing (assuming that the closer two processes are, the more
resources they share). The "pack" scheme packs everything as closely
as it can; this tries to maximize resource sharing, in the hope that
the communication library can take advantage of this packing for
better performance.

3. User-defined allocation strategies: two schemes are
provided---command-line and host-file based. The command-line scheme
lets the user specify a common mapping for all physical nodes on the
command line. The host-file scheme is the most general and lets the
user specify the mapping for each node separately.

The modes of process-core binding are: round-robin ("rr"),
buddy-allocation ("buddy"), closest packing ("pack") and user-defined
("user"). These can be selected as follows:

 $ mpiexec -binding rr -f hosts -n 8 ./app

 ... or ...

 $ mpiexec -binding pack -f hosts -n 8 ./app

Consider the following layout of processing elements in the system
(e.g., two nodes, each with two processors, and each processor with
two cores). Suppose the operating system has assigned processor IDs
to these processing elements as shown below:

__________________________________________      __________________________________________
|  _________________    _________________  |    |  _________________    _________________  | 
| |  _____   _____  |  |  _____   _____  | |    | |  _____   _____  |  |  _____   _____  | |
| | |     | |     | |  | |     | |     | | |    | | |     | |     | |  | |     | |     | | |
| | |     | |     | |  | |     | |     | | |    | | |     | |     | |  | |     | |     | | | 
| | |  0  | |  2  | |  | |  1  | |  3  | | |    | | |  0  | |  2  | |  | |  1  | |  3  | | |
| | |     | |     | |  | |     | |     | | |    | | |     | |     | |  | |     | |     | | |
| | |_____| |_____| |  | |_____| |_____| | |    | | |_____| |_____| |  | |_____| |_____| | |
| |_________________|  |_________________| |    | |_________________|  |_________________| |
|__________________________________________|    |__________________________________________|


In this case, the binding options are as follows:

RR: 0, 1, 2, 3 (use the order provided by the OS)
Buddy: 0, 1, 2, 3 (increasing sharing of resources)
Pack: 0, 2, 1, 3 (closest packing)
User: as defined by the user

Within the user-defined binding, two modes are supported: command-line
and host-file based. The command-line based mode can be used as
follows:

 $ mpiexec -binding user:0,3 -f hosts -n 4 ./app

If a machine has 4 processing elements and only two bindings are
provided (as in the above example), the rest are padded with (-1),
which means no binding. Also, the mapping is the same for all
machines; so if the application is run with 8 processes, the first 2
processes on "each machine" are bound to processing elements as
specified.

The host-file based mode for user-defined binding can be used by
adding a "map=" argument on each host line. E.g.:

 $ cat hosts

   donner:4    map=0,-1,-1,3
   foo:4       map=3,2
   shakey:2

Using this method, each host can be given a different mapping. Any
unspecified mappings are treated as (-1), meaning no binding.

Command-line based mappings are given higher priority than host-file
based mappings; so, if a mapping is given in both places, the
host-file mappings are ignored.

Binding options can also be controlled with the environment variable
HYDRA_BINDING.
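
For example, in bash, the earlier user-defined binding could instead
be selected for the whole session:

```shell
# Equivalent to passing "-binding user:0,3" to mpiexec
export HYDRA_BINDING=user:0,3
```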


X Forwarding
------------
X-forwarding is specific to each bootstrap server. Some servers enable
it by default, while others don't. For ssh, it is disabled by default.
To enable it, use the mpiexec option -enable-x:

 $ mpiexec -enable-x -f hosts -n 4 ./app


Persistent-mode Proxies
-----------------------
Hydra also supports launching proxies in persistent mode on the
system (e.g., by a system administrator). To use persistent mode:

 $ mpiexec -boot-proxies -f hosts

 $ mpiexec -use-persistent -f hosts -n 4 ./app1

 $ mpiexec -use-persistent -f hosts -n 4 ./app2

 $ mpiexec -use-persistent -f hosts -n 4 ./app3

 $ mpiexec -shutdown-proxies -f hosts

Persistent mode can also be selected using the environment setting
HYDRA_LAUNCH_MODE=persistent.
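
For example, in bash:

```shell
# Select persistent launch mode for subsequent mpiexec invocations
export HYDRA_LAUNCH_MODE=persistent
```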

The option "-boot-foreground-proxies" can be used to prevent
persistent proxies from spawning a child process and exiting, which is
useful for debugging. It can also be selected using the environment
setting HYDRA_BOOT_FOREGROUND_PROXIES=1:

 $ mpiexec -boot-foreground-proxies -f hosts

 $ mpiexec -use-persistent -f hosts -n 4 ./app1

 $ mpiexec -shutdown-proxies -f hosts


Communication sub-systems
-------------------------
Hydra supports different communication sub-systems to connect proxies
in persistent mode. The default is "none", which means that the
proxies are not connected. You can select one with the mpiexec option
-css:

 $ mpiexec -css ib -f hosts -n 4 ./app

 (or)

 $ mpiexec -css mx -f hosts -n 4 ./app

This can also be controlled by using the HYDRA_CSS environment
variable.


Resource Manager integration
----------------------------
Hydra provides the capability to integrate with different resource
managers. The default is "dummy", which means no resource manager. You
can select one with the mpiexec option -rmk:

 $ mpiexec -rmk lsf -f hosts -n 4 ./app

This can also be controlled by using the HYDRA_RMK environment
variable.


Hydra in hybrid environments
----------------------------
Hydra can also be used to launch other process managers, such as a
UPC launcher. For example:

 $ mpiexec -n 2 -ranks-per-proc=4 upcrun -n 4 ./app

This launches two instances of upcrun, each of which is expected to
launch 4 application processes (two subgroups of processes). Hydra
needs the -ranks-per-proc argument to tell it how many MPI ranks it
needs to allocate to each group of processes.

If the nested environment also needs to use Hydra as a launcher, but
not as a process manager, this can be set as follows:

 $ mpiexec -n 2 -ranks-per-proc=4 mpiexec -n 4 -disable-pm-env ./app

 (or)

 $ mpiexec -n 2 -ranks-per-proc=4 HYDRA_PM_ENV=0 mpiexec -n 4 ./app
