
Creating a machinefile for Running Your MPI Program

A machinefile is a file that contains a list of the possible machines on which you want your MPI program to run.

By default, the mpirun command uses the machine file /usr/local/appl/mpich/share/machines.ARCH (where ARCH is the name of the architecture you are running on: solaris, Linux or us2). However, you can specify an alternate set of machines by (1) creating a different machine file, and (2) telling mpirun to use it.

An alternate machine file is useful if one of the computers is heavily loaded or is having problems. MPI is not particularly fault-tolerant: if you try to use a computer which is turned off (or is running Windows XP) and the MPI process cannot be started on it, your program will crash. The machine you want to avoid can be commented out or deleted from the list of machines. For example, pc04 will not be used if the following machine file is specified:

# sample machine file
pc01.ee.uwa.edu.au
pc02.ee.uwa.edu.au
pc03.ee.uwa.edu.au
# pc04.ee.uwa.edu.au
pc05.ee.uwa.edu.au
pc06.ee.uwa.edu.au
pc07.ee.uwa.edu.au
pc08.ee.uwa.edu.au

For convenience, keep your machine file in the same directory as your MPI executables and give it an appropriate name such as machines. The name of your machine file is passed as the argument to the mpirun option -machinefile (see next section).
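
The effect of commenting out an entry, as in the sample machine file above, can be illustrated with a short Python sketch. This is not how mpirun itself reads the file; the function name read_machinefile_text is hypothetical, and the sketch assumes only that "#" starts a comment and that blank lines are ignored.

```python
def read_machinefile_text(text):
    """Return the host names listed in machine-file text.

    Assumption: '#' starts a comment and blank lines are ignored.
    """
    hosts = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop any comment and whitespace
        if entry:
            hosts.append(entry)
    return hosts


# A shortened copy of the sample machine file, with pc04 commented out.
sample = """\
# sample machine file
pc01.ee.uwa.edu.au
pc02.ee.uwa.edu.au
pc03.ee.uwa.edu.au
# pc04.ee.uwa.edu.au
pc05.ee.uwa.edu.au
"""

hosts = read_machinefile_text(sample)
print(hosts)
```

Printing the list is a quick way to check which machines remain eligible before you start a run.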

When machines are selected from the machine file, the list is read cyclically. If you ask for n parallel processes, the first n entries in the file are used. If the file contains fewer than n entries, all machines are used and some are assigned multiple processes: the selection cycles through the list until the required number of processes has been placed.
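
The cyclic selection described above can be modelled in a few lines of Python. This is a simplified sketch of the behaviour as described here, not mpirun's actual code; the function name assign_hosts and the short host names are made up for the illustration.

```python
from itertools import cycle, islice


def assign_hosts(hosts, n):
    """Pick a host for each of n processes, wrapping round the list."""
    if not hosts:
        raise ValueError("machine file has no usable entries")
    # cycle() repeats the list endlessly; islice() takes the first n picks.
    return list(islice(cycle(hosts), n))


hosts = ["pc01", "pc02", "pc03"]
print(assign_hosts(hosts, 2))  # fewer processes than machines: first two entries
print(assign_hosts(hosts, 5))  # more processes than machines: the list wraps round
```

With three machines and five processes, the first two machines in the file each receive two processes, so the order of entries in the file determines which machines carry the extra load.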

During program development, testing and debugging (when you are not really worried about performance) you can make things easier for yourself (and have less impact on other users) by starting all parallel processes on the local computer. You can do this by editing your machine file so that it contains a single line: the name of the machine you are sitting in front of. This may be necessary if you want to use the Linux machines outside of lab times and all the others are running Windows XP.
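
A single-line machine file of this kind can also be generated rather than typed. The Python sketch below is one possible way to do it; the function name write_local_machinefile is hypothetical, and it assumes that the name returned by socket.gethostname() is the form mpirun expects on your system (the sample file above uses fully qualified names, so check this before relying on it).

```python
import os
import socket
import tempfile


def write_local_machinefile(path):
    """Write a machine file whose only entry is this computer's host name."""
    host = socket.gethostname()  # assumption: this is the name mpirun expects
    with open(path, "w") as f:
        f.write(host + "\n")
    return host


# Demonstration in a temporary directory; in practice the path would be
# the machine file kept next to your executable.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "machines")
local_host = write_local_machinefile(demo_path)

with open(demo_path) as f:
    print(f.read(), end="")
```

Because the file has only one entry, the cyclic selection places every requested process on the local machine.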

You can either create a machine file manually (by copying the appropriate file from /usr/local/appl/mpich/share and editing it) or use a script called getmachines.sh, which queries all the computers and builds a machine file containing those that are currently usable. For brief help on this script, type "getmachines.sh -h". The most common way to use it is to type

   "getmachines.sh -outfile=machines"

which writes the list of available machines to the file "machines" in the current directory.