# Compiling a parallel version of Tonto

Parallel lines meet at infinity

These instructions are for Linux.

It's probably similar for Mac, and for Windows.

Make sure you've got a working single-processor Tonto version before parallelizing!

## Overview

• Tonto uses the message passing interface (MPI) to achieve parallelism.
• It doesn't matter whether you use the MPI-1 or the MPI-2 version of the MPI standard.
• For simplicity these instructions are for the newer MPI-2 standard.
• There are several different implementations of these standards.
• For simplicity I also assume that you are going to use the MPICH2 implementation of MPI-2.
• Further, I assume you want to run processes on a single computer box with multiple cores.
• That is, you want to parallelize jobs only on "your" computer.
• It's also easy to build a parallel Tonto within a supercomputer MPI environment; see the fine print below.
• Finally, I assume you are using a Linux OS; see below for fine print for other OS's.

Do this:

• Use your package manager to install MPICH2.
• If you must, download and install MPICH2 from Argonne National Labs. It is probably one of the best implementations.
• Then: check that you have the mpif90 script in your path by typing
    which mpif90

• Then: check that you have the "hydra" mpiexec command in your path by typing

    which mpiexec.hydra


If both commands are found, you've got a working MPICH2 installation!

Congratulations! You are ready to rock-n-roll!

If you have had trouble with these steps, you'll have to understand a bit about how MPI works (see the next section).

If you're installing at a supercomputer center, read that section below.

Still having problems? Then read the fine print in the section following that.

## How MPI works

The MPI libraries use a "process manager" to spawn different processes.

• These processes run in parallel, but talk to each other through MPI.
• Each MPI process executes one Tonto program executable.
• These processes can be on your machine, or on several different machines over a network.
• The current standard process manager is called "Hydra".
• Since Hydra is the default, that is what we use here.
• It is usually not a good idea to run parallel jobs over a network, unless you have a fast network, like at a supercomputer center.
• If you want to run over a network, make sure you have a switch, because the data transfer will interfere with other people on the network.

On Linux, if you installed MPICH2 via a package manager, MPI works like this:

• Your Fortran 90 compiler will probably be gfortran.
• However, gfortran will be called by a "wrapper" script mpif90.
• The purpose of this mpif90 script is to make sure all the extra parallel libraries and things you need are added to your compiler without you having to worry about it.
• All your usual underlying fortran compiler options will still be available.
• The MPI layer just makes your compiler parallel -- everything else stays the same.
• Nice, huh?

## Parallel Tonto at a supercomputer center

The MPICH2 instructions aren't appropriate for supercomputer users.

• The supercomputer staff will have set up an MPI compiler for you.
• Their MPI compiler does the same job as the "mpif90" script described above.
• That's great! Use the one they provided you!
• But when you see "mpif90" below, substitute your supercomputer site's MPI compiler script.
• Likewise, substitute your site's equivalent of the "mpiexec.hydra" command; it may be "mpirun".

At supercomputer sites, you will probably submit to a resource-controlled queue.

• Most likely, you'll use the qsub command.
• In this case you don't have to worry about starting up a "process manager";
• Your supercomputer people will do it for you, when your queued job starts.
• You don't even have to worry about setting up MPI because they've done that too.
• Read the documentation: it will say how to request processors, memory, etc.

## Installing MPI: fine print

Read these notes if the above hasn't worked for you.

Or, you are on a non-Linux system.

Or, for whatever reason you don't want to use MPICH2.

Or, you can't use MPICH2.

• On non-Linux systems, I have no idea how to install MPICH2.
• Can someone help?
• At Argonne there are binaries for Windows and Mac.
• That looks easy.
• Perhaps you want to compile MPICH2 yourself, because you want to use a specific Fortran compiler, or for whatever other reason.
• That's fine, go ahead, do it.
• Just make sure you end up with "mpif90" and "mpiexec" commands, or their analogs.
• If for whatever reason you don't like MPICH2, try Open MPI from Indiana University.
• However, you'll probably have to compile it from scratch.
• In the past, people used the "mpd" process manager via an "mpdboot" command.
• We won't use that here; we are going to use Hydra.
• Apparently mpd had problems (and I can confirm that).
• However, I mention mpd in case you are using an older MPI-1 setup.

## Compiling parallel Tonto

Compiling parallel Tonto is the same as a standard installation.

Just type:

    perl -w scripts/Makefile.pl -fc=mpif90
    make


Of course, replace mpif90 with your parallel compiler if you need to.

The executables go in the <platform-specific-directory> as usual.

## Compiling a parallel Tonto: fine print

Read this if you get into trouble in the previous step.

One thing that may occur is that we don't have a platform-specific compiler option file for you.

This is the same kind of problem as found for a single-processor installation.

However, there are a couple of extra things you should know.

(If you need even more information, look at the "Customised compile" section.)

The first thing is that you need to activate the -DMPI switch.

Simply add -DMPI to the DEFS variable in the platform-specific compiler option file:

    # Put in the DEFS any macros that turn on/off conditional parts of the
    # compilation.  You can specify many of them if you want, all on the one line
    # separated by spaces.
    DEFS     = -DUSE_ERROR_MANAGEMENT -DFLUSH -DMPI


In rare cases you may need to include appropriate MPI libraries.

For example the library might be libmpi.a.

To do this just add -lmpi to the LIBS variable in the platform-specific compiler option file:

    # Commands to make the LAPACK/BLAS libraries, or their equivalent.
    LIBS     = -L$(objdir) -llapack -lblas -lmpi

The name of the library may vary according to your MPI installation; read your documentation. However, this should be rare, since the mpif90 compiler script should do it for you. If you have to do this, you probably haven't done a proper installation of MPI.

Finally, you may need to tell the compiler where to find the libraries. To do that, add a directory to the library search path, by adding a -I option to the FFLAGS variable in the platform-specific compiler options file:

    # Compiler-specific options.  These should tell the compiler how to look in
    # $(srcdir) for the Fortran source files, how to look in $(moddir) for module
    # information files, and any additional libraries needed.
    # Is srcdir really needed?
    FFLAGS   = -I$(srcdir) -I$(moddir) -J$(moddir) -I<your_MPI_library_location> -Wall -static


Replace <your_MPI_library_location> with the folder containing the libmpi.a library.

You may need more than one -I option.

Especially if you need to tell the compiler the location of the "mpi.mod" Fortran module file.

Use "locate mpi.mod" or "find / -name mpi.mod" to find out where this module file is.

Again, if you have to do this there is probably some mistake.

But sometimes it can be easier just to do this than trying to figure out what went wrong with an install!

Finally: you may have to remove the -static option if the compiler complains about static libraries.

Here is the type of error message you could see:

    /usr/lib/openmpi/lib/libopen-pal.a(dlopen.o): In function `vm_open':
    (.text+0x100): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking


Then you have to remove the -static option, like this:

    # Compiler-specific options.  These should tell the compiler how to look in
    # $(srcdir) for the Fortran source files, how to look in $(moddir) for module
    # information files, and any additional libraries needed.
    # Is srcdir really needed?
    FFLAGS   = -I$(srcdir) -I$(moddir) -J$(moddir) -I<your_MPI_library_location> -Wall


## Testing: the run_pi.exe program

You can test that parallel-Tonto is working by compiling the classic program to calculate the value of Pi.

It works by calculating the integral below, in parallel:

$\pi = \int_0^1 \frac{4}{1+x^2} dx$

It just adds up blocks in the interval [0,1] which approximate the "area under the curve".
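In plain Python, the same midpoint sum looks like this (a serial sketch only; the function name and block count here are illustrative, not taken from run_pi.foo):

```python
import math

def estimate_pi(n_blocks):
    """Midpoint Riemann sum of 4/(1+x^2) over [0, 1]."""
    width = 1.0 / n_blocks
    total = 0.0
    for i in range(n_blocks):
        x = (i + 0.5) * width          # midpoint of block i
        total += 4.0 / (1.0 + x * x)   # height of the curve there
    return total * width               # sum of the block areas

print(round(estimate_pi(100000), 6))   # → 3.141593
```

The parallel version simply shares the loop iterations among processors, as explained later.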

To do the check, change directory into the tonto folder (as usual) and type

    make run_pi.exe


Then run the Pi program by typing

    mpiexec.hydra -n 2 <platform-specific-directory>/run_pi.exe


This will run the program with 2 processors. The result should be placed in the "stdout" file.

If everything is working you should get this:

    No. of intervals ..... 100000000
    Pi ..... 3.141593
    Wall-clock time taken is , 40 seconds, 169 milliseconds.
    CPU time taken is 80.060 CPU seconds.


You can repeat the calculation with different numbers of processors to see how parallel scaling works.

By the way: wall-clock time is how much "real" time the job took.

CPU time is the total CPU time added up over all processors.

Here are the wall-clock times I get for different numbers of processors:

    1 ... 84s
    2 ... 40s
    4 ... 22s
    8 ... 20s


Lesson: you can see from this that it wouldn't be worth using 8 processors.

On this computer, you'd be better off doing two different jobs with 4 processors each.

It pays to investigate a bit.
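One way to investigate is to compute the parallel efficiency (speedup divided by processor count). A quick sketch, using the wall-clock times measured above:

```python
t1 = 84.0  # 1-processor wall-clock time from the table above (seconds)
times = {2: 40.0, 4: 22.0, 8: 20.0}

for n, tn in sorted(times.items()):
    speedup = t1 / tn         # how much faster than 1 processor
    efficiency = speedup / n  # fraction of ideal linear scaling
    print(f"{n} processors: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

On these numbers the efficiency is still about 95% at 4 processors, but drops to barely above 50% at 8, which is why two 4-processor jobs beat one 8-processor job here.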

## Testing: the run_pi_io program

A common area for parallelism to fail is during IO type operations.

Why?

It's because, when using parallelism, generally only one process has control of reading and writing files. If not, things get complicated. So when using parallelism you have to control which processors get to do the IO. And that's where potential problems arise.
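The usual pattern can be sketched without any MPI at all: only the master (rank 0) touches the file, and the value is then "broadcast" to the other ranks. Everything below is illustrative Python, not Tonto's actual routines:

```python
def read_block_count(rank, line="1000000"):
    """Only the master (rank 0) parses the input; the other
    ranks get None and must wait for the broadcast."""
    if rank == 0:
        return int(line)  # master reads "stdin" and parses the number
    return None           # everyone else does no IO at all

n_ranks = 4
master_value = read_block_count(0)  # rank 0 does the IO
values = [master_value] * n_ranks   # stand-in for an MPI broadcast
print(values)                       # → [1000000, 1000000, 1000000, 1000000]
```

If a non-master rank tried to read the file as well, the two reads could interleave unpredictably; restricting IO to one rank avoids that.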

You can test the parallel IO restriction to the "master" processor.

To do the check, change directory into the tonto folder (as usual) and type

    make run_pi_io.exe


Next, make a file called "stdin" in the tonto folder and place a number in it:

    1000000


The run_pi_io program will open this file and use this number as the number of blocks to use in the Riemann sum for Pi. Otherwise, the program is identical to the run_pi program.

Finally, run the Pi program by typing

    mpiexec.hydra -n 2 <platform-specific-directory>/run_pi_io.exe


You should get the expected result without any crashes:

    No. of intervals ..... 1000000
    Pi ..... 3.141593
    Wall-clock time taken is , 236 milliseconds.
    CPU time taken is 0.910 CPU seconds.

If these tests work then it's a good bet that the whole of Tonto will work in parallel.

## How parallelism works

Take a look at the code in the file run_pi.foo; we'll refer back to it in the explanation below.

The general flow is like this:

• Set up Tonto-system objects and open files
• Set the number of Riemann blocks
• Do the Pi integral --- with a parallel-do loop
• Call a PARALLEL_SUM thingy
• Close files and delete Tonto-system objects

All the action occurs in the parallel do loop.

What basically happens is that, at the parallel loop, Tonto tells each processor to do a different segment of the loop.

In actual fact, if we have three processors, then

• processor 1 does loop values i = 1, 4, 7, 10, ...
• processor 2 does loop values i = 2, 5, 8, 11, ...
• processor 3 does loop values i = 3, 6, 9, 12, ...

It's actually not very important exactly how the loop values are divided up (and it may change in the future).
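Assuming the round-robin split shown above (which, as noted, may change), the index assignment is just a strided slice of the loop range. A Python sketch, with an illustrative function name:

```python
def my_indices(rank, n_procs, n):
    """Loop indices 1..n handled by a given rank, round-robin style."""
    return list(range(rank + 1, n + 1, n_procs))

# With 3 processors and 12 loop iterations:
for rank in range(3):
    print(f"processor {rank + 1}: {my_indices(rank, 3, 12)}")
# → processor 1: [1, 4, 7, 10]
# → processor 2: [2, 5, 8, 11]
# → processor 3: [3, 6, 9, 12]
```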

You just have to know that at the parallel do loop each processor gets to do its own part of the loop.

Unfortunately, that means at the end of the parallel loop each processor has its own "part" of the answer to the value of Pi.

To get the correct answer we need to add these part-values together.

The processors have to "communicate" their part-values to each other.

This is where the PARALLEL_SUM(pi) comes in.

At the PARALLEL_SUM(pi) statement, all processors "synchronize", meaning that they wait for each other. They then (somehow) add their respective Pi part-values together, and the (correct) summed total value of Pi is transmitted back to all the separate processors.

That's how it works.
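The whole scheme can be simulated serially in Python; everything here is an illustrative stand-in for the Foo constructs, not Tonto code:

```python
import math

def partial_pi(rank, n_procs, n_blocks):
    """One processor's share of the midpoint Riemann sum for Pi."""
    width = 1.0 / n_blocks
    total = 0.0
    for i in range(rank, n_blocks, n_procs):  # this rank's slice of the loop
        x = (i + 0.5) * width
        total += 4.0 / (1.0 + x * x)
    return total * width

# Each "processor" holds only part of the answer...
parts = [partial_pi(rank, 3, 100000) for rank in range(3)]
# ...and the PARALLEL_SUM step combines them into the full value.
pi = sum(parts)
print(round(pi, 6))  # → 3.141593
```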

Of course, none of this is MPI; it's not even Fortran.

The parallel do syntax and the PARALLEL_SUM are part of the Foo language, and they are implemented on top of the MPI layer.

It should be possible to implement these features on top of any parallel system.

In this way you get some protection against changes in underlying technology.

Cool, huh? :-)