Uppsala Multidisciplinary Center for Advanced Computational Science

MPI and OpenMP user guide

This is a short tutorial about how to use the queuing system, and how to compile and run MPI and OpenMP jobs.

Compiling and running parallel programs on UPPMAX clusters.

Introduction

These notes show, by brief examples, how to compile and run serial and parallel programs on the clusters at UPPMAX, such as milou.uppmax.uu.se and tintin.uppmax.uu.se.

Section 1 shows how to compile and run serial programs, written in Fortran, C, or Java, on the login nodes. Things work very much like on any Unix system, but the subsections on C and Java also demonstrate the use of modules.

Section 2 shows how to run serial programs on the execution nodes by submitting them as batch jobs to the queue system SLURM.

Section 3 demonstrates parallel message passing programs in C, using the MPI system.

Section 4 demonstrates threaded programs in C using OpenMP directives. These programs must be executed on processors on the same node, since the threads share a common memory area.

Section 5, finally, demonstrates threaded programs using pthreads instead of OpenMP.

All programs are of the trivial "hello, world" type. The point is to demonstrate how to compile and execute the programs, not how to write parallel programs.

Serial programs on the login node

Fortran programs

Enter the following Fortran program and save it in the file hello.f:

C     HELLO.F :  PRINT MESSAGE ON SCREEN
      PROGRAM HELLO
      WRITE(*,*) "hello, world"
      END 

To compile this you must decide which compiler to use. We have the Portland Group compilers installed on UPPMAX, so the pgf77 or pgf90 command can be used to compile Fortran code (pgf90 is in fact an F95 compiler). A module must first be loaded to use the compilers:

$ module load pgi

To compile, enter the command:

$ pgf90 -o hello hello.f

To run, enter:

$ ./hello
hello, world

To compile with good optimization you can use the -fast flag. Be a bit careful with -fast, though: the compiler is sometimes overenthusiastic in its optimization, especially if the code contains programming errors (which you ought to fix if you are responsible for the code, but if it is someone else's code your options are often more limited). Should -fast not work for your code, you may try -O3 instead.
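
For example, to compile with -fast, or to fall back to -O3:

$ pgf90 -fast -o hello hello.f
$ pgf90 -O3 -o hello hello.f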

C programs

Enter the following C program and save it in the file hello.c:

/* hello.c :  print message on screen */
#include <stdio.h>
int main()
{
    printf("hello, world\n");
    return 0;
} 

To compile using gcc with no optimization:

$ gcc -o hello hello.c

with basic optimization:

$ gcc -O3 -o hello hello.c

Alternatively, to use the pgi compiler, first load the pgi module:

$ module load pgi

and then compile with the command:

$ pgcc -o hello hello.c

With fast optimization (be careful with -fast; if it works it is good, otherwise you can try -O3):

$ pgcc -fast -o hello hello.c

To run, enter:

$ ./hello
hello, world

Java programs

Enter the following Java program and save it in the file hello.java:

/* hello.java :  print message on screen */
class hello {
    public static void main(String[] args)
    {
        System.out.println("hello, world");
    }
}

Before compiling a Java program, the java module has to be loaded.
To load the java module, enter the command:

$ module load java

To check that the java module is loaded, use the command:

$ module list

To compile, enter the command:

$ javac hello.java

The java module is not always needed to run the program.
To verify this, unload the java module:

$ module unload java

To run, enter:

$ java hello
hello, world
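
To see which java runtime is actually picked up when the module is unloaded, you can check with the standard which command:

$ which java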

Running serial programs on execution nodes

Jobs are submitted to execution nodes through the resource manager.
We use SLURM on our clusters. 
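
To get an overview of the cluster's partitions and the state of their nodes, you can use SLURM's sinfo command:

$ sinfo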

To run the serial program hello as a batch job using SLURM, enter the following shell script in the file hello.sh:

#!/bin/bash -l
# hello.sh :  execute hello serially in SLURM
# command: $ sbatch hello.sh
# sbatch options use the sentinel #SBATCH
# You must specify a project
#SBATCH -A your_project_name
#SBATCH -J serialtest
# Put all output in the file hello.out
#SBATCH -o hello.out
# request 5 seconds of run time
#SBATCH -t 0:0:5
# request one core
#SBATCH -p core -n 1
./hello

The last line in the script is the command used to start the program.

Submit the job to the batch queue:

$ sbatch hello.sh
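
While the job is waiting in the queue or running, you can check its status with SLURM's squeue command (replace your_username with your own user name):

$ squeue -u your_username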

The program's output to stdout is saved in the file specified with the -o flag.

$ cat hello.out
hello, world

MPI using the OpenMPI system

C programs

Enter the following MPI program in C and save it in the file hello.c:

/* hello.c :  mpi program in c printing a message from each process */
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
    int npes, myrank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    printf("From process %d out of %d, Hello World!\n", myrank, npes);
    MPI_Finalize();
    return 0;
}

Before compiling a program for MPI, the openmpi module must be loaded, together with a compiler module.
To load the pgi and openmpi modules, enter the command:

$ module load pgi openmpi

To check that the openmpi module is loaded, use the command:

$ module list

The command to compile a C program for MPI is mpicc. Which compiler is actually used when this command is issued depends on which compiler module was loaded before openmpi.
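
With OpenMPI you can inspect the full command line that the mpicc wrapper invokes, and thereby see which underlying compiler it uses, with the -showme option:

$ mpicc -showme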

To compile, enter the command:

$ mpicc -o hello hello.c

You should add optimization and other flags to the mpicc command, just as you would to the compiler used. So if the pgi compiler is used and you wish to compile an mpi program written in C with good, fast optimization you should use a command similar to the following:

$ mpicc -fast -o hello hello.c

To run the MPI program hello using the batch system, enter the following shell script in the file hello.sh:

#!/bin/bash -l
# hello.sh :  execute parallel mpi program hello on slurm
# use openmpi
# command: $ sbatch hello.sh
# slurm options use the sentinel #SBATCH
#SBATCH -A your_project_name
#SBATCH -J mpitest
#SBATCH -o hello.out
# 
# request 5 seconds of run time
#SBATCH -t 00:00:05
#SBATCH -p node -n 8
module load pgi/2011 openmpi/1.5.0
mpirun ./hello

The last line in the script is the command used to start the program.
The last word on the last line is the program name hello.
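
In the script above, mpirun is given no process count; it starts as many processes as there are tasks in the SLURM allocation (here 8, from -n 8). To override this, you can pass the count explicitly with -np, for example:

mpirun -np 4 ./hello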

Submit the job to the batch queue:

$ sbatch hello.sh

The program's output to stdout is saved in the file specified with the -o flag.
A test run of the above program yields the following output file:

$ cat hello.out
mod: loaded OpenMPI 1.5.0, compiled with pgi10.9 (found in /opt/openmpi/1.5.0pgi10.9/)
From process 4 out of 8, Hello World!
From process 5 out of 8, Hello World!
From process 2 out of 8, Hello World!
From process 7 out of 8, Hello World!
From process 6 out of 8, Hello World!
From process 3 out of 8, Hello World!
From process 1 out of 8, Hello World!
From process 0 out of 8, Hello World! 

Fortran programs

The following example program does numerical integration to find Pi (inefficiently, but it is just an example). It approximates the integral of sqrt(1-x*x) over [0,1] with the trapezoidal rule; since that integral equals Pi/4, the result is multiplied by 4 to estimate Pi:

program testampi
    implicit none
    include 'mpif.h'
    double precision :: h,x0,x1,v0,v1
    double precision :: a,amaster
    integer :: i,intlen,rank,size,ierr,istart,iend
    call MPI_Init(ierr)
    call MPI_Comm_size(MPI_COMM_WORLD,size,ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD,rank,ierr)
    intlen=100000000
    write (*,*) 'I am node ',rank+1,' out of ',size,' nodes.'
 
    h=1.d0/intlen
    istart=(intlen-1)*rank/size
    iend=(intlen-1)*(rank+1)/size
    write (*,*) 'start is ', istart
    write (*,*) 'end is ', iend
    a=0.d0
    do i=istart,iend
           x0=i*h
           x1=(i+1)*h
           v0=sqrt(1.d0-x0*x0)
           v1=sqrt(1.d0-x1*x1)
           a=a+0.5*(v0+v1)*h
    enddo
    write (*,*) 'Result from node ',rank+1,' is ',a
    call MPI_Reduce(a,amaster,1, &
             MPI_DOUBLE_PRECISION,MPI_SUM,0,MPI_COMM_WORLD,ierr)
    if (rank.eq.0) then
           write (*,*) 'Result of integration is ',amaster
           write (*,*) 'Estimate of Pi is ',amaster*4.d0
    endif
    call MPI_Finalize(ierr)
    stop
end program testampi

The program can be compiled by this procedure:

$ module load pgi openmpi
$ mpif90 -fast -o testampi testampi.f90

The program can be run by creating a submit script sub.sh:

#!/bin/bash -l
# execute parallel mpi program in slurm
# command: $ sbatch sub.sh
# slurm options use the sentinel #SBATCH
#SBATCH -J mpitest
#SBATCH -A your_project_name
#SBATCH -o pi
#
# request 5 seconds of run time
#SBATCH -t 00:00:05
#
#SBATCH -p node -n 8
module load pgi/2011 openmpi/1.5.0
mpirun ./testampi

Submit it:

$ sbatch sub.sh

Output from the program on kalkyl:

mod: loaded OpenMPI 1.5.0, compiled with pgi10.9 (found in /opt/openmpi/1.5.0pgi10.9/)
I am node             8  out of             8  nodes.
start is      87499999
end is      99999999
I am node             3  out of             8  nodes.
start is      24999999
end is      37499999
I am node             5  out of             8  nodes.
start is      49999999
end is      62499999
I am node             2  out of             8  nodes.
start is      12499999
end is      24999999
I am node             7  out of             8  nodes.
start is      74999999
end is      87499999
I am node             6  out of             8  nodes.
start is      62499999
end is      74999999
I am node             1  out of             8  nodes.
start is             0
end is      12499999
I am node             4  out of             8  nodes.
start is      37499999
end is      49999999
Result from node             8  is    4.0876483237300587E-002
Result from node             5  is    0.1032052706959522     
Result from node             2  is    0.1226971551244773     
Result from node             3  is    0.1186446918315650     
Result from node             7  is    7.2451466712425514E-002
Result from node             6  is    9.0559231928350928E-002
Result from node             1  is    0.1246737119371059     
Result from node             4  is    0.1122902087263801     
Result of integration is    0.7853982201935574     
Estimate of Pi is     3.141592880774230     
Warning: ieee_inexact is signaling
FORTRAN STOP
Warning: ieee_inexact is signaling
FORTRAN STOP
Warning: ieee_inexact is signaling
FORTRAN STOP
Warning: ieee_inexact is signaling
FORTRAN STOP
Warning: ieee_inexact is signaling
FORTRAN STOP
Warning: ieee_inexact is signaling
FORTRAN STOP
Warning: ieee_inexact is signaling
FORTRAN STOP
Warning: ieee_inexact is signaling
FORTRAN STOP

OpenMP

Enter the following OpenMP program in C and save it in the file hello.c:

/* hello.c :  openmp program in c printing a message from each thread */
#include <stdio.h>
#include <omp.h>
int main()
{
    int nthreads, tid;
    #pragma omp parallel private(nthreads, tid)
    {
        nthreads = omp_get_num_threads();
        tid = omp_get_thread_num();
        printf("From thread %d out of %d, hello, world\n", tid, nthreads);
    }
    return 0;
}

In these examples, OpenMP programs are compiled with the pgi compiler. To use it, the pgi module must be loaded:

$ module load pgi/2011

To compile, enter the command (note the -mp flag):

$ pgcc -mp -o hello hello.c

Also here you should add optimization flags such as -fast as appropriate.
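
As an alternative to pgi, gcc can also build OpenMP programs; support for the directives is enabled with the -fopenmp flag:

$ gcc -fopenmp -o hello hello.c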

To run the openmp program hello using the batch system, enter the following shell script in the file hello.sh:

#!/bin/bash -l
# hello.sh :  execute parallel openmp program hello on slurm
# use openmp
# command: $ sbatch hello.sh
# slurm options use the sentinel #SBATCH
#SBATCH -J openmptest
#SBATCH -A your_project_name
#SBATCH -o hello.out
#
# request 5 seconds of run time
#SBATCH -t 00:00:05
#SBATCH -p node -n 8
uname -n
#Tell the openmp program to use 8 threads
export OMP_NUM_THREADS=8
module load pgi/2011 
ulimit -s  $STACKLIMIT
./hello

The last line in the script is the command used to start the program.
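
The number of threads is controlled by the environment variable OMP_NUM_THREADS, which the script sets to 8. For a quick interactive test on a login node you can set it in your shell before starting the program:

$ export OMP_NUM_THREADS=4
$ ./hello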

Submit the job to the batch queue:

$ sbatch hello.sh

The program's output to stdout is saved in the file specified with the -o flag.
A test run of the above program yields the following output file:

$ cat hello.out
q33.uppmax.uu.se
unlimited
From thread 0 out of 8, hello, world
From thread 1 out of 8, hello, world
From thread 2 out of 8, hello, world
From thread 3 out of 8, hello, world
From thread 4 out of 8, hello, world
From thread 6 out of 8, hello, world
From thread 7 out of 8, hello, world
From thread 5 out of 8, hello, world

Pthreads

Enter the following C program and save it in the file hello.c:

/* hello.c :  create system pthreads and print a message from each thread */
#include <stdio.h>
#include <pthread.h>
#define NTHR 8
int nt = NTHR, tid[NTHR];
pthread_attr_t attr;
void *hello(void *id)
{
    printf("From thread %d out of %d: hello, world\n", *((int *) id), nt);
    pthread_exit(0);
}
int main()
{
    int i;
    pthread_t thread[NTHR];
    /* use system threads */
    pthread_attr_init(&attr);
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    /* create threads */
    for (i = 0; i < nt; i++) {
        tid[i] = i;
        pthread_create(&thread[i], &attr, hello, (void *) &tid[i]);
    }
    /* wait for threads to complete */
    for (i = 0; i < nt; i++)
        pthread_join(thread[i], NULL);
    return 0;
}

To compile, enter the commands

$ module load gcc/4.5.0
$ gcc -pthread -o hello hello.c

To run the pthread program hello using the batch system, enter the following shell script in the file hello.sh:

#!/bin/bash -l
# hello.sh :  execute parallel pthreaded program hello on slurm
# command: $ sbatch hello.sh
# slurm options use the sentinel #SBATCH
#SBATCH -J pthreadtest
#SBATCH -A your_project_name
#SBATCH -o hello.out
#
# request 5 seconds of run time
#SBATCH -t 00:00:05
# request a full node to ensure that all
# threads run on processors on the same node
#SBATCH -p node -n 8
uname -n
./hello

The last line in the script is the command used to start the program.
Submit the job to the batch queue:

$ sbatch hello.sh

The program's output to stdout is saved in the file specified with the -o flag.
A test run of the above program yields the following output file:

$ cat hello.out
q33.uppmax.uu.se
From thread 0 out of 8: hello, world
From thread 4 out of 8: hello, world
From thread 5 out of 8: hello, world
From thread 6 out of 8: hello, world
From thread 7 out of 8: hello, world
From thread 1 out of 8: hello, world
From thread 2 out of 8: hello, world
From thread 3 out of 8: hello, world