Introduction to mpi4py

mpi4py is a Python library built on top of MPI that makes it easy to pass Python data structures between multiple processes.

mpi4py is a powerful library that implements many of the interfaces in the MPI standard, including point-to-point communication, collective communication, blocking and non-blocking communication, and communication between groups (intercommunicators). Nearly every commonly used MPI interface has a counterpart in mpi4py. Besides any Python object that can be pickled, mpi4py also supports Python objects that expose a single-segment buffer interface, such as numpy arrays and the built-in bytes/string/array types, and transfers them very efficiently. It also provides SWIG and F2PY interfaces, so that C/C++ or Fortran code wrapped for Python can still use mpi4py objects and interfaces for parallel processing.

Dependencies

To properly install and use mpi4py, you need to install and set up the following software:

An MPI implementation, preferably one that supports the MPI-3 standard and is built with shared (dynamic) libraries. Commonly used MPI implementations include OpenMPI and MPICH.
Python 2.7 or Python 3.3+, since the parallel programs themselves are written in Python.

Install mpi4py

Once the dependencies above are installed, you can install mpi4py. First make sure that mpicc is on your program search path, which you can check with the following command:

$ which mpicc

If the command prints the path to the mpicc executable, you can proceed with the installation steps below. If it prints nothing, add the bin directory of your MPI installation to the PATH environment variable and its lib directory to the LD_LIBRARY_PATH environment variable so that mpicc can be found.
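
The same check can be scripted from Python with the standard library alone. This is only a sketch: find_mpi_tools is a hypothetical helper written for this illustration, and checking mpiexec alongside mpicc is an added assumption about which tools you will need later.

```python
import shutil


def find_mpi_tools(names=("mpicc", "mpiexec")):
    """Map each tool name to its full path on PATH, or None if not found.

    find_mpi_tools is a hypothetical helper for this illustration only.
    """
    return {name: shutil.which(name) for name in names}


tools = find_mpi_tools()
for name, path in tools.items():
    if path is None:
        print("%s: not found, check the PATH environment variable" % name)
    else:
        print("%s: %s" % (name, path))
```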

Install with pip

Installing mpi4py with pip is easy. If you have root privileges, type the following command in the terminal:

$ pip install mpi4py

If you don't have root privileges, you can install mpi4py into your $HOME directory for your own use only:

$ pip install mpi4py --user

In that case, make sure that the executable path ~/.local/bin is in the PATH environment variable and that the library path ~/.local/lib is in the LD_LIBRARY_PATH environment variable.

Install from source

Download the mpi4py source package from https://pypi.python.org/pypi/mpi4py, then unpack it:

$ tar -xvzf mpi4py-X.Y.Z.tar.gz
$ cd mpi4py-X.Y.Z

Build the package:

$ python setup.py build

After the build finishes, install it. With root privileges:

$ python setup.py install

Without root privileges, install into your $HOME directory instead:

$ python setup.py install --user

Test if mpi4py is installed correctly

Now you can write a simple program to test whether mpi4py is installed and works correctly:

# mpi_helloworld.py

from mpi4py import MPI


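comm = MPI.COMM_WORLD  # communicator containing all launched processes
size = comm.Get_size()  # total number of processes in the communicator
rank = comm.Get_rank()  # rank (ID) of this process, from 0 to size - 1
node_name = MPI.Get_processor_name()  # name of the node this process runs on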

print('Hello world from process %d at %s.' % (rank, node_name))
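
Before launching anything with an MPI launcher, it can also help to confirm from a plain Python session that mpi4py is importable and to see which MPI library it was built against. The following is a rough sketch; mpi4py_available is a hypothetical helper name, and MPI.Get_library_version() assumes an MPI-3 capable implementation.

```python
import importlib.util


def mpi4py_available():
    """Return True if the mpi4py package can be found by the import system.

    mpi4py_available is a hypothetical helper for this illustration only.
    """
    return importlib.util.find_spec("mpi4py") is not None


if mpi4py_available():
    import mpi4py
    from mpi4py import MPI  # importing this module initializes MPI

    print("mpi4py version: %s" % mpi4py.__version__)
    print("MPI standard version: %d.%d" % MPI.Get_version())
    print("MPI library: %s" % MPI.Get_library_version().strip())
else:
    print("mpi4py is not importable; check the installation steps above")
```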

Run the mpi4py program

Run an MPI program written in Python with the following command:

$ mpiexec -n 3 python mpi_helloworld.py
Hello world from process 2 at node1.
Hello world from process 0 at node1.
Hello world from process 1 at node1.

You can also use the older mpirun command:

$ mpirun -np 3 python mpi_helloworld.py
Hello world from process 2 at node1.
Hello world from process 0 at node1.
Hello world from process 1 at node1.

Here, -n (or -np) specifies how many MPI processes to launch to execute the program.

The above command will launch 3 MPI processes on a single node (single machine) to execute mpi_helloworld.py in parallel. If you want to execute the program on multiple nodes (multiple machines) in parallel, you can use the following command:

$ mpiexec -n 3 -host node1,node2,node3 python mpi_helloworld.py
Hello world from process 1 at node2.
Hello world from process 2 at node3.
Hello world from process 0 at node1.

Here, -host (or -H) specifies the nodes to use, separated by commas. If there are many nodes, you can instead list the compute nodes in a file and pass that file with the -hostfile or -machinefile option. More run options are listed by the following command:

$ mpiexec --help
