Introduction to mpi4py
mpi4py is a Python library built on top of MPI, which makes Python data structures easy to pass across multiple processes.
mpi4py implements many of the interfaces in the MPI standard, including point-to-point communication, collective communication, blocking and non-blocking communication, and inter-group communication. Nearly every MPI interface that is usable from Python has a corresponding implementation. mpi4py can transfer any Python object that can be pickled, and it also has good support for Python objects that expose a single-segment buffer interface, such as numpy arrays and the built-in bytes/string/array types, which it transfers very efficiently. It also provides SWIG and F2PY interfaces, so that C/C++ or Fortran programs wrapped for Python can still use mpi4py objects and interfaces for parallel processing.
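To make the two kinds of objects mentioned above more concrete, the sketch below uses only the Python 3 standard library (not mpi4py itself) to check whether an object can be pickled and whether it exposes a contiguous buffer. The helper name describe and the sample objects are illustrative assumptions, and the sketch does not show how mpi4py chooses its transfer path internally.

```python
import pickle
from array import array


def describe(obj):
    """Return (picklable, has_contiguous_buffer) for obj."""
    try:
        pickle.dumps(obj)
        picklable = True
    except Exception:
        # pickle raises different exception types depending on the object
        picklable = False
    try:
        # memoryview() only accepts objects that implement the buffer interface
        has_buffer = memoryview(obj).contiguous
    except TypeError:
        has_buffer = False
    return picklable, has_buffer


# Hypothetical sample objects, chosen only for illustration
samples = {
    "dict": {"a": 1, "b": [2, 3]},
    "bytes": b"raw data",
    "array": array("d", [1.0, 2.0, 3.0]),
    "lambda": lambda x: x,
}
results = {name: describe(obj) for name, obj in samples.items()}
for name, (picklable, has_buffer) in results.items():
    print("%-8s picklable=%s buffer=%s" % (name, picklable, has_buffer))
```

In this sketch the dict can only travel as a pickled object, bytes and array qualify for both routes, and the lambda qualifies for neither.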
Dependencies
To properly install and use mpi4py, you need to install and set up the following software:
- An MPI implementation, preferably one that supports the MPI-3 standard and is built with shared (dynamic) libraries. Commonly used MPI implementations include Open MPI and MPICH.
- Python 2.7 or Python 3.3+. Python is required because the parallel programs themselves are written in it.
Install mpi4py
Once the dependencies above are installed, you can install mpi4py. Before doing so, make sure that mpicc is on your program search path. You can check this with the following command:
$ which mpicc
If the command prints the path to mpicc, you can proceed with the installation steps below. If there is no output, add the bin directory of the installed MPI software to the PATH environment variable and its lib directory to the LD_LIBRARY_PATH environment variable, so that mpicc can be found.
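The same check can be done from Python, which is handy inside a setup script. This is a minimal sketch using shutil.which from the standard library; the function name find_mpicc is a hypothetical helper, not part of mpi4py.

```python
import os
import shutil


def find_mpicc():
    """Return the full path to mpicc, or None if it is not on PATH."""
    return shutil.which("mpicc")


mpicc_path = find_mpicc()
if mpicc_path:
    print("mpicc found at %s" % mpicc_path)
else:
    print("mpicc not found; add the MPI bin directory to PATH")
    # Listing the current PATH entries helps spot a missing MPI directory
    for entry in os.environ.get("PATH", "").split(os.pathsep):
        print("  " + entry)
```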
Install with pip
Installing mpi4py with pip is straightforward. If you have root privileges, type the following command in the terminal:
$ pip install mpi4py
If you don't have root privileges, you can install mpi4py into your $HOME directory for your own use only:
$ pip install mpi4py --user
Make sure that the path for installed executables, ~/.local/bin, is added to the PATH environment variable, and that the library path ~/.local/lib is added to the LD_LIBRARY_PATH environment variable.
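Whether the per-user bin directory is already on PATH can be checked with a short script. The sketch below is an assumption-laden illustration: it relies on site.getuserbase() from the standard library, which typically returns ~/.local on Linux, and the helper name user_bin_on_path is hypothetical.

```python
import os
import site


def user_bin_on_path():
    """Return (bin_dir, on_path) for the per-user install location."""
    # On Linux the user base is typically ~/.local, so this is ~/.local/bin
    bin_dir = os.path.join(site.getuserbase(), "bin")
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    on_path = any(
        os.path.abspath(d) == os.path.abspath(bin_dir) for d in path_dirs if d
    )
    return bin_dir, on_path


bin_dir, on_path = user_bin_on_path()
if on_path:
    print("%s is already on PATH" % bin_dir)
else:
    print("%s is not on PATH; add it in your shell startup file" % bin_dir)
```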
Install from source
Download the mpi4py source package from https://pypi.python.org/pypi/mpi4py, and then unpack it:
$ tar -xvzf mpi4py-X.Y.Z.tar.gz
$ cd mpi4py-X.Y.Z
Compile the package:
$ python setup.py build
After compiling, you can install it:
$ python setup.py install
or, if you don't have root privileges:
$ python setup.py install --user
Test if mpi4py is installed correctly
Now you can write a simple program to test whether mpi4py is installed and works correctly:
# mpi_helloworld.py
from mpi4py import MPI
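comm = MPI.COMM_WORLD  # communicator containing all launched processes
size = comm.Get_size()  # total number of processes (not used in the message below)
rank = comm.Get_rank()  # index of this process, from 0 to size - 1
node_name = MPI.Get_processor_name()  # name of the node this process runs on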
print('Hello world from process %d at %s.' % (rank, node_name))
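Before running the program under mpiexec, it can help to confirm that the Python interpreter can see the package at all. The following is a minimal sketch rather than an official check: the helper name check_mpi4py is hypothetical, and it relies only on importlib from the Python 3 standard library and on the __version__ attribute of the mpi4py package.

```python
import importlib.util


def check_mpi4py():
    """Return a short status string describing the mpi4py installation."""
    if importlib.util.find_spec("mpi4py") is None:
        return "mpi4py is not installed for this Python interpreter"
    # Importing the top-level package does not initialize MPI;
    # that only happens when mpi4py.MPI is imported.
    import mpi4py
    return "mpi4py %s is installed" % mpi4py.__version__


status = check_mpi4py()
print(status)
```

If this reports that mpi4py is missing, the interpreter being run is probably not the one that pip installed into.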
Run the mpi4py program
Run an MPI program written in Python with the following command:
$ mpiexec -n 3 python mpi_helloworld.py
Hello world from process 2 at node1.
Hello world from process 0 at node1.
Hello world from process 1 at node1.
You can also use the older mpirun command:
$ mpirun -np 3 python mpi_helloworld.py
Hello world from process 2 at node1.
Hello world from process 0 at node1.
Hello world from process 1 at node1.
Here, -n or -np specifies how many MPI processes are used to execute the program. The output lines can appear in any order, because the processes run concurrently.
The above command launches 3 MPI processes on a single node (single machine) to execute mpi_helloworld.py in parallel. To execute the program in parallel on multiple nodes (multiple machines), use the following command:
$ mpiexec -n 3 -host node1,node2,node3 python mpi_helloworld.py
Hello world from process 1 at node2.
Hello world from process 2 at node3.
Hello world from process 0 at node1.
Here, -host (or -H) specifies the nodes to use, separated by commas. If there are many nodes, you can instead list the compute nodes in a file and pass that file with the -hostfile or -machinefile option. More run options can be listed with the following command:
$ mpiexec --help
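For the host file approach mentioned above, the sketch below writes a file with one hostname per line and assembles the matching mpiexec command. This is an illustration under assumptions: one hostname per line is the simplest layout, the syntax for extras such as per-node process counts differs between MPI implementations, and the helper names write_hostfile and build_command and the file name hosts.txt are hypothetical.

```python
import os
import tempfile


def write_hostfile(path, hosts):
    """Write one hostname per line to the file at path."""
    with open(path, "w") as f:
        for host in hosts:
            f.write(host + "\n")


def build_command(hostfile, nprocs, script):
    """Return the mpiexec command as a list of arguments."""
    return ["mpiexec", "-n", str(nprocs), "-hostfile", hostfile, "python", script]


hosts = ["node1", "node2", "node3"]
# A temporary directory keeps the example from touching the working directory
hostfile = os.path.join(tempfile.mkdtemp(), "hosts.txt")
write_hostfile(hostfile, hosts)
command = build_command(hostfile, len(hosts), "mpi_helloworld.py")
print(" ".join(command))
```

The resulting argument list can be passed to subprocess.run on a machine where mpiexec and the listed nodes are available.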