A Survey of MPI Related Debuggers and Tools
http://en.wikipedia.org/wiki/Message_Passing_Interface
http://www.eecatalog.com/multicore/index.php?page=editorial&micrositeId=25&editorialId=162
www.cs.utah.edu/research/
Message Passing Interface
- MPI is a language-independent communications protocol used to program parallel computers.
- MPI is the most popular message-passing API for "widely" distributed computing
- Computer clusters are often programmed with a so-called hybrid model that combines OpenMP and MPI
1. MPI-CHECK
Operation Logic
- It instruments programs using a macro-like mechanism
- MPI calls are replaced with modified calls that take extra arguments
- The extra arguments carry information such as the line number in the source code
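
The instrumentation idea above can be sketched in a few lines. This is an illustration only, not MPI-CHECK's code: Python has no preprocessor, so a thin wrapper function plays the role of the macro, and the names `send` and `checked_send` as well as the two checks performed are assumptions chosen for the example.

```python
import inspect


def checked_send(buf, count, dest, tag, *, src_file, src_line):
    """Hypothetical instrumented replacement for a send call.

    The two keyword arguments are the "extra arguments" that the
    instrumentation adds so that errors can be tied to a source line.
    """
    if count < 0:
        return f"{src_file}:{src_line}: negative message length {count}"
    if count > len(buf):
        return (f"{src_file}:{src_line}: count {count} "
                f"exceeds buffer size {len(buf)}")
    return None  # nothing wrong; a real tool would now forward the call


def send(buf, count, dest, tag):
    """Stands in for the macro: same signature as the original call,
    but forwards to the checked version with call-site information."""
    caller = inspect.stack()[1]  # frame of the code that called send()
    return checked_send(buf, count, dest, tag,
                        src_file=caller.filename, src_line=caller.lineno)
```

User code keeps calling `send(buf, count, dest, tag)` unchanged; only the wrapper knows about the extra arguments.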
User Scenario
- Phase one: the MPI program is instrumented and then compiled
- Phase two: the instrumented MPI code is executed under the control of the MPI-CHECK server
Features
- Mismatch in argument type, kind, rank or number (some of these checks can be done statically)
- Message buffer bounds that exceed the allocated size
- Potential and real deadlocks, detected by building dependency graphs from the calls made for point-to-point or collective communication
- Negative message lengths
- MPI calls before MPI_Init or after MPI_Finalize
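
The source does not say how the dependency graphs are analyzed. A common approach, assumed here, is to look for a cycle in a wait-for graph, where an edge from rank p to rank q means that p is blocked until q acts. The function name and the dictionary layout below are hypothetical.

```python
def find_deadlock_cycle(waits_for):
    """Return a list of ranks forming a cycle in the wait-for graph, or None.

    waits_for maps a rank to the list of ranks it is currently blocked on.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    colour = {}
    path = []                      # ranks on the current depth-first path

    def visit(rank):
        colour[rank] = GREY
        path.append(rank)
        for other in waits_for.get(rank, []):
            state = colour.get(other, WHITE)
            if state == GREY:
                # 'other' is already on the path: the path closes a cycle
                return path[path.index(other):]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        colour[rank] = BLACK
        return None

    for rank in waits_for:
        if colour.get(rank, WHITE) == WHITE:
            cycle = visit(rank)
            if cycle:
                return cycle
    return None
```

For example, `find_deadlock_cycle({0: [1], 1: [0]})` reports ranks 0 and 1, the classic case of two processes that each post a blocking receive before their send. Wildcard receives and collective operations need a richer graph than this sketch models.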
Disadvantage
- Significant overhead from instrumenting the user code and building the Program Database (PDB)
2. MARMOT
Features
- More than one call to MPI_Init in an application
- Any pending messages or active requests in any communicator at the time of MPI_Finalize
- Possible deadlocks, detected using a time-out mechanism
- Active requests that have not been freed at MPI_Finalize (reported as warnings)
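
Two of the checks above amount to simple bookkeeping on intercepted calls. The sketch below shows that bookkeeping under stated assumptions: the class `CallTracker`, its method names and the warning texts are hypothetical and are not MARMOT's interface.

```python
class CallTracker:
    """Records intercepted calls and collects warnings about misuse."""

    def __init__(self):
        self.init_calls = 0
        self.active_requests = set()
        self.warnings = []

    def on_init(self):
        # Called whenever the application calls MPI_Init.
        self.init_calls += 1
        if self.init_calls > 1:
            self.warnings.append("MPI_Init called more than once")

    def on_request_start(self, request_id):
        # A non-blocking operation created a request.
        self.active_requests.add(request_id)

    def on_request_done(self, request_id):
        # The request was completed or freed.
        self.active_requests.discard(request_id)

    def on_finalize(self):
        # Anything still in the set was never completed or freed.
        for request_id in sorted(self.active_requests):
            self.warnings.append(
                f"request {request_id} still active at MPI_Finalize")
        return list(self.warnings)
```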
3. Intel Message Checker (IMC)
Operation Logic
- Collects information about each MPI call in a trace file, using the library libVTmc.so
- A checking engine analyzes this trace file after execution
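
A post-mortem check of this kind can be sketched as follows. The record layout, the function names and the use of CRC-32 are assumptions made for illustration; the actual trace format written through libVTmc.so is not described in the source. The sketch also assumes the trace is ordered so that each send appears before its receive.

```python
import zlib
from collections import defaultdict, deque


def trace_event(kind, src, dst, tag, payload):
    """Build one trace record; kind is "send" or "recv", payload is bytes."""
    return {"kind": kind, "src": src, "dst": dst, "tag": tag,
            "crc": zlib.crc32(payload)}


def check_trace(events):
    """Pair sends with receives per (src, dst, tag) in order; return findings."""
    pending = defaultdict(deque)   # sends that have not been matched yet
    findings = []
    for ev in events:
        key = (ev["src"], ev["dst"], ev["tag"])
        if ev["kind"] == "send":
            pending[key].append(ev)
        elif pending[key]:
            sent = pending[key].popleft()
            if sent["crc"] != ev["crc"]:
                findings.append(f"checksum mismatch on {key}")
        else:
            findings.append(f"receive without matching send on {key}")
    for key, queue in pending.items():
        for _ in queue:
            findings.append(f"send never received on {key}")
    return findings
```

Because the analysis runs on the recorded trace, it adds no checking cost during the run itself beyond writing the records.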
Features
- Mismatch of send and receive calls caused by incorrect specification of message sender or receiver
- Potential or real deadlocks. Potential deadlocks are identified by a time-out mechanism; the wait time is configurable.
- Mismatch between the checksums of sent and received messages
- Infinite loop or abnormal program termination in an MPI function call
- Memory leaks that occur when a communicator is freed
- Requests left unfreed at the time of MPI_Finalize (printed as a list)
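
The time-out idea can be illustrated with a small helper. A call that has been blocked longer than the configured wait time is only a potential deadlock, since a slow peer looks the same as a stuck one. The function name, the data layout and the 60-second default below are assumptions, not values taken from the tool.

```python
def find_stalled_calls(blocked_since, now, timeout_s=60.0):
    """Return (rank, call name, seconds waited) for calls past the time-out.

    blocked_since maps rank -> (call name, time at which the call was entered).
    """
    stalled = []
    for rank, (call, entered) in sorted(blocked_since.items()):
        waited = now - entered
        if waited > timeout_s:
            stalled.append((rank, call, waited))
    return stalled
```

A short wait time reports problems sooner but produces more false alarms on long-running communication, which is presumably why the wait time is left configurable.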
4. TotalView
It is designed especially for complex multi-process or multi-threaded applications
User Scenario
- Users select a command: Go, Halt, Step, Kill, Next, etc.
- Users then choose the scope of the chosen command: Group, Process or Thread
Scoping
- Group Scoping: Executes the chosen command on all the processes that belong to the group
- Process Scoping: Executes the chosen command on a single selected process. If the process has several threads, the command affects all the threads owned by the process
- Thread Scoping: Executes the chosen command on a single specified thread of a multi-threaded process
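
The three scopes can be read as a rule for expanding a target into the set of threads a command applies to. The sketch below expresses that rule; it is not TotalView's API, and the function name and the nested-dictionary layout are hypothetical.

```python
def threads_in_scope(groups, scope, target):
    """Return the (process id, thread id) pairs a command would apply to.

    groups maps group name -> {process id -> [thread ids]}.
    target is a group name, a process id, or a (process id, thread id) pair,
    depending on scope.
    """
    if scope == "group":
        procs = groups[target]
        return [(pid, tid) for pid in sorted(procs) for tid in procs[pid]]
    if scope == "process":
        for procs in groups.values():
            if target in procs:
                # every thread owned by the process is affected
                return [(target, tid) for tid in procs[target]]
        raise KeyError(target)
    if scope == "thread":
        pid, tid = target
        for procs in groups.values():
            if tid in procs.get(pid, []):
                return [(pid, tid)]
        raise KeyError(target)
    raise ValueError(f"unknown scope: {scope}")
```

With `groups = {"workers": {0: [1, 2], 1: [1]}}`, a Step at group scope covers all three threads, at process scope for process 0 it covers that process's two threads, and at thread scope it covers exactly one.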