
Error Messages

Broken Pipe - When one of the processors running your program crashes or is rebooted, you may get an error telling you that one of your MPI pipes has broken. When your program crashes, first check that the problem is not with the machines you were using (try to ping or ssh to them). Then you can start to work out whether your code is causing the problem.

Many messages - When an MPI job crashes, you typically get more than one line of error messages. The FIRST line is the most important and contains the clue to your actual problem. The rest of the messages are usually the system's attempt to clean up the rest of the processes that have been left hanging!

Uninitialized variables - Another potential source of errors is uninitialized variables. MPI_Init in the main part of your program appears to set uninitialized variables to zero; however, uninitialized variables in subroutines appear to get the usual compiler treatment; that is, they contain garbage. Beware of subroutines bearing garbage! A clue to this problem is a SIGFPE error message.
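As an illustration of the point above, here is a minimal sketch in C of a subroutine that initializes its own local variables rather than relying on whatever happens to be in memory. The function name mean_of_positive is hypothetical and is not part of MPI. On many platforms an integer division by zero is reported as SIGFPE, so the sketch also guards the division; treat that as an assumption about your system rather than a guarantee.

```c
#include <stddef.h>

/* Hypothetical helper (not part of MPI): integer average of the
   positive entries in values[0..n-1]. */
int mean_of_positive(const int *values, size_t n)
{
    int sum = 0;    /* explicit: a bare "int sum;" would hold garbage   */
    int count = 0;  /* explicit: a bare "int count;" would hold garbage */
    size_t i;

    for (i = 0; i < n; i++) {
        if (values[i] > 0) {
            sum += values[i];
            count++;
        }
    }

    /* Guard the division: dividing by a zero count is the kind of
       operation that is typically reported as SIGFPE. */
    if (count == 0) {
        return 0;
    }
    return sum / count;
}
```

If sum and count were left uninitialized, the result would depend on leftover stack contents and could differ from one run or one processor to the next.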

A reminder of common signals and their explanations:

SIGABRT - Abnormal termination of the program (such as a call to abort).
SIGFPE - An erroneous arithmetic operation, such as a divide-by-zero or an operation resulting in overflow.
SIGILL - Detection of an illegal instruction.
SIGINT - Receipt of an interactive attention signal (^C).
SIGSEGV - An invalid access to storage (a problem with using arrays and pointers).
SIGTERM - A termination request sent to the program.
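
The signal names above are defined in the standard C header signal.h. As a rough sketch, the table can be turned into a small lookup routine for use in your own diagnostics. The function names describe_signal and report_signal are hypothetical helpers, not part of MPI or of the C library.

```c
#include <signal.h>
#include <stdio.h>

/* Hypothetical helper: return a short explanation for the signals
   listed above, or a fallback string for any other signal number. */
const char *describe_signal(int sig)
{
    switch (sig) {
    case SIGABRT: return "abnormal termination (such as a call to abort)";
    case SIGFPE:  return "erroneous arithmetic operation (divide-by-zero or overflow)";
    case SIGILL:  return "illegal instruction";
    case SIGINT:  return "interactive attention signal (^C)";
    case SIGSEGV: return "invalid access to storage (arrays or pointers)";
    case SIGTERM: return "termination request";
    default:      return "signal not in this list";
    }
}

/* Hypothetical helper: print the signal number and its explanation
   to standard error. */
void report_signal(int sig)
{
    fprintf(stderr, "signal %d: %s\n", sig, describe_signal(sig));
}
```

For example, calling report_signal(SIGFPE) prints the signal number used on your system followed by the arithmetic-error explanation.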