Debugging

Started by Shadow, February 22, 2011, 11:19:10 AM


Shadow

So... I have a rather large code project I am working on. This is my debugging story.

The code is parallelized using MPI so that it can run on several hundred CPUs at once. This weekend a very weird bug appeared: the code got stuck on a specific function call, the output routine. This is exceptionally weird, because I did not write that part of the code, nor had I even looked at it before today.

So I started adding print statements throughout the function to narrow down where it was getting stuck. It calls other functions, so once I had narrowed it down by the same method, I got to another level of function calls. I kept recursing down until I reached the bottom level: there is only one function call inside this function, and it is an MPI library function. So I wrapped it in prints to see if it completes.
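For anyone who wants to try the same trick, here is a rough sketch of the "wrap it in prints" step. It is in Python rather than whatever the real project uses, and every name in it (`traced`, `fake_collective`, `write_output`, the `rank` argument) is made up for illustration; `fake_collective` just stands in for the one MPI library call. The only point it tries to make is that each marker is flushed immediately, so a hang cannot hide the "enter" line in an output buffer. Note that extra prints also change the timing of the run, which may be part of why a hang can seem to vanish once they are added.

```python
import sys
import time
from contextlib import contextmanager


@contextmanager
def traced(label, rank=0, stream=sys.stderr):
    """Print a flushed marker before and after the wrapped call."""
    start = time.monotonic()
    # flush=True so the marker is written out even if the next call never returns
    print(f"[rank {rank}] enter {label}", file=stream, flush=True)
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        print(f"[rank {rank}] leave {label} ({elapsed:.3f}s)",
              file=stream, flush=True)


def fake_collective(data):
    # Hypothetical stand-in for the single MPI library call at the bottom level.
    return sum(data)


def write_output(data, rank=0, stream=sys.stderr):
    # If the wrapped call hangs, the log shows "enter" with no matching "leave".
    with traced("fake_collective", rank, stream):
        return fake_collective(data)
```

With several hundred processes, tagging each line with the rank makes it possible to see which ones entered the call and never left it.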

Running my code again... and it works. No longer getting stuck immediately.

I have no doubt that it will get stuck again in the future, but it is slow, and now I have to wait for the [darn] thing (for anywhere up to 12 hours!) to get stuck so I can figure out if it really is MPI that is screwing up. And if it is... what can I do about that? It means the cluster I am running on has problems, and I am not a sysadmin. So I will have to wait another week before it gets fixed, at best, wasting time and not getting results.

FML
<=holbs-.. ..-holbs=> <=holbs-..

Krowdon

Did you try raising its allowance? Perhaps $20 a week would be better than $10
Quote from: Ashyra Nightwing
i have work to do and that is why i'm playing rwl, this is how it always works

Ungatt Trunn II

No using abbreviations to get around the word filter. The filter should not have to be used at all, because that language is inappropriate for a forum aimed at a young audience. You have been warned, Shadow.

Ungatt Trunn II
Annoying shadow duties
DIE HIPPIE DIE

Krowdon

*gasp* You have a point. But...
Quote from: Ashyra Nightwing
i have work to do and that is why i'm playing rwl, this is how it always works