Okay, you’ve used top or ps to get the process ID, and strace hasn’t told you anything useful. What next?
The next step is to get a stack trace with gdb. A stack trace tells you not only what the program is actually doing right now at a low level (waiting on a network socket, say), but sometimes also higher-level information (what sort of network read it was doing).
Knowing how to use gdb to get stack traces is also handy if you ever need to file a bug on a program that crashes or hangs.
Like strace, gdb takes -p followed by the process ID. Once it has attached, you’ll get a (gdb) prompt. Type where to get a stack trace.
    #0 0x01ad9794 in gfxPangoFontGroup::GetFontAt (this=0xa74e8160, i=0) at gfxPangoFonts.cpp:1936
    #1 0x01ad1c11 in GetFontOrGroup (this=0xa51466b4, aKey=0xbfab1e2c) at gfxTextRunWordCache.cpp:899
    #2 TextRunWordCache::CacheHashEntry::KeyEquals (this=0xa51466b4, aKey=0xbfab1e2c) at gfxTextRunWordCache.cpp:910
    #3 0x01a5cb74 in SearchTable (table=0xb45ce2d0, key=, keyHash=, op=PL_DHASH_ADD) at pldhash.c:472
    #4 0x01a5cc50 in PL_DHashTableOperate (table=0xb45ce2d0, key=0xbfab1e2c, op=) at pldhash.c:661
    #5 0x01ad2421 in nsTHashtable::PutEntry (this=0xb45ce2c0, aTextRun=0xa7ee0ae0, aFirstFont=0xad613d30, aStart=8, aEnd=10, aHash=821, aDeferredWords=0x0) at ../../../dist/include/nsTHashtable.h:188
    #6 TextRunWordCache::LookupWord (this=0xb45ce2c0, aTextRun=0xa7ee0ae0, aFirstFont=0xad613d30, aStart=8, aEnd=10, aHash=821, aDeferredWords=0x0) at gfxTextRunWordCache.cpp:358
    ... etc.
You don’t need to be familiar with Firefox source code to see that it’s doing something with fonts, including something called LookupWord.
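If you’d rather grab a trace without sitting at the (gdb) prompt, gdb can also run non-interactively. Here’s a minimal Python sketch, assuming gdb is on your PATH and you’re allowed to attach to the process (some systems require root for that). The function names gdb_backtrace_command and get_backtrace are made up for this example; the -batch and -ex options are standard gdb flags.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to pid, runs 'where', and exits.

    -p PID    attach to the running process
    -batch    exit after running the given commands (gdb detaches on exit,
              so the program carries on afterwards)
    -ex CMD   run CMD as if it had been typed at the (gdb) prompt
    """
    return ["gdb", "-p", str(pid), "-batch", "-ex", "where"]


def get_backtrace(pid):
    """Run gdb once against pid and return whatever it printed on stdout.

    Hypothetical helper: it does no error handling beyond returning the
    output, so check the text yourself if gdb could not attach.
    """
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling print(get_backtrace(pid)) with the process ID you got from top or ps should print the same sort of trace as above, along with gdb’s usual startup chatter.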
If a program is looping, it might not be doing the same thing all the time. When you run gdb -p, it stops the program so you can examine it. But you can continue it by typing c at the prompt. Ctrl-C stops it again, then another where prints another stack trace.
    (gdb) where
    #0 0xb686db07 in ?? () from /usr/lib/firefox-3.6.12/libmozjs.so
    #1 0xb684bec9 in ?? () from /usr/lib/firefox-3.6.12/libmozjs.so
    #2 0xb685cf66 in js_Invoke () from /usr/lib/firefox-3.6.12/libmozjs.so
    #3 0xb6b6231b in ?? () from /usr/lib/firefox-3.6.12/libxul.so
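When you’ve collected several traces from a looping program, it can help to boil each one down to the first frame that has a real name and count how often each shows up. The sketch below does that for trace text you’ve already captured, however you got it. The regular expression is my own guess at gdb’s frame format based on the output shown here, not an official grammar, and the helper names are invented for the example.

```python
import re
from collections import Counter

# Matches the start of a gdb frame line such as
#   "#2 0xb685cf66 in js_Invoke () from /usr/lib/..."
#   "#2 TextRunWordCache::CacheHashEntry::KeyEquals (this=...) at ..."
# The address part is optional because gdb omits it for some frames.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(.+?) \(")


def parse_frames(trace):
    """Return a list of (frame_number, function_name) pairs from trace text."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames


def first_named_function(trace):
    """Return the innermost function gdb could name, skipping '??' frames."""
    for _, name in parse_frames(trace):
        if name != "??":
            return name
    return None


def tally(traces):
    """Count how often each first named function appears across traces."""
    counts = Counter()
    for trace in traces:
        name = first_named_function(trace)
        if name is not None:
            counts[name] += 1
    return counts
```

Feed tally a list of trace strings and look at most_common() on the result. If one function dominates, that’s a reasonable place to start digging, though a handful of samples is only a rough hint.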
Stack traces can also be handy for programs that are hanging waiting for a resource. Here’s that Python network app I used earlier:
    (gdb) where
    #0 0x006a2422 in __kernel_vsyscall ()
    #1 0x0095d241 in recv () at ../sysdeps/unix/sysv/linux/i386/socket.S:61
    #2 0x081301ba in ?? ()
    #3 0x081303b4 in ?? ()
    #4 0x080e0a21 in PyEval_EvalFrameEx ()
    #5 0x080e2807 in PyEval_EvalCodeEx ()
    #6 0x080e0c8b in PyEval_EvalFrameEx ()
    ... etc.
gdb shows the same thing strace did: it’s in recv. The rest just tells you you’re running inside Python, but not where you are in the Python script. How do you find out more?
Stay tuned for the next installment, which will cover techniques for debugging Python, and what to do if you don’t have fancy developer tools like gdb installed on the problem machine.