READER BEWARE: This got long and geeky, so make sure you're really trying to avoid doing "real work" before you sit down to read.
Welcome back all, and thanks for joining me for the final installment of a three-part series on decompiling Java code and analyzing stack traces. If you're interested in the back story, you can read about decompiling Java here, and analyzing basic stack traces here.
When we last left off, our hero was frozen in a block of Carbonite and left to his doom. Err...sorry, wrong story, let me start again.
When we last left off, we walked through analyzing a simple stack trace to run down a bug in a standalone Java program. This is all fine and good, but what happens when you're running a Java application server (like Tomcat or WebLogic) that has hundreds of concurrent threads running tens of different web apps, and you're just trying to figure out why your particular web application is hung?
Well, if you're lucky, the developer of your web application was a good boy/girl, and they're logging stack traces for you in a log file somewhere. If this is the case, then you can open up the log and analyze the trace like we did the last go-round.
Oftentimes, though, you're not so lucky, and you have to dig a little deeper to figure out what's going on. Say, for instance, your web application just starts responding very slowly. You see from access logs that responses are being served back to users, but they're about 5 times slower than normal...WTF? Or, what if your application server starts pegging the box at 100% CPU...what to do? Or, out of nowhere, your app server starts throwing java.lang.OutOfMemoryError...Gah!!! Sadly, these nebulous problems seem to happen in production more often than most of us would like to admit. And when they do occur, it's usually a high-stress situation, because there's probably a production outage and nobody really knows why. So, how do we find a better fix than the traditional "Let's just bounce it and see what happens" response? Why, we use "The Force", of course. Except in this case, "The Force" is just a set of debugging tips that I'm about to share with you:
- Take a deep breath and don't freak out when a bunch of people start yelling.
- Understand the severity of the situation. Figure out how much downtime you can tolerate for debugging before things really hit the fan.
- Set expectations. Let people know when they can expect the issue to be fixed. If you know a bounce will temporarily fix the problem, then set a drop-dead time for debugging and schedule a bounce. Let folks know that if you don't find the root cause, they should expect to see the issue again.
- Gather as much information about the problem as you can. What, specifically, are users experiencing? Can you reproduce it? What are the symptoms? What log messages do you have? What does the server environment look like? Etc.
- Try to relate this issue to something you've seen in the past. Does this look like a problem you saw last week or last month? What did you do to fix it then? Why is it popping up again now?
- Eliminate possible external causes. Is this actually a network problem in disguise? Is the database acting up? Is some other process on the server eating up CPU? Is the server constantly swapping because it doesn't have enough RAM to handle all its business?
- Eliminate environment changes as possible causes. Has the server environment changed recently? Could this be causing your problem?
- Look in earnest at your application server process. Are we bumping up against the maximum Java heap size (-Xmx)? Are we seeing lots of garbage collection? Is our thread pool exhausted? Is our JDBC connection pool maxed out?
- Take a look inside the application server JVM. Generate a thread dump to get a snapshot of what every thread in your application server is doing. Analyze the stack traces in the thread dump and look for points of interest, like race conditions, stuck threads, deadlocks, etc.
- Open a ticket with the software vendor (if applicable).
- Pray the issue fixes itself and doesn't come back.
- Find a new job so you don't have to deal with these problems anymore.
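For the heap and garbage collection questions in the checklist above, the standard java.lang.management API can put numbers behind the hunch. What follows is a minimal sketch, not a production monitor: it reports heap usage and per-collector GC counts for the JVM it runs in, so to inspect your app server you'd need to run equivalent code inside it (say, from a diagnostic page) or attach a JMX client such as jconsole. The class name HeapCheck and the 90% warning threshold are my own assumptions, not anything your app server defines.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Minimal sketch: report heap usage and GC activity for the JVM this code runs in.
public class HeapCheck {

    // Assumption: 90% of max heap is an arbitrary "we're in trouble" threshold.
    static final double WARN_PERCENT = 90.0;

    // Returns used heap as a percentage of max, or -1 if max heap is undefined.
    static double heapUsedPercent(long used, long max) {
        if (max <= 0) {
            return -1.0; // MemoryUsage.getMax() returns -1 when no limit is defined
        }
        return 100.0 * used / max;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        double pct = heapUsedPercent(heap.getUsed(), heap.getMax());
        System.out.printf("Heap: %d MB used of %d MB max (%.1f%%)%n",
                heap.getUsed() / (1024 * 1024), heap.getMax() / (1024 * 1024), pct);
        if (pct >= WARN_PERCENT) {
            System.out.println("WARNING: heap is close to -Xmx; expect heavy GC or OutOfMemoryError");
        }
        // Counts and times are cumulative since JVM start; sample twice to get a rate.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("GC %-25s collections=%d totalTimeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Because getCollectionTime() is cumulative, a single reading tells you little. Take two readings a minute or so apart: if collection time grew by most of that minute, the JVM is busy collecting garbage instead of serving requests, which lines up nicely with the "5 times slower" and "100% CPU" symptoms.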
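For the thread dump step, the usual routes are the JDK's jstack tool (jstack <pid>) or, on Unix, kill -3 <pid>, which makes the JVM print a dump to its standard output (for Tomcat, that usually lands in catalina.out). The same information is also available from inside the JVM through ThreadMXBean. Here's a sketch, with hypothetical class and thread names, that prints a jstack-style dump and asks the JVM which threads are deadlocked; its main method deliberately deadlocks two daemon threads so there is something to find.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Sketch: in-process thread dump plus deadlock detection via ThreadMXBean.
public class ThreadDumpSketch {

    // Builds a jstack-like text dump of every live thread in this JVM.
    static String dumpThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        ThreadInfo[] infos = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),  // include locked monitors if supported
                threads.isSynchronizerUsageSupported());  // include j.u.c. locks if supported
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append("\" ").append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" on ").append(info.getLockName()); // lock it is blocked/waiting on
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    // Returns the names of deadlocked threads; an empty list means none were found.
    static List<String> deadlockedThreadNames() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.isSynchronizerUsageSupported()
                ? threads.findDeadlockedThreads()          // monitors + ownable synchronizers
                : threads.findMonitorDeadlockedThreads();  // monitors only
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names; // null means no deadlock
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) { // null if the thread has since died
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    // Demo only: two daemon threads grab the same two locks in opposite order.
    public static void main(String[] args) throws InterruptedException {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startLocker("worker-1", lockA, lockB, bothHoldFirstLock);
        startLocker("worker-2", lockB, lockA, bothHoldFirstLock);
        Thread.sleep(500); // give both workers time to block on their second lock
        System.out.print(dumpThreads());
        System.out.println("Deadlocked: " + deadlockedThreadNames());
    }

    private static void startLocker(String name, Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await(); // wait until the other worker holds its first lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) { // blocks forever: the other worker owns this lock
                    System.out.println(name + " acquired both locks");
                }
            }
        }, name);
        t.setDaemon(true); // don't keep the JVM alive after main exits
        t.start();
    }
}
```

Keep in mind that findDeadlockedThreads() only reports true lock cycles. Stuck threads and heavy contention usually show up differently: the same threads sitting BLOCKED on the same lock across several dumps taken a few seconds apart, which is why it pays to grab more than one dump before you bounce anything.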
Read on for more fascinating details about The Stack Trace...



