Recently in Development Category

READER BEWARE: This got long and geeky, so make sure you're really trying to avoid doing "real work" before you sit down to read.

Welcome back all, and thanks for joining me for the final installment of a three-part series on decompiling Java code and analyzing stack traces. If you're interested in the back story, you can read about decompiling Java here, and analyzing basic stack traces here.

When we last left off, our hero was frozen in a block of Carbonite and left to his doom. Err...sorry, wrong story, let me start again.

When we last left off, we walked through analyzing a simple stack trace to run down a bug in a standalone Java program. This is all fine and good, but what happens when you're running a Java application server (like Tomcat or WebLogic) that has hundreds of concurrent threads running tens of different web apps? And you're just trying to figure out why your particular web application is hung?

Well, if you're lucky, the developer of your web application was a good boy/girl, and they're logging stack traces for you in a log file somewhere. If this is the case, then you can open up the log and analyze the trace like we did the last go round.

Oftentimes though, you're not so lucky, and you have to dig a little deeper to figure out what's going on. Say, for instance, your web application just starts responding very slowly. You see from access logs that responses are being served back to users, but they're about 5 times slower than normal...WTF? Or, what if your application server starts pegging the box at 100% CPU...what to do? Or, out of nowhere, your app server starts throwing java.lang.OutOfMemoryError...Gah!!! Sadly, these nebulous problems seem to happen in a production environment more often than most of us would like to admit. And when they do occur, it's usually a high stress situation because there's probably a production outage and nobody really knows why. So, how do we find a better fix to the problem than the traditional "Let's just bounce it and see what happens" response? Why, we use "The Force", of course. Except in this case, "The Force" is just a set of debugging tips that I'm getting ready to share with you as follows:

  1. Take a deep breath and don't freak out when a bunch of people start yelling.
  2. Understand the severity of the situation. Figure out how much downtime you can tolerate for debugging before things really hit the fan.
  3. Set expectations. Let people know when they can expect the issue to be fixed. If you know a bounce will temporarily fix the problem, then set a drop-dead time for debugging and schedule a bounce. Let folks know that if you don't find the root cause of the problem, they should expect to see the issue again.
  4. Gather as much information about the problem as you can. What, specifically, are users experiencing? Can you reproduce? What are the symptoms? What log messages do you have? What does the server environment look like? etc.
  5. Try to relate this issue to something you've seen in the past. Does this look like a problem you saw last week or last month? What did you do to fix it then? Why is it popping up again now?
  6. Eliminate possible external causes. Is this actually a network problem in disguise? Is the database acting up? Is some other process on the server eating up CPU? Is the server constantly swapping because it doesn't have enough RAM to handle all its business?
  7. Eliminate environment changes as possible causes. Has the server environment changed recently? Could this be causing your problem?
  8. Look in earnest at your application server process. Are we bumping up against the Java max. heap size? Are we seeing lots of garbage collection? Is our thread pool exhausted? Is our JDBC pool maxed out?
  9. Take a look inside of the application server JVM. Generate a thread dump to get a snapshot of how your application server is behaving. Analyze the stack traces in the thread dump, and look for points of interest, like...race conditions, stuck threads, deadlock, etc.
  10. Open a ticket with the software vendor (if applicable)
  11. Pray the issue fixes itself and doesn't come back
  12. Find a new job so you don't have to deal with these problems anymore
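
For tips 8 and 9, here's a rough sketch of what "looking inside the JVM" can mean in code. It uses the standard java.lang.management API to count live threads by state and to ask the JVM whether any threads are deadlocked on object monitors. The class and method names (ThreadHealthSketch, countByState, deadlockedThreadCount) are made up for illustration, and it assumes you can run code inside the app server process, which isn't always the case.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not part of any app server: a quick in-process health check.
class ThreadHealthSketch {

    /** Counts live threads by state; a pile of BLOCKED threads hints at lock contention. */
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // false, false: skip locked monitor and synchronizer details to keep the call cheap
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns how many threads the JVM reports as deadlocked on object monitors (0 if none). */
    static int deadlockedThreadCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findMonitorDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.println("Threads by state: " + countByState());
        System.out.println("Deadlocked threads: " + deadlockedThreadCount());
    }
}
```

A count like this only tells you where to look; the stack traces of the BLOCKED or deadlocked threads are what tell you why.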

Read on for more fascinating details about The Stack Trace...

AJAX Refresher

It's been a while since we touched on AJAX, but a question came up recently about it and I thought it might be good to review. AJAX, or "Asynchronous JavaScript and XML", is a way for portlet developers to create rich Web Applications that don't require the entire browser page to refresh to update content. This is done by making asynchronous calls to the server and updating content within the page itself. With the AquaLogic portal, this means that portlets can dynamically update content in <div> tags by requesting new content without having to refresh the entire page (and other portlets on the page). It's a pretty simple concept; in many cases you can accomplish this without even having to change any code - you can just specify "inline refresh" on the Web Service and the portal will automatically rewrite the HTML links on the page to make AJAX calls:

inline_refresh.jpg

The HTML rewrites cause the browser to make the HTTP request "behind the scenes", and when a response comes back, the portal refreshes the content inside the portlet <div> tag.

But there are some things to know about this AJAX stuff, so here are a couple of refresher points about AJAX:

1) The response to an AJAX request is basically just a text string to a browser, and it's up to your JavaScript to interpret it.  Often you do something like:

document.getElementById("responseDivTag").innerHTML = response.getResponse();

... to refresh content. But note that this doesn't tell the browser to "process" the response - specifically, JavaScript that comes back in the response won't run, because all we're doing is setting the HTML to a string that comes back from the server. In order to run JavaScript in the response, you should look into the JavaScript "eval()" function, which takes a string returned from the server and runs it as JavaScript. Just make sure you don't include the <script> tags in your response if you really are returning JavaScript and evaluating it as such.

2) The response does not have to actually be HTML!  It's just a string to the browser, and you can do anything with it.  The most common use (which all of our products use) is to return JSON, or "JavaScript Object Notation", which can then be treated as objects that your script can handle however you want.  Let's say you just want to know if there was a success or failure: you could literally just return a "0" or "1" in your response and write something like:

if (response.getResponseText() === "1") {
   alert("success!");
} else {
   alert("fail");
}

Obviously, this just barely scratches the surface on AJAX, and you can rest assured that you haven't heard the last of it.  AJAX is the cornerstone of pretty much all future Oracle portal technologies, and if you're a web developer who's not all that familiar with it, trust me:  you will be soon.

The Stack Trace Strikes Back

Howdy all. Welcome to part two of three of what was originally conceived as a one part series. It's entirely possible that I'll get all George Lucas on you years from now and produce some more of these posts that are a complete letdown and affront to your childhood memories, but I digress. For now, rest assured that this post will knock your socks off as a follow-up to my last tidbit on decompiling Java code.

Without further ado, I give you...Stack Wars II: The Stacktrace Strikes Back (I'm completely aware that I'm abusing the metaphor here, but isn't that really what blogging is all about?).

Standard disclaimer: This post is intended for a technical audience with a focus on production support. Also, everything here is Java focused, but you can certainly apply some of the concepts in a .NET environment as well...you'll just have to create your own screencaps to replace the examples I've included below.

So, what is a stack trace, and why should you care? Well, one question at a time please.

What is a stack trace?

Wikipedia says a stack trace is "A report of the active stack frames instantiated by the execution of a program." Now, I vaguely understand the Wikipedia definition, but I also have a computer science degree from a second tier state university, so let me try to translate for those of you who were smart enough to get degrees in something besides CompSci: a stack trace is a snapshot of a program's behavior at a point in time. In the Java world, a stack trace will tell you which method was being executed at the time the trace was generated, along with its complete call stack, and usually line numbers as well. Take a look at the following simple stack trace as an example:

stack_trace.png
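
If you'd like a trace of your own to poke at, here's a minimal sketch that produces one. Everything in it (the StackTraceDemo class, loadConfig, parsePort, and the bogus "eighty" input) is invented for illustration. It forces a NumberFormatException two calls deep and captures the same text that printStackTrace() would normally dump to the console.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical demo class: fails on purpose so we have a stack trace to read.
class StackTraceDemo {

    static int parsePort(String raw) {
        // Integer.parseInt throws NumberFormatException for non-numeric input
        return Integer.parseInt(raw);
    }

    static int loadConfig() {
        return parsePort("eighty");
    }

    /** Runs the failing call chain and returns the stack trace as text. */
    static String traceAsText() {
        try {
            loadConfig();
            return "";
        } catch (NumberFormatException e) {
            StringWriter buffer = new StringWriter();
            PrintWriter writer = new PrintWriter(buffer);
            // Writes the exception line, then one "at ..." line per stack frame
            e.printStackTrace(writer);
            writer.flush();
            return buffer.toString();
        }
    }

    public static void main(String[] args) {
        System.out.print(traceAsText());
    }
}
```

Read the result top to bottom: the first line names the exception, and the "at" lines run from the innermost call (inside Integer.parseInt) out through parsePort and loadConfig to main.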

Why should you care?

Good question. I'd venture a guess and say that about 99.99% of the world doesn't need to know or care about stack traces. But here you are reading this post nonetheless, so here's why they're important:

1) Good programmers almost always print stack traces out in log files when an error in a program occurs. This gives us a useful tool to track down bugs. Whether you're just reporting information to a support team somewhere, or getting a little sassy and trying to fix a problem yourself, the stack trace is like a map for finding treasure buried deep in code. Except that instead of finding actual treasure, you're just finding a logic error. And instead of getting rich, you just get to complain about a problem, and maybe fix it.

2) You can tell the JVM to generate a stack trace for a running process. Doing so allows us to take a snapshot of the JVM at an arbitrary point in time, and see what all its threads are up to. This is useful when trying to figure out why a process (Tomcat for instance) is zombied (i.e. it's running, but not responding to requests), or when you're trying to fix deadlock issues, which are particularly difficult to run down.
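
To make point 2 a bit more concrete: you'd normally ask for this from outside the process (the JDK's jstack tool, or sending SIGQUIT with kill -3 on Unix-like systems), but a JVM can also snapshot its own threads. Here's a minimal sketch using Thread.getAllStackTraces(); the AllThreadsSnapshot class name and the output layout are my own, not what the JVM's built-in dump looks like.

```java
import java.util.Map;

// Hypothetical helper: an in-process, simplified version of a thread dump.
class AllThreadsSnapshot {

    /** Returns the name, state, and current call stack of every live thread. */
    static String snapshot() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append("\" ")
               .append(thread.getState()).append('\n');
            // Frames are ordered innermost call first, like an exception's stack trace
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

Even in a tiny program you'll see more than your own thread here, since the JVM runs a few housekeeping threads of its own.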

Hit the jump to learn more about reading and interpreting stack traces!

Yeah, this one is only marginally useful, can only be considered a "Cool Tool" in the broadest sense of the title, and not nearly as fun of an Easter Egg as we've posted in the past, but it's neat to check out JavaScript internals of the portal nonetheless.

Basically, open up your portal in Firefox (this doesn't seem to work in IE), and hit CTRL-SHIFT-3. That is, hold down CTRL and SHIFT and hit 1, 2, or 3. You'll get a little JS window at the bottom of your portal page showing some JS debug information that you can use to see what's going on under the covers and evaluate ALI portal JS objects in real time.

The "1, 2, or 3" that I mentioned above refers to the log threshold you'd like to look at (use "0" to turn it off); three is the highest.

Fun?  If you're a curious techie who gets a thrill out of this sort of stuff.  Useful?  I've never used it (but then again, I only meet the first two of these criteria.  OK, fine, who am I kidding, I meet all three - but it's still never been useful for me).

js_debug.jpg

Find any practical uses for this hidden feature?  Hit me up in the comments.

In Analytics, there aren't many options to filter the reports well - some reports allow filtering on a user property, some on an auth source, and some on communities.

It's the latter one that I found myself needing when I wanted to create Community Analytics Reports, filtered on a certain set of communities (we're using lots of Experience Definitions, so since that wasn't an option to filter on, I figured I'd create an Analytics Report that just filtered on all communities within each Experience Definition).

Problem is, Analytics 2.1 has a bug that prevents you from browsing and selecting multiple communities in a folder. So when trying to configure a Community Traffic Report and using the Browse button here:

analytics_bug_communities2.jpg

... you just got this when navigating to a folder with a bunch of communities in it:

analytics_bug_before.jpg

The standard BEA (Oracle?) party line is that this bug is fixed in Analytics 2.5 (which I'm sure it is), and to just search for communities here.  But unfortunately 2.5 only works with the 6.5 portal, and many of you aren't ready to make that move (drop us a line when you are!).

Fortunately, there's a (relatively) easy fix after the jump!

Performance Tuning Tips

For those of you unaware, Dev2Dev is meeting a grisly fate:  it won't be with us much longer (apparently all content except for the blogs up there will be migrated to the Oracle Mother Ship).  No doubt our friends at Oracle will come up with an alternative way for employees to speak their minds, but for now many employees and "alumni" still have something to say, and we want to give them a forum.  Today's guest post is from Ray Gao, one he started while still at BEA (warning: dev2dev links are not long for this world...) on Performance Tuning of the ALI Portal.

The gist of his post is that there are a lot of moving parts in performance tuning, including the portal, remote tier, and the database: the performance chain is only as strong as its weakest link.

To get more of his great high-level overview on performance tuning, click on through for a good read!

AquaLogic IDK Traffic Analysis

This post is a little more technical than we usually write about in the blog, but it was an interesting exercise that I thought was worth sharing.

The IDK, or AquaLogic Interaction Development Kit, allows remote code to load and manipulate ALUI objects. This is done primarily via SOAP calls to the WS API Server. For a deeper dive, check out Ross Brodbeck's How does the IDK Work? blog.

The neat thing about the IDK's API is that the SOAP calls are abstracted out for you, so you don't need to worry about the implementation details. You just need your remote code to be able to connect to the WS API server. Ignorance is not always bliss, though; if you are accessing Collab or Publisher objects, the code also needs direct access to running instances of Collab and Publisher.

This isn't particularly earth-shattering, but in distributed environments, or, say, environments where your remote servers have host names that are in HOST files rather than DNS entries, you could have problems with remotely executed code. Another possible trouble point is if you use the serverconfig.xml hack to load balance your servers; the IDK on a remote server won't be able to resolve those host names.

Technical network traces after the jump...

Performance Tuning by Yahoo

Here's an interesting article by Yahoo: "Thirteen Simple Rules for Speeding Up Your Web Site". While you may not be able to optimize some of the ALUI portal's less-than-stellar behaviors related to performance (really, 1+ Megabyte .js files?), these tips can help eke out even more performance from your custom code. Some aren't really relevant to custom code being served through a portal server (such as "Use a Content Delivery Network"), but many are (such as "Gzip Components" and "Make JavaScript and CSS External").

The interesting thing to remember here is that from a web browser's perspective, your portal is just a web site, subject to the same caching rules and performance limitations. Most optimizations you read about for web sites can be applied to your own site.

DataDirect DB Driver Debugging

Here's an undocumented little trick for debugging database transactions. The .NET ALUI portal uses DataDirect's ADO.NET database drivers for Oracle and SQL Server, so there are some settings that can be used to extract a lot (a LOT) of logging information about the state of the connection pool and SQL queries being executed.

Note that because this configuration can generate over a hundred megabytes a minute (!), this should just be used temporarily for debugging purposes.

Simply add the following lines to the database component section of the %PORTAL_HOME%\settings\common\serverconfig.xml file:

    <setting name="database-connection:trace-enable">
      <value xsi:type="xsd:boolean">true</value>
    </setting>
    <!-- java/ado.net trace file.  Required if trace-enable true -->
    <setting name="database-connection:trace-file">
      <value xsi:type="xsd:string">c:\temp\trace.out</value>
    </setting>

Logging is an important part of portal diagnostics. In addition to using PTSpy for real-time diagnostics, the ALUI portal now ships with a Plumtree Logger service that can capture events and record them to a file. The great thing about this service is that it uses log4j to configure how and what events are recorded.

Unfortunately, in many cases the default logging configuration isn't adequate. The default logging for the portal is to roll over daily, and in some cases with heavily used portals this can cause the log files to exceed hundreds of megs a day, which means you can't even open them in PTSpy to view all the information. The good news is that log4j is easy to configure - instead of using a "DailyRollingFileAppender", you can use a standard "RollingFileAppender". This allows the logging to roll over when the file hits a certain size, not just once per day. That way, you can configure the logging service to maintain a finite set of logs with a manageable size.

To configure this, open %PORTAL_HOME%\plumtree\settings\ptlogging\ptLogger.xml. Study the file to see how the various appenders and filters are configured, and follow these steps:

  1. Change the name and class of the PortalDailyLogFile appender:
              <appender class="org.apache.log4j.DailyRollingFileAppender" name="PortalDailyLogFile">
    to:
              <appender class="org.apache.log4j.RollingFileAppender" name="PortalRollingLogFile">
  2. Change the filter to use this appender:
              <filters appender="PortalDailyLogFile" server="portal.nas-vm.Administrator">
    to:
              <filters appender="PortalRollingLogFile" server="portal.nas-vm.Administrator">
  3. Remove the "DatePattern" parameter (we're not going to be rolling over based on date, just size):
              <param name="DatePattern" value="'.'yyyy-ww"/>
  4. Change the "Append" parameter to "true" so that the logs aren't overwritten when the PT Logger Service is restarted:
              <param name="Append" value="false"/>
    to:
              <param name="Append" value="true"/>
  5. Add the following lines to the appender to determine how big the file should be before it's rolled over, and how many files to keep at any time:
              <param name="MaxFileSize" value="50MB"/>
              <param name="MaxBackupIndex" value="10"/>
  6. Restart the PT Logger Service. The new logs will be recorded to %PORTAL_HOME%\ptlogging\logs\portal\.

Log4j Configuration

Remember these are all standard log4j configurations, so you could change the layout of the logging, log to a database, even set up logging to email you when fatal errors occur.