WWW Crawlers gotcha down? De-mystifying the nebulous “OKAssertError (282610)” error and the dreaded Property Bag

Automation Server, Portal Server by Brian Hak on July 28th, 2009

Howdy all,

We recently had a customer who was running into a problem with their WWW crawlers.  Sadly, the verbose job log wasn’t of much help in tracking down the issue:

Jul 20, 2009 5:25:02 PM- *** Job Operation #1 failed: This crawl could not be launched because the location from which it was supposed to start, http://www.somecoolsite.com could not be found or was inaccessible.  When the crawler attempted to visit this location it received the following message: Exception of type com.plumtree.openkernel.exceptions.OKAssertError was thrown.(282610)

Well, we’ll never get back those two seconds of our life that it took us to put the job in verbose mode.  Perhaps a PTSpy will give us a little more useful information.  Let’s have a looksee:

19 7-24-2009 10:17:36.351 Warning Directory automation.xxx.xxx.wadm NodePullerWorker1 com.plumtree.server.impl.directory.providers.PTWebCrawlProvider


Error in function PTWebCrawlProvider.Initialize (pBagRepositoryInfo == 

<?xml version="1.0" encoding="ucs-2"?>
<PTBAG V="1.1" xml:space="preserve">
<S N="PTC_WEB_LOGIN"></S>
<S N="PTC_WEB_HDRS"></S>
<S N="PTC_PBAGFORMAT">1002</S>
<S N="PTC_WEB_PR_LOGIN"></S>
<A N="PTC_WEB_COOK">
<I N="DIMS">1</I>
<I N="MAX0">2</I>
<A N="0">
<I N="DIMS">1</I>
<I N="MAX0">-1</I>
</A>
<A N="1">
<I N="DIMS">1</I>
<I N="MAX0">-1</I>
</A>
<A N="2">
<I N="DIMS">1</I>
<I N="MAX0">-1</I>
</A>
</A>
<S N="PTC_WEB_PR_PASS"></S>
<S N="PTC_WEB_PASS"></S>
<S N="PTC_WEB_PDATA"></S>
<S N="PTC_WEB_PURL"></S>
<I N="PTC_WEB_BYPASS">0</I>
<S N="PTC_WEB_LURL"></S>
</PTBAG>, 

com.plumtree.openfoundation.util.XPExceptionpSession == com.plumtree.server.impl.core.PTSession@196e136, pDocumentTypeMap == com.plumtree.server.impl.directory.PTDocumentTypeMap@9bad5a, lWebServiceID == 0)

OK, now we’re getting somewhere.  Looks like the Crawler had some trouble initializing.  And there’s some “PTBag” xml object that might be the culprit.  What’s this crazy XML and “PTBag” noise?  Hold tight, more on that later.  For now though, let’s take a look a little further down in the PTSpy log:

com.plumtree.openfoundation.util.XPException.GetInstance(XPException.java:371)
com.plumtree.server.impl.directory.providers.PTWebCrawlProvider.Initialize(PTWebCrawlProvider.java:102)

com.plumtree.server.impl.directory.utils.PTProviderFactory.CreateInitializedCrawlProvider(PTProviderFactory.java:46)
… 7 more
Caused by: com.plumtree.openfoundation.util.XPException
com.plumtree.openfoundation.util.XPPropertyBag.ReadAsInt(XPPropertyBag.java:363)
com.plumtree.server.impl.directory.providers.PTWebCrawlProvider.Initialize(PTWebCrawlProvider.java:88)
… 8 more
Caused by: java.lang.ClassCastException
com.plumtree.openfoundation.util.XPPropertyBag.ReadAsInt(XPPropertyBag.java:356)
… 9 more


Well, well, well, what have we here?  Looks like our good friend the stack trace.  Now we have a real, low-level clue as to what’s going on.  Specifically, the last line of the stack trace:

Caused by: java.lang.ClassCastException
com.plumtree.openfoundation.util.XPPropertyBag.ReadAsInt(XPPropertyBag.java:356)

tells us that the code is expecting to read an integer from the XML data listed above, but instead, it found something that it can’t handle, likely a string.
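To make that reading of the stack trace a little more concrete, here is a rough Python analogy of a strictly typed read from a property bag. This is not the actual XPPropertyBag code; it is a sketch that assumes, based only on the dump above, that `S` elements hold strings and `I` elements hold integers. The helper names `parse_bag` and `read_as_int` are made up for illustration, and PTC_WEB_BYPASS is used only because it is the one top-level integer in the dump; nothing in the log says which property actually failed.

```python
import xml.etree.ElementTree as ET

def parse_bag(bag_xml):
    """Map each top-level S/I entry name to a Python value, typed by its tag."""
    values = {}
    for elem in ET.fromstring(bag_xml):
        if elem.tag == "I":      # assumed: I = integer entry
            values[elem.get("N")] = int(elem.text)
        elif elem.tag == "S":    # assumed: S = string entry
            values[elem.get("N")] = elem.text or ""
    return values

def read_as_int(values, name):
    """Strict typed read: refuse anything that was not stored as an integer."""
    value = values[name]
    if not isinstance(value, int):
        raise TypeError(f"{name} is stored as {type(value).__name__}, not int")
    return value

# Two made-up bags: the same entry stored as an integer and as a string.
good = parse_bag('<PTBAG V="1.1"><I N="PTC_WEB_BYPASS">0</I></PTBAG>')
bad = parse_bag('<PTBAG V="1.1"><S N="PTC_WEB_BYPASS">0</S></PTBAG>')

print(read_as_int(good, "PTC_WEB_BYPASS"))
try:
    read_as_int(bad, "PTC_WEB_BYPASS")
except TypeError as exc:
    print("read failed:", exc)
```

The point is only that the value "0" looks harmless in both bags; it is the stored type, not the content, that makes the strict read blow up.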
So what to do?  Well, we could start de-compiling a bunch of code, but that’s probably a waste of time in this case.  Instead, let’s dig a little deeper into the Property Bag and try to figure out what’s going on there.
Remember when I told you to hold tight for a discussion on Property Bags?  Well you can stop holding tight now.  Unless you’re more comfortable that way, in which case…err…well…carry on.
Anyhow, a long time ago in a universe far far away, there was a product called the Plumtree Corporate Portal.  Over time, this product has been known by many names: Plumtree, Portal, ALUI, ALI, WCI.  But here’s a dirty little secret for you: re-branding aside, much of the codebase of the original Plumtree product remains intact; that’s why you see “PT”-this and “com.plumtree”-that everywhere when you start looking under the covers.  I’m rambling, I know, but the upshot is that ever since way-back-when, the portal has stored object property data in the database, in the PTObjectProperties table, as serialized XML.  Some people may question the wisdom of this approach, but, circa 1998, I guess it seemed like a good idea: “Hey, I know, let’s wrap every damn piece of data on earth in XML so we can bloat everything into oblivion”.  If you take a minute to open up your portal database, you’ll see that the PTObjectProperties table looks something like this:
ptobjectproperties1.png
The fields properties1 and properties2 carry the payload of the Property Bag data, i.e. XML data describing an object.  Note that this table contains property data for all types of objects in the portal, hence the need for both classid and objectid in the primary key of the table.  So, great, we can store about 500 characters of XML data about an object.  That should almost allow us to keep the header of the XML document in the database.  But what happens when we have more than 500 characters?  Enter the pagenumber field.  To facilitate large data storage, the table is designed to have multiple rows of property data, ordered by pagenumber.  That means, if you want to grab the full Property Bag data out of the table for a given object, you’ll want a query that looks something like this:

select properties1, properties2 from ptobjectproperties where objectid=X and classid=y order by pagenumber
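If you would rather stitch the pages together in code than by hand, a minimal Python sketch of the idea might look like the following. It runs against an in-memory SQLite table as a stand-in for the real portal database, so the table definition, the object and class IDs (200 and 35), and the row contents are all made up. It also assumes that within a page properties1 comes before properties2, which the query above suggests but does not guarantee.

```python
import sqlite3

def load_property_bag(conn, object_id, class_id):
    """Concatenate the paged XML fragments for one object into a single string."""
    rows = conn.execute(
        "SELECT properties1, properties2 FROM ptobjectproperties "
        "WHERE objectid = ? AND classid = ? ORDER BY pagenumber",
        (object_id, class_id),
    ).fetchall()
    # Assumption: within a page, properties1 precedes properties2.
    return "".join((p1 or "") + (p2 or "") for p1, p2 in rows)

# Stand-in table with invented rows, inserted out of order on purpose.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ptobjectproperties ("
    "objectid INTEGER, classid INTEGER, pagenumber INTEGER, "
    "properties1 TEXT, properties2 TEXT)"
)
conn.executemany(
    "INSERT INTO ptobjectproperties VALUES (?, ?, ?, ?, ?)",
    [
        (200, 35, 1, '<I N="PTC_WEB_BYPASS">0', "</I></PTBAG>"),
        (200, 35, 0, '<PTBAG V="1.1">', '<S N="PTC_WEB_LURL"></S>'),
    ],
)

bag_xml = load_property_bag(conn, 200, 35)
print(bag_xml)
```

Note that a fragment boundary can fall in the middle of an element, as in the made-up rows here, so the pieces only become valid XML once every page has been joined in pagenumber order.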


More generically, we can just look at all the data in the table in a relatively easy-to-follow-format by running:

select * from ptobjectproperties order by objectid, classid, pagenumber

ptobjectproperties2.png
Wow…look at all that beautiful XML.  If you want to make something useful out of this data, export the results to your favorite text editor, format nicely, and BAM!  You have the property bag for an object of interest; it’s just that simple.
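The "format nicely" step can also be scripted. Here is a small sketch using Python's standard xml.dom.minidom module; the sample bag is a trimmed-down, made-up version of the dump shown earlier, and `pretty_bag` is just an illustrative helper name.

```python
import xml.dom.minidom

def pretty_bag(bag_xml):
    """Return the property bag XML indented one element per line."""
    dom = xml.dom.minidom.parseString(bag_xml)
    pretty = dom.toprettyxml(indent="  ")
    # Drop any whitespace-only lines so the output stays compact.
    return "\n".join(line for line in pretty.splitlines() if line.strip())

# Made-up sample: a shortened property bag on a single line.
sample = '<PTBAG V="1.1"><S N="PTC_WEB_LURL"></S><I N="PTC_WEB_BYPASS">0</I></PTBAG>'
formatted = pretty_bag(sample)
print(formatted)
```

One entry per line makes it much easier to eyeball a bag or feed two of them to a diff tool.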
Now back to our original problem.  PTSpy is telling us that the property bag of the WWW Provider is screwed up somehow.  So we go to the database, pull out the property bag info as described above, and take a look at it.  Hmm….nothing looks particularly out of the ordinary.  I suppose I could go through the hassle of comparing this property bag to one that works, but that’s probably a waste of time.  I can tell you it’s a waste of time because I actually did bother to compare a broken property bag with a good one and didn’t see any obvious problems.  A better approach is probably just to re-create the WWW data source, and associate our crawler with the newly created data source.
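For anyone who wants to try the comparison anyway, it can be automated instead of done by eye. The sketch below lists top-level entries that are missing from one bag or stored under a different element type in the other. Both bags and the entry name HYPOTHETICAL_NEW_SETTING are invented for illustration, and the helpers `entry_types` and `diff_bags` are not part of any portal API. As noted above, the real comparison turned up nothing obvious, so treat this as a diagnostic aid rather than a fix.

```python
import xml.etree.ElementTree as ET

def entry_types(bag_xml):
    """Map each top-level entry name (the N attribute) to its element tag."""
    return {elem.get("N"): elem.tag for elem in ET.fromstring(bag_xml)}

def diff_bags(broken_xml, working_xml):
    """List entries that are missing, extra, or typed differently between two bags."""
    broken, working = entry_types(broken_xml), entry_types(working_xml)
    report = []
    for name in sorted(set(broken) | set(working)):
        if broken.get(name) != working.get(name):
            # (entry name, tag in broken bag, tag in working bag); None means absent
            report.append((name, broken.get(name), working.get(name)))
    return report

# Two made-up bags; the "working" one has an extra integer entry.
broken = '<PTBAG V="1.1"><S N="PTC_WEB_LURL"></S><I N="PTC_WEB_BYPASS">0</I></PTBAG>'
working = ('<PTBAG V="1.1"><S N="PTC_WEB_LURL"></S><I N="PTC_WEB_BYPASS">0</I>'
           '<I N="HYPOTHETICAL_NEW_SETTING">1</I></PTBAG>')

for name, in_broken, in_working in diff_bags(broken, working):
    print(name, in_broken, in_working)
```

This only looks at top-level names and types; nested array entries and the values themselves would need a deeper walk.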
Wouldn’t you know it, we created a new WWW data source, a new copy of the previously broken WWW crawler, and everything was rainbows!  So what happened?  To be honest, I’m still not completely sure.  That said, I do have an educated guess: this customer has been running the portal for a long time, and has gone down the upgrade path of portal 4.x->5.x->6.1.  Until this time, they hadn’t really used WWW crawlers.  It seems likely that somewhere along the way, an upgrade failed to update the data source property bag to work with a newer version of the crawler code, and introduced our problem.  The truth is the world may never know the root of the problem, but at least it’s fixed.

One Response to “WWW Crawlers gotcha down? De-mystifying the nebulous “OKAssertError (282610)” error and the dreaded Property Bag”

  1. omidk.myopenid.com says:

    My understanding was that property bags exist as an alternative to serializing an object. Because the WCI supports .NET and Java they went with XML to work with whatever library they use that is used across both platforms.
    I agree with you though that I think if it was designed today, few people would design a database like that.
