OH NO!! Splunking log files with multiple formats?? No problem!


I was recently at a client site for a two-week engagement, helping them ramp up their Splunk installation, and I came across something particularly interesting. One of the log files the client wanted to index in Splunk contained four different log formats with four different timestamps. The timestamps ranged from a bracketed [Sat May 31 13:27:14 2012] to a bare Jun 01 02:45:35 with no year at all.

There is actually a fairly simple way to solve this issue and get Splunk to index each event correctly. You need a custom datetime.xml and a couple of props.conf settings to ensure that Splunk breaks up each event and timestamps it correctly. The custom datetime.xml tells Splunk about the different time formats in your data. You can specify any date/time format you need by defining it with a regular expression. You also need to identify each part of the timestamp (e.g. day, month, year, hour, minute, second) in the extract attribute, so Splunk knows what each captured value is and extracts it accordingly. Take a look at the example below:

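<datetime>
<!-- [Sat May 31 13:27:14 2012] -->
<define name="_datetimeformat1" extract="litmonth, day, hour, minute, second, year">
    <!-- the <text> element holding this format's regular expression is not reproduced here -->
</define>
<!-- [2012/06/01 8:54:21.599] -->
<define name="_datetimeformat2" extract="year, month, day, hour, minute, second, subsecond">
    <!-- the <text> element holding this format's regular expression is not reproduced here -->
</define>
<!-- Fri Jun  1 02:47:47 2012 -->
<define name="_datetimeformat3" extract="litmonth, day, hour, minute, second, year">
    <!-- the <text> element holding this format's regular expression is not reproduced here -->
</define>
<!-- Jun 01 02:45:35  NDS iMonitor for Novell eDirectory 8.8.5 SP5 v20506.01 SP5 started successfully. -->
<define name="_datetimeformat4" extract="litmonth, day, hour, minute, second">
    <!-- the <text> element holding this format's regular expression is not reproduced here -->
</define>
<timePatterns>
    <use name="_datetimeformat1"/>
    <use name="_datetimeformat2"/>
    <use name="_datetimeformat3"/>
    <use name="_datetimeformat4"/>
</timePatterns>
<datePatterns>
    <use name="_datetimeformat1"/>
    <use name="_datetimeformat2"/>
    <use name="_datetimeformat3"/>
    <use name="_datetimeformat4"/>
</datePatterns>
</datetime>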
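The regular expressions for each definition are not shown in this post, so here is a rough Python sketch of what they need to capture. The patterns are my own guesses based on the four sample timestamps, not the ones used at the client, and Python named groups stand in for the extract attribute (in datetime.xml the extract names are matched to capture groups by position instead). Group names follow the extract lists, with litmonth used for any spelled-out month. The function name extract_timestamp is hypothetical.

```python
import re

# Hypothetical patterns, one per <define>. Order matters: the bracketed
# format 1 is tried before formats 3 and 4, which would also match inside it.
FORMATS = [
    # [Sat May 31 13:27:14 2012]
    ("_datetimeformat1", re.compile(
        r"\[\w{3}\s+(?P<litmonth>\w{3})\s+(?P<day>\d{1,2})\s+"
        r"(?P<hour>\d{1,2}):(?P<minute>\d{2}):(?P<second>\d{2})\s+(?P<year>\d{4})\]")),
    # [2012/06/01 8:54:21.599]
    ("_datetimeformat2", re.compile(
        r"\[(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})\s+"
        r"(?P<hour>\d{1,2}):(?P<minute>\d{2}):(?P<second>\d{2})\.(?P<subsecond>\d+)\]")),
    # Fri Jun  1 02:47:47 2012
    ("_datetimeformat3", re.compile(
        r"\b\w{3}\s+(?P<litmonth>\w{3})\s+(?P<day>\d{1,2})\s+"
        r"(?P<hour>\d{1,2}):(?P<minute>\d{2}):(?P<second>\d{2})\s+(?P<year>\d{4})\b")),
    # Jun 01 02:45:35 (no year in the data)
    ("_datetimeformat4", re.compile(
        r"\b(?P<litmonth>\w{3})\s+(?P<day>\d{2})\s+"
        r"(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})\b")),
]


def extract_timestamp(line):
    """Return (format name, extracted parts) for the first matching format, else None."""
    for name, pattern in FORMATS:
        match = pattern.search(line)
        if match:
            return name, match.groupdict()
    return None


for sample in (
    "[Sat May 31 13:27:14 2012] some message",
    "[2012/06/01 8:54:21.599] some message",
    "Fri Jun  1 02:47:47 2012",
    "Jun 01 02:45:35  NDS iMonitor started",
):
    print(extract_timestamp(sample))
```

Splunk uses PCRE rather than Python's re module, so treat this only as a quick way to check the capture logic before writing the real definitions.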

Now that you have created a datetime.xml, you must tell Splunk where it is located so that, for this particular sourcetype, it does not use the default version in $SPLUNK_HOME/etc. You do this by adding the DATETIME_CONFIG attribute to props.conf. Because we also need Splunk to break up the events properly, we set the LINE_BREAKER attribute in the same file. Normally, you would write a regular expression matching the pattern at the beginning of an event. Usually there is only one such pattern, but in this situation there are four distinct patterns to match. All you need to do is create one large regular expression consisting of the four patterns separated by a pipe "|" symbol, which stands for "OR". Take a look at the example below:

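# Path is relative to $SPLUNK_HOME
DATETIME_CONFIG = /etc/apps/sampleapp/datetime.xml
# The first capture group (the newlines) is discarded; the lookahead lists the four event-start patterns
LINE_BREAKER = ([\r\n])+(?=(\[\w{3}\s(\w{3})\s(\d{1,2})\s(\d{2}):(\d{2}):(\d{2})\s(\d{4})\]|\d{10}:?\s\w{4}:|\[\s--\s|\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2}\s\s))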
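If you want to sanity-check an alternation like this before deploying it, a small Python sketch along the following lines can help. It reuses the four event-start alternatives from the LINE_BREAKER above (with the inner capture groups dropped) and splits a made-up sample on newlines that are followed by any of them. The sample lines and the name break_events are hypothetical, since the client's log is not reproduced here, and Python's re module is not the PCRE engine Splunk uses.

```python
import re

# The four event-start alternatives from the LINE_BREAKER, one per entry.
EVENT_STARTS = [
    r"\[\w{3}\s\w{3}\s\d{1,2}\s\d{2}:\d{2}:\d{2}\s\d{4}\]",
    r"\d{10}:?\s\w{4}:",
    r"\[\s--\s",
    r"\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2}\s\s",
]

# Newlines are consumed as the delimiter; the lookahead only checks that an
# event start follows, so the start itself stays with the next event.
BREAKER = re.compile(r"[\r\n]+(?=(?:" + "|".join(EVENT_STARTS) + r"))")


def break_events(raw):
    """Split raw log text wherever a newline run precedes an event start."""
    return BREAKER.split(raw)


# Hypothetical sample: one line per alternative, plus a continuation line
# that matches none of them and should stay attached to the first event.
SAMPLE = (
    "[Sat May 31 13:27:14 2012] first event\n"
    "    continuation line that belongs to the first event\n"
    "1338540861: INFO: second event\n"
    "[ -- third event\n"
    "Jun 01 02:45:35  NDS iMonitor started\n"
)

events = break_events(SAMPLE)
for event in events:
    print(repr(event))
```

Lines that match none of the alternatives stay attached to the previous event, which is the behaviour you want for multi-line entries.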

So if you ever come across a log file with multiple formats, there is no need to panic. You can index and timestamp it correctly with no problem. Happy Splunking!!!

image courtesy of: http://www.outlawvern.com/wp-content/uploads/2003/04/mrbill-150x150.jpg
