How to generate 1 TB of data for Splunk Performance Testing


INTRODUCTION

Splunk, a leader in Event Management, provides insight into your business’s machine-generated log data. Splunk enables you to make sense of your business, make smart decisions, and initiate corrective actions.

Processing Big Data is by no means a small feat. The ability to scale Splunk to accommodate and grow with your business is key to providing reliable and accurate information. Splunk provides insight into your machine-generated data, but only a few apps provide insight into how Splunk itself is performing. Performance testing has been an ongoing effort by Splunk and various hardware and software vendors for some time now. Most, if not all, of these tests were run with SplunkIT or Bonnie++, tools designed to measure a single indexer using a small sample data set. If you want to test Splunk’s performance in your own environment, the challenge becomes: where do you get a large data set, such as 1 TB of machine-generated data, to test with?

In this article, I will demonstrate how to create 1 TB of data with embedded rare and dense search terms using Splunk’s Event Generator (EventGen) for Splunk performance testing.

 

SEARCHES

Embedding search terms into the log data enables a search to scan all three data tiers of the dataset (HOT, WARM, and COLD data buckets).

The following are the search terms we will generate, based on a 10,000,000-line file:

·      Very Dense Search: 1 out of 100 lines, 100,000 occurrences.

·      Dense Search: 1 out of 1,000 lines, 10,000 occurrences.

·      Rare Search: 1 out of 1,000,000 lines, 10 occurrences.

·      Sparse Search: 1 out of 10,000,000 lines, 1 occurrence.

·      Extremely Rare Search: 1 out of 100,000,000 lines.

 

SPLUNK’S EVENT GENERATOR      

Splunk Event Generator is a utility that enables you to build real-time events based on file definitions. We will be using a sample file with embedded search terms to build 1 TB of data with the Event Generator.

Install Event Generator:

The Event Generator is distributed through GitHub. If you do not have a GitHub account, simply create one by going to GitHub’s home page at www.github.com and registering a new account.

·      Download the Splunk Event Generator from https://github.com/splunk/eventgen

·      In Splunk Web, click the “Install app from file” button located at the upper left-hand corner of the Manage Apps page.

·      Browse to where the Event Generator zip file was downloaded and choose the file.

·      On a terminal, enter the following command to rename the extracted directory:

mv $SPLUNK_HOME/etc/apps/eventgen-master $SPLUNK_HOME/etc/apps/eventgen

·      Restart Splunk to enable the app.

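Alternatively, the entire install can be done from a terminal. A minimal sketch, assuming wget and unzip are available and $SPLUNK_HOME points at your Splunk installation:

$ cd $SPLUNK_HOME/etc/apps
$ wget https://github.com/splunk/eventgen/archive/master.zip   # download the master branch archive
$ unzip master.zip                                             # extracts to eventgen-master
$ mv eventgen-master eventgen                                  # rename to the expected app directory
$ $SPLUNK_HOME/bin/splunk restart                              # restart Splunk to enable the app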

EMBED SEARCH TERMS IN THE SAMPLE FILE

Splunk’s Event Generator can create real-time events from most, if not all, sample files. In the past, I have been able to create machine-generated logs from Cisco:ASA, Cisco:FWSM, syslog, McAfee Endpoint Protection, Nessus vulnerability scans, and the many out-of-the-box samples included in Splunk installations.

For this demonstration, I have chosen syslog data as the sample log to generate 1 TB of data.

1. It is easier to embed the various search terms if your sample data has a defined number of lines. The file I selected is an old syslog file of about 12 GB. First, trim the file to a defined size of 10,000,000 lines:

$ head -10000000 syslog.sample.log > new_syslog.sample.log
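A quick sanity check confirms the trim worked:

$ wc -l new_syslog.sample.log

The count should come back as exactly 10000000.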

 

2. Create Dense Search Terms - Enter the following command to find and replace the first n occurrences of a pattern:

awk 'c && sub("pattern","replace") {c--}1' c=1 samplefile  > newsamplefilewithreplace

Example:

awk 'c && sub("certificate","DENSE100") {c--}1' c=100000 samplefile > newsamplefilewithreplace

 

3. Check the number of replacements:

$ grep DENSE100 newsamplefilewithreplace | wc -l

 

4. Create Rare Search Terms - Insert the rare search terms throughout the sample file by choosing a random line in vi and inserting a string that is unique within the data, for example "$rFv5TgB^yHn". A scripted alternative is sketched below.
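If you prefer to script this step instead of editing in vi, here is a minimal awk sketch that appends the rare term to n randomly chosen lines; the term, file names, and line count are the ones used in this example:

# pick n random line numbers, then append the rare term to those lines
# (duplicate picks are possible, so the final count can be slightly under n)
awk -v n=10 -v total=10000000 'BEGIN { srand(); for (i = 0; i < n; i++) pick[int(rand() * total) + 1] = 1 } NR in pick { $0 = $0 " $rFv5TgB^yHn" } 1' new_syslog.sample.log > rare_syslog.sample.log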

 

5. When you have added all the search terms, your sample file is ready to go. Next, move the sample file to:

$SPLUNK_HOME/etc/apps/eventgen/samples
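Before wiring the sample into the Event Generator, confirm that every embedded term is present in the expected quantity. A quick loop over the example terms from above (substitute your own):

$ for term in DENSE100 '$rFv5TgB^yHn'; do printf '%s: ' "$term"; grep -c -F "$term" new_syslog.sample.log; done

Note that grep -c counts matching lines, which equals the number of occurrences as long as a term appears at most once per line.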

 

CONFIGURE THE SPLUNK EVENT GENERATOR

The configuration file for the Event Generator is named eventgen.conf. There is an eventgen.conf located in the app’s default directory; do not edit this file. Instead, create a new eventgen.conf file in $SPLUNK_HOME/etc/apps/eventgen/local.

Below is a simple configuration to get you started building your Splunk Event Generator data.

To add additional settings, please refer to the README directory located in the root directory of eventgen.

Example: eventgen.conf configuration

[syslog_sample.log]           # stanza named after the sample log file
interval = 3                  # number of seconds between events
fileMaxBytes = 100000000000   # size of each log file: 100 GB
fileBackupFiles = 11          # number of backup files; ~10 x 100 GB = 1 TB
count = 0                     # use the entire sample file
outputMode = file             # output mode set to file
fileName = /opt/splunk/var/lib/splunk/whitebox/sample_syslog_06302015.log   # output file name

# timestamp regular expression match string
token.0.token = \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}
token.0.replacementType = timestamp
# timestamp replacement string
token.0.replacement = %Y-%m-%d %H:%M:%S,%f
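Once Splunk restarts with this configuration in place, the Event Generator begins writing immediately. To keep an eye on progress, standard disk utilities work fine (the path matches the fileName setting above; watch is a Linux utility and may not be available everywhere):

$ du -sh /opt/splunk/var/lib/splunk/whitebox/                  # total size generated so far
$ watch -n 60 'ls -lh /opt/splunk/var/lib/splunk/whitebox/'    # refresh the file listing every minute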

 

That is all it takes to generate 1 TB of data for Splunk performance testing. In my environment, generating 125 GB of data takes about 4 hours, so the full 1 TB takes roughly 32 hours.

CONCLUSION

There has been a lot of buzz within the Splunk community on performance testing and its many approaches. Performance apps like SplunkIT and Bonnie++ have laid the foundation for such testing to occur. However, these tools are limited in some ways because they were designed to measure a single indexer. By creating your own data with embedded rare and dense search terms, you will be able to measure search performance and narrow down bottlenecks within a multi-indexer environment.

 

Happy Splunking…!
