Organizing Your Splunk Shoe Rack (Defining Index Structures, Part 2 of 2)

By: Anshu November 02, 2012

In my previous post, I went through the thought process of defining a Splunk index structure. Three aspects of defining this structure were covered: data access control, data retention, and search performance. Now that we understand the case for a well-defined index structure and the factors that drive it, let's go through a use case.

An extremely bright and talented system administrator at the Panda Shoe Company (fictitious) wanted to work smarter and installed Splunk in their environment. After seeing the tremendous benefits, other groups within the company, such as IT security, development, and marketing, soon wanted to use Splunk too. Data began pouring into Splunk: application logs, host data, system health metrics, firewall data, router data, and even data from a phone call record system. After reading insightful blog posts and taking into account the various factors around the data, the bright system administrator came up with the following index structure strategy:

Panda Shoe Company Indexes

(1) "network" index
-Stores firewall, router, switch and load balancer data
-Data kept for 2 weeks
-Access limited to "operations" and "network security" groups

(2) "app_ecomm" index
-Contains custom application logs from Panda's e-commerce system, including front-end web servers, application servers, and payment processing servers. The data is kept together because events are correlated by an ID and grouped as "transactions" within Splunk.
-Data is kept for 90 days because Panda wants to see business trends over that time period.
-Access is limited to the "ecomm_dev" and "operations" groups

(3) "app_hr" index
-Contains custom application logs from Panda Shoe's internal HR application.
-Data is kept for 180 days to comply with a federal law
-Access is limited to the "hr_dev" and "operations" groups

(4) "app_accounting" index
-Contains custom application logs from the internal accounting system
-Data is kept for 180 days to meet an organizational requirement
-Access is limited to the "accounting_dev" and "operations" groups

(5) "os_linux" index
-Contains OS-level data in syslog format from Linux hosts, such as CPU, memory, and disk utilization metrics and /var/log contents.
-Data is kept for 30 days because the organization did not find it useful after that time period
-Access to data is limited to the "operations" and "security" groups

(6) "os_windows" index
-Same as above, except for Windows hosts. Windows event logs and Performance Monitor (perfmon) data are collected.

(7) "phone" index
-Contains data from the phone call record system
-Operations only needs the data for 2 weeks, but marketing wants to see trends over the past month, so data is kept for 30 days
-Access to data is limited to the "marketing" and "operations" groups
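
The access rules above can be sketched in Python. This is a minimal sketch, not the administrator's actual setup. It assumes each group corresponds to a Splunk role of the same name; "network security" is written as the hypothetical role network_security, since Splunk role names should not contain spaces. The sketch inverts the index-to-group table and prints authorize.conf-style stanzas that use the srchIndexesAllowed setting to limit which indexes each role can search. The helper names are hypothetical.

```python
from collections import defaultdict

# Index -> groups allowed to search it, transcribed from the strategy above.
# Assumption: each group is a Splunk role with the same name; "network security"
# is written as the hypothetical role name network_security.
INDEX_ACCESS = {
    "network": ["operations", "network_security"],
    "app_ecomm": ["ecomm_dev", "operations"],
    "app_hr": ["hr_dev", "operations"],
    "app_accounting": ["accounting_dev", "operations"],
    "os_linux": ["operations", "security"],
    "os_windows": ["operations", "security"],
    "phone": ["marketing", "operations"],
}


def roles_to_indexes(index_access):
    """Invert index -> roles into role -> sorted list of indexes."""
    by_role = defaultdict(set)
    for index, roles in index_access.items():
        for role in roles:
            by_role[role].add(index)
    return {role: sorted(indexes) for role, indexes in sorted(by_role.items())}


def authorize_conf(index_access):
    """Render one authorize.conf-style stanza per role (hypothetical helper)."""
    lines = []
    for role, indexes in roles_to_indexes(index_access).items():
        lines.append(f"[role_{role}]")
        # srchIndexesAllowed takes a semicolon-separated list of index names.
        lines.append("srchIndexesAllowed = " + ";".join(indexes))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(authorize_conf(INDEX_ACCESS))
```

Because "operations" appears on every index, that role ends up with all seven. Listing the indexes explicitly, rather than using a wildcard, keeps the intent of the table visible in the config.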

A few things to note:

(1) HR and Accounting application logs were not stored together, since the developers for those groups are separate and 99.9% of the time there is no need to search the two data sets together.

(2) Indexes that store related types of data were named with common prefixes. This way, to search all OS data, for example, the search only needs to contain "index=os_*" instead of "index=os_linux OR index=os_windows". The same goes for the applications: "index=app_*" covers all of the application data.
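
To see what the prefix convention buys, here is a small Python sketch that expands a wildcard against the seven index names in the strategy. It uses shell-style matching from the standard library's fnmatch module as a stand-in for how Splunk expands "*" in an index= term. The helper name matching_indexes is hypothetical, and this is not Splunk's own matching code.

```python
from fnmatch import fnmatchcase

# Index names from the Panda Shoe Company strategy.
INDEXES = ["network", "app_ecomm", "app_hr", "app_accounting",
           "os_linux", "os_windows", "phone"]


def matching_indexes(pattern, indexes=INDEXES):
    """Return the indexes a wildcard such as 'os_*' would select.

    Shell-style matching (fnmatchcase) approximates Splunk's '*' wildcard;
    unlike Splunk, fnmatch also treats '?' and '[...]' as special.
    """
    return [name for name in indexes if fnmatchcase(name, pattern)]


print(matching_indexes("os_*"))   # ['os_linux', 'os_windows']
print(matching_indexes("app_*"))  # ['app_ecomm', 'app_hr', 'app_accounting']
```

A side benefit: if a hypothetical os_solaris index were added later, existing searches using index=os_* would pick it up without being edited.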

(3) In indexes.conf, use the "frozenTimePeriodInSecs" setting per index to set the retention time period. Remember that by default, data that rolls to frozen is deleted. It's up to you to set a path for storing frozen data (the "coldToFrozenDir" setting). If a path is set, Splunk will use its own script to compress and move the data to that path. You can also provide your own script if you wish (the "coldToFrozenScript" setting).
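
The retention periods in the strategy translate into frozenTimePeriodInSecs values with simple arithmetic: multiply the number of days by 86,400. The Python sketch below prints one indexes.conf-style stanza per index. The helper names and the optional archive path are hypothetical. A real stanza for a new index also needs homePath, coldPath, and thawedPath, which are omitted here.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Retention in days per index, transcribed from the strategy.
RETENTION_DAYS = {
    "network": 14,
    "app_ecomm": 90,
    "app_hr": 180,
    "app_accounting": 180,
    "os_linux": 30,
    "os_windows": 30,
    "phone": 30,
}


def frozen_time_period_secs(days):
    """Convert a retention period in days to a frozenTimePeriodInSecs value."""
    return days * SECONDS_PER_DAY


def indexes_conf(retention_days, frozen_dir=None):
    """Render retention settings as indexes.conf-style stanzas.

    If frozen_dir is given, each index gets a coldToFrozenDir under it so that
    frozen buckets are archived instead of deleted. homePath, coldPath and
    thawedPath are deliberately omitted from this sketch.
    """
    lines = []
    for index, days in retention_days.items():
        lines.append(f"[{index}]")
        lines.append(f"frozenTimePeriodInSecs = {frozen_time_period_secs(days)}")
        if frozen_dir is not None:
            lines.append(f"coldToFrozenDir = {frozen_dir}/{index}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # "/opt/splunk_frozen" is a hypothetical archive location.
    print(indexes_conf(RETENTION_DAYS, frozen_dir="/opt/splunk_frozen"))
```

Keep in mind that Splunk freezes whole buckets, and a bucket only rolls to frozen once its newest event is older than frozenTimePeriodInSecs. As a result, some events will live a little longer than the stated period. Size limits such as maxTotalDataSizeMB can also freeze data earlier.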

Hopefully, this provided some insight for configuring your Splunk deployment in a meaningful and effective way. Happy Splunking!

