Monitoring Frozen Data Storage in Splunk
In this post, I'd like to visit the "Siberia" of Splunk data: frozen (archived) storage. For all data other than frozen, you can get insight into your Splunk data at the index and bucket level by using the "dbinspect" command or apps like "Fire Brigade." Because frozen data "lives" outside of the world of Splunk, however, there is no built-in way to get insight into it. Therefore, I will outline a solution for creating a scripted input that sends metrics to Splunk, which can then be used for reporting.
Create the Script
In our sample environment, frozen data is stored on each indexer under the "/data/frozen/" path. Inside the "frozen" directory are directories for each index, which contain frozen buckets. For example, archived data for the "windows" index would be in the "/data/frozen/windows/" directory and would contain many frozen buckets.
One of the metrics we wish to obtain is how much space the frozen data is taking up per index and in total. Below is a bash script to collect this data. I made the comments fairly verbose to help illustrate what is going on.
---frozen_storage_metrics.sh---
#!/bin/bash
#Do not output STDERR messages
exec 2>/dev/null

#Set the value for the frozen path
FROZEN_PATH="/data/frozen"

#Capture the current timestamp for use when outputting events
CURR_DATE="$(date +%Y/%m/%dT%T)"

#Iterate through each index in the frozen path. The "_dir" variable holds each index directory path in turn
for _dir in "$FROZEN_PATH"/*/
do
    #Extract the index name (the last component of the path) and store it for later use
    CURR_IDX=$(basename "$_dir")
    #Use the "du" command to get the size of the directory. Only the "total" line is kept, transformed to include the field name "frozen_size_mb"
    FROZEN_SIZE_MB=$(du -cms "$_dir" | grep 'total' | perl -pe 's/(\d+)\stotal/frozen_size_mb=\1/')
    #Output the data in a Splunk-friendly event format, which includes:
    # 1. A timestamp at the beginning of the event
    # 2. A delimited set of key-value pairs
    echo "$CURR_DATE,index_name=$CURR_IDX,$FROZEN_SIZE_MB"
done

#Get the total size of the frozen path. This is optional since the events produced above can be aggregated to produce a total amount
FROZEN_TOTAL_SIZE=$(du -cms "$FROZEN_PATH"/ | grep 'total' | perl -pe 's/(\d+)\stotal/frozen_size_mb=\1/')
#Output the data. Set "index_name" to "all" since this is for the entire path.
echo "$CURR_DATE,index_name=all,$FROZEN_TOTAL_SIZE"
---
One thing to note is that the last "du" command, which gets the total size of the frozen path, could also have been used to get the sizes of the individual indexes. That would have produced a single event in a tabular format, which would then need to be parsed into separate events at search time. The script above instead produces one event per index, which avoids any search-time manipulation of the data.
Also, the directory name for the index is being used as the "index_name" field. Keep in mind, however, that this directory can have any name, depending on how the index paths are configured in indexes.conf. In practice, most index configurations just use the name of the index as the name of the directory on the filesystem. One example where this is not the case for a default index is the "main" index, which appears as "defaultdb" on the filesystem.
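If you want the events to carry the real index name rather than the directory name, a small helper function in the script can translate the known exceptions. The sketch below is an assumption about how you might structure this; the only mapping taken from this post is "defaultdb" to "main":

```shell
#!/bin/bash
#Sketch: translate a filesystem directory name into the index name it serves.
#Extend the case statement for any index whose directory in indexes.conf
#differs from the index name.
map_index_name() {
  case "$1" in
    defaultdb) echo "main" ;;  #Splunk's "main" index lives in "defaultdb"
    *)         echo "$1"  ;;   #most directories match the index name
  esac
}

map_index_name defaultdb  #prints "main"
map_index_name windows    #prints "windows"
```

The script's CURR_IDX assignment could then be wrapped as `CURR_IDX=$(map_index_name "$(basename "$_dir")")`.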
You can test the script by executing it at the command line. When doing so you should see output similar to the following:
2016/04/01T10:38:09,index_name=windows,frozen_size_mb=95
2016/04/01T10:38:09,index_name=os,frozen_size_mb=67
2016/04/01T10:38:09,index_name=defaultdb,frozen_size_mb=4
2016/04/01T10:38:09,index_name=network,frozen_size_mb=14
2016/04/01T10:38:09,index_name=mcafee,frozen_size_mb=1
2016/04/01T10:38:09,index_name=all,frozen_size_mb=181
This format will be easy to parse for Splunk and will minimize the amount of both index-time and search-time configuration necessary to use the data.
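Because the "all" event is simply the aggregate of the per-index events, you can sanity-check the script's output at the command line. The pipeline below reuses the sample output shown above; splitting on the "frozen_size_mb=" field name is just one convenient way to pull out the numeric value:

```shell
#Sample output from frozen_storage_metrics.sh (copied from above)
SAMPLE='2016/04/01T10:38:09,index_name=windows,frozen_size_mb=95
2016/04/01T10:38:09,index_name=os,frozen_size_mb=67
2016/04/01T10:38:09,index_name=defaultdb,frozen_size_mb=4
2016/04/01T10:38:09,index_name=network,frozen_size_mb=14
2016/04/01T10:38:09,index_name=mcafee,frozen_size_mb=1
2016/04/01T10:38:09,index_name=all,frozen_size_mb=181'

#Sum the per-index sizes, excluding the "all" total line
echo "$SAMPLE" | grep -v 'index_name=all' \
  | awk -F'frozen_size_mb=' '{sum += $2} END {print "per-index sum:", sum}'
#The sum (181) should match the frozen_size_mb value on the "all" line
```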
Create the Scripted Input
Now that we have our script created, we'll create an add-on to package it along with the inputs configuration. We'll call the app "acme_TA_indexer_metrics." The script will be stored in the "bin" directory inside the app, and the inputs.conf configuration in the "default" folder. The inputs.conf configuration would appear as follows:
---default/inputs.conf---
[script://./bin/frozen_storage_metrics.sh]
index=os
disabled=true
sourcetype=indexer_storage_metrics
interval=3600
---
The scripted input is configured to send data to the "os" index, set the sourcetype to "indexer_storage_metrics," and run every hour (the "interval" attribute is set to 3600 seconds or one hour).
When activating the input, you can create a copy of the stanza in an inputs.conf in the "local" folder:
---local/inputs.conf---
[script://./bin/frozen_storage_metrics.sh]
disabled=false
---
Not shown here, but index-time configuration to explicitly extract the timestamp and set line-breaking would be included in the add-on in the "default/props.conf" file.
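As a rough sketch of that index-time configuration, a props.conf for this sourcetype might look like the following. The exact values are assumptions based on the sample event format shown earlier; adjust them to your data:

```
---default/props.conf---
[indexer_storage_metrics]
#Each event is a single line
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
#The timestamp is at the start of the event, e.g. 2016/04/01T10:38:09
TIME_PREFIX = ^
TIME_FORMAT = %Y/%m/%dT%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
---
```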
In our scenario, this add-on would be configured with inputs enabled and deployed to the indexers which have access to the frozen data. Its main purpose there is to execute the script and collect the data. If additional search-time configuration were added to this add-on, it would also be deployed to the search heads. Note that Splunk needs to be installed on a server that has access to the frozen data. Another consideration is that if the data is not stored on the indexers, a portion of the file path could be extracted and used as a "hostname" field to identify which indexer the frozen data originated from. For example, the data might be organized in the following way: "/data/frozen/indexer1," "/data/frozen/indexer2," and so on. The "indexer1" portion of the path could be used in this case to specify the indexer.
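A variant of the script for that centralized layout could look like the sketch below. The "/data/frozen/<indexer>/<index>/" layout, the function name, and the "hostname" field name are all assumptions for illustration:

```shell
#!/bin/bash
#Sketch: when frozen buckets for several indexers are centralized under
#<frozen_path>/<indexer>/<index>/, pull the indexer name out of the path so
#each event records which indexer the data came from.
emit_frozen_metrics() {
  local frozen_path="$1"
  local curr_date _dir _rel _tail host idx size
  curr_date="$(date +%Y/%m/%dT%T)"
  for _dir in "$frozen_path"/*/*/
  do
    [ -d "$_dir" ] || continue
    _rel="${_dir#"$frozen_path"/}"   #e.g. "indexer1/windows/"
    host="${_rel%%/*}"               #e.g. "indexer1"
    _tail="${_rel#*/}"
    idx="${_tail%%/*}"               #e.g. "windows"
    #Keep only du's "total" line and prepend the field name
    size=$(du -cms "$_dir" 2>/dev/null | awk '/total/ {print "frozen_size_mb=" $1}')
    echo "$curr_date,hostname=$host,index_name=$idx,$size"
  done
}

#Example usage: emit_frozen_metrics /data/frozen
```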
Now that this data is collected in Splunk, you can use it for monitoring the size of archived or frozen directories. An example search would be:
index=os sourcetype=indexer_storage_metrics index_name="all" | eval frozen_size_gb = frozen_size_mb / 1024 | timechart max(frozen_size_gb) by host
The above search would give the total size of each indexer's frozen storage over time.
Because frozen data is "orphaned" from Splunk, alternative methods are needed to gather metrics on it. This post covered how to collect storage size for archived data; the same approach could be used to collect other metrics, such as the number of buckets. Thanks for reading and happy Splunking!