Growing your Splunk Deployment
Growth. It's important in so many aspects of our lives: our careers, our health, our relationships. The famed self-help guru Tony Robbins says that beyond our basic needs, we need growth and giving back in order to truly be fulfilled. And sometimes growth requires us to rebuild a part of ourselves. Well, Splunk is no different. To keep its self-esteem high, it also needs to grow. In this blog post I want to cover a process for expanding the number of indexers in an existing Splunk deployment while also rebuilding the existing indexers.
Normally, adding an indexer is fairly straightforward, as Splunk is made to scale horizontally at both the indexer and search head layers. The key steps are to:
(1) Provision hardware that is ideally identical to your existing indexers,
(2) Ensure you are pushing indexer configuration to the peers through the cluster master, and
(3) Update your forwarders' outputs configuration to include the new indexers.
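For step (3), the forwarder-side change usually amounts to extending the target server list in outputs.conf. A minimal sketch, with hypothetical hostnames (idx01 through new-idx05) standing in for your own:

```
# outputs.conf on the forwarders -- hostnames are illustrative
[tcpout:primary_indexers]
server = idx01:9997, idx02:9997, idx03:9997, new-idx04:9997, new-idx05:9997
```

The forwarders will then auto load-balance across the full set, including the new peers.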
However, consider the following scenario, which can be a bit more challenging. Imagine that the current storage for each indexer consists of arrays of different-speed disks, such as 10K and 15K RPM drives. These mixed-speed arrays are presented as one logical volume to the indexers. To make the best use of these disks, the faster 15K disks could hold the more frequently written and searched hot/warm data, and the 10K disks the less frequently accessed cold data. New indexers with the same storage hardware are also being added to the cluster. The new indexers can easily be built to this specification, but there are a couple of challenges:
1. Adding the new indexers, which have different data partitions, to the cluster and keeping the indexer configuration synchronized with the existing indexers.
2. Decommissioning the existing indexers, which requires migrating their data to other indexers in the cluster.
In terms of challenge #1, the issue is that on the existing indexers all data is stored on one partition, such as "/data," whereas with the "new" storage configuration data is separated onto two partitions, such as "/warm_data" and "/cold_data." This is obviously a problem for defining index paths globally for the indexer cluster while it is in a mixed state of one- and two-partition peers.
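For reference, the target two-partition layout can be expressed globally in indexes.conf with volume definitions. A sketch, with illustrative paths and sizes (adjust to your own hardware):

```
# indexes.conf -- paths and sizes are illustrative
[volume:hot_warm]
path = /warm_data
maxVolumeDataSizeMB = 900000

[volume:cold]
path = /cold_data
maxVolumeDataSizeMB = 3000000

[main]
homePath   = volume:hot_warm/defaultdb/db
coldPath   = volume:cold/defaultdb/colddb
# thawedPath cannot reference a volume; it must be a literal path
thawedPath = $SPLUNK_DB/defaultdb/thaweddb
```

Because the paths reference volumes rather than physical mount points, the same configuration can be deployed to every peer, which is exactly what the aliasing approach below takes advantage of.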
For challenge #2, the issue is that when rebuilding an indexer, including its storage, the data must be migrated to other indexers. Beyond the hot, warm, and cold data, this also includes summarized data and frozen data, if the latter is stored on the indexers.
In order to meet challenge #1, the following can be done:
1. File system aliases can be created on the existing indexers that mimic the two data partitions on the "new" indexers. Continuing our example from above, aliases called "/warm_data" and "/cold_data" would both point to the "/data" partition on the original set of indexers. Therefore, through configuration we can make it appear to Splunk that two separate data partitions exist.
2. If not already in place, volume definitions can be added to indexes.conf so that Splunk manages space per volume and the partitions do not fill up.
3. Update the indexes.conf file to use the new aliases/data partitions and deploy it to the existing indexer cluster.
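The aliases in step 1 can be plain symbolic links. A sketch of the idea is below; on a real indexer the links would be created at the filesystem root (e.g. /warm_data -> /data), but here everything lives under a scratch directory so the commands are safe to run anywhere:

```shell
# Mimic the two "new" data partitions on an existing indexer.
# ROOT stands in for / -- on real hardware, drop it and run as root.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/data"

# Both aliases resolve to the same underlying partition, so Splunk
# sees the same two-partition layout as on the new indexers.
ln -s "$ROOT/data" "$ROOT/warm_data"
ln -s "$ROOT/data" "$ROOT/cold_data"
```

Once the links exist, the same indexes.conf paths work on both generations of hardware.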
Regarding challenge #2, there are potentially several types of data that need to be migrated off an indexer before it can be decommissioned. We'll address each type below:
1. Warm and Cold data
This data can be migrated off the indexer to its peers using the "./splunk offline --enforce-counts" command. This smoothly transfers the data to other indexers, allows in-flight searches to complete, and then gracefully shuts the indexer down.
2. Data model summary data and report summary data (i.e. summarized data)
Technically, the cluster would rebuild this data after the indexer is offlined. However, a better way is to enable summary replication on the indexer cluster. By doing this, a replicated copy of the summaries is already available as soon as the indexer is offlined, with no gap while they are rebuilt.
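Summary replication is switched on at the cluster master. A minimal sketch, assuming a Splunk Enterprise version that supports the setting (6.4 or later):

```
# server.conf on the cluster master
[clustering]
mode = master
summary_replication = true
```

After a rolling restart, the cluster replicates data model and report summaries alongside the buckets they summarize.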
3. Frozen Data
Not all indexers will have frozen data stored on their local storage. However, if they do, this data would also need to be migrated. In this case, an rsync script can be developed which transfers data from an existing indexer to one of the new indexers. Because frozen data is indexer-agnostic, it can be thawed on another indexer and does not have to remain on the indexer it originated from.
Putting all of this together, the plan below is a sample of how this could be accomplished:
1. Build the new indexers according to the "new" specification, which has two data partitions.
2. Create file system aliases on the existing indexers to mimic two data partitions.
3. Implement the volume setting and deploy the "new" index configuration to the existing indexers.
4. Add the new indexers to the cluster.
5. Enable replication of datamodel and report summaries.
6. Decommission and rebuild each of the existing indexers according to the new specification. For each indexer:
a. Stop sending data to the indexer.
b. Migrate the warm and cold data to other indexers by gracefully offlining the indexer.
c. Migrate any frozen data to another indexer.
d. Rebuild the indexer.
e. Add the rebuilt indexer back to the cluster.
Growth is good for people and Splunk deployments. However, in both cases it's important to grow in the right way. Although this scenario is somewhat specific, hopefully it opened up some ideas about the possibilities of architecting the growth of a Splunk deployment. Thanks for reading and, as always, happy Splunking!