Syslog Collection with Splunk

By: Anshu May 24, 2012


What is Syslog?

If you're familiar with IT system administration, you have most likely come across syslog data. Syslog is a standard for logging server, system, and device messages. It was originally developed as part of the Sendmail project in the 1980s and has become the standard for Unix-based systems and for network devices such as firewalls. Because of its widespread adoption, syslog data can be extremely helpful for IT operations and infrastructure management.

The problem lies in gathering all the data generated by hosts across a network. To solve this problem, many organizations configure hosts to send syslog data to a server (or servers) functioning as a syslog collector. This approach centralizes the storage of syslog data, which is helpful from a log management and availability perspective. Because the collector writes the incoming data to local log files, a Splunk universal forwarder can be installed on the collector to forward that data to Splunk indexers. This post highlights some aspects of setting up a syslog-ng collector server in your organization.

Syslog-ng Installation

The current recommended version of syslog-ng is version 3. There are two ways to install it:

1) Download the source code from the Balabit site and compile it: http://www.balabit.com/network-security/syslog-ng/opensource-logging-system/downloads/download/syslog-ng-ose/
2) Use a third-party binary, available here: http://www.balabit.com/network-security/syslog-ng/opensource-logging-system/downloads/3rd_party

Syslog-ng Configuration

Once syslog-ng is installed, the next step is to configure how it receives data from sources and writes that data to log files. (The complete admin guide is available at: http://www.balabit.com/support/documentation) The configuration file is broken down into the following sections:

Sources
Destinations
Filters
Logs
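
As a rough orientation, a configuration file containing one statement of each kind might look like the sketch below. All of the names, the file path, and the version number are placeholders for illustration; the @version line in particular is an assumption and should match the syslog-ng release that is actually installed.

```
# The version below is assumed; set it to match the installed syslog-ng release.
@version: 3.3

# Sources: where messages come from
source s_example { internal(); };

# Destinations: where messages are written
destination d_example { file("/var/log/data/example.log" create_dirs(yes)); };

# Filters: which messages are of interest
filter f_example { level(info..emerg); };

# Logs: connect a source, a filter, and a destination
log { source(s_example); filter(f_example); destination(d_example); };
```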

Each of these sections is examined below.

Sources

A source defines where syslog-ng receives data from. For collecting syslog data, this is usually UDP port 514. However, it is possible to collect data from other network sources. For example, a customer might have a call reporting system that can be configured to send syslog data via TCP to a configurable port; a source can be created to handle this input. Below is an example of a source construct.

source s_syslog {
    udp(ip(0.0.0.0) port(514));
};

“source” is the variable type
“s_syslog” is the name for this source variable. The “s_” is a variable naming convention
“udp” specifies the protocol of the data source
“ip(0.0.0.0)” specifies the network interface to receive data on for this source. 0.0.0.0 means to capture data arriving on any interface of the host.
“port(<port_number>)” specifies the port to receive data on for this source
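
The call reporting system mentioned above could be handled with a TCP source along the following lines. This is only a minimal sketch: the source name s_call_reporting and port 1514 are placeholders chosen for illustration, not values taken from any particular product.

```
# Hypothetical TCP source for a call reporting system.
# The name and the port number are assumptions; use the port
# that the sending system is actually configured to send to.
source s_call_reporting {
    tcp(ip(0.0.0.0) port(1514));
};
```

On most Unix-like systems, binding to a port below 1024 requires root privileges, which is one reason a higher port is often picked for additional inputs like this.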

It may also be desirable to collect some locally generated data, such as messages from syslog-ng itself and from the kernel. The following configuration can be added to the sources section to do this.

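source s_local {
    # messages generated by syslog-ng itself
    internal();
    # messages from local applications that log to /dev/log
    unix-stream("/dev/log");
    # kernel messages, labeled with the program name "kernel: "
    file("/proc/kmsg" program_override("kernel: "));
};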

Destinations

A destination is where the data received from a source is sent. In most cases this will be a log file. Below is an example of a destination:

destination d_dns_query { file("/var/log/data/dns_query.log" create_dirs(yes)); };

“destination” is the variable type
“d_dns_query” is the destination variable name. The “d_” is a variable naming convention
“file("<log_file_path>")” specifies the location of the log file
“create_dirs(yes)” specifies whether or not to create directories that may not currently exist in the log file path
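
The path of a file destination can also contain syslog-ng macros, so that each sending host gets its own directory and a new file is started each day. The sketch below assumes a base directory of /var/log/data/hosts and a destination named d_per_host; both are made up for illustration.

```
# Hypothetical destination: one directory per sending host, one file per day.
# $HOST, $YEAR, $MONTH and $DAY are syslog-ng macros filled in for each message.
destination d_per_host {
    file("/var/log/data/hosts/$HOST/$YEAR-$MONTH-$DAY.log" create_dirs(yes));
};
```

With a layout like this, create_dirs(yes) matters, because the directory for a host will not exist until that host's first message arrives. Keeping the host name in the path can also make it easier to tell which device each file came from when the files are later picked up by a forwarder.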

Filters

A filter is a matching rule applied to the data coming from a source, so that only the data that is needed is sent on to the destination. Below is an example of a filter:

filter f_dns_query { match("queries" value("PROGRAM")); };

“filter” is the variable type
“f_dns_query” is the filter variable name. The “f_” is a variable naming convention
“match("<matching_expression>")” is the expression to match against the data coming in from the data source
“value("<field_name>")” specifies which part of the incoming message the matching rule is applied to; in this example, the PROGRAM field
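
Besides match(), syslog-ng provides other filter functions such as host() and level(), and expressions can be negated with not or combined with and and or. The filters below are illustrative only; the filter names and the "fw-" host naming scheme are assumptions.

```
# Messages whose sending host name starts with "fw-" (hypothetical naming scheme).
filter f_firewall { host("^fw-"); };

# Messages with a severity of err or higher.
filter f_errors { level(err..emerg); };

# Everything that is NOT a DNS query message (same match as f_dns_query, negated).
filter f_not_dns_query { not match("queries" value("PROGRAM")); };

# Firewall messages with a severity of err or higher.
filter f_firewall_errors { host("^fw-") and level(err..emerg); };
```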

Logs

A log statement ties together a source, a destination, and (if needed) a filter. This is the statement that actually directs syslog-ng to do something. There can be any number of sources, destinations, and filters in the configuration file, but unless a log statement uses them, they are never applied. The following is an example of a log statement:

log { source(s_syslog); filter(f_dns_query); destination(d_dns_query); };

“log” specifies that this is a log statement
“source(<source_variable_name>);” specifies which data source to use
“filter(<filter_variable_name>);” specifies which filter to use
“destination(<destination_variable_name>);” specifies which destination to use
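
A configuration can contain several log statements that share the same objects. The sketch below reuses s_syslog, s_local, f_dns_query, and d_dns_query from the examples in this post and adds a catch-all destination; the name d_all and its file path are assumptions made for illustration.

```
# Hypothetical catch-all destination.
destination d_all { file("/var/log/data/all.log" create_dirs(yes)); };

# DNS query messages arriving over the network go to their own file.
log { source(s_syslog); filter(f_dns_query); destination(d_dns_query); };

# Everything from both sources is also written to the catch-all file.
log { source(s_syslog); source(s_local); destination(d_all); };
```

As written, a DNS query message matches both statements and ends up in both files. If that is not wanted, adding flags(final); to the first log statement should stop matching messages from being processed by the statements that follow it.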

Because sources, filters, and destinations are all stored as variables, they can be used in different combinations in multiple log statements, which provides a great deal of flexibility.

Last but not least, as part of a syslog-ng setup, log file rotation should be configured on the collector server so that the log files are periodically deleted after they have been ingested into Splunk.
