What are Splunk’s magic 6?


In our last article about Splunk metadata we spoke about the sourcetype field being one of the metadata fields. In our article about getting data into Splunk we also saw that at a certain moment we had to choose a sourcetype for our data.

What we did not do was open the ‘Advanced’ section. Below you can see a screenshot with the advanced options visible doing the exact same steps as before to import a file.

NOTE: Please do NOT be overwhelmed by all the settings you see there. We will go over all of them, and what they mean, in upcoming lessons. For now we will focus on just 6 settings, the Splunk ‘magic 6’ as they are often called. (There is actually some discussion about whether it should be called the magic 8, because of two other settings that are not as relevant.)

Why are these 6 parameters so important?

These settings are important because they determine how data is parsed and indexed. I know that this vocabulary is new, but in plain English it means: these parameters decide how your events end up being saved on disk, and once the data has been indexed that CANNOT be changed anymore.

What are the ‘magic 6’?

The magic 6 are the following:

  • SHOULD_LINEMERGE
  • LINE_BREAKER
  • TIME_FORMAT
  • TIME_PREFIX
  • MAX_TIMESTAMP_LOOKAHEAD
  • TRUNCATE

SHOULD_LINEMERGE: this setting dictates whether Splunk merges several lines into one event, so that an event can span multiple lines (Default: true). Note that in most circumstances we will configure it to be ‘false’. More on that later on.

LINE_BREAKER: As we will set SHOULD_LINEMERGE to ‘false’, the LINE_BREAKER setting (Default: ([\r\n]+)) determines how events are split. The default is a regex with one capture group that matches one or more carriage return or newline characters, so Splunk splits events wherever it finds a match for this REGEX. The content of the capture group is also removed from the data before it is written to disk. Check this article to learn more about REGEXes.
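To make the capture group behaviour a bit more tangible, here is a rough sketch in Python. This is NOT how Splunk implements it internally; the function name break_events and the sample data are made up for this illustration, and Python's regex engine is not identical to the one Splunk uses. It only imitates the rule described above: the event ends where the first capture group starts, the next event begins where it ends, and whatever the group matched is thrown away.

```python
import re


def break_events(raw: str, line_breaker: str = r"([\r\n]+)") -> list[str]:
    """Rough imitation of LINE_BREAKER with SHOULD_LINEMERGE = false.

    Hypothetical helper for illustration only, not Splunk code.
    """
    pattern = re.compile(line_breaker)
    events = []
    start = 0
    for match in pattern.finditer(raw):
        # The current event ends where capture group 1 starts ...
        events.append(raw[start:match.start(1)])
        # ... and the next event starts where capture group 1 ends,
        # so the text matched by the group itself is discarded.
        start = match.end(1)
    events.append(raw[start:])
    # Drop empty events (for example when the data ends with a newline).
    return [event for event in events if event]


raw_data = "event one\r\nevent two\n\nevent three\n"
print(break_events(raw_data))
# ['event one', 'event two', 'event three']
```

The same sketch also shows why a custom LINE_BREAKER is handy for multi-line events: with a pattern such as ([\r\n]+)\d{4}-\d{2}-\d{2}, a new event only starts at a newline that is followed by a date, so indented continuation lines stay attached to the event they belong to.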

TIME_FORMAT: the format in which the time is written in your log file, expressed with strptime-style variables. A list of time variables to use can be found here. (Default: empty string)

TIME_PREFIX: a REGEX that matches the text directly preceding the timestamp in your event. (Default: empty string)

MAX_TIMESTAMP_LOOKAHEAD: this is the number of characters that Splunk will look into your event for a timestamp, counted from the end of the TIME_PREFIX match. So this does not count from the beginning of the event (unless TIME_PREFIX is empty). In plain English, your entire timestamp has to fit within the first <MAX_TIMESTAMP_LOOKAHEAD> characters that follow the TIME_PREFIX match in your raw data. (Default: 128)
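The way TIME_PREFIX, MAX_TIMESTAMP_LOOKAHEAD and TIME_FORMAT work together can be sketched in a few lines of Python. Again, this is only an approximation under my own assumptions: the function extract_timestamp and the sample event are invented for this example, and Python's strptime does not support every variable that Splunk accepts. The idea is simply: find the prefix, cut a window of at most MAX_TIMESTAMP_LOOKAHEAD characters right after it, and try to parse a timestamp inside that window.

```python
import re
from datetime import datetime
from typing import Optional


def extract_timestamp(
    event: str,
    time_prefix: str,
    time_format: str,
    max_lookahead: int = 128,
) -> Optional[datetime]:
    """Rough imitation of timestamp extraction. Illustration only, not Splunk code."""
    start = 0
    if time_prefix:
        match = re.search(time_prefix, event)
        if match is None:
            return None  # prefix not found, so no timestamp is extracted here
        start = match.end()  # the lookahead is counted from the END of the prefix
    window = event[start:start + max_lookahead]
    # Python's strptime must consume the whole string, so try shorter and
    # shorter pieces of the window, always starting at its first character.
    for end in range(len(window), 0, -1):
        try:
            return datetime.strptime(window[:end], time_format)
        except ValueError:
            continue
    return None


event = 'host=web01 ts=2024-03-05 14:07:09 level=INFO msg="started"'

# The timestamp is 19 characters long, so a lookahead of 19 is just enough.
print(extract_timestamp(event, r"ts=", "%Y-%m-%d %H:%M:%S", max_lookahead=19))

# With a lookahead of 10 the window only holds the date, so parsing fails.
print(extract_timestamp(event, r"ts=", "%Y-%m-%d %H:%M:%S", max_lookahead=10))
```

The first call finds the timestamp, the second one returns None because the time part falls outside the window. That is exactly the kind of mistake you make when you count the lookahead from the wrong position or set it too tight.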

TRUNCATE: defines the maximum length of one event, measured in bytes rather than characters. Anything beyond that limit is cut off. (Default: 10000)
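Because TRUNCATE counts bytes and not characters, events with non-ASCII text hit the limit sooner than you might expect. The small Python sketch below illustrates that. The helper truncate_event is hypothetical, and how it handles a character that gets cut in half (it simply drops it) is my own assumption for the example, not documented Splunk behaviour. Setting the value to 0 disables truncation, which the sketch mirrors.

```python
def truncate_event(event: str, truncate: int = 10000) -> str:
    """Rough imitation of TRUNCATE. Illustration only, not Splunk code."""
    if truncate == 0:
        return event  # 0 means: never truncate
    # The limit applies to the encoded bytes, not to the characters.
    data = event.encode("utf-8")[:truncate]
    # Assumption for this sketch: silently drop a character that was cut in half.
    return data.decode("utf-8", errors="ignore")


print(truncate_event("a" * 20, truncate=10))  # 10 ASCII characters fit in 10 bytes
print(truncate_event("ééé", truncate=3))      # each 'é' takes 2 bytes in UTF-8
```

In the second call only one ‘é’ survives, even though the original string is just 3 characters long.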

What about the ‘magic 8’?

There is some debate around this, but the following parameters are sometimes counted alongside the magic 6, which turns the name into the magic 8:

  • EVENT_BREAKER_ENABLE
  • EVENT_BREAKER

The reason why they would fit in is that they also have something to do with event parsing. The reason why they would not be included in this ‘magic’ group is that these 2 are NOT indexer parameters: they only have an effect on Universal Forwarders.

EVENT_BREAKER_ENABLE: is a boolean that specifies whether event breaking is enabled on a Universal Forwarder (it is not enabled by default on UFs)

EVENT_BREAKER: is a REGEX that specifies where the event boundaries are (similar to the LINE_BREAKER setting above for indexers, but this setting will already break events on the Universal Forwarder)

I know I have already promised this several times, but the article about the server roles in a distributed environment will be the next one that I create 🙂

As always, should you have any questions, don’t hesitate to ask.
