Splunk can be deployed in two ways: as a standalone box or as what is called a ‘distributed deployment’.
What are the differences?
In a standalone deployment you will have one server that does all of the work. In a distributed deployment you will have more than one machine running Splunk and each of them gets one or more roles assigned, which define the tasks they are responsible for.
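The role model described above can be sketched in a few lines of Python. This is a minimal illustration, not anything Splunk ships. The `Role`, `Server`, `deployment_type` and `unassigned_roles` names and the hostnames are hypothetical. The role list covers only a few common Splunk server roles. The classification follows the rule in the paragraph above: one machine is standalone, and more than one is distributed.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """A few common Splunk server roles (illustrative, not exhaustive)."""
    SEARCH_HEAD = "search head"
    INDEXER = "indexer"
    DEPLOYMENT_SERVER = "deployment server"
    LICENSE_MANAGER = "license manager"


@dataclass
class Server:
    hostname: str                                   # hypothetical hostnames below
    roles: set[Role] = field(default_factory=set)  # one or more roles per server


def deployment_type(servers: list[Server]) -> str:
    """Classify a deployment by how many machines run Splunk."""
    if not servers:
        raise ValueError("a deployment needs at least one server")
    return "standalone" if len(servers) == 1 else "distributed"


def unassigned_roles(servers: list[Server], required: set[Role]) -> set[Role]:
    """Return the required roles that no server has been given yet."""
    assigned = set().union(*(s.roles for s in servers))
    return required - assigned


# A standalone box carries every role itself.
standalone = [Server("splunk01", set(Role))]

# A small distributed deployment splits the roles across machines.
distributed = [
    Server("sh01", {Role.SEARCH_HEAD}),
    Server("idx01", {Role.INDEXER}),
    Server("idx02", {Role.INDEXER}),
    Server("mgmt01", {Role.DEPLOYMENT_SERVER, Role.LICENSE_MANAGER}),
]

print(deployment_type(standalone))   # standalone
print(deployment_type(distributed))  # distributed
print(unassigned_roles(distributed, set(Role)))  # set()
```

The `unassigned_roles` helper checks that nothing falls through the cracks when roles are split up. In a distributed design, every task still has to be owned by at least one machine.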
When should you choose which deployment type?
Standalone deployment
Let’s be clear: a standalone Splunk deployment is not something you run in production. Standalone deployments are used for testing, for raising awareness about the product and for user demos.
A distributed environment, on the other hand, is the way to go if you want a scalable and performant solution. You will usually move to one after you have won over some key users during demos on your standalone box.
Distributed deployment
In a distributed deployment, as the name implies, the different server roles are split across different servers. The number of servers you need depends on so many factors that it is difficult to give even a ballpark figure. Just keep in mind that as soon as you have two or more servers, you can call your deployment a distributed one. I would say 99.9% of the production deployments out there are distributed deployments.
Factors that the design of a distributed deployment depends on
As I already mentioned, a distributed deployment can range from 2 to … servers. There are some important questions you need to ask yourself before you start putting your deployment together. Below you will find a good list to start with:
- What is my daily data volume?
- What types of sources will I receive data from, and how will I receive it (files, APIs, scripts, …)?
- What are the methods to get data in?
- Can I install Universal Forwarders (UFs) on every machine whose data needs to be ingested, or do I need to look for other methods?
- How will my data access matrix look? Who will have access to which data?
- Do I need to do complex data routing within my environment?
- Do I need to keep this data available at all times?
- Do I want data to be available as locally as possible to my users or not?
- What will my storage needs be to accommodate all of this?
- …
This list is far from complete but it gives you a good starting point. Remember: don’t do it fast, but do it right!
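Several of these questions feed into the same calculation. Daily data volume, availability (data replication) and retention together drive the storage estimate. The sketch below shows that arithmetic under stated assumptions. The function name is hypothetical. The 15% (compressed rawdata) and 35% (index files) ratios are a commonly cited rule of thumb relative to incoming raw volume, not a guarantee, because real ratios vary a lot with the data. Replication and search factors only apply if you run an indexer cluster. The model assumes every replicated copy stores rawdata, while only searchable copies also store index files.

```python
def estimate_storage_gb(
    daily_gb: float,
    retention_days: int,
    replication_factor: int = 1,
    search_factor: int = 1,
    rawdata_ratio: float = 0.15,  # assumed: compressed rawdata vs. incoming volume
    index_ratio: float = 0.35,    # assumed: index files vs. incoming volume
) -> float:
    """Rough total disk estimate across all indexers, in GB (hypothetical helper).

    Every copy of the data keeps compressed rawdata; only searchable
    copies also keep index files.
    """
    if search_factor > replication_factor:
        raise ValueError("search factor cannot exceed replication factor")
    per_day = daily_gb * (rawdata_ratio * replication_factor + index_ratio * search_factor)
    return per_day * retention_days


# 100 GB/day kept for 90 days on a single, non-replicated copy.
print(f"{estimate_storage_gb(100, 90):,.0f} GB")        # 4,500 GB

# The same volume in a cluster keeping 3 copies, 2 of them searchable.
print(f"{estimate_storage_gb(100, 90, 3, 2):,.0f} GB")  # 10,350 GB
```

The gap between the two results shows why the availability question matters so much: keeping data available at all times more than doubles the disk footprint in this example, before any headroom is added.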