What is ‘Big Data’?

A definition

Big Data refers to extremely large and complex sets of data that cannot be effectively processed using traditional data processing methods. It encompasses vast amounts of information that is generated from various sources, such as social media, sensors, transaction records, and more.

The defining characteristics of Big Data are commonly referred to as the three Vs: Volume, Velocity, and Variety.

The 3 Vs

  1. Volume: Big Data involves a massive volume of data that exceeds the capacity of traditional databases or software tools to handle efficiently. It typically involves terabytes, petabytes, or even larger amounts of data.
  2. Velocity: Big Data is generated and collected at an unprecedented speed. It is often produced in real-time or near real-time, requiring rapid processing and analysis to derive meaningful insights.
  3. Variety: Big Data comes in various formats and types. It includes structured data (like traditional databases), semi-structured data (such as XML or JSON files), and unstructured data (like social media posts, emails, videos, etc.). This diverse mix of data sources adds complexity to the analysis process.
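The Variety point can be made concrete with a small sketch. The snippet below (sample data and field names are invented for illustration) shows how much parsing effort each shape demands: structured data carries a fixed schema, semi-structured data describes itself, and unstructured data needs custom extraction.

```python
import csv
import io
import json

# Structured: rows with a fixed schema, like a CSV export from a database.
structured = "id,amount\n1,9.99\n2,4.50\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing but flexible, e.g. JSON.
semi = '{"user": "alice", "tags": ["login", "mobile"]}'
event = json.loads(semi)

# Unstructured: free text; meaning must be inferred from position or patterns.
unstructured = "User alice logged in from 10.0.0.5 at 09:14"
words = unstructured.split()

print(rows[0]["amount"])  # → 9.99  (fields come for free with the schema)
print(event["tags"][0])   # → login (keys are embedded in the data itself)
print(words[1])           # → alice (position-based guesswork)
```

Mixing all three in one analysis pipeline is exactly what makes Variety a hard problem.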

Tools

Dealing with Big Data requires specialized tools, technologies, and approaches to store, process, and analyze the information effectively. The goal is to extract valuable insights, uncover patterns, make informed decisions, and gain a deeper understanding of complex phenomena by leveraging the immense amount of data available.

One tool that is frequently used today is Splunk. It allows the storage of events from all sorts of sources. With its ‘schema on the fly’ approach, searching this big pile of data is easy and fast. Splunk also offers features like data models, which make it possible to prepopulate results and make searches even more efficient. It also makes it possible to treat event-based log data and metric data within the same platform.
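The idea behind ‘schema on the fly’ (often called schema-on-read) is that raw events are stored as-is and fields are only extracted at search time. This is a minimal sketch of that idea, not Splunk’s actual implementation; the log format, field names, and `search` helper are invented for illustration.

```python
import re

# Raw events stored as-is; no schema is imposed at ingest time.
raw_events = [
    "2024-05-01T09:14:02 action=login user=alice src=10.0.0.5",
    "2024-05-01T09:14:07 action=logout user=bob src=10.0.0.9",
]

def search(events, **filters):
    """Extract key=value fields at search time, then filter on them."""
    results = []
    for line in events:
        # The 'schema' only exists here, at read time.
        fields = dict(re.findall(r"(\w+)=(\S+)", line))
        if all(fields.get(k) == v for k, v in filters.items()):
            results.append(fields)
    return results

print(search(raw_events, action="login"))
# → [{'action': 'login', 'user': 'alice', 'src': '10.0.0.5'}]
```

Because nothing is fixed at ingest, a new question only requires a new extraction at search time, never a reload of the data.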

Furthermore, Splunk ships with built-in visualisations that make creating dashboards easy. Like most data platforms today, it offers built-in replication within a cluster and fast failover if one of your indexer or search head cluster nodes fails.

We will discuss Splunk’s terminology in later articles, because it is impossible to cover it all now.

In this big data tooling landscape there are also other vendors with comparable platforms, such as Microsoft with Sentinel and IBM with QRadar.

Let’s explore this big data world further in the coming weeks and months.
