I’ve put my data in Hadoop, so analytics will be quick, right?

Volume, variety and velocity are the three widely recognized properties that put the ‘big’ in ‘big data’. Hadoop helps businesses overcome many of the challenges these three properties create, but does it solve all of them?

Does velocity mean faster analysis?

Data is collected constantly, and at speed, from hundreds of sources. But do Hadoop’s strengths include faster querying of that data? Hearing “velocity”, you’d be forgiven for thinking it refers not only to fast data in but also to fast data out, i.e. high-speed analysis. In reality, querying your data on Hadoop is anything but fast.

Traditionally, companies imported and analyzed data using a batch process, taking a chunk of data, submitting it to the server and waiting for the results. That scheme works when the incoming data rate is slower than the batch processing rate, and when the result is useful despite the delay.

Now, with real-time data sources such as social and mobile apps (Twitter users generate nearly 100,000 tweets every minute, for example), the batch process breaks down.

Yet Hadoop on its own still lacks the ability to deliver super-fast insights from this high-velocity inbound data. This is particularly problematic if your business has hundreds of users trying to draw insights from this central data at the same time. Software like Kognitio on Hadoop, however, can solve this velocity issue: it was built specifically for massively parallel analytical query processing, enabling high-speed analytics directly on Hadoop.

You can solve the data velocity issue, after all.

What about the other two Vs?

Nowadays, the sheer volume of business data processed on a daily basis is almost taken for granted. With such wide-ranging and varied sources of internal and external information, most organizations have a robust way of aggregating and consolidating this data. Hadoop is one such platform that businesses have deployed to successfully handle their huge, fast-moving data sets.

Big data variety goes hand in hand with the sheer volume of data available. The format and structure of this data vary widely, from structured to unstructured, and it may be generated by humans or by machines. Unstructured data can include the text of emails and social media posts, voicemails, written documents, images and audio recordings.

Hadoop is an excellent platform for helping businesses sort and identify these structured and unstructured datasets and bring them together in one queryable cluster.

The missing V — value

With self-service BI becoming the norm, it’s clear that organizing your business data into a single data lake and analyzing it, no matter how quickly, simply isn’t enough to drive true value.

Businesses spent time getting data into Hadoop, only to find that it takes a serious dose of technical know-how to get the most out of it. Access to the data became limited to data experts rather than the wider community of business users.

In recent years, SQL on Hadoop software such as Kognitio on Hadoop has been built specifically for massively parallel analytical query processing, with the goal of delivering more value from your big data.

By allowing hundreds of concurrent users to access and query big data directly on Hadoop, it immediately adds real value to organizations’ high-volume, fast-moving and varied data sets.

