Transforming Hadoop into your best BI platform:

An expert how-to guide

Addressing the three problems

So let’s talk what-ifs; let’s talk ultra-fast, high-concurrency SQL for Hadoop and data warehousing; and let’s talk about scale-out, in-memory software that enables modern BI and visualization tools to maintain their performance – even when the data volume is large and the user count high.

In other words, let’s talk Kognitio.

As a software layer sitting between the data stored in Hadoop and your business users/BI tools, Kognitio is:

  • Free to use, with no limits on scalability, capability, or duration of use
  • Proven and mature in terms of query optimization and functionality
  • Highly performant, particularly with concurrent SQL workloads
  • Available both on-premises and in the cloud
  • Deployed as a YARN application that runs on your Hadoop cluster

How does this relate to your day job? Well, every reader’s operational concerns are obviously unique, but by way of exploring generic benefits, let’s return to our three core problems.

Problem one: an inability to perform BI tasks directly on Hadoop is causing organizations to aggregate individual data sets and push them out to business users

Now we can provide the answer: with Kognitio you don’t have to take data out of Hadoop. Simple.

Instead, you access the data directly from the platform, with the speed and performance required for interactive BI. Better still, by removing the need to constantly create and share data subsets, you'll have a super-charged BI experience that's complete – and far better than anything you're currently using. What's more, in this situation you really will have a single version of the truth, and with it the opportunity to find new ways to deliver ROI and to rationalize your entire BI infrastructure.

Problem two: running data analysis directly in Hadoop is too slow, and made worse by highly concurrent workloads.

Kognitio has spent the last 25 years working out the best ways to parallelize complex SQL functionality. Thanks to this heritage, we can point to a history of empowering tools like Tableau and MicroStrategy, and of delivering breathtaking SQL performance under concurrent workloads. To deliver the fastest SQL on Hadoop today, we've migrated our engine to the platform, and we can help you run thousands of complex queries per second, serving answers to thousands of concurrent users throughout your business.
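Because Kognitio presents itself to BI tools as a standard SQL endpoint, a concurrency test looks much like any other database benchmark: many client sessions firing queries at once while you measure throughput. As a heavily simplified, hypothetical sketch (this is not Kognitio's API — we use Python's built-in sqlite3 as a stand-in engine), the pattern looks like this:

```python
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor

DB_FILE = "bench.db"  # hypothetical stand-in for a real SQL-on-Hadoop endpoint

def setup():
    # Build a small fact table to query against.
    con = sqlite3.connect(DB_FILE)
    con.execute("DROP TABLE IF EXISTS sales")
    con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)",
                    [("north", 10.0), ("south", 20.0)] * 500)
    con.commit()
    con.close()

def run_query(_):
    # Each worker opens its own connection, as each BI user session would.
    con = sqlite3.connect(DB_FILE)
    total, = con.execute("SELECT SUM(amount) FROM sales").fetchone()
    con.close()
    return total

def benchmark(n_queries=200, workers=16):
    # Fire n_queries concurrent queries from a thread pool and time them.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_query, range(n_queries)))
    elapsed = time.perf_counter() - start
    return results, n_queries / elapsed  # queries per second

setup()
results, qps = benchmark()
print(f"{len(results)} queries, {qps:.0f} queries/sec")
```

Against a real scale-out engine the workers would be separate ODBC/JDBC sessions rather than threads sharing a local file, but the measurement — sustained queries per second at a given concurrency — is the same.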

Problem three: the SQL products on the market that are dedicated to solving these problems are themselves too slow to be effective.

SQL is a large, complex standard that is difficult to implement from scratch – but this is exactly what most SQL-on-Hadoop technologies are trying to do. To make matters worse, the challenge is compounded by Hadoop being a scale-out parallel platform, meaning every SQL operation must itself be executed efficiently in parallel. This is very hard to get right, and as a result, newer SQL solutions deliver poor performance for ad hoc analysis using modern, interactive visualization tools. Kognitio, on the other hand, started out building complex parallel implementations; it's what we do, and we've got rather good at it!

The best platform for SQL on Hadoop

Want to know what your options are for getting SQL to work on Hadoop? To help, we've been running some tests: using the TPC-DS query set, we've measured the performance of Hive LLAP, Impala, SparkSQL and Presto, and compared their results to Kognitio's.

Keep reading