Kognitio analytical platform:
tech profile

5. Ultra-fast, interactive analytics on Hadoop

Hadoop has experienced widespread adoption as the preferred scalable data storage and processing platform for organizations trying to gain insight from ever-increasing volumes of varied data (big data).

There are a number of reasons for this: massive scalability, resilience and low cost being the most important. The open source nature of Hadoop, and the consequent lack of license fees, is attractive to organizations with rapidly growing data volumes and increasingly demanding analytics requirements, as is the ability to build very large-capacity, resilient clusters from cheap, widely available, generic hardware.

While Hadoop can effectively be used to build a cost-effective and hugely scalable data storage and processing platform, giving business users access to the data in Hadoop can be difficult. Hadoop was originally conceived as a batch data processing environment that could be programmed by engineers and data scientists to perform complex data processing tasks on big data.

Early Hadoop was a NoSQL platform, and SQL, the language used by the vast majority of data analytics and visualization tools to access and query data, was widely regarded as obsolete.

In recent years, the Hadoop market realized that the lack of a meaningful SQL interface seriously limited the degree to which the wider business could benefit from the data stored in the platform, prompting a scramble to develop SQL interfaces to Hadoop (SQL on Hadoop).

Unfortunately, the SQL interfaces that emerged could not provide the functionality and performance required for the ad-hoc, interactive analytics demanded by modern users. There are a number of reasons for this:

  • SQL is a very complex standard and implementing it from scratch is a long process, so most SQL on Hadoop solutions have poor SQL standards compliance
  • Hadoop is a parallel platform. A SQL on Hadoop solution must not only implement all the functionality in the standard, it must do so with efficient algorithms that run in parallel and scale
  • Hadoop’s standard data search engines are very slow and cannot provide interactive access
  • Interactive access to big data requires highly efficient code; Hadoop is predominantly a Java-based platform, and Java was not designed for raw efficiency
  • Modern data visualization tools have made self-serve BI and analytics pervasive across a business. Pervasive BI means the data platform must support high-concurrency workloads, and support for high concurrency comes only with product maturity

Kognitio on Hadoop

Kognitio looked at the SQL on Hadoop problem in a different way. For the past 20 years, Kognitio has been solving the problems associated with massive scale-out, ensuring maximum processor efficiency and integrity when working with data in-memory. It has more experience in this field than any other organization. Its analytical platform already had a fully functional, fully parallelized SQL interface, and Kognitio’s original platform was a cluster of networked industry-standard servers, so porting the Kognitio software to run directly on a Hadoop cluster was a relatively simple and natural step.

Kognitio and Hadoop co-exist on the same large-scale compute cluster. When running on Hadoop, Kognitio supports YARN and allows HDFS to be used for all Kognitio disk operations (metadata storage, logging and optional data storage). Integration with YARN allows Kognitio to work reliably and co-operatively (in terms of resource utilization) with any other YARN compatible applications. Kognitio on Hadoop adapts to a thinly provisioned set of processes distributed across all or part of the Hadoop cluster as allowed by YARN.
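As a rough illustration of how YARN governs the resources such a co-resident application may occupy, the standard YARN resource properties below cap what any application (Kognitio included) can request per node and per container. The values are hypothetical tuning choices, not Kognitio documentation:

```xml
<!-- yarn-site.xml fragment: illustrative values only -->
<property>
  <!-- Total memory on each node that YARN may hand out to containers -->
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>98304</value>
</property>
<property>
  <!-- Upper bound on any single container request -->
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>16384</value>
</property>
```

A long-running in-memory layer would request containers within these limits, leaving the remaining capacity for other YARN applications on the cluster.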

Kognitio is a natural fit with Hadoop infrastructures. Unlike Hadoop, Kognitio offers low latency, raw speed and support for high-concurrency, mixed workloads.
Combined with excellent industry-standard application connectivity, this lets Kognitio deliver a high-performance analytical layer for business applications that want to interoperate with Hadoop.

One interesting use case for Kognitio’s high-performance, MPP, in-memory layer on top of Hadoop is the acceleration of popular visualization and analytics tools such as Tableau, enabling users to enjoy fast, interactive analytics even when working against large volumes of Hadoop-based data.

6. Data persistence in Kognitio

As a high-performance in-memory platform, Kognitio is generally used as an acceleration layer that sits between where the data is stored and where the data is consumed.
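The acceleration-layer pattern can be shown with a toy sketch (illustrative only: Python's built-in sqlite3 ":memory:" database stands in for an in-memory SQL engine, and the table and values are invented). Data is loaded once from slow storage into memory, after which repeated ad-hoc SQL runs entirely against RAM:

```python
import sqlite3

# Stand-in for rows held in slow, scalable storage (e.g. files on HDFS).
storage_rows = [
    ("2016-01-01", "EMEA", 120.00),
    ("2016-01-01", "APAC", 75.50),
    ("2016-01-02", "EMEA", 98.25),
    ("2016-01-02", "APAC", 110.00),
]

# "Pin" the data into an in-memory SQL engine once...
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE sales (day TEXT, region TEXT, amount REAL)")
mem.executemany("INSERT INTO sales VALUES (?, ?, ?)", storage_rows)

# ...then serve repeated ad-hoc SQL entirely from RAM.
totals = dict(mem.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall())
print(totals)
```

In production the load step would read from the storage layer and the queries would arrive over standard ODBC/JDBC connections from a BI tool, but the pattern is the same: load once, query many times in memory.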
