There are a number of reasons for this: massive scalability, resilience and low cost being the most important. The open source nature of Hadoop, and the consequent absence of license fees, is attractive to organizations with rapidly growing data volumes and increasingly demanding analytics requirements, as is the ability to build very large, resilient clusters from cheap, widely available commodity hardware.
While Hadoop can be used to build a cost-effective and hugely scalable data storage and processing platform, providing business users with access to the data in Hadoop can be difficult. Hadoop was originally conceived as a batch data processing environment that could be programmed by engineers and data scientists to perform complex data processing tasks on big data.
Early Hadoop was a NoSQL platform; SQL, the language used by the vast majority of data analytics and visualization tools to access and query data, was widely thought to be obsolete.
In recent years, the Hadoop market has realized that the lack of a meaningful SQL interface seriously limited the degree to which the wider business could benefit from the data stored in the platform. The result was a scramble to develop SQL interfaces to Hadoop (SQL on Hadoop).
Unfortunately, the SQL interfaces that emerged could not provide the functionality and performance required for the kind of ad-hoc, interactive analytics demanded by modern users. There are a number of reasons for this:
Kognitio looked at the SQL on Hadoop problem in a different way. For the past 20 years, Kognitio has been solving the problems associated with massive scale-out, ensuring maximum processor efficiency and integrity when working with data in-memory, and it has more experience in this field than any other organization. Its analytical platform already had a fully functional, fully parallelized SQL interface, and Kognitio's original hardware platform was a cluster of networked industry-standard servers, so porting the Kognitio software to run directly on a Hadoop cluster was a relatively simple and natural step.
Kognitio and Hadoop co-exist on the same large-scale compute cluster. When running on Hadoop, Kognitio supports YARN and allows HDFS to be used for all Kognitio disk operations (metadata storage, logging and optional data storage). Integration with YARN allows Kognitio to work reliably and co-operatively, in terms of resource utilization, with any other YARN-compatible applications. Kognitio on Hadoop runs as a thinly provisioned set of processes distributed across all or part of the Hadoop cluster, as allowed by YARN.
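As an illustration of this kind of YARN-level resource co-operation, a cluster administrator might carve out a dedicated queue for an in-memory application alongside other workloads. The fragment below is a hypothetical capacity-scheduler.xml sketch using standard Hadoop CapacityScheduler properties; the queue name and percentages are assumptions for illustration, not values mandated by Kognitio.

```xml
<configuration>
  <!-- Two top-level queues: one for the in-memory layer, one for everything else -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>kognitio,default</value>
  </property>
  <!-- Guarantee 60% of cluster resources to the kognitio queue (assumed figure) -->
  <property>
    <name>yarn.scheduler.capacity.root.kognitio.capacity</name>
    <value>60</value>
  </property>
  <!-- Let the kognitio queue borrow idle capacity, up to a hard cap of 80% -->
  <property>
    <name>yarn.scheduler.capacity.root.kognitio.maximum-capacity</name>
    <value>80</value>
  </property>
  <!-- Remaining guaranteed share for all other YARN applications -->
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>40</value>
  </property>
</configuration>
```

Because YARN enforces these limits, a thinly provisioned application can expand into spare capacity when the cluster is quiet and shrink back when other YARN applications need their guaranteed share.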
Kognitio is a natural fit with Hadoop infrastructures. Unlike Hadoop, Kognitio offers low latency, raw speed and support for high-concurrency, mixed workloads. Combined with excellent industry-standard application connectivity, this allows Kognitio to deliver a high-performance analytical layer for business applications that need to interoperate with Hadoop.
One interesting use case for Kognitio's high-performance, MPP, in-memory layer on top of Hadoop is the acceleration of popular visualization and analytics tools such as Tableau, enabling users to experience fast, interactive analytics even when working against large volumes of Hadoop-based data.