Kognitio on Hadoop Architecture
Kognitio on Hadoop is the latest version of the Kognitio Analytical Platform, the world’s fastest in-memory data analysis engine. Kognitio on Hadoop includes full integration with YARN (Hadoop’s preferred resource manager), allowing Kognitio to share hardware infrastructure with other Hadoop applications. On Hadoop, the software is free to use, with no functionality or scaling limitations.
Hadoop and its associated software projects have provided organisations with a cost-effective way of storing and processing vast amounts of diverse data. A range of tools allows that data to be captured, analysed and consumed. However, these tools do not possess the performance levels or enterprise capabilities needed to reliably support hundreds of concurrent users. Kognitio on Hadoop can support thousands of analytical queries per second from hundreds of concurrent sessions, and can be scaled out to accommodate both larger data volumes and more query throughput.
Kognitio on Hadoop provides ultra-fast SQL access to data stored in Hadoop, at very high concurrency levels. This speed and concurrency allow organisations to deploy interactive, self-service analytical tools to thousands of end users, even when the data volumes are very large (“Big Data”).
KOGNITIO ON HADOOP
Kognitio allows users to easily pull very large amounts of data from HDFS (the Hadoop Distributed File System) into high-speed computer memory, apply massive amounts of processing power to it, and thereby answer analytical questions quickly and interactively, regardless of how big the data is.
Kognitio has a scale-out, in-memory, shared-nothing architecture. The principles of MPP scale-out and in-memory processing are now prerequisites for fast, scalable analytics on Hadoop. However, the implementations of these principles vary widely, and this causes a large variation in the performance, flexibility and maturity of the available offerings. Because Kognitio has been developing scale-out, in-memory platforms for over 20 years, the Kognitio Platform is highly efficient, extremely flexible and fully enterprise ready.
Kognitio treats every available CPU cycle as a precious commodity, because when the performance of an analytical platform is no longer disk IO bound, it becomes CPU bound. Kognitio technology also allows the available CPU power to be automatically deployed in the most effective way for the data and analytics being performed, whether that means applying all available CPU power to a single complex query over a very large data-set or splitting that power to answer thousands of queries per second, at high concurrency, on a smaller sub-set of the data.
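The trade-off between one large query and many small ones can be pictured with a small sketch. This is purely illustrative and is not Kognitio's actual scheduling logic: the function names and the rows_per_core threshold are assumptions made for this example only.

```python
import math

def cores_for_query(rows_touched, total_cores, rows_per_core=1_000_000):
    """Hypothetical rule: give a query one core per rows_per_core rows it
    touches, with a minimum of one core and a maximum of every core."""
    wanted = math.ceil(rows_touched / rows_per_core)
    return max(1, min(total_cores, wanted))

def queries_without_contention(rows_touched, total_cores, rows_per_core=1_000_000):
    """How many such queries could run at once if each is isolated
    to its own group of cores."""
    per_query = cores_for_query(rows_touched, total_cores, rows_per_core)
    return total_cores // per_query

if __name__ == "__main__":
    total = 64  # assumed core count for the illustration
    # A scan over ten billion rows is spread across every core.
    print(cores_for_query(10_000_000_000, total))
    # A lookup over a thousand rows is confined to a single core,
    # so many such lookups can run side by side.
    print(cores_for_query(1_000, total), queries_without_contention(1_000, total))
```

Under these assumed numbers, the large scan occupies the whole machine while small lookups each take one core, which is the general shape of the behaviour described above.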
In addition to the ultra-fast SQL, Kognitio also has sophisticated NoSQL support that allows advanced analytical algorithms to be executed at scale and in almost any language, for example R, Python, Java or C.
KOGNITIO ON HADOOP FUNDAMENTALS
How does it do it?
- Disk IO bottlenecks are eliminated by holding the data in high-speed computer memory (RAM). This allows large amounts of processing power to be simultaneously applied to the data.
- Data is held in structures that are optimized for in-memory analysis; it is not a transient copy of disk-based data like a traditional cache.
- Massively Parallel Processing (MPP) allows platforms to be scaled-out across the largest of Hadoop clusters.
- True query parallelization allows queries on very large data-sets to use every processor core, on every processor (CPU), on every server equally.
- Granular queries that access smaller sub-sets of data can be automatically isolated to a sub-set of CPU cores allowing hundreds of these queries to be satisfied simultaneously with zero computational contention.
- Processor efficiency is very high. Kognitio's development languages and implementation techniques are chosen to ensure every CPU cycle is utilized effectively.
- Machine code generation and advanced query plan optimization techniques further ensure that every processor cycle is used as effectively as possible.
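The general shared-nothing, parallel aggregation pattern behind the points above can be sketched in a few lines of Python. This is a minimal illustration of the technique, not Kognitio code: the function names are hypothetical, a thread pool stands in for separate servers and cores, and the row layout (id, region, amount) is an assumption made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def partition_rows(rows, num_workers):
    """Hash-distribute rows so each worker owns a disjoint slice of the data."""
    partitions = [[] for _ in range(num_workers)]
    for row in rows:
        partitions[hash(row["id"]) % num_workers].append(row)
    return partitions

def partial_sum(partition):
    """Each worker aggregates only its own in-memory slice."""
    totals = defaultdict(float)
    for row in partition:
        totals[row["region"]] += row["amount"]
    return dict(totals)

def merge_partials(partials):
    """Combine the small per-worker results into the final answer."""
    merged = defaultdict(float)
    for partial in partials:
        for region, amount in partial.items():
            merged[region] += amount
    return dict(merged)

def parallel_sum_by_region(rows, num_workers=4):
    """Roughly equivalent to: SELECT region, SUM(amount) ... GROUP BY region."""
    partitions = partition_rows(rows, num_workers)
    # Threads stand in for independent nodes; no worker reads another's slice.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    return merge_partials(partials)

if __name__ == "__main__":
    rows = [
        {"id": i, "region": "north" if i % 2 == 0 else "south", "amount": 1.0}
        for i in range(100)
    ]
    print(parallel_sum_by_region(rows))
```

Because each worker touches only its own partition, adding workers spreads the scan across more cores, and only the small partial results need to be brought together at the end.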