Kognitio analytical platform:
tech profile

4. True parallelism is all about the CPUs

Having all the data of interest held in computer memory does not, in itself, make an analytical platform fast.

What makes an analytical database fast is its ability to bring all of the available processing (CPU) power to bear against a given set of data.

Analytical queries, when compared with transactional queries, are generally looking for patterns of behavior across large portions of data, as opposed to transactional queries, which usually hit a few rows and involve relatively simple operations.

Analytics usually involves complex processing operations, across very large row sets and, assuming they are not in any way disk I/O bound, become CPU bound.

Because of the compute-intensive nature of analytical queries, the ratio of data-to-CPU-cores becomes very important and is a key measure of a platform’s ability to “do work” with data, as opposed to its ability to simply store data. In a disk-based platform, this ratio is very high since the data can only be accessed quickly enough to keep a small number of CPU cores busy. Holding data in very fast computer memory allows very large amounts of CPU cores to be kept efficiently busy for the duration of a user’s query. Kognitio analytical platforms typically have a data-to-core ratio of 4–16 GB per core.

To support the sorts of data volumes that organizations are trying to analyze today, an in-memory analytical platform must be able to parallelize every individual query operation across any number of CPU cores, from just a few to tens of thousands.

Kognitio does exactly that: its in-memory analytical platform can scale from a single core on a single CPU, in a single server, to thousands of cores, in hundreds of individual servers. Each and every core, can participate equally in every single query or a query can be executed on smaller sub-set of cores, thereby allowing for greater concurrency. Advanced techniques such as dynamic machine code generation are used to ensure that every CPU cycle is efficiently used. This is true parallelism.

As data volumes grow, true parallelism allows a platform to be expanded a few servers at a time, maintaining the data/core ratio and keeping query performance constant. This is linear scalability.

Kognitio is able to support true parallelism because it was designed from inception to do so. The fact that Kognitio has always been in-memory and performance has always been CPU bound, has fostered a software design and implementation philosophy where the efficient use of every available CPU cycle is paramount. On the other hand, in software that was originally designed for a disk based platform, where CPUs generally spend much of their time waiting for disk IO, CPU efficiency was a very low priority. Such software will undoubtedly produce some performance benefit when used with more data held in RAM, but nowhere near what can be achieved by platforms that are designed to fully exploit the benefits of an in-memory architecture.

Concurrency

Parallelism delivers raw performance which is experienced by the end-user as less waiting time for their query to complete – vital for modern interactive drag-and-drop GUI tools and dashboards. This power can also be harnessed to meet the needs of many users, all of whom want improved performance at those busy times of day. Concurrency can be measured in terms of connected users but most importantly (for the active users) should be measured in terms of number of simultaneously executing queries. This is relatively easy on OLTP platforms where the granularity of many small accesses can be accommodated. In an analytical platform the user queries are typically running against large sets of data with complex computation steps. Efficient CPU utilization and minimal IO wait is vital to reduce contentions for platform resource. Kognitio excels at this, and by intelligently (see Intelligent Parallelism) splitting data and queries across all of the CPU cores it can achieve interactive performance and high concurrency even for very large data-sets.

Fast and flexible

Along with fast query performance Kognitio also provides a great deal of flexibility. It can operate as a massively parallel SQL engine and as a platform for parallelizing complex mathematical algorithms written in almost any language. This very flexible capability allows companies to fully parallelize almost any algorithm, whether it be a proprietary, open source, or commercially available algorithm. This makes Kognitio a very powerful data science “sandbox” environment as well as a scalable production platform.

Customers are adopting Kognitio for critical business applications for the following key reasons:

  • Very high query and analytical performance
  • High concurrency to support growing user communities
  • Well proven robust and fully featured implementation
  • Ease of interfacing third-party tools and applications

5. Ultra-fast, interactive analytics on Hadoop

Hadoop has experienced widespread adoption as the preferred scalable data storage and processing platform for organizations trying to gain insight from ever-increasing volumes of varied data (big data).

Keep reading