
Kognitio on Hadoop outperforms Spark and Impala in industry-standard benchmarking tests

Mar 05, 2017 | By admin

New tests, run using the industry-standard TPC-DS benchmark, show that Kognitio® on Hadoop returned results faster and more consistently than the Big Data SQL engines Spark and Impala running the same queries.

Read the Kognitio white paper | Read the independent evaluation of the benchmarks

The tests showed that Kognitio on Hadoop returned results faster than Spark and Impala for 92 of the 99 TPC-DS queries when running a single stream at one terabyte, a starting point for assessing performance (Figure 1). When the workload was increased to ten concurrent streams, Kognitio was still faster than its competitors for 80 of the 99 queries (Figure 2). Kognitio on Hadoop was also more reliable: it returned results within the allotted time of one hour 96 percent of the time, compared with Spark's 85 percent and Impala's 71 percent.

Platform                     Impala   Kognitio   Spark
Queries run                      73         99      89
Long running                      2          -      10
No support                       24          -       -
Fastest query count               6         92       1
Figure 1

Platform                     Impala   Kognitio   Spark
Queries run in each stream       68         92      79
Long running                      7          7      20
No support                       24          -       -
Fastest query count              12         80       0
Figure 2
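
The counts in Figures 1 and 2 can be cross-checked with a short script. The following is a minimal sketch rather than part of the published benchmark: the dictionary layout and function names are illustrative, and blank cells in the figures are assumed to mean zero.

```python
# Counts transcribed from Figure 1 (single stream) and Figure 2 (ten streams).
# ASSUMPTION: blank cells in the figures are treated as zero.
TOTAL_QUERIES = 99

figure_1 = {
    "Impala":   {"run": 73, "long_running": 2,  "no_support": 24, "fastest": 6},
    "Kognitio": {"run": 99, "long_running": 0,  "no_support": 0,  "fastest": 92},
    "Spark":    {"run": 89, "long_running": 10, "no_support": 0,  "fastest": 1},
}

figure_2 = {
    "Impala":   {"run": 68, "long_running": 7,  "no_support": 24, "fastest": 12},
    "Kognitio": {"run": 92, "long_running": 7,  "no_support": 0,  "fastest": 80},
    "Spark":    {"run": 79, "long_running": 20, "no_support": 0,  "fastest": 0},
}


def outcomes_total(row):
    """Sum the three mutually exclusive outcomes for one platform."""
    return row["run"] + row["long_running"] + row["no_support"]


def fastest_share(figure, platform):
    """Fraction of the 99 queries for which the platform was fastest."""
    return figure[platform]["fastest"] / TOTAL_QUERIES


for label, figure in (("Figure 1", figure_1), ("Figure 2", figure_2)):
    for platform, row in figure.items():
        # Every query should fall into exactly one outcome bucket.
        assert outcomes_total(row) == TOTAL_QUERIES, (label, platform)
        share = fastest_share(figure, platform)
        print(f"{label} {platform}: fastest on {share:.1%} of queries")
```

Under that assumption, each platform's outcomes add up to all 99 queries in both figures.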

Speed was also a key consideration: Kognitio on Hadoop returned results up to 178.5 times faster than Spark, and up to 30.4 times faster than Impala.

The test results also showed that Kognitio was easier to implement: it was able to run every one of the TPC-DS queries, 76 of them with no changes needed. By contrast, Spark ran only 72 of the queries "out of the box," and Impala only 55 of the 99. In fact, the results showed that Impala was unable to support 24 of the queries, nearly a quarter of the total.
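
For readers who prefer percentages, the query counts quoted in this paragraph can be converted with a few lines of Python. This is an illustrative sketch only: the variable and function names are hypothetical, and the counts are the ones stated in the text.

```python
# Query counts quoted in the paragraph above, out of 99 TPC-DS queries.
TOTAL_QUERIES = 99
ran_unmodified = {"Kognitio": 76, "Spark": 72, "Impala": 55}
IMPALA_UNSUPPORTED = 24


def percent_of_total(count, total=TOTAL_QUERIES):
    """Express a query count as a percentage of the full query set."""
    return 100.0 * count / total


for platform, count in ran_unmodified.items():
    pct = percent_of_total(count)
    print(f"{platform}: {count} of {TOTAL_QUERIES} ran unmodified ({pct:.1f}%)")

unsupported_pct = percent_of_total(IMPALA_UNSUPPORTED)
print(f"Impala: {IMPALA_UNSUPPORTED} of {TOTAL_QUERIES} unsupported ({unsupported_pct:.1f}%)")
```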

Kognitio has drawn on its worldwide experience in in-memory analytics, stretching back more than a generation, to make Kognitio on Hadoop available on a free-to-use basis, without time or capacity restrictions. Kognitio has solved many challenges that competing solutions have not been able to address, such as how to run a query in memory when the data is too large to fit in the memory available.

Details of the infrastructure used for the benchmark tests, along with timings for individual queries across all three platforms, can be found here.