Many businesses are set to adopt Apache Spark as a key part of their big data analytics initiatives this year, with a new survey suggesting the technology will see major growth.
Introduced last year, the technology is an open-source platform for large-scale data processing that promises to greatly reduce the time needed to run tasks within the Hadoop framework. As a replacement for MapReduce, it can handle both batch processing and newer workloads such as streaming, interactive queries, and iterative algorithms.
A study by Typesafe has revealed that awareness of the tool is on the rise. The company described the trend as having a 'hockey stick' shape, with a sharp upswing in interest this year. It found more than seven out of ten respondents (71 per cent) have some level of evaluation or research experience with Spark, while 35 per cent stated they intend to adopt it soon.
Spark is also becoming increasingly attractive to companies that already have big data analytics solutions in production. Of these respondents, 82 per cent stated they are keen to replace MapReduce with Spark as their core processing engine.
The main reason for this is the faster data processing that Spark promises to deliver to enterprises. More than 78 per cent of professionals cited Spark's improved processing power compared with MapReduce as a key benefit.
Meanwhile, two-thirds of respondents said the ability to process event streams is an important factor, as this is a capability MapReduce lacks.
Dr Dean Wampler, big data architect at Typesafe, commented that developer interest in Spark has been fuelled by demands to process data at ever-faster speeds.
He added: "Hadoop's historic focus on batch processing of data was well supported by MapReduce, but there is an appetite for more flexible developer tools to support the larger market of 'mid-size' datasets and use cases that call for real-time processing."
As more companies look towards real-time and predictive analytics to improve their decision-making and get ahead of their competitors, speed will increasingly become a key consideration when they examine possible solutions.
Matei Zaharia, chief technology officer at Databricks and vice-president of Apache Spark, said he is particularly excited by the breadth of use cases emerging for Spark, which range from batch jobs to streaming and machine learning.
"It's this type of direct feedback and dialogue with our community that enables us to continue to improve the usability, performance and built-in libraries of Spark," he added.
The survey also found that while businesses have some concerns about barriers to adoption, these will not usually be major stumbling blocks. Respondents named a lack of in-house experience with the technology, the perceived immaturity of some Spark components and uncertainty over how the tools will integrate with other middleware and management tools as issues that will need to be addressed.
But with one in five developers planning to use Spark in 2015 – and 17 per cent of respondents already in production with the technology – many of these issues are likely to be solved as familiarity with the tool grows.