
Apache introduces Hadoop 2

The Apache Software Foundation has announced that the next version of Hadoop, its open-source big data analytics tool, reached general availability this week.

Hadoop 2 is a major milestone in the development of the software and has been more than four years in the making. It should offer businesses greater reliability and scalability when conducting large-scale data analytics.

Arun Murthy, release manager for Hadoop 2 and founder of Hortonworks, said: "Hadoop 2 marks a major evolution of the open source project that has been built collectively by passionate and dedicated developers and committers in the Apache community who are committed to bringing greater usability and stability to the data platform."

Apache Hadoop enables data-intensive distributed applications to easily work with exabytes of data spread over thousands of nodes, which means organizations can cost-effectively store, process, manage and analyze the growing volumes of information they gather in their everyday activities. 

Hadoop has been described as a "Swiss army knife of the 21st century" by the Media Guardian Innovation Awards, which gave it the Innovation of the Year prize in 2011. Some of the world's largest and most data-intensive organizations have adopted the tool, including Amazon Web Services, Apple, Facebook, eBay and Twitter.

The most significant new feature in version 2 is YARN, which sits on top of HDFS and serves as a large-scale, distributed operating system for big data applications. YARN changes how Hadoop performs its processing: by sharing cluster resources more efficiently, it lets users run multiple applications simultaneously. Apache describes it as a "cornerstone" of the update.

The release also adds high availability for Apache Hadoop HDFS, support for Microsoft Windows, and HDFS Federation, which lets HDFS scale significantly further than it could in Apache Hadoop 1.x.
    
Aaron Myers, a member of the Apache Hadoop Project Management Committee and an engineer at Cloudera, said: "The community has stepped up to the challenge of making Hadoop enterprise-ready, hardening the filesystem, providing high availability, adding critical security capabilities, and delivering integrations to enable consolidation of any kind or amount of enterprise data."

He described the new release as "another step" for the project: "Beyond the basic multitenancy customers have enjoyed for the past year, enabling them to mix batch, interactive and real-time workloads, they now have the ability to do so from within a stable foundational part of the Hadoop ecosystem."

Mr Myers said that with the stable GA release of Hadoop 2, every distribution of Apache Hadoop will enjoy these benefits, ensuring that customers can deliver the applications they need on a single platform.

Doug Cutting, the creator of Hadoop and a board member of the Apache Software Foundation, said that what was originally designed as a scalable batch-processing system for Java programmers has become one of the most vital components of a big data analytics solution.

He added that a major reason for Hadoop's success has been its commitment to open source, which has allowed a wide range of users and vendors to collaborate on improving the shared platform.

Kognitio, a pioneer in the big data analytics space, is focused on providing its in-memory analytical platform to organizations that wish to integrate Hadoop into their infrastructure. Modernizing their data platform in this way reduces their reliance on large data warehouse vendors and enables them to run complex analytics on large volumes of data wherever it resides.

Paul Groom, Kognitio's chief innovation officer, said: "We are excited about the advances in Hadoop 2.0 and have engineered advances into our Analytical Platform that enable organizations to get even higher levels of productivity out of their Hadoop clusters."