Hadoop 2.0 “an important step” in data analytics

October is expected to see the general availability release of Hadoop 2.0, the next major upgrade to the data analytics tool.

Merv Adrian, an analyst at Gartner, told the New York Times' Bits blog this will mark an "important step" in the evolution of the platform, as it will make Hadoop a much more versatile solution that businesses will be able to put to a wide range of uses.

The publication observed that as big data becomes more essential to the operations of many firms, technologies such as Hadoop will be a crucial part of their toolbox. It noted Hadoop is increasingly seen as the bedrock of many big data solutions as it offers a relatively inexpensive way to process large quantities of data, with the next generation expected to build on these capabilities.

Mr Adrian explained Hadoop 2.0 is able to handle much larger quantities of information than its predecessor, while it will also open the door to effective real-time analytics for many firms. 

In the past, Hadoop has mostly been used to divide up very large data sets for analysis, but it could only do this in batches rather than as a continuous stream. The new release addresses this, and the technology has also been tweaked to work more easily with traditional database tools such as SQL.
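For readers who want a concrete picture of the batch model described above, the sketch below shows the canonical MapReduce word-count job in Java, the kind of workload Hadoop has traditionally run: input files are split, mapped and reduced as one finite batch rather than processed as an arriving stream. The class name and input/output paths are purely illustrative; the point of Hadoop 2.0's YARN layer is that jobs like this can share a cluster with streaming and SQL-style engines, rather than MapReduce being the only processing option.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: split each input line into words and emit (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // The job reads a fixed set of input files and writes one output
    // directory: a single finite batch, not a continuous stream.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

A job like this would typically be packaged as a jar and submitted to the cluster with something like `hadoop jar wordcount.jar WordCount /input /output`, with the paths here standing in for real HDFS locations.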

Arun Murthy, release manager for Hadoop 2.0 at the Apache Software Foundation, told the New York Times that tools such as Hadoop have become essential as the volume of data held by organizations grows, along with the demand for fast results from their analytics activities.

"Everybody has the amount of data Yahoo! and Google did five years ago," he explained. Many companies are therefore trying to find useful insight from this information – be it gathered from the web, social media, web or sensors – and will want to do so as cost-effectively as possible.

Hadoop may offer businesses of all sizes and in all sectors the tools they need to make the most of their data, as analysts have noted that the marketplace for these solutions is much more fluid than for other open-source technologies, such as Linux.

The New York Times stated Hadoop uses a more permissive open-source license than Linux, which means companies can add extra features of their own choosing to their Hadoop offerings. As a result, there is a wide range of options available.

"There are different elements in these distributions," Mr Adrian said. "And the question for corporate customers is 'who am I going to place my bet with?'."

The publication noted that such vibrant competition is a typical hallmark of young, fast-growing markets. However, with a great deal of uncertainty remaining in the big data analytics sector, companies will have to study their options carefully when making a decision, which may delay some businesses from getting on board with big data.

While 64 percent of companies surveyed by Gartner this summer said they plan to make big data investments in the next two years – or have already done so – nearly a third said they are not yet looking to get involved.

The New York Times said this suggests many firms will demand proof of the benefits of the technology before they commit to a strategy.