Why Hadoop is a “must-have” for large firms

With the launch of Hadoop 2 earlier this month, the big data analytics tool has added even more features that should encourage businesses of all sizes to investigate what the technology could do for them.

This is especially true of the largest enterprises, which are generating huge amounts of information on a daily basis. Forrester analyst Mike Gualtieri noted that if an organization is handling a lot of data, there will almost certainly be a "sweet spot" within the business where Hadoop can be hugely beneficial.

He added that, because of its unique approach to data management, which transforms the way companies store, process, analyze and share big data, the technology is a "must-have infrastructure for large enterprises".

Mr Gualtieri identified five key reasons why these firms should be looking to incorporate Hadoop into their operations as soon as possible.

Build a wide data repository

Many firms end up creating large amounts of 'dark data' that they are unable to gain any insight from – and this potentially valuable information is either deleted or ignored. With HDFS, Hadoop's distributed file system, users are able to store huge amounts of information and scale quickly across multiple nodes.

Mr Gualtieri said: "Firms can use Hadoop data lakes to break down data silos across the enterprise and commingle data from CRM, ERP, clickstreams, system logs, mobile GPS and just about any other structured or unstructured data that might contain previously undiscovered insights."

Cheap, quick processing

He noted that the MapReduce distributed data processing framework greatly improves the speed at which information can be analyzed, as it enables organizations to run processing in parallel instead of reading the data from files serially.

The result is that large quantities of data can be processed in minutes or hours, rather than the days required by older data analytics tools.
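The map-then-reduce pattern behind this speedup can be illustrated with a minimal word-count sketch. This is only an analogy in Python, not Hadoop's actual API (real Hadoop jobs are typically written in Java against the MapReduce framework), and the thread pool here stands in for map tasks that Hadoop would schedule across cluster nodes:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_phase(record):
    # Map step: emit a (key, value) pair for each word in one input record.
    return [(word.lower(), 1) for word in record.split()]

def reduce_phase(mapped):
    # Shuffle/reduce step: group the emitted pairs by key and sum the values.
    counts = defaultdict(int)
    for pairs in mapped:
        for word, n in pairs:
            counts[word] += n
    return dict(counts)

def word_count(records, workers=4):
    # Map tasks run concurrently over independent input splits; Hadoop does
    # the same, but schedules the tasks on the nodes that hold the data.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, records))
    return reduce_phase(mapped)

print(word_count(["error disk full", "error network down"]))
```

Because each map task touches only its own slice of the input, adding more workers (or, in Hadoop's case, more nodes) shortens the wall-clock time roughly in proportion.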

Faster insight for data scientists

This speed means data scientists can reach results much more quickly while still ensuring accuracy, as they are able to test and run algorithms on massive amounts of data instead of smaller samples. "Hadoop's HDFS combined with MapReduce make it an ideal platform to run advanced analytics such as machine learning algorithms to find predictive models," Mr Gualtieri said.

See immediate returns

Because of the way Hadoop transforms and streamlines big data analytics, even early proof-of-concept deployments are likely to see strong results if implemented correctly. Many early adopters Forrester has spoken with about their solutions described their initial experiments as "wildly successful" – and this success is seen across multiple sectors. 

The market research firm observed enterprises in industries including e-commerce, finance, manufacturing, media, oil exploration and government all have large amounts of data that can be managed better with Hadoop.

A real-time future

The new features included in Hadoop 2 have greatly improved the maturity of Hadoop – which was already a lot more established than many people think. The focus for many vendors is now on enhancements such as fast SQL access, real-time streaming and better manageability features.

"The groundwork is being laid for an eruption in data management technologies as Hadoop sneaks its way into the transactional database market," Mr Gualtieri said. "Your adoption of Hadoop now, for analytical processing, will ensure that you are ready."