So you’ve got Hadoop – what are the next steps?
Of all the technologies currently competing for attention in the big data analytics sector, one solution that no business can afford to ignore is Hadoop. Even though this platform is still in its relative infancy, projections for the future are highly optimistic.
Indeed, it was forecast by Forrester Research analyst Mike Gualtieri that as Hadoop continues to disrupt established ways of running analytics operations, it will become the only viable option for many users.
Speaking at the 2015 Hadoop Summit in San Jose, California, he said: “It’s a data operating system and a fundamental data platform that in the next couple of years 100 per cent of large companies will adopt.”
However, there’s a world of difference between adopting a solution and being able to make the most of it. While many companies may be driven to explore Hadoop as a result of the hype surrounding it, relatively few understand exactly how they will leverage the solution to improve their business once it is established.
More than just a storage solution
One of the biggest reasons why Hadoop deployments fail is that businesses do not use them to their full potential. In many cases, Hadoop is simply used as a cheap storage solution into which companies can dump all their data without really considering what they will do with this resource; the processing potential of Hadoop and its ecosystem is undervalued or misunderstood.
Cost-effective storage is only one of Hadoop’s key benefits, but focusing on it alone can lead businesses to overlook the powerful analytics platform it is capable of being. Coupled with the sometimes steep learning curve for the technology and its many components, it’s easy to see why companies fail to take full advantage of its potential.
The result is that instead of a useful ‘data lake’, where all of a business’s digital assets are easily available for continual analysis, companies end up with a ‘data attic’, in which lots of data is just parked and then forgotten about for many months. In these cases, by the time data scientists return to these attics, they will struggle to achieve timely value.
A clear plan
To avoid this, it’s vital that companies engage with their data as soon as possible. Even if they are not yet ready for running full analytics operations, encouraging users to pay close attention to the information they are inputting into their system has clear near-term benefits.
Therefore, it’s important that businesses don’t approach their Hadoop deployments with an attitude that sees them put all their data into the tools first and figure out what to do with it later. In order to be successful, a clear path to results will be needed, so users at all stages understand what the end goal is and what steps will need to be taken along the way to achieve it.
If businesses don’t have such plans in place from the start, and instead treat their Hadoop deployment as a data attic, then when they do eventually come back and look at big data analytics, they’re likely to find a sprawling mess of disparate data that requires a lot of work to convert into useful insights.
Realising the true value
One of the best ways to prevent this is to ensure your business takes the time to assess its data estate for potential value right from the start. Instead of simply shovelling every scrap of raw data it collects into a Hadoop storage solution, a company needs to effectively ‘triage’ its information base to determine whether each element will enhance or detract from the collective value.
Things to consider at this stage include the quality of the data (how likely is it to be complete, clean and accurate?) and how relevant you expect it to be for future use. It’s all too easy to just add data under the assumption that these are concerns to be thought about later, but doing this just creates potential clutter. To derive true value from your big data analytics, you need to plan carefully and appreciate that Hadoop needs to be much more than just a low-cost place to store your growing volumes of data.
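As a rough illustration of the triage idea above, the check for completeness could be as simple as the Python sketch below. It is not tied to any Hadoop API. The function name `triage`, the `required_fields` list and the 0.8 threshold are all hypothetical choices made for this example, and a real assessment would also need to cover accuracy and relevance.

```python
from dataclasses import dataclass


@dataclass
class TriageResult:
    completeness: float  # share of required fields that are populated
    keep: bool           # True if the batch meets the chosen threshold


def triage(records, required_fields, min_completeness=0.8):
    """Score a batch of records on completeness before loading it.

    All names and the default threshold are illustrative assumptions.
    """
    total = len(records) * len(required_fields)
    if total == 0:
        # Nothing to assess, so there is no case for keeping the batch.
        return TriageResult(completeness=0.0, keep=False)

    # Count required fields that are present and not empty.
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    completeness = filled / total
    return TriageResult(completeness, completeness >= min_completeness)


if __name__ == "__main__":
    batch = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": ""},
    ]
    print(triage(batch, ["id", "email"]))
```

A batch that falls below the threshold is not necessarily worthless, but flagging it before it is loaded means someone decides what happens to it, rather than leaving it to gather dust in the attic.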