Learning to swim in the data lake

One key trend expected to gain momentum this year is the use of a single, unified repository for all of a business's data, rather than siloing information away in separate departmental data warehouses. This 'data lake' approach has been made possible by technologies such as Hadoop, which can process these huge volumes of data far faster and at a much lower cost than was previously possible.

TechTarget noted that these promises of better efficiency and lower costs are attracting many companies to this big data analytics strategy, particularly in data-intensive industries such as telecommunications, healthcare, manufacturing and financial services.

The publication explained: "The data lake concept takes Hadoop deployments to their extreme, creating a potentially limitless reservoir for disparate collections of structured, semi-structured and unstructured data generated by transaction systems, social networks, server logs, sensors and other sources."

In some cases, Hadoop-based data lakes can become the centrepiece of complex analytics architectures.
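To make the idea concrete, a minimal sketch of that pattern might look like the following, assuming a Spark-on-Hadoop deployment; the HDFS paths, file formats and column names are purely illustrative and not drawn from any particular vendor's or company's setup.

```python
from pyspark.sql import SparkSession

# Minimal sketch: land disparate data in HDFS and query it side by side.
spark = (SparkSession.builder
         .appName("data-lake-sketch")
         .getOrCreate())

# Structured data: CSV exports from a transaction system (hypothetical path and schema).
transactions = spark.read.option("header", True).csv("hdfs:///lake/raw/transactions/")

# Semi-structured data: JSON clickstream or social feed records.
clicks = spark.read.json("hdfs:///lake/raw/clickstream/")

# Unstructured data: raw server logs kept as plain text for later parsing.
logs = spark.read.text("hdfs:///lake/raw/server_logs/")

# Because everything lands in one place, ad-hoc queries can run across sources
# without first forcing each one into a warehouse schema.
transactions.createOrReplaceTempView("transactions")
daily_totals = spark.sql(
    "SELECT order_date, SUM(amount) AS total FROM transactions GROUP BY order_date"
)
daily_totals.show()
```

The point of the sketch is simply that raw data of very different shapes can sit in the same reservoir and still be queried together, which is what makes the lake attractive as the centrepiece of an analytics architecture.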

But businesses need to be wary when approaching this solution: if projects are not managed in the right way, a data lake can end up creating more problems than it solves. For instance, Gartner analyst Nick Heudecker told TechTarget that if proper precautions are not taken with issues such as security and data governance, "it could result in piles of information that could be breached, or from which bad decisions could be made".

What's more, the analytics skills needed to get the most tangible business benefits from the solution remain in short supply, he continued.

Sai Devulapalli, head of product marketing and data analytics at Pivotal, also noted that data lakes are still at a nascent stage of development and the underlying technologies are not simple to use. Coupled with this, overall adoption of the data lake approach remains in the low double digits, despite all the hype about Hadoop's potential, which means there are few deployment guides businesses can turn to for information.

However, despite the challenges, firms that do make the effort are seeing strong results. One company witnessing positive outcomes is US insurance provider Allstate, which turned to a Hadoop data lake to help meet its twin goals of boosting revenue and creating a better customer experience.

Mark Slusar, a quantitative research and analytics fellow at the firm, explained to TechTarget how the company uses a data lake to sift through decades' worth of data that had previously been siloed across many different databases.

"Previously, a lot of the data we looked at was only at the state level because data at the country-wide level was so large that we didn't have an effective way to work with it," he stated. However, the Hadoop-based data lake has made this data much more organised and centrally-located, while computing power is much faster than what was available in the past. As a result, operations that used to take months can now be completed in hours.

One example of how this has improved Allstate's performance is the way the firm underwrites home insurance policies, which usually cannot be done until a property inspection takes place. These inspections typically cost Allstate a few hundred dollars for a home inspector and disrupt the prospective customer's day, yet sometimes they turn out to be unnecessary.

"We were able to go through historical data for different neighborhoods and apply predictive algorithms, which identified areas where we could cut out inspections," Mr Slusar said. This enables the company to reduce the number of inspections it carried out by 20 per cent – saving Allstate more than $3 million last year.