Although a desire to innovate and do something different is one of the main reasons many businesses invest in big data analytics, in practice companies often end up applying their information to a handful of familiar scenarios.
Hadoop deployments in particular tend to follow several key patterns that crop up time and time again. InfoWorld contributor Andrew Oliver has noted that although specific implementations vary, the basics are often very familiar.
He therefore identified seven common projects that Hadoop technology is typically used to support.
1. Data consolidation
Hadoop has proven highly useful for creating 'data lake' resources, in which information from a wide variety of sources is brought into a single location for analysis. Compared with traditional data warehousing solutions, this approach offers better horizontal scalability and lower costs.
2. Specialised analytics
Some businesses have highly specific data analysis needs: banks, for example, may need to model liquidity risk or run Monte Carlo simulations. "In the past, such specialised analyses depended on antiquated, proprietary packages that couldn't scale up as the data did and frequently suffered from a limited feature set," Mr Oliver said. Neither limitation is an issue with Hadoop.
3. Streaming analytics
Streaming, which is necessary for real-time analytics, is also facilitated by Hadoop technology. Tools that analyse data bit by bit as it arrives in a system, instead of collecting it in batches for later review, will become increasingly important as businesses expect faster results.
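As a rough, tool-agnostic illustration of the difference between batch and streaming analysis, the sketch below computes an average two ways: once after all records have been collected, and once incrementally as each record arrives. It does not use Hadoop or any streaming framework; the function names and the sample readings are hypothetical.

```python
from typing import Iterable, Iterator


def batch_average(values: list[float]) -> float:
    """Batch style: wait until every record is collected, then analyse."""
    return sum(values) / len(values)


def streaming_average(values: Iterable[float]) -> Iterator[float]:
    """Streaming style: update the result as each record arrives."""
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        # Incremental mean update, so past records need not be stored.
        mean += (value - mean) / count
        yield mean


if __name__ == "__main__":
    readings = [12.0, 15.0, 9.0, 14.0]  # hypothetical sensor readings
    for running in streaming_average(readings):
        print(f"running average so far: {running:.2f}")
    print(f"batch average: {batch_average(readings):.2f}")
```

Both approaches reach the same final figure (12.50 here), but the streaming version has an up-to-date answer after every record rather than only at the end.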
4. Complex event processing
The rising demand for real-time analytics has also created headaches for many enterprises that must handle complex events where decisions have to be made within milliseconds, such as real-time rating of call data records for telcos or processing of Internet of Things events. In the past, such systems were built on customised messaging software or high-performance off-the-shelf products, but today's data volumes are too large for either approach.
5. Streaming as ETL
A growing number of businesses also want to capture streaming data and warehouse it immediately, adding their own characteristics to the data along the way rather than simply shunting it onto a disk and analysing it later. Mr Oliver noted that tools such as Storm and Kafka are best suited to these projects.
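The idea of enriching data in flight can be sketched in plain Python. This is not Storm or Kafka code; it is a minimal, hypothetical pipeline in which each incoming record gets extra fields added (a timestamp and a flag, both invented for illustration) and is stored straight away, with a plain list standing in for the warehouse.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator


def enrich(records: Iterable[dict]) -> Iterator[dict]:
    """Transform step: add extra fields to each record as it streams past."""
    for record in records:
        enriched = dict(record)  # copy so the source record is left untouched
        enriched["ingested_at"] = datetime.now(timezone.utc).isoformat()
        # Hypothetical business rule used only for illustration.
        enriched["is_large_order"] = record["amount"] > 1000
        yield enriched


def load(records: Iterable[dict], store: list[dict]) -> None:
    """Load step: write each enriched record to the store immediately."""
    for record in records:
        store.append(record)


if __name__ == "__main__":
    incoming = iter([{"id": 1, "amount": 250}, {"id": 2, "amount": 4800}])
    warehouse: list[dict] = []  # stand-in for real storage
    load(enrich(incoming), warehouse)
    for row in warehouse:
        print(row["id"], row["is_large_order"])
```

Because `enrich` is a generator, each record is transformed and stored one at a time rather than after the whole stream has been collected.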
6. Hadoop as a service
Companies with highly specialised analytics projects are likely to find it very tough to manage several differently configured Hadoop clusters. They may try to avoid this in future by consolidating their resources, but an increasingly popular alternative is to turn to the cloud.
7. Replacing or augmenting SAS
Mr Oliver observed that while SAS is fine for many situations, it is also expensive and therefore poorly suited to experimentation. Hadoop, by contrast, gives users a cost-effective way to explore new ideas for making the most of their data. Businesses do not even have to replace their SAS investments entirely, as they can feed the results of their Hadoop work into their SAS platform to gain better insight.