Analytics is a Many Splendored Thing
I recently reviewed Gartner’s Hype Cycle and Priority Matrix for ‘Big Data’ as part of a strategy exercise. I have to say that, having worked in business intelligence and analytics since before the term ‘business intelligence’ was an accepted part of the lexicon, it is a relief that analytics has finally come into the spotlight. But analytics, like reporting, has many flavours, and these flavours have different requirements. For example, exploratory (‘train of thought’) analytics requires interaction between the analyst and the data, with rapid query turnaround, whereas predictive analytics does not require this interaction but may need to resolve very complex mathematical models. So the needs differ when it comes to delivering the required insights at the right time. Satisfying these needs requires different performance characteristics in the underlying platform – characteristics that may conflict and, hence, may not be satisfied by a single platform.
For analytical platforms, one size does not fit all. There is a range of factors to consider when selecting an analytical platform, and these are determined by the application requirements:
- Performance – obviously a key consideration, but the answer isn’t as straightforward as ‘as fast as possible please’. In the example above, for exploratory analytics it is important that a slow platform does not break the analyst’s train of thought, whereas for predictive modelling it may be more important to be able to get to the detail.
- Scalability – for some companies and applications, the need to scale is essential (consider the likes of Google and Yahoo! who are dealing with massive data volumes) whilst for others it is not.
- Latency – how quickly can data be loaded, ready for analysis? For example, OLAP engines tend not to be ideally suited for this as they have to build the structures that provide the necessary performance. Columnar databases tend to have a similar constraint, if not as pronounced. These technologies may be unsuited for near real-time applications where high data volumes are the norm.
- Complexity – for low complexity applications a simple solution or one that uses performance enhancing techniques may fit the bill whereas highly complex applications often can’t be satisfied in this way and require more raw power to resolve.
- Cost – an overarching consideration is cost. There is no point in spending huge sums on an infrastructure if the benefits are not considerably greater than the costs. I always worked on the basis of a return horizon of no greater than 9 months; beyond that, the project was not worth the risks associated with development.
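As a rough illustration of the return-horizon rule of thumb in the cost point above, here is a minimal Python sketch. It assumes a very simple payback model (a single upfront cost recovered by a constant monthly net benefit), which is a simplification for illustration and not necessarily how any real project was assessed. The function names and the example figures are hypothetical.

```python
def payback_months(upfront_cost: float, monthly_net_benefit: float) -> float:
    """Months needed for cumulative net benefit to cover the upfront cost.

    Assumes a constant monthly net benefit (benefit minus running cost).
    """
    if monthly_net_benefit <= 0:
        # With no positive net benefit the investment never pays back.
        return float("inf")
    return upfront_cost / monthly_net_benefit


def within_return_horizon(
    upfront_cost: float,
    monthly_net_benefit: float,
    horizon_months: float = 9,
) -> bool:
    """True if the project pays back within the chosen horizon (default 9 months)."""
    return payback_months(upfront_cost, monthly_net_benefit) <= horizon_months


if __name__ == "__main__":
    # Hypothetical figures: 90,000 upfront, 15,000 net benefit per month.
    print(payback_months(90_000, 15_000))
    print(within_return_horizon(90_000, 15_000))
```

A real assessment would also need to account for ramp-up time, running costs that change with data volume, and the development risk mentioned above, none of which this sketch models.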
So how does a business make a choice if one size does not fit all? Obviously, if there has to be a choice then the decision should be made on the basis of benefit – where is the best return achieved? But, fortunately, we are now in a world where this compromise does not need to be made. There are many choices available to cover a whole range of analytical requirements, and cloud deployment models enable the right platform to be used on a ‘pay as you go’ basis.
— John Coppins
Director of Product Management