The Shark in the Park*

I was reading my daughter a story the other night (‘The Shark in the Park’, not ‘The Happy Hedgehog Band’ – that was the previous night) and, sad to say, it started me thinking about the current state of Big Data and Information Management.

The story is about a boy who takes his new telescope to the park and looks through it at familiar places but with a different perspective (you can see how this would make me think of BI but, if not, bear with me). Because of the magnification, a crow’s wing or his dad’s (Elvis-style) haircut looks to him like the fin of a shark. Needless to say, the appearance of the shark’s fin is enough to make him shout a warning to those around him.

So why did this make me think of Big Data and Information Management? Well, we are experiencing one of those points in time when the technology landscape is changing so much that there is no clear vision as to what the eventual solution landscape will look like. However, one thing is for sure: we are being given the opportunity to look at our data from a new perspective – Hadoop, for example, gives us access to the three ‘V’s of ‘big data’ (volume, velocity and variety) to a level that we have never achieved in the past. Its appearance and rapid growth have led to calls that the Enterprise Data Warehouse (EDW) is dead (I would argue that it only existed as a goal anyway, but that’s a different story – and not one that I will be reading to my daughter any time soon!).

As with most new technology environments, there are risks as well as potential rewards. Whilst the EDW may be challenged in the future, there are many practices and lessons associated with data warehousing that should not be thrown out with it. For example, the old adage that ‘information is data with context’ remains true. As data in Hadoop lacks the level of context required for BI and analytics, this context has to be added at the point of analysis – a very time-consuming process that may be an affordable price in some cases but in many others is certainly not. In those cases, an additional layer is required to provide the context in a more persistent manner. We would argue, naturally, that an in-memory MPP analytical platform is the perfect accompaniment to Hadoop.
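To make the ‘data with context’ point a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical (the raw records, the field names, the `parse_event` helper) and it is not tied to Hadoop or to any particular product; it simply contrasts adding context on every read with adding it once and keeping the result.

```python
# Hypothetical raw records as they might land in a data store: no schema, no types.
RAW_EVENTS = [
    "2013-05-01|store-17|4.50",
    "2013-05-01|store-17|12.00",
    "2013-05-02|store-03|7.25",
]


def parse_event(line):
    """Add context to one raw line: name the fields and give them types."""
    day, store, amount = line.split("|")
    return {"day": day, "store": store, "amount": float(amount)}


def sales_by_store_on_read(raw_lines):
    """Context added at the point of analysis: every query re-parses the raw data."""
    totals = {}
    for line in raw_lines:
        event = parse_event(line)
        totals[event["store"]] = totals.get(event["store"], 0.0) + event["amount"]
    return totals


# Context added once and kept: a stand-in for a persistent, structured layer.
CONTEXT_LAYER = [parse_event(line) for line in RAW_EVENTS]


def sales_by_store_persisted(events):
    """Queries against the persisted layer skip the parsing step entirely."""
    totals = {}
    for event in events:
        totals[event["store"]] = totals.get(event["store"], 0.0) + event["amount"]
    return totals
```

Both functions return the same totals; the only difference is where the cost of interpretation is paid, on every query or once up front.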

In my opinion, one area that will be particularly challenged by the architectures these new technologies are stimulating, and that will experience a significant shift in emphasis as a result, is data integration. I propose to delve into that one separately in another blog.

In summary, and with due deference to the shark, whilst there is a tendency with new paradigms to throw out the old in favour of the new, it would be a mistake to forget all of those practices that were so painfully learned and are still relevant today. Otherwise the ‘shark in the park’ may actually be more threat than simple perspective.

*Acknowledgements to Nick Sharratt for The Shark in the Park

— John Coppins