For many companies, the advent of big data offers new opportunities to transform how they operate. But for organisations whose business has always been information, the potential is even greater.
One of these enterprises is comScore, which was founded in 1999 with the goal of delivering key business intelligence to inform customers about what is happening online.
However, as internet usage has exploded in recent years, the volume of data comScore handles has grown exponentially. In an interview with CIO.com, the company's chief technology officer, Mike Brown, said the tipping point came around 2009.
"Prior to that, we had been in the 50 billion to 100 billion events per month space," he said, but in the summer of that year, the floodgates opened and the volume of data the company ingested shot up. By December last year, comScore was handling upwards of 1.9 trillion events a month – equating to more than ten terabytes of data a day.
To deal with this, the business needed advanced big data analytics tools, and the solution it turned to was Hadoop. ComScore was one of the first production clients of the tool and it has increased its investment substantially over the years.
"Our cluster has grown to a decent size," Mr Brown said. "We have 450 nodes in our production cluster and that has 10 petabytes of addressable disk space, 40 terabytes of RAM and 18,000+ CPUs."
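To put those totals in perspective, the quoted figures can be turned into rough per-node averages. The short Python sketch below is illustrative only: it assumes decimal units (1 petabyte = 1,000 terabytes), treats "18,000+" CPUs as a lower bound of 18,000, and assumes resources are spread evenly across nodes, which the interview does not state. The helper name `per_node` is hypothetical.

```python
# Cluster totals as quoted by Mr Brown (decimal units assumed).
NODES = 450
DISK_TB = 10_000   # 10 petabytes of addressable disk space
RAM_GB = 40_000    # 40 terabytes of RAM
CPUS = 18_000      # "18,000+" in the interview, so treated as a lower bound


def per_node(total, nodes=NODES):
    """Average share of a cluster-wide total per node (assumes an even spread)."""
    return total / nodes


print(f"Disk per node: about {per_node(DISK_TB):.1f} TB")
print(f"RAM per node:  about {per_node(RAM_GB):.1f} GB")
print(f"CPUs per node: at least {per_node(CPUS):.0f}")
```

On these assumptions, each node averages roughly 22 terabytes of disk, 89 gigabytes of RAM and at least 40 CPUs.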
Mr Brown drew on comScore's experience to offer advice to other data-driven businesses looking to adopt Hadoop.
For instance, Mr Brown stated that businesses should resist the urge to dive head-first into a complete big data initiative before the technology has proven its value. He observed that the technology will be easy to scale up if companies start small, but being over-ambitious in the early stages could prove disastrous.
However, it is just as important that businesses persevere with their efforts and move beyond the proof of concept (PoC) stage to full production.
"Choose one thing to try to provide value to show that this does work," Mr Brown said. "Then get that into production. I'm fearful that some places choose to leave their big data projects as the evergreen PoC. It doesn't get real until you've got it in production."
Companies also need to think about the long-term potential of their big data platforms, rather than focusing only on short-term gains. This may well require them to revisit their solutions from time to time and assess whether they are still the most appropriate tool for the task.
For instance, Mr Brown said companies need to have a clear idea of what they will do if their current data volumes grow by ten or 100 times. Will the solutions they have in place now be able to scale up to meet these new demands, or will they have to go through the costly process of developing an all-new solution from scratch to meet the challenges of tomorrow?
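The question can be made concrete with a back-of-the-envelope calculation. The Python sketch below uses the two figures quoted in the interview (about ten terabytes of data a day and 10 petabytes of addressable disk) to estimate how long a cluster of that size could absorb raw ingest at 1, 10 and 100 times the current volume. It is a simplification, not comScore's own planning method: it assumes decimal units and ignores replication, compression and data expiry, all of which would change the real numbers. The function name `days_until_full` is hypothetical.

```python
# Figures quoted in the article (decimal units assumed: 1 PB = 1,000 TB).
DAILY_INGEST_TB = 10        # "more than ten terabytes of data a day"
CLUSTER_DISK_TB = 10_000    # "10 petabytes of addressable disk space"


def days_until_full(capacity_tb, daily_ingest_tb, growth_factor=1):
    """Days of raw ingest a given capacity can hold at a given growth factor.

    Simplifying assumptions: no replication, no compression, no deletion.
    """
    return capacity_tb / (daily_ingest_tb * growth_factor)


for factor in (1, 10, 100):
    days = days_until_full(CLUSTER_DISK_TB, DAILY_INGEST_TB, factor)
    print(f"{factor:>3}x volume: about {days:,.0f} days of headroom")
```

Under these assumptions, headroom falls from about 1,000 days at current volumes to about 100 days at ten times the volume and about 10 days at 100 times, which shows why a platform that cannot scale out would quickly need replacing.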