As more businesses come to recognise the value that big data can bring to their operations, one of the key questions will be how companies implement the technology to ensure the best results.
One area that is often not given high priority is the supporting infrastructure needed to handle large volumes of information and fast processing requirements. InfoWorld observed that if infrastructure cannot keep up with these demands, big data deployments can quickly become problematic.
Therefore, one of the key factors in a successful project will be not just implementing the latest big data tools, but ensuring they can respond quickly and flexibly to the needs of a business.
This can often mean having the elasticity to scale up operations on-demand or being able to turn to new tools as and when they are required.
Matt Wood, data science chief at Amazon Web Services, told the publication that in many cases, success will not come down to making a choice between certain tools, such as Spark or Hadoop. Instead, the most effective solutions will include a broad set of options that allow users to interact with data in a variety of ways.
"If you're using Spark, that shouldn't preclude you from using traditional MapReduce in other areas, or Mahout," he explained, adding: "You get to choose the right tool for the job, versus fitting a square peg into a round hole."
Mr Wood stated there are three key components to any successful analytics system. These include real-time analytics and the ability to rely on a 'single source of truth' for data, which eliminates problems of duplication and sprawl, as well as ensuring all personnel are working from the same information.
Being able to turn to dedicated task clusters is another essential. Mr Wood explained these are a group of instances running a distributed framework like Hadoop, but spun up specifically for a dedicated task like data visualisation.
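The dedicated task cluster pattern Mr Wood describes can be sketched using Amazon EMR, AWS's managed Hadoop service, where a cluster can be configured to run a single step and then terminate itself. The function below builds the request parameters that would be passed to boto3's `run_job_flow` call; the cluster name, instance types and S3 script path are illustrative placeholders, not details from the article.

```python
# Sketch: request parameters for a transient Hadoop/Spark cluster on
# Amazon EMR that is spun up for one dedicated task (e.g. a data
# visualisation job) and terminates when the task finishes.
# Names, instance types and the S3 path are illustrative placeholders.

def transient_cluster_request(task_name, script_s3_path, workers=4):
    """Build the parameter dict for boto3's emr.run_job_flow call."""
    return {
        "Name": f"task-cluster-{task_name}",
        "ReleaseLabel": "emr-6.15.0",
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
                 "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
                 "InstanceCount": workers},
            ],
            # Key to the pattern: the cluster exists only for this task
            # and shuts down once its steps complete.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": task_name,
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", script_s3_path],
            },
        }],
    }

# In practice this dict would be unpacked into
# boto3.client("emr").run_job_flow(**params) to launch the cluster.
params = transient_cluster_request("data-visualisation",
                                   "s3://example-bucket/viz_job.py")
```

Because the cluster is tied to one task, teams pay only for the compute the job actually uses, which is the elasticity benefit the article goes on to describe.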
In many cases, solutions such as cloud computing will have an integral part to play in this. The technology allows businesses to access the resources they need to make big data a success without devoting large capital expenditure to systems that may only be used occasionally.
Hortonworks vice-president of corporate strategy Shaun Connolly told InfoWorld that this flexibility will be particularly useful when enterprises are looking to create ad hoc clusters to work with specific data subsets, or when they are trying to demonstrate proof-of-concept solutions.
However, he added: "Once that's done, the question becomes, 'Will this move on-premise because that's where the bulk of the data is, or will it remain in the cloud?'"
In most cases, this dilemma will not be answered by either a fully on-premises approach or a full cloud solution, Mr Connolly said, noting that flexibility will again be key.
He stated that in cases where the bulk of the data is created on-site, analytics will typically remain on-premises. However, in other scenarios, such as stream processing of machine or sensor data, the cloud is a natural starting point.
"It's going to be an operational discussion around where do you want to spend the cost, where is the data born and where do you want to run the tech. I think it's going to be a connected hybrid experience," Mr Connelly said.