With more organisations looking to add big data analytics capabilities to their operations in order to take advantage of the huge amounts of information they hold, many will be examining which technologies best suit their business.
One of the most popular choices will be Hadoop, a tempting option thanks to its flexibility and its ability to manage very large data sets effectively.
In fact, it has been forecast that by 2020, as many as three-quarters of Fortune 2000 companies will be running Hadoop clusters of at least 1,000 nodes.
But getting up and running with the technology will prove challenging for many businesses. Hadoop remains a complex solution that demands a great deal of understanding and patience if companies are to make the most of it.
Therefore, it will be vital for organisations to develop and adhere to proven best practices if they are to see a return from their Hadoop investment. Several of these were recently identified by Ronen Kofman, vice-president of product at Stratoscale.
He noted, for example, that it is a bad idea to immediately jump into large-scale Hadoop deployments, as the complexity and costs involved with this open up businesses to significant risks, should the project fail.
However, he added that Hadoop's flexibility and scalability make it easy to start small with limited pilots, then add functionality as businesses become more familiar and comfortable with the solution. While it is straightforward to add nodes to a cluster as needed, it is harder to scale down should an initial deployment prove overly optimistic.
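Scaling down usually means decommissioning worker nodes gracefully rather than simply switching them off, so that HDFS can re-replicate their blocks first. A minimal sketch of how this is typically configured (the file path shown is illustrative): point the NameNode at an exclude file in hdfs-site.xml, list the hosts to be drained in that file, and then run `hdfs dfsadmin -refreshNodes` to begin decommissioning.

```xml
<!-- hdfs-site.xml: illustrative snippet; the exclude-file path is an assumption -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
```

Hosts added to `/etc/hadoop/conf/dfs.exclude` are then drained of their block replicas once the administrator refreshes the node list, after which they can be removed from the cluster safely.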
"Choosing a small project to run as a proof-of-concept allows development and infrastructure staff to familiarise themselves with the inter-workings of this technology, enabling them to support other groups' big data requirements in their organisation with reduced implementation risks," Mr Kofman said.
Another essential factor to consider is how businesses manage the workloads of their Hadoop clusters. The open-source framework of Hadoop enables businesses to very quickly build up vast stores of data without the need for costly purpose-built infrastructure, by taking advantage of technology such as cloud computing.
But if close attention is not paid to how these clusters are deployed, it is easy to over-build one. Mr Kofman said: "Effective workload management is a necessary Hadoop best practice. Hadoop architecture and management have to look at whole cluster and not just single batch jobs in order to avoid future roadblocks."
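In practice, whole-cluster workload management on Hadoop is often handled with YARN's Capacity Scheduler, which divides cluster resources between competing workloads rather than letting a single batch job dominate. A hedged sketch of what this might look like (the queue names and percentage split below are purely illustrative assumptions):

```xml
<!-- capacity-scheduler.xml: illustrative queue layout, not a recommendation -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>etl,adhoc</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.etl.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
  <value>30</value>
</property>
```

With a split along these lines, scheduled batch jobs and ad-hoc analysis each have a guaranteed share of the cluster, which is one way of looking at the whole cluster rather than at single jobs in isolation.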
Organisations also need to maintain a close eye on their clusters, as there are many moving parts that will need to be monitored, and Hadoop's inbuilt redundancies are somewhat limited.
"Your cluster monitoring needs to report on the whole cluster as well as on specific nodes," Mr Kofman continued. "It also needs to be scalable and be able to automatically track an eventual increase in the amount of nodes in the cluster."
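One common way to get whole-cluster figures of this kind is the NameNode's JMX servlet, which reports metrics such as live and dead DataNode counts. Below is a small illustrative sketch, not a production monitor: the hostname is hypothetical, and the sample payload at the end is made-up data in the shape the servlet returns, used purely to exercise the parsing function.

```python
import json
from urllib.request import urlopen

# Hypothetical NameNode host; 9870 is the default NameNode HTTP port in Hadoop 3.x.
JMX_URL = ("http://namenode.example.com:9870/jmx"
           "?qry=Hadoop:service=NameNode,name=FSNamesystemState")

def summarise_cluster(jmx_payload: dict) -> dict:
    """Extract whole-cluster health figures from a NameNode JMX response."""
    bean = jmx_payload["beans"][0]
    return {
        "live_nodes": bean["NumLiveDataNodes"],
        "dead_nodes": bean["NumDeadDataNodes"],
        "capacity_used_pct": 100.0 * bean["CapacityUsed"] / bean["CapacityTotal"],
    }

def fetch_summary(url: str = JMX_URL) -> dict:
    """Query a live NameNode (requires a reachable cluster)."""
    with urlopen(url) as resp:
        return summarise_cluster(json.load(resp))

# Illustrative payload only, in the shape the JMX servlet returns:
sample = {"beans": [{"NumLiveDataNodes": 48, "NumDeadDataNodes": 2,
                     "CapacityUsed": 600, "CapacityTotal": 1000}]}
print(summarise_cluster(sample))
```

A real deployment would feed figures like these into an alerting system, and, as the quote above notes, would need to keep working as the node count grows.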
Other areas that IT departments need to pay close attention to include how data from multiple sources is integrated, and what protections are in place to secure sensitive information.
Getting all of this right is vital if Hadoop projects are to succeed. With big data set to play such a central role in the future direction of almost every company, being able to gather, process and manage it effectively will be essential.