When businesses look to deploy big data analytics solutions, one question they must answer is how their IT infrastructure will support those applications.
This can often be the difference between success and failure for a project, yet there remains a great deal of uncertainty about the best way to approach it.
TechTarget noted that this is because there are still no clear best-practice guidelines for big data infrastructure, and the toolsets needed to support applications are not well understood. However, as big data activities can be a significant drain on infrastructure, there is often little margin for error.
One of the major issues is that it can be very difficult at the outset of a project to predict what resources will be required to manage applications effectively. For instance, the publication highlighted one telecommunications provider that is looking to use Hadoop to analyse content, usage and monetisation data generated by a new digital service.
But the company remains in the dark about what infrastructure it will need to put in place to handle this. The vice-president of technology responsible for the rollout said: "It's impossible to do any kind of capacity planning on a product that hasn't launched yet."
Therefore, taking an incremental approach and turning to solutions that can be easily scaled up will be essential to the success of many projects.
One option will be cloud computing, which typically offers a high level of scalability. And if the data businesses are looking to analyse is generated in the cloud, it makes sense to keep it there.
However, TechTarget noted that when organisations are dealing with a mix of cloud-based and on-premises information, matters can be complicated if data is available but resides in storage formats that big data applications cannot access.
For some applications, such mixed infrastructure will be unavoidable. In the case of the telecommunications provider's service, for example, it will be necessary to use data from both cloud and on-premises systems. It is therefore important for any big data solution to support both, for compliance reasons and to save time and network bandwidth.
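As an illustrative sketch (not from the article), supporting both cloud-hosted and on-premises data often comes down to giving applications one access path regardless of where the data lives. The function name and URI convention below are assumptions for the example, using only Python's standard library:

```python
from urllib.parse import urlparse
from urllib.request import urlopen

def read_records(uri):
    """Read text records from a cloud (HTTP) or on-premises (local file) source.

    The URI scheme decides the access path; callers see a single interface.
    All names here are hypothetical, for illustration only.
    """
    scheme = urlparse(uri).scheme
    if scheme in ("http", "https"):
        # Cloud-hosted data: fetch over the network
        with urlopen(uri) as resp:
            return resp.read().decode().splitlines()
    # On-premises data: read from local storage
    with open(uri.removeprefix("file://")) as f:
        return f.read().splitlines()
```

A real deployment would add authentication and format handling, but the design point stands: the analytics code calls one reader, and the infrastructure detail stays behind it.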
Another key consideration should be cutting down on latency. In an era where real-time analytics is increasingly in demand, any delays in getting data from a storage array to an analytics application can have a major impact on performance.
Therefore, companies need to examine ways to avoid this. One solution is in-memory computing, which loads data into RAM so that queries can run against it immediately. This can deliver much faster results than transferring data from disk into applications, and it helps ensure a company's big data infrastructure is supporting its efforts rather than holding them back.
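The idea behind in-memory querying can be sketched with Python's built-in sqlite3 module, which supports both on-disk and RAM-resident databases. The table and data below are invented for illustration:

```python
import sqlite3

# ":memory:" creates a database that lives entirely in RAM,
# so queries avoid disk I/O altogether.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Load sample data into memory once...
cur.execute("CREATE TABLE usage (user_id INTEGER, minutes INTEGER)")
cur.executemany("INSERT INTO usage VALUES (?, ?)",
                [(1, 120), (2, 45), (1, 30)])

# ...then run analytical queries against it immediately.
cur.execute("SELECT user_id, SUM(minutes) FROM usage "
            "GROUP BY user_id ORDER BY user_id")
print(cur.fetchall())  # [(1, 150), (2, 45)]

conn.close()
```

Dedicated in-memory analytics platforms work at a far larger scale, but the trade-off is the same: data loaded into RAM up front in exchange for queries that skip the disk entirely.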