Key reasons why your big data may be misinterpreted

With so many businesses now adopting big data analytics programmes to study the wealth of information available to them, a frequent problem is emerging: organisations are struggling to convert that information into effective results.

InformationWeek noted that as more sources of information are added to the mix, new possibilities for analysing the data open up, which in turn leads to more potential results.

This can cause issues if, for instance, two processes that examine the same data end up giving different outcomes, as firms will be unsure which to trust. And if businesses go down the wrong route, it may be many months before errors are uncovered.

"There are inherent uncertainties in algorithms, models, outcomes, and sometimes the data itself that can impact conclusions," InformationWeek noted, while human factors also play a part.

The publication highlighted several key reasons why different analyses of the same data sometimes lead to different conclusions.

For starters, it noted that it is important to be working with the best-quality data possible. Often, the information gained from third-party sources may be incomplete, inaccurate, inconsistent, irrelevant, or outdated, so it needs to be cleansed prior to being analysed.

If such processes are not applied, or not applied consistently at different times, running the same input data using the same parameters could produce dramatically different results.
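As a rough illustration of the point above, the following Python sketch runs the same calculation on the same raw input after two different cleansing steps. The records, the 9999.0 placeholder value, and the function names are all hypothetical and chosen only for illustration; they do not come from InformationWeek.

```python
from statistics import mean

# Hypothetical raw records from a third-party source (assumption):
# None marks a missing value and 9999.0 is a placeholder for "unknown".
raw = [12.0, 15.0, None, 14.0, 9999.0, 13.0]


def clean_strict(values):
    """Drop missing values and the placeholder value."""
    return [v for v in values if v is not None and v != 9999.0]


def clean_lenient(values):
    """Drop only missing values; the placeholder slips through."""
    return [v for v in values if v is not None]


# Same input and same analysis (a simple average), different cleansing.
strict_avg = mean(clean_strict(raw))
lenient_avg = mean(clean_lenient(raw))

print("strict cleansing: ", strict_avg)
print("lenient cleansing:", lenient_avg)
```

In this toy case, a single uncleansed placeholder is enough to move the average by several orders of magnitude, which is why the cleansing step needs to be applied the same way on every run.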

Using different algorithms to study information may also yield differing results, as some may be better suited to one particular task, be more efficient, or offer less uncertainty than others. Similarly, data models each have their own set of parameters that may cause discrepancies in results. To counter this, organisations must specify the parameter values used in their models and make clear that the outputs apply only under those conditions.
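The effect of algorithm and parameter choice can be sketched in a few lines of Python. The example below is only an assumption-laden illustration: the sample data, the two outlier rules, and the threshold values are invented for this sketch and are not taken from the article. Each result is stored alongside the method and parameters that produced it, so it is clear which conditions an output applies to.

```python
from statistics import mean, median, pstdev

# Hypothetical sample data (assumption): one value is much larger than the rest.
data = [10, 11, 9, 10, 12, 11, 10, 30]


def flag_by_zscore(values, threshold):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]


def flag_by_median_distance(values, max_distance):
    """Flag values further than `max_distance` from the median."""
    mid = median(values)
    return [v for v in values if abs(v - mid) > max_distance]


# Record the method and parameter values next to every output.
results = [
    {"method": "zscore", "params": {"threshold": 3.0},
     "flagged": flag_by_zscore(data, 3.0)},
    {"method": "zscore", "params": {"threshold": 2.0},
     "flagged": flag_by_zscore(data, 2.0)},
    {"method": "median_distance", "params": {"max_distance": 1.0},
     "flagged": flag_by_median_distance(data, 1.0)},
]

for result in results:
    print(result)
```

With these made-up numbers, the three runs flag different sets of values from identical input, so none of the outputs can be interpreted without the parameters recorded beside it.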

However, even if businesses do all they can to ensure technical consistency across their data analytics operations, there is still the human element to consider. Two individuals can study the same data in the same way and still draw different conclusions, because one or both of them bring personal biases, sometimes unconscious, into the equation.

Kirk Borne, principal data scientist at Booz Allen Hamilton, said it is not necessarily a bad thing if two people reach different conclusions, as it allows them to discuss the reasons behind this and identify any biases they are unaware of.

"If you stay within boundaries where everything is a yes answer, then you're not going to learn anything," he continued. "You want to get to the point when your model and your algorithm fails, the point where things went from being good to not being good, [because] that's where the real knowledge is discovered."