With many businesses looking to expand their number of data sources to improve their analytics projects, one area that's seeing a great deal of focus is how companies capture and evaluate data gleaned from social media.
With Facebook claiming more than a billion users and Twitter in excess of 300 million active users, there is clearly a huge amount of potential data on these platforms that can give businesses an insight into what their customers are thinking.
At the same time, affordable big data analytics tools such as Hadoop that are capable of handling large amounts of this unstructured data have allowed many more enterprises to take advantage of the opportunities this opens up.
But are companies relying too heavily on the information they gain from social media? A new study by Northwestern University has suggested that in many cases, businesses may not be accounting for the systemic biases that these platforms have.
Professor Eszter Hargittai, a researcher at the institution who heads the Web Use Project, explained that the key thing businesses must bear in mind when analysing social media data is that their subjects are self-selecting. That is to say, people do not end up on sites such as Facebook and Twitter at random, but make a conscious choice to engage.
This means the data they produce may be biased in terms of demographics, socioeconomic background or internet skills, the research stated. This can have significant implications for businesses and other organisations that use big data, because such data excludes certain segments of the population and could lead to unwarranted or faulty conclusions.
Prof Hargittai said: "Many data sets that use so-called 'big data' rely on social network sites such as Facebook and Twitter. But studies rarely discuss that people who select into using Facebook and Twitter don't necessarily represent larger populations."
For example, a local authority may turn to Twitter to collect local opinions about how to improve the community. But in cases like this, it will be vital for them to understand what sort of cross-section of people will be likely to respond to the question.
"You could be missing half the population, if not more. The same holds true for companies who only use Twitter and Facebook and are looking for feedback about their products," Prof Hargittai said. "It really has implications for every kind of group."
The data examined by the research revealed that there are several factors that influence which social media sites consumers choose to interact on, such as age and gender.
Prof Hargittai said: "Even among young adults who are generally thought of as the most active on social network sites, we see socioeconomic differences when it comes to Twitter and Tumblr. We also see gender and skill differences on who is on what site."
Therefore, these biases will have to be taken into account when businesses are incorporating social media into their big data projects. By being aware of the potential for the results to be skewed, companies will be able to adjust their operations accordingly, add other sources to create a more complete picture and ensure their projects stand the best chance of success.
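As a rough illustration of what taking these biases into account could look like in practice, the sketch below reweights a small set of social media responses so that each age group counts in proportion to its share of the wider population. This technique (often called post-stratification weighting) is not something the study itself prescribes; it is one common approach, and the age bands, population shares and responses here are all made-up values for illustration.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per respondent so that the weighted share of each
    group in the sample matches its share in the target population."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_mean(values, weights):
    """Average of values, with each value counted according to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: ten respondents, heavily skewed towards younger users.
sample_groups = ["18-29"] * 6 + ["30-49"] * 3 + ["50+"] * 1
# 1 = satisfied with the product, 0 = not satisfied (invented responses).
satisfied = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
# Assumed make-up of the real customer base (invented figures).
population_shares = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}

weights = poststratification_weights(sample_groups, population_shares)
raw = sum(satisfied) / len(satisfied)
adjusted = weighted_mean(satisfied, weights)

print(f"Raw satisfaction:      {raw:.2f}")
print(f"Adjusted satisfaction: {adjusted:.2f}")
```

With these invented figures, the unadjusted result suggests 60 per cent of customers are satisfied, while the reweighted figure is closer to 28 per cent. Reweighting can only correct for groups that appear in the sample at all, and it leans heavily on very few people when a group is barely represented (here, a single respondent aged 50 or over), which is one reason adding other data sources remains important.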