5 reasons you need to be wary of using social data

One of the biggest advantages of many people's current obsession with social media is that it can give businesses an insight into the minds of their customers that simply was not possible in the past.

Effective analysis of social media data can help firms spot trends and learn about the interests of their consumers. And, as it's freely available and relatively cheap to gather, it is no wonder more companies are incorporating it into their big data analytics programmes.

But organisations need to be wary about relying too heavily on this information. This is according to researchers from Carnegie Mellon University (CMU) in the US and Canada's McGill University, who have been studying the use of social data. And they warn that if not approached in the right way, the results firms get from these datasets could be very misleading.

Juergen Pfeffer, an assistant research professor in CMU's Institute for Software Research, said that firms often believe that if they gather a large enough dataset, they can overcome any biases or distortion, but this may not always be the case.

"Not everything that can be labelled as 'big data' is automatically great," he added, saying: "The old adage of behavioural research still applies: know your data."

The researchers therefore highlighted five reasons why companies need to be wary when they're dipping into social media to support their big data.

1. Different platforms attract different users

A common mistake is to assume that all social media platforms are alike, but this is far from the case. For instance, Instagram is particularly appealing to those aged between 18 and 29, while Pinterest is favoured by females aged 25 to 34 with high household incomes. But researchers rarely correct for, or even acknowledge, these biases.
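One common way to correct for this kind of skew is to reweight results so that each demographic group counts in proportion to its share of the population a firm actually cares about. The sketch below is illustrative only: the age bands, population shares and survey values are hypothetical, and `poststratified_mean` is a name chosen for this example rather than anything the researchers describe.

```python
def raw_mean(sample):
    """Unweighted average over every (group, value) pair."""
    return sum(value for _, value in sample) / len(sample)


def poststratified_mean(sample, population_shares):
    """Average the per-group means, weighted by each group's population share.

    sample: list of (group, value) pairs gathered from one platform.
    population_shares: dict mapping group -> share of the target population
        (assumed to sum to 1).
    """
    totals, counts = {}, {}
    for group, value in sample:
        totals[group] = totals.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1

    missing = set(population_shares) - set(counts)
    if missing:
        raise ValueError(f"no sample data for groups: {sorted(missing)}")

    return sum(
        share * totals[group] / counts[group]
        for group, share in population_shares.items()
    )


# Hypothetical data: 1 = user likes the product, 0 = user does not.
# The platform over-represents 18-29s (8 of the 10 users sampled).
sample = (
    [("18-29", 1)] * 6 + [("18-29", 0)] * 2
    + [("30+", 1)] * 1 + [("30+", 0)] * 1
)
# Hypothetical shares of the firm's real customer base.
population_shares = {"18-29": 0.25, "30+": 0.75}

unweighted = raw_mean(sample)
weighted = poststratified_mean(sample, population_shares)
print(f"unweighted: {unweighted:.4f}, reweighted: {weighted:.4f}")
```

With these made-up numbers the unweighted figure flatters the product because the platform's younger audience dominates the sample. Reweighting only helps when every group of interest is present in the data and the population shares are known, which is itself an assumption worth checking.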

2. Public feeds don't provide an accurate overall picture

The information that is publicly available to researchers may not accurately represent the platform's overall data, and firms are generally in the dark about when and how social media providers filter their data streams.

3. User behaviour can be influenced by site design

How social platforms choose to create their sites can also skew data, by dictating how users behave and, therefore, what behaviour can be measured. For instance, the lack of a 'dislike' button on Facebook makes negative responses to content harder to detect than positive 'likes'.

4. Failure to sift out bots can skew results

Some social networks, such as Twitter, are notorious for the large number of bots and spammers, which masquerade as normal users to boost follower counts. If these fake users are not excluded from datasets – which may be a difficult task for many organisations – they can significantly alter the outcome of analytics.
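To see how much difference this can make, it helps to compute the same metric with and without the suspect accounts. The sketch below assumes a list of suspected bot accounts already exists, however it was produced; the account names, sentiment labels and the `positive_share` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    sentiment: int  # +1 positive, -1 negative (hypothetical labels)


def positive_share(posts, excluded_authors=frozenset()):
    """Fraction of positive posts, ignoring any author in excluded_authors."""
    kept = [p for p in posts if p.author not in excluded_authors]
    if not kept:
        raise ValueError("no posts left after filtering")
    return sum(1 for p in kept if p.sentiment > 0) / len(kept)


# Hypothetical dataset: four genuine users and two promotional bots
# that each post twice.
posts = [
    Post("alice", 1), Post("bob", -1), Post("carol", -1), Post("dave", 1),
    Post("promo_bot_1", 1), Post("promo_bot_1", 1),
    Post("promo_bot_2", 1), Post("promo_bot_2", 1),
]
# Assumed to come from some separate bot-detection step.
suspected_bots = {"promo_bot_1", "promo_bot_2"}

with_bots = positive_share(posts)
without_bots = positive_share(posts, suspected_bots)
print(f"with bots: {with_bots:.2f}, without bots: {without_bots:.2f}")
```

In this toy case the bots make sentiment look far more positive than it is among genuine users. Reporting both figures side by side is one way to show how sensitive a conclusion is to the quality of the bot list.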

5. Easy-to-identify categories give false impressions of accuracy

Researchers often report results for groups of easy-to-classify users, topics and events, it was noted, which makes new analysis methods seem more accurate than they actually are. For instance, it has been estimated that efforts to infer a user's political affiliation through Twitter are only around 65 per cent accurate for a typical user, even though some studies focusing on more politically active users claim 90 per cent accuracy.
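A simple safeguard is to report accuracy separately for each kind of user rather than as one headline figure. The sketch below uses invented counts, chosen only to mirror the 90 and 65 per cent figures quoted above; the group names, labels and the `accuracy_by_group` helper are hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: accuracy} for (group, predicted, actual) records."""
    correct, total = defaultdict(int), defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical evaluation set: 10 politically active users, 20 typical users.
records = (
    [("active", "left", "left")] * 9
    + [("active", "left", "right")] * 1
    + [("typical", "right", "right")] * 13
    + [("typical", "left", "right")] * 7
)

per_group = accuracy_by_group(records)
overall = sum(pred == actual for _, pred, actual in records) / len(records)

for group, accuracy in sorted(per_group.items()):
    print(f"{group}: {accuracy:.2f}")
print(f"overall: {overall:.2f}")
```

A study that evaluated only the "active" group would report the higher figure, while the overall number depends entirely on how many users of each kind happen to be in the test set. Breaking results out by group makes that dependence visible.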

Derek Ruths, an assistant professor in McGill's School of Computer Science, noted, however, that there are solutions to these challenges. He said: "The common thread in all these issues is the need for researchers to be more acutely aware of what they're actually analysing when working with social media data."