NIH highlights use of big data in disease research

The US National Institutes of Health (NIH) has highlighted the importance of big data in tracking infectious disease outbreaks and formulating response plans.

In a supplement published in The Journal of Infectious Diseases, the body observed that data drawn from sources ranging from electronic health records to social media has the potential to provide far more detailed and timely information about outbreaks than traditional surveillance techniques.

Existing methods are typically based on laboratory tests and other data gathered by public health institutions, but these have several drawbacks. The NIH noted that they are expensive and slow to produce results, and that they do not provide enough local-level data to support effective monitoring.

Big data analytics tools that process data gathered from internet queries, by contrast, work in real time and can track outbreaks at a much more local level. The technology has its own challenges to overcome, such as the potential for bias, but these can be countered by developing a hybrid system that combines big data with traditional surveillance.

Cecile Viboud, PhD, co-editor of the study and a senior scientist at the NIH's Fogarty International Center, said: "The ultimate goal is to be able to forecast the size, peak or trajectory of an outbreak weeks or months in advance in order to better respond to infectious disease threats. Integrating big data in surveillance is a first step toward this long-term goal."

She added that, now that proofs of concept for the technology have been demonstrated in high-income countries, researchers can examine the impact big data may have in lower-income economies, where traditional surveillance is less widespread.

However, the NIH warned that big data must be handled with caution. For example, organisations must be wary of relying too heavily on non-traditional data streams that may lack key demographic identifiers such as age and sex. They must also recognise and correct for the fact that such sources may underrepresent groups such as infants, children, the elderly and populations in developing countries.

"Social media outlets may not be stable sources of data, as they can disappear if there is a loss of interest or financing," the body continued. "Most importantly, any novel data stream must be validated against established infectious disease surveillance data and systems."

The NIH's supplement features ten articles that highlight promising examples of how big data analytics can transform the way disease outbreaks are monitored and responded to.

Experts in computer science, data modelling and epidemiology collaborated to examine the opportunities and challenges associated with three types of data: medical encounter files, crowdsourced data from volunteers, and information generated by social media, the internet and mobile phones.

Professor Shweta Bansal of Georgetown University, a co-editor of the supplement, stated: "To be able to produce accurate forecasts, we need better observational data that we just don’t have in infectious diseases. There's a magnitude of difference between what we need and what we have, so our hope is that big data will help us fill this gap."