Plugging gaps in Zika surveillance with online news reports

Zika virus disease reports as of May 31, 2016

As the Zika epidemic continues to unfold, most affected countries are flying blind: they have limited government disease surveillance systems in place to track new cases. That leaves public health officials unable to estimate how fast Zika is spreading, where the hotspots are and when the outbreak will peak — much less contain it and prepare for cases of microcephaly and Guillain-Barré syndrome, both now presumed to be caused by the Zika virus.

“One of the things we really struggled with in the early days of Zika was a lack of official data sources,” says research fellow Maia Majumder, MPH, of the Computational Epidemiology Group at Boston Children’s Hospital. “Surveillance has been really lagging. When we don’t know how many cases there are day to day, week to week, it’s really hard to characterize how bad an outbreak is.”

A study published this week, led by Majumder, suggests a readily available data source for estimating actual case counts on the ground: online local news reports, adjusted using data from Google search trends.

Majumder, group director John Brownstein, PhD, and colleagues focused on Colombia, the country with the second-largest number of cases and one of the few to have started publishing official surveillance figures. They chose a six-month period (October 16, 2015 to April 16, 2016).

The team began by extracting news reports — mostly local, Spanish-language articles — from Boston Children’s digital surveillance tool, HealthMap, using natural language processing, with some human curation, to pick up potential cases.

“It’s largely automated, unlike government surveillance systems, so we don’t need as much manpower to aggregate cases,” says Majumder, who also has an appointment at MIT.

From the news reports, they estimated the number of weekly cases occurring nationally. Bilingual curators checked the reports to make sure they truly represented new cases and to eliminate duplicates.

The team then used data on Google search trends for Zika in Colombia to “smooth” their charted curve, distributing the media-reported cases over a more realistic time frame. Combining the data allowed the team to account for the fact that news reports themselves probably prompted more searches, though they found that Google searches sometimes preceded news reports.
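As a rough illustration of what "distributing the media-reported cases over a more realistic time frame" could look like, the sketch below spreads the total of the news-reported cases across weeks in proportion to search interest. This proportional weighting is an assumption made for illustration, not the method the study used, and the function name and all numbers are hypothetical.

```python
def redistribute_cases(news_cases, search_interest):
    """Spread the total news-reported case count across weeks
    in proportion to each week's search interest.

    Illustrative assumption only: the article does not describe
    the study's actual smoothing procedure.
    """
    if len(news_cases) != len(search_interest):
        raise ValueError("both series must cover the same weeks")
    total_interest = sum(search_interest)
    if total_interest == 0:
        # No search signal to weight by: leave the series unchanged.
        return [float(c) for c in news_cases]
    total_cases = sum(news_cases)
    return [total_cases * s / total_interest for s in search_interest]


# Hypothetical weekly data: news reports arrive in bursts,
# while search interest rises and falls more gradually.
news_cases = [0, 120, 0, 0, 300, 0]
search_interest = [5, 20, 25, 30, 40, 30]

smoothed = redistribute_cases(news_cases, search_interest)
```

The total number of cases is preserved; only its timing changes, so weeks with no news report but sustained search interest are no longer shown as having zero cases.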

Finally, they applied mathematical models to their data to calculate the “basic reproductive number” — how many new cases result from one person getting Zika, which turns out to be around two — and made population-wide projections.
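To give a feel for what a basic reproductive number of about two implies, the sketch below counts expected cases per generation of infection when each case leads to two more. It is a deliberately simplified picture that assumes everyone is susceptible and ignores mosquito dynamics, so it is not the model the team applied; the function name is hypothetical.

```python
def expected_cases_by_generation(r0, generations):
    """Expected new cases in each generation of infection, starting
    from a single case, if every case causes r0 further cases.

    Simplifying assumption: an unlimited susceptible population,
    so growth is purely geometric.
    """
    if r0 < 0 or generations < 0:
        raise ValueError("r0 and generations must be non-negative")
    return [r0 ** g for g in range(generations + 1)]


# With r0 = 2, cases double each generation: 1, 2, 4, 8, 16.
cases = expected_cases_by_generation(2.0, 4)
```

In a real outbreak the growth slows as fewer people remain susceptible, which is why fitted models produce a curve that peaks rather than doubling indefinitely.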

When they charted these and compared them with actual government data from Colombia’s Instituto Nacional de Salud (INS), the curves matched quite closely.

Close enough, Majumder believes, that their nontraditional approach could reliably guide allocation of resources, such as arranging enough care for pregnant women with Zika and providing enough diagnostic kits to test people who fall ill. With Zika vaccines under development, the team also wants to be able to estimate how many people would need to be vaccinated to prevent transmission. In Colombia’s case, their data suggest a need to immunize 50 percent of the population.
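The 50 percent figure is consistent with the standard herd immunity threshold, 1 - 1/R0, evaluated at a basic reproductive number of about two. The article does not say how the team derived its estimate, so the sketch below is an assumption: it applies that textbook formula and presumes a fully effective vaccine. The function name is hypothetical.

```python
def critical_vaccination_fraction(r0):
    """Fraction of the population that must be immune to stop
    sustained transmission, using the textbook threshold 1 - 1/r0.

    Assumes a fully effective vaccine and a well-mixed population.
    """
    if r0 <= 1:
        # Each case causes at most one other, so the outbreak
        # cannot sustain itself and no vaccination is required.
        return 0.0
    return 1.0 - 1.0 / r0


# A basic reproductive number of about two gives a threshold of one half.
threshold = critical_vaccination_fraction(2.0)
```

A higher reproductive number pushes the threshold up: a value of 4, for example, would require 75 percent of the population to be immune.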

The team is offering its assistance to health ministries in other countries affected by Zika.

“We’re looking into El Salvador now, and Puerto Rico,” Majumder says. “We’re also concerned that the strain of Zika affecting the Americas, which is of an Asian lineage, has now been imported to Africa for the first time. The African continent is even less equipped surveillance-wise to keep tabs on what is going on. We’re also hoping to do similar analyses as outbreaks of other diseases pop up, too.”

Check out HealthMap’s live Zika tracking map. If you click the arrow, you can replay Zika’s global spread since the first report in March 2014.