Tuesday, November 11, 2008

Modeling the Flu With Google Search Queries

There is an interesting article in today's NYTimes that outlines how Google is using search queries to model flu outbreaks. Google Flu Trends watches for key search terms that could indicate someone has the flu: thermometer, flu symptoms, muscle aches, congestion, etc. By knowing when and where the searches originate, they can model the spread of the flu.

They have tested their models against data from the Centers for Disease Control and Prevention (CDC), and they claim they can see the start of a flu outbreak 7-10 days before the CDC does. Google has also compared its historical search data with the CDC's and found very high correlations. A study describing the work is due to be published in Nature.
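The article does not describe Google's actual model, but the "high correlations" claim can be made concrete. Below is a minimal sketch, assuming two equal-length weekly series: the share of searches matching flu-related terms and a surveillance rate such as the CDC's. It computes a plain Pearson correlation, plus a hypothetical lagged variant that shows how a search signal that leads the official numbers by a week or so would correlate best at a positive lag. The function names and the sample numbers are invented for illustration only; they are not Google's data or method.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

def lagged_pearson(leading, lagging, lag):
    """Hypothetical helper: correlate `leading` at week t with `lagging` at week t + lag.

    A high value at lag > 0 suggests `leading` moves before `lagging`.
    """
    if lag < 0 or lag >= len(leading) or len(leading) != len(lagging):
        raise ValueError("lag must be in [0, len) and series must match in length")
    return pearson(leading[:len(leading) - lag], lagging[lag:])

# Invented weekly values, for illustration only (not real Google or CDC data).
query_share = [0.8, 1.1, 1.9, 3.2, 4.0, 3.1, 1.7, 1.0]
surveillance = [0.9, 0.9, 1.2, 2.0, 3.3, 4.1, 3.0, 1.8]

for lag in range(3):
    print(lag, round(lagged_pearson(query_share, surveillance, lag), 3))
```

In this toy series the surveillance numbers trail the search numbers by about one week, so the correlation should peak at lag 1; a real system would also have to deal with noise, seasonality, and changes in search behavior.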

From the article:

“This seems like a really clever way of using data that is created unintentionally by the users of Google to see patterns in the world that would otherwise be invisible,” said Thomas Malone, a professor at the M.I.T. Sloan School of Management. “I think we are just scratching the surface of what’s possible with collective intelligence.”

Oh, and our friend Hal Varian is quoted in the piece.

1 comment:

Yiftu said...

Google has weird ideas. The result is based on the assumption that people search for flu on Google or Yahoo when they have the flu or notice its symptoms. While this is a reasonable assumption, it makes Google's results inconsistent and not fully reliable.

There are other assumptions we could make as well. People may be searching for flu to learn about it and take precautions. The number of searches may also decrease over time once people know about the flu, how to prevent it, and what medicines to use. Not everyone has access to the Internet, and not everyone who has it uses it regularly. Some people prefer other sources of information, such as books, magazines, journals, and newspapers.

Overall, Google is making advances in early warning of flu outbreaks, but we should be careful about relying on Google's data to take major measures that may not be cost-effective and may rest on wrong information.