Friday, December 5, 2008

Wash Your Hands...

As long as we're on the subject of NPR, I thought I'd share another networks story I heard this morning. Researchers at Harvard Medical School and UC San Diego recently completed a study on the nature of happiness in social networks and found that happiness tends to be contagious. The authors found that people with happy friends tended to be happier, thus creating happy social clusters. In fact, the researchers found that a person who is close to someone who becomes happy has a 25% greater chance of becoming happy themselves. Additionally, the authors found that happiness tends to be more contagious within social networks than unhappiness is. You can see the social network of happiness clusters on the NPR website.

This network from 2000 shows nodes colored by average mood: yellow is happy, blue is sad, and green is in-between. (Image of the happiness network from the CNN story.)

Thursday, December 4, 2008

Newman's Map of the "Real World"

I know that most of you are on holiday but this is worth posting on the blog. During my trip to NYC last week, I was listening to NPR and happened to come across an interview with Mark Newman, a co-author of the book The Atlas of the Real World: Mapping the Way We Live. Newman argues that the standard map we see is not a "real world" representation. He was quoted:

"Maps can be misleading, absolutely...Your standard map of the world makes the North Pole look huge and the equator look very small. And we just accept it the way it is."

The book could be used for so many reasons: seeing which parts of the world spend the most on health care versus which bear the burden of diseases that kill millions of people, identifying importers and exporters of automobiles, mapping population densities, and so on. All these things may already have been represented in different maps, but Newman applies the technique to the whole globe, which helps us see the world from different perspectives. So the largest countries that we know of could become invisible, and vice versa:

The reality is that New York has more than 10 times the number of electoral votes, because its population is so much bigger. "If you just counted the amount of color, you might think the Republican Party won by a landslide," Newman tells NPR's Andrea Seabrook. "The way we do it is we change the sizes of the states to represent how many people are living in each one."

There are photo galleries of some of the maps on the NPR website.




Friday, November 21, 2008

Complex Food Webs, Predation, and Competition

The paper “Complex food webs prevent competitive exclusion among producer species” looks at the effects that nutrient supply and predation have on the survival of multiple producer species. Brose did this by randomly assigning nutrient intake efficiencies to five producers. The same producers were also placed into a food web generated from the niche model. Then, using a nutrient intake model that included two limiting nutrients, simulations were run to see what would happen in a network of just producers and in a food web including both producers and consumers. The food web models also differed in their predators: some webs had only generalist predators, which ate producers in proportion to their biomass, while in other webs predators were given random preferences for particular producers.
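For the curious, the niche model is simple enough to sketch in a few lines. Here is a minimal Python version, assuming the standard Williams and Martinez (2000) formulation (the beta parameter is chosen so the expected connectance comes out near C); this is an illustration, not the paper's actual simulation code.

```python
import random

def niche_model(S, C):
    """Generate a food web with S species and target connectance C
    using the niche model of Williams and Martinez (2000)."""
    beta = 1.0 / (2.0 * C) - 1.0                   # so E[range] = 2C * n_i
    n = sorted(random.random() for _ in range(S))  # niche values
    links = set()
    for i in range(S):
        r = n[i] * random.betavariate(1.0, beta)   # feeding range width
        c = random.uniform(r / 2.0, n[i])          # center of the range
        for j in range(S):
            if c - r / 2.0 <= n[j] <= c + r / 2.0:
                links.add((j, i))                  # species i eats species j
    return n, links

n, links = niche_model(30, 0.15)
print(len(links) / 30**2)  # realized connectance, roughly 0.15
```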

After the simulations were compared, it was found that in the producer-only networks, 99.7% of the time only one producer dominated and all others became extinct. This extinction is thought to be caused by competition between producers, in which the most efficient nutrient obtainer eventually excludes all other species. When the food webs with predators were simulated, more producers survived a majority of the time: 86.7% of simulations with generalist predators and 91.9% of simulations with random food preferences ended with more producer species than the number of limiting nutrients. This suggests that predation greatly reduces the influence of competition, and that without predators, producers will compete until the dominant competitors drive the other producers to extinction.

As mentioned in the article, these networks have assumptions built in and effects not taken into account. Although not mentioned, the model also does not take the environment or disturbance into account. (Connell (1980) proposed that intermediate harshness in environmental conditions reduces the effectiveness of predators, allowing competition to occur. It has also been proposed in ecology that disturbances can allow inferior competitors to persist.) However, the point of this model seems to have been to illustrate that predation by itself can limit competition, and it does illustrate that general concept.

Analysis and Comparison of Modern and Cambrian Food Webs

I read another paper, “Compilation and Network Analyses of Cambrian Food Webs”, that looks at food webs as networks. This one compares modern food webs to webs derived from fossils of the Cambrian period (roughly 542 to 488 million years ago). It analyzed S (number of taxa), C (connectance), L/S (links per species), and 17 other network properties to see whether there were any differences between ancient and more recent food webs and what might be learned from those differences. The 17 features Dunne et al. analyzed include the fraction of species that are herbivores, the fraction of species without consumers, and the mean length of food chains.

Dunne et al. tried to make the representation of the Cambrian food web as realistic as possible by ranking the certainty of the links between nodes from 3 (highest) to 1 (lowest). They then tested whether this uncertainty about connections skewed their results. They did this by removing 10, 25, 50, 75, and 100 percent of the rank-1 links from the food web, removing the same number of links at random from the original food web, and computing S, C, L/S, and the 17 other properties for both networks (each reduced food web was sampled 100 times and the properties averaged, except for the 100 percent removal of rank-1 links, which is the same each time). When the two sets of results were compared, Dunne et al. found very little difference between them, suggesting that the Cambrian food web results are not being skewed by low-certainty links.
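The logic of that check is easy to mimic. Here's a toy Python sketch that computes only L/S rather than all 20 properties Dunne et al. used; the link list and certainty scores are invented for illustration.

```python
import random

def sensitivity_test(links, certainty, S, frac, trials=100):
    """Compare removing a fraction of the lowest-certainty (rank-1) links
    against removing the same number of links uniformly at random."""
    rank1 = [l for l in links if certainty[l] == 1]
    k = int(frac * len(rank1))
    targeted, randomized = [], []
    for _ in range(trials):
        kept = set(links) - set(random.sample(rank1, k))
        targeted.append(len(kept) / S)                # L/S, targeted removal
        kept = set(links) - set(random.sample(links, k))
        randomized.append(len(kept) / S)              # L/S, random removal
    return sum(targeted) / trials, sum(randomized) / trials

# Invented example: 4 species, 5 links, two of them low-certainty.
links = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
certainty = {(0, 1): 3, (1, 2): 3, (2, 3): 2, (0, 2): 1, (1, 3): 1}
print(sensitivity_test(links, certainty, S=4, frac=0.5))
```

If the two averages stay close across all the properties, as they did for Dunne et al., low-certainty links aren't driving the results.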

Of the 17 properties computed, only a few varied between the Cambrian food webs and modern food webs. The Cambrian food webs were found to have higher variability in total species links than modern webs, possibly because some species had a large number of predators. Since it is thought that Cambrian food webs were transitioning into more modern-like food webs during this period, Dunne et al. suggest that the large number of predators could be because certain species had not yet adapted to predation, and that given time they would either become extinct or evolve defenses against the predators. One of the two ancient webs also differed from the other food webs in having a longer mean shortest path and more loops. The longer mean shortest path length suggests greater separation between species and therefore less influence between them. A large number of loops in a food web is often thought to make the web less stable and therefore short-lived. Both of these conditions suggest that this food web was unstable and may have been transitioning into a more modern-like food web.

Some of the methods used here were a bit above my knowledge. Dunne et al. seem to have tested the assumptions behind the Cambrian food web quite thoroughly, and the value of this model lies not in being 100% accurate but in showing general trends. It would be interesting to see the same analysis done on an even older food web, one thought to predate the Cambrian transition stage, and compared to modern networks. Such a food web may not be available, but if it were, it might show a different structure than both the transitioning Cambrian webs and the modern food webs.

The Napoleon Dynamite Problem

I was not planning on blogging at the end of week ten but this article in the New York Times caught my eye. It discusses the Netflix challenge to improve the company's recommendation algorithm. The part that I found particularly interesting is that certain movies are really hard to classify. Bertoni, the computer programmer in the article, says that his algorithm is really accurate for the vast majority of movies, but a few movies are really hard to predict. Napoleon Dynamite, for example, is one of those movies that people either love or hate, and it is hard to say why. He says that other polarizing movies such as Lost in Translation, Fahrenheit 9/11, and Kill Bill are also hard to predict. The difficulty of predicting specific movies and the relative ease of predicting others adds an interesting dynamic to the network of movie preferences.

Tuesday, November 11, 2008

Modeling the Flu With Google Search Queries

There is an interesting article in today's NYTimes that outlines how Google is using search queries to model flu outbreaks. Google Flu Trends watches for key search terms that could indicate someone has the flu: thermometer, flu symptoms, muscle aches, congestion, etc. By knowing when and where the searches originate, they can model the spread of the flu.

They have tested their models against data from the Centers for Disease Control and Prevention (CDC)--they claim they can see the start of a flu strain 7-10 days before the CDC can. Google has also tested their historical data against the CDC's and found very high correlations. There is a study in the works that will be published in Nature.
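The correlation claim is easy to picture. A toy sketch, with invented weekly numbers standing in for query volume and CDC case counts:

```python
# Hypothetical weekly counts: flu-related query volume vs. CDC flu cases.
queries = [120, 135, 160, 210, 280, 390, 520, 610, 580, 470, 330, 240]
cdc_ili = [100, 110, 150, 190, 260, 370, 500, 600, 560, 450, 320, 230]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(pearson(queries, cdc_ili))  # near 1.0 for strongly correlated series
```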

From the article:

“This seems like a really clever way of using data that is created unintentionally by the users of Google to see patterns in the world that would otherwise be invisible,” said Thomas Malone, a professor at the M.I.T Sloan School of Management. “I think we are just scratching the surface of what’s possible with collective intelligence.”

Oh, and our friend Hal Varian is quoted in the piece.

Friday, November 7, 2008

Obesity: A contagious disease?

It seems obvious that diseases such as colds and STDs are spread through social networks, but it now seems that obesity may also be. The Framingham Heart Study, a longitudinal study looking at health, specifically weight, among people and their friends, shows that obesity tends to cluster in social networks.

It does not seem surprising that obese people may tend to be friends with other obese people, but the Framingham study shows that if someone becomes obese in a given time interval, their friends have a greatly increased chance of also becoming obese. Possible reasons for this could be that friends may simultaneously adopt similar lifestyle choices such as diet, exercise or smoking, which could affect their weight. There could also be changing attitudes towards weight that spread through social networks and make one more or less inclined to gain or lose weight because of social pressures. Similar effects were seen among siblings: if one gained weight, another sibling was also likely to gain weight.

The effects among both siblings and friends are strongest in same-sex relationships. They also depend primarily on social distance, i.e., closer friends have stronger impacts, and the effects appear to be independent of geographic distance.

Another article on the spread of obesity through social networks looks at how obesity, and other diseases, should be viewed as a hierarchy of networks (see figure below). On the bottom are all of the molecular networks, such as gene regulation, protein interaction and metabolic networks, which can be studied to look at the molecular basis of disease. Then the paper discusses a disease network, in which diseases such as obesity are linked to other diseases such as heart disease and diabetes. Diseases could be connected because they frequently co-occur, because one leads to another, because they involve similar molecular components, or through other types of connections. Lastly, there is the social network. Diseases such as the flu or HIV can be passed directly through a social network, or what spreads may be attitudes and lifestyles.
The relations between the networks add an interesting angle to the subject. Take genetics, for example. One's genetics most obviously affect the molecular aspects of disease, but the social network has a big impact on genetics as well, because within an obesity cluster, genetic predispositions for obesity are likely to be passed on to children from both parents. The combined factors of genetics, lifestyle and social pressures could make these children highly prone to obesity.

The interacting levels of networks may also lead to new strategies for treating obesity. The Framingham study emphasizes that the social network provides an important resource in treating obesity, and that social support groups could be a helpful strategy. It seems so human ecological to say it, but issues like obesity and other public health problems need to be addressed on many levels, and the hierarchy of networks provides a model for how those levels interact.

Sunday, November 2, 2008

Readers of Taste

"The main problem, if that's the word, is that we live in the physical world," writes Chris Anderson. The problem, if that's the word, is "suffering the tyranny of lowest-common-denominator fare, subjected to brain-dead summer blockbusters and manufactured pop." And to free ourselves of this tyranny is to see expression of "our true taste, unfiltered by the economics of scarcity." The Internet's virtuality, then, offers the ideal, trapped in the physical world, unmediated expression. Anderson gets palpable, winking glee at the coming of this utopia from the possibility that we may all be individuated by consumer choices. He would find companions among the economists who, looking only at the numbers, argue that oil is a commodity only limitable by technologic development, which they see as finding its limitlessness through absolution from physical constraints. The physical medium is ignored in favor of its purpose: in Anderson's case, to buy digital products. There is an infrastructure at work here, whose costs are becoming more and more negligable, but is dependent on not just fallable sytems, but on human labour. The college student downloading iTunes tracks may do so because of a physical network that must be maintained and powered. "The Long Tail" is only long in a particular dimension; Anderson neatly only addresses anyone who cares to and can read his article by referring to us as "we."

Thursday, October 23, 2008

"Do Schools Kill Creativity?" Well, find that out!

I know this has nothing to do with the class but you should watch it anyway. Nafisa and I were surfing on TED.com and we found Sir Ken Robinson's talk "Do Schools Kill Creativity?" Why should you watch it? He is funny; he mentions human ecology all the way at the end; and he raises some fascinating thoughts about "you" and creativity.

It is week six and you need a little break. And of course, you should use your imagination to create a network niche with the topic. Trust me, you won't regret it! Here is the video. Comments are welcome.

Binomial Coefficients and the Election

I've been following 538.com: Electoral Projections Done Right very closely the last few weeks. The blog authors collect polling data from a wide range of sources, weight it according to their sense of each poll's reliability, and produce aggregate projections. I'm not an expert, but from what I can tell the blog's authors are very good; I've learned a lot from reading the blog and I think their way of handling data makes a lot of sense. If you're interested in the art and mathematics of political polling, or if you just want to follow the build-up to the election, I highly recommend it.

Anyway, just moments ago what's wrong with this picture appeared in my feed reader from 538.com. In it, the author skewers some fishy poll results that were published earlier today. To do so, he makes reference to the binomial distribution. Proof that I wasn't lying when I said in class that binomial distributions are super useful.
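For a sense of the kind of sanity check the binomial distribution makes possible, here's a quick Python sketch. The numbers below are hypothetical, not the ones from the 538 post:

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# If a candidate's true support were 50%, how often would a sample of
# 300 respondents show 60% or more support just by chance?
print(binom_tail(300, 180, 0.5))  # about 0.0003 -- fishy indeed
```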

Network analysis of the IPCC

Last class we briefly discussed how networks can be applied to a vast array of topics. This term, in my Global Environmental Politics class, I've developed a strong interest in the politics of climate change. A few weeks ago I stumbled across a paper on the “Network Analysis of the Intergovernmental Panel on Climate Change (IPCC),” in which Travis Frank, Robert Nicol, and Jaemin Song from MIT examine the team structure, network architecture, and other major influences of the IPCC's Third Assessment Report.

The IPCC, which in 2007 shared the Nobel Peace Prize with Al Gore, is the scientific body in charge of assessing, on a comprehensive, objective, open and transparent basis, the latest scientific, technical and socio-economic literature produced worldwide relevant to the understanding of the risk of human-induced climate change, its observed and projected impacts and options for adaptation and mitigation.

The hundreds of climate change scientists who were members of the IPCC at that time, along with their nationalities, fields of expertise, and team collaborations, make this study quite interesting for understanding the politics of climate change within the scientific community.

The calculation of the clustering coefficient shows that the authors of the IPCC report are less connected than Newman's least connected research field. The authors also found that the "longest shortest" path in the network was 19 edges. Betweenness centrality is used to rank the top 20 authors and shows that developing countries are not represented as well as developed countries.
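These are all standard quantities. For instance, with the networkx library in Python (a toy coauthorship graph stands in here; the real IPCC network has hundreds of authors):

```python
import networkx as nx

# Toy coauthorship network: an edge means two authors worked together.
G = nx.Graph([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"),
              ("D", "E"), ("E", "F")])

print(nx.average_clustering(G))        # clustering coefficient
print(nx.diameter(G))                  # the "longest shortest" path
centrality = nx.betweenness_centrality(G)
print(sorted(centrality, key=centrality.get, reverse=True))  # author ranking
```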

I feel that this paper is a clear example of how network theory can be applied to a specific area to support arguments about collaboration, in this case on climate change.

Wednesday, October 22, 2008

Treating Food Webs as Undirected Networks

After the discussion of the ecology papers in class on Friday, I got curious about whether anyone had addressed the tendency of people studying food web networks to treat them as undirected despite their directed nature. I spent some time doing additional reading, and I found that the trend was pretty consistent through the papers I came across. The two papers that offered a justification for it (Williams et al., 2002; Dunne et al., 2002) both stated that a directed graph could be treated as undirected because "effects can propagate through the network in either direction." It seems to me that whether this is a valid assumption depends heavily on what exactly the network is being used to study. If one were studying the effects of removing certain species from the food web, then perhaps it would be an acceptable assumption, because the removal of prey and the removal of predators would have similar kinds of effects on organisms throughout the web. On the other hand, if one were looking at the transmission of toxins through an ecosystem, then one would have to use a directed network, because toxins only move in one direction. In almost every other aspect of network analysis, I would think that the directed nature of the food web is too key a part of how the web works to be ignored. There are perhaps a few situations in which an undirected food web is applicable, but it seems to me that none of the articles I found provided sufficient justification for one.
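The toxin point can be made concrete with a few lines of networkx. Here's a made-up four-species chain, with edges pointing from prey to predator:

```python
import networkx as nx

# Edges point from prey to predator, the direction energy (and toxins) flow.
web = nx.DiGraph([("algae", "zooplankton"), ("zooplankton", "fish"),
                  ("fish", "osprey")])

# Undirected treatment: effects can propagate either way.
print(nx.has_path(web.to_undirected(), "osprey", "algae"))  # True

# Directed treatment: a toxin in the osprey cannot flow back down.
print(nx.has_path(web, "osprey", "algae"))                  # False
```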

Tuesday, October 21, 2008

If we're only 6 degrees of separation away from Osama Bin Laden...

If you have 23 spare minutes, I would suggest listening to this, a National Public Radio program, Talk of the Nation, from January 25. It features Judith Kleinfeld, a psychology professor at the University of Alaska, and Steven Strogatz (of Watts and Strogatz, "Collective dynamics of 'small-world' networks"). It's not a rigorous conversation, but it is interesting nonetheless, perhaps particularly for those of us encountering network theory for the first time. The discussion is primarily dedicated to the validity of the findings of Milgram's original "6 degrees" experiment, and to modern examples and consequences of social connectedness. I think it is worth listening to if only to hear an informed discourse on networks, and Strogatz keeps the conversation interesting. He makes the point that there are three questions to distinguish when looking at the significance of small-worldness in the context of Milgram's experiment:
  1. Given 2 people, is there a short path connecting them?
  2. If there is a short path, can people find that path?
  3. If paths exist and people can find them, can they be used to exert influence?
Additionally, he makes the point that people must be willing to cooperate in order to make these paths possible - a complication that I hadn't considered. All of the above distinctions would seem to be relevant to similar applied small world networks - networks in which members have the ability to refuse a connection, and in which not all members can see the short paths. This program manages to frame network theory, and its relevance to the real-world, in an interesting (and entertaining) manner. Most memorably, Strogatz poses the title question, "If we're only 6 degrees of separation away from Osama Bin Laden, why can't we find him?"

Saturday, October 18, 2008

Power Laws, or maybe not

On numerous occasions I've urged caution and skepticism when reading papers claiming that there are power laws in some empirical data. Right on cue, a dubious power law claim appeared in a paper published a few weeks ago.

The paper in question is Yu et al., "High-Quality Binary Protein Interaction Map of the Yeast Interactome Network," from the October 3, 2008 issue of Science. The interactome referred to is a network of protein-protein interactions. The paper claims that the degree distribution of the interactome network is a power law. This claim was critiqued by Aaron Clauset in a recent blog post, poetically titled power laws in the mist.

Specifically, Aaron examines three power-law claims from the original paper. Using maximum-likelihood estimators instead of log-linear regression, he finds strong evidence that one of the "power laws" is definitely not a power law, one could be, and one probably is. For the probable power law, he estimates an exponent that is incompatible with the values published in Science. The full blog entry is well worth reading; Aaron is a good writer and the piece is a nice discussion of the right way to look for power laws in empirical data.
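For reference, the maximum-likelihood estimator in question is nearly a one-liner, unlike the fragile log-linear regression it replaces. A sketch of the continuous-data version from Clauset, Shalizi, and Newman's "Power-law distributions in empirical data" (their full recipe also estimates the cutoff x_min and runs a goodness-of-fit test; the degree data below is invented):

```python
from math import log

def powerlaw_alpha(xs, xmin):
    """Maximum-likelihood estimate of the exponent alpha for a power law
    p(x) ~ x**(-alpha), fitted to the data points with x >= xmin."""
    tail = [x for x in xs if x >= xmin]
    return 1.0 + len(tail) / sum(log(x / xmin) for x in tail)

# Example: degrees of a hypothetical network, fitted above xmin = 5.
degrees = [1, 2, 2, 3, 5, 5, 6, 8, 13, 21, 55]
print(powerlaw_alpha(degrees, xmin=5))  # roughly 2.3 for this toy data
```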

Aaron's piece is an interesting example of the way that blogs are now being used as a form of scientific communication. Aaron writes

A colleague of mine asked me why I didn't write this up as an official "Comment" for Science. My response was basically that I didn't think it would make a bit of difference if I did, and it's probably more useful to do it informally here, anyway. My intention is not to trash the Yu et al. paper, which as I mentioned above has a lot of genuinely good bits in it, but rather to point out that the common practices (i.e., regressions) for analyzing power-law distributions in empirical data are terrible, and that better, more reliable methods both exist and are easy to use. Plus, I haven't blogged in a while, and grumping about power laws is as much a favorite past time of mine as publishing bad power laws apparently is for some people.

It seems to me that science blogging serves as an excellent complement to the process of publishing in peer-reviewed journals. Blogs allow for informal comment, discussion, and debate in a way that can't happen in journals. This sort of back-and-forth serves as an important check on flimsy results and as a way to get quick feedback on new ideas. This sort of dialog isn't new; people have been debating and exchanging ideas at seminars, in hallways, at department gatherings, at academic conferences, and so on, for decades if not centuries. What's new about blogs is that they open up the discussion and allow lots of people to observe and participate in the fun.

Thursday, October 9, 2008

How Do Cells Function to Keep Organisms Alive?

I am sure most of you know about certain processes that take place in our bodies, such as respiration and digestion. But have you ever wondered why we are alive, or how individual cells synchronize to make this possible? Well, I have, and it is very interesting to think about living organisms, especially human beings. Do you think you are alive just because you eat or breathe? Metabolism is one of the processes that keep us alive. We die whenever cells stop their metabolic activities.

Metabolism is the set of chemical reactions in cells that allow organisms to grow, reproduce, walk, talk, breathe, think, etc. You get the idea. What I am mostly interested in, considering that I am posting this for the Complex Networks class, is the growing interest in modeling metabolic pathway networks. There are similarities between the metabolic pathways of most species, even between unicellular bacteria and human beings. David A. Fell and Andreas Wagner analyzed the structure of the core metabolism of a unicellular bacterium, Escherichia coli, to identify metabolites that are central to metabolism.

Why should we learn about metabolic pathway networks? One of the reasons is that we can then begin to compare the evolutionary histories and molecular mechanisms of living organisms by looking at metabolic and genomic information. It is always puzzling to think about evolution and how living organisms came into existence. I am mostly confused when it comes to evolutionist and creationist theories. On one hand, I feel religiously responsible to believe in what the Bible dictates. At least that is how I was raised. On the other hand, I need to know about evolutionary theories, not just because I need to pass my Biology class but also because I know that there has been research and evidence to support these theories.

I find the evolutionary theory hard to believe because of its nature of changing and reforming. There are discoveries every year, or even every day, that prove or disprove previous theories and research. But the creationist theory never changes. It is as simple as God created the earth and its inhabitants. I know if any of the evolutionists at COA read what I have written, they will think that I am going crazy. They are probably right.

There is an incredible amount of research showing links between each and every creation on earth, which forces me to give some credit to scientists such as Darwin. It is because of such discoveries that we come increasingly closer to knowing living creatures' behavior, structure, composition and other complex features. We also need to know how to model metabolic pathway networks to better understand these features. Surely human beings are some of the smartest, perhaps even the smartest, creatures!

Usually, only a single metabolic pathway is studied, by applying radioactive tracers to an organism. The information is then used to understand and label the pathway. One example is the metabolic pathway of cellular respiration. But this method does not help when it comes to a more complex metabolic pathway, such as the metabolism of the whole cell. The reconstruction technique has allowed researchers to build models of more complex metabolisms. The diagram on the left shows the interactions between 43 proteins and 40 metabolites in the Arabidopsis thaliana citric acid cycle. Red nodes are metabolites and enzymes, and the black links are the interactions.

In conclusion, in addition to helping us understand metabolic pathways and how cells function, these types of models can be used to classify human diseases into groups that share common proteins or metabolites, which in turn leads to drug discoveries and biochemical research.

Tuesday, October 7, 2008

The Robustness of Food Webs

“Network Structure and Biodiversity Loss in Food Webs: Robustness Increases with Connectance” (Dunne, Williams, and Martinez 2002) is a paper investigating the robustness of food webs using networks. It looks at 16 different food webs from 15 different places. Each web was tested to see how the removal (extinction) of certain species might affect the overall food web. A food web was said to collapse if over 50% of its species became extinct. Robustness is represented in this paper as the number of removals it takes to collapse a food web, as a fraction of the total number of trophic species in the web. Several other properties of the food networks were calculated to see if they correspond to robustness, including connectance, species richness, omnivory, and the number of links per species.

There were four experiments simulated with the food webs. In the first simulation the most connected species were removed; in the second the most connected species were removed, excluding primary producers (like grasses); in the third species were removed at random; and in the last the least connected species were removed. These simulations were conducted to see if any new insights could be gained about a food web's robustness, and to see if there were any particular species that were significantly more important than the others (i.e., a species whose removal would cause a mass secondary extinction).
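A stripped-down version of that simulation is easy to sketch, assuming networkx and treating the web as undirected (as the paper does). One simplification to flag: here a species goes secondarily extinct when it loses all of its links, a stand-in for the paper's loss-of-all-prey criterion.

```python
import random
import networkx as nx

def robustness(web, most_connected=True):
    """Fraction of primary removals needed before half the species are
    gone (counting both removals and secondary extinctions)."""
    G = web.to_undirected()
    S = G.number_of_nodes()
    removals = 0
    while G.number_of_nodes() > S / 2:
        if most_connected:
            target = max(G.nodes(), key=G.degree)   # most-connected first
        else:
            target = random.choice(list(G.nodes())) # random removal
        G.remove_node(target)
        removals += 1
        isolated = [n for n in G.nodes() if G.degree(n) == 0]
        G.remove_nodes_from(isolated)               # secondary extinctions
    return removals / S
```

Running something like this over the 16 empirical webs with the four removal orders would reproduce the shape of the paper's experiment, if not its exact numbers.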

Three of the 16 food networks displayed power-law degree distributions and small-world properties, but no major differences in reactions to the extinctions were found between the small-world and non-small-world food networks. However, the small-world food networks were more severely affected by extinctions because of their lower connectance. Of all the properties examined, connectance appears to be the most influential in determining the robustness of a food web. The higher the connectance, the more robust the food web, which makes sense in ecological terms: the more options a predator has to feed on, the less likely it is to become extinct due to the loss of a single prey species.

While this network analysis of food webs makes sense in mathematical terms, I would stress the need to review its assumptions in order to critique its relevance to the real world. Since the network analysis of food webs does not take into account the adaptability of species, it may be generally irrelevant for testing robustness. It could be that a species will usually eat only one prey if available, but if that prey went extinct it could move on to eating another species. Also, food webs will likely never be able to take into account all links between species, so there is an artificial cut-off point that could render this analysis unhelpful. Links may also be false due to human error, or there could be a particular species that is essential in a predator's diet because it provides some nutrient that the predator can get no other way; even though this predator has other food sources, it would still go extinct if that particular species were removed from the food web. However, if scientists could provide field evidence that connectance is an important value in the real world, this kind of analysis could allow us to identify ecosystems that are particularly vulnerable to mass extinctions.

Monday, October 6, 2008

"It's what we swim in"

Last week I found (via Jon Shock) a thought-provoking talk by Clay Shirky about, among other things, "information overload." What I really like about Shirky's view is that he posits information overload as a fact of life. Paraphrasing slightly, he advocates for

a way of seeing the world that assumes that we are to information overload as fishes are to water: it's just what we swim in. Yitzhak Rabin... has said that "if you have the same problem for a long time, maybe it's not a problem. Maybe it's a fact." That's information overload. Talking about information overload as if it explains or excuses anything is actually a distraction.... When you feel yourself getting too much information, ... don't say to yourself what happened to the information, but say to yourself what filter just broke? What was I relying on before that stopped functioning?

This is an extremely clear and much more succinct statement of what I've been feeling for a while and struggling to put into words. Information overload is a fact of life, so we need to be smart and proactive about the filters that we use to manage and process this information. Check out Shirky's talk, below, for more discussion. It's only 25 minutes and is packed with interesting observations.

Networks as a Predictor for the Spread of Cancer

In class, Dave has talked about survival rates of cancer patients. Models and networks are cropping up in other useful ways in the medical field around cancer as well. I came across a study by six doctors who used an artificial neural network to see if it could be an effective way of predicting patterns in lymph node metastasis.
An artificial neural network (ANN) is a computational model that can process vast amounts of data and complex relationships and find patterns. An ANN "learns" from the data and changes accordingly. Wikipedia has a page about ANNs that does a nice job explaining what an ANN is and how it is applied in many different contexts.
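To make the "learns from the data" idea concrete, here is about the smallest possible example: a single logistic neuron fitted by gradient descent on invented numbers. The study's actual network is multi-layered and trained on real clinical data; this just shows the mechanism.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Invented (tumor features -> lymph node metastasis 0/1) training pairs.
data = [([0.2, 0.1], 0), ([0.9, 0.8], 1), ([0.4, 0.3], 0), ([0.8, 0.9], 1)]
w, b, lr = [random.random(), random.random()], 0.0, 0.5

for _ in range(1000):                       # "learning" = weight updates
    for x, y in data:
        pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        err = pred - y                      # gradient of the cross-entropy
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        b -= lr * err

new_case = [0.85, 0.9]
print(sigmoid(sum(wi * xi for wi, xi in zip(w, new_case)) + b))  # near 1
```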
Lymph node metastasis is particularly significant because in certain kinds of cancer, namely gastric and esophageal, the lymph system is the first place to which the cancer spreads. So an accurate prediction could help catch and effectively treat the cancer in its early stages.
In this study, the doctors were specifically looking to determine whether genetics were a leading factor in the predictability of lymph node metastasis. Using a mass of data collected by the National Cancer Center in Japan, the doctors created an artificial neural network, used more data to refine it, and then tested its accuracy. In a separate study in Germany, the same ANN was able to predict the incidence of lymph node metastasis with accuracy as high as 96% for certain sub-types of cancer. This could mean more accurate, more efficient, and thus more effective surgeries and postoperative care, with fewer side effects. The way surgeries are done now, lymph nodes that are not cancerous but appear irregular for other reasons are resected. The removal of lymph nodes not only decreases immunity, but can interfere with lymphatic drainage, leading to such unpleasant conditions as lymphedema. Perhaps the use of an artificial neural network could prevent damage to the real living network that is the lymphatic system.

Monday, September 29, 2008

Mathematics = Language of Interdisciplinarity?

Michael Mitzenmacher had an interesting post last week on his blog, my biased coin. Mitzenmacher is a professor of computer science at Harvard University. He's written a few extremely interesting papers on power laws, which we'll probably read later in the term.

Mitzenmacher was writing about attending the opening of Microsoft Research New England. The theme of the opening symposium was interdisciplinary research. Mitzenmacher writes that

a natural question was how such research should be encouraged. Nobody on the panel seemed to make what I thought was an obvious point: one way to encourage such research is to make sure people across the sciences are well trained in mathematics and (theoretical) computer science. Interdisciplinary research depends on finding a common language, which historically has been mathematics, but these days more and more involves algorithms, complexity, and programming.

Mitzenmacher then goes on to describe a subsequent talk by Erik Demaine. The abstract of the talk is:

Theoretical computer science, and the algorithmic way of thinking, transcends our traditional boundaries. I believe that algorithms are relevant to every discipline of study, and will give eclectic examples from the arts and sciences to business and society. The examples span the spectrum from serious topics like protein folding and decoding Inka khipu to fun topics like juggling and magic.

There's a link to a video of Demaine's talk here, although I can't get the video to work right now.

I find myself quite intrigued by this and I'm not quite sure what to think, although I think I'm basically in agreement. Having a solid base in math, statistics, and some computer programming/computer science strikes me as almost indispensable for work in most sciences and social sciences. Quite generally, I think that having a strong understanding of these areas greatly expands the sort of scientific problems one can tackle. And it certainly increases the range of other scientists one can talk to and the depths to which those conversations can go.

So as far as the sciences go, I'm basically quite comfortable with Mitzenmacher's statement. I wonder, though, how his statement might have to be amended to apply to the humanities. Is there a "common language" that helps philosophers, anthropologists, historians, and literary theorists research together? Is this language broad enough to include political scientists, psychologists, or economists? My guess is that there is a semi-canonical body of thinkers or schools of thought that scholars in these areas would all be familiar with, and which could serve as a useful touchstone or frame of reference for interdisciplinary collaboration. But I'm not sure, as I'm not even close to an expert in these fields.

As for the centrality of computer science and mathematics for science, I worry sometimes that COA could be doing more to prepare students in these areas. We've graduated lots of students who are very well prepared in math and who have gone on to make good use of their math backgrounds in grad school. But I think we can do better. Part of the problem is that we could use a few more classes in math, statistics, and computer science. But I also think that there might be a subtle bias against mathematics, a perhaps unspoken idea that if you learn too much math you'll lose your sense of creativity and joy and acquire a simplistic and reductionist approach to everything. Needless to say, I disagree.

Anyway, partly inspired by Mitzenmacher, for the next two classes I want to attempt to present a sort of crash course or primer in interdisciplinary probability and stochastic processes. There are just some good, basic, widely applicable things about this area that I think (almost) every scientist and social scientist should know. We'll see how it goes. Fasten your seat belts...

Monday, September 22, 2008

Simple Networks for a Brighter Tomorrow

Lately I have been giving much thought to the psychological applications of graph theory to neural networks. If indeed a neural network is an accurate model of the human brain, what may we discover by analyzing it as a graph? Is the brain a small-world graph? Does it have significant clumping? If its degree distribution, as I hypothesize, is not Poisson, perhaps it isn't arbitrary after all, and what we thought was subjectivity is actually just the logic of pathways. But undoubtedly, since it is probably not an Erdős–Rényi graph, there's something there. The possibilities are limitless. And limiting: “although neural nets do solve a few toy problems, their powers of computation are so limited that I am surprised anyone takes them seriously as a general problem-solving tool.” Jeepers. I suddenly feel very small.
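For comparison, it only takes a few lines to see what a Poisson degree distribution actually looks like. A networkx sketch:

```python
from collections import Counter
import networkx as nx

# An Erdos-Renyi random graph has an approximately Poisson degree
# distribution with mean p * (n - 1); the hypothesis is that a brain doesn't.
G = nx.gnp_random_graph(1000, 0.01)
hist = Counter(d for _, d in G.degree())
for k in sorted(hist):
    print(k, hist[k] / 1000.0)   # should hug the Poisson pmf with mean ~10
```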

Sunday, September 21, 2008

Fearful Parents Increasing the Risk of Infectious Disease

A recent report by the American Medical Association warns that the number of parents opting not to vaccinate their children is leading to an increase in outbreaks of preventable infectious diseases. They cite recent measles outbreaks around the United States. Apparently most of the children infected had not been vaccinated against the disease.

Vaccinations work on two levels: they protect the individuals who receive the vaccines, and they protect others in the community by decreasing the number of individuals who are susceptible to a disease. In theory, if a large enough percentage of a group is vaccinated, there is a very low chance that the un-vaccinated individuals will be exposed to the disease, creating herd immunity. This provides the reasoning behind allowing parents with particular "religious" beliefs to opt out of vaccinating their children. As long as the majority of a population is immune to a disease, the rest of the population is also protected. The problem comes when a critical number of people are no longer being vaccinated.
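That "large enough percentage" has a standard back-of-the-envelope form: under the textbook assumption of homogeneous mixing, an outbreak cannot take off once the immune fraction exceeds 1 - 1/R0, where R0 is the disease's basic reproductive number. A sketch (the R0 for measles is the commonly quoted 12-18 range, not a figure from the AMA report):

```python
def critical_coverage(R0):
    """Vaccinated fraction needed for herd immunity: 1 - 1/R0."""
    return 1.0 - 1.0 / R0

print(critical_coverage(15))  # measles, R0 ~ 12-18: roughly 93%
```

Notably, the clustering described below is exactly what breaks the homogeneous-mixing assumption behind this formula.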

My first thought was that an increasing number of parents must be choosing not to vaccinate their children. There have been a lot of stories about the dangers of vaccines, so perhaps these are scaring parents away. Then I saw a 2004 report from the CDC showing that the number of children being vaccinated is increasing. While that report is a few years old, perhaps the overall percentage of vaccinated children is not the problem. Perhaps the recent outbreaks have more to do with the distribution and connectivity of the susceptibles than with their number.

Even if the parents who choose not to vaccinate their children are a fairly small percentage of the total population, they are likely not distributed evenly. Certain beliefs are likely to be clustered in certain geographic areas, which may lead to a clustering of susceptible individuals. The availability of health care will likely leave some populations more vaccinated than others. Siblings of susceptible children are likely to also be susceptible. Additionally, one parent who doesn't believe in vaccinating their children may convince other parents that vaccination is unnecessary, leading to the growth of an un-vaccinated and highly susceptible cluster.

The AMA study states that many of the current measles outbreaks are linked to international travel. Some parents may be lulled into a sense of security by the fact that most of these diseases are very rare in the United States, so the chance of their children being exposed seems really small. However, it only takes one individual to serve as a connection between an infected group and a susceptible group. According to the small-world theory, these diseases are likely far closer than we would like to believe.

While the small-world theory suggests that we are all closely connected, clusters with similar beliefs are likely more closely connected internally than to the rest of the population. For example, parents who choose not to vaccinate their children may also tend to send their children to the same types of schools or summer camps, attend similar events, and choose similar vacation destinations. All of this conspires to create well-connected clusters of un-vaccinated individuals. While only a small percentage of the population may be included, the infection of any one of these individuals could lead to a large outbreak among the susceptible clusters.

There is an interesting dynamic of self-defeating, infectious nodes within the larger network of the population. If you picture a network of people in the United States and their daily social connections, most of the individuals are vaccinated against these diseases. There are some individuals scattered throughout the population who remain susceptible, but since they are far outnumbered by immune individuals, the population as a whole remains protected. Now, imagine that from each of the susceptible nodes spreads the infectious belief of not vaccinating one's children. Over time, clusters of susceptibles will form. Perhaps if disease appears in one cluster, the rest of the cluster will again be convinced to vaccinate their children. There must be a fine balance between the infection rate of the disease and the popularity of not vaccinating. Perhaps it is an ongoing cycle: the more distant the disease seems, the more contagious the practice of not vaccinating becomes, and the more susceptible the population grows until it is infected; then the popularity of the vaccine increases until the disease once again seems distant. This creates the interesting challenge of convincing parents to vaccinate their children even when the threat of disease seems minimal.

Friday, September 19, 2008

The 2 Königsbergs

In The Structure and Function of Complex Networks, one of M.E.J. Newman's first assertions is that Leonhard Euler's "1735 solution of the Königsberg bridge problem is often cited as the first true proof in the theory of networks." Dave also presented Euler's solution in class today, advocating the importance of Euler's solution to the field of networks. Nonetheless, this is a fairly bold assertion. Learning more about the origins of the field of networks, while perhaps slightly arbitrary, struck me as profoundly interesting. It seemed that some investigation was merited.

This research quickly yielded Euler's original Latin publication in PDF form, as well as an informative Wikipedia entry on the Seven Bridges of Königsberg problem. Perusing the Latin publication is an interesting experience, particularly seeing Euler's original drawings of the problem. Euler employs the universal point-labeling system in his proof (A, B, C, D…), which apparently transcends time and language. It also left me wondering whether Euler himself ever drew a figure of nodes and edges to represent the problem, or whether that jump from 1736 to 20th-century logic came later. Euler's original paper is interesting to ponder for a few moments before reading the Wikipedia entry.

Wikipedia additionally features a nice entry on Euler, a renowned Swiss mathematician and physicist who is famous for Euler's formula in calculus, among other discoveries. For those who are unaware (or need refreshing), the Seven Bridges of Königsberg was a problem involving seven bridges crossing the Pregel River in Königsberg, Prussia, via two large islands in the middle of the river. The object was to determine whether it was possible to go for a walk crossing each bridge exactly once. Königsberg was the capital of East Prussia until WWII, when it was occupied by the Soviet Army and renamed Kaliningrad. Kaliningrad still exists today, although it's unclear whether all the bridges survived the Soviet Army. The following, however, is a nice picture of one of the islands from the Königsberg bridge problem.
(photo source: Wikipedia)

Coincidentally, Königsberg was also the name of a German cruiser in WWI. It's unclear how or why Königsberg the city and the German warship Königsberg were related (or maybe it's just a small world?), but the warship met a fate similar to the city's: it was sunk by the British in 1915. Thus, neither the city nor the warship Königsberg remains.

The end of this story, which many are familiar with, is that Euler solved the Bridges of Königsberg problem. He did so by reducing the system of bridges to what we would now call a network: a system of nodes and edges. Ultimately, he argued that the feasibility of such a walk depended entirely on the degrees of the nodes in the system: the walk existed only if there were exactly two or zero nodes of odd degree. This discovery led to the so-called "Euler path" or Eulerian trail, as well as the Eulerian circuit, and to the beginning of network theory. Euler and his paths and circuits remain entirely alive and relevant in modern science; a search for Euler circuits on Google Scholar brings back 27,700 articles.
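Euler's parity argument is easy to check by machine. A sketch with the seven bridges encoded as edges between the four land masses (the A-D labels are arbitrary):

```python
from collections import Counter

# The seven bridges of Konigsberg, as edges between four land masses.
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

odd = [node for node in degree if degree[node] % 2 == 1]
print(odd)  # all four land masses have odd degree, so no such walk exists
```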

Wednesday, September 17, 2008

The Future of the Internet

There is only one machine.
The web is its OS.
All screens will look into the One.
No bits will live outside the One.
To share is to gain.
Let the One read it.
The One is us.


This is Kevin Kelly's vision of the Internet in a not-so-distant future--the next 5,000 days. I know what you are thinking: he's some techno-cult leader promising us all lower mortgages, better sex lives and digital salvation. I thought so too, but no--Kelly is the Executive Editor at Wired, and is known for his deep and accurate insights into the future of technology, specifically our favorite network: the Web. At the 2007 TED conference Kelly gave a presentation called "Predicting the Next 5,000 Days of the Web," where he gave his vision for the future of the Internet, which is exciting, powerful and scary (all at once). What is interesting from the point of view of network studies is his claim that everything will be part of the Web--EVERYTHING--all machines, all data, all systems. It will be a single global machine that is smarter, more personalized and more ubiquitous, where a sense of unity, a sense of consciousness, will emerge. As Kelly says, "we all thought the Internet was going to be TV but better." Perhaps the next 5,000 days of the Web will be much more than the "Web but better." Perhaps one day there will no longer be networks but only The Network, or as Kelly calls it, "The One."

Click here to watch his presentation; it is about 20 minutes long. It is also worth checking out the TED website, which has a huge collection of amazing thinkers giving short presentations on all sorts of really cool stuff.

Kelly first gives some really interesting statistics that help us get a sense of the size of the Internet. These are informal metrics similar to the ones we have been talking about in class: 100 billion clicks/day, 55 trillion links (edges), 2 billion chips, 2 million emails/sec, 8 TB/sec of traffic, 65 billion phone calls/day, and 600 billion RFID tags. He is giving data on both the physical network (the Internet) and the data that is on it (www, VoIP, etc.). In addition, the whole thing consumes 5% of the world's electricity.

He then goes on to make some (fairly controversial) comparisons between the complexity of the Internet and the human mind. Though the numbers of nodes and edges in the brain (neurons) and on the Internet (wires and switches) may be similar, I personally consider the current emergent properties of the two networks to be very dissimilar, though I suppose it depends on how one defines "intelligence." Kelly claims that by 2040 the power of the Internet will equal the processing power of humanity, which he supports by noting that, unlike computers, our brains aren't doubling in power every two years. Perhaps by 2040 complexity will indeed be proved to be the cause of life.

From his point of view, this "new" Web will be different in three ways:
  1. Embodiment
  2. Restructuring
  3. Codependency
Embodiment means that the web will really start to have a physical form. It will be a collection of all the machines (computers, cell phones, cameras, microphones, GPS units, cars, etc.) that can speak to the network, or what Kelly calls the Cloud. Think of the cloud this way: when you check your email, where is it sitting, waiting for you to eagerly log on and read it? The Cloud. The beauty of the cloud is that it really doesn't matter where it is; what matters is that you can access it from anywhere, with an ever-growing list of devices, especially wireless ones. Devices that communicate with the Cloud serve two functions: they act as windows into the Web, as well as the eyes (cameras) and ears (microphones) of the web. In Kelly's view the web will no longer just be a set of web sites and links between them; all things will be synchronized with the web so that everything "lives" on the web as well as in physical reality. It will become an "Internet of things." The web will branch out and include all things in its network; nothing will be offline. The network will become a network of networks. From the network's point of view, we humans will just be "extended sensing tools." As much as the Internet will serve us, we will serve it.

Restructuring is a process in which the nodes of the Internet gain an ever-increasing level of granularity. The early Internet was computers linking to computers. Now we have the WWW, which is pages linking to pages. The Internet of the future will be data linking to data; ideas will link with ideas. Everything will be linked to, and in turn be defined by, links to other things. The fourth stage will be an Internet of things, where physical things become linked to the web.

The last change the Internet will see is Codependency. This means that we will rely on the Internet for our very survival as much as the Internet relies on us for its existence. Kelly posits that using the Internet as a tool is no different than using written language as a tool. The codependency process has just one catch: you have to be willing to have your data shared. For the Internet to really become a tool humanity depends on, we can have no secrets. "Total personalization will require total transparency." Having information shared in such an open way could lead to abuses and privacy issues. I think that as we become more codependent, Internet privacy will become a much-debated issue. We have already seen abuses, with telecom companies allowing the US government to secretly monitor the international calls of US citizens. With Codependency, the line between the virtual and real worlds will be very fuzzy; "we will become the web." The moral ramifications of this synergy will be huge, and need to be scrutinized very carefully to ensure responsible access and usage of information.

I don't think we can fully anticipate what the Web will look like in the future--I don't think anybody really knows. What this talk does show us, however, is that at one point there was no Internet, and then it emerged and began evolving to become the Web that we all know and love. From Kelly's perspective the Internet seems to be evolving and becoming more complex, as if it were an organism. The power of the Internet is greater than the sum of its parts, as with any organism, where life is the emergent property. Emergent properties emerge (for lack of a better word) from the interactions of all of the nodes. There seem to be three ingredients of network evolution: more nodes, more edges, and the emergent properties that come from the interactions between nodes.

If you subscribe to evolution (design without a designer), then the web looks very much like a life form. It began as very simple networks and has grown into something that is evolving and growing on its own. This raises an interesting question regarding networks: is there a point where a network becomes sufficiently complex and large that really unexpected and amazing things start to happen? Like life, or the Web.

Kelly is not positing that the Web is an organism in the traditional sense; I think he is positing that it is an organism in a non-traditional sense, one that started like all networks do, biological or technological: with a random linking of two nodes. He is making us think about what it means to be intelligent, what it means to be conscious and, ultimately, what it means to be One.

"The first person to buy a fax machine was an idiot." But the second...

Tuesday, September 16, 2008

Students in the 21st Century

Tuesday in class I made some remarks about dragging COA and its students into the 21st century. I also used this phrase last Spring in my Differential Equations course. I thought it might be a good exercise for me to say a little bit more about what I mean by this. Perhaps it will be of interest to others, too.

It might be easier to say what I don't mean. I'm not concerned that students keep up with all the latest electronic gadgets. And I certainly don't want to encourage students to unquestioningly accept new technology. I don't think debates about whether or not it's better to read the New York Times online or on paper are very interesting. And I'm not interested in virtual reality, Second Life, computer art, website design, or whether or not instant messaging will destroy this generation's ability to write grammatically.

What I am interested in is the vast amount of new technology that's actually very, very useful. How can this technology -- the myriad "web 2.0" apps and gizmos -- help us do the important and fun work we want to do? This technology isn't an end in and of itself, although it may seem like it. But it's a potentially powerful tool. Which tools are good, and which should be avoided? Moreover, like it or not, much of this new technology is here to stay. Email, the web, Wikipedia, social networking, and so on aren't going anywhere.

We live in a world that is "information rich." Email, blogs, podcasts, online journals and newspapers: there is a dizzying array of information to which we have easy access. A lot of this information comes streaming at us whether we want it or not. This information is, of course, of wildly varying quality. Lots of it is distracting and irrelevant, and some is pure rubbish. But there's so much good stuff out there, too, that it would be self-defeating to turn one's back on the sea of online information.

I worry occasionally that students get strong messages that the internet is never to be trusted, and that real knowledge comes in books or at least printed on paper. Or that electronic media is a little frivolous and library and book research is more serious and scholarly. The result is that sometimes students reflexively turn away from the online world. I want students to turn toward it and embrace it, at least long enough to see what it has to offer. I love books -- my home and office are full of them -- and I love libraries. But if I solely relied on paper resources it would be almost impossible to stay current in my fields. And it would take an enormous amount of paper.

I think it is important to gain skills and learn how to use tools to efficiently sample a lot of the new (and old) knowledge and ideas that are being produced. Even more important is being able to sort, index, store, share, and re-find references and resources that are useful. My experience has been that many COA students (and faculty) are unaware of lots of tools and strategies for working in an information rich environment.

I am hard pressed to think of many jobs/careers/callings which don't require some sort of facility with lots of different forms of (mostly) electronic communication and reading. There will almost surely be some fields or bodies of knowledge that students will need to keep up with: the art scene in Chicago, or politics in Nebraska or Nigeria, or an academic field (usually more than one), or the goings-on in one's professional societies or associations, and so on. I want students to have good strategies and techniques for doing so efficiently and smartly. And email can be soul-crushing and time-consuming, but it's here to stay. Better to develop strategies and techniques for dealing with it. Lots of it.

Ultimately, it's not up to me to determine what strategies people adopt to navigate this information-rich world. This will vary a lot from person to person. But I do think it's appropriate for me to pester students to think about these issues and gently coerce them into trying some different approaches. In fact, the more I think about it, I worry that I would be remiss if I didn't do so.

These thoughts feel a little incoherent to me. I'd welcome questions and comments. If there is interest, I may follow up this post with a few others concerning some more specific (and practical) thoughts I have about particular strategies for working in information-rich settings.