Saturday, October 18, 2008

Power Laws, or maybe not

On numerous occasions I've urged caution and skepticism when reading papers claiming that there are power laws in some empirical data. Right on cue, a dubious power law claim appeared in a paper published a few weeks ago.

The paper in question is Yu et al., High-Quality Binary Protein Interaction Map of the Yeast Interactome Network, from the October 3, 2008 issue of Science. The interactome referred to is a network of protein-protein interactions. The paper claims that the degree distribution of the interactome network follows a power law. This claim was critiqued by Aaron Clauset in a recent blog post, poetically titled power laws in the mist.

Specifically, Aaron examines three power-law claims from the original paper. Using maximum-likelihood estimators instead of log-linear regression, he finds that one of the "power laws" is not a power law, one could be, and one probably is. For the one that probably is a power law, he estimates an exponent that is incompatible with the values published in Science. The full blog entry is well worth reading: Aaron is a good writer, and the piece is a nice discussion of the right way to look for power laws in empirical data.
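To give a flavor of what a maximum-likelihood estimate looks like in practice, here is a minimal sketch in Python. It is not Aaron's code or the analysis from his post. It uses the standard closed-form estimator for a continuous power law, alpha_hat = 1 + n / sum(ln(x_i / xmin)), and it assumes the lower cutoff xmin is already known. Degree data are discrete, so for a real network this formula is only an approximation, and choosing xmin and testing goodness of fit are separate steps not shown here. The function names sample_power_law and mle_exponent are made up for illustration.

```python
import math
import random


def sample_power_law(n, alpha, xmin, rng):
    """Draw n values from a continuous power law p(x) ~ x**(-alpha), x >= xmin.

    Uses inverse-transform sampling; requires alpha > 1.
    """
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def mle_exponent(data, xmin):
    """Maximum-likelihood estimate of the exponent for values >= xmin.

    Returns (alpha_hat, std_err), where std_err is the large-sample
    approximation (alpha_hat - 1) / sqrt(n).
    """
    tail = [x for x in data if x >= xmin]  # only the tail is modeled
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    alpha_hat = 1.0 + n / log_sum
    std_err = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, std_err


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    synthetic = sample_power_law(20000, 2.5, 1.0, rng)  # synthetic data, true exponent 2.5
    alpha_hat, std_err = mle_exponent(synthetic, xmin=1.0)
    print(f"estimated exponent: {alpha_hat:.3f} +/- {std_err:.3f}")
```

The point of the sketch is that the estimate comes straight from the data values, with no binning and no line fitted on log-log axes, and that is where the regression approach tends to go wrong.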

Aaron's piece is an interesting example of the way that blogs are now being used as a form of scientific communication. Aaron writes:

A colleague of mine asked me why I didn't write this up as an official "Comment" for Science. My response was basically that I didn't think it would make a bit of difference if I did, and it's probably more useful to do it informally here, anyway. My intention is not to trash the Yu et al. paper, which as I mentioned above has a lot of genuinely good bits in it, but rather to point out that the common practices (i.e., regressions) for analyzing power-law distributions in empirical data are terrible, and that better, more reliable methods both exist and are easy to use. Plus, I haven't blogged in a while, and grumping about power laws is as much a favorite past time of mine as publishing bad power laws apparently is for some people.

It seems to me that science blogging serves as an excellent complement to the process of publishing in peer-reviewed journals. Blogs allow for informal comment, discussion, and debate in a way that can't happen in journals. This sort of back-and-forth serves as an important check on flimsy results and as a way to get quick feedback on new ideas. This sort of dialog isn't new; people have been debating and exchanging ideas at seminars, in hallways, at department gatherings, at academic conferences, and so on, for decades if not centuries. What's new about blogs is that they open up the discussion and allow lots of people to observe and participate in the fun.

4 comments:

Yiftu said...

This term is my first time using a blog and I have come to realize how useful it is. It is also fun having authorship of your own publication and maybe finding yourself on Google! Although it is easy and quick to find information, critiques, reviews and the like in blog posts, we need to question their reliability. Otherwise it will be a total waste of our time, plus our minds will get messed up with false knowledge.

helen said...

I agree with Dave that science blogging can be an excellent COMPLEMENT to publishing in the peer-reviewed literature (preferably in an open-access forum). I wouldn't want to see the peer review process replaced entirely. Not all bloggers are as careful and thoughtful with their posts as Aaron Clauset.
What excites me most about science blogging is that the audience has the potential to be so vast. Scientists rely on public support to a great extent (in terms of funding and cultural attitudes), and science blogging represents an additional way to effectively educate the public about what scientists are learning. And I'm a great believer in the power of education and in the value of science. I find myself thinking, "once informed, who could resist being persuaded?"

dave said...

I think Yiftu and Helen are right: there's a lot of unreliable stuff out there. But there's also some pretty soft stuff that gets by peer review. So I'd encourage a healthy skepticism regardless of the medium one is reading. That said, there certainly is a large variance in blogging quality. But it might be that the good stuff in science blogs is better---or at least different---than the good stuff in peer reviewed articles.

The original post by Aaron that I linked to is, I think, extremely informative and useful for researchers wanting to do statistical inference of power laws correctly. There's no doubt in my mind Aaron's post, and similar (and less kind) posts by Cosma Shalizi, lead to better science.

In physics/applied math, there is a set of things that everyone (should) know, but which aren't always in textbooks, and aren't necessarily in peer reviewed articles. Examples of these sorts of things include optimal methods for power law inference, and lots of other computational tricks and tips. I suspect there are similar unwritten bits of wisdom in other fields, even Invertebrate Zoology. It used to be that these sorts of things were passed down from adviser to student, or from advanced students to beginning students. Blogs give another path for transmission. Aaron's blog post is archived and will be around for researchers young and old to find, be enlightened by, or disagree with.

helen said...

Hey, thanks for the shout-out to my blog. I need you to show me how to link in the comments section. I suspect I need some html literacy. Always happy to engage in skills building.