Monday, November 8, 2010

Just an Idea to Throw Out There

Bayesian Non-Parametric Models as the Appropriate Null Hypothesis.

EX: The Cascading Indian Buffet Process

I was doing some casual web surfing when I came across a set of slides Hanna Wallach made regarding a generative model for deep belief networks (link above). I always liked DBNs but felt that they were almost too general. Add enough layers, give them enough data, and they can do just about anything.

That's when something which is probably obvious to many people doing Bayesian non-parametrics finally occurred to me: the cascading Indian buffet process may constitute the Bayesian equivalent of a null hypothesis (at least for directed graphical models of this kind). After all, given this directed structure, these models have basically no assumptions built into them. Nonetheless, they are quite complex, much more so than the standard null hypothesis of 'no relationship', which is almost surely false. Structured models appropriate to the data should at least do better than these assumption-free models. This is a sad statement for data on which these models actually are the best performers: when the cascading IBP is the best performer, we should probably conclude that we really don't understand squat about the mechanism generating the data.

Anyway, just a thought... and a Chardonnay induced one at that.

Sunday, February 21, 2010

Stop saying that it's JUST noise

There is a comical trend these days of labeling data incompatible with one's theories or views as 'just noise' or as 'simply random fluctuations'. The suggestion implicit in statements such as these is that you should somehow discount that data point. This is confirmation bias at work and is utter folly. Just as every data point compatible with a particular hypothesis should increase your degree of belief in the theory, so every disconfirming data point should decrease that degree of belief. Perhaps not by much, but it must be taken into account and given the same weight as all previous data points.
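As a minimal sketch of what "equal weight" means here, consider a single hypothesis H against one alternative, updated with Bayes' rule one data point at a time. The function name `update` and the likelihood values 0.6 and 0.4 are made up purely for illustration.

```python
def update(prior, lik_h, lik_alt):
    """Return P(H | d) given the prior P(H) and the likelihood of the
    data point d under H (lik_h) and under the alternative (lik_alt)."""
    numerator = prior * lik_h
    return numerator / (numerator + (1.0 - prior) * lik_alt)


if __name__ == "__main__":
    belief = 0.5
    # A data point that is more likely under H raises the belief...
    belief = update(belief, lik_h=0.6, lik_alt=0.4)
    print("after confirming point:", belief)
    # ...and an equally strong point against H lowers it by the same factor.
    belief = update(belief, lik_h=0.4, lik_alt=0.6)
    print("after disconfirming point:", belief)
```

In this toy setup the two points cancel exactly and the belief returns to where it started. Labeling the second point 'just noise' amounts to skipping the second update.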

This view is hardly novel, but it is worth repeating. Now that I have you nodding in agreement (and likely questioning the depth of this post) there is one ever so slightly more subtle point I would like to politely make.


Noise is not just some random fluctuation that obscures whatever signal our theory predicts. No, noise is how we model ignorance. Now it may be that there is some fundamental limit to our knowledge about something (as in quantum mechanics), but, in general, the physics is in the fluctuations, and what we should be trying to understand tomorrow is the stuff we were forced to label as noise today. Even in the presence of rational inference, calling something noise and suggesting we forget about it is the equivalent of declaring the scientific process finished, and that is something I hope we never do, regardless of how politically convenient it might be.

Q: What do you call a theory that is equally compatible with any data set?

Unscientific is the polite answer.

Time Magazine Then (June 17, 2009):
'Warming will make skiing, ice-skating and snowmobiling pastimes of the past in many areas of the Northeast, decimating the multibillion-dollar winter-sports industry.'

Time Magazine Now (Feb 10, 2010):
'Climate change could in fact make such massive snowstorms more common, even as the world continues to warm.'

National Geographic has an equally short memory.

National Geographic Then (2009):
'Droughts will become more common'

National Geographic Now (Feb. 2010):
'Scientists say global warming is the main culprit behind this month's eastern-U.S. snowstorms—and it could cause more heavy snowfalls in future winters.'

Not credible on any topic (other than pretty pictures)

Wednesday, October 7, 2009

Unbiased but Probably an Underestimate

Over at Mind Hacks, Vaughan asks why it always seems worse than you think. It turns out that it's proper Bayesian statistical reasoning. The frequentist constructs estimates that are unbiased in the limit of large data. For finite data sets these estimates might be unbiased in some sense, but that doesn't mean that the true value is just as likely to be greater than the estimate as less than it. The median estimate has this property, by definition, but in most cases of interest the median is significantly higher than the mean or the maximum a posteriori estimate. This seems to be generically the case when estimating quantities which are small. For example, consider two cases where we are estimating a small quantity. In one case, I sample from a Poisson distribution whose mean is drawn from a gamma distribution. In the other case, I assume a beta prior (a uniform prior in particular) for the probability p of a binomially distributed random variable. Because these are conjugate priors, the posterior distributions are also gamma and beta respectively, and we can compute the probability that the true value of the associated mean, conditioned on observing K examples (normalized by N samples in the binomial case), is greater than the maximum a posteriori or mean estimate.

Clearly both MAP and MEAN estimates of rare events are likely to be underestimates given finite data. The same is true for ML estimates. In the binomial case a uniform prior was used. In the Poisson case the results are independent of the prior... unless I made a silly error putting this together in 15 minutes :) For the binomial case, a Jeffreys prior also seems to have this property despite heavily favoring small probabilities...
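The comparison described above can be roughly reproduced with a few lines of Monte Carlo. This is only a sketch under stated assumptions, not the calculation behind the original result: the Gamma(shape=1, rate=1) prior in the Poisson case is an arbitrary choice, N is taken to be the number of Poisson observations and K their total count, "mean" is taken to be the posterior mean, and the function names are made up for illustration. It uses only the standard library's `random.Random.gammavariate` and `random.Random.betavariate`.

```python
import random


def tail_probs(draws, map_est, mean_est):
    """Fraction of posterior draws above the MAP and above the mean estimate."""
    n = len(draws)
    above_map = sum(x > map_est for x in draws) / n
    above_mean = sum(x > mean_est for x in draws) / n
    return above_map, above_mean


def poisson_case(k, n, shape=1.0, rate=1.0, n_draws=50_000, seed=0):
    """Poisson rate with an assumed Gamma(shape, rate) prior.

    After n observations with total count k the posterior is
    Gamma(shape + k, rate + n).
    """
    rng = random.Random(seed)
    post_shape = shape + k
    post_rate = rate + n
    # gammavariate takes (shape, scale), so the scale is 1 / rate.
    draws = [rng.gammavariate(post_shape, 1.0 / post_rate) for _ in range(n_draws)]
    map_est = max(post_shape - 1.0, 0.0) / post_rate  # posterior mode
    mean_est = post_shape / post_rate                 # posterior mean
    return tail_probs(draws, map_est, mean_est)


def binomial_case(k, n, n_draws=50_000, seed=0):
    """Binomial p with a uniform Beta(1, 1) prior; posterior is Beta(k+1, n-k+1)."""
    rng = random.Random(seed)
    draws = [rng.betavariate(k + 1, n - k + 1) for _ in range(n_draws)]
    map_est = k / n                # posterior mode, equal to the ML estimate here
    mean_est = (k + 1) / (n + 2)   # posterior mean
    return tail_probs(draws, map_est, mean_est)


if __name__ == "__main__":
    n = 100
    for k in (0, 1, 2, 5):
        print("K =", k,
              "Poisson (P>MAP, P>mean):", poisson_case(k, n),
              "Binomial (P>MAP, P>mean):", binomial_case(k, n))
```

Under these assumptions, with K = 1 and N = 100 the probability that the true value exceeds the MAP estimate comes out well above one half in both cases, while the probability that it exceeds the posterior mean comes out below one half. That suggests the "mean" in the comparison above may refer to the sample mean K/N rather than the posterior mean, so it is worth checking which one is intended.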

Monday, February 23, 2009

Change Reply-To Address on iPhone

Annoyed that your iPhone uses your Gmail address as the reply-to address? I was. Here's the fix. Go to Settings, then Mail, on your iPhone and deactivate your Gmail account. Then click on Add Account. Now don't click Google Mail, but instead click Other. On the first page put the desired return email address in the Address area, but use your Gmail password for the password slot. Then just configure your IMAP and SMTP settings using your Gmail address and password as usual. Note that the email address you put into the first slot will have to be a Gmail-authorized address. You can authorize an address from the settings page at

Thursday, February 19, 2009

Adding the Noise...

As I am burning out preparing for COSYNE, I've decided to add some noise of a more pleasant variety. If you're in London and like to punctuate good music with beer and Bayes, please join me:
(this one is a maybe as some figures must be made today)

These are definites:

iPhone and Gmail Properly Configured

To make sure that things you trash on your iPhone show up in your Gmail trash, and that messages you send show up in your Gmail sent folder, follow these instructions: