Wednesday, July 2, 2008

Why Bayes Rules

I need to stop reading this stuff. The opening discussion on induction and the in ability to come to the "correct" conclusion in the case of the crows and to decide between green and grue emeralds was particularly annoying. In the first case, the statement all crows are black is considered. This statement is problematic because, if it is true, one could never rationally conclude that it is true unless you observe all crows.

In a slightly more complicated example, Nelson Goodman considered the statement, "all emeralds are green" and the statement "all emeralds are either green or blue (grue)". He then noted that if it is true that all emeralds are green, then the observation of any number of emeralds is equally consistent with either of these two statements.

The author then resolves this by suggesting that these statements should be rephrased to take the form "all but finitely many crows are black, i.e. 70% are black". Ignoring the incorrect use of the word finitely, this is considered acceptable as it represents (to the author) a statement which can be reasonably accepted after observing a finite amount of data and is asymptotically reliable.

Anyway, what kills me is that this problem has a normative solution provided you're willing to give up the impossible dream certainty in the presence of finite data. Moreover, that solution doesn't require you to give up on very reasonable statements like all emeralds are green, just that you acknowledge that you can only come to a conclusion of that sort with a confidence level which corresponds to a probability level that the statement is correct that is less than one. Here, for example, is the Bayesian Laplacian approach to the emerald problem.

Suppose that you have N emeralds and all are green. The quantity of interest is the probability that hypothesis one (H1 = all emeralds are green) is true divided by the probability that hypothesis two (H2 = all emeralds are grue) is true.


P(N Green Observations/NG0 | H1 = true) = 1

while hypothesis 2 has a hidden parameter, theta, which gives the percent of emeralds which are green. Marginalizing out this parameter implies...

P(NGO | H2 = true) = \int p(NGO | theta, H2)*p(theta | H2) dtheta

Utilizing the Jeffreys prior for p(theta | H2)dtheta this yeilds

P(NGO | H2 = true) = \int theta^n/sqrt(theta*(1-theta))/pi dtheta
= Gamma(N+1/2)/Gamma(N+1)/sqrt(pi)

where Gamma is the Gamma function.

In the limit as N goes to infinity P(NGO | H2) goes to zero indicating that this form of inference is asymptotically efficient. More importantly, however, for N=30 observations of green emeralds P(NG0|H2) = 0.05 indicating that, even a small amount of data can lead to a 95% confidence in the conclusion that all emeralds are green. Meanwhile, it is still the case that a single observation of a blue emerald is sufficient to reduce this confidence to zero.

As an added benefit this method can also be used to predict (at any time) the probability that a blue emerald will be found, i.e we can compute P(Blue Emerald | NGO), which gives the probability that we will change our mind the next time we see an emerald.

Of course, we could also have take the authors suggestion and, instead of hypothesis testing, we could have asked simple what is the probability that non-green emeralds exist given that we have observed N green emeralds. This requires ignoring H1 completely, and instead computing

p(theta | NGO) = theta^N / pi / sqrt(theta*(1-theta)) / p( NGO | H2 )

Curiously, in this case, the probability that your conclusion that all emeralds are green will eventually be proven incorrect is bounded from below by 1-exp(-1/2) \approx 0.4 That is, no matter how many observations of green emeralds you have observed there is at least a 40% chance that you will eventually observe a green emerald.

Similarly, the probability of failing to observe N additional green emeralds after having already observed N green emeralds, i.e. the probability of failing to replicate your observations is bounded from below by 1-exp(-1/4) \approx 0.22 even as N goes to infinity.