There are two main schools of thought in the realm of statistical analysis (sorry Nihilists, though your attitude toward uncertainty is admirable in its boldness, it’s no way to go about making decisions). The Frequentist school of thought is what is taught to most entry-level statisticians, because it seems to work well for a whole slew of things and the things it doesn’t work well for, well, what you don’t know can’t hurt you, right?
Frequentists treat their estimands as unchanging constants. Typically Frequentist methods seek to produce an interval that can, with a high level of confidence, be expected to contain a parameter of interest. So if a Frequentist were attempting to estimate the average length of a wallaby ear, they would measure a bunch of wallaby ears, assume that there is one true average wallaby ear length and use information contained in their sample of wallaby ear lengths along with a distributional assumption about all wallaby ear lengths to infer an interval that should contain the true mean wallaby ear length 95% of the time. This is a bit counterintuitive. The natural inclination is to believe that there is a 95% chance that the true mean is within the interval produced via this process. However, this isn’t how it works. A Frequentist will tell you that the mean wallaby ear length is what it is with a probability of one and is what it is not with probability of zero. They might further explain that their method can generate an interval containing the one true mean wallaby ear length with an any ol’ arbitrary level of certainty, and that for some mysterious reason, 95% typically defines the line between “certain enough” and “we’ll never get this published”.
A Bayesian would treat the mean wallaby ear length itself as a random variable, make a distributional assumption about mean wallaby ear length, then measure a bunch of wallaby ears and combine the information from this sample with the information from the prior assumed mean wallaby ear distribution to produce a posterior distribution for mean wallaby ear length. The result of the Bayesian method is a probability distribution for mean wallaby ear length which a wallaby earmuff maker might then use to properly fill out their inventory for winter in the Southern Hemisphere (though they might have better luck using the raw data as a guide instead of the distribution of the mean). If you’ve noticed the simplicity in interpreting the results of the Bayesian method, you’re not alone. It is a lot nicer to be able to claim knowledge of the distribution of mean wallaby ear lengths. Plus, is it really really realistic for Frequentists to assume that there is one constant mean wallaby ear length? Aren’t wallaby populations constantly changing with the natural cycle of birth, growth and death? Would this constant change then imply a constantly changing mean wallaby ear length? Isn’t treating estimands as random more accurate?
A Frequentist might respond to this question by pointing out that there isn’t necessarily an objective way to specify the distribution of the mean wallaby ear length, and that attempting to do so reduces the credibility of the analysis. Better to be consistently slightly wrong than inconsistently slightly wrong. And then the Bayesian would say “Pish posh to you!” and the Frequentist would respond “Well! Aren’t you a devilish little scoundrel!” and then they’d retire to their offices, where each one would write long, passionate letters to the editors of various statistical journals containing elaborate arguments about the foolishness of the other.
Bayesian and Frequentist methods both have advantages and drawbacks. I’m not going to pretend to be enough of an expert in the field to elucidate them with authority.
I’m taking a Bayesian Analysis course this semester. It involves a lot of coding, which is nice. For the current homework assignment, I have to write an algorithm which implements a Gibbs Sampler on a set of coal mine accident data. The story behind the data is that coal mines are dangerous and that sometimes governments pass laws to make coal mines less dangerous. If these laws work, accident counts drop, and people who would have otherwise died in coal mine accidents live long enough to die of black lung. Pip pip cheerio! These data have a changepoint, which just means that at some point, a change occurred in mechanism driving the phenomena being studied. If you want to read about these data and a Bayesian method for finding the changepoint, get your JSTOR login info out and click here (or google “Hierarchical Bayesian Analysis of Changepoint Problems” and find a wayward pdf).
One reason Bayesian methods are useful is that they leave you with a probability distribution, AKA the posterior. If you can specify your models in the right way, you can produce a closed form expression for your posterior. This is how Bayesian analysis had to be done before computers were cheap and easy. Frequently, it isn’t possible produce a closed form for your posterior and so various sampling techniques are used in conjunction with your data and model to simulate the posterior. One of these is an algorithm known as the Gibbs Sampler. Unfortunately, the Gibbs Sampler has nothing to do with the Gibb brothers of BeeGees fame. But seeing as how I just wrote a Gibbs Sampler for my homework, I thought it might be informative to use it to analyze their yearly output of singles as a set of data with a changepoint.
First, a plot of the number of BeeGees singles from 1963-2001 (via Wikipedia)
Seems like they had fairly high output until about 1980, then less output until, after the death of one of the Gees in 2003, they retired the BeeGee name. Running these data through a simplification (I specified the changepoint a priori) of the model in the Carlin paper I linked to above yields the following plot.
The model assumes that the number of singles released per year is a Poisson-distributed random variable. The curves in the above plot correspond to the number of expected singles in a given year. The lambda curve, which is centered around 1.0, corresponds to the expected number of singles per year following 1980. The theta curve, centered around 3.1, corresponds to the expected number of of singles per year between 1963 and 1980. There is essentially zero overlap between these curves, indicating a pretty clear change in BeeGees single productivity in the year 1980. Wikipedia blames a backlash against disco. Disco Stu was not amused.
Bayesian analysis is flexible too. For instance, this model can be refit with the changepoint as a parameter, resulting in the following posterior.
The posterior distribution for the changepoint has a mode of 18, which corresponds to 1981. According to this analysis, the year which most likely divides two levels of single production for the BeeGees is 1981. This second analysis used JAGS instead of my Gibbs Sampler code because that’s how I did the homework and I’m too much of a tired and lazy blogger to rewrite my sampler to incorporate the changepoint.