Bayesian statistics: a concise introduction

1 Bayesian vs frequentist statistics


Bibliographic Details
Main Author: Kevin P. Murphy
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Published: 2007
Online Access:http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.120.2779
http://www.cs.ubc.ca/~murphyk/Teaching/CS340-Fall07/reading/bayesStat.pdf
Summary: In Bayesian statistics, probability is interpreted as representing the degree of belief in a proposition, such as “the mean of X is 0.44”, or “the polar ice cap will melt in 2020”, or “the polar ice cap would have melted in 2000 if we had not.”, etc. Thus it can be applied to reasoning about one-time events (ice cap melting) and counterfactual events (ice cap would have melted), as well as more “traditional” statistical questions, such as computing distributions over random variables.

Bayes rule provides the mechanism by which prior beliefs are converted into posterior beliefs when new data arrives. (Bayes rule is sometimes called the rule of inverse probability.) For example, to estimate a parameter θ from data D, one can write p(θ|D) ∝ p(θ)p(D|θ), where p(θ) is the prior and p(D|θ) is the likelihood. Decision theory can be used to decide how to convert beliefs into actions. For example, if we want to summarize our belief state with a single number (called a point estimate), we often use the posterior mean or posterior mode, depending on our loss function. There are various compelling arguments (see e.g., [Jay03]) that Bayesian statistics is the only consistent way to reason under uncertainty.

In frequentist statistics (also called classical statistics or orthodox statistics), probability is interpreted as representing long-run frequencies of repeatable events. Thus it cannot be used to reason about one-time events or counterfactual events. One can talk about the probability of the data having a certain value, p(D|θ) (this is the likelihood function), since one can imagine repeating the experiment and observing different data. But one cannot talk about the probability of the parameter itself, p(θ|D), since θ is treated as a fixed (if unknown) constant rather than the outcome of a repeatable experiment.
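The update p(θ|D) ∝ p(θ)p(D|θ) and the posterior mean/mode point estimates described above can be sketched with a small numerical example. The sketch below (illustrative numbers chosen here, not taken from the text) uses the conjugate Beta prior with a Binomial likelihood, where the posterior has a closed form:

```python
# Beta-Binomial conjugate update: a minimal sketch of p(theta|D) ∝ p(theta) p(D|theta).
# The prior Beta(a, b) and the observed counts below are hypothetical values
# chosen for illustration only.

a, b = 2.0, 2.0        # prior pseudo-counts: p(theta) = Beta(a, b)
heads, tails = 7, 3    # observed data D: 7 successes, 3 failures

# By conjugacy, the posterior is Beta(a + heads, b + tails);
# the normalizing constant in Bayes rule is handled analytically.
a_post, b_post = a + heads, b + tails

# Point estimates, which correspond to different loss functions:
post_mean = a_post / (a_post + b_post)             # minimizes squared-error loss
post_mode = (a_post - 1) / (a_post + b_post - 2)   # MAP estimate, minimizes 0-1 loss

print(post_mean)  # 9/14 ≈ 0.643
print(post_mode)  # 8/12 ≈ 0.667
```

Note how the two summaries disagree: which one to report is exactly the decision-theoretic choice of loss function mentioned in the summary.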