The Ptolemaic geocentric model, which combined circles and epicycles additively, predicted the movement of the planets with reasonable accuracy. Despite its inaccuracies, the model worked within certain boundaries and had its uses.
Linear regression resembles the geocentric model of the solar system in several ways:
Simple Yet Powerful: uses an additive combination of measurements (the predictor variables) to predict an outcome. It’s a straightforward tool that can describe a wide variety of phenomena.
Descriptive but Limited: can accurately describe relationships between variables but may not capture the full complexity of the real-world processes if taken too literally.
Useful Approximation: helps in making predictions and understanding data, even though it might not always capture every nuance.
Imagine you’re a farmer trying to predict the yield of your crop based on the amount of rainfall and fertilizer used. You gather data over several years and use linear regression to create a model:
\[ \text{Crop Yield} = \beta_0 + \beta_1 \times \text{Rainfall} + \beta_2 \times \text{Fertilizer} \]
Epicycles in Farming: Just as epicycles stacked circles on circles to predict planetary positions, you add together the contributions of rainfall and fertilizer (the variables) to predict crop yield.
Approximation: Your linear regression model gives a good approximation of crop yield based on the data you have, similar to how the geocentric model approximated planetary positions.
Limits: If you tried to predict crop yield in a completely different climate with different soil, the model might fail, just as the geocentric model would fail to plot the course of a Mars probe.
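The additive crop-yield formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic, the "true" coefficients are made up, and `fit_linear` is a hypothetical helper that solves the ordinary least-squares normal equations with the standard library rather than any particular package.

```python
import random

def fit_linear(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    X = [[1.0, *r] for r in rows]  # prepend a column of ones for the intercept
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic data only: these coefficients are invented for illustration.
TRUE_BETA = (1.0, 0.004, 0.01)  # intercept, rainfall slope, fertilizer slope
rng = random.Random(1)
rows, yields = [], []
for _ in range(200):
    rain = rng.uniform(300, 900)   # hypothetical rainfall values
    fert = rng.uniform(0, 200)     # hypothetical fertilizer amounts
    noise = rng.gauss(0, 0.2)      # everything the two predictors do not explain
    rows.append((rain, fert))
    yields.append(TRUE_BETA[0] + TRUE_BETA[1] * rain + TRUE_BETA[2] * fert + noise)

beta = fit_linear(rows, yields)
print("estimated beta_0, beta_1, beta_2:", [round(b, 4) for b in beta])
```

The estimates land close to the made-up coefficients because the simulated process really is additive. Real crop data carry no such guarantee, which is the point of the "Limits" item.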
The chapter introduces linear regression as a Bayesian procedure, which means using probability distributions to describe uncertainty. Instead of just finding the best-fitting line, Bayesian regression provides a distribution of possible lines, giving a more nuanced understanding of the uncertainty in predictions.
Call:
lm(formula = scores ~ hours, data = data)

Residuals:
      1       2       3       4       5
-3.5366  1.0061  5.0915 -0.8232 -1.7378

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  47.6220     3.8381  12.408  0.00113 **
hours         5.4573     0.6621   8.242  0.00374 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.792 on 3 degrees of freedom
Multiple R-squared: 0.9577, Adjusted R-squared: 0.9436
F-statistic: 67.93 on 1 and 3 DF, p-value: 0.00374

                2.5 %    97.5 %
(Intercept) 35.407486 59.836417
hours        3.350122  7.564513
Frequentist Plot:
Bayesian Plot:
The familiar “bell” curve of the Gaussian distribution is emerging from the randomness. Where does it come from? Why is it so common?
Normality by addition
When you add together many random values from the same distribution, the result tends to form a bell curve, or normal distribution. Here’s an easy way to understand why this happens:
Visual Example
Imagine rolling a die:
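A minimal Python sketch of the dice idea, assuming fair six-sided dice and a fixed random seed. The helper names `dice_sum_counts` and `text_histogram` are illustrative and do not come from the chapter. One die gives a flat histogram, two dice give a triangle, and the sum of ten dice already looks bell-shaped.

```python
import random
from collections import Counter

def dice_sum_counts(n_dice, n_trials=10_000, seed=0):
    """Roll n_dice fair dice n_trials times and count how often each total occurs."""
    rng = random.Random(seed)
    totals = (sum(rng.randint(1, 6) for _ in range(n_dice)) for _ in range(n_trials))
    return Counter(totals)

def text_histogram(counts, width=40):
    """Render the counts as rows of '#' characters, scaled to the tallest bar."""
    peak = max(counts.values())
    lines = []
    for total in sorted(counts):
        bar = "#" * round(width * counts[total] / peak)
        lines.append(f"{total:3d} {bar}")
    return "\n".join(lines)

for n in (1, 2, 10):
    print(f"Sum of {n} dice")
    print(text_histogram(dice_sum_counts(n)))
    print()
```

Each extra die adds one more small random contribution, and the totals near the middle can be reached in many more ways than the extremes. That counting argument is what produces the bell shape.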
In the context of statistical modeling, the Gaussian (or normal) distribution is often used as a foundation for building hypotheses. The justifications for this choice fall into two broad categories: ontological and epistemological.
1. Ontological Justification
The ontological reason for using the Gaussian distribution is that it appears frequently in nature. While we might never encounter a perfect Gaussian distribution, we often observe patterns that approximate it across various domains and scales. Some examples include:
The underlying reason for this prevalence is that many natural processes involve the addition of small, random fluctuations. When these fluctuations are added together repeatedly, the result tends to be a Gaussian distribution. This is because the process of summing these fluctuations sheds all detailed information about the individual contributions, leaving only the mean and spread (variance).
However, it’s important to note that the Gaussian distribution is not the only pattern in nature. Other important distributions, such as the exponential, gamma, and Poisson, also arise from natural processes. Like them, the Gaussian belongs to the broader exponential family of distributions, whose members recur throughout the natural world.
2. Epistemological Justification
The epistemological justification for using the Gaussian distribution is based on the concept of maximum entropy and information theory. When we only know or are willing to assume the mean and variance of a distribution, the Gaussian distribution is the most reasonable choice. This is because:
Practical Implications
Using the Gaussian distribution as a starting point for building models doesn’t mean we are committing to it as the only or best model. It is a practical tool that helps us begin modeling continuous measurements. As we gather more information or develop specific knowledge about the data, we can refine our models and possibly choose other distributions that better fit our data’s characteristics.
In summary, the Gaussian distribution is a powerful tool in statistical modeling due to its natural occurrence in many processes and its role as a default assumption when we have limited information. It provides a solid foundation for building and refining our understanding of data.
Simple Explanation with Examples
Main Idea
When we don’t have much information about a set of data, but we do know its average (mean) and how spread out the values are (variance), the Gaussian distribution is the most neutral choice. It’s like saying, “Given what little we know, this is the safest bet.”
Example 1: Heights of Students
Imagine you’re a school principal, and you want to understand the heights of students in your school. You don’t have detailed data about every student’s height, but you do know the average height (mean) and the general variability in heights (variance).
Known Information:
Given just this information, you choose to assume that the heights follow a Gaussian distribution because it is the least biased choice. It doesn’t make any extra assumptions about the specific shape of the data beyond what you know.
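One way to make "least biased" concrete is to compare differential entropies. The sketch below uses the standard closed-form entropy formulas for three distributions and gives each the same standard deviation. The choice of the uniform and Laplace distributions as comparison cases is an assumption made for illustration, not something the chapter specifies.

```python
import math

def gaussian_entropy(sd):
    """Differential entropy (in nats) of a Normal distribution with this sd."""
    return 0.5 * math.log(2 * math.pi * math.e * sd ** 2)

def uniform_entropy(sd):
    """Entropy of a Uniform distribution whose width is chosen to give this sd."""
    width = sd * math.sqrt(12)  # variance of Uniform(a, b) is (b - a)^2 / 12
    return math.log(width)

def laplace_entropy(sd):
    """Entropy of a Laplace distribution whose scale is chosen to give this sd."""
    scale = sd / math.sqrt(2)   # variance of Laplace(scale) is 2 * scale^2
    return 1 + math.log(2 * scale)

sd = 1.0
for name, entropy in [
    ("Gaussian", gaussian_entropy(sd)),  # about 1.42 nats
    ("Laplace", laplace_entropy(sd)),    # about 1.35 nats
    ("Uniform", uniform_entropy(sd)),    # about 1.24 nats
]:
    print(f"{name:8s} entropy = {entropy:.4f} nats")
```

With the variance held fixed, the Gaussian has the largest entropy of the three, so it builds in the fewest assumptions beyond the mean and the spread.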
Ontological Justification
Epistemological Justification
Observable Variable (Data):
- Heights of individuals (let’s denote it as \(y\)).
Unobservable Variables (Parameters):
- Mean height (\(\mu\)).
- Standard deviation of heights (\(\sigma\)).
Heights (\(y\)): The heights are distributed according to a normal (Gaussian) distribution with mean \(\mu\) and standard deviation \(\sigma\).
\[ y_i \sim \text{Normal}(\mu, \sigma) \]
Mean Height (\(\mu\)): This parameter can be modeled with a prior distribution if we have some prior knowledge. Let’s assume a normal prior with mean 178 cm and standard deviation 20 cm.
\[ \mu \sim \text{Normal}(178, 20) \]
Standard Deviation (\(\sigma\)): This parameter can also be modeled with a prior distribution. Let’s assume a uniform prior from 0 to 50 cm, indicating we believe the standard deviation could reasonably be anywhere in this range.
\[ \sigma \sim \text{Uniform}(0, 50) \]
Combining these variables and their probability distributions, we define a joint generative model. This model allows us to simulate hypothetical observations and analyze real data.
Specify priors:
\[ \mu \sim \text{Normal}(178, 20) \]
\[ \sigma \sim \text{Uniform}(0, 50) \]
Specify likelihood:
\[ y_i \sim \text{Normal}(\mu, \sigma) \]
This model describes how the observed data (heights) are generated from the underlying parameters (mean and standard deviation).
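Because the model is generative, it can produce hypothetical heights before any data are seen. The following is a rough prior predictive simulation in Python using the priors stated above. The function name, the number of draws, and the seed are arbitrary choices made for this sketch.

```python
import random
import statistics

def simulate_prior_predictive(n_draws=10_000, seed=0):
    """Draw (mu, sigma) from the priors, then one height from the likelihood."""
    rng = random.Random(seed)
    heights = []
    for _ in range(n_draws):
        mu = rng.gauss(178, 20)       # mu ~ Normal(178, 20)
        sigma = rng.uniform(0, 50)    # sigma ~ Uniform(0, 50)
        heights.append(rng.gauss(mu, sigma))  # y ~ Normal(mu, sigma)
    return heights

heights = simulate_prior_predictive()
print("mean of simulated heights:", round(statistics.mean(heights), 1))
print("sd of simulated heights:  ", round(statistics.stdev(heights), 1))
print("share below 0 cm:         ", sum(h < 0 for h in heights) / len(heights))
```

Looking at the simulated heights is a quick check on the priors. If a noticeable share of them is negative or implausibly tall, the priors imply more than we actually believe and may deserve tightening.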
To compute the posterior distribution, we apply Bayes’ theorem:
\[ P(\mu, \sigma \mid y) \propto P(y \mid \mu, \sigma) P(\mu) P(\sigma) \]
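where \(P(\mu, \sigma \mid y)\) is the posterior, \(P(y \mid \mu, \sigma)\) is the likelihood, and \(P(\mu)\) and \(P(\sigma)\) are the priors.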
Prior Choices
The chapter investigates two ways of computing the posterior: grid approximation and quadratic approximation.
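Grid approximation can be sketched directly from Bayes’ theorem: evaluate likelihood times prior on a grid of \((\mu, \sigma)\) values and normalize. The Python below is an illustration under assumptions, not the chapter’s code. The height data are simulated, the grid ranges and resolution are arbitrary, and `grid_posterior` and `linspace` are hypothetical helpers.

```python
import math
import random

def log_normal_pdf(x, mean, sd):
    """Log density of Normal(mean, sd) evaluated at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def linspace(start, stop, n):
    """n evenly spaced values from start to stop, inclusive."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]

def grid_posterior(data, mu_grid, sigma_grid):
    """Posterior probability of each (mu, sigma) grid point, normalized to sum to 1."""
    log_post = {}
    for mu in mu_grid:
        log_prior_mu = log_normal_pdf(mu, 178, 20)  # mu ~ Normal(178, 20)
        for sigma in sigma_grid:
            if not 0 < sigma < 50:                  # outside the Uniform(0, 50) support
                continue
            # Inside its support the uniform prior is constant, so it drops out.
            log_lik = sum(log_normal_pdf(y, mu, sigma) for y in data)
            log_post[(mu, sigma)] = log_lik + log_prior_mu
    peak = max(log_post.values())                   # subtract the max to avoid underflow
    weights = {k: math.exp(v - peak) for k, v in log_post.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Simulated heights stand in for real measurements here.
rng = random.Random(42)
data = [rng.gauss(155, 8) for _ in range(50)]

post = grid_posterior(data, linspace(145, 165, 60), linspace(4, 14, 60))
mu_mean = sum(mu * p for (mu, _), p in post.items())
sigma_mean = sum(sigma * p for (_, sigma), p in post.items())
print("posterior mean of mu:   ", round(mu_mean, 2))
print("posterior mean of sigma:", round(sigma_mean, 2))
```

Working on the log scale and subtracting the largest value before exponentiating keeps the product of fifty small densities from underflowing to zero. The cost of the grid grows quickly with the number of parameters, which is one reason to consider alternatives such as quadratic approximation.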
1. Normal Distribution Justification:
2. Priors and Their Influence:
3. Visualizing Posterior Distributions:
4. Computational Challenges: