Statistical Inference and Regression

Celia Evans

8/2/2021

Quick Review of Statistical Inference

Population and Sample

Confidence Interval

Student t-distribution compared to the normal

Student t-Distribution vs Normal

Confidence intervals with t-distributions

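n = 100
x = rnorm(n,1,2) # sample of size n from a normal with mean 1 and sd 2

se = sd(x)/sqrt(n) # estimated standard error of the sample mean
mean(x) + c(-1,1)*se*qt(0.975,n-1) # t CI with n-1 degrees of freedom
## [1] 0.6224048 1.3964536
mean(x) + c(-1,1)*se*qnorm(0.975) # normal CI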
## [1] 0.6271354 1.3917230

Even with n = 100, the t-based CI is slightly wider than the normal-based one. When the standard deviation is estimated from the sample, use the t interval: it is the more conservative choice.
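To see how the gap between the two intervals depends on sample size, the following sketch compares the 97.5% critical values of the t and normal distributions. The sample sizes in `n_values` are arbitrary choices for illustration, not values taken from the example above.

```r
# Sample sizes chosen for illustration only (an assumption, not from the example).
n_values <- c(5, 10, 30, 100, 1000)

# 97.5% critical values: t with n - 1 degrees of freedom vs the standard normal.
t_crit <- qt(0.975, df = n_values - 1)
z_crit <- qnorm(0.975)

# Ratio of t-CI width to normal-CI width for the same standard error.
comparison <- data.frame(n = n_values,
                         t_crit = t_crit,
                         width_ratio = t_crit / z_crit)
print(comparison)
```

The ratio is always above 1 and shrinks toward 1 as n grows, which is why the two intervals nearly coincide at n = 100.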

Formal Hypothesis Testing Framework

Suppose we have estimated the slope of a regression line with a single predictor variable. The true slope is the parameter \(\beta_1\), and the sample slope is given by \(\hat{\beta_1}\).

\(\mathbf{H_0}\): \(\beta_1 = 0\) There is no linear relationship.

\(\mathbf{H_A}\): \(\beta_1 \ne 0\) There is a linear relationship (two-sided test).

The idea is that we reject \(H_0\) only if we have strong evidence against it. In other words, we ask how likely the results we observed would be if the null hypothesis were true.
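A small simulation can make this idea concrete. The sketch below repeatedly generates data in which the null hypothesis is true (no linear relationship) and records how often the slope would still look significant at the 5% level. The seed, the number of simulations, and the sample size are illustrative assumptions.

```r
set.seed(1)      # arbitrary seed so the sketch is reproducible
n_sims <- 2000   # number of simulated data sets (illustrative choice)
n_obs  <- 50     # observations per data set (illustrative choice)

# For each simulation, generate x and y with NO linear relationship
# (true slope = 0), fit a simple regression, and keep the slope's p-value.
p_values <- replicate(n_sims, {
  x <- rnorm(n_obs)
  y <- rnorm(n_obs)
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
})

# Fraction of simulations in which H0 would be (wrongly) rejected at the 5% level.
rejection_rate <- mean(p_values < 0.05)
print(rejection_rate)
```

When the null hypothesis is true, the rejection rate should be close to 0.05. That is what a 5% significance level means: evidence this strong arises by chance only about one time in twenty.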

Example: Galton data

The following output gives the coefficient statistics for a simple linear regression of child on parent in the galton data set.

##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 23.9415302 2.81087834  8.517455 6.536845e-17
## parent       0.6462906 0.04113588 15.711115 1.732509e-49
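The code that produced this table is not shown. A call along the following lines would yield a coefficient table of this form. It assumes the data are the `galton` data frame shipped with the UsingR package, which is a guess about the source rather than something stated here; the guard simply skips the fit if that package is not installed.

```r
# Assumption: the data come from the `galton` data frame in the UsingR package.
if (requireNamespace("UsingR", quietly = TRUE)) {
  data("galton", package = "UsingR")

  # Simple linear regression of child height on parent height.
  fit <- lm(child ~ parent, data = galton)

  # Matrix with columns Estimate, Std. Error, t value, Pr(>|t|).
  coef_table <- summary(fit)$coefficients
  print(coef_table)
}
```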

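Usually we don’t pay too much attention to the intercept, unless it has some intrinsic meaning. Here it is the predicted child height when the parent height is 0 inches, far outside the range of the data, and there are very few adults who are 23 inches tall!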

However, the coefficient of parent, \(\hat{\beta_1}\), says that a child’s height increases by only about 0.65 inches for every one-inch increase in the parent’s height. Can we conclude that this is a statistically significant result? Let’s look at the table. The column labeled “Pr(>|t|)” gives the probability, assuming the null hypothesis is true, of observing a t value at least as large in absolute value as the one we observed. For parent it is on the order of \(10^{-49}\), in other words effectively zero.

So we would reject the null hypothesis that there is no linear relationship between the heights. “Pr(>|t|)” is frequently called the \(p\)-value. Because it is a two-sided probability, at the 5% significance level we would reject the null hypothesis for any \(p\)-value below 0.05 (that is, 2.5% in each tail).
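The t value and \(p\)-value for parent can be recomputed by hand from the estimate and standard error in the table. The sketch below assumes 928 observations, the size of the `galton` data frame in the UsingR package, so the residual degrees of freedom are 926. That count is not stated in these notes and should be checked against the data actually used.

```r
# Values copied from the parent row of the coefficient table.
estimate  <- 0.6462906
std_error <- 0.04113588

# Assumption: 928 observations, so df = 928 - 2 (two estimated coefficients).
df_resid <- 928 - 2

# t statistic for H0: beta_1 = 0, and its two-sided p-value.
t_value <- (estimate - 0) / std_error
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)
print(c(t_value = t_value, p_value = p_value))

# Decision at the 5% significance level.
reject_h0 <- p_value < 0.05
print(reject_h0)
```

The computed t value matches the 15.71 reported in the table, and the \(p\)-value is far below 0.05, so the null hypothesis is rejected.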