Each observation \(y_i\) is assumed to come from
\[ y_i \sim \mathcal{N}(\mu_i, \sigma^2) \]
where \(\mu_i\) is
\[ \mu_i = \beta_0 + \beta_1 \times x_i \]
The subscript \(i\) indicates that each observation \(y_i\) depends on its corresponding predictor value \(x_i\).
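A minimal sketch of this generative model in R. Each \(y_i\) is drawn from a normal distribution centred on its own \(\mu_i\), and all observations share the same \(\sigma\). The parameter values are assumptions for illustration: they are copied from the m_1 summary output (\(\beta_0 = 314.26\), \(\beta_1 = 5.778\), \(\sigma = 149\)). The predictor range of 18 to 90 is hypothetical. Refitting with lm() should approximately recover the parameters.

```r
# Hypothetical "true" parameters (borrowed from the m_1 output for illustration)
beta_0     <- 314.26
beta_1     <- 5.778
sigma_true <- 149

set.seed(1)
n <- 1000
x <- runif(n, min = 18, max = 90)   # hypothetical predictor values (e.g. age)

# Each observation gets its own mean mu_i, but all share the same sigma
mu <- beta_0 + beta_1 * x
y  <- rnorm(n, mean = mu, sd = sigma_true)

# Refit the model: the estimates should be close to the values used above
fit <- lm(y ~ x)
coef(fit)
sigma(fit)   # residual standard error, the estimate of sigma
```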
Assuming a linear relationship, we can calculate the predicted mean of the distribution of reaction times (rt) using the linear regression equation:
\[ \widehat{\text{rt}}_i = \text{intercept} + \text{slope} \times \text{age}_i \]
Use R as a calculator when necessary, but think first:
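As an illustration, the intercept and slope reported in the summary output for m_1 (314.260 and 5.778) can be plugged into the equation directly. The ages below are arbitrary example values.

```r
# Estimates copied from the m_1 summary output (rounded)
intercept <- 314.260
slope     <- 5.778

# Predicted mean rt for a few example ages (in years)
age    <- c(20, 40, 60)
rt_hat <- intercept + slope * age
rt_hat
# → 429.82 545.38 660.94
```

With the fitted model object, predict(m_1, newdata = data.frame(age = c(20, 40, 60))) should give the same values up to the rounding of the coefficients.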
In the assumed model
\[ y_i \sim \mathcal{N}(\mu_i, \sigma^2) \]
where \(\mu_i\) is
\[ \mu_i = \beta_0 + \beta_1 \times x_i \]
the values of \(\mu_i\) and \(x_i\) depend on \(i\), but the variance \(\sigma^2\) does not. The variance \(\sigma^2\) is constant across observations and, hence, so is the standard deviation \(\sigma\).
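To see what a constant \(\sigma\) implies, here is a sketch of the predicted mean \(\pm\) one SD at a few ages. It assumes the m_1 estimates and uses the residual standard error (149) as \(\sigma\). The mean shifts with age, but the band width stays the same. For a normal distribution, about 68% of observations are expected within \(\pm 1\) SD.

```r
intercept <- 314.260
slope     <- 5.778
sigma_hat <- 149      # residual standard error from the m_1 output

age <- seq(20, 80, by = 20)
mu  <- intercept + slope * age

# mu +/- one SD: the band is equally wide at every age
bands <- data.frame(age   = age,
                    mu    = mu,
                    lower = mu - sigma_hat,
                    upper = mu + sigma_hat)
bands

# Expected share of observations inside each band under normality
coverage <- pnorm(1) - pnorm(-1)
coverage
```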
Figure: Predicted rt with SDs across age (in years).
We want to know whether the outcome variable is related to a predictor. Therefore, we ask whether the change in the outcome per unit change in the predictor (aka the slope) is different from zero.
The t-value is the difference between the estimated slope coefficient and the hypothesized value, divided by the standard error of the slope coefficient.
\[ \frac{\hat\beta_1 - \beta_H}{\text{SE}_1} \sim \text{t}_\text{df} \]
\(\hat\beta_1\): slope coefficient (change in the outcome variable per unit change in the predictor)
\(\beta_H\): hypothesized change in the outcome variable; typically 0 when the outcome is hypothesized to be unrelated to the predictor.
\(\text{SE}_1\): standard error of the slope coefficient
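As a quick check of this formula, the t-values in the m_1 summary output can be reproduced from the reported estimates and standard errors, assuming \(\beta_H = 0\). The inputs are copied from that output, so they are already rounded.

```r
# Estimates and standard errors from the m_1 summary output
est    <- c(intercept = 314.260, age = 5.778)
se     <- c(intercept = 26.723,  age = 0.447)
beta_H <- 0   # null hypothesis: no relationship

# t = (estimate - hypothesized value) / standard error
t_value <- (est - beta_H) / se
round(t_value, 1)
# → intercept 11.8, age 12.9
```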
On the basis of this t-value (the difference between the slope coefficient and 0 in units of standard errors), we can calculate the probability of observing such a t-value, or anything more extreme, in a t-distribution with a given number of degrees of freedom.
That’s our p-value, the probability of our t-value or anything more extreme, if the null hypothesis is true.
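A sketch of the p-value calculation for the age slope, assuming the rounded estimate and standard error from the m_1 output and its 264 residual degrees of freedom. The factor 2 makes the test two-sided, covering both tails beyond \(|t|\).

```r
t_age    <- 5.778 / 0.447   # t-value for the age slope
df_resid <- 264             # residual degrees of freedom from the m_1 output

# Two-sided p-value: probability of |t| or more extreme under the null
p_value <- 2 * pt(abs(t_age), df = df_resid, lower.tail = FALSE)
p_value < 2e-16   # consistent with "<2e-16" in the summary output
```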
The confidence interval contains all hypothetical values of the slope that cannot be ruled out: \(\hat\beta_1 \pm \tau \times \text{SE}_1\), where \(\tau\) is the critical value of the t-distribution (with the model's residual degrees of freedom) such that the central 95% of the area under the curve lies between \(-\tau\) and \(\tau\) (for 95% CIs).
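A sketch of the 95% CI for the age slope, using the rounded estimate and standard error from the m_1 output. Because 2.5% of the area sits in each tail, \(\tau\) is the 97.5% quantile of the t-distribution.

```r
est_age  <- 5.778
se_age   <- 0.447
df_resid <- 264

# tau: 95% of the t-distribution lies between -tau and tau
tau <- qt(0.975, df = df_resid)
ci  <- est_age + c(-1, 1) * tau * se_age
round(ci, 2)
```

This gives roughly [4.90, 6.66]. Zero lies well outside the interval, consistent with the very small p-value.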
Report results as:
est. = \(\dots\), 95% CI [\(\dots\), \(\dots\)], t = \(\dots\), p < / = \(\dots\)
Complete exercise script 1_linearmodel.R
\(R^2\): the proportion of variance in the data explained by the model predictor(s).
\[ R^2 = \frac{\text{ESS}}{\text{TSS}} \]
where
\[ \text{ESS} = \sum_{i=1}^n(\hat\mu_i-\bar{y})^2 \] and
\[ \text{TSS} = \text{ESS} + \text{RSS} \]
Total sum of squares = explained sum of squares + residual sum of squares
\[ \text{RSS} = \sum_{i=1}^n(y_i-\hat\mu_i)^2 \]
\(y_i-\hat\mu_i\) are the residuals of the model; see script 2_residuals.R.
Then complete the calculation in 3_rsquared.R.
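A sketch of these sums of squares in R. It uses the built-in cars data as a stand-in, since the exercise data is not included here. It checks that TSS = ESS + RSS and that ESS/TSS matches the R-squared reported by summary().

```r
# Built-in 'cars' data used as a stand-in for the exercise data
m <- lm(dist ~ speed, data = cars)

y      <- cars$dist
mu_hat <- fitted(m)                 # predicted means

ESS <- sum((mu_hat - mean(y))^2)    # explained sum of squares
RSS <- sum((y - mu_hat)^2)          # residual sum of squares (y - mu_hat are the residuals)
TSS <- sum((y - mean(y))^2)         # total sum of squares

r2 <- ESS / TSS
c(r2 = r2, from_summary = summary(m)$r.squared)
```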
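Adding predictors never decreases \(R^2\), even when the new predictors are unrelated to the outcome. To correct for this spurious increase in \(R^2\), the following adjustment is applied:
\[ R^2_\text{Adj} = 1 - (1 - R^2) \cdot \underbrace{\frac{n-1}{n-K-1}}_\text{penalty} \]
where \(n\) is the number of observations and \(K\) is the number of predictors.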
Complete exercise script 4_adjrsquared.R
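As a worked check, the adjusted \(R^2\) of m_1 can be recovered from its reported multiple \(R^2\) (0.388). This assumes \(K = 1\) (age only). The residual degrees of freedom (264) equal \(n - K - 1\), which gives \(n = 266\).

```r
r2 <- 0.388           # multiple R-squared from the m_1 output
K  <- 1               # number of predictors (age)
n  <- 264 + K + 1     # residual df = n - K - 1, so n = 266

penalty <- (n - 1) / (n - K - 1)
r2_adj  <- 1 - (1 - r2) * penalty
round(r2_adj, 3)
# → 0.386
```

This matches the "Adjusted R-squared: 0.386" line of the summary. With one predictor and many observations, the penalty is close to 1, so the adjustment is small.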
# Fit rt as a normal model with age as predictor.
m_1 <- lm(rt ~ age, data = blomkvist)
summary(m_1)
Call:
lm(formula = rt ~ age, data = blomkvist)
Residuals:
Min 1Q Median 3Q Max
-371.3 -93.8 -23.0 60.9 838.7
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 314.260 26.723 11.8 <2e-16 ***
age 5.778 0.447 12.9 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 149 on 264 degrees of freedom
Multiple R-squared: 0.388, Adjusted R-squared: 0.386
F-statistic: 167 on 1 and 264 DF, p-value: <2e-16
Go through exercise script 5_ftest.R.
You will need this equation for the F-value:
\[ \text{F} = \underbrace{\frac{\text{RSS}_0 - \text{RSS}_1}{\text{RSS}_1}}_\text{effect size} \cdot \underbrace{\frac{\text{df}_1}{\text{df}_0 -\text{df}_1}}_\text{sample size} = \frac{(\text{RSS}_0 - \text{RSS}_1)/(\text{df}_0 - \text{df}_1)}{\text{RSS}_1/\text{df}_1} \]
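where subscripts 0 and 1 refer to the two models (hypotheses) in the exercise script: model 0 is the simpler (null) model and model 1 is the more complex model. \(\text{RSS}\) is each model's residual sum of squares and \(\text{df}\) its residual degrees of freedom, i.e. \(n - K - 1\) for a model with \(K\) predictors and an intercept.
Complete this statement: “Model comparisons showed that age did / did not have a significant effect on the model fit (F(\(\text{df}_0 - \text{df}_1\), \(\text{df}_1\)) = \(\dots\), p < \(\dots\)).”
where the first degrees of freedom, \(\text{df}_0 - \text{df}_1\), equal the number of predictors added in the more complex model, and the second, \(\text{df}_1\), equal \(n - K - 1\), with \(K\) the number of predictors in the more complex model.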
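A sketch of the F-test by hand, compared with R's anova(). It uses the built-in cars data as a stand-in for the exercise data. Model 0 is intercept-only and model 1 adds one predictor. With a single added predictor, F equals the squared t-value of that predictor.

```r
# Built-in 'cars' data as a stand-in: model 0 (intercept only) vs model 1 (speed)
m0 <- lm(dist ~ 1, data = cars)
m1 <- lm(dist ~ speed, data = cars)

RSS_0 <- sum(residuals(m0)^2)
RSS_1 <- sum(residuals(m1)^2)
df_0  <- df.residual(m0)   # n - 1
df_1  <- df.residual(m1)   # n - K - 1 with K = 1

# F = [(RSS_0 - RSS_1) / (df_0 - df_1)] / [RSS_1 / df_1]
F_value <- ((RSS_0 - RSS_1) / (df_0 - df_1)) / (RSS_1 / df_1)
p_value <- pf(F_value, df_0 - df_1, df_1, lower.tail = FALSE)
c(F = F_value, df_num = df_0 - df_1, df_den = df_1, p = p_value)

# The same comparison with the built-in function
anova(m0, m1)
```

For m_1, the same relationship holds: the age t-value of about 12.9 squared is about 167, the F-statistic reported on 1 and 264 DF.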
Summary for model coefficients and F-test
summary(model)
Confidence intervals for model coefficients
confint(model)
Also try
library(broom)
tidy(model, conf.int = TRUE)
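The confidence intervals from confint() follow the estimate \(\pm\,\tau \times \text{SE}\) formula for the slope given earlier. A sketch that checks this on the built-in cars data (a stand-in, since the exercise data is not included here):

```r
m <- lm(dist ~ speed, data = cars)   # stand-in model

ci_confint <- confint(m, level = 0.95)

# Same interval by hand: estimate +/- tau * SE
est <- coef(m)
se  <- sqrt(diag(vcov(m)))                 # standard errors of the coefficients
tau <- qt(0.975, df = df.residual(m))      # 97.5% quantile of the t-distribution
ci_manual <- cbind(est - tau * se, est + tau * se)

ci_confint
ci_manual
```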