Each observation \(y_i\) is assumed to come from
\[ y_i \sim \mathcal{N}(\mu_i, \sigma^2) \]
where the mean \(\mu_i\) is
\[ \mu_i = \beta_0 + \beta_1 \times x_i \]
The subscript \(i\) means that every observation \(y_i\) depends on a corresponding \(x_i\).
Given a linear relationship, we can calculate the predicted mean of the distribution over rt:
\[ \text{rt}_i = \text{intercept} + \text{slope} \times \text{age}_i \]
Use R as a calculator but think first:
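As a minimal sketch of this calculation, the snippet below plugs in the coefficient estimates reported in the model summary later in this section (intercept 314.260, slope 5.778). The age of 40 is a hypothetical value chosen only for illustration.

```r
# Coefficient estimates as reported in the summary(m_1) output of this section
intercept <- 314.260
slope <- 5.778

# Hypothetical age, chosen for illustration only
age <- 40

# Predicted mean rt for a person of that age
predicted_rt <- intercept + slope * age
predicted_rt
```

Changing age by one unit changes the predicted mean by exactly the slope; the intercept is the predicted mean at age 0.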
In the assumed model
\[ y_i \sim \mathcal{N}(\mu_i, \sigma^2) \]
where the mean \(\mu_i\) is
\[ \mu_i = \beta_0 + \beta_1 \times x_i \]
\(\mu_i\), and therefore \(y_i\), depends on \(i\), but \(\sigma^2\) does not: the variance is assumed to be the same for every observation.
Predicted rt with SDs across age groups (n = 5).
Predicted rt with SEs across age groups (n = 5).
\[ \text{SE}(\hat{y}_i) = \sqrt{\hat\sigma^2 \times h_i} \]
where \(\hat\sigma^2\) is the mean squared error (the residual variance, sigma(model)^2) and \(h_i\) is the leverage of observation \(i\), a penalty for sampling variability, which we get in R using hatvalues(model).
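The formula can be checked against R's own standard errors for the fitted values. The sketch below uses simulated data; the variable names and parameter values are assumptions made for illustration and are not the data used in this lecture.

```r
set.seed(1)

# Simulated data (hypothetical values, for illustration only)
n <- 50
age <- runif(n, min = 20, max = 80)
rt <- 300 + 6 * age + rnorm(n, sd = 150)
model <- lm(rt ~ age)

# Residual variance (mean squared error) and leverage of each observation
sigma2 <- sigma(model)^2
h <- hatvalues(model)

# Standard error of each fitted value, following the formula above
se_manual <- sqrt(sigma2 * h)

# The same quantity as returned by predict()
se_predict <- predict(model, se.fit = TRUE)$se.fit

# Should be TRUE: both routes give the same standard errors
all.equal(unname(se_manual), unname(se_predict))
```

Observations whose age is far from the mean age have larger leverage and therefore a larger standard error.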
Check exercise script leverage.R
We want to know if the outcome variable is related to a predictor. Therefore, we ask: is the change in the outcome that is due to the predictor (i.e. the slope) different from zero?
The t-value is the difference between the estimated slope coefficient and its hypothesised value (usually zero), divided by the standard error of the slope coefficient.
\[ \frac{\hat\beta - \beta_H}{\text{SE}} \sim \text{t}_\text{df} \]
The p-value is the probability of observing such a t-value, or anything more extreme, in a t-distribution with the given number of degrees of freedom, assuming the hypothesised value is true.
The confidence interval is \(\hat\beta \pm \tau \times \text{SE}\), where \(\tau\) is the critical value of a t-distribution with df degrees of freedom such that the interval from \(-\tau\) to \(\tau\) contains 95% of the area under the curve (for 95% CIs).
Report results as: est. = \(\dots\), 95% CI [\(\dots\), \(\dots\)], t = \(\dots\), p < / = \(\dots\)
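The quantities above can be computed by hand and compared with the output of summary() and confint(). This is a sketch on simulated data; the data, the variable names and the hypothesised slope of zero are assumptions for illustration.

```r
set.seed(1)

# Simulated data (hypothetical values, for illustration only)
n <- 50
age <- runif(n, min = 20, max = 80)
rt <- 300 + 6 * age + rnorm(n, sd = 150)
model <- lm(rt ~ age)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(model))
est <- coefs["age", "Estimate"]
se <- coefs["age", "Std. Error"]
df <- df.residual(model)

# t-value for the hypothesis that the slope is zero
t_value <- (est - 0) / se

# Two-sided p-value: area in both tails beyond |t|
p_value <- 2 * pt(abs(t_value), df = df, lower.tail = FALSE)

# 95% CI: tau is the 97.5% quantile of the t-distribution
tau <- qt(0.975, df = df)
ci <- c(est - tau * se, est + tau * se)

# Each comparison should be TRUE
all.equal(t_value, coefs["age", "t value"])
all.equal(p_value, coefs["age", "Pr(>|t|)"])
all.equal(ci, unname(confint(model)["age", ]))
```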
Check exercise script linearmodel.R
\[ \underbrace{\sum_{i=1}^n(y_i-\bar{y})^2}_\text{TSS} = \underbrace{\sum_{i=1}^n(\hat\mu_i-\bar{y})^2}_\text{ESS} + \underbrace{\sum_{i=1}^n(y_i-\hat\mu_i)^2}_\text{RSS} \]
\[ R^2 = \frac{\text{ESS}}{\text{TSS}} \]
where
\[ \text{ESS} = \sum_{i=1}^n(\hat\mu_i-\bar{y})^2 \]
and
\[ \text{TSS} = \text{ESS} + \text{RSS} \]
\[ \text{RSS} = \sum_{i=1}^n(y_i-\hat\mu_i)^2 \]
\(y_i-\hat\mu_i\) are the residuals of the model; see script 1_residuals.R.
Then complete the calculation for 2_rsquared.R.
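The decomposition can be verified numerically. The sketch below uses simulated data (hypothetical names and values, not the lecture data) and compares the hand-computed \(R^2\) with the value stored by summary().

```r
set.seed(1)

# Simulated data (hypothetical values, for illustration only)
n <- 50
age <- runif(n, min = 20, max = 80)
rt <- 300 + 6 * age + rnorm(n, sd = 150)
model <- lm(rt ~ age)

# Fitted means of the model
mu_hat <- fitted(model)

# Sums of squares as defined above
tss <- sum((rt - mean(rt))^2)      # total
ess <- sum((mu_hat - mean(rt))^2)  # explained
rss <- sum((rt - mu_hat)^2)        # residual

# TSS = ESS + RSS (should be TRUE)
all.equal(tss, ess + rss)

# R-squared by hand and from summary() (should be TRUE)
r2 <- ess / tss
all.equal(r2, summary(model)$r.squared)
```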
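\(R^2\) never decreases when predictors are added to a model, even if they are unrelated to the outcome. To overcome this spurious increase in \(R^2\), the following adjustment is applied.
\[ R^2_\text{Adj} = 1 - (1 - R^2) \cdot \underbrace{\frac{n-1}{n-K-1}}_\text{penalty} \]
where \(n\) is the number of observations and \(K\) is the number of predictors.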
Complete exercise script 3_adjrsquared.R
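As a worked check of the adjustment, take the values from the summary(m_1) output in this section: \(R^2 = 0.388\) and \(K = 1\). The sample size \(n = 266\) is inferred here from the 264 residual degrees of freedom plus the two estimated coefficients, so treat it as an assumption.

\[ R^2_\text{Adj} = 1 - (1 - 0.388) \cdot \frac{266-1}{266-1-1} = 1 - 0.612 \cdot \frac{265}{264} \approx 0.386 \]

This agrees with the adjusted R-squared reported in the output. With one predictor and a few hundred observations the penalty is close to 1, so the adjustment is small.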
# Fit the rt as a normal model with age as predictor.
m_1 <- lm(rt ~ age, data = blomkvist)
summary(m_1)
Call:
lm(formula = rt ~ age, data = blomkvist)

Residuals:
   Min     1Q Median     3Q    Max
-371.3  -93.8  -23.0   60.9  838.7

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  314.260     26.723    11.8   <2e-16 ***
age            5.778      0.447    12.9   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 149 on 264 degrees of freedom
Multiple R-squared:  0.388, Adjusted R-squared:  0.386
F-statistic:  167 on 1 and 264 DF,  p-value: <2e-16
Go through exercise script 4_ftest.R.
You will need this equation for the F value
\[ \text{F} = \underbrace{\frac{\text{RSS}_0 - \text{RSS}_1}{\text{RSS}_1}}_\text{effect size} \cdot \underbrace{\frac{\text{df}_1}{\text{df}_0 -\text{df}_1}}_\text{sample size} = \frac{(\text{RSS}_0 - \text{RSS}_1)/(\text{df}_0 - \text{df}_1)}{\text{RSS}_1/\text{df}_1} \]
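where the subscripts refer to the two models in the exercise script: 0 is the simpler model, 1 is the more complex model, and \(\text{df}\) are the residual degrees of freedom of each model.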
Complete this statement: “Model comparisons showed that age did / did not have a significant effect on the model fit (F(\(\text{df}_1\), \(\text{df}_2\)) = \(\dots\), p < \(\dots\)).”
In this statement, \(\text{df}_1 = K\) is the number of predictors in the more complex model and \(\text{df}_2 = n - K - 1\) is its residual degrees of freedom.
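The F equation can be checked against anova(). The following is a sketch on simulated data; the data and the model names m_0 and m_1 are assumptions for illustration and need not match the exercise script.

```r
set.seed(1)

# Simulated data (hypothetical values, for illustration only)
n <- 50
age <- runif(n, min = 20, max = 80)
rt <- 300 + 6 * age + rnorm(n, sd = 150)

# Simpler model (intercept only) and more complex model (with age)
m_0 <- lm(rt ~ 1)
m_1 <- lm(rt ~ age)

# Residual sums of squares and residual degrees of freedom
rss_0 <- sum(residuals(m_0)^2)
rss_1 <- sum(residuals(m_1)^2)
df_0 <- df.residual(m_0)
df_1 <- df.residual(m_1)

# F value following the equation above
f_value <- ((rss_0 - rss_1) / (df_0 - df_1)) / (rss_1 / df_1)

# p-value: upper tail of the F-distribution
p_value <- pf(f_value, df1 = df_0 - df_1, df2 = df_1, lower.tail = FALSE)

# Compare with R's model comparison (both should be TRUE)
comparison <- anova(m_0, m_1)
all.equal(f_value, comparison$F[2])
all.equal(p_value, comparison$`Pr(>F)`[2])
```

Here the numerator degrees of freedom are df_0 - df_1 = 1 (one predictor added) and the denominator degrees of freedom are df_1 = n - 2, the residual degrees of freedom of the more complex model.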
summary(model)
confint(model)