Note on Assignments

Assignments are not stand-alone and are designed to be answered in conjunction with the lecture notes and case studies. You need to follow the R code taught in the course when completing the assignment. Alternative R code (and interpretation) not taught in the course, and extraneous R output (and interpretation) included in your answers, can lead to deductions in marks. Note: using AI frequently generates alternative code and interpretations that do not follow the course material.

1 Question 1

A researcher was interested in the relationship between the age of drivers and the maximum distance at which a newly designed sign is legible. Data was collected from a random sample of 30 drivers. The age of the driver and the maximum distance at which they could read the newly designed road sign were recorded.

The data is stored in the file road.csv and contains the following variables for each driver:

Variable Description
Distance the maximum distance at which the driver could read the sign (in metres).
Age the age of the driver (in years).

1.1 Question of interest/goal of the study

We are interested in whether the age of the driver affects the maximum distance at which a road sign is legible.

1.2 Read in and inspect the data:

Road.df=read.csv("road.csv",header=T)
plot(Distance~Age, main="Maximum legible distance versus Age",data=Road.df)

road.fit = lm(Distance ~ Age, data=Road.df)

1.3 Comment on the plots

There looks to be a negative relationship between age and maximum legible distance; as age increases, the maximum legible distance decreases.

1.4 Fit an appropriate linear model, including model checks and relevant output. DO NOT CHANGE THE FITTED MODEL

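library(s20x)   # course package that provides modelcheck()
Road.lm=lm(Distance~Age,data=Road.df)
modelcheck(Road.lm)   # diagnostic plots: residuals, normality, Cook's distance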

summary(Road.lm)
## 
## Call:
## lm(formula = Distance ~ Age, data = Road.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.876 -12.726   2.327  10.253  33.173 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 175.7648     7.1561  24.562  < 2e-16 ***
## Age          -0.9165     0.1294  -7.084 1.05e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.17 on 28 degrees of freedom
## Multiple R-squared:  0.6419, Adjusted R-squared:  0.6291 
## F-statistic: 50.18 on 1 and 28 DF,  p-value: 1.046e-07
confint(Road.lm)
##                  2.5 %      97.5 %
## (Intercept) 161.106186 190.4234214
## Age          -1.181517  -0.6514817
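As a hand check of the confint() output, the 95% confidence interval for the Age coefficient is the estimate plus or minus a t multiplier times its standard error. This sketch uses the rounded values printed by summary(Road.lm) and assumes the multiplier \(t_{28,\,0.975} \approx 2.0484\), the 97.5% quantile of a t distribution with 28 degrees of freedom:

```latex
\begin{aligned}
\hat{\beta}_1 \pm t_{28,\,0.975}\,\mathrm{se}(\hat{\beta}_1)
  &= -0.9165 \pm 2.0484 \times 0.1294 \\
  &= -0.9165 \pm 0.2651 \\
  &\approx (-1.1816,\ -0.6514)
\end{aligned}
```

This agrees with the confint() interval for Age, (-1.181517, -0.6514817), to within rounding.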

1.5 Create a scatter plot with the fitted line from your model superimposed over it.

# Scatter plot with the fitted line from Road.lm superimposed;
# xlim/ylim widen the axes to Age 0-80 and Distance 0-180
plot(Distance~Age, main="Maximum legible distance versus Age", data=Road.df, xlim=c(0,80), ylim=c(0,180))
abline(Road.lm, lty=2, col="green")
# Mark the fitted values at ages 20, 40, 60 and 80
prediction=predict(Road.lm, newdata=data.frame(Age=c(20,40,60,80)))
points(c(20,40,60,80), prediction, col="green", pch=19)
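The green points are fitted values from the estimated line, \(\widehat{Distance} = \hat{\beta}_0 + \hat{\beta}_1 \times Age\). Using the rounded estimates from the summary output (so the exact predict() values may differ in the later decimal places), they work out as below, in metres. Whether Age = 80 lies within the range of ages in road.csv is not shown here; if it does not, that value is an extrapolation.

```latex
\begin{aligned}
\widehat{Distance}(20) &= 175.7648 - 0.9165 \times 20 = 157.4348 \\
\widehat{Distance}(40) &= 175.7648 - 0.9165 \times 40 = 139.1048 \\
\widehat{Distance}(60) &= 175.7648 - 0.9165 \times 60 = 120.7748 \\
\widehat{Distance}(80) &= 175.7648 - 0.9165 \times 80 = 102.4448
\end{aligned}
```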

1.6 Method and Assumption Checks

As the initial plot of the data shows a linear relationship, we fitted a simple linear regression model to our data. The data come from a random sample of drivers, so we can assume the observations are independent. The residual plot shows random scatter about zero with constant variance, the residuals look like they come from a normal distribution, and there are no unduly influential points. All model assumptions appear to be satisfied. The effect of the driver’s age on maximum legible distance is statistically significant (p-value = 1.05e-07).
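The significance of Age can be traced to the t test of \(H_0: \beta_1 = 0\) in the summary output. With n = 30 drivers and two estimated coefficients there are 30 - 2 = 28 residual degrees of freedom, and with a single predictor the overall F statistic is the square of this t statistic. The values below use the rounded output, so they match the printed t value (-7.084) and F-statistic (50.18) only to within rounding:

```latex
\begin{aligned}
t &= \frac{\hat{\beta}_1 - 0}{\mathrm{se}(\hat{\beta}_1)} = \frac{-0.9165}{0.1294} \approx -7.08,
  \qquad \mathrm{df} = 30 - 2 = 28 \\
F &= t^2 \approx (-7.084)^2 \approx 50.18
\end{aligned}
```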

1.6.1 Complete the equation below:

Our model is:

\(Distance_i = \beta_0 + \beta_1 \times Age_i + \epsilon_i\), where \(\epsilon_i \overset{iid}{\sim} N(0,\sigma^2)\), with estimates \(\hat{\beta}_0 = 175.7648\) and \(\hat{\beta}_1 = -0.9165\).

1.6.2 Complete the statement

Our model explains 64.19% of the variation in the response variable (maximum legible distance).
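The two R-squared values in the summary output are linked. The multiple R-squared (0.6419) is the proportion of variation in Distance explained by the model, while the adjusted R-squared penalises it for the number of predictors p. With n = 30 drivers and p = 1 predictor, the reported adjusted value can be recovered from the multiple R-squared:

```latex
\begin{aligned}
R^2_{\text{adj}} &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \\
  &= 1 - (1 - 0.6419) \times \frac{29}{28} \\
  &\approx 1 - 0.3709 = 0.6291
\end{aligned}
```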

1.7 Executive Summary

We are interested in whether the age of the driver affects the maximum distance at which a road sign is legible.

We have very strong evidence (p-value = 1.05e-07) of a negative relationship between the age of a driver and the maximum distance at which a road sign is legible. For each additional year of age, the expected maximum legible distance decreases by an estimated 0.9165 m; with 95% confidence, this decrease is between 0.65 m and 1.18 m per year.