Creating and visualizing the model

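# Loading the packages used below
library(dplyr)    # provides the %>% pipe
library(ggplot2)  # provides ggplot() and the geoms

# Creating the model
data("airquality")
temp_model <- lm(Temp ~ Solar.R, data = airquality)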

# Visualizing the model
airquality %>% 
  ggplot(mapping = aes(x = Solar.R, y = Temp)) + 
  geom_point(color = 'darkgreen') + 
  geom_smooth(method = 'lm', se = FALSE, color = 'red') + 
  labs(title = 'Linear Model', 
       subtitle = 'temperature as function of solar radiation', 
       x = 'solar radiation in Ly', 
       y = 'temperature in F') + 
  theme(
    plot.title=element_text(hjust=0.5), 
    plot.subtitle=element_text(hjust=0.5))

# Model information
summary(temp_model)
## 
## Call:
## lm(formula = Temp ~ Solar.R, data = airquality)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -22.3787  -4.9572   0.8932   5.9111  18.4013 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 72.863012   1.693951  43.014  < 2e-16 ***
## Solar.R      0.028255   0.008205   3.444 0.000752 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.898 on 144 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:  0.07609,    Adjusted R-squared:  0.06967 
## F-statistic: 11.86 on 1 and 144 DF,  p-value: 0.0007518

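As we can see from the summary() call, the y-intercept is 72.8630118 and the slope is 0.0282546, meaning that each additional Langley of solar radiation is associated with an average temperature increase of about 0.028 °F. So, the final regression model is: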

\[\widehat{\text{Temp}}=0.028{\times}\text{Solar.R}+72.863\]
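As a quick check of the equation above, here is a minimal sketch that pulls the coefficients out of the model with coef() and compares a prediction computed by hand against predict(). The input of 200 Ly is a hypothetical value chosen only for illustration, and the names new_day, by_hand and by_predict are not part of the original analysis.

```r
# Refit the model from the text so this block runs on its own
data("airquality")
temp_model <- lm(Temp ~ Solar.R, data = airquality)

# Extract the fitted coefficients by name
b <- coef(temp_model)
intercept <- b[["(Intercept)"]]
slope <- b[["Solar.R"]]

# Hypothetical input: a day with 200 Ly of solar radiation (illustrative value)
new_day <- data.frame(Solar.R = 200)

# Prediction by hand, straight from the regression equation
by_hand <- intercept + slope * new_day$Solar.R

# The same prediction via predict()
by_predict <- unname(predict(temp_model, newdata = new_day))

print(c(by_hand = by_hand, by_predict = by_predict))
```

Both values should agree, at roughly 78.5 °F for this input, which is simply 72.863 + 0.028255 × 200.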

Residual analysis

If the residuals scatter randomly around 0 with no visible pattern and are approximately normally distributed, we can say that the assumptions behind the linear model are reasonable for the data. Let’s check this using two plots: the residuals plot and the quantile-quantile (Q-Q) plot.

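# The residuals plot (ggplot() replaces qplot(), which is deprecated since ggplot2 3.4.0)
ggplot(mapping = aes(x = fitted.values(temp_model), y = residuals(temp_model))) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE)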

# The Q-Q plot
qqnorm(resid(temp_model))
qqline(resid(temp_model))

The residuals plot looks free of any obvious pattern: the points are spread more or less evenly around the line. This suggests that a straight line is a reasonable description of the relationship, because the residuals scatter around 0 with no visible trend. Additionally, when we look at the Q-Q plot, we see that the points lie mostly on the line, and the shape and density of the two tails are similar. So, again, the residuals look approximately normally distributed, and we can say that the proposed linear model is consistent with the data.
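The visual checks can be complemented with a few numbers. The sketch below is an addition, not part of the original analysis: it recomputes the residuals by hand as observed minus fitted values, confirms that their mean is 0, and runs a Shapiro-Wilk normality test with shapiro.test() from base R. The names used, manual_resid and sw are assumptions made for this example.

```r
# Refit the model from the text so this block runs on its own
data("airquality")
temp_model <- lm(Temp ~ Solar.R, data = airquality)

# lm() dropped the 7 rows with missing Solar.R;
# model.frame() returns only the rows that were actually used
used <- model.frame(temp_model)

# A residual is the observed value minus the fitted value
manual_resid <- used$Temp - fitted(temp_model)

# Should be TRUE: the hand-computed residuals match residuals()
print(all.equal(unname(manual_resid), unname(residuals(temp_model))))

# With an intercept in the model, least squares forces the residual
# mean to 0 (up to floating-point error), so this value is tiny
print(mean(residuals(temp_model)))

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(residuals(temp_model))
print(sw)
```

Because the residual mean is 0 by construction, the residuals plot is useful mainly for spotting trends or changes in spread rather than for checking the mean. For the Shapiro-Wilk test, a small p-value would count as evidence against normality, so it is best read alongside the Q-Q plot rather than instead of it.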