For this part of the exam you will need the forbes dataset in the MASS package: a data frame with 17 observations on the boiling point of water (bp, in degrees Fahrenheit) and barometric pressure (pres, in inches of mercury).
According to chem.purdue.edu: - A liquid boils at a temperature at which its vapor pressure is equal to the pressure of the gas above it. The lower the pressure of a gas above a liquid, the lower the temperature at which the liquid will boil.
# Import the package
library(MASS)
# Load in the data
data("forbes")
# Learn about the data
?forbes
str(forbes)
## 'data.frame': 17 obs. of 2 variables:
## $ bp : num 194 194 198 198 199 ...
## $ pres: num 20.8 20.8 22.4 22.7 23.1 ...
x <- forbes$pres
y <- forbes$bp
Since the boiling point depends on the pressure above the liquid, the explanatory variable is pressure (pres) and the response variable is boiling point (bp).
plot(x,y)
The scatter plot shows a strong, positive, linear relationship: as x increases, y also increases, and the points rise at a steady slope without the curvature that, say, exponentially related data would show.
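The visual impression of a strong, positive, linear relationship can be backed up with the sample correlation coefficient. The following is a minimal sketch on the same forbes data; the name r_xy is introduced here for illustration only.

```r
library(MASS)  # provides the forbes data frame

x <- forbes$pres  # barometric pressure (explanatory variable)
y <- forbes$bp    # boiling point (response variable)

# Pearson correlation: values close to +1 indicate a strong,
# positive linear association between x and y
r_xy <- cor(x, y)
r_xy
```

A value near 1 supports the reading of the plot, although the correlation alone cannot rule out mild curvature.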
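# From scratch: least-squares estimates for simple linear regression
x_bar <- mean(x)
y_bar <- mean(y)
# Sum of cross-products of the deviations from the means (Sxy)
sxy <- sum((x - x_bar) * (y - y_bar))
# Sum of squared deviations of x from its mean (Sxx)
sxx <- sum((x - x_bar)^2)
# Slope = Sxy / Sxx; the intercept makes the line pass through (x_bar, y_bar)
beta_1 <- sxy / sxx
beta_0 <- y_bar - (beta_1 * x_bar)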
beta_1
## [1] 1.901784
beta_0
## [1] 155.2965
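The from-scratch slope is the ratio of the sum of cross-products to the sum of squares of x, which is the same quantity as the sample covariance divided by the sample variance. A short sketch of that equivalence follows; slope_check and intercept_check are illustrative names, not part of the original answer.

```r
library(MASS)  # provides the forbes data frame

x <- forbes$pres
y <- forbes$bp

# cov() and var() both divide by (n - 1), so the divisors cancel
# and the ratio equals sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
slope_check <- cov(x, y) / var(x)

# The least-squares line always passes through (mean(x), mean(y))
intercept_check <- mean(y) - slope_check * mean(x)

c(intercept = intercept_check, slope = slope_check)
```

These should agree with the beta_0 and beta_1 values printed above.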
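# Verifying using lm(); x and y are the vectors defined above, so lm() finds them
# in the workspace and the data argument is not actually needed here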
mod <- lm(y~x, data = forbes)
mod$coefficients
## (Intercept) x
## 155.296483 1.901784
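The equation of the fitted line is: y = 155.296483 + 1.901784x, where x is the pressure in inches of mercury and y is the predicted boiling point in degrees Fahrenheit. Each additional inch of mercury raises the predicted boiling point by about 1.9 degrees.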
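The fitted equation can be used to predict the boiling point at a given pressure. The sketch below refits the model with the original column names so that predict() can be used. The pressure of 25 inches of mercury is a hypothetical input chosen for illustration, and fit, new_obs, by_hand and by_predict are names introduced here.

```r
library(MASS)  # provides the forbes data frame

# Same model as above, written with the data frame's own column names
fit <- lm(bp ~ pres, data = forbes)

# Hypothetical new pressure reading (inches of mercury)
new_obs <- data.frame(pres = 25)

# Prediction from the equation: intercept + slope * pressure
by_hand <- unname(coef(fit)[1] + coef(fit)[2] * 25)

# The same prediction through predict()
by_predict <- unname(predict(fit, newdata = new_obs))

c(by_hand = by_hand, by_predict = by_predict)
```

Both values should match. Predictions are only trustworthy for pressures within the range covered by the data.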
summary(mod)
##
## Call:
## lm(formula = y ~ x, data = forbes)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.22687 -0.22178 0.07723 0.19687 0.51001
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 155.29648 0.92734 167.47 <2e-16 ***
## x 1.90178 0.03676 51.74 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.444 on 15 degrees of freedom
## Multiple R-squared: 0.9944, Adjusted R-squared: 0.9941
## F-statistic: 2677 on 1 and 15 DF, p-value: < 2.2e-16
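The null hypothesis is that the slope is zero, i.e. that boiling point has no linear relationship with pressure. With a p-value of less than 2.2e-16, far below the 0.05 significance level, we reject the null hypothesis. There is compelling evidence that barometric pressure affects the boiling point of water.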
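The t value and p-value in the summary table can be reproduced by hand, which makes the test explicit: the statistic is the slope estimate divided by its standard error, compared against a t distribution with n - 2 degrees of freedom. This is a minimal sketch; fit, coefs, t_stat and p_value are illustrative names.

```r
library(MASS)  # provides the forbes data frame

fit <- lm(bp ~ pres, data = forbes)
coefs <- summary(fit)$coefficients

est <- coefs["pres", "Estimate"]    # slope estimate
se  <- coefs["pres", "Std. Error"]  # its standard error
df  <- df.residual(fit)             # n - 2 residual degrees of freedom

# Test statistic for H0: slope = 0
t_stat <- est / se

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df)

c(t = t_stat, df = df, p = p_value)
```

The p-value is so small that summary() prints it only as being below 2e-16.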
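# Store the ANOVA table under its own name so the anova() function is not masked
anova_tab <- anova(mod)
# Row 1 holds the regression sum of squares, row 2 the residual sum of squares
ssreg <- anova_tab$`Sum Sq`[1]
ssres <- anova_tab$`Sum Sq`[2]
sstot <- ssreg + ssres
# R-squared is the share of the total sum of squares explained by the regression
R2 <- ssreg / sstot
R2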
## [1] 0.9944282
About 99.44% of the variability in boiling point is explained by the simple linear regression on pressure, which matches the Multiple R-squared reported by summary(mod).
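As a cross-check, the same R-squared can be obtained from the residuals directly, and in simple linear regression it also equals the squared correlation between the two variables. The sketch below assumes nothing beyond the forbes data; fit, r2_manual, r2_summary and r2_cor are names introduced for illustration.

```r
library(MASS)  # provides the forbes data frame

fit <- lm(bp ~ pres, data = forbes)

# Residual and total sums of squares computed from their definitions
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((forbes$bp - mean(forbes$bp))^2)

r2_manual  <- 1 - ss_res / ss_tot            # 1 - SSres / SStot
r2_summary <- summary(fit)$r.squared         # value reported by summary()
r2_cor     <- cor(forbes$pres, forbes$bp)^2  # squared Pearson correlation

c(manual = r2_manual, summary = r2_summary, cor_squared = r2_cor)
```

All three values should agree up to floating-point error.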