Problem 4

library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   1.0.4
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## ── Conflicts ───────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(MASS)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
# Load in the data
data("forbes")
# Learn about the data
?forbes
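The snippet below is a quick check of the structure that the help page describes. It loads the data with `data("forbes", package = "MASS")` so that it runs on its own, without the earlier `library()` calls.

```r
# Load the Forbes boiling-point data that ships with MASS
data("forbes", package = "MASS")

# Expect two numeric columns: bp (boiling point, degrees Fahrenheit)
# and pres (barometric pressure, inches of mercury)
str(forbes)
head(forbes)
```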

a) Identify the explanatory variable and the response variable for this study.

The boiling point of water (bp, in degrees Fahrenheit) is the response variable, and the barometric pressure (pres, in inches of mercury) is the explanatory variable.

b) Use R to create a scatter plot of the two variables. Examine the scatter plot and verbally describe the overall relationship (linear or nonlinear, positive or negative).

# Explanatory variable (pres) on the x-axis, response (bp) on the y-axis
plot(bp ~ pres, data = forbes)

The relationship is linear and positive: boiling point rises as barometric pressure rises. There are no obvious outliers, and the points lie close to a straight line, so the relationship appears strong.
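The correlation coefficient gives a number to "strong, positive, linear". This is a minimal sketch that assumes the same MASS data. The name r is only illustrative.

```r
data("forbes", package = "MASS")

# Pearson correlation between pressure (x) and boiling point (y);
# a value close to +1 indicates a strong, positive, linear association
r <- cor(forbes$pres, forbes$bp)
r
```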

c) Create a simple linear model and add the least squares regression line to the scatter plot. What is the equation of the line?

mod <- lm(bp ~ pres, data = forbes)
summary(mod)
## 
## Call:
## lm(formula = bp ~ pres, data = forbes)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.22687 -0.22178  0.07723  0.19687  0.51001 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 155.29648    0.92734  167.47   <2e-16 ***
## pres          1.90178    0.03676   51.74   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.444 on 15 degrees of freedom
## Multiple R-squared:  0.9944, Adjusted R-squared:  0.9941 
## F-statistic:  2677 on 1 and 15 DF,  p-value: < 2.2e-16
# Scatter plot with the fitted least squares line (dashed, blue)
plot(bp ~ pres, data = forbes)
abline(mod, lty = 2, col = "blue")

The equation of the least squares line is \[\widehat{\text{bp}} = 155.29648 + 1.90178\,\text{pres}\]
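The same coefficients can be recovered from the closed-form least squares formulas \(b_1 = S_{xy}/S_{xx}\) and \(b_0 = \bar{y} - b_1\bar{x}\). The sketch below computes them by hand and compares the result with `coef(mod)`. The names x, y, b0 and b1 are illustrative only.

```r
data("forbes", package = "MASS")
x <- forbes$pres   # explanatory variable
y <- forbes$bp     # response variable

# Least squares estimates: slope = Sxy / Sxx, intercept = ybar - slope * xbar
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

mod <- lm(bp ~ pres, data = forbes)
c(b0 = b0, b1 = b1)
coef(mod)   # should match the hand-computed values
```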

d) Interpret the slope in the context of these data. Provide a five part conclusion for the hypothesis test for slope in the model summary output in R.

The slope of 1.90178 means that each additional inch of mercury of barometric pressure is associated with an average increase of about 1.90 degrees Fahrenheit in the boiling point of water.
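You can check this interpretation by predicting the boiling point at two pressures one inch of mercury apart. The difference between the predictions equals the slope. The pressures 25 and 26 are arbitrary values chosen from inside the observed range.

```r
data("forbes", package = "MASS")
mod <- lm(bp ~ pres, data = forbes)

# Predicted boiling points at two pressures exactly one inch of mercury apart
new  <- data.frame(pres = c(25, 26))
pred <- predict(mod, newdata = new)
pred
diff(pred)   # equals the slope, about 1.90 degrees Fahrenheit
```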

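The hypotheses are \(H_0: \beta_1 = 0\) (no linear relationship) versus \(H_a: \beta_1 \neq 0\). The test statistic is t = 51.74 on 15 degrees of freedom, and the p-value is less than 2e-16. Because the p-value is far smaller than the conventional significance level \(\alpha = 0.05\), we reject the null hypothesis. There is very strong evidence of a linear relationship between barometric pressure and the boiling point of water: boiling point increases as pressure increases.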
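The t statistic and p-value in the pres row of the summary can be reproduced from the estimate, its standard error and the residual degrees of freedom. This is a minimal sketch. The variable names are illustrative.

```r
data("forbes", package = "MASS")
mod <- lm(bp ~ pres, data = forbes)

# Slope estimate and its standard error from the coefficient table
est <- coef(summary(mod))["pres", "Estimate"]
se  <- coef(summary(mod))["pres", "Std. Error"]

# Test statistic for H0: beta1 = 0 and its two-sided p-value
t_stat  <- (est - 0) / se
df_res  <- df.residual(mod)   # n - 2 = 15
p_value <- 2 * pt(abs(t_stat), df = df_res, lower.tail = FALSE)
c(t = t_stat, df = df_res, p = p_value)
```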

e) What percent of the variability is explained by the simple linear regression model?

Our R^2 value is 0.9944, which means that about 99.4% of the variability in boiling point is explained by the linear relationship with barometric pressure. Randomness and other unexplained factors account for less than 1% (about 0.56%) of the variability.
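R^2 can also be computed directly as \(1 - \text{SSE}/\text{SST}\). In simple linear regression it also equals the squared correlation between the two variables. The sketch below checks both identities against `summary(mod)$r.squared`. The names sse, sst and r2 are illustrative.

```r
data("forbes", package = "MASS")
mod <- lm(bp ~ pres, data = forbes)

# R^2 = 1 - SSE / SST: the share of the variability in bp explained by pres
sse <- sum(residuals(mod)^2)
sst <- sum((forbes$bp - mean(forbes$bp))^2)
r2  <- 1 - sse / sst
r2
summary(mod)$r.squared          # same value
cor(forbes$pres, forbes$bp)^2   # equals R^2 in simple regression
```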