library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1 ✓ purrr 0.3.3
## ✓ tibble 2.1.3 ✓ dplyr 1.0.4
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.4.0
## ── Conflicts ───────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
# Load in the data
data("forbes")
# Learn about the data
?forbes
The boiling point is the response variable and the barometric pressure is the explanatory variable.
Plot the data and verbally describe the overall relationship (linear or nonlinear, positive or negative).
plot(forbes)
The relationship is linear and positive. There are no obvious outliers, and the relationship appears to be strong.
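The visual impression of a strong positive linear relationship can be backed with a number. The following is a minimal sketch, assuming the `forbes` data set from MASS with its columns `pres` and `bp`; the variable name `r` is only illustrative.

```r
# Load the Forbes data without attaching MASS
# (attaching MASS masks dplyr::select, as the startup message shows).
data("forbes", package = "MASS")

# Pearson correlation between barometric pressure and boiling point.
# A value close to +1 indicates a strong positive linear relationship.
r <- cor(forbes$pres, forbes$bp)
r
```

A correlation near 1 supports the description above, although it does not by itself rule out mild curvature or a single unusual point.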
mod = lm(bp~pres, data = forbes)
summary(mod)
##
## Call:
## lm(formula = bp ~ pres, data = forbes)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.22687 -0.22178 0.07723 0.19687 0.51001
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 155.29648 0.92734 167.47 <2e-16 ***
## pres 1.90178 0.03676 51.74 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.444 on 15 degrees of freedom
## Multiple R-squared: 0.9944, Adjusted R-squared: 0.9941
## F-statistic: 2677 on 1 and 15 DF, p-value: < 2.2e-16
plot(bp~pres, data = forbes)
abline(coefficients(mod),lty=2, col="blue")
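The equation of the fitted line is \[\hat{y} = 155.29648 + 1.90178x\] where \(\hat{y}\) is the predicted boiling point (degrees Fahrenheit) and \(x\) is the barometric pressure (inches of mercury).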
The slope represents the expected increase in the boiling point of water, about 1.90 degrees Fahrenheit, for each one-unit (one inch of mercury) increase in barometric pressure.
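One way to check this reading of the slope is to compare predictions at two pressures that are one unit apart. This is a sketch only: the pressures 25 and 26 are hypothetical values chosen to lie inside the observed range, and `new_pres` and `pred` are illustrative names.

```r
data("forbes", package = "MASS")
mod <- lm(bp ~ pres, data = forbes)

# Two hypothetical pressures, one inch of mercury apart
new_pres <- data.frame(pres = c(25, 26))
pred <- predict(mod, newdata = new_pres)

# The difference between the two predictions equals the fitted slope
diff(pred)
coef(mod)["pres"]
```

Because the model is linear, the same difference would appear for any pair of pressures one unit apart.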
We reject the null hypothesis that the slope is zero. The p-value is less than 2.2e-16, which is far below any conventional significance level (for example, 0.05). There is strong evidence of a linear association between barometric pressure and boiling point.
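The p-value for the slope can be recomputed from the t statistic and the residual degrees of freedom. The sketch below assumes the same model as above; `coefs`, `t_stat`, and `p_val` are illustrative names. Note that `< 2.2e-16` in the printed summary is a display floor (machine epsilon), not the exact p-value.

```r
data("forbes", package = "MASS")
mod <- lm(bp ~ pres, data = forbes)

# Coefficient table: estimate, standard error, t value, p-value
coefs <- summary(mod)$coefficients
t_stat <- coefs["pres", "t value"]
df <- df.residual(mod)

# Two-sided p-value for H0: slope = 0
p_val <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
p_val

# Decision at a conventional 5% significance level
p_val < 0.05
```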
Our R^2 value is 0.9944, which means that about 99.4% of the variability in boiling point is explained by the simple linear regression on barometric pressure. Randomness and unexplained factors account for less than 1% of the variability.
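The "proportion of variability explained" reading follows from how R^2 is defined: one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch, assuming the same model as above (`ss_res`, `ss_tot`, and `r_squared` are illustrative names):

```r
data("forbes", package = "MASS")
mod <- lm(bp ~ pres, data = forbes)

# Variability left unexplained by the model
ss_res <- sum(residuals(mod)^2)

# Total variability of boiling point around its mean
ss_tot <- sum((forbes$bp - mean(forbes$bp))^2)

# Proportion of variability explained by the regression
r_squared <- 1 - ss_res / ss_tot
r_squared

# Should agree with the value reported by summary()
summary(mod)$r.squared
```

In simple linear regression this value also equals the squared correlation between `pres` and `bp`.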