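# Read the oxygen purity data from GitHub
oxy <- read.csv("https://raw.githubusercontent.com/kitadasmalley/sp21_MATH376LMT/main/data/oxygenPurity.csv", header = TRUE)
# Response (y): oxygen purity; explanatory variable (x): hydrocarbon percentage
purity_y <- oxy$purity
hydro_x <- oxy$hydro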
The explanatory variable (x) is the "percentage of hydrocarbons in the main condenser of the processing unit". The response variable (y) is the purity of the oxygen produced by the fractionation process.
Here is the histogram of purity (y), along with its mean and standard deviation.
# Histogram of Purity
hist(purity_y)
# Mean of Purity
mean(purity_y)
## [1] 91.818
# Standard Deviation of Purity
sd(purity_y)
## [1] 4.478882
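For reference, `mean()` computes the sample mean \(\bar{y} = \frac{1}{n}\sum_i y_i\). `sd()` computes the sample standard deviation \(s_y = \sqrt{\sum_i (y_i - \bar{y})^2/(n-1)}\), which uses an \(n-1\) denominator. The sketch below checks both formulas by hand. It uses a small made-up vector (`toy_y` is hypothetical and is not the purity data). The same check could be run on `purity_y` or `hydro_x`.

```r
# Hypothetical toy data, used only to illustrate the formulas
toy_y <- c(88.5, 90.1, 92.3, 93.0, 95.4)
n <- length(toy_y)

# Sample mean: sum of the observations divided by n
manual_mean <- sum(toy_y) / n

# Sample standard deviation: sum of squared deviations divided by n - 1
manual_sd <- sqrt(sum((toy_y - manual_mean)^2) / (n - 1))

# Both agree with R's built-in functions (up to floating-point tolerance)
stopifnot(isTRUE(all.equal(manual_mean, mean(toy_y))))
stopifnot(isTRUE(all.equal(manual_sd, sd(toy_y))))
```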
Here is the histogram of hydrocarbon percentage (x), along with its mean and standard deviation.
# Histogram of Hydrocarbon
hist(hydro_x)
# Mean of Hydrocarbon
mean(hydro_x)
## [1] 1.1825
# Standard Deviation of Hydrocarbon
sd(hydro_x)
## [1] 0.2367516
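# Scatterplot of purity (y) against hydrocarbon percentage (x)
plot(hydro_x, purity_y, pch=16, xlab="Hydrocarbons (%)", ylab="Purity")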
This scatterplot shows a positive, roughly linear relationship between the percentage of hydrocarbons and the purity of the oxygen produced: runs with a higher hydrocarbon percentage tend to yield higher purity.
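The strength of a linear association like this one can be summarized by the sample correlation \(r\), which `cor()` computes (for these data, `cor(hydro_x, purity_y)`). The correlation is tied to the standard deviations reported above: the least-squares slope equals \(r\,s_y/s_x\). The sketch below checks that identity on hypothetical toy vectors (`toy_x`, `toy_y`). It does not use the oxygen data.

```r
# Hypothetical toy data (not the oxygen purity measurements)
toy_x <- c(0.9, 1.0, 1.1, 1.2, 1.3, 1.5)
toy_y <- c(87.0, 89.5, 90.2, 92.8, 93.1, 96.0)

# Sample correlation between x and y
r <- cor(toy_x, toy_y)

# Least-squares slope implied by r and the two standard deviations
slope_from_r <- r * sd(toy_y) / sd(toy_x)

# Compare with the slope estimated by lm()
slope_lm <- unname(coef(lm(toy_y ~ toy_x))[2])
stopifnot(isTRUE(all.equal(slope_from_r, slope_lm)))
```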
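To model this relationship, I fit a simple linear regression (SLR) model with purity as the response and hydrocarbon percentage as the explanatory variable.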
# Fit the SLR model of purity on hydrocarbon percentage
result <- lm(purity_y ~ hydro_x)
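`lm()` estimates the intercept and slope by ordinary least squares. With a single explanatory variable, these estimates have closed forms: \(b_1 = S_{xy}/S_{xx}\) and \(b_0 = \bar{y} - b_1\bar{x}\). The sketch below computes them by hand on hypothetical toy data and compares them with `coef()`. Swapping in `hydro_x` and `purity_y` should reproduce the coefficients R reports for this model.

```r
# Hypothetical toy data (not the oxygen purity measurements)
toy_x <- c(0.9, 1.0, 1.1, 1.2, 1.3, 1.5)
toy_y <- c(87.0, 89.5, 90.2, 92.8, 93.1, 96.0)

# Centered sum of cross-products and sum of squares of x
s_xy <- sum((toy_x - mean(toy_x)) * (toy_y - mean(toy_y)))
s_xx <- sum((toy_x - mean(toy_x))^2)

# Closed-form least-squares estimates
b1 <- s_xy / s_xx
b0 <- mean(toy_y) - b1 * mean(toy_x)

# lm() should return the same intercept and slope
fit <- lm(toy_y ~ toy_x)
stopifnot(isTRUE(all.equal(c(b0, b1), unname(coef(fit)))))
```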
I can then overlay the fitted least-squares line on the scatterplot.
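# Scatterplot with the fitted least-squares line overlaid (dashed red)
plot(hydro_x, purity_y, pch=16, xlab="Hydrocarbons (%)", ylab="Purity")
abline(coefficients(result), lty=2, col="red")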
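"Best fit" here means least squares. Among all straight lines, the fitted one has the smallest sum of squared vertical distances (residuals) to the points. The sketch below illustrates this on hypothetical toy data by comparing the residual sum of squares of the `lm()` line with that of a slightly perturbed line. The perturbation size is arbitrary, and `rss()` is a hypothetical helper. As an aside, `abline(result)` also accepts the fitted model directly and draws the same line.

```r
# Hypothetical toy data (not the oxygen purity measurements)
toy_x <- c(0.9, 1.0, 1.1, 1.2, 1.3, 1.5)
toy_y <- c(87.0, 89.5, 90.2, 92.8, 93.1, 96.0)

fit <- lm(toy_y ~ toy_x)

# Hypothetical helper: residual sum of squares for any intercept/slope pair
rss <- function(b0, b1) sum((toy_y - (b0 + b1 * toy_x))^2)

rss_lm <- rss(coef(fit)[1], coef(fit)[2])
rss_perturbed <- rss(coef(fit)[1] + 0.5, coef(fit)[2] - 0.5)

# The least-squares line has the smaller RSS, and its residuals sum to (about) zero
stopifnot(rss_lm < rss_perturbed)
stopifnot(abs(sum(residuals(fit))) < 1e-8)
```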
Printing the model object shows the estimated intercept and slope of this line:
result
##
## Call:
## lm(formula = purity_y ~ hydro_x)
##
## Coefficients:
## (Intercept)      hydro_x  
##       77.86        11.80  
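Thus, the fitted regression line is \[\hat{Y}_i = 77.86 + 11.80\,x_i\] Each additional percentage point of hydrocarbons is associated with an estimated increase of about 11.80 in mean oxygen purity.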
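As a worked check of the fitted equation, plug in the mean hydrocarbon level \(x_i = 1.1825\). This gives \(77.86 + 11.80 \times 1.1825 = 91.8135\), essentially the mean purity of 91.818. That is expected, because a least-squares line always passes through \((\bar{x}, \bar{y})\). The small gap comes from rounding the coefficients. The helper `predict_purity()` below is hypothetical and simply hard-codes the rounded coefficients. With the fitted model, `predict(result, newdata = data.frame(hydro_x = 1.0))` would use the full-precision estimates instead.

```r
# Hypothetical helper: evaluates the fitted line using the rounded
# coefficients printed above (intercept 77.86, slope 11.80)
predict_purity <- function(hydro) {
  77.86 + 11.80 * hydro
}

# At the mean hydrocarbon level the prediction is close to the mean purity
predict_purity(1.1825)  # about 91.81

# Predicted purity at a hydrocarbon level of 1.0
predict_purity(1.0)     # 89.66
```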