oxy <- read.csv("https://raw.githubusercontent.com/kitadasmalley/sp21_MATH376LMT/main/data/oxygenPurity.csv", header = TRUE)

purity_y <- oxy$purity
hydro_x <- oxy$hydro

Problem 4

1. Identify the explanatory variable and the response variable for this study.

The explanatory variable is the “percentage of hydrocarbons in the main condenser of the processing unit”. The response variable is the purity of the oxygen produced by the fractionation process.

2. Use R to make a histogram of purity and report the mean and standard deviation.

Here is the histogram of purity (y), along with its mean and standard deviation.

# Histogram of Purity
hist(purity_y)

# Mean of Purity
mean(purity_y)
## [1] 91.818
# Standard Deviation of Purity
sd(purity_y)
## [1] 4.478882

3. Use R to make a histogram of hydrocarbon and report the mean and standard deviation.

Here is the histogram of hydrocarbon (x), along with its mean and standard deviation.

# Histogram of Hydrocarbon
hist(hydro_x)

# Mean of Hydrocarbon
mean(hydro_x)
## [1] 1.1825
# Standard Deviation of Hydrocarbon
sd(hydro_x)
## [1] 0.2367516

4. Use R to create a scatter plot of the two variables. Examine the scatter plot and verbally describe the overall relationship (linear or nonlinear, positive or negative).

plot(hydro_x, purity_y, pch=16)

This scatter plot shows a roughly linear, positive relationship: the purity of the oxygen produced tends to increase as the percentage of hydrocarbons increases.
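The visual impression from the plot can be backed up with a number. Below is a minimal sketch that summarizes direction and strength using Pearson's correlation coefficient. The helper `describe_linear` and the toy vectors are hypothetical and only for illustration; they are not part of the assignment data.

```r
# Hypothetical helper (not part of the original analysis): report Pearson's r
# and the direction of the linear association between x and y.
describe_linear <- function(x, y) {
  r <- cor(x, y)  # cor() computes Pearson's correlation by default
  direction <- if (r > 0) "positive" else if (r < 0) "negative" else "none"
  list(r = r, direction = direction)
}

# Toy data for illustration only (assumed values, not the oxygen data)
toy_x <- c(1.0, 1.2, 1.4, 1.6)
toy_y <- c(88, 91, 92, 96)
describe_linear(toy_x, toy_y)
```

With the assignment data already loaded, the equivalent call would be `describe_linear(hydro_x, purity_y)`. A value of r near 1 or -1 suggests a strong linear relationship, while a value near 0 suggests a weak one.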

5. Add the least squares regression line to the scatter plot. What is the equation of the line?

To do this, I first need to fit a simple linear regression (SLR) model to the data.

result <- lm(purity_y~hydro_x)

Then I can use the fitted model to add the line of best fit to the scatter plot.

plot(hydro_x, purity_y, pch=16)
abline(coefficients(result), lty=2, col="red")

Printing the model object shows the estimated intercept and slope of this line (rounded):

result
## 
## Call:
## lm(formula = purity_y ~ hydro_x)
## 
## Coefficients:
## (Intercept)      hydro_x  
##       77.86        11.80

Thus, the equation of the least squares regression line is \[\hat{y}_i = 77.86 + 11.80 x_i\]
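As a check on where these coefficients come from, the least squares slope and intercept can also be computed directly from the standard formulas: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept is the mean of y minus the slope times the mean of x. The sketch below compares that hand computation with `lm()`. The helper `slr_by_hand` and the toy vectors are hypothetical and used only for illustration.

```r
# Hypothetical helper (not part of the original analysis): closed-form
# least squares estimates for simple linear regression.
slr_by_hand <- function(x, y) {
  b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
  b0 <- mean(y) - b1 * mean(x)                                     # intercept
  c(intercept = b0, slope = b1)
}

# Toy data for illustration only (assumed values, not the oxygen data)
toy_x <- c(1, 2, 3, 4)
toy_y <- c(2, 4, 5, 8)

by_hand <- slr_by_hand(toy_x, toy_y)
by_lm <- coef(lm(toy_y ~ toy_x))

# TRUE when the two sets of estimates agree up to numerical tolerance
all.equal(unname(by_hand), unname(by_lm))
```

With the assignment data already loaded, `slr_by_hand(hydro_x, purity_y)` should agree with `coef(result)`, and `predict(result, newdata = data.frame(hydro_x = 1.2))` would give the fitted purity at a hydrocarbon percentage of 1.2.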