In this assignment I want to focus on a data set I take an interest in. I will therefore use mtcars for all of the data analysis and LaTeX equations, and I will fit a simple linear regression between a car's horsepower and its miles per gallon. In addition I will work through five LaTeX math problems: the regression equation, coefficient estimation, goodness of fit, hypothesis testing for the slope, and confidence intervals.
In this R code block I load all the libraries that I will be using across this homework.
library(ggplot2)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
In this R block I will do my simple linear regression and also create a 3D scatter plot that shows the relationship between miles per gallon, horsepower, and weight
fit <- lm(mpg ~ hp, data = mtcars)
multiDimPlot <- plot_ly(data = mtcars, x = ~mpg, y = ~hp,
                        z = ~wt, type = 'scatter3d', mode = 'markers')
multiDimPlot <- multiDimPlot %>% layout(title = 'Horsepower vs MPG vs Weight')
In this R block I will create 3 ggplots: two scatter plots with fitted linear regression lines (MPG vs HP and MPG vs weight) and a bar plot of average MPG by number of cylinders
linRegLine <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  ggtitle("MPG vs HP with Linear Regression Line")
MPGvWGHT <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  ggtitle("MPG vs Weight with Linear Regression Line")
barPlot <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_bar(stat = "summary", fun = "mean", fill = "blue") +
  theme_minimal() +
  labs(title = "Average MPG by Number of Cylinders",
       x = "Number of Cylinders", y = "Average MPG")
Let's take a look at my simple linear regression as well as the regression equation
\[ \text{mpg}_i = \beta_0 + \beta_1 \cdot \text{hp}_i + \epsilon_i \] Fitting this model in R gives:
summary(fit)
##
## Call:
## lm(formula = mpg ~ hp, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7121 -2.1122 -0.8854 1.5819 8.2360
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.09886 1.63392 18.421 < 2e-16 ***
## hp -0.06823 0.01012 -6.742 1.79e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared: 0.6024, Adjusted R-squared: 0.5892
## F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07
Now we can plug the estimated coefficients into our model equation
\[ \widehat{\text{mpg}}_i = 30.09886 - 0.06823 \times \text{hp}_i \] The fitted equation for \(\widehat{\text{mpg}}_i\) shows that for each additional unit of horsepower, miles per gallon is expected to decrease by 0.06823
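To make the fitted equation concrete, here is a minimal sketch that predicts mpg for one car, both by hand and with `predict()`. The 110 horsepower value and the names `new_car`, `by_hand`, and `by_predict` are illustrative assumptions, not part of the original analysis.

```r
# Refit the model from above so this block runs on its own
fit <- lm(mpg ~ hp, data = mtcars)

# Estimated intercept and slope
b <- coef(fit)

# Hypothetical car with 110 horsepower (illustrative value only)
new_car <- data.frame(hp = 110)

# Prediction by hand: intercept + slope * hp
by_hand <- unname(b["(Intercept)"] + b["hp"] * new_car$hp)

# The same prediction through predict()
by_predict <- unname(predict(fit, newdata = new_car))

print(c(by_hand = by_hand, by_predict = by_predict))
```

Both values should agree, which confirms that `predict()` is simply evaluating the fitted line at the given horsepower.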
The standard errors of the estimates tell us how precisely each coefficient is estimated
\[ SE(\hat{\beta}_0) = 1.63392 \] \[ SE(\hat{\beta}_1) = 0.01012 \]
For \(\hat{\beta}_0\), if we were to draw repeated samples and fit a regression model to each one, the estimated intercepts would typically vary by about 1.63392 mpg. For \(\hat{\beta}_1\), the estimated slope would typically vary by about 0.01012 mpg per unit of horsepower from sample to sample
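The standard errors can also be pulled out of the model object rather than read off the printed summary. The sketch below assumes the same `fit` as above; `coef_table`, `std_errors`, and `se_from_vcov` are names introduced here for illustration.

```r
# Refit the model from above so this block runs on its own
fit <- lm(mpg ~ hp, data = mtcars)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(fit))

# Standard errors of the intercept and slope
std_errors <- coef_table[, "Std. Error"]
print(std_errors)

# Cross-check: square roots of the diagonal of the
# estimated covariance matrix of the coefficients
se_from_vcov <- sqrt(diag(vcov(fit)))
print(se_from_vcov)
```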
Statistical tests and confidence intervals allow us to infer the population parameters: \[ H_0: \beta_1 = 0 \quad \text{(No relationship)} \] \[ H_a: \beta_1 \neq 0 \quad \text{(There is a relationship)} \]
The t value for the slope
\[ t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{-0.06823}{0.01012} \approx -6.742 \]
We can now display the p-value of the slope to decide whether to reject or fail to reject the null hypothesis, along with the proportion of the variance in mpg explained by the model
\[p \text{-value} = 1.79 \times 10^{-7} \quad\]
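As a check on the t value and p-value, the following sketch recomputes both from the coefficient table using the two-sided t test with the residual degrees of freedom. The names `t_slope`, `df_resid`, and `p_slope` are assumptions made for this example.

```r
# Refit the model from above so this block runs on its own
fit <- lm(mpg ~ hp, data = mtcars)
coef_table <- coef(summary(fit))

# t statistic for the slope: estimate divided by its standard error
t_slope <- coef_table["hp", "Estimate"] / coef_table["hp", "Std. Error"]

# Residual degrees of freedom (n - 2 for simple linear regression)
df_resid <- df.residual(fit)

# Two-sided p-value from the t distribution
p_slope <- 2 * pt(abs(t_slope), df = df_resid, lower.tail = FALSE)

print(c(t = t_slope, df = df_resid, p = p_slope))
```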
\[R^2 = 0.6024 \quad\]
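The \(R^2\) value can be reproduced from its definition, \(R^2 = 1 - SS_{res} / SS_{tot}\). This is a minimal sketch under that standard definition; `ss_res`, `ss_tot`, `r2_by_hand`, and `r2_reported` are illustrative names.

```r
# Refit the model from above so this block runs on its own
fit <- lm(mpg ~ hp, data = mtcars)

# Residual sum of squares: variation left unexplained by the model
ss_res <- sum(residuals(fit)^2)

# Total sum of squares: variation of mpg around its mean
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# R-squared by hand and as reported by summary()
r2_by_hand <- 1 - ss_res / ss_tot
r2_reported <- summary(fit)$r.squared

print(c(by_hand = r2_by_hand, reported = r2_reported))
```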
Based on this statistical analysis we can reject the null hypothesis that there is no relationship between a car's horsepower and its miles per gallon, since the p-value is far below 0.05. The results show that there is a statistically significant relationship between the two variables, with horsepower explaining about 60% of the variance in mpg
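For the confidence interval, a 95% interval for the slope is \(\hat{\beta}_1 \pm t_{0.975,\,30} \cdot SE(\hat{\beta}_1)\). The sketch below computes it with `confint()` and again by hand; the 95% level and the names `ci`, `t_crit`, and `ci_by_hand` are assumptions chosen for this example.

```r
# Refit the model from above so this block runs on its own
fit <- lm(mpg ~ hp, data = mtcars)

# 95% confidence intervals for the intercept and slope
ci <- confint(fit, level = 0.95)
print(ci)

# The slope interval by hand: estimate +/- critical t * standard error
coef_table <- coef(summary(fit))
est <- coef_table["hp", "Estimate"]
se <- coef_table["hp", "Std. Error"]
t_crit <- qt(0.975, df = df.residual(fit))
ci_by_hand <- est + c(-1, 1) * t_crit * se
print(ci_by_hand)
```

If the interval for the slope does not contain zero, that agrees with rejecting \(H_0: \beta_1 = 0\) at the 5% level.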