Introduction

In this assignment I want to focus on a data set I take an interest in, so I will be using mtcars for all of the data analysis and LaTeX equations, and I will perform a simple linear regression of a car's miles per gallon on its horsepower. In addition I will be doing five LaTeX math problems: the regression equation, coefficient estimation, goodness of fit, hypothesis testing for the slope, and confidence intervals.

R Code Imports

In this R code block I will load all the libraries that I will be using across this homework.

library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

Snapshot of data

R Code Linear Regression and Plotly

In this R block I will do my simple linear regression and also create a 3D scatter plot that shows the relationship between miles per gallon, horsepower, and weight

fit <- lm(mpg ~ hp, data = mtcars)

multiDimPlot <- plot_ly(data = mtcars, x = ~mpg, y = ~hp, 
                        z = ~wt, type = 'scatter3d', mode = 'markers')
# Keep the title on one line so no newline or indentation ends up inside the string
multiDimPlot <- multiDimPlot %>% layout(title = 'Horsepower vs MPG vs Weight')

R Code Plots

In this R block I will create 3 ggplots: two scatter plots with linear regression lines and a bar plot.

linRegLine <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  ggtitle("MPG vs HP with Linear Regression Line")

MPGvWGHT <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  ggtitle("MPG vs Weight with Linear Regression Line")

barPlot <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_bar(stat = "summary", fun = "mean", fill = "blue") +
  theme_minimal() +
  labs(title = "Average MPG by Number of Cylinders", 
       x = "Number of Cylinders", y = "Average MPG")

Simple Linear Regression

Let's take a look at my simple linear regression as well as the regression equation.

Regression Equation

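\[ mpg = \beta_0 + \beta_1 \cdot hp + \epsilon \]

Analyzing this equation we get:

  • mpg is the outcome variable: we are trying to predict a car's miles per gallon from its horsepower
  • \[\beta_0\] is the intercept of the regression line. It represents the expected value of mpg when hp is zero
  • \[\beta_1\] is the slope. It represents the expected change in mpg for each additional unit of horsepower
  • \[\epsilon\] is the error term, the part of mpg that the line does not explain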
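For the coefficient estimation step, the least squares estimates can be sketched by hand as the covariance of hp and mpg divided by the variance of hp (slope), with the intercept chosen so the line passes through the point of means. This is only a rough check using base R; the names `b0`, `b1`, and `fit_check` are my own and are not used elsewhere in this homework.

```r
# Hand-computed least squares estimates for mpg ~ hp (mtcars ships with base R)
x <- mtcars$hp
y <- mtcars$mpg

# Slope: sample covariance of (hp, mpg) divided by sample variance of hp
b1 <- cov(x, y) / var(x)

# Intercept: makes the line pass through (mean(hp), mean(mpg))
b0 <- mean(y) - b1 * mean(x)

# Compare with what lm() estimates (fit_check is a hypothetical name)
fit_check <- lm(mpg ~ hp, data = mtcars)
manual  <- c(b0, b1)
from_lm <- unname(coef(fit_check))
print(rbind(manual, from_lm))
```

The two rows should agree, which is a quick way to confirm that lm() is doing ordinary least squares here.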

Simple Linear Regression Cont:

summary(fit)
## 
## Call:
## lm(formula = mpg ~ hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7121 -2.1122 -0.8854  1.5819  8.2360 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## hp          -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892 
## F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

LaTeX: Regression with Coefficients

Now we can plug in the coefficients to our model equation

\[ \widehat{\text{mpg}}_i = 30.09886 - 0.06823 \times \text{hp}_i \]

The fitted equation shows that for each additional unit of horsepower, miles per gallon is expected to decrease by 0.06823.
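As a worked example, here is the predicted miles per gallon for a car with 100 horsepower. The value 100 is only an illustrative choice of mine, and the result uses the rounded coefficients from the summary output.

\[ \widehat{\text{mpg}} = 30.09886 - 0.06823 \times 100 = 30.09886 - 6.823 \approx 23.28 \]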

The standard errors of the estimates tell us how precisely each coefficient is estimated

\[ SE(\hat{\beta}_0) = 1.63392 \] \[ SE(\hat{\beta}_1) = 0.01012 \]

Analysis of the linear regression equation

For \[\hat{\beta}_0\], the standard error tells us that if we were to take repeated samples and fit a regression model to each one, the estimated intercepts would typically vary by about 1.63392 mpg. For \[\hat{\beta}_1\], the estimated slope would typically vary by about 0.01012 mpg per unit of horsepower from sample to sample.

Statistical Inference of Linear Regression

Statistical tests and confidence intervals allow us to infer the population parameters: \[ H_0: \beta_1 = 0 \quad \text{(No relationship)} \] \[ H_a: \beta_1 \neq 0 \quad \text{(There is a relationship)} \]

The t value for the slope

\[ t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{-0.06823}{0.01012} \approx -6.742 \]
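The same estimate and standard error also give a 95% confidence interval for the slope. This is a sketch of the usual formula, assuming a two-sided interval with 30 residual degrees of freedom (taken from the summary output), for which the critical value is about 2.042.

\[ \hat{\beta}_1 \pm t_{0.975,\,30} \cdot SE(\hat{\beta}_1) = -0.06823 \pm 2.042 \times 0.01012 \approx -0.06823 \pm 0.02067 \]

\[ \Rightarrow \; (-0.0889,\; -0.0476) \]

The interval does not contain zero, which agrees with the size of the t value.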

Statistical Inference of Linear Regression Cont:

We can now display the p-value of the slope to decide whether to reject or fail to reject the null hypothesis, and the proportion of the variance in mpg explained by the model

p-value

\[ p\text{-value} = 1.79 \times 10^{-7} \]
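The p-value can be reproduced from the t value as a two-sided tail probability of a t distribution with the residual degrees of freedom. The sketch below refits the model so it runs on its own; `fit_check`, `p_manual`, and `p_summary` are names I made up for the check.

```r
# Reproduce the slope's p-value from its t statistic (base R only)
fit_check <- lm(mpg ~ hp, data = mtcars)
coefs <- summary(fit_check)$coefficients

t_slope  <- coefs["hp", "t value"]   # slope estimate divided by its standard error
df_resid <- df.residual(fit_check)   # n - 2 residual degrees of freedom

# Two-sided p-value: probability of a |t| at least this large under H0
p_manual  <- 2 * pt(-abs(t_slope), df = df_resid)
p_summary <- coefs["hp", "Pr(>|t|)"]

print(c(manual = p_manual, summary = p_summary))
```

Both numbers should match the Pr(>|t|) entry for hp in the summary table.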

Proportion of variance

\[ R^2 = 0.6024 \]
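For goodness of fit, R squared is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand and compares it with the value that summary() reports; the variable names are my own and the model is refit so the block stands alone.

```r
# Compute R-squared by hand as 1 - SSE / SST (base R only)
fit_check <- lm(mpg ~ hp, data = mtcars)

sse <- sum(residuals(fit_check)^2)              # variation left unexplained by the line
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total variation in mpg

r2_manual  <- 1 - sse / sst
r2_summary <- summary(fit_check)$r.squared

print(c(manual = r2_manual, summary = r2_summary))
```

Both values should round to the 0.6024 shown in the summary output.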

Summary of linear regression equations and statistical inference

Based on this statistical analysis we can reject the null hypothesis that there is no relationship between a car's horsepower and its miles per gallon. The p-value is far below 0.05, so the results show a statistically significant relationship between the two variables, and horsepower explains about 60% of the variance in miles per gallon.

Linear Regression Plot: HP vs MPG

## `geom_smooth()` using formula = 'y ~ x'

Linear Regression Plot: MPG vs Weight

## `geom_smooth()` using formula = 'y ~ x'

Bar plot: Average MPG by Number of Cylinders

Linear Regression 3D Plot: MPG v Weight v HP