Statistics Topic: What is a P-Value?

-A p-value is a probability used in hypothesis testing to measure how compatible the observed data are with the Null Hypothesis.

-If the Null Hypothesis is true, the p-value is the probability of observing results AT LEAST as extreme as those obtained.

-The Null Hypothesis states that no effect or relation exists between the variables being tested.

-A smaller p-value indicates less compatibility with the Null Hypothesis, suggesting we may reject it.
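
The definition above can be illustrated with a small simulation. This is a minimal sketch, not part of the original analysis: the null mean, standard deviation, sample size, and observed mean are all hypothetical values chosen for illustration.

```r
set.seed(42)                       # make the simulation reproducible

# Hypothetical setup (assumed values): H0 says the mean is 10
mu0 <- 10
sigma <- 1.2
n <- 36
observed_mean <- 9.5               # hypothetical observed sample mean

# Simulate many sample means in a world where H0 is true
null_means <- replicate(20000, mean(rnorm(n, mean = mu0, sd = sigma)))

# Two-sided p-value: share of simulated means AT LEAST as far from mu0
# as the observed mean
p_sim <- mean(abs(null_means - mu0) >= abs(observed_mean - mu0))
p_sim
```

The smaller this share, the less compatible the observed mean is with the Null Hypothesis.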

Hypothesis Testing

We can test Two Hypotheses:

-Null Hypothesis: \[ H_0: \mu = \mu_0 \]

-Alternate Hypothesis: \[ H_1: \mu \ne \mu_0 \]
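
As a rough sketch of how this pair of hypotheses might be tested in R, the example below runs a two-sided one-sample t-test. The sample `x` and the value `mu = 10` are hypothetical, generated only for illustration.

```r
set.seed(1)                                  # reproducible hypothetical sample
x <- rnorm(36, mean = 9.5, sd = 1.2)         # assumed data, not from the original

# H0: mu = 10 versus H1: mu != 10
result <- t.test(x, mu = 10, alternative = "two.sided")
result$p.value                               # p-value for the two-sided test
```

`alternative = "two.sided"` matches the "not equal" form of the Alternate Hypothesis.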

Normal Distribution Example

x <- rnorm(10000)
hist(x)

Null Hypothesis Distribution Example

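z <- (9.5 - 10) / (1.2 / sqrt(36))
# Two-sided p-value: use abs(z) so the tail area is taken beyond |z| on both sides
p_value <- 2 * (1 - pnorm(abs(z)))
p_value
## [1] 0.01241933

The p-value (about 0.012) is less than 0.05, so we reject the Null Hypothesis.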

Main Topic: Fuel Consumption vs Engine Size

An interesting question in the world of vehicles is how fuel consumption relates to the number of cylinders in an engine.

In general, the more cylinders an engine has (4 vs 6 vs 8), the higher the fuel consumption. However, that is not always the case, especially when comparing truck and sedan engines: both engine size AND number of cylinders matter.

This project will demonstrate this relationship using the mtcars data set.

Formulas:

We require some formulas to test our hypothesis:

Regression Formula \[ \text{mpg} = \beta_0 + \beta_1 \cdot \text{cyl} + \epsilon \]

  • Null Hypothesis: \[ H_0: \beta_1 = 0 \]

  • Alternate Hypothesis: \[ H_1: \beta_1 \ne 0 \]
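
The regression and its hypothesis test can be sketched as follows, using the built-in `mtcars` data and base R's `lm()`. The object names `fit` and `coefs` are illustrative choices, not names from the original project.

```r
# Fit the simple linear regression mpg = beta_0 + beta_1 * cyl + epsilon
fit <- lm(mpg ~ cyl, data = mtcars)

# Coefficient table: estimate, standard error, t value, and p-value
coefs <- summary(fit)$coefficients
coefs

# p-value for H0: beta_1 = 0 (the slope on cyl)
coefs["cyl", "Pr(>|t|)"]
```

The row for `cyl` holds the test of the slope, and the `Pr(>|t|)` column is its two-sided p-value.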

Fuel per Cylinder

Formula for Fuel per Cylinder: \[ \text{MPG per Cylinder} = \frac{\text{mpg}}{\text{cyl}} \]

-We need to test the average MPG per cylinder across different cars.
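
One way to compute this quantity and average it by cylinder count is sketched below. The column name `mpg_per_cyl` and the objects `cars_df` and `avg_by_cyl` are hypothetical names introduced for this example.

```r
cars_df <- mtcars                                   # work on a copy of the data
cars_df$mpg_per_cyl <- cars_df$mpg / cars_df$cyl    # mpg divided by cylinder count

# Average mpg and mpg per cylinder within each cylinder group (4, 6, 8)
avg_by_cyl <- aggregate(cbind(mpg, mpg_per_cyl) ~ cyl, data = cars_df, FUN = mean)
avg_by_cyl
```

Each row of `avg_by_cyl` summarizes one cylinder group, which makes the groups easy to compare side by side.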

Scatter Plot: MPG vs Cylinders

## `geom_smooth()` using formula = 'y ~ x'

Boxplot: MPG by Cylinder Count

3D Plot: MPG vs Cylinders vs Displacement

P-Value from Regression: Conclusion

##             Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 37.88458  2.0738436 18.267808 8.369155e-18
## cyl         -2.87579  0.3224089 -8.919699 6.112687e-10

If the p-value is less than 0.05, we can reject the Null Hypothesis.

-This would mean that cylinder count has a statistically significant linear association with fuel consumption. Because this model does not include displacement, it cannot by itself show that cylinder count matters more than engine size.

-We can see that the p-value for cyl is 6.11 x 10^-10, which is smaller than 0.05, so we reject the Null Hypothesis!
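
To weigh cylinder count against displacement, one option is to put both predictors in the same model. The sketch below is an assumption about how such a comparison could be set up, not the original project's method. The second fit standardizes the variables with `scale()` so that the two slopes are on a comparable scale.

```r
# Model with both predictors: mpg explained by cylinder count and displacement
fit_both <- lm(mpg ~ cyl + disp, data = mtcars)
coefs_both <- summary(fit_both)$coefficients
coefs_both                      # one row (with its own p-value) per predictor

# Standardized version: slopes are in standard-deviation units,
# so their magnitudes can be compared directly
fit_std <- lm(scale(mpg) ~ scale(cyl) + scale(disp), data = mtcars)
coef(fit_std)
```

Note that `cyl` and `disp` are strongly correlated in mtcars, so the individual coefficients in a joint model should be read with caution.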

Graph of Choice: Scatter Plot

library(ggplot2)

# Jitter points horizontally so cars with the same cylinder count do not overlap
ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_jitter(width = 0.2, height = 0) +
  geom_smooth(method = "lm", se = FALSE) +  # fitted regression line, no confidence band
  labs(title = "Fuel Consumption vs Cylinder Count",
       x = "Number of Cylinders", y = "Miles per Gallon") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))