Importance of Regression

  • Regression is a statistical technique used to study the relationship between a dependent variable and one or more independent variables.
  • Its purpose is to predict outcomes from patterns observed in the data.
  • Regression is used in various fields such as finance (predicting stock prices), engineering (predictive maintenance), and biology (growth analysis).

Mathematical Foundations: Regression Equation

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

Explanation:

  • \(Y\): Dependent variable (outcome or response).
  • \(X\): Independent variable (predictor or explanatory variable).
  • \(\beta_0\): Intercept — value of \(Y\) when \(X = 0\).
  • \(\beta_1\): Slope — change in \(Y\) for a one-unit increase in \(X\).
  • \(\epsilon\): Error term — captures unexplained variation in \(Y\).
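
As a minimal sketch of what the coefficients mean, the snippet below simulates data from the equation above and checks that `lm()` recovers estimates close to the values used. The "true" values (\(\beta_0 = 2\), \(\beta_1 = 0.5\)), the sample size, and the noise level are assumptions chosen only for illustration.

```r
# Simulate data from Y = beta_0 + beta_1 * X + epsilon.
# The values below are illustrative assumptions, not taken from any dataset.
set.seed(42)
n <- 200
beta_0 <- 2
beta_1 <- 0.5
x <- runif(n, min = 0, max = 10)
epsilon <- rnorm(n, mean = 0, sd = 1)   # unexplained variation
y <- beta_0 + beta_1 * x + epsilon

# Ordinary least squares estimates of the intercept and slope
sim_fit <- lm(y ~ x)
estimates <- coef(sim_fit)
print(estimates)
```

The estimates will not equal the assumed values exactly, because the error term adds random noise to every observation.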

Dataset Overview:

  • The mtcars dataset, which ships with R, contains fuel consumption, design, and performance data for automobiles, taken from the 1974 Motor Trend US magazine.

  • Number of Observations: 32 cars.

  • Number of Variables: 11 features, including:

    • mpg: Miles per gallon (fuel efficiency, the dependent variable).

    • wt: Weight of the car in units of 1000 lbs (independent variable).

    • hp: Horsepower (potential secondary predictor).
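
A quick way to confirm the overview above is to inspect the data directly. This is a small sketch using base R only; the choice of which columns to summarize is ours.

```r
# mtcars ships with base R (datasets package), so no download is needed
data(mtcars)

dims <- dim(mtcars)                       # number of rows (cars) and columns (variables)
print(dims)

vars_of_interest <- c("mpg", "wt", "hp")  # the three variables used in this analysis
str(mtcars[, vars_of_interest])           # types and first few values
summary(mtcars[, vars_of_interest])       # min, quartiles, mean, max
```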

Libraries and Setup

Libraries Used:

  • ggplot2: For creating static visualizations (e.g., scatterplots and diagnostic plots).

  • plotly: For building interactive plots.

Global Setup:

  • Import libraries at the beginning of the document:

library(ggplot2)
library(plotly)

Using the Dataset

  1. Explore the relationships between car weight (wt) and fuel efficiency (mpg), and between horsepower (hp) and fuel efficiency (mpg), using ggplot2 for static visualizations.

  2. Explore the relationship between car weight (wt) and horsepower (hp), using plotly to create an interactive plot.

  3. Fit a simple linear regression model: \[ \text{mpg} = \beta_0 + \beta_1 \cdot \text{wt} + \epsilon \]

  4. Evaluate the regression model’s fit using residuals and \(R^2\).

  5. Present results using both ggplot2 and plotly visualizations.
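
Steps 3 and 4 can be sketched as follows. This is one straightforward way to do it with base R's `lm()`; the object names (`fit`, `coefs`, `r_squared`, `res`) are ours.

```r
# Fit mpg = beta_0 + beta_1 * wt + epsilon by ordinary least squares
fit <- lm(mpg ~ wt, data = mtcars)

coefs <- coef(fit)                    # intercept and slope estimates
r_squared <- summary(fit)$r.squared   # share of the variance in mpg explained by wt
res <- residuals(fit)                 # observed mpg minus fitted mpg

print(coefs)
print(r_squared)

# Residuals vs. fitted values: a patternless band around zero suggests
# that a straight line is a reasonable description of the data
plot(fitted(fit), res,
     xlab = "Fitted mpg", ylab = "Residuals",
     main = "Residuals vs. fitted values")
abline(h = 0, lty = 2)
```

On mtcars this gives a slope of about -5.34 mpg per 1000 lbs and an \(R^2\) of about 0.75, so weight alone accounts for roughly three quarters of the variation in mpg.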

Correlation Coefficient Explained

Formula:

\[ r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}} \]

Explanation:

  • \(x_i, y_i\): Individual data points for \(x\) and \(y\).
  • \(\bar{x}, \bar{y}\): Mean values of \(x\) and \(y\).
  • \(n\): Number of observations.

This formula quantifies the linear relationship between two variables. The value of \(r\) lies between \(-1\) and \(1\):

  • \(r = 1\): Perfect positive correlation.
  • \(r = -1\): Perfect negative correlation.
  • \(r = 0\): No linear correlation.
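
To make the formula concrete, the sketch below evaluates it term by term for wt and mpg and compares the result with R's built-in `cor()`. The variable names are ours.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Numerator: sum of products of deviations from the means
numerator <- sum((x - mean(x)) * (y - mean(y)))

# Denominator: square root of the product of the sums of squared deviations
denominator <- sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_manual <- numerator / denominator
r_builtin <- cor(x, y)   # Pearson correlation is the default method

print(c(manual = r_manual, builtin = r_builtin))
```

Both values should agree up to floating-point error, since `cor()` computes the Pearson coefficient by default.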

Car Weight to MPG Scatterplot

Goal:
Visualize the relationship between car weight (wt) and fuel efficiency (mpg) with a regression line.

Car Weight to MPG Analysis

# Scatterplot with regression line
ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue") +
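  geom_smooth(
    method = "lm",     # fit a straight line by ordinary least squares
    formula = y ~ x,   # passed directly; inside method.args it clashes with the formula ggplot2 already supplies
    se = TRUE,         # draw the confidence band around the line
    color = "red"
  ) +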
  labs(
    title = "Regression Line for mpg vs. wt",
    x = "Car Weight (1000 lbs)",
    y = "Miles per Gallon (mpg)"
  )
## Correlation (r): -0.87
  • Indicates a strong negative relationship: heavier cars tend to have lower fuel efficiency.

Car MPG to Horsepower Scatterplot

Car MPG to Horsepower Analysis

## Correlation (r): -0.78
  • Indicates a strong negative relationship: as horsepower increases, fuel efficiency tends to decrease.
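
A plot of this kind could be produced along the following lines. This is a sketch that mirrors the weight plot; the axis assignment (hp on the x-axis, mpg on the y-axis), the labels, and the object names are assumptions.

```r
library(ggplot2)

# Assumed layout: horsepower on the x-axis, mpg (the dependent variable) on the y-axis
p_hp <- ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "red") +
  labs(
    title = "Regression Line for mpg vs. hp",
    x = "Horsepower (hp)",
    y = "Miles per Gallon (mpg)"
  )
print(p_hp)

# Pearson correlation between mpg and hp, rounded as reported above
r_hp <- cor(mtcars$mpg, mtcars$hp)
print(round(r_hp, 2))
```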

Car Horsepower vs. Weight Scatterplot

Car Horsepower vs. Weight Analysis

## Correlation (r): 0.66
  • The correlation coefficient (r = 0.66) indicates a moderately strong positive relationship: as car weight increases, horsepower tends to increase, consistent with the expectation that heavier vehicles are generally fitted with more powerful engines.
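
An interactive version of this scatterplot might look like the sketch below, using plotly's `plot_ly()`. The hover text, titles, and object names are assumptions.

```r
library(plotly)

# Interactive scatterplot: weight on the x-axis, horsepower on the y-axis
p_int <- plot_ly(
  data = mtcars,
  x = ~wt,
  y = ~hp,
  type = "scatter",
  mode = "markers",
  text = rownames(mtcars),   # car model shown on hover
  hoverinfo = "text+x+y"
) %>%
  layout(
    title = "Horsepower vs. Weight",
    xaxis = list(title = "Car Weight (1000 lbs)"),
    yaxis = list(title = "Horsepower (hp)")
  )
p_int

# Pearson correlation between wt and hp, rounded as reported above
r_wt_hp <- cor(mtcars$wt, mtcars$hp)
print(round(r_wt_hp, 2))
```

Hovering over a point shows the car's name along with its weight and horsepower, which makes outliers easy to identify.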