Slide 1: Introduction to Linear Regression

Introduction: Linear regression is one of the most widely used statistical methods in data analysis. It models the relationship between a dependent variable and one or more independent variables, which lets analysts describe patterns in a dataset, make predictions, and quantify how much each variable contributes. These results can then inform decision-making.

Definition: Linear regression is a statistical technique that models a dependent variable as a linear function of one or more independent variables.

Purpose: Uncover patterns, make predictions, and understand the influence of variables in data analysis.

Slide 2: Equation

The simple linear regression model is represented by the equation:

\[ \widehat{y} = \beta_0 + \beta_1 \cdot x \]

Equivalently, in slope-intercept form, \(y = mx + b\), where \(m\) is the slope (\(\beta_1\)) and \(b\) is the intercept (\(\beta_0\)).

Components of the Equation:

Predicted Dependent Variable (\(\widehat{y}\)): The estimated or predicted value of the dependent variable based on the regression model.

Intercept (\(\beta_0\)): Represents the predicted value of the dependent variable when the independent variable (\(x\)) is zero.

Slope (\(\beta_1\)): Indicates the change in the predicted value of the dependent variable for a one-unit change in the independent variable (\(x\)).
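
As a rough illustration of what the intercept and slope mean in practice, the sketch below simulates data from a line with known coefficients and checks that `lm()` recovers them approximately. The true values (2 and 0.5), the noise level, and the variable names are assumptions chosen for this example, not taken from any dataset in these slides.

```r
# Simulate data from a known line: y = 2 + 0.5 * x + noise
# (coefficients and noise level are illustrative assumptions)
set.seed(42)
x <- seq(1, 20, length.out = 50)
y <- 2 + 0.5 * x + rnorm(50, sd = 0.3)

# Fit the simple linear regression y = beta_0 + beta_1 * x
fit <- lm(y ~ x)

# Extract the estimated intercept (beta_0) and slope (beta_1)
beta <- coef(fit)
beta0_hat <- unname(beta["(Intercept)"])
beta1_hat <- unname(beta["x"])

cat("Estimated intercept:", round(beta0_hat, 3), "\n")
cat("Estimated slope:    ", round(beta1_hat, 3), "\n")
```

The estimates should land close to, but not exactly on, the true values, because the simulated observations include random noise.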

Slide 3: Understanding the Regression Line

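Regression Line: The equation defines the straight line that best fits the relationship between the dependent and independent variables. "Best" is typically meant in the least-squares sense: the line minimizes the sum of squared vertical distances between the observed points and the line.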

Prediction: Use the equation of the regression line to make predictions for the dependent variable (\(\widehat{y}\)) based on specific values of the independent variable (\(x\)).

This equation is fundamental in understanding how simple linear regression models the relationship between variables and facilitates predictive analysis.
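
To make the idea of a best-fitting line more concrete, here is a minimal sketch that computes the least-squares slope and intercept by hand and compares them with `lm()`. It assumes the ordinary least-squares criterion and uses R's built-in `mtcars` data; the displacement value of 200 used for the prediction is a hypothetical input.

```r
# Built-in mtcars data: x = engine displacement, y = miles per gallon
x <- mtcars$disp
y <- mtcars$mpg

# Closed-form least-squares estimates for simple linear regression
beta1_hat <- cov(x, y) / var(x)             # slope
beta0_hat <- mean(y) - beta1_hat * mean(x)  # intercept

# The same line as fitted by lm()
fit <- lm(mpg ~ disp, data = mtcars)
print(coef(fit))

# Prediction at an illustrative displacement of 200 (hypothetical input)
by_hand <- beta0_hat + beta1_hat * 200
by_lm <- unname(predict(fit, newdata = data.frame(disp = 200)))
cat("By hand:", by_hand, " via predict():", by_lm, "\n")
```

Both routes should give the same coefficients and the same prediction, since `predict()` simply evaluates the fitted equation at the new value of \(x\).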

Slide 4: Example of Linear Regression

Let’s explore the relationship between an employee’s years of experience
(\(x\)) and their annual salary (\(\widehat{y}\)). Our regression equation is:

\[ \widehat{y} = 50{,}000 + 3{,}000 \cdot x \]

This equation suggests that for every additional year of experience (\(x\)), we expect the annual salary (\(\widehat{y}\)) to increase by $3,000, highlighting the positive slope of the relationship.

The intercept of $50,000 represents the expected annual salary (\(\widehat{y}\)) when an employee has zero years of experience (\(x = 0\)).

As we can see in this example, the slope in a simple linear regression equation signifies the rate of change in the predicted dependent variable for a one-unit change in the independent variable.

Slide 5: Annual Salary Prediction with 8 Years of Experience

Now let's work out the predicted annual salary for an employee with 8 years of experience:

\[ \widehat{y} = 50{,}000 + 3{,}000 \cdot 8 = 74{,}000 \]

The number $74,000 represents the predicted or expected annual salary for an employee with 8 years of experience, as estimated by the simple linear regression model.
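
The same arithmetic can be sketched in R. The function name `predict_salary` and its argument names are hypothetical, introduced only for this illustration; the intercept and slope are the values from the salary equation.

```r
# Salary equation from the example: y_hat = 50,000 + 3,000 * x
# predict_salary() is a hypothetical helper written for this illustration
predict_salary <- function(years, intercept = 50000, slope = 3000) {
  intercept + slope * years
}

salary_8 <- predict_salary(8)      # 8 years of experience: 74000
salary_0 <- predict_salary(0)      # zero experience gives the intercept: 50000
one_more <- predict_salary(9) - predict_salary(8)  # one extra year adds the slope: 3000

cat("Predicted salary at 8 years:", salary_8, "\n")
cat("Predicted salary at 0 years:", salary_0, "\n")
cat("Increase per additional year:", one_more, "\n")
```

Evaluating the function at 0 returns the intercept, and the difference between any two consecutive years returns the slope, matching the interpretation of the two coefficients.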

In summary, this fundamental equation serves as the foundation for simple linear regression, allowing us to model, comprehend, and predict relationships between variables.

Slide 6: Example of LR using plotly plot

Let’s explore a linear regression example using flower data:

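library(plotly)  # provides plot_ly(), add_trace(), layout(), and the %>% pipe

# Load the flower data; the code below uses its height, weight, and treat columns
flower <- read.csv("flower.csv")

# Fit the simple linear regression once so the plot can reuse it
fit <- lm(weight ~ height, data = flower)

# Scatter plot of the raw data, with the fitted regression line overlaid in blue
plot_ly(data = flower, x = ~height, y = ~weight, type = 'scatter',
        mode = 'markers', text = ~treat) %>%
  add_trace(type = 'scatter', mode = 'lines', x = ~height,
            y = ~fitted(fit),
            line = list(color = 'blue')) %>%
  layout(title = "Scatterplot of Flower Data: Height vs Weight",
         xaxis = list(title = "Height"), yaxis = list(title = "Weight"))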

Slide 7: Example 1 of LR using ggplot2

Here, we are generating a ggplot visualization using car data:

library(ggplot2)

# Scatter plot of mtcars with a fitted regression line and its confidence band
ggplot(aes(x = disp, y = mpg), data = mtcars) + 
  geom_point() +
  geom_smooth(method = "lm", se = TRUE, color = "#008080") +
  labs(x = "Displacement", y = "Miles Per Gallon") +
  ggtitle("Scatter Plot with Linear Fit: Displacement vs. Miles Per Gallon")
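
The shaded band that `geom_smooth()` draws when `se` is enabled is a confidence interval around the fitted line (95% by default). As a hedged sketch of where those numbers come from, the code below fits the same model with `lm()` and asks `predict()` for the interval at a few displacement values. The five-point grid is an arbitrary choice for illustration.

```r
# Same model that geom_smooth(method = "lm") fits: mpg as a linear function of disp
fit <- lm(mpg ~ disp, data = mtcars)

# A few displacement values spanning the observed range (arbitrary grid)
grid <- data.frame(disp = seq(min(mtcars$disp), max(mtcars$disp), length.out = 5))

# Fitted values plus lower and upper limits of the 95% confidence band
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
print(cbind(grid, band))
```

The `fit` column traces the regression line, while `lwr` and `upr` should correspond to the lower and upper edges of the shaded region in the plot.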

Slide 8: Example 2 of LR using ggplot2

Here, we are generating a ggplot visualization using iris data:

library(ggplot2)

# Scatter plot of iris with a fitted regression line and its confidence band
ggplot(aes(x = Sepal.Length, y = Petal.Length), data = iris) + 
  geom_point() +
  geom_smooth(method = "lm", se = TRUE, color = "#FF00FF") +
  labs(x = "Sepal Length (cm)", y = "Petal Length (cm)") +
  ggtitle("Scatter Plot with Linear Fit: Sepal Length vs Petal Length")
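
To read the fitted line off the plot numerically, a minimal sketch is to fit the same model with `lm()` and inspect its coefficients. The sepal length of 6 cm used for the prediction is a hypothetical input chosen for illustration.

```r
# Same relationship as in the plot: petal length as a linear function of sepal length
fit_iris <- lm(Petal.Length ~ Sepal.Length, data = iris)

# Intercept (beta_0) and slope (beta_1) of the fitted line
b <- coef(fit_iris)
print(b)

# Predicted petal length for a flower with a 6 cm sepal (hypothetical input)
pred <- unname(predict(fit_iris, newdata = data.frame(Sepal.Length = 6)))
cat("Predicted petal length at 6 cm sepal length:", pred, "\n")
```

The slope is the expected change in petal length, in centimetres, for each additional centimetre of sepal length.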