Introduction to Simple Linear Regression

2024-02-15

Overview

Introduction to regression analysis.
Importance of simple linear regression in statistics.

What is Simple Linear Regression?

Simple linear regression is a statistical method to model the relationship between two variables. It assumes that there is a linear relationship between the predictor variable (X) and the response variable (Y).

Mathematical Representation

The simple linear regression model can be represented as:

\[ y = \beta_0 + \beta_1 x + \varepsilon \]

where: \(y\) is the dependent variable, \(x\) is the independent variable, \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\varepsilon\) is the error term.
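The coefficients are usually estimated by ordinary least squares, which chooses the intercept and slope that minimize the sum of squared residuals. As a brief sketch (the hats, the bars for sample means, and the index \(i = 1, \dots, n\) are notation introduced here, not in the model above), the standard closed-form estimates are:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

The second formula implies that the fitted line always passes through the point \((\bar{x}, \bar{y})\).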

Example Data

Let’s consider an example where we want to predict the sales of a product based on the advertising budget spent on it.

# Generate example data
set.seed(123)
budget <- seq(100, 500, by = 50)
sales <- 100 + 0.5 * budget + rnorm(length(budget), mean = 0, sd = 20)
data <- data.frame(Budget = budget, Sales = sales)

# Display the first few rows of the data
head(data)

##   Budget    Sales
## 1    100 138.7905
## 2    150 170.3965
## 3    200 231.1742
## 4    250 226.4102
## 5    300 252.5858
## 6    350 309.3013

Scatter plot
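
The figure for this slide is not reproduced in the text, and the original plotting code is not shown. A minimal sketch of how such a scatter plot could be drawn, assuming base R graphics (the choice of plot(), the title, and the point style are assumptions):

```r
# Recreate the example data so this block runs on its own
set.seed(123)
budget <- seq(100, 500, by = 50)
sales <- 100 + 0.5 * budget + rnorm(length(budget), mean = 0, sd = 20)
data <- data.frame(Budget = budget, Sales = sales)

# Scatter plot of sales against advertising budget (base R graphics)
plot(
  data$Budget, data$Sales,
  xlab = "Advertising Budget",
  ylab = "Sales",
  main = "Sales vs. Advertising Budget",
  pch = 19  # solid circles
)
```

A roughly straight upward trend in this plot is what makes a linear model a reasonable first choice for these data.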

Fit linear regression model

model <- lm(Sales ~ Budget, data = data)

Display model summary

summary(model)

## 
## Call:
## lm(formula = Sales ~ Budget, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.789 -11.413  -2.626   9.344  33.040 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 110.9711    17.5711   6.316 0.000398 ***
## Budget        0.4723     0.0538   8.778 5.02e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.84 on 7 degrees of freedom
## Multiple R-squared:  0.9167, Adjusted R-squared:  0.9048 
## F-statistic: 77.05 on 1 and 7 DF,  p-value: 5.017e-05
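
To connect the columns of the coefficient table, each t value is the estimate divided by its standard error. Using the rounded figures printed above for the slope (the hat and \(\mathrm{SE}\) notation are introduced here for illustration):

\[ t = \frac{\hat{\beta}_1}{\mathrm{SE}(\hat{\beta}_1)} = \frac{0.4723}{0.0538} \approx 8.78 \]

With a single predictor, the F-statistic is the square of the slope's t value, \(8.778^2 \approx 77.05\), which is why the two p-values in the summary agree.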

Model Interpretation

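The estimated intercept \(\hat{\beta}_0\) is approximately 111 (the data were simulated with a true intercept of 100). It represents the estimated sales when the advertising budget is zero, which lies outside the observed budget range of 100 to 500, so it should be read with caution.
The estimated slope \(\hat{\beta}_1\) is approximately 0.47 (the data were simulated with a true slope of 0.5). This indicates that for every one-unit increase in the advertising budget, sales increase by about 0.47 units on average.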
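
The interpretation can be checked directly from the fitted object. The sketch below rebuilds the example model and uses the standard extractor functions coef(), confint(), and predict(); the budget value of 300 is a hypothetical input chosen for illustration, not a value discussed in the presentation.

```r
# Recreate the example data and model so this block runs on its own
set.seed(123)
budget <- seq(100, 500, by = 50)
sales <- 100 + 0.5 * budget + rnorm(length(budget), mean = 0, sd = 20)
data <- data.frame(Budget = budget, Sales = sales)
model <- lm(Sales ~ Budget, data = data)

# Point estimates of the intercept and slope
estimates <- coef(model)
print(estimates)

# 95% confidence intervals for both coefficients
intervals <- confint(model, level = 0.95)
print(intervals)

# Predicted sales for a hypothetical budget of 300 (illustrative value)
new_budget <- data.frame(Budget = 300)
predicted_sales <- predict(model, newdata = new_budget)
print(predicted_sales)
```

Predictions are most trustworthy inside the observed budget range (100 to 500); values far outside it rely on the linear relationship continuing to hold.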

Plot of regression line

Plot using Plotly

library(plotly)  # provides plot_ly(), add_trace(), layout() and the %>% pipe

# Observed data as markers, with the fitted values overlaid as a line
plot_ly(data, x = ~Budget, y = ~Sales, type = "scatter", mode = "markers", name = "Data") %>%
  add_trace(x = ~Budget, y = predict(model), mode = "lines", name = "Regression Line") %>%
  layout(title = "Fitted Regression Line", xaxis = list(title = "Advertising Budget"), yaxis = list(title = "Sales"))

Conclusion

In this presentation, we introduced the concept of simple linear regression, demonstrated its application using an example, and interpreted the results obtained from the fitted regression model.