2023-10-15

What is Linear Regression

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the observed data. It is commonly used for predictive analysis and modeling.

The formula for a simple linear regression line is given by:

\[ Y = \alpha + \beta_1X + \epsilon \]

Where:

  • \(Y\) is the dependent variable.

  • \(X\) is the independent variable.

  • \(\alpha\) is the intercept.

  • \(\beta_1\) is the slope.

  • \(\epsilon\) represents the error term.
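To make the roles of these symbols concrete, here is a minimal sketch in base R. It simulates data from the equation above and then recovers the intercept and slope with `lm()`. The true values (`alpha_true`, `beta1_true`), the sample size and the noise level are all hypothetical choices made for illustration; they do not come from the slides.

```r
# Simulate data from Y = alpha + beta_1 * X + epsilon.
# All numeric values below are assumptions chosen for illustration only.
set.seed(42)
n          <- 200
alpha_true <- 2.0   # hypothetical intercept
beta1_true <- 0.5   # hypothetical slope

x       <- runif(n, min = 0, max = 10)   # independent variable X
epsilon <- rnorm(n, mean = 0, sd = 1)    # error term
y       <- alpha_true + beta1_true * x + epsilon

# Fit the simple linear regression by ordinary least squares.
sim_fit <- lm(y ~ x)

# Estimated intercept (alpha) and slope (beta_1).
alpha_hat <- coef(sim_fit)[["(Intercept)"]]
beta1_hat <- coef(sim_fit)[["x"]]

# Residuals are the sample counterpart of the error term.
eps_hat <- residuals(sim_fit)

print(c(alpha_hat = alpha_hat, beta1_hat = beta1_hat))
```

The estimates should land near the assumed true values, and the residuals average to roughly zero because the model includes an intercept.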

Iris dataset exploration

We will be using the data set “iris”, which ships with base R and is therefore available in RStudio without installing anything.

data("iris")
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
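Beyond `head()`, a few more base R calls give a quick picture of the data before any model is fitted. This is only a sketch of one possible exploration; the variable name `num_cor` is introduced here for illustration.

```r
data("iris")

# Structure: number of rows, column types, and factor levels of Species.
str(iris)

# Five-number summaries for the measurements and counts per species.
summary(iris)

# Pairwise correlations between the numeric columns only.
numeric_cols <- sapply(iris, is.numeric)
num_cor <- cor(iris[, numeric_cols])
print(round(num_cor, 2))
```

Strong correlations in this matrix point to pairs of variables for which a straight-line fit is a reasonable first model.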

Example 1: Sepal Length vs Petal Length

## `geom_smooth()` using formula = 'y ~ x'

Interpretation of the Simple Linear Regression Plot

In the previous slide, we generated a scatter plot with a linear regression line. The plot shows the relationship between Sepal Length and Petal Length of the iris flowers.

Key observations:

  • As Sepal Length increases, Petal Length tends to increase linearly.

  • The red line is the fitted linear regression line, that is, the least-squares line that minimizes the sum of squared vertical distances to the data points.

  • The linear regression line can be used for making predictions and for understanding the relationship between the two variables.
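The prediction use mentioned above can be sketched with base R. The example assumes Petal Length is the response and Sepal Length the predictor, matching the direction described in the observations; the sepal lengths in `new_flowers` are hypothetical inputs, not values from the slides.

```r
# Fit the regression behind the plot (response/predictor roles are an assumption).
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)

# Hypothetical new sepal lengths (in cm) to predict petal length for.
new_flowers <- data.frame(Sepal.Length = c(5.0, 6.0, 7.0))

# Point predictions plus 95% prediction intervals (columns: fit, lwr, upr).
pred <- predict(fit, newdata = new_flowers, interval = "prediction")
print(cbind(new_flowers, pred))

# Share of the variance in Petal.Length explained by the line.
r_squared <- summary(fit)$r.squared
print(r_squared)
```

The prediction intervals are wider than confidence intervals for the mean because they also account for the scatter of individual flowers around the line.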

Example 2: Petal Length vs Petal Width Code

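library(ggplot2)  # provides ggplot(), geom_point(), geom_smooth(), labs()
# Scatter plot of petal width against petal length with a fitted least-squares line
ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "green") +  # se = FALSE hides the confidence band
  labs(title = "Simple Linear Regression with Iris Dataset",
       x = "Petal Length",
       y = "Petal Width")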
## `geom_smooth()` using formula = 'y ~ x'

Example 2 Plot

## `geom_smooth()` using formula = 'y ~ x'

Polynomial Regression

Polynomial regression is a type of regression in which the relationship between the dependent variable and the independent variable(s) is modeled as an n-th degree polynomial. It is used when the relationship between the variables is not linear but can be approximated by a polynomial function.

\[ Y = \beta_0 + \beta_1X + \beta_2X^2 + \ldots + \beta_nX^n + \epsilon \]
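As a rough illustration of the equation above with n = 2, the sketch below fits a quadratic model with `lm()` and compares it against the straight-line fit. The choice of Petal Width and Petal Length as variables and the degree of 2 are assumptions made for this example; the slides do not say whether a polynomial term is actually needed for these data.

```r
# Straight-line fit for reference.
fit_linear <- lm(Petal.Width ~ Petal.Length, data = iris)

# Degree-2 polynomial fit; raw = TRUE keeps the coefficients on the
# scale of beta_0, beta_1, beta_2 in the equation above.
fit_quad <- lm(Petal.Width ~ poly(Petal.Length, 2, raw = TRUE), data = iris)

# Estimated beta_0, beta_1, beta_2.
print(coef(fit_quad))

# F-test of whether the squared term improves on the linear model.
print(anova(fit_linear, fit_quad))
```

Polynomial regression is still linear in the coefficients, which is why the same `lm()` function can fit it.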

Plotly 3D Plot