2024-10-17

High-Level Idea of Linear Regression

Linear Regression is one of the most fundamental topics in Statistics as well as AI/ML. Simple Linear Regression predicts the value of a dependent variable from a single explanatory (independent) variable. It falls under Supervised Machine Learning, where a model is trained on labeled data and then used to predict the value of a target variable.

Assumptions For Linear Regression

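  • The residuals (errors) should be approximately normally distributed; the raw data itself does not need to be.
  • The observations should be independent of one another, with no hidden grouping or links between them.
  • The spread of the errors should be roughly constant across the range of the independent variable and should not change drastically (homoscedasticity).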
  • There should be a linear relationship between the independent and dependent variables.
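
These assumptions are usually checked on a fitted model rather than on the raw data. The following is a minimal sketch, assuming the built-in iris dataset and the Petal.Width ~ Petal.Length model as an illustration; the choice of the Shapiro-Wilk test and of these two diagnostic plots is an assumption, not the only way to do it. Independence of observations depends on how the data was collected and is not checked here.

```r
# Fit a simple linear regression on the built-in iris data (illustrative choice)
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
res <- residuals(fit)

# Normality of residuals: Shapiro-Wilk test
# (a small p-value suggests the residuals are not normally distributed)
normality <- shapiro.test(res)
print(normality$p.value)

# Linearity and constant error size: residuals plotted against fitted values
# should scatter around zero with no curve and no funnel shape
plot(fitted(fit), res,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted values")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points close to the line support the normality assumption
qqnorm(res)
qqline(res)
```

If the residual plot shows a clear curve or a widening spread, a straight-line model is probably not a good description of the data.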

Mathematical Representation

Linear Regression can be represented using the given formula: \[ y=\beta_0+\beta_1 x+\epsilon \] In the given formula, y is the dependent variable that we wish to predict, \(\beta_0\) is the intercept, \(\beta_1\) is the slope, x is the independent variable and \(\epsilon\) is the error term, which is estimated by the residuals once a model has been fitted.
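
To make the formula concrete, here is a small sketch that estimates \(\beta_0\) and \(\beta_1\) in two ways: with the standard least-squares formulas (slope = covariance of x and y divided by the variance of x, intercept = mean of y minus slope times mean of x) and with R's lm() function. The iris variables are only an illustrative choice, and the names beta0 and beta1 are introduced here for the example.

```r
# Illustrative data: petal length as x, petal width as y
x <- iris$Petal.Length
y <- iris$Petal.Width

# Least-squares estimates computed by hand
beta1 <- cov(x, y) / var(x)           # slope
beta0 <- mean(y) - beta1 * mean(x)    # intercept

# The same estimates from lm()
fit <- lm(Petal.Width ~ Petal.Length, data = iris)

print(c(intercept = beta0, slope = beta1))
print(coef(fit))
```

Both approaches should print the same intercept and slope, since lm() solves the same least-squares problem.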

Mean Squared Error

Mean Squared Error is one of the most important concepts in regression. It is the average squared difference between the actual and predicted values. Calculating the Mean Squared Error gives us an idea of how well the model fits the data: the lower the MSE, the closer the predictions are to the observed values. The following is the formula for MSE: \[ MSE= \frac{1}{n} \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 \] In the given formula, n is the number of data points, \(Y_i\) is the observed value and \(\hat{Y}_i\) is the predicted value.
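
The MSE formula translates almost directly into R. The sketch below assumes the same illustrative iris model (Petal.Width predicted from Petal.Length) and computes the MSE on the data the model was trained on; the variable names observed, predicted, mse and rmse are introduced for this example.

```r
# Fit the model and collect observed and predicted values
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
observed  <- iris$Petal.Width   # Y_i
predicted <- predict(fit)       # Y-hat_i, predictions on the training data

# Mean Squared Error: average of the squared differences
mse <- mean((observed - predicted)^2)

# Root Mean Squared Error is on the same scale as the target variable
rmse <- sqrt(mse)

print(mse)
print(rmse)
```

Because this MSE is computed on the training data, it tends to be optimistic; evaluating on held-out data gives a fairer picture of predictive accuracy.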

Simple Linear Regression Scatter Plot Using Iris Dataset (ggplot2)

R Code for the Linear Regression Plot using Iris Dataset

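library(ggplot2)

# Scatter plot of petal width against petal length with a fitted regression line
ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) +
  geom_point() +
  # method = "lm" fits a linear model; se = TRUE draws the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(title = "Display of Linear Regression in Iris Dataset",
       x = "Petal Length (cm)", y = "Petal Width (cm)") +
  theme_minimal()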

Simple Linear Regression Scatter Plot Using mtcars Dataset (plotly)
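
A plot of this kind could be produced along the following lines. This is only a sketch: the choice of wt (weight) as the independent variable and mpg (miles per gallon) as the dependent variable is an assumption made for illustration, as are the title and axis labels.

```r
library(plotly)

# Fit the regression first so the fitted values can be drawn as a line
# (mpg ~ wt is an assumed, illustrative choice of variables)
fit <- lm(mpg ~ wt, data = mtcars)

# Interactive scatter plot of the observed points
p <- plot_ly(mtcars, x = ~wt, y = ~mpg,
             type = "scatter", mode = "markers", name = "Observed")

# Overlay the fitted regression line
p <- add_lines(p, x = ~wt, y = fitted(fit), name = "Fitted line")

# Title and axis labels
p <- plotly::layout(p,
                    title = "Linear Regression in mtcars Dataset",
                    xaxis = list(title = "Weight (1000 lbs)"),
                    yaxis = list(title = "Miles per gallon"))
p
```

Unlike geom_smooth() in ggplot2, plotly does not fit the model for you here, which is why lm() is called explicitly and its fitted values are passed to add_lines().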

Simple Linear Regression Scatter Plot Using Trees Dataset (ggplot2)
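
The same ggplot2 pattern used for the Iris plot carries over to the built-in trees dataset. In this sketch, Girth as the independent variable and Volume as the dependent variable are assumed for illustration, along with the title and axis labels.

```r
library(ggplot2)

# Scatter plot of timber volume against tree girth with a fitted regression line
# (Girth and Volume are an assumed, illustrative choice of variables)
p <- ggplot(trees, aes(x = Girth, y = Volume)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(title = "Display of Linear Regression in Trees Dataset",
       x = "Girth (in)", y = "Volume (cubic ft)") +
  theme_minimal()
p
```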

Additional Sources Used