Polynomial regression is another type of regression analysis used to understand the relationship between a target variable and its features. The difference between linear regression and polynomial regression is that polynomial regression can capture more complex, nonlinear relationships by fitting a polynomial equation to the data rather than the straight line of a linear regression. The model is still linear in its coefficients, which is why it can be fitted with ordinary least squares (for example, with R's lm() function).
The general equation for polynomial regression is:
\[Y = \beta_0 + \beta_1 \cdot X + \beta_2 \cdot X^2 + \beta_3 \cdot X^3 + \cdots + \beta_n \cdot X^n + \epsilon\]
Where,
Y = Target Variable
X = Predictor Variable
\(\beta_0, \beta_1, \ldots, \beta_n\) = Coefficients estimated from the data
\(n\) = Degree of the polynomial
\(\epsilon\) = Error Term
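Because the equation is linear in the coefficients \(\beta_0, \ldots, \beta_n\), it can be fitted by ordinary least squares on a design matrix whose columns are \(1, X, X^2, \ldots, X^n\). The following base-R sketch illustrates this on simulated data; the data and the names x, y, degree and X_design are hypothetical and not part of the original example. It checks that solving the least-squares problem directly gives the same coefficients as lm() with poly(x, 3, raw = TRUE), and that poly() without raw = TRUE (an orthogonal polynomial basis) gives different coefficients but the same fitted curve.

```r
# Simulated data (hypothetical): a cubic trend plus Gaussian noise
set.seed(1)
x <- runif(100, min = -2, max = 2)
y <- 1 + 0.5 * x - 2 * x^2 + 0.8 * x^3 + rnorm(100, sd = 0.5)

# Design matrix with columns 1, x, x^2, x^3 (degree n = 3)
degree <- 3
X_design <- outer(x, 0:degree, `^`)

# Ordinary least squares via a QR decomposition of the design matrix
beta_ols <- qr.solve(X_design, y)

# The same model fitted with lm() on raw polynomial terms
fit_raw <- lm(y ~ poly(x, degree, raw = TRUE))

# Orthogonal polynomial basis: different coefficients, same fitted curve
fit_orth <- lm(y ~ poly(x, degree))

print(round(beta_ols, 3))
print(round(unname(coef(fit_raw)), 3))
```

This is why the code later in this section passes raw = TRUE: only then do the reported coefficients correspond to \(\beta_0, \ldots, \beta_n\) in the equation. The orthogonal basis is better conditioned numerically at high degrees, but its coefficients cannot be read directly as the \(\beta\) terms.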
Polynomial regression is generally used when the relationship between the target variable and the predictor is nonlinear, so that the data follow a curve rather than a straight line.
For instance,
Growth Rate of Bacteria
Analyzing the Relationship between Education level and Household Income
Modeling Economic Metrics such as Inflation Rate and Stock Market Returns
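As a rough illustration of a curved relationship (simulated, not one of the datasets listed above), the sketch below compares a straight-line fit with a quadratic fit in base R. The simulated coefficients and the names fit_linear, fit_quadratic and rss are hypothetical. deviance() returns the residual sum of squares (RSS) of each lm fit, and anova() runs a nested-model F-test of whether adding the \(X^2\) term significantly improves the fit.

```r
# Simulated curved data (hypothetical): y falls and then levels off as x grows
set.seed(42)
x <- seq(1, 30, length.out = 80)
y <- 45 - 2.5 * x + 0.05 * x^2 + rnorm(80, sd = 2)

# Straight line versus second-degree polynomial
fit_linear    <- lm(y ~ x)
fit_quadratic <- lm(y ~ poly(x, 2, raw = TRUE))

# Residual sum of squares: smaller means the model tracks the data more closely
rss <- c(linear = deviance(fit_linear), quadratic = deviance(fit_quadratic))
print(rss)

# Nested-model F-test: does adding the x^2 term significantly reduce the RSS?
print(anova(fit_linear, fit_quadratic))
```

Because the linear model is nested inside the quadratic one, the quadratic RSS can never be larger; the F-test indicates whether the improvement is bigger than noise alone would explain.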
Source code: https://www.geeksforgeeks.org/polynomial-regression-in-r-programming/
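# Load the required libraries
library(tidyverse)  # data manipulation (%>%) and ggplot2 plotting
library(caret)      # createDataPartition(), RMSE(), R2()
theme_set(theme_classic())  # use a clean theme for all ggplot2 plots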
# Load the data
data("Boston", package = "MASS")
# Split the data into training (80%) and test (20%) sets
set.seed(123)
training.samples <- Boston$medv %>%
  createDataPartition(p = 0.8, list = FALSE)
train.data <- Boston[training.samples, ]
test.data <- Boston[-training.samples, ]
# Fit a degree-5 polynomial in lstat using raw (not orthogonal) terms
model <- lm(medv ~ poly(lstat, 5, raw = TRUE), data = train.data)
# Make predictions on the test set
predictions <- model %>%
  predict(test.data)
# Model performance on the test set
modelPerformance <- data.frame(
  RMSE = RMSE(predictions, test.data$medv),
  R2 = R2(predictions, test.data$medv)
)
print(modelPerformance)
##       RMSE        R2
## 1 5.270374 0.6829474
# For interpretation, fit a simpler quadratic model
print(lm(medv ~ lstat + I(lstat^2), data = train.data))
##
## Call:
## lm(formula = medv ~ lstat + I(lstat^2), data = train.data)
##
## Coefficients:
## (Intercept)        lstat   I(lstat^2)
##     42.5736      -2.2673       0.0412
This quadratic model captures a curved relationship between
lstat (percentage of lower-status population) and medv (median
home value, in $1000s). The negative coefficient on lstat means
that medv initially decreases as lstat increases, while the
positive coefficient on I(lstat^2) bends the curve upward, so
the decline flattens out at higher values of lstat.
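The shape of this curve can be checked with a little arithmetic on the printed (rounded) coefficients. For the quadratic \(\hat{Y} = \beta_0 + \beta_1 X + \beta_2 X^2\), the slope is \(\beta_1 + 2\beta_2 X\), which is zero at \(X = -\beta_1 / (2\beta_2)\). The sketch below hard-codes the three coefficients from the output; the helper names predict_medv() and slope_medv() are hypothetical. With these values the slope is negative up to lstat of about 27.5 and positive beyond it.

```r
# Coefficients of the quadratic model, as printed (rounded) in the output above
b0 <- 42.5736
b1 <- -2.2673
b2 <- 0.0412

# Fitted curve: medv = b0 + b1 * lstat + b2 * lstat^2
predict_medv <- function(lstat) b0 + b1 * lstat + b2 * lstat^2

# Slope of the curve: d(medv)/d(lstat) = b1 + 2 * b2 * lstat
slope_medv <- function(lstat) b1 + 2 * b2 * lstat

# The slope is zero at the vertex of the parabola: lstat = -b1 / (2 * b2)
turning_point <- -b1 / (2 * b2)

print(turning_point)              # about 27.52
print(slope_medv(c(5, 10, 20)))   # -1.8553 -1.4433 -0.6193
print(predict_medv(c(5, 10, 20)))
```

Whether the upturn past the turning point means anything depends on how much data lies there, which can be checked with range(train.data$lstat); with few observations beyond that point, the upward bend may simply be a consequence of forcing a quadratic shape.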
# Plot the training data with the fitted degree-5 polynomial curve
ggplot(data = train.data, aes(lstat, medv)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ poly(x, 5, raw = TRUE))
Overall, polynomial regression is a powerful extension of linear regression: it can capture nonlinear relationships between variables that a straight-line model would miss, provided the degree is chosen carefully, since a high-degree polynomial can overfit the training data.
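How flexible the fitted curve is depends on the degree \(n\) (5 in the Boston model above), and a high degree can chase noise in the training data. One way to choose it is to compare held-out error across degrees. The sketch below does this on simulated data with hypothetical names (dat, rmse(), test_rmse); it mirrors the 80/20 split used earlier but is not part of the original example. It uses poly()'s default orthogonal basis, which gives the same fitted values as raw = TRUE while staying numerically stable at high degrees.

```r
# Simulated data (hypothetical): a smooth nonlinear trend plus noise
set.seed(7)
n <- 200
x <- runif(n, min = 0, max = 10)
y <- sin(x) + 0.1 * x^2 + rnorm(n, sd = 0.4)
dat <- data.frame(x = x, y = y)

# 80/20 train/test split
train_idx <- sample(seq_len(n), size = 0.8 * n)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Root mean squared error between predictions and observed values
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

# Fit polynomials of increasing degree and score each on the held-out test set
degrees <- 1:10
test_rmse <- sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  rmse(predict(fit, newdata = test), test$y)
})

results <- data.frame(degree = degrees, test_rmse = round(test_rmse, 3))
print(results)

# Degree with the lowest held-out error
best_degree <- degrees[which.min(test_rmse)]
print(best_degree)
```

The exact numbers depend on the random seed; the pattern to look for is a steep drop in test RMSE from degree 1 followed by a plateau, or a rise once extra terms begin fitting noise. k-fold cross-validation gives a more stable estimate than a single split.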