What is Polynomial Regression?

Polynomial regression is a type of regression analysis used to model the relationship between a target variable and one or more features. Unlike linear regression, which fits a straight line, polynomial regression fits a polynomial equation to the data, so it can capture more complex, nonlinear relationships.

The general equation for polynomial regression is:

\[Y = \beta_0 + \beta_1 \cdot X + \beta_2 \cdot X^2 + \beta_3 \cdot X^3 + \cdots + \beta_n \cdot X^n + \epsilon\]

Where,

Y = Target Variable
X = Predictor Variable
\(\beta_0, \ldots, \beta_n\) = Coefficients estimated from the data
n = Degree of the polynomial
\(\epsilon\) = Error Term
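
Because the equation is linear in the coefficients, it can be fit with ordinary least squares, just like a linear regression. The sketch below uses simulated data (the variable names and the true coefficients 2, 0.5 and -1.5 are made up for illustration) to show that `lm()` with `poly(x, 2, raw = TRUE)` estimates the same coefficients as writing the powers out by hand.

```r
# Simulated data: a made-up quadratic relationship plus noise
set.seed(1)
x <- runif(200, min = -3, max = 3)
y <- 2 + 0.5 * x - 1.5 * x^2 + rnorm(200, sd = 0.5)

# Two equivalent ways to specify a degree-2 polynomial in lm()
fit_poly <- lm(y ~ poly(x, 2, raw = TRUE))
fit_manual <- lm(y ~ x + I(x^2))

# The coefficient values match; only the labels differ
print(coef(fit_poly))
print(coef(fit_manual))
```

With `raw = FALSE` (the default), `poly()` uses orthogonal polynomials instead, which gives the same fitted values but coefficients that are not directly comparable to the equation above.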

When to Use Polynomial Regression

Polynomial regression is generally appropriate when the relationship between the target variable and the predictors is nonlinear, so that the data follow a curve rather than a straight line.

For instance,

Growth Rate of Bacteria
Analyzing the Relationship between Education level and Household Income
Modeling Economic Metrics such as Inflation Rate and Stock Market Returns
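
One way to check whether a curve is warranted is to fit both a straight line and a polynomial and compare them. The sketch below is only an illustration on simulated data (it does not model any of the cases listed above). It uses a nested-model F-test via `anova()`, where a small p-value suggests that the quadratic term improves the fit.

```r
# Simulated data that is curved by construction (illustrative values only)
set.seed(42)
x <- seq(0, 10, length.out = 100)
y <- 1 + 0.8 * x^2 + rnorm(100, sd = 5)

# Straight-line fit versus quadratic fit
fit_linear <- lm(y ~ x)
fit_quad <- lm(y ~ x + I(x^2))

# F-test for nested models: does adding x^2 reduce the residual error?
comparison <- anova(fit_linear, fit_quad)
print(comparison)

# R-squared of each fit for a quick side-by-side comparison
r2 <- c(linear = summary(fit_linear)$r.squared,
        quadratic = summary(fit_quad)$r.squared)
print(r2)
```

A residual plot of the linear fit (`plot(fit_linear, which = 1)`) is another common check: a systematic curve in the residuals hints that a straight line is missing structure.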

Example of Polynomial Regression

Source Code : https://www.geeksforgeeks.org/polynomial-regression-in-r-programming/

# Import required libraries (tidyverse for %>% and ggplot2, caret for splitting and metrics)
library(tidyverse)
library(caret)
theme_set(theme_classic())

# Load the data
data("Boston", package = "MASS")
# Split the data into training and test set
set.seed(123)
training.samples <- Boston$medv %>%
    createDataPartition(p = 0.8, list = FALSE)
train.data <- Boston[training.samples, ]
test.data <- Boston[-training.samples, ]

# Build the model: a degree-5 polynomial in lstat
model <- lm(medv ~ poly(lstat, 5, raw = TRUE), data = train.data)
# Make predictions on the test set
predictions <- model %>%
    predict(test.data)
# Model performance on the test set (RMSE() and R2() come from caret)
modelPerfomance <- data.frame(RMSE = RMSE(predictions, test.data$medv),
    R2 = R2(predictions, test.data$medv))

# Fit and print a separate quadratic (degree-2) model to inspect its coefficients
print(lm(medv ~ lstat + I(lstat^2), data = train.data))

## 
## Call:
## lm(formula = medv ~ lstat + I(lstat^2), data = train.data)
## 
## Coefficients:
## (Intercept)        lstat   I(lstat^2)  
##     42.5736      -2.2673       0.0412

print(modelPerfomance)

##       RMSE        R2
## 1 5.270374 0.6829474

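The printed coefficients come from a quadratic fit of medv on lstat, whereas the RMSE and R2 reported above are for the degree-5 model evaluated on the test set. In the quadratic fit, the negative coefficient on lstat indicates that medv decreases as lstat increases, while the positive coefficient on I(lstat^2) means the curve bends upward, so the rate of decline slows as lstat grows. The plot below overlays the degree-5 fit on the training data.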

ggplot(data = train.data, aes(lstat, medv)) + geom_point() + stat_smooth(method = "lm",
    formula = y ~ poly(x, 5, raw = TRUE))
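
The degree of the polynomial (5 in the model above, 2 in the printed fit) is a modeling choice. One rough way to compare candidates is to look at held-out error for several degrees. The sketch below assumes the MASS package is installed and, to stay self-contained, uses a simple base R random split rather than the caret split from the example, so its numbers will not match the output above.

```r
# Assumes the MASS package (which provides the Boston data) is installed
data("Boston", package = "MASS")

# Simple 80/20 random split with base R (not the caret split used above)
set.seed(123)
train_idx <- sample(nrow(Boston), size = floor(0.8 * nrow(Boston)))
train <- Boston[train_idx, ]
test <- Boston[-train_idx, ]

# Held-out RMSE for polynomial degrees 1 through 5
degrees <- 1:5
test_rmse <- sapply(degrees, function(d) {
  fit <- lm(medv ~ poly(lstat, d, raw = TRUE), data = train)
  pred <- predict(fit, newdata = test)
  sqrt(mean((test$medv - pred)^2))
})

print(data.frame(degree = degrees, test_rmse = test_rmse))
```

If a higher degree barely lowers the test RMSE, the simpler model is usually the safer choice, since high-degree polynomials can overfit and behave erratically near the edges of the data.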

Overall, polynomial regression is a powerful extension of linear regression that can capture nonlinear relationships between variables that a straight-line model may overlook.

Resource

https://www.statology.org/polynomial-regression-r/

Blog 5: What is Polynomial Regression?

Nick Climaco

2023-12-15
