What is Polynomial Regression?

Polynomial regression is a type of regression analysis used to model the relationship between a target variable and one or more features. Unlike linear regression, which fits a straight line, polynomial regression fits a polynomial equation to the data, so it can capture more complex, nonlinear relationships.

The general equation for polynomial regression is:

\[Y = \beta_0 + \beta_1 \cdot X + \beta_2 \cdot X^2 + \beta_3 \cdot X^3 + \cdots + \beta_n \cdot X^n + \epsilon\]

Where,

  • Y = Target Variable

  • X = Predictor Variable

  • \(\beta_0, \beta_1, \ldots, \beta_n\) = Coefficients estimated from the data

  • \(\epsilon\) = Error Term

  • n = Degree of the polynomial
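Although the equation is nonlinear in X, it is still linear in the coefficients, which is why an ordinary linear model can fit it. The sketch below illustrates this on made-up toy data (the variables x and y and their values are hypothetical, not from the example later in this post): it checks that poly(x, 3, raw = TRUE) simply builds the columns x, x^2 and x^3, and that fitting with poly() gives the same coefficients as writing the powers out by hand.

```r
# Hypothetical toy data: y follows a cubic in x plus noise (values are made up).
set.seed(1)
x <- seq(-2, 2, length.out = 50)
y <- 1 + 2 * x - 0.5 * x^2 + 0.3 * x^3 + rnorm(50, sd = 0.2)

# poly(x, 3, raw = TRUE) returns a matrix whose columns are x, x^2 and x^3.
raw_basis <- matrix(as.numeric(poly(x, 3, raw = TRUE)), ncol = 3)
manual_basis <- unname(cbind(x, x^2, x^3))
same_columns <- isTRUE(all.equal(raw_basis, manual_basis))

# Both formulas describe the same linear model in the coefficients.
fit_poly <- lm(y ~ poly(x, 3, raw = TRUE))
fit_manual <- lm(y ~ x + I(x^2) + I(x^3))
same_coefs <- isTRUE(all.equal(unname(coef(fit_poly)), unname(coef(fit_manual))))

print(c(same_columns = same_columns, same_coefs = same_coefs))
```

With raw = FALSE (the default), poly() uses orthogonal polynomials instead, so the individual coefficients differ even though the fitted values are the same.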

When to Use Polynomial Regression

Polynomial regression is generally appropriate when the relationship between the target variable and the predictors is nonlinear, so that the data follow a curve rather than a straight line.

For instance,

  • Growth Rate of Bacteria

  • Analyzing the Relationship between Education level and Household Income

  • Modeling Economic Metrics such as Inflation Rate and Stock Market Returns
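One possible way to check whether a curve is warranted is to compare a straight-line fit against a fit with an added squared term. The sketch below does this with an F-test on the Boston housing data; it assumes the MASS package is installed, and the names fit_linear, fit_quadratic and comparison are chosen here for illustration.

```r
# Sketch: test whether adding a squared term improves on a straight line.
# Assumes the MASS package (which ships the Boston data) is installed.
data("Boston", package = "MASS")

fit_linear <- lm(medv ~ lstat, data = Boston)
fit_quadratic <- lm(medv ~ lstat + I(lstat^2), data = Boston)

# anova() on two nested lm fits runs an F-test;
# a small p-value favors the model with the squared term.
comparison <- anova(fit_linear, fit_quadratic)
p_value <- comparison[["Pr(>F)"]][2]
print(comparison)
```

A residual plot of the linear fit is a useful visual complement: a systematic curved pattern in the residuals points the same way as a small p-value here.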

Example of Polynomial Regression

Source code: https://www.geeksforgeeks.org/polynomial-regression-in-r-programming/

# Load the required libraries
library(tidyverse)
library(caret)
theme_set(theme_classic())

# Load the data
data("Boston", package = "MASS")
# Split the data into training and test set
set.seed(123)
training.samples <- Boston$medv %>%
    createDataPartition(p = 0.8, list = FALSE)
train.data <- Boston[training.samples, ]
test.data <- Boston[-training.samples, ]

# Build the model
model <- lm(medv ~ poly(lstat, 5, raw = TRUE), data = train.data)
# Make predictions
predictions <- model %>%
    predict(test.data)
# Model performance of the fifth-degree model on the test set
modelPerfomance <- data.frame(RMSE = RMSE(predictions, test.data$medv),
    R2 = R2(predictions, test.data$medv))

# Fit and print a separate quadratic model to inspect its coefficients
print(lm(medv ~ lstat + I(lstat^2), data = train.data))
## 
## Call:
## lm(formula = medv ~ lstat + I(lstat^2), data = train.data)
## 
## Coefficients:
## (Intercept)        lstat   I(lstat^2)  
##     42.5736      -2.2673       0.0412
print(modelPerfomance)
##       RMSE        R2
## 1 5.270374 0.6829474

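Note that the printed coefficients come from a separate quadratic fit, medv ~ lstat + I(lstat^2), while the RMSE and R2 above are for the fifth-degree model stored in model. In the quadratic fit, the negative coefficient on lstat means that medv initially decreases as lstat increases, whereas the positive coefficient on I(lstat^2) means that the curve bends upward, so the decline flattens out at higher values of lstat.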
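As a rough worked example using the rounded quadratic coefficients printed above (so the result is only approximate), the fitted curve reaches its minimum where its derivative with respect to lstat is zero. Writing \(\beta_1 = -2.2673\) and \(\beta_2 = 0.0412\):

\[\frac{d\,\text{medv}}{d\,\text{lstat}} = \beta_1 + 2 \beta_2 \cdot \text{lstat} = 0 \quad \Rightarrow \quad \text{lstat}^* = -\frac{\beta_1}{2 \beta_2} = \frac{2.2673}{2 \cdot 0.0412} \approx 27.5\]

Under this fit, predicted medv falls as lstat rises up to roughly 27.5 and turns upward beyond that point. A turning point like this can be an artifact of the quadratic form rather than a real feature of the data, so it should be read with caution.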

# Plot the training data with the fifth-degree polynomial fit
ggplot(data = train.data, aes(lstat, medv)) +
    geom_point() +
    stat_smooth(method = "lm", formula = y ~ poly(x, 5, raw = TRUE))
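The degree of the polynomial is a modeling choice, and one simple way to explore it is to compare held-out error across several degrees. The sketch below is a minimal illustration under stated assumptions: it uses only base R plus the Boston data from MASS, a plain random 80/20 split stands in for createDataPartition(), and the helper rmse_for_degree is a hypothetical name introduced here. Because the split differs, the numbers will not match the results reported earlier.

```r
# Sketch: compare held-out RMSE for polynomial degrees 1 to 5.
# Assumptions: MASS is installed; a plain random 80/20 split stands in for
# caret::createDataPartition(), so results differ from the run shown earlier.
data("Boston", package = "MASS")

set.seed(123)
train_idx <- sample(nrow(Boston), size = floor(0.8 * nrow(Boston)))
train_set <- Boston[train_idx, ]
test_set <- Boston[-train_idx, ]

# Hypothetical helper: fit a polynomial of the given degree on the training
# rows and return the root mean squared error on the test rows.
rmse_for_degree <- function(degree) {
  fit <- lm(medv ~ poly(lstat, degree, raw = TRUE), data = train_set)
  pred <- predict(fit, newdata = test_set)
  sqrt(mean((test_set$medv - pred)^2))
}

degrees <- 1:5
rmse_by_degree <- sapply(degrees, rmse_for_degree)
print(data.frame(degree = degrees, test_RMSE = rmse_by_degree))
```

A single split gives a noisy estimate, so cross-validation (for example with caret's train()) would be a more reliable basis for choosing the degree.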

Overall, polynomial regression is a powerful extension of linear regression that captures nonlinear relationships between variables that a purely linear model may miss.