Dataset:

Cars and salespersons

We would like to analyze the relationship between the total number of cars each salesperson has sold and the number of weeks that salesperson has worked to sell them.

Libraries

library(tidyverse)
library(ggplot2)
library(dplyr)

SalesCars <- read_csv('https://raw.githubusercontent.com/GabrielSantos33/Non-linear-regression/main/CarsSoldWeek.csv', show_col_types = FALSE)

Data Set

SalesCars

J_period	Cars_sold
168	272
428	300
296	311
392	365
80	167
56	149
352	366
444	310
168	192
200	229
4	88
52	118
20	62
228	319
72	193

Plotting the data

ggplot(SalesCars, aes(x = J_period, y = Cars_sold)) +
  geom_point() +
  labs(title = "Scatter Plot", 
       x = "Job period", 
       y = "Total cars sold")

Does it look linear, non-linear, or both?

Which model best represents this data?

Does a linear regression fit the data well?

Linear model

ggplot(SalesCars, aes(x = J_period, y = Cars_sold)) +
  geom_point() +
  geom_smooth(method = "lm", formula = "y ~ x",se = FALSE) +
  labs(title = "Scatter Plot Example", 
       x = "Job period", 
       y = "Total cars sold")

Fit linear regression model

lm_model <- lm(Cars_sold ~ J_period, data = SalesCars)

Coefficients

coeficient <- coef(lm_model)

Extract R-squared value

r_squared <- summary(lm_model)$r.squared

Predict y-values

predicted_y <- predict(lm_model)

Adding R-squared value and predicted y-values to dataframe

SalesCars$r_squared <- r_squared
SalesCars$predicted_y <- predicted_y

Display R-squared value and predicted y-values

cat("R-squared value: ", round(r_squared, 2), "\n")

## R-squared value:  0.8

cat("Predicted Y-values: ", paste(round(predicted_y, 2), collapse = ", "))

## Predicted Y-values:  212.32, 363.71, 286.85, 342.75, 161.08, 147.1, 319.46, 373.03, 212.32, 230.95, 116.83, 144.77, 126.14, 247.26, 156.42

Extract coefficients of the linear equation

intercept <- coef(lm_model)[1]
slope <- coef(lm_model)[2]
# Display the linear equation
cat("Linear equation: y =", round(intercept, 2), "+", round(slope, 2), "* x")

## Linear equation: y = 114.5 + 0.58 * x

Each additional week on the job is associated with an expected increase of about 0.58 in total cars sold, a bit more than half a car per week.

NOTE: In rare cases it makes sense to force the intercept to zero, but we will not do that here.
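As a minimal sketch of how the fitted equation is used, the block below rebuilds the data from the Data Set section and predicts total cars sold for a salesperson with 100 weeks on the job. The value 100 and the names sales, fit and new_person are illustrative assumptions, not part of the original analysis.

```r
# Data copied from the Data Set section
sales <- data.frame(
  J_period  = c(168, 428, 296, 392, 80, 56, 352, 444, 168, 200, 4, 52, 20, 228, 72),
  Cars_sold = c(272, 300, 311, 365, 167, 149, 366, 310, 192, 229, 88, 118, 62, 319, 193)
)

fit <- lm(Cars_sold ~ J_period, data = sales)

# Hypothetical salesperson with 100 weeks on the job (illustrative value)
new_person <- data.frame(J_period = 100)

# Prediction from predict() and from the equation written out by hand
pred_model <- unname(predict(fit, newdata = new_person))
pred_hand  <- unname(coef(fit)[1] + coef(fit)[2] * 100)

cat("Predicted cars sold after 100 weeks:", round(pred_model, 1), "\n")
```

Both routes give the same number, roughly 114.5 + 0.58 * 100, which is about 172.7 cars.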

Linear regression Output

print(summary(lm_model))

## 
## Call:
## lm(formula = Cars_sold ~ J_period, data = SalesCars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -64.142 -27.800   1.896  30.364  71.743 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 114.49632   19.78813   5.786 6.32e-05 ***
## J_period      0.58228    0.08026   7.255 6.41e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 45.94 on 13 degrees of freedom
## Multiple R-squared:  0.8019, Adjusted R-squared:  0.7867 
## F-statistic: 52.63 on 1 and 13 DF,  p-value: 6.412e-06

Linear equation: y = 114.5 + 0.58 * x
Multiple R = 0.8955, which is very high
R-squared: 0.8019, so the model explains about 80% of the variance in cars sold
Adjusted R-squared: 0.7867
Residual standard error: 45.94, a measure of how closely the observations fall around the regression line
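The summary statistics above can be recomputed from the residuals, which shows where each number comes from. This is a sketch under the assumption that the 15 rows in the Data Set section are the full dataset; the names sales, fit, sse and sst are illustrative.

```r
# Data copied from the Data Set section
sales <- data.frame(
  J_period  = c(168, 428, 296, 392, 80, 56, 352, 444, 168, 200, 4, 52, 20, 228, 72),
  Cars_sold = c(272, 300, 311, 365, 167, 149, 366, 310, 192, 229, 88, 118, 62, 319, 193)
)
fit <- lm(Cars_sold ~ J_period, data = sales)

n <- nrow(sales)  # number of observations
p <- 1            # number of predictors

sse <- sum(residuals(fit)^2)                               # residual sum of squares
sst <- sum((sales$Cars_sold - mean(sales$Cars_sold))^2)    # total sum of squares

r2         <- 1 - sse / sst                          # R-squared
multiple_r <- sqrt(r2)                               # multiple R
adj_r2     <- 1 - (1 - r2) * (n - 1) / (n - p - 1)   # adjusted R-squared
rse        <- sqrt(sse / (n - p - 1))                # residual standard error

cat("R-squared:", round(r2, 4),
    " Multiple R:", round(multiple_r, 4),
    " Adjusted R-squared:", round(adj_r2, 4),
    " Residual standard error:", round(rse, 2), "\n")
```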

ANOVA results

anova_results <- anova(lm_model)

print("ANOVA Results:")

## [1] "ANOVA Results:"

print(anova_results)

## Analysis of Variance Table
## 
## Response: Cars_sold
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## J_period   1 111097  111097  52.633 6.412e-06 ***
## Residuals 13  27440    2111                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

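Regression sum of squares = 111097 (1 degree of freedom)
Residual sum of squares = 27440 (13 degrees of freedom)
Total sum of squares = 111097 + 27440 = 138537
Regression mean square = 111097
Residual mean square (mean squared error) = 2111, low relative to the regression mean square
F statistic = 52.633 (regression mean square divided by residual mean square), which is very high
P-value of the F test = 6.412e-06. The significance value is very small, so the model is statistically significant.
For comparison, the coefficient table gives p-values of 6.32e-05 for the intercept and 6.41e-06 for J_period.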
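The ANOVA table can be rebuilt step by step, which makes the link between the sums of squares, the F statistic and the p-value explicit. The block is a sketch that re-enters the data from the Data Set section; variable names such as ssr, mse and f_stat are illustrative.

```r
# Data copied from the Data Set section
sales <- data.frame(
  J_period  = c(168, 428, 296, 392, 80, 56, 352, 444, 168, 200, 4, 52, 20, 228, 72),
  Cars_sold = c(272, 300, 311, 365, 167, 149, 366, 310, 192, 229, 88, 118, 62, 319, 193)
)
fit <- lm(Cars_sold ~ J_period, data = sales)

n     <- nrow(sales)
y_bar <- mean(sales$Cars_sold)

ssr <- sum((fitted(fit) - y_bar)^2)        # regression sum of squares
sse <- sum(residuals(fit)^2)               # residual sum of squares
sst <- sum((sales$Cars_sold - y_bar)^2)    # total sum of squares

msr <- ssr / 1          # regression mean square (1 degree of freedom)
mse <- sse / (n - 2)    # residual mean square (13 degrees of freedom)

f_stat  <- msr / mse
p_value <- pf(f_stat, df1 = 1, df2 = n - 2, lower.tail = FALSE)

cat("F =", round(f_stat, 3), " p-value =", signif(p_value, 4), "\n")
```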

Residuals

Let’s see the residuals

residuals <- residuals(lm_model)
# Plot residuals against the fitted (predicted) values so the x-axis matches its label
plot(fitted(lm_model), residuals, main = "Residuals Plot", xlab = "Predicted Cars_sold", ylab = "Residuals")
# Reference line at zero
abline(h = 0, lty = 2)

Transforming the explanatory variables

Simply transforming the explanatory variables does not make the model non-linear. A model such as y ~ log(x) is still linear in its coefficients, so it is still fitted with lm.

ggplot(SalesCars, aes(x = J_period, y = Cars_sold)) +
  geom_point() +
  geom_smooth(method = "lm", formula = "y ~ log(x)",se = FALSE) +
  labs(title = "Non linear Scatter Plot Example - transformation", 
       x = "Job period", 
       y = "Total cars sold")
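One way to see that the log-transformed model is still a linear model is to solve the ordinary least-squares normal equations by hand and compare with lm. This is a sketch, not part of the original analysis; fit_log, X and beta_hand are illustrative names.

```r
# Data copied from the Data Set section
sales <- data.frame(
  J_period  = c(168, 428, 296, 392, 80, 56, 352, 444, 168, 200, 4, 52, 20, 228, 72),
  Cars_sold = c(272, 300, 311, 365, 167, 149, 366, 310, 192, 229, 88, 118, 62, 319, 193)
)

# Same formula as the geom_smooth() call, fitted directly with lm()
fit_log <- lm(Cars_sold ~ log(J_period), data = sales)

# Design matrix: a column of ones and a column holding log(J_period)
X <- model.matrix(fit_log)

# Ordinary least squares via the normal equations (X'X) beta = X'y
beta_hand <- solve(crossprod(X), crossprod(X, sales$Cars_sold))

print(coef(fit_log))
print(drop(beta_hand))
```

Because the coefficients enter the model linearly, a single linear solve recovers them. No iterative search or starting values are needed, unlike the nls fits later in the document.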

Exponential model

Let's try an exponential model. It is not a good model for this data.

ggplot(SalesCars, aes(x = J_period, y = Cars_sold)) +
  geom_point() + 
  
  geom_smooth(method = "nls", formula = y ~ a * exp(b * x), se = FALSE, color = "red", 
              method.args = list(start = list(a = 5, b = 0))) + # Exponential curve fitted with nls, drawn in red
  labs(title = "Exponential model", 
       x = "X", y = "Y") + # Set title and axis labels
  theme_minimal()
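To put a number on how well the exponential curve fits, one option is to fit it directly with nls and compare its residual standard error with that of the linear model. The sketch below is an assumption-laden illustration: it takes its starting values from a log-linear fit rather than the a = 5, b = 0 used in the plot, and the names fit_lin, fit_exp and start_vals are hypothetical.

```r
# Data copied from the Data Set section
sales <- data.frame(
  J_period  = c(168, 428, 296, 392, 80, 56, 352, 444, 168, 200, 4, 52, 20, 228, 72),
  Cars_sold = c(272, 300, 311, 365, 167, 149, 366, 310, 192, 229, 88, 118, 62, 319, 193)
)

fit_lin <- lm(Cars_sold ~ J_period, data = sales)

# Starting values from a log-linear fit: log(y) = log(a) + b * x
# (an assumption of this sketch; the plot above starts from a = 5, b = 0)
start_fit  <- lm(log(Cars_sold) ~ J_period, data = sales)
start_vals <- list(a = unname(exp(coef(start_fit)[1])),
                   b = unname(coef(start_fit)[2]))

# Exponential model y = a * exp(b * x), fitted by non-linear least squares
fit_exp <- nls(Cars_sold ~ a * exp(b * J_period), data = sales, start = start_vals)

sigma_lin <- summary(fit_lin)$sigma
sigma_exp <- summary(fit_exp)$sigma

cat("Residual standard error, linear model:     ", round(sigma_lin, 2), "\n")
cat("Residual standard error, exponential model:", round(sigma_exp, 2), "\n")
```

A larger residual standard error for the exponential fit would support the statement that it is not a good model here.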

Non-linear models

Quadratic regression model (modelsq)

A non-linear fit using modelsq, a quadratic regression model. It is not a good one.

library(ggplot2)

y <- SalesCars$Cars_sold
x <- SalesCars$J_period

# Fit non-linear regression model
modelsq <- nls(y ~ a * x^2 + b * x + c, start = list(a = 1, b = 1, c = 1))
           
p <- ggplot(SalesCars, aes(x = J_period, y = Cars_sold)) +
  geom_point() +
  labs(x = "X", y = "Y")

# Add fitted line to the plot
p <- p +
  stat_function(fun = function(x) predict(modelsq, newdata = data.frame(x = x)), 
                color = "red")

print(p)

Non-linear regression Output

print(summary(modelsq))

## 
## Formula: y ~ a * x^2 + b * x + c
## 
## Parameters:
##     Estimate Std. Error t value Pr(>|t|)    
## a -0.0018521  0.0005004  -3.702  0.00303 ** 
## b  1.4094525  0.2306406   6.111 5.25e-05 ***
## c 63.8509693 19.6280432   3.253  0.00692 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32.68 on 12 degrees of freedom
## 
## Number of iterations to convergence: 1 
## Achieved convergence tolerance: 3.252e-09
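Because the quadratic formula is linear in a, b and c, the same estimates can be obtained with lm. This is consistent with nls converging in a single iteration. The sketch below is an illustration under that observation, not the original authors' method; fit_quad is a hypothetical name.

```r
# Data copied from the Data Set section
sales <- data.frame(
  J_period  = c(168, 428, 296, 392, 80, 56, 352, 444, 168, 200, 4, 52, 20, 228, 72),
  Cars_sold = c(272, 300, 311, 365, 167, 149, 366, 310, 192, 229, 88, 118, 62, 319, 193)
)

# Quadratic model fitted by ordinary least squares.
# I() keeps ^2 as arithmetic instead of formula syntax.
fit_quad <- lm(Cars_sold ~ J_period + I(J_period^2), data = sales)

# Coefficients in lm order: intercept (c), linear term (b), quadratic term (a)
quad_coef  <- unname(coef(fit_quad))
quad_sigma <- summary(fit_quad)$sigma

cat("c =", round(quad_coef[1], 4),
    " b =", round(quad_coef[2], 4),
    " a =", signif(quad_coef[3], 5),
    " residual standard error =", round(quad_sigma, 2), "\n")
```

The printed values should agree with the nls output above: c near 63.85, b near 1.41, a near -0.00185 and a residual standard error near 32.68.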

Residuals of the quadratic model

residuals <- residuals(modelsq)
# Plot residuals against the fitted (predicted) values so the x-axis matches its label
plot(fitted(modelsq), residuals, main = "Residuals Plot", xlab = "Predicted Cars_sold", ylab = "Residuals")
# Reference line at zero
abline(h = 0, lty = 2)

NonLinear Regression - Presentation

Gabriel Santos / Anjal Hussan

2023-04-18