Dataset:

Cars and salespersons

We would like to analyze the relationship between the total number of cars sold by each salesperson and the number of weeks that salesperson has worked to sell those cars.

Libraries

library(tidyverse)
library(ggplot2)
library(dplyr)
SalesCars <- read_csv('https://raw.githubusercontent.com/GabrielSantos33/Non-linear-regression/main/CarsSoldWeek.csv', show_col_types = FALSE)

Data Set

SalesCars
J_period Cars_sold
     168       272
     428       300
     296       311
     392       365
      80       167
      56       149
     352       366
     444       310
     168       192
     200       229
       4        88
      52       118
      20        62
     228       319
      72       193

Plotting the data

ggplot(SalesCars, aes(x = J_period, y = Cars_sold)) +
  geom_point() +
  labs(title = "Scatter Plot", 
       x = "Job period", 
       y = "Total cars sold")

Does it look linear, non-linear, or both?

What model best represents this data?

Does a linear regression fit the data well?

Linear model

ggplot(SalesCars, aes(x = J_period, y = Cars_sold)) +
  geom_point() +
  geom_smooth(method = "lm", formula = "y ~ x",se = FALSE) +
  labs(title = "Scatter Plot Example", 
       x = "Job period", 
       y = "Total cars sold")

Fit linear regression model

lm_model <- lm(Cars_sold ~ J_period, data = SalesCars)

Coefficients

coefficients <- coef(lm_model)

Extract R-squared value

r_squared <- summary(lm_model)$r.squared

Predict y-values

predicted_y <- predict(lm_model)

Adding R-squared value and predicted y-values to dataframe

SalesCars$r_squared <- r_squared
SalesCars$predicted_y <- predicted_y

Display R-squared value and predicted y-values

cat("R-squared value: ", round(r_squared, 2), "\n")
## R-squared value:  0.8
cat("Predicted Y-values: ", paste(round(predicted_y, 2), collapse = ", "))
## Predicted Y-values:  212.32, 363.71, 286.85, 342.75, 161.08, 147.1, 319.46, 373.03, 212.32, 230.95, 116.83, 144.77, 126.14, 247.26, 156.42

Extract coefficients of the linear equation

intercept <- coef(lm_model)[1]
slope <- coef(lm_model)[2]
# Display the linear equation
cat("Linear equation: y =", round(intercept, 2), "+", round(slope, 2), "* x")
## Linear equation: y = 114.5 + 0.58 * x

For each additional week on the job, a salesperson is expected to sell about 0.58 more cars, a bit more than half a car per week.
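As a rough illustration of how the fitted equation can be used, the sketch below refits the model on the 15 rows listed under Data Set (typed in by hand so the block runs on its own) and predicts sales for a hypothetical salesperson with 100 weeks on the job. The value 100 and the names new_person, predicted_100 and by_hand are assumptions made only for this example.

```r
# Data copied from the Data Set table so the example is self-contained
SalesCars <- data.frame(
  J_period  = c(168, 428, 296, 392, 80, 56, 352, 444, 168, 200, 4, 52, 20, 228, 72),
  Cars_sold = c(272, 300, 311, 365, 167, 149, 366, 310, 192, 229, 88, 118, 62, 319, 193)
)

lm_model <- lm(Cars_sold ~ J_period, data = SalesCars)

# Hypothetical salesperson with 100 weeks on the job (illustrative value)
new_person <- data.frame(J_period = 100)
predicted_100 <- predict(lm_model, newdata = new_person)

# The same number by hand: intercept + slope * 100
by_hand <- coef(lm_model)[["(Intercept)"]] + coef(lm_model)[["J_period"]] * 100

cat("Predicted cars sold after 100 weeks:", round(predicted_100, 2), "\n")
```

predict() and the hand calculation should agree, since both apply the same intercept and slope.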

NOTE: In rare cases it makes sense to force the intercept to zero, but we will not do that here.
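For reference, a zero-intercept fit could be requested as sketched below by adding 0 + to the model formula. This is only an illustration of the syntax and is not used in the rest of the analysis; the name origin_model is an assumption, and the data is retyped from the Data Set table so the block runs on its own.

```r
# Data copied from the Data Set table so the example is self-contained
SalesCars <- data.frame(
  J_period  = c(168, 428, 296, 392, 80, 56, 352, 444, 168, 200, 4, 52, 20, 228, 72),
  Cars_sold = c(272, 300, 311, 365, 167, 149, 366, 310, 192, 229, 88, 118, 62, 319, 193)
)

# "0 +" removes the intercept, forcing the line through the origin
origin_model <- lm(Cars_sold ~ 0 + J_period, data = SalesCars)

# Only a slope is estimated; there is no "(Intercept)" entry
print(coef(origin_model))
```

Note that R computes R-squared differently for a model without an intercept, so its value is not directly comparable with the 0.80 reported for the model with an intercept.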

Linear regression Output

print(summary(lm_model))
## 
## Call:
## lm(formula = Cars_sold ~ J_period, data = SalesCars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -64.142 -27.800   1.896  30.364  71.743 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 114.49632   19.78813   5.786 6.32e-05 ***
## J_period      0.58228    0.08026   7.255 6.41e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 45.94 on 13 degrees of freedom
## Multiple R-squared:  0.8019, Adjusted R-squared:  0.7867 
## F-statistic: 52.63 on 1 and 13 DF,  p-value: 6.412e-06

Linear equation: y = 114.5 + 0.58 * x

Multiple R = 0.8955, which is very high.

R-squared = 0.8019: the model explains about 80% of the variance in cars sold.

Adjusted R-squared = 0.7867.

Residual standard error = 45.94 on 13 degrees of freedom: this measures how closely the observations fall around the regression line.
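The Multiple R and Adjusted R-squared figures can be recovered from R-squared alone. The short sketch below does the arithmetic using the numbers printed in the summary output (n = 15 observations, 1 predictor); the variable names are chosen for this example only.

```r
# Values taken from the summary(lm_model) output
r_squared <- 0.8019
n <- 15  # observations (13 residual degrees of freedom + 2 estimated coefficients)
p <- 1   # predictors (J_period)

# Multiple R is the square root of R-squared
multiple_r <- sqrt(r_squared)

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

cat("Multiple R:", round(multiple_r, 4), "\n")
cat("Adjusted R-squared:", round(adj_r_squared, 4), "\n")
```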

ANOVA results

anova_results <- anova(lm_model)
print("ANOVA Results:")
## [1] "ANOVA Results:"
print(anova_results)
## Analysis of Variance Table
## 
## Response: Cars_sold
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## J_period   1 111097  111097  52.633 6.412e-06 ***
## Residuals 13  27440    2111                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

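Regression sum of squares = 111097 (with 1 degree of freedom, so the regression mean square is also 111097).

Residual sum of squares = 27440, so the total sum of squares = 111097 + 27440 = 138537.

Residual mean square (mean squared error) = 27440 / 13 = 2111, which is small relative to the regression mean square.

F statistic = 111097 / 2111 = 52.633, which is very high. Its p-value, 6.412e-06, is very small, so the model is statistically significant.

In the coefficient table, the p-value is 6.32e-05 for the intercept and 6.41e-06 for J_period. With a single predictor, the t-test on J_period and the overall F-test give the same p-value.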
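The quantities in the ANOVA table are tied together by simple arithmetic. The sketch below rebuilds the F statistic, its p-value and R-squared from the two sums of squares printed above. Because those sums are rounded in the output, the results should match the table only approximately. The variable names are chosen for this example.

```r
# Sums of squares and degrees of freedom from the ANOVA table
ss_regression <- 111097
ss_residual   <- 27440
df_regression <- 1
df_residual   <- 13

ss_total <- ss_regression + ss_residual

# Mean squares are sums of squares divided by their degrees of freedom
ms_regression <- ss_regression / df_regression
ms_residual   <- ss_residual / df_residual

# F statistic and its upper-tail p-value
f_stat  <- ms_regression / ms_residual
p_value <- pf(f_stat, df_regression, df_residual, lower.tail = FALSE)

# R-squared is the share of the total sum of squares explained by the regression
r_squared <- ss_regression / ss_total

cat("Total SS:", ss_total, "\n")
cat("F:", round(f_stat, 2), " p-value:", signif(p_value, 3), "\n")
cat("R-squared:", round(r_squared, 4), "\n")
```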

Residuals

Let’s see the residuals

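lm_residuals <- residuals(lm_model)
# Plot residuals against the fitted (predicted) values so the x-axis matches its label
plot(fitted(lm_model), lm_residuals, main = "Residuals Plot", xlab = "Predicted Cars_sold", ylab = "Residuals")
# Reference line at zero: residuals should scatter evenly around it
abline(h = 0, lty = 2)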

Transforming the explanatory variables

Simply transforming the explanatory variables does not make a model non-linear: it is still linear in its parameters.

ggplot(SalesCars, aes(x = J_period, y = Cars_sold)) +
  geom_point() +
  geom_smooth(method = "lm", formula = "y ~ log(x)",se = FALSE) +
  labs(title = "Non linear Scatter Plot Example - transformation", 
       x = "Job period", 
       y = "Total cars sold")
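To make the point concrete, the sketch below fits the same log-transformed relationship directly with lm(). The model is still an intercept plus a slope times a predictor; the predictor just happens to be log(J_period). The name log_model is an assumption for this example, and the data is retyped from the Data Set table so the block runs on its own.

```r
# Data copied from the Data Set table so the example is self-contained
SalesCars <- data.frame(
  J_period  = c(168, 428, 296, 392, 80, 56, 352, 444, 168, 200, 4, 52, 20, 228, 72),
  Cars_sold = c(272, 300, 311, 365, 167, 149, 366, 310, 192, 229, 88, 118, 62, 319, 193)
)

# Ordinary linear regression on a transformed predictor
# (every J_period value is positive, so log() is defined)
log_model <- lm(Cars_sold ~ log(J_period), data = SalesCars)

# Two coefficients, exactly as in the straight-line model
print(coef(log_model))
cat("R-squared:", round(summary(log_model)$r.squared, 4), "\n")
```

Because this is an lm fit, summary(), anova() and predict() can be used on it just as with the straight-line model.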

Exponential model

Let's try an exponential model. It is not a good model for this data.

ggplot(SalesCars, aes(x = J_period, y = Cars_sold)) +
  geom_point() +
  # Add a red exponential curve fitted by non-linear least squares
  geom_smooth(method = "nls", formula = y ~ a * exp(b * x), se = FALSE, color = "red",
              method.args = list(start = list(a = 5, b = 0))) +
  labs(title = "Exponential model",
       x = "X", y = "Y") + # Set title and axis labels
  theme_minimal()

Non-linear models

Quadratic regression model (modelsq)

A non-linear fit using a quadratic regression model, modelsq. It is not a good one.

library(ggplot2)

y <- SalesCars$Cars_sold
x <- SalesCars$J_period

# Fit non-linear regression model
modelsq <- nls(y ~ a * x^2 + b * x + c, start = list(a = 1, b = 1, c = 1))
           
p <- ggplot(SalesCars, aes(x = J_period, y = Cars_sold)) +
  geom_point() +
  labs(x = "X", y = "Y")

# Add fitted line to the plot
p <- p +
  stat_function(fun = function(x) predict(modelsq, newdata = data.frame(x = x)), 
                color = "red")

print(p)

Non-linear regression Output

print(summary(modelsq))
## 
## Formula: y ~ a * x^2 + b * x + c
## 
## Parameters:
##     Estimate Std. Error t value Pr(>|t|)    
## a -0.0018521  0.0005004  -3.702  0.00303 ** 
## b  1.4094525  0.2306406   6.111 5.25e-05 ***
## c 63.8509693 19.6280432   3.253  0.00692 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32.68 on 12 degrees of freedom
## 
## Number of iterations to convergence: 1 
## Achieved convergence tolerance: 3.252e-09
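One way to read the single iteration reported above: the quadratic is linear in its coefficients a, b and c, so an ordinary least-squares fit with lm() should reproduce it. The sketch below is not part of the original analysis. It refits the quadratic with lm(), using I() to protect the squared term, and prints the residual standard errors of the straight-line and quadratic fits side by side. The names lin_lm and quad_lm are assumptions, and the data is retyped from the Data Set table.

```r
# Data copied from the Data Set table so the example is self-contained
SalesCars <- data.frame(
  J_period  = c(168, 428, 296, 392, 80, 56, 352, 444, 168, 200, 4, 52, 20, 228, 72),
  Cars_sold = c(272, 300, 311, 365, 167, 149, 366, 310, 192, 229, 88, 118, 62, 319, 193)
)

# Straight-line fit and quadratic fit, both by ordinary least squares
lin_lm  <- lm(Cars_sold ~ J_period, data = SalesCars)
quad_lm <- lm(Cars_sold ~ J_period + I(J_period^2), data = SalesCars)

# Coefficients in the order c (intercept), b (linear term), a (squared term)
print(coef(quad_lm))

# Residual standard errors of the two fits
cat("Linear model residual standard error:   ", round(sigma(lin_lm), 2), "\n")
cat("Quadratic model residual standard error:", round(sigma(quad_lm), 2), "\n")
```

The coefficients should agree with the a, b and c estimates in the nls() summary, up to rounding.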