Load CSV file

Load the CSV file into the garment_prod variable.

garment_prod <- read.csv("/Users/lakshmimounikab/Desktop/Stats with R/R practice/garment_prod.csv")
# team is an identifier rather than a quantity, so store it as character
garment_prod$team <- as.character(garment_prod$team)

Load required libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidymodels)
## ── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──
## ✔ broom        1.0.5     ✔ rsample      1.2.0
## ✔ dials        1.2.0     ✔ tune         1.1.2
## ✔ infer        1.0.5     ✔ workflows    1.1.3
## ✔ modeldata    1.2.0     ✔ workflowsets 1.0.1
## ✔ parsnip      1.1.1     ✔ yardstick    1.2.0
## ✔ recipes      1.0.8     
## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## ✖ scales::discard() masks purrr::discard()
## ✖ dplyr::filter()   masks stats::filter()
## ✖ recipes::fixed()  masks stringr::fixed()
## ✖ dplyr::lag()      masks stats::lag()
## ✖ yardstick::spec() masks readr::spec()
## ✖ recipes::step()   masks stats::step()
## • Learn how to get started at https://www.tidymodels.org/start/
library(modelr)
## 
## Attaching package: 'modelr'
## 
## The following objects are masked from 'package:yardstick':
## 
##     mae, mape, rmse
## 
## The following object is masked from 'package:broom':
## 
##     bootstrap

Exploring data

glimpse(garment_prod)
## Rows: 1,197
## Columns: 15
## $ date                  <chr> "1/1/15", "1/1/15", "1/1/15", "1/1/15", "1/1/15"…
## $ quarter               <chr> "Quarter1", "Quarter1", "Quarter1", "Quarter1", …
## $ department            <chr> "sweing", "finishing ", "sweing", "sweing", "swe…
## $ day                   <chr> "Thursday", "Thursday", "Thursday", "Thursday", …
## $ team                  <chr> "8", "1", "11", "12", "6", "7", "2", "3", "2", "…
## $ targeted_productivity <dbl> 0.80, 0.75, 0.80, 0.80, 0.80, 0.80, 0.75, 0.75, …
## $ smv                   <dbl> 26.16, 3.94, 11.41, 11.41, 25.90, 25.90, 3.94, 2…
## $ wip                   <int> 1108, NA, 968, 968, 1170, 984, NA, 795, 733, 681…
## $ over_time             <int> 7080, 960, 3660, 3660, 1920, 6720, 960, 6900, 60…
## $ incentive             <int> 98, 0, 50, 50, 50, 38, 0, 45, 34, 45, 44, 45, 50…
## $ idle_time             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ idle_men              <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ no_of_style_change    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ no_of_workers         <dbl> 59.0, 8.0, 30.5, 30.5, 56.0, 56.0, 8.0, 57.5, 55…
## $ actual_productivity   <dbl> 0.9407254, 0.8865000, 0.8005705, 0.8005705, 0.80…
summary(garment_prod)
##      date             quarter           department            day           
##  Length:1197        Length:1197        Length:1197        Length:1197       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      team           targeted_productivity      smv             wip         
##  Length:1197        Min.   :0.0700        Min.   : 2.90   Min.   :    7.0  
##  Class :character   1st Qu.:0.7000        1st Qu.: 3.94   1st Qu.:  774.5  
##  Mode  :character   Median :0.7500        Median :15.26   Median : 1039.0  
##                     Mean   :0.7296        Mean   :15.06   Mean   : 1190.5  
##                     3rd Qu.:0.8000        3rd Qu.:24.26   3rd Qu.: 1252.5  
##                     Max.   :0.8000        Max.   :54.56   Max.   :23122.0  
##                                                           NA's   :506      
##    over_time       incentive         idle_time           idle_men      
##  Min.   :    0   Min.   :   0.00   Min.   :  0.0000   Min.   : 0.0000  
##  1st Qu.: 1440   1st Qu.:   0.00   1st Qu.:  0.0000   1st Qu.: 0.0000  
##  Median : 3960   Median :   0.00   Median :  0.0000   Median : 0.0000  
##  Mean   : 4567   Mean   :  38.21   Mean   :  0.7302   Mean   : 0.3693  
##  3rd Qu.: 6960   3rd Qu.:  50.00   3rd Qu.:  0.0000   3rd Qu.: 0.0000  
##  Max.   :25920   Max.   :3600.00   Max.   :300.0000   Max.   :45.0000  
##                                                                        
##  no_of_style_change no_of_workers   actual_productivity
##  Min.   :0.0000     Min.   : 2.00   Min.   :0.2337     
##  1st Qu.:0.0000     1st Qu.: 9.00   1st Qu.:0.6503     
##  Median :0.0000     Median :34.00   Median :0.7733     
##  Mean   :0.1504     Mean   :34.61   Mean   :0.7351     
##  3rd Qu.:0.0000     3rd Qu.:57.00   3rd Qu.:0.8503     
##  Max.   :2.0000     Max.   :89.00   Max.   :1.1204     
## 
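
The summary above reports 506 missing values in ‘wip’. As a minimal sketch, the per-column counts can be obtained directly. The data frame `toy` below is made up for illustration and is not the garment data; the same calls should work on garment_prod.

```r
# Toy data frame standing in for garment_prod (values are made up)
toy <- data.frame(
  smv = c(26.16, 3.94, 11.41, 25.90),
  wip = c(1108, NA, 968, NA),
  no_of_workers = c(59, 8, 30.5, 56)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Number of rows without any missing value; lm() keeps only such rows by default
n_complete <- sum(complete.cases(toy))
print(n_complete)
```

This matters for the model that follows, because rows with a missing ‘wip’ are dropped when ‘wip’ is used as a predictor.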

Linear Regression Model

To build a linear regression model, I’ll consider ‘actual_productivity’ as the response variable and ‘smv’, ‘wip’ and ‘no_of_workers’ as the explanatory variables.

# Building model on garment_prod data
model <- lm(actual_productivity ~ smv + wip + no_of_workers, data = garment_prod)
# summarizing the model
summary(model)
## 
## Call:
## lm(formula = actual_productivity ~ smv + wip + no_of_workers, 
##     data = garment_prod)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.48317 -0.05458  0.03700  0.09007  0.36298 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    7.144e-01  3.268e-02  21.862  < 2e-16 ***
## smv           -5.029e-03  1.012e-03  -4.968 8.56e-07 ***
## wip            9.991e-06  3.139e-06   3.183  0.00152 ** 
## no_of_workers  2.148e-03  7.497e-04   2.865  0.00430 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1511 on 687 degrees of freedom
##   (506 observations deleted due to missingness)
## Multiple R-squared:  0.05128,    Adjusted R-squared:  0.04714 
## F-statistic: 12.38 on 3 and 687 DF,  p-value: 6.825e-08

Coefficient interpretation

  1. Intercept (Intercept Estimate = 7.144e-01):

    • The intercept represents the estimated value of ‘actual_productivity’ when all predictor variables are zero. In this case, it’s approximately 0.7144.

    • In this model, the intercept may not have a practical interpretation, since the predictors (‘smv,’ ‘wip,’ ‘no_of_workers’) are unlikely to be zero; the smallest observed ‘smv’ is 2.90 and the smallest ‘no_of_workers’ is 2. Instead, it serves as a baseline reference point for the model.

  2. smv (smv Estimate = -5.029e-03):

    • For every one-unit increase in ‘smv,’ ‘actual_productivity’ is estimated to decrease by approximately 0.00503 units.

    • Since the estimate is negative and statistically significant (p-value < 0.001), it suggests that higher ‘smv’ values are associated with lower ‘actual_productivity.’

  3. wip (wip Estimate = 9.991e-06):

    • For every one-unit increase in ‘wip,’ ‘actual_productivity’ is estimated to increase by approximately 9.991e-06 units.

    • The estimate is positive and statistically significant (p-value = 0.00152), indicating that higher ‘wip’ values are associated with slightly higher ‘actual_productivity.’

  4. no_of_workers (no_of_workers Estimate = 2.148e-03):

    • For every one-unit increase in ‘no_of_workers,’ ‘actual_productivity’ is estimated to increase by approximately 0.00215 units.

    • The estimate is positive and statistically significant (p-value = 0.00430), suggesting that having more workers is associated with higher ‘actual_productivity.’
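
To make the coefficients concrete, here is a small worked example that combines them into a single fitted value. The coefficients are copied from the summary(model) output above; the observation `x` is hypothetical and simply uses the sample medians reported by summary(garment_prod).

```r
# Coefficients copied from the summary(model) output
b <- c(intercept = 7.144e-01, smv = -5.029e-03,
       wip = 9.991e-06, no_of_workers = 2.148e-03)

# Hypothetical observation placed at the reported sample medians
x <- c(smv = 15.26, wip = 1039, no_of_workers = 34)

# Fitted value = intercept + sum of (coefficient * predictor value)
pred <- b[["intercept"]] + sum(b[names(x)] * x)
round(pred, 3)  # about 0.721
```

The ‘smv’ term pulls the prediction down by about 0.077, while ‘wip’ and ‘no_of_workers’ add about 0.010 and 0.073, respectively.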

Diagnostic plots

# Diagnostic plots: residuals vs. fitted, normal Q-Q, scale-location, residuals vs. leverage
plot(model)
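
Calling plot() on an lm object draws four diagnostic plots one after another. A minimal sketch for viewing them together in a 2 x 2 grid is shown below. It uses simulated data (`sim` and `fit` are hypothetical names, not part of the analysis) so that it runs on its own.

```r
# Simulated data (hypothetical), used only to have a model to plot
set.seed(1)
sim <- data.frame(x = runif(100, 1, 50))
sim$y <- 0.8 - 0.005 * sim$x + rnorm(100, sd = 0.15)
fit <- lm(y ~ x, data = sim)

# Show the four default diagnostic plots in a 2 x 2 grid
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)  # restore the previous plotting layout
```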

Scatter plots

ggplot(garment_prod, aes(x = smv, y = actual_productivity)) +
  geom_point() +
  geom_smooth(method = 'lm', color = 'gray', linetype = 'dashed',
              se = FALSE) +
  geom_smooth(se = FALSE) +
  labs(y = "actual_productivity", x = "smv") + 
  ggtitle("Scatter Plot of actual_productivity vs. smv") +
  theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

ggplot(garment_prod, aes(x = wip, y = actual_productivity)) +
  geom_point() +
  geom_smooth(method = 'lm', color = 'gray', linetype = 'dashed', se = FALSE) +
  geom_smooth(se = FALSE) +
  labs(y = "actual_productivity", x = "wip") + 
  ggtitle("Scatter Plot of actual_productivity vs. wip") +
  theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 506 rows containing non-finite values (`stat_smooth()`).
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## Warning: Removed 506 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 506 rows containing missing values (`geom_point()`).

ggplot(garment_prod, aes(x = no_of_workers, y = actual_productivity)) +
  geom_point() +
  geom_smooth(method = 'lm', color = 'gray', linetype = 'dashed',
              se = FALSE) +
  geom_smooth(se = FALSE) +
  labs(y = "actual_productivity", x = "no_of_workers") + 
  ggtitle("Scatter Plot of actual_productivity vs. no_of_workers") +
  theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

Transformation

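The scatter plots above show that the relationships between the predictors and ‘actual_productivity’ are not entirely linear. We therefore apply a logarithmic transformation to ‘smv’ to linearize its relationship with the response.

We also drop the rows with missing ‘wip’ and the rows at or above the 95th percentile of ‘wip’ to eliminate high-leverage points.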

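# Work on a copy so garment_prod stays unchanged
df <- garment_prod
# Log-transform smv (its minimum is 2.90, so log() is defined for every row)
df$smv <- log(df$smv)
# Drop rows with missing wip and rows at or above the 95th percentile of wip
df <- df %>% filter(!is.na(wip) & wip < quantile(wip, 0.95, na.rm = TRUE))
# Refit the same formula on the transformed, trimmed data
model_t <- lm(actual_productivity ~ smv + wip + no_of_workers, data = df)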
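
As a rough illustration of why extreme ‘wip’ values are trimmed, the sketch below measures leverage with hatvalues() and then applies the same style of quantile cutoff. The data are simulated, and `sim`, `fit`, `lev`, and `trimmed` are hypothetical names; one artificial extreme value is planted in the last row.

```r
# Simulated data with one extreme wip value in row 100 (all values are made up)
set.seed(42)
sim <- data.frame(wip = c(runif(99, 500, 1500), 23000))
sim$y <- 0.7 + 1e-05 * sim$wip + rnorm(100, sd = 0.1)
fit <- lm(y ~ wip, data = sim)

# Leverage (hat value) of each observation; large values flag unusual predictors
lev <- hatvalues(fit)
which.max(lev)  # row with the highest leverage

# Keep only rows strictly below the 95th percentile of wip
cutoff <- quantile(sim$wip, 0.95)
trimmed <- sim[sim$wip < cutoff, ]
nrow(trimmed)
```

In a regression with a single predictor, leverage grows with the squared distance of a point from the predictor's mean, so the planted extreme value is the one flagged here.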

Diagnostic plots

plot(model_t)

Scatter plots

ggplot(df, aes(x = smv, y = actual_productivity)) +
  geom_point() +
  geom_smooth(method = 'lm', color = 'gray', linetype = 'dashed',
              se = FALSE) +
  geom_smooth(se = FALSE) +
  labs(y = "actual_productivity", x = "log(smv)") +
  ggtitle("Scatter Plot of actual_productivity vs. log(smv)") +
  theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

ggplot(df, aes(x = wip, y = actual_productivity)) +
  geom_point() +
  geom_smooth(method = 'lm', color = 'gray', linetype = 'dashed', se = FALSE) +
  geom_smooth(se = FALSE) +
  labs(y = "actual_productivity", x = "wip") + 
  ggtitle("Scatter Plot of actual_productivity vs. wip") +
  theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

summary(model_t)
## 
## Call:
## lm(formula = actual_productivity ~ smv + wip + no_of_workers, 
##     data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.52733 -0.06730  0.04090  0.09264  0.31971 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    8.071e-01  5.801e-02  13.913  < 2e-16 ***
## smv           -9.953e-02  2.265e-02  -4.395 1.29e-05 ***
## wip            1.170e-04  1.684e-05   6.952 8.77e-12 ***
## no_of_workers  2.108e-03  7.661e-04   2.752  0.00609 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1459 on 652 degrees of freedom
## Multiple R-squared:  0.09745,    Adjusted R-squared:  0.09329 
## F-statistic: 23.46 on 3 and 652 DF,  p-value: 1.969e-14

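Comparing the plots before and after the transformation, the relationships of ‘smv’ and ‘wip’ with ‘actual_productivity’ look noticeably closer to linear. The adjusted R-squared also rises from 0.047 to 0.093, although the two models are fit to different subsets of rows, so the values are not strictly comparable.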

Coefficient interpretation

  1. Intercept (Intercept Estimate = 0.8071):

    • The intercept represents the estimated value of ‘actual_productivity’ when all predictor variables in the model are zero. Because ‘smv’ is now log-transformed, that corresponds to ‘smv’ = 1 (where log(smv) = 0), ‘wip’ = 0 and ‘no_of_workers’ = 0. As before, it is a baseline reference point rather than a practically meaningful value.

  2. smv (log(smv) Estimate = -0.0995):

    • For every one-unit increase in log(‘smv’), ‘actual_productivity’ is estimated to decrease by approximately 0.0995 units.

    • The estimate is negative, indicating that higher ‘smv’ values are associated with lower ‘actual_productivity.’

    • The p-value (1.29e-05) is very small, indicating that this effect is highly statistically significant.

  3. wip (wip Estimate = 0.000117):

    • For every one-unit increase in ‘wip,’ ‘actual_productivity’ is estimated to increase by approximately 0.000117 units.

    • The estimate is positive, suggesting that higher ‘wip’ values are associated with slightly higher ‘actual_productivity.’

    • The p-value (8.77e-12) is very small, indicating that this effect is highly statistically significant.

  4. no_of_workers (no_of_workers Estimate = 0.0021):

    • For every one-unit increase in ‘no_of_workers,’ ‘actual_productivity’ is estimated to increase by approximately 0.0021 units.

    • The estimate is positive, suggesting that having more workers is associated with higher ‘actual_productivity.’

    • The p-value (0.00609) is less than 0.01, indicating that this effect is statistically significant at the 1% level, though the evidence is weaker than for ‘smv’ and ‘wip.’
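
Because ‘smv’ enters the transformed model on the log scale, its coefficient is easier to read in terms of proportional changes in ‘smv’. The short calculation below is a sketch of that reading; the coefficient is copied from the summary(model_t) output above, and `effect` is a hypothetical helper name.

```r
# Coefficient on log(smv), copied from the summary(model_t) output
b_log_smv <- -9.953e-02

# Change in predicted actual_productivity when smv is multiplied by a factor k.
# It equals b * log(k), because log(k * smv) - log(smv) = log(k).
effect <- function(k) b_log_smv * log(k)

effect(1.10)  # smv 10% higher
effect(2)     # smv doubled
```

Under this model, a 10% higher ‘smv’ corresponds to a predicted ‘actual_productivity’ that is roughly 0.009 lower, and a doubling of ‘smv’ to one that is about 0.069 lower, with ‘wip’ and ‘no_of_workers’ held fixed.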