data <- read.csv("C:\\Users\\Krishna\\Downloads\\productivity+prediction+of+garment+employees\\garments_worker_productivity.csv")
# Load the necessary libraries
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
# Display the structure and summary statistics of the dataset
str(data)
## 'data.frame':    1197 obs. of  15 variables:
##  $ date                 : chr  "01-01-2015" "01-01-2015" "01-01-2015" "01-01-2015" ...
##  $ quarter              : chr  "Quarter1" "Quarter1" "Quarter1" "Quarter1" ...
##  $ department           : chr  "sweing" "finishing " "sweing" "sweing" ...
##  $ day                  : chr  "Thursday" "Thursday" "Thursday" "Thursday" ...
##  $ team                 : int  8 1 11 12 6 7 2 3 2 1 ...
##  $ targeted_productivity: num  0.8 0.75 0.8 0.8 0.8 0.8 0.75 0.75 0.75 0.75 ...
##  $ smv                  : num  26.16 3.94 11.41 11.41 25.9 ...
##  $ wip                  : int  1108 NA 968 968 1170 984 NA 795 733 681 ...
##  $ over_time            : int  7080 960 3660 3660 1920 6720 960 6900 6000 6900 ...
##  $ incentive            : int  98 0 50 50 50 38 0 45 34 45 ...
##  $ idle_time            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ idle_men             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ no_of_style_change   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ no_of_workers        : num  59 8 30.5 30.5 56 56 8 57.5 55 57.5 ...
##  $ actual_productivity  : num  0.941 0.886 0.801 0.801 0.8 ...
summary(data)
##      date             quarter           department            day           
##  Length:1197        Length:1197        Length:1197        Length:1197       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##       team        targeted_productivity      smv             wip         
##  Min.   : 1.000   Min.   :0.0700        Min.   : 2.90   Min.   :    7.0  
##  1st Qu.: 3.000   1st Qu.:0.7000        1st Qu.: 3.94   1st Qu.:  774.5  
##  Median : 6.000   Median :0.7500        Median :15.26   Median : 1039.0  
##  Mean   : 6.427   Mean   :0.7296        Mean   :15.06   Mean   : 1190.5  
##  3rd Qu.: 9.000   3rd Qu.:0.8000        3rd Qu.:24.26   3rd Qu.: 1252.5  
##  Max.   :12.000   Max.   :0.8000        Max.   :54.56   Max.   :23122.0  
##                                                         NA's   :506      
##    over_time       incentive         idle_time           idle_men      
##  Min.   :    0   Min.   :   0.00   Min.   :  0.0000   Min.   : 0.0000  
##  1st Qu.: 1440   1st Qu.:   0.00   1st Qu.:  0.0000   1st Qu.: 0.0000  
##  Median : 3960   Median :   0.00   Median :  0.0000   Median : 0.0000  
##  Mean   : 4567   Mean   :  38.21   Mean   :  0.7302   Mean   : 0.3693  
##  3rd Qu.: 6960   3rd Qu.:  50.00   3rd Qu.:  0.0000   3rd Qu.: 0.0000  
##  Max.   :25920   Max.   :3600.00   Max.   :300.0000   Max.   :45.0000  
##                                                                        
##  no_of_style_change no_of_workers   actual_productivity
##  Min.   :0.0000     Min.   : 2.00   Min.   :0.2337     
##  1st Qu.:0.0000     1st Qu.: 9.00   1st Qu.:0.6503     
##  Median :0.0000     Median :34.00   Median :0.7733     
##  Mean   :0.1504     Mean   :34.61   Mean   :0.7351     
##  3rd Qu.:0.0000     3rd Qu.:57.00   3rd Qu.:0.8503     
##  Max.   :2.0000     Max.   :89.00   Max.   :1.1204     
## 
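The summary above reports 506 missing values in `wip`, and `str()` shows department labels with stray whitespace (`"finishing "`). A minimal sketch for checking where the missing values sit before modelling follows. The helper name `na_summary` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper (not in the original analysis): count missing values per column.
na_summary <- function(df) {
  n_missing <- colSums(is.na(df))
  data.frame(
    column = names(n_missing),
    n_missing = as.integer(n_missing),
    pct_missing = round(100 * as.numeric(n_missing) / nrow(df), 1),
    row.names = NULL
  )
}

# Usage with the data frame loaded above; skipped when `data` is not a data frame.
if (is.data.frame(get0("data"))) {
  print(na_summary(data))
  # Trim whitespace so "finishing " and "finishing" are tabulated together,
  # then cross-tabulate department against missing wip.
  print(table(department = trimws(data$department), wip_missing = is.na(data$wip)))
}
```

If the missing values turn out to be concentrated in one department, the model below is effectively fitted on the remaining department only, which is worth stating alongside the results.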
# Build linear model
lm_model <- lm(actual_productivity ~ targeted_productivity + wip + over_time, data = data)

# Summary of the model
summary(lm_model)
## 
## Call:
## lm(formula = actual_productivity ~ targeted_productivity + wip + 
##     over_time, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.48763 -0.00196  0.00539  0.03944  0.48735 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -6.503e-02  3.237e-02  -2.009  0.04492 *  
## targeted_productivity  1.056e+00  4.131e-02  25.560  < 2e-16 ***
## wip                    7.336e-06  2.286e-06   3.209  0.00139 ** 
## over_time              2.129e-06  1.469e-06   1.449  0.14768    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1101 on 687 degrees of freedom
##   (506 observations deleted due to missingness)
## Multiple R-squared:  0.4964, Adjusted R-squared:  0.4942 
## F-statistic: 225.8 on 3 and 687 DF,  p-value: < 2.2e-16

Diagnosing the model

# Residuals vs Fitted Values
plot(lm_model, which = 1)

# Normal Q-Q plot
plot(lm_model, which = 2)

# Scale-Location plot
plot(lm_model, which = 3)

# Residuals vs Leverage plot
plot(lm_model, which = 5)

Issue with the model:

One of the issues observed in the diagnostic plots is heteroscedasticity: the variance of the errors is not constant across the range of fitted values. In the “Residuals vs Fitted Values” plot this appears as a residual spread that widens or narrows as the fitted values change, and in the Scale-Location plot as a trend in the square root of the absolute standardized residuals rather than a flat band.
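A visual reading of the plots can be backed by a formal test. The sketch below implements the studentized (Koenker) Breusch-Pagan test in base R. The function name `bp_test` is hypothetical, and this test is an addition for illustration, not something the original analysis ran. Since `car` is already loaded, `car::ncvTest(lm_model)` is a packaged score test for the same question.

```r
# Hypothetical helper: studentized (Koenker) Breusch-Pagan test, base R only.
bp_test <- function(model) {
  e2 <- residuals(model)^2            # squared residuals from the fitted model
  X <- model.matrix(model)            # same rows the model was fitted on (intercept included)
  aux <- lm.fit(X, e2)                # auxiliary regression of e^2 on the predictors
  r2 <- 1 - sum(aux$residuals^2) / sum((e2 - mean(e2))^2)
  stat <- length(e2) * r2             # n * R^2, chi-squared under constant variance
  df <- ncol(X) - 1                   # number of predictors, excluding the intercept
  c(statistic = stat, df = df,
    p_value = pchisq(stat, df, lower.tail = FALSE))
}

# Usage with the model fitted above; skipped when `lm_model` does not exist.
if (exists("lm_model")) print(bp_test(lm_model))
```

A small p-value is evidence against constant error variance; a large one does not prove homoscedasticity, so the plots remain useful.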

# Extract the coefficients to interpret targeted_productivity
coef(lm_model)
##           (Intercept) targeted_productivity                   wip 
##         -6.502847e-02          1.055906e+00          7.336113e-06 
##             over_time 
##          2.128635e-06

INSIGHTS:

1) Building the linear model:

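Insight: In the fitted model, targeted productivity (p < 2e-16) and work in progress (WIP, p = 0.0014) are statistically significant predictors of actual productivity at the 5% level, while overtime is not (p = 0.148). The model explains about half of the variance in actual productivity (adjusted R-squared 0.494) and was fitted on 691 of the 1,197 rows, because the 506 observations with missing WIP were dropped.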

Significance: Understanding which variables are significant helps prioritize resources and interventions to improve productivity.
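To make the significance reading checkable rather than read off the printout by eye, the p-values can be pulled from the coefficient table. This is a minimal sketch; `significant_terms` is a hypothetical helper name and the 5% cutoff is an assumed convention.

```r
# Hypothetical helper: list each term with its p-value and a significance flag.
significant_terms <- function(model, alpha = 0.05) {
  tab <- coef(summary(model))         # matrix: Estimate, Std. Error, t value, Pr(>|t|)
  data.frame(
    term = rownames(tab),
    p_value = tab[, "Pr(>|t|)"],
    significant = tab[, "Pr(>|t|)"] < alpha,
    row.names = NULL
  )
}

# Usage with the model fitted above; skipped when `lm_model` does not exist.
if (exists("lm_model")) print(significant_terms(lm_model))
```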

2) Diagnosing the model:

Insight: Diagnostic plots indicate heteroscedasticity in the residuals, suggesting that the variance of errors changes across different levels of predicted values.

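Significance: Heteroscedasticity leaves the OLS coefficient estimates unbiased but makes the usual standard errors, t-tests, and confidence intervals unreliable, so the reported significance levels should be confirmed with robust standard errors or a remedy such as a transformation or weighted least squares.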
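One common way to guard inference against heteroscedasticity is to recompute the standard errors with a heteroscedasticity-consistent (White, HC0) estimator. The sketch below does this in base R. The name `robust_se_hc0` is hypothetical and the technique is an illustration, not a step from the original analysis. In practice `sandwich::vcovHC()` is the usual packaged route, but that package is not loaded here.

```r
# Hypothetical helper: White (HC0) robust standard errors next to the classical ones.
robust_se_hc0 <- function(model) {
  X <- model.matrix(model)            # design matrix for the rows actually used
  e <- residuals(model)
  bread <- solve(crossprod(X))        # (X'X)^-1
  meat <- crossprod(X * e)            # X' diag(e^2) X; row i of X is scaled by e[i]
  vc <- bread %*% meat %*% bread      # sandwich covariance estimate
  robust <- setNames(sqrt(diag(vc)), colnames(X))
  classical <- coef(summary(model))[, "Std. Error"]
  cbind(classical_se = classical, robust_se = robust)
}

# Usage with the model fitted above; skipped when `lm_model` does not exist.
if (exists("lm_model")) print(robust_se_hc0(lm_model))
```

Large gaps between the two columns indicate terms whose t-tests in the standard summary should be treated with caution.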

3) Highlighting issues with the model:

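Insight: The model faces challenges from heteroscedasticity and potential outliers. Heteroscedasticity distorts the standard errors rather than the coefficients themselves, while influential outliers can pull the coefficient estimates and reduce predictive accuracy. The residual summary hints at such points: the middle half of the residuals lies between about -0.002 and 0.039, yet the extremes reach roughly ±0.49.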

Significance: Identifying these issues early allows for corrective actions to be taken to enhance the model’s performance and reliability.
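Potential outliers can be screened with Cook's distance, the quantity behind the Residuals vs Leverage plot. This is a minimal sketch; `influential_rows` is a hypothetical helper, and the default cutoff of 4/n is a common rule of thumb assumed here, not a value from the original analysis.

```r
# Hypothetical helper: observations whose Cook's distance exceeds a cutoff.
influential_rows <- function(model, threshold = NULL) {
  d <- cooks.distance(model)
  # Assumed default: the 4/n rule of thumb, with n the number of fitted rows.
  if (is.null(threshold)) threshold <- 4 / length(d)
  sort(d[d > threshold], decreasing = TRUE)   # named by row label, largest first
}

# Usage with the model fitted above; skipped when `lm_model` does not exist.
if (exists("lm_model")) print(head(influential_rows(lm_model), 10))
```

Flagged rows are candidates for inspection, not automatic deletion; refitting with and without them shows how much they move the coefficients.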

4) Interpreting the coefficient:

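Insight: The coefficient for targeted productivity is about 1.056: holding WIP and overtime constant, a one-unit increase in targeted productivity is associated with an average increase of about 1.056 in actual productivity. Because targeted productivity only ranges from 0.07 to 0.80 in this dataset, a more practical reading is that raising the target by 0.1 is associated with an increase of roughly 0.106 in actual productivity.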

Significance: Understanding the direction and magnitude of the coefficients helps in making informed decisions and interventions to optimize productivity.
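The magnitude of a coefficient is easier to act on when it is expressed for a realistic increment and accompanied by a confidence interval. The sketch below rescales an estimate and its 95% interval. The helper `effect_of_increment` is hypothetical, it assumes a positive increment, and the interval relies on the classical standard errors, so it inherits the heteroscedasticity caveat noted earlier.

```r
# Hypothetical helper: effect of a given (positive) increment in one predictor,
# with the classical confidence interval rescaled to the same increment.
effect_of_increment <- function(model, term, increment, level = 0.95) {
  est <- coef(model)[[term]]
  ci <- confint(model, parm = term, level = level)   # 1 x 2 matrix: lower, upper
  c(estimate = est * increment,
    lower = ci[1, 1] * increment,
    upper = ci[1, 2] * increment)
}

# Usage with the model fitted above: effect of raising the target by 0.1.
if (exists("lm_model")) {
  print(effect_of_increment(lm_model, "targeted_productivity", 0.1))
}
```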