data <- read.csv("C:\\Users\\Krishna\\Downloads\\productivity+prediction+of+garment+employees\\garments_worker_productivity.csv")
# Load the necessary libraries
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
# Display the structure and summary statistics of the dataset
str(data)
## 'data.frame': 1197 obs. of 15 variables:
## $ date : chr "01-01-2015" "01-01-2015" "01-01-2015" "01-01-2015" ...
## $ quarter : chr "Quarter1" "Quarter1" "Quarter1" "Quarter1" ...
## $ department : chr "sweing" "finishing " "sweing" "sweing" ...
## $ day : chr "Thursday" "Thursday" "Thursday" "Thursday" ...
## $ team : int 8 1 11 12 6 7 2 3 2 1 ...
## $ targeted_productivity: num 0.8 0.75 0.8 0.8 0.8 0.8 0.75 0.75 0.75 0.75 ...
## $ smv : num 26.16 3.94 11.41 11.41 25.9 ...
## $ wip : int 1108 NA 968 968 1170 984 NA 795 733 681 ...
## $ over_time : int 7080 960 3660 3660 1920 6720 960 6900 6000 6900 ...
## $ incentive : int 98 0 50 50 50 38 0 45 34 45 ...
## $ idle_time : num 0 0 0 0 0 0 0 0 0 0 ...
## $ idle_men : int 0 0 0 0 0 0 0 0 0 0 ...
## $ no_of_style_change : int 0 0 0 0 0 0 0 0 0 0 ...
## $ no_of_workers : num 59 8 30.5 30.5 56 56 8 57.5 55 57.5 ...
## $ actual_productivity : num 0.941 0.886 0.801 0.801 0.8 ...
summary(data)
##      date             quarter           department            day
##  Length:1197        Length:1197        Length:1197        Length:1197
##  Class :character   Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
##
##       team        targeted_productivity      smv             wip
##  Min.   : 1.000   Min.   :0.0700        Min.   : 2.90   Min.   :    7.0
##  1st Qu.: 3.000   1st Qu.:0.7000        1st Qu.: 3.94   1st Qu.:  774.5
##  Median : 6.000   Median :0.7500        Median :15.26   Median : 1039.0
##  Mean   : 6.427   Mean   :0.7296        Mean   :15.06   Mean   : 1190.5
##  3rd Qu.: 9.000   3rd Qu.:0.8000        3rd Qu.:24.26   3rd Qu.: 1252.5
##  Max.   :12.000   Max.   :0.8000        Max.   :54.56   Max.   :23122.0
##                                                         NA's   :506
##    over_time       incentive         idle_time           idle_men
##  Min.   :    0   Min.   :   0.00   Min.   :  0.0000   Min.   : 0.0000
##  1st Qu.: 1440   1st Qu.:   0.00   1st Qu.:  0.0000   1st Qu.: 0.0000
##  Median : 3960   Median :   0.00   Median :  0.0000   Median : 0.0000
##  Mean   : 4567   Mean   :  38.21   Mean   :  0.7302   Mean   : 0.3693
##  3rd Qu.: 6960   3rd Qu.:  50.00   3rd Qu.:  0.0000   3rd Qu.: 0.0000
##  Max.   :25920   Max.   :3600.00   Max.   :300.0000   Max.   :45.0000
##
##  no_of_style_change no_of_workers   actual_productivity
##  Min.   :0.0000     Min.   : 2.00   Min.   :0.2337
##  1st Qu.:0.0000     1st Qu.: 9.00   1st Qu.:0.6503
##  Median :0.0000     Median :34.00   Median :0.7733
##  Mean   :0.1504     Mean   :34.61   Mean   :0.7351
##  3rd Qu.:0.0000     3rd Qu.:57.00   3rd Qu.:0.8503
##  Max.   :2.0000     Max.   :89.00   Max.   :1.1204
##
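The summary reports 506 missing values in wip, and the str() output shows a department label with a trailing space ("finishing "). Before modelling, it may help to check where the missing values fall. The following is a minimal sketch of that check; `toy` is a hypothetical stand-in data frame with made-up values, used only so the snippet runs without the CSV file. On the real data the same calls would be applied to `data`.

```r
# Hypothetical stand-in for the real data: column names follow str(data),
# but the values are invented for illustration only.
toy <- data.frame(
  department = c("sweing", "finishing ", "sweing", "finishing", "sweing"),
  wip        = c(1108, NA, 968, NA, 1170),
  stringsAsFactors = FALSE
)

# Strip stray whitespace so "finishing " and "finishing" are treated as one label
toy$department <- trimws(toy$department)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))

# Cross-tabulate missing wip against department to see where the gaps sit
na_by_dept <- table(department = toy$department, wip_missing = is.na(toy$wip))

print(na_counts)
print(na_by_dept)
```

If the missing values turn out to be concentrated in one department, a model that uses wip describes only the remaining rows, which is worth stating alongside the results.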
# Build linear model
lm_model <- lm(actual_productivity ~ targeted_productivity + wip + over_time, data = data)
# Summary of the model
summary(lm_model)
##
## Call:
## lm(formula = actual_productivity ~ targeted_productivity + wip +
## over_time, data = data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.48763 -0.00196  0.00539  0.03944  0.48735
##
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)           -6.503e-02  3.237e-02  -2.009  0.04492 *
## targeted_productivity  1.056e+00  4.131e-02  25.560  < 2e-16 ***
## wip                    7.336e-06  2.286e-06   3.209  0.00139 **
## over_time              2.129e-06  1.469e-06   1.449  0.14768
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1101 on 687 degrees of freedom
## (506 observations deleted due to missingness)
## Multiple R-squared: 0.4964, Adjusted R-squared: 0.4942
## F-statistic: 225.8 on 3 and 687 DF, p-value: < 2.2e-16
# Residuals vs Fitted Values
plot(lm_model, which = 1)
# Normal Q-Q plot
plot(lm_model, which = 2)
# Scale-Location plot
plot(lm_model, which = 3)
# Residuals vs Leverage plot
plot(lm_model, which = 5)
One issue observed in the diagnostic plots is heteroscedasticity: the variance of the errors is not constant across the range of fitted values. In the “Residuals vs Fitted” plot this appears as a spread of residuals that widens or narrows as the fitted values change, and in the Scale-Location plot as a smoothed line that trends up or down instead of staying roughly flat.
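A visual impression of heteroscedasticity can be backed up with a formal test. The sketch below computes a studentized Breusch-Pagan (Koenker) statistic by hand in base R: the squared residuals are regressed on the model's predictors, and the sample size times the R-squared of that auxiliary regression is compared with a chi-squared distribution. The helper `bp_test()` and the simulated data frame `toy` are hypothetical names introduced for illustration; in this report the function would be called as `bp_test(lm_model)`. The car package loaded above also provides `ncvTest()`, a related score test.

```r
# Studentized Breusch-Pagan (Koenker) test using base R only.
# `fit` is an lm object fitted with an intercept.
bp_test <- function(fit) {
  X   <- model.matrix(fit)[, -1, drop = FALSE]  # predictors without the intercept column
  e2  <- residuals(fit)^2                       # squared residuals
  aux <- lm(e2 ~ X)                             # auxiliary regression of e^2 on the predictors
  stat <- length(e2) * summary(aux)$r.squared   # LM statistic: n * R^2
  df   <- ncol(X)                               # one degree of freedom per predictor
  p    <- pchisq(stat, df = df, lower.tail = FALSE)
  c(statistic = stat, df = df, p.value = p)
}

# Simulated data (not the garment dataset): the error spread grows with the
# predictor, so the errors are heteroscedastic by construction.
set.seed(1)
n <- 500
toy <- data.frame(targeted_productivity = runif(n, 0.3, 0.8))
toy$actual_productivity <- 0.05 + toy$targeted_productivity +
  rnorm(n, sd = 0.3 * toy$targeted_productivity)

toy_fit <- lm(actual_productivity ~ targeted_productivity, data = toy)
result  <- bp_test(toy_fit)
print(result)
```

A small p-value is evidence against constant error variance. The test only detects variance that changes linearly with the predictors, so it complements the plots and does not replace them.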
# Extract the model coefficients, including the one for targeted_productivity
coef(lm_model)
##           (Intercept) targeted_productivity                   wip
##         -6.502847e-02          1.055906e+00          7.336113e-06
##             over_time
##          2.128635e-06
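Because targeted_productivity ranges only from 0.07 to 0.80, a full one-unit change is outside the data, and the coefficient is easier to read on a smaller step. The sketch below works through that arithmetic using the estimate, standard error, and residual degrees of freedom printed in the model output; the 0.05 step is an arbitrary illustrative choice.

```r
# Values copied from the summary(lm_model) and coef(lm_model) output
b_target  <- 1.055906    # estimate for targeted_productivity
se_target <- 4.131e-02   # its standard error
df_resid  <- 687         # residual degrees of freedom

# Expected change in actual_productivity for a 0.05 increase in the target,
# holding wip and over_time fixed (0.05 is an illustrative step size)
delta_target <- 0.05
delta_actual <- b_target * delta_target

# 95% confidence interval for the coefficient, assuming the usual
# OLS standard error can be trusted
t_crit <- qt(0.975, df = df_resid)
ci <- b_target + c(-1, 1) * t_crit * se_target

print(delta_actual)
print(round(ci, 3))
```

The interval, roughly 0.97 to 1.14, contains 1, which is consistent with actual productivity moving about one-for-one with the target. Calling `confint(lm_model)` on the fitted model gives the same interval directly. If the errors are heteroscedastic, the standard error used here is not reliable and the interval should be read with caution.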
INSIGHTS:
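1) Building the linear model:
Insight: In the fitted model, targeted productivity (p < 2e-16) and work in progress (WIP, p = 0.0014) are statistically significant predictors of actual productivity at the 5% level, while overtime is not (p = 0.148). The model explains about half of the variance in actual productivity (adjusted R-squared 0.494) and is fitted on 691 of the 1197 rows, because the 506 rows with missing WIP are dropped.
Significance: Knowing which variables are significant helps prioritize resources and interventions to improve productivity.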
2) Diagnosing the model:
Insight: The diagnostic plots indicate heteroscedasticity in the residuals: the variance of the errors changes across the range of fitted values.
Significance: Heteroscedasticity makes the usual standard errors, t-tests, and prediction intervals unreliable, so the reported p-values should not be taken at face value until the model is adjusted.
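3) Highlighting issues with the model:
Insight: The model is affected by heteroscedasticity and potential outliers. Heteroscedasticity does not bias the coefficient estimates themselves, but it invalidates their standard errors; outliers and high-leverage points can distort the estimates and reduce predictive accuracy.
Significance: Identifying these issues early allows corrective action, such as robust standard errors, a transformation of the response, or a review of influential observations, to improve the model’s performance and reliability.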
4) Interpreting the coefficient:
Insight: The coefficient for targeted productivity is about 1.056: holding WIP and overtime constant, a one-unit increase in targeted productivity is associated with an average increase of 1.056 in actual productivity. Because targets in this dataset lie between 0.07 and 0.80, a more practical reading is that raising the target by 0.10 is associated with an increase of about 0.106 in actual productivity.
Significance: Understanding the direction and magnitude of the coefficients supports informed decisions and interventions to optimize productivity.
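One common response to the heteroscedasticity noted in points 2 and 3 is to keep the coefficient estimates and replace their standard errors with heteroscedasticity-consistent ones. The sketch below computes HC1 (White) standard errors by hand in base R; it is an illustration of the technique, not the method used in this report. The helper `robust_se()` and the simulated data frame `toy` are hypothetical names; on the real model the call would be `robust_se(lm_model)`. The car package loaded above also provides `hccm()` for heteroscedasticity-corrected covariance matrices.

```r
# HC1 heteroscedasticity-consistent standard errors for an lm fit (base R only)
robust_se <- function(fit) {
  X <- model.matrix(fit)
  e <- residuals(fit)
  n <- nrow(X)
  k <- ncol(X)
  bread <- solve(crossprod(X))     # (X'X)^-1
  meat  <- crossprod(X * e)        # X' diag(e^2) X: each row of X scaled by its residual
  vc    <- bread %*% meat %*% bread * n / (n - k)  # sandwich estimator with HC1 scaling
  setNames(sqrt(diag(vc)), colnames(X))
}

# Simulated data (not the garment dataset) with error spread that grows
# with the predictor
set.seed(1)
n <- 500
toy <- data.frame(targeted_productivity = runif(n, 0.3, 0.8))
toy$actual_productivity <- 0.05 + toy$targeted_productivity +
  rnorm(n, sd = 0.3 * toy$targeted_productivity)

toy_fit <- lm(actual_productivity ~ targeted_productivity, data = toy)

# Classical and robust standard errors side by side
se_table <- cbind(
  classical  = sqrt(diag(vcov(toy_fit))),
  robust_HC1 = robust_se(toy_fit)
)
print(se_table)
```

The coefficients are unchanged; only the uncertainty attached to them differs. When the two columns diverge noticeably, significance statements based on the classical column deserve a second look.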