This assignment uses a Generalized Linear Model (GLM) to find out what affects worker productivity in a garment factory. The goal is to identify what helps or hinders workers’ effectiveness. Understanding these factors can help management improve productivity and efficiency.
Chosen Variables:
Response Variable:
Actual Productivity: This is the primary focus of the study. It quantifies the proportion of production targets achieved by workers, serving as a direct measure of their performance. It is the outcome that the model aims to predict based on other factors.
Explanatory Variables:
Overtime Hours (over_time): This explores whether working extra hours influences how much workers can accomplish.
Number of Workers (no_of_workers): Analyzes the impact of team size on productivity.
Targeted Productivity (targeted_productivity): The productivity target set for each team. It is included to test whether higher targets are associated with higher actual productivity.
Incentive: This variable represents the amount of incentive given to workers and is used to determine whether it affects their productivity levels.
data <- read.csv("C:/Users/rbada/Downloads/productivity+prediction+of+garment+employees/garments_worker_productivity.csv")
head(data)
## date quarter department day team targeted_productivity smv wip
## 1 1/1/2015 Quarter1 sweing Thursday 8 0.80 26.16 1108
## 2 1/1/2015 Quarter1 finishing Thursday 1 0.75 3.94 NA
## 3 1/1/2015 Quarter1 sweing Thursday 11 0.80 11.41 968
## 4 1/1/2015 Quarter1 sweing Thursday 12 0.80 11.41 968
## 5 1/1/2015 Quarter1 sweing Thursday 6 0.80 25.90 1170
## 6 1/1/2015 Quarter1 sweing Thursday 7 0.80 25.90 984
## over_time incentive idle_time idle_men no_of_style_change no_of_workers
## 1 7080 98 0 0 0 59.0
## 2 960 0 0 0 0 8.0
## 3 3660 50 0 0 0 30.5
## 4 3660 50 0 0 0 30.5
## 5 1920 50 0 0 0 56.0
## 6 6720 38 0 0 0 56.0
## actual_productivity
## 1 0.9407254
## 2 0.8865000
## 3 0.8005705
## 4 0.8005705
## 5 0.8003819
## 6 0.8001250
summary(data)
## date quarter department day
## Length:1197 Length:1197 Length:1197 Length:1197
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## team targeted_productivity smv wip
## Min. : 1.000 Min. :0.0700 Min. : 2.90 Min. : 7.0
## 1st Qu.: 3.000 1st Qu.:0.7000 1st Qu.: 3.94 1st Qu.: 774.5
## Median : 6.000 Median :0.7500 Median :15.26 Median : 1039.0
## Mean : 6.427 Mean :0.7296 Mean :15.06 Mean : 1190.5
## 3rd Qu.: 9.000 3rd Qu.:0.8000 3rd Qu.:24.26 3rd Qu.: 1252.5
## Max. :12.000 Max. :0.8000 Max. :54.56 Max. :23122.0
## NA's :506
## over_time incentive idle_time idle_men
## Min. : 0 Min. : 0.00 Min. : 0.0000 Min. : 0.0000
## 1st Qu.: 1440 1st Qu.: 0.00 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median : 3960 Median : 0.00 Median : 0.0000 Median : 0.0000
## Mean : 4567 Mean : 38.21 Mean : 0.7302 Mean : 0.3693
## 3rd Qu.: 6960 3rd Qu.: 50.00 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :25920 Max. :3600.00 Max. :300.0000 Max. :45.0000
##
## no_of_style_change no_of_workers actual_productivity
## Min. :0.0000 Min. : 2.00 Min. :0.2337
## 1st Qu.:0.0000 1st Qu.: 9.00 1st Qu.:0.6503
## Median :0.0000 Median :34.00 Median :0.7733
## Mean :0.1504 Mean :34.61 Mean :0.7351
## 3rd Qu.:0.0000 3rd Qu.:57.00 3rd Qu.:0.8503
## Max. :2.0000 Max. :89.00 Max. :1.1204
##
str(data)
## 'data.frame': 1197 obs. of 15 variables:
## $ date : chr "1/1/2015" "1/1/2015" "1/1/2015" "1/1/2015" ...
## $ quarter : chr "Quarter1" "Quarter1" "Quarter1" "Quarter1" ...
## $ department : chr "sweing" "finishing " "sweing" "sweing" ...
## $ day : chr "Thursday" "Thursday" "Thursday" "Thursday" ...
## $ team : int 8 1 11 12 6 7 2 3 2 1 ...
## $ targeted_productivity: num 0.8 0.75 0.8 0.8 0.8 0.8 0.75 0.75 0.75 0.75 ...
## $ smv : num 26.16 3.94 11.41 11.41 25.9 ...
## $ wip : int 1108 NA 968 968 1170 984 NA 795 733 681 ...
## $ over_time : int 7080 960 3660 3660 1920 6720 960 6900 6000 6900 ...
## $ incentive : int 98 0 50 50 50 38 0 45 34 45 ...
## $ idle_time : num 0 0 0 0 0 0 0 0 0 0 ...
## $ idle_men : int 0 0 0 0 0 0 0 0 0 0 ...
## $ no_of_style_change : int 0 0 0 0 0 0 0 0 0 0 ...
## $ no_of_workers : num 59 8 30.5 30.5 56 56 8 57.5 55 57.5 ...
## $ actual_productivity : num 0.941 0.886 0.801 0.801 0.8 ...
The first few rows of the data set were checked using head() to get a quick look at the data. Each row contains information about a team’s work on a specific day. The summary() function was then used to see basic details like the minimum, maximum, and average values for each column. This showed that the wip column had some missing values, and that actual_productivity ranges from about 0.23 to 1.12. Next, the str() function was used to check the structure of the data set. It showed that some columns, such as department, day, and quarter, were stored as text and needed to be converted to factors. It also confirmed that the data set includes 1197 rows and 15 columns. These steps helped to understand the data and prepare for the cleaning and modeling process.
data$department <- trimws(data$department)
data$department[data$department == "sweing"] <- "sewing"
data$department <- as.factor(data$department)
unique(data$department)
## [1] sewing finishing
## Levels: finishing sewing
After loading and exploring the dataset using head(), summary(), and str(), the next step was data cleaning. It was noticed that the department column contained extra spaces and a spelling mistake (“sweing” instead of “sewing”). To fix this, the trimws() function was used to remove spaces, the spelling was corrected, and the column was converted to a factor. These steps helped clean up the department names and ensure the values were consistent and ready for modeling.
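The same cleanup can be applied to several text columns at once. The following is a minimal sketch on a small made-up data frame (the `toy` object and its values are illustrative, not the garment data), assuming that quarter and day should also be treated as factors, as the str() output suggests:

```r
# Toy data frame standing in for the real data (values are illustrative only)
toy <- data.frame(
  quarter    = c("Quarter1", "Quarter1", "Quarter2"),
  department = c("sweing", "finishing ", "sweing"),
  day        = c("Thursday", "Saturday", "Thursday"),
  stringsAsFactors = FALSE
)

# Trim whitespace and fix the known misspelling before converting
toy$department <- trimws(toy$department)
toy$department[toy$department == "sweing"] <- "sewing"

# Convert every categorical text column to a factor in one step
cat_cols <- c("quarter", "department", "day")
toy[cat_cols] <- lapply(toy[cat_cols], factor)

str(toy)
```

Converting after the text cleanup matters: if the factor were created first, the misspelled and padded values would become separate levels.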
data$wip[is.na(data$wip)] <- median(data$wip, na.rm = TRUE)
print(data$wip)
## [1] 1108 1039 968 968 1170 984 1039 795 733 681 872 578
## [13] 668 1039 1039 1039 1039 861 1039 1039 1039 1039 1039 1039
## [25] 772 913 1261 844 1005 659 1152 1138 610 1039 944 1039
## [37] 544 1072 1039 1039 1039 1039 1039 539 1039 1278 1227 1039
## [49] 878 1033 782 1216 1039 1039 513 734 1202 1039 1039 884
## [61] 1039 1039 1039 1039 1039 1227 1039 1039 1255 1047 1138 678
## [73] 712 1037 757 759 1039 1039 1083 944 1039 1039 666 1039
## [85] 1039 1039 1039 1039 1187 1305 1039 1039 716 925 963 1101
## [97] 1035 1083 910 1209 590 1039 808 1039 1039 1039 1179 1324
## [109] 1135 1039 1039 1039 1039 1039 776 990 986 924 1120 1066
## [121] 1144 413 1039 1039 568 1039 1039 1039 1216 1039 1189 942
## [133] 1050 1026 1039 1039 783 857 548 411 1066 1039 1039 1039
## [145] 1039 1039 287 724 1039 1039 1039 1039 1122 970 1158 1039
## [157] 1039 1039 660 749 893 887 1335 1082 1075 749 1039 1039
## [169] 966 1039 1039 1039 1095 1039 1039 1383 1012 1209 1039 1039
## [181] 887 896 805 762 1043 1039 1039 1039 831 1039 1039 562
## [193] 1208 1039 1039 1039 1039 1039 1099 1093 1031 1233 1039 1039
## [205] 941 843 760 737 1039 381 1141 1004 1039 1039 581 1039
## [217] 1039 1039 1039 1039 1039 1073 1156 1211 1126 1039 1039 1004
## [229] 1063 723 465 1039 1039 1039 530 1297 783 715 1039 1039
## [241] 1150 1039 1039 1039 1232 1218 1039 1039 1159 972 1092 965
## [253] 1039 816 947 838 1039 1086 1039 1039 1039 1039 1039 1039
## [265] 1160 1177 1281 1369 1084 1216 391 1039 1102 872 1039 1076
## [277] 917 1044 1039 1039 1039 1039 1039 1067 1039 1396 1292 171
## [289] 1128 865 825 1039 1039 1163 1140 1027 1039 1052 1039 1039
## [301] 1039 1039 1140 624 1288 1381 882 1002 1039 585 1138 1140
## [313] 1025 1039 1031 1039 1381 1039 1039 1039 1039 1028 1380 1196
## [325] 1039 1006 930 1095 938 1484 1039 1255 1040 1492 1039 1039
## [337] 1111 1039 1047 1434 399 983 1108 1134 1363 1039 1501 1352
## [349] 1039 1352 1450 1039 1039 1039 825 916 1039 562 1422 1206
## [361] 1039 1404 1139 1735 1595 1245 1039 757 1039 1720 1039 1039
## [373] 1039 759 1510 955 1328 1384 1244 1039 1480 1863 1039 1039
## [385] 1039 1039 1492 1859 980 970 1039 1039 1039 1039 1039 1039
## [397] 1606 727 1039 1039 1039 1039 1472 1332 1009 1266 1316 1282
## [409] 1382 1039 1295 925 1039 1039 1448 1039 1039 1039 1039 1039
## [421] 1039 1039 867 1636 1682 1495 1181 1413 1039 360 1296 1217
## [433] 1086 1039 532 1557 1039 1039 1039 1039 1039 1039 1039 1016
## [445] 1072 1584 1764 1745 1250 1039 1118 1143 1337 1039 1519 749
## [457] 1635 1299 1039 1039 1039 1134 1039 1106 1039 1039 1039 1871
## [469] 1039 1039 1294 1462 976 338 1039 1455 1282 1559 1350 1039
## [481] 1039 1039 1175 1868 1039 1045 1342 1039 1340 1429 1039 1039
## [493] 1413 336 511 1039 958 1039 1416 1039 1287 1444 1088 1039
## [505] 1039 1039 1039 1039 1579 1170 1015 1039 1436 1039 1601 717
## [517] 1039 830 1136 1397 1039 1039 1039 1039 1039 1039 1039 1039
## [529] 1039 1039 1039 1039 1116 1432 1082 1502 1209 282 1417 799
## [541] 1109 1144 1039 1396 1582 1039 1124 1039 1039 1039 1282 1472
## [553] 1039 1282 1276 1039 1192 881 1196 1400 1039 16882 1039 21385
## [565] 21266 21540 1039 1039 12261 23122 8992 1039 9792 2984 1039 1039
## [577] 839 2698 1039 1435 1039 1500 1188 1039 1039 1112 1452 1039
## [589] 340 1029 1039 1070 946 913 1363 1639 1039 1039 415 1263
## [601] 968 1108 1039 1039 1039 1532 1094 1086 1039 1039 941 1471
## [613] 1042 1248 1039 326 1039 287 1300 1485 1039 1039 1063 1039
## [625] 1039 1039 1039 1083 1069 1039 668 766 557 1608 417 1167
## [637] 1039 1186 1060 1534 1142 1039 1039 1039 1021 829 1039 813
## [649] 677 1039 658 486 687 1039 652 1233 1039 1388 1615 1283
## [661] 1039 1039 1039 1039 1026 909 670 629 754 1039 1039 712
## [673] 148 154 1039 1039 1127 1506 1413 1039 970 1271 1073 767
## [685] 103 1039 773 1039 708 842 901 1039 1039 1039 1268 1546
## [697] 813 1512 1039 1263 900 1039 640 13 855 1039 855 1039
## [709] 1039 1025 1061 1039 1039 1557 1498 1039 1039 1039 598 656
## [721] 854 609 788 1069 667 1039 829 1039 11 434 1397 1039
## [733] 1327 1039 894 1039 610 779 1039 1039 419 1039 1039 909
## [745] 1039 14 1193 10 770 1416 1039 1039 1039 942 674 1039
## [757] 481 664 1039 824 957 1039 822 1261 1039 1039 1039 1039
## [769] 1420 1039 1120 1039 679 1039 881 507 814 700 1039 1039
## [781] 919 1361 12 1039 1688 1422 1039 1039 1039 1263 461 1079
## [793] 1039 895 912 1039 784 1039 7 680 1131 2103 1039 1445
## [805] 1039 1039 1039 1133 1039 726 849 1039 562 1054 1422 1039
## [817] 680 2120 179 1039 1039 838 741 1039 1039 1039 685 864
## [829] 511 1095 1374 759 553 1039 1057 1587 1039 1039 1039 1039
## [841] 868 1119 1039 962 1039 1126 265 859 276 1039 899 1039
## [853] 1039 247 653 1116 1039 1039 1603 1039 444 1039 841 1410
## [865] 1039 1039 1039 1039 1103 922 1039 1039 1055 1039 1146 746
## [877] 834 1717 1041 1531 627 1039 450 1039 1039 1039 1039 1039
## [889] 1060 891 541 709 1100 826 577 1094 1228 1641 1583 1039
## [901] 1039 1039 1039 1039 1039 1062 450 901 1039 739 913 698
## [913] 1172 1039 1039 408 1079 1068 1039 1834 1039 708 1085 1039
## [925] 1039 1039 1039 323 1053 786 1003 635 1283 847 361 1653
## [937] 1039 1039 15 762 1039 1039 1039 1039 1039 694 816 1065
## [949] 1039 912 1244 1020 916 1591 398 1039 1039 437 413 1039
## [961] 1039 1164 1039 1039 1039 1039 954 1194 644 1039 1144 1012
## [973] 792 841 1448 512 1039 1039 609 1448 556 1039 1039 1039
## [985] 1039 1039 1039 1106 622 1193 1143 895 549 1393 1578 254
## [997] 347 1039 1039 692 1039 934 1039 1039 1039 1039 1039 1039
## [1009] 1527 1035 1039 632 1231 616 1039 947 551 935 832 544
## [1021] 1511 1039 1039 1039 1039 1039 1039 601 1562 983 1156 29
## [1033] 950 534 1096 402 1039 573 1313 1039 1039 1039 1039 1039
## [1045] 1039 1039 157 1039 1039 1039 1039 1039 1039 433 961 1169
## [1057] 874 1028 1772 1206 586 1334 1039 1039 1039 30 261 52
## [1069] 1039 1039 1039 1039 1039 1039 1039 455 797 959 1373 1228
## [1081] 1117 230 1390 529 1039 834 1450 1039 986 1267 1039 1039
## [1093] 1195 983 471 1039 1226 1251 1039 1039 1039 626 1039 1039
## [1105] 1379 541 1039 1039 1039 1039 1039 1039 1039 1082 1254 1278
## [1117] 1064 313 1039 1239 1335 928 1266 869 711 1039 1039 1039
## [1129] 1039 1039 1039 1079 1322 1039 964 1132 357 1039 1039 1039
## [1141] 1193 817 1576 1039 1262 953 919 1161 1039 1039 1039 1039
## [1153] 1574 1039 1039 1039 1104 1039 1069 756 338 1039 1499 1247
## [1165] 1039 923 1180 957 1001 1017 1039 1039 1039 1039 1039 1039
## [1177] 1039 1039 470 735 560 1039 1039 1674 290 971 1322 1054
## [1189] 992 914 1128 935 1039 1039 1039 1039 1039
summary(data$wip)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7 970 1039 1126 1083 23122
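To prepare the data, missing values in the wip column (506 of 1197 rows) were filled with the column median (1039). The median was chosen because it gives a typical value and is not distorted by extreme values such as the maximum of 23122. Because so many rows are imputed, the filled column is dominated by a single value, as the printed output shows. Note that wip is not one of the chosen explanatory variables, so this imputation does not change the models fitted below; it only keeps the data set complete.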
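Before imputing, it can help to check whether missing values are concentrated in one group and to see why the median is preferred over the mean. This is a hedged sketch on a made-up data frame (`toy`, with invented values), not a result from the garment data:

```r
# Toy data: two missing wip values and one extreme value (illustrative only)
toy <- data.frame(
  department = c("sewing", "finishing", "sewing", "finishing", "sewing"),
  wip        = c(1108, NA, 968, NA, 23122)
)

# Share of missing wip values within each department
miss_by_dept <- tapply(is.na(toy$wip), toy$department, mean)
print(miss_by_dept)

# The median is far less sensitive to the extreme value than the mean
wip_median <- median(toy$wip, na.rm = TRUE)
wip_mean   <- mean(toy$wip, na.rm = TRUE)
c(median = wip_median, mean = wip_mean)

# Impute into a new column so the original values stay available
toy$wip_imputed <- ifelse(is.na(toy$wip), wip_median, toy$wip)
```

Writing the imputed values to a separate column is a design choice made here for illustration; it keeps the original missingness pattern available for later checks.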
glm_model <- glm(actual_productivity ~ over_time + no_of_workers + targeted_productivity + incentive,
family = gaussian(link = "identity"), data = data)
summary(glm_model)
##
## Call:
## glm(formula = actual_productivity ~ over_time + no_of_workers +
## targeted_productivity + incentive, family = gaussian(link = "identity"),
## data = data)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.964e-01 3.592e-02 5.469 5.51e-08 ***
## over_time 2.415e-07 2.015e-06 0.120 0.9046
## no_of_workers -2.309e-04 3.043e-04 -0.759 0.4481
## targeted_productivity 7.440e-01 4.692e-02 15.859 < 2e-16 ***
## incentive 7.007e-05 2.863e-05 2.447 0.0145 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.0249775)
##
## Null deviance: 36.413 on 1196 degrees of freedom
## Residual deviance: 29.773 on 1192 degrees of freedom
## AIC: -1012.7
##
## Number of Fisher Scoring iterations: 2
AIC(glm_model)
## [1] -1012.738
BIC(glm_model)
## [1] -982.2128
deviance(glm_model)
## [1] 29.77318
The Generalized Linear Model (GLM) gives a first view of the factors associated with worker productivity. The AIC (-1012.7) and BIC (-982.2) are only meaningful relative to other models fitted to the same data, so on their own they do not show how good the fit is. Comparing the residual deviance (29.773 on 1192 degrees of freedom) with the null deviance (36.413 on 1196 degrees of freedom) shows that the model explains about 18% of the variation in actual productivity, which is a modest fit. Two predictors are statistically significant, both with positive coefficients: targeted_productivity (estimate 0.744, p < 2e-16) and incentive (estimate 7.007e-05, p = 0.0145). Teams with higher targets and larger incentives tended to achieve higher productivity. In contrast, over_time (p = 0.9046) and no_of_workers (p = 0.4481) were not significant, so this model provides no evidence that more overtime or larger teams are associated with higher productivity. These results point to clear targets and incentives as the factors most worth management attention, although the analysis shows association rather than causation.
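The share of variation explained by the model can be computed directly from the null and residual deviance printed by summary(glm_model). The sketch below simply re-enters those two printed numbers by hand; the variable names are chosen here for illustration:

```r
null_dev  <- 36.413   # null deviance from summary(glm_model)
resid_dev <- 29.773   # residual deviance from summary(glm_model)

# Proportion of deviance explained; for a Gaussian GLM with identity link
# this is the same quantity as R-squared in ordinary linear regression
dev_explained <- 1 - resid_dev / null_dev
round(dev_explained, 3)
```

With a fitted model object available, the same quantity could be obtained as 1 - deviance(glm_model) / glm_model$null.deviance.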
step(glm_model, direction = "backward", k = 2)
## Start: AIC=-1012.74
## actual_productivity ~ over_time + no_of_workers + targeted_productivity +
## incentive
##
## Df Deviance AIC
## - over_time 1 29.774 -1014.72
## - no_of_workers 1 29.788 -1014.16
## <none> 29.773 -1012.74
## - incentive 1 29.923 -1008.74
## - targeted_productivity 1 36.055 -785.58
##
## Step: AIC=-1014.72
## actual_productivity ~ no_of_workers + targeted_productivity +
## incentive
##
## Df Deviance AIC
## - no_of_workers 1 29.798 -1015.75
## <none> 29.774 -1014.72
## - incentive 1 29.923 -1010.74
## - targeted_productivity 1 36.061 -787.39
##
## Step: AIC=-1015.75
## actual_productivity ~ targeted_productivity + incentive
##
## Df Deviance AIC
## <none> 29.798 -1015.75
## - incentive 1 29.941 -1012.00
## - targeted_productivity 1 36.200 -784.78
##
## Call: glm(formula = actual_productivity ~ targeted_productivity + incentive,
## family = gaussian(link = "identity"), data = data)
##
## Coefficients:
## (Intercept) targeted_productivity incentive
## 0.1868498 0.7478119 0.0000684
##
## Degrees of Freedom: 1196 Total (i.e. Null); 1194 Residual
## Null Deviance: 36.41
## Residual Deviance: 29.8 AIC: -1016
A Generalized Linear Model (GLM) was used to analyze factors affecting actual productivity. The model included over_time, no_of_workers, targeted_productivity, and incentive as predictors. A Gaussian distribution with an identity link function was chosen because the response variable is continuous.
To refine the model, backward stepwise selection was applied, using the Akaike Information Criterion (AIC) to decide which variables to keep. Additionally, the Variance Inflation Factor (VIF) was checked to ensure that the predictors were not too closely related, keeping the model reliable.
The stepwise selection process removed over_time first (AIC improved from -1012.74 to -1014.72) and then no_of_workers (AIC improved to -1015.75), because neither variable contributed meaningfully to explaining productivity. The final model therefore includes only targeted_productivity and incentive, both with positive coefficients (0.7478 and 0.0000684). The refined model is more parsimonious, although its residual deviance (29.798) is almost identical to that of the full model (29.773), so the gain is in simplicity rather than in fit.
actual_productivity ~ targeted_productivity + incentive
## actual_productivity ~ targeted_productivity + incentive
The final Generalized Linear Model (GLM) found that targeted productivity and incentives are both positively associated with actual productivity. Removing either one would worsen the AIC (to -1012.00 without incentive and to -784.78 without targeted_productivity), which shows that targeted_productivity carries most of the explanatory power. The number of workers had a small negative coefficient in the full model, but it was not statistically significant (p = 0.4481) and was dropped during selection, so no conclusion about team size can be drawn. AIC is a relative measure: it shows that this model is preferred among the candidates considered, not that the fit is good in absolute terms.
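One way to avoid transcribing the selected formula by hand is to store the object that step() returns. The sketch below uses simulated data (the `sim` data frame and its column names are hypothetical) only to illustrate the mechanics; it does not reproduce the garment results:

```r
set.seed(1)
n <- 200

# Simulated data: only `target` is related to the response by construction
sim <- data.frame(
  target = runif(n, 0.5, 0.8),
  extra1 = rnorm(n),
  extra2 = rnorm(n)
)
sim$y <- 0.2 + 0.75 * sim$target + rnorm(n, sd = 0.05)

full <- glm(y ~ target + extra1 + extra2, family = gaussian, data = sim)

# trace = 0 suppresses the step-by-step printout;
# the return value is the selected model itself
final <- step(full, direction = "backward", k = 2, trace = 0)

formula(final)      # the formula actually selected
AIC(full, final)    # side-by-side comparison
```

Keeping the selected model in `final` means later summaries and diagnostics can be run on it directly instead of on the initial full model.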
model <- glm(actual_productivity ~ over_time + no_of_workers + targeted_productivity + incentive,
family = gaussian(link = "identity"), data = data)
model1 <- glm(actual_productivity ~ over_time + no_of_workers,
family = gaussian(link = "identity"), data = data)
summary(model)
##
## Call:
## glm(formula = actual_productivity ~ over_time + no_of_workers +
## targeted_productivity + incentive, family = gaussian(link = "identity"),
## data = data)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.964e-01 3.592e-02 5.469 5.51e-08 ***
## over_time 2.415e-07 2.015e-06 0.120 0.9046
## no_of_workers -2.309e-04 3.043e-04 -0.759 0.4481
## targeted_productivity 7.440e-01 4.692e-02 15.859 < 2e-16 ***
## incentive 7.007e-05 2.863e-05 2.447 0.0145 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.0249775)
##
## Null deviance: 36.413 on 1196 degrees of freedom
## Residual deviance: 29.773 on 1192 degrees of freedom
## AIC: -1012.7
##
## Number of Fisher Scoring iterations: 2
summary(model1)
##
## Call:
## glm(formula = actual_productivity ~ over_time + no_of_workers,
## family = gaussian(link = "identity"), data = data)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.518e-01 9.476e-03 79.341 <2e-16 ***
## over_time -1.315e-06 2.217e-06 -0.593 0.553
## no_of_workers -3.102e-04 3.344e-04 -0.928 0.354
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.03038552)
##
## Null deviance: 36.413 on 1196 degrees of freedom
## Residual deviance: 36.280 on 1194 degrees of freedom
## AIC: -780.13
##
## Number of Fisher Scoring iterations: 2
The two models were compared to test whether including targeted_productivity and incentive actually improves the model. A full model (model, with both variables) was compared with a simpler model (model1, without them). The results show that the full model provides a much better fit to the data: it has a lower AIC (-1012.7 compared with -780.13) and a lower residual deviance (29.773 vs. 36.280), and it includes the two significant predictors, targeted_productivity and incentive. In model1, neither over_time (p = 0.553) nor no_of_workers (p = 0.354) is significant, and the residual deviance (36.280) is barely below the null deviance (36.413), so that model explains almost nothing.
# Compare AIC, BIC, deviance, and ANOVA
cat("AIC Comparison:\n")
## AIC Comparison:
print(AIC(model, model1))
## df AIC
## model 6 -1012.7382
## model1 4 -780.1305
cat("\nBIC Comparison:\n")
##
## BIC Comparison:
print(BIC(model, model1))
## df BIC
## model 6 -982.2128
## model1 4 -759.7802
cat("\nDeviance Comparison:\n")
##
## Deviance Comparison:
cat("Model Deviance:", deviance(model), "\n")
## Model Deviance: 29.77318
cat("Model1 Deviance:", deviance(model1), "\n")
## Model1 Deviance: 36.28031
cat("\nANOVA Comparison:\n")
##
## ANOVA Comparison:
print(anova(model, model1, test = "Chisq"))
## Analysis of Deviance Table
##
## Model 1: actual_productivity ~ over_time + no_of_workers + targeted_productivity +
## incentive
## Model 2: actual_productivity ~ over_time + no_of_workers
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 1192 29.773
## 2 1194 36.280 -2 -6.5071 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The full model (model), which included targeted_productivity and incentive, had an AIC of -1012.74 and a BIC of -982.21, while the simplified model (model1), which excluded these variables, had a higher AIC of -780.13 and BIC of -759.78. The analysis of deviance agrees: dropping the two variables increases the residual deviance by 6.5071 on 2 degrees of freedom (p < 2.2e-16). These results indicate that the full model fits the data much better, and that the two extra parameters are justified even under the stronger BIC penalty. targeted_productivity and incentive therefore contribute significantly to the model's explanatory power and should not be removed.
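For a Gaussian GLM the dispersion is estimated from the data, so an F test is a common alternative to the chi-square test when comparing nested models. The sketch below uses simulated data (the `sim` data frame and the x1, x2, x3 names are hypothetical) to show the call; it is an illustration, not a re-analysis:

```r
set.seed(42)
n <- 150

# Simulated data: x3 has a real effect, x1 and x2 do not
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 1 + 0.5 * sim$x3 + rnorm(n)

small <- glm(y ~ x1 + x2,      family = gaussian, data = sim)
large <- glm(y ~ x1 + x2 + x3, family = gaussian, data = sim)

# Listing the smaller model first gives positive Df and Deviance differences
cmp <- anova(small, large, test = "F")
print(cmp)

p_value <- cmp[2, "Pr(>F)"]
```

Putting the smaller model first is only a presentation choice; the test itself is the same in either order, but the signs of the Df and Deviance columns are easier to read.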
par(mfrow = c(1, 2))
plot(glm_model$fitted.values, glm_model$residuals,
xlab = "Fitted Values", ylab = "Residuals",
main = "Residuals vs Fitted Values")
abline(h = 0, col = "red")
qqnorm(glm_model$residuals, main = "Normal Q-Q Plot")
qqline(glm_model$residuals, col = "red")
par(mfrow = c(1, 1))
The residual plot shows the residuals spread around zero without an obvious pattern, which gives no clear sign of non-linearity or changing variance. The Q-Q plot shows that the residuals mostly follow a normal distribution, with some deviations at the ends, suggesting the model's errors are approximately normal. Overall, the diagnostics reveal no major violations of the model assumptions, though there may be a few outliers or minor departures from normality.
library(car)
## Warning: package 'car' was built under R version 4.4.3
## Loading required package: carData
vif(glm_model)
## over_time no_of_workers targeted_productivity
## 2.180145 2.184469 1.009947
## incentive
## 1.007314
All VIF values are well below the commonly used thresholds of 5 or 10, so multicollinearity is not a concern. over_time and no_of_workers have the highest values (about 2.18 each), which indicates some correlation between these two predictors, while targeted_productivity and incentive are close to 1. Each coefficient can therefore be estimated reliably, allowing for a clearer understanding of each predictor's individual association with productivity.
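For a model with only numeric predictors, the VIF of a predictor can be reproduced in base R as 1 / (1 - R²), where R² comes from regressing that predictor on the others. The sketch below uses the built-in mtcars data as a stand-in; `vif_manual` is a helper written for this illustration, not a function from the car package:

```r
# VIF for one predictor: regress it on the other predictors, then 1 / (1 - R^2)
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    fit <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(fit)$r.squared)
  })
}

# mtcars ships with R; these columns are stand-ins, not the garment variables
vifs <- vif_manual(mtcars, c("wt", "hp", "disp"))
round(vifs, 2)
```

A value of 1 means a predictor is uncorrelated with the others; larger values show how much the variance of its coefficient is inflated by that correlation.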
library(boot)
##
## Attaching package: 'boot'
## The following object is masked from 'package:car':
##
## logit
cv_result <- cv.glm(data, glm_model, K = 10)
cv_result$delta
## [1] 0.02509775 0.02508557
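cv.glm() returns two values in delta: the raw 10-fold cross-validation estimate of prediction error (mean squared error, 0.02510) and a bias-adjusted version (0.02509). Both are close to the in-sample mean squared error (residual deviance divided by the number of rows, 29.773 / 1197 ≈ 0.0249). Because the error on held-out folds is almost the same as the error on the data used for fitting, the model does not appear to be overfitting. The folds are assigned at random, so the exact values change slightly between runs unless a seed is set.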
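The comparison between cross-validated and in-sample error can be made explicit in code. This is a sketch on the built-in mtcars data (a stand-in for the garment data), assuming the boot package is installed; K = 8 is used here only because it divides the 32 rows evenly:

```r
library(boot)  # recommended package included with standard R installations

set.seed(123)  # fold assignment is random, so fix the seed for reproducibility
fit <- glm(mpg ~ wt + hp, family = gaussian, data = mtcars)

cv <- cv.glm(mtcars, fit, K = 8)

# In-sample mean squared error for comparison
in_sample_mse <- mean(residuals(fit, type = "response")^2)

cv_mse_raw      <- cv$delta[1]  # raw K-fold estimate of prediction error
cv_mse_adjusted <- cv$delta[2]  # bias-adjusted estimate

c(in_sample = in_sample_mse, cv_raw = cv_mse_raw, cv_adjusted = cv_mse_adjusted)
```

A cross-validated error far above the in-sample error would point to overfitting; values that are close to each other suggest the model generalizes about as well as it fits.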
AIC(glm_model)
## [1] -1012.738
BIC(glm_model)
## [1] -982.2128
deviance(glm_model)
## [1] 29.77318
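These AIC, BIC, and deviance values are identical to those reported after the first fit because glm_model itself has not changed. They are relative measures: they are useful for comparing candidate models fitted to the same data, as in the stepwise selection and the comparison of model with model1, but they do not by themselves show that the fit is good or that no further simplification is possible.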
plot(glm_model$fitted.values, glm_model$residuals)
abline(h = 0, col = "red")
The Residuals vs. Fitted Values plot shows that the residuals are randomly scattered around zero. This means the model fits the data well and doesn’t have major issues like non-linearity or changing variance. The model appears to be a good fit for the data.
# Store the result under a different name so the cooks.distance() function is not shadowed
cooks_d <- cooks.distance(glm_model)
plot(cooks_d)
The Cook's distance plot shows that most points have little influence on the model. A few points have more influence, but they do not greatly change the fit. Overall, the diagnostic tools suggest no major issues related to residuals, multicollinearity, or overfitting. The model appears stable, with two predictors (targeted_productivity and incentive) showing significant effects on the response variable, although it explains only a modest share of the variation in productivity.
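Influential points can be listed rather than only read off the plot. The sketch below uses the built-in mtcars data as a stand-in and applies the 4 / n cutoff, which is a common rule of thumb and an assumption made here, not a threshold taken from the analysis above:

```r
fit <- glm(mpg ~ wt + hp, family = gaussian, data = mtcars)

cooks_d   <- cooks.distance(fit)
threshold <- 4 / nrow(mtcars)   # rule-of-thumb cutoff, not a formal test

# Indices (and row names) of observations above the cutoff
flagged <- which(cooks_d > threshold)

plot(cooks_d, type = "h", ylab = "Cook's distance")
abline(h = threshold, lty = 2, col = "red")

cooks_d[flagged]
```

Flagged observations are candidates for closer inspection, for example refitting the model without them to see whether the coefficients change materially.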
summary(glm_model)
##
## Call:
## glm(formula = actual_productivity ~ over_time + no_of_workers +
## targeted_productivity + incentive, family = gaussian(link = "identity"),
## data = data)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.964e-01 3.592e-02 5.469 5.51e-08 ***
## over_time 2.415e-07 2.015e-06 0.120 0.9046
## no_of_workers -2.309e-04 3.043e-04 -0.759 0.4481
## targeted_productivity 7.440e-01 4.692e-02 15.859 < 2e-16 ***
## incentive 7.007e-05 2.863e-05 2.447 0.0145 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.0249775)
##
## Null deviance: 36.413 on 1196 degrees of freedom
## Residual deviance: 29.773 on 1192 degrees of freedom
## AIC: -1012.7
##
## Number of Fisher Scoring iterations: 2
Targeted productivity and incentives are the only statistically significant predictors of actual productivity, and both coefficients are positive: higher targets and higher incentives are associated with greater productivity. Overtime hours and the number of workers do not significantly affect productivity. The coefficient for the number of workers is slightly negative (-2.309e-04), but with p = 0.4481 it cannot be distinguished from zero.
This analysis used a Generalized Linear Model (GLM) to study factors affecting worker productivity in a garment factory. The results showed that targeted productivity had a strong positive association with actual productivity (estimate 0.744, p < 2e-16), and incentive had a smaller but statistically significant positive effect (estimate 7.007e-05, p = 0.0145). Overtime hours and the number of workers did not show significant effects, so the model gives no evidence that increasing them improves productivity. Model diagnostics showed no major issues, although the model explains only about 18% of the variation in actual productivity. Overall, targeted productivity and incentive were the key explanatory variables. Including both improved model fit and provided insights that can help management make more informed decisions to enhance productivity on the factory floor.