Build a linear (or generalized linear) model as you like

Use whatever response variable and explanatory variables you prefer

This assignment uses a Generalized Linear Model (GLM) to find out what affects worker productivity in a garment factory. The goal is to identify what helps or hinders workers’ effectiveness. Understanding these factors can help management improve productivity and efficiency.

Chosen Variables:

Response Variable:

Actual Productivity (actual_productivity): This is the primary focus of the study. It records the share of productivity actually delivered by a team on a given day, on a scale that is nominally 0 to 1 (a few observations exceed 1). It serves as a direct measure of performance and is the outcome the model aims to explain from the other factors.

Explanatory Variables:

Overtime (over_time): The amount of overtime worked by each team (recorded in minutes in this dataset). This explores whether working extra time influences how much workers can accomplish.

Number of Workers (no_of_workers): Analyzes the impact of team size on productivity.

Targeted Productivity (targeted_productivity): The productivity target set for each team. This tests whether teams given higher targets also deliver higher actual productivity.

Incentive (incentive): The amount of financial incentive given to workers, used to determine whether incentives affect their productivity levels.

data <- read.csv("C:/Users/rbada/Downloads/productivity+prediction+of+garment+employees/garments_worker_productivity.csv")
head(data)
##       date  quarter department      day team targeted_productivity   smv  wip
## 1 1/1/2015 Quarter1     sweing Thursday    8                  0.80 26.16 1108
## 2 1/1/2015 Quarter1 finishing  Thursday    1                  0.75  3.94   NA
## 3 1/1/2015 Quarter1     sweing Thursday   11                  0.80 11.41  968
## 4 1/1/2015 Quarter1     sweing Thursday   12                  0.80 11.41  968
## 5 1/1/2015 Quarter1     sweing Thursday    6                  0.80 25.90 1170
## 6 1/1/2015 Quarter1     sweing Thursday    7                  0.80 25.90  984
##   over_time incentive idle_time idle_men no_of_style_change no_of_workers
## 1      7080        98         0        0                  0          59.0
## 2       960         0         0        0                  0           8.0
## 3      3660        50         0        0                  0          30.5
## 4      3660        50         0        0                  0          30.5
## 5      1920        50         0        0                  0          56.0
## 6      6720        38         0        0                  0          56.0
##   actual_productivity
## 1           0.9407254
## 2           0.8865000
## 3           0.8005705
## 4           0.8005705
## 5           0.8003819
## 6           0.8001250
summary(data)
##      date             quarter           department            day           
##  Length:1197        Length:1197        Length:1197        Length:1197       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##       team        targeted_productivity      smv             wip         
##  Min.   : 1.000   Min.   :0.0700        Min.   : 2.90   Min.   :    7.0  
##  1st Qu.: 3.000   1st Qu.:0.7000        1st Qu.: 3.94   1st Qu.:  774.5  
##  Median : 6.000   Median :0.7500        Median :15.26   Median : 1039.0  
##  Mean   : 6.427   Mean   :0.7296        Mean   :15.06   Mean   : 1190.5  
##  3rd Qu.: 9.000   3rd Qu.:0.8000        3rd Qu.:24.26   3rd Qu.: 1252.5  
##  Max.   :12.000   Max.   :0.8000        Max.   :54.56   Max.   :23122.0  
##                                                         NA's   :506      
##    over_time       incentive         idle_time           idle_men      
##  Min.   :    0   Min.   :   0.00   Min.   :  0.0000   Min.   : 0.0000  
##  1st Qu.: 1440   1st Qu.:   0.00   1st Qu.:  0.0000   1st Qu.: 0.0000  
##  Median : 3960   Median :   0.00   Median :  0.0000   Median : 0.0000  
##  Mean   : 4567   Mean   :  38.21   Mean   :  0.7302   Mean   : 0.3693  
##  3rd Qu.: 6960   3rd Qu.:  50.00   3rd Qu.:  0.0000   3rd Qu.: 0.0000  
##  Max.   :25920   Max.   :3600.00   Max.   :300.0000   Max.   :45.0000  
##                                                                        
##  no_of_style_change no_of_workers   actual_productivity
##  Min.   :0.0000     Min.   : 2.00   Min.   :0.2337     
##  1st Qu.:0.0000     1st Qu.: 9.00   1st Qu.:0.6503     
##  Median :0.0000     Median :34.00   Median :0.7733     
##  Mean   :0.1504     Mean   :34.61   Mean   :0.7351     
##  3rd Qu.:0.0000     3rd Qu.:57.00   3rd Qu.:0.8503     
##  Max.   :2.0000     Max.   :89.00   Max.   :1.1204     
## 
str(data)
## 'data.frame':    1197 obs. of  15 variables:
##  $ date                 : chr  "1/1/2015" "1/1/2015" "1/1/2015" "1/1/2015" ...
##  $ quarter              : chr  "Quarter1" "Quarter1" "Quarter1" "Quarter1" ...
##  $ department           : chr  "sweing" "finishing " "sweing" "sweing" ...
##  $ day                  : chr  "Thursday" "Thursday" "Thursday" "Thursday" ...
##  $ team                 : int  8 1 11 12 6 7 2 3 2 1 ...
##  $ targeted_productivity: num  0.8 0.75 0.8 0.8 0.8 0.8 0.75 0.75 0.75 0.75 ...
##  $ smv                  : num  26.16 3.94 11.41 11.41 25.9 ...
##  $ wip                  : int  1108 NA 968 968 1170 984 NA 795 733 681 ...
##  $ over_time            : int  7080 960 3660 3660 1920 6720 960 6900 6000 6900 ...
##  $ incentive            : int  98 0 50 50 50 38 0 45 34 45 ...
##  $ idle_time            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ idle_men             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ no_of_style_change   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ no_of_workers        : num  59 8 30.5 30.5 56 56 8 57.5 55 57.5 ...
##  $ actual_productivity  : num  0.941 0.886 0.801 0.801 0.8 ...

The first few rows of the data set were checked using head() to get a quick look at the data. Each row contains information about a team's work on a specific day. The summary() function was then used to see basic details like the minimum, maximum, and average values for each column. This showed that the wip column had 506 missing values, and that actual_productivity ranges from about 0.23 to 1.12. Next, the str() function was used to check the structure of the data set. It confirmed that the data set includes 1197 rows and 15 columns, and showed that date, quarter, department, and day are stored as text (character) columns, so they would need to be converted to factors before being used as categorical variables. It also revealed inconsistent department labels ("sweing", and "finishing " with a trailing space). These steps helped to understand the data and prepare for the cleaning and modeling process.
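A per-column count of missing values makes the wip gap explicit, and a cross-tabulation against department shows whether the missing values are concentrated in one department. The sketch below uses a small hypothetical data frame (`toy`) in place of the real CSV, which lives at a local path; with the real data you would pass `data` instead.

```r
# Hypothetical toy data standing in for the garment CSV
toy <- data.frame(
  department = c("sweing", "finishing ", "sweing", "finishing"),
  wip = c(1108, NA, 968, NA),
  actual_productivity = c(0.94, 0.89, 0.80, 0.75)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Cross-tabulate department against whether wip is missing
# (trimws() merges "finishing " and "finishing" into one label)
na_by_dept <- table(department = trimws(toy$department),
                    wip_missing = is.na(toy$wip))
print(na_by_dept)
```

Running the same two steps on `data` would show whether the 506 missing wip values line up with a particular department, which matters when deciding how (or whether) to fill them.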

data$department <- trimws(data$department)

data$department[data$department == "sweing"] <- "sewing"

data$department <- as.factor(data$department)

unique(data$department)
## [1] sewing    finishing
## Levels: finishing sewing

After loading and exploring the dataset using head(), summary(), and str(), the next step was data cleaning. The department column contained a trailing space in some values ("finishing ") and a spelling mistake ("sweing" instead of "sewing"). To fix this, the trimws() function was used to remove the extra spaces, the spelling was corrected, and the column was converted to a factor. The output of unique() confirms that only two consistent levels, finishing and sewing, remain.
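The same cleaning steps can be wrapped in a small reusable function that also converts the other text columns flagged by str() (quarter and day) to factors. This is a minimal sketch: `clean_categoricals` is a hypothetical helper name, and the toy data frame stands in for the real data.

```r
# Hypothetical helper: standardise department labels and convert the
# character columns reported by str() to factors.
clean_categoricals <- function(df) {
  df$department <- trimws(df$department)                 # drop stray spaces
  df$department[df$department == "sweing"] <- "sewing"    # fix the typo
  for (col in c("department", "quarter", "day")) {
    if (col %in% names(df)) df[[col]] <- as.factor(df[[col]])
  }
  df
}

toy <- data.frame(
  department = c("sweing", "finishing ", "finishing", "sweing"),
  quarter = c("Quarter1", "Quarter1", "Quarter2", "Quarter2"),
  day = c("Thursday", "Thursday", "Saturday", "Sunday")
)
cleaned <- clean_categoricals(toy)
print(levels(cleaned$department))
```

Keeping the fixes in one function makes it easy to apply identical cleaning to any new batch of data before prediction.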

data$wip[is.na(data$wip)] <- median(data$wip, na.rm = TRUE)

head(data$wip)
## [1] 1108 1039  968  968 1170  984
summary(data$wip)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       7     970    1039    1126    1083   23122

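To keep the data complete, missing values in the wip column were filled with the median of the observed values (1039). The median was chosen because it is a typical value that is not pulled upward by the very large wip values (the maximum is 23122). Note, however, that wip is not one of the model's predictors, so its missing values would not have caused glm() to drop any rows. Also, because 506 of the 1197 values (about 42%) now share the same value, the imputed column is much less spread out than the original (interquartile range 970 to 1083 instead of 774.5 to 1252.5), so it should be used with caution if wip is added to a model later.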
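If wip were ever used as a predictor, a common refinement is to keep an indicator of which rows were imputed, so the model can distinguish real values from filled-in ones. A minimal sketch, assuming a hypothetical helper `impute_median` and a toy vector in place of `data$wip`:

```r
# Hypothetical helper: median imputation plus a logical indicator that
# records which entries were filled in.
impute_median <- function(x) {
  med <- median(x, na.rm = TRUE)   # median of the observed values only
  list(values = ifelse(is.na(x), med, x),
       was_missing = is.na(x))
}

wip_toy <- c(1108, NA, 968, 7, NA, 23122)
res <- impute_median(wip_toy)
print(res$values)        # the extreme value 23122 does not affect the fill value
print(res$was_missing)
```

With the real data, `res$was_missing` could be added as an extra column (for example `wip_missing`) alongside the imputed wip.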

Build the Generalized Linear Model (GLM)

glm_model <- glm(actual_productivity ~ over_time + no_of_workers + targeted_productivity + incentive,
                 family = gaussian(link = "identity"), data = data)

summary(glm_model)
## 
## Call:
## glm(formula = actual_productivity ~ over_time + no_of_workers + 
##     targeted_productivity + incentive, family = gaussian(link = "identity"), 
##     data = data)
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            1.964e-01  3.592e-02   5.469 5.51e-08 ***
## over_time              2.415e-07  2.015e-06   0.120   0.9046    
## no_of_workers         -2.309e-04  3.043e-04  -0.759   0.4481    
## targeted_productivity  7.440e-01  4.692e-02  15.859  < 2e-16 ***
## incentive              7.007e-05  2.863e-05   2.447   0.0145 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.0249775)
## 
##     Null deviance: 36.413  on 1196  degrees of freedom
## Residual deviance: 29.773  on 1192  degrees of freedom
## AIC: -1012.7
## 
## Number of Fisher Scoring iterations: 2
AIC(glm_model)
## [1] -1012.738
BIC(glm_model)
## [1] -982.2128
deviance(glm_model)
## [1] 29.77318

The full Generalized Linear Model (GLM) provides some insight into the factors associated with worker productivity, although its explanatory power is modest. AIC (-1012.7) and BIC (-982.2) have no absolute scale, so on their own they do not show that the fit is good; they are only useful for comparing models fitted to the same data. The residual deviance falls from 36.413 (null model) to 29.773 on 1192 degrees of freedom, so the predictors account for roughly 18% of the deviance and most of the variation in productivity remains unexplained. Within the model, targeted_productivity (p < 2e-16) and incentive (p = 0.0145) are statistically significant and positively associated with actual productivity: teams with higher targets and larger incentives tended to record higher productivity. Overtime (p = 0.90) and the number of workers (p = 0.45) were not significant, so the data give no evidence that more overtime or larger teams are associated with higher productivity once targets and incentives are accounted for. Because the data are observational, these are associations rather than proven causal effects, but they suggest that well-defined goals and incentives matter more than simply adjusting working hours or team size.
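The "share of deviance explained" quoted above is one minus the ratio of residual to null deviance. For a Gaussian identity-link GLM it is the same as the R-squared of the equivalent linear regression. The sketch checks that equivalence on R's built-in mtcars data (used only because it ships with R) and then applies the formula to the numbers printed by summary(glm_model); `deviance_explained` is a hypothetical helper name.

```r
# Proportion of the null deviance explained by a fitted GLM
deviance_explained <- function(fit) 1 - deviance(fit) / fit$null.deviance

# Sanity check on mtcars: for a Gaussian identity-link GLM this equals
# the R-squared of the equivalent lm() fit
fit <- glm(mpg ~ wt, family = gaussian(link = "identity"), data = mtcars)
r2_glm <- deviance_explained(fit)
r2_lm  <- summary(lm(mpg ~ wt, data = mtcars))$r.squared

# The same quantity from the deviances printed by summary(glm_model)
dev_expl <- 1 - 29.773 / 36.413
print(round(dev_expl, 3))
```

With the fitted object available, `deviance_explained(glm_model)` gives the same figure directly.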

step(glm_model, direction = "backward", k = 2)
## Start:  AIC=-1012.74
## actual_productivity ~ over_time + no_of_workers + targeted_productivity + 
##     incentive
## 
##                         Df Deviance      AIC
## - over_time              1   29.774 -1014.72
## - no_of_workers          1   29.788 -1014.16
## <none>                       29.773 -1012.74
## - incentive              1   29.923 -1008.74
## - targeted_productivity  1   36.055  -785.58
## 
## Step:  AIC=-1014.72
## actual_productivity ~ no_of_workers + targeted_productivity + 
##     incentive
## 
##                         Df Deviance      AIC
## - no_of_workers          1   29.798 -1015.75
## <none>                       29.774 -1014.72
## - incentive              1   29.923 -1010.74
## - targeted_productivity  1   36.061  -787.39
## 
## Step:  AIC=-1015.75
## actual_productivity ~ targeted_productivity + incentive
## 
##                         Df Deviance      AIC
## <none>                       29.798 -1015.75
## - incentive              1   29.941 -1012.00
## - targeted_productivity  1   36.200  -784.78
## 
## Call:  glm(formula = actual_productivity ~ targeted_productivity + incentive, 
##     family = gaussian(link = "identity"), data = data)
## 
## Coefficients:
##           (Intercept)  targeted_productivity              incentive  
##             0.1868498              0.7478119              0.0000684  
## 
## Degrees of Freedom: 1196 Total (i.e. Null);  1194 Residual
## Null Deviance:       36.41 
## Residual Deviance: 29.8  AIC: -1016

A Generalized Linear Model (GLM) was used to analyze factors affecting actual productivity. The model included over_time, no_of_workers, targeted_productivity, and incentive as predictors. A Gaussian distribution with an identity link function was chosen because the response variable is continuous.

To refine the model, backward stepwise selection was applied with step(), using the Akaike Information Criterion (AIC) to decide which variables to keep: at each step, the predictor whose removal lowers the AIC the most is dropped, until no removal lowers it further. Additionally, the Variance Inflation Factor (VIF) was checked to ensure that the predictors were not too closely related, keeping the model reliable.
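For a Gaussian GLM, R computes AIC from the residual deviance as n(log(2π·deviance/n) + 1) + 2(k + 1), where k is the number of regression coefficients and the extra 1 counts the estimated variance. The sketch below reproduces the AIC values in the step() output from the deviances it printed. It shows why dropping over_time and no_of_workers helps: each removal saves 2 points of penalty while the deviance barely increases. `gaussian_aic` is a hypothetical helper name.

```r
# AIC of a Gaussian (identity-link) GLM from its residual deviance.
# k = number of regression coefficients; +1 for the estimated variance.
gaussian_aic <- function(deviance, n, k) {
  n * (log(2 * pi * deviance / n) + 1) + 2 * (k + 1)
}

n <- 1197
# Residual deviances and coefficient counts taken from the step() output
steps <- data.frame(
  model    = c("full", "- over_time", "- over_time - no_of_workers"),
  deviance = c(29.77318, 29.774, 29.798),
  k        = c(5, 4, 3)
)
steps$aic <- gaussian_aic(steps$deviance, n, steps$k)
print(steps)
```

Small differences in the last decimal place come from the rounded deviances printed by step().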

Backward stepwise selection first removed over_time, which lowered the AIC from -1012.74 to -1014.72, and then removed no_of_workers, which lowered it further to -1015.75. Removing either remaining predictor would have raised the AIC, so the final model contains only targeted_productivity and incentive. Both have positive coefficients (about 0.748 and 0.0000684, respectively). The residual deviance barely changes (from 29.773 to 29.798), so dropping the two non-significant predictors gives a simpler model with essentially the same fit.

final_model <- glm(actual_productivity ~ targeted_productivity + incentive,
                   family = gaussian(link = "identity"), data = data)

The final Generalized Linear Model (GLM) indicates that targeted productivity and incentives are both positively associated with actual productivity: teams with higher targets and those receiving larger incentives tended to perform better. The number of workers, which had a small negative but non-significant coefficient in the full model (p = 0.448), was dropped by the stepwise selection along with over_time. The selection therefore suggests that the two remaining predictors are sufficient, giving a more streamlined model. Its AIC (-1015.75) is slightly lower than that of the full model (-1012.74), but this is a relative comparison. It does not by itself show that the model explains productivity well, since the residual deviance (29.8 out of a null deviance of 36.41) is almost unchanged from the full model.
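Because the final model uses an identity link, a prediction is just the intercept plus each coefficient times its predictor value. The sketch below uses the coefficients printed by step() for two hypothetical teams (target 0.80 vs 0.90, both with an incentive of 50). `predict_productivity` is a hypothetical helper; with the refitted model object, `predict()` with a `newdata` data frame would give the same result.

```r
# Coefficients of the final model, copied from the step() output
b0          <- 0.1868498   # intercept
b_target    <- 0.7478119   # targeted_productivity
b_incentive <- 0.0000684   # incentive

predict_productivity <- function(target, incentive) {
  b0 + b_target * target + b_incentive * incentive
}

# Hypothetical team: target 0.80, incentive 50
p1 <- predict_productivity(0.80, 50)
# Same team with the target raised to 0.90
p2 <- predict_productivity(0.90, 50)
print(c(p1 = p1, p2 = p2, difference = p2 - p1))
```

The difference between the two predictions is simply 0.1 times the targeted_productivity coefficient, whatever incentive value is used.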

model <- glm(actual_productivity ~ over_time + no_of_workers + targeted_productivity + incentive,
              family = gaussian(link = "identity"), data = data)
model1 <- glm(actual_productivity ~ over_time + no_of_workers,
             family = gaussian(link = "identity"), data = data)

summary(model)
## 
## Call:
## glm(formula = actual_productivity ~ over_time + no_of_workers + 
##     targeted_productivity + incentive, family = gaussian(link = "identity"), 
##     data = data)
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            1.964e-01  3.592e-02   5.469 5.51e-08 ***
## over_time              2.415e-07  2.015e-06   0.120   0.9046    
## no_of_workers         -2.309e-04  3.043e-04  -0.759   0.4481    
## targeted_productivity  7.440e-01  4.692e-02  15.859  < 2e-16 ***
## incentive              7.007e-05  2.863e-05   2.447   0.0145 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.0249775)
## 
##     Null deviance: 36.413  on 1196  degrees of freedom
## Residual deviance: 29.773  on 1192  degrees of freedom
## AIC: -1012.7
## 
## Number of Fisher Scoring iterations: 2
summary(model1)
## 
## Call:
## glm(formula = actual_productivity ~ over_time + no_of_workers, 
##     family = gaussian(link = "identity"), data = data)
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    7.518e-01  9.476e-03  79.341   <2e-16 ***
## over_time     -1.315e-06  2.217e-06  -0.593    0.553    
## no_of_workers -3.102e-04  3.344e-04  -0.928    0.354    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.03038552)
## 
##     Null deviance: 36.413  on 1196  degrees of freedom
## Residual deviance: 36.280  on 1194  degrees of freedom
## AIC: -780.13
## 
## Number of Fisher Scoring iterations: 2

The two models were compared to check whether adding targeted_productivity and incentive actually improves the model. Both variables had positive coefficients, but it was important to test whether they made the overall model more accurate. Comparing the full model (model, with these variables) to a simpler model (model1, without them) shows which version fits the data better. The full model, model, provided a much better fit: it had a lower AIC (-1012.7 compared to -780.13) and a lower residual deviance (29.773 vs. 36.280), and its two additional predictors, targeted_productivity and incentive, were both statistically significant. In model1, neither over_time nor no_of_workers was significant (p = 0.553 and 0.354).

# Compare AIC, BIC, deviance, and ANOVA

cat("AIC Comparison:\n")
## AIC Comparison:
print(AIC(model, model1))
##        df        AIC
## model   6 -1012.7382
## model1  4  -780.1305
cat("\nBIC Comparison:\n")
## 
## BIC Comparison:
print(BIC(model, model1))
##        df       BIC
## model   6 -982.2128
## model1  4 -759.7802
cat("\nDeviance Comparison:\n")
## 
## Deviance Comparison:
cat("Model Deviance:", deviance(model), "\n")
## Model Deviance: 29.77318
cat("Model1 Deviance:", deviance(model1), "\n")
## Model1 Deviance: 36.28031
cat("\nANOVA Comparison:\n")
## 
## ANOVA Comparison:
print(anova(model, model1, test = "Chisq"))
## Analysis of Deviance Table
## 
## Model 1: actual_productivity ~ over_time + no_of_workers + targeted_productivity + 
##     incentive
## Model 2: actual_productivity ~ over_time + no_of_workers
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1      1192     29.773                          
## 2      1194     36.280 -2  -6.5071 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The full model (model), which included targeted_productivity and incentive, had an AIC of -1012.74 and a BIC of -982.21. The simplified model (model1), which excluded these variables, had a higher AIC of -780.13 and a higher BIC of -759.78. Lower values are better for both criteria, so the full model fits the data much better even after the penalty for its two extra parameters. The analysis of deviance points the same way: dropping the two variables increases the residual deviance by 6.507 on 2 degrees of freedom (p < 2.2e-16). Together these results confirm that targeted_productivity and incentive contribute substantially to the model's explanatory power and should not be removed.
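For a Gaussian GLM, the dispersion is estimated rather than fixed. For that reason an F test is often preferred to the chi-squared test used above, and in R it is available as `anova(model1, model, test = "F")`. The sketch computes the same F statistic by hand from the printed deviances and the full model's dispersion (0.0249775). The arithmetic shows what the test compares: the drop in deviance per added parameter, relative to the residual variance.

```r
# Numbers from the analysis-of-deviance table and summary(model)
dev_full    <- 29.773     # residual deviance of model (1192 df)
dev_reduced <- 36.280     # residual deviance of model1 (1194 df)
df_diff     <- 2          # targeted_productivity and incentive
df_resid    <- 1192
phi_hat     <- 0.0249775  # dispersion estimate of the full model

# F = (deviance drop per added parameter) / estimated residual variance
F_stat <- ((dev_reduced - dev_full) / df_diff) / phi_hat
p_val  <- pf(F_stat, df_diff, df_resid, lower.tail = FALSE)
print(c(F = F_stat, p = p_val))
```

The conclusion is the same as with the chi-squared version: the two predictors improve the fit far beyond what chance would explain.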

Use the tools from previous weeks to diagnose the model

Highlight any issues with the model

To diagnose the Generalized Linear Model (GLM), we will use various diagnostic tools. These tools will help identify potential issues with the model and improve its performance.

Residual Analysis:

par(mfrow = c(1, 2)) 

plot(glm_model$fitted.values, glm_model$residuals, 
     xlab = "Fitted Values", ylab = "Residuals", 
     main = "Residuals vs Fitted Values")
abline(h = 0, col = "red") 

qqnorm(glm_model$residuals, main = "Normal Q-Q Plot")
qqline(glm_model$residuals, col = "red") 

par(mfrow = c(1, 1))

The residual plot suggests that the residuals are spread fairly randomly around zero, indicating no obvious non-linearity or systematic pattern. The Q-Q plot shows that the residuals mostly follow a normal distribution, with some deviations at the ends, so the errors appear approximately, but not exactly, normal. These tail deviations point to a few unusually high or low observations that may be worth examining. Note that these plots check the model's assumptions, not how much variation it explains: the model accounts for only about 18% of the deviance, so sizeable residuals are common even when no pattern is visible.
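R's plot() method for fitted models draws the four standard diagnostic panels in one call, and shapiro.test() gives a formal check to read alongside the Q-Q plot. The sketch runs on the built-in mtcars data so that it is self-contained; with the garment data you would use `glm_model` instead of `fit`. shapiro.test() accepts between 3 and 5000 observations, so the 1197 residuals here would be within range. With large samples, though, even small departures from normality become "significant", so the plot remains the main guide.

```r
fit <- glm(mpg ~ wt + hp, family = gaussian(link = "identity"), data = mtcars)

# Four standard panels: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Standardised residuals and a formal normality test on them
std_res <- rstandard(fit)
sw <- shapiro.test(std_res)
print(sw$p.value)
```

The scale-location panel is particularly useful for spotting unequal variance, which the plain residuals-vs-fitted plot can hide.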

Checking for Multicollinearity:

library(car)
## Warning: package 'car' was built under R version 4.4.3
## Loading required package: carData
vif(glm_model)
##             over_time         no_of_workers targeted_productivity 
##              2.180145              2.184469              1.009947 
##             incentive 
##              1.007314

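All VIF values are well below the common warning thresholds of 5 or 10, so multicollinearity is not a concern. over_time and no_of_workers have somewhat higher values (about 2.18), indicating a moderate correlation between them. targeted_productivity and incentive (about 1.01) are almost unrelated to the other predictors. Each coefficient can therefore be estimated without serious inflation of its standard error, which allows a clearer understanding of its individual association with productivity.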
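The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. By this formula, a VIF of 2.18 for over_time means that about 54% of its variance (1 - 1/2.18) is explained by the other predictors. The sketch computes VIFs by hand on the built-in mtcars data. For models with only numeric predictors, as here, the results should match car::vif(). `manual_vif` is a hypothetical helper name.

```r
# VIF for predictor j: 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on all the other predictors.
manual_vif <- function(df, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = df))$r.squared
    1 / (1 - r2)
  })
}

v <- manual_vif(mtcars, c("wt", "hp", "disp"))
print(round(v, 2))
```

With exactly two predictors, both VIFs reduce to 1 / (1 - r²), where r is their correlation.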

Checking for Overfitting:

library(boot)
## 
## Attaching package: 'boot'
## The following object is masked from 'package:car':
## 
##     logit
cv_result <- cv.glm(data, glm_model, K = 10)
cv_result$delta
## [1] 0.02509775 0.02508557

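cv.glm() returns two estimates of the 10-fold cross-validated prediction error (mean squared error): the raw estimate (0.02510) and a bias-adjusted version (0.02509). These are not training and test errors. The useful comparison is with the in-sample error, the residual deviance divided by the number of observations (29.773 / 1197 ≈ 0.0249). Because the cross-validated error is only slightly higher, the model does not appear to be overfitting and should generalize to new data about as well as it fits the training data. A prediction error of 0.0251 corresponds to a typical error of about 0.16 in actual_productivity (its square root). Because cv.glm() assigns observations to folds at random, the exact values will change slightly between runs unless a seed is set with set.seed().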
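To make clear what the first element of cv.glm()$delta measures, the sketch below implements K-fold cross-validation by hand. It fits on K-1 folds, predicts the held-out fold, and averages the squared errors over all observations. `kfold_mse` is a hypothetical helper, and the example uses the built-in mtcars data. As a correctness check, with K equal to the number of rows (leave-one-out), the result must match the closed-form LOOCV formula for linear models, mean((e / (1 - h))^2).

```r
# Hypothetical helper: K-fold cross-validated mean squared prediction error
kfold_mse <- function(formula, df, K = 10, seed = 1) {
  set.seed(seed)                      # reproducible fold assignment (changes global RNG)
  folds <- sample(rep(seq_len(K), length.out = nrow(df)))
  response <- all.vars(formula)[1]
  sq_err <- numeric(nrow(df))
  for (k in seq_len(K)) {
    test_idx <- which(folds == k)
    fit <- glm(formula, family = gaussian(link = "identity"),
               data = df[-test_idx, ])
    pred <- predict(fit, newdata = df[test_idx, ], type = "response")
    sq_err[test_idx] <- (df[[response]][test_idx] - pred)^2
  }
  mean(sq_err)                        # average over all held-out predictions
}

f <- mpg ~ wt + hp
cv10 <- kfold_mse(f, mtcars, K = 10)
train_mse <- deviance(glm(f, data = mtcars)) / nrow(mtcars)
print(c(train = train_mse, cv10 = cv10))

# Leave-one-out check against the closed-form formula
fit_all <- lm(f, data = mtcars)
loocv_formula <- mean((residuals(fit_all) / (1 - hatvalues(fit_all)))^2)
loocv_manual  <- kfold_mse(f, mtcars, K = nrow(mtcars))
```

A cross-validated error close to the training error, as with the garment model, is the usual sign that a model is not overfitting.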

Check Model Fit (AIC, BIC, Deviance):

AIC(glm_model)
## [1] -1012.738
BIC(glm_model)
## [1] -982.2128
deviance(glm_model)
## [1] 29.77318

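These AIC, BIC, and deviance values are identical to those reported earlier because they come from the same model (glm_model has the same formula as model). The three measures are mainly useful for comparing candidate models. Here they confirm that the full model fits much better than model1, while the stepwise selection showed that a slightly simpler model with only targeted_productivity and incentive has a marginally lower AIC (-1015.75). On their own, they do not show that the model fits the data well.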
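AIC and BIC differ only in their penalty per estimated parameter: 2 for AIC and log(n) for BIC. With n = 1197, log(n) ≈ 7.09, so BIC penalizes extra predictors more heavily. The sketch converts the reported AIC values into BIC values, counting the dispersion as one of the parameters (6 for model, 4 for model1, matching the df column printed by AIC() and BIC()). `bic_from_aic` is a hypothetical helper name.

```r
# For the same fitted model, BIC = AIC - 2p + p * log(n),
# where p counts every estimated parameter (coefficients + dispersion).
bic_from_aic <- function(aic, p, n) aic - 2 * p + p * log(n)

n <- 1197
bic_model  <- bic_from_aic(-1012.7382, p = 6, n = n)  # 5 coefficients + dispersion
bic_model1 <- bic_from_aic(-780.1305,  p = 4, n = n)  # 3 coefficients + dispersion
print(c(model = bic_model, model1 = bic_model1))
```

Both results agree with the BIC() output above, which confirms the parameter counts.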

Plot Residuals vs. Fitted Values:

plot(glm_model$fitted.values, glm_model$residuals,
     xlab = "Fitted Values", ylab = "Residuals")
abline(h = 0, col = "red")

This Residuals vs. Fitted Values plot repeats the earlier check. The residuals appear to be scattered around zero without a clear pattern, suggesting no major non-linearity or changing variance. The model's assumptions therefore look reasonable, although this plot does not measure how much of the variation the model explains.

Cook’s Distance:

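cooks_d <- cooks.distance(glm_model)  # new name avoids masking the cooks.distance() function
plot(cooks_d, ylab = "Cook's distance", main = "Cook's Distance")
abline(h = 4 / nrow(data), col = "red", lty = 2)  # common 4/n rule-of-thumb threshold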

The Cook's distance plot shows that most observations have very little influence on the model. A few points have noticeably more influence; the plot suggests they do not dominate the fit, although refitting the model without them would confirm this. Overall, the diagnostics suggest no major problems with the residuals, multicollinearity, or overfitting. However, only two of the four predictors (targeted_productivity and incentive) have significant effects, and the model explains only about 18% of the deviance. It is therefore better suited to identifying the factors associated with productivity than to predicting an individual team's productivity precisely.
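A simple way to check whether influential points change the conclusions is to flag observations above the 4/n rule of thumb and refit without them. The sketch uses the built-in mtcars data so that it runs on its own; with the garment data you would substitute `glm_model` and `data`. The 4/n cutoff is a convention, not a hard limit.

```r
fit <- glm(mpg ~ wt + hp, family = gaussian(link = "identity"), data = mtcars)

cooks_d   <- cooks.distance(fit)
threshold <- 4 / nrow(mtcars)             # rule of thumb, not a hard cutoff
flagged   <- which(cooks_d > threshold)
print(names(flagged))

# Sensitivity check: refit without the flagged rows and compare coefficients
keep     <- setdiff(seq_len(nrow(mtcars)), flagged)  # safe even if nothing is flagged
fit_drop <- update(fit, data = mtcars[keep, ])
print(round(cbind(all = coef(fit), without_flagged = coef(fit_drop)), 3))
```

If the coefficients and their signs stay similar, the influential points are not driving the conclusions.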

Interpret at least one of the coefficients

summary(glm_model)
## 
## Call:
## glm(formula = actual_productivity ~ over_time + no_of_workers + 
##     targeted_productivity + incentive, family = gaussian(link = "identity"), 
##     data = data)
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            1.964e-01  3.592e-02   5.469 5.51e-08 ***
## over_time              2.415e-07  2.015e-06   0.120   0.9046    
## no_of_workers         -2.309e-04  3.043e-04  -0.759   0.4481    
## targeted_productivity  7.440e-01  4.692e-02  15.859  < 2e-16 ***
## incentive              7.007e-05  2.863e-05   2.447   0.0145 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.0249775)
## 
##     Null deviance: 36.413  on 1196  degrees of freedom
## Residual deviance: 29.773  on 1192  degrees of freedom
## AIC: -1012.7
## 
## Number of Fisher Scoring iterations: 2

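Targeted productivity and incentive are the only significant predictors, and both are positively associated with actual productivity. The targeted_productivity coefficient (0.744) means that, holding the other predictors constant, a team whose target is 0.10 higher is expected to have an actual productivity about 0.074 higher. The incentive coefficient (7.007e-05) means that each additional unit of incentive is associated with an increase of about 0.00007 in actual productivity, or about 0.007 for 100 additional units. Overtime and the number of workers do not significantly affect productivity (p = 0.90 and 0.45). The number of workers has a small negative coefficient, but it is not statistically distinguishable from zero.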
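Coefficients are easier to interpret when scaled to a realistic change in the predictor. The sketch uses the estimates and standard errors from summary(glm_model) to compute the effect of a 0.10 increase in target and of 100 extra units of incentive (both step sizes are illustrative choices), with approximate 95% Wald intervals. In R, `confint.default(glm_model)` returns the same kind of normal-approximation intervals on the original scale.

```r
est <- c(targeted_productivity = 7.440e-01, incentive = 7.007e-05)
se  <- c(targeted_productivity = 4.692e-02, incentive = 2.863e-05)

# Change in predicted actual_productivity for an illustrative step in
# each predictor, holding the other predictors fixed
step_size <- c(targeted_productivity = 0.10, incentive = 100)
effect    <- est * step_size

# Approximate 95% Wald intervals (normal quantile), scaled the same way
z  <- qnorm(0.975)
ci <- cbind(lower = (est - z * se) * step_size,
            upper = (est + z * se) * step_size)
print(round(cbind(effect, ci), 4))
```

Neither interval includes zero, which is consistent with both predictors being significant at the 5% level. The incentive effect is small, though, even for a 100-unit increase.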

Conclusion:

This analysis used a Generalized Linear Model (GLM) to study factors affecting worker productivity in a garment factory. The results showed that both targeted productivity and incentive had significant positive associations with actual productivity: teams with higher targets delivered higher productivity, and larger incentives were linked to modest gains. Overtime and the number of workers did not show significant effects, suggesting that increasing them may not be an effective strategy for improving productivity. The model diagnostics did not reveal major problems, although the model explains only about 18% of the deviance, so other factors not included here also play an important role. Overall, the final model showed that incentive and targeted productivity were the key explanatory variables. Including both improved model fit substantially and provides useful insights that can help management make more informed decisions to enhance productivity on the factory floor.