Final project

Introduction

Productivity in garment manufacturing is a critical factor that influences operational efficiency, cost-effectiveness, and timely delivery. As labor-intensive industries face growing pressure to maintain high performance while minimizing overhead, understanding the factors that drive or hinder productivity becomes essential.

This project investigates the key determinants of worker productivity using data from a garment factory, applying statistical and machine learning techniques to identify trends, patterns, and actionable insights. The analysis aims to uncover productivity patterns over time and across teams, and to explore how daily operational practices (such as incentive use, overtime, and team composition) affect output. By doing so, the project supports data-driven strategies that enable managers to make informed decisions in real-world production environments.

### Project Objectives

Understand the key drivers of productivity in garment manufacturing teams.

Identify patterns that influence performance across time periods and teams.

Explore how operational practices such as incentives, overtime, and idle time affect output.

Support data-informed decision-making in labor-intensive environments.

Provide actionable insights that lead to measurable performance improvements.

Problem Statement

The primary challenge addressed in this project is the inconsistent productivity levels across different teams and time periods. Managers lack clarity on whether factors such as team size, overtime, idle time, or incentives significantly impact output. Without this understanding, interventions risk being inefficient or ineffective.

Research Questions

This project seeks to answer the following key questions related to garment worker productivity:

What are the most important factors that influence actual productivity in garment manufacturing teams?

Does increasing overtime actually improve productivity, or might it have unintended negative effects?

How does productivity vary across different days of the week, departments, and team sizes?

What combination of variables best predicts productivity in this dataset?

These questions guided the selection of statistical methods and models, as well as the interpretation of patterns and relationships within the dataset. By addressing them, the project aims to provide evidence-based recommendations for improving production efficiency and operational decision-making.

### Methodology

A combination of exploratory data analysis, statistical modeling, and hypothesis testing was employed:

Dataset: Daily records of productivity including variables such as department, team, targeted vs actual productivity, number of workers, incentives, overtime, idle time, and SMV (Standard Minute Value).

Preprocessing: Dates were formatted, missing values handled, and key variables engineered, such as adjusted SMV and expected productivity.
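The preprocessing step can be illustrated with a minimal sketch. It assumes the `date` column is stored as month/day/year text, and it uses a small made-up data frame (`toy`) in place of the real file. The engineered variables (adjusted SMV, expected productivity) are not shown because their definitions are not given here.

```r
# Toy records standing in for the real file; every value here is made up.
toy <- data.frame(
  date = c("1/1/2015", "1/3/2015", "1/4/2015"),
  wip = c(1108, NA, 968),
  smv = c(26.16, 3.94, 11.41),
  stringsAsFactors = FALSE
)

# Assumption: dates are stored as month/day/year text.
toy$date <- as.Date(toy$date, format = "%m/%d/%Y")

# Count missing values per column to see where handling is needed.
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Running the same two steps on the full dataset shows which columns need a missing-value strategy before any modeling.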

Task Distribution by Day of the Week

To understand how the day of the week affects productivity, the number of task records logged on each day was counted and visualized. This helps show whether some days are busier than others.

As the bar chart produced by the code below shows, task counts were fairly steady from Sunday to Wednesday, with Wednesday having the highest number of tasks. Saturday had the lowest task count, which may be due to fewer working hours, fewer staff, or worker fatigue near the end of the week.

This information can help factory managers plan better schedules and assign workers more effectively based on when productivity is usually higher.

data <- read.csv("C:/Users/rbada/Downloads/productivity+prediction+of+garment+employees/garments_worker_productivity.csv")

library(dplyr)

library(ggplot2)
task_counts <- data %>%
  group_by(day) %>%
  summarize(total_tasks = n())

# Visualize the distribution
ggplot(task_counts, aes(x = day, y = total_tasks)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(title = "Task Distribution by Day", x = "Day of the Week", y = "Total Tasks")
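Because `day` holds text labels, the bar chart places the days in alphabetical order, which makes a "Sunday to Wednesday" comparison harder to read. One possible fix, sketched here on made-up counts, is to convert `day` to a factor with explicit levels. The Sunday-first week order is an assumption based on the description above.

```r
# Toy counts; the numbers are illustrative, not taken from the dataset.
task_counts_toy <- data.frame(
  day = c("Wednesday", "Saturday", "Sunday", "Monday"),
  total_tasks = c(4, 2, 3, 3),
  stringsAsFactors = FALSE
)

# Assumption: the working week is read from Sunday onward.
week_order <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")

# A factor with explicit levels keeps this order in tables and plots.
task_counts_toy$day <- factor(task_counts_toy$day, levels = week_order)

# Sorting by the factor now follows the week, not the alphabet.
task_counts_toy <- task_counts_toy[order(task_counts_toy$day), ]
print(task_counts_toy)
```

Applying the same `factor()` call to `task_counts$day` before plotting would let the existing `ggplot()` code draw the bars in week order.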

Distribution of Standard Minute Value (SMV)

Before addressing missing values or applying any modeling techniques, it is important to explore the distribution of key variables. The plot shows a histogram of the Standard Minute Value (SMV), which represents the estimated time required to complete a garment task. This plot helps visualize how SMV values are spread across the dataset. Most values appear to fall within a moderate range, though there may be some extreme values. Understanding the distribution of SMV is essential before running further analysis, especially if we plan to use it in regression models or compare across teams.

library(ggplot2)

ggplot(data, aes(x = smv)) +  # Map the SMV column to the x-axis
  geom_histogram(color = "white", fill = "#3182bd", bins = 30) +  # 30 bins; adjust as needed
  labs(title = "Distribution of SMV",
       x = "Standard Minute Value (SMV)", 
       y = "Count") +
  theme_classic()
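To check the impression that some SMV values are extreme, one option is the common 1.5 × IQR rule. This is a sketch on a made-up vector (`smv_toy`), not a result from the dataset, and the 1.5 multiplier is a convention rather than something the analysis prescribes.

```r
# Made-up SMV values for illustration only.
smv_toy <- c(3.9, 4.2, 11.4, 15.3, 18.8, 22.5, 26.2, 30.1, 90)

# Quartiles and the conventional 1.5 * IQR fences.
q <- unname(quantile(smv_toy, c(0.25, 0.75)))
fence <- 1.5 * IQR(smv_toy)

# Values outside the fences are flagged as potentially extreme.
extreme <- smv_toy[smv_toy < q[1] - fence | smv_toy > q[2] + fence]
print(extreme)
```

Flagged values are candidates for review, not automatic removal, since a long garment operation can legitimately have a high SMV.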

Missing vs. Available WIP Data

To check the quality of the data, we looked at how many values were missing in the Work in Progress (WIP) column. The plot shows that a large number of WIP values are missing. Since WIP is important for understanding how much work is being done, missing values can make the analysis less accurate.

In this analysis, we assume that missing WIP values are not the same as zero progress. Instead, they likely come from mistakes during data entry or problems in how the data was collected. If this assumption is wrong, and missing means no work was done, the results could change.

This chart highlights the need for better data collection. Making sure that WIP data is complete and clearly recorded will help future analysis be more accurate and useful for decision-making.

# Create a summary of missing vs. available WIP data
wip_missing_summary <- data %>%
  mutate(wip_missing = ifelse(is.na(wip), "Missing", "Available")) %>%
  count(wip_missing)

# Plot Missing vs. Available WIP
ggplot(wip_missing_summary, aes(x = wip_missing, y = n, fill = wip_missing)) +
  geom_bar(stat = "identity") +
  labs(title = "Missing vs. Available WIP Data", x = "WIP Data", y = "Count") +
  scale_fill_manual(values = c("Missing" = "red", "Available" = "blue")) +
  theme_minimal()
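How much the "missing is not zero" assumption matters can be gauged by computing a summary both ways. The sketch below uses a made-up vector (`wip_toy`); with the real data, the `wip` column would take its place.

```r
# Made-up WIP values for illustration only; NA marks a missing entry.
wip_toy <- c(1108, NA, 968, NA, 1170)

# Share of records with no WIP value.
share_missing <- mean(is.na(wip_toy))

# Assumption used in this analysis: missing values are unknown, so exclude them.
mean_excluding <- mean(wip_toy, na.rm = TRUE)

# Alternative assumption: missing means no work in progress, so count it as zero.
mean_as_zero <- mean(ifelse(is.na(wip_toy), 0, wip_toy))

print(c(share_missing = share_missing,
        mean_excluding = mean_excluding,
        mean_as_zero = mean_as_zero))
```

A large gap between the two means indicates that conclusions involving WIP are sensitive to how missing values are interpreted.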

### Productivity vs. Overtime by Team

This scatter plot shows the relationship between total overtime (the sum of the over_time column) and average productivity across teams. Each point represents a team, and colors distinguish the teams. The chart helps explore whether working more overtime actually leads to better productivity.

The plot shows no strong positive trend: some teams with high overtime have only moderate productivity, while others with less overtime perform just as well or better. In the summary table printed below, for example, Team 4 has the highest total overtime (572,220) but an average productivity of 0.770, while Team 12 has one of the lowest totals (328,475) and averages 0.779. This suggests that increased overtime does not always improve team productivity. In some cases, it may even have a negative effect due to fatigue or inefficiencies. This finding supports the idea that improving workflow and managing regular work hours effectively may be more beneficial than relying on extended shifts.

# Calculate productivity vs overtime for each team
productivity_overtime_comparison <- data %>%
  group_by(team) %>%
  summarise(
    total_overtime = sum(over_time, na.rm = TRUE),
    avg_productivity = mean(actual_productivity, na.rm = TRUE)
  )
print(productivity_overtime_comparison)

## # A tibble: 12 × 3
##     team total_overtime avg_productivity
##    <int>          <int>            <dbl>
##  1     1         503310            0.821
##  2     2         477960            0.771
##  3     3         510690            0.804
##  4     4         572220            0.770
##  5     5         495780            0.698
##  6     6         316695            0.685
##  7     7         466290            0.668
##  8     8         470040            0.674
##  9     9         469980            0.734
## 10    10         473670            0.720
## 11    11         382140            0.682
## 12    12         328475            0.779

# Plot Productivity vs Overtime
ggplot(productivity_overtime_comparison, aes(x = total_overtime, y = avg_productivity, color = as.factor(team))) +
  geom_point(size = 4) + # scatter plot
  labs(title = "Productivity vs Overtime", x = "Total Overtime", y = "Average Productivity") +
  theme_minimal()
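The visual impression from the scatter plot can be backed by a correlation coefficient. The sketch below re-enters the twelve team summaries printed above and computes Pearson and Spearman correlations. With only twelve points, the result is best read as descriptive rather than as a formal test.

```r
# Team-level summaries copied from the printed tibble above.
team_summary <- data.frame(
  team = 1:12,
  total_overtime = c(503310, 477960, 510690, 572220, 495780, 316695,
                     466290, 470040, 469980, 473670, 382140, 328475),
  avg_productivity = c(0.821, 0.771, 0.804, 0.770, 0.698, 0.685,
                       0.668, 0.674, 0.734, 0.720, 0.682, 0.779)
)

# Pearson measures linear association; Spearman uses ranks and is less
# sensitive to a single unusual team.
r <- cor(team_summary$total_overtime, team_summary$avg_productivity)
rho <- cor(team_summary$total_overtime, team_summary$avg_productivity,
           method = "spearman")

print(round(c(pearson = r, spearman = rho), 2))
```

A coefficient well below 1 would be consistent with the observation that overtime alone does not explain differences in team productivity.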

### From Exploratory Analysis to Hypothesis Testing

The previous visualizations explored different factors that may influence garment worker productivity. The SMV distribution showed variation in task complexity, providing context for productivity expectations. The Productivity vs. Overtime plot suggested that higher overtime does not always result in better performance. The Missing vs. Available WIP chart highlighted gaps in data quality that could affect the reliability of the analysis, while the Task Distribution by Day plot revealed small differences in activity across the week.

Building on those findings, the density plots below compare productivity between small and large teams and between low-incentive and high-incentive records, with each variable split at its median. The incentive plot offers a clearer view of how financial incentives may affect output: teams with high incentives tend to be more productive and consistent, while teams with low incentives show more variability and lower average performance.

To go beyond visual patterns, the next step is hypothesis testing. Specifically, a t-test compares productivity between high-incentive and low-incentive teams to determine whether the observed difference is statistically meaningful or due to chance. By combining visual insights with formal tests, we aim to provide stronger, evidence-based recommendations.

# Create a new column to categorize teams as Small or Large at the median number of workers
data$team_category <- ifelse(data$no_of_workers <= median(data$no_of_workers), "Small", "Large")

ggplot(data, aes(x = actual_productivity, fill = team_category)) +
  geom_density(alpha = 0.5) +
  labs(title = "Density Plot of Productivity: Small vs. Large Teams",
       x = "Actual Productivity",
       y = "Density",
       fill = "Team Size") +
  theme_minimal()

# Create a new column to categorize incentives into High and Low
data$incentive_group <- ifelse(data$incentive <= median(data$incentive, na.rm = TRUE), "Low", "High")

ggplot(data, aes(x = actual_productivity, fill = incentive_group)) +
  geom_density(alpha = 0.5) +
  labs(title = "Density Plot of Productivity: High vs. Low Incentives",
       x = "Actual Productivity",
       y = "Density",
       fill = "Incentive Level") +
  theme_minimal()
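The t-test described above can be sketched as follows. This is a minimal example on simulated values, not the project's actual test output: the group means and spreads in `toy` are invented, and Welch's unequal-variance version (the default of `t.test()`) is assumed because the two groups appear to differ in variability.

```r
set.seed(42)

# Simulated stand-in for the real data; means and spreads are invented.
toy <- data.frame(
  incentive_group = rep(c("High", "Low"), each = 50),
  actual_productivity = c(rnorm(50, mean = 0.80, sd = 0.10),
                          rnorm(50, mean = 0.70, sd = 0.15))
)

# Welch two-sample t-test: does mean productivity differ between groups?
welch <- t.test(actual_productivity ~ incentive_group, data = toy)

print(welch$estimate)  # group means
print(welch$p.value)   # probability of a difference this large under no true effect
```

On the real data, replacing `toy` with `data` runs the same comparison, and swapping `incentive_group` for `team_category` would test small versus large teams.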

Understanding Productivity Drivers Through Linear Regression

To further understand which factors influence productivity in garment manufacturing, we applied simple linear regression models and scatter plots with trend lines to explore key operational variables.

The Overtime Hours vs Actual Productivity plot revealed a slight negative relationship, indicating that increased overtime may not lead to higher productivity and could contribute to fatigue or inefficiencies. The Team Size (No. of Workers) vs Actual Productivity plot showed a nearly flat trend line, suggesting that team size alone has little to no direct impact on productivity. This supports earlier findings that effective coordination and workflow management are likely more important than the number of workers on a team.

In contrast, the Incentives vs Actual Productivity plot demonstrated a positive relationship, where higher financial incentives were generally associated with improved productivity, though the effect was modest. Lastly, the Idle Time vs Actual Productivity plot displayed a clear negative trend, confirming that increased idle time is linked to lower productivity. While the statistical effect was small, it was significant, emphasizing the importance of minimizing unproductive time during operations.

Together, these visualizations indicate which factors most directly affect worker performance and suggest that reducing idle time and offering well-structured incentives may be more effective than simply increasing overtime or team size.

ggplot(data, aes(x = over_time, y = actual_productivity)) +
  geom_point() +  
  geom_smooth(method = "lm", se = FALSE, color = "blue") +  
  labs(title = "Overtime Hours vs Actual Productivity", x = "Overtime Hours", y = "Actual Productivity")

## `geom_smooth()` using formula = 'y ~ x'

ggplot(data, aes(x = no_of_workers, y = actual_productivity)) +
  geom_point() +  
  geom_smooth(method = "lm", se = FALSE, color = "blue") +  
  labs(title = "Team Size (No. of Workers) vs Actual Productivity", x = "Team Size (No. of Workers)", y = "Actual Productivity")

## `geom_smooth()` using formula = 'y ~ x'

ggplot(data, aes(x = incentive, y = actual_productivity)) +
  geom_point() +  
  geom_smooth(method = "lm", se = FALSE, color = "blue") +  
  labs(title = "Incentives vs Actual Productivity", x = "Incentives", y = "Actual Productivity")

## `geom_smooth()` using formula = 'y ~ x'

ggplot(data, aes(x = idle_time, y = actual_productivity)) +
  geom_point() +  
  geom_smooth(method = "lm", se = FALSE, color = "blue") +  
  labs(title = "Idle Time vs Actual Productivity", x = "Idle Time", y = "Actual Productivity")

## `geom_smooth()` using formula = 'y ~ x'
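The trend lines in these plots correspond to simple linear regressions, and the slopes and p-values behind them can be extracted with `lm()`. The sketch below fits one model per predictor on simulated data. The data-generating numbers are invented for illustration, so the printed estimates say nothing about the real factory.

```r
set.seed(1)
n <- 100

# Simulated stand-in for the real data; all effect sizes are invented.
toy <- data.frame(
  over_time = runif(n, 0, 10000),
  no_of_workers = sample(10:60, n, replace = TRUE),
  incentive = runif(n, 0, 100),
  idle_time = rpois(n, 1)
)
toy$actual_productivity <- 0.7 + 0.001 * toy$incentive -
  0.02 * toy$idle_time + rnorm(n, sd = 0.05)

predictors <- c("over_time", "no_of_workers", "incentive", "idle_time")

# Fit actual_productivity ~ predictor for each variable and keep the slope
# and its p-value from the coefficient table.
results <- do.call(rbind, lapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "actual_productivity"), data = toy)
  coefs <- summary(fit)$coefficients
  data.frame(predictor = p,
             slope = coefs[p, "Estimate"],
             p_value = coefs[p, "Pr(>|t|)"])
}))

print(results)
```

Replacing `toy` with `data` would report the slope and significance for each of the four plotted relationships. A single multiple regression with all four predictors is a natural next step for judging their combined effect.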

### Key Findings

This project explored what affects productivity in a garment factory by using charts, hypothesis tests, and simple regression models. The analysis showed that tasks were fairly evenly spread during the week, but Saturdays had the lowest task count. The SMV values, which estimate the time a task requires and so reflect its complexity, varied widely, helping explain why some tasks take more time than others. There were many missing values in the WIP (Work in Progress) data, which suggests that some production activity might not have been recorded properly. This needs to be improved for future analysis.

When looking at key factors, higher incentives were clearly linked to better productivity, and more idle time was linked to lower productivity. In contrast, larger teams and more overtime did not lead to higher output. T-tests confirmed that teams with higher incentives and smaller team sizes performed significantly better. The linear regression models supported this by showing that incentives had a small positive effect, while idle time had a small negative effect. Overtime and team size had little to no impact.

In summary, the most effective ways to improve productivity are to reduce idle time, offer performance-based incentives, and focus on team coordination, rather than simply increasing working hours or team size.

Recommendations

Based on the findings of this analysis, several recommendations can help improve productivity in garment manufacturing. First, incentive structures should be strengthened to motivate performance. However, incentives should be used thoughtfully and not as a substitute for other efficiency measures, such as reducing excessive overtime. Second, idle time should be minimized by improving task scheduling, maintaining equipment, and balancing workloads more effectively. Reducing delays and unproductive time on the floor is likely to yield measurable productivity gains. Third, attention should shift from increasing team size to optimizing operational flow. Investing in workflow management tools or techniques may help streamline production and reduce coordination problems. Finally, performance data should be used to identify what top-performing teams are doing well. These best practices can then be shared or implemented across other teams to improve consistency and overall efficiency.

Conclusion

This project used a data-driven approach to explore the key factors affecting productivity in garment manufacturing. Through visualizations, hypothesis testing, and linear regression analysis, the results showed that incentives and idle time play a significant role in shaping performance, while team size and overtime have limited or no impact on output. These findings highlight the need to move beyond assumptions and focus on evidence-based strategies. Improving productivity is not simply about adding more hours or more workers; it is about working smarter, not harder. By reducing idle time, strengthening incentive programs, and learning from top-performing teams, managers can make informed decisions that lead to more efficient, consistent, and sustainable production outcomes.