In this analysis, we aim to explore key factors influencing productivity in a garment manufacturing setting by conducting A/B testing on the provided data set. A/B testing is a statistical method used to compare two groups and determine if a specific change significantly impacts the outcome. This approach is particularly useful in identifying operational efficiencies or inefficiencies within the manufacturing process. The primary objective of the analysis is to understand how different variables such as overtime, incentives, and team size affect the actual productivity of garment workers. By systematically comparing these factors, we aim to draw actionable insights that could lead to improved productivity and operational decision-making.
1- Productivity on specific weekdays (e.g., Monday vs. Thursday).
Specific Question: How does productivity on a specific weekday like Monday or Thursday compare to productivity on weekends (Saturday and Sunday)?
Group A (Specific Weekday): Productivity on a chosen weekday — for example, Monday or Thursday, based on your interest or previous observations that suggest these days might have unique productivity patterns.
Group B (Weekend): Productivity on Saturday and Sunday combined.
Null Hypothesis (H0): There is no significant difference in productivity between specific weekdays and weekends.
Alternative Hypothesis (H1): Productivity on the chosen weekday is significantly different from productivity on weekends (Saturday and Sunday).
2- Day of the Week and Productivity Objective: Determine if productivity differs across various days of the week.
Specific Question: Does productivity on a specific weekday (e.g., Monday) differ significantly from productivity on other days (e.g., Tuesday through Thursday)?
Group A: Productivity on a specific day (e.g., Monday).
Group B: Productivity on all other days (e.g., Tuesday through Thursday).
Null Hypothesis (H0): Productivity on Monday is the same as on other days.
Alternative Hypothesis (H1): Productivity on Monday is different from the rest of the week.
3- Impact of Incentives on Productivity Objective: Evaluate whether higher incentives are associated with increased productivity.
Specific Question: Do workers receiving incentives above the median value exhibit higher productivity compared to those receiving incentives at or below the median value?
Group A: Workers receiving incentives above the median value.
Group B: Workers receiving incentives at or below the median value.
Null Hypothesis (H0): Incentive levels do not affect productivity.
Alternative Hypothesis (H1): Workers with higher incentives have higher productivity.
4- Productivity Differences Between Departments Objective: Assess if there are productivity differences between departments influenced by the number of workers or overtime hours.
Specific Question: Is there a significant difference in productivity between departments, such as sewing versus finishing, that can be attributed to factors like team size and overtime?
Group A: One department (e.g., sewing).
Group B: Another department (e.g., finishing).
Null Hypothesis (H0): Productivity differences between departments are not significantly influenced by the number of workers or overtime hours.
Alternative Hypothesis (H1): Productivity in the sewing department is significantly different from the finishing department due to factors like team size and overtime.
5- Impact of Overtime on Productivity Objective: Examine the effect of overtime on productivity.
Specific Question: Does working overtime hours above the median impact productivity negatively compared to those with less or no overtime?
Group A: Workers with overtime hours above the median.
Group B: Workers with overtime hours at or below the median.
Null Hypothesis (H0): Overtime does not affect productivity negatively.
Alternative Hypothesis (H1): Excessive overtime leads to a decrease in productivity.
data <- read.csv("C:/Users/rbada/Downloads/productivity+prediction+of+garment+employees/garments_worker_productivity.csv")
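Before running any tests, a quick structural check of the columns used throughout this analysis can catch naming or type issues early. The sketch below (output omitted) simply inspects the columns referenced later; the names match the garment-productivity data set read above.
# Quick sanity check of the columns referenced in the analyses below (output not shown).
str(data[, c("day", "department", "no_of_workers",
             "over_time", "incentive", "actual_productivity")])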
I applied the Neyman-Pearson framework, fixing the Type I (α) and Type II (β) error rates in advance to make the hypothesis tests about productivity differences more reliable; a short sketch of this setup follows the step list. Steps Performed:
1-Data Filtering: Separated weekday and weekend productivity data.
2-Library Loading: Imported dplyr for data manipulation; pwrss and pwr for power analysis.
3-Group Definition: Defined specific groups using dplyr for weekdays vs. weekends.
4-Statistical Calculations: Computed means and standard deviations.
5-Display Statistics: Showed summary statistics.
6-Power Analysis: Conducted power analysis with pwrss to determine necessary sample sizes.
7-Effect Size Calculation: Calculated effect size and adjusted sample size with pwr.
8-T-test Application: Conducted a t-test under the Neyman-Pearson framework to test the hypotheses.
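As a brief illustration of the Neyman-Pearson setup (a sketch with a hypothetical group size, not a value taken from the data), fixing α determines the rejection region of the two-sided t-test before the data are examined, and the sample size is then chosen so that β is also controlled:
# Sketch: a fixed Type I error rate (alpha) pins down the rejection region in advance;
# the sample size is then chosen so that the Type II error rate (beta) is also fixed.
alpha <- 0.05
n_per_group <- 500                   # hypothetical group size, for illustration only
df <- 2 * n_per_group - 2
t_crit <- qt(1 - alpha / 2, df)      # two-sided critical value
cat("Reject H0 when |t| >", round(t_crit, 3), "\n")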
# Filter weekday and weekend rows using the 'day' column (base R equivalent of the dplyr version below)
group_weekday <- data[data$day %in% c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"),
                      "actual_productivity", drop = FALSE]
group_weekend <- data[data$day %in% c("Saturday", "Sunday"),
                      "actual_productivity", drop = FALSE]
# Load dplyr package
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
group_weekday <- data %>%
filter(day %in% c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")) %>%
select(actual_productivity)
group_weekend <- data %>%
filter(day %in% c("Saturday", "Sunday")) %>%
select(actual_productivity)
mean_weekday <- mean(group_weekday$actual_productivity)
mean_weekend <- mean(group_weekend$actual_productivity)
sd_weekday <- sd(group_weekday$actual_productivity)
sd_weekend <- sd(group_weekend$actual_productivity)
cat("Mean (Weekdays):", mean_weekday, "\n")
## Mean (Weekdays): 0.7328212
cat("Mean (Weekends):", mean_weekend, "\n")
## Mean (Weekends): 0.739788
cat("SD (Weekdays):", sd_weekday, "\n")
## SD (Weekdays): 0.1733354
cat("SD (Weekends):", sd_weekend, "\n")
## SD (Weekends): 0.1769805
library(pwrss)
##
## Attaching package: 'pwrss'
## The following object is masked from 'package:stats':
##
## power.t.test
test_productivity <- pwrss.t.2means(
mu1 = mean_weekday,
sd1 = sd_weekday,
mu2 = mean_weekend,
sd2 = sd_weekend,
kappa = 1,
power = 0.85,
alpha = 0.05,
alternative = "not equal" )
## Difference between Two means
## (Independent Samples t Test)
## H0: mu1 = mu2
## HA: mu1 != mu2
## ------------------------------
## Statistical power = 0.85
## n1 = 11353
## n2 = 11353
## ------------------------------
## Alternative = "not equal"
## Degrees of freedom = 22704
## Non-centrality parameter = -2.997
## Type I error rate = 0.05
## Type II error rate = 0.15
print(test_productivity)
## $parms
## $parms$mu1
## [1] 0.7328212
##
## $parms$mu2
## [1] 0.739788
##
## $parms$sd1
## [1] 0.1733354
##
## $parms$sd2
## [1] 0.1769805
##
## $parms$kappa
## [1] 1
##
## $parms$welch.df
## [1] FALSE
##
## $parms$paired
## [1] FALSE
##
## $parms$paired.r
## [1] 0.5
##
## $parms$alpha
## [1] 0.05
##
## $parms$margin
## [1] 0
##
## $parms$alternative
## [1] "not equal"
##
## $parms$verbose
## [1] TRUE
##
##
## $test
## [1] "t"
##
## $df
## [1] 22704
##
## $ncp
## [1] -2.996558
##
## $power
## [1] 0.85
##
## $n
## n1 n2
## 11353 11353
##
## attr(,"class")
## [1] "pwrss" "t" "2means"
plot(test_productivity)
## Warning in qt(1 - prob.extreme, df = df, ncp = ncp, lower.tail = TRUE): full
## precision may not have been achieved in 'pnt{final}'
library(pwr)
effect_size <- (mean_weekend - mean_weekday) / ((sd_weekday + sd_weekend) / 2)
sample_size <- pwr.t.test(
d = effect_size,
power = 0.85,
sig.level = 0.05,
type = "two.sample",
alternative = "two.sided")
print(sample_size)
##
## Two-sample t test power calculation
##
## n = 11351.5
## d = 0.03977461
## sig.level = 0.05
## power = 0.85
## alternative = two.sided
##
## NOTE: n is number in *each* group
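Note that the effect size above divides the mean difference by the simple average of the two standard deviations. A conventional pooled-SD Cohen's d is sketched below as a cross-check (using the group objects defined earlier); because the two standard deviations are very close here, the result is nearly identical.
# Cross-check: pooled-SD Cohen's d (conventional definition) versus the
# average-of-SDs version used above; with such similar SDs the two are nearly identical.
n_wd <- nrow(group_weekday)
n_we <- nrow(group_weekend)
sd_pooled <- sqrt(((n_wd - 1) * sd_weekday^2 + (n_we - 1) * sd_weekend^2) / (n_wd + n_we - 2))
d_pooled <- (mean_weekend - mean_weekday) / sd_pooled
d_pooled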
t_test_result <- t.test(group_weekday, group_weekend, var.equal = FALSE)
print(t_test_result)
##
## Welch Two Sample t-test
##
## data: group_weekday and group_weekend
## t = -0.64259, df = 754.95, p-value = 0.5207
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.02825042 0.01431674
## sample estimates:
## mean of x mean of y
## 0.7328212 0.7397880
The t-test showed no significant difference in productivity between weekdays and weekends (p = 0.5207), meaning any small variations are likely due to chance. The box plot supports this by showing that both groups have similar median productivity and spread, with some outliers in each. The density plot further confirms it, as the weekday (red) and weekend (blue) curves overlap almost completely, indicating that workers perform at similar levels regardless of the day. Since the test and plots all point to the same conclusion, adjusting work schedules based on weekday vs. weekend productivity is unnecessary. Instead, analyzing overtime, incentives, or specific weekdays may provide more meaningful insights into productivity trends, so the next steps focus on those factors.
library(ggplot2)
data$weekend_status <- ifelse(data$day %in% c("Saturday", "Sunday"), "Weekend", "Weekday")
ggplot(data, aes(x = weekend_status, y = actual_productivity, fill = weekend_status)) +
geom_boxplot() +
labs(title = "Comparison of Productivity: Weekdays vs. Weekends",
x = "Day Type",
y = "Actual Productivity") +
theme_minimal()
The box plot shows no major difference in productivity between weekdays (red) and weekends (blue). The median and spread of values are similar, indicating consistent worker performance across both periods. This suggests that work schedules do not significantly impact productivity and that adjusting workdays may not be necessary. Instead, focusing on factors like incentives might be more effective for improving efficiency.
ggplot(data, aes(x = actual_productivity, color = weekend_status)) +
geom_density() +
labs(title = "Density Plot of Productivity: Weekdays vs. Weekends",
x = "Actual Productivity",
y = "Density") +
theme_minimal()
The density plot shows nearly identical productivity distributions for weekdays (red) and weekends (blue). The peaks align closely, confirming no significant difference in worker performance between weekdays and weekends. This suggests that work scheduling adjustments may not be necessary, and other factors (such as incentives or workload balance) may have a greater impact on productivity.
To explore how team size impacts worker productivity, I used statistical methods to classify and analyze data based on team sizes. By conducting rigorous statistical tests, including effect size calculation and hypothesis testing, I aimed to determine if significant productivity differences exist between small and large teams.
Steps Executed:
1-Calculate Median Team Size: Identify the median number of workers to define small and large teams.
2-Segment Data: Classify teams as either small or large based on the median.
3-Compute Statistics: Calculate means and standard deviations for each group.
4-Load Libraries: Import pwrss for power analysis and pwr for effect size and sample size calculations.
5-Power Analysis: Determine the necessary sample size to detect productivity differences.
6-Effect Size Calculation: Measure the effect size between small and large teams.
7-Sample Size Calculation: Compute the required sample size to achieve adequate statistical power.
8-Conduct Welch’s t-test: Perform a t-test to statistically compare productivity between small and large teams, and display the results.
if (!"team_category" %in% colnames(data)) {
median_team_size <- median(data$no_of_workers, na.rm = TRUE)
data$team_category <- ifelse(data$no_of_workers > median_team_size, "Large Team", "Small Team")
}
print(head(data$team_category))
## [1] "Large Team" "Small Team" "Small Team" "Small Team" "Large Team"
## [6] "Large Team"
table(data$team_category)
##
## Large Team Small Team
## 582 615
The data set has 582 observations classified as large teams and 615 as small teams, so the two groups are close in size. This balance helps keep the statistical test reliable and minimizes bias: with similar sample sizes, the comparison of productivity between small and large teams is more meaningful.
median_team_size <- median(data$no_of_workers, na.rm = TRUE)
small_team <- data[data$no_of_workers <= median_team_size, "actual_productivity", drop = TRUE]
large_team <- data[data$no_of_workers > median_team_size, "actual_productivity", drop = TRUE]
mean_small_team <- mean(small_team)
mean_large_team <- mean(large_team)
sd_small_team <- sd(small_team)
sd_large_team <- sd(large_team)
cat("Mean (Small Teams):", mean_small_team, "\n")
## Mean (Small Teams): 0.7527851
cat("Mean (Large Teams):", mean_large_team, "\n")
## Mean (Large Teams): 0.7163938
cat("SD (Small Teams):", sd_small_team, "\n")
## SD (Small Teams): 0.1828677
cat("SD (Large Teams):", sd_large_team, "\n")
## SD (Large Teams): 0.163255
The results show that small teams (Mean = 0.7528) have higher productivity compared to large teams (Mean = 0.7164). Additionally, small teams have more variation in productivity (SD = 0.1829) compared to large teams (SD = 0.1633), indicating that larger teams may have more stable but lower productivity levels. Small teams tend to be more productive on average, suggesting possible advantages in efficiency, focus, or workload management. Large teams exhibit slightly lower but more consistent productivity, which may be due to more structured processes or workload distribution. The difference in means suggests team size might influence productivity, but statistical testing is needed to confirm significance.
library(pwrss)
test_team_size <- pwrss.t.2means(
mu1 = mean_small_team,
sd1 = sd_small_team,
mu2 = mean_large_team,
sd2 = sd_large_team,
kappa = 1,
power = 0.85,
alpha = 0.05,
alternative = "not equal" )
## Difference between Two means
## (Independent Samples t Test)
## H0: mu1 = mu2
## HA: mu1 != mu2
## ------------------------------
## Statistical power = 0.85
## n1 = 409
## n2 = 409
## ------------------------------
## Alternative = "not equal"
## Degrees of freedom = 816
## Non-centrality parameter = 3.002
## Type I error rate = 0.05
## Type II error rate = 0.15
print(test_team_size)
## $parms
## $parms$mu1
## [1] 0.7527851
##
## $parms$mu2
## [1] 0.7163938
##
## $parms$sd1
## [1] 0.1828677
##
## $parms$sd2
## [1] 0.163255
##
## $parms$kappa
## [1] 1
##
## $parms$welch.df
## [1] FALSE
##
## $parms$paired
## [1] FALSE
##
## $parms$paired.r
## [1] 0.5
##
## $parms$alpha
## [1] 0.05
##
## $parms$margin
## [1] 0
##
## $parms$alternative
## [1] "not equal"
##
## $parms$verbose
## [1] TRUE
##
##
## $test
## [1] "t"
##
## $df
## [1] 816
##
## $ncp
## [1] 3.002259
##
## $power
## [1] 0.85
##
## $n
## n1 n2
## 409 409
##
## attr(,"class")
## [1] "pwrss" "t" "2means"
plot(test_team_size)
## Warning in qt(1 - prob.extreme, df = df, ncp = ncp, lower.tail = TRUE): full
## precision may not have been achieved in 'pnt{final}'
The power analysis indicates that about 409 observations per group are needed to detect the observed difference in productivity between small and large teams with 85% power (β = 0.15) at a 5% significance level (α = 0.05); the available groups (615 small-team and 582 large-team observations) exceed this requirement. The power curve confirms that the test controls both Type I (false positive) and Type II (false negative) errors, so the results will be statistically reliable when assessing the impact of team size on productivity. The calculated effect size is small to moderate, suggesting that while a difference may be detected, it may not be practically large; productivity differences between small and large teams exist but may not be drastic.
library(pwr)
effect_size_team <- (mean_small_team - mean_large_team) / ((sd_small_team + sd_large_team) / 2)  # positive: small teams higher
sample_size_team <- pwr.t.test(
d = effect_size_team,
power = 0.85,
sig.level = 0.05,
type = "two.sample",
alternative = "two.sided")
print(sample_size_team)
##
## Two-sample t test power calculation
##
## n = 407.0628
## d = 0.2102798
## sig.level = 0.05
## power = 0.85
## alternative = two.sided
##
## NOTE: n is number in *each* group
The required sample size per group is approximately 407, ensuring that our two-sample t-test is statistically reliable. This confirms that we have enough observations to detect potential differences in productivity between small and large teams. The effect size (d = 0.2103) is relatively small, indicating that while a difference in productivity exists, it may not be practically large; team size may have an impact, but the effect is not drastic. Significance level (α = 0.05): the test maintains a 5% probability of Type I errors (false positives), ensuring strict statistical control. Power (1 − β = 0.85): there is an 85% probability of detecting a true difference in productivity if one exists, minimizing the risk of missing a real effect (Type II error).
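To put d ≈ 0.21 in perspective, one hedged way to read a standardized mean difference is the common-language effect size: the probability that a randomly chosen small-team observation exceeds a randomly chosen large-team observation, assuming approximately normal productivity in both groups. A minimal sketch using the effect size computed above:
# Common-language effect size: P(random small-team value > random large-team value),
# assuming approximately normal productivity in both groups.
d <- abs(effect_size_team)
cles <- pnorm(d / sqrt(2))
round(cles, 3)   # roughly 0.56, i.e., only modestly better than a coin flip (0.50)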
t_test_team <- t.test(small_team, large_team, var.equal = FALSE, alternative = "two.sided")
print(t_test_team)
##
## Welch Two Sample t-test
##
## data: small_team and large_team
## t = 3.6361, df = 1191, p-value = 0.0002887
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.01675515 0.05602747
## sample estimates:
## mean of x mean of y
## 0.7527851 0.7163938
The t-test confirms a significant difference in productivity between small and large teams (p-value = 0.0002887). Since p < 0.05, we reject the null hypothesis (H₀) and conclude that team size is associated with productivity. Small teams (Mean = 0.7528) show higher productivity on average, while large teams (Mean = 0.7164) show lower productivity on average. The 95% confidence interval (0.0168, 0.0560) indicates that the true difference in mean productivity falls between roughly 1.7 and 5.6 percentage points in favor of smaller teams. The effect size (d = 0.2103) is small, meaning that while the difference is statistically significant, it may not be practically large. The productivity difference is consistent but not drastic, suggesting that other factors (e.g., task complexity, work environment, or management style) may also contribute to productivity levels. In short, smaller teams tend to perform better, but the effect is modest, and organizations should consider other variables alongside team size for overall productivity improvements.
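As a quick consistency check (a sketch using the approximate relation d ≈ t·√(1/n₁ + 1/n₂) for a two-sample comparison), the effect size can be recovered from the Welch t statistic and the observed group sizes; its magnitude should be close to the d ≈ 0.21 reported above.
# Approximate Cohen's d recovered from the t statistic and the group sizes;
# the relation is exact for the pooled-variance test and approximate for Welch's.
n_small <- length(small_team)
n_large <- length(large_team)
d_from_t <- unname(t_test_team$statistic) * sqrt(1 / n_small + 1 / n_large)
round(d_from_t, 4)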
library(ggplot2)
ggplot(data, aes(x = team_category, y = actual_productivity, fill = team_category)) +
geom_boxplot() +
labs(title = "Effect of Team Size on Productivity",
x = "Team Size",
y = "Actual Productivity") +
theme_minimal() +
theme(legend.position = "none")
The box plot shows that small teams (blue) have higher median productivity than large teams (red). Large teams have more variability and lower outliers, suggesting potential efficiency challenges. Smaller teams may benefit from better communication and coordination, while large teams might need process improvements to enhance productivity.
ggplot(data, aes(x = actual_productivity, fill = team_category)) +
geom_density(alpha = 0.5) +
labs(title = "Density Plot of Productivity: Small vs. Large Teams",
x = "Actual Productivity",
y = "Density",
fill = "Team Size") +
theme_minimal()
The density plot shows that small teams (blue) have higher peak productivity compared to large teams (red). Small teams consistently perform at higher productivity levels, while large teams show more variation and lower peaks. This suggests that smaller teams operate more efficiently, likely due to better communication and workflow. Large teams may require improved coordination strategies to enhance productivity.
I analyzed whether higher financial incentives improve productivity by splitting workers into high- and low-incentive groups based on the median incentive value. Statistical tests were conducted to compare productivity and determine if incentives have a significant impact.
Steps Performed:
1-Compute Median Incentive: Divide workers into high and low incentive groups.
2-Calculate Statistics: Compute and display means and standard deviations for both groups.
3-Load Libraries: Use pwrss and pwr for power and sample size analysis.
4-Perform Power Analysis: Determine the required sample size for a valid test.
5-Compute Effect Size: Measure the strength of the difference between groups.
6-Calculate Sample Size: Find the necessary sample size based on effect size.
7-Compare productivity between high and low incentive groups using Welch’s t-test.
8-Print t-test results to determine if incentives significantly impact productivity.
median_incentive <- median(data$incentive, na.rm = TRUE)
high_incentive <- data[data$incentive > median_incentive, "actual_productivity", drop = TRUE]
low_incentive <- data[data$incentive <= median_incentive, "actual_productivity", drop = TRUE]
mean_high_incentive <- mean(high_incentive)
mean_low_incentive <- mean(low_incentive)
sd_high_incentive <- sd(high_incentive)
sd_low_incentive <- sd(low_incentive)
cat("Mean (High Incentives):", mean_high_incentive, "\n")
## Mean (High Incentives): 0.7580182
cat("Mean (Low Incentives):", mean_low_incentive, "\n")
## Mean (Low Incentives): 0.7125815
cat("SD (High Incentives):", sd_high_incentive, "\n")
## SD (High Incentives): 0.1249053
cat("SD (Low Incentives):", sd_low_incentive, "\n")
## SD (Low Incentives): 0.2098713
The results show that workers who receive higher incentives (Mean = 0.7580) tend to have higher productivity than those with lower incentives (Mean = 0.7126). The standard deviation for high-incentive workers (0.1249) is lower than for low-incentive workers (0.2099), indicating more consistent productivity among highly incentivized workers. Higher financial incentives appear not only to improve average productivity but also to stabilize performance by reducing fluctuations. The difference in means suggests incentives may have a meaningful impact, but statistical significance must be confirmed through hypothesis testing.
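One reason Welch’s t-test (var.equal = FALSE) is used below is that the two incentive groups have clearly different spreads: the ratio of their variances is roughly (0.2099/0.1249)² ≈ 2.8. A hedged way to make this explicit is an F-test of equal variances (which assumes approximate normality); a minimal sketch using the group vectors defined above:
# Equality-of-variances check for the two incentive groups (assumes near-normality);
# a large variance ratio supports using Welch's t-test rather than the pooled-variance test.
var.test(high_incentive, low_incentive)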
library(pwrss)
test_incentives <- pwrss.t.2means(
mu1 = mean_high_incentive,
sd1 = sd_high_incentive,
mu2 = mean_low_incentive,
sd2 = sd_low_incentive,
kappa = 1,
power = 0.85,
alpha = 0.05,
alternative = "greater" # One-tailed test (testing for increase)
)
## Difference between Two means
## (Independent Samples t Test)
## H0: mu1 = mu2
## HA: mu1 > mu2
## ------------------------------
## Statistical power = 0.85
## n1 = 209
## n2 = 209
## ------------------------------
## Alternative = "greater"
## Degrees of freedom = 416
## Non-centrality parameter = 2.69
## Type I error rate = 0.05
## Type II error rate = 0.15
print(test_incentives)
## $parms
## $parms$mu1
## [1] 0.7580182
##
## $parms$mu2
## [1] 0.7125815
##
## $parms$sd1
## [1] 0.1249053
##
## $parms$sd2
## [1] 0.2098713
##
## $parms$kappa
## [1] 1
##
## $parms$welch.df
## [1] FALSE
##
## $parms$paired
## [1] FALSE
##
## $parms$paired.r
## [1] 0.5
##
## $parms$alpha
## [1] 0.05
##
## $parms$margin
## [1] 0
##
## $parms$alternative
## [1] "greater"
##
## $parms$verbose
## [1] TRUE
##
##
## $test
## [1] "t"
##
## $df
## [1] 416
##
## $ncp
## [1] 2.689579
##
## $power
## [1] 0.85
##
## $n
## n1 n2
## 209 209
##
## attr(,"class")
## [1] "pwrss" "t" "2means"
plot(test_incentives)
## Warning in qt(1 - prob.extreme, df = df, ncp = ncp, lower.tail = TRUE): full
## precision may not have been achieved in 'pnt{final}'
The power analysis confirms that we can detect a meaningful difference in productivity between workers receiving high and low incentives. It indicates that 209 observations per group are required to achieve 85% power (β = 0.15) at a 5% significance level (α = 0.05) for the one-tailed test, and the high- and low-incentive groups in this data set are larger than that, so the test is statistically reliable. Significance level (α = 0.05): the test maintains a 5% probability of Type I errors (false positives). Power (1 − β = 0.85): there is an 85% probability of detecting a true difference in productivity if one exists, reducing the risk of a Type II error (false negative). The effect size suggests a moderate impact of incentives on productivity, and the non-centrality parameter (NCP = 2.69) supports the likelihood of detecting a true effect in the hypothesis test.
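For reference, the non-centrality parameter printed above can be reproduced by hand (a sketch, assuming the usual definition for a balanced two-sample t-test with the mean difference standardized by the root-mean-square of the two standard deviations):
# Reproduce the non-centrality parameter: ncp = d * sqrt(n / 2) for equal group sizes,
# where d standardizes the mean difference by the root-mean-square of the two SDs.
d_rms <- (mean_high_incentive - mean_low_incentive) /
  sqrt((sd_high_incentive^2 + sd_low_incentive^2) / 2)
ncp_check <- d_rms * sqrt(209 / 2)   # 209 = per-group n from the pwrss output above
round(ncp_check, 3)                  # approximately 2.69, matching the printed value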
library(pwr)
effect_size_incentive <- (mean_high_incentive - mean_low_incentive) / ((sd_high_incentive + sd_low_incentive) / 2)
sample_size_incentive <- pwr.t.test(
d = effect_size_incentive,
power = 0.85,
sig.level = 0.05,
type = "two.sample",
alternative = "greater"
)
print(sample_size_incentive)
##
## Two-sample t test power calculation
##
## n = 195.8234
## d = 0.2714447
## sig.level = 0.05
## power = 0.85
## alternative = greater
##
## NOTE: n is number in *each* group
The power analysis confirms that we have sufficient data to conduct a reliable t-test on the impact of incentives on productivity. The required sample size per group is approximately 196 (n ≈ 195.8), slightly below the 209 indicated by the pwrss analysis, so the data set of roughly 1,200 observations split at the median incentive comfortably supports the test. The test maintains a 5% probability of Type I errors (false positives), and there is an 85% probability of detecting a true difference in productivity if one exists, reducing the risk of a Type II error (false negative). The effect size (d = 0.2714) is small to moderate, suggesting that higher incentives may positively impact productivity, but the effect is not overwhelmingly large; incentives are likely meaningful but not the only factor contributing to performance.
t_test_incentive <- t.test(high_incentive, low_incentive, var.equal = FALSE, alternative = "greater")
print(t_test_incentive)
##
## Welch Two Sample t-test
##
## data: high_incentive and low_incentive
## t = 4.5612, df = 985.88, p-value = 2.863e-06
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.02903593 Inf
## sample estimates:
## mean of x mean of y
## 0.7580182 0.7125815
The t-test results confirm a statistically significant difference in productivity between workers receiving high and low incentives (p-value = 2.863e-06). Since p < 0.05, we reject the null hypothesis (H₀) and conclude that higher incentives are associated with higher productivity. Workers with higher incentives are more productive on average (Mean = 0.7580), while workers with lower incentives have lower average productivity (Mean = 0.7126). The one-sided 95% confidence interval (0.0290, ∞) indicates the true difference in mean productivity is at least about 2.9 percentage points. The effect size (d = 0.2714) is small to moderate, meaning that while the difference is significant, incentives are not the only factor influencing productivity; other factors (e.g., work conditions, motivation, skill level) may also contribute to performance.
library(ggplot2)
data$incentive_group <- ifelse(data$incentive > median(data$incentive, na.rm = TRUE), "High Incentive", "Low Incentive")
ggplot(data, aes(x = incentive_group, y = actual_productivity, fill = incentive_group)) +
geom_boxplot() +
labs(title = "Effect of Incentives on Productivity",
x = "Incentive Level",
y = "Actual Productivity") +
theme_minimal() +
theme(legend.position = "none")
The box plot confirms that workers with higher incentives (red) tend to have higher and more consistent productivity than those with lower incentives (blue). The median productivity is higher for the high-incentive group, while the low-incentive group shows greater variability. Higher incentives not only boost productivity, but also reduce fluctuations, leading to more stable performance. Since the t-test confirmed a significant difference, optimizing incentive structures could further improve efficiency. However, other factors like work conditions and motivation should also be considered for maximum impact.
ggplot(data, aes(x = actual_productivity, fill = incentive_group)) +
geom_density(alpha = 0.5) +
labs(title = "Density Plot of Productivity: High vs. Low Incentives",
x = "Actual Productivity",
y = "Density",
fill = "Incentive Level") +
theme_minimal()
The density plot confirms that workers receiving higher incentives tend to achieve higher and more stable productivity levels. The high-incentive group (red) has a sharper peak, indicating more consistent performance, while the low-incentive group (blue) shows greater variability. Higher incentives not only improve average productivity but also create more stability in worker performance. This suggests that financial incentives are an effective strategy for boosting efficiency and reducing inconsistencies in productivity.
monday <- data[data$day == "Monday", "actual_productivity", drop = TRUE]
wednesday <- data[data$day == "Wednesday", "actual_productivity", drop = TRUE]
monday <- na.omit(monday)
wednesday <- na.omit(wednesday)
t_test_result <- t.test(monday, wednesday, var.equal = FALSE)
print(t_test_result)
##
## Welch Two Sample t-test
##
## data: monday and wednesday
## t = 0.28436, df = 404.54, p-value = 0.7763
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.02972340 0.03977658
## sample estimates:
## mean of x mean of y
## 0.7354885 0.7304619
The t-test shows no significant difference in productivity between Monday and Wednesday (p = 0.7763). The mean productivity on Monday (0.7355) and Wednesday (0.7305) is nearly identical, and the confidence interval (-0.0297, 0.0398) includes zero, confirming that any small differences are likely due to chance. These results indicate that workers perform consistently on both days, so there is no need for shift adjustments based on the assumption that Monday or Wednesday is more or less productive. Given this, exploring other factors like incentives, overtime, or comparing additional weekdays may provide more meaningful insights into what drives productivity.
library(ggplot2)
data_filtered <- data[data$day %in% c("Monday", "Wednesday"), ]
ggplot(data_filtered, aes(x = day, y = actual_productivity, fill = day)) +
geom_boxplot() +
labs(title = "Comparison of Productivity: Monday vs. Wednesday", x = "Day", y = "Actual Productivity") +
theme_minimal()
The box plot confirms that productivity is similar on Monday and Wednesday. The median values for both days are almost the same. While there are some outliers, they appear in both groups, meaning occasional low productivity happens regardless of the day. This visualization supports the t-test results, which found no significant difference (p = 0.7763) between Monday and Wednesday productivity. Since productivity remains stable across these days, scheduling adjustments based on these specific weekdays are unnecessary. Instead, further analysis should focus on factors like overtime, incentives, or comparisons with other weekdays to uncover more meaningful patterns in productivity.
thursday <- data[data$day == "Thursday", "actual_productivity", drop = TRUE]
thursday <- na.omit(thursday)
t_test_thursday <- t.test(monday, thursday, var.equal = FALSE)
print(t_test_thursday)
##
## Welch Two Sample t-test
##
## data: monday and thursday
## t = 0.73053, df = 395.81, p-value = 0.4655
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.02172747 0.04742271
## sample estimates:
## mean of x mean of y
## 0.7354885 0.7226409
The t-test results show no significant difference in productivity between Monday and Thursday (p = 0.4655). The mean productivity values for both days are very close (0.7355 on Monday vs. 0.7226 on Thursday), and the confidence interval (-0.0217, 0.0474) includes zero, confirming that any variations are likely due to chance. This finding aligns with the previous tests, reinforcing that productivity remains stable across different weekdays. Since productivity does not significantly change based on the day of the week, further analysis should focus on other factors such as overtime, incentives, or workload distribution to identify what truly impacts worker efficiency.
library(ggplot2)
data_filtered <- data[data$day %in% c("Monday", "Thursday"), ]
ggplot(data_filtered, aes(x = day, y = actual_productivity, fill = day)) +
geom_boxplot() +
labs(title = "Comparison of Productivity: Monday vs. Thursday", x = "Day", y = "Actual Productivity") +
theme_minimal()
The box plot shows that Monday and Thursday have similar productivity distributions, with overlapping medians and spread. The slight differences suggest that the day of the week does not significantly impact productivity. Instead of adjusting work schedules, focusing on factors like incentives, workload balance, and efficiency improvements may be more effective for boosting performance.
tuesday <- data[data$day == "Tuesday", "actual_productivity", drop = TRUE]
tuesday <- na.omit(tuesday)
t_test_tuesday <- t.test(monday, tuesday, var.equal = FALSE)
print(t_test_tuesday)
##
## Welch Two Sample t-test
##
## data: monday and tuesday
## t = -0.42296, df = 394.67, p-value = 0.6726
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.04073677 0.02631194
## sample estimates:
## mean of x mean of y
## 0.7354885 0.7427009
The t-test results confirm that there is no significant difference in productivity between Monday and Tuesday (p = 0.6726). The mean productivity on Monday (0.7355) and Tuesday (0.7427) is nearly identical, and the confidence interval (-0.0407, 0.0263) includes zero, indicating that any variation is likely due to chance. This finding aligns with the previous tests, showing that productivity remains stable across different weekdays. Since none of the weekday comparisons show significant differences, the focus should now shift to other factors such as overtime, incentives, or workload distribution to identify what truly influences worker efficiency.
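Because several weekday pairs have now been compared (Monday vs. Wednesday, Thursday, and Tuesday), a cautious robustness check is to adjust the three p-values for multiple testing. A minimal sketch using Holm’s method via base R’s p.adjust is shown below; since all raw p-values are already large, the qualitative conclusion is unchanged.
# Multiple-comparison check for the three pairwise weekday tests above (Holm method).
weekday_pvals <- c(mon_vs_wed = 0.7763, mon_vs_thu = 0.4655, mon_vs_tue = 0.6726)
p.adjust(weekday_pvals, method = "holm")   # all adjusted p-values remain well above 0.05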
library(ggplot2)
data_filtered <- data[data$day %in% c("Monday", "Tuesday"), ]
ggplot(data_filtered, aes(x = day, y = actual_productivity, fill = day)) +
geom_boxplot() +
labs(title = "Comparison of Productivity: Monday vs. Tuesday", x = "Day", y = "Actual Productivity") +
theme_minimal()
The box plot shows similar productivity levels on Monday and Tuesday, with overlapping distributions and no strong differences. The median values are close, and both days have some outliers. This suggests weekday variations do not significantly impact productivity. Instead of adjusting schedules, optimizing incentives, workload distribution, and shift planning may be more effective in improving efficiency.
median_incentive <- median(data$incentive, na.rm = TRUE)
high_incentive <- data[data$incentive > median_incentive, "actual_productivity"]
low_incentive <- data[data$incentive <= median_incentive, "actual_productivity"]
high_incentive <- na.omit(high_incentive)
low_incentive <- na.omit(low_incentive)
t_test_incentive <- t.test(high_incentive, low_incentive, var.equal = FALSE)
print(t_test_incentive)
##
## Welch Two Sample t-test
##
## data: high_incentive and low_incentive
## t = 4.5612, df = 985.88, p-value = 5.726e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.02588834 0.06498499
## sample estimates:
## mean of x mean of y
## 0.7580182 0.7125815
The t-test shows a significant difference in productivity between workers receiving high incentives (0.7580) and those receiving low incentives (0.7126) (p = 5.726e-06). The confidence interval (0.0259, 0.0650) excludes zero, confirming that higher incentives are strongly associated with increased productivity. This suggests that financial incentives are an effective driver of worker performance, and optimizing incentive structures could further enhance efficiency.
ggplot(data, aes(x = actual_productivity, fill = incentive_group)) +
geom_density(alpha = 0.5) +
labs(title = "Density Plot: Impact of Incentives on Productivity",
x = "Actual Productivity",
y = "Density",
fill = "Incentive Level") +
theme_minimal()
The density plot shows that higher incentives are associated with higher and more consistent productivity, while lower incentives show greater variability and more low-performing outliers. This supports the conclusion that financial incentives play a significant role in boosting worker efficiency, and that optimizing incentive structures could lead to further productivity improvements.
Since we found that higher financial incentives improve productivity, the next factor to analyze is overtime to determine whether longer working hours positively or negatively affect worker efficiency.
median_overtime <- median(data$over_time, na.rm = TRUE)
high_overtime <- data[data$over_time > median_overtime, "actual_productivity"]
low_overtime <- data[data$over_time <= median_overtime, "actual_productivity"]
high_overtime <- na.omit(high_overtime)
low_overtime <- na.omit(low_overtime)
t_test_overtime <- t.test(high_overtime, low_overtime, var.equal = FALSE)
print(t_test_overtime)
##
## Welch Two Sample t-test
##
## data: high_overtime and low_overtime
## t = -1.9788, df = 1174.5, p-value = 0.04808
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.0396871137 -0.0001688513
## sample estimates:
## mean of x mean of y
## 0.7251021 0.7450301
The t-test results show a statistically significant negative association between high overtime and productivity (p = 0.04808). The mean productivity for high-overtime workers (0.7251) is lower than for low-overtime workers (0.7450), and the confidence interval (-0.0397, -0.0002) lies entirely below zero, supporting the hypothesis that excessive overtime reduces worker efficiency, possibly through fatigue. Given that incentives increased productivity while overtime decreased it, optimizing workload balance, shift planning, and incentive structures could be key to improving worker efficiency.
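For context on how large this borderline-significant overtime effect is, the same average-of-SDs effect-size convention used earlier in this report can be applied to the overtime groups. This is a sketch; the resulting value is not part of the original output, but it is expected to be small given the modest difference in means.
# Standardized effect size for the overtime comparison, using the same
# average-of-SDs convention applied to the other comparisons in this report.
d_overtime <- (mean(high_overtime) - mean(low_overtime)) /
  ((sd(high_overtime) + sd(low_overtime)) / 2)
round(d_overtime, 3)   # expected to be small in magnitude, consistent with p = 0.048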
library(ggplot2)
data$overtime_group <- ifelse(data$over_time > median_overtime, "High Overtime", "Low Overtime")
ggplot(data, aes(x = overtime_group, y = actual_productivity, fill = overtime_group)) +
geom_boxplot() +
labs(title = "Effect of Overtime on Productivity", x = "Overtime Level", y = "Actual Productivity") +
theme_minimal()
The box plot supports the t-test results, showing that workers with high overtime (red) tend to have lower productivity than those with low overtime (blue). The median productivity is lower for high-overtime workers, and their performance is more consistent but at a lower level. Additionally, the low-overtime group includes more high-performing workers, suggesting that excessive work hours may contribute to fatigue and reduced efficiency. The results confirm that excessive overtime negatively impacts productivity, emphasizing the importance of balanced work schedules to sustain efficiency. Companies should regulate overtime hours to ensure consistent productivity levels. Furthermore, since incentives were shown to improve productivity while excessive overtime reduced it, optimizing both factors could enhance overall worker performance and operational efficiency.
data$department <- tolower(data$department)
data$department <- trimws(data$department)
data$department <- ifelse(data$department == "sweing", "sewing", data$department)
unique(data$department)
## [1] "sewing" "finishing"
sewing <- data[data$department == "sewing", "actual_productivity", drop = TRUE]
finishing <- data[data$department == "finishing", "actual_productivity", drop = TRUE]
sewing <- na.omit(sewing)
finishing <- na.omit(finishing)
t_test_department <- t.test(sewing, finishing, var.equal = FALSE)
print(t_test_department)
##
## Welch Two Sample t-test
##
## data: sewing and finishing
## t = -2.9314, df = 926.17, p-value = 0.003458
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.05165005 -0.01022522
## sample estimates:
## mean of x mean of y
## 0.7220130 0.7529507
The t-test results show a statistically significant difference in productivity between the Sewing and Finishing departments (p = 0.003458). Mean productivity in Sewing (0.7220) is lower than in Finishing (0.7530), and the 95% confidence interval (-0.0517, -0.0102) excludes zero, so we reject the null hypothesis that the departments are equally productive. The gap of roughly 1 to 5 percentage points is modest, and this test alone does not reveal whether team size, overtime, or other department-level factors drive it. Follow-up analysis should therefore examine whether the Sewing–Finishing difference is explained by factors such as team size, overtime hours, machine usage, task complexity, or skill levels.
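To gauge the practical size of this department gap, the same average-of-SDs effect-size convention from earlier sections can be applied here as well (a sketch; the value is not part of the original output but should be modest given the roughly 0.03 difference in means):
# Standardized effect size for the Sewing vs. Finishing comparison, using the
# average-of-SDs convention from earlier sections.
d_dept <- (mean(finishing) - mean(sewing)) / ((sd(sewing) + sd(finishing)) / 2)
round(d_dept, 3)   # expected to be modest, consistent with the ~0.03 difference in means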
library(ggplot2)
ggplot(data, aes(x = department, y = actual_productivity, fill = department)) +
geom_boxplot() +
labs(title = "Productivity Differences Between Departments", x = "Department", y = "Actual Productivity") +
theme_minimal()
The box plot complements the t-test results: the Finishing department shows somewhat higher productivity than Sewing on average, although the distributions overlap considerably and outliers appear in both departments, suggesting that individual performance variation is driven by other factors as well. Because the department difference, while statistically significant, is modest, efficiency improvements should not rely on department restructuring alone; workload balance, shift scheduling, and process optimization remain important, and the drivers of the Sewing–Finishing gap (for example team size, overtime, or task complexity) warrant further investigation.
Our analysis explored multiple factors affecting worker productivity, including weekday differences, incentives, overtime, and departmental variations. The results reveal key insights for optimizing efficiency in the garment manufacturing company. Weekday vs. weekend productivity showed no significant difference (p = 0.5207), meaning schedule adjustments are unnecessary. Higher incentives were significantly associated with increased productivity (p = 5.726e-06), highlighting the importance of financial motivation. Excessive overtime negatively impacted productivity (p = 0.04808), suggesting that fatigue reduces efficiency. The departmental comparison showed that the Finishing department is modestly but significantly more productive than Sewing (p = 0.003458), so department-level factors such as team size, overtime, or task mix merit further investigation. Based on these findings, enhancing incentive programs, reducing excessive overtime, and focusing on operational improvements rather than weekday-based schedule adjustments will do the most to optimize productivity, while the Sewing–Finishing gap points to additional targeted opportunities. Further exploration of machine usage, task complexity, and skill levels could provide additional insights for improving efficiency.