Garment Worker Productivity

Choose two numeric variables, and pair each one with a column you built (i.e., calculated based on others). You should end up with two pairs of columns, each made of one original column and one created ("mutated") column.

All variables for this data dive should be either continuous (i.e., numeric) or ordered (e.g., ['small', 'medium', 'large'] is okay, but ["apples", "oranges", "bananas"] is not).
At least one pair should be a response variable and an explanatory variable.

This week's data dive focuses on understanding how actual productivity compares to what we expected, and on checking how the Standard Minute Value (SMV) and its adjustments due to style changes affect our work. Two key variables have been chosen for this analysis: the actual productivity of our workers and the Adjusted SMV. The first variable measures the efficiency and output of our team, while the second reflects changes in the time allocated per task based on the complexity introduced by style variations. By examining these variables, we aim to identify trends, determine areas that may require adjustments, and enhance overall operational effectiveness.

Pair 1 (SMV and Adjusted SMV):

* Original Variable: SMV (Standard Minute Value)

* Created Variable: Adjusted SMV, calculated by adding extra time to the SMV based on the number of style changes.

Pair 2 (Actual Productivity and Expected Productivity):

* Original Variable: Actual Productivity

* Created Variable: Expected Productivity, calculated by scaling the targeted productivity by a factor based on the amount of overtime worked. The underlying assumption is that overtime affects productivity, either positively or negatively.

library(readr)
data <- read.csv("C:/Users/rbada/Downloads/productivity+prediction+of+garment+employees/garments_worker_productivity.csv")

unfiltered <- data

Calculate Pair 1 and Pair 2

# Calculate Adjusted SMV
data$Adjusted_SMV <- data$smv + 0.5 * data$no_of_style_change
# Calculate Expected Productivity
data$Expected_Productivity <- data$targeted_productivity * (1 + 0.005 * (data$over_time / 60))
head(data[c("smv", "Adjusted_SMV", "actual_productivity", "Expected_Productivity")])

##     smv Adjusted_SMV actual_productivity Expected_Productivity
## 1 26.16        26.16           0.9407254                 1.272
## 2  3.94         3.94           0.8865000                 0.810
## 3 11.41        11.41           0.8005705                 1.044
## 4 11.41        11.41           0.8005705                 1.044
## 5 25.90        25.90           0.8003819                 0.928
## 6 25.90        25.90           0.8001250                 1.248

Adjusted SMV Calculation: We add 0.5 minutes to the SMV for each style change to account for the time it takes to switch tasks. This small addition estimates how much style changes slow production, which makes planning and resource allocation more accurate and keeps the production schedule realistic.

Expected Productivity Calculation: We increase the targeted productivity by 0.5% for every hour of overtime (the code divides over_time by 60, treating it as minutes). This cautious approach assumes that overtime raises output a little, but less than regular work hours do. It helps us use overtime wisely and avoid expecting too much from it.

Analysis of Results: In the six rows shown, SMV and Adjusted SMV are identical, which means none of these records had a style change. The gap between Actual and Expected Productivity is larger: Expected Productivity is higher than Actual Productivity in five of the six rows, and it exceeds 1 in four of them. These cases point to where overtime isn't meeting our expectations, or where the 0.5% factor is too generous for the amount of overtime recorded. Looking into whether certain tasks or teams are more affected by overtime could help us plan better and reduce reliance on working extra hours.
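
As a rough illustration of the two formulas, here is a minimal base R sketch. The function names `adjusted_smv` and `expected_productivity` and the input values are hypothetical (they are not rows from the dataset); the 0.5-minute and 0.5%-per-hour factors are the assumptions used in this report.

```r
# Hypothetical helpers mirroring the two formulas used in this report.
adjusted_smv <- function(smv, style_changes, minutes_per_change = 0.5) {
  # Add a fixed number of minutes for every style change
  smv + minutes_per_change * style_changes
}

expected_productivity <- function(target, over_time_minutes, gain_per_hour = 0.005) {
  # Assumes overtime is recorded in minutes, as the division by 60 implies
  overtime_hours <- over_time_minutes / 60
  target * (1 + gain_per_hour * overtime_hours)
}

# Illustrative inputs only
adj <- adjusted_smv(smv = 26.16, style_changes = 2)
exp_prod <- expected_productivity(target = 0.75, over_time_minutes = 3600)
print(c(adjusted_smv = adj, expected_productivity = exp_prod))
```

With 3600 minutes (60 hours) of overtime the target is scaled by 1.30, which shows how quickly the expected value can climb above 1 when overtime totals are large.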

Calculate Productivity Deviation Distribution.

We use the deviation for a deeper analysis of performance trends and areas for improvement. Deviation is the difference between Actual Productivity and Expected Productivity: a positive value means a record exceeded expectations, and a negative value means it fell short.

library(ggplot2)
# Create Productivity Deviation (Difference between Actual and Expected Productivity)
data$Productivity_Deviation <- data$actual_productivity - data$Expected_Productivity

summary(data$Productivity_Deviation)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -1.08676 -0.44170 -0.27160 -0.26984 -0.07858  0.60237

ggplot(data, aes(x = Productivity_Deviation)) +
  geom_histogram(fill = "blue", bins = 30) +
  labs(title = "Distribution of Productivity Deviation", x = "Deviation", y = "Count")

The histogram shows that most records fall short of expected productivity: the median deviation is -0.27 and even the third quartile is negative (-0.08), so more than three quarters of the observations are below expectation. This suggests that working overtime might not be as helpful as the Expected Productivity formula assumes, or that other issues are holding productivity down. To fix this, we need to look more closely at teams that are not doing well and try to understand why. We should also check if our expectations are too high and adjust them to be more realistic. Finally, it would be good to see what the best-performing teams are doing right and try to use some of their methods in other teams.

Here are a few steps we can take:

* Understand why some teams aren't performing well.
* Check if our productivity goals are achievable.
* Learn from the top-performing teams.
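
To put a number on "most", one option is to compute the share of records with a negative deviation. This is a minimal sketch on made-up values; the `deviation` vector is hypothetical and would be replaced by `data$Productivity_Deviation`.

```r
# Illustrative deviations (actual minus expected); not taken from the dataset
deviation <- c(-0.44, -0.27, -0.08, 0.05, -0.31, 0.60, -1.09, -0.12)

share_below <- mean(deviation < 0)   # proportion of records below expectation
typical_gap <- median(deviation)     # middle of the distribution

cat(sprintf("%.0f%% of records fall below expectation; median gap %.3f\n",
            100 * share_below, typical_gap))
```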

library(dplyr)

team_productivity <- data %>%
  group_by(team) %>%
  summarise(avg_productivity = mean(actual_productivity, na.rm = TRUE)) %>%
  arrange(desc(avg_productivity))

Team Productivity Ranking

library(ggplot2)
library(dplyr)
# Calculate average productivity for each team
team_productivity <- data %>%
  group_by(team) %>%
  summarise(avg_productivity = mean(actual_productivity, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(
    Productivity_Rank = rank(-avg_productivity, ties.method = "min")  
  ) %>%
  arrange(Productivity_Rank) %>%
  mutate(
    num_teams = n(),  
    Performance_Category = case_when(
      Productivity_Rank <= num_teams / 3 ~ "High Performance",   
      Productivity_Rank <= (2 * num_teams) / 3 ~ "Medium Performance",  
      TRUE ~ "Low Performance" )) 
print(team_productivity)

## # A tibble: 12 × 5
##     team avg_productivity Productivity_Rank num_teams Performance_Category
##    <int>            <dbl>             <int>     <int> <chr>               
##  1     1            0.821                 1        12 High Performance    
##  2     3            0.804                 2        12 High Performance    
##  3    12            0.779                 3        12 High Performance    
##  4     2            0.771                 4        12 High Performance    
##  5     4            0.770                 5        12 Medium Performance  
##  6     9            0.734                 6        12 Medium Performance  
##  7    10            0.720                 7        12 Medium Performance  
##  8     5            0.698                 8        12 Medium Performance  
##  9     6            0.685                 9        12 Low Performance     
## 10    11            0.682                10        12 Low Performance     
## 11     8            0.674                11        12 Low Performance     
## 12     7            0.668                12        12 Low Performance

ggplot(team_productivity, aes(x = reorder(as.factor(team), -avg_productivity), y = avg_productivity, fill = Performance_Category)) +
  geom_col() +
  labs(title = "Team Productivity Rankings", x = "Team", y = "Average Productivity") +
  scale_fill_manual(values = c("High Performance" = "green", "Medium Performance" = "yellow", "Low Performance" = "red")) +
  coord_flip()

Teams 1, 3, 12, and 2 fall in the High Performance tier, with the highest average productivity (0.771 to 0.821). The data does not show why; efficient work processes, strong teamwork, and effective management are plausible explanations that would need to be checked.

Teams 4, 9, 10, and 5 are categorized as medium performers (0.698 to 0.770). They meet the basic productivity expectations but show potential for improvement. Note that Team 4 (0.770) is only 0.001 behind Team 2 (0.771), so the boundary between the top two tiers is thin.

Teams 6, 11, 8, and 7 are low performers (0.668 to 0.685). Possible causes include fatigue from overtime and poorly coordinated roles and tasks, but these are hypotheses to test against the overtime and idle-time figures later in this analysis, not findings. It's important for the organization to look into these teams and help them improve.
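
The tier boundaries can be reproduced from the averages printed above. The following base R sketch uses those rounded values and applies the same rank-and-thirds rule; the variable names `tier`, `ranking`, and `gap_high_medium` are illustrative.

```r
# Team averages as printed in the ranking table (rounded to 3 decimals)
team <- c(1, 3, 12, 2, 4, 9, 10, 5, 6, 11, 8, 7)
avg_productivity <- c(0.821, 0.804, 0.779, 0.771, 0.770, 0.734,
                      0.720, 0.698, 0.685, 0.682, 0.674, 0.668)

# Rank 1 = highest average productivity
productivity_rank <- rank(-avg_productivity, ties.method = "min")
n_teams <- length(team)

# Top third is High, middle third is Medium, the rest is Low
tier <- ifelse(productivity_rank <= n_teams / 3, "High",
        ifelse(productivity_rank <= 2 * n_teams / 3, "Medium", "Low"))

ranking <- data.frame(team, avg_productivity, productivity_rank, tier)
print(ranking)

# Gap between the last High team and the first Medium team
gap_high_medium <- avg_productivity[4] - avg_productivity[5]
print(gap_high_medium)
```

Because the cut-offs depend only on rank, two teams with nearly identical averages can land in different tiers.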

Calculate SMV Deviation

# Create the deviation column
data$SMV_Deviation <- data$Adjusted_SMV - data$smv
summary(data$SMV_Deviation)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.00000 0.00000 0.07519 0.00000 1.00000

ggplot(data, aes(x = SMV_Deviation)) +
  geom_histogram(fill = "purple", bins = 30) +
  labs(title = "Distribution of SMV Deviation", x = "SMV Deviation", y = "Count")

The histogram shows that most SMV deviations are exactly 0: the first quartile, median, and third quartile are all 0, so at least three quarters of the records had no style change. Because the deviation is defined as 0.5 minutes per style change, the maximum of 1.0 corresponds to two style changes. A small share of records therefore carries all of the deviation. To improve consistency, investigate which teams and styles these records belong to, and whether they coincide with operational inefficiencies like idle time and poor task allocation. Standardizing task handling could help reduce the impact of style changes and improve overall efficiency.
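
Since the deviation is a fixed multiple of the style-change count, a frequency table is often easier to read than a histogram. A minimal sketch with hypothetical counts (the `no_of_style_change` vector below is made up for illustration):

```r
# Illustrative style-change counts; not the real column
no_of_style_change <- c(0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0)

# Same rule as in this report: 0.5 minutes per style change
smv_deviation <- 0.5 * no_of_style_change

# The deviation can only take a handful of distinct values
print(table(smv_deviation))

mean_deviation <- mean(smv_deviation)
print(mean_deviation)
```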

Team SMV Deviation Ranking

library(ggplot2)
library(dplyr)

# Calculate average SMV Deviation for each team
team_smv_deviation <- data %>%
  group_by(team) %>%
  summarise(avg_SMV_Deviation = mean(SMV_Deviation, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(
    # Rank ascending: the smallest deviation gets rank 1,
    # so a low deviation maps to "High Performance"
    SMV_Rank = rank(avg_SMV_Deviation, ties.method = 'min')
  ) %>%
  arrange(SMV_Rank) %>%
  mutate(
    num_teams = n(),  # Number of teams
    Performance_Category = case_when(
      SMV_Rank <= num_teams / 3 ~ "High Performance",
      SMV_Rank <= (2 * num_teams) / 3 ~ "Medium Performance",
      TRUE ~ "Low Performance"
    ))
print(team_smv_deviation)

## # A tibble: 12 × 5
##     team avg_SMV_Deviation SMV_Rank num_teams Performance_Category
##    <int>             <dbl>    <int>     <int> <chr>               
##  1    12            0             1        12 High Performance    
##  2     1            0.0286        2        12 High Performance    
##  3     6            0.0372        3        12 High Performance    
##  4     5            0.0430        4        12 High Performance    
##  5     9            0.0529        5        12 Medium Performance  
##  6    10            0.055         6        12 Medium Performance  
##  7     2            0.0780        7        12 Medium Performance  
##  8     4            0.1           8        12 Medium Performance  
##  9     7            0.104         9        12 Low Performance     
## 10     3            0.132        10        12 Low Performance     
## 11     8            0.133        11        12 Low Performance     
## 12    11            0.142        12        12 Low Performance

library(ggplot2)
library(dplyr)
ggplot(team_smv_deviation, aes(x = factor(team, levels = 1:12), y = avg_SMV_Deviation, fill = Performance_Category)) +
  geom_col() +
  labs(title = "Team SMV Deviation Ranking", x = "Team", y = "Average SMV Deviation") +
  scale_fill_manual(values = c("High Performance" = "green", "Medium Performance" = "yellow", "Low Performance" = "red")) +  # Only include 3 categories
  scale_x_discrete(limits = factor(1:12)) +  
  coord_flip() +
  theme_minimal() +
  theme(axis.text.y = element_text(angle = 0, hjust = 1))

Teams 11, 8, 3, and 7 have the highest average SMV deviations and are treated here as Low Performance on this measure. Because the deviation is 0.5 minutes per style change, these teams handled more style changes per record, which may lead to inconsistencies and inefficiencies. On the other hand, Teams 12, 1, 6, and 5 have the smallest SMV deviations and are treated as High Performance, indicating that their work was rarely disrupted by style variations.

To improve, Low Performance teams could standardize their processes to minimize downtime and improve efficiency. Investing in training programs to help workers adapt to style changes more smoothly could also reduce the impact on productivity. Following the example of High Performance teams can provide valuable insights into best practices for managing task transitions effectively.

Taken together, SMV vs Adjusted SMV and Actual vs Expected Productivity give a mixed picture. Teams 1 and 12 combine high productivity with small SMV deviation, and Teams 7, 8, and 11 combine low productivity with larger deviation. The pattern is not universal: Team 3 ranks second in productivity despite the third-highest deviation, and Team 6 has a small deviation but low productivity. To understand this better, investigate why some teams handle style changes efficiently and stay productive while others struggle. Standardizing best practices from high-performing teams, such as improved task allocation and training, can help boost performance and reduce inefficiencies across teams.

Steps to Investigate:

* Analyze Task Allocation: Compare how tasks are distributed in high vs. low-performing teams to identify inefficiencies.
* Evaluate Training Gaps: Determine whether low-performing teams need additional training to handle style variations.
* Review Workflow Efficiency: Identify potential delays, downtime, or bottlenecks affecting low-performing teams.
* Study Best Practices: Document strategies used by high-performing teams and explore ways to implement them across all teams.
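
One way to check how closely the productivity ranking and the SMV deviation ranking agree is a Spearman rank correlation between the team averages. This is a sketch in base R using the rounded per-team values printed in this report (teams 1 to 12 in order); it is an added check, not part of the original analysis.

```r
# Per-team averages for teams 1..12, copied from the printed tables
avg_productivity  <- c(0.821, 0.771, 0.804, 0.770, 0.698, 0.685,
                       0.668, 0.674, 0.734, 0.720, 0.682, 0.779)
avg_smv_deviation <- c(0.0286, 0.0780, 0.132, 0.1, 0.0430, 0.0372,
                       0.104, 0.133, 0.0529, 0.055, 0.142, 0)

# Spearman correlation compares the two orderings rather than the raw values
rho <- cor(avg_productivity, avg_smv_deviation, method = "spearman")
print(round(rho, 2))
```

A negative value means teams with larger SMV deviation tend to have lower productivity; a value far from -1 means the two rankings agree only partly.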

Investigate style change handling for high and low-performing teams

# Investigate style change handling for high and low-performing teams
style_change_impact <- data %>%
  filter(team %in% c(1, 3, 12, 2, 9, 10, 5, 4, 6, 11, 8, 7)) %>%
  group_by(team) %>%
  summarise(
    avg_style_change_impact = mean(SMV_Deviation, na.rm = TRUE),
    total_style_changes = sum(no_of_style_change, na.rm = TRUE),
    total_downtime = sum(idle_time, na.rm = TRUE)
  )
print("Style Change Impact on Teams:")

## [1] "Style Change Impact on Teams:"

print(style_change_impact)

## # A tibble: 12 × 4
##     team avg_style_change_impact total_style_changes total_downtime
##    <int>                   <dbl>               <int>          <dbl>
##  1     1                  0.0286                   6            0  
##  2     2                  0.0780                  17            6.5
##  3     3                  0.132                   25            0  
##  4     4                  0.1                     21          150  
##  5     5                  0.0430                   8           98  
##  6     6                  0.0372                   7            0  
##  7     7                  0.104                   20          286  
##  8     8                  0.133                   29          314. 
##  9     9                  0.0529                  11            0  
## 10    10                  0.055                   11           16  
## 11    11                  0.142                   25            4  
## 12    12                  0                        0            0

Investigate Impact of Style Changes on Teams

# include.lowest = TRUE keeps teams with 0 style changes (Team 12) in the "Low" bin
ggplot(style_change_impact, aes(x = reorder(as.factor(team), avg_style_change_impact), 
                                y = avg_style_change_impact, 
                                fill = cut(total_style_changes, breaks = c(0, 10, 20, Inf), 
                                           labels = c("Low", "Medium", "High"),
                                           include.lowest = TRUE))) +
  geom_col() +
  labs(title = "Impact of Style Changes on Teams", x = "Team", y = "Average SMV Deviation", fill = "Style Change Impact") +
  scale_fill_manual(values = c("Low" = "green", "Medium" = "yellow", "High" = "red")) +  
  coord_flip() +
  theme_minimal()

Based on the data, Teams 1, 6, 9, and 12 are high performers on this measure: they show a low style change impact (0.053 or less) and zero downtime, suggesting they handle transitions efficiently. Teams 3, 4, 7, 8, and 11 face the highest style change impact (0.10 or more). Of these, Teams 4, 7, and 8 also have significant downtime (150 to 313.5), while Team 3 has none and Team 11 has only 4. Teams 2, 5, and 10 have moderate issues, suggesting a mix of strengths and weaknesses.

To improve performance:

* High performers should be studied for best practices in managing style changes and task allocation.
* Low performers need to address their downtime and style change disruptions, potentially through better task division or additional training.
* Moderate performers should focus on optimizing processes and addressing specific inefficiencies.

The next step is to standardize efficient processes and invest in training and tools to reduce downtime and improve task handling.
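
One detail of `cut()` matters when binning style-change totals: by default its intervals are open on the left, so a value equal to the lowest break becomes `NA` unless `include.lowest = TRUE` is set. Team 12 has 0 style changes, so it is affected. A small base R sketch, using totals from Teams 12, 1, 2, and 3 in the table above:

```r
# Totals for Teams 12, 1, 2 and 3 (from the style change table)
total_style_changes <- c(0, 6, 17, 25)
breaks <- c(0, 10, 20, Inf)
labels <- c("Low", "Medium", "High")

# Default: intervals are (0,10], (10,20], (20,Inf], so 0 falls outside
default_bins <- cut(total_style_changes, breaks = breaks, labels = labels)

# include.lowest = TRUE closes the first interval: [0,10]
closed_bins <- cut(total_style_changes, breaks = breaks, labels = labels,
                   include.lowest = TRUE)

print(data.frame(total_style_changes, default_bins, closed_bins))
```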

Investigate task allocation for high vs. low-performing teams

# Investigate task allocation for high vs. low-performing teams
task_allocation_analysis <- data %>%
  filter(team %in% c(1, 3, 12, 2, 9, 10, 5, 4, 6, 11, 8, 7)) %>%
  group_by(team) %>%
  summarise(
    avg_task_size = mean(no_of_workers, na.rm = TRUE),
    total_overtime = sum(over_time, na.rm = TRUE),  
    total_idle_time = sum(idle_time, na.rm = TRUE)  
  )
print("Task Allocation and Overtime Analysis:")

## [1] "Task Allocation and Overtime Analysis:"

print(task_allocation_analysis)

## # A tibble: 12 × 4
##     team avg_task_size total_overtime total_idle_time
##    <int>         <dbl>          <int>           <dbl>
##  1     1          35.0         503310             0  
##  2     2          34.6         477960             6.5
##  3     3          39.5         510690             0  
##  4     4          38.2         572220           150  
##  5     5          39.4         495780            98  
##  6     6          25.2         316695             0  
##  7     7          37.1         466290           286  
##  8     8          33.5         470040           314. 
##  9     9          35.2         469980             0  
## 10    10          35.3         473670            16  
## 11    11          38.7         382140             4  
## 12    12          23.9         328475             0

The results indicate that Teams 1, 3, 6, and 12 have no idle time at all, which suggests that they organize their work effectively. These four are the teams treated as high performers in the rest of this analysis, although Team 6 sits in the low tier of the productivity ranking. In contrast, Teams 4, 7, and 8 experience significant idle time (150 to 313.5), which likely hinders their productivity. Team 4 also has the highest total overtime (572,220), but Teams 7 and 8 have overtime totals below those of Teams 1 and 3, so high overtime is not common to all three. The remaining teams (2, 5, 9, 10, and 11) fall in between, with idle time ranging from 0 to 98. To improve overall performance, it would be beneficial to study the practices of the teams with no idle time. Focusing on reducing idle time and improving task allocation in low-performing teams could optimize productivity.

Investigate high-performing teams: 1, 3, 6, 12

# Investigate high-performing teams: 1, 3, 6, 12
high_performing_teams <- data %>%
  filter(team %in% c(1, 3, 6, 12)) %>%
  group_by(team) %>%
  summarise(
    avg_task_size = mean(no_of_workers, na.rm = TRUE),
    total_overtime = sum(over_time, na.rm = TRUE),
    total_idle_time = sum(idle_time, na.rm = TRUE)
  )
print(high_performing_teams)

## # A tibble: 4 × 4
##    team avg_task_size total_overtime total_idle_time
##   <int>         <dbl>          <int>           <dbl>
## 1     1          35.0         503310               0
## 2     3          39.5         510690               0
## 3     6          25.2         316695               0
## 4    12          23.9         328475               0

While we have identified high-performing teams (Teams 1, 3, 6, and 12), further investigation is required to understand the underlying factors contributing to their success. We need to explore how these teams allocate tasks and manage transitions, as well as any specific tools or techniques they use to optimize their workflow. Additionally, it is important to look into how overtime is planned and executed, as these teams manage to avoid idle time despite working overtime. Finally, understanding their efficiency in task transitions, which minimizes downtime, could provide valuable insights that can be applied to other teams to improve performance.

Investigate low-performing teams: 4, 7, 8

# Investigate low-performing teams: 4, 7, 8
low_performing_teams <- data %>%
  filter(team %in% c(4, 7, 8)) %>%
  group_by(team) %>%
  summarise(
    avg_task_size = mean(no_of_workers, na.rm = TRUE),
    total_overtime = sum(over_time, na.rm = TRUE),
    total_idle_time = sum(idle_time, na.rm = TRUE)
  )
print(low_performing_teams)

## # A tibble: 3 × 4
##    team avg_task_size total_overtime total_idle_time
##   <int>         <dbl>          <int>           <dbl>
## 1     4          38.2         572220            150 
## 2     7          37.1         466290            286 
## 3     8          33.5         470040            314.

Team 4 has a large team (38.2 workers on average) and the highest total overtime (572,220 minutes, given that the Expected Productivity formula divides over_time by 60 to get hours), along with an idle time of 150. This suggests the team is overburdened, possibly due to overwork or delays during task transitions. The combination of high overtime and idle time indicates inefficiencies that need to be addressed through better resource management and task allocation.

Similarly, Team 7, with 37.1 workers on average, has substantial overtime (466,290 minutes) and considerable idle time (286). This points to inefficiencies related to task management. Streamlining task distribution and improving planning could help reduce idle time and optimize the team's workload.

Team 8 is smaller (33.5 workers on average), but it still records substantial overtime (470,040 minutes) and the highest idle time (313.5). This again highlights problems with task allocation and resource management. Improving planning, coordination, and task distribution will be crucial in reducing idle time and ensuring efficient use of available resources.
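
Overtime totals are easier to interpret in hours. The sketch below converts the three totals, assuming `over_time` is recorded in minutes (the same assumption the Expected Productivity formula makes when it divides by 60); the data frame name `low_teams` is illustrative.

```r
# Total overtime for the three low-performing teams, from the table above
low_teams <- data.frame(
  team = c(4, 7, 8),
  total_overtime_min = c(572220, 466290, 470040)
)

# Assumption: over_time is in minutes, so divide by 60 for hours
low_teams$total_overtime_hours <- low_teams$total_overtime_min / 60
print(low_teams)
```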

Investigate Overtime: High vs Low Performers

team_comparison <- bind_rows(
  mutate(high_performing_teams, category = "High Performers"),
  mutate(low_performing_teams, category = "Low Performers")
)
print(team_comparison)

## # A tibble: 7 × 5
##    team avg_task_size total_overtime total_idle_time category       
##   <int>         <dbl>          <int>           <dbl> <chr>          
## 1     1          35.0         503310              0  High Performers
## 2     3          39.5         510690              0  High Performers
## 3     6          25.2         316695              0  High Performers
## 4    12          23.9         328475              0  High Performers
## 5     4          38.2         572220            150  Low Performers 
## 6     7          37.1         466290            286  Low Performers 
## 7     8          33.5         470040            314. Low Performers

team_comparison$team <- factor(team_comparison$team, levels = c(1, 3, 6, 12, 4, 7, 8))
ggplot(team_comparison, aes(x = team, y = total_overtime, fill = category)) +
  geom_col(position = "dodge") +
  labs(title = "Overtime: High vs Low Performers", x = "Team", y = "Total Overtime") +
  scale_fill_manual(values = c("High Performers" = "green", "Low Performers" = "red")) +
  theme_minimal()

High-performing teams, such as Teams 1, 3, 6, and 12, have no idle time despite significant overtime. Teams 6 and 12 show that smaller teams (about 24 to 25 workers) can avoid idle time just as well as larger ones; Team 12 also reaches an average productivity of 0.779, although Team 6 averages only 0.685. In contrast, low-performing teams like Teams 4, 7, and 8 experience high idle time, indicating workflow inefficiencies. Their substantial overtime does not come with a corresponding boost in productivity, suggesting that overtime isn't helping. Teams 4 and 7, with larger team sizes, still struggle, showing that more workers don't always improve efficiency.

Investigate Team 3

team_3_analysis <- data %>%
  filter(team == 3) %>%
  summarise(
    avg_task_size = mean(no_of_workers, na.rm = TRUE),
    total_overtime = sum(over_time, na.rm = TRUE),
    total_idle_time = sum(idle_time, na.rm = TRUE)
  )
print(team_3_analysis)

##   avg_task_size total_overtime total_idle_time
## 1      39.52105         510690               0

# Flag the team only if both overtime and idle time are high (nothing is printed for Team 3)
if (team_3_analysis$total_overtime > 500000 && team_3_analysis$total_idle_time > 100) {
  print("Team 3 may be overloaded or struggling with task management.")
}

Team 3’s success in managing a larger team shows that team size does not necessarily lead to inefficiency. What matters more is how the team organizes itself. Despite having the largest average team size (39.52 workers), Team 3 records zero idle time, which suggests that its workflows are well-structured and tasks are properly allocated. The team’s overtime is significant, but the absence of idle time implies that the extra hours are being used productively rather than compensating for wasted time. Idle time alone cannot show whether the team is overworked, so this should be read as a sign of good organization, not as proof that the overtime level is sustainable.
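
The same overload rule can be applied to every team at once instead of one team at a time. This is a base R sketch using the per-team totals printed in this report; the thresholds (500,000 for overtime and 100 for idle time) are the ones used in the check above, and the column name `possibly_overloaded` is illustrative.

```r
# Per-team totals for teams 1..12, copied from the printed tables
key_metrics <- data.frame(
  team = 1:12,
  total_overtime = c(503310, 477960, 510690, 572220, 495780, 316695,
                     466290, 470040, 469980, 473670, 382140, 328475),
  total_idle_time = c(0, 6.5, 0, 150, 98, 0, 286, 313.5, 0, 16, 4, 0)
)

overtime_limit <- 500000
idle_limit <- 100

# A team is flagged only when both totals exceed their thresholds
key_metrics$possibly_overloaded <-
  key_metrics$total_overtime > overtime_limit &
  key_metrics$total_idle_time > idle_limit

print(key_metrics[key_metrics$possibly_overloaded, ])
```

Teams 1 and 3 exceed the overtime threshold but have no idle time, so they are not flagged under this rule.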

Compare Task Allocation Between High and Low Performers

# Calculate the average task size for high and low performers
high_performing_teams <- data %>%
  filter(team %in% c(1, 3, 6, 12)) %>%
  summarise(avg_task_size = mean(no_of_workers, na.rm = TRUE))

low_performing_teams <- data %>%
  filter(team %in% c(4, 7, 8)) %>%
  summarise(avg_task_size = mean(no_of_workers, na.rm = TRUE))
task_allocation_comparison <- data.frame(
  Category = c("High Performers", "Low Performers"),
  Avg_Task_Size = c(high_performing_teams$avg_task_size, low_performing_teams$avg_task_size)
)
print(task_allocation_comparison)

##          Category Avg_Task_Size
## 1 High Performers      30.96183
## 2  Low Performers      36.20323

# Create a bar plot to compare task allocation between high and low performers
ggplot(task_allocation_comparison, aes(x = Category, y = Avg_Task_Size, fill = Category)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  labs(title = "Task Allocation: High vs Low Performers", x = "Category", y = "Average Task Size") +
  scale_fill_manual(values = c("High Performers" = "green", "Low Performers" = "red")) +
  theme_minimal()

High-performing teams have an average team size of 30.96 workers, while low-performing teams average 36.20 workers. This suggests that high performers may be more efficient with smaller teams, allowing for better coordination and task management. In contrast, low performers may struggle with larger teams, leading to inefficiencies, poor coordination, or miscalculated resources. Team 3 is a counterexample, since it is the largest team (39.5 workers) and still a high performer, so team size alone does not explain the gap. Reducing team size or optimizing task allocation could still help improve performance in low-performing teams.

Compare Overtime Between High and Low Performers

# Calculate total overtime for high-performing teams
high_overtime <- data %>%
  filter(team %in% c(1, 3, 6, 12)) %>%
  summarise(total_overtime = sum(over_time, na.rm = TRUE))

# Calculate total overtime for low-performing teams
low_overtime <- data %>%
  filter(team %in% c(4, 7, 8)) %>%
  summarise(total_overtime = sum(over_time, na.rm = TRUE))
overtime_comparison <- data.frame(
  Category = c("High Performers", "Low Performers"),
  Total_Overtime = c(high_overtime$total_overtime, low_overtime$total_overtime))

print(overtime_comparison)

##          Category Total_Overtime
## 1 High Performers        1659170
## 2  Low Performers        1508550

# Compare overtime between high and low performers (plot the category totals computed above)
ggplot(overtime_comparison, aes(x = Category, y = Total_Overtime, fill = Category)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  labs(title = "Overtime: High vs Low Performers", x = "Category", y = "Total Overtime") +
  scale_fill_manual(values = c("High Performers" = "green", "Low Performers" = "red")) +
  theme_minimal()

High performers have slightly more total overtime than low performers (1,659,170 vs 1,508,550), but the totals cover four teams against three. Per team, high performers average about 414,793 and low performers about 502,850, so each low-performing team actually works more overtime. This suggests that high-performing teams manage their overtime more efficiently, while low performers may be using overtime to make up for inefficiencies during regular hours, such as idle time or delays.

We need to investigate whether overtime is being used effectively. By comparing productivity with overtime, we can determine if more overtime actually leads to improved performance or if it simply indicates inefficiency. We should also examine the correlation between overtime and performance: by plotting overtime against productivity, we can assess whether high overtime goes with higher output or whether it is a sign of poor task allocation.
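
Because the two groups contain different numbers of teams, a per-team average is a fairer comparison than the raw totals. A minimal sketch using the totals printed above and the group sizes used in this section (four high-performing teams, three low-performing teams):

```r
# Group totals from the overtime comparison, plus the number of teams in each group
overtime <- data.frame(
  category = c("High Performers", "Low Performers"),
  total_overtime = c(1659170, 1508550),
  n_teams = c(4, 3)
)

# Normalize by group size so the groups are comparable
overtime$overtime_per_team <- overtime$total_overtime / overtime$n_teams
print(overtime)
```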

Calculate Total Idle Time for High and Low Performers

# Calculate total idle time for high-performing teams
high_idle_time <- data %>%
  filter(team %in% c(1, 3, 6, 12)) %>%
  summarise(total_idle_time = sum(idle_time, na.rm = TRUE))

# Calculate total idle time for low-performing teams
low_idle_time <- data %>%
  filter(team %in% c(4, 7, 8)) %>%
  summarise(total_idle_time = sum(idle_time, na.rm = TRUE))

idle_time_comparison <- data.frame(
  Category = c("High Performers", "Low Performers"),
  Total_Idle_Time = c(high_idle_time$total_idle_time, low_idle_time$total_idle_time))

print(idle_time_comparison)

##          Category Total_Idle_Time
## 1 High Performers             0.0
## 2  Low Performers           749.5

# Plot the comparison of idle time between high and low performers
ggplot(idle_time_comparison, aes(x = Category, y = Total_Idle_Time, fill = Category)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  labs(title = "Idle Time: High vs Low Performers", x = "Category", y = "Total Idle Time") +
  scale_fill_manual(values = c("High Performers" = "green", "Low Performers" = "red")) +
  theme_minimal()

High performers have zero idle time, indicating efficient use of their working hours. In contrast, low performers have a combined idle time of 749.5, suggesting significant workflow inefficiencies or delays. To improve performance, low-performing teams need to address these inefficiencies by optimizing task allocation and streamlining workflows to reduce downtime.

Calculate Productivity vs Overtime

# Calculate productivity vs overtime for each team
productivity_overtime_comparison <- data %>%
  group_by(team) %>%
  summarise(
    total_overtime = sum(over_time, na.rm = TRUE),
    avg_productivity = mean(actual_productivity, na.rm = TRUE)
  )
print(productivity_overtime_comparison)

## # A tibble: 12 × 3
##     team total_overtime avg_productivity
##    <int>          <int>            <dbl>
##  1     1         503310            0.821
##  2     2         477960            0.771
##  3     3         510690            0.804
##  4     4         572220            0.770
##  5     5         495780            0.698
##  6     6         316695            0.685
##  7     7         466290            0.668
##  8     8         470040            0.674
##  9     9         469980            0.734
## 10    10         473670            0.720
## 11    11         382140            0.682
## 12    12         328475            0.779

# Plot Productivity vs Overtime
ggplot(productivity_overtime_comparison, aes(x = total_overtime, y = avg_productivity, color = as.factor(team))) +
  geom_point(size = 4) + # scatter plot
  labs(title = "Productivity vs Overtime", x = "Total Overtime", y = "Average Productivity") +
  theme_minimal()

Correlation Between Total Overtime And Average Productivity

# Calculate the correlation between total overtime and average productivity
correlation_result <- cor(productivity_overtime_comparison$total_overtime, 
                          productivity_overtime_comparison$avg_productivity)
print(paste("Correlation between Overtime and Productivity: ", round(correlation_result, 2)))

## [1] "Correlation between Overtime and Productivity:  0.33"

The correlation of 0.33 between total overtime and average productivity indicates a weak positive relationship across the 12 teams. Teams that logged more overtime tended to have slightly higher average productivity, but the association is too weak to treat overtime as a major driver of performance, and a correlation alone cannot show that overtime causes the difference. Other factors are likely more influential, and overtime may itself be a symptom of inefficiencies or poor time management during regular hours rather than a route to better performance.
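With only 12 team-level points, it helps to check how uncertain that 0.33 is. The following is a minimal sketch, not part of the original analysis: it re-enters the values from the summary table above by hand (productivity rounded to three decimals, as printed) and uses `cor.test()` from base R to attach a confidence interval and p-value to the correlation. The name `team_summary` is introduced here for illustration.

```r
# Team-level values copied by hand from the printed summary table
# (avg_productivity is rounded to three decimals, as printed).
team_summary <- data.frame(
  team = 1:12,
  total_overtime = c(503310, 477960, 510690, 572220, 495780, 316695,
                     466290, 470040, 469980, 473670, 382140, 328475),
  avg_productivity = c(0.821, 0.771, 0.804, 0.770, 0.698, 0.685,
                       0.668, 0.674, 0.734, 0.720, 0.682, 0.779)
)

# Pearson correlation test: estimate, 95% confidence interval, p-value
ct <- cor.test(team_summary$total_overtime, team_summary$avg_productivity)

r_estimate <- unname(ct$estimate)
r_ci <- as.numeric(ct$conf.int)
p_val <- ct$p.value

print(round(r_estimate, 2))
print(round(r_ci, 2))
print(round(p_val, 2))
```

Because the sample is so small, the interval is wide and includes zero, so even the direction of the relationship is not firmly established by these 12 points.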

Total Overtime and Idle Time for Each Team

# Calculate key metrics (total overtime and idle time) for each team
key_metrics <- data %>%
  group_by(team) %>%
  summarise(
    total_overtime = sum(over_time, na.rm = TRUE),
    total_idle_time = sum(idle_time, na.rm = TRUE)
  )
print(key_metrics)

## # A tibble: 12 × 3
##     team total_overtime total_idle_time
##    <int>          <int>           <dbl>
##  1     1         503310             0  
##  2     2         477960             6.5
##  3     3         510690             0  
##  4     4         572220           150  
##  5     5         495780            98  
##  6     6         316695             0  
##  7     7         466290           286  
##  8     8         470040           314. 
##  9     9         469980             0  
## 10    10         473670            16  
## 11    11         382140             4  
## 12    12         328475             0

# Scatter plot to compare overtime and idle time
ggplot(key_metrics, aes(x = total_overtime, y = total_idle_time)) +
  geom_point(aes(color = factor(team)), size = 4) +  
  labs(title = "Overtime vs Idle Time by Team", x = "Total Overtime (minutes)", y = "Total Idle Time") +
  theme_minimal()

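The scatter plot shows that high-performing teams like 1, 3, 9, and 12 work their overtime with zero idle time, while teams 4, 7, and 8 accumulate the most idle time (150, 286, and about 314) despite substantial overtime. Teams 7 and 8 also have the two lowest average productivity values (0.668 and 0.674), although team 4 averages 0.770. Zero idle time alone does not guarantee high productivity: team 6 has no idle time but an average productivity of only 0.685. Overall, this suggests that the high-performing teams have smoother workflows, whereas teams with heavy idle time face interruptions that need to be addressed.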
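The link between zero idle time and productivity can be checked directly against the two team-level tables printed above. This is a small sketch using hand-entered values from those tables (the value printed as `314.` is entered as 314, and the data frame name `team_metrics` is introduced here for illustration):

```r
# Values copied by hand from the two printed team-level tables
team_metrics <- data.frame(
  team = 1:12,
  avg_productivity = c(0.821, 0.771, 0.804, 0.770, 0.698, 0.685,
                       0.668, 0.674, 0.734, 0.720, 0.682, 0.779),
  total_idle_time = c(0, 6.5, 0, 150, 98, 0, 286, 314, 0, 16, 4, 0)
)

# Teams with no recorded idle time, ordered from most to least productive
zero_idle <- team_metrics[team_metrics$total_idle_time == 0, ]
zero_idle <- zero_idle[order(-zero_idle$avg_productivity), ]
print(zero_idle)

# The three teams with the most idle time
top_idle <- head(team_metrics[order(-team_metrics$total_idle_time), ], 3)
print(top_idle)
```

Listing the zero-idle teams next to their productivity makes it easy to see which of them are genuinely high performers and which are not.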

Correlation Between Total Overtime and Total Idle Time

# Calculate the correlation between total overtime and total idle time
correlation_result <- cor(key_metrics$total_overtime, key_metrics$total_idle_time)

print(paste("Correlation between Overtime and Idle Time: ", round(correlation_result, 2)))

## [1] "Correlation between Overtime and Idle Time:  0.3"

The correlation of 0.3 between overtime and idle time suggests a weak positive relationship. This means that while there is some connection between overtime and idle time, it’s not strong. Teams with higher overtime might experience slightly more idle time, indicating inefficiencies that could be due to poor task allocation or workflow delays.
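Idle time is heavily skewed here: five teams have none and two teams account for most of it, so a few points can dominate a Pearson correlation. As a hedged robustness check (not part of the original analysis), the sketch below compares Pearson with the rank-based Spearman correlation on hand-entered values from the table above. The value printed as `314.` is entered as 314, so the Pearson result may differ very slightly from the one computed on the full data.

```r
# Team totals copied by hand from the printed key_metrics table
overtime <- c(503310, 477960, 510690, 572220, 495780, 316695,
              466290, 470040, 469980, 473670, 382140, 328475)
idle <- c(0, 6.5, 0, 150, 98, 0, 286, 314, 0, 16, 4, 0)

# Pearson measures linear association and is sensitive to extreme values
pearson_r <- cor(overtime, idle, method = "pearson")

# Spearman works on ranks, so the two very large idle-time totals
# count only as "largest" and "second largest"
spearman_r <- cor(overtime, idle, method = "spearman")

print(round(c(pearson = pearson_r, spearman = spearman_r), 2))
```

If the two coefficients disagree noticeably, the Pearson value is probably being driven by teams 7 and 8 rather than by a pattern shared across all teams.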

Plot a visualization for each relationship, and draw some conclusions based on the plot. Use what we’ve covered so far in class to scrutinize the plot (e.g., are there any outliers?)

Visualize Pair 1 (SMV vs Adjusted SMV)

# Plotting smv vs Adjusted_SMV
ggplot(data, aes(x = smv, y = Adjusted_SMV)) +
  geom_point(color = "green") +  # Scatter plot for SMV and Adjusted SMV
  labs(title = "SMV vs Adjusted SMV", 
       x = "Original SMV", 
       y = "Adjusted SMV") +
  geom_smooth(method = "lm", color = "red") +  # Adding a linear regression line for better understanding
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

The plot shows a strong positive relationship between the original SMV (smv) and the Adjusted SMV. The red regression line follows a clear linear pattern: as the original SMV increases, the Adjusted SMV increases with it. This is expected, because Adjusted SMV is computed as smv + 0.5 * no_of_style_change, so every point lies on or slightly above the line y = x. The points sit close to the regression line, which shows that the adjustment is small relative to the original values. No outliers are visible in the plot, reinforcing the conclusion that the relationship between the original SMV and the Adjusted SMV is strong and stable.

Visualize Pair 2 (Actual Productivity vs Expected Productivity)

# Plotting Actual Productivity vs Expected Productivity
ggplot(data, aes(x = Expected_Productivity, y = actual_productivity)) +
  geom_point(color = "blue") +  # Scatter plot for Actual vs Expected Productivity
  labs(title = "Actual Productivity vs Expected Productivity", 
       x = "Expected Productivity", 
       y = "Actual Productivity") +
  geom_smooth(method = "lm", color = "red") +  # Adding a linear regression line
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

The plot reveals a weak positive relationship between Expected and Actual Productivity. While there is an upward trend, the scattered data points suggest the correlation is not strong enough for precise predictions. The outliers imply that factors beyond expected productivity, such as task allocation or external influences, could significantly affect actual performance. These outliers may highlight teams that are over-performing or under-performing, offering insights into areas for improvement or best practices.
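The outliers mentioned above can be identified rather than just eyeballed. The sketch below is one possible approach under stated assumptions: it uses simulated data (the vectors `expected` and `actual` are hypothetical stand-ins for the Expected_Productivity and actual_productivity columns), fits the same linear model that `geom_smooth(method = "lm")` draws, and flags rows whose standardized residual exceeds 3 in absolute value. The cutoff of 3 is a common convention, not something taken from the original analysis.

```r
set.seed(7)

# Hypothetical stand-ins for Expected_Productivity and actual_productivity
expected <- runif(200, min = 0.4, max = 0.9)
actual <- 0.5 + 0.3 * expected + rnorm(200, sd = 0.1)

# Plant one obvious under-performer so there is something to find
actual[1] <- 0.05

# Same model as the regression line in the scatter plot
fit <- lm(actual ~ expected)

# Standardized residuals: distance from the line in standard-deviation units
std_resid <- rstandard(fit)

# Row indices that sit unusually far from the fitted line
flagged <- which(abs(std_resid) > 3)
print(flagged)
```

On the real data, the flagged rows could then be inspected by team and date to see whether the unusual values share a cause.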

Correlation coefficient for each of these combinations

# Correlation for SMV vs Adjusted SMV
round(cor(data$smv, data$Adjusted_SMV), 2)

## [1] 1

A correlation of 1 (after rounding to two decimals) means there is a near-perfect positive linear relationship between SMV and Adjusted SMV. This is what we would expect, because Adjusted SMV is simply SMV plus 0.5 for each style change. The relationship is an additive shift rather than a directly proportional one, and the shift is small compared with the spread of SMV, so the correlation rounds to 1. This matches the visualization, where the data points cluster tightly around a straight line with no noticeable outliers. Because the created column is built from the original one, this correlation confirms the construction rather than revealing a new pattern.
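To see why a small additive adjustment leaves the correlation at essentially 1, consider the following sketch on simulated data. The value ranges and the proportions of style changes are assumptions chosen for illustration, not figures from the dataset; only the formula `smv + 0.5 * style_changes` mirrors the calculation used in this report.

```r
set.seed(42)
n <- 1000

# Hypothetical SMV values and style-change counts (assumed distributions)
smv <- runif(n, min = 3, max = 55)
style_changes <- sample(0:2, n, replace = TRUE, prob = c(0.85, 0.10, 0.05))

# Same construction as Adjusted_SMV in this report
adjusted_smv <- smv + 0.5 * style_changes

# Correlation is just below 1 and rounds to 1 at two decimals
r <- cor(smv, adjusted_smv)
print(r)
print(round(r, 2))

# The difference between the columns is exactly the additive shift
shift <- adjusted_smv - smv
print(table(round(shift, 1)))
```

The only information Adjusted SMV adds beyond SMV is the shift itself, so plotting or summarizing `Adjusted_SMV - smv` is more revealing than the correlation between the two columns.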

# Correlation for Actual Productivity vs Expected Productivity
round(cor(data$actual_productivity, data$Expected_Productivity), 2)

## [1] 0.22

A correlation of 0.22 indicates a weak positive relationship between Actual Productivity and Expected Productivity: Actual Productivity tends to rise slightly as Expected Productivity rises, but Expected Productivity accounts for only about 5% of its variance (0.22² ≈ 0.05). This matches the visualization, where the points are widely scattered around the regression line and several lie far from it, performing much better or worse than expected. The weak correlation suggests that Actual Productivity is influenced by factors beyond Expected Productivity, such as team dynamics, resource availability, or workflow efficiency.

Build a confidence interval for each of the response variable(s). Provide a detailed conclusion of the response variable (i.e., the population) based on your confidence interval.

Bootstrapping confidence interval

library(boot)

boot_ci <- function(v, func = median, conf = 0.95, n_iter = 1000) {
  boot_func <- \(x, i) func(x[i], na.rm=TRUE)
  b <- boot(v, boot_func, R = n_iter)
  boot.ci(b, conf = conf, type = "perc")
}
# Bootstrapping confidence interval for Adjusted SMV (response variable)
adjusted_smv_ci <- boot_ci(data$Adjusted_SMV, func = mean, conf = 0.95)
# Collapse the two endpoints into a single "(lower, upper)" string
print(paste0("95% Confidence Interval for Adjusted SMV: (",
             paste(round(adjusted_smv_ci$percent[4:5], 2), collapse = ", "), ")"))

## [1] "95% Confidence Interval for Adjusted SMV: (14.47, 15.72)"

# Bootstrapping confidence interval for Actual Productivity (response variable)
actual_productivity_ci <- boot_ci(data$actual_productivity, func = mean, conf = 0.95)
# Collapse the two endpoints into a single "(lower, upper)" string
print(paste0("95% Confidence Interval for Actual Productivity: (",
             paste(round(actual_productivity_ci$percent[4:5], 2), collapse = ", "), ")"))

## [1] "95% Confidence Interval for Actual Productivity: (0.72, 0.75)"

The 95% confidence interval for Adjusted SMV is (14.47, 15.72), indicating that we are 95% confident the true population mean for Adjusted SMV lies within this range. The interval is relatively wide, which reflects the large spread in SMV across tasks of differing complexity. This variability highlights the need for further investigation into the consistency of the adjustments made to SMV. On the other hand, the 95% confidence interval for Actual Productivity is (0.72, 0.75), which is narrow: the true population mean is most plausibly between 0.72 and 0.75, so the average is estimated precisely. Note that a narrow interval describes how well the mean is pinned down, not how consistent individual teams are, so it should not be read as evidence that productivity fluctuates little from team to team. Bootstrap endpoints also shift slightly between runs unless a random seed is set. These insights provide useful information on the precision of our estimates for the response variables and highlight opportunities for improving consistency in task adjustments and team productivity management.
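The two points about interpretation (run-to-run variation and what interval width measures) can be illustrated with a small base-R sketch. This is an approximation written for illustration, not the `boot` package's implementation: `percentile_ci` is a hypothetical helper, the data are simulated, and `boot.ci(type = "perc")` may interpolate quantiles slightly differently.

```r
# Hypothetical helper: percentile bootstrap CI for a statistic, base R only
percentile_ci <- function(x, stat = mean, conf = 0.95, n_iter = 1000) {
  x <- x[!is.na(x)]
  # Resample with replacement and recompute the statistic each time
  boot_stats <- replicate(n_iter, stat(sample(x, replace = TRUE)))
  alpha <- 1 - conf
  quantile(boot_stats, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Simulated productivity-like values in (0, 1); an assumption, not real data
set.seed(1)
productivity <- rbeta(1000, shape1 = 6, shape2 = 2)

# Fixing the seed before each call makes the interval reproducible
set.seed(123)
ci_first <- percentile_ci(productivity)
set.seed(123)
ci_second <- percentile_ci(productivity)
print(round(ci_first, 3))

# Same spread in the data, far fewer observations: a wider interval
set.seed(123)
ci_small <- percentile_ci(productivity[1:50])
print(round(ci_small, 3))
```

The individual values have the same spread in both cases, yet the interval from 50 observations is wider. Interval width therefore says more about sample size and estimation precision than about how consistent individual observations are.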