Load CSV file

Load the CSV file into the garment_prod data frame and convert team to a character column.

garment_prod <- read.csv("/Users/lakshmimounikab/Desktop/Stats with R/R practice/garment_prod.csv")
garment_prod$team <- as.character(garment_prod$team)
View(garment_prod)
summary(garment_prod)
##      date             quarter           department            day           
##  Length:1197        Length:1197        Length:1197        Length:1197       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      team           targeted_productivity      smv             wip         
##  Length:1197        Min.   :0.0700        Min.   : 2.90   Min.   :    7.0  
##  Class :character   1st Qu.:0.7000        1st Qu.: 3.94   1st Qu.:  774.5  
##  Mode  :character   Median :0.7500        Median :15.26   Median : 1039.0  
##                     Mean   :0.7296        Mean   :15.06   Mean   : 1190.5  
##                     3rd Qu.:0.8000        3rd Qu.:24.26   3rd Qu.: 1252.5  
##                     Max.   :0.8000        Max.   :54.56   Max.   :23122.0  
##                                                           NA's   :506      
##    over_time       incentive         idle_time           idle_men      
##  Min.   :    0   Min.   :   0.00   Min.   :  0.0000   Min.   : 0.0000  
##  1st Qu.: 1440   1st Qu.:   0.00   1st Qu.:  0.0000   1st Qu.: 0.0000  
##  Median : 3960   Median :   0.00   Median :  0.0000   Median : 0.0000  
##  Mean   : 4567   Mean   :  38.21   Mean   :  0.7302   Mean   : 0.3693  
##  3rd Qu.: 6960   3rd Qu.:  50.00   3rd Qu.:  0.0000   3rd Qu.: 0.0000  
##  Max.   :25920   Max.   :3600.00   Max.   :300.0000   Max.   :45.0000  
##                                                                        
##  no_of_style_change no_of_workers   actual_productivity
##  Min.   :0.0000     Min.   : 2.00   Min.   :0.2337     
##  1st Qu.:0.0000     1st Qu.: 9.00   1st Qu.:0.6503     
##  Median :0.0000     Median :34.00   Median :0.7733     
##  Mean   :0.1504     Mean   :34.61   Mean   :0.7351     
##  3rd Qu.:0.0000     3rd Qu.:57.00   3rd Qu.:0.8503     
##  Max.   :2.0000     Max.   :89.00   Max.   :1.1204     
## 

Load libraries

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

Question 1

The response variable is the output or outcome being studied and modeled; it is the y-variable. Explanatory variables are the inputs or predictors that influence the response; they are the x-variables used to explain changes in it.
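
In R, a model formula encodes these roles directly: the response sits to the left of ~ and the explanatory variables to the right. The following minimal sketch uses only base R to pull the two sides apart. The object names f, response, and explanatory are illustrative and not part of the original analysis.

```r
# Illustrative only: inspect the two sides of a model formula (base R).
f <- actual_productivity ~ idle_time + no_of_workers

# In a two-sided formula, element 2 holds the left-hand side.
response <- deparse(f[[2]])                  # variable being modeled (y)

# terms() lists the right-hand-side terms as character labels.
explanatory <- attr(terms(f), "term.labels") # predictors (x)

print(response)
print(explanatory)
```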

# Set 1
actual_productivity ~ idle_time + no_of_workers 
## actual_productivity ~ idle_time + no_of_workers
# Derived variable: actual minus targeted productivity
productivity_diff <- garment_prod$actual_productivity - garment_prod$targeted_productivity

# Set 2
productivity_diff ~ idle_men + factor(quarter, ordered=TRUE)
## productivity_diff ~ idle_men + factor(quarter, ordered = TRUE)
# Set 3  
productivity_diff ~ smv +  
                   factor(day, ordered=TRUE, 
                          levels=c("Monday", "Tuesday", "Wednesday", "Thursday", "Saturday", "Sunday"))
## productivity_diff ~ smv + factor(day, ordered = TRUE, levels = c("Monday", 
##     "Tuesday", "Wednesday", "Thursday", "Saturday", "Sunday"))
  • Set 1 relates actual productivity to idle time and number of workers.

  • The derived variable productivity_diff is the difference between actual and targeted productivity.

  • Set 2 relates the productivity difference to the number of idle workers and the ordered quarter.

  • Set 3 relates the productivity difference to standard minute value (SMV) and the ordered day. All variables are numeric or ordered factors.
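
Because productivity_diff is created as a free-standing vector, it lives outside garment_prod. One alternative, sketched here on a small made-up data frame (toy_prod and its values are hypothetical, not rows from the dataset), is to store the derived variable as a column and build the ordered day factor once, so that later formulas and plots can find everything in one place.

```r
# Hypothetical toy data standing in for garment_prod; all values are invented.
toy_prod <- data.frame(
  day                   = c("Monday", "Saturday", "Thursday", "Sunday"),
  targeted_productivity = c(0.80, 0.75, 0.70, 0.80),
  actual_productivity   = c(0.85, 0.70, 0.72, 0.80)
)

# Store the derived response as a column instead of a separate vector.
toy_prod$productivity_diff <-
  toy_prod$actual_productivity - toy_prod$targeted_productivity

# Build the ordered factor once; the level order is copied from Set 3.
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Saturday", "Sunday")
toy_prod$day <- factor(toy_prod$day, levels = day_levels, ordered = TRUE)

str(toy_prod)
```

With the column in place, a formula such as productivity_diff ~ smv can be evaluated against a single data frame rather than a mix of data frame columns and global vectors.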

Question 2

# Set 1 
ggplot(garment_prod, aes(x = idle_time, y = actual_productivity)) +
  geom_point()

# Set 2
ggplot(garment_prod, aes(x = idle_men, y = productivity_diff, color = quarter)) + 
  geom_point()

# Set 3
ggplot(garment_prod, aes(x = smv, y = productivity_diff, color = day)) +
  geom_point() 

Here are the key observations and insights from the plots:

  • Set 1: No clear relationship between idle time and actual productivity. High idle time does not seem to correspond to low productivity.

    Insight: Idle time alone does not explain productivity variations. Other factors need to be considered.

  • Set 2: No observable pattern across quarters. More idle men appear weakly related to a lower productivity difference.

    Insight: More idle workers weakly correspond to a lower productivity difference, while quarter shows no visible effect.

  • Set 3: No day-of-the-week pattern in productivity difference. Higher standard minute value (SMV) tends to go with a slightly lower productivity difference. There is a potential outlier with high SMV but also a high productivity difference.

    Insight: Higher SMV weakly corresponds to a lower productivity difference, but not always. The relationship is not straightforward.

Overall, the plots show at most weak relationships between the variables. Some broad patterns exist, such as more idle workers and higher SMV going with a lower productivity difference, but there are exceptions. Modeling with additional variables would be needed to develop deeper insights.

Question 3

# Set 1
cor(garment_prod$idle_time, garment_prod$actual_productivity)
## [1] -0.08085081
# Set 2 
cor(garment_prod$idle_men, productivity_diff)  
## [1] -0.1651784
# Set 3
cor(garment_prod$smv, productivity_diff)
## [1] -0.09058271
  • Set 1 has a very weak negative correlation (about -0.08), consistent with the lack of pattern in the plot.

  • Set 2 has a weak negative correlation (about -0.17), the largest in magnitude of the three, capturing the slight idle men-productivity relationship.

  • Set 3 has a very weak negative correlation (about -0.09), weaker than Set 2's, so the SMV-productivity relationship is less clear than it may appear in the plot.

The correlation values broadly align with the visual patterns, and all three relationships are weak. Pearson correlation measures only linear association, so these values do not rule out non-linear patterns.
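
When comparing strengths, it is the absolute value of the coefficient that matters, not its sign. A minimal sketch using the three coefficients reported above (the vector name r and its element names are chosen here for illustration):

```r
# Correlation coefficients copied from the output above.
r <- c(set1 = -0.08085081, set2 = -0.1651784, set3 = -0.09058271)

# Rank by magnitude: the sign only gives the direction of the association.
strength <- sort(abs(r), decreasing = TRUE)
print(names(strength))

# Squared correlation: share of variance a straight-line fit would explain.
r_squared <- r^2
print(round(r_squared, 3))
```

All three squared values are below 0.03, so in a simple linear fit each explanatory variable on its own would account for under 3% of the variation in its response.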

Question 4

# Set 1
ci_actual_prod <- t.test(garment_prod$actual_productivity)$conf.int

# Set 2
ci_prod_diff <- t.test(productivity_diff)$conf.int 

# Set 3 (same response variable as Set 2, so the interval is identical)
ci_prod_diff <- t.test(productivity_diff)$conf.int
ci_actual_prod
## [1] 0.7251963 0.7449859
## attr(,"conf.level")
## [1] 0.95
ci_prod_diff
## [1] -0.003619192  0.014536557
## attr(,"conf.level")
## [1] 0.95
ci_prod_diff
## [1] -0.003619192  0.014536557
## attr(,"conf.level")
## [1] 0.95

Conclusions about the population based on the confidence intervals for each response variable:

For actual productivity in Set 1:

  • The 95% confidence interval for the mean ranges from 0.7251963 to 0.7449859.
  • This is a narrow range, about 0.02 units wide.
  • It indicates that the population mean of actual productivity is estimated precisely, at roughly 0.73 to 0.74.
  • The interval describes uncertainty about the mean, not the spread of individual records: the summary output shows actual productivity ranging from 0.2337 to 1.1204.

For productivity difference in Set 2 and Set 3:

  • The 95% CI ranges from -0.003619192 to 0.014536557.
  • The interval contains zero and is centered around 0.005.
  • Because zero lies inside the interval, the data do not show a mean gap between actual and targeted productivity that differs from zero at the 5% significance level.
  • The point estimate is slightly positive: on average, actual productivity exceeds the target by about 0.005 units.
  • The narrow interval means this mean gap is estimated precisely; it does not show that the gap is steady across individual records.

In summary, the confidence intervals show that mean actual productivity is estimated precisely at about 0.73 to 0.74, and that the mean gap between actual and targeted productivity is small and not statistically distinguishable from zero. These intervals describe the population means only; they do not measure how much individual records vary.
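
Two quick checks help when reading a confidence interval for a mean: its width and whether it contains zero. The sketch below applies them to the endpoints printed by t.test() above. The helper describe_ci is a hypothetical convenience function, not part of the original analysis.

```r
# Hypothetical helper: summarize a two-number confidence interval.
describe_ci <- function(ci) {
  list(
    width         = ci[2] - ci[1],            # precision of the estimate
    midpoint      = mean(ci),                 # center of the interval
    contains_zero = ci[1] <= 0 && ci[2] >= 0  # TRUE if 0 lies inside
  )
}

# Endpoints copied from the t.test() output above.
ci_actual_prod <- c(0.7251963, 0.7449859)
ci_prod_diff   <- c(-0.003619192, 0.014536557)

str(describe_ci(ci_actual_prod))
str(describe_ci(ci_prod_diff))
```

The midpoint of a t interval for a mean equals the sample mean, so the midpoint doubles as the point estimate.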