Load CSV file

Load the CSV file into the garment_prod variable.

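# Read the data (adjust the path to where garment_prod.csv is stored)
garment_prod <- read.csv("/Users/lakshmimounikab/Desktop/Stats with R/R practice/garment_prod.csv")
# Team numbers are labels, not quantities, so store them as text
garment_prod$team <- as.character(garment_prod$team)
View(garment_prod)     # spreadsheet-style data viewer
summary(garment_prod)  # per-column summaries, shown below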
##      date             quarter           department            day           
##  Length:1197        Length:1197        Length:1197        Length:1197       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      team           targeted_productivity      smv             wip         
##  Length:1197        Min.   :0.0700        Min.   : 2.90   Min.   :    7.0  
##  Class :character   1st Qu.:0.7000        1st Qu.: 3.94   1st Qu.:  774.5  
##  Mode  :character   Median :0.7500        Median :15.26   Median : 1039.0  
##                     Mean   :0.7296        Mean   :15.06   Mean   : 1190.5  
##                     3rd Qu.:0.8000        3rd Qu.:24.26   3rd Qu.: 1252.5  
##                     Max.   :0.8000        Max.   :54.56   Max.   :23122.0  
##                                                           NA's   :506      
##    over_time       incentive         idle_time           idle_men      
##  Min.   :    0   Min.   :   0.00   Min.   :  0.0000   Min.   : 0.0000  
##  1st Qu.: 1440   1st Qu.:   0.00   1st Qu.:  0.0000   1st Qu.: 0.0000  
##  Median : 3960   Median :   0.00   Median :  0.0000   Median : 0.0000  
##  Mean   : 4567   Mean   :  38.21   Mean   :  0.7302   Mean   : 0.3693  
##  3rd Qu.: 6960   3rd Qu.:  50.00   3rd Qu.:  0.0000   3rd Qu.: 0.0000  
##  Max.   :25920   Max.   :3600.00   Max.   :300.0000   Max.   :45.0000  
##                                                                        
##  no_of_style_change no_of_workers   actual_productivity
##  Min.   :0.0000     Min.   : 2.00   Min.   :0.2337     
##  1st Qu.:0.0000     1st Qu.: 9.00   1st Qu.:0.6503     
##  Median :0.0000     Median :34.00   Median :0.7733     
##  Mean   :0.1504     Mean   :34.61   Mean   :0.7351     
##  3rd Qu.:0.0000     3rd Qu.:57.00   3rd Qu.:0.8503     
##  Max.   :2.0000     Max.   :89.00   Max.   :1.1204     
## 
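The summary shows that date, quarter, department and day were read as plain character columns and that wip has 506 missing values. Below is a minimal sketch of tidying those types before analysis. It assumes the date strings use month/day/year (for example "1/1/2015"), which should be checked against the actual file. The helper names tidy_garment and count_missing, and the toy rows, are hypothetical.

```r
# Hypothetical helper: convert character columns from read.csv() into
# more useful types. ASSUMPTION: date is stored as month/day/year.
tidy_garment <- function(df) {
  df$date <- as.Date(df$date, format = "%m/%d/%Y")
  # Categorical text columns become factors (levels sort alphabetically,
  # so Quarter1 ... Quarter5 come out in order)
  for (col in c("quarter", "department", "day")) {
    df[[col]] <- factor(df[[col]])
  }
  df
}

# Number of missing values in each column (e.g. wip in the real data)
count_missing <- function(df) colSums(is.na(df))

# Toy rows standing in for garment_prod (values are hypothetical)
toy <- data.frame(
  date       = c("1/1/2015", "1/3/2015"),
  quarter    = c("Quarter1", "Quarter1"),
  department = c("sewing", "finishing"),
  day        = c("Thursday", "Saturday"),
  wip        = c(1108, NA),
  stringsAsFactors = FALSE
)

tidy <- tidy_garment(toy)
str(tidy)
count_missing(tidy)
# On the real data: count_missing(tidy_garment(garment_prod))
```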

Question 1

This data set has 15 attributes, several of which are hard to interpret without reading the documentation. Some of them are:

  1. smv: Standard Minute Value, the time allocated for a task. The abbreviation could easily be mistaken for a speed or velocity measurement.

  2. wip: Work in progress, the number of unfinished items for products. Without the documentation, wip looks more like an ID than a count.

  3. idle_time: The amount of time during which production was interrupted, for various reasons. The column name does not state the unit of time.

  4. over_time: The amount of overtime worked by each team, in minutes. The name alone does not reveal that the unit is minutes.

I think these abbreviated, non-intuitive names were chosen to match terminology common in manufacturing plants; the data seems intended for internal use by people already familiar with the terms.
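If the abbreviations get in the way of analysis, one option is to rename the columns to self-describing names that carry their units. A minimal sketch follows; the new names and the rename_columns helper are hypothetical choices. The minute units follow the descriptions above, while idle_time's unit is not stated, so its new name leaves the unit open.

```r
# Hypothetical mapping from the dataset's abbreviations to descriptive names.
# Units follow the attribute descriptions; idle_time's unit is unknown.
descriptive_names <- c(
  smv       = "standard_minute_value",
  wip       = "work_in_progress_items",
  over_time = "overtime_minutes",
  idle_time = "idle_time_unknown_unit"
)

# Rename only the columns that appear in the mapping; leave others alone
rename_columns <- function(df, mapping) {
  hit <- names(df) %in% names(mapping)
  names(df)[hit] <- unname(mapping[names(df)[hit]])
  df
}

# Toy row standing in for garment_prod (values are hypothetical)
toy <- data.frame(smv = 26.16, wip = 1108, over_time = 7080, idle_time = 0)
renamed <- rename_columns(toy, descriptive_names)
names(renamed)

# Overtime in hours, ASSUMING the column really is in minutes
renamed$overtime_hours <- renamed$overtime_minutes / 60
```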

Question 2

One element of this data that remains unclear even after reading the documentation is the 'incentive' column. It holds an integer value for each row, but what the values mean and how the incentive works are never explained. It is unclear whether the column refers to monetary incentives, performance bonuses or some other kind of motivation for the workers, and the documentation does not say what the values represent. More context is needed to understand this column's role.
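Before guessing what incentive measures, it can help to check the properties the discussion relies on: whether every value really is a whole number, and how often it is zero (the summary above already shows a median of 0). A minimal sketch; the function name describe_incentive and the toy vector are hypothetical stand-ins for garment_prod$incentive.

```r
# Hypothetical helper: basic facts about a numeric column
describe_incentive <- function(x) {
  x <- x[!is.na(x)]
  list(
    all_whole  = all(x == round(x)),  # TRUE if every value is an integer
    share_zero = mean(x == 0),        # fraction of rows with no incentive
    quantiles  = quantile(x, c(0.5, 0.9, 0.99))
  )
}

toy_incentive <- c(0, 0, 0, 50, 113, 0, 3600)  # hypothetical values
res <- describe_incentive(toy_incentive)
res
# On the real data: describe_incentive(garment_prod$incentive)
```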

Question 3

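library(ggplot2)
df <- garment_prod
# geom_col() stacks one segment per row, so each bar's height is the
# total (sum) of incentive over all rows in that quarter
ggplot(df) +
  geom_col(aes(x = quarter, y = incentive, fill = quarter)) +
  labs(title = "Incentive Values by Quarter",
       x = "Quarter",
       y = "Incentive Amount (unclear unit)")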

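  • Incentives in Quarters 1 and 2 appear generally higher than in Quarters 3 to 5. Because the bars are totals, part of this gap may simply reflect how many rows each quarter has. Also, since the data include a Quarter 5, 'quarter' cannot mean a calendar quarter, so calling any pattern seasonal may be misleading.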

  • However, incentive values vary widely (the summary shows a range of 0 to 3600 with a median of 0), and a column chart of totals does not show this spread within each quarter.

  • The meaning of the absolute incentive numbers is still ambiguous without documentation.

  • But the relative differences between quarters provide clues about how incentives change over time.
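Because geom_col() stacks one segment per row, a quarter with many rows can get a tall bar even if its typical incentive is small. A minimal sketch that puts row counts, totals, means and medians side by side using base R's aggregate(); the helper name incentive_by_quarter and the toy data frame are hypothetical.

```r
# Per-quarter row counts, totals, means and medians of incentive
incentive_by_quarter <- function(df) {
  stats <- aggregate(
    incentive ~ quarter, data = df,
    FUN = function(x) c(n = length(x), total = sum(x),
                        mean = mean(x), median = median(x))
  )
  # aggregate() stores the four statistics as a matrix column;
  # flatten it into ordinary data frame columns
  cbind(quarter = stats$quarter, as.data.frame(stats$incentive))
}

# Toy data standing in for garment_prod (values are hypothetical)
toy <- data.frame(
  quarter   = c("Quarter1", "Quarter1", "Quarter1", "Quarter2", "Quarter5"),
  incentive = c(0, 50, 100, 60, 30)
)
res <- incentive_by_quarter(toy)
res
# On the real data: incentive_by_quarter(garment_prod)
```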

While the incentive values themselves remain unclear, the differences across quarters suggest some recurring pattern. This shows that plotting ambiguous data against other dimensions can still reveal useful relationships and patterns. However, fully understanding incentives still requires documentation of what these values represent.
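To see the spread within each quarter rather than totals, a boxplot is one option: it draws each quarter's median, interquartile range and outliers. A minimal sketch with ggplot2, reusing the axis labels from the chart above; the helper name plot_incentive_spread and the toy data are hypothetical. Since most real values are 0 and the maximum is 3600, zooming in with coord_cartesian() may make the boxes easier to read.

```r
library(ggplot2)

# Boxplot of incentive per quarter: shows spread and outliers, not sums
plot_incentive_spread <- function(df) {
  ggplot(df, aes(x = quarter, y = incentive, fill = quarter)) +
    geom_boxplot() +
    labs(title = "Spread of Incentive Values by Quarter",
         x = "Quarter",
         y = "Incentive Amount (unclear unit)")
}

# Toy data standing in for garment_prod (values are hypothetical)
toy <- data.frame(
  quarter   = rep(c("Quarter1", "Quarter2"), each = 4),
  incentive = c(0, 0, 50, 960, 0, 30, 40, 45)
)
p <- plot_incentive_spread(toy)
print(p)
# On the real data: plot_incentive_spread(garment_prod)
```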