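Load the CSV file into the garment_prod variable.
# Read the data set from a local path.
garment_prod <- read.csv("/Users/lakshmimounikab/Desktop/Stats with R/R practice/garment_prod.csv")
# Team numbers are labels, not quantities, so store them as character.
garment_prod$team <- as.character(garment_prod$team)
# Open the interactive viewer, then print per-column summary statistics.
View(garment_prod)
summary(garment_prod)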
## date quarter department day
## Length:1197 Length:1197 Length:1197 Length:1197
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## team targeted_productivity smv wip
## Length:1197 Min. :0.0700 Min. : 2.90 Min. : 7.0
## Class :character 1st Qu.:0.7000 1st Qu.: 3.94 1st Qu.: 774.5
## Mode :character Median :0.7500 Median :15.26 Median : 1039.0
## Mean :0.7296 Mean :15.06 Mean : 1190.5
## 3rd Qu.:0.8000 3rd Qu.:24.26 3rd Qu.: 1252.5
## Max. :0.8000 Max. :54.56 Max. :23122.0
## NA's :506
## over_time incentive idle_time idle_men
## Min. : 0 Min. : 0.00 Min. : 0.0000 Min. : 0.0000
## 1st Qu.: 1440 1st Qu.: 0.00 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median : 3960 Median : 0.00 Median : 0.0000 Median : 0.0000
## Mean : 4567 Mean : 38.21 Mean : 0.7302 Mean : 0.3693
## 3rd Qu.: 6960 3rd Qu.: 50.00 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :25920 Max. :3600.00 Max. :300.0000 Max. :45.0000
##
## no_of_style_change no_of_workers actual_productivity
## Min. :0.0000 Min. : 2.00 Min. :0.2337
## 1st Qu.:0.0000 1st Qu.: 9.00 1st Qu.:0.6503
## Median :0.0000 Median :34.00 Median :0.7733
## Mean :0.1504 Mean :34.61 Mean :0.7351
## 3rd Qu.:0.0000 3rd Qu.:57.00 3rd Qu.:0.8503
## Max. :2.0000 Max. :89.00 Max. :1.1204
##
There are 15 attributes in this data set, and a few of them are unclear until we skim through the documentation. Some of them are:
smv- This stands for Standard Minute Value, the time allocated for a task. The abbreviation could easily be mistaken for a speed or velocity measurement.
wip- This stands for Work in Progress, the number of unfinished items for products. The name wip reads more like an ID than a description of what it measures.
idle_time- This is the amount of time for which production was interrupted, for whatever reason. The column name does not state the unit of time.
over_time- This is the amount of overtime worked by each team, in minutes. The attribute name alone does not tell us the unit of time.
I think the data set's authors chose these abbreviated, non-intuitive names to match terminology commonly used in manufacturing plants. The data is meant for internal use by people already familiar with those terms.
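One way to make the units discussed above harder to misread is to carry them in the column names. The following is a minimal sketch on a small made-up data frame: the rows in `toy` and the name `over_time_hours` are hypothetical, and only the minutes unit for over_time comes from the documentation. idle_time is left out because its unit is not stated.

```r
# Hypothetical rows for illustration only; not taken from garment_prod.csv.
toy <- data.frame(
  team      = c("1", "2", "3"),
  over_time = c(960, 1440, 6960),  # minutes, per the data set documentation
  wip       = c(1108, NA, 968)     # NA marks a missing work-in-progress count
)

# Store overtime in hours under a name that states the unit.
toy$over_time_hours <- toy$over_time / 60

# Count missing values per column before relying on any of them.
missing_per_column <- colSums(is.na(toy))

print(toy)
print(missing_per_column)
```

The same two steps should carry over to garment_prod, where the summary output reports 506 missing wip values.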
One element of this data that remains unclear even after reading the documentation is the ‘incentive’ column. This column contains an integer value for each data row, but its exact meaning and the mechanism behind it are not explained. It is unclear whether it refers to monetary incentives, performance bonuses or some other kind of motivation for the workers, and the documentation does not say what the values represent. More context is needed to fully understand this data element’s role.
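Even without documentation, a few quick checks can describe how the column behaves. The following is a rough sketch on a made-up vector: the values in `incentive` here are hypothetical, and the names `all_whole`, `share_zero` and `top_values` are introduced only for illustration.

```r
# Hypothetical incentive values for illustration only.
incentive <- c(0, 0, 50, 0, 38, 75, 0, 960)

# TRUE if every non-missing value is a whole number.
all_whole <- all(incentive == round(incentive), na.rm = TRUE)

# Share of rows where the incentive is exactly zero.
share_zero <- mean(incentive == 0, na.rm = TRUE)

# The three most frequent values, most common first.
top_values <- head(sort(table(incentive), decreasing = TRUE), 3)

print(all_whole)
print(share_zero)
print(top_values)
```

Running the same checks on garment_prod$incentive would show whether the zeros dominate, which the summary output hints at with a median of 0 and a maximum of 3600.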
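library(ggplot2)
df <- garment_prod
# geom_col() stacks the rows in each quarter, so bar height is the quarter's total incentive.
ggplot(df) +
  geom_col(aes(x = quarter, y = incentive, fill = quarter)) +
  labs(title = "Incentive Values by Quarter",
       x = "Quarter",
       y = "Incentive Amount (unclear unit)")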
Incentives in Quarter 1 and Quarter 2 seem generally higher than in Quarter 3 through Quarter 5. There may be seasonal trends.
However, incentives vary widely within each quarter, and the range of values remains unclear.
The meaning of the absolute incentive numbers is still ambiguous without documentation.
But the relative differences between quarters provide clues about how incentives change over time.
While the incentive values themselves remain unclear, the trend across quarters suggests some seasonality. This shows that visualizing ambiguous data along other dimensions can still reveal useful relationships and patterns. Fully understanding incentives, however, still requires documentation of what these values represent.
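Because the bars produced by geom_col() are per-quarter totals, they depend on how many rows each quarter has and say little about the spread within a quarter. A per-quarter table of count, total, mean, median and standard deviation is one way to check both. This is a minimal base R sketch on made-up rows: the data in `toy` and the name `by_quarter` are hypothetical.

```r
# Hypothetical rows for illustration only; not taken from garment_prod.csv.
toy <- data.frame(
  quarter   = c("Quarter1", "Quarter1", "Quarter1",
                "Quarter2", "Quarter2", "Quarter2"),
  incentive = c(0, 50, 100, 0, 0, 30)
)

# One row per quarter: row count, total, mean, median and standard deviation.
by_quarter <- do.call(rbind, lapply(split(toy$incentive, toy$quarter), function(x) {
  data.frame(n = length(x), total = sum(x), mean = mean(x),
             median = median(x), sd = sd(x))
}))

print(by_quarter)
```

Replacing `toy` with garment_prod would show whether the differences between quarters hold for a typical row or are driven by a few large values.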