Load the CSV file into the garment_prod data frame.
garment_prod <- read.csv("/Users/lakshmimounikab/Desktop/Stats with R/R practice/garment_prod.csv")
garment_prod$team <- as.character(garment_prod$team)
View(garment_prod)
summary(garment_prod)
## date quarter department day
## Length:1197 Length:1197 Length:1197 Length:1197
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## team targeted_productivity smv wip
## Length:1197 Min. :0.0700 Min. : 2.90 Min. : 7.0
## Class :character 1st Qu.:0.7000 1st Qu.: 3.94 1st Qu.: 774.5
## Mode :character Median :0.7500 Median :15.26 Median : 1039.0
## Mean :0.7296 Mean :15.06 Mean : 1190.5
## 3rd Qu.:0.8000 3rd Qu.:24.26 3rd Qu.: 1252.5
## Max. :0.8000 Max. :54.56 Max. :23122.0
## NA's :506
## over_time incentive idle_time idle_men
## Min. : 0 Min. : 0.00 Min. : 0.0000 Min. : 0.0000
## 1st Qu.: 1440 1st Qu.: 0.00 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median : 3960 Median : 0.00 Median : 0.0000 Median : 0.0000
## Mean : 4567 Mean : 38.21 Mean : 0.7302 Mean : 0.3693
## 3rd Qu.: 6960 3rd Qu.: 50.00 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :25920 Max. :3600.00 Max. :300.0000 Max. :45.0000
##
## no_of_style_change no_of_workers actual_productivity
## Min. :0.0000 Min. : 2.00 Min. :0.2337
## 1st Qu.:0.0000 1st Qu.: 9.00 1st Qu.:0.6503
## Median :0.0000 Median :34.00 Median :0.7733
## Mean :0.1504 Mean :34.61 Mean :0.7351
## 3rd Qu.:0.0000 3rd Qu.:57.00 3rd Qu.:0.8503
## Max. :2.0000 Max. :89.00 Max. :1.1204
##
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
The response variable is the output or outcome being studied and modeled. It is the y-variable. Explanatory variables are the inputs or predictors that influence the response variable. They are the x-variables we use to explain changes in the response.
# Set 1
actual_productivity ~ idle_time + no_of_workers
## actual_productivity ~ idle_time + no_of_workers
# Derived variable: actual minus targeted productivity
productivity_diff <- garment_prod$actual_productivity - garment_prod$targeted_productivity
# Set 2
productivity_diff ~ idle_men + factor(quarter, ordered=TRUE)
## productivity_diff ~ idle_men + factor(quarter, ordered = TRUE)
# Set 3
productivity_diff ~ smv +
factor(day, ordered=TRUE,
levels=c("Monday", "Tuesday", "Wednesday", "Thursday", "Saturday", "Sunday"))
## productivity_diff ~ smv + factor(day, ordered = TRUE, levels = c("Monday",
## "Tuesday", "Wednesday", "Thursday", "Saturday", "Sunday"))
Set 1 relates actual productivity to idle time and the number of workers.
The derived variable productivity_diff is the difference between actual and targeted productivity, so positive values mean the target was exceeded.
Set 2 relates the productivity difference to the number of idle workers and to quarter, treated as an ordered factor.
Set 3 relates the productivity difference to standard minute value (SMV) and to day, treated as an ordered factor. All variables are numeric or ordered factors.
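As a minimal sketch, the derived variable and the ordered day factor could also be stored as columns of the data frame, so that formulas and plots can find them through the data argument. The helper name add_derived and the small toy data frame are hypothetical and only illustrate the idea; the level order is copied from the Set 3 formula above.

```r
# Hypothetical helper: adds productivity_diff and an ordered day factor as columns.
add_derived <- function(data) {
  # Level order taken from the Set 3 formula in this report
  day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Saturday", "Sunday")
  data$productivity_diff <- data$actual_productivity - data$targeted_productivity
  data$day <- factor(data$day, levels = day_levels, ordered = TRUE)
  data
}

# Toy rows for illustration only (not taken from garment_prod.csv)
toy <- data.frame(
  day = c("Sunday", "Monday", "Thursday"),
  actual_productivity = c(0.80, 0.65, 0.90),
  targeted_productivity = c(0.75, 0.70, 0.80)
)
toy <- add_derived(toy)
str(toy)
```

Under the same assumption, garment_prod <- add_derived(garment_prod) would attach both columns to the real data.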
# Set 1
ggplot(garment_prod, aes(x = idle_time, y = actual_productivity)) +
geom_point()
# Set 2
ggplot(garment_prod, aes(x = idle_men, y = productivity_diff, color = quarter)) +
geom_point()
# Set 3
ggplot(garment_prod, aes(x = smv, y = productivity_diff, color = day)) +
geom_point()
Key observations and insights from the plots:
Set 1: No clear relationship between idle time and actual productivity. High idle time does not seem to correspond to low productivity.
Insight: Idle time alone does not explain productivity variations. Other factors need to be considered.
Set 2: No clear pattern across quarters. The number of idle men is at most weakly associated with a lower productivity difference.
Insight: More idle workers tend to go with a slightly lower productivity difference, while quarter shows no visible effect.
Set 3: No day-of-the-week pattern in productivity difference. Higher standard minute value (SMV) is loosely associated with a lower productivity difference. There is a potential outlier with high SMV but also a high productivity difference.
Insight: Higher SMV tends to correspond to a lower productivity difference, but not consistently. The relationship is not straightforward.
Overall, the plots show only weak and possibly non-linear relationships between the variables. Some broad tendencies exist, such as more idle workers and higher SMV going with a lower productivity difference, but there are exceptions. More sophisticated modeling and the inclusion of additional variables would be needed to develop deeper insights.
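One possible next modeling step, shown only as a sketch, is to fit the Set 1 formula with lm(). The data frame below is simulated purely so that the example runs on its own; its values and the assumed effect sizes are not from garment_prod.csv, and a linear model is an illustrative choice rather than the method used in this report.

```r
# Simulated stand-in data (assumption: NOT the real garment data)
set.seed(42)
toy <- data.frame(
  idle_time = rpois(100, lambda = 1),
  no_of_workers = sample(2:89, 100, replace = TRUE)
)
toy$actual_productivity <- 0.75 - 0.01 * toy$idle_time + rnorm(100, sd = 0.1)

# Same formula as Set 1, now fitted as a linear model
fit <- lm(actual_productivity ~ idle_time + no_of_workers, data = toy)

# Coefficient table: estimates, standard errors, t values, p-values
coef(summary(fit))
```

On the real data, the same call with data = garment_prod would estimate how actual productivity changes with each predictor while holding the other fixed.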
# Set 1
cor(garment_prod$idle_time, garment_prod$actual_productivity)
## [1] -0.08085081
# Set 2
cor(garment_prod$idle_men, productivity_diff)
## [1] -0.1651784
# Set 3
cor(garment_prod$smv, productivity_diff)
## [1] -0.09058271
Set 1 has a very weak negative correlation (about -0.08), fitting the lack of pattern in the plot.
Set 2 has a weak negative correlation (about -0.17), the largest in magnitude of the three, capturing the slight relationship between idle men and productivity difference.
Set 3 has a very weak negative correlation (about -0.09), weaker than Set 2, so any SMV pattern seen in the plot is not a strong linear one.
The correlation values broadly align with the visual patterns: all three relationships are negative and weak. This is consistent with the lack of clear structure noted in the visual analysis.
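To judge whether correlations this small are distinguishable from zero, cor.test() reports a confidence interval and p-value alongside the estimate. The sketch below wraps it in a hypothetical helper, cor_summary, and runs it on simulated vectors so that it is self-contained; the simulated values are not from the garment data.

```r
# Hypothetical helper: Pearson correlation with its 95% CI and p-value.
cor_summary <- function(x, y) {
  test <- cor.test(x, y)  # Pearson by default; incomplete pairs are dropped
  c(
    r = unname(test$estimate),
    lower = test$conf.int[1],
    upper = test$conf.int[2],
    p_value = test$p.value
  )
}

# Simulated vectors with a weak negative relationship (illustration only)
set.seed(1)
x <- rnorm(50)
y <- -0.2 * x + rnorm(50)
res <- cor_summary(x, y)
res
```

For Set 2, for instance, the call would be cor_summary(garment_prod$idle_men, productivity_diff).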
# Set 1
ci_actual_prod <- t.test(garment_prod$actual_productivity)$conf.int
# Set 2
ci_prod_diff <- t.test(productivity_diff)$conf.int
# Set 3 (same response variable as Set 2, so the interval is identical)
ci_prod_diff <- t.test(productivity_diff)$conf.int
ci_actual_prod
## [1] 0.7251963 0.7449859
## attr(,"conf.level")
## [1] 0.95
ci_prod_diff
## [1] -0.003619192 0.014536557
## attr(,"conf.level")
## [1] 0.95
ci_prod_diff
## [1] -0.003619192 0.014536557
## attr(,"conf.level")
## [1] 0.95
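Conclusions about the population based on the 95% confidence intervals for each response variable:
For actual productivity in Set 1: we are 95% confident that the population mean of actual productivity lies between about 0.725 and 0.745.
For productivity difference in Set 2 and Set 3: the 95% interval for the mean difference runs from about -0.004 to 0.015. Because the interval contains zero, the data do not show that mean actual productivity differs from the target.
In summary, both intervals are narrow, so the population means are estimated precisely: mean actual productivity is roughly 0.73 to 0.74, and on average productivity is close to target rather than consistently below it. These intervals describe uncertainty about the means, not the spread of individual observations, so on their own they do not show that production levels or shortfalls are predictable.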
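The intervals returned by t.test() come from the usual one-sample formula, mean ± t critical value × sd / sqrt(n). The sketch below computes it by hand with a hypothetical helper, mean_ci, on a small toy vector (not the garment data) and compares the result with t.test().

```r
# Hypothetical helper: one-sample t confidence interval for a mean.
mean_ci <- function(x, conf_level = 0.95) {
  x <- x[!is.na(x)]                                   # drop missing values
  n <- length(x)
  se <- sd(x) / sqrt(n)                               # standard error of the mean
  crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)    # two-sided t critical value
  c(lower = mean(x) - crit * se, upper = mean(x) + crit * se)
}

# Toy values for illustration only
toy <- c(0.70, 0.75, 0.80, 0.65, 0.85)
mean_ci(toy)

# Should agree with the interval from t.test()
all.equal(unname(mean_ci(toy)), as.numeric(t.test(toy)$conf.int))
```

Applied to the report's data, mean_ci(garment_prod$actual_productivity) should reproduce ci_actual_prod. The width shrinks with sqrt(n), which is why a sample of this size gives a narrow interval for the mean even when individual values vary widely.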