data <- read.csv("C:\\Users\\Krishna\\Downloads\\productivity+prediction+of+garment+employees\\garments_worker_productivity.csv")
# 1. Selecting Variables
# **Response Variable:** Actual Productivity
# **Explanatory Variable:** Department
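Before fitting the ANOVA it can help to confirm how many distinct department labels the data contains, because labels that differ only by stray whitespace or spelling are treated as separate groups. The sketch below uses a small made-up data frame (`toy`, not the garment data) in which one label carries a trailing space. Whether the real file has this problem is an assumption to check with `table(data$department)`.

```r
# Toy stand-in for the real data; the values are invented for illustration.
toy <- data.frame(
  department = c("sewing", "finishing", "finishing ", "sewing", "finishing "),
  stringsAsFactors = FALSE
)

# Count rows per label; "finishing" and "finishing " appear as separate groups.
levels_before <- table(toy$department)

# trimws() strips leading and trailing whitespace so the labels merge.
toy$department <- trimws(toy$department)
levels_after <- table(toy$department)

print(levels_before)
print(levels_after)
```

The ANOVA table reports 2 degrees of freedom for department, which corresponds to three distinct labels, so this check is worth running on the real column.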
# 2. ANOVA Test
# Null hypothesis: Mean actual productivity is the same in every department.
# Alternative hypothesis: At least one department's mean actual productivity differs from the others.
# Perform ANOVA
anova_result <- aov(actual_productivity ~ department, data = data)
summary(anova_result)
##               Df Sum Sq Mean Sq F value   Pr(>F)
## department     2   0.72  0.3615   12.09 6.31e-06 ***
## Residuals   1194  35.69  0.0299
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
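The p-value from the ANOVA test is 6.31e-06 (F = 12.09 on 2 and 1194 degrees of freedom). Since this value is less than 0.05, we reject the null hypothesis at the 5% significance level.
The data provide statistically significant evidence that mean actual productivity differs among departments. The test does not indicate which departments differ from each other.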
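A significant F test says that at least one department mean differs, but not which ones. One common follow-up is Tukey's HSD test via `TukeyHSD()` from base R's stats package. The sketch below runs it on simulated data (`sim`, with hypothetical departments A, B and C), not on the garment data. For the analysis above, the corresponding call would be `TukeyHSD(anova_result)`.

```r
set.seed(1)

# Simulated stand-in: three hypothetical departments with different means.
sim <- data.frame(
  department = factor(rep(c("A", "B", "C"), each = 50)),
  actual_productivity = c(rnorm(50, 0.70, 0.1),
                          rnorm(50, 0.75, 0.1),
                          rnorm(50, 0.80, 0.1))
)

fit <- aov(actual_productivity ~ department, data = sim)

# Pairwise differences in means with family-wise 95% confidence intervals.
posthoc <- TukeyHSD(fit, conf.level = 0.95)
print(posthoc)
```

Each row of the result compares one pair of groups; a pair whose interval excludes 0 differs at the chosen family-wise level.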
# Identifying the predictor variable
# In this dataset, SMV (standard minute value) is used as the predictor variable.
# The next step is to build a regression model on this predictor.
# Build linear regression model
model <- lm(actual_productivity ~ smv, data = data)
# Summary of the model
summary(model)
##
## Call:
## lm(formula = actual_productivity ~ smv, data = data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.52095 -0.08825  0.04300  0.11334  0.37991
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.7644124  0.0085220  89.699  < 2e-16 ***
## smv         -0.0019467  0.0004578  -4.252 2.28e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1733 on 1195 degrees of freedom
## Multiple R-squared: 0.01491, Adjusted R-squared: 0.01408
## F-statistic: 18.08 on 1 and 1195 DF, p-value: 2.281e-05
The R-squared value of the model is 0.01491, so SMV explains only about 1.5% of the variability in actual productivity. The relationship is statistically significant (p = 2.281e-05) but weak.
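For a regression with a single predictor, R-squared equals the squared Pearson correlation between predictor and response, and the F statistic equals the squared t statistic of the slope. The short check below plugs in the values printed in the model summary to show what this R-squared implies about the correlation. The variable names are illustrative only.

```r
r_squared <- 0.01491   # Multiple R-squared from the summary
t_slope   <- -4.252    # t value for smv
f_stat    <- 18.08     # F-statistic from the summary

# Correlation between smv and actual_productivity.
# The sign is negative because the estimated slope is negative.
r <- -sqrt(r_squared)
round(r, 3)            # -0.122

# In simple regression, F equals t squared (up to rounding of printed values).
round(t_slope^2, 2)    # 18.08
```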
Interpretation of Coefficients: The coefficient of SMV is -0.0019467, so each one-unit increase in SMV is associated with an expected decrease of about 0.0019 in actual productivity. The intercept, 0.7644, is the expected actual productivity when SMV is 0.
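To make the coefficients concrete, the fitted line can be evaluated by hand. The sketch below hard-codes the estimates from the summary output and compares two illustrative SMV values (10 and 20, chosen arbitrarily). With the fitted object available, `predict(model, newdata = data.frame(smv = c(10, 20)))` should give essentially the same predictions.

```r
intercept <- 0.7644124   # (Intercept) estimate from the summary
slope     <- -0.0019467  # smv estimate from the summary

# Fitted line: expected actual productivity at a given SMV.
predict_productivity <- function(smv) intercept + slope * smv

pred <- predict_productivity(c(10, 20))
round(pred, 4)           # 0.7449 0.7255

# A 10-unit increase in SMV changes the prediction by 10 * slope.
round(diff(pred), 4)     # -0.0195
```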
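Recommendations: Because higher SMV is associated with slightly lower actual productivity, reducing SMV where feasible may help improve productivity in the manufacturing process. However, SMV alone explains only about 1.5% of the variation, so other factors should also be examined before acting on this result.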
# Load required packages
library(ggplot2)
# Plot SMV vs. Actual Productivity
ggplot(data, aes(x = smv, y = actual_productivity)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "SMV vs. Actual Productivity",
       x = "SMV (Standard Minute Value)",
       y = "Actual Productivity")
## `geom_smooth()` using formula = 'y ~ x'
1) ANOVA test: The ANOVA test evaluates whether mean actual productivity differs significantly across the categories of a categorical variable (e.g., department).
2) Linear regression model: The linear regression model describes the relationship between a continuous predictor variable (SMV) and the response variable (actual productivity).
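The two methods are closely related: a one-way ANOVA is a linear regression with a categorical predictor, so `aov()` and `lm()` produce the same overall F test. The sketch below illustrates this on simulated data (`sim`, with hypothetical groups A, B and C), not on the garment data.

```r
set.seed(42)

# Simulated stand-in: three hypothetical departments.
sim <- data.frame(
  department = factor(rep(c("A", "B", "C"), each = 40)),
  actual_productivity = rnorm(120,
                              mean = rep(c(0.70, 0.75, 0.72), each = 40),
                              sd = 0.1)
)

# F statistic from the ANOVA table.
f_aov <- summary(aov(actual_productivity ~ department, data = sim))[[1]][["F value"]][1]

# Overall F statistic from the equivalent linear model.
f_lm <- summary(lm(actual_productivity ~ department, data = sim))$fstatistic[["value"]]

c(aov = f_aov, lm = f_lm)
```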