Reading the Managing the employee Retention case study dataset into a data frame called “store”for futher investigation and insights.
store <- read.csv(paste("Store24.csv",sep=""))
head(store)
## store Sales Profit MTenure CTenure Pop Comp Visibility
## 1 1 1060294 265014 0.00000 24.804930 7535 2.797888 3
## 2 2 1619874 424007 86.22219 6.636550 8630 4.235555 4
## 3 3 1099921 222735 23.88854 5.026694 9695 4.494666 3
## 4 4 1053860 210122 0.00000 5.371663 2797 4.253946 4
## 5 5 1227841 300480 3.87737 6.866530 20335 1.651364 2
## 6 6 1703140 469050 149.93590 11.351130 16926 3.184613 3
## PedCount Res Hours24 CrewSkill MgrSkill ServQual
## 1 3 1 1 3.56 3.150000 86.84327
## 2 3 1 1 3.20 3.556667 94.73510
## 3 3 1 1 3.80 4.116667 78.94776
## 4 2 1 1 2.06 4.100000 100.00000
## 5 5 0 1 3.65 3.588889 68.42164
## 6 4 1 0 3.58 4.605556 94.73510
describe(store)
## vars n mean sd median trimmed mad
## store 1 75 38.00 21.79 38.00 38.00 28.17
## Sales 2 75 1205413.12 304531.31 1127332.00 1182031.25 288422.04
## Profit 3 75 276313.61 89404.08 265014.00 270260.34 90532.00
## MTenure 4 75 45.30 57.67 24.12 33.58 29.67
## CTenure 5 75 13.93 17.70 7.21 10.60 6.14
## Pop 6 75 9825.59 5911.67 8896.00 9366.07 7266.22
## Comp 7 75 3.79 1.31 3.63 3.66 0.82
## Visibility 8 75 3.08 0.75 3.00 3.07 0.00
## PedCount 9 75 2.96 0.99 3.00 2.97 1.48
## Res 10 75 0.96 0.20 1.00 1.00 0.00
## Hours24 11 75 0.84 0.37 1.00 0.92 0.00
## CrewSkill 12 75 3.46 0.41 3.50 3.47 0.34
## MgrSkill 13 75 3.64 0.41 3.59 3.62 0.45
## ServQual 14 75 87.15 12.61 89.47 88.62 15.61
## min max range skew kurtosis se
## store 1.00 75.00 74.00 0.00 -1.25 2.52
## Sales 699306.00 2113089.00 1413783.00 0.71 -0.09 35164.25
## Profit 122180.00 518998.00 396818.00 0.62 -0.21 10323.49
## MTenure 0.00 277.99 277.99 2.01 3.90 6.66
## CTenure 0.89 114.15 113.26 3.52 15.00 2.04
## Pop 1046.00 26519.00 25473.00 0.62 -0.23 682.62
## Comp 1.65 11.13 9.48 2.48 11.31 0.15
## Visibility 2.00 5.00 3.00 0.25 -0.38 0.09
## PedCount 1.00 5.00 4.00 0.00 -0.52 0.11
## Res 0.00 1.00 1.00 -4.60 19.43 0.02
## Hours24 0.00 1.00 1.00 -1.82 1.32 0.04
## CrewSkill 2.06 4.64 2.58 -0.43 1.64 0.05
## MgrSkill 2.96 4.62 1.67 0.27 -0.53 0.05
## ServQual 57.90 100.00 42.10 -0.66 -0.72 1.46
Lets (i) Use R to measure the mean and standard deviation of Profit. (ii) Use R to measure the mean and standard deviation of MTenure. (iii) Use R to measure the mean and standard deviation of CTenure
mean_sd(store$Profit,denote_sd = "paren")
## [1] "276,313.61 (89,404.08)"
mean_sd(store$MTenure,denote_sd = "paren")
## [1] "45.30 (57.67)"
mean_sd(store$CTenure,denote_sd = "paren")
## [1] "13.93 (17.70)"
Lets now visualize variable MTenure and CTenure
hist(store$MTenure, breaks = 18,
col="gray", xlab="Mtenure",
main="Manager Tenure")
hist(store$CTenure, breaks = 18,
col="gray", xlab="Mtenure",
main="Manager Tenure")
Lets Use R to print the {StoreID, Sales, Profit, MTenure, CTenure} of the top 10 most profitable stores.
newdata <- store[c(1:5)]
newdata <- newdata[order(-store$Profit),]
head(newdata,10)
## store Sales Profit MTenure CTenure
## 74 74 1782957 518998 171.09720 29.519510
## 7 7 1809256 476355 62.53080 7.326488
## 9 9 2113089 474725 108.99350 6.061602
## 6 6 1703140 469050 149.93590 11.351130
## 44 44 1807740 439781 182.23640 114.151900
## 2 2 1619874 424007 86.22219 6.636550
## 45 45 1602362 410149 47.64565 9.166325
## 18 18 1704826 394039 239.96980 33.774130
## 11 11 1583446 389886 44.81977 2.036961
## 47 47 1665657 387853 12.84790 6.636550
Lets use R to print the {StoreID, Sales, Profit, MTenure, CTenure} of the bottom 10 least profitable stores.
newdata1 <- newdata[order(store$Profit),] # sort by Profit
head(newdata1,10)
## store Sales Profit MTenure CTenure
## 4 4 1053860 210122 0.0000000 5.3716630
## 37 37 1202917 187765 23.1985000 1.3470230
## 14 14 1171491 261571 87.3722600 2.9568790
## 26 26 898548 211912 0.6571813 20.4353200
## 10 10 1080979 278625 31.4789900 23.1950700
## 22 22 1433440 367036 18.3682200 25.9548300
## 36 36 1016950 219292 41.5995800 20.8952800
## 64 64 969509 221157 0.0000000 0.8870637
## 71 71 977566 198529 43.8997200 38.3737200
## 17 17 1095695 265584 31.7090000 3.6468170
Lets use R to draw a scatter plot of Profit vs. MTenure.
scatterplot( Profit ~ MTenure, data=store, ## this is Y,X
xlab="MTenure in weeks", ylab="Profit in $",spread = FALSE,smoother.args=list(lty=2),
pch=19,
main="Scatterplot of Profit vs Mtenure",labels = row.names(store))
Use R to draw a scatter plot of Profit vs. CTenure.
scatterplot( Profit ~ CTenure, data=store, ## this is Y,X
xlab="CTenure in weeks", ylab="Profit in $",spread = FALSE,smoother.args=list(lty=2),
pch=19,
main="Scatterplot of Profit vs Mtenure",labels = row.names(store))
Lets use R to construct a Correlation Matrix for all the variables in the dataset. (Display the numbers up to 2 Decimal places)
corrs <- round(cor(store, use="pairwise.complete.obs"), 2)
head(corrs,10)
## store Sales Profit MTenure CTenure Pop Comp Visibility
## store 1.00 -0.23 -0.20 -0.06 0.02 -0.29 0.03 -0.03
## Sales -0.23 1.00 0.92 0.45 0.25 0.40 -0.24 0.13
## Profit -0.20 0.92 1.00 0.44 0.26 0.43 -0.33 0.14
## MTenure -0.06 0.45 0.44 1.00 0.24 -0.06 0.18 0.16
## CTenure 0.02 0.25 0.26 0.24 1.00 0.00 -0.07 0.07
## Pop -0.29 0.40 0.43 -0.06 0.00 1.00 -0.27 -0.05
## Comp 0.03 -0.24 -0.33 0.18 -0.07 -0.27 1.00 0.03
## Visibility -0.03 0.13 0.14 0.16 0.07 -0.05 0.03 1.00
## PedCount -0.22 0.42 0.45 0.06 -0.08 0.61 -0.15 -0.14
## Res -0.03 -0.17 -0.16 -0.06 -0.34 -0.24 0.22 0.02
## PedCount Res Hours24 CrewSkill MgrSkill ServQual
## store -0.22 -0.03 0.03 0.05 -0.07 -0.32
## Sales 0.42 -0.17 0.06 0.16 0.31 0.39
## Profit 0.45 -0.16 -0.03 0.16 0.32 0.36
## MTenure 0.06 -0.06 -0.17 0.10 0.23 0.18
## CTenure -0.08 -0.34 0.07 0.26 0.12 0.08
## Pop 0.61 -0.24 -0.22 0.28 0.08 0.12
## Comp -0.15 0.22 0.13 -0.04 0.22 0.02
## Visibility -0.14 0.02 0.05 -0.20 0.07 0.21
## PedCount 1.00 -0.28 -0.28 0.21 0.09 -0.01
## Res -0.28 1.00 -0.09 -0.15 -0.03 0.09
#We can also use
#options(digits=2)
#cor(store,use="complete obs", method="Kendall")
Use R to measure the correlation between Profit and MTenure. (Display the numbers up to 2 Decimal places)
options(digits =2)
cor(store$Profit,store$MTenure)
## [1] 0.44
Use R to measure the correlation between Profit and CTenure. (Display the numbers up to 2 Decimal places)
options(digits =2)
cor(store$Profit,store$CTenure)
## [1] 0.26
Lets study the scatter plot matrix for the different variables.
scatterplot.matrix(store[,c("Profit","MTenure","CTenure")],spread=FALSE,
smoother.args=list(lty=2),
main="Scatter plot Matrix")
## Warning: 'scatterplot.matrix' is deprecated.
## Use 'scatterplotMatrix' instead.
## See help("Deprecated") and help("car-deprecated").
Lets Use R to construct the following Corrgram based on all variables in the dataset.
#library("corrgram")
#corrgram(store, order=FALSE,
# main="Corrgram of Store",lower.panel=panel.shade, upper.panel=panel.pie, # text.panel=panel.txt)
Run a Pearson’s Correlation test on the correlation between Profit and MTenure. What is the p-value?
cor.test(store$Profit,store$MTenure)
##
## Pearson's product-moment correlation
##
## data: store$Profit and store$MTenure
## t = 4, df = 70, p-value = 8e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.24 0.61
## sample estimates:
## cor
## 0.44
Thus the p-value is 8.19x10^(-5) Run a Pearson’s Correlation test on the correlation between Profit and CTenure. What is the p-value?
cor.test(store$Profit,store$CTenure)
##
## Pearson's product-moment correlation
##
## data: store$Profit and store$CTenure
## t = 2, df = 70, p-value = 0.03
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.033 0.458
## sample estimates:
## cor
## 0.26
Thus the p-value from the Pearson’s Correlation test is 0.02562 Run a regression of Profit on MTenure, CTenure Comp, Pop, PedCount, Res, Hours24, Visibility
modulus1 <- lm(Profit ~ MTenure + CTenure + Comp + Pop + PedCount + Res + Hours24 + Visibility ,data=store )
summary(modulus1)
##
## Call:
## lm(formula = Profit ~ MTenure + CTenure + Comp + Pop + PedCount +
## Res + Hours24 + Visibility, data = store)
##
## Residuals:
## Min 1Q Median 3Q Max
## -105789 -35946 -7069 33780 112390
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7610.04 66821.99 0.11 0.90967
## MTenure 760.99 127.09 5.99 9.7e-08 ***
## CTenure 944.98 421.69 2.24 0.02840 *
## Comp -25286.89 5491.94 -4.60 1.9e-05 ***
## Pop 3.67 1.47 2.50 0.01489 *
## PedCount 34087.36 9073.20 3.76 0.00037 ***
## Res 91584.68 39231.28 2.33 0.02262 *
## Hours24 63233.31 19641.11 3.22 0.00199 **
## Visibility 12625.45 9087.62 1.39 0.16941
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 57000 on 66 degrees of freedom
## Multiple R-squared: 0.638, Adjusted R-squared: 0.594
## F-statistic: 14.5 on 8 and 66 DF, p-value: 5.38e-12
lets List the explanatory variable(s) whose beta-coefficients are statistically significant (p < 0.05).
Mtenure, CTenure,Comp,Pop, PedCount,Res, Hours24 are statistically significant explanatory variables This can be viewed from the P-value column above
Lets List the explanatory variable(s) whose beta-coefficients are not statistically significant (p > 0.05)
Visibility is a statistically insignificant explanatory variable
Questions and Answers
Q1)What is expected change in the Profit at a store, if the Manager’s tenure i.e. number of months of experience with Store24, increases by one month?
Ans)If the manager’s tenure increases by one month then the profit at the store by $761
What is expected change in the Profit at a store, if the Crew’s tenure i.e. number of months of experience with Store24, increases by one month?
Ans)If the employees tenure increases by 1 month then the profit at the store increases by $945
Managerial Insight
As seen above, from the regression analysis,