Analysis of the Case Study, Store24 (A): Managing Employee Retention

This document gives Analysis of the Case Store24 (A): Managing Employee Retention.

Task 4c

Set the working directory.

setwd("~/Desktop/Data Analytics Internship/Employee Retention")

Read the dataset.

store  <- read.csv(paste("Store24.csv" , sep = ""))
View(store)

Describing the Dataset

library(psych)
describe(store)
##            vars  n       mean        sd     median    trimmed       mad
## store         1 75      38.00     21.79      38.00      38.00     28.17
## Sales         2 75 1205413.12 304531.31 1127332.00 1182031.25 288422.04
## Profit        3 75  276313.61  89404.08  265014.00  270260.34  90532.00
## MTenure       4 75      45.30     57.67      24.12      33.58     29.67
## CTenure       5 75      13.93     17.70       7.21      10.60      6.14
## Pop           6 75    9825.59   5911.67    8896.00    9366.07   7266.22
## Comp          7 75       3.79      1.31       3.63       3.66      0.82
## Visibility    8 75       3.08      0.75       3.00       3.07      0.00
## PedCount      9 75       2.96      0.99       3.00       2.97      1.48
## Res          10 75       0.96      0.20       1.00       1.00      0.00
## Hours24      11 75       0.84      0.37       1.00       0.92      0.00
## CrewSkill    12 75       3.46      0.41       3.50       3.47      0.34
## MgrSkill     13 75       3.64      0.41       3.59       3.62      0.45
## ServQual     14 75      87.15     12.61      89.47      88.62     15.61
##                  min        max      range  skew kurtosis       se
## store           1.00      75.00      74.00  0.00    -1.25     2.52
## Sales      699306.00 2113089.00 1413783.00  0.71    -0.09 35164.25
## Profit     122180.00  518998.00  396818.00  0.62    -0.21 10323.49
## MTenure         0.00     277.99     277.99  2.01     3.90     6.66
## CTenure         0.89     114.15     113.26  3.52    15.00     2.04
## Pop          1046.00   26519.00   25473.00  0.62    -0.23   682.62
## Comp            1.65      11.13       9.48  2.48    11.31     0.15
## Visibility      2.00       5.00       3.00  0.25    -0.38     0.09
## PedCount        1.00       5.00       4.00  0.00    -0.52     0.11
## Res             0.00       1.00       1.00 -4.60    19.43     0.02
## Hours24         0.00       1.00       1.00 -1.82     1.32     0.04
## CrewSkill       2.06       4.64       2.58 -0.43     1.64     0.05
## MgrSkill        2.96       4.62       1.67  0.27    -0.53     0.05
## ServQual       57.90     100.00      42.10 -0.66    -0.72     1.46

Summary of the Dataset

library(psych)
describe(store)
##            vars  n       mean        sd     median    trimmed       mad
## store         1 75      38.00     21.79      38.00      38.00     28.17
## Sales         2 75 1205413.12 304531.31 1127332.00 1182031.25 288422.04
## Profit        3 75  276313.61  89404.08  265014.00  270260.34  90532.00
## MTenure       4 75      45.30     57.67      24.12      33.58     29.67
## CTenure       5 75      13.93     17.70       7.21      10.60      6.14
## Pop           6 75    9825.59   5911.67    8896.00    9366.07   7266.22
## Comp          7 75       3.79      1.31       3.63       3.66      0.82
## Visibility    8 75       3.08      0.75       3.00       3.07      0.00
## PedCount      9 75       2.96      0.99       3.00       2.97      1.48
## Res          10 75       0.96      0.20       1.00       1.00      0.00
## Hours24      11 75       0.84      0.37       1.00       0.92      0.00
## CrewSkill    12 75       3.46      0.41       3.50       3.47      0.34
## MgrSkill     13 75       3.64      0.41       3.59       3.62      0.45
## ServQual     14 75      87.15     12.61      89.47      88.62     15.61
##                  min        max      range  skew kurtosis       se
## store           1.00      75.00      74.00  0.00    -1.25     2.52
## Sales      699306.00 2113089.00 1413783.00  0.71    -0.09 35164.25
## Profit     122180.00  518998.00  396818.00  0.62    -0.21 10323.49
## MTenure         0.00     277.99     277.99  2.01     3.90     6.66
## CTenure         0.89     114.15     113.26  3.52    15.00     2.04
## Pop          1046.00   26519.00   25473.00  0.62    -0.23   682.62
## Comp            1.65      11.13       9.48  2.48    11.31     0.15
## Visibility      2.00       5.00       3.00  0.25    -0.38     0.09
## PedCount        1.00       5.00       4.00  0.00    -0.52     0.11
## Res             0.00       1.00       1.00 -4.60    19.43     0.02
## Hours24         0.00       1.00       1.00 -1.82     1.32     0.04
## CrewSkill       2.06       4.64       2.58 -0.43     1.64     0.05
## MgrSkill        2.96       4.62       1.67  0.27    -0.53     0.05
## ServQual       57.90     100.00      42.10 -0.66    -0.72     1.46

Attaching the dataframe

attach(store)
## The following object is masked _by_ .GlobalEnv:
## 
##     store

Task 4d

Measure the mean and standard deviation of Profit

mean(store$Profit)
## [1] 276313.6
sd(store$Profit)
## [1] 89404.08

Measure the mean and standard deviation of MTenure

mean(store$MTenure)
## [1] 45.29644
sd(store$MTenure)
## [1] 57.67155

Measure the mean and standard deviation of CTenure

mean(store$CTenure)
## [1] 13.9315
sd(store$CTenure)
## [1] 17.69752

TASK 4e

Sorting and Subsetting data in R

attach(mtcars)
View(mtcars)
newdata <- mtcars[order(mpg),] # sort by mpg (ascending)
View(newdata)
newdata[1:5,] # see the first 5 rows
##                      mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
## Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
## Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
## Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
newdata <- mtcars[order(-mpg),] # sort by mpg (descending)
View(newdata)
detach(mtcars)

TASK 4f

Replicate Exhibit 1 shown in the case, using R

draw a scatter plot of Profit vs. MTenure.

library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
scatterplot(Profit~MTenure, data=store,
            xlab="MTenure", ylab="Profit", 
            main="Scatterplot of MTenure v/s Profit")

TASK 4h - Scatter Plots (contd.)

draw a scatter plot of Profit vs. CTenure.

scatterplot(Profit~CTenure, data=store,
            xlab="CTenure", ylab="Profit", 
            main="Scatterplot of CTenure v/s Profit")

TASK 4i - Correlation Matrix

construct a Correlation Matrix for all the variables in the dataset. (Display the numbers up to 2 Decimal places)

options(digits=2)
cor(store)
##             store  Sales Profit MTenure CTenure     Pop   Comp Visibility
## store       1.000 -0.227 -0.200  -0.057  0.0199 -0.2894  0.032     -0.026
## Sales      -0.227  1.000  0.924   0.455  0.2543  0.4035 -0.235      0.131
## Profit     -0.200  0.924  1.000   0.439  0.2577  0.4306 -0.335      0.136
## MTenure    -0.057  0.455  0.439   1.000  0.2434 -0.0609  0.181      0.157
## CTenure     0.020  0.254  0.258   0.243  1.0000 -0.0015 -0.070      0.067
## Pop        -0.289  0.403  0.431  -0.061 -0.0015  1.0000 -0.268     -0.050
## Comp        0.032 -0.235 -0.335   0.181 -0.0703 -0.2683  1.000      0.028
## Visibility -0.026  0.131  0.136   0.157  0.0665 -0.0500  0.028      1.000
## PedCount   -0.221  0.424  0.450   0.062 -0.0841  0.6076 -0.146     -0.141
## Res        -0.031 -0.167 -0.159  -0.062 -0.3403 -0.2369  0.219      0.022
## Hours24     0.027  0.063 -0.026  -0.165  0.0729 -0.2218  0.130      0.047
## CrewSkill   0.049  0.164  0.160   0.102  0.2572  0.2828 -0.042     -0.197
## MgrSkill   -0.072  0.312  0.323   0.230  0.1240  0.0836  0.224      0.073
## ServQual   -0.322  0.386  0.362   0.182  0.0812  0.1239  0.018      0.210
##            PedCount    Res Hours24 CrewSkill MgrSkill ServQual
## store       -0.2212 -0.031   0.027     0.049   -0.072  -0.3225
## Sales        0.4239 -0.167   0.063     0.164    0.312   0.3864
## Profit       0.4502 -0.159  -0.026     0.160    0.323   0.3625
## MTenure      0.0620 -0.062  -0.165     0.102    0.230   0.1817
## CTenure     -0.0841 -0.340   0.073     0.257    0.124   0.0812
## Pop          0.6076 -0.237  -0.222     0.283    0.084   0.1239
## Comp        -0.1463  0.219   0.130    -0.042    0.224   0.0181
## Visibility  -0.1411  0.022   0.047    -0.197    0.073   0.2099
## PedCount     1.0000 -0.284  -0.276     0.214    0.087  -0.0054
## Res         -0.2844  1.000  -0.089    -0.153   -0.032   0.0908
## Hours24     -0.2760 -0.089   1.000     0.105   -0.039   0.0583
## CrewSkill    0.2137 -0.153   0.105     1.000   -0.021  -0.0335
## MgrSkill     0.0875 -0.032  -0.039    -0.021    1.000   0.3567
## ServQual    -0.0054  0.091   0.058    -0.034    0.357   1.0000

TASK 4j - Correlations

Measure the correlation between Profit and MTenure. (Display the numbers up to 2 Decimal places)

cor(store$Profit, store$MTenure)
## [1] 0.44

Measure the correlation between Profit and CTenure.(Display the numbers up to 2 Decimal places)

cor(store$Profit, store$CTenure)
## [1] 0.26

TASK 4k

construct the Corrgram based on all variables in the dataset.

library(corrgram)
## Warning: replacing previous import by 'magrittr::%>%' when loading
## 'dendextend'
corrgram(store,order=TRUE, lower.panel = panel.shade,
         upper.panel = panel.pie, text.panel = panel.txt, 
         main="Corrgram of store24 dataset")

Critically think about how the Profit is correlated with the other variables (e.g. MTenure, CTenure, Sales, Pop, Comp etc). Qualitatively discuss the managerially relevant correlations.

Profit is positive for MTenure,CTenure,Sales,Pop,Comp,Visibility,ServQual,CrewSkill and Profit is negatively affected by Comp,Hours24,store and Res. We notice that the manager’s tenure(MTenure) is negatively affected by longer working hours (Hours24) and are not interested in continuing their jobs. Crews don’t in continuing their tenure in 24 hour stores as this gives a chance of having flexible hours for their shift. Profit is positively affected by Crew’s Skill and Manager’s Skill as more the customers are satisfied due to efficient customer service, more returning customers the store has.

TASK 4l - Pearson’s Correlation Tests

Run a Pearson’s Correlation test on the correlation between Profit and MTenure. What is the p-value?

cor.test(store$Profit, store$MTenure, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  store$Profit and store$MTenure
## t = 4, df = 70, p-value = 8e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.24 0.61
## sample estimates:
##  cor 
## 0.44

P-value : 8e-05

Run a Pearson’s Correlation test on the correlation between Profit and CTenure. What is the p-value?

cor.test(store$Profit, store$CTenure, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  store$Profit and store$CTenure
## t = 2, df = 70, p-value = 0.03
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.033 0.458
## sample estimates:
##  cor 
## 0.26

P-value: 0.03

TASK 4m - Regression Analysis

Regression of Profit on {MTenure, CTenure Comp, Pop, PedCount, Res, Hours24, Visibility}.

rp<- lm(Profit~MTenure+CTenure+Comp+Pop+PedCount+Res+Hours24+Visibility, data=store)
summary(rp)
## 
## Call:
## lm(formula = Profit ~ MTenure + CTenure + Comp + Pop + PedCount + 
##     Res + Hours24 + Visibility, data = store)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -105789  -35946   -7069   33780  112390 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7610.04   66821.99    0.11  0.90967    
## MTenure        760.99     127.09    5.99  9.7e-08 ***
## CTenure        944.98     421.69    2.24  0.02840 *  
## Comp        -25286.89    5491.94   -4.60  1.9e-05 ***
## Pop              3.67       1.47    2.50  0.01489 *  
## PedCount     34087.36    9073.20    3.76  0.00037 ***
## Res          91584.68   39231.28    2.33  0.02262 *  
## Hours24      63233.31   19641.11    3.22  0.00199 ** 
## Visibility   12625.45    9087.62    1.39  0.16941    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 57000 on 66 degrees of freedom
## Multiple R-squared:  0.638,  Adjusted R-squared:  0.594 
## F-statistic: 14.5 on 8 and 66 DF,  p-value: 5.38e-12

TASK 4n

List the explanatory variable(s) whose beta-coefficients are statistically significant (p < 0.05).

MTenure, CTenure, Comp, Pop, PedCount, Res, Hours24

List the explanatory variable(s) whose beta-coefficients are not statistically significant (p > 0.05).

Visibility

TASK 4o

What is expected change in the Profit at a store, if the Manager’s tenure i.e. number of months of experience with Store24, increases by one month?

From the above analysis, we can decipher that if the Manager’s tenure is increased by a month, the profit changes by $760.99.

What is expected change in the Profit at a store, if the Crew’s tenure i.e. number of months of experience with Store24, increases by one month?

From the above analysis, we can decipher that if the Crew’s tenure is increased by a month, the profit changes by $944.98.

Task 4p

Executive Summary

a)The most profitable store is with ID:74 and the least profitable store is :57

b)The correlation between Profit and MTenure are 0.44 while of that between Profit and CTenure is 0.26.

c)Pearson coefficient suggests that the value of p<0.05 which means the hypothesis is true.

d)The regression coefficient suggests that the value of p is significiant which says it is a good fit model.

e)R square value is:0.6379.It means that 63.79% of variations in the dependent variable can be explained by the independent variable.

f)Adjusted R square value is 0.594.It means 59.4% variation in the dependent variable can be explained by the independent variable also the value decreases as we add no of independent variables to it.

g)Explanatory variable(s) whose beta-coefficients are statistically significant are - MTenure , CTenure, Pop , PedCount , Res , Hours24 while that whose beta-coefficients are not statistically significant is the Visibility variable.

So as an overall analysis we can say it is a good model to fit in.