Reading the Managing the employee Retention case study dataset into a data frame called “store”for futher investigation and insights.

store <- read.csv(paste("Store24.csv",sep=""))
head(store)
##   store   Sales Profit   MTenure   CTenure   Pop     Comp Visibility
## 1     1 1060294 265014   0.00000 24.804930  7535 2.797888          3
## 2     2 1619874 424007  86.22219  6.636550  8630 4.235555          4
## 3     3 1099921 222735  23.88854  5.026694  9695 4.494666          3
## 4     4 1053860 210122   0.00000  5.371663  2797 4.253946          4
## 5     5 1227841 300480   3.87737  6.866530 20335 1.651364          2
## 6     6 1703140 469050 149.93590 11.351130 16926 3.184613          3
##   PedCount Res Hours24 CrewSkill MgrSkill  ServQual
## 1        3   1       1      3.56 3.150000  86.84327
## 2        3   1       1      3.20 3.556667  94.73510
## 3        3   1       1      3.80 4.116667  78.94776
## 4        2   1       1      2.06 4.100000 100.00000
## 5        5   0       1      3.65 3.588889  68.42164
## 6        4   1       0      3.58 4.605556  94.73510
describe(store)
##            vars  n       mean        sd     median    trimmed       mad
## store         1 75      38.00     21.79      38.00      38.00     28.17
## Sales         2 75 1205413.12 304531.31 1127332.00 1182031.25 288422.04
## Profit        3 75  276313.61  89404.08  265014.00  270260.34  90532.00
## MTenure       4 75      45.30     57.67      24.12      33.58     29.67
## CTenure       5 75      13.93     17.70       7.21      10.60      6.14
## Pop           6 75    9825.59   5911.67    8896.00    9366.07   7266.22
## Comp          7 75       3.79      1.31       3.63       3.66      0.82
## Visibility    8 75       3.08      0.75       3.00       3.07      0.00
## PedCount      9 75       2.96      0.99       3.00       2.97      1.48
## Res          10 75       0.96      0.20       1.00       1.00      0.00
## Hours24      11 75       0.84      0.37       1.00       0.92      0.00
## CrewSkill    12 75       3.46      0.41       3.50       3.47      0.34
## MgrSkill     13 75       3.64      0.41       3.59       3.62      0.45
## ServQual     14 75      87.15     12.61      89.47      88.62     15.61
##                  min        max      range  skew kurtosis       se
## store           1.00      75.00      74.00  0.00    -1.25     2.52
## Sales      699306.00 2113089.00 1413783.00  0.71    -0.09 35164.25
## Profit     122180.00  518998.00  396818.00  0.62    -0.21 10323.49
## MTenure         0.00     277.99     277.99  2.01     3.90     6.66
## CTenure         0.89     114.15     113.26  3.52    15.00     2.04
## Pop          1046.00   26519.00   25473.00  0.62    -0.23   682.62
## Comp            1.65      11.13       9.48  2.48    11.31     0.15
## Visibility      2.00       5.00       3.00  0.25    -0.38     0.09
## PedCount        1.00       5.00       4.00  0.00    -0.52     0.11
## Res             0.00       1.00       1.00 -4.60    19.43     0.02
## Hours24         0.00       1.00       1.00 -1.82     1.32     0.04
## CrewSkill       2.06       4.64       2.58 -0.43     1.64     0.05
## MgrSkill        2.96       4.62       1.67  0.27    -0.53     0.05
## ServQual       57.90     100.00      42.10 -0.66    -0.72     1.46

Lets (i) Use R to measure the mean and standard deviation of Profit. (ii) Use R to measure the mean and standard deviation of MTenure. (iii) Use R to measure the mean and standard deviation of CTenure

mean_sd(store$Profit,denote_sd = "paren")
## [1] "276,313.61 (89,404.08)"
mean_sd(store$MTenure,denote_sd = "paren")
## [1] "45.30 (57.67)"
mean_sd(store$CTenure,denote_sd = "paren")
## [1] "13.93 (17.70)"

Lets now visualize variable MTenure and CTenure

hist(store$MTenure, breaks = 18,
     col="gray", xlab="Mtenure",
     main="Manager Tenure")

hist(store$CTenure, breaks = 18,
     col="gray", xlab="Mtenure",
     main="Manager Tenure")

Lets Use R to print the {StoreID, Sales, Profit, MTenure, CTenure} of the top 10 most profitable stores.

newdata <- store[c(1:5)]
newdata <- newdata[order(-store$Profit),]
head(newdata,10)
##    store   Sales Profit   MTenure    CTenure
## 74    74 1782957 518998 171.09720  29.519510
## 7      7 1809256 476355  62.53080   7.326488
## 9      9 2113089 474725 108.99350   6.061602
## 6      6 1703140 469050 149.93590  11.351130
## 44    44 1807740 439781 182.23640 114.151900
## 2      2 1619874 424007  86.22219   6.636550
## 45    45 1602362 410149  47.64565   9.166325
## 18    18 1704826 394039 239.96980  33.774130
## 11    11 1583446 389886  44.81977   2.036961
## 47    47 1665657 387853  12.84790   6.636550

Lets use R to print the {StoreID, Sales, Profit, MTenure, CTenure} of the bottom 10 least profitable stores.

newdata1 <- newdata[order(store$Profit),] # sort by Profit
head(newdata1,10)
##    store   Sales Profit    MTenure    CTenure
## 4      4 1053860 210122  0.0000000  5.3716630
## 37    37 1202917 187765 23.1985000  1.3470230
## 14    14 1171491 261571 87.3722600  2.9568790
## 26    26  898548 211912  0.6571813 20.4353200
## 10    10 1080979 278625 31.4789900 23.1950700
## 22    22 1433440 367036 18.3682200 25.9548300
## 36    36 1016950 219292 41.5995800 20.8952800
## 64    64  969509 221157  0.0000000  0.8870637
## 71    71  977566 198529 43.8997200 38.3737200
## 17    17 1095695 265584 31.7090000  3.6468170

Lets use R to draw a scatter plot of Profit vs. MTenure.

scatterplot( Profit ~ MTenure, data=store,   ## this is Y,X
    xlab="MTenure in weeks", ylab="Profit in $",spread = FALSE,smoother.args=list(lty=2),
    pch=19, 
   main="Scatterplot of Profit vs Mtenure",labels = row.names(store))

Use R to draw a scatter plot of Profit vs. CTenure.

scatterplot( Profit ~ CTenure, data=store,   ## this is Y,X
    xlab="CTenure in weeks", ylab="Profit in $",spread = FALSE,smoother.args=list(lty=2),
    pch=19, 
   main="Scatterplot of Profit vs Mtenure",labels = row.names(store))

Lets use R to construct a Correlation Matrix for all the variables in the dataset. (Display the numbers up to 2 Decimal places)

corrs <- round(cor(store, use="pairwise.complete.obs"), 2)
head(corrs,10)
##            store Sales Profit MTenure CTenure   Pop  Comp Visibility
## store       1.00 -0.23  -0.20   -0.06    0.02 -0.29  0.03      -0.03
## Sales      -0.23  1.00   0.92    0.45    0.25  0.40 -0.24       0.13
## Profit     -0.20  0.92   1.00    0.44    0.26  0.43 -0.33       0.14
## MTenure    -0.06  0.45   0.44    1.00    0.24 -0.06  0.18       0.16
## CTenure     0.02  0.25   0.26    0.24    1.00  0.00 -0.07       0.07
## Pop        -0.29  0.40   0.43   -0.06    0.00  1.00 -0.27      -0.05
## Comp        0.03 -0.24  -0.33    0.18   -0.07 -0.27  1.00       0.03
## Visibility -0.03  0.13   0.14    0.16    0.07 -0.05  0.03       1.00
## PedCount   -0.22  0.42   0.45    0.06   -0.08  0.61 -0.15      -0.14
## Res        -0.03 -0.17  -0.16   -0.06   -0.34 -0.24  0.22       0.02
##            PedCount   Res Hours24 CrewSkill MgrSkill ServQual
## store         -0.22 -0.03    0.03      0.05    -0.07    -0.32
## Sales          0.42 -0.17    0.06      0.16     0.31     0.39
## Profit         0.45 -0.16   -0.03      0.16     0.32     0.36
## MTenure        0.06 -0.06   -0.17      0.10     0.23     0.18
## CTenure       -0.08 -0.34    0.07      0.26     0.12     0.08
## Pop            0.61 -0.24   -0.22      0.28     0.08     0.12
## Comp          -0.15  0.22    0.13     -0.04     0.22     0.02
## Visibility    -0.14  0.02    0.05     -0.20     0.07     0.21
## PedCount       1.00 -0.28   -0.28      0.21     0.09    -0.01
## Res           -0.28  1.00   -0.09     -0.15    -0.03     0.09
#We can also use 
#options(digits=2)
#cor(store,use="complete obs", method="Kendall")

Use R to measure the correlation between Profit and MTenure. (Display the numbers up to 2 Decimal places)

options(digits =2)
cor(store$Profit,store$MTenure)
## [1] 0.44

Use R to measure the correlation between Profit and CTenure. (Display the numbers up to 2 Decimal places)

options(digits =2)
cor(store$Profit,store$CTenure)
## [1] 0.26

Lets study the scatter plot matrix for the different variables.

scatterplot.matrix(store[,c("Profit","MTenure","CTenure")],spread=FALSE,
                   smoother.args=list(lty=2),
                   main="Scatter plot Matrix")
## Warning: 'scatterplot.matrix' is deprecated.
## Use 'scatterplotMatrix' instead.
## See help("Deprecated") and help("car-deprecated").

Lets Use R to construct the following Corrgram based on all variables in the dataset.

#library("corrgram")
#corrgram(store, order=FALSE,
#         main="Corrgram of Store",lower.panel=panel.shade, upper.panel=panel.pie,                #          text.panel=panel.txt)

Run a Pearson’s Correlation test on the correlation between Profit and MTenure. What is the p-value?

cor.test(store$Profit,store$MTenure)
## 
##  Pearson's product-moment correlation
## 
## data:  store$Profit and store$MTenure
## t = 4, df = 70, p-value = 8e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.24 0.61
## sample estimates:
##  cor 
## 0.44

Thus the p-value is 8.19x10^(-5) Run a Pearson’s Correlation test on the correlation between Profit and CTenure. What is the p-value?

cor.test(store$Profit,store$CTenure)
## 
##  Pearson's product-moment correlation
## 
## data:  store$Profit and store$CTenure
## t = 2, df = 70, p-value = 0.03
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.033 0.458
## sample estimates:
##  cor 
## 0.26

Thus the p-value from the Pearson’s Correlation test is 0.02562 Run a regression of Profit on MTenure, CTenure Comp, Pop, PedCount, Res, Hours24, Visibility

modulus1 <- lm(Profit ~ MTenure + CTenure + Comp + Pop + PedCount + Res + Hours24 + Visibility ,data=store )
summary(modulus1)
## 
## Call:
## lm(formula = Profit ~ MTenure + CTenure + Comp + Pop + PedCount + 
##     Res + Hours24 + Visibility, data = store)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -105789  -35946   -7069   33780  112390 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7610.04   66821.99    0.11  0.90967    
## MTenure        760.99     127.09    5.99  9.7e-08 ***
## CTenure        944.98     421.69    2.24  0.02840 *  
## Comp        -25286.89    5491.94   -4.60  1.9e-05 ***
## Pop              3.67       1.47    2.50  0.01489 *  
## PedCount     34087.36    9073.20    3.76  0.00037 ***
## Res          91584.68   39231.28    2.33  0.02262 *  
## Hours24      63233.31   19641.11    3.22  0.00199 ** 
## Visibility   12625.45    9087.62    1.39  0.16941    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 57000 on 66 degrees of freedom
## Multiple R-squared:  0.638,  Adjusted R-squared:  0.594 
## F-statistic: 14.5 on 8 and 66 DF,  p-value: 5.38e-12

lets List the explanatory variable(s) whose beta-coefficients are statistically significant (p < 0.05).

Mtenure, CTenure,Comp,Pop, PedCount,Res, Hours24 are statistically significant explanatory variables This can be viewed from the P-value column above

Lets List the explanatory variable(s) whose beta-coefficients are not statistically significant (p > 0.05)

Visibility is a statistically insignificant explanatory variable

Questions and Answers

Q1)What is expected change in the Profit at a store, if the Manager’s tenure i.e. number of months of experience with Store24, increases by one month?

Ans)If the manager’s tenure increases by one month then the profit at the store by $761

What is expected change in the Profit at a store, if the Crew’s tenure i.e. number of months of experience with Store24, increases by one month?

Ans)If the employees tenure increases by 1 month then the profit at the store increases by $945

Managerial Insight

As seen above, from the regression analysis,

  1. Mtenure, CTenure,Comp,Pop, PedCount,Res, Hours24 are statistically significant explanatory variables.
  2. Visibility is a statistically insignificant explanatory variable.
  3. The sign of the Beta co-efficient gives the direction of relationship and the magnitude is the value of B-co-efficient.
  4. Competitor follows a negative releation with the Profit which sounds correct as increase in competition will decrease the Profit at the Store. 5)The Adjusted R squared is 0.594 ie around 60% which states that the model isn’t very good to the extent. 6)For every one unit increase in Mtenure(respective units), Profit increases by $761. This can be inferred for every individual statistically significant variable in thier respective units.
  5. The F-Test shows that the explanatory variables as a whole i.e combined explanatory variables affect the dependent variable i.e Profit which can be seen from the statistically significant p-value.