Import Store24 dataset

To import the Store24 dataset into R:

store.df <- read.csv(paste("F:/Data Analytics for Managerial Applications/Store24.csv", sep = ""))
##summary(store.df)
library(psych)
describe(store.df)
##            vars  n       mean        sd     median    trimmed       mad
## store         1 75      38.00     21.79      38.00      38.00     28.17
## Sales         2 75 1205413.12 304531.31 1127332.00 1182031.25 288422.04
## Profit        3 75  276313.61  89404.08  265014.00  270260.34  90532.00
## MTenure       4 75      45.30     57.67      24.12      33.58     29.67
## CTenure       5 75      13.93     17.70       7.21      10.60      6.14
## Pop           6 75    9825.59   5911.67    8896.00    9366.07   7266.22
## Comp          7 75       3.79      1.31       3.63       3.66      0.82
## Visibility    8 75       3.08      0.75       3.00       3.07      0.00
## PedCount      9 75       2.96      0.99       3.00       2.97      1.48
## Res          10 75       0.96      0.20       1.00       1.00      0.00
## Hours24      11 75       0.84      0.37       1.00       0.92      0.00
## CrewSkill    12 75       3.46      0.41       3.50       3.47      0.34
## MgrSkill     13 75       3.64      0.41       3.59       3.62      0.45
## ServQual     14 75      87.15     12.61      89.47      88.62     15.61
##                  min        max      range  skew kurtosis       se
## store           1.00      75.00      74.00  0.00    -1.25     2.52
## Sales      699306.00 2113089.00 1413783.00  0.71    -0.09 35164.25
## Profit     122180.00  518998.00  396818.00  0.62    -0.21 10323.49
## MTenure         0.00     277.99     277.99  2.01     3.90     6.66
## CTenure         0.89     114.15     113.26  3.52    15.00     2.04
## Pop          1046.00   26519.00   25473.00  0.62    -0.23   682.62
## Comp            1.65      11.13       9.48  2.48    11.31     0.15
## Visibility      2.00       5.00       3.00  0.25    -0.38     0.09
## PedCount        1.00       5.00       4.00  0.00    -0.52     0.11
## Res             0.00       1.00       1.00 -4.60    19.43     0.02
## Hours24         0.00       1.00       1.00 -1.82     1.32     0.04
## CrewSkill       2.06       4.64       2.58 -0.43     1.64     0.05
## MgrSkill        2.96       4.62       1.67  0.27    -0.53     0.05
## ServQual       57.90     100.00      42.10 -0.66    -0.72     1.46

Thus we see, the summary statistics match the statistics given in Exhibit 3 of the Case Study pdf file.

Calculating mean and sd of 3 variables

4d. 1) To calculate mean and SD of profits:

mean(store.df$Profit)
## [1] 276313.6
sd(store.df$Profit)
## [1] 89404.08

4d. 2) To calculate mean and SD of MTenure (Manager’s Tenure):

mean(store.df$MTenure)
## [1] 45.29644
sd(store.df$MTenure)
## [1] 57.67155

4d. 3) To calculate mean and SD of CTenure (Crew’s Tenure):

mean(store.df$CTenure)
## [1] 13.9315
sd(store.df$CTenure)
## [1] 17.69752

Sorting and subsetting the dataset

4f. 4) To print the {StoreID, Sales, Profit, MTenure, CTenure} of the top 10 most profitable stores:

most_profit <- store.df[order(-store.df$Profit),]
View(most_profit[1:10,1:5])

4f. 5) To print the {StoreID, Sales, Profit, MTenure, CTenure} of the bottom 10 least profitable stores:

least_profit <- store.df[order(store.df$Profit),]
View(least_profit[1:10,1:5])

Scatterplots

4g. 6) To draw a scatterplot of profit v/s MTenure

library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
scatterplot(store.df$Profit ~ store.df$MTenure, xlab = "MTenure",ylab = "Profit", main = "Scatterplot of Profit vs. MTenure", pch = 16, spread = FALSE, smoother.args=list(lty=2))

4h. 7) To draw a scatterplot of profit v/s CTenure

scatterplot(store.df$Profit ~ store.df$CTenure, xlab = "CTenure",ylab = "Profit", main = "Scatterplot of Profit vs. CTenure", pch = 16, spread = FALSE, smoother.args=list(lty=2))

Correlation matrix

4i. 8) To get the correlation matrix for all variables in the dataset, rounded to 2 decimal places:

round(cor(store.df),2)
##            store Sales Profit MTenure CTenure   Pop  Comp Visibility
## store       1.00 -0.23  -0.20   -0.06    0.02 -0.29  0.03      -0.03
## Sales      -0.23  1.00   0.92    0.45    0.25  0.40 -0.24       0.13
## Profit     -0.20  0.92   1.00    0.44    0.26  0.43 -0.33       0.14
## MTenure    -0.06  0.45   0.44    1.00    0.24 -0.06  0.18       0.16
## CTenure     0.02  0.25   0.26    0.24    1.00  0.00 -0.07       0.07
## Pop        -0.29  0.40   0.43   -0.06    0.00  1.00 -0.27      -0.05
## Comp        0.03 -0.24  -0.33    0.18   -0.07 -0.27  1.00       0.03
## Visibility -0.03  0.13   0.14    0.16    0.07 -0.05  0.03       1.00
## PedCount   -0.22  0.42   0.45    0.06   -0.08  0.61 -0.15      -0.14
## Res        -0.03 -0.17  -0.16   -0.06   -0.34 -0.24  0.22       0.02
## Hours24     0.03  0.06  -0.03   -0.17    0.07 -0.22  0.13       0.05
## CrewSkill   0.05  0.16   0.16    0.10    0.26  0.28 -0.04      -0.20
## MgrSkill   -0.07  0.31   0.32    0.23    0.12  0.08  0.22       0.07
## ServQual   -0.32  0.39   0.36    0.18    0.08  0.12  0.02       0.21
##            PedCount   Res Hours24 CrewSkill MgrSkill ServQual
## store         -0.22 -0.03    0.03      0.05    -0.07    -0.32
## Sales          0.42 -0.17    0.06      0.16     0.31     0.39
## Profit         0.45 -0.16   -0.03      0.16     0.32     0.36
## MTenure        0.06 -0.06   -0.17      0.10     0.23     0.18
## CTenure       -0.08 -0.34    0.07      0.26     0.12     0.08
## Pop            0.61 -0.24   -0.22      0.28     0.08     0.12
## Comp          -0.15  0.22    0.13     -0.04     0.22     0.02
## Visibility    -0.14  0.02    0.05     -0.20     0.07     0.21
## PedCount       1.00 -0.28   -0.28      0.21     0.09    -0.01
## Res           -0.28  1.00   -0.09     -0.15    -0.03     0.09
## Hours24       -0.28 -0.09    1.00      0.11    -0.04     0.06
## CrewSkill      0.21 -0.15    0.11      1.00    -0.02    -0.03
## MgrSkill       0.09 -0.03   -0.04     -0.02     1.00     0.36
## ServQual      -0.01  0.09    0.06     -0.03     0.36     1.00

Correlations between chosen variables

4j. 9) To measure the correlation between Profit and MTenure rounded off to 2 Decimal places:

round(cor(store.df$Profit, store.df$MTenure),2)
## [1] 0.44

4j. 10) To measure the correlation between Profit and CTenure rounded off to 2 Decimal places:

round(cor(store.df$Profit, store.df$CTenure),2)
## [1] 0.26

Corrgram

4k. To construct a Corrgram based on all variables in the dataset:

library(gplots)      
## 
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
## 
##     lowess
par(mfrow=c(1, 1))
corrplot::corrplot.mixed(corr=cor(store.df, use="complete.obs"), main = "Corrgram of store variables", upper="pie", tl.cex = 0.45, lower = "shade", mar = c(0,0,2,0), shade.col = colorpanel(5,"red", "blue4", "pink1"))

## Writing corrplot::corrplot() to specify that we want the function from the package corrplot to avoid ambiguity as corrplot() is part of multiple packages.

Some insights from the corrplot:

Profit appears to have a stronger correlation with MTenure than with CTenure. Profit has a negative correlation with No. of Competitors, which is understandable. Also, Sales and Profit have a moderately strong correlation with PedCount.

Pearson’s Correlation Test

4l. 12) To run a Pearson’s Correlation test on the correlation between Profit and MTenure:

cor.test(store.df$Profit, store.df$MTenure)
## 
##  Pearson's product-moment correlation
## 
## data:  store.df$Profit and store.df$MTenure
## t = 4.1731, df = 73, p-value = 8.193e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2353497 0.6055175
## sample estimates:
##       cor 
## 0.4388692

The p-value is 8.193e-05 (<0.05) suggesting strong correlation between Profit and MTenure.

4l. 13) To run a Pearson’s Correlation test on the correlation between Profit and CTenure:

cor.test(store.df$Profit, store.df$CTenure)
## 
##  Pearson's product-moment correlation
## 
## data:  store.df$Profit and store.df$CTenure
## t = 2.2786, df = 73, p-value = 0.02562
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.03262507 0.45786339
## sample estimates:
##       cor 
## 0.2576789

The p-value is 0.02562 (<0.05) suggesting strong correlation between Profit and CTenure.

Regression Analysis

4m. 14) To run a regression of Profit on {MTenure, CTenure, Comp, Pop, PedCount, Res, Hours24, Visibility}:

fit <- lm(Profit ~ MTenure+CTenure+Comp+Pop+PedCount+Res+Hours24+Visibility, data = store.df)
summary(fit)
## 
## Call:
## lm(formula = Profit ~ MTenure + CTenure + Comp + Pop + PedCount + 
##     Res + Hours24 + Visibility, data = store.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -105789  -35946   -7069   33780  112390 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7610.041  66821.994   0.114 0.909674    
## MTenure        760.993    127.086   5.988 9.72e-08 ***
## CTenure        944.978    421.687   2.241 0.028400 *  
## Comp        -25286.887   5491.937  -4.604 1.94e-05 ***
## Pop              3.667      1.466   2.501 0.014890 *  
## PedCount     34087.359   9073.196   3.757 0.000366 ***
## Res          91584.675  39231.283   2.334 0.022623 *  
## Hours24      63233.307  19641.114   3.219 0.001994 ** 
## Visibility   12625.447   9087.620   1.389 0.169411    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 56970 on 66 degrees of freedom
## Multiple R-squared:  0.6379, Adjusted R-squared:  0.594 
## F-statistic: 14.53 on 8 and 66 DF,  p-value: 5.382e-12

4n. 15) From the regression analysis above, we see that the explanatory variables whose beta-coefficients are statistically sigificant (p < 0.05) are: MTenure, CTenure, Comp, Pop, PedCount, Res and Hours24.

4n. 16) The explanatory variable whose beta-coefficient is statistically insigificant (p > 0.05) is: Visibility.

4o. 17) Translating the results from the regression analysis, we see that - For every increase in Manager’s Tenure (MTenure) by 1 month, the Profit increases by 761 Currency Units.

4o. 18) Translating the results from the regression analysis, we see that - For every increase in Crew’s Tenure (CTenure) by 1 month, the Profit increases by 945 Currency Units.

Executive Summary

4p. 19) An executive summary based on the data analysis and regression analysis conducted on the Store24 dataset:

Looking at the results of the regression analysis, we see that the F-statistic (=14.53) and the p-value (=5.382e-12 which is far less than 0.05) suggest that the model as a whole is a good one. So it passes the test imposed by the F-Test. The value of Adjusted R-squared (=0.594) suggests that the selected explanatory variables explain approximately 59% of the variance while also suggesting that there are few other variables that account for the Profits. Looking at the individual explanatory variables and their respective p-values, we can deduce that other than Visibility, all others have a significant correlation with the Profits of Store24 outlets. To answer the initial research question, we conclude that the crew and manager tenures indeed have a significant correlation with the profits. Another interesting insight we get from the regression analysis is that the site-location factors and external competition have a significant influence of the profits generated, which also makes sense.