To import the Store24 dataset into R:
store.df <- read.csv(paste("F:/Data Analytics for Managerial Applications/Store24.csv", sep = ""))
##summary(store.df)
library(psych)
describe(store.df)
## vars n mean sd median trimmed mad
## store 1 75 38.00 21.79 38.00 38.00 28.17
## Sales 2 75 1205413.12 304531.31 1127332.00 1182031.25 288422.04
## Profit 3 75 276313.61 89404.08 265014.00 270260.34 90532.00
## MTenure 4 75 45.30 57.67 24.12 33.58 29.67
## CTenure 5 75 13.93 17.70 7.21 10.60 6.14
## Pop 6 75 9825.59 5911.67 8896.00 9366.07 7266.22
## Comp 7 75 3.79 1.31 3.63 3.66 0.82
## Visibility 8 75 3.08 0.75 3.00 3.07 0.00
## PedCount 9 75 2.96 0.99 3.00 2.97 1.48
## Res 10 75 0.96 0.20 1.00 1.00 0.00
## Hours24 11 75 0.84 0.37 1.00 0.92 0.00
## CrewSkill 12 75 3.46 0.41 3.50 3.47 0.34
## MgrSkill 13 75 3.64 0.41 3.59 3.62 0.45
## ServQual 14 75 87.15 12.61 89.47 88.62 15.61
## min max range skew kurtosis se
## store 1.00 75.00 74.00 0.00 -1.25 2.52
## Sales 699306.00 2113089.00 1413783.00 0.71 -0.09 35164.25
## Profit 122180.00 518998.00 396818.00 0.62 -0.21 10323.49
## MTenure 0.00 277.99 277.99 2.01 3.90 6.66
## CTenure 0.89 114.15 113.26 3.52 15.00 2.04
## Pop 1046.00 26519.00 25473.00 0.62 -0.23 682.62
## Comp 1.65 11.13 9.48 2.48 11.31 0.15
## Visibility 2.00 5.00 3.00 0.25 -0.38 0.09
## PedCount 1.00 5.00 4.00 0.00 -0.52 0.11
## Res 0.00 1.00 1.00 -4.60 19.43 0.02
## Hours24 0.00 1.00 1.00 -1.82 1.32 0.04
## CrewSkill 2.06 4.64 2.58 -0.43 1.64 0.05
## MgrSkill 2.96 4.62 1.67 0.27 -0.53 0.05
## ServQual 57.90 100.00 42.10 -0.66 -0.72 1.46
Thus we see, the summary statistics match the statistics given in Exhibit 3 of the Case Study pdf file.
4d. 1) To calculate mean and SD of profits:
mean(store.df$Profit)
## [1] 276313.6
sd(store.df$Profit)
## [1] 89404.08
4d. 2) To calculate mean and SD of MTenure (Manager’s Tenure):
mean(store.df$MTenure)
## [1] 45.29644
sd(store.df$MTenure)
## [1] 57.67155
4d. 3) To calculate mean and SD of CTenure (Crew’s Tenure):
mean(store.df$CTenure)
## [1] 13.9315
sd(store.df$CTenure)
## [1] 17.69752
4f. 4) To print the {StoreID, Sales, Profit, MTenure, CTenure} of the top 10 most profitable stores:
most_profit <- store.df[order(-store.df$Profit),]
View(most_profit[1:10,1:5])
4f. 5) To print the {StoreID, Sales, Profit, MTenure, CTenure} of the bottom 10 least profitable stores:
least_profit <- store.df[order(store.df$Profit),]
View(least_profit[1:10,1:5])
4g. 6) To draw a scatterplot of profit v/s MTenure
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplot(store.df$Profit ~ store.df$MTenure, xlab = "MTenure",ylab = "Profit", main = "Scatterplot of Profit vs. MTenure", pch = 16, spread = FALSE, smoother.args=list(lty=2))
4h. 7) To draw a scatterplot of profit v/s CTenure
scatterplot(store.df$Profit ~ store.df$CTenure, xlab = "CTenure",ylab = "Profit", main = "Scatterplot of Profit vs. CTenure", pch = 16, spread = FALSE, smoother.args=list(lty=2))
4i. 8) To get the correlation matrix for all variables in the dataset, rounded to 2 decimal places:
round(cor(store.df),2)
## store Sales Profit MTenure CTenure Pop Comp Visibility
## store 1.00 -0.23 -0.20 -0.06 0.02 -0.29 0.03 -0.03
## Sales -0.23 1.00 0.92 0.45 0.25 0.40 -0.24 0.13
## Profit -0.20 0.92 1.00 0.44 0.26 0.43 -0.33 0.14
## MTenure -0.06 0.45 0.44 1.00 0.24 -0.06 0.18 0.16
## CTenure 0.02 0.25 0.26 0.24 1.00 0.00 -0.07 0.07
## Pop -0.29 0.40 0.43 -0.06 0.00 1.00 -0.27 -0.05
## Comp 0.03 -0.24 -0.33 0.18 -0.07 -0.27 1.00 0.03
## Visibility -0.03 0.13 0.14 0.16 0.07 -0.05 0.03 1.00
## PedCount -0.22 0.42 0.45 0.06 -0.08 0.61 -0.15 -0.14
## Res -0.03 -0.17 -0.16 -0.06 -0.34 -0.24 0.22 0.02
## Hours24 0.03 0.06 -0.03 -0.17 0.07 -0.22 0.13 0.05
## CrewSkill 0.05 0.16 0.16 0.10 0.26 0.28 -0.04 -0.20
## MgrSkill -0.07 0.31 0.32 0.23 0.12 0.08 0.22 0.07
## ServQual -0.32 0.39 0.36 0.18 0.08 0.12 0.02 0.21
## PedCount Res Hours24 CrewSkill MgrSkill ServQual
## store -0.22 -0.03 0.03 0.05 -0.07 -0.32
## Sales 0.42 -0.17 0.06 0.16 0.31 0.39
## Profit 0.45 -0.16 -0.03 0.16 0.32 0.36
## MTenure 0.06 -0.06 -0.17 0.10 0.23 0.18
## CTenure -0.08 -0.34 0.07 0.26 0.12 0.08
## Pop 0.61 -0.24 -0.22 0.28 0.08 0.12
## Comp -0.15 0.22 0.13 -0.04 0.22 0.02
## Visibility -0.14 0.02 0.05 -0.20 0.07 0.21
## PedCount 1.00 -0.28 -0.28 0.21 0.09 -0.01
## Res -0.28 1.00 -0.09 -0.15 -0.03 0.09
## Hours24 -0.28 -0.09 1.00 0.11 -0.04 0.06
## CrewSkill 0.21 -0.15 0.11 1.00 -0.02 -0.03
## MgrSkill 0.09 -0.03 -0.04 -0.02 1.00 0.36
## ServQual -0.01 0.09 0.06 -0.03 0.36 1.00
4j. 9) To measure the correlation between Profit and MTenure rounded off to 2 Decimal places:
round(cor(store.df$Profit, store.df$MTenure),2)
## [1] 0.44
4j. 10) To measure the correlation between Profit and CTenure rounded off to 2 Decimal places:
round(cor(store.df$Profit, store.df$CTenure),2)
## [1] 0.26
4k. To construct a Corrgram based on all variables in the dataset:
library(gplots)
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
## lowess
par(mfrow=c(1, 1))
corrplot::corrplot.mixed(corr=cor(store.df, use="complete.obs"), main = "Corrgram of store variables", upper="pie", tl.cex = 0.45, lower = "shade", mar = c(0,0,2,0), shade.col = colorpanel(5,"red", "blue4", "pink1"))
## Writing corrplot::corrplot() to specify that we want the function from the package corrplot to avoid ambiguity as corrplot() is part of multiple packages.
Some insights from the corrplot:
Profit appears to have a stronger correlation with MTenure than with CTenure. Profit has a negative correlation with No. of Competitors, which is understandable. Also, Sales and Profit have a moderately strong correlation with PedCount.
4l. 12) To run a Pearson’s Correlation test on the correlation between Profit and MTenure:
cor.test(store.df$Profit, store.df$MTenure)
##
## Pearson's product-moment correlation
##
## data: store.df$Profit and store.df$MTenure
## t = 4.1731, df = 73, p-value = 8.193e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2353497 0.6055175
## sample estimates:
## cor
## 0.4388692
The p-value is 8.193e-05 (<0.05) suggesting strong correlation between Profit and MTenure.
4l. 13) To run a Pearson’s Correlation test on the correlation between Profit and CTenure:
cor.test(store.df$Profit, store.df$CTenure)
##
## Pearson's product-moment correlation
##
## data: store.df$Profit and store.df$CTenure
## t = 2.2786, df = 73, p-value = 0.02562
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.03262507 0.45786339
## sample estimates:
## cor
## 0.2576789
The p-value is 0.02562 (<0.05) suggesting strong correlation between Profit and CTenure.
4m. 14) To run a regression of Profit on {MTenure, CTenure, Comp, Pop, PedCount, Res, Hours24, Visibility}:
fit <- lm(Profit ~ MTenure+CTenure+Comp+Pop+PedCount+Res+Hours24+Visibility, data = store.df)
summary(fit)
##
## Call:
## lm(formula = Profit ~ MTenure + CTenure + Comp + Pop + PedCount +
## Res + Hours24 + Visibility, data = store.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -105789 -35946 -7069 33780 112390
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7610.041 66821.994 0.114 0.909674
## MTenure 760.993 127.086 5.988 9.72e-08 ***
## CTenure 944.978 421.687 2.241 0.028400 *
## Comp -25286.887 5491.937 -4.604 1.94e-05 ***
## Pop 3.667 1.466 2.501 0.014890 *
## PedCount 34087.359 9073.196 3.757 0.000366 ***
## Res 91584.675 39231.283 2.334 0.022623 *
## Hours24 63233.307 19641.114 3.219 0.001994 **
## Visibility 12625.447 9087.620 1.389 0.169411
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 56970 on 66 degrees of freedom
## Multiple R-squared: 0.6379, Adjusted R-squared: 0.594
## F-statistic: 14.53 on 8 and 66 DF, p-value: 5.382e-12
4n. 15) From the regression analysis above, we see that the explanatory variables whose beta-coefficients are statistically sigificant (p < 0.05) are: MTenure, CTenure, Comp, Pop, PedCount, Res and Hours24.
4n. 16) The explanatory variable whose beta-coefficient is statistically insigificant (p > 0.05) is: Visibility.
4o. 17) Translating the results from the regression analysis, we see that - For every increase in Manager’s Tenure (MTenure) by 1 month, the Profit increases by 761 Currency Units.
4o. 18) Translating the results from the regression analysis, we see that - For every increase in Crew’s Tenure (CTenure) by 1 month, the Profit increases by 945 Currency Units.
4p. 19) An executive summary based on the data analysis and regression analysis conducted on the Store24 dataset:
Looking at the results of the regression analysis, we see that the F-statistic (=14.53) and the p-value (=5.382e-12 which is far less than 0.05) suggest that the model as a whole is a good one. So it passes the test imposed by the F-Test. The value of Adjusted R-squared (=0.594) suggests that the selected explanatory variables explain approximately 59% of the variance while also suggesting that there are few other variables that account for the Profits. Looking at the individual explanatory variables and their respective p-values, we can deduce that other than Visibility, all others have a significant correlation with the Profits of Store24 outlets. To answer the initial research question, we conclude that the crew and manager tenures indeed have a significant correlation with the profits. Another interesting insight we get from the regression analysis is that the site-location factors and external competition have a significant influence of the profits generated, which also makes sense.