This R Markdown file is written to find any correlations between the employee tenure and the sales/profits of a given store. It involves reading a dataset of 75 stores, containing various variables.
setwd("~/Muyeena/Internship/Case Studies/Store 24")
store = read.csv("Store24.csv")
library(psych)
str(store)
## 'data.frame': 75 obs. of 14 variables:
## $ store : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Sales : int 1060294 1619874 1099921 1053860 1227841 1703140 1809256 1378482 2113089 1080979 ...
## $ Profit : int 265014 424007 222735 210122 300480 469050 476355 361115 474725 278625 ...
## $ MTenure : num 0 86.22 23.89 0 3.88 ...
## $ CTenure : num 24.8 6.64 5.03 5.37 6.87 ...
## $ Pop : int 7535 8630 9695 2797 20335 16926 17754 20824 26519 16381 ...
## $ Comp : num 2.8 4.24 4.49 4.25 1.65 ...
## $ Visibility: int 3 4 3 4 2 3 2 4 2 4 ...
## $ PedCount : int 3 3 3 2 5 4 5 3 4 3 ...
## $ Res : int 1 1 1 1 0 1 1 1 1 1 ...
## $ Hours24 : int 1 1 1 1 1 0 1 1 1 0 ...
## $ CrewSkill : num 3.56 3.2 3.8 2.06 3.65 3.58 3.94 3.98 3.22 3.54 ...
## $ MgrSkill : num 3.15 3.56 4.12 4.1 3.59 ...
## $ ServQual : num 86.8 94.7 78.9 100 68.4 ...
head(store)
## store Sales Profit MTenure CTenure Pop Comp Visibility
## 1 1 1060294 265014 0.00000 24.804930 7535 2.797888 3
## 2 2 1619874 424007 86.22219 6.636550 8630 4.235555 4
## 3 3 1099921 222735 23.88854 5.026694 9695 4.494666 3
## 4 4 1053860 210122 0.00000 5.371663 2797 4.253946 4
## 5 5 1227841 300480 3.87737 6.866530 20335 1.651364 2
## 6 6 1703140 469050 149.93590 11.351130 16926 3.184613 3
## PedCount Res Hours24 CrewSkill MgrSkill ServQual
## 1 3 1 1 3.56 3.150000 86.84327
## 2 3 1 1 3.20 3.556667 94.73510
## 3 3 1 1 3.80 4.116667 78.94776
## 4 2 1 1 2.06 4.100000 100.00000
## 5 5 0 1 3.65 3.588889 68.42164
## 6 4 1 0 3.58 4.605556 94.73510
attach(store)
## The following object is masked _by_ .GlobalEnv:
##
## store
As can be seen above, the dataset is read into a variable called store and the same has been attached for easier access to its variables.
In this, we get the summary of all the variables of R, and compare it with Exhibit 3 given in the case study.
sum = describe(store)
sum[2:10,c(3,4,8,9)]
## mean sd min max
## Sales 1205413.12 304531.31 699306.00 2113089.00
## Profit 276313.61 89404.08 122180.00 518998.00
## MTenure 45.30 57.67 0.00 277.99
## CTenure 13.93 17.70 0.89 114.15
## Pop 9825.59 5911.67 1046.00 26519.00
## Comp 3.79 1.31 1.65 11.13
## Visibility 3.08 0.75 2.00 5.00
## PedCount 2.96 0.99 1.00 5.00
## Res 0.96 0.20 0.00 1.00
This is now compared with the summary data available in Exhibit 3 of the case study.
As can be seen above, both the values are identical.
Here, we analyse the most important variables for this case study, which include -
The same has been shown in a table, by executing the following code.
sum[3:5,3:4]
## mean sd
## Profit 276313.61 89404.08
## MTenure 45.30 57.67
## CTenure 13.93 17.70
In this task, we arrange the dataset in an ascending/descending order of the Profit, and display data related to it.
newdata = store[order(-store$Profit),]
newdata[1:10,1:5]
## store Sales Profit MTenure CTenure
## 74 74 1782957 518998 171.09720 29.519510
## 7 7 1809256 476355 62.53080 7.326488
## 9 9 2113089 474725 108.99350 6.061602
## 6 6 1703140 469050 149.93590 11.351130
## 44 44 1807740 439781 182.23640 114.151900
## 2 2 1619874 424007 86.22219 6.636550
## 45 45 1602362 410149 47.64565 9.166325
## 18 18 1704826 394039 239.96980 33.774130
## 11 11 1583446 389886 44.81977 2.036961
## 47 47 1665657 387853 12.84790 6.636550
newdata = store[order(store$Profit),]
newdata[1:10,1:5]
## store Sales Profit MTenure CTenure
## 57 57 699306 122180 24.3485700 2.956879
## 66 66 879581 146058 115.2039000 3.876797
## 41 41 744211 147327 14.9180200 11.926080
## 55 55 925744 147672 6.6703910 18.365500
## 32 32 828918 149033 36.0792600 6.636550
## 13 13 857843 152513 0.6571813 1.577002
## 54 54 811190 159792 6.6703910 3.876797
## 52 52 1073008 169201 24.1185600 3.416838
## 61 61 716589 177046 21.8184200 13.305950
## 37 37 1202917 187765 23.1985000 1.347023
In this task, we first draw scatterplot of Profit vs. MTenure so that we can visualize their relationship.
par(mfrow=c(1,1))
plot(store$Profit~store$MTenure,
cex = 1,
col = "blue",
pch = 20,
main = "Scatterplot of Profit vs. MTenure",
xlab = "MTenure",
ylab = "Profit")
abline(lm(store$Profit ~ store$MTenure), col = "green")
We now draw scatterplot of Profit vs. CTenure so that we can visualize their relationship too.
par(mfrow=c(1,1))
plot(store$Profit~store$CTenure,
cex = 1,
col = "blue",
pch = 20,
main = "Scatterplot of Profit vs. CTenure",
xlab = "CTenure",
ylab = "Profit")
abline(lm(store$Profit ~ store$CTenure), col = "green")
In R, we can see the correlations of each variable with every other variable using a correlation matrix. Higher the magnitude, greater the relationship between the two variables.
round(cor(store), 2)
## store Sales Profit MTenure CTenure Pop Comp Visibility
## store 1.00 -0.23 -0.20 -0.06 0.02 -0.29 0.03 -0.03
## Sales -0.23 1.00 0.92 0.45 0.25 0.40 -0.24 0.13
## Profit -0.20 0.92 1.00 0.44 0.26 0.43 -0.33 0.14
## MTenure -0.06 0.45 0.44 1.00 0.24 -0.06 0.18 0.16
## CTenure 0.02 0.25 0.26 0.24 1.00 0.00 -0.07 0.07
## Pop -0.29 0.40 0.43 -0.06 0.00 1.00 -0.27 -0.05
## Comp 0.03 -0.24 -0.33 0.18 -0.07 -0.27 1.00 0.03
## Visibility -0.03 0.13 0.14 0.16 0.07 -0.05 0.03 1.00
## PedCount -0.22 0.42 0.45 0.06 -0.08 0.61 -0.15 -0.14
## Res -0.03 -0.17 -0.16 -0.06 -0.34 -0.24 0.22 0.02
## Hours24 0.03 0.06 -0.03 -0.17 0.07 -0.22 0.13 0.05
## CrewSkill 0.05 0.16 0.16 0.10 0.26 0.28 -0.04 -0.20
## MgrSkill -0.07 0.31 0.32 0.23 0.12 0.08 0.22 0.07
## ServQual -0.32 0.39 0.36 0.18 0.08 0.12 0.02 0.21
## PedCount Res Hours24 CrewSkill MgrSkill ServQual
## store -0.22 -0.03 0.03 0.05 -0.07 -0.32
## Sales 0.42 -0.17 0.06 0.16 0.31 0.39
## Profit 0.45 -0.16 -0.03 0.16 0.32 0.36
## MTenure 0.06 -0.06 -0.17 0.10 0.23 0.18
## CTenure -0.08 -0.34 0.07 0.26 0.12 0.08
## Pop 0.61 -0.24 -0.22 0.28 0.08 0.12
## Comp -0.15 0.22 0.13 -0.04 0.22 0.02
## Visibility -0.14 0.02 0.05 -0.20 0.07 0.21
## PedCount 1.00 -0.28 -0.28 0.21 0.09 -0.01
## Res -0.28 1.00 -0.09 -0.15 -0.03 0.09
## Hours24 -0.28 -0.09 1.00 0.11 -0.04 0.06
## CrewSkill 0.21 -0.15 0.11 1.00 -0.02 -0.03
## MgrSkill 0.09 -0.03 -0.04 -0.02 1.00 0.36
## ServQual -0.01 0.09 0.06 -0.03 0.36 1.00
Based on the above table, we can just measure and display the correlation between Profit and MTenure as follows :
0.44
Similarly, we have the correlation between Profit and CTenure as :
0.26
In R, we can construct visualizations called Corrgram, which is basically a visual representation of the correlation matrix made above. This is an important way to visualize the data.
library(corrplot)
## corrplot 0.84 loaded
library(gplots)
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
## lowess
library(corrgram)
corrgram(store, lower.panel=panel.shade,
upper.panel=panel.pie, text.panel=panel.txt,
main="Corrgram of stores intercorrelations")
As can be seen in the above data, Profit has a high correlation with -
Whereas, the following variables have somewhat an effect on Profit -
Here, we run the pearson’s correlation test to find any relationship between the profit of each store, and the manager/crew tenure. We also make a note of the p-value as it is essential in rejecting or accepting a hypothesis.
The Null hypothesis states that :
There is no relationship between Profit and Manager’s Tenure.
m = cor.test(store$Profit, store$MTenure)
m
##
## Pearson's product-moment correlation
##
## data: store$Profit and store$MTenure
## t = 4.1731, df = 73, p-value = 8.193e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2353497 0.6055175
## sample estimates:
## cor
## 0.4388692
The p-value thus obtained is 8.210^{-5} which is very small. Therefore, we can reject the null hypothesis.
In terms of crew’s tenure, the hypothesis will be as such :
There is no relationship between Profit and Crew’s Tenure.
c = cor.test(store$Profit, store$CTenure)
c
##
## Pearson's product-moment correlation
##
## data: store$Profit and store$CTenure
## t = 2.2786, df = 73, p-value = 0.02562
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.03262507 0.45786339
## sample estimates:
## cor
## 0.2576789
The p-value thus obtained is 0.02562 This value is less than 0.05, which means we can reject the null hypothesis. But, it also show’s that a Crew Member’s tenure will not have that significant an impact on the profit of the store.
The major task asked in the case study was to see the effect of a Manager’s and Crew’s tenure in the organization on the sales of their store. This had to be calculated, while also accounting for the store location, popularity, etc.
We can run a basic linear regression model to get an estimated effect of the tenure’s of the employees on the profit of the stores.
This can be carried out as follows :
model = Profit ~ MTenure + CTenure + Comp + PedCount + Res + Hours24 + Visibility
reg = lm(model, data = store)
summary(reg)
##
## Call:
## lm(formula = model, data = store)
##
## Residuals:
## Min 1Q Median 3Q Max
## -117573 -31547 -3433 26249 115426
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19521.5 69215.5 0.282 0.77878
## MTenure 721.6 131.0 5.510 6.16e-07 ***
## CTenure 1006.9 437.1 2.303 0.02437 *
## Comp -27692.7 5615.0 -4.932 5.66e-06 ***
## PedCount 46618.9 7854.5 5.935 1.15e-07 ***
## Res 87331.6 40701.8 2.146 0.03553 *
## Hours24 59035.6 20321.9 2.905 0.00497 **
## Visibility 14140.2 9416.1 1.502 0.13787
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 59160 on 67 degrees of freedom
## Multiple R-squared: 0.6036, Adjusted R-squared: 0.5622
## F-statistic: 14.57 on 7 and 67 DF, p-value: 2.271e-11
The following variables, have a significant effect (p<0.05) on the profit of the store :
The explanatory variable which does not have a significant effect (p>0.05) on the profit of the store is :
The expected change in the Profit at a store, if the Manager’s tenure i.e. number of months of experience with Store24, increases by one month is 721.58.
The expected change in the Profit at a store, if the Crew’s tenure i.e. number of months of experience with Store24, increases by one month is 1006.91
After all the tasks carried out on the given dataset, when can infer a lot of important points. Some of the important points are -
Manager’s tenure in the organization has a great impact on the overall profit of Store24. A very low p-value indicates that the two of them are highly related.
A crew member’s tenure will also have an impact on the profit of the store, but it’s p-value is not as high as compared to the manager’s tenure.
In terms of overall impact, a crew member’s tenure is supposed to have greater impact than a manager’s tenure.
Apart from the manager’s tenure; the total pedestrian footfall in the store area, the loaction of the store (residential or industrial) and the working hours of the store also have an impact on it’s profit.
The number of competitors in its Vicinity, has a great negative impact on the profit garnered from a store.
Visibility does not have any significant impact on the profit of the store.