Code to read the data into the data frame called Store.df:-
setwd("~/Desktop/5 SRM Kashish Mukheja/Downoad content")
Store.df <- read.csv("Store24.csv")
View(Store.df)
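The setwd() call above ties the script to one machine. A minimal sketch of an alternative is to read the file by an explicit path and check its structure straight away. Here a temporary file holding three rows copied from the top-10 table in this report stands in for Store24.csv; `csv.path` and `demo.df` are illustrative names, not part of the original analysis.

```r
# Stand-in for Store24.csv: three rows taken from the top-10 table in this report.
csv.path <- tempfile(fileext = ".csv")
writeLines(c("store,Sales,Profit",
             "74,1782957,518998",
             "7,1809256,476355",
             "9,2113089,474725"), csv.path)

# Read by explicit path instead of changing the working directory.
demo.df <- read.csv(csv.path)

# Quick structural checks before any analysis.
str(demo.df)
stopifnot(is.numeric(demo.df$Profit), nrow(demo.df) == 3)
```

With the real file, `csv.path` would simply point at the location of Store24.csv.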
Code to get the summary statistics of the data:-
library(psych)
describe(Store.df)
## vars n mean sd median trimmed mad
## store 1 75 38.00 21.79 38.00 38.00 28.17
## Sales 2 75 1205413.12 304531.31 1127332.00 1182031.25 288422.04
## Profit 3 75 276313.61 89404.08 265014.00 270260.34 90532.00
## MTenure 4 75 45.30 57.67 24.12 33.58 29.67
## CTenure 5 75 13.93 17.70 7.21 10.60 6.14
## Pop 6 75 9825.59 5911.67 8896.00 9366.07 7266.22
## Comp 7 75 3.79 1.31 3.63 3.66 0.82
## Visibility 8 75 3.08 0.75 3.00 3.07 0.00
## PedCount 9 75 2.96 0.99 3.00 2.97 1.48
## Res 10 75 0.96 0.20 1.00 1.00 0.00
## Hours24 11 75 0.84 0.37 1.00 0.92 0.00
## CrewSkill 12 75 3.46 0.41 3.50 3.47 0.34
## MgrSkill 13 75 3.64 0.41 3.59 3.62 0.45
## ServQual 14 75 87.15 12.61 89.47 88.62 15.61
## min max range skew kurtosis se
## store 1.00 75.00 74.00 0.00 -1.25 2.52
## Sales 699306.00 2113089.00 1413783.00 0.71 -0.09 35164.25
## Profit 122180.00 518998.00 396818.00 0.62 -0.21 10323.49
## MTenure 0.00 277.99 277.99 2.01 3.90 6.66
## CTenure 0.89 114.15 113.26 3.52 15.00 2.04
## Pop 1046.00 26519.00 25473.00 0.62 -0.23 682.62
## Comp 1.65 11.13 9.48 2.48 11.31 0.15
## Visibility 2.00 5.00 3.00 0.25 -0.38 0.09
## PedCount 1.00 5.00 4.00 0.00 -0.52 0.11
## Res 0.00 1.00 1.00 -4.60 19.43 0.02
## Hours24 0.00 1.00 1.00 -1.82 1.32 0.04
## CrewSkill 2.06 4.64 2.58 -0.43 1.64 0.05
## MgrSkill 2.96 4.62 1.67 0.27 -0.53 0.05
## ServQual 57.90 100.00 42.10 -0.66 -0.72 1.46
Code to measure the mean and standard deviation of Profit:-
mean(Store.df$Profit)
## [1] 276313.6
sd(Store.df$Profit)
## [1] 89404.08
Code to measure the mean and standard deviation of MTenure:-
mean(Store.df$MTenure)
## [1] 45.29644
sd(Store.df$MTenure)
## [1] 57.67155
Code to measure the mean and standard deviation of CTenure:-
mean(Store.df$CTenure)
## [1] 13.9315
sd(Store.df$CTenure)
## [1] 17.69752
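The three mean and standard deviation pairs above can also be computed in a single call. A minimal sketch, assuming a data frame with the same column names: `demo.df` below holds only five stores copied from this report's top-10 table, so its numbers differ from the full-data results above.

```r
# Five stores copied from the top-10 table in this report (not the full data set).
demo.df <- data.frame(
  Profit  = c(518998, 476355, 474725, 469050, 439781),
  MTenure = c(171.09720, 62.53080, 108.99350, 149.93590, 182.23640),
  CTenure = c(29.519510, 7.326488, 6.061602, 11.351130, 114.151900)
)

# One row per statistic, one column per variable.
stats <- sapply(demo.df[c("Profit", "MTenure", "CTenure")],
                function(x) c(mean = mean(x), sd = sd(x)))
round(stats, 2)
```

On the real data the same call would take `Store.df` in place of `demo.df`.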
Code to print the {StoreID, Sales, Profit, MTenure, CTenure} of the top 10 most profitable stores:-
TopProfit<- Store.df[order(-Store.df$Profit),]
head(TopProfit[c("store","Sales","Profit","MTenure","CTenure")],n=10)
## store Sales Profit MTenure CTenure
## 74 74 1782957 518998 171.09720 29.519510
## 7 7 1809256 476355 62.53080 7.326488
## 9 9 2113089 474725 108.99350 6.061602
## 6 6 1703140 469050 149.93590 11.351130
## 44 44 1807740 439781 182.23640 114.151900
## 2 2 1619874 424007 86.22219 6.636550
## 45 45 1602362 410149 47.64565 9.166325
## 18 18 1704826 394039 239.96980 33.774130
## 11 11 1583446 389886 44.81977 2.036961
## 47 47 1665657 387853 12.84790 6.636550
Code to print the {StoreID, Sales, Profit, MTenure, CTenure} of the bottom 10 least profitable stores:-
BotProfit<- Store.df[order(-Store.df$Profit),]
tail(BotProfit[c("store","Sales","Profit","MTenure","CTenure")],n=10)
## store Sales Profit MTenure CTenure
## 37 37 1202917 187765 23.1985000 1.347023
## 61 61 716589 177046 21.8184200 13.305950
## 52 52 1073008 169201 24.1185600 3.416838
## 54 54 811190 159792 6.6703910 3.876797
## 13 13 857843 152513 0.6571813 1.577002
## 32 32 828918 149033 36.0792600 6.636550
## 55 55 925744 147672 6.6703910 18.365500
## 41 41 744211 147327 14.9180200 11.926080
## 66 66 879581 146058 115.2039000 3.876797
## 57 57 699306 122180 24.3485700 2.956879
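BotProfit above reuses the descending sort and reads its tail, so the least profitable store appears last. A sketch of the alternative is to sort in ascending order so that the least profitable store comes first. `demo.df` holds five rows copied from the bottom-10 table above, and `bottom` is an illustrative name.

```r
# Five rows copied from the bottom-10 table above.
demo.df <- data.frame(
  store  = c(55, 41, 66, 57, 32),
  Sales  = c(925744, 744211, 879581, 699306, 828918),
  Profit = c(147672, 147327, 146058, 122180, 149033)
)

# Ascending order puts the lowest Profit first.
bottom <- demo.df[order(demo.df$Profit), ]
head(bottom[c("store", "Sales", "Profit")], n = 3)
```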
Code to draw a scatter plot of Profit vs. MTenure:-
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplot(Profit ~ MTenure , data=Store.df,
xlab="MTenure", ylab="Profit",
main="Scatterplot of Profit vs. MTenure",
labels=row.names(Store.df))
Code to draw a scatter plot of Profit vs. CTenure:-
scatterplot(Profit ~ CTenure , data=Store.df,
xlab="CTenure", ylab="Profit",
main="Scatterplot of Profit vs. CTenure",
labels=row.names(Store.df))
Code to construct a Correlation Matrix for all the variables in the dataset. (Display the numbers up to 2 Decimal places)
res<-cor(Store.df)
round(res,2)
## store Sales Profit MTenure CTenure Pop Comp Visibility
## store 1.00 -0.23 -0.20 -0.06 0.02 -0.29 0.03 -0.03
## Sales -0.23 1.00 0.92 0.45 0.25 0.40 -0.24 0.13
## Profit -0.20 0.92 1.00 0.44 0.26 0.43 -0.33 0.14
## MTenure -0.06 0.45 0.44 1.00 0.24 -0.06 0.18 0.16
## CTenure 0.02 0.25 0.26 0.24 1.00 0.00 -0.07 0.07
## Pop -0.29 0.40 0.43 -0.06 0.00 1.00 -0.27 -0.05
## Comp 0.03 -0.24 -0.33 0.18 -0.07 -0.27 1.00 0.03
## Visibility -0.03 0.13 0.14 0.16 0.07 -0.05 0.03 1.00
## PedCount -0.22 0.42 0.45 0.06 -0.08 0.61 -0.15 -0.14
## Res -0.03 -0.17 -0.16 -0.06 -0.34 -0.24 0.22 0.02
## Hours24 0.03 0.06 -0.03 -0.17 0.07 -0.22 0.13 0.05
## CrewSkill 0.05 0.16 0.16 0.10 0.26 0.28 -0.04 -0.20
## MgrSkill -0.07 0.31 0.32 0.23 0.12 0.08 0.22 0.07
## ServQual -0.32 0.39 0.36 0.18 0.08 0.12 0.02 0.21
## PedCount Res Hours24 CrewSkill MgrSkill ServQual
## store -0.22 -0.03 0.03 0.05 -0.07 -0.32
## Sales 0.42 -0.17 0.06 0.16 0.31 0.39
## Profit 0.45 -0.16 -0.03 0.16 0.32 0.36
## MTenure 0.06 -0.06 -0.17 0.10 0.23 0.18
## CTenure -0.08 -0.34 0.07 0.26 0.12 0.08
## Pop 0.61 -0.24 -0.22 0.28 0.08 0.12
## Comp -0.15 0.22 0.13 -0.04 0.22 0.02
## Visibility -0.14 0.02 0.05 -0.20 0.07 0.21
## PedCount 1.00 -0.28 -0.28 0.21 0.09 -0.01
## Res -0.28 1.00 -0.09 -0.15 -0.03 0.09
## Hours24 -0.28 -0.09 1.00 0.11 -0.04 0.06
## CrewSkill 0.21 -0.15 0.11 1.00 -0.02 -0.03
## MgrSkill 0.09 -0.03 -0.04 -0.02 1.00 0.36
## ServQual -0.01 0.09 0.06 -0.03 0.36 1.00
Code to measure the correlation between Profit and MTenure. (Display the numbers up to 2 Decimal places)
round(cor(Store.df$Profit,Store.df$MTenure),2)
## [1] 0.44
Code to measure the correlation between Profit and CTenure. (Display the numbers up to 2 Decimal places)
round(cor(Store.df$Profit,Store.df$CTenure),2)
## [1] 0.26
Code to construct the following Corrgram based on all variables in the dataset:-
library(corrgram)
library(ellipse)
##
## Attaching package: 'ellipse'
## The following object is masked from 'package:car':
##
## ellipse
## The following object is masked from 'package:graphics':
##
## pairs
corrgram(res, order = FALSE, lower.panel = panel.shade, upper.panel = panel.pie, text.panel = panel.txt,main = "Corrgram of store variables")
Run a Pearson’s Correlation test on the correlation between Profit and MTenure. What is the p-value?
corpm<-cor.test(Store.df$Profit , Store.df$MTenure, method = "pearson")
corpm$p.value
## [1] 8.193133e-05
p-value = 8.193e-05
Run a Pearson’s Correlation test on the correlation between Profit and CTenure. What is the p-value?
corpc<-cor.test(Store.df$Profit , Store.df$CTenure, method = "pearson")
corpc$p.value
## [1] 0.0256203
p-value = 0.0256203
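The p-values above follow from the correlation coefficient and the sample size alone. A sketch of that calculation, using t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom: it takes the rounded correlations (0.44 and 0.26) and n = 75 from this report, so the results only approximate the cor.test() output. `cor.pvalue` is an illustrative helper name.

```r
# Two-sided p-value for a Pearson correlation r observed on n pairs.
cor.pvalue <- function(r, n) {
  t.stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t.stat), df = n - 2)
}

p.mtenure <- cor.pvalue(0.44, 75)  # Profit vs. MTenure, rounded r
p.ctenure <- cor.pvalue(0.26, 75)  # Profit vs. CTenure, rounded r
```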
Run a regression of Profit on {MTenure, CTenure, Comp, Pop, PedCount, Res, Hours24, Visibility}.
fit<-lm(Profit~ MTenure + CTenure + Comp + Pop + PedCount + Res + Hours24 , data = Store.df)
summary(fit)
##
## Call:
## lm(formula = Profit ~ MTenure + CTenure + Comp + Pop + PedCount +
## Res + Hours24, data = Store.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -107360 -34389 -6529 35039 122875
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48251.178 60494.260 0.798 0.427911
## MTenure 791.141 126.085 6.275 2.94e-08 ***
## CTenure 947.183 424.601 2.231 0.029050 *
## Comp -25413.478 5529.167 -4.596 1.96e-05 ***
## Pop 3.802 1.473 2.581 0.012043 *
## PedCount 32266.120 9040.103 3.569 0.000668 ***
## Res 91993.749 39501.555 2.329 0.022891 *
## Hours24 64414.635 19758.441 3.260 0.001752 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 57360 on 67 degrees of freedom
## Multiple R-squared: 0.6273, Adjusted R-squared: 0.5884
## F-statistic: 16.11 on 7 and 67 DF, p-value: 3.151e-12
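The overall F-test (p-value 3.151e-12) rejects the null hypothesis that all slope coefficients are zero. Each predictor in this model also has an individual p-value below 0.05, so each is significantly associated with Profit after controlling for the others.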
Based on TASK 3m, answer the following questions:
15. List the explanatory variable(s) whose beta-coefficients are statistically significant (p < 0.05).
Ans.] MTenure, CTenure, Comp, Pop, PedCount, Res and Hours24.
16. List the explanatory variable(s) whose beta-coefficients are not statistically significant (p > 0.05).
Ans.] Visibility is the only explanatory variable whose beta-coefficient is not statistically significant (p > 0.05). The model printed above omits Visibility, so its p-value does not appear in that output.
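The formula fitted above leaves out Visibility even though the task lists it. A minimal sketch of fitting the full specification and splitting the predictors by p-value follows. The data frame is synthetic placeholder data (random numbers, not Store24 values), so its coefficients carry no meaning; `demo.df`, `fit.full`, `pvals`, `significant` and `not.significant` are illustrative names.

```r
set.seed(24)
n <- 75

# Synthetic placeholder data with the same column names as the report.
demo.df <- data.frame(
  MTenure    = rexp(n, rate = 1 / 45),
  CTenure    = rexp(n, rate = 1 / 14),
  Comp       = runif(n, 1.5, 6),
  Pop        = runif(n, 1000, 26000),
  PedCount   = sample(1:5, n, replace = TRUE),
  Res        = rep(c(1, 1, 1, 0), length.out = n),
  Hours24    = rep(c(1, 0, 1, 1, 1), length.out = n),
  Visibility = sample(2:5, n, replace = TRUE)
)
demo.df$Profit <- 200000 + 800 * demo.df$MTenure + rnorm(n, sd = 50000)

# Full specification, including Visibility.
fit.full <- lm(Profit ~ MTenure + CTenure + Comp + Pop + PedCount +
                 Res + Hours24 + Visibility, data = demo.df)

# p-values of the slope coefficients (intercept dropped).
pvals <- summary(fit.full)$coefficients[-1, "Pr(>|t|)"]
significant     <- names(pvals)[pvals < 0.05]
not.significant <- names(pvals)[pvals >= 0.05]
```

On the real data the same formula would be fitted with `data = Store.df`.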
Based on TASK 4m, answer the following questions:
17. What is the expected change in the Profit at a store if the Manager’s tenure, i.e. number of months of experience with Store24, increases by one month?
Ans.] Profit is expected to increase by 760.993 units for each additional month of Manager’s tenure, holding the other predictors constant. The model printed above, which omits Visibility, gives 791.141.
18. What is the expected change in the Profit at a store if the Crew’s tenure, i.e. number of months of experience with Store24, increases by one month?
Ans.] Profit is expected to increase by 944.978 units for each additional month of Crew’s tenure, holding the other predictors constant. The model printed above, which omits Visibility, gives 947.183.
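A coefficient in a linear model scales linearly with the change in its predictor, holding the other predictors fixed. A short worked sketch using the MTenure and CTenure estimates printed in the summary(fit) output above (791.141 and 947.183); the 12-month horizon is an illustrative choice.

```r
# Estimates copied from the summary(fit) output above.
beta <- c(MTenure = 791.141, CTenure = 947.183)

# Expected change in Profit for a 12-month increase in each tenure variable.
months <- 12
expected.change <- beta * months
expected.change
```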
1. T-test to check the hypothesis “Stores in residential areas have more profit than stores in industrial areas”
t.test(Store.df$Profit[Store.df$Res==1],Store.df$Profit[Store.df$Res==0],alternative = "less")
##
## Welch Two Sample t-test
##
## data: Store.df$Profit[Store.df$Res == 1] and Store.df$Profit[Store.df$Res == 0]
## t = -1.4991, df = 2.2038, p-value = 0.1307
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf 60012.99
## sample estimates:
## mean of x mean of y
## 273422.7 345695.7
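The call above uses alternative = "less", which tests whether residential stores earn less than the others. For the stated hypothesis (residential stores earn more), the direction would be "greater" with the same argument order. The two one-sided p-values are complementary, so for the output above the "greater" p-value would be 1 - 0.1307 = 0.8693. A sketch on synthetic placeholder data (random numbers, not Store24 values); the group sizes 72 and 3 are an assumption inferred from the Res mean of 0.96 over 75 stores.

```r
set.seed(1)

# Synthetic placeholder profits for the two groups.
profit.res    <- rnorm(72, mean = 273000, sd = 90000)
profit.nonres <- rnorm(3,  mean = 345000, sd = 80000)

# Same data and argument order, opposite one-sided alternatives.
p.less    <- t.test(profit.res, profit.nonres, alternative = "less")$p.value
p.greater <- t.test(profit.res, profit.nonres, alternative = "greater")$p.value

# The two one-sided p-values always sum to 1.
stopifnot(isTRUE(all.equal(p.less + p.greater, 1)))
```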
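Conclusion:- Res is a categorical variable. Since the p-value (0.1307) is greater than 0.05, we fail to reject the null hypothesis. The test as run (alternative = "less") gives no significant evidence that mean profit differs between residential and industrial stores; this is not proof that the two are equal.
2. Regression of Profit on MTenure, CTenure and Res:-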
fit1<-lm(Profit~ MTenure + CTenure + Res , data = Store.df)
summary(fit1)
##
## Call:
## lm(formula = Profit ~ MTenure + CTenure + Res, data = Store.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -165603 -48276 -7952 36649 195254
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 278383.0 52589.0 5.294 1.28e-06 ***
## MTenure 622.9 167.1 3.727 0.000386 ***
## CTenure 652.1 578.1 1.128 0.263089
## Res -41009.8 50397.7 -0.814 0.418524
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 80400 on 71 degrees of freedom
## Multiple R-squared: 0.224, Adjusted R-squared: 0.1912
## F-statistic: 6.833 on 3 and 71 DF, p-value: 0.000413
3. T-test to check the hypothesis “Stores open for 24 hours have more profit than stores that are not”
t.test(Store.df$Profit[Store.df$Hours24==1],Store.df$Profit[Store.df$Hours24==0],alternative = "greater")
##
## Welch Two Sample t-test
##
## data: Store.df$Profit[Store.df$Hours24 == 1] and Store.df$Profit[Store.df$Hours24 == 0]
## t = -0.18413, df = 13.615, p-value = 0.5717
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -65861.03 Inf
## sample estimates:
## mean of x mean of y
## 275318.0 281540.4
Conclusion:- Hours24 is a categorical variable. Since the p-value (0.5717) is greater than 0.05, we fail to reject the null hypothesis: there is no significant evidence that stores open for 24 hours earn more profit.
Task 4p:- Prepare an “Executive Summary”. Please add this to the end of your Rmd file. Specifically, please create a qualitative summary of Managerial Insights, based on your data analysis, especially your Regression Analysis. You may write this in paragraph form or in point form.
Ans.] From the above regression analyses we can infer the following:-
1. There is a statistically significant association (p < 0.05) between Profit and each of MTenure, CTenure, Comp, Pop, PedCount, Res and Hours24.
2. Profit has a moderate positive correlation with MTenure (0.44) and a weaker positive correlation with CTenure (0.26).
3. There is no statistically significant association between Profit and Visibility.
4. Profit is negatively correlated with Comp (-0.33), and Comp has a significant negative coefficient in the regression.
5. The regression explains about 63% of the variance in Profit (R-squared = 0.6273), so additional predictors may be needed.
6. MTenure and CTenure both have statistically significant positive coefficients in the regression.
7. There is a weak negative correlation between MTenure and Hours24 (-0.17).
8. Service quality is positively correlated with Profit (0.36), MTenure (0.18), CTenure (0.08) and Manager’s skill (0.36). Hence, increasing the service quality through the manager’s and crew’s skill might lead to a meaningful increase in profit and in staff retention.