Read and View the data using read.csv
store.df <- read.csv("Store24.csv")
View(store.df)
Summary statistics
summary(store.df)
## store Sales Profit MTenure
## Min. : 1.0 Min. : 699306 Min. :122180 Min. : 0.00
## 1st Qu.:19.5 1st Qu.: 984579 1st Qu.:211004 1st Qu.: 6.67
## Median :38.0 Median :1127332 Median :265014 Median : 24.12
## Mean :38.0 Mean :1205413 Mean :276314 Mean : 45.30
## 3rd Qu.:56.5 3rd Qu.:1362388 3rd Qu.:331314 3rd Qu.: 50.92
## Max. :75.0 Max. :2113089 Max. :518998 Max. :277.99
## CTenure Pop Comp Visibility
## Min. : 0.8871 Min. : 1046 Min. : 1.651 Min. :2.00
## 1st Qu.: 4.3943 1st Qu.: 5616 1st Qu.: 3.151 1st Qu.:3.00
## Median : 7.2115 Median : 8896 Median : 3.629 Median :3.00
## Mean : 13.9315 Mean : 9826 Mean : 3.788 Mean :3.08
## 3rd Qu.: 17.2156 3rd Qu.:14104 3rd Qu.: 4.230 3rd Qu.:4.00
## Max. :114.1519 Max. :26519 Max. :11.128 Max. :5.00
## PedCount Res Hours24 CrewSkill
## Min. :1.00 Min. :0.00 Min. :0.00 Min. :2.060
## 1st Qu.:2.00 1st Qu.:1.00 1st Qu.:1.00 1st Qu.:3.225
## Median :3.00 Median :1.00 Median :1.00 Median :3.500
## Mean :2.96 Mean :0.96 Mean :0.84 Mean :3.457
## 3rd Qu.:4.00 3rd Qu.:1.00 3rd Qu.:1.00 3rd Qu.:3.655
## Max. :5.00 Max. :1.00 Max. :1.00 Max. :4.640
## MgrSkill ServQual
## Min. :2.957 Min. : 57.90
## 1st Qu.:3.344 1st Qu.: 78.95
## Median :3.589 Median : 89.47
## Mean :3.638 Mean : 87.15
## 3rd Qu.:3.925 3rd Qu.: 99.90
## Max. :4.622 Max. :100.00
Use R to measure the mean and standard deviation of Profit.
mean(store.df$Profit, na.rm = FALSE)
## [1] 276313.6
sd(store.df$Profit, na.rm = FALSE)
## [1] 89404.08
Use R to measure the mean and standard deviation of MTenure.
mean(store.df$MTenure, na.rm = FALSE)
## [1] 45.29644
sd(store.df$MTenure, na.rm = FALSE)
## [1] 57.67155
Use R to measure the mean and standard deviation of CTenure.
mean(store.df$CTenure, na.rm = FALSE)
## [1] 13.9315
sd(store.df$CTenure, na.rm = FALSE)
## [1] 17.69752
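The three mean/SD pairs above can also be computed in a single call. The sketch below is only an illustration: `sample.df` is a hypothetical name, and its four rows are copied from the top-10 table printed further down in this document, so its results differ from the full-data values above.

```r
# Four rows copied from the top-10 table in this document (NOT the full data set)
sample.df <- data.frame(
  Profit  = c(518998, 476355, 474725, 469050),
  MTenure = c(171.09720, 62.53080, 108.99350, 149.93590),
  CTenure = c(29.519510, 7.326488, 6.061602, 11.351130)
)

# Apply mean() and sd() to every column; the result is a 2 x 3 matrix
# with rows "mean" and "sd" and one column per variable
stats <- sapply(sample.df, function(x) c(mean = mean(x), sd = sd(x)))
print(round(stats, 2))
```

On the full data the same call would presumably be `sapply(store.df[c("Profit", "MTenure", "CTenure")], ...)`.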
Use R to print the {StoreID, Sales, Profit, MTenure, CTenure} of the top 10 most profitable stores.
attach(store.df)
most_profitable_stores <- store.df[order(-Profit), ] # sort by Profit (descending)
most_profitable_stores[1:10, c(1, 3:5)] # first 10 rows; columns store, Profit, MTenure, CTenure (Sales, column 2, is not selected)
## store Profit MTenure CTenure
## 74 74 518998 171.09720 29.519510
## 7 7 476355 62.53080 7.326488
## 9 9 474725 108.99350 6.061602
## 6 6 469050 149.93590 11.351130
## 44 44 439781 182.23640 114.151900
## 2 2 424007 86.22219 6.636550
## 45 45 410149 47.64565 9.166325
## 18 18 394039 239.96980 33.774130
## 11 11 389886 44.81977 2.036961
## 47 47 387853 12.84790 6.636550
Use R to print the {StoreID, Sales, Profit, MTenure, CTenure} of the 10 least profitable stores.
least_profitable_stores <- store.df[order(Profit), ] # sort by Profit (ascending)
least_profitable_stores[10:1, c(1, 3:5)] # 10 lowest-profit rows, printed in reverse so the least profitable store comes last
## store Profit MTenure CTenure
## 37 37 187765 23.1985000 1.347023
## 61 61 177046 21.8184200 13.305950
## 52 52 169201 24.1185600 3.416838
## 54 54 159792 6.6703910 3.876797
## 13 13 152513 0.6571813 1.577002
## 32 32 149033 36.0792600 6.636550
## 55 55 147672 6.6703910 18.365500
## 41 41 147327 14.9180200 11.926080
## 66 66 146058 115.2039000 3.876797
## 57 57 122180 24.3485700 2.956879
detach(store.df)
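A variant of the two rankings above that avoids `attach()` and selects columns by name, so that Sales is included as the question asks. This is a minimal sketch: `toy.df` and all of its values are made up for illustration and are not taken from Store24.csv.

```r
# Hypothetical toy data (made-up values, NOT from Store24.csv)
toy.df <- data.frame(
  store   = 1:6,
  Sales   = c(900, 1500, 1100, 2000, 800, 1300),
  Profit  = c(200, 450, 260, 510, 120, 330),
  MTenure = c(10, 60, 25, 170, 5, 40),
  CTenure = c(2, 7, 4, 30, 1, 9)
)

# Selecting by name does not depend on the column order in the file
cols <- c("store", "Sales", "Profit", "MTenure", "CTenure")

# Highest Profit first, then keep the first 3 rows
top3 <- head(toy.df[order(-toy.df$Profit), cols], 3)

# Lowest Profit first, then keep the first 3 rows
bottom3 <- head(toy.df[order(toy.df$Profit), cols], 3)

print(top3)
print(bottom3)
```

With the real data, replacing `toy.df` by `store.df` and `3` by `10` should give the two tables asked for.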
Use R to draw a scatter plot of Profit vs. MTenure.
plot(store.df$MTenure, store.df$Profit,
col="black",
main="Scatterplot of Profit vs. MTenure",
xlab="MTenure", ylab="Profit")
abline(v=mean(store.df$MTenure), col="green", lty="dotted")
abline(h=mean(store.df$Profit), col="green", lty="dotted")
abline(lm(Profit ~ MTenure, data = store.df), col="red", lty="dotted")
Use R to draw a scatter plot of Profit vs. CTenure.
plot(store.df$CTenure, store.df$Profit,
col="black",
main="Scatterplot of Profit vs. CTenure",
xlab="CTenure", ylab="Profit")
abline(v=mean(store.df$CTenure), col="green", lty="dotted")
abline(h=mean(store.df$Profit), col="green", lty="dotted")
abline(lm(Profit ~ CTenure, data = store.df), col="red", lty="dotted")
Use R to construct a Correlation Matrix for all the variables in the dataset. (Display the numbers up to 2 Decimal places)
dim(store.df)
## [1] 75 14
round(cor(store.df), digits = 2)
## store Sales Profit MTenure CTenure Pop Comp Visibility
## store 1.00 -0.23 -0.20 -0.06 0.02 -0.29 0.03 -0.03
## Sales -0.23 1.00 0.92 0.45 0.25 0.40 -0.24 0.13
## Profit -0.20 0.92 1.00 0.44 0.26 0.43 -0.33 0.14
## MTenure -0.06 0.45 0.44 1.00 0.24 -0.06 0.18 0.16
## CTenure 0.02 0.25 0.26 0.24 1.00 0.00 -0.07 0.07
## Pop -0.29 0.40 0.43 -0.06 0.00 1.00 -0.27 -0.05
## Comp 0.03 -0.24 -0.33 0.18 -0.07 -0.27 1.00 0.03
## Visibility -0.03 0.13 0.14 0.16 0.07 -0.05 0.03 1.00
## PedCount -0.22 0.42 0.45 0.06 -0.08 0.61 -0.15 -0.14
## Res -0.03 -0.17 -0.16 -0.06 -0.34 -0.24 0.22 0.02
## Hours24 0.03 0.06 -0.03 -0.17 0.07 -0.22 0.13 0.05
## CrewSkill 0.05 0.16 0.16 0.10 0.26 0.28 -0.04 -0.20
## MgrSkill -0.07 0.31 0.32 0.23 0.12 0.08 0.22 0.07
## ServQual -0.32 0.39 0.36 0.18 0.08 0.12 0.02 0.21
## PedCount Res Hours24 CrewSkill MgrSkill ServQual
## store -0.22 -0.03 0.03 0.05 -0.07 -0.32
## Sales 0.42 -0.17 0.06 0.16 0.31 0.39
## Profit 0.45 -0.16 -0.03 0.16 0.32 0.36
## MTenure 0.06 -0.06 -0.17 0.10 0.23 0.18
## CTenure -0.08 -0.34 0.07 0.26 0.12 0.08
## Pop 0.61 -0.24 -0.22 0.28 0.08 0.12
## Comp -0.15 0.22 0.13 -0.04 0.22 0.02
## Visibility -0.14 0.02 0.05 -0.20 0.07 0.21
## PedCount 1.00 -0.28 -0.28 0.21 0.09 -0.01
## Res -0.28 1.00 -0.09 -0.15 -0.03 0.09
## Hours24 -0.28 -0.09 1.00 0.11 -0.04 0.06
## CrewSkill 0.21 -0.15 0.11 1.00 -0.02 -0.03
## MgrSkill 0.09 -0.03 -0.04 -0.02 1.00 0.36
## ServQual -0.01 0.09 0.06 -0.03 0.36 1.00
Use R to measure the correlation between Profit and MTenure. (Display the numbers up to 2 Decimal places)
round(cor(store.df$Profit, store.df$MTenure), digits = 2)
## [1] 0.44
Use R to measure the correlation between Profit and CTenure. (Display the numbers up to 2 Decimal places)
round(cor(store.df$Profit, store.df$CTenure), digits = 2)
## [1] 0.26
Use R to construct the following Corrgram based on all variables in the dataset.
library(corrgram)
corrgram(store.df, order=FALSE,
main="Corrgram of store variables",
lower.panel=panel.shade, upper.panel=panel.pie,
text.panel=panel.txt)
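Qualitatively discuss the managerially relevant correlations - The managerially relevant correlations are those with Profit. From the correlation matrix above, Profit is positively correlated with Sales (0.92), PedCount (0.45), MTenure (0.44), Pop (0.43), ServQual (0.36), MgrSkill (0.32) and CTenure (0.26), and negatively correlated with Comp (-0.33). Manager tenure is more strongly correlated with Profit than crew tenure.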
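To see which variables move most closely with Profit, the Profit row of the correlation matrix above can be ranked by absolute value. The sketch below retypes that row by hand (the `store` ID column is left out, and `profit_cor` is a hypothetical name).

```r
# Profit row of the correlation matrix printed above (store ID omitted)
profit_cor <- c(
  Sales = 0.92, MTenure = 0.44, CTenure = 0.26, Pop = 0.43,
  Comp = -0.33, Visibility = 0.14, PedCount = 0.45, Res = -0.16,
  Hours24 = -0.03, CrewSkill = 0.16, MgrSkill = 0.32, ServQual = 0.36
)

# Rank by strength of association, ignoring the sign
ranked <- profit_cor[order(-abs(profit_cor))]
print(ranked)
```

With the data loaded, the same vector could be taken directly from `cor(store.df)["Profit", ]` rather than typed in.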
Run a Pearson’s Correlation test on the correlation between Profit and MTenure. What is the p-value? p-value = 8.193e-05
cor.test(store.df$Profit, store.df$MTenure)
##
## Pearson's product-moment correlation
##
## data: store.df$Profit and store.df$MTenure
## t = 4.1731, df = 73, p-value = 8.193e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2353497 0.6055175
## sample estimates:
## cor
## 0.4388692
Run a Pearson’s Correlation test on the correlation between Profit and CTenure. What is the p-value? p-value = 0.02562
cor.test(store.df$Profit, store.df$CTenure)
##
## Pearson's product-moment correlation
##
## data: store.df$Profit and store.df$CTenure
## t = 2.2786, df = 73, p-value = 0.02562
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.03262507 0.45786339
## sample estimates:
## cor
## 0.2576789
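The t statistics and p-values reported by the two tests above follow from the sample correlation r and the sample size n through t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The sketch below recomputes them from the printed correlations as a check; the helper names `t_from_r` and `p_from_t` are hypothetical.

```r
n   <- 75          # number of stores, from dim(store.df)
r_m <- 0.4388692   # cor(Profit, MTenure), from the output above
r_c <- 0.2576789   # cor(Profit, CTenure), from the output above

# t statistic for H0: true correlation is 0
t_from_r <- function(r, n) r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from a t distribution with df degrees of freedom
p_from_t <- function(t, df) 2 * pt(-abs(t), df)

t_m <- t_from_r(r_m, n)
t_c <- t_from_r(r_c, n)
p_m <- p_from_t(t_m, n - 2)
p_c <- p_from_t(t_c, n - 2)

print(c(t_MTenure = t_m, p_MTenure = p_m))
print(c(t_CTenure = t_c, p_CTenure = p_c))
```

With a test object in hand, the p-value can also be read off directly, for example `cor.test(store.df$Profit, store.df$MTenure)$p.value`.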
Run a regression of Profit on {MTenure, CTenure, Comp, Pop, PedCount, Res, Hours24, Visibility}.
fit <- lm(Profit ~ MTenure+CTenure+Comp+Pop+PedCount+Res+Hours24+Visibility, data = store.df)
summary(fit)
##
## Call:
## lm(formula = Profit ~ MTenure + CTenure + Comp + Pop + PedCount +
## Res + Hours24 + Visibility, data = store.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -105789 -35946 -7069 33780 112390
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7610.041 66821.994 0.114 0.909674
## MTenure 760.993 127.086 5.988 9.72e-08 ***
## CTenure 944.978 421.687 2.241 0.028400 *
## Comp -25286.887 5491.937 -4.604 1.94e-05 ***
## Pop 3.667 1.466 2.501 0.014890 *
## PedCount 34087.359 9073.196 3.757 0.000366 ***
## Res 91584.675 39231.283 2.334 0.022623 *
## Hours24 63233.307 19641.114 3.219 0.001994 **
## Visibility 12625.447 9087.620 1.389 0.169411
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 56970 on 66 degrees of freedom
## Multiple R-squared: 0.6379, Adjusted R-squared: 0.594
## F-statistic: 14.53 on 8 and 66 DF, p-value: 5.382e-12
List the explanatory variable(s) whose beta-coefficients are statistically significant (p < 0.05) - MTenure, CTenure, Comp, Pop, PedCount, Res, Hours24
List the explanatory variable(s) whose beta-coefficients are not statistically significant (p > 0.05) - Visibility (p = 0.169)
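The two lists can be read off the `Pr(>|t|)` column of the regression summary. The sketch below retypes that column from the output above (intercept left out) and splits it at 0.05; `p_values` is a hypothetical name.

```r
# Pr(>|t|) column of summary(fit), copied from the output above
p_values <- c(
  MTenure = 9.72e-08, CTenure = 0.028400, Comp = 1.94e-05,
  Pop = 0.014890, PedCount = 0.000366, Res = 0.022623,
  Hours24 = 0.001994, Visibility = 0.169411
)

alpha <- 0.05
significant     <- names(p_values)[p_values < alpha]
not_significant <- names(p_values)[p_values >= alpha]

print(significant)
print(not_significant)

# With the fitted model available, the same column can be extracted with
#   coef(summary(fit))[, "Pr(>|t|)"]
```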
Ans) If the manager's tenure increases by one month, expected Profit increases by about Rs. 760.99, holding the other predictors constant.
Ans) If the crew's tenure increases by one month, expected Profit increases by about Rs. 944.98, holding the other predictors constant.
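The p-value of the overall F-test is 5.382e-12, far below 0.05, so we reject the null hypothesis that all slope coefficients are zero: taken together, the predictors are significant in explaining Profit. This shows joint significance only; how well the model fits is a separate question, addressed by the R-squared values.
In other words, the model passes the overall F-test (F = 14.53 on 8 and 66 degrees of freedom).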
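As a check on the F-test, the statistic can be rebuilt from the multiple R-squared, the number of stores and the number of predictors, using F = (R^2 / k) / ((1 - R^2) / (n - k - 1)). The sketch below uses the rounded figures printed in the summary, so the results agree with the output only up to rounding.

```r
r2 <- 0.6379   # Multiple R-squared, from summary(fit)
n  <- 75       # number of stores
k  <- 8        # number of predictors in the model

# Overall F statistic and its degrees of freedom
df1 <- k
df2 <- n - k - 1
f_stat <- (r2 / df1) / ((1 - r2) / df2)

# Upper-tail probability of the F distribution gives the model p-value
p_val <- pf(f_stat, df1, df2, lower.tail = FALSE)

print(c(F = f_stat, df1 = df1, df2 = df2))
print(p_val)
```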
According to the adjusted R-squared, the predictors explain approximately 59.4% of the variance in Profit after adjusting for the number of predictors (the unadjusted multiple R-squared is 63.8%). This is a reasonable fit for eight predictors, but about 40% of the variance remains unexplained, so further factors could be brought into the model to see their effect on Profit.
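The adjusted R-squared penalises the multiple R-squared for the number of predictors: adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1). A short check with the figures from the summary output:

```r
r2 <- 0.6379   # Multiple R-squared, from summary(fit)
n  <- 75       # number of stores
k  <- 8        # number of predictors

# Adjusted R-squared: shrinks R-squared according to the degrees of freedom used
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 3))
```

This reproduces the 0.594 shown in the regression output, up to rounding of the inputs.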
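For the Visibility variable, the p-value (0.169) is above 0.05, so we fail to reject the null hypothesis that its coefficient is zero. In other words, this model gives no statistically significant evidence that Visibility (the store-front rating) affects Profit.
MTenure and PedCount have strongly significant positive relationships with Profit (p < 0.001), while Comp has a strongly significant negative relationship: more competition goes with lower Profit.
The relationship between CTenure and Profit is significant at the 5% level (p = 0.028) but not at the 1% level, so it is borderline.
Per additional month, the estimated effect of CTenure (944.98) is larger than that of MTenure (760.99), but the MTenure estimate is far more precise (t = 5.99 versus 2.24).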
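Per-month coefficients are not directly comparable when the two tenure variables vary by very different amounts across stores. One rough way to put them on a common scale is to multiply each coefficient by the standard deviation of its predictor, which gives the change in Profit associated with a one-SD change. This is an illustrative sketch using figures printed earlier in this document, not a full standardised regression.

```r
# Regression coefficients (Profit per month of tenure), from summary(fit)
coefs <- c(MTenure = 760.993, CTenure = 944.978)

# Standard deviations of the predictors, from the sd() calls above
sds <- c(MTenure = 57.67155, CTenure = 17.69752)

# Change in Profit associated with a one-SD increase in each predictor
effect_per_sd <- coefs * sds
print(round(effect_per_sd))
```

On this scale a typical difference in manager tenure corresponds to a larger difference in Profit than a typical difference in crew tenure, even though the per-month coefficient for crew tenure is larger.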
Stores in residential areas (Res = 1) have higher expected Profit than stores in industrial areas, by about 91,585 with the other predictors held constant.
Stores that are open 24 hours (Hours24 = 1) have higher expected Profit than other stores, by about 63,233 with the other predictors held constant.