This is an R Markdown document. It shows the tabulations and graphs generated from the Store24 dataset.
#TASK 4-c: (To read the dataset and assess summary statistics)
setwd("F:/R-Internship/Course related files")
store24.df<-read.csv("Store24.csv") # read the dataset from the working directory
View(store24.df)
summary(store24.df)
##     store            Sales            Profit          MTenure
##  Min.   : 1.0   Min.   : 699306   Min.   :122180   Min.   :  0.00
##  1st Qu.:19.5   1st Qu.: 984579   1st Qu.:211004   1st Qu.:  6.67
##  Median :38.0   Median :1127332   Median :265014   Median : 24.12
##  Mean   :38.0   Mean   :1205413   Mean   :276314   Mean   : 45.30
##  3rd Qu.:56.5   3rd Qu.:1362388   3rd Qu.:331314   3rd Qu.: 50.92
##  Max.   :75.0   Max.   :2113089   Max.   :518998   Max.   :277.99
##       CTenure             Pop              Comp          Visibility
##  Min.   :  0.8871   Min.   : 1046   Min.   : 1.651   Min.   :2.00
##  1st Qu.:  4.3943   1st Qu.: 5616   1st Qu.: 3.151   1st Qu.:3.00
##  Median :  7.2115   Median : 8896   Median : 3.629   Median :3.00
##  Mean   : 13.9315   Mean   : 9826   Mean   : 3.788   Mean   :3.08
##  3rd Qu.: 17.2156   3rd Qu.:14104   3rd Qu.: 4.230   3rd Qu.:4.00
##  Max.   :114.1519   Max.   :26519   Max.   :11.128   Max.   :5.00
##    PedCount         Res          Hours24        CrewSkill
##  Min.   :1.00   Min.   :0.00   Min.   :0.00   Min.   :2.060
##  1st Qu.:2.00   1st Qu.:1.00   1st Qu.:1.00   1st Qu.:3.225
##  Median :3.00   Median :1.00   Median :1.00   Median :3.500
##  Mean   :2.96   Mean   :0.96   Mean   :0.84   Mean   :3.457
##  3rd Qu.:4.00   3rd Qu.:1.00   3rd Qu.:1.00   3rd Qu.:3.655
##  Max.   :5.00   Max.   :1.00   Max.   :1.00   Max.   :4.640
##    MgrSkill         ServQual
##  Min.   :2.957   Min.   : 57.90
##  1st Qu.:3.344   1st Qu.: 78.95
##  Median :3.589   Median : 89.47
##  Mean   :3.638   Mean   : 87.15
##  3rd Qu.:3.925   3rd Qu.: 99.90
##  Max.   :4.622   Max.   :100.00
#TASK 4-d: (To compute mean and standard deviation of various columns of the dataset)
mean(store24.df$Profit)
## [1] 276313.6
sd(store24.df$Profit)
## [1] 89404.08
mean(store24.df$MTenure)
## [1] 45.29644
sd(store24.df$MTenure)
## [1] 57.67155
mean(store24.df$CTenure)
## [1] 13.9315
sd(store24.df$CTenure)
## [1] 17.69752
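The three mean and standard deviation pairs above could also be computed in one call. The following is a minimal sketch rather than part of the original tasks: `mean_sd` is a hypothetical helper name, and `demo.df` is a small made-up data frame used only so the snippet runs without Store24.csv.

```r
# Hypothetical helper: mean and standard deviation of the chosen numeric columns,
# returned as a matrix with one row per column.
mean_sd <- function(df, cols) {
  t(sapply(df[, cols, drop = FALSE],
           function(x) c(mean = mean(x), sd = sd(x))))
}

# Made-up stand-in data (NOT the Store24 values).
demo.df <- data.frame(Profit  = c(100, 200, 300),
                      MTenure = c(1, 2, 6),
                      CTenure = c(4, 4, 4))

demo_stats <- mean_sd(demo.df, c("Profit", "MTenure", "CTenure"))
print(demo_stats)
```

With the real data loaded, the equivalent call would be `mean_sd(store24.df, c("Profit", "MTenure", "CTenure"))`.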
#TASK 4-e: (Sorting and Subsetting data)
attach(mtcars)
View(mtcars)
newdata <- mtcars[order(mpg),] # sort by mpg (ascending)
View(newdata)
newdata[1:5,] # see the first 5 rows
##                      mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
## Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
## Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
## Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
newdata <- mtcars[order(-mpg),] # sort by mpg (descending)
View(newdata)
detach(mtcars)
#TASK 4-f: (To display the first 5 columns of the 10 most profitable and 10 least profitable stores)
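View(store24.df)
store1<-store24.df[order(-store24.df$Profit),] # sort by Profit (descending); no attach() needed
store2<-head(store1,10)[,1:5] # first 5 columns of the 10 most profitable stores
View(store2)
store3<-tail(store1,10)[,1:5] # first 5 columns of the 10 least profitable stores
View(store3)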
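The sort-then-slice pattern above can be wrapped in a small function so the row count is not hard-coded. This is a sketch under assumptions: `top_n_by` is a hypothetical helper name, and `demo.df` is made-up data standing in for Store24.csv.

```r
# Hypothetical helper: the first `ncols` columns of the `n` rows with the
# largest (decreasing = TRUE) or smallest (decreasing = FALSE) values of `by`.
top_n_by <- function(df, by, n = 10, ncols = 5, decreasing = TRUE) {
  ord <- order(df[[by]], decreasing = decreasing)
  df[head(ord, n), seq_len(min(ncols, ncol(df))), drop = FALSE]
}

# Made-up stand-in data (NOT the Store24 values).
demo.df <- data.frame(store  = 1:6,
                      Sales  = c(50, 90, 70, 20, 60, 80),
                      Profit = c(5, 9, 7, 2, 6, 8))

top2    <- top_n_by(demo.df, "Profit", n = 2)                      # most profitable
bottom2 <- top_n_by(demo.df, "Profit", n = 2, decreasing = FALSE)  # least profitable
print(top2)
print(bottom2)
```

Note that `bottom2` is listed from the least profitable store upward, whereas the slice of the descending sort above lists the bottom stores from most to least profitable.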
#TASK 4-g: (Scatterplot of Profit and MTenure)
plot(store24.df$Profit~store24.df$MTenure,xlab="MTenure",
ylab="Profit",main="Scatterplot of Profit vs. MTenure",pch=19)
abline(lm(store24.df$Profit~store24.df$MTenure),col="green3")
#TASK 4-h: (Scatterplot of Profit and CTenure)
plot(store24.df$Profit~store24.df$CTenure,xlab="CTenure",
ylab="Profit",main="Scatterplot of Profit vs. CTenure",pch=19)
abline(lm(store24.df$Profit~store24.df$CTenure),col="green3")
#TASK 4-i: (Constructing a correlation matrix for all variables of dataset)
round(cor(store24.df),2)
##            store Sales Profit MTenure CTenure   Pop  Comp Visibility
## store       1.00 -0.23  -0.20   -0.06    0.02 -0.29  0.03      -0.03
## Sales      -0.23  1.00   0.92    0.45    0.25  0.40 -0.24       0.13
## Profit     -0.20  0.92   1.00    0.44    0.26  0.43 -0.33       0.14
## MTenure    -0.06  0.45   0.44    1.00    0.24 -0.06  0.18       0.16
## CTenure     0.02  0.25   0.26    0.24    1.00  0.00 -0.07       0.07
## Pop        -0.29  0.40   0.43   -0.06    0.00  1.00 -0.27      -0.05
## Comp        0.03 -0.24  -0.33    0.18   -0.07 -0.27  1.00       0.03
## Visibility -0.03  0.13   0.14    0.16    0.07 -0.05  0.03       1.00
## PedCount   -0.22  0.42   0.45    0.06   -0.08  0.61 -0.15      -0.14
## Res        -0.03 -0.17  -0.16   -0.06   -0.34 -0.24  0.22       0.02
## Hours24     0.03  0.06  -0.03   -0.17    0.07 -0.22  0.13       0.05
## CrewSkill   0.05  0.16   0.16    0.10    0.26  0.28 -0.04      -0.20
## MgrSkill   -0.07  0.31   0.32    0.23    0.12  0.08  0.22       0.07
## ServQual   -0.32  0.39   0.36    0.18    0.08  0.12  0.02       0.21
##            PedCount   Res Hours24 CrewSkill MgrSkill ServQual
## store         -0.22 -0.03    0.03      0.05    -0.07    -0.32
## Sales          0.42 -0.17    0.06      0.16     0.31     0.39
## Profit         0.45 -0.16   -0.03      0.16     0.32     0.36
## MTenure        0.06 -0.06   -0.17      0.10     0.23     0.18
## CTenure       -0.08 -0.34    0.07      0.26     0.12     0.08
## Pop            0.61 -0.24   -0.22      0.28     0.08     0.12
## Comp          -0.15  0.22    0.13     -0.04     0.22     0.02
## Visibility    -0.14  0.02    0.05     -0.20     0.07     0.21
## PedCount       1.00 -0.28   -0.28      0.21     0.09    -0.01
## Res           -0.28  1.00   -0.09     -0.15    -0.03     0.09
## Hours24       -0.28 -0.09    1.00      0.11    -0.04     0.06
## CrewSkill      0.21 -0.15    0.11      1.00    -0.02    -0.03
## MgrSkill       0.09 -0.03   -0.04     -0.02     1.00     0.36
## ServQual      -0.01  0.09    0.06     -0.03     0.36     1.00
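A full correlation matrix is easier to read when one column is pulled out and sorted. The sketch below shows one way to do that; `cor_with` is a hypothetical helper name, and the built-in `mtcars` data is used as a stand-in so the snippet runs without Store24.csv.

```r
# Hypothetical helper: correlation of every other column with `target`,
# sorted by absolute value (strongest first) and rounded to 2 decimals.
cor_with <- function(df, target) {
  r <- cor(df)[, target]        # the target's column of the correlation matrix
  r <- r[names(r) != target]    # drop the trivial self-correlation of 1
  round(r[order(-abs(r))], 2)
}

# Stand-in example on built-in data: which variables track mpg most closely?
mpg_cors <- cor_with(mtcars, "mpg")
print(mpg_cors)
```

On the real data the call would be `cor_with(store24.df, "Profit")`. Going by the matrix above, it should list Sales (0.92) first, followed by PedCount (0.45) and MTenure (0.44).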
#TASK 4-j: (Measuring the correlation between variables)
round(cor(store24.df$Profit,store24.df$MTenure),2)
## [1] 0.44
round(cor(store24.df$Profit,store24.df$CTenure),2)
## [1] 0.26
#TASK 4-k: (Corrgram of all variables of the dataset)
library(corrgram)
corrgram(store24.df,lower.panel = panel.shade,upper.panel = panel.pie,
text.panel = panel.txt,main="Corrgram of store variables")
#TASK 4-k - EXPLANATION: From the corrgram, Sales and Profit are strongly positively correlated with each other (r = 0.92). Manager’s tenure (r = 0.44) and crew’s tenure (r = 0.26) are also positively correlated with Profit, though less strongly.
#TASK 4-l: (Correlation test between variables)
cor.test(store24.df$Profit,store24.df$MTenure)
##
## Pearson's product-moment correlation
##
## data: store24.df$Profit and store24.df$MTenure
## t = 4.1731, df = 73, p-value = 8.193e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2353497 0.6055175
## sample estimates:
## cor
## 0.4388692
cor.test(store24.df$Profit,store24.df$CTenure)
##
## Pearson's product-moment correlation
##
## data: store24.df$Profit and store24.df$CTenure
## t = 2.2786, df = 73, p-value = 0.02562
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.03262507 0.45786339
## sample estimates:
## cor
## 0.2576789
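The t statistics reported by `cor.test` follow from the sample correlation r and the sample size n through t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The sketch below re-derives them from the printed values as a check; `cor_t` is a hypothetical helper name, and n = 75 is inferred from df = 73 in the output above.

```r
# Hypothetical helper: t statistic for H0 "true correlation is 0".
cor_t <- function(r, n) r * sqrt(n - 2) / sqrt(1 - r^2)

n <- 75  # inferred: df = n - 2 = 73 in the cor.test output

# Sample correlations copied from the cor.test output above.
t_mtenure <- cor_t(0.4388692, n)
t_ctenure <- cor_t(0.2576789, n)

# Two-sided p-values from the t distribution with n - 2 degrees of freedom.
p_mtenure <- 2 * pt(-abs(t_mtenure), df = n - 2)
p_ctenure <- 2 * pt(-abs(t_ctenure), df = n - 2)

print(c(t_mtenure = t_mtenure, t_ctenure = t_ctenure))
print(c(p_mtenure = p_mtenure, p_ctenure = p_ctenure))
```

Up to rounding, these should reproduce the t and p-values shown in the two test outputs.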
#TASK 4-m: (Regression Analysis)
store5<-lm(Profit~MTenure+CTenure+PedCount
+Comp+Pop+Visibility+Res+Hours24,data=store24.df)
store5
##
## Call:
## lm(formula = Profit ~ MTenure + CTenure + PedCount + Comp + Pop +
## Visibility + Res + Hours24, data = store24.df)
##
## Coefficients:
## (Intercept)      MTenure      CTenure     PedCount         Comp
##    7610.041      760.993      944.978    34087.359   -25286.887
##         Pop   Visibility          Res      Hours24
##       3.667    12625.447    91584.675    63233.307
summary(store5)
##
## Call:
## lm(formula = Profit ~ MTenure + CTenure + PedCount + Comp + Pop +
## Visibility + Res + Hours24, data = store24.df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -105789  -35946   -7069   33780  112390
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)   7610.041  66821.994   0.114 0.909674
## MTenure        760.993    127.086   5.988 9.72e-08 ***
## CTenure        944.978    421.687   2.241 0.028400 *
## PedCount     34087.359   9073.196   3.757 0.000366 ***
## Comp        -25286.887   5491.937  -4.604 1.94e-05 ***
## Pop              3.667      1.466   2.501 0.014890 *
## Visibility   12625.447   9087.620   1.389 0.169411
## Res          91584.675  39231.283   2.334 0.022623 *
## Hours24      63233.307  19641.114   3.219 0.001994 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 56970 on 66 degrees of freedom
## Multiple R-squared: 0.6379, Adjusted R-squared: 0.594
## F-statistic: 14.53 on 8 and 66 DF, p-value: 5.382e-12
The variables MTenure, CTenure, PedCount, Comp, Pop, Res, and Hours24 have p-values below 0.05 and are therefore statistically significant. Visibility has a p-value above 0.05 (0.169) and is not statistically significant.
Holding the other predictors constant, the expected Profit increases by about $760.99 for each additional month of manager’s tenure and by about $944.98 for each additional month of crew’s tenure.
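To make the per-month coefficients concrete, the sketch below scales them to one extra year of tenure and attaches approximate 95% confidence intervals. The estimates, standard errors, and residual degrees of freedom are copied from the `summary(store5)` output. The variable names are illustrative only, and the figures describe an association under the fitted linear model, not a guaranteed causal effect.

```r
# Estimates and standard errors copied from the summary(store5) output.
b_mtenure  <- 760.993   # dollars of Profit per extra month of manager tenure
se_mtenure <- 127.086
b_ctenure  <- 944.978   # dollars of Profit per extra month of crew tenure
se_ctenure <- 421.687
resid_df   <- 66        # residual degrees of freedom from the same output

# Expected change in Profit for 12 extra months, other predictors held fixed.
extra_months <- 12
gain_manager <- b_mtenure * extra_months   # 9131.916
gain_crew    <- b_ctenure * extra_months   # 11339.736

# Approximate 95% confidence intervals for the per-month coefficients.
t_crit     <- qt(0.975, df = resid_df)
ci_mtenure <- b_mtenure + c(-1, 1) * t_crit * se_mtenure
ci_ctenure <- b_ctenure + c(-1, 1) * t_crit * se_ctenure

print(c(gain_manager = gain_manager, gain_crew = gain_crew))
print(rbind(MTenure = ci_mtenure, CTenure = ci_ctenure))
```

With the fitted model in memory, `confint(store5)` would give the same intervals directly. The crew tenure interval is much wider than the manager tenure interval because its standard error is more than three times larger.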
This dataset relates manager’s and crew’s tenure to the profit of the stores, alongside site factors such as the population within a 1/2 mile radius, the competitors within that 1/2 mile radius, and the visibility and pedestrian count ratings. From the regression analysis, profit increases by around $760.99 for each additional month of manager’s tenure and by around $944.98 for each additional month of crew’s tenure, holding the other predictors constant. From the corrgram, Sales and Profit are strongly positively correlated (r = 0.92).
It looks like providing technically advanced training to managers and crew in highly profitable stores won’t be a feasible method to retain them. At the same time, it might prove effective for the managers and crew in the least profitable stores.
Population and the number of competitors per 10,000 people within a 1/2 mile radius both matter for a store’s profit: Pop has a significant positive coefficient and Comp a significant negative one. Manager’s skill is also positively correlated with sales (r = 0.31) and profit (r = 0.32). From the regression t-tests, all predictors except Visibility are statistically significant at the 5% level, and Pearson’s correlation tests confirm that Profit is significantly correlated with both MTenure (p < 0.001) and CTenure (p = 0.026).