R Markdown

This is an R Markdown document. It shows the tabulations and graphs generated from the Store24 dataset.

#TASK 4-c: (To read the dataset and assess summary statistics)
setwd("F:/R-Internship/Course related files") # folder containing Store24.csv
store24.df <- read.csv("Store24.csv")
View(store24.df)
summary(store24.df)
##      store          Sales             Profit          MTenure      
##  Min.   : 1.0   Min.   : 699306   Min.   :122180   Min.   :  0.00  
##  1st Qu.:19.5   1st Qu.: 984579   1st Qu.:211004   1st Qu.:  6.67  
##  Median :38.0   Median :1127332   Median :265014   Median : 24.12  
##  Mean   :38.0   Mean   :1205413   Mean   :276314   Mean   : 45.30  
##  3rd Qu.:56.5   3rd Qu.:1362388   3rd Qu.:331314   3rd Qu.: 50.92  
##  Max.   :75.0   Max.   :2113089   Max.   :518998   Max.   :277.99  
##     CTenure              Pop             Comp          Visibility  
##  Min.   :  0.8871   Min.   : 1046   Min.   : 1.651   Min.   :2.00  
##  1st Qu.:  4.3943   1st Qu.: 5616   1st Qu.: 3.151   1st Qu.:3.00  
##  Median :  7.2115   Median : 8896   Median : 3.629   Median :3.00  
##  Mean   : 13.9315   Mean   : 9826   Mean   : 3.788   Mean   :3.08  
##  3rd Qu.: 17.2156   3rd Qu.:14104   3rd Qu.: 4.230   3rd Qu.:4.00  
##  Max.   :114.1519   Max.   :26519   Max.   :11.128   Max.   :5.00  
##     PedCount         Res          Hours24       CrewSkill    
##  Min.   :1.00   Min.   :0.00   Min.   :0.00   Min.   :2.060  
##  1st Qu.:2.00   1st Qu.:1.00   1st Qu.:1.00   1st Qu.:3.225  
##  Median :3.00   Median :1.00   Median :1.00   Median :3.500  
##  Mean   :2.96   Mean   :0.96   Mean   :0.84   Mean   :3.457  
##  3rd Qu.:4.00   3rd Qu.:1.00   3rd Qu.:1.00   3rd Qu.:3.655  
##  Max.   :5.00   Max.   :1.00   Max.   :1.00   Max.   :4.640  
##     MgrSkill        ServQual     
##  Min.   :2.957   Min.   : 57.90  
##  1st Qu.:3.344   1st Qu.: 78.95  
##  Median :3.589   Median : 89.47  
##  Mean   :3.638   Mean   : 87.15  
##  3rd Qu.:3.925   3rd Qu.: 99.90  
##  Max.   :4.622   Max.   :100.00
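Before relying on these summaries, it can help to confirm that every column was read as a numeric type and that no column has missing values. The following is a minimal sketch on a small made-up data frame (`toy.df` and its values are hypothetical, not taken from Store24.csv); the same calls could be pointed at `store24.df`.

```r
# Hypothetical stand-in for store24.df; values are invented for illustration
toy.df <- data.frame(
  store   = 1:4,
  Profit  = c(265014, 424007, 222735, 210122),
  MTenure = c(0, 86.22, 23.89, NA)   # one missing value on purpose
)

# Column classes: all should be numeric or integer for summary() and cor()
col_classes <- sapply(toy.df, class)
print(col_classes)

# Number of missing values per column
na_counts <- colSums(is.na(toy.df))
print(na_counts)
```

A non-zero count would matter later, because `mean()`, `sd()` and `cor()` return `NA` by default when missing values are present.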
#TASK 4-d: (To compute mean and standard deviation of various columns of the dataset)
mean(store24.df$Profit) 
## [1] 276313.6
sd(store24.df$Profit)
## [1] 89404.08
mean(store24.df$MTenure)
## [1] 45.29644
sd(store24.df$MTenure)
## [1] 57.67155
mean(store24.df$CTenure)
## [1] 13.9315
sd(store24.df$CTenure)
## [1] 17.69752
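The six calls above can also be collapsed into a single `sapply()` over the columns of interest. This is only a sketch of that pattern on a hypothetical data frame (`toy.df` and its values are invented); with the real data, `toy.df` would be replaced by `store24.df`.

```r
# Hypothetical stand-in for store24.df; values are invented for illustration
toy.df <- data.frame(
  Profit  = c(200000, 250000, 300000, 350000),
  MTenure = c(10, 20, 30, 40),
  CTenure = c(2, 4, 6, 8)
)

cols <- c("Profit", "MTenure", "CTenure")

# One row per statistic, one column per variable
stats <- sapply(toy.df[cols], function(x) c(mean = mean(x), sd = sd(x)))
print(round(stats, 2))
```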
#TASK 4-e: (Sorting and Subsetting data)

attach(mtcars) 
View(mtcars)
newdata <- mtcars[order(mpg),] # sort by mpg (ascending)
View(newdata)
newdata[1:5,] # see the first 5 rows
##                      mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
## Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
## Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
## Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
newdata <- mtcars[order(-mpg),] # sort by mpg (descending)
View(newdata)
detach(mtcars)
#TASK 4-f: (To display the first 5 columns of the 10 most and 10 least profitable stores)
View(store24.df)
store1 <- store24.df[order(-store24.df$Profit), ] # sort by Profit (descending)
store2 <- store1[1:10, 1:5] # 10 most profitable stores
View(store2)
store3 <- store1[66:75, 1:5] # 10 least profitable stores (rows 66-75 of the 75 stores)
View(store3)
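The row range `66:75` works only because the dataset has exactly 75 stores. A possible alternative that does not depend on the row count is to use `head()` and `tail()` on the sorted data. The sketch below uses a hypothetical helper, `top_bottom()`, and demonstrates it on the built-in `mtcars` data rather than on Store24.csv.

```r
# Return the first n_cols columns of the n highest and n lowest rows by `by`
top_bottom <- function(df, by, n = 10, n_cols = 5) {
  sorted <- df[order(-df[[by]]), ]           # descending order
  list(
    top    = head(sorted, n)[, seq_len(n_cols)],
    bottom = tail(sorted, n)[, seq_len(n_cols)]
  )
}

res <- top_bottom(mtcars, by = "mpg", n = 3)
print(res$top)
print(res$bottom)
```

For the store data the equivalent call would be `top_bottom(store24.df, by = "Profit")`.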
#TASK 4-g: (Scatterplot of Profit and MTenure)
plot(store24.df$Profit~store24.df$MTenure,xlab="MTenure",
     ylab="Profit",main="Scatterplot of Profit vs. MTenure",pch=19) 
abline(lm(store24.df$Profit~store24.df$MTenure),col="green3")

#TASK 4-h: (Scatterplot of Profit and CTenure)
plot(store24.df$Profit~store24.df$CTenure,xlab="CTenure",
     ylab="Profit",main="Scatterplot of Profit vs. CTenure",pch=19) 
abline(lm(store24.df$Profit~store24.df$CTenure),col="green3")

#TASK 4-i: (Constructing a correlation matrix for all variables of the dataset)
round(cor(store24.df),2)
##            store Sales Profit MTenure CTenure   Pop  Comp Visibility
## store       1.00 -0.23  -0.20   -0.06    0.02 -0.29  0.03      -0.03
## Sales      -0.23  1.00   0.92    0.45    0.25  0.40 -0.24       0.13
## Profit     -0.20  0.92   1.00    0.44    0.26  0.43 -0.33       0.14
## MTenure    -0.06  0.45   0.44    1.00    0.24 -0.06  0.18       0.16
## CTenure     0.02  0.25   0.26    0.24    1.00  0.00 -0.07       0.07
## Pop        -0.29  0.40   0.43   -0.06    0.00  1.00 -0.27      -0.05
## Comp        0.03 -0.24  -0.33    0.18   -0.07 -0.27  1.00       0.03
## Visibility -0.03  0.13   0.14    0.16    0.07 -0.05  0.03       1.00
## PedCount   -0.22  0.42   0.45    0.06   -0.08  0.61 -0.15      -0.14
## Res        -0.03 -0.17  -0.16   -0.06   -0.34 -0.24  0.22       0.02
## Hours24     0.03  0.06  -0.03   -0.17    0.07 -0.22  0.13       0.05
## CrewSkill   0.05  0.16   0.16    0.10    0.26  0.28 -0.04      -0.20
## MgrSkill   -0.07  0.31   0.32    0.23    0.12  0.08  0.22       0.07
## ServQual   -0.32  0.39   0.36    0.18    0.08  0.12  0.02       0.21
##            PedCount   Res Hours24 CrewSkill MgrSkill ServQual
## store         -0.22 -0.03    0.03      0.05    -0.07    -0.32
## Sales          0.42 -0.17    0.06      0.16     0.31     0.39
## Profit         0.45 -0.16   -0.03      0.16     0.32     0.36
## MTenure        0.06 -0.06   -0.17      0.10     0.23     0.18
## CTenure       -0.08 -0.34    0.07      0.26     0.12     0.08
## Pop            0.61 -0.24   -0.22      0.28     0.08     0.12
## Comp          -0.15  0.22    0.13     -0.04     0.22     0.02
## Visibility    -0.14  0.02    0.05     -0.20     0.07     0.21
## PedCount       1.00 -0.28   -0.28      0.21     0.09    -0.01
## Res           -0.28  1.00   -0.09     -0.15    -0.03     0.09
## Hours24       -0.28 -0.09    1.00      0.11    -0.04     0.06
## CrewSkill      0.21 -0.15    0.11      1.00    -0.02    -0.03
## MgrSkill       0.09 -0.03   -0.04     -0.02     1.00     0.36
## ServQual      -0.01  0.09    0.06     -0.03     0.36     1.00
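A 14 x 14 matrix is hard to scan by eye, so one option is to pull out a single column and rank the other variables by the strength of their correlation with it. The helper `cor_with()` below is hypothetical and is shown on the built-in `mtcars` data; it assumes every column of the data frame is numeric.

```r
# Rank all variables by their correlation with a target column
cor_with <- function(df, target) {
  r <- cor(df)[, target]
  r <- r[names(r) != target]   # drop the trivial self-correlation
  r[order(-abs(r))]            # strongest first, sign preserved
}

ranked <- cor_with(mtcars, "mpg")
print(round(ranked, 2))
```

With the store data, `cor_with(store24.df, "Profit")` would list Sales first, matching the 0.92 shown in the matrix above.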
#TASK 4-j: (Measuring the correlation between variables)
round(cor(store24.df$Profit,store24.df$MTenure),2) 
## [1] 0.44
round(cor(store24.df$Profit,store24.df$CTenure),2)
## [1] 0.26
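For reference, Pearson's correlation is the covariance of the two variables divided by the product of their standard deviations. The short check below reproduces `cor()` from that definition; it uses the built-in `mtcars` data purely as an illustration.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Pearson's r as covariance scaled by both standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

# Should agree with the built-in up to floating-point error
r_builtin <- cor(x, y)
print(c(manual = r_manual, builtin = r_builtin))
```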
#TASK 4-k: (Corrgram of all variables of the dataset)
library(corrgram)
corrgram(store24.df,lower.panel = panel.shade,upper.panel = panel.pie,
         text.panel = panel.txt,main=" Corrgram of store variables")

#TASK 4-k - EXPLANATION: From the corrgram, Sales and Profit are strongly positively correlated with each other (r = 0.92). Manager’s tenure (r = 0.44) and crew’s tenure (r = 0.26) are also positively correlated with Profit, though much less strongly. The corrgram shows association only, not a causal effect.

#TASK 4-l: (Correlation test between variables)
cor.test(store24.df$Profit,store24.df$MTenure) 
## 
##  Pearson's product-moment correlation
## 
## data:  store24.df$Profit and store24.df$MTenure
## t = 4.1731, df = 73, p-value = 8.193e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2353497 0.6055175
## sample estimates:
##       cor 
## 0.4388692
cor.test(store24.df$Profit,store24.df$CTenure)
## 
##  Pearson's product-moment correlation
## 
## data:  store24.df$Profit and store24.df$CTenure
## t = 2.2786, df = 73, p-value = 0.02562
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.03262507 0.45786339
## sample estimates:
##       cor 
## 0.2576789
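The t statistics and p-values reported by `cor.test()` follow directly from the correlation and the sample size, via t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. As a sketch, the hypothetical helper `cor_t()` below recomputes them from the correlations printed above; the results should match the output up to rounding.

```r
# t statistic and two-sided p-value for H0: true correlation = 0
cor_t <- function(r, n) {
  df <- n - 2
  t  <- r * sqrt(df) / sqrt(1 - r^2)
  c(t = t, df = df, p = 2 * pt(-abs(t), df))
}

n <- 75  # number of stores (df = 73 in the output above)
res_m <- cor_t(0.4388692, n)  # Profit vs. MTenure
res_c <- cor_t(0.2576789, n)  # Profit vs. CTenure
print(res_m)
print(res_c)
```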
#TASK 4-m: (Regression Analysis)
store5<-lm(Profit~MTenure+CTenure+PedCount 
           +Comp+Pop+Visibility+Res+Hours24,data=store24.df)
store5
## 
## Call:
## lm(formula = Profit ~ MTenure + CTenure + PedCount + Comp + Pop + 
##     Visibility + Res + Hours24, data = store24.df)
## 
## Coefficients:
## (Intercept)      MTenure      CTenure     PedCount         Comp  
##    7610.041      760.993      944.978    34087.359   -25286.887  
##         Pop   Visibility          Res      Hours24  
##       3.667    12625.447    91584.675    63233.307
summary(store5)
## 
## Call:
## lm(formula = Profit ~ MTenure + CTenure + PedCount + Comp + Pop + 
##     Visibility + Res + Hours24, data = store24.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -105789  -35946   -7069   33780  112390 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7610.041  66821.994   0.114 0.909674    
## MTenure        760.993    127.086   5.988 9.72e-08 ***
## CTenure        944.978    421.687   2.241 0.028400 *  
## PedCount     34087.359   9073.196   3.757 0.000366 ***
## Comp        -25286.887   5491.937  -4.604 1.94e-05 ***
## Pop              3.667      1.466   2.501 0.014890 *  
## Visibility   12625.447   9087.620   1.389 0.169411    
## Res          91584.675  39231.283   2.334 0.022623 *  
## Hours24      63233.307  19641.114   3.219 0.001994 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 56970 on 66 degrees of freedom
## Multiple R-squared:  0.6379, Adjusted R-squared:  0.594 
## F-statistic: 14.53 on 8 and 66 DF,  p-value: 5.382e-12
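Two of the figures in this summary can be checked by hand. Adjusted R-squared is 1 - (1 - R^2)(n - 1)/(n - p - 1), and each t value is the estimate divided by its standard error. The sketch below plugs in the numbers printed above; `n` and `p` are read off the output (75 stores, 8 predictors, hence 66 residual degrees of freedom).

```r
# Values copied from the summary(store5) output above
r2 <- 0.6379   # Multiple R-squared
n  <- 75       # stores
p  <- 8        # predictors (excluding the intercept)

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 3))

# Each t value is the estimate divided by its standard error
t_mtenure <- 760.993 / 127.086
print(round(t_mtenure, 3))
```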

TASK 4-n:

The variables MTenure, CTenure, PedCount, Comp, Pop, Res and Hours24 each have a p-value below 0.05 and are therefore statistically significant at the 5% level. Visibility has a p-value above 0.05 (p = 0.169) and is not statistically significant.
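The same classification can be done programmatically rather than by reading the stars. The sketch below rebuilds the coefficient table from the values printed in the regression summary and flags the predictors below a chosen threshold; the 0.05 cut-off is the conventional choice and is an assumption here.

```r
# Estimates and p-values copied from the summary(store5) output above
coefs <- data.frame(
  term     = c("MTenure", "CTenure", "PedCount", "Comp",
               "Pop", "Visibility", "Res", "Hours24"),
  estimate = c(760.993, 944.978, 34087.359, -25286.887,
               3.667, 12625.447, 91584.675, 63233.307),
  p_value  = c(9.72e-08, 0.028400, 0.000366, 1.94e-05,
               0.014890, 0.169411, 0.022623, 0.001994)
)

alpha <- 0.05  # conventional threshold; an assumption, not fixed by the data
coefs$significant <- coefs$p_value < alpha

print(coefs[order(coefs$p_value), ])
```

With the fitted model available, the p-values could instead be taken from `coef(summary(store5))[, "Pr(>|t|)"]`.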

TASK 4-o:

Holding the other predictors constant, expected Profit increases by about $760.99 for each additional month of manager’s tenure (MTenure) and by about $944.98 for each additional month of crew’s tenure (CTenure).
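These point estimates come with uncertainty. As a rough sketch, a 95% confidence interval for each tenure coefficient can be formed from the estimate, its standard error and the residual degrees of freedom reported in the regression summary. The six-month scenario at the end is hypothetical and only illustrates how the coefficients scale.

```r
# Estimates and standard errors copied from the summary(store5) output above
est <- c(MTenure = 760.993, CTenure = 944.978)
se  <- c(MTenure = 127.086, CTenure = 421.687)
df_resid <- 66  # residual degrees of freedom

# 95% confidence interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df_resid)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 1))

# Expected profit change for 6 extra months of tenure, other predictors fixed
extra_months <- 6  # hypothetical scenario
print(est * extra_months)
```

With the fitted model available, `confint(store5)` should give the same intervals. The much wider interval for CTenure reflects its larger standard error.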

TASK 4-p:

This dataset is used to examine how manager and crew tenure relate to store profit, alongside site factors such as the population within a 1/2 mile, the competitors within that 1/2 mile, and the visibility and pedestrian count ratings. From the regression analysis, profit increases by about $760.99 for each additional month of manager’s tenure and by about $944.98 for each additional month of crew’s tenure, holding the other predictors constant. From the corrgram, Sales and Profit are strongly positively correlated. It looks like providing technically advanced training to managers and crew in highly profitable stores won’t be a feasible method to retain them, but it might prove effective for managers and crew in the least profitable stores. Population and competitors per 10,000 within a 1/2 mile both prove to be crucial for a store’s profit: in the regression, Pop has a positive coefficient and Comp a negative one, and both are significant. Manager’s skill is also positively correlated with sales (0.31) and profit (0.32). Pearson’s correlation tests show that Profit is significantly correlated with both MTenure (p < 0.001) and CTenure (p = 0.026), and in the regression every predictor except Visibility is statistically significant at the 5% level.