Loading the Store24.csv and Viewing the file

setwd("~/SIP/SIP Phase 2/R Programming/Udemy Class Material/Week 3")
store.df <- read.csv("Store24.csv")
View(store.df)

Measuring Mean and Standard Deviation of Profit, MTenure, CTenure

library(psych)
## Warning: package 'psych' was built under R version 3.4.3
describe(store.df$Profit)
##    vars  n     mean       sd median  trimmed   mad    min    max  range
## X1    1 75 276313.6 89404.08 265014 270260.3 90532 122180 518998 396818
##    skew kurtosis       se
## X1 0.62    -0.21 10323.49
describe(store.df$MTenure)
##    vars  n mean    sd median trimmed   mad min    max  range skew kurtosis
## X1    1 75 45.3 57.67  24.12   33.58 29.67   0 277.99 277.99 2.01      3.9
##      se
## X1 6.66
describe(store.df$CTenure)
##    vars  n  mean   sd median trimmed  mad  min    max  range skew kurtosis
## X1    1 75 13.93 17.7   7.21    10.6 6.14 0.89 114.15 113.26 3.52       15
##      se
## X1 2.04

The Mean and Standard Deviation are

Profit: mean 276313.6, standard deviation 89404.08

MTenure: mean 45.3, standard deviation 57.67

CTenure: mean 13.93, standard deviation 17.7
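The same two statistics can be pulled out directly, without reading them off the describe() output. The sketch below uses a small made-up data frame, toy.df, as a stand-in for store.df; its values are hypothetical and only the column names follow the data set.

```r
# Hypothetical toy data standing in for store.df (values are made up).
toy.df <- data.frame(
  Profit  = c(120000, 250000, 310000, 480000),
  MTenure = c(0, 24, 36, 180),
  CTenure = c(1, 7, 12, 60)
)

# Column-wise mean and standard deviation collected into one table:
# rows are the statistics, columns are the variables.
stats <- sapply(toy.df[, c("Profit", "MTenure", "CTenure")],
                function(x) c(mean = mean(x), sd = sd(x)))
print(round(stats, 2))
```

Replacing toy.df with store.df should give the mean and sd columns of the three describe() calls in one table.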

Printing the {StoreID, Sales, Profit, MTenure} of the top 10 most profitable stores. The selection c(1:4) stops at MTenure, so CTenure is not shown.

Top10 <- store.df[order(-store.df$Profit), c(1:4)]
Top10[1:10,]
##    store   Sales Profit   MTenure
## 74    74 1782957 518998 171.09720
## 7      7 1809256 476355  62.53080
## 9      9 2113089 474725 108.99350
## 6      6 1703140 469050 149.93590
## 44    44 1807740 439781 182.23640
## 2      2 1619874 424007  86.22219
## 45    45 1602362 410149  47.64565
## 18    18 1704826 394039 239.96980
## 11    11 1583446 389886  44.81977
## 47    47 1665657 387853  12.84790

Printing the {StoreID, Sales, Profit, MTenure} of the 10 least profitable stores.

Least10 <- store.df[order(store.df$Profit), c(1:4)]
Least10[1:10,]
##    store   Sales Profit     MTenure
## 57    57  699306 122180  24.3485700
## 66    66  879581 146058 115.2039000
## 41    41  744211 147327  14.9180200
## 55    55  925744 147672   6.6703910
## 32    32  828918 149033  36.0792600
## 13    13  857843 152513   0.6571813
## 54    54  811190 159792   6.6703910
## 52    52 1073008 169201  24.1185600
## 61    61  716589 177046  21.8184200
## 37    37 1202917 187765  23.1985000
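Selecting columns by name rather than by position makes it harder to drop a column by accident, and sorting once serves both ends of the ranking. This is a minimal sketch on a made-up data frame (toy.df and its values are hypothetical); the column names are assumed to match those of store.df.

```r
# Hypothetical toy data with the five columns of interest (values are made up).
toy.df <- data.frame(
  store   = 1:6,
  Sales   = c(900, 1500, 700, 1800, 1200, 1000),
  Profit  = c(150, 400, 120, 520, 300, 200),
  MTenure = c(10, 80, 5, 170, 40, 20),
  CTenure = c(3, 12, 2, 30, 8, 6)
)

# Select by name so CTenure cannot be left out by an off-by-one index.
cols <- c("store", "Sales", "Profit", "MTenure", "CTenure")

# Sort once, most profitable first.
by.profit <- toy.df[order(toy.df$Profit, decreasing = TRUE), cols]

top3    <- head(by.profit, 3)  # the three most profitable stores
bottom3 <- tail(by.profit, 3)  # the three least profitable, still in descending order
print(top3)
print(bottom3)
```

With store.df, head(by.profit, 10) and tail(by.profit, 10) would give the two rankings from a single sort.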

Scatter plot of Profit vs. MTenure with the fitted simple linear regression line

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.3
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
plot(store.df$MTenure, store.df$Profit, xlab = "Manager Tenure (Months)", ylab = "Store Profit ($)", main = "Store Profit Vs. Manager Tenure", cex = 0.6)
abline(lm(Profit ~ MTenure, data = store.df), col = "blue")

Scatter plot of Profit vs. CTenure with the fitted simple linear regression line

plot(store.df$CTenure, store.df$Profit, xlab = "Crew Tenure (Months)", ylab = "Store Profit ($)", main = "Store Profit Vs. Crew Tenure", cex = 0.6)
abline(lm(Profit~CTenure, data=store.df), col="blue")

Correlation Matrix for all the variables in the dataset.

round(cor(store.df), 2)
##            store Sales Profit MTenure CTenure   Pop  Comp Visibility
## store       1.00 -0.23  -0.20   -0.06    0.02 -0.29  0.03      -0.03
## Sales      -0.23  1.00   0.92    0.45    0.25  0.40 -0.24       0.13
## Profit     -0.20  0.92   1.00    0.44    0.26  0.43 -0.33       0.14
## MTenure    -0.06  0.45   0.44    1.00    0.24 -0.06  0.18       0.16
## CTenure     0.02  0.25   0.26    0.24    1.00  0.00 -0.07       0.07
## Pop        -0.29  0.40   0.43   -0.06    0.00  1.00 -0.27      -0.05
## Comp        0.03 -0.24  -0.33    0.18   -0.07 -0.27  1.00       0.03
## Visibility -0.03  0.13   0.14    0.16    0.07 -0.05  0.03       1.00
## PedCount   -0.22  0.42   0.45    0.06   -0.08  0.61 -0.15      -0.14
## Res        -0.03 -0.17  -0.16   -0.06   -0.34 -0.24  0.22       0.02
## Hours24     0.03  0.06  -0.03   -0.17    0.07 -0.22  0.13       0.05
## CrewSkill   0.05  0.16   0.16    0.10    0.26  0.28 -0.04      -0.20
## MgrSkill   -0.07  0.31   0.32    0.23    0.12  0.08  0.22       0.07
## ServQual   -0.32  0.39   0.36    0.18    0.08  0.12  0.02       0.21
##            PedCount   Res Hours24 CrewSkill MgrSkill ServQual
## store         -0.22 -0.03    0.03      0.05    -0.07    -0.32
## Sales          0.42 -0.17    0.06      0.16     0.31     0.39
## Profit         0.45 -0.16   -0.03      0.16     0.32     0.36
## MTenure        0.06 -0.06   -0.17      0.10     0.23     0.18
## CTenure       -0.08 -0.34    0.07      0.26     0.12     0.08
## Pop            0.61 -0.24   -0.22      0.28     0.08     0.12
## Comp          -0.15  0.22    0.13     -0.04     0.22     0.02
## Visibility    -0.14  0.02    0.05     -0.20     0.07     0.21
## PedCount       1.00 -0.28   -0.28      0.21     0.09    -0.01
## Res           -0.28  1.00   -0.09     -0.15    -0.03     0.09
## Hours24       -0.28 -0.09    1.00      0.11    -0.04     0.06
## CrewSkill      0.21 -0.15    0.11      1.00    -0.02    -0.03
## MgrSkill       0.09 -0.03   -0.04     -0.02     1.00     0.36
## ServQual      -0.01  0.09    0.06     -0.03     0.36     1.00

Correlation between Profit and MTenure, Correlation between Profit and CTenure.

round(cor(store.df$Profit, store.df$MTenure), 2)
## [1] 0.44
round(cor(store.df$Profit, store.df$CTenure), 2)
## [1] 0.26
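When only the correlations with one variable matter, the corresponding column of the correlation matrix can be extracted and ranked by absolute value. The sketch below runs on simulated data; toy.df, its coefficients and the seed are assumptions made for illustration, not values from Store24.

```r
# Simulated stand-in data: Profit depends positively on Pop and negatively on Comp.
set.seed(1)
n      <- 50
Pop    <- rnorm(n)
Comp   <- rnorm(n)
Profit <- 2 * Pop - 1 * Comp + rnorm(n)
toy.df <- data.frame(Profit, Pop, Comp)

# Take the Profit column of the correlation matrix and drop the self-correlation.
r <- cor(toy.df)[, "Profit"]
r <- r[names(r) != "Profit"]

# Rank by strength of association, ignoring the sign.
ranked <- r[order(abs(r), decreasing = TRUE)]
print(round(ranked, 2))
```

Applied to store.df, the same three lines would list every variable in order of its correlation with Profit.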

Corrgram based on all variables in the dataset.

library(corrgram)
## Warning: package 'corrgram' was built under R version 3.4.3
corrgram(store.df, order=TRUE, lower.panel=panel.shade, upper.panel=panel.pie, text.panel=panel.txt, main="Corrgram of Store24 variables")

Pearson’s Correlation test on the correlation between Profit and MTenure, Profit and CTenure.

cor.test(store.df$Profit, store.df$MTenure)
cor.test(store.df$Profit, store.df$CTenure)

cor.test(), whose default method is "pearson", is the call that tests these correlations. chisq.test() is a test of independence for categorical data: it treats every distinct Profit and tenure value as its own category, so the p-values it returns for these pairs (0.2625 and 0.2677) say nothing about the correlation.

With r = 0.44 and r = 0.26 from the correlations above and n = 75, the test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) is approximately 4.2 and 2.3 on 73 degrees of freedom. Both correlations are therefore significant at the 0.05 level.

p-value for Profit and MTenure is below 0.001

p-value for Profit and CTenure is roughly 0.02
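As a cross-check, the p-value of Pearson's correlation test can be computed from nothing more than r and n, using t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom. The helper r_to_p below is a hypothetical name introduced for this sketch; the inputs 0.44, 0.26 and n = 75 are the rounded values reported earlier, so the results are approximate. The last lines compare the helper with cor.test() on simulated data.

```r
# Hypothetical helper: two-sided p-value of Pearson's test from r and n.
r_to_p <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  p      <- 2 * pt(-abs(t_stat), df = n - 2)
  c(t = t_stat, df = n - 2, p = p)
}

res.m <- r_to_p(0.44, 75)  # Profit vs. MTenure (rounded r)
res.c <- r_to_p(0.26, 75)  # Profit vs. CTenure (rounded r)
print(signif(res.m, 3))
print(signif(res.c, 3))

# Sanity check on simulated data: the helper should agree with cor.test().
set.seed(42)
x   <- rnorm(75)
y   <- 0.5 * x + rnorm(75)
ct  <- cor.test(x, y)  # method = "pearson" by default
chk <- r_to_p(unname(ct$estimate), 75)
```

Running cor.test() on the unrounded store.df columns remains the authoritative result; the helper only shows where its p-value comes from.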

Regression of Profit on {MTenure, CTenure, Comp, Pop, PedCount, Res, Hours24, Visibility}.

fit <- lm(Profit ~ MTenure+CTenure+Comp+Pop+PedCount+Res+Hours24+Visibility, data=store.df)
summary(fit)
## 
## Call:
## lm(formula = Profit ~ MTenure + CTenure + Comp + Pop + PedCount + 
##     Res + Hours24 + Visibility, data = store.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -105789  -35946   -7069   33780  112390 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7610.041  66821.994   0.114 0.909674    
## MTenure        760.993    127.086   5.988 9.72e-08 ***
## CTenure        944.978    421.687   2.241 0.028400 *  
## Comp        -25286.887   5491.937  -4.604 1.94e-05 ***
## Pop              3.667      1.466   2.501 0.014890 *  
## PedCount     34087.359   9073.196   3.757 0.000366 ***
## Res          91584.675  39231.283   2.334 0.022623 *  
## Hours24      63233.307  19641.114   3.219 0.001994 ** 
## Visibility   12625.447   9087.620   1.389 0.169411    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 56970 on 66 degrees of freedom
## Multiple R-squared:  0.6379, Adjusted R-squared:  0.594 
## F-statistic: 14.53 on 8 and 66 DF,  p-value: 5.382e-12

MTenure, CTenure, Comp, Pop, PedCount, Res and Hours24 are the explanatory variables (regressors) whose beta coefficients are statistically significant (p<0.05).

Visibility is the only explanatory variable (regressor) whose beta coefficient is not statistically significant (p = 0.169).

Holding the other regressors constant, a one-month increase in the manager’s tenure is associated with an expected increase of $760.99 in the store’s profit.

Holding the other regressors constant, a one-month increase in crew tenure is associated with an expected increase of $944.98 in the store’s profit.
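The reading of a regression coefficient as the change in expected profit per one-unit change in a regressor, with the others held fixed, can be checked numerically with predict(). The sketch below fits a model on simulated data; toy.df, the generating coefficients and the seed are assumptions made for illustration and are not estimates from Store24.

```r
# Simulated stand-in data with two regressors (all values are made up).
set.seed(7)
n      <- 75
toy.df <- data.frame(MTenure = runif(n, 0, 200), Comp = runif(n, 1, 5))
toy.df$Profit <- 100000 + 750 * toy.df$MTenure - 25000 * toy.df$Comp +
  rnorm(n, sd = 50000)

fit.toy <- lm(Profit ~ MTenure + Comp, data = toy.df)

# Two stores identical in Comp, differing by exactly one month of MTenure.
base  <- data.frame(MTenure = 24, Comp = 3)
plus1 <- transform(base, MTenure = MTenure + 1)

# The difference in predicted profit equals the MTenure coefficient.
delta <- predict(fit.toy, plus1) - predict(fit.toy, base)
print(unname(delta))
print(unname(coef(fit.toy)["MTenure"]))

# A confidence interval shows how precisely the per-month effect is estimated.
print(confint(fit.toy, "MTenure"))
```

Calling confint(fit) on the Store24 model would show the corresponding uncertainty around the $760.99 and $944.98 estimates.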

The scatter plots of Profit vs. MTenure and Profit vs. CTenure both show a positive linear relationship, which the correlation matrix confirms (0.44 and 0.26 respectively). The corrgram visualizes the correlations between all the variables of the Store24 data set. It shows that Profit is most strongly correlated with pedestrian foot traffic volume (0.45), manager tenure (0.44) and population (0.43), moderately correlated with service quality (0.36), competitors (-0.33) and manager skill (0.32), and only weakly correlated with crew tenure (0.26) and crew skill (0.16). In the linear regression fit, manager tenure, competitors and pedestrian foot traffic volume have the most significant coefficients (p < 0.001), followed by 24-hour opening (p < 0.01) and then crew tenure, population and location of store, either residential or industrial (p < 0.05).

  1. The analysis supports Gordon’s argument: the site location factors are important, and the ‘people factors’ also play a significant role in the store’s performance.
  2. Manager and crew tenure, taken together, have less significance than the site location factors such as population, number of competitors and pedestrian foot traffic volume.
  3. Given the corrgram and the regression coefficients and their significance, increasing wages and bonuses and spending on training and development programs for managers is likely to be beneficial, to the extent that these measures lengthen manager tenure.
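One way to examine whether the tenure variables matter jointly, over and above the site location factors, is a nested-model F test with anova(). This technique is not part of the analysis above; it is offered as a possible follow-up. The sketch uses simulated data, so toy.df, the generating coefficients and the seed are all assumptions.

```r
# Simulated stand-in data: two site factors and two tenure variables (values are made up).
set.seed(24)
n      <- 75
toy.df <- data.frame(
  Pop     = runif(n, 1000, 20000),
  Comp    = runif(n, 1, 5),
  MTenure = runif(n, 0, 200),
  CTenure = runif(n, 0, 60)
)
toy.df$Profit <- 50000 + 4 * toy.df$Pop - 25000 * toy.df$Comp +
  750 * toy.df$MTenure + 900 * toy.df$CTenure + rnorm(n, sd = 50000)

# Reduced model: site factors only. Full model: site factors plus tenure.
fit.site <- lm(Profit ~ Pop + Comp, data = toy.df)
fit.full <- lm(Profit ~ Pop + Comp + MTenure + CTenure, data = toy.df)

# F test of the null hypothesis that both tenure coefficients are zero.
joint <- anova(fit.site, fit.full)
print(joint)
```

On the Store24 data, the reduced model would hold Comp, Pop, PedCount, Res, Hours24 and Visibility, and the full model would be the fit shown in the regression section.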