Analysis of the Store24 (A) Harvard Case Study

setwd("C:\\Users\\Adithya Nataraj\\Downloads")
store.df <- read.csv(paste("Store24.csv", sep=""))
View(store.df)
summary(store.df)
##      store          Sales             Profit          MTenure      
##  Min.   : 1.0   Min.   : 699306   Min.   :122180   Min.   :  0.00  
##  1st Qu.:19.5   1st Qu.: 984579   1st Qu.:211004   1st Qu.:  6.67  
##  Median :38.0   Median :1127332   Median :265014   Median : 24.12  
##  Mean   :38.0   Mean   :1205413   Mean   :276314   Mean   : 45.30  
##  3rd Qu.:56.5   3rd Qu.:1362388   3rd Qu.:331314   3rd Qu.: 50.92  
##  Max.   :75.0   Max.   :2113089   Max.   :518998   Max.   :277.99  
##     CTenure              Pop             Comp          Visibility  
##  Min.   :  0.8871   Min.   : 1046   Min.   : 1.651   Min.   :2.00  
##  1st Qu.:  4.3943   1st Qu.: 5616   1st Qu.: 3.151   1st Qu.:3.00  
##  Median :  7.2115   Median : 8896   Median : 3.629   Median :3.00  
##  Mean   : 13.9315   Mean   : 9826   Mean   : 3.788   Mean   :3.08  
##  3rd Qu.: 17.2156   3rd Qu.:14104   3rd Qu.: 4.230   3rd Qu.:4.00  
##  Max.   :114.1519   Max.   :26519   Max.   :11.128   Max.   :5.00  
##     PedCount         Res          Hours24       CrewSkill    
##  Min.   :1.00   Min.   :0.00   Min.   :0.00   Min.   :2.060  
##  1st Qu.:2.00   1st Qu.:1.00   1st Qu.:1.00   1st Qu.:3.225  
##  Median :3.00   Median :1.00   Median :1.00   Median :3.500  
##  Mean   :2.96   Mean   :0.96   Mean   :0.84   Mean   :3.457  
##  3rd Qu.:4.00   3rd Qu.:1.00   3rd Qu.:1.00   3rd Qu.:3.655  
##  Max.   :5.00   Max.   :1.00   Max.   :1.00   Max.   :4.640  
##     MgrSkill        ServQual     
##  Min.   :2.957   Min.   : 57.90  
##  1st Qu.:3.344   1st Qu.: 78.95  
##  Median :3.589   Median : 89.47  
##  Mean   :3.638   Mean   : 87.15  
##  3rd Qu.:3.925   3rd Qu.: 99.90  
##  Max.   :4.622   Max.   :100.00

The mean and standards deviations of Profit

mean(store.df$Profit)
## [1] 276313.6
sd(store.df$Profit)
## [1] 89404.08

The mean and standards deviations of MTenure

mean(store.df$MTenure)
## [1] 45.29644
sd(store.df$MTenure)
## [1] 57.67155

The mean and standards deviations of MTenure

mean(store.df$CTenure)
## [1] 13.9315
sd(store.df$CTenure)
## [1] 17.69752

Top 10 most profitable stores.

attach(store.df)
exhib1 <- store.df[order(-Profit),]
exhib1[1:10,1:5]
##    store   Sales Profit   MTenure    CTenure
## 74    74 1782957 518998 171.09720  29.519510
## 7      7 1809256 476355  62.53080   7.326488
## 9      9 2113089 474725 108.99350   6.061602
## 6      6 1703140 469050 149.93590  11.351130
## 44    44 1807740 439781 182.23640 114.151900
## 2      2 1619874 424007  86.22219   6.636550
## 45    45 1602362 410149  47.64565   9.166325
## 18    18 1704826 394039 239.96980  33.774130
## 11    11 1583446 389886  44.81977   2.036961
## 47    47 1665657 387853  12.84790   6.636550

Top 10 least profitable stores

exhib1 <- store.df[order(Profit),]
exhib1[1:10,1:5]
##    store   Sales Profit     MTenure   CTenure
## 57    57  699306 122180  24.3485700  2.956879
## 66    66  879581 146058 115.2039000  3.876797
## 41    41  744211 147327  14.9180200 11.926080
## 55    55  925744 147672   6.6703910 18.365500
## 32    32  828918 149033  36.0792600  6.636550
## 13    13  857843 152513   0.6571813  1.577002
## 54    54  811190 159792   6.6703910  3.876797
## 52    52 1073008 169201  24.1185600  3.416838
## 61    61  716589 177046  21.8184200 13.305950
## 37    37 1202917 187765  23.1985000  1.347023

Plot of Profit vs MTenure

plot(MTenure,Profit)

Plot of Profit vs CTenure

plot(CTenure,Profit)

Correlation Matrix of all the variables

library(corrplot)    
## corrplot 0.84 loaded
corrplot(corr=cor(store.df[ , c(1:14)], use="complete.obs"), 
         method ="ellipse")

Correlation between Profit and MTenure

cor(store.df$Profit,store.df$MTenure)
## [1] 0.4388692

Correlation between Profit and CTenure

cor(store.df$Profit,store.df$CTenure)
## [1] 0.2576789

Corrgram of all the variables

library(corrgram)
corrgram(store.df, order=TRUE,
         main="Corrgram of all the Variables",
         lower.panel=panel.shade, upper.panel=panel.pie,
         diag.panel=panel.minmax, text.panel=panel.txt) 

Correlation test between Profit and MTenure

cor.test(store.df$Profit,store.df$MTenure)
## 
##  Pearson's product-moment correlation
## 
## data:  store.df$Profit and store.df$MTenure
## t = 4.1731, df = 73, p-value = 8.193e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2353497 0.6055175
## sample estimates:
##       cor 
## 0.4388692

The p - Value is 8.193e-05.

Correlation test between Profit and CTenure

cor.test(store.df$Profit,store.df$CTenure)
## 
##  Pearson's product-moment correlation
## 
## data:  store.df$Profit and store.df$CTenure
## t = 2.2786, df = 73, p-value = 0.02562
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.03262507 0.45786339
## sample estimates:
##       cor 
## 0.2576789

The p - Value is 0.02562.

Regression of Profit on {MTenure, CTenure, Comp, Pop, PedCount, Res, Hours24, Visibility}

regr <- lm(Profit ~ MTenure + CTenure + Comp + Pop + PedCount + Res + Hours24 + Visibility, data = store.df)
summary(regr)
## 
## Call:
## lm(formula = Profit ~ MTenure + CTenure + Comp + Pop + PedCount + 
##     Res + Hours24 + Visibility, data = store.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -105789  -35946   -7069   33780  112390 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7610.041  66821.994   0.114 0.909674    
## MTenure        760.993    127.086   5.988 9.72e-08 ***
## CTenure        944.978    421.687   2.241 0.028400 *  
## Comp        -25286.887   5491.937  -4.604 1.94e-05 ***
## Pop              3.667      1.466   2.501 0.014890 *  
## PedCount     34087.359   9073.196   3.757 0.000366 ***
## Res          91584.675  39231.283   2.334 0.022623 *  
## Hours24      63233.307  19641.114   3.219 0.001994 ** 
## Visibility   12625.447   9087.620   1.389 0.169411    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 56970 on 66 degrees of freedom
## Multiple R-squared:  0.6379, Adjusted R-squared:  0.594 
## F-statistic: 14.53 on 8 and 66 DF,  p-value: 5.382e-12
regr$coefficients
##   (Intercept)       MTenure       CTenure          Comp           Pop 
##   7610.041452    760.992734    944.978026 -25286.886662      3.666606 
##      PedCount           Res       Hours24    Visibility 
##  34087.358789  91584.675234  63233.307162  12625.447050
residuals(regr)
##            1            2            3            4            5 
##  -17870.5566  112390.4448  -24652.2001   21254.9195   -8292.9911 
##            6            7            8            9           10 
##   89270.7785   84050.0803  -10870.2458   31488.0240  -21849.6437 
##           11           12           13           14           15 
##    -528.7222  -91759.0426  -57806.5916   -7068.7877  -75345.2520 
##           16           17           18           19           20 
##   -6104.0234  -86950.2209  -61254.3355    5413.3764   -5853.2921 
##           21           22           23           24           25 
##    5094.0156   95869.5511  -31589.1824   53013.4120   36072.8170 
##           26           27           28           29           30 
##   -7386.9737  -28735.7167   -7662.9590   53111.6789   73572.1725 
##           31           32           33           34           35 
##   14802.3715  -42214.3885   85510.4023   11712.8399    3995.3758 
##           36           37           38           39           40 
##  -13036.0714  -52665.8054    4157.3337  -39500.8957   49125.8347 
##           41           42           43           44           45 
##  -90438.9845  -13683.5771  -38699.0221  -35704.8151   59928.2412 
##           46           47           48           49           50 
##   36388.7486  -11664.7868   75418.5740  -20696.9041  -56799.7045 
##           51           52           53           54           55 
##  -45563.7849  -42912.6649  112306.9000  -36187.6388  -67002.3454 
##           56           57           58           59           60 
##   22171.0985 -105788.7112    9050.7093   38001.1613   24195.2780 
##           61           62           63           64           65 
##  -15038.1182  -16284.7396     509.1427  -97461.0600    8243.7616 
##           66           67           68           69           70 
##  -72921.5481  100520.7352   -4625.3831   95310.5717  -27907.3903 
##           71           72           73           74           75 
##   -7364.0084  -65662.7214    9331.0473  106126.6026   43997.8062

Variables MTenure, Comp, PedCount, Hours24, CTenure, Pop, Res are Statistically significant with p<0.05.

Variable Visibility is statistically insignificant with p>0.05.

The expected change in the Profit at a store, if the Manager’s tenure increases by one month is 760.993.

The expected change in the Profit at a store, if the Crew’s tenure increases by one month is 944.978.

Executive summary:

The summary from this analysis on “Store24 (A) Harvard Case Study”" based on the regression analysis is as follows. Regression analysis is basically representing a dependent variable as the linear combination of one or many independent variables. In this case the dependent variable is “Profit” which is represented as a linear combination of MTenure, CTenure, Comp, Pop, PedCount, Res, Hours24 and Visibility. Regression analysis is of the form of: Yi = b0 + b1*Xi + ei where ‘ei’ is the residual. The aim of a regression analysis is to include as much independent variables into the equation in an attempt to reduce the residuals. According to regression analysis, Variables MTenure, Comp, PedCount, Hours24, CTenure, Pop, Res have their p values less than 0.05, meaning they are statistically significant. Meaning, the expected change in the Profit at a store, if the Manager’s tenure increases by one month is 760.993. It is applicable to the rest of the variables with their p values less than 0.05. The variable Visibility has its p value more than 0.05 and is hence, statistically insignificant. The multiple R-squared indciates that the model accounts for 63.79% of the variance in Profit.

From this analysis and taking into consideration the different correlation plots, we can say that Profit is very closely linked with sales. This is nothing suprising as only id sales happens, profit can be gained. Another interessting phenomenon is also noted ie. the tenure of the Manager(MTenure) is more dominant in getting more profits than the tenure of the Crew(CTenure). This is because of the fact that a more experienced manager can manage the crew more efficiently and hence get more profits. Also, the population in a ½ mile radius around the store played a bigger role in determining profits than the store visibility of the store front. This can be explained by the fact that, no matter how visually attractive the store might look, profits can be forgotten if there is no consumer market. Even if a store was there in a resedential area, a store that was open for 24 hours had more profits. This can be explained by the fact that resendtial areas had a fixed sleeping cycle and a store open 24 hours wouldn’t make a difference. Whereas if it was present in an industrial area and was open for 24 hours, the profits would increase as industries normally employed people in shifts. If a store was present in a resedntial area, it had more profits than the stores which had more competitiors in a ½ mile radius. This can be explained in a matter of probability as the competitor could possibly get a fraction of the customers. Also a store located in a densely populated area had more profits than the stores which had competitors around. This algorithm can be employed to alter the location of the stores in a way that it had no competitors anywhere around, had a manager with a good tenure and located in a populated area.

LEGEND FOR THE VARIABLES:

Profit - Fiscal Year 2000 Profit before corporate overhead allocations, rent, and depreciation

MTenure - Average manager tenure during FY-2000 where tenure is defined as the number of months of experience with Store24

CTenure - Average crew tenure during FY-2000 where tenure is defined as the number of months of experience with Store24

Comp - Number of competitors per 10,000 people within a ½ mile radius

Pop - Population within a ½ mile radius

Visible - 5-point rating on visibility of store front with 5 being the highest

PedCount - 5-point rating on pedestrian foot traffic volume with 5 being the highest

Hours24 - Indicator for open 24 hours or not

Res - Indicator for located in residential vs. industrial area