Using R, read the data into a data frame called store.df, then get the summary statistics of the data.

store.df <- read.csv("Store24.csv")
summary(store.df)
##      store          Sales             Profit          MTenure      
##  Min.   : 1.0   Min.   : 699306   Min.   :122180   Min.   :  0.00  
##  1st Qu.:19.5   1st Qu.: 984579   1st Qu.:211004   1st Qu.:  6.67  
##  Median :38.0   Median :1127332   Median :265014   Median : 24.12  
##  Mean   :38.0   Mean   :1205413   Mean   :276314   Mean   : 45.30  
##  3rd Qu.:56.5   3rd Qu.:1362388   3rd Qu.:331314   3rd Qu.: 50.92  
##  Max.   :75.0   Max.   :2113089   Max.   :518998   Max.   :277.99  
##     CTenure              Pop             Comp          Visibility  
##  Min.   :  0.8871   Min.   : 1046   Min.   : 1.651   Min.   :2.00  
##  1st Qu.:  4.3943   1st Qu.: 5616   1st Qu.: 3.151   1st Qu.:3.00  
##  Median :  7.2115   Median : 8896   Median : 3.629   Median :3.00  
##  Mean   : 13.9315   Mean   : 9826   Mean   : 3.788   Mean   :3.08  
##  3rd Qu.: 17.2156   3rd Qu.:14104   3rd Qu.: 4.230   3rd Qu.:4.00  
##  Max.   :114.1519   Max.   :26519   Max.   :11.128   Max.   :5.00  
##     PedCount         Res          Hours24       CrewSkill    
##  Min.   :1.00   Min.   :0.00   Min.   :0.00   Min.   :2.060  
##  1st Qu.:2.00   1st Qu.:1.00   1st Qu.:1.00   1st Qu.:3.225  
##  Median :3.00   Median :1.00   Median :1.00   Median :3.500  
##  Mean   :2.96   Mean   :0.96   Mean   :0.84   Mean   :3.457  
##  3rd Qu.:4.00   3rd Qu.:1.00   3rd Qu.:1.00   3rd Qu.:3.655  
##  Max.   :5.00   Max.   :1.00   Max.   :1.00   Max.   :4.640  
##     MgrSkill        ServQual     
##  Min.   :2.957   Min.   : 57.90  
##  1st Qu.:3.344   1st Qu.: 78.95  
##  Median :3.589   Median : 89.47  
##  Mean   :3.638   Mean   : 87.15  
##  3rd Qu.:3.925   3rd Qu.: 99.90  
##  Max.   :4.622   Max.   :100.00

Three very important variables in this analysis are the store Profit, the management tenure (MTenure) and the crew tenure (CTenure).

1

Use R to measure the mean and standard deviation of Profit.

mean(store.df$Profit)
## [1] 276313.6
sd(store.df$Profit)
## [1] 89404.08

2

Use R to measure the mean and standard deviation of MTenure.

mean(store.df$MTenure)
## [1] 45.29644
sd(store.df$MTenure)
## [1] 57.67155

3

Use R to measure the mean and standard deviation of CTenure.

mean(store.df$CTenure)
## [1] 13.9315
sd(store.df$CTenure)
## [1] 17.69752
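
The three pairs of calls above can also be collapsed into a single call. The sketch below is only an illustration: `toy` is a hypothetical three-row data frame whose values are copied from stores 74, 7 and 9 in this dataset, so the numbers it prints are not the full-sample statistics.

```r
# Hypothetical three-row stand-in for store.df (values of stores 74, 7 and 9)
toy <- data.frame(
  Profit  = c(518998, 476355, 474725),
  MTenure = c(171.09720, 62.53080, 108.99350),
  CTenure = c(29.519510, 7.326488, 6.061602)
)

# One row per statistic, one column per variable
stats <- sapply(toy, function(x) c(mean = mean(x), sd = sd(x)))
print(stats)
```

On the real data the same pattern would be `sapply(store.df[, c("Profit", "MTenure", "CTenure")], ...)`.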

4

Use R to print the {StoreID, Sales, Profit, MTenure, CTenure} of the top 10 most profitable stores.

mostprofitable <- store.df[order(-store.df$Profit),]
mostprofitable[1:10, 1:5]
##    store   Sales Profit   MTenure    CTenure
## 74    74 1782957 518998 171.09720  29.519510
## 7      7 1809256 476355  62.53080   7.326488
## 9      9 2113089 474725 108.99350   6.061602
## 6      6 1703140 469050 149.93590  11.351130
## 44    44 1807740 439781 182.23640 114.151900
## 2      2 1619874 424007  86.22219   6.636550
## 45    45 1602362 410149  47.64565   9.166325
## 18    18 1704826 394039 239.96980  33.774130
## 11    11 1583446 389886  44.81977   2.036961
## 47    47 1665657 387853  12.84790   6.636550

5

Use R to print the {StoreID, Sales, Profit, MTenure, CTenure} of the bottom 10 least profitable stores.

leastprofitable <- store.df[order(store.df$Profit),]
leastprofitable[1:10, 1:5]
##    store   Sales Profit     MTenure   CTenure
## 57    57  699306 122180  24.3485700  2.956879
## 66    66  879581 146058 115.2039000  3.876797
## 41    41  744211 147327  14.9180200 11.926080
## 55    55  925744 147672   6.6703910 18.365500
## 32    32  828918 149033  36.0792600  6.636550
## 13    13  857843 152513   0.6571813  1.577002
## 54    54  811190 159792   6.6703910  3.876797
## 52    52 1073008 169201  24.1185600  3.416838
## 61    61  716589 177046  21.8184200 13.305950
## 37    37 1202917 187765  23.1985000  1.347023
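
The two listings above select columns by position (1:5). A possible alternative, sketched below, selects them by name and uses head() so the code keeps working if the column order changes. `toy` is a hypothetical five-row subset built from rows shown in the listings above, not the full dataset.

```r
# Hypothetical five-row subset of store.df (stores 57, 74, 66, 7 and 9)
toy <- data.frame(
  store  = c(57, 74, 66, 7, 9),
  Sales  = c(699306, 1782957, 879581, 1809256, 2113089),
  Profit = c(122180, 518998, 146058, 476355, 474725)
)

cols <- c("store", "Sales", "Profit")

# Sort by Profit, then keep the first rows of the chosen columns
top2    <- head(toy[order(toy$Profit, decreasing = TRUE), cols], 2)
bottom2 <- head(toy[order(toy$Profit), cols], 2)

print(top2)
print(bottom2)
```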

6

Use R to draw a scatter plot of Profit vs. MTenure.

library(car)
scatterplot(Profit ~ MTenure, data=store.df, spread=FALSE, xlab="MTenure", ylab="Profit", main="Scatterplot of Profit vs MTenure")

7

Use R to draw a scatter plot of Profit vs. CTenure.

library(car)
scatterplot(Profit ~ CTenure, data=store.df, spread=FALSE, xlab="CTenure", ylab="Profit", main="Scatterplot of Profit vs CTenure")

8

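Use R to construct a Correlation Matrix for all the variables in the dataset. (Display the numbers up to 2 Decimal places)

Note that the call below uses method="kendall", so the matrix reports Kendall's tau. This is why its Profit entries (0.247 for MTenure, 0.192 for CTenure) differ from the Pearson correlations computed in 9 and 10.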

options(digits=2)
cor(store.df, use="complete.obs", method="kendall") 
##              store  Sales Profit MTenure CTenure    Pop   Comp Visibility
## store       1.0000 -0.157 -0.141 -0.0105 -0.0062 -0.186 -0.019    -0.0163
## Sales      -0.1575  1.000  0.779  0.2562  0.1387  0.200 -0.179     0.1457
## Profit     -0.1409  0.779  1.000  0.2467  0.1916  0.230 -0.259     0.1403
## MTenure    -0.0105  0.256  0.247  1.0000  0.0965 -0.042  0.121     0.0068
## CTenure    -0.0062  0.139  0.192  0.0965  1.0000 -0.128 -0.107     0.0500
## Pop        -0.1863  0.200  0.230 -0.0424 -0.1279  1.000 -0.113     0.0118
## Comp       -0.0191 -0.179 -0.259  0.1214 -0.1068 -0.113  1.000     0.0670
## Visibility -0.0163  0.146  0.140  0.0068  0.0500  0.012  0.067     1.0000
## PedCount   -0.1378  0.315  0.317 -0.0017 -0.0538  0.461 -0.215    -0.1127
## Res        -0.0258 -0.134 -0.150  0.0442 -0.1013 -0.170  0.186     0.0162
## Hours24     0.0221  0.073  0.023 -0.0882  0.0215 -0.238  0.102     0.0381
## CrewSkill  -0.0319  0.113  0.107  0.1180  0.1675  0.158 -0.046    -0.1783
## MgrSkill   -0.0621  0.175  0.149  0.1873  0.0152  0.033  0.170     0.0059
## ServQual   -0.2278  0.277  0.251  0.1697  0.0578  0.065  0.059     0.1581
##            PedCount    Res Hours24 CrewSkill MgrSkill ServQual
## store       -0.1378 -0.026  0.0221    -0.032  -0.0621   -0.228
## Sales        0.3148 -0.134  0.0732     0.113   0.1755    0.277
## Profit       0.3173 -0.150  0.0235     0.107   0.1495    0.251
## MTenure     -0.0017  0.044 -0.0882     0.118   0.1873    0.170
## CTenure     -0.0538 -0.101  0.0215     0.167   0.0152    0.058
## Pop          0.4606 -0.170 -0.2375     0.158   0.0332    0.065
## Comp        -0.2154  0.186  0.1022    -0.046   0.1704    0.059
## Visibility  -0.1127  0.016  0.0381    -0.178   0.0059    0.158
## PedCount     1.0000 -0.255 -0.2850     0.123   0.0490   -0.054
## Res         -0.2553  1.000 -0.0891    -0.158  -0.0311    0.088
## Hours24     -0.2850 -0.089  1.0000     0.140   0.0042    0.045
## CrewSkill    0.1229 -0.158  0.1395     1.000   0.0501   -0.013
## MgrSkill     0.0490 -0.031  0.0042     0.050   1.0000    0.241
## ServQual    -0.0536  0.088  0.0447    -0.013   0.2407    1.000
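
The matrix above still shows three or four decimals because options(digits=2) sets the number of significant digits, not the number of decimal places. A minimal sketch of one way to get exactly two decimals is to round the matrix itself. `toy` is a hypothetical five-row subset of the data (stores 74, 7, 9, 57 and 66), so its correlations are not those of the full sample.

```r
# Hypothetical five-row subset of store.df
toy <- data.frame(
  Profit  = c(518998, 476355, 474725, 122180, 146058),
  MTenure = c(171.09720, 62.53080, 108.99350, 24.34857, 115.20390),
  CTenure = c(29.519510, 7.326488, 6.061602, 2.956879, 3.876797)
)

# cor() defaults to Pearson; pass method = "kendall" to match the call above
cm <- round(cor(toy, use = "complete.obs"), 2)
print(cm)
```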

9

Use R to measure the correlation between Profit and MTenure. (Display the numbers up to 2 Decimal places)

options(digits=2)
cor(store.df$Profit, store.df$MTenure)
## [1] 0.44

10

Use R to measure the correlation between Profit and CTenure. (Display the numbers up to 2 Decimal places)

options(digits=2)
cor(store.df$Profit, store.df$CTenure)
## [1] 0.26

11

Use R to construct a Corrgram based on all variables in the dataset.

library(corrgram)
corrgram(store.df, order=FALSE, 
         lower.panel=panel.shade,
         upper.panel=panel.pie, 
         diag.panel=panel.minmax,
         text.panel=panel.txt,
         main="Corrgram of store variables")

12

Run a Pearson’s Correlation test on the correlation between Profit and MTenure.

cor.test(store.df[,"Profit"], store.df[,"MTenure"])
## 
##  Pearson's product-moment correlation
## 
## data:  store.df[, "Profit"] and store.df[, "MTenure"]
## t = 4, df = 70, p-value = 8e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.24 0.61
## sample estimates:
##  cor 
## 0.44

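The p-value is 8e-05, well below 0.05, so the correlation is significant at the 5% level. The printout is shortened by the options(digits=2) setting made earlier: with 75 stores the test has 75 - 2 = 73 degrees of freedom, which is displayed here as 70.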

13

Run a Pearson’s Correlation test on the correlation between Profit and CTenure.

cor.test(store.df[,"Profit"], store.df[,"CTenure"])
## 
##  Pearson's product-moment correlation
## 
## data:  store.df[, "Profit"] and store.df[, "CTenure"]
## t = 2, df = 70, p-value = 0.03
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.033 0.458
## sample estimates:
##  cor 
## 0.26

The p-value is 0.03, which is below 0.05, so the correlation is significant at the 5% level.
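
Because the printed test output is affected by the digits option, it can be safer to read the numbers from the object that cor.test() returns. The sketch below uses the Profit and MTenure values of the ten most profitable stores listed in 4 as a hypothetical sample, so its results are not those of the full dataset.

```r
# Hypothetical sample: the ten most profitable stores from question 4
profit  <- c(518998, 476355, 474725, 469050, 439781,
             424007, 410149, 394039, 389886, 387853)
mtenure <- c(171.09720, 62.53080, 108.99350, 149.93590, 182.23640,
             86.22219, 47.64565, 239.96980, 44.81977, 12.84790)

res <- cor.test(profit, mtenure)   # Pearson by default

# Components of the returned "htest" object, unaffected by print rounding
res$estimate    # sample correlation
res$parameter   # degrees of freedom (n - 2)
res$p.value     # two-sided p-value
res$conf.int    # 95 percent confidence interval
```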

14

Run a regression of Profit on {MTenure, CTenure, Comp, Pop, PedCount, Res, Hours24, Visibility}.

reg_Profit <- lm(Profit ~ 
           MTenure 
         + CTenure 
         + Pop
         + PedCount 
         + Res
         + Visibility
         + Hours24 
         + Comp, 
         data=store.df)
summary(reg_Profit)
## 
## Call:
## lm(formula = Profit ~ MTenure + CTenure + Pop + PedCount + Res + 
##     Visibility + Hours24 + Comp, data = store.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -105789  -35946   -7069   33780  112390 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7610.04   66821.99    0.11  0.90967    
## MTenure        760.99     127.09    5.99  9.7e-08 ***
## CTenure        944.98     421.69    2.24  0.02840 *  
## Pop              3.67       1.47    2.50  0.01489 *  
## PedCount     34087.36    9073.20    3.76  0.00037 ***
## Res          91584.68   39231.28    2.33  0.02262 *  
## Visibility   12625.45    9087.62    1.39  0.16941    
## Hours24      63233.31   19641.11    3.22  0.00199 ** 
## Comp        -25286.89    5491.94   -4.60  1.9e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 57000 on 66 degrees of freedom
## Multiple R-squared:  0.638,  Adjusted R-squared:  0.594 
## F-statistic: 14.5 on 8 and 66 DF,  p-value: 5.38e-12

15 & 16

reg_Profit$coefficients
## (Intercept)     MTenure     CTenure         Pop    PedCount         Res 
##      7610.0       761.0       945.0         3.7     34087.4     91584.7 
##  Visibility     Hours24        Comp 
##     12625.4     63233.3    -25286.9

Since PedCount, Visibility, Hours24 and Res take only a few discrete values, it is better to convert them into factor variables, so that the regression estimates a separate effect for each level instead of assuming a linear trend.

store.df$PedCount <- factor(store.df$PedCount)
store.df$Visibility <- factor(store.df$Visibility)
store.df$Hours24 <- factor(store.df$Hours24)


# Recode 0/1 as labelled levels in one step
# (avoids coercing the column to character before the second assignment)
store.df$Res <- factor(store.df$Res, levels=c(0, 1), labels=c("Ind", "Res"))
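
To see what the factor conversion does to the regression, it can help to look at the design matrix that lm() builds. The sketch below uses a hypothetical four-row data frame `d`. Each factor gets one 0/1 column per level except the first, which becomes the baseline, and this is where coefficient names such as PedCount2 and ResRes come from.

```r
# Hypothetical four-row example, not the real data
d <- data.frame(
  PedCount = factor(c(1, 2, 3, 4)),
  Res      = factor(c(1, 0, 1, 1), levels = c(0, 1), labels = c("Ind", "Res"))
)

levels(d$Res)   # "Ind" is the baseline level

# Design matrix: dummy columns are named <variable><level>
mm <- model.matrix(~ PedCount + Res, data = d)
print(colnames(mm))
```

Each dummy coefficient is then read as the difference from the baseline level (PedCount = 1, Res = "Ind"), holding the other predictors fixed.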

Now run the regression again with the factor variables:

reg_Profit <- lm(Profit ~ MTenure + CTenure + Pop + PedCount + Res + Visibility + Hours24 + Comp, data=store.df)

summary(reg_Profit)
## 
## Call:
## lm(formula = Profit ~ MTenure + CTenure + Pop + PedCount + Res + 
##     Visibility + Hours24 + Comp, data = store.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -117216  -33066   -7849   34399  110419 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  48476.21   61471.75    0.79   0.4334    
## MTenure        810.97     145.02    5.59  5.6e-07 ***
## CTenure       1016.02     443.76    2.29   0.0255 *  
## Pop              3.22       1.56    2.07   0.0428 *  
## PedCount2    67304.86   31677.19    2.12   0.0377 *  
## PedCount3    94474.47   32160.24    2.94   0.0047 ** 
## PedCount4   115585.56   34704.23    3.33   0.0015 ** 
## PedCount5   186176.93   52125.82    3.57   0.0007 ***
## ResRes       92247.44   40918.29    2.25   0.0278 *  
## Visibility3  17696.74   19903.86    0.89   0.3774    
## Visibility4  27823.71   22395.95    1.24   0.2189    
## Visibility5  34400.48   49081.13    0.70   0.4860    
## Hours241     57108.15   21767.66    2.62   0.0110 *  
## Comp        -26201.68    5877.72   -4.46  3.6e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 58200 on 61 degrees of freedom
## Multiple R-squared:  0.651,  Adjusted R-squared:  0.576 
## F-statistic: 8.74 on 13 and 61 DF,  p-value: 1.07e-09

17 & 18

reg_Profit$coefficients
## (Intercept)     MTenure     CTenure         Pop   PedCount2   PedCount3 
##     48476.2       811.0      1016.0         3.2     67304.9     94474.5 
##   PedCount4   PedCount5      ResRes Visibility3 Visibility4 Visibility5 
##    115585.6    186176.9     92247.4     17696.7     27823.7     34400.5 
##    Hours241        Comp 
##     57108.2    -26201.7

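Answer (MTenure coefficient): $810.97, the expected change in Profit for a one-unit increase in MTenure, holding the other variables fixed.

Answer (CTenure coefficient): $1016.02, the expected change in Profit for a one-unit increase in CTenure, holding the other variables fixed.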
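
As a rough illustration of how these two coefficients are used, the sketch below computes the predicted change in Profit when both tenure variables rise at the same time. The increase of 6 units is a hypothetical scenario chosen only for the example. The result is a prediction from the fitted linear model, not a causal estimate.

```r
# Coefficients copied from the second regression above
coefs <- c(MTenure = 810.971201, CTenure = 1016.017324)

# Hypothetical scenario: both tenure variables increase by 6 units
delta <- c(MTenure = 6, CTenure = 6)

# In a linear model the predicted change is the sum of coefficient * change
change <- sum(coefs * delta)
print(change)
```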

Executive Summary