Loading the Store24.csv and Viewing the file
setwd("~/SIP/SIP Phase 2/R Programming/Udemy Class Material/Week 3")
store.df <- read.csv("Store24.csv")
View(store.df)
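`View()` opens a spreadsheet-style viewer, which suits an interactive session. For a quick console check after loading, `str()`, `head()` and `dim()` are common alternatives. A minimal sketch, using a small temporary CSV with made-up values (a hypothetical stand-in, not the real Store24 data) so that it runs on its own:

```r
# Hypothetical three-row stand-in for Store24.csv (values are made up)
tmp <- tempfile(fileext = ".csv")
writeLines(c("store,Sales,Profit",
             "1,1000000,250000",
             "2,1200000,300000",
             "3,900000,180000"), tmp)

demo.df <- read.csv(tmp)

str(demo.df)   # column names and types
head(demo.df)  # first rows
dim(demo.df)   # number of rows and columns
```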
Measuring Mean and Standard Deviation of Profit, MTenure, CTenure
library(psych)
## Warning: package 'psych' was built under R version 3.4.3
describe(store.df$Profit)
## vars n mean sd median trimmed mad min max range
## X1 1 75 276313.6 89404.08 265014 270260.3 90532 122180 518998 396818
## skew kurtosis se
## X1 0.62 -0.21 10323.49
describe(store.df$MTenure)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 75 45.3 57.67 24.12 33.58 29.67 0 277.99 277.99 2.01 3.9
## se
## X1 6.66
describe(store.df$CTenure)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 75 13.93 17.7 7.21 10.6 6.14 0.89 114.15 113.26 3.52 15
## se
## X1 2.04
The means and standard deviations are:
Profit: mean 276313.6, SD 89404.08
MTenure: mean 45.3, SD 57.67
CTenure: mean 13.93, SD 17.7
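If only the mean and standard deviation are needed, base R can compute both for several columns in one call. A minimal sketch, assuming a small made-up data frame (`demo.df` and its values are hypothetical, not taken from Store24.csv):

```r
# Hypothetical stand-in data; replace demo.df with store.df for the real analysis
demo.df <- data.frame(
  Profit  = c(250000, 300000, 180000, 410000),
  MTenure = c(24, 60, 6, 120),
  CTenure = c(7, 12, 3, 30)
)

# One column per variable, one row per statistic
stats <- sapply(demo.df[, c("Profit", "MTenure", "CTenure")],
                function(x) c(mean = mean(x), sd = sd(x)))
round(stats, 2)
```

This returns a 2 x 3 matrix, which is easier to read off than three separate `describe()` calls.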
Printing the {StoreID, Sales, Profit, MTenure, CTenure} of the top 10 most profitable stores.
# order(-Profit) sorts in descending order, so the most profitable stores come first.
# Columns 1:4 are store, Sales, Profit, MTenure; CTenure is column 5 and is not selected here.
Top10 <- store.df[order(-store.df$Profit), c(1:4)]
Top10[1:10,]
## store Sales Profit MTenure
## 74 74 1782957 518998 171.09720
## 7 7 1809256 476355 62.53080
## 9 9 2113089 474725 108.99350
## 6 6 1703140 469050 149.93590
## 44 44 1807740 439781 182.23640
## 2 2 1619874 424007 86.22219
## 45 45 1602362 410149 47.64565
## 18 18 1704826 394039 239.96980
## 11 11 1583446 389886 44.81977
## 47 47 1665657 387853 12.84790
Printing the {StoreID, Sales, Profit, MTenure, CTenure} of the 10 least profitable stores.
# order(Profit) sorts in ascending order, so the least profitable stores come first.
Least10 <- store.df[order(store.df$Profit), c(1:4)]
Least10[1:10,]
## store Sales Profit MTenure
## 57 57 699306 122180 24.3485700
## 66 66 879581 146058 115.2039000
## 41 41 744211 147327 14.9180200
## 55 55 925744 147672 6.6703910
## 32 32 828918 149033 36.0792600
## 13 13 857843 152513 0.6571813
## 54 54 811190 159792 6.6703910
## 52 52 1073008 169201 24.1185600
## 61 61 716589 177046 21.8184200
## 37 37 1202917 187765 23.1985000
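Selecting columns by name rather than by position makes it harder to drop a column such as CTenure by accident, and `head()` avoids a separate subsetting step. A minimal sketch on simulated data (`demo.df` and all of its values are hypothetical):

```r
# Simulated stand-in for store.df; values are random, not real Store24 data
set.seed(1)
demo.df <- data.frame(
  store   = 1:20,
  Sales   = round(runif(20, 7e5, 2e6)),
  Profit  = round(runif(20, 1e5, 5e5)),
  MTenure = round(runif(20, 0, 200), 1),
  CTenure = round(runif(20, 1, 60), 1)
)

cols <- c("store", "Sales", "Profit", "MTenure", "CTenure")

# Descending order of Profit: most profitable first
top10 <- head(demo.df[order(-demo.df$Profit), cols], 10)

# Ascending order of Profit: least profitable first
bottom10 <- head(demo.df[order(demo.df$Profit), cols], 10)

top10
bottom10
```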
Scatter plot of Profit vs. MTenure with a simple linear regression between both
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.3
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
plot(store.df$MTenure, store.df$Profit, xlab = "Manager Tenure (Months)", ylab = "Store Profit ($)", main = "Store Profit Vs. Manager Tenure", cex = 0.6)
abline(lm(Profit ~ MTenure, data = store.df), col = "blue")

Scatter plot of Profit vs. CTenure with a simple linear regression between both
plot(store.df$CTenure, store.df$Profit, xlab = "Crew Tenure (Months)", ylab = "Store Profit ($)", main = "Store Profit Vs. Crew Tenure", cex = 0.6)
abline(lm(Profit~CTenure, data=store.df), col="blue")
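Because the two scatter plots differ only in the predictor, a small helper can build the axis label and title from one argument, which keeps them consistent. This is a sketch only: `plot_profit_vs` is a hypothetical helper name, and the data are simulated rather than taken from Store24.csv.

```r
# Hypothetical helper: scatter plot of Profit against one predictor,
# with the simple linear regression line overlaid
plot_profit_vs <- function(df, xvar, xlab) {
  plot(df[[xvar]], df$Profit,
       xlab = paste0(xlab, " (Months)"),
       ylab = "Store Profit ($)",
       main = paste("Store Profit Vs.", xlab),
       cex = 0.6)
  fit <- lm(reformulate(xvar, response = "Profit"), data = df)
  abline(fit, col = "blue")
  invisible(fit)  # return the fitted model without printing it
}

# Simulated stand-in data (not real Store24 values)
set.seed(42)
demo.df <- data.frame(MTenure = runif(30, 0, 120),
                      CTenure = runif(30, 0, 40))
demo.df$Profit <- 200000 + 800 * demo.df$MTenure +
  900 * demo.df$CTenure + rnorm(30, sd = 30000)

plot_profit_vs(demo.df, "MTenure", "Manager Tenure")
plot_profit_vs(demo.df, "CTenure", "Crew Tenure")
```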

Correlation Matrix for all the variables in the dataset.
round(cor(store.df), 2)
## store Sales Profit MTenure CTenure Pop Comp Visibility
## store 1.00 -0.23 -0.20 -0.06 0.02 -0.29 0.03 -0.03
## Sales -0.23 1.00 0.92 0.45 0.25 0.40 -0.24 0.13
## Profit -0.20 0.92 1.00 0.44 0.26 0.43 -0.33 0.14
## MTenure -0.06 0.45 0.44 1.00 0.24 -0.06 0.18 0.16
## CTenure 0.02 0.25 0.26 0.24 1.00 0.00 -0.07 0.07
## Pop -0.29 0.40 0.43 -0.06 0.00 1.00 -0.27 -0.05
## Comp 0.03 -0.24 -0.33 0.18 -0.07 -0.27 1.00 0.03
## Visibility -0.03 0.13 0.14 0.16 0.07 -0.05 0.03 1.00
## PedCount -0.22 0.42 0.45 0.06 -0.08 0.61 -0.15 -0.14
## Res -0.03 -0.17 -0.16 -0.06 -0.34 -0.24 0.22 0.02
## Hours24 0.03 0.06 -0.03 -0.17 0.07 -0.22 0.13 0.05
## CrewSkill 0.05 0.16 0.16 0.10 0.26 0.28 -0.04 -0.20
## MgrSkill -0.07 0.31 0.32 0.23 0.12 0.08 0.22 0.07
## ServQual -0.32 0.39 0.36 0.18 0.08 0.12 0.02 0.21
## PedCount Res Hours24 CrewSkill MgrSkill ServQual
## store -0.22 -0.03 0.03 0.05 -0.07 -0.32
## Sales 0.42 -0.17 0.06 0.16 0.31 0.39
## Profit 0.45 -0.16 -0.03 0.16 0.32 0.36
## MTenure 0.06 -0.06 -0.17 0.10 0.23 0.18
## CTenure -0.08 -0.34 0.07 0.26 0.12 0.08
## Pop 0.61 -0.24 -0.22 0.28 0.08 0.12
## Comp -0.15 0.22 0.13 -0.04 0.22 0.02
## Visibility -0.14 0.02 0.05 -0.20 0.07 0.21
## PedCount 1.00 -0.28 -0.28 0.21 0.09 -0.01
## Res -0.28 1.00 -0.09 -0.15 -0.03 0.09
## Hours24 -0.28 -0.09 1.00 0.11 -0.04 0.06
## CrewSkill 0.21 -0.15 0.11 1.00 -0.02 -0.03
## MgrSkill 0.09 -0.03 -0.04 -0.02 1.00 0.36
## ServQual -0.01 0.09 0.06 -0.03 0.36 1.00
Correlation between Profit and MTenure, Correlation between Profit and CTenure.
round(cor(store.df$Profit, store.df$MTenure), 2)
## [1] 0.44
round(cor(store.df$Profit, store.df$CTenure), 2)
## [1] 0.26
Corrgram based on all variables in the dataset.
library(corrgram)
## Warning: package 'corrgram' was built under R version 3.4.3
corrgram(store.df, order=TRUE, lower.panel=panel.shade, upper.panel=panel.pie, text.panel=panel.txt, main="Corrgram of Store24 variables")

Pearson's correlation test on the correlation between Profit and MTenure, and between Profit and CTenure.
# cor.test() tests whether the Pearson correlation differs from zero.
# chisq.test() is a test of independence for categorical counts and is not suitable for two continuous variables.
cor.test(store.df$Profit, store.df$MTenure, method = "pearson")
cor.test(store.df$Profit, store.df$CTenure, method = "pearson")
Each call reports the t statistic, the degrees of freedom, the p-value, and a 95% confidence interval for the correlation.
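For reference, the Pearson correlation test uses the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below applies that formula to the rounded correlations printed earlier (0.44 and 0.26) with n = 75 from the `describe()` output. Because r is rounded, the resulting p-values are approximate, and `r_to_p` is a hypothetical helper name; `cor.test()` on the data itself remains the authoritative result.

```r
# Hypothetical helper: two-sided p-value for a Pearson correlation r from n pairs
r_to_p <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  p <- 2 * pt(-abs(t_stat), df = n - 2)
  c(t = t_stat, df = n - 2, p = p)
}

r_to_p(0.44, 75)  # Profit vs. MTenure
r_to_p(0.26, 75)  # Profit vs. CTenure
```

Under these assumptions both correlations come out significant at the 5% level, with the MTenure correlation the stronger of the two.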
Regression of Profit on {MTenure, CTenure, Comp, Pop, PedCount, Res, Hours24, Visibility}.
fit <- lm(Profit ~ MTenure+CTenure+Comp+Pop+PedCount+Res+Hours24+Visibility, data=store.df)
summary(fit)
##
## Call:
## lm(formula = Profit ~ MTenure + CTenure + Comp + Pop + PedCount +
## Res + Hours24 + Visibility, data = store.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -105789 -35946 -7069 33780 112390
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7610.041 66821.994 0.114 0.909674
## MTenure 760.993 127.086 5.988 9.72e-08 ***
## CTenure 944.978 421.687 2.241 0.028400 *
## Comp -25286.887 5491.937 -4.604 1.94e-05 ***
## Pop 3.667 1.466 2.501 0.014890 *
## PedCount 34087.359 9073.196 3.757 0.000366 ***
## Res 91584.675 39231.283 2.334 0.022623 *
## Hours24 63233.307 19641.114 3.219 0.001994 **
## Visibility 12625.447 9087.620 1.389 0.169411
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 56970 on 66 degrees of freedom
## Multiple R-squared: 0.6379, Adjusted R-squared: 0.594
## F-statistic: 14.53 on 8 and 66 DF, p-value: 5.382e-12
MTenure, CTenure, Comp, Pop, PedCount, Res and Hours24 are the explanatory variables whose coefficients are statistically significant (p < 0.05).
Visibility is the only explanatory variable whose coefficient is not statistically significant (p = 0.169).
Holding the other regressors constant, one additional month of manager tenure is associated with an expected increase in store profit of about $760.99.
Holding the other regressors constant, one additional month of crew tenure is associated with an expected increase in store profit of about $944.98.
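The point estimates are easier to interpret alongside their uncertainty. The sketch below rebuilds approximate 95% confidence intervals for the two tenure coefficients from the estimates, standard errors and residual degrees of freedom (66) printed in the summary, and scales the estimates to a 12-month change. The data frame `coefs` and its column names are introduced here for illustration; with the fitted model available, `confint(fit)` should give the same intervals directly.

```r
# Estimates and standard errors copied from the summary(fit) output
coefs <- data.frame(
  term     = c("MTenure", "CTenure"),
  estimate = c(760.993, 944.978),
  se       = c(127.086, 421.687)
)

df_resid <- 66                        # residual degrees of freedom
t_crit   <- qt(0.975, df = df_resid)  # two-sided 95% critical value

coefs$lower    <- coefs$estimate - t_crit * coefs$se
coefs$upper    <- coefs$estimate + t_crit * coefs$se
coefs$per_year <- 12 * coefs$estimate  # expected change for 12 extra months

coefs
```

The interval for CTenure is much wider than the one for MTenure, which matches its larger standard error and weaker significance in the summary.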