This is an assignment given on Week 3, Day 1 of the Data Analytics Internship under Prof. Sameer Mathur, IIML.
Qualitatively identify the crucial issues being faced by the management. Based on your judgment, prepare a list of the most important questions that matter.
1. How important manager and crew tenure are, relative to people factors such as employee skill and experience, in optimizing a given site’s performance.
2. How important manager and crew tenure are, relative to site-location factors such as population, number of competitors, and pedestrian access, in determining store-level financial performance.
3. Whether increasing wages, implementing a bonus program, instituting new training programs, or developing a career development program would be the best course of action for increasing employee tenure.
Think about how a dataset associated with Exhibit 2 might help in answering your list of questions from TASK 4a.
1. The relationships among MgrSkill, CrewSkill, MTenure and CTenure can be used to establish whether employee skill and experience have a positive impact on sales and profit.
2. The relationships among Pop, Comp, Visibility, PedCount, Hours24, Res, MTenure and CTenure can be used to establish whether an increase in tenure has a positive impact on sales and profit once site-location factors are taken into account.
3. The relationships among Sales, Profit, MTenure and CTenure can be used to establish whether an increase in tenure has a positive impact on sales and profit.
Download and review the Store24.csv data file associated with this case. Using R, read the data into a data frame called store.
store <- read.csv("Store24.csv", header = TRUE)  # comma is the default separator
View(store)
Using R, get the summary statistics of the data. Confirm that the summary statistics generated from R are consistent with Exhibit 3 from the Case.
summary(store)
Use R to measure the mean and standard deviation of Profit.
library(psych)  # describe() comes from the psych package
describe(store$Profit)[,3:4]  # columns 3 and 4 are mean and sd
Use R to measure the mean and standard deviation of MTenure.
describe(store$MTenure)[,3:4]
Use R to measure the mean and standard deviation of CTenure.
describe(store$CTenure)[,3:4]
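As a cross-check that does not depend on the psych package, the same two statistics can be computed with base R's mean() and sd(). The sketch below uses a small made-up data frame called demo (hypothetical values, not the Store24 data); with the real data, store would take its place.

```r
# Hypothetical stand-in for the store data frame; values are invented.
demo <- data.frame(
  Profit  = c(100, 200, 300, 400, 500),
  MTenure = c(10, 20, 30, 40, 50),
  CTenure = c(2, 4, 6, 8, 10)
)

# Apply mean() and sd() to each column of interest in one pass.
# The result is a matrix with rows "mean" and "sd" and one column per variable.
stats <- sapply(demo[, c("Profit", "MTenure", "CTenure")],
                function(x) c(mean = mean(x), sd = sd(x)))
round(stats, 2)
```

Computing all three variables in one call keeps the output in a single table, which is convenient when comparing against a printed exhibit.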
Sorting and Subsetting data in R.
attach(mtcars)
View(mtcars)
newdata <- mtcars[order(mpg),] # sort by mpg (ascending)
View(newdata)
newdata[1:5,] # see the first 5 rows
newdata <- mtcars[order(-mpg),] # sort by mpg (descending)
View(newdata)
detach(mtcars)
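The same sorting can be done without attach(), which avoids leaving column names on the search path where they can mask other objects. The following is a minimal sketch on the built-in mtcars data; the object names (by_mpg_asc, by_cyl_mpg, efficient) are illustrative choices, not part of the assignment.

```r
# Sort by mpg, referring to the column through the data frame itself.
by_mpg_asc  <- mtcars[order(mtcars$mpg), ]                     # ascending
by_mpg_desc <- mtcars[order(mtcars$mpg, decreasing = TRUE), ]  # descending

# Sort by two keys: cyl ascending, then mpg descending within each cyl group.
by_cyl_mpg <- mtcars[order(mtcars$cyl, -mtcars$mpg), ]

# Subset rows and columns in one step: cars above 25 mpg, three columns only.
efficient <- subset(mtcars, mpg > 25, select = c(mpg, cyl, wt))

head(by_mpg_asc, 5)  # first 5 rows of the ascending sort
```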
Replicate Exhibit 1 shown in the case, using R.
Use R to print the {StoreID, Sales, Profit, MTenure, CTenure} of the top 10 most profitable stores.
top.df <- store[order(-store$Profit),]
top.df[1:10,1:5]
Use R to print the {StoreID, Sales, Profit, MTenure, CTenure} of the bottom 10 least profitable stores.
bottom.df <- store[order(store$Profit),]
bottom.df[1:10,1:5]
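Selecting columns by position (1:5) only works while the column order stays fixed. A hedged alternative is to select by name, as sketched below on a made-up data frame. The column names are taken from the task statement and the values are invented; the actual header of Store24.csv may name the ID column differently.

```r
# Hypothetical miniature data frame; names follow the task statement,
# values are invented for illustration only.
demo <- data.frame(
  StoreID = 1:6,
  Sales   = c(900, 1200, 800, 1500, 1100, 700),
  Profit  = c(120, 300, 90, 410, 250, 60),
  MTenure = c(5, 40, 2, 80, 30, 1),
  CTenure = c(3, 10, 2, 12, 8, 1),
  Pop     = c(1000, 5000, 800, 9000, 4000, 600)
)

cols <- c("StoreID", "Sales", "Profit", "MTenure", "CTenure")
n <- 3  # would be 10 for the real task

# Order by Profit, keep the named columns, then take the first n rows.
top    <- head(demo[order(-demo$Profit), cols], n)  # most profitable first
bottom <- head(demo[order(demo$Profit), cols], n)   # least profitable first
top
bottom
```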
Scatter Plots. Use R to draw a scatter plot of Profit vs. MTenure.
library(car)
scatterplot(store$MTenure, store$Profit,
            xlab = "MTenure", ylab = "Profit",
            main = "Scatterplot of Profit vs. MTenure")
Scatter Plots. Use R to draw a scatter plot of Profit vs. CTenure.
library(car)
scatterplot(store$CTenure, store$Profit,
            xlab = "CTenure", ylab = "Profit",
            main = "Scatterplot of Profit vs. CTenure")
Correlation Matrix. Use R to construct a Correlation Matrix for all the variables in the dataset.
round(cor(store), digits = 2)
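cor() only accepts numeric columns, so the call fails if the data frame contains text or factor columns. Whether Store24.csv has any such columns is not stated here. As a hedged sketch, the built-in iris data (which has one factor column) shows how to keep only the numeric columns before building the matrix.

```r
# iris has four numeric columns and one factor column (Species).
# Keep only the numeric columns before calling cor().
numeric_only <- iris[sapply(iris, is.numeric)]

# Correlation matrix rounded to two decimals, as in the task.
cm <- round(cor(numeric_only), digits = 2)
cm
```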
Correlations. Use R to measure the correlation between Profit and MTenure.
round(cor(store$Profit, store$MTenure), digits = 2)
Use R to measure the correlation between Profit and CTenure.
round(cor(store$Profit, store$CTenure), digits = 2)
Use R to construct the following Corrgram based on all variables in the dataset.
library(corrgram)
corrgram(store, lower.panel=panel.shade,
upper.panel=panel.pie, text.panel=panel.txt,
main="Corrgram of Store Variables")
The corrgram shows that Profit is most strongly correlated with MTenure, Comp, Pop, PedCount and Hours24. In other words, manager tenure, the number of competitors, population, pedestrian count and 24-hour opening are strongly associated with profit. Each of these variables may be positively or negatively correlated with Profit, and knowing the direction helps a manager decide which of these parameters to act on to increase sales and hence the profit of the store.
Pearson’s Correlation Tests. Run a Pearson’s Correlation test on the correlation between Profit and MTenure.
cor.test(store$Profit , store$MTenure)
The p-value is 8.193e-05.
Run a Pearson’s Correlation test on the correlation between Profit and CTenure.
cor.test(store$Profit , store$CTenure)
The p-value is 0.02562.
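Rather than copying the p-value from the printed output, it can be read from the object that cor.test() returns. This is a minimal sketch on the built-in mtcars data, not on the Store24 data; the variable names r, p, ci and significant are illustrative.

```r
# cor.test() runs Pearson's test by default and returns a list-like object.
res <- cor.test(mtcars$mpg, mtcars$wt)

r  <- unname(res$estimate)  # sample correlation coefficient
p  <- res$p.value           # p-value of the test that the true correlation is 0
ci <- res$conf.int          # 95% confidence interval for the correlation

# Decision at the 5% level used in this assignment.
significant <- p < 0.05
significant
```

For the store data, the same fields would be read from cor.test(store$Profit, store$MTenure).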
Regression Analysis. Run a regression of Profit on {MTenure, CTenure, Comp, Pop, PedCount, Res, Hours24, Visibility}.
model <- lm(Profit ~ MTenure + CTenure + Comp + Pop + PedCount + Res + Hours24 + Visibility, data = store)
summary(model)
Based on TASK 3m, answer the following questions. The explanatory variables whose beta-coefficients are statistically significant (p < 0.05) are MTenure, CTenure, Comp, Pop, PedCount, Res and Hours24. The explanatory variable whose beta-coefficient is NOT statistically significant (p > 0.05) is Visibility.
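The split into significant and non-significant predictors can also be read from the coefficient table instead of by eye. The sketch below assumes a 0.05 threshold and uses the built-in mtcars data with an arbitrary illustrative model, not the Store24 regression.

```r
# Illustrative model on built-in data; the predictors are arbitrary choices.
fit <- lm(mpg ~ wt + hp + drat, data = mtcars)

# The coefficient table has one row per term and a "Pr(>|t|)" column.
coefs <- summary(fit)$coefficients
pvals <- coefs[, "Pr(>|t|)"]
pvals <- pvals[names(pvals) != "(Intercept)"]  # drop the intercept row

significant     <- names(pvals)[pvals < 0.05]
not_significant <- names(pvals)[pvals >= 0.05]
significant
not_significant
```

Applied to the object named model above, the same lines would list the Store24 predictors on each side of the threshold.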
Based on TASK 2m, answer the following questions: What is the expected change in Profit at a store if the manager’s tenure, i.e. the number of months of experience with Store24, increases by one month?
round(summary(model)$coefficients["MTenure",1], digits=0)
What is the expected change in Profit at a store if the crew’s tenure, i.e. the number of months of experience with Store24, increases by one month?
round(summary(model)$coefficients["CTenure",1], digits=0)
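The reason a coefficient answers these questions is that, in a linear model, raising one predictor by one unit while holding the others fixed changes the prediction by exactly that predictor's coefficient. The following sketch checks this on the built-in mtcars data; the model and the values in base and bumped are arbitrary illustrations.

```r
# Illustrative linear model on built-in data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Two hypothetical cars that differ only by one unit of hp.
base   <- data.frame(wt = 3, hp = 110)
bumped <- data.frame(wt = 3, hp = 111)

# Difference in predicted mpg between the two cars.
delta <- unname(predict(fit, bumped) - predict(fit, base))

# The difference matches the hp coefficient (up to floating-point error).
all.equal(delta, unname(coef(fit)["hp"]))
```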
1. The most profitable store has ID 74 and the least profitable store has ID 57.
2. The correlation between Profit and CTenure is 0.26, and between Profit and MTenure it is 0.44.
3. The p-value of the regression is significant (p < 0.05), which suggests that the model is a good fit.
4. The R-squared value is 0.6379, so 63.79% of the variation in the dependent variable can be explained by the independent variables.
5. The adjusted R-squared value is 0.594, meaning that about 59.4% of the variation in the dependent variable is explained by the independent variables after adjusting for the number of predictors. Unlike R-squared, it decreases when an added variable does not improve the fit enough.
6. The explanatory variables whose beta-coefficients are statistically significant are MTenure, CTenure, Comp, Pop, PedCount, Res and Hours24, while the one whose beta-coefficient is not statistically significant is Visibility.
From this analysis we can say that the model is a good fit.
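The adjusted R-squared discussed in point 5 is derived from R-squared by penalizing the number of predictors: 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. As a hedged illustration on the built-in mtcars data (not the Store24 regression), the formula can be checked against the value that summary() reports.

```r
# Illustrative model on built-in data.
fit <- lm(mpg ~ wt + hp, data = mtcars)
s   <- summary(fit)

n <- nrow(mtcars)            # number of observations
p <- length(coef(fit)) - 1   # number of predictors, excluding the intercept

# Adjusted R-squared computed by hand from R-squared, n and p.
adj_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)

# Matches the value reported by summary().
all.equal(adj_manual, s$adj.r.squared)
```

Because the penalty factor (n - 1)/(n - p - 1) grows with p, adding a predictor raises the adjusted value only if it increases R-squared by enough to offset the penalty.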