This is an assignment given on Week 3, Day 1 of the Data Analytics Internship under Prof. Sameer Mathur, IIML.

Task 4a

Qualitatively identify the crucial issues being faced by the management. Based on your judgment, prepare a list of the most important questions that matter.

1. How important manager and crew tenure are relative to people factors, such as employee skill and experience, in optimizing a given site’s performance.

2. How important manager and crew tenure are relative to site-location factors, such as population, number of competitors, and pedestrian access, in determining store-level financial performance.

3. Whether increasing wages, implementing a bonus program, instituting new training programs, or developing a career development program would be the best course of action for increasing employee tenure.

Task 4b

Think about how a dataset associated with Exhibit 2 might help in answering your list of questions from Task 4a.

1. The relationships of MgrSkill, CrewSkill, MTenure, and CTenure with Sales and Profit can be used to establish whether employee skill and experience have a positive impact on sales and profit.

2. The relationships of Pop, Comp, Visibility, PedCount, Hours24, and Res with Sales and Profit, compared with those of MTenure and CTenure, can be used to establish how much tenure matters relative to site-location factors.

3. The relationship of Sales and Profit with MTenure and CTenure can be used to establish whether an increase in tenure has a positive impact on sales and profit.

Task 4c

Download and review the Store24.csv data file associated with this case. Using R, read the data into a data frame called store.

store <- read.csv(file = "Store24.csv", header = TRUE, sep = ",")
View(store)

Using R, get the summary statistics of the data. Confirm that the summary statistics generated from R are consistent with Exhibit 3 from the Case.

summary(store)

Task 4d

Use R to measure the mean and standard deviation of Profit.

library(psych)  # provides describe(); columns 3 and 4 of its output are mean and sd
describe(store$Profit)[,3:4]

Use R to measure the mean and standard deviation of MTenure.

describe(store$MTenure)[,3:4]

Use R to measure the mean and standard deviation of CTenure.

describe(store$CTenure)[,3:4]
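The same two statistics can also be obtained without `describe()`. The following is a minimal sketch using only base R. The data frame `toy` and its values are invented for illustration and merely stand in for `store`.

```r
# Hypothetical stand-in for the store data frame (values are invented).
toy <- data.frame(
  Profit  = c(100, 200, 300, 400),
  MTenure = c(10, 20, 30, 40),
  CTenure = c(2, 4, 6, 8)
)

# Apply mean() and sd() to every column; the result is a matrix with
# one row per statistic and one column per variable.
stats <- sapply(toy, function(x) c(mean = mean(x), sd = sd(x)))
print(round(stats, 2))
```

With the real data, replacing `toy` by `store[, c("Profit", "MTenure", "CTenure")]` should give the three pairs of values in a single call.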

Task 4e

Sorting and Subsetting data in R.

attach(mtcars)
View(mtcars)
newdata <- mtcars[order(mpg),] # sort by mpg (ascending)
View(newdata)
newdata[1:5,] # see the first 5 rows
newdata <- mtcars[order(-mpg),] # sort by mpg (descending)
View(newdata)
detach(mtcars)

Task 4f

Replicate Exhibit 1 shown in the case, using R.

Use R to print the {StoreID, Sales, Profit, MTenure, CTenure} of the top 10 most profitable stores.

top.df <- store[order(-store$Profit),]
top.df[1:10,1:5]

Use R to print the {StoreID, Sales, Profit, MTenure, CTenure} of the bottom 10 least profitable stores.

bottom.df <- store[order(store$Profit),]
bottom.df[1:10,1:5]
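Selecting columns by position (`1:5`) relies on the column order of the CSV file. A sketch of the same ranking with columns selected by name is given below. The toy data frame and its values are invented, and the column name `StoreID` is taken from the task text rather than checked against the file.

```r
# Hypothetical miniature version of the store data (values are invented).
toy <- data.frame(
  StoreID = 1:6,
  Sales   = c(500, 900, 700, 300, 800, 600),
  Profit  = c(50, 95, 70, 20, 85, 60),
  MTenure = c(5, 40, 20, 1, 30, 12),
  CTenure = c(3, 12, 8, 1, 10, 6),
  Pop     = c(1000, 5000, 3000, 800, 4000, 2500)
)

# Columns to report, referenced by name instead of position.
cols <- c("StoreID", "Sales", "Profit", "MTenure", "CTenure")

# Sort by Profit (descending, then ascending) and keep the first 3 rows.
top3    <- head(toy[order(-toy$Profit), cols], 3)
bottom3 <- head(toy[order(toy$Profit), cols], 3)

print(top3)
print(bottom3)
```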

Task 4g

Scatter Plots. Use R to draw a scatter plot of Profit vs. MTenure.

library(car)
scatterplot(store$MTenure, store$Profit,
            xlab = "MTenure", ylab = "Profit",
            main = "Scatterplot of Profit vs. MTenure")

Task 4h

Scatter Plots. Use R to draw a scatter plot of Profit vs. CTenure.

library(car)
scatterplot(store$CTenure, store$Profit,
            xlab = "CTenure", ylab = "Profit",
            main = "Scatterplot of Profit vs. CTenure")

Task 4i

Correlation Matrix. Use R to construct a Correlation Matrix for all the variables in the dataset.

round(cor(store), digits = 2)

Task 4j

Correlations. Use R to measure the correlation between Profit and MTenure.

round(cor(store$Profit, store$MTenure), digits = 2)

Use R to measure the correlation between Profit and CTenure.

round(cor(store$Profit, store$CTenure), digits = 2)

Task 4k

Use R to construct the following Corrgram based on all variables in the dataset.

library(corrgram)
corrgram(store, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram of Store Variables")

The corrgram shows that Profit is noticeably correlated with MTenure, Comp, Pop, PedCount, and Hours24. This means that manager tenure, number of competitors, population, pedestrian count, and 24-hour opening are all strongly associated with profit. Each of these correlations may be positive or negative, and its sign and strength can help a manager decide which of these parameters to act on to increase the sales, and hence the profit, of the store.

Task 4l

Pearson’s Correlation Tests. Run a Pearson’s Correlation test on the correlation between Profit and MTenure.

cor.test(store$Profit , store$MTenure)

The p-value is 8.193e-05, which is below 0.05, so the correlation is statistically significant at the 5% level.

Run a Pearson’s Correlation test on the correlation between Profit and CTenure.

cor.test(store$Profit , store$CTenure)

The p-value is 0.02562, which is below 0.05, so the correlation is statistically significant at the 5% level.
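Rather than reading the p-value off the printed output, it can be pulled out of the object that `cor.test()` returns. The sketch below uses simulated vectors `x` and `y` (invented for illustration, not the store data) to show the relevant components.

```r
# Simulated data: y is partly driven by x, so some correlation is expected.
set.seed(1)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)

# cor.test() runs Pearson's test by default and returns a list-like object.
res <- cor.test(x, y)

r_hat <- unname(res$estimate)  # sample correlation coefficient
p_val <- res$p.value           # p-value of the test that the correlation is 0

print(r_hat)
print(p_val)
print(p_val < 0.05)            # TRUE means significant at the 5% level
```

Applied to the assignment, `cor.test(store$Profit, store$MTenure)$p.value` would return the value reported above as a number that can be compared or rounded in code.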

Task 4m

Regression Analysis. Run a regression of Profit on {MTenure, CTenure, Comp, Pop, PedCount, Res, Hours24, Visibility}.

model <- lm(Profit ~ MTenure + CTenure + Comp + Pop + PedCount + Res + Hours24 + Visibility, data = store)
summary(model)

Task 4n

Based on Task 4m, answer the following questions:

The explanatory variables whose beta-coefficients are statistically significant (p < 0.05) are MTenure, CTenure, Comp, Pop, PedCount, Res, and Hours24.

The explanatory variable whose beta-coefficient is not statistically significant (p > 0.05) is Visibility.
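The split into significant and non-significant variables can also be derived from the coefficient table instead of being read by eye. This is a minimal sketch on simulated data: the data frame `toy`, the variables `x1` and `x2`, and the model `fit` are all hypothetical and only illustrate the mechanics.

```r
# Simulated data in which y depends on x1 but not on x2.
set.seed(42)
n <- 100
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 3 * toy$x1 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = toy)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- summary(fit)$coefficients

# p-values of the explanatory variables (the intercept row is dropped).
pvals <- coefs[-1, "Pr(>|t|)"]

significant     <- names(pvals)[pvals < 0.05]
not_significant <- names(pvals)[pvals >= 0.05]

print(significant)
print(not_significant)
```

Replacing `fit` with the `model` object from Task 4m should reproduce the two lists given in the answer.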

Task 4o

Based on Task 4m, answer the following questions: What is the expected change in the Profit at a store if the Manager’s tenure, i.e. number of months of experience with Store24, increases by one month?

round(summary(model)$coefficients["MTenure",1], digits=0)

What is the expected change in the Profit at a store if the Crew’s tenure, i.e. number of months of experience with Store24, increases by one month?

round(summary(model)$coefficients["CTenure",1], digits=0)
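The reason a coefficient answers this question is that, in a linear model, raising one variable by one unit while holding the others fixed changes the prediction by exactly that variable's coefficient. The sketch below checks this on simulated data. The data frame `toy`, its columns `tenure`, `pop` and `profit`, and all numbers are invented for illustration.

```r
# Simulated data: profit depends linearly on tenure and pop plus noise.
set.seed(7)
toy <- data.frame(
  tenure = runif(60, 0, 100),
  pop    = runif(60, 1000, 5000)
)
toy$profit <- 1000 + 50 * toy$tenure + 2 * toy$pop + rnorm(60, sd = 100)

fit <- lm(profit ~ tenure + pop, data = toy)

# Two hypothetical stores that differ only by one month of tenure.
base <- data.frame(tenure = 10, pop = 3000)
plus <- data.frame(tenure = 11, pop = 3000)

# Difference in predicted profit between the two stores.
change <- unname(predict(fit, plus) - predict(fit, base))

print(change)
print(unname(coef(fit)["tenure"]))  # same value as 'change'
```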

Task 4p

Executive Summary

1. The most profitable store has ID 74 and the least profitable store has ID 57.

2. The correlation between Profit and CTenure is 0.26, and the correlation between Profit and MTenure is 0.44.

3. The p-values of the regression are significant, which suggests that the model is a good fit.

4. The R-squared value is 0.6379, so 63.79% of the variation in the dependent variable can be explained by the independent variables.

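5. The adjusted R-squared value is 0.594, meaning that 59.4% of the variation in the dependent variable can be explained by the independent variables once the number of independent variables is taken into account. The adjustment penalizes each added variable, which is why this value is lower than R-squared.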

6. The explanatory variables whose beta-coefficients are statistically significant are MTenure, CTenure, Comp, Pop, PedCount, Res, and Hours24, while the one whose beta-coefficient is not statistically significant is Visibility.

From this analysis, we can conclude that the model fits the data well.
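The link between the R-squared and adjusted R-squared values in the summary can be checked by hand with the standard formula, adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - p - 1). The sketch below assumes the regression used n = 75 stores, which is an assumption and is not stated in this write-up, together with the p = 8 explanatory variables listed in Task 4m.

```r
r2 <- 0.6379  # R-squared reported in the summary
n  <- 75      # ASSUMED number of stores (not stated in this write-up)
p  <- 8       # explanatory variables in the Task 4m regression

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(adj_r2, 3))
```

Under that assumption the result agrees with the adjusted R-squared reported in the summary to three decimal places.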