========================================

Exam

You have received the below report from a junior analyst. She is a new hire and has been working on this report for a few weeks. She has asked you to review the report. You should make corrections to the document and explain each of your changes in comments (in a code block).

Upload the finished Rmd to eCampus. You should be able to publish the document; put the link at the top of the document.

Delete all of these instructions (everything between the === bars)

1 Introduction

This report presents a preliminary analysis of a dataset containing information about coffee shops in three U.S. cities. The objective is to identify patterns in shop location, coffee quality, and growth that may inform strategic decisions about future locations and marketing approaches. I find that city is the strongest predictor of coffee shop growth. A linear model using city and coffee quality has an R² of 0.49 (adjusted R² of 0.45).

#Section changes
# I fixed the spelling of Introduction
# I changed pricing to growth because our tribble doesn't deal with pricing
# I changed ratings to the actual variable coffee quality
# I changed shop density to location because we are dealing with city and street
# In the findings I changed address to city because city is the location variable the model uses
# I changed R^2 to R² and updated its value to match the model in Section 4
# I changed the findings to say coffee shop growth instead of ratings

2 Data Overview

The dataset contains the following columns:

  • name: Name of the coffee shop
  • city: City where the shop is located
  • street: Street address
  • years_in_business: Number of years the shop has been in business
  • coffee_quality: Coffee quality rating: "good", "ok", or "bad"
  • growing: 1 if the shop is growing, 0 if not

There are 30 records and 6 variables.

#Section changes
# I added underscores in the years_in_business and coffee_quality variables to match the actual data
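The column description above can be checked in code before any analysis. The sketch below is a minimal base-R version; `check_coffee_data()` and the three toy rows are hypothetical, made up for illustration, and it assumes the data frame uses the column names listed above.

```r
# Hypothetical helper: confirms a data frame matches the Section 2 description.
check_coffee_data <- function(df) {
  expected <- c("name", "city", "street", "years_in_business",
                "coffee_quality", "growing")
  stopifnot(
    all(expected %in% names(df)),                        # all six columns exist
    all(df$coffee_quality %in% c("good", "ok", "bad")),  # only the three labels
    all(df$growing %in% c(0, 1)),                        # growth is a 0/1 flag
    all(df$years_in_business >= 0)                       # no negative ages
  )
  dim(df)  # number of records and variables
}

# Toy rows for illustration only; these are not records from the report's data.
toy <- data.frame(
  name = c("Shop A", "Shop B", "Shop C"),
  city = c("Oakville", "Riverton", "Springfield"),
  street = c("123 Main St", "334 Pine St", "456 Elm St"),
  years_in_business = c(2, 10, 5),
  coffee_quality = c("good", "ok", "bad"),
  growing = c(1, 0, 0)
)

check_coffee_data(toy)
```

If the description above is accurate, running the same helper on the report's data frame should return 30 records and 6 variables.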

2.1 Correlations

# NOTE by Joe. Not sure why this doesn't work?
# mutate(), case_when(), select() and %>% come from dplyr, which must be loaded first.
library(dplyr)
library(ggcorrplot)

t2 <- t %>%
  mutate(
    # 1 = good, 3 = bad, so a higher score means worse coffee.
    coffee_quality_numeric = case_when(
      coffee_quality == "good" ~ 1,
      coffee_quality == "ok" ~ 2,
      coffee_quality == "bad" ~ 3
    ),
    # Arbitrary codes: the cities have no natural order.
    city_numeric = case_when(
      city == "Springfield" ~ 1,
      city == "Riverton" ~ 2,
      city == "Oakville" ~ 3
    )
  )

correlation <- cor(select(t2, years_in_business, growing, coffee_quality_numeric, city_numeric))

ggcorrplot(correlation,
           lab = TRUE,
           lab_size = 4,
           method = "square",
           type = "lower",
           title = "Correlation Matrix of Coffee Shop Variables")

#Section changes
# I changed coffee_quality to a numeric variable to be used in the model if needed
# I fixed the spacing in the code and ran the correlation
# I added correlation to my environment
# I added the library ggcorrplot and made a correlation matrix of the numeric variables
# I made city a numeric variable so I could add it to the correlation too
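Two things make the numeric codes above hard to read. First, coffee quality is coded 1 = good to 3 = bad, so a negative correlation with `growing` means better coffee goes with growth. Second, the city codes impose an order the cities do not have. The sketch below runs on hypothetical toy data (not the report's records): it recodes quality so that higher means better, uses a Spearman (rank) correlation for the ordinal score, and shows that two equally arbitrary city codings can give correlations of opposite sign. `quality_score()` is a hypothetical helper.

```r
# Hypothetical helper: ordinal score where higher means better coffee.
quality_score <- function(x) match(x, c("bad", "ok", "good"))

# Toy data for illustration only; not the report's records.
toy <- data.frame(
  city = c("Oakville", "Oakville", "Riverton", "Riverton", "Springfield", "Springfield"),
  coffee_quality = c("good", "ok", "ok", "bad", "bad", "ok"),
  growing = c(1, 1, 1, 0, 0, 0)
)
toy$quality_score <- quality_score(toy$coffee_quality)

# Spearman correlation works on ranks, which suits an ordinal score.
rho <- cor(toy$quality_score, toy$growing, method = "spearman")

# Two arbitrary numeric codings of city give correlations of opposite sign,
# so a single correlation with a numeric city code is hard to interpret.
code_a <- match(toy$city, c("Springfield", "Riverton", "Oakville"))
code_b <- match(toy$city, c("Riverton", "Oakville", "Springfield"))
r_a <- cor(code_a, toy$growing)
r_b <- cor(code_b, toy$growing)

c(rho = rho, r_a = r_a, r_b = r_b)
```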

3 Exploratory Analysis

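We begin by examining where the coffee shops are located. The shops are split evenly across the three cities (10 each), and two street addresses account for a third of the records: 123 Main St (6) and 334 Pine St (4).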

table(t$city)
## 
##    Oakville    Riverton Springfield 
##          10          10          10
table(t$street)
## 
##   101 First Ave  1010 Willow Dr     123 Main St     135 Pine Rd   147 Birch Way 
##               1               1               6               1               1 
##   159 Willow Ln   258 Spruce Ct    303 Third Rd     334 Pine St  369 Cedar Blvd 
##               1               1               1               4               1 
## 404 Fourth Blvd    445 Cedar Rd      456 Elm St    556 Elm Blvd   667 Maple Way 
##               1               1               1               1               1 
##  707 Seventh Ct    778 Birch Ct     789 Oak Ave  808 Eighth Way  889 Hickory Ln 
##               1               1               1               1               1 
##   909 Ninth Ave   990 Spruce Pl 
##               1               1
hist(t$years_in_business, 
     main = "Histogram of Years in Business",
     xlab = "Years in Business",
     col = "lightblue", 
     border = "black")

barplot(table(t$growing),
        main = "Barplot of Growing Shops",
        names.arg = c("Not Growing", "Growing"),
        col = "lightgreen",
        border = "black",
        ylab = "Number of Shops")

#Section changes
# I made the years in business histogram cleaner and better to look at
# I changed the shop growth to a bar plot to more clearly show the difference between growing and non-growing shops
# I left the tables as is because I thought they were good to read like that
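The tables above count shops by location but do not connect location to growth. A cross-tab of city by growth status, plus the share of growing shops per city, does that directly. The sketch below is a minimal base-R version on hypothetical toy data; on the report's data the same calls would take `t$city` and `t$growing`.

```r
# Toy data for illustration only; not the report's records.
city <- c("Oakville", "Oakville", "Riverton", "Riverton", "Springfield", "Springfield")
growing <- c(1, 1, 1, 0, 0, 0)

# Counts of non-growing (0) and growing (1) shops in each city.
counts <- table(city, growing)

# Share of shops in each city that are growing: the mean of a 0/1 flag.
growth_rate <- tapply(growing, city, mean)

counts
growth_rate
```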

4 Predicting growth

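I predict growth with a linear model that uses city and coffee quality. The model explains about half of the variation in growth (R² = 0.49, adjusted R² = 0.45). City is the only clearly significant predictor (estimate -0.36, p < 0.001). Because city is coded 1 = Springfield, 2 = Riverton, 3 = Oakville, this estimate assumes growth changes by the same amount from each city to the next in that arbitrary order, so it should be read with caution. Coffee quality is marginally significant (p = 0.08); since it is coded 1 = good to 3 = bad, its negative estimate (-0.16) means better coffee is associated with growth. Because growing is a 0/1 variable, this is a linear probability model, and its predictions are not guaranteed to stay between 0 and 1.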

m <- lm(growing ~  city_numeric + coffee_quality_numeric, data = t2)

summary(m)
## 
## Call:
## lm(formula = growing ~ city_numeric + coffee_quality_numeric, 
##     data = t2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.83778 -0.27413  0.00466  0.16222  0.88344 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             1.51352    0.22684   6.672 3.67e-07 ***
## city_numeric           -0.36061    0.08671  -4.159  0.00029 ***
## coffee_quality_numeric -0.15756    0.08700  -1.811  0.08126 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3754 on 27 degrees of freedom
## Multiple R-squared:  0.4905, Adjusted R-squared:  0.4527 
## F-statistic:    13 on 2 and 27 DF,  p-value: 0.0001114
#Section changes
# I changed growth to be spelled correctly
# I changed Oakfield to Oakville
# I used city_numeric instead of street
# I used coffee_quality_numeric instead of coffee_quality
# I took years in business out of my model because its p-value was high and the adjusted R² was higher without it
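An alternative to numeric city codes is to enter city as a factor. R then uses the first level alphabetically (Oakville) as the reference category, absorbs it into the intercept, and reports the other two cities as differences from it, which is why a factor model does not show an Oakville coefficient. Because `growing` is 0/1, logistic regression (`glm()` with `family = binomial`) is also a common alternative to the linear model, since its predicted probabilities stay between 0 and 1. The sketch below fits both on hypothetical toy data (not the report's records); the `quality` score (1 = bad, 3 = good) is an assumption chosen so that higher means better.

```r
# Toy data for illustration only; not the report's records.
toy <- data.frame(
  city = factor(rep(c("Oakville", "Riverton", "Springfield"), each = 4)),
  quality = c(3, 2, 1, 3,  3, 2, 1, 2,  3, 1, 2, 1),  # 1 = bad, 3 = good
  growing = c(1, 1, 0, 0,  1, 0, 0, 1,  1, 0, 0, 0)
)

# Linear model with city as a factor: Oakville (first level) is the baseline,
# so only cityRiverton and citySpringfield appear as coefficients.
m_factor <- lm(growing ~ city + quality, data = toy)
coef(m_factor)

# Logistic regression for the 0/1 outcome; coefficients are on the log-odds scale.
m_logit <- glm(growing ~ city + quality, data = toy, family = binomial)
summary(m_logit)

# Predicted probabilities of growth from the logistic model.
p_hat <- predict(m_logit, type = "response")
```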

5 Decision Tree

I create a decision tree to predict the growth variable.

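# NOTE by Joe. Not sure why this doesn't work?
library(rpart)
library(rpart.plot)

# Classification needs a factor response; labels make the plot readable.
t3 <- transform(
  t2,
  city = factor(city),
  coffee_quality = factor(coffee_quality, levels = c("bad", "ok", "good")),
  growing = factor(growing, levels = c(0, 1), labels = c("Not growing", "Growing"))
)

# These settings let the tree split until leaves hold a single shop,
# so with 30 records it is likely to overfit.
tree_model <- rpart(
  growing ~ city + years_in_business + coffee_quality,
  data = t3,
  method = "class",
  control = rpart.control(cp = 0.001, minsplit = 2, minbucket = 1)
)
rpart.plot(tree_model, type = 2, extra = 106, fallen.leaves = TRUE,
           main = "Decision Tree Predicting Growth")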

#Section changes
# I added the necessary libraries
# I made the entire decision tree model
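With `cp = 0.001`, `minsplit = 2` and `minbucket = 1`, the tree above can keep splitting until each leaf holds a single shop, which on 30 records risks fitting noise. One standard remedy in rpart is to prune back to the complexity parameter with the lowest cross-validated error (the `xerror` column of the tree's `cptable`). The sketch below assumes the rpart package; `prune_by_xerror()` and the toy data are hypothetical, for illustration only.

```r
library(rpart)

# Hypothetical helper: prune to the CP with the smallest cross-validated error.
prune_by_xerror <- function(tree) {
  cp_tab <- tree$cptable
  best_cp <- cp_tab[which.min(cp_tab[, "xerror"]), "CP"]
  prune(tree, cp = best_cp)
}

# Toy data for illustration only; not the report's records.
toy <- data.frame(
  city = factor(rep(c("Oakville", "Riverton", "Springfield"), each = 4)),
  years_in_business = c(1, 4, 7, 10, 2, 5, 8, 11, 3, 6, 9, 12),
  growing = factor(c(1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0))
)

set.seed(1)  # cross-validation folds are random
full_tree <- rpart(growing ~ city + years_in_business, data = toy,
                   method = "class",
                   control = rpart.control(cp = 0.001, minsplit = 2, minbucket = 1))
pruned_tree <- prune_by_xerror(full_tree)

printcp(full_tree)  # CP table with cross-validated error (xerror)
```

The pruned tree can be plotted with the same `rpart.plot()` call used above. Because the cross-validation folds are random, the chosen CP can change with the seed.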