2 Introduction

This report presents a preliminary analysis of a dataset containing information about coffee shops across various U.S. cities. The objective is to identify patterns in shop location, coffee quality, and growth that may inform strategic decisions about future locations and marketing approaches. I find that street is the best predictor of coffee shop growth, with an R² of 0.77.

3 Data Overview

The dataset contains the following columns:

  • name: Name of the coffee shop
  • city: City where the shop is located
  • street: Street address
  • years_in_business: Number of years the shop has been in business
  • coffee_quality: Coffee quality rating (good, ok, or bad)
  • growing: 1 if the shop is growing, 0 if not

There are 30 records and 6 variables.
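A minimal sketch of how the structure described above could be checked in R. The data frame name `shops` and every value in it are synthetic placeholders invented for illustration; only the column names, the three city names, and the 30 x 6 shape come from this report.

```r
# Synthetic stand-in for the report's data: all values below are invented.
set.seed(1)
shops <- data.frame(
  name              = paste("Shop", 1:30),
  city              = rep(c("Oakville", "Riverton", "Springfield"), each = 10),
  street            = paste(100 + 1:30, "Example St"),
  years_in_business = sample(1:20, 30, replace = TRUE),
  coffee_quality    = sample(c("good", "ok", "bad"), 30, replace = TRUE),
  growing           = sample(0:1, 30, replace = TRUE),
  stringsAsFactors  = FALSE
)

dims    <- dim(shops)             # number of records and variables
missing <- colSums(is.na(shops))  # missing values per column

print(dims)
str(shops)                        # column names and types
print(missing)
```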

3.1 Correlations

This section reports pairwise correlations to show how the variables relate to one another.
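A hedged sketch of how such a correlation matrix might be computed. The data are synthetic, and the numeric encodings are assumptions: the report uses columns named city_numeric and coffee_quality_numeric but does not say how they were coded. Because city is a nominal variable, any integer coding imposes an arbitrary order, so correlations involving it should be read with care.

```r
# Synthetic data: values are invented, only the column names follow the report.
set.seed(1)
shops <- data.frame(
  city              = rep(c("Oakville", "Riverton", "Springfield"), each = 10),
  years_in_business = sample(1:20, 30, replace = TRUE),
  coffee_quality    = sample(c("bad", "ok", "good"), 30, replace = TRUE),
  growing           = sample(0:1, 30, replace = TRUE)
)

# Assumed encodings (not confirmed by the report):
# cities in alphabetical order -> 1, 2, 3; quality bad < ok < good -> 1, 2, 3.
shops$city_numeric <- as.integer(factor(shops$city))
shops$coffee_quality_numeric <- as.integer(
  factor(shops$coffee_quality, levels = c("bad", "ok", "good"))
)

num_cols <- c("years_in_business", "city_numeric",
              "coffee_quality_numeric", "growing")
corr <- cor(shops[, num_cols])  # Pearson correlations by default
print(round(corr, 2))
```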

4 Histograms

I begin by examining where the coffee shops are located, counting records first by city and then by street address.

## 
##    Oakville    Riverton Springfield 
##          10          10          10
## 
##   101 First Ave  1010 Willow Dr     123 Main St     135 Pine Rd   147 Birch Way 
##               1               1               6               1               1 
##   159 Willow Ln   258 Spruce Ct    303 Third Rd     334 Pine St  369 Cedar Blvd 
##               1               1               1               4               1 
## 404 Fourth Blvd    445 Cedar Rd      456 Elm St    556 Elm Blvd   667 Maple Way 
##               1               1               1               1               1 
##  707 Seventh Ct    778 Birch Ct     789 Oak Ave  808 Eighth Way  889 Hickory Ln 
##               1               1               1               1               1 
##   909 Ninth Ave   990 Spruce Pl 
##               1               1
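Counts like those above are the kind of output base R's table() produces. The sketch below uses only a subset of the street values, with the two repeated addresses taken from the output above, to show how addresses that occur more than once could be picked out. The vector name `street` is a placeholder.

```r
# Subset of the street values shown above; the two repeated addresses
# (6 and 4 occurrences) are copied from the printed counts.
street <- c(rep("123 Main St", 6), rep("334 Pine St", 4),
            "101 First Ave", "456 Elm St", "789 Oak Ave")

street_counts <- table(street)                 # frequency of each address
repeated <- street_counts[street_counts > 1]   # keep addresses seen more than once
print(repeated)
```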

5 Predicting growth

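I predict growth with a linear regression on two predictors, city_numeric and coffee_quality_numeric. The model explains about 49% of the variance in growing (R² = 0.49, adjusted R² = 0.45; F-test p ≈ 0.0001). city_numeric is significant at the 0.001 level, while coffee_quality_numeric is only marginally significant (p ≈ 0.08). The model is fit to the pooled data from all three cities rather than to each city separately.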

## 
## Call:
## lm(formula = growing ~ city_numeric + coffee_quality_numeric, 
##     data = t2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.83778 -0.27413  0.00466  0.16222  0.88344 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             1.51352    0.22684   6.672 3.67e-07 ***
## city_numeric           -0.36061    0.08671  -4.159  0.00029 ***
## coffee_quality_numeric -0.15756    0.08700  -1.811  0.08126 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3754 on 27 degrees of freedom
## Multiple R-squared:  0.4905, Adjusted R-squared:  0.4527 
## F-statistic:    13 on 2 and 27 DF,  p-value: 0.0001114
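A minimal sketch of how a fit like the one above could be produced and its summary statistics extracted. The formula and the names `t2`, `city_numeric`, and `coffee_quality_numeric` are taken from the call shown above. The data are synthetic, so the numbers will not match the report. The sketch also recomputes adjusted R² by hand as 1 - (1 - R²)(n - 1)/(n - p - 1). With n = 30 and p = 2, plugging in the reported R² of 0.4905 gives roughly 0.453, in line with the adjusted value printed above.

```r
# Synthetic data with the same column names as the report's call; values are invented.
set.seed(42)
n <- 30
t2 <- data.frame(
  city_numeric           = rep(1:3, each = 10),
  coffee_quality_numeric = sample(1:3, n, replace = TRUE)
)
# Invented 0/1 outcome, loosely tied to city so the fit is not degenerate.
t2$growing <- rbinom(n, 1, prob = plogis(2 - 1.5 * (t2$city_numeric - 1)))

fit <- lm(growing ~ city_numeric + coffee_quality_numeric, data = t2)
s   <- summary(fit)

r2     <- s$r.squared
adj_r2 <- s$adj.r.squared

# Adjusted R-squared by hand, with p = 2 predictors.
p <- 2
adj_by_hand <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(r2 = r2, adj_r2 = adj_r2, adj_by_hand = adj_by_hand))
print(range(fitted(fit)))  # fitted values of a linear model on a 0/1 outcome
```

Because growing is coded 0/1, this is a linear probability model, and its fitted values are not guaranteed to stay between 0 and 1. That is worth checking before reading them as probabilities.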

6 Decision Tree

I create a decision tree to predict the growing variable.
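A hedged sketch of one way to fit such a tree. The report does not say which package, predictors, or settings were used, so the choice of rpart, the predictor set, and the `minsplit` value below are all assumptions, and the data are synthetic.

```r
library(rpart)  # assumption: the report does not name its tree package

# Synthetic data: values are invented, only the column names follow the report.
set.seed(7)
n <- 30
shops <- data.frame(
  city              = factor(rep(c("Oakville", "Riverton", "Springfield"), each = 10)),
  coffee_quality    = factor(sample(c("bad", "ok", "good"), n, replace = TRUE)),
  years_in_business = sample(1:20, n, replace = TRUE)
)
# Invented 0/1 outcome, stored as a factor so rpart fits a classification tree.
shops$growing <- factor(
  rbinom(n, 1, prob = ifelse(shops$city == "Oakville", 0.9, 0.3))
)

tree <- rpart(
  growing ~ city + coffee_quality + years_in_business,
  data    = shops,
  method  = "class",
  control = rpart.control(minsplit = 10)  # lowered from the default for 30 rows
)
print(tree)

# In-sample accuracy only; with 30 records this is an optimistic estimate.
pred     <- predict(tree, newdata = shops, type = "class")
accuracy <- mean(pred == shops$growing)
print(accuracy)
```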