http://rpubs.com/kgc00005/1304861

1 Introduction

This report presents a preliminary analysis of a dataset containing information about coffee shops across various U.S. cities. The objective is to identify patterns in shop density, ratings, and pricing that may inform strategic decisions about future locations and marketing approaches. I find that location (city) and coffee quality together are useful predictors of coffee shop growth: a linear model using both has an adjusted R^2 of 0.45.

2 Data Overview

The dataset contains the following columns:

  • name: Name of the coffee shop
  • city: City where the shop is located
  • street: Street address
  • years_in_business: Number of years the shop has been in business
  • coffee quality: Quality rating (good, ok, or bad)
  • growing: 1 if the shop is growing, 0 if not
  • coffee_quality_numeric: 3 for good, 2 for ok, 1 for bad
  • city_using_numbers: 1 for Springfield, 2 for Riverton, 3 for Oakville
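
The two numeric columns can be derived from the text columns with a lookup. This is a minimal sketch, assuming the data frame is called `t` (the name that appears in the model call later in this report) and that the text columns are named `coffee_quality` and `city`; those two column names and the three sample rows are assumptions for illustration only.

```r
# Hypothetical rows; the real data set has 30 records.
t <- data.frame(
  city = c("Springfield", "Riverton", "Oakville"),
  coffee_quality = c("good", "ok", "bad"),
  stringsAsFactors = FALSE
)

# Lookup vectors that map each label to the code described in the column list.
quality_codes <- c(bad = 1, ok = 2, good = 3)
city_codes <- c(Springfield = 1, Riverton = 2, Oakville = 3)

# Index the lookup vectors by label, then drop the names.
t$coffee_quality_numeric <- unname(quality_codes[t$coffee_quality])
t$city_using_numbers <- unname(city_codes[t$city])

print(t)
```

The quality codes follow a natural order (bad < ok < good), while the city codes are only labels, so the size of a city code carries no meaning of its own.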

There are 30 records and 8 columns: 6 original variables plus the two numeric recodings (coffee_quality_numeric and city_using_numbers).

2.1 Correlations

##                        years_in_business coffee_quality_numeric
## years_in_business             1.00000000              0.2375221
## coffee_quality_numeric        0.23752214              1.0000000
## city_using_numbers            0.07926197             -0.2508375
## growing                       0.01210747              0.4050555
##                        city_using_numbers     growing
## years_in_business              0.07926197  0.01210747
## coffee_quality_numeric        -0.25083752  0.40505554
## city_using_numbers             1.00000000 -0.65465367
## growing                       -0.65465367  1.00000000
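
A matrix like the one above is what `cor()` returns for a set of numeric columns. The sketch below uses the column names from the output, but the values are made up for illustration and are not the report's data.

```r
# Hypothetical values for illustration only.
shops <- data.frame(
  years_in_business = c(2, 5, 9, 12, 3, 7),
  coffee_quality_numeric = c(1, 3, 2, 3, 2, 1),
  city_using_numbers = c(1, 1, 2, 2, 3, 3),
  growing = c(1, 1, 0, 1, 0, 0)
)

# Pearson correlations between every pair of columns.
cor_matrix <- cor(shops)
print(round(cor_matrix, 3))
```

The matrix is symmetric with ones on the diagonal, so each pair appears twice. In the report's output, the strongest relationship with growing is city_using_numbers (-0.65), followed by coffee_quality_numeric (0.41); years_in_business is essentially uncorrelated with growth (0.01).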

3 Histograms

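We begin by examining where the coffee shops are located. The tables below count shops by city, by street address, and by growth status (0 = not growing, 1 = growing). The shops are evenly split across the three cities (10 each), and 14 of the 30 shops are growing.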

## 
##    Oakville    Riverton Springfield 
##          10          10          10

## 
##   101 First Ave  1010 Willow Dr     123 Main St     135 Pine Rd   147 Birch Way 
##               1               1               6               1               1 
##   159 Willow Ln   258 Spruce Ct    303 Third Rd     334 Pine St  369 Cedar Blvd 
##               1               1               1               4               1 
## 404 Fourth Blvd    445 Cedar Rd      456 Elm St    556 Elm Blvd   667 Maple Way 
##               1               1               1               1               1 
##  707 Seventh Ct    778 Birch Ct     789 Oak Ave  808 Eighth Way  889 Hickory Ln 
##               1               1               1               1               1 
##   909 Ninth Ave   990 Spruce Pl 
##               1               1

## 
##  0  1 
## 16 14
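
Frequency tables in this format come from `table()`. The sketch below builds vectors that match the reported city and growth totals; the row order is an assumption, since only the totals appear in the output.

```r
# Vectors constructed to match the reported totals (10 shops per city,
# 16 not growing, 14 growing). The row order is hypothetical.
city <- rep(c("Oakville", "Riverton", "Springfield"), each = 10)
growing <- c(rep(0, 16), rep(1, 14))

# Count how often each distinct value occurs.
city_counts <- table(city)
growing_counts <- table(growing)

print(city_counts)
print(growing_counts)

# Share of shops that are growing.
print(mean(growing))
```

Because growing is coded 0/1, its mean is the share of growing shops, here 14 of 30.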

4 Predicting growth

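I predict growth with a linear regression on two predictors: coffee quality and city, each coded as a numeric value from 1 to 3. The adjusted R^2 is 0.45, so the model explains a moderate share of the variation in growth. The p-value of the overall F-test is 0.0001, which is significant. The city coefficient (-0.36, p = 0.0003) is strongly significant, while the coffee quality coefficient (0.16, p = 0.08) is significant only at the 10% level.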

## 
## Call:
## lm(formula = growing ~ coffee_quality_numeric + city_using_numbers, 
##     data = t)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.83778 -0.27413  0.00466  0.16222  0.88344 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             0.88326    0.27874   3.169  0.00378 ** 
## coffee_quality_numeric  0.15756    0.08700   1.811  0.08126 .  
## city_using_numbers     -0.36061    0.08671  -4.159  0.00029 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3754 on 27 degrees of freedom
## Multiple R-squared:  0.4905, Adjusted R-squared:  0.4527 
## F-statistic:    13 on 2 and 27 DF,  p-value: 0.0001114
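
To make the coefficients concrete, the fitted equation is growing = 0.88326 + 0.15756 * coffee_quality_numeric - 0.36061 * city_using_numbers. The sketch below plugs the reported coefficients into that equation for two example shops, then recomputes the adjusted R-squared and the F-statistic from the reported multiple R-squared with n = 30 records and 2 predictors. The helper name `predict_growth` is made up for this illustration.

```r
# Coefficients copied from the regression output above.
intercept <- 0.88326
b_quality <- 0.15756
b_city <- -0.36061

# Hypothetical helper: fitted value for a given quality code and city code.
predict_growth <- function(quality, city) {
  intercept + b_quality * quality + b_city * city
}

# Good coffee (3) in Springfield (1) vs. bad coffee (1) in Oakville (3).
print(predict_growth(3, 1))
print(predict_growth(1, 3))

# Recompute adjusted R-squared and the F-statistic from the reported R-squared.
r2 <- 0.4905
n <- 30
p <- 2
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))
print(round(adj_r2, 3))
print(round(f_stat, 1))
```

The first shop gets a fitted value close to 1 and the second a value slightly below 0. Since growing is a 0/1 variable, a linear model can produce fitted values outside that range; a logistic regression (`glm` with `family = binomial`) is a common alternative for binary outcomes. The recomputed statistics agree with the summary output up to rounding.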

5 Decision Tree

I create a decision tree to predict the growth variable.

## [1] 0.8666667
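
The report does not show the tree code or say what the printed value measures. A value of 0.8666667 equals 26/30, which is consistent with a classification accuracy over the 30 shops, but that reading is an assumption. The sketch below shows one way to fit a classification tree with the `rpart` package and compute in-sample accuracy. The data are simulated placeholders, and the choice of `rpart`, the predictors, and the control settings are all assumptions rather than the method used in this report.

```r
library(rpart)

# Placeholder data with the report's column names; all values are simulated.
set.seed(1)
shops <- data.frame(
  coffee_quality_numeric = sample(1:3, 30, replace = TRUE),
  city_using_numbers = rep(1:3, each = 10)
)

# Simulated outcome: shops in lower-numbered cities grow more often.
grow_prob <- c(0.9, 0.5, 0.1)[shops$city_using_numbers]
shops$growing <- factor(rbinom(30, 1, prob = grow_prob))

# Classification tree; minsplit is lowered because the sample is small.
fit <- rpart(
  growing ~ coffee_quality_numeric + city_using_numbers,
  data = shops,
  method = "class",
  control = rpart.control(minsplit = 5)
)

# In-sample accuracy: share of shops whose predicted class matches the label.
predicted <- predict(fit, newdata = shops, type = "class")
accuracy <- mean(predicted == shops$growing)
print(accuracy)
```

Accuracy measured on the same rows used to fit the tree tends to be optimistic, so a held-out split or cross-validation would give a more reliable estimate, although 30 records leave little data to spare.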