Introduction

This report presents a preliminary analysis of a dataset on coffee shops in several U.S. cities. The objective is to identify patterns in shop density, ratings, and pricing that may inform strategic decisions about future locations and marketing approaches. Coffee quality was a categorical variable, so I converted it to a numeric score in order to use it as the response in a regression. Street address is text, so I entered it as a factor variable. In the regression below, street address and city together give an R^2 of 0.80 for coffee shop quality. That figure needs caution: the model estimates 23 predictor coefficients from 30 records, the adjusted R^2 is 0.036, and the overall F-test is not significant (p = 0.52), so the output does not show that address reliably predicts quality.

## 
## Call:
## lm(formula = quality_score ~ street_num + city, data = t)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.3333  0.0000  0.0000  0.0000  0.6667 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                2.294e+00  1.061e+00   2.162   0.0739 .
## street_num1010 Willow Dr  -2.941e-01  1.337e+00  -0.220   0.8331  
## street_num123 Main St     -5.294e-01  1.043e+00  -0.508   0.6298  
## street_num135 Pine Rd     -8.627e-01  1.403e+00  -0.615   0.5611  
## street_num147 Birch Way   -8.627e-01  1.403e+00  -0.615   0.5611  
## street_num159 Willow Ln   -8.627e-01  1.403e+00  -0.615   0.5611  
## street_num258 Spruce Ct    1.373e-01  1.403e+00   0.098   0.9252  
## street_num303 Third Rd     2.010e-15  1.149e+00   0.000   1.0000  
## street_num334 Pine St      1.765e-01  9.243e-01   0.191   0.8549  
## street_num369 Cedar Blvd   1.373e-01  1.403e+00   0.098   0.9252  
## street_num404 Fourth Blvd  2.000e+00  1.149e+00   1.741   0.1324  
## street_num445 Cedar Rd    -2.941e-01  1.337e+00  -0.220   0.8331  
## street_num456 Elm St      -8.627e-01  1.403e+00  -0.615   0.5611  
## street_num556 Elm Blvd    -2.941e-01  1.337e+00  -0.220   0.8331  
## street_num667 Maple Way   -1.294e+00  1.337e+00  -0.968   0.3703  
## street_num707 Seventh Ct   2.000e+00  1.149e+00   1.741   0.1324  
## street_num778 Birch Ct    -2.941e-01  1.337e+00  -0.220   0.8331  
## street_num789 Oak Ave      1.373e-01  1.403e+00   0.098   0.9252  
## street_num808 Eighth Way   1.000e+00  1.149e+00   0.870   0.4176  
## street_num889 Hickory Ln   7.059e-01  1.337e+00   0.528   0.6163  
## street_num909 Ninth Ave    2.176e-15  1.149e+00   0.000   1.0000  
## street_num990 Spruce Pl   -1.294e+00  1.337e+00  -0.968   0.3703  
## cityRiverton              -1.294e+00  6.826e-01  -1.896   0.1068  
## citySpringfield            5.686e-01  7.013e-01   0.811   0.4484  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8125 on 6 degrees of freedom
## Multiple R-squared:  0.8006, Adjusted R-squared:  0.03639 
## F-statistic: 1.048 on 23 and 6 DF,  p-value: 0.5232
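The recoding described in the introduction, and the way factor predictors use up degrees of freedom, can be sketched as follows. This is a minimal sketch on made-up data, not the original script: the data frame `shops`, its values, and the bad/ok/good = 1/2/3 scoring are assumptions; only the column names come from the report.

```r
# Toy data (hypothetical values); column names mirror those used in the report.
shops <- data.frame(
  coffee_quality = c("bad", "ok", "good", "ok", "good", "bad", "good", "ok"),
  city = c("Oakville", "Oakville", "Riverton", "Riverton",
           "Springfield", "Springfield", "Oakville", "Riverton"),
  stringsAsFactors = FALSE
)

# Assumed scoring: bad = 1, ok = 2, good = 3 (the report does not state the mapping).
quality_levels <- c("bad", "ok", "good")
shops$quality_score <- as.integer(factor(shops$coffee_quality, levels = quality_levels))

# A text column entered as a factor costs one coefficient per level beyond the first.
shops$city <- factor(shops$city)
fit <- lm(quality_score ~ city, data = shops)

# Residual degrees of freedom = records minus estimated coefficients.
n_records <- nrow(shops)
n_coefs <- length(coef(fit))
resid_df <- df.residual(fit)
print(c(records = n_records, coefficients = n_coefs, residual_df = resid_df))
```

In the output above, the same accounting gives 30 records minus 24 estimated coefficients (intercept included), which leaves the 6 residual degrees of freedom shown in the summary.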

Data Overview

The dataset contains 30 records and 8 variables. The models in this report use the street address, city, years in business, coffee quality, and growing columns.

Correlations

I created a correlation matrix to check for relationships among the numeric variables. Years in business and quality score are only weakly correlated (r = 0.24). Growing and quality score are moderately correlated (r = 0.41), while years in business and growing are essentially uncorrelated (r = 0.01).

##                   years_in_business    growing quality_score
## years_in_business        1.00000000 0.01210747     0.2375221
## growing                  0.01210747 1.00000000     0.4050555
## quality_score            0.23752214 0.40505554     1.0000000
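A matrix like the one above can be produced with base R's `cor()`. The sketch below uses made-up values in a hypothetical data frame `shops`; only the column names are taken from the output. Looking each pair up by row and column name helps avoid reading a value off the wrong cell.

```r
# Toy data (hypothetical values); column names follow the matrix above.
shops <- data.frame(
  years_in_business = c(2, 5, 9, 4, 12, 7),
  growing           = c(0, 1, 1, 0, 1, 0),
  quality_score     = c(1, 3, 3, 2, 2, 1)
)

# Pearson correlations between every pair of numeric columns.
cor_mat <- cor(shops)

# Index by name rather than position so each value is tied to its pair.
r_years_growing   <- cor_mat["years_in_business", "growing"]
r_growing_quality <- cor_mat["growing", "quality_score"]

print(round(cor_mat, 2))
```

The resulting matrix can then be handed to a plotting function such as `corrplot::corrplot(cor_mat)` if the corrplot package is installed.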

I also added a bar graph of average years in business for each coffee quality level. Average years in business is highest for shops rated ok, followed by good, and then bad.
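A grouped average of this kind can be computed with `tapply()` and drawn with `barplot()`. This is a sketch on invented numbers, assuming a hypothetical data frame `shops`; the values are chosen only to show the mechanics and are not the report's data.

```r
# Toy data (hypothetical values); column names follow the report.
shops <- data.frame(
  coffee_quality    = c("bad", "ok", "good", "ok", "good", "bad"),
  years_in_business = c(2, 10, 6, 8, 4, 4)
)

# Fix the level order so bars appear as bad, ok, good instead of alphabetically.
shops$coffee_quality <- factor(shops$coffee_quality, levels = c("bad", "ok", "good"))

# Mean years in business within each quality level.
avg_years <- tapply(shops$years_in_business, shops$coffee_quality, mean)

barplot(avg_years,
        xlab = "Coffee quality",
        ylab = "Average years in business")
```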

Histograms

Here we examine where the coffee shops are located. The shops are split evenly among three cities, Springfield, Riverton, and Oakville, with 10 shops in each and several streets per city. I have created a boxplot of years in business by city. The Oakville shops are the oldest, while Riverton has the most new shops.

## 
##    Oakville    Riverton Springfield 
##          10          10          10
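The city counts and the boxplot described above could be produced along these lines. The data frame `shops` and its values are hypothetical; the sketch only illustrates `table()`, a per-city median, and the formula interface of `boxplot()`.

```r
# Toy data (hypothetical values); column names follow the report.
shops <- data.frame(
  city = c("Oakville", "Riverton", "Springfield",
           "Oakville", "Riverton", "Springfield"),
  years_in_business = c(12, 2, 6, 9, 3, 7)
)

# Number of shops per city.
city_counts <- table(shops$city)
print(city_counts)

# Median age per city gives a quick numeric check on what the boxplot shows.
median_years <- tapply(shops$years_in_business, shops$city, median)
print(median_years)

# One box per city.
boxplot(years_in_business ~ city, data = shops,
        xlab = "City", ylab = "Years in business")
```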

Predicting growth

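I model growth with a linear regression on street, years_in_business, and coffee_quality. The R^2 value is 0.77, meaning the fitted model accounts for 77% of the variance in growth within this sample. However, the adjusted R^2 is negative (-0.31), the overall F-test is not significant (p = 0.74), and no individual coefficient is significant at the 0.05 level. With 24 predictor coefficients and 30 records, the high R^2 mostly reflects overfitting, so the output does not establish street address, years in business, or coffee quality as reliable predictors of growth. The street coefficients for 147 Birch Way, 445 Cedar Rd, and 159 Willow Ln are the largest in magnitude, but none is distinguishable from zero.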

## 
## Call:
## lm(formula = growing ~ street + years_in_business + coffee_quality, 
##     data = t)
## 
## Residuals:
##          1          2          3          4          5          6          7 
##  2.045e-01 -6.245e-17 -1.943e-16 -2.776e-17  3.750e-01 -1.041e-16 -2.498e-16 
##          8          9         10         11         12         13         14 
## -3.469e-17  3.030e-01 -3.469e-17  1.298e-15 -5.265e-01 -9.714e-17 -2.082e-17 
##         15         16         17         18         19         20         21 
##  8.030e-01 -8.333e-02 -7.633e-17 -7.633e-17 -1.180e-16 -1.402e-01 -3.561e-01 
##         22         23         24         25         26         27         28 
##  4.996e-16 -5.795e-01 -4.857e-17 -2.082e-17 -2.984e-16 -1.041e-16 -7.633e-17 
##         29         30 
## -7.633e-17  0.000e+00 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)           -0.22727    0.70462  -0.323    0.760
## street1010 Willow Dr   0.46970    1.05233   0.446    0.674
## street123 Main St      0.41288    0.67025   0.616    0.565
## street135 Pine Rd      0.58333    1.06535   0.548    0.608
## street147 Birch Way    1.86742    1.24601   1.499    0.194
## street159 Willow Ln    1.64015    1.08566   1.511    0.191
## street258 Spruce Ct    0.50379    0.99430   0.507    0.634
## street303 Third Rd     0.11364    0.84503   0.134    0.898
## street334 Pine St      0.25379    0.67025   0.379    0.720
## street369 Cedar Blvd   0.67424    0.92359   0.730    0.498
## street404 Fourth Blvd  0.44697    1.03619   0.431    0.684
## street445 Cedar Rd     1.81061    1.19549   1.515    0.190
## street456 Elm St       1.46970    1.05233   1.397    0.221
## street556 Elm Blvd     0.35606    1.07680   0.331    0.754
## street667 Maple Way    0.17045    0.87401   0.195    0.853
## street707 Seventh Ct   0.50379    0.99430   0.507    0.634
## street778 Birch Ct     0.58333    1.06535   0.548    0.608
## street789 Oak Ave      0.73106    0.92068   0.794    0.463
## street808 Eighth Way   0.64015    1.08566   0.590    0.581
## street889 Hickory Ln  -0.49621    0.99430  -0.499    0.639
## street909 Ninth Ave    0.88636    0.84503   1.049    0.342
## street990 Spruce Pl   -0.05682    0.82717  -0.069    0.948
## years_in_business      0.05682    0.09980   0.569    0.594
## coffee_qualityok      -0.69697    0.75366  -0.925    0.398
## coffee_qualitygood     0.32576    0.42282   0.770    0.476
## 
## Residual standard error: 0.5806 on 5 degrees of freedom
## Multiple R-squared:  0.7742, Adjusted R-squared:  -0.3094 
## F-statistic: 0.7145 on 24 and 5 DF,  p-value: 0.7401
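The gap between the two R-squared values in this summary follows from the standard adjustment for the number of predictors. As a worked check, taking n = 30 records and p = 24 predictor coefficients (the degrees of freedom on the F-statistic line) and the rounded R-squared of 0.7742:

```latex
\[
\bar{R}^2 \;=\; 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
        \;=\; 1 - (1 - 0.7742)\cdot\frac{29}{5}
        \;\approx\; -0.31
\]
```

This agrees with the reported adjusted value up to rounding. A negative adjusted R-squared indicates that the predictors explain less variance than would be expected from fitting that many coefficients to noise.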

Decision Tree

The decision tree model used in this analysis identified three key factors that influence whether a coffee shop is growing: coffee quality, years in business, and street address. Shops with higher coffee quality, particularly those rated as “good”, were more likely to experience growth. Additionally, shops that had been in business longer tended to show more signs of stability and expansion. Lastly, the street variable acted as a proxy for location, capturing geographic or neighborhood-level effects; certain addresses were consistently associated with better outcomes, suggesting that location plays a strategic role in success.
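The report does not show how the tree was fit. One common way to fit a classification tree in R is the `rpart` package, sketched below on made-up data. The choice of `rpart`, the data frame `shops`, its values, and the control settings are all assumptions; the street variable is left out to keep the toy example small.

```r
library(rpart)

set.seed(1)  # rpart's internal cross-validation uses random folds

# Toy data (hypothetical values); column names follow the report.
shops <- data.frame(
  growing = factor(c("no", "no", "no", "no", "no", "no",
                     "yes", "yes", "yes", "yes", "yes", "yes")),
  coffee_quality = factor(c("bad", "bad", "ok", "ok", "bad", "ok",
                            "good", "good", "good", "ok", "good", "good"),
                          levels = c("bad", "ok", "good")),
  years_in_business = c(1, 2, 3, 2, 4, 1, 6, 8, 5, 9, 7, 10)
)

# Classification tree; minsplit is lowered because the toy data set is tiny.
tree <- rpart(growing ~ coffee_quality + years_in_business,
              data = shops, method = "class",
              control = rpart.control(minsplit = 4, cp = 0.01))

print(tree)                     # the splits chosen
print(tree$variable.importance) # relative importance of each predictor

# In-sample predictions, useful only as a sanity check on the fit.
pred <- predict(tree, shops, type = "class")
```

In-sample accuracy on 30 records says little about how a tree generalizes, so a held-out or cross-validated check would be a sensible next step before acting on the splits.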

RPubs link: http://rpubs.com/zz00019/1304850