1 Introduction

This document analyzes a dataset of coffee shops to understand the factors influencing their growth. The dataset includes information on the coffee shop’s name, street, city, years in business, coffee quality, and whether the shop is growing. We will explore the relationships between these variables using visualizations and statistical models.

1.1 Growth by Coffee Quality

1.2 Growth by City

1.3 Correlation Matrix

The correlation matrix shows the pairwise relationships between the variables in the dataset. The correlation between city_num and growing is positive, indicating that certain cities tend to have more growing coffee shops. Because city_num is a numeric encoding of a categorical variable, the sign and size of this correlation depend on how the cities are ordered. The correlation between coffee_quality and growing is also positive, suggesting that higher coffee quality is associated with growth. The correlation between city_num and coffee_quality is weaker, which suggests that the two predictors carry largely separate information and that the city effect on growth is not simply a reflection of differences in coffee quality across cities.
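As a rough illustration of how such a matrix can be computed, the sketch below builds a small simulated data frame and calls `cor()` on it. The data are simulated and are not the coffee shop dataset; the integer codes 1 to 3 for `city_num` and `coffee_quality_num`, the data frame name `shops`, and the data-generating coefficients are all assumptions made for the example.

```r
# Simulated stand-in data (NOT the original dataset).
# Assumption: city and coffee quality are each coded as integers 1..3.
set.seed(1)
n <- 30
shops <- data.frame(
  city_num           = sample(1:3, n, replace = TRUE),
  coffee_quality_num = sample(1:3, n, replace = TRUE)
)

# Simulate a 0/1 growth indicator that depends on both predictors.
linear_pred   <- -6.3 + 2.1 * shops$city_num + 1.0 * shops$coffee_quality_num
shops$growing <- rbinom(n, size = 1, prob = plogis(linear_pred))

# Pearson correlation matrix of all three numeric columns.
cor_matrix <- cor(shops)
print(round(cor_matrix, 2))
```

Pearson correlation treats the integer codes as if they were evenly spaced quantities, so it is best read as a quick screening tool here rather than a measure of effect size.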

1.4 Logistic Regression Model

## 
## Call:
## glm(formula = growing ~ city_num + coffee_quality_num, family = binomial, 
##     data = t)
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)   
## (Intercept)         -6.3146     2.0804  -3.035  0.00240 **
## city_num             2.1409     0.7718   2.774  0.00554 **
## coffee_quality_num   1.0020     0.6385   1.569  0.11658   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 41.455  on 29  degrees of freedom
## Residual deviance: 23.927  on 27  degrees of freedom
## AIC: 29.927
## 
## Number of Fisher Scoring iterations: 5
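
The summary above can be unpacked with a little arithmetic. The sketch below copies the reported estimates, standard errors, and deviances by hand (it does not refit the model) and derives the odds ratios, the Wald z statistics, the AIC, and a likelihood-ratio test against the intercept-only model. It assumes the response is a 0/1 indicator, in which case the residual deviance equals minus twice the log-likelihood.

```r
# Values copied from the glm summary above; nothing is refitted here.
coefs   <- c("(Intercept)" = -6.3146, city_num = 2.1409, coffee_quality_num = 1.0020)
std_err <- c(2.0804, 0.7718, 0.6385)

# Odds ratios: multiplicative change in the odds of growing
# per one-unit increase in each predictor.
odds_ratios <- exp(coefs)

# Wald z statistics and two-sided p-values, as in the summary table.
z <- coefs / std_err
p <- 2 * pnorm(-abs(z))

# AIC = residual deviance + 2 * (number of estimated coefficients),
# valid here under the 0/1-response assumption stated above.
null_dev  <- 41.455
resid_dev <- 23.927
aic <- resid_dev + 2 * length(coefs)

# Likelihood-ratio test of the full model against the intercept-only model.
lr_stat <- null_dev - resid_dev   # drop in deviance
lr_df   <- 29 - 27                # difference in degrees of freedom
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(round(odds_ratios, 2))
print(round(cbind(z = z, p = p), 4))
print(c(AIC = aic, LR_stat = lr_stat, LR_p = lr_p))
```

Read this way, a one-unit increase in city_num multiplies the odds of growth by roughly 8.5, and a one-unit increase in coffee_quality_num by roughly 2.7, although only the city effect is significant at the 5% level. The recomputed AIC matches the reported 29.927.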

1.5 Predicted Growth vs Actual Growth

Predicted growth is higher for coffee shops with good coffee quality, and the city also plays a significant role. The model predicts that coffee shops in certain cities are more likely to grow even if they have lower coffee quality, which is consistent with the larger and statistically significant city_num coefficient in the logistic regression.
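
To make the pattern concrete, the following sketch turns the reported coefficients into predicted probabilities over a small grid. The encoding is an assumption: the source does not state how city_num and coffee_quality_num are coded, so integer codes 1 to 3 are used here purely for illustration.

```r
# Coefficients copied from the logistic regression summary.
coefs <- c(intercept = -6.3146, city_num = 2.1409, coffee_quality_num = 1.0020)

# Hypothetical grid: assumes both predictors take integer codes 1..3.
grid <- expand.grid(city_num = 1:3, coffee_quality_num = 1:3)

# Linear predictor (log-odds), then inverse logit to get probabilities.
grid$eta <- coefs[["intercept"]] +
  coefs[["city_num"]] * grid$city_num +
  coefs[["coffee_quality_num"]] * grid$coffee_quality_num
grid$p_growing <- plogis(grid$eta)

# One probability per cell, laid out as city (rows) by quality (columns).
prob_table <- xtabs(p_growing ~ city_num + coffee_quality_num, data = grid)
print(round(prob_table, 2))
```

Under this assumed coding, a shop with the highest city code and the lowest quality code gets a predicted probability of about 0.75, while a shop with the lowest city code and the highest quality code gets about 0.24.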

1.6 Decision Tree Model

The decision tree ranks city as the most important variable for predicting growth, followed by coffee quality. The tree's top-level split separates the data into two main branches: one for good coffee quality and one for ok or bad coffee quality. Within the good coffee quality branch, there are further splits based on the city. Both variables therefore contribute to the tree's predictions: coffee quality defines the first split, and city refines the prediction within it.
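
A tree of this kind could be fitted roughly as follows. This is a sketch on simulated data, not the original analysis: the source does not say which package was used, so `rpart` is an assumption, as are the city labels, the quality levels, the control settings, and the data-generating coefficients.

```r
library(rpart)  # assumption: the source does not name the tree package

# Simulated stand-in data (NOT the original dataset).
set.seed(1)
n <- 30
shops <- data.frame(
  city = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  coffee_quality = factor(sample(c("bad", "ok", "good"), n, replace = TRUE),
                          levels = c("bad", "ok", "good"))
)

# Simulate growth from integer codes of the two factors.
eta <- -6.3 + 2.1 * as.integer(shops$city) + 1.0 * as.integer(shops$coffee_quality)
shops$growing <- factor(rbinom(n, 1, plogis(eta)),
                        levels = c(0, 1), labels = c("no", "yes"))

# Classification tree; minsplit is lowered because the sample is small.
tree <- rpart(growing ~ city + coffee_quality, data = shops,
              method = "class",
              control = rpart.control(minsplit = 5, cp = 0.01))

print(tree)                      # text view of the splits
print(tree$variable.importance)  # NULL if the tree made no splits
```

In `rpart`, variable importance also credits a variable for its role as a surrogate split, so the top-ranked variable is not necessarily the one used at the root of the tree.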

1.7 Model Evaluation on Test Set

## [1] "Test Accuracy: 91.67 %"
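
For reference, a test accuracy of this form can be computed as in the sketch below, which uses simulated data and a logistic regression. The 30/12 train/test split, the 0.5 classification threshold, and the choice of model are assumptions; the source states neither the test set size nor which model produced the figure above.

```r
# Simulated stand-in data (NOT the original dataset).
set.seed(42)
n <- 42
shops <- data.frame(
  city_num           = sample(1:3, n, replace = TRUE),
  coffee_quality_num = sample(1:3, n, replace = TRUE)
)
eta <- -6.3 + 2.1 * shops$city_num + 1.0 * shops$coffee_quality_num
shops$growing <- rbinom(n, 1, plogis(eta))

# Hypothetical split: 30 rows for training, the remaining 12 for testing.
train_idx <- sample(seq_len(n), size = 30)
train <- shops[train_idx, ]
test  <- shops[-train_idx, ]

# Fit on the training rows only.
fit <- glm(growing ~ city_num + coffee_quality_num,
           family = binomial, data = train)

# Predict probabilities on held-out rows and classify at 0.5.
p_hat <- predict(fit, newdata = test, type = "response")
pred  <- as.integer(p_hat > 0.5)

# Accuracy = share of held-out shops classified correctly.
accuracy <- mean(pred == test$growing)
print(paste("Test Accuracy:", round(accuracy * 100, 2), "%"))
```

With a test set this small, each misclassified shop moves the accuracy by several percentage points, so the figure is a coarse estimate.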

1.8 Conclusion

In this analysis, we explored the relationships between coffee shop growth, city, and coffee quality. In the logistic regression, city is a statistically significant predictor of growth (p ≈ 0.006), while coffee quality has a positive coefficient that is not significant at the 5% level (p ≈ 0.117). The decision tree uses both variables and provides a clear visual representation of how they interact to influence growth. On the test set, the model reached 91.67% accuracy, although the logistic regression was fitted on only 30 observations (29 null degrees of freedom), so these results should be read with caution.

https://rpubs.com/seesaw9/1304872