This report presents a preliminary analysis of a dataset containing information about coffee shops across three U.S. cities. The objective is to identify patterns in shop location, coffee quality, and years in business that may inform strategic decisions about future locations and marketing approaches. We explore whether shop experience (years in business), coffee quality, or city best predicts growth. We evaluate these predictors using linear regression and decision trees.
The dataset contains the following columns:
- name: Name of the coffee shop
- city: City where the shop is located
- street: Street address
- years_in_business: Number of years the shop has been in business
- coffee_quality: good, ok, or bad
- growing: 1 if the shop is growing, 0 if not

There are 30 records and 6 variables.
##                   years_in_business coffee_quality    growing
## years_in_business        1.00000000      0.2375221 0.01210747
## coffee_quality           0.23752214      1.0000000 0.40505554
## growing                  0.01210747      0.4050555 1.00000000
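The correlation matrix treats coffee_quality as a number even though the column is described as good, ok, or bad. The report does not show how that conversion was done, so the following is only a minimal sketch of one possible encoding. The data frame `shops`, its values, and the coding bad = 1, ok = 2, good = 3 are all assumptions made for illustration.

```r
# Hypothetical stand-in for the report's data; the values are made up.
shops <- data.frame(
  years_in_business = c(2, 5, 9, 1, 7, 4),
  coffee_quality    = c("ok", "good", "good", "bad", "ok", "good"),
  growing           = c(0, 1, 1, 0, 0, 1)
)

# Assumed ordinal coding: bad = 1, ok = 2, good = 3.
quality_levels <- c("bad", "ok", "good")
shops$coffee_quality <- as.integer(
  factor(shops$coffee_quality, levels = quality_levels)
)

# Pearson correlation matrix of the three numeric columns.
cor_mat <- cor(shops[, c("years_in_business", "coffee_quality", "growing")])
print(round(cor_mat, 3))
```

An integer coding like this assumes the steps from bad to ok and from ok to good are equally large, which is a modeling choice rather than a fact about the data.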
We begin by examining where most coffee shops are located.
##
##    Oakville    Riverton Springfield
##          10          10          10
##
##   101 First Ave  1010 Willow Dr     123 Main St     135 Pine Rd   147 Birch Way
##               1               1               6               1               1
##   159 Willow Ln   258 Spruce Ct    303 Third Rd     334 Pine St  369 Cedar Blvd
##               1               1               1               4               1
## 404 Fourth Blvd    445 Cedar Rd      456 Elm St    556 Elm Blvd   667 Maple Way
##               1               1               1               1               1
##  707 Seventh Ct    778 Birch Ct     789 Oak Ave  808 Eighth Way  889 Hickory Ln
##               1               1               1               1               1
##   909 Ninth Ave   990 Spruce Pl
##               1               1
We apply PCA to identify underlying patterns among the numeric variables.
## Importance of components:
##                           PC1    PC2    PC3
## Standard deviation     1.2145 0.9947 0.7319
## Proportion of Variance 0.4916 0.3298 0.1785
## Cumulative Proportion  0.4916 0.8215 1.0000
The PCA shows that the first principal component captures about 49% of
the variance in the data, and the first two components together capture
about 82%. The first component is driven primarily by coffee quality and
growing. This suggests a shared structure across these variables that
may relate to shop success.
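The PCA call itself is not shown in the report. The sketch below shows how such a summary is typically produced with `prcomp()`, assuming the three numeric variables were centered and scaled. The reported standard deviations square to roughly 1.475, 0.989, and 0.536, which sum to about 3, and that is consistent with scaling three variables. The data frame `shops` and its values are simulated stand-ins, so the numbers will not match the output above.

```r
set.seed(1)

# Hypothetical data: 30 rows like the report, but the values are simulated.
n <- 30
shops <- data.frame(
  years_in_business = sample(1:15, n, replace = TRUE),
  coffee_quality    = sample(1:3, n, replace = TRUE),  # assumed numeric coding
  growing           = rbinom(n, 1, 0.5)
)

# Center and scale so each variable contributes unit variance.
pca <- prcomp(shops, center = TRUE, scale. = TRUE)

# Standard deviations and variance shares per component.
print(summary(pca))

# Loadings: how strongly each variable contributes to each component.
print(round(pca$rotation, 3))

# Share of total variance per component, computed by hand.
var_share <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_share, 4))
```

Inspecting `pca$rotation` is the direct way to check which variables drive the first component.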
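I predict growth from years in business, coffee quality, and city using a linear regression. The model explains about 49% of the variance in growth (adjusted R-squared 0.41), so it is moderately rather than highly predictive. Oakville does not appear in the coefficient table because it is the reference level for city; the Riverton and Springfield coefficients are differences relative to Oakville.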
m <- lm(growing ~ years_in_business + coffee_quality + city, data = t)
summary(m)
##
## Call:
## lm(formula = growing ~ years_in_business + coffee_quality + city,
##     data = t)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.83615 -0.27474  0.00422  0.16300  0.88265
##
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)       -0.1989328  0.2515354  -0.791 0.436457
## years_in_business -0.0004254  0.0288360  -0.015 0.988348
## coffee_quality     0.1585644  0.1007563   1.574 0.128119
## cityRiverton       0.3629578  0.1802407   2.014 0.054919 .
## citySpringfield    0.7205051  0.1830572   3.936 0.000584 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3901 on 25 degrees of freedom
## Multiple R-squared: 0.4905, Adjusted R-squared: 0.409
## F-statistic: 6.017 on 4 and 25 DF, p-value: 0.001558
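The coefficient table lists cityRiverton and citySpringfield but no Oakville row. This is how R's default treatment coding handles a factor: the first level becomes the baseline and is absorbed into the intercept. A minimal sketch of that behavior follows, using a small made-up `city` vector rather than the report's data.

```r
# Hypothetical city vector with the same three levels as the report.
city <- factor(c("Oakville", "Riverton", "Springfield", "Riverton"))

# Levels are sorted alphabetically, so Oakville comes first and is the reference.
print(levels(city))

# The design matrix has dummy columns only for the non-reference levels.
X <- model.matrix(~ city)
print(X)

# Choosing a different reference level changes which city is omitted.
city_ref <- relevel(city, ref = "Springfield")
X_ref <- model.matrix(~ city_ref)
print(colnames(X_ref))
```

Each city coefficient in the regression is therefore the estimated difference in growth relative to the reference city, holding the other predictors fixed.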
# Street was left out of the linear regression model because it has too many distinct values for 30 records. The adjusted R-squared was also negative with street included.
# City was included because it is a factor variable and I wanted to see whether it was significant. Including it increased R-squared.
# Fixed the typo in the header from "gorwth" to "Growth".
# Evaluation model
# Added a held-out evaluation to test how well the model predicts growth on data it was not fitted to.
set.seed(123)
# Split data
train_idx <- sample(seq_len(nrow(t)), size = 0.7 * nrow(t))
train <- t[train_idx, ]
test <- t[-train_idx, ]
# Fit on training
model <- lm(growing ~ years_in_business + coffee_quality + city, data = train)
# Predict on test
pred <- predict(model, newdata = test)
# Calculate Mean Squared Error (MSE)
mse <- mean((test$growing - pred)^2)
mse
## [1] 0.1049407
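A test MSE is hard to judge on its own, so it can help to compare it with a naive baseline that always predicts the training-set growth rate. The sketch below shows one way to do that, and also reports classification accuracy at a 0.5 cutoff. The data frame `shops` is simulated and the 0.5 cutoff is an assumption, so the numbers will not match the 0.1049 reported above.

```r
set.seed(123)

# Hypothetical data with the same columns as the model; the values are simulated.
n <- 30
shops <- data.frame(
  years_in_business = sample(1:15, n, replace = TRUE),
  coffee_quality    = sample(1:3, n, replace = TRUE),
  city              = factor(rep(c("Oakville", "Riverton", "Springfield"), each = 10)),
  growing           = rbinom(n, 1, 0.5)
)

# 70/30 split, as in the evaluation above.
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- shops[train_idx, ]
test  <- shops[-train_idx, ]

fit  <- lm(growing ~ years_in_business + coffee_quality + city, data = train)
pred <- predict(fit, newdata = test)

# Model error versus a baseline that always predicts the training mean.
mse_model    <- mean((test$growing - pred)^2)
mse_baseline <- mean((test$growing - mean(train$growing))^2)

# Accuracy when predictions above 0.5 are treated as "growing" (assumed cutoff).
accuracy <- mean((pred > 0.5) == (test$growing == 1))

print(c(model = mse_model, baseline = mse_baseline, accuracy = accuracy))
```

With only 9 test records, any of these figures can shift noticeably with a different seed, so repeating the split or using cross-validation would give a steadier estimate.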
I create a decision tree to predict the growth variable.
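The tree-fitting code is not included in the report. The following is a minimal sketch of how a classification tree for growth could be fitted with `rpart`. The data frame `shops`, the growth probabilities used to simulate it, and the control settings `minsplit = 5` and `cp = 0.01` are all assumptions chosen for illustration.

```r
library(rpart)

set.seed(42)

# Hypothetical data: growth is simulated to depend on city (made-up probabilities).
n <- 30
city <- factor(rep(c("Oakville", "Riverton", "Springfield"), each = 10))
p_grow <- c(Oakville = 0.1, Riverton = 0.5, Springfield = 0.9)[as.character(city)]
shops <- data.frame(
  years_in_business = sample(1:15, n, replace = TRUE),
  coffee_quality    = sample(1:3, n, replace = TRUE),
  city              = city,
  growing           = factor(rbinom(n, 1, p_grow), levels = c(0, 1))
)

# Classification tree; a small minsplit is assumed because there are only 30 rows.
tree <- rpart(
  growing ~ years_in_business + coffee_quality + city,
  data    = shops,
  method  = "class",
  control = rpart.control(minsplit = 5, cp = 0.01)
)
print(tree)

# In-sample confusion table of actual versus predicted class.
pred_class <- predict(tree, type = "class")
conf_tab <- table(actual = shops$growing, predicted = pred_class)
print(conf_tab)

# If the rpart.plot package is installed, rpart.plot::rpart.plot(tree) draws the tree.
```

The confusion table here is computed on the training data, so it overstates how well the tree would do on new shops.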
We visualize how the model’s predicted values compare to actual growth outcomes.
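The plotting code is not shown in the report. As a rough sketch, one simple way to compare predictions with outcomes is a boxplot of fitted values grouped by actual growth. Base R graphics are used here to avoid extra dependencies; the data frame `shops` is simulated, and the dashed line at 0.5 is an assumed cutoff, not something the report specifies.

```r
set.seed(7)

# Hypothetical data with the same columns as the model; the values are simulated.
n <- 30
shops <- data.frame(
  years_in_business = sample(1:15, n, replace = TRUE),
  coffee_quality    = sample(1:3, n, replace = TRUE),
  city              = factor(rep(c("Oakville", "Riverton", "Springfield"), each = 10)),
  growing           = rbinom(n, 1, 0.5)
)

fit <- lm(growing ~ years_in_business + coffee_quality + city, data = shops)
shops$predicted <- fitted(fit)

# Distribution of predicted values for shops that are and are not growing.
boxplot(
  predicted ~ growing,
  data = shops,
  xlab = "Actual growth (0 = no, 1 = yes)",
  ylab = "Predicted value",
  main = "Predicted vs. actual growth"
)
abline(h = 0.5, lty = 2)  # assumed classification cutoff
```

A useful model shows the two boxes clearly separated, with most predictions for growing shops above the dashed line and most predictions for the others below it.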
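This analysis examined factors influencing coffee shop growth, focusing on years in business, coffee quality, and city, using both a regression and a decision tree. In the regression, city was the clearest predictor: Springfield shops had significantly higher predicted growth than Oakville shops (p < 0.001), and the Riverton effect was marginal (p ≈ 0.055). Coffee quality had a positive coefficient and the strongest pairwise correlation with growth (0.41), but it was not statistically significant in the regression (p ≈ 0.13). Years in business added essentially nothing (p ≈ 0.99). With only 30 records, these results are preliminary. Future work could include more detailed variables and more shops to improve accuracy, but this provides a starting point for understanding growth trends.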