http://rpubs.com/kgc00005/1304861
This report presents a preliminary analysis of a dataset containing information about coffee shops across three U.S. cities (Springfield, Riverton, and Oakville). The objective is to identify patterns in shop density, ratings, and pricing that may inform strategic decisions about future locations and marketing approaches. I find that location and coffee quality together are the best predictors of coffee shop growth. A linear model using both has an adjusted R-squared of 0.45.
The dataset contains the following columns:
name: Name of the coffee shop
city: City where the shop is located
street: Street address
years in business: Number of years the shop has been in business
coffee quality: good, ok, or bad
growing: 1 if the shop is growing, 0 if not
coffee_quality_numeric: 3 for good, 2 for ok, 1 for bad
city_using_numbers: 1 for Springfield, 2 for Riverton, 3 for Oakville
There are 30 records and 6 original variables; coffee_quality_numeric and city_using_numbers are numeric encodings derived from coffee quality and city.
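The two numeric encodings described above can be sketched as simple lookups. This is a minimal illustration, not the report's actual code: the data frame `shops`, the column name `coffee_quality`, and the three example rows are hypothetical.

```r
# Hypothetical three-row example; these are NOT rows from the actual dataset.
shops <- data.frame(
  city = c("Springfield", "Riverton", "Oakville"),
  coffee_quality = c("good", "ok", "bad"),
  stringsAsFactors = FALSE
)

# Lookup vectors that mirror the codings listed in the column descriptions.
quality_codes <- c(good = 3, ok = 2, bad = 1)
city_codes    <- c(Springfield = 1, Riverton = 2, Oakville = 3)

# Index each lookup by the text value to get its numeric code.
shops$coffee_quality_numeric <- unname(quality_codes[shops$coffee_quality])
shops$city_using_numbers     <- unname(city_codes[shops$city])

print(shops)
```

Note that coding the city as 1, 2, 3 treats it as an ordered quantity, whereas the coffee quality coding follows a natural order from bad to good.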
##                        years_in_business coffee_quality_numeric
## years_in_business             1.00000000              0.2375221
## coffee_quality_numeric        0.23752214              1.0000000
## city_using_numbers            0.07926197             -0.2508375
## growing                       0.01210747              0.4050555
##                        city_using_numbers     growing
## years_in_business              0.07926197  0.01210747
## coffee_quality_numeric        -0.25083752  0.40505554
## city_using_numbers             1.00000000 -0.65465367
## growing                       -0.65465367  1.00000000
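To read the correlation matrix with the growth question in mind, it may help to rank the variables by how strongly each correlates with `growing`. The sketch below copies the three values from the `growing` column of the matrix above; the variable names `cor_with_growing` and `ranked` are introduced here for illustration only.

```r
# Correlations with `growing`, copied from the printed matrix above.
cor_with_growing <- c(
  years_in_business      = 0.01210747,
  coffee_quality_numeric = 0.40505554,
  city_using_numbers     = -0.65465367
)

# Rank by absolute strength, since the sign only gives the direction.
ranked <- cor_with_growing[order(abs(cor_with_growing), decreasing = TRUE)]
print(ranked)

# Squared correlation: share of variance in `growing` that each
# variable would explain on its own in a one-predictor linear fit.
print(round(ranked^2, 3))
```

On these numbers the city code has the strongest relationship with growth, coffee quality is second, and years in business is close to zero, which is consistent with the two predictors used in the regression later in the report.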
We begin by examining where most coffee shops are located.
## 
##    Oakville    Riverton Springfield 
##          10          10          10 
## 
##   101 First Ave  1010 Willow Dr     123 Main St     135 Pine Rd   147 Birch Way 
##               1               1               6               1               1 
##   159 Willow Ln   258 Spruce Ct    303 Third Rd     334 Pine St  369 Cedar Blvd 
##               1               1               1               4               1 
## 404 Fourth Blvd    445 Cedar Rd      456 Elm St    556 Elm Blvd   667 Maple Way 
##               1               1               1               1               1 
##  707 Seventh Ct    778 Birch Ct     789 Oak Ave  808 Eighth Way  889 Hickory Ln 
##               1               1               1               1               1 
##   909 Ninth Ave   990 Spruce Pl 
##               1               1 
## 
##  0  1 
## 16 14 
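I predict growth with a linear regression on two variables: the city, encoded as a single numeric value (1 to 3), and coffee quality, also encoded numerically (1 to 3). The adjusted R-squared is 0.45, so the model explains a moderate share of the variation in growth. The overall F-test p-value is 0.0001, which is significant. Of the two predictors, city is significant (p = 0.00029), while coffee quality is only marginally so (p = 0.081).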
##
## Call:
## lm(formula = growing ~ coffee_quality_numeric + city_using_numbers,
## data = t)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.83778 -0.27413 0.00466 0.16222 0.88344
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.88326 0.27874 3.169 0.00378 **
## coffee_quality_numeric 0.15756 0.08700 1.811 0.08126 .
## city_using_numbers -0.36061 0.08671 -4.159 0.00029 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3754 on 27 degrees of freedom
## Multiple R-squared: 0.4905, Adjusted R-squared: 0.4527
## F-statistic: 13 on 2 and 27 DF, p-value: 0.0001114
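The summary statistics above can be checked by hand from the multiple R-squared, and the coefficients can be turned into fitted values. The sketch below does both, assuming n = 30 records (consistent with the 27 residual degrees of freedom and 2 predictors). The helper `predict_growing` is a hypothetical name introduced here, not a function from the report.

```r
# Values copied from the regression summary above.
r2 <- 0.4905  # Multiple R-squared
n  <- 30      # number of records
p  <- 2       # number of predictors

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic and its p-value on (p, n - p - 1) degrees of freedom.
f_stat  <- (r2 / p) / ((1 - r2) / (n - p - 1))
p_value <- pf(f_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

# Fitted value for one shop, using the reported coefficient estimates.
predict_growing <- function(quality, city) {
  0.88326 + 0.15756 * quality - 0.36061 * city
}

best_case  <- predict_growing(quality = 3, city = 1)  # good coffee, Springfield
worst_case <- predict_growing(quality = 1, city = 3)  # bad coffee, Oakville

print(round(c(adj_r2 = adj_r2, f_stat = f_stat), 4))
print(signif(p_value, 3))
print(round(c(best_case = best_case, worst_case = worst_case), 3))
```

The recomputed values should agree with the summary up to rounding of the inputs. Because `growing` is a 0/1 variable fitted with an ordinary linear model, fitted values are not constrained to the 0 to 1 range: the worst-case combination here comes out slightly below zero.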
I create a decision tree to predict the growing variable. The value printed below is presumably the tree's classification accuracy, about 87%.
## [1] 0.8666667
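Assuming the value above is a classification accuracy, it can be put in context by comparing it with a naive baseline. The sketch below shows how accuracy is typically computed and what the majority-class baseline would be, given the 16 to 14 split in the table of `growing`. The `actual` and `predicted` vectors are hypothetical and are not the report's data or the tree's real predictions.

```r
# Hypothetical labels and predictions, for illustration only.
actual    <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0)
predicted <- c(1, 0, 1, 0, 0, 0, 1, 1, 1, 0)

# Accuracy is the share of predictions that match the true label.
accuracy <- mean(predicted == actual)

# Baseline from the table of `growing`: 16 shops not growing, 14 growing.
# Always predicting the majority class (0) would be right 16 times out of 30.
baseline <- 16 / 30

print(accuracy)
print(round(baseline, 3))
```

A reported accuracy of 0.87 would be well above the baseline of roughly 0.53. If it was measured on the same 30 records used to build the tree, it is likely to overstate how well the tree would do on new shops.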