Exam
You have received the report below from a junior analyst. She is a new hire and has been working on this report for a few weeks. She has asked you to review it. Make corrections to the document, and explain each of your changes in a comment (in a code block).
Upload the finished Rmd to eCampus. You should be able to publish the document; put the link at the top of the document.
Delete all of these instructions (everything between the === bars)
This report presents a preliminary analysis of a dataset containing information about coffee shops across various U.S. cities. The objective is to identify patterns in shop location, coffee quality, and growth that may inform strategic decisions about future locations and marketing approaches. I find that using street is the best predictor of coffee shop growth. My R² value is .77.
The dataset contains the following columns:
name: Name of the coffee shop
city: City where the shop is located
street: Street address
years_in_business: years in business
coffee_quality: good, ok, or bad.
growing: 1 if the shop is growing, 0 if not

There are 30 records and 6 variables.
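The column list and the record count can be checked directly in R. The sketch below uses a small made-up data frame (the object name `shops` and every value in it are hypothetical, not the report's data) only to show which calls would confirm the dimensions, the column types, and the absence of missing values.

```r
# Hypothetical toy data with the six columns described in the report.
# The values are invented for illustration only.
shops <- data.frame(
  name = c("Bean There", "Daily Grind", "Java Hut", "Mocha Spot"),
  city = c("Oakville", "Riverton", "Springfield", "Oakville"),
  street = c("123 Main St", "334 Pine St", "456 Elm St", "123 Main St"),
  years_in_business = c(3, 10, 1, 7),
  coffee_quality = factor(c("good", "ok", "bad", "good"),
                          levels = c("bad", "ok", "good")),
  growing = c(1L, 0L, 0L, 1L),
  stringsAsFactors = FALSE
)

shop_dims <- dim(shops)                 # rows (records) and columns (variables)
missing_per_col <- colSums(is.na(shops))  # count of NA values in each column

shop_dims
str(shops)                              # type of each column
missing_per_col
```

Running the same three checks on the real data frame would show whether the stated record and variable counts hold.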
These correlations show how the variables relate to one another.
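A correlation matrix of this kind could be produced roughly as follows. This is a sketch under assumptions: the data are made up, the quality labels are mapped to ranks with an assumed coding (bad = 1, ok = 2, good = 3), and Spearman's rank correlation is chosen here because the quality variable is ordinal. The report does not state which method or coding it used.

```r
# Hypothetical toy data; values are invented for illustration only.
shops <- data.frame(
  years_in_business = c(3, 10, 1, 7, 5, 2),
  coffee_quality = c("good", "ok", "bad", "good", "ok", "bad"),
  growing = c(1, 0, 0, 1, 1, 0)
)

# Assumed coding of the ordered quality labels: bad = 1, ok = 2, good = 3.
shops$coffee_quality_numeric <- as.integer(
  factor(shops$coffee_quality, levels = c("bad", "ok", "good"))
)

# Keep only numeric columns; cor() does not accept character columns.
num_cols <- shops[sapply(shops, is.numeric)]

# Spearman's rank correlation, a choice made for this sketch.
cor_matrix <- cor(num_cols, method = "spearman")
round(cor_matrix, 2)
```

Nominal variables such as city or street have no natural order, so a numeric coding of them would not give a meaningful correlation coefficient.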
We begin by examining where most coffee shops are located.
## 
##    Oakville    Riverton Springfield 
##          10          10          10 
## 
##   101 First Ave  1010 Willow Dr     123 Main St     135 Pine Rd   147 Birch Way 
##               1               1               6               1               1 
##   159 Willow Ln   258 Spruce Ct    303 Third Rd     334 Pine St  369 Cedar Blvd 
##               1               1               1               4               1 
## 404 Fourth Blvd    445 Cedar Rd      456 Elm St    556 Elm Blvd   667 Maple Way 
##               1               1               1               1               1 
##  707 Seventh Ct    778 Birch Ct     789 Oak Ave  808 Eighth Way  889 Hickory Ln 
##               1               1               1               1               1 
##   909 Ninth Ave   990 Spruce Pl 
##               1               1 
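Frequency tables like the two above are typically produced with `table()`. The sketch below uses made-up rows (the object name `shops` and its values are hypothetical) and also sorts the street counts and cross-tabulates street by city, since the same street address can occur in more than one city.

```r
# Hypothetical toy data; values are invented for illustration only.
shops <- data.frame(
  city = c("Oakville", "Oakville", "Riverton",
           "Springfield", "Springfield", "Springfield"),
  street = c("123 Main St", "123 Main St", "334 Pine St",
             "123 Main St", "456 Elm St", "334 Pine St")
)

# Number of shops per city.
city_counts <- table(shops$city)

# Number of shops per street address, most frequent first.
street_counts <- sort(table(shops$street), decreasing = TRUE)

# Street by city, to see whether a repeated address spans several cities.
city_by_street <- table(shops$city, shops$street)

city_counts
street_counts
city_by_street
```

Sorting the counts makes the most common addresses easier to spot than the default alphabetical order.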
I predict growth by using a number of variables. The model is highly predictive. This model works for all cities.
##
## Call:
## lm(formula = growing ~ city_numeric + coffee_quality_numeric,
## data = t2)
##
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.83778 -0.27413  0.00466  0.16222  0.88344 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             1.51352    0.22684   6.672 3.67e-07 ***
## city_numeric           -0.36061    0.08671  -4.159  0.00029 ***
## coffee_quality_numeric -0.15756    0.08700  -1.811  0.08126 .  
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3754 on 27 degrees of freedom
## Multiple R-squared: 0.4905, Adjusted R-squared: 0.4527
## F-statistic: 13 on 2 and 27 DF, p-value: 0.0001114
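The call shown in the output can be reproduced in outline as follows. This is a sketch with made-up data: the name `t2` and the two predictor names come from the output above, but the values and the numeric codings (cities as 1 to 3, quality as 1 to 3) are assumptions, because the report does not show how those columns were built.

```r
# Hypothetical toy data; values and numeric codings are assumptions.
t2 <- data.frame(
  city_numeric = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  coffee_quality_numeric = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  growing = c(1, 1, 1, 1, 0, 1, 0, 0, 0)
)

# Same formula as in the report's output.
fit <- lm(growing ~ city_numeric + coffee_quality_numeric, data = t2)

# Pull the headline numbers out of the summary rather than retyping them.
fit_summary <- summary(fit)
r_squared <- fit_summary$r.squared
adj_r_squared <- fit_summary$adj.r.squared

# With a 0/1 outcome, lm() fits a linear probability model, so the
# fitted values are not constrained to lie between 0 and 1.
fitted_range <- range(fitted(fit))

r_squared
adj_r_squared
fitted_range
```

Reading R-squared from `summary(fit)` keeps any value quoted in the text tied to the model that was actually fitted.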
I create a decision tree to predict the growth variable.
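A classification tree for a 0/1 outcome could be fitted along these lines. This is a sketch under assumptions: it uses the `rpart` package, which the report does not name, along with made-up data and an assumed choice of predictors (`city` and `coffee_quality`). The small `minsplit` value is there only so that the toy data can be split at all.

```r
library(rpart)

# Hypothetical toy data; values are invented for illustration only.
t2 <- data.frame(
  city = factor(rep(c("Oakville", "Riverton", "Springfield"), each = 4)),
  coffee_quality = factor(rep(c("bad", "ok", "good", "good"), times = 3),
                          levels = c("bad", "ok", "good")),
  growing = factor(c(1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0))
)

# method = "class" requests a classification tree for the factor outcome.
tree <- rpart(
  growing ~ city + coffee_quality,
  data = t2,
  method = "class",
  control = rpart.control(minsplit = 4, cp = 0.01)
)

# Predicted class for each row, compared with the observed class.
pred <- predict(tree, newdata = t2, type = "class")
confusion <- table(actual = t2$growing, predicted = pred)

print(tree)
confusion
```

A confusion table computed on the training rows tends to overstate accuracy, so a held-out set or cross-validation would give a fairer picture.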