========================================

Exam

You have received the report below from a junior analyst. She is a new hire and has been working on this report for a few weeks. She has asked you to review it. You should make corrections to the document and explain each of your changes in a comment (in a code block).

Upload the finished Rmd to eCampus. You should be able to publish the document; put the link at the top of the document.

Delete all of these instructions (everything between the === bars)

==============Exam Bad==========================

1 Introduction

The plot shows that X is not linked to Y.

plot(1:5, 1:5)

==============Exam Fixed by you to be better!==========================

1.1 Introduction

The plot shows that X and Y are linked.

# Section Changes:
#   Fixed typo in introduction,
#   Added text to show that x and y are linked, 
#   Added title to plot.

plot(1:5, 1:5, 
     main = "X and Y are linked",
     xlab = "X",
     ylab = "Y")

====================end of instructions and start of test====================

1.2 Introduction

This report presents a preliminary analysis of a dataset of coffee shops across several U.S. cities. The objective is to identify patterns in shop density, coffee quality, and growth status to inform decisions about future locations and strategies. Preliminary modeling suggests that coffee quality and city are associated with growth, with a linear model R² of 0.49.

# Comments:
# - Corrected spelling of “Introcution” to “Introduction”.
# - Revised language to be more precise and formal.

3.1 Data Overview

The dataset contains the following variables:

name: Name of the coffee shop
city: City where the shop is located
street: Street address of the shop
years_in_business: Number of years the shop has been in business
coffee_quality: Coffee quality rating (“good”, “ok”, or “bad”)
growing: Binary indicator — 1 if the shop is growing, 0 if not

There are 30 observations and 6 variables in the dataset.

# Comments:
# - Changed “columns” to “variables” (more precise for datasets).
# - Added missing backticks for consistency.
# - Capitalized descriptions for better readability.
# - Clarified phrasing for `years_in_business` and `growing`.
# - Changed “records” to “observations” (standard in data analysis).

8.1 Correlations

# Comment: Converted 'coffee_quality' to numeric before correlation.
# 'good' = 3, 'ok' = 2, 'bad' = 1

t_cor <- t %>%
  mutate(coffee_quality_num = as.numeric(factor(coffee_quality, 
                                                levels = c("bad", "ok", "good")))) %>%
  select(years_in_business, coffee_quality_num, growing)

cor(t_cor)

##                    years_in_business coffee_quality_num    growing
## years_in_business         1.00000000          0.2375221 0.01210747
## coffee_quality_num        0.23752214          1.0000000 0.40505554
## growing                   0.01210747          0.4050555 1.00000000

library(ggplot2)  # the heatmap below is drawn with ggplot2; corrplot is not required

corr_matrix <- round(cor(t_cor), 2)

# Convert to long format for ggplot
corr_long <- as.data.frame(as.table(corr_matrix))

# Plot with ggplot
ggplot(corr_long, aes(Var1, Var2, fill = Freq)) +
  geom_tile(color = "white") +
  geom_text(aes(label = Freq), size = 4) +
  scale_fill_gradient2(low = "red", mid = "white", high = "steelblue", midpoint = 0) +
  labs(title = "Correlation Matrix", x = "", y = "") +
  theme_minimal()

# Comment: Added a correlation matrix heatmap.

Overall, the strongest signal in the correlation matrix is the moderate positive association between coffee quality and growth (r ≈ 0.41), which is consistent with product quality playing a role in a coffee shop’s success. Years in business is essentially uncorrelated with growth (r ≈ 0.01).
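The ordinal encoding behind the correlation matrix depends on the order of the factor levels. The sketch below isolates that step on a small made-up vector (the values in `quality` and `growing` are illustrative assumptions, not rows from the dataset) and shows a Spearman rank correlation as one possible alternative for ordinal data.

```r
# Illustrative values only; not taken from the coffee shop dataset
quality <- c("good", "bad", "ok", "good")
growing <- c(1, 0, 0, 1)

# Explicit level order: bad = 1, ok = 2, good = 3
quality_num <- as.numeric(factor(quality, levels = c("bad", "ok", "good")))
quality_num
# [1] 3 1 2 3

# Without 'levels', factor() sorts alphabetically (bad, good, ok),
# so "ok" would silently rank above "good"
quality_alpha <- as.numeric(factor(quality))
quality_alpha
# [1] 2 1 3 2

# Rank-based correlation, which only relies on the ordering of the codes
rho <- cor(quality_num, growing, method = "spearman")
```

Setting `levels` explicitly, as the report's code does, is what keeps the 1 to 3 scale meaningful.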

9 Histograms

We begin by examining the distribution of coffee shops across cities, their years in business, and their current growth status. The first plot, a bar chart, shows the number of shops in each city, helping us identify regional concentration. The second plot, a histogram, shows how long shops have been operating, which may relate to stability or experience. Finally, a pie chart displays the proportion of shops that are currently growing versus not growing, giving insight into overall business trends.

# Bar chart: Number of shops by city
ggplot(t, aes(x = city)) +
  geom_bar(fill = "steelblue") +
  labs(title = "Coffee Shops by City", x = "City", y = "Count") +
  theme_minimal()

# Histogram: Years in business
ggplot(t, aes(x = years_in_business)) +
  geom_histogram(binwidth = 1, fill = "darkorange", color = "white") +
  labs(title = "Distribution of Years in Business", x = "Years", y = "Count") +
  theme_minimal()

growth_counts <- t %>%
  count(growing) %>%
  mutate(label = factor(growing, labels = c("Not Growing", "Growing")),
         pct = round(n / sum(n) * 100, 1),
         label_text = paste0(label, " (", pct, "%)"))

# Plot pie chart
ggplot(growth_counts, aes(x = "", y = n, fill = label)) +
  geom_col(width = 1, color = "white") +
  coord_polar(theta = "y") +
  geom_text(aes(label = label_text), position = position_stack(vjust = 0.5), color = "black") +
  scale_fill_manual(values = c("tomato", "seagreen")) +
  labs(title = "Coffee Shop Growth Status", x = NULL, y = NULL, fill = "Status") +
  theme_void()

# Comments:
# - Updated section intro to explain all three visualizations: city distribution, years in business, and growth status.
# - Replaced base R `hist()` and `table()` functions with clearer, styled `ggplot2` charts.
# - Bar chart: Visualizes number of coffee shops per city using color and clean formatting.
# - Histogram: Shows distribution of years in business using binwidth = 1 for accuracy.
# - Growth pie chart: Transformed bar chart into a pie chart using `coord_polar()`, added percentage labels and custom colors.
# - Used `mutate()` to calculate label text with percentages for pie chart clarity.
# - Applied `theme_minimal()` and `theme_void()` for modern, readable formatting.
# - Ensured consistent labeling and coloring for visual clarity and professional appearance.

10 Predicting growth

To understand which factors are most predictive of a coffee shop’s growth, we use a linear regression model. This model includes coffee_quality and city as predictors. We excluded street because it has too many levels and risks overfitting.

# Improve linear model by using fewer categorical variables (remove 'street'), add 'city'
m <- lm(growing ~ coffee_quality + city, data = t)

summary(m)

## 
## Call:
## lm(formula = growing ~ coffee_quality + city, data = t)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.79571 -0.30618 -0.02138  0.20429  0.91950 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -0.004952   0.174091  -0.028 0.977534    
## coffee_qualitygood  0.311131   0.195931   1.588 0.124864    
## coffee_qualityok    0.085451   0.194235   0.440 0.663762    
## cityRiverton        0.334181   0.190380   1.755 0.091451 .  
## citySpringfield     0.715206   0.181024   3.951 0.000562 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3886 on 25 degrees of freedom
## Multiple R-squared:  0.4943, Adjusted R-squared:  0.4134 
## F-statistic: 6.109 on 4 and 25 DF,  p-value: 0.001429

# Comments:
# - Fixed Header
# - Final model uses 'coffee_quality' and 'city' as predictors.
# - Model explains 49% of variation in 'growing' (Multiple R² = 0.49, Adjusted R² = 0.41), with a significant overall p-value (0.0014).
# - 'citySpringfield' is the strongest predictor (estimate 0.72, p < 0.001); shops in Springfield are much more likely to be growing than shops in the reference city.
# - Removed 'street' due to too many levels and risk of overfitting.
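Because `growing` is a 0/1 outcome, a linear model can produce fitted values outside the range 0 to 1. Logistic regression is a common alternative. The sketch below is a minimal illustration, not the report's model: `shops` is simulated stand-in data that reuses the report's column names, and the effect sizes used to simulate it are arbitrary assumptions.

```r
set.seed(42)

# Simulated stand-in data; NOT the coffee shop dataset from the report
n_shops <- 200
shops <- data.frame(
  coffee_quality = factor(sample(c("bad", "ok", "good"), n_shops, replace = TRUE),
                          levels = c("bad", "ok", "good")),
  city = factor(sample(c("Oakville", "Riverton", "Springfield"), n_shops, replace = TRUE))
)

# Arbitrary assumed effects, used only to generate a 0/1 outcome
p_true <- plogis(-1 + 0.8 * (shops$coffee_quality == "good") +
                      1.2 * (shops$city == "Springfield"))
shops$growing <- rbinom(n_shops, 1, p_true)

# Logistic regression with the same predictors as the linear model
logit_m <- glm(growing ~ coffee_quality + city, data = shops, family = binomial)

# Fitted probabilities always lie strictly between 0 and 1
pred <- predict(logit_m, type = "response")
range(pred)
```

Coefficients from `glm()` are on the log-odds scale, so they are not directly comparable to the `lm()` estimates shown above.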

11 Neural Network

A neural network model was trained to classify whether a coffee shop is growing based on its coffee quality and years in business. The model used a single hidden layer with 3 nodes and achieved an in-sample prediction accuracy of approximately 0.77. This approach can capture nonlinear relationships between variables that simpler models such as linear regression might miss.

library(nnet)
set.seed(1)

nn_model <- nnet(growing ~ coffee_quality_num + years_in_business,
                 data = t_cor,
                 size = 3,       # number of hidden nodes
                 decay = 0.1,    # weight decay (regularization)
                 maxit = 200,    # max iterations
                 linout = FALSE) # classification task

## # weights:  13
## initial  value 8.703798 
## iter  10 value 7.113609
## iter  20 value 7.075667
## final  value 7.075613 
## converged

# Predict probabilities
t_cor$predicted_prob <- predict(nn_model, type = "raw")

# Convert to predicted class using 0.5 threshold
t_cor$predicted_class <- ifelse(t_cor$predicted_prob > 0.5, 1, 0)

# Accuracy
mean(t_cor$predicted_class == t_cor$growing)

## [1] 0.7666667

# Comments:
# - Switched from decision tree to a neural network using the 'nnet' package for binary classification.
# - Chose neural network to better capture potential nonlinear relationships between coffee quality, years in business, and growth.
# - Used 3 hidden nodes and a weight decay of 0.1 to reduce overfitting on small dataset.
# - Model converged; the fitting criterion printed by nnet fell from 8.70 to 7.08.
# - Achieved an in-sample accuracy of ~76.7% (measured on the same 30 rows used for training), which is higher than the original decision tree model.
# - Neural network proved more predictive on the training data while staying small (2 input features, 13 weights).
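The accuracy above is computed on the same rows the network was trained on, so it may overstate how well the model generalizes. One way to check is leave-one-out cross-validation. The sketch below assumes the `nnet` package is installed and uses `toy`, a simulated stand-in for `t_cor` with the same column names; its values and the simulated effect are assumptions, so the resulting accuracy says nothing about the real data.

```r
library(nnet)
set.seed(1)

# Simulated stand-in for t_cor; NOT the report's data
n <- 30
toy <- data.frame(
  coffee_quality_num = sample(1:3, n, replace = TRUE),
  years_in_business  = sample(1:20, n, replace = TRUE)
)
toy$growing <- rbinom(n, 1, plogis(-2 + 0.9 * toy$coffee_quality_num))

# Leave-one-out: fit on all rows except i, then predict row i
loo_pred <- vapply(seq_len(n), function(i) {
  fit <- nnet(growing ~ coffee_quality_num + years_in_business,
              data = toy[-i, ],
              size = 3, decay = 0.1, maxit = 200,
              linout = FALSE, trace = FALSE)
  prob <- predict(fit, newdata = toy[i, , drop = FALSE], type = "raw")
  as.numeric(prob > 0.5)
}, numeric(1))

# Out-of-sample accuracy estimate
loo_accuracy <- mean(loo_pred == toy$growing)
loo_accuracy
```

With only 30 observations, an estimate like this is still noisy, but it avoids scoring the model on data it has already seen.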

12 Conclusion

This analysis explored patterns among 30 coffee shops across three cities using both visual exploration and predictive modeling.

Key findings include:

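City matters: Shops in Springfield were significantly more likely to be growing than those in Oakville, the reference city (p < 0.001), suggesting strong local market effects. The Riverton effect was weaker (p ≈ 0.09).
Coffee quality correlates with growth: Higher coffee quality (especially “good”) was associated with a higher likelihood of shop growth (r ≈ 0.41), although the quality coefficients were not statistically significant once city was included in the regression.
Years in business showed almost no linear association with growth (r ≈ 0.01) and had less predictive power than quality and location.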

From a modeling standpoint:

- A linear regression model using coffee_quality and city explained about 49% of the variation in shop growth.
- A neural network using coffee_quality and years_in_business achieved an in-sample prediction accuracy of 76.7%. Because this was measured on the training data, it is not directly comparable to the regression R².

These findings support the idea that focusing on high product quality and targeting specific geographic markets (like Springfield) could enhance the growth prospects of coffee shops. While the dataset is limited in size, the analysis provides a solid foundation for future, more scalable modeling efforts.

# Comment: Added conclusion.

Link to RPubs https://rpubs.com/KevinMcDonald/1304834

Coffee Shop Analysis

Exam 2 - 2025 Spring ACCT 426/BUDA 451

2025-05-01

