This project investigates the relationship between the attractiveness of coffee shops as points of interest (POIs) and neighborhood advantage in Fulton, DeKalb, Clayton, Cobb, and Gwinnett counties, Georgia. Using data from Yelp and the American Community Survey (ACS), the analysis measures attractiveness through each shop's average Yelp rating and (logged) review count, and examines how these relate to neighborhood characteristics such as median household income, poverty rate, and demographic composition. A series of visualizations created with the ggplot2 package highlights patterns in consumer engagement and in the distribution of attractive POIs across neighborhoods, offering insight into the economic and social dynamics that shape urban vitality.
library(tidyverse)
library(ggplot2)
coffee <- read.csv("coffee.csv")
ggplot(coffee, aes(x = factor(avg_rating), y = hhincome)) +
geom_boxplot(outlier.shape = 19, outlier.colour = "black") +
labs(x = "Average Yelp Rating", y = "Median Annual Household Income ($)")
The box plot suggests that coffee shops with higher average ratings tend to be located in neighborhoods with higher median household incomes. Shops rated 5 break this pattern, so the relationship is not strictly monotonic.
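A box plot shows the shape of each rating group but not the strength of the overall trend. The sketch below quantifies it, assuming the `coffee` data frame has the `avg_rating` and `hhincome` columns used above. `summarise_rating_income()` is a hypothetical helper, not part of the original analysis. It reports the median income and shop count per rating level, plus Spearman's rank correlation, which suits Yelp ratings because they are ordinal half-star steps. The toy data frame is made up purely to demonstrate the call and is not project data.

```r
library(dplyr)

# Hypothetical helper: median household income and shop count per rating level,
# plus Spearman's rank correlation between rating and income.
summarise_rating_income <- function(df) {
  by_rating <- df %>%
    filter(!is.na(avg_rating), !is.na(hhincome)) %>%
    group_by(avg_rating) %>%
    summarise(n_shops = n(), median_income = median(hhincome), .groups = "drop")

  # Spearman is rank-based, so it does not assume equal spacing between ratings
  rho <- cor(df$avg_rating, df$hhincome, method = "spearman", use = "complete.obs")

  list(by_rating = by_rating, spearman_rho = rho)
}

# Toy data for illustration only (not from the Yelp/ACS dataset)
toy <- data.frame(
  avg_rating = c(3, 3, 3.5, 4, 4, 4.5, 5),
  hhincome   = c(40000, 52000, 61000, 70000, 88000, 95000, 60000)
)
res <- summarise_rating_income(toy)
res$by_rating
res$spearman_rho
```

On the real data this would be `summarise_rating_income(coffee)`. The `n_shops` column matters here: a small group (for example, few shops rated 5) can easily deviate from the trend. If a p-value is wanted, `cor.test(..., method = "spearman", exact = FALSE)` avoids the warning that tied ratings otherwise trigger.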
ggplot(coffee, aes(x = factor(avg_rating), y = hhincome)) +
geom_boxplot(outlier.shape = 19, outlier.colour = "black") +
labs(x = "Average Yelp Rating", y = "Median Annual Household Income ($)") +
facet_wrap(~ county) + # Create facets for each county
scale_y_continuous(breaks = seq(50000, 200000, by = 50000)) # Set y-axis breaks
Within each rating category, household income varies noticeably. Some counties show a much wider spread of household incomes at the same rating level than others, suggesting that the economic landscapes of the five counties differ.
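To put a number on "a wider range of household incomes", one option is the interquartile range (IQR) of income in each county-by-rating cell. This is a minimal sketch assuming the `county`, `avg_rating`, and `hhincome` columns from the plot above. `income_spread_by_county()` is a hypothetical helper, and the toy data is invented for illustration only.

```r
library(dplyr)

# Hypothetical helper: spread of median household income within each
# county-by-rating cell, measured by the interquartile range (IQR).
income_spread_by_county <- function(df) {
  df %>%
    filter(!is.na(hhincome), !is.na(avg_rating)) %>%
    group_by(county, avg_rating) %>%
    summarise(
      n_shops    = n(),
      income_iqr = IQR(hhincome),
      .groups    = "drop"
    ) %>%
    # Within each rating level, list the most spread-out county first
    arrange(avg_rating, desc(income_iqr))
}

# Toy data for illustration only (not project data)
toy <- data.frame(
  county     = c("Fulton", "Fulton", "Fulton", "Clayton", "Clayton", "Clayton"),
  avg_rating = c(4, 4, 4, 4, 4, 4),
  hhincome   = c(40000, 90000, 150000, 45000, 50000, 55000)
)
income_spread_by_county(toy)
```

The IQR is used instead of the full range because it is less sensitive to a single extreme tract. Cells with only a handful of shops still give unstable values, which is why `n_shops` is reported alongside.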
ggplot(coffee, aes(x = review_count_log, y = hhincome, color = pct_white)) +
geom_point(size = 2, alpha = 0.7) +
labs(title = "Scatterplot: Review Count vs. Household Income",
x = "Review Count (log)",
y = "Median Annual Household Income ($)",
color = "Proportion of residents\nwho self-identified as white") +
facet_wrap(~ county) +
scale_color_gradient(low = "blue", high = "red") +
coord_cartesian(ylim = c(0, 200000))
In general, the proportion of residents who identify as white tends to be higher in neighborhoods with higher median annual household income. Within each county, there is also a positive association between income and logged review count: coffee shops in higher-income neighborhoods tend to have more reviews.
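The two visual trends above can be checked county by county with Pearson correlations. This is a minimal sketch assuming the `county`, `review_count_log`, `hhincome`, and `pct_white` columns used in the plot. `county_correlations()` is a hypothetical helper. Each row of `coffee` is a shop paired with its neighborhood's characteristics, so these are shop-level correlations computed separately within each county. The toy data is invented to show the output shape.

```r
library(dplyr)

# Hypothetical helper: within-county Pearson correlations for the two
# relationships described above (one output row per county).
county_correlations <- function(df) {
  df %>%
    group_by(county) %>%
    summarise(
      # Number of shops with both review count and income available
      n = sum(complete.cases(review_count_log, hhincome)),
      r_reviews_income = cor(review_count_log, hhincome, use = "complete.obs"),
      r_white_income   = cor(pct_white, hhincome, use = "complete.obs"),
      .groups = "drop"
    )
}

# Toy data for illustration only (not project data)
toy <- data.frame(
  county           = rep(c("A", "B"), each = 3),
  review_count_log = c(1, 2, 3, 1, 2, 3),
  hhincome         = c(30000, 50000, 70000, 70000, 50000, 30000),
  pct_white        = c(0.2, 0.4, 0.6, 0.6, 0.4, 0.2)
)
county_correlations(toy)
```

Reporting `n` next to each coefficient makes it easier to see when a county's correlation rests on only a few shops.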
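# Reshape the four neighborhood variables to long format (one row per shop-variable pair) for faceting
data_long <- coffee %>%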
pivot_longer(cols = c(hhincome, pct_pov_log, pct_white, pop),
names_to = "variable", values_to = "value")
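# Pooled Pearson correlation (all five counties together) between logged review count and each variable;
# these values are annotated on the facets below
correlations <- data_long %>%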
group_by(variable) %>%
summarise(
R = cor(review_count_log, value, use = "complete.obs"),
p_value = cor.test(review_count_log, value)$p.value
) %>%
ungroup()
ggplot(data_long, aes(x = review_count_log, y = value, color = county)) +
geom_point(alpha = 0.8, size = 1) +
geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
facet_wrap(~ variable, scales = "free_y", nrow = 2, ncol = 2, labeller = as_labeller(
c(hhincome = "Median Annual Household Income ($)",
pct_pov_log = "Percent Residents Under Poverty (log)",
pct_white = "Percent White Residents",
pop = "Total Population")
)) +
labs(
title = "Logged Review Count vs. Neighborhood Characteristics",
subtitle = "Yelp coffee shops in five counties around Atlanta, GA",
x = "Review Count (log)",
y = "Value"
) +
geom_text(data = correlations,
aes(x = -Inf, y = Inf, label = paste0("R = ", signif(R, 2), ", p = ", signif(p_value, 2))),
hjust = -0.1, vjust = 1.5, inherit.aes = FALSE, size = 3, color = "black") +
theme_minimal() +
theme(
strip.text = element_text(size = 8),
plot.title = element_text(size = 12),
plot.subtitle = element_text(size = 10),
strip.background = element_rect(fill = "lightgray"),
panel.border = element_rect(color = "black", fill = NA, linewidth = 0.5),
axis.ticks.length = unit(0.15, "cm"),
axis.ticks = element_line(linewidth = 0.3)
)
Across the five counties around Atlanta, the Yelp data show associations between the logged review counts of coffee shops and several neighborhood characteristics; the R and p values annotated in each panel summarize their strength. Most notably, review counts rise with median household income: coffee shops in wealthier neighborhoods tend to accumulate more reviews. One plausible explanation is that residents of affluent areas have more disposable income and visit coffee shops more often, generating more reviews. Because these are correlations, and reviewers do not necessarily live near the shops they review, this explanation remains a hypothesis.
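The pooled correlation in the figure mixes differences within counties with differences between counties. One way to check whether the income association holds within counties is a linear model with county fixed effects. This is a minimal sketch assuming the column names used above. `fit_income_model()` and `income_10k` are hypothetical names, and this model is not part of the original analysis. The toy data is constructed so the true slope is known (0.1), purely to show what the coefficient means.

```r
# Hypothetical sketch: regress logged review count on income with county
# fixed effects. Income is rescaled to $10,000 units so the slope reads as the
# change in log review count per $10,000 of median household income.
fit_income_model <- function(df) {
  df$income_10k <- df$hhincome / 10000
  lm(review_count_log ~ income_10k + county, data = df)
}

# Toy data for illustration only: built with a slope of 0.1 and a
# county B offset of +1 (not project data)
toy <- data.frame(
  county   = rep(c("A", "B"), each = 3),
  hhincome = c(30000, 50000, 70000, 40000, 60000, 80000)
)
toy$review_count_log <- 2 + 0.1 * toy$hhincome / 10000 +
  ifelse(toy$county == "B", 1, 0)

fit <- fit_income_model(toy)
coef(fit)
```

On the real data, `summary(fit_income_model(coffee))` would report the income slope and its standard error. If the slope stays positive after controlling for county, the association is not simply a difference between richer and poorer counties.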
The percentage of residents living below the poverty line shows the opposite pattern: coffee shops in neighborhoods with higher poverty rates tend to have fewer reviews. This is consistent with economic disadvantage limiting access to and engagement with coffee shops, and it points to potential barriers for residents of less advantaged neighborhoods, such as limited financial resources or fewer nearby coffee shops.
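A simple way to make the poverty contrast concrete is to split shops into terciles of their neighborhood's logged poverty rate and compare mean logged review counts. This is a minimal sketch assuming the `pct_pov_log` and `review_count_log` columns used above. `reviews_by_poverty_tercile()` is a hypothetical helper built on `dplyr::ntile()`, and the toy data is invented for illustration.

```r
library(dplyr)

# Hypothetical helper: group shops into poverty terciles
# (1 = lowest logged poverty rate, 3 = highest) and compare mean logged reviews.
reviews_by_poverty_tercile <- function(df) {
  df %>%
    filter(!is.na(pct_pov_log), !is.na(review_count_log)) %>%
    mutate(poverty_tercile = ntile(pct_pov_log, 3)) %>%
    group_by(poverty_tercile) %>%
    summarise(
      n_shops          = n(),
      mean_reviews_log = mean(review_count_log),
      .groups          = "drop"
    )
}

# Toy data for illustration only (not project data)
toy <- data.frame(
  pct_pov_log      = c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0),
  review_count_log = c(5.0, 4.6, 4.2, 3.8, 3.0, 2.6)
)
reviews_by_poverty_tercile(toy)
```

Terciles trade some precision for readability: a falling `mean_reviews_log` from tercile 1 to 3 states the negative relationship in plain terms, without relying on a linear fit.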
Taken together, these findings suggest that socioeconomic factors such as income and poverty rates are closely associated with consumer behavior in urban environments. This underscores the importance of considering socioeconomic context when evaluating the distribution and attractiveness of coffee shops as POIs. The results can inform urban planners and policymakers about the need for equitable access to quality dining options, and may help guide efforts to strengthen local economic development and community engagement in underserved areas.