Import Libraries
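# Data wrangling and plotting (tidyverse attaches dplyr, tidyr, ggplot2, ...)
library(tidyverse)
# Spatial data and mapping
library(sf)
library(tmap)
library(leaflet)
# Census data access
library(tidycensus)
# Pipe operators (%>% is also available through the tidyverse)
library(magrittr)
# Project-relative file paths
library(here)
# Already attached by tidyverse; loaded explicitly for clarity
library(ggplot2)
# Interactive versions of ggplot figures
library(plotly)
# ggpubr must also be installed: it is called below as ggpubr::stat_cor()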
Import Data
data <- read.csv(here::here("coffee.csv"))
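# Box plot of tract median household income by average coffee shop rating
bxplot <- ggplot(data) +
  geom_boxplot(aes(x = factor(avg_rating), y = hhincome),
               color = "black", fill = "white") +
  labs(
    x = "Average Rating",
    y = "Median Annual Household Income ($)"
  )
# Render as an interactive plotly figure
plotly::ggplotly(bxplot)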
The box plot shows how median annual household income varies across neighborhoods (census tracts) hosting coffee shops with different average ratings. The relationship between income and rating is nonlinear:
Rating 1: Coffee shops with the lowest ratings are predominantly located in neighborhoods with the lowest household incomes.
Ratings 3 and 4: Coffee shops with mid-to-high ratings are located in areas with higher median household incomes, suggesting that these establishments tend to serve more affluent communities.
Rating 5: Interestingly, the highest-rated coffee shops are not predominantly situated in the wealthiest areas. The median income of their tracts is lower than for ratings 3 and 4, and even slightly below that for rating 2.
This pattern suggests that exceptionally high ratings of 5 may correlate with factors other than neighborhood affluence.
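The box plot reading can be cross-checked numerically by computing the median of `hhincome` within each rating group, together with the number of shops per group. The sketch below is a minimal illustration that assumes a small, invented data frame `toy` (hypothetical values, not the project's data) whose column names mirror those used above; on the real data, `toy` would be replaced by `data`.

```r
# Hypothetical toy data: column names mirror coffee.csv, values are invented
toy <- data.frame(
  avg_rating = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  hhincome   = c(30000, 34000, 52000, 56000, 70000, 74000,
                 80000, 76000, 50000, 48000)
)

# Median tract income for each rating group (the line inside each box)
med_by_rating <- tapply(toy$hhincome, factor(toy$avg_rating), median)

# Number of shops in each rating group
n_by_rating <- table(factor(toy$avg_rating))

print(med_by_rating)
print(n_by_rating)
```

Reporting the group sizes next to the medians is worthwhile because a box built from only a few shops can shift noticeably with a single tract.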
# Same box plot, split into one panel per county
bxplot <- ggplot(data) +
  geom_boxplot(aes(x = factor(avg_rating), y = hhincome),
               color = "black", fill = "white") +
  facet_wrap(~ county) +
  labs(
    x = "Average Rating",
    y = "Median Annual Household Income ($)"
  )
plotly::ggplotly(bxplot)
The box plots show how the relationship between average coffee shop rating and tract-level median household income varies across counties.
Clayton County: There is little variation in median household income across rating categories, suggesting coffee shops are located in neighborhoods with similar income levels regardless of rating.
Cobb County: Coffee shops rated 4 tend to be situated in tracts with slightly higher median incomes compared to those rated 2 or 3, though the difference is modest.
DeKalb County: A mild positive trend is observed from ratings 1 to 4, where higher-rated shops are found in wealthier tracts. However, this pattern reverses at rating 5, where shops are located in areas with the lowest median incomes.
Fulton County: Coffee shops rated 1 are concentrated in the lowest-income tracts. A gradual increase in income is seen from ratings 2 to 4, indicating a modest positive relationship. Shops with rating 5, however, are located in tracts with similar incomes to those with rating 4, suggesting a plateau rather than a continued increase.
Gwinnett County: Median household income tends to rise slightly from ratings 1 to 3, followed by stabilization or a minor decline for ratings 4 and 5, indicating no strong income gradient.
Overall, while some counties, such as DeKalb and Fulton, show a positive but nonlinear relationship, others, such as Clayton, Cobb, and Gwinnett, show little association between coffee shop ratings and neighborhood affluence.
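The county-by-rating comparison above can also be summarized as a table of median tract incomes, with one row per county and one column per rating. This is a minimal sketch on an invented data frame `toy` (hypothetical values for two counties, not the project's data); on the real data, the same call applies to `data`.

```r
# Hypothetical toy data with a county column; values are invented
toy <- data.frame(
  county     = rep(c("DeKalb", "Fulton"), each = 5),
  avg_rating = rep(1:5, times = 2),
  hhincome   = c(40000, 50000, 60000, 70000, 35000,
                 30000, 45000, 60000, 80000, 80000)
)

# Rows = county, columns = rating; each cell is the median tract income.
# County/rating combinations with no shops come back as NA.
med_tab <- tapply(toy$hhincome,
                  list(county = toy$county, rating = toy$avg_rating),
                  median)
print(med_tab)
```

Reading across a row makes patterns such as a reversal at rating 5 or a plateau between ratings 4 and 5 easy to spot without comparing box heights by eye.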
# Review count (log) vs. tract median income, colored by share of White residents
ggplot(data) +
  geom_point(
    aes(x = review_count_log, y = hhincome, color = pct_white),
    alpha = 0.7,
    size = 2
  ) +
  scale_color_gradient(
    low = "blue",
    high = "red",
    name = "Proportion of residents\nwho self-identified as white"
  ) +
  facet_wrap(~ county) +
  labs(
    title = "Scatterplot: Review Count vs. Household Income",
    x = "Review Count (log scale)",
    y = "Median Annual Household Income ($)"
  ) +
  theme_bw()
This scatterplot examines the relationship between the median household income of census tracts and the number of reviews (on a log scale) for coffee shops located within them. Visually, there is no clear or consistent relationship between review count and household income across counties: higher-income areas do not systematically correspond to more (or fewer) coffee shop reviews. However, the color gradient reveals a notable pattern:
Red dots, representing tracts with a higher proportion of White residents, tend to appear toward the upper part of the plot, corresponding to higher median household incomes.
Blue dots, indicating tracts with lower proportions of White residents, cluster more in the lower-income range.
This pattern suggests that neighborhood income is more closely associated with racial composition than with coffee shop review activity.
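The comparison in the previous sentence can be made concrete by correlating income with each of the two candidate variables. The sketch below assumes an invented data frame `toy` (hypothetical tract values, not the project's data) using the same column names as above, and uses base R's `cor()` for Pearson correlations.

```r
# Hypothetical toy tract data; values are invented for illustration
toy <- data.frame(
  hhincome         = c(35000, 42000, 58000, 61000, 75000, 90000),
  pct_white        = c(0.10, 0.20, 0.45, 0.40, 0.70, 0.85),
  review_count_log = c(3.1, 4.0, 2.5, 3.8, 2.9, 3.5)
)

# Pearson correlation of income with each candidate variable
r_white   <- cor(toy$hhincome, toy$pct_white)
r_reviews <- cor(toy$hhincome, toy$review_count_log)

print(c(pct_white = r_white, review_count_log = r_reviews))
```

A larger correlation only says which variable moves more closely with income in the sample; it does not establish that either one causes income differences.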
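# Reshape the neighborhood variables to long format (one row per original
# row and metric) so they can be faceted in a single figure
df_long <- data %>%
  pivot_longer(
    cols = c(hhincome, pct_pov_log, pct_white, pop),
    names_to = "metric", values_to = "value"
  ) %>%
  mutate(
    # Replace variable names with readable facet labels
    metric = recode(metric,
      hhincome = "Median Annual Household Income ($)",
      pct_pov_log = "Residents Under Poverty Level (%; log)",
      pct_white = "Residents who self-identify as White (%)",
      pop = "Total Population"
    )
  )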
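# Each neighborhood metric against logged review count, with per-county
# linear fits and one Pearson correlation label per panel
ggplot(df_long, aes(x = review_count_log, y = value, color = county)) +
  geom_point(alpha = 0.7, size = 1) +
  # Explicit formula silences the "using formula" message
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ metric, scales = "free_y") +
  labs(
    title = "Scatterplot between logged review count & neighborhood characteristics",
    x = "Review Count (log)",
    y = "Values",
    color = "County"
  ) +
  # No color mapping here, so R and p pool all counties within each panel
  ggpubr::stat_cor(
    inherit.aes = FALSE,
    data = df_long,
    mapping = aes(x = review_count_log, y = value),
    method = "pearson",
    label.x = 0.2,
    label.y = Inf,
    hjust = 0, vjust = 1.2,
    size = 3
  ) +
  theme_bw()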
Unlike the previous plot, this figure reports Pearson correlation coefficients (R) and p-values, pooled across counties within each panel, alongside per-county linear fits, revealing more nuanced and statistically supported associations between the log-transformed review count and several neighborhood characteristics.
[Median Annual Household Income] Tract-level median household income is positively associated with the (logged) number of reviews (R = 0.16, p = 6 × 10⁻⁴), a weak but statistically significant relationship. This suggests that coffee shops located in wealthier tracts tend to attract more reviews. Among the counties, DeKalb County shows the strongest positive association.
[Residents Under Poverty Level] The (logged) share of tract residents living below the poverty level is negatively associated with review counts (R = –0.12, p = 0.014), which is statistically significant. Areas with higher poverty rates therefore tend to have fewer coffee shop reviews. Again, DeKalb County shows the most pronounced negative relationship.
[Residents Who Self-Identify as White] The tract-level share of residents identifying as White is positively correlated with review counts (R = 0.24, p = 6.7 × 10⁻⁷), a statistically significant relationship. Census tracts with higher proportions of White residents tend to show more coffee shop engagement, as measured by review volume. DeKalb County again shows the strongest correlation among the counties.
[Total Population] Tract total population shows a negligible negative correlation with review counts (R = –0.031, p = 0.52) that is not statistically significant. While some counties may show marginal effects, the overall pattern suggests that population size alone does not explain variation in review activity.
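The R and p labels can be reproduced outside the plot. This sketch assumes that `ggpubr::stat_cor(method = "pearson")` reports the same values as base R's `cor.test()`, and it uses an invented data frame `toy` (hypothetical values for two counties, not the project's data). It also recomputes the p-value from the t statistic t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom, and computes per-county correlations, which the figure itself does not print.

```r
# Hypothetical toy data: invented values for two counties and one metric
set.seed(1)
toy <- data.frame(
  county           = rep(c("DeKalb", "Fulton"), each = 20),
  review_count_log = runif(40, 1, 6)
)
toy$hhincome <- 50000 + 8000 * toy$review_count_log + rnorm(40, sd = 15000)

# Pooled Pearson test across counties (one label per panel in the figure)
ct <- cor.test(toy$review_count_log, toy$hhincome, method = "pearson")

# Same two-sided p-value from the t statistic with n - 2 degrees of freedom
r <- unname(ct$estimate)
n <- nrow(toy)
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_manual <- 2 * pt(-abs(t_stat), df = n - 2)

# Per-county correlations, a unit-free way to compare strength by county
r_by_county <- sapply(split(toy, toy$county),
                      function(d) cor(d$review_count_log, d$hhincome))

print(c(r = r, p = ct$p.value, p_manual = p_manual))
print(r_by_county)
```

The per-county lines in the figure are OLS slopes from `geom_smooth(method = "lm")`, whose steepness depends on each metric's units and spread; per-county correlations such as `r_by_county` are a more direct basis for statements about which county shows the "strongest" association.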