Higher average rating scores tend to correspond to higher median annual household incomes across rating scores 1 to 4; coffee shops with a rating score of 5 are the exception.
The variation in median household income is the smallest among coffee shops with an average rating score of 1.
Coffee shops with a rating score of 4 show large variation in median household income.
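library(tidyverse)  # ggplot2, dplyr, tidyr
library(ggpubr)     # stat_cor(), used in the final plot

coffee <- read.csv("../coffee.csv")
# Plot 1
# Assumes avg_rating is categorical; if it is read as numeric, use x = factor(avg_rating)
bxplot <- ggplot(data = coffee) +
  geom_boxplot(aes(x = avg_rating, y = hhincome),
               color = "black", fill = "white")
plotly::ggplotly(bxplot)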
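The spread of household income at each rating level can also be checked numerically instead of being read off the boxplot. The following is a minimal sketch, assuming the `avg_rating` and `hhincome` columns used in the plot code; the `toy` data frame and the helper `income_spread_by_rating()` are hypothetical and only stand in for `coffee`.

```r
# Hypothetical stand-in for `coffee`; values are invented for illustration only.
toy <- data.frame(
  avg_rating = c(1, 1, 1, 4, 4, 4, 4),
  hhincome   = c(40000, 41000, 42000, 30000, 60000, 90000, 120000)
)

# Interquartile range of household income within each rating level.
# (Hypothetical helper, not part of the original analysis.)
income_spread_by_rating <- function(df) {
  out <- aggregate(hhincome ~ avg_rating, data = df, FUN = IQR)
  names(out)[2] <- "hhincome_iqr"
  out
}

spread <- income_spread_by_rating(toy)
print(spread)
```

Calling `income_spread_by_rating(coffee)` would give one row per rating level, so the level with the smallest and largest interquartile range can be compared directly.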
The dataset from Clayton County shows limited variation in average rating scores.
Cobb County does not have any observations with an average rating of 1.
In contrast, Fulton County shows a high degree of variability in average rating scores.
DeKalb County and Fulton County show similar trends across different average rating levels.
bxplot_county <- ggplot(data = coffee) +
  geom_boxplot(aes(x = avg_rating, y = hhincome),
               color = "black", fill = "white") +
  facet_wrap(~county) +
  labs(x = "Average Rating",
       y = "Median Annual Household Income ($)")
plotly::ggplotly(bxplot_county)
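Statements about which rating levels appear in which county can be verified with a simple cross-tabulation. This is a sketch under the assumption that the data has `county` and `avg_rating` columns as in the plot code; the `toy` data frame below is invented and only illustrates the shape of the check.

```r
# Hypothetical stand-in for `coffee`; county names are real, values are invented.
toy <- data.frame(
  county     = c("Cobb", "Cobb", "Fulton", "Fulton", "Fulton", "Clayton"),
  avg_rating = c(3, 4, 1, 3, 5, 4)
)

# Cross-tabulate counties against rating levels; a zero cell means that
# county has no coffee shop with that rating.
rating_counts <- table(toy$county, toy$avg_rating)
print(rating_counts)

# Number of distinct rating levels observed in each county.
levels_per_county <- rowSums(rating_counts > 0)
print(levels_per_county)
```

Running `table(coffee$county, coffee$avg_rating)` on the real data would show both the empty cells and the number of observations per county.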
The proportion of residents who self-identified as White seems to be related to both review count and median annual household income across all counties.
Clayton County shows a low proportion of White residents.
Fulton County has more observations than Clayton County.
scatter_county <- ggplot(data = coffee) +
  geom_point(aes(x = log(review_count), y = hhincome, color = pct_white),
             size = 3, alpha = 0.7) +
  facet_wrap(~county) +
  scale_color_gradient(low = "darkblue", high = "red") +
  labs(x = "Review Count (log)",
       y = "Median Annual Household Income")
plotly::ggplotly(scatter_county)
DeKalb County shows a positive correlation between review count and household income.
No county shows a significant relationship between review count and poverty level.
DeKalb County shows a strong positive correlation between review count and the proportion of White residents. Fulton County shows a similar relationship, whereas Clayton County does not show a meaningful correlation, presumably because of its small sample size.
Population count does not have a meaningful relationship with review count in any of the counties.
# Requires tidyverse (dplyr, tidyr, ggplot2) and ggpubr (for stat_cor)
# Reshape to long format so each variable gets its own facet
coffee_long <- coffee %>%
  pivot_longer(
    cols = c(hhincome, pct_pov, pct_white, pop),
    names_to = "variable",
    values_to = "value"
  ) %>%
  mutate(
    # Only the poverty rate is log-transformed
    value = if_else(variable == "pct_pov", log(value), value)
  )
scatter_4 <- ggplot(data = coffee_long) +
  geom_point(aes(x = log(review_count), y = value, color = county)) +
  geom_smooth(aes(x = log(review_count), y = value, color = county),
              method = "lm", se = FALSE) +
  stat_cor(aes(x = log(review_count), y = value),
           output.type = "text",
           label.x.npc = "left", label.y.npc = "top",
           size = 3.5, color = "black", label.sep = ", ") +
  facet_wrap(~variable, scales = "free_y",
             labeller = as_labeller(c(
               hhincome = "Median Annual Household Income ($)",
               pct_pov = "Residents Under Poverty Level (%, log)",
               pct_white = "Residents who self-identify as White (%)",
               pop = "Total Population"
             ))) +
  labs(
    x = "Review Count (log)",
    y = "Value",
    color = "County"
  ) +
  theme_bw() +
  theme(strip.text = element_text(size = 7))
scatter_4
## `geom_smooth()` using formula = 'y ~ x'
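Because the `stat_cor()` layer has no county grouping in its aesthetics, it appears to report one pooled coefficient per panel, while the observations above compare counties. A per-county correlation can be computed directly, as in the sketch below. It assumes Pearson correlation on `log(review_count)`, matching the plot's x-axis; the `toy` data frame and the helper `cor_by_county()` are hypothetical.

```r
# Hypothetical stand-in for `coffee`; values are invented for illustration only.
toy <- data.frame(
  county       = rep(c("DeKalb", "Clayton"), each = 4),
  review_count = c(10, 50, 200, 800, 10, 50, 200, 800),
  hhincome     = c(40000, 55000, 70000, 90000, 52000, 48000, 53000, 49000)
)

# Pearson correlation between log(review_count) and one variable,
# computed separately for each county. (Hypothetical helper.)
cor_by_county <- function(df, var) {
  sapply(split(df, df$county), function(d) cor(log(d$review_count), d[[var]]))
}

r <- cor_by_county(toy, "hhincome")
print(round(r, 2))
```

On the real data, `cor_by_county(coffee, "pct_white")` would return one coefficient per county. A coefficient based on only a few observations, as may be the case for Clayton County, should be read with caution.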