Data
library(tidyverse)
library(sf)
library(tmap)
library(leaflet)
library(tidycensus)
coffee <- read_csv("coffee.csv")
head(coffee)
Scatter plot
Plot 3
ggplot(data = coffee) +
  # alpha is a fixed setting, so it sits outside aes() and stays out of the legend
  geom_point(mapping = aes(x = review_count_log, y = hhincome, color = pct_white),
             alpha = 0.5) +
  facet_wrap(~county) +
  labs(x = "Review Count (log)",
       y = "Median Annual Household Income",
       color = "Proportion of residents who self-identified as white",
       title = "Scatterplot: Review Count vs. Household Income") +
  scale_color_gradient(low = "darkblue", high = "red") +
  theme_bw()
Review count generally increases with median household income, but the strength of this relationship varies across counties. The color gradient suggests that the relationship may also differ with the share of white residents in a neighborhood.
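One way to check how much the relationship varies across counties is to compute a correlation per county. The following is a minimal sketch, not part of the original analysis: `toy` is a hypothetical stand-in for `coffee` with made-up values, and only the column names `county`, `review_count_log`, and `hhincome` are taken from the plot code.

```r
library(dplyr)

# Hypothetical toy data standing in for `coffee`; all values are made up.
toy <- tibble(
  county = rep(c("Cobb", "Fulton"), each = 4),
  review_count_log = c(1, 2, 3, 4, 1, 2, 3, 4),
  hhincome = c(40000, 50000, 60000, 70000, 90000, 70000, 80000, 60000)
)

# One Pearson correlation per county between logged review count and income.
county_cor <- toy %>%
  group_by(county) %>%
  summarise(r = cor(review_count_log, hhincome), n = n(), .groups = "drop")

print(county_cor)
```

With the real data, `toy` would be replaced by `coffee`. If the columns contain missing values, passing `use = "complete.obs"` to `cor()` is one option.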
Plot 4
coffee_long <- coffee %>%
pivot_longer(cols = c(hhincome_log, pct_pov_log, pct_white, pop),
names_to = "Variable",
values_to = "Values")
ggplot(coffee_long, aes(x = review_count_log, y = Values, color = county)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~ Variable, scales = "free", ncol = 2) +
labs(title = "Scatterplot between logged review count & neighborhood characteristics",
subtitle = "Using Yelp data in Five Counties Around Atlanta, GA",
x = "Review Count Logged",
y = "Values") +
theme_bw()
## `geom_smooth()` using formula = 'y ~ x'
Median Annual Household Income (logged): There is a weak positive correlation between logged review count and logged median annual household income. This suggests that areas with higher median incomes tend to have slightly more reviews.
Percent Residents Under Poverty (logged): There is a weak negative correlation between logged review count and the logged percentage of residents under poverty. This suggests that areas with higher poverty rates tend to have slightly fewer reviews.
Percent White Residents: There is a moderate positive correlation between logged review count and the percentage of white residents. This suggests that areas with a higher proportion of white residents tend to have more reviews.
Total Population: There is a weak positive correlation between logged review count and total population. This suggests that more populous areas tend to have slightly more reviews, but the relationship is not very strong.
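The "weak" and "moderate" labels above are read off the fitted lines. A correlation coefficient per facet would make them concrete. Here is a minimal sketch under stated assumptions: `toy` is a hypothetical data frame with made-up values, and it reuses only the `pivot_longer()` pattern and column names from the Plot 4 code. It pools all counties, whereas the plot fits one line per county, so adding `county` to `group_by()` would be closer to what the plot shows.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; values are made up for illustration only.
toy <- tibble(
  review_count_log = c(1, 2, 3, 4),
  pct_white = c(0.2, 0.4, 0.6, 0.8),
  pop = c(4000, 3000, 2000, 1000)
)

# Same reshaping as in Plot 4, restricted to two variables.
toy_long <- toy %>%
  pivot_longer(cols = c(pct_white, pop),
               names_to = "Variable",
               values_to = "Values")

# One Pearson correlation per variable, mirroring the facets.
var_cor <- toy_long %>%
  group_by(Variable) %>%
  summarise(r = cor(review_count_log, Values), .groups = "drop")

print(var_cor)
```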
Box plot
Plot 1
bxplot <- ggplot(data = coffee) +
  # avg_rating is continuous, so group by it to draw one box per star level
  geom_boxplot(aes(x = avg_rating, y = hhincome, group = avg_rating),
               color = "black", fill = "white")
plotly::ggplotly(bxplot)
The boxplot shows that 1-star businesses are located in neighborhoods with the lowest median household income. Income tends to increase across 2-, 3-, and 4-star businesses. However, income drops again for 5-star businesses, to a level similar to that of 1-star areas.
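The pattern described above can be cross-checked with a table of median income per rating. This is a minimal sketch: `toy` is a hypothetical stand-in for `coffee` with made-up values, and `avg_rating` and `hhincome` are the column names used in the boxplot code.

```r
library(dplyr)

# Hypothetical toy data; values are made up for illustration only.
toy <- tibble(
  avg_rating = c(1, 1, 3, 3, 5, 5),
  hhincome = c(30000, 40000, 70000, 90000, 45000, 55000)
)

# Count and median household income for each star rating.
income_by_rating <- toy %>%
  group_by(avg_rating) %>%
  summarise(n = n(), median_income = median(hhincome), .groups = "drop")

print(income_by_rating)
```

Including the count `n` alongside the median helps show whether a rating level, such as 1 or 5 stars, rests on only a few businesses.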
Plot 2
bxplot_sep <- ggplot(data = coffee) +
  # avg_rating is continuous, so group by it to draw one box per star level
  geom_boxplot(aes(x = avg_rating, y = hhincome, group = avg_rating),
               color = "black", fill = "white") +
  facet_wrap(~county)
plotly::ggplotly(bxplot_sep)
In Cobb County, Yelp ratings are concentrated around 3 to 4 stars. DeKalb County has a median household income similar to Cobb's, but with a wider range, and its Yelp ratings are more spread out. Fulton County has the highest median household income, also with a wide range.
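Statements about ratings being "concentrated" or "spread out" can be backed by a spread measure per county, such as the interquartile range. The sketch below uses a hypothetical `toy` data frame with made-up values. Only the column names `county`, `avg_rating`, and `hhincome` come from the plot code.

```r
library(dplyr)

# Hypothetical toy data; values are made up for illustration only.
toy <- tibble(
  county = rep(c("Cobb", "DeKalb"), each = 4),
  avg_rating = c(3, 3.5, 3.5, 4, 1, 2, 4, 5),
  hhincome = c(60000, 65000, 70000, 75000, 40000, 60000, 80000, 100000)
)

# Median income plus the interquartile range of income and of rating per county.
county_spread <- toy %>%
  group_by(county) %>%
  summarise(
    median_income = median(hhincome),
    income_iqr = IQR(hhincome),
    rating_iqr = IQR(avg_rating),
    .groups = "drop"
  )

print(county_spread)
```

A larger `rating_iqr` corresponds to ratings that are more spread out, and a larger `income_iqr` corresponds to a wider income range.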