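library(tidyverse)  # dplyr (%>%, select, mutate), tidyr (pivot_longer), ggplot2
library(ggpubr)     # stat_cor()
coffee <- read.csv("/Users/helenalindsay/Documents/Fall_23/CP8883/Mini4/coffee.csv") %>%
  select(-X)  # drop the row-index column written by write.csv()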
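# Plot 1: distribution of median household income for each average Yelp rating
ggplot(data = coffee) +
  geom_boxplot(aes(x = factor(avg_rating), y = hhincome),
               fill = "white", color = "black") +
  labs(x = "Average Yelp Rating", y = "Median Annual Household Income ($)")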
Plot 1 suggests that higher median household incomes are associated with average Yelp ratings of 3 and 4.
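To put numbers behind the boxplot, the median income for each rating level can be tabulated directly. The sketch below is a minimal base-R version, assuming `coffee` has the `avg_rating` and `hhincome` columns used above; the helper `median_income_by_rating` and the toy data frame are hypothetical and only illustrate the calculation.

```r
# Hypothetical helper: median household income for each average Yelp rating.
median_income_by_rating <- function(df) {
  # tapply() splits hhincome by rating level and applies median() to each group
  tapply(df$hhincome, factor(df$avg_rating), median, na.rm = TRUE)
}

# Toy data (hypothetical values, NOT taken from coffee.csv)
toy <- data.frame(
  avg_rating = c(2, 2, 3, 3, 4, 4, 5),
  hhincome   = c(40000, 50000, 80000, 90000, 85000, 95000, 60000)
)
meds <- median_income_by_rating(toy)
print(meds)
```

On the real data, `median_income_by_rating(coffee)` would return one median per rating level; the reading of Plot 1 holds if the medians for ratings 3 and 4 are the largest.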
# Plot 2: the same boxplot, one panel per county
ggplot(data = coffee) +
  geom_boxplot(aes(x = factor(avg_rating), y = hhincome),
               fill = "white", color = "black") +
  labs(x = "Average Yelp Rating", y = "Median Annual Household Income ($)") +
  facet_wrap(~ county)
Plot 2 indicates that the pattern seen in Plot 1 (higher household incomes associated with average Yelp ratings of 3 and 4) holds across counties.
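The per-county claim can be checked by finding, within each county, the rating level with the highest median income. A minimal base-R sketch, assuming the `county`, `avg_rating`, and `hhincome` columns; the helper name and the two toy counties "A" and "B" are hypothetical.

```r
# Hypothetical helper: median household income for every county-by-rating cell.
median_income_by_county_rating <- function(df) {
  # Two grouping factors make tapply() return a county x rating matrix;
  # cells with no observations come back as NA
  tapply(df$hhincome, list(df$county, factor(df$avg_rating)), median, na.rm = TRUE)
}

# Toy data (hypothetical values, NOT taken from coffee.csv)
toy <- data.frame(
  county     = c("A", "A", "A", "B", "B", "B"),
  avg_rating = c(2, 3, 4, 2, 3, 4),
  hhincome   = c(40000, 80000, 90000, 35000, 75000, 70000)
)
tab <- median_income_by_county_rating(toy)
print(tab)

# For each county (row), the rating level with the highest median income
best <- apply(tab, 1, function(row) names(which.max(row)))
print(best)
```

If every entry of `best` is "3" or "4" on the real data, the pattern from Plot 1 carries over to each county.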
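# Plot 3: household income vs. logged review count, colored by percent white residents
ggplot(data = coffee) +
  geom_point(aes(x = review_count_log, y = hhincome, color = pct_white)) +
  labs(x = "Review Count (log)", y = "Median Annual Household Income ($)",
       color = "Percent White") +
  facet_wrap(~ county)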
Plot 3 suggests that, across counties, tracts with a higher percentage of residents who self-identify as white have higher median household incomes. In addition, the log of the average number of reviews seems to be positively correlated with median household income.
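Both impressions from Plot 3 can be quantified with a Pearson correlation computed separately in each county. A minimal base-R sketch, assuming the `county`, `pct_white`, `review_count_log`, and `hhincome` columns; the helper `cor_by_county` and the toy data are hypothetical. Pearson r measures only linear association, so it complements rather than replaces the plot.

```r
# Hypothetical helper: Pearson correlation between columns x and y within each county.
cor_by_county <- function(df, x, y) {
  # split() yields one data frame per county; sapply() returns a vector of r named by county
  sapply(split(df, df$county), function(d) {
    cor(d[[x]], d[[y]], method = "pearson", use = "complete.obs")
  })
}

# Toy data (hypothetical values, NOT taken from coffee.csv)
toy <- data.frame(
  county           = rep(c("A", "B"), each = 4),
  pct_white        = c(0.2, 0.4, 0.6, 0.8, 0.1, 0.3, 0.5, 0.7),
  review_count_log = c(1.0, 1.5, 2.5, 3.0, 0.5, 1.0, 2.0, 2.2),
  hhincome         = c(30000, 45000, 60000, 80000, 25000, 40000, 52000, 70000)
)
r_white   <- cor_by_county(toy, "pct_white", "hhincome")
r_reviews <- cor_by_county(toy, "review_count_log", "hhincome")
print(round(r_white, 3))
print(round(r_reviews, 3))
```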
# Reshape to long format (one row per tract and variable) so each variable gets its own facet
long_coffee <- coffee %>%
  pivot_longer(cols = c(pct_pov_log, hhincome, pct_white, race.tot),
               names_to = "var",
               values_to = "Values") %>%
  # Replace column names with readable facet labels
  mutate(var = case_when(
    var == "pct_pov_log" ~ "Percent Residents Under Poverty (log)",
    var == "hhincome"    ~ "Median Annual Household Income ($)",
    var == "pct_white"   ~ "Percent White Residents",
    var == "race.tot"    ~ "Total Population",
    TRUE ~ var  # keep any other values unchanged
  ))
# Each variable against logged review count, one facet per variable.
# Map columns with aes() (aes_string() is deprecated, and long_coffee$... bypasses the layer data)
scatterplots <- ggplot(long_coffee, aes(x = review_count_log, y = Values)) +
  geom_point(aes(color = county)) +
  stat_smooth(aes(color = county), method = "lm", se = FALSE) +  # per-county linear fits
  stat_cor(method = "pearson") +  # pooled Pearson R and p-value in each facet
  facet_wrap(~ var, scales = "free") +
  labs(
    x = "Review Count (log)",
    y = "Value",
    color = "County"
  )
scatterplots
The log of the average number of reviews appears to be positively correlated with median annual household income and with the percentage of residents who self-identify as white. By contrast, total population and the (logged) percentage of residents below the poverty line appear to be negatively correlated with the log of the average number of reviews.
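The R and p labels drawn on the faceted plot can also be collected into a single table, which makes the signs easier to compare. A minimal base-R sketch using `cor.test()`, assuming a long data frame with the `var`, `review_count_log`, and `Values` columns built above; like the plot labels, it pools all counties within each variable. The helper `cor_table` and the toy data are hypothetical.

```r
# Hypothetical helper: Pearson r and p-value between review_count_log and Values
# for each variable in a long-format data frame.
cor_table <- function(long_df) {
  pieces <- split(long_df, long_df$var)  # one data frame per variable
  out <- lapply(names(pieces), function(v) {
    d  <- pieces[[v]]
    ct <- cor.test(d$review_count_log, d$Values, method = "pearson")
    data.frame(var = v, r = unname(ct$estimate), p_value = ct$p.value)
  })
  do.call(rbind, out)  # stack the one-row results into a table
}

# Toy data (hypothetical values, NOT taken from coffee.csv)
toy_long <- data.frame(
  var              = rep(c("Median Annual Household Income ($)", "Total Population"), each = 5),
  review_count_log = rep(c(0.5, 1.0, 1.5, 2.0, 2.5), times = 2),
  Values           = c(30000, 42000, 47000, 61000, 70000,  # rises with reviews
                       5200, 4800, 4300, 4100, 3500)       # falls with reviews
)
res <- cor_table(toy_long)
print(res)
```

Calling `cor_table(long_coffee)` would give one row per facet; a positive `r` corresponds to an upward-sloping facet and a negative `r` to a downward-sloping one.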