HW_4

Plot 1.

Higher average rating scores tend to correspond to higher median annual household incomes for rating scores 1 through 4; coffee shops with a rating score of 5 break this pattern.

The variation in median household income is the smallest among coffee shops with an average rating score of 1.

Coffee shops with an average rating score of 4 show large variation in median household income.

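# Packages used throughout this report
library(ggplot2)  # ggplot(), geom_boxplot(), geom_point(), facet_wrap()
library(dplyr)    # %>%, mutate(), if_else()
library(tidyr)    # pivot_longer()
library(ggpubr)   # stat_cor()

coffee <- read.csv("../coffee.csv")

# Plot 1
# Note: one box per rating requires a discrete x; if avg_rating is read
# as numeric, wrap it in factor() inside aes().
bxplot <- ggplot(data = coffee) +
  geom_boxplot(aes(x = avg_rating, y = hhincome),
               color = "black", fill = "white")

plotly::ggplotly(bxplot)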
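
The statements about variation above are read off the boxplot by eye. As a rough cross-check, the spread within each rating group could be summarized with the interquartile range. The sketch below uses a small made-up data frame (`toy`, not the real `coffee.csv`) that only borrows the column names `avg_rating` and `hhincome`; the numbers are purely illustrative.

```r
# Hypothetical toy data standing in for coffee.csv (all values are made up).
toy <- data.frame(
  avg_rating = c(1, 1, 2, 2, 2, 4, 4, 4, 5, 5),
  hhincome   = c(30000, 32000, 40000, 45000, 52000,
                 35000, 80000, 120000, 60000, 70000)
)

# Median and interquartile range of income within each rating group.
median_by_rating <- tapply(toy$hhincome, toy$avg_rating, median)
iqr_by_rating    <- tapply(toy$hhincome, toy$avg_rating, IQR)

print(data.frame(
  avg_rating = names(iqr_by_rating),
  median     = as.numeric(median_by_rating),
  iqr        = as.numeric(iqr_by_rating)
))

# Rating groups with the narrowest and widest spread.
narrowest <- names(which.min(iqr_by_rating))
widest    <- names(which.max(iqr_by_rating))
```

Running the same two `tapply()` calls on `coffee` would show whether rating 1 really has the smallest spread and rating 4 a large one.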

Plot 2.

Observations from Clayton County show limited variation in average rating scores.

Cobb County does not have any observations with an average rating of 1.

In contrast, Fulton County shows a high degree of variability in average rating scores.

DeKalb County and Fulton County show similar trends across the different average rating levels.

bxplot_county <- ggplot(data = coffee) +
  geom_boxplot(aes(x = avg_rating, y=hhincome), 
               color = "black",fill="white") +
  facet_wrap(~county) +
  labs(x="Average Rating",
       y="Median Annual Household Income ($)")

plotly::ggplotly(bxplot_county)
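
Claims such as "Cobb County has no observations with a rating of 1" can be checked directly with a county-by-rating count table. The sketch below again uses made-up data (`toy`) with the column names `county` and `avg_rating` borrowed from the report; it assumes ratings take the integer values 1 to 5.

```r
# Hypothetical toy data (made-up rows, not the real coffee.csv).
toy <- data.frame(
  county     = c("Clayton", "Clayton", "Cobb", "Cobb",
                 "Fulton", "Fulton", "Fulton", "Fulton"),
  avg_rating = c(3, 4, 2, 4, 1, 3, 4, 5)
)

# Cross-tabulate county by rating; fixing the levels keeps empty ratings visible.
counts <- table(toy$county, factor(toy$avg_rating, levels = 1:5))
print(counts)

# Number of distinct rating levels observed in each county.
levels_per_county <- rowSums(counts > 0)
print(levels_per_county)
```

A zero in the table marks a county and rating combination with no coffee shops, and a small value in `levels_per_county` corresponds to the "limited variation" noted above.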

Plot 3.

The proportion of residents who self-identified as White seems to be related to both review count and median annual household income across all counties.

Clayton County shows a low proportion of White residents.

Fulton County has more observations than Clayton County.

scatter_county <- ggplot(data=coffee) +
  geom_point(aes(x=log(review_count), y=hhincome, color=pct_white),
             size=3, alpha=0.7) +
  facet_wrap(~county) +
  scale_color_gradient(low = "darkblue", high = "red") +
  labs(x="Review Count (log)",
       y="Median Annual Household Income")

plotly::ggplotly(scatter_county)
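
The two county-level remarks above (sample size and share of White residents) can be backed with simple per-county summaries. This is a minimal sketch on made-up data; `toy` is hypothetical, and `pct_white` is assumed here to be stored as a proportion between 0 and 1, which may not match the real file.

```r
# Hypothetical toy data (made-up values; pct_white assumed to be a proportion).
toy <- data.frame(
  county    = c("Clayton", "Clayton", "Fulton", "Fulton", "Fulton", "Fulton"),
  pct_white = c(0.10, 0.15, 0.40, 0.55, 0.70, 0.80)
)

# Number of observations per county.
n_by_county <- table(toy$county)
print(n_by_county)

# Median share of White residents per county.
median_white <- tapply(toy$pct_white, toy$county, median)
print(median_white)
```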

Plot 4.

Median Annual Household Income

DeKalb County shows a positive correlation between review count and household income.

Residents Under Poverty Level (%, log)

No county shows a significant relationship between review count and poverty level.

Residents who self-identify as White (%)

DeKalb County shows a strong positive correlation between review count and the proportion of White residents. Fulton County shows a similar relationship, whereas Clayton County does not show a meaningful correlation, presumably because of its small sample size.

Total Population

Population count does not have a meaningful relationship with review count in any of the counties.

coffee_long <- coffee %>%
  pivot_longer(
    cols = c(hhincome, pct_pov, pct_white, pop),
    names_to  = "variable",
    values_to = "value"
  ) %>%
  mutate(
    value = if_else(variable == "pct_pov", log(value), value)
  )

scatter_4 <- ggplot(data=coffee_long) +
  geom_point(aes(x=log(review_count), y=value, color=county)) +
  geom_smooth(aes(x=log(review_count), y = value, color=county), method = lm, se=FALSE) +
  stat_cor(aes(x=log(review_count), y=value),
           output.type="text",
           label.x.npc = "left", label.y.npc = "top",
           size = 3.5, color = "black", label.sep = ", ") +
  facet_wrap(~variable, scales = "free_y",
             labeller = as_labeller(c(
               hhincome="Median Annual Household Income ($)",
               pct_pov="Residents Under Poverty Level (%, log)",
               pct_white="Residents who self-identify as White (%)",
               pop="Total Population"
             ))) +
  labs(
    x = "Review Count (log)",
    y = "Value",
    color = "County"
  ) +
  theme_bw() +
  theme_bw() +
  theme(strip.text=element_text(size=7))
  
scatter_4
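
The county-level correlation statements above may need separate numbers: as written, the `stat_cor()` layer maps only `x` and `y`, so it appears to report one pooled coefficient per panel rather than one per county. A per-county check could look like the sketch below. The data frame `toy` is made up for illustration, and Pearson correlation via `cor.test()` is an assumption about which statistic is wanted.

```r
# Hypothetical toy data (made-up values, two counties only).
toy <- data.frame(
  county       = rep(c("DeKalb", "Clayton"), each = 5),
  review_count = c(5, 20, 60, 150, 400, 8, 30, 70, 200, 350),
  hhincome     = c(40000, 52000, 61000, 75000, 90000,
                   45000, 41000, 47000, 43000, 46000)
)

# Pearson correlation between log(review_count) and income, one test per county.
cor_by_county <- sapply(split(toy, toy$county), function(d) {
  test <- cor.test(log(d$review_count), d$hhincome)
  c(r = unname(test$estimate), p = test$p.value, n = nrow(d))
})

# Rows are counties; columns are the coefficient, p-value, and sample size.
print(round(t(cor_by_county), 3))
```

Reporting `n` next to each coefficient makes it easier to judge whether a weak correlation, such as the one suggested for Clayton County, is simply a small-sample effect.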
