mini_4_1011

Plot 1: Variables used - avg_rating, hhincome

library(ggplot2)
# Boxplot of household income at each average-rating level
ggplot(data, aes(x = as.factor(avg_rating), y = hhincome)) +
  geom_boxplot()

Median household income appears to vary across rating levels. Businesses rated 3 or 4 are located in tracts with generally higher median income than lower-rated businesses. The highest rating is the exception and does not continue this pattern.
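The boxplot reading above can be cross-checked numerically by tabulating the median income at each rating level. The following is a minimal sketch in base R, not part of the original analysis. The `toy` data frame and the helper name `median_income_by_rating` are assumptions made for illustration, and the values in `toy` are invented rather than taken from the Yelp data.

```r
# Hypothetical stand-in for `data`: values are invented for illustration only.
toy <- data.frame(
  avg_rating = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  hhincome   = c(40000, 50000, 45000, 55000, 70000, 80000, 75000, 85000, 60000, 70000)
)

# Median household income and business count at each rating level.
# `median_income_by_rating` is an illustrative helper name, not from the report.
median_income_by_rating <- function(df) {
  ok  <- !is.na(df$avg_rating) & !is.na(df$hhincome)
  med <- tapply(df$hhincome[ok], df$avg_rating[ok], median)
  n   <- tapply(df$hhincome[ok], df$avg_rating[ok], length)
  data.frame(
    avg_rating      = names(med),
    median_hhincome = as.vector(med),
    n               = as.vector(n)
  )
}

rating_summary <- median_income_by_rating(toy)
print(rating_summary)
```

Passing the real `data` object instead of `toy` would give the medians that the boxplot center lines represent, along with the number of businesses behind each box.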

Plot 2: Variables used - avg_rating, hhincome, county

ggplot(data, aes(x = as.factor(avg_rating), y = hhincome)) +
  geom_boxplot() +
  facet_wrap(~ county, ncol = 2)

In some counties (like Cobb and Fulton), there appears to be a trend where higher-rated coffee businesses are more commonly found in higher-income neighborhoods. Other counties, such as Clayton and Gwinnett, do not show a clear pattern between income and average rating.
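One way to put a number on "clear pattern" versus "no clear pattern" is a rank correlation between rating and income within each county. The sketch below uses Spearman's rho, which is a choice made here for illustration and not something the original report computes. The county labels "A" and "B", the helper name `rating_income_cor`, and all values in `toy` are hypothetical.

```r
# Hypothetical stand-in for `data`: two made-up counties, invented values.
toy <- data.frame(
  county     = rep(c("A", "B"), each = 4),
  avg_rating = rep(1:4, times = 2),
  hhincome   = c(40000, 50000, 60000, 70000,   # county A: income rises with rating
                 50000, 40000, 60000, 45000)   # county B: no monotone pattern
)

# Spearman rank correlation between rating and income, one value per county.
# `rating_income_cor` is an illustrative helper name, not from the report.
rating_income_cor <- function(df) {
  sapply(split(df, df$county), function(d) {
    cor(d$avg_rating, d$hhincome, method = "spearman", use = "complete.obs")
  })
}

rho_by_county <- rating_income_cor(toy)
print(round(rho_by_county, 2))
```

A value near 1 would be consistent with the trend described for counties like Cobb and Fulton, while a value near 0 would match the description of Clayton and Gwinnett. With few businesses per county, these coefficients should be read cautiously.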

Plot 3: Variables used - review_count_log, hhincome, county, pct_white

ggplot(data, aes(x = review_count_log, y = hhincome, color = pct_white)) +
  geom_point(size = 2, alpha = 0.6) +
  scale_color_gradient(low = "blue", high = "red") +
  labs(x = "Review Count (log)", y = "Median Household Income", title = "Scatterplot: Review Count vs. Household Income") +
  facet_wrap(~ county, ncol = 2)

In every county except Clayton, census tracts with a higher proportion of residents who self-identify as white tend to have higher median household income. Across counties, coffee businesses appear to cluster around certain income levels and review counts. In Fulton County, for example, the spread of review counts is broader and includes businesses in both low-income and high-income areas. Clayton County, in contrast, shows a limited range of both income and review counts, suggesting a more homogeneous economic environment and pattern of business engagement.
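The two observations in this paragraph, the spread of each county's points and the link between `pct_white` and `hhincome`, can be summarized per county. The following is a rough sketch under stated assumptions: the helper name `spread_by_county`, the county labels, and every value in `toy` are hypothetical, and the range and Pearson correlation are just one possible choice of summary.

```r
# Hypothetical stand-in for `data`: two made-up counties, invented values.
toy <- data.frame(
  county           = rep(c("A", "B"), each = 3),
  hhincome         = c(30000, 60000, 90000, 40000, 45000, 50000),
  review_count_log = c(1, 3, 5, 2, 2.5, 3),
  pct_white        = c(0.2, 0.5, 0.8, 0.6, 0.4, 0.5)
)

# Per-county spread of income and logged review count, plus the correlation
# between share of white residents and household income.
# `spread_by_county` is an illustrative helper name, not from the report.
spread_by_county <- function(df) {
  do.call(rbind, lapply(split(df, df$county), function(d) {
    data.frame(
      county           = d$county[1],
      hhincome_range   = diff(range(d$hhincome, na.rm = TRUE)),
      review_log_range = diff(range(d$review_count_log, na.rm = TRUE)),
      cor_white_income = cor(d$pct_white, d$hhincome, use = "complete.obs")
    )
  }))
}

county_spread <- spread_by_county(toy)
print(county_spread)
```

In this toy example, county "A" plays the role of a county with a wide spread and a positive association, and county "B" the role of a county with a narrow spread and no positive association.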

Plot 4: Variables used - pct_pov_log, hhincome, pct_white, pop, review_count_log, county

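library(dplyr)  # %>%
library(tidyr)  # pivot_longer()
# Reshape to long format: one row per business and neighborhood variable
data_long <- data %>%
  pivot_longer(cols = c(hhincome, pct_pov_log, pct_white, pop),
               names_to = "variable",
               values_to = "value")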

fig <- ggplot(data_long, aes(x = review_count_log, y = value, color = county)) +
  geom_point(size = 1) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 0.7) +
  facet_wrap(~ variable, ncol = 2, scales = "free_y") +
  labs(
    x = "Review Count(log)", 
    y = "Value", 
    title = "Scatterplot between logged review count & neighborhood characteristics",
    subtitle = "Using Yelp data in Five Counties Around Atlanta, GA")

fig

For median household income against logged review count, only DeKalb County shows a positive slope, indicating that higher review counts are associated with higher median household income. The other counties show a nearly flat slope, suggesting no specific relationship between these variables. For the log-transformed percentage of residents under poverty, all counties show a negative slope: as the review count increases, the log-transformed poverty percentage decreases. This implies that higher review counts are associated with lower poverty levels in each census tract. Conversely, for the percentage of residents who self-identify as white, all counties display a positive slope, indicating that as the review count increases, the percentage of white residents also increases. DeKalb and Fulton counties show the most pronounced slopes. Lastly, for population against logged review count, most counties display a flat line. This is likely because most tracts are clustered around the middle range of the logged review count (x-axis) and a population below 10,000 (y-axis), indicating no specific relationship.
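The slopes discussed above are the ones `geom_smooth(method = "lm")` draws for each county within each facet. They can be extracted directly by fitting the same simple linear model per county. The following is a minimal sketch in base R, not part of the original report. The helper name `slope_by_county`, the county labels, and all values in `toy` are assumptions made for illustration.

```r
# Hypothetical stand-in for `data`: two made-up counties, invented values.
toy <- data.frame(
  county           = rep(c("A", "B"), each = 4),
  review_count_log = rep(1:4, times = 2),
  hhincome         = c(50000 + 10000 * (1:4), rep(60000, 4)),
  pct_pov_log      = c(3 - 0.5 * (1:4), 2 - 0.25 * (1:4))
)

# Slope of y_var regressed on logged review count, fitted separately per county.
# `slope_by_county` is an illustrative helper name, not from the report.
slope_by_county <- function(df, y_var, x_var = "review_count_log") {
  sapply(split(df, df$county), function(d) {
    fit <- lm(reformulate(x_var, response = y_var), data = d)
    unname(coef(fit)[2])  # second coefficient is the slope on x_var
  })
}

# One column per neighborhood variable, one row per county.
slopes <- sapply(c("hhincome", "pct_pov_log"),
                 function(v) slope_by_county(toy, v))
print(slopes)
```

Applied to the real `data` with `hhincome`, `pct_pov_log`, `pct_white`, and `pop`, the resulting table would show the sign and size of each fitted line, which makes it easier to confirm which slopes are positive, negative, or effectively flat.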

Seung Jae Lieu

2024-10-10
