MA_ZHAOXIN

Setting up the data given.

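# Load the packages used below: tidyverse (readr, dplyr, tidyr, ggplot2) and ggpubr (stat_cor).
library(tidyverse)
library(ggpubr)
# Load the coffee data given in the lab.
coffee_data <- read_csv("C:/Users/jenny/OneDrive - Georgia Institute of Technology/Desktop/CP8883/Mini 4/coffee.csv") %>%
  select(-`...1`)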

# Preview the first rows and confirm the expected column names
head(coffee_data)
names(coffee_data)

Plot 1: Household Income by Average Coffee Shop Rating

ggplot(coffee_data %>%
         mutate(avg_rating = round(avg_rating, 0) %>% factor(ordered = TRUE))) +
  geom_boxplot(aes(x = avg_rating, y = hhincome),
               fill = "lightgreen", color = "darkgreen") +
  labs(
    title = "Plot 1. Median Household Income by Average Coffee Shop Rating",
    x = "Average Coffee Shop Rating (Rounded to Stars)",
    y = "Median Household Income ($)"
  ) +
  theme_bw()

# Findings from Plot 1 (above): 
# The boxplot shows that coffee shops in higher-income neighborhoods tend to have higher average ratings. Tracts with one- or two-star shops usually have lower median household incomes, while tracts with three- and four-star shops often have median incomes above $100,000. This suggests that wealthier communities may attract or support better-rated coffee shops. However, incomes vary widely within each rating level, so income alone does not fully explain the differences in ratings. Some high-income areas also have lower-rated shops, which suggests that other factors (such as customer expectations or the number of shops nearby) may also play a role.
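
Because Plots 1 and 2 group tracts by `round(avg_rating, 0)`, it may help to see how R treats ratings that sit exactly on a half. The sketch below uses made-up ratings (the vector `toy_ratings` is hypothetical, not taken from coffee.csv). R rounds halves to the nearest even number, so a 2.5 lands in the 2-star group while a 3.5 lands in the 4-star group.

```r
# Hypothetical ratings for illustration only (not from coffee.csv)
toy_ratings <- c(1.4, 2.5, 3.5, 4.2, 4.8)

# round() follows the IEC 60559 "round half to even" rule
rounded <- round(toy_ratings, 0)

# Same conversion used for the boxplot x-axis
stars <- factor(rounded, ordered = TRUE)

print(rounded)        # 1 2 4 4 5
print(levels(stars))  # "1" "2" "4" "5"
```

Note that a star level with no tracts (3 in this toy vector) simply does not appear on the axis.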

Plot 2: Median Household Income by Average Coffee Rating, Organized by County

ggplot(coffee_data %>%
         mutate(avg_rating = round(avg_rating, 0) %>% factor(ordered = TRUE))) +
  geom_boxplot(aes(x = avg_rating, y = hhincome),
               fill = "lightblue", color = "darkblue") +
  facet_wrap(~ county) +
  labs(
    title = "Plot 2. Median Household Income by Coffee Shop Rating, Organized by County",
    x = "Average Coffee Shop Rating (Rounded to Stars)",
    y = "Median Household Income ($)"
  ) +
  theme_bw()

# Findings from Plot 2 (above): 
# This plot shows how the relationship between coffee shop ratings and household income differs across the five counties. In Fulton and Cobb Counties, there is a clear trend where higher-rated shops are often located in higher-income tracts, with several outliers reaching above $200,000. These outliers, shown as dots above the boxes, represent particularly wealthy neighborhoods that may have strong consumer demand and more competitive coffee markets. DeKalb County shows a similar but weaker pattern, while Gwinnett and Clayton Counties have narrower income ranges and smaller variations between ratings, suggesting that ratings there are less influenced by income levels. Overall, the presence of multiple outliers and varying box sizes indicates that while income generally rises with better coffee ratings, the strength of this pattern depends on local county conditions.
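
The dots that `geom_boxplot()` draws as outliers are points lying more than 1.5 times the interquartile range beyond the quartiles. A minimal sketch of that rule follows, using a hypothetical vector of tract incomes (`toy_income`, in thousands of dollars) rather than the lab data.

```r
# Hypothetical tract incomes in thousands of dollars (not from coffee.csv)
toy_income <- c(40, 55, 60, 65, 70, 80, 210)

# First and third quartiles, which form the edges of the box
q <- quantile(toy_income, c(0.25, 0.75))
iqr <- unname(q[2] - q[1])

# Whisker limits: 1.5 * IQR beyond each quartile
lower_fence <- unname(q[1]) - 1.5 * iqr
upper_fence <- unname(q[2]) + 1.5 * iqr

# Points outside the fences are drawn as individual dots
outliers <- toy_income[toy_income < lower_fence | toy_income > upper_fence]
print(outliers)  # 210
```

Under this rule, a county with a narrow income range can flag a tract as an outlier at an income that would sit inside the whiskers in a county with a wider range.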

Plot 3: Review Count vs. Household Income by County, Colored by % Self-Identified White

ggplot(coffee_data) +
  geom_point(aes(x = review_count_log, 
                 y = hhincome, 
                 color = pct_white), 
             alpha = 0.8, size = 3) +
  facet_wrap(~ county) +
  scale_color_gradient(low = "blue", high = "red",
                       name = "Proportion of Residents\nIdentifying as White") +
  labs(
    title = "Plot 3. Review Count vs. Household Income by County",
    x = "Review Count (log scale)",
    y = "Median Household Income ($)"
  ) +
  theme_bw()

# Findings from Plot 3 (above): 
# The scatterplot above shows how the number of coffee shop reviews (on a log scale) relates to median household income across five metro Atlanta counties. Each dot represents a census tract. The x-axis shows the log of average review counts, so equal steps along the axis correspond to multiplicative jumps in raw counts: assuming a natural log, a move from 2 to 4 corresponds to a jump from about 7 reviews to about 55. The y-axis shows median household income in dollars. The color scale represents the proportion of residents self-identifying as White, with blue points showing lower proportions and red points showing higher proportions.
# In general, it seems that tracts with higher incomes tend to have more coffee shop reviews, especially in Cobb, Fulton, and DeKalb Counties. These counties have more red points concentrated toward the top-right of each panel, showing that higher-income, majority-White areas often have more active or popular coffee shops. In contrast, Clayton County has mostly blue and purple points near the bottom-left, indicating fewer reviews and lower household incomes. Gwinnett County shows a mix of both patterns, with moderate incomes and review counts.
# Overall, this suggests a positive correlation between income and review activity: wealthier areas tend to have coffee shops that attract more customer engagement. The color trend adds another layer of information, indicating that tracts with higher proportions of White residents also tend to have higher incomes and more reviews. Altogether, these patterns suggest that both socioeconomic status and demographic makeup are associated with the online visibility of, and community engagement with, local coffee shops.
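
To read the logged x-axis of Plot 3 in raw review counts, the log values can be back-transformed. This sketch assumes `review_count_log` is a natural log, which the lab data does not state explicitly. If the variable used a different base, the exponentiation would change accordingly.

```r
# Positions on the x-axis (assumed to be natural-log values)
log_values <- c(2, 4)

# Back-transform to approximate raw review counts
raw_counts <- exp(log_values)
print(round(raw_counts, 1))  # 7.4 54.6

# A 2-unit step on the log axis multiplies the count by exp(2)
print(round(raw_counts[2] / raw_counts[1], 1))  # 7.4
```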

Plot 4: Logged Review Count vs. Neighborhood Characteristics

# Reshape data to long format
coffee_long <- coffee_data %>%
  select(county, review_count_log, hhincome, pct_pov_log, pct_white, pop) %>%
  pivot_longer(
    cols = c(hhincome, pct_pov_log, pct_white, pop),
    names_to = "variable", values_to = "value"
  ) %>%
  mutate(variable = recode(variable,
                           hhincome = "Median Annual Household Income ($)",
                           pct_pov_log = "Residents Under Poverty Level (%; log)",
                           pct_white = "Residents who self-identify as White (%)",
                           pop = "Total Population"))

# Make the plot
ggplot(coffee_long, aes(x = review_count_log, y = value, color = county)) +
  geom_point(alpha = 0.8, size = 2) + 
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ variable, scales = "free_y") +
  # Add correlation values (formatted like lab example)
  stat_cor(method = "pearson",
           label.x = 1.5, label.y.npc = 0.9, # position top-left
           # Build a plotmath label from the computed r and p values
           aes(label = paste0("italic(R)~`=`~", round(after_stat(r), 2),
                              "~`,`~italic(p)~`=`~", signif(after_stat(p), 2))),
           parse = TRUE, size = 3.5, color = "black") +
  labs(
    title = "Plot 4. Logged Review Count vs. Neighborhood Characteristics",
    x = "Review Count (log)",
    y = "Values",
    color = "County"
  ) +
  theme_bw()

# Findings from Plot 4 (above): 
# This figure shows how the number of coffee shop reviews (on a log scale) relates to several neighborhood characteristics: median household income, poverty level, racial composition, and total population. Each point represents a census tract across the five metro Atlanta counties, and the trend lines show general relationships. The black text in each panel gives the correlation (R) and p-value (p) to describe how strong and significant each relationship is.
# In the top-left panel, there is a weak but positive correlation (R = 0.16, p = 6e-04) between income and review count, meaning tracts with higher median household incomes tend to have more coffee shop reviews. This suggests that coffee shops in wealthier areas may attract more customers or receive more online engagement. In the top-right panel, the relationship between poverty level and review count is slightly negative (R = -0.12, p = 0.014). This means tracts with higher poverty rates generally have fewer reviews, possibly because they have fewer coffee shops or less online activity related to them. The bottom-left panel shows the strongest of the four correlations, though it is still weak (R = 0.24, p = 6.7e-07), between the percentage of residents who identify as White and review count. Areas with higher proportions of White residents tend to have more reviews, which may reflect differences in socioeconomic factors, access to amenities, or engagement with digital review platforms. Finally, the bottom-right panel shows almost no relationship between total population and review count (R = -0.03, p = 0.52). This means that simply having more people in an area does not necessarily lead to more coffee shop reviews.
# Overall, the analysis suggests that income and racial composition are more closely linked to coffee shop activity than population size. The regression lines and correlation values are consistent with economic and demographic factors being associated with how active local coffee shop scenes are within different parts of the Atlanta metro area, although correlations this weak do not establish a causal role.
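
The R and p values printed in each panel are Pearson correlation results, which can be cross-checked outside the plot with `cor.test()`. The helper `cor_summary()` below is a hypothetical name introduced for illustration, and it is demonstrated on R's built-in `mtcars` data rather than the coffee data so the sketch runs on its own.

```r
# Hypothetical helper: Pearson correlation rounded the same way as the plot labels
cor_summary <- function(x, y) {
  test <- cor.test(x, y, method = "pearson")
  c(R = round(unname(test$estimate), 2),
    p = signif(test$p.value, 2))
}

# Demonstrated on the built-in mtcars data, not the coffee data
result <- cor_summary(mtcars$wt, mtcars$mpg)
print(result)
```

With the lab data loaded, a call such as `cor_summary(coffee_data$review_count_log, coffee_data$hhincome)` should give the pooled correlation for the income panel, since the black labels are computed across all counties together.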

MA_ZHAOXIN_Mini4

2025-10-08