Mini Assignment 4

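library(tidyverse)  # dplyr and tidyr for wrangling, ggplot2 for plotting
library(ggpubr)     # provides stat_cor(), used in Plot 4
coffee <- read.csv("/Users/helenalindsay/Documents/Fall_23/CP8883/Mini4/coffee.csv") %>%
  select(-X)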

Plot 1

ggplot(data = coffee) +
  geom_boxplot(aes(x=factor(avg_rating), y=hhincome), 
               fill = "white", color = "black")+
  labs(x = "Average Rating", y = "Household Income")

Plot 1 suggests that household income tends to be higher where the average Yelp rating is 3 or 4.
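The boxplot reading can be checked against the group medians directly. The following is a minimal sketch in base R, assuming `coffee` has the `avg_rating` and `hhincome` columns used above; the helper name `summarise_income_by_rating` is hypothetical and not part of the original assignment.

```r
# Hypothetical helper: median household income and number of rows per rating.
# aggregate() with a formula drops rows where hhincome or avg_rating is NA.
summarise_income_by_rating <- function(df) {
  med <- aggregate(hhincome ~ avg_rating, data = df, FUN = median)
  cnt <- aggregate(hhincome ~ avg_rating, data = df, FUN = length)
  data.frame(
    avg_rating      = med$avg_rating,
    median_hhincome = med$hhincome,
    n               = cnt$hhincome
  )
}

# Usage with the assignment data (requires coffee to be loaded):
# summarise_income_by_rating(coffee)
```

Reporting `n` alongside the median helps show whether a rating group is too small to support a conclusion.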

Plot 2

ggplot(data = coffee) +
  geom_boxplot(aes(x=factor(avg_rating), y=hhincome), 
               fill = "white", color = "black")+
  labs(x = "Average Yelp Rating", y = "Median Annual Household Income ($)") +
  facet_wrap(~county)

Plot 2 indicates that the pattern seen in Plot 1 (higher household income at average Yelp ratings of 3 and 4) holds across counties.

Plot 3

ggplot(data = coffee) +
  geom_point(aes(x = review_count_log, y = hhincome, color = pct_white)) +
  labs(x = "Review Count (log)", y = "Median Annual Household Income ($)",
       color = "Percent White Residents") +
  facet_wrap(~county)

Plot 3 suggests that, across counties, tracts with a higher percentage of residents who self-identify as white have a higher median annual household income. In addition, the log of the average number of reviews seems to be positively correlated with median annual household income.
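The visual impression of a positive relationship can be quantified per county. This is a minimal sketch in base R, assuming `coffee` has the `county`, `review_count_log`, and `hhincome` columns used in Plot 3 and that every county has at least two complete rows; `cor_by_county` is a hypothetical helper name.

```r
# Hypothetical helper: Pearson correlation between logged review count and
# household income, computed separately for each county.
cor_by_county <- function(df) {
  sapply(split(df, df$county), function(d) {
    # use = "complete.obs" drops rows with NA in either variable
    cor(d$review_count_log, d$hhincome, use = "complete.obs")
  })
}

# Usage with the assignment data (requires coffee to be loaded):
# cor_by_county(coffee)
```

The result is a named numeric vector with one coefficient per county, so the sign and strength can be compared across the facets.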

Plot 4

long_coffee <- coffee %>%
  pivot_longer(cols = c(pct_pov_log, hhincome, pct_white, race.tot),
               names_to = "var",
               values_to = "Values")

long_coffee <- long_coffee %>%
  mutate(
    var = case_when(
      var == "pct_pov_log" ~ "Percent Residents Under Poverty (log)",
      var == "hhincome" ~ "Median Annual Household Income ($)",
      var == "pct_white" ~ "Percent White Residents",
      var == "race.tot" ~ "Total Population",
      TRUE ~ as.character(var)  # Keep other values unchanged
    )
  )

# x and y are mapped once for all layers; color is mapped per layer so that
# stat_cor() reports one overall Pearson correlation per panel, not one per county.
scatterplots <- ggplot(long_coffee, aes(x = review_count_log, y = Values)) +
  geom_point(aes(color = county)) +
  stat_smooth(aes(color = county), method = "lm", se = FALSE) +
  labs(
    x = "Review Count (log)",
    y = "Values",
    color = "County"
  ) +
  facet_wrap(~ var, scales = "free") +
  stat_cor(method = "pearson")


scatterplots

Plot 4 suggests that the log of the average number of reviews is positively correlated with both median annual household income and the percentage of residents who self-identify as white. In contrast, total population and the (logged) percentage of residents under poverty appear to be negatively correlated with the log of the average number of reviews.
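The coefficients printed on each panel can also be collected into one vector for easier comparison. This is a minimal sketch in base R, assuming the wide `coffee` data frame still has the four columns that were pivoted for Plot 4; `review_correlations` is a hypothetical helper name.

```r
# Hypothetical helper: Pearson correlation of logged review count with each
# of the variables shown in Plot 4, using the wide (un-pivoted) data.
review_correlations <- function(df,
                                vars = c("pct_pov_log", "hhincome",
                                         "pct_white", "race.tot")) {
  sapply(vars, function(v) {
    # use = "complete.obs" drops rows with NA in either variable
    cor(df$review_count_log, df[[v]], use = "complete.obs")
  })
}

# Usage with the assignment data (requires coffee to be loaded):
# round(review_correlations(coffee), 2)
```

These coefficients pool all counties, which matches a single per-panel correlation but can hide county-level differences.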


Helena Lindsay

2023-10-10
