library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.6     v purrr   0.3.4
## v tibble  3.1.4     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(tidycensus)
library(sf)
## Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 7.2.1; sf_use_s2() is TRUE
library(tmap)
library(jsonlite)
## 
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
## 
##     flatten
library(httr)
library(reshape2)
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
library(here)
## here() starts at C:/Users/11969/Desktop/Intro to Urban_Analytics/Assignment4
library(yelpr)
library(knitr)
library(leaflet)
library(ggplot2)
library(ggpubr)

Get the data

# here() resolves paths relative to the project root reported above
yelpcof <- read.csv(here("coffee.csv"))

Plot1

# One box per rating level, showing the spread of household income
boxplot_Arating <- ggplot(data = yelpcof, aes(x = avg_rating, group = avg_rating, y = hhincome)) +
  geom_boxplot(col = 'red') +
  labs(x = 'Average Yelp Rating', y = 'Median Household Income ($)')
boxplot_Arating

Ratings of 3 to 4 are the most common, and household income is similar across those rating levels. Ratings of 1 and 5 are the least common, and their income levels are almost the same as each other.
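The reading of the boxplot above can be checked numerically by counting businesses and summarising income at each rating level. This is a minimal sketch, assuming `yelpcof` has the `avg_rating` and `hhincome` columns used in the plot. The `toy` data frame is invented for illustration and does not contain values from `coffee.csv`.

```r
library(dplyr)

# Hypothetical stand-in for yelpcof; the values are invented for illustration.
toy <- data.frame(
  avg_rating = c(1, 3, 3, 4, 4, 4, 5),
  hhincome   = c(50000, 60000, 62000, 58000, 61000, 65000, 55000)
)

# Count rows and take the median income within each rating level,
# then sort so the most common rating level comes first.
rating_summary <- toy %>%
  group_by(avg_rating) %>%
  summarise(
    n             = n(),
    median_income = median(hhincome, na.rm = TRUE),
    .groups       = "drop"
  ) %>%
  arrange(desc(n))

rating_summary
```

Replacing `toy` with `yelpcof` would give the same table for the real data.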

Plot2

boxplot_Arating + facet_wrap(~county) + theme_light() +
  labs(x = "Average Yelp Rating", y = "Median Household Income ($)")

Clayton County has no businesses with a rating of 5, and Cobb County has none with a rating of 1.
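Missing rating levels are easy to overlook in a faceted plot, so a count table with explicit zeros can confirm them. This is a minimal sketch, assuming the `county` and `avg_rating` columns used above. The `toy` data frame is invented for illustration and is not taken from `coffee.csv`.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for yelpcof; the values are invented for illustration.
toy <- data.frame(
  county     = c("Clayton", "Clayton", "Cobb", "Cobb", "Cobb"),
  avg_rating = c(1, 3, 3, 5, 5)
)

# Count businesses per county and rating, then add the county/rating
# combinations that never occur, with a count of zero.
rating_counts <- toy %>%
  count(county, avg_rating) %>%
  complete(county, avg_rating, fill = list(n = 0L))

# Rows with n == 0 are the rating levels a county does not have.
missing_levels <- filter(rating_counts, n == 0)
missing_levels
```

Note that `complete()` only fills in rating levels observed somewhere in the data, so a level absent from every county would not appear.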

Plot3

# Use the log-transformed review count so the x variable matches its axis label
ggplot(data = yelpcof) +
  geom_point(mapping = aes(x = review_count_log, y = hhincome, color = pct_white), alpha = 0.3) +
  facet_wrap(~county) +
  labs(x = "Review Count (log)",
       y = "Median Annual Household Income",
       color = "Proportion of residents who self-identified as white",
       title = "Review Count vs. Household Income") +
  scale_color_gradient(low = "darkblue", high = "red")

Fulton County has the highest income levels, while Clayton County has a smaller share of white residents and a lower overall income level. The Clayton County pattern may be due to geographic factors and a small sample size, because no relationship between income level and racial distribution can be seen in the other counties.

Plot4

yelpcof_pivot <- pivot_longer(yelpcof, cols = c('hhincome','pct_pov_log','pct_white','race.tot'))

ggplot(data = yelpcof_pivot, mapping = aes(x = review_count_log, y = value)) +
  geom_point(mapping = aes(color = county)) +
  geom_smooth(mapping = aes(color = county), method = "lm") +
  facet_wrap(~name,
             scales = "free_y",
             labeller = labeller(name = c(
               "hhincome"    = "Median Annual Household Income ($)",
               "pct_pov_log" = "Percent Residents Under Poverty (log)",
               "pct_white"   = "Percent White Resident",
               "race.tot"    = "Total Population"
             ))) +
  # Map color here as well so a coefficient and p-value are reported per county
  ggpubr::stat_cor(mapping = aes(color = county), method = "pearson")
## `geom_smooth()` using formula 'y ~ x'

According to the p-values, review count is not related to total population. In contrast, review count is related to the poverty measure in every county, most clearly in DeKalb County. DeKalb is also the only county where review count is correlated with household income. This suggests a relationship between dining activity, as measured by review counts, and local economic conditions. Fulton and DeKalb are the two counties where review count is correlated with the share of white residents, but the geographic factors involved also have to be considered.
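The county-level statements above can also be checked in table form by running a Pearson correlation test within each county. This is a minimal sketch, assuming the `county`, `review_count_log`, and `hhincome` columns used in the plots. The `toy` data frame and its county names are invented for illustration and do not reproduce the real results.

```r
library(dplyr)

# Hypothetical stand-in for yelpcof; the values are invented for illustration.
# County A is built to be strongly correlated, County B only weakly.
toy <- data.frame(
  county           = rep(c("County A", "County B"), each = 10),
  review_count_log = rep(1:10, times = 2),
  hhincome         = c(50 + 1:10 + rep(c(0.5, -0.5), 5),
                       50 + rep(c(1, -1), 5))
)

# Pearson correlation coefficient and p-value within each county.
cor_by_county <- toy %>%
  group_by(county) %>%
  summarise(
    r       = unname(cor.test(review_count_log, hhincome, method = "pearson")$estimate),
    p_value = cor.test(review_count_log, hhincome, method = "pearson")$p.value,
    n       = n(),
    .groups = "drop"
  )

cor_by_county
```

Running the same summary on `yelpcof` for each variable of interest gives one row per county. A small p-value only indicates that a linear association is unlikely to be zero, so the coefficient `r` and the sample size `n` are worth reading alongside it.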