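# Load the packages used below (ggplot2 for plots, tidyr for pivot_longer, dplyr for %>%)
library(ggplot2)
library(tidyr)
library(dplyr)
setwd("C:/Users/49765/Desktop/Urban Analytics/mini4")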
# Read CSV file
data <- read.csv("coffee.csv")
head(data)
## X GEOID county hhincome pct_pov review_count avg_rating
## 1 1 13063040202 Clayton County 33276 0.20134228 57.00000 2
## 2 2 13063040308 Clayton County 28422 0.21071800 13.00000 3
## 3 3 13063040407 Clayton County 49271 0.10825507 29.33333 2
## 4 4 13063040408 Clayton County 44551 0.18095661 20.00000 4
## 5 5 13063040410 Clayton County 49719 0.11468019 41.00000 1
## 6 6 13063040411 Clayton County 57924 0.09068942 18.00000 2
## race.tot avg_price pct_white hhincome_log review_count_log pct_pov_log
## 1 2850 1 0.07508772 10.41289 4.060443 -1.554276
## 2 4262 1 0.26067574 10.25527 1.975622 -1.510869
## 3 4046 1 0.20514088 10.80529 3.320837 -2.134911
## 4 8489 1 0.16868889 10.70461 3.044522 -1.655709
## 5 7166 1 0.19369244 10.81434 3.737670 -2.082003
## 6 13311 1 0.16512659 10.96706 2.944439 -2.295715
## yelp_n
## 1 1
## 2 2
## 3 3
## 4 1
## 5 1
## 6 1
Household income tends to rise with the average rating, except at a rating of 5. If higher-rated cafes generally offer higher-quality products, the boxplot suggests that such cafes tend to be located in higher-income areas, which is consistent with higher-quality products commanding a higher price. The upward trend does not continue for cafes rated 5, which may mean that the highest-rated cafes offer good value for money.
ggplot(data, aes(factor(avg_rating), hhincome)) +
geom_boxplot() +
xlab("Average Rating") +
ylab("Household Income") +
ggtitle("Boxplot of Income vs. Average Rating")
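The pattern read off the boxplot can also be checked numerically by comparing the median household income at each rating level. The following is a minimal sketch in base R; the `toy` data frame and its values are invented for illustration and are not taken from coffee.csv, and `summarise_income` is a hypothetical helper name.

```r
# Toy stand-in for the coffee data; values are made up for illustration only.
toy <- data.frame(
  avg_rating = c(1, 2, 2, 3, 3, 4, 4, 5),
  hhincome   = c(30000, 40000, 44000, 50000, 56000, 70000, 80000, 60000)
)

# Hypothetical helper: median household income and row count per rating level.
summarise_income <- function(df) {
  med <- aggregate(hhincome ~ avg_rating, data = df, FUN = median)
  n   <- aggregate(hhincome ~ avg_rating, data = df, FUN = length)
  names(med)[2] <- "median_hhincome"
  names(n)[2]   <- "n_rows"
  merge(med, n, by = "avg_rating")  # merge() sorts by avg_rating
}

income_by_rating <- summarise_income(toy)
print(income_by_rating)
```

Calling `summarise_income(data)` on the real data frame should give the medians that the boxplot centre lines represent, together with how many rows support each one.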
The charts show the relationship between average rating and household income in each county. In Clayton County, cafes at almost every rating level are located in low-income areas, probably because household incomes are low across the county as a whole and the county has few cafes. Cobb County has no cafes with a rating of 1, and the household incomes are distributed more consistently across the ratings. In DeKalb County, even cafes with a rating of 1 are in higher-income areas, and more cafes are found in very high-income locations. Fulton County and Gwinnett County are more consistent with the region as a whole.
ggplot(data = data, aes(x = factor(avg_rating), y = hhincome)) +
geom_boxplot(aes(fill = factor(avg_rating)), show.legend = FALSE) +
labs(
#title = "Boxplot of Avg Rating by Household Income",
x = "Average Rating",
y = "Household Income"
) +
theme_minimal() +
scale_fill_manual(values = rep("white", 5)) + # set the fill color to white
facet_wrap(~ county, ncol = 3 ) +
theme(strip.background = element_rect(fill = "lightgrey"))
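Statements such as "no cafes with a rating of 1" or "a low number of cafes" in a county can be verified with a cross-tabulation of county against rating. This is a small sketch under the assumption that ratings take the integer values 1 to 5; the `toy` data frame is invented for illustration and does not reproduce coffee.csv.

```r
# Toy stand-in for the coffee data; county names and values are made up.
toy <- data.frame(
  county     = c("A County", "A County", "B County", "B County", "B County"),
  avg_rating = c(1, 3, 2, 3, 3)
)

# Fixing the factor levels to 1:5 keeps ratings with zero rows in the table.
counts <- table(toy$county, factor(toy$avg_rating, levels = 1:5))
print(counts)
```

With the real data, `table(data$county, factor(data$avg_rating, levels = 1:5))` would show both the empty cells and the total number of rows per county (via `rowSums`).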
The position of each dot reflects the relationship between review count (log) and household income, and its color reflects the percentage of white residents. The number of dots reflects the number of cafes in the area. One obvious feature is that cafes in areas with a higher percentage of white residents usually have higher household incomes.
ggplot(data = data, aes(x = review_count_log, y = hhincome, color = pct_white)) +
geom_point() +
labs(
x = "Review Count(Log)",
y = "Household Income",
color = "P(white)"
) +
theme_minimal() +
facet_wrap(~ county, ncol = 3) +
theme(strip.background = element_rect(fill = "lightgrey"))
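The link between the share of white residents and household income is read from the dot colors, which is hard to judge by eye. One way to quantify it is a Pearson correlation computed separately for each county. The sketch below uses an invented `toy` data frame, not coffee.csv, and assumes the columns are named as in the plot above.

```r
# Toy stand-in for the coffee data; values are made up for illustration only.
toy <- data.frame(
  county    = rep(c("A County", "B County"), each = 4),
  pct_white = c(0.1, 0.2, 0.3, 0.4, 0.2, 0.4, 0.6, 0.8),
  hhincome  = c(30000, 35000, 50000, 52000, 40000, 60000, 55000, 90000)
)

# Split the rows by county and compute one Pearson correlation per county.
cor_by_county <- sapply(
  split(toy, toy$county),
  function(d) cor(d$pct_white, d$hhincome)
)
print(round(cor_by_county, 2))
```

Values close to 1 would support the visual impression; values near 0 would suggest the color pattern is weaker than it looks.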
In this graph, the color of the dots reflects the different counties. The correlations in all four panels are weak; only the percentage of white residents shows a slightly stronger correlation with the review count.
# Use pivot_longer to reshape several variables into long format
df_long <- data %>%
pivot_longer(cols = c("hhincome", "pct_pov_log", "pct_white", "race.tot"),
names_to = "Variable", values_to = "Value")
# Create a scatter plot colored by county, with one facet_wrap panel per variable
ggplot(df_long, aes(x = review_count_log, y = Value, color = county)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # add a fitted regression line
facet_wrap(~ Variable, scales = "free") +
labs(
x = "Review Count(Log)",
y = "Values",
title = "The relationships between different values and review_count_log"
)
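The strength of each relationship in the four panels can be expressed as a correlation coefficient between `review_count_log` and the variable on the y axis. The following is a minimal sketch in base R; the `toy` data frame is invented for illustration, so its coefficients say nothing about coffee.csv.

```r
# Toy stand-in for the coffee data; values are made up for illustration only.
toy <- data.frame(
  review_count_log = c(1, 2, 3, 4, 5),
  hhincome         = c(10, 20, 30, 40, 50),      # perfectly linear on purpose
  pct_pov_log      = c(-1.5, -2.0, -1.8, -2.2, -1.9),
  pct_white        = c(0.5, 0.1, 0.4, 0.2, 0.3),
  race.tot         = c(2850, 4262, 4046, 8489, 7166)
)

vars <- c("hhincome", "pct_pov_log", "pct_white", "race.tot")

# Pearson correlation of each variable with the log review count,
# ignoring rows where either value is missing.
cors <- sapply(vars, function(v) {
  cor(toy$review_count_log, toy[[v]], use = "complete.obs")
})
print(round(cors, 2))
```

Replacing `toy` with `data` would give one coefficient per panel, which makes "weak" and "slightly stronger" directly comparable.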