Tingyu_Liu_Mini

Research Question

Points of interest (POIs) are important for urban residents in various ways. POIs can add economic and social vitality to the neighborhood in which they are located. POIs that are close to residential locations can function as destinations that people can potentially walk to, creating a more walkable environment; in fact, the Walk Score of a location is calculated based on how many and how diverse the POIs are within walking distance of that location. These benefits are likely to be greater if POIs are attractive and popular.

However, attractive POIs may not be distributed evenly across different neighborhoods; we anecdotally know that attractive POIs are more likely to be found in more advantaged neighborhoods, and more advantaged neighborhoods may enjoy more benefits from having attractive POIs nearby than their counterparts.

But is there really such a relationship between the attractiveness of POIs in a neighborhood and how advantaged that neighborhood is? Which neighborhood characteristic has the strongest relationship with the attractiveness of POIs? This assignment examines the relationship between being advantaged as a neighborhood and having more attractive POIs, using the ggplot2 package.

Using this data, re-create the following plots as closely as possible. Make sure you provide the code you wrote to generate each plot. When you re-create them, you DO NOT need to make the plot aesthetics similar.

Other minor aesthetics, such as the aspect ratio and theme of the plots, do not matter. If you want to modify them for aesthetics, feel free to do so. For each of the plots, write a few sentences to describe your findings.

library(tidyverse)
library(sf)
library(tmap)
library(leaflet)
library(here)
library(tidycensus)
library(readr)
library(skimr)
library(ggplot2)
library(lubridate)
library(dplyr)
library(htmlwidgets)

data <- read_csv('coffee.csv')

## New names:
## • `` -> `...1`
## Rows: 363 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): county
## dbl (13): ...1, GEOID, hhincome, pct_pov, review_count, avg_rating, race.tot...
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
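The `New names` message above indicates that the first column of the CSV has an empty header, which readr renames to `...1`; this is often a row index that was saved along with the data. The following is a minimal sketch of reading such a file more quietly and dropping that column. It assumes a small temporary file with invented values standing in for `coffee.csv`, so the county names and incomes here are illustrative only.

```r
library(readr)
library(dplyr)

# Hypothetical stand-in for coffee.csv (values invented): the first header
# cell is empty, as happens when a row index is written out with the data.
tmp <- tempfile(fileext = ".csv")
writeLines(c(",county,hhincome",
             "1,County A,65000",
             "2,County B,72000"), tmp)

# show_col_types = FALSE silences the column specification message;
# readr still renames the unnamed column to `...1`.
raw <- read_csv(tmp, show_col_types = FALSE)

# Drop the index column so it cannot be mistaken for a variable.
clean <- raw %>% select(-`...1`)
names(clean)
```

Whether to drop the column or keep it as an identifier depends on whether the row index carries any meaning in the original data.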

# Variables used - avg_rating, hhincome
# boxplot

boxplot <- ggplot(data = data) +
  geom_boxplot(aes(x = avg_rating, y = hhincome), fill = "darkblue") +
  coord_cartesian(xlim = c(0, 6)) +
  ggdark::dark_theme_grey()

## Inverted geom defaults of fill and color/colour.
## To change them back, use invert_geom_defaults().

plotly::ggplotly(boxplot)

## Warning: Continuous x aesthetic
## ℹ did you forget `aes(group = ...)`?
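The warning above appears because `avg_rating` is numeric, so ggplot2 cannot tell how the observations should be split into boxes. One common way to address it, sketched here on invented toy data rather than the assignment data, is to add `group = avg_rating` to the aesthetics; converting the variable with `factor(avg_rating)` is an alternative if a discrete axis is acceptable.

```r
library(ggplot2)

# Toy data (made up): three observations at each of five rating levels.
toy <- data.frame(
  avg_rating = rep(1:5, each = 3),
  hhincome   = c(30, 32, 34, 40, 45, 50, 60, 62, 70,
                 55, 58, 66, 50, 52, 54) * 1000
)

# group = avg_rating requests one box per rating level while keeping
# a continuous x axis, so coord_cartesian(xlim = ...) still applies.
p <- ggplot(toy) +
  geom_boxplot(aes(x = avg_rating, y = hhincome, group = avg_rating)) +
  coord_cartesian(xlim = c(0, 6))

# layer_data() returns the computed box statistics, one row per group.
nrow(layer_data(p))
```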

Boxplot findings

The boxplot provided a visual summary of the central tendency and dispersion of median annual household income at each level of average rating.

Key observations include:

Median: Across rating levels, the median of median annual household income first increased and then decreased, with the highest median income corresponding to a rating of 3.

Interquartile Range (IQR, Q3-Q1): The group with the lowest rating had the smallest IQR, suggesting that incomes in this group span a narrow range. Conversely, a rating of 4 corresponded to the largest IQR, indicating a wide range of incomes among neighborhoods with this rating.

Outliers: Notable outliers were observed for ratings 2 and 4. All of these outliers corresponded to higher household incomes.
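The quantities read off the boxplot can also be computed directly. The sketch below uses invented toy data, not the assignment data, and assumes the default ggplot2 boxplot conventions: quartiles from `quantile()` and outliers defined as points more than 1.5 times the IQR beyond the quartiles.

```r
library(dplyr)

# Toy data (made up) with two rating levels.
toy <- data.frame(
  avg_rating = c(1, 1, 1, 2, 2, 2, 2, 2),
  hhincome   = c(30000, 32000, 34000, 40000, 50000, 60000, 70000, 200000)
)

box_stats <- toy %>%
  group_by(avg_rating) %>%
  summarise(
    median_income = median(hhincome),
    q1  = unname(quantile(hhincome, 0.25)),
    q3  = unname(quantile(hhincome, 0.75)),
    iqr = q3 - q1,
    # Count points beyond 1.5 * IQR from the quartiles.
    n_outliers = sum(hhincome < q1 - 1.5 * iqr | hhincome > q3 + 1.5 * iqr)
  )
box_stats
```

In this toy example the income of 200000 at rating 2 is the only value flagged as an outlier.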

# Variables used - avg_rating, hhincome, county
boxplot_county <- ggplot(data = data) +
  geom_boxplot(aes(x = avg_rating, y = hhincome), fill = "darkred") +
  coord_cartesian(xlim = c(0, 6)) +
  ggdark::dark_theme_grey() +
  facet_wrap(~county)

plotly::ggplotly(boxplot_county)

County-specific boxplot findings

Using the facet wrap function, the boxplots were broken down by county, revealing county-specific patterns and trends. Here are some key observations:

Clayton County: Clayton County shows a completely different trend from the other counties. A rating of 3 has the lowest median income, and the interquartile range (IQR) is very small. This suggests that this county has a narrow range of income and a relatively low median income.

Fulton County: In Fulton County, the IQR is notably large, and the lowest median income corresponds to the lowest rating.

# Variables used - review_count_log, hhincome, county, pct_white
scatter_plot <- ggplot(data = data) +
  geom_point(mapping = aes(x=review_count_log, y=hhincome, 
                           color=pct_white))+
  scale_color_gradient(low = "orange", high = "blue") +
  ggdark::dark_theme_grey() +
  facet_grid(~county)

plot_3 <- plotly::ggplotly(scatter_plot)

plot_3

County-Specific scatter plot findings

The scatter plots illustrated the relationship between logged review count and median annual household income in each county, with points colored by the percentage of residents who self-identify as white. Key observations include:

Correlation: There was a strong, positive correlation between the percentage of white residents and household income in every county except Clayton County. Additionally, there was a weak, positive correlation between review count and income in every county except Cobb County.

Trend: As review count increased, household income also tended to increase.

Variation: Fulton County exhibited high variation.
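Because this plot does not print correlation coefficients, the strength of these relationships is judged by eye. A minimal sketch of how the per-county Pearson correlations could be computed follows; the toy data and county labels are invented for illustration and are not the assignment data.

```r
library(dplyr)

# Toy data (made up): County A rises with review count, County B falls.
toy <- data.frame(
  county           = rep(c("County A", "County B"), each = 4),
  review_count_log = c(1, 2, 3, 4, 1, 2, 3, 4),
  hhincome         = c(40, 50, 60, 70, 80, 70, 65, 50) * 1000,
  pct_white        = c(0.2, 0.3, 0.5, 0.6, 0.7, 0.6, 0.5, 0.3)
)

# Pearson correlation (the default of cor()) within each county.
cor_by_county <- toy %>%
  group_by(county) %>%
  summarise(
    r_reviews_income = cor(review_count_log, hhincome),
    r_white_income   = cor(pct_white, hhincome)
  )
cor_by_county
```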

# Variables used - pct_pov_log, hhincome, pct_white, race.tot, review_count_log, county
# Hint: used pivot_longer() to create Plot 4.
# Reshape the four neighborhood variables into long format so that each one
# becomes its own facet; review_count_log stays as the shared x variable.
pivot_data_long <- data %>%
  pivot_longer(cols = c(pct_pov_log, hhincome, pct_white, race.tot),
               names_to = "variable", values_to = "value_y")

ggplot(data = pivot_data_long,
       mapping = aes(x = review_count_log, y = value_y, color = county)) +
  geom_jitter(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE) +
  ggpubr::stat_cor(method = "pearson") +
  ggdark::dark_theme_grey() +
  facet_wrap(~variable, scales = "free")

## `geom_smooth()` using formula = 'y ~ x'
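To make the reshaping step concrete, here is a small sketch of what `pivot_longer()` does, using an invented two-row data frame rather than the assignment data. Each selected column is stacked into a name/value pair, while the remaining columns are repeated on every row.

```r
library(tidyr)

# Toy data (made up): one row per neighborhood.
toy <- data.frame(
  county           = c("County A", "County B"),
  review_count_log = c(2.3, 4.1),
  hhincome         = c(52000, 81000),
  pct_white        = c(0.35, 0.62)
)

# Stack hhincome and pct_white into (variable, value_y) pairs.
long <- pivot_longer(toy,
                     cols = c(hhincome, pct_white),
                     names_to = "variable",
                     values_to = "value_y")
long
```

The result has one row per neighborhood and variable, which is the shape `facet_wrap(~variable)` needs to draw one panel per variable.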

County-Specific scatter plot group findings

The scatter plot group illustrates the relationships between the logged review count and four neighborhood characteristics: median annual household income, logged poverty percentage, the percentage of residents who self-identify as white, and total population.

Key observations include:

Correlation: There is a strong and positive correlation between the percentage of white residents and review count in every county, particularly in DeKalb County. Conversely, there is a strong and negative correlation between the poverty percentage and review count in every county, especially in DeKalb County.

In addition, there is a weak and positive correlation between review count and income in every county, except for Cobb County.

Trend: As the review count increases, household income and the percentage of white residents also increase, while population and the poverty percentage decrease.

Variation: DeKalb County exhibits the highest variation, while Clayton County shows the lowest variation.

Tingyu_Liu_Mini_4

Drunken_Boat

2023-10-11
