The homes data set has 8 variables on 1299 homes that sold in January of 2023:
Run the line of code below to add the city of the home (Burlington/DC/Nashville):
##                       type   n
## 1                    condo 309
## 2 mobile/manufactured home   5
## 3             multi-family  67
## 4                    other   2
## 5                  parking   5
## 6            single family 608
## 7                townhouse 296
## 8              vacant land   7
Create the data set that has the proportion of homes for each property type (just condos, townhouses, and single family homes) conditional on the city. Round the result to 4 decimal places. Save it as homes_q1.
If done correctly, the proportion of DC homes that are single family is 0.2037.
Make sure to display it in the knitted document.
homes_q1 <-
  homes |>
  # Keeping the 3 home types
  filter(type %in% c("condo", "townhouse", "single family")) |>
  # Counting the number of homes per type and city combo
  summarize(
    .by = c(city, type),
    houses = n()
  ) |>
  # Converting the counts to proportions within each city
  mutate(
    .by = city,
    prop_city = round(houses/sum(houses), digits = 4),
    type = factor(type, levels = c("condo", "townhouse", "single family"))
  ) |>
  arrange(city, type)
homes_q1
##         city          type houses prop_city
## 1 Burlington         condo     85    0.2787
## 2 Burlington     townhouse     56    0.1836
## 3 Burlington single family    164    0.5377
## 4         DC         condo    171    0.4005
## 5         DC     townhouse    169    0.3958
## 6         DC single family     87    0.2037
## 7  Nashville         condo     53    0.1102
## 8  Nashville     townhouse     71    0.1476
## 9  Nashville single family    357    0.7422
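As a quick sanity check on the proportions shown above, the same conditional proportions can be recomputed in base R from the displayed counts. This is only a sketch: the `counts` matrix is typed in by hand from the output (it is not part of the original solution), and `prop.table()` with `margin = 1` divides each row by its row total, which is what "conditional on the city" means here.

```r
# Counts copied by hand from the homes_q1 output (rows = city, columns = type)
counts <- matrix(
  c( 85,  56, 164,
    171, 169,  87,
     53,  71, 357),
  nrow = 3,
  byrow = TRUE,
  dimnames = list(
    city = c("Burlington", "DC", "Nashville"),
    type = c("condo", "townhouse", "single family")
  )
)

# margin = 1 divides each count by its row (city) total
props <- round(prop.table(counts, margin = 1), digits = 4)
props

# The value the assignment asks us to check: 87 / (171 + 169 + 87)
props["DC", "single family"]
```

Each row of `props` should sum to (approximately) 1, because the proportions are computed within a city rather than across the whole data set.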
Create the side-by-side bar charts displaying the proportions of property type conditional on the city. Which city has the lowest rate of single family homes?
ggplot(
  data = homes_q1,
  mapping = aes(
    x = city,
    fill = type,
    y = prop_city
  )
) +
  # Using geom_col() to add the bars
  # and positioning them side-by-side with position = "dodge2"
  geom_col(
    color = "black",
    position = "dodge2"
  ) +
  # Adding the number of homes at the top of each bar
  # using geom_bar_text() from the ggfittext package
  geom_bar_text(
    mapping = aes(label = houses),
    position = "dodge2"
  ) +
  # Changing the labels and adding a caption
  labs(
    x = NULL,
    y = NULL,
    fill = "Property Type",
    caption = "Data: Redfin.com"
  ) +
  # Informing what the number at the top of each bar represents
  annotate(
    geom = "text",
    label = "Number of homes displayed\nat the top of each bar",
    y = 0.70,
    x = 2,
    fontface = "bold",
    size = 5
  ) +
  # Changing what appears on the y-axis:
  # 1) expand removes the blank space below the bars
  # 2) labels changes the proportions to percentages
  # 3) breaks changes where the tick marks are located
  scale_y_continuous(
    expand = c(0, 0, 0.05, 0),
    labels = scales::label_percent(),
    breaks = (0:7)/10
  ) +
  # Changing the labels on the x-axis
  scale_x_discrete(
    labels = c("Burlington, VT", "Washington DC", "Nashville, TN")
  ) +
  # Changing the default theme
  theme_classic() +
  # Moving the legend to the top
  theme(legend.position = "top")
The code chunk above creates a data set called state_education that has the proportion of residents within each state with no high school degree, only a high school degree, some college but no degree, and at least a bachelor's degree.
Use the state_education data set to create a new data set named states2 with columns:
1 - 3) long, lat, and group: the 3 columns needed to draw the state outline
region: The name of the state (in lowercase)
education: a factor with 4 levels ordered “No High School Degree”, “Only High School Degree”, “Some College Experience”, “Bachelor Degree or Higher” (hint: use the factor() function with the levels and labels arguments)
proportion: The proportion of residents in the state with that education level
Make sure to remove the row for Washington DC.
If done correctly, it should have 62108 rows and 6 columns.
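The hint about factor() can be illustrated on a toy vector. This is a minimal sketch, not the assignment's data: `raw` is a made-up vector, and the short codes `no_hs`, `high_school`, `some_college`, and `bachelor` are assumed to be the column names of state_education. The `levels` argument lists the values as they appear in the data, in the desired order, and `labels` supplies the display name for each level, matched by position.

```r
# Hypothetical raw codes, deliberately out of order
raw <- c("bachelor", "no_hs", "some_college", "high_school", "no_hs")

education <- factor(
  x = raw,
  # levels: the values as stored in the data, in the order we want
  levels = c("no_hs", "high_school", "some_college", "bachelor"),
  # labels: the name shown for each level (same order as levels)
  labels = c("No High School Degree",
             "Only High School Degree",
             "Some College Experience",
             "Bachelor Degree or Higher")
)

# The levels now carry the readable labels, in the requested order
levels(education)

# The underlying integer codes follow the level order, not alphabetical order
as.integer(education)
```

Because the ordering is stored in the factor itself, anything that groups by this column (for example, facets in a plot) follows the level order rather than alphabetical order.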
# Creating a data set with the state outlines
state_data <- map_data(map = "state")
# Adding the education info to the state_data set
states2 <-
  left_join(
    x = state_data,
    y = state_education |>
      mutate(region = tolower(state)),
    by = "region"
  ) |>
  # Removing Washington DC
  filter(state != "District of Columbia") |>
  # Placing all 4 education proportions into 1 column named proportion
  pivot_longer(
    cols = no_hs:bachelor,
    names_to = "education",
    values_to = "proportion"
  ) |>
  # Reordering the groups for education and changing the labels
  mutate(
    education = factor(
      x = education,
      levels = c("no_hs", "high_school", "some_college", "bachelor"),
      labels = c("No High School Degree",
                 "Only High School Degree",
                 "Some College Experience",
                 "Bachelor Degree or Higher")
    )
  ) |>
  # Picking the relevant columns
  dplyr::select(long:group, region, education, proportion)
tibble(states2)
## # A tibble: 62,108 × 6
##     long   lat group region  education                 proportion
##    <dbl> <dbl> <dbl> <chr>   <fct>                          <dbl>
##  1 -87.5  30.4     1 alabama No High School Degree          0.142
##  2 -87.5  30.4     1 alabama Only High School Degree        0.309
##  3 -87.5  30.4     1 alabama Some College Experience        0.299
##  4 -87.5  30.4     1 alabama Bachelor Degree or Higher      0.250
##  5 -87.5  30.4     1 alabama No High School Degree          0.142
##  6 -87.5  30.4     1 alabama Only High School Degree        0.309
##  7 -87.5  30.4     1 alabama Some College Experience        0.299
##  8 -87.5  30.4     1 alabama Bachelor Degree or Higher      0.250
##  9 -87.5  30.4     1 alabama No High School Degree          0.142
## 10 -87.5  30.4     1 alabama Only High School Degree        0.309
## # ℹ 62,098 more rows
Create the maps seen in Brightspace using ggplot(). Make sure to exclude the row for the District of Columbia!
ggplot(
  data = states2,
  mapping = aes(
    x = long,
    y = lat,
    fill = proportion,
    group = group
  )
) +
  # Drawing the state outline
  # (linewidth replaces size for lines in ggplot2 3.4.0 and later)
  geom_polygon(
    color = "black",
    linewidth = 0.2
  ) +
  # Map theme (from the ggthemes package)
  theme_map() +
  # Separate map for each education level
  facet_wrap(
    facets = ~ education
  ) +
  # Changing the color levels to different shades of green
  # and showing the legend values as percentages
  scale_fill_fermenter(
    labels = scales::label_percent(),
    palette = "Greens",
    direction = 1
  ) +
  # Changing the coordinate to an albers projection
  coord_map(
    projection = "albers",
    lat0 = 39, lat1 = 45
  ) +
  # Removing the buffer space around the sides
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  # Removing the label for fill and adding a title
  labs(
    title = "Education Levels per State",
    fill = NULL
  ) +
  # Centering the title, removing the background of the facet panels, and
  # moving the legend to the center of the graph
  theme(
    plot.title = element_text(size = 16,
                              hjust = 0.5),
    strip.background = element_blank(),
    strip.text = element_text(face = "bold"),
    legend.position = c(0.45, 0.45)
  )