Introduction

The data describe rents for apartment complexes around Seattle from 2000 to 2024. In my analysis, I explore which factors affect the cost of rent and how rent changes over time.

Data Preparation

For my data preparation, I first shortened several variable names, since some names in the original data were very long, while keeping the meaning of each name clear. I then removed rows in which all the quantitative variables were zero. The data set has one row for each apartment complex for every year between 2000 and 2024. However, some complexes had not yet been built in 2000, so all their quantitative values are set to zero for the years before construction. These zero rows are not valuable to the analysis, so I removed them by keeping only rows with a rent per unit above zero.

# Load packages: dplyr for the pipe and data verbs, janitor for clean_names(), ggplot2 for the plots
library(dplyr)
library(janitor)
library(ggplot2)

# Read in data
rent <- read.csv("~/Documents/Masters/GEOG588/Lesson1/RentTypology.csv")

# Data cleaning
cleanRent <- rent %>%
  clean_names() %>% # Convert column names to snake_case
  rename(rent_per_sq_ft = tract_median_apartment_contract_rent_per_square_foot,
         rent_per_unit = tract_median_apartment_contract_rent_per_unit,
         community_name = community_reporting_area_name,
         year_change_rent_per_sq_ft = year_over_year_change_in_rent_per_square_foot,
         year_change_rent_per_unit = year_over_year_change_in_rent_per_unit) %>% # Shorten long variable names
  filter(rent_per_unit > 0) # Remove rows with no rent data
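
The cleaning step keeps rows with a rent per unit above zero, while the description talks about rows where every quantitative variable is zero. A quick way to confirm the two rules pick out the same rows could look like the sketch below. The toy data frame, its values, and the choice of which columns count as quantitative are assumptions for illustration, not taken from the real file.

```r
# Hypothetical stand-in for the cleaned data; all values are made up
toy <- data.frame(
  year           = c(2000, 2001, 2002, 2000),
  rent_per_unit  = c(0, 0, 950, 800),
  rent_per_sq_ft = c(0, 0, 1.4, 1.1),
  properties     = c(0, 0, 3, 5)
)

# Columns assumed to be the quantitative variables
quant_cols <- c("rent_per_unit", "rent_per_sq_ft", "properties")

# TRUE for rows where every quantitative column is zero
all_zero <- rowSums(toy[quant_cols] != 0, na.rm = TRUE) == 0

# TRUE if the "all zero" rule and the "rent_per_unit > 0" rule agree on every row
same_rule <- all(all_zero == (toy$rent_per_unit <= 0))

# Rows that survive the "all zero" rule
kept <- toy[!all_zero, ]
```

If `same_rule` came back FALSE on the real data, some rows would have a zero rent per unit but nonzero values elsewhere, and those rows would be worth a closer look before dropping them.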

Analysis

Plot 1:

For my first plot, I wanted to show the change in rent over the years with a dot plot. Because the data set has two rent variables, I chose the rent per unit variable for this plot. Along with the points, a black line shows the mean rent per unit for each year. The mean line rises slightly in most years, with slight dips in the average rent in 2008 and 2020.

ggplot(data = cleanRent, aes(x=year, y=rent_per_unit, colour = cost_category)) + 
  geom_point() + #Create a dot plot with year and rent per unit
  stat_summary(fun = mean, geom = "line", color = "black", linewidth=1.2) + # Calculates mean rent_per_unit for each year and displays as black line
  labs(x="Year", y="Rent Per Unit", title = "Change in Rent per Unit Over Time", colour = "Cost Category")
Fig 1: Rent per unit for complexes in Seattle from 2000 to 2024, with a line showing the mean rent each year. The plot shows an overall increase in rent over time.

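
The dips in the mean line can also be checked numerically rather than read off the plot. The sketch below computes the mean rent per unit for each year and flags years where the mean fell compared with the previous year. The toy data and its values are made up for illustration; on the real data, `toy` would be replaced with `cleanRent`.

```r
library(dplyr)

# Hypothetical toy data; rent values are made up
toy <- data.frame(
  year          = c(2007, 2007, 2008, 2008, 2009, 2009),
  rent_per_unit = c(1000, 1200, 950, 1150, 1100, 1300)
)

# Mean rent per unit for each year, plus the change from the previous year
mean_rent_by_year <- toy %>%
  group_by(year) %>%
  summarise(mean_rent = mean(rent_per_unit, na.rm = TRUE)) %>%
  arrange(year) %>%
  mutate(change = mean_rent - lag(mean_rent))

# Years where the mean rent dropped (the first year has no previous year and is skipped)
dip_years <- mean_rent_by_year %>%
  filter(change < 0) %>%
  pull(year)
```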

Plot 2:

For my second plot, I made boxplots of the yearly change in rent per unit for each of the cost categories. The plot shows the typical yearly change in rent, and its spread, for the cheaper and more expensive properties. The boxplots still show the slight increase in rent each year, but there does not appear to be a clear difference between the cost categories in how much rent increases each year.

ggplot(data = cleanRent, aes(x=year_change_rent_per_unit, y = cost_category)) +
  geom_boxplot() + # Create one boxplot for each cost category
  labs(x = "Yearly Change in Rent Per Unit", y = "Cost Category", title = "Boxplots of Yearly Change in Rent by Cost Category")
Fig 2: Boxplots of the yearly percent change in rent per unit, with the complexes split into three cost categories based on rent prices.

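
To back up the visual comparison, the median and interquartile range of the yearly change can be tabulated for each cost category. This is a minimal sketch on made-up data; the category labels "Low" and "High" are hypothetical, and on the real data `toy` would be replaced with `cleanRent`.

```r
library(dplyr)

# Hypothetical toy data; category labels and values are made up
toy <- data.frame(
  cost_category             = c("Low", "Low", "Low", "High", "High", "High"),
  year_change_rent_per_unit = c(1, 2, 3, 2, 3, 4)
)

# Median, interquartile range, and row count of the yearly change per category
change_summary <- toy %>%
  group_by(cost_category) %>%
  summarise(
    median_change = median(year_change_rent_per_unit, na.rm = TRUE),
    iqr_change    = IQR(year_change_rent_per_unit, na.rm = TRUE),
    n             = n()
  )
```

Similar medians and overlapping ranges across categories would be consistent with what the boxplots suggest, though this is a descriptive summary and not a formal test.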

Plot 3:

For my third plot, I calculated the total number of properties across all the complexes for each year. The total increases by about 700 properties between 2000 and 2024.

#Create new data frame by grouping rent data by year and calculating the sum of the properties for each year
total_properties_per_year <- cleanRent %>%
  group_by(year) %>%
  summarise(total_properties = sum(properties, na.rm = TRUE))

ggplot(total_properties_per_year, aes(x = year, y = total_properties)) +
  geom_col() + #Create bar plot showing total number of properties by year
  labs(x = "Year", y="Total Properties", title = "Total Number of Properties Over Time")
Fig 3: Total number of properties across all complexes for each year from 2000 to 2024.

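
The size of the increase can be computed from the yearly totals instead of being estimated from the bars. The sketch below uses a made-up table with the same column names as `total_properties_per_year`; the numbers are for illustration only and do not reflect the real data.

```r
library(dplyr)

# Hypothetical yearly totals; values are made up
toy_totals <- data.frame(
  year             = c(2000, 2012, 2024),
  total_properties = c(100, 150, 180)
)

# Keep only the first and last years, in chronological order
first_last <- toy_totals %>%
  filter(year %in% range(year)) %>%
  arrange(year)

# Change in total properties from the first year to the last year
increase <- diff(first_last$total_properties)
```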