Abstract

This data analysis project investigates the dynamics of the rental studio market in New York City by conducting a thorough analysis of average rental studio prices and studio rental inventory. The study aims to provide valuable insights into the factors influencing rental pricing trends and the overall availability of studio spaces in the region. The research methodology involves the collection and examination of two extensive data sets sourced from StreetEasy.com. Data points include area names, borough, area types, dates, and values over a specified time frame.

Introduction

Standing on the threshold of a new chapter, my time at Mercy has proven to be a transformative experience, providing me with the skills and knowledge required to navigate the complexities of the real world. As I stand at the intersection of academia and the professional realm, my data analysis project unfolds: a journey into the heart of New York City’s housing market.

This project dives into NYC’s studio apartment scene, exploring rental prices and inventory shifts across diverse neighborhoods. As a graduate about to enter the professional arena, I’m grappling with the decision of whether to commute or become a city resident. We’ll unravel the choices tied to personal preferences and economic realities, decoding living costs in coveted studios.

Join me in this data-driven journey, where we explore the impact of being a commuter or a city dweller!

Introduction to my Data

My data will explore studio apartment rent prices and rental inventory across New York City, with a neighborhood-level look at Manhattan, from January 2022 to October 2023. I found my data on StreetEasy.com, a well-known platform that provides information on real estate listings in the New York metropolitan area. StreetEasy makes its database of for-sale and for-rent data available for download so that users can tailor it to their own needs. The files I analyze were last updated in October 2023 and include monthly values dating back to January 2010. Geographically, the data is separated by city, borough, region, and neighborhood. Both the Median_Rent and Rental_Inventory data sets have 169 columns and 198 rows. Each row represents an area in NYC.
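
The dimensions quoted above can be cross-checked with a little date arithmetic. The sketch below is a minimal sanity check, assuming (as the cleaning code in the next section implies) that each file has three identifier columns followed by one column per month with no gaps. The variable names are my own and are not part of the StreetEasy files.

```r
# Sanity check on the data dimensions, using base R only.
# Assumption: 3 identifier columns, then one column per month with no gaps.
full_months  <- seq(as.Date("2010-01-01"), as.Date("2023-10-01"), by = "month")
study_months <- seq(as.Date("2022-01-01"), as.Date("2023-10-01"), by = "month")

n_id_cols <- 3
n_areas   <- 198  # rows in each file, one per area

full_cols <- n_id_cols + length(full_months)   # columns in the raw files
kept_cols <- n_id_cols + length(study_months)  # columns after slimming down
long_rows <- n_areas * length(study_months)    # rows after reshaping to long

print(c(full_cols = full_cols, kept_cols = kept_cols, long_rows = long_rows))
```

Under these assumptions the raw files should have 169 columns, the slimmed-down files 25 columns, and the long-format tables 4,356 rows, which agrees with the figures reported in this document.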

Data Manipulation

I am cleaning my data by omitting columns that will not be used, renaming columns, renaming variables within columns, making my data “long” data, and creating new columns that will be necessary for further analysis.

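library(dplyr)    # select(), mutate(), %>%, glimpse()
library(ggplot2)  # charts in the Business Questions section
# Code to slim down data to only what's needed: three identifier columns plus Jan 2022 to Oct 2023
Median_Rent <- Median_Rent %>% select(-(4:147))
Rental_Inventory <- Rental_Inventory %>% select(-(4:147))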

# Code to change column names: three identifier columns plus Jan.2022 through Oct.2023
month_cols <- head(paste(month.abb, rep(2022:2023, each = 12), sep = "."), 22)
new_names <- c('AreaName', 'Borough', 'AreaType', month_cols)
colnames(Median_Rent) <- new_names
colnames(Rental_Inventory) <- new_names

# Code to change "submarket" to "region"
Median_Rent <- Median_Rent %>%
  mutate(AreaType = sub("submarket", "region", AreaType))
Rental_Inventory <- Rental_Inventory %>%
  mutate(AreaType = sub("submarket", "region", AreaType))

# Code to change data from wide to long
library(data.table)
LongMedian_Rent <- melt(setDT(Median_Rent), id.vars = c("AreaName", "Borough", "AreaType"), variable.name = "Month")
LongRental_Inventory <- melt(setDT(Rental_Inventory), id.vars = c("AreaName", "Borough", "AreaType"), variable.name = "Month")

# Code to create new columns with changed date format
LongMedian_Rent$month <- as.Date(paste0("01 ", gsub("\\.", " ", LongMedian_Rent$Month)), format="%d %b %Y")

# Formatting of the Median Rent data
LongMedian_Rent <- LongMedian_Rent %>%
  mutate(season = case_when(
    lubridate::month(month) %in% c(12, 1, 2) ~ "Winter",
    lubridate::month(month) %in% c(3, 4, 5) ~ "Spring",
    lubridate::month(month) %in% c(6, 7, 8) ~ "Summer",
    lubridate::month(month) %in% c(9, 10, 11) ~ "Fall",
    TRUE ~ NA_character_))

LongMedian_Rent <- LongMedian_Rent %>% mutate(SeasonMonth = factor(months(month), levels = month.name))

# Formatting of the Rental Inventory data
LongRental_Inventory$month <- as.Date(paste0("01 ", gsub("\\.", " ", LongRental_Inventory$Month)), format="%d %b %Y")

LongRental_Inventory <- LongRental_Inventory %>%
  mutate(season = case_when(
    lubridate::month(month) %in% c(12, 1, 2) ~ "Winter",
    lubridate::month(month) %in% c(3, 4, 5) ~ "Spring",
    lubridate::month(month) %in% c(6, 7, 8) ~ "Summer",
    lubridate::month(month) %in% c(9, 10, 11) ~ "Fall",
    TRUE ~ NA_character_))

LongRental_Inventory <- LongRental_Inventory %>% mutate(SeasonMonth = factor(months(month), levels = month.name))
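
The date conversion above relies on the %b format code, which matches abbreviated month names in the current locale, so it can return NA on a machine that is not set to English. As a hedged alternative, the sketch below parses the same "Jan.2022" style labels by looking the abbreviation up in R's built-in month.abb vector, and derives the season from a lookup table. The helper names parse_month_label and season_of are hypothetical and used only for illustration.

```r
# Hypothetical helper: parse labels such as "Jan.2022" without depending on the locale.
parse_month_label <- function(label) {
  parts <- strsplit(as.character(label), ".", fixed = TRUE)
  mon <- match(vapply(parts, function(p) p[1], character(1)), month.abb)
  yr  <- as.integer(vapply(parts, function(p) p[2], character(1)))
  # An explicit format makes malformed labels come back as NA instead of an error
  as.Date(sprintf("%04d-%02d-01", yr, mon), format = "%Y-%m-%d")
}

# Hypothetical helper: map a Date to the same four seasons used in this project.
season_of <- function(date) {
  lookup <- c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
              "Summer", "Summer", "Fall", "Fall", "Fall", "Winter")
  lookup[as.integer(format(date, "%m"))]  # "%m" is numeric, so locale independent
}

labels <- c("Jan.2022", "Jul.2022", "Oct.2023")
dates  <- parse_month_label(labels)
print(data.frame(label = labels, month = dates, season = season_of(dates)))
```

The season lookup assigns the same month groupings as the case_when() calls above, so either approach should give the same season column.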

After cleaning both Median_Rent and Rental_Inventory data sets, here is a glimpse of what we will be working with:

glimpse(LongMedian_Rent)
## Rows: 4,356
## Columns: 8
## $ AreaName    <chr> "All Downtown", "All Midtown", "All Upper East Side", "All…
## $ Borough     <chr> "Manhattan", "Manhattan", "Manhattan", "Manhattan", "Manha…
## $ AreaType    <chr> "region", "region", "region", "region", "region", "neighbo…
## $ Month       <fct> Jan.2022, Jan.2022, Jan.2022, Jan.2022, Jan.2022, Jan.2022…
## $ value       <dbl> 3300, 3000, 2250, 1800, 2550, 1700, NA, NA, 3553, 1475, NA…
## $ month       <date> 2022-01-01, 2022-01-01, 2022-01-01, 2022-01-01, 2022-01-0…
## $ season      <chr> "Winter", "Winter", "Winter", "Winter", "Winter", "Winter"…
## $ SeasonMonth <fct> January, January, January, January, January, January, Janu…
glimpse(LongRental_Inventory)
## Rows: 4,356
## Columns: 8
## $ AreaName    <chr> "All Downtown", "All Midtown", "All Upper East Side", "All…
## $ Borough     <chr> "Manhattan", "Manhattan", "Manhattan", "Manhattan", "Manha…
## $ AreaType    <chr> "region", "region", "region", "region", "region", "neighbo…
## $ Month       <fct> Jan.2022, Jan.2022, Jan.2022, Jan.2022, Jan.2022, Jan.2022…
## $ value       <dbl> 732, 503, 347, 224, 262, 90, NA, 3, 10, 51, NA, 6, NA, 47,…
## $ month       <date> 2022-01-01, 2022-01-01, 2022-01-01, 2022-01-01, 2022-01-0…
## $ season      <chr> "Winter", "Winter", "Winter", "Winter", "Winter", "Winter"…
## $ SeasonMonth <fct> January, January, January, January, January, January, Janu…

Business Questions

  1. What are the trends in studio apartment rentals? Are prices higher during certain months?

This visualization displays monthly trends in NYC studio apartment rentals. The data show that rental prices are higher during August and September, and lower during the Winter months.

# Calculation of average rental price for each month
MonthlyAverageRent <- LongMedian_Rent %>%
    group_by(SeasonMonth) %>%
    summarize(AverageRent = mean(value, na.rm = TRUE))
MonthlyAverageRent <- MonthlyAverageRent %>% rename(Month = SeasonMonth)
# Bar chart visualization of monthly trends for studio apartment rentals in NYC
ggplot(MonthlyAverageRent, aes(x = reorder(Month, AverageRent), y = AverageRent, fill = Month)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Average Rent Prices by Month", x = "Month", y = "Average Rent Price") +
  theme_minimal() +
  coord_flip() 
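
One thing to keep in mind when reading this chart is that the data runs from January 2022 to October 2023, so not every calendar month is averaged over the same number of years. The short sketch below, which uses only base R and my own variable names, counts how many times each calendar month appears in the study window.

```r
# Count how often each calendar month appears between Jan 2022 and Oct 2023.
study_months <- seq(as.Date("2022-01-01"), as.Date("2023-10-01"), by = "month")
month_number <- as.integer(format(study_months, "%m"))  # 1-12, locale independent

appearances <- table(factor(month.name[month_number], levels = month.name))
print(appearances)
```

January through October each appear twice, while November and December appear only once, so differences involving those two months may partly reflect the year rather than the time of year.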

  2. What are the average monthly rental prices by neighborhood?

This chart displays the average monthly rental prices for studio apartments in different neighborhoods of Manhattan. Areas such as Inwood, Washington Heights, and Hamilton Heights seem to have the lowest average rent prices overall, while areas like Tribeca, Flatiron, and Chelsea lead with the highest average rent prices.

# Average rent per Manhattan neighborhood, sorted from highest to lowest
AvgRentbyNhood <- LongMedian_Rent %>%
  filter(AreaType == "neighborhood", Borough == "Manhattan") %>%
  group_by(AreaName, Borough) %>%
  summarise(AverageRent = mean(value, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(AverageRent)) %>%
  filter(AreaName != "Stuyvesant Town/PCV", AreaName != "Marble Hill", AreaName != "Civic Center")
ggplot(AvgRentbyNhood, aes(x = reorder(AreaName, -AverageRent), y = AverageRent, fill = AreaName)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Monthly Rental Prices for Studio Apartments in Manhattan",
       x = "Neighborhood",
       y = "Monthly Rental Price") +
  coord_flip() +
  theme_classic()

  3. Are there specific times of the year when more apartments are available throughout NYC?

This chart shows seasonal trends in apartment availability. Rental inventory is higher during the Summer months and lower in the Winter months.

# Calculation of the average number of available apartments by season
MonthlyRentalInventory <- LongRental_Inventory %>%
  group_by(season) %>%
  summarise(Average_Available_Apartments = mean(value, na.rm = TRUE))
# Visualization of the average number of available apartments by season
ggplot(MonthlyRentalInventory, aes(x = reorder(season, -Average_Available_Apartments), y = Average_Available_Apartments)) +
  geom_col(fill = "skyblue") +
  labs(title = "Seasonal Trends in Apartment Availability", x = "Season", y = "Available Apartments Averaged") +
  theme_classic() 

  4. With this box plot, we can visually explore and compare the seasonal variation in studio apartment rental prices. Although you may not see a wide variation in rental prices across seasons, the box plot still lets you compare the spread and potential outliers of rental prices for each season. The boxes represent the interquartile range, the line inside each box is the median, and the whiskers extend to the smallest and largest values within 1.5 times the interquartile range of the box. Points beyond the whiskers are plotted individually as outliers.
ggplot(LongMedian_Rent, aes(x = season, y = value, fill = season)) +
  geom_boxplot() +
  labs(title = "Seasonal Trends in Studio Apartment Rentals",
       x = "Season",
       y = "Rental Price") +
  scale_fill_manual(values = c("Winter" = "lightgreen", "Spring" = "orange", "Summer" = "lavender", "Fall" = "pink")) +
  theme_minimal() +
  theme(legend.position = "top")
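
To make the box plot's components concrete, here is a minimal sketch that computes the same summary numbers by hand. The rent values are made up for illustration and are not taken from the StreetEasy data, and box_summary is a hypothetical helper. It follows the convention I understand geom_boxplot to use by default: whiskers reach the most extreme observations that lie within 1.5 times the interquartile range of the box edges, and anything beyond that is drawn as an outlier point.

```r
# Hypothetical helper: the numbers behind one box in a box plot.
box_summary <- function(x) {
  x <- x[!is.na(x)]                                   # missing rents are ignored
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]                                  # height of the box
  lower_fence <- q[1] - 1.5 * iqr
  upper_fence <- q[3] + 1.5 * iqr
  inside <- x[x >= lower_fence & x <= upper_fence]    # values the whiskers can reach
  list(q1 = q[1], median = q[2], q3 = q[3],
       whisker_low = min(inside), whisker_high = max(inside),
       outliers = x[x < lower_fence | x > upper_fence])
}

# Made-up monthly rents, including one missing value and one extreme value
rents <- c(1500, 1800, 2100, 2400, 2700, 3000, 3300, 9000, NA)
str(box_summary(rents))
```

For these made-up values the box spans 2,025 to 3,075 with a median of 2,550, the whiskers stop at 1,500 and 3,300, and 9,000 is flagged as an outlier.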

Results

My analyses show that there is more rental inventory in the Summer, but apartments also rent at higher prices during those months. There is less inventory in the Winter months, yet rental prices are lower during those months. Additionally, some neighborhoods one may want to look at are Inwood, Washington Heights, and Hamilton Heights, while areas like Tribeca and Chelsea may be last choices.

Conclusion

Leaving Mercy University behind, my analysis of New York City’s apartment scene revealed some important information. In the summer, there are more apartments, but they cost more. In the winter, there are fewer options, but the prices are lower. Also, some good neighborhoods to check out are Inwood, Washington Heights, and Hamilton Heights. On the flip side, Tribeca and Chelsea might not be the best choices, since they’re pricey and don’t have as many options. These findings are helping me navigate residential living as I step into the real world after college.