Apartment Rental Cost Changes Over Time in New York

Introduction

Rental prices have increased since 1980 on average 8.86% per year. Nationwide the average monthly rent is $1,164 and within the 21st century the median rent increases at an annual rate of 4.17%. How does this compare by state, particularly in New York? Did the pandemic have a big effect on these rental cost prices? To answer these questions and more, I will be using three data sets including: Street Easy Data, Codecademy in GitHub, and Apartment List that include data from 2016 - October 2021. I will be wrangling and exploring this data and using Tableau Public to create visual aids to show how rental costs have changed over time.

\(~\)

Load Libraries:

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.4     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.1     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(stringr)
library(dplyr)
library(tidyr)

\(~\)

StreetEasy Data Wrangling

Street Easy Data from 2016 - October 2021 data sets containing All apartment types, Studio, One Bedroom, Two Bedrooms, and Three Bedrooms or more. I’ll be exploring the data sets individually and then joining them to compare.

\(~\)

Load StreetEasy Data

# StreetEasy Data

# All types of Apartments
street_easy_all <- read.csv("https://raw.githubusercontent.com/letisalba/Data-607/main/Final%20Project/csv%20files/medianAskingRent_All.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = c(""," ","NA"));

# Studio's only
street_easy_studio <-  read.csv("https://raw.githubusercontent.com/letisalba/Data-607/main/Final%20Project/csv%20files/medianAskingRent_Studio.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = c(""," ","NA"));

# One Bedroom only
street_easy_onebrd <- read.csv("https://raw.githubusercontent.com/letisalba/Data-607/main/Final%20Project/csv%20files/medianAskingRent_OneBd.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = c(""," ","NA"));

# Two Bedrooms only
street_easy_twobrd <- read.csv("https://raw.githubusercontent.com/letisalba/Data-607/main/Final%20Project/csv%20files/medianAskingRent_TwoBd.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = c(""," ","NA"));

# Three or more Bedrooms only
street_easy_threebrds <- read.csv("https://raw.githubusercontent.com/letisalba/Data-607/main/Final%20Project/csv%20files/medianAskingRent_ThreePlusBd.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = c(""," ","NA"));

\(~\)

Street Easy All

Rows: 198 Columns: 145

\(~\)

# Remove columns not needed
streeteasy_all <- street_easy_all[, -c(4:75)]

# Rename columns
colnames(streeteasy_all) <- c("Area_Name", "Borough", "Area_Type", "Jan_2016", "Feb_2016", "Mar_2016", "Apr_2016", "May_2016", "Jun_2016", "Jul_2016", "Aug_2016", "Sep_2016", "Oct_2016", "Nov_2016", "Dec_2016", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020",  "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")

# Replace all NA values with 0
streeteasy_all <- streeteasy_all %>% replace(is.na(.), 0)

\(~\)

Year 2016

# Select columns from 2016
yr2016_all <- streeteasy_all %>% 
  select(4:15)

# cbind new columns with Mean of 2016
yr_2016_all <- cbind(yr2016_all, Year_2016_All = round(rowMeans(yr2016_all)))

\(~\)

Year 2017

# Select columns from 2017 
yr2017_all <- streeteasy_all %>% 
  select(16:27)

# cbind new columns with Mean of 2017
yr_2017_all <- cbind(yr2017_all, Year_2017_All = round(rowMeans(yr2017_all)))

\(~\)

Year 2018

# Select columns from 2018
yr2018_all <- streeteasy_all %>% 
  select(28:39)

# cbind new columns with Mean of 2018
yr_2018_all <- cbind(yr2018_all, Year_2018_All = round(rowMeans(yr2018_all)))

\(~\)

Year 2019

# Select columns from 2019
yr2019_all <- streeteasy_all %>% 
  select(40:51)

# cbind new columns with Mean of 2019
yr_2019_all <- cbind(yr2019_all, Year_2019_All = round(rowMeans(yr2019_all)))

\(~\)

Year 2020

# Select columns for 2020
yr2020_all <- streeteasy_all %>% 
  select(52:63)

# cbind new columns with Mean of 2020
yr_2020_all <- cbind(yr2020_all, Year_2020_All = round(rowMeans(yr2020_all)))

\(~\)

Year 2021

# Select columns for 2021
yr2021_all <- streeteasy_all %>% 
  select(64:73)

# cbind new columns with Mean of 2021
yr_2021_all <- cbind(yr2021_all, Year_2021_All = round(rowMeans(yr2021_all)))

\(~\)

Street Easy Studio

Rows: 198 Columns: 145

\(~\)

# Remove columns not needed
streeteasy_studio <- street_easy_studio[, -c(4:75)]

# Rename columns
colnames(streeteasy_studio) <- c("Area_Name", "Borough", "Area_Type", "Jan_2016", "Feb_2016", "Mar_2016", "Apr_2016", "May_2016", "Jun_2016", "Jul_2016", "Aug_2016", "Sep_2016", "Oct_2016", "Nov_2016", "Dec_2016", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020",  "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")

# Replace all NA values with 0
streeteasy_studio <- streeteasy_studio %>% replace(is.na(.), 0)

\(~\)

Year 2016

# Select columns from 2016
yr2016_studio <- streeteasy_studio %>% 
  select(4:15)

# cbind new columns with Mean of 2016
yr_2016_studio <- cbind(yr2016_studio, Year_2016_Studio = round(rowMeans(yr2016_studio)))

\(~\)

Year 2017

# Select columns from 2017
yr2017_studio <- streeteasy_studio %>% 
  select(16:27)

# cbind new columns with Mean of 2017
yr_2017_studio <- cbind(yr2017_studio, Year_2017_Studio = round(rowMeans(yr2017_studio)))

\(~\)

Year 2018

# Select columns from 2018
yr2018_studio <- streeteasy_studio %>% 
  select(28:39)

# cbind new columns with Mean of 2018
yr_2018_studio <- cbind(yr2018_studio, Year_2018_Studio = round(rowMeans(yr2018_studio)))

\(~\)

Year 2019

# Select columns from 2019
yr2019_studio <- streeteasy_studio %>% 
  select(40:51)

# cbind new columns with Mean of 2019
yr_2019_studio <- cbind(yr2019_studio, Year_2019_Studio = round(rowMeans(yr2019_studio)))

\(~\)

Year 2020

# Select columns from 2020
yr2020_studio <- streeteasy_studio %>% 
  select(52:63)

# cbind new columns with Mean of 2020
yr_2020_studio <- cbind(yr2020_studio, Year_2020_Studio = round(rowMeans(yr2020_studio)))

\(~\)

Year 2021

# Select columns from 2021
yr2021_studio <- streeteasy_studio %>% 
  select(64:73)

# cbind new columns with Mean of 2021
yr_2021_studio <- cbind(yr2021_studio, Year_2021_Studio = round(rowMeans(yr2021_studio)))

\(~\)

Street Easy One Bedroom

Rows: 198 Columns: 145

\(~\)

# Remove columns not needed
streeteasy_onebdr <- street_easy_onebrd[, -c(4:75)]

# Rename columns
colnames(streeteasy_onebdr) <- c("Area_Name", "Borough", "Area_Type", "Jan_2016", "Feb_2016", "Mar_2016", "Apr_2016", "May_2016", "Jun_2016", "Jul_2016", "Aug_2016", "Sep_2016", "Oct_2016", "Nov_2016", "Dec_2016", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020",  "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")

# Replace all NA values with 0
streeteasy_onebdr <- streeteasy_onebdr %>% replace(is.na(.), 0)

\(~\)

Year 2016

# Select columns from 2016
yr2016_onebdr <- streeteasy_onebdr %>% 
  select(4:15)

# cbind new columns with Mean of 2016
yr_2016_onebdr <- cbind(yr2016_onebdr, Year_2016_OneBdr = round(rowMeans(yr2016_onebdr)))

\(~\)

Year 2017

# Select columns from 2017
yr2017_onebdr<- streeteasy_onebdr %>% 
  select(16:27)

# cbind new columns with Mean of 2017
yr_2017_onebdr<- cbind(yr2017_onebdr, Year_2017_OneBdr = round(rowMeans(yr2017_onebdr)))

\(~\)

Year 2018

# Select columns from 2018
yr2018_onebdr <- streeteasy_onebdr %>% 
  select(28:39)

# cbind new columns with Mean of 2018
yr_2018_onebdr <- cbind(yr2018_onebdr, Year_2018_OneBdr = round(rowMeans(yr2018_onebdr)))

\(~\)

Year 2019

# Select columns from 2019
yr2019_onebdr <- streeteasy_onebdr %>% 
  select(40:51)

# cbind new columns with Mean of 2019
yr_2019_onebdr <- cbind(yr2019_onebdr, Year_2019_OneBdr = round(rowMeans(yr2019_onebdr)))

\(~\)

Year 2020

# Select columns from 2020
yr2020_onebdr <- streeteasy_onebdr %>% 
  select(52:63)

# cbind new columns with Mean of 2020
yr_2020_onebdr <- cbind(yr2020_onebdr, Year_2020_OneBdr = round(rowMeans(yr2020_onebdr)))

\(~\)

Year 2021

# Select columns from 2021
yr2021_onebdr <- streeteasy_onebdr %>% 
  select(64:73)

# cbind new columns with Mean of 2020
yr_2021_onebdr <- cbind(yr2021_onebdr, Year_2021_OneBdr = round(rowMeans(yr2021_onebdr)))

\(~\)

Street Easy Two Bedrooms

Rows: 198 Columns: 145

\(~\)

# Remove columns not needed
streeteasy_twobrd <- street_easy_twobrd[, -c(4:75)]

# Rename columns
colnames(streeteasy_twobrd) <- c("Area_Name", "Borough", "Area_Type", "Jan_2016", "Feb_2016", "Mar_2016", "Apr_2016", "May_2016", "Jun_2016", "Jul_2016", "Aug_2016", "Sep_2016", "Oct_2016", "Nov_2016", "Dec_2016", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020",  "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")

# Replace all NA values with 0
streeteasy_twobrd <- streeteasy_twobrd %>% replace(is.na(.), 0)

\(~\)

Year 2016

# Select columns from 2016
yr2016_twobdr <- streeteasy_twobrd %>% 
  select(4:15)

# cbind new columns with Mean of 2016
yr_2016_twobdr <- cbind(yr2016_twobdr, Year_2016_TwoBdr = round(rowMeans(yr2016_twobdr)))

\(~\)

Year 2017

# Select columns from 2017
yr2017_twobdr <- streeteasy_twobrd %>% 
  select(16:27)

# cbind new columns with Mean of 2017
yr_2017_twobdr <- cbind(yr2017_twobdr, Year_2017_TwoBdr = round(rowMeans(yr2017_twobdr)))

\(~\)

Year 2018

# Select columns from 2018
yr2018_twobdr <- streeteasy_twobrd %>% 
  select(28:39)

# cbind new columns with Mean of 2018
yr_2018_twobdr <- cbind(yr2018_twobdr, Year_2018_TwoBdr = round(rowMeans(yr2018_twobdr)))

\(~\)

Year 2019

# Select columns from 2019
yr2019_twobdr <- streeteasy_twobrd %>% 
  select(40:51)

# cbind new columns with Mean of 2019
yr_2019_twobdr <- cbind(yr2019_twobdr, Year_2019_TwoBdr = round(rowMeans(yr2019_twobdr)))

\(~\)

Year 2020

# Select columns from 2020
yr2020_twobdr <- streeteasy_twobrd %>% 
  select(52:63)

# cbind new columns with Mean of 2020
yr_2020_twobdr <- cbind(yr2020_twobdr, Year_2020_TwoBdr = round(rowMeans(yr2020_twobdr)))

\(~\)

Year 2021

# Select columns from 2021
yr2021_twobdr <- streeteasy_twobrd %>% 
  select(64:73)

# cbind new columns with Mean of 2020
yr_2021_twobdr <- cbind(yr2021_twobdr, Year_2021_TwoBdr = round(rowMeans(yr2021_twobdr)))

\(~\)

Street Easy Three+ Bedrooms

Rows: 198 Columns: 145

\(~\)

# Remove columns not needed
streeteasy_threebrds <- street_easy_threebrds[, -c(4:75)]

# Rename columns
colnames(streeteasy_threebrds) <- c("Area_Name", "Borough", "Area_Type", "Jan_2016", "Feb_2016", "Mar_2016", "Apr_2016", "May_2016", "Jun_2016", "Jul_2016", "Aug_2016", "Sep_2016", "Oct_2016", "Nov_2016", "Dec_2016", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020",  "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")

# Replace all NA values with 0
streeteasy_threebrds <- streeteasy_threebrds %>% replace(is.na(.), 0)

\(~\)

Year 2016

# Select columns from 2016
yr2016_threebdrs <- streeteasy_threebrds %>% 
  select(4:15)

# cbind new columns with Mean of 2016
yr_2016_threebdrs <- cbind(yr2016_threebdrs, Year_2016_ThreeBdrs = round(rowMeans(yr2016_threebdrs)))

\(~\)

Year 2017

# Select columns from 2017
yr2017_threebdrs <- streeteasy_threebrds %>% 
  select(16:27)

# cbind new columns with Mean of 2017
yr_2017_threebdrs <- cbind(yr2017_threebdrs, Year_2017_ThreeBdrs = round(rowMeans(yr2017_threebdrs)))

\(~\)

Year 2018

# Select columns from 2018
yr2018_threebdrs <- streeteasy_threebrds %>% 
  select(28:39)

# cbind new columns with Mean of 2018
yr_2018_threebdrs <- cbind(yr2018_threebdrs, Year_2018_ThreeBdrs = round(rowMeans(yr2018_threebdrs)))

\(~\)

Year 2019

# Select columns from 2019
yr2019_threebdrs <- streeteasy_threebrds %>% 
  select(40:51)

# cbind new columns with Mean of 2019
yr_2019_threebdrs <- cbind(yr2019_threebdrs, Year_2019_ThreeBdrs = round(rowMeans(yr2019_threebdrs)))

\(~\)

Year 2020

# Select columns from 2020
yr2020_threebdrs <- streeteasy_threebrds %>% 
  select(52:63)

# cbind new columns with Mean of 2020
yr_2020_threebdrs <- cbind(yr2020_threebdrs, Year_2020_ThreeBdrs = round(rowMeans(yr2020_threebdrs)))

\(~\)

Year 2021

# Select columns from 2021
yr2021_threebdrs <- streeteasy_threebrds %>% 
  select(64:73)

# cbind new columns with Mean of 2020
yr_2021_threebdrs <- cbind(yr2021_threebdrs, Year_2021_ThreeBdrs = round(rowMeans(yr2021_threebdrs)))

\(~\)

Combining tables by Year

# cbind year 2016 for all apartment types
street_easy16 <- cbind(yr_2016_studio, yr_2016_onebdr, yr_2016_twobdr, yr_2016_threebdrs)
street_easy16 <- street_easy16[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]

# cbind year 2017 for all apartment types
street_easy17 <- cbind(yr_2017_studio, yr_2017_onebdr, yr_2017_twobdr, yr_2017_threebdrs)
street_easy17 <- street_easy17[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]

# cbind year 2018 for all apartment types
street_easy18 <- cbind(yr_2018_studio, yr_2018_onebdr, yr_2018_twobdr, yr_2018_threebdrs)
street_easy18 <- street_easy18[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]

# cbind year 2019 for all apartment types
street_easy19 <- cbind(yr_2019_studio, yr_2019_onebdr, yr_2019_twobdr, yr_2019_threebdrs)
street_easy19 <- street_easy19[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]

# cbind year 2020 for all apartment types
street_easy20 <- cbind(yr_2020_studio, yr_2020_onebdr, yr_2020_twobdr, yr_2020_threebdrs)
street_easy20 <- street_easy20[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]

# cbind year 2021 for all apartment types
street_easy21 <- cbind(yr_2021_studio, yr_2021_onebdr, yr_2021_twobdr, yr_2021_threebdrs)
street_easy21 <- street_easy21[-c(1:10, 12:21, 23:32, 34:43)]

\(~\)

Joining original table with new per Year tables

# cbind original table with new table and drop all columns not needed
street_easy_All <- cbind(streeteasy_all, street_easy16, street_easy17, street_easy18, street_easy19, street_easy20, street_easy21)

# Dropping columns not needed
street_easy_All <- street_easy_All[-c(4:73)]

\(~\)

Combine tables by apartment type

Another way to combine tables to explore

# cbind all tables with the mean and drop all columns not needed

# All Apt Types
street_easyall <-  cbind(yr_2016_all, yr_2017_all, yr_2018_all, yr_2019_all, yr_2020_all, yr_2021_all) 
street_easyall <- street_easyall[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]

# Studio's only
street_easystudio <- cbind(yr_2016_studio, yr_2017_studio, yr_2018_studio, yr_2019_studio, yr_2020_studio, yr_2021_studio)
street_easystudio <- street_easystudio[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]

# One Bedroom only
street_easyonebdr <- cbind(yr_2016_onebdr, yr_2017_onebdr, yr_2018_onebdr, yr_2019_onebdr, yr_2020_onebdr, yr_2021_onebdr)
street_easyonebdr <- street_easyonebdr[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]

# Two Bedrooms only
street_easytwobdr <- cbind(yr_2016_twobdr, yr_2017_twobdr, yr_2018_twobdr, yr_2019_twobdr, yr_2020_twobdr, yr_2021_twobdr)
street_easytwobdr <- street_easytwobdr[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]

# Three Bedrooms only
street_easythreebdrs <- cbind(yr_2016_threebdrs, yr_2017_threebdrs, yr_2018_threebdrs, yr_2019_threebdrs, yr_2020_threebdrs, yr_2021_threebdrs)
street_easythreebdrs <- street_easythreebdrs[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]

\(~\)

Combine tables Studio, One Bedroom, Two Bedrooms, and Three Bedrooms ONLY

Another way to combine tables to explore

# cbind all Studio tables with original dataset and drop columns not needed
Street_Easy_Studio <- cbind(streeteasy_all, yr_2016_studio, yr_2017_studio, yr_2018_studio, yr_2019_studio, yr_2020_studio, yr_2021_studio)
Street_Easy_Studio <- Street_Easy_Studio[-c(4:85, 87:98, 100:111, 113:124, 126:137, 139:148)]


# cbind all One Bedroom tables with original dataset and drop columns not needed
Street_Easy_OneBdr <- cbind(streeteasy_all, yr_2016_onebdr, yr_2017_onebdr, yr_2018_onebdr, yr_2019_onebdr, yr_2020_onebdr, yr_2021_onebdr)
Street_Easy_OneBdr <- Street_Easy_OneBdr[-c(4:85, 87:98, 100:111, 113:124, 126:137, 139:148)]

# cbind all Two Bedrooms tables with original dataset and drop columns not needed
Street_Easy_TwoBdr <- cbind(streeteasy_all, yr_2016_twobdr, yr_2017_twobdr, yr_2018_twobdr, yr_2019_twobdr, yr_2020_twobdr, yr_2021_twobdr)
Street_Easy_TwoBdr <- Street_Easy_TwoBdr[-c(4:85, 87:98, 100:111, 113:124, 126:137, 139:148)]

# cbind all 3 or more Bedrooms with original data set and drop columns not needed
Street_Easy_ThreeBdrs <- cbind(streeteasy_all, yr_2016_threebdrs, yr_2017_threebdrs, yr_2018_threebdrs, yr_2019_threebdrs, yr_2020_threebdrs, yr_2021_threebdrs)
Street_Easy_ThreeBdrs <- Street_Easy_ThreeBdrs[-c(4:85, 87:98, 100:111, 113:124, 126:137, 139:148)]

\(~\)

Join tables by apartment type

This data frame will be used to write .csv file to then be used in tableau. This was easier to explore since it was already categorized by apartment type with a subcategory of years.

# cbind original table with new table and drop all columns not needed
Street_Easy_All <- cbind(streeteasy_all, street_easyall, street_easystudio, street_easyonebdr, street_easytwobdr, street_easythreebdrs)

# Dropping columns not needed
Street_Easy_All <- Street_Easy_All[-c(4:73)]

# Replace 0 in Borough column with NYC 
Street_Easy_All$Borough[Street_Easy_All$Borough=="0"]<-"NYC"
head(Street_Easy_All, n = 4)
##             Area_Name   Borough Area_Type Year_2016_All Year_2017_All
## 1        All Downtown Manhattan submarket          3765          3689
## 2         All Midtown Manhattan submarket          3516          3471
## 3 All Upper East Side Manhattan submarket          2979          2960
## 4 All Upper Manhattan Manhattan submarket          2363          2335
##   Year_2018_All Year_2019_All Year_2020_All Year_2021_All Year_2016_Studio
## 1          3779          3956          3561          3583             2795
## 2          3474          3637          3249          3313             2613
## 3          2912          3057          2881          2570             2123
## 4          2343          2408          2351          2250             1645
##   Year_2017_Studio Year_2018_Studio Year_2019_Studio Year_2020_Studio
## 1             2738             2779             2925             2605
## 2             2569             2559             2678             2404
## 3             2088             2094             2156             2042
## 4             1685             1730             1765             1746
##   Year_2021_Studio Year_2016_OneBdr Year_2017_OneBdr Year_2018_OneBdr
## 1             2626             3590             3508             3564
## 2             2378             3509             3462             3456
## 3             1896             2831             2811             2799
## 4             1692             1927             1955             1967
##   Year_2019_OneBdr Year_2020_OneBdr Year_2021_OneBdr Year_2016_TwoBdr
## 1             3804             3428             3519             4771
## 2             3611             3221             3342             5013
## 3             2920             2770             2522             3808
## 4             2026             2013             1925             2408
##   Year_2017_TwoBdr Year_2018_TwoBdr Year_2019_TwoBdr Year_2020_TwoBdr
## 1             4527             4497             4727             4270
## 2             4883             4730             4953             4377
## 3             3858             3679             4015             3816
## 4             2422             2440             2504             2438
##   Year_2021_TwoBdr Year_2016_ThreeBdrs Year_2017_ThreeBdrs Year_2018_ThreeBdrs
## 1             4114                7115                6478                6464
## 2             4793                6432                6339                6006
## 3             3325                8553                8146                7924
## 4             2360                3161                3129                3138
##   Year_2019_ThreeBdrs Year_2020_ThreeBdrs Year_2021_ThreeBdrs
## 1                7224                6275                5873
## 2                6165                5393                5969
## 3                8632                8227                7044
## 4                3193                3045                2902

\(~\)

StreetEasy Data Exploration

# Count of Boroughs
borough <- Street_Easy_All %>%
  group_by(Borough) %>%
  summarise(num=n()) %>%
  arrange(desc(num))
head(borough, n = 6)
## # A tibble: 6 Ă— 2
##   Borough         num
##   <chr>         <int>
## 1 Queens           60
## 2 Brooklyn         55
## 3 Bronx            43
## 4 Manhattan        38
## 5 NYC               1
## 6 Staten Island     1

\(~\)

summary(Street_Easy_All)
##   Area_Name           Borough           Area_Type         Year_2016_All   
##  Length:198         Length:198         Length:198         Min.   :   0.0  
##  Class :character   Class :character   Class :character   1st Qu.: 164.2  
##  Mode  :character   Mode  :character   Mode  :character   Median :1831.0  
##                                                           Mean   :1729.7  
##                                                           3rd Qu.:2589.0  
##                                                           Max.   :7509.0  
##  Year_2017_All    Year_2018_All    Year_2019_All  Year_2020_All   
##  Min.   :   0.0   Min.   :   0.0   Min.   :   0   Min.   :   0.0  
##  1st Qu.: 554.2   1st Qu.: 745.8   1st Qu.: 876   1st Qu.: 348.8  
##  Median :1925.5   Median :1926.5   Median :1975   Median :1957.5  
##  Mean   :1798.4   Mean   :1830.5   Mean   :1902   Mean   :1760.1  
##  3rd Qu.:2554.5   3rd Qu.:2594.0   3rd Qu.:2675   3rd Qu.:2599.8  
##  Max.   :7082.0   Max.   :7539.0   Max.   :7180   Max.   :7078.0  
##  Year_2021_All  Year_2016_Studio Year_2017_Studio Year_2018_Studio
##  Min.   :   0   Min.   :   0.0   Min.   :   0.0   Min.   :   0.0  
##  1st Qu.: 499   1st Qu.:   0.0   1st Qu.:   0.0   1st Qu.:   0.0  
##  Median :1945   Median :   0.0   Median :   0.0   Median : 122.0  
##  Mean   :1780   Mean   : 825.8   Mean   : 879.3   Mean   : 912.9  
##  3rd Qu.:2508   3rd Qu.:1652.8   3rd Qu.:1706.8   3rd Qu.:1778.2  
##  Max.   :7330   Max.   :3687.0   Max.   :3482.0   Max.   :3859.0  
##  Year_2019_Studio Year_2020_Studio Year_2021_Studio Year_2016_OneBdr
##  Min.   :   0.0   Min.   :   0.0   Min.   :   0.0   Min.   :   0    
##  1st Qu.:   0.0   1st Qu.:   0.0   1st Qu.:   0.0   1st Qu.:   0    
##  Median : 167.0   Median : 144.0   Median : 166.5   Median :1428    
##  Mean   : 953.9   Mean   : 919.2   Mean   : 953.3   Mean   :1341    
##  3rd Qu.:1833.0   3rd Qu.:1776.8   3rd Qu.:1846.2   3rd Qu.:2126    
##  Max.   :3983.0   Max.   :3401.0   Max.   :4043.0   Max.   :6722    
##  Year_2017_OneBdr Year_2018_OneBdr Year_2019_OneBdr Year_2020_OneBdr
##  Min.   :   0     Min.   :   0     Min.   :   0     Min.   :   0    
##  1st Qu.:   0     1st Qu.:   0     1st Qu.:   0     1st Qu.:   0    
##  Median :1526     Median :1488     Median :1662     Median :1676    
##  Mean   :1364     Mean   :1365     Mean   :1468     Mean   :1380    
##  3rd Qu.:2173     3rd Qu.:2229     3rd Qu.:2278     3rd Qu.:2260    
##  Max.   :6304     Max.   :6410     Max.   :6031     Max.   :5426    
##  Year_2021_OneBdr Year_2016_TwoBdr Year_2017_TwoBdr Year_2018_TwoBdr
##  Min.   :   0     Min.   :    0    Min.   :   0     Min.   :   0    
##  1st Qu.:   0     1st Qu.:    0    1st Qu.:   0     1st Qu.:   0    
##  Median :1660     Median : 1802    Median :1918     Median :1968    
##  Mean   :1410     Mean   : 1777    Mean   :1780     Mean   :1804    
##  3rd Qu.:2186     3rd Qu.: 2706    3rd Qu.:2668     3rd Qu.:2680    
##  Max.   :6001     Max.   :10025    Max.   :9675     Max.   :9936    
##  Year_2019_TwoBdr Year_2020_TwoBdr Year_2021_TwoBdr Year_2016_ThreeBdrs
##  Min.   :    0    Min.   :    0    Min.   :    0    Min.   :    0      
##  1st Qu.:    0    1st Qu.:    0    1st Qu.:    0    1st Qu.:    0      
##  Median : 2001    Median : 1870    Median : 2023    Median : 1592      
##  Mean   : 1886    Mean   : 1759    Mean   : 1810    Mean   : 2341      
##  3rd Qu.: 2737    3rd Qu.: 2580    3rd Qu.: 2696    3rd Qu.: 3157      
##  Max.   :10397    Max.   :10850    Max.   :10406    Max.   :16737      
##  Year_2017_ThreeBdrs Year_2018_ThreeBdrs Year_2019_ThreeBdrs
##  Min.   :    0       Min.   :    0       Min.   :    0      
##  1st Qu.:    0       1st Qu.:    0       1st Qu.:    0      
##  Median : 1717       Median : 2182       Median : 2123      
##  Mean   : 2321       Mean   : 2281       Mean   : 2377      
##  3rd Qu.: 3194       3rd Qu.: 3218       3rd Qu.: 3234      
##  Max.   :16188       Max.   :14669       Max.   :16682      
##  Year_2020_ThreeBdrs Year_2021_ThreeBdrs
##  Min.   :    0       Min.   :    0      
##  1st Qu.:    0       1st Qu.:    0      
##  Median : 1498       Median : 1748      
##  Mean   : 2183       Mean   : 2227      
##  3rd Qu.: 3184       3rd Qu.: 3011      
##  Max.   :17185       Max.   :17224

\(~\)

# Mean for yr 2016 

round(mean(Street_Easy_All$Year_2016_All))
## [1] 1730
round(mean(Street_Easy_All$Year_2016_Studio))
## [1] 826
round(mean(Street_Easy_All$Year_2016_OneBdr))
## [1] 1341
round(mean(Street_Easy_All$Year_2016_TwoBdr))
## [1] 1777
round(mean(Street_Easy_All$Year_2016_ThreeBdrs))
## [1] 2341

\(~\)

# Mean for yr 2017

round(mean(Street_Easy_All$Year_2017_All))
## [1] 1798
round(mean(Street_Easy_All$Year_2017_Studio))
## [1] 879
round(mean(Street_Easy_All$Year_2017_OneBdr))
## [1] 1364
round(mean(Street_Easy_All$Year_2017_TwoBdr))
## [1] 1780
round(mean(Street_Easy_All$Year_2017_ThreeBdrs))
## [1] 2321

\(~\)

# Mean for yr 2018

round(mean(Street_Easy_All$Year_2018_All))
## [1] 1830
round(mean(Street_Easy_All$Year_2018_Studio))
## [1] 913
round(mean(Street_Easy_All$Year_2018_OneBdr))
## [1] 1365
round(mean(Street_Easy_All$Year_2018_TwoBdr))
## [1] 1804
round(mean(Street_Easy_All$Year_2018_ThreeBdrs))
## [1] 2281

\(~\)

# Mean for yr 2019

round(mean(Street_Easy_All$Year_2019_All))
## [1] 1902
round(mean(Street_Easy_All$Year_2019_Studio))
## [1] 954
round(mean(Street_Easy_All$Year_2019_OneBdr))
## [1] 1468
round(mean(Street_Easy_All$Year_2019_TwoBdr))
## [1] 1886
round(mean(Street_Easy_All$Year_2019_ThreeBdrs))
## [1] 2377

\(~\)

# Mean for yr 2020

round(mean(Street_Easy_All$Year_2020_All))
## [1] 1760
round(mean(Street_Easy_All$Year_2020_Studio))
## [1] 919
round(mean(Street_Easy_All$Year_2020_OneBdr))
## [1] 1380
round(mean(Street_Easy_All$Year_2020_TwoBdr))
## [1] 1759
round(mean(Street_Easy_All$Year_2020_ThreeBdrs))
## [1] 2183

\(~\)

# Mean for yr 2021

round(mean(Street_Easy_All$Year_2021_All))
## [1] 1780
round(mean(Street_Easy_All$Year_2021_Studio))
## [1] 953
round(mean(Street_Easy_All$Year_2021_OneBdr))
## [1] 1410
round(mean(Street_Easy_All$Year_2021_TwoBdr))
## [1] 1810
round(mean(Street_Easy_All$Year_2021_ThreeBdrs))
## [1] 2227

\(~\)

# Count of Area_Type
area_type <- Street_Easy_All %>%
  group_by(Area_Type) %>%
  summarise(num=n()) %>%
  arrange(desc(num))
head(area_type, n = 5)
## # A tibble: 4 Ă— 2
##   Area_Type      num
##   <chr>        <int>
## 1 neighborhood   177
## 2 submarket       15
## 3 borough          5
## 4 city             1

\(~\)

Write .csv file for Street_Easy_All

This file will be helpful for Tableau Public

# commenting out so that it doesn't replicate in the system

#write.csv(Street_Easy_All, "/Users/letiix3/Desktop/Data-607/Final Project/csv files/Street_Easy_All.csv")

\(~\)

Codecademy + StreetEasy Data Wrangling

Rows: 5,000 Columns: 20

Codecademy + StreetEasy: sample of 5,000 rental listings in Manhattan, Brooklyn, and Queens from June 2016.

\(~\)

Headers Description
rental_id rental ID
building_id building ID
rent price of rent($)
bedrooms number of bedrooms
bathrooms number of bathrooms
size_sqft size in square feet
min_to_subway distance from subway station in minutes
floor floor number
building_age_yrs building’s age in years
no_fee does it have a broker fee? (0 for fee, 1 for no fee)
has_rooftdeck does it have a roof deck? (o for no, 1 for yes)
has_washer_dryer does it have washer/dryer in unit (0/1)
has_doorman does it have a doorman? (0/1)
has_elevator does it have an elevator? (0/1)
has_dishwasher does it have a dishwasher? (0/1)
has_patio does the building have a gym? (0/1)
has_gym does the building have a gym? (0/1)
neighborhood neighborhood (ex: Greenpoint)
submarket submarket (ex: North Brooklyn)
borough borough (ex: Brooklyn)

\(~\)

Load Codecademy + StreetEasy Data

# Codecademy + StreetEasy
street_easy_cc <- read.csv("https://raw.githubusercontent.com/Codecademy/datasets/master/streeteasy/streeteasy.csv", header = TRUE)

\(~\)

# commenting out so that it does not reproduce

#glimpse(street_easy_cc)

\(~\)

# Remove columns not needed
street_easy_cc <- street_easy_cc[, -c(2, 6:17)]

# Rename columns
colnames(street_easy_cc) <- c("Rental_ID", "Rent_Price", "No_Bedrooms", "No_Bathrooms", "Neighborhood", "SubMarket", "Borough" )

head(street_easy_cc, n = 4)
##   Rental_ID Rent_Price No_Bedrooms No_Bathrooms      Neighborhood
## 1      1545       2550           0            1   Upper East Side
## 2      2472      11500           2            2 Greenwich Village
## 3     10234       3000           3            1           Astoria
## 4      2919       4500           1            1           Midtown
##             SubMarket   Borough
## 1 All Upper East Side Manhattan
## 2        All Downtown Manhattan
## 3    Northwest Queens    Queens
## 4         All Midtown Manhattan

\(~\)

Write .csv file for Codecademy + StreetEasy

This file will be helpful for Tableau Public

# commenting out so that it doesn't replicate in the system

#write.csv(street_easy_cc, "/Users/letiix3/Desktop/Data-607/Final Project/csv files/street_easy_cc.csv")

\(~\)

Codecademy + StreetEasy Data Exploration

\(~\)

summary(street_easy_cc)
##    Rental_ID       Rent_Price     No_Bedrooms     No_Bathrooms  
##  Min.   :    1   Min.   : 1250   Min.   :0.000   Min.   :0.000  
##  1st Qu.: 2700   1st Qu.: 2750   1st Qu.:1.000   1st Qu.:1.000  
##  Median : 5456   Median : 3600   Median :1.000   Median :1.000  
##  Mean   : 5527   Mean   : 4537   Mean   :1.396   Mean   :1.322  
##  3rd Qu.: 8306   3rd Qu.: 5200   3rd Qu.:2.000   3rd Qu.:2.000  
##  Max.   :11349   Max.   :20000   Max.   :5.000   Max.   :5.000  
##  Neighborhood        SubMarket           Borough         
##  Length:5000        Length:5000        Length:5000       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
## 

\(~\)

# Count of Bedrooms
bedrooms <- street_easy_cc%>%
  group_by(No_Bedrooms)%>%
  summarise(num=n())%>%
  arrange(desc(num))
head(bedrooms)
## # A tibble: 6 Ă— 2
##   No_Bedrooms   num
##         <dbl> <int>
## 1         1    2048
## 2         2    1438
## 3         0     832
## 4         3     496
## 5         4     103
## 6         1.5    28

\(~\)

# Count of Bathrooms
bathrooms <- street_easy_cc%>%
  group_by(No_Bathrooms)%>%
  summarise(num=n())%>%
  arrange(desc(num))
head(bathrooms)
## # A tibble: 6 Ă— 2
##   No_Bathrooms   num
##          <int> <int>
## 1            1  3572
## 2            2  1212
## 3            3   182
## 4            0    18
## 5            4    14
## 6            5     2

\(~\)

# Count of Neighborhood
neighborhood <- street_easy_cc%>%
  group_by(Neighborhood)%>%
  summarise(num=n())%>%
  arrange(desc(num))
head(neighborhood, n = 6)
## # A tibble: 6 Ă— 2
##   Neighborhood         num
##   <chr>              <int>
## 1 Upper West Side      579
## 2 Upper East Side      500
## 3 Midtown East         460
## 4 Midtown West         314
## 5 Williamsburg         306
## 6 Financial District   268

\(~\)

# Count of Submarket
submarket<- street_easy_cc%>%
  group_by(SubMarket)%>%
  summarise(num=n())%>%
  arrange(desc(num))
head(submarket)
## # A tibble: 6 Ă— 2
##   SubMarket             num
##   <chr>               <int>
## 1 All Downtown         1229
## 2 All Midtown          1001
## 3 All Upper West Side   579
## 4 All Upper East Side   505
## 5 North Brooklyn        347
## 6 Northwest Brooklyn    311

\(~\)

# Count of Borough
borough <- street_easy_cc%>%
  group_by(Borough)%>%
  summarise(num=n())%>%
  arrange(desc(num))
head(borough)
## # A tibble: 3 Ă— 2
##   Borough     num
##   <chr>     <int>
## 1 Manhattan  3539
## 2 Brooklyn   1013
## 3 Queens      448

\(~\)

# Mean for Rent Price
round(mean(street_easy_cc$Rent_Price))
## [1] 4537

\(~\)

Apartment List State Data Wrangling

Rows: 206 Columns: 62

\(~\)

Load Apartment List Data

Apartment List State data set from 2017 - October 2021.

# Load Data

apt_list_state <- read.csv("https://raw.githubusercontent.com/letisalba/Data-607/main/Final%20Project/csv%20files/State_AptList_Rent.csv", header=TRUE, stringsAsFactors=FALSE, na.strings=c(""," ","NA"));

\(~\)

# Rename columns
colnames(apt_list_state) <- c("State_Name", "FIPS_Code", "Population", "Bedroom_Size", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020",  "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")

# Replace all NA values with 0
apt_list_state <- apt_list_state %>% replace(is.na(.), 0)

\(~\)

# Make 5 - 62 numeric
apt_list_state <- apt_list_state %>% 
     mutate_at(c(5:62), as.numeric)

\(~\)

# Rename values in Bedroom_Size column to remove punctuation
Apt_list_state <- recode(apt_list_state$Bedroom_Size,
                         "_Overall" = "Overall",
                         "_Studio" = "Studio")
head(Apt_list_state)
## [1] "Overall" "Studio"  "1br"     "2br"     "3br"     "4br"

\(~\)

# cbind tables
Apt_List_state <- cbind(apt_list_state, Apt_list_state)

# Drop column not needed
Apt_List_state <- Apt_List_state[, -c(4)]

# Rename column Apt_list_state to Bedroom_size
colnames(Apt_List_state) <- c("State_Name", "FIPS_Code", "Population", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020",  "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021", "Bedroom_size")

# Reorder columns
col_order <- c("State_Name", "FIPS_Code", "Population", "Bedroom_size", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020",  "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")

Apt_List_state <- Apt_List_state[, col_order]

\(~\)

Year 2017

# Select columns from 2017
rent_state_2017 <- Apt_List_state %>% 
  select(5:16)


# cbind new columns with Mean of 2017
rent_state_17 <- cbind(rent_state_2017, Average_Rent_State_17 = round(rowMeans(rent_state_2017)))

\(~\)

Year 2018

# Select columns from 2018
rent_state_2018 <- Apt_List_state %>% 
  select(17:28)

# cbind new columns with Mean of 2018
rent_state_18 <- cbind(rent_state_2018, Average_Rent_State_18 = round(rowMeans(rent_state_2018)))

\(~\)

Year 2019

# Select columns from 2019
rent_state_2019 <- Apt_List_state %>% 
  select(29:40)

# cbind new columns with Mean of 2019
rent_state_19 <- cbind(rent_state_2019, Average_Rent_State_19 = round(rowMeans(rent_state_2019)))

\(~\)

Year 2020

# Select columns from 2020
rent_state_2020 <- Apt_List_state %>% 
  select(41:52)

# cbind new columns with Mean of 2019
rent_state_20 <- cbind(rent_state_2020, Average_Rent_State_20 = round(rowMeans(rent_state_2020)))

\(~\)

Year 2021

# Select columns from 2021
rent_state_2021 <- Apt_List_state %>% 
  select(53:62)

# cbind new columns with Mean of 2021
rent_state_21 <- cbind(rent_state_2021, Average_Rent_State_21 = round(rowMeans(rent_state_2021)))

\(~\)

# cbind all tables with the mean and drop all columns not needed
Apartment_List_State <-  cbind(Apt_List_state, rent_state_17, rent_state_18, rent_state_19, rent_state_20, rent_state_21)
Apartment_List_State <- Apartment_List_State[-c(5:74, 76:87, 89:100, 102:113, 115:124)]
head(Apartment_List_State, n = 3)
##   State_Name FIPS_Code Population Bedroom_size Average_Rent_State_17
## 1    Alabama         1    4864680      Overall                   835
## 2    Alabama         1    4864680       Studio                   639
## 3    Alabama         1    4864680          1br                   649
##   Average_Rent_State_18 Average_Rent_State_19 Average_Rent_State_20
## 1                   852                   890                   917
## 2                   652                   681                   702
## 3                   662                   692                   713
##   Average_Rent_State_21
## 1                   996
## 2                   763
## 3                   774

\(~\)

Write .csv file for Apartment List

This file will be helpful for Tableau Public

# commenting out so that it doesn't replicate in the system

#write.csv(Apartment_List_State, "/Users/letiix3/Desktop/Data-607/Final Project/csv files/Apartment_List_State.csv")

\(~\)

Apartment List State Data Exploration

\(~\)

summary(Apartment_List_State)
##   State_Name          FIPS_Code       Population       Bedroom_size      
##  Length:306         Min.   : 1.00   Min.   :  581836   Length:306        
##  Class :character   1st Qu.:16.00   1st Qu.: 1687809   Class :character  
##  Mode  :character   Median :29.00   Median : 4440204   Mode  :character  
##                     Mean   :28.96   Mean   : 6331432                     
##                     3rd Qu.:42.00   3rd Qu.: 7294336                     
##                     Max.   :56.00   Max.   :39148760                     
##  Average_Rent_State_17 Average_Rent_State_18 Average_Rent_State_19
##  Min.   : 546.0        Min.   : 546.0        Min.   : 560         
##  1st Qu.: 806.2        1st Qu.: 824.8        1st Qu.: 851         
##  Median :1020.0        Median :1044.0        Median :1076         
##  Mean   :1090.7        Mean   :1120.4        Mean   :1152         
##  3rd Qu.:1263.5        3rd Qu.:1304.8        3rd Qu.:1351         
##  Max.   :2928.0        Max.   :3176.0        Max.   :3254         
##  Average_Rent_State_20 Average_Rent_State_21
##  Min.   : 571.0        Min.   : 600.0       
##  1st Qu.: 864.8        1st Qu.: 928.5       
##  Median :1088.0        Median :1153.5       
##  Mean   :1162.3        Mean   :1241.0       
##  3rd Qu.:1349.2        3rd Qu.:1484.5       
##  Max.   :3211.0        Max.   :3391.0

\(~\)

# Count of Bedroom_size
Bdr_size <- Apartment_List_State%>%
  group_by(Bedroom_size)%>%
  summarise(num=n())%>%
  arrange(desc(num))
head(Bdr_size)
## # A tibble: 6 Ă— 2
##   Bedroom_size   num
##   <chr>        <int>
## 1 1br             51
## 2 2br             51
## 3 3br             51
## 4 4br             51
## 5 Overall         51
## 6 Studio          51

\(~\)

# Count of State and filtering out NY
state <- Apartment_List_State %>%
  group_by(State_Name) %>%
  filter(State_Name == "New York") %>% 
  summarise(num=n()) %>%
  arrange(desc(num))
head(state)
## # A tibble: 1 Ă— 2
##   State_Name   num
##   <chr>      <int>
## 1 New York       6

\(~\)

# Mean yr 2017
round(mean(Apartment_List_State$Average_Rent_State_17))
## [1] 1091

\(~\)

# Mean yr 2018
round(mean(Apartment_List_State$Average_Rent_State_18))
## [1] 1120

\(~\)

# Mean yr 2019
round(mean(Apartment_List_State$Average_Rent_State_19))
## [1] 1152

\(~\)

# Mean yr 2020
round(mean(Apartment_List_State$Average_Rent_State_20))
## [1] 1162

\(~\)

# Mean yr 2021
round(mean(Apartment_List_State$Average_Rent_State_21))
## [1] 1241

\(~\)

GGPLOT

\(~\)

Import Libraries

# Libraries
library(ggplot2)
library(hrbrthemes)
## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
##       Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
##       if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
hrbrthemes::import_roboto_condensed()
## You will likely need to install these fonts on your system as well.
## 
## You can find them in [/Users/letiix3/Library/R/x86_64/4.1/library/hrbrthemes/fonts/roboto-condensed]
library(viridis)
## Loading required package: viridisLite
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

\(~\)

# Bar plot for all Rent Averages in the boroughs
Street_Easy_All %>%
  mutate(Averages = Year_2016_All + Year_2017_All + Year_2018_All + Year_2019_All + Year_2020_All + Year_2021_All + Year_2016_Studio  + Year_2017_Studio + Year_2018_Studio + Year_2019_Studio + Year_2020_Studio + Year_2021_Studio + Year_2016_OneBdr + Year_2017_OneBdr + Year_2018_OneBdr + Year_2019_OneBdr + Year_2020_OneBdr + Year_2021_OneBdr +   
Year_2016_TwoBdr + Year_2017_TwoBdr + Year_2018_TwoBdr +  
Year_2019_TwoBdr + Year_2020_TwoBdr + Year_2021_TwoBdr +   
Year_2016_ThreeBdrs + Year_2017_ThreeBdrs + Year_2018_ThreeBdrs +
Year_2019_ThreeBdrs + Year_2020_ThreeBdrs + Year_2021_ThreeBdrs) %>% # Creating cumulative time
  ggplot(aes(x = Borough, y = Averages, fill = Averages)) + # Passing plot arguments
  geom_bar(stat = "Identity") # Specifying the type of plot

\(~\)

Street_Easy_All %>%
  pivot_longer(4:9, names_to = "Averages", values_to = "Value") %>%
  ggplot(aes(x = Averages)) +
  geom_bar() +
  facet_wrap(vars(Averages), ncol = 3) +
  labs(x = "Rent Averages", y = "Count", title = "All Apartment Types")

\(~\)

Street_Easy_Studio %>%
  pivot_longer(4:9, names_to = "Averages", values_to = "Value") %>%
  ggplot(aes(x = Averages)) +
  geom_bar() +
  facet_wrap(vars(Averages), ncol = 3) +
  labs(x = "Rent Averages", y = "Count", title = "Studio's")

\(~\)

# Plot for original dataset of All dataset
streeteasy_all %>%
  pivot_longer(4:15, names_to = "Averages", values_to = "Value") %>%
  ggplot(aes(x = Value, color = "blue")) +
  facet_wrap(vars(Averages), ncol = 3) +
  geom_point(stat = "count") +
  geom_line(stat = "count") +
  labs(x = "Rent Cost per Month", y = "Count")

\(~\)

# Count of Borough's for Codecademy dataset
street_easy_cc %>% 
  ggplot(aes(x = Borough, fill = Borough)) +
  geom_bar(stat = "count") +
  labs(x = "Boroughs", y = "Count")

\(~\)

# Codecademy Count of Rent Prices
street_easy_cc %>% 
ggplot(aes(x=Rent_Price)) + 
  geom_histogram(binwidth=15, fill="#69b3a2", color="#e9ecef", alpha=0.9) +
    ggtitle("Rent Prices") +
    theme_ipsum() +
    theme(
      plot.title = element_text(size=15)
    )

\(~\)

# Density plot for Borough and Rent Price in Codecademy
street_easy_cc %>% 
ggplot(aes(x = Rent_Price, group = Borough, fill = Borough)) +
    geom_density(adjust = 1.0) +
    theme_ipsum() +
    facet_wrap(~Borough) +
    theme(
      legend.position="none",
      panel.spacing = unit(0.1, "lines"),
      axis.ticks.x=element_blank()
    )

\(~\)

# Line plot for Rent Price and Number of Bedrooms in Codecademy
street_easy_cc %>%
  tail(10) %>%
  ggplot( aes(x = Rent_Price, y = No_Bedrooms)) +
    geom_line() +
    geom_point()

\(~\)

# Plot .long for Codecademy
street_easy_cc.long <- street_easy_cc %>% 
  select("Neighborhood", "Rent_Price") %>% 
  pivot_longer(-Neighborhood, names_to = "Year", values_to = "Rent_Price")

ggplot(street_easy_cc.long, aes(Neighborhood, Rent_Price, colour = Year)) + geom_line()

\(~\)

# This plot came out a bit cluttered
rental_price <- street_easy_cc %>% select("Borough", "No_Bedrooms", "Rent_Price")
plot_ly(rental_price, x = ~No_Bedrooms, y = ~Rent_Price, color = ~Borough, type = 'scatter', mode = 'lines')

\(~\)

Tableau Public

Conclusion

Each data set required a bit of wrangling but after some exploration I found the following results:

\(~\)

  • From the StreetEasy data set the 5 boroughs ranked:
    • Queens with 60 counts
    • Brooklyn with 55 counts
    • Bronx with 43 counts
    • Manhattan with 38 counts and
    • Staten Island with 1 count

\(~\)

When taking into consideration the apartment types within the StreetEasy data set the rent averages per year were:

\(~\)

  • 2016 - $1730
  • 2017 - $1798
  • 2018 - $1830
  • 2019 - $1902
  • 2020 - $1760
  • 2021 - $1780 [Average counts from January - October 2021; this average rent price will increase when adding November and December 2021.]

\(~\)

How does that compare to the entire United States? From the table below, overall, New York has a higher average rent from the years 2017 - 2021. There was about a 39% difference between the years 2017 - 2019 but from 2020 - 2021 the difference dropped to about 30% of rent increase.

\(~\)

# Comparison of all apartment types for year 2017 - 2021 Apartment List and StreetEasy
comparison <- data.frame(Year = c(2017, 2018, 2019, 2020, 2021),
                 Avg_Rent_US = c(1091, 1120, 1152, 1162, 1241),
                 New_York = c(1798, 1830, 1902, 1760, 1780))
comparison
##   Year Avg_Rent_US New_York
## 1 2017        1091     1798
## 2 2018        1120     1830
## 3 2019        1152     1902
## 4 2020        1162     1760
## 5 2021        1241     1780

\(~\)

# Percent difference for rent averages

# percent_diff <- ((new_avg - old_avg) / new_avg) * 100 

percent_diff17 <- ((1798 - 1091) / 1798) * 100
percent_diff17
## [1] 39.32147
percent_diff18 <- ((1830 - 1120) / 1830) * 100
percent_diff18
## [1] 38.79781
percent_diff19 <- ((1902 - 1152) / 1902) * 100
percent_diff19
## [1] 39.43218
percent_diff20 <- ((1760 - 1162) / 1760) * 100
percent_diff20
## [1] 33.97727
percent_diff21 <- ((1780 - 1241) / 1780) * 100
percent_diff21
## [1] 30.2809

\(~\)

Digging deeper, using the Apartment List data set, I am comparing the years 2019 - 2021 with the average cost of each apartment types and obtained the results below:

# Comparison of all apartment types for year 2019 - 2021 from Apartment List
comparison2 <- data.frame(Apt_Type = c("Studio", "One_Bdr", "Two_Bdrs", "Three_Bdrs", "Four_Bdrs"),
                 "Yr_2019" = c(853, 912, 1111, 1336, 1587),
                 "Yr_2020" = c(860, 919, 1121, 1349, 1601),
                 "Yr_2021" = c(917, 981, 1197, 1441, 1710))
comparison2
##     Apt_Type Yr_2019 Yr_2020 Yr_2021
## 1     Studio     853     860     917
## 2    One_Bdr     912     919     981
## 3   Two_Bdrs    1111    1121    1197
## 4 Three_Bdrs    1336    1349    1441
## 5  Four_Bdrs    1587    1601    1710

\(~\)

For all 5 apartment types there is a small increase in rent of about 7% from 2019 to 2021.

\(~\)

# Percent difference for rent averages from 2019 and 2021

# percent_diff <- ((new_avg - old_avg) / new_avg) * 100 

studio <- ((917 - 853) / 917) * 100
studio
## [1] 6.97928
one_bdr <- ((981 - 912) / 981) * 100
one_bdr
## [1] 7.033639
two_bdr <- ((1197 - 1111) / 1197) * 100
two_bdr
## [1] 7.184628
three_bdr <- ((1441 - 1336) / 1441) * 100
three_bdr
## [1] 7.286607
four_bdr <- ((1710 - 1587) / 1710) * 100
four_bdr
## [1] 7.192982

\(~\)

After importing my data sets into Tableau Public I was able to create more visuals that led to the following conclusions:

  • From the StreetEasy data the areas in NYC with the highest average rent are Central Park South, Battery Park City, Tribeca, Soho and Midtown. No surprise it’s only Manhattan!
  • Looking at the all the boroughs, Manhattan, Brooklyn and Queens are the top three with the highest rents.
  • From Codecademy + StreetEasy data set from June 2016, the most number of bedrooms in apartments were 1 and 2 bedrooms followed by studios.
  • Top 6 Neighborhoods within the dataset were Upper West Side, Upper East Side, Midtown East, Midtown West, Williamsburg and the Financial District.
  • With the top 3 boroughs the neighborhoods with the highest average rent are:
    • Brooklyn
      • Dumbo
      • Brooklyn Heights
      • Carroll Gardens
      • Prospect Heights
      • Fort Greene
    • Manhattan
      • Soho
      • Tribeca
      • Central Park South
      • Nolita
      • Chelsea
    • Queens *Long Island City
      • South Richmond Hill
      • Astoria
      • Flushing
      • Ridgewood

The pandemic has affected rent prices all throughout the United States. Although the most expensive states saw rent price drops between 2020 and 2021, there have been increases in average rent as well. New York has one of the highest average in rent prices nationwide. Not only that but based on these numbers I predict that apartment rental prices will continue to rise an average of about 7% within the next two years.

\(~\)

Assumptions

While thinking about this project, I made a couple of assumptions. The first being the timeline I should consider, if 2018 - 2021 would suffice. While exploring the data I decided that I had to expand my search and opted for 2016 - 2021. I also thought I would see a larger decrease in rent for 2020 and into 2021 since media coverage has covered how much rent prices in New York and throughout the 5 boroughs have decreased. One assumption that was true was Manhattan having the highest average rent price in NYC.

\(~\)

References

  1. Streeteasy Data Dashboard: StreetEasy. StreetEasy Blog. (2019, November 20). Retrieved November 14, 2021, from https://streeteasy.com/blog/data-dashboard/.

\(~\)

  1. Salviati, C., Popov, I., Warnock, R., & Szini, L. (2021, November 30). Apartment List National Rent Report. Apartment List - More than 5 Million Apartments for Rent. Retrieved November 14, 2021, from https://www.apartmentlist.com/research/national-rent-data.

\(~\)

  1. Last Updated: October 26. (2021, October 26). Average rent by year [1940-2021]: Historical rental rates. iPropertyManagement.com. Retrieved November 14, 2021, from https://ipropertymanagement.com/research/average-rent-by-year.

\(~\)

  1. Statista Research Department. (2021, August 25). Topic: Rental market in the U.S. Statista. Retrieved November 14, 2021, from https://www.statista.com/topics/4465/rental-market-in-the-us/#dossierKeyfigures.

\(~\)

  1. https://github.com/Codecademy/datasets/tree/master/streeteasy

\(~\)

  1. https://www.sharpsightlabs.com/blog/small-multiples-ggplot/

\(~\)

  1. https://www.neonscience.org/resources/learning-hub/tutorials/dc-time-series-plot-ggplot-r