Rental prices have increased since 1980 on average 8.86% per year. Nationwide the average monthly rent is $1,164 and within the 21st century the median rent increases at an annual rate of 4.17%. How does this compare by state, particularly in New York? Did the pandemic have a big effect on these rental cost prices? To answer these questions and more, I will be using three data sets including: Street Easy Data, Codecademy in GitHub, and Apartment List that include data from 2016 - October 2021. I will be wrangling and exploring this data and using Tableau Public to create visual aids to show how rental costs have changed over time.
\(~\)
library(tidyverse)## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.4 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(stringr)
library(dplyr)
library(tidyr)\(~\)
Street Easy Data from 2016 - October 2021 data sets containing All apartment types, Studio, One Bedroom, Two Bedrooms, and Three Bedrooms or more. I’ll be exploring the data sets individually and then joining them to compare.
\(~\)
# StreetEasy Data
# All types of Apartments
street_easy_all <- read.csv("https://raw.githubusercontent.com/letisalba/Data-607/main/Final%20Project/csv%20files/medianAskingRent_All.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = c(""," ","NA"));
# Studio's only
street_easy_studio <- read.csv("https://raw.githubusercontent.com/letisalba/Data-607/main/Final%20Project/csv%20files/medianAskingRent_Studio.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = c(""," ","NA"));
# One Bedroom only
street_easy_onebrd <- read.csv("https://raw.githubusercontent.com/letisalba/Data-607/main/Final%20Project/csv%20files/medianAskingRent_OneBd.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = c(""," ","NA"));
# Two Bedrooms only
street_easy_twobrd <- read.csv("https://raw.githubusercontent.com/letisalba/Data-607/main/Final%20Project/csv%20files/medianAskingRent_TwoBd.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = c(""," ","NA"));
# Three or more Bedrooms only
street_easy_threebrds <- read.csv("https://raw.githubusercontent.com/letisalba/Data-607/main/Final%20Project/csv%20files/medianAskingRent_ThreePlusBd.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = c(""," ","NA"));\(~\)
Rows: 198 Columns: 145
\(~\)
# Remove columns not needed
streeteasy_all <- street_easy_all[, -c(4:75)]
# Rename columns
colnames(streeteasy_all) <- c("Area_Name", "Borough", "Area_Type", "Jan_2016", "Feb_2016", "Mar_2016", "Apr_2016", "May_2016", "Jun_2016", "Jul_2016", "Aug_2016", "Sep_2016", "Oct_2016", "Nov_2016", "Dec_2016", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020", "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")
# Replace all NA values with 0
streeteasy_all <- streeteasy_all %>% replace(is.na(.), 0)\(~\)
# Select columns from 2016
yr2016_all <- streeteasy_all %>%
select(4:15)
# cbind new columns with Mean of 2016
yr_2016_all <- cbind(yr2016_all, Year_2016_All = round(rowMeans(yr2016_all)))\(~\)
# Select columns from 2017
yr2017_all <- streeteasy_all %>%
select(16:27)
# cbind new columns with Mean of 2017
yr_2017_all <- cbind(yr2017_all, Year_2017_All = round(rowMeans(yr2017_all)))\(~\)
# Select columns from 2018
yr2018_all <- streeteasy_all %>%
select(28:39)
# cbind new columns with Mean of 2018
yr_2018_all <- cbind(yr2018_all, Year_2018_All = round(rowMeans(yr2018_all)))\(~\)
# Select columns from 2019
yr2019_all <- streeteasy_all %>%
select(40:51)
# cbind new columns with Mean of 2019
yr_2019_all <- cbind(yr2019_all, Year_2019_All = round(rowMeans(yr2019_all)))\(~\)
# Select columns for 2020
yr2020_all <- streeteasy_all %>%
select(52:63)
# cbind new columns with Mean of 2020
yr_2020_all <- cbind(yr2020_all, Year_2020_All = round(rowMeans(yr2020_all)))\(~\)
# Select columns for 2021
yr2021_all <- streeteasy_all %>%
select(64:73)
# cbind new columns with Mean of 2021
yr_2021_all <- cbind(yr2021_all, Year_2021_All = round(rowMeans(yr2021_all)))\(~\)
Rows: 198 Columns: 145
\(~\)
# Remove columns not needed
streeteasy_studio <- street_easy_studio[, -c(4:75)]
# Rename columns
colnames(streeteasy_studio) <- c("Area_Name", "Borough", "Area_Type", "Jan_2016", "Feb_2016", "Mar_2016", "Apr_2016", "May_2016", "Jun_2016", "Jul_2016", "Aug_2016", "Sep_2016", "Oct_2016", "Nov_2016", "Dec_2016", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020", "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")
# Replace all NA values with 0
streeteasy_studio <- streeteasy_studio %>% replace(is.na(.), 0)\(~\)
# Select columns from 2016
yr2016_studio <- streeteasy_studio %>%
select(4:15)
# cbind new columns with Mean of 2016
yr_2016_studio <- cbind(yr2016_studio, Year_2016_Studio = round(rowMeans(yr2016_studio)))\(~\)
# Select columns from 2017
yr2017_studio <- streeteasy_studio %>%
select(16:27)
# cbind new columns with Mean of 2017
yr_2017_studio <- cbind(yr2017_studio, Year_2017_Studio = round(rowMeans(yr2017_studio)))\(~\)
# Select columns from 2018
yr2018_studio <- streeteasy_studio %>%
select(28:39)
# cbind new columns with Mean of 2018
yr_2018_studio <- cbind(yr2018_studio, Year_2018_Studio = round(rowMeans(yr2018_studio)))\(~\)
# Select columns from 2019
yr2019_studio <- streeteasy_studio %>%
select(40:51)
# cbind new columns with Mean of 2019
yr_2019_studio <- cbind(yr2019_studio, Year_2019_Studio = round(rowMeans(yr2019_studio)))\(~\)
# Select columns from 2020
yr2020_studio <- streeteasy_studio %>%
select(52:63)
# cbind new columns with Mean of 2020
yr_2020_studio <- cbind(yr2020_studio, Year_2020_Studio = round(rowMeans(yr2020_studio)))\(~\)
# Select columns from 2021
yr2021_studio <- streeteasy_studio %>%
select(64:73)
# cbind new columns with Mean of 2021
yr_2021_studio <- cbind(yr2021_studio, Year_2021_Studio = round(rowMeans(yr2021_studio)))\(~\)
Rows: 198 Columns: 145
\(~\)
# Remove columns not needed
streeteasy_onebdr <- street_easy_onebrd[, -c(4:75)]
# Rename columns
colnames(streeteasy_onebdr) <- c("Area_Name", "Borough", "Area_Type", "Jan_2016", "Feb_2016", "Mar_2016", "Apr_2016", "May_2016", "Jun_2016", "Jul_2016", "Aug_2016", "Sep_2016", "Oct_2016", "Nov_2016", "Dec_2016", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020", "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")
# Replace all NA values with 0
streeteasy_onebdr <- streeteasy_onebdr %>% replace(is.na(.), 0)\(~\)
# Select columns from 2016
yr2016_onebdr <- streeteasy_onebdr %>%
select(4:15)
# cbind new columns with Mean of 2016
yr_2016_onebdr <- cbind(yr2016_onebdr, Year_2016_OneBdr = round(rowMeans(yr2016_onebdr)))\(~\)
# Select columns from 2017
yr2017_onebdr<- streeteasy_onebdr %>%
select(16:27)
# cbind new columns with Mean of 2017
yr_2017_onebdr<- cbind(yr2017_onebdr, Year_2017_OneBdr = round(rowMeans(yr2017_onebdr)))\(~\)
# Select columns from 2018
yr2018_onebdr <- streeteasy_onebdr %>%
select(28:39)
# cbind new columns with Mean of 2018
yr_2018_onebdr <- cbind(yr2018_onebdr, Year_2018_OneBdr = round(rowMeans(yr2018_onebdr)))\(~\)
# Select columns from 2019
yr2019_onebdr <- streeteasy_onebdr %>%
select(40:51)
# cbind new columns with Mean of 2019
yr_2019_onebdr <- cbind(yr2019_onebdr, Year_2019_OneBdr = round(rowMeans(yr2019_onebdr)))\(~\)
# Select columns from 2020
yr2020_onebdr <- streeteasy_onebdr %>%
select(52:63)
# cbind new columns with Mean of 2020
yr_2020_onebdr <- cbind(yr2020_onebdr, Year_2020_OneBdr = round(rowMeans(yr2020_onebdr)))\(~\)
# Select columns from 2021
yr2021_onebdr <- streeteasy_onebdr %>%
select(64:73)
# cbind new columns with Mean of 2020
yr_2021_onebdr <- cbind(yr2021_onebdr, Year_2021_OneBdr = round(rowMeans(yr2021_onebdr)))\(~\)
Rows: 198 Columns: 145
\(~\)
# Remove columns not needed
streeteasy_twobrd <- street_easy_twobrd[, -c(4:75)]
# Rename columns
colnames(streeteasy_twobrd) <- c("Area_Name", "Borough", "Area_Type", "Jan_2016", "Feb_2016", "Mar_2016", "Apr_2016", "May_2016", "Jun_2016", "Jul_2016", "Aug_2016", "Sep_2016", "Oct_2016", "Nov_2016", "Dec_2016", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020", "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")
# Replace all NA values with 0
streeteasy_twobrd <- streeteasy_twobrd %>% replace(is.na(.), 0)\(~\)
# Select columns from 2016
yr2016_twobdr <- streeteasy_twobrd %>%
select(4:15)
# cbind new columns with Mean of 2016
yr_2016_twobdr <- cbind(yr2016_twobdr, Year_2016_TwoBdr = round(rowMeans(yr2016_twobdr)))\(~\)
# Select columns from 2017
yr2017_twobdr <- streeteasy_twobrd %>%
select(16:27)
# cbind new columns with Mean of 2017
yr_2017_twobdr <- cbind(yr2017_twobdr, Year_2017_TwoBdr = round(rowMeans(yr2017_twobdr)))\(~\)
# Select columns from 2018
yr2018_twobdr <- streeteasy_twobrd %>%
select(28:39)
# cbind new columns with Mean of 2018
yr_2018_twobdr <- cbind(yr2018_twobdr, Year_2018_TwoBdr = round(rowMeans(yr2018_twobdr)))\(~\)
# Select columns from 2019
yr2019_twobdr <- streeteasy_twobrd %>%
select(40:51)
# cbind new columns with Mean of 2019
yr_2019_twobdr <- cbind(yr2019_twobdr, Year_2019_TwoBdr = round(rowMeans(yr2019_twobdr)))\(~\)
# Select columns from 2020
yr2020_twobdr <- streeteasy_twobrd %>%
select(52:63)
# cbind new columns with Mean of 2020
yr_2020_twobdr <- cbind(yr2020_twobdr, Year_2020_TwoBdr = round(rowMeans(yr2020_twobdr)))\(~\)
# Select columns from 2021
yr2021_twobdr <- streeteasy_twobrd %>%
select(64:73)
# cbind new columns with Mean of 2020
yr_2021_twobdr <- cbind(yr2021_twobdr, Year_2021_TwoBdr = round(rowMeans(yr2021_twobdr)))\(~\)
Rows: 198 Columns: 145
\(~\)
# Remove columns not needed
streeteasy_threebrds <- street_easy_threebrds[, -c(4:75)]
# Rename columns
colnames(streeteasy_threebrds) <- c("Area_Name", "Borough", "Area_Type", "Jan_2016", "Feb_2016", "Mar_2016", "Apr_2016", "May_2016", "Jun_2016", "Jul_2016", "Aug_2016", "Sep_2016", "Oct_2016", "Nov_2016", "Dec_2016", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020", "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")
# Replace all NA values with 0
streeteasy_threebrds <- streeteasy_threebrds %>% replace(is.na(.), 0)\(~\)
# Select columns from 2016
yr2016_threebdrs <- streeteasy_threebrds %>%
select(4:15)
# cbind new columns with Mean of 2016
yr_2016_threebdrs <- cbind(yr2016_threebdrs, Year_2016_ThreeBdrs = round(rowMeans(yr2016_threebdrs)))\(~\)
# Select columns from 2017
yr2017_threebdrs <- streeteasy_threebrds %>%
select(16:27)
# cbind new columns with Mean of 2017
yr_2017_threebdrs <- cbind(yr2017_threebdrs, Year_2017_ThreeBdrs = round(rowMeans(yr2017_threebdrs)))\(~\)
# Select columns from 2018
yr2018_threebdrs <- streeteasy_threebrds %>%
select(28:39)
# cbind new columns with Mean of 2018
yr_2018_threebdrs <- cbind(yr2018_threebdrs, Year_2018_ThreeBdrs = round(rowMeans(yr2018_threebdrs)))\(~\)
# Select columns from 2019
yr2019_threebdrs <- streeteasy_threebrds %>%
select(40:51)
# cbind new columns with Mean of 2019
yr_2019_threebdrs <- cbind(yr2019_threebdrs, Year_2019_ThreeBdrs = round(rowMeans(yr2019_threebdrs)))\(~\)
# Select columns from 2020
yr2020_threebdrs <- streeteasy_threebrds %>%
select(52:63)
# cbind new columns with Mean of 2020
yr_2020_threebdrs <- cbind(yr2020_threebdrs, Year_2020_ThreeBdrs = round(rowMeans(yr2020_threebdrs)))\(~\)
# Select columns from 2021
yr2021_threebdrs <- streeteasy_threebrds %>%
select(64:73)
# cbind new columns with Mean of 2020
yr_2021_threebdrs <- cbind(yr2021_threebdrs, Year_2021_ThreeBdrs = round(rowMeans(yr2021_threebdrs)))\(~\)
# cbind year 2016 for all apartment types
street_easy16 <- cbind(yr_2016_studio, yr_2016_onebdr, yr_2016_twobdr, yr_2016_threebdrs)
street_easy16 <- street_easy16[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]
# cbind year 2017 for all apartment types
street_easy17 <- cbind(yr_2017_studio, yr_2017_onebdr, yr_2017_twobdr, yr_2017_threebdrs)
street_easy17 <- street_easy17[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]
# cbind year 2018 for all apartment types
street_easy18 <- cbind(yr_2018_studio, yr_2018_onebdr, yr_2018_twobdr, yr_2018_threebdrs)
street_easy18 <- street_easy18[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]
# cbind year 2019 for all apartment types
street_easy19 <- cbind(yr_2019_studio, yr_2019_onebdr, yr_2019_twobdr, yr_2019_threebdrs)
street_easy19 <- street_easy19[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]
# cbind year 2020 for all apartment types
street_easy20 <- cbind(yr_2020_studio, yr_2020_onebdr, yr_2020_twobdr, yr_2020_threebdrs)
street_easy20 <- street_easy20[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]
# cbind year 2021 for all apartment types
street_easy21 <- cbind(yr_2021_studio, yr_2021_onebdr, yr_2021_twobdr, yr_2021_threebdrs)
street_easy21 <- street_easy21[-c(1:10, 12:21, 23:32, 34:43)]\(~\)
# cbind original table with new table and drop all columns not needed
street_easy_All <- cbind(streeteasy_all, street_easy16, street_easy17, street_easy18, street_easy19, street_easy20, street_easy21)
# Dropping columns not needed
street_easy_All <- street_easy_All[-c(4:73)]\(~\)
Another way to combine tables to explore
# cbind all tables with the mean and drop all columns not needed
# All Apt Types
street_easyall <- cbind(yr_2016_all, yr_2017_all, yr_2018_all, yr_2019_all, yr_2020_all, yr_2021_all)
street_easyall <- street_easyall[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]
# Studio's only
street_easystudio <- cbind(yr_2016_studio, yr_2017_studio, yr_2018_studio, yr_2019_studio, yr_2020_studio, yr_2021_studio)
street_easystudio <- street_easystudio[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]
# One Bedroom only
street_easyonebdr <- cbind(yr_2016_onebdr, yr_2017_onebdr, yr_2018_onebdr, yr_2019_onebdr, yr_2020_onebdr, yr_2021_onebdr)
street_easyonebdr <- street_easyonebdr[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]
# Two Bedrooms only
street_easytwobdr <- cbind(yr_2016_twobdr, yr_2017_twobdr, yr_2018_twobdr, yr_2019_twobdr, yr_2020_twobdr, yr_2021_twobdr)
street_easytwobdr <- street_easytwobdr[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]
# Three Bedrooms only
street_easythreebdrs <- cbind(yr_2016_threebdrs, yr_2017_threebdrs, yr_2018_threebdrs, yr_2019_threebdrs, yr_2020_threebdrs, yr_2021_threebdrs)
street_easythreebdrs <- street_easythreebdrs[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]\(~\)
Another way to combine tables to explore
# cbind all Studio tables with original dataset and drop columns not needed
Street_Easy_Studio <- cbind(streeteasy_all, yr_2016_studio, yr_2017_studio, yr_2018_studio, yr_2019_studio, yr_2020_studio, yr_2021_studio)
Street_Easy_Studio <- Street_Easy_Studio[-c(4:85, 87:98, 100:111, 113:124, 126:137, 139:148)]
# cbind all One Bedroom tables with original dataset and drop columns not needed
Street_Easy_OneBdr <- cbind(streeteasy_all, yr_2016_onebdr, yr_2017_onebdr, yr_2018_onebdr, yr_2019_onebdr, yr_2020_onebdr, yr_2021_onebdr)
Street_Easy_OneBdr <- Street_Easy_OneBdr[-c(4:85, 87:98, 100:111, 113:124, 126:137, 139:148)]
# cbind all Two Bedrooms tables with original dataset and drop columns not needed
Street_Easy_TwoBdr <- cbind(streeteasy_all, yr_2016_twobdr, yr_2017_twobdr, yr_2018_twobdr, yr_2019_twobdr, yr_2020_twobdr, yr_2021_twobdr)
Street_Easy_TwoBdr <- Street_Easy_TwoBdr[-c(4:85, 87:98, 100:111, 113:124, 126:137, 139:148)]
# cbind all 3 or more Bedrooms with original data set and drop columns not needed
Street_Easy_ThreeBdrs <- cbind(streeteasy_all, yr_2016_threebdrs, yr_2017_threebdrs, yr_2018_threebdrs, yr_2019_threebdrs, yr_2020_threebdrs, yr_2021_threebdrs)
Street_Easy_ThreeBdrs <- Street_Easy_ThreeBdrs[-c(4:85, 87:98, 100:111, 113:124, 126:137, 139:148)]\(~\)
This data frame will be used to write .csv file to then be used in tableau. This was easier to explore since it was already categorized by apartment type with a subcategory of years.
# cbind original table with new table and drop all columns not needed
Street_Easy_All <- cbind(streeteasy_all, street_easyall, street_easystudio, street_easyonebdr, street_easytwobdr, street_easythreebdrs)
# Dropping columns not needed
Street_Easy_All <- Street_Easy_All[-c(4:73)]
# Replace 0 in Borough column with NYC
Street_Easy_All$Borough[Street_Easy_All$Borough=="0"]<-"NYC"
head(Street_Easy_All, n = 4)## Area_Name Borough Area_Type Year_2016_All Year_2017_All
## 1 All Downtown Manhattan submarket 3765 3689
## 2 All Midtown Manhattan submarket 3516 3471
## 3 All Upper East Side Manhattan submarket 2979 2960
## 4 All Upper Manhattan Manhattan submarket 2363 2335
## Year_2018_All Year_2019_All Year_2020_All Year_2021_All Year_2016_Studio
## 1 3779 3956 3561 3583 2795
## 2 3474 3637 3249 3313 2613
## 3 2912 3057 2881 2570 2123
## 4 2343 2408 2351 2250 1645
## Year_2017_Studio Year_2018_Studio Year_2019_Studio Year_2020_Studio
## 1 2738 2779 2925 2605
## 2 2569 2559 2678 2404
## 3 2088 2094 2156 2042
## 4 1685 1730 1765 1746
## Year_2021_Studio Year_2016_OneBdr Year_2017_OneBdr Year_2018_OneBdr
## 1 2626 3590 3508 3564
## 2 2378 3509 3462 3456
## 3 1896 2831 2811 2799
## 4 1692 1927 1955 1967
## Year_2019_OneBdr Year_2020_OneBdr Year_2021_OneBdr Year_2016_TwoBdr
## 1 3804 3428 3519 4771
## 2 3611 3221 3342 5013
## 3 2920 2770 2522 3808
## 4 2026 2013 1925 2408
## Year_2017_TwoBdr Year_2018_TwoBdr Year_2019_TwoBdr Year_2020_TwoBdr
## 1 4527 4497 4727 4270
## 2 4883 4730 4953 4377
## 3 3858 3679 4015 3816
## 4 2422 2440 2504 2438
## Year_2021_TwoBdr Year_2016_ThreeBdrs Year_2017_ThreeBdrs Year_2018_ThreeBdrs
## 1 4114 7115 6478 6464
## 2 4793 6432 6339 6006
## 3 3325 8553 8146 7924
## 4 2360 3161 3129 3138
## Year_2019_ThreeBdrs Year_2020_ThreeBdrs Year_2021_ThreeBdrs
## 1 7224 6275 5873
## 2 6165 5393 5969
## 3 8632 8227 7044
## 4 3193 3045 2902
\(~\)
# Count of Boroughs
borough <- Street_Easy_All %>%
group_by(Borough) %>%
summarise(num=n()) %>%
arrange(desc(num))
head(borough, n = 6)## # A tibble: 6 Ă— 2
## Borough num
## <chr> <int>
## 1 Queens 60
## 2 Brooklyn 55
## 3 Bronx 43
## 4 Manhattan 38
## 5 NYC 1
## 6 Staten Island 1
\(~\)
summary(Street_Easy_All)## Area_Name Borough Area_Type Year_2016_All
## Length:198 Length:198 Length:198 Min. : 0.0
## Class :character Class :character Class :character 1st Qu.: 164.2
## Mode :character Mode :character Mode :character Median :1831.0
## Mean :1729.7
## 3rd Qu.:2589.0
## Max. :7509.0
## Year_2017_All Year_2018_All Year_2019_All Year_2020_All
## Min. : 0.0 Min. : 0.0 Min. : 0 Min. : 0.0
## 1st Qu.: 554.2 1st Qu.: 745.8 1st Qu.: 876 1st Qu.: 348.8
## Median :1925.5 Median :1926.5 Median :1975 Median :1957.5
## Mean :1798.4 Mean :1830.5 Mean :1902 Mean :1760.1
## 3rd Qu.:2554.5 3rd Qu.:2594.0 3rd Qu.:2675 3rd Qu.:2599.8
## Max. :7082.0 Max. :7539.0 Max. :7180 Max. :7078.0
## Year_2021_All Year_2016_Studio Year_2017_Studio Year_2018_Studio
## Min. : 0 Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 499 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0
## Median :1945 Median : 0.0 Median : 0.0 Median : 122.0
## Mean :1780 Mean : 825.8 Mean : 879.3 Mean : 912.9
## 3rd Qu.:2508 3rd Qu.:1652.8 3rd Qu.:1706.8 3rd Qu.:1778.2
## Max. :7330 Max. :3687.0 Max. :3482.0 Max. :3859.0
## Year_2019_Studio Year_2020_Studio Year_2021_Studio Year_2016_OneBdr
## Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0
## Median : 167.0 Median : 144.0 Median : 166.5 Median :1428
## Mean : 953.9 Mean : 919.2 Mean : 953.3 Mean :1341
## 3rd Qu.:1833.0 3rd Qu.:1776.8 3rd Qu.:1846.2 3rd Qu.:2126
## Max. :3983.0 Max. :3401.0 Max. :4043.0 Max. :6722
## Year_2017_OneBdr Year_2018_OneBdr Year_2019_OneBdr Year_2020_OneBdr
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
## Median :1526 Median :1488 Median :1662 Median :1676
## Mean :1364 Mean :1365 Mean :1468 Mean :1380
## 3rd Qu.:2173 3rd Qu.:2229 3rd Qu.:2278 3rd Qu.:2260
## Max. :6304 Max. :6410 Max. :6031 Max. :5426
## Year_2021_OneBdr Year_2016_TwoBdr Year_2017_TwoBdr Year_2018_TwoBdr
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
## Median :1660 Median : 1802 Median :1918 Median :1968
## Mean :1410 Mean : 1777 Mean :1780 Mean :1804
## 3rd Qu.:2186 3rd Qu.: 2706 3rd Qu.:2668 3rd Qu.:2680
## Max. :6001 Max. :10025 Max. :9675 Max. :9936
## Year_2019_TwoBdr Year_2020_TwoBdr Year_2021_TwoBdr Year_2016_ThreeBdrs
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
## Median : 2001 Median : 1870 Median : 2023 Median : 1592
## Mean : 1886 Mean : 1759 Mean : 1810 Mean : 2341
## 3rd Qu.: 2737 3rd Qu.: 2580 3rd Qu.: 2696 3rd Qu.: 3157
## Max. :10397 Max. :10850 Max. :10406 Max. :16737
## Year_2017_ThreeBdrs Year_2018_ThreeBdrs Year_2019_ThreeBdrs
## Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
## Median : 1717 Median : 2182 Median : 2123
## Mean : 2321 Mean : 2281 Mean : 2377
## 3rd Qu.: 3194 3rd Qu.: 3218 3rd Qu.: 3234
## Max. :16188 Max. :14669 Max. :16682
## Year_2020_ThreeBdrs Year_2021_ThreeBdrs
## Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0
## Median : 1498 Median : 1748
## Mean : 2183 Mean : 2227
## 3rd Qu.: 3184 3rd Qu.: 3011
## Max. :17185 Max. :17224
\(~\)
# Mean for yr 2016
round(mean(Street_Easy_All$Year_2016_All))## [1] 1730
round(mean(Street_Easy_All$Year_2016_Studio))## [1] 826
round(mean(Street_Easy_All$Year_2016_OneBdr))## [1] 1341
round(mean(Street_Easy_All$Year_2016_TwoBdr))## [1] 1777
round(mean(Street_Easy_All$Year_2016_ThreeBdrs))## [1] 2341
\(~\)
# Mean for yr 2017
round(mean(Street_Easy_All$Year_2017_All))## [1] 1798
round(mean(Street_Easy_All$Year_2017_Studio))## [1] 879
round(mean(Street_Easy_All$Year_2017_OneBdr))## [1] 1364
round(mean(Street_Easy_All$Year_2017_TwoBdr))## [1] 1780
round(mean(Street_Easy_All$Year_2017_ThreeBdrs))## [1] 2321
\(~\)
# Mean for yr 2018
round(mean(Street_Easy_All$Year_2018_All))## [1] 1830
round(mean(Street_Easy_All$Year_2018_Studio))## [1] 913
round(mean(Street_Easy_All$Year_2018_OneBdr))## [1] 1365
round(mean(Street_Easy_All$Year_2018_TwoBdr))## [1] 1804
round(mean(Street_Easy_All$Year_2018_ThreeBdrs))## [1] 2281
\(~\)
# Mean for yr 2019
round(mean(Street_Easy_All$Year_2019_All))## [1] 1902
round(mean(Street_Easy_All$Year_2019_Studio))## [1] 954
round(mean(Street_Easy_All$Year_2019_OneBdr))## [1] 1468
round(mean(Street_Easy_All$Year_2019_TwoBdr))## [1] 1886
round(mean(Street_Easy_All$Year_2019_ThreeBdrs))## [1] 2377
\(~\)
# Mean for yr 2020
round(mean(Street_Easy_All$Year_2020_All))## [1] 1760
round(mean(Street_Easy_All$Year_2020_Studio))## [1] 919
round(mean(Street_Easy_All$Year_2020_OneBdr))## [1] 1380
round(mean(Street_Easy_All$Year_2020_TwoBdr))## [1] 1759
round(mean(Street_Easy_All$Year_2020_ThreeBdrs))## [1] 2183
\(~\)
# Mean for yr 2021
round(mean(Street_Easy_All$Year_2021_All))## [1] 1780
round(mean(Street_Easy_All$Year_2021_Studio))## [1] 953
round(mean(Street_Easy_All$Year_2021_OneBdr))## [1] 1410
round(mean(Street_Easy_All$Year_2021_TwoBdr))## [1] 1810
round(mean(Street_Easy_All$Year_2021_ThreeBdrs))## [1] 2227
\(~\)
# Count of Area_Type
area_type <- Street_Easy_All %>%
group_by(Area_Type) %>%
summarise(num=n()) %>%
arrange(desc(num))
head(area_type, n = 5)## # A tibble: 4 Ă— 2
## Area_Type num
## <chr> <int>
## 1 neighborhood 177
## 2 submarket 15
## 3 borough 5
## 4 city 1
\(~\)
This file will be helpful for Tableau Public
# commenting out so that it doesn't replicate in the system
#write.csv(Street_Easy_All, "/Users/letiix3/Desktop/Data-607/Final Project/csv files/Street_Easy_All.csv")\(~\)
Rows: 5,000 Columns: 20
Codecademy + StreetEasy: sample of 5,000 rental listings in Manhattan, Brooklyn, and Queens from June 2016.
\(~\)
| Headers | Description |
|---|---|
| rental_id | rental ID |
| building_id | building ID |
| rent | price of rent($) |
| bedrooms | number of bedrooms |
| bathrooms | number of bathrooms |
| size_sqft | size in square feet |
| min_to_subway | distance from subway station in minutes |
| floor | floor number |
| building_age_yrs | building’s age in years |
| no_fee | does it have a broker fee? (0 for fee, 1 for no fee) |
| has_rooftdeck | does it have a roof deck? (o for no, 1 for yes) |
| has_washer_dryer | does it have washer/dryer in unit (0/1) |
| has_doorman | does it have a doorman? (0/1) |
| has_elevator | does it have an elevator? (0/1) |
| has_dishwasher | does it have a dishwasher? (0/1) |
| has_patio | does the building have a gym? (0/1) |
| has_gym | does the building have a gym? (0/1) |
| neighborhood | neighborhood (ex: Greenpoint) |
| submarket | submarket (ex: North Brooklyn) |
| borough | borough (ex: Brooklyn) |
\(~\)
# Codecademy + StreetEasy
street_easy_cc <- read.csv("https://raw.githubusercontent.com/Codecademy/datasets/master/streeteasy/streeteasy.csv", header = TRUE)\(~\)
# commenting out so that it does not reproduce
#glimpse(street_easy_cc)\(~\)
# Remove columns not needed
street_easy_cc <- street_easy_cc[, -c(2, 6:17)]
# Rename columns
colnames(street_easy_cc) <- c("Rental_ID", "Rent_Price", "No_Bedrooms", "No_Bathrooms", "Neighborhood", "SubMarket", "Borough" )
head(street_easy_cc, n = 4)## Rental_ID Rent_Price No_Bedrooms No_Bathrooms Neighborhood
## 1 1545 2550 0 1 Upper East Side
## 2 2472 11500 2 2 Greenwich Village
## 3 10234 3000 3 1 Astoria
## 4 2919 4500 1 1 Midtown
## SubMarket Borough
## 1 All Upper East Side Manhattan
## 2 All Downtown Manhattan
## 3 Northwest Queens Queens
## 4 All Midtown Manhattan
\(~\)
This file will be helpful for Tableau Public
# commenting out so that it doesn't replicate in the system
#write.csv(street_easy_cc, "/Users/letiix3/Desktop/Data-607/Final Project/csv files/street_easy_cc.csv")\(~\)
\(~\)
summary(street_easy_cc)## Rental_ID Rent_Price No_Bedrooms No_Bathrooms
## Min. : 1 Min. : 1250 Min. :0.000 Min. :0.000
## 1st Qu.: 2700 1st Qu.: 2750 1st Qu.:1.000 1st Qu.:1.000
## Median : 5456 Median : 3600 Median :1.000 Median :1.000
## Mean : 5527 Mean : 4537 Mean :1.396 Mean :1.322
## 3rd Qu.: 8306 3rd Qu.: 5200 3rd Qu.:2.000 3rd Qu.:2.000
## Max. :11349 Max. :20000 Max. :5.000 Max. :5.000
## Neighborhood SubMarket Borough
## Length:5000 Length:5000 Length:5000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
\(~\)
# Count of Bedrooms
bedrooms <- street_easy_cc%>%
group_by(No_Bedrooms)%>%
summarise(num=n())%>%
arrange(desc(num))
head(bedrooms)## # A tibble: 6 Ă— 2
## No_Bedrooms num
## <dbl> <int>
## 1 1 2048
## 2 2 1438
## 3 0 832
## 4 3 496
## 5 4 103
## 6 1.5 28
\(~\)
# Count of Bathrooms
bathrooms <- street_easy_cc%>%
group_by(No_Bathrooms)%>%
summarise(num=n())%>%
arrange(desc(num))
head(bathrooms)## # A tibble: 6 Ă— 2
## No_Bathrooms num
## <int> <int>
## 1 1 3572
## 2 2 1212
## 3 3 182
## 4 0 18
## 5 4 14
## 6 5 2
\(~\)
# Count of Neighborhood
neighborhood <- street_easy_cc%>%
group_by(Neighborhood)%>%
summarise(num=n())%>%
arrange(desc(num))
head(neighborhood, n = 6)## # A tibble: 6 Ă— 2
## Neighborhood num
## <chr> <int>
## 1 Upper West Side 579
## 2 Upper East Side 500
## 3 Midtown East 460
## 4 Midtown West 314
## 5 Williamsburg 306
## 6 Financial District 268
\(~\)
# Count of Submarket
submarket<- street_easy_cc%>%
group_by(SubMarket)%>%
summarise(num=n())%>%
arrange(desc(num))
head(submarket)## # A tibble: 6 Ă— 2
## SubMarket num
## <chr> <int>
## 1 All Downtown 1229
## 2 All Midtown 1001
## 3 All Upper West Side 579
## 4 All Upper East Side 505
## 5 North Brooklyn 347
## 6 Northwest Brooklyn 311
\(~\)
# Count of Borough
borough <- street_easy_cc%>%
group_by(Borough)%>%
summarise(num=n())%>%
arrange(desc(num))
head(borough)## # A tibble: 3 Ă— 2
## Borough num
## <chr> <int>
## 1 Manhattan 3539
## 2 Brooklyn 1013
## 3 Queens 448
\(~\)
# Mean for Rent Price
round(mean(street_easy_cc$Rent_Price))## [1] 4537
\(~\)
Rows: 206 Columns: 62
\(~\)
Apartment List State data set from 2017 - October 2021.
# Load Data
apt_list_state <- read.csv("https://raw.githubusercontent.com/letisalba/Data-607/main/Final%20Project/csv%20files/State_AptList_Rent.csv", header=TRUE, stringsAsFactors=FALSE, na.strings=c(""," ","NA"));\(~\)
# Rename columns
colnames(apt_list_state) <- c("State_Name", "FIPS_Code", "Population", "Bedroom_Size", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020", "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")
# Replace all NA values with 0
apt_list_state <- apt_list_state %>% replace(is.na(.), 0)\(~\)
# Make 5 - 62 numeric
apt_list_state <- apt_list_state %>%
mutate_at(c(5:62), as.numeric)\(~\)
# Rename values in Bedroom_Size column to remove punctuation
Apt_list_state <- recode(apt_list_state$Bedroom_Size,
"_Overall" = "Overall",
"_Studio" = "Studio")
head(Apt_list_state)## [1] "Overall" "Studio" "1br" "2br" "3br" "4br"
\(~\)
# cbind tables
Apt_List_state <- cbind(apt_list_state, Apt_list_state)
# Drop column not needed
Apt_List_state <- Apt_List_state[, -c(4)]
# Rename column Apt_list_state to Bedroom_size
colnames(Apt_List_state) <- c("State_Name", "FIPS_Code", "Population", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020", "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021", "Bedroom_size")
# Reorder columns
col_order <- c("State_Name", "FIPS_Code", "Population", "Bedroom_size", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020", "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")
Apt_List_state <- Apt_List_state[, col_order]\(~\)
# Select columns from 2017
rent_state_2017 <- Apt_List_state %>%
select(5:16)
# cbind new columns with Mean of 2017
rent_state_17 <- cbind(rent_state_2017, Average_Rent_State_17 = round(rowMeans(rent_state_2017)))\(~\)
# Select columns from 2018
rent_state_2018 <- Apt_List_state %>%
select(17:28)
# cbind new columns with Mean of 2018
rent_state_18 <- cbind(rent_state_2018, Average_Rent_State_18 = round(rowMeans(rent_state_2018)))\(~\)
# Select columns from 2019
rent_state_2019 <- Apt_List_state %>%
select(29:40)
# cbind new columns with Mean of 2019
rent_state_19 <- cbind(rent_state_2019, Average_Rent_State_19 = round(rowMeans(rent_state_2019)))\(~\)
# Select columns from 2020
rent_state_2020 <- Apt_List_state %>%
select(41:52)
# cbind new columns with Mean of 2019
rent_state_20 <- cbind(rent_state_2020, Average_Rent_State_20 = round(rowMeans(rent_state_2020)))\(~\)
# Select columns from 2021
rent_state_2021 <- Apt_List_state %>%
select(53:62)
# cbind new columns with Mean of 2021
rent_state_21 <- cbind(rent_state_2021, Average_Rent_State_21 = round(rowMeans(rent_state_2021)))\(~\)
# cbind all tables with the mean and drop all columns not needed
Apartment_List_State <- cbind(Apt_List_state, rent_state_17, rent_state_18, rent_state_19, rent_state_20, rent_state_21)
Apartment_List_State <- Apartment_List_State[-c(5:74, 76:87, 89:100, 102:113, 115:124)]
head(Apartment_List_State, n = 3)## State_Name FIPS_Code Population Bedroom_size Average_Rent_State_17
## 1 Alabama 1 4864680 Overall 835
## 2 Alabama 1 4864680 Studio 639
## 3 Alabama 1 4864680 1br 649
## Average_Rent_State_18 Average_Rent_State_19 Average_Rent_State_20
## 1 852 890 917
## 2 652 681 702
## 3 662 692 713
## Average_Rent_State_21
## 1 996
## 2 763
## 3 774
\(~\)
This file will be helpful for Tableau Public
# commenting out so that it doesn't replicate in the system
#write.csv(Apartment_List_State, "/Users/letiix3/Desktop/Data-607/Final Project/csv files/Apartment_List_State.csv")\(~\)
\(~\)
summary(Apartment_List_State)## State_Name FIPS_Code Population Bedroom_size
## Length:306 Min. : 1.00 Min. : 581836 Length:306
## Class :character 1st Qu.:16.00 1st Qu.: 1687809 Class :character
## Mode :character Median :29.00 Median : 4440204 Mode :character
## Mean :28.96 Mean : 6331432
## 3rd Qu.:42.00 3rd Qu.: 7294336
## Max. :56.00 Max. :39148760
## Average_Rent_State_17 Average_Rent_State_18 Average_Rent_State_19
## Min. : 546.0 Min. : 546.0 Min. : 560
## 1st Qu.: 806.2 1st Qu.: 824.8 1st Qu.: 851
## Median :1020.0 Median :1044.0 Median :1076
## Mean :1090.7 Mean :1120.4 Mean :1152
## 3rd Qu.:1263.5 3rd Qu.:1304.8 3rd Qu.:1351
## Max. :2928.0 Max. :3176.0 Max. :3254
## Average_Rent_State_20 Average_Rent_State_21
## Min. : 571.0 Min. : 600.0
## 1st Qu.: 864.8 1st Qu.: 928.5
## Median :1088.0 Median :1153.5
## Mean :1162.3 Mean :1241.0
## 3rd Qu.:1349.2 3rd Qu.:1484.5
## Max. :3211.0 Max. :3391.0
\(~\)
# Count of Bedroom_size
Bdr_size <- Apartment_List_State%>%
group_by(Bedroom_size)%>%
summarise(num=n())%>%
arrange(desc(num))
head(Bdr_size)## # A tibble: 6 Ă— 2
## Bedroom_size num
## <chr> <int>
## 1 1br 51
## 2 2br 51
## 3 3br 51
## 4 4br 51
## 5 Overall 51
## 6 Studio 51
\(~\)
# Count of State and filtering out NY
state <- Apartment_List_State %>%
group_by(State_Name) %>%
filter(State_Name == "New York") %>%
summarise(num=n()) %>%
arrange(desc(num))
head(state)## # A tibble: 1 Ă— 2
## State_Name num
## <chr> <int>
## 1 New York 6
\(~\)
# Mean yr 2017
round(mean(Apartment_List_State$Average_Rent_State_17))## [1] 1091
\(~\)
# Mean yr 2018
round(mean(Apartment_List_State$Average_Rent_State_18))## [1] 1120
\(~\)
# Mean yr 2019
round(mean(Apartment_List_State$Average_Rent_State_19))## [1] 1152
\(~\)
# Mean yr 2020
round(mean(Apartment_List_State$Average_Rent_State_20))## [1] 1162
\(~\)
# Mean yr 2021
round(mean(Apartment_List_State$Average_Rent_State_21))## [1] 1241
\(~\)
\(~\)
# Libraries
library(ggplot2)
library(hrbrthemes)## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
## Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
## if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
hrbrthemes::import_roboto_condensed()## You will likely need to install these fonts on your system as well.
##
## You can find them in [/Users/letiix3/Library/R/x86_64/4.1/library/hrbrthemes/fonts/roboto-condensed]
library(viridis)## Loading required package: viridisLite
library(plotly)##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
\(~\)
# Bar plot for all Rent Averages in the boroughs
Street_Easy_All %>%
mutate(Averages = Year_2016_All + Year_2017_All + Year_2018_All + Year_2019_All + Year_2020_All + Year_2021_All + Year_2016_Studio + Year_2017_Studio + Year_2018_Studio + Year_2019_Studio + Year_2020_Studio + Year_2021_Studio + Year_2016_OneBdr + Year_2017_OneBdr + Year_2018_OneBdr + Year_2019_OneBdr + Year_2020_OneBdr + Year_2021_OneBdr +
Year_2016_TwoBdr + Year_2017_TwoBdr + Year_2018_TwoBdr +
Year_2019_TwoBdr + Year_2020_TwoBdr + Year_2021_TwoBdr +
Year_2016_ThreeBdrs + Year_2017_ThreeBdrs + Year_2018_ThreeBdrs +
Year_2019_ThreeBdrs + Year_2020_ThreeBdrs + Year_2021_ThreeBdrs) %>% # Creating cumulative time
ggplot(aes(x = Borough, y = Averages, fill = Averages)) + # Passing plot arguments
geom_bar(stat = "Identity") # Specifying the type of plot\(~\)
Street_Easy_All %>%
pivot_longer(4:9, names_to = "Averages", values_to = "Value") %>%
ggplot(aes(x = Averages)) +
geom_bar() +
facet_wrap(vars(Averages), ncol = 3) +
labs(x = "Rent Averages", y = "Count", title = "All Apartment Types")\(~\)
Street_Easy_Studio %>%
pivot_longer(4:9, names_to = "Averages", values_to = "Value") %>%
ggplot(aes(x = Averages)) +
geom_bar() +
facet_wrap(vars(Averages), ncol = 3) +
labs(x = "Rent Averages", y = "Count", title = "Studio's")\(~\)
# Plot for original dataset of All dataset
streeteasy_all %>%
pivot_longer(4:15, names_to = "Averages", values_to = "Value") %>%
ggplot(aes(x = Value, color = "blue")) +
facet_wrap(vars(Averages), ncol = 3) +
geom_point(stat = "count") +
geom_line(stat = "count") +
labs(x = "Rent Cost per Month", y = "Count")\(~\)
# Count of Borough's for Codecademy dataset
street_easy_cc %>%
ggplot(aes(x = Borough, fill = Borough)) +
geom_bar(stat = "count") +
labs(x = "Boroughs", y = "Count")\(~\)
# Codecademy Count of Rent Prices
street_easy_cc %>%
ggplot(aes(x=Rent_Price)) +
geom_histogram(binwidth=15, fill="#69b3a2", color="#e9ecef", alpha=0.9) +
ggtitle("Rent Prices") +
theme_ipsum() +
theme(
plot.title = element_text(size=15)
)\(~\)
# Density plot for Borough and Rent Price in Codecademy
street_easy_cc %>%
ggplot(aes(x = Rent_Price, group = Borough, fill = Borough)) +
geom_density(adjust = 1.0) +
theme_ipsum() +
facet_wrap(~Borough) +
theme(
legend.position="none",
panel.spacing = unit(0.1, "lines"),
axis.ticks.x=element_blank()
)\(~\)
# Line plot for Rent Price and Number of Bedrooms in Codecademy
street_easy_cc %>%
tail(10) %>%
ggplot( aes(x = Rent_Price, y = No_Bedrooms)) +
geom_line() +
geom_point()\(~\)
# Plot .long for Codecademy
street_easy_cc.long <- street_easy_cc %>%
select("Neighborhood", "Rent_Price") %>%
pivot_longer(-Neighborhood, names_to = "Year", values_to = "Rent_Price")
ggplot(street_easy_cc.long, aes(Neighborhood, Rent_Price, colour = Year)) + geom_line()\(~\)
# This plot came out a bit cluttered
rental_price <- street_easy_cc %>% select("Borough", "No_Bedrooms", "Rent_Price")
plot_ly(rental_price, x = ~No_Bedrooms, y = ~Rent_Price, color = ~Borough, type = 'scatter', mode = 'lines')\(~\)
Each data set required a bit of wrangling but after some exploration I found the following results:
\(~\)
\(~\)
When taking into consideration the apartment types within the StreetEasy data set the rent averages per year were:
\(~\)
\(~\)
How does that compare to the entire United States? From the table below, overall, New York has a higher average rent from the years 2017 - 2021. There was about a 39% difference between the years 2017 - 2019 but from 2020 - 2021 the difference dropped to about 30% of rent increase.
\(~\)
# Comparison of all apartment types for year 2017 - 2021 Apartment List and StreetEasy
comparison <- data.frame(Year = c(2017, 2018, 2019, 2020, 2021),
Avg_Rent_US = c(1091, 1120, 1152, 1162, 1241),
New_York = c(1798, 1830, 1902, 1760, 1780))
comparison## Year Avg_Rent_US New_York
## 1 2017 1091 1798
## 2 2018 1120 1830
## 3 2019 1152 1902
## 4 2020 1162 1760
## 5 2021 1241 1780
\(~\)
# Percent difference for rent averages
# percent_diff <- ((new_avg - old_avg) / new_avg) * 100
percent_diff17 <- ((1798 - 1091) / 1798) * 100
percent_diff17## [1] 39.32147
percent_diff18 <- ((1830 - 1120) / 1830) * 100
percent_diff18## [1] 38.79781
percent_diff19 <- ((1902 - 1152) / 1902) * 100
percent_diff19## [1] 39.43218
percent_diff20 <- ((1760 - 1162) / 1760) * 100
percent_diff20## [1] 33.97727
percent_diff21 <- ((1780 - 1241) / 1780) * 100
percent_diff21## [1] 30.2809
\(~\)
Digging deeper, using the Apartment List data set, I am comparing the years 2019 - 2021 with the average cost of each apartment types and obtained the results below:
# Comparison of all apartment types for year 2019 - 2021 from Apartment List
comparison2 <- data.frame(Apt_Type = c("Studio", "One_Bdr", "Two_Bdrs", "Three_Bdrs", "Four_Bdrs"),
"Yr_2019" = c(853, 912, 1111, 1336, 1587),
"Yr_2020" = c(860, 919, 1121, 1349, 1601),
"Yr_2021" = c(917, 981, 1197, 1441, 1710))
comparison2## Apt_Type Yr_2019 Yr_2020 Yr_2021
## 1 Studio 853 860 917
## 2 One_Bdr 912 919 981
## 3 Two_Bdrs 1111 1121 1197
## 4 Three_Bdrs 1336 1349 1441
## 5 Four_Bdrs 1587 1601 1710
\(~\)
For all 5 apartment types there is a small increase in rent of about 7% from 2019 to 2021.
\(~\)
# Percent difference for rent averages from 2019 and 2021
# percent_diff <- ((new_avg - old_avg) / new_avg) * 100
studio <- ((917 - 853) / 917) * 100
studio## [1] 6.97928
one_bdr <- ((981 - 912) / 981) * 100
one_bdr## [1] 7.033639
two_bdr <- ((1197 - 1111) / 1197) * 100
two_bdr## [1] 7.184628
three_bdr <- ((1441 - 1336) / 1441) * 100
three_bdr## [1] 7.286607
four_bdr <- ((1710 - 1587) / 1710) * 100
four_bdr## [1] 7.192982
\(~\)
After importing my data sets into Tableau Public I was able to create more visuals that led to the following conclusions:
The pandemic has affected rent prices all throughout the United States. Although the most expensive states saw rent price drops between 2020 and 2021, there have been increases in average rent as well. New York has one of the highest average in rent prices nationwide. Not only that but based on these numbers I predict that apartment rental prices will continue to rise an average of about 7% within the next two years.
\(~\)
While thinking about this project, I made a couple of assumptions. The first being the timeline I should consider, if 2018 - 2021 would suffice. While exploring the data I decided that I had to expand my search and opted for 2016 - 2021. I also thought I would see a larger decrease in rent for 2020 and into 2021 since media coverage has covered how much rent prices in New York and throughout the 5 boroughs have decreased. One assumption that was true was Manhattan having the highest average rent price in NYC.
\(~\)
\(~\)
\(~\)
\(~\)
\(~\)
\(~\)
\(~\)