Rental prices have increased since 1980 on average 8.86% per year. Nationwide the average monthly rent is $1,164 and within the 21st century the median rent increases at an annual rate of 4.17%. How does this compare by state, particularly in New York? Did the pandemic have a big effect on these rental cost prices? To answer these questions and more, I will be using three data sets including: Street Easy Data, Codecademy in GitHub, and Apartment List that include data from 2016 - October 2021. I will be wrangling and exploring this data and using Tableau Public to create visual aids to show how rental costs have changed over time.
\(~\)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.4 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(stringr)
library(dplyr)
library(tidyr)
\(~\)
Street Easy Data from 2016 - October 2021 data sets containing All apartment types, Studio, One Bedroom, Two Bedrooms, and Three Bedrooms or more. I’ll be exploring the data sets individually and then joining them to compare.
\(~\)
# StreetEasy Data
# All types of Apartments
<- read.csv("https://raw.githubusercontent.com/letisalba/Data-607/main/Final%20Project/csv%20files/medianAskingRent_All.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = c(""," ","NA"));
street_easy_all
# Studio's only
<- read.csv("https://raw.githubusercontent.com/letisalba/Data-607/main/Final%20Project/csv%20files/medianAskingRent_Studio.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = c(""," ","NA"));
street_easy_studio
# One Bedroom only
<- read.csv("https://raw.githubusercontent.com/letisalba/Data-607/main/Final%20Project/csv%20files/medianAskingRent_OneBd.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = c(""," ","NA"));
street_easy_onebrd
# Two Bedrooms only
<- read.csv("https://raw.githubusercontent.com/letisalba/Data-607/main/Final%20Project/csv%20files/medianAskingRent_TwoBd.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = c(""," ","NA"));
street_easy_twobrd
# Three or more Bedrooms only
<- read.csv("https://raw.githubusercontent.com/letisalba/Data-607/main/Final%20Project/csv%20files/medianAskingRent_ThreePlusBd.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = c(""," ","NA")); street_easy_threebrds
\(~\)
Rows: 198 Columns: 145
\(~\)
# Remove columns not needed
<- street_easy_all[, -c(4:75)]
streeteasy_all
# Rename columns
colnames(streeteasy_all) <- c("Area_Name", "Borough", "Area_Type", "Jan_2016", "Feb_2016", "Mar_2016", "Apr_2016", "May_2016", "Jun_2016", "Jul_2016", "Aug_2016", "Sep_2016", "Oct_2016", "Nov_2016", "Dec_2016", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020", "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")
# Replace all NA values with 0
<- streeteasy_all %>% replace(is.na(.), 0) streeteasy_all
\(~\)
# Select columns from 2016
<- streeteasy_all %>%
yr2016_all select(4:15)
# cbind new columns with Mean of 2016
<- cbind(yr2016_all, Year_2016_All = round(rowMeans(yr2016_all))) yr_2016_all
\(~\)
# Select columns from 2017
<- streeteasy_all %>%
yr2017_all select(16:27)
# cbind new columns with Mean of 2017
<- cbind(yr2017_all, Year_2017_All = round(rowMeans(yr2017_all))) yr_2017_all
\(~\)
# Select columns from 2018
<- streeteasy_all %>%
yr2018_all select(28:39)
# cbind new columns with Mean of 2018
<- cbind(yr2018_all, Year_2018_All = round(rowMeans(yr2018_all))) yr_2018_all
\(~\)
# Select columns from 2019
<- streeteasy_all %>%
yr2019_all select(40:51)
# cbind new columns with Mean of 2019
<- cbind(yr2019_all, Year_2019_All = round(rowMeans(yr2019_all))) yr_2019_all
\(~\)
# Select columns for 2020
<- streeteasy_all %>%
yr2020_all select(52:63)
# cbind new columns with Mean of 2020
<- cbind(yr2020_all, Year_2020_All = round(rowMeans(yr2020_all))) yr_2020_all
\(~\)
# Select columns for 2021
<- streeteasy_all %>%
yr2021_all select(64:73)
# cbind new columns with Mean of 2021
<- cbind(yr2021_all, Year_2021_All = round(rowMeans(yr2021_all))) yr_2021_all
\(~\)
Rows: 198 Columns: 145
\(~\)
# Remove columns not needed
<- street_easy_studio[, -c(4:75)]
streeteasy_studio
# Rename columns
colnames(streeteasy_studio) <- c("Area_Name", "Borough", "Area_Type", "Jan_2016", "Feb_2016", "Mar_2016", "Apr_2016", "May_2016", "Jun_2016", "Jul_2016", "Aug_2016", "Sep_2016", "Oct_2016", "Nov_2016", "Dec_2016", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020", "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")
# Replace all NA values with 0
<- streeteasy_studio %>% replace(is.na(.), 0) streeteasy_studio
\(~\)
# Select columns from 2016
<- streeteasy_studio %>%
yr2016_studio select(4:15)
# cbind new columns with Mean of 2016
<- cbind(yr2016_studio, Year_2016_Studio = round(rowMeans(yr2016_studio))) yr_2016_studio
\(~\)
# Select columns from 2017
<- streeteasy_studio %>%
yr2017_studio select(16:27)
# cbind new columns with Mean of 2017
<- cbind(yr2017_studio, Year_2017_Studio = round(rowMeans(yr2017_studio))) yr_2017_studio
\(~\)
# Select columns from 2018
<- streeteasy_studio %>%
yr2018_studio select(28:39)
# cbind new columns with Mean of 2018
<- cbind(yr2018_studio, Year_2018_Studio = round(rowMeans(yr2018_studio))) yr_2018_studio
\(~\)
# Select columns from 2019
<- streeteasy_studio %>%
yr2019_studio select(40:51)
# cbind new columns with Mean of 2019
<- cbind(yr2019_studio, Year_2019_Studio = round(rowMeans(yr2019_studio))) yr_2019_studio
\(~\)
# Select columns from 2020
<- streeteasy_studio %>%
yr2020_studio select(52:63)
# cbind new columns with Mean of 2020
<- cbind(yr2020_studio, Year_2020_Studio = round(rowMeans(yr2020_studio))) yr_2020_studio
\(~\)
# Select columns from 2021
<- streeteasy_studio %>%
yr2021_studio select(64:73)
# cbind new columns with Mean of 2021
<- cbind(yr2021_studio, Year_2021_Studio = round(rowMeans(yr2021_studio))) yr_2021_studio
\(~\)
Rows: 198 Columns: 145
\(~\)
# Remove columns not needed
<- street_easy_onebrd[, -c(4:75)]
streeteasy_onebdr
# Rename columns
colnames(streeteasy_onebdr) <- c("Area_Name", "Borough", "Area_Type", "Jan_2016", "Feb_2016", "Mar_2016", "Apr_2016", "May_2016", "Jun_2016", "Jul_2016", "Aug_2016", "Sep_2016", "Oct_2016", "Nov_2016", "Dec_2016", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020", "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")
# Replace all NA values with 0
<- streeteasy_onebdr %>% replace(is.na(.), 0) streeteasy_onebdr
\(~\)
# Select columns from 2016
<- streeteasy_onebdr %>%
yr2016_onebdr select(4:15)
# cbind new columns with Mean of 2016
<- cbind(yr2016_onebdr, Year_2016_OneBdr = round(rowMeans(yr2016_onebdr))) yr_2016_onebdr
\(~\)
# Select columns from 2017
<- streeteasy_onebdr %>%
yr2017_onebdrselect(16:27)
# cbind new columns with Mean of 2017
<- cbind(yr2017_onebdr, Year_2017_OneBdr = round(rowMeans(yr2017_onebdr))) yr_2017_onebdr
\(~\)
# Select columns from 2018
<- streeteasy_onebdr %>%
yr2018_onebdr select(28:39)
# cbind new columns with Mean of 2018
<- cbind(yr2018_onebdr, Year_2018_OneBdr = round(rowMeans(yr2018_onebdr))) yr_2018_onebdr
\(~\)
# Select columns from 2019
<- streeteasy_onebdr %>%
yr2019_onebdr select(40:51)
# cbind new columns with Mean of 2019
<- cbind(yr2019_onebdr, Year_2019_OneBdr = round(rowMeans(yr2019_onebdr))) yr_2019_onebdr
\(~\)
# Select columns from 2020
<- streeteasy_onebdr %>%
yr2020_onebdr select(52:63)
# cbind new columns with Mean of 2020
<- cbind(yr2020_onebdr, Year_2020_OneBdr = round(rowMeans(yr2020_onebdr))) yr_2020_onebdr
\(~\)
# Select columns from 2021
<- streeteasy_onebdr %>%
yr2021_onebdr select(64:73)
# cbind new columns with Mean of 2020
<- cbind(yr2021_onebdr, Year_2021_OneBdr = round(rowMeans(yr2021_onebdr))) yr_2021_onebdr
\(~\)
Rows: 198 Columns: 145
\(~\)
# Remove columns not needed
<- street_easy_twobrd[, -c(4:75)]
streeteasy_twobrd
# Rename columns
colnames(streeteasy_twobrd) <- c("Area_Name", "Borough", "Area_Type", "Jan_2016", "Feb_2016", "Mar_2016", "Apr_2016", "May_2016", "Jun_2016", "Jul_2016", "Aug_2016", "Sep_2016", "Oct_2016", "Nov_2016", "Dec_2016", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020", "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")
# Replace all NA values with 0
<- streeteasy_twobrd %>% replace(is.na(.), 0) streeteasy_twobrd
\(~\)
# Select columns from 2016
<- streeteasy_twobrd %>%
yr2016_twobdr select(4:15)
# cbind new columns with Mean of 2016
<- cbind(yr2016_twobdr, Year_2016_TwoBdr = round(rowMeans(yr2016_twobdr))) yr_2016_twobdr
\(~\)
# Select columns from 2017
<- streeteasy_twobrd %>%
yr2017_twobdr select(16:27)
# cbind new columns with Mean of 2017
<- cbind(yr2017_twobdr, Year_2017_TwoBdr = round(rowMeans(yr2017_twobdr))) yr_2017_twobdr
\(~\)
# Select columns from 2018
<- streeteasy_twobrd %>%
yr2018_twobdr select(28:39)
# cbind new columns with Mean of 2018
<- cbind(yr2018_twobdr, Year_2018_TwoBdr = round(rowMeans(yr2018_twobdr))) yr_2018_twobdr
\(~\)
# Select columns from 2019
<- streeteasy_twobrd %>%
yr2019_twobdr select(40:51)
# cbind new columns with Mean of 2019
<- cbind(yr2019_twobdr, Year_2019_TwoBdr = round(rowMeans(yr2019_twobdr))) yr_2019_twobdr
\(~\)
# Select columns from 2020
<- streeteasy_twobrd %>%
yr2020_twobdr select(52:63)
# cbind new columns with Mean of 2020
<- cbind(yr2020_twobdr, Year_2020_TwoBdr = round(rowMeans(yr2020_twobdr))) yr_2020_twobdr
\(~\)
# Select columns from 2021
<- streeteasy_twobrd %>%
yr2021_twobdr select(64:73)
# cbind new columns with Mean of 2020
<- cbind(yr2021_twobdr, Year_2021_TwoBdr = round(rowMeans(yr2021_twobdr))) yr_2021_twobdr
\(~\)
Rows: 198 Columns: 145
\(~\)
# Remove columns not needed
<- street_easy_threebrds[, -c(4:75)]
streeteasy_threebrds
# Rename columns
colnames(streeteasy_threebrds) <- c("Area_Name", "Borough", "Area_Type", "Jan_2016", "Feb_2016", "Mar_2016", "Apr_2016", "May_2016", "Jun_2016", "Jul_2016", "Aug_2016", "Sep_2016", "Oct_2016", "Nov_2016", "Dec_2016", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020", "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")
# Replace all NA values with 0
<- streeteasy_threebrds %>% replace(is.na(.), 0) streeteasy_threebrds
\(~\)
# Select columns from 2016
<- streeteasy_threebrds %>%
yr2016_threebdrs select(4:15)
# cbind new columns with Mean of 2016
<- cbind(yr2016_threebdrs, Year_2016_ThreeBdrs = round(rowMeans(yr2016_threebdrs))) yr_2016_threebdrs
\(~\)
# Select columns from 2017
<- streeteasy_threebrds %>%
yr2017_threebdrs select(16:27)
# cbind new columns with Mean of 2017
<- cbind(yr2017_threebdrs, Year_2017_ThreeBdrs = round(rowMeans(yr2017_threebdrs))) yr_2017_threebdrs
\(~\)
# Select columns from 2018
<- streeteasy_threebrds %>%
yr2018_threebdrs select(28:39)
# cbind new columns with Mean of 2018
<- cbind(yr2018_threebdrs, Year_2018_ThreeBdrs = round(rowMeans(yr2018_threebdrs))) yr_2018_threebdrs
\(~\)
# Select columns from 2019
<- streeteasy_threebrds %>%
yr2019_threebdrs select(40:51)
# cbind new columns with Mean of 2019
<- cbind(yr2019_threebdrs, Year_2019_ThreeBdrs = round(rowMeans(yr2019_threebdrs))) yr_2019_threebdrs
\(~\)
# Select columns from 2020
<- streeteasy_threebrds %>%
yr2020_threebdrs select(52:63)
# cbind new columns with Mean of 2020
<- cbind(yr2020_threebdrs, Year_2020_ThreeBdrs = round(rowMeans(yr2020_threebdrs))) yr_2020_threebdrs
\(~\)
# Select columns from 2021
<- streeteasy_threebrds %>%
yr2021_threebdrs select(64:73)
# cbind new columns with Mean of 2020
<- cbind(yr2021_threebdrs, Year_2021_ThreeBdrs = round(rowMeans(yr2021_threebdrs))) yr_2021_threebdrs
\(~\)
# cbind year 2016 for all apartment types
<- cbind(yr_2016_studio, yr_2016_onebdr, yr_2016_twobdr, yr_2016_threebdrs)
street_easy16 <- street_easy16[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]
street_easy16
# cbind year 2017 for all apartment types
<- cbind(yr_2017_studio, yr_2017_onebdr, yr_2017_twobdr, yr_2017_threebdrs)
street_easy17 <- street_easy17[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]
street_easy17
# cbind year 2018 for all apartment types
<- cbind(yr_2018_studio, yr_2018_onebdr, yr_2018_twobdr, yr_2018_threebdrs)
street_easy18 <- street_easy18[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]
street_easy18
# cbind year 2019 for all apartment types
<- cbind(yr_2019_studio, yr_2019_onebdr, yr_2019_twobdr, yr_2019_threebdrs)
street_easy19 <- street_easy19[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]
street_easy19
# cbind year 2020 for all apartment types
<- cbind(yr_2020_studio, yr_2020_onebdr, yr_2020_twobdr, yr_2020_threebdrs)
street_easy20 <- street_easy20[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]
street_easy20
# cbind year 2021 for all apartment types
<- cbind(yr_2021_studio, yr_2021_onebdr, yr_2021_twobdr, yr_2021_threebdrs)
street_easy21 <- street_easy21[-c(1:10, 12:21, 23:32, 34:43)] street_easy21
\(~\)
# cbind original table with new table and drop all columns not needed
<- cbind(streeteasy_all, street_easy16, street_easy17, street_easy18, street_easy19, street_easy20, street_easy21)
street_easy_All
# Dropping columns not needed
<- street_easy_All[-c(4:73)] street_easy_All
\(~\)
Another way to combine tables to explore
# cbind all tables with the mean and drop all columns not needed
# All Apt Types
<- cbind(yr_2016_all, yr_2017_all, yr_2018_all, yr_2019_all, yr_2020_all, yr_2021_all)
street_easyall <- street_easyall[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]
street_easyall
# Studio's only
<- cbind(yr_2016_studio, yr_2017_studio, yr_2018_studio, yr_2019_studio, yr_2020_studio, yr_2021_studio)
street_easystudio <- street_easystudio[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]
street_easystudio
# One Bedroom only
<- cbind(yr_2016_onebdr, yr_2017_onebdr, yr_2018_onebdr, yr_2019_onebdr, yr_2020_onebdr, yr_2021_onebdr)
street_easyonebdr <- street_easyonebdr[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]
street_easyonebdr
# Two Bedrooms only
<- cbind(yr_2016_twobdr, yr_2017_twobdr, yr_2018_twobdr, yr_2019_twobdr, yr_2020_twobdr, yr_2021_twobdr)
street_easytwobdr <- street_easytwobdr[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)]
street_easytwobdr
# Three Bedrooms only
<- cbind(yr_2016_threebdrs, yr_2017_threebdrs, yr_2018_threebdrs, yr_2019_threebdrs, yr_2020_threebdrs, yr_2021_threebdrs)
street_easythreebdrs <- street_easythreebdrs[-c(1:12, 14:25, 27:38, 40:51, 53:64, 66:75)] street_easythreebdrs
\(~\)
Another way to combine tables to explore
# cbind all Studio tables with original dataset and drop columns not needed
<- cbind(streeteasy_all, yr_2016_studio, yr_2017_studio, yr_2018_studio, yr_2019_studio, yr_2020_studio, yr_2021_studio)
Street_Easy_Studio <- Street_Easy_Studio[-c(4:85, 87:98, 100:111, 113:124, 126:137, 139:148)]
Street_Easy_Studio
# cbind all One Bedroom tables with original dataset and drop columns not needed
<- cbind(streeteasy_all, yr_2016_onebdr, yr_2017_onebdr, yr_2018_onebdr, yr_2019_onebdr, yr_2020_onebdr, yr_2021_onebdr)
Street_Easy_OneBdr <- Street_Easy_OneBdr[-c(4:85, 87:98, 100:111, 113:124, 126:137, 139:148)]
Street_Easy_OneBdr
# cbind all Two Bedrooms tables with original dataset and drop columns not needed
<- cbind(streeteasy_all, yr_2016_twobdr, yr_2017_twobdr, yr_2018_twobdr, yr_2019_twobdr, yr_2020_twobdr, yr_2021_twobdr)
Street_Easy_TwoBdr <- Street_Easy_TwoBdr[-c(4:85, 87:98, 100:111, 113:124, 126:137, 139:148)]
Street_Easy_TwoBdr
# cbind all 3 or more Bedrooms with original data set and drop columns not needed
<- cbind(streeteasy_all, yr_2016_threebdrs, yr_2017_threebdrs, yr_2018_threebdrs, yr_2019_threebdrs, yr_2020_threebdrs, yr_2021_threebdrs)
Street_Easy_ThreeBdrs <- Street_Easy_ThreeBdrs[-c(4:85, 87:98, 100:111, 113:124, 126:137, 139:148)] Street_Easy_ThreeBdrs
\(~\)
This data frame will be used to write .csv file to then be used in tableau. This was easier to explore since it was already categorized by apartment type with a subcategory of years.
# cbind original table with new table and drop all columns not needed
<- cbind(streeteasy_all, street_easyall, street_easystudio, street_easyonebdr, street_easytwobdr, street_easythreebdrs)
Street_Easy_All
# Dropping columns not needed
<- Street_Easy_All[-c(4:73)]
Street_Easy_All
# Replace 0 in Borough column with NYC
$Borough[Street_Easy_All$Borough=="0"]<-"NYC"
Street_Easy_Allhead(Street_Easy_All, n = 4)
## Area_Name Borough Area_Type Year_2016_All Year_2017_All
## 1 All Downtown Manhattan submarket 3765 3689
## 2 All Midtown Manhattan submarket 3516 3471
## 3 All Upper East Side Manhattan submarket 2979 2960
## 4 All Upper Manhattan Manhattan submarket 2363 2335
## Year_2018_All Year_2019_All Year_2020_All Year_2021_All Year_2016_Studio
## 1 3779 3956 3561 3583 2795
## 2 3474 3637 3249 3313 2613
## 3 2912 3057 2881 2570 2123
## 4 2343 2408 2351 2250 1645
## Year_2017_Studio Year_2018_Studio Year_2019_Studio Year_2020_Studio
## 1 2738 2779 2925 2605
## 2 2569 2559 2678 2404
## 3 2088 2094 2156 2042
## 4 1685 1730 1765 1746
## Year_2021_Studio Year_2016_OneBdr Year_2017_OneBdr Year_2018_OneBdr
## 1 2626 3590 3508 3564
## 2 2378 3509 3462 3456
## 3 1896 2831 2811 2799
## 4 1692 1927 1955 1967
## Year_2019_OneBdr Year_2020_OneBdr Year_2021_OneBdr Year_2016_TwoBdr
## 1 3804 3428 3519 4771
## 2 3611 3221 3342 5013
## 3 2920 2770 2522 3808
## 4 2026 2013 1925 2408
## Year_2017_TwoBdr Year_2018_TwoBdr Year_2019_TwoBdr Year_2020_TwoBdr
## 1 4527 4497 4727 4270
## 2 4883 4730 4953 4377
## 3 3858 3679 4015 3816
## 4 2422 2440 2504 2438
## Year_2021_TwoBdr Year_2016_ThreeBdrs Year_2017_ThreeBdrs Year_2018_ThreeBdrs
## 1 4114 7115 6478 6464
## 2 4793 6432 6339 6006
## 3 3325 8553 8146 7924
## 4 2360 3161 3129 3138
## Year_2019_ThreeBdrs Year_2020_ThreeBdrs Year_2021_ThreeBdrs
## 1 7224 6275 5873
## 2 6165 5393 5969
## 3 8632 8227 7044
## 4 3193 3045 2902
\(~\)
# Count of Boroughs
<- Street_Easy_All %>%
borough group_by(Borough) %>%
summarise(num=n()) %>%
arrange(desc(num))
head(borough, n = 6)
## # A tibble: 6 Ă— 2
## Borough num
## <chr> <int>
## 1 Queens 60
## 2 Brooklyn 55
## 3 Bronx 43
## 4 Manhattan 38
## 5 NYC 1
## 6 Staten Island 1
\(~\)
summary(Street_Easy_All)
## Area_Name Borough Area_Type Year_2016_All
## Length:198 Length:198 Length:198 Min. : 0.0
## Class :character Class :character Class :character 1st Qu.: 164.2
## Mode :character Mode :character Mode :character Median :1831.0
## Mean :1729.7
## 3rd Qu.:2589.0
## Max. :7509.0
## Year_2017_All Year_2018_All Year_2019_All Year_2020_All
## Min. : 0.0 Min. : 0.0 Min. : 0 Min. : 0.0
## 1st Qu.: 554.2 1st Qu.: 745.8 1st Qu.: 876 1st Qu.: 348.8
## Median :1925.5 Median :1926.5 Median :1975 Median :1957.5
## Mean :1798.4 Mean :1830.5 Mean :1902 Mean :1760.1
## 3rd Qu.:2554.5 3rd Qu.:2594.0 3rd Qu.:2675 3rd Qu.:2599.8
## Max. :7082.0 Max. :7539.0 Max. :7180 Max. :7078.0
## Year_2021_All Year_2016_Studio Year_2017_Studio Year_2018_Studio
## Min. : 0 Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 499 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0
## Median :1945 Median : 0.0 Median : 0.0 Median : 122.0
## Mean :1780 Mean : 825.8 Mean : 879.3 Mean : 912.9
## 3rd Qu.:2508 3rd Qu.:1652.8 3rd Qu.:1706.8 3rd Qu.:1778.2
## Max. :7330 Max. :3687.0 Max. :3482.0 Max. :3859.0
## Year_2019_Studio Year_2020_Studio Year_2021_Studio Year_2016_OneBdr
## Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0
## Median : 167.0 Median : 144.0 Median : 166.5 Median :1428
## Mean : 953.9 Mean : 919.2 Mean : 953.3 Mean :1341
## 3rd Qu.:1833.0 3rd Qu.:1776.8 3rd Qu.:1846.2 3rd Qu.:2126
## Max. :3983.0 Max. :3401.0 Max. :4043.0 Max. :6722
## Year_2017_OneBdr Year_2018_OneBdr Year_2019_OneBdr Year_2020_OneBdr
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
## Median :1526 Median :1488 Median :1662 Median :1676
## Mean :1364 Mean :1365 Mean :1468 Mean :1380
## 3rd Qu.:2173 3rd Qu.:2229 3rd Qu.:2278 3rd Qu.:2260
## Max. :6304 Max. :6410 Max. :6031 Max. :5426
## Year_2021_OneBdr Year_2016_TwoBdr Year_2017_TwoBdr Year_2018_TwoBdr
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
## Median :1660 Median : 1802 Median :1918 Median :1968
## Mean :1410 Mean : 1777 Mean :1780 Mean :1804
## 3rd Qu.:2186 3rd Qu.: 2706 3rd Qu.:2668 3rd Qu.:2680
## Max. :6001 Max. :10025 Max. :9675 Max. :9936
## Year_2019_TwoBdr Year_2020_TwoBdr Year_2021_TwoBdr Year_2016_ThreeBdrs
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
## Median : 2001 Median : 1870 Median : 2023 Median : 1592
## Mean : 1886 Mean : 1759 Mean : 1810 Mean : 2341
## 3rd Qu.: 2737 3rd Qu.: 2580 3rd Qu.: 2696 3rd Qu.: 3157
## Max. :10397 Max. :10850 Max. :10406 Max. :16737
## Year_2017_ThreeBdrs Year_2018_ThreeBdrs Year_2019_ThreeBdrs
## Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
## Median : 1717 Median : 2182 Median : 2123
## Mean : 2321 Mean : 2281 Mean : 2377
## 3rd Qu.: 3194 3rd Qu.: 3218 3rd Qu.: 3234
## Max. :16188 Max. :14669 Max. :16682
## Year_2020_ThreeBdrs Year_2021_ThreeBdrs
## Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0
## Median : 1498 Median : 1748
## Mean : 2183 Mean : 2227
## 3rd Qu.: 3184 3rd Qu.: 3011
## Max. :17185 Max. :17224
\(~\)
# Mean for yr 2016
round(mean(Street_Easy_All$Year_2016_All))
## [1] 1730
round(mean(Street_Easy_All$Year_2016_Studio))
## [1] 826
round(mean(Street_Easy_All$Year_2016_OneBdr))
## [1] 1341
round(mean(Street_Easy_All$Year_2016_TwoBdr))
## [1] 1777
round(mean(Street_Easy_All$Year_2016_ThreeBdrs))
## [1] 2341
\(~\)
# Mean for yr 2017
round(mean(Street_Easy_All$Year_2017_All))
## [1] 1798
round(mean(Street_Easy_All$Year_2017_Studio))
## [1] 879
round(mean(Street_Easy_All$Year_2017_OneBdr))
## [1] 1364
round(mean(Street_Easy_All$Year_2017_TwoBdr))
## [1] 1780
round(mean(Street_Easy_All$Year_2017_ThreeBdrs))
## [1] 2321
\(~\)
# Mean for yr 2018
round(mean(Street_Easy_All$Year_2018_All))
## [1] 1830
round(mean(Street_Easy_All$Year_2018_Studio))
## [1] 913
round(mean(Street_Easy_All$Year_2018_OneBdr))
## [1] 1365
round(mean(Street_Easy_All$Year_2018_TwoBdr))
## [1] 1804
round(mean(Street_Easy_All$Year_2018_ThreeBdrs))
## [1] 2281
\(~\)
# Mean for yr 2019
round(mean(Street_Easy_All$Year_2019_All))
## [1] 1902
round(mean(Street_Easy_All$Year_2019_Studio))
## [1] 954
round(mean(Street_Easy_All$Year_2019_OneBdr))
## [1] 1468
round(mean(Street_Easy_All$Year_2019_TwoBdr))
## [1] 1886
round(mean(Street_Easy_All$Year_2019_ThreeBdrs))
## [1] 2377
\(~\)
# Mean for yr 2020
round(mean(Street_Easy_All$Year_2020_All))
## [1] 1760
round(mean(Street_Easy_All$Year_2020_Studio))
## [1] 919
round(mean(Street_Easy_All$Year_2020_OneBdr))
## [1] 1380
round(mean(Street_Easy_All$Year_2020_TwoBdr))
## [1] 1759
round(mean(Street_Easy_All$Year_2020_ThreeBdrs))
## [1] 2183
\(~\)
# Mean for yr 2021
round(mean(Street_Easy_All$Year_2021_All))
## [1] 1780
round(mean(Street_Easy_All$Year_2021_Studio))
## [1] 953
round(mean(Street_Easy_All$Year_2021_OneBdr))
## [1] 1410
round(mean(Street_Easy_All$Year_2021_TwoBdr))
## [1] 1810
round(mean(Street_Easy_All$Year_2021_ThreeBdrs))
## [1] 2227
\(~\)
# Count of Area_Type
<- Street_Easy_All %>%
area_type group_by(Area_Type) %>%
summarise(num=n()) %>%
arrange(desc(num))
head(area_type, n = 5)
## # A tibble: 4 Ă— 2
## Area_Type num
## <chr> <int>
## 1 neighborhood 177
## 2 submarket 15
## 3 borough 5
## 4 city 1
\(~\)
This file will be helpful for Tableau Public
# commenting out so that it doesn't replicate in the system
#write.csv(Street_Easy_All, "/Users/letiix3/Desktop/Data-607/Final Project/csv files/Street_Easy_All.csv")
\(~\)
Rows: 5,000 Columns: 20
Codecademy + StreetEasy: sample of 5,000 rental listings in Manhattan, Brooklyn, and Queens from June 2016.
\(~\)
Headers | Description |
---|---|
rental_id | rental ID |
building_id | building ID |
rent | price of rent($) |
bedrooms | number of bedrooms |
bathrooms | number of bathrooms |
size_sqft | size in square feet |
min_to_subway | distance from subway station in minutes |
floor | floor number |
building_age_yrs | building’s age in years |
no_fee | does it have a broker fee? (0 for fee, 1 for no fee) |
has_rooftdeck | does it have a roof deck? (o for no, 1 for yes) |
has_washer_dryer | does it have washer/dryer in unit (0/1) |
has_doorman | does it have a doorman? (0/1) |
has_elevator | does it have an elevator? (0/1) |
has_dishwasher | does it have a dishwasher? (0/1) |
has_patio | does the building have a gym? (0/1) |
has_gym | does the building have a gym? (0/1) |
neighborhood | neighborhood (ex: Greenpoint) |
submarket | submarket (ex: North Brooklyn) |
borough | borough (ex: Brooklyn) |
\(~\)
# Codecademy + StreetEasy
<- read.csv("https://raw.githubusercontent.com/Codecademy/datasets/master/streeteasy/streeteasy.csv", header = TRUE) street_easy_cc
\(~\)
# commenting out so that it does not reproduce
#glimpse(street_easy_cc)
\(~\)
# Remove columns not needed
<- street_easy_cc[, -c(2, 6:17)]
street_easy_cc
# Rename columns
colnames(street_easy_cc) <- c("Rental_ID", "Rent_Price", "No_Bedrooms", "No_Bathrooms", "Neighborhood", "SubMarket", "Borough" )
head(street_easy_cc, n = 4)
## Rental_ID Rent_Price No_Bedrooms No_Bathrooms Neighborhood
## 1 1545 2550 0 1 Upper East Side
## 2 2472 11500 2 2 Greenwich Village
## 3 10234 3000 3 1 Astoria
## 4 2919 4500 1 1 Midtown
## SubMarket Borough
## 1 All Upper East Side Manhattan
## 2 All Downtown Manhattan
## 3 Northwest Queens Queens
## 4 All Midtown Manhattan
\(~\)
This file will be helpful for Tableau Public
# commenting out so that it doesn't replicate in the system
#write.csv(street_easy_cc, "/Users/letiix3/Desktop/Data-607/Final Project/csv files/street_easy_cc.csv")
\(~\)
\(~\)
summary(street_easy_cc)
## Rental_ID Rent_Price No_Bedrooms No_Bathrooms
## Min. : 1 Min. : 1250 Min. :0.000 Min. :0.000
## 1st Qu.: 2700 1st Qu.: 2750 1st Qu.:1.000 1st Qu.:1.000
## Median : 5456 Median : 3600 Median :1.000 Median :1.000
## Mean : 5527 Mean : 4537 Mean :1.396 Mean :1.322
## 3rd Qu.: 8306 3rd Qu.: 5200 3rd Qu.:2.000 3rd Qu.:2.000
## Max. :11349 Max. :20000 Max. :5.000 Max. :5.000
## Neighborhood SubMarket Borough
## Length:5000 Length:5000 Length:5000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
\(~\)
# Count of Bedrooms
<- street_easy_cc%>%
bedrooms group_by(No_Bedrooms)%>%
summarise(num=n())%>%
arrange(desc(num))
head(bedrooms)
## # A tibble: 6 Ă— 2
## No_Bedrooms num
## <dbl> <int>
## 1 1 2048
## 2 2 1438
## 3 0 832
## 4 3 496
## 5 4 103
## 6 1.5 28
\(~\)
# Count of Bathrooms
<- street_easy_cc%>%
bathrooms group_by(No_Bathrooms)%>%
summarise(num=n())%>%
arrange(desc(num))
head(bathrooms)
## # A tibble: 6 Ă— 2
## No_Bathrooms num
## <int> <int>
## 1 1 3572
## 2 2 1212
## 3 3 182
## 4 0 18
## 5 4 14
## 6 5 2
\(~\)
# Count of Neighborhood
<- street_easy_cc%>%
neighborhood group_by(Neighborhood)%>%
summarise(num=n())%>%
arrange(desc(num))
head(neighborhood, n = 6)
## # A tibble: 6 Ă— 2
## Neighborhood num
## <chr> <int>
## 1 Upper West Side 579
## 2 Upper East Side 500
## 3 Midtown East 460
## 4 Midtown West 314
## 5 Williamsburg 306
## 6 Financial District 268
\(~\)
# Count of Submarket
<- street_easy_cc%>%
submarketgroup_by(SubMarket)%>%
summarise(num=n())%>%
arrange(desc(num))
head(submarket)
## # A tibble: 6 Ă— 2
## SubMarket num
## <chr> <int>
## 1 All Downtown 1229
## 2 All Midtown 1001
## 3 All Upper West Side 579
## 4 All Upper East Side 505
## 5 North Brooklyn 347
## 6 Northwest Brooklyn 311
\(~\)
# Count of Borough
<- street_easy_cc%>%
borough group_by(Borough)%>%
summarise(num=n())%>%
arrange(desc(num))
head(borough)
## # A tibble: 3 Ă— 2
## Borough num
## <chr> <int>
## 1 Manhattan 3539
## 2 Brooklyn 1013
## 3 Queens 448
\(~\)
# Mean for Rent Price
round(mean(street_easy_cc$Rent_Price))
## [1] 4537
\(~\)
Rows: 206 Columns: 62
\(~\)
Apartment List State data set from 2017 - October 2021.
# Load Data
<- read.csv("https://raw.githubusercontent.com/letisalba/Data-607/main/Final%20Project/csv%20files/State_AptList_Rent.csv", header=TRUE, stringsAsFactors=FALSE, na.strings=c(""," ","NA")); apt_list_state
\(~\)
# Rename columns
colnames(apt_list_state) <- c("State_Name", "FIPS_Code", "Population", "Bedroom_Size", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020", "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")
# Replace all NA values with 0
<- apt_list_state %>% replace(is.na(.), 0) apt_list_state
\(~\)
# Make 5 - 62 numeric
<- apt_list_state %>%
apt_list_state mutate_at(c(5:62), as.numeric)
\(~\)
# Rename values in Bedroom_Size column to remove punctuation
<- recode(apt_list_state$Bedroom_Size,
Apt_list_state "_Overall" = "Overall",
"_Studio" = "Studio")
head(Apt_list_state)
## [1] "Overall" "Studio" "1br" "2br" "3br" "4br"
\(~\)
# cbind tables
<- cbind(apt_list_state, Apt_list_state)
Apt_List_state
# Drop column not needed
<- Apt_List_state[, -c(4)]
Apt_List_state
# Rename column Apt_list_state to Bedroom_size
colnames(Apt_List_state) <- c("State_Name", "FIPS_Code", "Population", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020", "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021", "Bedroom_size")
# Reorder columns
<- c("State_Name", "FIPS_Code", "Population", "Bedroom_size", "Jan_2017", "Feb_2017", "Mar_2017", "Apr_2017", "May_2017", "Jun_2017", "Jul_2017", "Aug_2017", "Sep_2017", "Oct_2017", "Nov_2017", "Dec_2017", "Jan_2018", "Feb_2018", "Mar_2018", "Apr_2018", "May_2018", "Jun_2018", "Jul_2018", "Aug_2018", "Sep_2018", "Oct_2018", "Nov_2018", "Dec_2018", "Jan_2019", "Feb_2019", "Mar_2019", "Apr_2019", "May_2019", "Jun_2019", "Jul_2019", "Aug_2019", "Sep_2019", "Oct_2019", "Nov_2019", "Dec_2019", "Jan_2020", "Feb_2020", "Mar_2020", "Apr_2020", "May_2020", "Jun_2020", "Jul_2020", "Aug_2020", "Sep_2020", "Oct_2020", "Nov_2020", "Dec_2020", "Jan_2021", "Feb_2021", "Mar_2021", "Apr_2021", "May_2021", "Jun_2021", "Jul_2021", "Aug_2021", "Sep_2021", "Oct_2021")
col_order
<- Apt_List_state[, col_order] Apt_List_state
\(~\)
# Select columns from 2017
<- Apt_List_state %>%
rent_state_2017 select(5:16)
# cbind new columns with Mean of 2017
<- cbind(rent_state_2017, Average_Rent_State_17 = round(rowMeans(rent_state_2017))) rent_state_17
\(~\)
# Select columns from 2018
<- Apt_List_state %>%
rent_state_2018 select(17:28)
# cbind new columns with Mean of 2018
<- cbind(rent_state_2018, Average_Rent_State_18 = round(rowMeans(rent_state_2018))) rent_state_18
\(~\)
# Select columns from 2019
<- Apt_List_state %>%
rent_state_2019 select(29:40)
# cbind new columns with Mean of 2019
<- cbind(rent_state_2019, Average_Rent_State_19 = round(rowMeans(rent_state_2019))) rent_state_19
\(~\)
# Select columns from 2020
<- Apt_List_state %>%
rent_state_2020 select(41:52)
# cbind new columns with Mean of 2019
<- cbind(rent_state_2020, Average_Rent_State_20 = round(rowMeans(rent_state_2020))) rent_state_20
\(~\)
# Select columns from 2021
<- Apt_List_state %>%
rent_state_2021 select(53:62)
# cbind new columns with Mean of 2021
<- cbind(rent_state_2021, Average_Rent_State_21 = round(rowMeans(rent_state_2021))) rent_state_21
\(~\)
# cbind all tables with the mean and drop all columns not needed
<- cbind(Apt_List_state, rent_state_17, rent_state_18, rent_state_19, rent_state_20, rent_state_21)
Apartment_List_State <- Apartment_List_State[-c(5:74, 76:87, 89:100, 102:113, 115:124)]
Apartment_List_State head(Apartment_List_State, n = 3)
## State_Name FIPS_Code Population Bedroom_size Average_Rent_State_17
## 1 Alabama 1 4864680 Overall 835
## 2 Alabama 1 4864680 Studio 639
## 3 Alabama 1 4864680 1br 649
## Average_Rent_State_18 Average_Rent_State_19 Average_Rent_State_20
## 1 852 890 917
## 2 652 681 702
## 3 662 692 713
## Average_Rent_State_21
## 1 996
## 2 763
## 3 774
\(~\)
This file will be helpful for Tableau Public
# commenting out so that it doesn't replicate in the system
#write.csv(Apartment_List_State, "/Users/letiix3/Desktop/Data-607/Final Project/csv files/Apartment_List_State.csv")
\(~\)
\(~\)
summary(Apartment_List_State)
## State_Name FIPS_Code Population Bedroom_size
## Length:306 Min. : 1.00 Min. : 581836 Length:306
## Class :character 1st Qu.:16.00 1st Qu.: 1687809 Class :character
## Mode :character Median :29.00 Median : 4440204 Mode :character
## Mean :28.96 Mean : 6331432
## 3rd Qu.:42.00 3rd Qu.: 7294336
## Max. :56.00 Max. :39148760
## Average_Rent_State_17 Average_Rent_State_18 Average_Rent_State_19
## Min. : 546.0 Min. : 546.0 Min. : 560
## 1st Qu.: 806.2 1st Qu.: 824.8 1st Qu.: 851
## Median :1020.0 Median :1044.0 Median :1076
## Mean :1090.7 Mean :1120.4 Mean :1152
## 3rd Qu.:1263.5 3rd Qu.:1304.8 3rd Qu.:1351
## Max. :2928.0 Max. :3176.0 Max. :3254
## Average_Rent_State_20 Average_Rent_State_21
## Min. : 571.0 Min. : 600.0
## 1st Qu.: 864.8 1st Qu.: 928.5
## Median :1088.0 Median :1153.5
## Mean :1162.3 Mean :1241.0
## 3rd Qu.:1349.2 3rd Qu.:1484.5
## Max. :3211.0 Max. :3391.0
\(~\)
# Count of Bedroom_size
<- Apartment_List_State%>%
Bdr_size group_by(Bedroom_size)%>%
summarise(num=n())%>%
arrange(desc(num))
head(Bdr_size)
## # A tibble: 6 Ă— 2
## Bedroom_size num
## <chr> <int>
## 1 1br 51
## 2 2br 51
## 3 3br 51
## 4 4br 51
## 5 Overall 51
## 6 Studio 51
\(~\)
# Count of State and filtering out NY
<- Apartment_List_State %>%
state group_by(State_Name) %>%
filter(State_Name == "New York") %>%
summarise(num=n()) %>%
arrange(desc(num))
head(state)
## # A tibble: 1 Ă— 2
## State_Name num
## <chr> <int>
## 1 New York 6
\(~\)
# Mean yr 2017
round(mean(Apartment_List_State$Average_Rent_State_17))
## [1] 1091
\(~\)
# Mean yr 2018
round(mean(Apartment_List_State$Average_Rent_State_18))
## [1] 1120
\(~\)
# Mean yr 2019
round(mean(Apartment_List_State$Average_Rent_State_19))
## [1] 1152
\(~\)
# Mean yr 2020
round(mean(Apartment_List_State$Average_Rent_State_20))
## [1] 1162
\(~\)
# Mean yr 2021
round(mean(Apartment_List_State$Average_Rent_State_21))
## [1] 1241
\(~\)
\(~\)
# Libraries
library(ggplot2)
library(hrbrthemes)
## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
## Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
## if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
::import_roboto_condensed() hrbrthemes
## You will likely need to install these fonts on your system as well.
##
## You can find them in [/Users/letiix3/Library/R/x86_64/4.1/library/hrbrthemes/fonts/roboto-condensed]
library(viridis)
## Loading required package: viridisLite
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
\(~\)
# Bar plot for all Rent Averages in the boroughs
%>%
Street_Easy_All mutate(Averages = Year_2016_All + Year_2017_All + Year_2018_All + Year_2019_All + Year_2020_All + Year_2021_All + Year_2016_Studio + Year_2017_Studio + Year_2018_Studio + Year_2019_Studio + Year_2020_Studio + Year_2021_Studio + Year_2016_OneBdr + Year_2017_OneBdr + Year_2018_OneBdr + Year_2019_OneBdr + Year_2020_OneBdr + Year_2021_OneBdr +
+ Year_2017_TwoBdr + Year_2018_TwoBdr +
Year_2016_TwoBdr + Year_2020_TwoBdr + Year_2021_TwoBdr +
Year_2019_TwoBdr + Year_2017_ThreeBdrs + Year_2018_ThreeBdrs +
Year_2016_ThreeBdrs + Year_2020_ThreeBdrs + Year_2021_ThreeBdrs) %>% # Creating cumulative time
Year_2019_ThreeBdrs ggplot(aes(x = Borough, y = Averages, fill = Averages)) + # Passing plot arguments
geom_bar(stat = "Identity") # Specifying the type of plot
\(~\)
%>%
Street_Easy_All pivot_longer(4:9, names_to = "Averages", values_to = "Value") %>%
ggplot(aes(x = Averages)) +
geom_bar() +
facet_wrap(vars(Averages), ncol = 3) +
labs(x = "Rent Averages", y = "Count", title = "All Apartment Types")
\(~\)
%>%
Street_Easy_Studio pivot_longer(4:9, names_to = "Averages", values_to = "Value") %>%
ggplot(aes(x = Averages)) +
geom_bar() +
facet_wrap(vars(Averages), ncol = 3) +
labs(x = "Rent Averages", y = "Count", title = "Studio's")
\(~\)
# Plot for original dataset of All dataset
%>%
streeteasy_all pivot_longer(4:15, names_to = "Averages", values_to = "Value") %>%
ggplot(aes(x = Value, color = "blue")) +
facet_wrap(vars(Averages), ncol = 3) +
geom_point(stat = "count") +
geom_line(stat = "count") +
labs(x = "Rent Cost per Month", y = "Count")
\(~\)
# Count of Borough's for Codecademy dataset
%>%
street_easy_cc ggplot(aes(x = Borough, fill = Borough)) +
geom_bar(stat = "count") +
labs(x = "Boroughs", y = "Count")
\(~\)
# Codecademy Count of Rent Prices
%>%
street_easy_cc ggplot(aes(x=Rent_Price)) +
geom_histogram(binwidth=15, fill="#69b3a2", color="#e9ecef", alpha=0.9) +
ggtitle("Rent Prices") +
theme_ipsum() +
theme(
plot.title = element_text(size=15)
)
\(~\)
# Density plot for Borough and Rent Price in Codecademy
%>%
street_easy_cc ggplot(aes(x = Rent_Price, group = Borough, fill = Borough)) +
geom_density(adjust = 1.0) +
theme_ipsum() +
facet_wrap(~Borough) +
theme(
legend.position="none",
panel.spacing = unit(0.1, "lines"),
axis.ticks.x=element_blank()
)
\(~\)
# Line plot for Rent Price and Number of Bedrooms in Codecademy
%>%
street_easy_cc tail(10) %>%
ggplot( aes(x = Rent_Price, y = No_Bedrooms)) +
geom_line() +
geom_point()
\(~\)
# Plot .long for Codecademy
<- street_easy_cc %>%
street_easy_cc.long select("Neighborhood", "Rent_Price") %>%
pivot_longer(-Neighborhood, names_to = "Year", values_to = "Rent_Price")
ggplot(street_easy_cc.long, aes(Neighborhood, Rent_Price, colour = Year)) + geom_line()
\(~\)
# This plot came out a bit cluttered
<- street_easy_cc %>% select("Borough", "No_Bedrooms", "Rent_Price")
rental_price plot_ly(rental_price, x = ~No_Bedrooms, y = ~Rent_Price, color = ~Borough, type = 'scatter', mode = 'lines')
\(~\)
Each data set required a bit of wrangling but after some exploration I found the following results:
\(~\)
\(~\)
When taking into consideration the apartment types within the StreetEasy data set the rent averages per year were:
\(~\)
\(~\)
How does that compare to the entire United States? From the table below, overall, New York has a higher average rent from the years 2017 - 2021. There was about a 39% difference between the years 2017 - 2019 but from 2020 - 2021 the difference dropped to about 30% of rent increase.
\(~\)
# Comparison of all apartment types for year 2017 - 2021 Apartment List and StreetEasy
<- data.frame(Year = c(2017, 2018, 2019, 2020, 2021),
comparison Avg_Rent_US = c(1091, 1120, 1152, 1162, 1241),
New_York = c(1798, 1830, 1902, 1760, 1780))
comparison
## Year Avg_Rent_US New_York
## 1 2017 1091 1798
## 2 2018 1120 1830
## 3 2019 1152 1902
## 4 2020 1162 1760
## 5 2021 1241 1780
\(~\)
# Percent difference for rent averages
# percent_diff <- ((new_avg - old_avg) / new_avg) * 100
<- ((1798 - 1091) / 1798) * 100
percent_diff17 percent_diff17
## [1] 39.32147
<- ((1830 - 1120) / 1830) * 100
percent_diff18 percent_diff18
## [1] 38.79781
<- ((1902 - 1152) / 1902) * 100
percent_diff19 percent_diff19
## [1] 39.43218
<- ((1760 - 1162) / 1760) * 100
percent_diff20 percent_diff20
## [1] 33.97727
<- ((1780 - 1241) / 1780) * 100
percent_diff21 percent_diff21
## [1] 30.2809
\(~\)
Digging deeper, using the Apartment List data set, I am comparing the years 2019 - 2021 with the average cost of each apartment types and obtained the results below:
# Comparison of all apartment types for year 2019 - 2021 from Apartment List
<- data.frame(Apt_Type = c("Studio", "One_Bdr", "Two_Bdrs", "Three_Bdrs", "Four_Bdrs"),
comparison2 "Yr_2019" = c(853, 912, 1111, 1336, 1587),
"Yr_2020" = c(860, 919, 1121, 1349, 1601),
"Yr_2021" = c(917, 981, 1197, 1441, 1710))
comparison2
## Apt_Type Yr_2019 Yr_2020 Yr_2021
## 1 Studio 853 860 917
## 2 One_Bdr 912 919 981
## 3 Two_Bdrs 1111 1121 1197
## 4 Three_Bdrs 1336 1349 1441
## 5 Four_Bdrs 1587 1601 1710
\(~\)
For all 5 apartment types there is a small increase in rent of about 7% from 2019 to 2021.
\(~\)
# Percent difference for rent averages from 2019 and 2021
# percent_diff <- ((new_avg - old_avg) / new_avg) * 100
<- ((917 - 853) / 917) * 100
studio studio
## [1] 6.97928
<- ((981 - 912) / 981) * 100
one_bdr one_bdr
## [1] 7.033639
<- ((1197 - 1111) / 1197) * 100
two_bdr two_bdr
## [1] 7.184628
<- ((1441 - 1336) / 1441) * 100
three_bdr three_bdr
## [1] 7.286607
<- ((1710 - 1587) / 1710) * 100
four_bdr four_bdr
## [1] 7.192982
\(~\)
After importing my data sets into Tableau Public I was able to create more visuals that led to the following conclusions:
The pandemic has affected rent prices all throughout the United States. Although the most expensive states saw rent price drops between 2020 and 2021, there have been increases in average rent as well. New York has one of the highest average in rent prices nationwide. Not only that but based on these numbers I predict that apartment rental prices will continue to rise an average of about 7% within the next two years.
\(~\)
While thinking about this project, I made a couple of assumptions. The first being the timeline I should consider, if 2018 - 2021 would suffice. While exploring the data I decided that I had to expand my search and opted for 2016 - 2021. I also thought I would see a larger decrease in rent for 2020 and into 2021 since media coverage has covered how much rent prices in New York and throughout the 5 boroughs have decreased. One assumption that was true was Manhattan having the highest average rent price in NYC.
\(~\)
\(~\)
\(~\)
\(~\)
\(~\)
\(~\)
\(~\)