Central Tendency for Housing Data in R

In this project, you will find the mean, median, and mode cost of one-bedroom apartments in three of the five New York City boroughs: Brooklyn, Manhattan, and Queens.

Using your findings, you will draw conclusions about the cost of living in each of the boroughs. We will also discuss an important assumption that we make when we point out differences between the boroughs.

We worked with Streeteasy.com to collect this data. While we will only focus on the cost of one-bedroom apartments, the dataset includes a lot more information if you’re interested in asking your own questions about the Brooklyn, Manhattan, and Queens housing market.

# Load libraries
library(readr)
library(dplyr)
library(DescTools)
# Read in housing data
brooklyn_one_bed <- read_csv('brooklyn-one-bed.csv')
brooklyn_price <- brooklyn_one_bed$rent

manhattan_one_bed <- read_csv('manhattan-one-bed.csv')
manhattan_price <- manhattan_one_bed$rent

queens_one_bed <- read_csv('queens-one-bed.csv')
queens_price <- queens_one_bed$rent

Find the Mean

#Calculate Mean
brooklyn_mean<-mean(brooklyn_price)
brooklyn_mean
[1] 3327.404
manhattan_mean<-mean(manhattan_price)
manhattan_mean
[1] 3993.477
queens_mean<-mean(queens_price)
queens_mean
[1] 2346.254

Find the Median

#Calculate Median
brooklyn_median<-median(brooklyn_price)
brooklyn_median
[1] 3000
manhattan_median<-median(manhattan_price)
manhattan_median
[1] 3800
queens_median<-median(queens_price)
queens_median
[1] 2200

Find the Mode

#Calculate Mode
brooklyn_mode<-Mode(brooklyn_price)
brooklyn_mode
[1] 2500
attr(,"freq")
[1] 26
manhattan_mode<-Mode(manhattan_price)
manhattan_mode
[1] 3500
attr(,"freq")
[1] 56
queens_mode<-Mode(queens_price)
queens_mode
[1] 1750
attr(,"freq")
[1] 11
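
The output above shows that Mode() reports the most frequent rent together with how often it occurs (the "freq" attribute). To make that idea concrete, here is a minimal base R sketch of the same calculation. The helper name simple_mode and the toy_rents vector are hypothetical and are not part of DescTools or the Streeteasy files.

```r
# Hypothetical helper (not part of DescTools): most frequent value(s) of x
simple_mode <- function(x) {
  counts <- table(x)                     # frequency of each distinct value
  top <- counts[counts == max(counts)]   # keep every value tied for the top count
  list(value = as.numeric(names(top)), freq = as.integer(max(counts)))
}

# Hypothetical rents, not taken from the Streeteasy data
toy_rents <- c(1750, 2000, 1750, 2200, 2500, 2000, 1750)
simple_mode(toy_rents)
```

In this toy vector, 1750 appears three times, so it is the mode. If several values tie for the highest count, this sketch returns all of them.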

It looks like the mean cost of a one-bedroom apartment is highest in Manhattan and lowest in Queens. The same ordering holds for the median and mode values as well.

While the mode is not the most important indicator of centrality, the mean, median, and mode fall within several hundred dollars of one another in each borough, which suggests the data is centered roughly around:

  • $3,300 for Brooklyn
  • $3,900 for Manhattan
  • $2,300 for Queens
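
One way to check how close the three statistics really are is to compute the gap between the largest and smallest of them for each borough. The sketch below does this with the values copied from the output earlier in this project. The data frame name centers and the column name spread are our own choices.

```r
# Values copied from the mean, median, and mode output earlier in this project
centers <- data.frame(
  borough = c("Brooklyn", "Manhattan", "Queens"),
  mean    = c(3327.404, 3993.477, 2346.254),
  median  = c(3000, 3800, 2200),
  mode    = c(2500, 3500, 1750)
)

# Gap between the largest and smallest of the three statistics per borough
centers$spread <- pmax(centers$mean, centers$median, centers$mode) -
  pmin(centers$mean, centers$median, centers$mode)
centers
```

The spread is about $830 for Brooklyn, about $490 for Manhattan, and about $600 for Queens. In every borough the mean is the largest of the three statistics and the mode is the smallest.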

We assumed that the data from Streeteasy is representative of housing prices for the entire borough. Given that Streeteasy is only used by a subset of property owners, this is not a fair assumption. A quick search on rentcafe.com will tell you the averages are more like:

  • $2,695 for Brooklyn one-bedroom apartments
  • $4,188 for Manhattan one-bedroom apartments
  • $2,178 for Queens one-bedroom apartments
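
To see the size and direction of the disagreement, we can put our Streeteasy means next to the rentcafe.com averages listed above. This is only a sketch: the data frame and column names are our own, and the numbers are copied from this project rather than recomputed from either source.

```r
# Streeteasy means from this project and the rentcafe.com averages quoted above
comparison <- data.frame(
  borough         = c("Brooklyn", "Manhattan", "Queens"),
  streeteasy_mean = c(3327.404, 3993.477, 2346.254),
  rentcafe_avg    = c(2695, 4188, 2178)
)

# Positive values mean our Streeteasy mean is higher than the rentcafe.com average
comparison$difference <- comparison$streeteasy_mean - comparison$rentcafe_avg
comparison
```

Our Streeteasy mean is roughly $630 above the rentcafe.com average in Brooklyn and roughly $170 above it in Queens, but roughly $195 below it in Manhattan.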

This is an interesting finding. Why might the rentcafe.com average be higher than our Streeteasy mean in Manhattan, but lower in Brooklyn and Queens?

Although we don’t have the answer to this question, it’s worth thinking about the possible differences between our Streeteasy data and the data rentcafe.com draws on.

Histograms

library(ggplot2)
#Histograms
b <- ggplot(brooklyn_one_bed, aes(x = rent)) +
  geom_histogram(binwidth = 100) +
  # Dashed vertical lines mark the mean (blue), median (red), and mode (green)
  geom_vline(xintercept = brooklyn_mean, color = "blue",
             linetype = "dashed", size = 1) +
  geom_vline(xintercept = brooklyn_median, color = "red",
             linetype = "dashed", size = 1) +
  geom_vline(xintercept = brooklyn_mode, color = "green",
             linetype = "dashed", size = 1) +
  labs(title = "Monthly Rent in Brooklyn")
b


m <- ggplot(manhattan_one_bed, aes(x = rent)) +
  geom_histogram(binwidth = 100) +
  # Dashed vertical lines mark the mean (blue), median (red), and mode (green)
  geom_vline(xintercept = manhattan_mean, color = "blue",
             linetype = "dashed", size = 1) +
  geom_vline(xintercept = manhattan_median, color = "red",
             linetype = "dashed", size = 1) +
  geom_vline(xintercept = manhattan_mode, color = "green",
             linetype = "dashed", size = 1) +
  labs(title = "Monthly Rent in Manhattan")
m


q <- ggplot(queens_one_bed, aes(x = rent)) +
  geom_histogram(binwidth = 40) +
  # Dashed vertical lines mark the mean (blue), median (red), and mode (green)
  geom_vline(xintercept = queens_mean, color = "blue",
             linetype = "dashed", size = 1) +
  geom_vline(xintercept = queens_median, color = "red",
             linetype = "dashed", size = 1) +
  geom_vline(xintercept = queens_mode, color = "green",
             linetype = "dashed", size = 1) +
  labs(title = "Monthly Rent in Queens")

q
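
The histograms above mark the mean, median, and mode with colored lines but do not show a legend. If you want one, a possible approach is to put the three statistics in their own small data frame and map color to the statistic name inside aes(). The sketch below uses hypothetical rents rather than the Streeteasy data, and the names rents and stats are our own.

```r
library(ggplot2)

# Hypothetical rents standing in for one borough's data
rents <- data.frame(rent = c(1750, 1750, 2000, 2200, 2400, 2600, 3500))

# One row per statistic so that each line gets its own legend entry
stats <- data.frame(
  statistic = c("Mean", "Median", "Mode"),
  value = c(mean(rents$rent),
            median(rents$rent),
            as.numeric(names(which.max(table(rents$rent)))))
)

p <- ggplot(rents, aes(x = rent)) +
  geom_histogram(binwidth = 100) +
  # Mapping color inside aes() is what makes ggplot2 draw a legend
  geom_vline(data = stats,
             aes(xintercept = value, color = statistic),
             linetype = "dashed") +
  scale_color_manual(values = c(Mean = "blue", Median = "red", Mode = "green")) +
  labs(title = "Monthly Rent (hypothetical data)", color = "Statistic")
p
```

Swapping in a borough's data frame and its three computed statistics should give the same kind of plot for the real data.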

# Don't look below here
# Mean
if(exists('brooklyn_mean')) {
  print(paste("The mean price in Brooklyn is" , round(brooklyn_mean, digits=2))) 
}else{
    print("The mean price in Brooklyn is not yet defined.")
}
[1] "The mean price in Brooklyn is 3327.4"
if(exists("manhattan_mean")) {
    print(paste("The mean price in Manhattan is", round(manhattan_mean,digits=2)))
} else {
    print("The mean price in Manhattan is not yet defined.")
}
[1] "The mean price in Manhattan is 3993.48"
if(exists("queens_mean")) {
    print(paste("The mean price in Queens is" , round(queens_mean,digits=2)))
} else {
  print("The mean price in Queens is not yet defined.")
}   
[1] "The mean price in Queens is 2346.25"
    
# Median
if(exists("brooklyn_median")) {
  print(paste("The median price in Brooklyn is" , brooklyn_median)) 
}else{
    print("The median price in Brooklyn is not yet defined.")
}
[1] "The median price in Brooklyn is 3000"
if(exists("manhattan_median")) {
    print(paste("The median price in Manhattan is", manhattan_median))
} else {
    print("The median price in Manhattan is not yet defined.")
}
[1] "The median price in Manhattan is 3800"
if(exists("queens_median")) {
    print(paste("The median price in Queens is" , queens_median))
} else {
  print("The median price in Queens is not yet defined.")
} 
[1] "The median price in Queens is 2200"
    
#Mode
if(exists("brooklyn_mode")) {
  print(paste("The mode price in Brooklyn is" , brooklyn_mode)) 
}else{
    print("The mode price in Brooklyn is not yet defined.")
}
[1] "The mode price in Brooklyn is 2500"
if(exists("manhattan_mode")) {
    print(paste("The mode price in Manhattan is", manhattan_mode))
} else {
    print("The mode price in Manhattan is not yet defined.")
}
[1] "The mode price in Manhattan is 3500"
if(exists("queens_mode")) {
    print(paste("The mode price in Queens is" , queens_mode))
} else {
  print("The mode price in Queens is not yet defined.")
} 
[1] "The mode price in Queens is 1750"
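
The checks above repeat the same pattern nine times, which makes it easy for a variable name to be copied incorrectly. A possible way to reduce the repetition is a small helper that takes the variable name as a string. The function report_stat below is hypothetical and is not part of the original checks. It is demonstrated on a stand-in variable rather than the real results.

```r
# Hypothetical helper: build the message for a statistic if it has been defined
report_stat <- function(var_name, label) {
  if (exists(var_name)) {
    value <- get(var_name)
    # as.numeric() drops attributes such as "freq"; [1] keeps the first mode
    paste(label, "is", round(as.numeric(value)[1], digits = 2))
  } else {
    paste(label, "is not yet defined.")
  }
}

# Stand-in variable for demonstration only
demo_mean <- 3327.404
print(report_stat("demo_mean", "The mean price in Brooklyn"))
print(report_stat("demo_value_never_defined", "The mode price in Queens"))
```

Because the name is passed once and used for both the existence check and the lookup, the check and the printed value cannot refer to different variables.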
