Load data saved in my working directory

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
setwd("C:/Documents - Copy/PERSONAL/Data 110_MC_Class")

project0 <- read_csv("2019_Rental_Facility_Occupancy_Survey_Results.csv")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   `Community Name` = col_character(),
##   `Community Address` = col_character(),
##   `Bedroom Types` = col_character(),
##   `Average Rent 2015` = col_double(),
##   `Average Rent 2016` = col_double(),
##   `Average Rent 2017` = col_double(),
##   `Average Rent 2018` = col_double(),
##   `Average Rent 2019` = col_double(),
##   `Percent Change From Previous Year 2018-2019` = col_double()
## )

Clean the data

names(project0) <- tolower(names(project0))
names(project0) <- gsub(" ", "_", names(project0))
str(project0)
## spec_tbl_df [1,100 x 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ community_name                             : chr [1:1100] "Chateau Apartments" "The Henri" "Corrigan Square" "Dorset Apartments" ...
##  $ community_address                          : chr [1:1100] "9727 MT PISGAH RD SILVER SPRING MD 20903" "11870 GRAND PARK AVE ROCKVILLE MD 20852" "8511 SNOUFFER SCHOOL RD GAITHERSBURG MD 20879" "4757 CHEVY CHASE DR CHEVY CHASE MD 20815" ...
##  $ bedroom_types                              : chr [1:1100] "efficiency" "efficiency" "1 bedroom" "2 bedroom" ...
##  $ average_rent_2015                          : num [1:1100] 1249 NA 1032 1508 997 ...
##  $ average_rent_2016                          : num [1:1100] NA NA 1224 1523 1026 ...
##  $ average_rent_2017                          : num [1:1100] 1299 1606 1073 1508 782 ...
##  $ average_rent_2018                          : num [1:1100] 1323 1567 1093 1520 716 ...
##  $ average_rent_2019                          : num [1:1100] 1594 1845 1276 1772 834 ...
##  $ percent_change_from_previous_year_2018-2019: num [1:1100] 0.205 0.177 0.167 0.166 0.165 0.163 0.158 0.146 0.14 0.139 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   `Community Name` = col_character(),
##   ..   `Community Address` = col_character(),
##   ..   `Bedroom Types` = col_character(),
##   ..   `Average Rent 2015` = col_double(),
##   ..   `Average Rent 2016` = col_double(),
##   ..   `Average Rent 2017` = col_double(),
##   ..   `Average Rent 2018` = col_double(),
##   ..   `Average Rent 2019` = col_double(),
##   ..   `Percent Change From Previous Year 2018-2019` = col_double()
##   .. )
view(project0)
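The str() output shows that the last column name still contains a hyphen (2018-2019), so it has to be wrapped in backticks whenever it is used in code. A minimal sketch of one possible alternative is shown here; the regular expression is an assumption of this sketch and not part of the original cleaning step, and raw_names is a hypothetical stand-in for names(project0).

```r
# Hypothetical stand-in for names(project0) right after read_csv()
raw_names <- c("Community Name",
               "Average Rent 2019",
               "Percent Change From Previous Year 2018-2019")

# Lower-case first, then replace every run of characters that is not a
# lower-case letter or digit (spaces, hyphens) with a single underscore
clean_names <- tolower(raw_names)
clean_names <- gsub("[^a-z0-9]+", "_", clean_names)
clean_names
```

With names cleaned this way, the percent-change column can be referenced without backticks.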

Learning about the data

How is the data distributed between the bedroom types?

Filter the dataset by bedroom type

Filter for 1 bedroom

onebed <- project0 %>%
  filter(bedroom_types == "1 bedroom")
view(onebed)

There are 404 entries for 1 bedroom.

Filter for 2 bedroom

twobed <- project0 %>%
  filter(bedroom_types == "2 bedroom")
view(twobed)

There are 413 entries for 2 bedroom.

Filter for 3 bedroom

threebed <- project0 %>%
  filter(bedroom_types == "3 bedroom")
view(threebed)

There are 167 entries for 3 bedroom.

Filter for 4 bedroom

fourbed <- project0 %>%
  filter(bedroom_types == "4 bedroom")
view(fourbed)

There are 19 entries for 4 bedroom.

Filter for “efficiency” bedroom type

efficiency <- project0 %>%
  filter(bedroom_types == "efficiency")
view(efficiency)

There are 97 entries for the “efficiency” bedroom type.
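The five filter steps above read each count off a separate table. As a sketch of a more compact route, dplyr's count() can tally every bedroom type in one call. The toy data frame below is a made-up stand-in for project0, so its counts are not the survey's counts.

```r
library(dplyr)

# Made-up stand-in for project0 (the real data has 1,100 rows)
toy <- data.frame(
  bedroom_types = c("1 bedroom", "2 bedroom", "2 bedroom",
                    "efficiency", "1 bedroom", "2 bedroom")
)

# One row per bedroom type, most frequent first
bedroom_counts <- toy %>%
  count(bedroom_types, sort = TRUE)
bedroom_counts
```

Run on project0 instead of toy, the same call would be expected to return the counts reported above in a single table.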

Do a bar chart of bedroom types

ggplot(data = project0) +
  geom_bar(mapping = aes(x = bedroom_types, fill = bedroom_types)) +
  ggtitle("Distribution of Bedroom Types")

The bar chart shows that 1-bedroom and 2-bedroom units are the most frequent bedroom types, followed by 3-bedroom units and then efficiencies; 4-bedroom residences are the least frequent.

My analysis questions

How does the average rent vary by bedroom type?

Use a boxplot of the 2018 average rent, as an example, to answer this question.

boxpl <- project0 %>%
  ggplot() +
  # Map bedroom type to x so that each box is labelled on the axis
  geom_boxplot(aes(x = bedroom_types, y = average_rent_2018, fill = bedroom_types)) +
  ggtitle("Average Rent 2018 by Bedroom Types")
boxpl

Check the above pattern using the 2019 average rent.

boxpl <- project0 %>%
  ggplot() +
  # Map bedroom type to x so that each box is labelled on the axis
  geom_boxplot(aes(x = bedroom_types, y = average_rent_2019, fill = bedroom_types)) +
  ggtitle("Average Rent 2019 by Bedroom Types")
boxpl

The rent pattern for 2019 is similar to that of 2018.
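The boxplots show the pattern visually; a grouped summary can put numbers on it. The following is a minimal sketch using dplyr's group_by() and summarise(). The toy data frame and its rent values are made up for illustration and are not taken from the survey; na.rm = TRUE is used because the real rent columns contain missing values.

```r
library(dplyr)

# Made-up rents for illustration only
toy <- data.frame(
  bedroom_types = c("1 bedroom", "1 bedroom", "2 bedroom",
                    "2 bedroom", "4 bedroom", "4 bedroom"),
  average_rent_2018 = c(1100, 1300, 1500, NA, 1250, 3000)
)

# Minimum, median and maximum rent per bedroom type, ignoring missing values
rent_summary <- toy %>%
  group_by(bedroom_types) %>%
  summarise(
    n_reported  = sum(!is.na(average_rent_2018)),
    min_rent    = min(average_rent_2018, na.rm = TRUE),
    median_rent = median(average_rent_2018, na.rm = TRUE),
    max_rent    = max(average_rent_2018, na.rm = TRUE)
  )
rent_summary
```

Comparing min_rent and max_rent across rows is one way to check how far the rent ranges of different bedroom types overlap.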

What is this dataset?

This data comes from the Montgomery County open data portal: 2019 Rental Facility Occupancy Survey Results | Open Data Portal (montgomerycountymd.gov). It is from a survey conducted in 2019 on residential rents in Montgomery County, Maryland, covering 2015 to 2019. The categorical variables are community name, community address, and bedroom type (1-bedroom, 2-bedroom, 3-bedroom, 4-bedroom and efficiency). The numeric variables are the average rent in 2015, 2016, 2017, 2018 and 2019, and the percent change in average rent from 2018 to 2019. The data has 1,100 entries. Cleaning consisted of converting the variable names to lower case and replacing the spaces in multi-word variable names with underscores.
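The percent-change column appears to be the 2018-to-2019 rent change expressed as a fraction of the 2018 rent. That definition is an assumption, but it can be checked against the first five values printed in the str() output earlier in this document. Those printed values may be rounded, so the sketch compares with a small tolerance.

```r
# First five values copied from the str() output
rent_2018 <- c(1323, 1567, 1093, 1520, 716)
rent_2019 <- c(1594, 1845, 1276, 1772, 834)
reported  <- c(0.205, 0.177, 0.167, 0.166, 0.165)

# Assumed definition: change relative to the 2018 rent
pct_change <- (rent_2019 - rent_2018) / rent_2018
round(pct_change, 3)

# TRUE if every computed value matches the reported one to about 3 decimals
matches_reported <- all(abs(pct_change - reported) < 0.001)
matches_reported
```

For the first row, (1594 - 1323) / 1323 is roughly 0.205, which agrees with the reported value, so the column is a proportion rather than a percentage.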

What did the data show?

The 1,100 entries in the survey included 1-bedroom (n=404), 2-bedroom (n=413), 3-bedroom (n=167), 4-bedroom (n=19) and efficiency (n=97) bedroom types. My main analysis question was, “How does the average rent vary by bedroom type?” I used the most recent data, 2018 and 2019, to answer this question. For each year I drew a boxplot showing the range of average rent by bedroom type. The boxplots showed the same pattern in 2018 and 2019. Rent increased steadily from 1-bedroom, to 2-bedroom, to 3-bedroom, to 4-bedroom. The rent range for “efficiency” is close to that of 1-bedroom. However, the rent for 4-bedroom has a wide range, with the lower end overlapping with all the other bedroom types. This overlap came as a surprise to me. I would not have expected that in the same county one could get a 4-bedroom for the same rent as a 1-bedroom.

What else could be explored with this data?

It would be interesting to explore possible reasons for this rent overlap through further analysis of this data. One reason could be that rents in some ZIP codes of the county are so high that a high-end 1-bedroom there costs about the same as a lower-end 4-bedroom in a ZIP code with lower rents. This question could be answered by plotting the data by ZIP code. Another reason for the overlap of rent between bedroom types could be variation by community. It is possible that some residential communities are expensive while others are moderate, so that a 1-bedroom in one community could cost the same as a 4-bedroom in another. Further exploration of the data could also examine the time trend. Indeed, the dataset has a variable, “Percent Change From Previous Year 2018-2019”. This analysis could reveal the pattern of rent increases from 2015 to 2019, suggesting a trend that could be used to predict rents for subsequent years.
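As a starting point for both ideas, the sketch below pulls a location code out of each address and reshapes the yearly rent columns into a long format suitable for a rent-over-time plot. The two rows are copied from the str() output earlier in this document. The zip_code, year and average_rent column names are hypothetical, and the pattern assumes every address ends in a five-digit ZIP code, which should be verified on the full data.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Two rows copied from the str() output, with only two of the rent columns
toy <- data.frame(
  community_address = c("9727 MT PISGAH RD SILVER SPRING MD 20903",
                        "11870 GRAND PARK AVE ROCKVILLE MD 20852"),
  bedroom_types     = c("efficiency", "efficiency"),
  average_rent_2018 = c(1323, 1567),
  average_rent_2019 = c(1594, 1845)
)

# Assumption: the address ends with a five-digit ZIP code
with_zip <- toy %>%
  mutate(zip_code = str_extract(community_address, "\\d{5}$"))

# One row per entry and year; the year is taken from the column name
rent_long <- with_zip %>%
  pivot_longer(
    cols         = starts_with("average_rent_"),
    names_to     = "year",
    names_prefix = "average_rent_",
    values_to    = "average_rent"
  ) %>%
  mutate(year = as.integer(year))
rent_long
```

From rent_long, a boxplot or line chart grouped by zip_code or by year could be drawn with the same ggplot2 functions used above.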