Summary on the Data and the Analysis

The brooklyn_sales_map dataset was downloaded from Kaggle and contains information about homes sold in Brooklyn, New York, from 2003 to 2017. In this analysis, we look at trends in the Brooklyn real estate market using plots generated with ggplot2 and related packages. The study has two parts. In the first part, the dependent variable is the price at which the property was sold, and the independent variables are the year of the sale and the zip code of the property. In the second part, the dependent variable is the size of the lot sold, and the independent variables are again the year of the sale and the zip code. My hope in dividing the analysis into two parts is to investigate trends in property sales over a substantial period of time. By studying these trends, people may gain insights that could lead to safer neighborhoods throughout the country.

Variables used for the Analysis

PART 1 Dependent variable = sale_price

Independent variables = year_of_sale and ZipCode

PART 2 Dependent variable = LotArea

Independent variables = year_of_sale and ZipCode

library(readr)
brooklyn_sales_map <- read_csv("Desktop/brooklynhomes2003to2017/brooklyn_sales_map.csv")

Importing Packages

library(ggplot2)
library(ggthemes)
library(gganimate)
library(tidyverse)
library(dplyr)
library(viridis)

Part One

graph_data <- ggplot(brooklyn_sales_map,mapping =aes(x= year_of_sale, y = sale_price))

graph1 <- graph_data + geom_smooth() 

graph1 + labs(title = 'Market Trend in Brooklyn New York: Sales Price Per Home from 2003 - 2017') + theme_tufte() 
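The smoothed curve above is fit to every raw sale record, so extreme or zero-valued prices can pull it around. As a cross-check, a per-year median can be computed first. The sketch below is only an illustration: `toy_sales` is a made-up stand-in for brooklyn_sales_map, and the assumption that the real data contains non-positive prices worth dropping has not been verified here.

```r
library(dplyr)

# Hypothetical toy data standing in for brooklyn_sales_map; values are made up.
toy_sales <- tibble(
  year_of_sale = c(2003, 2003, 2003, 2004, 2004, 2004),
  sale_price   = c(0, 400000, 500000, 0, 450000, 650000)
)

# Assumption: non-positive prices are not market sales, so they are dropped
# before summarizing. The median is less sensitive to extreme sales than the mean.
yearly_median <- toy_sales %>%
  filter(!is.na(sale_price), sale_price > 0) %>%
  group_by(year_of_sale) %>%
  summarize(medianSale = median(sale_price), .groups = "drop")

print(yearly_median)
```

Plotting `medianSale` against `year_of_sale` with geom_line() would give a second view of the same trend to compare against the smoothed curve.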

Zip Code effects on property value.

#Data Prep
Housing=brooklyn_sales_map%>%
  
  select(year_of_sale,sale_price,ZipCode)%>%
  
  filter(!is.na(year_of_sale),
         !is.na(sale_price),
         !is.na(ZipCode))%>%
         
  mutate(ZipCode = factor(ZipCode))%>%
  
  group_by(year_of_sale,ZipCode)%>%
  
  summarize(avgSale=mean(sale_price))%>%
  
  ungroup()%>%
  
  mutate(year_of_sale = as.integer(year_of_sale))%>%
  
  select(ZipCode,avgSale,year_of_sale)
#Graph by Zip
Housing%>%
  ggplot(aes(x=reorder(ZipCode,avgSale),y=avgSale,fill=avgSale)) +
  geom_bar(stat = "identity") +
  facet_wrap(~year_of_sale)+
  scale_fill_viridis(name = "avgSale", option = "E") +
  coord_flip()+
  labs(title = 'Average Sale',subtitle='By ZipCode',x="ZipCode",y="Average Sale")+
  geom_text(aes(label=round(avgSale,digits=0)), color="black", size=1.9,hjust= -.03) + 
  theme_gray()

As you can see, the 11241 zip code is an outlier that is impacting how our data is displayed. With that in mind, the R chunk below will not include the 11241 zip code to provide better visualizations.
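Before dropping a zip code, it can help to confirm numerically how far it sits from the rest. The following is a minimal sketch on made-up numbers shaped like the Housing table; `toy_avg`, `zip_rank`, and the `ratio_to_median` column are hypothetical names introduced for illustration.

```r
library(dplyr)

# Hypothetical toy summary shaped like Housing; zip codes and prices are illustrative.
toy_avg <- tibble(
  ZipCode      = factor(c("11201", "11215", "11241", "11201", "11215", "11241")),
  year_of_sale = c(2003L, 2003L, 2003L, 2004L, 2004L, 2004L),
  avgSale      = c(500000, 450000, 90000000, 520000, 480000, 120000000)
)

# Rank zip codes by their largest yearly average and compare each one
# with the median across zip codes.
zip_rank <- toy_avg %>%
  group_by(ZipCode) %>%
  summarize(maxAvgSale = max(avgSale), .groups = "drop") %>%
  mutate(ratio_to_median = maxAvgSale / median(maxAvgSale)) %>%
  arrange(desc(maxAvgSale))

print(zip_rank)
```

A zip code whose ratio is orders of magnitude above the others is a reasonable candidate to plot separately rather than on the shared axis.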

#Data Prep
Housing2 <- brooklyn_sales_map%>%
  
  select(year_of_sale,sale_price,ZipCode)%>%
  
  filter(!is.na(year_of_sale),
         !is.na(sale_price),
         !is.na(ZipCode),
         ZipCode!=11241)%>%
         
  mutate(ZipCode = factor(ZipCode))%>%
  
  group_by(year_of_sale,ZipCode)%>%
  
  summarize(avgSale=mean(sale_price))%>%
  
  ungroup()%>%
  
  mutate(year_of_sale = as.integer(year_of_sale))%>%
  
  select(ZipCode,avgSale,year_of_sale)
#Graph by Zip
Housing2%>%
  ggplot(aes(x=reorder(ZipCode,avgSale),y=avgSale,fill=avgSale)) +
  geom_bar(stat = "identity") +
  facet_wrap(~year_of_sale)+
  scale_fill_viridis(name = "avgSale", option = "E") +
  coord_flip()+
  labs(title = 'Average Sale',subtitle='By ZipCode',x="ZipCode",y="Average Sale")+
  geom_text(aes(label=round(avgSale,digits=0)), color="black", size=2.9,hjust= -.03) + 
  theme_gray()

Animated Graph

Housing2 %>%
ggplot(aes(x=factor(ZipCode),y=avgSale, fill = avgSale)) +
  geom_col(alpha = 0.8) +
  scale_size(range = c(2, 12)) + 
  guides(fill=guide_legend(title="Average Cost Per Home"))+
  labs(title = 'Average Sales Price Per Home Over Time',
       subtitle='Date: {frame_time}', 
       x = 'ZipCode', 
       y = 'Average Sale')+
  transition_time(year_of_sale)+
  coord_flip()+
  theme_gray()
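The chunk above builds the animation but does not save it. A minimal sketch of rendering to a GIF follows, assuming the gifski package is installed. `toy_housing` is made-up data shaped like Housing2, and `render_gif` and the file name are hypothetical names used only for this example.

```r
library(ggplot2)
library(gganimate)

# Hypothetical toy summary shaped like Housing2; values are made up.
toy_housing <- data.frame(
  ZipCode      = factor(rep(c("11201", "11215"), times = 3)),
  year_of_sale = rep(2003:2005, each = 2),
  avgSale      = c(500000, 450000, 520000, 480000, 560000, 510000)
)

# Same structure as the animated chart above, reduced to the essentials.
anim_plot <- ggplot(toy_housing, aes(x = ZipCode, y = avgSale, fill = avgSale)) +
  geom_col(alpha = 0.8) +
  labs(subtitle = "Date: {frame_time}", x = "ZipCode", y = "Average Sale") +
  transition_time(year_of_sale) +
  coord_flip()

# Hypothetical helper: renders the animation and writes it to disk.
# Assumes gifski is available for gifski_renderer().
render_gif <- function(plot, file, nframes = 30, fps = 5) {
  rendered <- animate(plot, nframes = nframes, fps = fps,
                      renderer = gifski_renderer())
  anim_save(file, animation = rendered)
  invisible(file)
}

# Example call, left commented out because rendering can be slow:
# render_gif(anim_plot, "avg_sale_by_zip.gif")
```

Lowering `nframes` is a quick way to preview the animation before rendering the full version.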

Part Two

graph_data2 <- ggplot(brooklyn_sales_map,mapping =aes(x= year_of_sale, y = LotArea))

graph2 <- graph_data2 + geom_smooth()

graph2 + labs(title = 'Market Trend in Brooklyn New York: Size of Lot Sold from 2003 - 2017') + theme_tufte() 

#Data Prep
LotSize=brooklyn_sales_map%>%
  
  select(year_of_sale,LotArea,ZipCode)%>%
  
  filter(!is.na(year_of_sale),
         !is.na(LotArea),
         !is.na(ZipCode))%>%
         
  mutate(ZipCode = factor(ZipCode))%>%
  
  group_by(year_of_sale,ZipCode)%>%
  
  summarize(avgLotArea=mean(LotArea))%>%
  
  ungroup()%>%
  
  mutate(year_of_sale = as.integer(year_of_sale))%>%
  
  select(ZipCode,avgLotArea,year_of_sale)

Zip Code effects on Lot Size.

#Graph by Zip
LotSize%>%
  ggplot(aes(x=reorder(ZipCode,avgLotArea),y=avgLotArea,fill=avgLotArea)) +
  geom_bar(stat = "identity") +
  facet_wrap(~year_of_sale)+
  scale_fill_viridis(name = "avgLotArea", option = "E") +
  coord_flip()+
  labs(title = 'Average Lot Area',subtitle='By ZipCode',x="ZipCode",y="Average Lot Area Sold")+
  geom_text(aes(label=round(avgLotArea,digits=0)), color="black", size=2.9,hjust= -.03) + 
  theme_gray()

Animated Graph Number Two

LotSize %>%
ggplot(aes(x=factor(ZipCode),y=avgLotArea, fill = avgLotArea)) +
  geom_col(alpha = 0.8) +
  scale_size(range = c(2, 12)) + 
  guides(fill=guide_legend(title="Average Lot Size Sold"))+
  labs(title = 'Average Lot Size Sold Over Time',
       subtitle='Date: {frame_time}', 
       x = 'ZipCode', 
       y = 'Average Lot Size')+
  transition_time(year_of_sale)+
  coord_flip()+
  theme_gray()

Conclusion

The purpose of this investigation was to find trends in Brooklyn’s real estate market that led to the borough becoming one of the trendiest places to live. To study patterns in Brooklyn’s housing market, this study used the ggplot2 visualization package to explore the prices at which homes were sold and the sizes of the lots sold. In the first part of this analysis, we learned that Brooklyn saw several dips in the market before 2012. From 2013 onward, however, Brooklyn properties sold at record-breaking prices.

We learned that homes in the 11241 zip code (the Downtown district of Brooklyn) sold at astronomical prices even when the housing market was not doing well. The 11241 zip code acted as an outlier, so we excluded it from the zip-code-level charts. With it removed, we learned that other areas such as Park Slope and Williamsburg (ZipCodes 11221, 11237) experienced a real estate boom too. Homes in many Brooklyn neighborhoods were selling at record prices.

For the second part of this study, we looked at how zip code and year influenced the area of the lots sold across Brooklyn. As in the first part, the 11241 zip code stood out: the biggest lots sold were observed there around 2003 and 2006. This finding is interesting because the data suggest that once many large lots are bought, many condos may be built a few years later and the neighborhood may experience a real estate boom.
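One way to probe the idea that large lot purchases precede a price boom is to line up each year's average price with the average lot area from a few years earlier. The sketch below uses made-up numbers for a single zip code. The three-year lag and the names `toy_zip`, `lagged`, and `avgLotArea_3yr_ago` are arbitrary choices for illustration, not results from the dataset.

```r
library(dplyr)

# Hypothetical yearly summary for one zip code; all values are made up.
toy_zip <- tibble(
  year_of_sale = 2003:2008,
  avgLotArea   = c(9000, 3000, 3200, 8500, 3100, 3000),
  avgSale      = c(400000, 420000, 430000, 700000, 450000, 460000)
)

# Pair each year with the average lot area from three years earlier.
# The lag of 3 is an assumption; other lags would need to be tried as well.
lagged <- toy_zip %>%
  arrange(year_of_sale) %>%
  mutate(avgLotArea_3yr_ago = dplyr::lag(avgLotArea, 3))

print(lagged)
```

On the real data, a scatter plot of `avgSale` against the lagged lot area, faceted by zip code, would show whether the pattern holds beyond 11241.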

Future studies should use other variables to find more trends in this dataset. Also, datasets for different communities should be used to see if any similarities across all datasets can be found.