Housing sale prices refers to the monetary value at which a residential property is bought or sold in a given market. This price is influenced by various factors such as location, size and condition of the property, local economic conditions, crime, amenities, and demand-supply dynamics. Housing sale prices serves as a crucial indicator of the real estate market’s health and can impact homeowners, investors, and policymakers alike. Understanding the determinants of housing sales price and analyzing its trends can provide valuable insights into the dynamics of the housing market, inform investment decisions, and aid in policy formulation related to housing affordability, urban planning, and economic development.
We will be analyzing housing sales prices in the Bronx for the year 2023. Analyzing housing sales prices in Bronx, New York, holds particular significance due to its unique characteristics and position within the broader New York City real estate market. As one of the five boroughs of New York City, Bronx exhibits a diverse mix of residential properties ranging from single-family homes to multifamily buildings and condominiums. Historically, Bronx has experienced fluctuations in its housing market, influenced by factors such as urban renewal initiatives, demographic shifts, and economic development projects. Analyzing housing sales prices in the Bronx can provide insights into the affordability of housing, neighborhood dynamics, gentrification trends, and the impact of urban revitalization efforts. Moreover, understanding the housing market in Bronx can contribute to efforts aimed at promoting equitable development, addressing housing disparities, and enhancing the overall quality of life for residents.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(webshot)
library(magick)
## Linking to ImageMagick 6.9.12.93
## Enabled features: cairo, fontconfig, freetype, heic, lcms, pango, raw, rsvg, webp
## Disabled features: fftw, ghostscript, x11
library(png)
## Import dataframe
bronx_p_2023 <- read.csv(url("https://raw.githubusercontent.com/baa5234/FinalProject/main/2023_bronx%20(1).csv"))
# Link to the data in case it may not work https://github.com/baa5234/FinalProject/blob/main/2023_bronx%20(1).csv
#Raw url if you prefer to use that instead of copy path : https://raw.githubusercontent.com/baa5234/FinalProject/main/2023_bronx%20(1).csv?token=GHSAT0AAAAAACNSJLCYSTCIXFGYY5NRLCJWZR34U6Q
colnames(bronx_p_2023)
## [1] "BOROUGH" "NEIGHBORHOOD"
## [3] "BUILDING.CLASS.CATEGORY" "TAX.CLASS.AT.PRESENT"
## [5] "BLOCK" "LOT"
## [7] "EASE.MENT" "BUILDING.CLASS.AT.PRESENT"
## [9] "ADDRESS" "APARTMENT.NUMBER"
## [11] "ZIP.CODE" "RESIDENTIAL.UNITS"
## [13] "COMMERCIAL.UNITS" "TOTAL..UNITS"
## [15] "LAND..SQUARE.FEET" "GROSS..SQUARE.FEET"
## [17] "YEAR.BUILT" "TAX.CLASS.AT.TIME.OF.SALE"
## [19] "BUILDING.CLASS.AT.TIME.OF.SALE" "SALE.PRICE"
## [21] "SALE.DATE"
head(bronx_p_2023)
## BOROUGH NEIGHBORHOOD BUILDING.CLASS.CATEGORY TAX.CLASS.AT.PRESENT BLOCK LOT
## 1 2 BATHGATE 01 ONE FAMILY DWELLINGS 1 3030 66
## 2 2 BATHGATE 01 ONE FAMILY DWELLINGS 1 3030 66
## 3 2 BATHGATE 01 ONE FAMILY DWELLINGS 1 3035 52
## 4 2 BATHGATE 01 ONE FAMILY DWELLINGS 1 3053 86
## 5 2 BATHGATE 02 TWO FAMILY DWELLINGS 1 2904 22
## 6 2 BATHGATE 02 TWO FAMILY DWELLINGS 1 2904 22
## EASE.MENT BUILDING.CLASS.AT.PRESENT ADDRESS APARTMENT.NUMBER
## 1 NA A1 4453 PARK AVENUE
## 2 NA A1 4453 PARK AVENUE
## 3 NA A1 461 EAST 178 STREET
## 4 NA S0 2364 WASHINGTON AVENUE
## 5 NA B9 454 EAST 172 STREET
## 6 NA B9 454 EAST 172ND STREET
## ZIP.CODE RESIDENTIAL.UNITS COMMERCIAL.UNITS TOTAL..UNITS LAND..SQUARE.FEET
## 1 10457 1 0 1 1,646
## 2 10457 1 0 1 1,646
## 3 10457 1 0 1 1,782
## 4 10458 1 2 3 1,911
## 5 10457 2 0 2 1,658
## 6 10457 2 0 2 1,658
## GROSS..SQUARE.FEET YEAR.BUILT TAX.CLASS.AT.TIME.OF.SALE
## 1 1,497 1899 1
## 2 1,497 1899 1
## 3 1,548 1899 1
## 4 4,080 1931 1
## 5 1,428 1901 1
## 6 1,428 1901 1
## BUILDING.CLASS.AT.TIME.OF.SALE SALE.PRICE SALE.DATE
## 1 A1 215,000 4/18/23
## 2 A1 570,000 8/23/23
## 3 A1 0 4/14/23
## 4 S0 0 10/24/23
## 5 B9 350,000 6/26/23
## 6 B9 310,000 8/14/23
# Checking for missing values
sum(is.na(bronx_p_2023))
## [1] 8113
There are 8,113 missing values. And we have to remove them.
##Selecting necessary columns
bx_prop_23 <- bronx_p_2023 %>%
select(NEIGHBORHOOD, `BUILDING.CLASS.CATEGORY`, LOT, `ZIP.CODE`, `YEAR.BUILT`, `GROSS..SQUARE.FEET`, `BUILDING.CLASS.AT.TIME.OF.SALE`, `SALE.PRICE`, `SALE.DATE`)
head(bx_prop_23)
## NEIGHBORHOOD BUILDING.CLASS.CATEGORY LOT ZIP.CODE YEAR.BUILT
## 1 BATHGATE 01 ONE FAMILY DWELLINGS 66 10457 1899
## 2 BATHGATE 01 ONE FAMILY DWELLINGS 66 10457 1899
## 3 BATHGATE 01 ONE FAMILY DWELLINGS 52 10457 1899
## 4 BATHGATE 01 ONE FAMILY DWELLINGS 86 10458 1931
## 5 BATHGATE 02 TWO FAMILY DWELLINGS 22 10457 1901
## 6 BATHGATE 02 TWO FAMILY DWELLINGS 22 10457 1901
## GROSS..SQUARE.FEET BUILDING.CLASS.AT.TIME.OF.SALE SALE.PRICE SALE.DATE
## 1 1,497 A1 215,000 4/18/23
## 2 1,497 A1 570,000 8/23/23
## 3 1,548 A1 0 4/14/23
## 4 4,080 S0 0 10/24/23
## 5 1,428 B9 350,000 6/26/23
## 6 1,428 B9 310,000 8/14/23
##Renaming column(s)
bx_props_23 <- rename(bx_prop_23, GROSS.SQ.FEET= GROSS..SQUARE.FEET, BUILDING.CLASS= BUILDING.CLASS.AT.TIME.OF.SALE)
colnames(bx_props_23)
## [1] "NEIGHBORHOOD" "BUILDING.CLASS.CATEGORY"
## [3] "LOT" "ZIP.CODE"
## [5] "YEAR.BUILT" "GROSS.SQ.FEET"
## [7] "BUILDING.CLASS" "SALE.PRICE"
## [9] "SALE.DATE"
sum(is.na(bx_props_23))
## [1] 642
Now After removing some columns there are 642 remaining values. I also renamed two columns to make the dataframe columns a bit easier.
#Convert 0's and 10's in the dataframe to NA
bx_props_23[bx_props_23==0] <- NA
bx_props_23[bx_props_23==10] <- NA
#Remove NA'S from dataframe
bronx_props_2023 <- na.omit(bx_props_23)
#Remove Comma from column name "SALE.PRICE" data
bronx_props_2023$SALE.PRICE <- gsub(",", "", bronx_props_2023$SALE.PRICE )
#View new Dataframe
view(bronx_props_2023)
# Convert column name "SALE.PRICE" to numeric
bronx_props_2023$SALE.PRICE <- as.numeric(bronx_props_2023$SALE.PRICE )
#Format column name "Sale.DATE" into month and date
bronx_props_2023$SALE.DATE <- format(as.Date(bronx_props_2023$SALE.DATE, format="%m/%d/%y"),"%m/%d")
view(bronx_props_2023)
Now in the analysis I want to focus on only housing properties so I filtered what I want to focus on, removing all other properties sold in 2023 such as commercial buildings, garages, parking lots , and more.
The 3 types of properties I will be analyzing are “One Family Homes”, “Two Family Homes” and “Condos”(which are apartment style housing property).
Mean Price by Neighborhood
#filter one_family house - Labled "ONE FAMILY DWELLINGS"
one_fam_23 <- dplyr::filter(bronx_props_2023, BUILDING.CLASS.CATEGORY== "01 ONE FAMILY DWELLINGS")
#plot of mean sales by neighborhood
mean_sales_by_neighborhood <- aggregate(SALE.PRICE ~ NEIGHBORHOOD, data = one_fam_23, FUN = mean)
P <- ggplot(mean_sales_by_neighborhood, aes(x = NEIGHBORHOOD, y = SALE.PRICE)) +
geom_bar(stat = "identity", fill = "skyblue") +
labs(x = "Neighborhood", y = "Mean Sales Price", title = "One Family Homes - Mean Sales Price by Neighborhood")
P + theme(axis.text.x = element_text(angle = 45, hjust = 1))
Based on the mean plot, of One Family Homes (Dwelling) the top 5
mean sales prices by neighborhood (locations) sold in the Bronx of the
year 2023 are Fieldston, Riverdale, City Island, Bedford Park, and
Kingsbridge(Jerome Park).
Median Price by Neighborhood
median_price_by_neighborhood <- one_fam_23 %>%
group_by(NEIGHBORHOOD) %>%
summarize(median_price = median(SALE.PRICE))
P2 <- ggplot(median_price_by_neighborhood, aes(x = NEIGHBORHOOD, y = median_price)) +
geom_bar(stat = "identity", fill = "skyblue") +
labs(x = "Neighborhood", y = "Median Price", title = "Median Price by Neighborhood")
P2 + theme(axis.text.x = element_text(angle = 45, hjust = 1))
Based on the median plot, of One Family Homes (Dwelling), the top 5 median sales prices by neighborhood (locations) sold in the Bronx of the year 2023 are Fieldston, Riverdale, City Island, Kingsbridge Heights (University Heights), and Kingsbridge(Jerome Park).
In both Mean and Median the top 3 housing sales price have in common the following neighborhoods: Fieldston, Riverdale, and City Island. These are 3 of the safest neighborhoods in the Bronx.
This part here might come as a surprise. But the reason for this
analysis is to show where houses are mostly sold and compare total sales
prices.
Based on the data we can see that Throggs Neck, Riverdale and Baychester had the highest total sales prices in the Bronx in 2023.
total_price_by_neighborhood <- one_fam_23 %>%
group_by(NEIGHBORHOOD) %>%
summarize(total_price = sum(SALE.PRICE))
P3 <- ggplot(total_price_by_neighborhood, aes(x = NEIGHBORHOOD, y = total_price)) +
geom_bar(stat = "identity", fill = "skyblue") +
labs(x = "Neighborhood", y = "Total Price", title = "Total Price by Neighborhood")
P3 + theme(axis.text.x = element_text(angle = 65, hjust = 1))
Here I want us to look at the number of houses sold, and compare it to the total sales prices. Lets compare Riverdale with another neighborhood as it was one of the top 3 in total sale prices by neighborhood. Comparing nieghborhoods: Riverdale which includes 34 homes sold in 2023 and Soundview as 33 homes sold in 2023. You can see in the plot that despite the number of houses sold being nearly similar with a difference of 1 house; Riverdale total sale prices surpasses Soundview. This shows location matters.
neighborhood_counts <- table(one_fam_23$NEIGHBORHOOD)
print(neighborhood_counts)
##
## BATHGATE BAYCHESTER BEDFORD PARK/NORWOOD
## 2 76 9
## BRONXDALE CASTLE HILL/UNIONPORT CITY ISLAND
## 36 20 34
## CITY ISLAND-PELHAM STRIP COUNTRY CLUB CROTONA PARK
## 1 45 3
## EAST TREMONT FIELDSTON FORDHAM
## 5 14 7
## HIGHBRIDGE/MORRIS HEIGHTS KINGSBRIDGE HTS/UNIV HTS KINGSBRIDGE/JEROME PARK
## 6 9 16
## MELROSE/CONCOURSE MORRIS PARK/VAN NEST MORRISANIA/LONGWOOD
## 8 33 21
## MOTT HAVEN/PORT MORRIS MOUNT HOPE/MOUNT EDEN PARKCHESTER
## 10 1 7
## PELHAM GARDENS PELHAM PARKWAY NORTH PELHAM PARKWAY SOUTH
## 24 31 23
## RIVERDALE SCHUYLERVILLE/PELHAM BAY SOUNDVIEW
## 34 50 33
## THROGS NECK WAKEFIELD WESTCHESTER
## 125 38 10
## WILLIAMSBRIDGE WOODLAWN
## 30 13
Using one family homes data frame. We will determine if the year the
home was built play a role in the home sales prices.
Does the year the house was built play a role in Housing Sale Prices ?
# group by Year bUILT
price_by_year <- one_fam_23 %>%
group_by(YEAR.BUILT) %>%
summarize(total_price1 = sum(SALE.PRICE))
#PLOT
d <- ggplot(price_by_year, aes(x = YEAR.BUILT, y = total_price1/1000)) +
geom_line(color = "blue") +
geom_point(color = "red") +
labs(x = "Year Built", y = "Sales Price", title = "Sales Prices Over Time")
d+ theme(axis.text.x = element_text(angle = 45, hjust = 1))
Using One Family house dataframe. Housing Prices based on Year
Built we cannot tell too much cause the sale prices are not consistent.
The sale prices fluctuates up and down as the years built goes by. But
we can see a major difference as houses built in the 1900s are more
pricier(have higher sales prices) than houses built in 2000s despite
having newer amenities and more recent structures.
#filter two family homes
two_fam_23 <- dplyr::filter(bronx_props_2023, BUILDING.CLASS.CATEGORY== "02 TWO FAMILY DWELLINGS")
#plot of mean sales by neighborhood
mean_sales_by_neigh_pt2 <- aggregate(SALE.PRICE ~ NEIGHBORHOOD, data = two_fam_23, FUN = mean)
P <- ggplot(mean_sales_by_neigh_pt2, aes(x = NEIGHBORHOOD, y = SALE.PRICE/10)) +
geom_bar(stat = "identity", fill="purple") +
labs(x = "Neighborhood", y = "Mean Sales Price(*10)", title = "Two Family House Mean Sales Price by Neighborhood")
P + theme(axis.text.x = element_text(angle = 45, hjust = 1))
Top 3 mean housing sales price of two family properties in Bronx
are Riverdale, Mount Hope(Port Morris), and Fieldston.
#Median of two family homes aand group by neighborhood
median_price_by_neigh_pt2 <-two_fam_23 %>%
group_by(NEIGHBORHOOD) %>%
summarize(median_price2 = median(SALE.PRICE))
#plot
P2 <- ggplot(median_price_by_neigh_pt2, aes(x = NEIGHBORHOOD, y = median_price2)) +
geom_bar(stat = "identity", fill = "purple") +
labs(x = "Neighborhood", y = "Median Price", title = "Median Price by Neighborhood of Two Family House")
P2 + theme(axis.text.x = element_text(angle = 45, hjust = 1))
Top 3 mean sale prices of two family homes in the Bronx year
2023 are Riverdale, Mount Hope(Port Morris), and Westchester; followed
by Fieldston, City Island, and few more.
Now I wanted to also look at housing sale prices on condominiums(CONDOS)
to also prove if location matter aside from the one family homes and two
family homes.
#filter the condos (using multiple)
filtered_condos <- bronx_props_2023 %>%
filter(BUILDING.CLASS.CATEGORY %in% c("04 TAX CLASS 1 CONDOS", "13 CONDOS - ELEVATOR APARTMENTS"))
mean_condos_price <- filtered_condos %>%
group_by(NEIGHBORHOOD) %>%
summarize(mean_sales_price3 = mean(SALE.PRICE))
# Create a bar plot of mean sales price by neighborhood
barplot <- ggplot(mean_condos_price, aes(x = NEIGHBORHOOD, y = mean_sales_price3/10, fill = NEIGHBORHOOD)) +
geom_bar(stat = "identity") +
labs(title = "Mean Condos Sales Price by Neighborhood",
x = "Neighborhood",
y = "Mean Sales Price (*10)",
fill = "Neighborhood") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for better readability
# Display the bar plot
print(barplot)
#top 5 neighborhoods with high sales prices are Riverdale, City Island, Kingsbridge(Jerome Park) Country Club, and Throggs Neck.
# Group by neighborhood and calculate
median_condos_price <- filtered_condos %>%
group_by(NEIGHBORHOOD) %>%
summarize(median_sales_price3 = median(SALE.PRICE))
# Create a bar plot of median sales price by neighborhood
barplot <- ggplot(median_condos_price, aes(x = NEIGHBORHOOD, y = median_sales_price3/10, fill = NEIGHBORHOOD)) +
geom_bar(stat = "identity") +
labs(title = "Median Condos Sales Price by Neighborhood",
x = "Neighborhood",
y = "Median Sales Price(*10)",
fill = "Neighborhood") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for better readability
# Display the bar plot
print(barplot)
In both Condos Mean and Median Sales Prices based on neighborhoods - (Condos), we can see that the top 5 neighborhoods with high sales prices are Riverdale, City Island, Kingsbridge(Jerome Park) Country Club, and Throggs Neck.
Using the filtered Two Family Home DataFrame, Based on Gross Sq Feet,
lets analyze sale prices.
# Perform a simple linear regression
lm_model <- lm(SALE.PRICE ~ GROSS.SQ.FEET , data = two_fam_23)
# Print the summary of the regression model using code below.
# summary(lm_model)
# Plot diagnostic plots
plot(lm_model)
Using code : summary(lm_model) which is labeled as a comment
,it gives us the results, R Squared is 0.6653 meaning that gross square
feet plays a significant role in the sale prices. Also the plots align
with the residuals.
Safest neighborhoods in Bronx (Google)
library(png)
png_url <- "https://raw.githubusercontent.com/baa5234/FinalProject/main/SAFETEST%20NEIGHBORHOOD%20IN%20BX%20ImG.png"
# Read the PNG image from the URL
download.file(png_url,
"SAFETEST NEIGHBORHOOD IN BX ImG.png",mode="wb")
img <- readPNG( "SAFETEST NEIGHBORHOOD IN BX ImG.png")
# Display the image
plot(1:2, type = "n", axes = FALSE, xlab = "", ylab = "")
rasterImage(img, 1, 1, 2, 2)
Based on the image we can see the safest neighborhoods in the Bronx are the following above. If you notice, you can see all of the neighborhoods with highest sale prices in our the analysis are on the list above. - Riverdale, City Island, Throggs Neck, Kingsbridge, Country Club, Fieldston, and more.
Also, this article link :https://propertyclub.nyc/article/safest-neighborhoods-in-the-bronx
states Riverdale is number 1 safest neighborhood in the Bronx.
Using Webshot of a map neighborhoods in Bronx
library(webshot)
library(magick)
# Define the URL
mapurl <- "https://www.google.com/maps/d/u/0/viewer?mid=1eB3cfuq2tEUpHgHZ6mKzV1UjrU8&hl=en_US&ll=40.854060357151404%2C-73.85685142264114&z=12"
# Capture screenshot
webshot(mapurl, "map_screenshot.png")
# Read the captured image
map_image <- image_read("map_screenshot.png")
# Display the image
plot(map_image)
The screenshot above shows the safest and most dangerous neighborhoods in the Bronx. As you can see, the highest sales prices based on location are not highlighted in red, but instead in yellow, or blank indicating a safe neighborhood.
Here is a url that you can hover across neighborhoods in the Bronx. You will see neighborhoods with highest sales price have lower crime rates.
#
crimedata_url <- "https://www.neighborhoodscout.com/ny/bronx/crime"
# Open the URL in the default web browser
browseURL(crimedata_url)
In conclusion we can see variables of neighborhood(location), year built, and gross square feet plays major roles in Housing Sales Prices , most especially neighborhood. Overall, this conclusion highlights the importance of thorough analysis and data-driven decision-making in the real estate market, as well as providing insights into the factors driving house prices and how they can be leveraged to make informed decisions.