Housing sales price refers to the monetary value at which a residential property is bought or sold in a given market. This price is influenced by various factors such as location, size and condition of the property, local economic conditions, crime, amenities, and demand-supply dynamics. Housing sales price serves as a crucial indicator of the real estate market’s health and can impact homeowners, investors, and policymakers alike. Understanding the determinants of housing sales price and analyzing its trends can provide valuable insights into the dynamics of the housing market, inform investment decisions, and aid in policy formulation related to housing affordability, urban planning, and economic development.
We will analyze housing sales prices in the Bronx in 2023. The Bronx, New York, is of particular interest because of its unique characteristics and position within the broader New York City real estate market. As one of the five boroughs of New York City, the Bronx has a diverse mix of residential properties, ranging from single-family homes to multifamily buildings and condominiums. Historically, its housing market has fluctuated under the influence of urban renewal initiatives, demographic shifts, and economic development projects. Analyzing housing sales prices in the Bronx can provide insights into housing affordability, neighborhood dynamics, gentrification trends, and the impact of urban revitalization efforts. Understanding this market can also support efforts to promote equitable development, address housing disparities, and improve residents' overall quality of life.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(webshot)
library(magick)
## Linking to ImageMagick 6.9.12.93
## Enabled features: cairo, fontconfig, freetype, heic, lcms, pango, raw, rsvg, webp
## Disabled features: fftw, ghostscript, x11
library(png)
## Imported data using "copy path" via github
bronx_p_2023 <- read.csv(url("https://raw.githubusercontent.com/baa5234/FinalProject/main/2023_bronx%20(1).csv"))
# Link to the data in case it may not work https://github.com/baa5234/FinalProject/blob/main/2023_bronx%20(1).csv
#Raw url if you prefer to use that instead of copy path : https://raw.githubusercontent.com/baa5234/FinalProject/main/2023_bronx%20(1).csv?token=GHSAT0AAAAAACNSJLCYSTCIXFGYY5NRLCJWZR34U6Q
colnames(bronx_p_2023)
## [1] "BOROUGH" "NEIGHBORHOOD"
## [3] "BUILDING.CLASS.CATEGORY" "TAX.CLASS.AT.PRESENT"
## [5] "BLOCK" "LOT"
## [7] "EASE.MENT" "BUILDING.CLASS.AT.PRESENT"
## [9] "ADDRESS" "APARTMENT.NUMBER"
## [11] "ZIP.CODE" "RESIDENTIAL.UNITS"
## [13] "COMMERCIAL.UNITS" "TOTAL..UNITS"
## [15] "LAND..SQUARE.FEET" "GROSS..SQUARE.FEET"
## [17] "YEAR.BUILT" "TAX.CLASS.AT.TIME.OF.SALE"
## [19] "BUILDING.CLASS.AT.TIME.OF.SALE" "SALE.PRICE"
## [21] "SALE.DATE"
head(bronx_p_2023)
## BOROUGH NEIGHBORHOOD BUILDING.CLASS.CATEGORY TAX.CLASS.AT.PRESENT BLOCK LOT
## 1 2 BATHGATE 01 ONE FAMILY DWELLINGS 1 3030 66
## 2 2 BATHGATE 01 ONE FAMILY DWELLINGS 1 3030 66
## 3 2 BATHGATE 01 ONE FAMILY DWELLINGS 1 3035 52
## 4 2 BATHGATE 01 ONE FAMILY DWELLINGS 1 3053 86
## 5 2 BATHGATE 02 TWO FAMILY DWELLINGS 1 2904 22
## 6 2 BATHGATE 02 TWO FAMILY DWELLINGS 1 2904 22
## EASE.MENT BUILDING.CLASS.AT.PRESENT ADDRESS APARTMENT.NUMBER
## 1 NA A1 4453 PARK AVENUE
## 2 NA A1 4453 PARK AVENUE
## 3 NA A1 461 EAST 178 STREET
## 4 NA S0 2364 WASHINGTON AVENUE
## 5 NA B9 454 EAST 172 STREET
## 6 NA B9 454 EAST 172ND STREET
## ZIP.CODE RESIDENTIAL.UNITS COMMERCIAL.UNITS TOTAL..UNITS LAND..SQUARE.FEET
## 1 10457 1 0 1 1,646
## 2 10457 1 0 1 1,646
## 3 10457 1 0 1 1,782
## 4 10458 1 2 3 1,911
## 5 10457 2 0 2 1,658
## 6 10457 2 0 2 1,658
## GROSS..SQUARE.FEET YEAR.BUILT TAX.CLASS.AT.TIME.OF.SALE
## 1 1,497 1899 1
## 2 1,497 1899 1
## 3 1,548 1899 1
## 4 4,080 1931 1
## 5 1,428 1901 1
## 6 1,428 1901 1
## BUILDING.CLASS.AT.TIME.OF.SALE SALE.PRICE SALE.DATE
## 1 A1 215,000 4/18/23
## 2 A1 570,000 8/23/23
## 3 A1 0 4/14/23
## 4 S0 0 10/24/23
## 5 B9 350,000 6/26/23
## 6 B9 310,000 8/14/23
# Checking for missing values
sum(is.na(bronx_p_2023))
## [1] 8113
There are 8,113 missing (NA) values in the data frame. We have to remove them before the analysis.
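Note that `sum(is.na())` counts missing cells across the whole data frame, not rows, and in the `head()` output above the EASE.MENT column is NA in every row shown. A per-column breakdown makes it easier to see where the missing values sit. The following is a minimal sketch on a made-up toy data frame; its values are illustrative and are not taken from the Bronx data.

```r
# Toy data frame (hypothetical values) with NAs in two columns
toy <- data.frame(
  EASE.MENT  = c(NA, NA, NA, NA),
  ZIP.CODE   = c(10457, 10458, NA, 10457),
  SALE.PRICE = c("215,000", "570,000", "0", "350,000")
)

# Total number of missing cells in the whole data frame
total_na <- sum(is.na(toy))

# Missing cells per column, largest first
na_by_col <- sort(colSums(is.na(toy)), decreasing = TRUE)

print(total_na)
print(na_by_col)
```

Running the same two calls on `bronx_p_2023` would show which columns account for most of the missing values before any are dropped.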
# Select the columns needed for the analysis
bx_prop_23 <- bronx_p_2023 %>%
  select(NEIGHBORHOOD, BUILDING.CLASS.CATEGORY, LOT, ZIP.CODE, YEAR.BUILT,
         GROSS..SQUARE.FEET, BUILDING.CLASS.AT.TIME.OF.SALE, SALE.PRICE, SALE.DATE)
head(bx_prop_23)
## NEIGHBORHOOD BUILDING.CLASS.CATEGORY LOT ZIP.CODE YEAR.BUILT
## 1 BATHGATE 01 ONE FAMILY DWELLINGS 66 10457 1899
## 2 BATHGATE 01 ONE FAMILY DWELLINGS 66 10457 1899
## 3 BATHGATE 01 ONE FAMILY DWELLINGS 52 10457 1899
## 4 BATHGATE 01 ONE FAMILY DWELLINGS 86 10458 1931
## 5 BATHGATE 02 TWO FAMILY DWELLINGS 22 10457 1901
## 6 BATHGATE 02 TWO FAMILY DWELLINGS 22 10457 1901
## GROSS..SQUARE.FEET BUILDING.CLASS.AT.TIME.OF.SALE SALE.PRICE SALE.DATE
## 1 1,497 A1 215,000 4/18/23
## 2 1,497 A1 570,000 8/23/23
## 3 1,548 A1 0 4/14/23
## 4 4,080 S0 0 10/24/23
## 5 1,428 B9 350,000 6/26/23
## 6 1,428 B9 310,000 8/14/23
# Rename two columns to shorter names
bx_props_23 <- rename(bx_prop_23, GROSS.SQ.FEET = GROSS..SQUARE.FEET, BUILDING.CLASS = BUILDING.CLASS.AT.TIME.OF.SALE)
colnames(bx_props_23)
## [1] "NEIGHBORHOOD" "BUILDING.CLASS.CATEGORY"
## [3] "LOT" "ZIP.CODE"
## [5] "YEAR.BUILT" "GROSS.SQ.FEET"
## [7] "BUILDING.CLASS" "SALE.PRICE"
## [9] "SALE.DATE"
sum(is.na(bx_props_23))
## [1] 642
After removing the columns we do not need, 642 missing values remain. We also renamed two columns to make them easier to work with.
# Treat values of 0 and 10 as missing (applied to every column)
bx_props_23[bx_props_23 == 0] <- NA
bx_props_23[bx_props_23 == 10] <- NA
# Remove rows containing NA from the data frame
bronx_props_2023 <- na.omit(bx_props_23)
# Remove the thousands separator (comma) from SALE.PRICE
bronx_props_2023$SALE.PRICE <- gsub(",", "", bronx_props_2023$SALE.PRICE)
# View the new data frame
view(bronx_props_2023)
# Convert SALE.PRICE to numeric
bronx_props_2023$SALE.PRICE <- as.numeric(bronx_props_2023$SALE.PRICE)
# Reformat SALE.DATE as month/day
bronx_props_2023$SALE.DATE <- format(as.Date(bronx_props_2023$SALE.DATE, format = "%m/%d/%y"), "%m/%d")
view(bronx_props_2023)
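Because the 0 and 10 replacement in the cleaning code is applied to the whole data frame, a legitimate 0 or 10 in another column (for example a LOT of 10) is also turned into NA and its row is dropped. A more targeted variant, sketched here on hypothetical toy data, converts SALE.PRICE to numeric first and filters on that column only. The cutoff `min_price` is an assumption chosen for illustration, not a value from the original analysis.

```r
# Toy data (hypothetical values) mimicking the raw column formats
toy <- data.frame(
  LOT        = c(66, 10, 52, 86),
  SALE.PRICE = c("215,000", "570,000", "0", "10"),
  SALE.DATE  = c("4/18/23", "8/23/23", "4/14/23", "10/24/23"),
  stringsAsFactors = FALSE
)

# Strip thousands separators, then convert to numeric
toy$SALE.PRICE <- as.numeric(gsub(",", "", toy$SALE.PRICE))

# Parse the date once and keep it as a Date so it sorts correctly
toy$SALE.DATE <- as.Date(toy$SALE.DATE, format = "%m/%d/%y")

# Keep only rows whose sale price is above a nominal cutoff (assumed value)
min_price <- 10
clean <- toy[!is.na(toy$SALE.PRICE) & toy$SALE.PRICE > min_price, ]

print(clean)
```

In this sketch the row with LOT equal to 10 is kept because only SALE.PRICE is tested. Results produced this way may differ from the counts reported in this document.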
In this analysis I wanted to focus only on housing, so I filtered the data to the property types of interest and removed all other properties sold in 2023, such as commercial buildings, garages, parking lots, and more.
The three types of properties I will analyze are "One Family Housing", "Two Family Housing", and "Condos" (apartment-style housing).
Mean Price by Neighborhood
# Filter one-family houses
one_fam_23 <- dplyr::filter(bronx_props_2023, BUILDING.CLASS.CATEGORY == "01 ONE FAMILY DWELLINGS")
# Plot of mean sales price by neighborhood
mean_sales_by_neighborhood <- aggregate(SALE.PRICE ~ NEIGHBORHOOD, data = one_fam_23, FUN = mean)
P <- ggplot(mean_sales_by_neighborhood, aes(x = NEIGHBORHOOD, y = SALE.PRICE)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(x = "Neighborhood", y = "Mean Sales Price", title = "One Family House Mean Sales Price by Neighborhood")
P + theme(axis.text.x = element_text(angle = 45, hjust = 1))
Based on the mean sales price of one-family houses, the top five neighborhoods in the Bronx in 2023 are Fieldston, Riverdale, City Island, Bedford Park, and Kingsbridge (Jerome Park).
Median Price by Neighborhood
median_price_by_neighborhood <- one_fam_23 %>%
  group_by(NEIGHBORHOOD) %>%
  summarize(median_price = median(SALE.PRICE))
P2 <- ggplot(median_price_by_neighborhood, aes(x = NEIGHBORHOOD, y = median_price)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(x = "Neighborhood", y = "Median Price", title = "Median Price by Neighborhood")
P2 + theme(axis.text.x = element_text(angle = 45, hjust = 1))
Based on the median sales price of one-family houses, the top five neighborhoods in the Bronx in 2023 are Fieldston, Riverdale, City Island, Kingsbridge Heights (University Heights), and Kingsbridge (Jerome Park).
In both the mean and the median rankings, the top three neighborhoods are the same: Fieldston, Riverdale, and City Island. These three are among the safest neighborhoods in the Bronx.
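The mean and median rankings can disagree lower down the list because a single very expensive sale pulls the mean up while barely moving the median. The sketch below illustrates this with invented prices for three placeholder neighborhoods (A, B, C); none of the numbers come from the Bronx data.

```r
# Hypothetical sale prices for three placeholder neighborhoods
sales <- data.frame(
  NEIGHBORHOOD = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  SALE.PRICE   = c(500000, 520000, 3000000,   # A has one outlier sale
                   800000, 820000, 840000,
                   600000, 610000, 620000)
)

# Mean and median sale price per neighborhood
by_mean   <- aggregate(SALE.PRICE ~ NEIGHBORHOOD, data = sales, FUN = mean)
by_median <- aggregate(SALE.PRICE ~ NEIGHBORHOOD, data = sales, FUN = median)

# Rank neighborhoods from highest to lowest on each statistic
rank_mean   <- as.character(by_mean$NEIGHBORHOOD[order(-by_mean$SALE.PRICE)])
rank_median <- as.character(by_median$NEIGHBORHOOD[order(-by_median$SALE.PRICE)])

print(rank_mean)
print(rank_median)
```

Neighborhood A ranks first by mean but last by median, purely because of its one outlier. To make rankings easier to read in the bar charts, the x aesthetic could be written as `reorder(NEIGHBORHOOD, -SALE.PRICE)` so the bars are sorted by height.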
The next analysis might come as a surprise, but its purpose is to show where houses are mostly sold. Based on the data, Throggs Neck, Riverdale, and Baychester had the highest total sales prices.
total_price_by_neighborhood <- one_fam_23 %>%
  group_by(NEIGHBORHOOD) %>%
  summarize(total_price = sum(SALE.PRICE))
P3 <- ggplot(total_price_by_neighborhood, aes(x = NEIGHBORHOOD, y = total_price)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(x = "Neighborhood", y = "Total Price", title = "Total Price by Neighborhood")
P3 + theme(axis.text.x = element_text(angle = 65, hjust = 1))
Here I want us to look at the number of houses sold and compare it with the total sales prices. Riverdale was in the top three for total sales price by neighborhood, so let's compare it with another neighborhood. In 2023, 34 houses were sold in Riverdale and 33 in Soundview, a difference of only one sale. Yet in the previous plot of totals by neighborhood, Riverdale surpasses Soundview. This shows that location matters.
neighborhood_counts <- table(one_fam_23$NEIGHBORHOOD)
print(neighborhood_counts)
##
## BATHGATE BAYCHESTER BEDFORD PARK/NORWOOD
## 2 76 9
## BRONXDALE CASTLE HILL/UNIONPORT CITY ISLAND
## 36 20 34
## CITY ISLAND-PELHAM STRIP COUNTRY CLUB CROTONA PARK
## 1 45 3
## EAST TREMONT FIELDSTON FORDHAM
## 5 14 7
## HIGHBRIDGE/MORRIS HEIGHTS KINGSBRIDGE HTS/UNIV HTS KINGSBRIDGE/JEROME PARK
## 6 9 16
## MELROSE/CONCOURSE MORRIS PARK/VAN NEST MORRISANIA/LONGWOOD
## 8 33 21
## MOTT HAVEN/PORT MORRIS MOUNT HOPE/MOUNT EDEN PARKCHESTER
## 10 1 7
## PELHAM GARDENS PELHAM PARKWAY NORTH PELHAM PARKWAY SOUTH
## 24 31 23
## RIVERDALE SCHUYLERVILLE/PELHAM BAY SOUNDVIEW
## 34 50 33
## THROGS NECK WAKEFIELD WESTCHESTER
## 125 38 10
## WILLIAMSBRIDGE WOODLAWN
## 30 13
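Since a neighborhood's total sales price equals its number of sales times its mean price, placing the three figures side by side separates sales volume from price level. The following sketch uses made-up prices for two placeholder neighborhoods (X and Y) with nearly equal sale counts; it is an illustration, not a result from the Bronx data.

```r
# Hypothetical sales: two neighborhoods with nearly equal sale counts
sales <- data.frame(
  NEIGHBORHOOD = c(rep("X", 4), rep("Y", 3)),
  SALE.PRICE   = c(900000, 1100000, 1000000, 1000000,
                   500000, 600000, 700000)
)

# Count, total, and mean sale price per neighborhood
n_sold <- tapply(sales$SALE.PRICE, sales$NEIGHBORHOOD, length)
total  <- tapply(sales$SALE.PRICE, sales$NEIGHBORHOOD, sum)
avg    <- tapply(sales$SALE.PRICE, sales$NEIGHBORHOOD, mean)

summary_tbl <- data.frame(
  NEIGHBORHOOD = names(n_sold),
  n_sold       = as.vector(n_sold),
  total_price  = as.vector(total),
  mean_price   = as.vector(avg)
)

print(summary_tbl)
```

With counts this close, the gap in totals comes almost entirely from the difference in mean price, which is the pattern described for Riverdale and Soundview.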
Does the year the house was built play a role in housing sales price?
price_by_year <- one_fam_23 %>%
  group_by(YEAR.BUILT) %>%
  summarize(total_price1 = sum(SALE.PRICE))
# Plot total sales price (in thousands of dollars) against year built
d <- ggplot(price_by_year, aes(x = YEAR.BUILT, y = total_price1 / 1000)) +
  geom_line(color = "blue") +
  geom_point(color = "red") +
  labs(x = "Year Built", y = "Total Sales Price (thousands of $)", title = "Total Sales Price by Year Built")
d + theme(axis.text.x = element_text(angle = 45, hjust = 1))
Using the one-family data frame, we cannot tell much about sales price from year built alone, because the prices are not consistent: they go up and down from one build year to the next. Still, one major difference stands out. Houses built in the 1900s show higher sales prices in the plot than houses built in the 2000s, even though the newer houses have more recent structures and newer amenities.
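A total per build year mixes the price level with the number of houses built (and sold) for that year. One hedged alternative is to bucket build years into decades and look at the median price per house alongside the total. The sketch below uses hypothetical values, and the column name `DECADE` is introduced here for illustration only.

```r
# Hypothetical one-family sales with year built
houses <- data.frame(
  YEAR.BUILT = c(1901, 1905, 1925, 1928, 1929, 2004, 2008),
  SALE.PRICE = c(600000, 700000, 500000, 550000, 450000, 800000, 900000)
)

# Bucket each build year into its decade, e.g. 1928 -> 1920
houses$DECADE <- floor(houses$YEAR.BUILT / 10) * 10

# Number of sales, total price, and median price per decade
by_decade <- data.frame(
  DECADE       = sort(unique(houses$DECADE)),
  n_sold       = as.vector(table(houses$DECADE)),
  total_price  = as.vector(tapply(houses$SALE.PRICE, houses$DECADE, sum)),
  median_price = as.vector(tapply(houses$SALE.PRICE, houses$DECADE, median))
)

print(by_decade)
```

In this toy data the 1920 bucket has a larger total than the 1900 bucket only because it contains more sales; its median price per house is lower.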
# Filter two-family housing properties
two_fam_23 <- dplyr::filter(bronx_props_2023, BUILDING.CLASS.CATEGORY == "02 TWO FAMILY DWELLINGS")
# Plot of mean sales price by neighborhood (price divided by 10)
mean_sales_by_neigh_pt2 <- aggregate(SALE.PRICE ~ NEIGHBORHOOD, data = two_fam_23, FUN = mean)
P <- ggplot(mean_sales_by_neigh_pt2, aes(x = NEIGHBORHOOD, y = SALE.PRICE / 10)) +
  geom_bar(stat = "identity", fill = "purple") +
  labs(x = "Neighborhood", y = "Mean Sales Price (tens of $)", title = "Two Family House Mean Sales Price by Neighborhood")
P + theme(axis.text.x = element_text(angle = 45, hjust = 1))
The top three neighborhoods by mean sales price of two-family properties in the Bronx are Riverdale, Mount Hope(Port Morris), and Fieldston.
median_price_by_neigh_pt2 <- two_fam_23 %>%
  group_by(NEIGHBORHOOD) %>%
  summarize(median_price2 = median(SALE.PRICE))
P2 <- ggplot(median_price_by_neigh_pt2, aes(x = NEIGHBORHOOD, y = median_price2)) +
  geom_bar(stat = "identity", fill = "purple") +
  labs(x = "Neighborhood", y = "Median Price", title = "Median Price by Neighborhood of Two Family House")
P2 + theme(axis.text.x = element_text(angle = 45, hjust = 1))
The top three neighborhoods by median sales price of two-family properties in the Bronx are Riverdale, Mount Hope(Port Morris), and Westchester, followed by Fieldston, City Island, and a few more.
Next, I wanted to look at sales prices of condominiums (condos) to check whether location also matters outside of one-family and two-family houses.
# Filter the condos (two building class categories)
filtered_condos <- bronx_props_2023 %>%
  filter(BUILDING.CLASS.CATEGORY %in% c("04 TAX CLASS 1 CONDOS", "13 CONDOS - ELEVATOR APARTMENTS"))
# Group by neighborhood and calculate the mean sales price
mean_condos_price <- filtered_condos %>%
  group_by(NEIGHBORHOOD) %>%
  summarize(mean_sales_price3 = mean(SALE.PRICE))
# Create a bar plot of mean sales price by neighborhood (price divided by 10).
# The plot object is named condo_mean_plot to avoid masking base R's barplot().
condo_mean_plot <- ggplot(mean_condos_price, aes(x = NEIGHBORHOOD, y = mean_sales_price3 / 10, fill = NEIGHBORHOOD)) +
  geom_bar(stat = "identity") +
  labs(title = "Mean Condos Sales Price by Neighborhood",
       x = "Neighborhood",
       y = "Mean Sales Price (tens of $)",
       fill = "Neighborhood") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for better readability
# Display the bar plot
print(condo_mean_plot)
# Group by neighborhood and calculate the median sales price
median_condos_price <- filtered_condos %>%
  group_by(NEIGHBORHOOD) %>%
  summarize(median_sales_price3 = median(SALE.PRICE))
# Create a bar plot of median sales price by neighborhood (price divided by 10).
# The plot object is named condo_median_plot to avoid masking base R's barplot().
condo_median_plot <- ggplot(median_condos_price, aes(x = NEIGHBORHOOD, y = median_sales_price3 / 10, fill = NEIGHBORHOOD)) +
  geom_bar(stat = "identity") +
  labs(title = "Median Condos Sales Price by Neighborhood",
       x = "Neighborhood",
       y = "Median Sales Price (tens of $)",
       fill = "Neighborhood") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for better readability
# Display the bar plot
print(condo_median_plot)
In both the mean and the median condo sales prices by neighborhood, the top five neighborhoods are Riverdale, City Island, Kingsbridge (Jerome Park), Country Club, and Throggs Neck.
Using the filtered two-family data frame, let's analyze sales price as a function of gross square feet.
# Perform a simple linear regression
lm_model <- lm(SALE.PRICE ~ GROSS.SQ.FEET , data = two_fam_23)
# Print the summary of the regression model using code below.
# summary(lm_model)
# Plot diagnostic plots
plot(lm_model)
Running summary(lm_model) gives the results: R-squared is 0.6653, which suggests that gross square feet plays a significant role in sales prices. The diagnostic plots above show the model's residuals. Because the summary is long, the call is left as a comment.
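One caveat about this model: in the `head()` output earlier, the square-footage values appear as text with thousands separators (for example "1,497"), and only SALE.PRICE was converted to numeric. When a predictor is text, `lm()` treats each distinct value as its own category, which would also explain a very long summary. The sketch below, on hypothetical toy data, contrasts the two fits; the column `GROSS.SQ.FEET.NUM` is a name introduced here for illustration.

```r
# Hypothetical two-family sales; GROSS.SQ.FEET stored as text with commas
toy <- data.frame(
  GROSS.SQ.FEET = c("1,000", "1,500", "2,000", "2,500", "3,000"),
  SALE.PRICE    = c(300000, 420000, 510000, 640000, 730000),
  stringsAsFactors = FALSE
)

# Fit on the raw text column: lm() treats each distinct value as a category
fit_chr <- lm(SALE.PRICE ~ GROSS.SQ.FEET, data = toy)

# Convert to numeric, then fit a true simple linear regression
toy$GROSS.SQ.FEET.NUM <- as.numeric(gsub(",", "", toy$GROSS.SQ.FEET))
fit_num <- lm(SALE.PRICE ~ GROSS.SQ.FEET.NUM, data = toy)

# The text fit has as many coefficients as distinct values; the numeric fit has two
print(length(coef(fit_chr)))
print(length(coef(fit_num)))

# Slope (dollars per square foot) and R-squared of the numeric fit
print(coef(fit_num)[["GROSS.SQ.FEET.NUM"]])
print(summary(fit_num)$r.squared)
```

If the same conversion were applied to `two_fam_23` before fitting, the R-squared would likely differ from the 0.6653 reported here, so that figure is best read with this caveat in mind.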
Safest neighborhoods in the Bronx (Google)
library(png)
png_url <- "https://raw.githubusercontent.com/baa5234/FinalProject/main/SAFETEST%20NEIGHBORHOOD%20IN%20BX%20ImG.png"
# Download the PNG image from the URL, then read it from disk
download.file(png_url, "SAFETEST NEIGHBORHOOD IN BX ImG.png", mode = "wb")
img <- readPNG("SAFETEST NEIGHBORHOOD IN BX ImG.png")
# Display the image
plot(1:2, type = "n", axes = FALSE, xlab = "", ylab = "")
rasterImage(img, 1, 1, 2, 2)
The image above lists the safest neighborhoods in the Bronx. The article at https://propertyclub.nyc/article/safest-neighborhoods-in-the-bronx also ranks Riverdale as the number 1 safest neighborhood in the Bronx. The neighborhoods with the highest sales prices appear on this list as well.
Using webshot to capture a map of neighborhoods in the Bronx
library(webshot)
library(magick)
# Define the URL
mapurl <- "https://www.google.com/maps/d/u/0/viewer?mid=1eB3cfuq2tEUpHgHZ6mKzV1UjrU8&hl=en_US&ll=40.854060357151404%2C-73.85685142264114&z=12"
# Capture screenshot
webshot(mapurl, "map_screenshot.png")
# Read the captured image
map_image <- image_read("map_screenshot.png")
# Display the image
plot(map_image)
The screenshot above shows the safest and the most dangerous neighborhoods in the Bronx. As you can see, the neighborhoods with the highest sales prices are not highlighted in red but in yellow, which indicates a safe neighborhood.
Here is a URL where you can hover over neighborhoods in the Bronx. You will see that the neighborhoods with the highest sales prices have lower crime rates.
# Crime data page for the Bronx
crimedata_url <- "https://www.neighborhoodscout.com/ny/bronx/crime"
# Open the URL in the default web browser
browseURL(crimedata_url)
In conclusion, the variables neighborhood (location), year built, and gross square feet play major roles in housing sales prices, with neighborhood mattering most. Overall, this highlights the importance of thorough analysis and data-driven decision-making in the real estate market, and it provides insight into the factors that drive house prices and how they can inform decisions.