Overview

Housing sale prices refer to the monetary value at which residential properties are bought or sold in a given market. These prices are influenced by factors such as the location, size, and condition of the property, local economic conditions, crime, amenities, and supply and demand. Housing sale prices serve as a crucial indicator of the real estate market’s health and can affect homeowners, investors, and policymakers alike. Understanding the determinants of housing sale prices and analyzing their trends can provide valuable insight into the dynamics of the housing market, inform investment decisions, and aid in policy formulation related to housing affordability, urban planning, and economic development.

We will analyze housing sale prices in the Bronx for the year 2023. The Bronx holds particular significance because of its unique characteristics and its position within the broader New York City real estate market. As one of the five boroughs of New York City, the Bronx has a diverse mix of residential properties, ranging from single-family homes to multifamily buildings and condominiums. Historically, its housing market has fluctuated under the influence of urban renewal initiatives, demographic shifts, and economic development projects. Analyzing housing sale prices in the Bronx can provide insights into housing affordability, neighborhood dynamics, gentrification trends, and the impact of urban revitalization efforts. Moreover, understanding the Bronx housing market can contribute to efforts aimed at promoting equitable development, addressing housing disparities, and enhancing the overall quality of life for residents.

Load Package(s)

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(webshot)
library(magick)
## Linking to ImageMagick 6.9.12.93
## Enabled features: cairo, fontconfig, freetype, heic, lcms, pango, raw, rsvg, webp
## Disabled features: fftw, ghostscript, x11
library(png)

Import Data

## Import dataframe 
bronx_p_2023 <- read.csv(url("https://raw.githubusercontent.com/baa5234/FinalProject/main/2023_bronx%20(1).csv"))

# If the raw URL above stops working, the file can also be viewed at https://github.com/baa5234/FinalProject/blob/main/2023_bronx%20(1).csv


colnames(bronx_p_2023)
##  [1] "BOROUGH"                        "NEIGHBORHOOD"                  
##  [3] "BUILDING.CLASS.CATEGORY"        "TAX.CLASS.AT.PRESENT"          
##  [5] "BLOCK"                          "LOT"                           
##  [7] "EASE.MENT"                      "BUILDING.CLASS.AT.PRESENT"     
##  [9] "ADDRESS"                        "APARTMENT.NUMBER"              
## [11] "ZIP.CODE"                       "RESIDENTIAL.UNITS"             
## [13] "COMMERCIAL.UNITS"               "TOTAL..UNITS"                  
## [15] "LAND..SQUARE.FEET"              "GROSS..SQUARE.FEET"            
## [17] "YEAR.BUILT"                     "TAX.CLASS.AT.TIME.OF.SALE"     
## [19] "BUILDING.CLASS.AT.TIME.OF.SALE" "SALE.PRICE"                    
## [21] "SALE.DATE"
head(bronx_p_2023)
##   BOROUGH NEIGHBORHOOD BUILDING.CLASS.CATEGORY TAX.CLASS.AT.PRESENT BLOCK LOT
## 1       2     BATHGATE 01 ONE FAMILY DWELLINGS                    1  3030  66
## 2       2     BATHGATE 01 ONE FAMILY DWELLINGS                    1  3030  66
## 3       2     BATHGATE 01 ONE FAMILY DWELLINGS                    1  3035  52
## 4       2     BATHGATE 01 ONE FAMILY DWELLINGS                    1  3053  86
## 5       2     BATHGATE 02 TWO FAMILY DWELLINGS                    1  2904  22
## 6       2     BATHGATE 02 TWO FAMILY DWELLINGS                    1  2904  22
##   EASE.MENT BUILDING.CLASS.AT.PRESENT                ADDRESS APARTMENT.NUMBER
## 1        NA                        A1       4453 PARK AVENUE                 
## 2        NA                        A1       4453 PARK AVENUE                 
## 3        NA                        A1    461 EAST 178 STREET                 
## 4        NA                        S0 2364 WASHINGTON AVENUE                 
## 5        NA                        B9    454 EAST 172 STREET                 
## 6        NA                        B9  454 EAST 172ND STREET                 
##   ZIP.CODE RESIDENTIAL.UNITS COMMERCIAL.UNITS TOTAL..UNITS LAND..SQUARE.FEET
## 1    10457                 1                0            1             1,646
## 2    10457                 1                0            1             1,646
## 3    10457                 1                0            1             1,782
## 4    10458                 1                2            3             1,911
## 5    10457                 2                0            2             1,658
## 6    10457                 2                0            2             1,658
##   GROSS..SQUARE.FEET YEAR.BUILT TAX.CLASS.AT.TIME.OF.SALE
## 1              1,497       1899                         1
## 2              1,497       1899                         1
## 3              1,548       1899                         1
## 4              4,080       1931                         1
## 5              1,428       1901                         1
## 6              1,428       1901                         1
##   BUILDING.CLASS.AT.TIME.OF.SALE SALE.PRICE SALE.DATE
## 1                             A1    215,000   4/18/23
## 2                             A1    570,000   8/23/23
## 3                             A1          0   4/14/23
## 4                             S0          0  10/24/23
## 5                             B9    350,000   6/26/23
## 6                             B9    310,000   8/14/23
# Checking for missing values
sum(is.na(bronx_p_2023))
## [1] 8113

The data frame contains 8,113 missing (NA) cells, which we need to deal with before analyzing the data.
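To see where missing cells come from, it can help to count them per column instead of over the whole data frame. The sketch below is only an illustration: the data frame `toy` and its values are invented, not rows from the Bronx file, and it uses base R's `colSums(is.na(...))`.

```r
# Toy data frame standing in for the sales data (all values are invented).
toy <- data.frame(
  NEIGHBORHOOD = c("A", "B", "C", "D"),
  EASE.MENT    = c(NA, NA, NA, NA),
  ZIP.CODE     = c(10457, NA, 10458, 10457),
  SALE.PRICE   = c("215,000", "0", "570,000", "350,000"),
  stringsAsFactors = FALSE
)

# Total number of NA cells across the whole data frame.
total_na <- sum(is.na(toy))

# NA cells broken down by column, largest first.
na_by_column <- sort(colSums(is.na(toy)), decreasing = TRUE)

print(total_na)
print(na_by_column)
```

A breakdown like this shows whether the missing cells are concentrated in one column that can simply be dropped or spread across columns that matter for the analysis.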

Clean and Tidy Data

##Selecting necessary columns
bx_prop_23 <- bronx_p_2023 %>%
  select(NEIGHBORHOOD, `BUILDING.CLASS.CATEGORY`, LOT, `ZIP.CODE`, `YEAR.BUILT`, `GROSS..SQUARE.FEET`, `BUILDING.CLASS.AT.TIME.OF.SALE`, `SALE.PRICE`, `SALE.DATE`)

head(bx_prop_23)
##   NEIGHBORHOOD BUILDING.CLASS.CATEGORY LOT ZIP.CODE YEAR.BUILT
## 1     BATHGATE 01 ONE FAMILY DWELLINGS  66    10457       1899
## 2     BATHGATE 01 ONE FAMILY DWELLINGS  66    10457       1899
## 3     BATHGATE 01 ONE FAMILY DWELLINGS  52    10457       1899
## 4     BATHGATE 01 ONE FAMILY DWELLINGS  86    10458       1931
## 5     BATHGATE 02 TWO FAMILY DWELLINGS  22    10457       1901
## 6     BATHGATE 02 TWO FAMILY DWELLINGS  22    10457       1901
##   GROSS..SQUARE.FEET BUILDING.CLASS.AT.TIME.OF.SALE SALE.PRICE SALE.DATE
## 1              1,497                             A1    215,000   4/18/23
## 2              1,497                             A1    570,000   8/23/23
## 3              1,548                             A1          0   4/14/23
## 4              4,080                             S0          0  10/24/23
## 5              1,428                             B9    350,000   6/26/23
## 6              1,428                             B9    310,000   8/14/23
##Renaming column(s)
bx_props_23 <- rename(bx_prop_23, GROSS.SQ.FEET= GROSS..SQUARE.FEET, BUILDING.CLASS= BUILDING.CLASS.AT.TIME.OF.SALE)

colnames(bx_props_23)
## [1] "NEIGHBORHOOD"            "BUILDING.CLASS.CATEGORY"
## [3] "LOT"                     "ZIP.CODE"               
## [5] "YEAR.BUILT"              "GROSS.SQ.FEET"          
## [7] "BUILDING.CLASS"          "SALE.PRICE"             
## [9] "SALE.DATE"
 sum(is.na(bx_props_23))
## [1] 642

After dropping the columns we do not need, 642 missing values remain. I also renamed two columns to make them easier to work with.


Continuation - Cleaning Data

# Convert 0s and 10s to NA. Note that this comparison is applied to every
# column of the data frame, not only to SALE.PRICE.
bx_props_23[bx_props_23 == 0] <- NA
bx_props_23[bx_props_23 == 10] <- NA


# Remove rows that contain NAs
bronx_props_2023 <- na.omit(bx_props_23)

# Remove the thousands-separator commas from the "SALE.PRICE" column
bronx_props_2023$SALE.PRICE <- gsub(",", "", bronx_props_2023$SALE.PRICE)

# View the new data frame
view(bronx_props_2023)

# Convert the "SALE.PRICE" column to numeric
bronx_props_2023$SALE.PRICE <- as.numeric(bronx_props_2023$SALE.PRICE)

# Reformat the "SALE.DATE" column as month/day
bronx_props_2023$SALE.DATE <- format(as.Date(bronx_props_2023$SALE.DATE, format = "%m/%d/%y"), "%m/%d")

view(bronx_props_2023)
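The cleaning steps can also be written so that each rule touches only the column it is meant for. The following is a minimal sketch on invented rows, not the pipeline used in this report: the helper `to_number()` and the data frame `toy` are hypothetical, and the idea that prices of 0 or 10 are placeholders instead of real sales is carried over from the step above.

```r
# Toy rows (invented) with the same column formats as the sales data.
toy <- data.frame(
  LOT           = c(10, 66, 52, 22),
  GROSS.SQ.FEET = c("1,497", "1,548", "4,080", "1,428"),
  SALE.PRICE    = c("215,000", "0", "10", "350,000"),
  SALE.DATE     = c("4/18/23", "4/14/23", "10/24/23", "6/26/23"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip thousands separators, then convert to numeric.
to_number <- function(x) as.numeric(gsub(",", "", x))

toy$SALE.PRICE    <- to_number(toy$SALE.PRICE)
toy$GROSS.SQ.FEET <- to_number(toy$GROSS.SQ.FEET)

# Treat placeholder prices (0 or 10) as missing, in SALE.PRICE only,
# so that a legitimate value such as LOT == 10 is left alone.
toy$SALE.PRICE[toy$SALE.PRICE %in% c(0, 10)] <- NA

# Parse the sale date into a Date so it can be sorted or grouped by month.
toy$SALE.DATE <- as.Date(toy$SALE.DATE, format = "%m/%d/%y")

# Keep only rows that still have a sale price.
clean <- toy[!is.na(toy$SALE.PRICE), ]
print(clean)
```

Restricting the rule to `SALE.PRICE` keeps rows whose lot number happens to be 10, and converting `GROSS.SQ.FEET` at the same time makes it usable as a numeric variable later on.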

Analyze Data (with filters) Based on Location

For the analysis I want to focus only on housing, so I filtered the data to the property types of interest and removed everything else sold in 2023, such as commercial buildings, garages, and parking lots.

The three property types I will analyze are “One Family Homes”, “Two Family Homes”, and “Condos” (apartment-style housing).

Analysis Based on One Family Homes

Mean Price by Neighborhood

# Filter one-family houses, labeled "01 ONE FAMILY DWELLINGS"
one_fam_23 <- dplyr::filter(bronx_props_2023, BUILDING.CLASS.CATEGORY== "01 ONE FAMILY DWELLINGS")

#plot of mean sales by neighborhood
mean_sales_by_neighborhood <- aggregate(SALE.PRICE ~ NEIGHBORHOOD, data = one_fam_23, FUN = mean)

P <- ggplot(mean_sales_by_neighborhood, aes(x = NEIGHBORHOOD, y = SALE.PRICE)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(x = "Neighborhood", y = "Mean Sales Price", title = "One Family Homes - Mean Sales Price by Neighborhood")
P + theme(axis.text.x = element_text(angle = 45, hjust = 1))


Based on the plot of mean prices for one-family homes (dwellings) sold in the Bronx in 2023, the five neighborhoods with the highest mean sale prices are Fieldston, Riverdale, City Island, Bedford Park, and Kingsbridge (Jerome Park).

Median Price by Neighborhood

median_price_by_neighborhood <- one_fam_23 %>%
  group_by(NEIGHBORHOOD) %>%
  summarize(median_price = median(SALE.PRICE))


P2 <- ggplot(median_price_by_neighborhood, aes(x = NEIGHBORHOOD, y = median_price)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(x = "Neighborhood", y = "Median Price", title = "Median Price by Neighborhood")
P2 + theme(axis.text.x = element_text(angle = 45, hjust = 1))

Based on the plot of median prices for one-family homes (dwellings) sold in the Bronx in 2023, the five neighborhoods with the highest median sale prices are Fieldston, Riverdale, City Island, Kingsbridge Heights (University Heights), and Kingsbridge (Jerome Park).

In both the mean and the median rankings, the top three neighborhoods are the same: Fieldston, Riverdale, and City Island. These are three of the safest neighborhoods in the Bronx.
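Reading the top neighborhoods off a bar chart can be error-prone, so the same comparison can be done by ranking the summary tables directly. This is a minimal sketch on made-up prices: the data frame `toy` and the helper `top_n_names()` are hypothetical and are not part of the analysis above.

```r
# Toy sales (invented). Neighborhood A has one very expensive sale.
toy <- data.frame(
  NEIGHBORHOOD = c("A", "A", "A", "B", "B", "C", "C", "C"),
  SALE.PRICE   = c(900, 950, 4000, 1200, 1300, 500, 600, 700),
  stringsAsFactors = FALSE
)

# Mean and median sale price per neighborhood.
mean_by_nbhd   <- aggregate(SALE.PRICE ~ NEIGHBORHOOD, data = toy, FUN = mean)
median_by_nbhd <- aggregate(SALE.PRICE ~ NEIGHBORHOOD, data = toy, FUN = median)

# Hypothetical helper: names of the n highest-priced neighborhoods.
top_n_names <- function(df, n) {
  ranked <- df[order(df$SALE.PRICE, decreasing = TRUE), ]
  as.character(head(ranked$NEIGHBORHOOD, n))
}

top_mean   <- top_n_names(mean_by_nbhd, 2)
top_median <- top_n_names(median_by_nbhd, 2)

# Neighborhoods that appear in both rankings.
in_both <- intersect(top_mean, top_median)

print(top_mean)
print(top_median)
print(in_both)
```

In the toy data the single expensive sale puts A first by mean but second by median, which is why it is useful to check both rankings before drawing conclusions.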


Using Total of One Family Homes Sold in Bronx to Compare


This part might come as a surprise, but the purpose of this analysis is to show where the most houses are sold and to compare total sale prices.

Based on the data, Throggs Neck, Riverdale, and Baychester had the highest total sale prices in the Bronx in 2023.

total_price_by_neighborhood <- one_fam_23 %>%
  group_by(NEIGHBORHOOD) %>%
  summarize(total_price = sum(SALE.PRICE))


P3 <- ggplot(total_price_by_neighborhood, aes(x = NEIGHBORHOOD, y = total_price)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(x = "Neighborhood", y = "Total Price", title = "Total Price by Neighborhood")
P3 + theme(axis.text.x = element_text(angle = 65, hjust = 1))

Here I want us to look at the number of houses sold and compare it with the total sale prices. Let's compare Riverdale, one of the top three neighborhoods by total sale price, with another neighborhood: Riverdale had 34 homes sold in 2023 and Soundview had 33. The plot shows that although the two counts differ by only one house, Riverdale's total sale price surpasses Soundview's. This shows that location matters.

neighborhood_counts <- table(one_fam_23$NEIGHBORHOOD)
print(neighborhood_counts)
## 
##                  BATHGATE                BAYCHESTER      BEDFORD PARK/NORWOOD 
##                         2                        76                         9 
##                 BRONXDALE     CASTLE HILL/UNIONPORT               CITY ISLAND 
##                        36                        20                        34 
##  CITY ISLAND-PELHAM STRIP              COUNTRY CLUB              CROTONA PARK 
##                         1                        45                         3 
##              EAST TREMONT                 FIELDSTON                   FORDHAM 
##                         5                        14                         7 
## HIGHBRIDGE/MORRIS HEIGHTS  KINGSBRIDGE HTS/UNIV HTS   KINGSBRIDGE/JEROME PARK 
##                         6                         9                        16 
##         MELROSE/CONCOURSE      MORRIS PARK/VAN NEST       MORRISANIA/LONGWOOD 
##                         8                        33                        21 
##    MOTT HAVEN/PORT MORRIS     MOUNT HOPE/MOUNT EDEN               PARKCHESTER 
##                        10                         1                         7 
##            PELHAM GARDENS      PELHAM PARKWAY NORTH      PELHAM PARKWAY SOUTH 
##                        24                        31                        23 
##                 RIVERDALE  SCHUYLERVILLE/PELHAM BAY                 SOUNDVIEW 
##                        34                        50                        33 
##               THROGS NECK                 WAKEFIELD               WESTCHESTER 
##                       125                        38                        10 
##            WILLIAMSBRIDGE                  WOODLAWN 
##                        30                        13
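The comparison between counts and totals can be made explicit by putting the number of homes sold, the total price, and the mean price side by side in one table. The sketch below uses invented prices for two hypothetical neighborhoods, `X` and `Y`, with the same number of sales. It is an illustration of the idea, not output from the Bronx data.

```r
# Toy sales (invented): two neighborhoods with the same number of sales.
toy <- data.frame(
  NEIGHBORHOOD = c("X", "X", "X", "Y", "Y", "Y"),
  SALE.PRICE   = c(900000, 1100000, 1000000, 500000, 600000, 550000),
  stringsAsFactors = FALSE
)

# Number of homes sold and total sale price per neighborhood.
homes_sold  <- aggregate(SALE.PRICE ~ NEIGHBORHOOD, data = toy, FUN = length)
total_price <- aggregate(SALE.PRICE ~ NEIGHBORHOOD, data = toy, FUN = sum)
names(homes_sold)[2]  <- "homes_sold"
names(total_price)[2] <- "total_price"

# Combine into one table and add the price per home.
summary_by_nbhd <- merge(homes_sold, total_price, by = "NEIGHBORHOOD")
summary_by_nbhd$mean_price <- summary_by_nbhd$total_price / summary_by_nbhd$homes_sold

print(summary_by_nbhd)
```

When two neighborhoods have similar counts, any gap in their totals comes from the price per home, which is the point of comparing Riverdale with Soundview.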


Analysis Based on Year Built - Using One Family Homes Data


Using the one-family homes data frame, we will determine whether the year a home was built plays a role in its sale price.

Does the year the house was built play a role in housing sale prices?


# Group by year built
price_by_year <- one_fam_23 %>%
  group_by(YEAR.BUILT) %>%
  summarize(total_price1 = sum(SALE.PRICE))

# Plot total sale price (in thousands) against year built
d <- ggplot(price_by_year, aes(x = YEAR.BUILT, y = total_price1/1000)) +
  geom_line(color = "blue") +
  geom_point(color = "red") +
  labs(x = "Year Built", y = "Total Sales Price (thousands)", title = "Total Sales Price by Year Built")
d + theme(axis.text.x = element_text(angle = 45, hjust = 1))


Using the one-family house data frame, we cannot tell much about the relationship between sale price and year built, because the sale prices are not consistent: they fluctuate up and down across the years. We can still see one major difference: houses built in the 1900s have higher sale prices than houses built in the 2000s, even though the latter have newer amenities and more recent structures. Because the plot sums the prices for each year built, it also reflects how many homes from each year were sold.
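One way to reduce the year-to-year noise is to group homes by the decade they were built in and look at the median price together with the number of sales. This is a sketch under assumptions: the rows in `toy` are invented, and grouping by decade is a choice made here for illustration, not a step taken in the analysis above.

```r
# Toy sales (invented) with a year built and a sale price.
toy <- data.frame(
  YEAR.BUILT = c(1899, 1901, 1905, 1931, 1935, 2004, 2006),
  SALE.PRICE = c(500, 700, 900, 800, 1000, 600, 650)
)

# Bucket each year into its decade, e.g. 1931 becomes 1930.
toy$decade <- floor(toy$YEAR.BUILT / 10) * 10

# Median price and number of sales per decade.
median_by_decade <- aggregate(SALE.PRICE ~ decade, data = toy, FUN = median)
sales_per_decade <- aggregate(SALE.PRICE ~ decade, data = toy, FUN = length)
names(sales_per_decade)[2] <- "n_sales"

print(merge(median_by_decade, sales_per_decade, by = "decade"))
```

A median per decade is not affected by how many homes were sold, so it separates "older homes cost more" from "more older homes were sold".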


Analysis Based on Two Family Housing

#filter two family homes 
two_fam_23 <- dplyr::filter(bronx_props_2023, BUILDING.CLASS.CATEGORY== "02 TWO FAMILY DWELLINGS")

#plot of mean sales by neighborhood
mean_sales_by_neigh_pt2 <- aggregate(SALE.PRICE ~ NEIGHBORHOOD, data = two_fam_23, FUN = mean)

P <- ggplot(mean_sales_by_neigh_pt2, aes(x = NEIGHBORHOOD, y = SALE.PRICE/10)) +
  geom_bar(stat = "identity", fill="purple") +
  labs(x = "Neighborhood", y = "Mean Sales Price(*10)", title = "Two Family House Mean Sales Price by Neighborhood") 
P + theme(axis.text.x = element_text(angle = 45, hjust = 1))


The top three neighborhoods by mean sale price for two-family properties in the Bronx are Riverdale, Mount Hope (Port Morris), and Fieldston.

# Median price of two-family homes, grouped by neighborhood
median_price_by_neigh_pt2 <-two_fam_23 %>%
  group_by(NEIGHBORHOOD) %>%
  summarize(median_price2 = median(SALE.PRICE))

#plot
P2 <- ggplot(median_price_by_neigh_pt2, aes(x = NEIGHBORHOOD, y = median_price2)) +
  geom_bar(stat = "identity", fill = "purple") +
  labs(x = "Neighborhood", y = "Median Price", title = "Median Price by Neighborhood of Two Family House")
P2 + theme(axis.text.x = element_text(angle = 45, hjust = 1))


The top three neighborhoods by median sale price for two-family homes in the Bronx in 2023 are Riverdale, Mount Hope (Port Morris), and Westchester, followed by Fieldston, City Island, and a few more.


Analysis Based on Condos


I also wanted to look at sale prices for condominiums (condos) to check whether location matters beyond one-family and two-family homes.


# Filter the condos (matching multiple categories)
filtered_condos <- bronx_props_2023 %>%
  filter(BUILDING.CLASS.CATEGORY %in% c("04 TAX CLASS 1 CONDOS", "13 CONDOS - ELEVATOR APARTMENTS"))
mean_condos_price <- filtered_condos %>%
  group_by(NEIGHBORHOOD) %>%
  summarize(mean_sales_price3 = mean(SALE.PRICE))

# Create a bar plot of mean sales price by neighborhood
barplot <- ggplot(mean_condos_price, aes(x = NEIGHBORHOOD, y = mean_sales_price3/10, fill = NEIGHBORHOOD)) +
  geom_bar(stat = "identity") +
  labs(title = "Mean Condos Sales Price by Neighborhood",
       x = "Neighborhood",
       y = "Mean Sales Price (*10)",
       fill = "Neighborhood") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for better readability

# Display the bar plot
print(barplot)

# The top 5 neighborhoods with the highest sale prices are Riverdale, City Island, Kingsbridge (Jerome Park), Country Club, and Throggs Neck.
# Group by neighborhood and calculate the median sale price
median_condos_price <- filtered_condos %>%
  group_by(NEIGHBORHOOD) %>%
  summarize(median_sales_price3 = median(SALE.PRICE))

# Create a bar plot of median sales price by neighborhood
barplot <- ggplot(median_condos_price, aes(x = NEIGHBORHOOD, y = median_sales_price3/10, fill = NEIGHBORHOOD)) +
  geom_bar(stat = "identity") +
  labs(title = "Median Condos Sales Price by Neighborhood",
       x = "Neighborhood",
       y = "Median Sales Price(*10)",
       fill = "Neighborhood") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for better readability

# Display the bar plot
print(barplot)

In both the mean and the median condo sale prices by neighborhood, the top five neighborhoods are Riverdale, City Island, Kingsbridge (Jerome Park), Country Club, and Throggs Neck.
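The same filter, group, and summarize pattern is repeated for one-family homes, two-family homes, and condos, so it could be wrapped in a small function. The sketch below is one possible way to do that in base R: `price_summary()` and the data frame `toy` are hypothetical names introduced here, and the prices are invented.

```r
# Toy sales (invented) covering three building-class categories.
toy <- data.frame(
  BUILDING.CLASS.CATEGORY = c("01 ONE FAMILY DWELLINGS", "01 ONE FAMILY DWELLINGS",
                              "02 TWO FAMILY DWELLINGS", "02 TWO FAMILY DWELLINGS",
                              "04 TAX CLASS 1 CONDOS"),
  NEIGHBORHOOD = c("A", "B", "A", "A", "B"),
  SALE.PRICE   = c(500, 700, 800, 1000, 300),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the given categories, then return the mean and
# median sale price per neighborhood in one table.
price_summary <- function(df, categories) {
  subset_df <- df[df$BUILDING.CLASS.CATEGORY %in% categories, ]
  means   <- aggregate(SALE.PRICE ~ NEIGHBORHOOD, data = subset_df, FUN = mean)
  medians <- aggregate(SALE.PRICE ~ NEIGHBORHOOD, data = subset_df, FUN = median)
  names(means)[2]   <- "mean_price"
  names(medians)[2] <- "median_price"
  merge(means, medians, by = "NEIGHBORHOOD")
}

two_family <- price_summary(toy, "02 TWO FAMILY DWELLINGS")
condos     <- price_summary(toy, c("04 TAX CLASS 1 CONDOS",
                                   "13 CONDOS - ELEVATOR APARTMENTS"))

print(two_family)
print(condos)
```

With a helper like this, each property type needs one call, and the mean and median end up in the same table for direct comparison.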


Let's Look at a Statistical Analysis of Gross Square Feet


Using the filtered two-family home data frame, let's analyze how sale prices relate to gross square feet.

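# GROSS.SQ.FEET is still text with thousands separators (e.g. "1,497");
# convert it to numeric so lm() treats it as a continuous predictor
two_fam_23$GROSS.SQ.FEET <- as.numeric(gsub(",", "", two_fam_23$GROSS.SQ.FEET))

# Perform a simple linear regression
lm_model <- lm(SALE.PRICE ~ GROSS.SQ.FEET, data = two_fam_23)


# Print the summary of the regression model using the code below.
# summary(lm_model)


# Plot diagnostic plots
plot(lm_model)


Running summary(lm_model), which is left as a comment above, gave an R-squared of 0.6653, suggesting that gross square feet plays a significant role in sale prices. Note that this value was obtained with GROSS.SQ.FEET still stored as comma-formatted text, which lm() treats as a categorical predictor; with the numeric conversion above, the R-squared will differ. The diagnostic plots from plot(lm_model) show how the residuals of the fit behave.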
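Whether a predictor is stored as text or as a number changes what `lm()` fits, which matters for a column such as gross square feet that is read in with thousands separators. The sketch below shows the difference on six invented rows. The data frame `toy` and the column `sqft` are assumptions made for illustration, and no values here come from the Bronx data.

```r
# Toy data (invented): square footage stored as comma-formatted text.
toy <- data.frame(
  GROSS.SQ.FEET = c("1,000", "1,500", "2,000", "2,500", "3,000", "3,500"),
  SALE.PRICE    = c(310000, 420000, 480000, 610000, 700000, 790000),
  stringsAsFactors = FALSE
)

# Fit 1: predictor left as text, so lm() builds one dummy variable per value.
fit_text <- lm(SALE.PRICE ~ GROSS.SQ.FEET, data = toy)

# Fit 2: predictor converted to a number, giving one intercept and one slope.
toy$sqft <- as.numeric(gsub(",", "", toy$GROSS.SQ.FEET))
fit_num  <- lm(SALE.PRICE ~ sqft, data = toy)

n_coef_text <- length(coef(fit_text))        # one coefficient per distinct text value
n_coef_num  <- length(coef(fit_num))         # intercept + slope
r2_num      <- summary(fit_num)$r.squared    # share of variance explained
slope       <- coef(fit_num)[["sqft"]]       # price change per extra square foot

print(c(n_coef_text = n_coef_text, n_coef_num = n_coef_num))
print(c(r_squared = r2_num, slope = slope))
```

Only the numeric fit yields a slope that can be read as "price per additional square foot", and its R-squared describes how well a straight line explains the prices.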

Confirming Analysis of Sale Prices

Safest Neighborhoods in the Bronx (Google)

library(png)


png_url <- "https://raw.githubusercontent.com/baa5234/FinalProject/main/SAFETEST%20NEIGHBORHOOD%20IN%20BX%20ImG.png"

# Read the PNG image from the URL

download.file(png_url,
              "SAFETEST NEIGHBORHOOD IN BX ImG.png",mode="wb")


img <- readPNG( "SAFETEST NEIGHBORHOOD IN BX ImG.png")


# Display the image
plot(1:2, type = "n", axes = FALSE, xlab = "", ylab = "")
rasterImage(img, 1, 1, 2, 2)

The image lists the safest neighborhoods in the Bronx. Notice that the neighborhoods with the highest sale prices in our analysis all appear on that list: Riverdale, City Island, Throggs Neck, Kingsbridge, Country Club, Fieldston, and more.


Also, this article, https://propertyclub.nyc/article/safest-neighborhoods-in-the-bronx, states that Riverdale is the number 1 safest neighborhood in the Bronx.


Using a Webshot of a Map of Neighborhoods in the Bronx

library(webshot)
library(magick)

# Define the URL
mapurl <- "https://www.google.com/maps/d/u/0/viewer?mid=1eB3cfuq2tEUpHgHZ6mKzV1UjrU8&hl=en_US&ll=40.854060357151404%2C-73.85685142264114&z=12"

# Capture screenshot
webshot(mapurl, "map_screenshot.png")

# Read the captured image
map_image <- image_read("map_screenshot.png")

# Display the image
plot(map_image)

The screenshot above shows the safest and most dangerous neighborhoods in the Bronx. As you can see, the neighborhoods with the highest sale prices are not highlighted in red but in yellow or left blank, indicating a safe neighborhood.


Interactive Crime Data Map

Here is a URL to a map where you can hover over neighborhoods in the Bronx. You will see that the neighborhoods with the highest sale prices have lower crime rates.

# URL of the interactive crime map
crimedata_url <- "https://www.neighborhoodscout.com/ny/bronx/crime"

# Open the URL in the default web browser
browseURL(crimedata_url)


Conclusion

In conclusion, the variables neighborhood (location), year built, and gross square feet all play major roles in housing sale prices, neighborhood most of all. Overall, this conclusion highlights the importance of thorough analysis and data-driven decision-making in the real estate market, and it provides insight into the factors that drive house prices and how they can be used to make informed decisions.