# Load the data from the CSV file
NY_House_Dataset <- read.csv("C:\\Users\\velag\\Downloads\\NY-House-Dataset.csv")
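# The hard-coded path above only resolves on one machine. A minimal sketch of a loader that
# fails early with a readable message is shown below. The helper name load_house_data and the
# tiny demo file are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: read a CSV and stop with a clear message if the file is missing.
load_house_data <- function(path) {
  if (!file.exists(path)) {
    stop("CSV file not found: ", path)
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Demo on a tiny made-up file so the sketch runs anywhere.
demo_path <- tempfile(fileext = ".csv")
writeLines(
  c("TYPE,PRICE,BEDS",
    "Condo for sale,315000,2",
    "House for sale,260000,4"),
  demo_path
)
demo_data <- load_house_data(demo_path)
str(demo_data)
```

# In practice the call would point at the real file, for example
# load_house_data("NY-House-Dataset.csv") when the CSV sits in the working directory.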
# Display summary statistics for the entire data frame
summary(NY_House_Dataset)
## BROKERTITLE        TYPE               PRICE               BEDS
## Length:4801        Length:4801        Min.   :2.494e+03   Min.   : 1.000
## Class :character   Class :character   1st Qu.:4.990e+05   1st Qu.: 2.000
## Mode  :character   Mode  :character   Median :8.250e+05   Median : 3.000
##                                       Mean   :2.357e+06   Mean   : 3.357
##                                       3rd Qu.:1.495e+06   3rd Qu.: 4.000
##                                       Max.   :2.147e+09   Max.   :50.000
## BATH             PROPERTYSQFT    ADDRESS            STATE
## Min.   : 0.000   Min.   :  230   Length:4801        Length:4801
## 1st Qu.: 1.000   1st Qu.: 1200   Class :character   Class :character
## Median : 2.000   Median : 2184   Mode  :character   Mode  :character
## Mean   : 2.374   Mean   : 2184
## 3rd Qu.: 3.000   3rd Qu.: 2184
## Max.   :50.000   Max.   :65535
## ADMINISTRATIVE_AREA_LEVEL_2   LOCALITY           SUBLOCALITY
## Length:4801                   Length:4801        Length:4801
## Class :character              Class :character   Class :character
## Mode  :character              Mode  :character   Mode  :character
##
##
##
## STREET_NAME        LONG_NAME          FORMATTED_ADDRESS   LATITUDE
## Length:4801        Length:4801        Length:4801         Min.   :40.50
## Class :character   Class :character   Class :character    1st Qu.:40.64
## Mode  :character   Mode  :character   Mode  :character    Median :40.73
##                                                           Mean   :40.71
##                                                           3rd Qu.:40.77
##                                                           Max.   :40.91
## LONGITUDE
## Min.   :-74.25
## 1st Qu.:-73.99
## Median :-73.95
## Mean   :-73.94
## 3rd Qu.:-73.87
## Max.   :-73.70
# NY_House_Dataset is the data frame loaded above
# Summary of two columns: BROKERTITLE (character) and PRICE (numeric)
numeric_summary <- summary(NY_House_Dataset[c("BROKERTITLE", "PRICE")])
# Count the distinct values in every column (numeric columns included, not only categorical ones)
categorical_summary <- sapply(NY_House_Dataset, function(x) length(unique(x)))
# Combine the numeric and categorical summaries
combined_summary <- list(Numeric_Summary = numeric_summary, Categorical_Summary = categorical_summary)
# Print the combined summary
combined_summary
## $Numeric_Summary
## BROKERTITLE        PRICE
## Length:4801        Min.   :2.494e+03
## Class :character   1st Qu.:4.990e+05
## Mode  :character   Median :8.250e+05
##                    Mean   :2.357e+06
##                    3rd Qu.:1.495e+06
##                    Max.   :2.147e+09
##
## $Categorical_Summary
##                 BROKERTITLE                        TYPE
##                        1036                          13
##                       PRICE                        BEDS
##                        1274                          27
##                        BATH                PROPERTYSQFT
##                          22                        1445
##                     ADDRESS                       STATE
##                        4582                         308
## ADMINISTRATIVE_AREA_LEVEL_2                    LOCALITY
##                          29                          11
##                 SUBLOCALITY                 STREET_NAME
##                          21                         174
##                   LONG_NAME           FORMATTED_ADDRESS
##                        2730                        4550
##                    LATITUDE                   LONGITUDE
##                        4196                        4118
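# The output above mixes a character column into the "numeric" summary and counts distinct
# values for every column. A minimal sketch of one way to separate the two kinds of columns
# first is shown below. The toy data frame and the toy_* names are made up for illustration.

```r
# Toy data frame standing in for NY_House_Dataset (values are made up).
toy_houses <- data.frame(
  TYPE  = c("Condo for sale", "House for sale", "Condo for sale", "Co-op for sale"),
  PRICE = c(315000, 260000, 690000, 199000),
  BEDS  = c(2, 4, 3, 1),
  stringsAsFactors = FALSE
)

# Logical vector marking which columns are numeric.
toy_is_num <- sapply(toy_houses, is.numeric)

# Quartiles and mean for the numeric columns only.
toy_numeric_summary <- summary(toy_houses[, toy_is_num, drop = FALSE])

# Distinct-value counts for the non-numeric columns only.
toy_categorical_summary <- sapply(
  toy_houses[, !toy_is_num, drop = FALSE],
  function(x) length(unique(x))
)

print(toy_numeric_summary)
print(toy_categorical_summary)
```

# Swapping toy_houses for NY_House_Dataset would apply the same split to the real columns.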
# Novel Questions
## Question 1: Relationship between Property Size and Price
# **Context:** Given the dataset includes information on property square footage and price, one might wonder about the relationship between the size of a property and its price.
# **Question:** "Is there a discernible relationship between the square footage of a property and its corresponding price? Does the price tend to increase linearly with the size of the property?"
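# One way to approach Question 1 is to compare a Pearson correlation (linear association) with
# a Spearman correlation (any monotone trend) and to fit a straight line. This is only a sketch
# on made-up numbers. The toy values below are assumptions, not results from the dataset.

```r
# Made-up sizes and prices; the real analysis would use NY_House_Dataset.
toy_size_price <- data.frame(
  PROPERTYSQFT = c(600, 850, 1200, 1500, 2100, 3000),
  PRICE        = c(300000, 420000, 610000, 700000, 990000, 1500000)
)

# Pearson measures linear association; Spearman only needs a monotone trend.
toy_pearson  <- cor(toy_size_price$PROPERTYSQFT, toy_size_price$PRICE, method = "pearson")
toy_spearman <- cor(toy_size_price$PROPERTYSQFT, toy_size_price$PRICE, method = "spearman")

# Straight-line fit: the slope estimates the price change per extra square foot.
toy_fit <- lm(PRICE ~ PROPERTYSQFT, data = toy_size_price)

print(c(pearson = toy_pearson, spearman = toy_spearman))
print(coef(toy_fit))
```

# A Spearman value well above the Pearson value would hint that price rises with size but not
# along a straight line.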
## Question 2: Neighborhood-wise Property Price Distribution
# **Context:** The dataset includes information on different neighborhoods. Exploring how property prices are distributed across these neighborhoods could provide insights into regional real estate trends.
# **Question:** "What is the distribution of property prices in different neighborhoods? Are there specific neighborhoods where property prices are consistently higher or lower?"
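# A distribution is described better by its median and spread than by a single mean. The sketch
# below computes the count, median, and interquartile range per area on made-up data. The toy
# values and the toy_* names are assumptions for illustration.

```r
# Made-up prices for two areas.
toy_area_prices <- data.frame(
  LOCALITY = c("Brooklyn", "Brooklyn", "Brooklyn", "Queens", "Queens", "Queens"),
  PRICE    = c(500000, 800000, 2000000, 300000, 450000, 600000),
  stringsAsFactors = FALSE
)

# Per-area statistics; tapply returns one value per distinct LOCALITY.
toy_n      <- tapply(toy_area_prices$PRICE, toy_area_prices$LOCALITY, length)
toy_median <- tapply(toy_area_prices$PRICE, toy_area_prices$LOCALITY, median)
toy_iqr    <- tapply(toy_area_prices$PRICE, toy_area_prices$LOCALITY, IQR)

# Collect the results into one table, one row per area.
toy_price_by_area <- data.frame(
  LOCALITY = names(toy_median),
  n        = as.vector(toy_n),
  median   = as.vector(toy_median),
  iqr      = as.vector(toy_iqr)
)
print(toy_price_by_area)
```

# The count column matters: an area with only a handful of listings gives an unreliable median.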
## Question 3: Temporal Trends in Property Prices
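# **Context:** The summary above lists no timestamp or date column, so this question cannot be answered from this file alone. If listing or sale dates were available from another source, understanding how property prices have changed over time could be valuable.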
# **Question:** "Are there noticeable trends or patterns in property prices over time? Have there been periods of significant increase or decrease in property prices, and can these be attributed to external factors or market conditions?"
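# Before attempting Question 3 it helps to check whether a data frame has any date-like column
# at all. The sketch below is a heuristic: the helper name find_date_columns and the name
# pattern it searches for are assumptions, and both toy data frames are made up.

```r
# Hypothetical helper: return names of columns that are stored as dates
# or whose name hints at a date.
find_date_columns <- function(df) {
  is_date_class <- vapply(df, function(x) inherits(x, c("Date", "POSIXct")), logical(1))
  name_hint <- grepl("date|time|year|sold|listed", names(df), ignore.case = TRUE)
  names(df)[is_date_class | name_hint]
}

# Toy frame without any date information.
toy_no_dates <- data.frame(
  PRICE    = c(315000, 260000),
  LOCALITY = c("New York", "Queens"),
  stringsAsFactors = FALSE
)

# Toy frame with a made-up sale date column.
toy_with_dates <- data.frame(
  PRICE   = c(315000, 260000),
  SOLD_ON = as.Date(c("2023-01-15", "2023-06-30"))
)

print(find_date_columns(toy_no_dates))
print(find_date_columns(toy_with_dates))
```

# An empty result means a temporal analysis needs additional data.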
# Average property price per LOCALITY value (the column used here as the neighborhood label)
average_price_by_neighborhood <- aggregate(PRICE ~ LOCALITY, data = NY_House_Dataset, FUN = mean)
print(average_price_by_neighborhood)
##           LOCALITY     PRICE
##  1    Bronx County  337656.5
##  2        Brooklyn 1426166.7
##  3        Flatbush  650000.0
##  4    Kings County  864643.5
##  5        New York 3190146.5
##  6 New York County 2579619.2
##  7          Queens  517333.3
##  8   Queens County  443008.5
##  9 Richmond County  447581.9
## 10       The Bronx  330600.0
## 11   United States 1327848.3
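# The table above lists several labels that may describe the same area (for example
# "Bronx County" and "The Bronx", or "Queens" and "Queens County"). The sketch below merges
# such labels with a lookup table before aggregating, and uses the median because it is less
# sensitive to extreme prices than the mean. Whether these labels really refer to the same
# area in this dataset is an assumption. The BOROUGH column, the lookup, and the toy prices are
# made up for illustration.

```r
# Made-up rows using LOCALITY labels that appear in the output above.
toy_localities <- data.frame(
  LOCALITY = c("Bronx County", "The Bronx", "Kings County", "Brooklyn",
               "Queens County", "Queens", "Flatbush"),
  PRICE    = c(300000, 340000, 800000, 1200000, 450000, 550000, 650000),
  stringsAsFactors = FALSE
)

# Assumed mapping from county-style labels to borough names.
toy_borough_lookup <- c(
  "Bronx County"    = "Bronx",
  "The Bronx"       = "Bronx",
  "Kings County"    = "Brooklyn",
  "Brooklyn"        = "Brooklyn",
  "Queens County"   = "Queens",
  "Queens"          = "Queens",
  "New York County" = "Manhattan",
  "Richmond County" = "Staten Island"
)

# Look up each label; labels missing from the lookup come back as NA.
toy_localities$BOROUGH <- unname(toy_borough_lookup[toy_localities$LOCALITY])

# Keep the original label wherever no mapping exists.
toy_unmapped <- is.na(toy_localities$BOROUGH)
toy_localities$BOROUGH[toy_unmapped] <- toy_localities$LOCALITY[toy_unmapped]

# Median price per merged label.
toy_median_by_borough <- aggregate(PRICE ~ BOROUGH, data = toy_localities, FUN = median)
print(toy_median_by_borough)
```

# Labels such as "New York" or "United States" are too broad to map to one borough, so this
# sketch leaves them unchanged.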
# "NY_House_Dataset" is the dataset
# Visualizations of three columns: PRICE, BEDS, and PROPERTYSQFT
library(ggplot2)
# Histogram of property prices
ggplot(NY_House_Dataset, aes(x = PRICE)) +
  geom_histogram(binwidth = 100000, fill = "blue", color = "black", alpha = 0.7) +
  labs(title = "Distribution of Property Prices", x = "Property Price", y = "Frequency")
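# With prices ranging from $2,494 to about $2.147 billion (per the summary above), a fixed bin
# width of 100,000 implies more than 21,000 bins, most of them empty. The sketch below uses
# base R on made-up prices to show how binning on a log10 scale keeps the number of bins
# small. The toy prices and the toy_* names are assumptions. With ggplot2 the same idea is
# usually expressed by adding scale_x_log10() to the plot.

```r
# Made-up prices spanning several orders of magnitude.
toy_prices <- c(2500, 150000, 300000, 450000, 825000, 1500000, 5000000, 2000000000)

# Number of fixed-width bins needed to cover the range at a width of 100000.
toy_n_linear_bins <- ceiling(diff(range(toy_prices)) / 100000)

# Bin on the log10 scale instead: one bin per order of magnitude from 10^3 to 10^10.
toy_log_hist <- hist(log10(toy_prices), breaks = seq(3, 10, by = 1), plot = FALSE)

print(toy_n_linear_bins)
print(toy_log_hist$counts)
```

# Setting plot = TRUE (the default) would draw the log-scale histogram instead of only
# returning the counts.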

# Scatter plot of bedrooms vs. square footage
ggplot(NY_House_Dataset, aes(x = BEDS, y = PROPERTYSQFT)) +
  geom_point() +
  labs(title = "Scatter Plot of Bedrooms vs. Square Footage", x = "Number of Bedrooms", y = "Property Square Footage")

# **Numeric Summary Explanation:**
# The summary statistics give a snapshot of the central tendency and range of the key variables. The mean property price is about $2.36 million, while the median is only $825,000, so a small number of very expensive listings pulls the mean upward. Prices range from $2,494 to about $2.15 billion, and extremes on that scale should be checked as possible outliers before further analysis.
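# One common way to flag candidate outliers is the 1.5 x IQR rule: values more than 1.5
# interquartile ranges beyond the quartiles are marked for review. The sketch below applies it
# to made-up prices. The rule is a swapped-in convention, not something the original analysis
# uses, and the toy values are assumptions.

```r
# Made-up prices with one extreme value.
toy_outlier_prices <- c(250000, 400000, 500000, 825000, 1200000, 1500000, 2147483647)

# First and third quartiles (R's default quantile definition).
toy_q <- quantile(toy_outlier_prices, c(0.25, 0.75))
toy_iqr_width <- toy_q[[2]] - toy_q[[1]]

# Fences: anything outside them is flagged, not automatically removed.
toy_lower_fence <- toy_q[[1]] - 1.5 * toy_iqr_width
toy_upper_fence <- toy_q[[2]] + 1.5 * toy_iqr_width

toy_flagged <- toy_outlier_prices[
  toy_outlier_prices < toy_lower_fence | toy_outlier_prices > toy_upper_fence
]
print(toy_flagged)
```

# Flagged values still need a manual look: an extreme price can be a genuine luxury listing
# or a data-entry error.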
# **Novel Questions Explanation:**
# - **Question 1 (Relationship between Property Size and Price):** The script above does not yet compute this relationship. If a positive correlation between PROPERTYSQFT and PRICE is found, it would suggest that larger properties tend to command higher prices, which matters to both buyers and sellers when judging market value.
# - **Question 2 (Neighborhood-wise Property Price Distribution):** Average prices differ widely by LOCALITY, from about $330,600 for "The Bronx" to about $3.19 million for "New York", which indicates that real estate markets differ significantly by location. Recognizing these differences is essential for making informed decisions, such as where to invest or which neighborhoods align with specific preferences or budget constraints.
# **Aggregation Function Explanation:**
# Aggregating average property prices by neighborhood allows us to discern patterns in real estate pricing at a localized level. Neighborhoods with higher average prices might be considered more affluent or desirable, while those with lower averages might offer more affordable housing options.
# **Visual Summaries Explanation:**
# - **Histogram of Property Prices:** The right-skewed distribution in the histogram indicates that the majority of properties fall within the lower price range, with a long tail of expensive listings. This suggests a concentration of more affordable properties, potentially catering to a specific market segment.
# - **Scatter Plot of Bedrooms vs. Square Footage:** The positive trend in the scatter plot matches the intuitive expectation that larger properties tend to have more bedrooms. This relationship can influence property valuations and buyer preferences.
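# The summary at the top prints the same value, 2184, for the median, mean, and third quartile
# of PROPERTYSQFT, which hints that one value is repeated across many rows. Whether that value
# is an imputed default is an assumption, not something the source states. The sketch below
# measures how dominant the most common value is, using made-up numbers.

```r
# Made-up square footage values with one heavily repeated entry.
toy_sqft <- c(900, 2184, 2184, 1500, 2184, 3200, 2184, 750)

# Frequency of each distinct value, most frequent first.
toy_value_counts <- sort(table(toy_sqft), decreasing = TRUE)

# Most common value and the share of rows that carry it.
toy_most_common <- as.numeric(names(toy_value_counts)[1])
toy_share <- as.numeric(toy_value_counts[1]) / length(toy_sqft)

print(toy_most_common)
print(toy_share)
```

# A large share would mean the scatter plot shows a horizontal band at that value, which can
# weaken or distort the apparent bedroom-to-size relationship.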
# **Further Questions based on Insights:**
# - **For the Relationship between Property Size and Price:**
#   - What other factors, such as location or property features, contribute to variations in property prices?
#   - How does the relationship between size and price differ for different property types (e.g., condos, houses)?
# - **For Neighborhood-wise Property Price Distribution:**
#   - What socio-economic factors or amenities might explain the observed neighborhood-wise price differences?
#   - Are there any historical trends in property prices within specific neighborhoods that could impact future predictions?
# - **For Addressing Questions using Aggregation:**
#   - What are the characteristics of neighborhoods with exceptionally high or low average property prices?
#   - How stable are the average prices over time for different neighborhoods?
# - **For Visual Summaries:**
#   - Can we identify specific price ranges that dominate the market, and what types of properties fall within these ranges?
#   - How does the correlation between bedrooms and square footage vary across different property categories?
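# The last question above can be explored by computing the correlation separately for each
# property type. The sketch below splits a made-up data frame by TYPE and computes one
# correlation per group. The TYPE labels and all values are assumptions for illustration.

```r
# Made-up listings for two property types.
toy_by_type <- data.frame(
  TYPE         = rep(c("Condo for sale", "House for sale"), each = 4),
  BEDS         = c(1, 2, 2, 3, 2, 3, 4, 5),
  PROPERTYSQFT = c(600, 900, 950, 1300, 1200, 1600, 2100, 2600),
  stringsAsFactors = FALSE
)

# split() yields one data frame per type; sapply() then computes one
# Pearson correlation between BEDS and PROPERTYSQFT for each of them.
toy_cor_by_type <- sapply(
  split(toy_by_type, toy_by_type$TYPE),
  function(d) cor(d$BEDS, d$PROPERTYSQFT)
)
print(toy_cor_by_type)
```

# Groups with very few rows or no variation in BEDS give unstable or undefined correlations,
# so group sizes are worth checking alongside the result.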