This report explores housing market data using the dataset available at: https://www.lock5stat.com/datasets3e/HomesForSale.csv. With 120 observations and five variables, the dataset captures key characteristics of homes for sale in California (CA), New Jersey (NJ), New York (NY), and Pennsylvania (PA) in 2019.
The goal of this analysis is to investigate how factors such as home size, number of bedrooms, and number of bathrooms influence the asking price of homes, both individually and collectively. In addition, we examine whether there are significant differences in home prices across the four states. The study uses simple and multiple linear regression models, ANOVA, and graphical visualizations to answer five core research questions.
We begin with California, plotting asking price against home size:
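```r
library(ggplot2)

# Load the data and keep only the California listings
home <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
CA_home <- subset(home, State == "CA")

# Scatterplot of price against size with a least-squares line and 95% confidence band;
# stating the formula explicitly avoids geom_smooth()'s "using formula" message
ggplot(CA_home, aes(x = Size, y = Price)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(title = "Home Price vs Size in California",
       x = "Size (1,000 sq. ft.)",
       y = "Price ($1,000's)")
```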
Size has a significant positive association with price. The slope of the fitted line estimates the average increase in asking price (in $1,000s) for each additional 1,000 sq. ft.
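The slope and its p-value can be read directly from the simple linear regression behind this plot. The sketch below assumes the same CSV URL and the column names (`State`, `Price`, `Size`) used above. The object names `fit_size`, `slope_size`, `p_size`, and `ci_size` are illustrative and not part of the original analysis.

```r
# Sketch: simple linear regression of Price on Size for California homes
home <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
CA_home <- subset(home, State == "CA")

# Price is in $1,000s and Size in 1,000 sq. ft., so the slope is the
# estimated price change ($1,000s) per additional 1,000 sq. ft.
fit_size <- lm(Price ~ Size, data = CA_home)
print(summary(fit_size))

# Extract the slope, its p-value, and a 95% confidence interval for it
slope_size <- coef(fit_size)[["Size"]]
p_size <- coef(summary(fit_size))["Size", "Pr(>|t|)"]
ci_size <- confint(fit_size, "Size", level = 0.95)
cat("Slope:", round(slope_size, 2), " p-value:", signif(p_size, 3), "\n")
print(ci_size)
```

A confidence interval that lies entirely above zero carries the same information as a small p-value for a positive slope. It also shows the plausible size of the effect.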
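```r
# Violin plots show the shape of the price distribution for each bedroom count;
# the narrow boxplots inside mark the median and quartiles
ggplot(CA_home, aes(x = factor(Beds), y = Price)) +
  geom_violin(fill = "lightblue") +
  geom_boxplot(width = 0.1) +
  labs(title = "Price Distribution by Number of Bedrooms",
       x = "Number of Bedrooms",
       y = "Price ($1,000's)")
```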
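The p-value for the Beds coefficient in a regression of Price on Beds indicates whether the relationship is statistically significant. If p < 0.05, homes with more bedrooms tend to have higher prices. The association is still weak, as the heavily overlapping price ranges across bedroom counts show.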
```r
# Boxplots of price for each bathroom count in California
ggplot(CA_home, aes(x = factor(Baths), y = Price)) +
  geom_boxplot(fill = "lightgreen") +
  labs(title = "Price by Number of Bathrooms",
       x = "Number of Bathrooms",
       y = "Price ($1,000's)")
```
Homes with more bathrooms generally have higher prices, although prices vary widely within each bathroom count. A small p-value (below 0.05) for the Baths coefficient indicates that the number of bathrooms is significantly associated with price.
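The Baths p-value can be obtained the same way. This is a minimal sketch assuming the same data and column names, and `fit_baths`, `baths_table`, and `p_baths` are illustrative names. One design choice differs from the boxplot above. Here Baths enters the model as a number, so the regression fits a single slope (price change per additional bathroom). The plot instead uses `factor(Baths)`, which treats each bathroom count as its own group.

```r
# Sketch: regression of Price on Baths for California homes
home <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
CA_home <- subset(home, State == "CA")

# Baths is numeric here, so the model estimates one slope
# rather than a separate mean for each bathroom count
fit_baths <- lm(Price ~ Baths, data = CA_home)
baths_table <- coef(summary(fit_baths))
print(baths_table)

p_baths <- baths_table["Baths", "Pr(>|t|)"]
cat("Slope per bathroom ($1,000s):", round(baths_table["Baths", "Estimate"], 2),
    " p-value:", signif(p_baths, 3), "\n")
```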
```r
library(GGally)

# Scatterplot matrix: pairwise scatterplots, densities, and correlations
# for size, bedrooms, bathrooms, and price
ggpairs(CA_home, columns = c("Size", "Beds", "Baths", "Price"))
```
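The scatterplot matrix shows that Size, Beds, and Baths are each correlated with price and with one another. Because the predictors overlap, their separate contributions have to be judged jointly in a multiple regression. After controlling for the other variables, Size and Baths influence price more strongly than Beds.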
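"Controlling for other factors" requires fitting the predictors together, which the scatterplot matrix alone does not do. The sketch below assumes the same data and column names, and `fit_all` and `fit_size` are illustrative names. The nested-model F-test at the end is one reasonable way to check whether Beds and Baths add to Size. It is not necessarily the comparison the original analysis used.

```r
# Sketch: multiple regression of Price on Size, Beds, and Baths (California)
home <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
CA_home <- subset(home, State == "CA", select = c(Size, Beds, Baths, Price))
CA_home <- na.omit(CA_home)   # ensure both models below use identical rows

# Pairwise correlations: the numeric counterpart of the ggpairs plot
print(round(cor(CA_home), 2))

# Each coefficient estimates one predictor's effect with the other two held constant
fit_all <- lm(Price ~ Size + Beds + Baths, data = CA_home)
print(summary(fit_all))

# Partial F-test: do Beds and Baths add explanatory power beyond Size alone?
fit_size <- lm(Price ~ Size, data = CA_home)
print(anova(fit_size, fit_all))
```

Strong correlation among the predictors (for example, between Beds and Size) can inflate standard errors. That can make one coefficient look weak even when the variable matters on its own.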
```r
# Compare price distributions across all four states (full dataset, not only CA)
ggplot(home, aes(x = State, y = Price, fill = State)) +
  geom_boxplot() +
  labs(title = "Price Differences Among States",
       x = "State",
       y = "Price ($1,000's)") +
  theme(legend.position = "none")
```
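The boxplots show that California (CA) and New York (NY) tend to have higher median home prices than New Jersey (NJ) and Pennsylvania (PA). A one-way ANOVA of Price by State tests whether mean prices differ across the four states. A p-value below 0.05 indicates that at least one state's mean price differs from the others.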
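The ANOVA described here can be run as shown below. This is a minimal sketch assuming the same data source, and `fit_state`, `state_table`, and `p_state` are illustrative names. The Tukey HSD step is an optional follow-up not mentioned in the original text. A significant ANOVA only says that some difference exists, and Tukey's procedure identifies which pairs of states differ while adjusting for multiple comparisons.

```r
# Sketch: one-way ANOVA of Price across the four states
home <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
home$State <- factor(home$State)

# Mean asking price ($1,000s) in each state
print(tapply(home$Price, home$State, mean))

fit_state <- aov(Price ~ State, data = home)
state_table <- summary(fit_state)[[1]]   # Df, Sum Sq, Mean Sq, F value, Pr(>F)
print(state_table)
p_state <- state_table[["Pr(>F)"]][1]     # row 1 is State; row 2 (Residuals) is NA

# Optional follow-up: pairwise comparisons with Tukey's HSD
if (p_state < 0.05) print(TukeyHSD(fit_state))
```

ANOVA assumes roughly equal spread and approximately normal residuals within each state. The boxplots give a quick visual check of the spread.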
The data analysis conducted in this report provides insights into the relationships between home characteristics and their asking prices. Using statistical models and visualization techniques, we addressed the five research questions posed at the start of the project. Major findings are summarized below:
Q1. Home Size and Price: In California, larger home sizes are significantly associated with higher asking prices.
Q2. Bedrooms and Price: The number of bedrooms shows some positive influence on price, although the effect is weaker compared to home size.
Q3. Bathrooms and Price: The number of bathrooms has a significant positive relationship with home price, suggesting that additional bathrooms add considerable value.
Q4. Combined Effect of Size, Bedrooms, and Bathrooms: When analyzed together, home size and number of bathrooms remain strong predictors of price, while the effect of bedrooms diminishes after controlling for size and baths.
Q5. Price Differences Among States: Significant differences in average home prices exist across the four states. California and New York homes generally command higher prices compared to homes in New Jersey and Pennsylvania.
These findings highlight how specific home features contribute to price variation within states and across markets. Understanding these factors can be valuable for buyers, sellers, and real estate professionals aiming to make informed housing decisions.