This report explores the relationships between home features and their prices using the HomesForSale dataset, which contains data on homes for sale in four states: California (CA), New York (NY), New Jersey (NJ), and Pennsylvania (PA). For California homes, it examines how size, the number of bedrooms, and the number of bathrooms relate to price. It also examines whether home prices differ significantly among the four states.
Using regression analysis and ANOVA, this report aims to provide a clearer understanding of the factors associated with home prices and of the role geographic location plays in price variation. The results can help home buyers, sellers, and policymakers understand housing market dynamics.
The following research questions are addressed in this report:
1. How much does the size of a home influence its price in California?
2. How does the number of bedrooms of a home influence its price in California?
3. How does the number of bathrooms of a home influence its price in California?
4. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price in California?
5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)?
The HomesForSale dataset contains detailed information about homes listed for sale across four states in the United States in 2019: California (CA), New Jersey (NJ), New York (NY), and Pennsylvania (PA). It provides valuable insights into the factors that influence home prices, such as location, size, and the number of bedrooms and bathrooms.
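The dataset includes 120 observations, each representing an individual home, with five variables: State (CA, NJ, NY, or PA), Price (asking price, in thousands of dollars), Size (in square feet), Beds (number of bedrooms), and Baths (number of bathrooms).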
The data was collected in 2019 from Zillow, a popular real estate platform. The dataset reflects a representative sample of homes listed for sale in each state during that year.
This dataset enables analyses such as modeling price as a function of home features and comparing prices across states.
The following sections explore each research question in detail.
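# Load the HomesForSale data and keep only the California homes
HomesForSale <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
home_CA <- subset(HomesForSale, State == "CA")
# Simple linear regression of Price on Size
model <- lm(Price ~ Size, data = home_CA)
summary(model)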
##
## Call:
## lm(formula = Price ~ Size, data = home_CA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -462.55 -139.69 39.24 147.65 352.21
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675 154.68102 -0.367 0.716145
## Size 0.33919 0.08558 3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
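This analysis indicates a significant positive relationship between home size and price in California. The regression model estimates that each additional square foot is associated with a price increase of about 0.33919 (Price is recorded in thousands of dollars, so roughly $339 per square foot). With a p-value of 0.00046, well below 0.001, there is strong evidence that a home's size is associated with its price. Size alone explains about 36% of the variation in price (R-squared = 0.3594, corresponding to a correlation of about 0.60), so the relationship is moderately strong rather than near-perfect. The analysis suggests that larger homes tend to be priced higher in California, which aligns with the expectation that home size is a key factor in determining real estate prices.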
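To make the slope concrete, the R sketch below reuses the coefficients printed by `summary(model)` to compute a fitted price, a 95% confidence interval for the Size slope, and the correlation implied by R-squared. It assumes Price is in thousands of dollars and Size in square feet, and the 2,000 sq ft home is a hypothetical example. With the data loaded, `predict(model, newdata = data.frame(Size = 2000))` and `confint(model)` would compute these directly from the fitted model.

```r
# Coefficients copied (rounded) from summary(model): Price ~ Size, CA homes.
# Assumption: Price is in thousands of dollars, Size in square feet.
b0 <- -56.81675   # intercept
b1 <- 0.33919     # slope for Size
se_b1 <- 0.08558  # standard error of the slope
df_resid <- 28    # residual degrees of freedom

# Fitted price for a hypothetical 2,000 sq ft California home
size_example <- 2000
pred_price <- b0 + b1 * size_example
print(pred_price)  # about 621.6, i.e. roughly $621,600

# 95% confidence interval for the slope: b1 +/- t* x SE
t_crit <- qt(0.975, df = df_resid)
ci_b1 <- b1 + c(-1, 1) * t_crit * se_b1
print(ci_b1)       # roughly 0.164 to 0.515; excludes 0

# In simple regression, |r| = sqrt(R-squared); the sign follows the slope
r <- sign(b1) * sqrt(0.3594)
print(r)           # about 0.60
```

Because the whole interval lies above zero, it tells the same story as the small p-value, while also showing the plausible range of the per-square-foot effect.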
# Simple linear regression of Price on the number of bedrooms (CA homes only)
model_beds <- lm(Price ~ Beds, data = home_CA)
summary(model_beds)
##
## Call:
## lm(formula = Price ~ Beds, data = home_CA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -413.83 -236.62 29.94 197.69 570.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 269.76 233.62 1.155 0.258
## Beds 84.77 72.91 1.163 0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
The regression of home price on the number of bedrooms in California gives an estimated coefficient of 84.77 for bedrooms. However, the p-value is 0.255, so this relationship is not statistically significant at the 0.05 level, and bedrooms explain less than 5% of the variation in price (R-squared = 0.04605). In other words, the data provide no clear evidence that the number of bedrooms alone is associated with home prices in California. Other factors, such as home size or location, may be more influential in determining prices.
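The sketch below uses only the estimate and standard error printed by `summary(model_beds)` to show how the t statistic and two-sided p-value arise, and that the 95% confidence interval for the Beds slope contains zero, which is another way of seeing that the coefficient is not significant at the 0.05 level. The inputs are the rounded printed values, so the results match the output only approximately.

```r
# Estimate and standard error copied from summary(model_beds)
est <- 84.77
se <- 72.91
df_resid <- 28

t_stat <- est / se                        # about 1.163
p_val <- 2 * pt(-abs(t_stat), df_resid)   # two-sided p-value, about 0.255
ci <- est + c(-1, 1) * qt(0.975, df_resid) * se

print(c(t = t_stat, p = p_val))
print(ci)  # the interval spans both negative and positive values
```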
# Simple linear regression of Price on the number of bathrooms (CA homes only)
model_baths <- lm(Price ~ Baths, data = home_CA)
summary(model_baths)
##
## Call:
## lm(formula = Price ~ Baths, data = home_CA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -374.93 -181.56 -2.74 152.31 614.81
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 90.71 148.57 0.611 0.54641
## Baths 194.74 62.28 3.127 0.00409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared: 0.2588, Adjusted R-squared: 0.2324
## F-statistic: 9.779 on 1 and 28 DF, p-value: 0.004092
The regression for California homes estimates that each additional bathroom is associated with a price increase of about 194.74 (in thousands of dollars, or roughly $195,000). With a p-value of 0.00409, this relationship is statistically significant at the 0.01 level, indicating that homes with more bathrooms tend to be priced higher; the number of bathrooms explains about 26% of the variation in price (R-squared = 0.2588). This result aligns with the expectation that more bathrooms increase a home's value and appeal in the California market.
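As a small illustration, the fitted line from `summary(model_baths)` can be evaluated at 2 and 3 bathrooms; the difference between the two fitted prices is exactly the slope. This is a sketch using the printed (rounded) coefficients, and the conversion to dollars assumes Price is recorded in thousands of dollars.

```r
# Coefficients copied from summary(model_baths)
b0 <- 90.71    # intercept
b1 <- 194.74   # slope for Baths

# Fitted price (thousands of dollars) for a given number of bathrooms
fitted_price <- function(baths) b0 + b1 * baths

prices <- fitted_price(c(2, 3))
print(prices)        # 480.19 and 674.93
print(diff(prices))  # 194.74, i.e. about $194,740 per extra bathroom
```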
# Multiple regression: Size, Beds, and Baths as joint predictors of Price
model_multiple <- lm(Price ~ Size + Beds + Baths, data = home_CA)
summary(model_multiple)
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = home_CA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -415.47 -130.32 19.64 154.79 384.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.5608 210.3809 -0.198 0.8449
## Size 0.2811 0.1189 2.364 0.0259 *
## Beds -33.7036 67.9255 -0.496 0.6239
## Baths 83.9844 76.7530 1.094 0.2839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3209
## F-statistic: 5.568 on 3 and 26 DF, p-value: 0.004353
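The multiple regression shows that home size remains a significant predictor of price (p = 0.0259): holding the number of bedrooms and bathrooms constant, larger homes tend to be more expensive. In contrast, neither the number of bedrooms nor the number of bathrooms is significant in this model, with p-values of 0.6239 and 0.2839, respectively. Because bathrooms were significant on their own, their loss of significance here likely reflects correlation between the number of bathrooms and home size, which leaves bathrooms little additional information to contribute once size is in the model. The adjusted R-squared (0.3209) is also slightly lower than for size alone (0.3365). This suggests that in California, home size plays the most prominent role in determining property value, while the number of bedrooms and bathrooms adds little beyond it in this dataset.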
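One way to check whether Beds and Baths add anything beyond Size is a nested-model F-test. With the data loaded, `anova(model, model_multiple)` performs this test; the sketch below approximates it from the rounded R-squared values printed above, which is valid because both models are fitted to the same 30 California homes.

```r
# R-squared values copied from summary(model) and summary(model_multiple)
r2_reduced <- 0.3594  # Price ~ Size
r2_full <- 0.3912     # Price ~ Size + Beds + Baths
q <- 2                # extra predictors in the full model (Beds, Baths)
df_full <- 26         # residual degrees of freedom of the full model

# F = [(R2_full - R2_reduced) / q] / [(1 - R2_full) / df_full]
f_stat <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_full)
p_val <- pf(f_stat, q, df_full, lower.tail = FALSE)

print(c(F = f_stat, p = p_val))  # F about 0.68, p about 0.52
```

A p-value this large means that adding bedrooms and bathrooms does not significantly improve on the size-only model, consistent with the drop in adjusted R-squared.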
# One-way ANOVA comparing mean Price across the four states (all 120 homes)
anova_model <- aov(Price ~ State, data = HomesForSale)
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## State 3 1198169 399390 7.355 0.000148 ***
## Residuals 116 6299266 54304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The one-way ANOVA shows significant differences in mean home prices across the four states (F = 7.355 on 3 and 116 degrees of freedom, p = 0.000148). This low p-value indicates that the mean home price in at least one state differs from the others, although the ANOVA by itself does not identify which states differ. These findings highlight how location influences property values, emphasizing regional differences that can be important for real estate decisions. Recognizing these variations can help buyers, sellers, and investors make better choices when navigating the housing market across different states.
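The F statistic in the ANOVA table is the ratio of the two mean squares. The sketch below recomputes it, together with the p-value and the share of price variation explained by state (eta-squared), from the printed sums of squares. Identifying which specific states differ would need a follow-up comparison such as `TukeyHSD(anova_model)`, which is not run here.

```r
# Sums of squares and degrees of freedom copied from summary(anova_model)
ss_state <- 1198169
df_state <- 3
ss_resid <- 6299266
df_resid <- 116

ms_state <- ss_state / df_state   # mean square for State, about 399390
ms_resid <- ss_resid / df_resid   # residual mean square, about 54304
f_stat <- ms_state / ms_resid     # about 7.355
p_val <- pf(f_stat, df_state, df_resid, lower.tail = FALSE)

# Eta-squared: fraction of total variation in Price explained by State
eta_sq <- ss_state / (ss_state + ss_resid)  # about 0.16

print(c(F = f_stat, p = p_val, eta_sq = eta_sq))
```

The eta-squared of about 0.16 complements the p-value: the differences between states are statistically clear, yet most of the variation in price still lies within states.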
The analysis reveals key factors associated with home prices. In California, larger homes are significantly more expensive, the number of bedrooms shows no statistically significant relationship with price, and the number of bathrooms is significantly and positively associated with price. When size, bedrooms, and bathrooms are analyzed together, only size remains a significant predictor, emphasizing its importance in home valuations.
Additionally, mean home prices differ significantly across California, New York, New Jersey, and Pennsylvania, indicating that location plays an important role in pricing differences. These insights highlight the importance of size and regional market trends for buyers, sellers, and investors navigating the real estate market.