Introduction

This report explores the relationships between home features and their prices using the HomesForSale dataset, which contains data on various attributes of homes across four states: California (CA), New York (NY), New Jersey (NJ), and Pennsylvania (PA). The dataset provides insights into how features such as size, the number of bedrooms, and the number of bathrooms influence home prices. Additionally, this analysis examines whether there are significant differences in home prices among the four states.

Using simple and multiple linear regression and a one-way ANOVA, the analysis identifies which features are associated with home prices in California and tests whether prices differ by state. The results can help home buyers, sellers, and policymakers understand housing market dynamics.

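The following research questions are addressed in this report:

  • How much does the size of a home influence its price in California?
  • How does the number of bedrooms of a home influence its price in California?
  • How does the number of bathrooms of a home influence its price in California?
  • How do size, the number of bedrooms, and the number of bathrooms jointly influence a home’s price in California?
  • Are there significant differences in home prices among the four states (CA, NY, NJ, PA)?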

Data

The HomesForSale dataset contains detailed information about homes listed for sale across four states in the United States in 2019: California (CA), New Jersey (NJ), New York (NY), and Pennsylvania (PA). It provides valuable insights into the factors that influence home prices, such as location, size, and the number of bedrooms and bathrooms.

Dataset Structure

The dataset includes 120 observations, each representing an individual home, with the following five variables:

  • State: The state where the home is located (CA, NJ, NY, or PA).
  • Price: The asking price of the home (in $1,000s).
  • Size: The area of all rooms combined (in square feet).
  • Beds: The number of bedrooms.
  • Baths: The number of bathrooms.

Data Collection Process

The data were collected in 2019 from Zillow, a popular real estate platform. The dataset is a sample of homes listed for sale in each state during that year.

This dataset enables analyses such as:

  • Evaluating the impact of home size, number of bedrooms, and number of bathrooms on the asking price.
  • Comparing average home prices across different states to determine how location influences pricing.

Analysis

I will now explore each question in detail:

# Load the HomesForSale data directly from the Lock5 website
HomesForSale <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")

Question 1: How much does the size of a home influence its price in California?

# Keep only the California homes for Questions 1 to 4
home_CA <- subset(HomesForSale, State == "CA")

model <- lm(Price ~ Size, data = home_CA)

summary(model)
## 
## Call:
## lm(formula = Price ~ Size, data = home_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634

This analysis indicates a moderately strong positive relationship between home size and price in California. R² is 0.3594, which corresponds to a correlation of about 0.60. The regression model shows that each one-unit increase in Size is associated with an increase of about 0.339 in Price. Because Price is recorded in $1,000s, that is roughly $339. The p-value is 0.000463, well below 0.001, so a home’s size has a statistically significant association with its price. Larger homes tend to be priced higher in California, which aligns with the expectation that size is an important factor in real estate prices. However, size alone explains only about 36% of the variation in asking prices, so other factors also matter.
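The R sketch below makes the slope concrete. It uses only the numbers printed by `summary(model)` above, copied as rounded there, so it runs without downloading the data. It computes three things:

  • the predicted price gap for a 100-unit difference in Size (the 100-unit step is an arbitrary illustrative choice);
  • a 95% confidence interval for the slope;
  • the correlation implied by R², which equals the square root of R² (with the slope’s sign) only because this model has a single predictor.

```r
# Values copied from the summary(model) output for Price ~ Size (California)
b1 <- 0.33919       # slope: change in Price ($1,000s) per one-unit increase in Size
se_b1 <- 0.08558    # standard error of the slope
r_squared <- 0.3594 # multiple R-squared
df_resid <- 28      # residual degrees of freedom

# Predicted price difference for two homes 100 Size units apart (in $1,000s)
price_gap <- b1 * 100

# 95% confidence interval for the slope: estimate +/- t critical value * SE
t_crit <- qt(0.975, df = df_resid)
slope_ci <- b1 + c(-1, 1) * t_crit * se_b1

# With a single predictor, |r| = sqrt(R^2), and r takes the sign of the slope
r <- sign(b1) * sqrt(r_squared)

cat("Price gap per 100 units of Size ($1,000s):", round(price_gap, 3), "\n")
cat("95% CI for slope:", round(slope_ci, 4), "\n")
cat("Correlation r:", round(r, 3), "\n")
```

The interval stays above zero, matching the small p-value. A correlation of about 0.60 is better described as moderate than as high. With the full data loaded, `confint(model)` would give essentially the same interval from the unrounded estimates.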

Question 2: How does the number of bedrooms of a home influence its price in California?

model_beds <- lm(Price ~ Beds, data = home_CA)

summary(model_beds)
## 
## Call:
## lm(formula = Price ~ Beds, data = home_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548

The regression of home price on the number of bedrooms in California gives a Beds coefficient of 84.77. Each additional bedroom is therefore associated with an asking price about $84,770 higher. However, the p-value is 0.255, so this relationship is not statistically significant at the 0.05 level. The model also explains less than 5% of the variation in price (R² = 0.046). In other words, the data provide no clear evidence that the number of bedrooms alone is related to home prices in California. Other factors, such as home size or location, may be more influential in determining prices.
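The significance decision follows directly from the coefficient table. The minimal sketch below reproduces the t statistic and two-sided p-value with base R’s `pt()`. It uses only the rounded estimate and standard error printed by `summary(model_beds)`.

```r
# Estimate and standard error for Beds, copied from summary(model_beds)
estimate <- 84.77
std_error <- 72.91
df_resid <- 28

# t statistic: how many standard errors the estimate lies from zero
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with 28 residual degrees of freedom
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# Compare with the conventional 0.05 significance level
significant <- p_value < 0.05

cat("t =", round(t_value, 3), " p =", round(p_value, 4), " significant:", significant, "\n")
```

The reproduced p-value of about 0.255 is far above 0.05. The test therefore cannot rule out a true Beds coefficient of zero.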

Question 3: How does the number of bathrooms of a home influence its price in California?

model_baths <- lm(Price ~ Baths, data = home_CA)

summary(model_baths)
## 
## Call:
## lm(formula = Price ~ Baths, data = home_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092

The regression analysis for California homes shows that each additional bathroom is associated with an increase of about 194.74 in Price. Because Price is measured in $1,000s, that is roughly $194,740. The p-value is 0.00409, so this relationship is statistically significant at the 0.01 level: homes with more bathrooms tend to have higher asking prices. This result aligns with the expectation that more bathrooms increase a home’s value and appeal in the California market.
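Because Price is recorded in $1,000s, the Baths coefficient needs a unit conversion before it can be quoted in dollars. The sketch below takes the estimate and standard error printed by `summary(model_baths)` and builds a t-based 95% confidence interval with 28 residual degrees of freedom. It is an illustration built on the rounded output, not a refit of the model.

```r
# Baths coefficient and standard error from summary(model_baths); Price is in $1,000s
baths_coef <- 194.74
baths_se <- 62.28
df_resid <- 28

# Convert the coefficient from thousands of dollars to dollars
dollars_per_bath <- baths_coef * 1000

# 95% confidence interval for the coefficient, converted to dollars
t_crit <- qt(0.975, df = df_resid)
ci_dollars <- (baths_coef + c(-1, 1) * t_crit * baths_se) * 1000

cat("Per bathroom: $", format(round(dollars_per_bath), big.mark = ","), "\n", sep = "")
cat("95% CI: $", paste(format(round(ci_dollars), big.mark = ",", trim = TRUE), collapse = " to $"), "\n", sep = "")
```

The interval excludes zero, consistent with the p-value of 0.00409. It is also wide, so the per-bathroom dollar figure is an imprecise estimate from only 30 homes.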

Question 4: How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price in California?

model_multiple <- lm(Price ~ Size + Beds + Baths, data = home_CA)

summary(model_multiple)
## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = home_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353

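The multiple regression analysis shows that, holding the number of bedrooms and bathrooms constant, home size is significantly associated with price (p = 0.0259): larger homes tend to be more expensive. Neither the number of bedrooms nor the number of bathrooms is significant once size is in the model, with p-values of 0.6239 and 0.2839, respectively. The Beds coefficient is even slightly negative. Bathrooms were significant on their own in Question 3, so their apparent effect there was likely largely captured by size here, since larger homes usually have more bathrooms. This suggests that in California, home size plays a more prominent role in determining property value. In this dataset, the number of bedrooms and bathrooms adds little information beyond size.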
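Comparing the two California models side by side clarifies why Size dominates. Adding Beds and Baths raises the multiple R² from 0.3594 to 0.3912. Adjusted R², which penalizes extra predictors, drops from 0.3365 to 0.3209. The sketch below reproduces both adjusted values from the printed R² using the standard formula. The sample size n = 30 is inferred from the 28 residual degrees of freedom of the Size-only model.

```r
# Adjusted R^2 penalizes R^2 for the number of predictors p given n observations:
#   adj_R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)
adjusted_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 30  # California homes: 28 residual df + 2 coefficients in the Size-only model

# Multiple R^2 values copied from the two summary() outputs
adj_size_only <- adjusted_r2(0.3594, n = n, p = 1)  # Price ~ Size
adj_full      <- adjusted_r2(0.3912, n = n, p = 3)  # Price ~ Size + Beds + Baths

cat("Adjusted R^2, Size only:", round(adj_size_only, 4), "\n")
cat("Adjusted R^2, Size + Beds + Baths:", round(adj_full, 4), "\n")
```

A lower adjusted R² for the larger model suggests that Beds and Baths add little explanatory power once Size is included. With the data loaded, a formal nested-model F-test could be run with `anova(model, model_multiple)`.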

Question 5: Are there significant differences in home prices among the four states (CA, NY, NJ, PA)?

anova_model <- aov(Price ~ State, data = HomesForSale)

summary(anova_model)
##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA shows significant differences in mean home prices across the four states (F = 7.355 on 3 and 116 degrees of freedom, p = 0.000148). This low p-value indicates that the mean asking price in at least one state differs from the others. The ANOVA alone does not identify which states differ. These findings highlight how location influences property values, emphasizing regional differences that can be important for real estate decisions. Recognizing these variations can help buyers, sellers, and investors make better choices when navigating the housing market across different states.
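The F statistic and p-value in the ANOVA table can be rebuilt from its sums of squares. The same numbers also give a simple effect size, eta squared: the State sum of squares divided by the total. The sketch below uses only the printed table values and base R’s `pf()`.

```r
# Sums of squares and degrees of freedom copied from summary(anova_model)
ss_state <- 1198169; df_state <- 3
ss_resid <- 6299266; df_resid <- 116

# Mean squares are sums of squares divided by their degrees of freedom
ms_state <- ss_state / df_state
ms_resid <- ss_resid / df_resid

# F statistic: between-state variance relative to within-state variance
f_value <- ms_state / ms_resid

# Upper-tail p-value from the F distribution with (3, 116) degrees of freedom
p_value <- pf(f_value, df_state, df_resid, lower.tail = FALSE)

# Eta squared: share of the total variation in Price explained by State
eta_sq <- ss_state / (ss_state + ss_resid)

cat("F =", round(f_value, 3), " p =", signif(p_value, 3), " eta^2 =", round(eta_sq, 3), "\n")
```

An eta squared of about 0.16 means State explains roughly 16% of the variation in asking prices. Because the ANOVA only says that some state means differ, a post-hoc comparison such as `TukeyHSD(anova_model)` would be a natural next step to see which pairs of states differ.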

Summary

The analysis reveals key factors influencing home prices. In California, larger homes are significantly more expensive, and the number of bathrooms is positively and significantly related to price. The number of bedrooms shows no significant relationship. When size, bedrooms, and bathrooms are analyzed together, only size remains significant, emphasizing its importance in home valuations.

Additionally, average home prices differ significantly across California, New York, New Jersey, and Pennsylvania, with location playing a significant role in pricing differences. These insights highlight the importance of size and regional market trends for buyers, sellers, and investors navigating the real estate market.

References