1. Introduction

This project explores how several factors influence home prices, using the “HomesForSale” dataset. The analysis first focuses on homes in California, examining how size, number of bedrooms, and number of bathrooms affect price, both individually and jointly. It then compares prices across four states: California, New York, New Jersey, and Pennsylvania. Simple and multiple linear regression are used to identify significant predictors of price, and one-way ANOVA is used to test for differences among states. The project addresses the following questions:

Q1. Use the data only for California. How much does the size of a home influence its price?

Q2. Use the data only for California. How does the number of bedrooms of a home influence its price?

Q3. Use the data only for California. How does the number of bathrooms of a home influence its price?

Q4. Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

Q5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price. All data should be used.

2. Data

The dataset used in this project, “HomesForSale,” was sourced from the Lock5Stat website. It includes information on home prices and their attributes across several states in the United States. Key variables in the dataset include:

Price: The listed price of the home (in thousands of dollars).

Size: The total area of the home (in square feet).

Beds: The number of bedrooms in the home.

Baths: The number of bathrooms in the home.

State: The location of the home (CA, NY, NJ, PA).

The dataset supports a focused analysis of homes in California, to understand how specific features affect price, and a broader comparison across states, to determine regional pricing differences. Data preprocessing steps include filtering for the relevant states, checking for missing values, and ensuring variables are correctly formatted for regression and ANOVA.

home <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
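The preprocessing checks described above can be sketched as follows. This is a minimal sketch, assuming the CSV at the URL above is reachable and contains the columns State, Price, Size, Beds, and Baths used later in the analysis; the object name `homes_check` is introduced here only for illustration.

```r
# Load the data from the same URL used in this report (requires internet access)
homes_check <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")

# Inspect column names and types
str(homes_check)

# Count missing values in each column
colSums(is.na(homes_check))

# Number of homes listed per state
table(homes_check$State)

# Treat State as a categorical variable, as the ANOVA requires
homes_check$State <- factor(homes_check$State)
levels(homes_check$State)
```

If any column reported missing values, rows could be dropped with `na.omit()` before modeling; `lm()` and `aov()` also omit incomplete rows by default.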

3. Analysis

This section addresses each question in turn, showing the R code used and its output.

Q1. Use the data only for California. How much does the size of a home influence its price?

# Load data
home <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
# Filter data for California
home_CA <- subset(home, State == "CA")

# Regression model: Size vs Price
model_size <- lm(Price ~ Size, data = home_CA)
summary(model_size)
## 
## Call:
## lm(formula = Price ~ Size, data = home_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634
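To put the Size estimate in context, a 95% confidence interval for the slope can be rebuilt from the rounded values printed above. This is a sketch based only on the reported estimate, standard error, and residual degrees of freedom; the variable names are illustrative, and calling `confint(model_size)` on the fitted model should give the same interval up to rounding.

```r
# Values copied from the summary(model_size) output above
est_size <- 0.33919   # slope estimate for Size
se_size  <- 0.08558   # standard error of the slope
df_resid <- 28        # residual degrees of freedom

# t statistic: estimate divided by its standard error
t_size <- est_size / se_size

# 95% confidence interval: estimate +/- critical t value * standard error
t_crit  <- qt(0.975, df = df_resid)
ci_size <- est_size + c(-1, 1) * t_crit * se_size

# Two-sided p-value for the slope
p_size <- 2 * pt(abs(t_size), df = df_resid, lower.tail = FALSE)

round(c(t = t_size, lower = ci_size[1], upper = ci_size[2]), 4)
p_size
```

The resulting interval excludes zero, in line with the small p-value reported for Size.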

Q2. Use the data only for California. How does the number of bedrooms of a home influence its price?

# Regression model: Bedrooms vs Price
model_bedrooms <- lm(Price ~ Beds, data = home_CA)
summary(model_bedrooms)
## 
## Call:
## lm(formula = Price ~ Beds, data = home_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548

Q3. Use the data only for California. How does the number of bathrooms of a home influence its price?

# Regression model: Bathrooms vs Price
model_bathrooms <- lm(Price ~ Baths, data = home_CA)
summary(model_bathrooms)
## 
## Call:
## lm(formula = Price ~ Baths, data = home_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092

Q4. Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

# Multiple regression model
model_multiple <- lm(Price ~ Size + Beds + Baths, data = home_CA)
summary(model_multiple)
## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = home_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353
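One way to judge whether Beds and Baths add explanatory power beyond Size is a partial F-test comparing the Size-only model with the three-predictor model. The sketch below approximates that test from the rounded R-squared values printed in the two summaries, so its result is approximate; with the fitted objects available, `anova(model_size, model_multiple)` performs the same comparison directly. The variable names are illustrative.

```r
# R-squared values copied from the summaries above
r2_reduced <- 0.3594   # Price ~ Size
r2_full    <- 0.3912   # Price ~ Size + Beds + Baths

q        <- 2    # number of predictors added (Beds and Baths)
df_resid <- 26   # residual degrees of freedom of the full model

# Partial F statistic for the added predictors
f_partial <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_resid)

# Upper-tail p-value from the F distribution
p_partial <- pf(f_partial, df1 = q, df2 = df_resid, lower.tail = FALSE)

c(F = f_partial, p = p_partial)
```

The F statistic here is below 1, which suggests that Beds and Baths contribute little once Size is in the model and is consistent with their non-significant coefficients above.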

Q5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price. All data should be used.

# ANOVA: State vs Price
model_anova <- aov(Price ~ State, data = home)
summary(model_anova)
##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

4. Conclusion

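This project analyzed factors influencing home prices using the “HomesForSale” dataset. Among the California homes, size is the strongest single predictor: the simple regression slope is 0.339 price units per square foot (p = 0.0005), and size alone explains about 36% of the variation in price. The number of bathrooms is also a significant predictor on its own (slope 194.74, p = 0.004, R-squared = 0.26), whereas the number of bedrooms is not (p = 0.255, R-squared = 0.05). In the joint model, the three predictors together explain about 39% of the variation (F = 5.568, p = 0.004), but only size remains significant (p = 0.026), and the adjusted R-squared (0.32) is slightly lower than that of the size-only model (0.34), so bedrooms and bathrooms add little once size is accounted for. Across all four states, the one-way ANOVA finds significant differences in mean price (F = 7.355 on 3 and 116 degrees of freedom, p = 0.00015), indicating that the state in which a home is located has a significant effect on its price.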