# Load the Homes For Sale data set and preview the first rows
home_data <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
head(home_data)

Introduction

This report examines how the features of a home relate to its price, using the Homes For Sale data set. Most of the analysis focuses on homes in California; the final part compares prices across four states: California, New York, New Jersey, and Pennsylvania. The goal is to see whether the size of a home, the number of bedrooms, or the number of bathrooms helps explain its price. I used linear regression to examine each of these variables and ANOVA to compare average home prices between states.

The report addresses five questions:

1. Use the data only for California. How much does the size of a home influence its price?

2. Use the data only for California. How does the number of bedrooms of a home influence its price?

3. Use the data only for California. How does the number of bathrooms of a home influence its price?

4. Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price. All data should be used.

Analysis

Each question is examined in turn below.

Question one: Use the data only for California. How much does the size of a home influence its price?

# Load tidyverse for filter()
library(tidyverse)

# Filter data for California
home_ca <- filter(home_data, State == "CA")

# Fit linear regression model: Price ~ Size
model_q1 <- lm(Price ~ Size, data = home_ca)

# Show model summary (gives slope, p-value, R-squared)
summary(model_q1)

I used a linear regression to see whether home size is related to price for homes in California. The estimated slope indicates that each extra square foot is associated with a price increase of about $339. The p-value was 0.00046, well below 0.05, so the relationship is statistically significant. The R-squared was 0.36, so about 36% of the variation in home prices is explained by size.

So yes, home size has a clear, statistically significant relationship with price in California, although most of the variation in price is left unexplained by size alone.
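The slope, p-value, and R-squared quoted above can be pulled out of a fitted model programmatically instead of being read off the printed summary. The sketch below is a minimal illustration on a small made-up data frame: `toy_homes`, its values, and its units are hypothetical and are not the Homes For Sale data. The same calls should work with `model_q1` in place of `toy_model`.

```r
# Hypothetical toy data, used only so the example runs on its own.
# Size is in square feet and Price in dollars here (assumed units).
toy_homes <- data.frame(
  Size  = c(900, 1200, 1500, 1800, 2100, 2400),
  Price = c(310000, 395000, 520000, 560000, 700000, 735000)
)

toy_model <- lm(Price ~ Size, data = toy_homes)

# Coefficient table: one row per term (estimate, std. error, t value, p-value)
coef_table <- coef(summary(toy_model))

slope     <- coef_table["Size", "Estimate"]
p_value   <- coef_table["Size", "Pr(>|t|)"]
r_squared <- summary(toy_model)$r.squared

# 95% confidence interval for the slope
slope_ci <- confint(toy_model, "Size", level = 0.95)

print(round(c(slope = slope, p_value = p_value, r_squared = r_squared), 4))
print(slope_ci)
```

Reporting the confidence interval alongside the slope shows how precisely the per-square-foot figure is estimated, which the p-value alone does not convey.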

Question two: Use the data only for California. How does the number of bedrooms of a home influence its price?

# Run regression: Price ~ Beds for CA homes
model_q2 <- lm(Price ~ Beds, data = home_ca)

# View summary
summary(model_q2)

I ran a regression to see whether the number of bedrooms is related to home prices in California. The estimated slope was about $84,770 per additional bedroom, but the p-value was 0.255, which is not statistically significant at the 0.05 level. The R-squared was only 0.046 (4.6%), so the model explains very little of the variation in price. The data therefore do not provide evidence that the number of bedrooms, on its own, is related to home price.
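In a regression with a single predictor, R-squared equals the squared correlation between the predictor and the response. This gives another way to read a small value like 4.6%: it corresponds to a correlation of only about 0.21 in absolute value. The sketch below checks the identity on made-up data; `toy_homes` and its values are hypothetical, not the Homes For Sale data.

```r
# Hypothetical toy data (not the Homes For Sale data)
toy_homes <- data.frame(
  Beds  = c(2, 3, 3, 4, 4, 5, 2, 3),
  Price = c(450, 300, 620, 380, 700, 520, 510, 410)  # assumed to be in $1000s
)

toy_model <- lm(Price ~ Beds, data = toy_homes)

r_squared   <- summary(toy_model)$r.squared
correlation <- cor(toy_homes$Beds, toy_homes$Price)

# For simple linear regression the two quantities agree
print(c(r_squared = r_squared, correlation_squared = correlation^2))

# Reading the reported value the other way round:
# correlation magnitude implied by an R-squared of 4.6% (about 0.21)
implied_correlation <- sqrt(0.046)
print(implied_correlation)
```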

Question three: Use the data only for California. How does the number of bathrooms of a home influence its price?

# Regression: Price ~ Baths for California homes
model_q3 <- lm(Price ~ Baths, data = home_ca)

# Show summary
summary(model_q3)

I ran a linear regression to see whether the number of bathrooms is related to home price in California. Each additional bathroom is associated with a price increase of about $194,740. The p-value was 0.0041, which means the relationship is statistically significant. The R-squared was about 0.26, so bathrooms explain roughly 26% of the variation in home prices. Overall, the number of bathrooms has a statistically significant relationship with home price in California when considered on its own.
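The "per extra bathroom" reading of the slope can be checked directly: the predicted prices for two homes that differ by one bathroom differ by exactly the fitted slope. The sketch below shows this on made-up data; `toy_homes` and its values are hypothetical, not the Homes For Sale data.

```r
# Hypothetical toy data (not the Homes For Sale data)
toy_homes <- data.frame(
  Baths = c(1, 2, 2, 3, 3, 4),
  Price = c(250, 420, 380, 610, 650, 800)  # assumed to be in $1000s
)

toy_model <- lm(Price ~ Baths, data = toy_homes)

# Predicted price for a 2-bathroom and a 3-bathroom home
predicted <- predict(toy_model, newdata = data.frame(Baths = c(2, 3)))

# The gap between the two predictions is the slope of the model
difference <- unname(predicted[2] - predicted[1])
slope      <- unname(coef(toy_model)["Baths"])

print(c(difference = difference, slope = slope))
```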

Question four: Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

# Multiple regression with all three predictors
model_q4 <- lm(Price ~ Size + Beds + Baths, data = home_ca)

# Show the summary
summary(model_q4)

I ran a multiple regression using size, number of bedrooms, and number of bathrooms to predict home price in California. Only size had a statistically significant p-value (0.026). Bedrooms (p = 0.624) and bathrooms (p = 0.284) were not significant once size was in the model. Bathrooms were significant on their own in question three, so this change suggests that bathrooms and size carry overlapping information, a common pattern when predictors are correlated with each other.

The model’s R-squared value was 0.39, so about 39% of the variation in home price can be explained by these three variables together, only slightly more than the 36% explained by size alone. Overall, size is the only predictor with a statistically significant relationship with price when all three are considered at once.
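One way to check whether predictors overlap is to look at their pairwise correlations and their variance inflation factors (VIFs). The VIF for a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the others. The sketch below computes both in base R on made-up data. `toy_homes`, its values, and the helper `vif_one()` are hypothetical and introduced only for illustration; this check was not part of the analysis above.

```r
# Hypothetical toy predictors (not the Homes For Sale data)
toy_homes <- data.frame(
  Size  = c(1.0, 1.4, 1.6, 2.0, 2.3, 2.7, 3.1, 3.4),  # assumed 1000s of sq ft
  Beds  = c(2, 2, 3, 3, 4, 3, 4, 5),
  Baths = c(1, 2, 2, 2, 3, 3, 3, 4)
)

# Pairwise correlations between the predictors
predictor_cor <- cor(toy_homes)
print(round(predictor_cor, 2))

# VIF for one predictor: regress it on the others, then 1 / (1 - R^2)
vif_one <- function(target, data) {
  others <- setdiff(names(data), target)
  fit <- lm(reformulate(others, response = target), data = data)
  1 / (1 - summary(fit)$r.squared)
}

vifs <- sapply(names(toy_homes), vif_one, data = toy_homes)
print(round(vifs, 2))
```

A VIF close to 1 means a predictor is nearly unrelated to the others; larger values mean its coefficient is estimated less precisely because the other predictors carry similar information.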

Question five: Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price. All data should be used.

# Fit a linear model with State as the predictor
model_q5 <- lm(Price ~ State, data = home_data)

# Run ANOVA
anova(model_q5)

I used ANOVA to test whether average home prices differ by state. The p-value was 0.00015, which is statistically significant. This means that at least one state has a different average home price from the others; the test by itself does not show which states differ. So yes, the state a home is in is significantly related to its price.
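To see which groups differ after a significant ANOVA, a common follow-up is to compare the group means and run a post-hoc test such as Tukey's HSD. The sketch below shows the calls on made-up data. `toy_prices` and its values are hypothetical and say nothing about the real states; this follow-up was not part of the analysis above.

```r
# Hypothetical toy data (not the Homes For Sale data); Price assumed in $1000s
toy_prices <- data.frame(
  State = factor(rep(c("CA", "NJ", "NY", "PA"), each = 4)),
  Price = c(520, 610, 480, 700,
            330, 290, 410, 360,
            380, 450, 300, 420,
            210, 260, 190, 280)
)

# Average price in each group
group_means <- tapply(toy_prices$Price, toy_prices$State, mean)
print(group_means)

# One-way ANOVA followed by Tukey's HSD for all pairwise comparisons
toy_aov <- aov(Price ~ State, data = toy_prices)
tukey   <- TukeyHSD(toy_aov)
print(tukey)
```

Each row of the Tukey table gives the difference between two group means, a confidence interval for it, and an adjusted p-value, so it shows which specific pairs differ.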

Conclusion

Home prices are related to both physical features and location. Size is the most important factor: it was significant on its own and remained the only significant predictor in the combined model. The number of bedrooms was not significantly related to price. The number of bathrooms was significant on its own but not after accounting for size. Average prices also differ significantly between the four states. California appears to be the most expensive, although the ANOVA reported here only establishes that the state averages are not all equal; confirming that ranking would require comparing the state means directly. Overall, larger homes tend to be priced higher, and the state a home is in matters as well.

Appendix

#Q1

# Load tidyverse for filter()
library(tidyverse)

# Filter data for California
home_ca <- filter(home_data, State == "CA")

# Fit linear regression model: Price ~ Size
model_q1 <- lm(Price ~ Size, data = home_ca)

# Show model summary (gives slope, p-value, R-squared)
summary(model_q1)

#Q2

# Run regression: Price ~ Beds for CA homes
model_q2 <- lm(Price ~ Beds, data = home_ca)

# View summary
summary(model_q2)

#Q3

# Regression: Price ~ Baths for California homes
model_q3 <- lm(Price ~ Baths, data = home_ca)

# Show summary
summary(model_q3)

#Q4

# Multiple regression with all three predictors
model_q4 <- lm(Price ~ Size + Beds + Baths, data = home_ca)

# Show the summary
summary(model_q4)

#Q5

# Fit a linear model with State as the predictor
model_q5 <- lm(Price ~ State, data = home_data)

# Run ANOVA
anova(model_q5)