Introduction

We use the HomesForSale data set from https://www.lock5stat.com/datapage3e.html

We address the following questions.

  1. Use the data only for California. How much does the size of a home influence its price?

  2. Use the data only for California. How does the number of bedrooms of a home influence its price?

  3. Use the data only for California. How does the number of bathrooms of a home influence its price?

  4. Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

  5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price. All data should be used.

Analysis

We will explore the questions in detail.

housing = read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
head(housing)
##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0

Q1. (1) Use the data only for California. How much does the size of a home influence its price?

# Filter for California homes
california_data <- housing[housing$State == "CA", ]

# Fit linear regression model: Price ~ Size
model <- lm(Price ~ Size, data = california_data)

# Display the summary of the model
summary(model)
## 
## Call:
## lm(formula = Price ~ Size, data = california_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634
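The point estimate for Size is easier to interpret alongside an interval. The sketch below is not part of the original analysis: it rebuilds a 95% confidence interval for the slope from the estimate, standard error, and residual degrees of freedom printed above. The variable names are illustrative, and the interpretation in dollars assumes that Price is recorded in thousands of dollars and Size in square feet.

```r
# Values copied from the summary(model) output above
slope    <- 0.33919   # estimated coefficient for Size
se_slope <- 0.08558   # its standard error
df_resid <- 28        # residual degrees of freedom

# Two-sided 95% critical value from the t distribution
t_crit <- qt(0.975, df = df_resid)

# Confidence interval for the slope, in Price units per square foot
ci <- slope + c(-1, 1) * t_crit * se_slope
names(ci) <- c("lower", "upper")
print(round(ci, 3))

# The same interval expressed per 100 square feet
print(round(100 * ci, 1))
```

With the fitted object available, `confint(model, "Size")` should give the same interval without copying numbers by hand.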

Q2. (2) Use the data only for California. How does the number of bedrooms of a home influence its price?

# Filter for California homes
california_data <- housing[housing$State == "CA", ]

# Fit a regression model: Price as a function of number of Bedrooms
bedroom_model <- lm(Price ~ Beds, data = california_data)

# View summary of the model
summary(bedroom_model)
## 
## Call:
## lm(formula = Price ~ Beds, data = california_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548
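The numbers in this output are tied together: in a regression with one predictor, the F-statistic is the square of the slope's t value, and R-squared follows from F and the residual degrees of freedom. The sketch below, which is an added illustration and uses only values printed above, reproduces the p-value, F, and R-squared up to rounding. The variable names are illustrative.

```r
# Values copied from the summary(bedroom_model) output above
t_beds   <- 1.163   # t value for the Beds coefficient
df_resid <- 28      # residual degrees of freedom

# Two-sided p-value for the slope
p_two_sided <- 2 * pt(-abs(t_beds), df = df_resid)

# With a single predictor, F equals t squared
f_stat <- t_beds^2

# R-squared recovered from F (1 numerator degree of freedom)
r_squared <- f_stat / (f_stat + df_resid)

print(round(c(p = p_two_sided, F = f_stat, R2 = r_squared), 4))
```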

Q3. (3) Use the data only for California. How does the number of bathrooms of a home influence its price?

# Filter for California homes
california_data <- housing[housing$State == "CA", ]

# Fit a regression model: Price as a function of number of Bathrooms
bathroom_model <- lm(Price ~ Baths, data = california_data)

# View summary of the model
summary(bathroom_model)

Q4. (4) Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

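# Visualize relationships
# Note: `housing` contains all four states, so the plot and the model
# below use all 120 homes rather than the California subset
pairs(housing[c("Size", "Beds", "Baths", "Price")], 
      main = "Scatterplot Matrix: Home Features vs Price")

# Fit multiple linear regression model
model <- lm(Price ~ Size + Beds + Baths, data = housing)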
summary(model)
## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = housing)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -352.31 -157.69  -68.89   86.14  745.66 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 103.75177   92.91802   1.117   0.2665  
## Size          0.08199    0.04264   1.923   0.0570 .
## Beds        -25.80554   32.82340  -0.786   0.4334  
## Baths        84.95750   34.48394   2.464   0.0152 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 228.1 on 116 degrees of freedom
## Multiple R-squared:  0.1953, Adjusted R-squared:  0.1745 
## F-statistic: 9.385 on 3 and 116 DF,  p-value: 1.329e-05
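The output above has 116 residual degrees of freedom, which corresponds to all 120 homes, whereas the question asks for California only. A minimal sketch of the California-only fit follows. The helper name `fit_joint_model` is hypothetical, and no output is shown because that model was not run in this report.

```r
# Hypothetical helper: fit the joint model on the homes of a single state
fit_joint_model <- function(data, state = "CA") {
  # Keep only the rows for the requested state
  state_data <- data[data$State == state, ]
  # Price as a function of size, bedrooms, and bathrooms
  lm(Price ~ Size + Beds + Baths, data = state_data)
}

# Usage, assuming `housing` was loaded with read.csv() as above:
# ca_joint_model <- fit_joint_model(housing)
# summary(ca_joint_model)
```

Wrapping the filter and the fit in one function makes it harder to pass the full data set by accident.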

Q5. (5) Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price. All data should be used.

# Convert state column to a factor
housing$state <- as.factor(housing$State)

# Run one-way ANOVA
anova_result <- aov(Price ~ state, data = housing)

# Display summary of ANOVA
summary(anova_result)
##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## state         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
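A significant F test says that at least one state mean differs, but not which ones. One common follow-up, sketched here as an addition to the original analysis, is to look at the group means and Tukey's honest significant difference comparisons. The helper name `compare_states` is hypothetical; `aov()` and `TukeyHSD()` are from base R's stats package.

```r
# Hypothetical helper: state means plus Tukey HSD pairwise comparisons
compare_states <- function(data) {
  # Treat State as a categorical factor
  data$State <- as.factor(data$State)
  # One-way ANOVA of Price on State
  fit <- aov(Price ~ State, data = data)
  list(
    means = tapply(data$Price, data$State, mean),  # mean price per state
    tukey = TukeyHSD(fit, "State")                 # all pairwise differences
  )
}

# Usage, assuming `housing` was loaded with read.csv() as above:
# result <- compare_states(housing)
# result$means
# result$tukey
```

Each row of `result$tukey$State` gives a difference in means, its confidence bounds, and an adjusted p-value for one pair of states.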

Summary

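Q1. In the California data, home size has a statistically significant positive association with price (slope = 0.339, p = 0.00046). Each additional square foot is associated with an increase of about 0.339 in Price; assuming Price is recorded in thousands of dollars, that is roughly $339 per square foot. Size explains about 36% of the variation in price (R-squared = 0.359).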

Q2. The estimated slope for Beds is 84.77, but its p-value of 0.255 exceeds the usual 0.05 significance level, so there is no statistically significant evidence of a linear relationship between the number of bedrooms and price in the California data. The model also explains less than 5% of the variation in price (R-squared = 0.046).

Q3. The model fitted for this question is Price ~ Baths, but the summary that was printed belonged to the bedroom model (p = 0.255, R-squared = 0.046), so the bathroom coefficient and its p-value are not available from this analysis. No conclusion about the effect of bathrooms alone can be drawn until summary(bathroom_model) is inspected.

Q4. The model above was fitted to all 120 homes in the four states (116 residual degrees of freedom), not to California alone, so these conclusions apply to the pooled data. Holding the other two variables fixed, the number of bathrooms has a statistically significant positive association with price (estimate = 84.96, p = 0.0152). Home size is only marginally significant (estimate = 0.082, p = 0.0570), and the number of bedrooms is not significant (estimate = -25.81, p = 0.4334). Bathrooms are therefore the only predictor significant at the 0.05 level, and the model as a whole explains about 20% of the variation in price (R-squared = 0.195).

Q5. The one-way ANOVA shows a statistically significant difference in mean home prices among the four states (CA, NY, NJ, PA), with F = 7.355 on 3 and 116 degrees of freedom and a p-value of 0.000148. Mean price therefore differs by state, although the test alone does not identify which states differ from one another.
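Statistical significance does not say how large the state effect is. As an added illustration, the sketch below recomputes the F-statistic from the sums of squares in the ANOVA table and derives eta-squared, the share of price variation attributable to state, which comes out at roughly 16%. Only numbers printed in the ANOVA table are used, and the variable names are illustrative.

```r
# Values copied from the ANOVA table in the Q5 analysis
ss_state <- 1198169   # sum of squares between states
ss_resid <- 6299266   # residual sum of squares
df_state <- 3
df_resid <- 116

# F is the ratio of the two mean squares
f_stat <- (ss_state / df_state) / (ss_resid / df_resid)

# Upper-tail p-value of the F distribution
p_value <- pf(f_stat, df_state, df_resid, lower.tail = FALSE)

# Eta-squared: proportion of total variation explained by state
eta_squared <- ss_state / (ss_state + ss_resid)

print(round(c(F = f_stat, eta_squared = eta_squared), 4))
print(signif(p_value, 3))
```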