We use the data from https://www.lock5stat.com/datapage3e.html
I propose the following questions.
1. Use the data only for California. How much does the size of a home influence its price?
2. Use the data only for California. How does the number of bedrooms of a home influence its price?
3. Use the data only for California. How does the number of bathrooms of a home influence its price?
4. Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?
5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price. All data should be used.
We will explore the questions in detail.
# Load the Lock5 HomesForSale data (Price is the asking price in $1,000s)
housing <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
head(housing)
## State Price Size Beds Baths
## 1 CA 533 1589 3 2.5
## 2 CA 610 2008 3 2.0
## 3 CA 899 2380 5 3.0
## 4 CA 929 1868 3 3.0
## 5 CA 210 1360 2 2.0
## 6 CA 268 2131 3 2.0
### Q1. Use the data only for California. How much does the size of a home influence its price?
# Filter for California homes
california_data <- housing[housing$State == "CA", ]
# Fit linear regression model: Price ~ Size
model <- lm(Price ~ Size, data = california_data)
# Display the summary of the model
summary(model)
##
## Call:
## lm(formula = Price ~ Size, data = california_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -462.55 -139.69 39.24 147.65 352.21
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675 154.68102 -0.367 0.716145
## Size 0.33919 0.08558 3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
### Q2. Use the data only for California. How does the number of bedrooms of a home influence its price?
# Filter for California homes
california_data <- housing[housing$State == "CA", ]
# Fit a regression model: Price as a function of number of Bedrooms
bedroom_model <- lm(Price ~ Beds, data = california_data)
# View summary of the model
summary(bedroom_model)
##
## Call:
## lm(formula = Price ~ Beds, data = california_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -413.83 -236.62 29.94 197.69 570.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 269.76 233.62 1.155 0.258
## Beds 84.77 72.91 1.163 0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
### Q3. Use the data only for California. How does the number of bathrooms of a home influence its price?
# Filter for California homes
california_data <- housing[housing$State == "CA", ]
# Fit a regression model: Price as a function of number of Bathrooms
bathroom_model <- lm(Price ~ Baths, data = california_data)
# View summary of the model
summary(bathroom_model)
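### Q4. Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?
# Visualize relationships (all four states)
pairs(housing[c("Size", "Beds", "Baths", "Price")],
      main = "Scatterplot Matrix: Home Features vs Price")
# Fit multiple linear regression model (note: data = housing uses all 120 homes, not only California)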
model <- lm(Price ~ Size + Beds + Baths, data = housing)
summary(model)
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = housing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -352.31 -157.69 -68.89 86.14 745.66
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 103.75177 92.91802 1.117 0.2665
## Size 0.08199 0.04264 1.923 0.0570 .
## Beds -25.80554 32.82340 -0.786 0.4334
## Baths 84.95750 34.48394 2.464 0.0152 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 228.1 on 116 degrees of freedom
## Multiple R-squared: 0.1953, Adjusted R-squared: 0.1745
## F-statistic: 9.385 on 3 and 116 DF, p-value: 1.329e-05
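A quick consistency check on the output above, using only the printed numbers. The residual degrees of freedom reveal how many homes the model was fit to (n = df + number of predictors + 1), and the overall F statistic can be recovered from R-squared. The California count of 30 is inferred from the Q1 fit (28 residual df plus 2 coefficients); this is a sketch of the arithmetic, not new output.

```r
# Values copied from the Q4 summary(model) output
r2       <- 0.1953
k        <- 3      # predictors: Size, Beds, Baths
df_resid <- 116

# Sample size implied by the residual degrees of freedom: n = df + k + 1
n_used <- df_resid + k + 1

# California-only sample size implied by the Q1 fit: 28 residual df + 2 coefficients
n_california <- 28 + 2

# Overall F statistic recovered from R-squared: (R2 / k) / ((1 - R2) / df)
f_stat <- (r2 / k) / ((1 - r2) / df_resid)

cat(sprintf("Homes used in this fit: %d (California alone: %d)\n",
            n_used, n_california))
cat(sprintf("F recovered from R-squared: %.3f\n", f_stat))
```

The mismatch between 120 and 30 is what shows the Q4 model pools all four states.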
### Q5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price. All data should be used.
# Convert state column to a factor
housing$state <- as.factor(housing$State)
# Run one-way ANOVA
anova_result <- aov(Price ~ state, data = housing)
# Display summary of ANOVA
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## state 3 1198169 399390 7.355 0.000148 ***
## Residuals 116 6299266 54304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
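The ANOVA table can be reproduced by hand, which makes the logic of the F test explicit: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the between-state to the within-state mean square. The sketch below uses only the printed sums of squares; eta-squared (between-state sum of squares over total sum of squares) is an added effect-size summary, not part of the original output.

```r
# Sums of squares and degrees of freedom copied from summary(anova_result)
ss_state <- 1198169; df_state <- 3
ss_resid <- 6299266; df_resid <- 116

ms_state <- ss_state / df_state   # between-state mean square
ms_resid <- ss_resid / df_resid   # within-state (residual) mean square
f_stat   <- ms_state / ms_resid

# Upper-tail probability of the F distribution with (3, 116) df
p_value <- pf(f_stat, df_state, df_resid, lower.tail = FALSE)

# Share of total price variation associated with state (eta-squared)
eta_sq <- ss_state / (ss_state + ss_resid)

cat(sprintf("MS(state) = %.0f, MS(resid) = %.0f\n", ms_state, ms_resid))
cat(sprintf("F = %.3f, p = %.6f, eta-squared = %.3f\n", f_stat, p_value, eta_sq))
```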
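Q1. In California, home size has a statistically significant positive association with price (p = 0.00046). Because Price is recorded in thousands of dollars, the slope of 0.339 means each additional square foot is associated with an asking price about $339 higher on average. Size alone explains about 36% of the variation in California prices (R-squared = 0.359). Since the data are observational, this describes an association rather than a causal effect.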
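The $339 figure can be checked, and given a range, from the printed coefficients alone. This sketch assumes Price is in thousands of dollars and uses a t-based 95% interval with the model's 28 residual degrees of freedom; the 2,000 sq ft home is a hypothetical example, not a house in the data.

```r
# Coefficients copied from the Q1 summary(model) output (Price in $1,000s)
intercept <- -56.81675
slope     <- 0.33919    # thousand dollars per square foot
se_slope  <- 0.08558
df_resid  <- 28

# Slope expressed in dollars per square foot
slope_dollars <- slope * 1000

# t statistic and two-sided p-value for H0: slope = 0
t_stat  <- slope / se_slope
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# 95% confidence interval for the slope (thousand dollars per sq ft)
t_crit   <- qt(0.975, df = df_resid)
slope_ci <- slope + c(-1, 1) * t_crit * se_slope

# Fitted price for a hypothetical 2,000 sq ft California home (in $1,000s)
fitted_2000 <- intercept + slope * 2000

cat(sprintf("Slope: $%.0f per sq ft\n", slope_dollars))
cat(sprintf("t = %.3f, p = %.6f\n", t_stat, p_value))
cat(sprintf("95%% CI: $%.0f to $%.0f per sq ft\n",
            1000 * slope_ci[1], 1000 * slope_ci[2]))
cat(sprintf("Fitted price at 2,000 sq ft: about $%.0fk\n", fitted_2000))
```

On the fitted model itself, `confint(model)` gives the same interval without copying numbers by hand.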
Q2. The estimated slope for Beds is 84.77 (about $85,000 per additional bedroom), but its p-value of 0.255 is well above the usual 0.05 level, so we do not have statistically significant evidence of a linear relationship between the number of bedrooms and price. The model also explains less than 5% of the price variation (R-squared = 0.046). On its own, the number of bedrooms is not a statistically significant predictor of price in this California sample.
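Two numbers from the Q2 output make the non-significance concrete. A 95% confidence interval for the Beds slope that spans zero is equivalent to p > 0.05, and with a single predictor R-squared can be recovered from the F statistic as F / (F + df). The sketch assumes Price is in thousands of dollars and uses only the printed estimates.

```r
# Estimates copied from the Q2 summary(bedroom_model) output (Price in $1,000s)
beds_est <- 84.77
beds_se  <- 72.91
df_resid <- 28
f_stat   <- 1.352

# 95% confidence interval for the Beds slope
t_crit  <- qt(0.975, df = df_resid)
beds_ci <- beds_est + c(-1, 1) * t_crit * beds_se

# With one predictor, F = R2 * df / (1 - R2), so R2 = F / (F + df)
r2_from_f <- f_stat / (f_stat + df_resid)

cat(sprintf("95%% CI for Beds: %.1f to %.1f (thousand dollars)\n",
            beds_ci[1], beds_ci[2]))
cat(sprintf("R-squared recovered from F: %.4f\n", r2_from_f))
```

The interval runs from roughly $65,000 lower to $234,000 higher per extra bedroom, which is too wide to say even which direction the effect goes.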
Q3. This question is about `Baths`, so it must be answered from `summary(bathroom_model)`: the `Baths` slope estimates the average change in asking price (in thousands of dollars) per additional bathroom, its p-value tests whether that slope differs from zero, and R-squared gives the share of price variation the model explains. The figures p = 0.255 and R-squared = 0.046 belong to the bedroom model in Q2 and say nothing about bathrooms, so they cannot be used to judge whether the number of bathrooms is a significant predictor of price.
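Reading the wrong row or the wrong model is easy when summaries are copied by hand. A small helper can pull the relevant numbers for a named term directly from a fitted `lm()` object. `report_slope` is a hypothetical name introduced here, and the demo data are randomly generated stand-ins (not the Lock5 homes), so the printed values carry no meaning about real prices.

```r
# Hypothetical helper: summarize one slope term of a fitted lm() model
report_slope <- function(model, term, level = 0.95) {
  coefs <- coef(summary(model))  # columns: Estimate, Std. Error, t value, Pr(>|t|)
  ci    <- confint(model, parm = term, level = level)
  data.frame(
    term      = term,
    estimate  = coefs[term, "Estimate"],
    p_value   = coefs[term, "Pr(>|t|)"],
    ci_lower  = ci[1, 1],
    ci_upper  = ci[1, 2],
    r_squared = summary(model)$r.squared,
    row.names = NULL
  )
}

# Synthetic demonstration data (NOT the Lock5 homes): 30 made-up homes
set.seed(1)
demo <- data.frame(Baths = sample(c(1, 1.5, 2, 2.5, 3, 3.5), 30, replace = TRUE))
demo$Price <- 150 + 120 * demo$Baths + rnorm(30, sd = 100)

demo_model <- lm(Price ~ Baths, data = demo)
res <- report_slope(demo_model, "Baths")
print(res)
```

On the real data the call would be `report_slope(bathroom_model, "Baths")`, which reports the bathroom slope, its p-value, its 95% interval, and the model's R-squared in one row.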
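Q4. This model was fit with `data = housing`, i.e. all 120 homes (116 residual degrees of freedom), so it describes the pooled four-state data rather than California alone; answering the question as posed requires refitting with `data = california_data`. In the pooled data, holding the other two variables fixed, each additional bathroom is associated with a price about $85,000 higher (p = 0.0152, significant at the 5% level), and each additional square foot with about $82 more (p = 0.0570, only marginal evidence). The number of bedrooms shows no significant association (p = 0.4334); its negative estimate may reflect that bedrooms overlap heavily with size and bathrooms. Bathrooms therefore have the strongest evidence of an association with price among the three variables, and together the three explain only about 20% of the price variation (R-squared = 0.195).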
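A minimal sketch of the California-only fit that Q4 asks for. `fit_state_model` is a hypothetical helper, and the demo data are randomly generated stand-ins for the Lock5 file, so the printed coefficients say nothing about real prices. What the sketch does show is the subsetting step and the degrees of freedom to expect: 30 homes minus 4 coefficients leaves 26 residual df.

```r
# Hypothetical helper: fit Price ~ Size + Beds + Baths for one state only
fit_state_model <- function(data, state = "CA") {
  state_data <- data[data$State == state, ]
  lm(Price ~ Size + Beds + Baths, data = state_data)
}

# Synthetic demonstration data (NOT the Lock5 homes): 30 homes in each of 4 states
set.seed(42)
n <- 120
demo <- data.frame(
  State = rep(c("CA", "NJ", "NY", "PA"), each = 30),
  Size  = round(runif(n, 1000, 3000)),
  Beds  = sample(2:5, n, replace = TRUE),
  Baths = sample(c(1, 1.5, 2, 2.5, 3), n, replace = TRUE)
)
demo$Price <- 50 + 0.2 * demo$Size + 10 * demo$Beds + 40 * demo$Baths +
  rnorm(n, sd = 80)

ca_fit <- fit_state_model(demo, "CA")
print(summary(ca_fit))

# Residual df for the one-state fit: 30 homes - 4 coefficients = 26
print(df.residual(ca_fit))
```

On the real data, `fit_state_model(housing, "CA")` is equivalent to `lm(Price ~ Size + Beds + Baths, data = california_data)`, and `cor(california_data[c("Size", "Beds", "Baths")])` shows how strongly the three predictors overlap, which matters when interpreting their individual coefficients.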
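Q5. The one-way ANOVA rejects the hypothesis that mean home prices are equal across the four states (CA, NY, NJ, PA), with F = 7.355 on 3 and 116 degrees of freedom and p = 0.000148. The state in which a home is located is therefore significantly associated with its price. The ANOVA does not say which states differ from which; a post-hoc comparison such as Tukey's HSD is needed for that.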
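A sketch of the Tukey HSD follow-up using base R's `TukeyHSD()`, which compares every pair of group means while controlling the family-wise error rate. The demo prices below are randomly generated stand-ins, not the Lock5 data, so only the structure of the result is meaningful: four states give six pairwise comparisons, each with a difference, a confidence interval, and an adjusted p-value.

```r
# Synthetic demonstration data (NOT the Lock5 homes): 30 prices per state
set.seed(7)
demo <- data.frame(
  State = factor(rep(c("CA", "NJ", "NY", "PA"), each = 30)),
  Price = c(rnorm(30, 500, 200), rnorm(30, 350, 150),
            rnorm(30, 300, 150), rnorm(30, 250, 120))
)

demo_aov <- aov(Price ~ State, data = demo)
tukey    <- TukeyHSD(demo_aov, conf.level = 0.95)

# One row per pair of states: diff, lwr, upr, p adj
print(tukey$State)
```

On the real data, `TukeyHSD(anova_result)$state` would list the six state pairs; a pair whose interval excludes zero (adjusted p below 0.05) differs significantly in mean price.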