I used the HomesForSale dataset from the Lock5 data page (https://www.lock5stat.com/datapage3e.html).
The assignment asked us to explore the following questions:
1. How much does the size of a home influence its price in California?
2. How does the number of bedrooms of a home influence its price in California?
3. How does the number of bathrooms of a home influence its price in California?
4. How do size, number of bedrooms, and number of bathrooms jointly influence home price in California?
5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)?
## Analysis
Each question is addressed in turn below, using the California homes for the first four questions and all four states for the last one.
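```r
library(dplyr)    # filter() and the %>% pipe
library(ggplot2)  # plots
# Load HomesForSale; the file name is assumed, adjust it to your downloaded copy
home <- read.csv("HomesForSale.csv")
# Filter data for California
home_CA <- home %>% filter(State == "CA")
# Fit a simple linear regression of price on size
model_size <- lm(Price ~ Size, data = home_CA)
# Show the regression summary
summary(model_size)
```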
```
##
## Call:
## lm(formula = Price ~ Size, data = home_CA)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -462.55 -139.69   39.24  147.65  352.21
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675  154.68102  -0.367 0.716145
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634
```
```r
# Scatterplot of price against size with the fitted least-squares line
ggplot(home_CA, aes(x = Size, y = Price)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", formula = y ~ x, color = "red") +
  labs(title = "Home Size vs Price in California",
       x = "Size (square feet)", y = "Price ($1,000s)") +
  theme_minimal()
```
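To read the size model in dollar terms: the slope of 0.33919 means each additional square foot is associated with about 0.339 more in `Price`, and since `Price` in HomesForSale is the asking price in thousands of dollars, that is roughly $339 per square foot. The sketch below plugs the printed coefficients into the fitted line for a hypothetical 2,000 sq ft home and builds a 95% confidence interval for the slope from its standard error. The 2,000 sq ft input is an illustrative assumption, and because the coefficients are rounded the numbers will differ slightly from what the fitted model returns.

```r
# Sketch: interpret model_size using the coefficients printed by summary(model_size).
# The 2,000 sq ft home is a hypothetical input chosen only for illustration.
b0 <- -56.81675    # intercept (Price in $1,000s)
b1 <- 0.33919      # slope: $1,000s per square foot
se_b1 <- 0.08558   # standard error of the slope
df_resid <- 28     # 30 California homes minus 2 estimated coefficients

# Predicted asking price for the hypothetical home, in $1,000s
size_new <- 2000
pred_price <- b0 + b1 * size_new

# 95% confidence interval for the slope: estimate +/- t critical value * SE
t_crit <- qt(0.975, df = df_resid)
ci_b1 <- b1 + c(-1, 1) * t_crit * se_b1

cat(sprintf("Predicted price: %.1f thousand dollars\n", pred_price))
cat(sprintf("95%% CI for slope: [%.3f, %.3f] ($1,000s per sq ft)\n", ci_b1[1], ci_b1[2]))
```

With the data loaded, `predict(model_size, newdata = data.frame(Size = 2000))` and `confint(model_size)` compute the same quantities from the unrounded fit. The interval excludes zero, in line with the slope's p-value of 0.000463, and R² = 0.3594 says size alone accounts for about 36% of the price variation among these 30 homes.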
```r
# Fit a simple linear regression of price on number of bedrooms
model_bedrooms <- lm(Price ~ Beds, data = home_CA)
# Show the regression summary
summary(model_bedrooms)
```
```
##
## Call:
## lm(formula = Price ~ Beds, data = home_CA)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -413.83 -236.62   29.94  197.69  570.94
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605, Adjusted R-squared:  0.01198
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548
```
```r
# Scatterplot of price against number of bedrooms with the fitted line
ggplot(home_CA, aes(x = Beds, y = Price)) +
  geom_point(color = "green") +
  geom_smooth(method = "lm", formula = y ~ x, color = "red") +
  labs(title = "Number of Beds vs Price in California",
       x = "Number of Beds", y = "Price ($1,000s)") +
  theme_minimal()
```
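The bedrooms model gives little evidence of a relationship: the Beds slope (84.77, about $84,770 per extra bedroom) has p = 0.255, and R² = 0.046 means bedrooms alone explain under 5% of the variation in price. The sketch below recomputes the t statistic and two-sided p-value from the printed estimate and standard error, and checks that in a one-predictor model the overall F statistic is the square of that t statistic. The inputs are copied from the output, so small rounding differences are expected.

```r
# Sketch: recompute the test for the Beds slope from the values printed
# by summary(model_bedrooms).
est <- 84.77      # Beds coefficient ($1,000s per additional bedroom)
se  <- 72.91      # its standard error
df_resid <- 28    # residual degrees of freedom

t_stat  <- est / se
p_value <- 2 * pt(-abs(t_stat), df = df_resid)  # two-sided p-value

# With a single predictor, the overall F statistic equals t^2
f_stat <- t_stat^2

cat(sprintf("t = %.3f, p = %.3f, F = %.3f\n", t_stat, p_value, f_stat))
```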
```r
# Fit a simple linear regression of price on number of bathrooms
model_bathrooms <- lm(Price ~ Baths, data = home_CA)
# Show the regression summary
summary(model_bathrooms)
```
```
##
## Call:
## lm(formula = Price ~ Baths, data = home_CA)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -374.93 -181.56   -2.74  152.31  614.81
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    90.71     148.57   0.611  0.54641
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092
```
```r
# Scatterplot of price against number of bathrooms with the fitted line
ggplot(home_CA, aes(x = Baths, y = Price)) +
  geom_point(color = "purple") +
  geom_smooth(method = "lm", formula = y ~ x, color = "red") +
  labs(title = "Number of Baths vs Price in California",
       x = "Number of Baths", y = "Price ($1,000s)") +
  theme_minimal()
```
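The three single-predictor models can be compared side by side. The sketch below collects the R² values and slope p-values printed in the three summaries (typed in by hand, not recomputed) and ranks the predictors by the share of price variation each explains on its own.

```r
# Sketch: compare the three simple regressions using values copied from summary() output.
r2       <- c(Size = 0.3594,   Beds = 0.04605, Baths = 0.2588)
p_values <- c(Size = 0.000463, Beds = 0.255,   Baths = 0.00409)

comparison <- data.frame(
  predictor = names(r2),
  r_squared = unname(r2),
  slope_p   = unname(p_values[names(r2)]),
  stringsAsFactors = FALSE
)
# Flag slopes significant at the 5% level
comparison$significant <- comparison$slope_p < 0.05

# Order from most to least variance explained
comparison <- comparison[order(comparison$r_squared, decreasing = TRUE), ]
print(comparison, row.names = FALSE)
```

Size explains the most (about 36%), bathrooms about 26%, and bedrooms under 5%; only the Size and Baths slopes are significant at the 5% level.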
```r
# Fit a multiple regression model with all three predictors
model_multiple <- lm(Price ~ Size + Beds + Baths, data = home_CA)
# Show the regression summary
summary(model_multiple)
```
```
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = home_CA)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -415.47 -130.32   19.64  154.79  384.94
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.5608   210.3809  -0.198   0.8449
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239
## Baths        83.9844    76.7530   1.094   0.2839
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353
```
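In the multiple regression only Size stays significant (p = 0.0259); the Beds coefficient turns negative (-33.70) and the Baths coefficient shrinks from 194.74 in its simple model to 83.98, neither significant. A likely explanation is that the three predictors overlap (larger homes tend to have more bedrooms and bathrooms); `cor(home_CA[, c("Size", "Beds", "Baths")])` would confirm or rule this out for this sample. Adjusted R² makes the same point numerically. The sketch below applies the standard formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), to the printed R² values with n = 30 homes.

```r
# Sketch: adjusted R-squared penalises each extra predictor.
# n = number of observations, p = number of predictors (excluding the intercept).
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 30  # California homes in the sample
adj_size  <- adj_r2(0.3594, n = n, p = 1)  # Price ~ Size
adj_multi <- adj_r2(0.3912, n = n, p = 3)  # Price ~ Size + Beds + Baths

cat(sprintf("Size only: %.4f\nSize + Beds + Baths: %.4f\n", adj_size, adj_multi))
```

Adding Beds and Baths raises R² only from 0.3594 to 0.3912, not enough to offset the two lost degrees of freedom, so adjusted R² falls from 0.3365 to 0.3209.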
```r
# Display the coefficients
coef(model_multiple)
```

```
## (Intercept)        Size        Beds       Baths
## -41.5608472   0.2811026 -33.7035665  83.9844096
```
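A nested-model F test asks directly whether Beds and Baths, taken together, improve on the size-only model. With the data loaded, `anova(model_size, model_multiple)` performs this test on the unrounded fits. The sketch below approximates it from the printed residual standard errors, using RSS = RSE² × residual df; because those errors are rounded, the F value is only approximate.

```r
# Sketch: approximate nested-model F test (Size only vs Size + Beds + Baths),
# rebuilding residual sums of squares from the printed residual standard errors.
rss_size  <- 219.3^2 * 28   # Price ~ Size, 28 residual df
rss_multi <- 221.8^2 * 26   # Price ~ Size + Beds + Baths, 26 residual df

df_extra <- 28L - 26L       # two extra coefficients (Beds, Baths)
f_stat  <- ((rss_size - rss_multi) / df_extra) / (rss_multi / 26)
p_value <- pf(f_stat, df1 = df_extra, df2 = 26, lower.tail = FALSE)

cat(sprintf("F = %.3f on %d and %d DF, p = %.3f\n", f_stat, df_extra, 26L, p_value))
```

F comes out at about 0.69 with p around 0.5, so, under these rounded inputs, bedrooms and bathrooms add no significant explanatory power once size is in the model.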
```r
# Keep the four states (CA, NY, NJ, PA)
home_states <- home %>% filter(State %in% c("CA", "NY", "NJ", "PA"))
# One-way ANOVA comparing mean home price across states
anova_model <- aov(Price ~ State, data = home_states)
# Show the ANOVA summary
summary(anova_model)
```
```
##              Df  Sum Sq Mean Sq F value   Pr(>F)
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
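The ANOVA F test (F = 7.355 on 3 and 116 DF, p = 0.000148) indicates that mean asking prices are not all equal across the four states. The sketch below recomputes F from the printed sums of squares and adds eta squared, the share of total price variation attributable to State, as an effect size. Eta squared is an addition for illustration, not part of the original analysis.

```r
# Sketch: rebuild the one-way ANOVA F test and eta squared from summary(anova_model).
ss_state <- 1198169; df_state <- 3
ss_resid <- 6299266; df_resid <- 116

ms_state <- ss_state / df_state   # mean square between states
ms_resid <- ss_resid / df_resid   # mean square within states
f_stat   <- ms_state / ms_resid
p_value  <- pf(f_stat, df_state, df_resid, lower.tail = FALSE)

# Eta squared: between-group sum of squares over total sum of squares
eta_sq <- ss_state / (ss_state + ss_resid)

cat(sprintf("F = %.3f, p = %.6f, eta^2 = %.3f\n", f_stat, p_value, eta_sq))
```

Eta squared is about 0.16, so state accounts for roughly 16% of the variation in price across all 120 homes.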
```r
# Boxplots of price by state
ggplot(home_states, aes(x = State, y = Price, fill = State)) +
  geom_boxplot() +
  labs(title = "Home Prices by State",
       x = "State", y = "Price ($1,000s)") +
  theme_minimal()
```
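A significant F test does not say which states differ. Tukey's honest significant difference procedure, available in base R as `TukeyHSD()`, compares every pair of states while controlling the family-wise error rate; on the real fit the call would be `TukeyHSD(anova_model)`. Because the HomesForSale file is not bundled here, the sketch below runs the same steps on simulated prices whose group means are made up for illustration, so its output says nothing about the actual states.

```r
# Sketch: Tukey HSD pairwise comparisons after a one-way ANOVA.
# The prices are SIMULATED placeholders, not the HomesForSale data.
set.seed(1)
states <- factor(rep(c("CA", "NJ", "NY", "PA"), each = 30))
# Hypothetical group means in $1,000s, chosen only for illustration
group_means <- c(CA = 500, NJ = 400, NY = 350, PA = 250)
sim <- data.frame(
  State = states,
  Price = rnorm(length(states), mean = unname(group_means[as.character(states)]), sd = 200)
)

fit   <- aov(Price ~ State, data = sim)
tukey <- TukeyHSD(fit, which = "State", conf.level = 0.95)
print(tukey)

# One row per pair of states: difference, interval bounds, adjusted p-value
pairs_tbl <- as.data.frame(tukey$State)
pairs_tbl[order(pairs_tbl$`p adj`), ]
```

Pairs whose interval (`lwr` to `upr`) excludes zero are the ones the procedure flags as different at the chosen family-wise level.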