Introduction

The data come from https://www.lock5stat.com/datapage3e.html, specifically the “HomesForSale” data set.

The assignment asks the following questions:

  1. How much does the size of a home influence its price in California?

  2. How does the number of bedrooms of a home influence its price in California?

  3. How does the number of bathrooms of a home influence its price in California?

  4. How do size, number of bedrooms, and number of bathrooms jointly influence home price in California?

  5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)?

Analysis

Each question is addressed in turn below.

Q1: How much does the size of a home influence its price in California?

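# Load the packages used below (dplyr for %>% and filter(), ggplot2 for the plots)
library(dplyr)
library(ggplot2)
# `home` is assumed to already hold the HomesForSale data set
# Filter data for California
home_CA <- home %>% filter(State == "CA")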
# Fit a simple linear regression model
model_size <- lm(Price ~ Size, data = home_CA)
# Show the regression summary
summary(model_size)
## 
## Call:
## lm(formula = Price ~ Size, data = home_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634
# Plot
ggplot(home_CA, aes(x = Size, y = Price)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", color = "red") +
  labs(title = "Home Size vs Price in California",
       x = "Size (Square Feet)", y = "Price (thousands of USD)") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
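
The summary above reports a slope of 0.33919 for Size with a standard error of 0.08558 on 28 degrees of freedom, which corresponds to a 95% confidence interval of roughly 0.16 to 0.51 price units per square foot. The sketch below shows one way to obtain such an interval directly from a fitted model. The helper name `slope_summary` is hypothetical (it is not part of the original analysis), and the call on `model_size` assumes that object exists as fitted above.

```r
# Hypothetical helper: point estimate and confidence interval for one coefficient.
slope_summary <- function(model, term, level = 0.95) {
  est <- coef(model)[[term]]                        # fitted coefficient
  ci  <- confint(model, parm = term, level = level) # 1 x 2 matrix: lower, upper
  c(estimate = est, lower = ci[1, 1], upper = ci[1, 2])
}

# Usage with the model fitted above (assumes model_size exists):
# slope_summary(model_size, "Size")
```

An interval that excludes zero is consistent with the small p-value reported for Size, while its width shows how imprecise the estimate is with only 30 homes.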

Q2: How does the number of bedrooms of a home influence its price in California?

# Fit a simple linear regression model
model_bedrooms <- lm(Price ~ Beds, data = home_CA)
# Show the regression summary
summary(model_bedrooms)
## 
## Call:
## lm(formula = Price ~ Beds, data = home_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548
# Plot
ggplot(home_CA, aes(x = Beds, y = Price)) +
  geom_point(color = "green") +
  geom_smooth(method = "lm", color = "red") +
  labs(title = "Number of Beds vs Price in California",
       x = "Number of Beds", y = "Price (thousands of USD)") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

Q3: How does the number of bathrooms of a home influence its price in California?

# Fit a simple linear regression model
model_bathrooms <- lm(Price ~ Baths, data = home_CA)
# Show the regression summary
summary(model_bathrooms)
## 
## Call:
## lm(formula = Price ~ Baths, data = home_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092
# Plot
ggplot(home_CA, aes(x = Baths, y = Price)) +
  geom_point(color = "purple") +
  geom_smooth(method = "lm", color = "red") +
  labs(title = "Number of Baths vs Price in California",
       x = "Number of Baths", y = "Price (thousands of USD)") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
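
The three simple regressions can be compared side by side: from the summaries above, R-squared is 0.3594 for Size, 0.04605 for Beds, and 0.2588 for Baths, and only the Beds slope is not significant (p = 0.255). A minimal sketch for collecting these figures into one table follows. The helper `compare_fits` is a hypothetical name introduced here for illustration, and it assumes each model has exactly one predictor so that the slope sits in the second coefficient row.

```r
# Hypothetical helper: tabulate R-squared and slope p-value for simple regressions.
compare_fits <- function(models) {
  data.frame(
    model         = names(models),
    r_squared     = vapply(models, function(m) summary(m)$r.squared, numeric(1)),
    # Row 2 of the coefficient table is the single predictor (row 1 is the intercept)
    slope_p_value = vapply(models, function(m) coef(summary(m))[2, "Pr(>|t|)"], numeric(1)),
    row.names     = NULL
  )
}

# Usage with the models fitted above (assumes these objects exist):
# compare_fits(list(Size = model_size, Beds = model_bedrooms, Baths = model_bathrooms))
```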

Q4: How do size, number of bedrooms, and number of bathrooms jointly influence home price in California?

# Fit a multiple regression model
model_multiple <- lm(Price ~ Size + Beds + Baths, data = home_CA)
# Show the regression summary
summary(model_multiple)
## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = home_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353
# Display the coefficients
coef(model_multiple)
## (Intercept)        Size        Beds       Baths 
## -41.5608472   0.2811026 -33.7035665  83.9844096
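
In the multiple regression the Beds coefficient turns negative and Baths is no longer significant, even though Baths was significant on its own. One possible explanation is that the three predictors are correlated with each other, which can be checked with variance inflation factors (VIF). The sketch below computes VIFs in base R by regressing each predictor on the others and taking 1 / (1 - R-squared). The function `vif_base` is a hypothetical helper written for this illustration, not something used in the original analysis, and it assumes a model with at least two numeric predictors.

```r
# Hypothetical helper: variance inflation factors computed with base R only.
vif_base <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]  # design matrix without the intercept
  vapply(colnames(X), function(v) {
    others <- X[, colnames(X) != v, drop = FALSE]
    # R-squared from regressing this predictor on all the other predictors
    r2 <- summary(lm(X[, v] ~ others))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

# Usage with the model fitted above (assumes model_multiple exists):
# vif_base(model_multiple)
```

Values near 1 indicate little overlap among predictors; larger values mean a coefficient's standard error is inflated by correlation with the other predictors.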

Q5: Are there significant differences in home prices among the four states (CA, NY, NJ, PA)?

# Filter data for the four states
home_states <- home %>% filter(State %in% c("CA", "NY", "NJ", "PA"))
# Perform ANOVA to compare home prices by state
anova_model <- aov(Price ~ State, data = home_states)
# Show the ANOVA summary
summary(anova_model)
##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Plot
ggplot(home_states, aes(x = State, y = Price, fill = State)) +
  geom_boxplot() +
  labs(title = "Home Prices by State",
       x = "State", y = "Price (thousands of USD)") +
  theme_minimal()
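
The ANOVA above (F = 7.355, p = 0.000148) indicates that at least one state's mean price differs from the others, but it does not say which pairs of states differ. A common follow-up is Tukey's honest significant difference test. The sketch below is one way to run it. The wrapper `posthoc_tukey` is a hypothetical name introduced here, and it converts the grouping column to a factor first, on the assumption that `State` may be stored as character text.

```r
# Hypothetical helper: one-way ANOVA followed by Tukey HSD pairwise comparisons.
posthoc_tukey <- function(data, response, group, level = 0.95) {
  data[[group]] <- factor(data[[group]])            # TukeyHSD needs a factor
  fit <- aov(reformulate(group, response = response), data = data)
  TukeyHSD(fit, conf.level = level)
}

# Usage with the data filtered above (assumes home_states exists):
# posthoc_tukey(home_states, "Price", "State")
```

With four states the result contains six pairwise comparisons, each with an estimated difference, a confidence interval, and an adjusted p-value.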