1. Introduction

I used the HomesForSale data, which lists the prices of homes in California, New Jersey, New York, and Pennsylvania along with each home's size, number of bedrooms, and number of bathrooms.

I will explore five questions, each stated alongside its analysis in Section 2.

2. Analysis

I explore each question in detail below, starting with a preview of the first six rows of the data:

##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0

Q1: Use the data only for California. How much does the size of a home influence its price?

## 
## Call:
## lm(formula = Price ~ Size, data = home)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -308.54 -146.92  -74.77   63.02  857.56 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 119.65602   59.02301   2.027   0.0449 *  
## Size          0.13639    0.02958   4.611 1.02e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 232 on 118 degrees of freedom
## Multiple R-squared:  0.1527, Adjusted R-squared:  0.1455 
## F-statistic: 21.26 on 1 and 118 DF,  p-value: 1.022e-05
## `geom_smooth()` using formula = 'y ~ x'

Because the p-value for Size is 1.02e-05, well below the 0.05 significance level, I conclude that size has a statistically significant association with home price. Price is recorded in thousands of dollars, so the slope of 0.13639 means that each additional 1,000 sq. ft. is associated with a price about $136,390 higher on average. However, the R-squared value of 0.1527 indicates that size explains only about 15.27% of the variability in home prices, so other factors also contribute to price differences.
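The unit conversion behind the slope interpretation, and the number of observations implied by the output, can be checked with a few lines of R. This is a minimal sketch using only numbers copied from the summary above. The residual degrees of freedom imply 120 observations, the same count implied by the four-state ANOVA table in Q5, so it may be worth confirming that `home` was restricted to California before fitting. The data frame below is illustrative only (the six rows printed earlier), and the name `home_ca` is hypothetical.

```r
# Values copied from the Q1 summary output
slope <- 0.13639      # thousand dollars per square foot
resid_df <- 118       # residual degrees of freedom
n_coef <- 2           # intercept and Size

# Price change per extra 1,000 sq. ft., converted to dollars
change_dollars <- slope * 1000 * 1000

# Observations used in the fit: 118 + 2 = 120
n_obs <- resid_df + n_coef

# Illustrative data frame: only the six rows printed in the preview
home <- data.frame(
  State = rep("CA", 6),
  Price = c(533, 610, 899, 929, 210, 268),
  Size  = c(1589, 2008, 2380, 1868, 1360, 2131)
)

# Restrict to California before fitting (hypothetical object name)
home_ca <- subset(home, State == "CA")
fit_ca <- lm(Price ~ Size, data = home_ca)
```

With the full dataset, `nobs(fit_ca)` would show how many homes actually entered the California-only fit.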

Q2: Use the data only for California. How does the number of bedrooms of a home influence its price?

## Model Summary:
## 
## Call:
## lm(formula = Price ~ Beds, data = home)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -347.34 -176.03  -74.41   95.72  835.34 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   159.31      99.28   1.605   0.1112  
## Beds           63.84      28.79   2.217   0.0285 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 247 on 118 degrees of freedom
## Multiple R-squared:   0.04,  Adjusted R-squared:  0.03186 
## F-statistic: 4.917 on 1 and 118 DF,  p-value: 0.02851
## `geom_smooth()` using formula = 'y ~ x'

Interpretation: The slope (coefficient for Beds) is 63.84, which means that each additional bedroom is associated with a price about 63.84 thousand dollars higher on average. The p-value for the slope is 0.0285, which is less than the significance level of 0.05, so the number of bedrooms has a statistically significant influence on the price of a home. The R-squared of 0.04, however, shows that bedrooms alone explain only about 4% of the variability in price.

NOTE: The decision rule was: if the p-value is below 0.05, the influence of Beds on Price is treated as statistically significant.
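The decision rule can be traced back to the reported estimate and standard error. The sketch below recomputes the t statistic and two-sided p-value from the rounded values in the output, so the results match the summary only approximately. The variable names are illustrative.

```r
# Values copied from the Q2 summary output
estimate <- 63.84    # slope for Beds, thousand dollars per bedroom
std_error <- 28.79
resid_df <- 118

# t statistic and two-sided p-value for H0: slope = 0
t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df = resid_df)

# Decision rule: significant if the p-value is below alpha
alpha <- 0.05
significant <- p_value < alpha
```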

Q3: Use the data only for California. How does the number of bathrooms of a home influence its price?

## 
## Call:
## lm(formula = Price ~ Baths, data = home)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -371.18 -161.83  -59.93   84.42  769.42 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    84.98      62.46   1.360    0.176    
## Baths         120.30      24.52   4.907 2.99e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 229.7 on 118 degrees of freedom
## Multiple R-squared:  0.1695, Adjusted R-squared:  0.1624 
## F-statistic: 24.08 on 1 and 118 DF,  p-value: 2.992e-06
## `geom_smooth()` using formula = 'y ~ x'

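Interpretation: The slope (coefficient for Baths) is 120.30, which means that each additional bathroom is associated with a price about 120.30 thousand dollars higher on average. The p-value for the slope is 2.99e-06, which is less than the significance level of 0.05, so the number of bathrooms has a statistically significant influence on the price of a home. The R-squared of 0.1695 indicates that bathrooms explain about 16.95% of the variability in price.

NOTE: The decision rule was: if the p-value is below 0.05, the influence of Baths on Price is treated as statistically significant.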
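A confidence interval gives a sense of how precisely the bathroom slope is estimated, which the p-value alone does not show. The sketch below builds a 95% interval from the rounded estimate and standard error in the output. It is an approximation of what `confint()` would return on the fitted model, not a replacement for it.

```r
# Values copied from the Q3 summary output
estimate <- 120.30   # slope for Baths, thousand dollars per bathroom
std_error <- 24.52
resid_df <- 118

# Critical t value for a two-sided 95% interval
t_crit <- qt(0.975, df = resid_df)

# Lower and upper bounds of the interval, in thousand dollars
ci <- estimate + c(-1, 1) * t_crit * std_error
```

An interval that excludes zero is consistent with the significant p-value reported above.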

Q4: Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

## Model Summary:
## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = home)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -352.31 -157.69  -68.89   86.14  745.66 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 103.75177   92.91802   1.117   0.2665  
## Size          0.08199    0.04264   1.923   0.0570 .
## Beds        -25.80554   32.82340  -0.786   0.4334  
## Baths        84.95750   34.48394   2.464   0.0152 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 228.1 on 116 degrees of freedom
## Multiple R-squared:  0.1953, Adjusted R-squared:  0.1745 
## F-statistic: 9.385 on 3 and 116 DF,  p-value: 1.329e-05
## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

Interpretation:

Variable: Size. The coefficient (slope) is 0.08, which means that each additional square foot is associated with a price change of approximately 0.08 thousand dollars (about $82), holding the other variables constant. The p-value is 0.0570, which is greater than the significance level of 0.05, so Size does not have a statistically significant influence on home price in this model.

Variable: Beds. The coefficient (slope) is -25.81, which means that each additional bedroom is associated with a price change of approximately -25.81 thousand dollars, holding the other variables constant. The p-value is 0.4334, which is greater than the significance level of 0.05, so Beds does not have a statistically significant influence on home price in this model.

Variable: Baths. The coefficient (slope) is 84.96, which means that each additional bathroom is associated with a price change of approximately 84.96 thousand dollars, holding the other variables constant. The p-value is 0.0152, which is less than the significance level of 0.05, so Baths has a statistically significant influence on home price.

Jointly, the three predictors explain about 19.53% of the variability in price (R-squared of 0.1953), and the overall F-test (p-value 1.329e-05) indicates that the model as a whole is statistically significant.
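The joint fit statistics in the Q4 output are tied together by standard formulas. The sketch below recomputes the adjusted R-squared and the overall F-statistic from the reported R-squared and degrees of freedom. Because the inputs are rounded, the results agree with the output only to about three significant figures.

```r
# Values copied from the Q4 summary output
r_squared <- 0.1953
k <- 3            # predictors: Size, Beds, Baths
resid_df <- 116   # n - k - 1
n <- resid_df + k + 1

# Adjusted R-squared penalizes the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / resid_df

# Overall F-statistic for H0: all three slopes are zero
f_value <- (r_squared / k) / ((1 - r_squared) / resid_df)
p_value <- pf(f_value, k, resid_df, lower.tail = FALSE)
```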

Q5: Use the data for all four states. Do home prices differ among the states (CA, NJ, NY, PA)?

##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p-value is 0.000148, far below the significance level of 0.05, I reject the null hypothesis that mean home prices are equal across the four states (CA, NJ, NY, PA). The state in which a home is located therefore has a statistically significant association with its price. The ANOVA alone does not show which states differ from one another.
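The entries of the ANOVA table follow from its sums of squares and degrees of freedom. The sketch below recomputes the mean squares, F value, and p-value from the printed table, and adds the share of price variability attributable to State (eta-squared), which the original output does not report.

```r
# Values copied from the Q5 ANOVA table
ss_state <- 1198169
df_state <- 3
ss_resid <- 6299266
df_resid <- 116

# Mean squares are sums of squares divided by their degrees of freedom
ms_state <- ss_state / df_state
ms_resid <- ss_resid / df_resid

# F value and its upper-tail p-value
f_value <- ms_state / ms_resid
p_value <- pf(f_value, df_state, df_resid, lower.tail = FALSE)

# Proportion of total variability explained by State
eta_sq <- ss_state / (ss_state + ss_resid)
```

To see which pairs of states differ, one possible follow-up is `TukeyHSD()` applied to an `aov(Price ~ State, data = home)` fit, assuming `home` holds all four states.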

3. Summary

This project analyzed a dataset of homes for sale in California, New Jersey, New York, and Pennsylvania to examine how size, number of bedrooms, and number of bathrooms influence home prices. In separate simple linear regressions, size, bedrooms, and bathrooms were each statistically significant at the 0.05 level, although bedrooms explained only about 4% of the variability in price. In the multiple regression, only bathrooms remained significant (p = 0.0152); size was marginal (p = 0.0570) and bedrooms was not significant (p = 0.4334). ANOVA showed that mean home prices differ significantly across the four states, but the output reported here does not identify which states differ. Overall, bathrooms and size are the strongest of the three predictors, yet together the three explain less than 20% of the variability in price.