Introduction

This report analyzes data on homes for sale in 2019. The goal is to explore how asking price varies with home layout and home location.

The questions that will be answered in this report include:

  1. How much does the size of a home influence its price in California?

  2. How does the number of bedrooms of a home influence its price in California?

  3. How does the number of bathrooms of a home influence its price in California?

  4. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price in California?

  5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? That is, does the state in which a home is located have a significant effect on its price?

Data

The data set used in this analysis contains information from a study of homes for sale in various states. It includes 120 observations and 5 variables that describe various characteristics of the homes, such as the asking price, size, number of bedrooms, and number of bathrooms, along with the location of the property.

The variables in this dataset include:

State - Location of the homes

Price - Asking price of the homes (in $1,000s)

Size - Area of all the rooms (in sq. ft.)

Beds - Number of bedrooms

Baths - Number of bathrooms

Analysis

Q1: How much does the size of a home influence its price in California?

To explore the relationship between home size and price, we will fit a simple linear regression model using data from homes in California.

## 
## Call:
## lm(formula = Price ~ Size, data = homeData_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634

The regression analysis shows that the size of a home in California has a statistically significant positive association with its price (p < 0.001). With Price recorded in $1,000s and Size in square feet, the slope of 0.339 means that each additional square foot is associated with an increase of about $339 in asking price, or roughly $33,900 per additional 100 sq. ft. The model explains roughly 36% of the price variation, with a residual standard error of $219,300. While the size of the home is an important factor, other variables not included in the model likely also influence the price.
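
To make the slope concrete, the fitted line can be evaluated by hand from the coefficients printed above. This is a minimal sketch, assuming Price is recorded in $1,000s and Size in square feet (as the data listing suggests); the 2,000 sq. ft. home and the variable names are chosen only for illustration.

```r
# Coefficients copied from the Price ~ Size output above
intercept <- -56.81675   # in $1,000s
slope     <- 0.33919     # $1,000s per additional square foot (assumed units)

# Hypothetical home size, used only as an example
size_sqft <- 2000

# Predicted asking price in $1,000s
predicted_price <- intercept + slope * size_sqft
predicted_price
# about 621.56, i.e. roughly $621,600

# Change in predicted price for 100 extra square feet, in $1,000s
slope * 100
```

With the fitted model object available, `predict(lm_model, newdata = data.frame(Size = 2000))` should give the same point prediction without copying coefficients by hand.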

Q2: How does the number of bedrooms of a home influence its price in California?

This question examines how the number of bedrooms in a home influences its price, again using a simple linear regression fitted to the California homes only.

## 
## Call:
## lm(formula = Price ~ Beds, data = homeData_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548

The estimated slope is positive (about $84,770 per additional bedroom), but its p-value (0.255) is well above 0.05, so the relationship between the number of bedrooms and price is not statistically significant. The model explains less than 5% of the variation in price.
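
Another way to read this result is through a 95% confidence interval for the Beds slope. The sketch below rebuilds that interval from the estimate and standard error printed above; the variable names are illustrative and the inputs are the rounded values from the output, so the result may differ slightly from one computed on the raw data.

```r
# Estimate and standard error for Beds, copied from the output above
estimate  <- 84.77
std_error <- 72.91
df_resid  <- 28          # residual degrees of freedom

# Two-sided 95% critical value from the t distribution
t_crit <- qt(0.975, df = df_resid)

# Confidence interval for the slope, in $1,000s per bedroom
ci <- c(lower = estimate - t_crit * std_error,
        upper = estimate + t_crit * std_error)
ci
```

The interval contains zero, which matches the non-significant p-value. With the model object available, `confint(lm_beds)` should report the same interval directly.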

Q3: How does the number of bathrooms of a home influence its price in California?

To investigate the relationship between the number of bathrooms and the price of homes in California, we fit a regression model to the data.

## 
## Call:
## lm(formula = Price ~ Baths, data = homeData_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092

The regression model indicates that the number of bathrooms has a statistically significant effect on the price of a home in California. The p-value for the slope (0.00409) is less than the significance level of 0.05, indicating strong evidence against the null hypothesis that the slope is zero. Each additional bathroom is associated with an increase of about $194,740 in asking price, and the model explains roughly 26% of the price variation.
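
The t value and p-value in the table follow directly from the estimate and its standard error. As a hedged illustration of that calculation (using the rounded figures printed above, with illustrative variable names):

```r
# Estimate and standard error for Baths, copied from the output above
estimate  <- 194.74
std_error <- 62.28
df_resid  <- 28

# Test statistic for H0: slope = 0
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with 28 degrees of freedom
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

round(t_value, 3)   # 3.127
p_value             # should be close to the reported 0.00409
```

Because this is a simple regression with one predictor, squaring the t value also reproduces the F-statistic reported at the bottom of the output, up to rounding.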

Q4: How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price in California?

To investigate how the size of a home, its number of bedrooms, and its number of bathrooms jointly influence its price, we fit a multiple linear regression model and examine the p-value of each slope.

## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = california_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353

The regression model indicates that Size has a statistically significant effect on the price of a home in California. The p-value for the slope (0.0259) is less than the common significance level of 0.05, providing sufficient evidence to reject the null hypothesis that the slope is zero. This suggests that an increase in home size is associated with an increase in home price, holding the number of bedrooms and bathrooms constant. In contrast, the number of bedrooms (p = 0.6239) and bathrooms (p = 0.2839) do not show statistically significant effects in this model, indicating weaker evidence of their individual contributions to home price when accounting for size.
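
A natural follow-up is whether Beds and Baths together add anything beyond Size alone. One way to check is a partial F-test comparing the Size-only model from Q1 with the full model. The sketch below computes it from the two reported R-squared values, assuming both models were fitted to the same 30 California homes (which the residual degrees of freedom, 28 and 26, suggest); the variable names are illustrative.

```r
# R-squared values copied from the two regression outputs
r2_reduced <- 0.3594   # Price ~ Size
r2_full    <- 0.3912   # Price ~ Size + Beds + Baths

n_added  <- 2          # predictors added: Beds and Baths
df_resid <- 26         # residual degrees of freedom of the full model

# Partial F statistic for the two added predictors
f_stat <- ((r2_full - r2_reduced) / n_added) / ((1 - r2_full) / df_resid)

# Upper-tail p-value from the F distribution
p_value <- pf(f_stat, df1 = n_added, df2 = df_resid, lower.tail = FALSE)

round(f_stat, 3)    # 0.679
p_value             # well above 0.05
```

Under these assumptions, bedrooms and bathrooms do not significantly improve the fit once size is in the model. With the model objects from the appendix, `anova(lm_model, model_ca)` should carry out the same comparison.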

Q5: Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? That is, does the state in which a home is located have a significant effect on its price?

To investigate whether the state in which a home is located influences its price, we use a one-way ANOVA and examine the p-value associated with the state variable.

##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The one-way ANOVA indicates that there are statistically significant differences in home prices among the four states (CA, NY, NJ, PA). The p-value associated with the State variable is 0.000148, which is well below the typical significance level of 0.05. This provides strong evidence against the null hypothesis that the mean home prices are equal across the states. Therefore, we conclude that the state in which a home is located does have a significant impact on its price.
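
Statistical significance does not by itself say how large the state effect is. As a rough illustration, the F value and the share of price variation attributable to State (eta-squared) can be recomputed from the sums of squares in the table above; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table
ss_state <- 1198169
ss_resid <- 6299266
df_state <- 3
df_resid <- 116

# F statistic: ratio of the two mean squares
f_value <- (ss_state / df_state) / (ss_resid / df_resid)

# Eta-squared: proportion of total variation explained by State
eta_sq <- ss_state / (ss_state + ss_resid)

round(f_value, 3)   # 7.355
round(eta_sq, 4)    # 0.1598
```

On these figures, State accounts for about 16% of the variation in asking price. The ANOVA does not show which states differ from one another; a pairwise procedure such as `TukeyHSD(anova_model)` is one possible way to examine that.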

Summary

This analysis explored how a home's size, number of bedrooms, and number of bathrooms relate to its price in California, as well as whether home prices differ significantly across states. The results show that the size of a home in California has a statistically significant positive impact on its price, meaning that larger homes tend to cost more. However, the number of bedrooms does not appear to significantly influence price on its own. In contrast, the number of bathrooms is a significant factor, with more bathrooms associated with higher home prices. When considering size, bedrooms, and bathrooms together in a multiple regression model, only size remains a statistically significant predictor, suggesting it is the most influential of the three. Additionally, a one-way ANOVA revealed significant differences in average home prices among California, New York, New Jersey, and Pennsylvania, indicating that the state in which a home is located plays an important role in determining its price.

Appendix

Q1: How much does the size of a home influence its price in California?

# homeData is assumed to already hold the full data set listed under References
# Filter to California
homeData_CA <- subset(homeData, State == "CA")

# Fit regression model
lm_model <- lm(Price ~ Size, data = homeData_CA)

# Summary of the regression
summary(lm_model)

# Check model adequacy 
par(mfrow = c(2, 2)) 
plot(lm_model)   

Q2: How does the number of bedrooms of a home influence its price in California?

# Filter data to California
homeData_CA <- subset(homeData, State == "CA")

# Fit regression model
lm_beds <- lm(Price ~ Beds, data = homeData_CA)

# Scatter plot
plot(homeData_CA$Beds, homeData_CA$Price, 
     main = "Regression of Home Price on Number of Bedrooms",
     xlab = "Number of Bedrooms", 
     ylab = "Home Price ($1,000s)", 
     pch = 19, col = "blue")

# Regression line
abline(lm_beds, col = "red")

# Summary of the regression
summary(lm_beds)

# Check model adequacy 
par(mfrow = c(2, 2))  
plot(lm_beds)         

Q3: How does the number of bathrooms of a home influence its price in California?

# Fit regression model 
lm_baths <- lm(Price ~ Baths, data = homeData_CA)

# Summary of regression 
summary(lm_baths)

# Plot the regression line 
plot(homeData_CA$Baths, homeData_CA$Price, 
     xlab = "Number of Bathrooms", 
     ylab = "Price ($1,000s)", 
     main = "Price vs. Number of Bathrooms")
abline(lm_baths, col = "blue", lwd = 2)

# Check model adequacy 
par(mfrow = c(2, 2))  
plot(lm_baths)        

Q4: How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price in California?

# Subset the data for California
california_data <- subset(homeData, State == "CA")

# Fit the multiple linear regression 
model_ca <- lm(Price ~ Size + Beds + Baths, data = california_data)

# View the summary
summary(model_ca)

# Plot Size vs Price with regression line
plot(california_data$Size, california_data$Price,
     main = "Price vs Size in California",
     xlab = "Size (in sq. ft.)",
     ylab = "Price (in $1000's)",
     pch = 19, col = "steelblue")
abline(lm(Price ~ Size, data = california_data), col = "red", lwd = 2)

# Plot Beds vs Price with regression line
plot(california_data$Beds, california_data$Price,
     main = "Price vs Bedrooms in California",
     xlab = "Number of Bedrooms",
     ylab = "Price (in $1000's)",
     pch = 19, col = "steelblue")
abline(lm(Price ~ Beds, data = california_data), col = "red", lwd = 2)

# Plot Baths vs Price with regression line
plot(california_data$Baths, california_data$Price,
     main = "Price vs Bathrooms in California",
     xlab = "Number of Bathrooms",
     ylab = "Price (in $1000's)",
     pch = 19, col = "steelblue")
abline(lm(Price ~ Baths, data = california_data), col = "red", lwd = 2)

# Check model adequacy 
par(mfrow = c(2, 2))  
plot(model_ca)       

Q5: Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? That is, does the state in which a home is located have a significant effect on its price?

# Perform one-way ANOVA
anova_model <- aov(Price ~ State, data = homeData)

# Display the ANOVA table
summary(anova_model)

# Boxplot of Price by State
boxplot(Price ~ State,
        data = homeData,
        main = "Home Prices by State",
        xlab = "State",
        ylab = "Price (in $1,000s)",
        col = c("lightblue", "lightgreen", "lightpink", "lightyellow"),
        border = "gray40")

References

The data used in this report:

##     State Price Size Beds Baths
## 1      CA   533 1589    3   2.5
## 2      CA   610 2008    3   2.0
## 3      CA   899 2380    5   3.0
## 4      CA   929 1868    3   3.0
## 5      CA   210 1360    2   2.0
## 6      CA   268 2131    3   2.0
## 7      CA  1095 2436    3   2.0
## 8      CA   699 1375    2   1.0
## 9      CA   729 2013    3   4.0
## 10     CA   700 1371    3   2.0
## 11     CA   145 1440    3   2.0
## 12     CA   469 2286    3   3.0
## 14     CA   841 2474    3   3.5
## 15     CA   300 1464    3   3.0
## 16     CA   285 1419    3   2.0
## 17     CA   640 1440    3   2.0
## 18     CA   929 2479    4   3.0
## 19     CA   714 2191    4   3.0
## 20     CA   559 1309    3   2.0
## 21     CA   235 2224    3   2.0
## 22     CA   180  834    2   1.0
## 23     CA   148  972    3   1.0
## 24     CA   195 1688    4   2.0
## 25     CA   619 1431    3   2.5
## 26     CA   549 1488    3   2.0
## 27     CA   408 1477    3   2.0
## 28     CA   835 2340    3   3.0
## 29     CA   368 1828    5   2.0
## 30     CA   360 1053    3   2.0
## 31     NJ   135  960    3   2.0
## 32     NJ   235 2010    3   3.0
## 33     NJ   200 1388    3   3.0
## 34     NJ   350 1510    3   1.0
## 35     NJ   165 1438    3   2.0
## 36     NJ   249 1700    3   2.0
## 37     NJ   220 1392    3   2.0
## 38     NJ   320 2216    4   3.0
## 39     NJ   115  986    3   2.0
## 40     NJ   275 2193    5   4.0
## 41     NJ   599 3562    4   3.0
## 42     NJ   279 1196    4   2.0
## 43     NJ   439 2394    4   3.0
## 44     NJ   580 3421    4   4.0
## 45     NJ   230 1422    3   1.0
## 46     NJ   625 1488    3   2.0
## 47     NJ   215 1420    3   2.0
## 48     NJ   209 2108    4   3.0
## 49     NJ   279  984    3   1.0
## 50     NJ   330 2024    5   2.0
## 51     NJ   265 1828    4   3.0
## 52     NJ   280 1937    3   3.0
## 53     NJ   500 2060    4   3.0
## 54     NJ   650 3462    4   3.0
## 55     NJ   129 1012    3   2.0
## 56     NJ   410 1560    3   2.0
## 57     NJ   639 1239    2   2.0
## 58     NJ   260 1836    3   3.0
## 59     NJ   369 1065    4   1.0
## 60     NJ   305 1800    3   3.0
## 61     NY   179 1742    3   2.0
## 62     NY   294 1600    4   2.0
## 63     NY  1250 2000    4   4.0
## 64     NY   825 1200    3   3.0
## 65     NY    35  936    2   1.0
## 66     NY    75 1784    2   1.5
## 67     NY   143 2040    3   1.0
## 68     NY   195 2079    6   4.0
## 69     NY   175 2668    5   3.0
## 70     NY    85 1904    3   3.0
## 71     NY   160 1500    3   2.0
## 72     NY   125 1052    3   1.0
## 73     NY   130 1476    4   2.0
## 74     NY   259 2976    5   2.0
## 75     NY   775 3750    4   4.0
## 76     NY   254 1500    3   2.0
## 77     NY   359 1924    4   2.0
## 78     NY   140 1710    4   3.0
## 79     NY   445 3551    3   3.0
## 80     NY   364 2116    4   3.0
## 81     NY   180 1515    4   2.0
## 82     NY   260 2090    4   3.0
## 83     NY   430 2720    4   3.0
## 84     NY   165 1462    4   2.0
## 85     NY   195 1627    4   2.0
## 86     NY  1000 2743    5   4.0
## 87     NY   929 2197    4   3.0
## 88     NY   875 1456    3   2.0
## 89     NY   289 1823    3   2.0
## 90     NY   370 1742    3   3.0
## 91     PA   230 2628    4   2.0
## 92     PA   475 4286    4   4.0
## 93     PA   225 1104    2   2.0
## 94     PA   220 1132    3   1.0
## 95     PA   180 1508    3   3.0
## 96     PA   399 3290    4   4.0
## 97     PA   550 4100    5   5.0
## 98     PA   290 2000    3   3.0
## 99     PA   110 1224    3   2.0
## 100    PA   190 2076    3   2.0
## 101    PA   342 1502    3   2.0
## 102    PA    99 1380    3   1.0
## 103    PA   410 1980    3   2.0
## 104    PA   315 2024    3   2.0
## 105    PA   240 1304    3   1.0
## 106    PA   515 4163    4   4.0
## 107    PA   350 1667    4   2.0
## 108    PA   135  900    2   2.0
## 109    PA   300 1565    3   2.0
## 110    PA   545 3370    5   4.0
## 111    PA   265 1576    3   3.0
## 112    PA   235 1818    3   2.0
## 113    PA    50  540    1   1.0
## 114    PA   145 1271    3   2.0
## 115    PA   345 1788    3   2.0
## 116    PA   190 1680    4   2.0
## 117    PA   190 1500    3   3.0
## 118    PA    90 1768    3   2.0
## 119    PA   228 1732    3   4.0
## 120    PA   109 1770    3   2.0

This data was collected from https://www.lock5stat.com/datapage3e.html