This report seeks to analyze data about homes for sale from the year 2019. The goal is to explore the differences in price, home layout, and home location.
The questions that will be answered in this report include:
How much does the size of a home influence its price in California?
How does the number of bedrooms of a home influence its price in California?
How does the number of bathrooms of a home influence its price in California?
How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price in California?
Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? In other words, does the state in which a home is located have a significant effect on its price?
The data set used in this analysis contains information from a study of homes for sale in various states. It includes 120 observations and 5 variables that describe various characteristics of the homes, such as the asking price, size, number of bedrooms, and number of bathrooms, along with the location of the property.
The variables in this dataset include:
State - Location of the homes
Price - Asking price of the homes (in $1,000s)
Size - Area of all the rooms (in square feet)
Beds - Number of bedrooms
Baths - Number of bathrooms
To explore the relationship between home size and price, we will fit a simple linear regression model using data from homes in California.
##
## Call:
## lm(formula = Price ~ Size, data = homeData_CA)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -462.55 -139.69   39.24  147.65  352.21
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675  154.68102  -0.367 0.716145
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
The regression analysis shows that the size of a home in California has a statistically significant positive association with its price (p = 0.000463). Because Price is recorded in $1,000s and Size in square feet, the slope of 0.339 means that each additional square foot is associated with an increase of about $339 in asking price, or roughly $339,000 per additional 1,000 sq. ft. The model explains roughly 36% of the price variation, with a residual standard error of $219,300. While the size of the home is an important factor, other variables not included in the model likely also influence the price.
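As a quick check on the units, the fitted line can be evaluated by hand. The sketch below copies the two coefficients from the summary output and assumes Price is in $1,000s and Size is in square feet, as the data listing suggests. The 2,000 sq. ft. home is a hypothetical example, not a row from the data.

```r
# Coefficients copied from the summary output of lm(Price ~ Size)
intercept <- -56.81675   # in $1,000s
slope     <- 0.33919     # $1,000s per square foot (assumed units)

# Dollar change in predicted price for one additional square foot
per_sqft_dollars <- slope * 1000

# Predicted asking price (in $1,000s) for a hypothetical 2,000 sq. ft. home
size_example    <- 2000
predicted_price <- intercept + slope * size_example

cat("Per additional sq. ft.: about $", round(per_sqft_dollars), "\n")
cat("Predicted price at 2,000 sq. ft.: about $",
    round(predicted_price * 1000), "\n")
```

Under these assumptions the prediction comes out near $620,000, which is in line with the California asking prices in the data listing.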
This analysis examines how the number of bedrooms in a home influences its price, using data specifically for homes in California.
##
## Call:
## lm(formula = Price ~ Beds, data = homeData_CA)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -413.83 -236.62   29.94  197.69  570.94
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
While the estimated slope is positive (about 84.77, in $1,000s per additional bedroom), its p-value (0.255) is well above 0.05, so the relationship is not statistically significant at the 0.05 level. The number of bedrooms alone explains less than 5% of the variation in price.
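The t value and p-value in the Beds row can be reproduced from the estimate and its standard error alone. The sketch below uses the rounded values printed in the summary, so the results may differ from the output in the last digit.

```r
# Values copied from the Beds row of the summary output
estimate  <- 84.77
std_error <- 72.91
df_resid  <- 28   # residual degrees of freedom reported by lm()

# t statistic for H0: slope = 0, and its two-sided p-value
t_stat  <- estimate / std_error
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

cat("t =", round(t_stat, 3), " p =", round(p_value, 3), "\n")
```

Because the p-value is above 0.05, the data do not rule out a slope of zero for bedrooms.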
To investigate the relationship between the number of bathrooms and the price of homes in California, we fit a regression model to the data.
##
## Call:
## lm(formula = Price ~ Baths, data = homeData_CA)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -374.93 -181.56   -2.74  152.31  614.81
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    90.71     148.57   0.611  0.54641
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared: 0.2588, Adjusted R-squared: 0.2324
## F-statistic: 9.779 on 1 and 28 DF, p-value: 0.004092
The regression model indicates that the number of bathrooms has a statistically significant effect on the price of a home in California. The p-value for the slope (0.00409) is less than the significance level of 0.05, indicating strong evidence against the null hypothesis that the slope is zero. The estimated slope of 194.74 (in $1,000s) suggests that each additional bathroom is associated with an average increase of roughly $195,000 in asking price, and the model explains about 26% of the variation in price.
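A 95% confidence interval gives a sense of how precisely the bathroom slope is estimated. The following is a minimal sketch built from the rounded estimate and standard error in the summary output. With the fitted model object available, `confint(lm_baths)` should give essentially the same interval.

```r
# Values copied from the Baths row of the summary output
estimate  <- 194.74   # in $1,000s per additional bathroom
std_error <- 62.28
df_resid  <- 28

# Critical t value for a two-sided 95% interval
t_crit <- qt(0.975, df = df_resid)

# Interval: estimate plus or minus t_crit * standard error
ci <- estimate + c(-1, 1) * t_crit * std_error

cat("95% CI for the Baths slope ($1,000s):",
    round(ci[1], 1), "to", round(ci[2], 1), "\n")
```

The interval lies entirely above zero, which agrees with the significant p-value, but it is wide, so the size of the effect is not pinned down precisely.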
To investigate how the number of bedrooms, the number of bathrooms, and the size of a home jointly influence its price, we fit a multiple linear regression model and examine the p-value of each slope.
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = california_data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -415.47 -130.32   19.64  154.79  384.94
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -41.5608   210.3809  -0.198   0.8449
## Size           0.2811     0.1189   2.364   0.0259 *
## Beds         -33.7036    67.9255  -0.496   0.6239
## Baths         83.9844    76.7530   1.094   0.2839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3209
## F-statistic: 5.568 on 3 and 26 DF, p-value: 0.004353
The regression model indicates that Size has a statistically significant effect on the price of a home in California. The p-value for the slope (0.0259) is less than the common significance level of 0.05, providing sufficient evidence to reject the null hypothesis that the slope is zero. This suggests that an increase in home size is associated with an increase in home price, holding the number of bedrooms and bathrooms constant. In contrast, the number of bedrooms (p = 0.6239) and bathrooms (p = 0.2839) do not show statistically significant effects in this model, indicating weaker evidence of their individual contributions to home price when accounting for size.
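One way to ask whether bedrooms and bathrooms add anything jointly once size is in the model is a partial F-test comparing the size-only model with the three-predictor model. This test is not part of the analysis above. The sketch below computes it from the two reported R-squared values, assuming both models were fit to the same 30 California homes (both report n = 30 through their degrees of freedom). With the fitted objects, `anova(lm_model, model_ca)` would be the more direct route.

```r
# R-squared values copied from the two summary outputs
r2_reduced <- 0.3594   # Price ~ Size
r2_full    <- 0.3912   # Price ~ Size + Beds + Baths

df_full <- 26   # residual degrees of freedom of the full model
q       <- 2    # number of predictors added (Beds and Baths)

# Partial F statistic for H0: the Beds and Baths slopes are both zero
f_stat  <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_full)
p_value <- pf(f_stat, q, df_full, lower.tail = FALSE)

cat("Partial F =", round(f_stat, 3), " p =", round(p_value, 3), "\n")
```

The large p-value is consistent with the drop in adjusted R-squared from 0.3365 to 0.3209: in this sample, adding bedrooms and bathrooms does not noticeably improve on size alone.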
To investigate whether the state in which a home is located influences its price, we use a one-way ANOVA and examine the p-value associated with the state variable.
##              Df  Sum Sq Mean Sq F value   Pr(>F)
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The one-way ANOVA indicates that there are statistically significant differences in home prices among the four states (CA, NY, NJ, PA). The p-value associated with the State variable is 0.000148, which is well below the typical significance level of 0.05. This provides strong evidence against the null hypothesis that the mean home prices are equal across the states. Therefore, we conclude that the state in which a home is located does have a significant impact on its price.
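The entries of the ANOVA table are linked by simple arithmetic, which the sketch below reproduces from the reported sums of squares. It also computes eta-squared, the share of total price variation attributable to State. That effect-size measure is an addition here and does not appear in the original output.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table
ss_state <- 1198169; df_state <- 3
ss_resid <- 6299266; df_resid <- 116

# Mean squares and the F statistic
ms_state <- ss_state / df_state
ms_resid <- ss_resid / df_resid
f_stat   <- ms_state / ms_resid

# Upper-tail p-value of the F distribution
p_value <- pf(f_stat, df_state, df_resid, lower.tail = FALSE)

# Eta-squared: proportion of total variation explained by State
eta_sq <- ss_state / (ss_state + ss_resid)

cat("F =", round(f_stat, 3), " p =", signif(p_value, 3),
    " eta-squared =", round(eta_sq, 3), "\n")
```

State accounts for roughly 16% of the variation in asking price. Note that a significant F statistic says only that at least one state mean differs; it does not identify which states differ from one another.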
This analysis explored how various features of a home, including its size, number of bedrooms, and number of bathrooms, influence its price in California, as well as whether home prices differ significantly across states. The results show that the size of a home in California has a statistically significant positive impact on its price, meaning that larger homes tend to cost more. However, the number of bedrooms does not appear to significantly influence price on its own. In contrast, the number of bathrooms is a significant factor, with more bathrooms associated with higher home prices. When considering size, bedrooms, and bathrooms together in a multiple regression model, only size remains a statistically significant predictor, suggesting it is the most influential of the three. Additionally, a one-way ANOVA revealed significant differences in average home prices among California, New York, New Jersey, and Pennsylvania, indicating that the state in which a home is located plays an important role in determining its price.
# Filter to California
homeData_CA <- subset(homeData, State == "CA")
# Fit regression model
lm_model <- lm(Price ~ Size, data = homeData_CA)
# Summary of the regression
summary(lm_model)
# Check model adequacy
par(mfrow = c(2, 2))
plot(lm_model)
# Filter data to California
homeData_CA <- subset(homeData, State == "CA")
# Fit regression model
lm_beds <- lm(Price ~ Beds, data = homeData_CA)
# Scatter plot
plot(homeData_CA$Beds, homeData_CA$Price,
     main = "Regression of Home Price on Number of Bedrooms",
     xlab = "Number of Bedrooms",
     ylab = "Home Price ($1,000s)",
     pch = 19, col = "blue")
# Regression line
abline(lm_beds, col = "red")
# Summary of the regression
summary(lm_beds)
# Check model adequacy
par(mfrow = c(2, 2))
plot(lm_beds)
# Fit regression model
lm_baths <- lm(Price ~ Baths, data = homeData_CA)
# Summary of regression
summary(lm_baths)
# Plot the regression line
plot(homeData_CA$Baths, homeData_CA$Price,
     xlab = "Number of Bathrooms",
     ylab = "Price ($1,000s)",
     main = "Price vs. Number of Bathrooms")
abline(lm_baths, col = "blue", lwd = 2)
# Check model adequacy
par(mfrow = c(2, 2))
plot(lm_baths)
# Subset the data for California
california_data <- subset(homeData, State == "CA")
# Fit the multiple linear regression
model_ca <- lm(Price ~ Size + Beds + Baths, data = california_data)
# View the summary
summary(model_ca)
# Plot Size vs Price with regression line
plot(california_data$Size, california_data$Price,
     main = "Price vs Size in California",
     xlab = "Size (sq. ft.)",  # Size values in the data are in square feet
     ylab = "Price (in $1000's)",
     pch = 19, col = "steelblue")
abline(lm(Price ~ Size, data = california_data), col = "red", lwd = 2)
# Plot Beds vs Price with regression line
plot(california_data$Beds, california_data$Price,
     main = "Price vs Bedrooms in California",
     xlab = "Number of Bedrooms",
     ylab = "Price (in $1000's)",
     pch = 19, col = "steelblue")
abline(lm(Price ~ Beds, data = california_data), col = "red", lwd = 2)
# Plot Baths vs Price with regression line
plot(california_data$Baths, california_data$Price,
     main = "Price vs Bathrooms in California",
     xlab = "Number of Bathrooms",
     ylab = "Price (in $1000's)",
     pch = 19, col = "steelblue")
abline(lm(Price ~ Baths, data = california_data), col = "red", lwd = 2)
# Check model adequacy
par(mfrow = c(2, 2))
plot(model_ca)
# Perform one-way ANOVA
anova_model <- aov(Price ~ State, data = homeData)
# Display the ANOVA table
summary(anova_model)
# Boxplot of Price by State
boxplot(Price ~ State,
        data = homeData,
        main = "Home Prices by State",
        xlab = "State",
        ylab = "Price (in $1,000s)",
        col = c("lightblue", "lightgreen", "lightpink", "lightyellow"),
        border = "gray40")
The data being referenced in this report:
## State Price Size Beds Baths
## 1 CA 533 1589 3 2.5
## 2 CA 610 2008 3 2.0
## 3 CA 899 2380 5 3.0
## 4 CA 929 1868 3 3.0
## 5 CA 210 1360 2 2.0
## 6 CA 268 2131 3 2.0
## 7 CA 1095 2436 3 2.0
## 8 CA 699 1375 2 1.0
## 9 CA 729 2013 3 4.0
## 10 CA 700 1371 3 2.0
## 11 CA 145 1440 3 2.0
## 12 CA 469 2286 3 3.0
## 14 CA 841 2474 3 3.5
## 15 CA 300 1464 3 3.0
## 16 CA 285 1419 3 2.0
## 17 CA 640 1440 3 2.0
## 18 CA 929 2479 4 3.0
## 19 CA 714 2191 4 3.0
## 20 CA 559 1309 3 2.0
## 21 CA 235 2224 3 2.0
## 22 CA 180 834 2 1.0
## 23 CA 148 972 3 1.0
## 24 CA 195 1688 4 2.0
## 25 CA 619 1431 3 2.5
## 26 CA 549 1488 3 2.0
## 27 CA 408 1477 3 2.0
## 28 CA 835 2340 3 3.0
## 29 CA 368 1828 5 2.0
## 30 CA 360 1053 3 2.0
## 31 NJ 135 960 3 2.0
## 32 NJ 235 2010 3 3.0
## 33 NJ 200 1388 3 3.0
## 34 NJ 350 1510 3 1.0
## 35 NJ 165 1438 3 2.0
## 36 NJ 249 1700 3 2.0
## 37 NJ 220 1392 3 2.0
## 38 NJ 320 2216 4 3.0
## 39 NJ 115 986 3 2.0
## 40 NJ 275 2193 5 4.0
## 41 NJ 599 3562 4 3.0
## 42 NJ 279 1196 4 2.0
## 43 NJ 439 2394 4 3.0
## 44 NJ 580 3421 4 4.0
## 45 NJ 230 1422 3 1.0
## 46 NJ 625 1488 3 2.0
## 47 NJ 215 1420 3 2.0
## 48 NJ 209 2108 4 3.0
## 49 NJ 279 984 3 1.0
## 50 NJ 330 2024 5 2.0
## 51 NJ 265 1828 4 3.0
## 52 NJ 280 1937 3 3.0
## 53 NJ 500 2060 4 3.0
## 54 NJ 650 3462 4 3.0
## 55 NJ 129 1012 3 2.0
## 56 NJ 410 1560 3 2.0
## 57 NJ 639 1239 2 2.0
## 58 NJ 260 1836 3 3.0
## 59 NJ 369 1065 4 1.0
## 60 NJ 305 1800 3 3.0
## 61 NY 179 1742 3 2.0
## 62 NY 294 1600 4 2.0
## 63 NY 1250 2000 4 4.0
## 64 NY 825 1200 3 3.0
## 65 NY 35 936 2 1.0
## 66 NY 75 1784 2 1.5
## 67 NY 143 2040 3 1.0
## 68 NY 195 2079 6 4.0
## 69 NY 175 2668 5 3.0
## 70 NY 85 1904 3 3.0
## 71 NY 160 1500 3 2.0
## 72 NY 125 1052 3 1.0
## 73 NY 130 1476 4 2.0
## 74 NY 259 2976 5 2.0
## 75 NY 775 3750 4 4.0
## 76 NY 254 1500 3 2.0
## 77 NY 359 1924 4 2.0
## 78 NY 140 1710 4 3.0
## 79 NY 445 3551 3 3.0
## 80 NY 364 2116 4 3.0
## 81 NY 180 1515 4 2.0
## 82 NY 260 2090 4 3.0
## 83 NY 430 2720 4 3.0
## 84 NY 165 1462 4 2.0
## 85 NY 195 1627 4 2.0
## 86 NY 1000 2743 5 4.0
## 87 NY 929 2197 4 3.0
## 88 NY 875 1456 3 2.0
## 89 NY 289 1823 3 2.0
## 90 NY 370 1742 3 3.0
## 91 PA 230 2628 4 2.0
## 92 PA 475 4286 4 4.0
## 93 PA 225 1104 2 2.0
## 94 PA 220 1132 3 1.0
## 95 PA 180 1508 3 3.0
## 96 PA 399 3290 4 4.0
## 97 PA 550 4100 5 5.0
## 98 PA 290 2000 3 3.0
## 99 PA 110 1224 3 2.0
## 100 PA 190 2076 3 2.0
## 101 PA 342 1502 3 2.0
## 102 PA 99 1380 3 1.0
## 103 PA 410 1980 3 2.0
## 104 PA 315 2024 3 2.0
## 105 PA 240 1304 3 1.0
## 106 PA 515 4163 4 4.0
## 107 PA 350 1667 4 2.0
## 108 PA 135 900 2 2.0
## 109 PA 300 1565 3 2.0
## 110 PA 545 3370 5 4.0
## 111 PA 265 1576 3 3.0
## 112 PA 235 1818 3 2.0
## 113 PA 50 540 1 1.0
## 114 PA 145 1271 3 2.0
## 115 PA 345 1788 3 2.0
## 116 PA 190 1680 4 2.0
## 117 PA 190 1500 3 3.0
## 118 PA 90 1768 3 2.0
## 119 PA 228 1732 3 4.0
## 120 PA 109 1770 3 2.0
This data was collected from https://www.lock5stat.com/datapage3e.html