Background:
For my final project in STAT 353, I was assigned to write a statistical report on home sale data to demonstrate my understanding of regression, P-values, and ANOVA. Working in the RStudio IDE, I address five statistical questions using R's built-in modeling functions. I analyze each result individually and then summarize my conclusions.
Purpose:
Gain experience in RStudio programming
Reinforce my knowledge of inferential statistics
Analyze the data provided in a creative and insightful way
Research Questions:
How much does the size of a home influence its price in California?
How does the number of bedrooms of a home influence its price in California?
How does the number of bathrooms of a home influence its price in California?
How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price in California?
Are there significant differences in home prices among the four states (CA, NY, NJ, PA)?
Variables and Observations:
For the California data (HomesForSaleCA), there are 30 observations and 5 variables.
For the CA, NY, NJ, PA data (HomesForSale), there are 120 observations and 5 variables.
Variable Definitions:
State: Location of the home
Price: Asking price (in $1,000’s)
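Size: Area of all rooms (in sq. ft.; the source documentation describes the unit as 1,000’s sq. ft., but the values in this file, such as 1589, are in square feet)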
Beds: Number of bedrooms
Baths: Number of bathrooms
Data Collection:
The data were obtained from the Lock5 website; the listings were originally sourced from www.zillow.com in 2019.
Statistical Methods:
Regression
Multiple Regression
ANOVA
##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0
##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0
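As a minimal sketch of how a preview like the one above and a first model fit might be produced, the block below rebuilds only the six printed rows by hand. The object names `ca_preview` and `fit_preview` are hypothetical, and a fit on six rows will not reproduce the report's estimates, which use all 30 California homes.

```r
# Rebuild the six rows shown in the preview (not the full 30-home data set).
ca_preview <- data.frame(
  State = rep("CA", 6),
  Price = c(533, 610, 899, 929, 210, 268),        # asking price, $1,000s
  Size  = c(1589, 2008, 2380, 1868, 1360, 2131),  # values as printed above
  Beds  = c(3, 3, 5, 3, 2, 3),
  Baths = c(2.5, 2.0, 3.0, 3.0, 2.0, 2.0)
)

# Print the first rows in the same layout as the preview.
head(ca_preview)

# Simple linear regression using the same formula as the report.
fit_preview <- lm(Price ~ Size, data = ca_preview)

# Coefficient table, R-squared, and F-statistic for the six-row fit only.
summary(fit_preview)
```

With the full data, the same `lm()` and `summary()` calls would be applied to the complete California data frame instead of `ca_preview`.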
##
## Call:
## lm(formula = Price ~ Size, data = ca_homes)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -462.55 -139.69   39.24  147.65  352.21
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675  154.68102  -0.367 0.716145
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
Slope estimate is 0.33919. Because Price is recorded in $1,000’s and Size in square feet, each additional square foot is associated with an increase in asking price of approximately $339.
P-value is 0.000463. The relationship between home size and price is statistically significant: if the null hypothesis of no linear relationship were true, the probability of observing a slope at least this extreme would be about 0.0463%.
Multiple R-squared is 0.3594; about 35.94% of the variation in price is explained by home size.
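The quantities in this summary are tied together by simple arithmetic, which can be checked by hand. The sketch below copies the printed values and recomputes the slope in dollars, the t statistic, the two-sided P-value, and the F-statistic; the variable names are my own, and small differences from the printed output come from rounding.

```r
# Values copied from the Price ~ Size summary above.
slope    <- 0.33919   # change in Price ($1,000s) per one-unit change in Size
se_slope <- 0.08558   # standard error of the slope
df_resid <- 28        # residual degrees of freedom (30 homes - 2 coefficients)
r2       <- 0.3594    # multiple R-squared

# Price is in $1,000s, so multiply by 1000 to express the slope in dollars.
slope_dollars <- slope * 1000

# t statistic and two-sided P-value for the null hypothesis slope = 0.
t_stat  <- slope / se_slope
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# With a single predictor, the F-statistic can be recovered from R-squared.
f_from_r2 <- (r2 / (1 - r2)) * df_resid

round(c(slope_dollars = slope_dollars, t = t_stat, p = p_value, F = f_from_r2), 4)
```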
##
## Call:
## lm(formula = Price ~ Beds, data = ca_homes)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -413.83 -236.62   29.94  197.69  570.94
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
Slope estimate is 84.77. Each additional bedroom is associated with an increase in asking price of approximately $84,770.
P-value is 0.255. The relationship between number of bedrooms and price is not statistically significant: if the null hypothesis of no linear relationship were true, a slope at least this extreme would be observed about 25.5% of the time.
Multiple R-squared is 0.04605; only about 4.6% of the variation in price is explained by number of bedrooms.
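One way to see why the bedroom slope is not significant is to build a 95% confidence interval from the printed estimate and standard error. This is a sketch under the usual regression assumptions, with variable names of my own choosing; the interval is not part of the original output.

```r
# Values copied from the Price ~ Beds summary above (Price in $1,000s).
est      <- 84.77   # slope estimate
se       <- 72.91   # standard error of the slope
df_resid <- 28      # residual degrees of freedom

# Critical t value for a two-sided 95% interval.
t_crit <- qt(0.975, df = df_resid)

# Lower and upper limits of the 95% confidence interval for the slope.
ci <- est + c(-1, 1) * t_crit * se

round(ci, 2)

# TRUE when the interval contains zero, which agrees with a P-value above 0.05.
ci[1] < 0 && ci[2] > 0
```

An interval that spans zero means the data are consistent with no linear relationship between bedrooms and price, as well as with a fairly large positive one.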
##
## Call:
## lm(formula = Price ~ Baths, data = ca_homes)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -374.93 -181.56   -2.74  152.31  614.81
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    90.71     148.57   0.611  0.54641
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared: 0.2588, Adjusted R-squared: 0.2324
## F-statistic: 9.779 on 1 and 28 DF, p-value: 0.004092
Slope estimate is 194.74. Each additional bathroom is associated with an increase in asking price of approximately $194,740.
P-value is 0.00409. The relationship between number of bathrooms and price is statistically significant: if the null hypothesis of no linear relationship were true, a slope at least this extreme would be observed only about 0.409% of the time.
Multiple R-squared is 0.2588; about 25.88% of the variation in price is explained by number of bathrooms.
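To make the slope interpretation concrete, the fitted line can be evaluated for two bathroom counts. The helper `predict_price_baths` below is hypothetical and simply applies the printed coefficients; it is a sketch, not output from the original analysis.

```r
# Coefficients copied from the Price ~ Baths summary above (Price in $1,000s).
intercept <- 90.71
slope     <- 194.74

# Hypothetical helper: fitted asking price (in $1,000s) for a bathroom count.
predict_price_baths <- function(baths) {
  intercept + slope * baths
}

fitted_2 <- predict_price_baths(2)
fitted_3 <- predict_price_baths(3)

# The gap between the two fitted values equals the slope.
c(two_baths = fitted_2, three_baths = fitted_3, difference = fitted_3 - fitted_2)
```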
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = ca_homes)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -415.47 -130.32   19.64  154.79  384.94
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.5608   210.3809  -0.198   0.8449
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239
## Baths        83.9844    76.7530   1.094   0.2839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3209
## F-statistic: 5.568 on 3 and 26 DF, p-value: 0.004353
For size, slope estimate is 0.2811 and P-value is 0.0259. Holding the number of bedrooms and bathrooms constant, each additional square foot is associated with an increase in asking price of about $281. This result is statistically significant: if the true coefficient were zero, an estimate at least this extreme would occur only about 2.59% of the time.
For number of bedrooms, slope estimate is -33.7036 and P-value is 0.6239. Holding size and number of bathrooms constant, each additional bedroom is associated with a decrease in asking price of about $33,704. This result is not statistically significant: if the true coefficient were zero, an estimate at least this extreme would occur about 62.39% of the time.
For number of bathrooms, slope estimate is 83.9844 and P-value is 0.2839. Holding size and number of bedrooms constant, each additional bathroom is associated with an increase in asking price of about $83,984. This result is not statistically significant: if the true coefficient were zero, an estimate at least this extreme would occur about 28.39% of the time.
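The overall fit statistics for the joint model can also be rebuilt from R-squared alone. The sketch below, with variable names of my own, recomputes the adjusted R-squared, the overall F-statistic, and its P-value from the printed values; small differences from the output are due to rounding.

```r
# Values copied from the Price ~ Size + Beds + Baths summary above.
n  <- 30       # California homes
k  <- 3        # predictors: Size, Beds, Baths
r2 <- 0.3912   # multiple R-squared

# Residual degrees of freedom: observations minus predictors minus intercept.
df_resid <- n - k - 1

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Overall F-statistic: explained variation per predictor divided by
# unexplained variation per residual degree of freedom.
f_stat <- (r2 / k) / ((1 - r2) / df_resid)

# P-value for the null hypothesis that all three slopes are zero.
p_overall <- pf(f_stat, df1 = k, df2 = df_resid, lower.tail = FALSE)

round(c(adj_r2 = adj_r2, F = f_stat, p = p_overall), 4)
```

The adjusted R-squared of the size-only model was 0.3365, slightly higher than the 0.3209 here, which suggests that adding bedrooms and bathrooms did not improve the fit once the extra predictors are penalized.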
##              Df  Sum Sq Mean Sq F value   Pr(>F)
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
F-value is 7.355. The mean square between states (399,390) is about 7.4 times the mean square within states (54,304), so the state means differ by more than the within-state variation alone would predict.
P-value is 0.000148, so the F-value is statistically significant: if the null hypothesis of equal mean prices in all four states were true, the probability of an F-value at least this large would be only about 0.0148%.
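The ANOVA table can be reconstructed from its sums of squares and degrees of freedom. The sketch below uses variable names of my own to recompute the mean squares, the F-value, and the P-value. It also computes the share of total variation attributable to State (often called eta-squared), which is not in the original output.

```r
# Values copied from the ANOVA table above.
ss_between <- 1198169   # Sum Sq for State
ss_within  <- 6299266   # Sum Sq for Residuals
df_between <- 3         # 4 states - 1
df_within  <- 116       # 120 homes - 4 states

# Mean squares: each sum of squares divided by its degrees of freedom.
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F-value: ratio of between-state to within-state mean square.
f_value <- ms_between / ms_within

# P-value: upper-tail probability of the F distribution.
p_value <- pf(f_value, df1 = df_between, df2 = df_within, lower.tail = FALSE)

# Share of total variation in price associated with State (eta-squared).
eta_sq <- ss_between / (ss_between + ss_within)

round(c(F = f_value, p = p_value, eta_sq = eta_sq), 4)
```

A significant F-value shows only that at least one state mean differs from the others; it does not identify which states differ.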
With this statistical report, I explored how home prices are related to size, number of bedrooms, number of bathrooms, and location. This matters because many Americans buy or sell a home at some point in their lives, and knowing which factors are most strongly associated with price supports better decisions.
When I analyzed each variable in its own regression, size had the strongest association with price (R-squared of 0.3594). Number of bathrooms had a weaker but still statistically significant association (R-squared of 0.2588), and number of bedrooms had the weakest association, which was not statistically significant (R-squared of 0.04605).
When I analyzed the variables jointly, number of bedrooms and number of bathrooms were not statistically significant predictors of home prices in California. Size was the only predictor with a P-value below 0.05 (0.0259), which suggests that, among these three variables, California home prices depend mainly on size rather than on the number of bedrooms or bathrooms.
The ANOVA provides strong evidence that mean home prices differ among the states, as shown by the F-value of 7.355 and the P-value of 0.000148. This indicates that location is associated with home prices across California, New York, New Jersey, and Pennsylvania, although the test alone does not show which states differ.