Introduction

Background:

For my final project in STAT 353, I was assigned to write a statistical report on home sale data to demonstrate my understanding of regression, P-values, and ANOVA. This report addresses five statistical questions using R in the RStudio IDE. I analyze each result individually and then summarize my conclusions.

Purpose:

Research Questions:

Data

Variables and Observations:

For the California data (HomesForSaleCA), there are 30 observations and 5 variables.

For the CA, NY, NJ, PA data (HomesForSale), there are 120 observations and 5 variables.

Variable Definitions:

State: Location of the home

Price: Asking price (in $1,000’s)

Size: Area of all rooms (in sq. ft.)

Beds: Number of bedrooms

Baths: Number of bathrooms

Data Collection:

The data were obtained from the Lock5 website, which sourced them from www.zillow.com in 2019.

Statistical Methods:

Analysis

##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0

Q1: How much does the size of a home influence its price in California?

## 
## Call:
## lm(formula = Price ~ Size, data = ca_homes)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634
  • Slope estimate is 0.33919; because Price is measured in $1,000s, each additional square foot is associated with an increase of approximately $339 in asking price, on average.

  • P-value is 0.000463; the relationship between home size and price is statistically significant. If there were truly no linear relationship (the null hypothesis), a slope at least this far from zero would occur by random chance only about 0.0463% of the time.

  • Multiple R-squared is 0.3594; 35.94% of the variation in price is explained by home size.
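
Because Price is recorded in $1,000s, the slope has to be rescaled before it can be read in dollars. The following is a minimal sketch in R that uses only the numbers printed in the Q1 output; the helper name predict_price_ca and the 2,000 sq. ft. example home are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the Q1 regression output
# (Price is in $1,000s, Size is in sq. ft.)
intercept <- -56.81675
slope     <- 0.33919

# Convert the slope from $1,000s per sq. ft. to dollars per sq. ft.
dollars_per_sqft <- slope * 1000

# Hypothetical helper: fitted asking price (in $1,000s) for a given size
predict_price_ca <- function(size_sqft) intercept + slope * size_sqft

# In a simple regression, R-squared can be recovered from the F-statistic
# and the residual degrees of freedom: R^2 = F / (F + df)
f_stat    <- 15.71
df_resid  <- 28
r_squared <- f_stat / (f_stat + df_resid)

cat("Dollars per additional sq. ft.:", dollars_per_sqft, "\n")
cat("Fitted price for 2,000 sq. ft. (in $1,000s):", predict_price_ca(2000), "\n")
cat("R-squared recovered from F:", round(r_squared, 4), "\n")
```

The fitted value is only an estimate from a model that explains about a third of the price variation, so individual homes can fall far from it.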

Q2: How does the number of bedrooms of a home influence its price in California?

## 
## Call:
## lm(formula = Price ~ Beds, data = ca_homes)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548
  • Slope estimate is 84.77; each additional bedroom is associated with an asking price that is approximately $84,770 higher, on average.

  • P-value is 0.255; the relationship between number of bedrooms and price is not statistically significant. If the null hypothesis of no relationship were true, a slope at least this far from zero would occur by random chance about 25.5% of the time.

  • Multiple R-squared is 0.04605; only about 4.6% of the variation in price is explained by number of bedrooms.
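
The P-value reported for Beds can be reproduced from the t value and the residual degrees of freedom, which makes its meaning concrete: it is the two-sided tail area of a t distribution. This is a minimal sketch, assuming only the rounded t value (1.163) and degrees of freedom (28) printed in the Q2 output.

```r
# Values copied from the Q2 regression output
t_value  <- 1.163
df_resid <- 28

# Two-sided P-value: probability of a t statistic at least this far from
# zero (in either direction) if the true slope were zero
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

cat("Two-sided P-value for Beds:", round(p_value, 3), "\n")
```

Because the t value is rounded in the printed output, the result may differ from the reported P-value in the later decimal places.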

Q3: How does the number of bathrooms of a home influence its price in California?

## 
## Call:
## lm(formula = Price ~ Baths, data = ca_homes)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092
  • Slope estimate is 194.74; each additional bathroom is associated with an asking price that is approximately $194,740 higher, on average.

  • P-value is 0.00409; the relationship between number of bathrooms and price is statistically significant. If the null hypothesis of no relationship were true, a slope at least this far from zero would occur by random chance only about 0.409% of the time.

  • Multiple R-squared is 0.2588; 25.88% of the variation in price is explained by number of bathrooms.
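
A confidence interval shows how precisely the Baths slope is estimated, which the point estimate alone does not. The sketch below builds a 95% interval from the estimate and standard error printed in the Q3 output; the 95% level is my assumption, not something specified in the assignment.

```r
# Values copied from the Q3 regression output (Price in $1,000s)
estimate  <- 194.74
std_error <- 62.28
df_resid  <- 28

# Critical t value for a 95% two-sided interval (assumed confidence level)
t_crit <- qt(0.975, df = df_resid)

# Interval: estimate plus or minus t_crit * standard error
ci <- estimate + c(-1, 1) * t_crit * std_error

cat("95% CI for the Baths slope (in $1,000s):",
    round(ci[1], 2), "to", round(ci[2], 2), "\n")
```

The interval excludes zero, which agrees with the significant P-value, but it is wide, so the size of the bathroom effect is quite uncertain with only 30 homes.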

Q4: How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price in California?

## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = ca_homes)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353
  • For size, slope estimate is 0.2811 and P-value is 0.0259; holding the number of bedrooms and bathrooms constant, each additional square foot is associated with an increase of approximately $281 in asking price. This result is statistically significant: if the null hypothesis were true, a slope at least this far from zero would occur by random chance only about 2.59% of the time.

  • For number of bedrooms, slope estimate is -33.7036 and P-value is 0.6239; holding size and number of bathrooms constant, each additional bedroom is associated with an asking price that is approximately $33,704 lower. This result is not statistically significant: if the null hypothesis were true, a slope at least this far from zero would occur by random chance about 62.39% of the time.

  • For number of bathrooms, slope estimate is 83.9844 and P-value is 0.2839; holding size and number of bedrooms constant, each additional bathroom is associated with an asking price that is approximately $83,984 higher. This result is not statistically significant: if the null hypothesis were true, a slope at least this far from zero would occur by random chance about 28.39% of the time.
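
The adjusted R-squared and overall F-statistic in the Q4 output follow directly from the multiple R-squared, the sample size, and the number of predictors. The sketch below checks this arithmetic using only values from the output (30 observations, 3 predictors, R-squared of 0.3912); the variable names are my own.

```r
# Values taken from the Q4 output and the data description
n         <- 30      # observations in the California data
p         <- 3       # predictors: Size, Beds, Baths
r_squared <- 0.3912  # multiple R-squared

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Overall F-statistic: explained variation per predictor divided by
# unexplained variation per residual degree of freedom
f_stat <- (r_squared / p) / ((1 - r_squared) / (n - p - 1))

cat("Adjusted R-squared:", round(adj_r_squared, 4), "\n")
cat("Overall F-statistic:", round(f_stat, 3), "\n")
```

The adjusted R-squared here (about 0.32) is lower than the 0.3365 from the size-only model in Q1, which is consistent with bedrooms and bathrooms adding little once size is included.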

Q5: Are there significant differences in home prices among the four states (CA, NY, NJ, PA)?

##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
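  • F-value is 7.355; the between-state mean square (399,390) is about 7.4 times the within-state mean square (54,304), so prices vary more between states than would be expected from the variation within states alone.

  • P-value is 0.000148; the F-value is statistically significant. If all four states had the same mean price (the null hypothesis), an F-value at least this large would occur by random chance only about 0.0148% of the time.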
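
The F-value and P-value can be rebuilt from the sums of squares and degrees of freedom in the ANOVA table, which shows where the "between versus within" comparison comes from. This is a minimal sketch using only the numbers printed in the Q5 output; the variable names are my own.

```r
# Values copied from the Q5 ANOVA table
ss_between <- 1198169   # Sum Sq for State
df_between <- 3         # number of states minus 1
ss_within  <- 6299266   # Sum Sq for Residuals
df_within  <- 116       # observations minus number of states

# Mean squares: sum of squares divided by degrees of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F-value: ratio of between-state to within-state mean squares
f_value <- ms_between / ms_within

# P-value: upper tail of the F distribution with (3, 116) degrees of freedom
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

cat("F-value:", round(f_value, 3), "\n")
cat("P-value:", signif(p_value, 3), "\n")
```

A significant F-value only says that at least one state mean differs; identifying which states differ would need a follow-up comparison that is not part of this report.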

Conclusion

With this statistical report, I explored how home prices are related to size, number of bedrooms, number of bathrooms, and location. This is important because many Americans buy or sell a home at some point in their lives, and knowing which factors are most strongly associated with home prices helps us make informed decisions.

When I analyzed each variable in its own regression, size had the strongest association with price (R-squared of 0.3594). Number of bathrooms had a weaker but still statistically significant association (R-squared of 0.2588), and number of bedrooms had the weakest association, which was not statistically significant (P-value of 0.255).

When I analyzed the variables jointly, number of bedrooms and number of bathrooms were not statistically significant predictors of home prices in California once size was in the model. Size remained significant (P-value of 0.0259), which suggests that prices of homes in California depend more on size than on the number of bedrooms or bathrooms.

Under the ANOVA, there is strong evidence that mean prices differ among the states, as shown by the F-value of 7.355 and the P-value of 0.000148. This indicates that at least one of California, New York, New Jersey, and Pennsylvania has a different mean home price, so location is associated with price.

References