In 2022, 475,803 people moved into California while 817,669 moved out, a net loss of 341,866 residents.
As California's major metropolitan areas become more populated, competition rises for the limited supply of homes and many people are priced out.
This leads to people moving out of California in search of cheaper homes and a lower cost of living.
The rising cost of housing is outpacing income growth in certain areas, making it difficult for people to afford living in these locations. As a result, individuals and families are opting for more affordable regions where their income can provide a better quality of life, even if these areas are farther from their original locations or central business districts.
The goal of this project is to explore and analyze a housing dataset to help identify why people are moving to areas with lower prices but higher median income.
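# Load the housing data (the path is specific to the author's machine)
data <- read.csv("~/Downloads/datacsv.csv")
# Five-number summary and mean of every column
summary(data)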
##      price              bed             bath         house_size
##  Min.   : 135900   Min.   :2.000   Min.   :1.000   Min.   : 564
##  1st Qu.: 540000   1st Qu.:3.000   1st Qu.:2.000   1st Qu.:1199
##  Median : 700000   Median :3.000   Median :2.000   Median :1498
##  Mean   : 709130   Mean   :3.103   Mean   :2.204   Mean   :1571
##  3rd Qu.: 850000   3rd Qu.:4.000   3rd Qu.:3.000   3rd Qu.:1875
##  Max.   :1350000   Max.   :5.000   Max.   :4.000   Max.   :3204
##  zip_code_rank
##  Min.   :  1.00
##  1st Qu.: 39.75
##  Median : 87.00
##  Mean   :107.84
##  3rd Qu.:159.25
##  Max.   :421.00
$price: how much the house costs
$bed: the number of bedrooms in the house
$bath: the number of bathrooms in the house
$house_size: the area of the house, in square feet
$zip_code_rank: the rank of the zip code by how expensive its houses are (the lower the $zip_code_rank value, the higher the $price)
Visually, we can see linear relationships between $price and $house_size (positive correlation) and between $price and $zip_code_rank (negative correlation). The other features, $bed and $bath, show no clear correlation with $price, $house_size, or $zip_code_rank.
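The visual impression described above can also be checked numerically with a correlation matrix. The following is a minimal sketch on synthetic stand-in data: the data frame `demo`, its generating coefficients, and the noise level are assumptions chosen only to mimic the direction of the relationships, and are not the project's dataset.

```r
# Synthetic stand-in data, NOT the project's dataset. Column names match the
# report; the generating coefficients and noise level are assumptions.
set.seed(42)
n <- 200
house_size    <- runif(n, min = 564, max = 3204)    # square feet
zip_code_rank <- sample(1:421, n, replace = TRUE)   # 1 = most expensive zip code
price <- 500000 + 150 * house_size - 1500 * zip_code_rank +
  rnorm(n, mean = 0, sd = 100000)
demo <- data.frame(price, house_size, zip_code_rank)

# Pearson correlation matrix: the sign of each entry gives the direction of
# the linear relationship, and the magnitude gives its strength.
cor_matrix <- cor(demo)
print(round(cor_matrix, 2))
```

On the real data frame, `cor(data)` would give the corresponding matrix and `pairs(data)` would draw the scatterplot matrix that the visual description refers to.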
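# Fit a multiple linear regression of price on the four predictors
model <- lm(price ~ bed + bath + house_size + zip_code_rank, data = data)
# Coefficient estimates, significance tests, and goodness-of-fit statistics
summary(model)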
##
## Call:
## lm(formula = price ~ bed + bath + house_size + zip_code_rank,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -205317 -96088 -5355 91085 258562
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 578484.48 18851.87 30.686 < 2e-16 ***
## bed -28351.96 6192.99 -4.578 5.29e-06 ***
## bath 73052.86 7225.80 10.110 < 2e-16 ***
## house_size 144.73 12.03 12.030 < 2e-16 ***
## zip_code_rank -1574.74 46.57 -33.816 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 116700 on 995 degrees of freedom
## Multiple R-squared: 0.7217, Adjusted R-squared: 0.7206
## F-statistic: 645.2 on 4 and 995 DF, p-value: < 2.2e-16
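As a sanity check on the output above, the sample size, adjusted R-squared, and F-statistic can be re-derived from the printed R-squared and degrees of freedom. This is a sketch that uses only the rounded values shown in the summary, so the F-statistic lands close to, but not exactly on, the printed 645.2.

```r
# Values copied from the printed summary(model) output.
r_squared   <- 0.7217
df_residual <- 995   # residual degrees of freedom
k           <- 4     # predictors: bed, bath, house_size, zip_code_rank

# Observations = residual df + number of predictors + 1 (intercept).
n <- df_residual + k + 1

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df_residual

# Overall F-statistic: explained variation per predictor divided by
# unexplained variation per residual degree of freedom.
f_statistic <- (r_squared / k) / ((1 - r_squared) / df_residual)

cat("n =", n, "\n")
cat("Adjusted R-squared =", round(adj_r_squared, 4), "\n")
cat("F-statistic (from rounded R-squared) =", round(f_statistic, 1), "\n")
```

The residual degrees of freedom imply that the model was fit on 1,000 houses.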
Method: Multiple Linear Regression
Dependent Variable: Housing Price
Independent Variables: number of Beds, number of Baths, House Size, Zip Code Rank
Regression Equation: y = b₀ + b₁x₁ + b₂x₂ + b₃x₃ + b₄x₄, where x₁ = beds | x₂ = baths | x₃ = house size | x₄ = zip code rank
Assumptions: Level of significance α = 0.05 | The relationship between the independent variables and the dependent variable is linear | Residuals are normally distributed
Estimated Regression Equation: ŷ = 578484.48 - 28351.96x₁ + 73052.86x₂ + 144.73x₃ - 1574.74x₄
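To illustrate how the estimated equation produces a prediction, the sketch below plugs in one hypothetical house. The house (3 bedrooms, 2 bathrooms, 1,500 square feet, zip code rank 87) is an assumption chosen near the medians in the data summary, not an actual row of the dataset.

```r
# Coefficients copied from the estimated regression equation.
b <- c(intercept = 578484.48, bed = -28351.96, bath = 73052.86,
       house_size = 144.73, zip_code_rank = -1574.74)

# Hypothetical house (an assumption for illustration, not a row of the data).
# The leading 1 multiplies the intercept.
x <- c(intercept = 1, bed = 3, bath = 2, house_size = 1500, zip_code_rank = 87)

# Predicted price = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4
predicted_price <- sum(b * x)
cat("Predicted price:", format(predicted_price, nsmall = 2, big.mark = ","), "\n")
```

The prediction comes to roughly $719,627, which is close to the median price of $700,000 in the data summary. With the fitted model object available, `predict(model, newdata = ...)` would give the same kind of estimate from the unrounded coefficients.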
Coefficient Interpretation
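b₀ = 578484.48 The intercept of $578,484.48 is the predicted price when beds, baths, house size, and zip code rank are all zero. No house in the data comes close to those values (the minimums are 2 bedrooms, 1 bathroom, 564 square feet, and rank 1), so the intercept is a baseline for the equation and should not be read as a realistic price.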
b₁ = -28351.96 Holding baths, house size, and zip code rank constant, each additional bedroom is associated with a $28,351.96 decrease in home value. This negative relationship suggests that, at a given house size and location, adding bedrooms may trade off other valued features such as room size or shared living space, leading to a lower overall value.
b₂ = 73052.86 Holding the other variables constant, each additional bathroom is associated with a $73,052.86 increase in home value. Bathrooms are an important feature for homebuyers and are likely associated with greater comfort and utility.
b₃ = 144.73 Holding the other variables constant, each additional square foot is associated with a $144.73 increase in home value. This positive relationship highlights the importance of larger living spaces in determining home value.
b₄ = -1574.74 Holding the other variables constant, each one-point increase in zip code rank, which indicates a less expensive area, is associated with a $1,574.74 decrease in home value. This finding emphasizes the importance of location: zip codes with lower rank values have higher housing costs, reflecting desirability and competition in those areas.
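The coefficient interpretations above are point estimates. A rough sense of their uncertainty can be obtained by building 95% confidence intervals from the printed estimates and standard errors. This is a sketch based on the rounded values in the coefficient table, so the bounds are approximate.

```r
# Estimates and standard errors copied from the printed coefficient table.
estimate  <- c(bed = -28351.96, bath = 73052.86,
               house_size = 144.73, zip_code_rank = -1574.74)
std_error <- c(bed = 6192.99, bath = 7225.80,
               house_size = 12.03, zip_code_rank = 46.57)
df_residual <- 995

# Two-sided 95% critical value of the t distribution.
t_crit <- qt(0.975, df = df_residual)

# Interval = estimate +/- critical value * standard error.
ci <- cbind(lower = estimate - t_crit * std_error,
            upper = estimate + t_crit * std_error)
print(round(ci, 2))
```

None of the four intervals contains zero, which agrees with every predictor being significant at the 0.05 level. With the fitted model object available, `confint(model)` would compute these intervals from the unrounded values.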
Because the model fits the data reasonably well (R² = 0.7217, so it explains about 72% of the variation in price), we can conclude that, for our dataset, it is possible to predict home values based on home features
Furthermore, every predictor in the model is statistically significant, as each p-value is far below α = 0.05
People should consider factors beyond the features of a house itself, such as area and community, before deciding to move out of state
People can use our model to understand which features of a house contribute most to its value while taking zip code rank into consideration
Home sellers who plan to move out of state should be informed that there are cheaper alternatives