Introduction

In 2022, 475,803 people moved into California but 817,669 moved out, a net loss of 341,866 residents.

As California's major metropolitan areas become more populated, competition for the limited supply of homes rises and many people are priced out.

This leads to people moving out of California in search of cheaper homes and a lower cost of living.

Problem Statement

The rising cost of housing is outpacing income growth in certain areas, making it difficult for people to afford living in these locations. As a result, individuals and families are opting for more affordable regions where their income can provide a better quality of life, even if these areas are farther from their original locations or central business districts.

Objective

The goal of this project is to explore and analyze a housing dataset to help identify why people are moving to areas with lower prices but higher median income.

Import dataset with “read.csv”

data <- read.csv("~/Downloads/datacsv.csv")

Summarization for descriptive analysis

summary(data)
##      price              bed             bath         house_size  
##  Min.   : 135900   Min.   :2.000   Min.   :1.000   Min.   : 564  
##  1st Qu.: 540000   1st Qu.:3.000   1st Qu.:2.000   1st Qu.:1199  
##  Median : 700000   Median :3.000   Median :2.000   Median :1498  
##  Mean   : 709130   Mean   :3.103   Mean   :2.204   Mean   :1571  
##  3rd Qu.: 850000   3rd Qu.:4.000   3rd Qu.:3.000   3rd Qu.:1875  
##  Max.   :1350000   Max.   :5.000   Max.   :4.000   Max.   :3204  
##  zip_code_rank   
##  Min.   :  1.00  
##  1st Qu.: 39.75  
##  Median : 87.00  
##  Mean   :107.84  
##  3rd Qu.:159.25  
##  Max.   :421.00

$price is the price of the house in dollars | $bed is the number of bedrooms in the house | $bath is the number of bathrooms in the house | $house_size is the area of the house in square feet | $zip_code_rank ranks the zip code by how expensive its houses are (the lower the $zip_code_rank value, the higher the $price)

Scatterplot for price vs other features

Visually, $price has a linear relationship with $house_size (positive correlation) and with $zip_code_rank (negative correlation). The other features, $bed and $bath, show no clear correlation with $price, $house_size, or $zip_code_rank.
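The visual impression described above can be backed with numbers. The sketch below is illustrative only: it builds a small simulated data frame (`demo`, a hypothetical stand-in that reuses the report's column names but none of its values), draws a scatterplot matrix with `pairs()`, and computes the correlation of price with each feature.

```r
# Hypothetical stand-in data: simulated values, NOT the report's dataset.
set.seed(42)
n <- 100
demo <- data.frame(
  bed           = sample(2:5, n, replace = TRUE),
  bath          = sample(1:4, n, replace = TRUE),
  house_size    = round(runif(n, min = 564, max = 3204)),
  zip_code_rank = sample(1:421, n, replace = TRUE)
)
# Simulated price rises with size and falls with rank, plus random noise.
demo$price <- 500000 + 150 * demo$house_size -
  1500 * demo$zip_code_rank + rnorm(n, sd = 50000)

# Scatterplot matrix of every pair of variables.
pairs(demo[, c("price", "bed", "bath", "house_size", "zip_code_rank")])

# Pearson correlation of price with each feature.
price_cor <- cor(demo)["price", c("bed", "bath", "house_size", "zip_code_rank")]
print(round(price_cor, 2))
```

With the report's own data frame, the same two calls would be applied to `data` instead of `demo`.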

Perform multiple linear regression analysis

model <- lm(price ~ bed + bath + house_size + zip_code_rank, data = data)
summary(model)
## 
## Call:
## lm(formula = price ~ bed + bath + house_size + zip_code_rank, 
##     data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -205317  -96088   -5355   91085  258562 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   578484.48   18851.87  30.686  < 2e-16 ***
## bed           -28351.96    6192.99  -4.578 5.29e-06 ***
## bath           73052.86    7225.80  10.110  < 2e-16 ***
## house_size       144.73      12.03  12.030  < 2e-16 ***
## zip_code_rank  -1574.74      46.57 -33.816  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 116700 on 995 degrees of freedom
## Multiple R-squared:  0.7217, Adjusted R-squared:  0.7206 
## F-statistic: 645.2 on 4 and 995 DF,  p-value: < 2.2e-16

Method: Multiple Linear Regression

Dependent Variable: Housing Price

Independent Variables: Number of Beds, Number of Baths, House Size, Zip Code Rank

Regression Equation: y = b₀ + b₁x₁ + b₂x₂ + b₃x₃ + b₄x₄, where x₁ = beds | x₂ = baths | x₃ = house size | x₄ = zip code rank

Assumptions: Level of significance α = 0.05 | The relationship between the independent variables and the dependent variable is linear | Residuals are normally distributed
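The linearity and normality assumptions can be checked from the residuals of a fitted model. The following is a minimal sketch, not part of the original analysis: `check_assumptions` is a hypothetical helper name, and the demonstration fits a model to R's built-in `mtcars` data rather than to the housing data.

```r
# Hypothetical helper: draws two diagnostic plots and returns a normality test.
check_assumptions <- function(fit) {
  res <- resid(fit)
  # Residuals vs fitted: a patternless band around zero supports linearity.
  plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)
  # Normal Q-Q plot: points close to the line support normal residuals.
  qqnorm(res)
  qqline(res)
  # Shapiro-Wilk test: a p-value below alpha is evidence against normality.
  shapiro.test(res)
}

# Demonstration on built-in data, standing in for the housing model.
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)
sw <- check_assumptions(demo_fit)
print(sw$p.value)
```

Applied to this report, the call would be `check_assumptions(model)` once the regression has been fitted.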

Results

Estimated Regression Equation: ŷ = 578484.48 - 28351.96x₁ + 73052.86x₂ + 144.73x₃ - 1574.74x₄
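As a worked example, the estimated equation can be evaluated for a single house. The sketch below copies the coefficients from the equation above. `predict_price` is a hypothetical helper, and the example house uses the median values from the descriptive summary (3 beds, 2 baths, 1498 square feet, rank 87).

```r
# Coefficients copied from the estimated regression equation.
coefs <- c(intercept = 578484.48, bed = -28351.96, bath = 73052.86,
           house_size = 144.73, zip_code_rank = -1574.74)

# Hypothetical helper that evaluates the equation for one house.
predict_price <- function(bed, bath, house_size, zip_code_rank) {
  unname(coefs["intercept"] +
           coefs["bed"] * bed +
           coefs["bath"] * bath +
           coefs["house_size"] * house_size +
           coefs["zip_code_rank"] * zip_code_rank)
}

# A house at the dataset medians.
median_house <- predict_price(bed = 3, bath = 2,
                              house_size = 1498, zip_code_rank = 87)
print(median_house)  # roughly 719,337
```

The result is close to the median price of 700,000 in the summary. With the fitted object available, `predict(model, newdata = ...)` with a one-row data frame should give the same value up to rounding of the coefficients.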

Coefficient Interpretation

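b₀ = 578484.48 The intercept of $578,484.48 is the predicted price of a house with zero bedrooms, zero bathrooms, zero square feet, and a zip code rank of zero. No such house appears in the data (the minimums are 2 bedrooms, 1 bathroom, 564 square feet, and rank 1), so the intercept serves as a baseline for the equation rather than a realistic price.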

b₁ = -28351.96 Holding the other variables constant, each additional bedroom is associated with a $28,351.96 decrease in home value. This negative relationship suggests that, for a given house size and location, more bedrooms may come at the expense of other valued features such as larger rooms or shared living space, leading to a decrease in overall value.

b₂ = 73052.86 Holding the other variables constant, each additional bathroom is associated with a $73,052.86 increase in home value. Bathrooms are a critical feature for homebuyers and are likely associated with higher levels of comfort and utility.

b₃ = 144.73 Holding the other variables constant, each additional square foot is associated with a $144.73 increase in home value. This positive relationship highlights the importance of larger living spaces in determining home value.

b₄ = -1574.74 Holding the other variables constant, each one-point increase in zip code rank, which indicates a less expensive area, is associated with a $1,574.74 decrease in home value. This finding emphasizes the importance of location: zip codes with lower rank values have higher housing costs, reflecting desirability and competition in those areas.

Conclusion & Recommendation

Because the model fits the data reasonably well (adjusted R-squared of 0.7206, meaning it explains about 72% of the variation in price), we can conclude that home values in our dataset can be predicted from home features.
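The quality of fit can be cross-checked from figures printed in the regression summary. This sketch recomputes the adjusted R-squared from the reported multiple R-squared (0.7217), assuming a sample size of 1000 (the 995 residual degrees of freedom plus the 5 estimated coefficients) and 4 predictors.

```r
r_squared <- 0.7217   # Multiple R-squared from the summary output
n <- 995 + 5          # residual degrees of freedom + estimated coefficients
p <- 4                # number of predictors

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
print(round(adj_r_squared, 4))  # 0.7206, matching the summary output
```

The adjustment is small here because the sample is large relative to the number of predictors.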

Furthermore, every feature of the house is statistically significant, as each p-value is well below α = 0.05.
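The significance check can be reproduced from the coefficient table. The sketch below copies the estimates and standard errors from the summary output, recomputes each t value, and derives two-sided p-values from a t distribution with 995 degrees of freedom. Because the inputs are rounded, the results agree with the printed table only approximately.

```r
# Estimates and standard errors copied from the coefficient table.
est <- c(bed = -28351.96, bath = 73052.86,
         house_size = 144.73, zip_code_rank = -1574.74)
se  <- c(bed = 6192.99, bath = 7225.80,
         house_size = 12.03, zip_code_rank = 46.57)
df_resid <- 995
alpha <- 0.05

# t value = estimate / standard error; the p-value is two-sided.
t_values <- est / se
p_values <- 2 * pt(-abs(t_values), df = df_resid)

print(round(t_values, 3))
print(p_values < alpha)  # TRUE for every feature
```

With the fitted object available, `summary(model)$coefficients` holds the same table at full precision.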

Business Recommendations

People should consider factors beyond the features of a house, such as the area and community, before deciding to move out of state

People can use our model to understand which features of a house contribute most to its value, while taking zip code rank into consideration

Inform home sellers moving out of state that there are cheaper alternatives