In this analysis, I study how the number of bedrooms in a building classified as “affordable housing” affects the rent a tenant pays across the country. To do so, I use a dataset from the National Housing Preservation Database, which was created by the Public and Affordable Housing Research Corporation and the National Low Income Housing Coalition nearly ten years ago.
In this dataset, each row represents a building and describes the structure: the number of units in the building, its location, and the type of tenants it serves (i.e., families, people with disabilities, or the elderly). The dataset contains over 40,000 rows. Because it reports rent only for two-bedroom units, my analysis is limited to two-bedroom apartments across the country. My independent variable is the number of two-bedroom units in a building, and my dependent variable is the amount of rent a tenant pays.
The data also have a grouping structure: each building is located in a state, which makes it possible to investigate how rent differs across the United States of America. Ultimately, even though the apartments are classified as “affordable housing,” I hypothesize that buildings with fewer two-bedroom apartments will be more expensive for tenants to rent, and that the location of an apartment will also affect how much rent is charged.
FairMarketRent_2BR = Dependent Variable
TwoBedroomUnits = Independent Variable
library(readr)
library(dplyr)
library(ggplot2)
library(nlme)
library(lme4)
Properties <- read_csv("Desktop/Properties.csv")
Housing <- Properties %>%
  select(TwoBedroomUnits,
         FairMarketRent_2BR,
         State) %>%
  filter(!is.na(TwoBedroomUnits),
         !is.na(FairMarketRent_2BR),
         !is.na(State))
CP <- lm(FairMarketRent_2BR ~ TwoBedroomUnits, data = Housing)
summary(CP)
##
## Call:
## lm(formula = FairMarketRent_2BR ~ TwoBedroomUnits, data = Housing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -702.3 -266.9 -130.7 103.0 2138.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 982.99652 1.93968 506.78 <2e-16 ***
## TwoBedroomUnits 1.11968 0.04558 24.56 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 384 on 54635 degrees of freedom
## Multiple R-squared: 0.01092, Adjusted R-squared: 0.0109
## F-statistic: 603.3 on 1 and 54635 DF, p-value: < 2.2e-16
The complete pooling model is a single linear regression that treats every building as an individual observation. In other words, the data are not grouped by state. The intercept is the predicted fair market rent when TwoBedroomUnits equals zero, about 983 dollars. The slope is positive: each additional two-bedroom unit is associated with roughly 1.12 dollars more in fair market rent, although the model explains only about 1% of the variation in rent (R-squared of 0.011). This model has an important limitation, however. It does not account for how rent varies from state to state, and outliers can drastically affect its results.
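To make the coefficients concrete, the sketch below plugs two building sizes into the fitted line. The intercept and slope are copied from the summary above; the unit counts (10 and 50) and the helper name `predict_rent` are my own choices for illustration and do not come from the data.

```r
# Coefficients copied from summary(CP) above
intercept <- 982.99652
slope <- 1.11968

# Hypothetical helper: predicted fair market rent under complete pooling
predict_rent <- function(units) intercept + slope * units

# Illustrative building sizes (assumed, not taken from the data)
pred <- predict_rent(c(10, 50))
print(round(pred, 2))
```

The two predictions differ by less than 45 dollars, which is small next to the residual standard error of 384.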
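# No pooling: fit a separate regression for each state
dcoef <- Housing %>%
  group_by(State) %>%
  do(mod = lm(FairMarketRent_2BR ~ TwoBedroomUnits, data = .))
# Extract the intercept (first coefficient) of each state's model
coef <- dcoef %>%
  do(data.frame(Rent_Across_50_States = coef(.$mod)[1]))
ggplot(coef, aes(x = Rent_Across_50_States)) +
  geom_histogram()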
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
This plot shows the distribution of the intercepts from the separate regressions of rent on the number of two-bedroom units, one fit per state. Each intercept is the predicted fair market rent in that state when TwoBedroomUnits equals zero, and the vertical axis counts states, not buildings or apartments. Based on this plot, the intercept in most states falls below about 1,200 dollars. The intercepts alone do not show how rent changes with the number of two-bedroom units in a building; that relationship is captured by the slopes.
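# No pooling: fit a separate regression for each state
dcoef <- Housing %>%
  group_by(State) %>%
  do(mod = lm(FairMarketRent_2BR ~ TwoBedroomUnits, data = .))
# Extract the slope (second coefficient) of each state's model
coef <- dcoef %>%
  do(data.frame(Slope = coef(.$mod)[2]))
ggplot(coef, aes(x = Slope)) +
  geom_histogram()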
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The graph above displays the slopes of the per-state models of rent on the number of two-bedroom units. Here we see variation among the states. At the far left of the graph, the slopes are negative: rent falls as the number of two-bedroom units rises, so buildings with fewer two-bedroom apartments have higher rent. At the far right of the graph, the slopes are positive and the effect is reversed: buildings with fewer two-bedroom apartments have lower rent, which benefits the tenant. In other words, the direction of the relationship depends on the state.
Toward the middle of the graph, where most states fall, buildings with more two-bedroom apartments do not show consistently lower rent, so having more two-bedroom apartments in an affordable housing complex does not guarantee cheaper rent. These differences between states could not have been detected with the complete pooling model, because it estimates a single intercept and a single slope for the whole country. Its estimates of 982.99 and 1.12 are byproducts of countervailing forces: state-level relationships with opposite signs partly cancel when the data are pooled.
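The cancelling described here can be reproduced on a small simulated example. The sketch below uses base R only and invented data: two hypothetical states, "A" and "B", whose true slopes are set to +2 and -2. None of these values come from the Housing data; they are chosen only to show how per-state slopes with opposite signs can produce a pooled slope near zero.

```r
# Simulated data (hypothetical): two states with opposite-signed slopes
set.seed(42)
units <- rep(1:40, times = 2)
state <- rep(c("A", "B"), each = 40)
true_slope <- ifelse(state == "A", 2, -2)
true_base <- ifelse(state == "A", 900, 1100)
rent <- true_base + true_slope * units + rnorm(80, sd = 5)
toy <- data.frame(State = state,
                  TwoBedroomUnits = units,
                  FairMarketRent_2BR = rent)

# No pooling: one regression per state (columns = states, rows = coefficients)
per_state <- sapply(split(toy, toy$State), function(d) {
  coef(lm(FairMarketRent_2BR ~ TwoBedroomUnits, data = d))
})

# Complete pooling: one regression over all rows, ignoring State
pooled <- coef(lm(FairMarketRent_2BR ~ TwoBedroomUnits, data = toy))

print(round(per_state, 2))
print(round(pooled, 2))
```

In this toy setting the per-state slopes recover the opposite signs, while the pooled slope sits close to zero and describes neither state well.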
ML1 <- lme(FairMarketRent_2BR ~ TwoBedroomUnits, data = Housing, random = ~ 1|State, method = "ML")
summary(ML1)
## Linear mixed-effects model fit by maximum likelihood
## Data: Housing
## AIC BIC logLik
## 761905.7 761941.3 -380948.8
##
## Random effects:
## Formula: ~1 | State
## (Intercept) Residual
## StdDev: 269.4156 257.3055
##
## Fixed effects: FairMarketRent_2BR ~ TwoBedroomUnits
## Value Std.Error DF t-value p-value
## (Intercept) 967.7319 37.76286 54585 25.62655 0
## TwoBedroomUnits 0.7191 0.03148 54585 22.84579 0
## Correlation:
## (Intr)
## TwoBedroomUnits -0.018
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.65663186 -0.44473270 -0.08576724 0.48536059 5.89823067
##
## Number of Observations: 54637
## Number of Groups: 51
The model above is a partial pooling, or random-effects, model. It combines the strengths of the no-pooling and complete-pooling models: it captures differences between groups while remaining parsimonious. Note that this model uses a random intercept, which lets the baseline rent vary by state, but it holds the effect of the number of two-bedroom units fixed across states. The standard deviation of the state intercepts is 269.4, compared with a residual standard deviation of 257.3, so roughly half of the remaining variation in rent (about 52%) lies between states rather than between buildings within a state.
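The call that produced the next block of output does not appear in the text. Judging from the `~TwoBedroomUnits | State` formula in that output, it was presumably the ML1 call with a random slope added. The sketch below fits both forms on simulated data so that it runs on its own. The data frame `sim`, its values, and the names `ML1_sim` and `ML2_sim` are hypothetical, and the presumed call on the real data appears only as a comment.

```r
library(nlme)

# Simulated stand-in for Housing (hypothetical values, not the NHPD data)
set.seed(1)
n_states <- 20
n_per <- 60
sim <- data.frame(
  State = factor(rep(seq_len(n_states), each = n_per)),
  TwoBedroomUnits = sample(1:60, n_states * n_per, replace = TRUE)
)
state_intercept <- rnorm(n_states, mean = 950, sd = 250)
state_slope <- rnorm(n_states, mean = 0.7, sd = 5)
idx <- as.integer(sim$State)
sim$FairMarketRent_2BR <- state_intercept[idx] +
  state_slope[idx] * sim$TwoBedroomUnits +
  rnorm(nrow(sim), sd = 100)

# Random intercept only, the same form as ML1
ML1_sim <- lme(FairMarketRent_2BR ~ TwoBedroomUnits, data = sim,
               random = ~ 1 | State, method = "ML")

# Random intercept and random slope by state
ML2_sim <- lme(FairMarketRent_2BR ~ TwoBedroomUnits, data = sim,
               random = ~ TwoBedroomUnits | State, method = "ML")

# Presumed call on the real data (an assumption, not shown in the source):
# ML2 <- lme(FairMarketRent_2BR ~ TwoBedroomUnits, data = Housing,
#            random = ~ TwoBedroomUnits | State, method = "ML")

# Estimated standard deviations of the random intercept and slope
print(VarCorr(ML2_sim))
```

The output that follows comes from the original fit on the Housing data, not from this simulation.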
## Linear mixed-effects model fit by maximum likelihood
## Data: Housing
## AIC BIC logLik
## 761622.3 761675.8 -380805.2
##
## Random effects:
## Formula: ~TwoBedroomUnits | State
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 273.8583212 (Intr)
## TwoBedroomUnits 0.5539248 -0.378
## Residual 256.4615706
##
## Fixed effects: FairMarketRent_2BR ~ TwoBedroomUnits
## Value Std.Error DF t-value p-value
## (Intercept) 967.9227 38.39052 54585 25.212543 0
## TwoBedroomUnits 0.7221 0.09345 54585 7.726524 0
## Correlation:
## (Intr)
## TwoBedroomUnits -0.327
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.90780260 -0.44436939 -0.08074969 0.47011438 6.16372745
##
## Number of Observations: 54637
## Number of Groups: 51
This model lets both the intercept and the effect of the number of two-bedroom units vary by state. Compared with the random-intercept model, the standard deviation of the state intercepts rises slightly, from 269.4 to 273.9, while the residual standard deviation falls from 257.3 to 256.5. The state slopes have a standard deviation of 0.55, which is large relative to the average slope, so the relationship between the number of two-bedroom units and rent differs considerably from state to state. As for interpretation, the fixed effects give the average relationship: the intercept is about 968 dollars, and each additional two-bedroom unit is associated with an increase of about 0.72 dollars in fair market rent. The values 273.9 and 0.55 listed under the random effects are standard deviations that describe how far individual states depart from those averages; they are not the intercept and slope themselves. If the state slopes are roughly normally distributed, about 95% of them fall between -0.36 and 1.81, so the slope is negative in some states. Note how these findings differ from the complete pooling model, which does not account for the state in which each building is located.
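One way to check whether the random slope is worth keeping is to compare the two mixed models through their reported log-likelihoods and AIC values. The sketch below copies those numbers from the two summaries above. The chi-square reference distribution with 2 degrees of freedom is an approximation I am assuming here: a variance parameter lies on the boundary of its range under the null hypothesis, so the test tends to be conservative.

```r
# Values copied from the two lme summaries above
loglik_intercept <- -380948.8  # random intercept only (ML1)
loglik_slope <- -380805.2      # random intercept and random slope
aic_intercept <- 761905.7
aic_slope <- 761622.3

# Likelihood ratio statistic; the larger model adds 2 parameters
# (the slope variance and the intercept-slope correlation)
lr_stat <- 2 * (loglik_slope - loglik_intercept)
p_value <- pchisq(lr_stat, df = 2, lower.tail = FALSE)

# A positive drop means the random-slope model has the lower AIC
aic_drop <- aic_intercept - aic_slope

print(c(lr_stat = lr_stat, aic_drop = aic_drop))
print(p_value)
```

Both measures favor the model with state-specific slopes, which is consistent with the variation seen in the per-state histograms.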
This analysis aimed to investigate how the number of two-bedroom units in a building relates to the rent a tenant pays in buildings classified as “affordable housing.” Based on this analysis, I learned that what counts as affordable varies from state to state. People in different states pay different amounts of rent, even when the building meets the affordable housing guidelines set by the government.
The conclusions of this analysis suggest that rent varies widely and that many factors affect how much a tenant pays. Lastly, this study does not entirely agree with my initial hypothesis. On average, the estimated slopes are positive, so buildings deemed “affordable housing” with fewer two-bedroom units have slightly lower, not higher, rent, and this pattern does not hold in every state. In some parts of the United States the opposite is observed, because the relationship varies considerably from state to state. This study also has limitations that must be mentioned. For instance, the dataset reports rent only for two-bedroom apartments, and the analysis uses only three variables. Future studies should incorporate more variables and include rent data for other types of apartments.