library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(readxl)
library(pastecs)
##
## Attaching package: 'pastecs'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## 
## The following object is masked from 'package:tidyr':
## 
##     extract
HousingData <- read_csv("Table9.csv")
## Rows: 3221 Columns: 152
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (6): source, sumlevel, geoid, name, st, cnty
## dbl (146): T9_est1, T9_est2, T9_est3, T9_est4, T9_est5, T9_est6, T9_est7, T9...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Express selected Table 9 counts as percentages of T9_est1
CHAS_renters <- HousingData %>%
  mutate(
    severe_prob_rate = (T9_est5 / T9_est1) * 100,
    overcrowd_rate = (T9_est10 / T9_est1) * 100,
    incomplete_rate = ((T9_est12 + T9_est13) / T9_est1) * 100
  )

# Regress the severe-problem rate on the overcrowding and incomplete rates
model <- lm(severe_prob_rate ~ overcrowd_rate + incomplete_rate, data = CHAS_renters)
summary(model)
##
## Call:
## lm(formula = severe_prob_rate ~ overcrowd_rate + incomplete_rate, 
##     data = CHAS_renters)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.0394 -1.2284 -0.0352  1.3681 10.8438 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      6.03935    0.04795 125.947   <2e-16 ***
## overcrowd_rate  -0.63794    0.03476 -18.352   <2e-16 ***
## incomplete_rate -0.05017    0.02477  -2.025   0.0429 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.302 on 3218 degrees of freedom
## Multiple R-squared:  0.09888,    Adjusted R-squared:  0.09832 
## F-statistic: 176.6 on 2 and 3218 DF,  p-value: < 2.2e-16

The R² value is low (0.099), meaning that these two variables explain only about 10 percent of the variation in severe housing problem rates across counties. The overcrowding rate is highly significant (p < 2e-16), while the incomplete rate is only marginally significant (p = 0.043).
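As a sanity check on what R² measures, the sketch below recomputes it by hand as 1 - SS_res/SS_tot, along with adjusted R², and compares both with the values `summary()` reports. It runs on hypothetical simulated data (the `toy` data frame and `toy_model` are illustrative stand-ins, not the CHAS values) so it works without Table9.csv.

```r
# Hypothetical simulated data standing in for CHAS_renters (not real CHAS values)
set.seed(42)
n <- 200
toy <- data.frame(
  overcrowd_rate  = runif(n, min = 0, max = 10),
  incomplete_rate = runif(n, min = 0, max = 5)
)
toy$severe_prob_rate <- 6 - 0.6 * toy$overcrowd_rate + rnorm(n, sd = 2)

toy_model <- lm(severe_prob_rate ~ overcrowd_rate + incomplete_rate, data = toy)

# R^2 = 1 - (residual sum of squares / total sum of squares)
y <- toy$severe_prob_rate
ss_res <- sum(residuals(toy_model)^2)
ss_tot <- sum((y - mean(y))^2)
r2_manual <- 1 - ss_res / ss_tot

# Adjusted R^2 penalizes each extra predictor (p = 2 here)
p <- 2
adj_r2_manual <- 1 - (1 - r2_manual) * (n - 1) / (n - p - 1)

# Each pair should agree up to floating-point error
print(c(manual = r2_manual, lm = summary(toy_model)$r.squared))
print(c(manual = adj_r2_manual, lm = summary(toy_model)$adj.r.squared))
```

Applied to the real `model` and `CHAS_renters`, the same calculation should reproduce the Multiple R-squared of 0.09888 in the summary output.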
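For overcrowding, the coefficient is -0.638, which implies that a one-percentage-point increase in the overcrowding rate is associated with a decrease of about 0.64 percentage points in the severe problem rate, holding the incomplete rate constant. Because this is an observational regression, the coefficient describes an association rather than a causal effect; its negative sign is interesting and unexpected.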
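To make this interpretation concrete, the sketch below plugs the estimates from the summary output into the fitted equation. Raising `overcrowd_rate` by one point while holding `incomplete_rate` fixed changes the prediction by exactly the overcrowding coefficient. The input values (2, 3, and 1 percent) are arbitrary illustrations, and `predict_rate` is a hypothetical helper. The block also computes a 95% confidence interval for the slope as estimate ± t(0.975, 3218) × SE.

```r
# Coefficients and standard error copied from the summary(model) output above
b <- c(intercept = 6.03935, overcrowd = -0.63794, incomplete = -0.05017)
se_overcrowd <- 0.03476
df_resid <- 3218

# Hypothetical helper: predicted severe-problem rate (%) from the fitted equation
predict_rate <- function(overcrowd, incomplete) {
  b[["intercept"]] + b[["overcrowd"]] * overcrowd + b[["incomplete"]] * incomplete
}

# Raise overcrowding from 2% to 3% while holding incomplete_rate at 1%
delta <- predict_rate(3, 1) - predict_rate(2, 1)
delta  # -0.63794

# 95% confidence interval for the overcrowding slope: estimate +/- t * SE
ci <- b[["overcrowd"]] + c(-1, 1) * qt(0.975, df_resid) * se_overcrowd
round(ci, 3)
```

With the fitted object in memory, `confint(model)` returns the same interval directly.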
# Residuals-vs-fitted diagnostic plot (the first of plot.lm's diagnostic plots)
plot(model, which = 1)
The residuals-versus-fitted plot suggests the relationship is roughly linear, with only a mild curve, but the homoscedasticity assumption appears to be violated: the residuals fan out in a funnel shape, with their spread increasing toward the right (larger fitted values).