The Housing Affordability Data System (HADS) is a set of files derived from the 1985 and later national American Housing Survey (AHS) and the 2002 and later Metro AHS. This system categorizes housing units by affordability and households by income, with respect to the Adjusted Median Income, Fair Market Rent (FMR), and poverty income. It also includes housing cost burden for owner and renter households.
# load data
house <- read.csv("https://raw.githubusercontent.com/maharjansudhan/DATA606/master/housing_affordability.csv", header=TRUE, sep=",")
summary(house)
## ï..AGE1 METRO3 REGION LMED
## Min. :13.00 Min. :1.000 Min. :1.000 Min. : 38500
## 1st Qu.:38.00 1st Qu.:1.000 1st Qu.:1.000 1st Qu.: 60300
## Median :52.00 Median :2.000 Median :2.000 Median : 64600
## Mean :52.18 Mean :2.227 Mean :2.394 Mean : 68110
## 3rd Qu.:65.00 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.: 74008
## Max. :93.00 Max. :5.000 Max. :4.000 Max. :115300
## NA's :4438
## IPOV BEDRMS BUILT TYPE
## Min. :11057 Min. :0.00 Min. :1919 Min. :1.000
## 1st Qu.:12036 1st Qu.:2.00 1st Qu.:1950 1st Qu.:1.000
## Median :15470 Median :3.00 Median :1970 Median :1.000
## Mean :17168 Mean :2.66 Mean :1966 Mean :1.065
## 3rd Qu.:18639 3rd Qu.:3.00 3rd Qu.:1985 3rd Qu.:1.000
## Max. :51635 Max. :7.00 Max. :2013 Max. :9.000
## NA's :4438
## VALUE ROOMS ZINC2 ZSMHC
## Min. : 1 Min. : 1.000 Min. : -117 Min. : 0
## 1st Qu.: 100000 1st Qu.: 4.000 1st Qu.: 19974 1st Qu.: 510
## Median : 180000 Median : 5.000 Median : 44973 Median : 899
## Mean : 246763 Mean : 5.631 Mean : 65887 Mean : 1140
## 3rd Qu.: 300000 3rd Qu.: 7.000 3rd Qu.: 85600 3rd Qu.: 1454
## Max. :2520000 Max. :15.000 Max. :1061921 Max. :10667
## NA's :27389 NA's :4438 NA's :4438
## TOTSAL FMTMETRO3 FMTBUILT
## Min. : 0 Central City:21493 Pre 1940 :10058
## 1st Qu.: 0 Nonmetro :11255 1940-1959 :11078
## Median : 28000 Suburb :31787 1960-1979 :19685
## Mean : 48228 1980-1989 : 8234
## 3rd Qu.: 70000 1990-1999 : 7533
## Max. :698886 2000-2009 : 7176
## NA's :4438 After 2010: 771
## FMTSTRUCTURETYPE FMTBEDRMS FMTOWNRENT
## .: 2 0 Studio: 622 1 Owner :37146
## 1 Single Family:41271 1 1BR : 9821 2 Renter:27389
## 2 2-4 units : 6257 2 2BR :16401
## 3 5-19 units : 7273 3 3BR :24850
## 4 20-49 units : 2719 4 4BR+ :12841
## 5 50+ units : 4570
## 6 Mobile Home : 2443
## FMTREGION FMTSTATUS
## Midwest :17400 Occupied:60097
## Northeast:16519 Vacant : 4438
## South :19260
## West :11356
##
##
##
names(house)
## [1] "ï..AGE1" "METRO3" "REGION"
## [4] "LMED" "IPOV" "BEDRMS"
## [7] "BUILT" "TYPE" "VALUE"
## [10] "ROOMS" "ZINC2" "ZSMHC"
## [13] "TOTSAL" "FMTMETRO3" "FMTBUILT"
## [16] "FMTSTRUCTURETYPE" "FMTBEDRMS" "FMTOWNRENT"
## [19] "FMTREGION" "FMTSTATUS"
# the first column name carries a byte-order-mark artifact; rename it by position
names(house)[1] <- "AGE"
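The garbled first column name is typical of a CSV saved with a UTF-8 byte order mark (BOM). As a minimal sketch, assuming that is the cause here, the BOM can be stripped at read time with `fileEncoding = "UTF-8-BOM"`. The temporary file and its two columns are hypothetical stand-ins, not the project file.

```r
# Build a tiny CSV that starts with a UTF-8 BOM (bytes EF BB BF).
# The contents are illustrative only and are not taken from the HADS file.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("AGE1,METRO3\n52,2\n38,1\n"), con)
close(con)

# Reading with fileEncoding = "UTF-8-BOM" drops the BOM, so the first
# column keeps its intended name.
demo <- read.csv(tmp, header = TRUE, fileEncoding = "UTF-8-BOM")
print(names(demo))

unlink(tmp)
```

If the same argument were passed to the `read.csv()` call that loads the project file, the later rename would likely only need to change `AGE1` to `AGE`.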
The original file is very large, so I used IBM SPSS to convert the XPT file to CSV and then used Excel to keep only the columns needed for this project. The converted file is uploaded to GitHub.
Is it because of the unavailability of insurance that people are dying from health risk behaviors and chronic diseases?
Each case represents one housing unit. The file has 64,535 cases: 60,097 occupied units and 4,438 vacant units.
The purpose of these datasets is to provide housing analysts with consistent measures of affordability and burdens over a long period. The datasets are based on the American Housing Survey (AHS) national files from 1985 through 2009 and the metropolitan files from 2002 through 2009.
The data come from a government survey that tracks the factors behind rising and falling housing and rental costs.
I downloaded the data from the HUD User website.
https://www.huduser.gov/portal/datasets/hads/hads.html
http://www.census.gov/hhes/poverty/threshld.html
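Whether the household owns or rents the house or apartment (FMTOWNRENT, categorical).
Age of the householder (AGE1), number of people in the home, household income (ZINC2), and monthly housing cost (ZSMHC), all numerical.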
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plots, boxplots, etc.). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
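One way to summarize the numeric variables by the response variable is a grouped median. The sketch below is a rough illustration, not the analysis itself: `toy` is a hypothetical stand-in for `house` with invented values, and reading `ZINC2` as household income and `ZSMHC` as monthly housing cost is an assumption based on the column names.

```r
# Toy stand-in for the `house` data frame; the values are invented for
# illustration and are NOT taken from the HADS file.
toy <- data.frame(
  FMTOWNRENT = c("1 Owner", "1 Owner", "2 Renter", "2 Renter", "2 Renter"),
  ZINC2      = c(85000, 62000, 30000, NA, 41000),  # assumed: household income
  ZSMHC      = c(1500, 1100, 900, NA, 1000)        # assumed: monthly housing cost
)

# Median of each numeric variable within each tenure group.
# The formula interface drops rows with missing values (na.action = na.omit).
by_tenure <- aggregate(cbind(ZINC2, ZSMHC) ~ FMTOWNRENT,
                       data = toy, FUN = median)
print(by_tenure)
```

Replacing `toy` with `house` should give the same kind of table for the real data, provided the column names match.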
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.1
ggplot(house, aes(x=AGE)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 4438 rows containing non-finite values (stat_bin).
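Both messages above can be avoided by dropping missing ages before plotting and choosing the bin width explicitly. The following is a minimal sketch under stated assumptions: `age` is a hypothetical vector with invented values standing in for `house$AGE`, and the 10-year bin width is an arbitrary choice.

```r
# Toy ages standing in for house$AGE; the values are invented and the
# NA entries mimic units with no recorded age.
age <- c(23, 35, 38, 41, 52, 52, 60, 65, 71, NA, NA)

# Drop missing ages explicitly so the plotting layer has nothing to discard.
age_known <- age[!is.na(age)]

# Count observations in fixed 10-year bins: [20,30), [30,40), ...
breaks <- seq(20, 80, by = 10)
counts <- table(cut(age_known, breaks = breaks, right = FALSE))
print(counts)

# The same binning with ggplot2, drawn only if the package is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(AGE = age_known), aes(x = AGE)) +
    geom_histogram(binwidth = 10, boundary = 20, closed = "left")
  print(p)
}
```

Setting `binwidth` silences the `stat_bin()` message, and filtering the missing values beforehand makes the number of excluded rows an explicit decision rather than a warning.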
All the data and information are collected and referenced from