We know that education leads to better housing. The question for Habitat is whether housing leads to better education.
It is established that children of owners do better in school; what is debated is why. Is it that the type of parent who saves for years to buy a house is also the kind of parent who is thinking about college on their child’s first day of kindergarten? Rents approximate mortgage payments, so is it only a matter of coming up with a down payment that determines a child’s outcome? Or is this too Calvinistic an interpretation of the facts? Could there be elements of Good Fortune that visited those who became homeowners that we are ignoring (say, an inheritance, or the option of living with parents to save for a down payment)? And but for the want of such Good Fortune, might any tenant make a marvelous homeowner?
The US Census can tell us many things, but it cannot help us with these questions. To answer them we need the longitudinal data of the PSID (Panel Study of Income Dynamics). We would need to look at children who graduated from high school and see whether their parents owned or rented. Relatedly, it would be interesting to see whether the length of tenancy has an effect on high school graduation rates. Does it matter whether the parents own the home, or only that the family stays in the same home for the first 18 years of the child’s life? What about owners who move more often than tenants: what effect does this have on a child’s chances of graduating from high school? And, if there are enough of them, how have children adopted into owners’ families fared? If they succeed as much as their adoptive siblings, that points to Nurture; if they do not, could it be Nature?
Hopefully it is not Alchemy that Habitat is after, and merely raising a child in an owned home does make a difference: Nurture, not Nature. And hopefully the marshmallow study, in which children who were able to delay gratification went on to have more successful lives, has no bearing on Habitat families, and if just given the chance to succeed, they will.
And while I would like to include here an investigation done with PSID data, this quote from an R-bloggers post tells me I should not wait to finish it before contacting you:
"make no mistake, this is a terribly complicated data set both for michigan to construct and for you to analyze." (https://www.r-bloggers.com/analyze-the-panel-study-of-income-dynamics-psid-with-r/)
On top of this comes a warning I have never seen R generate before:
this can take several hours or days to download do you want to proceed? (It has since been 5 hours.)
In the meantime, below is some R work done with Census data.
The US Census Bureau ACS table B25013, Tenure by Educational Attainment of Householder (https://api.census.gov/data/2017/acs/acs5/groups/B25013.html), relates housing to education.
These are the particular variables that we will look at.
The first two relate to the number of homes and whether they are owner occupied:
B25013_001E: Estimated total of homes
B25013_002E: Estimated total that are owner occupied
The next four relate to the level of education of the owner (the householder, in Census terms):
B25013_003E: No high school diploma
B25013_004E: High school graduate
B25013_005E: Some college or an associate's degree
B25013_006E: At least a bachelor's degree
library(tidyverse)
── Attaching packages ───────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.1.1     ✔ purrr   0.3.2
✔ tibble  2.1.1     ✔ dplyr   0.8.1
✔ tidyr   0.8.3     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0
── Conflicts ──────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(tidycensus)
library(ggthemes)
Attaching package: ‘ggthemes’
The following object is masked _by_ ‘.GlobalEnv’:
theme_map
Next we insert our Census API key (issued April 2019 and masked here: 6eff16100e6184b9bf0604e510aebaxxxxxxxxxx).
key <- "YOUR API KEY GOES HERE"
census_api_key(key, overwrite = FALSE, install = FALSE)
school_benefits <- get_acs(geography = "county",
                           variables = c(EstTotal = "B25013_001E",
                                         EstTotalOwnerOcc = "B25013_002E",
                                         SchoolLeaver = "B25013_003E",
                                         HSGrad = "B25013_004E",
                                         SomeColl = "B25013_005E",
                                         CollGrad = "B25013_006E"),
                           state = "NY",
                           year = 2017,  # pin the 2013-2017 5-year ACS used in this post
                           output = "wide")
Getting data from the 2013-2017 5-year ACS
Using FIPS code '36' for state 'NY'
head(school_benefits,3)
dim(school_benefits) #[1] 62 14
[1] 62 14
Fourteen columns is unwieldy. Let us keep only the essentials and drop the rest.
school_benefits<- select(school_benefits, -c(B25013_001M ,B25013_002M ,B25013_003M ,
B25013_004M , B25013_005M,B25013_006M ))
head(school_benefits,3)
# str(school_benefits)  # an easy way to list the column names for copy and paste
Nassau<- filter(school_benefits,NAME == "Nassau County, New York")
head(Nassau)
Nassau_2<- Nassau%>% gather( "EducationLevel","NumberWhoOwn", SchoolLeaver : CollGrad )
head(Nassau_2,2)
str(Nassau_2)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4 obs. of 6 variables:
$ GEOID : chr "36059" "36059" "36059" "36059"
$ NAME : chr "Nassau County, New York" "Nassau County, New York" "Nassau County, New York" "Nassau County, New York"
$ EstTotal : num 444136 444136 444136 444136
$ EstTotalOwnerOcc: num 357982 357982 357982 357982
$ EducationLevel : chr "SchoolLeaver" "HSGrad" "SomeColl" "CollGrad"
$ NumberWhoOwn : num 18606 70696 85702 182978
head(Nassau_2,2)
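For readers on tidyr 1.0.0 or later, where gather() is superseded, the same wide-to-long reshape can be sketched with pivot_longer(). This is a minimal sketch rather than the code used above: the one-row data frame nassau_wide is a hypothetical stand-in, typed in from the Nassau County figures in the str() output, so that it runs without a Census API key.

```r
library(tidyr)

# Hypothetical stand-in for the one-row Nassau table, copied by hand from
# the str() output above so the sketch does not need to call the Census API.
nassau_wide <- data.frame(
  GEOID            = "36059",
  NAME             = "Nassau County, New York",
  EstTotal         = 444136,
  EstTotalOwnerOcc = 357982,
  SchoolLeaver     = 18606,
  HSGrad           = 70696,
  SomeColl         = 85702,
  CollGrad         = 182978,
  stringsAsFactors = FALSE
)

# Stack the four education columns into name/value pairs, one row per bracket.
nassau_long <- pivot_longer(
  nassau_wide,
  cols      = SchoolLeaver:CollGrad,
  names_to  = "EducationLevel",
  values_to = "NumberWhoOwn"
)

nassau_long
```

The result should have the same shape as Nassau_2: four rows, with the identifying columns repeated on each.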
Nassau_select<- Nassau_2%>%
select(EducationLevel,NumberWhoOwn)
head(Nassau_select)
names(Nassau_select )[names(Nassau_select) == "NumberWhoOwn"] <- "Number"
str(Nassau_select)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4 obs. of 2 variables:
$ EducationLevel: chr "SchoolLeaver" "HSGrad" "SomeColl" "CollGrad"
$ Number : num 18606 70696 85702 182978
Nassau_select$EducationLevel <- as.factor(Nassau_select$EducationLevel)
str(Nassau_select)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4 obs. of 2 variables:
$ EducationLevel: Factor w/ 4 levels "CollGrad","HSGrad",..: 3 2 4 1
$ Number : num 18606 70696 85702 182978
Nassau_select
# geom_col() uses the precomputed counts directly as bar heights
ggplot(Nassau_select, aes(x = EducationLevel, y = Number)) + geom_col()
ggplot(Nassau_select, aes(x = EducationLevel, y = Number)) + geom_col() +
  theme_economist(base_family = "Verdana") +
  scale_colour_economist()
Let's rearrange those bars.
Nassau_select$EducationLevel <-factor(Nassau_select$EducationLevel,
levels = c("SchoolLeaver","HSGrad","SomeColl", "CollGrad"))
ggplot(Nassau_select, aes(x = EducationLevel, y = Number)) + geom_col() +
  theme_economist(base_family = "Verdana") +
  scale_colour_economist()
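The bars first came out in alphabetical order because as.factor() sorts its levels, and ggplot2 lays out a discrete axis in level order. Here is a small base-R sketch of the difference, using only the four labels from above (the vector name education is introduced here for illustration). Setting levels = unique(x) is an alternative to typing the levels by hand; it works in this case only because the rows already appear in the intended order.

```r
education <- c("SchoolLeaver", "HSGrad", "SomeColl", "CollGrad")

# Default behaviour: levels are sorted alphabetically.
alphabetical <- as.factor(education)
levels(alphabetical)
# "CollGrad" "HSGrad" "SchoolLeaver" "SomeColl"

# Levels taken in order of first appearance in the data instead.
# (forcats::fct_inorder() gives the same ordering.)
by_appearance <- factor(education, levels = unique(education))
levels(by_appearance)
# "SchoolLeaver" "HSGrad" "SomeColl" "CollGrad"
```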
ggplot(Nassau_select, aes(x = EducationLevel, y = Number)) + geom_col() +
  theme_economist(base_family = "Verdana") +
  scale_colour_economist() +
  labs(y = "") +
  ggtitle("Homes Owned Per Education Bracket")
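The chart shows counts, but shares may be easier to compare from one county to the next. As a rough sketch in base R, using only the Nassau County figures printed earlier (the names nassau, Share and owner_rate are introduced here for illustration), each bracket can be expressed as a fraction of all owner-occupied homes:

```r
# Nassau County figures copied from the str() output earlier in this post.
nassau <- data.frame(
  EducationLevel = c("SchoolLeaver", "HSGrad", "SomeColl", "CollGrad"),
  Number         = c(18606, 70696, 85702, 182978)
)
owner_occupied <- 357982  # EstTotalOwnerOcc
total_homes    <- 444136  # EstTotal

# Sanity check: the four brackets account for every owner-occupied home.
stopifnot(sum(nassau$Number) == owner_occupied)

# Share of owner-occupied homes held by each education bracket.
nassau$Share <- round(nassau$Number / owner_occupied, 3)
nassau

# Overall share of homes that are owner occupied.
owner_rate <- owner_occupied / total_homes
owner_rate
```

On these figures, college graduates account for just over half of Nassau's owner-occupied homes. Note that this describes the education of the owners, not of their children, so it speaks only to the first half of the opening question.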