First, let’s load a few libraries we will need.
library(plyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(haven)
library(ggplot2)
Now we load our IPUMS data from my professor’s GitHub account.
ipums<-read_dta("https://github.com/coreysparks/data/blob/master/usa_00045.dta?raw=true")
Here, we’d like to create a histogram of family size – but only for “household heads” (defined in the data as “ipums$relate==1”).
ipums %>%
filter(relate==1) %>%
ggplot() +
geom_histogram(aes(famsize))
## Don't know how to automatically pick scale for object of type labelled. Defaulting to continuous.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
This looks pretty ugly. Let’s improve it by setting the bin width to 1, the increment in which “famsize” increases, so that each bar corresponds to exactly one family size:
ipums %>%
filter(relate==1) %>%
ggplot() +
geom_histogram(aes(famsize),binwidth = 1) +
xlab(label = "Family Size for Household Heads")
## Don't know how to automatically pick scale for object of type labelled. Defaulting to continuous.
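The “Don’t know how to automatically pick scale” message shows up because read_dta() imports “famsize” as a haven labelled vector rather than a plain number. Here is a minimal sketch of stripping the labels, using a small made-up vector (famsize_labelled and its label text are my own stand-ins, not values from the real file):

```r
library(haven)

# Hypothetical stand-in for ipums$famsize: a labelled vector like those read_dta() returns
famsize_labelled <- labelled(
  c(1, 2, 2, 3, 5),
  labels = c("1 family member present" = 1, "2 family members present" = 2)
)

# zap_labels() drops the value labels and returns the underlying numeric vector
famsize_plain <- zap_labels(famsize_labelled)

class(famsize_labelled)
class(famsize_plain)
```

In the pipeline above, adding something like mutate(famsize=zap_labels(famsize)) before ggplot() should have the same effect, though I have not changed the original plots here.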
Here, I have been asked to list summary stats for Foreign- and US-Born heads-of-household: the mean family size for each group, along with its standard deviation. I’ve accomplished this by creating a new variable (“birthplace”) that codes “bpl” values of 120 or below as US-born, grouping by that new variable, and then computing the mean and standard deviation of “famsize” within each group.
ipums %>%
filter(relate==1) %>%
mutate(birthplace=ifelse(bpl<=120,"US_BORN","FOREIGN_BORN")) %>%
group_by(birthplace) %>%
summarise(mean_family_size=mean(famsize),sd=sd(famsize))
## # A tibble: 2 x 3
## birthplace mean_family_size sd
## <chr> <dbl> <dbl>
## 1 FOREIGN_BORN 2.934445 1.683525
## 2 US_BORN 2.290301 1.332221
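Here we see that foreign-born heads-of-household tend to have somewhat bigger families than US-born heads-of-household (a mean of about 2.9 versus 2.3), and that their family sizes are also more spread out (a standard deviation of about 1.7 versus 1.3).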
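To make the recode-then-summarise step easier to check by hand, here is a small sketch on made-up data (the toy data frame and its values are hypothetical; only the column names and the 120 cutoff come from the code above). I also add n() so the group sizes are visible:

```r
library(dplyr)

# Hypothetical toy data: six household heads (relate == 1) and one non-head
toy <- data.frame(
  relate  = c(1, 1, 1, 1, 1, 1, 2),
  bpl     = c(48, 6, 120, 200, 450, 150, 48),
  famsize = c(2, 3, 1, 4, 5, 3, 2)
)

toy_summary <- toy %>%
  filter(relate == 1) %>%                       # keep household heads only
  mutate(birthplace = ifelse(bpl <= 120, "US_BORN", "FOREIGN_BORN")) %>%
  group_by(birthplace) %>%
  summarise(
    n = n(),                                    # number of heads in each group
    mean_family_size = mean(famsize),
    sd = sd(famsize)
  )

toy_summary
```

Note that a “bpl” of exactly 120 lands in the US-born group because the comparison is <=, not <.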
Here, I’ve been asked to create a histogram of the above two groups (Foreign Born vs. US Born). I’ve done this by running a histogram but using “facet_wrap” to separate by birthplace.
ipums %>%
filter(relate==1) %>%
mutate(birthplace=ifelse(bpl<=120,"US_BORN","FOREIGN_BORN")) %>%
ggplot() +
geom_histogram(aes(famsize),binwidth = 1) +
facet_wrap(~birthplace) +
xlab("Family Size") +
ggtitle(label="Foreign-Born vs. US-Born Heads-of-Households by Family Size")
## Don't know how to automatically pick scale for object of type labelled. Defaulting to continuous.
The Foreign-Born and US-Born panels look very different, but this is largely because there are far fewer Foreign-Born than US-Born heads-of-household in the sample. Both panels share one count axis, so the bars for the smaller group are squeezed toward the bottom.
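One way to compare the shapes of the two distributions regardless of group size is to plot within-group shares instead of raw counts. This is only a sketch on made-up data (toy, shares and p are hypothetical names, and this is not the plot I was asked to produce):

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: unequal group sizes, as in the real sample
toy <- data.frame(
  birthplace = c(rep("US_BORN", 8), rep("FOREIGN_BORN", 4)),
  famsize    = c(1, 1, 2, 2, 2, 3, 3, 4,   2, 3, 3, 4)
)

# Share of each family size within its own birthplace group
shares <- toy %>%
  count(birthplace, famsize) %>%
  group_by(birthplace) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

# Plot shares rather than counts so both panels sit on the same 0-1 scale
p <- ggplot(shares) +
  geom_col(aes(x = famsize, y = share), width = 1) +
  facet_wrap(~birthplace) +
  xlab("Family Size") +
  ylab("Share of Heads-of-Household")
```

Printing p draws the chart. Because the shares in each panel sum to 1, bar heights can be compared directly across the two groups.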
Here, I draw a box-and-whisker plot (boxplot) for the same data:
ipums %>%
filter(relate==1) %>%
mutate(birthplace=ifelse(bpl<=120,"US_BORN","FOREIGN_BORN")) %>%
ggplot()+
geom_boxplot(aes(x=birthplace,y=famsize))+
ggtitle(label="Family Size by Head-of-Household Place of Birth (Foreign or US-Born)")+
xlab("Place of Birth")+
ylab("Family Size")
## Don't know how to automatically pick scale for object of type labelled. Defaulting to continuous.
A boxplot marks the median and quartiles rather than the mean, but the picture agrees with the summary table: family size for foreign-born heads-of-household remains somewhat higher than for US-born heads-of-household.
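Since the centre line of a boxplot is the median, it can help to put medians and means side by side. A small sketch on made-up data (the toy values are hypothetical and chosen so the two statistics disagree):

```r
library(dplyr)

# Hypothetical toy data: one large family pulls the foreign-born mean upward
toy <- data.frame(
  birthplace = c(rep("US_BORN", 5), rep("FOREIGN_BORN", 5)),
  famsize    = c(1, 1, 2, 2, 4,   1, 2, 2, 3, 7)
)

box_stats <- toy %>%
  group_by(birthplace) %>%
  summarise(
    median_family_size = median(famsize),  # the line drawn inside the box
    mean_family_size   = mean(famsize)     # not drawn by geom_boxplot()
  )

box_stats
```

In this toy example both groups have a median of 2, so their boxplot centre lines would coincide even though the means are 3 and 2.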
Here, I’ve been asked to plot the average size of the family versus the age of the head-of-household.
ipums %>%
filter(relate==1) %>%
group_by(age) %>%
summarise(mean_family_size=mean(famsize,na.rm=TRUE)) %>%
ggplot()+
geom_point(aes(x=age,y=mean_family_size),size=.1)+
geom_smooth(aes(x=age,y=mean_family_size),method="loess")+
xlab("Age")+
ylab("Mean Family Size")+
ggtitle(label="Mean Family Size By Age of the Head-of-Household")
## Don't know how to automatically pick scale for object of type labelled. Defaulting to continuous.
From this plot, we can see that family size rises with the age of the head-of-household until the late 40s, when it begins to decline steadily. This is in line with the observations of family structure discussed in class: families tend to grow through the childbearing years and then shrink as the head-of-household grows older.
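One thing worth checking in a plot like this is the grouping. To get one mean per age, the data should be grouped by “age” alone. If “famsize” is also in group_by(), every group contains a single family size and the “mean” is just that value. A small sketch on made-up data (toy, by_age and by_both are hypothetical names):

```r
library(dplyr)

# Hypothetical toy data: three heads aged 30 and two aged 50
toy <- data.frame(
  age     = c(30, 30, 30, 50, 50),
  famsize = c(2, 4, 3, 1, 3)
)

# Grouping by age alone: one row per age, holding the mean family size at that age
by_age <- toy %>%
  group_by(age) %>%
  summarise(mean_family_size = mean(famsize, na.rm = TRUE))

# Grouping by famsize as well: one row per (famsize, age) pair,
# so the "mean" simply repeats famsize
by_both <- toy %>%
  group_by(famsize, age) %>%
  summarise(mean_family_size = mean(famsize, na.rm = TRUE), .groups = "drop")

by_age
by_both
```

Here by_age has two rows (means of 3 and 2), while by_both has five rows whose mean_family_size column is identical to famsize.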