Harold Nelson
11/10/2016
First, we load the babynames package so that knitr can use it, then take a look at the data.
library(babynames)
str(babynames)
## Classes 'tbl_df', 'tbl' and 'data.frame': 1825433 obs. of 5 variables:
## $ year: num 1880 1880 1880 1880 1880 1880 1880 1880 1880 1880 ...
## $ sex : chr "F" "F" "F" "F" ...
## $ name: chr "Mary" "Anna" "Emma" "Elizabeth" ...
## $ n : int 7065 2604 2003 1939 1746 1578 1472 1414 1320 1288 ...
## $ prop: num 0.0724 0.0267 0.0205 0.0199 0.0179 ...
summary(babynames)
##       year          sex                name                 n
##  Min.   :1880   Length:1825433     Length:1825433     Min.   :    5.0
##  1st Qu.:1949   Class :character   Class :character   1st Qu.:    7.0
##  Median :1982   Mode  :character   Mode  :character   Median :   12.0
##  Mean   :1973                                         Mean   :  184.7
##  3rd Qu.:2001                                         3rd Qu.:   32.0
##  Max.   :2014                                         Max.   :99680.0
##       prop
##  Min.   :2.260e-06
##  1st Qu.:3.910e-06
##  Median :7.390e-06
##  Mean   :1.407e-04
##  3rd Qu.:2.346e-05
##  Max.   :8.155e-02
Next, get the records where name is "Harold".
harolds <- babynames[babynames$name == "Harold", ]
str(harolds)
## Classes 'tbl_df', 'tbl' and 'data.frame': 229 obs. of 5 variables:
## $ year: num 1880 1881 1882 1883 1884 ...
## $ sex : chr "M" "M" "M" "M" ...
## $ name: chr "Harold" "Harold" "Harold" "Harold" ...
## $ n : int 113 120 127 108 191 201 224 279 298 340 ...
## $ prop: num 0.000954 0.001108 0.001041 0.00096 0.001556 ...
summary(harolds)
##       year          sex                name                 n
##  Min.   :1880   Length:229         Length:229         Min.   :    5
##  1st Qu.:1917   Class :character   Class :character   1st Qu.:   25
##  Median :1946   Mode  :character   Mode  :character   Median :  350
##  Mean   :1946                                         Mean   : 2408
##  3rd Qu.:1974                                         3rd Qu.: 2768
##  Max.   :2014                                         Max.   :14156
##       prop
##  Min.   :2.774e-06
##  1st Qu.:2.490e-05
##  Median :2.586e-04
##  Mean   :2.640e-03
##  3rd Qu.:4.336e-03
##  Max.   :1.243e-02
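The bracket filter above can also be written with base R's subset(). The sketch below compares the two; the names harolds_index and harolds_subset are made up for this illustration and are not part of the original analysis.

```r
library(babynames)

# Original approach: logical row indexing with brackets.
harolds_index <- babynames[babynames$name == "Harold", ]

# Same filter with base R's subset(); column names are looked up
# inside the data frame, so the babynames$ prefix is not needed.
harolds_subset <- subset(babynames, name == "Harold")

# Both approaches should select the same number of rows.
nrow(harolds_index) == nrow(harolds_subset)
```

subset() also drops rows where the condition evaluates to NA, which can matter in data with missing names. The row count you see may differ from the 229 shown above if your installed version of babynames covers more years.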
Plot the number and proportion for Harold over time.
plot(harolds$year,harolds$n,main = "Count")
plot(harolds$year,harolds$prop,main = "Proportion")
Why do we get two sets of points? The data has a separate row for each sex in each year, so some Harolds are recorded as girls!
table(harolds$sex)
##
##   F   M
##  94 135
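Note that table() counts rows (one per year and sex), not babies. A minimal sketch of how the two counts could be compared follows; rows_by_sex and babies_by_sex are names invented for this example.

```r
library(babynames)
harolds <- babynames[babynames$name == "Harold", ]

# table() counts rows: the number of years in which each sex appears.
rows_by_sex <- table(harolds$sex)

# Summing n within each sex gives the number of babies recorded.
babies_by_sex <- tapply(harolds$n, harolds$sex, sum)

rows_by_sex
babies_by_sex
```

The row counts make girl Harolds look fairly common, while the summed counts show how few babies are actually involved.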
girlHarolds <- harolds[harolds$sex == "F", ]
boxplot(girlHarolds$n,main = "Girls Named Harold",horizontal=TRUE)
plot(girlHarolds$year,girlHarolds$n, main = "Girls Named Harold")
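To see both groups in one picture, the points can be colored by sex. This is only a sketch: the colors, the log scale, and the name point_colors are choices made for this illustration.

```r
library(babynames)
harolds <- babynames[babynames$name == "Harold", ]

# One color per row, chosen by the recorded sex.
point_colors <- ifelse(harolds$sex == "F", "red", "blue")

# A log scale on the y axis keeps the small girl counts visible
# next to the much larger boy counts.
plot(harolds$year, harolds$n,
     col = point_colors, log = "y",
     main = "Harolds by Sex",
     xlab = "Year", ylab = "Count (log scale)")
legend("topright", legend = c("F", "M"),
       col = c("red", "blue"), pch = 1)
```

The log scale is safe here because the smallest count in the data is 5, so no point has a zero or negative value.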
How many of you know a girl named Harold (or a boy named Sue)?
You may want to read this blog post from the Census Bureau.
http://blogs.census.gov/2011/09/27/how-small-errors-can-have-a-big-impact-on-small-populations/
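One way to explore the blog post's point with this data is to compute, for each year, the share of Harolds recorded as girls. The sketch below is an exploration, not a test of any particular explanation; the names boys, girls, both, and share_f are invented for this example.

```r
library(babynames)

# Convert to a plain data frame so that merge() behaves as in base R.
harolds <- as.data.frame(babynames[babynames$name == "Harold", ])

boys  <- harolds[harolds$sex == "M", c("year", "n")]
girls <- harolds[harolds$sex == "F", c("year", "n")]

# Keep only the years in which both sexes appear; the two n columns
# become n_m and n_f.
both <- merge(boys, girls, by = "year", suffixes = c("_m", "_f"))
both$share_f <- both$n_f / (both$n_m + both$n_f)

plot(both$year, both$share_f,
     main = "Share of Harolds Recorded as Girls",
     xlab = "Year", ylab = "Share")
```

A very small share in every year would be consistent with a low rate of recording errors, though the plot alone cannot rule out other explanations.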