Exploration of virtue names

options(stringsAsFactors = FALSE)
library(dplyr)

## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)
library(ggplot2)

The names data comes from the IPUMS 1% samples of the 1850, 1860, and 1870 censuses. This is not as good as having the earlier censuses, but it at least tells us the birth years of the people represented.

names <- read.csv("usa_00005.csv")

What birth years are represented? Keep in mind that this is not really representative, since we haven’t taken the variable person weights into account. But it does show what we would expect: fewer people from earlier birth years, due to population growth and mortality. The spikes show us that the data entry was fudged in many places.

names %>%
  group_by(BIRTHYR) %>%
  summarize(count = n()) %>%
  ggplot(aes(x = BIRTHYR, y = count)) +
  geom_bar(stat = "identity")

[Plot: bar chart of the number of records by birth year (BIRTHYR)]
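
The caveat about person weights can be made concrete. The following is only a sketch: it assumes the extract includes the IPUMS person weight variable `PERWT` (not confirmed for `usa_00005.csv`), and it uses a small invented data frame, `toy`, in place of the real file. It is written in base R so that it runs without any packages.

```r
# Toy stand-in for the IPUMS extract; all values are invented.
# PERWT is assumed to be the person weight column.
toy <- data.frame(
  YEAR    = c(1850, 1850, 1860, 1860, 1870),
  BIRTHYR = c(1820, 1820, 1820, 1845, 1845),
  PERWT   = c(100, 100, 101, 101, 98)
)

# Unweighted: number of sampled records per census year and birth year
unweighted <- aggregate(PERWT ~ YEAR + BIRTHYR, data = toy, FUN = length)
names(unweighted)[3] <- "count"

# Weighted: sum of person weights, an estimate of the people represented
weighted <- aggregate(PERWT ~ YEAR + BIRTHYR, data = toy, FUN = sum)
names(weighted)[3] <- "weighted_count"

# Put the two side by side for comparison
merge(unweighted, weighted, by = c("YEAR", "BIRTHYR"))
```

Grouping by census year as well as birth year avoids adding together people from the same cohort who were counted in more than one census.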

For now we don’t want to mess with person weights, so let’s just look at the 1850 census.

c_1850 <- names %>%
  filter(YEAR == 1850)

We need to clean up the names to remove initials, etc. Here is a function to do that. (A more sophisticated function would take into account people with initials for first names.)

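clean_names <- function(name) {
  # stringr provides word(), which extracts the first word by default
  require(stringr)
  # Keep only the first word (dropping middle initials, etc.) and lowercase it
  return( name %>% word() %>% tolower() )
}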
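
As a rough illustration of what a more sophisticated version might look like, here is a sketch that skips over leading initials. The function name `clean_names_initials` is hypothetical, and the rule it applies (take the first word longer than one character, after stripping periods and commas) is an assumption about what "taking initials into account" should mean, not a tested cleaning method. It uses base R only.

```r
# Hypothetical variant of clean_names() that skips initials.
clean_names_initials <- function(name) {
  # Lowercase and strip periods and commas ("J. William" -> "j william")
  cleaned <- tolower(gsub("[.,]", "", name))
  # Split each name into words on whitespace
  parts <- strsplit(trimws(cleaned), "\\s+")
  vapply(parts, function(words) {
    # Keep the first word longer than one character; NA if there is none
    full <- words[nchar(words) > 1]
    if (length(full) > 0) full[1] else NA_character_
  }, character(1))
}

clean_names_initials(c("Mary A.", "J. William", "J.", "Prudence"))
```

With this rule "J. William" is counted as "william", while a name recorded only as "J." becomes `NA` rather than being counted as a name in its own right.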

Let’s find out the number of unique names in the 1850 census:

c_1850$name <- clean_names(c_1850$NAMEFRST)

## Loading required package: stringr

length(unique(c_1850$name))

## [1] 14056

Now let’s create statistics for each name by birth year:

n_1850 <- c_1850 %>% 
  group_by(name, BIRTHYR) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) 

n_1850 %>%
  head(n = 50)

## Source: local data frame [50 x 3]
## Groups: name
## 
##       name BIRTHYR count
## 1     mary    1848   452
## 2     mary    1850   436
## 3     mary    1846   434
## 4     mary    1847   424
## 5     mary    1844   413
## 6     mary    1849   406
## 7     mary    1845   403
## 8     mary    1843   378
## 9     mary    1842   369
## 10    mary    1830   359
## 11    mary    1840   355
## 12    mary    1838   346
## 13    mary    1832   344
## 14    mary    1836   344
## 15    john    1848   343
## 16    john    1846   342
## 17    mary    1841   326
## 18    john    1849   322
## 19    john    1842   321
## 20    john    1845   320
## 21    mary    1820   319
## 22    mary    1834   317
## 23    john    1820   314
## 24    john    1844   311
## 25    john    1847   309
## 26    mary    1833   307
## 27    john    1838   302
## 28    mary    1837   302
## 29    mary    1825   301
## 30    mary    1839   300
## 31    mary    1828   298
## 32    john    1840   295
## 33    john    1843   292
## 34    mary    1835   280
## 35    mary    1827   273
## 36    john    1850   271
## 37    john    1825   269
## 38    john    1841   269
## 39    john    1836   259
## 40    mary    1826   258
## 41 william    1842   250
## 42    john    1828   247
## 43    mary    1831   245
## 44    john    1832   241
## 45    john    1834   241
## 46   sarah    1848   240
## 47    john    1835   236
## 48 william    1845   236
## 49    mary    1822   235
## 50    john    1826   234

Now we can plot the occurrence of any given name by birth year. Keep in mind that this is the number of people alive in 1850, uncorrected for mortality. This is probably too far from your period to tell us anything useful.

n_1850 %>%
  filter(name == "mary") %>%
  ggplot(aes(x = BIRTHYR, y = count)) +
  geom_line()

[Plot: line chart of the count of people named "mary" by birth year (BIRTHYR)]
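
Because raw counts fall off for earlier birth years simply because fewer of those people were still alive in 1850, one possible adjustment is to express each name as a share of everyone recorded with the same birth year. This is only a sketch: the data frame `toy_counts` is invented and merely mimics the shape of `n_1850`, and the share corrects for mortality only on the assumption that survival did not differ by name.

```r
# Toy counts in the same shape as n_1850 (name, BIRTHYR, count); values invented
toy_counts <- data.frame(
  name    = c("mary", "john", "prudence", "mary", "john"),
  BIRTHYR = c(1800, 1800, 1800, 1840, 1840),
  count   = c(30, 20, 1, 300, 250)
)

# Total number of people recorded for each birth year, repeated on every row
cohort_total <- ave(toy_counts$count, toy_counts$BIRTHYR, FUN = sum)

# Share of each birth-year cohort bearing the name
toy_counts$share <- toy_counts$count / cohort_total

subset(toy_counts, name == "mary")
```

Plotting `share` instead of `count` against `BIRTHYR` would show whether a name became relatively more or less common among survivors, rather than just tracking the size of each cohort.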

Another approach is to make a list of the names we are interested in, and then get the counts for each combination of name and birth year.

interesting_names <- c(
  "patience",
  "virtue",
  "honor",
  "chastity",
  "prudence"
  )

n_1850 %>%
  filter(name %in% interesting_names)

## Source: local data frame [82 x 3]
## Groups: name
## 
##        name BIRTHYR count
## 1  patience    1831     3
## 2  prudence    1832     3
## 3  patience    1782     2
## 4  patience    1792     2
## 5  patience    1795     2
## 6  patience    1796     2
## 7  patience    1800     2
## 8  patience    1802     2
## 9  patience    1824     2
## 10 patience    1830     2
## 11 patience    1840     2
## 12 prudence    1786     2
## 13 prudence    1798     2
## 14 prudence    1810     2
## 15 prudence    1816     2
## 16 prudence    1821     2
## 17 prudence    1823     2
## 18 prudence    1836     2
## 19    honor    1815     1
## 20    honor    1820     1
## 21    honor    1830     1
## 22    honor    1831     1
## 23 patience    1767     1
## 24 patience    1772     1
## 25 patience    1780     1
## 26 patience    1784     1
## 27 patience    1785     1
## 28 patience    1786     1
## 29 patience    1791     1
## 30 patience    1797     1
## 31 patience    1798     1
## 32 patience    1801     1
## 33 patience    1803     1
## 34 patience    1805     1
## 35 patience    1809     1
## 36 patience    1811     1
## 37 patience    1815     1
## 38 patience    1819     1
## 39 patience    1820     1
## 40 patience    1822     1
## 41 patience    1825     1
## 42 patience    1827     1
## 43 patience    1828     1
## 44 patience    1834     1
## 45 patience    1835     1
## 46 patience    1836     1
## 47 patience    1838     1
## 48 patience    1839     1
## 49 patience    1841     1
## 50 patience    1843     1
## 51 patience    1845     1
## 52 prudence    1768     1
## 53 prudence    1780     1
## 54 prudence    1788     1
## 55 prudence    1792     1
## 56 prudence    1793     1
## 57 prudence    1794     1
## 58 prudence    1796     1
## 59 prudence    1797     1
## 60 prudence    1800     1
## 61 prudence    1802     1
## 62 prudence    1803     1
## 63 prudence    1806     1
## 64 prudence    1808     1
## 65 prudence    1809     1
## 66 prudence    1812     1
## 67 prudence    1814     1
## 68 prudence    1815     1
## 69 prudence    1819     1
## 70 prudence    1820     1
## 71 prudence    1824     1
## 72 prudence    1825     1
## 73 prudence    1828     1
## 74 prudence    1831     1
## 75 prudence    1833     1
## 76 prudence    1834     1
## 77 prudence    1837     1
## 78 prudence    1838     1
## 79 prudence    1841     1
## 80 prudence    1842     1
## 81 prudence    1847     1
## 82 prudence    1849     1
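
With counts this small, single birth years are hard to read. One option is to bin the birth years into decades and cross-tabulate, which also makes the empty combinations of name and decade explicit. The sketch below uses base R and a handful of rows copied from the output above; the data frame name `toy_virtue` and the choice of ten-year bins are assumptions made for illustration.

```r
# A few rows copied from the output above
toy_virtue <- data.frame(
  name    = c("patience", "prudence", "patience", "honor", "prudence"),
  BIRTHYR = c(1782, 1786, 1831, 1831, 1832),
  count   = c(2, 2, 3, 1, 3)
)

# Bin birth years into decades (1782 -> 1780, 1831 -> 1830)
toy_virtue$decade <- floor(toy_virtue$BIRTHYR / 10) * 10

# Sum the counts for each decade and name; combinations with no records are 0
by_decade <- xtabs(count ~ decade + name, data = toy_virtue)
by_decade
```

Applied to the full filtered result, the same two steps would give one row per decade and one column per name.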

The short answer is that the 1850 census is too far from the time period that you are interested in, but unfortunately it’s the first census for which IPUMS has the data available.

Lincoln Mullen

07/05/2014