Using the SGL Database in R - Part 2

J. Kavanagh

2023-03-25

Introduction

We will doing more specific analyses in this series of slides

Load the data file I created for this group

load('SGL_2.RData')

Age Breakdown

Now there is only consistent age data from the burials dataset, this is a regular feature of historical data. Typically grave registers are the most useful for providing age information for this type of historical data.

During the intial transcription process, the age ranges was a separate column that normalises the ages listed, it also helped normalise the partial ages such as 18 months etc, that were common for infants.

# First select the two columns we need for this analysis and create a new variable
burials %>% select(Gender, Age_Ranges) -> burials_gender_age_range

# This is now a separate dataset with the Age_Ranges and Gender columns from the Burials dataset
burials_gender_age_range
## # A tibble: 930 × 2
##    Gender Age_Ranges
##    <chr>  <chr>     
##  1 Male   1-5       
##  2 Female 1-5       
##  3 Male   1-5       
##  4 Male   1-5       
##  5 Male   1-5       
##  6 Male   1-5       
##  7 Female 1-5       
##  8 Male   1-5       
##  9 Female 1-5       
## 10 Male   1-5       
## # … with 920 more rows

Filtering the data

Now we’re going to use filter again to create two separate datasets for male and female

# This creates a dataset for male deaths with their age ranges
burials_gender_age_range %>% filter(Gender == "Male") -> burials_male_age_range

# This creates a dataset for female deaths with their age ranges
burials_gender_age_range %>% filter(Gender == "Female") -> burials_female_age_range

# Result
burials_male_age_range
## # A tibble: 460 × 2
##    Gender Age_Ranges
##    <chr>  <chr>     
##  1 Male   1-5       
##  2 Male   1-5       
##  3 Male   1-5       
##  4 Male   1-5       
##  5 Male   1-5       
##  6 Male   1-5       
##  7 Male   1-5       
##  8 Male   1-5       
##  9 Male   1-5       
## 10 Male   1-5       
## # … with 450 more rows
# Result
burials_female_age_range
## # A tibble: 470 × 2
##    Gender Age_Ranges
##    <chr>  <chr>     
##  1 Female 1-5       
##  2 Female 1-5       
##  3 Female 1-5       
##  4 Female 1-5       
##  5 Female 1-5       
##  6 Female 1-5       
##  7 Female 1-5       
##  8 Female 1-5       
##  9 Female 1-5       
## 10 Female 1-5       
## # … with 460 more rows

Creating statistics and tidying the columns

# Now using the count() command we can get running totals of the age ranges for men and women
burials_male_age_range %>% count(Age_Ranges) -> burials_male_age_range_count
burials_female_age_range %>% count(Age_Ranges) -> burials_female_age_range_count

# Results
burials_male_age_range_count
## # A tibble: 16 × 2
##    Age_Ranges           n
##    <chr>            <int>
##  1 1-5                 53
##  2 11-15                7
##  3 16-20               16
##  4 21-25               14
##  5 26-30               21
##  6 31-35               11
##  7 36-40               29
##  8 41-45               16
##  9 46-50               31
## 10 51-55               16
## 11 56-60               23
## 12 6-10                27
## 13 61-65               13
## 14 65 and older        78
## 15 Less than 1 year    86
## 16 <NA>                19
# Results
burials_female_age_range_count
## # A tibble: 16 × 2
##    Age_Ranges           n
##    <chr>            <int>
##  1 1-5                 81
##  2 11-15                8
##  3 16-20               17
##  4 21-25               17
##  5 26-30               13
##  6 31-35               15
##  7 36-40               26
##  8 41-45               21
##  9 46-50               26
## 10 51-55               14
## 11 56-60               31
## 12 6-10                24
## 13 61-65               17
## 14 65 and older        89
## 15 Less than 1 year    60
## 16 <NA>                11
# Tidy the column names so it's less confusing, since n will always the column heading for anything you have counted
colnames(burials_male_age_range_count) <- c("Age_Ranges", "No_of_Deaths_Male")
colnames(burials_female_age_range_count) <- c("Age_Ranges", "No_of_Deaths_Female")


# Results
burials_male_age_range_count
## # A tibble: 16 × 2
##    Age_Ranges       No_of_Deaths_Male
##    <chr>                        <int>
##  1 1-5                             53
##  2 11-15                            7
##  3 16-20                           16
##  4 21-25                           14
##  5 26-30                           21
##  6 31-35                           11
##  7 36-40                           29
##  8 41-45                           16
##  9 46-50                           31
## 10 51-55                           16
## 11 56-60                           23
## 12 6-10                            27
## 13 61-65                           13
## 14 65 and older                    78
## 15 Less than 1 year                86
## 16 <NA>                            19
# Results
burials_female_age_range_count
## # A tibble: 16 × 2
##    Age_Ranges       No_of_Deaths_Female
##    <chr>                          <int>
##  1 1-5                               81
##  2 11-15                              8
##  3 16-20                             17
##  4 21-25                             17
##  5 26-30                             13
##  6 31-35                             15
##  7 36-40                             26
##  8 41-45                             21
##  9 46-50                             26
## 10 51-55                             14
## 11 56-60                             31
## 12 6-10                              24
## 13 61-65                             17
## 14 65 and older                      89
## 15 Less than 1 year                  60
## 16 <NA>                              11

Creating a factoring order

Now the Age Ranges are quite large and may need to be adjusted in future, I amalgamated everything above 65 as there were too many for an effective chart. It may be a good idea to change the ranges 10 year ranges as opposed to 5 year ranges.

# This creates a list of age ranges that matches our dataset
age_ranges <- c("Less than 1 year", "1-5", "6-10", "11-15",
              "16-20", "21-25", "26-30", "31-35", "36-40", 
              "41-45", "46-50", "51-55", "56-60", "61-65",
              "65 and older")
# Using the arrange() function we are setting the age ranges to follow the order of our list
burials_female_age_range_count %>% arrange(factor(Age_Ranges, levels = age_ranges)) ->
burials_female_age_range_count

# Result
burials_female_age_range_count
## # A tibble: 16 × 2
##    Age_Ranges       No_of_Deaths_Female
##    <chr>                          <int>
##  1 Less than 1 year                  60
##  2 1-5                               81
##  3 6-10                              24
##  4 11-15                              8
##  5 16-20                             17
##  6 21-25                             17
##  7 26-30                             13
##  8 31-35                             15
##  9 36-40                             26
## 10 41-45                             21
## 11 46-50                             26
## 12 51-55                             14
## 13 56-60                             31
## 14 61-65                             17
## 15 65 and older                      89
## 16 <NA>                              11
# Repeat for the male dataset
burials_male_age_range_count %>% arrange(factor(Age_Ranges, levels = age_ranges)) ->
burials_male_age_range_count

# Result
burials_male_age_range_count
## # A tibble: 16 × 2
##    Age_Ranges       No_of_Deaths_Male
##    <chr>                        <int>
##  1 Less than 1 year                86
##  2 1-5                             53
##  3 6-10                            27
##  4 11-15                            7
##  5 16-20                           16
##  6 21-25                           14
##  7 26-30                           21
##  8 31-35                           11
##  9 36-40                           29
## 10 41-45                           16
## 11 46-50                           31
## 12 51-55                           16
## 13 56-60                           23
## 14 61-65                           13
## 15 65 and older                    78
## 16 <NA>                            19

Merging the datasets

Using the same principals as the yearly data, use the inner_join() command from Tidyverse.

# Use inner_join() to add the datasets together, we're joining them by column, so the Age Ranges is used a common linkage
inner_join(burials_female_age_range_count, burials_male_age_range_count, by="Age_Ranges") -> burials_age_ranges_count

# Rename the columns
colnames(burials_age_ranges_count) <- c("Ages","Female","Male")

# Remove the NA rows
burials_age_ranges_count %>% na.omit() -> burials_age_ranges_count

# Results
burials_age_ranges_count
## # A tibble: 15 × 3
##    Ages             Female  Male
##    <chr>             <int> <int>
##  1 Less than 1 year     60    86
##  2 1-5                  81    53
##  3 6-10                 24    27
##  4 11-15                 8     7
##  5 16-20                17    16
##  6 21-25                17    14
##  7 26-30                13    21
##  8 31-35                15    11
##  9 36-40                26    29
## 10 41-45                21    16
## 11 46-50                26    31
## 12 51-55                14    16
## 13 56-60                31    23
## 14 61-65                17    13
## 15 65 and older         89    78

Change the data into a long format

Using the melt() command from the reshape2 package we can change the data from being wide to long data

# Here is the command
melt(burials_age_ranges_count) -> burials_total_age_ranges_long
## Using Ages as id variables
# Result
burials_total_age_ranges_long
##                Ages variable value
## 1  Less than 1 year   Female    60
## 2               1-5   Female    81
## 3              6-10   Female    24
## 4             11-15   Female     8
## 5             16-20   Female    17
## 6             21-25   Female    17
## 7             26-30   Female    13
## 8             31-35   Female    15
## 9             36-40   Female    26
## 10            41-45   Female    21
## 11            46-50   Female    26
## 12            51-55   Female    14
## 13            56-60   Female    31
## 14            61-65   Female    17
## 15     65 and older   Female    89
## 16 Less than 1 year     Male    86
## 17              1-5     Male    53
## 18             6-10     Male    27
## 19            11-15     Male     7
## 20            16-20     Male    16
## 21            21-25     Male    14
## 22            26-30     Male    21
## 23            31-35     Male    11
## 24            36-40     Male    29
## 25            41-45     Male    16
## 26            46-50     Male    31
## 27            51-55     Male    16
## 28            56-60     Male    23
## 29            61-65     Male    13
## 30     65 and older     Male    78

Graph - Version A

I created my own theme but there are other ones available via the ggthemes package.

burials_total_age_ranges_long %>% ggplot(aes(x=factor(Ages, levels = age_ranges), y=value)) + 
geom_bar(aes(fill=variable), position = "dodge", stat = "identity", width = 0.5) + 
labs(title = "Age and Gender breakdown of Irish deaths, 1689-1740",
     subtitle = "(St. Germain-en-Laye)",
     tag = "Figure 3", 
     caption = "Database of St. Germain-en-Laye Registers",
     x = "Age Ranges",
     y = "No. of Deaths",
     fill = "Gender") +
scale_fill_colorblind() + theme_classic(base_family = "Baskerville") + 
theme(axis.text.x = element_text(colour = "darkslategrey", size = 10), 
      axis.text.y = element_text(colour = "darkslategrey", size = 14),          
      legend.background = element_rect(fill = "white", linewidth = 4, colour = "white"),
      legend.justification = c(0, 1),
      legend.position = c(0.9, 1),
      axis.ticks = element_line(colour = "grey70", linewidth = 0.3),
      text = element_text(family = "Baskerville"),
      plot.title = element_text(face = "bold", size = 18, margin = margin(b = 10)),
      plot.subtitle = element_text(size = 12, color = "darkslategrey", margin = margin(b = 25)),
      plot.caption = element_text(size = 10, margin = margin(t = 10), color = "grey70", hjust = 0))

Graphs - Version B

First install and load the ggthemes library

’ install.packages(‘ggthemes’) ’

library(ggthemes)
burials_total_age_ranges_long %>% ggplot(aes(x=factor(Ages, levels = age_ranges), y=value)) + 
geom_bar(aes(fill=variable), position = "dodge", stat = "identity", width = 0.5) + 
labs(title = "Age and Gender breakdown of Irish deaths, 1689-1740",
     subtitle = "(St. Germain-en-Laye)",
     tag = "Figure 3", 
     caption = "Database of St. Germain-en-Laye Registers",
     x = "Age Ranges",
     y = "No. of Deaths",
     fill = "Gender") +
theme_economist()

Preparing Tables

De-duplicating the Signatures & Witnesses. Create small subsets of the data from baptisms.

baptisms %>% select(Father_Name_FR, Father_Register_Signature) -> baptisms_father_sig
baptisms %>% select(Mother_Name_FR, Mother_Register_Signature) -> baptisms_mother_sig
baptisms %>% select(Godmother_Name_FR, Godmother_Register_Signature) -> baptisms_godmother_sig
baptisms %>% select(Godfather_Name_FR, Godfather_Register_Signature) -> baptisms_godfather_sig

# Result
baptisms_father_sig
## # A tibble: 954 × 2
##    Father_Name_FR            Father_Register_Signature
##    <chr>                     <chr>                    
##  1 Jean Comte de Melfort     No                       
##  2 Randall Magdonnal         No                       
##  3 George Comte de Dumberton No                       
##  4 Henry Baxta               No                       
##  5 Deudley Bagnall           No                       
##  6 Francois Cleaq            No                       
##  7 Jean Roche                No                       
##  8 Brian Bruerton            No                       
##  9 Jacques Lambert           No                       
## 10 Antoine Saual             No                       
## # … with 944 more rows
# Result
baptisms_mother_sig
## # A tibble: 954 × 2
##    Mother_Name_FR             Mother_Register_Signature
##    <chr>                      <chr>                    
##  1 Euphenia Vicalace          No                       
##  2 Hanna Roche                No                       
##  3 Anne Uuheatly              No                       
##  4 Marie Mauuld               No                       
##  5 Anne Mateis                No                       
##  6 Elizabeth Best             No                       
##  7 Anne Pessé                 No                       
##  8 Catherine Louise Jonchseur No                       
##  9 Jeanne LeFoy               No                       
## 10 Jeanne Cecile Herbert      No                       
## # … with 944 more rows
# Result
baptisms_godmother_sig
## # A tibble: 954 × 2
##    Godmother_Name_FR    Godmother_Register_Signature
##    <chr>                <chr>                       
##  1 Marie Conny          Yes                         
##  2 Anne Baguenal        Yes                         
##  3 Henriette de Gourdon Yes                         
##  4 Margueritte Karlton  Yes                         
##  5 Anne Magdonnel       Yes                         
##  6 Anne Magdonnel       Yes                         
##  7 Winifrida Strickland Yes                         
##  8 Contesse Montaygue   Yes                         
##  9 Jacqueline Du Foud   Unable                      
## 10 Lucie Herbert        Yes                         
## # … with 944 more rows
# Result
baptisms_godfather_sig
## # A tibble: 954 × 2
##    Godfather_Name_FR        Godfather_Register_Signature
##    <chr>                    <chr>                       
##  1 Jacques Comte de Droment Yes                         
##  2 <NA>                     Yes                         
##  3 Milord Stafford          Yes                         
##  4 Thomas Waldgrave         Yes                         
##  5 Thomas Neuuille          Yes                         
##  6 Guillaume Waldegrave     Yes                         
##  7 Philippe Ellis           Yes                         
##  8 Conte Dombartay          Yes                         
##  9 Jacques Blouiy           Unable                      
## 10 Charles Leybonne         Yes                         
## # … with 944 more rows

As there are many parents and godparents, you can use the distinct() command to separate dupilcates and the .keep_all = TRUE will keep all the columns.

baptisms_father_sig %>% distinct(Father_Name_FR, .keep_all = T) -> baptisms_father_sig
baptisms_mother_sig %>% distinct(Mother_Name_FR, .keep_all = T) -> baptisms_mother_sig
baptisms_godmother_sig %>% distinct(Godmother_Name_FR, .keep_all = T) -> baptisms_godmother_sig
baptisms_godfather_sig %>% distinct(Godfather_Name_FR, .keep_all = T) -> baptisms_godfather_sig

Create running counts of the signatures

Now that you have de-duplicated the baptismal signatures create running counts and omit the NAs

baptisms_father_sig %>% count(Father_Register_Signature) %>% na.omit() -> baptisms_father_sig_count
baptisms_mother_sig %>% count(Mother_Register_Signature) %>% na.omit() -> baptisms_mother_sig_count
baptisms_godmother_sig %>% count(Godmother_Register_Signature) %>% na.omit() -> baptisms_godmother_sig_count
baptisms_godfather_sig %>% count(Godfather_Register_Signature) %>% na.omit() -> baptisms_godfather_sig_count

Remember to rename the column headings

colnames(baptisms_father_sig_count) <- c("Signature", "Fathers") 
colnames(baptisms_mother_sig_count) <- c("Signature", "Mothers") 
colnames(baptisms_godfather_sig_count) <- c("Signature", "Godfathers") 
colnames(baptisms_godmother_sig_count) <- c("Signature", "Godmothers") 
# Check the result
baptisms_father_sig_count
## # A tibble: 3 × 2
##   Signature Fathers
##   <chr>       <int>
## 1 No            423
## 2 Unable         86
## 3 Yes           200
# Check the result
baptisms_mother_sig_count
## # A tibble: 3 × 2
##   Signature Mothers
##   <chr>       <int>
## 1 No            643
## 2 Unable         73
## 3 Yes             6
# Check the result
baptisms_godfather_sig_count
## # A tibble: 3 × 2
##   Signature Godfathers
##   <chr>          <int>
## 1 No                45
## 2 Unable            58
## 3 Yes              600
# Check the result
baptisms_godmother_sig_count
## # A tibble: 3 × 2
##   Signature Godmothers
##   <chr>          <int>
## 1 No                46
## 2 Unable            93
## 3 Yes              616

Group the findings

If you have multiple files that are very similar you can group multiple datasets using Reduce() and inner_join()

baptisms_signatures_tbl <- Reduce(inner_join, 
                                  list(baptisms_father_sig_count, 
                                       baptisms_mother_sig_count,
                                       baptisms_godfather_sig_count,
                                       baptisms_godmother_sig_count))
## Joining, by = "Signature"
## Joining, by = "Signature"
## Joining, by = "Signature"
baptisms_signatures_tbl
## # A tibble: 3 × 5
##   Signature Fathers Mothers Godfathers Godmothers
##   <chr>       <int>   <int>      <int>      <int>
## 1 No            423     643         45         46
## 2 Unable         86      73         58         93
## 3 Yes           200       6        600        616
# Change the order of the rows
baptisms_signatures_tbl %>% arrange(desc(Signature)) -> baptisms_signatures_tbl

baptisms_signatures_tbl
## # A tibble: 3 × 5
##   Signature Fathers Mothers Godfathers Godmothers
##   <chr>       <int>   <int>      <int>      <int>
## 1 Yes           200       6        600        616
## 2 Unable         86      73         58         93
## 3 No            423     643         45         46

Creating Tables

The package kableExtra creates nice tables.

’ install.packages(‘kableExtra’) ’

library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
kbl(baptisms_signatures_tbl, caption = "Literacy of Parents & Godparents", booktabs = T) %>% kable_classic_2(full_width = F)
Literacy of Parents & Godparents
Signature Fathers Mothers Godfathers Godmothers
Yes 200 6 600 616
Unable 86 73 58 93
No 423 643 45 46