J. Kavanagh
2023-03-25
We will doing more specific analyses in this series of slides
Load the data file I created for this group
Now there is only consistent age data from the burials dataset, this is a regular feature of historical data. Typically grave registers are the most useful for providing age information for this type of historical data.
During the intial transcription process, the age ranges was a separate column that normalises the ages listed, it also helped normalise the partial ages such as 18 months etc, that were common for infants.
# First select the two columns we need for this analysis and create a new variable
burials %>% select(Gender, Age_Ranges) -> burials_gender_age_range
# This is now a separate dataset with the Age_Ranges and Gender columns from the Burials dataset
burials_gender_age_range## # A tibble: 930 × 2
## Gender Age_Ranges
## <chr> <chr>
## 1 Male 1-5
## 2 Female 1-5
## 3 Male 1-5
## 4 Male 1-5
## 5 Male 1-5
## 6 Male 1-5
## 7 Female 1-5
## 8 Male 1-5
## 9 Female 1-5
## 10 Male 1-5
## # … with 920 more rows
Now we’re going to use filter again to create two separate datasets for male and female
# This creates a dataset for male deaths with their age ranges
burials_gender_age_range %>% filter(Gender == "Male") -> burials_male_age_range
# This creates a dataset for female deaths with their age ranges
burials_gender_age_range %>% filter(Gender == "Female") -> burials_female_age_range
# Result
burials_male_age_range## # A tibble: 460 × 2
## Gender Age_Ranges
## <chr> <chr>
## 1 Male 1-5
## 2 Male 1-5
## 3 Male 1-5
## 4 Male 1-5
## 5 Male 1-5
## 6 Male 1-5
## 7 Male 1-5
## 8 Male 1-5
## 9 Male 1-5
## 10 Male 1-5
## # … with 450 more rows
## # A tibble: 470 × 2
## Gender Age_Ranges
## <chr> <chr>
## 1 Female 1-5
## 2 Female 1-5
## 3 Female 1-5
## 4 Female 1-5
## 5 Female 1-5
## 6 Female 1-5
## 7 Female 1-5
## 8 Female 1-5
## 9 Female 1-5
## 10 Female 1-5
## # … with 460 more rows
# Now using the count() command we can get running totals of the age ranges for men and women
burials_male_age_range %>% count(Age_Ranges) -> burials_male_age_range_count
burials_female_age_range %>% count(Age_Ranges) -> burials_female_age_range_count
# Results
burials_male_age_range_count## # A tibble: 16 × 2
## Age_Ranges n
## <chr> <int>
## 1 1-5 53
## 2 11-15 7
## 3 16-20 16
## 4 21-25 14
## 5 26-30 21
## 6 31-35 11
## 7 36-40 29
## 8 41-45 16
## 9 46-50 31
## 10 51-55 16
## 11 56-60 23
## 12 6-10 27
## 13 61-65 13
## 14 65 and older 78
## 15 Less than 1 year 86
## 16 <NA> 19
## # A tibble: 16 × 2
## Age_Ranges n
## <chr> <int>
## 1 1-5 81
## 2 11-15 8
## 3 16-20 17
## 4 21-25 17
## 5 26-30 13
## 6 31-35 15
## 7 36-40 26
## 8 41-45 21
## 9 46-50 26
## 10 51-55 14
## 11 56-60 31
## 12 6-10 24
## 13 61-65 17
## 14 65 and older 89
## 15 Less than 1 year 60
## 16 <NA> 11
# Tidy the column names so it's less confusing, since n will always the column heading for anything you have counted
colnames(burials_male_age_range_count) <- c("Age_Ranges", "No_of_Deaths_Male")
colnames(burials_female_age_range_count) <- c("Age_Ranges", "No_of_Deaths_Female")
# Results
burials_male_age_range_count## # A tibble: 16 × 2
## Age_Ranges No_of_Deaths_Male
## <chr> <int>
## 1 1-5 53
## 2 11-15 7
## 3 16-20 16
## 4 21-25 14
## 5 26-30 21
## 6 31-35 11
## 7 36-40 29
## 8 41-45 16
## 9 46-50 31
## 10 51-55 16
## 11 56-60 23
## 12 6-10 27
## 13 61-65 13
## 14 65 and older 78
## 15 Less than 1 year 86
## 16 <NA> 19
## # A tibble: 16 × 2
## Age_Ranges No_of_Deaths_Female
## <chr> <int>
## 1 1-5 81
## 2 11-15 8
## 3 16-20 17
## 4 21-25 17
## 5 26-30 13
## 6 31-35 15
## 7 36-40 26
## 8 41-45 21
## 9 46-50 26
## 10 51-55 14
## 11 56-60 31
## 12 6-10 24
## 13 61-65 17
## 14 65 and older 89
## 15 Less than 1 year 60
## 16 <NA> 11
Now the Age Ranges are quite large and may need to be adjusted in future, I amalgamated everything above 65 as there were too many for an effective chart. It may be a good idea to change the ranges 10 year ranges as opposed to 5 year ranges.
# This creates a list of age ranges that matches our dataset
age_ranges <- c("Less than 1 year", "1-5", "6-10", "11-15",
"16-20", "21-25", "26-30", "31-35", "36-40",
"41-45", "46-50", "51-55", "56-60", "61-65",
"65 and older")# Using the arrange() function we are setting the age ranges to follow the order of our list
burials_female_age_range_count %>% arrange(factor(Age_Ranges, levels = age_ranges)) ->
burials_female_age_range_count
# Result
burials_female_age_range_count## # A tibble: 16 × 2
## Age_Ranges No_of_Deaths_Female
## <chr> <int>
## 1 Less than 1 year 60
## 2 1-5 81
## 3 6-10 24
## 4 11-15 8
## 5 16-20 17
## 6 21-25 17
## 7 26-30 13
## 8 31-35 15
## 9 36-40 26
## 10 41-45 21
## 11 46-50 26
## 12 51-55 14
## 13 56-60 31
## 14 61-65 17
## 15 65 and older 89
## 16 <NA> 11
# Repeat for the male dataset
burials_male_age_range_count %>% arrange(factor(Age_Ranges, levels = age_ranges)) ->
burials_male_age_range_count
# Result
burials_male_age_range_count## # A tibble: 16 × 2
## Age_Ranges No_of_Deaths_Male
## <chr> <int>
## 1 Less than 1 year 86
## 2 1-5 53
## 3 6-10 27
## 4 11-15 7
## 5 16-20 16
## 6 21-25 14
## 7 26-30 21
## 8 31-35 11
## 9 36-40 29
## 10 41-45 16
## 11 46-50 31
## 12 51-55 16
## 13 56-60 23
## 14 61-65 13
## 15 65 and older 78
## 16 <NA> 19
Using the same principals as the yearly data, use the inner_join() command from Tidyverse.
# Use inner_join() to add the datasets together, we're joining them by column, so the Age Ranges is used a common linkage
inner_join(burials_female_age_range_count, burials_male_age_range_count, by="Age_Ranges") -> burials_age_ranges_count
# Rename the columns
colnames(burials_age_ranges_count) <- c("Ages","Female","Male")
# Remove the NA rows
burials_age_ranges_count %>% na.omit() -> burials_age_ranges_count
# Results
burials_age_ranges_count## # A tibble: 15 × 3
## Ages Female Male
## <chr> <int> <int>
## 1 Less than 1 year 60 86
## 2 1-5 81 53
## 3 6-10 24 27
## 4 11-15 8 7
## 5 16-20 17 16
## 6 21-25 17 14
## 7 26-30 13 21
## 8 31-35 15 11
## 9 36-40 26 29
## 10 41-45 21 16
## 11 46-50 26 31
## 12 51-55 14 16
## 13 56-60 31 23
## 14 61-65 17 13
## 15 65 and older 89 78
Using the melt() command from the reshape2 package we can change the data from being wide to long data
## Using Ages as id variables
## Ages variable value
## 1 Less than 1 year Female 60
## 2 1-5 Female 81
## 3 6-10 Female 24
## 4 11-15 Female 8
## 5 16-20 Female 17
## 6 21-25 Female 17
## 7 26-30 Female 13
## 8 31-35 Female 15
## 9 36-40 Female 26
## 10 41-45 Female 21
## 11 46-50 Female 26
## 12 51-55 Female 14
## 13 56-60 Female 31
## 14 61-65 Female 17
## 15 65 and older Female 89
## 16 Less than 1 year Male 86
## 17 1-5 Male 53
## 18 6-10 Male 27
## 19 11-15 Male 7
## 20 16-20 Male 16
## 21 21-25 Male 14
## 22 26-30 Male 21
## 23 31-35 Male 11
## 24 36-40 Male 29
## 25 41-45 Male 16
## 26 46-50 Male 31
## 27 51-55 Male 16
## 28 56-60 Male 23
## 29 61-65 Male 13
## 30 65 and older Male 78
I created my own theme but there are other ones available via the ggthemes package.
burials_total_age_ranges_long %>% ggplot(aes(x=factor(Ages, levels = age_ranges), y=value)) +
geom_bar(aes(fill=variable), position = "dodge", stat = "identity", width = 0.5) +
labs(title = "Age and Gender breakdown of Irish deaths, 1689-1740",
subtitle = "(St. Germain-en-Laye)",
tag = "Figure 3",
caption = "Database of St. Germain-en-Laye Registers",
x = "Age Ranges",
y = "No. of Deaths",
fill = "Gender") +
scale_fill_colorblind() + theme_classic(base_family = "Baskerville") +
theme(axis.text.x = element_text(colour = "darkslategrey", size = 10),
axis.text.y = element_text(colour = "darkslategrey", size = 14),
legend.background = element_rect(fill = "white", linewidth = 4, colour = "white"),
legend.justification = c(0, 1),
legend.position = c(0.9, 1),
axis.ticks = element_line(colour = "grey70", linewidth = 0.3),
text = element_text(family = "Baskerville"),
plot.title = element_text(face = "bold", size = 18, margin = margin(b = 10)),
plot.subtitle = element_text(size = 12, color = "darkslategrey", margin = margin(b = 25)),
plot.caption = element_text(size = 10, margin = margin(t = 10), color = "grey70", hjust = 0))First install and load the ggthemes library
’ install.packages(‘ggthemes’) ’
burials_total_age_ranges_long %>% ggplot(aes(x=factor(Ages, levels = age_ranges), y=value)) +
geom_bar(aes(fill=variable), position = "dodge", stat = "identity", width = 0.5) +
labs(title = "Age and Gender breakdown of Irish deaths, 1689-1740",
subtitle = "(St. Germain-en-Laye)",
tag = "Figure 3",
caption = "Database of St. Germain-en-Laye Registers",
x = "Age Ranges",
y = "No. of Deaths",
fill = "Gender") +
theme_economist()De-duplicating the Signatures & Witnesses. Create small subsets of the data from baptisms.
baptisms %>% select(Father_Name_FR, Father_Register_Signature) -> baptisms_father_sig
baptisms %>% select(Mother_Name_FR, Mother_Register_Signature) -> baptisms_mother_sig
baptisms %>% select(Godmother_Name_FR, Godmother_Register_Signature) -> baptisms_godmother_sig
baptisms %>% select(Godfather_Name_FR, Godfather_Register_Signature) -> baptisms_godfather_sig
# Result
baptisms_father_sig## # A tibble: 954 × 2
## Father_Name_FR Father_Register_Signature
## <chr> <chr>
## 1 Jean Comte de Melfort No
## 2 Randall Magdonnal No
## 3 George Comte de Dumberton No
## 4 Henry Baxta No
## 5 Deudley Bagnall No
## 6 Francois Cleaq No
## 7 Jean Roche No
## 8 Brian Bruerton No
## 9 Jacques Lambert No
## 10 Antoine Saual No
## # … with 944 more rows
## # A tibble: 954 × 2
## Mother_Name_FR Mother_Register_Signature
## <chr> <chr>
## 1 Euphenia Vicalace No
## 2 Hanna Roche No
## 3 Anne Uuheatly No
## 4 Marie Mauuld No
## 5 Anne Mateis No
## 6 Elizabeth Best No
## 7 Anne Pessé No
## 8 Catherine Louise Jonchseur No
## 9 Jeanne LeFoy No
## 10 Jeanne Cecile Herbert No
## # … with 944 more rows
## # A tibble: 954 × 2
## Godmother_Name_FR Godmother_Register_Signature
## <chr> <chr>
## 1 Marie Conny Yes
## 2 Anne Baguenal Yes
## 3 Henriette de Gourdon Yes
## 4 Margueritte Karlton Yes
## 5 Anne Magdonnel Yes
## 6 Anne Magdonnel Yes
## 7 Winifrida Strickland Yes
## 8 Contesse Montaygue Yes
## 9 Jacqueline Du Foud Unable
## 10 Lucie Herbert Yes
## # … with 944 more rows
## # A tibble: 954 × 2
## Godfather_Name_FR Godfather_Register_Signature
## <chr> <chr>
## 1 Jacques Comte de Droment Yes
## 2 <NA> Yes
## 3 Milord Stafford Yes
## 4 Thomas Waldgrave Yes
## 5 Thomas Neuuille Yes
## 6 Guillaume Waldegrave Yes
## 7 Philippe Ellis Yes
## 8 Conte Dombartay Yes
## 9 Jacques Blouiy Unable
## 10 Charles Leybonne Yes
## # … with 944 more rows
As there are many parents and godparents, you can use the distinct() command to separate dupilcates and the .keep_all = TRUE will keep all the columns.
baptisms_father_sig %>% distinct(Father_Name_FR, .keep_all = T) -> baptisms_father_sig
baptisms_mother_sig %>% distinct(Mother_Name_FR, .keep_all = T) -> baptisms_mother_sig
baptisms_godmother_sig %>% distinct(Godmother_Name_FR, .keep_all = T) -> baptisms_godmother_sig
baptisms_godfather_sig %>% distinct(Godfather_Name_FR, .keep_all = T) -> baptisms_godfather_sigNow that you have de-duplicated the baptismal signatures create running counts and omit the NAs
baptisms_father_sig %>% count(Father_Register_Signature) %>% na.omit() -> baptisms_father_sig_count
baptisms_mother_sig %>% count(Mother_Register_Signature) %>% na.omit() -> baptisms_mother_sig_count
baptisms_godmother_sig %>% count(Godmother_Register_Signature) %>% na.omit() -> baptisms_godmother_sig_count
baptisms_godfather_sig %>% count(Godfather_Register_Signature) %>% na.omit() -> baptisms_godfather_sig_countRemember to rename the column headings
colnames(baptisms_father_sig_count) <- c("Signature", "Fathers")
colnames(baptisms_mother_sig_count) <- c("Signature", "Mothers")
colnames(baptisms_godfather_sig_count) <- c("Signature", "Godfathers")
colnames(baptisms_godmother_sig_count) <- c("Signature", "Godmothers") ## # A tibble: 3 × 2
## Signature Fathers
## <chr> <int>
## 1 No 423
## 2 Unable 86
## 3 Yes 200
## # A tibble: 3 × 2
## Signature Mothers
## <chr> <int>
## 1 No 643
## 2 Unable 73
## 3 Yes 6
## # A tibble: 3 × 2
## Signature Godfathers
## <chr> <int>
## 1 No 45
## 2 Unable 58
## 3 Yes 600
## # A tibble: 3 × 2
## Signature Godmothers
## <chr> <int>
## 1 No 46
## 2 Unable 93
## 3 Yes 616
If you have multiple files that are very similar you can group multiple datasets using Reduce() and inner_join()
baptisms_signatures_tbl <- Reduce(inner_join,
list(baptisms_father_sig_count,
baptisms_mother_sig_count,
baptisms_godfather_sig_count,
baptisms_godmother_sig_count))## Joining, by = "Signature"
## Joining, by = "Signature"
## Joining, by = "Signature"
## # A tibble: 3 × 5
## Signature Fathers Mothers Godfathers Godmothers
## <chr> <int> <int> <int> <int>
## 1 No 423 643 45 46
## 2 Unable 86 73 58 93
## 3 Yes 200 6 600 616
# Change the order of the rows
baptisms_signatures_tbl %>% arrange(desc(Signature)) -> baptisms_signatures_tbl
baptisms_signatures_tbl## # A tibble: 3 × 5
## Signature Fathers Mothers Godfathers Godmothers
## <chr> <int> <int> <int> <int>
## 1 Yes 200 6 600 616
## 2 Unable 86 73 58 93
## 3 No 423 643 45 46
The package kableExtra creates nice tables.
’ install.packages(‘kableExtra’) ’
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
kbl(baptisms_signatures_tbl, caption = "Literacy of Parents & Godparents", booktabs = T) %>% kable_classic_2(full_width = F)| Signature | Fathers | Mothers | Godfathers | Godmothers |
|---|---|---|---|---|
| Yes | 200 | 6 | 600 | 616 |
| Unable | 86 | 73 | 58 | 93 |
| No | 423 | 643 | 45 | 46 |