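To import the data, first load the packages used throughout this walkthrough, then run the following code on the four CSV files from the database:
# tidyverse supplies dplyr, tibble, ggplot2 and %>%; the others provide floor_date(), melt(), the colour-blind palettes,
# kbl()/kable_minimal(), xts(), dygraph(), tk_augment_timeseries_signature() and reactable()
library(tidyverse); library(lubridate); library(reshape2); library(ggthemes)
library(kableExtra); library(xts); library(dygraphs); library(timetk); library(reactable)
read.csv('baptisms.csv', stringsAsFactors = FALSE, na.strings = c("NA", " ", "")) -> baptisms
read.csv('burials.csv', stringsAsFactors = FALSE, na.strings = c("NA", " ", "")) -> burials
read.csv('marriages.csv', stringsAsFactors = FALSE, na.strings = c("NA", " ", "")) -> marriages
read.csv('abjurations.csv', stringsAsFactors = FALSE, na.strings = c("NA", " ", "")) -> abjurations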
This imports the datasets as data.frames. As these can be somewhat limited, convert them to tibbles, the tidyverse's version of the data frame, which print more compactly and never turn strings into factors:
baptisms <- as_tibble(baptisms)
burials <- as_tibble(burials)
marriages <- as_tibble(marriages)
abjurations <- as_tibble(abjurations)
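Each dataset includes a date column, which read.csv() imports as plain text. Convert these columns to R's Date class so they can be sorted and grouped by year or month later (as.Date() expects "YYYY-MM-DD" strings by default):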
baptisms$Date_of_Baptism <- as.Date(baptisms$Date_of_Baptism)
burials$Date_of_Burial <- as.Date(burials$Date_of_Burial)
marriages$Date_of_Marriage <- as.Date(marriages$Date_of_Marriage)
abjurations$Date <- as.Date(abjurations$Date)
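As a quick check of what this conversion does, the sketch below parses two made-up date strings. It assumes the register dates are stored as ISO "YYYY-MM-DD" text, which is what as.Date() reads by default; if a file used another layout, the format argument would have to describe it explicitly.

```r
# Hypothetical register dates stored as ISO "YYYY-MM-DD" text
iso_dates <- c("1692-03-12", "1700-11-05")
d <- as.Date(iso_dates)
class(d)          # "Date"
format(d, "%Y")   # "1692" "1700"

# The same dates written day/month/year need an explicit format string
dmy_dates <- c("12/03/1692", "05/11/1700")
d2 <- as.Date(dmy_dates, format = "%d/%m/%Y")
all(d == d2)      # TRUE
```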
Most of the analyses below use the pipe operator %>%, which passes the result on its left to the function on its right as that function's first argument. Chained together, the steps read much like a sentence. For example, to create a gender breakdown of burials:
burials %>% count(Gender)
## # A tibble: 2 × 2
## Gender n
## <chr> <int>
## 1 Female 173
## 2 Male 214
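The pipe simply rewrites x %>% f(y) as f(x, y). A minimal sketch on a hypothetical three-row tibble (toy_burials is invented for illustration) shows that the piped and nested forms give the same result:

```r
library(dplyr)

# Hypothetical toy data standing in for the burials register
toy_burials <- tibble(Gender = c("Male", "Female", "Male"))

piped  <- toy_burials %>% count(Gender)  # pipe form
nested <- count(toy_burials, Gender)     # same call written without the pipe
identical(piped, nested)                 # TRUE
piped$n                                  # 1 2 (Female, then Male)
```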
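To save this result as a separate table, assign it to a new object such as burials_gender. The same works for baptisms. Marriages have no single Gender column, so count() with no arguments simply returns the total number of marriage records: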
burials %>% count(Gender) -> burials_gender
baptisms %>% count(Gender) -> baptisms_gender
marriages %>% count()
## # A tibble: 1 × 1
## n
## <int>
## 1 175
The first research question concerns the number of Irish people present across the entire database. To start, separate the Irish records from the overall data, then create yearly totals for each register. The filter() function creates the Irish-specific datasets:
# For burials this is straightforward, as only one person is involved
burials %>% filter(Nationality_Infer == "Irish") -> burials_irish
# For baptisms, filter so that the child and both parents are all listed as Irish
baptisms %>% filter(Child_Nationality_Infer == "Irish" & Father_Nationality_infer == "Irish" & Mother_Nationality_infer == "Irish") -> baptisms_irish
# For marriages, the same principle applies: both groom and bride must be Irish
marriages %>% filter(Groom_Nationality_infer == "Irish" & Bride_Nationality_infer == "Irish") -> marriages_irish
To count marriages per year, first count the marriages on each date and save the result as a new table. count() names its tally column n, so it is useful to rename the columns to describe what they actually contain:
marriages_irish %>% count(Date_of_Marriage) -> marriages_irish_dates
colnames(marriages_irish_dates) <- c("Date", "No")
Next, group the marriages into years using the floor_date() function from the lubridate package. It rounds each date down to the start of a chosen unit (day, week, month, year, etc.), so every date in 1695 becomes 1695-01-01:
marriages_irish_dates %>% group_by(year=floor_date(Date, "year")) %>% summarize(No_of_Marriages=sum(No)) -> marriages_irish_yearly
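To see what floor_date() and summarize() do together here, the following sketch runs the same chain on a few hypothetical dates (toy_dates is illustrative, not drawn from the registers):

```r
library(dplyr)
library(lubridate)

# Hypothetical daily counts shaped like marriages_irish_dates (columns Date and No)
toy_dates <- tibble(
  Date = as.Date(c("1694-02-10", "1694-09-03", "1695-06-21")),
  No   = c(1L, 2L, 1L)
)

# floor_date(..., "year") maps every date to 1 January of its year,
# so summarize() adds up all counts that share a year
toy_yearly <- toy_dates %>%
  group_by(year = floor_date(Date, "year")) %>%
  summarize(No_of_Marriages = sum(No))
toy_yearly$No_of_Marriages  # 3 1 (1694, then 1695)
```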
This process is then repeated for burials and baptisms
burials_irish %>% count(Date_of_Burial) -> burials_irish_dates
colnames(burials_irish_dates) <- c("Date", "No")
burials_irish_dates %>% group_by(year=floor_date(Date, "year")) %>% summarize(No_of_Burials=sum(No)) -> burials_irish_yearly
baptisms_irish %>% count(Date_of_Baptism) -> baptisms_irish_dates
colnames(baptisms_irish_dates) <- c("Date", "No")
baptisms_irish_dates %>% group_by(year=floor_date(Date, "year")) %>% summarize(No_of_Baptisms=sum(No)) -> baptisms_irish_yearly
The yearly totals for baptisms and burials cover the same years, so they can be merged directly into a new combined table:
inner_join(baptisms_irish_yearly, burials_irish_yearly, by="year") -> births_deaths_yearly
No marriages between an Irish groom and an Irish bride were recorded in 1689, 1693 or 1709, so those years are missing from the marriages table and would be dropped by inner_join(). Add empty rows for them first:
marriages_irish_yearly %>% add_row(year = c(as.Date("1689-01-01"),
as.Date("1693-01-01"),
as.Date("1709-01-01"))) -> marriages_irish_yearly
As this introduces NAs, the following command replaces them with 0:
marriages_irish_yearly[is.na(marriages_irish_yearly)] <- 0
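Hardcoding the missing years works for this dataset, but the list would need updating if the data changed. One alternative, swapped in here rather than taken from the original analysis, is tidyr's complete(), which adds any years missing from a full sequence and fills their totals with 0. The sketch below uses a hypothetical toy_marriages table; for the real data the sequence would run from 1689 to 1709.

```r
library(dplyr)
library(tidyr)

# Hypothetical yearly totals in which 1689 and 1693 have no marriages recorded
toy_marriages <- tibble(
  year = as.Date(c("1690-01-01", "1691-01-01", "1692-01-01")),
  No_of_Marriages = c(3L, 2L, 1L)
)

# Every year the study should cover (here 1689-1693), as 1 January dates
all_years <- seq(as.Date("1689-01-01"), as.Date("1693-01-01"), by = "year")

# complete() adds a row for each year in all_years that is missing,
# and fill= sets the count in those new rows to 0 instead of NA
toy_complete <- toy_marriages %>%
  complete(year = all_years, fill = list(No_of_Marriages = 0L)) %>%
  arrange(year)
toy_complete$No_of_Marriages  # 0 3 2 1 0
```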
The marriages table, now with a row for every year, is then joined to the combined births_deaths_yearly table:
inner_join(births_deaths_yearly, marriages_irish_yearly, by="year") -> irish_yearly_births_deaths_marriages
Tidy the column names
colnames(irish_yearly_births_deaths_marriages) <- c("Year", "Baptisms", "Burials", "Marriages")
# These data are in 'wide' format (one column per register). ggplot2 works best with 'long' data (one row per year and register), and the melt() function from the reshape2 package performs this conversion automatically.
irish_yearly <- melt(irish_yearly_births_deaths_marriages, id.vars = "Year")
irish_yearly
## Year variable value
## 1 1689-01-01 Baptisms 4
## 2 1690-01-01 Baptisms 9
## 3 1691-01-01 Baptisms 8
## 4 1692-01-01 Baptisms 26
## 5 1693-01-01 Baptisms 27
## 6 1694-01-01 Baptisms 20
## 7 1695-01-01 Baptisms 18
## 8 1696-01-01 Baptisms 35
## 9 1697-01-01 Baptisms 29
## 10 1698-01-01 Baptisms 37
## 11 1699-01-01 Baptisms 51
## 12 1700-01-01 Baptisms 32
## 13 1701-01-01 Baptisms 36
## 14 1702-01-01 Baptisms 30
## 15 1703-01-01 Baptisms 33
## 16 1704-01-01 Baptisms 27
## 17 1705-01-01 Baptisms 20
## 18 1706-01-01 Baptisms 24
## 19 1707-01-01 Baptisms 30
## 20 1708-01-01 Baptisms 35
## 21 1709-01-01 Baptisms 3
## 22 1689-01-01 Burials 8
## 23 1690-01-01 Burials 8
## 24 1691-01-01 Burials 6
## 25 1692-01-01 Burials 10
## 26 1693-01-01 Burials 37
## 27 1694-01-01 Burials 12
## 28 1695-01-01 Burials 17
## 29 1696-01-01 Burials 12
## 30 1697-01-01 Burials 15
## 31 1698-01-01 Burials 24
## 32 1699-01-01 Burials 27
## 33 1700-01-01 Burials 31
## 34 1701-01-01 Burials 27
## 35 1702-01-01 Burials 19
## 36 1703-01-01 Burials 24
## 37 1704-01-01 Burials 27
## 38 1705-01-01 Burials 9
## 39 1706-01-01 Burials 12
## 40 1707-01-01 Burials 17
## 41 1708-01-01 Burials 41
## 42 1709-01-01 Burials 3
## 43 1689-01-01 Marriages 0
## 44 1690-01-01 Marriages 3
## 45 1691-01-01 Marriages 2
## 46 1692-01-01 Marriages 1
## 47 1693-01-01 Marriages 0
## 48 1694-01-01 Marriages 7
## 49 1695-01-01 Marriages 7
## 50 1696-01-01 Marriages 8
## 51 1697-01-01 Marriages 8
## 52 1698-01-01 Marriages 16
## 53 1699-01-01 Marriages 5
## 54 1700-01-01 Marriages 12
## 55 1701-01-01 Marriages 11
## 56 1702-01-01 Marriages 10
## 57 1703-01-01 Marriages 11
## 58 1704-01-01 Marriages 6
## 59 1705-01-01 Marriages 8
## 60 1706-01-01 Marriages 7
## 61 1707-01-01 Marriages 8
## 62 1708-01-01 Marriages 8
## 63 1709-01-01 Marriages 0
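The reshaping performed by melt() above is easier to see on a small example. The sketch below uses a hypothetical two-year wide table; tidyr's pivot_longer() is the newer tidyverse function for the same task.

```r
library(reshape2)

# Hypothetical wide table shaped like irish_yearly_births_deaths_marriages
wide <- data.frame(
  Year     = as.Date(c("1689-01-01", "1690-01-01")),
  Baptisms = c(4, 9),
  Burials  = c(8, 8)
)

# Keep Year as the identifier; every other column becomes a (variable, value) pair
long <- melt(wide, id.vars = "Year")
# 4 rows: variable is Baptisms, Baptisms, Burials, Burials; value is 4, 9, 8, 8
```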
# A line graph comparing the yearly numbers of baptisms, burials and marriages in the database
irish_yearly %>% ggplot(aes(x=Year, y=value, colour=variable, group=variable)) + geom_line()
# The chart's appearance can be adjusted with themes, such as theme_fivethirtyeight() from ggthemes, and further customisation:
irish_yearly %>% ggplot(aes(x=Year, y=value,colour=variable, group=variable)) + geom_line() + theme_fivethirtyeight()
irish_yearly %>% ggplot(aes(x=Year, y=value, colour=variable, group=variable)) +
geom_line(linewidth=0.8) + # 'linewidth' replaced 'size' for lines in ggplot2 3.4.0
labs(title = "Births, Marriages and Deaths, 1689-1708",
subtitle = "(St. Germain-en-Laye)",
tag = "Figure 1",
caption = "Database of St. Germain-en-Laye Registers",
x = "Year",
y = "No.",
color = "") +
scale_color_colorblind() +
scale_x_date(date_breaks = "7 years", date_labels = "%Y") +
theme_classic() +
theme(axis.text.x = element_text(colour = "darkslategrey", size = 14),
axis.text.y = element_text(colour = "darkslategrey", size = 14),
legend.background = element_rect(fill = "white", linewidth = 4, colour = "white"),
legend.justification = c(0, 1),
legend.position = c(0.9, 1),
text = element_text(family = "Baskerville"),
plot.title = element_text(size = 18, margin = margin(b = 10)),
plot.subtitle = element_text(size = 12, color = "darkslategrey", margin = margin(b = 25)),
plot.caption = element_text(size = 8, margin = margin(t = 10), color = "grey70", hjust = 0))
# Creating Charts of Gender Breakdown
# The marriages dataset has no Gender column. Each marriage involves one bride and one groom,
# so the numbers of women and men each equal the total number of marriages (nrow(marriages))
marriages_gender <- tibble(Gender = c("Female", "Male"), Marriages = as.numeric(nrow(marriages)))
marriages_gender
## # A tibble: 2 × 2
## Gender Marriages
## <chr> <dbl>
## 1 Female 175
## 2 Male 175
# Create gender totals for the other three datasets
burials %>% count(Gender) -> burials_gender
colnames(burials_gender) <- c("Gender", "Deaths")
baptisms %>% count(Gender) -> baptisms_gender
colnames(baptisms_gender) <- c("Gender", "Baptisms")
abjurations %>% count(Gender) -> abjurations_gender
colnames(abjurations_gender) <- c("Gender", "Abjurations")
# Merge these into a single dataframe in stages
inner_join(burials_gender, baptisms_gender, by="Gender") -> burials_baptisms_gender
inner_join(marriages_gender, abjurations_gender, by="Gender") -> marriages_abjurations_gender
inner_join(burials_baptisms_gender, marriages_abjurations_gender, by="Gender") -> registers_gender
# Change the data to long format using melt()
melt(registers_gender) -> registers_gender_long
## Using Gender as id variables
# Breakdown of Gender within the Registers
registers_gender_long %>% ggplot(aes(x=Gender, y=value)) +
geom_bar(aes(fill=variable), position = "dodge", stat = "identity", width = 0.5) +
labs(title = "Gender Breakdown of Registers, 1689-1708",
subtitle = "(St. Germain-en-Laye)",
tag = "Figure 2",
caption = "Database of St. Germain-en-Laye Registers",
x = "Gender",
y = "No.",
fill = "Registers") +
scale_fill_colorblind()+
theme_classic()+
theme(axis.text.x = element_text(colour = "darkslategrey", size = 16),
axis.text.y = element_text(colour = "darkslategrey", size = 16),
legend.background = element_rect(fill = "white", linewidth = 4, colour = "white"),
legend.justification = c(0, 1),
legend.position = c(0.9, 1),
text = element_text(family = "Georgia"),
plot.title = element_text(size = 18, margin = margin(b = 10)),
plot.subtitle = element_text(size = 12, color = "darkslategrey", margin = margin(b = 25)),
plot.caption = element_text(size = 8, margin = margin(t = 10), color = "grey70", hjust = 0))
### Age and Gender Breakdowns
burials_irish %>% select(Gender, Age_ranges) -> burials_irish_gender_age_range
burials_irish_gender_age_range %>% filter(Gender == "Male") -> burials_irish_male_age_range
burials_irish_gender_age_range %>% filter(Gender == "Female") -> burials_irish_female_age_range
burials_irish_male_age_range %>% count(Age_ranges) -> burials_irish_male_age_range_count
burials_irish_female_age_range %>% count(Age_ranges) -> burials_irish_female_age_range_count
colnames(burials_irish_male_age_range_count) <- c("Age_ranges", "No_of_Deaths_Male")
colnames(burials_irish_female_age_range_count) <- c("Age_ranges", "No_of_Deaths_Female")
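The select, filter, count and rename steps above can also be collapsed into a single count() on two variables followed by tidyr's pivot_wider(). This is an alternative, not the original method, and it differs in one way worth knowing: an age range recorded for only one gender gets a 0 in the other column rather than being dropped, as an inner_join() of the two separate tables would drop it. A sketch on hypothetical records:

```r
library(dplyr)
library(tidyr)

# Hypothetical burial records with the same two columns used above
toy <- tibble(
  Gender     = c("Male", "Female", "Male", "Male", "Female"),
  Age_ranges = c("1-5", "1-5", "21-25", "1-5", "6-10")
)

# One row per age range, one column per gender; absent combinations become 0
toy_counts <- toy %>%
  count(Age_ranges, Gender) %>%
  pivot_wider(names_from = Gender, values_from = n, values_fill = 0L)
# 1-5: Female 1, Male 2; 21-25: Female 0, Male 1; 6-10: Female 1, Male 0
```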
# Store the age ranges, in their natural order, as a character vector
namelist <- c("Less than 1 year", "1-5", "6-10", "11-15",
"16-20", "21-25", "26-30", "31-35", "36-40",
"41-45", "46-50", "51-55", "56-60", "61-65",
"65-100", "NA")
# Then re-order each burials table according to namelist, so that it runs 'Less than 1 year', '1-5', '6-10' and so on
burials_irish_female_age_range_count %>% arrange(factor(Age_ranges, levels = namelist)) ->
burials_irish_female_age_range_count
burials_irish_male_age_range_count %>% arrange(factor(Age_ranges, levels = namelist)) ->
burials_irish_male_age_range_count
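The reason for converting to a factor is that plain text sorts alphabetically, which scatters the age ranges (for example, '11-15' sorts before '6-10'). Sorting by a factor uses the order of its levels instead. A base-R sketch using order(), the equivalent of arrange() here, on a subset of the labels:

```r
# A subset of the labels in namelist, deliberately out of order
ages <- c("11-15", "1-5", "Less than 1 year", "6-10")
age_levels <- c("Less than 1 year", "1-5", "6-10", "11-15")

# order() on a factor sorts by level position, not alphabetically
sorted_ages <- ages[order(factor(ages, levels = age_levels))]
sorted_ages  # "Less than 1 year" "1-5" "6-10" "11-15"

# A label missing from the levels becomes NA, and order() places NAs last
is.na(factor("Unknown", levels = age_levels))  # TRUE
```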
# Next merge the two datasets into a single dataframe
inner_join(burials_irish_female_age_range_count, burials_irish_male_age_range_count, by="Age_ranges") -> burials_age_ranges_count
# Tidy the column names for ease of use
colnames(burials_age_ranges_count) <- c("Ages","Female","Male")
# Melt the data into a 'long' as opposed to wide format
melt(burials_age_ranges_count) -> burials_total_age_ranges_long
## Using Ages as id variables
# Figure 3 Graph of Burials by age & gender
burials_total_age_ranges_long %>% ggplot(aes(x=factor(Ages, levels = namelist), y=value)) +
geom_bar(aes(fill=variable), position = "dodge", stat = "identity", width = 0.5) +
labs(title = "Age and Gender breakdown of Irish deaths, 1689-1708 ",
subtitle = "(St. Germain-en-Laye)",
tag = "Figure 3",
caption = "Database of St. Germain-en-Laye Registers",
x = "Age Ranges",
y = "No. of Deaths",
fill = "Gender") +
scale_fill_colorblind() + theme_classic(base_family = "Baskerville") +
theme(axis.text.x = element_text(colour = "darkslategrey", size = 10),
axis.text.y = element_text(colour = "darkslategrey", size = 14),
legend.background = element_rect(fill = "white", linewidth = 4, colour = "white"),
legend.justification = c(0, 1),
legend.position = c(0.9, 1),
axis.ticks = element_line(colour = "grey70", linewidth = 0.3),
text = element_text(family = "Baskerville"),
plot.title = element_text(face = "bold", size = 18, margin = margin(b = 10)),
plot.subtitle = element_text(size = 12, color = "darkslategrey", margin = margin(b = 25)),
plot.caption = element_text(size = 10, margin = margin(t = 10), color = "grey70", hjust = 0))
# Non-Irish Births, Marriages & Deaths
# Find all the death records whose nationality is not listed as 'Irish'
burials %>% filter(Nationality_Infer != "Irish") -> burials_non_irish
burials_non_irish
## # A tibble: 1 × 30
## Type.of.Burial Name_FR Forename Surname Nationality_Inf… Gender Religion
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Regular Georges Uuak… George Watkins English Male Roman C…
## # … with 23 more variables: Age..d.m.y. <int>, Age..d.m. <chr>,
## # Age_ranges <chr>, Occupation <chr>, Marital_Status <chr>,
## # Spouse_Name_FR <chr>, Spouse_Forename_EN <chr>, Spouse_Surname_EN <chr>,
## # Date_of_Burial <date>, Domicile_Inferred <chr>, Place_of_Burial <chr>,
## # Father_Name_FR <chr>, Father_Forename_EN <chr>, Father_Surname_EN <chr>,
## # Father_Nationality <chr>, Father_Domicile <chr>, Mother_Name_FR <chr>,
## # Mother_Forename_EN <chr>, Mother_Surname_EN <chr>, …
# Some of the deceased may be Irish themselves but have a non-Irish mother or father. The | (OR) operator in filter() keeps a record if either condition holds
burials %>% filter(Mother_Nationality != "Irish" | Father_Nationality != "Irish") -> burials_non_irish_parents
# Check your findings using the following commands
burials %>% count(Mother_Nationality)
## # A tibble: 4 × 2
## Mother_Nationality n
## <chr> <int>
## 1 English 3
## 2 French 4
## 3 Irish 176
## 4 <NA> 204
burials %>% count(Father_Nationality)
## # A tibble: 4 × 2
## Father_Nationality n
## <chr> <int>
## 1 English 3
## 2 French 1
## 3 Irish 200
## 4 <NA> 183
burials_non_irish_parents %>% view()
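The parent nationality counts above include many NA values, so it helps to know how filter() treats them: it keeps only rows where the condition is TRUE, and comparing NA with "Irish" gives NA. With |, a record with one non-Irish parent and one unrecorded parent is kept (NA | TRUE is TRUE), while a record with both parents unrecorded is dropped. A base-R sketch with hypothetical values:

```r
# Hypothetical parent nationalities; NA means none was recorded
mother <- c("Irish", "English", NA, NA)
father <- c("Irish", "Irish", "French", NA)

cond <- mother != "Irish" | father != "Irish"
cond          # FALSE TRUE TRUE NA

# filter() keeps only the TRUE positions, which which() also returns
which(cond)   # 2 3
```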
# Both dataframes come from the same source and so have identical columns, which lets rbind() stack them into a single table
rbind(burials_non_irish, burials_non_irish_parents) -> burials_non_irish
# Next move onto the marriages dataset
marriages %>% filter(Bride_Nationality_infer != "Irish" | Groom_Nationality_infer != "Irish") -> marriages_non_irish
# For the baptisms dataset, filter on the child and on each parent separately, as there are too many combinations of nationality across the three people to handle in one condition
baptisms %>% filter(Mother_Nationality_infer != "Irish") -> baptisms_non_irish_mother
baptisms %>% filter(Child_Nationality_Infer != "Irish") -> baptisms_non_irish_child
baptisms %>% filter(Father_Nationality_infer != "Irish") -> baptisms_non_irish_father
rbind(baptisms_non_irish_child, baptisms_non_irish_father, baptisms_non_irish_mother) -> baptisms_non_irish
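One thing to watch with this approach: a baptism where, say, both the child and the father are non-Irish passes two of the filters and therefore appears twice after rbind(). If each record should be counted once, duplicates can be removed with unique() in base R or dplyr's distinct(). A sketch on hypothetical records:

```r
# Hypothetical baptisms: record 2 has both a non-Irish child and a non-Irish father
bap <- data.frame(
  id     = 1:3,
  Child  = c("Irish", "French", "Irish"),
  Father = c("Irish", "French", "English")
)

# subset(), like filter(), keeps rows where the condition is TRUE
non_irish_child  <- subset(bap, Child != "Irish")
non_irish_father <- subset(bap, Father != "Irish")

stacked <- rbind(non_irish_child, non_irish_father)
nrow(stacked)   # 3: record 2 appears twice

# unique() keeps only the first copy of any fully duplicated row (as dplyr::distinct() does)
deduped <- unique(stacked)
nrow(deduped)   # 2
```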
# Check for non-Irish populations in the abjurations dataset
abjurations %>% count(Nationality)
## # A tibble: 1 × 2
## Nationality n
## <chr> <int>
## 1 Irish 21
# All 21 abjurations are recorded as Irish, so this dataset is not part of the next analysis
# Non-Irish Marriages
marriages_non_irish %>% count(Bride_Nationality_infer) %>% filter(Bride_Nationality_infer != "Irish") -> marriages_non_irish_brides
colnames(marriages_non_irish_brides) <- c("Nationality", "Brides")
marriages_non_irish %>% count(Groom_Nationality_infer) %>% filter(Groom_Nationality_infer != "Irish") -> marriages_non_irish_grooms
colnames(marriages_non_irish_grooms) <- c("Nationality", "Grooms")
inner_join(marriages_non_irish_brides, marriages_non_irish_grooms, by="Nationality") -> marriages_non_irish_brides_grooms
# The kableExtra package, which extends knitr::kable(), produces simple, clean tables that display tabular data clearly
marriages_non_irish_brides_grooms %>% kbl() %>% kable_minimal() %>% add_header_above(c("", "Non-Irish Spouses" = 2))
Non-Irish Spouses

| Nationality | Brides | Grooms |
|---|---|---|
| English | 17 | 7 |
| French | 10 | 2 |
# Use the following code to create a table breakdown of the nationality of Witnesses to Marriages
marriages %>% count(Witness_1_Nationality_infer) -> marriages_witness_1_nationality
colnames(marriages_witness_1_nationality) <- c("Nationality", "Witness_1")
marriages %>% count(Witness_2_Nationality_infer) -> marriages_witness_2_nationality
colnames(marriages_witness_2_nationality) <- c("Nationality", "Witness_2")
marriages %>% count(Witness_3_Nationality_infer) -> marriages_witness_3_nationality
colnames(marriages_witness_3_nationality) <- c("Nationality", "Witness_3")
marriages %>% count(Witness_4_Nationality_infer) -> marriages_witness_4_nationality
colnames(marriages_witness_4_nationality) <- c("Nationality", "Witness_4")
marriages %>% count(Witness_5_Nationality_infer) -> marriages_witness_5_nationality
colnames(marriages_witness_5_nationality) <- c("Nationality", "Witness_5")
# inner_join() keeps only nationalities present in every table, and no Scottish witnesses appear in positions 3-5, so add a Scottish row to those three dataframes before merging
marriages_witness_3_nationality %>% add_row(Nationality = "Scottish") -> marriages_witness_3_nationality
marriages_witness_4_nationality %>% add_row(Nationality = "Scottish") -> marriages_witness_4_nationality
marriages_witness_5_nationality %>% add_row(Nationality = "Scottish") -> marriages_witness_5_nationality
# As this introduces NAs, replace them with 0
marriages_witness_3_nationality$Witness_3[is.na(marriages_witness_3_nationality$Witness_3)] <- 0
marriages_witness_4_nationality$Witness_4[is.na(marriages_witness_4_nationality$Witness_4)] <- 0
marriages_witness_5_nationality$Witness_5[is.na(marriages_witness_5_nationality$Witness_5)] <- 0
# Now merge all these into a single dataframe
marriages_witnesses_nationality <- Reduce(inner_join, list(marriages_witness_1_nationality, marriages_witness_2_nationality, marriages_witness_3_nationality, marriages_witness_4_nationality, marriages_witness_5_nationality))
## Joining, by = "Nationality"
## Joining, by = "Nationality"
## Joining, by = "Nationality"
## Joining, by = "Nationality"
marriages_witnesses_nationality %>% kbl() %>% kable_minimal() %>% add_header_above(c("", "Breakdown of the Nationality of the Witnesses to Marriages" = 5))
Breakdown of the Nationality of the Witnesses to Marriages

| Nationality | Witness_1 | Witness_2 | Witness_3 | Witness_4 | Witness_5 |
|---|---|---|---|---|---|
| English | 9 | 7 | 6 | 3 | 1 |
| French | 10 | 18 | 14 | 8 | 1 |
| Irish | 140 | 125 | 98 | 58 | 18 |
| Scottish | 2 | 1 | 0 | 0 | 0 |
| NA | 14 | 24 | 57 | 106 | 155 |
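Reduce() applies a two-table join repeatedly: Reduce(f, list(a, b, c)) computes f(f(a, b), c). The sketch below illustrates this, and why the Scottish rows were needed, using base R's merge() (an inner join by default) in place of inner_join() and hypothetical witness counts:

```r
# Hypothetical witness counts; no Scottish witness appears in the third position
w1 <- data.frame(Nationality = c("English", "Irish", "Scottish"), Witness_1 = c(4, 50, 1))
w2 <- data.frame(Nationality = c("English", "Irish", "Scottish"), Witness_2 = c(3, 45, 2))
w3 <- data.frame(Nationality = c("English", "Irish"), Witness_3 = c(2, 30))

# merge() is an inner join by default, standing in for inner_join()
join_nat <- function(x, y) merge(x, y, by = "Nationality")

# Reduce() computes join_nat(join_nat(w1, w2), w3)
joined <- Reduce(join_nat, list(w1, w2, w3))
joined$Nationality   # "English" "Irish": Scottish is lost

# Adding a Scottish row with a count of 0 keeps it in the result
w3_filled <- rbind(w3, data.frame(Nationality = "Scottish", Witness_3 = 0))
joined_filled <- Reduce(join_nat, list(w1, w2, w3_filled))
joined_filled$Nationality   # "English" "Irish" "Scottish"
```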
# Non-Irish Burials
burials_non_irish %>% count(Father_Nationality) %>% filter(Father_Nationality != "Irish") -> burials_non_irish_fathers
colnames(burials_non_irish_fathers) <- c("Nationality", "Fathers")
burials_non_irish %>% count(Mother_Nationality) %>% filter(Mother_Nationality != "Irish") -> burials_non_irish_mothers
colnames(burials_non_irish_mothers) <- c("Nationality", "Mothers")
burials_non_irish %>% count(Nationality_Infer) %>% filter(Nationality_Infer != "Irish") -> burials_non_irish_deceased
colnames(burials_non_irish_deceased) <- c("Nationality", "Deceased")
# No deceased person is recorded as French, so add a French row to the deceased table and change its NA count to 0
burials_non_irish_deceased %>% add_row(Nationality = "French") -> burials_non_irish_deceased
burials_non_irish_deceased$Deceased[is.na(burials_non_irish_deceased$Deceased)] <- 0
burials_non_irish_parents_deceased <- Reduce(inner_join,
list(burials_non_irish_deceased,
burials_non_irish_mothers,
burials_non_irish_fathers))
## Joining, by = "Nationality"
## Joining, by = "Nationality"
# Table Breakdown of non-Irish deaths and non-Irish parents of deceased
burials_non_irish_parents_deceased %>% kbl() %>% kable_minimal(full_width = FALSE, html_font = "Baskerville") %>% add_header_above(c("", "Non-Irish deceased & non-Irish parents of deceased" = 3))
Non-Irish deceased & non-Irish parents of deceased

| Nationality | Deceased | Mothers | Fathers |
|---|---|---|---|
| English | 1 | 3 | 3 |
| French | 0 | 4 | 1 |
# Use floor_date() to group the daily burial counts in burials_irish_dates by month
burials_irish_dates %>% group_by(Month=floor_date(Date, "month")) %>% summarize(No_of_Burials=sum(No)) -> burials_irish_dates_monthly
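Note that this produces rows only for months in which at least one burial was recorded; a month with none is absent rather than 0. If the time series should treat such months as zero (a judgement about how complete the registers are), they can be filled in before conversion to xts. A base-R sketch with a hypothetical gap in March 1693:

```r
# Hypothetical monthly totals with no burials recorded in March 1693
monthly <- data.frame(
  Month         = as.Date(c("1693-01-01", "1693-02-01", "1693-04-01")),
  No_of_Burials = c(2, 5, 1)
)

# Every month between the first and last, as first-of-month dates
all_months <- data.frame(Month = seq(min(monthly$Month), max(monthly$Month), by = "month"))

# A left join keeps every month; months without burials get NA, which is then set to 0
filled <- merge(all_months, monthly, by = "Month", all.x = TRUE)
filled$No_of_Burials[is.na(filled$No_of_Burials)] <- 0
filled$No_of_Burials   # 2 5 0 1
```

The filled table has the same Month and No_of_Burials columns as burials_irish_dates_monthly, so it could be passed to xts() in the same way.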
# dygraphs requires time series data, so use the xts package to convert the monthly table into an xts object ordered by month
xts(x=burials_irish_dates_monthly[,-1], order.by = burials_irish_dates_monthly$Month) -> burials_irish_dates_monthly_xts
# This next section is an adaptation of the rainfall analysis undertaken by Chris Brunsdon
burials_irish_dates_monthly_xts %>% dygraph()
burials_irish_dates_monthly_xts %>% dygraph(width=800,height=300) %>% dyRangeSelector %>% dyRoller(rollPeriod = 600)
# Monthly Deaths - Table
# First, use timetk to add calendar columns (such as year and month label) derived from the Month column
tk_augment_timeseries_signature(burials_irish_dates_monthly) -> burials_irish_dates_monthly_ts
## tk_augment_timeseries_signature(): Using the following .date_var variable: Month
# Select the Month, Year and keep the No_of_Burials
burials_irish_dates_monthly_ts %>% select(year, month.lbl, No_of_Burials) -> burials_irish_dates_monthly_tbl
colnames(burials_irish_dates_monthly_tbl) <- c("Year", "Month", "Burials")
# This creates an interactive, filterable table of the monthly burials, grouped by year
burials_irish_dates_monthly_tbl %>% reactable(groupBy = "Year", striped = T, filterable = T)