Creating a Workspace

When you open RStudio for the first time you will see four panes. You can rearrange these to suit your preference, and you can also change the text colour, font type and size: RStudio is very customisable. R reads and writes files relative to a working directory, so you need to set one for your analysis. My advice is to create a dedicated folder and link RStudio to it using the Session -> Set Working Directory -> Choose Directory… option in the menu bar.

This lecture is designed to show you the potential for using R for spatial analysis when applied to historical questions.

You must have the following installed on your computer before we begin.

R
RStudio

Open RStudio and type the following in the console:

getwd()

## [1] "/Users/jackkavanagh/Dropbox/R_SGL_Lessons"

Any files you want to import into R should be placed in this folder, and any files you export from R will be saved there too. The advantage is that your files are not scattered across obscure folders on your hard drive: you choose where they live. For Windows users, the simplest place would be a folder on your Desktop.
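If you prefer code to menus, base R's setwd() sets the working directory directly. The sketch below assumes a hypothetical folder called R_SGL_Lessons, created inside R's temporary directory so the example runs on any computer; in practice you would use the path to your own folder.

```r
# Hypothetical project folder, created inside R's session temp directory so the
# example is safe to run; replace with the path to your own folder
project_dir <- file.path(tempdir(), "R_SGL_Lessons")
dir.create(project_dir, showWarnings = FALSE)

# setwd() changes the working directory and invisibly returns the previous one
old_wd <- setwd(project_dir)
getwd()

# Relative file names are resolved against the working directory,
# so this file is written inside project_dir
write.csv(data.frame(x = 1:3), "example.csv", row.names = FALSE)
file.exists(file.path(project_dir, "example.csv"))

# Restore the original working directory
setwd(old_wd)
```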

Using the R Libraries

We are only using three packages (often called libraries) for our data analysis: tidyverse, lubridate and ggthemes.

I already sent along a short script explaining how to install them, but here is a recap.

install.packages(c("tidyverse", "ggthemes", "lubridate"))

This tells R to download these packages from CRAN, R's central package repository, and install them. You only need to install a package once, but you need to tell R to use it in each new session. You do this with the library() command:

library(tidyverse)
library(ggthemes)
library(lubridate)

Key Commands to Remember

The pipe operator %>% is used throughout this lecture to chain commands together: it takes the result on its left and passes it as the first argument to the function on its right.

    marriages %>% select(Groom_Occupation_Type)

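The $ operator extracts a single column from a data frame by name. In RStudio, typing marriages$ brings up a list of the available columns to choose from.

    marriages$Groom_Occupation_Type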

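Please run these in the Console section of RStudio. They assume the marriages data is already loaded; if it is not, come back to them after the Importing the Data section.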
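Because marriages only exists once the data has been imported, here is a self-contained sketch of both commands on a small stand-in table; marriages_demo and its values are made up for illustration. It shows that x %>% f(y) gives the same result as f(x, y), and that $ returns a plain vector whereas select() returns a one-column table.

```r
library(dplyr)

# A tiny, hypothetical stand-in for the marriages table
marriages_demo <- tibble(
  Groom_Forename_EN     = c("Aaron", "Alan", "Andrew"),
  Groom_Occupation_Type = c("Military", "Noble", NA)
)

# %>% passes the left-hand side in as the first argument, so these are equivalent
piped  <- marriages_demo %>% select(Groom_Occupation_Type)
nested <- select(marriages_demo, Groom_Occupation_Type)
identical(piped, nested)

# select() returns a one-column tibble; $ returns the column as a plain vector
piped
marriages_demo$Groom_Occupation_Type
```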

The simple logical operators for the filter() command are:

    & (and)

    | (or)

    ! (not)
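A short sketch of the three operators inside filter(), using a made-up table (burials_demo and its values are hypothetical). Note that filter() drops rows where the condition evaluates to NA, which matters for historical records with gaps.

```r
library(dplyr)

# Hypothetical mini burial register; William's nationality was not recorded
burials_demo <- tibble(
  Name              = c("Patrick", "Mary", "Alexander", "William"),
  Nationality_Infer = c("Irish", "Irish", "Scottish", NA),
  Gender            = c("Male", "Female", "Male", "Male")
)

# & (and): both conditions must be TRUE
irish_men <- burials_demo %>% filter(Nationality_Infer == "Irish" & Gender == "Male")

# | (or): either condition may be TRUE
irish_or_scottish <- burials_demo %>%
  filter(Nationality_Infer == "Irish" | Nationality_Infer == "Scottish")

# ! (not): negates a condition
not_irish <- burials_demo %>% filter(!(Nationality_Infer == "Irish"))

irish_men$Name          # "Patrick"
irish_or_scottish$Name  # "Patrick" "Mary" "Alexander"
not_irish$Name          # "Alexander" (William's NA row is dropped)
```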

Joining Datasets - Row

Using base R it is possible to join datasets either by row, with rbind(), or by column, with cbind().

# Create data frame
df <- data.frame(a=c(1, 3, 3, 4, 5),
                 b=c(7, 7, 8, 3, 2),
                 c=c(3, 3, 6, 6, 8))

df

##   a b c
## 1 1 7 3
## 2 3 7 3
## 3 3 8 6
## 4 4 3 6
## 5 5 2 8

# Create a vector with one value for each column of df

df2 <- c(11, 14, 17)

df2

## [1] 11 14 17

# rbind() adds the vector to the bottom of df as a new row; this works because the vector has one value for each of the three columns
df_new <- rbind(df, df2)

df_new

##    a  b  c
## 1  1  7  3
## 2  3  7  3
## 3  3  8  6
## 4  4  3  6
## 5  5  2  8
## 6 11 14 17
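rbind() can also stack two data frames. When both arguments are data frames it matches columns by name rather than by position, so the column order of the second table does not matter, although both must have the same column names. A minimal sketch with made-up values:

```r
df_top    <- data.frame(a = c(1, 3), b = c(7, 7), c = c(3, 3))
df_bottom <- data.frame(c = 9, a = 11, b = 12)   # same columns, different order

# Columns are matched by name, so 11 lands in column a, 12 in b and 9 in c
stacked <- rbind(df_top, df_bottom)
stacked
# a: 1 3 11; b: 7 7 12; c: 3 3 9
```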

Joining Datasets - Column

This is an example of how to join by column: cbind() adds a vector with one value per row of the data frame as a new column.

# Create data frame
df <- data.frame(a=c(1, 3, 3, 4, 5),
                 b=c(7, 7, 8, 3, 2),
                 c=c(3, 3, 6, 6, 8))

df

##   a b c
## 1 1 7 3
## 2 3 7 3
## 3 3 8 6
## 4 4 3 6
## 5 5 2 8

# Define vector
df2 <- c(11, 14, 16, 17, 22)
df2

## [1] 11 14 16 17 22

# cbind vector to data frame
df_new <- cbind(df, df2)

df_new

##   a b c df2
## 1 1 7 3  11
## 2 3 7 3  14
## 3 3 8 6  16
## 4 4 3 6  17
## 5 5 2 8  22
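Notice that cbind() named the new column after the vector, df2. You can choose a more descriptive name by naming the argument, or add a column directly with $. The sketch below reuses the same made-up values; the column names d and e are arbitrary.

```r
df <- data.frame(a = c(1, 3, 3, 4, 5),
                 b = c(7, 7, 8, 3, 2),
                 c = c(3, 3, 6, 6, 8))
df2 <- c(11, 14, 16, 17, 22)

# Name the new column as part of the cbind() call
df_named <- cbind(df, d = df2)
names(df_named)   # "a" "b" "c" "d"

# Or assign a new column directly; the vector needs one value per row
df_named$e <- df2 * 2
names(df_named)   # "a" "b" "c" "d" "e"
```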

Adjusting the dates

There are a number of different ways to convert text into dates. Because these sample dates are written month first, we can use the mdy() (month-day-year) function from the lubridate package to make a relatively simple change.

# Create some sample dates 
begin <- c("May 11, 1996", "September 12, 2001", "July 1, 1988")
end <- c("7/8/97","10/23/02","1/4/91")
class(begin)

## [1] "character"

class(end)

## [1] "character"

(begin <- mdy(begin))

## [1] "1996-05-11" "2001-09-12" "1988-07-01"

(end <- mdy(end))

## [1] "1997-07-08" "2002-10-23" "1991-01-04"

class(begin)

## [1] "Date"

class(end)

## [1] "Date"

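mdy() is one of a family of lubridate parsers named after the order of the date parts: dmy() handles day-first dates, as used in French and British records, and ymd() handles ISO dates such as those in our CSV files. Once parsed, year() and month() pull out individual components. The sample dates below are made up for illustration.

```r
library(lubridate)

# Day-first dates, as they might be written in a parish register
register_dates <- dmy(c("08/01/1689", "22/03/1689", "14/06/1690"))
register_dates          # "1689-01-08" "1689-03-22" "1690-06-14"

# ISO (year-month-day) strings parse with ymd()
ymd("1689-01-08") == register_dates[1]   # TRUE

# Extract components for grouping or filtering
year(register_dates)    # 1689 1689 1690
month(register_dates)   # 1 3 6
```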

Importing the Data

The Saint Germain-en-Laye database is structured as a series of tables: Marriages, Burials, Baptisms, Abjurations and Signatures.

I’ve exported these from Microsoft Excel as CSV (Comma Separated Values) files, as these are easy to import and use an open format, meaning you don’t need a Microsoft, Apple or Google subscription to open them.

read.csv('baptisms.csv', 
         stringsAsFactors = FALSE, 
         na.strings = c("NA", " ", "")) -> baptisms

To import a file into the R environment, you ‘read’ it into the system. Each part of the code above performs a separate task, similar to the options you choose when importing a file into Excel or Word.

The first part is straightforward: you are reading the file ‘baptisms.csv’, and the -> baptisms at the end stores the result in an object called baptisms.

The second part, stringsAsFactors = FALSE, tells R to keep text columns as character data rather than converting them into factors, R's type for categorical data with a fixed set of levels. (Since R 4.0.0 this is the default, but stating it explicitly does no harm.) If a column needs to be treated as a category, for example one whose values must appear in a specific order, you can convert it later with factor(). There is a longer explanation with examples at this link.

Finally, the na.strings segment tells R to treat any cell that contains "NA", a single space or nothing at all as missing, and to record it as NA.
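To see na.strings at work without the real files, the sketch below reads a tiny, made-up CSV from a string using read.csv()'s text argument. The empty cells, the single space and the literal NA all become NA, while genuine values are kept.

```r
sample_csv <- "Name,Parish,Notes
Patrick,St Germain,
Mary, ,NA
James,,buried"

demo <- read.csv(text = sample_csv,
                 stringsAsFactors = FALSE,
                 na.strings = c("NA", " ", ""))
demo
is.na(demo)        # which cells were read as missing
sum(is.na(demo))   # 4
```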

Repeat the same process to import all the other CSV files.

read.csv('burials.csv', 
         stringsAsFactors = FALSE, 
         na.strings = c("NA", " ", "")) -> burials

read.csv('marriages.csv', 
         stringsAsFactors = FALSE, 
         na.strings = c("NA", " ", "")) -> marriages

read.csv('abjurations.csv', 
         stringsAsFactors = FALSE, 
         na.strings = c("NA", " ", "")) -> abjurations

read.csv('signatures.csv', 
         stringsAsFactors = FALSE, 
         na.strings = c("NA", " ", "")) -> signatures
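Rather than repeating the call five times, you could wrap it in a small helper and read every file in one go. read_registers() below is a hypothetical helper, not part of any package. To keep the example self-contained it first writes two tiny made-up CSV files to a temporary folder; with the real data you would call read_registers(c("baptisms", "burials", "marriages", "abjurations", "signatures")).

```r
# Hypothetical helper: read <name>.csv for each name, with the same settings as above
read_registers <- function(table_names, dir = ".") {
  paths <- file.path(dir, paste0(table_names, ".csv"))
  tables <- lapply(paths, read.csv,
                   stringsAsFactors = FALSE,
                   na.strings = c("NA", " ", ""))
  setNames(tables, table_names)   # a named list: one data frame per file
}

# Made-up sample files in a temporary folder
tmp <- tempfile("registers")
dir.create(tmp)
write.csv(data.frame(Child = c("Therese", "Renald")),
          file.path(tmp, "baptisms.csv"), row.names = FALSE)
write.csv(data.frame(Name = c("Patrick", "Mary", "Alexander")),
          file.path(tmp, "burials.csv"), row.names = FALSE)

registers <- read_registers(c("baptisms", "burials"), dir = tmp)
sapply(registers, nrow)   # baptisms 2, burials 3
```

The result is a named list, so registers$baptisms holds one table. If you want each table as its own object, base R's list2env(registers, envir = globalenv()) would create them, overwriting any existing objects with the same names.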

Examine the files you have Imported

Once imported, each file is stored as a data.frame, R's equivalent of a spreadsheet table. We can get an overview of the data we have just imported using the glimpse() command.

The output lists each column together with its class of data. In this case everything is chr (character). Numerical data usually appears as int (integer) or dbl (double, i.e. decimal numbers). The Date_of_Baptism column is stored as text, so we will need to change it to the Date type.

glimpse(baptisms)

## Rows: 954
## Columns: 69
## $ Child_Name_FR                                      <chr> "Therese Margueritt…
## $ Child_Forename_EN                                  <chr> "Therese Margueritt…
## $ Child_Surname_EN                                   <chr> "Melfort", "Macdonn…
## $ Child_Nationality_Infer                            <chr> "Irish", "Irish", "…
## $ Gender                                             <chr> "Female", "Male", "…
## $ Child_Religion_Infer                               <chr> "Roman Catholic", "…
## $ Date_of_Birth                                      <chr> "1689-01-08", "1689…
## $ Date_of_Birth_Partial                              <chr> NA, NA, NA, NA, NA,…
## $ Date_of_Baptism                                    <chr> "1689-01-08", "1689…
## $ Place_of_Baptism                                   <chr> "St Germain-en-Laye…
## $ Church_of_Baptismal_Ceremony                       <chr> "Parish church", "P…
## $ Street_of_Residence                                <chr> NA, NA, NA, NA, NA,…
## $ Parents_Status                                     <chr> "Married", "Married…
## $ Father_Name_FR                                     <chr> "Jean Comte de Melf…
## $ Father_Forename_EN                                 <chr> "John", "Randall", …
## $ Father_Surname_EN                                  <chr> "Melfort", "Macdonn…
## $ Father_Nationality_Stated                          <chr> NA, NA, NA, NA, NA,…
## $ Father_Nationality_Infer                           <chr> "Irish", "Irish", "…
## $ Father_Rank                                        <chr> "Count", "Gentleman…
## $ Father_Occupation                                  <chr> "Secretary of the r…
## $ Father_Occupation_Type                             <chr> "Noble", "Noble", "…
## $ Member_of_Jacobite_Army                            <chr> "Yes", NA, "Yes", N…
## $ Father_Residence                                   <chr> "St Germain-en-Laye…
## $ Father_Register_Signature                          <chr> "No", "No", "No", "…
## $ Mother_Name_FR                                     <chr> "Euphenia Vicalace"…
## $ Mother_Forenam_EN                                  <chr> "Euphenia", "Hannah…
## $ Mother_Surname_EN                                  <chr> "Vicalace", "Roche"…
## $ Mother_Nationality_Stated                          <chr> NA, NA, NA, "Englis…
## $ Mother_Nationality_Infer                           <chr> NA, "Irish", "Irish…
## $ Mother_Religion_Infer                              <chr> "Roman Catholic", "…
## $ Mother_Rank                                        <chr> NA, NA, NA, NA, NA,…
## $ Mother_Occupation                                  <chr> NA, NA, NA, NA, NA,…
## $ Mother_Occupation_Type                             <chr> NA, NA, NA, NA, NA,…
## $ Mother_Residence                                   <chr> "St Germain-en-Laye…
## $ Mother_Register_Signature                          <chr> "No", "No", "No", "…
## $ Godfather_Name_FR                                  <chr> "Jacques Comte de D…
## $ Godfather_Forename_EN                              <chr> "Jacque", NA, NA, "…
## $ Godfather_Surname_EN                               <chr> "Drummond", NA, "St…
## $ Godfather_Nationality_Stated                       <chr> NA, NA, "English", …
## $ Godfather_Nationality_Infer                        <chr> "Scottish", NA, "En…
## $ Godfather_Rank                                     <chr> "Count", NA, "Lord"…
## $ Godfather_Occupation                               <chr> "Count", NA, "Lord"…
## $ Godfather_Occupation_Type                          <chr> "Noble", NA, "Noble…
## $ Godfather_Residence                                <chr> "St Germain-en-Laye…
## $ Godfather_familial_relationship_to_father_of_child <chr> NA, NA, NA, NA, NA,…
## $ Godfather_familial_relationship_to_mother_of_child <chr> NA, NA, NA, NA, NA,…
## $ Godfather_Register_Signature                       <chr> "Yes", "Yes", "Yes"…
## $ Godmother_Name_FR                                  <chr> "Marie Conny", "Ann…
## $ Godmother_Forename_EN                              <chr> "Marie", "Anne", "H…
## $ Godmother_Surname_EN                               <chr> "Cormy", "Bagnall",…
## $ Godmother_Nationality_Stated                       <chr> NA, NA, "English", …
## $ Godmother_Nationality_Infer                        <chr> "Irish", "Irish", "…
## $ Godmother_Religion_Infer                           <chr> "Roman Catholic", "…
## $ Godmother_Rank                                     <chr> NA, NA, NA, "Wife o…
## $ Godmother_Occupation                               <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Occupation_Type                          <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Father_Name_FR                           <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Father_Forename_EN                       <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Father_Surname_EN                        <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Mother_Name_FR                           <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Mother_Forename_EN                       <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Mother_Surname_EN                        <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Residence                                <chr> "St Germain-en-Laye…
## $ Godmother_relationship_to_father_of_child          <chr> NA, NA, NA, NA, NA,…
## $ Godmother_relationship_to_mother_of_child          <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Register_Signature                       <chr> "Yes", "Yes", "Yes"…
## $ Officiating_Priest_Nationality                     <chr> "French", "French",…
## $ Additional_Notes                                   <chr> NA, NA, NA, NA, NA,…
## $ Archival_Reference                                 <chr> "5MI 1734 [1168921/…

Tidying the Data

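We are going to do two simple tasks here. The first is to convert the data frames into tibbles, the tidyverse's version of a data frame. It is a relatively simple procedure: the data stay the same, but tibbles print more compactly, showing only the first few rows, the columns that fit on screen and the type of each column.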

baptisms <- as_tibble(baptisms)
burials <- as_tibble(burials)
marriages <- as_tibble(marriages)
abjurations <- as_tibble(abjurations)

The second task is to convert the date columns into the Date type, which allows us to filter and group by day, month or year.

baptisms$Date_of_Baptism <- as.Date(baptisms$Date_of_Baptism)
burials$Date_of_Burial <- as.Date(burials$Date_of_Burial)
marriages$Date_of_Marriage <- as.Date(marriages$Date_of_Marriage)
abjurations$Date <- as.Date(abjurations$Date)
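One caution when converting historical dates: as.Date() expects year-month-day text such as "1689-01-08", and once it has settled on a format, any entry that does not fit it quietly becomes NA rather than raising an error. The sketch below uses made-up values to show this, then filters the parsed dates with an ordinary comparison. It is worth checking for NAs after converting, for example with sum(is.na(baptisms$Date_of_Baptism)).

```r
# Made-up sample: the third entry is not a valid date
baptism_days <- as.Date(c("1689-01-08", "1692-06-14", "unknown", "1701-11-30"))
baptism_days               # "1689-01-08" "1692-06-14" NA "1701-11-30"
sum(is.na(baptism_days))   # 1

# Date objects compare chronologically, so filtering by range is straightforward
before_1700 <- baptism_days[!is.na(baptism_days) &
                            baptism_days < as.Date("1700-01-01")]
before_1700                # "1689-01-08" "1692-06-14"
```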

Check the Results

You’ll see that the Date_of_Baptism column is now listed as <date> rather than <chr>.

glimpse(baptisms)

## Rows: 954
## Columns: 69
## $ Child_Name_FR                                      <chr> "Therese Margueritt…
## $ Child_Forename_EN                                  <chr> "Therese Margueritt…
## $ Child_Surname_EN                                   <chr> "Melfort", "Macdonn…
## $ Child_Nationality_Infer                            <chr> "Irish", "Irish", "…
## $ Gender                                             <chr> "Female", "Male", "…
## $ Child_Religion_Infer                               <chr> "Roman Catholic", "…
## $ Date_of_Birth                                      <chr> "1689-01-08", "1689…
## $ Date_of_Birth_Partial                              <chr> NA, NA, NA, NA, NA,…
## $ Date_of_Baptism                                    <date> 1689-01-08, 1689-0…
## $ Place_of_Baptism                                   <chr> "St Germain-en-Laye…
## $ Church_of_Baptismal_Ceremony                       <chr> "Parish church", "P…
## $ Street_of_Residence                                <chr> NA, NA, NA, NA, NA,…
## $ Parents_Status                                     <chr> "Married", "Married…
## $ Father_Name_FR                                     <chr> "Jean Comte de Melf…
## $ Father_Forename_EN                                 <chr> "John", "Randall", …
## $ Father_Surname_EN                                  <chr> "Melfort", "Macdonn…
## $ Father_Nationality_Stated                          <chr> NA, NA, NA, NA, NA,…
## $ Father_Nationality_Infer                           <chr> "Irish", "Irish", "…
## $ Father_Rank                                        <chr> "Count", "Gentleman…
## $ Father_Occupation                                  <chr> "Secretary of the r…
## $ Father_Occupation_Type                             <chr> "Noble", "Noble", "…
## $ Member_of_Jacobite_Army                            <chr> "Yes", NA, "Yes", N…
## $ Father_Residence                                   <chr> "St Germain-en-Laye…
## $ Father_Register_Signature                          <chr> "No", "No", "No", "…
## $ Mother_Name_FR                                     <chr> "Euphenia Vicalace"…
## $ Mother_Forenam_EN                                  <chr> "Euphenia", "Hannah…
## $ Mother_Surname_EN                                  <chr> "Vicalace", "Roche"…
## $ Mother_Nationality_Stated                          <chr> NA, NA, NA, "Englis…
## $ Mother_Nationality_Infer                           <chr> NA, "Irish", "Irish…
## $ Mother_Religion_Infer                              <chr> "Roman Catholic", "…
## $ Mother_Rank                                        <chr> NA, NA, NA, NA, NA,…
## $ Mother_Occupation                                  <chr> NA, NA, NA, NA, NA,…
## $ Mother_Occupation_Type                             <chr> NA, NA, NA, NA, NA,…
## $ Mother_Residence                                   <chr> "St Germain-en-Laye…
## $ Mother_Register_Signature                          <chr> "No", "No", "No", "…
## $ Godfather_Name_FR                                  <chr> "Jacques Comte de D…
## $ Godfather_Forename_EN                              <chr> "Jacque", NA, NA, "…
## $ Godfather_Surname_EN                               <chr> "Drummond", NA, "St…
## $ Godfather_Nationality_Stated                       <chr> NA, NA, "English", …
## $ Godfather_Nationality_Infer                        <chr> "Scottish", NA, "En…
## $ Godfather_Rank                                     <chr> "Count", NA, "Lord"…
## $ Godfather_Occupation                               <chr> "Count", NA, "Lord"…
## $ Godfather_Occupation_Type                          <chr> "Noble", NA, "Noble…
## $ Godfather_Residence                                <chr> "St Germain-en-Laye…
## $ Godfather_familial_relationship_to_father_of_child <chr> NA, NA, NA, NA, NA,…
## $ Godfather_familial_relationship_to_mother_of_child <chr> NA, NA, NA, NA, NA,…
## $ Godfather_Register_Signature                       <chr> "Yes", "Yes", "Yes"…
## $ Godmother_Name_FR                                  <chr> "Marie Conny", "Ann…
## $ Godmother_Forename_EN                              <chr> "Marie", "Anne", "H…
## $ Godmother_Surname_EN                               <chr> "Cormy", "Bagnall",…
## $ Godmother_Nationality_Stated                       <chr> NA, NA, "English", …
## $ Godmother_Nationality_Infer                        <chr> "Irish", "Irish", "…
## $ Godmother_Religion_Infer                           <chr> "Roman Catholic", "…
## $ Godmother_Rank                                     <chr> NA, NA, NA, "Wife o…
## $ Godmother_Occupation                               <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Occupation_Type                          <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Father_Name_FR                           <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Father_Forename_EN                       <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Father_Surname_EN                        <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Mother_Name_FR                           <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Mother_Forename_EN                       <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Mother_Surname_EN                        <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Residence                                <chr> "St Germain-en-Laye…
## $ Godmother_relationship_to_father_of_child          <chr> NA, NA, NA, NA, NA,…
## $ Godmother_relationship_to_mother_of_child          <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Register_Signature                       <chr> "Yes", "Yes", "Yes"…
## $ Officiating_Priest_Nationality                     <chr> "French", "French",…
## $ Additional_Notes                                   <chr> NA, NA, NA, NA, NA,…
## $ Archival_Reference                                 <chr> "5MI 1734 [1168921/…

Filtering for Specifics - Nationality

Because we are using the tidyverse, filtering by a specific characteristic takes a single line of code: filter() keeps only the rows for which the condition is TRUE. Rows where the condition is NA, for example where no nationality was inferred, are dropped as well.

# This filters the burials and creates a new subset of Irish only burials

burials %>% filter(Nationality_Infer == "Irish") -> burials_irish

# Compare the number of rows with the full burials table to see the difference in size
burials_irish

## # A tibble: 897 × 33
##    Type.of.Burial Name_FR Foren…¹ Surna…² Natio…³ Gender Relig…⁴ Age_F…⁵ Age_P…⁶
##    <chr>          <chr>   <chr>   <chr>   <chr>   <chr>  <chr>     <int> <chr>  
##  1 Charitable     Patric… Patrick MacCor… Irish   Male   Roman …       2 <NA>   
##  2 Charitable     Marie … Mary T… Maher   Irish   Female Roman …      NA 20 mon…
##  3 Charitable     Alexan… Alexan… Gordon  Irish   Male   Roman …      NA 18 mon…
##  4 Charitable     George… George… Willia… Irish   Male   Roman …       2 <NA>   
##  5 Charitable     Alexis… Alexis  MacLau… Irish   Male   Roman …       2 <NA>   
##  6 Charitable     Guilla… William O'Brien Irish   Male   Roman …       2 <NA>   
##  7 Charitable     Hanain… Hanain… MacDon… Irish   Female Roman …       3 <NA>   
##  8 Charitable     Jacque… Jacque  Baggott Irish   Male   Roman …      NA 18 mon…
##  9 Regular        Honora… Honora… Jennin… Irish   Female Roman …      NA 16 mon…
## 10 Regular        Jacque… Jacque  Carbery Irish   Male   Roman …       1 <NA>   
## # … with 887 more rows, 24 more variables: Age_Ranges <chr>, Occupation <chr>,
## #   Occupation_Type <chr>, Marital_Status <chr>, Spouse_Name_FR <chr>,
## #   Spouse_Forename_EN <chr>, Spouse_Surname_EN <chr>, Date_of_Burial <date>,
## #   Domicile_Inferred <chr>, Place_of_Burial <chr>, Street_Name <chr>,
## #   Father_Name_FR <chr>, Father_Forename_EN <chr>, Father_Surname_EN <chr>,
## #   Father_Nationality <chr>, Father_Domicile <chr>, Mother_Name_FR <chr>,
## #   Mother_Forename_EN <chr>, Mother_Surname_EN <chr>, …
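To filter on several values at once, %in% is neater than chaining | conditions, and is.na() lets you find the records where a value was not recorded at all. A sketch on a made-up table (burials_demo is hypothetical):

```r
library(dplyr)

burials_demo <- tibble(
  Name              = c("Patrick", "Mary", "Alexander", "Jean"),
  Nationality_Infer = c("Irish", "Irish", "Scottish", NA)
)

# Keep rows whose nationality is any of the listed values (NA never matches)
irish_or_scottish <- burials_demo %>%
  filter(Nationality_Infer %in% c("Irish", "Scottish"))

# Keep only the rows where no nationality was inferred
unrecorded <- burials_demo %>% filter(is.na(Nationality_Infer))

nrow(irish_or_scottish)   # 3
nrow(unrecorded)          # 1
```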

Filter Baptisms

# Each baptism entry centres on one child, so we require the child and both parents to be inferred as Irish. The order of the conditions does not change the result; listing the child first simply makes the code easier to read

baptisms %>% filter(Child_Nationality_Infer == "Irish" & Father_Nationality_Infer == "Irish" & Mother_Nationality_Infer == "Irish") -> baptisms_irish

# Check the results
baptisms_irish

## # A tibble: 727 × 69
##    Child_Nam…¹ Child…² Child…³ Child…⁴ Gender Child…⁵ Date_…⁶ Date_…⁷ Date_of_…⁸
##    <chr>       <chr>   <chr>   <chr>   <chr>  <chr>   <chr>   <chr>   <date>    
##  1 Renald Mag… "Ronal… Macdon… Irish   Male   Roman … 1689-1… <NA>    1689-03-22
##  2 Henriette … "Henri… Dumbar… Irish   Female Roman … 1689-0… <NA>    1689-06-14
##  3 Marie Brue… "Marie" Breret… Irish   Female Roman … 1689-1… <NA>    1689-10-28
##  4 Louise Sau… "Louis… Sewell  Irish   Female Roman … 1689-1… <NA>    1689-12-27
##  5 Jacques Ad… "James" Adams   Irish   Male   Roman … 1690-0… <NA>    1690-02-04
##  6 Anne Winif… "Anne … Strick… Irish   Female Roman … 1690-0… <NA>    1690-03-28
##  7 Marie Ursu… "Mary … Riva    Irish   Female Roman … 1690-0… <NA>    1690-04-03
##  8 Marie Fitz… "Mary"  Fitzpa… Irish   Female Roman … 1690-0… <NA>    1690-05-11
##  9 Louis Jack… "Louis" Jackson Irish   Male   Roman … 1690-0… <NA>    1690-05-30
## 10 Louise Lam… "Louis… Lambert Irish   Female Roman … 1690-0… <NA>    1690-08-17
## # … with 717 more rows, 60 more variables: Place_of_Baptism <chr>,
## #   Church_of_Baptismal_Ceremony <chr>, Street_of_Residence <chr>,
## #   Parents_Status <chr>, Father_Name_FR <chr>, Father_Forename_EN <chr>,
## #   Father_Surname_EN <chr>, Father_Nationality_Stated <chr>,
## #   Father_Nationality_Infer <chr>, Father_Rank <chr>, Father_Occupation <chr>,
## #   Father_Occupation_Type <chr>, Member_of_Jacobite_Army <chr>,
## #   Father_Residence <chr>, Father_Register_Signature <chr>, …

Filter Marriages

# Each marriage is a single event, so we only filter on the bride's and groom's nationality

marriages %>% filter(Groom_Nationality_Infer == "Irish" & Bride_Nationality_Infer == "Irish") -> marriages_irish

# Check the results
marriages_irish

## # A tibble: 192 × 76
##    Groom_Name_FR Groom…¹ Groom…² Groom…³ Groom…⁴ Groom…⁵ Groom…⁶ Groom…⁷ Groom…⁸
##    <chr>         <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
##  1 Aaron Huskin  Aaron   Huskin  Marie … Mary    Waldren No      Jean H… "John" 
##  2 Alain MacDon… Alan    MacDon… Marie … Mary A… MacClo… No      Daniel… "Danie…
##  3 Alexandre Ma… Alexan… MacCle… <NA>    <NA>    <NA>    <NA>    <NA>     <NA>  
##  4 Alexandre Pl… Alexan… Plunke… Jeanne… Jean    O'Mara  No      Mauric… "Mauri…
##  5 André Barret  Andrew  Barrett Anne L… Anne    Long    No      Jacque… "James"
##  6 Anthoine Omu… Anthony O'Mulr… Cather… Cather… Nixon   No      Jeremi… "Jerem…
##  7 Bernard Henry Bernard Henry   Mabel … Mabel   Nolan   No      Patric… "Patri…
##  8 Bernard Hues  Bernard Hughes  <NA>    <NA>    <NA>    <NA>    <NA>     <NA>  
##  9 Bernard Jolly Bernard Jolly   Cecile… Cecille Reilly  No      Louis … "Louis"
## 10 Bernard Morp… Bernard Murphy  Rose C… Rose    Connol… No      Terenc… "Teren…
## # … with 182 more rows, 67 more variables: Groom_Father_Surname_EN <chr>,
## #   Groom_Father_Deceased <chr>, Groom_Age <int>, Widower <chr>,
## #   Groom_Nationality_Stated <chr>, Groom_Nationality_Infer <chr>,
## #   Groom_Religion_Infer <chr>, Groom_Rank <chr>, Groom_Occupation <chr>,
## #   Groom_Occupation_Type <chr>, Groom_Residence <chr>,
## #   Date_of_Marriage <date>, Place_of_Marriage <chr>, Church_of_Ceremony <chr>,
## #   Bride_Name_FR <chr>, Bride_Forename_EN <chr>, Bride_Surname_EN <chr>, …

Creating Date Ranges

To count the number of baptisms on each date, we use count(), which creates a new table. count() names its new column n, so it is useful to rename the columns afterwards to better represent what the data actually refers to.

# Create a new dataframe of baptisms
baptisms_irish %>% count(Date_of_Baptism) -> baptisms_irish_dates

# Check the results, we now have a smaller dataframe showing all the Baptisms per day
baptisms_irish_dates

## # A tibble: 694 × 2
##    Date_of_Baptism     n
##    <date>          <int>
##  1 1689-03-22          1
##  2 1689-06-14          1
##  3 1689-10-28          1
##  4 1689-12-27          1
##  5 1690-02-04          1
##  6 1690-03-28          1
##  7 1690-04-03          1
##  8 1690-05-11          1
##  9 1690-05-30          1
## 10 1690-08-17          1
## # … with 684 more rows

Renaming the columns prevents confusion later on, especially once we start combining the baptism, marriage and burial tables.

colnames(baptisms_irish_dates) <- c("Date", "No")

# Check the results
baptisms_irish_dates

## # A tibble: 694 × 2
##    Date          No
##    <date>     <int>
##  1 1689-03-22     1
##  2 1689-06-14     1
##  3 1689-10-28     1
##  4 1689-12-27     1
##  5 1690-02-04     1
##  6 1690-03-28     1
##  7 1690-04-03     1
##  8 1690-05-11     1
##  9 1690-05-30     1
## 10 1690-08-17     1
## # … with 684 more rows

class(baptisms_irish_dates$Date)

## [1] "Date"
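As an alternative to colnames(), dplyr's count() accepts a name argument for the count column, and rename() changes a single column name using the pattern new_name = old_name. A sketch on a few made-up baptism dates:

```r
library(dplyr)

demo_baptisms <- tibble(
  Date_of_Baptism = as.Date(c("1689-03-22", "1689-03-22", "1689-06-14"))
)

demo_dates <- demo_baptisms %>%
  count(Date_of_Baptism, name = "No") %>%   # call the count column No instead of n
  rename(Date = Date_of_Baptism)            # new_name = old_name

demo_dates
# Date: 1689-03-22, 1689-06-14; No: 2, 1
```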

Repeat this process for Marriages and Burials.

marriages_irish %>% count(Date_of_Marriage) -> marriages_irish_dates

marriages_irish_dates

## # A tibble: 190 × 2
##    Date_of_Marriage     n
##    <date>           <int>
##  1 1690-08-18           1
##  2 1690-09-04           1
##  3 1690-11-29           1
##  4 1691-05-11           1
##  5 1691-08-08           1
##  6 1692-11-29           1
##  7 1694-04-01           1
##  8 1694-04-06           1
##  9 1694-05-08           1
## 10 1694-06-10           1
## # … with 180 more rows

colnames(marriages_irish_dates) <- c("Date", "No")

# Check the results
marriages_irish_dates

## # A tibble: 190 × 2
##    Date          No
##    <date>     <int>
##  1 1690-08-18     1
##  2 1690-09-04     1
##  3 1690-11-29     1
##  4 1691-05-11     1
##  5 1691-08-08     1
##  6 1692-11-29     1
##  7 1694-04-01     1
##  8 1694-04-06     1
##  9 1694-05-08     1
## 10 1694-06-10     1
## # … with 180 more rows

burials_irish %>% count(Date_of_Burial) -> burials_irish_dates

burials_irish_dates

## # A tibble: 853 × 2
##    Date_of_Burial     n
##    <date>         <int>
##  1 1689-01-06         1
##  2 1689-02-24         1
##  3 1689-04-28         1
##  4 1689-05-29         1
##  5 1689-07-30         1
##  6 1689-09-16         1
##  7 1689-11-06         1
##  8 1690-01-26         1
##  9 1690-03-09         1
## 10 1690-04-09         1
## # … with 843 more rows

colnames(burials_irish_dates) <- c("Date", "No")

# Check the results
burials_irish_dates

## # A tibble: 853 × 2
##    Date          No
##    <date>     <int>
##  1 1689-01-06     1
##  2 1689-02-24     1
##  3 1689-04-28     1
##  4 1689-05-29     1
##  5 1689-07-30     1
##  6 1689-09-16     1
##  7 1689-11-06     1
##  8 1690-01-26     1
##  9 1690-03-09     1
## 10 1690-04-09     1
## # … with 843 more rows

The lubridate package makes it easy to group dates together.

There are a number of things happening here. First we select the dataset we want to analyse, baptisms_irish_dates, and then we use the group_by() command to create new groups within the Date information. Inside group_by(), floor_date() rounds each date down to the start of its period, so every date in the same period shares one value.

# You can select by different ranges, for example 3 months
baptisms_irish_dates %>% group_by(threemonth=floor_date(Date, "3 months")) %>% summarize(No_of_Baptisms=sum(No))

## # A tibble: 148 × 2
##    threemonth No_of_Baptisms
##    <date>              <int>
##  1 1689-01-01              1
##  2 1689-04-01              1
##  3 1689-10-01              2
##  4 1690-01-01              2
##  5 1690-04-01              3
##  6 1690-07-01              2
##  7 1690-10-01              2
##  8 1691-01-01              5
##  9 1691-04-01              3
## 10 1691-10-01              1
## # … with 138 more rows

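‘threemonth’ is the new column we’re creating and ‘3 months’ is the period we set, so the groups start on 1 January, 1 April, 1 July and 1 October. That groups the dates, but we also need to combine their totals: summarize() collapses each group into a single row, and sum(No) adds up the baptisms within each period. Note that this gives a total per period, not a running (cumulative) total.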
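A small worked example with made-up dates makes the grouping concrete. Three baptisms fall between January and March, one between April and June and one between October and December, so the three-month totals are 3, 1 and 1. The extra mutate() line with cumsum() shows what a running (cumulative) total would look like, for comparison.

```r
library(dplyr)
library(lubridate)

# Made-up daily counts in the same shape as baptisms_irish_dates
demo_dates <- tibble(
  Date = as.Date(c("1689-01-08", "1689-02-20", "1689-03-05",
                   "1689-05-03", "1689-11-30")),
  No   = c(1L, 1L, 1L, 1L, 1L)
)

quarterly <- demo_dates %>%
  group_by(threemonth = floor_date(Date, "3 months")) %>%   # round down to period start
  summarize(No_of_Baptisms = sum(No)) %>%                   # total within each period
  mutate(Running_Total = cumsum(No_of_Baptisms))            # cumulative, for comparison

quarterly
# threemonth: 1689-01-01, 1689-04-01, 1689-10-01
# No_of_Baptisms: 3, 1, 1; Running_Total: 3, 4, 5
```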

# We want yearly data so group by year
baptisms_irish_dates %>% group_by(year=floor_date(Date, "year")) %>% summarize(No_of_Baptisms=sum(No)) -> baptisms_irish_yearly

baptisms_irish_yearly

## # A tibble: 46 × 2
##    year       No_of_Baptisms
##    <date>              <int>
##  1 1689-01-01              4
##  2 1690-01-01              9
##  3 1691-01-01              9
##  4 1692-01-01             27
##  5 1693-01-01             27
##  6 1694-01-01             20
##  7 1695-01-01             18
##  8 1696-01-01             35
##  9 1697-01-01             29
## 10 1698-01-01             37
## # … with 36 more rows

Repeat this process for Marriages and Burials.

burials_irish_dates %>% group_by(year=floor_date(Date, "year")) %>% summarize(No_of_Burials=sum(No)) -> burials_irish_yearly

burials_irish_yearly

## # A tibble: 53 × 2
##    year       No_of_Burials
##    <date>             <int>
##  1 1689-01-01             7
##  2 1690-01-01             8
##  3 1691-01-01             8
##  4 1692-01-01            11
##  5 1693-01-01            37
##  6 1694-01-01            12
##  7 1695-01-01            17
##  8 1696-01-01            12
##  9 1697-01-01            15
## 10 1698-01-01            24
## # … with 43 more rows

marriages_irish_dates %>% group_by(year=floor_date(Date, "year")) %>% summarize(No_of_Marriages=sum(No)) -> marriages_irish_yearly

marriages_irish_yearly

## # A tibble: 36 × 2
##    year       No_of_Marriages
##    <date>               <int>
##  1 1690-01-01               3
##  2 1691-01-01               2
##  3 1692-01-01               1
##  4 1694-01-01               7
##  5 1695-01-01               7
##  6 1696-01-01               8
##  7 1697-01-01               8
##  8 1698-01-01              16
##  9 1699-01-01               5
## 10 1700-01-01              12
## # … with 26 more rows

Creating grouped totals for Baptisms, Burials and Marriages

The years in the three yearly tables do not match, because some years have no recorded Irish marriages or baptisms; gaps like this are a common feature of historical data. So we need to add rows for the missing years.

glimpse(marriages_irish_yearly)

## Rows: 36
## Columns: 2
## $ year            <date> 1690-01-01, 1691-01-01, 1692-01-01, 1694-01-01, 1695-…
## $ No_of_Marriages <int> 3, 2, 1, 7, 7, 8, 8, 16, 5, 12, 11, 10, 11, 6, 8, 7, 8…

glimpse(burials_irish_yearly)

## Rows: 53
## Columns: 2
## $ year          <date> 1689-01-01, 1690-01-01, 1691-01-01, 1692-01-01, 1693-01…
## $ No_of_Burials <int> 7, 8, 8, 11, 37, 12, 17, 12, 15, 24, 27, 31, 27, 19, 24,…

glimpse(baptisms_irish_yearly)

## Rows: 46
## Columns: 2
## $ year           <date> 1689-01-01, 1690-01-01, 1691-01-01, 1692-01-01, 1693-0…
## $ No_of_Baptisms <int> 4, 9, 9, 27, 27, 20, 18, 35, 29, 37, 51, 32, 35, 30, 33…

Each table should have 52 entries, one for every year from 1689 to 1740. The burials table has 53 because it includes an extra NA row (burials with no recorded date), so we need to remove that row and add the missing rows for marriages and baptisms. You can open a full-screen view of each table using the view() command:
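Rather than spotting the gaps by eye, you can list them in code. The sketch below builds the complete run of years with seq(), as 1 January dates to match the output of floor_date(Date, "year"), and keeps those absent from a yearly table. yearly_demo is made up; with the real data you would use marriages_irish_yearly$year or baptisms_irish_yearly$year instead.

```r
# Every year from 1689 to 1740, as 1 January dates
all_years <- seq(as.Date("1689-01-01"), as.Date("1740-01-01"), by = "year")
length(all_years)   # 52

# Made-up yearly table with gaps
yearly_demo <- data.frame(
  year            = as.Date(c("1690-01-01", "1691-01-01", "1694-01-01")),
  No_of_Marriages = c(3L, 2L, 7L)
)

# Years in the full sequence that the table does not contain
missing_years <- all_years[!all_years %in% yearly_demo$year]
length(missing_years)    # 49
head(missing_years, 3)   # "1689-01-01" "1692-01-01" "1693-01-01"
```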

burials_irish_yearly %>% view()

Adding Missing Rows

I have already worked out which years are missing from the marriages and baptisms tables. The principle is that all the datasets must cover the same years, and so have the same number of rows, before they can be combined.

# This adds a new blank row so the Marriages Yearly dataframe
marriages_irish_yearly %>% add_row(year = c(as.Date("1689-01-01"),
                                            as.Date("1693-01-01"),
                                            as.Date("1719-01-01"),
                                            as.Date("1722-01-01"),
                                            as.Date("1723-01-01"),
                                            as.Date("1724-01-01"),
                                            as.Date("1725-01-01"),
                                            as.Date("1729-01-01"),
                                            as.Date("1730-01-01"),
                                            as.Date("1731-01-01"),
                                            as.Date("1734-01-01"),
                                            as.Date("1735-01-01"),
                                            as.Date("1737-01-01"),
                                            as.Date("1738-01-01"),
                                            as.Date("1739-01-01"),
                                            as.Date("1740-01-01"))) -> marriages_irish_yearly

# This adds a blank row to the Baptisms Yearly dataframe for each missing year
baptisms_irish_yearly %>% add_row(year = c(as.Date("1729-01-01"),
                                           as.Date("1724-01-01"),
                                           as.Date("1733-01-01"),
                                           as.Date("1735-01-01"),
                                           as.Date("1736-01-01"),
                                           as.Date("1740-01-01"))) -> baptisms_irish_yearly

# Adding rows introduces NAs, so the following command replaces them with 0
marriages_irish_yearly[is.na(marriages_irish_yearly)] <- 0

# The same again for baptisms: replace the NAs with 0
baptisms_irish_yearly[is.na(baptisms_irish_yearly)] <- 0

# Occasionally you'll need to specify the column as well
baptisms_irish_yearly$No_of_Baptisms[is.na(baptisms_irish_yearly$No_of_Baptisms)] <- 0

# Finally, remove the NA row from burials_irish_yearly
burials_irish_yearly %>% na.omit() -> burials_irish_yearly
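If you prefer to stay within the tidyverse, the same clean-up can be sketched with tidyr's `replace_na()` and `drop_na()`. The small tables below are hypothetical stand-ins for `baptisms_irish_yearly` and `burials_irish_yearly`. One design difference worth knowing: `na.omit()` drops a row if *any* column is NA, whereas `drop_na(year)` only looks at the year column.

```r
library(dplyr)
library(tidyr)

# Hypothetical tables: baptisms with an NA count, burials with an NA year
toy_baptisms <- tibble(
  year = as.Date(c("1723-01-01", "1724-01-01")),
  No_of_Baptisms = c(5L, NA)
)
toy_burials <- tibble(
  year = as.Date(c("1740-01-01", NA)),
  No_of_Burials = c(11L, 3L)
)

# replace_na() replaces NA with 0 in the named column only
toy_baptisms <- toy_baptisms %>% replace_na(list(No_of_Baptisms = 0L))

# drop_na(year) removes rows whose year is NA, ignoring NAs in other columns
toy_burials <- toy_burials %>% drop_na(year)
```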

Alternative ways to add and identify missing rows

Recreate the marriages_irish_yearly totals using the same process as before:

marriages_irish_dates %>% group_by(year=floor_date(Date, "year")) %>% summarize(No_of_Marriages=sum(No)) -> marriages_irish_yearly

marriages_irish_yearly

## # A tibble: 36 × 2
##    year       No_of_Marriages
##    <date>               <int>
##  1 1690-01-01               3
##  2 1691-01-01               2
##  3 1692-01-01               1
##  4 1694-01-01               7
##  5 1695-01-01               7
##  6 1696-01-01               8
##  7 1697-01-01               8
##  8 1698-01-01              16
##  9 1699-01-01               5
## 10 1700-01-01              12
## # … with 26 more rows
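To make the grouping step concrete, here is a small sketch on hypothetical data (`toy_dates` and its values are made up; the `No` column is assumed to hold a count per record, as the lesson's `sum(No)` implies). `floor_date(Date, "year")` rounds every date down to 1 January of its year, so all records from the same year fall into one group.

```r
library(dplyr)
library(lubridate)

# Hypothetical register entries, one row per record
toy_dates <- tibble(
  Date = as.Date(c("1690-05-12", "1690-11-03", "1691-02-01")),
  No = c(1L, 1L, 1L)
)

# floor_date() rounds each date down to the start of its year
floor_date(toy_dates$Date, "year")
# returns 1690-01-01, 1690-01-01, 1691-01-01

# Group on the rounded date and add up the counts
toy_yearly <- toy_dates %>%
  group_by(year = floor_date(Date, "year")) %>%
  summarize(No_of_Marriages = sum(No))
```

The result has one row for 1690 (2 records) and one for 1691 (1 record).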

Since we know that the marriages_irish_yearly totals are missing a number of years, we can add them quickly by first creating a sequence of every year in our complete timeline (1689 to 1740, i.e. 52 years):

# This creates a dataframe of all the years in our timeline
all_dates <- data.frame(year=seq(as.Date("1689-01-01"), by='year', length.out=52))

all_dates

##          year
## 1  1689-01-01
## 2  1690-01-01
## 3  1691-01-01
## 4  1692-01-01
## 5  1693-01-01
## 6  1694-01-01
## 7  1695-01-01
## 8  1696-01-01
## 9  1697-01-01
## 10 1698-01-01
## 11 1699-01-01
## 12 1700-01-01
## 13 1701-01-01
## 14 1702-01-01
## 15 1703-01-01
## 16 1704-01-01
## 17 1705-01-01
## 18 1706-01-01
## 19 1707-01-01
## 20 1708-01-01
## 21 1709-01-01
## 22 1710-01-01
## 23 1711-01-01
## 24 1712-01-01
## 25 1713-01-01
## 26 1714-01-01
## 27 1715-01-01
## 28 1716-01-01
## 29 1717-01-01
## 30 1718-01-01
## 31 1719-01-01
## 32 1720-01-01
## 33 1721-01-01
## 34 1722-01-01
## 35 1723-01-01
## 36 1724-01-01
## 37 1725-01-01
## 38 1726-01-01
## 39 1727-01-01
## 40 1728-01-01
## 41 1729-01-01
## 42 1730-01-01
## 43 1731-01-01
## 44 1732-01-01
## 45 1733-01-01
## 46 1734-01-01
## 47 1735-01-01
## 48 1736-01-01
## 49 1737-01-01
## 50 1738-01-01
## 51 1739-01-01
## 52 1740-01-01
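As a sketch of an alternative, you can give `seq()` explicit start and end dates with `to =` instead of a row count, which saves counting the years by hand (1740 - 1689 + 1 = 52). The variable names below are only for illustration.

```r
# Build the timeline from explicit start and end dates
all_dates_by_range <- data.frame(
  year = seq(as.Date("1689-01-01"), as.Date("1740-01-01"), by = "year")
)

# The lesson's version, using length.out = 52
all_dates_by_length <- data.frame(
  year = seq(as.Date("1689-01-01"), by = "year", length.out = 52)
)

nrow(all_dates_by_range)                                   # 52
all(all_dates_by_range$year == all_dates_by_length$year)   # TRUE
```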

Using anti_join(), we can find which years are missing from the marriages_irish_yearly dataset: it returns the rows of all_dates that have no matching year in marriages_irish_yearly.

# Use anti_join() to create a dataframe of missing dates
anti_join(all_dates, marriages_irish_yearly, by="year") -> missing_dates

missing_dates

##          year
## 1  1689-01-01
## 2  1693-01-01
## 3  1719-01-01
## 4  1722-01-01
## 5  1723-01-01
## 6  1724-01-01
## 7  1725-01-01
## 8  1729-01-01
## 9  1730-01-01
## 10 1731-01-01
## 11 1734-01-01
## 12 1735-01-01
## 13 1737-01-01
## 14 1738-01-01
## 15 1739-01-01
## 16 1740-01-01

# Now merge the two dataframes by the year column; all.x = TRUE and all.y = TRUE keep every row from both (a full outer join)
merge(marriages_irish_yearly, missing_dates, by="year", all.x = TRUE, all.y = TRUE) -> marriages_irish_yearly

marriages_irish_yearly

##          year No_of_Marriages
## 1  1689-01-01              NA
## 2  1690-01-01               3
## 3  1691-01-01               2
## 4  1692-01-01               1
## 5  1693-01-01              NA
## 6  1694-01-01               7
## 7  1695-01-01               7
## 8  1696-01-01               8
## 9  1697-01-01               8
## 10 1698-01-01              16
## 11 1699-01-01               5
## 12 1700-01-01              12
## 13 1701-01-01              11
## 14 1702-01-01              10
## 15 1703-01-01              11
## 16 1704-01-01               6
## 17 1705-01-01               8
## 18 1706-01-01               7
## 19 1707-01-01               8
## 20 1708-01-01              10
## 21 1709-01-01               5
## 22 1710-01-01              14
## 23 1711-01-01               2
## 24 1712-01-01               2
## 25 1713-01-01               7
## 26 1714-01-01               3
## 27 1715-01-01               2
## 28 1716-01-01               4
## 29 1717-01-01               1
## 30 1718-01-01               1
## 31 1719-01-01              NA
## 32 1720-01-01               1
## 33 1721-01-01               2
## 34 1722-01-01              NA
## 35 1723-01-01              NA
## 36 1724-01-01              NA
## 37 1725-01-01              NA
## 38 1726-01-01               2
## 39 1727-01-01               1
## 40 1728-01-01               1
## 41 1729-01-01              NA
## 42 1730-01-01              NA
## 43 1731-01-01              NA
## 44 1732-01-01               2
## 45 1733-01-01               1
## 46 1734-01-01              NA
## 47 1735-01-01              NA
## 48 1736-01-01               1
## 49 1737-01-01              NA
## 50 1738-01-01              NA
## 51 1739-01-01              NA
## 52 1740-01-01              NA

# Remember to change the NA values to 0
marriages_irish_yearly[is.na(marriages_irish_yearly)] <- 0
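A third option, sketched here on hypothetical data, is tidyr's `complete()` (tidyr is loaded as part of the tidyverse). It adds a row for every year in the sequence you supply that is not already present, and `fill =` sets the count for those new rows, so the add-rows, merge and NA-to-0 steps collapse into one call. The table `toy_yearly` is a made-up stand-in for `marriages_irish_yearly`.

```r
library(dplyr)
library(tidyr)

# Hypothetical yearly totals with 1689 and 1693 missing
toy_yearly <- tibble(
  year = as.Date(c("1690-01-01", "1691-01-01", "1692-01-01", "1694-01-01")),
  No_of_Marriages = c(3L, 2L, 1L, 7L)
)

# Add every missing year in the sequence and set its count to 0
toy_complete <- toy_yearly %>%
  complete(
    year = seq(as.Date("1689-01-01"), as.Date("1694-01-01"), by = "year"),
    fill = list(No_of_Marriages = 0L)
  )
```

The result has six rows, sorted by year, with 0 for 1689 and 1693.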

Merging the Date information

# The baptisms and burials yearly totals now cover the same 52 years, so they can be merged into a new grouped table using inner_join()

inner_join(baptisms_irish_yearly, burials_irish_yearly, by="year") -> births_deaths_yearly

births_deaths_yearly

## # A tibble: 52 × 3
##    year       No_of_Baptisms No_of_Burials
##    <date>              <dbl>         <int>
##  1 1689-01-01              4             7
##  2 1690-01-01              9             8
##  3 1691-01-01              9             8
##  4 1692-01-01             27            11
##  5 1693-01-01             27            37
##  6 1694-01-01             20            12
##  7 1695-01-01             18            17
##  8 1696-01-01             35            12
##  9 1697-01-01             29            15
## 10 1698-01-01             37            24
## # … with 42 more rows

# Add the Marriages yearly totals to the new grouped totals
inner_join(births_deaths_yearly, marriages_irish_yearly, by="year") -> yearly_births_deaths_marriages

yearly_births_deaths_marriages

## # A tibble: 52 × 4
##    year       No_of_Baptisms No_of_Burials No_of_Marriages
##    <date>              <dbl>         <int>           <dbl>
##  1 1689-01-01              4             7               0
##  2 1690-01-01              9             8               3
##  3 1691-01-01              9             8               2
##  4 1692-01-01             27            11               1
##  5 1693-01-01             27            37               0
##  6 1694-01-01             20            12               7
##  7 1695-01-01             18            17               7
##  8 1696-01-01             35            12               8
##  9 1697-01-01             29            15               8
## 10 1698-01-01             37            24              16
## # … with 42 more rows
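The two joins above can also be done in one step with `purrr::reduce()`, which applies `inner_join()` pairwise across a list of tables. This is a sketch on hypothetical tables (`toy_bap`, `toy_bur`, `toy_mar` are made up); the row-count check at the end is a cheap way to confirm no years were lost in the join.

```r
library(dplyr)
library(purrr)

# Hypothetical yearly tables that already share the same years
yrs <- as.Date(c("1689-01-01", "1690-01-01"))
toy_bap <- tibble(year = yrs, No_of_Baptisms = c(4, 9))
toy_bur <- tibble(year = yrs, No_of_Burials = c(7L, 8L))
toy_mar <- tibble(year = yrs, No_of_Marriages = c(0, 3))

# reduce() joins the first two tables, then joins the result to the third
combined <- reduce(
  list(toy_bap, toy_bur, toy_mar),
  function(x, y) inner_join(x, y, by = "year")
)

# If every table was completed to the full timeline, no rows are lost
stopifnot(nrow(combined) == length(yrs))
```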

# Tidy the column names
colnames(yearly_births_deaths_marriages) <- c("Year", "Baptisms", "Burials", "Marriages")

yearly_births_deaths_marriages

## # A tibble: 52 × 4
##    Year       Baptisms Burials Marriages
##    <date>        <dbl>   <int>     <dbl>
##  1 1689-01-01        4       7         0
##  2 1690-01-01        9       8         3
##  3 1691-01-01        9       8         2
##  4 1692-01-01       27      11         1
##  5 1693-01-01       27      37         0
##  6 1694-01-01       20      12         7
##  7 1695-01-01       18      17         7
##  8 1696-01-01       35      12         8
##  9 1697-01-01       29      15         8
## 10 1698-01-01       37      24        16
## # … with 42 more rows
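Setting `colnames()` relies on the columns being in exactly the order you expect. A slightly safer sketch uses `dplyr::rename(new_name = old_name)`, which matches columns by name; `toy` below is a hypothetical one-row stand-in for `yearly_births_deaths_marriages`.

```r
library(dplyr)

toy <- tibble(
  year = as.Date("1689-01-01"),
  No_of_Baptisms = 4, No_of_Burials = 7L, No_of_Marriages = 0
)

# rename() matches by name, so the result does not depend on column order
toy <- toy %>%
  rename(Year = year, Baptisms = No_of_Baptisms,
         Burials = No_of_Burials, Marriages = No_of_Marriages)
```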

Wide data versus Long data

At the moment, the grouped births, deaths and marriages table is in 'wide' format, with one column per register. For charting we need to change it to 'long' format, with one row for each year and register.

# The melt() function from the 'reshape2' package converts wide data to long data automatically.
# reshape2 is not part of the tidyverse, so install it once with install.packages("reshape2") and load it:
library(reshape2)

yearly_long <- melt(yearly_births_deaths_marriages, id.vars = "Year")

# As you can see, it has been reshaped: each row now holds one Year, one register (variable) and its count (value)

yearly_long

##           Year  variable value
## 1   1689-01-01  Baptisms     4
## 2   1690-01-01  Baptisms     9
## 3   1691-01-01  Baptisms     9
## 4   1692-01-01  Baptisms    27
## 5   1693-01-01  Baptisms    27
## 6   1694-01-01  Baptisms    20
## 7   1695-01-01  Baptisms    18
## 8   1696-01-01  Baptisms    35
## 9   1697-01-01  Baptisms    29
## 10  1698-01-01  Baptisms    37
## 11  1699-01-01  Baptisms    51
## 12  1700-01-01  Baptisms    32
## 13  1701-01-01  Baptisms    35
## 14  1702-01-01  Baptisms    30
## 15  1703-01-01  Baptisms    33
## 16  1704-01-01  Baptisms    27
## 17  1705-01-01  Baptisms    20
## 18  1706-01-01  Baptisms    24
## 19  1707-01-01  Baptisms    30
## 20  1708-01-01  Baptisms    35
## 21  1709-01-01  Baptisms    26
## 22  1710-01-01  Baptisms    16
## 23  1711-01-01  Baptisms    25
## 24  1712-01-01  Baptisms    25
## 25  1713-01-01  Baptisms    10
## 26  1714-01-01  Baptisms    17
## 27  1715-01-01  Baptisms     9
## 28  1716-01-01  Baptisms     8
## 29  1717-01-01  Baptisms     8
## 30  1718-01-01  Baptisms     8
## 31  1719-01-01  Baptisms     6
## 32  1720-01-01  Baptisms     4
## 33  1721-01-01  Baptisms     4
## 34  1722-01-01  Baptisms     3
## 35  1723-01-01  Baptisms     5
## 36  1725-01-01  Baptisms     3
## 37  1726-01-01  Baptisms     1
## 38  1727-01-01  Baptisms     3
## 39  1728-01-01  Baptisms     2
## 40  1730-01-01  Baptisms     1
## 41  1731-01-01  Baptisms     1
## 42  1732-01-01  Baptisms     1
## 43  1734-01-01  Baptisms     1
## 44  1737-01-01  Baptisms     2
## 45  1738-01-01  Baptisms     2
## 46  1739-01-01  Baptisms     4
## 47  1729-01-01  Baptisms     0
## 48  1724-01-01  Baptisms     0
## 49  1733-01-01  Baptisms     0
## 50  1735-01-01  Baptisms     0
## 51  1736-01-01  Baptisms     0
## 52  1740-01-01  Baptisms     0
## 53  1689-01-01   Burials     7
## 54  1690-01-01   Burials     8
## 55  1691-01-01   Burials     8
## 56  1692-01-01   Burials    11
## 57  1693-01-01   Burials    37
## 58  1694-01-01   Burials    12
## 59  1695-01-01   Burials    17
## 60  1696-01-01   Burials    12
## 61  1697-01-01   Burials    15
## 62  1698-01-01   Burials    24
## 63  1699-01-01   Burials    27
## 64  1700-01-01   Burials    31
## 65  1701-01-01   Burials    27
## 66  1702-01-01   Burials    19
## 67  1703-01-01   Burials    24
## 68  1704-01-01   Burials    27
## 69  1705-01-01   Burials     9
## 70  1706-01-01   Burials    12
## 71  1707-01-01   Burials    17
## 72  1708-01-01   Burials    41
## 73  1709-01-01   Burials    39
## 74  1710-01-01   Burials    27
## 75  1711-01-01   Burials    39
## 76  1712-01-01   Burials    41
## 77  1713-01-01   Burials    21
## 78  1714-01-01   Burials    16
## 79  1715-01-01   Burials    18
## 80  1716-01-01   Burials    23
## 81  1717-01-01   Burials    20
## 82  1718-01-01   Burials    15
## 83  1719-01-01   Burials    19
## 84  1720-01-01   Burials    13
## 85  1721-01-01   Burials    11
## 86  1722-01-01   Burials    20
## 87  1723-01-01   Burials    16
## 88  1725-01-01   Burials    13
## 89  1726-01-01   Burials    11
## 90  1727-01-01   Burials    16
## 91  1728-01-01   Burials     8
## 92  1730-01-01   Burials     3
## 93  1731-01-01   Burials     5
## 94  1732-01-01   Burials    15
## 95  1734-01-01   Burials     8
## 96  1737-01-01   Burials     5
## 97  1738-01-01   Burials    11
## 98  1739-01-01   Burials     9
## 99  1729-01-01   Burials    14
## 100 1724-01-01   Burials    14
## 101 1733-01-01   Burials     9
## 102 1735-01-01   Burials    14
## 103 1736-01-01   Burials     7
## 104 1740-01-01   Burials    11
## 105 1689-01-01 Marriages     0
## 106 1690-01-01 Marriages     3
## 107 1691-01-01 Marriages     2
## 108 1692-01-01 Marriages     1
## 109 1693-01-01 Marriages     0
## 110 1694-01-01 Marriages     7
## 111 1695-01-01 Marriages     7
## 112 1696-01-01 Marriages     8
## 113 1697-01-01 Marriages     8
## 114 1698-01-01 Marriages    16
## 115 1699-01-01 Marriages     5
## 116 1700-01-01 Marriages    12
## 117 1701-01-01 Marriages    11
## 118 1702-01-01 Marriages    10
## 119 1703-01-01 Marriages    11
## 120 1704-01-01 Marriages     6
## 121 1705-01-01 Marriages     8
## 122 1706-01-01 Marriages     7
## 123 1707-01-01 Marriages     8
## 124 1708-01-01 Marriages    10
## 125 1709-01-01 Marriages     5
## 126 1710-01-01 Marriages    14
## 127 1711-01-01 Marriages     2
## 128 1712-01-01 Marriages     2
## 129 1713-01-01 Marriages     7
## 130 1714-01-01 Marriages     3
## 131 1715-01-01 Marriages     2
## 132 1716-01-01 Marriages     4
## 133 1717-01-01 Marriages     1
## 134 1718-01-01 Marriages     1
## 135 1719-01-01 Marriages     0
## 136 1720-01-01 Marriages     1
## 137 1721-01-01 Marriages     2
## 138 1722-01-01 Marriages     0
## 139 1723-01-01 Marriages     0
## 140 1725-01-01 Marriages     0
## 141 1726-01-01 Marriages     2
## 142 1727-01-01 Marriages     1
## 143 1728-01-01 Marriages     1
## 144 1730-01-01 Marriages     0
## 145 1731-01-01 Marriages     0
## 146 1732-01-01 Marriages     2
## 147 1734-01-01 Marriages     0
## 148 1737-01-01 Marriages     0
## 149 1738-01-01 Marriages     0
## 150 1739-01-01 Marriages     0
## 151 1729-01-01 Marriages     0
## 152 1724-01-01 Marriages     0
## 153 1733-01-01 Marriages     1
## 154 1735-01-01 Marriages     0
## 155 1736-01-01 Marriages     1
## 156 1740-01-01 Marriages     0
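If you would rather not load reshape2, tidyr's `pivot_longer()` (part of the tidyverse) does the same reshaping. This is a sketch on a hypothetical two-year table with the same column names as `yearly_births_deaths_marriages`; `names_to` and `values_to` reuse the `variable` and `value` names that `melt()` produces, so the ggplot2 code below should work unchanged. Note two small differences: rows come out ordered by year rather than by register, and `variable` is a character column rather than a factor.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table
toy_wide <- tibble(
  Year = as.Date(c("1689-01-01", "1690-01-01")),
  Baptisms = c(4, 9),
  Burials = c(7L, 8L),
  Marriages = c(0, 3)
)

# Stack every column except Year into variable/value pairs
toy_long <- toy_wide %>%
  pivot_longer(cols = -Year, names_to = "variable", values_to = "value")
```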

A simple graph using the graphics library ggplot2

This is just a basic plot without any background or title. For a good introduction to using ggplot2, I recommend the introduction written by Chris Brunsdon here.

yearly_long %>% ggplot(aes(x=Year, y=value, colour=variable, group=variable)) + geom_line(linewidth=0.8)

A fully labelled graph

This is a fully labelled and styled graph; notice that you can edit nearly every aspect of it.

yearly_long %>% ggplot(aes(x=Year, y=value, colour=variable, group=variable)) + 
geom_line(linewidth=0.8) + 
labs(title = "Irish Baptisms, Burials and Marriages 1689-1740",
     subtitle = "(St. Germain-en-Laye)",
     tag = "Figure 1", 
     caption = "Database of St. Germain-en-Laye Registers",
     x = "Year",
     y = "No.",
     color = "") +
scale_color_colorblind() + 
theme_classic() +
theme(axis.text.x = element_text(colour = "darkslategrey", size = 16), 
      axis.text.y = element_text(colour = "darkslategrey", size = 16),
      legend.background = element_rect(fill = "white", linewidth = 4, colour = "white"),
      legend.justification = c(0, 1),
      legend.position = c(0.9, 1),
      text = element_text(family = "Georgia"),
      plot.title = element_text(size = 18, margin = margin(b = 10)),
      plot.subtitle = element_text(size = 12, color = "darkslategrey", margin = margin(b = 25)),
      plot.caption = element_text(size = 8, margin = margin(t = 10), color = "grey70", hjust = 0))

Creating Charts of Gender Breakdown

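The marriages dataset has no Gender column, so one must be created. Every marriage involves one bride and one groom, so the female and male totals both equal the number of rows in the dataframe: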

# This creates a one-row table holding the number of rows in the marriages dataset: 274, the total number of marriages
marriages %>% count() -> marriages_gender

# Next, add a new column called Gender with the value Female
marriages_gender$Gender <- c("Female")

# Assigning NULL removes the n column
marriages_gender$n <- NULL

# This adds a Marriages column with the value 274
marriages_gender$Marriages <- c(274)

# This adds a new row for Male
marriages_gender %>% add_row(Gender = c("Male")) -> marriages_gender

# This replaces the NA that add_row() left in the Marriages column of the Male row with 274
marriages_gender[is.na(marriages_gender)] <- 274

marriages_gender

## # A tibble: 2 × 2
##   Gender Marriages
##   <chr>      <dbl>
## 1 Female       274
## 2 Male         274
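The same two-row table can be sketched more directly with `tibble()`, taking the total from `nrow()` instead of typing 274 by hand, so the figure stays correct if the dataset changes. `marriages_toy` and its values are hypothetical stand-ins for the real `marriages` dataframe.

```r
library(dplyr)

# Hypothetical stand-in for the marriages dataset: one row per marriage
marriages_toy <- tibble(Groom_Occupation_Type = c("TypeA", "TypeB", "TypeA"))

# One bride and one groom per marriage, so both totals equal nrow()
marriages_gender_toy <- tibble(
  Gender = c("Female", "Male"),
  Marriages = nrow(marriages_toy)
)
```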

Create Gender totals for the other three datasets

The other SGL datasets (burials, baptisms and abjurations) all contain a Gender column, so creating gender totals for them is straightforward:

burials %>% count(Gender) -> burials_gender
colnames(burials_gender) <- c("Gender", "Deaths")

baptisms %>% count(Gender) -> baptisms_gender
colnames(baptisms_gender) <- c("Gender", "Baptisms")

abjurations %>% count(Gender) -> abjurations_gender
colnames(abjurations_gender) <- c("Gender", "Abjurations")
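`count()` also takes a `name =` argument that sets the name of the count column directly, which can replace the separate `colnames()` step. A minimal sketch on hypothetical data (`toy_burials` is made up):

```r
library(dplyr)

toy_burials <- tibble(Gender = c("Male", "Female", "Male", "Male"))

# name = "Deaths" names the count column in the same step
burials_gender_toy <- toy_burials %>% count(Gender, name = "Deaths")
```

The result has one row per gender (Female 1, Male 3) with the columns Gender and Deaths.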

Merge these into a single dataframe in stages

inner_join(burials_gender, baptisms_gender, by="Gender") -> burials_baptisms_gender

inner_join(marriages_gender, abjurations_gender, by="Gender") -> marriages_abjurations_gender

inner_join(burials_baptisms_gender, marriages_abjurations_gender, by="Gender") -> registers_gender

# Change the data to long format using melt()

melt(registers_gender) -> registers_gender_long

## Using Gender as id variables

Create a new graph

registers_gender_long %>% ggplot(aes(x=Gender, y=value)) + 
geom_bar(aes(fill=variable), position = "dodge", stat = "identity", width = 0.5) +
labs(title = "Gender Breakdown of Registers, 1689-1740",
      subtitle = "(St. Germain-en-Laye)",
      tag = "Figure 2", 
      caption = "Database of St. Germain-en-Laye Registers",
      x = "Gender",
      y = "No.",
      fill = "Registers") +
scale_fill_colorblind() + 
theme_classic() +
# From ggplot2 3.4.0, element_rect() takes linewidth rather than the deprecated size argument
theme(axis.text.x = element_text(colour = "darkslategrey", size = 16), 
      axis.text.y = element_text(colour = "darkslategrey", size = 16),
      legend.background = element_rect(fill = "white", linewidth = 4, colour = "white"),
      legend.justification = c(0, 1),
      legend.position = c(0.9, 1),
      text = element_text(family = "Georgia"),
      plot.title = element_text(size = 18, margin = margin(b = 10)),
      plot.subtitle = element_text(size = 12, color = "darkslategrey", margin = margin(b = 25)),
      plot.caption = element_text(size = 8, margin = margin(t = 10), color = "grey70", hjust = 0))

Using the SGL Database in R - Part 1
