SGL R Exercises

J. Kavanagh

2023-04-08

Introduction

This is a series of exercises to get everyone used to the RStudio environment and to some of the code segments and commands we went over in the past few weeks.

My advice is to create a new R Script file for this exercise and copy the following code into it, saving repeatedly as you go.

Some Exercises for using R

First, load the libraries you’ll be using for these exercises.

library(tidyverse)
library(lubridate)

Secondly, import the five CSV files that make up the SGL Database.

read.csv('baptisms.csv', 
         stringsAsFactors = FALSE, 
         na.strings = c("NA", " ", "")) -> baptisms

read.csv('burials.csv', 
         stringsAsFactors = FALSE, 
         na.strings = c("NA", " ", "")) -> burials

read.csv('marriages.csv', 
         stringsAsFactors = FALSE, 
         na.strings = c("NA", " ", "")) -> marriages

read.csv('abjurations.csv', 
         stringsAsFactors = FALSE, 
         na.strings = c("NA", " ", "")) -> abjurations

read.csv('signatures.csv', 
         stringsAsFactors = FALSE, 
         na.strings = c("NA", " ", "")) -> signatures
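
As a small illustration of what the na.strings argument does, the sketch below reads a made-up CSV supplied as inline text rather than from a file. The toy data and the name toy are assumptions for illustration only and are not part of the SGL Database.

```r
# Hypothetical CSV content supplied inline; the real files are read from disk
csv_text <- "Name,Nationality
John,Irish
Mary,
Anne,NA"

# Same arguments as the import code above
toy <- read.csv(text = csv_text,
                stringsAsFactors = FALSE,
                na.strings = c("NA", " ", ""))

# The empty cell and the literal "NA" are both treated as missing
is.na(toy$Nationality)
```

Treating blank cells as NA at import time means later steps such as count() and filter() see one consistent kind of missing value.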

Examine the files you’ve imported

Once the files have been imported, they are stored as data.frames, the R equivalent of an Excel spreadsheet. The glimpse() command provides an overview of the data we have just imported: the number of rows and columns, each column’s name and class, and its first few values.

You can see the different classes of data; in this case everything is chr (character). Numerical data is often shown as int (integer). The Date_of_Baptism column will need to be changed to a Date type.

glimpse(baptisms)
## Rows: 954
## Columns: 69
## $ Child_Name_FR                                      <chr> "Therese Margueritt…
## $ Child_Forename_EN                                  <chr> "Therese Margueritt…
## $ Child_Surname_EN                                   <chr> "Melfort", "Macdonn…
## $ Child_Nationality_Infer                            <chr> "Irish", "Irish", "…
## $ Gender                                             <chr> "Female", "Male", "…
## $ Child_Religion_Infer                               <chr> "Roman Catholic", "…
## $ Date_of_Birth                                      <chr> "1689-01-08", "1689…
## $ Date_of_Birth_Partial                              <chr> NA, NA, NA, NA, NA,…
## $ Date_of_Baptism                                    <chr> "1689-01-08", "1689…
## $ Place_of_Baptism                                   <chr> "St Germain-en-Laye…
## $ Church_of_Baptismal_Ceremony                       <chr> "Parish church", "P…
## $ Street_of_Residence                                <chr> NA, NA, NA, NA, NA,…
## $ Parents_Status                                     <chr> "Married", "Married…
## $ Father_Name_FR                                     <chr> "Jean Comte de Melf…
## $ Father_Forename_EN                                 <chr> "John", "Randall", …
## $ Father_Surname_EN                                  <chr> "Melfort", "Macdonn…
## $ Father_Nationality_Stated                          <chr> NA, NA, NA, NA, NA,…
## $ Father_Nationality_Infer                           <chr> "Irish", "Irish", "…
## $ Father_Rank                                        <chr> "Count", "Gentleman…
## $ Father_Occupation                                  <chr> "Secretary of the r…
## $ Father_Occupation_Type                             <chr> "Noble", "Noble", "…
## $ Member_of_Jacobite_Army                            <chr> "Yes", NA, "Yes", N…
## $ Father_Residence                                   <chr> "St Germain-en-Laye…
## $ Father_Register_Signature                          <chr> "No", "No", "No", "…
## $ Mother_Name_FR                                     <chr> "Euphenia Vicalace"…
## $ Mother_Forenam_EN                                  <chr> "Euphenia", "Hannah…
## $ Mother_Surname_EN                                  <chr> "Vicalace", "Roche"…
## $ Mother_Nationality_Stated                          <chr> NA, NA, NA, "Englis…
## $ Mother_Nationality_Infer                           <chr> NA, "Irish", "Irish…
## $ Mother_Religion_Infer                              <chr> "Roman Catholic", "…
## $ Mother_Rank                                        <chr> NA, NA, NA, NA, NA,…
## $ Mother_Occupation                                  <chr> NA, NA, NA, NA, NA,…
## $ Mother_Occupation_Type                             <chr> NA, NA, NA, NA, NA,…
## $ Mother_Residence                                   <chr> "St Germain-en-Laye…
## $ Mother_Register_Signature                          <chr> "No", "No", "No", "…
## $ Godfather_Name_FR                                  <chr> "Jacques Comte de D…
## $ Godfather_Forename_EN                              <chr> "Jacque", NA, NA, "…
## $ Godfather_Surname_EN                               <chr> "Drummond", NA, "St…
## $ Godfather_Nationality_Stated                       <chr> NA, NA, "English", …
## $ Godfather_Nationality_Infer                        <chr> "Scottish", NA, "En…
## $ Godfather_Rank                                     <chr> "Count", NA, "Lord"…
## $ Godfather_Occupation                               <chr> "Count", NA, "Lord"…
## $ Godfather_Occupation_Type                          <chr> "Noble", NA, "Noble…
## $ Godfather_Residence                                <chr> "St Germain-en-Laye…
## $ Godfather_familial_relationship_to_father_of_child <chr> NA, NA, NA, NA, NA,…
## $ Godfather_familial_relationship_to_mother_of_child <chr> NA, NA, NA, NA, NA,…
## $ Godfather_Register_Signature                       <chr> "Yes", "Yes", "Yes"…
## $ Godmother_Name_FR                                  <chr> "Marie Conny", "Ann…
## $ Godmother_Forename_EN                              <chr> "Marie", "Anne", "H…
## $ Godmother_Surname_EN                               <chr> "Cormy", "Bagnall",…
## $ Godmother_Nationality_Stated                       <chr> NA, NA, "English", …
## $ Godmother_Nationality_Infer                        <chr> "Irish", "Irish", "…
## $ Godmother_Religion_Infer                           <chr> "Roman Catholic", "…
## $ Godmother_Rank                                     <chr> NA, NA, NA, "Wife o…
## $ Godmother_Occupation                               <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Occupation_Type                          <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Father_Name_FR                           <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Father_Forename_EN                       <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Father_Surname_EN                        <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Mother_Name_FR                           <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Mother_Forename_EN                       <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Mother_Surname_EN                        <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Residence                                <chr> "St Germain-en-Laye…
## $ Godmother_relationship_to_father_of_child          <chr> NA, NA, NA, NA, NA,…
## $ Godmother_relationship_to_mother_of_child          <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Register_Signature                       <chr> "Yes", "Yes", "Yes"…
## $ Officiating_Priest_Nationality                     <chr> "French", "French",…
## $ Additional_Notes                                   <chr> NA, NA, NA, NA, NA,…
## $ Archival_Reference                                 <chr> "5MI 1734 [1168921/…

Tidying the Data

We are going to do two simple tasks here. The first is to change the data.frames to tibbles (table data.frames). It’s a relatively simple procedure, and the result prints more compactly, showing only the first ten rows and the columns that fit on screen. You don’t have to do this, but it does result in a cleaner visual aesthetic.

baptisms <- as_tibble(baptisms)
burials <- as_tibble(burials)
marriages <- as_tibble(marriages)
abjurations <- as_tibble(abjurations)

The second task is to change the date information into a Date type. This allows us to filter by day, month or year, and it needs to happen before any analysis of dates.

baptisms$Date_of_Baptism <- as.Date(baptisms$Date_of_Baptism)
burials$Date_of_Burial <- as.Date(burials$Date_of_Burial)
marriages$Date_of_Marriage <- as.Date(marriages$Date_of_Marriage)
abjurations$Date <- as.Date(abjurations$Date)
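
To make the effect of as.Date() concrete, here is a minimal base R sketch on three invented date strings (the values and variable names are assumptions, not taken from the registers). It shows that comparisons and year extraction only behave as dates once the conversion has been done.

```r
# Hypothetical date strings in the same YYYY-MM-DD layout as the CSV files
dates_chr <- c("1689-01-08", "1690-05-11", "1692-11-29")
class(dates_chr)

# Convert to the Date class
dates <- as.Date(dates_chr)
class(dates)

# Keep only dates on or after 1 January 1690
later <- dates[dates >= as.Date("1690-01-01")]
later

# Extract the year as text
years <- format(dates, "%Y")
years
```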

Check the Results

You’ll see that the Date_of_Baptism column has been changed from chr to date.

glimpse(baptisms)
## Rows: 954
## Columns: 69
## $ Child_Name_FR                                      <chr> "Therese Margueritt…
## $ Child_Forename_EN                                  <chr> "Therese Margueritt…
## $ Child_Surname_EN                                   <chr> "Melfort", "Macdonn…
## $ Child_Nationality_Infer                            <chr> "Irish", "Irish", "…
## $ Gender                                             <chr> "Female", "Male", "…
## $ Child_Religion_Infer                               <chr> "Roman Catholic", "…
## $ Date_of_Birth                                      <chr> "1689-01-08", "1689…
## $ Date_of_Birth_Partial                              <chr> NA, NA, NA, NA, NA,…
## $ Date_of_Baptism                                    <date> 1689-01-08, 1689-0…
## $ Place_of_Baptism                                   <chr> "St Germain-en-Laye…
## $ Church_of_Baptismal_Ceremony                       <chr> "Parish church", "P…
## $ Street_of_Residence                                <chr> NA, NA, NA, NA, NA,…
## $ Parents_Status                                     <chr> "Married", "Married…
## $ Father_Name_FR                                     <chr> "Jean Comte de Melf…
## $ Father_Forename_EN                                 <chr> "John", "Randall", …
## $ Father_Surname_EN                                  <chr> "Melfort", "Macdonn…
## $ Father_Nationality_Stated                          <chr> NA, NA, NA, NA, NA,…
## $ Father_Nationality_Infer                           <chr> "Irish", "Irish", "…
## $ Father_Rank                                        <chr> "Count", "Gentleman…
## $ Father_Occupation                                  <chr> "Secretary of the r…
## $ Father_Occupation_Type                             <chr> "Noble", "Noble", "…
## $ Member_of_Jacobite_Army                            <chr> "Yes", NA, "Yes", N…
## $ Father_Residence                                   <chr> "St Germain-en-Laye…
## $ Father_Register_Signature                          <chr> "No", "No", "No", "…
## $ Mother_Name_FR                                     <chr> "Euphenia Vicalace"…
## $ Mother_Forenam_EN                                  <chr> "Euphenia", "Hannah…
## $ Mother_Surname_EN                                  <chr> "Vicalace", "Roche"…
## $ Mother_Nationality_Stated                          <chr> NA, NA, NA, "Englis…
## $ Mother_Nationality_Infer                           <chr> NA, "Irish", "Irish…
## $ Mother_Religion_Infer                              <chr> "Roman Catholic", "…
## $ Mother_Rank                                        <chr> NA, NA, NA, NA, NA,…
## $ Mother_Occupation                                  <chr> NA, NA, NA, NA, NA,…
## $ Mother_Occupation_Type                             <chr> NA, NA, NA, NA, NA,…
## $ Mother_Residence                                   <chr> "St Germain-en-Laye…
## $ Mother_Register_Signature                          <chr> "No", "No", "No", "…
## $ Godfather_Name_FR                                  <chr> "Jacques Comte de D…
## $ Godfather_Forename_EN                              <chr> "Jacque", NA, NA, "…
## $ Godfather_Surname_EN                               <chr> "Drummond", NA, "St…
## $ Godfather_Nationality_Stated                       <chr> NA, NA, "English", …
## $ Godfather_Nationality_Infer                        <chr> "Scottish", NA, "En…
## $ Godfather_Rank                                     <chr> "Count", NA, "Lord"…
## $ Godfather_Occupation                               <chr> "Count", NA, "Lord"…
## $ Godfather_Occupation_Type                          <chr> "Noble", NA, "Noble…
## $ Godfather_Residence                                <chr> "St Germain-en-Laye…
## $ Godfather_familial_relationship_to_father_of_child <chr> NA, NA, NA, NA, NA,…
## $ Godfather_familial_relationship_to_mother_of_child <chr> NA, NA, NA, NA, NA,…
## $ Godfather_Register_Signature                       <chr> "Yes", "Yes", "Yes"…
## $ Godmother_Name_FR                                  <chr> "Marie Conny", "Ann…
## $ Godmother_Forename_EN                              <chr> "Marie", "Anne", "H…
## $ Godmother_Surname_EN                               <chr> "Cormy", "Bagnall",…
## $ Godmother_Nationality_Stated                       <chr> NA, NA, "English", …
## $ Godmother_Nationality_Infer                        <chr> "Irish", "Irish", "…
## $ Godmother_Religion_Infer                           <chr> "Roman Catholic", "…
## $ Godmother_Rank                                     <chr> NA, NA, NA, "Wife o…
## $ Godmother_Occupation                               <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Occupation_Type                          <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Father_Name_FR                           <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Father_Forename_EN                       <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Father_Surname_EN                        <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Mother_Name_FR                           <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Mother_Forename_EN                       <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Mother_Surname_EN                        <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Residence                                <chr> "St Germain-en-Laye…
## $ Godmother_relationship_to_father_of_child          <chr> NA, NA, NA, NA, NA,…
## $ Godmother_relationship_to_mother_of_child          <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Register_Signature                       <chr> "Yes", "Yes", "Yes"…
## $ Officiating_Priest_Nationality                     <chr> "French", "French",…
## $ Additional_Notes                                   <chr> NA, NA, NA, NA, NA,…
## $ Archival_Reference                                 <chr> "5MI 1734 [1168921/…

Counting

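Using the count() command you can create fast summary statistics, for example using the Age_Ranges column within the burials dataset. Because Age_Ranges is a character column, the groups are listed in alphabetical rather than numerical order, which is why 11-15 appears before 6-10.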

burials %>% count(Age_Ranges)
## # A tibble: 16 × 2
##    Age_Ranges           n
##    <chr>            <int>
##  1 1-5                134
##  2 11-15               15
##  3 16-20               33
##  4 21-25               31
##  5 26-30               34
##  6 31-35               26
##  7 36-40               55
##  8 41-45               37
##  9 46-50               57
## 10 51-55               30
## 11 56-60               54
## 12 6-10                51
## 13 61-65               30
## 14 65 and older       167
## 15 Less than 1 year   146
## 16 <NA>                30
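
As a hedged sketch of the same idea on a tiny invented table (toy_burials and its values are assumptions for illustration), count() can also order the groups by size using its sort argument:

```r
library(dplyr)

# Hypothetical miniature version of the burials data
toy_burials <- tibble(
  Age_Ranges = c("1-5", "6-10", "1-5", NA, "11-15", "1-5")
)

# One row per distinct value, including NA
by_group <- toy_burials %>% count(Age_Ranges)
by_group

# Largest groups first
by_size <- toy_burials %>% count(Age_Ranges, sort = TRUE)
by_size
```

Sorting by size is often the quicker way to see which categories dominate a column.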

Filtering for Specifics - Nationality

As we are using the ‘Tidyverse’ library, the code needed to filter by a specific characteristic is very simple.

# This filters the burials and creates a new subset of Irish only burials

burials %>% filter(Nationality_Infer == "Irish") -> burials_irish

# You can see the difference in size
burials_irish
## # A tibble: 897 × 33
##    Type.of.Burial Name_FR Foren…¹ Surna…² Natio…³ Gender Relig…⁴ Age_F…⁵ Age_P…⁶
##    <chr>          <chr>   <chr>   <chr>   <chr>   <chr>  <chr>     <int> <chr>  
##  1 Charitable     Patric… Patrick MacCor… Irish   Male   Roman …       2 <NA>   
##  2 Charitable     Marie … Mary T… Maher   Irish   Female Roman …      NA 20 mon…
##  3 Charitable     Alexan… Alexan… Gordon  Irish   Male   Roman …      NA 18 mon…
##  4 Charitable     George… George… Willia… Irish   Male   Roman …       2 <NA>   
##  5 Charitable     Alexis… Alexis  MacLau… Irish   Male   Roman …       2 <NA>   
##  6 Charitable     Guilla… William O'Brien Irish   Male   Roman …       2 <NA>   
##  7 Charitable     Hanain… Hanain… MacDon… Irish   Female Roman …       3 <NA>   
##  8 Charitable     Jacque… Jacque  Baggott Irish   Male   Roman …      NA 18 mon…
##  9 Regular        Honora… Honora… Jennin… Irish   Female Roman …      NA 16 mon…
## 10 Regular        Jacque… Jacque  Carbery Irish   Male   Roman …       1 <NA>   
## # … with 887 more rows, 24 more variables: Age_Ranges <chr>, Occupation <chr>,
## #   Occupation_Type <chr>, Marital_Status <chr>, Spouse_Name_FR <chr>,
## #   Spouse_Forename_EN <chr>, Spouse_Surname_EN <chr>, Date_of_Burial <date>,
## #   Domicile_Inferred <chr>, Place_of_Burial <chr>, Street_Name <chr>,
## #   Father_Name_FR <chr>, Father_Forename_EN <chr>, Father_Surname_EN <chr>,
## #   Father_Nationality <chr>, Father_Domicile <chr>, Mother_Name_FR <chr>,
## #   Mother_Forename_EN <chr>, Mother_Surname_EN <chr>, …

Alternative Filtering Commands

These are the main filtering options.

The simple logical operators for the filter command are:

    == (equal to)

    != (not equal to)
    
    & (and)

    | (or)

    ! (not)
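
The operators listed above can be tried out on a small invented table. This is a minimal sketch; the table toy and its values are assumptions for illustration, with a column name borrowed from the burials data.

```r
library(dplyr)

# Hypothetical four-row table
toy <- tibble(
  Name = c("Patrick", "Mary", "Jacques", "Anne"),
  Nationality_Infer = c("Irish", "English", "French", NA),
  Gender = c("Male", "Female", "Male", "Female")
)

# != : everyone whose nationality is known and is not Irish (NA rows are dropped)
not_irish <- toy %>% filter(Nationality_Infer != "Irish")

# | : Irish or English
irish_or_english <- toy %>%
  filter(Nationality_Infer == "Irish" | Nationality_Infer == "English")

# & : male and not Irish
male_not_irish <- toy %>%
  filter(Gender == "Male" & Nationality_Infer != "Irish")

# ! : rows where the nationality is not missing
known <- toy %>% filter(!is.na(Nationality_Infer))
```

Note that filter() silently drops rows where the condition evaluates to NA, so a != comparison never returns rows with a missing value in that column.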

Exercise #1 - Advanced Filtering

Exercise 1. Filter the marriages for non-Irish brides and grooms. Create a separate file for each.

Creating yearly breakdowns of the data

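To count the number of baptisms per day, first count the baptisms by date and store the result in a new table. count() names its count column ‘n’ by default, so it is often useful to rename the columns to better represent what the data actually refers to. The code below starts from baptisms_irish, an Irish-only subset of baptisms created with filter() in the same way as burials_irish above.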

# Create a new dataframe of baptisms
baptisms_irish %>% count(Date_of_Baptism) -> baptisms_irish_dates

# Check the results, we now have a smaller dataframe showing all the Baptisms per day
baptisms_irish_dates
## # A tibble: 694 × 2
##    Date_of_Baptism     n
##    <date>          <int>
##  1 1689-03-22          1
##  2 1689-06-14          1
##  3 1689-10-28          1
##  4 1689-12-27          1
##  5 1690-02-04          1
##  6 1690-03-28          1
##  7 1690-04-03          1
##  8 1690-05-11          1
##  9 1690-05-30          1
## 10 1690-08-17          1
## # … with 684 more rows
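
The snippet above starts from baptisms_irish, which is not created in the steps shown. A minimal sketch of how such a subset could be built and counted follows. It assumes the subset is filtered on the Child_Nationality_Infer column, in the same way burials_irish was filtered on Nationality_Infer; the choice of column is an assumption, and the toy data is invented.

```r
library(dplyr)

# Hypothetical miniature version of the baptisms data
toy_baptisms <- tibble(
  Child_Nationality_Infer = c("Irish", "English", "Irish", "Irish"),
  Date_of_Baptism = as.Date(c("1689-03-22", "1689-03-22",
                              "1689-03-22", "1689-06-14"))
)

# Keep only the Irish baptisms (assumed filter column)
toy_baptisms %>%
  filter(Child_Nationality_Infer == "Irish") -> toy_baptisms_irish

# Count the baptisms on each date
toy_baptisms_irish %>% count(Date_of_Baptism) -> toy_baptisms_irish_dates

toy_baptisms_irish_dates
```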

Renaming the columns prevents confusion later on and makes for more accurate findings.

colnames(baptisms_irish_dates) <- c("Date", "No")

# Check the results
baptisms_irish_dates
## # A tibble: 694 × 2
##    Date          No
##    <date>     <int>
##  1 1689-03-22     1
##  2 1689-06-14     1
##  3 1689-10-28     1
##  4 1689-12-27     1
##  5 1690-02-04     1
##  6 1690-03-28     1
##  7 1690-04-03     1
##  8 1690-05-11     1
##  9 1690-05-30     1
## 10 1690-08-17     1
## # … with 684 more rows
class(baptisms_irish_dates$Date)
## [1] "Date"
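
Assigning to colnames() relies on the columns being in a known order. As a hedged alternative, dplyr's rename() or the name argument of count() refer to the columns explicitly. The sketch below uses invented toy data.

```r
library(dplyr)

# Hypothetical dated records
toy <- tibble(
  Date_of_Baptism = as.Date(c("1689-03-22", "1689-03-22", "1689-06-14"))
)

# Option 1: count first, then rename each column by name (new = old)
renamed <- toy %>%
  count(Date_of_Baptism) %>%
  rename(Date = Date_of_Baptism, No = n)

# Option 2: choose both column names inside count() itself
named <- toy %>% count(Date = Date_of_Baptism, name = "No")

renamed
named
```

Both options give a two-column table called Date and No, matching the result of the colnames() assignment above.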

Repeat this process for Marriages and Burials

marriages_irish %>% count(Date_of_Marriage) -> marriages_irish_dates

marriages_irish_dates
## # A tibble: 190 × 2
##    Date_of_Marriage     n
##    <date>           <int>
##  1 1690-08-18           1
##  2 1690-09-04           1
##  3 1690-11-29           1
##  4 1691-05-11           1
##  5 1691-08-08           1
##  6 1692-11-29           1
##  7 1694-04-01           1
##  8 1694-04-06           1
##  9 1694-05-08           1
## 10 1694-06-10           1
## # … with 180 more rows
colnames(marriages_irish_dates) <- c("Date", "No")

# Check the results
marriages_irish_dates
## # A tibble: 190 × 2
##    Date          No
##    <date>     <int>
##  1 1690-08-18     1
##  2 1690-09-04     1
##  3 1690-11-29     1
##  4 1691-05-11     1
##  5 1691-08-08     1
##  6 1692-11-29     1
##  7 1694-04-01     1
##  8 1694-04-06     1
##  9 1694-05-08     1
## 10 1694-06-10     1
## # … with 180 more rows
burials_irish %>% count(Date_of_Burial) -> burials_irish_dates

burials_irish_dates
## # A tibble: 853 × 2
##    Date_of_Burial     n
##    <date>         <int>
##  1 1689-01-06         1
##  2 1689-02-24         1
##  3 1689-04-28         1
##  4 1689-05-29         1
##  5 1689-07-30         1
##  6 1689-09-16         1
##  7 1689-11-06         1
##  8 1690-01-26         1
##  9 1690-03-09         1
## 10 1690-04-09         1
## # … with 843 more rows
colnames(burials_irish_dates) <- c("Date", "No")

# Check the results
burials_irish_dates
## # A tibble: 853 × 2
##    Date          No
##    <date>     <int>
##  1 1689-01-06     1
##  2 1689-02-24     1
##  3 1689-04-28     1
##  4 1689-05-29     1
##  5 1689-07-30     1
##  6 1689-09-16     1
##  7 1689-11-06     1
##  8 1690-01-26     1
##  9 1690-03-09     1
## 10 1690-04-09     1
## # … with 843 more rows

The package ‘lubridate’ was created to help group together dates with ease

To explain the process, there are a number of things happening. First we select the dataset we want to analyse, baptisms_irish_dates. Then group_by() combined with floor_date() rounds every date down to the first day of its year, so that all dates in the same year fall into one group. Finally, summarize() adds up the daily counts within each group.

# We want yearly data so group by year
baptisms_irish_dates %>% group_by(year=floor_date(Date, "year")) %>% summarize(No_of_Baptisms=sum(No)) -> baptisms_irish_yearly

baptisms_irish_yearly
## # A tibble: 46 × 2
##    year       No_of_Baptisms
##    <date>              <int>
##  1 1689-01-01              4
##  2 1690-01-01              9
##  3 1691-01-01              9
##  4 1692-01-01             27
##  5 1693-01-01             27
##  6 1694-01-01             20
##  7 1695-01-01             18
##  8 1696-01-01             35
##  9 1697-01-01             29
## 10 1698-01-01             37
## # … with 36 more rows
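
To see what floor_date() does on its own before it is used inside group_by(), here is a minimal sketch on three invented rows (toy_dates and its values are assumptions for illustration):

```r
library(dplyr)
library(lubridate)

# Hypothetical daily counts
toy_dates <- tibble(
  Date = as.Date(c("1689-03-22", "1689-06-14", "1690-02-04")),
  No = c(1L, 2L, 1L)
)

# Each date is rounded down to 1 January of its year
floored <- floor_date(toy_dates$Date, "year")
floored

# Group on the rounded date and add up the counts within each year
toy_dates %>%
  group_by(year = floor_date(Date, "year")) %>%
  summarize(No_of_Baptisms = sum(No)) -> toy_yearly

toy_yearly
```

Changing "year" to "month" in floor_date() would give monthly totals with the same pipeline.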

Repeat this process for Marriages and Burials

burials_irish_dates %>% group_by(year=floor_date(Date, "year")) %>% summarize(No_of_Burials=sum(No)) -> burials_irish_yearly

burials_irish_yearly
## # A tibble: 53 × 2
##    year       No_of_Burials
##    <date>             <int>
##  1 1689-01-01             7
##  2 1690-01-01             8
##  3 1691-01-01             8
##  4 1692-01-01            11
##  5 1693-01-01            37
##  6 1694-01-01            12
##  7 1695-01-01            17
##  8 1696-01-01            12
##  9 1697-01-01            15
## 10 1698-01-01            24
## # … with 43 more rows
marriages_irish_dates %>% group_by(year=floor_date(Date, "year")) %>% summarize(No_of_Marriages=sum(No)) -> marriages_irish_yearly

marriages_irish_yearly
## # A tibble: 36 × 2
##    year       No_of_Marriages
##    <date>               <int>
##  1 1690-01-01               3
##  2 1691-01-01               2
##  3 1692-01-01               1
##  4 1694-01-01               7
##  5 1695-01-01               7
##  6 1696-01-01               8
##  7 1697-01-01               8
##  8 1698-01-01              16
##  9 1699-01-01               5
## 10 1700-01-01              12
## # … with 26 more rows

Creating grouped totals for Baptisms, Burials and Marriages

The grouped years for each of the three dataframes do not match; this is a feature of historical data. So we need to add new rows for the years that are missing.

glimpse(marriages_irish_yearly)
## Rows: 36
## Columns: 2
## $ year            <date> 1690-01-01, 1691-01-01, 1692-01-01, 1694-01-01, 1695-…
## $ No_of_Marriages <int> 3, 2, 1, 7, 7, 8, 8, 16, 5, 12, 11, 10, 11, 6, 8, 7, 8…
glimpse(burials_irish_yearly)
## Rows: 53
## Columns: 2
## $ year          <date> 1689-01-01, 1690-01-01, 1691-01-01, 1692-01-01, 1693-01…
## $ No_of_Burials <int> 7, 8, 8, 11, 37, 12, 17, 12, 15, 24, 27, 31, 27, 19, 24,…
glimpse(baptisms_irish_yearly)
## Rows: 46
## Columns: 2
## $ year           <date> 1689-01-01, 1690-01-01, 1691-01-01, 1692-01-01, 1693-0…
## $ No_of_Baptisms <int> 4, 9, 9, 27, 27, 20, 18, 35, 29, 37, 51, 32, 35, 30, 33…

A complete timeline has 52 entries, one for each year from 1689 to 1740. The Burials dataset has 53 rows because it includes an extra row where the year is NA. Remove that with the na.omit() command.

burials_irish_yearly %>% na.omit() -> burials_irish_yearly
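
A minimal base R sketch of what na.omit() removes, using an invented three-row table (the values are assumptions for illustration):

```r
# Hypothetical yearly totals with one undated row
toy_yearly <- data.frame(
  year = as.Date(c("1689-01-01", "1690-01-01", NA)),
  No_of_Burials = c(7L, 8L, 2L)
)

nrow(toy_yearly)

# na.omit() drops every row that has an NA in any column
cleaned <- na.omit(toy_yearly)

nrow(cleaned)
sum(cleaned$No_of_Burials)
```

The records counted in the dropped row (burials with no usable date) no longer contribute to any total, which is worth keeping in mind when comparing yearly sums against the size of the full dataset.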

You can view any dataset using the view() command

burials_irish_yearly %>% view()

Adding Missing Years

Since we know that the marriages_irish_yearly totals are missing a number of years, we can add these quickly by creating a sequence covering our complete timeline.

# This creates a dataframe of all the years in our timeline
all_dates <- data.frame(year=seq(as.Date("1689-01-01"), by='year', length.out=52))

all_dates
##          year
## 1  1689-01-01
## 2  1690-01-01
## 3  1691-01-01
## 4  1692-01-01
## 5  1693-01-01
## 6  1694-01-01
## 7  1695-01-01
## 8  1696-01-01
## 9  1697-01-01
## 10 1698-01-01
## 11 1699-01-01
## 12 1700-01-01
## 13 1701-01-01
## 14 1702-01-01
## 15 1703-01-01
## 16 1704-01-01
## 17 1705-01-01
## 18 1706-01-01
## 19 1707-01-01
## 20 1708-01-01
## 21 1709-01-01
## 22 1710-01-01
## 23 1711-01-01
## 24 1712-01-01
## 25 1713-01-01
## 26 1714-01-01
## 27 1715-01-01
## 28 1716-01-01
## 29 1717-01-01
## 30 1718-01-01
## 31 1719-01-01
## 32 1720-01-01
## 33 1721-01-01
## 34 1722-01-01
## 35 1723-01-01
## 36 1724-01-01
## 37 1725-01-01
## 38 1726-01-01
## 39 1727-01-01
## 40 1728-01-01
## 41 1729-01-01
## 42 1730-01-01
## 43 1731-01-01
## 44 1732-01-01
## 45 1733-01-01
## 46 1734-01-01
## 47 1735-01-01
## 48 1736-01-01
## 49 1737-01-01
## 50 1738-01-01
## 51 1739-01-01
## 52 1740-01-01
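
The same sequence can be written with an explicit end date instead of length.out, which some readers find easier to check. A small base R sketch comparing the two forms (the variable names are illustrative):

```r
# 52 yearly steps starting at 1689, as in the code above
by_length <- seq(as.Date("1689-01-01"), by = "year", length.out = 52)

# The same timeline written with an explicit end date
by_end <- seq(as.Date("1689-01-01"), as.Date("1740-01-01"), by = "year")

length(by_length)
tail(by_length, 1)

# Both forms give the same dates
all(by_length == by_end)
```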

Using a command called anti_join() we can determine the missing years from the marriages_irish_yearly dataset

# Use anti_join() to create a dataframe of missing dates
anti_join(all_dates, marriages_irish_yearly, by="year") -> missing_dates

missing_dates
##          year
## 1  1689-01-01
## 2  1693-01-01
## 3  1719-01-01
## 4  1722-01-01
## 5  1723-01-01
## 6  1724-01-01
## 7  1725-01-01
## 8  1729-01-01
## 9  1730-01-01
## 10 1731-01-01
## 11 1734-01-01
## 12 1735-01-01
## 13 1737-01-01
## 14 1738-01-01
## 15 1739-01-01
## 16 1740-01-01
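
anti_join(x, y) returns the rows of x that have no match in y. A minimal sketch on an invented four-year timeline (all names and values here are assumptions for illustration):

```r
library(dplyr)

# Hypothetical complete timeline
all_years <- data.frame(
  year = as.Date(c("1689-01-01", "1690-01-01", "1691-01-01", "1692-01-01"))
)

# Hypothetical yearly totals with two years absent
recorded <- data.frame(
  year = as.Date(c("1690-01-01", "1692-01-01")),
  No_of_Marriages = c(3L, 1L)
)

# Years in the timeline that do not appear in the recorded totals
missing_years <- anti_join(all_years, recorded, by = "year")

missing_years
```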

Merging the missing dates into the marriages_irish_yearly data

# Now merge the two dataframes by the year column; all.x and all.y keep every row from both x and y
merge(marriages_irish_yearly, missing_dates, by = "year", all.x = TRUE, all.y = TRUE) -> marriages_irish_yearly

marriages_irish_yearly
##          year No_of_Marriages
## 1  1689-01-01              NA
## 2  1690-01-01               3
## 3  1691-01-01               2
## 4  1692-01-01               1
## 5  1693-01-01              NA
## 6  1694-01-01               7
## 7  1695-01-01               7
## 8  1696-01-01               8
## 9  1697-01-01               8
## 10 1698-01-01              16
## 11 1699-01-01               5
## 12 1700-01-01              12
## 13 1701-01-01              11
## 14 1702-01-01              10
## 15 1703-01-01              11
## 16 1704-01-01               6
## 17 1705-01-01               8
## 18 1706-01-01               7
## 19 1707-01-01               8
## 20 1708-01-01              10
## 21 1709-01-01               5
## 22 1710-01-01              14
## 23 1711-01-01               2
## 24 1712-01-01               2
## 25 1713-01-01               7
## 26 1714-01-01               3
## 27 1715-01-01               2
## 28 1716-01-01               4
## 29 1717-01-01               1
## 30 1718-01-01               1
## 31 1719-01-01              NA
## 32 1720-01-01               1
## 33 1721-01-01               2
## 34 1722-01-01              NA
## 35 1723-01-01              NA
## 36 1724-01-01              NA
## 37 1725-01-01              NA
## 38 1726-01-01               2
## 39 1727-01-01               1
## 40 1728-01-01               1
## 41 1729-01-01              NA
## 42 1730-01-01              NA
## 43 1731-01-01              NA
## 44 1732-01-01               2
## 45 1733-01-01               1
## 46 1734-01-01              NA
## 47 1735-01-01              NA
## 48 1736-01-01               1
## 49 1737-01-01              NA
## 50 1738-01-01              NA
## 51 1739-01-01              NA
## 52 1740-01-01              NA
# Remember to change the NA values to 0
marriages_irish_yearly[is.na(marriages_irish_yearly)] <- 0
# Check the Results
marriages_irish_yearly
##          year No_of_Marriages
## 1  1689-01-01               0
## 2  1690-01-01               3
## 3  1691-01-01               2
## 4  1692-01-01               1
## 5  1693-01-01               0
## 6  1694-01-01               7
## 7  1695-01-01               7
## 8  1696-01-01               8
## 9  1697-01-01               8
## 10 1698-01-01              16
## 11 1699-01-01               5
## 12 1700-01-01              12
## 13 1701-01-01              11
## 14 1702-01-01              10
## 15 1703-01-01              11
## 16 1704-01-01               6
## 17 1705-01-01               8
## 18 1706-01-01               7
## 19 1707-01-01               8
## 20 1708-01-01              10
## 21 1709-01-01               5
## 22 1710-01-01              14
## 23 1711-01-01               2
## 24 1712-01-01               2
## 25 1713-01-01               7
## 26 1714-01-01               3
## 27 1715-01-01               2
## 28 1716-01-01               4
## 29 1717-01-01               1
## 30 1718-01-01               1
## 31 1719-01-01               0
## 32 1720-01-01               1
## 33 1721-01-01               2
## 34 1722-01-01               0
## 35 1723-01-01               0
## 36 1724-01-01               0
## 37 1725-01-01               0
## 38 1726-01-01               2
## 39 1727-01-01               1
## 40 1728-01-01               1
## 41 1729-01-01               0
## 42 1730-01-01               0
## 43 1731-01-01               0
## 44 1732-01-01               2
## 45 1733-01-01               1
## 46 1734-01-01               0
## 47 1735-01-01               0
## 48 1736-01-01               1
## 49 1737-01-01               0
## 50 1738-01-01               0
## 51 1739-01-01               0
## 52 1740-01-01               0
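
The anti_join() and merge() steps above can also be collapsed into a single join. The following is a minimal sketch on invented toy data using dplyr's left_join() and coalesce(); it is a swapped-in alternative, not the method used above, and all names and values are assumptions for illustration.

```r
library(dplyr)

# Hypothetical complete timeline
all_years <- data.frame(
  year = as.Date(c("1689-01-01", "1690-01-01", "1691-01-01", "1692-01-01"))
)

# Hypothetical yearly totals with two years absent
recorded <- data.frame(
  year = as.Date(c("1690-01-01", "1692-01-01")),
  No_of_Marriages = c(3L, 1L)
)

# Keep every year of the timeline, then turn the unmatched years into 0
filled <- all_years %>%
  left_join(recorded, by = "year") %>%
  mutate(No_of_Marriages = coalesce(No_of_Marriages, 0L))

filled
```

One difference worth noting: assigning 0 through is.na() on the whole dataframe replaces NA in every column, whereas coalesce() only touches the column it is given.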

Exercise #2 - Add Missing dates for the remaining yearly totals

Exercise 2. Using the all_dates data.frame, repeat the process of adding missing dates for baptisms_irish_yearly.