J. Kavanagh
2023-02-25
This is an R Markdown presentation. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
All the code used to create this lesson plan is available as a document here.
When you open RStudio for the first time there are four windows, and you can rearrange these to your own preference. This also applies to the text colour, font type and size; it is very customisable for your needs. RStudio uses the memory of your computer and you need to set a Working Directory for your analysis. My advice is to create a specific folder and link RStudio to it using the Session -> Set Working Directory -> Choose Directory… option in the Menu Bar.
This lecture is designed to show you the potential for using R for spatial analysis when applied to historical questions.
You must have the following installed on your computer before we begin.
Open RStudio and type the following in the console:
getwd()
## [1] "/Users/jackkavanagh/Dropbox/R_SGL_Lessons"
Any files you are going to import into R need to be placed into this folder, and any files you export from R will also go to this folder. The good thing is that your files avoid being saved in some obscure folder on your hard drive: you choose where they live. For Windows users, the simplest place would be your Desktop.
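As a minimal sketch, the Working Directory can also be checked and set from the console rather than the menu. The folder below is a hypothetical placeholder created inside R's temporary directory so that the example is safe to run anywhere; in practice you would point setwd() at your own project folder.

```r
# Hypothetical project folder (an assumption for illustration only)
project_dir <- file.path(tempdir(), "R_SGL_Lessons")
dir.create(project_dir, showWarnings = FALSE)

# setwd() changes the Working Directory and returns the previous one
old_dir <- setwd(project_dir)

# Confirm where R is now reading from and writing to
getwd()

# List the files R can currently see in that folder
list.files()

# Return to the original Working Directory
setwd(old_dir)
```

Running getwd() before importing anything is a quick way to confirm that R is looking in the folder where the CSV files were saved.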
We are only using 3 libraries for our data analysis: Tidyverse, lubridate and ggthemes.
I already sent along a little script explaining how to import them but here is a recap.
install.packages(c("tidyverse", "ggthemes", "lubridate"))
This then tells RStudio to install these packages from a central repository. Once they’ve been installed you need to tell R to use them. You do this by using the library() command
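A short sketch of that loading step, assuming the three packages named above. The check for packages that are not yet installed is an addition for illustration and is not part of the original script.

```r
# The three packages used in this lesson
pkgs <- c("tidyverse", "lubridate", "ggthemes")

# Find any packages that are not installed yet (illustrative check)
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
not_installed

# Load every package that is installed.
# Typed interactively this is simply: library(tidyverse), library(lubridate), library(ggthemes)
for (p in setdiff(pkgs, not_installed)) {
  library(p, character.only = TRUE)
}
```

Packages only need to be installed once, but library() has to be run again at the start of every new R session.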
The %>% pipeline command will be used throughout this lecture to link various command queries
marriages %>% select(Groom_Occupation_Type)
The $ operator is used to access the individual columns of a dataframe. In RStudio, typing the dataframe name followed by $ lists its columns.
marriages$
Please run these now in the Console section of RStudio
The simple logical operators for the filter command are:
& (and)
| (or)
! (not)
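The three operators can be sketched with filter() on a small table. The `people` data frame and its values are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Hypothetical sample data, not taken from the database
people <- data.frame(
  name        = c("A", "B", "C", "D"),
  nationality = c("Irish", "English", "Irish", "Scottish"),
  gender      = c("Female", "Male", "Male", "Female")
)

# & (and): both conditions must be true
irish_men <- people %>% filter(nationality == "Irish" & gender == "Male")

# | (or): at least one condition must be true
irish_or_scottish <- people %>% filter(nationality == "Irish" | nationality == "Scottish")

# ! (not): reverses a condition
not_irish <- people %>% filter(!(nationality == "Irish"))

irish_men
irish_or_scottish
not_irish
```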
Using base R it is possible to join datasets either by row or column
## a b c
## 1 1 7 3
## 2 3 7 3
## 3 3 8 6
## 4 4 3 6
## 5 5 2 8
## [1] 11 14 17
# rbind() will join these datasets together as they are of equal length, stacking one atop the other
df_new <- rbind(df, df2)
df_new
## a b c
## 1 1 7 3
## 2 3 7 3
## 3 3 8 6
## 4 4 3 6
## 5 5 2 8
## 6 11 14 17
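The code that created `df` and `df2` is not shown above. A sketch that reproduces the printed output, assuming both objects were built directly from the values displayed, could look like this:

```r
# Recreate the dataframe from the values printed above
df <- data.frame(
  a = c(1, 3, 3, 4, 5),
  b = c(7, 7, 8, 3, 2),
  c = c(3, 3, 6, 6, 8)
)

# A vector with one value for each column of df
df2 <- c(11, 14, 17)

# rbind() appends df2 as a new sixth row
df_new <- rbind(df, df2)
df_new
```

rbind() works here because `df2` supplies exactly one value per column; a vector of a different length would not line up with the columns.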
This is an example of how to join dataframes by column
## a b c
## 1 1 7 3
## 2 3 7 3
## 3 3 8 6
## 4 4 3 6
## 5 5 2 8
## [1] 11 14 16 17 22
## a b c df2
## 1 1 7 3 11
## 2 3 7 3 14
## 3 3 8 6 16
## 4 4 3 6 17
## 5 5 2 8 22
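The column join above can be sketched in the same way. This assumes `df2` was a plain vector with one value for each row of `df`, which is consistent with the new column being named `df2` in the output.

```r
# Recreate the dataframe from the values printed above
df <- data.frame(
  a = c(1, 3, 3, 4, 5),
  b = c(7, 7, 8, 3, 2),
  c = c(3, 3, 6, 6, 8)
)

# A vector with one value for each row of df
df2 <- c(11, 14, 16, 17, 22)

# cbind() attaches df2 as a new column, named after the object
df_wide <- cbind(df, df2)
df_wide
```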
There are a number of different ways to adjust dates, however, as the data is structured we can use the mdy() command from the package ‘lubridate’ to make a relatively simple change.
# Create some sample dates
begin <- c("May 11, 1996", "September 12, 2001", "July 1, 1988")
end <- c("7/8/97","10/23/02","1/4/91")
class(begin)
## [1] "character"
## [1] "character"
## [1] "1996-05-11" "2001-09-12" "1988-07-01"
## [1] "1997-07-08" "2002-10-23" "1991-01-04"
## [1] "Date"
## [1] "Date"
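The conversion step that produces the output above is not shown. A minimal sketch, assuming mdy() is applied to both vectors, is given below. Month names are parsed according to the computer's language settings, so this assumes an English locale.

```r
library(lubridate)

# The sample dates from above, stored as text
begin <- c("May 11, 1996", "September 12, 2001", "July 1, 1988")
end   <- c("7/8/97", "10/23/02", "1/4/91")

# mdy() reads text written in month-day-year order and returns Dates
begin <- mdy(begin)
end   <- mdy(end)

begin
end

# Both vectors are now of class Date rather than character
class(begin)
class(end)
```

lubridate also provides dmy() and ymd() for text written in day-month-year or year-month-day order.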
The Saint Germain-en-Laye database is structured as a series of tables: Marriages, Burials, Baptisms, Abjurations and Signatures
I’ve exported these from Microsoft Excel as CSV (Comma Separated Values) files as these are easier to import and use an open format, meaning you don’t need to pay a subscription to Microsoft, Apple or Google to use them.
To import files into the R environment, you are ‘reading’ them into the system. Each part of the following code segment performs a separate task, similar to when importing a file into Excel or Word.
The first part is straightforward: you are reading the file ‘baptisms.csv’ into the system.
The second part tells the computer that this is character data and does not have a formal hierarchy. It would be different if we were importing data that had to be in a specific order; then we would import the file and specify stringsAsFactors=TRUE. There is a longer explanation with examples at this link.
Finally, the na.strings segment tells the computer to insert NA into all the blank cells within the data.
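To see what each argument does without needing the real files, here is a self-contained sketch that writes a tiny CSV to a temporary file and reads it back. The file contents and column names are hypothetical.

```r
# Write a small hypothetical CSV to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("Name,Nationality,Age",
             "Mary,Irish,20",
             "John,,31",
             "Anne,English,NA"), tmp)

# Read it back in, keeping text as character and treating blanks as NA
sample_data <- read.csv(tmp,
                        stringsAsFactors = FALSE,
                        na.strings = c("NA", " ", ""))

# The blank Nationality cell and the "NA" Age cell are both read as NA
sample_data

# Text columns are character, not factor
class(sample_data$Name)
```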
read.csv('baptisms.csv',
         stringsAsFactors = FALSE,
         na.strings = c("NA", " ", "")) -> baptisms
read.csv('burials.csv',
         stringsAsFactors = FALSE,
         na.strings = c("NA", " ", "")) -> burials
read.csv('marriages.csv',
         stringsAsFactors = FALSE,
         na.strings = c("NA", " ", "")) -> marriages
read.csv('abjurations.csv',
         stringsAsFactors = FALSE,
         na.strings = c("NA", " ", "")) -> abjurations
read.csv('signatures.csv',
         stringsAsFactors = FALSE,
         na.strings = c("NA", " ", "")) -> signatures
Once the files have been imported they are stored as data.frames, the R equivalent of an Excel spreadsheet. We can get an overview of the data we have just imported using the glimpse() command.
You can see the different classes of data, in this case everything is a chr or character. Numerical data is often written as int or integer. In the case of the Date of Baptism column we will need to change that to a Date type.
## Rows: 954
## Columns: 69
## $ Child_Name_FR <chr> "Therese Margueritt…
## $ Child_Forename_EN <chr> "Therese Margueritt…
## $ Child_Surname_EN <chr> "Melfort", "Macdonn…
## $ Child_Nationality_Infer <chr> "Irish", "Irish", "…
## $ Gender <chr> "Female", "Male", "…
## $ Child_Religion_Infer <chr> "Roman Catholic", "…
## $ Date_of_Birth <chr> "1689-01-08", "1689…
## $ Date_of_Birth_Partial <chr> NA, NA, NA, NA, NA,…
## $ Date_of_Baptism <chr> "1689-01-08", "1689…
## $ Place_of_Baptism <chr> "St Germain-en-Laye…
## $ Church_of_Baptismal_Ceremony <chr> "Parish church", "P…
## $ Street_of_Residence <chr> NA, NA, NA, NA, NA,…
## $ Parents_Status <chr> "Married", "Married…
## $ Father_Name_FR <chr> "Jean Comte de Melf…
## $ Father_Forename_EN <chr> "John", "Randall", …
## $ Father_Surname_EN <chr> "Melfort", "Macdonn…
## $ Father_Nationality_Stated <chr> NA, NA, NA, NA, NA,…
## $ Father_Nationality_Infer <chr> "Irish", "Irish", "…
## $ Father_Rank <chr> "Count", "Gentleman…
## $ Father_Occupation <chr> "Secretary of the r…
## $ Father_Occupation_Type <chr> "Noble", "Noble", "…
## $ Member_of_Jacobite_Army <chr> "Yes", NA, "Yes", N…
## $ Father_Residence <chr> "St Germain-en-Laye…
## $ Father_Register_Signature <chr> "No", "No", "No", "…
## $ Mother_Name_FR <chr> "Euphenia Vicalace"…
## $ Mother_Forenam_EN <chr> "Euphenia", "Hannah…
## $ Mother_Surname_EN <chr> "Vicalace", "Roche"…
## $ Mother_Nationality_Stated <chr> NA, NA, NA, "Englis…
## $ Mother_Nationality_Infer <chr> NA, "Irish", "Irish…
## $ Mother_Religion_Infer <chr> "Roman Catholic", "…
## $ Mother_Rank <chr> NA, NA, NA, NA, NA,…
## $ Mother_Occupation <chr> NA, NA, NA, NA, NA,…
## $ Mother_Occupation_Type <chr> NA, NA, NA, NA, NA,…
## $ Mother_Residence <chr> "St Germain-en-Laye…
## $ Mother_Register_Signature <chr> "No", "No", "No", "…
## $ Godfather_Name_FR <chr> "Jacques Comte de D…
## $ Godfather_Forename_EN <chr> "Jacque", NA, NA, "…
## $ Godfather_Surname_EN <chr> "Drummond", NA, "St…
## $ Godfather_Nationality_Stated <chr> NA, NA, "English", …
## $ Godfather_Nationality_Infer <chr> "Scottish", NA, "En…
## $ Godfather_Rank <chr> "Count", NA, "Lord"…
## $ Godfather_Occupation <chr> "Count", NA, "Lord"…
## $ Godfather_Occupation_Type <chr> "Noble", NA, "Noble…
## $ Godfather_Residence <chr> "St Germain-en-Laye…
## $ Godfather_familial_relationship_to_father_of_child <chr> NA, NA, NA, NA, NA,…
## $ Godfather_familial_relationship_to_mother_of_child <chr> NA, NA, NA, NA, NA,…
## $ Godfather_Register_Signature <chr> "Yes", "Yes", "Yes"…
## $ Godmother_Name_FR <chr> "Marie Conny", "Ann…
## $ Godmother_Forename_EN <chr> "Marie", "Anne", "H…
## $ Godmother_Surname_EN <chr> "Cormy", "Bagnall",…
## $ Godmother_Nationality_Stated <chr> NA, NA, "English", …
## $ Godmother_Nationality_Infer <chr> "Irish", "Irish", "…
## $ Godmother_Religion_Infer <chr> "Roman Catholic", "…
## $ Godmother_Rank <chr> NA, NA, NA, "Wife o…
## $ Godmother_Occupation <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Occupation_Type <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Father_Name_FR <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Father_Forename_EN <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Father_Surname_EN <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Mother_Name_FR <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Mother_Forename_EN <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Mother_Surname_EN <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Residence <chr> "St Germain-en-Laye…
## $ Godmother_relationship_to_father_of_child <chr> NA, NA, NA, NA, NA,…
## $ Godmother_relationship_to_mother_of_child <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Register_Signature <chr> "Yes", "Yes", "Yes"…
## $ Officiating_Priest_Nationality <chr> "French", "French",…
## $ Additional_Notes <chr> NA, NA, NA, NA, NA,…
## $ Archival_Reference <chr> "5MI 1734 [1168921/…
We are going to do two simple tasks here. The first is to change the data.frames into tibbles, the Tidyverse version of a data.frame. It is a relatively simple procedure and the result is a simpler structure.
baptisms <- as_tibble(baptisms)
burials <- as_tibble(burials)
marriages <- as_tibble(marriages)
abjurations <- as_tibble(abjurations)
The second task is to change the date information into a Date type; this will allow us to filter by day, month or year.
You’ll see that the Date_of_Baptism column has been changed to Date from chr
## Rows: 954
## Columns: 69
## $ Child_Name_FR <chr> "Therese Margueritt…
## $ Child_Forename_EN <chr> "Therese Margueritt…
## $ Child_Surname_EN <chr> "Melfort", "Macdonn…
## $ Child_Nationality_Infer <chr> "Irish", "Irish", "…
## $ Gender <chr> "Female", "Male", "…
## $ Child_Religion_Infer <chr> "Roman Catholic", "…
## $ Date_of_Birth <chr> "1689-01-08", "1689…
## $ Date_of_Birth_Partial <chr> NA, NA, NA, NA, NA,…
## $ Date_of_Baptism <date> 1689-01-08, 1689-0…
## $ Place_of_Baptism <chr> "St Germain-en-Laye…
## $ Church_of_Baptismal_Ceremony <chr> "Parish church", "P…
## $ Street_of_Residence <chr> NA, NA, NA, NA, NA,…
## $ Parents_Status <chr> "Married", "Married…
## $ Father_Name_FR <chr> "Jean Comte de Melf…
## $ Father_Forename_EN <chr> "John", "Randall", …
## $ Father_Surname_EN <chr> "Melfort", "Macdonn…
## $ Father_Nationality_Stated <chr> NA, NA, NA, NA, NA,…
## $ Father_Nationality_Infer <chr> "Irish", "Irish", "…
## $ Father_Rank <chr> "Count", "Gentleman…
## $ Father_Occupation <chr> "Secretary of the r…
## $ Father_Occupation_Type <chr> "Noble", "Noble", "…
## $ Member_of_Jacobite_Army <chr> "Yes", NA, "Yes", N…
## $ Father_Residence <chr> "St Germain-en-Laye…
## $ Father_Register_Signature <chr> "No", "No", "No", "…
## $ Mother_Name_FR <chr> "Euphenia Vicalace"…
## $ Mother_Forenam_EN <chr> "Euphenia", "Hannah…
## $ Mother_Surname_EN <chr> "Vicalace", "Roche"…
## $ Mother_Nationality_Stated <chr> NA, NA, NA, "Englis…
## $ Mother_Nationality_Infer <chr> NA, "Irish", "Irish…
## $ Mother_Religion_Infer <chr> "Roman Catholic", "…
## $ Mother_Rank <chr> NA, NA, NA, NA, NA,…
## $ Mother_Occupation <chr> NA, NA, NA, NA, NA,…
## $ Mother_Occupation_Type <chr> NA, NA, NA, NA, NA,…
## $ Mother_Residence <chr> "St Germain-en-Laye…
## $ Mother_Register_Signature <chr> "No", "No", "No", "…
## $ Godfather_Name_FR <chr> "Jacques Comte de D…
## $ Godfather_Forename_EN <chr> "Jacque", NA, NA, "…
## $ Godfather_Surname_EN <chr> "Drummond", NA, "St…
## $ Godfather_Nationality_Stated <chr> NA, NA, "English", …
## $ Godfather_Nationality_Infer <chr> "Scottish", NA, "En…
## $ Godfather_Rank <chr> "Count", NA, "Lord"…
## $ Godfather_Occupation <chr> "Count", NA, "Lord"…
## $ Godfather_Occupation_Type <chr> "Noble", NA, "Noble…
## $ Godfather_Residence <chr> "St Germain-en-Laye…
## $ Godfather_familial_relationship_to_father_of_child <chr> NA, NA, NA, NA, NA,…
## $ Godfather_familial_relationship_to_mother_of_child <chr> NA, NA, NA, NA, NA,…
## $ Godfather_Register_Signature <chr> "Yes", "Yes", "Yes"…
## $ Godmother_Name_FR <chr> "Marie Conny", "Ann…
## $ Godmother_Forename_EN <chr> "Marie", "Anne", "H…
## $ Godmother_Surname_EN <chr> "Cormy", "Bagnall",…
## $ Godmother_Nationality_Stated <chr> NA, NA, "English", …
## $ Godmother_Nationality_Infer <chr> "Irish", "Irish", "…
## $ Godmother_Religion_Infer <chr> "Roman Catholic", "…
## $ Godmother_Rank <chr> NA, NA, NA, "Wife o…
## $ Godmother_Occupation <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Occupation_Type <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Father_Name_FR <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Father_Forename_EN <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Father_Surname_EN <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Mother_Name_FR <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Mother_Forename_EN <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Mother_Surname_EN <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Residence <chr> "St Germain-en-Laye…
## $ Godmother_relationship_to_father_of_child <chr> NA, NA, NA, NA, NA,…
## $ Godmother_relationship_to_mother_of_child <chr> NA, NA, NA, NA, NA,…
## $ Godmother_Register_Signature <chr> "Yes", "Yes", "Yes"…
## $ Officiating_Priest_Nationality <chr> "French", "French",…
## $ Additional_Notes <chr> NA, NA, NA, NA, NA,…
## $ Archival_Reference <chr> "5MI 1734 [1168921/…
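The conversion code itself is not shown in the output above. One way it could be done, assuming the dates are stored as year-month-day text as in the glimpse() output, is with ymd() from lubridate. The small table and its surnames below are hypothetical stand-ins for the real baptisms data.

```r
library(lubridate)

# Hypothetical stand-in for the baptisms table
baptisms_sample <- data.frame(
  Child_Surname_EN = c("Surname_A", "Surname_B"),
  Date_of_Baptism  = c("1689-01-08", "1689-03-22"),
  stringsAsFactors = FALSE
)

# Before: the column is character
class(baptisms_sample$Date_of_Baptism)

# ymd() converts year-month-day text into a Date
baptisms_sample$Date_of_Baptism <- ymd(baptisms_sample$Date_of_Baptism)

# After: the column is a Date, so parts of it can be extracted
class(baptisms_sample$Date_of_Baptism)
year(baptisms_sample$Date_of_Baptism)
month(baptisms_sample$Date_of_Baptism)
```

Once a column is a Date, helpers such as year() and month() can be used inside filter() to select a particular period.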
As we are using the ‘Tidyverse’ library, the language used is very straightforward. Therefore in order to filter by a specific characteristic, the code is very simple.
# This filters the burials and creates a new subset of Irish only burials
burials %>% filter(Nationality_Infer == "Irish") -> burials_irish
# You can see the difference in size
burials_irish
## # A tibble: 897 × 33
## Type.of.Burial Name_FR Foren…¹ Surna…² Natio…³ Gender Relig…⁴ Age_F…⁵ Age_P…⁶
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int> <chr>
## 1 Charitable Patric… Patrick MacCor… Irish Male Roman … 2 <NA>
## 2 Charitable Marie … Mary T… Maher Irish Female Roman … NA 20 mon…
## 3 Charitable Alexan… Alexan… Gordon Irish Male Roman … NA 18 mon…
## 4 Charitable George… George… Willia… Irish Male Roman … 2 <NA>
## 5 Charitable Alexis… Alexis MacLau… Irish Male Roman … 2 <NA>
## 6 Charitable Guilla… William O'Brien Irish Male Roman … 2 <NA>
## 7 Charitable Hanain… Hanain… MacDon… Irish Female Roman … 3 <NA>
## 8 Charitable Jacque… Jacque Baggott Irish Male Roman … NA 18 mon…
## 9 Regular Honora… Honora… Jennin… Irish Female Roman … NA 16 mon…
## 10 Regular Jacque… Jacque Carbery Irish Male Roman … 1 <NA>
## # … with 887 more rows, 24 more variables: Age_Ranges <chr>, Occupation <chr>,
## # Occupation_Type <chr>, Marital_Status <chr>, Spouse_Name_FR <chr>,
## # Spouse_Forename_EN <chr>, Spouse_Surname_EN <chr>, Date_of_Burial <date>,
## # Domicile_Inferred <chr>, Place_of_Burial <chr>, Street_Name <chr>,
## # Father_Name_FR <chr>, Father_Forename_EN <chr>, Father_Surname_EN <chr>,
## # Father_Nationality <chr>, Father_Domicile <chr>, Mother_Name_FR <chr>,
## # Mother_Forename_EN <chr>, Mother_Surname_EN <chr>, …
# As the entries of baptisms are focused upon an individual child, be sure to place that first in your list, followed by the parents
baptisms %>% filter(Child_Nationality_Infer == "Irish" & Father_Nationality_Infer == "Irish" & Mother_Nationality_Infer == "Irish") -> baptisms_irish
# Check the results
baptisms_irish
## # A tibble: 727 × 69
## Child_Nam…¹ Child…² Child…³ Child…⁴ Gender Child…⁵ Date_…⁶ Date_…⁷ Date_of_…⁸
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <date>
## 1 Renald Mag… "Ronal… Macdon… Irish Male Roman … 1689-1… <NA> 1689-03-22
## 2 Henriette … "Henri… Dumbar… Irish Female Roman … 1689-0… <NA> 1689-06-14
## 3 Marie Brue… "Marie" Breret… Irish Female Roman … 1689-1… <NA> 1689-10-28
## 4 Louise Sau… "Louis… Sewell Irish Female Roman … 1689-1… <NA> 1689-12-27
## 5 Jacques Ad… "James" Adams Irish Male Roman … 1690-0… <NA> 1690-02-04
## 6 Anne Winif… "Anne … Strick… Irish Female Roman … 1690-0… <NA> 1690-03-28
## 7 Marie Ursu… "Mary … Riva Irish Female Roman … 1690-0… <NA> 1690-04-03
## 8 Marie Fitz… "Mary" Fitzpa… Irish Female Roman … 1690-0… <NA> 1690-05-11
## 9 Louis Jack… "Louis" Jackson Irish Male Roman … 1690-0… <NA> 1690-05-30
## 10 Louise Lam… "Louis… Lambert Irish Female Roman … 1690-0… <NA> 1690-08-17
## # … with 717 more rows, 60 more variables: Place_of_Baptism <chr>,
## # Church_of_Baptismal_Ceremony <chr>, Street_of_Residence <chr>,
## # Parents_Status <chr>, Father_Name_FR <chr>, Father_Forename_EN <chr>,
## # Father_Surname_EN <chr>, Father_Nationality_Stated <chr>,
## # Father_Nationality_Infer <chr>, Father_Rank <chr>, Father_Occupation <chr>,
## # Father_Occupation_Type <chr>, Member_of_Jacobite_Army <chr>,
## # Father_Residence <chr>, Father_Register_Signature <chr>, …
# Again as each marriage is a separate instance or event only use the Bride and Groom Nationality
marriages %>% filter(Groom_Nationality_Infer == "Irish" & Bride_Nationality_Infer == "Irish") -> marriages_irish
# Check the results
marriages_irish
## # A tibble: 192 × 76
## Groom_Name_FR Groom…¹ Groom…² Groom…³ Groom…⁴ Groom…⁵ Groom…⁶ Groom…⁷ Groom…⁸
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Aaron Huskin Aaron Huskin Marie … Mary Waldren No Jean H… "John"
## 2 Alain MacDon… Alan MacDon… Marie … Mary A… MacClo… No Daniel… "Danie…
## 3 Alexandre Ma… Alexan… MacCle… <NA> <NA> <NA> <NA> <NA> <NA>
## 4 Alexandre Pl… Alexan… Plunke… Jeanne… Jean O'Mara No Mauric… "Mauri…
## 5 André Barret Andrew Barrett Anne L… Anne Long No Jacque… "James"
## 6 Anthoine Omu… Anthony O'Mulr… Cather… Cather… Nixon No Jeremi… "Jerem…
## 7 Bernard Henry Bernard Henry Mabel … Mabel Nolan No Patric… "Patri…
## 8 Bernard Hues Bernard Hughes <NA> <NA> <NA> <NA> <NA> <NA>
## 9 Bernard Jolly Bernard Jolly Cecile… Cecille Reilly No Louis … "Louis"
## 10 Bernard Morp… Bernard Murphy Rose C… Rose Connol… No Terenc… "Teren…
## # … with 182 more rows, 67 more variables: Groom_Father_Surname_EN <chr>,
## # Groom_Father_Deceased <chr>, Groom_Age <int>, Widower <chr>,
## # Groom_Nationality_Stated <chr>, Groom_Nationality_Infer <chr>,
## # Groom_Religion_Infer <chr>, Groom_Rank <chr>, Groom_Occupation <chr>,
## # Groom_Occupation_Type <chr>, Groom_Residence <chr>,
## # Date_of_Marriage <date>, Place_of_Marriage <chr>, Church_of_Ceremony <chr>,
## # Bride_Name_FR <chr>, Bride_Forename_EN <chr>, Bride_Surname_EN <chr>, …
To count the number of baptisms, you first need to count the entries for each date and create a new table. Because count() stores its totals in a column named ‘n’, it is often useful to rename the columns to better represent what the data actually refers to.
# Create a new dataframe of baptisms
baptisms_irish %>% count(Date_of_Baptism) -> baptisms_irish_dates
# Check the results, we now have a smaller dataframe showing all the Baptisms per day
baptisms_irish_dates
## # A tibble: 694 × 2
## Date_of_Baptism n
## <date> <int>
## 1 1689-03-22 1
## 2 1689-06-14 1
## 3 1689-10-28 1
## 4 1689-12-27 1
## 5 1690-02-04 1
## 6 1690-03-28 1
## 7 1690-04-03 1
## 8 1690-05-11 1
## 9 1690-05-30 1
## 10 1690-08-17 1
## # … with 684 more rows
Renaming the columns prevents confusion later on and makes for more accurate findings.
## # A tibble: 694 × 2
## Date No
## <date> <int>
## 1 1689-03-22 1
## 2 1689-06-14 1
## 3 1689-10-28 1
## 4 1689-12-27 1
## 5 1690-02-04 1
## 6 1690-03-28 1
## 7 1690-04-03 1
## 8 1690-05-11 1
## 9 1690-05-30 1
## 10 1690-08-17 1
## # … with 684 more rows
## [1] "Date"
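The counting and renaming steps can be sketched on a toy table. The data here is hypothetical, and colnames() is only one of several ways to rename columns (it is the same function used later in this lesson to tidy the final table).

```r
library(dplyr)

# Hypothetical events: two baptisms on one day, one on another
events <- data.frame(
  Date_of_Baptism = as.Date(c("1689-03-22", "1689-03-22", "1689-06-14"))
)

# count() creates one row per date, with the total in a column called n
events_dates <- events %>% count(Date_of_Baptism)
events_dates

# Rename the columns to something more meaningful
colnames(events_dates) <- c("Date", "No")
events_dates

# Check that the Date column is still a Date
class(events_dates$Date)
```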
Repeat this process for Marriages and Burials
## # A tibble: 190 × 2
## Date_of_Marriage n
## <date> <int>
## 1 1690-08-18 1
## 2 1690-09-04 1
## 3 1690-11-29 1
## 4 1691-05-11 1
## 5 1691-08-08 1
## 6 1692-11-29 1
## 7 1694-04-01 1
## 8 1694-04-06 1
## 9 1694-05-08 1
## 10 1694-06-10 1
## # … with 180 more rows
## # A tibble: 190 × 2
## Date No
## <date> <int>
## 1 1690-08-18 1
## 2 1690-09-04 1
## 3 1690-11-29 1
## 4 1691-05-11 1
## 5 1691-08-08 1
## 6 1692-11-29 1
## 7 1694-04-01 1
## 8 1694-04-06 1
## 9 1694-05-08 1
## 10 1694-06-10 1
## # … with 180 more rows
## # A tibble: 853 × 2
## Date_of_Burial n
## <date> <int>
## 1 1689-01-06 1
## 2 1689-02-24 1
## 3 1689-04-28 1
## 4 1689-05-29 1
## 5 1689-07-30 1
## 6 1689-09-16 1
## 7 1689-11-06 1
## 8 1690-01-26 1
## 9 1690-03-09 1
## 10 1690-04-09 1
## # … with 843 more rows
## # A tibble: 853 × 2
## Date No
## <date> <int>
## 1 1689-01-06 1
## 2 1689-02-24 1
## 3 1689-04-28 1
## 4 1689-05-29 1
## 5 1689-07-30 1
## 6 1689-09-16 1
## 7 1689-11-06 1
## 8 1690-01-26 1
## 9 1690-03-09 1
## 10 1690-04-09 1
## # … with 843 more rows
To explain the process, there are a number of things happening. First we select the dataset we want to analyse, baptisms_irish_dates, and then we use the group_by() command, which allows for the creation of new groups within the Date information.
# You can select by different ranges, for example 3 months
baptisms_irish_dates %>% group_by(threemonth=floor_date(Date, "3 months")) %>% summarize(No_of_Baptisms=sum(No))
## # A tibble: 148 × 2
## threemonth No_of_Baptisms
## <date> <int>
## 1 1689-01-01 1
## 2 1689-04-01 1
## 3 1689-10-01 2
## 4 1690-01-01 2
## 5 1690-04-01 3
## 6 1690-07-01 2
## 7 1690-10-01 2
## 8 1691-01-01 5
## 9 1691-04-01 3
## 10 1691-10-01 1
## # … with 138 more rows
‘threemonth’ is the new column we’re creating and ‘3 months’ is the range we set. That groups the dates, but we also need to total the counts within each group; for that we use summarize() and tell it to sum the No column for every group.
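To make the grouping easier to follow, here is a sketch on four hypothetical daily counts, small enough to check the totals by hand. The `daily` table is an assumption for illustration only.

```r
library(dplyr)
library(lubridate)

# Hypothetical daily counts
daily <- data.frame(
  Date = as.Date(c("1689-01-08", "1689-02-20", "1689-05-02", "1690-07-15")),
  No   = c(1L, 2L, 1L, 3L)
)

# floor_date() rounds each date down to the start of its three-month period;
# summarize() then adds up the counts within each period
quarterly <- daily %>%
  group_by(threemonth = floor_date(Date, "3 months")) %>%
  summarize(No_of_Baptisms = sum(No))
quarterly

# The same pattern with "year" gives yearly totals
yearly <- daily %>%
  group_by(year = floor_date(Date, "year")) %>%
  summarize(No_of_Baptisms = sum(No))
yearly
```

The first two dates fall in the same three-month period, so their counts are added together, while all three 1689 dates are combined in the yearly version.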
# We want yearly data so group by year
baptisms_irish_dates %>% group_by(year=floor_date(Date, "year")) %>% summarize(No_of_Baptisms=sum(No)) -> baptisms_irish_yearly
baptisms_irish_yearly
## # A tibble: 46 × 2
## year No_of_Baptisms
## <date> <int>
## 1 1689-01-01 4
## 2 1690-01-01 9
## 3 1691-01-01 9
## 4 1692-01-01 27
## 5 1693-01-01 27
## 6 1694-01-01 20
## 7 1695-01-01 18
## 8 1696-01-01 35
## 9 1697-01-01 29
## 10 1698-01-01 37
## # … with 36 more rows
burials_irish_dates %>% group_by(year=floor_date(Date, "year")) %>% summarize(No_of_Burials=sum(No)) -> burials_irish_yearly
burials_irish_yearly
## # A tibble: 53 × 2
## year No_of_Burials
## <date> <int>
## 1 1689-01-01 7
## 2 1690-01-01 8
## 3 1691-01-01 8
## 4 1692-01-01 11
## 5 1693-01-01 37
## 6 1694-01-01 12
## 7 1695-01-01 17
## 8 1696-01-01 12
## 9 1697-01-01 15
## 10 1698-01-01 24
## # … with 43 more rows
marriages_irish_dates %>% group_by(year=floor_date(Date, "year")) %>% summarize(No_of_Marriages=sum(No)) -> marriages_irish_yearly
marriages_irish_yearly
## # A tibble: 36 × 2
## year No_of_Marriages
## <date> <int>
## 1 1690-01-01 3
## 2 1691-01-01 2
## 3 1692-01-01 1
## 4 1694-01-01 7
## 5 1695-01-01 7
## 6 1696-01-01 8
## 7 1697-01-01 8
## 8 1698-01-01 16
## 9 1699-01-01 5
## 10 1700-01-01 12
## # … with 26 more rows
The grouped years for each of the three dataframes do not match; this is a common feature of historical data. So we need to add new rows for the years that are blank.
## Rows: 36
## Columns: 2
## $ year <date> 1690-01-01, 1691-01-01, 1692-01-01, 1694-01-01, 1695-…
## $ No_of_Marriages <int> 3, 2, 1, 7, 7, 8, 8, 16, 5, 12, 11, 10, 11, 6, 8, 7, 8…
## Rows: 53
## Columns: 2
## $ year <date> 1689-01-01, 1690-01-01, 1691-01-01, 1692-01-01, 1693-01…
## $ No_of_Burials <int> 7, 8, 8, 11, 37, 12, 17, 12, 15, 24, 27, 31, 27, 19, 24,…
## Rows: 46
## Columns: 2
## $ year <date> 1689-01-01, 1690-01-01, 1691-01-01, 1692-01-01, 1693-0…
## $ No_of_Baptisms <int> 4, 9, 9, 27, 27, 20, 18, 35, 29, 37, 51, 32, 35, 30, 33…
There should be 52 entries in each; there is an extra NA row in the Burials dataset. So we need to add the additional empty rows for marriages and baptisms. You can create a full screen view of each table using the view() command.
I’ve already worked out all the empty rows for marriages and baptisms, but the principle is essentially that all the datasets must be the same length before they can be joined together.
# This adds new blank rows to the Marriages Yearly dataframe
marriages_irish_yearly %>% add_row(year = c(as.Date("1689-01-01"),
as.Date("1693-01-01"),
as.Date("1719-01-01"),
as.Date("1722-01-01"),
as.Date("1723-01-01"),
as.Date("1724-01-01"),
as.Date("1725-01-01"),
as.Date("1729-01-01"),
as.Date("1730-01-01"),
as.Date("1731-01-01"),
as.Date("1734-01-01"),
as.Date("1735-01-01"),
as.Date("1737-01-01"),
as.Date("1738-01-01"),
as.Date("1739-01-01"),
as.Date("1740-01-01"))) -> marriages_irish_yearly
# This adds new blank rows to the Baptisms Yearly dataframe
baptisms_irish_yearly %>% add_row(year = c(as.Date("1729-01-01"),
as.Date("1724-01-01"),
as.Date("1733-01-01"),
as.Date("1735-01-01"),
as.Date("1736-01-01"),
as.Date("1740-01-01"))) -> baptisms_irish_yearly
# As this introduces NAs, the following command replaces this with 0
marriages_irish_yearly[is.na(marriages_irish_yearly)] <- 0
# As this introduces NAs, the following command replaces this with 0
baptisms_irish_yearly[is.na(baptisms_irish_yearly)] <- 0
# Occasionally you'll need to specify the column as well
baptisms_irish_yearly$No_of_Baptisms[is.na(baptisms_irish_yearly$No_of_Baptisms)] <- 0
Repeat the process to create the marriages_irish_yearly totals
marriages_irish_dates %>% group_by(year=floor_date(Date, "year")) %>% summarize(No_of_Marriages=sum(No)) -> marriages_irish_yearly
marriages_irish_yearly
## # A tibble: 36 × 2
## year No_of_Marriages
## <date> <int>
## 1 1690-01-01 3
## 2 1691-01-01 2
## 3 1692-01-01 1
## 4 1694-01-01 7
## 5 1695-01-01 7
## 6 1696-01-01 8
## 7 1697-01-01 8
## 8 1698-01-01 16
## 9 1699-01-01 5
## 10 1700-01-01 12
## # … with 26 more rows
Since we know that the marriages_irish_yearly totals are missing a number of years, we can add these quickly by creating a sequence of our complete timeline.
# This creates a dataframe of all the years in our timeline
all_dates <- data.frame(year=seq(as.Date("1689-01-01"), by='year', length.out=52))
all_dates
## year
## 1 1689-01-01
## 2 1690-01-01
## 3 1691-01-01
## 4 1692-01-01
## 5 1693-01-01
## 6 1694-01-01
## 7 1695-01-01
## 8 1696-01-01
## 9 1697-01-01
## 10 1698-01-01
## 11 1699-01-01
## 12 1700-01-01
## 13 1701-01-01
## 14 1702-01-01
## 15 1703-01-01
## 16 1704-01-01
## 17 1705-01-01
## 18 1706-01-01
## 19 1707-01-01
## 20 1708-01-01
## 21 1709-01-01
## 22 1710-01-01
## 23 1711-01-01
## 24 1712-01-01
## 25 1713-01-01
## 26 1714-01-01
## 27 1715-01-01
## 28 1716-01-01
## 29 1717-01-01
## 30 1718-01-01
## 31 1719-01-01
## 32 1720-01-01
## 33 1721-01-01
## 34 1722-01-01
## 35 1723-01-01
## 36 1724-01-01
## 37 1725-01-01
## 38 1726-01-01
## 39 1727-01-01
## 40 1728-01-01
## 41 1729-01-01
## 42 1730-01-01
## 43 1731-01-01
## 44 1732-01-01
## 45 1733-01-01
## 46 1734-01-01
## 47 1735-01-01
## 48 1736-01-01
## 49 1737-01-01
## 50 1738-01-01
## 51 1739-01-01
## 52 1740-01-01
Using a command called anti_join() we can determine the missing years from the marriages_irish_yearly dataset
# Use anti_join() to create a dataframe of missing dates
anti_join(all_dates, marriages_irish_yearly, by="year") -> missing_dates
missing_dates
## year
## 1 1689-01-01
## 2 1693-01-01
## 3 1719-01-01
## 4 1722-01-01
## 5 1723-01-01
## 6 1724-01-01
## 7 1725-01-01
## 8 1729-01-01
## 9 1730-01-01
## 10 1731-01-01
## 11 1734-01-01
## 12 1735-01-01
## 13 1737-01-01
## 14 1738-01-01
## 15 1739-01-01
## 16 1740-01-01
# Now merge the two dataframes together by the year column, remember to include all the values for x and y values
merge(marriages_irish_yearly, missing_dates, by="year", all.y = T, all.x = T) -> marriages_irish_yearly
marriages_irish_yearly
## year No_of_Marriages
## 1 1689-01-01 NA
## 2 1690-01-01 3
## 3 1691-01-01 2
## 4 1692-01-01 1
## 5 1693-01-01 NA
## 6 1694-01-01 7
## 7 1695-01-01 7
## 8 1696-01-01 8
## 9 1697-01-01 8
## 10 1698-01-01 16
## 11 1699-01-01 5
## 12 1700-01-01 12
## 13 1701-01-01 11
## 14 1702-01-01 10
## 15 1703-01-01 11
## 16 1704-01-01 6
## 17 1705-01-01 8
## 18 1706-01-01 7
## 19 1707-01-01 8
## 20 1708-01-01 10
## 21 1709-01-01 5
## 22 1710-01-01 14
## 23 1711-01-01 2
## 24 1712-01-01 2
## 25 1713-01-01 7
## 26 1714-01-01 3
## 27 1715-01-01 2
## 28 1716-01-01 4
## 29 1717-01-01 1
## 30 1718-01-01 1
## 31 1719-01-01 NA
## 32 1720-01-01 1
## 33 1721-01-01 2
## 34 1722-01-01 NA
## 35 1723-01-01 NA
## 36 1724-01-01 NA
## 37 1725-01-01 NA
## 38 1726-01-01 2
## 39 1727-01-01 1
## 40 1728-01-01 1
## 41 1729-01-01 NA
## 42 1730-01-01 NA
## 43 1731-01-01 NA
## 44 1732-01-01 2
## 45 1733-01-01 1
## 46 1734-01-01 NA
## 47 1735-01-01 NA
## 48 1736-01-01 1
## 49 1737-01-01 NA
## 50 1738-01-01 NA
## 51 1739-01-01 NA
## 52 1740-01-01 NA
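The whole gap-filling routine can be sketched on a five-year window, using the first three marriage totals shown above. The object names `yearly_sample` and `filled` are hypothetical, and the final step replaces the NAs with 0 in the same way as the earlier manual approach.

```r
library(dplyr)

# A complete timeline of five years
all_dates <- data.frame(
  year = seq(as.Date("1689-01-01"), by = "year", length.out = 5)
)

# Yearly totals with two years missing (1689 and 1693)
yearly_sample <- data.frame(
  year            = as.Date(c("1690-01-01", "1691-01-01", "1692-01-01")),
  No_of_Marriages = c(3L, 2L, 1L)
)

# anti_join() keeps the years in all_dates that have no match in yearly_sample
missing_dates <- anti_join(all_dates, yearly_sample, by = "year")
missing_dates

# merge() with all = TRUE keeps every year from both tables
filled <- merge(yearly_sample, missing_dates, by = "year", all = TRUE)

# The added years have no total yet, so replace NA with 0
filled$No_of_Marriages[is.na(filled$No_of_Marriages)] <- 0
filled
```

This avoids typing each missing year by hand, which becomes error-prone as the timeline grows.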
# The baptisms and burials yearly totals matched and were merged into a new grouped table using the command inner_join()
inner_join(baptisms_irish_yearly, burials_irish_yearly, by="year") -> births_deaths_yearly
births_deaths_yearly
## # A tibble: 52 × 3
## year No_of_Baptisms No_of_Burials
## <date> <dbl> <int>
## 1 1689-01-01 4 7
## 2 1690-01-01 9 8
## 3 1691-01-01 9 8
## 4 1692-01-01 27 11
## 5 1693-01-01 27 37
## 6 1694-01-01 20 12
## 7 1695-01-01 18 17
## 8 1696-01-01 35 12
## 9 1697-01-01 29 15
## 10 1698-01-01 37 24
## # … with 42 more rows
# Add the Marriages yearly totals to the new grouped totals
inner_join(births_deaths_yearly, marriages_irish_yearly, by="year") -> yearly_births_deaths_marriages
yearly_births_deaths_marriages
## # A tibble: 52 × 4
## year No_of_Baptisms No_of_Burials No_of_Marriages
## <date> <dbl> <int> <dbl>
## 1 1689-01-01 4 7 0
## 2 1690-01-01 9 8 3
## 3 1691-01-01 9 8 2
## 4 1692-01-01 27 11 1
## 5 1693-01-01 27 37 0
## 6 1694-01-01 20 12 7
## 7 1695-01-01 18 17 7
## 8 1696-01-01 35 12 8
## 9 1697-01-01 29 15 8
## 10 1698-01-01 37 24 16
## # … with 42 more rows
# Tidy the column names
colnames(yearly_births_deaths_marriages) <- c("Year", "Baptisms", "Burials", "Marriages")
yearly_births_deaths_marriages
## # A tibble: 52 × 4
## Year Baptisms Burials Marriages
## <date> <dbl> <int> <dbl>
## 1 1689-01-01 4 7 0
## 2 1690-01-01 9 8 3
## 3 1691-01-01 9 8 2
## 4 1692-01-01 27 11 1
## 5 1693-01-01 27 37 0
## 6 1694-01-01 20 12 7
## 7 1695-01-01 18 17 7
## 8 1696-01-01 35 12 8
## 9 1697-01-01 29 15 8
## 10 1698-01-01 37 24 16
## # … with 42 more rows
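The reason every table had to cover the same years before joining can be seen in a small sketch. inner_join() only keeps rows whose key appears in both tables, so any year missing from one of them would silently disappear. The two toy tables below use the first few yearly totals shown above; their object names are hypothetical.

```r
library(dplyr)

# Baptism totals for three years
baptisms_sample <- data.frame(
  year           = as.Date(c("1689-01-01", "1690-01-01", "1691-01-01")),
  No_of_Baptisms = c(4, 9, 9)
)

# Marriage totals with 1689 missing
marriages_sample <- data.frame(
  year            = as.Date(c("1690-01-01", "1691-01-01")),
  No_of_Marriages = c(3, 2)
)

# Only 1690 and 1691 appear in both tables, so 1689 is dropped
joined <- inner_join(baptisms_sample, marriages_sample, by = "year")
joined
```

Padding the shorter table with zero rows first, as was done above, keeps all of the years in the joined result.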
At the moment, the grouped births, deaths and marriages table is in wide format, we need to change it to long data.
# This data is in a 'wide' format; for effective charts it needs to be re-adjusted to 'long' data.
# The melt() function from the 'reshape2' package does this, so load that package first.
library(reshape2)
yearly_long <- melt(yearly_births_deaths_marriages, id.vars = "Year")
# As you can see it's been re-ordered
yearly_long
## Year variable value
## 1 1689-01-01 Baptisms 4
## 2 1690-01-01 Baptisms 9
## 3 1691-01-01 Baptisms 9
## 4 1692-01-01 Baptisms 27
## 5 1693-01-01 Baptisms 27
## 6 1694-01-01 Baptisms 20
## 7 1695-01-01 Baptisms 18
## 8 1696-01-01 Baptisms 35
## 9 1697-01-01 Baptisms 29
## 10 1698-01-01 Baptisms 37
## 11 1699-01-01 Baptisms 51
## 12 1700-01-01 Baptisms 32
## 13 1701-01-01 Baptisms 35
## 14 1702-01-01 Baptisms 30
## 15 1703-01-01 Baptisms 33
## 16 1704-01-01 Baptisms 27
## 17 1705-01-01 Baptisms 20
## 18 1706-01-01 Baptisms 24
## 19 1707-01-01 Baptisms 30
## 20 1708-01-01 Baptisms 35
## 21 1709-01-01 Baptisms 26
## 22 1710-01-01 Baptisms 16
## 23 1711-01-01 Baptisms 25
## 24 1712-01-01 Baptisms 25
## 25 1713-01-01 Baptisms 10
## 26 1714-01-01 Baptisms 17
## 27 1715-01-01 Baptisms 9
## 28 1716-01-01 Baptisms 8
## 29 1717-01-01 Baptisms 8
## 30 1718-01-01 Baptisms 8
## 31 1719-01-01 Baptisms 6
## 32 1720-01-01 Baptisms 4
## 33 1721-01-01 Baptisms 4
## 34 1722-01-01 Baptisms 3
## 35 1723-01-01 Baptisms 5
## 36 1725-01-01 Baptisms 3
## 37 1726-01-01 Baptisms 1
## 38 1727-01-01 Baptisms 3
## 39 1728-01-01 Baptisms 2
## 40 1730-01-01 Baptisms 1
## 41 1731-01-01 Baptisms 1
## 42 1732-01-01 Baptisms 1
## 43 1734-01-01 Baptisms 1
## 44 1737-01-01 Baptisms 2
## 45 1738-01-01 Baptisms 2
## 46 1739-01-01 Baptisms 4
## 47 1729-01-01 Baptisms 0
## 48 1724-01-01 Baptisms 0
## 49 1733-01-01 Baptisms 0
## 50 1735-01-01 Baptisms 0
## 51 1736-01-01 Baptisms 0
## 52 1740-01-01 Baptisms 0
## 53 1689-01-01 Burials 7
## 54 1690-01-01 Burials 8
## 55 1691-01-01 Burials 8
## 56 1692-01-01 Burials 11
## 57 1693-01-01 Burials 37
## 58 1694-01-01 Burials 12
## 59 1695-01-01 Burials 17
## 60 1696-01-01 Burials 12
## 61 1697-01-01 Burials 15
## 62 1698-01-01 Burials 24
## 63 1699-01-01 Burials 27
## 64 1700-01-01 Burials 31
## 65 1701-01-01 Burials 27
## 66 1702-01-01 Burials 19
## 67 1703-01-01 Burials 24
## 68 1704-01-01 Burials 27
## 69 1705-01-01 Burials 9
## 70 1706-01-01 Burials 12
## 71 1707-01-01 Burials 17
## 72 1708-01-01 Burials 41
## 73 1709-01-01 Burials 39
## 74 1710-01-01 Burials 27
## 75 1711-01-01 Burials 39
## 76 1712-01-01 Burials 41
## 77 1713-01-01 Burials 21
## 78 1714-01-01 Burials 16
## 79 1715-01-01 Burials 18
## 80 1716-01-01 Burials 23
## 81 1717-01-01 Burials 20
## 82 1718-01-01 Burials 15
## 83 1719-01-01 Burials 19
## 84 1720-01-01 Burials 13
## 85 1721-01-01 Burials 11
## 86 1722-01-01 Burials 20
## 87 1723-01-01 Burials 16
## 88 1725-01-01 Burials 13
## 89 1726-01-01 Burials 11
## 90 1727-01-01 Burials 16
## 91 1728-01-01 Burials 8
## 92 1730-01-01 Burials 3
## 93 1731-01-01 Burials 5
## 94 1732-01-01 Burials 15
## 95 1734-01-01 Burials 8
## 96 1737-01-01 Burials 5
## 97 1738-01-01 Burials 11
## 98 1739-01-01 Burials 9
## 99 1729-01-01 Burials 14
## 100 1724-01-01 Burials 14
## 101 1733-01-01 Burials 9
## 102 1735-01-01 Burials 14
## 103 1736-01-01 Burials 7
## 104 1740-01-01 Burials 11
## 105 1689-01-01 Marriages 0
## 106 1690-01-01 Marriages 3
## 107 1691-01-01 Marriages 2
## 108 1692-01-01 Marriages 1
## 109 1693-01-01 Marriages 0
## 110 1694-01-01 Marriages 7
## 111 1695-01-01 Marriages 7
## 112 1696-01-01 Marriages 8
## 113 1697-01-01 Marriages 8
## 114 1698-01-01 Marriages 16
## 115 1699-01-01 Marriages 5
## 116 1700-01-01 Marriages 12
## 117 1701-01-01 Marriages 11
## 118 1702-01-01 Marriages 10
## 119 1703-01-01 Marriages 11
## 120 1704-01-01 Marriages 6
## 121 1705-01-01 Marriages 8
## 122 1706-01-01 Marriages 7
## 123 1707-01-01 Marriages 8
## 124 1708-01-01 Marriages 10
## 125 1709-01-01 Marriages 5
## 126 1710-01-01 Marriages 14
## 127 1711-01-01 Marriages 2
## 128 1712-01-01 Marriages 2
## 129 1713-01-01 Marriages 7
## 130 1714-01-01 Marriages 3
## 131 1715-01-01 Marriages 2
## 132 1716-01-01 Marriages 4
## 133 1717-01-01 Marriages 1
## 134 1718-01-01 Marriages 1
## 135 1719-01-01 Marriages 0
## 136 1720-01-01 Marriages 1
## 137 1721-01-01 Marriages 2
## 138 1722-01-01 Marriages 0
## 139 1723-01-01 Marriages 0
## 140 1725-01-01 Marriages 0
## 141 1726-01-01 Marriages 2
## 142 1727-01-01 Marriages 1
## 143 1728-01-01 Marriages 1
## 144 1730-01-01 Marriages 0
## 145 1731-01-01 Marriages 0
## 146 1732-01-01 Marriages 2
## 147 1734-01-01 Marriages 0
## 148 1737-01-01 Marriages 0
## 149 1738-01-01 Marriages 0
## 150 1739-01-01 Marriages 0
## 151 1729-01-01 Marriages 0
## 152 1724-01-01 Marriages 0
## 153 1733-01-01 Marriages 1
## 154 1735-01-01 Marriages 0
## 155 1736-01-01 Marriages 1
## 156 1740-01-01 Marriages 0
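The same wide-to-long reshaping can also be sketched with pivot_longer() from tidyr, which is loaded with the tidyverse. This is a minimal illustration rather than the method used in this lesson: yearly_wide and yearly_long_sketch are made-up names, and the three rows are copied from the first three years of the output above.

```r
library(tidyr)
library(tibble)

# Illustrative wide table: one column per register (values taken from the output above)
yearly_wide <- tibble(
  Year      = as.Date(c("1689-01-01", "1690-01-01", "1691-01-01")),
  Baptisms  = c(4, 9, 9),
  Burials   = c(7, 8, 8),
  Marriages = c(0, 3, 2)
)

# Gather every column except Year into variable/value pairs
yearly_long_sketch <- pivot_longer(
  yearly_wide,
  cols      = -Year,
  names_to  = "variable",
  values_to = "value"
)

yearly_long_sketch
```

Unlike melt(), pivot_longer() keeps the rows grouped by year rather than by register, and it returns variable as a character column rather than a factor. Neither difference should matter for the plots that follow.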
This is just a basic plot without any background or title. For a good introduction to using ggplot2, I recommend the introduction written by Chris Brunsdon here.
yearly_long %>% ggplot(aes(x=Year, y=value, colour=variable, group=variable)) + geom_line(linewidth=0.8)
This is a fully labelled and marked-up graph; notice that you can edit nearly every aspect.
yearly_long %>% ggplot(aes(x=Year, y=value, colour=variable, group=variable)) +
  geom_line(linewidth=0.8) +
  labs(title = "Irish Baptisms, Burials and Marriages 1689-1740",
       subtitle = "(St. Germain-en-Laye)",
       tag = "Figure 1",
       caption = "Database of St. Germain-en-Laye Registers",
       x = "Year",
       y = "No.",
       color = "") +
  scale_color_colorblind() +
  theme_classic() +
  theme(axis.text.x = element_text(colour = "darkslategrey", size = 16),
        axis.text.y = element_text(colour = "darkslategrey", size = 16),
        legend.background = element_rect(fill = "white", linewidth = 4, colour = "white"),
        legend.justification = c(0, 1),
        legend.position = c(0.9, 1),
        text = element_text(family = "Georgia"),
        plot.title = element_text(size = 18, margin = margin(b = 10)),
        plot.subtitle = element_text(size = 12, color = "darkslategrey", margin = margin(b = 25)),
        plot.caption = element_text(size = 8, margin = margin(t = 10), color = "grey70", hjust = 0))
Marriages has no Gender column, so one must be created by counting the number of rows in the dataframe.
# This creates a one-row table holding the number of rows in the marriages dataset, which is 274, the total number of marriages
marriages %>% count() -> marriages_gender
# Next we add a new column called Gender and set its value to Female
marriages_gender$Gender <- c("Female")
# Assigning NULL removes the n column
marriages_gender$n <- NULL
# This adds the value of 274 in a new Marriages column
marriages_gender$Marriages <- c(274)
# This adds a new row for Male
marriages_gender %>% add_row(Gender = c("Male")) -> marriages_gender
# This replaces the NA that was added in the Marriages column of the Male row with 274
marriages_gender[is.na(marriages_gender)] <- 274
## # A tibble: 2 × 2
## Gender Marriages
## <chr> <dbl>
## 1 Female 274
## 2 Male 274
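The same two-row table can be built in a single step, without hard-coding 274. The sketch below assumes that every marriage record involves exactly one bride and one groom; marriages_toy and marriages_gender_sketch are hypothetical names, and the occupation values are invented stand-ins for the real marriages data.

```r
library(tibble)

# Hypothetical stand-in for the marriages table: one row per marriage
marriages_toy <- tibble(Groom_Occupation_Type = c("Soldier", "Servant", "Merchant"))

# One bride and one groom are assumed per record, so both totals equal the row count
marriages_gender_sketch <- tibble(
  Gender    = c("Female", "Male"),
  Marriages = nrow(marriages_toy)
)

marriages_gender_sketch
```

Using nrow() means the totals update automatically if rows are added to or removed from the marriages data.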
Because the other registers in the SGL database all contain a Gender column, creating their gender totals is straightforward.
burials %>% count(Gender) -> burials_gender
colnames(burials_gender) <- c("Gender", "Deaths")
baptisms %>% count(Gender) -> baptisms_gender
colnames(baptisms_gender) <- c("Gender", "Baptisms")
abjurations %>% count(Gender) -> abjurations_gender
colnames(abjurations_gender) <- c("Gender", "Abjurations")
Merge these into a single dataframe in stages.
inner_join(burials_gender, baptisms_gender, by="Gender") -> burials_baptisms_gender
inner_join(marriages_gender, abjurations_gender, by="Gender") -> marriages_abjurations_gender
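inner_join(burials_baptisms_gender, marriages_abjurations_gender, by="Gender") -> registers_gender
# Reshape to long format for plotting; with no id.vars given, melt() picks Gender and reports it
registers_gender_long <- melt(registers_gender)
## Using Gender as id variables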
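The counting, joining and reshaping steps can be sketched end to end on toy data. The message `Using Gender as id variables` is what reshape2's melt() prints when id.vars is omitted; the sketch names id.vars explicitly, which silences it. All object names ending in _toy or _n are hypothetical, and the toy rows are invented.

```r
library(dplyr)
library(reshape2)

# Hypothetical toy registers; only the Gender column matters here
burials_toy  <- tibble::tibble(Gender = c("Female", "Male", "Male"))
baptisms_toy <- tibble::tibble(Gender = c("Female", "Female", "Male"))

# count() with name = sets the count column's name directly, so colnames() is not needed
burials_n  <- count(burials_toy, Gender, name = "Deaths")
baptisms_n <- count(baptisms_toy, Gender, name = "Baptisms")

# inner_join() keeps only Gender values present in both tables
registers_toy <- inner_join(burials_n, baptisms_n, by = "Gender")

# Naming id.vars explicitly avoids the "Using Gender as id variables" message
registers_toy_long <- melt(as.data.frame(registers_toy), id.vars = "Gender")

registers_toy_long
```

Note that inner_join() silently drops any Gender value that is missing from one of the tables, so it is worth checking the counts if a register contains unrecorded genders.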
registers_gender_long %>% ggplot(aes(x=Gender, y=value)) +
  geom_bar(aes(fill=variable), position = "dodge", stat = "identity", width = 0.5) +
  labs(title = "Gender Breakdown of Registers, 1689-1740",
       subtitle = "(St. Germain-en-Laye)",
       tag = "Figure 2",
       caption = "Database of St. Germain-en-Laye Registers",
       x = "Gender",
       y = "No.",
       fill = "Registers") +
  scale_fill_colorblind() +
  theme_classic() +
  # element_rect() takes linewidth; its size argument is deprecated as of ggplot2 3.4.0
  theme(axis.text.x = element_text(colour = "darkslategrey", size = 16),
        axis.text.y = element_text(colour = "darkslategrey", size = 16),
        legend.background = element_rect(fill = "white", linewidth = 4, colour = "white"),
        legend.justification = c(0, 1),
        legend.position = c(0.9, 1),
        text = element_text(family = "Georgia"),
        plot.title = element_text(size = 18, margin = margin(b = 10)),
        plot.subtitle = element_text(size = 12, color = "darkslategrey", margin = margin(b = 25)),
        plot.caption = element_text(size = 8, margin = margin(t = 10), color = "grey70", hjust = 0))