J. Kavanagh
2022-07-14
This lecture is designed to show you the potential for using R for spatial analysis when applied to historical questions.
You must have the following installed on your computer before we begin.
Open RStudio and type getwd() in the console to display your current working directory
## [1] "/Users/jackkavanagh/Dropbox/R_SPUR"
Set the working directory using this command, placing the path to your folder in quotes between the brackets
setwd()
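As a small sketch of how the two commands fit together, the example below records the current directory, switches to another folder and then switches back. Here tempdir() is only a stand-in for your own project folder, so nothing depends on a particular path.

```r
# Remember where we started
old_dir <- getwd()

# setwd() takes the folder path as a quoted string (or a variable holding one);
# tempdir() is only a placeholder for your own project folder
setwd(tempdir())
getwd()

# Return to the original working directory
setwd(old_dir)
```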
Load the following libraries
library(tidyverse)
library(lubridate)
library(ggthemes)
library(reshape2)
library(historydata)
library(HistData)
After loading the libraries, place the following file in your working directory and import it with the command below
There should be a number of distinct dataframes:
The %>% pipe operator will be used throughout this lecture to chain commands together: it passes the result on its left to the command on its right
early_colleges %>% select(established)
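To make the behaviour of the pipe concrete, the sketch below writes the same query twice, once as an ordinary function call and once with %>%. The data frame colleges is a hand-typed, three-row stand-in for early_colleges (values copied from the output shown in this lecture), not the full dataset.

```r
library(dplyr)

# Hypothetical three-row stand-in for early_colleges
colleges <- data.frame(
  college     = c("Harvard", "William and Mary", "Yale"),
  established = c(1636L, 1693L, 1701L)
)

# Without the pipe: the data frame is the first argument of select()
nested <- select(colleges, established)

# With the pipe: the left-hand side is passed in as that first argument
piped <- colleges %>% select(established)

# Both forms give the same result
identical(nested, piped)
```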
The %in% command checks, for each value in a column, whether it appears in a given vector of values
early_colleges %>% filter(established %in% c(1795, 1797, 1802))
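Outside of filter(), %in% can be tried on plain vectors, which may make it easier to see what it returns. The vector names below are illustrative only.

```r
# A few founding years, and the years we want to look for
years  <- c(1636L, 1795L, 1802L, 1848L)
wanted <- c(1795, 1797, 1802)

# %in% returns one TRUE/FALSE per element on its left-hand side
matches <- years %in% wanted
matches          # FALSE TRUE TRUE FALSE

# The logical vector can then be used to keep only the matching values
years[matches]   # 1795 1802
```

Inside filter(), the same TRUE/FALSE vector decides which rows are kept.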
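The $ command extracts a single column from a dataframe by name. In RStudio, typing the dataframe name followed by $ brings up a list of its columns to choose from, for example
early_colleges$college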
Please run these now in the Console section of RStudio
The head() and tail() commands are useful for exploring datasets: by default they show the first six and the last six rows respectively. This is particularly useful after importing data, to check that all the information has been read in correctly.
## # A tibble: 6 × 6
## college original_name city state established sponsorship
## <chr> <chr> <chr> <chr> <int> <chr>
## 1 Harvard <NA> Camb… MA 1636 Congregati…
## 2 William and Mary <NA> Will… VA 1693 Anglican
## 3 Yale <NA> New … CT 1701 Congregati…
## 4 Pennsylvania, Univ. of <NA> Phil… PA 1740 Nondenomin…
## 5 Princeton College of New Jer… Prin… NJ 1746 Presbyteri…
## 6 Columbia King's College New … NY 1754 Anglican
## # A tibble: 6 × 6
## college original_name city state established sponsorship
## <chr> <chr> <chr> <chr> <int> <chr>
## 1 Beloit <NA> Beloit WI 1846 Congregational
## 2 Bucknell <NA> Lewisburg PA 1846 Baptist
## 3 Grinnell <NA> Grinnell IA 1846 Congregational
## 4 Mount Union <NA> Alliance OH 1846 Methodist
## 5 Earlham <NA> Richmond IN 1847 Quaker
## 6 Wisconsin, Univ. of <NA> Madison WI 1848 Secular
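If six rows are not enough, both commands accept an n argument. The following is a minimal sketch on a made-up twenty-row data frame (example_df is hypothetical and stands in for any imported dataset).

```r
# Hypothetical data frame with 20 rows
example_df <- data.frame(id = 1:20, value = (1:20)^2)

first_rows <- head(example_df)          # default: first 6 rows
last_rows  <- tail(example_df)          # default: last 6 rows
first_ten  <- head(example_df, n = 10)  # ask for 10 rows explicitly

nrow(first_rows)   # 6
nrow(first_ten)    # 10
last_rows$id       # 15 16 17 18 19 20
```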
Another way to view an entire dataset is the glimpse() command, which lists every column together with its type and its first few values
## Rows: 65
## Columns: 6
## $ college <chr> "Harvard", "William and Mary", "Yale", "Pennsylvania, Un…
## $ original_name <chr> NA, NA, NA, NA, "College of New Jersey", "King's College…
## $ city <chr> "Cambridge", "Williamsburg", "New Haven", "Philadelphia"…
## $ state <chr> "MA", "VA", "CT", "PA", "NJ", "NY", "RI", "NJ", "NH", "S…
## $ established <int> 1636, 1693, 1701, 1740, 1746, 1754, 1765, 1766, 1769, 17…
## $ sponsorship <chr> "Congregational; after 1805 Unitarian", "Anglican", "Con…
The summary() command provides an overview of the dataset. It is most useful for numerical columns, where it reports the minimum, quartiles, median, mean and maximum
## college original_name city state
## Length:65 Length:65 Length:65 Length:65
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## established sponsorship
## Min. :1636 Length:65
## 1st Qu.:1793 Class :character
## Median :1823 Mode :character
## Mean :1810
## 3rd Qu.:1838
## Max. :1848
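The same command also works on a single numeric vector. As a rough sketch, the example below applies it to the first nine founding years visible in the glimpse() output above, so the figures differ from those for the full 65-row dataset.

```r
# First nine values of the established column, copied from the output above
established <- c(1636L, 1693L, 1701L, 1740L, 1746L, 1754L, 1765L, 1766L, 1769L)

# summary() on a numeric vector returns a named set of statistics
year_summary <- summary(established)
year_summary

# Individual statistics can be picked out by name
year_summary[["Min."]]
year_summary[["Median"]]
year_summary[["Max."]]
```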
The select() command from ‘dplyr’ keeps only the columns you name. It can be chained with other commands using the %>% pipe
## # A tibble: 6 × 3
## college city state
## <chr> <chr> <chr>
## 1 Harvard Cambridge MA
## 2 William and Mary Williamsburg VA
## 3 Yale New Haven CT
## 4 Pennsylvania, Univ. of Philadelphia PA
## 5 Princeton Princeton NJ
## 6 Columbia New York NY
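A minimal sketch of select() in use is given below, on a hand-typed three-row stand-in for early_colleges (the colleges data frame is an assumption for illustration). It also shows that a column can be dropped by putting a minus sign in front of its name.

```r
library(dplyr)

# Hypothetical three-row stand-in for early_colleges
colleges <- data.frame(
  college     = c("Harvard", "William and Mary", "Yale"),
  city        = c("Cambridge", "Williamsburg", "New Haven"),
  state       = c("MA", "VA", "CT"),
  established = c(1636L, 1693L, 1701L)
)

# Keep three named columns
places <- colleges %>% select(college, city, state)

# Drop one column instead, by prefixing its name with a minus sign
no_year <- colleges %>% select(-established)

names(places)    # "college" "city" "state"
names(no_year)   # "college" "city" "state"
```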
The filter() command, also from ‘dplyr’, keeps only the rows that meet a condition. The condition can be numerical or text-based.
This example shows the colleges established prior to 1800
## # A tibble: 6 × 6
## college original_name city state established sponsorship
## <chr> <chr> <chr> <chr> <int> <chr>
## 1 Harvard <NA> Camb… MA 1636 Congregati…
## 2 William and Mary <NA> Will… VA 1693 Anglican
## 3 Yale <NA> New … CT 1701 Congregati…
## 4 Pennsylvania, Univ. of <NA> Phil… PA 1740 Nondenomin…
## 5 Princeton College of New Jer… Prin… NJ 1746 Presbyteri…
## 6 Columbia King's College New … NY 1754 Anglican
This example shows the colleges in the state of New York, using the == command
## # A tibble: 6 × 6
## college original_name city state established sponsorship
## <chr> <chr> <chr> <chr> <int> <chr>
## 1 Columbia King's College New York NY 1754 Anglican
## 2 Union College <NA> Schenectady NY 1795 Presbyteri…
## 3 U.S. Military Academy <NA> West Point NY 1802 Secular
## 4 Colgate <NA> Hamilton NY 1819 Baptist
## 5 New York Univ. <NA> New York NY 1831 Nondenomin…
## 6 Fordham <NA> Fordham NY 1841 Roman Cath…
Using != instead displays the colleges in every state other than New York
## # A tibble: 6 × 6
## college original_name city state established sponsorship
## <chr> <chr> <chr> <chr> <int> <chr>
## 1 Harvard <NA> Camb… MA 1636 Congregati…
## 2 William and Mary <NA> Will… VA 1693 Anglican
## 3 Yale <NA> New … CT 1701 Congregati…
## 4 Pennsylvania, Univ. of <NA> Phil… PA 1740 Nondenomin…
## 5 Princeton College of New Jer… Prin… NJ 1746 Presbyteri…
## 6 Brown <NA> Prov… RI 1765 Baptist
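The three comparisons above can be sketched as follows. The colleges data frame is a hand-typed four-row stand-in for early_colleges, so the results are much shorter than the outputs shown in the lecture.

```r
library(dplyr)

# Hypothetical four-row stand-in for early_colleges
colleges <- data.frame(
  college     = c("Harvard", "Columbia", "Union College", "Colgate"),
  state       = c("MA", "NY", "NY", "NY"),
  established = c(1636L, 1754L, 1795L, 1819L)
)

# Numerical condition: founded before 1800
before_1800 <- colleges %>% filter(established < 1800)

# Text condition: state equal to "NY"
in_ny <- colleges %>% filter(state == "NY")

# Text condition: state not equal to "NY"
not_ny <- colleges %>% filter(state != "NY")

nrow(before_1800)   # 3
nrow(in_ny)         # 3
nrow(not_ny)        # 1
```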
The simple logical operators for the filter command are:
& (and)
| (or)
! (not)
This example shows the colleges in the states of New York, Massachusetts or Virginia:
## # A tibble: 6 × 6
## college original_name city state established sponsorship
## <chr> <chr> <chr> <chr> <int> <chr>
## 1 Harvard <NA> Cambridge MA 1636 Congregation…
## 2 William and Mary <NA> Williamsburg VA 1693 Anglican
## 3 Columbia King's College New York NY 1754 Anglican
## 4 Hampden-Sydney <NA> Hampden-Sydney VA 1775 Presbyterian
## 5 Williams <NA> Williamstown MA 1793 Congregation…
## 6 Union College <NA> Schenectady NY 1795 Presbyterian…
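One way to write a filter like this is with the | operator. The sketch below also shows & and ! for comparison. It uses a hand-typed five-row stand-in for early_colleges, and the object names are illustrative only.

```r
library(dplyr)

# Hypothetical five-row stand-in for early_colleges
colleges <- data.frame(
  college     = c("Harvard", "William and Mary", "Yale", "Columbia", "Williams"),
  state       = c("MA", "VA", "CT", "NY", "MA"),
  established = c(1636L, 1693L, 1701L, 1754L, 1793L)
)

# | (or): keep rows where at least one condition is TRUE
three_states_or <- colleges %>%
  filter(state == "NY" | state == "MA" | state == "VA")

# & (and): keep rows where every condition is TRUE
ma_after_1700 <- colleges %>%
  filter(state == "MA" & established > 1700)

# ! (not): keep rows where the condition is FALSE
not_ma <- colleges %>% filter(!(state == "MA"))

nrow(three_states_or)   # 4
nrow(ma_after_1700)     # 1
nrow(not_ma)            # 3
```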
Note that the same filter can also be written with the %in% command, as follows:
# Create a new character vector of three states using their abbreviations
three_states <- c("NY", "VA", "MA")
# Filter for this using the %in% command
early_colleges %>% filter(state %in% three_states) %>% head()
## # A tibble: 6 × 6
## college original_name city state established sponsorship
## <chr> <chr> <chr> <chr> <int> <chr>
## 1 Harvard <NA> Cambridge MA 1636 Congregation…
## 2 William and Mary <NA> Williamsburg VA 1693 Anglican
## 3 Columbia King's College New York NY 1754 Anglican
## 4 Hampden-Sydney <NA> Hampden-Sydney VA 1775 Presbyterian
## 5 Williams <NA> Williamstown MA 1793 Congregation…
## 6 Union College <NA> Schenectady NY 1795 Presbyterian…
One of the most common ways to explore a dataset is to pick a specific variable and count how often each value occurs, here the number of colleges per state
## # A tibble: 25 × 2
## state n
## <chr> <int>
## 1 CT 3
## 2 DC 2
## 3 GA 2
## 4 IA 1
## 5 IL 1
## 6 IN 2
## 7 KY 1
## 8 LA 1
## 9 MA 6
## 10 MD 2
## # … with 15 more rows
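A table of this shape can be produced with count() from ‘dplyr’. The sketch below is an assumption about the approach rather than the exact code behind the output above, and it uses a hand-typed five-row stand-in for early_colleges.

```r
library(dplyr)

# Hypothetical five-row stand-in for early_colleges
colleges <- data.frame(
  college = c("Harvard", "Williams", "Yale", "Columbia", "Union College"),
  state   = c("MA", "MA", "CT", "NY", "NY")
)

# One row per state, with the number of colleges in column n
per_state <- colleges %>% count(state)
per_state

# sort = TRUE puts the largest counts first
per_state_sorted <- colleges %>% count(state, sort = TRUE)
per_state_sorted
```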
To create a new column, in this case merging two geographic attributes into a single column, use mutate(). The sep= argument sets what is placed between the two values. In this example it is a comma, which gives New York,NY; use sep=", " to get New York, NY with a space after the comma.
## # A tibble: 65 × 7
## college original_name city state established sponsorship location
## <chr> <chr> <chr> <chr> <int> <chr> <chr>
## 1 Harvard <NA> Camb… MA 1636 Congregati… Cambrid…
## 2 William and Mary <NA> Will… VA 1693 Anglican William…
## 3 Yale <NA> New … CT 1701 Congregati… New Hav…
## 4 Pennsylvania, Uni… <NA> Phil… PA 1740 Nondenomin… Philade…
## 5 Princeton College of N… Prin… NJ 1746 Presbyteri… Princet…
## 6 Columbia King's Colle… New … NY 1754 Anglican New Yor…
## 7 Brown <NA> Prov… RI 1765 Baptist Provide…
## 8 Rutgers Queen's Coll… New … NJ 1766 Dutch Refo… New Bru…
## 9 Dartmouth <NA> Hano… NH 1769 Congregati… Hanover…
## 10 Charleston, Coll.… <NA> Char… SC 1770 Anglican Charles…
## # … with 55 more rows
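As a short sketch of how sep changes the result, the example below builds two versions of the new column on a hand-typed two-row data frame. The column names location_tight and location_spaced are made up for the comparison.

```r
library(dplyr)

# Hypothetical two-row stand-in for early_colleges
colleges <- data.frame(
  city  = c("Cambridge", "New York"),
  state = c("MA", "NY")
)

with_location <- colleges %>%
  mutate(
    location_tight  = paste(city, state, sep = ","),   # comma only
    location_spaced = paste(city, state, sep = ", ")   # comma and a space
  )

with_location$location_tight    # "Cambridge,MA" "New York,NY"
with_location$location_spaced   # "Cambridge, MA" "New York, NY"
```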
Although mutate() creates a new column, the result is lost unless you save it. Always assign the output back to the original dataframe, here using the -> command at the end of the pipeline.
Some programmers assign with the = sign instead. This is not recommended in R, because = is also used to name arguments inside function calls, as with sep="," in the code below.
# Now when you run this code, the number of variables will increase to 7
early_colleges %>% mutate(location=paste(city,state,sep=","))-> early_colleges
early_colleges
## # A tibble: 65 × 7
## college original_name city state established sponsorship location
## <chr> <chr> <chr> <chr> <int> <chr> <chr>
## 1 Harvard <NA> Camb… MA 1636 Congregati… Cambrid…
## 2 William and Mary <NA> Will… VA 1693 Anglican William…
## 3 Yale <NA> New … CT 1701 Congregati… New Hav…
## 4 Pennsylvania, Uni… <NA> Phil… PA 1740 Nondenomin… Philade…
## 5 Princeton College of N… Prin… NJ 1746 Presbyteri… Princet…
## 6 Columbia King's Colle… New … NY 1754 Anglican New Yor…
## 7 Brown <NA> Prov… RI 1765 Baptist Provide…
## 8 Rutgers Queen's Coll… New … NJ 1766 Dutch Refo… New Bru…
## 9 Dartmouth <NA> Hano… NH 1769 Congregati… Hanover…
## 10 Charleston, Coll.… <NA> Char… SC 1770 Anglican Charles…
## # … with 55 more rows
# Keep the colleges founded before 1812 and add an is_secular column ("yes" or "no")
early_colleges %>% filter(established < 1812) %>% mutate(is_secular=ifelse(sponsorship!="Secular", "no", "yes")) -> secular_colleges_before_1812
secular_colleges_before_1812
## # A tibble: 23 × 8
## college original_name city state established sponsorship location is_secular
## <chr> <chr> <chr> <chr> <int> <chr> <chr> <chr>
## 1 Harvard <NA> Camb… MA 1636 Congregati… Cambrid… no
## 2 Willia… <NA> Will… VA 1693 Anglican William… no
## 3 Yale <NA> New … CT 1701 Congregati… New Hav… no
## 4 Pennsy… <NA> Phil… PA 1740 Nondenomin… Philade… no
## 5 Prince… College of N… Prin… NJ 1746 Presbyteri… Princet… no
## 6 Columb… King's Colle… New … NY 1754 Anglican New Yor… no
## 7 Brown <NA> Prov… RI 1765 Baptist Provide… no
## 8 Rutgers Queen's Coll… New … NJ 1766 Dutch Refo… New Bru… no
## 9 Dartmo… <NA> Hano… NH 1769 Congregati… Hanover… no
## 10 Charle… <NA> Char… SC 1770 Anglican Charles… no
## # … with 13 more rows
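The ifelse() call inside mutate() can be tried on its own to see how it works: it tests every element of a vector and returns one value where the test is TRUE and another where it is FALSE. The following sketch uses a short made-up vector and tests for equality, which gives the same result as the != version above with the two labels swapped.

```r
# Hypothetical sponsorship values
sponsorship <- c("Anglican", "Secular", "Baptist")

# ifelse(test, value_if_true, value_if_false), applied element by element
is_secular <- ifelse(sponsorship == "Secular", "yes", "no")
is_secular   # "no" "yes" "no"
```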
Filter the early_colleges to show the colleges established in the original 13 colonies of the United States of America
Create a new object from the early_colleges dataset showing which religious denomination sponsored the largest number of colleges
A useful package for visualising research findings as charts and graphs is ‘ggplot2’. It is included in the ‘tidyverse’ package and follows the guidelines of the ‘Layered Grammar of Graphics’.
The key layers are:
We’re going to work through Prof. Chris Brunsdon’s introductory lecture on using ggplot2
# R includes a number of practice datasets; one commonly used in tutorials is 'mtcars'
data(mtcars)
mtcars
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
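As a first, minimal sketch of the layered approach (not taken from the lecture being followed), the code below builds a scatter plot of mtcars one layer at a time: the data and aesthetic mapping, a geometry, axis labels and a theme. The choice of wt and mpg is only an example.

```r
library(ggplot2)

# Data and aesthetic mapping: car weight on the x axis, fuel economy on the y axis
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                                            # geometry layer: one point per car
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +   # axis labels
  theme_minimal()                                           # overall look of the plot

# Printing the object draws the plot in the RStudio Plots pane
p
```

Because each layer is added with +, a layer can be swapped or removed without rewriting the rest of the plot.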