Dr. J. Kavanagh
2023-09-09
A useful package for visualising research findings as charts and graphs is ‘ggplot2’. It is part of the ‘tidyverse’ collection of packages and follows the principles of the ‘Layered Grammar of Graphics’.
The key layers are:
We’re going to use elements of Prof. Chris Brunsdon’s introductory lecture on ggplot2, available here.
To start, load the mtcars sample dataset.
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Next we’re going to make cylinders an ordinal factor. To show what this means, here is an example:
## [1] "grande" "ventil" "tall"
## [1] grande ventil tall grande ventil tall
## Levels: tall grande ventil
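The code behind the output above is not shown. A minimal sketch that would produce something similar is given below; the vector name `sizes` and the repeated values are assumptions read off the printed output. The second factor adds `ordered = TRUE`, which is what makes a factor ordinal and allows comparisons between levels.

```r
# Hypothetical character vector of drink sizes (values taken from the output above)
sizes <- c("grande", "ventil", "tall", "grande", "ventil", "tall")

# Distinct values, in order of first appearance
unique(sizes)

# A factor with explicitly chosen level order (this matches the printed "Levels:" line)
sizes_factor <- factor(sizes, levels = c("tall", "grande", "ventil"))
sizes_factor

# An ordinal (ordered) factor: the levels now have a rank
sizes_ordered <- factor(sizes, levels = c("tall", "grande", "ventil"), ordered = TRUE)

# Comparisons are only meaningful for ordered factors
sizes_ordered[1] > sizes_ordered[3]  # "grande" > "tall" is TRUE
```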
To check that a vector has been properly assigned as a factor, use the is.factor() function.
## [1] TRUE
## [1] FALSE
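As a brief sketch of the check that gives the TRUE and FALSE results above (the object names `sizes` and `sizes_factor` are assumptions, not taken from the original code):

```r
# A plain character vector and a factor built from it
sizes <- c("grande", "ventil", "tall")
sizes_factor <- factor(sizes, levels = c("tall", "grande", "ventil"))

is.factor(sizes_factor)  # TRUE: this object is a factor
is.factor(sizes)         # FALSE: this is still a character vector
```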
Now we’re going to convert the cyl column to an ordered factor, with its levels in numerical order.
# Using mutate() we can adjust the cyl column
# The result is only printed here; assign it back to mtcars to keep the change
mtcars %>% mutate(cyl=factor(cyl, ordered = TRUE, levels=c(4,6,8))) %>% head(n=6)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Here is an example of a histogram using the ggplot2 syntax. aes (short for aesthetic) defines a mapping between a variable and a characteristic of the plot; x refers to the x-axis. Histograms are useful for showing the distribution of a single numeric variable.
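The histogram code itself is not shown, so the following is only a sketch of what it might look like. The choice of `mpg` as the variable and `bins = 10` are assumptions made for illustration.

```r
library(ggplot2)

# Map the mpg column to the x-axis; geom_histogram() bins the values and counts them
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

# Draw the plot
print(p)
```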
You can save these as gg objects, for example:
You can then add elements such as labels and themes to the gg object. The ggthemes library provides additional themes, for example:
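A minimal sketch of both steps follows. The object names, the title text and the choice of `theme_minimal()` are assumptions; the ggthemes line only runs if that package happens to be installed.

```r
library(ggplot2)

# Save the plot as a gg object instead of drawing it straight away
my_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

# Add labels and a built-in theme to the saved object
my_hist_styled <- my_hist +
  labs(title = "Fuel economy in mtcars", x = "Miles per Gallon", y = "Count") +
  theme_minimal()

# Themes from ggthemes are added in the same way (only if the package is available)
if (requireNamespace("ggthemes", quietly = TRUE)) {
  my_hist_economist <- my_hist + ggthemes::theme_economist()
}

print(my_hist_styled)
```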
Another type of visualisation is a scatterplot. The geom_smooth() function adds a smoothed line that highlights the overall trend.
my_scatplot <- ggplot(mtcars,aes(x=wt,y=mpg)) + geom_point()
my_scatplot + xlab('Weight (x 1000lbs)') + ylab('Miles per Gallon') + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
We can also use colour to show a third variable, cylinders, in addition to miles per gallon and weight.
my_scatplot <- ggplot(mtcars,aes(x=wt,y=mpg,col=cyl)) + geom_point()
my_scatplot + labs(x='Weight (x1000lbs)',y='Miles per Gallon',colour='Number of\n Cylinders')
The facet_grid() command splits a plot into a grid of panels, one for each combination of the chosen variables, which can help illustrate the disparate elements of a dataset in a concise way.
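No facet_grid() code is shown, so here is a hedged sketch of how it could be applied to the same scatterplot. Faceting by `am` (rows) and `cyl` (columns) is an assumption chosen for illustration.

```r
library(ggplot2)

# One panel per combination of transmission type (am) and number of cylinders (cyl)
facet_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(rows = vars(am), cols = vars(cyl)) +
  labs(x = "Weight (x1000lbs)", y = "Miles per Gallon")

print(facet_plot)
```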
Returning to the judges datasets, note the number of rows and columns, and the class of each variable, as shown by the glimpse() command.
## Rows: 4,202
## Columns: 15
## $ judge_id <int> 3419, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11…
## $ court_name <chr> "U. S. District Court, Southern Distric…
## $ court_type <chr> "USDC", "USDC", "USDC", "USDC", "USDC",…
## $ president_name <chr> "Barack Obama", "Franklin D. Roosevelt"…
## $ president_party <chr> "Democratic", "Democratic", "Republican…
## $ nomination_date <chr> "07/28/2011", "02/03/1936", "01/06/1880…
## $ predecessor_last_name <chr> "Kaplan", "new", "Ketcham", "McFadden",…
## $ predecessor_first_name <chr> "Lewis A.", NA, "Winthrop", "Frank H.",…
## $ senate_confirmation_date <chr> "03/22/2012", "02/12/1936", "01/14/1880…
## $ commission_date <chr> "03/23/2012", "02/15/1936", "01/14/1880…
## $ chief_judge_begin <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ chief_judge_end <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ retirement_from_active_service <chr> NA, "02/15/1966", NA, "05/31/1996", "02…
## $ termination_date <chr> NA, "05/28/1971", "02/09/1891", NA, "12…
## $ termination_reason <chr> NA, "Death", "Appointment to Another Ju…
## Rows: 3,532
## Columns: 13
## $ judge_id <int> 3419, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 2989, 32…
## $ name_first <chr> "Ronnie", "Matthew", "Marcus", "William", "Harold", "…
## $ name_middle <chr> NA, "T.", "Wilson", "Marsh", "Arnold", "Waldo", "L.",…
## $ name_last <chr> "Abrams", "Abruzzo", "Acheson", "Acker", "Ackerman", …
## $ name_suffix <chr> NA, NA, NA, "Jr.", NA, NA, NA, NA, NA, NA, NA, NA, "J…
## $ birth_date <int> 1968, 1889, 1828, 1927, 1928, 1926, 1925, 1887, 1921,…
## $ birthplace_city <chr> "New York", "Brooklyn", "Washington", "Birmingham", "…
## $ birthplace_state <chr> "NY", "NY", "PA", "AL", "NJ", "FL", "NY", "IL", "PA",…
## $ death_date <int> NA, 1971, 1906, NA, 2009, 1984, NA, 1956, NA, 1916, 1…
## $ death_city <chr> NA, "Potomac", "Pittsburgh", NA, "West Orange", "Spri…
## $ death_state <chr> NA, "MD", "PA", NA, "NJ", "IL", NA, NA, NA, "MO", "MS…
## $ gender <chr> "F", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M"…
## $ race <chr> "White", "White", "White", "White", "White", "White",…
Using base R it is possible to join datasets either by row or by column.
## a b c
## 1 1 7 3
## 2 3 7 3
## 3 3 8 6
## 4 4 3 6
## 5 5 2 8
## [1] 11 14 17
# rbind() stacks df2 beneath df as a new row; this works because df2 has one value for each column of df
df_new <- rbind(df, df2)
df_new
## a b c
## 1 1 7 3
## 2 3 7 3
## 3 3 8 6
## 4 4 3 6
## 5 5 2 8
## 6 11 14 17
This is an example of how to join dataframes by column.
## a b c
## 1 1 7 3
## 2 3 7 3
## 3 3 8 6
## 4 4 3 6
## 5 5 2 8
## [1] 11 14 16 17 22
## a b c df2
## 1 1 7 3 11
## 2 3 7 3 14
## 3 3 8 6 16
## 4 4 3 6 17
## 5 5 2 8 22
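The code that builds `df` and the values being attached is not shown above. A sketch consistent with the printed output might look like this; the names `new_row`, `by_row` and `by_col` are hypothetical.

```r
# Rebuild the example data frame from the printed output
df <- data.frame(a = c(1, 3, 3, 4, 5),
                 b = c(7, 7, 8, 3, 2),
                 c = c(3, 3, 6, 6, 8))

# Join by row: the vector needs one value per column of df
new_row <- c(11, 14, 17)
by_row <- rbind(df, new_row)

# Join by column: the vector needs one value per row of df
df2 <- c(11, 14, 16, 17, 22)
by_col <- cbind(df, df2)

dim(by_row)    # 6 rows, 3 columns
names(by_col)  # the new column takes the name of the vector, df2
```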
The key advantage of using inner_join() from ‘dplyr’ is that it links datasets by specific named variables, in this case judge_id, the key column shared by both judges datasets.
## # A tibble: 6 × 1
## judge_id
## <int>
## 1 3419
## 2 1
## 3 2
## 4 3
## 5 4
## 6 5
## # A tibble: 6 × 1
## judge_id
## <int>
## 1 3419
## 2 1
## 3 2
## 4 3
## 5 4
## 6 5
Be sure to assign the result to a new dataframe, or the join will not be saved to your workspace.
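As a sketch of the join on small stand-in tables (the real datasets are much larger): `appointments`, `people` and the helper column `appointment_no` are hypothetical, while the judge_id values and surnames follow the output shown in this document.

```r
library(dplyr)

# Toy version of the appointments table: judge 2 holds three appointments
appointments <- tibble(judge_id = c(1, 2, 2, 2, 3),
                       appointment_no = c(1, 1, 2, 3, 1))

# Toy version of the people table: one row per judge
people <- tibble(judge_id = c(1, 2, 3),
                 name_last = c("Abruzzo", "Acheson", "Acker"))

# Assign the result, otherwise the joined table is printed and then lost
unified <- inner_join(appointments, people, by = "judge_id")

# One row per appointment, with the person's details repeated as needed
unified

# Number of appointments per judge
count(unified, judge_id)
```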
Note that the people data has been linked to the appointments. There are multiple entries for some judge_id values, indicating that a number of individuals were appointed to more than one judicial post.
## Rows: 4,202
## Columns: 27
## $ judge_id <int> 3419, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11…
## $ court_name <chr> "U. S. District Court, Southern Distric…
## $ court_type <chr> "USDC", "USDC", "USDC", "USDC", "USDC",…
## $ president_name <chr> "Barack Obama", "Franklin D. Roosevelt"…
## $ president_party <chr> "Democratic", "Democratic", "Republican…
## $ nomination_date <chr> "07/28/2011", "02/03/1936", "01/06/1880…
## $ predecessor_last_name <chr> "Kaplan", "new", "Ketcham", "McFadden",…
## $ predecessor_first_name <chr> "Lewis A.", NA, "Winthrop", "Frank H.",…
## $ senate_confirmation_date <chr> "03/22/2012", "02/12/1936", "01/14/1880…
## $ commission_date <chr> "03/23/2012", "02/15/1936", "01/14/1880…
## $ chief_judge_begin <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ chief_judge_end <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ retirement_from_active_service <chr> NA, "02/15/1966", NA, "05/31/1996", "02…
## $ termination_date <chr> NA, "05/28/1971", "02/09/1891", NA, "12…
## $ termination_reason <chr> NA, "Death", "Appointment to Another Ju…
## $ name_first <chr> "Ronnie", "Matthew", "Marcus", "William…
## $ name_middle <chr> NA, "T.", "Wilson", "Marsh", "Arnold", …
## $ name_last <chr> "Abrams", "Abruzzo", "Acheson", "Acker"…
## $ name_suffix <chr> NA, NA, NA, "Jr.", NA, NA, NA, NA, NA, …
## $ birth_date <int> 1968, 1889, 1828, 1927, 1928, 1926, 192…
## $ birthplace_city <chr> "New York", "Brooklyn", "Washington", "…
## $ birthplace_state <chr> "NY", "NY", "PA", "AL", "NJ", "FL", "NY…
## $ death_date <int> NA, 1971, 1906, NA, 2009, 1984, NA, 195…
## $ death_city <chr> NA, "Potomac", "Pittsburgh", NA, "West …
## $ death_state <chr> NA, "MD", "PA", NA, "NJ", "IL", NA, NA,…
## $ gender <chr> "F", "M", "M", "M", "M", "M", "M", "M",…
## $ race <chr> "White", "White", "White", "White", "Wh…
## # A tibble: 6 × 27
## judge_id court_name court_type president_name president_party nomination_date
## <int> <chr> <chr> <chr> <chr> <chr>
## 1 3419 U. S. Dist… USDC Barack Obama Democratic 07/28/2011
## 2 1 U. S. Dist… USDC Franklin D. R… Democratic 02/03/1936
## 3 2 U. S. Dist… USDC Rutherford B.… Republican 01/06/1880
## 4 3 U. S. Dist… USDC Ronald Reagan Republican 07/22/1982
## 5 4 U. S. Dist… USDC Jimmy Carter Democratic 09/28/1979
## 6 5 U. S. Dist… USDC Gerald Ford Republican 06/18/1976
## # ℹ 21 more variables: predecessor_last_name <chr>,
## # predecessor_first_name <chr>, senate_confirmation_date <chr>,
## # commission_date <chr>, chief_judge_begin <int>, chief_judge_end <int>,
## # retirement_from_active_service <chr>, termination_date <chr>,
## # termination_reason <chr>, name_first <chr>, name_middle <chr>,
## # name_last <chr>, name_suffix <chr>, birth_date <int>,
## # birthplace_city <chr>, birthplace_state <chr>, death_date <int>, …
## # A tibble: 3,532 × 2
## judge_id n
## <int> <int>
## 1 1 1
## 2 2 1
## 3 3 1
## 4 4 1
## 5 5 1
## 6 6 1
## 7 7 1
## 8 8 1
## 9 9 1
## 10 10 1
## # ℹ 3,522 more rows
# There are 4,202 appointments; however, there are only 3,532 individual judges
judges_appointments %>% count(judge_id)
## # A tibble: 3,532 × 2
## judge_id n
## <int> <int>
## 1 1 1
## 2 2 3
## 3 3 1
## 4 4 1
## 5 5 2
## 6 6 1
## 7 7 1
## 8 8 1
## 9 9 3
## 10 10 3
## # ℹ 3,522 more rows
## # A tibble: 3,532 × 2
## judge_id n
## <int> <int>
## 1 1 1
## 2 2 3
## 3 3 1
## 4 4 1
## 5 5 2
## 6 6 1
## 7 7 1
## 8 8 1
## 9 9 3
## 10 10 3
## # ℹ 3,522 more rows
There are a number of different dates included in the judges_unified dataframe. However, none of these variables has the correct class: as glimpse() shows, they are stored as character (<chr>) rather than as dates.
## [1] "07/28/2011" "02/03/1936" "01/06/1880" "07/22/1982" "09/28/1979"
## [6] "06/18/1976"
## [1] "03/22/2012" "02/12/1936" "01/14/1880" "08/18/1982" "10/31/1979"
## [6] "07/02/1976"
## [1] "03/23/2012" "02/15/1936" "01/14/1880" "08/18/1982" "11/02/1979"
## [6] "07/02/1976"
## [1] NA "05/28/1971" "02/09/1891" NA "12/02/2009"
## [6] "03/31/1979"
There are a number of different ways to convert dates. However, because these dates are consistently structured as month/day/year, we can use the mdy() command from the ‘lubridate’ package to make a relatively simple change.
# Create some sample dates
begin <- c("May 11, 1996", "September 12, 2001", "July 1, 1988")
end <- c("7/8/97","10/23/02","1/4/91")
class(begin)
## [1] "character"
## [1] "character"
## [1] "1996-05-11" "2001-09-12" "1988-07-01"
## [1] "1997-07-08" "2002-10-23" "1991-01-04"
## [1] "Date"
## [1] "Date"
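The conversion calls that produce the dates above are not shown. A plausible reconstruction is sketched below; the names `begin_dates` and `end_dates` are assumptions, and the month names are assumed to be parsed in an English locale.

```r
library(lubridate)

# Sample dates in two different month-day-year layouts
begin <- c("May 11, 1996", "September 12, 2001", "July 1, 1988")
end <- c("7/8/97", "10/23/02", "1/4/91")

# mdy() accepts both layouts as long as the order is month, day, year
begin_dates <- mdy(begin)
end_dates <- mdy(end)

begin_dates
end_dates

# Both vectors are now of class "Date" rather than "character"
class(begin_dates)
class(end_dates)
```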
Use the mdy() command and verify the results with the class() command.
mdy(judges_unified$nomination_date) -> judges_unified$nomination_date
class(judges_unified$nomination_date)
## [1] "Date"
mdy(judges_unified$senate_confirmation_date) -> judges_unified$senate_confirmation_date
class(judges_unified$senate_confirmation_date)
## [1] "Date"
mdy(judges_unified$commission_date) -> judges_unified$commission_date
class(judges_unified$commission_date)
## [1] "Date"
mdy(judges_unified$termination_date) -> judges_unified$termination_date
class(judges_unified$termination_date)
## [1] "Date"
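The four conversions above can also be written as a single step. This is only a sketch of an alternative, not the approach used in this document: it relies on `across()` from ‘dplyr’ and runs on a two-row stand-in table called `toy` (a hypothetical name) whose values are taken from the first two judges shown earlier.

```r
library(dplyr)
library(lubridate)

# Stand-in for judges_unified with two of its date columns stored as character
toy <- tibble(judge_id = c(3419, 1),
              nomination_date = c("07/28/2011", "02/03/1936"),
              termination_date = c(NA, "05/28/1971"))

# Apply mdy() to every column whose name ends in "_date"
toy_dates <- toy %>%
  mutate(across(ends_with("_date"), mdy))

# Check the class of each converted column
sapply(toy_dates, class)
```

Missing values stay as NA after conversion, so judges without a termination date are unaffected.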
First you need to create a new dataframe that provides the number of nominations per day.
# This creates a new dataframe; you will then need to rename its columns
judges_unified %>% count(nomination_date) -> judges_nominations_date
judges_nominations_date
## # A tibble: 2,036 × 2
## nomination_date n
## <date> <int>
## 1 1789-09-24 13
## 2 1789-09-25 2
## 3 1790-02-08 4
## 4 1790-06-11 1
## 5 1790-07-02 1
## 6 1790-08-02 1
## 7 1790-12-17 2
## 8 1791-03-04 1
## 9 1791-10-31 2
## 10 1792-01-12 1
## # ℹ 2,026 more rows
This is vital because you will create several smaller dataframes, and giving each a distinct column name prevents clashes and errors when they are joined later.
# Rename the columns
colnames(judges_nominations_date) <- c("Date", "Nominations")
# Check your results
judges_nominations_date
## # A tibble: 2,036 × 2
## Date Nominations
## <date> <int>
## 1 1789-09-24 13
## 2 1789-09-25 2
## 3 1790-02-08 4
## 4 1790-06-11 1
## 5 1790-07-02 1
## 6 1790-08-02 1
## 7 1790-12-17 2
## 8 1791-03-04 1
## 9 1791-10-31 2
## 10 1792-01-12 1
## # ℹ 2,026 more rows
Group nomination dates into years using the floor_date() command from the ‘lubridate’ package. It is flexible and can round dates down to days, months, years etc. First create dataframes of eac
judges_nominations_date %>% group_by(year=floor_date(Date, "year")) %>%
summarize(No_of_Nominations=sum(Nominations)) -> judges_nominations_yearly
judges_nominations_yearly
## # A tibble: 220 × 2
## year No_of_Nominations
## <date> <int>
## 1 1789-01-01 15
## 2 1790-01-01 9
## 3 1791-01-01 3
## 4 1792-01-01 1
## 5 1793-01-01 2
## 6 1794-01-01 1
## 7 1795-01-01 2
## 8 1796-01-01 5
## 9 1797-01-01 1
## 10 1798-01-01 2
## # ℹ 210 more rows
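To see what floor_date() does to a single value, here is a small sketch using the first nomination date from the table above. The variable names are assumptions.

```r
library(lubridate)

# The earliest nomination date in the data
d <- as.Date("1789-09-24")

# Round down to the start of the month and of the year
by_month <- floor_date(d, "month")  # 1789-09-01
by_year <- floor_date(d, "year")    # 1789-01-01

by_month
by_year
```

Because every date in a given year is rounded down to the same 1 January, group_by() can then treat them as one group.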
Repeat this process for Commission and Termination Date. Create a new dataframe for judges terminations.
# This creates a new dataframe; you will then need to rename its columns
judges_unified %>% count(termination_date) -> judges_terminations_date
judges_terminations_date
## # A tibble: 2,498 × 2
## termination_date n
## <date> <int>
## 1 1790-05-18 1
## 2 1790-08-16 1
## 3 1790-10-12 1
## 4 1791-03-05 1
## 5 1791-05-09 1
## 6 1792-01-04 1
## 7 1793-01-01 1
## 8 1793-01-16 1
## 9 1794-03-17 1
## 10 1794-06-09 1
## # ℹ 2,488 more rows
Create a new dataframe for judges commissions.
# This creates a new dataframe; you will then need to rename its columns
judges_unified %>% count(commission_date) -> judges_commissions_date
judges_commissions_date
## # A tibble: 2,066 × 2
## commission_date n
## <date> <int>
## 1 1789-09-26 12
## 2 1789-09-27 1
## 3 1789-09-29 1
## 4 1789-09-30 1
## 5 1790-02-10 4
## 6 1790-06-14 1
## 7 1790-07-03 1
## 8 1790-08-03 1
## 9 1790-12-20 2
## 10 1791-03-04 1
## # ℹ 2,056 more rows
# Rename the columns
colnames(judges_terminations_date) <- c("Date", "Terminations")
# Check your results
judges_terminations_date
## # A tibble: 2,498 × 2
## Date Terminations
## <date> <int>
## 1 1790-05-18 1
## 2 1790-08-16 1
## 3 1790-10-12 1
## 4 1791-03-05 1
## 5 1791-05-09 1
## 6 1792-01-04 1
## 7 1793-01-01 1
## 8 1793-01-16 1
## 9 1794-03-17 1
## 10 1794-06-09 1
## # ℹ 2,488 more rows
# Rename the columns
colnames(judges_commissions_date) <- c("Date", "Commissions")
# Check your results
judges_commissions_date
## # A tibble: 2,066 × 2
## Date Commissions
## <date> <int>
## 1 1789-09-26 12
## 2 1789-09-27 1
## 3 1789-09-29 1
## 4 1789-09-30 1
## 5 1790-02-10 4
## 6 1790-06-14 1
## 7 1790-07-03 1
## 8 1790-08-03 1
## 9 1790-12-20 2
## 10 1791-03-04 1
## # ℹ 2,056 more rows
judges_terminations_date %>% group_by(year=floor_date(Date, "year")) %>%
summarize(No_of_Terminations=sum(Terminations)) -> judges_terminations_yearly
judges_terminations_yearly
## # A tibble: 222 × 2
## year No_of_Terminations
## <date> <int>
## 1 1790-01-01 3
## 2 1791-01-01 2
## 3 1792-01-01 1
## 4 1793-01-01 2
## 5 1794-01-01 2
## 6 1795-01-01 4
## 7 1796-01-01 3
## 8 1797-01-01 1
## 9 1798-01-01 2
## 10 1799-01-01 2
## # ℹ 212 more rows
judges_commissions_date %>% group_by(year=floor_date(Date, "year")) %>%
summarize(No_of_Commissions=sum(Commissions)) -> judges_commissions_yearly
judges_commissions_yearly
## # A tibble: 219 × 2
## year No_of_Commissions
## <date> <int>
## 1 1789-01-01 15
## 2 1790-01-01 9
## 3 1791-01-01 3
## 4 1792-01-01 1
## 5 1793-01-01 1
## 6 1794-01-01 3
## 7 1795-01-01 1
## 8 1796-01-01 4
## 9 1797-01-01 3
## 10 1798-01-01 2
## # ℹ 209 more rows
In order to merge the three date columns (Nominations, Commissions and Terminations), it is necessary to create a dataframe covering the entire year range. In this case, that is 1789-2014, or 226 years.
# This creates a new dataframe with one row per year
all_dates <- data.frame(year=seq(as.Date("1789-01-01"), by="year", length.out=226))
all_dates
## year
## 1 1789-01-01
## 2 1790-01-01
## 3 1791-01-01
## 4 1792-01-01
## 5 1793-01-01
## 6 1794-01-01
## 7 1795-01-01
## 8 1796-01-01
## 9 1797-01-01
## 10 1798-01-01
## 11 1799-01-01
## 12 1800-01-01
## 13 1801-01-01
## 14 1802-01-01
## 15 1803-01-01
## 16 1804-01-01
## 17 1805-01-01
## 18 1806-01-01
## 19 1807-01-01
## 20 1808-01-01
## 21 1809-01-01
## 22 1810-01-01
## 23 1811-01-01
## 24 1812-01-01
## 25 1813-01-01
## 26 1814-01-01
## 27 1815-01-01
## 28 1816-01-01
## 29 1817-01-01
## 30 1818-01-01
## 31 1819-01-01
## 32 1820-01-01
## 33 1821-01-01
## 34 1822-01-01
## 35 1823-01-01
## 36 1824-01-01
## 37 1825-01-01
## 38 1826-01-01
## 39 1827-01-01
## 40 1828-01-01
## 41 1829-01-01
## 42 1830-01-01
## 43 1831-01-01
## 44 1832-01-01
## 45 1833-01-01
## 46 1834-01-01
## 47 1835-01-01
## 48 1836-01-01
## 49 1837-01-01
## 50 1838-01-01
## 51 1839-01-01
## 52 1840-01-01
## 53 1841-01-01
## 54 1842-01-01
## 55 1843-01-01
## 56 1844-01-01
## 57 1845-01-01
## 58 1846-01-01
## 59 1847-01-01
## 60 1848-01-01
## 61 1849-01-01
## 62 1850-01-01
## 63 1851-01-01
## 64 1852-01-01
## 65 1853-01-01
## 66 1854-01-01
## 67 1855-01-01
## 68 1856-01-01
## 69 1857-01-01
## 70 1858-01-01
## 71 1859-01-01
## 72 1860-01-01
## 73 1861-01-01
## 74 1862-01-01
## 75 1863-01-01
## 76 1864-01-01
## 77 1865-01-01
## 78 1866-01-01
## 79 1867-01-01
## 80 1868-01-01
## 81 1869-01-01
## 82 1870-01-01
## 83 1871-01-01
## 84 1872-01-01
## 85 1873-01-01
## 86 1874-01-01
## 87 1875-01-01
## 88 1876-01-01
## 89 1877-01-01
## 90 1878-01-01
## 91 1879-01-01
## 92 1880-01-01
## 93 1881-01-01
## 94 1882-01-01
## 95 1883-01-01
## 96 1884-01-01
## 97 1885-01-01
## 98 1886-01-01
## 99 1887-01-01
## 100 1888-01-01
## 101 1889-01-01
## 102 1890-01-01
## 103 1891-01-01
## 104 1892-01-01
## 105 1893-01-01
## 106 1894-01-01
## 107 1895-01-01
## 108 1896-01-01
## 109 1897-01-01
## 110 1898-01-01
## 111 1899-01-01
## 112 1900-01-01
## 113 1901-01-01
## 114 1902-01-01
## 115 1903-01-01
## 116 1904-01-01
## 117 1905-01-01
## 118 1906-01-01
## 119 1907-01-01
## 120 1908-01-01
## 121 1909-01-01
## 122 1910-01-01
## 123 1911-01-01
## 124 1912-01-01
## 125 1913-01-01
## 126 1914-01-01
## 127 1915-01-01
## 128 1916-01-01
## 129 1917-01-01
## 130 1918-01-01
## 131 1919-01-01
## 132 1920-01-01
## 133 1921-01-01
## 134 1922-01-01
## 135 1923-01-01
## 136 1924-01-01
## 137 1925-01-01
## 138 1926-01-01
## 139 1927-01-01
## 140 1928-01-01
## 141 1929-01-01
## 142 1930-01-01
## 143 1931-01-01
## 144 1932-01-01
## 145 1933-01-01
## 146 1934-01-01
## 147 1935-01-01
## 148 1936-01-01
## 149 1937-01-01
## 150 1938-01-01
## 151 1939-01-01
## 152 1940-01-01
## 153 1941-01-01
## 154 1942-01-01
## 155 1943-01-01
## 156 1944-01-01
## 157 1945-01-01
## 158 1946-01-01
## 159 1947-01-01
## 160 1948-01-01
## 161 1949-01-01
## 162 1950-01-01
## 163 1951-01-01
## 164 1952-01-01
## 165 1953-01-01
## 166 1954-01-01
## 167 1955-01-01
## 168 1956-01-01
## 169 1957-01-01
## 170 1958-01-01
## 171 1959-01-01
## 172 1960-01-01
## 173 1961-01-01
## 174 1962-01-01
## 175 1963-01-01
## 176 1964-01-01
## 177 1965-01-01
## 178 1966-01-01
## 179 1967-01-01
## 180 1968-01-01
## 181 1969-01-01
## 182 1970-01-01
## 183 1971-01-01
## 184 1972-01-01
## 185 1973-01-01
## 186 1974-01-01
## 187 1975-01-01
## 188 1976-01-01
## 189 1977-01-01
## 190 1978-01-01
## 191 1979-01-01
## 192 1980-01-01
## 193 1981-01-01
## 194 1982-01-01
## 195 1983-01-01
## 196 1984-01-01
## 197 1985-01-01
## 198 1986-01-01
## 199 1987-01-01
## 200 1988-01-01
## 201 1989-01-01
## 202 1990-01-01
## 203 1991-01-01
## 204 1992-01-01
## 205 1993-01-01
## 206 1994-01-01
## 207 1995-01-01
## 208 1996-01-01
## 209 1997-01-01
## 210 1998-01-01
## 211 1999-01-01
## 212 2000-01-01
## 213 2001-01-01
## 214 2002-01-01
## 215 2003-01-01
## 216 2004-01-01
## 217 2005-01-01
## 218 2006-01-01
## 219 2007-01-01
## 220 2008-01-01
## 221 2009-01-01
## 222 2010-01-01
## 223 2011-01-01
## 224 2012-01-01
## 225 2013-01-01
## 226 2014-01-01
# Use the anti_join() command to show the missing dates
anti_join(all_dates, judges_commissions_yearly, by="year") -> missing_dates_commissions
missing_dates_commissions
## year
## 1 1800-01-01
## 2 1805-01-01
## 3 1808-01-01
## 4 1810-01-01
## 5 1828-01-01
## 6 1831-01-01
## 7 1833-01-01
## 8 1843-01-01
# Use the anti_join() command to show the missing dates
anti_join(all_dates, judges_nominations_yearly, by="year") -> missing_dates_nominations
missing_dates_nominations
## year
## 1 1800-01-01
## 2 1808-01-01
## 3 1810-01-01
## 4 1814-01-01
## 5 1827-01-01
## 6 1838-01-01
## 7 1843-01-01
# Merge the missing dates and the yearly judges nominations
merge(judges_nominations_yearly, missing_dates_nominations, by="year", all.y = TRUE, all.x = TRUE) -> judges_nominations_yearly
# Use the anti_join() command to show the missing dates
anti_join(all_dates, judges_terminations_yearly, by="year") -> missing_dates_terminations
missing_dates_terminations
## year
## 1 1789-01-01
## 2 1807-01-01
## 3 1808-01-01
## 4 1817-01-01
## 5 1827-01-01
Now that each of the datasets is the same length, they can be joined together. It is good practice to join the datasets in sequence, starting with Nominations and Commissions.
inner_join(judges_nominations_yearly, judges_commissions_yearly, by="year") -> nominations_commissions_yearly
nominations_commissions_yearly
## year No_of_Nominations No_of_Commissions
## 1 1789-01-01 15 15
## 2 1790-01-01 9 9
## 3 1791-01-01 3 3
## 4 1792-01-01 1 1
## 5 1793-01-01 2 1
## 6 1794-01-01 1 3
## 7 1795-01-01 2 1
## 8 1796-01-01 5 4
## 9 1797-01-01 1 3
## 10 1798-01-01 2 2
## 11 1799-01-01 2 2
## 12 1800-01-01 NA NA
## 13 1801-01-01 18 21
## 14 1802-01-01 7 10
## 15 1803-01-01 2 2
## 16 1804-01-01 3 3
## 17 1805-01-01 1 NA
## 18 1806-01-01 5 5
## 19 1807-01-01 1 2
## 20 1808-01-01 NA NA
## 21 1809-01-01 1 1
## 22 1810-01-01 NA NA
## 23 1811-01-01 3 3
## 24 1812-01-01 5 5
## 25 1813-01-01 2 2
## 26 1814-01-01 NA 2
## 27 1815-01-01 1 1
## 28 1816-01-01 1 1
## 29 1817-01-01 2 2
## 30 1818-01-01 3 4
## 31 1819-01-01 3 4
## 32 1820-01-01 3 3
## 33 1821-01-01 2 1
## 34 1822-01-01 2 3
## 35 1823-01-01 4 6
## 36 1824-01-01 5 5
## 37 1825-01-01 3 3
## 38 1826-01-01 7 8
## 39 1827-01-01 NA 1
## 40 1828-01-01 2 NA
## 41 1829-01-01 4 5
## 42 1830-01-01 3 3
## 43 1831-01-01 1 NA
## 44 1832-01-01 1 2
## 45 1833-01-01 2 NA
## 46 1834-01-01 3 4
## 47 1835-01-01 3 1
## 48 1836-01-01 8 9
## 49 1837-01-01 4 5
## 50 1838-01-01 NA 2
## 51 1839-01-01 2 4
## 52 1840-01-01 4 4
## 53 1841-01-01 6 6
## 54 1842-01-01 3 3
## 55 1843-01-01 NA NA
## 56 1844-01-01 1 1
## 57 1845-01-01 4 2
## 58 1846-01-01 4 7
## 59 1847-01-01 2 3
## 60 1848-01-01 1 3
## 61 1849-01-01 5 4
## 62 1850-01-01 2 4
## 63 1851-01-01 2 4
## 64 1852-01-01 2 3
## 65 1853-01-01 6 5
## 66 1854-01-01 1 2
## 67 1855-01-01 7 9
## 68 1856-01-01 4 4
## 69 1857-01-01 4 5
## 70 1858-01-01 4 5
## 71 1859-01-01 2 2
## 72 1860-01-01 5 5
## 73 1861-01-01 8 7
## 74 1862-01-01 8 9
## 75 1863-01-01 10 10
## 76 1864-01-01 12 15
## 77 1865-01-01 9 6
## 78 1866-01-01 5 10
## 79 1867-01-01 2 2
## 80 1868-01-01 2 2
## 81 1869-01-01 10 8
## 82 1870-01-01 13 14
## 83 1871-01-01 4 6
## 84 1872-01-01 4 5
## 85 1873-01-01 4 3
## 86 1874-01-01 4 5
## 87 1875-01-01 7 7
## 88 1876-01-01 2 2
## 89 1877-01-01 7 7
## 90 1878-01-01 5 5
## 91 1879-01-01 9 9
## 92 1880-01-01 6 6
## 93 1881-01-01 7 8
## 94 1882-01-01 9 10
## 95 1883-01-01 5 3
## 96 1884-01-01 6 8
## 97 1885-01-01 4 3
## 98 1886-01-01 6 4
## 99 1887-01-01 5 5
## 100 1888-01-01 5 9
## 101 1889-01-01 6 2
## 102 1890-01-01 9 13
## 103 1891-01-01 21 14
## 104 1892-01-01 17 32
## 105 1893-01-01 14 14
## 106 1894-01-01 6 6
## 107 1895-01-01 7 9
## 108 1896-01-01 10 7
## 109 1897-01-01 8 9
## 110 1898-01-01 6 5
## 111 1899-01-01 15 15
## 112 1900-01-01 7 5
## 113 1901-01-01 12 13
## 114 1902-01-01 12 14
## 115 1903-01-01 13 14
## 116 1904-01-01 9 7
## 117 1905-01-01 26 27
## 118 1906-01-01 11 11
## 119 1907-01-01 15 12
## 120 1908-01-01 6 6
## 121 1909-01-01 15 15
## 122 1910-01-01 33 22
## 123 1911-01-01 16 28
## 124 1912-01-01 14 13
## 125 1913-01-01 9 9
## 126 1914-01-01 15 14
## 127 1915-01-01 4 5
## 128 1916-01-01 15 15
## 129 1917-01-01 9 10
## 130 1918-01-01 12 11
## 131 1919-01-01 12 13
## 132 1920-01-01 6 6
## 133 1921-01-01 15 14
## 134 1922-01-01 20 15
## 135 1923-01-01 23 25
## 136 1924-01-01 12 16
## 137 1925-01-01 26 25
## 138 1926-01-01 5 12
## 139 1927-01-01 12 10
## 140 1928-01-01 23 27
## 141 1929-01-01 29 35
## 142 1930-01-01 16 10
## 143 1931-01-01 26 22
## 144 1932-01-01 9 18
## 145 1933-01-01 9 9
## 146 1934-01-01 9 9
## 147 1935-01-01 17 16
## 148 1936-01-01 13 14
## 149 1937-01-01 33 33
## 150 1938-01-01 9 9
## 151 1939-01-01 35 34
## 152 1940-01-01 32 32
## 153 1941-01-01 26 21
## 154 1942-01-01 9 15
## 155 1943-01-01 16 16
## 156 1944-01-01 9 9
## 157 1945-01-01 23 22
## 158 1946-01-01 17 18
## 159 1947-01-01 17 15
## 160 1948-01-01 3 7
## 161 1949-01-01 30 30
## 162 1950-01-01 44 39
## 163 1951-01-01 14 17
## 164 1952-01-01 5 5
## 165 1953-01-01 10 9
## 166 1954-01-01 47 47
## 167 1955-01-01 21 21
## 168 1956-01-01 24 24
## 169 1957-01-01 21 21
## 170 1958-01-01 17 17
## 171 1959-01-01 37 36
## 172 1960-01-01 12 12
## 173 1961-01-01 63 62
## 174 1962-01-01 58 62
## 175 1963-01-01 16 16
## 176 1964-01-01 24 23
## 177 1965-01-01 31 35
## 178 1966-01-01 66 81
## 179 1967-01-01 39 38
## 180 1968-01-01 27 28
## 181 1969-01-01 26 26
## 182 1970-01-01 66 65
## 183 1971-01-01 74 71
## 184 1972-01-01 30 35
## 185 1973-01-01 23 23
## 186 1974-01-01 36 36
## 187 1975-01-01 22 20
## 188 1976-01-01 29 31
## 189 1977-01-01 32 32
## 190 1978-01-01 34 34
## 191 1979-01-01 152 141
## 192 1980-01-01 48 75
## 193 1981-01-01 48 62
## 194 1982-01-01 44 62
## 195 1983-01-01 35 36
## 196 1984-01-01 44 44
## 197 1985-01-01 86 85
## 198 1986-01-01 46 47
## 199 1987-01-01 64 44
## 200 1988-01-01 21 41
## 201 1989-01-01 24 15
## 202 1990-01-01 48 57
## 203 1991-01-01 89 58
## 204 1992-01-01 33 64
## 205 1993-01-01 48 29
## 206 1994-01-01 83 102
## 207 1995-01-01 69 54
## 208 1996-01-01 7 22
## 209 1997-01-01 65 37
## 210 1998-01-01 37 65
## 211 1999-01-01 48 34
## 212 2000-01-01 25 39
## 213 2001-01-01 54 30
## 214 2002-01-01 48 72
## 215 2003-01-01 94 69
## 216 2004-01-01 11 35
## 217 2005-01-01 33 18
## 218 2006-01-01 21 37
## 219 2007-01-01 51 38
## 220 2008-01-01 17 30
## 221 2009-01-01 32 10
## 222 2010-01-01 30 51
## 223 2011-01-01 97 61
## 224 2012-01-01 15 49
## 225 2013-01-01 46 48
## 226 2014-01-01 61 62
## 227 <NA> 160 33
Follow this with the final data column, Terminations.
inner_join(nominations_commissions_yearly, judges_terminations_yearly, by="year") -> nominations_commissions_terminations_yearly
nominations_commissions_terminations_yearly
## year No_of_Nominations No_of_Commissions No_of_Terminations
## 1 1789-01-01 15 15 NA
## 2 1790-01-01 9 9 3
## 3 1791-01-01 3 3 2
## 4 1792-01-01 1 1 1
## 5 1793-01-01 2 1 2
## 6 1794-01-01 1 3 2
## 7 1795-01-01 2 1 4
## 8 1796-01-01 5 4 3
## 9 1797-01-01 1 3 1
## 10 1798-01-01 2 2 2
## 11 1799-01-01 2 2 2
## 12 1800-01-01 NA NA 1
## 13 1801-01-01 18 21 6
## 14 1802-01-01 7 10 20
## 15 1803-01-01 2 2 1
## 16 1804-01-01 3 3 2
## 17 1805-01-01 1 NA 1
## 18 1806-01-01 5 5 5
## 19 1807-01-01 1 2 NA
## 20 1808-01-01 NA NA NA
## 21 1809-01-01 1 1 1
## 22 1810-01-01 NA NA 2
## 23 1811-01-01 3 3 1
## 24 1812-01-01 5 5 4
## 25 1813-01-01 2 2 2
## 26 1814-01-01 NA 2 3
## 27 1815-01-01 1 1 1
## 28 1816-01-01 1 1 1
## 29 1817-01-01 2 2 NA
## 30 1818-01-01 3 4 3
## 31 1819-01-01 3 4 3
## 32 1820-01-01 3 3 1
## 33 1821-01-01 2 1 1
## 34 1822-01-01 2 3 2
## 35 1823-01-01 4 6 4
## 36 1824-01-01 5 5 6
## 37 1825-01-01 3 3 4
## 38 1826-01-01 7 8 7
## 39 1827-01-01 NA 1 NA
## 40 1828-01-01 2 NA 5
## 41 1829-01-01 4 5 2
## 42 1830-01-01 3 3 2
## 43 1831-01-01 1 NA 1
## 44 1832-01-01 1 2 1
## 45 1833-01-01 2 NA 3
## 46 1834-01-01 3 4 3
## 47 1835-01-01 3 1 3
## 48 1836-01-01 8 9 5
## 49 1837-01-01 4 5 2
## 50 1838-01-01 NA 2 3
## 51 1839-01-01 2 4 3
## 52 1840-01-01 4 4 1
## 53 1841-01-01 6 6 6
## 54 1842-01-01 3 3 3
## 55 1843-01-01 NA NA 1
## 56 1844-01-01 1 1 2
## 57 1845-01-01 4 2 5
## 58 1846-01-01 4 7 1
## 59 1847-01-01 2 3 1
## 60 1848-01-01 1 3 1
## 61 1849-01-01 5 4 5
## 62 1850-01-01 2 4 1
## 63 1851-01-01 2 4 3
## 64 1852-01-01 2 3 3
## 65 1853-01-01 6 5 5
## 66 1854-01-01 1 2 1
## 67 1855-01-01 7 9 6
## 68 1856-01-01 4 4 1
## 69 1857-01-01 4 5 5
## 70 1858-01-01 4 5 2
## 71 1859-01-01 2 2 5
## 72 1860-01-01 5 5 3
## 73 1861-01-01 8 7 20
## 74 1862-01-01 8 9 5
## 75 1863-01-01 10 10 11
## 76 1864-01-01 12 15 8
## 77 1865-01-01 9 6 2
## 78 1866-01-01 5 10 6
## 79 1867-01-01 2 2 2
## 80 1868-01-01 2 2 1
## 81 1869-01-01 10 8 4
## 82 1870-01-01 13 14 8
## 83 1871-01-01 4 6 6
## 84 1872-01-01 4 5 3
## 85 1873-01-01 4 3 5
## 86 1874-01-01 4 5 7
## 87 1875-01-01 7 7 3
## 88 1876-01-01 2 2 2
## 89 1877-01-01 7 7 5
## 90 1878-01-01 5 5 5
## 91 1879-01-01 9 9 8
## 92 1880-01-01 6 6 6
## 93 1881-01-01 7 8 8
## 94 1882-01-01 9 10 9
## 95 1883-01-01 5 3 5
## 96 1884-01-01 6 8 5
## 97 1885-01-01 4 3 3
## 98 1886-01-01 6 4 6
## 99 1887-01-01 5 5 5
## 100 1888-01-01 5 9 5
## 101 1889-01-01 6 2 4
## 102 1890-01-01 9 13 6
## 103 1891-01-01 21 14 10
## 104 1892-01-01 17 32 8
## 105 1893-01-01 14 14 12
## 106 1894-01-01 6 6 1
## 107 1895-01-01 7 9 5
## 108 1896-01-01 10 7 9
## 109 1897-01-01 8 9 10
## 110 1898-01-01 6 5 6
## 111 1899-01-01 15 15 8
## 112 1900-01-01 7 5 5
## 113 1901-01-01 12 13 10
## 114 1902-01-01 12 14 9
## 115 1903-01-01 13 14 10
## 116 1904-01-01 9 7 6
## 117 1905-01-01 26 27 19
## 118 1906-01-01 11 11 12
## 119 1907-01-01 15 12 9
## 120 1908-01-01 6 6 7
## 121 1909-01-01 15 15 14
## 122 1910-01-01 33 22 13
## 123 1911-01-01 16 28 50
## 124 1912-01-01 14 13 10
## 125 1913-01-01 9 9 16
## 126 1914-01-01 15 14 13
## 127 1915-01-01 4 5 8
## 128 1916-01-01 15 15 15
## 129 1917-01-01 9 10 6
## 130 1918-01-01 12 11 11
## 131 1919-01-01 12 13 8
## 132 1920-01-01 6 6 7
## 133 1921-01-01 15 14 11
## 134 1922-01-01 20 15 11
## 135 1923-01-01 23 25 10
## 136 1924-01-01 12 16 16
## 137 1925-01-01 26 25 17
## 138 1926-01-01 5 12 6
## 139 1927-01-01 12 10 10
## 140 1928-01-01 23 27 16
## 141 1929-01-01 29 35 18
## 142 1930-01-01 16 10 16
## 143 1931-01-01 26 22 19
## 144 1932-01-01 9 18 9
## 145 1933-01-01 9 9 11
## 146 1934-01-01 9 9 7
## 147 1935-01-01 17 16 11
## 148 1936-01-01 13 14 6
## 149 1937-01-01 33 33 12
## 150 1938-01-01 9 9 14
## 151 1939-01-01 35 34 15
## 152 1940-01-01 32 32 15
## 153 1941-01-01 26 21 18
## 154 1942-01-01 9 15 8
## 155 1943-01-01 16 16 14
## 156 1944-01-01 9 9 14
## 157 1945-01-01 23 22 19
## 158 1946-01-01 17 18 14
## 159 1947-01-01 17 15 13
## 160 1948-01-01 3 7 22
## 161 1949-01-01 30 30 21
## 162 1950-01-01 44 39 14
## 163 1951-01-01 14 17 10
## 164 1952-01-01 5 5 15
## 165 1953-01-01 10 9 17
## 166 1954-01-01 47 47 13
## 167 1955-01-01 21 21 15
## 168 1956-01-01 24 24 9
## 169 1957-01-01 21 21 16
## 170 1958-01-01 17 17 21
## 171 1959-01-01 37 36 16
## 172 1960-01-01 12 12 17
## 173 1961-01-01 63 62 19
## 174 1962-01-01 58 62 21
## 175 1963-01-01 16 16 21
## 176 1964-01-01 24 23 19
## 177 1965-01-01 31 35 29
## 178 1966-01-01 66 81 40
## 179 1967-01-01 39 38 18
## 180 1968-01-01 27 28 14
## 181 1969-01-01 26 26 23
## 182 1970-01-01 66 65 20
## 183 1971-01-01 74 71 24
## 184 1972-01-01 30 35 24
## 185 1973-01-01 23 23 18
## 186 1974-01-01 36 36 34
## 187 1975-01-01 22 20 28
## 188 1976-01-01 29 31 27
## 189 1977-01-01 32 32 18
## 190 1978-01-01 34 34 26
## 191 1979-01-01 152 141 37
## 192 1980-01-01 48 75 29
## 193 1981-01-01 48 62 40
## 194 1982-01-01 44 62 38
## 195 1983-01-01 35 36 23
## 196 1984-01-01 44 44 24
## 197 1985-01-01 86 85 25
## 198 1986-01-01 46 47 29
## 199 1987-01-01 64 44 27
## 200 1988-01-01 21 41 31
## 201 1989-01-01 24 15 29
## 202 1990-01-01 48 57 39
## 203 1991-01-01 89 58 28
## 204 1992-01-01 33 64 30
## 205 1993-01-01 48 29 25
## 206 1994-01-01 83 102 27
## 207 1995-01-01 69 54 39
## 208 1996-01-01 7 22 29
## 209 1997-01-01 65 37 26
## 210 1998-01-01 37 65 40
## 211 1999-01-01 48 34 34
## 212 2000-01-01 25 39 32
## 213 2001-01-01 54 30 34
## 214 2002-01-01 48 72 38
## 215 2003-01-01 94 69 29
## 216 2004-01-01 11 35 34
## 217 2005-01-01 33 18 30
## 218 2006-01-01 21 37 30
## 219 2007-01-01 51 38 30
## 220 2008-01-01 17 30 30
## 221 2009-01-01 32 10 35
## 222 2010-01-01 30 51 37
## 223 2011-01-01 97 61 47
## 224 2012-01-01 15 49 33
## 225 2013-01-01 46 48 32
## 226 2014-01-01 61 62 31
## 227 <NA> 160 33 1374
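A possible alternative to the anti_join(), merge() and inner_join() sequence is to start from the full list of years and attach each yearly table with left_join(). This is a sketch of that idea, not the method used above. It runs on a four-year extract of the figures shown in this document, with years stored as plain numbers for brevity. Replacing missing counts with 0 is an assumption about how gaps should be treated.

```r
library(dplyr)

# Every year in the range of interest (a small extract for illustration)
all_years <- tibble(year = c(1798, 1799, 1800, 1801))

# Yearly counts; 1800 has no nominations, so it is absent from the first table
nominations <- tibble(year = c(1798, 1799, 1801),
                      No_of_Nominations = c(2, 2, 18))
terminations <- tibble(year = c(1798, 1799, 1800, 1801),
                       No_of_Terminations = c(2, 2, 1, 6))

# left_join() keeps every year in all_years and fills unmatched counts with NA
combined <- all_years %>%
  left_join(nominations, by = "year") %>%
  left_join(terminations, by = "year") %>%
  # Assumption: a year with no recorded events should count as zero
  mutate(across(starts_with("No_of_"), ~ coalesce(.x, 0)))

combined
```

Starting from the list of years also means that records with a missing date, such as the final `<NA>` row in the output above, would not be carried into the result.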
Using ggplot we can display the findings.
## Warning: Removed 1 row containing missing values (`geom_line()`).
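The code for this first, basic plot is not shown. A minimal sketch of a plain line chart follows, using a four-year extract of the commissions figures. The name `toy_commissions` is hypothetical, and the NA for 1800 reflects a year with no commissions in the merged table.

```r
library(ggplot2)

# Small extract of yearly commissions; 1800 has no recorded value
toy_commissions <- data.frame(
  year = as.Date(c("1798-01-01", "1799-01-01", "1800-01-01", "1801-01-01")),
  No_of_Commissions = c(2, 2, NA, 21)
)

# A basic line chart with no labels or theme adjustments
basic_plot <- ggplot(toy_commissions, aes(x = year, y = No_of_Commissions)) +
  geom_line()
```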
Note how, with additional information added to the basic plot, we can create a very effective graph.
judges_commissions_yearly %>%
ggplot(aes(x=year, y=No_of_Commissions)) +
geom_line(linewidth = 0.8) +
labs(title = "Judicial Commissions - 1789-2014",
tag = "Figure 1",
x = "Year",
y = "No.") +
scale_x_date(date_breaks = "65 years", date_labels = "%Y") +
theme_classic() +
theme(axis.text.x = element_text(colour = "darkslategrey", size = 16),
axis.text.y = element_text(colour = "darkslategrey", size = 16),
legend.background = element_rect(fill = "white", linewidth = 4, colour = "white"),
legend.justification = c(0, 1),
legend.position = c(0.9, 1),
text = element_text(family = "Georgia"),
plot.title = element_text(size = 18, margin = margin(b = 10)),
plot.subtitle = element_text(size = 12, color = "darkslategrey", margin = margin(b = 25)),
plot.caption = element_text(size = 8, margin = margin(t = 10), color = "grey70", hjust = 0))
## Warning: Removed 1 row containing missing values (`geom_line()`).
Create a bar graph of yearly judges nominations and terminations.
Create a comparative line chart or bar chart of the nominations, commissions and terminations.
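One way to approach the comparative chart is to reshape the combined table into long format, so that a single colour aesthetic distinguishes the three series. This is a sketch of a starting point rather than a model answer. It uses a four-year extract of the figures above, and the names `yearly`, `yearly_long`, `event` and `count` are assumptions.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Four-year extract of the combined yearly table
yearly <- tibble(
  year = as.Date(c("1801-01-01", "1802-01-01", "1803-01-01", "1804-01-01")),
  No_of_Nominations = c(18, 7, 2, 3),
  No_of_Commissions = c(21, 10, 2, 3),
  No_of_Terminations = c(6, 20, 1, 2)
)

# Long format: one row per year and event type
yearly_long <- yearly %>%
  pivot_longer(cols = starts_with("No_of_"),
               names_to = "event",
               values_to = "count")

# One line per event type; swap geom_line() for geom_col(position = "dodge") for bars
comparison_plot <- ggplot(yearly_long, aes(x = year, y = count, colour = event)) +
  geom_line() +
  labs(x = "Year", y = "No.", colour = "Event")
```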