Abstract
Although the US Constitution imposes no religious requirement for federal office, virtually all presidents have claimed to be Christians. Of 45 presidents, 2 have been Roman Catholics and 43 have been Protestants. Here is a bar chart of the distribution of these categories. We found the data on Wikipedia. The main challenges were webscraping the data and wrangling it into a tidy dataset that could be analyzed and visualized. Valuable tools included regular expressions, Selector Gadget, Plotly, user-written functions and word clouds. The bar chart shown above is one of the end results of this work.
The steps to achieve it included:
Webscraping to get the raw data from the Wikipedia website. Four methods were explored.
Extensive data wrangling, including use of Regular Expressions (gsub function) and treatment of missing data.
Creation of tables.
Plotting bar charts:
Presidential Years in Office (lines 202-302).
Presidential Political Parties.
Most Popular Presidential First Names.
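The table-and-bar-chart step can be sketched in base R. This is a minimal sketch that uses only the two counts stated in the abstract (2 Roman Catholic, 43 Protestant); the variable names are hypothetical, and it swaps in base `barplot()` for the Plotly charts the report actually produced.

```r
# Counts from the abstract: 2 Roman Catholic and 43 Protestant presidents
branch <- c(rep("Roman Catholic", 2), rep("Protestant", 43))

# table() tallies how many presidents fall in each category
branch_counts <- table(branch)
print(branch_counts)

# Base-R bar chart (the report itself used Plotly)
barplot(branch_counts,
        main = "Religious branch of US presidents",
        ylab = "Number of presidents",
        col  = "steelblue")
```

The same `table()` then `barplot()` pattern applies to the other charts, for example counting political parties or first names instead of religious branches.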
Click here: “President Religions”
## en.wikipedia.org
## [1] TRUE
Proceed only if the robots.txt check (paths allowed) returns TRUE, meaning the site permits scraping this page.
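The `[1] TRUE` above is presumably the result of the robotstxt package's `paths_allowed()` (an assumption, since the call itself is not shown). To illustrate what such a check does, the hypothetical helper below is a deliberately simplified sketch: it assumes only `User-agent: *` groups and prefix-style `Disallow:` rules, and the sample rules are made up rather than taken from Wikipedia's actual robots.txt.

```r
# Hypothetical helper (simplified): honours only "User-agent: *" groups and
# prefix-style "Disallow:" rules. Real parsers such as the robotstxt package
# also handle Allow rules, wildcards and bot-specific groups.
path_allowed_simple <- function(robots_lines, path) {
  in_star_group <- FALSE
  disallowed <- character(0)
  for (line in trimws(robots_lines)) {
    if (grepl("^user-agent:", line, ignore.case = TRUE)) {
      agent <- trimws(sub("^user-agent:", "", line, ignore.case = TRUE))
      in_star_group <- identical(agent, "*")
    } else if (in_star_group && grepl("^disallow:", line, ignore.case = TRUE)) {
      rule <- trimws(sub("^disallow:", "", line, ignore.case = TRUE))
      # An empty Disallow line means "nothing is disallowed"
      if (nzchar(rule)) disallowed <- c(disallowed, rule)
    }
  }
  # Allowed unless the path starts with one of the disallowed prefixes
  !any(startsWith(path, disallowed))
}

# Made-up sample rules, not Wikipedia's real robots.txt
robots <- c("User-agent: *", "Disallow: /w/", "Disallow: /api/")

path_allowed_simple(robots, "/wiki/Example_page")  # TRUE
path_allowed_simple(robots, "/w/index.php")        # FALSE
```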
Webscrape “table” using the rvest package
Using the rvest package and an XPath expression.
How to find the XPath:
Go to the website.
Right-click at the top left of the table and choose ‘Inspect Element’.
Move the cursor up three lines to the line that begins with “<table class=”. You will notice that the entire table is now highlighted in blue.
Right-click and choose “Copy” > “XPath”. This copies the XPath expression, which can now be pasted into your R code.
XPath: //*[@id="mw-content-text"]/div[1]/table[1]
Because this expression contains double quotes, it must be enclosed in single quotes when used as an R string.
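A minimal rvest sketch of how the copied XPath might be used. To stay self-contained it parses a tiny hand-written HTML page that mimics the layout the XPath targets; in practice you would pass the Wikipedia page URL to `read_html()` instead. It assumes rvest 1.0 or later (for `html_element()`), and `presidents_tbl` is a hypothetical name.

```r
library(rvest)

# Tiny stand-in page mimicking the Wikipedia layout the XPath points at.
# In the real analysis this would be read_html() on the Wikipedia URL.
page <- read_html('
<html><body>
  <div id="mw-content-text">
    <div>
      <table>
        <tr><th>Name</th><th>Religion</th></tr>
        <tr><td>Grover Cleveland</td><td>Christian</td></tr>
      </table>
    </div>
  </div>
</body></html>')

# The XPath contains double quotes, so the R string uses single quotes
xpath <- '//*[@id="mw-content-text"]/div[1]/table[1]'

# Select the table node and convert it to a data frame (tibble)
presidents_tbl <- page %>%
  html_element(xpath = xpath) %>%
  html_table()

print(presidents_tbl)
```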
Using OutWit Hub software.
Click here: “OutWit Hub”
OutWit Hub saved the scraped table as a CSV file, which is loaded here.
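Loading the exported CSV might look like the following sketch. The file name `outwit_presidents.csv` is hypothetical, so the runnable part reads an inline string built from the Grover Cleveland row shown below. `na.strings = ""` turns empty cells (such as the blank Notes field) into `NA`, one simple way to handle missing data, and `check.names = FALSE` keeps column names such as `#` and `Further branch` unchanged.

```r
# With a real export this would be something like (hypothetical file name):
# raw_data <- read.csv("outwit_presidents.csv", na.strings = "",
#                      stringsAsFactors = FALSE, check.names = FALSE)

# Self-contained stand-in built from the Grover Cleveland row
csv_text <- paste0(
  "#,Name,Religion,Branch,Further branch,Specific denomination,Years in office,Notes\n",
  "22+24,Grover Cleveland,Christian,Protestant,Reformed,",
  "Presbyterian Church in the United States of America,1885\u20131889; 1893-1897,\n"
)

raw_data_demo <- read.csv(text = csv_text, na.strings = "",
                          stringsAsFactors = FALSE, check.names = FALSE)

# Empty Notes cell is now NA; "#" and "Further branch" keep their names
str(raw_data_demo)
```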
Three of the raw datasets (raw_data_2, raw_data_3 and raw_data_4) are the cleanest. We will work with raw_data_3.
Grover Cleveland served two non-consecutive terms, as the 22nd and 24th president, so his row lists both numbers (“22+24”) and both terms of office:
## # Name Religion Branch Further branch
## 1 22+24 Grover Cleveland Christian Protestant Reformed
## Specific denomination Years in office
## 1 Presbyterian Church in the United States of America 1885–1889; 1893-1897
## Notes
## 1
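The “Years in office” cell above mixes an en dash and a hyphen and holds two terms, which is the kind of text that needs regular expressions before years in office can be counted. The sketch below is one possible approach, not the report's actual code. The helper names are hypothetical; it assumes each term is written as `YYYY-YYYY` (with a hyphen or en dash) and counts years as end year minus start year, ignoring exact inauguration dates.

```r
# Hypothetical helper: split a presidency number such as "22+24" into integers
president_numbers <- function(num_text) {
  as.integer(strsplit(num_text, "+", fixed = TRUE)[[1]])
}

# Hypothetical helper: total years in office from text like "1885–1889; 1893-1897"
years_in_office <- function(span_text) {
  # Replace en dashes (U+2013) with plain hyphens so one pattern fits every row
  span_text <- gsub("\u2013", "-", span_text, fixed = TRUE)
  # Multiple terms are separated by semicolons
  terms <- trimws(strsplit(span_text, ";", fixed = TRUE)[[1]])
  # Capture the start and end years of each "YYYY-YYYY" term
  pattern <- "^([0-9]{4})-([0-9]{4})$"
  starts <- as.integer(sub(pattern, "\\1", terms))
  ends   <- as.integer(sub(pattern, "\\2", terms))
  sum(ends - starts)
}

president_numbers("22+24")                    # 22 24
years_in_office("1885\u20131889; 1893-1897")  # 8
```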
Franklin D. Roosevelt is the only president to have served more than eight years. The 22nd Amendment to the US Constitution, passed by Congress in 1947 and ratified in 1951, put a two-term limit on the presidency.
Using “Selector Gadget” and the rvest package
Click here: “Selector Gadget”
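Selector Gadget reports a CSS selector rather than an XPath, and rvest accepts either. A minimal sketch, again on a tiny stand-in page: the selector `table.wikitable td:nth-child(1)` is a hypothetical example of the kind of string Selector Gadget might produce, not the one used in this report, and `html_elements()` and `html_text2()` assume rvest 1.0 or later.

```r
library(rvest)

# Stand-in page; in practice use read_html() on the Wikipedia URL
page <- read_html('
<html><body>
  <table class="wikitable">
    <tr><th>Name</th><th>Religion</th></tr>
    <tr><td>Grover Cleveland</td><td>Christian</td></tr>
    <tr><td>Franklin D. Roosevelt</td><td>Christian</td></tr>
  </table>
</body></html>')

# Hypothetical selector of the kind Selector Gadget generates:
# the first <td> cell in each row, i.e. the Name column
name_selector <- "table.wikitable td:nth-child(1)"

# Extract the matching cells as clean text
president_names <- page %>%
  html_elements(css = name_selector) %>%
  html_text2()

print(president_names)
```

Selecting one column at a time like this is convenient when only a few fields are needed; for the whole table, `html_table()` on the table node is usually simpler.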