The bar chart shown above is one of the end results of this challenge.

The steps to achieve it included:

  1. Webscraping to get the raw data from the Wikipedia website. Four methods were explored.

  2. Extensive data wrangling, including use of Regular Expressions (gsub function) and treatment of missing data.

  3. Creation of tables.

  4. Plotting bar charts.

  5. Presidential Years in Office. Lines 202-302.

  6. Presidential Political Parties.

  7. Most Popular Presidential First Names.

Click here: “President Religions”

Load the necessary packages (install them from CRAN if needed)

Check with robotstxt

##  en.wikipedia.org
## [1] TRUE

Proceed only if paths_allowed() returns TRUE.

Webscraping Data: Method 1

Methods 2 and 3

Webscrape “table” using the rvest package

Method 4

Using the rvest package with an XPath expression.

How to find the Xpath.

  1. Go to the website.

  2. Right click at the top left of the table and go to ‘Inspect Element’.

  3. Move the cursor up 3 lines to the line that begins with “<table class=”. You will notice that the entire table is now highlighted in blue.

  4. Right click and choose “Copy” > “XPath”. This copies the XPath expression, which can now be pasted into your R code.

  5. XPath = //*[@id="mw-content-text"]/div[1]/table[1] Note that, because the expression itself contains double quotes, it must be enclosed in single quotes in R.
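As a minimal sketch of how the copied XPath might be used with rvest, the example below parses a small inline HTML snippet that mimics the page structure instead of fetching the live Wikipedia page. The sample table contents and the names `html`, `xp`, and `raw_data_xpath` are illustrative assumptions, not the original code.

```r
library(rvest)

# Stand-in HTML that mimics the page layout; the rows are made up for illustration.
html <- '<html><body><div id="mw-content-text"><div>
  <table class="wikitable">
    <tr><th>Name</th><th>Branch</th></tr>
    <tr><td>President A</td><td>Protestant</td></tr>
    <tr><td>President B</td><td>Catholic</td></tr>
  </table>
</div></div></body></html>'

page <- read_html(html)

# The XPath contains double quotes, so the R string is wrapped in single quotes.
xp <- '//*[@id="mw-content-text"]/div[1]/table[1]'

# Select the table node, then convert it to a data frame (a tibble in rvest >= 1.0).
tbl_node <- html_element(page, xpath = xp)
raw_data_xpath <- html_table(tbl_node)
print(raw_data_xpath)
```

For the real page, `read_html()` would be given the page URL instead of the inline string; the rest of the sketch stays the same.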

Method 5

Using OutWit Hub software.

Click here: “OutWit Hub”

OutWit Hub saved the data in a CSV file, which is loaded here.

Start data wrangling

Three versions of raw_data (2,3,4) are the cleanest. Let’s work with raw_data_3.
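The kind of cleanup involved can be sketched with base R. The snippet below is only an illustration on a made-up data frame (`raw` is a hypothetical stand-in, not the scraped `raw_data_3`): it strips bracketed footnote markers with `gsub` and converts empty or placeholder strings to `NA`. Which patterns actually need cleaning depends on the scraped table.

```r
# Hypothetical sample rows; the footnote markers and notes are invented.
raw <- data.frame(
  Name  = c("President A[1]", "President B [note 2]", "President C"),
  Notes = c("", "N/A", "some note"),
  stringsAsFactors = FALSE
)

# Remove any bracketed footnote marker, plus whitespace before it.
raw$Name <- gsub("[[:space:]]*\\[[^]]*\\]", "", raw$Name)

# Treat empty strings and "N/A" placeholders as missing values.
raw$Notes[raw$Notes %in% c("", "N/A")] <- NA

print(raw)
```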

Name Column

Grover Cleveland was president for two non-consecutive terms. He was the 22nd and 24th president.

##       #             Name  Religion     Branch Further branch
## 1 22+24 Grover Cleveland Christian Protestant       Reformed
##                                 Specific denomination      Years in office
## 1 Presbyterian Church in the United States of America 1885–1889; 1893-1897
##   Notes
## 1
Create Tables
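A frequency table of one column can be built with base R's `table()`. This is a minimal sketch on a made-up vector (`branch` is a hypothetical stand-in for the cleaned “Branch” column), not the original code.

```r
# Made-up values standing in for the cleaned "Branch" column.
branch <- c("Protestant", "Protestant", "Catholic", "Protestant", "Unitarian")

# Count occurrences of each value and sort from most to least frequent.
branch_counts <- sort(table(branch), decreasing = TRUE)

# Convert to a data frame, which is convenient for further wrangling or plotting.
branch_df <- as.data.frame(branch_counts, stringsAsFactors = FALSE)
print(branch_df)
```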
Create plots
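A bar chart of such counts can be drawn with base graphics. The sketch below assumes base R's `barplot()` and made-up counts; the original plots may have used a different plotting package and styling.

```r
# Made-up values standing in for a cleaned categorical column.
branch <- c("Protestant", "Protestant", "Catholic", "Protestant", "Unitarian")
counts <- sort(table(branch), decreasing = TRUE)

# barplot() draws the bars and returns the x positions of the bar midpoints.
mids <- barplot(
  counts,
  main = "Counts by branch (sample data)",
  ylab = "Number of presidents",
  col  = "steelblue",
  las  = 2  # rotate axis labels so long category names stay readable
)
```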

What can we do with the “Years in Office” data?
President Years in Office

Franklin D. Roosevelt is the only President to have served more than 8 years. The 22nd Amendment to the US Constitution, passed by Congress in 1947 and ratified in 1951, put a two-term limit on the Presidency.
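One way to turn a “Years in office” string into a number of years is sketched below. The helper `years_in_office` is hypothetical (not from the original code). It assumes each term is written as a start year and an end year, with terms separated by semicolons, as in the Grover Cleveland row above, and it normalizes the en dash to a plain hyphen first.

```r
# Hypothetical helper: total years covered by a "Years in office" string.
years_in_office <- function(x) {
  x <- gsub("\u2013", "-", x)                     # en dash -> plain hyphen
  spans <- strsplit(x, ";", fixed = TRUE)[[1]]    # one element per term in office
  yrs <- regmatches(spans, gregexpr("[0-9]{4}", spans))  # four-digit years per span
  per_span <- vapply(
    yrs,
    # A span without exactly two years (e.g. an open-ended term) yields NA.
    function(y) if (length(y) == 2) diff(as.numeric(y)) else NA_real_,
    numeric(1)
  )
  sum(per_span)
}

cleveland <- years_in_office("1885\u20131889; 1893-1897")
roosevelt <- years_in_office("1933\u20131945")
print(c(Cleveland = cleveland, Roosevelt = roosevelt))
```

Because this subtracts calendar years, it only approximates time in office for terms that began or ended mid-year.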

President political parties

Use “Selector Gadget” and the rvest package.

Click here: “Selector Gadget”
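Selector Gadget yields a CSS selector that can be passed to rvest. The sketch below runs on inline HTML rather than the live page. The selector `td.party`, the class names, and the party labels are all made-up assumptions; the real selector is whatever Selector Gadget reports for the party column.

```r
library(rvest)

# Stand-in HTML; the class names and cell contents are invented for illustration.
html <- '<table>
  <tr><td class="name">President A</td><td class="party">Party X</td></tr>
  <tr><td class="name">President B</td><td class="party">Party Y</td></tr>
  <tr><td class="name">President C</td><td class="party">Party X</td></tr>
</table>'

page <- read_html(html)

# Select every node matching the CSS selector and extract its text.
parties <- html_text2(html_elements(page, "td.party"))

# Tabulate how often each party appears.
party_counts <- table(parties)
print(party_counts)
```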

Wordcloud of Further Branches
Presidential Religious Affiliations
Presidential Political Party Frequencies