The site r-bloggers is a team blog, with a lot of great how-to content on various R topics. The page http://www.r-bloggers.com/search/web%20scraping provides a list of topics related to web scraping, which is also the topic of this project!

Part 1: For each of the referenced blog entries on the first page, pull out the title, date, and author, and store them in an R data frame. Your code should be on GitHub and published to rpubs.com.

library(rvest)
## Warning: package 'rvest' was built under R version 3.1.3
library(XML)
## Warning: package 'XML' was built under R version 3.1.3
## 
## Attaching package: 'XML'
## 
## The following object is masked from 'package:rvest':
## 
##     xml
library(knitr)
## Warning: package 'knitr' was built under R version 3.1.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.1.3
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(gsubfn)
## Warning: package 'gsubfn' was built under R version 3.1.3
## Loading required package: proto
library(tools)
## 
## Attaching package: 'tools'
## 
## The following object is masked from 'package:XML':
## 
##     toHTML

Start with the first search page.

r_bloggers <- html("http://www.r-bloggers.com/search/web%20scraping")

posts <- r_bloggers %>%
  html_nodes(xpath = '//div[contains(@id,"post")]')

titles <- posts %>%
  html_nodes(xpath = 'h2/a/text()')
t_titles <- data.frame(sapply(titles,xmlValue))  

dates <- posts %>%
  html_nodes(xpath = 'div[1]/div')
t_dates <- data.frame(sapply(dates,xmlValue))

authors <- posts %>%
  html_nodes(xpath = 'div[1]/a')
t_authors <- data.frame(sapply(authors,xmlValue))  

t_posts <- cbind(Title = t_titles, Date = t_dates, Author = t_authors) #merge tables
colnames(t_posts) <- c("Title", "Date", "Author") #add colnames to table

#View table of first page posts
kable(t_posts)
| Title | Date | Author |
|:------|:-----|:-------|
| rvest: easy web scraping with R | November 24, 2014 | hadleywickham |
| Migrating Table-oriented Web Scraping Code to rvest w/XPath & CSS Selector Examples | September 17, 2014 | Bob Rudis (@hrbrmstr) |
| Web Scraping: working with APIs | March 12, 2014 | Rolf Fredheim |
| Web Scraping: Scaling up Digital Data Collection | March 5, 2014 | Rolf Fredheim |
| Web Scraping part2: Digging deeper | February 25, 2014 | Rolf Fredheim |
| A Little Web Scraping Exercise with XML-Package | April 5, 2012 | Kay Cichini |
| R: Web Scraping R-bloggers Facebook Page | January 6, 2012 | Tony Breyal |
| Web scraping with Python – the dark side of data | December 27, 2011 | axiomOfChoice |
| Web Scraping Google+ via XPath | November 11, 2011 | Tony Breyal |
| Web Scraping Yahoo Search Page via XPath | November 10, 2011 | Tony Breyal |
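
The `html()` call and the `sapply(..., xmlValue)` pattern above match the rvest release that was current when this was written. Later rvest releases deprecated `html()` in favour of `read_html()` and return xml2 nodes, for which `html_text()` is the text accessor. A minimal sketch of the same extraction under the newer API follows. The HTML fragment is a made-up stand-in that only mimics the structure implied by the XPath expressions above (it is not the real r-bloggers markup), so the example runs without network access.

```r
library(rvest)

# Hypothetical markup, modeled only on the XPath expressions used above.
page <- read_html('
<html><body>
  <div id="post-1">
    <h2><a href="#">First title</a></h2>
    <div class="meta"><div class="date">January 1, 2015</div><a href="#">alice</a></div>
  </div>
  <div id="post-2">
    <h2><a href="#">Second title</a></h2>
    <div class="meta"><div class="date">January 2, 2015</div><a href="#">bob</a></div>
  </div>
</body></html>')

posts <- html_nodes(page, xpath = '//div[contains(@id, "post")]')

# Each relative XPath is evaluated against every post node;
# html_text() returns the text content as a character vector.
sketch_posts <- data.frame(
  Title  = html_text(html_nodes(posts, xpath = "h2/a")),
  Date   = html_text(html_nodes(posts, xpath = "div[1]/div")),
  Author = html_text(html_nodes(posts, xpath = "div[1]/a")),
  stringsAsFactors = FALSE
)
sketch_posts
```

Building the data frame in one `data.frame()` call also removes the need for the separate `cbind()` and `colnames()` steps.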

Part 2: Extend your scraper to include the base information for blog entries on all of the tagged pages. Your R data frame should include any necessary additional rows.

Parse the first search page to find the total number of result pages.

baseurl <- htmlParse("http://www.r-bloggers.com/search/web%20scraping")
xpath <- '//*[contains(@class, "last")]'
total_pages <- as.numeric(xpathSApply(baseurl, xpath, xmlValue))
total_pages
## [1] 17
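
Calling `as.numeric()` on the link text works only while the text of the "last" link is the bare page number. A slightly more defensive sketch reads the number from the link target instead. The `/page/<n>` URL shape is taken from the page URLs used in this post; the sample URLs passed in and the helper name `page_from_href` are made up for illustration.

```r
# Hypothetical helper: pull the page number out of a pagination URL.
page_from_href <- function(href) {
  # Match the digits that directly follow "/page/".
  m <- regmatches(href, regexpr("(?<=/page/)[0-9]+", href, perl = TRUE))
  if (length(m) == 0) return(NA_integer_)  # no page component in the URL
  as.integer(m)
}

page_from_href("http://www.r-bloggers.com/search/web%20scraping/page/17/")
page_from_href("http://www.r-bloggers.com/search/web%20scraping")
```

The second call returns `NA` rather than failing, which makes a missing pagination link easy to detect before starting the loop.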

Loop over the remaining search result pages.
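
For a single numeric placeholder, base R's `sprintf()` can build the same page URLs without a templating helper. A small sketch, assuming the `/page/<n>` URL pattern used in this post; the names `base_url` and `page_urls` are illustrative.

```r
base_url <- "http://www.r-bloggers.com/search/web%20scraping"

# Pass the base URL as an argument so that its literal "%20"
# is not interpreted as part of the format string.
page_urls <- sprintf("%s/page/%d", base_url, 2:4)
page_urls
```

Because `sprintf()` is vectorized, all page URLs are produced in one call and can then be iterated over directly.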

#Substitute page number in a loop to set other search page URLs
substitute_url_args <- function(url, list_args) {
  gsubfn("%\\((.*?)\\)s", x = url, env = list_args)
}

n <- total_pages
# seq_len(n)[-1] is 2..n, and is empty (rather than 2:1) when there is only one page
for (i in seq_len(n)[-1]) {

  # Page number substituted here
  s <- "http://www.r-bloggers.com/search/web%20scraping/page/%(id)s"
  L <- list(id = i)
  newurl <- substitute_url_args(s, L)

  # Post data pulled from page
  r_bloggers <- html(newurl)

  posts <- r_bloggers %>%
    html_nodes(xpath = '//div[contains(@id,"post")]')

  titles <- posts %>%
    html_nodes(xpath = 'h2/a/text()')
  t_titles <- data.frame(sapply(titles, xmlValue))

  dates <- posts %>%
    html_nodes(xpath = 'div[1]/div')
  t_dates <- data.frame(sapply(dates, xmlValue))

  authors <- posts %>%
    html_nodes(xpath = 'div[1]/a')
  t_authors <- data.frame(sapply(authors, xmlValue))

  t_posts_new <- cbind(Title = t_titles, Date = t_dates, Author = t_authors) # merge tables
  colnames(t_posts_new) <- c("Title", "Date", "Author") # add colnames to table

  # Append this page's rows to the running table
  t_posts <- bind_rows(t_posts, t_posts_new)

}
## Warning in rbind_all(list(x, ...)): Unequal factor levels: coercing to
## character
## Warning in rbind_all(list(x, ...)): Unequal factor levels: coercing to
## character
## Warning in rbind_all(list(x, ...)): Unequal factor levels: coercing to
## character
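
These warnings appear because `data.frame()` in this R version converts character columns to factors by default, and each page brings its own set of factor levels. One way to avoid both the coercion and the repeated binding is to keep strings as strings and collect the per-page data frames in a list, binding once at the end. A sketch with a stand-in `scrape_page()` function (hypothetical; it fabricates placeholder rows instead of fetching anything):

```r
# Hypothetical stand-in for the per-page scraping code; returns placeholder rows.
scrape_page <- function(i) {
  data.frame(
    Title  = paste("Post", i, c("a", "b")),
    Date   = c("January 1, 2015", "January 2, 2015"),
    Author = c("alice", "bob"),
    stringsAsFactors = FALSE  # keep character columns as character
  )
}

# Collect one data frame per page, then bind them in a single step.
pages <- lapply(1:3, scrape_page)
all_posts <- do.call(rbind, pages)

str(all_posts)
```

Binding once at the end also avoids copying the growing table on every iteration.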

View the table of posts from all search result pages.

kable(t_posts)
| Title | Date | Author |
|:------|:-----|:-------|
| rvest: easy web scraping with R | November 24, 2014 | hadleywickham |
| Migrating Table-oriented Web Scraping Code to rvest w/XPath & CSS Selector Examples | September 17, 2014 | Bob Rudis (@hrbrmstr) |
| Web Scraping: working with APIs | March 12, 2014 | Rolf Fredheim |
| Web Scraping: Scaling up Digital Data Collection | March 5, 2014 | Rolf Fredheim |
| Web Scraping part2: Digging deeper | February 25, 2014 | Rolf Fredheim |
| A Little Web Scraping Exercise with XML-Package | April 5, 2012 | Kay Cichini |
| R: Web Scraping R-bloggers Facebook Page | January 6, 2012 | Tony Breyal |
| Web scraping with Python – the dark side of data | December 27, 2011 | axiomOfChoice |
| Web Scraping Google+ via XPath | November 11, 2011 | Tony Breyal |
| Web Scraping Yahoo Search Page via XPath | November 10, 2011 | Tony Breyal |
| Web Scraping Google Scholar: Part 2 (Complete Success) | November 8, 2011 | Tony Breyal |
| Web Scraping Google Scholar (Partial Success) | November 8, 2011 | Tony Breyal |
| Web Scraping Google URLs | November 7, 2011 | Tony Breyal |
| Next Level Web Scraping | November 5, 2011 | Kay Cichini |
| Web Scraping Google Scholar & Show Result as Word Cloud Using R | November 1, 2011 | Kay Cichini |
| Scraping Web Pages With R | April 15, 2015 | Tony Hirst |
| FOMC Dates – Scraping Data From Web Pages | November 30, 2014 | Peter Chan |
| Scraping Fantasy Football Projections from the Web | June 27, 2014 | Isaac Petersen |
| Web-Scraping: the Basics | February 19, 2014 | Rolf Fredheim |
| Relenium, Selenium for R. A new tool for webscraping. | January 4, 2014 | aleixrvr |
| R and the web (for beginners), Part III: Scraping MPs’ expenses in detail from the web | August 23, 2012 | GivenTheData |
| Web-Scraping in R | April 2, 2012 | diffuseprior |
| Scraping table from any web page with R or CloudStat | January 15, 2012 | PR |
| Scraping table from html web with CloudStat | January 12, 2012 | CloudStat |
| A Little Webscraping-Exercise… | October 22, 2011 | Kay Cichini |
| Scraping web data in R | August 10, 2011 | Zach Mayer |
| Webscraping using readLines and RCurl | April 14, 2009 | bryan |
| Webscraping using readLines and RCurl | April 14, 2009 | bryan |
| Short R tutorial: Scraping Javascript Generated Data with R | March 15, 2015 | DataCamp |
| FOMC Dates – Full History Web Scrape | January 21, 2015 | Peter Chan |
| Scraping XML Tables with R | May 15, 2014 | jgreenb1 |
| Scraping SSL Labs Server Test Results With R | April 29, 2014 | Bob Rudis (@hrbrmstr) |
| Interfacing R with Web technologies | April 14, 2014 | David Smith |
| Scraping organism metadata for Treebase repositories from GOLD using Python and R | April 4, 2014 | What is this? David Springate’s personal blog :: R |
| R-Bloggers’ Web-Presence | April 6, 2012 | Kay Cichini |
| How-to Extract Text From Multiple Websites with R | February 18, 2012 | Christopher Gandrud |
| Scraping Flora of North America | January 27, 2012 | Recology - R |
| Scraping R-bloggers with Python – Part 2 | January 5, 2012 | The PolStat R Feed |
| Scraping R-Bloggers with Python | January 4, 2012 | The PolStat R Feed |
| R-Function GScholarScraper to Webscrape Google Scholar Search Result | November 9, 2011 | Kay Cichini |
| Interacting with bioinformatics webservers using R | September 8, 2011 | nsaunders |
| R Screen Scraping: 105 Counties of Election Data | February 18, 2011 | Earl Glynn |
| Simple R Screen Scraping Example | February 18, 2011 | Earl Glynn |
| Scrape Web data using R | August 13, 2010 | |
| Digital Data Collection course | March 20, 2015 | Rolf Fredheim |
| Getting Data From An Online Source | March 6, 2015 | Robert Norberg |
| Playing around with #rstats twitter data | February 28, 2015 | [email protected] |
| 50 years of Christmas at the Windsors | December 19, 2014 | Dominic Nyhuis |
| Power Outage Impact Choropleths In 5 Steps in R (featuring rvest & RStudio “Projects”) | November 27, 2014 | hrbrmstr |
| Slightly Advanced rvest with Help from htmltools + XML + pipeR | November 26, 2014 | klr |
| What size will you be after you lose weight? | November 14, 2014 | dan |
| A bioinformatics walk-through: Accessing protein-protein interaction interfaces for all known protein structures with PDBe PISA | September 28, 2014 | biochemistries |
| R User Group Roundup | August 28, 2014 | Joseph Rickert |
| Automatically Scrape Flight Ticket Data Using R and Phantomjs | April 30, 2014 | Huidong Tian |
| Text Mining Gun Deaths Data | March 13, 2014 | Francis Smart |
| Better handling of JSON data in R? | March 13, 2014 | Rolf Fredheim |
| Upcoming NYC R Programming Classes | March 10, 2014 | vivian |
| Introduction | February 1, 2014 | steadyfish |
| Programming instrumental music from scratch | July 29, 2013 | Vik Paruchuri |
| Programming instrumental music from scratch | July 29, 2013 | - r |
| Programming instrumental music from scratch | July 29, 2013 | Vik Paruchuri |
| xkcd: Visualized | May 6, 2013 | Myles |
| Has R-help gotten meaner over time? And what does Mancur Olson have to say about it? | April 30, 2013 | Trey Causey |
| Data Science, Data Analysis, R and Python | December 15, 2012 | Ron Pearson (aka TheNoodleDoodler) |
| .Rhistory | October 27, 2012 | distantobserver |
| Hangman in R: A learning experience | July 28, 2012 | tylerrinker |
| Data Analysis Training | March 20, 2012 | prasoonsharma |
| Making an R Package: Not as hard as you think | January 11, 2012 | markbulling |
| Plotting Doctor Who Ratings (1963-2011) with R | January 3, 2012 | Tony Breyal |
| GScholarXScraper: Hacking the GScholarScraper function with XPath | November 13, 2011 | Tony Breyal |
| Facebook Graph API Explorer with R | November 10, 2011 | Tony Breyal |
| UCLA Statistics: Analyzing Thesis/Dissertation Lengths | September 29, 2010 | Ryan Rosario |
| Cricket data analysis | September 4, 2010 | prasoonsharma |
| What to Expect? | January 22, 2010 | Ryan |
| Analysing The Rock ‘n’ Roll Madrid Marathon | April 18, 2015 | aschinchon |
| Monitoring Price Fluctuations of Book Trade-In Values on Amazon | April 8, 2015 | Andrew Landgraf |
| More Airline Crashes via the Hadleyverse | March 31, 2015 | hrbrmstr |
| Knitr’s best hidden gem: spin | March 23, 2015 | Dean Attali’s R Blog |
| Fuzzy String Matching – a survival skill to tackle unstructured information | February 26, 2015 | Bigdata Doc |
| Who Has the Best Fantasy Football Projections? 2015 Update | February 20, 2015 | Isaac Petersen |
| Predicting the six nations | February 4, 2015 | Mango Solutions |
| Building a choropleth map of Italy using mapIT | January 19, 2015 | Davide Massidda |
| New updates to the rNOMADS package and big changes in the GFS model | January 16, 2015 | glossarch |
| Explore Kaggle Competition Data with R | December 23, 2014 | notesofdabbler |
| How to analyze a new dataset (or, analyzing ‘supercar’ data, part 1) | December 16, 2014 | Sharpsight Admin |
| FOMC Dates – Price Data Exploration | December 14, 2014 | Peter Chan |
| A Letter of Recommendation for Nan Xiao | November 17, 2014 | Yihui Xie |
| Leveraging R for Job Openings for Economists | November 1, 2014 | Thiemo Fetzer |
| Wrangling F1 Data With R – F1DataJunkie Book | October 30, 2014 | Tony Hirst |
| How to Download and Run R Scripts from this Site | October 23, 2014 | Isaac Petersen |
| FIFA 15 Analysis with R | September 26, 2014 | The Clerk |
| “Do You Want to Steal a Snowman?” – A Look (with R) At TorrentFreak’s Top 10 PiRated Movies List #TLAPD | September 18, 2014 | Bob Rudis (@hrbrmstr) |
| Visit of Di Cook | August 12, 2014 | Rob J Hyndman |
| Identify Fantasy Football Sleepers with this Shiny App | July 6, 2014 | Isaac Petersen |
| Time to Accept It: publishing in the Journal of Statistical Software | June 30, 2014 | brobar |
| 2014 World Cup Squads | June 5, 2014 | gjabel |
| Basketball Data Part II – Length of Career by Position | June 2, 2014 | jgreenb1 |
| Using sentiment analysis to predict ratings of popular tv series | May 26, 2014 | tlfvincent |
| On the trade history and dynamics of NBA teams | April 28, 2014 | tlfvincent |
| Rblogger Posting Patterns Analyzed with R | April 11, 2014 | Mark T Patterson |
| BARUG talks highlight R’s diverse applications | April 10, 2014 | Joseph Rickert |
| Mapping academic collaborations in Evolutionary Biology | April 4, 2014 | What is this? David Springate’s personal blog :: R |
| President Approval Ratings from Roosevelt to Obama | March 29, 2014 | tlfvincent |
| Evolution of Code | March 27, 2014 | Educate-R - R |
| Terms | February 13, 2014 | Tal Galili |
| Live Google Spreadsheet For Keeping Track Of Sochi Medals | February 11, 2014 | hrbrmstr |
| Using One Programming Language In the Context of Another – Python and R | January 22, 2014 | Tony Hirst |
| Statistics meets rhetoric: A text analysis of “I Have a Dream” in R | January 20, 2014 | Max Ghenis |
| Statistics meets rhetoric: A text analysis of “I Have a Dream” in R | January 20, 2014 | Max Ghenis |
| Second NYC R classes(announcement and teaching experience) | January 20, 2014 | Tal Galili |
| Calling Python from R with rPython | January 13, 2014 | bryan |
| Why R is Better Than Excel for Fantasy Football (and most other) Data Analysis | January 13, 2014 | Isaac Petersen |
| College Basketball: Presence in the NBA over Time | November 7, 2013 | Mark T Patterson |
| Creating your personal, portable R code library with GitHub | September 21, 2013 | bryan |
| MLB Rankings Using the Bradley-Terry Model | August 31, 2013 | John Ramey |
| ggplot2 Chloropleth of Supreme Court Decisions: A Tutorial | July 4, 2013 | tylerrinker |
| Which airline should you be loyal to? | July 2, 2013 | dan |
| Opel Corsa Diesel Usage | June 24, 2013 | Wingfeet |
| Logging Data in R Loops: Applied to Twitter. | May 26, 2013 | Alistair Leak |
| Shiny App for CRAN packages | May 13, 2013 | pssguy |
| The Guerilla Guide to R | May 12, 2013 | Nikhil Gopal |
| Presentations of the third Milano R net meeting | April 19, 2013 | Milano R net |
| Milano (Italy). April 18, 2013. Third Milano R net meeting: agenda | April 10, 2013 | Milano R net |
| April 18, 2013Third Milano R net meeting: agenda | March 25, 2013 | Milano R net |
| Generating Labels for Supervised Text Classification using CAT and R | February 4, 2013 | Solomon |
| Hilary: the most poisoned baby name in US history | January 29, 2013 | hilaryparker |
| R and foreign characters | January 25, 2013 | Rolf Fredheim |
| SPARQL with R in less than 5 minutes | January 23, 2013 | bryan |
| Multiple Classification and Authorship of the Hebrew Bible | January 1, 2013 | inkhorn82 |
| Chocolate and nobel prize – a true story? | December 22, 2012 | Max Gordon |
| Animated map of 2012 US election campaigning, with R and ffmpeg | October 28, 2012 | civilstat |
| Tips on accessing data from various sources with R | October 3, 2012 | David Smith |
| R Helper Functions | September 25, 2012 | bryan |
| The R-Podcast Episode 10: Adventures in Data Munging Part 2 | September 16, 2012 | Eric |
| UseR 2012 highlights | June 20, 2012 | David Smith |
| Visualizing the CRAN: Graphing Package Dependencies | May 17, 2012 | wrathematics |
| 118 years of US State Weather Data | April 22, 2012 | drunksandlampposts |
| The 50 most used R packages | April 5, 2012 | flodel |
| RStudio Development Environment | March 23, 2012 | bryan |
| R: A Quick Scrape of Top Grossing Films from boxofficemojo.com | January 13, 2012 | Tony Breyal |
| Installing quantstrat from R-forge and source | January 10, 2012 | bryan |
| Analyzing R-bloggers | January 6, 2012 | The PolStat R Feed |
| Mapping the Iowa GOP 2012 Caucus Results | January 4, 2012 | jjh |
| Outliers in the European Parliament | December 20, 2011 | The PolStat Feed |
| Subscriptions Feature Added | December 7, 2011 | bryan |
| Google Scholar (still) sucks | November 13, 2011 | bbolker |
| Power Tools for Aspiring Data Journalists: R | October 31, 2011 | Tony Hirst |
| Forecasting recessions | August 9, 2011 | Zach Mayer |
| CHCN: Canadian Historical Climate Network | August 4, 2011 | Steven Mosher |
| hacking .gov shortened links | July 30, 2011 | Harlan |
| roll calls, ideal points, 112th Congress | June 29, 2011 | jackman |
| Automating R Scripts on Amazon EC2 | June 9, 2011 | Travis Nelson |
| Friday fun projects | May 14, 2011 | nsaunders |
| Further Adventures in Visualisation with ggplot2 | April 25, 2011 | hayward |
| Friday Function: setInternet2 | April 15, 2011 | richierocks |
| Find NHL Players with 30 Goals and 100 PIM using R | April 2, 2011 | btibert3 |
| NBA Analysis: Coming Soon! | March 21, 2011 | Ryan |
| Clustering NHL Skaters | February 6, 2011 | |
| Dial-a-statistic! Featuring R and Estonia | January 16, 2011 | Ethan Brown |
| How to buy a used car with R (part 1) | October 31, 2010 | Dan Knoepfle’s Blog |
| How to buy a used car with R (part 1) | October 31, 2010 | Dan Knoepfle’s Blog |
| Using XML package vs. BeautifulSoup | August 31, 2010 | Ryan |
| Are MLB Games Getting Longer? | August 5, 2010 | Ryan |
| Analyze Gold Demand and Investments using R | June 29, 2010 | C |
| tooltips in R graphics; nytR package | December 28, 2009 | jackman |
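
The `Date` column is still text, and a few posts appear more than once in the search results. A possible clean-up step, sketched on a three-row sample copied from the table above: drop exact duplicate rows with `unique()` and parse the dates with `as.Date()`. The `%B` month-name format depends on the session locale, so the sketch switches `LC_TIME` to `"C"` first, which is an assumption about what is acceptable in your session.

```r
# Use the C locale for dates so that English month names parse reliably.
Sys.setlocale("LC_TIME", "C")

sample_posts <- data.frame(
  Title  = c("Webscraping using readLines and RCurl",
             "Webscraping using readLines and RCurl",
             "Scraping web data in R"),
  Date   = c("April 14, 2009", "April 14, 2009", "August 10, 2011"),
  Author = c("bryan", "bryan", "Zach Mayer"),
  stringsAsFactors = FALSE
)

# Drop exact duplicate rows, then convert the text dates to Date objects.
clean_posts <- unique(sample_posts)
clean_posts$Date <- as.Date(clean_posts$Date, format = "%B %d, %Y")

# Newest posts first.
clean_posts[order(clean_posts$Date, decreasing = TRUE), ]
```

With real `Date` values the table can be sorted or filtered by time, which is not possible while the column holds strings such as "April 14, 2009".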