Capture all the links on the web scraping blog pages (pages 1 to 17):
urls <- unlist(llply(url_list, getHTMLLinks))
head(urls)
## [1] "http://www.r-bloggers.com/"
## [2] "http://www.r-bloggers.com"
## [3] "http://www.r-bloggers.com/about/"
## [4] "http://feeds.feedburner.com/RBloggers"
## [5] "http://www.r-bloggers.com/add-your-blog/"
## [6] "http://www.r-users.com/"
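The first entries printed by head(urls) are site navigation links rather than posts, so the vector presumably has to be trimmed before scraping. This section does not show how that was done. The following is only a sketch, assuming post links can be recognised by a hypothetical pattern (`post_pattern`) and using a small made-up vector (`example_urls`) in place of the real `urls`:

```r
# Hypothetical stand-in for the real `urls` vector: a few entries copied
# from the outputs in this section, plus one deliberate duplicate.
example_urls <- c(
  "http://www.r-bloggers.com/",
  "http://www.r-bloggers.com/about/",
  "http://feeds.feedburner.com/RBloggers",
  "http://www.r-bloggers.com/rvest-easy-web-scraping-with-r/",
  "http://www.r-bloggers.com/web-scraping-working-with-apis/",
  "http://www.r-bloggers.com/web-scraping-working-with-apis/"
)

# Assumption: post URLs live on r-bloggers.com and contain "scrap"
# ("scrape", "scraping") in their slug. Adjust to whatever identifies
# posts in practice.
post_pattern <- "^http://www\\.r-bloggers\\.com/[^/]*scrap[^/]*/$"

# Keep matching links only, then drop repeats so each post is scraped once.
post_urls <- unique(example_urls[grepl(post_pattern, example_urls)])
print(post_urls)
```

Dropping duplicates before scraping avoids fetching the same post more than once.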
Apply the same blogScraper function to all of the web scraping URLs to get each post's title, date, and author alongside its URL:
a <- sapply(urls, blogScraper)
class(a)
## [1] "matrix"
head(a)
## [,1]
## [1,] "http://www.r-bloggers.com/rvest-easy-web-scraping-with-r/"
## [2,] "rvest: easy web scraping with R"
## [3,] "November 24, 2014"
## [4,] "hadleywickham"
## [,2]
## [1,] "http://www.r-bloggers.com/migrating-table-oriented-web-scraping-code-to-rvest-wxpath-css-selector-examples/"
## [2,] "Migrating Table-oriented Web Scraping Code to rvest w/XPath & CSS Selector Examples"
## [3,] "September 17, 2014"
## [4,] "Bob Rudis (@hrbrmstr)"
## [,3]
## [1,] "http://www.r-bloggers.com/web-scraping-working-with-apis/"
## [2,] "Web Scraping: working with APIs"
## [3,] "March 12, 2014"
## [4,] "Rolf Fredheim"
## [,4]
## [1,] "http://www.r-bloggers.com/web-scraping-scaling-up-digital-data-collection/"
## [2,] "Web Scraping: Scaling up Digital Data Collection"
## [3,] "March 5, 2014"
## [4,] "Rolf Fredheim"
## [,5]
## [1,] "http://www.r-bloggers.com/web-scraping-part2-digging-deeper/"
## [2,] "Web Scraping part2: Digging deeper"
## [3,] "February 25, 2014"
## [4,] "Rolf Fredheim"
## [,6]
## [1,] "http://www.r-bloggers.com/a-little-web-scraping-exercise-with-xml-package/"
## [2,] "A Little Web Scraping Exercise with XML-Package"
## [3,] "April 5, 2012"
## [4,] "Kay Cichini"
## [,7]
## [1,] "http://www.r-bloggers.com/r-web-scraping-r-bloggers-facebook-page-to-gain-further-information-about-an-authors-r-blog-posts-e-g-number-of-likes-comments-shares-etc/"
## [2,] "R: Web Scraping R-bloggers Facebook Page"
## [3,] "January 6, 2012"
## [4,] "Tony Breyal"
## [,8]
## [1,] "http://www.r-bloggers.com/web-scraping-with-python-the-dark-side-of-data/"
## [2,] "Web scraping with Python – the dark side of data"
## [3,] "December 27, 2011"
## [4,] "axiomOfChoice"
## [,9]
## [1,] "http://www.r-bloggers.com/web-scraping-google-via-xpath/"
## [2,] "Web Scraping Google+ via XPath"
## [3,] "November 11, 2011"
## [4,] "Tony Breyal"
## [,10]
## [1,] "http://www.r-bloggers.com/web-scraping-yahoo-search-page-via-xpath/"
## [2,] "Web Scraping Yahoo Search Page via XPath"
## [3,] "November 10, 2011"
## [4,] "Tony Breyal"
## [,11]
## [1,] "http://www.r-bloggers.com/web-scraping-google-scholar-part-2-complete-success/"
## [2,] "Web Scraping Google Scholar: Part 2 (Complete Success)"
## [3,] "November 8, 2011"
## [4,] "Tony Breyal"
## [,12]
## [1,] "http://www.r-bloggers.com/web-scraping-google-scholar-partial-success/"
## [2,] "Web Scraping Google Scholar (Partial Success)"
## [3,] "November 8, 2011"
## [4,] "Tony Breyal"
## [,13]
## [1,] "http://www.r-bloggers.com/web-scraping-google-urls/"
## [2,] "Web Scraping Google URLs"
## [3,] "November 7, 2011"
## [4,] "Tony Breyal"
## [,14]
## [1,] "http://www.r-bloggers.com/next-level-web-scraping/"
## [2,] "Next Level Web Scraping"
## [3,] "November 5, 2011"
## [4,] "Kay Cichini"
## [,15]
## [1,] "http://www.r-bloggers.com/web-scraping-google-scholar-show-result-as-word-cloud-using-r/"
## [2,] "Web Scraping Google Scholar & Show Result as Word Cloud Using R"
## [3,] "November 1, 2011"
## [4,] "Kay Cichini"
## [,16]
## [1,] "http://www.r-bloggers.com/fomc-dates-scraping-data-from-web-pages/"
## [2,] "FOMC Dates – Scraping Data From Web Pages"
## [3,] "November 30, 2014"
## [4,] "Peter Chan"
## [,17]
## [1,] "http://www.r-bloggers.com/web-scraping-the-basics/"
## [2,] "Web-Scraping: the Basics"
## [3,] "February 19, 2014"
## [4,] "Rolf Fredheim"
## [,18]
## [1,] "http://www.r-bloggers.com/r-and-the-web-for-beginners-part-iii-scraping-mps-expenses-in-detail-from-the-web/"
## [2,] "R and the web (for beginners), Part III: Scraping MPs’ expenses in detail from the web"
## [3,] "August 23, 2012"
## [4,] "GivenTheData"
## [,19]
## [1,] "http://www.r-bloggers.com/web-scraping-in-r/"
## [2,] "Web-Scraping in R"
## [3,] "April 2, 2012"
## [4,] "diffuseprior"
## [,20]
## [1,] "http://www.r-bloggers.com/short-r-tutorial-scraping-javascript-generated-data-with-r/"
## [2,] "Short R tutorial: Scraping Javascript Generated Data with R"
## [3,] "March 15, 2015"
## [4,] "DataCamp"
## [,21]
## [1,] "http://www.r-bloggers.com/fomc-dates-full-history-web-scrape/"
## [2,] "FOMC Dates – Full History Web Scrape"
## [3,] "January 21, 2015"
## [4,] "Peter Chan"
## [,22]
## [1,] "http://www.r-bloggers.com/r-screen-scraping-105-counties-of-election-data/"
## [2,] "R Screen Scraping: 105 Counties of Election Data"
## [3,] "February 18, 2011"
## [4,] "Earl Glynn"
## [,23]
## [1,] "http://www.r-bloggers.com/simple-r-screen-scraping-example/"
## [2,] "Simple R Screen Scraping Example"
## [3,] "February 18, 2011"
## [4,] "Earl Glynn"
## [,24]
## [1,] "http://www.r-bloggers.com/automatically-scrape-flight-ticket-data-using-r-and-phantomjs/"
## [2,] "Automatically Scrape Flight Ticket Data Using R and Phantomjs"
## [3,] "April 30, 2014"
## [4,] "Huidong Tian"
## [,25]
## [1,] "http://www.r-bloggers.com/r-a-quick-scrape-of-top-grossing-films-from-boxofficemojo-com/"
## [2,] "R: A Quick Scrape of Top Grossing Films from boxofficemojo.com"
## [3,] "January 13, 2012"
## [4,] "Tony Breyal"
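sapply returns a matrix here because every call yields a character vector of the same length, and sapply binds those vectors together as columns. That also explains why head(a) prints every post: the matrix has only four rows, so the default six-row limit of head never applies. A minimal sketch with a hypothetical stand-in, `fake_scraper`, since the definition of blogScraper is not shown in this section:

```r
# Hypothetical stand-in for blogScraper: returns four fields per URL
# without touching the network.
fake_scraper <- function(url) {
  c(url, paste("Title of", url), "January 1, 2015", "Some Author")
}

# Made-up URLs, for illustration only.
demo_urls <- c("http://example.com/a/",
               "http://example.com/b/",
               "http://example.com/c/")

# USE.NAMES = FALSE keeps the URLs from being used as column names.
m <- sapply(demo_urls, fake_scraper, USE.NAMES = FALSE)

print(is.matrix(m))  # equal-length results are simplified to a matrix
print(dim(m))        # rows are fields, columns are URLs
print(head(m))       # fewer than six rows, so the whole matrix is shown
```

If the scraper ever returned vectors of different lengths, sapply would fall back to a list, so a fixed-length return value is what makes the later transpose step possible.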
Transpose the matrix so that each record occupies one row:
a <- t(a)
class(a)
## [1] "matrix"
head(a)
## [,1]
## [1,] "http://www.r-bloggers.com/rvest-easy-web-scraping-with-r/"
## [2,] "http://www.r-bloggers.com/migrating-table-oriented-web-scraping-code-to-rvest-wxpath-css-selector-examples/"
## [3,] "http://www.r-bloggers.com/web-scraping-working-with-apis/"
## [4,] "http://www.r-bloggers.com/web-scraping-scaling-up-digital-data-collection/"
## [5,] "http://www.r-bloggers.com/web-scraping-part2-digging-deeper/"
## [6,] "http://www.r-bloggers.com/a-little-web-scraping-exercise-with-xml-package/"
## [,2]
## [1,] "rvest: easy web scraping with R"
## [2,] "Migrating Table-oriented Web Scraping Code to rvest w/XPath & CSS Selector Examples"
## [3,] "Web Scraping: working with APIs"
## [4,] "Web Scraping: Scaling up Digital Data Collection"
## [5,] "Web Scraping part2: Digging deeper"
## [6,] "A Little Web Scraping Exercise with XML-Package"
## [,3] [,4]
## [1,] "November 24, 2014" "hadleywickham"
## [2,] "September 17, 2014" "Bob Rudis (@hrbrmstr)"
## [3,] "March 12, 2014" "Rolf Fredheim"
## [4,] "March 5, 2014" "Rolf Fredheim"
## [5,] "February 25, 2014" "Rolf Fredheim"
## [6,] "April 5, 2012" "Kay Cichini"
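Convert the matrix to a data frame and name its columns appropriately:
# as.data.frame() gives an unnamed matrix the default column names V1..V4
a <- as.data.frame(a, stringsAsFactors = FALSE)
a <- rename(a, c("V1" = "Blog_URL", "V2" = "Blog_Title", "V3" = "Blog_Date", "V4" = "Blog_Author"))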
class(a)
## [1] "data.frame"
a
## Blog_URL
## 1 http://www.r-bloggers.com/rvest-easy-web-scraping-with-r/
## 2 http://www.r-bloggers.com/migrating-table-oriented-web-scraping-code-to-rvest-wxpath-css-selector-examples/
## 3 http://www.r-bloggers.com/web-scraping-working-with-apis/
## 4 http://www.r-bloggers.com/web-scraping-scaling-up-digital-data-collection/
## 5 http://www.r-bloggers.com/web-scraping-part2-digging-deeper/
## 6 http://www.r-bloggers.com/a-little-web-scraping-exercise-with-xml-package/
## 7 http://www.r-bloggers.com/r-web-scraping-r-bloggers-facebook-page-to-gain-further-information-about-an-authors-r-blog-posts-e-g-number-of-likes-comments-shares-etc/
## 8 http://www.r-bloggers.com/web-scraping-with-python-the-dark-side-of-data/
## 9 http://www.r-bloggers.com/web-scraping-google-via-xpath/
## 10 http://www.r-bloggers.com/web-scraping-yahoo-search-page-via-xpath/
## 11 http://www.r-bloggers.com/web-scraping-google-scholar-part-2-complete-success/
## 12 http://www.r-bloggers.com/web-scraping-google-scholar-partial-success/
## 13 http://www.r-bloggers.com/web-scraping-google-urls/
## 14 http://www.r-bloggers.com/next-level-web-scraping/
## 15 http://www.r-bloggers.com/web-scraping-google-scholar-show-result-as-word-cloud-using-r/
## 16 http://www.r-bloggers.com/fomc-dates-scraping-data-from-web-pages/
## 17 http://www.r-bloggers.com/web-scraping-the-basics/
## 18 http://www.r-bloggers.com/r-and-the-web-for-beginners-part-iii-scraping-mps-expenses-in-detail-from-the-web/
## 19 http://www.r-bloggers.com/web-scraping-in-r/
## 20 http://www.r-bloggers.com/short-r-tutorial-scraping-javascript-generated-data-with-r/
## 21 http://www.r-bloggers.com/fomc-dates-full-history-web-scrape/
## 22 http://www.r-bloggers.com/r-screen-scraping-105-counties-of-election-data/
## 23 http://www.r-bloggers.com/simple-r-screen-scraping-example/
## 24 http://www.r-bloggers.com/automatically-scrape-flight-ticket-data-using-r-and-phantomjs/
## 25 http://www.r-bloggers.com/r-a-quick-scrape-of-top-grossing-films-from-boxofficemojo-com/
## Blog_Title
## 1 rvest: easy web scraping with R
## 2 Migrating Table-oriented Web Scraping Code to rvest w/XPath & CSS Selector Examples
## 3 Web Scraping: working with APIs
## 4 Web Scraping: Scaling up Digital Data Collection
## 5 Web Scraping part2: Digging deeper
## 6 A Little Web Scraping Exercise with XML-Package
## 7 R: Web Scraping R-bloggers Facebook Page
## 8 Web scraping with Python – the dark side of data
## 9 Web Scraping Google+ via XPath
## 10 Web Scraping Yahoo Search Page via XPath
## 11 Web Scraping Google Scholar: Part 2 (Complete Success)
## 12 Web Scraping Google Scholar (Partial Success)
## 13 Web Scraping Google URLs
## 14 Next Level Web Scraping
## 15 Web Scraping Google Scholar & Show Result as Word Cloud Using R
## 16 FOMC Dates – Scraping Data From Web Pages
## 17 Web-Scraping: the Basics
## 18 R and the web (for beginners), Part III: Scraping MPs’ expenses in detail from the web
## 19 Web-Scraping in R
## 20 Short R tutorial: Scraping Javascript Generated Data with R
## 21 FOMC Dates – Full History Web Scrape
## 22 R Screen Scraping: 105 Counties of Election Data
## 23 Simple R Screen Scraping Example
## 24 Automatically Scrape Flight Ticket Data Using R and Phantomjs
## 25 R: A Quick Scrape of Top Grossing Films from boxofficemojo.com
## Blog_Date Blog_Author
## 1 November 24, 2014 hadleywickham
## 2 September 17, 2014 Bob Rudis (@hrbrmstr)
## 3 March 12, 2014 Rolf Fredheim
## 4 March 5, 2014 Rolf Fredheim
## 5 February 25, 2014 Rolf Fredheim
## 6 April 5, 2012 Kay Cichini
## 7 January 6, 2012 Tony Breyal
## 8 December 27, 2011 axiomOfChoice
## 9 November 11, 2011 Tony Breyal
## 10 November 10, 2011 Tony Breyal
## 11 November 8, 2011 Tony Breyal
## 12 November 8, 2011 Tony Breyal
## 13 November 7, 2011 Tony Breyal
## 14 November 5, 2011 Kay Cichini
## 15 November 1, 2011 Kay Cichini
## 16 November 30, 2014 Peter Chan
## 17 February 19, 2014 Rolf Fredheim
## 18 August 23, 2012 GivenTheData
## 19 April 2, 2012 diffuseprior
## 20 March 15, 2015 DataCamp
## 21 January 21, 2015 Peter Chan
## 22 February 18, 2011 Earl Glynn
## 23 February 18, 2011 Earl Glynn
## 24 April 30, 2014 Huidong Tian
## 25 January 13, 2012 Tony Breyal
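The V1 to V4 names used in the rename call are the defaults that as.data.frame assigns to a matrix without column names. The same transpose, convert, and rename steps can be sketched in base R on a toy matrix. The names `toy` and `df` and all of the values are invented for illustration:

```r
# Toy 4 x 2 matrix in the same layout as the sapply() result:
# one column per post, one row per field.
toy <- matrix(
  c("http://example.com/a/", "Post A", "January 1, 2015", "Author A",
    "http://example.com/b/", "Post B", "January 2, 2015", "Author B"),
  nrow = 4
)

# Transpose so each post becomes a row, then convert to a data frame.
# A matrix without column names gets the default names V1, V2, ...
df <- as.data.frame(t(toy), stringsAsFactors = FALSE)
print(names(df))

# Base R alternative to plyr::rename(): assign all names at once.
names(df) <- c("Blog_URL", "Blog_Title", "Blog_Date", "Blog_Author")
print(df)
```

Assigning names positionally is shorter, while the old-to-new mapping used by rename makes it explicit which column receives which name.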