The main objective of this project is to parse the blog entries listed at “http://www.r-bloggers.com/search/web%20scraping”. This page contains the results returned when “web scraping” is searched at “http://www.r-bloggers.com”. We will, however, write a generic R function that searches for any key word(s) on the “r-bloggers.com” website.
Specifically, here are the requirements of this project:
Parse the first page of blog entries present at “http://www.r-bloggers.com/search/web%20scraping” and output a data frame named first_page_df, with three variables: title, date and author. Generalize this logic to any web page returned when a key word search is made at “http://www.r-bloggers.com”.
Get the number of pages returned as search results at “http://www.r-bloggers.com/search/web%20scraping”. Generalize this logic to any web page returned when a key word search is made at “http://www.r-bloggers.com”.
Generalize the program logic, and let the user enter a search string. The program should return all the search results (from all the pages returned) in the form of a data frame with three variables: title, date and author
We will develop the following R functions to support the above objectives:
scrape_html(url) This function takes a URL as input and outputs the data frame containing the blog’s title, date (when the blog was posted) and the blog’s author.
max_pages(html_txt) This function takes HTML text (obtained from www.r-bloggers.com whenever a search is made at this website), parses the text, and outputs the number of pages returned as the search result.
search_Rbloggers(search_keywords) This function takes a character string as input, searches for the text at the www.r-bloggers.com website, and returns a data frame containing each blog’s title, date (when the blog was posted) and author. This function calls the other two functions, scrape_html() and max_pages().
We need the following R packages:
rvest
data.table
NOTE: We will use “selector gadget” to get the required HTML elements. For more information, visit “http://selectorgadget.com”
The following R code creates a function named “scrape_html()”. It takes a URL as input, gets the HTML page associated with the URL, and extracts the data in the “h2”, “.meta” and “.date” HTML elements. This function works correctly for the URLs obtained whenever a key word search is made at the www.r-bloggers.com website. If this code is used for any other website, you may see unexpected results. In order to identify the “h2”, “.meta” and “.date” elements, I opened “www.r-bloggers.com/search/web%20scraping” in Google Chrome, enabled the selector gadget, and selected and un-selected the required elements on the page.
scrape_html <- function(url)
{
  library(rvest)
  library(data.table)
  #read_html() replaces html(), which is deprecated in newer rvest releases
  html_txt <- read_html(url)
  #Gets the post headings
  title <- html_txt %>%
    html_nodes("#leftcontent h2") %>%
    html_text()
  #Gets the authors and date information
  authors_and_post_date <- html_txt %>%
    html_nodes(".meta , .date #leftcontent h2") %>%
    html_text()
  #Splits each "<date> By <author>" string into a date column and an author column
  temp_df <- data.frame(rbindlist(lapply(strsplit(authors_and_post_date,split="By "),as.list)))
  names(temp_df) <- c("date","author")
  #If the author's details are protected, NA values will be displayed
  temp_df$author[grep("protected",as.vector(temp_df$author))] <- NA
  page_df <- cbind(title,temp_df)
  return(page_df)
}
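The selector step inside scrape_html() can be tried without a network connection. The following is a minimal sketch, assuming a made-up HTML fragment that only imitates the “#leftcontent h2” and “.meta” structure described above; the real r-bloggers.com markup may differ.

```r
library(rvest)

# Made-up HTML fragment for illustration only; the element layout is an
# assumption and is not copied from r-bloggers.com.
sample_html <- '<div id="leftcontent">
  <h2><a href="#">First post</a></h2>
  <div class="meta">January 1, 2015 By alice</div>
  <h2><a href="#">Second post</a></h2>
  <div class="meta">January 2, 2015 By bob</div>
</div>'

# read_html() also accepts a string of literal HTML
page <- read_html(sample_html)

# Headings nested inside the element with id "leftcontent"
titles <- page %>%
  html_nodes("#leftcontent h2") %>%
  html_text()

# Elements with class "meta", holding the date and author text
meta <- page %>%
  html_nodes(".meta") %>%
  html_text()

print(titles)
print(meta)
```

Each selector returns one string per matching element, which is why scrape_html() can combine the titles and the date/author text column by column.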
Let us call the scrape_html() function with “http://www.r-bloggers.com/search/web%20scraping” as input. This is the first page obtained when “web scraping” is searched at “www.r-bloggers.com”. Note that the output is a data frame containing the title, date and author of each post on the input page only.
NOTE: This function call satisfies objective 1 of our project: “Parse the first page of blog entries present at “http://www.r-bloggers.com/search/web%20scraping” and output a data frame named first_page_df, with three variables: title, date and author”
#Calling the scrape_html function with "http://www.r-bloggers.com/search/web%20scraping" as input.
url <- "http://www.r-bloggers.com/search/web%20scraping"
first_page_df <- scrape_html(url)
first_page_df
## title
## 1 rvest: easy web scraping with R
## 2 Migrating Table-oriented Web Scraping Code to rvest w/XPath & CSS Selector Examples
## 3 Web Scraping: working with APIs
## 4 Web Scraping: Scaling up Digital Data Collection
## 5 Web Scraping part2: Digging deeper
## 6 A Little Web Scraping Exercise with XML-Package
## 7 R: Web Scraping R-bloggers Facebook Page
## 8 Web scraping with Python – the dark side of data
## 9 Web Scraping Google+ via XPath
## 10 Web Scraping Yahoo Search Page via XPath
## date author
## 1 November 24, 2014 hadleywickham
## 2 September 17, 2014 Bob Rudis (@hrbrmstr)
## 3 March 12, 2014 Rolf Fredheim
## 4 March 5, 2014 Rolf Fredheim
## 5 February 25, 2014 Rolf Fredheim
## 6 April 5, 2012 Kay Cichini
## 7 January 6, 2012 Tony Breyal
## 8 December 27, 2011 axiomOfChoice
## 9 November 11, 2011 Tony Breyal
## 10 November 10, 2011 Tony Breyal
NOTE: In the above function code, if any author’s details are protected, then NA values are displayed in “author” variable
The above display shows the contents of first_page_df data frame. This data frame has three variables title, date and author.
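The date/author split and the NA handling used in scrape_html() can be illustrated with base R alone. This is a sketch on made-up strings; the exact text that the site shows for a protected author is an assumption, and only the word “protected” matters for the check.

```r
# Made-up examples of the "<date> By <author>" text found in the .meta elements
meta_text <- c("November 24, 2014 By hadleywickham",
               "February 28, 2015 By [email protected]")

# Split each string at "By " into a date part and an author part
parts <- strsplit(meta_text, split = "By ", fixed = TRUE)

temp_df <- data.frame(
  date   = trimws(vapply(parts, `[`, character(1), 1)),
  author = trimws(vapply(parts, `[`, character(1), 2)),
  stringsAsFactors = FALSE
)

# Authors whose details are protected are replaced by NA
temp_df$author[grepl("protected", temp_df$author)] <- NA

print(temp_df)
```

Using trimws() removes the trailing space that strsplit() leaves on the date, which the original function keeps.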
The following R code gets the maximum number of pages returned for any search query at www.r-bloggers.com
NOTE: This function call satisfies the objective-2 of our project : “Get the maximum number of pages returned whenever any key word search is made at www.r-bloggers.com”
#This function takes HTML text as input and returns the maximum number of pages obtained by the search query at www.r-bloggers.com
max_pages <- function(html_txt)
{
  #Gets the page numbers
  p <- html_txt %>%
    html_nodes(".pages") %>%
    html_text()
  #Parsing p: keeps the number that follows "of "
  return(as.numeric(strsplit(p,"of ")[[1]][2]))
}
# calling the function
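The parsing step in max_pages() can be checked on plain strings. The sketch below assumes the “.pages” element contains text of the form “Page 1 of 17”, which is consistent with the split on “of ” above but is not confirmed here. The helper parse_page_count() is hypothetical; it also returns 1 when no pager text is found, a case in which the indexing in max_pages() would fail.

```r
# Hypothetical helper: extracts the total page count from pager text such as
# "Page 1 of 17" (assumed format), falling back to 1 when nothing usable is found.
parse_page_count <- function(pages_text) {
  if (length(pages_text) == 0) {
    return(1)
  }
  # Drop everything up to and including "of " and convert the rest to a number
  n <- suppressWarnings(as.numeric(sub(".*of\\s+", "", pages_text[1])))
  if (is.na(n)) 1 else n
}

print(parse_page_count("Page 1 of 17"))
print(parse_page_count(character(0)))
```

A fallback of 1 lets the caller treat a search with a single page of results the same way as a multi-page one.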
The R code to parse all the pages returned when any key word is searched is given below. This function takes a search string as input and returns a data frame with three variables: title, date and author. A call to this function with “web scraping” as input returns all the search results for “web scraping” in the form of a data frame (with the details title, date and author).
search_Rbloggers <- function(str)
{
  library(rvest)
  library(data.table)
  #Encodes spaces so that the key word(s) can be placed in the URL
  str <- gsub(" ", "%20", str)
  url <- paste("http://www.r-bloggers.com/search/",str,sep="")
  #read_html() replaces html(), which is deprecated in newer rvest releases
  html_txt <- read_html(url)
  #Parsing the first page ...
  all_pages_df <- scrape_html(url)
  #all_pages_df$page <- 1
  #Getting the max number of pages
  p <- max_pages(html_txt)
  #Parsing the 2nd page to the last page
  if (p > 1)
  {
    for(i in 2:p)
    {
      url <- paste("http://www.r-bloggers.com/search/",str,"/page/",i,"/",sep="")
      all_pages_df <- rbind(all_pages_df,scrape_html(url))
      #print(i)
      #print(scrape_html(url))
    }
  }
  return(all_pages_df)
}
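The URLs requested by search_Rbloggers() follow a simple pattern: the first page is the bare search URL, and every later page appends “/page/<number>/”. As a sketch, the hypothetical helper build_search_urls() below builds the full list up front without contacting the site. It uses URLencode() from the utils package instead of gsub(), which is a substitution and not what the function above does.

```r
# Hypothetical helper: returns the search URL for every result page.
build_search_urls <- function(keywords, n_pages) {
  # URLencode() turns spaces into %20
  base <- paste0("http://www.r-bloggers.com/search/", utils::URLencode(keywords))
  if (n_pages < 2) {
    return(base)
  }
  c(base, paste0(base, "/page/", 2:n_pages, "/"))
}

urls <- build_search_urls("web scraping", 3)
print(urls)
```

Building the URLs first makes it easy to inspect them before any page is downloaded.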
The following code calls the search_Rbloggers() function with “web scraping” as input. You can call the function with any other key word(s) you want to search for.
NOTE: This function call satisfies objective 3 of our project: “Generalize the program logic, and let the user enter a search string. The program should return the results in the form of a data frame with three variables: title, date and author”
df_temp <- search_Rbloggers("web scraping")
print(data.frame(df_temp),right=FALSE)
## title
## 1 rvest: easy web scraping with R
## 2 Migrating Table-oriented Web Scraping Code to rvest w/XPath & CSS Selector Examples
## 3 Web Scraping: working with APIs
## 4 Web Scraping: Scaling up Digital Data Collection
## 5 Web Scraping part2: Digging deeper
## 6 A Little Web Scraping Exercise with XML-Package
## 7 R: Web Scraping R-bloggers Facebook Page
## 8 Web scraping with Python – the dark side of data
## 9 Web Scraping Google+ via XPath
## 10 Web Scraping Yahoo Search Page via XPath
## 11 Web Scraping Google Scholar: Part 2 (Complete Success)
## 12 Web Scraping Google Scholar (Partial Success)
## 13 Web Scraping Google URLs
## 14 Next Level Web Scraping
## 15 Web Scraping Google Scholar & Show Result as Word Cloud Using R
## 16 Scraping Web Pages With R
## 17 FOMC Dates – Scraping Data From Web Pages
## 18 Scraping Fantasy Football Projections from the Web
## 19 Web-Scraping: the Basics
## 20 Relenium, Selenium for R. A new tool for webscraping.
## 21 R and the web (for beginners), Part III: Scraping MPs’ expenses in detail from the web
## 22 Web-Scraping in R
## 23 Scraping table from any web page with R or CloudStat
## 24 Scraping table from html web with CloudStat
## 25 A Little Webscraping-Exercise…
## 26 Scraping web data in R
## 27 Webscraping using readLines and RCurl
## 28 Webscraping using readLines and RCurl
## 29 Short R tutorial: Scraping Javascript Generated Data with R
## 30 FOMC Dates – Full History Web Scrape
## 31 Scraping XML Tables with R
## 32 Scraping SSL Labs Server Test Results With R
## 33 Interfacing R with Web technologies
## 34 Scraping organism metadata for Treebase repositories from GOLD using Python and R
## 35 R-Bloggers’ Web-Presence
## 36 How-to Extract Text From Multiple Websites with R
## 37 Scraping Flora of North America
## 38 Scraping R-bloggers with Python – Part 2
## 39 Scraping R-Bloggers with Python
## 40 R-Function GScholarScraper to Webscrape Google Scholar Search Result
## 41 Interacting with bioinformatics webservers using R
## 42 R Screen Scraping: 105 Counties of Election Data
## 43 Simple R Screen Scraping Example
## 44 Scrape Web data using R
## 45 Digital Data Collection course
## 46 Getting Data From An Online Source
## 47 Playing around with #rstats twitter data
## 48 50 years of Christmas at the Windsors
## 49 Power Outage Impact Choropleths In 5 Steps in R (featuring rvest & RStudio “Projects”)
## 50 Slightly Advanced rvest with Help from htmltools + XML + pipeR
## 51 What size will you be after you lose weight?
## 52 A bioinformatics walk-through: Accessing protein-protein interaction interfaces for all known protein structures with PDBe PISA
## 53 R User Group Roundup
## 54 Automatically Scrape Flight Ticket Data Using R and Phantomjs
## 55 Text Mining Gun Deaths Data
## 56 Better handling of JSON data in R?
## 57 Upcoming NYC R Programming Classes
## 58 Introduction
## 59 Programming instrumental music from scratch
## 60 Programming instrumental music from scratch
## 61 Programming instrumental music from scratch
## 62 xkcd: Visualized
## 63 Has R-help gotten meaner over time? And what does Mancur Olson have to say about it?
## 64 Data Science, Data Analysis, R and Python
## 65 .Rhistory
## 66 Hangman in R: A learning experience
## 67 Data Analysis Training
## 68 Making an R Package: Not as hard as you think
## 69 Plotting Doctor Who Ratings (1963-2011) with R
## 70 GScholarXScraper: Hacking the GScholarScraper function with XPath
## 71 Facebook Graph API Explorer with R
## 72 UCLA Statistics: Analyzing Thesis/Dissertation Lengths
## 73 Cricket data analysis
## 74 What to Expect?
## 75 Analysing The Rock ‘n’ Roll Madrid Marathon
## 76 Monitoring Price Fluctuations of Book Trade-In Values on Amazon
## 77 More Airline Crashes via the Hadleyverse
## 78 Knitr’s best hidden gem: spin
## 79 Fuzzy String Matching – a survival skill to tackle unstructured information
## 80 Who Has the Best Fantasy Football Projections? 2015 Update
## 81 Predicting the six nations
## 82 Building a choropleth map of Italy using mapIT
## 83 New updates to the rNOMADS package and big changes in the GFS model
## 84 Explore Kaggle Competition Data with R
## 85 How to analyze a new dataset (or, analyzing ‘supercar’ data, part 1)
## 86 FOMC Dates – Price Data Exploration
## 87 A Letter of Recommendation for Nan Xiao
## 88 Leveraging R for Job Openings for Economists
## 89 Wrangling F1 Data With R – F1DataJunkie Book
## 90 How to Download and Run R Scripts from this Site
## 91 FIFA 15 Analysis with R
## 92 “Do You Want to Steal a Snowman?” – A Look (with R) At TorrentFreak’s Top 10 PiRated Movies List #TLAPD
## 93 Visit of Di Cook
## 94 Identify Fantasy Football Sleepers with this Shiny App
## 95 Time to Accept It: publishing in the Journal of Statistical Software
## 96 2014 World Cup Squads
## 97 Basketball Data Part II – Length of Career by Position
## 98 Using sentiment analysis to predict ratings of popular tv series
## 99 On the trade history and dynamics of NBA teams
## 100 Rblogger Posting Patterns Analyzed with R
## 101 BARUG talks highlight R’s diverse applications
## 102 Mapping academic collaborations in Evolutionary Biology
## 103 President Approval Ratings from Roosevelt to Obama
## 104 Evolution of Code
## 105 Terms
## 106 Live Google Spreadsheet For Keeping Track Of Sochi Medals
## 107 Using One Programming Language In the Context of Another – Python and R
## 108 Statistics meets rhetoric: A text analysis of "I Have a Dream" in R
## 109 Statistics meets rhetoric: A text analysis of “I Have a Dream” in R
## 110 Second NYC R classes(announcement and teaching experience)
## 111 Calling Python from R with rPython
## 112 Why R is Better Than Excel for Fantasy Football (and most other) Data Analysis
## 113 College Basketball: Presence in the NBA over Time
## 114 Creating your personal, portable R code library with GitHub
## 115 MLB Rankings Using the Bradley-Terry Model
## 116 ggplot2 Chloropleth of Supreme Court Decisions: A Tutorial
## 117 Which airline should you be loyal to?
## 118 Opel Corsa Diesel Usage
## 119 Logging Data in R Loops: Applied to Twitter.
## 120 Shiny App for CRAN packages
## 121 The Guerilla Guide to R
## 122 Presentations of the third Milano R net meeting
## 123 Milano (Italy). April 18, 2013. Third Milano R net meeting: agenda
## 124 April 18, 2013Third Milano R net meeting: agenda
## 125 Generating Labels for Supervised Text Classification using CAT and R
## 126 Hilary: the most poisoned baby name in US history
## 127 R and foreign characters
## 128 SPARQL with R in less than 5 minutes
## 129 Multiple Classification and Authorship of the Hebrew Bible
## 130 Chocolate and nobel prize – a true story?
## 131 Animated map of 2012 US election campaigning, with R and ffmpeg
## 132 Tips on accessing data from various sources with R
## 133 R Helper Functions
## 134 The R-Podcast Episode 10: Adventures in Data Munging Part 2
## 135 UseR 2012 highlights
## 136 Visualizing the CRAN: Graphing Package Dependencies
## 137 118 years of US State Weather Data
## 138 The 50 most used R packages
## 139 RStudio Development Environment
## 140 R: A Quick Scrape of Top Grossing Films from boxofficemojo.com
## 141 Installing quantstrat from R-forge and source
## 142 Analyzing R-bloggers
## 143 Mapping the Iowa GOP 2012 Caucus Results
## 144 Outliers in the European Parliament
## 145 Subscriptions Feature Added
## 146 Google Scholar (still) sucks
## 147 Power Tools for Aspiring Data Journalists: R
## 148 Forecasting recessions
## 149 CHCN: Canadian Historical Climate Network
## 150 hacking .gov shortened links
## 151 roll calls, ideal points, 112th Congress
## 152 Automating R Scripts on Amazon EC2
## 153 Friday fun projects
## 154 Further Adventures in Visualisation with ggplot2
## 155 Friday Function: setInternet2
## 156 Find NHL Players with 30 Goals and 100 PIM using R
## 157 NBA Analysis: Coming Soon!
## 158 Clustering NHL Skaters
## 159 Dial-a-statistic! Featuring R and Estonia
## 160 How to buy a used car with R (part 1)
## 161 How to buy a used car with R (part 1)
## 162 Using XML package vs. BeautifulSoup
## 163 Are MLB Games Getting Longer?
## 164 Analyze Gold Demand and Investments using R
## 165 tooltips in R graphics; nytR package
## date author
## 1 November 24, 2014 hadleywickham
## 2 September 17, 2014 Bob Rudis (@hrbrmstr)
## 3 March 12, 2014 Rolf Fredheim
## 4 March 5, 2014 Rolf Fredheim
## 5 February 25, 2014 Rolf Fredheim
## 6 April 5, 2012 Kay Cichini
## 7 January 6, 2012 Tony Breyal
## 8 December 27, 2011 axiomOfChoice
## 9 November 11, 2011 Tony Breyal
## 10 November 10, 2011 Tony Breyal
## 11 November 8, 2011 Tony Breyal
## 12 November 8, 2011 Tony Breyal
## 13 November 7, 2011 Tony Breyal
## 14 November 5, 2011 Kay Cichini
## 15 November 1, 2011 Kay Cichini
## 16 April 15, 2015 Tony Hirst
## 17 November 30, 2014 Peter Chan
## 18 June 27, 2014 Isaac Petersen
## 19 February 19, 2014 Rolf Fredheim
## 20 January 4, 2014 aleixrvr
## 21 August 23, 2012 GivenTheData
## 22 April 2, 2012 diffuseprior
## 23 January 15, 2012 PR
## 24 January 12, 2012 CloudStat
## 25 October 22, 2011 Kay Cichini
## 26 August 10, 2011 Zach Mayer
## 27 April 14, 2009 bryan
## 28 April 14, 2009 bryan
## 29 March 15, 2015 DataCamp
## 30 January 21, 2015 Peter Chan
## 31 May 15, 2014 jgreenb1
## 32 April 29, 2014 Bob Rudis (@hrbrmstr)
## 33 April 14, 2014 David Smith
## 34 April 4, 2014 What is this? David Springate's personal blog :: R
## 35 April 6, 2012 Kay Cichini
## 36 February 18, 2012 Christopher Gandrud
## 37 January 27, 2012 Recology - R
## 38 January 5, 2012 The PolStat R Feed
## 39 January 4, 2012 The PolStat R Feed
## 40 November 9, 2011 Kay Cichini
## 41 September 8, 2011 nsaunders
## 42 February 18, 2011 Earl Glynn
## 43 February 18, 2011 Earl Glynn
## 44 August 13, 2010 --
## 45 March 20, 2015 Rolf Fredheim
## 46 March 6, 2015 Robert Norberg
## 47 February 28, 2015 <NA>
## 48 December 19, 2014 Dominic Nyhuis
## 49 November 27, 2014 hrbrmstr
## 50 November 26, 2014 klr
## 51 November 14, 2014 dan
## 52 September 28, 2014 biochemistries
## 53 August 28, 2014 Joseph Rickert
## 54 April 30, 2014 Huidong Tian
## 55 March 13, 2014 Francis Smart
## 56 March 13, 2014 Rolf Fredheim
## 57 March 10, 2014 vivian
## 58 February 1, 2014 steadyfish
## 59 July 29, 2013 Vik Paruchuri
## 60 July 29, 2013 - r
## 61 July 29, 2013 Vik Paruchuri
## 62 May 6, 2013 Myles
## 63 April 30, 2013 Trey Causey
## 64 December 15, 2012 Ron Pearson (aka TheNoodleDoodler)
## 65 October 27, 2012 distantobserver
## 66 July 28, 2012 tylerrinker
## 67 March 20, 2012 prasoonsharma
## 68 January 11, 2012 markbulling
## 69 January 3, 2012 Tony Breyal
## 70 November 13, 2011 Tony Breyal
## 71 November 10, 2011 Tony Breyal
## 72 September 29, 2010 Ryan Rosario
## 73 September 4, 2010 prasoonsharma
## 74 January 22, 2010 Ryan
## 75 April 18, 2015 aschinchon
## 76 April 8, 2015 Andrew Landgraf
## 77 March 31, 2015 hrbrmstr
## 78 March 23, 2015 Dean Attali's R Blog
## 79 February 26, 2015 Bigdata Doc
## 80 February 20, 2015 Isaac Petersen
## 81 February 4, 2015 Mango Solutions
## 82 January 19, 2015 Davide Massidda
## 83 January 16, 2015 glossarch
## 84 December 23, 2014 notesofdabbler
## 85 December 16, 2014 Sharpsight Admin
## 86 December 14, 2014 Peter Chan
## 87 November 17, 2014 Yihui Xie
## 88 November 1, 2014 Thiemo Fetzer
## 89 October 30, 2014 Tony Hirst
## 90 October 23, 2014 Isaac Petersen
## 91 September 26, 2014 The Clerk
## 92 September 18, 2014 Bob Rudis (@hrbrmstr)
## 93 August 12, 2014 Rob J Hyndman
## 94 July 6, 2014 Isaac Petersen
## 95 June 30, 2014 brobar
## 96 June 5, 2014 gjabel
## 97 June 2, 2014 jgreenb1
## 98 May 26, 2014 tlfvincent
## 99 April 28, 2014 tlfvincent
## 100 April 11, 2014 Mark T Patterson
## 101 April 10, 2014 Joseph Rickert
## 102 April 4, 2014 What is this? David Springate's personal blog :: R
## 103 March 29, 2014 tlfvincent
## 104 March 27, 2014 Educate-R - R
## 105 February 13, 2014 Tal Galili
## 106 February 11, 2014 hrbrmstr
## 107 January 22, 2014 Tony Hirst
## 108 January 20, 2014 Max Ghenis
## 109 January 20, 2014 Max Ghenis
## 110 January 20, 2014 Tal Galili
## 111 January 13, 2014 bryan
## 112 January 13, 2014 Isaac Petersen
## 113 November 7, 2013 Mark T Patterson
## 114 September 21, 2013 bryan
## 115 August 31, 2013 John Ramey
## 116 July 4, 2013 tylerrinker
## 117 July 2, 2013 dan
## 118 June 24, 2013 Wingfeet
## 119 May 26, 2013 Alistair Leak
## 120 May 13, 2013 pssguy
## 121 May 12, 2013 Nikhil Gopal
## 122 April 19, 2013 Milano R net
## 123 April 10, 2013 Milano R net
## 124 March 25, 2013 Milano R net
## 125 February 4, 2013 Solomon
## 126 January 29, 2013 hilaryparker
## 127 January 25, 2013 Rolf Fredheim
## 128 January 23, 2013 bryan
## 129 January 1, 2013 inkhorn82
## 130 December 22, 2012 Max Gordon
## 131 October 28, 2012 civilstat
## 132 October 3, 2012 David Smith
## 133 September 25, 2012 bryan
## 134 September 16, 2012 Eric
## 135 June 20, 2012 David Smith
## 136 May 17, 2012 wrathematics
## 137 April 22, 2012 drunksandlampposts
## 138 April 5, 2012 flodel
## 139 March 23, 2012 bryan
## 140 January 13, 2012 Tony Breyal
## 141 January 10, 2012 bryan
## 142 January 6, 2012 The PolStat R Feed
## 143 January 4, 2012 jjh
## 144 December 20, 2011 The PolStat Feed
## 145 December 7, 2011 bryan
## 146 November 13, 2011 bbolker
## 147 October 31, 2011 Tony Hirst
## 148 August 9, 2011 Zach Mayer
## 149 August 4, 2011 Steven Mosher
## 150 July 30, 2011 Harlan
## 151 June 29, 2011 jackman
## 152 June 9, 2011 Travis Nelson
## 153 May 14, 2011 nsaunders
## 154 April 25, 2011 hayward
## 155 April 15, 2011 richierocks
## 156 April 2, 2011 btibert3
## 157 March 21, 2011 Ryan
## 158 February 6, 2011 --
## 159 January 16, 2011 Ethan Brown
## 160 October 31, 2010 Dan Knoepfle's Blog
## 161 October 31, 2010 Dan Knoepfle's Blog
## 162 August 31, 2010 Ryan
## 163 August 5, 2010 Ryan
## 164 June 29, 2010 C
## 165 December 28, 2009 jackman
The above display shows the search results obtained by scraping all the pages of the search results (for the key word(s) “web scraping” at R-bloggers.com website).
df_temp <- search_Rbloggers("rvest")
print(data.frame(df_temp),right=FALSE)
## title
## 1 Harvesting Canadian climate data
## 2 Using rvest to Scrape an HTML Table
## 3 Power Outage Impact Choropleths In 5 Steps in R (featuring rvest & RStudio “Projects”)
## 4 Happy Thanksgiving | More Examples of XML + rvest with SVG
## 5 Slightly Advanced rvest with Help from htmltools + XML + pipeR
## 6 rvest: easy web scraping with R
## 7 Charting/Mapping the Scottish Vote with R (an rvest/dplyr/tidyr/TopoJSON/ggplot tutorial)
## 8 Migrating Table-oriented Web Scraping Code to rvest w/XPath & CSS Selector Examples
## 9 R and Google Visualization API: Fish harvests
## 10 Harvesting & Analyzing Interaction Data in R: The Case of MyLyn
## 11 Analysing The Rock ‘n’ Roll Madrid Marathon
## 12 Scraping Web Pages With R
## 13 Monitoring Price Fluctuations of Book Trade-In Values on Amazon
## 14 More Airline Crashes via the Hadleyverse
## 15 Digital Data Collection course
## 16 Short R tutorial: Scraping Javascript Generated Data with R
## 17 Getting Data From An Online Source
## 18 Playing around with #rstats twitter data
## 19 A Step to the Right in R Assignments
## 20 Predicting the six nations
## 21 Building [Security] Dashboards w/R & Shiny + shinydashboard
## 22 Automatic Detection of the Language of a Tweet
## 23 Much Better Animated Paths | Christmas SVG
## 24 analyze the public libraries survey (pls) with r
## 25 Overcoming D3 Cartographic Envy With R + ggplot
## 26 “Do You Want to Steal a Snowman?” – A Look (with R) At TorrentFreak’s Top 10 PiRated Movies List #TLAPD
## 27 ASDAR book Review
## 28 R/Finance 2014 Review
## 29 subset vectors in Rcpp11
## 30 Modernizing sugar in Rcpp11
## 31 Modernizing sugar in Rcpp11
## 32 Why multiple imputation?
## 33 How popular is the President of Mexico on Twitter?
## 34 Sustainability of water use in agriculture
## 35 Bayesian First Aid: Two Sample t-test
## 36 Bayesian First Aid: One Sample and Paired Samples t-test
## 37 analyze the national survey of oaa participants (nps) with r
## 38 Too crude to be true?
## 39 Sentiment Analysis on Twitter with Datumbox API
## 40 Sentiment Analysis on Twitter with Viralheat API
## 41 change in weight of cars plot
## 42 My take on the USA versus Western Europe comparison of GM corn
## 43 Scholarly metadata in R
## 44 Scholarly metadata in R
## 45 Scholarly metadata in R
## 46 Effects of forest management on a woodland salamander in Missouri
## 47 Data Science, Data Analysis, R and Python
## 48 Obviousness of REITs?
## 49 Scholarly metadata from R
## 50 knitcitations
## 51 Review: “Forest Analytics with R: an introduction”
## 52 Web-Scraping in R
## 53 Internet surveys
## 54 Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?
## 55 Surviving a binomial mixed model
## 56 Calling R lovers and bloggers – to work together on “The R Programming wikibook”
## 57 Boxplots & Beyond IV: Beanplots
## 58 Visualizing Agricultural Subsidies by Kentucky County
## 59 Don’t be a Turkey
## 60 Don’t be a Turkey
## 61 BioStar users (of the world, unite)
## 62 TripleR round-up
## 63 A small and lonely sea urchin…
## 64 What I’ll be presenting at O’Reilly Money Tech 2009
## date author
## 1 January 14, 2015 Gavin L. Simpson
## 2 January 8, 2015 Cory Nissen
## 3 November 27, 2014 hrbrmstr
## 4 November 26, 2014 klr
## 5 November 26, 2014 klr
## 6 November 24, 2014 hadleywickham
## 7 September 20, 2014 hrbrmstr
## 8 September 17, 2014 Bob Rudis (@hrbrmstr)
## 9 January 17, 2011 Scott Chamberlain
## 10 September 9, 2010 VCASMO - drewconway
## 11 April 18, 2015 aschinchon
## 12 April 15, 2015 Tony Hirst
## 13 April 8, 2015 Andrew Landgraf
## 14 March 31, 2015 hrbrmstr
## 15 March 20, 2015 Rolf Fredheim
## 16 March 15, 2015 DataCamp
## 17 March 6, 2015 Robert Norberg
## 18 February 28, 2015 <NA>
## 19 February 4, 2015 hrbrmstr
## 20 February 4, 2015 Mango Solutions
## 21 January 24, 2015 Bob Rudis (@hrbrmstr)
## 22 January 5, 2015 arthur charpentier
## 23 December 2, 2014 klr
## 24 October 14, 2014 Anthony Damico
## 25 September 25, 2014 hrbrmstr
## 26 September 18, 2014 Bob Rudis (@hrbrmstr)
## 27 September 8, 2014 Robin Lovelace - R
## 28 June 30, 2014 Joshua Ulrich
## 29 June 7, 2014 romain francois
## 30 May 27, 2014 romain francois
## 31 May 26, 2014 romain francois
## 32 March 20, 2014 Michael kao
## 33 March 18, 2014 Jose
## 34 March 9, 2014 vikasrawal
## 35 February 24, 2014 Rasmus Bååth
## 36 February 4, 2014 Rasmus Bååth
## 37 January 21, 2014 Anthony Damico
## 38 October 8, 2013 Max Gordon
## 39 September 9, 2013 julianhi
## 40 September 2, 2013 julianhi
## 41 July 7, 2013 Wingfeet
## 42 July 4, 2013 Luis
## 43 March 16, 2013 Recology - R
## 44 March 16, 2013 Recology - R
## 45 March 16, 2013 Recology - R
## 46 January 25, 2013 Daniel Hocking
## 47 December 15, 2012 Ron Pearson (aka TheNoodleDoodler)
## 48 September 20, 2012 klr
## 49 September 17, 2012 Recology - R
## 50 May 30, 2012 Carl Boettiger
## 51 May 29, 2012 Luis
## 52 April 2, 2012 diffuseprior
## 53 January 18, 2012 Rob J Hyndman
## 54 December 31, 2011 andrew
## 55 November 11, 2011 Luis
## 56 June 20, 2011 Tal Galili
## 57 March 5, 2011 Ron Pearson (aka TheNoodleDoodler)
## 58 December 12, 2010 Matt Bogard
## 59 November 9, 2010 C
## 60 November 9, 2010 C
## 61 October 9, 2010 nsaunders
## 62 August 23, 2010 felixschoenbrodt
## 63 August 22, 2010 Timothée
## 64 October 21, 2008 mike
The above data frame displays the search results for the key word “rvest” at “r-bloggers.com”.
You may use any desired key word(s) (enclosed in quotes) as input to the search_Rbloggers() function.
This project created a generic function that takes key word(s) as input, searches the “www.r-bloggers.com” website and returns the search results in the form of a data frame with three variables: title, date and author. This function, if implemented in C++ or Java, could act as an API for searching the “www.r-bloggers.com” website. The data frame returned by the function can be enhanced to include each post's URL and the page number of the search results on which the post appears.
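One possible way to add the page number mentioned above is to tag each page's data frame before combining them. This is a sketch on made-up per-page results, not part of the functions in this report; in practice each list element would be the output of scrape_html() for one page.

```r
# Made-up per-page results standing in for the output of scrape_html()
page_results <- list(
  data.frame(title = c("Post A", "Post B"), stringsAsFactors = FALSE),
  data.frame(title = "Post C", stringsAsFactors = FALSE)
)

# Add a "page" column holding the position of each page in the search results
with_page <- Map(function(df, i) {
  df$page <- i
  df
}, page_results, seq_along(page_results))

# Combine all pages into a single data frame
all_pages_df <- do.call(rbind, with_page)
print(all_pages_df)
```

Collecting the pages in a list and binding them once also avoids growing the data frame inside a loop.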
-~-End of Project Report-~-