Objective

The main objective of this project is to parse the blog entries listed at “http://www.r-bloggers.com/search/web%20scraping”. This page contains the results returned when “web scraping” is searched at “http://www.r-bloggers.com”. We will, however, write a generic R function that can search for any keyword(s) on the “r-bloggers.com” website.

Specifically, the requirements of this project are:

  1. Parse the first-page blog entries at “http://www.r-bloggers.com/search/web%20scraping” and output a data frame named first_page_df with three variables: title, date and author. Generalize this logic to any page returned by a keyword search at “http://www.r-bloggers.com”.

  2. Get the number of result pages returned for “http://www.r-bloggers.com/search/web%20scraping”. Generalize this logic to any page returned by a keyword search at “http://www.r-bloggers.com”.

  3. Generalize the program logic and let the user enter a search string. The program should return all the search results (from all the pages returned) in the form of a data frame with three variables: title, date and author.

Design

We will develop the following R functions to support the above objectives: “scrape_html()”, “max_pages()” and “search_Rbloggers()”.

Required R packages

We need the following R packages: rvest and data.table.

NOTE: We will use “selector gadget” to identify the required HTML elements. For more information, visit “http://selectorgadget.com”.

Code Implementation

The following R code creates a function named “scrape_html()”. It takes a URL as input, fetches the HTML page at that URL and extracts the text of the “h2”, “.meta” and “.date” HTML elements. The function works for the URLs produced by a keyword search at the www.r-bloggers.com website. If it is used on any other website, the selectors may not match and you may see unexpected results. To identify the “h2”, “.meta” and “.date” elements, I opened “www.r-bloggers.com/search/web%20scraping” in Google Chrome, enabled the selector gadget, and selected and un-selected elements on the page until only the required ones were highlighted.
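NOTE: The way these CSS selectors pick out elements can be tried offline first. The sketch below is only an illustration: the HTML fragment is made up to imitate the structure assumed above (an element with id “leftcontent” holding “h2” headings and “.meta” blocks) and is not copied from r-bloggers.com.

```r
library(rvest)

# Hypothetical HTML fragment (an assumption, not the real page source)
sample_page <- read_html(paste0(
  '<div id="leftcontent">',
  '<h2>First post</h2>',
  '<div class="meta"><div class="date">March 5, 2014</div>By Jane Doe</div>',
  '<h2>Second post</h2>',
  '<div class="meta"><div class="date">April 5, 2012</div>By John Roe</div>',
  '</div>',
  '<div id="sidebar"><h2>Sidebar heading</h2></div>'
))

# "#leftcontent h2" keeps only the headings nested inside the element with id "leftcontent"
titles <- sample_page %>% html_nodes("#leftcontent h2") %>% html_text()

# ".meta" selects every element with class "meta"; html_text() joins the nested text
meta <- sample_page %>% html_nodes(".meta") %>% html_text()

print(titles)
print(meta)
```

The sidebar heading is not returned because it is not nested inside “#leftcontent”.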

The R code for “scrape_html()” function is given below:

scrape_html <- function(url) {
  library(rvest)
  library(data.table)

  # read_html() supersedes the older html() function in rvest
  html_txt <- read_html(url)

  # Get the post headings
  title <- html_txt %>%
    html_nodes("#leftcontent h2") %>%
    html_text()

  # Get the author and date information
  authors_and_post_date <- html_txt %>%
    html_nodes(".meta , .date #leftcontent h2") %>%
    html_text()

  # Split each "<date>By <author>" string into a date column and an author column
  temp_df <- data.frame(rbindlist(lapply(strsplit(authors_and_post_date, split = "By "), as.list)))
  names(temp_df) <- c("date", "author")

  # If the author's details are protected, store NA instead
  temp_df$author[grep("protected", as.vector(temp_df$author))] <- NA

  page_df <- cbind(title, temp_df)
  return(page_df)
}
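NOTE: The date/author split inside scrape_html() can be checked without network access. The sketch below is a base R variant that uses regmatches() instead of strsplit() and rbindlist(), so that only the first “By ” in each string is treated as the separator. The input strings are hypothetical values in the “date, then By, then author” shape that the function assumes.

```r
# Hypothetical ".meta" strings (assumed shape: "<date>By <author>")
meta_text <- c("November 24, 2014By hadleywickham",
               "March 12, 2014By Rolf Fredheim")

# Split each string at the first "By " only; invert = TRUE returns the
# pieces before and after the match instead of the match itself
parts <- regmatches(meta_text,
                    regexpr("By ", meta_text, fixed = TRUE),
                    invert = TRUE)

# A string without "By " yields a single piece, so its author becomes NA
meta_df <- data.frame(
  date   = vapply(parts, function(x) x[1], character(1)),
  author = vapply(parts, function(x) x[2], character(1)),
  stringsAsFactors = FALSE
)

print(meta_df)
```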

Let us call scrape_html() with “http://www.r-bloggers.com/search/web%20scraping” as input. This is the first page returned when “web scraping” is searched at “www.r-bloggers.com”. The output is a data frame containing the title, author and date of each post on the input page only.

NOTE: This function call satisfies objective 1 of our project: “Parse the first page blog entries present at “http://www.r-bloggers.com/search/web%20scraping” and output a data frame named first_page_df, with three variables: title, date and author”.

#Calling the scrape_html function with "http://www.r-bloggers.com/search/web%20scraping" as input.

url <- "http://www.r-bloggers.com/search/web%20scraping"

first_page_df <- scrape_html(url)
first_page_df
##                                                                                  title
## 1                                                      rvest: easy web scraping with R
## 2  Migrating Table-oriented Web Scraping Code to rvest w/XPath & CSS Selector Examples
## 3                                                      Web Scraping: working with APIs
## 4                                     Web Scraping: Scaling up Digital Data Collection
## 5                                                   Web Scraping part2: Digging deeper
## 6                                      A Little Web Scraping Exercise with XML-Package
## 7                                             R: Web Scraping R-bloggers Facebook Page
## 8                                     Web scraping with Python – the dark side of data
## 9                                                       Web Scraping Google+ via XPath
## 10                                            Web Scraping Yahoo Search Page via XPath
##                  date                author
## 1   November 24, 2014         hadleywickham
## 2  September 17, 2014 Bob Rudis (@hrbrmstr)
## 3      March 12, 2014         Rolf Fredheim
## 4       March 5, 2014         Rolf Fredheim
## 5   February 25, 2014         Rolf Fredheim
## 6       April 5, 2012           Kay Cichini
## 7     January 6, 2012           Tony Breyal
## 8   December 27, 2011         axiomOfChoice
## 9   November 11, 2011           Tony Breyal
## 10  November 10, 2011           Tony Breyal

NOTE: In the above function code, if an author’s details are protected, NA is stored in the “author” variable.

The above display shows the contents of the first_page_df data frame. This data frame has three variables: title, date and author.
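NOTE: The NA handling for protected authors can be sketched on its own as follows. The author values here are hypothetical, and the only assumption carried over from scrape_html() is that a protected entry contains the word “protected”.

```r
# Hypothetical author values; the second one imitates a protected entry
authors <- data.frame(
  author = c("Tony Breyal", "[email protected]", "Kay Cichini"),
  stringsAsFactors = FALSE
)

# grepl() returns one TRUE/FALSE per row, so the assignment is a no-op
# when no author is protected
is_protected <- grepl("protected", authors$author, fixed = TRUE)
authors$author[is_protected] <- NA

print(authors)
```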

The “max_pages()” function gets the maximum number of pages returned for any search query at www.r-bloggers.com.

NOTE: This function satisfies objective 2 of our project: “Get the maximum number of pages returned whenever any keyword search is made at www.r-bloggers.com”.

The R code for the “max_pages()” function is given below:

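# This function takes a parsed HTML page as input and returns the maximum number of pages obtained by the search query at www.r-bloggers.com
max_pages <- function(html_txt) {
  # Get the pager text
  p <- html_txt %>%
    html_nodes(".pages") %>%
    html_text()

  # Assumption: when no ".pages" element is present, the results fit on a single page
  if (length(p) == 0) return(1)

  # Parse p: keep the number that follows "of "
  return(as.numeric(strsplit(p, "of ")[[1]][2]))
}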

# calling the function
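NOTE: The parsing step of max_pages() can be tried on plain strings, without fetching a page. In the sketch below, parse_page_count() is a hypothetical helper name, and the pager wording “Page 1 of 17” is an assumption about what the “.pages” element contains. The code above only relies on the text containing “of ” followed by the page count.

```r
# Hypothetical helper: extract the total page count from pager text
parse_page_count <- function(pages_text) {
  # No pager text at all: assume a single page of results
  if (length(pages_text) == 0) return(1)

  # Capture the digits (and thousands separators) that follow "of "
  hit <- regmatches(pages_text[1],
                    regexpr("(?<=of )[0-9,]+", pages_text[1], perl = TRUE))

  # Pager text in an unexpected format: fall back to a single page
  if (length(hit) == 0) return(1)

  as.numeric(gsub(",", "", hit))
}

parse_page_count("Page 1 of 17")   # 17
parse_page_count(character(0))     # 1
```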

The “search_Rbloggers()” function parses all the pages returned when any keyword is searched. It takes a search string as input and returns a data frame with three variables: title, date and author. For example, a call with “web scraping” as input returns all the search results for “web scraping”.

The R code for the “search_Rbloggers()” function is given below:

search_Rbloggers <- function(str) {
  library(rvest)
  library(data.table)

  # Encode spaces so the search string can be used in a URL
  str <- gsub(" ", "%20", str)

  url <- paste("http://www.r-bloggers.com/search/", str, sep = "")

  # read_html() supersedes the older html() function in rvest
  html_txt <- read_html(url)

  # Parse the first page
  all_pages_df <- scrape_html(url)

  # Get the maximum number of pages
  p <- max_pages(html_txt)

  # Parse the second through the last page
  if (p > 1) {
    for (i in 2:p) {
      url <- paste("http://www.r-bloggers.com/search/", str, "/page/", i, "/", sep = "")
      all_pages_df <- rbind(all_pages_df, scrape_html(url))
    }
  }

  return(all_pages_df)
}
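NOTE: The URL pattern and the page-by-page collection used above can be exercised offline. The sketch below is only an illustration: build_search_url(), collect_pages() and stub_scraper() are hypothetical names, URLencode() is used in place of the gsub() call, and a stub stands in for scrape_html() so that no request is made.

```r
# Hypothetical helper: build the search URL for a given page number
build_search_url <- function(keywords, page = 1) {
  # URLencode() percent-encodes spaces and other reserved characters
  query <- URLencode(keywords, reserved = TRUE)
  base  <- paste0("http://www.r-bloggers.com/search/", query)
  if (page > 1) paste0(base, "/page/", page, "/") else base
}

# Collect one data frame per page, then bind them in a single step.
# scrape_fun is a parameter so that a stub can replace scrape_html() offline.
collect_pages <- function(keywords, n_pages, scrape_fun) {
  urls  <- vapply(seq_len(n_pages),
                  function(i) build_search_url(keywords, i),
                  character(1))
  pages <- lapply(urls, scrape_fun)
  do.call(rbind, pages)
}

# Stub scraper: returns one fake row per URL instead of fetching it
stub_scraper <- function(url) data.frame(title = url, stringsAsFactors = FALSE)

result <- collect_pages("web scraping", 3, stub_scraper)
print(result$title)
```

Binding the pages once at the end avoids growing the data frame inside the loop, which matters little for a handful of pages but keeps the collection step separate from the scraping step.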

We now call the search_Rbloggers() function with “web scraping” as input. The same function can be called with any other keyword(s) you want to search.

NOTE: This function call satisfies objective 3 of our project: “Generalize the program logic, and let the user enter a search string. The program should return the results in the form of a data frame with three variables: title, date and author”.

df_temp <- search_Rbloggers("web scraping")
print(data.frame(df_temp),right=FALSE)
##     title                                                                                                                          
## 1   rvest: easy web scraping with R                                                                                                
## 2   Migrating Table-oriented Web Scraping Code to rvest w/XPath & CSS Selector Examples                                            
## 3   Web Scraping: working with APIs                                                                                                
## 4   Web Scraping: Scaling up Digital Data Collection                                                                               
## 5   Web Scraping part2: Digging deeper                                                                                             
## 6   A Little Web Scraping Exercise with XML-Package                                                                                
## 7   R: Web Scraping R-bloggers Facebook Page                                                                                       
## 8   Web scraping with Python – the dark side of data                                                                               
## 9   Web Scraping Google+ via XPath                                                                                                 
## 10  Web Scraping Yahoo Search Page via XPath                                                                                       
## 11  Web Scraping Google Scholar: Part 2 (Complete Success)                                                                         
## 12  Web Scraping Google Scholar (Partial Success)                                                                                  
## 13  Web Scraping Google URLs                                                                                                       
## 14  Next Level Web Scraping                                                                                                        
## 15  Web Scraping Google Scholar & Show Result as Word Cloud Using R                                                                
## 16  Scraping Web Pages With R                                                                                                      
## 17  FOMC Dates – Scraping Data From Web Pages                                                                                      
## 18  Scraping Fantasy Football Projections from the Web                                                                             
## 19  Web-Scraping: the Basics                                                                                                       
## 20  Relenium, Selenium for R. A new tool for webscraping.                                                                          
## 21  R and the web (for beginners), Part III: Scraping MPs’ expenses in detail from the web                                         
## 22  Web-Scraping in R                                                                                                              
## 23  Scraping table from any web page with R or CloudStat                                                                           
## 24  Scraping table from html web with CloudStat                                                                                    
## 25  A Little Webscraping-Exercise…                                                                                                 
## 26  Scraping web data in R                                                                                                         
## 27  Webscraping using readLines and RCurl                                                                                          
## 28  Webscraping using readLines and RCurl                                                                                          
## 29  Short R tutorial: Scraping Javascript Generated Data with R                                                                    
## 30  FOMC Dates – Full History Web Scrape                                                                                           
## 31  Scraping XML Tables with R                                                                                                     
## 32  Scraping SSL Labs Server Test Results With R                                                                                   
## 33  Interfacing R with Web technologies                                                                                            
## 34  Scraping organism metadata for Treebase repositories from GOLD using Python and R                                              
## 35  R-Bloggers’ Web-Presence                                                                                                       
## 36  How-to Extract Text From Multiple Websites with R                                                                              
## 37  Scraping Flora of North America                                                                                                
## 38  Scraping R-bloggers with Python – Part 2                                                                                       
## 39  Scraping R-Bloggers with Python                                                                                                
## 40  R-Function GScholarScraper to Webscrape Google Scholar Search Result                                                           
## 41  Interacting with bioinformatics webservers using R                                                                             
## 42  R Screen Scraping: 105 Counties of Election Data                                                                               
## 43  Simple R Screen Scraping Example                                                                                               
## 44  Scrape Web data using R                                                                                                        
## 45  Digital Data Collection course                                                                                                 
## 46  Getting Data From An Online Source                                                                                             
## 47  Playing around with #rstats twitter data                                                                                       
## 48  50 years of Christmas at the Windsors                                                                                          
## 49  Power Outage Impact Choropleths In 5 Steps in R (featuring rvest & RStudio “Projects”)                                         
## 50  Slightly Advanced rvest with Help from htmltools + XML + pipeR                                                                 
## 51  What size will you be after you lose weight?                                                                                   
## 52  A bioinformatics walk-through: Accessing protein-protein interaction interfaces for all known protein structures with PDBe PISA
## 53  R User Group Roundup                                                                                                           
## 54  Automatically Scrape Flight Ticket Data Using R and Phantomjs                                                                  
## 55  Text Mining Gun Deaths Data                                                                                                    
## 56  Better handling of JSON data in R?                                                                                             
## 57  Upcoming NYC R Programming Classes                                                                                             
## 58  Introduction                                                                                                                   
## 59  Programming instrumental music from scratch                                                                                    
## 60  Programming instrumental music from scratch                                                                                    
## 61  Programming instrumental music from scratch                                                                                    
## 62  xkcd: Visualized                                                                                                               
## 63  Has R-help gotten meaner over time? And what does Mancur Olson have to say about it?                                           
## 64  Data Science, Data Analysis, R and Python                                                                                      
## 65  .Rhistory                                                                                                                      
## 66  Hangman in R: A learning experience                                                                                            
## 67  Data Analysis Training                                                                                                         
## 68  Making an R Package: Not as hard as you think                                                                                  
## 69  Plotting Doctor Who Ratings (1963-2011) with R                                                                                 
## 70  GScholarXScraper: Hacking the GScholarScraper function with XPath                                                              
## 71  Facebook Graph API Explorer with R                                                                                             
## 72  UCLA Statistics: Analyzing Thesis/Dissertation Lengths                                                                         
## 73  Cricket data analysis                                                                                                          
## 74  What to Expect?                                                                                                                
## 75  Analysing The Rock ‘n’ Roll Madrid Marathon                                                                                    
## 76  Monitoring Price Fluctuations of Book Trade-In Values on Amazon                                                                
## 77  More Airline Crashes via the Hadleyverse                                                                                       
## 78  Knitr’s best hidden gem: spin                                                                                                  
## 79  Fuzzy String Matching – a survival skill to tackle unstructured information                                                    
## 80  Who Has the Best Fantasy Football Projections? 2015 Update                                                                     
## 81  Predicting the six nations                                                                                                     
## 82  Building a choropleth map of Italy using mapIT                                                                                 
## 83  New updates to the rNOMADS package and big changes in the GFS model                                                            
## 84  Explore Kaggle Competition Data with R                                                                                         
## 85  How to analyze a new dataset (or, analyzing ‘supercar’ data, part 1)                                                           
## 86  FOMC Dates – Price Data Exploration                                                                                            
## 87  A Letter of Recommendation for Nan Xiao                                                                                        
## 88  Leveraging R for Job Openings for Economists                                                                                   
## 89  Wrangling F1 Data With R – F1DataJunkie Book                                                                                   
## 90  How to Download and Run R Scripts from this Site                                                                               
## 91  FIFA 15 Analysis with R                                                                                                        
## 92  “Do You Want to Steal a Snowman?” – A Look (with R) At TorrentFreak’s Top 10 PiRated Movies List #TLAPD                        
## 93  Visit of Di Cook                                                                                                               
## 94  Identify Fantasy Football Sleepers with this Shiny App                                                                         
## 95  Time to Accept It: publishing in the Journal of Statistical Software                                                           
## 96  2014 World Cup Squads                                                                                                          
## 97  Basketball Data Part II – Length of Career by Position                                                                         
## 98  Using sentiment analysis to predict ratings of popular tv series                                                               
## 99  On the trade history and dynamics of NBA teams                                                                                 
## 100 Rblogger Posting Patterns Analyzed with R                                                                                      
## 101 BARUG talks highlight R’s diverse applications                                                                                 
## 102 Mapping academic collaborations in Evolutionary Biology                                                                        
## 103 President Approval Ratings from Roosevelt to Obama                                                                             
## 104 Evolution of Code                                                                                                              
## 105 Terms                                                                                                                          
## 106 Live Google Spreadsheet For Keeping Track Of Sochi Medals                                                                      
## 107 Using One Programming Language In the Context of Another – Python and R                                                        
## 108 Statistics meets rhetoric: A text analysis of "I Have a Dream" in R                                                            
## 109 Statistics meets rhetoric: A text analysis of “I Have a Dream” in R                                                            
## 110 Second NYC R classes(announcement and teaching experience)                                                                     
## 111 Calling Python from R with rPython                                                                                             
## 112 Why R is Better Than Excel for Fantasy Football (and most other) Data Analysis                                                 
## 113 College Basketball: Presence in the NBA over Time                                                                              
## 114 Creating your personal, portable R code library with GitHub                                                                    
## 115 MLB Rankings Using the Bradley-Terry Model                                                                                     
## 116 ggplot2 Chloropleth of Supreme Court Decisions: A Tutorial                                                                     
## 117 Which airline should you be loyal to?                                                                                          
## 118 Opel Corsa Diesel Usage                                                                                                        
## 119 Logging Data in R Loops: Applied to Twitter.                                                                                   
## 120 Shiny App for CRAN packages                                                                                                    
## 121 The Guerilla Guide to R                                                                                                        
## 122 Presentations of the third Milano R net meeting                                                                                
## 123 Milano (Italy). April 18, 2013. Third Milano R net meeting: agenda                                                             
## 124 April 18, 2013Third Milano R net meeting: agenda                                                                               
## 125 Generating Labels for Supervised Text Classification using CAT and R                                                           
## 126 Hilary: the most poisoned baby name in US history                                                                              
## 127 R and foreign characters                                                                                                       
## 128 SPARQL with R in less than 5 minutes                                                                                           
## 129 Multiple Classification and Authorship of the Hebrew Bible                                                                     
## 130 Chocolate and nobel prize – a true story?                                                                                      
## 131 Animated map of 2012 US election campaigning, with R and ffmpeg                                                                
## 132 Tips on accessing data from various sources with R                                                                             
## 133 R Helper Functions                                                                                                             
## 134 The R-Podcast Episode 10: Adventures in Data Munging Part 2                                                                    
## 135 UseR 2012 highlights                                                                                                           
## 136 Visualizing the CRAN:  Graphing Package Dependencies                                                                           
## 137 118 years of US State Weather Data                                                                                             
## 138 The 50 most used R packages                                                                                                    
## 139 RStudio Development Environment                                                                                                
## 140 R: A Quick Scrape of Top Grossing Films from boxofficemojo.com                                                                 
## 141 Installing quantstrat from R-forge and source                                                                                  
## 142 Analyzing R-bloggers                                                                                                           
## 143 Mapping the Iowa GOP 2012 Caucus Results                                                                                       
## 144 Outliers in the European Parliament                                                                                            
## 145 Subscriptions Feature Added                                                                                                    
## 146 Google Scholar (still) sucks                                                                                                   
## 147 Power Tools for Aspiring Data Journalists: R                                                                                   
## 148 Forecasting recessions                                                                                                         
## 149 CHCN: Canadian Historical Climate Network                                                                                      
## 150 hacking .gov shortened links                                                                                                   
## 151 roll calls, ideal points, 112th Congress                                                                                       
## 152 Automating R Scripts on Amazon EC2                                                                                             
## 153 Friday fun projects                                                                                                            
## 154 Further Adventures in Visualisation with ggplot2                                                                               
## 155 Friday Function: setInternet2                                                                                                  
## 156 Find NHL Players with 30 Goals and 100 PIM using R                                                                             
## 157 NBA Analysis:  Coming Soon!                                                                                                    
## 158 Clustering NHL Skaters                                                                                                         
## 159 Dial-a-statistic! Featuring R and Estonia                                                                                      
## 160 How to buy a used car with R (part 1)                                                                                          
## 161 How to buy a used car with R (part 1)                                                                                          
## 162 Using XML package vs. BeautifulSoup                                                                                            
## 163 Are MLB Games Getting Longer?                                                                                                  
## 164 Analyze Gold Demand and Investments using R                                                                                    
## 165 tooltips in R graphics; nytR package                                                                                           
##     date               author                                            
## 1   November 24, 2014  hadleywickham                                     
## 2   September 17, 2014 Bob Rudis (@hrbrmstr)                             
## 3   March 12, 2014     Rolf Fredheim                                     
## 4   March 5, 2014      Rolf Fredheim                                     
## 5   February 25, 2014  Rolf Fredheim                                     
## 6   April 5, 2012      Kay Cichini                                       
## 7   January 6, 2012    Tony Breyal                                       
## 8   December 27, 2011  axiomOfChoice                                     
## 9   November 11, 2011  Tony Breyal                                       
## 10  November 10, 2011  Tony Breyal                                       
## 11  November 8, 2011   Tony Breyal                                       
## 12  November 8, 2011   Tony Breyal                                       
## 13  November 7, 2011   Tony Breyal                                       
## 14  November 5, 2011   Kay Cichini                                       
## 15  November 1, 2011   Kay Cichini                                       
## 16  April 15, 2015     Tony Hirst                                        
## 17  November 30, 2014  Peter Chan                                        
## 18  June 27, 2014      Isaac Petersen                                    
## 19  February 19, 2014  Rolf Fredheim                                     
## 20  January 4, 2014    aleixrvr                                          
## 21  August 23, 2012    GivenTheData                                      
## 22  April 2, 2012      diffuseprior                                      
## 23  January 15, 2012   PR                                                
## 24  January 12, 2012   CloudStat                                         
## 25  October 22, 2011   Kay Cichini                                       
## 26  August 10, 2011    Zach Mayer                                        
## 27  April 14, 2009     bryan                                             
## 28  April 14, 2009     bryan                                             
## 29  March 15, 2015     DataCamp                                          
## 30  January 21, 2015   Peter Chan                                        
## 31  May 15, 2014       jgreenb1                                          
## 32  April 29, 2014     Bob Rudis (@hrbrmstr)                             
## 33  April 14, 2014     David Smith                                       
## 34  April 4, 2014      What is this? David Springate's personal blog :: R
## 35  April 6, 2012      Kay Cichini                                       
## 36  February 18, 2012  Christopher Gandrud                               
## 37  January 27, 2012   Recology - R                                      
## 38  January 5, 2012    The PolStat R Feed                                
## 39  January 4, 2012    The PolStat R Feed                                
## 40  November 9, 2011   Kay Cichini                                       
## 41  September 8, 2011  nsaunders                                         
## 42  February 18, 2011  Earl Glynn                                        
## 43  February 18, 2011  Earl Glynn                                        
## 44  August 13, 2010    --                                                
## 45  March 20, 2015     Rolf Fredheim                                     
## 46  March 6, 2015      Robert Norberg                                    
## 47  February 28, 2015  <NA>                                              
## 48  December 19, 2014  Dominic Nyhuis                                    
## 49  November 27, 2014  hrbrmstr                                          
## 50  November 26, 2014  klr                                               
## 51  November 14, 2014  dan                                               
## 52  September 28, 2014 biochemistries                                    
## 53  August 28, 2014    Joseph Rickert                                    
## 54  April 30, 2014     Huidong Tian                                      
## 55  March 13, 2014     Francis Smart                                     
## 56  March 13, 2014     Rolf Fredheim                                     
## 57  March 10, 2014     vivian                                            
## 58  February 1, 2014   steadyfish                                        
## 59  July 29, 2013      Vik Paruchuri                                     
## 60  July 29, 2013      - r                                               
## 61  July 29, 2013      Vik Paruchuri                                     
## 62  May 6, 2013        Myles                                             
## 63  April 30, 2013     Trey Causey                                       
## 64  December 15, 2012  Ron Pearson (aka TheNoodleDoodler)                
## 65  October 27, 2012   distantobserver                                   
## 66  July 28, 2012      tylerrinker                                       
## 67  March 20, 2012     prasoonsharma                                     
## 68  January 11, 2012   markbulling                                       
## 69  January 3, 2012    Tony Breyal                                       
## 70  November 13, 2011  Tony Breyal                                       
## 71  November 10, 2011  Tony Breyal                                       
## 72  September 29, 2010 Ryan Rosario                                      
## 73  September 4, 2010  prasoonsharma                                     
## 74  January 22, 2010   Ryan                                              
## 75  April 18, 2015     aschinchon                                        
## 76  April 8, 2015      Andrew Landgraf                                   
## 77  March 31, 2015     hrbrmstr                                          
## 78  March 23, 2015     Dean Attali's R Blog                              
## 79  February 26, 2015  Bigdata Doc                                       
## 80  February 20, 2015  Isaac Petersen                                    
## 81  February 4, 2015   Mango Solutions                                   
## 82  January 19, 2015   Davide Massidda                                   
## 83  January 16, 2015   glossarch                                         
## 84  December 23, 2014  notesofdabbler                                    
## 85  December 16, 2014  Sharpsight Admin                                  
## 86  December 14, 2014  Peter Chan                                        
## 87  November 17, 2014  Yihui Xie                                         
## 88  November 1, 2014   Thiemo Fetzer                                     
## 89  October 30, 2014   Tony Hirst                                        
## 90  October 23, 2014   Isaac Petersen                                    
## 91  September 26, 2014 The Clerk                                         
## 92  September 18, 2014 Bob Rudis (@hrbrmstr)                             
## 93  August 12, 2014    Rob J Hyndman                                     
## 94  July 6, 2014       Isaac Petersen                                    
## 95  June 30, 2014      brobar                                            
## 96  June 5, 2014       gjabel                                            
## 97  June 2, 2014       jgreenb1                                          
## 98  May 26, 2014       tlfvincent                                        
## 99  April 28, 2014     tlfvincent                                        
## 100 April 11, 2014     Mark T Patterson                                  
## 101 April 10, 2014     Joseph Rickert                                    
## 102 April 4, 2014      What is this? David Springate's personal blog :: R
## 103 March 29, 2014     tlfvincent                                        
## 104 March 27, 2014     Educate-R - R                                     
## 105 February 13, 2014  Tal Galili                                        
## 106 February 11, 2014  hrbrmstr                                          
## 107 January 22, 2014   Tony Hirst                                        
## 108 January 20, 2014   Max Ghenis                                        
## 109 January 20, 2014   Max Ghenis                                        
## 110 January 20, 2014   Tal Galili                                        
## 111 January 13, 2014   bryan                                             
## 112 January 13, 2014   Isaac Petersen                                    
## 113 November 7, 2013   Mark T Patterson                                  
## 114 September 21, 2013 bryan                                             
## 115 August 31, 2013    John Ramey                                        
## 116 July 4, 2013       tylerrinker                                       
## 117 July 2, 2013       dan                                               
## 118 June 24, 2013      Wingfeet                                          
## 119 May 26, 2013       Alistair Leak                                     
## 120 May 13, 2013       pssguy                                            
## 121 May 12, 2013       Nikhil Gopal                                      
## 122 April 19, 2013     Milano R net                                      
## 123 April 10, 2013     Milano R net                                      
## 124 March 25, 2013     Milano R net                                      
## 125 February 4, 2013   Solomon                                           
## 126 January 29, 2013   hilaryparker                                      
## 127 January 25, 2013   Rolf Fredheim                                     
## 128 January 23, 2013   bryan                                             
## 129 January 1, 2013    inkhorn82                                         
## 130 December 22, 2012  Max Gordon                                        
## 131 October 28, 2012   civilstat                                         
## 132 October 3, 2012    David Smith                                       
## 133 September 25, 2012 bryan                                             
## 134 September 16, 2012 Eric                                              
## 135 June 20, 2012      David Smith                                       
## 136 May 17, 2012       wrathematics                                      
## 137 April 22, 2012     drunksandlampposts                                
## 138 April 5, 2012      flodel                                            
## 139 March 23, 2012     bryan                                             
## 140 January 13, 2012   Tony Breyal                                       
## 141 January 10, 2012   bryan                                             
## 142 January 6, 2012    The PolStat R Feed                                
## 143 January 4, 2012    jjh                                               
## 144 December 20, 2011  The PolStat Feed                                  
## 145 December 7, 2011   bryan                                             
## 146 November 13, 2011  bbolker                                           
## 147 October 31, 2011   Tony Hirst                                        
## 148 August 9, 2011     Zach Mayer                                        
## 149 August 4, 2011     Steven Mosher                                     
## 150 July 30, 2011      Harlan                                            
## 151 June 29, 2011      jackman                                           
## 152 June 9, 2011       Travis Nelson                                     
## 153 May 14, 2011       nsaunders                                         
## 154 April 25, 2011     hayward                                           
## 155 April 15, 2011     richierocks                                       
## 156 April 2, 2011      btibert3                                          
## 157 March 21, 2011     Ryan                                              
## 158 February 6, 2011   --                                                
## 159 January 16, 2011   Ethan Brown                                       
## 160 October 31, 2010   Dan Knoepfle's Blog                               
## 161 October 31, 2010   Dan Knoepfle's Blog                               
## 162 August 31, 2010    Ryan                                              
## 163 August 5, 2010     Ryan                                              
## 164 June 29, 2010      C                                                 
## 165 December 28, 2009  jackman

The display above shows the results obtained by scraping every page of the search results for the key word(s) “web scraping” at the R-bloggers.com website: 165 blog entries in total.

Summary

This project created a generic function that takes one or more key words as input, searches the “www.r-bloggers.com” website, and returns the search results as a data frame with three variables: title, date and author. The same logic, if implemented in C++ or Java, could act as an API for searching the “www.r-bloggers.com” website. The data frame returned by the function could be enhanced to include each blog entry's URL and the page number of the search results on which the entry appears.
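
The enhancement suggested above could look roughly like the sketch below. It is only an illustration under stated assumptions, not part of the project code: it assumes the rvest package is available, that each blog title on a results page is a link nested inside an “h2” element (the “h2 a” selector is a guess and has not been checked against the live site), and the helper name scrape_titles_with_url() is hypothetical. A small inline HTML fragment stands in for a downloaded results page so that the example runs without network access.

```r
library(rvest)

# Hypothetical helper: extract the title, link and page number for the
# blog entries on one search-results page.
# ASSUMPTION: each title is an <a> nested in an <h2>; adjust the selector
# if the actual page markup differs.
scrape_titles_with_url <- function(page, page_number) {
  links <- html_nodes(page, "h2 a")
  data.frame(
    title = html_text(links, trim = TRUE),
    url   = html_attr(links, "href"),
    page  = rep(page_number, length(links)),
    stringsAsFactors = FALSE
  )
}

# Inline HTML fragment standing in for a downloaded results page
# (made-up titles and example.com URLs, for illustration only).
sample_page <- read_html('
  <div>
    <h2><a href="http://example.com/post-1">First post</a></h2>
    <h2><a href="http://example.com/post-2">Second post</a></h2>
  </div>')

result <- scrape_titles_with_url(sample_page, page_number = 1)
print(result)
```

When looping over all result pages, the loop index could be passed as page_number and the per-page data frames combined with rbind(), so that every row records the page it came from.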

-~-End of Project Report-~-