This is a tutorial on scraping web data in R with the rvest package.

if (!require("pacman")) install.packages("pacman")
Loading required package: pacman
pacman::p_load(rvest, dplyr, stringr, DT, tidyr, readxl, knitr, ggplot2)
# library(rvest)    # web scraping
# library(stringr)  # string manipulation
# library(tidyr)    # data cleaning
# library(DT)       # nice HTML output tables

1. HTML: Hypertext Markup Language

An HTML file is structured hierarchically, as a tree.

Everything in an HTML document is a node:

[Figure: htmltree]

One example HTML file:

<html>

  <head>
      <title>This is a title</title>
  </head>

  <body>
      <h1>Lesson one</h1>
      <p>Hello world!</p>
  </body>

</html>

The tree structure:

[Figure: tree structure of the example HTML file]
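To see the tree concretely, the example file can be parsed from a character string and its element nodes printed with indentation. This is only a minimal sketch: print_tree is a hypothetical helper written for this illustration, not an rvest function.

```r
library(rvest)

# The example document from above, supplied as a character string
doc <- read_html("<html>
  <head><title>This is a title</title></head>
  <body><h1>Lesson one</h1><p>Hello world!</p></body>
</html>")

# Hypothetical helper: print each element name, indented by its depth
print_tree <- function(node, depth = 0) {
  cat(strrep("  ", depth), html_name(node), "\n", sep = "")
  kids <- html_children(node)
  for (i in seq_along(kids)) {
    print_tree(kids[[i]], depth + 1)
  }
}

print_tree(doc)
```

Each level of indentation corresponds to one step down the tree: head and body are children of html, and title, h1 and p are their children in turn.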

2. rvest

rvest is a package by Hadley Wickham that makes basic processing and manipulation of HTML data straightforward.

Core functions:

read_html - read HTML data from a URL or a character string.

html_nodes - select specified nodes from the HTML document using CSS selectors.

html_table - parse an HTML table into a data frame.

html_text - extract the text content inside each tag pair.

html_name - extract tag names.

html_attrs - extract all attributes of each tag.

html_attr - extract the value of one attribute by name.
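The functions listed above can be tried without a network connection by parsing a small HTML string. The markup below is invented for this illustration, so the tag contents and attribute values are assumptions rather than part of any real page.

```r
library(rvest)

# Made-up HTML snippet, parsed from a character string
page <- read_html('
<body>
  <p class="intro">See the <a href="/docs" title="Docs">documentation</a>.</p>
  <table>
    <tr><th>name</th><th>score</th></tr>
    <tr><td>a</td><td>1</td></tr>
    <tr><td>b</td><td>2</td></tr>
  </table>
</body>')

link <- html_nodes(page, "a")   # select nodes with a CSS selector
html_text(link)                 # text between the opening and closing tag
html_name(link)                 # tag name
html_attrs(link)                # all attributes, one named vector per node
html_attr(link, "href")         # a single attribute, looked up by name

# html_table returns one table per matched node; take the first
tbl <- html_table(html_nodes(page, "table"))[[1]]
tbl
```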

3. CSS selectors

SelectorGadget helps us identify the HTML elements of interest. It does this by constructing a CSS selector, which can be used to subset the HTML document.

3.1 CSS Selector

  • For Firefox, customize the display, then drag the link to your bookmark bar.
  • For Chrome, try the Chrome extension, or drag the link to your bookmark bar.

Use

To use it, open the page you want to scrape, then:

  • Click on the element you want to select. SelectorGadget will make a first guess at the CSS selector you want. The guess is likely to be poor, since it has only one example to learn from, but it’s a start. Elements that match the selector are highlighted in yellow.

  • Click on elements that shouldn’t be selected; they turn red. Click on elements that should be selected; they turn green.

  • Iterate until only the elements you want are selected. SelectorGadget isn’t perfect and sometimes won’t be able to find a useful CSS selector. Sometimes starting from a different element helps.
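It can help to know the basic selector forms that SelectorGadget produces, and to check in R how many nodes a candidate selector matches. The sketch below uses invented markup (the id "main" and the class "note" are assumptions for illustration).

```r
library(rvest)

# Made-up page fragment with one id and one repeated class
page <- read_html('
<div id="main">
  <p class="note">first</p>
  <p>second</p>
</div>
<p class="note">third</p>')

length(html_nodes(page, "p"))         # tag selector: every <p>
length(html_nodes(page, ".note"))     # class selector: class "note"
length(html_nodes(page, "#main p"))   # descendant: <p> inside id "main"

# Combining id and class narrows the match further
html_text(html_nodes(page, "#main .note"))
```

Checking length() after each change mirrors the click-and-iterate loop: if the count is larger than expected, the selector is still too broad.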

4. Weather forecast data

4.1 Data

Source:

htmlpage <- read_html("http://forecast.weather.gov/MapClick.php?lat=42.31674913306716&lon=-71.42487878862437&site=all&smap=1#.VRsEpZPF84I")
forecasthtml <- html_nodes(htmlpage, ".forecast-text")
forecast <- html_text(forecasthtml)
forecast
 [1] "A chance of showers.  Cloudy, with a high near 63. West wind around 6 mph.  Chance of precipitation is 40%. New precipitation amounts of less than a tenth of an inch possible. "                              
 [2] "A chance of showers, mainly before 7pm.  Mostly cloudy, with a low around 43. Northwest wind 5 to 7 mph.  Chance of precipitation is 30%. New precipitation amounts of less than a tenth of an inch possible. "
 [3] "Mostly sunny, with a high near 52. Northwest wind 6 to 9 mph, with gusts as high as 23 mph. "                                                                                                                  
 [4] "Mostly clear, with a low around 28. Northwest wind around 6 mph becoming calm  after midnight. "                                                                                                               
 [5] "Mostly sunny, with a high near 52. Calm wind becoming west 5 to 9 mph in the morning. "                                                                                                                        
 [6] "Mostly cloudy, with a low around 39."                                                                                                                                                                          
 [7] "Partly sunny, with a high near 49."                                                                                                                                                                            
 [8] "Partly cloudy, with a low around 33."                                                                                                                                                                          
 [9] "Sunny, with a high near 52."                                                                                                                                                                                   
[10] "Mostly clear, with a low around 34."                                                                                                                                                                           
[11] "Mostly sunny, with a high near 59."                                                                                                                                                                            
[12] "Partly cloudy, with a low around 41."                                                                                                                                                                          
[13] "Mostly sunny, with a high near 55."                                                                                                                                                                            
[14] "Partly cloudy, with a low around 38."                                                                                                                                                                          
[15] "Mostly sunny, with a high near 56."                                                                                                                                                                            

The forecasts carry no dates, so the next step adds them.

4.2 Try to add date

b , .forecast-text

forecasthtml <- html_nodes(htmlpage, "b , .forecast-text")
#forecasthtml <- html_nodes(htmlpage, "#detailed-forecast-body b , .forecast-text")
forecast <- html_text(forecasthtml)
forecast
 [1] "Current conditions at"                                                                                                                                                                                         
 [2] "Lat: "                                                                                                                                                                                                         
 [3] "Lon: "                                                                                                                                                                                                         
 [4] "Elev: "                                                                                                                                                                                                        
 [5] "Humidity"                                                                                                                                                                                                      
 [6] "Wind Speed"                                                                                                                                                                                                    
 [7] "Barometer"                                                                                                                                                                                                     
 [8] "Dewpoint"                                                                                                                                                                                                      
 [9] "Visibility"                                                                                                                                                                                                    
[10] "Last update"                                                                                                                                                                                                   
[11] "More Information:"                                                                                                                                                                                             
[12] "Extended Forecast for"                                                                                                                                                                                         
[13] "This Afternoon"                                                                                                                                                                                                
[14] "A chance of showers.  Cloudy, with a high near 63. West wind around 6 mph.  Chance of precipitation is 40%. New precipitation amounts of less than a tenth of an inch possible. "                              
[15] "Tonight"                                                                                                                                                                                                       
[16] "A chance of showers, mainly before 7pm.  Mostly cloudy, with a low around 43. Northwest wind 5 to 7 mph.  Chance of precipitation is 30%. New precipitation amounts of less than a tenth of an inch possible. "
[17] "Friday"                                                                                                                                                                                                        
[18] "Mostly sunny, with a high near 52. Northwest wind 6 to 9 mph, with gusts as high as 23 mph. "                                                                                                                  
[19] "Friday Night"                                                                                                                                                                                                  
[20] "Mostly clear, with a low around 28. Northwest wind around 6 mph becoming calm  after midnight. "                                                                                                               
[21] "Saturday"                                                                                                                                                                                                      
[22] "Mostly sunny, with a high near 52. Calm wind becoming west 5 to 9 mph in the morning. "                                                                                                                        
[23] "Saturday Night"                                                                                                                                                                                                
[24] "Mostly cloudy, with a low around 39."                                                                                                                                                                          
[25] "Sunday"                                                                                                                                                                                                        
[26] "Partly sunny, with a high near 49."                                                                                                                                                                            
[27] "Sunday Night"                                                                                                                                                                                                  
[28] "Partly cloudy, with a low around 33."                                                                                                                                                                          
[29] "Monday"                                                                                                                                                                                                        
[30] "Sunny, with a high near 52."                                                                                                                                                                                   
[31] "Monday Night"                                                                                                                                                                                                  
[32] "Mostly clear, with a low around 34."                                                                                                                                                                           
[33] "Tuesday"                                                                                                                                                                                                       
[34] "Mostly sunny, with a high near 59."                                                                                                                                                                            
[35] "Tuesday Night"                                                                                                                                                                                                 
[36] "Partly cloudy, with a low around 41."                                                                                                                                                                          
[37] "Wednesday"                                                                                                                                                                                                     
[38] "Mostly sunny, with a high near 55."                                                                                                                                                                            
[39] "Wednesday Night"                                                                                                                                                                                               
[40] "Partly cloudy, with a low around 38."                                                                                                                                                                          
[41] "Thursday"                                                                                                                                                                                                      
[42] "Mostly sunny, with a high near 56."                                                                                                                                                                            
[43] "Map function requires Javascript and a compatible browser."                                                                                                                                                    

This selects too much. In SelectorGadget, click the highlighted elements we do not want; they turn red and are removed from the selection.
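The effect of restricting a selector to one part of the page can be reproduced on a small example. The HTML below is a made-up stand-in that only imitates the structure suggested by the selectors in this section; the real page's markup is not shown here, so treat the layout as an assumption.

```r
library(rvest)

# Invented fragment: one <b> outside the forecast body, two inside it
page <- read_html('
<div id="current"><b>Humidity</b> 40%</div>
<div id="detailed-forecast-body">
  <div><b>Tonight</b><div class="forecast-text">Mostly cloudy.</div></div>
  <div><b>Friday</b><div class="forecast-text">Mostly sunny.</div></div>
</div>')

# Every <b> on the page, plus every forecast text
broad <- html_text(html_nodes(page, "b, .forecast-text"))

# Only <b> elements inside the forecast body, plus every forecast text
narrow <- html_text(html_nodes(page, "#detailed-forecast-body b, .forecast-text"))

broad
narrow
```

The comma joins two independent selectors, so the id prefix applies only to the b part; .forecast-text still matches anywhere on the page.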

4.3 Remove the extra information and narrow down to what we want

#detailed-forecast-body b , .forecast-text

forecasthtml <- html_nodes(htmlpage, "#detailed-forecast-body b , .forecast-text")
forecast <- html_text(forecasthtml)
forecast
 [1] "This Afternoon"                                                                                                                                                                                                
 [2] "A chance of showers.  Cloudy, with a high near 63. West wind around 6 mph.  Chance of precipitation is 40%. New precipitation amounts of less than a tenth of an inch possible. "                              
 [3] "Tonight"                                                                                                                                                                                                       
 [4] "A chance of showers, mainly before 7pm.  Mostly cloudy, with a low around 43. Northwest wind 5 to 7 mph.  Chance of precipitation is 30%. New precipitation amounts of less than a tenth of an inch possible. "
 [5] "Friday"                                                                                                                                                                                                        
 [6] "Mostly sunny, with a high near 52. Northwest wind 6 to 9 mph, with gusts as high as 23 mph. "                                                                                                                  
 [7] "Friday Night"                                                                                                                                                                                                  
 [8] "Mostly clear, with a low around 28. Northwest wind around 6 mph becoming calm  after midnight. "                                                                                                               
 [9] "Saturday"                                                                                                                                                                                                      
[10] "Mostly sunny, with a high near 52. Calm wind becoming west 5 to 9 mph in the morning. "                                                                                                                        
[11] "Saturday Night"                                                                                                                                                                                                
[12] "Mostly cloudy, with a low around 39."                                                                                                                                                                          
[13] "Sunday"                                                                                                                                                                                                        
[14] "Partly sunny, with a high near 49."                                                                                                                                                                            
[15] "Sunday Night"                                                                                                                                                                                                  
[16] "Partly cloudy, with a low around 33."                                                                                                                                                                          
[17] "Monday"                                                                                                                                                                                                        
[18] "Sunny, with a high near 52."                                                                                                                                                                                   
[19] "Monday Night"                                                                                                                                                                                                  
[20] "Mostly clear, with a low around 34."                                                                                                                                                                           
[21] "Tuesday"                                                                                                                                                                                                       
[22] "Mostly sunny, with a high near 59."                                                                                                                                                                            
[23] "Tuesday Night"                                                                                                                                                                                                 
[24] "Partly cloudy, with a low around 41."                                                                                                                                                                          
[25] "Wednesday"                                                                                                                                                                                                     
[26] "Mostly sunny, with a high near 55."                                                                                                                                                                            
[27] "Wednesday Night"                                                                                                                                                                                               
[28] "Partly cloudy, with a low around 38."                                                                                                                                                                          
[29] "Thursday"                                                                                                                                                                                                      
[30] "Mostly sunny, with a high near 56."                                                                                                                                                                            

Put them together

paste(forecast, collapse =" ")
[1] "This Afternoon A chance of showers.  Cloudy, with a high near 63. West wind around 6 mph.  Chance of precipitation is 40%. New precipitation amounts of less than a tenth of an inch possible.  Tonight A chance of showers, mainly before 7pm.  Mostly cloudy, with a low around 43. Northwest wind 5 to 7 mph.  Chance of precipitation is 30%. New precipitation amounts of less than a tenth of an inch possible.  Friday Mostly sunny, with a high near 52. Northwest wind 6 to 9 mph, with gusts as high as 23 mph.  Friday Night Mostly clear, with a low around 28. Northwest wind around 6 mph becoming calm  after midnight.  Saturday Mostly sunny, with a high near 52. Calm wind becoming west 5 to 9 mph in the morning.  Saturday Night Mostly cloudy, with a low around 39. Sunday Partly sunny, with a high near 49. Sunday Night Partly cloudy, with a low around 33. Monday Sunny, with a high near 52. Monday Night Mostly clear, with a low around 34. Tuesday Mostly sunny, with a high near 59. Tuesday Night Partly cloudy, with a low around 41. Wednesday Mostly sunny, with a high near 55. Wednesday Night Partly cloudy, with a low around 38. Thursday Mostly sunny, with a high near 56."
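Collapsing everything into one string loses the pairing between each period and its forecast. Because the selector returns labels and descriptions alternately, another option is to split them into two columns. This is a minimal sketch that uses a shortened stand-in for the scraped vector and assumes the strict label/text alternation seen in the output above.

```r
# Shortened stand-in for the `forecast` vector scraped above
forecast <- c("Tonight", "Mostly cloudy, with a low around 43.",
              "Friday",  "Mostly sunny, with a high near 52.")

# Odd positions hold the period label, even positions the forecast text
is_label <- seq_along(forecast) %% 2 == 1

forecast_df <- data.frame(
  period = forecast[is_label],
  text   = forecast[!is_label],
  stringsAsFactors = FALSE
)
forecast_df
```

If the page ever returned a label without a matching description, the two columns would differ in length and data.frame would raise an error, which is a useful early warning.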

4.4 Try again for simple forecast

#seven-day-forecast-list p

forecasthtml <- html_nodes(htmlpage, "#seven-day-forecast-list p")
forecast <- html_text(forecasthtml)
paste(forecast, collapse =" ")
[1] "ThisAfternoon  ChanceShowers High: 63 °F Tonight  ChanceShowers thenMostly Cloudy Low: 43 °F Friday  Mostly Sunny High: 52 °F FridayNight  Mostly Clear Low: 28 °F Saturday  Mostly Sunny High: 52 °F SaturdayNight  Mostly Cloudy Low: 39 °F Sunday  Partly Sunny High: 49 °F SundayNight  Partly Cloudy Low: 33 °F Monday  Sunny High: 52 °F"
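The short forecast follows a regular "High: NN" / "Low: NN" pattern, so the temperatures can be pulled out with a regular expression. A minimal sketch with stringr, run on a shortened copy of the string above; the pattern is an assumption based only on this sample output.

```r
library(stringr)

# Shortened copy of the collapsed short forecast (degree sign written as \u00b0)
short <- paste(
  "ThisAfternoon  ChanceShowers High: 63 \u00b0F",
  "Tonight  ChanceShowers thenMostly Cloudy Low: 43 \u00b0F",
  "Friday  Mostly Sunny High: 52 \u00b0F"
)

# Column 1 is the full match, columns 2 and 3 are the two capture groups
m <- str_match_all(short, "(High|Low): (-?\\d+)")[[1]]

temps <- data.frame(
  kind      = m[, 2],
  degrees_f = as.integer(m[, 3]),
  stringsAsFactors = FALSE
)
temps
```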

5. Lego movie

5.1 Cast list

#titleCast .itemprop

html <- read_html("http://www.imdb.com/title/tt1490017/")
cast <- html_nodes(html, "#titleCast .itemprop")
length(cast)
[1] 30
cast[1:2]
{xml_nodeset (2)}
[1] <td class="itemprop" itemprop="actor" itemscope="" itemtype="http:// ...
[2] <span class="itemprop" itemprop="name">Will Arnett</span>

Looking carefully at this output, we see twice as many matches as we expected. That’s because we’ve selected both the table cell (td) and the span inside the cell. We can experiment with SelectorGadget to find a better match or look at the HTML directly.

Try #titleCast span.itemprop instead:

cast <- html_nodes(html, "#titleCast span.itemprop")
length(cast)
[1] 15
html_text(cast)
 [1] "Will Arnett"     "Elizabeth Banks" "Craig Berry"    
 [4] "Alison Brie"     "David Burrows"   "Anthony Daniels"
 [7] "Charlie Day"     "Amanda Farinos"  "Keith Ferguson" 
[10] "Will Ferrell"    "Will Forte"      "Dave Franco"    
[13] "Morgan Freeman"  "Todd Hansen"     "Jonah Hill"     
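One way to check which elements a selector matched is to tabulate their tag names with html_name. The sketch below reproduces the doubled match on a made-up two-row table that imitates the td/span nesting visible in the cast[1:2] output; it is not the real IMDb markup.

```r
library(rvest)

# Invented table: each cell and the span inside it share class "itemprop"
page <- read_html('
<table id="titleCast">
  <tr><td class="itemprop"><span class="itemprop">Will Arnett</span></td></tr>
  <tr><td class="itemprop"><span class="itemprop">Elizabeth Banks</span></td></tr>
</table>')

both <- html_nodes(page, "#titleCast .itemprop")
table(html_name(both))    # counts per tag name: td and span both appear

# Adding the tag name to the selector keeps only the spans
spans <- html_nodes(page, "#titleCast span.itemprop")
html_text(spans)
```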

5.2 Score

.ratingValue span

score <- html_nodes(html, ".ratingValue span")
length(score)
[1] 3
html_text(score)
[1] "7.8" "/"   "10" 

Put them together

paste(html_text(score), collapse ="")
[1] "7.8/10"
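The pasted string is convenient for display, but for analysis the rating is more useful as a number. A small sketch, assuming the three-element layout shown above (rating, slash, maximum) and using a literal stand-in for html_text(score):

```r
# Stand-in for html_text(score) from the output above
score_text <- c("7.8", "/", "10")

rating <- as.numeric(score_text[1])   # the score itself
out_of <- as.numeric(score_text[3])   # the scale maximum

rating / out_of                       # rating as a fraction of the maximum
```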

6. Exercise: find the Now Playing (Box Office)

.aux-content-widget-2:nth-child(11) .title a

html <- read_html("http://www.imdb.com/")
playing <- html_nodes(html, ".aux-content-widget-2:nth-child(11) .title a")
length(playing)
[1] 5
playing[1:3]
{xml_nodeset (3)}
[1] <a href="/title/tt5325452?pf_rd_m=A2FGELUUNOQJNL&amp;pf_rd_p=2495768 ...
[2] <a href="/title/tt3062096?pf_rd_m=A2FGELUUNOQJNL&amp;pf_rd_p=2495768 ...
[3] <a href="/title/tt3393786?pf_rd_m=A2FGELUUNOQJNL&amp;pf_rd_p=2495768 ...

Get text

movies <- html_text(playing)
movies
[1] " Boo! A Madea Halloween "      " Inferno "                    
[3] " Jack Reacher: Never Go Back " " The Accountant "             
[5] " Ouija: Origin of Evil "      
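The titles come back with leading and trailing spaces. These can be removed either while extracting, with the trim argument of html_text, or afterwards with base R's trimws. The snippet below shows both on a made-up link; the title id tt0000001 is a placeholder, not a real entry.

```r
library(rvest)

# Invented fragment with padded link text, as in the output above
page <- read_html('<div class="title"><a href="/title/tt0000001"> Inferno </a></div>')
node <- html_nodes(page, ".title a")

html_text(node, trim = TRUE)   # trim while extracting
trimws(html_text(node))        # or trim an existing character vector
```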

Get link

link <- html_attr(playing, "href")
link
[1] "/title/tt5325452?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2495768522&pf_rd_r=0VTJRNSP4VYPD3J9S3AS&pf_rd_s=right-7&pf_rd_t=15061&pf_rd_i=homepage&ref_=hm_cht_t0"
[2] "/title/tt3062096?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2495768522&pf_rd_r=0VTJRNSP4VYPD3J9S3AS&pf_rd_s=right-7&pf_rd_t=15061&pf_rd_i=homepage&ref_=hm_cht_t1"
[3] "/title/tt3393786?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2495768522&pf_rd_r=0VTJRNSP4VYPD3J9S3AS&pf_rd_s=right-7&pf_rd_t=15061&pf_rd_i=homepage&ref_=hm_cht_t2"
[4] "/title/tt2140479?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2495768522&pf_rd_r=0VTJRNSP4VYPD3J9S3AS&pf_rd_s=right-7&pf_rd_t=15061&pf_rd_i=homepage&ref_=hm_cht_t3"
[5] "/title/tt4361050?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2495768522&pf_rd_r=0VTJRNSP4VYPD3J9S3AS&pf_rd_s=right-7&pf_rd_t=15061&pf_rd_i=homepage&ref_=hm_cht_t4"
link <- paste0("http://www.imdb.com", link)
link
[1] "http://www.imdb.com/title/tt5325452?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2495768522&pf_rd_r=0VTJRNSP4VYPD3J9S3AS&pf_rd_s=right-7&pf_rd_t=15061&pf_rd_i=homepage&ref_=hm_cht_t0"
[2] "http://www.imdb.com/title/tt3062096?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2495768522&pf_rd_r=0VTJRNSP4VYPD3J9S3AS&pf_rd_s=right-7&pf_rd_t=15061&pf_rd_i=homepage&ref_=hm_cht_t1"
[3] "http://www.imdb.com/title/tt3393786?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2495768522&pf_rd_r=0VTJRNSP4VYPD3J9S3AS&pf_rd_s=right-7&pf_rd_t=15061&pf_rd_i=homepage&ref_=hm_cht_t2"
[4] "http://www.imdb.com/title/tt2140479?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2495768522&pf_rd_r=0VTJRNSP4VYPD3J9S3AS&pf_rd_s=right-7&pf_rd_t=15061&pf_rd_i=homepage&ref_=hm_cht_t3"
[5] "http://www.imdb.com/title/tt4361050?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2495768522&pf_rd_r=0VTJRNSP4VYPD3J9S3AS&pf_rd_s=right-7&pf_rd_t=15061&pf_rd_i=homepage&ref_=hm_cht_t4"
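
The scraped links carry long tracking query strings after the "?". If only the title page is needed, the query string can be dropped before the host is added. A minimal sketch in base R, using shortened copies of two of the links above (the shortening is for readability only):

```r
# Shortened copies of two scraped href values; the real ones carry more parameters.
link <- c(
  "/title/tt5325452?pf_rd_m=A2FGELUUNOQJNL&ref_=hm_cht_t0",
  "/title/tt3062096?pf_rd_m=A2FGELUUNOQJNL&ref_=hm_cht_t1"
)

# Drop everything from the first "?" to the end of the string.
path <- sub("\\?.*$", "", link)

# Prepend the host to turn the relative paths into full URLs.
full <- paste0("http://www.imdb.com", path)
full
```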

6.1 Get box office

.secondary-text

boxoffice <- html_nodes(html, ".secondary-text")
length(boxoffice)
[1] 16
boxoffice[1:3]
{xml_nodeset (3)}
[1] <span class="secondary-text"/>
[2] <span class="secondary-text"/>
[3] <span class="secondary-text"/>
boxoffice <- html_text(boxoffice)
# Elements 7 to 11 held the weekend grosses when this page was scraped.
boxoffice <- boxoffice[7:11]
boxoffice
[1] "Weekend: $17.2M" "Weekend: $14.9M" "Weekend: $9.6M"  "Weekend: $8.5M" 
[5] "Weekend: $7.1M" 
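
Selecting positions 7 to 11 by hand breaks as soon as the page layout shifts. A sketch of an alternative that keeps only the strings starting with "Weekend:" and converts them to numbers; the input vector is a hand-made stand-in for what html_text returns, with empty strings standing in for the spans that carry no text:

```r
# Hand-made stand-in for the html_text() result (values taken from the output above).
raw <- c("", "", "Weekend: $17.2M", "Weekend: $14.9M", "", "Weekend: $9.6M")

# Keep only the entries that look like weekend grosses.
weekend <- grep("^Weekend:", raw, value = TRUE)

# Pull out the number between "$" and "M" and convert it (unit: million dollars).
millions <- as.numeric(sub("^Weekend: \\$([0-9.]+)M$", "\\1", weekend))
millions
```

Matching on the text pattern rather than on the position is a design choice: it tolerates extra or missing spans, but it assumes the "Weekend: $...M" wording stays the same.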

6.2 Combine and print as a data frame

# tibble() replaces the deprecated data_frame()
imdbdf <- tibble(movie = movies, boxoffice = boxoffice, link = link)
datatable(imdbdf)

7. Superbowl Winners: html_table

We use read_html to read a web page and html_table to extract the table from it. The table holds more organized text than the earlier examples.

url <- 'http://espn.go.com/nfl/superbowl/history/winners'
webpage <- read_html(url)

Next, we use the functions html_nodes and html_table to extract the HTML table element and convert it to a data frame.

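Use ?html_table to check out the arguments.

Pay attention to fill = TRUE, which fills rows that have fewer cells than the rest with NA.

html_table returns a list with one element per table. To take a single element out of a list, use double square brackets, [[i]].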

sb_table <- html_nodes(webpage, 'table')
str(sb_table)
List of 1
 $ :List of 2
  ..$ node:<externalptr> 
  ..$ doc :<externalptr> 
  ..- attr(*, "class")= chr "xml_node"
 - attr(*, "class")= chr "xml_nodeset"
sb <- html_table(sb_table,fill=TRUE)[[1]]
#head(sb)
datatable(sb, caption = 'Table 1: Not clean and tidy data.')
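
The difference between the list returned by html_table and a single element taken with [[1]] can be checked on a tiny inline table. This is a sketch on made-up markup (the column names and the single row are illustrative, not the ESPN table):

```r
library(rvest)

# A one-table page built from a string, so no network access is needed.
page <- read_html(
  "<table><tr><th>game</th><th>site</th></tr><tr><td>I</td><td>Los Angeles</td></tr></table>"
)

tables <- html_table(html_nodes(page, "table"))
length(tables)               # one entry per matched <table>

one_as_list <- tables[1]     # single brackets: still a list of length 1
one_as_df   <- tables[[1]]   # double brackets: the data frame itself
names(one_as_df)
```

Functions such as datatable or names expect the data frame, which is why the [[1]] step is needed even when the page has only one table.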

We remove the first two rows, and set the column names.

sb <- sb[-(1:2), ]
names(sb) <- c("number", "date", "site", "result")
#head(sb)
datatable(sb, caption = 'Table 2: Improvement toward clean and tidy data.')

It is traditional to use Roman numerals to refer to Super Bowls, but Arabic numerals are more convenient to work with. We will also convert the date to a standard format.

library(lubridate) # easy to parse datetime data

Attaching package: 'lubridate'
The following object is masked from 'package:base':

    date
# Number the games in row order (1:50 when this page was scraped).
sb$number <- seq_len(nrow(sb))
#sb$date <- as.Date(sb$date, "%B. %d, %Y")
sb$date <- mdy(sb$date)
#head(sb)
datatable(sb, caption = 'Table 3: Improvement toward clean and tidy data.')
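
Instead of overwriting the column with a sequence, the Roman numerals can also be converted directly, which keeps each number tied to its own row even if rows are later dropped or reordered. A minimal sketch with base R's as.roman; the sample numerals are typed by hand, not scraped:

```r
# Hand-typed sample of Super Bowl numerals.
roman <- c("I", "II", "XLVIII", "L")

# as.roman() parses the numerals; as.integer() yields plain Arabic numbers.
number <- as.integer(utils::as.roman(roman))
number
```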

The result column should be split into four columns: the winning team's name, the winner's score, the losing team's name, and the loser's score. We start by splitting the result column into two columns at the comma, using the separate function from the tidyr package.

sb <- separate(sb, result, c('winner', 'loser'), sep=', ', remove=TRUE)
#head(sb)
datatable(sb, caption = 'Table 4: Clean and tidy data.')
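
The remaining step, going from two columns to four, can be sketched with base R regular expressions. The two sample rows are typed by hand in the "Team Score" shape described above, and split_team_score is a hypothetical helper introduced only for this illustration:

```r
# Hand-typed sample rows in the shape produced by the separate() call.
sb <- data.frame(
  winner = c("Green Bay 35", "New York Jets 16"),
  loser  = c("Kansas City 10", "Baltimore 7"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split "Team Name 35" into the name and the trailing score.
split_team_score <- function(x) {
  data.frame(
    team  = sub("[[:space:]]+[0-9]+$", "", x),                     # drop the trailing number
    score = as.integer(sub("^.*[[:space:]]([0-9]+)$", "\\1", x)),  # keep only the trailing number
    stringsAsFactors = FALSE
  )
}

w <- split_team_score(sb$winner)
l <- split_team_score(sb$loser)

result <- data.frame(
  winner = w$team, winner_score = w$score,
  loser  = l$team, loser_score  = l$score,
  stringsAsFactors = FALSE
)
result
```

Anchoring the pattern on the trailing digits, rather than splitting at the first space, is what lets team names with several words survive intact.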

8. Assignment

  1. Scrape the table on Obesity_in_the_United_States
  2. Find the wind data in AESO