Downloading Data

You can read a CSV file directly from a URL with read.csv().

troll<-read.csv("https://raw.githubusercontent.com/fivethirtyeight/russian-troll-tweets/master/IRAhandle_tweets_1.csv")
head(troll$content)
## [1] "\"We have a sitting Democrat US Senator on trial for corruption and you've barely heard a peep from the mainstream media.\" ~ @nedryun https://t.co/gh6g0D1oiC"
## [2] "Marshawn Lynch arrives to game in anti-Trump shirt. Judging by his sagging pants the shirt should say Lynch vs. belt https://t.co/mLH1i30LZZ"                  
## [3] "Daughter of fallen Navy Sailor delivers powerful monologue on anthem protests, burns her NFL packers gear.  #BoycottNFL https://t.co/qDlFBGMeag"               
## [4] "JUST IN: President Trump dedicates Presidents Cup golf tournament trophy to the people of Florida, Texas and Puerto Rico. https://t.co/z9wVa4djAE"             
## [5] "19,000 RESPECTING our National Anthem! #StandForOurAnthem\U0001f1fa\U0001f1f8 https://t.co/czutyGaMQV"                                                         
## [6] "Dan Bongino: \"Nobody trolls liberals better than Donald Trump.\" Exactly!  https://t.co/AigV93aC8J"

To save a copy of the file on disk instead, use download.file().

download.file(
  url="https://raw.githubusercontent.com/fivethirtyeight/russian-troll-tweets/master/IRAhandle_tweets_1.csv",
  destfile="russiantroll1.csv",
  method="curl"
)
russiantweets<-read.csv("russiantroll1.csv", sep=",", stringsAsFactors =FALSE)
str(russiantweets)
## 'data.frame':    243891 obs. of  21 variables:
##  $ external_author_id: num  9.06e+17 9.06e+17 9.06e+17 9.06e+17 9.06e+17 ...
##  $ author            : chr  "10_GOP" "10_GOP" "10_GOP" "10_GOP" ...
##  $ content           : chr  "\"We have a sitting Democrat US Senator on trial for corruption and you've barely heard a peep from the mainstr"| __truncated__ "Marshawn Lynch arrives to game in anti-Trump shirt. Judging by his sagging pants the shirt should say Lynch vs."| __truncated__ "Daughter of fallen Navy Sailor delivers powerful monologue on anthem protests, burns her NFL packers gear.  #Bo"| __truncated__ "JUST IN: President Trump dedicates Presidents Cup golf tournament trophy to the people of Florida, Texas and Pu"| __truncated__ ...
##  $ region            : chr  "Unknown" "Unknown" "Unknown" "Unknown" ...
##  $ language          : chr  "English" "English" "English" "English" ...
##  $ publish_date      : chr  "10/1/2017 19:58" "10/1/2017 22:43" "10/1/2017 22:50" "10/1/2017 23:52" ...
##  $ harvested_date    : chr  "10/1/2017 19:59" "10/1/2017 22:43" "10/1/2017 22:51" "10/1/2017 23:52" ...
##  $ following         : int  1052 1054 1054 1062 1050 1050 1050 1050 1050 1050 ...
##  $ followers         : int  9636 9637 9637 9642 9645 9644 9644 9644 9646 9646 ...
##  $ updates           : int  253 254 255 256 246 247 248 249 250 251 ...
##  $ post_type         : chr  "" "" "RETWEET" "" ...
##  $ account_type      : chr  "Right" "Right" "Right" "Right" ...
##  $ retweet           : int  0 0 1 0 1 0 1 0 0 0 ...
##  $ account_category  : chr  "RightTroll" "RightTroll" "RightTroll" "RightTroll" ...
##  $ new_june_2018     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ alt_external_id   : num  9.06e+17 9.06e+17 9.06e+17 9.06e+17 9.06e+17 ...
##  $ tweet_id          : num  9.15e+17 9.15e+17 9.15e+17 9.15e+17 9.14e+17 ...
##  $ article_url       : chr  "http://twitter.com/905874659358453760/statuses/914580356430536707" "http://twitter.com/905874659358453760/statuses/914621840496189440" "http://twitter.com/905874659358453760/statuses/914623490375979008" "http://twitter.com/905874659358453760/statuses/914639143690555392" ...
##  $ tco1_step1        : chr  "https://twitter.com/10_gop/status/914580356430536707/video/1" "https://twitter.com/damienwoody/status/914568524449959937/video/1" "https://twitter.com/10_gop/status/913231923715198976/video/1" "https://twitter.com/10_gop/status/914639143690555392/video/1" ...
##  $ tco2_step1        : chr  "" "" "" "" ...
##  $ tco3_step1        : chr  "" "" "" "" ...
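
One detail in the str() output deserves care: the ID columns (external_author_id, tweet_id) are read as num, and integers of this size cannot all be stored exactly as doubles, so the last digits may be altered. A minimal sketch of one way around this, assuming the IDs are only needed as labels, is to read them as character through the colClasses argument of read.csv(). The two-row CSV text below is a stand-in built from the IDs visible in article_url above, not the real file.

```r
# Stand-in CSV text; the tweet IDs are taken from the article_url column above.
csv_text <- "tweet_id,author
914580356430536707,10_GOP
914621840496189440,10_GOP"

# Default behaviour: the ID column becomes a double and may lose precision.
as_num <- read.csv(text = csv_text)

# Reading the column as character keeps every digit.
as_chr <- read.csv(text = csv_text, colClasses = c(tweet_id = "character"))

class(as_num$tweet_id)             # "numeric"
class(as_chr$tweet_id)             # "character"
sprintf("%.0f", as_num$tweet_id)   # compare these digits ...
as_chr$tweet_id                    # ... with the original strings
```

The same colClasses argument could be passed when reading russiantroll1.csv if exact IDs matter for later matching.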

We can even loop the process.

root<-"https://raw.githubusercontent.com/fivethirtyeight/russian-troll-tweets/master/IRAhandle_tweets_"
for (i in 2:3) {
  download.file(
    url=paste(root,i,".csv", sep=""), # named argument, not an assignment
    destfile=paste("russiantroll",i,".csv",sep="")
  )
}
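
Once several files are on disk, they usually need to be stacked into one data frame. The sketch below shows one possible way to do that in base R. The helper read_all_csv() is a hypothetical name, and the two tiny files written to a temporary folder are stand-ins so that the example runs without a network connection.

```r
# Hypothetical helper: read several CSV files that share the same columns
# and stack them into one data frame.
read_all_csv <- function(paths) {
  pieces <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  do.call(rbind, pieces)
}

# Self-contained demonstration with two tiny stand-in files.
demo_dir <- tempfile("trolldemo")
dir.create(demo_dir)
for (i in 1:2) {
  write.csv(data.frame(author = paste0("user", i), retweet = i - 1),
            file.path(demo_dir, paste0("russiantroll", i, ".csv")),
            row.names = FALSE)
}

demo_paths <- file.path(demo_dir, paste0("russiantroll", 1:2, ".csv"))
combined <- read_all_csv(demo_paths)
nrow(combined)  # one row per stand-in file
```

With the real downloads, the call might look like read_all_csv(paste0("russiantroll", 1:3, ".csv")), assuming all three files exist in the working directory.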

Using APIs

APIs let you query a data provider and download exactly the data you need. To find an R package for a given source, search for “CRAN” plus the name of the website. For example, searching for “CRAN world bank” leads to the WDI package, which imports World Development Indicators based on your query. As an example, let’s download the oil rents and education expenditure data. To do so, we need the code of each indicator; for oil rents it is NY.GDP.PETR.RT.ZS.

library(WDI)
library(ggplot2) # needed for the plot below
mydataset<-WDI(country = 'all',
               indicator = c('oilrents' = 'NY.GDP.PETR.RT.ZS',
                             'educationexpenditure' = 'SE.XPD.TOTL.GB.ZS'), 
               start=1960,end=2019)
ggplot(data=mydataset)+aes(x=year,y=oilrents)+geom_smooth()

rm(list=ls())

Web Scraping

Scraping Table Data with htmltab

Sometimes neither an API nor a CSV file is available. In that case, we scrape the web. Let’s import the first five tables on this Wikipedia page on opinion polling on Trump.

library("htmltab")
url <- "https://en.wikipedia.org/wiki/Opinion_polling_on_the_Donald_Trump_administration"

tables<-list()
for (i in 1:5) {
  tables[[i]] <- htmltab(doc = url, which = i)
}

tables[[1]]
##          Aggregator           Segment polled Approve Disapprove
## 2   FiveThirtyEight                All polls   38.1%      58.0%
## 3   FiveThirtyEight Likely/registered voters   39.7%      56.5%
## 4   FiveThirtyEight               All adults   36.7%      59.4%
## 5 RealClearPolitics                All polls   39.7%      57.7%

Wow! Isn’t this so cool! Now let’s import our division’s seminar series programs:

url <- "https://nyuad.nyu.edu/en/academics/divisions/social-science/events/seminar-series.html"

seminarseries<-list()
for (i in 1:4) {
  seminarseries[[i]] <- htmltab(doc = url, which = i, header = 0,rm_nodata_cols = FALSE)
  colnames(seminarseries[[i]])<-c("Date","Speaker")
}
seminarseries[[2]]
##                  Date
## 1   November 30, 2020
## 2   November 18, 2020
## 3    October 14, 2020
## 4  September 30, 2020
## 5        May 13, 2020
## 6      April 29, 2020
## 7      April 22, 2020
## 8      April 15, 2020
## 9      April 13, 2020
## 10      April 1, 2020
## 11      March 5, 2020
## 12  February 12, 2020
## 13   February 5, 2020
## 14   January 29, 2020
## 15  December 11, 2019
## 16  November 25, 2019
## 17  November 13, 2019
## 18   October 30, 2019
## 19   October 23, 2019
## 20   October 21, 2019
## 21 September 25, 2019
##                                                                                                                                                           Speaker
## 1                                                                                                                           Yotam Margalit, Tel Aviv University**
## 2                                                                                                                             Andy Eggers, University of Oxford**
## 3                                                                                                                Orit Kedar, The Hebrew University of Jerusalem**
## 4                                                                                                               Noam Gidron, The Hebrew University of Jerusalem**
## 5                                                                                                                     Kenneth Benoit, London School of Economics*
## 6                                                                                                                                        Cecilia Mo, UC Berkeley*
## 7                                                                                                                  Cesi Cruz, University of California San Diego*
## 8                                                                                                                           Scott Gelbach, University of Chicago*
## 9                                                                                                                        Miriam Golden, European Union Institute*
## 10                                                                                                                          Ethan Kaplan, University of Maryland*
## 11                                                                                                                          Arthur Spirling, New York University*
## 12                        Race, Representation and Local Governments in the US South: the Effect of the Voting Rights Act Cecilia Testa, University of Nottingham
## 13                                         How Programmatic Policies Impact Clientelism: Evidence from Snow Subsidies in Japan Amy Catalinac, New York University
## 14 Changing In-Group Boundaries: The Effects of Immigration on Race Relations in the US (with Shom Mazumder and Marco Tabellini) Vicky Fouka, Stanford University
## 15                             Under the Gun: Explaining Party Violence in Karachi, Pakistan Niloufer Siddiqui, University at Albany-State University of New York
## 16                                                               A Silent Corrupting Force? Recall Elections and Criminal Sentencing Sanford Gordon, NYU Politics
## 17                                                               Imperial Machinations and (B)Uganda’s Struggle for Independence Apollo Makubuya, MMAKS Advocates
## 18                                                                           From Research to Policy: Health Reform in the United States Sherry Glied, NYU Wagner
## 19                                                                    Bureaucracies, Advisers, and Interstate Crises Robert Schub, University of Nebraska-Lincoln
## 20                                              Prestige, Duty, or Fear? Why U.S. Supreme CourtJustices Follow Public Opinion Matt Hall, University of Notre Dame
## 21                                                                     Leverage and Governance under US Hierarchy Giacomo Chiozza, American University of Sharjah
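
The scraped columns are plain text, so dates cannot be sorted or filtered yet, and some speaker names carry footnote asterisks. A minimal sketch of a cleaning step follows, using a few rows copied from the table above. It assumes an English locale, because the %B format code matches month names in the current locale.

```r
# A few rows copied from the printed table above.
seminars <- data.frame(
  Date    = c("November 30, 2020", "April 1, 2020", "May 13, 2020"),
  Speaker = c("Yotam Margalit, Tel Aviv University**",
              "Ethan Kaplan, University of Maryland*",
              "Kenneth Benoit, London School of Economics*"),
  stringsAsFactors = FALSE
)

# %B matches the full month name; this assumes an English locale.
seminars$Date <- as.Date(seminars$Date, format = "%B %d, %Y")

# Remove the trailing footnote asterisks from the speaker names.
seminars$Speaker <- sub("\\*+$", "", seminars$Speaker)

# Sort from oldest to newest.
seminars <- seminars[order(seminars$Date), ]
seminars
```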

Importing Data from HTML websites with rvest

Sometimes a website offers no directly downloadable .csv file, no API, and no table that htmltab can extract automatically, or we need only a very specific piece of information. In this case, we follow a procedure to scrape whatever data we need, using rvest to pull data out of HTML pages. rvest is designed to work with the magrittr pipe (%>%), so that you can express complex operations as pipelines composed of simple, easily understood pieces. But first, install the Chrome extension SelectorGadget.

install.packages("rvest") # only needed once
library(rvest)            # provides read_html(), html_nodes(), html_table() and %>%

Let’s say we didn’t have the htmltab package and had to extract the tables manually. We right-click the table in the browser, choose Inspect, locate the table element, and copy its XPath:

url <- "https://en.wikipedia.org/wiki/Opinion_polling_on_the_Donald_Trump_administration"
popularity <- url %>%
  read_html() %>% # reads the url and returns an xml document
  html_nodes(xpath='//*[@id="mw-content-text"]/div/table[2]') %>% # table[2] Returns the second table in page
  html_table() # puts the data in the table format.
popularity <- popularity[[1]]
popularity
##      Area polled    Segment polled                    Polling group
## 1        Georgia     Likely Voters                       AtlasIntel
## 2  United States Registered voters     NBC News/Wall Street Journal
## 3      Wisconsin Registered voters  Marquette University Law School
## 4  United States Registered voters     NBC News/Wall Street Journal
## 5  United States Registered voters                         Fox News
## 6  United States Registered voters              Monmouth University
## 7      Wisconsin Registered voters  Marquette University Law School
## 8  United States        All adults                 Grinnell College
## 9  United States Registered voters     NBC News/Wall Street Journal
## 10 United States Registered voters                         Fox News
## 11 United States Registered voters     NBC News/Wall Street Journal
## 12      New York Registered voters Siena College Research Institute
## 13 United States        All adults                              CNN
## 14 United States        All adults     NBC News/Wall Street Journal
## 15      Michigan        All adults                  NBC News/Marist
## 16 United States        All adults     NBC News/Wall Street Journal
## 17 United States        All adults                   Bloomberg News
## 18 United States        All adults     NBC News/Wall Street Journal
## 19 United States        All adults              Ipsos (for Reuters)
## 20 United States        All adults              Ipsos (for Reuters)
## 21 United States        All adults              Pew Research Center
## 22 United States        All adults       YouGov (for The Economist)
##                                 Date Donald Trump favorable
## 1  December 25, 2020–January 1, 2021                    47%
## 2                  August 9–12, 2020                    40%
## 3                   June 14–18, 2020                    42%
## 4              May 28 – June 2, 2020                    40%
## 5                    May 17–20, 2020                    43%
## 6             April 30 – May 4, 2020                    40%
## 7               November 13–17, 2019                    46%
## 8                October 17–23, 2019                    42%
## 9              September 13–16, 2019                    41%
## 10                August 11–13, 2019                    42%
## 11             September 16–19, 2018                    39%
## 12                February 5–8, 2018                    33%
## 13               January 14–18, 2018                    40%
## 14               January 13–17, 2018                    36%
## 15                August 13–17, 2017                    34%
## 16                  August 5–9, 2017                    36%
## 17                   July 8–12, 2017                    41%
## 18                 April 17–20, 2017                    39%
## 19                 April 13–17, 2017                    47%
## 20                 March 24–28, 2017                    51%
## 21      February 28 – March 12, 2017                    43%
## 22               January 23–25, 2017                    45%
##    Barack Obama favorable Sample size       Polling method Source
## 1                     52%       1,680 telephone and online    [3]
## 2                     54%         900            telephone    [4]
## 3                     61%         805            telephone    [5]
## 4                     57%       1,000            telephone    [6]
## 5                     63%       1,207            telephone    [7]
## 6                     57%         739            telephone    [8]
## 7                     54%         801            telephone    [9]
## 8                     61%       1,003            telephone   [10]
## 9                     54%         900            telephone   [11]
## 10                    60%       1,013            telephone   [12]
## 11                    54%         900            telephone   [13]
## 12                    67%         823            telephone   [14]
## 13                    66%       1,005            telephone   [15]
## 14                    57%         900            telephone   [16]
## 15                    64%         907            telephone   [17]
## 16                    51%       1,200            telephone   [18]
## 17                    61%       1,001            telephone   [19]
## 18                    52%         900            telephone   [20]
## 19                    62%       1,843               online   [21]
## 20                    64%       1,646               online   [22]
## 21                    60%       3,844 telephone and online   [23]
## 22                    54%       2,692               online   [24]

Let’s check the Energy section of the Khaleej Times business pages and scrape the headline, date, and description of each article. This time we will use SelectorGadget to create the XPaths.

url<-"https://www.khaleejtimes.com/business/energy?pagenr=1"
headlinesxpath<-'//*[contains(concat( " ", @class, " " ), concat( " ", "post-title", " " ))]//a'
descriptionxpath<-'//*[contains(concat( " ", @class, " " ), concat( " ", "post-summary", " " ))]'
datexpath<-'//*[contains(concat( " ", @class, " " ), concat( " ", "time", " " ))]'

page <- url %>%
  read_html() # reads the url once and returns an xml document

headlines <- page %>%
  html_nodes(xpath=headlinesxpath) %>% 
  html_text() # puts the data in the text format.

description <- page %>%
  html_nodes(xpath=descriptionxpath) %>% 
  html_text() # puts the data in the text format.

date <- page %>%
  html_nodes(xpath=datexpath) %>% 
  html_text() # puts the data in the text format.

KhaleeejTimes<-data.frame(headlines,description,date)
KhaleeejTimes
##                                                                                            headlines
## 1                Adnoc Drilling awarded $3.8 billion contract, underscoring strong growth trajectory
## 2                                Opec will continue with supply adjustments for oil market: Barkindo
## 3                                                     Dewa inaugurates Noor Energy 1 Visitors Centre
## 4  Expo 2020 Dubai: Sheikh Mohamed bin Zayed chairs Adnoc board of directors meeting at UAE pavilion
## 5                                             UAE: Petrol, diesel prices for December 2021 announced
## 6                              Masdar signs agreement to develop Armenia’s largest solar power plant
## 7                   UAE remains fully committed to Declaration of Cooperation 'OPEC+', says Ministry
## 8                         UAE rules out supply hike as US-led group set to release oil from reserves
## 9                             UAE supports international efforts to promote renewable  energy sector
## 10                                                                            Oil volatility is back
##                                                                                                                                                                                     description
## 1  Five-year agreement for provision of drilling, workover and well services to Adnoc Onshore; Contract reinforces Adnoc Drilling’s unique position as sole drilling services provider to Adnoc
## 2                                                                              Barkindo said in terms of oil demand the estimate at the moment was for a growth of 5.7 million barrels per day.
## 3                                                                    The 950-megawatt (MW) phase has investments totalling Dh15.78 billion based on the Independent Power Producer (IPP) model.
## 4                                                                          The Board of Directors approved Adnoc’s five year business plan and capital expenditure Dh466 billion for 2022-2026.
## 5                                                                                                                                  Here's how much it will cost to tank up your car next month.
## 6                                                                   Armenia is looking to increase the share of renewables in its energy mix and reduce its dependence on imported oil and gas.
## 7                                                                                                                           Ministry of Energy and Infrastructure issues statement on Thursday.
## 8                                                                                The White House said on Tuesday that the US would release 50 million barrels of crude from strategic reserves.
## 9                                                                                                                    Suhail Al Mazrouei, Israeli Minister of Energy sign MoU at Expo 2020 Dubai
## 10                                                                             Crude may trade under pressure as prices slumps below $80 on resurgent European Covid fears, release of reserves
##           date
## 1   3 days ago
## 2   1 week ago
## 3   1 week ago
## 4   1 week ago
## 5   1 week ago
## 6  2 weeks ago
## 7  2 weeks ago
## 8  2 weeks ago
## 9  2 weeks ago
## 10 3 weeks ago

Let’s automate the process:

KhaleeejTimes=list()
for (i in 1:5) {
  url<-paste("https://www.khaleejtimes.com/business/energy?pagenr=",i, sep="")
  headlinesxpath<-'//*[contains(concat( " ", @class, " " ), concat( " ", "post-title", " " ))]//a'
  descriptionxpath<-'//*[contains(concat( " ", @class, " " ), concat( " ", "post-summary", " " ))]'
  datexpath<-'//*[contains(concat( " ", @class, " " ), concat( " ", "time", " " ))]'

  page <- url %>%
    read_html() # reads the url once and returns an xml document

  headlines <- page %>%
    html_nodes(xpath=headlinesxpath) %>% 
    html_text() # puts the data in the text format.
  
  description <- page %>%
    html_nodes(xpath=descriptionxpath) %>% 
    html_text() # puts the data in the text format.
  
  date <- page %>%
    html_nodes(xpath=datexpath) %>% 
    html_text() # puts the data in the text format.
  
  KhaleeejTimes[[i]]<-data.frame(headline=headlines,description=description,date=date)
}

library(purrr)
library(dplyr)
mydat<-map_dfr(KhaleeejTimes, bind_rows)
unique(mydat$headline)
##  [1] "Adnoc Drilling awarded $3.8 billion contract, underscoring strong growth trajectory"                              
##  [2] "Opec will continue with supply adjustments for oil market: Barkindo"                                              
##  [3] "Dewa inaugurates Noor Energy 1 Visitors Centre"                                                                   
##  [4] "Expo 2020 Dubai: Sheikh Mohamed bin Zayed chairs Adnoc board of directors meeting at UAE pavilion"                
##  [5] "UAE: Petrol, diesel prices for December 2021 announced"                                                           
##  [6] "Masdar signs agreement to develop Armenia’s largest solar power plant"                                            
##  [7] "UAE remains fully committed to Declaration of Cooperation 'OPEC+', says Ministry"                                 
##  [8] "UAE rules out supply hike as US-led group set to release oil from reserves"                                       
##  [9] "UAE supports international efforts to promote renewable  energy sector"                                           
## [10] "Oil volatility is back"                                                                                           
## [11] "Adnoc secures $3 billion loan from JBIC and four other banks"                                                     
## [12] "Japan PM confirms oil reserves may be released to curb prices"                                                    
## [13] " Fujairah marine fuel sales rise 22% to record high"                                                              
## [14] "Microsoft aims to be carbon negative by 2030"                                                                     
## [15] "UAE, Russia to bolster industrial collaboration in hydrogen fuel technology"                                      
## [16] "Aveva ready to shape sustainable industries of future"                                                            
## [17] "Opec should release excess capacity to bring down oil prices: Indian minister"                                    
## [18] "Leading Indian lubricants production company opens first plant in Sharjah   "                                     
## [19] "Come, join us in energy transition journey: UAE tells the world"                                                  
## [20] "Adnoc, Borealis ink Dh22 billion agreement to expand Borouge facility"                                            
## [21] "Israeli manufacturer inks pact with UAE company"                                                                  
## [22] "Adnoc to invest $6 billion to enable drilling growth"                                                             
## [23] "Adnoc, Taqa form global green energy venture"                                                                     
## [24] "2022 going to be a year of balance after turbulence in 2021"                                                      
## [25] "UAE sees oil supply surplus by Q1 2022 "                                                                          
## [26] "Schneider Electric unveils technologies for energy sector at ADIPEC 2021"                                         
## [27] "Taqa posts Dh4.3 billion net income in 9 months of 2021"                                                          
## [28] "Adnoc Drilling 9-month revenue surges 12% to $1.7 billion"                                                        
## [29] "Opec+ able to increase oil supply in case of market demand"                                                       
## [30] "Sustainable finance key to net-zero goal"                                                                         
## [31] "Virgin Mobile UAE urges collaborative approach to climate change with  launch of carbon offsetting app initiative"
## [32] "UAE: Construction of Barakah nuclear plant's Unit 3 complete"                                                     
## [33] "Dubai sets world record with first 3D-printed lab"                                                                
## [34] "Saudi Aramco sees third-quarter income rise to $30.4 billion"                                                     
## [35] "UAE: Petrol, diesel prices for November 2021 announced"                                                           
## [36] "Abdullah bin Zayed leads UAE delegation ‏to COP 26 Glasgow"                                                        
## [37] "Video: Sheikh Hamdan bin Zayed attends opening of ConvEx-3 'Barakah UAE'"                                         
## [38] "Adnoc, Ewec ink partnership"                                                                                      
## [39] "UAE welcomes Saudi Arabia’s net zero target"                                                                      
## [40] "Race to zero carbon emission"                                                                                     
## [41] "UAE committed to India’s energy security"                                                                         
## [42] "Etihad Credit Insurance, SkyPower sign sustainability partnership"                                                
## [43] "FTSE adds Adnoc Drilling to 3 of its global equity indices"                                                       
## [44] "Building sustainable energy future"                                                                               
## [45] "Dubai Supreme Council of Energy launches Circular Economy Committee"                                              
## [46] "Aveva charts progress to net zero, gender equality"                                                               
## [47] "Energy crunch to boost oil demand"                                                                                
## [48] "Oil extends rally to multi-year peaks as energy crunch bites"                                                     
## [49] "Bain & Company to reduce business travel emissions by 35 per cent per employee"                                   
## [50] "Oil could climb to $100 as Opec+ opts against output boost"
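
unique() on the headline column returns only the headlines. If the same article appears on more than one page and the other columns should be kept, duplicated() can drop the repeated rows instead. A small sketch follows with two headlines taken from the output above; the page column and the repeated row are illustrative assumptions.

```r
# Two headlines from the output above; the repeated row and the page
# numbers are made up to illustrate the idea.
mydat_demo <- data.frame(
  headline = c("Oil volatility is back",
               "Adnoc, Ewec ink partnership",
               "Oil volatility is back"),
  page = c(1, 4, 2),
  stringsAsFactors = FALSE
)

# Keep the first occurrence of each headline together with its other columns.
deduped <- mydat_demo[!duplicated(mydat_demo$headline), ]
deduped
```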

Let’s look at the State of the Union speeches by American presidents from 1790 to the present.

url<-"https://www.presidency.ucsb.edu/documents/presidential-documents-archive-guidebook/annual-messages-congress-the-state-the-union"
links<-url %>%
  read_html() %>% # reads the url and returns an xml document
  html_nodes(xpath='//tbody//a') %>% 
  html_attr("href") # extracts the attribute data (urls behind the hyperlinks).
links<-links[!is.na(links)]
links<-links[links!="#nixon1973"]

speeches<-list()
for (i in 1:2) { # only the first two links, to keep the example short
  url<-links[i]
  presidentx<-'//*[contains(concat( " ", @class, " " ), concat( " ", "diet-title", " " ))]//a'
  datex<-'//*[contains(concat( " ", @class, " " ), concat( " ", "date-display-single", " " ))]'
  speechx<-'//*[contains(concat( " ", @class, " " ), concat( " ", "field-docs-content", " " ))]//p'
  
  urlxml<-url %>%
    read_html()  # reads the url and returns an xml document
  
  pres <- urlxml %>% 
    html_nodes(xpath=presidentx) %>% 
    html_text() # puts the data in the text format.
  
  date <- urlxml %>% 
    html_nodes(xpath=datex) %>% 
    html_text() # puts the data in the text format.
  
  speech <- urlxml %>%  
    html_nodes(xpath=speechx) %>% 
    html_text() # puts the data in the text format.
  
  speeches[[i]]<-data.frame(President=pres,Speech=speech,Date=date)
}
speeches<-bind_rows(speeches) # stacks the per-speech data frames into one
head(speeches)
##         President
## 1 Joseph R. Biden
## 2 Joseph R. Biden
## 3 Joseph R. Biden
## 4 Joseph R. Biden
## 5 Joseph R. Biden
## 6 Joseph R. Biden
##                                                                                                                                                                                                                                                                                                                                                                                                        Speech
## 1                                                                                                                                                                                                                          The President. Thank you. Thank you. Thank you. Good to be back. As Mitch and Chuck will understand, it's good to be almost home, down the hall. [Laughter] Anyway, thank you all.
## 2                                                                                                                                                                                                                                                   Madam Speaker, Madam Vice President—no President has ever said those words from this podium. No President has ever said those words, and it's about time.
## 3                                                                                                        First Lady—I'm her husband; Second Gentleman; Chief Justice; Members of the United States Congress and the Cabinet; distinguished guests; my fellow Americans: While the setting tonight is familiar, this gathering is just a little bit different, a reminder of the extraordinary times we're in.
## 4                                                          Throughout our history, Presidents have come to this Chamber to speak to Congress, to the Nation, and to the world; to declare war, to celebrate peace; to announce new plans and possibilities. Tonight I come to talk about crisis and opportunity, about rebuilding the Nation, revitalizing our democracy, and winning the future for America.
## 5                                         I stand here tonight, 1 day shy of the hundredth day of my administration, hundred days since I took the oath of office and lifted my hand off our family Bible and inherited a Nation—we all did—that was in crisis: the worst pandemic in a century, the worst economic crisis since the Great Depression, the worst attack on our democracy since the Civil War.
## 6 Now, after just 100 days, I can report to the Nation: America is on the move again, turning peril into possibility, crisis to opportunity, setbacks into strength. We all know life can knock us down. But, in America, we never, ever, ever stay down. Americans always get up. Today, that's what we're doing: America is rising anew, choosing hope over fear, truth over lies, and light over darkness.
##             Date
## 1 April 28, 2021
## 2 April 28, 2021
## 3 April 28, 2021
## 4 April 28, 2021
## 5 April 28, 2021
## 6 April 28, 2021
unique(speeches$President)
## [1] "Joseph R. Biden" "Donald J. Trump"
unique(speeches$Date)
## [1] "April 28, 2021"    "February 28, 2017"

To get the speeches of all presidents from 1790 to today, the loop would run over every link instead: for (i in seq_along(links))
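
When looping over every link, a single page that fails to load would stop the whole loop. A minimal sketch of a more forgiving pattern follows: wrap each request in tryCatch(), skip failures, and pause briefly between requests. The function scrape_one() and the demo links are stand-ins so the example runs offline; in practice its body would hold the read_html() and html_nodes() steps shown above.

```r
# Stand-in for the real scraping step; it fails on one input on purpose.
scrape_one <- function(link) {
  if (link == "bad-link") stop("page could not be read")
  data.frame(link = link, n_chars = nchar(link), stringsAsFactors = FALSE)
}

demo_links <- c("speech-1", "bad-link", "speech-3")

results <- lapply(demo_links, function(link) {
  Sys.sleep(0.1)  # short pause between requests; lengthen for real sites
  tryCatch(
    scrape_one(link),
    error = function(e) {
      message("Skipping ", link, ": ", conditionMessage(e))
      NULL  # failed pages contribute nothing
    }
  )
})

# Drop the failed pages, then stack the rest.
results <- Filter(Negate(is.null), results)
scraped <- do.call(rbind, results)
scraped
```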

These days I am looking for a new apartment. All I want is a large apartment in a good location with the cheapest possible rent:

root<-"https://abudhabi.dubizzle.com/en/property-for-rent/residential/?filters=%28bedrooms%3E%3D2+AND+bedrooms%3C%3D2%29+AND+%28price%3E%3D60000+AND+price%3C%3D75000%29+AND+%28neighborhoods.ids%3D57333+OR+neighborhoods.ids%3D57327%29&sort=lowest&page="

Titlepath<-'//*[contains(concat( " ", @class, " " ), concat( " ", "irHTOV", " " ))]'
Pricepath<-'//*[contains(concat( " ", @class, " " ), concat( " ", "dfkLOe", " " ))]'
Spacepath<-'//*[contains(concat( " ", @class, " " ), concat( " ", "cZBzoq", " " ))]'

props<-list()
for (i in 1:4) {
  url<-paste(root,i,sep="")

  urlxml<-url %>%
    read_html()

  Title <-urlxml  %>%  
    html_nodes(xpath=Titlepath) %>% 
    html_text()  

  Price <- urlxml %>% 
    html_nodes(xpath=Pricepath) %>% 
    html_text()  

  Info <- urlxml %>%  
    html_nodes(xpath=Spacepath) %>% 
    html_text()  

  Space<-Info[grep("SqFt", Info)]

  props[[i]]<-data.frame(Title=Title,
                         Price=Price,
                         Space=Space)
}


data<-purrr::map_dfr(props, bind_rows) %>%
  mutate(Price=as.numeric(gsub(",","",Price))) %>%
  mutate(Space=gsub(",","",Space)) %>% 
  mutate(Space=as.numeric(gsub("SqFt","",Space)))

library(ggplot2) # needed for ggplot(); harmless if already loaded
ggplot(data=data) + aes(y=Price/1000, x=Space)+
  geom_jitter()+
  geom_smooth()+theme_bw()
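
Since the goal is a large apartment at the lowest rent, one simple way to rank listings is rent per square foot. The sketch below illustrates the idea with toy values invented for this example; they are not scraped results. With the real data, the same two lines could be applied to the Price and Space columns built above.

```r
# Toy listings invented for illustration; not real scraped data.
listings <- data.frame(
  Title = c("Listing A", "Listing B", "Listing C"),
  Price = c(60000, 72000, 65000),  # rent, in the listing's currency
  Space = c(1000, 1500, 1300),     # floor area in square feet
  stringsAsFactors = FALSE
)

# Rent per square foot: lower means more space for the money.
listings$PricePerSqFt <- listings$Price / listings$Space

# Best value first.
ranked <- listings[order(listings$PricePerSqFt), ]
ranked
```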