Ch2. HTML

2.1 Browser presentation and source code

HTML

  1. HTML’s marked-up structure

    • Markup definitions: the tags
  2. Web content is an interpreted version of the source code

    • How the document is structured and the function of its various parts: headlines, links, tables, etc…

    • Element inspector

2.2 Syntax rules

  1. Tags, elements, and attributes

Elements

<title>First HTML</title>

Attributes

<a href="http://www.r-datacollection.com/">Link to Homepage</a>

http://www.r-datacollection.com/bookmaterials.html
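
To make the vocabulary concrete, the anchor element above can be taken apart in R. This is a minimal sketch that assumes the XML package (used later in this chapter) is installed; the variable names are illustrative only.

```r
library(XML)

# The anchor element from the text, held as a plain string
snippet <- '<a href="http://www.r-datacollection.com/">Link to Homepage</a>'

# asText = TRUE tells the parser that the input is HTML text, not a file or URL
doc <- htmlParse(snippet, asText = TRUE)

# The parser wraps the fragment in <html><body>; pick the first <a> node
a_node <- getNodeSet(doc, "//a")[[1]]

tag_name  <- xmlName(a_node)             # the tag name: "a"
link_url  <- xmlGetAttr(a_node, "href")  # the value of the href attribute
link_text <- xmlValue(a_node)            # the content between start and end tag
```

The element is the whole unit (start tag, content, end tag), while the attribute `href` lives inside the start tag and carries the link target.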

  1. Tree structure

    <p>First HTML</p>

    I am your first HTML-file!

A tree perspective on HTML
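
The tree view can be reproduced in R on a small document. The sketch below assumes the XML package is installed and uses a made-up minimal page built from the snippets above; it only illustrates how nesting of tags becomes a parent-child relation.

```r
library(XML)

# A minimal page: <html> is the root, <head> and <body> are its children
page <- paste0(
  "<html>",
  "<head><title>First HTML</title></head>",
  "<body><p>I am your first HTML-file!</p></body>",
  "</html>"
)

doc  <- htmlParse(page, asText = TRUE)
root <- xmlRoot(doc)

root_name   <- xmlName(root)                               # name of the root node
child_names <- unname(sapply(xmlChildren(root), xmlName))  # names of its children
title_text  <- xpathSApply(doc, "//title", xmlValue)       # a leaf reached via the tree
```

Here `<title>` is a child of `<head>`, which in turn is a child of `<html>`; the XPath expression `//title` walks that hierarchy instead of searching raw text.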

  1. Comments

  2. Reserved and special characters

    <p>5 &lt; 6 but 7 &gt; 3 </p>
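
When HTML is generated from R, reserved characters have to be replaced by their entities before the text is placed between tags. The helper below is a hypothetical base-R sketch (the name `escape_html` is not from any package) that covers only the four most common cases.

```r
# Hypothetical helper: replace reserved characters by their HTML entities.
# The ampersand must be handled first, otherwise the "&" of the entities
# inserted in later steps would be escaped a second time.
escape_html <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  x <- gsub('"', "&quot;", x, fixed = TRUE)
  x
}

escaped   <- escape_html("5 < 6 but 7 > 3")
paragraph <- paste0("<p>", escaped, "</p>")
```

With this input, `escaped` contains the same entity notation as the paragraph shown above.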

HTML entities

Character   Entity name   Explanation
"           &quot;        quotation mark
'           &apos;        apostrophe
&           &amp;         ampersand
<           &lt;          less than
>           &gt;          greater than
(space)     &nbsp;        non-breaking space
  1. Document type definition

    <!DOCTYPE html>
  2. Spaces and line breaks

    Writing code is poetry

    Writing
     code
     is
     poetry

    Writing
        code
          is
     poetry
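
Browsers treat any run of spaces, tabs, and line breaks in the source as a single space, which is why the three variants above are displayed identically. The base-R sketch below only approximates that behaviour with a regular expression; it is not what a browser actually executes.

```r
# Three ways of writing the same sentence in the source code
variants <- c(
  "Writing code is poetry",
  "Writing\n code\n is\n poetry",
  "Writing\n    code\n      is\n poetry"
)

# Collapse every run of whitespace into one space, then trim the ends
collapsed <- trimws(gsub("[[:space:]]+", " ", variants))

# All variants end up as the same string
all_same <- length(unique(collapsed)) == 1
```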

2.3 Tags and attributes

  1. The anchor tag <a>

The tag <a> is what turns HTML from just a markup language into a hypertext markup language by enabling HTML documents to link to other documents.

- Linking to another document
  1. The metadata tag <meta>

The <meta> tag provides meta information on the HTML document.

- Specifying keywords
- Asking robots not to index the page or to follow its links
- Declaring character encoding
- Defining character encodings
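
As an illustration of these uses, the sketch below builds a small page with three `<meta>` elements and reads their attributes back in R. The page content (keywords, robots instruction) is invented for the example, and the XML package is assumed to be installed.

```r
library(XML)

# Invented example page with keyword, robots, and charset declarations
page <- paste0(
  "<html><head>",
  '<meta name="keywords" content="Automation, data, R">',
  '<meta name="robots" content="noindex,nofollow">',
  '<meta charset="utf-8">',
  "</head><body><p>Hello</p></body></html>"
)

doc <- htmlParse(page, asText = TRUE)

# Select <meta> elements by attribute and pull out the value of interest
keywords <- xpathSApply(doc, "//meta[@name='keywords']", xmlGetAttr, "content")
robots   <- xpathSApply(doc, "//meta[@name='robots']", xmlGetAttr, "content")
charset  <- xpathSApply(doc, "//meta[@charset]", xmlGetAttr, "charset")
```

A scraper that wants to respect the page owner's wishes could inspect `robots` before following any links on the page.
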
  1. The external reference tag <link>

The <link> tag is used to link to and include information and external files.

- Specifying style sheets to use
- Specifying the icon associated with the website
  1. Emphasizing tags <b>, <i>, <strong>

    • Text with bold type setting
    • Text set in italics
    • Text defined as important
  2. The paragraphs tag <p>

  3. Heading tags <h1>, <h2>, <h3>, …

  4. Listing content with <ul>, <ol>, and <dl>

  5. The organizational tags <div> and <span>

While <div> and <span> themselves do not change the appearance of the content they enclose, these tags are used to group parts of the document.

- <div> defines groups across lines, tags, and paragraphs
- <span> is used for in-line grouping
- CSS
     div.happy {color:pink;font-family:"Comic Sans MS";font-size:120%}
     span.happy {color:pink;font-family:"Comic Sans MS";font-size:120%} 
     
     <link href="htmlresources/awesomestyle.css" rel="stylesheet" type="text/css"/>
     

The purpose of CSS is to separate content from layout, which improves the document’s accessibility. Defining styles outside of an HTML document and assigning them via the class attribute enables the web designer to reuse styles across elements and documents. A style can then be changed in one single place, the CSS file, and the change takes effect on all elements and documents that use it.
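
For data collection, the class attribute is useful because it labels groups of content that belong together. The sketch below assumes the XML package is installed and uses an invented page that applies the class `happy` from the style definitions above; it selects content by class rather than by position.

```r
library(XML)

# Invented page: one <div> and one <span> carry the class "happy"
page <- paste0(
  "<html><body>",
  '<div class="happy"><p>I am a happy paragraph.</p></div>',
  '<p>I am plain, but <span class="happy">this part</span> is happy.</p>',
  "</body></html>"
)

doc <- htmlParse(page, asText = TRUE)

# Pick elements by tag name and class attribute
happy_divs  <- xpathSApply(doc, "//div[@class='happy']", xmlValue)
happy_spans <- xpathSApply(doc, "//span[@class='happy']", xmlValue)
```

Note that the exact-match test `@class='happy'` would miss elements that carry several classes at once; that case needs a more careful XPath expression.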

  1. The <form> tag and its companions

  2. The foreign script tag <script>

HTTP

  1. Table tags <table>, <tr>, <td>, and <th>

    • new rows with <tr>
    • <td> for defining cells
    • <th> for header cells
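
The three table tags map directly onto a data frame: each `<tr>` becomes a row and each `<td>` a cell. The sketch below assumes the XML package is installed and uses a tiny invented table; `header = TRUE` asks `readHTMLTable()` to use the first row as column names.

```r
library(XML)

# Invented two-column table: one header row (<th>) and two data rows (<td>)
tbl_html <- paste0(
  "<table>",
  "<tr><th>name</th><th>year</th></tr>",
  "<tr><td>Abu Mena</td><td>1979</td></tr>",
  "<tr><td>Ancient City of Aleppo</td><td>1986</td></tr>",
  "</table>"
)

doc <- htmlParse(tbl_html, asText = TRUE)

# readHTMLTable() returns a list with one element per <table> in the document
tabs <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)
tab  <- tabs[[1]]
```

All cells arrive as character strings, so numeric columns such as the year still have to be converted afterwards.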

2.4 Parsing

Loading and representing the contents of HTML/XML files in an R session

  1. Inspecting content on the Web: browser to display HTML content nicely

  2. Importing HTML files into R and extracting info. from them: parser in R to construct useful representations of HTML documents

What is parsing?

Reading vs. Parsing

Reading does not attempt to understand the formal grammar that underlies HTML; it merely recognizes the sequence of symbols in the HTML file. In other words, reading just loads the content of an HTML file into an R session.

url <- "https://news.v.daum.net/v/20210922193955830"
example <- readLines(url)
## Warning in readLines(url): incomplete final line found on
## 'https://news.v.daum.net/v/20210922193955830'
example <- paste0(example, collapse = " ")

class(example)
## [1] "character"
library(httr)
url <- "https://news.v.daum.net/v/20210922193955830"
example <- httr::GET(url)
example
## Response [https://news.v.daum.net/v/20210922193955830]
##   Date: 2021-09-23 02:58
##   Status: 200
##   Content-Type: text/html;charset=UTF-8
##   Size: 59.2 kB
## <!doctype html>
## <html lang="ko"> 
##  <head data-cloud-area="head"> 
##   <meta charset="utf-8"> 
##   <meta http-equiv="X-UA-Compatible" content="IE=edge"> 
##   <style>
##             @import url('//t1.daumcdn.net/harmony_static/cloud/page/c8a258a8a...
##             @import url('//t1.daumcdn.net/harmony_static/cloud/2021/09/14/com...
##         </style> 
##   <style>
## ...
class(example)
## [1] "response"

GET() is agnostic about the different tag elements (name, attribute, values, etc.) and produces results that do not reflect the document’s internal hierarchy as implied by the nested tags in any sensible way.

To achieve a useful representation of HTML files, we need to employ a program that understands the special meaning of the markup structures and reconstructs the implied hierarchy of an HTML file within some R-specific data structure.
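
The difference can be seen without a network connection. In the sketch below (XML package assumed; the HTML string is invented), the same content is first held as a plain character string and then as a parsed document that can be queried by its structure.

```r
library(XML)

# Invented HTML content, as it would arrive from readLines() or GET()
raw_html <- paste0(
  "<html><body>",
  "<p>First <b>bold</b> text</p>",
  "<p>Second paragraph</p>",
  "</body></html>"
)

# Reading: R only sees a sequence of characters
is_plain_string <- is.character(raw_html)

# Parsing: the nested tags become a tree that can be queried
doc <- htmlParse(raw_html, asText = TRUE)
paragraphs <- xpathSApply(doc, "//p", xmlValue)
bold_parts <- xpathSApply(doc, "//p/b", xmlValue)
```

Getting the same paragraphs out of `raw_html` would require hand-written string matching that has to anticipate every nesting of tags.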

Transformation from any HTML file to a queryable Document Object Model: Parsing using XML package in two steps

  1. ```htmlParse()``` first parses the entire target document and creates the DOM in a tree-like data structure of the C language.
  2. The C-level node structure is converted into an object of the R language through handler functions.
library(XML)
parsed_example <- htmlParse(example)
class(parsed_example)
## [1] "HTMLInternalDocument" "HTMLInternalDocument" "XMLInternalDocument" 
## [4] "XMLAbstractDocument"
#parsed_example

Extracting information in the building process
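
One way to extract information while the tree is being built is to pass handler functions to `htmlTreeParse()`. The sketch below assumes the XML package is installed; the page and the helper `collect_italics` are invented for illustration. A handler named after a tag is called for each node of that name.

```r
library(XML)

# Invented page with two italic passages
page <- "<html><body><p>Writing <i>code</i> is <i>poetry</i></p></body></html>"

# Hypothetical helper: a closure that remembers the text of every <i> node
collect_italics <- function() {
  found <- character()
  list(
    # called for every <i> node during parsing
    i = function(node, ...) {
      found <<- c(found, xmlValue(node))
      node  # return the node so that it stays in the tree
    },
    # accessor for the collected values (not triggered by any tag here)
    result = function() found
  )
}

h <- collect_italics()
invisible(htmlTreeParse(page, asText = TRUE, handlers = h))
italics <- h$result()
```

A handler that returns `NULL` instead of the node drops that node from the resulting tree, which can be used to discard unwanted parts of a large document.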

Assignment

  1. Select 1) a DAUM news page of your interest in your browser

  2. Have a look at the source code

  3. Inspect various elements in the Inspect Elements tool of your browser

  4. Copy and paste the elements for the outlet, the upload date, the headline, any highlight, the body content, and the comments (댓글)

  5. Check and report the structure of the elements.

Case study: World Heritage Sites in Danger

  1. Generate some research questions

    • 1,121 heritage sites like the Pyramids in Egypt

    • Which sites are threatened and where are they located?

    • Are there regions in the world where sites are more endangered than in others?

    • What are the reasons that put a site at risk?

    Questions to be considered in data collection

    1. What type of data is most suited to answer your question?

    2. Is the quality of the data sufficiently high to answer your question?

    3. Is the information systematically flawed (biased)?

  2. Find a source of data that can be used to answer the questions

https://en.wikipedia.org/wiki/List_of_World_Heritage_in_Danger

  3. Import content from a web page into R
library(httr)
library(XML)
library(stringr)
library(maps)

heritage_page <- GET("https://en.wikipedia.org/wiki/List_of_World_Heritage_in_Danger")
heritage_page
## Response [https://en.wikipedia.org/wiki/List_of_World_Heritage_in_Danger]
##   Date: 2021-09-23 02:07
##   Status: 200
##   Content-Type: text/html; charset=UTF-8
##   Size: 513 kB
## <!DOCTYPE html>
## <html class="client-nojs" lang="en" dir="ltr">
## <head>
## <meta charset="UTF-8"/>
## <title>List of World Heritage in Danger - Wikipedia</title>
## <script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames...
## ,"Geographic coordinate lists","Articles with Geo","Featured lists","Lists of...
## "wgGEAskQuestionEnabled":!1,"wgGELinkRecommendationsFrontendEnabled":!1,"wgCe...
## "ext.visualEditor.desktopArticleTarget.init","ext.visualEditor.targetLoader",...
## <script>(RLQ=window.RLQ||[]).push(function(){mw.loader.implement("user.option...
## ...
class(heritage_page)
## [1] "response"
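
Before parsing, it can help to confirm that the request succeeded and to pull the HTML out of the response as text. The helper below is a hypothetical sketch (the name `fetch_html` is not part of httr); it relies on `GET()`, `stop_for_status()`, and `content()` from the httr package and needs a network connection when it is called.

```r
library(httr)

# Hypothetical helper: download a page and return its HTML as one string
fetch_html <- function(url) {
  resp <- GET(url)
  # turn HTTP errors (4xx, 5xx) into R errors instead of parsing an error page
  stop_for_status(resp)
  # extract the body as text; UTF-8 matches the Content-Type shown above
  content(resp, as = "text", encoding = "UTF-8")
}

# Usage (requires network access, so it is left commented out):
# heritage_html <- fetch_html("https://en.wikipedia.org/wiki/List_of_World_Heritage_in_Danger")
```
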
  4. Parse the content in the form of an HTML document and extract HTML tables
heritage_parsed <- htmlParse(heritage_page, encoding = "UTF-8")
class(heritage_parsed)
## [1] "HTMLInternalDocument" "HTMLInternalDocument" "XMLInternalDocument" 
## [4] "XMLAbstractDocument"
tables <- readHTMLTable(heritage_parsed, stringsAsFactors = FALSE)
class(tables)
## [1] "list"
length(tables)
## [1] 5
tables[[1]]
##                                                            V1
## 1 Map this section's coordinates using: OpenStreetMap<U+00A0>
## 2                                Download coordinates as: KML
tables[[2]][1:10,]
##                                     V1    V2
## 1                                 Name Image
## 2                             Abu Mena      
## 3      Air and Tenere Natural Reserves      
## 4               Ancient City of Aleppo      
## 5                Ancient City of Bosra      
## 6             Ancient City of Damascus      
## 7   Ancient Villages of Northern Syria      
## 8        Archaeological Site of Cyrene      
## 9  Archaeological Site of Leptis Magna      
## 10     Archaeological Site of Sabratha      
##                                                                                                                                                                                                                                                                                                                                                                                                                V3
## 1                                                                                                                                                                                                                                                                                                                                                                                                        Location
## 2  EgyAbusir,<U+00A0>Egypt.mw-parser-output .geo-default,.mw-parser-output .geo-dms,.mw-parser-output .geo-dec{display:inline}.mw-parser-output .geo-nondefault,.mw-parser-output .geo-multi-punct{display:none}.mw-parser-output .longitude,.mw-parser-output .latitude{white-space:nowrap}30°50′30″N 29°39′50″E<U+FEFF> / <U+FEFF>30.84167°N 29.66389°E<U+FEFF> / 30.84167; 29.66389<U+FEFF> (Abu Mena)
## 3                                                                                                                                                                                                                                                    Niger1Arlit Department,<U+00A0>Niger18°17′N 8°0′E<U+FEFF> / <U+FEFF>18.283°N 8.000°E<U+FEFF> / 18.283; 8.000<U+FEFF> (Air and Tenere Natural Reserves)
## 4                                                                                                                                                                                                                                                            Aleppo Governorate, <U+00A0>Syria36°14′N 37°10′E<U+FEFF> / <U+FEFF>36.233°N 37.167°E<U+FEFF> / 36.233; 37.167<U+FEFF> (Ancient City of Aleppo)
## 5                                                                                                                                                                                                                                               Daraa Governorate, <U+00A0>Syria32°31′5″N 36°28′54″E<U+FEFF> / <U+FEFF>32.51806°N 36.48167°E<U+FEFF> / 32.51806; 36.48167<U+FEFF> (Ancient City of Bosra)
## 6                                                                                                                                                                                                                                        Damascus Governorate, <U+00A0>Syria33°30′41″N 36°18′23″E<U+FEFF> / <U+FEFF>33.51139°N 36.30639°E<U+FEFF> / 33.51139; 36.30639<U+FEFF> (Ancient City of Damascus)
## 7                                                                                                                                                                                                                                                     <U+00A0>Syria36°20′3″N 36°50′39″E<U+FEFF> / <U+FEFF>36.33417°N 36.84417°E<U+FEFF> / 36.33417; 36.84417<U+FEFF> (Ancient Villages of Northern Syria)
## 8                                                                                                                                                                                                                                         LibJebel Akhdar,<U+00A0>Libya32°49′30″N 21°51′30″E<U+FEFF> / <U+FEFF>32.82500°N 21.85833°E<U+FEFF> / 32.82500; 21.85833<U+FEFF> (Archaeological Site of Cyrene)
## 9                                                                                                                                                                                                                                          LibKhoms,<U+00A0>Libya32°38′18″N 14°17′35″E<U+FEFF> / <U+FEFF>32.63833°N 14.29306°E<U+FEFF> / 32.63833; 14.29306<U+FEFF> (Archaeological Site of Leptis Magna)
## 10                                                                                                                                                                                                                                           LibSabratha,<U+00A0>Libya32°48′19″N 12°29′6″E<U+FEFF> / <U+FEFF>32.80528°N 12.48500°E<U+FEFF> / 32.80528; 12.48500<U+FEFF> (Archaeological Site of Sabratha)
##                               V4                     V5         V6           V7
## 1                       Criteria          Areaha (acre) Year (WHS)   Endangered
## 2                  Cultural:(iv)              182 (450)       1979 2001<U+2013>
## 3       Natural:(vii), (ix), (x) 7,736,000 (19,120,000)       1991 1992<U+2013>
## 4             Cultural:(iii)(iv)              350 (860)       1986 2013<U+2013>
## 5          Cultural:(i)(iii)(vi)               <U+2014>       1980 2013<U+2013>
## 6  Cultural:(i)(ii)(iii)(iv)(vi)               86 (210)       1979 2013<U+2013>
## 7          Cultural:(iii)(iv)(v)        12,290 (30,400)       2011 2013<U+2013>
## 8     Cultural:(ii), (iii), (vi)               <U+2014>       1982 2016<U+2013>
## 9      Cultural:(i), (ii), (iii)               <U+2014>       1982 2016<U+2013>
## 10                Cultural:(iii)               <U+2014>       1982 2016<U+2013>
##                                                                                                                                             V8
## 1                                                                                                                                       Reason
## 2                               Cave-ins in the area caused by the clay at the surface, which becomes semi-liquid when met with "excess water"
## 3  Military conflict and civil disturbance in the region as well as a reduction of wildlife population and degradation of the vegetation cover
## 4                                                  Syrian Civil War, currently held by the government. Bombings continue threatening the site.
## 5                                                                                                    Syrian Civil War, held by the government.
## 6                                Syrian Civil War, rebel gunfire and mortar shelling, mainly from adjacent Jobar suburb endangers foundations.
## 7                                                Syrian Civil War, some held by rebels. Reports of looting and demolitions by Islamist groups.
## 8                                                   Libyan Civil War, presence of armed groups, already incurred and potential further damage.
## 9                                                   Libyan Civil War, presence of armed groups, already incurred and potential further damage.
## 10                                                  Libyan Civil War, presence of armed groups, already incurred and potential further damage.
##              V9
## 1          Refs
## 2  [17][18][19]
## 3      [20][21]
## 4          [22]
## 5          [23]
## 6          [24]
## 7          [25]
## 8      [26][27]
## 9      [27][28]
## 10     [27][29]
  5. Select the table of interest and rename the variables
danger_table <- tables[[2]]
class(danger_table)
## [1] "data.frame"
danger_table[1:10,]
##                                     V1    V2
## 1                                 Name Image
## 2                             Abu Mena      
## 3      Air and Tenere Natural Reserves      
## 4               Ancient City of Aleppo      
## 5                Ancient City of Bosra      
## 6             Ancient City of Damascus      
## 7   Ancient Villages of Northern Syria      
## 8        Archaeological Site of Cyrene      
## 9  Archaeological Site of Leptis Magna      
## 10     Archaeological Site of Sabratha      
##                                                                                                                                                                                                                                                                                                                                                                                                                V3
## 1                                                                                                                                                                                                                                                                                                                                                                                                        Location
## 2  EgyAbusir,<U+00A0>Egypt.mw-parser-output .geo-default,.mw-parser-output .geo-dms,.mw-parser-output .geo-dec{display:inline}.mw-parser-output .geo-nondefault,.mw-parser-output .geo-multi-punct{display:none}.mw-parser-output .longitude,.mw-parser-output .latitude{white-space:nowrap}30°50′30″N 29°39′50″E<U+FEFF> / <U+FEFF>30.84167°N 29.66389°E<U+FEFF> / 30.84167; 29.66389<U+FEFF> (Abu Mena)
## 3                                                                                                                                                                                                                                                    Niger1Arlit Department,<U+00A0>Niger18°17′N 8°0′E<U+FEFF> / <U+FEFF>18.283°N 8.000°E<U+FEFF> / 18.283; 8.000<U+FEFF> (Air and Tenere Natural Reserves)
## 4                                                                                                                                                                                                                                                            Aleppo Governorate, <U+00A0>Syria36°14′N 37°10′E<U+FEFF> / <U+FEFF>36.233°N 37.167°E<U+FEFF> / 36.233; 37.167<U+FEFF> (Ancient City of Aleppo)
## 5                                                                                                                                                                                                                                               Daraa Governorate, <U+00A0>Syria32°31′5″N 36°28′54″E<U+FEFF> / <U+FEFF>32.51806°N 36.48167°E<U+FEFF> / 32.51806; 36.48167<U+FEFF> (Ancient City of Bosra)
## 6                                                                                                                                                                                                                                        Damascus Governorate, <U+00A0>Syria33°30′41″N 36°18′23″E<U+FEFF> / <U+FEFF>33.51139°N 36.30639°E<U+FEFF> / 33.51139; 36.30639<U+FEFF> (Ancient City of Damascus)
## 7                                                                                                                                                                                                                                                     <U+00A0>Syria36°20′3″N 36°50′39″E<U+FEFF> / <U+FEFF>36.33417°N 36.84417°E<U+FEFF> / 36.33417; 36.84417<U+FEFF> (Ancient Villages of Northern Syria)
## 8                                                                                                                                                                                                                                         LibJebel Akhdar,<U+00A0>Libya32°49′30″N 21°51′30″E<U+FEFF> / <U+FEFF>32.82500°N 21.85833°E<U+FEFF> / 32.82500; 21.85833<U+FEFF> (Archaeological Site of Cyrene)
## 9                                                                                                                                                                                                                                          LibKhoms,<U+00A0>Libya32°38′18″N 14°17′35″E<U+FEFF> / <U+FEFF>32.63833°N 14.29306°E<U+FEFF> / 32.63833; 14.29306<U+FEFF> (Archaeological Site of Leptis Magna)
## 10                                                                                                                                                                                                                                           LibSabratha,<U+00A0>Libya32°48′19″N 12°29′6″E<U+FEFF> / <U+FEFF>32.80528°N 12.48500°E<U+FEFF> / 32.80528; 12.48500<U+FEFF> (Archaeological Site of Sabratha)
##                               V4                     V5         V6           V7
## 1                       Criteria          Areaha (acre) Year (WHS)   Endangered
## 2                  Cultural:(iv)              182 (450)       1979 2001<U+2013>
## 3       Natural:(vii), (ix), (x) 7,736,000 (19,120,000)       1991 1992<U+2013>
## 4             Cultural:(iii)(iv)              350 (860)       1986 2013<U+2013>
## 5          Cultural:(i)(iii)(vi)               <U+2014>       1980 2013<U+2013>
## 6  Cultural:(i)(ii)(iii)(iv)(vi)               86 (210)       1979 2013<U+2013>
## 7          Cultural:(iii)(iv)(v)        12,290 (30,400)       2011 2013<U+2013>
## 8     Cultural:(ii), (iii), (vi)               <U+2014>       1982 2016<U+2013>
## 9      Cultural:(i), (ii), (iii)               <U+2014>       1982 2016<U+2013>
## 10                Cultural:(iii)               <U+2014>       1982 2016<U+2013>
##                                                                                                                                             V8
## 1                                                                                                                                       Reason
## 2                               Cave-ins in the area caused by the clay at the surface, which becomes semi-liquid when met with "excess water"
## 3  Military conflict and civil disturbance in the region as well as a reduction of wildlife population and degradation of the vegetation cover
## 4                                                  Syrian Civil War, currently held by the government. Bombings continue threatening the site.
## 5                                                                                                    Syrian Civil War, held by the government.
## 6                                Syrian Civil War, rebel gunfire and mortar shelling, mainly from adjacent Jobar suburb endangers foundations.
## 7                                                Syrian Civil War, some held by rebels. Reports of looting and demolitions by Islamist groups.
## 8                                                   Libyan Civil War, presence of armed groups, already incurred and potential further damage.
## 9                                                   Libyan Civil War, presence of armed groups, already incurred and potential further damage.
## 10                                                  Libyan Civil War, presence of armed groups, already incurred and potential further damage.
##              V9
## 1          Refs
## 2  [17][18][19]
## 3      [20][21]
## 4          [22]
## 5          [23]
## 6          [24]
## 7          [25]
## 8      [26][27]
## 9      [27][28]
## 10     [27][29]
names(danger_table)
## [1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9"
danger_table <- danger_table[-1, c(1,3,4,6,7)]
danger_table[1:10,]
##                                     V1
## 2                             Abu Mena
## 3      Air and Tenere Natural Reserves
## 4               Ancient City of Aleppo
## 5                Ancient City of Bosra
## 6             Ancient City of Damascus
## 7   Ancient Villages of Northern Syria
## 8        Archaeological Site of Cyrene
## 9  Archaeological Site of Leptis Magna
## 10     Archaeological Site of Sabratha
## 11              Ashur (Qal'at Sherqat)
##                                                                                                                                                                                                                                                                                                                                                                                                                V3
## 2  EgyAbusir,<U+00A0>Egypt.mw-parser-output .geo-default,.mw-parser-output .geo-dms,.mw-parser-output .geo-dec{display:inline}.mw-parser-output .geo-nondefault,.mw-parser-output .geo-multi-punct{display:none}.mw-parser-output .longitude,.mw-parser-output .latitude{white-space:nowrap}30°50′30″N 29°39′50″E<U+FEFF> / <U+FEFF>30.84167°N 29.66389°E<U+FEFF> / 30.84167; 29.66389<U+FEFF> (Abu Mena)
## 3                                                                                                                                                                                                                                                    Niger1Arlit Department,<U+00A0>Niger18°17′N 8°0′E<U+FEFF> / <U+FEFF>18.283°N 8.000°E<U+FEFF> / 18.283; 8.000<U+FEFF> (Air and Tenere Natural Reserves)
## 4                                                                                                                                                                                                                                                            Aleppo Governorate, <U+00A0>Syria36°14′N 37°10′E<U+FEFF> / <U+FEFF>36.233°N 37.167°E<U+FEFF> / 36.233; 37.167<U+FEFF> (Ancient City of Aleppo)
## 5                                                                                                                                                                                                                                               Daraa Governorate, <U+00A0>Syria32°31′5″N 36°28′54″E<U+FEFF> / <U+FEFF>32.51806°N 36.48167°E<U+FEFF> / 32.51806; 36.48167<U+FEFF> (Ancient City of Bosra)
## 6                                                                                                                                                                                                                                        Damascus Governorate, <U+00A0>Syria33°30′41″N 36°18′23″E<U+FEFF> / <U+FEFF>33.51139°N 36.30639°E<U+FEFF> / 33.51139; 36.30639<U+FEFF> (Ancient City of Damascus)
## 7                                                                                                                                                                                                                                                     <U+00A0>Syria36°20′3″N 36°50′39″E<U+FEFF> / <U+FEFF>36.33417°N 36.84417°E<U+FEFF> / 36.33417; 36.84417<U+FEFF> (Ancient Villages of Northern Syria)
## 8                                                                                                                                                                                                                                         LibJebel Akhdar,<U+00A0>Libya32°49′30″N 21°51′30″E<U+FEFF> / <U+FEFF>32.82500°N 21.85833°E<U+FEFF> / 32.82500; 21.85833<U+FEFF> (Archaeological Site of Cyrene)
## 9                                                                                                                                                                                                                                          LibKhoms,<U+00A0>Libya32°38′18″N 14°17′35″E<U+FEFF> / <U+FEFF>32.63833°N 14.29306°E<U+FEFF> / 32.63833; 14.29306<U+FEFF> (Archaeological Site of Leptis Magna)
## 10                                                                                                                                                                                                                                           LibSabratha,<U+00A0>Libya32°48′19″N 12°29′6″E<U+FEFF> / <U+FEFF>32.80528°N 12.48500°E<U+FEFF> / 32.80528; 12.48500<U+FEFF> (Archaeological Site of Sabratha)
## 11                                                                                                                                                                                                                                                                IraqSalah ad Din,<U+00A0>Iraq35°27′24″N 43°15′45″E<U+FEFF> / <U+FEFF>35.45667°N 43.26250°E<U+FEFF> / 35.45667; 43.26250<U+FEFF> (Ashur)
##                               V4   V6           V7
## 2                  Cultural:(iv) 1979 2001<U+2013>
## 3       Natural:(vii), (ix), (x) 1991 1992<U+2013>
## 4             Cultural:(iii)(iv) 1986 2013<U+2013>
## 5          Cultural:(i)(iii)(vi) 1980 2013<U+2013>
## 6  Cultural:(i)(ii)(iii)(iv)(vi) 1979 2013<U+2013>
## 7          Cultural:(iii)(iv)(v) 2011 2013<U+2013>
## 8     Cultural:(ii), (iii), (vi) 1982 2016<U+2013>
## 9      Cultural:(i), (ii), (iii) 1982 2016<U+2013>
## 10                Cultural:(iii) 1982 2016<U+2013>
## 11          Cultural:(iii), (iv) 2003 2003<U+2013>
colnames(danger_table) <- c("name","location","criterion","year_des","year_end")
head(danger_table)
##                                 name
## 2                           Abu Mena
## 3    Air and Tenere Natural Reserves
## 4             Ancient City of Aleppo
## 5              Ancient City of Bosra
## 6           Ancient City of Damascus
## 7 Ancient Villages of Northern Syria
##                                                                                                                                                                                                                                                                                                                                                                                                         location
## 2 EgyAbusir,<U+00A0>Egypt.mw-parser-output .geo-default,.mw-parser-output .geo-dms,.mw-parser-output .geo-dec{display:inline}.mw-parser-output .geo-nondefault,.mw-parser-output .geo-multi-punct{display:none}.mw-parser-output .longitude,.mw-parser-output .latitude{white-space:nowrap}30°50′30″N 29°39′50″E<U+FEFF> / <U+FEFF>30.84167°N 29.66389°E<U+FEFF> / 30.84167; 29.66389<U+FEFF> (Abu Mena)
## 3                                                                                                                                                                                                                                                   Niger1Arlit Department,<U+00A0>Niger18°17′N 8°0′E<U+FEFF> / <U+FEFF>18.283°N 8.000°E<U+FEFF> / 18.283; 8.000<U+FEFF> (Air and Tenere Natural Reserves)
## 4                                                                                                                                                                                                                                                           Aleppo Governorate, <U+00A0>Syria36°14′N 37°10′E<U+FEFF> / <U+FEFF>36.233°N 37.167°E<U+FEFF> / 36.233; 37.167<U+FEFF> (Ancient City of Aleppo)
## 5                                                                                                                                                                                                                                              Daraa Governorate, <U+00A0>Syria32°31′5″N 36°28′54″E<U+FEFF> / <U+FEFF>32.51806°N 36.48167°E<U+FEFF> / 32.51806; 36.48167<U+FEFF> (Ancient City of Bosra)
## 6                                                                                                                                                                                                                                       Damascus Governorate, <U+00A0>Syria33°30′41″N 36°18′23″E<U+FEFF> / <U+FEFF>33.51139°N 36.30639°E<U+FEFF> / 33.51139; 36.30639<U+FEFF> (Ancient City of Damascus)
## 7                                                                                                                                                                                                                                                    <U+00A0>Syria36°20′3″N 36°50′39″E<U+FEFF> / <U+FEFF>36.33417°N 36.84417°E<U+FEFF> / 36.33417; 36.84417<U+FEFF> (Ancient Villages of Northern Syria)
##                       criterion year_des     year_end
## 2                 Cultural:(iv)     1979 2001<U+2013>
## 3      Natural:(vii), (ix), (x)     1991 1992<U+2013>
## 4            Cultural:(iii)(iv)     1986 2013<U+2013>
## 5         Cultural:(i)(iii)(vi)     1980 2013<U+2013>
## 6 Cultural:(i)(ii)(iii)(iv)(vi)     1979 2013<U+2013>
## 7         Cultural:(iii)(iv)(v)     2011 2013<U+2013>
  1. Data cleaning
danger_table$criterion[1:3]
## [1] "Cultural:(iv)"            "Natural:(vii), (ix), (x)"
## [3] "Cultural:(iii)(iv)"
danger_table$criterion <- ifelse(str_detect(danger_table$criterion, "Natural"), "Natural", "Cultural")
danger_table$criterion[1:3]
## [1] "Cultural" "Natural"  "Cultural"
danger_table$year_des[1:3]
## [1] "1979" "1991" "1986"
danger_table$year_des <- as.numeric(danger_table$year_des)
danger_table$year_des[1:3]
## [1] 1979 1991 1986
danger_table$year_end[1:10]
##  [1] "2001<U+2013>" "1992<U+2013>" "2013<U+2013>" "2013<U+2013>" "2013<U+2013>" "2013<U+2013>" "2016<U+2013>" "2016<U+2013>" "2016<U+2013>"
## [10] "2003<U+2013>"
year_end_clean <- str_extract(danger_table$year_end, "^[[:digit:]]{4}")
danger_table$year_end <- as.numeric(year_end_clean)
danger_table$year_end[1:10]
##  [1] 2001 1992 2013 2013 2013 2013 2016 2016 2016 2003
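
A string that does not start with four digits turns into NA in this conversion without any warning, so it can be worth counting the NAs afterwards. The following is a minimal sketch on a made-up vector (the values in `raw_years`, including the unparseable `"n/a"`, are assumptions for illustration and do not come from the real table):

```r
library(stringr)

# Hypothetical raw values that mimic the year_end column
raw_years <- c("2001\u2013", "1992\u2013", "n/a")

# Keep only a leading four-digit year; strings without one become NA
clean_years <- as.numeric(str_extract(raw_years, "^[[:digit:]]{4}"))

# Number of values that could not be parsed
n_failed <- sum(is.na(clean_years))
n_failed
```

On the real data the same check would be `sum(is.na(danger_table$year_end))`.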

The location variable contains the name of the site’s location, the country, and the geographic coordinates in several formats; some entries also carry stray CSS from the scraped page.

danger_table$location[c(1, 3, 5)]
## [1] "EgyAbusir,<U+00A0>Egypt.mw-parser-output .geo-default,.mw-parser-output .geo-dms,.mw-parser-output .geo-dec{display:inline}.mw-parser-output .geo-nondefault,.mw-parser-output .geo-multi-punct{display:none}.mw-parser-output .longitude,.mw-parser-output .latitude{white-space:nowrap}30°50′30″N 29°39′50″E<U+FEFF> / <U+FEFF>30.84167°N 29.66389°E<U+FEFF> / 30.84167; 29.66389<U+FEFF> (Abu Mena)"
## [2] "Aleppo Governorate, <U+00A0>Syria36°14′N 37°10′E<U+FEFF> / <U+FEFF>36.233°N 37.167°E<U+FEFF> / 36.233; 37.167<U+FEFF> (Ancient City of Aleppo)"                                                                                                                                                                                                                   
## [3] "Damascus Governorate, <U+00A0>Syria33°30′41″N 36°18′23″E<U+FEFF> / <U+FEFF>33.51139°N 36.30639°E<U+FEFF> / 33.51139; 36.30639<U+FEFF> (Ancient City of Damascus)"
reg_y <- "[/][ -]*[[:digit:]]*[.]*[[:digit:]]*[;]"
reg_x <- "[;][ -]*[[:digit:]]*[.]*[[:digit:]]*"
y_coords <- str_extract(danger_table$location, reg_y)
y_coords
##  [1] "/ 30.84167;"  "/ 18.283;"    "/ 36.233;"    "/ 32.51806;"  "/ 33.51139;" 
##  [6] "/ 36.33417;"  "/ 32.82500;"  "/ 32.63833;"  "/ 32.80528;"  "/ 35.45667;" 
## [11] "/ -8.11111;"  "/ -19.58361;" "/ 11.417;"    "/ 34.78167;"  "/ 34.83194;" 
## [16] "/ -11.68306;" "/ 25.317;"    "/ 9.55389;"   "/ 4.000;"     "/ 35.58806;" 
## [21] "/ 31.52417;"  "/ 39.05000;"  "/ 48.200;"    "/ 14.200;"    "/ 27.633;"   
## [26] "/ -2.500;"    "/ 3.05222;"   "/ 9.000;"     "/ 34.39667;"  "/ 42.66111;" 
## [31] "/ 7.600;"     "/ 6.83972;"   "/ 13.000;"    "/ 2.000;"     "/ 31.77667;" 
## [36] "/ 15.35556;"  "/ 30.13333;"  "/ 13.90639;"  "/ 31.71972;"  "/ 15.92694;" 
## [41] "/ -14.467;"   "/ 15.74444;"  "/ 24.833;"    "/ 34.200;"    "/ -9.00000;" 
## [46] "/ 34.55417;"  "/ 16.77333;"  "/ 16.28972;"  "/ 0.32917;"   "/ -2.500;"   
## [51] "/ 0.917;"     "/ 46.30611;"
y_coords <- as.numeric(str_sub(y_coords, 3, -2))
y_coords
##  [1]  30.84167  18.28300  36.23300  32.51806  33.51139  36.33417  32.82500
##  [8]  32.63833  32.80528  35.45667  -8.11111 -19.58361  11.41700  34.78167
## [15]  34.83194 -11.68306  25.31700   9.55389   4.00000  35.58806  31.52417
## [22]  39.05000  48.20000  14.20000  27.63300  -2.50000   3.05222   9.00000
## [29]  34.39667  42.66111   7.60000   6.83972  13.00000   2.00000  31.77667
## [36]  15.35556  30.13333  13.90639  31.71972  15.92694 -14.46700  15.74444
## [43]  24.83300  34.20000  -9.00000  34.55417  16.77333  16.28972   0.32917
## [50]  -2.50000   0.91700  46.30611
danger_table$y_coords <- y_coords

x_coords <- str_extract(danger_table$location, reg_x)
x_coords
##  [1] "; 29.66389"  "; 8.000"     "; 37.167"    "; 36.48167"  "; 36.30639" 
##  [6] "; 36.84417"  "; 21.85833"  "; 14.29306"  "; 12.48500"  "; 43.26250" 
## [11] "; -79.07500" "; -65.75306" "; -69.667"   "; 36.26306"  "; 67.82667" 
## [16] "; 160.18306" "; -80.933"   "; -79.65583" "; 29.250"    "; 42.71833" 
## [21] "; 35.10889"  "; 66.83333"  "; 16.367"    "; 43.317"    "; -112.550" 
## [26] "; 28.750"    "; 36.50361"  "; 21.500"    "; 64.51611"  "; 20.26556" 
## [31] "; -8.383"    "; 158.33083" "; -12.667"   "; 28.500"    "; 35.23417" 
## [36] "; 44.20806"  "; 9.50000"   "; -4.55500"  "; 35.13056"  "; 48.62667" 
## [41] "; 49.700"    "; -84.67500" "; 10.333"    "; 43.867"    "; 37.40000" 
## [46] "; 38.26667"  "; -2.99944"  "; -0.04444"  "; 32.55333"  "; 101.500"  
## [51] "; 29.167"    "; 23.13056"
x_coords <- as.numeric(str_sub(x_coords, 3, -1))
x_coords
##  [1]   29.66389    8.00000   37.16700   36.48167   36.30639   36.84417
##  [7]   21.85833   14.29306   12.48500   43.26250  -79.07500  -65.75306
## [13]  -69.66700   36.26306   67.82667  160.18306  -80.93300  -79.65583
## [19]   29.25000   42.71833   35.10889   66.83333   16.36700   43.31700
## [25] -112.55000   28.75000   36.50361   21.50000   64.51611   20.26556
## [31]   -8.38300  158.33083  -12.66700   28.50000   35.23417   44.20806
## [37]    9.50000   -4.55500   35.13056   48.62667   49.70000  -84.67500
## [43]   10.33300   43.86700   37.40000   38.26667   -2.99944   -0.04444
## [49]   32.55333  101.50000   29.16700   23.13056
danger_table$x_coords <- x_coords
danger_table$location <- NULL
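
The patterns `reg_y` and `reg_x` are fairly permissive: `[ -]*` and `[.]*` would also accept repeated signs or dots. As a possible alternative, the sketch below captures both decimal coordinates in one pass with `str_match()`. The pattern `reg_xy` is an assumption written for this illustration, and `loc` is the tail of one location string from the output above with the invisible characters written as escapes:

```r
library(stringr)

# Tail of the Aleppo location string; \ufeff is the invisible <U+FEFF> character
loc <- "\ufeff / \ufeff36.233\u00b0N 37.167\u00b0E\ufeff / 36.233; 37.167\ufeff (Ancient City of Aleppo)"

# Hypothetical combined pattern: "/ <latitude>; <longitude>",
# each an optionally signed decimal number
reg_xy <- "/ (-?[[:digit:]]+\\.?[[:digit:]]*); (-?[[:digit:]]+\\.?[[:digit:]]*)"

# str_match() returns a matrix: full match, then one column per capture group
m <- str_match(loc, reg_xy)
y <- as.numeric(m[, 2])  # latitude
x <- as.numeric(m[, 3])  # longitude
c(y, x)
```

Because the capture groups exclude the delimiters, no `str_sub()` trimming is needed afterwards.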
danger_table
##                                                                                             name
## 2                                                                                       Abu Mena
## 3                                                                Air and Tenere Natural Reserves
## 4                                                                         Ancient City of Aleppo
## 5                                                                          Ancient City of Bosra
## 6                                                                       Ancient City of Damascus
## 7                                                             Ancient Villages of Northern Syria
## 8                                                                  Archaeological Site of Cyrene
## 9                                                            Archaeological Site of Leptis Magna
## 10                                                               Archaeological Site of Sabratha
## 11                                                                        Ashur (Qal'at Sherqat)
## 12                                                                 Chan Chan Archaeological Zone
## 13                                                                                City of Potosi
## 14                                                                             Coro and its Port
## 15                                                  Crac des Chevaliers and Qal’at Salah El-Din
## 16                           Cultural Landscape and Archaeological Remains of the Bamiyan Valley
## 17                                                                                  East Rennell
## 18                                                                      Everglades National Park
## 19                         Fortifications on the Caribbean Side of Panama: Portobelo-San Lorenzo
## 20                                                                         Garamba National Park
## 21                                                                                         Hatra
## 22                                                                     Hebron/Al-Khalil Old Town
## 23                                                               Historic Centre of Shakhrisyabz
## 24                                                                     Historic Centre of Vienna
## 25                                                                 Historic Town of Zab<U+012B>d
## 26                                         Islands and Protected Areas of the Gulf of California
## 27                                                                    Kahuzi-Biega National Park
## 28                                                                   Lake Turkana National Parks
## 29                                                         Manovo-Gounda St Floris National Park
## 30                                                     Minaret and Archaeological Remains of Jam
## 31                                                                  Medieval Monuments in Kosovo
## 32                                                             Mount Nimba Strict Nature Reserve
## 33                                            Nan Madol: Ceremonial Centre of Eastern Micronesia
## 34                                                                    Niokolo-Koba National Park
## 35                                                                        Okapi Wildlife Reserve
## 36                                                           Old City of Jerusalem and its Walls
## 37                                                                            Old City of Sana'a
## 38                                                                          Old Town of Ghadames
## 39                                                                           Old Towns of Djenne
## 40 Palestine: Land of Olives and Vines <U+2013> Cultural Landscape of Southern Jerusalem, Battir
## 41                                                                     Old Walled City of Shibam
## 42                                                                 Rainforests of the Atsinanana
## 43                                                                 Rio Platano Biosphere Reserve
## 44                                                              Rock-Art Sites of Tadrart Acacus
## 45                                                                   Samarra Archaeological City
## 46                                                                           Selous Game Reserve
## 47                                                                               Site of Palmyra
## 48                                                                                      Timbuktu
## 49                                                                                 Tomb of Askia
## 50                                                              Tombs of Buganda Kings at Kasubi
## 51                                                       Tropical Rainforest Heritage of Sumatra
## 52                                                                         Virunga National Park
## 53                                                  Ro<U+0219>ia Montan<U+0103> Mining Landscape
##    criterion year_des year_end  y_coords   x_coords
## 2   Cultural     1979     2001  30.84167   29.66389
## 3    Natural     1991     1992  18.28300    8.00000
## 4   Cultural     1986     2013  36.23300   37.16700
## 5   Cultural     1980     2013  32.51806   36.48167
## 6   Cultural     1979     2013  33.51139   36.30639
## 7   Cultural     2011     2013  36.33417   36.84417
## 8   Cultural     1982     2016  32.82500   21.85833
## 9   Cultural     1982     2016  32.63833   14.29306
## 10  Cultural     1982     2016  32.80528   12.48500
## 11  Cultural     2003     2003  35.45667   43.26250
## 12  Cultural     1986     1986  -8.11111  -79.07500
## 13  Cultural     1987     2014 -19.58361  -65.75306
## 14  Cultural     1993     2005  11.41700  -69.66700
## 15  Cultural     2006     2013  34.78167   36.26306
## 16  Cultural     2003     2003  34.83194   67.82667
## 17   Natural     1998     2013 -11.68306  160.18306
## 18   Natural     1979     1993  25.31700  -80.93300
## 19  Cultural     1980     2012   9.55389  -79.65583
## 20   Natural     1980     1984   4.00000   29.25000
## 21  Cultural     1985     2015  35.58806   42.71833
## 22  Cultural     2017     2017  31.52417   35.10889
## 23  Cultural     2000     2016  39.05000   66.83333
## 24  Cultural     2001     2017  48.20000   16.36700
## 25  Cultural     1993     2000  14.20000   43.31700
## 26   Natural     2005     2019  27.63300 -112.55000
## 27   Natural     1980     1997  -2.50000   28.75000
## 28   Natural     1997     2018   3.05222   36.50361
## 29   Natural     1988     1997   9.00000   21.50000
## 30  Cultural     2002     2002  34.39667   64.51611
## 31  Cultural     2004     2006  42.66111   20.26556
## 32   Natural     1981     1992   7.60000   -8.38300
## 33  Cultural     2016     2016   6.83972  158.33083
## 34   Natural     1981     2007  13.00000  -12.66700
## 35   Natural     1996     1997   2.00000   28.50000
## 36  Cultural     1981     1982  31.77667   35.23417
## 37  Cultural     1986     2015  15.35556   44.20806
## 38  Cultural     1986     2016  30.13333    9.50000
## 39  Cultural     1988     2016  13.90639   -4.55500
## 40  Cultural     2014     2014  31.71972   35.13056
## 41  Cultural     1982     2015  15.92694   48.62667
## 42   Natural     2007     2010 -14.46700   49.70000
## 43   Natural     1982     1996  15.74444  -84.67500
## 44  Cultural     1985     2016  24.83300   10.33300
## 45  Cultural     2007     2007  34.20000   43.86700
## 46   Natural     1982     2014  -9.00000   37.40000
## 47  Cultural     1980     2013  34.55417   38.26667
## 48  Cultural     1988     2012  16.77333   -2.99944
## 49  Cultural     2004     2012  16.28972   -0.04444
## 50  Cultural     2001     2010   0.32917   32.55333
## 51   Natural     2004     2011  -2.50000  101.50000
## 52   Natural     1979     1994   0.91700   29.16700
## 53  Cultural     2021     2021  46.30611   23.13056
round(danger_table$y_coords, 2)[1:3]
## [1] 30.84 18.28 36.23
round(danger_table$x_coords, 2)[1:3]
## [1] 29.66  8.00 37.17
length(danger_table$y_coords)
## [1] 52
length(danger_table$x_coords)
## [1] 52
dim(danger_table)
## [1] 52  6
head(danger_table)
##                                 name criterion year_des year_end y_coords
## 2                           Abu Mena  Cultural     1979     2001 30.84167
## 3    Air and Tenere Natural Reserves   Natural     1991     1992 18.28300
## 4             Ancient City of Aleppo  Cultural     1986     2013 36.23300
## 5              Ancient City of Bosra  Cultural     1980     2013 32.51806
## 6           Ancient City of Damascus  Cultural     1979     2013 33.51139
## 7 Ancient Villages of Northern Syria  Cultural     2011     2013 36.33417
##   x_coords
## 2 29.66389
## 3  8.00000
## 4 37.16700
## 5 36.48167
## 6 36.30639
## 7 36.84417
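
Before plotting, a quick plausibility check on the extracted coordinates can catch extraction mistakes. This is a small sketch, not part of the original workflow; the data frame `sites` is a hypothetical stand-in holding the first three rows retyped from the output above:

```r
# First three rows of the cleaned table, retyped from the output above
sites <- data.frame(
  name     = c("Abu Mena", "Air and Tenere Natural Reserves", "Ancient City of Aleppo"),
  y_coords = c(30.84167, 18.28300, 36.23300),
  x_coords = c(29.66389, 8.00000, 37.16700)
)

# Latitudes must lie in [-90, 90] and longitudes in [-180, 180]
valid <- !is.na(sites$y_coords) & !is.na(sites$x_coords) &
  abs(sites$y_coords) <= 90 & abs(sites$x_coords) <= 180

# Names of rows that fail the check
bad_sites <- sites$name[!valid]
bad_sites
```
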
  1. Plot the locations of the places on a map
pch <- ifelse(danger_table$criterion == "Natural", 19, 2)
map("world", col = "darkgrey", lwd = 0.5, mar = c(0.1, 0.1, 0.1, 0.1))
points(danger_table$x_coords, danger_table$y_coords, pch = pch)
box()

  1. Analytic approaches to answer the questions
table(danger_table$criterion)
## 
## Cultural  Natural 
##       36       16
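
The raw counts can also be expressed as shares. A minimal sketch, with the counts retyped from the output above into a named vector (`counts` and `shares` are names chosen here for illustration):

```r
# Counts copied from the table() output above
counts <- c(Cultural = 36, Natural = 16)

# Share of each category in percent, rounded to one decimal place
shares <- round(100 * prop.table(counts), 1)
shares
```

On the real data, `prop.table(table(danger_table$criterion))` gives the same proportions directly.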
  1. Visualization of time trends
hist(danger_table$year_end, freq = TRUE,
     xlab = "Year when site was put on the list of endangered sites",
     main = "")

duration <- danger_table$year_end - danger_table$year_des
hist(duration, freq = TRUE,
     xlab = "Years it took to become an endangered site",
     main = "")
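
To compare the two site types numerically rather than only visually, the duration can be summarised per criterion. The sketch below uses a hypothetical data frame `first_six` holding only the first six rows retyped from the output above, so its numbers say nothing about the full table:

```r
# First six rows of the cleaned table, retyped from the output above
first_six <- data.frame(
  criterion = c("Cultural", "Natural", "Cultural", "Cultural", "Cultural", "Cultural"),
  year_des  = c(1979, 1991, 1986, 1980, 1979, 2011),
  year_end  = c(2001, 1992, 2013, 2013, 2013, 2013)
)

# Years between designation and being listed as endangered
first_six$duration <- first_six$year_end - first_six$year_des

# Median duration within each criterion
med <- tapply(first_six$duration, first_six$criterion, median)
med
```

Applied to the full table, the call would be `tapply(duration, danger_table$criterion, median)`.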