HTML
HTML’s marked-up structure
Web content is an interpreted version of the source code
How the document is structured and the function of its various parts: headlines, links, tables, etc…
Element inspector
Elements
<title>First HTML</title>
Attributes
<a href="http://www.r-datacollection.com/">Link to Homepage</a>
http://www.r-datacollection.com/bookmaterials.html
Tree structure
First HTML
I am your first HTML-file!
A tree perspective on HTML
Comments
Reserved and special characters
<p>5 < 6 but 7 > 3 </p>
HTML entities
Character | Entity name | Explanation |
---|---|---|
" | &quot; | quotation mark |
' | &apos; | apostrophe |
& | &amp; | ampersand |
< | &lt; | less than |
> | &gt; | greater than |
(space) | &nbsp; | non-breaking space |
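To make the entity table concrete, here is a minimal base R sketch that replaces reserved characters with their entity names. The helper `escape_html()` is hypothetical (it is not part of the notes or of any package used here) and covers only four of the entities listed above.

```r
# Hypothetical helper for illustration: swap reserved characters for
# their HTML entity names. The ampersand is handled first; otherwise
# the "&" inside an already inserted "&lt;" would be escaped again.
escape_html <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  x <- gsub("\"", "&quot;", x, fixed = TRUE)
  x
}

escaped <- escape_html("5 < 6 but 7 > 3")
print(escaped)
## [1] "5 &lt; 6 but 7 &gt; 3"
```

Written this way, the text can sit inside a `<p>` element without the comparison signs being mistaken for tag delimiters.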
Document type definition
<!DOCTYPE html>
Spaces and line breaks
Writing code is poetry
Writing
code
is
poetry
Writing
code
is
poetry
Loading and representing the contents of HTML/XML files in an R session
Inspecting content on the Web: browser to display HTML content nicely
Importing HTML files into R and extracting information from them: a parser in R to construct useful representations of HTML documents
Reading vs. Parsing
Reading makes no attempt to understand the formal grammar that underlies HTML. It merely recognizes the sequence of symbols in the HTML file and loads that content into an R session.
url <- "https://news.v.daum.net/v/20210922193955830"
example <- readLines(url)
## Warning in readLines(url): incomplete final line found on
## 'https://news.v.daum.net/v/20210922193955830'
example <- paste0(example, collapse = " ")
class(example)
## [1] "character"
library(httr)
url <- "https://news.v.daum.net/v/20210922193955830"
example <- httr::GET(url)
example
## Response [https://news.v.daum.net/v/20210922193955830]
## Date: 2021-09-23 02:58
## Status: 200
## Content-Type: text/html;charset=UTF-8
## Size: 59.2 kB
## <!doctype html>
## <html lang="ko">
## <head data-cloud-area="head">
## <meta charset="utf-8">
## <meta http-equiv="X-UA-Compatible" content="IE=edge">
## <style>
## @import url('//t1.daumcdn.net/harmony_static/cloud/page/c8a258a8a...
## @import url('//t1.daumcdn.net/harmony_static/cloud/2021/09/14/com...
## </style>
## <style>
## ...
class(example)
## [1] "response"
GET() is agnostic about the different tag elements (names, attributes, values, etc.) and produces results that do not reflect the document’s internal hierarchy as implied by the nested tags.
To achieve a useful representation of HTML files, we need to employ a program that understands the special meaning of the markup structures and reconstructs the implied hierarchy of an HTML file within some R-specific data structure.
Transformation from any HTML file to a queryable Document Object Model (DOM): parsing with the XML package in two steps
1. ```htmlParse()``` first parses the entire target document and creates the DOM as a tree-like data structure at the C level.
2. The C-level node structure is converted into an object of the R language through handler functions.
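As a small offline illustration of what parsing buys us, the sketch below parses an inline document with `htmlParse()` from the XML package and queries the resulting tree with XPath. The inline markup is assembled from the element and attribute examples earlier in these notes; the variable names are assumptions made for the example.

```r
library(XML)

# A tiny document written inline, so no network access is needed.
html_text <- "<!DOCTYPE html>
<html>
  <head><title>First HTML</title></head>
  <body>
    <p>I am your first HTML-file!</p>
    <a href=\"http://www.r-datacollection.com/\">Link to Homepage</a>
  </body>
</html>"

# asText = TRUE marks the input as markup rather than a file name or URL.
doc <- htmlParse(html_text, asText = TRUE)

# Because the hierarchy is preserved, nodes can be addressed by XPath.
page_title  <- xpathSApply(doc, "//title", xmlValue)
link_target <- xpathSApply(doc, "//a", xmlGetAttr, "href")

print(page_title)
print(link_target)
```

A plain character vector from `readLines()` would need string matching to find the same two pieces of information; the parsed document lets us ask for them by their position in the tree.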
library(XML)
parsed_example <- htmlParse(example)
class(parsed_example)
## [1] "HTMLInternalDocument" "HTMLInternalDocument" "XMLInternalDocument"
## [4] "XMLAbstractDocument"
#parsed_example
Select a DAUM news page of interest in your browser
Have a look at the source code
Inspect various elements in the Inspect Elements tool of your browser
Copy and paste the elements for the outlet, the upload date, the headline, any highlight, the body content, and the comments (댓글)
Check and report the structure of the elements.
Generate some research questions
1,121 heritage sites, such as the Pyramids in Egypt
Which sites are threatened and where are they located?
Are there regions in the world where sites are more endangered than in others?
What are the reasons that put a site at risk?
Questions to be considered in data collection
What type of data is most suited to answer your question?
Is the quality of the data sufficiently high to answer your question?
Is the information systematically flawed (biased)?
Find a source of data that can be used to answer the questions
https://en.wikipedia.org/wiki/List_of_World_Heritage_in_Danger
library(httr)
library(XML)
library(stringr)
library(maps)
heritage_page <- GET("https://en.wikipedia.org/wiki/List_of_World_Heritage_in_Danger")
heritage_page
## Response [https://en.wikipedia.org/wiki/List_of_World_Heritage_in_Danger]
## Date: 2021-09-23 02:07
## Status: 200
## Content-Type: text/html; charset=UTF-8
## Size: 513 kB
## <!DOCTYPE html>
## <html class="client-nojs" lang="en" dir="ltr">
## <head>
## <meta charset="UTF-8"/>
## <title>List of World Heritage in Danger - Wikipedia</title>
## <script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames...
## ,"Geographic coordinate lists","Articles with Geo","Featured lists","Lists of...
## "wgGEAskQuestionEnabled":!1,"wgGELinkRecommendationsFrontendEnabled":!1,"wgCe...
## "ext.visualEditor.desktopArticleTarget.init","ext.visualEditor.targetLoader",...
## <script>(RLQ=window.RLQ||[]).push(function(){mw.loader.implement("user.option...
## ...
class(heritage_page)
## [1] "response"
heritage_parsed <- htmlParse(heritage_page, encoding = "UTF-8")
class(heritage_parsed)
## [1] "HTMLInternalDocument" "HTMLInternalDocument" "XMLInternalDocument"
## [4] "XMLAbstractDocument"
tables <- readHTMLTable(heritage_parsed, stringsAsFactors = FALSE)
class(tables)
## [1] "list"
length(tables)
## [1] 5
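To see why `readHTMLTable()` returns a list, it can help to try it on a toy page. The sketch below uses an inline document with a single table; the table contents ("Site A", "Site B") are placeholders invented for the example, not data from the Wikipedia page.

```r
library(XML)

# Inline markup with one made-up table (placeholder values only).
html_text <- "<html><body>
<table>
  <tr><th>Name</th><th>Year</th></tr>
  <tr><td>Site A</td><td>1979</td></tr>
  <tr><td>Site B</td><td>1991</td></tr>
</table>
</body></html>"

doc <- htmlParse(html_text, asText = TRUE)

# readHTMLTable() returns one list element per <table> node it finds.
toy_tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

length(toy_tables)      # number of tables found in the document
class(toy_tables[[1]])  # each element is a data frame
```

On a real page the list usually contains layout tables as well as data tables, which is why the elements are inspected one by one before choosing `tables[[2]]`.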
tables[[1]]
## V1
## 1 Map this section's coordinates using: OpenStreetMap<U+00A0>
## 2 Download coordinates as: KML
tables[[2]][1:10,]
## V1 V2
## 1 Name Image
## 2 Abu Mena
## 3 Air and Tenere Natural Reserves
## 4 Ancient City of Aleppo
## 5 Ancient City of Bosra
## 6 Ancient City of Damascus
## 7 Ancient Villages of Northern Syria
## 8 Archaeological Site of Cyrene
## 9 Archaeological Site of Leptis Magna
## 10 Archaeological Site of Sabratha
## V3
## 1 Location
## 2 EgyAbusir,<U+00A0>Egypt.mw-parser-output .geo-default,.mw-parser-output .geo-dms,.mw-parser-output .geo-dec{display:inline}.mw-parser-output .geo-nondefault,.mw-parser-output .geo-multi-punct{display:none}.mw-parser-output .longitude,.mw-parser-output .latitude{white-space:nowrap}30°50′30″N 29°39′50″E<U+FEFF> / <U+FEFF>30.84167°N 29.66389°E<U+FEFF> / 30.84167; 29.66389<U+FEFF> (Abu Mena)
## 3 Niger1Arlit Department,<U+00A0>Niger18°17′N 8°0′E<U+FEFF> / <U+FEFF>18.283°N 8.000°E<U+FEFF> / 18.283; 8.000<U+FEFF> (Air and Tenere Natural Reserves)
## 4 Aleppo Governorate, <U+00A0>Syria36°14′N 37°10′E<U+FEFF> / <U+FEFF>36.233°N 37.167°E<U+FEFF> / 36.233; 37.167<U+FEFF> (Ancient City of Aleppo)
## 5 Daraa Governorate, <U+00A0>Syria32°31′5″N 36°28′54″E<U+FEFF> / <U+FEFF>32.51806°N 36.48167°E<U+FEFF> / 32.51806; 36.48167<U+FEFF> (Ancient City of Bosra)
## 6 Damascus Governorate, <U+00A0>Syria33°30′41″N 36°18′23″E<U+FEFF> / <U+FEFF>33.51139°N 36.30639°E<U+FEFF> / 33.51139; 36.30639<U+FEFF> (Ancient City of Damascus)
## 7 <U+00A0>Syria36°20′3″N 36°50′39″E<U+FEFF> / <U+FEFF>36.33417°N 36.84417°E<U+FEFF> / 36.33417; 36.84417<U+FEFF> (Ancient Villages of Northern Syria)
## 8 LibJebel Akhdar,<U+00A0>Libya32°49′30″N 21°51′30″E<U+FEFF> / <U+FEFF>32.82500°N 21.85833°E<U+FEFF> / 32.82500; 21.85833<U+FEFF> (Archaeological Site of Cyrene)
## 9 LibKhoms,<U+00A0>Libya32°38′18″N 14°17′35″E<U+FEFF> / <U+FEFF>32.63833°N 14.29306°E<U+FEFF> / 32.63833; 14.29306<U+FEFF> (Archaeological Site of Leptis Magna)
## 10 LibSabratha,<U+00A0>Libya32°48′19″N 12°29′6″E<U+FEFF> / <U+FEFF>32.80528°N 12.48500°E<U+FEFF> / 32.80528; 12.48500<U+FEFF> (Archaeological Site of Sabratha)
## V4 V5 V6 V7
## 1 Criteria Areaha (acre) Year (WHS) Endangered
## 2 Cultural:(iv) 182 (450) 1979 2001<U+2013>
## 3 Natural:(vii), (ix), (x) 7,736,000 (19,120,000) 1991 1992<U+2013>
## 4 Cultural:(iii)(iv) 350 (860) 1986 2013<U+2013>
## 5 Cultural:(i)(iii)(vi) <U+2014> 1980 2013<U+2013>
## 6 Cultural:(i)(ii)(iii)(iv)(vi) 86 (210) 1979 2013<U+2013>
## 7 Cultural:(iii)(iv)(v) 12,290 (30,400) 2011 2013<U+2013>
## 8 Cultural:(ii), (iii), (vi) <U+2014> 1982 2016<U+2013>
## 9 Cultural:(i), (ii), (iii) <U+2014> 1982 2016<U+2013>
## 10 Cultural:(iii) <U+2014> 1982 2016<U+2013>
## V8
## 1 Reason
## 2 Cave-ins in the area caused by the clay at the surface, which becomes semi-liquid when met with "excess water"
## 3 Military conflict and civil disturbance in the region as well as a reduction of wildlife population and degradation of the vegetation cover
## 4 Syrian Civil War, currently held by the government. Bombings continue threatening the site.
## 5 Syrian Civil War, held by the government.
## 6 Syrian Civil War, rebel gunfire and mortar shelling, mainly from adjacent Jobar suburb endangers foundations.
## 7 Syrian Civil War, some held by rebels. Reports of looting and demolitions by Islamist groups.
## 8 Libyan Civil War, presence of armed groups, already incurred and potential further damage.
## 9 Libyan Civil War, presence of armed groups, already incurred and potential further damage.
## 10 Libyan Civil War, presence of armed groups, already incurred and potential further damage.
## V9
## 1 Refs
## 2 [17][18][19]
## 3 [20][21]
## 4 [22]
## 5 [23]
## 6 [24]
## 7 [25]
## 8 [26][27]
## 9 [27][28]
## 10 [27][29]
danger_table <- tables[[2]]
class(danger_table)
## [1] "data.frame"
danger_table[1:10,]
## V1 V2
## 1 Name Image
## 2 Abu Mena
## 3 Air and Tenere Natural Reserves
## 4 Ancient City of Aleppo
## 5 Ancient City of Bosra
## 6 Ancient City of Damascus
## 7 Ancient Villages of Northern Syria
## 8 Archaeological Site of Cyrene
## 9 Archaeological Site of Leptis Magna
## 10 Archaeological Site of Sabratha
## V3
## 1 Location
## 2 EgyAbusir,<U+00A0>Egypt.mw-parser-output .geo-default,.mw-parser-output .geo-dms,.mw-parser-output .geo-dec{display:inline}.mw-parser-output .geo-nondefault,.mw-parser-output .geo-multi-punct{display:none}.mw-parser-output .longitude,.mw-parser-output .latitude{white-space:nowrap}30°50′30″N 29°39′50″E<U+FEFF> / <U+FEFF>30.84167°N 29.66389°E<U+FEFF> / 30.84167; 29.66389<U+FEFF> (Abu Mena)
## 3 Niger1Arlit Department,<U+00A0>Niger18°17′N 8°0′E<U+FEFF> / <U+FEFF>18.283°N 8.000°E<U+FEFF> / 18.283; 8.000<U+FEFF> (Air and Tenere Natural Reserves)
## 4 Aleppo Governorate, <U+00A0>Syria36°14′N 37°10′E<U+FEFF> / <U+FEFF>36.233°N 37.167°E<U+FEFF> / 36.233; 37.167<U+FEFF> (Ancient City of Aleppo)
## 5 Daraa Governorate, <U+00A0>Syria32°31′5″N 36°28′54″E<U+FEFF> / <U+FEFF>32.51806°N 36.48167°E<U+FEFF> / 32.51806; 36.48167<U+FEFF> (Ancient City of Bosra)
## 6 Damascus Governorate, <U+00A0>Syria33°30′41″N 36°18′23″E<U+FEFF> / <U+FEFF>33.51139°N 36.30639°E<U+FEFF> / 33.51139; 36.30639<U+FEFF> (Ancient City of Damascus)
## 7 <U+00A0>Syria36°20′3″N 36°50′39″E<U+FEFF> / <U+FEFF>36.33417°N 36.84417°E<U+FEFF> / 36.33417; 36.84417<U+FEFF> (Ancient Villages of Northern Syria)
## 8 LibJebel Akhdar,<U+00A0>Libya32°49′30″N 21°51′30″E<U+FEFF> / <U+FEFF>32.82500°N 21.85833°E<U+FEFF> / 32.82500; 21.85833<U+FEFF> (Archaeological Site of Cyrene)
## 9 LibKhoms,<U+00A0>Libya32°38′18″N 14°17′35″E<U+FEFF> / <U+FEFF>32.63833°N 14.29306°E<U+FEFF> / 32.63833; 14.29306<U+FEFF> (Archaeological Site of Leptis Magna)
## 10 LibSabratha,<U+00A0>Libya32°48′19″N 12°29′6″E<U+FEFF> / <U+FEFF>32.80528°N 12.48500°E<U+FEFF> / 32.80528; 12.48500<U+FEFF> (Archaeological Site of Sabratha)
## V4 V5 V6 V7
## 1 Criteria Areaha (acre) Year (WHS) Endangered
## 2 Cultural:(iv) 182 (450) 1979 2001<U+2013>
## 3 Natural:(vii), (ix), (x) 7,736,000 (19,120,000) 1991 1992<U+2013>
## 4 Cultural:(iii)(iv) 350 (860) 1986 2013<U+2013>
## 5 Cultural:(i)(iii)(vi) <U+2014> 1980 2013<U+2013>
## 6 Cultural:(i)(ii)(iii)(iv)(vi) 86 (210) 1979 2013<U+2013>
## 7 Cultural:(iii)(iv)(v) 12,290 (30,400) 2011 2013<U+2013>
## 8 Cultural:(ii), (iii), (vi) <U+2014> 1982 2016<U+2013>
## 9 Cultural:(i), (ii), (iii) <U+2014> 1982 2016<U+2013>
## 10 Cultural:(iii) <U+2014> 1982 2016<U+2013>
## V8
## 1 Reason
## 2 Cave-ins in the area caused by the clay at the surface, which becomes semi-liquid when met with "excess water"
## 3 Military conflict and civil disturbance in the region as well as a reduction of wildlife population and degradation of the vegetation cover
## 4 Syrian Civil War, currently held by the government. Bombings continue threatening the site.
## 5 Syrian Civil War, held by the government.
## 6 Syrian Civil War, rebel gunfire and mortar shelling, mainly from adjacent Jobar suburb endangers foundations.
## 7 Syrian Civil War, some held by rebels. Reports of looting and demolitions by Islamist groups.
## 8 Libyan Civil War, presence of armed groups, already incurred and potential further damage.
## 9 Libyan Civil War, presence of armed groups, already incurred and potential further damage.
## 10 Libyan Civil War, presence of armed groups, already incurred and potential further damage.
## V9
## 1 Refs
## 2 [17][18][19]
## 3 [20][21]
## 4 [22]
## 5 [23]
## 6 [24]
## 7 [25]
## 8 [26][27]
## 9 [27][28]
## 10 [27][29]
names(danger_table)
## [1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9"
danger_table <- danger_table[-1, c(1,3,4,6,7)]
danger_table[1:10,]
## V1
## 2 Abu Mena
## 3 Air and Tenere Natural Reserves
## 4 Ancient City of Aleppo
## 5 Ancient City of Bosra
## 6 Ancient City of Damascus
## 7 Ancient Villages of Northern Syria
## 8 Archaeological Site of Cyrene
## 9 Archaeological Site of Leptis Magna
## 10 Archaeological Site of Sabratha
## 11 Ashur (Qal'at Sherqat)
## V3
## 2 EgyAbusir,<U+00A0>Egypt.mw-parser-output .geo-default,.mw-parser-output .geo-dms,.mw-parser-output .geo-dec{display:inline}.mw-parser-output .geo-nondefault,.mw-parser-output .geo-multi-punct{display:none}.mw-parser-output .longitude,.mw-parser-output .latitude{white-space:nowrap}30°50′30″N 29°39′50″E<U+FEFF> / <U+FEFF>30.84167°N 29.66389°E<U+FEFF> / 30.84167; 29.66389<U+FEFF> (Abu Mena)
## 3 Niger1Arlit Department,<U+00A0>Niger18°17′N 8°0′E<U+FEFF> / <U+FEFF>18.283°N 8.000°E<U+FEFF> / 18.283; 8.000<U+FEFF> (Air and Tenere Natural Reserves)
## 4 Aleppo Governorate, <U+00A0>Syria36°14′N 37°10′E<U+FEFF> / <U+FEFF>36.233°N 37.167°E<U+FEFF> / 36.233; 37.167<U+FEFF> (Ancient City of Aleppo)
## 5 Daraa Governorate, <U+00A0>Syria32°31′5″N 36°28′54″E<U+FEFF> / <U+FEFF>32.51806°N 36.48167°E<U+FEFF> / 32.51806; 36.48167<U+FEFF> (Ancient City of Bosra)
## 6 Damascus Governorate, <U+00A0>Syria33°30′41″N 36°18′23″E<U+FEFF> / <U+FEFF>33.51139°N 36.30639°E<U+FEFF> / 33.51139; 36.30639<U+FEFF> (Ancient City of Damascus)
## 7 <U+00A0>Syria36°20′3″N 36°50′39″E<U+FEFF> / <U+FEFF>36.33417°N 36.84417°E<U+FEFF> / 36.33417; 36.84417<U+FEFF> (Ancient Villages of Northern Syria)
## 8 LibJebel Akhdar,<U+00A0>Libya32°49′30″N 21°51′30″E<U+FEFF> / <U+FEFF>32.82500°N 21.85833°E<U+FEFF> / 32.82500; 21.85833<U+FEFF> (Archaeological Site of Cyrene)
## 9 LibKhoms,<U+00A0>Libya32°38′18″N 14°17′35″E<U+FEFF> / <U+FEFF>32.63833°N 14.29306°E<U+FEFF> / 32.63833; 14.29306<U+FEFF> (Archaeological Site of Leptis Magna)
## 10 LibSabratha,<U+00A0>Libya32°48′19″N 12°29′6″E<U+FEFF> / <U+FEFF>32.80528°N 12.48500°E<U+FEFF> / 32.80528; 12.48500<U+FEFF> (Archaeological Site of Sabratha)
## 11 IraqSalah ad Din,<U+00A0>Iraq35°27′24″N 43°15′45″E<U+FEFF> / <U+FEFF>35.45667°N 43.26250°E<U+FEFF> / 35.45667; 43.26250<U+FEFF> (Ashur)
## V4 V6 V7
## 2 Cultural:(iv) 1979 2001<U+2013>
## 3 Natural:(vii), (ix), (x) 1991 1992<U+2013>
## 4 Cultural:(iii)(iv) 1986 2013<U+2013>
## 5 Cultural:(i)(iii)(vi) 1980 2013<U+2013>
## 6 Cultural:(i)(ii)(iii)(iv)(vi) 1979 2013<U+2013>
## 7 Cultural:(iii)(iv)(v) 2011 2013<U+2013>
## 8 Cultural:(ii), (iii), (vi) 1982 2016<U+2013>
## 9 Cultural:(i), (ii), (iii) 1982 2016<U+2013>
## 10 Cultural:(iii) 1982 2016<U+2013>
## 11 Cultural:(iii), (iv) 2003 2003<U+2013>
colnames(danger_table) <- c("name","location","criterion","year_des","year_end")
head(danger_table)
## name
## 2 Abu Mena
## 3 Air and Tenere Natural Reserves
## 4 Ancient City of Aleppo
## 5 Ancient City of Bosra
## 6 Ancient City of Damascus
## 7 Ancient Villages of Northern Syria
## location
## 2 EgyAbusir,<U+00A0>Egypt.mw-parser-output .geo-default,.mw-parser-output .geo-dms,.mw-parser-output .geo-dec{display:inline}.mw-parser-output .geo-nondefault,.mw-parser-output .geo-multi-punct{display:none}.mw-parser-output .longitude,.mw-parser-output .latitude{white-space:nowrap}30°50′30″N 29°39′50″E<U+FEFF> / <U+FEFF>30.84167°N 29.66389°E<U+FEFF> / 30.84167; 29.66389<U+FEFF> (Abu Mena)
## 3 Niger1Arlit Department,<U+00A0>Niger18°17′N 8°0′E<U+FEFF> / <U+FEFF>18.283°N 8.000°E<U+FEFF> / 18.283; 8.000<U+FEFF> (Air and Tenere Natural Reserves)
## 4 Aleppo Governorate, <U+00A0>Syria36°14′N 37°10′E<U+FEFF> / <U+FEFF>36.233°N 37.167°E<U+FEFF> / 36.233; 37.167<U+FEFF> (Ancient City of Aleppo)
## 5 Daraa Governorate, <U+00A0>Syria32°31′5″N 36°28′54″E<U+FEFF> / <U+FEFF>32.51806°N 36.48167°E<U+FEFF> / 32.51806; 36.48167<U+FEFF> (Ancient City of Bosra)
## 6 Damascus Governorate, <U+00A0>Syria33°30′41″N 36°18′23″E<U+FEFF> / <U+FEFF>33.51139°N 36.30639°E<U+FEFF> / 33.51139; 36.30639<U+FEFF> (Ancient City of Damascus)
## 7 <U+00A0>Syria36°20′3″N 36°50′39″E<U+FEFF> / <U+FEFF>36.33417°N 36.84417°E<U+FEFF> / 36.33417; 36.84417<U+FEFF> (Ancient Villages of Northern Syria)
## criterion year_des year_end
## 2 Cultural:(iv) 1979 2001<U+2013>
## 3 Natural:(vii), (ix), (x) 1991 1992<U+2013>
## 4 Cultural:(iii)(iv) 1986 2013<U+2013>
## 5 Cultural:(i)(iii)(vi) 1980 2013<U+2013>
## 6 Cultural:(i)(ii)(iii)(iv)(vi) 1979 2013<U+2013>
## 7 Cultural:(iii)(iv)(v) 2011 2013<U+2013>
danger_table$criterion[1:3]
## [1] "Cultural:(iv)" "Natural:(vii), (ix), (x)"
## [3] "Cultural:(iii)(iv)"
danger_table$criterion <- ifelse(str_detect(danger_table$criterion, "Natural"), "Natural", "Cultural")
danger_table$criterion[1:3]
## [1] "Cultural" "Natural" "Cultural"
danger_table$year_des[1:3]
## [1] "1979" "1991" "1986"
danger_table$year_des <- as.numeric(danger_table$year_des)
danger_table$year_des[1:3]
## [1] 1979 1991 1986
danger_table$year_end[1:10]
## [1] "2001<U+2013>" "1992<U+2013>" "2013<U+2013>" "2013<U+2013>" "2013<U+2013>" "2013<U+2013>" "2016<U+2013>" "2016<U+2013>" "2016<U+2013>"
## [10] "2003<U+2013>"
year_end_clean <- str_extract(danger_table$year_end, "^[[:digit:]]{4}")
danger_table$year_end <- as.numeric(year_end_clean)
danger_table$year_end[1:10]
## [1] 2001 1992 2013 2013 2013 2013 2016 2016 2016 2003
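The same clean-up can be sketched in base R on a few sample values, which shows what the pattern `^[[:digit:]]{4}` does without needing the scraped table. The vector names are assumptions for the example; the en dash is written as the escape `\u2013`.

```r
# Three raw values as they appear in the table: a year followed by an en dash.
year_end_raw <- c("2001\u2013", "1992\u2013", "2013\u2013")

# regexpr() locates the first match of "four digits at the start of the string";
# regmatches() pulls out the matched text.
year_end_chr <- regmatches(year_end_raw, regexpr("^[[:digit:]]{4}", year_end_raw))
year_end_num <- as.numeric(year_end_chr)

print(year_end_num)
## [1] 2001 1992 2013
```

One difference worth knowing: `regmatches()` silently drops elements that do not match, whereas `str_extract()` returns `NA` for them and so keeps the result aligned with the rows of the data frame. That alignment is a good reason to prefer `str_extract()` when assigning back into a column.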
The location variable contains the name of the site’s location, the country, and the geographic coordinates in several varieties.
danger_table$location[c(1, 3, 5)]
## [1] "EgyAbusir,<U+00A0>Egypt.mw-parser-output .geo-default,.mw-parser-output .geo-dms,.mw-parser-output .geo-dec{display:inline}.mw-parser-output .geo-nondefault,.mw-parser-output .geo-multi-punct{display:none}.mw-parser-output .longitude,.mw-parser-output .latitude{white-space:nowrap}30°50′30″N 29°39′50″E<U+FEFF> / <U+FEFF>30.84167°N 29.66389°E<U+FEFF> / 30.84167; 29.66389<U+FEFF> (Abu Mena)"
## [2] "Aleppo Governorate, <U+00A0>Syria36°14′N 37°10′E<U+FEFF> / <U+FEFF>36.233°N 37.167°E<U+FEFF> / 36.233; 37.167<U+FEFF> (Ancient City of Aleppo)"
## [3] "Damascus Governorate, <U+00A0>Syria33°30′41″N 36°18′23″E<U+FEFF> / <U+FEFF>33.51139°N 36.30639°E<U+FEFF> / 33.51139; 36.30639<U+FEFF> (Ancient City of Damascus)"
reg_y <- "[/][ -]*[[:digit:]]*[.]*[[:digit:]]*[;]"
reg_x <- "[;][ -]*[[:digit:]]*[.]*[[:digit:]]*"
y_coords <- str_extract(danger_table$location, reg_y)
y_coords
## [1] "/ 30.84167;" "/ 18.283;" "/ 36.233;" "/ 32.51806;" "/ 33.51139;"
## [6] "/ 36.33417;" "/ 32.82500;" "/ 32.63833;" "/ 32.80528;" "/ 35.45667;"
## [11] "/ -8.11111;" "/ -19.58361;" "/ 11.417;" "/ 34.78167;" "/ 34.83194;"
## [16] "/ -11.68306;" "/ 25.317;" "/ 9.55389;" "/ 4.000;" "/ 35.58806;"
## [21] "/ 31.52417;" "/ 39.05000;" "/ 48.200;" "/ 14.200;" "/ 27.633;"
## [26] "/ -2.500;" "/ 3.05222;" "/ 9.000;" "/ 34.39667;" "/ 42.66111;"
## [31] "/ 7.600;" "/ 6.83972;" "/ 13.000;" "/ 2.000;" "/ 31.77667;"
## [36] "/ 15.35556;" "/ 30.13333;" "/ 13.90639;" "/ 31.71972;" "/ 15.92694;"
## [41] "/ -14.467;" "/ 15.74444;" "/ 24.833;" "/ 34.200;" "/ -9.00000;"
## [46] "/ 34.55417;" "/ 16.77333;" "/ 16.28972;" "/ 0.32917;" "/ -2.500;"
## [51] "/ 0.917;" "/ 46.30611;"
y_coords <- as.numeric(str_sub(y_coords, 3, -2))
y_coords
## [1] 30.84167 18.28300 36.23300 32.51806 33.51139 36.33417 32.82500
## [8] 32.63833 32.80528 35.45667 -8.11111 -19.58361 11.41700 34.78167
## [15] 34.83194 -11.68306 25.31700 9.55389 4.00000 35.58806 31.52417
## [22] 39.05000 48.20000 14.20000 27.63300 -2.50000 3.05222 9.00000
## [29] 34.39667 42.66111 7.60000 6.83972 13.00000 2.00000 31.77667
## [36] 15.35556 30.13333 13.90639 31.71972 15.92694 -14.46700 15.74444
## [43] 24.83300 34.20000 -9.00000 34.55417 16.77333 16.28972 0.32917
## [50] -2.50000 0.91700 46.30611
danger_table$y_coords <- y_coords
x_coords <- str_extract(danger_table$location, reg_x)
x_coords
## [1] "; 29.66389" "; 8.000" "; 37.167" "; 36.48167" "; 36.30639"
## [6] "; 36.84417" "; 21.85833" "; 14.29306" "; 12.48500" "; 43.26250"
## [11] "; -79.07500" "; -65.75306" "; -69.667" "; 36.26306" "; 67.82667"
## [16] "; 160.18306" "; -80.933" "; -79.65583" "; 29.250" "; 42.71833"
## [21] "; 35.10889" "; 66.83333" "; 16.367" "; 43.317" "; -112.550"
## [26] "; 28.750" "; 36.50361" "; 21.500" "; 64.51611" "; 20.26556"
## [31] "; -8.383" "; 158.33083" "; -12.667" "; 28.500" "; 35.23417"
## [36] "; 44.20806" "; 9.50000" "; -4.55500" "; 35.13056" "; 48.62667"
## [41] "; 49.700" "; -84.67500" "; 10.333" "; 43.867" "; 37.40000"
## [46] "; 38.26667" "; -2.99944" "; -0.04444" "; 32.55333" "; 101.500"
## [51] "; 29.167" "; 23.13056"
x_coords <- as.numeric(str_sub(x_coords, 3, -1))
x_coords
## [1] 29.66389 8.00000 37.16700 36.48167 36.30639 36.84417
## [7] 21.85833 14.29306 12.48500 43.26250 -79.07500 -65.75306
## [13] -69.66700 36.26306 67.82667 160.18306 -80.93300 -79.65583
## [19] 29.25000 42.71833 35.10889 66.83333 16.36700 43.31700
## [25] -112.55000 28.75000 36.50361 21.50000 64.51611 20.26556
## [31] -8.38300 158.33083 -12.66700 28.50000 35.23417 44.20806
## [37] 9.50000 -4.55500 35.13056 48.62667 49.70000 -84.67500
## [43] 10.33300 43.86700 37.40000 38.26667 -2.99944 -0.04444
## [49] 32.55333 101.50000 29.16700 23.13056
danger_table$x_coords <- x_coords
danger_table$location <- NULL
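The coordinate extraction above can be traced on individual strings. The sketch below applies the same two patterns with base R functions to two location strings that have been simplified to plain ASCII for the example (the real strings also contain degree signs and invisible characters); the names `location`, `lat`, and `lon` are assumptions.

```r
# Simplified location strings: decimal coordinates follow the slash as
# "latitude; longitude". The second one has negative values.
location <- c(
  "Abusir, Egypt 30.84167N 29.66389E / 30.84167; 29.66389 (Abu Mena)",
  "Peru 8.11111S 79.07500W / -8.11111; -79.07500 (Chan Chan)"
)

# Same patterns as in the notes: a slash (or semicolon), optional spaces
# or minus sign, digits, an optional dot, and more digits.
reg_y <- "[/][ -]*[[:digit:]]*[.]*[[:digit:]]*[;]"
reg_x <- "[;][ -]*[[:digit:]]*[.]*[[:digit:]]*"

y_raw <- regmatches(location, regexpr(reg_y, location))
x_raw <- regmatches(location, regexpr(reg_x, location))

# Strip the leading "/ " and trailing ";" from the latitude match,
# and the leading "; " from the longitude match.
lat <- as.numeric(substr(y_raw, 3, nchar(y_raw) - 1))
lon <- as.numeric(substr(x_raw, 3, nchar(x_raw)))

print(lat)
print(lon)
```

Anchoring the latitude pattern on the slash and the closing semicolon is what keeps it from matching the earlier degree-style coordinates in the same string.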
danger_table
## name
## 2 Abu Mena
## 3 Air and Tenere Natural Reserves
## 4 Ancient City of Aleppo
## 5 Ancient City of Bosra
## 6 Ancient City of Damascus
## 7 Ancient Villages of Northern Syria
## 8 Archaeological Site of Cyrene
## 9 Archaeological Site of Leptis Magna
## 10 Archaeological Site of Sabratha
## 11 Ashur (Qal'at Sherqat)
## 12 Chan Chan Archaeological Zone
## 13 City of Potosi
## 14 Coro and its Port
## 15 Crac des Chevaliers and Qal’at Salah El-Din
## 16 Cultural Landscape and Archaeological Remains of the Bamiyan Valley
## 17 East Rennell
## 18 Everglades National Park
## 19 Fortifications on the Caribbean Side of Panama: Portobelo-San Lorenzo
## 20 Garamba National Park
## 21 Hatra
## 22 Hebron/Al-Khalil Old Town
## 23 Historic Centre of Shakhrisyabz
## 24 Historic Centre of Vienna
## 25 Historic Town of Zab<U+012B>d
## 26 Islands and Protected Areas of the Gulf of California
## 27 Kahuzi-Biega National Park
## 28 Lake Turkana National Parks
## 29 Manovo-Gounda St Floris National Park
## 30 Minaret and Archaeological Remains of Jam
## 31 Medieval Monuments in Kosovo
## 32 Mount Nimba Strict Nature Reserve
## 33 Nan Madol: Ceremonial Centre of Eastern Micronesia
## 34 Niokolo-Koba National Park
## 35 Okapi Wildlife Reserve
## 36 Old City of Jerusalem and its Walls
## 37 Old City of Sana'a
## 38 Old Town of Ghadames
## 39 Old Towns of Djenne
## 40 Palestine: Land of Olives and Vines <U+2013> Cultural Landscape of Southern Jerusalem, Battir
## 41 Old Walled City of Shibam
## 42 Rainforests of the Atsinanana
## 43 Rio Platano Biosphere Reserve
## 44 Rock-Art Sites of Tadrart Acacus
## 45 Samarra Archaeological City
## 46 Selous Game Reserve
## 47 Site of Palmyra
## 48 Timbuktu
## 49 Tomb of Askia
## 50 Tombs of Buganda Kings at Kasubi
## 51 Tropical Rainforest Heritage of Sumatra
## 52 Virunga National Park
## 53 Ro<U+0219>ia Montan<U+0103> Mining Landscape
## criterion year_des year_end y_coords x_coords
## 2 Cultural 1979 2001 30.84167 29.66389
## 3 Natural 1991 1992 18.28300 8.00000
## 4 Cultural 1986 2013 36.23300 37.16700
## 5 Cultural 1980 2013 32.51806 36.48167
## 6 Cultural 1979 2013 33.51139 36.30639
## 7 Cultural 2011 2013 36.33417 36.84417
## 8 Cultural 1982 2016 32.82500 21.85833
## 9 Cultural 1982 2016 32.63833 14.29306
## 10 Cultural 1982 2016 32.80528 12.48500
## 11 Cultural 2003 2003 35.45667 43.26250
## 12 Cultural 1986 1986 -8.11111 -79.07500
## 13 Cultural 1987 2014 -19.58361 -65.75306
## 14 Cultural 1993 2005 11.41700 -69.66700
## 15 Cultural 2006 2013 34.78167 36.26306
## 16 Cultural 2003 2003 34.83194 67.82667
## 17 Natural 1998 2013 -11.68306 160.18306
## 18 Natural 1979 1993 25.31700 -80.93300
## 19 Cultural 1980 2012 9.55389 -79.65583
## 20 Natural 1980 1984 4.00000 29.25000
## 21 Cultural 1985 2015 35.58806 42.71833
## 22 Cultural 2017 2017 31.52417 35.10889
## 23 Cultural 2000 2016 39.05000 66.83333
## 24 Cultural 2001 2017 48.20000 16.36700
## 25 Cultural 1993 2000 14.20000 43.31700
## 26 Natural 2005 2019 27.63300 -112.55000
## 27 Natural 1980 1997 -2.50000 28.75000
## 28 Natural 1997 2018 3.05222 36.50361
## 29 Natural 1988 1997 9.00000 21.50000
## 30 Cultural 2002 2002 34.39667 64.51611
## 31 Cultural 2004 2006 42.66111 20.26556
## 32 Natural 1981 1992 7.60000 -8.38300
## 33 Cultural 2016 2016 6.83972 158.33083
## 34 Natural 1981 2007 13.00000 -12.66700
## 35 Natural 1996 1997 2.00000 28.50000
## 36 Cultural 1981 1982 31.77667 35.23417
## 37 Cultural 1986 2015 15.35556 44.20806
## 38 Cultural 1986 2016 30.13333 9.50000
## 39 Cultural 1988 2016 13.90639 -4.55500
## 40 Cultural 2014 2014 31.71972 35.13056
## 41 Cultural 1982 2015 15.92694 48.62667
## 42 Natural 2007 2010 -14.46700 49.70000
## 43 Natural 1982 1996 15.74444 -84.67500
## 44 Cultural 1985 2016 24.83300 10.33300
## 45 Cultural 2007 2007 34.20000 43.86700
## 46 Natural 1982 2014 -9.00000 37.40000
## 47 Cultural 1980 2013 34.55417 38.26667
## 48 Cultural 1988 2012 16.77333 -2.99944
## 49 Cultural 2004 2012 16.28972 -0.04444
## 50 Cultural 2001 2010 0.32917 32.55333
## 51 Natural 2004 2011 -2.50000 101.50000
## 52 Natural 1979 1994 0.91700 29.16700
## 53 Cultural 2021 2021 46.30611 23.13056
round(danger_table$y_coords, 2)[1:3]
## [1] 30.84 18.28 36.23
round(danger_table$x_coords, 2)[1:3]
## [1] 29.66 8.00 37.17
length(danger_table$y_coords)
## [1] 52
length(danger_table$x_coords)
## [1] 52
dim(danger_table)
## [1] 52 6
head(danger_table)
## name criterion year_des year_end y_coords
## 2 Abu Mena Cultural 1979 2001 30.84167
## 3 Air and Tenere Natural Reserves Natural 1991 1992 18.28300
## 4 Ancient City of Aleppo Cultural 1986 2013 36.23300
## 5 Ancient City of Bosra Cultural 1980 2013 32.51806
## 6 Ancient City of Damascus Cultural 1979 2013 33.51139
## 7 Ancient Villages of Northern Syria Cultural 2011 2013 36.33417
## x_coords
## 2 29.66389
## 3 8.00000
## 4 37.16700
## 5 36.48167
## 6 36.30639
## 7 36.84417
pch <- ifelse(danger_table$criterion == "Natural", 19, 2)
map("world", col = "darkgrey", lwd = 0.5, mar = c(0.1, 0.1, 0.1, 0.1))
points(danger_table$x_coords, danger_table$y_coords, pch = pch)
box()
table(danger_table$criterion)
##
## Cultural Natural
## 36 16
hist(danger_table$year_end, freq=TRUE,
xlab = "Year when site was put on the list of endangered sites",
main = "")
duration <- danger_table$year_end - danger_table$year_des
hist(duration, freq = TRUE,
xlab = "Years it took to become an endangered site",
main = "")