Introduction
My data set contains information about the UNESCO World Heritage Sites around the globe. I obtained it from Kaggle (https://www.kaggle.com/ujwalkandi/unesco-world-heritage-sites?select=whc-sites-2019+-+Copy.xls). It was created in 2019 by Ujwal Kandi.
For quantitative variables I use latitude, longitude and date_inscribed; for categorical variables I use category and region_en.

Organizing the data
I had some issues with the quantitative variables. I labeled the graphs using main, xlab and ylab, and I used # for comments. I give a brief description under each header throughout the project.

knitr::opts_chunk$set(echo = TRUE)
df<- read.csv("whc-sites-2019.csv")
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.4     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.1     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

1

# Mean of the latitude values.
mean(df$latitude)
## [1] 28.94847
#The standard deviation.
sd(df$latitude)
## [1] 23.69288
#Five number summary
summary(df$latitude)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -54.59   17.48   36.10   28.95   45.77   71.19

Graphical Display

Below are several graphical displays. A histogram, a box plot and a Q-Q plot are shown for latitude, which is one of the quantitative variables. The main purpose of a Q-Q plot is to assess normality. A histogram is a second-best option (after a normal probability plot) for assessing normality. A box plot's main purpose is to show the quartiles and any outliers.

hist(df$latitude, main = "Histogram of Latitude values", xlab = "Latitude (degrees)", col = "red")

boxplot(df$latitude, main= "Box plot showing Latitude", xlab= "Quartile", ylab = "latitude")

qqnorm(df$latitude)
qqline(df$latitude, col = "red")

There are some outliers at the low end of the box plot, far from the rest of the values. The distribution is negatively skewed, which agrees with the mean (28.95) lying below the median (36.10).
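The outlier claim can be checked against the usual box plot rule, which flags points more than 1.5 times the interquartile range beyond the quartiles. The sketch below uses only the rounded values printed in the five-number summary above; the variable names are illustrative and not part of the original analysis.

```r
# Quartiles and extremes copied from the summary(df$latitude) output above.
q1 <- 17.48
q3 <- 45.77
min_lat <- -54.59
max_lat <- 71.19

# Standard 1.5 * IQR fences used by boxplot() with its default range.
iqr <- q3 - q1
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

c(lower = lower_fence, upper = upper_fence)
min_lat < lower_fence   # TRUE: at least one low outlier
max_lat > upper_fence   # FALSE: no high outliers
```

With these numbers the lower fence is about -24.96, so the minimum of -54.59 falls outside it, while the maximum of 71.19 stays inside the upper fence.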

2

Graphical display looking at longitude and latitude and their correlation

plot(df$latitude, df$longitude)

cor(df$latitude, df$longitude)
## [1] -0.01570684

The correlation coefficient measures how closely the points in a scatter plot follow a straight line; it ranges from -1 to 1. A value of about -0.016 is very close to 0, so there is essentially no linear association between latitude and longitude.
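To make the definition concrete, here is a minimal sketch that computes the Pearson correlation by hand and compares it with cor(). The vectors x and y are made-up toy data, not values from the heritage data set.

```r
# Toy data, assumed for illustration only.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# Pearson r: sum of products of deviations, scaled by both spreads.
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

r_manual
all.equal(r_manual, cor(x, y))
```

Here the points rise together fairly consistently, so r is 0.8; the latitude and longitude pair above behaves very differently, with r near 0.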

3

Table

Below are a frequency table and a relative frequency table. They list each region together with the number (or proportion) of World Heritage Sites in that region.

Frequency Table

table(df$region_en)
## 
##                                                                        Africa 
##                                                                            96 
##                                                                   Arab States 
##                                                                            86 
##                                                          Asia and the Pacific 
##                                                                           266 
##                                                      Europe and North America 
##                                                                           528 
##                                 Europe and North America,Asia and the Pacific 
##                                                                             2 
## Europe and North America,Asia and the Pacific,Latin America and the Caribbean 
##                                                                             1 
##                                               Latin America and the Caribbean 
##                                                                           142

Relative Frequency Table

table(df$region_en)/length(df$region_en)
## 
##                                                                        Africa 
##                                                                  0.0856378234 
##                                                                   Arab States 
##                                                                  0.0767172168 
##                                                          Asia and the Pacific 
##                                                                  0.2372881356 
##                                                      Europe and North America 
##                                                                  0.4710080285 
##                                 Europe and North America,Asia and the Pacific 
##                                                                  0.0017841213 
## Europe and North America,Asia and the Pacific,Latin America and the Caribbean 
##                                                                  0.0008920607 
##                                               Latin America and the Caribbean 
##                                                                  0.1266726137
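As an alternative to dividing by length(), base R's prop.table() turns counts into proportions directly. The sketch below re-enters the counts printed in the frequency table above, so it runs without the CSV file; the object names are illustrative.

```r
# Counts copied from the frequency table above.
region_counts <- c(
  "Africa" = 96,
  "Arab States" = 86,
  "Asia and the Pacific" = 266,
  "Europe and North America" = 528,
  "Europe and North America,Asia and the Pacific" = 2,
  "Europe and North America,Asia and the Pacific,Latin America and the Caribbean" = 1,
  "Latin America and the Caribbean" = 142
)

# prop.table() divides each count by the total.
region_share <- prop.table(region_counts)

sum(region_counts)            # total number of sites
round(region_share * 100, 1)  # percentages, easier to read than raw proportions
```

On the original data frame the equivalent call would be prop.table(table(df$region_en)).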

4

Two-way table

A two-way table is made below for two categorical variables, category and region. It shows how many Cultural, Mixed and Natural heritage sites there are in each of the 7 region groupings.

two_way_table <- table(df$category,df$region_en)
two_way_table
##           
##            Africa Arab States Asia and the Pacific Europe and North America
##   Cultural     53          78                  189                      452
##   Mixed         5           3                   12                       11
##   Natural      38           5                   65                       65
##           
##            Europe and North America,Asia and the Pacific
##   Cultural                                             0
##   Mixed                                                0
##   Natural                                              2
##           
##            Europe and North America,Asia and the Pacific,Latin America and the Caribbean
##   Cultural                                                                             1
##   Mixed                                                                                0
##   Natural                                                                              0
##           
##            Latin America and the Caribbean
##   Cultural                              96
##   Mixed                                  8
##   Natural                               38
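Raw counts are hard to compare across regions of very different size, so a possible next step is to convert each column to proportions. This sketch re-enters the counts from the table above for the five single-region columns only (the two combined-region columns are left out for brevity); the object names are illustrative.

```r
# Counts copied from the two-way table above (single-region columns only).
counts <- matrix(
  c(53, 78, 189, 452, 96,
     5,  3,  12,  11,  8,
    38,  5,  65,  65, 38),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    category = c("Cultural", "Mixed", "Natural"),
    region = c("Africa", "Arab States", "Asia and the Pacific",
               "Europe and North America", "Latin America and the Caribbean")
  )
)

# margin = 2 makes each column (region) sum to 1.
col_share <- prop.table(counts, margin = 2)
round(col_share, 3)
```

Read column by column, this shows the share of each category within a region, for example how the Natural share in Africa compares with the Natural share in Europe and North America.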

5

Side-By-Side Plot

Here I use a box plot of one quantitative variable grouped by one categorical variable to present a side-by-side plot. The x-axis shows the category of heritage site and the y-axis shows the year in which each site was inscribed.

boxplot(df$date_inscribed ~ df$category, col="orange", main="Year Inscribed by Category", ylab="Year inscribed", xlab="Category")

6

BarPlot

One quantitative variable, the latitude of the heritage sites, is used for the bar plot below. Each bar represents one site.

barplot(df$latitude,  main = "Latitude Chart", xlab = "", ylab = "latitude")

Scatter Plot of Longitude Value

scatter.smooth(df$longitude, main = "Scatter Plot")

HEAT MAP

data <- read.csv("whc-sites-2019.csv", header = TRUE)
data <- data.matrix(data[,-1])
library(RColorBrewer)
heatmap(t(data),
        main = "Heat Map",
        Rowv = NA,
        Colv = NA,
        col = colorRampPalette(brewer.pal(8, "PiYG"))(25),
        scale = "column")

Conclusion
The most interesting feature of my data is its graphical analysis. By plotting graphs we can see how different variables relate to each other. As this is my first time exploring and analyzing data in RStudio, I got a lot of ideas about how we can manipulate data according to our needs.

DECISION TREE

A decision tree uses a tree-like model of decisions and their possible outcomes. Here I treat the category variable as a factor. Since my data has a lot of written description, I had to reduce it to a small random sample of 35 rows so R won’t crash. I load two libraries, ‘rpart’ and ‘rpart.plot’, to build and plot the decision tree.

library(rpart)
library(rpart.plot)
smalldf <- sample_n(df,35)
tree <- rpart(category ~ region_en + states_name_en + danger + date_inscribed + category_short , data = smalldf)
tree
## n= 35 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 35 4 Cultural (0.8857143 0.1142857)  
##   2) states_name_en=Andorra,Austria,Austria,Hungary,Belarus,Estonia,Finland,Latvia,Lithuania,Norway,Republic of Moldova,Russian Federation,Sweden,Ukraine,Brazil,Burkina Faso,Cuba,Ethiopia,France,India,Indonesia,Italy,Japan,Kenya,Libya,Mexico,Myanmar,Poland,Portugal,Russian Federation,Senegal,South Africa,Spain,Sweden,Syrian Arab Republic,Turkey 28 0 Cultural (1.0000000 0.0000000) *
##   3) states_name_en=Algeria,Belize,Democratic Republic of the Congo,Iran (Islamic Republic of),Madagascar 7 3 Natural (0.4285714 0.5714286) *
rpart.plot(tree, extra = 2)

To make predictions, I apply the tree I have created to the same sample.

pred <- predict(tree, smalldf, type = "class")
head(pred)
##        1        2        3        4        5        6 
## Cultural Cultural  Natural Cultural Cultural Cultural 
## Levels: Cultural Natural

Each site has been assigned a predicted category. Without type = "class", predict returns the class probabilities behind those predictions:

predict(tree, smalldf) %>%
  head()
##    Cultural   Natural
## 1 1.0000000 0.0000000
## 2 1.0000000 0.0000000
## 3 0.4285714 0.5714286
## 4 1.0000000 0.0000000
## 5 1.0000000 0.0000000
## 6 1.0000000 0.0000000
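The predicted labels and the probabilities above are two views of the same result: for each row, the predicted class is the column with the largest probability. The sketch below re-enters the six probability rows printed above and recovers the labels with max.col(); the matrix name is illustrative.

```r
# Probabilities copied from the predict(tree, smalldf) output above.
probs <- rbind(
  c(1.0000000, 0.0000000),
  c(1.0000000, 0.0000000),
  c(0.4285714, 0.5714286),
  c(1.0000000, 0.0000000),
  c(1.0000000, 0.0000000),
  c(1.0000000, 0.0000000)
)
colnames(probs) <- c("Cultural", "Natural")

# For each row, pick the column holding the largest probability.
labels_from_probs <- colnames(probs)[max.col(probs, ties.method = "first")]
labels_from_probs
```

The result matches the head(pred) output: only the third site, whose Natural probability is 0.57, is labelled Natural.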

Confusion Table

Confusion Table is presented below:

confusion_table <- with(smalldf, table(category, pred))
confusion_table
##           pred
## category   Cultural Natural
##   Cultural       28       3
##   Natural         0       4
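A confusion table is usually summarised by its accuracy, the share of rows on the diagonal. The sketch below re-enters the four counts printed above; note that these predictions were made on the same 35 rows used to fit the tree, so the figure is a training accuracy and is likely optimistic.

```r
# Counts copied from the confusion table above (rows = actual, columns = predicted).
confusion <- matrix(
  c(28, 3,
     0, 4),
  nrow = 2, byrow = TRUE,
  dimnames = list(category = c("Cultural", "Natural"),
                  pred = c("Cultural", "Natural"))
)

# Correct predictions sit on the diagonal.
accuracy <- sum(diag(confusion)) / sum(confusion)
round(accuracy, 3)
```

With 32 of 35 sites classified correctly, the training accuracy is about 0.914; the three errors are all Cultural sites predicted as Natural.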

Cross Validation

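Separating the data into one set for training and another for testing is called a holdout split; cross validation repeats this kind of split over several folds. Below I use a single split, with about 66% of the rows for training.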
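For contrast with a single train/test split, the fold assignment behind k-fold cross validation can be sketched in base R. The values n = 35 and k = 5 are assumptions chosen to match the size of the sample used here; no model is fitted, the loop only shows which rows would be held out in each round.

```r
set.seed(1)   # assumed seed, only so the fold assignment is repeatable
n <- 35       # number of rows, matching the sample size used in this report
k <- 5        # assumed number of folds

# Give every row a fold label 1..k, then shuffle the labels.
fold_id <- sample(rep(seq_len(k), length.out = n))

fold_sizes <- table(fold_id)
fold_sizes

for (i in seq_len(k)) {
  test_rows <- which(fold_id == i)    # held out in round i
  train_rows <- which(fold_id != i)   # used to fit the model in round i
  cat("fold", i, ":", length(train_rows), "train rows,",
      length(test_rows), "test rows\n")
}
```

Every row is used for testing exactly once, which makes better use of a small sample than one fixed split.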

library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
inTrain <- createDataPartition(y = smalldf$category, p = .66, list = FALSE)
smalldf_train <- smalldf %>% slice(inTrain)
smalldf_test <- smalldf %>% slice(-inTrain)
dim(smalldf_train)
## [1] 24 22
dim(smalldf_test)
## [1] 11 22

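I used the training set to build my model, and I removed states_name_en from my tree. Note that the prediction below is made on smalldf_train, so the table covers the 24 training rows rather than the 11 test rows.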

tree_from_train <- rpart(category ~.,data = subset(smalldf_train, select=c( -states_name_en)))
pred_test <- predict(tree_from_train, subset(smalldf_train, select=c( -states_name_en)), type = "class")
with(smalldf_train, table(category, pred_test))
##           pred_test
## category   Cultural Natural
##   Cultural       21       0
##   Natural         3       0
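The table above is computed on the training rows, so it does not yet show how the tree performs on unseen data. A minimal sketch of evaluating on the held-out rows is given below. It uses the built-in iris data and a base R split so that it runs without the heritage CSV; the seed, the 66% split and all object names are assumptions for illustration, not the original workflow.

```r
library(rpart)

set.seed(42)  # assumed seed, for a repeatable split
n <- nrow(iris)
train_idx <- sample(seq_len(n), size = round(0.66 * n))
train_set <- iris[train_idx, ]
test_set <- iris[-train_idx, ]

# Fit on the training rows only.
fit <- rpart(Species ~ ., data = train_set, method = "class")

# Predict on the held-out rows, which the tree has not seen.
pred_holdout <- predict(fit, newdata = test_set, type = "class")

holdout_table <- table(actual = test_set$Species, predicted = pred_holdout)
holdout_table
holdout_accuracy <- sum(diag(holdout_table)) / sum(holdout_table)
holdout_accuracy
```

Applied to this report, the same pattern would pass smalldf_test (without states_name_en) as the new data for tree_from_train.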

I have made a full tree below. It uses only 25 rows because, with so much data, I had to cut my data down.

smalldf_no_States <- subset(smalldf, select=c( -states_name_en))
tree_full <- sample_n(smalldf_no_States,25) %>% 
  rpart(category ~., data = ., control = rpart.control(minsplit = 2, cp = 0))
rpart.plot(tree_full, extra = 2, roundint=FALSE,
  box.palette = list( "Gn", "Bu")) 

I couldn’t make predictions with this tree because the call produced an error. That is why I have kept it commented out below:

#pred_full <- predict(tree_full, smalldf_no_States, type = "class")
#with(smalldf, table(region_en, pred_full))
imp <- varImp(tree)
head(imp)
##                  Overall
## date_inscribed 0.3164835
## region_en      0.8634921
## states_name_en 3.6571429
## danger         0.0000000
## category_short 0.0000000
imp %>% ggplot(aes(x = row.names(imp), weight = Overall)) +
  geom_bar()

barplot(imp$Overall)

Chi-squared statistic

library(FSelector)
weights <- smalldf %>% chi.squared(category ~ ., data = .) %>%
  as_tibble(rownames = "feature") %>%
  arrange(desc(attr_importance))
weights
## # A tibble: 21 × 2
##    feature              attr_importance
##    <chr>                          <dbl>
##  1 name_en                        1    
##  2 short_description_en           1    
##  3 date_end                       1    
##  4 criteria_txt                   1    
##  5 category_short                 1    
##  6 states_name_en                 0.901
##  7 iso_code                       0.901
##  8 udnp_code                      0.901
##  9 danger_list                    0.853
## 10 area_hectares                  0.717
## # … with 11 more rows
ggplot(weights,
  aes(x = attr_importance, y = reorder(feature, attr_importance))) +
  geom_bar(stat = "identity") +
  xlab("Importance score") + ylab("Feature")

Extra tree

tree1 <- rpart(date_inscribed ~ category + danger + region_en,data = smalldf, method = 'class')
tree1
## n= 35 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 35 32 2001 (0.057 0.029 0.029 0.029 0.057 0.029 0.029 0.057 0.057 0.057 0.086 0.029 0.057 0.057 0.086 0.029 0.029 0.029 0.057 0.029 0.029 0.029 0.029)  
##   2) region_en=Africa,Arab States,Europe and North America 24 21 2001 (0.083 0.042 0.042 0.042 0 0.042 0.042 0.042 0 0.083 0.12 0.042 0.083 0.042 0.083 0.042 0.042 0 0.083 0 0 0 0.042)  
##     4) region_en=Africa,Arab States 10  8 1980 (0.2 0.1 0 0.1 0 0 0 0.1 0 0 0.1 0 0.1 0 0 0.1 0.1 0 0.1 0 0 0 0) *
##     5) region_en=Europe and North America 14 12 2000 (0 0 0.071 0 0 0.071 0.071 0 0 0.14 0.14 0.071 0.071 0.071 0.14 0 0 0 0.071 0 0 0 0.071) *
##   3) region_en=Asia and the Pacific,Latin America and the Caribbean 11  9 1991 (0 0 0 0 0.18 0 0 0.091 0.18 0 0 0 0 0.091 0.091 0 0 0.091 0 0.091 0.091 0.091 0) *
rpart.plot(tree1, extra = 2)

tail(smalldf)
##    category                   states_name_en                region_en
## 30 Cultural                           Turkey Europe and North America
## 31 Cultural                          Austria Europe and North America
## 32 Cultural                            India     Asia and the Pacific
## 33  Natural Democratic Republic of the Congo                   Africa
## 34 Cultural                            Italy Europe and North America
## 35 Cultural                          Algeria              Arab States
##    unique_number id_no rev_bis
## 30           729   614        
## 31          1206  1033        
## 32          1947   247     Rev
## 33           849   718        
## 34          1196  1024     Rev
## 35           212   191        
##                                                         name_en
## 30                                           City of Safranbolu
## 31                                    Historic Centre of Vienna
## 32                                      Hill Forts of Rajasthan
## 33                                       Okapi Wildlife Reserve
## 34 Late Baroque Towns of the Val di Noto (South-Eastern Sicily)
## 35                                                      Djémila
##    short_description_en
## 30 From the 13th century to the advent of the railway in the early 20th century, Safranbolu was an important caravan station on the main East–West trade route. The Old Mosque, Old Bath and Süleyman Pasha Medrese were built in 1322. During its apogee in the 17th century, Safranbolu's architecture influenced urban development throughout much of the Ottoman Empire. 
## 31 Vienna developed from early Celtic and Roman settlements into a Medieval and Baroque city, the capital of the Austro-Hungarian Empire. It played an essential role as a leading European music centre, from the great age of Viennese Classicism through the early part of the 20th century. The historic centre of Vienna is rich in architectural ensembles, including Baroque castles and gardens, as well as the late-19th-century Ringstrasse lined with grand buildings, monuments and parks. 
## 32 The serial site, situated in the state of Rajastahan, includes six majestic forts in Chittorgarh; Kumbhalgarh; Sawai Madhopur; Jhalawar; Jaipur, and Jaisalmer. The ecclectic architecture of the forts, some up to 20 kilometres in circumference, bears testimony to the power of the Rajput princely states that flourished in the region from the 8th to the 18th centuries. Enclosed within defensive walls are major urban centres, palaces, trading centres and other buildings including temples that often predate the fortifications within which developed an elaborate courtly culture that supported learning, music and the arts. Some of the urban centres enclosed in the fortifications have survived, as have many of the site's temples and other sacred buildings. The forts use the natural defenses offered by the landscape: hills, deserts, rivers, and dense forests. They also feature extensive water harvesting structures, largely still in use today. 
## 33 The Okapi Wildlife Reserve occupies about one-fifth of the Ituri forest in the north-east of the Democratic Republic of the Congo. The Congo river basin, of which the reserve and forest are a part, is one of the largest drainage systems in Africa. The reserve contains threatened species of primates and birds and about 5,000 of the estimated 30,000 okapi surviving in the wild. It also has some dramatic scenery, including waterfalls on the Ituri and Epulu rivers. The reserve is inhabited by traditional nomadic pygmy Mbuti and Efe hunters. 
## 34 The eight towns in south-eastern Sicily: Caltagirone, Militello Val di Catania, Catania, Modica, Noto, Palazzolo, Ragusa and Scicli, were all rebuilt after 1693 on or beside towns existing at the time of the earthquake which took place in that year. They represent a considerable collective undertaking, successfully carried out at a high level of architectural and artistic achievement. Keeping within the late Baroque style of the day, they also depict distinctive innovations in town planning and urban building. 
## 35 Situated 900 m above sea-level, Dj&eacute;mila, or Cuicul, with its forum, temples, basilicas, triumphal arches and houses, is an interesting example of Roman town planning adapted to a mountain location. 
##    justification_en
## 30 
## 31 <em>Criterion (ii):</em> The urban and architectural qualities of the Historic Centre of Vienna bear outstanding witness to a continuing interchange of values throughout the second millennium. \n <em>Criterion (iv):</em> Three key periods of European cultural and political development – the Middle Ages, the Baroque period, and the Gründerzeit – are exceptionally well illustrated by the urban and architectural heritage of the Historic Centre of Vienna. \n <em>Criterion (vi):</em> Since the 16th century Vienna has been universally acknowledged to be the musical capital of Europe. 
## 32 
## 33 The Committee inscribed the property as one of the most important sites for conservation, including the rare Okapi and rich floral diversity, under natural <em>criterion (x)</em>. The Committee expressed its hope that the activities outlined in the new management plan would ensure the integrity of the site. Considering the civil unrest in the country, the question of the long-term security of the site was raised. 
## 34 <em>Criterion (i):</em> This group of towns in south-eastern Sicily provides outstanding testimony to the exuberant genius of late Baroque art and architecture. \n <em>Criterion (ii):</em> The towns of the Val di Noto represent the culmination and final flowering of Baroque art in Europe. \n <em>Criterion (iv):</em> The exceptional quality of the late Baroque art and architecture in the Val di Noto lies in its geographical and chronological homogeneity, as well as its quantity, the result of the 1693 earthquake in this region. \n <em>Criterion (v):</em> The eight towns of south-eastern Sicily that make up this nomination, which are characteristic of the settlement pattern and urban form of this region, are permanently at risk from earthquakes and eruptions of Mount Etna. 
## 35 
##    date_inscribed secondary_dates danger date_end danger_list longitude
## 30           1994                      0       NA              32.68972
## 31           2001                      1       NA      Y 2017  16.38333
## 32           2013                      0       NA              74.64611
## 33           1996                      1       NA      Y 1997  28.50000
## 34           2002                      0       NA              15.06892
## 35           1982                      0       NA               5.73667
##    latitude area_hectares   criteria_txt category_short iso_code udnp_code
## 30 41.26000        193.00    (ii)(iv)(v)              C       tr       tur
## 31 48.21667        371.00   (ii)(iv)(vi)              C       at       aut
## 32 24.88333            NA      (ii)(iii)              C       in       ind
## 33  2.00000    1372625.00            (x)              N       cd       cod
## 34 36.89319        112.79 (i)(ii)(iv)(v)              C       it       ita
## 35 36.32056         30.60      (iii)(iv)              C       dz       dza
##    transboundary
## 30             0
## 31             0
## 32             0
## 33             0
## 34             0
## 35             0
library(arules)
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## Attaching package: 'arules'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(arulesViz)
transactions(smalldf)
## Warning: Column(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
## 18, 19, 20, 21, 22 not logical or factor. Applying default discretization (see
## '? discretizeDF').
## Warning in discretize(x = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, : The calculated breaks are: 0, 0, 0, 1
##   Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
## Warning in discretize(x = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, : The calculated breaks are: 0, 0, 0, 1
##   Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
## transactions in sparse format with
##  35 transactions (rows) and
##  248 items (columns)
colnames(smalldf)[c(1,2,3,4,10,12)]
## [1] "category"       "states_name_en" "region_en"      "unique_number" 
## [5] "date_inscribed" "danger"
smalldf <- smalldf %>% mutate(
  danger = (danger > 0),
  date_inscribed = (date_inscribed >0)
)
trans <- transactions(smalldf)
## Warning: Column(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 14, 15, 16, 17, 18,
## 19, 20, 21, 22 not logical or factor. Applying default discretization (see '?
## discretizeDF').
## Warning in discretize(x = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, : The calculated breaks are: 0, 0, 0, 1
##   Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.