Jean-Luc BELLIER
February 25th, 2017
The European Heritage Days is an annual cultural event that has taken place since 1991. For one weekend, it gives everyone the opportunity to discover and visit monuments, or parts of them, that are usually closed to the public.
This event takes place in around 50 countries in Europe and also in some non-European countries. It was initiated by the French Ministry of Culture in 1984.
During the weekend of September 17th-18th, 2016, there were more than 12.5 million visitors at about:
The project consists of two elements:
The interactive application has an input panel with two dropdown lists. The output contains one table with the main information on the events, and a map for locating them.
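The layout described above (two dropdown lists, one table, one map) could be sketched as follows. This is only an illustration, not the project's actual code: it assumes the application is built with the shiny and leaflet packages, which the text does not state, and the sample data, the column names (Region, Type, Title) and the meaning of the two dropdowns are hypothetical.

```r
library(shiny)
library(leaflet)

# Hypothetical sample data; the real column names are not given in the text.
events <- data.frame(
  Region    = c("Bretagne", "Bretagne", "Occitanie"),
  Type      = c("Guided tour", "Exhibition", "Guided tour"),
  Title     = c("Lighthouse visit", "Maritime exhibition", "Abbey tour"),
  Latitude  = c(48.39, 47.66, 43.60),
  Longitude = c(-4.49, -2.76, 1.44),
  stringsAsFactors = FALSE
)

# Input panel with two dropdown lists, output with one table and one map.
ui <- fluidPage(
  titlePanel("European Heritage Days 2016"),
  sidebarLayout(
    sidebarPanel(
      selectInput("region", "Region", choices = sort(unique(events$Region))),
      selectInput("type", "Event type", choices = sort(unique(events$Type)))
    ),
    mainPanel(
      tableOutput("event_table"),
      leafletOutput("event_map")
    )
  )
)

server <- function(input, output, session) {
  # Events matching both dropdown selections.
  selected <- reactive({
    events[events$Region == input$region & events$Type == input$type, , drop = FALSE]
  })

  output$event_table <- renderTable(selected())

  output$event_map <- renderLeaflet({
    df <- selected()
    req(nrow(df) > 0)  # draw the map only when at least one event matches
    leaflet(df) %>%
      addTiles() %>%
      addMarkers(lng = ~Longitude, lat = ~Latitude, popup = ~Title)
  })
}

# Build the app object; launch it interactively with runApp(app).
app <- shinyApp(ui, server)
```

Assigning the result of shinyApp() to a variable keeps the script from launching a server when it is merely sourced.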
The application can be found at the following address: European Heritage Days 2016.
The code of the project can be found on GitHub: Project Source
library(tidyr)
library(dplyr)
# load the raw events file (semicolon-separated, UTF-8)
Journees_Patr_2016 <- read.csv("Journees_europeennes_du_patrimoine_20160914.csv", encoding = "UTF-8", sep = ";", header = TRUE, stringsAsFactors = FALSE)
dim(Journees_Patr_2016)
## [1] 23624 122
nb_obs <- nrow(Journees_Patr_2016)
nb_col <- ncol(Journees_Patr_2016)
# remove empty columns (columns in which every value is NA)
check_NA <- colSums(is.na(Journees_Patr_2016))
cols_to_remove <- unname(which(check_NA == nb_obs))
clean_Jour_Pat_2016 <- select(Journees_Patr_2016, -all_of(cols_to_remove))
dim(clean_Jour_Pat_2016)
## [1] 23624 60
# keep only the events that have both coordinates, so they can be placed on the map
clean_Jour_Pat_2016_2 <- clean_Jour_Pat_2016 %>% filter(!is.na(Latitude) & !is.na(Longitude))
dim(clean_Jour_Pat_2016_2)
## [1] 10591 60
The data comes from French open data. It lists all the events and monuments of the September 2016 edition.
Half of the columns (62 of 122) are completely empty. In addition, the number of rows can be significantly reduced by dropping the rows that have no latitude or no longitude. This is needed to ensure that every remaining event can be plotted on the map. It removes slightly more than half of the rows (from 23,624 down to 10,591).
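The two cleaning steps can be tried on a small made-up data frame. This is a minimal sketch in base R; the `events` data and its column names other than Latitude and Longitude are invented for the illustration and are not taken from the real file.

```r
# Toy data frame standing in for the real CSV (all values are made up).
events <- data.frame(
  Title     = c("Castle visit", "Museum tour", "Garden walk", "Archive open day"),
  Latitude  = c(48.85, NA, 45.76, NA),
  Longitude = c(2.35, 5.37, NA, NA),
  Empty1    = NA,
  Empty2    = NA,
  stringsAsFactors = FALSE
)

# Step 1: drop the columns in which every value is NA.
na_per_col <- colSums(is.na(events))
non_empty  <- events[, na_per_col < nrow(events), drop = FALSE]

# Step 2: keep only the rows that have both a latitude and a longitude.
has_coords <- !is.na(non_empty$Latitude) & !is.na(non_empty$Longitude)
mappable   <- non_empty[has_coords, , drop = FALSE]

dim(events)     # 4 rows, 5 columns
dim(non_empty)  # 4 rows, 3 columns
dim(mappable)   # 1 row,  3 columns
```

Only the first row survives, because it is the only one with both coordinates present.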
This gives an idea of the data quality one can expect from a real-world dataset. As shown above, the data volume is reduced by a factor of about 4.5 (from 23,624 rows by 122 columns to 10,591 rows by 60 columns). We did not examine the data in further detail. There may be mistakes, since the fields are mostly free text. Having 100% clean data is not the objective of the project.
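The size of the reduction can be checked from the dimensions printed earlier. The short calculation below only uses those four numbers; the variable names are chosen for the illustration.

```r
# Dimensions reported before and after cleaning.
rows_before <- 23624; cols_before <- 122
rows_after  <- 10591; cols_after  <- 60

cells_before <- rows_before * cols_before   # 2882128 cells
cells_after  <- rows_after * cols_after     # 635460 cells

reduction_factor <- cells_before / cells_after
round(reduction_factor, 1)                  # 4.5

# Share of rows and columns that were removed.
rows_removed <- 1 - rows_after / rows_before
cols_removed <- 1 - cols_after / cols_before
round(100 * rows_removed)                   # 55 (percent)
round(100 * cols_removed)                   # 51 (percent)
```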