library(knitr)
opts_chunk$set(echo= TRUE, message=FALSE, warning = FALSE)
library(data.table)
library(tidyverse)
library(devtools)
library(stringr)
library(plotly)
library(twitteR)
According to Wikipedia, “The Dublin Regulation (Regulation No. 604/2013; sometimes the Dublin III Regulation; previously the Dublin II Regulation and Dublin Convention) is a European Union (EU) law that determines the EU Member State responsible to examine an application for asylum seekers seeking international protection under the Geneva Convention and the EU Qualification Directive, within the European Union. It is the cornerstone of the Dublin System, which consists of the Dublin Regulation and the EURODAC Regulation, which establishes a Europe-wide fingerprinting database for unauthorised entrants to the EU. The Dublin Regulation aims to ‘determine rapidly the Member State responsible [for an asylum claim]’[1] and provides for the transfer of an asylum seeker to that Member State. Usually, the responsible Member State will be the state through which the asylum seeker first entered the EU” (https://en.wikipedia.org/wiki/Dublin_Regulation).
In short, a Dublin request is a request, initiated by one European Member State (MS), to transfer an asylum seeker to another European MS.
The goal of this project is to analyze trends in incoming vs. outgoing transfer requests across Europe, using the 2015 Eurostat dataset. Italy appears to be the main recipient of Dublin requests sent by other European MSs. At the same time, many concerns have been raised about its reception system. To investigate these concerns, I carried out a sentiment analysis of English-language tweets.
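Source: Eurostat database (http://ec.europa.eu/eurostat/data/database), Population and social conditions > migr
First, let’s load and clean the 2015 Eurostat data on incoming Dublin requests. The cleaned records are later loaded into Neo4j, a graph database, to visualize the flow of requests.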
lines <- readLines("C:/Users/Patrizia/Desktop/AmbraMSDA/FinalProject/Data/Dublin Requests/incoming_2015_LAST.csv")
# Delete quotes, remove the dot thousands separators from the Value var,
# and replace Eurostat's ":" (not available) flag with 0
lines <- gsub('"', '', lines, fixed = TRUE)
lines <- gsub(".", "", lines, fixed = TRUE)
lines <- str_replace_all(lines, ":", "0")
incomingreq <- read.csv(textConnection(lines), header = TRUE,
  colClasses = c("integer", "character", "factor", "factor", "factor", "character", "numeric"))
#str(incomingreq)
# Subset to retain only geo, partner and value (to be leveraged for building relationships)
incomingreq <- incomingreq %>% select(GEO, PARTNER, Value)
#incomingreq$GEO<- as.character(incomingreq$GEO)
#incomingreq$PARTNER<- as.character(incomingreq$PARTNER)
#incomingreq$Value<- as.numeric(incomingreq$Value)
# Remove all self-referential and null observations, i.e. rows where Value = 0
# (self requests are not an option), plus any missing values
incomingreq <- incomingreq %>% filter(Value > 0) %>% na.omit()
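To make the cleaning steps above concrete, here is a minimal base-R sketch that runs them on a few hand-made lines. The toy rows and column layout are hypothetical. Like the code above, the sketch assumes that the Eurostat export uses `.` as a thousands separator and `:` for unavailable values.

```r
# Hypothetical lines mimicking the Eurostat CSV export (assumed format):
# quoted fields, "." as thousands separator, ":" for "not available"
raw <- c('"TIME","GEO","PARTNER","Value"',
         '"2015","Italy","Switzerland","8.716"',
         '"2015","Italy","Italy",":"',
         '"2015","Bulgaria","Germany","3.993"')

clean <- gsub('"', "", raw, fixed = TRUE)     # drop quotes
clean <- gsub(".", "", clean, fixed = TRUE)   # drop thousands separators
clean <- gsub(":", "0", clean, fixed = TRUE)  # treat "not available" as 0

toy <- read.csv(textConnection(clean), stringsAsFactors = FALSE)

# Keep the columns needed for the graph and drop null (incl. self) pairs
toy <- toy[toy$Value > 0, c("GEO", "PARTNER", "Value")]
print(toy)
```

The cleaning rewrites every `.` and `:` in the file. That is only safe when no other field, such as a decimal value or a label, contains those characters.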
# Sanity check: total incoming requests for Italy
italyincoming <- incomingreq %>% filter(GEO == "Italy") %>% summarise(sum = sum(Value))
# Rank recipient countries by number of incoming requests
toprecepients<- incomingreq %>% group_by(GEO) %>% summarise(Sum_Incoming= sum(Value)) %>%
arrange(desc(Sum_Incoming)) %>% rename(Recipient=GEO)
kable(head(toprecepients), format = "html", caption = "Top EU countries for number of incoming Dublin Requests")
Recipient | Sum_Incoming |
---|---|
Italy | 25071 |
Germany (until 1990 former territory of the FRG) | 11781 |
Bulgaria | 8448 |
Poland | 6444 |
Austria | 5022 |
France | 4820 |
# Map recipient countries with plotly
toprecepients$hover <- with(toprecepients, paste(Recipient, '<br>', "2015 Incoming Requests", Sum_Incoming))
# give state boundaries a white border
l <- list(color = toRGB("white"), width = 2)
# specify some map projection/options
g <- list(
scope = 'europe',
projection = list(type = 'mercator'),
showlakes = TRUE,
lakecolor = toRGB('white')
)
p <- plot_geo(toprecepients, locationmode = 'country names') %>%
add_trace(
z = ~Sum_Incoming, text = ~hover, locations = ~Recipient,
color = ~Sum_Incoming, colors = 'Reds'
) %>%
colorbar(title = "Num of incoming Dublin Requests") %>%
layout(
title = '2015 Incoming Dublin Requests<br>(Hover for breakdown)',
geo = g
)
Sys.sleep(3)
# Create a shareable link to chart
Sys.setenv("plotly_username"="ambra8due")
# API key redacted: set your own plotly key here
Sys.setenv("plotly_api_key"="XXXX")
x<- sample(1:10000, 1)
chart_link = api_create(p, filename=paste("Dublin_Receipients",x, sep=""), sharing = "public")
chart_link
url<- chart_link$embed_url
plotly_iframe <- paste("<center><iframe scrolling='no' seamless='seamless' style='border:none' src='", url,
"/800/800' width='800' height='800'></iframe></center>", sep = "")
# Load the request records into Neo4j (graph database)
library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/", username="neo4j", password="password")
clear(graph, input = FALSE)
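# Pass country names as plain strings so the Cypher parameters are not factors
data = data.frame(
Origin = as.character(incomingreq$PARTNER),
Value = incomingreq$Value,
Receipient = as.character(incomingreq$GEO),
stringsAsFactors = FALSE)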
query = "
MERGE (origin:Country {name:{origin_name}})
MERGE (receipient:Country {name:{dest_name}})
CREATE (origin)<-[:ORIGIN]-(:Incoming {number:{num_req}})-[:RECEIPIENT]->(receipient)
"
t = newTransaction(graph)
for (i in 1:nrow(data)) {
origin_name = data[i, ]$Origin
dest_name = data[i, ]$Receipient
num_req = data[i, ]$Value
appendCypher(t,
query,
origin_name = origin_name,
num_req = num_req,
dest_name = dest_name)
}
commit(t)
tabgraph<- cypher(graph, "MATCH (o:Country)<-[:ORIGIN]-(f:Incoming)-[:RECEIPIENT]->(d:Country)
RETURN o.name AS Requester, f.number AS Req_num, d.name as Receipient")
kable(head(tabgraph), format = "html", caption = "2015 Dublin Requests (Excerpt from Neo4j cypher query result)")
Requester | Req_num | Receipient |
---|---|---|
Croatia | 4 | Bulgaria |
Italy | 99 | Bulgaria |
Luxembourg | 3 | Bulgaria |
Hungary | 343 | Bulgaria |
Germany (until 1990 former territory of the FRG) | 3993 | Bulgaria |
Ireland | 6 | Bulgaria |
# Return all nodes and edges in Neo4j: MATCH (n)-[r]->(m) RETURN n, r, m LIMIT 100;
# Return the most central node (highest degree): MATCH (n)-[r]-(m) RETURN n, count(r) AS degree ORDER BY degree DESC
# Return Italy's node as the central one:
# MATCH (f:Incoming)-[:RECEIPIENT]-(n:Country {name: 'Italy'}) RETURN f, n
# MATCH (n:Country {name: 'Italy'})-[:RECEIPIENT]-(f:Incoming)-[:ORIGIN]-(v:Country) RETURN n, f, v
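In the data model above, each row becomes an `(:Incoming)` node with an `ORIGIN` edge and a `RECEIPIENT` edge to two `(:Country)` nodes. The degree query therefore counts, for every country, how many request rows it takes part in. The base-R sketch below reproduces that count on four rows taken from the excerpt tables (the column names mirror `data`). It only illustrates the idea and does not replace querying Neo4j.

```r
# Four request rows copied from the excerpt tables; column names
# mirror the `data` frame used for the Neo4j import
requests <- data.frame(
  Origin     = c("Croatia", "Italy", "Hungary", "Switzerland"),
  Receipient = c("Bulgaria", "Bulgaria", "Bulgaria", "Italy"),
  Value      = c(4, 99, 343, 8716),
  stringsAsFactors = FALSE
)

# Each row is one (:Incoming) node linked to its origin and recipient
# countries, so a country's degree is the number of rows it appears in
degree <- sort(table(c(requests$Origin, requests$Receipient)), decreasing = TRUE)
print(degree)

# Weighted alternative: total requests received per country
received <- tapply(requests$Value, requests$Receipient, sum)
print(received)
```

Degree rewards countries with many partners rather than many requests. The weighted `received` totals are closer to the ranking in the recipient table.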
#Return top requesters for Italy's node
italyreq<- cypher(graph, "MATCH (o:Country)<-[:ORIGIN]-(f:Incoming)-[:RECEIPIENT]->(d:Country{name: 'Italy'})
RETURN o.name AS Requester, f.number AS Req_num, d.name as Receipient ORDER BY Req_num DESC")
kable(head(italyreq, 10), format="html",caption = "Top requesting countries: Dublin transfer requests sent to Italy in 2015" )
Requester | Req_num | Receipient |
---|---|---|
Switzerland | 8716 | Italy |
Germany (until 1990 former territory of the FRG) | 8553 | Italy |
France | 2202 | Italy |
Sweden | 1489 | Italy |
Austria | 1357 | Italy |
Belgium | 776 | Italy |
Norway | 714 | Italy |
Netherlands | 487 | Italy |
Denmark | 324 | Italy |
Finland | 258 | Italy |
Italy was the main recipient of Dublin requests issued in 2015, topping the list with 25,071 incoming requests. Germany follows with 11,781 requests, less than half of Italy’s total. Switzerland and Germany alone account for more than two thirds of the requests sent to Italy.
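These comparisons can be checked with the totals from the two tables above. This is a worked calculation, and every number is copied from those tables.

```r
# Totals copied from the tables above (2015, Eurostat)
italy   <- 25071
germany <- 11781
top3_to_italy <- c(Switzerland = 8716, Germany = 8553, France = 2202)

ratio_de_it <- germany / italy               # Germany's total relative to Italy's
share_top3  <- sum(top3_to_italy) / italy    # share of Italy's requests from its top 3 requesters

round(c(ratio_de_it = ratio_de_it, share_top3 = share_top3), 2)
```

Germany receives about 0.47 times Italy's volume. The three largest requesters send roughly 78% of Italy's incoming requests.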
The graph below represents Italy’s incoming requests and the requesting countries:
Italy node
#Ranking the top 10 countries by first-time asylum applications in Europe
lines4 <- readLines("C:/Users/Patrizia/Desktop/AmbraMSDA/FinalProject/Data/Dublin Requests/FirstTimeApp.csv")
#Delete quotes and remove dots notation under the Value var
lines4 <- gsub('"', '', lines4, fixed=TRUE)
lines4<- str_replace_all(lines4, "\\.","")
asylumapp<- read.csv(textConnection(lines4))
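# Convert via character so a factor column yields its values, not its level codes;
# ":" (not available) becomes NA and is dropped by na.omit() below
asylumapp$Value <- as.integer(as.character(asylumapp$Value))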
asylumapp<- asylumapp %>% select(GEO, CITIZEN, Value) %>% filter(Value>0, CITIZEN!="Extra-EU-28") %>% na.omit()
#periphery<-data.frame("Receipient"=c("Italy", "Spain", "Greece","Portugal", "Poland", "Hungary"))
appcountry<- asylumapp %>% group_by(GEO) %>% summarise(firstimeapp=sum(Value))%>% rename(Recipient=GEO)
appcountryt<- appcountry %>% arrange(desc(firstimeapp)) %>% top_n(10, firstimeapp)
#kable(appcountryt, format="html", caption="Top 10 EU countries for first time asylum applications")
peripheraltop<- semi_join(appcountryt,head(toprecepients[,1],10), by= "Recipient")
kable(peripheraltop, format="html", caption="Countries that are both top recipients of asylum applications and Dublin Requests")
Recipient | firstimeapp |
---|---|
Italy | 75205 |
Germany (until 1990 former territory of the FRG) | 429795 |
Austria | 85030 |
France | 70110 |
Switzerland | 37025 |
Sweden | 155580 |
Belgium | 38780 |
Netherlands | 42855 |
asylumDublin<- full_join(toprecepients[,1:2], appcountry, by="Recipient") %>% rename(Dublin_requests=Sum_Incoming, Firstime_asylum=firstimeapp) %>% mutate(perc= (Firstime_asylum/Dublin_requests)*100) %>% na.omit()
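Because `na.omit()` runs right after `full_join()`, only countries present in both tables survive, so the result is effectively an inner join. The sketch below shows this with base R's `merge(all = TRUE)`, the base analogue of `full_join()`. The countries A, B, C, D and their totals are hypothetical.

```r
# Hypothetical per-country totals (not Eurostat figures)
dublin <- data.frame(Recipient = c("A", "B", "C"),
                     Dublin_requests = c(500, 200, 50), stringsAsFactors = FALSE)
asylum <- data.frame(Recipient = c("A", "B", "D"),
                     Firstime_asylum = c(1000, 100, 300), stringsAsFactors = FALSE)

# Base-R analogue of full_join(): keep every country from either table
joined <- merge(dublin, asylum, by = "Recipient", all = TRUE)
# First-time applications per 100 incoming Dublin requests
joined$perc <- joined$Firstime_asylum / joined$Dublin_requests * 100

# na.omit() drops C and D, so the result matches an inner join
complete <- na.omit(joined)
inner    <- merge(dublin, asylum, by = "Recipient")
print(complete)
```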
#Scatter plot: first time asylum applications and Dublin Transfer Requests
f <- list(
family = "Courier New, monospace",
size = 18,
color = "#7f7f7f"
)
x <- list(
title = "First time asylum applications",
titlefont = f
)
y <- list(
title = "Incoming Dublin requests",
titlefont = f
)
p3 <- plot_ly(
asylumDublin, x = ~Firstime_asylum, y = ~ Dublin_requests,
color = ~Recipient, size= ~(1/perc),type = 'scatter' ,
mode = 'markers', text = ~paste(Recipient, "Dublin Requests: ", Dublin_requests, '<br>First time asylum application:', Firstime_asylum)) %>% layout(xaxis = x, yaxis = y, title= "2015 asylum applications and incoming Dublin requests", showlegend= FALSE)
h<- sample(1:10000, 1)
chart_link5 = api_create(p3, filename=paste("Dublin_Asylum",h, sep=""), sharing = "public")
chart_link5
url5<- chart_link5$embed_url
plotly_iframe5 <- paste("<center><iframe scrolling='no' seamless='seamless' style='border:none' src='", url5,
"/800/800' width='800' height='800'></iframe></center>", sep = "")
asylumfapp<- asylumDublin$Firstime_asylum
dublinreq<- asylumDublin$Dublin_requests
rsquared<- (cor(asylumfapp, dublinreq, use = "complete.obs"))^2
Of the top 10 recipient countries of Dublin transfer requests, 8 are also among the top 10 recipients of first-time asylum applications (see the table above). The linear relationship between first-time asylum applications and incoming Dublin requests is weak. Countries of first entry do not always receive high numbers of Dublin transfer requests. Greece, for example, is usually compared to Italy in analyses of migratory flows, yet it received only 137 incoming Dublin requests in 2015. In other peripheral countries (Croatia and Lithuania), the number of incoming Dublin requests far exceeded the number of asylum applications received in 2015.
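Both `rsquared` and, further below, `rsq` are computed as the squared Pearson correlation. For a simple linear regression with an intercept, this equals the regression R², and it is symmetric in the two variables. The toy vectors below are arbitrary and only illustrate the identity.

```r
# Arbitrary toy data (illustrative only)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 6)

r2_cor <- cor(x, y)^2                        # squared Pearson correlation
r2_lm  <- summary(lm(y ~ x))$r.squared       # R^2 of y regressed on x
r2_rev <- summary(lm(x ~ y))$r.squared       # roles of x and y swapped

c(r2_cor = r2_cor, r2_lm = r2_lm, r2_rev = r2_rev)
```

Because of this symmetry, the R² does not say which variable drives the other. It also only measures linear association.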
#Rank Requesting Countries by number of requests
toprequesters<- incomingreq %>% group_by(PARTNER) %>% summarise(Sum_Outgoing= sum(Value)) %>%
arrange(desc(Sum_Outgoing))
#kable(head(toprequesters), format= "html", caption = "Rank of European MSs by outgoing requests (as logged by submitting partner)")
toprequestersItaly<- incomingreq %>% group_by(PARTNER) %>% filter(GEO=="Italy") %>% summarise(Sum_Outgoing= sum(Value)) %>% arrange(desc(Sum_Outgoing))
#kable(head(toprequestersItaly), format= "html", caption = "Italy's top requesters (as logged by submitting partner)")
#Is there a correlation between incoming and outgoing requests?
#Combine the two tables
topreq<- toprequesters %>% arrange(PARTNER) %>% rename (Country= PARTNER)
toprec<- toprecepients %>% arrange(Recipient) %>% rename (Country= Recipient)
incvsoutg<- left_join(topreq, toprec, by = "Country")
incoutperc<- incvsoutg %>% mutate (incperc=round((Sum_Incoming/(Sum_Outgoing+Sum_Incoming))*100)) %>% arrange(desc(incperc))
#kable(head(incoutperc), format="html",caption= "Rank of EU MSs by incoming requests as a percentage of total requests" )
#Scatterplot and Correlation Analysis
#p <- plot_ly(incvsoutg, x = ~Sum_Outgoing, y = ~Sum_Incoming, color = ~Country, size = ~Sum_Incoming, text = ~paste("Incoming: ", Sum_Incoming, '<br>Outgoing:', Sum_Outgoing))
#p4 <- plot_ly(incvsoutg, x = ~Sum_Outgoing, y = ~Sum_Incoming, type = 'scatter',
#  mode = 'markers', color = ~Country, size = ~Sum_Incoming, text = ~paste(Country, "Incoming: ", Sum_Incoming, '<br>Outgoing:', Sum_Outgoing)) %>%
#  layout(title = "2015 Incoming vs. Outgoing Dublin requests", showlegend = FALSE)
#p4
#y<- sample(11000:20000, 1)
#chart_link6 = api_create(p4, filename = paste("Incoming vs Outgoing Dublin Requests", y, sep = ""), fileopt = "overwrite")
#chart_link6
#url6<- chart_link6$embed_url
#plotly_iframe6 <- paste("<center><iframe scrolling='no' seamless='seamless' style='border:none' src='", url6,
#  "/800/800' width='800' height='800'></iframe></center>", sep = "")
incoming<- incvsoutg$Sum_Incoming
outgoing<- incvsoutg$Sum_Outgoing
rsq<- (cor(incoming, outgoing, use = "complete.obs"))^2
rsq
## [1] 0.103182
With an R² of about 0.10, only about 10% of the variation in outgoing requests is linearly associated with the variation in incoming requests. Italy issued only 475 Dublin requests in 2015, less than 2% of all Dublin requests the country handled (incoming + outgoing).
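The percentage can be reproduced with the same formula used for `incperc`. Italy's 25,071 incoming requests come from the recipient table, and the 475 outgoing requests are the figure stated above.

```r
italy_in  <- 25071  # incoming requests (recipient table above)
italy_out <- 475    # outgoing requests (figure stated in the text)

# Same formula as incperc, plus the complementary outgoing share
incperc_italy <- round(italy_in / (italy_out + italy_in) * 100)
outperc_italy <- italy_out / (italy_out + italy_in) * 100

c(incperc = incperc_italy, outperc = round(outperc_italy, 1))
```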
lines5 <- readLines("C:/Users/Patrizia/Desktop/AmbraMSDA/FinalProject/Data/Dublin Requests/Dublin_Italy_time_series.csv")
#Delete quotes, remove dots notation under the Value var and clean up the incomingreqItaly$LEG_PROV var
lines5 <-gsub('"', '', lines5, fixed=TRUE)
lines5<- lines5 %>% str_replace_all("\\.|\\s\\(Articles: 20.5, 18.1.b, 18.1.c, 18.1.d\\)","")
incomingreqItaly<- read.csv(textConnection(lines5))
# Convert via character so a factor column yields its values, not its level codes
incomingreqItaly$Value <- as.integer(as.character(incomingreqItaly$Value))
#breakdown by type of request in 2015
incomingreqItalypie<- incomingreqItaly %>% filter(TIME==2015) %>% select(LEG_PROV, PARTNER, Value) %>% group_by(LEG_PROV)%>%na.omit() %>% filter(Value>0) %>% summarise(Tot=sum(Value))
p5 <- plot_ly(incomingreqItalypie, labels =~LEG_PROV, values = ~Tot, type = 'pie') %>%
layout(title = 'Breakdown of Italy incoming requests in 2015',
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
j<- sample(11000:20000, 1)
chart_link3 = api_create(p5, filename = paste("Breakdown of Italy incoming requests in 2015", j, sep = ""), fileopt = "overwrite")
chart_link3
url3<- chart_link3$embed_url
plotly_iframe3 <- paste("<center><iframe scrolling='no' seamless='seamless' style='border:none' src='", url3,
"/800/800' width='800' height='800'></iframe></center>", sep = "")
Note that, in 2015, Italy received more “take charge” than “take back” requests. “Take back” requests are issued only when “the asylum seeker in the requesting country has already submitted an application for asylum in the country receiving the request”. “Take charge” requests cover the remaining cases, in which the receiving country is deemed responsible although no application has been lodged there [http://ec.europa.eu/eurostat/statistics-explained/index.php/Dublin_statistics_on_countries_responsible_for_asylum_application].
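The distinction can be written as a simple rule. The function below is a deliberately simplified sketch based only on the Eurostat definition quoted above. It does not model the actual Dublin III criteria, such as family links, visas or irregular first entry. Both `request_type()` and the five requests are hypothetical.

```r
# Simplified rule from the quoted Eurostat definition (hypothetical helper):
# "take back" if the person already applied in the receiving country,
# otherwise "take charge"
request_type <- function(applied_in_recipient) {
  ifelse(applied_in_recipient, "take back", "take charge")
}

# Hypothetical batch of five requests
applied <- c(FALSE, TRUE, FALSE, FALSE, TRUE)
types   <- request_type(applied)

# Percentage breakdown by type, as in the pie chart
shares <- round(100 * prop.table(table(types)))
print(shares)
```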
The following article describes Italy’s main reception facilities for asylum seekers, with a special focus on CARA (large-scale, government-run centers) and SPRAR (small, decentralized reception structures supported by NGOs):
AIDA visual on Italy’s reception system
#Authentication
consumer_key <- "XXXX"
consumer_secret <- "XXXX"
access_token <- "XXXXX"
access_secret <- "XXXXX"
options(httr_oauth_cache = TRUE)
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
## [1] "Using direct authentication"
#Get sample tweets on the topic
asylum<- searchTwitter("(italy OR italian) + (asylum OR refugees) + (centers OR facilities OR procedures OR process OR CARA OR reception OR facility OR detention OR SPRAR OR camps OR camp)", n=1000, lang='en')
#Extract date of tweets and txt, consolidating into df
#Note: the standard Twitter Search API only returns tweets from the past 7 days
#asylumdfall<- ldply(asylum, function(x) x$toDataFrame() )
#asylum_date=ldply(asylum, function(x) x$getCreated())
#asylum_date=sapply(asylum_date,function(x) strftime(x, format="%Y-%m-%d",tz = "UTC"))
asylum_txt <- sapply(asylum, function(x) x$getText())
#asylum_geo<- ldply(asylum, function(x) x$getScreenName())
asylum_text=unlist(asylum_txt)
#asylumtweetdf=as.data.frame(cbind(tweet=asylum_text,date=asylum_date))
#kable(head(asylumtweetdf), format="html", caption="Sample of tweets on Italy's reception facilities from past week")
#Cleaning up the text
library(tm)
library(stringr)
#removing hashtags, emails, urls, retweets etc...
#clean_tweet = str_replace_all(asylumdf$tweet, "&|#", "")
#clean_tweet = str_replace_all(clean_tweet,"http.* *|\\S+@\\S+|(RT|via)((?:\\b\\W*@\\w+)+)|@\\w+|[[:punct:]]|[[:digit:]]|^\\s+|\\s+$|[ \t]{2,}|\r?\n|\r|[ |\t]{2,}", "")
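# Remove "&" and "#" signs, then URLs (up to the next whitespace), emails, RT/via markers, mentions,
# punctuation and digits; finally collapse whitespace runs to a single space and trim
clean_tweet = asylum_text %>% str_replace_all("&|#", "") %>%
str_replace_all("https?://\\S+|\\S+@\\S+|(RT|via)((?:\\b\\W*@\\w+)+)|@\\w+|[[:punct:]]|[[:digit:]]", "") %>%
str_replace_all("\\s+", " ") %>% str_trim()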
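A greedy URL pattern such as `http.* *` removes the link and everything after it in the tweet. Collapsing whitespace to an empty string, rather than to a single space, would glue neighbouring words together. The base-R sketch below contrasts the greedy pattern with one that stops at the next whitespace. `clean_one()` and the sample tweet are hypothetical, and the steps approximate, rather than reproduce, the stringr pipeline above.

```r
# Hypothetical sample tweet
tweet <- "RT @someuser: Reception #camps in Italy are full http://t.co/abc123 shameful conditions"

# Greedy pattern: ".*" runs to the end of the string, losing the trailing words
greedy <- gsub("http.* *", "", tweet)

# Hypothetical cleaning helper (an approximation, not the original pipeline)
clean_one <- function(x) {
  x <- gsub("&amp;|&|#", "", x)                                 # ampersands and hashtag signs
  x <- gsub("https?://\\S+", "", x, perl = TRUE)                # URLs only, up to the next space
  x <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", x, perl = TRUE)  # RT/via markers with handles
  x <- gsub("@\\w+", "", x, perl = TRUE)                        # remaining mentions
  x <- gsub("[[:punct:]]|[[:digit:]]", "", x)                   # punctuation and digits
  x <- gsub("\\s+", " ", x, perl = TRUE)                        # collapse whitespace runs
  trimws(x)
}

cleaned <- clean_one(tweet)
print(greedy)
print(cleaned)
```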
# Create corpus and make some basic transformations
corpus=Corpus(VectorSource(clean_tweet))
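corpus <- corpus %>%
tm_map(content_transformer(tolower)) %>%  # lowercase so capitalized stopwords are also removed
tm_map(removeWords, stopwords("english")) %>%
tm_map(stripWhitespace)                   # tidy the gaps left by removed words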
library(wordcloud)
corpustdm<- as.matrix(TermDocumentMatrix(corpus))
word.freq <- sort(rowSums(corpustdm), decreasing = TRUE)
library(RColorBrewer)
# The Dark2 palette has at most 8 colors
pal2 <- brewer.pal(8, "Dark2")
wordcloud(words = names(word.freq), freq = word.freq, min.freq = 10,
random.order = FALSE, colors = pal2)
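`TermDocumentMatrix()` followed by `rowSums()` counts how often each term occurs across all tweets. The base-R sketch below builds the same kind of frequency table from three hypothetical cleaned tweets. The four-word stopword list is an illustrative assumption and is far smaller than `stopwords("english")`. tm also applies further defaults that are not reproduced here.

```r
# Hypothetical cleaned tweets
docs <- c("Reception camps in Italy are full",
          "The camps in Italy are overcrowded",
          "Italy reception system under pressure")

# Tiny illustrative stopword list (assumption, not tm's list)
stop_words <- c("the", "in", "are", "under")

tokens <- unlist(strsplit(tolower(docs), "\\s+"))
tokens <- tokens[!tokens %in% stop_words]

# Term frequencies across all documents, most frequent first
word_freq <- sort(table(tokens), decreasing = TRUE)
head(word_freq, 3)
```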
A glance at the word cloud above suggests an overall negative sentiment. The tweet sample has two limitations: language (only English-language tweets) and time window (because of Twitter API restrictions, only tweets from the past 7 days are returned).
To test this impression, I run a sentiment analysis with the syuzhet package, whose get_nrc_sentiment() function scores text against the NRC emotion lexicon.
library(syuzhet)
tweetsentiment = get_nrc_sentiment(clean_tweet)
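For each text, `get_nrc_sentiment()` returns the number of words the NRC lexicon associates with each of eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust), plus negative and positive. The sketch below mimics that counting with a tiny hypothetical lexicon (not the NRC one) and only four categories. It shows what the `tweetsentiment` columns and their column sums represent.

```r
# Tiny hypothetical word-category lexicon (illustrative, not the NRC lexicon)
lexicon <- data.frame(
  word     = c("overcrowded", "fear", "shameful", "shameful", "welcome"),
  category = c("negative",    "fear", "negative", "disgust",  "positive"),
  stringsAsFactors = FALSE
)
categories <- c("disgust", "fear", "negative", "positive")

texts <- c("shameful overcrowded camps", "welcome centre")

# Count, per text, how many tokens fall into each category
score_text <- function(txt) {
  words <- strsplit(tolower(txt), "\\s+")[[1]]
  # each token adds one count to every category it is tagged with
  hits <- unlist(lapply(words, function(w) lexicon$category[lexicon$word == w]))
  as.vector(table(factor(hits, levels = categories)))
}

scores <- t(sapply(texts, score_text))
colnames(scores) <- categories
rownames(scores) <- NULL
print(scores)

# Corpus totals, analogous to the bars in the charts below
totals <- colSums(scores)
print(totals)
```

Word-level counting like this ignores context such as negation, so "not safe" would still add to the categories of "safe".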
#kable(head(tweetsentiment), format= "html")
#sentimentdf = cbind(asylumtweetdf,tweetsentiment)
#subset emotions
tweetemotions<- tweetsentiment %>% select(-negative, -positive)
#subset sentiment values
tweetvalue<- tweetsentiment[, -(1:8)]
plottweetvalue<- gather(tweetvalue,"sentiment","values") %>%
group_by( sentiment) %>%
summarise(Total = sum(values))
plotemotions<- gather(tweetemotions,"emotion","values") %>%
group_by(emotion) %>%
summarise(Total = sum(values))
gg1<- ggplot(plottweetvalue, aes(sentiment, Total, color = sentiment, fill = sentiment)) +
geom_bar(stat="identity")+
ggtitle("Sentiment of tweets on Italy's reception facilities for asylum seekers \n (data from past week as of May 14, 2017)")+ theme(legend.position="none")+ geom_text(aes(label=Total), position = position_dodge(width=0.75), vjust = -0.25)
gg2<- ggplot(plotemotions, aes(emotion, Total, color = emotion, fill = emotion)) +
geom_bar(stat="identity")+
ggtitle("Emotions conveyed by tweets on Italy's reception facilities for asylum seekers \n (data from past week as of May 14, 2017)")+ theme(legend.position="none")+ geom_text(aes(label=Total), position = position_dodge(width=0.75), vjust = -0.25)
gg1
gg2
The charts above show that English-language tweets about Italian reception facilities are mostly negative, and that the four most frequent emotions in them are fear, sadness, anger and disgust.
This is in line with the impression that media (including social media) tend to portray the Italian reception system for asylum seekers negatively. However, more research is needed to support such a generalization.
In 2015, Italy was the top recipient of Dublin transfer requests (about 25k) issued by other European MSs, and 58% of them were “take charge” requests. Most of Italy’s incoming Dublin requests (about 78%) came from three countries: Switzerland, Germany and France. The Twitter analysis shows a negative sentiment towards the state of Italy’s reception facilities. This is an initial, exploratory scoping analysis. It could be extended by researching traditional media coverage, the legal background of the Dublin system and secondary refugee movements.