library(knitr)
opts_chunk$set(echo= TRUE, message=FALSE, warning = FALSE)

library(data.table)
library(tidyverse)

## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr

## Conflicts with tidy packages ----------------------------------------------

## between():   dplyr, data.table
## filter():    dplyr, stats
## first():     dplyr, data.table
## lag():       dplyr, stats
## last():      dplyr, data.table
## transpose(): purrr, data.table

library(devtools)

library(stringr)
library(plotly)

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

library(twitteR)

## 
## Attaching package: 'twitteR'

## The following objects are masked from 'package:dplyr':
## 
##     id, location

Analysis of Dublin Requests and Twitter sentiment towards Italy’s reception facilities

Introduction

As per Wikipedia, “The Dublin Regulation (Regulation No. 604/2013; sometimes the Dublin III Regulation; previously the Dublin II Regulation and Dublin Convention) is a European Union (EU) law that determines the EU Member State responsible to examine an application for asylum seekers seeking international protection under the Geneva Convention and the EU Qualification Directive, within the European Union. It is the cornerstone of the Dublin System, which consists of the Dublin Regulation and the EURODAC Regulation, which establishes a Europe-wide fingerprinting database for unauthorised entrants to the EU. The Dublin Regulation aims to “determine rapidly the Member State responsible [for an asylum claim]”[1] and provides for the transfer of an asylum seeker to that Member State. Usually, the responsible Member State will be the state through which the asylum seeker first entered the EU” (https://en.wikipedia.org/wiki/Dublin_Regulation).

Therefore a Dublin request can be simply described as a request, initiated by one European Member State (MS), to transfer asylum seekers to another European MS.

The goal of this project is to analyze trends in incoming vs. outgoing transfer requests across Europe, using the 2015 Eurostat dataset. Italy appears to be the main recipient of Dublin requests sent by other European MSs. At the same time, many concerns have been raised about its reception system. To investigate these concerns, a sentiment analysis of English-language tweets was carried out.

Source: http://ec.europa.eu/eurostat/data/database Population and social conditions>migr

Loading the Eurostat data into Neo4j using RNeo4j and first exploratory analysis

First, let’s load the Eurostat dataset into a graph database (Neo4j) to visualize the flow of Dublin requests.

lines <- readLines("C:/Users/Patrizia/Desktop/AmbraMSDA/FinalProject/Data/Dublin Requests/incoming_2015_LAST.csv")
#Delete quotes and remove dots notation under the Value var
lines <- gsub('"', '', lines, fixed=TRUE)
lines<- gsub("\\.","", lines)
lines<- str_replace_all(lines, "\\:", "0")
incomingreq<- read.csv(textConnection(lines),header=TRUE,colClasses=c("integer","character","factor","factor","factor","character","numeric"))

incomingreq<- as.data.frame(incomingreq)
incomingreq<- incomingreq %>%  select(GEO,PARTNER, Value)

#str(incomingreq)

#Subset to retain only geo, partner and value (to be leveraged for building relationships). Remove all self-referential and null observations where value=0 (self requests are not an option)
 
#incomingreq$GEO<- as.character(incomingreq$GEO)
#incomingreq$PARTNER<- as.character(incomingreq$PARTNER)
#incomingreq$Value<- as.numeric(incomingreq$Value)

 incomingreq<-incomingreq %>%  filter(Value > 0) %>% na.omit()

#test Italy sum
italyincoming<- incomingreq %>%  group_by(GEO) %>% filter(GEO=="Italy") %>%  summarise(sum= sum(Value)) 

#Rank recipient countries by number of incoming requests
toprecepients<- incomingreq %>%  group_by(GEO) %>%  summarise(Sum_Incoming= sum(Value)) %>% 
arrange(desc(Sum_Incoming)) %>% rename(Recipient=GEO)

kable(head(toprecepients), format = "html",  caption = "Top EU countries for number of incoming Dublin Requests")

Top EU countries for number of incoming Dublin Requests
Recipient	Sum_Incoming
Italy	25071
Germany (until 1990 former territory of the FRG)	11781
Bulgaria	8448
Poland	6444
Austria	5022
France	4820
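
The cleaning and ranking steps above can be tried on a small inline sample. This is only a sketch in base R: the sample rows are made up (they borrow a few values from the tables in this report), and the column layout and the use of a dot as thousands separator are assumptions about the Eurostat export.

```r
# Made-up sample mimicking the assumed layout of the Eurostat CSV export
raw <- c(
  '"TIME","GEO","PARTNER","Value"',
  '"2015","Italy","Switzerland","8.716"',
  '"2015","Italy","Italy",":"',
  '"2015","Bulgaria","Hungary","343"',
  '"2015","Italy","France","2.202"'
)

# Strip quotes, drop thousands dots, and turn the ":" missing marker into 0
cleaned <- gsub('"', "", raw, fixed = TRUE)
cleaned <- gsub(".", "", cleaned, fixed = TRUE)
cleaned <- gsub(":", "0", cleaned, fixed = TRUE)

requests <- read.csv(text = cleaned, stringsAsFactors = FALSE)

# Keep only rows with at least one request (this also drops self-requests)
requests <- requests[requests$Value > 0, ]

# Total incoming requests per recipient, largest first
totals <- aggregate(Value ~ GEO, data = requests, FUN = sum)
totals <- totals[order(-totals$Value), ]
totals
```

Using `fixed = TRUE` avoids having to escape the dot and the colon as regular expressions.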

#Map recipient countries in plotly


toprecepients$hover <- with(toprecepients, paste(Recipient, '<br>', "2015 Incoming Requests", Sum_Incoming))
# give state boundaries a white border
l <- list(color = toRGB("white"), width = 2)
# specify some map projection/options
g <- list(
  scope = 'europe',
  projection = list(type = 'mercator'),
  showlakes = TRUE,
  lakecolor = toRGB('white')
)

p <- plot_geo(toprecepients, locationmode = 'country names') %>%
  add_trace(
    z = ~Sum_Incoming, text = ~hover, locations = ~Recipient,
    color = ~Sum_Incoming, colors = 'Reds'
  ) %>%
  colorbar(title = "Num of incoming Dublin Requests") %>%
  layout(
    title = '2015 Incoming Dublin Requests<br>(Hover for breakdown)',
    geo = g
  )

Sys.sleep(3)
# Create a shareable link to chart
Sys.setenv("plotly_username"="ambra8due")
Sys.setenv("plotly_api_key"="XXXX")
x<- sample(1:10000, 1)
chart_link = api_create(p, filename=paste("Dublin_Receipients",x, sep=""), sharing = "public")

chart_link

url<- chart_link$embed_url

plotly_iframe <- paste("<center><iframe scrolling='no' seamless='seamless' style='border:none' src='", url, 
    "/800/800' width='800' height='800'></iframe></center>", sep = "")
#Load the data into a GraphDB
library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/", username="neo4j", password="password")
clear(graph, input = FALSE)

data = data.frame(
Origin = incomingreq$PARTNER ,
Value = incomingreq$Value,
Receipient = incomingreq$GEO)

query = "
MERGE (origin:Country {name:{origin_name}})
MERGE (receipient:Country {name:{dest_name}})
CREATE (origin)<-[:ORIGIN]-(:Incoming {number:{num_req}})-[:RECEIPIENT]->(receipient)
"
t = newTransaction(graph)

# Append one parameterised CREATE statement per origin/recipient pair
for (i in seq_len(nrow(data))) {
  origin_name = data[i, ]$Origin
  dest_name = data[i, ]$Receipient
  num_req = data[i, ]$Value
  appendCypher(t,
               query,
               origin_name = origin_name,
               num_req = num_req,
               dest_name = dest_name)
}
commit(t)

tabgraph<- cypher(graph, "MATCH (o:Country)<-[:ORIGIN]-(f:Incoming)-[:RECEIPIENT]->(d:Country)
RETURN o.name AS Requester, f.number AS Req_num, d.name as Receipient")

kable(head(tabgraph), format = "html", caption = "2015 Dublin Requests (Excerpt from Neo4j cypher query result)")

2015 Dublin Requests (Excerpt from Neo4j cypher query result)
Requester	Req_num	Receipient
Croatia	4	Bulgaria
Italy	99	Bulgaria
Luxembourg	3	Bulgaria
Hungary	343	Bulgaria
Germany (until 1990 former territory of the FRG)	3993	Bulgaria
Ireland	6	Bulgaria
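
As a rough cross-check of what the graph stores, node centrality can also be approximated in plain R from an edge list. The sketch below is not the Neo4j query used in this report; it uses a handful of rows copied from the tables in this report, and the names `edges`, `in_degree` and `weighted_in` are illustrative.

```r
# A few requester -> recipient edges copied from the tables in this report
edges <- data.frame(
  requester = c("Croatia", "Italy", "Luxembourg", "Hungary",
                "Switzerland", "France", "Sweden"),
  recipient = c("Bulgaria", "Bulgaria", "Bulgaria", "Bulgaria",
                "Italy", "Italy", "Italy"),
  requests  = c(4, 99, 3, 343, 8716, 2202, 1489),
  stringsAsFactors = FALSE
)

# In-degree: how many distinct countries send requests to each recipient
in_degree <- table(edges$recipient)

# Weighted in-degree: total number of requests received by each recipient
weighted_in <- tapply(edges$requests, edges$recipient, sum)

in_degree
sort(weighted_in, decreasing = TRUE)
```

In this subset Bulgaria has more incoming edges, while Italy has far more incoming requests, which is why the weighted measure is the more informative one here.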

#Return all nodes and edges in Neo4J: START n=node(*) MATCH (n)-[r]->(m) RETURN n,r,m LIMIT 100;



#Return most central node: start n=node(*) match (n)-[r]-(m)  return n, count(r) as degree  order by degree desc

#Return Italy node as central one

## match (f:Incoming)-[:RECEIPIENT]-(n:Country {name: 'Italy'}) return f,n
## match (n:Country {name: 'Italy'})-[:RECEIPIENT]-(f:Incoming)-[:ORIGIN]-(v:Country) return n,f,v

#Return top requesters for Italy's node 
italyreq<- cypher(graph, "MATCH (o:Country)<-[:ORIGIN]-(f:Incoming)-[:RECEIPIENT]->(d:Country{name: 'Italy'})
RETURN o.name AS Requester, f.number AS Req_num, d.name as Receipient ORDER BY Req_num DESC")

kable(head(italyreq, 10), format="html",caption = "Top requesting countries: Dublin transfer requests sent to Italy in 2015" )

Top requesting countries: Dublin transfer requests sent to Italy in 2015
Requester	Req_num	Receipient
Switzerland	8716	Italy
Germany (until 1990 former territory of the FRG)	8553	Italy
France	2202	Italy
Sweden	1489	Italy
Austria	1357	Italy
Belgium	776	Italy
Norway	714	Italy
Netherlands	487	Italy
Denmark	324	Italy
Finland	258	Italy

Italy was the main recipient of Dublin requests issued in 2015, topping the list with more than 25,000 incoming requests. Germany follows, with fewer than half as many.
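
These comparisons can be verified with a few lines of arithmetic. The sketch below only reuses figures printed in the tables of this report; the variable names are illustrative.

```r
# Totals of incoming requests, from the recipient ranking table
italy_total   <- 25071
germany_total <- 11781

# Top three requesters to Italy, from the Neo4j query result
to_italy <- c(Switzerland = 8716, Germany = 8553, France = 2202)

# Is Germany's total below half of Italy's?
germany_below_half <- germany_total < italy_total / 2

# Share of Italy's incoming requests sent by each of the three countries (%)
share_pct <- round(100 * to_italy / italy_total, 1)

# Combined share of the three largest requesters (%)
top3_pct <- round(100 * sum(to_italy) / italy_total, 1)

germany_below_half
share_pct
top3_pct
```

Switzerland and Germany each account for roughly a third of Italy's incoming requests.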

The graph below represents Italy’s incoming requests and the requesting countries:

Italy node

Top recipients of Dublin Requests and number of first time asylum applications

#Ranking the top 10 countries by first-time asylum applications in Europe
lines4 <- readLines("C:/Users/Patrizia/Desktop/AmbraMSDA/FinalProject/Data/Dublin Requests/FirstTimeApp.csv")
#Delete quotes and remove dots notation under the Value var
lines4 <- gsub('"', '', lines4, fixed=TRUE)
lines4<- str_replace_all(lines4, "\\.","")
asylumapp<- read.csv(textConnection(lines4))
#Convert via character so that a factor column is not turned into level codes
asylumapp$Value<- as.integer(as.character(asylumapp$Value))
asylumapp<- asylumapp %>%  select(GEO, CITIZEN, Value) %>% filter(Value>0, CITIZEN!="Extra-EU-28") %>% na.omit()
#periphery<-data.frame("Receipient"=c("Italy", "Spain", "Greece","Portugal", "Poland", "Hungary"))

appcountry<- asylumapp %>% group_by(GEO) %>% summarise(firstimeapp=sum(Value))%>% rename(Recipient=GEO)
appcountryt<- appcountry %>% arrange(desc(firstimeapp)) %>% top_n(10, firstimeapp)

#kable(appcountryt, format="html", caption="Top 10 EU countries for first time asylum applications")

peripheraltop<- semi_join(appcountryt,head(toprecepients[,1],10), by= "Recipient")

kable(peripheraltop, format="html", caption="Countries that are both top recipients of asylum applications and Dublin Requests")

Countries that are both top recipients of asylum applications and Dublin Requests
Recipient	firstimeapp
Italy	75205
Germany (until 1990 former territory of the FRG)	429795
Austria	85030
France	70110
Switzerland	37025
Sweden	155580
Belgium	38780
Netherlands	42855

asylumDublin<- full_join(toprecepients[,1:2], appcountry, by="Recipient") %>% rename(Dublin_requests=Sum_Incoming, Firstime_asylum=firstimeapp) %>% mutate(perc= (Firstime_asylum/Dublin_requests)*100) %>% na.omit()

#Scatter plot: first time asylum applications and Dublin Transfer Requests
f <- list(
  family = "Courier New, monospace",
  size = 18,
  color = "#7f7f7f"
)
x <- list(
  title = "First time asylum applications",
  titlefont = f
)
y <- list(
  title = "Incoming Dublin requests",
  titlefont = f
)
p3 <- plot_ly(
  asylumDublin, x = ~Firstime_asylum, y = ~ Dublin_requests,
  color = ~Recipient, size= ~(1/perc),type = 'scatter' ,
mode = 'markers', text = ~paste(Recipient, "Dublin Requests: ", Dublin_requests, '<br>First time asylum application:', Firstime_asylum)) %>% layout(xaxis = x, yaxis = y, title= "2015 asylum applications and incoming Dublin requests", showlegend= FALSE)



h<- sample(1:10000, 1)
chart_link5 = api_create(p3, filename=paste("Dublin_Asylum",h, sep=""), sharing = "public")

chart_link5

url5<- chart_link5$embed_url

plotly_iframe5 <- paste("<center><iframe scrolling='no' seamless='seamless' style='border:none' src='", url5, 
    "/800/800' width='800' height='800'></iframe></center>", sep = "")

asylumfapp<- asylumDublin$Firstime_asylum
dublinreq<- asylumDublin$Dublin_requests

rsquared<- (cor(asylumfapp, dublinreq, use = "complete.obs"))^2

Of the top 10 recipient countries of Dublin transfer requests, 8 are also among the top 10 recipients of first-time asylum applications (see the table above). There is a weak linear relationship between first-time asylum applications and incoming Dublin requests. Countries of first entry do not always receive high numbers of Dublin transfer requests: Greece, which is usually compared to Italy in migratory flow analyses, received only 137 incoming Dublin requests in 2015. In other peripheral countries (Croatia and Lithuania), the number of incoming Dublin requests far exceeded the number of asylum applications received in 2015.
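
The relationship between the two measures can be illustrated with the ratio of first-time applications to incoming Dublin requests. The sketch below uses only the four countries that appear in both tables printed in this report, so it does not reproduce the full analysis; Germany's label is shortened and the column `apps_per_request` is an illustrative name.

```r
# Figures copied from the two tables in this report (four countries only)
both <- data.frame(
  Recipient       = c("Italy", "Germany", "Austria", "France"),
  Dublin_requests = c(25071, 11781, 5022, 4820),
  Firstime_asylum = c(75205, 429795, 85030, 70110),
  stringsAsFactors = FALSE
)

# First-time asylum applications per incoming Dublin request
both$apps_per_request <- both$Firstime_asylum / both$Dublin_requests

# Country with the fewest applications per incoming request
lowest <- both$Recipient[which.min(both$apps_per_request)]

both[order(both$apps_per_request), c("Recipient", "apps_per_request")]
lowest
```

A low ratio means a country receives many Dublin requests relative to the asylum applications lodged there.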

Top requester countries and correlation between incoming and outgoing requests

#Rank Requesting Countries by number of requests 
toprequesters<- incomingreq %>%  group_by(PARTNER) %>%  summarise(Sum_Outgoing= sum(Value)) %>% 
arrange(desc(Sum_Outgoing))

#kable(head(toprequesters), format= "html", caption = "Rank of European MSs by outgoing requests (as logged by submitting partner)")

toprequestersItaly<- incomingreq %>%  group_by(PARTNER) %>% filter(GEO=="Italy") %>%  summarise(Sum_Outgoing= sum(Value)) %>% arrange(desc(Sum_Outgoing))

#kable(head(toprequestersItaly), format= "html", caption = "Italy's top requesters (as logged by submitting partner)")

#Is there a correlation between incoming and outgoing requests? 

#Combine the two tables
topreq<- toprequesters %>% arrange(PARTNER) %>% rename (Country= PARTNER)
toprec<- toprecepients %>% arrange(Recipient) %>% rename (Country= Recipient) 

incvsoutg<- left_join(topreq, toprec, by = "Country")

incoutperc<- incvsoutg %>% mutate (incperc=round((Sum_Incoming/(Sum_Outgoing+Sum_Incoming))*100)) %>% arrange(desc(incperc))

#kable(head(incoutperc), format="html",caption= "Rank of EU MSs by incoming requests as a percentage of total requests" )

#Scatterplot and Correlation Analysis 

#p <- plot_ly(incvsoutg, x = ~sumoutgoing, y = ~sumincoming, color = ~Country, size = ~sumincoming, text = ~paste("Incoming: ", sumincoming, '$<br>Outgoing:', sumoutgoing))

#p4 <- plot_ly(incvsoutg, x = ~sumoutgoing, y = ~sumincoming, type = 'scatter' ,
#mode = 'markers', color = ~Country, size = ~sumincoming, text = ~paste(Country, "Incoming: ", sumincoming, '<br>Outgoing:', #sumoutgoing)) %>% layout(title= "2015 Incoming vs. Outgoing Dublin requests", showlegend=FALSE)

#p4

#y<- sample(11000:20000, 1)
#chart_link6 = plotly_POST(p4, fileopt = "overwrite", filename=paste("Incoming vs Outgoing Dublin Requests",y, sep=""))

#chart_link6


#url6<- chart_link6$embed_url

#plotly_iframe6 <- paste("<center><iframe scrolling='no' seamless='seamless' style='border:none' src='", url6, 
   # "/800/800' width='800' height='800'></iframe><center>", sep = "")

incoming<- incvsoutg$Sum_Incoming
outgoing<- incvsoutg$Sum_Outgoing
 

rsq<- (cor(incoming, outgoing, use = "complete.obs"))^2

 rsq

## [1] 0.103182

Only about 10% of the variation in outgoing requests is explained by the variation in incoming requests. Italy issued only 475 Dublin requests in 2015, less than 2% of the total Dublin requests handled by the country (incoming + outgoing).
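
Both figures follow from simple arithmetic on numbers reported in this document, as the short sketch below shows (variable names are illustrative).

```r
# Italy's Dublin requests in 2015, as reported in this document
italy_outgoing <- 475
italy_incoming <- 25071

# Outgoing requests as a percentage of all requests handled by Italy
outgoing_pct <- 100 * italy_outgoing / (italy_outgoing + italy_incoming)

# R squared as printed above, and the size of the underlying correlation
r_squared <- 0.103182
abs_r <- sqrt(r_squared)

round(outgoing_pct, 2)
round(abs_r, 3)
```

Taking the square root recovers only the magnitude of the Pearson correlation; its sign has to be read from `cor()` itself.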

Italy’s incoming Dublin Requests and breakdown by type

lines5 <- readLines("C:/Users/Patrizia/Desktop/AmbraMSDA/FinalProject/Data/Dublin Requests/Dublin_Italy_time_series.csv")
#Delete quotes, remove dots notation under the Value var and clean up the incomingreqItaly$LEG_PROV var
lines5 <-gsub('"', '', lines5, fixed=TRUE)
lines5<- lines5 %>% str_replace_all("\\.|\\s\\(Articles: 20.5, 18.1.b, 18.1.c, 18.1.d\\)","")
incomingreqItaly<- read.csv(textConnection(lines5))
#Convert via character so that a factor column is not turned into level codes
incomingreqItaly$Value<- as.integer(as.character(incomingreqItaly$Value))

#breakdown by type of request in 2015

incomingreqItalypie<- incomingreqItaly %>%  filter(TIME==2015) %>% select(LEG_PROV, PARTNER, Value) %>% group_by(LEG_PROV)%>%na.omit() %>% filter(Value>0) %>% summarise(Tot=sum(Value))

p5 <- plot_ly(incomingreqItalypie, labels =~LEG_PROV, values = ~Tot, type = 'pie') %>%
  layout(title = 'Breakdown of Italy incoming requests in 2015',
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))


j<- sample(11000:20000, 1)
chart_link3 = api_create(p5, filename=paste("Breakdown of Italy incoming requests in 2015",j, sep=""), fileopt = "overwrite", sharing = "public")

chart_link3

url3<- chart_link3$embed_url

plotly_iframe3 <- paste("<center><iframe scrolling='no' seamless='seamless' style='border:none' src='", url3, 
    "/800/800' width='800' height='800'></iframe></center>", sep = "")

Note that, in 2015, Italy received more “take charge” than “take back” requests. Take back requests are issued only when “the asylum seeker in the requesting country has already submitted an application for asylum in the country receiving the request” [http://ec.europa.eu/eurostat/statistics-explained/index.php/Dublin_statistics_on_countries_responsible_for_asylum_application].

Sentiment analysis of tweets on Italy’s reception facilities

Gathering tweets on Italian reception system

The following article describes Italy’s main reception facilities for Asylum Seekers, with a special focus on CARA (large scale, governmental centers) and SPRAR (small, decentralized reception structures supported by NGOs):

http://www.asylumineurope.org/reports/country/italy/reception-conditions/short-overview-italian-reception-system

AIDA visual on Italy’s reception system

#Authentication
consumer_key <- "XXXX"
consumer_secret <- "XXXX"
access_token <- "XXXXX"
access_secret <- "XXXXX"

options(httr_oauth_cache=TRUE)
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

## [1] "Using direct authentication"

#Get sample tweets on the topic 

asylum<- searchTwitter("(italy OR italian) + (asylum OR refugees) + (centers OR facilities OR procedures OR process OR CARA OR reception OR facility OR detention OR SPRAR OR camps OR camp)", n=1000, lang='en')

#Extract date of tweets and txt, consolidating into df
#Unfortunately twitter search API only seems to retrieve tweets in the past 7 days

#asylumdfall<- ldply(asylum, function(x) x$toDataFrame() )

#asylum_date=ldply(asylum, function(x) x$getCreated())
#asylum_date=sapply(asylum_date,function(x) strftime(x, format="%Y-%m-%d",tz = "UTC"))

asylum_txt <- sapply(asylum, function(x) x$getText())
#asylum_geo<- ldply(asylum, function(x) x$getScreenName())

asylum_text=unlist(asylum_txt)

#asylumtweetdf=as.data.frame(cbind(tweet=asylum_text,date=asylum_date))

#kable(head(asylumtweetdf), format="html", captions="Sample of tweets on Italy's reception facilities from past week")

#Cleaning up the text

library(tm)
library(stringr)

#removing hashtags, emails, urls, retweets etc...

#clean_tweet = str_replace_all(asylumdf$tweet, "&amp|#", "")
#clean_tweet = str_replace_all(clean_tweet,"http.* *|\\S+@\\S+|(RT|via)((?:\\b\\W*@\\w+)+)|@\\w+|[[:punct:]]|[[:digit:]]|^\\s+|\\s+$|[ \t]{2,}|\r?\n|\r|[ |\t]{2,}", "")

clean_tweet = asylum_text %>%  str_replace_all("&amp|#", "") %>% 
  str_replace_all("http.* *|\\S+@\\S+|(RT|via)((?:\\b\\W*@\\w+)+)|@\\w+|[[:punct:]]|[[:digit:]]", "") %>% str_replace_all("^\\s+|\\s+$|[ \t]{2,}|\r?\n|\r", "")

# Create corpus and make some basic transformations
corpus=Corpus(VectorSource(clean_tweet))

corpus<- corpus %>% 
    tm_map(stripWhitespace) %>% 
    tm_map(removeWords, stopwords("english"))


library(wordcloud)

corpustdm<- as.matrix(TermDocumentMatrix(corpus))

word.freq <- sort(rowSums(corpustdm), decreasing = TRUE)

library(RColorBrewer)

#The Dark2 palette provides at most 8 colours
pal2 <- brewer.pal(8,"Dark2")

wordcloud(words = names(word.freq), freq = word.freq, min.freq = 10,
          random.order = FALSE, colors=pal2)

A glance at the word cloud above suggests an overall negative sentiment. The tweet sample has two limitations: language (only tweets in English) and time span (due to Twitter API restrictions, only tweets from the past 7 days are returned).
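
The cleaning and counting behind a word cloud can be sketched in base R. The three strings below are invented for illustration (they are not tweets from the sample), the regular expressions are a simplified version of the ones used in this report, and the stop-word list is a tiny hand-written stand-in.

```r
# Invented example strings, not real tweets
tweets <- c(
  "RT @user: Reception centers in Italy are overcrowded http://example.com/a",
  "Italy asylum reception centers need reform #refugees",
  "Overcrowded camps, again. 2 reports today"
)

# Remove URLs, mentions, punctuation and digits, then normalise whitespace
clean <- gsub("http\\S+", "", tweets)
clean <- gsub("@\\w+", "", clean)
clean <- gsub("[[:punct:][:digit:]]", "", clean)
clean <- tolower(trimws(gsub("\\s+", " ", clean)))

# Split into words and drop a small, hand-picked set of stop words
tokens <- unlist(strsplit(clean, " ", fixed = TRUE))
tokens <- tokens[!tokens %in% c("rt", "in", "are", "need")]

# Word frequencies, most frequent first
word_freq <- sort(table(tokens), decreasing = TRUE)
word_freq
```

Lower-casing before removing stop words matters, because stop-word lists are usually lower case.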

To test this impression, I run a sentiment analysis using the syuzhet package.

Sentiment Analysis of tweets

library(syuzhet)

tweetsentiment = get_nrc_sentiment(clean_tweet)
#kable(head(tweetsentiment), format= "html")

#sentimentdf = cbind(asylumtweetdf,tweetsentiment)

#subset emotions
tweetemotions<- tweetsentiment %>%  select(-negative, -positive)
#subset sentiment values
tweetvalue<- tweetsentiment[, -(1:8)]

plottweetvalue<- gather(tweetvalue,"sentiment","values")  %>% 
  group_by( sentiment) %>%
  summarise(Total = sum(values))

plotemotions<- gather(tweetemotions,"emotion","values")  %>% 
  group_by(emotion) %>%
  summarise(Total = sum(values))
  

gg1<- ggplot(plottweetvalue, aes(sentiment, Total, color = sentiment, fill = sentiment)) + 
  geom_bar(stat="identity")+
  ggtitle("Sentiment of tweets on Italy's reception facilities for asylum seekers \n (data from past week as of May 14, 2017)")+  theme(legend.position="none")+ geom_text(aes(label=Total), position = position_dodge(width=0.75), vjust = -0.25)

 gg2<- ggplot(plotemotions, aes(emotion, Total, color = emotion, fill = emotion)) + 
  geom_bar(stat="identity")+
  ggtitle("Emotions conveyed by tweets on Italy's reception facilities for asylum seekers \n (data from past week as of May 14, 2017)")+  theme(legend.position="none")+ geom_text(aes(label=Total), position = position_dodge(width=0.75), vjust = -0.25)
 
 gg1

gg2

The charts above show that English-language tweets about Italian reception facilities are mostly negative, and that the four most frequent emotions in them are fear, sadness, anger and disgust.
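
The aggregation behind these charts is a column sum over per-tweet scores. The sketch below uses an invented score table with the ten columns that `get_nrc_sentiment()` returns (eight emotions followed by `negative` and `positive`); the numbers are made up and do not come from the tweet sample.

```r
# Invented per-tweet scores in the NRC column layout (three tweets)
scores <- data.frame(
  anger        = c(1, 0, 2),
  anticipation = c(0, 1, 0),
  disgust      = c(1, 1, 0),
  fear         = c(2, 1, 1),
  joy          = c(0, 1, 0),
  sadness      = c(1, 0, 2),
  surprise     = c(0, 0, 0),
  trust        = c(0, 1, 0),
  negative     = c(3, 0, 2),
  positive     = c(0, 2, 1)
)

# Totals per emotion (first eight columns) and per valence (last two)
emotion_totals <- colSums(scores[, 1:8])
valence_totals <- colSums(scores[, c("negative", "positive")])

# Four emotions with the highest totals
top_emotions <- names(sort(emotion_totals, decreasing = TRUE))[1:4]

valence_totals
top_emotions
```

Selecting the valence columns by name is a little safer than dropping columns by position.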

In general, media (including social media) tend to paint a negative picture of the Italian reception system for asylum seekers. More research is needed to support such a generalization.

Conclusion

In 2015, Italy was the top recipient of Dublin transfer requests (about 25,000) issued by other European MSs. 58% were “Take Charge” requests. The majority of Italy’s incoming Dublin requests came from nearby countries, namely Switzerland, Germany and France. The Twitter analysis shows a negative sentiment towards the state of Italy’s reception facilities. This is an initial, exploratory analysis that could be extended by researching traditional media coverage, the legal background of the Dublin system and secondary refugee movements.

FP

Ambra

14 May 2017