‘Refugee’ is a word we hear in the news often. By definition, a refugee is “one that flees; especially : a person who flees to a foreign country or power to escape danger or persecution”.
Recently, the media has been paying close attention to refugees fleeing conflict areas in the Middle East, dubbing this mass migration of people as a ‘Refugee Crisis’.
I’d like to compare how many refugees there are in the world today with the number of refugees in the world in 1975, to analyze the numbers behind this crisis. I’d also like to see which regions of the world have produced the most refugees, and whether any countries have seen a decline in citizens seeking asylum elsewhere.
My data was obtained from the UN website (http://data.un.org/Data.aspx?d=UNHCR&f=indID%3AType-Ref#UNHCR).
The UN has data on refugee migration from 1975-2016. The data frame includes 96065 observations. This analysis looks at only three variables: year, number of refugees, and country of origin.
This is an observational study looking at data from 1975 to 2016 for all countries whose people have been forced to leave their country or territory of origin.
The response variable is a numerical value that represents the number of refugees.
The explanatory variable will be calculated from the years and the number of refugees.
The population of interest is the number of refugees from 1975-2016. While I will only be observing the countries from which refugees are fleeing, I might be able to detect a trend in whether specific countries are seeing more or fewer refugees leaving over time. This study will not be able to establish the causes of any migration trends.
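library(psych); library(ggplot2); library(dplyr) # describe(), ggplot(), and %>% / arrange() / slice() used below
refugeedata <- read.csv("https://raw.githubusercontent.com/ntlrs/data606finalproject/master/UN%20Data.csv", header = TRUE, stringsAsFactors = FALSE)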
head(refugeedata)
## Country.or.territory.of.origin Year Refugees.sup....sup.
## 1 Iraq 2016 1
## 2 Islamic Rep. of Iran 2016 33
## 3 Pakistan 2016 59737
## 4 China 2016 11
## 5 Dem. Rep. of the Congo 2016 3
## 6 Egypt 2016 3
## Total.refugees.and.people.in.refugee.like.situations.sup.....sup. X X.1
## 1 1 NA NA
## 2 33 NA NA
## 3 59737 NA NA
## 4 11 NA NA
## 5 3 NA NA
## 6 3 NA NA
## X.2 X.3 X.4
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## X.5
## 1
## 2
## 3
## 4
## 5
## 6 how many people left their countries and became refugees (1075-2016)
Clean Data
refugee<-refugeedata[c(1:3)]
names(refugee)[1] <- "CoO"
names(refugee)[2] <- "Year"
names(refugee)[3] <- "Refugees"
refugee$CoO <- as.factor(refugee$CoO)
head(refugee)
## CoO Year Refugees
## 1 Iraq 2016 1
## 2 Islamic Rep. of Iran 2016 33
## 3 Pakistan 2016 59737
## 4 China 2016 11
## 5 Dem. Rep. of the Congo 2016 3
## 6 Egypt 2016 3
summary(refugee)
## CoO Year Refugees
## Various : 2347 Length:96065 Min. : 1
## Somalia : 2243 Class :character 1st Qu.: 3
## Iraq : 2091 Mode :character Median : 14
## Dem. Rep. of the Congo: 2061 Mean : 4947
## Sudan : 1995 3rd Qu.: 129
## Ethiopia : 1937 Max. :3272290
## (Other) :83391 NA's :202
I see that there are 202 NA values in my data. Since this number is small compared to the 96065 observations in the data set, I’m going to omit the rows with NA’s.
refugee <- na.omit(refugee)
summary(refugee)
## CoO Year Refugees
## Various : 2335 Length:95863 Min. : 1
## Somalia : 2240 Class :character 1st Qu.: 3
## Iraq : 2091 Mode :character Median : 14
## Dem. Rep. of the Congo: 2056 Mean : 4947
## Sudan : 1991 3rd Qu.: 129
## Ethiopia : 1933 Max. :3272290
## (Other) :83217
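To make the "small share of missing values" reasoning concrete, here is a minimal sketch on a made-up toy data frame (the values are invented for illustration and are not from the UN file). It counts the missing refugee counts, computes their share of all rows, and then drops them the same way `na.omit()` does above. In the real data the share is 202 out of 96065 rows, or roughly 0.21%.

```r
# Toy data (made-up values, NOT the UN data) with one missing count
toy <- data.frame(
  CoO = c("A", "A", "B", "B", "C"),
  Year = c("2015", "2016", "2015", "2016", "2016"),
  Refugees = c(10, NA, 5, 7, 3),
  stringsAsFactors = FALSE
)

# Number and share of rows with a missing refugee count
n_missing <- sum(is.na(toy$Refugees))
share_missing <- n_missing / nrow(toy)

# Drop incomplete rows, as na.omit() does in the analysis
toy_clean <- na.omit(toy)

print(n_missing)        # 1 missing value
print(share_missing)    # 0.2, i.e. 20% of the toy rows
print(nrow(toy_clean))  # 4 rows remain
```

Checking the share first is a cheap way to confirm that dropping the rows will not noticeably change the totals.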
Now, I’ll take a look at some of the summary statistics for the number of refugees in the data frame.
describe(refugee$Refugees)
## vars n mean sd median trimmed mad min max range
## X1 1 95863 4947.1 61660.01 14 108.18 19.27 1 3272290 3272289
## skew kurtosis se
## X1 30.24 1182.49 199.15
The mean number of refugees over the time represented by the data frame is 4947.1, with a standard deviation of 61660.01.
Since the original data included a column for the countries that refugees fled to, each country of origin is listed several times for any given year. Since I’m only looking at the total number of refugees leaving a particular country, I’m going to combine the rows so that each country of origin has a single total per year.
allrefugees <- aggregate(refugee$Refugees, by=list(Year=refugee$Year, CoO=refugee$CoO), FUN=sum)
summary(allrefugees)
## Year CoO x
## Length:5878 Angola : 42 Min. : 1
## Class :character Burundi : 42 1st Qu.: 46
## Mode :character Cambodia : 42 Median : 774
## Chile : 42 Mean : 80681
## Dem. Rep. of the Congo: 42 3rd Qu.: 16238
## Ethiopia : 42 Max. :6339095
## (Other) :5626
describe(allrefugees$x)
## vars n mean sd median trimmed mad min max range
## X1 1 5878 80681.1 365766.9 774 13269.69 1143.83 1 6339095 6339094
## skew kurtosis se
## X1 9.47 112.45 4770.78
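The aggregation step can be sketched on a small made-up data frame (invented values, not the UN data). Here country "A" appears twice in 2015, standing in for two different destination countries, and `aggregate()` collapses those rows into one total per year and country of origin. The formula interface used below is an alternative to the `by = list(...)` form used in the analysis; both sum within groups.

```r
# Toy data (made-up values): country A has two rows for 2015
toy <- data.frame(
  CoO = c("A", "A", "A", "B", "B"),
  Year = c("2015", "2015", "2016", "2015", "2016"),
  Refugees = c(10, 20, 5, 7, 3),
  stringsAsFactors = FALSE
)

# One row per (Year, CoO) pair, summing the refugee counts within each pair
totals <- aggregate(Refugees ~ Year + CoO, data = toy, FUN = sum)
print(totals)

# Aggregating should never change the grand total
print(sum(totals$Refugees) == sum(toy$Refugees))
```

Comparing the grand total before and after is a quick sanity check that no rows were lost or double counted.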
refugeetotals <- aggregate(refugee$Refugees, by=list(Year=refugee$Year), FUN=sum)
names(refugeetotals)[2] <- "numrefugees"
summary(refugeetotals)
## Year numrefugees
## Length:42 Min. : 3529434
## Class :character 1st Qu.: 9614508
## Mode :character Median :10864246
## Mean :11291512
## 3rd Qu.:13607394
## Max. :17838074
ggplot(refugeetotals, aes(x=Year, y=numrefugees, group = 1)) +
geom_line() + ylab("Total Refugees") + xlab("Years") +
ggtitle("Total Number of Refugees") + scale_x_discrete(breaks = c("1975", "1980", "1985", "1990", "1995", "2000", "2005", "2010", "2015"))
sum(allrefugees$x)
## [1] 474243490
top25 <- allrefugees %>%
arrange(desc(x)) %>%
slice(1:25)
top25
## # A tibble: 25 x 3
## Year CoO x
## <chr> <fctr> <int>
## 1 1990 Afghanistan 6339095
## 2 1991 Afghanistan 6306301
## 3 1989 Afghanistan 5643989
## 4 1988 Afghanistan 5622982
## 5 1987 Afghanistan 5511740
## 6 2016 Syrian Arab Rep. 5500448
## 7 1986 Afghanistan 5094283
## 8 2015 Syrian Arab Rep. 4851450
## 9 1983 Afghanistan 4712735
## 10 1985 Afghanistan 4653193
## # ... with 15 more rows
I wanted to see which country and year produced the most refugees. Afghanistan occupies 8 of the top 10 slots.
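Counting how often a country appears among the largest yearly totals can also be done in base R. The sketch below uses made-up totals (not the UN figures) and hypothetical country labels "A", "B" and "C"; it sorts by the total, keeps the top rows, and tabulates the country column.

```r
# Toy yearly totals (made-up values), one row per year and country of origin
toy_totals <- data.frame(
  Year = c("1990", "1991", "2016", "2015", "1990"),
  CoO = c("A", "A", "B", "B", "C"),
  x = c(600, 590, 550, 480, 100),
  stringsAsFactors = FALSE
)

# Sort from largest to smallest total and keep the top 3 rows
top3 <- head(toy_totals[order(-toy_totals$x), ], 3)

# How many of the top slots does each country occupy?
counts <- table(top3$CoO)
print(top3)
print(counts)   # A appears twice, B once
```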
afghanistanrefugees <- subset(allrefugees, CoO=="Afghanistan")
afghanistanrefugees <- afghanistanrefugees[c(1,3)]
names(afghanistanrefugees)[2] <- "numrefugees"
describe(afghanistanrefugees$numrefugees)
## vars n mean sd median trimmed mad min max range
## X1 1 38 3311495 1407570 2675455 3234206 1085756 5e+05 6339095 5839095
## skew kurtosis se
## X1 0.58 -0.59 228338
ggplot(afghanistanrefugees, aes(x=Year, y=numrefugees, group = 1)) +
geom_line() + ylab("Afghanistan") + xlab("Years") +
ggtitle("Number of Refugees from Afghanistan") + scale_x_discrete(breaks = c("1975", "1980", "1985", "1990", "1995", "2000", "2005", "2010", "2015"))
sum(afghanistanrefugees$numrefugees)
## [1] 125836802
(sum(afghanistanrefugees$numrefugees)/sum(allrefugees$x))
## [1] 0.2653422
Refugees from Afghanistan make up about 26.5% of the world’s cumulative yearly refugee counts since 1975.
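As a quick arithmetic check, the share can be recomputed directly from the two sums printed in this analysis. The sketch below only uses those reported figures; note that both are sums of yearly counts, so a person who remains a refugee for several years is counted once per year.

```r
# Sums reported in this analysis (yearly counts added up over 1975-2016)
afghanistan_total <- 125836802
world_total <- 474243490

# Afghanistan's share of the cumulative yearly counts, as a percentage
share <- afghanistan_total / world_total
pct <- round(100 * share, 1)
print(pct)   # 26.5

# Whatever is not from Afghanistan is the remainder of the world total
rest_of_world <- world_total - afghanistan_total
print(rest_of_world)
```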
noafghanistan <- subset(allrefugees, CoO != "Afghanistan")
describe(noafghanistan$x)
## vars n mean sd median trimmed mad min max range
## X1 1 5840 59658.68 231793.6 750 12272.95 1107.5 1 5500448 5500447
## skew kurtosis se
## X1 9.47 134.58 3033.16
refugeewoafghan <- aggregate(noafghanistan$x, by=list(Year=noafghanistan$Year), FUN=sum)
names(refugeewoafghan)[2] <- "numrefugees"
describe(refugeewoafghan$numrefugees)
## vars n mean sd median trimmed mad min max
## X1 1 42 8295397 2596848 7693171 8168177 2057003 3529434 14044724
## range skew kurtosis se
## X1 10515290 0.53 -0.49 400702.3
ggplot(refugeewoafghan, aes(x=Year, y=numrefugees, group = 1)) +
geom_line() + ylab("Refugees Minus Afghanistan") + xlab("Years") +
ggtitle("Number of Refugees w/o Afghanistan") + scale_x_discrete(breaks = c("1975", "1980", "1985", "1990", "1995", "2000", "2005", "2010", "2015"))
sum(noafghanistan$x)
## [1] 348406688
ggplot(refugeetotals, aes(x=Year, y=numrefugees, group = 1)) +
geom_line(color = 'green') + # world total; the color belongs on the layer, not in ggplot()
geom_line(data = refugeewoafghan, aes(x=Year, y=numrefugees), color = 'blue') + # aesthetic names are lowercase: y, not Y
geom_line(data = afghanistanrefugees, aes(x=Year, y=numrefugees), color = 'red') +
ggtitle("Comparing Total World Refugees with Refugee Totals without Afghanistan") + scale_x_discrete(breaks = c("1975", "1980", "1985", "1990", "1995", "2000", "2005", "2010", "2015"))
refugeestats <- data.frame("Year" = refugeewoafghan$Year,
"Num Refugee wo Afghanistan" = refugeewoafghan$numrefugees,
"Total Number of Refugees" = refugeetotals$numrefugees)
refugeestats$Year <- as.integer(as.character(refugeestats$Year)) # store the years as numbers; go through as.character() in case Year is a factor
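A note on converting year labels to numbers: when a column of year labels is stored as a factor (which `data.frame()` did by default for text columns in older versions of R), `as.integer()` returns the internal level codes 1, 2, 3, ... rather than the years themselves. A minimal sketch of the difference, using three example years:

```r
# Year labels stored as a factor
years <- factor(c("1975", "1976", "1977"))

# as.integer() on a factor gives the level codes, not the labels
codes <- as.integer(years)
print(codes)    # 1 2 3

# Converting to character first recovers the actual years
values <- as.integer(as.character(years))
print(values)   # 1975 1976 1977
```

The result also has to be assigned back to the column to have any effect; calling the conversion on its own only prints it.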
head(refugeestats)
## Year Num.Refugee.wo.Afghanistan Total.Number.of.Refugees
## 1 1975 3529434 3529434
## 2 1976 4270631 4270631
## 3 1977 4518659 4518659
## 4 1978 5065844 5065844
## 5 1979 5779912 6279912
## 6 1980 6720016 8454937
H0 = There is no relationship between the population of refugees from Afghanistan and the total world refugee population since 1975.
HA = There is a relationship between the population of refugees from Afghanistan and the total world refugee population since 1975.
Independence of cases : It is unlikely that the number of refugees from around the world would impact the number of refugees leaving Afghanistan. These cases are independent of one another.
Sample Size/Skew : The regression uses 42 yearly totals (1975-2016), which together add up to 474243490 recorded refugee counts. It meets the minimum sample size to pass this condition.
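One possible way to probe the independence condition for yearly totals is to look at how strongly each year's value is correlated with the previous year's. The sketch below is only an illustration on a made-up series (not the UN totals), and the variable names are hypothetical. A lag-1 correlation near 0 would be consistent with independent observations, while a value near 1 would suggest that consecutive years move together.

```r
# Made-up yearly series (in millions), for illustration only
series <- c(3.5, 4.3, 4.5, 5.1, 5.8, 6.7, 7.2, 8.0, 8.4, 9.1)
n <- length(series)

# Correlation between each year and the year that follows it
lag1 <- cor(series[-n], series[-1])
print(lag1)
```

The same two lines could be applied to a column such as `refugeestats$Total.Number.of.Refugees` to see how the real series behaves.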
boxplot(refugeestats$Num.Refugee.wo.Afghanistan, refugeestats$Total.Number.of.Refugees, names=c("Total W/O Afghanistan","All Refugees"), col=c("blue","orange"), main="Box Plot of World Refugee Totals with and without Afghanistan")
qq<- lm(Num.Refugee.wo.Afghanistan ~ Total.Number.of.Refugees, data = refugeestats)
plot(refugeestats$Num.Refugee.wo.Afghanistan ~ refugeestats$Total.Number.of.Refugees, col="blue", xlab="World Refugee Total", ylab="Refugees W/O Afghanistan")
abline(qq)
costats<-lm(refugeestats$Total.Number.of.Refugees ~ refugeestats$Num.Refugee.wo.Afghanistan)
summary(costats)
##
## Call:
## lm(formula = refugeestats$Total.Number.of.Refugees ~ refugeestats$Num.Refugee.wo.Afghanistan)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2235302 -954386 -462826 1300460 2779406
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 1.042e+06 8.159e+05 1.277
## refugeestats$Num.Refugee.wo.Afghanistan 1.236e+00 9.396e-02 13.149
## Pr(>|t|)
## (Intercept) 0.209
## refugeestats$Num.Refugee.wo.Afghanistan 4.15e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1562000 on 40 degrees of freedom
## Multiple R-squared: 0.8121, Adjusted R-squared: 0.8074
## F-statistic: 172.9 on 1 and 40 DF, p-value: 4.154e-16
hist(resid(costats))
qqnorm(resid(costats))
qqline(resid(costats))
anova <- aov(refugeestats$Total.Number.of.Refugees ~ refugeestats$Num.Refugee.wo.Afghanistan)
summary(anova)
## Df Sum Sq Mean Sq F value
## refugeestats$Num.Refugee.wo.Afghanistan 1 4.221e+14 4.221e+14 172.9
## Residuals 40 9.765e+13 2.441e+12
## Pr(>F)
## refugeestats$Num.Refugee.wo.Afghanistan 4.15e-16 ***
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
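The regression summary and the ANOVA table report the same test in two forms. For a regression with a single predictor, R-squared equals the squared correlation between the two variables, and the F statistic equals the square of the slope's t value. With the figures above, the square root of 0.8121 gives a correlation of about 0.90, and 13.149 squared is about 172.9. A minimal sketch on made-up data (not the UN totals) that checks both identities:

```r
# Made-up paired values, for illustration only
x <- c(3.5, 4.3, 4.5, 5.1, 5.8, 6.7, 7.2, 8.0)
y <- c(3.5, 4.3, 4.6, 5.2, 6.4, 8.5, 9.6, 11.1)

fit <- lm(y ~ x)
s <- summary(fit)

# R-squared versus the squared correlation
r_squared <- s$r.squared
r <- cor(x, y)
print(all.equal(r_squared, r^2))    # TRUE

# F statistic versus the squared t value of the slope
t_slope <- s$coefficients["x", "t value"]
f_stat <- s$fstatistic[["value"]]
print(all.equal(f_stat, t_slope^2)) # TRUE
```

This is why the p-value in the `lm()` summary and the one in the `aov()` table are identical.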
Based on the analysis above, there is a statistically significant relationship between the number of refugees from Afghanistan and the world’s refugee population. We reject the null hypothesis in this instance. Though the number of refugees from Afghanistan has declined from its 1990 peak, the sheer volume of Afghan refugees still accounts for a massive share of the world’s refugee population.
For further research, I would like to look more at contributing factors, such as GDP, the economy, and unrest in countries that produce large numbers of refugees.
Data: http://data.un.org/Data.aspx?d=UNHCR&f=indID%3AType-Ref#UNHCR