Part 1 - Introduction:

‘Refugee’ is a word we hear in the news often. By definition, a refugee is “one that flees; especially : a person who flees to a foreign country or power to escape danger or persecution”.

Recently, the media has been paying close attention to refugees fleeing conflict areas in the Middle East, dubbing this mass migration a ‘Refugee Crisis’.

I’d like to compare the number of refugees in the world today with the number in 1975, to analyze the numbers behind this crisis. I’d also like to see which regions of the world have produced the most refugees, and whether any countries have seen a decline in citizens seeking asylum elsewhere.

Part 2 - Data:

Data Collection:

My data was obtained from the UN website (http://data.un.org/Data.aspx?d=UNHCR&f=indID%3AType-Ref#UNHCR).

Cases:

The UN has data on refugee migration from 1975 to 2016. The data frame includes 96,065 observations. This analysis looks at only three variables: year, number of refugees, and country of origin.

Type of Study:

This is an observational study of data from 1975 to 2016 on people who have been forced to leave their country or territory of origin, covering all countries of origin.

Response:

The response variable is a numerical value that represents the number of refugees.

Explanatory:

The explanatory variable is calculated from the years and the number of refugees: the yearly refugee total excluding Afghanistan.

Scope of Inference:

The population of interest is the number of refugees from 1975 to 2016. Although I will only be observing the countries that refugees are fleeing from, I might be able to detect a trend in whether specific countries are seeing more or fewer refugees leaving over time. Because this is an observational study, it cannot establish the causes of migration trends.

library(psych)    # describe()
library(ggplot2)  # ggplot()
library(dplyr)    # %>%, arrange(), slice()

refugeedata <- read.csv("https://raw.githubusercontent.com/ntlrs/data606finalproject/master/UN%20Data.csv", header = TRUE, stringsAsFactors = FALSE)
head(refugeedata)
##   Country.or.territory.of.origin Year Refugees.sup....sup.
## 1                           Iraq 2016                    1
## 2           Islamic Rep. of Iran 2016                   33
## 3                       Pakistan 2016                59737
## 4                          China 2016                   11
## 5         Dem. Rep. of the Congo 2016                    3
## 6                          Egypt 2016                    3
##   Total.refugees.and.people.in.refugee.like.situations.sup.....sup.  X X.1
## 1                                                                 1 NA  NA
## 2                                                                33 NA  NA
## 3                                                             59737 NA  NA
## 4                                                                11 NA  NA
## 5                                                                 3 NA  NA
## 6                                                                 3 NA  NA
##   X.2 X.3 X.4
## 1  NA  NA  NA
## 2  NA  NA  NA
## 3  NA  NA  NA
## 4  NA  NA  NA
## 5  NA  NA  NA
## 6  NA  NA  NA
##                                                                    X.5
## 1                                                                     
## 2                                                                     
## 3                                                                     
## 4                                                                     
## 5                                                                     
## 6 how many people left their countries and became refugees (1075-2016)

Clean Data

refugee<-refugeedata[c(1:3)]
names(refugee)[1] <- "CoO"
names(refugee)[2] <- "Year"
names(refugee)[3] <- "Refugees"
refugee$CoO <- as.factor(refugee$CoO)
head(refugee)
##                      CoO Year Refugees
## 1                   Iraq 2016        1
## 2   Islamic Rep. of Iran 2016       33
## 3               Pakistan 2016    59737
## 4                  China 2016       11
## 5 Dem. Rep. of the Congo 2016        3
## 6                  Egypt 2016        3
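
The summary further down reports Year as a character column. One possible cause, which is only an assumption because just the first rows of the file are visible here, is that footnote rows in the UN export carry non-numeric text in that column. A minimal sketch with invented toy data (the data frame `toy` and its footnote row are hypothetical) shows one way to coerce such a column to integer and drop the rows that fail:

```r
# Toy stand-in for the cleaned data; the values and the footnote row are invented.
toy <- data.frame(
  CoO = c("Iraq", "Pakistan", "Egypt", "footnote"),
  Year = c("2016", "2016", "2015", "Please refer to..."),
  Refugees = c(1L, 59737L, 3L, NA),
  stringsAsFactors = FALSE
)

# Non-numeric strings become NA; the coercion warning is expected, so it is suppressed.
toy$Year <- suppressWarnings(as.integer(toy$Year))

# Keep only the rows that carry a valid year.
toy_clean <- toy[!is.na(toy$Year), ]
str(toy_clean)
```

With a numeric Year, the plots could use a continuous x axis (for example `scale_x_continuous()` in ggplot2) instead of listing the breaks as strings.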

Part 3 - Exploratory data analysis:

summary(refugee)
##                      CoO            Year              Refugees      
##  Various               : 2347   Length:96065       Min.   :      1  
##  Somalia               : 2243   Class :character   1st Qu.:      3  
##  Iraq                  : 2091   Mode  :character   Median :     14  
##  Dem. Rep. of the Congo: 2061                      Mean   :   4947  
##  Sudan                 : 1995                      3rd Qu.:    129  
##  Ethiopia              : 1937                      Max.   :3272290  
##  (Other)               :83391                      NA's   :202

I see that there are 202 NA values in my data. Since this number is small compared to the 96,065 observations in the data frame, I’m going to omit the NAs from this data set.

refugee <- na.omit(refugee)
summary(refugee)
##                      CoO            Year              Refugees      
##  Various               : 2335   Length:95863       Min.   :      1  
##  Somalia               : 2240   Class :character   1st Qu.:      3  
##  Iraq                  : 2091   Mode  :character   Median :     14  
##  Dem. Rep. of the Congo: 2056                      Mean   :   4947  
##  Sudan                 : 1991                      3rd Qu.:    129  
##  Ethiopia              : 1933                      Max.   :3272290  
##  (Other)               :83217
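
Before dropping rows it can help to put a number on how many are affected. The sketch below uses invented toy data (the data frame `toy` is hypothetical) to compute the share of incomplete rows, and then repeats the arithmetic with the counts reported by the summary above:

```r
# Toy data with invented values; the real data frame in this report is `refugee`.
toy <- data.frame(
  CoO = c("Iraq", "Iraq", "Sudan", "Sudan", "Chile"),
  Year = c(2015L, 2016L, 2015L, 2016L, 2016L),
  Refugees = c(10L, NA, 25L, 40L, 7L)
)

# Share of rows with at least one missing value.
na_share <- mean(!complete.cases(toy))
na_share

# Drop the incomplete rows, as na.omit() does in the report.
toy_complete <- na.omit(toy)
nrow(toy_complete)

# The same arithmetic with the counts from summary(refugee): 202 NAs in 96065 rows.
202 / 96065
```

In the real data this comes to roughly 0.2% of the rows, which supports treating the loss as negligible.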

Now, I’ll take a look at some of the summary statistics for the number of refugees in the data frame.

describe(refugee$Refugees)
##    vars     n   mean       sd median trimmed   mad min     max   range
## X1    1 95863 4947.1 61660.01     14  108.18 19.27   1 3272290 3272289
##     skew kurtosis     se
## X1 30.24  1182.49 199.15

The mean number of refugees over the time represented by the data frame is 4947.1, with a standard deviation of 61660.01.

Since the original data included a column for the countries refugees fled to, each country of origin is listed several times for any given year. Since I’m only looking at the total number of refugees leaving a particular country, I’m going to combine the data into each country’s total number of refugees per year.

allrefugees <- aggregate(refugee$Refugees, by=list(Year=refugee$Year, CoO=refugee$CoO), FUN=sum)
summary(allrefugees)
##      Year                               CoO             x          
##  Length:5878        Angola                :  42   Min.   :      1  
##  Class :character   Burundi               :  42   1st Qu.:     46  
##  Mode  :character   Cambodia              :  42   Median :    774  
##                     Chile                 :  42   Mean   :  80681  
##                     Dem. Rep. of the Congo:  42   3rd Qu.:  16238  
##                     Ethiopia              :  42   Max.   :6339095  
##                     (Other)               :5626
describe(allrefugees$x)
##    vars    n    mean       sd median  trimmed     mad min     max   range
## X1    1 5878 80681.1 365766.9    774 13269.69 1143.83   1 6339095 6339094
##    skew kurtosis      se
## X1 9.47   112.45 4770.78
refugeetotals <- aggregate(refugee$Refugees, by=list(Year=refugee$Year), FUN=sum)
names(refugeetotals)[2] <- "numrefugees"

summary(refugeetotals)
##      Year            numrefugees      
##  Length:42          Min.   : 3529434  
##  Class :character   1st Qu.: 9614508  
##  Mode  :character   Median :10864246  
##                     Mean   :11291512  
##                     3rd Qu.:13607394  
##                     Max.   :17838074
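
The two aggregation steps above can be illustrated on a tiny example. This is a sketch with invented numbers (the data frame `toy` is hypothetical), and it uses the formula interface of `aggregate()` rather than the `by = list(...)` form used in the report:

```r
# Toy records (invented): several rows per origin country and year,
# standing in for the separate destination rows in the real data.
toy <- data.frame(
  CoO = c("A", "A", "A", "B", "B"),
  Year = c(2015L, 2015L, 2016L, 2015L, 2016L),
  Refugees = c(10L, 5L, 7L, 3L, 4L)
)

# Total per origin country and year (collapses the repeated rows).
by_country_year <- aggregate(Refugees ~ Year + CoO, data = toy, FUN = sum)

# Total per year across all origin countries.
by_year <- aggregate(Refugees ~ Year, data = toy, FUN = sum)

by_country_year
by_year
```

One difference worth knowing: the formula interface drops rows with missing values by default, so it would not need the separate `na.omit()` step.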
ggplot(refugeetotals, aes(x=Year, y=numrefugees, group = 1)) +
    geom_line() + ylab("Total Refugees") + xlab("Years") +
    ggtitle("Total Number of Refugees") + scale_x_discrete(breaks = c("1975", "1980", "1985", "1990", "1995", "2000", "2005", "2010", "2015"))

sum(allrefugees$x)
## [1] 474243490
top25 <- allrefugees %>%
  arrange(desc(x)) %>%
   slice(1:25) 
top25
## # A tibble: 25 x 3
##     Year              CoO       x
##    <chr>           <fctr>   <int>
##  1  1990      Afghanistan 6339095
##  2  1991      Afghanistan 6306301
##  3  1989      Afghanistan 5643989
##  4  1988      Afghanistan 5622982
##  5  1987      Afghanistan 5511740
##  6  2016 Syrian Arab Rep. 5500448
##  7  1986      Afghanistan 5094283
##  8  2015 Syrian Arab Rep. 4851450
##  9  1983      Afghanistan 4712735
## 10  1985      Afghanistan 4653193
## # ... with 15 more rows

I wanted to see which country and year produced the most refugees. Afghanistan occupies 8 of the top 10 slots.

afghanistanrefugees <- subset(allrefugees, CoO=="Afghanistan")
afghanistanrefugees <- afghanistanrefugees[c(1,3)]
names(afghanistanrefugees)[2] <- "numrefugees"
describe(afghanistanrefugees$numrefugees)
##    vars  n    mean      sd  median trimmed     mad   min     max   range
## X1    1 38 3311495 1407570 2675455 3234206 1085756 5e+05 6339095 5839095
##    skew kurtosis     se
## X1 0.58    -0.59 228338
ggplot(afghanistanrefugees, aes(x=Year, y=numrefugees, group = 1)) +
    geom_line() + ylab("Refugees from Afghanistan") + xlab("Years") +
    ggtitle("Number of Refugees from Afghanistan") + scale_x_discrete(breaks = c("1975", "1980", "1985", "1990", "1995", "2000", "2005", "2010", "2015"))

sum(afghanistanrefugees$numrefugees)
## [1] 125836802
(sum(afghanistanrefugees$numrefugees)/sum(allrefugees$x))
## [1] 0.2653422

Refugees from Afghanistan make up about 27% of the world’s cumulative yearly refugee counts from 1975 to 2016.
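
The 0.265 figure divides one sum over all years by another, so it is a pooled share rather than a share in any single year. A per-year share can tell a different story. The sketch below uses invented numbers (the data frames `toy`, `world`, `afg`, and `shares` are hypothetical) in the same shape as `allrefugees`:

```r
# Toy yearly totals (invented numbers), in the shape of `allrefugees`.
toy <- data.frame(
  Year = c(1990L, 1990L, 1991L, 1991L),
  CoO = c("Afghanistan", "Other", "Afghanistan", "Other"),
  x = c(60, 40, 30, 70)
)

# World total per year.
world <- aggregate(x ~ Year, data = toy, FUN = sum)
names(world)[2] <- "world_total"

# Afghanistan's count per year, joined to the world total for that year.
afg <- toy[toy$CoO == "Afghanistan", c("Year", "x")]
shares <- merge(afg, world, by = "Year")
shares$share <- shares$x / shares$world_total
shares

# Pooled share over all years, the analogue of the figure computed above.
pooled <- sum(afg$x) / sum(toy$x)
pooled
```

In this toy case the yearly shares are 0.6 and 0.3 while the pooled share is 0.45, which shows how the pooled number can hide a change over time.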

noafghanistan <- subset(allrefugees, CoO != "Afghanistan")
describe(noafghanistan$x)
##    vars    n     mean       sd median  trimmed    mad min     max   range
## X1    1 5840 59658.68 231793.6    750 12272.95 1107.5   1 5500448 5500447
##    skew kurtosis      se
## X1 9.47   134.58 3033.16
refugeewoafghan <- aggregate(noafghanistan$x, by=list(Year=noafghanistan$Year), FUN=sum)
names(refugeewoafghan)[2] <- "numrefugees"

describe(refugeewoafghan$numrefugees)
##    vars  n    mean      sd  median trimmed     mad     min      max
## X1    1 42 8295397 2596848 7693171 8168177 2057003 3529434 14044724
##       range skew kurtosis       se
## X1 10515290 0.53    -0.49 400702.3
ggplot(refugeewoafghan, aes(x=Year, y=numrefugees, group = 1)) +
    geom_line() + ylab("Refugees Minus Afghanistan") + xlab("Years") +
    ggtitle("Number of Refugees w/o Afghanistan") + scale_x_discrete(breaks = c("1975", "1980", "1985", "1990", "1995", "2000", "2005", "2010", "2015"))

sum(noafghanistan$x)
## [1] 348406688
# Green: world total; blue: world total without Afghanistan; red: Afghanistan only
ggplot(refugeetotals, aes(x=Year, y=numrefugees, group = 1)) + 
geom_line(color = 'green') + 
geom_line(data = refugeewoafghan, aes(x=Year, y=numrefugees), color = 'blue') +
geom_line(data = afghanistanrefugees, aes(x=Year, y=numrefugees), color = 'red') +  
   ggtitle("Comparing Total World Refugees with Refugee Totals without Afghanistan") + scale_x_discrete(breaks = c("1975", "1980", "1985", "1990", "1995", "2000", "2005", "2010", "2015"))

refugeestats <- data.frame("Year" = refugeewoafghan$Year,
                           "Num Refugee wo Afghanistan" = refugeewoafghan$numrefugees,
                           "Total Number of Refugees" = refugeetotals$numrefugees)
# Store Year as integer calendar years (going through character in case it is a factor)
refugeestats$Year <- as.integer(as.character(refugeestats$Year))
head(refugeestats)
##   Year Num.Refugee.wo.Afghanistan Total.Number.of.Refugees
## 1 1975                    3529434                  3529434
## 2 1976                    4270631                  4270631
## 3 1977                    4518659                  4518659
## 4 1978                    5065844                  5065844
## 5 1979                    5779912                  6279912
## 6 1980                    6720016                  8454937

Part 4 - Inference:

H0 = There is no relationship between the population of refugees from Afghanistan and the total world refugee population since 1975.

HA = There is a relationship between the population of refugees from Afghanistan and the total world refugee population since 1975.
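
One direct way to examine a relationship like the one stated in these hypotheses is a correlation test between Afghanistan's yearly counts and the yearly world totals. This is not the method used in this report, which fits a regression below; it is a sketch on invented numbers, and the data frames `world`, `afg`, and `both` are hypothetical. The join matters because Afghanistan has 38 yearly values against 42 for the world totals.

```r
# Toy yearly series (invented); Afghanistan lacks the first year, mirroring
# the real data where it has fewer years than the world totals.
world <- data.frame(Year = 2001:2008,
                    total = c(10, 12, 15, 14, 18, 21, 20, 25))
afg <- data.frame(Year = 2002:2008,
                  afghanistan = c(3, 5, 4, 6, 8, 7, 9))

# Inner join keeps only the years present in both series.
both <- merge(world, afg, by = "Year")

# Pearson correlation test; H0 is that the true correlation is zero.
ct <- cor.test(both$afghanistan, both$total)
ct$estimate
ct$p.value
```

Because the world total contains Afghanistan's own count, some positive correlation is built in, so a result like this should be read with that overlap in mind.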

Conditions

Independence of cases: It is unlikely that the number of refugees from the rest of the world would affect the number of refugees leaving Afghanistan, so I treat these cases as independent of one another.

Sample Size/Skew: The regression is fit on 42 yearly totals (1975-2016), which together cover 474,243,490 refugee counts. This meets the minimum sample size to pass this condition.

boxplot(refugeestats$Num.Refugee.wo.Afghanistan, refugeestats$Total.Number.of.Refugees, names=c("Total W/O Afghanistan","All Refugees"), col=c("blue","orange"), main="Box Plot of World Refugee Totals with and without Afghanistan")

qq<- lm(Num.Refugee.wo.Afghanistan ~ Total.Number.of.Refugees, data = refugeestats)
plot(refugeestats$Num.Refugee.wo.Afghanistan ~ refugeestats$Total.Number.of.Refugees, col="blue", xlab="World Refugee Total", ylab="Refugees W/O Afghanistan")
abline(qq)

costats<-lm(refugeestats$Total.Number.of.Refugees ~ refugeestats$Num.Refugee.wo.Afghanistan)
summary(costats)
## 
## Call:
## lm(formula = refugeestats$Total.Number.of.Refugees ~ refugeestats$Num.Refugee.wo.Afghanistan)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2235302  -954386  -462826  1300460  2779406 
## 
## Coefficients:
##                                          Estimate Std. Error t value
## (Intercept)                             1.042e+06  8.159e+05   1.277
## refugeestats$Num.Refugee.wo.Afghanistan 1.236e+00  9.396e-02  13.149
##                                         Pr(>|t|)    
## (Intercept)                                0.209    
## refugeestats$Num.Refugee.wo.Afghanistan 4.15e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1562000 on 40 degrees of freedom
## Multiple R-squared:  0.8121, Adjusted R-squared:  0.8074 
## F-statistic: 172.9 on 1 and 40 DF,  p-value: 4.154e-16
hist(resid(costats))

qqnorm(resid(costats))
qqline(resid(costats))
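
The histogram and Q-Q plot are visual checks of whether the residuals look normal. A numeric complement is the Shapiro-Wilk test, sketched here on simulated data. The variables `x`, `y`, and `fit` are invented for illustration and are not the report's model.

```r
set.seed(1)  # reproducible toy data

# Invented predictor and response: a linear trend plus normal noise.
x <- seq(1, 42)
y <- 2 + 1.2 * x + rnorm(42, sd = 3)
fit <- lm(y ~ x)

# Shapiro-Wilk test on the residuals; H0 is that they are normally distributed.
sw <- shapiro.test(resid(fit))
sw$statistic
sw$p.value
```

A small p-value would suggest the residuals depart from normality. With only 42 points the test has limited power, so it is best read alongside the plots rather than in place of them.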

ANOVA

anova <- aov(refugeestats$Total.Number.of.Refugees ~ refugeestats$Num.Refugee.wo.Afghanistan)
summary(anova)
##                                         Df    Sum Sq   Mean Sq F value
## refugeestats$Num.Refugee.wo.Afghanistan  1 4.221e+14 4.221e+14   172.9
## Residuals                               40 9.765e+13 2.441e+12        
##                                           Pr(>F)    
## refugeestats$Num.Refugee.wo.Afghanistan 4.15e-16 ***
## Residuals                                           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
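
The ANOVA table repeats the F statistic and p-value from the regression summary because, with a single predictor, the F statistic equals the square of the slope's t statistic. Using the printed values, 13.149 squared is about 172.9. The sketch below checks the same identity on simulated data; `x`, `y`, and `fit` are invented for illustration.

```r
set.seed(2)  # reproducible toy data

# Invented predictor and response: a linear trend plus normal noise.
x <- 1:42
y <- 5 + 0.8 * x + rnorm(42, sd = 4)
fit <- lm(y ~ x)

# t statistic for the slope, from the coefficient table.
t_slope <- summary(fit)$coefficients["x", "t value"]

# F statistic from the ANOVA table of the same model.
f_stat <- anova(fit)[["F value"]][1]

# With one predictor these agree: F = t^2.
c(t_squared = t_slope^2, F = f_stat)

# The same check with the t value printed in the regression summary above.
13.149^2
```

So in a one-predictor model the ANOVA step confirms the regression result rather than adding an independent test.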

Part 5 - Conclusion:

Based on the analysis above, the number of Afghan refugees has a significant relationship with the world’s refugee population, so we reject the null hypothesis in this instance. Though the number of refugees from Afghanistan has been decreasing, the sheer volume of refugees still accounts for a massive share of the world’s refugee population.

For further research, I would like to look more at contributing factors, such as GDP, the economy, and unrest, in countries that produce large numbers of refugees.

References:

Data: http://data.un.org/Data.aspx?d=UNHCR&f=indID%3AType-Ref#UNHCR