UNHCR Population Statistics Database

The database currently contains data about UNHCR’s populations of concern from the year 2000 up to 2013 and you can use it to investigate different aspects of these populations: their general composition by location of residence or origin, their status (refugees, asylum seekers, internally displaced persons, etc.), their evolution over time, and so on.

In each of the screens in the system you start by selecting the sub-set of data you are interested in, choosing one or more countries or territories of residence and/or origin. You can focus on specific types of population by checking the boxes for only those you are concerned with, and you can summarise the data by checking the boxes for only those data items by which you wish the data to be broken down.

In the Overview page, each row of data presents the information about UNHCR’s populations of concern for a given year and country of residence and/or origin. Figures for the different types of population are presented across the page.

The Time Series page presents the same data as the Overview page, but arranges the figures as a yearly time series across the page.

UNHCR’s populations of concern

General Notes

Data Set Extraction

Source: UNHCR Population Statistics Database

Page Selected: Time Series

Selection Criteria (below)

Date Range: 2000 - 2013

Country / Territory of Residence: All countries / territories

Origin / Returned from: All origins

Population Type(s):
    - Refugees
    - Asylum seekers
    - Returned refugees
    - Internally displaced persons
    - Returned IDPs
    - Stateless persons
    - Others of concern

Data item(s) to display:
    - Country / territory of residence
    - Origin / Returned from
    - Population type

Section 1: Data Import and Pre-Processing

  1. Execute the query using the Selection Criteria shown above
  2. Download the comma-delimited text (CSV) file and save it to your working directory
  3. Read the CSV file into a data frame
  4. Remove the first (6) rows from the data frame (which served as spacers)
  5. Rename the column headers
  6. Convert the categorical variables to factors
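# Read without a header row; '*' entries are read as NA
data <- read.csv('unhcr_timeseries.csv', header=FALSE, na.strings='*')
# Drop the first 6 rows, which served as spacers
data <- data[7:nrow(data),]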
names(data) <- c('Residence', 'Origin', 'Type', 'FY2000', 'FY2001', 'FY2002', 'FY2003', 'FY2004', 'FY2005', 'FY2006', 'FY2007',
                 'FY2008', 'FY2009', 'FY2010', 'FY2011', 'FY2012', 'FY2013')
data$Residence <- as.factor(data$Residence)
data$Origin <- as.factor(data$Origin)
data$Type <- as.factor(data$Type)
sapply(data, class)
## Residence    Origin      Type    FY2000    FY2001    FY2002    FY2003 
##  "factor"  "factor"  "factor" "integer" "integer" "integer" "integer" 
##    FY2004    FY2005    FY2006    FY2007    FY2008    FY2009    FY2010 
## "integer" "integer" "integer" "integer" "integer" "integer" "integer" 
##    FY2011    FY2012    FY2013 
## "integer" "integer" "integer"

Section 2: Exploratory Data Analysis

  1. Display the number of observations that contain “NA” values (that is, are not “complete cases”)
  2. Omit selected observations containing a high volume of NAs
  3. Summarize the data by Population Type (or ‘People of Concern’)
  4. Analyze the Refugee Population sizes by Year
  5. Display the (5) countries hosting the largest number of Refugees on average (since 2005)
  6. Visualize the Refugee Population trends in (2) Central African countries
  7. Summarize the change in Refugee Populations in Central Africa between FY2008 and FY2009

There are 16309 total observations in the file, of which 2476 are “complete” cases. The remaining 13833 observations contain at least one “NA” value. Dropping all of them would discard most of the data, so the code below first restricts the columns to FY2009 through FY2013 and then removes the rows that are still incomplete.

data9_13 <- data[,c(1:3,13:17)]
nrow(data9_13[!complete.cases(data9_13),])
## [1] 11815
data9_13 <- na.omit(data9_13)
nrow(data9_13)
## [1] 4494
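The complete-case counts quoted above can be reproduced along the following lines. This is a minimal sketch on an invented toy data frame (`toy` and its values are hypothetical stand-ins, not UNHCR figures); it only illustrates how `complete.cases()` and `is.na()` separate complete from incomplete rows.

```r
# Toy stand-in for the UNHCR data frame; all values are invented for illustration.
toy <- data.frame(
  Residence = c("A", "A", "B", "C"),
  Type      = c("Refugees", "Asylum seekers", "Refugees", "Refugees"),
  FY2008    = c(10L, NA, 7L, NA),
  FY2009    = c(12L, 3L, NA, NA),
  stringsAsFactors = TRUE
)

# complete.cases() returns TRUE for rows with no NA in any column.
is_complete  <- complete.cases(toy)
n_complete   <- sum(is_complete)
n_incomplete <- sum(!is_complete)

# Per-column NA counts show which years drive the incompleteness.
na_per_column <- colSums(is.na(toy))

print(c(total = nrow(toy), complete = n_complete, incomplete = n_incomplete))
print(na_per_column)
```

Looking at the per-column NA counts before calling `na.omit()` helps decide which year columns to keep, which is the trade-off made when the time series is shortened.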

To better understand the “people of concern” population currently being assisted by the UNHCR, the steps below summarize the populations by type for FY2013. To aid the analysis, the dataset has been trimmed from 16309 observations to 4494 by stripping out 11815 observations containing “NA” values. Because so many observations were incomplete, the number of columns was also reduced, giving a time series from FY2009 to FY2013 (rather than from FY2000).

## Warning: package 'plyr' was built under R version 3.1.1
Type                    Population
Asylum seekers              843029
Internally displaced      15956463
Others of concern           150411
Refugees                  11073007
Returned IDPs               849998
Returned refugees           191546
Stateless                  2522169

The two largest categories of people of concern are Internally displaced (15956463) and Refugees (11073007).
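The code that produced the summary by population type is not shown in the report (the package warning suggests plyr was loaded). As a rough sketch of the same idea, the snippet below uses base R's `aggregate()` instead, on an invented toy data frame; the `toy` values are hypothetical and the approach is an assumption, not the author's actual method.

```r
# Toy stand-in for the FY2013 column of the trimmed dataset; values are invented.
toy <- data.frame(
  Type   = factor(c("Refugees", "Refugees", "Asylum seekers", "Stateless")),
  FY2013 = c(100L, 250L, 40L, 15L)
)

# Sum the FY2013 figures within each population type.
by_type <- aggregate(FY2013 ~ Type, data = toy, FUN = sum)
names(by_type) <- c("Type", "Population")

# Sort so the largest categories appear first.
by_type <- by_type[order(-by_type$Population), ]
print(by_type)
```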

Due to a number of global conflicts, natural disasters, famines, and various other events over the last 8 years, Refugee populations have remained relatively high. Since these events occur in isolated areas (typically contained within specific countries or regions), a small number of countries typically contribute to the Refugee population growth year over year. Visual analysis of the Refugee population is provided below:

## Warning: package 'reshape2' was built under R version 3.1.1

plot of chunk unnamed-chunk-5

plot of chunk unnamed-chunk-6

In FY2013, the total Refugee population was 10056290; the yearly totals show little change in the number of global Refugees since 2007. (Note: this data has been filtered to exclude observations with NA values.)
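The yearly Refugee totals behind this comparison can be computed in a few lines. The report's own code is hidden (it appears to have used reshape2), so the following is only a sketch in base R on invented data: filter to the Refugees type, sum each fiscal-year column, and derive the year-over-year percentage change.

```r
# Toy stand-in for the trimmed dataset; all figures are invented.
toy <- data.frame(
  Residence = c("A", "B", "A", "B"),
  Type      = c("Refugees", "Refugees", "Asylum seekers", "Refugees"),
  FY2011    = c(100L, 200L, 5L, 50L),
  FY2012    = c(110L, 190L, 6L, 55L),
  FY2013    = c(120L, 180L, 7L, 60L)
)

# Keep only the Refugees rows and locate the fiscal-year columns by name.
refugees  <- toy[toy$Type == "Refugees", ]
year_cols <- grep("^FY", names(refugees), value = TRUE)

# Total Refugee population per year.
totals <- colSums(refugees[, year_cols])
print(totals)

# Year-over-year change, as a percentage of the previous year's total.
pct_change <- round(100 * diff(totals) / head(totals, -1), 1)
print(pct_change)
```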

plot of chunk unnamed-chunk-7

##        0%       25%       50%       75%      100% 
##       5.0     602.2    5463.0   42151.2 1615995.0
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       5     602    5460   67300   42200 1620000

plot of chunk unnamed-chunk-8
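Step 5 of this section asks for the five countries hosting the most Refugees on average. One possible way to get there, sketched on invented data, is to sum the figures per country of residence for each year, average across years, and keep the top five. Whether the report averaged per-country totals in exactly this way is an assumption; `toy` and its values are hypothetical.

```r
# Toy stand-in for the Refugees rows of the dataset; all figures are invented.
toy <- data.frame(
  Residence = c("A", "A", "B", "C", "D", "E", "F"),
  Type      = "Refugees",
  FY2012    = c(10, 20, 300, 50, 5, 80, 1),
  FY2013    = c(30, 40, 100, 70, 15, 60, 3)
)
year_cols <- grep("^FY", names(toy), value = TRUE)

# Total Refugees hosted per country and year (summing over origins).
hosted <- aggregate(toy[, year_cols],
                    by = list(Residence = toy$Residence),
                    FUN = sum)

# Average of the yearly totals for each country.
hosted$Average <- rowMeans(hosted[, year_cols])

# Five largest hosts by average population.
top5 <- head(hosted[order(-hosted$Average), c("Residence", "Average")], 5)
print(top5)
```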

Focusing on the Refugee Populations in Central Africa

Central African Republic

Data pre-processing steps below:

data5_13 <- data[,c(1:3,9:17)]
nrow(data5_13[!complete.cases(data5_13),])
## [1] 12706
data5_13 <- na.omit(data5_13)
nrow(data5_13)
## [1] 3603

plot of chunk unnamed-chunk-10

The Congo

Data pre-processing steps below:

data5_13 <- data[,c(1:3,9:17)]
nrow(data5_13[!complete.cases(data5_13),])
## [1] 12706
data5_13 <- na.omit(data5_13)
nrow(data5_13)
## [1] 3603

plot of chunk unnamed-chunk-12

Summary

In reviewing the year-over-year (YoY) changes in the Refugee populations of both the Central African Republic and the Congo, it appears that events occurring in FY2009 ended a period of decline in the total Refugee population. When researching this period (FY2008 - FY2009) online, I found a UN article that called attention to increased violence in Central Africa during the same period. You can read the article by visiting the United Nations’ website.
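The FY2008 to FY2009 comparison described above boils down to a per-country difference. The sketch below shows one way to compute it in base R; the country labels and figures in `toy` are invented placeholders, not the actual values for the Central African Republic or the Congo.

```r
# Toy stand-in for the Refugees rows of two host countries; figures are invented.
toy <- data.frame(
  Residence = c("Country A", "Country A", "Country B", "Country B"),
  Origin    = c("X", "Y", "X", "Z"),
  Type      = "Refugees",
  FY2008    = c(400, 100, 1000, 1000),
  FY2009    = c(450, 150, 1200, 1300)
)

# Total Refugees per host country in each of the two years.
totals <- aggregate(cbind(FY2008, FY2009) ~ Residence, data = toy, FUN = sum)

# Absolute and percentage change from FY2008 to FY2009.
totals$Change    <- totals$FY2009 - totals$FY2008
totals$PctChange <- round(100 * totals$Change / totals$FY2008, 1)
print(totals)
```

A positive `Change` after several years of negative values would correspond to the end of a decline described in the summary.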