Immigration is a defining feature of Australia’s economic and social life. With the country’s population expected to reach 40 million by the middle of the century, immigrants have always been in the limelight of politics and elections. Australia’s annual permanent migration intake has increased massively, from 85,000 in 1996 to 208,000 in 2018, and India and China have emerged as by far the largest sources of migrants.
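The dataset contains annual summaries of Australia’s resident population by geographical region and country of birth. The population figures are running totals for each year, i.e. the number of residents born in each region or country at that point, not the number who arrived in that year.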
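Because each figure is a stock (running total) rather than a flow, the year-on-year difference gives the net change for a group: arrivals minus departures and deaths. A minimal base-R sketch, assuming one region's yearly totals are held in a named numeric vector. The numbers below are hypothetical and are not read from the dataset.

```r
# Hypothetical running totals for one region of birth, by year
# (illustrative values only, not taken from Australia_population.xls)
totals <- c(`1998` = 150000, `1999` = 160000, `2000` = 172000, `2001` = 181000)

# Net change between consecutive years; diff() returns one value fewer,
# so each change is labelled with the later year of the pair
net_change <- diff(totals)
names(net_change) <- names(totals)[-1]
print(net_change)
```

A large running total can therefore coexist with a small or even negative net change, which is why the plots below are read as trends in stock rather than annual intake.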
In this report we analyse the trend in the migrant population over the last two decades (1998 to 2018). The objective is to visualise the dataset and identify the geographical pattern of migration into Australia over this period.
# Load packages
library(readxl) # For reading the Excel data
library(ggplot2) # For data visualisation
library(tidyr) # For restructuring original datasets (from wide to long format)
library(dplyr) # For data manipulation (factor levels, labels etc.)
library(stringr) # For pattern matching
library(data.table) # For efficient sorting
library(cowplot) # For grids and combining plots
# Read data: Table 5.3 (by major region) and Table 5.1 (by country of birth).
# read_excel() ignores `skip` when `range` is given, so only `range` is used.
Population <- read_excel("Australia_population.xls", sheet = "Table 5.3",
                         trim_ws = TRUE, range = cell_limits(c(6, 2), c(15, 25)))
Population_countrywise <- read_excel("Australia_population.xls",
                                     sheet = "Table 5.1",
                                     range = cell_limits(c(6, 1), c(260, 25)))
# Data preprocessing for dataset 1: Population
#change the column name for readability
colnames(Population)[colnames(Population)=="Country of birth by major group"] <-
"Major Regions"
# Keep the years 1998 to 2018
Population_98_18 <- Population %>% select(`Major Regions`,`1998`:`2018`)
#Factorise and change labels
Population_98_18$`Major Regions` <- Population_98_18$`Major Regions` %>%
factor(levels = c("Oceania and Antarctica","North-West Europe",
"Southern and Eastern Europe","North Africa and the Middle East",
"South-East Asia","North-East Asia","Southern and Central Asia",
"Americas","Sub-Saharan Africa"),
labels = c("Oceania and Antarctica","North-West Europe",
"South-East Europe","North Africa & Middle East",
"South-East Asia","North-East Asia","South & Central Asia",
"Americas","Sub-Saharan Africa"))
# Remove Oceania: we are only interested in migration into Australia from other regions
Population_98_18 <- Population_98_18 %>%
  filter(`Major Regions` != "Oceania and Antarctica") %>%
  droplevels()
#change data to long format
Population_98_18_tidy <- Population_98_18 %>% gather(year,population,2:22 )
# Preprocessing dataset 2: Population countrywise
# Create a new column naming each country's major region from the first digit
# of its SACC code (anything unmatched falls through to Sub-Saharan Africa)
sacc <- as.character(Population_countrywise$`SACC code(a)`)
Population_countrywise$Major_Region_New <- case_when(
  grepl("^1", sacc) ~ "Oceania",
  grepl("^2", sacc) ~ "NORTH-WEST EUROPE",
  grepl("^3", sacc) ~ "SOUTH-EAST EUROPE",
  grepl("^4", sacc) ~ "NORTH-AFRICA & MIDDLE EAST",
  grepl("^5", sacc) ~ "SOUTH-EAST ASIA",
  grepl("^6", sacc) ~ "NORTH-EAST ASIA",
  grepl("^7", sacc) ~ "SOUTH & CENTRAL ASIA",
  grepl("^8", sacc) ~ "AMERICAS",
  TRUE ~ "Sub-Saharan Africa"
)
# Drop the Oceania rows, keep 1998 to 2018 and reorder columns (code, region, country, years)
Population_countrywise_98_18 <- Population_countrywise[35:254,c(1,26,2,5:25)]
# Name the new region variable (the 2nd column after reordering)
names(Population_countrywise_98_18)[2] <- "Major_Region_New"
#factorise the new variable
Population_countrywise_98_18$Major_Region_New <-as.factor(Population_countrywise_98_18$Major_Region_New)
# Find the average population of each country over the period 1998 to 2018
Population_countrywise_98_18$mean <-
round(rowMeans(Population_countrywise_98_18[,4:24]),0)
# Convert to a data.table keyed (sorted ascending) by mean
d <- data.table(Population_countrywise_98_18, key = "mean")
# Keep the top 3 countries (largest means) in each region
Top3_countries <- d[, tail(.SD, 3), by = Major_Region_New]
#remove means column and rearrange columns
Top3_countries <- Top3_countries %>% select(c(2,1,3:24))
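# Keep South-East Europe and the three Asian regions
Top3_countries_new <- Top3_countries %>%
  filter(Major_Region_New %in% c("SOUTH-EAST EUROPE", "NORTH-EAST ASIA",
                                 "SOUTH & CENTRAL ASIA", "SOUTH-EAST ASIA"))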
# Remove spaces and commas from country names so they match the colour keys in Plot 2
Top3_countries_new$`Country of birth` <- recode(Top3_countries_new$`Country of birth`,
                                                 "Hong Kong" = "HongKong",
                                                 "Sri Lanka" = "SriLanka",
                                                 "Korea, South" = "SouthKorea")
#convert wide to long format
Top3_countries_tidy <- Top3_countries_new %>% gather(year,population,4:24 )
# Both datasets are now in tidy (long) format, ready to plot
# Plot 1: Australia's resident population by country of birth over last two decades
p1 <- ggplot(data = Population_98_18_tidy, aes(x=year, y=population))
# Points joined by lines, one colour per region
p1 <- p1 + geom_point(aes(color=`Major Regions` ))+
geom_line(aes(group=`Major Regions`,color=`Major Regions` ))
#manually specify the colors using colorbrewer
p1 <- p1 + scale_color_manual(values = c("#b15928","#1f78b4","#ffed6f","#33a02c",
"#6a3d9a","#e31a1c","#e7298a","#ff7f00"),
name = "Major Regions of birth")
# Add plot title and y-axis label
p1 <- p1 + labs(title = "Australia's resident population by region of birth",
                y = "Population (millions)")
# Breaks every 0.1 million from 0.1m to 1.7m, labelled "0.1m", ..., "1m", ..., "1.7m"
p1 <- p1 + scale_y_continuous(
  limits = c(100000, 1700000),
  breaks = seq(100000, 1700000, by = 100000),
  labels = function(x) paste0(x / 1e6, "m"))
p1 <- p1+theme(axis.text.x = element_text(angle = 45, hjust = 1))
p1
Notably, South-East Europeans are the only group whose population declined by more than 100k over the last two decades.
By contrast, the South & Central Asian population rose from a mere 150k in 1998 to a staggering 1m in 2018, more than a sixfold increase in 20 years. The North-East Asian population tripled, from 300k to almost 900k, and over the same period the South-East Asian population doubled from around half a million.
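A growth multiple is simply the 2018 total divided by the 1998 total. A minimal sketch using the rounded figures quoted above (150k to 1m for South & Central Asia, 300k to 900k for North-East Asia) rather than the exact values in the dataset:

```r
# Rounded start and end totals quoted in the text (read off Plot 1, not exact data)
growth <- data.frame(
  region  = c("South & Central Asia", "North-East Asia"),
  pop1998 = c(150000, 300000),
  pop2018 = c(1000000, 900000)
)

# Fold change = end / start; percentage growth = (fold change - 1) * 100
growth$fold_change <- growth$pop2018 / growth$pop1998
growth$pct_growth  <- (growth$fold_change - 1) * 100
print(growth)
```

For South & Central Asia this gives 1,000,000 / 150,000, about 6.67, and for North-East Asia exactly 3.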
To understand which countries drive these trends, we look at each of these four regions individually and plot its three largest source countries, ranked by average population over 1998 to 2018.
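The ranking behind the second plot keeps, within each region, the three countries with the highest average population (the code above does this with a data.table keyed by `mean` plus `tail()`). The same idea in base R, as a sketch on a small hypothetical table; the region codes, country codes and numbers are made up for illustration, and `top_n_per_group()` is a hypothetical helper.

```r
# Hypothetical wide table: one row per country, one column per year
wide <- data.frame(
  region  = c("A", "A", "A", "A", "B", "B", "B"),
  country = c("a1", "a2", "a3", "a4", "b1", "b2", "b3"),
  `1998`  = c(10, 50, 30, 20, 5, 15, 25),
  `2018`  = c(30, 70, 0, 60, 10, 20, 40),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Average population of each country across the year columns
wide$mean <- rowMeans(wide[, c("1998", "2018")])

# Within each region, keep the n countries with the largest average
top_n_per_group <- function(df, n = 3) {
  parts <- lapply(split(df, df$region), function(g) {
    g <- g[order(g$mean, decreasing = TRUE), ]
    head(g, n)
  })
  out <- do.call(rbind, parts)
  rownames(out) <- NULL
  out
}

top3 <- top_n_per_group(wide, n = 3)
print(top3[, c("region", "country", "mean")])
```

Sorting in decreasing order and taking `head()` is equivalent to the ascending key plus `tail()` used in the report; only the row order within each region differs.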
#Second plot
#define data and aesthetics
g1 <- ggplot(data = Top3_countries_tidy, aes(x= year,y= population))
# Points joined by lines, one colour per country
g1<-g1+ geom_point(aes(color=`Country of birth`))+geom_line(aes(group=`Country of birth`, color=`Country of birth`))
# Facet to give one panel per region
g1 <- g1 + facet_wrap(~Major_Region_New, shrink = FALSE)
# Colour each country with a shade of its region's colour in Plot 1
g1 <- g1+scale_colour_manual(values=c(
China = "#6a3d9a",HongKong= "#cab2d6",SouthKorea="#bc80bd",
India = "#e31a1c",Pakistan="#f4a582" , SriLanka = "#d6604d",
Croatia = "#a6cee3",Greece="#3690c0", Italy = "#1f78b4",
Malaysia = "#66c2a4",Philippines = "#006d2c",Vietnam = "#33a02c"),
name= "Country of birth")
g1 <- g1 + scale_y_continuous(
breaks = c(100000,200000,300000,400000,500000,600000),
labels = c("0.1m","0.2m","0.3m","0.4m","0.5m","0.6m"))
# Label every second year; breaks are character to match the discrete year values
g1 <- g1 + scale_x_discrete(
  breaks = as.character(seq(1998, 2018, by = 2)))
g1 <- g1 + labs(title = "Australia's resident population: top three countries of birth per region",
                y = "Population (millions)")
g1 <- g1 + theme(axis.text.x = element_text(angle = 45, hjust = 1))
g1
Among the top North-East Asian source countries (China, Hong Kong and South Korea), the Chinese-born population has been the main driving force behind the upward trend. South Koreans have also overtaken Hong Kong-born residents in the last five to six years.
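Statements such as "South Koreans have overtaken Hong Kong" can be checked from the data rather than by eye. A minimal sketch with a hypothetical helper `overtake_year()` and made-up series (not the real Korean or Hong Kong figures). It returns the first year from which series `a` stays above series `b` through the last year, or `NA` if `a` is not ahead at the end, and it assumes there are no missing values.

```r
# Hypothetical helper, not part of the original analysis
overtake_year <- function(years, a, b) {
  above <- a > b
  if (!above[length(above)]) return(NA)  # a must be ahead in the final year
  # Walk back from the final year while a is still ahead
  i <- length(above)
  while (i > 1 && above[i - 1]) i <- i - 1
  years[i]
}

# Made-up yearly totals for two countries
years     <- 2010:2018
korea     <- c(70, 74, 78, 82, 86, 90, 94, 98, 102)
hong_kong <- c(85, 86, 86, 87, 87, 88, 88, 89, 89)
overtake_year(years, korea, hong_kong)
```

The same helper applies to the Philippines and Vietnam comparison in South-East Asia.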
In South & Central Asia, India, Pakistan and Sri Lanka are the largest source countries. India is by far the biggest of these, with its population reaching around 600k by 2018. Somewhat surprisingly, Sri Lankan-born residents consistently outnumber Pakistani-born residents, even though Sri Lanka is a much smaller country than Pakistan in terms of population and resources.
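"Consistently higher" can be made precise: the gap between the two series must be positive in every year. A short sketch with made-up totals for two countries (not the actual Sri Lankan or Pakistani figures); the smallest gap shows how close the two series ever came.

```r
# Made-up yearly totals for two countries over the same years
sri_lanka <- c(50, 55, 61, 68, 76)
pakistan  <- c(10, 14, 20, 29, 41)

# Year-by-year gap; "consistently higher" means every gap is positive
gap <- sri_lanka - pakistan
consistently_higher <- all(gap > 0)
smallest_gap <- min(gap)
print(c(consistently_higher = consistently_higher, smallest_gap = smallest_gap))
```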
In South-East Asia, Malaysia, the Philippines and Vietnam are the largest source countries. Filipinos have overtaken the Vietnamese-born in the last five years, while the Malaysian-born population has risen steadily.
Finally, the populations born in each of the three largest South-East European source countries (Croatia, Greece and Italy) have declined steadily.
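A "steady decline" can be tested as a fall in every consecutive year. A sketch on a hypothetical long table shaped like `Top3_countries_tidy` (one row per country and year), with made-up values; `strictly_declining()` is a hypothetical helper.

```r
# Hypothetical long table: one row per country and year (made-up values)
long <- data.frame(
  country    = rep(c("X", "Y"), each = 4),
  year       = rep(c("2015", "2016", "2017", "2018"), times = 2),
  population = c(200, 195, 190, 186, 80, 78, 79, 75),
  stringsAsFactors = FALSE
)

# TRUE if a series falls in every consecutive year
strictly_declining <- function(x) all(diff(x) < 0)

# Sort by year within country, then test each country's series
long <- long[order(long$country, long$year), ]
declining <- sapply(split(long$population, long$country), strictly_declining)
print(declining)
```

Country Y is not flagged because of its one-year uptick; a looser definition (for example, 2018 below 1998) would be a design choice to state explicitly.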
To conclude, over the last two decades Australia has seen a dramatic rise in its Indian- and Chinese-born populations and a steady fall among Europeans, particularly those from South-East Europe.
The Guardian, Sat 24 Mar 2018. The changing shape of Australia’s immigration policy. https://www.theguardian.com/australia-news/2018/mar/24/australias-fierce-immigration-debate-is-about-to-get-louder