Table of contents

  1. Introduction
  2. Data Extraction
  3. Data Merging
  4. Data Consolidation

Introduction

The simplest form of merge() finds the intersection between two different sets of data.

Merge() is very similar to a database join in SQL or merge function in Power BI. This is, indeed, the case and the different arguments to merge() allow you to perform natural joins, as well as left, right, and full outer joins.

The merge() function takes quite a large number of arguments. These arguments can look quite intimidating until you realize that they form a smaller number of related arguments:

x: A data frame.

y: A data frame.

by, by.x, by.y: The names of the columns that are common to both x and y. The default is to use the columns with common names between the two data frames.

all, all.x, all.y: Logical values that specify the type of merge. The default value is all=FALSE (meaning that only the matching rows are returned).

That last group of arguments - all, all.x and all.y - deserves some explanation. These arguments determine the type of merge that will happen.

Data Extraction

The data sources used to gather the variables for the project were:

  1. Individual’s taxation statistic by Postcode provided by the Australian Taxation Office (Years 2013-2016)

  2. Department of Social Services (DSS) Payment Demographic Data by postcode for the years 2013 to 2016

some caption

some caption

Those data sources provide information of the postcode demographic which allow to identify the characteristic of the locations that require major attention for the government in terms of welfare payments.

The Payment Demographic Data by postcode and the Individual’s taxation statistic by Postcode are organised in different tables for each year. Therefore, the first step for the process was gather each table per year and then append all the years in one table per data set.

In this case, we are going to use Left outer join: To include all the rows of the data frame x = (Individual’s taxation statistic by Postcode) and only those from y = (Department of Social Services (DSS) Payment Demographic Data by postcode) that match, specify all.x=TRUE.

Let’s start:

Individual’s taxation statistic by Postcode (Table 1)

library(readxl)
library(httr)
library(dplyr)
library(tidyr)

#### Table 1 Data for 2013

url2 <- "https://data.gov.au/dataset/e29ef9ca-0d1a-47ec-9e9b-14a79a941511/resource/fa6fda82-a70d-4f4c-bb00-203ffc4d20af/download/taxstats2013individual06selected-items-by-taxable-status-postcode.xlsx"
p2f <- tempfile()

GET(url2, write_disk(tf2 <- tempfile(fileext = ".xlsx")))
## Response [https://data.gov.au/dataset/e29ef9ca-0d1a-47ec-9e9b-14a79a941511/resource/fa6fda82-a70d-4f4c-bb00-203ffc4d20af/download/taxstats2013individual06selected-items-by-taxable-status-postcode.xlsx]
##   Date: 2018-10-03 06:45
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 7.46 MB
## <ON DISK>  C:\Users\josev\AppData\Local\Temp\Rtmpk3s4Yc\file4e446982310c.xlsx
T2013 <- as.data.frame(read_excel(tf2, sheet = 3, col_names = TRUE, skip = 2))

T2013n <- T2013[, c(1,2,3,4,7,8,17,18,25,26,27,28,33,34,95,96,113,114)]

colnames(T2013n) <- c("State", "Postcode", "No.Population", "No.HealthInsurance","No.Tax", "Tax", "No.Salary", "Salary",
                     "No.Allowances", "Allowances", "No.Pensions", "Pensions", "No.Interest", "Interest", "NoRent",
                     "Rent", "No.Superannuation", "Superannuation")

rep2013 <- rep("2013", length(T2013n$State))

T2013n <- T2013n%>%mutate(Year = rep2013)


#### Table 1 Data for 2014

url3 <- "https://data.gov.au/dataset/25e81c18-2083-4abe-81b6-0f530053c63f/resource/942c9a30-baf2-4432-9513-1395fbef7edf/download/taxstats2014individual06taxablestatuspostcode.xlsx"
p3f <- tempfile()

GET(url3, write_disk(tf3 <- tempfile(fileext = ".xlsx")))
## Response [https://data.gov.au/dataset/25e81c18-2083-4abe-81b6-0f530053c63f/resource/942c9a30-baf2-4432-9513-1395fbef7edf/download/taxstats2014individual06taxablestatuspostcode.xlsx]
##   Date: 2018-10-03 06:45
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 5.02 MB
## <ON DISK>  C:\Users\josev\AppData\Local\Temp\Rtmpk3s4Yc\file4e445f2a443d.xlsx
T2014 <- as.data.frame(read_excel(tf3, sheet = 2, col_names = TRUE, skip = 2))

T2014n <- T2014[ ,c(2,3,4,5,8,9,18,19,26,27,28,29,34,35,96,97,114,115)]

colnames(T2014n) <- c("State", "Postcode", "No.Population", "No.HealthInsurance","No.Tax", "Tax", "No.Salary", "Salary",
                       "No.Allowances", "Allowances", "No.Pensions", "Pensions", "No.Interest", "Interest", "NoRent",
                       "Rent", "No.Superannuation", "Superannuation")

T2014n <- T2014n%>%group_by(State, Postcode)%>% 
  summarize(No.Population = sum(No.Population), No.HealthInsurance = sum(No.HealthInsurance), No.Tax = sum(No.Tax),
            Tax = sum(Tax), No.Salary = sum(No.Salary), Salary = sum(Salary), No.Allowances = sum(No.Allowances),
            Allowances = sum(Allowances), No.Pensions = sum(No.Pensions), Pensions = sum(Pensions), No.Interest = sum(No.Interest), Interest = sum(Interest),
            NoRent = sum(NoRent), Rent = sum(Rent), No.Superannuation = sum(No.Superannuation), Superannuation = sum(Superannuation))

rep2014 <- rep("2014", length(25))


T2014n <- as.data.frame(T2014n%>%mutate(Year = rep2014, Postcode = as.character(Postcode)))

#### Table 1 Data for 2015

url4 <- "https://data.gov.au/dataset/5c99cfed-254d-40a6-af1c-47412b7de6fe/resource/12443334-2273-4e0f-a2e1-c9d0c1d8213b/download/taxstats2015individual06taxablestatusstateterritorypostcode.xlsx"
p4f <- tempfile()

GET(url4, write_disk(tf4 <- tempfile(fileext = ".xlsx")))
## Response [https://data.gov.au/dataset/5c99cfed-254d-40a6-af1c-47412b7de6fe/resource/12443334-2273-4e0f-a2e1-c9d0c1d8213b/download/taxstats2015individual06taxablestatusstateterritorypostcode.xlsx]
##   Date: 2018-10-03 06:45
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 7.72 MB
## <ON DISK>  C:\Users\josev\AppData\Local\Temp\Rtmpk3s4Yc\file4e44381524f6.xlsx
T2015 <- as.data.frame(read_excel(tf4, sheet = 3, col_names = TRUE, skip = 2))


T2015n <- T2015[ ,c(1,2,3,134,6,7,16,17, 24,25,26,27,32,33,94,95,112,113)]

colnames(T2015n) <- c("State", "Postcode", "No.Population", "No.HealthInsurance","No.Tax", "Tax", "No.Salary", "Salary",
                      "No.Allowances", "Allowances", "No.Pensions", "Pensions", "No.Interest", "Interest", "NoRent",
                      "Rent", "No.Superannuation", "Superannuation")

rep2015 <- rep("2015", length(T2015$State))

T2015n <- T2015n%>%mutate(Year = rep2015)


#### Table 1: Data for 2016

url5 <- "https://data.gov.au/dataset/d170213c-4391-4d10-ac24-b0c11768da3f/resource/8c86b6d5-0485-4387-b12b-ad26a9da2033/download/taxstats2016individual06taxablestatusstateterritorypostcodetaxableincome.xlsx"
p5f <- tempfile()

GET(url5, write_disk(tf5 <- tempfile(fileext = ".xlsx")))
## Response [https://data.gov.au/dataset/d170213c-4391-4d10-ac24-b0c11768da3f/resource/8c86b6d5-0485-4387-b12b-ad26a9da2033/download/taxstats2016individual06taxablestatusstateterritorypostcodetaxableincome.xlsx]
##   Date: 2018-10-03 06:45
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 8.36 MB
## <ON DISK>  C:\Users\josev\AppData\Local\Temp\Rtmpk3s4Yc\file4e44825e4.xlsx
T2016 <- as.data.frame(read_excel(tf5, sheet = 3, col_names = TRUE, skip = 2))


T2016n <- T2016[ ,c(1,2,3,140,6,7,18,19,28,29,30,31,36,37,100,101,118,119)]

colnames(T2016n) <- c("State", "Postcode", "No.Population", "No.HealthInsurance","No.Tax", "Tax", "No.Salary", "Salary",
                      "No.Allowances", "Allowances", "No.Pensions", "Pensions", "No.Interest", "Interest", "NoRent",
                      "Rent", "No.Superannuation", "Superannuation")
rep2016 <- rep("2016", length(T2016$State))

T2016n <- T2016n%>%mutate(Year = rep2016)


#### Consolidation Table 1 (Appending)

table1 <- rbind(T2013n, T2014n, T2015n, T2016n)
head(table1)
##   State Postcode No.Population No.HealthInsurance No.Tax       Tax
## 1   ACT     2600          5755               4455   4960 155433316
## 2   ACT     2601          2650               1360   1800  37896737
## 3   ACT     2602         19110              12325  16355 284355118
## 4   ACT     2603          6180               4725   5370 177086957
## 5   ACT     2604          7165               5070   6285 142567504
## 6   ACT     2605          7520               5565   6475 144414959
##   No.Salary     Salary No.Allowances Allowances No.Pensions Pensions
## 1      4025  330601387           175     867475         195  1452867
## 2      2185  125396805           195    1192431          45   371161
## 3     15525 1045005856          1075    5975182         940  7201386
## 4      4600  373517913           225    1225973         170  1557289
## 5      5970  491470590           285    1368105         210  1805084
## 6      5570  425204412           225    1162566         475  3396660
##   No.Interest Interest NoRent     Rent No.Superannuation Superannuation
## 1        4105 14852802   1560 41760840               225        4601346
## 2        1775  3622769    545 14057264                25         516019
## 3       13075 27229224   3430 71800358               235        4049535
## 4        4380 15509702   1530 45263237               225        4903593
## 5        4820 10721436   2000 44753380               100        1606696
## 6        5245 15611915   1610 36426162               140        2672196
##   Year
## 1 2013
## 2 2013
## 3 2013
## 4 2013
## 5 2013
## 6 2013

Payment Demographic Data by postcode Table 2

## Response [https://data.gov.au/dataset/cff2ae8a-55e4-47db-a66d-e177fe0ac6a0/resource/6b2343f4-09d3-4b03-a1fe-41897f710c6c/download/dssdemographicsdecember2013.xlsx]
##   Date: 2018-10-03 06:45
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 668 kB
## <ON DISK>  C:\Users\josev\AppData\Local\Temp\Rtmpk3s4Yc\file4e444878185c.xlsx
## Response [https://data.gov.au/dataset/cff2ae8a-55e4-47db-a66d-e177fe0ac6a0/resource/f0615bb3-463f-4352-902c-0b6bb0e22e7d/download/june-2014-dss-demographics-suppressed-2014-final-ftb.xlsx]
##   Date: 2018-10-03 06:45
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 1.19 MB
## <ON DISK>  C:\Users\josev\AppData\Local\Temp\Rtmpk3s4Yc\file4e445c2123cf.xlsx
## Response [https://data.gov.au/dataset/cff2ae8a-55e4-47db-a66d-e177fe0ac6a0/resource/e6457899-378e-406f-8027-a6ee8a19eec6/download/dss-demographics-june-2015.xlsx]
##   Date: 2018-10-03 06:45
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 1.72 MB
## <ON DISK>  C:\Users\josev\AppData\Local\Temp\Rtmpk3s4Yc\file4e441a6f71ac.xlsx
## Response [https://data.gov.au/dataset/cff2ae8a-55e4-47db-a66d-e177fe0ac6a0/resource/829fc910-e263-4a59-ad33-1cd1159ce0df/download/dss-demographics-june-2016.xlsx]
##   Date: 2018-10-03 06:45
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 1.7 MB
## <ON DISK>  C:\Users\josev\AppData\Local\Temp\Rtmpk3s4Yc\file4e444003fd7.xlsx
## 'data.frame':    11619 obs. of  29 variables:
##  $ Postcode                                            : chr  "200" "800" "801" "803" ...
##  $ ABSTUDY (Living allowance)                          : num  10 10 10 10 10 36 10 10 10 10 ...
##  $ ABSTUDY (Non-living allowance)                      : num  10 10 10 10 10 90 10 115 10 10 ...
##  $ Age Pension                                         : num  10 171 10 10 10 ...
##  $ Austudy                                             : num  10 10 10 10 10 47 10 26 10 10 ...
##  $ Carer Allowance                                     : num  10 20 10 10 10 421 10 337 10 10 ...
##  $ Carer Allowance (Child Health Care Card only)       : num  10 10 10 10 10 10 10 20 10 10 ...
##  $ Carer Payment                                       : num  10 10 10 10 10 148 10 121 10 10 ...
##  $ Commonwealth Seniors Health Card                    : num  10 34 10 10 10 212 10 90 10 10 ...
##  $ Double Orphan Pension                               : num  10 10 10 10 10 10 10 10 10 10 ...
##  $ Disability Support Pension                          : num  10 295 21 10 10 954 10 555 10 10 ...
##  $ Family Tax Benefit Part A                           : num  10 135 10 10 10 ...
##  $ Family Tax Benefit Part B                           : num  10 122 10 10 10 ...
##  $ Health Care Card                                    : num  10 308 10 10 10 1350 10 935 10 10 ...
##  $ Low Income Card                                     : num  10 22 10 10 10 177 10 100 10 10 ...
##  $ Newstart Allowance                                  : num  10 224 10 10 10 686 10 388 10 10 ...
##  $ Parenting Payment (Partnered)                       : num  10 10 10 10 10 83 10 68 10 10 ...
##  $ Parenting Payment (Single)                          : num  10 27 10 10 10 324 10 286 10 10 ...
##  $ Partner Allowance                                   : num  10 10 10 10 10 10 10 10 10 10 ...
##  $ Pensioner Concession Card                           : num  10 505 37 10 10 ...
##  $ Sickness Allowance                                  : num  10 10 10 10 10 10 10 10 10 10 ...
##  $ Special Benefit                                     : num  10 10 10 10 10 10 10 10 10 10 ...
##  $ Wife Pension (Partner on Age Pension)               : num  10 10 10 10 10 10 10 10 10 10 ...
##  $ Wife Pension (Partner on Disability Support Pension): num  10 10 10 10 10 10 10 10 10 10 ...
##  $ Widow Allowance                                     : num  10 10 10 10 10 10 10 10 10 10 ...
##  $ Widow B Pension                                     : num  10 10 10 10 10 10 10 10 10 10 ...
##  $ Youth Allowance (other)                             : num  10 10 10 10 10 94 10 84 10 10 ...
##  $ Youth Allowance (student and apprentice)            : num  10 10 10 10 10 85 10 59 10 10 ...
##  $ Year                                                : num  2013 2013 2013 2013 2013 ...

Data Merging

After the appending, the data from DSS payment demographic was merged to the Individual taxation statistic table using the variables “Postcode” and “Year” as connector of the tables (Primary Key and Foreign Key).

mergetables <- merge(x= table1, y = table2, by.x = c('Postcode','Year'), by.y = c('Postcode','Year'), all.x = TRUE)

mergetables[is.na(mergetables)] <- 0

Data Consolidation

After merging and joining the different data sources, the resultant consolidation is a data frame with 10082 observation and 46 variables (See below)

## 'data.frame':    10282 obs. of  46 variables:
##  $ Postcode                                            : chr  "2000" "2000" "2000" "2000" ...
##  $ Year                                                : chr  "2013" "2014" "2015" "2016" ...
##  $ State                                               : chr  "NSW" "NSW" "NSW" "NSW" ...
##  $ No.Population                                       : num  34525 35372 36185 24207 130 ...
##  $ No.HealthInsurance                                  : num  7115 7410 7707 7710 65 ...
##  $ No.Tax                                              : num  18525 18937 19471 15806 90 ...
##  $ Tax                                                 : num  3.86e+08 4.14e+08 4.52e+08 4.78e+08 2.34e+06 ...
##  $ No.Salary                                           : num  30310 30942 31293 20465 105 ...
##  $ Salary                                              : num  1.09e+09 1.16e+09 1.21e+09 1.08e+09 6.28e+06 ...
##  $ No.Allowances                                       : num  400 364 373 331 15 6 4 4 215 213 ...
##  $ Allowances                                          : num  2399279 2307947 2295483 2019617 74853 ...
##  $ No.Pensions                                         : num  140 133 149 132 0 3 2 2 45 42 ...
##  $ Pensions                                            : num  1465139 1499689 1702404 1635906 6521 ...
##  $ No.Interest                                         : num  11660 11835 13510 13183 75 ...
##  $ Interest                                            : num  32708556 32075103 30352478 32943378 197975 ...
##  $ NoRent                                              : num  3075 3247 3576 2586 20 ...
##  $ Rent                                                : num  1.17e+08 1.23e+08 1.35e+08 1.05e+08 5.16e+05 ...
##  $ No.Superannuation                                   : num  295 321 350 328 0 0 1 0 20 18 ...
##  $ Superannuation                                      : num  5917261 7398363 9335644 9442442 25000 ...
##  $ ABSTUDY (Living allowance)                          : num  10 10 10 11 10 0 10 0 10 23 ...
##  $ ABSTUDY (Non-living allowance)                      : num  10 0 0 0 10 0 0 0 10 10 ...
##  $ Age Pension                                         : num  468 460 420 405 10 0 0 0 197 204 ...
##  $ Austudy                                             : num  39 43 29 24 10 0 0 0 31 46 ...
##  $ Carer Allowance                                     : num  105 105 85 85 10 0 0 0 39 48 ...
##  $ Carer Allowance (Child Health Care Card only)       : num  10 10 10 0 10 0 0 0 10 0 ...
##  $ Carer Payment                                       : num  62 64 42 42 10 0 0 0 25 28 ...
##  $ Commonwealth Seniors Health Card                    : num  162 0 182 0 10 0 0 0 22 0 ...
##  $ Double Orphan Pension                               : num  10 422 0 228 10 0 0 0 10 181 ...
##  $ Disability Support Pension                          : num  370 360 267 324 10 0 0 0 117 112 ...
##  $ Family Tax Benefit Part A                           : num  400 422 341 263 10 0 0 0 184 183 ...
##  $ Family Tax Benefit Part B                           : num  336 351 297 493 10 10 0 20 172 165 ...
##  $ Health Care Card                                    : num  628 116 561 147 28 10 10 21 389 108 ...
##  $ Low Income Card                                     : num  154 382 136 268 27 0 10 0 196 136 ...
##  $ Newstart Allowance                                  : num  395 10 344 17 10 0 0 0 148 10 ...
##  $ Parenting Payment (Partnered)                       : num  22 45 24 28 10 0 0 0 10 21 ...
##  $ Parenting Payment (Single)                          : num  38 10 36 0 10 0 0 0 23 10 ...
##  $ Partner Allowance                                   : num  10 1014 10 786 10 ...
##  $ Pensioner Concession Card                           : num  1032 168 850 173 10 ...
##  $ Sickness Allowance                                  : num  10 10 10 0 10 0 0 0 10 0 ...
##  $ Special Benefit                                     : num  10 10 10 7 10 0 0 0 10 10 ...
##  $ Wife Pension (Partner on Age Pension)               : num  10 10 10 7 10 0 0 0 10 10 ...
##  $ Wife Pension (Partner on Disability Support Pension): num  10 0 0 0 10 0 0 0 10 0 ...
##  $ Widow Allowance                                     : num  10 10 10 0 10 0 0 0 10 0 ...
##  $ Widow B Pension                                     : num  10 10 0 0 10 0 0 0 10 0 ...
##  $ Youth Allowance (other)                             : num  10 23 10 10 10 0 10 0 10 10 ...
##  $ Youth Allowance (student and apprentice)            : num  109 150 128 131 36 44 37 33 204 242 ...