Required packages

library(tidyr)

Attaching package: ‘tidyr’

The following object is masked _by_ ‘.GlobalEnv’:

    who
library(readr)
library(stringr)
library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
library(Hmisc)
Loading required package: lattice
Loading required package: survival
Loading required package: Formula
Loading required package: ggplot2

Attaching package: ‘Hmisc’

The following objects are masked from ‘package:dplyr’:

    src, summarize

The following objects are masked from ‘package:base’:

    format.pval, units
library(outliers)

Executive Summary

Data preprocessing transforms raw data into a form that can be used for statistical visualisation and analysis. In this report, preprocessing techniques are applied to 3 datasets sourced from the Australian Bureau of Statistics (http://www.abs.gov.au/). The datasets hold selected labour force information for residents of the local government areas in Victoria. The datasets were merged, and techniques such as data type conversion, subsetting and scanning for outliers were applied in line with tidy data principles to make the raw data suitable for statistical purposes.

Data

The datasets in this report were sourced from the Australian Bureau of Statistics (http://www.abs.gov.au/). 3 datasets were used: 2016Census_G02_VIC_LGA.csv, 2016Census_G40_VIC_LGA.csv and LGA_2016_VIC.csv. 2016Census_G02_VIC_LGA.csv contains selected medians and averages for age, rent, mortgage repayments, income and household size. 2016Census_G40_VIC_LGA.csv contains selected labour force, education and migration characteristics by sex for Victorian local government areas (LGAs). LGA_2016_VIC.csv contains the code, name and area information for the LGAs.

The 3 datasets were imported into the data frames lga_1, lga_2 and lga_3 using the read_csv() function from the readr package.

#read data
lga_1 <- read_csv("/Users/Vidya/Documents/Uni/Semester5/DataPreprocessing/assignment3/data_dp/2016Census_G02_VIC_LGA.csv")
Parsed with column specification:
cols(
  LGA_CODE_2016 = col_character(),
  Median_age_persons = col_integer(),
  Median_mortgage_repay_monthly = col_integer(),
  Median_tot_prsnl_inc_weekly = col_integer(),
  Median_rent_weekly = col_integer(),
  Median_tot_fam_inc_weekly = col_integer(),
  Average_num_psns_per_bedroom = col_double(),
  Median_tot_hhd_inc_weekly = col_integer(),
  Average_household_size = col_double()
)
lga_2 <- read_csv("/Users/Vidya/Documents/Uni/Semester5/DataPreprocessing/assignment3/data_dp/2016Census_G40_VIC_LGA.csv")
Parsed with column specification:
cols(
  .default = col_integer(),
  LGA_CODE_2016 = col_character(),
  Percent_Unem_loyment_M = col_double(),
  Percent_Unem_loyment_F = col_double(),
  Percent_Unem_loyment_P = col_double(),
  Percnt_LabForc_prticipation_M = col_double(),
  Percnt_LabForc_prticipation_F = col_double(),
  Percnt_LabForc_prticipation_P = col_double(),
  Percnt_Employment_to_populn_M = col_double(),
  Percnt_Employment_to_populn_F = col_double(),
  Percnt_Employment_to_populn_P = col_double()
)
See spec(...) for full column specifications.
lga_3 <- read_csv("/Users/Vidya/Documents/Uni/Semester5/DataPreprocessing/assignment3/data_dp/LGA_2016_VIC.csv")
Parsed with column specification:
cols(
  MB_CODE_2016 = col_double(),
  LGA_CODE_2016 = col_integer(),
  LGA_NAME_2016 = col_character(),
  STATE_CODE_2016 = col_integer(),
  STATE_NAME_2016 = col_character(),
  AREA_ALBERS_SQKM = col_double()
)
head(lga_1, 3)
# A tibble: 3 x 9
  LGA_CODE_2016 Median_age_pers… Median_mortgage… Median_tot_prsn… Median_rent_wee… Median_tot_fam_…
  <chr>                    <int>            <int>            <int>            <int>            <int>
1 LGA20110                    49             1300              562              200             1322
2 LGA20260                    46             1022              556              185             1263
3 LGA20570                    38             1350              590              250             1489
# ... with 3 more variables: Average_num_psns_per_bedroom <dbl>, Median_tot_hhd_inc_weekly <int>,
#   Average_household_size <dbl>
head(lga_2, 3)
# A tibble: 3 x 67
  LGA_CODE_2016 P_15_yrs_over_M P_15_yrs_over_F P_15_yrs_over_P lfs_Emplyed_wrk… lfs_Emplyed_wrk…
  <chr>                   <int>           <int>           <int>            <int>            <int>
1 LGA20110                 5074            5307           10384             1917             1091
2 LGA20260                 5232            4572            9798             1779              943
3 LGA20570                38889           43329           82219            16596             8873
# ... with 61 more variables: lfs_Emplyed_wrked_full_time_P <int>, lfs_Emplyed_wrked_part_time_M <int>,
#   lfs_Emplyed_wrked_part_time_F <int>, lfs_Emplyed_wrked_part_time_P <int>,
#   lfs_Employed_away_from_work_M <int>, lfs_Employed_away_from_work_F <int>,
#   lfs_Employed_away_from_work_P <int>, lfs_Unmplyed_lookng_for_wrk_M <int>,
#   lfs_Unmplyed_lookng_for_wrk_F <int>, lfs_Unmplyed_lookng_for_wrk_P <int>, lfs_Tot_LF_M <int>,
#   lfs_Tot_LF_F <int>, lfs_Tot_LF_P <int>, lfs_N_the_labour_force_M <int>,
#   lfs_N_the_labour_force_F <int>, lfs_N_the_labour_force_P <int>, Percent_Unem_loyment_M <dbl>,
#   Percent_Unem_loyment_F <dbl>, Percent_Unem_loyment_P <dbl>, Percnt_LabForc_prticipation_M <dbl>,
#   Percnt_LabForc_prticipation_F <dbl>, Percnt_LabForc_prticipation_P <dbl>,
#   Percnt_Employment_to_populn_M <dbl>, Percnt_Employment_to_populn_F <dbl>,
#   Percnt_Employment_to_populn_P <dbl>, Non_sch_quals_PostGrad_Dgre_M <int>,
#   Non_sch_quals_PostGrad_Dgre_F <int>, Non_sch_quals_PostGrad_Dgre_P <int>,
#   Non_sch_quals_Gr_Dip_Gr_Crt_M <int>, Non_sch_quals_Gr_Dip_Gr_Crt_F <int>,
#   Non_sch_quals_Gr_Dip_Gr_Crt_P <int>, Non_sch_quals_Bchelr_Degree_M <int>,
#   Non_sch_quals_Bchelr_Degree_F <int>, Non_sch_quals_Bchelr_Degree_P <int>,
#   Non_sch_quls_Advncd_Dip_Dip_M <int>, Non_sch_quls_Advncd_Dip_Dip_F <int>,
#   Non_sch_quls_Advncd_Dip_Dip_P <int>, Non_sch_quls_Cert3a4_Level_M <int>,
#   Non_sch_quls_Cert3a4_Level_F <int>, Non_sch_quls_Cert3a4_Level_P <int>,
#   Non_sch_quls_Cert1a2_Level_M <int>, Non_sch_quls_Cert1a2_Level_F <int>,
#   Non_sch_quls_Cert1a2_Level_P <int>, Non_sch_quls_Certnfd_Level_M <int>,
#   Non_sch_quls_Certnfd_Level_F <int>, Non_sch_quls_Certnfd_Level_P <int>,
#   Non_sch_quls_CertTot_Level_M <int>, Non_sch_quls_CertTot_Level_F <int>,
#   Non_sch_quls_CertTot_Level_P <int>, Migtn_Lvd_same_add_1_yr_ago_M <int>,
#   Migtn_Lvd_same_add_1_yr_ago_F <int>, Migtn_Lvd_same_add_1_yr_ago_P <int>,
#   Migtn_Lvd_Diff_add_1_yr_ago_M <int>, Migtn_Lvd_Diff_add_1_yr_ago_F <int>,
#   Migtn_Lvd_Diff_add_1_yr_ago_P <int>, Migtn_Lvd_sme_add_5_yrs_ago_M <int>,
#   Migtn_Lvd_sme_add_5_yrs_ago_F <int>, Migtn_Lvd_sme_add_5_yrs_ago_P <int>,
#   Mign_Lvd_Diff_add_5_yrs_ago_M <int>, Mign_Lvd_Diff_add_5_yrs_ago_F <int>,
#   Mign_Lvd_Diff_add_5_yrs_ago_P <int>
head(lga_3, 3)
# A tibble: 3 x 6
  MB_CODE_2016 LGA_CODE_2016 LGA_NAME_2016 STATE_CODE_2016 STATE_NAME_2016 AREA_ALBERS_SQKM
         <dbl>         <int> <chr>                   <int> <chr>                      <dbl>
1  20011170000         20570 Ballarat (C)                2 Victoria                  0.054 
2  20011160000         20570 Ballarat (C)                2 Victoria                  0.0331
3  20010160000         20570 Ballarat (C)                2 Victoria                  0.0409

It was observed that lga_1 had 82 observations for 9 variables, lga_2 had 82 observations for 67 variables and lga_3 had 85014 observations for 6 variables.

#dimension of the dataframe
{ cat("dimension of lga_1: ", dim(lga_1))
cat("\ndimension of lga_2: ", dim(lga_2))
cat("\ndimension of lga_3: ", dim(lga_3))}
dimension of lga_1:  82 9
dimension of lga_2:  82 67
dimension of lga_3:  85014 6

The report focuses on employment status together with selected housing and income measures, so variables that did not contribute to this were removed. Because lga_3 repeats each LGA across many rows (85014 in total), distinct pairs of LGA code and name were selected so that the final data frame carries one name per LGA.

#Dropping variables that are not of interest from the dataframes before joining
lga_1 <- select(lga_1, LGA_CODE_2016, Median_mortgage_repay_monthly, Median_rent_weekly, Median_tot_fam_inc_weekly, Average_household_size)
lga_2 <- lga_2[, c(1, 5:15)]
lga_3 <- select(lga_3, LGA_CODE_2016, LGA_NAME_2016)
#Select distinct lga names from lga_3
lga_3 <-  lga_3 %>% distinct(LGA_CODE_2016, LGA_NAME_2016)
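Selecting columns by position, as in lga_2[, c(1, 5:15)], depends on the column order of the source file. A name-based selection is one possible alternative. The sketch below uses a made-up one-row tibble, toy, with a handful of the G40 column names; the values and the regular expression are illustrative assumptions, not part of the original analysis.

```r
library(dplyr)

# Toy tibble with a few G40-style column names (values are illustrative only)
toy <- tibble(
  LGA_CODE_2016 = "LGA20110",
  P_15_yrs_over_M = 5074L,
  lfs_Emplyed_wrked_full_time_M = 1917L,
  lfs_Unmplyed_lookng_for_wrk_F = 82L,
  lfs_Unmplyed_lookng_for_wrk_P = 1L,
  lfs_Tot_LF_M = 1L
)

# Keep the key plus the employment-status counts, matched by name prefix
kept <- toy %>%
  select(LGA_CODE_2016, matches("^lfs_(Emplyed|Employed|Unmplyed)_"))

names(kept)
```

A pattern like this would also keep lfs_Unmplyed_lookng_for_wrk_P, which sits at position 16 and is therefore left out by 5:15. Since the rows for all persons are removed later in the report, the difference does not affect the final result.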

The 3 datasets were joined on the key variable LGA_CODE_2016. In lga_3 this variable was read as an integer, whereas in lga_1 and lga_2 it is a character column, so the columns were not compatible for joining. The lga_3 column was converted to character before the join.

# change datatype to character to make the columns compatible
lga_3$LGA_CODE_2016 <- as.character(lga_3$LGA_CODE_2016)

The 3 data frames were merged using the LGA_CODE_2016 column. In lga_1 and lga_2 the codes carry an ‘LGA’ prefix (for example LGA20110), whereas lga_3 holds the bare number (20110). For the join to match, the values have to be identical across the data frames, so the prefix was removed with str_replace() from stringr, applied through base R's transform().

#remove the 'LGA' prefix from the LGA_CODE_2016 values in lga_1 and lga_2
lga_1 <- lga_1 %>% transform(LGA_CODE_2016=str_replace(LGA_CODE_2016,"LGA",""))
lga_2 <- lga_2 %>%  transform(LGA_CODE_2016=str_replace(LGA_CODE_2016,"LGA",""))
#Join the three tables
lga_1_2 <- lga_1 %>% left_join(lga_2, by = "LGA_CODE_2016")
lga_joined <- lga_1_2 %>% left_join(lga_3, by = "LGA_CODE_2016")
#Move the LGA name column to the front, ahead of the LGA code
lga_joined <- lga_joined %>% select(LGA_NAME_2016,everything())
head(lga_joined,3)
  LGA_NAME_2016 LGA_CODE_2016 Median_mortgage_repay_monthly Median_rent_weekly
1    Alpine (S)         20110                          1300                200
2   Ararat (RC)         20260                          1022                185
3  Ballarat (C)         20570                          1350                250
  Median_tot_fam_inc_weekly Average_household_size lfs_Emplyed_wrked_full_time_M
1                      1322                    2.2                          1917
2                      1263                    2.3                          1779
3                      1489                    2.4                         16596
  lfs_Emplyed_wrked_full_time_F lfs_Emplyed_wrked_full_time_P lfs_Emplyed_wrked_part_time_M
1                          1091                          3005                           711
2                           943                          2726                           573
3                          8873                         25472                          5039
  lfs_Emplyed_wrked_part_time_F lfs_Emplyed_wrked_part_time_P lfs_Employed_away_from_work_M
1                          1384                          2088                           192
2                          1101                          1676                           149
3                         11636                         16673                          1059
  lfs_Employed_away_from_work_F lfs_Employed_away_from_work_P lfs_Unmplyed_lookng_for_wrk_M
1                           199                           400                           136
2                           168                           316                           151
3                          1511                          2571                          1843
  lfs_Unmplyed_lookng_for_wrk_F
1                            82
2                           107
3                          1573
{ cat("dimension of lga_1_2: ", dim(lga_1_2))}
dimension of lga_1_2:  82 16
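One way to confirm that a join like the one above matched every key is to compare the classes of the key columns and then look for unmatched rows with anti_join(). The sketch below is a minimal illustration on two small tibbles, census and names_tbl, which are hypothetical stand-ins for the real data frames; the three rows reuse values shown in the output above.

```r
library(dplyr)
library(stringr)

# Small stand-ins for the census and LGA name tables
census <- tibble(
  LGA_CODE_2016 = c("LGA20110", "LGA20260", "LGA20570"),
  Median_rent_weekly = c(200L, 185L, 250L)
)
names_tbl <- tibble(
  LGA_CODE_2016 = c(20110L, 20260L, 20570L),
  LGA_NAME_2016 = c("Alpine (S)", "Ararat (RC)", "Ballarat (C)")
)

# Harmonise the key: strip the prefix on one side, cast to character on the other
census <- census %>% mutate(LGA_CODE_2016 = str_remove(LGA_CODE_2016, "^LGA"))
names_tbl <- names_tbl %>% mutate(LGA_CODE_2016 = as.character(LGA_CODE_2016))

# Both key columns should now share a class
same_class <- identical(class(census$LGA_CODE_2016), class(names_tbl$LGA_CODE_2016))

# Rows of census without a partner in names_tbl; zero rows means every key matched
unmatched <- anti_join(census, names_tbl, by = "LGA_CODE_2016")

joined <- left_join(census, names_tbl, by = "LGA_CODE_2016")
```

If unmatched has rows, or joined has more rows than census, the key needs further cleaning before the join result can be trusted.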

Understand

There are 82 observations of 17 variables in the joined dataset: 2 character variables, 14 integer variables and 1 numeric (double) variable.

str(lga_joined)
'data.frame':   82 obs. of  17 variables:
 $ LGA_NAME_2016                : chr  "Alpine (S)" "Ararat (RC)" "Ballarat (C)" "Banyule (C)" ...
 $ LGA_CODE_2016                : chr  "20110" "20260" "20570" "20660" ...
 $ Median_mortgage_repay_monthly: int  1300 1022 1350 1950 1315 1465 2500 1192 2500 1599 ...
 $ Median_rent_weekly           : int  200 185 250 350 250 250 450 200 406 300 ...
 $ Median_tot_fam_inc_weekly    : int  1322 1263 1489 2033 1192 1462 2765 1232 2652 1358 ...
 $ Average_household_size       : num  2.2 2.3 2.4 2.6 2.2 2.5 2.6 2.2 2.6 3 ...
 $ lfs_Emplyed_wrked_full_time_M: int  1917 1779 16596 22786 4231 8399 17691 2144 30428 31470 ...
 $ lfs_Emplyed_wrked_full_time_F: int  1091 943 8873 12770 2061 3969 9871 1101 19343 16975 ...
 $ lfs_Emplyed_wrked_full_time_P: int  3005 2726 25472 35554 6292 12364 27559 3240 49766 48442 ...
 $ lfs_Emplyed_wrked_part_time_M: int  711 573 5039 6450 1794 2211 5171 605 10059 10830 ...
 $ lfs_Emplyed_wrked_part_time_F: int  1384 1101 11636 14064 3468 5337 11274 1411 18926 16058 ...
 $ lfs_Emplyed_wrked_part_time_P: int  2088 1676 16673 20514 5259 7548 16443 2012 28984 26894 ...
 $ lfs_Employed_away_from_work_M: int  192 149 1059 1170 457 610 847 183 1468 2444 ...
 $ lfs_Employed_away_from_work_F: int  199 168 1511 1670 532 734 1134 219 2038 2254 ...
 $ lfs_Employed_away_from_work_P: int  400 316 2571 2846 988 1346 1989 402 3507 4696 ...
 $ lfs_Unmplyed_lookng_for_wrk_M: int  136 151 1843 1797 477 620 1093 175 2575 4815 ...
 $ lfs_Unmplyed_lookng_for_wrk_F: int  82 107 1573 1635 377 624 1121 140 2263 4443 ...

Before performing any data type conversions, the data needs to be reshaped. The names of columns 7 to 17 hold values (employment status and sex) rather than variable names, so the data does not conform to tidy data principles. To rectify this, the dataset was converted from wide to long format in the next section.

Tidy & Manipulate Data I

The employment status of the residents of an LGA is spread across columns 7 to 17 of the joined dataset. For any meaningful statistical visualisation or analysis, these values need to sit in a single variable, Employment_status. This was done using the gather() function. The gathered column names also encode the residents' sex. To make the information in each cell atomic, the value was split into 2 columns: Employment_status and Sex. Rows holding the combined count for all persons of employable age in an LGA (suffix P) were removed.

# Forming Employment_status  column
lga_emp <- lga_joined %>% gather(7:17, key = "Employment_status", value = "Count")
lga_emp <- lga_emp %>% transform(Employment_status=str_replace(Employment_status,"lfs_","")) %>% separate(Employment_status, into = c("Employment_status","Sex"), sep = -1)
head(lga_emp, 3)
  LGA_NAME_2016 LGA_CODE_2016 Median_mortgage_repay_monthly Median_rent_weekly
1    Alpine (S)         20110                          1300                200
2   Ararat (RC)         20260                          1022                185
3  Ballarat (C)         20570                          1350                250
  Median_tot_fam_inc_weekly Average_household_size        Employment_status Sex Count
1                      1322                    2.2 Emplyed_wrked_full_time_   M  1917
2                      1263                    2.3 Emplyed_wrked_full_time_   M  1779
3                      1489                    2.4 Emplyed_wrked_full_time_   M 16596
str(lga_emp)
'data.frame':   902 obs. of  9 variables:
 $ LGA_NAME_2016                : chr  "Alpine (S)" "Ararat (RC)" "Ballarat (C)" "Banyule (C)" ...
 $ LGA_CODE_2016                : chr  "20110" "20260" "20570" "20660" ...
 $ Median_mortgage_repay_monthly: int  1300 1022 1350 1950 1315 1465 2500 1192 2500 1599 ...
 $ Median_rent_weekly           : int  200 185 250 350 250 250 450 200 406 300 ...
 $ Median_tot_fam_inc_weekly    : int  1322 1263 1489 2033 1192 1462 2765 1232 2652 1358 ...
 $ Average_household_size       : num  2.2 2.3 2.4 2.6 2.2 2.5 2.6 2.2 2.6 3 ...
 $ Employment_status            : chr  "Emplyed_wrked_full_time_" "Emplyed_wrked_full_time_" "Emplyed_wrked_full_time_" "Emplyed_wrked_full_time_" ...
 $ Sex                          : chr  "M" "M" "M" "M" ...
 $ Count                        : int  1917 1779 16596 22786 4231 8399 17691 2144 30428 31470 ...
#Remove rows which have the total count of both sexes
lga_emp_1 <- lga_emp %>% filter( Sex == 'F' | Sex =='M')
#convert the Sex and Employment_status variables to factors
lga_emp_1$Sex <- as.factor(lga_emp_1$Sex)
lga_emp_1$Employment_status <- as.factor(lga_emp_1$Employment_status)
#rename factor levels
levels(lga_emp_1$Sex)
[1] "F" "M"
levels(lga_emp_1$Employment_status)
[1] "Employed_away_from_work_" "Emplyed_wrked_full_time_" "Emplyed_wrked_part_time_"
[4] "Unmplyed_lookng_for_wrk_"
lga_emp_1$Sex <- factor(lga_emp_1$Sex, ordered = TRUE,levels=c('F','M'),labels=c('Female','Male'))
lga_emp_1$Employment_status <- factor(lga_emp_1$Employment_status, ordered = TRUE,levels=c("Employed_away_from_work_", "Emplyed_wrked_full_time_", "Emplyed_wrked_part_time_","Unmplyed_lookng_for_wrk_"),labels=c("Away",'Full_time','Part_time',"Unemployed/ Looking"))
str(lga_emp_1)
'data.frame':   656 obs. of  9 variables:
 $ LGA_NAME_2016                : chr  "Alpine (S)" "Ararat (RC)" "Ballarat (C)" "Banyule (C)" ...
 $ LGA_CODE_2016                : chr  "20110" "20260" "20570" "20660" ...
 $ Median_mortgage_repay_monthly: int  1300 1022 1350 1950 1315 1465 2500 1192 2500 1599 ...
 $ Median_rent_weekly           : int  200 185 250 350 250 250 450 200 406 300 ...
 $ Median_tot_fam_inc_weekly    : int  1322 1263 1489 2033 1192 1462 2765 1232 2652 1358 ...
 $ Average_household_size       : num  2.2 2.3 2.4 2.6 2.2 2.5 2.6 2.2 2.6 3 ...
 $ Employment_status            : Ord.factor w/ 4 levels "Away"<"Full_time"<..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Sex                          : Ord.factor w/ 2 levels "Female"<"Male": 2 2 2 2 2 2 2 2 2 2 ...
 $ Count                        : int  1917 1779 16596 22786 4231 8399 17691 2144 30428 31470 ...
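In tidyr 1.0.0 and later, pivot_longer() can do the reshaping and the split into status and sex in one step through its names_pattern argument. The sketch below is an alternative to the gather() and separate() calls above, not the method used in this report. It assumes a newer tidyr than the one that produced the output shown, and the tibble wide is a made-up two-row extract of the joined data.

```r
library(dplyr)
library(tidyr)

# Two-row extract of the wide data (values taken from the output above)
wide <- tibble(
  LGA_NAME_2016 = c("Alpine (S)", "Ararat (RC)"),
  lfs_Emplyed_wrked_full_time_M = c(1917L, 1779L),
  lfs_Emplyed_wrked_full_time_F = c(1091L, 943L),
  lfs_Emplyed_wrked_full_time_P = c(3005L, 2726L)
)

long <- wide %>%
  pivot_longer(
    cols = starts_with("lfs_"),
    # first capture group becomes the status, second the sex suffix
    names_to = c("Employment_status", "Sex"),
    names_pattern = "^lfs_(.*)_([MFP])$",
    values_to = "Count"
  ) %>%
  # drop the rows for all persons, keep male and female counts
  filter(Sex %in% c("M", "F")) %>%
  mutate(Sex = factor(Sex, levels = c("F", "M"), labels = c("Female", "Male")))
```

Because the pattern consumes the underscore before the suffix, Employment_status comes out without the trailing underscore that remains after separate(sep = -1).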

Tidy & Manipulate Data II

Two new variables were created: Total, the sum of Count over all employment statuses and both sexes within each LGA, and Emp_status_Perc_by_lga, the percentage of that total accounted for by each combination of employment status and sex.

#summarise the total count across all employment statuses and both sexes for each LGA
lga_emp_total <- lga_emp_1 %>% group_by(LGA_NAME_2016) %>% summarise(Total = sum(Count, na.rm = TRUE))
#join the per-LGA totals back onto the long data
lga_emp_total <- lga_emp_1 %>% left_join(lga_emp_total, by = "LGA_NAME_2016")
#create a new dataframe with each row's share of its LGA total, as a percentage
lga_unemp_percentage <- mutate(lga_emp_total, Emp_status_Perc_by_lga = round((Count/Total)*100,2))
head(lga_unemp_percentage,3)
  LGA_NAME_2016 LGA_CODE_2016 Median_mortgage_repay_monthly Median_rent_weekly
1    Alpine (S)         20110                          1300                200
2   Ararat (RC)         20260                          1022                185
3  Ballarat (C)         20570                          1350                250
  Median_tot_fam_inc_weekly Average_household_size Employment_status  Sex Count Total
1                      1322                    2.2         Full_time Male  1917  5712
2                      1263                    2.3         Full_time Male  1779  4971
3                      1489                    2.4         Full_time Male 16596 48130
  Emp_status_Perc_by_lga
1                  33.56
2                  35.79
3                  34.48
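The same percentage can be computed without a separate summary table and join by using a grouped mutate(). The sketch below is a minimal illustration on the eight male and female counts for Alpine (S) shown earlier. The tibble alpine and the shortened status labels are assumptions made for the example.

```r
library(dplyr)

# Male and female counts for Alpine (S), taken from the joined table above
alpine <- tibble(
  LGA_NAME_2016 = "Alpine (S)",
  Employment_status = rep(c("Full_time", "Part_time", "Away", "Unemployed"), each = 2),
  Sex = rep(c("Male", "Female"), times = 4),
  Count = c(1917L, 1091L, 711L, 1384L, 192L, 199L, 136L, 82L)
)

pct <- alpine %>%
  group_by(LGA_NAME_2016) %>%
  mutate(
    Total = sum(Count, na.rm = TRUE),                      # per-LGA total
    Emp_status_Perc_by_lga = round(Count / Total * 100, 2) # share of that total
  ) %>%
  ungroup()
```

For the first row this gives 1917 / 5712 * 100, which rounds to 33.56 and agrees with the value in the output above.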

Scan I

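The lga_unemp_percentage dataset was scanned for missing or special values (NA, NaN, Inf and -Inf). No such values were found. Note that the scan covers only the numeric columns: the function as finally defined returns NULL for non-numeric columns, which is why the character columns show NULL in the output.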
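A compact way to run such a scan is to count the flagged values per column instead of printing one logical value per cell. The sketch below is a minimal illustration on a made-up data frame, toy. The helper name is_special is hypothetical and is kept distinct from the is.special function used in this report.

```r
# Flags NA in any column, plus NaN, Inf and -Inf in numeric columns
is_special <- function(x) {
  if (is.numeric(x)) !is.finite(x) else is.na(x)
}

# Made-up data with one missing name and two special numeric values
toy <- data.frame(
  name  = c("a", NA, "c"),
  count = c(1, Inf, NaN),
  stringsAsFactors = FALSE
)

# One integer per column: how many values were flagged
special_counts <- vapply(toy, function(col) sum(is_special(col)), integer(1))
special_counts
```

A column is clean when its count is zero, so a data frame with hundreds of rows can be checked at a glance.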

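# First definition: flags non-finite values in numeric columns and NA elsewhere
is.special <- function(x){
  if (is.numeric(x)) !is.finite(x) else is.na(x)
}
# Second definition: overrides the first; non-numeric columns now return NULL (not checked)
is.special <- function(x){
  if (is.numeric(x)) !is.finite(x)
}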
sapply(lga_unemp_percentage, is.special)
$LGA_NAME_2016
NULL

$LGA_CODE_2016
NULL

$Median_mortgage_repay_monthly
  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[209] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[321] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[369] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[417] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[449] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[465] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[481] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[497] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[513] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[529] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[545] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[561] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[577] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[593] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[609] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[625] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[641] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

$Median_rent_weekly
  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[209] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[321] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[369] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[417] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[449] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[465] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[481] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[497] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[513] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[529] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[545] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[561] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[577] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[593] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[609] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[625] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[641] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

$Median_tot_fam_inc_weekly
  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[209] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[321] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[369] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[417] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[449] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[465] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[481] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[497] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[513] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[529] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[545] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[561] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[577] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[593] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[609] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[625] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[641] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

$Average_household_size
  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[209] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[321] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[369] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[417] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[449] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[465] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[481] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[497] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[513] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[529] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[545] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[561] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[577] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[593] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[609] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[625] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[641] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

$Employment_status
NULL

$Sex
NULL

$Count
  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[209] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[321] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[369] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[417] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[449] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[465] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[481] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[497] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[513] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[529] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[545] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[561] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[577] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[593] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[609] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[625] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[641] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

$Total
  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[209] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[321] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[369] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[417] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[449] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[465] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[481] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[497] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[513] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[529] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[545] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[561] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[577] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[593] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[609] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[625] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[641] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

$Emp_status_Perc_by_lga
  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[209] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[321] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[369] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[417] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[449] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[465] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[481] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[497] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[513] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[529] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[545] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[561] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[577] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[593] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[609] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[625] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[641] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# No NA or other special values were found in any variable.
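The same scan can be reported far more compactly by counting special values per column instead of printing every logical vector. The following is only a sketch under stated assumptions: is_special() is a hypothetical helper written for this illustration, and toy is a made-up data frame, not the census data.

```r
# Hypothetical helper: TRUE where a value is NA, NaN or infinite.
# Non-numeric columns can only be checked for NA.
is_special <- function(x) {
  if (is.numeric(x)) {
    is.na(x) | is.nan(x) | is.infinite(x)
  } else {
    is.na(x)
  }
}

# Made-up data with a few special values planted in it.
toy <- data.frame(
  a = c(1, NA, 3, Inf),
  b = c("x", "y", NA, "z"),
  c = c(0.5, NaN, 2, 4)
)

# One count per column; all zeros would mean nothing special was found.
special_counts <- sapply(toy, function(col) sum(is_special(col)))
print(special_counts)
```

Applied to the real data frame, a result of all zeros would carry the same message as the long output above.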

Scan II

The lga_unemp_percentage dataset was scanned for outliers by drawing a boxplot for each numeric variable. The boxplots for Median_mortgage_repay_monthly, Median_rent_weekly, Median_tot_fam_inc_weekly and Average_household_size showed some unusually low and high values. Using the z-score method with a cut-off of |z| > 3, 48 values were flagged as outliers across these four variables.
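As a rough illustration of what such a count measures, the sketch below computes z-scores with base R's scale() on a small made-up data frame (toy and its columns are hypothetical, not the census data). It separates the number of flagged cells from the number of flagged rows, because a count taken over a whole z-score matrix adds up cells across all columns.

```r
# Made-up data: row 1 has zeros in two columns, mimicking a placeholder record.
toy <- data.frame(
  rent   = c(0, rep(c(290, 300, 310), length.out = 19)),
  income = rep(c(1400, 1500, 1600), length.out = 20),
  size   = c(0, rep(c(2.4, 2.6), length.out = 19))
)

# scale() centres each column on its mean and divides by its standard deviation.
z <- scale(toy)
flag <- abs(z) > 3

cells_per_column <- colSums(flag)          # flagged cells in each column
cells_total      <- sum(flag)              # flagged cells overall
rows_flagged     <- sum(rowSums(flag) > 0) # rows with at least one flagged cell

print(cells_per_column)
print(c(cells = cells_total, rows = rows_flagged))
```

In this toy example two cells are flagged but they sit in a single row, so a cell count overstates the number of unusual observations.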

par(mfrow = c(2,3))
boxplot(lga_unemp_percentage$Median_mortgage_repay_monthly, xlab="Median monthly mortgage repayment", col = "grey")
boxplot(lga_unemp_percentage$Median_rent_weekly, xlab="Median weekly rent", col = "grey")
boxplot(lga_unemp_percentage$Median_tot_fam_inc_weekly, xlab = "Median family weekly income", col = "grey")
boxplot(lga_unemp_percentage$Average_household_size, xlab = "Average Household size", col = "grey")
boxplot(lga_unemp_percentage$Total, xlab = "Number of persons over 15 years in the LGA", col = "grey")

# Some rows have median values of 0. They are inspected below by filtering on a zero median mortgage repayment.
outlier_detection_var <- lga_unemp_percentage %>% select("Median_mortgage_repay_monthly","Median_rent_weekly","Median_tot_fam_inc_weekly", "Average_household_size")
z.scores <- outlier_detection_var  %>%  scores(type = "z")
z.scores %>% summary()
 Median_mortgage_repay_monthly Median_rent_weekly Median_tot_fam_inc_weekly Average_household_size
 Min.   :-3.06940              Min.   :-2.7648    Min.   :-3.4252           Min.   :-5.18195      
 1st Qu.:-0.55035              1st Qu.:-0.6452    1st Qu.:-0.5404           1st Qu.:-0.41730      
 Median :-0.09902              Median :-0.1153    Median :-0.1961           Median : 0.01585      
 Mean   : 0.00000              Mean   : 0.0000    Mean   : 0.0000           Mean   : 0.00000      
 3rd Qu.: 0.70498              3rd Qu.: 0.7432    3rd Qu.: 0.5554           3rd Qu.: 0.44900      
 Max.   : 2.17862              Max.   : 2.0043    Max.   : 2.6590           Max.   : 1.53187      
length(which( abs(z.scores) >3 ))
[1] 48
lga_unemp_percentage %>% filter(Median_mortgage_repay_monthly == 0)
                            LGA_NAME_2016 LGA_CODE_2016 Median_mortgage_repay_monthly
1                 No usual address (Vic.)         29499                             0
2  Migratory - Offshore - Shipping (Vic.)         29799                             0
3                 No usual address (Vic.)         29499                             0
4  Migratory - Offshore - Shipping (Vic.)         29799                             0
5                 No usual address (Vic.)         29499                             0
6  Migratory - Offshore - Shipping (Vic.)         29799                             0
7                 No usual address (Vic.)         29499                             0
8  Migratory - Offshore - Shipping (Vic.)         29799                             0
9                 No usual address (Vic.)         29499                             0
10 Migratory - Offshore - Shipping (Vic.)         29799                             0
11                No usual address (Vic.)         29499                             0
12 Migratory - Offshore - Shipping (Vic.)         29799                             0
13                No usual address (Vic.)         29499                             0
14 Migratory - Offshore - Shipping (Vic.)         29799                             0
15                No usual address (Vic.)         29499                             0
16 Migratory - Offshore - Shipping (Vic.)         29799                             0
   Median_rent_weekly Median_tot_fam_inc_weekly Average_household_size   Employment_status    Sex Count
1                   0                         0                      0           Full_time   Male   811
2                   0                         0                      0           Full_time   Male    16
3                   0                         0                      0           Full_time Female   384
4                   0                         0                      0           Full_time Female     0
5                   0                         0                      0           Part_time   Male   317
6                   0                         0                      0           Part_time   Male     0
7                   0                         0                      0           Part_time Female   326
8                   0                         0                      0           Part_time Female     0
9                   0                         0                      0                Away   Male   115
10                  0                         0                      0                Away   Male     3
11                  0                         0                      0                Away Female    81
12                  0                         0                      0                Away Female     0
13                  0                         0                      0 Unemployed/ Looking   Male   367
14                  0                         0                      0 Unemployed/ Looking   Male     0
15                  0                         0                      0 Unemployed/ Looking Female   251
16                  0                         0                      0 Unemployed/ Looking Female     0
   Total Emp_status_Perc_by_lga
1   2652                  30.58
2     19                  84.21
3   2652                  14.48
4     19                   0.00
5   2652                  11.95
6     19                   0.00
7   2652                  12.29
8     19                   0.00
9   2652                   4.34
10    19                  15.79
11  2652                   3.05
12    19                   0.00
13  2652                  13.84
14    19                   0.00
15  2652                   9.46
16    19                   0.00

Some data points in the boxplots were equal to 0. The corresponding rows were inspected by filtering them from the dataset, which showed that they all belong to LGA codes 29499 and 29799. LGA code 29799 is reserved for people coded to Migratory, Offshore and Shipping Mesh Blocks, and LGA code 29499 is reserved for people coded to No usual address Mesh Blocks. Because these rows add little information (all of their median and average values are 0) and skew the distributions, they were removed from the data frame.

# Remove the rows for the two special-purpose LGA codes, whose median and average values are all 0.
lga_unemp_percentage_outliers_rm <- lga_unemp_percentage %>% filter(LGA_CODE_2016 != 29499 & LGA_CODE_2016 != 29799)
# Boxplots after removing these rows.
par(mfrow = c(2,3))
boxplot(lga_unemp_percentage_outliers_rm$Median_mortgage_repay_monthly, xlab="Median monthly mortgage repayment", col = "grey")
boxplot(lga_unemp_percentage_outliers_rm$Median_rent_weekly, xlab="Median weekly rent", col = "grey")
boxplot(lga_unemp_percentage_outliers_rm$Median_tot_fam_inc_weekly, xlab = "Median family weekly income", col = "grey")
boxplot(lga_unemp_percentage_outliers_rm$Average_household_size, xlab = "Average Household size", col = "grey")
boxplot(lga_unemp_percentage_outliers_rm$Total, xlab = "Number of persons over 15 years in the LGA", col = "grey")

The boxplots show that outliers remain in the data. Because the dataset reports the observed value of each variable, removing these values would be counter-productive. On the contrary, these points could be informative about why certain LGAs have unusually high or low values for the indicators in the dataset. These outliers were therefore neither removed nor imputed.
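One way to keep such points while still making them easy to inspect is to mark them with an indicator column. The sketch below is an assumption-laden illustration, not the method used in this report: it uses base R's boxplot.stats(), which applies the same 1.5 x IQR rule as the boxplot whiskers, on a made-up data frame (toy, lga, median_rent and rent_outlier are hypothetical names).

```r
# Made-up data: nine ordinary rents and one unusually high value.
toy <- data.frame(
  lga         = paste0("LGA_", 1:10),
  median_rent = c(250, 260, 255, 270, 265, 258, 262, 268, 252, 520)
)

# Values lying beyond the boxplot whiskers (1.5 x IQR from the hinges).
out_vals <- boxplot.stats(toy$median_rent)$out

# Flag the rows instead of dropping them, so they stay available for analysis.
toy$rent_outlier <- toy$median_rent %in% out_vals

# List only the flagged rows for inspection.
print(toy[toy$rent_outlier, ])
```

The flagged rows can then be examined or excluded in individual analyses without altering the observed data.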

Transform

The following histograms show the distributions of the Median_mortgage_repay_monthly, Median_rent_weekly, Median_tot_fam_inc_weekly, Average_household_size, Total and Emp_status_Perc_by_lga variables in the dataset.

 
par(mfrow = c(2,3))
hist(lga_unemp_percentage_outliers_rm$Median_mortgage_repay_monthly, col = "light blue")
hist(lga_unemp_percentage_outliers_rm$Median_rent_weekly, col = "light blue")
hist(lga_unemp_percentage_outliers_rm$Median_tot_fam_inc_weekly, col = "light blue")
hist(lga_unemp_percentage_outliers_rm$Average_household_size, col = "light blue")
hist(lga_unemp_percentage_outliers_rm$Total, col = "light blue")
hist(lga_unemp_percentage_outliers_rm$Emp_status_Perc_by_lga, col = "light blue")

The histograms for Total (the number of persons over 15 years in the LGA) and Emp_status_Perc_by_lga (the percentage of that total in each employment status and sex group) are both right-skewed. To prepare the dataset for statistical analysis, a logarithmic transformation was applied to each of these variables to make its distribution more symmetric.

hist(log(lga_unemp_percentage_outliers_rm$Total))

hist(log(lga_unemp_percentage_outliers_rm$Emp_status_Perc_by_lga))
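
The effect of a logarithmic transformation on skewness can also be checked numerically. The sketch below is illustrative only: skewness() is a hypothetical helper that computes the moment-based sample skewness, and toy_total is a made-up right-skewed vector, not the census data. It also shows why zeros need care: log(0) is -Inf, whereas log1p(x), which computes log(1 + x), stays finite.

```r
# Hypothetical helper: moment-based sample skewness, ignoring non-finite values.
skewness <- function(x) {
  x <- x[is.finite(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Made-up right-skewed counts: many small values and a few very large ones.
toy_total <- c(500, 800, 1200, 2000, 3500, 6000, 15000, 40000, 120000)

skew_raw <- skewness(toy_total)
skew_log <- skewness(log(toy_total))
print(c(raw = skew_raw, log = skew_log))

# Zeros become -Inf under log() but remain finite under log1p().
print(log(0))
print(log1p(0))
```

A skewness closer to 0 after the transformation indicates a more symmetric distribution. If a variable contains zeros, log1p() is one option to consider in place of log().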



