library(tidyr)
Attaching package: ‘tidyr’
The following object is masked _by_ ‘.GlobalEnv’:
who
library(readr)
library(stringr)
library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
library(Hmisc)
Loading required package: lattice
Loading required package: survival
Loading required package: Formula
Loading required package: ggplot2
Attaching package: ‘Hmisc’
The following objects are masked from ‘package:dplyr’:
src, summarize
The following objects are masked from ‘package:base’:
format.pval, units
library(outliers)
Data preprocessing transforms raw data into workable data that can be used for statistical visualizations and analysis. In this report, preprocessing techniques are applied to 3 datasets sourced from the Australian Bureau of Statistics (http://www.abs.gov.au/). The datasets contain selected labour information for residents of all the local government areas in Victoria. The datasets were merged, and techniques such as data type conversion, subsetting and scanning for outliers were applied in line with tidy data principles to convert the raw data into a form suitable for statistical purposes.
The datasets in the current report have been sourced from the Australian Bureau of Statistics (http://www.abs.gov.au/). 3 datasets, 2016Census_G02_VIC_LGA.csv, 2016Census_G40_VIC_LGA.csv and LGA_2016_VIC.csv, have been used in the report. 2016Census_G02_VIC_LGA.csv includes selected medians and averages for age, rent, mortgage, income and other information. 2016Census_G40_VIC_LGA.csv includes selected labour force, education and migration characteristics by sex for Victorian local government areas (LGA). LGA_2016_VIC.csv contains information such as the name and area of each LGA.
The 3 datasets were imported into the lga_1, lga_2 and lga_3 data frames using the read_csv function from the readr package.
#read data
lga_1 <- read_csv("/Users/Vidya/Documents/Uni/Semester5/DataPreprocessing/assignment3/data_dp/2016Census_G02_VIC_LGA.csv")
Parsed with column specification:
cols(
LGA_CODE_2016 = col_character(),
Median_age_persons = col_integer(),
Median_mortgage_repay_monthly = col_integer(),
Median_tot_prsnl_inc_weekly = col_integer(),
Median_rent_weekly = col_integer(),
Median_tot_fam_inc_weekly = col_integer(),
Average_num_psns_per_bedroom = col_double(),
Median_tot_hhd_inc_weekly = col_integer(),
Average_household_size = col_double()
)
lga_2 <- read_csv("/Users/Vidya/Documents/Uni/Semester5/DataPreprocessing/assignment3/data_dp/2016Census_G40_VIC_LGA.csv")
Parsed with column specification:
cols(
.default = col_integer(),
LGA_CODE_2016 = col_character(),
Percent_Unem_loyment_M = col_double(),
Percent_Unem_loyment_F = col_double(),
Percent_Unem_loyment_P = col_double(),
Percnt_LabForc_prticipation_M = col_double(),
Percnt_LabForc_prticipation_F = col_double(),
Percnt_LabForc_prticipation_P = col_double(),
Percnt_Employment_to_populn_M = col_double(),
Percnt_Employment_to_populn_F = col_double(),
Percnt_Employment_to_populn_P = col_double()
)
See spec(...) for full column specifications.
lga_3 <- read_csv("/Users/Vidya/Documents/Uni/Semester5/DataPreprocessing/assignment3/data_dp/LGA_2016_VIC.csv")
Parsed with column specification:
cols(
MB_CODE_2016 = col_double(),
LGA_CODE_2016 = col_integer(),
LGA_NAME_2016 = col_character(),
STATE_CODE_2016 = col_integer(),
STATE_NAME_2016 = col_character(),
AREA_ALBERS_SQKM = col_double()
)
head(lga_1, 3)
# A tibble: 3 x 9
LGA_CODE_2016 Median_age_pers… Median_mortgage… Median_tot_prsn… Median_rent_wee… Median_tot_fam_…
<chr> <int> <int> <int> <int> <int>
1 LGA20110 49 1300 562 200 1322
2 LGA20260 46 1022 556 185 1263
3 LGA20570 38 1350 590 250 1489
# ... with 3 more variables: Average_num_psns_per_bedroom <dbl>, Median_tot_hhd_inc_weekly <int>,
# Average_household_size <dbl>
head(lga_2, 3)
# A tibble: 3 x 67
LGA_CODE_2016 P_15_yrs_over_M P_15_yrs_over_F P_15_yrs_over_P lfs_Emplyed_wrk… lfs_Emplyed_wrk…
<chr> <int> <int> <int> <int> <int>
1 LGA20110 5074 5307 10384 1917 1091
2 LGA20260 5232 4572 9798 1779 943
3 LGA20570 38889 43329 82219 16596 8873
# ... with 61 more variables: lfs_Emplyed_wrked_full_time_P <int>, lfs_Emplyed_wrked_part_time_M <int>,
# lfs_Emplyed_wrked_part_time_F <int>, lfs_Emplyed_wrked_part_time_P <int>,
# lfs_Employed_away_from_work_M <int>, lfs_Employed_away_from_work_F <int>,
# lfs_Employed_away_from_work_P <int>, lfs_Unmplyed_lookng_for_wrk_M <int>,
# lfs_Unmplyed_lookng_for_wrk_F <int>, lfs_Unmplyed_lookng_for_wrk_P <int>, lfs_Tot_LF_M <int>,
# lfs_Tot_LF_F <int>, lfs_Tot_LF_P <int>, lfs_N_the_labour_force_M <int>,
# lfs_N_the_labour_force_F <int>, lfs_N_the_labour_force_P <int>, Percent_Unem_loyment_M <dbl>,
# Percent_Unem_loyment_F <dbl>, Percent_Unem_loyment_P <dbl>, Percnt_LabForc_prticipation_M <dbl>,
# Percnt_LabForc_prticipation_F <dbl>, Percnt_LabForc_prticipation_P <dbl>,
# Percnt_Employment_to_populn_M <dbl>, Percnt_Employment_to_populn_F <dbl>,
# Percnt_Employment_to_populn_P <dbl>, Non_sch_quals_PostGrad_Dgre_M <int>,
# Non_sch_quals_PostGrad_Dgre_F <int>, Non_sch_quals_PostGrad_Dgre_P <int>,
# Non_sch_quals_Gr_Dip_Gr_Crt_M <int>, Non_sch_quals_Gr_Dip_Gr_Crt_F <int>,
# Non_sch_quals_Gr_Dip_Gr_Crt_P <int>, Non_sch_quals_Bchelr_Degree_M <int>,
# Non_sch_quals_Bchelr_Degree_F <int>, Non_sch_quals_Bchelr_Degree_P <int>,
# Non_sch_quls_Advncd_Dip_Dip_M <int>, Non_sch_quls_Advncd_Dip_Dip_F <int>,
# Non_sch_quls_Advncd_Dip_Dip_P <int>, Non_sch_quls_Cert3a4_Level_M <int>,
# Non_sch_quls_Cert3a4_Level_F <int>, Non_sch_quls_Cert3a4_Level_P <int>,
# Non_sch_quls_Cert1a2_Level_M <int>, Non_sch_quls_Cert1a2_Level_F <int>,
# Non_sch_quls_Cert1a2_Level_P <int>, Non_sch_quls_Certnfd_Level_M <int>,
# Non_sch_quls_Certnfd_Level_F <int>, Non_sch_quls_Certnfd_Level_P <int>,
# Non_sch_quls_CertTot_Level_M <int>, Non_sch_quls_CertTot_Level_F <int>,
# Non_sch_quls_CertTot_Level_P <int>, Migtn_Lvd_same_add_1_yr_ago_M <int>,
# Migtn_Lvd_same_add_1_yr_ago_F <int>, Migtn_Lvd_same_add_1_yr_ago_P <int>,
# Migtn_Lvd_Diff_add_1_yr_ago_M <int>, Migtn_Lvd_Diff_add_1_yr_ago_F <int>,
# Migtn_Lvd_Diff_add_1_yr_ago_P <int>, Migtn_Lvd_sme_add_5_yrs_ago_M <int>,
# Migtn_Lvd_sme_add_5_yrs_ago_F <int>, Migtn_Lvd_sme_add_5_yrs_ago_P <int>,
# Mign_Lvd_Diff_add_5_yrs_ago_M <int>, Mign_Lvd_Diff_add_5_yrs_ago_F <int>,
# Mign_Lvd_Diff_add_5_yrs_ago_P <int>
head(lga_3, 3)
# A tibble: 3 x 6
MB_CODE_2016 LGA_CODE_2016 LGA_NAME_2016 STATE_CODE_2016 STATE_NAME_2016 AREA_ALBERS_SQKM
<dbl> <int> <chr> <int> <chr> <dbl>
1 20011170000 20570 Ballarat (C) 2 Victoria 0.054
2 20011160000 20570 Ballarat (C) 2 Victoria 0.0331
3 20010160000 20570 Ballarat (C) 2 Victoria 0.0409
It was observed that lga_1 had 82 observations for 9 variables, lga_2 had 82 observations for 67 variables and lga_3 had 85014 observations for 6 variables.
#dimension of the dataframe
{ cat("dimension of lga_1: ", dim(lga_1))
cat("\ndimension of lga_2: ", dim(lga_2))
cat("\ndimension of lga_3: ", dim(lga_3))}
dimension of lga_1: 82 9
dimension of lga_2: 82 67
dimension of lga_3: 85014 6
Since the report is only interested in variables that describe employment, housing costs, family income and household size, variables that did not contribute to this were removed before joining. lga_3 repeats each LGA across many rows (85014 in total), so distinct LGA code and name pairs were selected to give the final data frame one name per LGA.
#Dropping variables that are not of interest from the dataframes before joining
lga_1 <- select(lga_1, LGA_CODE_2016, Median_mortgage_repay_monthly, Median_rent_weekly, Median_tot_fam_inc_weekly, Average_household_size)
lga_2 <- lga_2[, c(1, 5:15)]
lga_3 <- select(lga_3, LGA_CODE_2016, LGA_NAME_2016)
#Select distinct lga names from lga_3
lga_3 <- lga_3 %>% distinct(LGA_CODE_2016, LGA_NAME_2016)
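A small sketch of the de-duplication step, using base R's unique() on a toy data frame rather than the real lga_3. The toy rows and the names mb and lookup are illustrative assumptions. The sketch also checks that each code maps to exactly one name, which the later join relies on.

```r
# Toy stand-in for lga_3: several rows per LGA (illustrative values only)
mb <- data.frame(
  LGA_CODE_2016 = c(20570L, 20570L, 20110L),
  LGA_NAME_2016 = c("Ballarat (C)", "Ballarat (C)", "Alpine (S)"),
  stringsAsFactors = FALSE
)

# Keep one row per distinct code/name pair
lookup <- unique(mb[, c("LGA_CODE_2016", "LGA_NAME_2016")])

# Each code should now appear once; anyDuplicated() returns 0 when it does
n_dup <- anyDuplicated(lookup$LGA_CODE_2016)
print(lookup)
print(n_dup)
```

If n_dup were non-zero, one code would be paired with more than one name and a left join on the code would multiply rows.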
The 3 datasets were joined on the key variable LGA_CODE_2016. In lga_3 this variable was read as an integer, whereas in lga_1 and lga_2 it is a character column, so the columns were not compatible for joining. The lga_3 column was converted to the character data type before joining the 3 datasets.
# change datatype to character to make the columns compatible
lga_3$LGA_CODE_2016 <- as.character(lga_3$LGA_CODE_2016)
The 3 data frames were merged using the LGA_CODE_2016 column. In lga_1 and lga_2 the codes in this column carry an ‘LGA’ prefix (for example LGA20110), whereas lga_3 holds the bare code (20110). To combine the 3 data frames the values have to match across them, so the ‘LGA’ prefix was removed using str_replace() from stringr inside base R's transform().
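The two key fixes (type conversion and prefix removal) can be sketched on plain vectors with base R only. The vector names are hypothetical, and sub() with an anchored pattern is swapped in here for str_replace(); the anchor ^ restricts the match to the start of the string.

```r
# Codes as they appear in the census tables and in the LGA lookup table
codes_chr <- c("LGA20110", "LGA20260")  # character with prefix (lga_1, lga_2)
codes_int <- c(20110L, 20260L)          # integer without prefix (lga_3)

# Strip the leading "LGA" and convert the integer codes to character
key_a <- sub("^LGA", "", codes_chr)
key_b <- as.character(codes_int)

# After both fixes the keys have the same type and the same values
print(class(key_a))
print(identical(key_a, key_b))
```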
#remove the 'LGA' prefix from the LGA_CODE_2016 column in lga_1 and lga_2
lga_1 <- lga_1 %>% transform(LGA_CODE_2016=str_replace(LGA_CODE_2016,"LGA",""))
lga_2 <- lga_2 %>% transform(LGA_CODE_2016=str_replace(LGA_CODE_2016,"LGA",""))
#Join the three tables
lga_1_2 <- lga_1 %>% left_join(lga_2, by = "LGA_CODE_2016")
lga_joined <- lga_1_2 %>% left_join(lga_3, by = "LGA_CODE_2016")
#Shift the column LGA name next to the lga code
lga_joined <- lga_joined %>% select(LGA_NAME_2016,everything())
head(lga_joined,3)
LGA_NAME_2016 LGA_CODE_2016 Median_mortgage_repay_monthly Median_rent_weekly
1 Alpine (S) 20110 1300 200
2 Ararat (RC) 20260 1022 185
3 Ballarat (C) 20570 1350 250
Median_tot_fam_inc_weekly Average_household_size lfs_Emplyed_wrked_full_time_M
1 1322 2.2 1917
2 1263 2.3 1779
3 1489 2.4 16596
lfs_Emplyed_wrked_full_time_F lfs_Emplyed_wrked_full_time_P lfs_Emplyed_wrked_part_time_M
1 1091 3005 711
2 943 2726 573
3 8873 25472 5039
lfs_Emplyed_wrked_part_time_F lfs_Emplyed_wrked_part_time_P lfs_Employed_away_from_work_M
1 1384 2088 192
2 1101 1676 149
3 11636 16673 1059
lfs_Employed_away_from_work_F lfs_Employed_away_from_work_P lfs_Unmplyed_lookng_for_wrk_M
1 199 400 136
2 168 316 151
3 1511 2571 1843
lfs_Unmplyed_lookng_for_wrk_F
1 82
2 107
3 1573
{ cat("dimension of lga_1_2: ", dim(lga_1_2))}
dimension of lga_1_2: 82 16
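A left join keeps every row of the left table even when the key has no match, filling the new columns with NA. A hedged sketch of a pre-join key check follows, using base R's merge(all.x = TRUE) as a stand-in for left_join(). The toy tables left and right are assumptions, with one key deliberately missing from right.

```r
# Toy tables; the key "20260" is missing from 'right' on purpose
left <- data.frame(
  LGA_CODE_2016 = c("20110", "20260", "20570"),
  Median_rent_weekly = c(200L, 185L, 250L),
  stringsAsFactors = FALSE
)
right <- data.frame(
  LGA_CODE_2016 = c("20110", "20570"),
  LGA_NAME_2016 = c("Alpine (S)", "Ballarat (C)"),
  stringsAsFactors = FALSE
)

# Keys present on the left but absent on the right
unmatched <- setdiff(left$LGA_CODE_2016, right$LGA_CODE_2016)

# Left join: all rows of 'left' survive, unmatched names become NA
joined <- merge(left, right, by = "LGA_CODE_2016", all.x = TRUE)
print(unmatched)
print(joined)
```

An empty unmatched vector before the join, or no NA in the joined name column afterwards, indicates that every LGA found its name.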
There are 82 observations of 17 variables in the joined dataset: 2 character variables, 14 integer variables and 1 numeric (double) variable.
str(lga_joined)
'data.frame': 82 obs. of 17 variables:
$ LGA_NAME_2016 : chr "Alpine (S)" "Ararat (RC)" "Ballarat (C)" "Banyule (C)" ...
$ LGA_CODE_2016 : chr "20110" "20260" "20570" "20660" ...
$ Median_mortgage_repay_monthly: int 1300 1022 1350 1950 1315 1465 2500 1192 2500 1599 ...
$ Median_rent_weekly : int 200 185 250 350 250 250 450 200 406 300 ...
$ Median_tot_fam_inc_weekly : int 1322 1263 1489 2033 1192 1462 2765 1232 2652 1358 ...
$ Average_household_size : num 2.2 2.3 2.4 2.6 2.2 2.5 2.6 2.2 2.6 3 ...
$ lfs_Emplyed_wrked_full_time_M: int 1917 1779 16596 22786 4231 8399 17691 2144 30428 31470 ...
$ lfs_Emplyed_wrked_full_time_F: int 1091 943 8873 12770 2061 3969 9871 1101 19343 16975 ...
$ lfs_Emplyed_wrked_full_time_P: int 3005 2726 25472 35554 6292 12364 27559 3240 49766 48442 ...
$ lfs_Emplyed_wrked_part_time_M: int 711 573 5039 6450 1794 2211 5171 605 10059 10830 ...
$ lfs_Emplyed_wrked_part_time_F: int 1384 1101 11636 14064 3468 5337 11274 1411 18926 16058 ...
$ lfs_Emplyed_wrked_part_time_P: int 2088 1676 16673 20514 5259 7548 16443 2012 28984 26894 ...
$ lfs_Employed_away_from_work_M: int 192 149 1059 1170 457 610 847 183 1468 2444 ...
$ lfs_Employed_away_from_work_F: int 199 168 1511 1670 532 734 1134 219 2038 2254 ...
$ lfs_Employed_away_from_work_P: int 400 316 2571 2846 988 1346 1989 402 3507 4696 ...
$ lfs_Unmplyed_lookng_for_wrk_M: int 136 151 1843 1797 477 620 1093 175 2575 4815 ...
$ lfs_Unmplyed_lookng_for_wrk_F: int 82 107 1573 1635 377 624 1121 140 2263 4443 ...
Before performing any data type conversions, the data needs to be reshaped. The names of columns 7 to 17 encode values (an employment status and a sex) rather than naming variables. Therefore, the data does not conform to tidy data principles. To rectify this, the dataset was converted from wide to long format in the next section.
The employment status of the residents of an LGA is spread across columns 7 to 17 in the joined dataset. To do any meaningful statistical visualization or analysis on the data, these column names need to be stored as values of a single variable, Employment_status. This was done using the gather() function. The gathered names also encode the sex of the LGA residents. To make the information in each cell atomic, each name was split into 2 columns: Employment_status and Sex. Rows with Sex equal to P hold the count for all persons, which duplicates the male and female counts, so these rows were removed.
# Forming Employment_status column
lga_emp <- lga_joined %>% gather(7:17, key = "Employment_status", value = "Count")
lga_emp <- lga_emp %>% transform(Employment_status=str_replace(Employment_status,"lfs_","")) %>% separate(Employment_status, into = c("Employment_status","Sex"), sep = -1)
head(lga_emp, 3)
LGA_NAME_2016 LGA_CODE_2016 Median_mortgage_repay_monthly Median_rent_weekly
1 Alpine (S) 20110 1300 200
2 Ararat (RC) 20260 1022 185
3 Ballarat (C) 20570 1350 250
Median_tot_fam_inc_weekly Average_household_size Employment_status Sex Count
1 1322 2.2 Emplyed_wrked_full_time_ M 1917
2 1263 2.3 Emplyed_wrked_full_time_ M 1779
3 1489 2.4 Emplyed_wrked_full_time_ M 16596
str(lga_emp)
'data.frame': 902 obs. of 9 variables:
$ LGA_NAME_2016 : chr "Alpine (S)" "Ararat (RC)" "Ballarat (C)" "Banyule (C)" ...
$ LGA_CODE_2016 : chr "20110" "20260" "20570" "20660" ...
$ Median_mortgage_repay_monthly: int 1300 1022 1350 1950 1315 1465 2500 1192 2500 1599 ...
$ Median_rent_weekly : int 200 185 250 350 250 250 450 200 406 300 ...
$ Median_tot_fam_inc_weekly : int 1322 1263 1489 2033 1192 1462 2765 1232 2652 1358 ...
$ Average_household_size : num 2.2 2.3 2.4 2.6 2.2 2.5 2.6 2.2 2.6 3 ...
$ Employment_status : chr "Emplyed_wrked_full_time_" "Emplyed_wrked_full_time_" "Emplyed_wrked_full_time_" "Emplyed_wrked_full_time_" ...
$ Sex : chr "M" "M" "M" "M" ...
$ Count : int 1917 1779 16596 22786 4231 8399 17691 2144 30428 31470 ...
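In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(), which can strip the lfs_ prefix and split the status from the sex in a single call. The sketch below is an alternative under that assumption, not the code used in this report; the toy counts are made up, and the regular expression in names_pattern is an illustrative choice that also drops the trailing underscore from the status.

```r
library(tidyr)

# Toy wide table following the lfs_<status>_<sex> naming (made-up counts)
wide <- data.frame(
  LGA_CODE_2016 = c("20110", "20260"),
  lfs_Emplyed_wrked_full_time_M = c(10L, 20L),
  lfs_Emplyed_wrked_full_time_F = c(30L, 40L),
  stringsAsFactors = FALSE
)

# Reshape to long form; the two capture groups become two columns
long <- pivot_longer(
  wide,
  cols = -LGA_CODE_2016,
  names_to = c("Employment_status", "Sex"),
  names_pattern = "^lfs_(.*)_([MFP])$",
  values_to = "Count"
)
print(long)
```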
#Remove rows which have the total count of both sexes
lga_emp_1 <- lga_emp %>% filter( Sex == 'F' | Sex =='M')
#converting the Employment_status variable to factor with levels
lga_emp_1$Sex <- as.factor(lga_emp_1$Sex)
lga_emp_1$Employment_status <- as.factor(lga_emp_1$Employment_status)
#rename factor levels
levels(lga_emp_1$Sex)
[1] "F" "M"
levels(lga_emp_1$Employment_status)
[1] "Employed_away_from_work_" "Emplyed_wrked_full_time_" "Emplyed_wrked_part_time_"
[4] "Unmplyed_lookng_for_wrk_"
lga_emp_1$Sex <- factor(lga_emp_1$Sex, ordered = TRUE,levels=c('F','M'),labels=c('Female','Male'))
lga_emp_1$Employment_status <- factor(lga_emp_1$Employment_status, ordered = TRUE,levels=c("Employed_away_from_work_", "Emplyed_wrked_full_time_", "Emplyed_wrked_part_time_","Unmplyed_lookng_for_wrk_"),labels=c("Away",'Full_time','Part_time',"Unemployed/ Looking"))
str(lga_emp_1)
'data.frame': 656 obs. of 9 variables:
$ LGA_NAME_2016 : chr "Alpine (S)" "Ararat (RC)" "Ballarat (C)" "Banyule (C)" ...
$ LGA_CODE_2016 : chr "20110" "20260" "20570" "20660" ...
$ Median_mortgage_repay_monthly: int 1300 1022 1350 1950 1315 1465 2500 1192 2500 1599 ...
$ Median_rent_weekly : int 200 185 250 350 250 250 450 200 406 300 ...
$ Median_tot_fam_inc_weekly : int 1322 1263 1489 2033 1192 1462 2765 1232 2652 1358 ...
$ Average_household_size : num 2.2 2.3 2.4 2.6 2.2 2.5 2.6 2.2 2.6 3 ...
$ Employment_status : Ord.factor w/ 4 levels "Away"<"Full_time"<..: 2 2 2 2 2 2 2 2 2 2 ...
$ Sex : Ord.factor w/ 2 levels "Female"<"Male": 2 2 2 2 2 2 2 2 2 2 ...
$ Count : int 1917 1779 16596 22786 4231 8399 17691 2144 30428 31470 ...
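As a minimal illustration of the relabelling above, factor() maps raw codes to readable labels through its levels and labels arguments, matched by position. The vector sex_raw is a made-up example. The sketch leaves the factor unordered, on the assumption that sex has no natural ordering; the report's code uses ordered = TRUE.

```r
# Raw single-letter codes, as produced by the split of the column names
sex_raw <- c("M", "F", "F", "M")

# levels lists the raw codes, labels gives the display names in the same order
sex <- factor(sex_raw, levels = c("F", "M"), labels = c("Female", "Male"))

print(levels(sex))
print(table(sex))
```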
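A new variable Total that holds the total count of persons across all employment statuses and both sexes in each LGA was created. It was then used to derive Emp_status_Perc_by_lga, the percentage of that total accounted for by each employment status and sex.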
#total count across all employment statuses and both sexes in each LGA
lga_emp_total <- lga_emp_1 %>% group_by(LGA_NAME_2016) %>% summarise(Total = sum(Count, na.rm = TRUE))
#join lga_emp_total to get the total
lga_emp_total <- lga_emp_1 %>% left_join(lga_emp_total, by = "LGA_NAME_2016")
#create new dataframe which has the mutated value
lga_unemp_percentage <- mutate(lga_emp_total, Emp_status_Perc_by_lga = round((Count/Total)*100,2))
head(lga_unemp_percentage,3)
LGA_NAME_2016 LGA_CODE_2016 Median_mortgage_repay_monthly Median_rent_weekly
1 Alpine (S) 20110 1300 200
2 Ararat (RC) 20260 1022 185
3 Ballarat (C) 20570 1350 250
Median_tot_fam_inc_weekly Average_household_size Employment_status Sex Count Total
1 1322 2.2 Full_time Male 1917 5712
2 1263 2.3 Full_time Male 1779 4971
3 1489 2.4 Full_time Male 16596 48130
Emp_status_Perc_by_lga
1 33.56
2 35.79
3 34.48
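The percentage calculation can be checked by hand in base R. The sketch below reuses the eight male and female counts shown earlier for Alpine (S) and Ararat (RC); ave() is swapped in for the group_by, summarise and left_join sequence, and the data frame name counts is an assumption.

```r
# Counts for two LGAs, in the order full-time M/F, part-time M/F,
# away M/F, unemployed M/F (taken from the joined table above)
counts <- data.frame(
  LGA_NAME_2016 = rep(c("Alpine (S)", "Ararat (RC)"), each = 8),
  Count = c(1917, 1091, 711, 1384, 192, 199, 136, 82,
            1779, 943, 573, 1101, 149, 168, 151, 107)
)

# ave() repeats each group's sum on every row of that group
counts$Total <- ave(counts$Count, counts$LGA_NAME_2016, FUN = sum)
counts$Perc <- round(counts$Count / counts$Total * 100, 2)

# Within each LGA the percentages should add up to roughly 100
print(head(counts, 3))
print(tapply(counts$Perc, counts$LGA_NAME_2016, sum))
```

The per-LGA sums can differ slightly from 100 because each percentage is rounded to 2 decimal places.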
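The lga_unemp_percentage dataset was scanned for missing or special values (NA, NaN, Inf and -Inf). No such values were found. The is.special() definition in effect returns NULL for non-numeric columns, so only the numeric columns were checked.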
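A scan of this kind is easier to read as one count per column than as a full logical vector. A hedged sketch on a toy data frame follows; the names is_special, df and special_counts are assumptions, and this variant also checks non-numeric columns for NA.

```r
# TRUE for NA, NaN, Inf and -Inf in numeric columns, and for NA elsewhere
is_special <- function(x) {
  if (is.numeric(x)) !is.finite(x) else is.na(x)
}

# Toy data with two special values in 'a' and one NA in 'b'
df <- data.frame(
  a = c(1, NA, Inf),
  b = c("x", NA, "z"),
  stringsAsFactors = FALSE
)

# One integer per column: the number of special values found
special_counts <- vapply(df, function(col) sum(is_special(col)), integer(1))
print(special_counts)
```

A result of all zeros would support the statement that no missing or special values are present.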
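#first definition: flags non-finite numeric values, and NA in other column types
is.special <- function(x){
  if (is.numeric(x)) !is.finite(x) else is.na(x)
}
#second definition replaces the first: numeric columns only, other column types return NULL
is.special <- function(x){
  if (is.numeric(x)) !is.finite(x)
}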
sapply(lga_unemp_percentage, is.special)
$LGA_NAME_2016
NULL
$LGA_CODE_2016
NULL
$Median_mortgage_repay_monthly
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[209] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[321] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[369] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[417] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[449] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[465] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[481] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[497] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[513] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[529] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[545] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[561] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[577] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[593] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[609] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[625] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[641] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$Median_rent_weekly
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[209] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[321] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[369] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[417] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[449] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[465] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[481] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[497] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[513] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[529] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[545] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[561] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[577] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[593] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[609] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[625] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[641] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$Median_tot_fam_inc_weekly
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[209] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[321] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[369] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[417] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[449] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[465] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[481] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[497] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[513] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[529] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[545] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[561] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[577] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[593] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[609] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[625] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[641] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$Average_household_size
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[209] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[321] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[369] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[417] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[449] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[465] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[481] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[497] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[513] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[529] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[545] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[561] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[577] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[593] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[609] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[625] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[641] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$Employment_status
NULL
$Sex
NULL
$Count
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[209] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[321] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[369] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[417] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[449] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[465] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[481] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[497] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[513] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[529] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[545] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[561] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[577] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[593] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[609] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[625] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[641] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$Total
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[209] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[321] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[369] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[417] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[449] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[465] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[481] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[497] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[513] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[529] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[545] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[561] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[577] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[593] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[609] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[625] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[641] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
$Emp_status_Perc_by_lga
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[209] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[321] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[369] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[417] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[449] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[465] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[481] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[497] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[513] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[529] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[545] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[561] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[577] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[593] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[609] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[625] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[641] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# no na or other special values found
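The element-by-element output above is long to scan by eye. As a minimal sketch, a per-column count of special values reaches the same conclusion in one number per variable. The helper `count_special` and the toy data frame are hypothetical, introduced only for illustration; they are not part of the report's code.

```r
# Hypothetical helper: count NA, NaN and infinite values in one column.
count_special <- function(x) {
  if (is.numeric(x)) {
    sum(is.na(x) | is.infinite(x))  # is.na() is TRUE for both NA and NaN
  } else {
    sum(is.na(x))                   # non-numeric columns can only hold NA
  }
}

# Toy data standing in for lga_unemp_percentage (values are made up).
toy <- data.frame(
  Sex   = c("Male", "Female", NA),
  Count = c(10, Inf, 3),
  Total = c(20, 20, NaN)
)

# One count per column; a column of all zeros means no special values.
special_counts <- sapply(toy, count_special)
print(special_counts)
```

Applied to the real data frame, a result of all zeros would correspond to the all-FALSE vectors printed above.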
The lga_unemp_percentage dataset was scanned for outliers by drawing boxplots of the numerical variables. The boxplots for Median_mortgage_repay_monthly, Median_rent_weekly, Median_tot_fam_inc_weekly and Average_household_size showed some unusually low and high values. Using the z-score method with a threshold of |z| > 3, 48 values across these four variables were flagged as outliers.
par(mfrow = c(2,3))
boxplot(lga_unemp_percentage$Median_mortgage_repay_monthly, xlab="Median monthly mortgage repayment", col = "grey")
boxplot(lga_unemp_percentage$Median_rent_weekly, xlab="Median weekly rent", col = "grey")
boxplot(lga_unemp_percentage$Median_tot_fam_inc_weekly, xlab = "Median family weekly income", col = "grey")
boxplot(lga_unemp_percentage$Average_household_size, xlab = "Average Household size", col = "grey")
boxplot(lga_unemp_percentage$Total, xlab = "Number of persons over 15 years in the LGA", col = "grey")
# Some rows have a median income of 0. After the z-score check, these rows are inspected by filtering.
outlier_detection_var <- lga_unemp_percentage %>% select("Median_mortgage_repay_monthly","Median_rent_weekly","Median_tot_fam_inc_weekly", "Average_household_size")
z.scores <- outlier_detection_var %>% scores(type = "z")
z.scores %>% summary()
Median_mortgage_repay_monthly Median_rent_weekly Median_tot_fam_inc_weekly Average_household_size
Min.   :-3.06940              Min.   :-2.7648    Min.   :-3.4252           Min.   :-5.18195
1st Qu.:-0.55035              1st Qu.:-0.6452    1st Qu.:-0.5404           1st Qu.:-0.41730
Median :-0.09902              Median :-0.1153    Median :-0.1961           Median : 0.01585
Mean   : 0.00000              Mean   : 0.0000    Mean   : 0.0000           Mean   : 0.00000
3rd Qu.: 0.70498              3rd Qu.: 0.7432    3rd Qu.: 0.5554           3rd Qu.: 0.44900
Max.   : 2.17862              Max.   : 2.0043    Max.   : 2.6590           Max.   : 1.53187
length(which( abs(z.scores) >3 ))
[1] 48
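The count above is taken over all four columns together, so it does not show which variables contribute the flagged values. The sketch below gives a per-column breakdown. It assumes a z-score is (x - mean) / sd and uses base R's scale() rather than the outliers package; the toy values are made up for illustration.

```r
# Toy data (made-up values); the single 0 mimics a placeholder row.
toy <- data.frame(
  rent   = c(rep(300, 10), rep(320, 10), 0),
  income = c(rep(1500, 10), rep(1600, 10), 1550)
)

# z-score per column: (x - mean(x)) / sd(x)
z <- scale(toy)

# Number of values with |z| > 3 in each column, and over all columns.
flagged_by_column <- colSums(abs(z) > 3)
flagged_total <- sum(abs(z) > 3)
print(flagged_by_column)
print(flagged_total)
```

In the toy data only the zero in `rent` is flagged, which illustrates how a few placeholder zeros can account for every flagged value.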
lga_unemp_percentage %>% filter(Median_mortgage_repay_monthly == 0)
                            LGA_NAME_2016 LGA_CODE_2016 Median_mortgage_repay_monthly
1                 No usual address (Vic.)         29499                             0
2  Migratory - Offshore - Shipping (Vic.)         29799                             0
3                 No usual address (Vic.)         29499                             0
4  Migratory - Offshore - Shipping (Vic.)         29799                             0
5                 No usual address (Vic.)         29499                             0
6  Migratory - Offshore - Shipping (Vic.)         29799                             0
7                 No usual address (Vic.)         29499                             0
8  Migratory - Offshore - Shipping (Vic.)         29799                             0
9                 No usual address (Vic.)         29499                             0
10 Migratory - Offshore - Shipping (Vic.)         29799                             0
11                No usual address (Vic.)         29499                             0
12 Migratory - Offshore - Shipping (Vic.)         29799                             0
13                No usual address (Vic.)         29499                             0
14 Migratory - Offshore - Shipping (Vic.)         29799                             0
15                No usual address (Vic.)         29499                             0
16 Migratory - Offshore - Shipping (Vic.)         29799                             0
   Median_rent_weekly Median_tot_fam_inc_weekly Average_household_size   Employment_status    Sex Count
1                   0                         0                      0           Full_time   Male   811
2                   0                         0                      0           Full_time   Male    16
3                   0                         0                      0           Full_time Female   384
4                   0                         0                      0           Full_time Female     0
5                   0                         0                      0           Part_time   Male   317
6                   0                         0                      0           Part_time   Male     0
7                   0                         0                      0           Part_time Female   326
8                   0                         0                      0           Part_time Female     0
9                   0                         0                      0                Away   Male   115
10                  0                         0                      0                Away   Male     3
11                  0                         0                      0                Away Female    81
12                  0                         0                      0                Away Female     0
13                  0                         0                      0 Unemployed/ Looking   Male   367
14                  0                         0                      0 Unemployed/ Looking   Male     0
15                  0                         0                      0 Unemployed/ Looking Female   251
16                  0                         0                      0 Unemployed/ Looking Female     0
   Total Emp_status_Perc_by_lga
1   2652                  30.58
2     19                  84.21
3   2652                  14.48
4     19                   0.00
5   2652                  11.95
6     19                   0.00
7   2652                  12.29
8     19                   0.00
9   2652                   4.34
10    19                  15.79
11  2652                   3.05
12    19                   0.00
13  2652                  13.84
14    19                   0.00
15  2652                   9.46
16    19                   0.00
The boxplots also showed some data points equal to 0. The corresponding rows were inspected by filtering them from the dataset, which showed that they all have LGA code 29499 or 29799. LGA code 29799 is reserved for people coded to Migratory, Offshore and Shipping Mesh Blocks, and LGA code 29499 is reserved for people coded to No usual address Mesh Blocks. These rows carry no meaningful median or average values and skew the distributions, so they were removed from the data frame.
# Remove the rows for the two special-purpose LGA codes, whose medians and averages are all 0
lga_unemp_percentage_outliers_rm <- lga_unemp_percentage %>% filter(LGA_CODE_2016 != 29499 & LGA_CODE_2016 != 29799)
# Boxplots after removing these rows
par(mfrow = c(2,3))
boxplot(lga_unemp_percentage_outliers_rm$Median_mortgage_repay_monthly, xlab="Median monthly mortgage repayment", col = "grey")
boxplot(lga_unemp_percentage_outliers_rm$Median_rent_weekly, xlab="Median weekly rent", col = "grey")
boxplot(lga_unemp_percentage_outliers_rm$Median_tot_fam_inc_weekly, xlab = "Median family weekly income", col = "grey")
boxplot(lga_unemp_percentage_outliers_rm$Average_household_size, xlab = "Average Household size", col = "grey")
boxplot(lga_unemp_percentage_outliers_rm$Total, xlab = "Number of persons over 15 years in the LGA", col = "grey")
The boxplots show that some outliers remain in the data. Because the dataset reports the observed value of each variable for every LGA, removing these values would be counter-productive. On the contrary, these points could help explain why certain LGAs have unusually high or low values for the measures in the dataset. These outliers were therefore neither removed nor imputed.
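For reference, the points a boxplot draws individually are those lying beyond the whiskers, which by default extend at most 1.5 times the length of the box (roughly the interquartile range) from its edges. The sketch below uses made-up values and boxplot.stats() from base R's grDevices package to list those points so they can be inspected rather than dropped.

```r
# Made-up household sizes; 5.0 is deliberately extreme.
household_size <- c(2.1, 2.3, 2.4, 2.4, 2.5, 2.6, 2.6, 2.7, 2.8, 5.0)

# boxplot.stats() returns the five-number summary in $stats
# and the points beyond the whiskers in $out.
bp <- boxplot.stats(household_size)
flagged <- bp$out
print(flagged)

# Mark the flagged observations for inspection instead of removing them.
is_flagged <- household_size %in% flagged
print(which(is_flagged))
```

Keeping a logical marker like `is_flagged` leaves the observed values untouched while still making the unusual LGAs easy to look up.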
The histograms below show the distributions of the Median_mortgage_repay_monthly, Median_rent_weekly, Median_tot_fam_inc_weekly, Average_household_size, Total and Emp_status_Perc_by_lga variables.
par(mfrow = c(2,3))
hist(lga_unemp_percentage_outliers_rm$Median_mortgage_repay_monthly, col = "light blue")
hist(lga_unemp_percentage_outliers_rm$Median_rent_weekly, col = "light blue")
hist(lga_unemp_percentage_outliers_rm$Median_tot_fam_inc_weekly, col = "light blue")
hist(lga_unemp_percentage_outliers_rm$Average_household_size, col = "light blue")
hist(lga_unemp_percentage_outliers_rm$Total, col = "light blue")
hist(lga_unemp_percentage_outliers_rm$Emp_status_Perc_by_lga, col = "light blue")
The histograms for Total, the number of persons over 15 years in each LGA, and Emp_status_Perc_by_lga, the percentage of those persons in each employment status and sex group, are both right-skewed. To prepare the dataset for statistical analysis, a logarithmic transformation was applied to both variables to make their distributions more symmetric.
hist(log(lga_unemp_percentage_outliers_rm$Total))
hist(log(lga_unemp_percentage_outliers_rm$Emp_status_Perc_by_lga))
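One caveat with a plain log transform is that log(0) evaluates to -Inf, which matters if any zero percentages remain after filtering (the filtered rows shown earlier contain several 0.00 values). As a hedged alternative, not what the report itself does, log1p() computes log(1 + x) and keeps zeros finite. The values below are made up for illustration.

```r
# Made-up percentages, including a zero.
perc <- c(0, 3.05, 4.34, 9.46, 13.84, 30.58)

log_perc   <- log(perc)     # log(0) is -Inf
log1p_perc <- log1p(perc)   # log(1 + x); log1p(0) is 0

# How many transformed values are infinite under each approach?
print(sum(is.infinite(log_perc)))
print(all(is.finite(log1p_perc)))

# Back-transform with expm1(), the inverse of log1p().
recovered <- expm1(log1p_perc)
print(all.equal(recovered, perc))
```

Whether log1p() is preferable depends on the analysis; it changes the scale slightly for small values, so the choice should be stated alongside any results.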