Student Details

Data Table

These data were obtain from the Australian Bureau of Statistics website, under the Census section (http://www.abs.gov.au/census). A direct URL to the tabulated data cannot be provided as this area of the site requires a “Guest User” login. Details of the steps of bring these data into R for pre-processing are outlined in the appendix.

The data table used for the analysis.

Importing the data

census <- read_xlsx("FINALtable.xlsx", 1)
## Warning in strptime(x, format, tz = tz): unknown timezone 'zone/tz/2018c.
## 1.0/zoneinfo/Australia/Melbourne'
# Checking the data
head(census, n = 5)
## # A tibble: 5 x 24
##   `SEXP Sex` `EETP Engagement in Em… `0-4 years` `5-9 years` `10-14 years`
##   <chr>      <chr>                         <dbl>       <dbl>         <dbl>
## 1 Male       Fully engaged                    0.          0.            0.
## 2 <NA>       Partially engaged                0.          0.            0.
## 3 <NA>       At least partially eng…          0.          0.            0.
## 4 <NA>       Not Engaged                      0.          0.            0.
## 5 <NA>       Engagement status unde…          0.          0.            0.
## # ... with 19 more variables: `15-19 years` <dbl>, `20-24 years` <dbl>,
## #   `25-29 years` <dbl>, `30-34 years` <dbl>, `35-39 years` <dbl>, `40-44
## #   years` <dbl>, `45-49 years` <dbl>, `50-54 years` <dbl>, `55-59
## #   years` <dbl>, `60-64 years` <dbl>, `65-69 years` <dbl>, `70-74
## #   years` <dbl>, `75-79 years` <dbl>, `80-84 years` <dbl>, `85-89
## #   years` <dbl>, `90-94 years` <dbl>, `95-99 years` <dbl>, `100 years and
## #   over` <dbl>, Total <dbl>
# Checking the data
tail(census, n = 5)
## # A tibble: 5 x 24
##   `SEXP Sex` `EETP Engagement in Em… `0-4 years` `5-9 years` `10-14 years`
##   <chr>      <chr>                         <dbl>       <dbl>         <dbl>
## 1 <NA>       Partially engaged                0.          0.            0.
## 2 <NA>       At least partially eng…          0.          0.            0.
## 3 <NA>       Not Engaged                      0.          0.            0.
## 4 <NA>       Engagement status unde…          0.          0.            0.
## 5 <NA>       Not applicable             1464782.    1502647.      1397187.
## # ... with 19 more variables: `15-19 years` <dbl>, `20-24 years` <dbl>,
## #   `25-29 years` <dbl>, `30-34 years` <dbl>, `35-39 years` <dbl>, `40-44
## #   years` <dbl>, `45-49 years` <dbl>, `50-54 years` <dbl>, `55-59
## #   years` <dbl>, `60-64 years` <dbl>, `65-69 years` <dbl>, `70-74
## #   years` <dbl>, `75-79 years` <dbl>, `80-84 years` <dbl>, `85-89
## #   years` <dbl>, `90-94 years` <dbl>, `95-99 years` <dbl>, `100 years and
## #   over` <dbl>, Total <dbl>
# Checking the size
dim(census)
## [1] 18 24
# Display the first attribute as a vector
c(census[1:18,1])
## $`SEXP Sex`
##  [1] "Male"   NA       NA       NA       NA       NA       "Female"
##  [8] NA       NA       NA       NA       NA       "Total"  NA      
## [15] NA       NA       NA       NA
# Display the first attribute as a vector
c(census[1:18,1])
## $`SEXP Sex`
##  [1] "Male"   "Male"   "Male"   "Male"   "Male"   "Male"   "Female"
##  [8] "Female" "Female" "Female" "Female" "Female" "Total"  "Total" 
## [15] "Total"  "Total"  "Total"  "Total"
headerNew <- c("Gender", "Engagement", "2", "7", "12", "17", "22", "27", 
               "32", "37", "42", "47", "52", "57", "62", "67", "72", "77", 
               "82", "87", "92", "97", "100+", "Total")
colnames(census) <- headerNew
# Drop age 0 - 18 (no data recorded) & drop 100+ (different interval)
censusViz <- census[-c(3, 5, 6, 9, 11, 12, 13, 14, 15, 16, 17, 18), -c(3, 4, 5, 23, 24)]
# Checking the data
str(censusViz)
## Classes 'tbl_df', 'tbl' and 'data.frame':    6 obs. of  19 variables:
##  $ Gender    : chr  "Male" "Male" "Male" "Female" ...
##  $ Engagement: chr  "Fully engaged" "Partially engaged" "Not Engaged" "Fully engaged" ...
##  $ 17        : num  587338 43097 51755 564773 49503 ...
##  $ 22        : num  530431 92269 95785 487796 115051 ...
##  $ 27        : num  558982 83821 87244 453365 143524 ...
##  $ 32        : num  587397 75855 81836 369790 202239 ...
##  $ 37        : num  541873 69043 80007 299536 232233 ...
##  $ 42        : num  535291 72456 89937 317693 253261 ...
##  $ 47        : num  516059 75111 101870 332521 244726 ...
##  $ 52        : num  473883 78123 118341 306164 221484 ...
##  $ 57        : num  394833 83950 154576 240776 202529 ...
##  $ 62        : num  242659 89835 232986 129180 150442 ...
##  $ 67        : num  93675 67358 362931 40828 73753 ...
##  $ 72        : num  27724 33984 325735 10427 26378 ...
##  $ 77        : num  9221 13591 249343 3326 8077 ...
##  $ 82        : num  3111 4702 169140 1228 2458 ...
##  $ 87        : num  1161 1640 101497 522 837 ...
##  $ 92        : num  348 363 37806 242 226 ...
##  $ 97        : num  457 105 6877 272 205 ...
# Turn the Gender and engagement into factors rather than Char
censusViz$Gender <- factor(censusViz$Gender)
censusViz$Engagement <- factor(censusViz$Engagement)
# Checking the data
summary(censusViz)
##     Gender              Engagement       17               22        
##  Female:3   Fully engaged    :2    Min.   : 38476   Min.   : 92269  
##  Male  :3   Not Engaged      :2    1st Qu.: 44698   1st Qu.: 97325  
##             Partially engaged:2    Median : 50629   Median :108498  
##                                    Mean   :222490   Mean   :237213  
##                                    3rd Qu.:436518   3rd Qu.:394610  
##                                    Max.   :587338   Max.   :530431  
##        27               32               37               42        
##  Min.   : 83821   Min.   : 75855   Min.   : 69043   Min.   : 72456  
##  1st Qu.:101314   1st Qu.:107958   1st Qu.:103692   1st Qu.:108451  
##  Median :147530   Median :194280   Median :203491   Median :208627  
##  Mean   :246412   Mean   :250573   Mean   :232907   Mean   :238772  
##  3rd Qu.:377908   3rd Qu.:327902   3rd Qu.:282710   3rd Qu.:301585  
##  Max.   :558982   Max.   :587397   Max.   :541873   Max.   :535291  
##        47               52               57               62        
##  Min.   : 75111   Min.   : 78123   Min.   : 83950   Min.   : 89835  
##  1st Qu.:116682   1st Qu.:132619   1st Qu.:166564   1st Qu.:134496  
##  Median :202922   Median :198468   Median :215932   Median :191714  
##  Mean   :238567   Mean   :228908   Mean   :217667   Mean   :194490  
##  3rd Qu.:310572   3rd Qu.:284994   3rd Qu.:237916   3rd Qu.:240241  
##  Max.   :516059   Max.   :473883   Max.   :394833   Max.   :321841  
##        67               72               77               82        
##  Min.   : 40828   Min.   : 10427   Min.   :  3326   Min.   :  1228  
##  1st Qu.: 68957   1st Qu.: 26714   1st Qu.:  8363   1st Qu.:  2621  
##  Median : 83714   Median : 30854   Median : 11406   Median :  3906  
##  Mean   :178894   Mean   :132674   Mean   : 95807   Mean   : 65931  
##  3rd Qu.:295617   3rd Qu.:252797   3rd Qu.:190405   3rd Qu.:128030  
##  Max.   :434817   Max.   :371799   Max.   :291283   Max.   :214948  
##        87               92                97         
##  Min.   :   522   Min.   :  226.0   Min.   :  105.0  
##  1st Qu.:   918   1st Qu.:  268.5   1st Qu.:  221.8  
##  Median :  1400   Median :  355.5   Median :  364.5  
##  Mean   : 42908   Mean   :18949.2   Mean   : 4477.3  
##  3rd Qu.: 76533   3rd Qu.:28445.2   3rd Qu.: 5272.0  
##  Max.   :151789   Max.   :74710.0   Max.   :18948.0
# Converting from Wide to Long format
censusVizNew <- censusViz %>% gather(`17`:`97`, key = "Age", value = "Number")

Code

# Fully Engaged
fullyEngaged <- ggplot(data = censusVizNew[censusVizNew$Engagement=="Fully engaged",], aes(x = as.numeric(Age), y = Number, colour = Engagement)) + 
  geom_density2d(bins = 40, size = 0.4, colour = "#1f78b4", show.legend = TRUE) + 
  geom_point(show.legend = TRUE) +
  labs(colour = "Engagement Level") +
  scale_colour_manual(values=c("black")) +
  scale_y_continuous(limits = c(0, 600000), labels = comma, breaks=seq(0,600000,50000)) +
  scale_x_continuous(labels = comma, breaks=seq(17,97,10)) +
  labs(y="Number of Individuals") +
  theme(axis.title.x=element_blank(),
        axis.title.y = element_text(size=11, face="bold"),
        axis.ticks.length=unit(.03, "cm"),
        legend.title = element_text(size=10, face="bold"),
        legend.text = element_text(size=9),
        panel.border = element_rect(fill = NA, colour = "grey30", size = 0.1)) +
  facet_grid(.~Gender) +
  theme(strip.text.x = element_text(size = 10, face = "bold"))
# Partially Engaged
partiallyEngaged <- ggplot(data = censusVizNew[censusVizNew$Engagement=="Partially engaged",], aes(x = as.numeric(Age), y = Number, colour = Engagement)) + 
  geom_density2d(bins = 40, size = 0.4, colour = "#4daf4a", show.legend = TRUE) + 
  geom_point(show.legend = TRUE) +
  labs(colour = "Engagement Level") +
  scale_colour_manual(values=c("black")) +
  scale_y_continuous(limits = c(0, 600000), labels = comma, breaks=seq(0,600000,50000)) +
  scale_x_continuous(labels = comma, breaks=seq(17,97,10)) +
  labs(y="Number of Individuals") +
  theme(axis.title.x=element_blank(),
        axis.title.y = element_text(size=11, face="bold"),
        legend.title = element_text(size=10, face="bold"),
        legend.text = element_text(size=9),
        plot.title = element_text(hjust=0.5),
        axis.ticks.length=unit(.03, "cm"),
        panel.border = element_rect(fill = NA, colour = "grey30", size = 0.1)) +
  facet_grid(.~Gender) +
  theme(strip.text.x = element_text(size = 10, face = "bold"))
# Not Engaged
notEngaged <- ggplot(data = censusVizNew[censusVizNew$Engagement=="Not Engaged",], aes(x = as.numeric(Age), y = Number, colour = Engagement)) + 
  geom_density2d(bins = 40, size = 0.4, colour = "#e41a1c", show.legend = TRUE) + 
  geom_point(show.legend = TRUE) +
  labs(colour = "Engagement Level") +
  scale_colour_manual(values=c("black")) +
  scale_y_continuous(limits = c(0, 600000), labels = comma, breaks=seq(0,600000,50000)) +
  scale_x_continuous(labels = comma, breaks=seq(17,97,10)) +
  labs(y="Number of Individuals") +  
  theme(axis.title.x=element_blank(),
        axis.title.y = element_text(size=11, face = "bold"),
        legend.title = element_text(size=10, face="bold"),
        legend.text = element_text(size=9),
        axis.ticks.length=unit(.03, "cm"),
        panel.border = element_rect(fill = NA, colour = "grey30", size = 0.1)) +
  facet_grid(.~Gender) +
  theme(strip.text.x = element_text(size = 10, face = "bold"))

Visualisation

title1 = textGrob("2D Density Plots (with trend points) of Engagement and Age from Census 2016", hjust = 0.567, gp=gpar(fontface="bold"))
bottom1 = textGrob("Age of Individuals (years)", hjust = 0.68, gp=gpar(fontface="bold", fontsize=10))
grid.arrange(fullyEngaged, partiallyEngaged, notEngaged, 
             #left="Number of Individuals", 
             bottom=bottom1,
             top = title1)
The figure shows 2D density plots overlaid with the discrete sample number from the 2016 Australian Census (http://www.abs.gov.au/census). The columns represent gender, while each row represents engagement level (with the y-axis giving number of individuals, and the x-axis age in years). Fully engaged (for example full time work or full time tertiary study) shows a peak in male participation around child-bearing age, while females immediately begin disengaging from full engagement (participants under the age of 17 years are overwhelmingly still engaged in high school study, not included in these data). Females leaving child-bearing age (around 37 years of age) increase their engagement and then disengage at a rate equivalent to males (starting at approximately 37 years for females and approximately 32 years for males). The number of females in the partially engaged category (for example part-time work or study) gradually increases during child-bearing years, then gradually decrease. Males are relatively constant in this category. For both males and females, the category of not engaged (neither working or studying) is quite similar. In both plots an inflexion point is seen at 67 years, most likely associated with morbidity.

The figure shows 2D density plots overlaid with the discrete sample number from the 2016 Australian Census (http://www.abs.gov.au/census). The columns represent gender, while each row represents engagement level (with the y-axis giving number of individuals, and the x-axis age in years). Fully engaged (for example full time work or full time tertiary study) shows a peak in male participation around child-bearing age, while females immediately begin disengaging from full engagement (participants under the age of 17 years are overwhelmingly still engaged in high school study, not included in these data). Females leaving child-bearing age (around 37 years of age) increase their engagement and then disengage at a rate equivalent to males (starting at approximately 37 years for females and approximately 32 years for males). The number of females in the partially engaged category (for example part-time work or study) gradually increases during child-bearing years, then gradually decrease. Males are relatively constant in this category. For both males and females, the category of not engaged (neither working or studying) is quite similar. In both plots an inflexion point is seen at 67 years, most likely associated with morbidity.

Appendix

  1. Visit http://www.abs.gov.au/census
  2. Select TableBuilder under Data by Products
  3. Select Open Table Builder Guest (choose Continue to login as a Guest User)
  4. Drop down the folder menu on 2016 Census of Population and Housing
  5. Select 2016 Census - Employment, Income and Education
  6. From the Predefined table menu, select 2016 Census Engagement in Employment, Education and Training by Age in Five Year Groups
  7. Click Open Table
  8. Page source was viewed
  9. Page source added to excel
  10. Excel was imported into R environment and cleaned as above