Assignment 3

Click the Overview, Code and Summary tabs to explore international students in Australia. The Retro tab notes key learnings and unexplored ideas.

Overview

Education was Australia’s third largest export category in 2019 (Australia Expo, 2019). As a destination, Australia is known to be safe and welcoming, which contributed to the success of this more than $32 billion sector (Australia Expo, 2019).

Context

On the next tabs, you will see how the Australian education sector has grown and evolved (Summary tab, Map tab). The interactive visualisations on those pages allow you to explore international students and their origins in more detail.

Code

This app is hosted at shinyapps.io (https://maher-au.shinyapps.io/InternationalStudents/).

#
# This is a Shiny web application. You can run the application by clicking
# the 'Run App' button above.
#
# Find out more about building applications with Shiny here:
#
#    http://shiny.rstudio.com/
#

#call all libraries
library(shiny)
library(plotly)
library(dplyr)
library(ggplot2)
library(tidyr)
library(magrittr)
library(stringr)
library(readr)

# #this commented-out section was used to create the data file for the map ----
# #try and draw a map of where inbound students come from 
# 
# library(readxl)
# Int_2016 <- read_excel("International student data.xlsx", 
#                        sheet = "2016")
# Int_2017 <- read_excel("International student data.xlsx", 
#                        sheet = "2017")
# Int_2018 <- read_excel("International student data.xlsx", 
#                        sheet = "2018")
# Int_2019 <- read_excel("International student data.xlsx", 
#                        sheet = "2019")
# 
# #Combine dataset
# library(dplyr)
# library(tidyr)
# library(magrittr)
# 
# int_data <- bind_rows(Int_2016, Int_2017, Int_2018, Int_2019, id = NULL)
# 
# #int_data %<>% rename (`Country name` = Nationality)
# 
# #create table 
# int_data2 <- int_data %>% group_by(Nationality, Year) %>% summarise (count = sum(`DATA YTD Commencements`), 
#                                                                      max = max(`DATA YTD Commencements`))
# 
# 
# 
# # #try a different table ----
# library(rvest)
# Wiki2 <- read_html("https://en.wikipedia.org/w/index.php?title=ISO_3166-1&diff=986005040&oldid=985745388") #Use permanent link to prevent nodes changing
# all_tables2 <- html_nodes(Wiki2,"table")
# Iso_country2 <- html_table(all_tables2[[3]], header=TRUE)
# 
# 
# #combine tables (students + iso) ----
# library(stringr)
# int_data2$Nationality <- str_trim (int_data2$Nationality, side = "both")
# int_data2$Nationality <- str_replace_all (int_data2$Nationality, pattern="\t", replacement = "")
# Iso_country2$`English short name (using title case)` <- str_trim (Iso_country2$`English short name (using title case)`, side = "both")
# Iso_country2$`English short name (using title case)` <- str_replace_all (Iso_country2$`English short name (using title case)`, pattern="\t", replacement = "")
# 
# int_iso2 <- left_join (int_data2, Iso_country2, by  = c("Nationality" = "English short name (using title case)" ))
# 
# unmatched2 <- anti_join(int_data2, Iso_country2, by = c("Nationality" = "English short name (using title case)"))
# 
# Translate2 <-unique(unmatched2$Nationality)
# 
# #hard code the list to ensure the data matches - next time consider creating a lookup table with multiple country columns which can be used for 'matching'
# countrylist <- c("Bolivia (Plurinational State of)",  
#                  "Congo, Democratic Republic of the", 
#                  "Congo", 
#                  "Côte d'Ivoire",
#                  "Czechia",
#                  "Timor-Leste", 
#                  "Other", 
#                  "Iran (Islamic Republic of)",
#                  "Korea (Democratic People's Republic of)", 
#                  "Korea, Republic of",
#                  "Serbia",
#                  "Lao People's Democratic Republic",
#                  "China", 
#                  "Moldova, Republic of",
#                  "Netherlands", 
#                  "Other", 
#                  "Samoa", 
#                  "Saint Helena, Ascension and Tristan da Cunha", 
#                  "Saint Kitts and Nevis",
#                  "Saint Lucia", 
#                  "Saint Vincent and the Grenadines",
#                  "Eswatini",
#                  "Syrian Arab Republic",
#                  "Taiwan, Province of China",
#                  "Tanzania, United Republic of",
#                  "United Kingdom of Great Britain and Northern Ireland", 
#                  "United Kingdom of Great Britain and Northern Ireland",
#                  "Venezuela (Bolivarian Republic of)", 
#                  "Viet Nam",
#                  "Congo, Democratic Republic of the")
# 
# #create dataframe with names which need to be updated
# translate_country <- as.data.frame(cbind(Translate2, countrylist))
# 
# #add new column with 'updated' names
# int_data3 <-left_join(int_data2, translate_country, by = (c ("Nationality" ="Translate2" )))
# 
# #add names to other rows
# int_data3$countrylist[is.na(int_data3$countrylist)]<-int_data3$Nationality[is.na(int_data3$countrylist)]
# 
# 
# int_data3$countrylist[is.na(int_data3$countrylist)]<-as.character(int_data3$Nationality[is.na(int_data3$countrylist)])
# 
# #join map data with international student data
# map_int_data <- left_join (int_data3, Iso_country2, by =c( "countrylist"= "English short name (using title case)"))
# 
# map_int_data2 <- aggregate(map_int_data$count, 
#                            by = list(map_int_data$countrylist,
#                                      map_int_data$`Alpha-3 code`, 
#                                      map_int_data$Year), FUN = sum)
# colnames(map_int_data2) <- c("Country", "ISO_3", "Year", "Count")
# 
# 
# 
# 
# 
# #create data for plotting 
# data <- map_int_data2
# 
# #filter data
# #data <- map_int_data2 %>% filter( Year == 2017)
# 
# 
# # draw map----
# library(plotly)
# 
# #log data to help show range of data
# data$log <- log10((data$Count +1))
# write.csv(data, "internationalstudentnumbers.csv")
# 
# #this exported data file is then used as the input for the shiny app. There was some conflict which I didn't have time to resolve with this code in shiny. 


#create data frame for map analysis ----
dataP <-read.csv("internationalstudentnumbers.csv")
dataP$hovertext <- paste(  "In",dataP$Year, "<br>", dataP$Count, "students arrived from","<br><b>", dataP$Country, "</b><extra></extra>")

#define a white, yellow, red, purple colour scale for the map
colours <- c('#F0F0F0', '#F0E73E', '#F0973E', '#F04D3E', '#9f3Ef0')


#define internal country borders
l <- list(color = '#808080', width = 0.5)  # colour must be a string; a bare number is not a valid colour

#define presentation of outlines (country/continent)
g <- list (showframe = FALSE,
           showcoastlines = TRUE,
           projection = list(type ='Mercator'))


#create data frame for bar graph ----
#import data
monthly_data <- read_csv("intst2.csv")


#create individual data frames for different variables----

comm_this_month <- monthly_data[19:23,1:13]
colnames(comm_this_month) <-comm_this_month[1,]
comm_this_month <- comm_this_month[-1,]

comm_earlier<- monthly_data[27:31,1:13]
colnames(comm_earlier) <-comm_earlier[1,]
comm_earlier <- comm_earlier[-1,]


#reformat table and add new column for type (for each table)----

comm_this_month2 <- pivot_longer(comm_this_month, cols = 2:13, names_to = "month", 
                                 values_to = "count")
comm_this_month2$student_type <- "Started this month"

comm_earlier2 <- pivot_longer(comm_earlier, cols = 2:13, names_to = "month", 
                              values_to = "count")
comm_earlier2$student_type <- "Started earlier this year"

#combine tables in a dataframe ----

student_data <-rbind(comm_earlier2, comm_this_month2)

My_months <- c('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')
student_data$month <- factor(student_data$month,
                             levels = My_months, 
                             labels = My_months,
                             ordered = TRUE)

#remove the , from the count column and make it numeric
student_data$count <- str_remove_all(student_data$count, ",")
student_data$count <- as.numeric(student_data$count)

#set ggplot theme
theme_set(  theme_minimal())
#assign colours for the bar graph
colour_scheme3 <- c("#c6f2f7",  "#D9F0D3", "#A6DBA0" ,"#E7D4E8" , "#C2A5CF" )




# Define UI: introduction text, year selector, choropleth map and stacked bar graph
ui <- fluidPage(fluidRow(column (7, 
    h3("Exploring International Students in Australia's Higher Education Sector"),
    p("Education is now Australia's third largest export, the sector contributes over $37 billion to the economy (van Onselen, 2019),
    and has been growing at a rapid rate for more than 5 years (Tehan, 2019)."),     
    p("Where are Australia's international students in Higher Education arriving from, and when do they arrive in Australia?")),
   (column (4,
    # Create a drop down list, called "select"
    selectInput("select", label = h3("Please select your year of interest"), 
                choices = list("2016" = 2016, "2017" = 2017, "2018" = 2018, "2019" = 2019), 
                selected = 2016)
    ))),
    p(  "Use the drop down box on the right to select a year, and then explore the map to see where 
      international students arrive into Australia from. The graph below details when higher education students 
      started studying in Australia."),
    hr(),

        mainPanel(
          width = 12, 
            fluidRow(column (6,             
                             h3("Most Higher Education students are arriving from China and India"),
           plotlyOutput("mapPlot"),
           tags$div("Data source:",
           tags$a(href="https://internationaleducation.gov.au/research/International-Student-Data/Pages/InternationalStudentData2019.aspx",
           "Department of Education, Skills and Employment. (2020). International Student Data 2019"),
           br(),
           tags$a(href="https://en.wikipedia.org/w/index.php?title=ISO_3166-1",
                  "Wikipedia. (2020). ISO3166-1"))),
        column (6,
        h3("Higher Education students tend to arrive in two peaks (semester 1 and semester 2)"),
        plotlyOutput("graphPlot"),
        tags$div("Data source:",
                 tags$a(href="https://internationaleducation.gov.au/research/International-Student-Data/Pages/InternationalStudentData2019.aspx",
                        "Department of Education, Skills and Employment. (2020). International Student Data 2019")))), 
        p(),
        tags$b("Interactions:"),
        p("Hover your mouse over the data or icons for more information. You can move or zoom on the map and graph. To reset the visualisation, press the icon showing an X within a square in the menu at the top right of the figure."),
    hr(),
        h4("References"),
        tags$ul(
          tags$li("Department of Education, Skills and Employment. (2020). International Student Data 2019. International Student Data 2019. https://internationaleducation.gov.au/research/International-Student-Data/Pages/InternationalStudentData2019.aspx"), 
                  tags$li("Tehan, D. (2019, November 22). International education makes significant economic contribution. Ministers’ Media Centre. https://ministers.dese.gov.au/tehan/international-education-makes-significant-economic-contribution"),
                          tags$li("van Onselen, L. (2019, November 25). Australia’s $37.6b international student export con. Australia’s $37.6b International Student Export Con. https://www.macrobusiness.com.au/2019/11/australias-37-6b-international-student-export-con/")
    )))



#working out how to do this for the map
server <- function(input, output) {
    output$value <- renderPrint ({input$select})
    
# create map
    output$mapPlot <- renderPlotly({

        # filter data to use with  map
        data <- filter(dataP, Year == as.numeric({input$select}))
        #establish the geo data for the map
        fig2 <- plot_geo(data = data)

        #declare the title for the map, and set the parameters for the base map
        fig2 <- fig2 %>% layout(
            title = paste("Student Country of Origin data for", {input$select}),
            geo = g
            )

        #add the relevant data to be displayed as a trace
        fig2 <- fig2 %>% add_trace(z = ~log,   color = ~log, colors = colours,
                                   text = ~hovertext,
                                   locations = ~ISO_3,
                                   marker = list(line = l),
                                    hovertemplate =  "%{text}"
                                   )
        fig2 <- fig2 %>% colorbar (title= "Student numbers (log)
0 = 1 student
4 = 10,000 students")
      

        #render/declare the map
        fig2
    })
    output$graphPlot <- renderPlotly({
        student_data2 <- filter(student_data, Year == as.numeric({input$select}))
        p3 <- ggplot (data=student_data2, aes( x=month, y= count/1000, fill =student_type))+
            geom_bar(stat ="identity")+
            ylab ("Enrolments (thousands)")+
            xlab ("Date")+
            scale_fill_manual(values= colour_scheme3)+
            ggtitle (paste("International student enrolments by month", {input$select}))
        p3
        
        #create plotlywrapper
        gg1<-ggplotly(p3)
        gg1
    })
}

Summary

The submitted version of this app is hosted at shinyapps.io (https://maher-au.shinyapps.io/InternationalStudents/). It is also rendered below, although the formatting has not been optimised.

Retro

Overall the visualisation comes close to what I was attempting to deliver.

Key things I explored in this assignment

Creating maps (with ggplot and plotly)
Using plotly, both as a wrapper for a ggplot image and for a map
Creating a Shiny app
Creating and publishing to web-hosted platforms (plotly and shinyapps.io)

On the map things I would improve with additional time & skill include:

The scale: I used a log scale to give better colour resolution, but this can be difficult to interpret. With more time, I would have shown the actual numbers (10, 100, 1,000) on the scale rather than the log values (1, 2, 3). I included two reference points in the scale title to help people unfamiliar with log scales interpret the results. I also used the ‘true student number’ rather than the log student number in the hover text, in an attempt not to mislead viewers.
The hover text: Currently it is created as a column in the data frame and then ‘fed’ into the hovertemplate as a single field. There should be a way to ‘feed’ each piece of information into the hover text using %{}, but I couldn’t find a way to get it working.
The Plotly icons: These overlap with key information (title etc.) on the graph; in future I would hope to avoid this.
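The scale and hover text items above might be approached along the following lines. This is a minimal sketch, not the app's actual code: `sample_data` is a hypothetical stand-in for the contents of internationalstudentnumbers.csv, and `selected_year` is fixed where the app would use `input$select`. It assumes that plotly's `colorbar()` passes `tickvals` and `ticktext` through to the colour bar, and that `customdata` can carry the raw count into the hovertemplate so that no pre-built hover column is needed.

```r
library(plotly)

# Hypothetical sample rows in the same shape as internationalstudentnumbers.csv
sample_data <- data.frame(
  Country = c("China", "India", "Nepal"),
  ISO_3   = c("CHN", "IND", "NPL"),
  Count   = c(50000, 30000, 9000)
)
sample_data$log <- log10(sample_data$Count + 1)

# Place ticks at whole powers of ten, but label them with the real numbers
tick_positions <- 0:5
tick_labels <- formatC(10^tick_positions, format = "d", big.mark = ",")

selected_year <- 2019  # assumption: in the app this would come from input$select

fig <- plot_geo(sample_data) %>%
  add_trace(
    type = "choropleth",
    z = ~log,
    color = ~log,
    locations = ~ISO_3,
    text = ~Country,
    customdata = ~Count,  # raw student number, used only in the hover text
    hovertemplate = paste0(
      "In ", selected_year, "<br>",
      "%{customdata:,} students arrived from<br>",
      "<b>%{text}</b><extra></extra>"
    )
  ) %>%
  colorbar(title = "Student numbers", tickvals = tick_positions, ticktext = tick_labels)

fig
```

Because the plotted value is log10(Count + 1), the tick labels are approximate at the low end of the scale: a value of 0 corresponds to zero students rather than one.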

On the student number graph, things I would improve in the future include:

Hovertext: Update to include more useful descriptions. Currently it is autogenerated, although it does support interpretation of the graphic.
Onscreen layout: To get the references onto a single page, I had to use an unordered list. This introduced bullets, which could be removed with further coding.
Text width: The text in the shinyapps.io app spans the whole page width, which can be difficult to read. Next time I will look at optimising the column widths further.
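The two layout items could probably be handled with inline CSS, as in the sketch below. The style values (`list-style-type: none`, a `max-width` of 45em) are illustrative assumptions rather than tested settings for this app, and the list entries are shortened versions of the app's references.

```r
library(shiny)

# Reference list without bullets: inline CSS removes the markers and the indent
references <- tags$ul(
  style = "list-style-type: none; padding-left: 0;",
  tags$li("Department of Education, Skills and Employment. (2020). International Student Data 2019."),
  tags$li("Tehan, D. (2019, November 22). International education makes significant economic contribution.")
)

# Cap the line length of body text so it does not span the full page width
intro_text <- tags$div(
  style = "max-width: 45em;",
  p("Where are Australia's international students in Higher Education arriving from, and when do they arrive in Australia?")
)

# Both objects can be placed inside fluidPage() like any other UI element
ui_sketch <- fluidPage(intro_text, references)
```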

With respect to data, or expanding this analysis in the future, I would consider:

Including data from other places (Canada, US etc.), for example from https://internationaleducation.gov.au/research/otherinternationaldata/pages/international-education-data-sources.aspx or http://uis.unesco.org/
Including more years
Connecting the monthly data from the graph to the map, to allow selection and navigation across both images
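If more years were added to the data file, the year selector would not have to be edited by hand. A minimal sketch, assuming the data keeps a numeric `Year` column as in `dataP`; `sample_years` is a hypothetical stand-in for that column:

```r
library(shiny)

# Hypothetical stand-in for dataP$Year, including a year the app does not have yet
sample_years <- c(2018, 2016, 2019, 2017, 2016, 2020)

# Unique years in ascending order become the drop-down choices
available_years <- sort(unique(sample_years))

year_input <- selectInput(
  "select",
  label = h3("Please select your year of interest"),
  choices = available_years,
  selected = min(available_years)
)
```

The server code would not need to change, since it already filters on `input$select`.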

Data Story

Sheryl Maher (Student Number: s3869791)
