MATH2270 Data Visualisation and Communication

Introduction

The purpose of the Australian Bureau of Statistics (ABS) is to inform Australia’s important decisions, including those on health conditions and risks, by partnering and innovating to deliver relevant, trusted and objective data, statistics and insights, including through the surveys it conducts. Each year the ABS releases Apparent Consumption of Alcohol, Australia, which reports the annual volume of alcohol available for consumption by type of alcoholic beverage. It is the main statistic for the overall trend in apparent consumption in Australia, a trend that has been driven by large shifts in the types of alcoholic beverage consumed.

Objective

By drawing insights from open data published by the Australian Bureau of Statistics, this data story provides information about the apparent consumption of alcoholic beverages across Australia.

Australian health authorities analyse the health conditions and risks of the population and produce statistics on topics such as overweight and obesity, smoking and apparent consumption of alcohol. The apparent consumption data shows the recent trend for each beverage, that is, whether its consumption is rising, steady or falling. The available litres of each beverage type (beer, wine, spirits and cider), the total consumption and the adjusted total per capita can all be read from this dataset, which is a key health statistic for Australia.

Targeted Audience: This data story is aimed at the following groups.

Australian news and press reporters who report on drinking trends and patterns and publish articles based on the data released by the Australian Bureau of Statistics.

People who monitor the health conditions and risks of the Australian population, including members of the general public.

The National Health Survey and SA Health authorities, who use these statistics to analyse people’s drinking habits, with a particular focus on the drinking patterns of younger people.

The Australian Institute of Health and Welfare and the World Health Organization, which refer to alcohol consumption overviews to inform their decisions.

DATA SOURCE

Apparent Consumption of Alcohol, Australia, 2017-18. (2019, September 9). Australian Bureau of Statistics. https://www.abs.gov.au/statistics/health/health-conditions-and-risks/apparent-consumption-alcohol-australia/latest-release#data-download

URL

STORY URL

This data story relates to the articles below from The Guardian, which explain in detail the alcohol consumption patterns of Australians:

Smithers, R. (2020, March 11). Young drinkers’ thirst for no and low-alcohol beer sets new trend. The Guardian. https://www.theguardian.com/food/2020/mar/11/young-drinkers-thirst-for-no--and-low-alcohol-beer-sets-new-trend
Reporter, G. S. (2018, September 3). Australians drinking less alcohol now than any time in past 50 years. The Guardian. https://www.theguardian.com/australia-news/2018/sep/03/australians-drinking-less-alcohol-now-than-any-time-in-past-50-years
Hunt, E. (2017, September 20). Almost half of young Australian adults binge-drink every month, report says. The Guardian. https://www.theguardian.com/australia-news/2016/oct/07/almost-half-of-young-australian-adults-binge-drink-every-month-report-says

VISUALISATION URL

https://geenageorge.shinyapps.io/DataViz/

RPUBS URL

https://rpubs.com/Geena/684333

CODE

Data preprocessing is important because it can significantly influence the statistical conclusions drawn from the data. Preprocessing minimises the garbage that gets into the analysis, so that the visualisations and models give the best possible results. Below are the data processing steps, followed by the raw code used for the Shiny app data visualisation.

# This is the R chunk for the required packages

#Import the necessary packages
library(readr) # Useful for importing data
library(knitr) # Useful for creating nice tables
library(readxl) # Useful for reading data from Excel files
library(dplyr) # Useful for data manipulation
library(tidyr) # Useful for tidying data
library(Hmisc) # Useful for many data analysis operations
library(lubridate) # Useful for working with dates and times
library(tidyverse) # Loads the core tidyverse packages together
library(shiny) # Useful for building web applications
library(plotly) # Useful for interactive graphics
library(rapportools) # Miscellaneous helper functions
library(ggplot2) # Useful for creating graphics
library(rsconnect) # Useful for deploying Shiny apps
library(foreign) # Useful for importing SPSS, SAS, STATA etc. data files

GET: Importing Data with R

# This is the R chunk for the Data Section

#Read the Excel data from the eighth sheet, skipping the first six rows, which hold the description
DataSet <- read_excel("Data_20172018.xls", col_names = FALSE, sheet = 8, skip = 6)

#View the first rows of the data object using the head() function
head(DataSet)

DATA PREPROCESSING

#Subsetting the required data 
df <- subset(DataSet, select = -c(2:7))

#Removing the last ten rows, which hold the data source information
df <- head(df, -10)

#Assigning the column names
colnames(df) <- c("Year", "Total Beer", "Total Wine", "Total Spirits and RTDs", "Total Cider", "Total Consumption","Total Per Capita")

#Data Cleaning - Changing the year range to Year
df$Year <-  substr(df$Year, 1 , nchar(df$Year)-3)
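The Year column holds financial-year ranges, and the substr() call above keeps everything except the last three characters. A minimal sketch on made-up labels shows the effect; the vector `year_labels` is illustrative and assumes the "YYYY-YY" pattern, and the second approach is only an alternative, not the method used in this report:

```r
# Illustrative financial-year labels, assumed to follow the "YYYY-YY" pattern
year_labels <- c("1960-61", "1999-00", "2017-18")

# Approach used above: drop the trailing "-YY" (the last three characters)
start_year <- substr(year_labels, 1, nchar(year_labels) - 3)

# Alternative: keep the first four characters, whatever follows them
start_year_alt <- substr(year_labels, 1, 4)

print(start_year)
print(identical(start_year, start_year_alt))
```

Both approaches agree as long as every label has the same length; the second would be more forgiving if a label carried trailing text such as a footnote marker.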

UNDERSTAND: Understanding Data and Data Structures

In the Understand step, the structure of the data is checked using the str() function. Based on the result, the columns are converted to appropriate data types so that they plot correctly.

#Checking the data structure
str(df)

#Data type conversions: converting every column to numeric
df$Year <- as.numeric(df$Year)
#Total Beer is converted as well; this changes nothing if it was already read as numeric
df$`Total Beer` <- as.numeric(df$`Total Beer`)
df$`Total Wine` <- as.numeric(df$`Total Wine`)
df$`Total Spirits and RTDs` <- as.numeric(df$`Total Spirits and RTDs`)
df$`Total Cider` <- as.numeric(df$`Total Cider`)
df$`Total Consumption` <- as.numeric(df$`Total Consumption`)
df$`Total Per Capita` <- as.numeric(df$`Total Per Capita`)
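When a column read as text contains a non-numeric marker, as.numeric() turns that entry into NA and raises a coercion warning, which is how unavailable figures become the missing values handled in the next step. A small sketch on a toy vector; the marker "na" and the values are assumptions for illustration, not taken from the workbook:

```r
# Toy column as it might be read from a spreadsheet; "na" is an assumed marker
raw_values <- c("4.31", "na", "3.87")

# as.numeric() converts the numbers and turns unparseable text into NA;
# suppressWarnings() hides the "NAs introduced by coercion" warning
converted <- suppressWarnings(as.numeric(raw_values))

str(converted)
print(sum(is.na(converted)))
```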

SCAN: Missing Values

The colSums() function, applied to the result of is.na(), gives the number of NA values in each column. Here the NA values mark data that is “not applicable or not available”, so they are replaced with zero. After this replacement no missing values remain in the data, so no further replacement or imputation is needed to proceed with this dataset.

#Scan each column to check whether there are any missing values
colSums(is.na(df))

#Replace the missing values with zero, so that not available values are represented as zero
df[is.na(df)] <- 0
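The counting and replacement steps above can be tried on a small example. This is a sketch only: the data frame `toy` and its values are invented, with cider left missing in the early years to mimic a series that starts late:

```r
# Toy data frame; the values are invented for illustration
toy <- data.frame(
  Year  = c(2002, 2003, 2004),
  Beer  = c(4.6, 4.5, 4.4),
  Cider = c(NA, NA, 0.1)
)

# Count the missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Replace the missing values with zero, as in the step above
toy_zero <- toy
toy_zero[is.na(toy_zero)] <- 0
print(toy_zero)
```

A zero produced this way means "not recorded" rather than "nothing consumed", which is why footnotes on when each series starts matter when the values are plotted.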

SCAN: Outliers

An outlier is an observation that lies far away from most of the other observations. As part of the Scan step in data wrangling, the data is checked for outliers, which need to be identified and, where appropriate, removed so that they do not distort the visualisation.

#Using the summary() function to display the quartiles, median and mean and to check for outliers
summary(df)

After analysing the results, no outliers are present in our data.
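The summary() output is read by eye. As a complementary check, a common rule of thumb flags values lying more than 1.5 times the interquartile range beyond the quartiles, and base R applies it through boxplot.stats(). The sketch below uses an invented vector with one planted extreme value; it is not the check performed in this report:

```r
# Invented values; 25.0 is planted as an obvious outlier
values <- c(9.5, 9.7, 10.1, 10.3, 10.8, 13.1, 25.0)

# boxplot.stats() returns the points beyond 1.5 x IQR from the hinges in $out
flagged <- boxplot.stats(values)$out

print(flagged)
```

An empty result from this rule on each numeric column would support the conclusion that no outliers are present.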

TIDY AND MANIPULATE: Tidy Data Principles and Manipulating Data

As per the Tidy Data principles, each variable must have its own column, each observation must have its own row, and each value must have its own cell; together these make a dataset tidy. Checked against these rules, the column names here are values rather than variables. The gather() function is therefore used to gather the beverage type columns into a new pair of variables: the key Alcohol_Type, which holds the former column names, and the value Volume(Litres), which holds the volume in litres.

#Tidying the data: converting from wide to long format
df1 <- df %>% gather(key = 'Alcohol_Type', value = 'Volume(Litres)', -Year)

#Converting to factors and reordering the factors in Alcohol_Type column
df1$Alcohol_Type <- factor(df1$Alcohol_Type, levels = c("Total Beer", "Total Wine", "Total Spirits and RTDs", "Total Cider", "Total Consumption","Total Per Capita"), ordered = TRUE)
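For readers on a recent tidyr, the same wide-to-long reshaping can also be written with pivot_longer(), which the tidyr documentation recommends over gather() for new code. A minimal sketch on an invented two-row table; the data frame `wide` and its numbers are assumptions for illustration:

```r
library(tidyr)

# Toy wide table; the numbers are invented for illustration
wide <- data.frame(
  Year = c(2016, 2017),
  `Total Beer` = c(3.7, 3.7),
  `Total Wine` = c(3.6, 3.7),
  check.names = FALSE  # keep the spaces in the column names
)

# Every column except Year becomes a (name, value) pair
long <- pivot_longer(
  wide,
  cols = -Year,
  names_to = "Alcohol_Type",
  values_to = "Volume(Litres)"
)

print(long)
```

The result has the same columns as the gather() output; only the row order differs (by year rather than by beverage), which does not affect the plots.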

R Shiny App - Data Visualisation

After the above data processing steps, the data is free of errors and outliers and can be used for data visualisation. The R Shiny app has two components: a user interface object and a server function. The source for both of these components is listed below. The user interface is defined as follows:

ui <- fluidPage(
  titlePanel("Apparent Consumption of Alcohol in Australia"),
  #Dashboard visualisation by giving the title and layout
  sidebarLayout(
    #SliderInput is used to select the year in the sidebarPanel
    sidebarPanel(
      sliderInput(
        inputId = "Year",
        label = "Select the Year (1944 - 2017)",
        min = min(df1$Year),
        max = max(df1$Year),
        value = 2017, #Set the default value
        sep = "", #Show years without a thousands separator
        animate = animationOptions(interval = 1000, loop = TRUE)),
      #Suggestive text to use data visualisation
      strong("You can view the consumption of different alcoholic beverages such as beer, wine, spirits and cider, along with the total alcohol and adjusted total per capita values in litres. Press Play to see how the consumption values change across the years 1944 to 2017."),
      br(),
      br(),
      p("*Consumption of cider is recorded from the year 2004."),
      p("*The adjusted total per capita is recorded from the year 2000."),
      p("Data from: Apparent Consumption of Alcohol, Australia, 2017-18. (2019, September 9). Australian Bureau of Statistics. https://www.abs.gov.au/statistics/health/health-conditions-and-risks/apparent-consumption-alcohol-australia/latest-release#data-download")
    ),
    
    #Main panel is having three tabs
    mainPanel(tabsetPanel(
      type = "tabs",
      tabPanel("BarPlot",
               fluidRow(
                 plotOutput("BarPlot"), 
                 #Barplot denotes the apparent consumption of beverages along with total consumption values
                 br()
               )
      ),
      tabPanel("Line Graph - Total Consumption",
               plotOutput("LineGraph"),
               strong("The graph above shows the total alcohol consumption for the years 1960-2017, with a horizontal reference line at the 2017 consumption of 9.51 litres.")
      ),
      tabPanel("Line Graph - Total Per Capita",
               plotOutput("LineGraph1"),
               strong("The graph above shows the total alcohol consumption per capita for the years 1960-2017, with a horizontal reference line at the 2017 consumption of 12.43 litres.")
      )
    )
    )
  )
)

The server side of the application is shown below. It is simple in structure, with one output for each tab:

Tab 1: A bar plot showing apparent alcohol consumption by popular beverage type (beer, wine, spirits and RTDs, and cider), along with the total consumption of alcohol and the total per capita.
Tab 2: A line graph showing the total apparent consumption of alcohol for the years 1960-2017, with a horizontal reference line at the 2017 consumption of 9.51 litres.
Tab 3: A line graph showing the adjusted total per capita for the years 1960-2017, with a horizontal reference line at the 2017 consumption of 12.43 litres.

The statistics displayed are volumes in litres per person aged 15 years and over. Cider consumption is recorded from the year 2004, and the adjusted total per capita is recorded from the year 2000.

server <- function(input, output) {
  #Function of the Bargraph
  output$BarPlot <- renderPlot({
    #Subset the data to the year selected on the slider
    Bardata <- subset(df1, Year == input$Year)
    
    #Plotting the bar graph using the ggplot function
    ggplot(data = Bardata, aes(x = Alcohol_Type, y = `Volume(Litres)`)) +
      geom_col(aes(fill = Alcohol_Type), color = "blue") +
      labs(title = "Apparent Alcohol Consumption by Popular Beverage Types", x = "Alcohol Beverage Type", y = "Volume in Litres") +
      scale_y_continuous(limits = c(0,15), expand = c(0,0)) +
      theme_bw() +
      scale_fill_manual(values = c("#003f5c", "#444e86", "#955196", "#dd5182", "#ff6e54","#ffa600"))
  })
  
  #Lineplot of Total Alcohol consumption 
  output$LineGraph <- renderPlot({
    #Subset the rows for Total Consumption before plotting the line graph
    Data1 <- subset(df1, Alcohol_Type == "Total Consumption")
    
    #Plotting the line graph of Total Consumption using ggplot
    ggplot(data = Data1, aes(x = Year, y = `Volume(Litres)`, color = `Volume(Litres)`)) + geom_line() +  
      labs(title ="Apparent Consumption of Alcohol (1960 - 2017)", x = "Year", y = "Volume in Litres") + 
      theme_bw() + 
      scale_y_continuous(limits = c(9,14),
                         expand = c(0,0)) + scale_x_continuous(limits = c(1960, 2020)) + 
      geom_hline(yintercept=9.51, linetype="longdash", color = "red")
  })
  
  #Lineplot of Total Per Capita
  output$LineGraph1 <- renderPlot({
    #Subset the rows for Total Per Capita before plotting the line graph
    Data2 <- subset(df1, Alcohol_Type == "Total Per Capita")
    
    #Plotting the line graph of Total Per Capita using ggplot
    ggplot(data = Data2, aes(x = Year, y = `Volume(Litres)`, color = `Volume(Litres)`)) + geom_line() +  
      labs(title ="Total per Capita Consumption of Alcohol (2000 - 2017)", x = "Year", y = "Volume in Litres") +
      theme_bw() + 
      scale_y_continuous(limits = c(12,14),
                         expand = c(0,0)) + scale_x_continuous(limits = c(2000, 2020)) + 
      geom_hline(yintercept=12.43, linetype="longdash", color = "red")
  })
}
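The bar plot output re-filters the long table each time the slider value changes. Pulling that filter into a plain function makes it easy to check outside Shiny. The helper `filter_year()` and the table `toy_long` below are hypothetical names with invented values, shown only as a sketch of the idea:

```r
# Hypothetical helper: return the rows of a long table for one year
filter_year <- function(data, year) {
  data[data$Year == year, , drop = FALSE]
}

# Toy long-format table; the values are invented for illustration
toy_long <- data.frame(
  Year = c(2016, 2016, 2017, 2017),
  Alcohol_Type = c("Total Beer", "Total Wine", "Total Beer", "Total Wine"),
  Volume = c(3.7, 3.6, 3.7, 3.7)
)

# Mimic the slider being set to 2017
selected <- filter_year(toy_long, 2017)
print(selected)
```

Inside the server function, such a helper could be called as filter_year(df1, input$Year) in place of the subset() call.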

DEPLOYING R SHINY APP

#The shinyApp() function creates a Shiny app object from the UI/server pair defined above
#Create and run the app
shinyApp(ui = ui, server = server)

CONCLUSION

Over the past 60 years, the per capita trend in apparent consumption of alcohol may be viewed as occurring in several phases. First was a steep increase from the early 1960s leading to the peak of 13.1 litres per capita in 1974-75. Consumption around this level was maintained until the early 1980s, when annual consumption fell consistently through to the early 1990s, and from that point hovered around 10 litres per capita for around a decade. Consumption increased once again over the period 2002-03 to 2008-09, when it reached 10.84 litres per capita. Since 2008-09, consumption has declined, reaching 9.51 litres per capita in 2017-18, only slightly higher than the 9.48 litres in 2016-17. As this data story shows, the overall trend in apparent consumption has been driven by large shifts in the types of alcoholic beverage consumed. The data visualisation presents the alcohol consumption pattern of Australia as a compelling story.
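The size of each phase can be made concrete with a short calculation. The sketch below uses only the per capita figures quoted in the conclusion; the helper `pct_change()` and the variable names are introduced here for illustration:

```r
# Per capita figures (litres) quoted in the conclusion
peak_1974_75  <- 13.1
level_2008_09 <- 10.84
level_2016_17 <- 9.48
level_2017_18 <- 9.51

# Percentage change from one level to another
pct_change <- function(from, to) (to - from) / from * 100

changes <- c(
  peak_to_2017_18    = pct_change(peak_1974_75, level_2017_18),
  `2008_09_to_2017_18` = pct_change(level_2008_09, level_2017_18),
  `2016_17_to_2017_18` = pct_change(level_2016_17, level_2017_18)
)

print(round(changes, 1))
```

On these figures, 2017-18 consumption sits roughly 27% below the 1974-75 peak and about 12% below the 2008-09 level, while the rise from 2016-17 is well under 1%.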

REFERENCES

Shiny - The basic parts of a Shiny app. (2017, June 28). Shiny from R Studio. https://shiny.rstudio.com/articles/basics.html

About the Australian Bureau of Statistics. (n.d.-b). Australian Bureau of Statistics. https://www.abs.gov.au/about