Apparent Consumption of Non-Alcoholic Beverages in Australia (2018-23)

Analysis of Per Capita Consumption Patterns and Nutritional Implications

Puneethkrishna Basavaiah (s4051957) and Rahul Ramesh (s4029852)

Last updated: 09 June, 2024

Introduction

Non-alcoholic beverage consumption is a significant part of dietary habits and has substantial implications for public health. Understanding consumption patterns provides insight into nutritional intake and helps inform public health strategies. In Australia, apparent consumption data estimates the average intake of non-alcoholic beverages from sales in the food retail sector, excluding restaurants and fast-food outlets. This makes it useful for tracking how dietary trends evolve over time and what health impacts they may have.

The primary question driving this investigation is: “What are the trends and patterns in the apparent consumption of non-alcoholic beverages in Australia for the financial years 2018-19 to 2022-23?” The investigation analyzes per capita intake, changes over time, and the nutritional implications of different types of non-alcoholic beverages. By examining beverage groups such as soft drinks, packaged water, fruit and vegetable juices, and energy drinks, the analysis seeks to identify significant increases or decreases in consumption and the potential health impacts associated with them.

Key concepts and terms:

Introduction Cont.

Background

Recent data indicates that in the financial year 2022-23, the per capita apparent consumption of non-alcoholic beverages in Australia was 4533 mL per day, a decrease from 4636 mL per day in the previous year. This decline follows consecutive year-on-year increases between 2018-19 and 2021-22, yet the 2022-23 figure still represents an 11.25% increase from 2018-19 levels.
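As a quick check on the arithmetic above, the year-on-year change and the 2018-19 level implied by the 11.25% figure can be computed directly. This is a minimal sketch in base R; the variable names are illustrative, and the 2018-19 value is back-calculated from the quoted percentage rather than read from the source data.

```r
# Figures quoted in the text (per capita apparent consumption, mL)
consumption_2021_22 <- 4636
consumption_2022_23 <- 4533

# Year-on-year percentage change from 2021-22 to 2022-23
pct_change <- (consumption_2022_23 - consumption_2021_22) / consumption_2021_22 * 100

# 2018-19 level implied by the stated 11.25% increase
# (assumption: back-calculated from the quoted percentage, not an observed value)
implied_2018_19 <- consumption_2022_23 / (1 + 11.25 / 100)

round(pct_change, 2)
round(implied_2018_19)
```

The same two-line pattern applies to any pair of years once the annual totals are available.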


This investigation will explore these trends in detail, analyzing how different beverage types contribute to overall consumption and identifying the shifts in consumer preferences. Additionally, the study will assess the nutritional implications, particularly focusing on the intake of added and free sugars, which are critical for public health considerations.

By providing a comprehensive analysis of non-alcoholic beverage consumption in Australia, this project aims to inform public health policies and strategies to promote healthier dietary habits.

Problem Statement

The problem driving this investigation is to understand the trends and patterns in the apparent consumption of non-alcoholic beverages in Australia across the financial years 2018-19 to 2022-23. Specifically, the investigation aims to analyze changes in per capita consumption of various types of non-alcoholic beverages, identify significant increases or decreases, and assess the nutritional implications, with a particular focus on the intake of added and free sugars.

Statistical Approach

To solve this problem and answer the question, the analysis will employ a comprehensive statistical approach using the provided dataset. The key steps in this analysis will include:

Data

Data Source

The data used in this project was sourced from the Australian Bureau of Statistics and other reputable organizations. The main dataset, which includes information on the apparent consumption of non-alcoholic beverages in Australia for the financial years 2018-19 to 2022-23, was obtained from the Food and Agriculture Organization (FAO) database. Additional references and guidelines were used to provide context and support the analysis.

Data Collection Process

Primary Dataset

The primary dataset containing information on non-alcoholic beverage consumption was downloaded from the FAO database. This dataset includes monthly and annual data on various beverage groups such as soft drinks, packaged water, fruit and vegetable juices, energy drinks, electrolyte drinks, and cordials.

- [Apparent Consumption of Selected Foodstuffs, Australia](https://www.abs.gov.au/statistics/health/health-conditions-and-risks/apparent-consumption-selected-foodstuffs-australia/2022-23#data-downloads)

Supplementary Data Sources

Australian Bureau of Statistics (ABS)
National Health and Medical Research Council (NHMRC)
Food Standards Australia New Zealand (FSANZ)

Data Cont.

Data Replication

To replicate the data collection process, follow these steps:

Access the FAO Database

  1. Navigate to the FAO website and locate the data section for non-alcoholic beverage consumption.
  2. Download the dataset for the relevant financial years.

Access Supplementary Data Sources

  1. Visit the websites of the ABS, NHMRC, and FSANZ to access the additional data and guidelines referenced in this study.
  2. Download or record the necessary information related to consumer price indexes, retail trade, dietary guidelines, and nutrient reference values.

Sampling Method

The dataset uses apparent consumption data, which is derived from sales of non-alcoholic beverages in the food retail sector. Sales are collected from major supermarkets and from smaller outlets such as convenience stores, butchers, seafood shops, bakeries, delis, and fresh food markets. Fast food outlets, cafes, restaurants, and institutions that source from non-supermarket suppliers are not included. The result is an estimate of the average dietary intake of non-alcoholic beverages based on retail sales, not a direct measure of what individuals drink.

Reference to Open Data

The primary data source for this analysis is open data provided by the FAO, supplemented by additional publicly accessible data from the ABS, NHMRC, FSANZ, and WHO. These sources offer comprehensive and reliable information necessary for conducting the analysis and drawing meaningful conclusions.

Data Cont.

Data Overview

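The dataset contains the following 10 columns:

  - Month_date: the month as a date string, e.g. "7/1/2018"
  - Month_year: year and month as a number, e.g. 201807
  - month: month number, e.g. 7
  - financial_year: financial year label, e.g. "2018-19"
  - Beverage_groups: beverage group name, e.g. "Fruit and vegetable juices"
  - Group_type: "Major" or "Minor"
  - Type_of_sweetness: sweetness category, e.g. "Sugar sweetened"; missing where it does not apply
  - Unit: unit of measurement, e.g. "mL"
  - Amount: numeric amount consumed, e.g. 31.9
  - Proportion: numeric proportion, e.g. 11.1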

The analysis will proceed with a detailed examination of this dataset to uncover the consumption trends and insights.

Data Cont.

Data Preprocessing

Data Cleaning and Preparation

The dataset needs to be clean and properly formatted for analysis. The following steps will be taken:

  1. Load the Data: Load the dataset from the file.
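# Load the packages used throughout the analysis
library(readr)    # read_csv()
library(dplyr)    # filter(), mutate(), group_by(), summarise()
library(ggplot2)  # plots

# Load the dataset
data <- read_csv("H:/RMIT/1st Sem/Advance Analytics/assingment_2/FAO_Data.csv")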

# Separate the dataframe into major and minor groups
df_major <- data %>%
  filter(Group_type == "Major")

df_minor <- data %>%
  filter(Group_type == "Minor")

# Display the first few rows of the dataset
head(data)
  2. Dimensions, column names, and data types: Check the dimensions, column names, and data types of the dataset to understand its structure.
data %>% glimpse()
## Rows: 1,320
## Columns: 10
## $ Month_date        <chr> "7/1/2018", "7/1/2018", "7/1/2018", "7/1/2018", "7/1…
## $ Month_year        <dbl> 201807, 201807, 201807, 201807, 201807, 201807, 2018…
## $ month             <dbl> 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7…
## $ financial_year    <chr> "2018-19", "2018-19", "2018-19", "2018-19", "2018-19…
## $ Beverage_groups   <chr> "Fruit and vegetable juices", "Fruit juices", "Veget…
## $ Group_type        <chr> "Major", "Minor", "Minor", "Minor", "Major", "Minor"…
## $ Type_of_sweetness <chr> NA, NA, NA, NA, NA, NA, NA, NA, "Sugar sweetened", "…
## $ Unit              <chr> "mL", "mL", "mL", "mL", "mL", "mL", "mL", "mL", "mL"…
## $ Amount            <dbl> 31.9, 29.9, 0.9, 1.1, 16.8, 16.6, 0.1, 5.1, 4.2, 0.8…
## $ Proportion        <dbl> 11.1, 10.4, 0.3, 0.4, 5.8, 5.8, 0.0, 1.8, 1.5, 0.3, …
  3. Handling Missing Values: Identify and handle any missing values in the dataset.
colSums(is.na(data))
##        Month_date        Month_year             month    financial_year 
##                 0                 0                 0                 0 
##   Beverage_groups        Group_type Type_of_sweetness              Unit 
##                 0                 0               840                 0 
##            Amount        Proportion 
##                 0                 0

Sweetness type does not apply to every beverage group, so the 840 missing values in Type_of_sweetness (out of 1,320 rows) are expected. They are left as NA; no other column has missing values.
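To confirm that the missing values are tied to specific beverage groups and are not scattered at random, the NA count can be broken down by group. The sketch below uses a small toy data frame, not the real dataset; its values, including the "Not sugar sweetened" label, are hypothetical and only illustrate the pattern.

```r
# Toy data standing in for the real dataset (all values are illustrative)
toy <- data.frame(
  Beverage_groups   = c("Soft drinks", "Soft drinks", "Packaged water", "Fruit juices"),
  Type_of_sweetness = c("Sugar sweetened", "Not sugar sweetened", NA, NA),
  stringsAsFactors  = FALSE
)

# Count missing sweetness labels within each beverage group
na_by_group <- tapply(is.na(toy$Type_of_sweetness), toy$Beverage_groups, sum)
na_by_group
```

Applied to the real data, the same call with `data` in place of `toy` would show which groups account for the 840 missing values.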

  4. Converting Month_date to the Date data type:
# Convert the Month_date column to date datatype
data <- data %>%
  mutate(Month_date = as.Date(Month_date, format = "%m/%d/%Y"))

# Display the structure of the dataframe to confirm the change
str(data)
## tibble [1,320 × 10] (S3: tbl_df/tbl/data.frame)
##  $ Month_date       : Date[1:1320], format: "2018-07-01" "2018-07-01" ...
##  $ Month_year       : num [1:1320] 201807 201807 201807 201807 201807 ...
##  $ month            : num [1:1320] 7 7 7 7 7 7 7 7 7 7 ...
##  $ financial_year   : chr [1:1320] "2018-19" "2018-19" "2018-19" "2018-19" ...
##  $ Beverage_groups  : chr [1:1320] "Fruit and vegetable juices" "Fruit juices" "Vegetable juices" "Fruit and vegetable juice blends" ...
##  $ Group_type       : chr [1:1320] "Major" "Minor" "Minor" "Minor" ...
##  $ Type_of_sweetness: chr [1:1320] NA NA NA NA ...
##  $ Unit             : chr [1:1320] "mL" "mL" "mL" "mL" ...
##  $ Amount           : num [1:1320] 31.9 29.9 0.9 1.1 16.8 16.6 0.1 5.1 4.2 0.8 ...
##  $ Proportion       : num [1:1320] 11.1 10.4 0.3 0.4 5.8 5.8 0 1.8 1.5 0.3 ...
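The format string matters here because a value such as "7/1/2018" is ambiguous. The short base R sketch below shows how the same string parses under a month-first and a day-first format, and what happens when the string does not match the format at all. The variable names are illustrative.

```r
raw <- "7/1/2018"

# Month-first format, as used in the conversion above: 1 July 2018
as_month_first <- as.Date(raw, format = "%m/%d/%Y")

# Day-first format would silently give a different date: 7 January 2018
as_day_first <- as.Date(raw, format = "%d/%m/%Y")

format(as_month_first, "%Y-%m-%d")
format(as_day_first, "%Y-%m-%d")

# A string that does not match the format yields NA, not an error
mismatch <- as.Date("2018-07-01", format = "%m/%d/%Y")
is.na(mismatch)
```

Since the first rows of a financial year starting in July parse to "2018-07-01" in the output above, the month-first format is consistent with the data.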

Descriptive Statistics and Visualisation

Each “Major” group value is the total of its “Minor” subgroups, so the Major rows already include all of the Minor values. To avoid double counting, the summary below uses only the “Major” group values.
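The relationship between Major and Minor rows can be checked against the first rows printed by `glimpse()` and `str()` earlier, where "Fruit and vegetable juices" (Major) is followed by three Minor subgroups. This is a minimal base R check using those four printed values only; it does not verify every group.

```r
# First Major value and its Minor subgroups, copied from the printed output above
major_total <- 31.9
minor_parts <- c(
  "Fruit juices" = 29.9,
  "Vegetable juices" = 0.9,
  "Fruit and vegetable juice blends" = 1.1
)

# all.equal() compares with a tolerance, avoiding floating-point false negatives
isTRUE(all.equal(sum(minor_parts), major_total))
```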

# Summary statistics of Amount for each major beverage group
table1 <- df_major %>%
  group_by(Beverage_groups) %>%
  summarise(
    Min = min(Amount, na.rm = TRUE),
    Q1 = quantile(Amount, probs = 0.25, na.rm = TRUE),
    Median = median(Amount, na.rm = TRUE),
    Q3 = quantile(Amount, probs = 0.75, na.rm = TRUE),
    Max = max(Amount, na.rm = TRUE),
    Mean = mean(Amount, na.rm = TRUE),
    SD = sd(Amount, na.rm = TRUE),
    n = n(),
    Missing = sum(is.na(Amount))
  )
knitr::kable(table1)
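| Beverage_groups            |   Min |      Q1 | Median |      Q3 |   Max |       Mean |         SD |  n | Missing |
|:---------------------------|------:|--------:|-------:|--------:|------:|-----------:|-----------:|---:|--------:|
| Cordials                   |   4.8 |   5.500 |   6.15 |   7.100 |   9.0 |   6.326667 |  0.9931744 | 60 |       0 |
| Electrolyte drinks         |   4.5 |   6.100 |   7.80 |   9.125 |  12.8 |   7.796667 |  2.0071876 | 60 |       0 |
| Energy drinks              |   7.4 |   8.875 |  11.20 |  11.800 |  13.4 |  10.461667 |  1.6741395 | 60 |       0 |
| Fruit and vegetable drinks |  14.0 |  15.975 |  16.80 |  18.425 |  20.1 |  17.198333 |  1.6078717 | 60 |       0 |
| Fruit and vegetable juices |  31.0 |  32.675 |  34.10 |  36.100 |  40.5 |  34.666667 |  2.5404969 | 60 |       0 |
| Packaged water             |  79.4 | 103.400 | 118.90 | 137.125 | 178.8 | 120.243333 | 25.3440286 | 60 |       0 |
| Soft drinks                | 135.7 | 151.675 | 165.95 | 175.500 | 223.7 | 166.016667 | 20.1040865 | 60 |       0 |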
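Because the groups differ widely in scale, the standard deviations in the summary table are hard to compare directly. One option, which is not part of the original analysis, is the coefficient of variation (SD divided by mean). The base R sketch below recomputes it from the Mean and SD columns of the table; the object names are illustrative.

```r
# Mean and SD copied from the summary table
summary_stats <- data.frame(
  Beverage_groups = c("Cordials", "Electrolyte drinks", "Energy drinks",
                      "Fruit and vegetable drinks", "Fruit and vegetable juices",
                      "Packaged water", "Soft drinks"),
  Mean = c(6.326667, 7.796667, 10.461667, 17.198333, 34.666667, 120.243333, 166.016667),
  SD   = c(0.9931744, 2.0071876, 1.6741395, 1.6078717, 2.5404969, 25.3440286, 20.1040865)
)

# Coefficient of variation: spread relative to the group's own mean
summary_stats$CV <- summary_stats$SD / summary_stats$Mean

# Order from most to least variable relative to the mean
summary_stats[order(-summary_stats$CV), c("Beverage_groups", "CV")]
```

On this relative scale, a small group can turn out to be more variable than a large one even though its SD in mL is much smaller.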

Descriptive Statistics Cont.

The box plot below shows the distribution of consumption amounts for different beverage groups.

ggplot(df_major, aes(x = Beverage_groups, y = Amount, fill = Beverage_groups)) +
  geom_boxplot() + theme_minimal() +
  labs(title = "Distribution of Beverage Consumption by Group",
    x = "Beverage Group",
    y = "Amount Consumed (mL)",
    fill = "Beverage Group") + theme(axis.text.x = element_text(angle = 45, hjust = 1))

Descriptive Statistics Cont.

Total per capita consumption of each major beverage group across the financial years:

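# Sum the monthly values within each financial year before plotting.
# Plotting the raw monthly rows with position = "dodge" would draw them on top
# of each other, so each bar would show the largest month, not the total.
df_major %>%
  group_by(financial_year, Beverage_groups) %>%
  summarise(Total = sum(Amount), .groups = "drop") %>%
  ggplot(aes(x = financial_year, y = Total, fill = Beverage_groups)) +
  geom_col(position = "dodge") +
  theme_minimal() +
  labs(title = "Total Per Capita Consumption by Beverage Group",
    x = "Financial Year",
    y = "Amount Consumed (mL)",
    fill = "Beverage Group")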

Hypothesis Testing

Problem Statement

To determine if there is a significant difference in the mean per capita consumption of non-alcoholic beverages between 2018-19 and 2022-23, we perform a two-sample t-test.

Hypotheses

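The null and alternative hypotheses for this test are:

\[H_0: \mu_{2018\text{-}19} = \mu_{2022\text{-}23}\]

\[H_A: \mu_{2018\text{-}19} \neq \mu_{2022\text{-}23}\]

where \(\mu\) denotes the mean per capita consumption in the given financial year.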

Assumptions

  1. The samples from each year are independent.

  2. The data is approximately normally distributed.

  3. The variances of the two populations need not be equal: t.test() applies Welch's correction by default, which does not assume equal variances.
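The assumptions above can be examined before running the test. The sketch below shows one common way to do so in base R, using a Shapiro-Wilk test for normality and an F test for equality of variances. It runs on synthetic right-skewed data of the same size as each year's sample (84 values, i.e. 7 major groups by 12 months); the numbers are invented and say nothing about the real dataset.

```r
# Synthetic stand-ins for the two years (assumption: illustrative values only)
set.seed(1)
x <- rlnorm(84, meanlog = 3.0, sdlog = 1)
y <- rlnorm(84, meanlog = 3.1, sdlog = 1)

# Shapiro-Wilk test: a small p-value suggests departure from normality
normality_x <- shapiro.test(x)
normality_y <- shapiro.test(y)

# F test comparing the two variances: a small p-value suggests unequal variances
variance_check <- var.test(x, y)

c(shapiro_x = normality_x$p.value,
  shapiro_y = normality_y$p.value,
  f_test    = variance_check$p.value)
```

With the real data, `data_2018_19$Amount` and `data_2022_23$Amount` would take the place of `x` and `y`. Because those vectors pool beverage groups of very different sizes, a histogram or Q-Q plot is a useful complement to the formal tests.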

Hypothesis Testing Cont.

# Filter the data for the two financial years
data_2018_19 <- filter(df_major, financial_year == "2018-19")
data_2022_23 <- filter(df_major, financial_year == "2022-23")
# Perform a two-sample t-test
t_test_result <- t.test(data_2018_19$Amount, data_2022_23$Amount)

Sum of Squared Differences

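The sum of squared differences, S, summarises how far the 2018-19 and 2022-23 observations differ when matched row by row, where \(d_i\) is the difference between the i-th pair of observations and \(n\) is the number of pairs.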

\[S = \sum^n_{i = 1}d^2_i\]

# Row-wise differences between the two years.
# Assumes both subsets have the same length and row order (same beverage group
# and month in each row); otherwise the pairing is not meaningful.
differences <- data_2018_19$Amount - data_2022_23$Amount

# Calculate sum of squared differences (S)
S <- sum(differences^2)

Hypothesis Testing Cont.

# Summary of the t-test result
print(t_test_result)
## 
##  Welch Two Sample t-test
## 
## data:  data_2018_19$Amount and data_2022_23$Amount
## t = -0.58615, df = 164.57, p-value = 0.5586
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -23.83975  12.92546
## sample estimates:
## mean of x mean of y 
##  48.51071  53.96786
# Sum of squared difference
print(S)
## [1] 14336.6
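The output is labelled "Welch Two Sample t-test" and reports a fractional df because `t.test()` does not pool the variances by default. The sketch below reproduces the Welch statistic and the Welch-Satterthwaite degrees of freedom by hand and compares them with `t.test()`. It uses synthetic data with 84 values per group; the means and standard deviations are invented for illustration and are not the real observations.

```r
# Synthetic samples (assumption: illustrative values, not the report's data)
set.seed(42)
x <- rnorm(84, mean = 48.5, sd = 60)
y <- rnorm(84, mean = 54.0, sd = 65)

# Per-sample variance of the mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means over the unpooled standard error
t_manual <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Compare with the built-in test
fit <- t.test(x, y)
all.equal(unname(fit$statistic), t_manual)
all.equal(unname(fit$parameter), df_manual)
```

Passing `var.equal = TRUE` to `t.test()` would instead give the pooled-variance Student test with integer degrees of freedom.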

Hypothesis Testing Cont.

Categorical Association Analysis

For this analysis, we investigate whether there is an association between beverage group and financial year (2018-19 and 2022-23).

Hypotheses

- Null Hypothesis (H_0): There is no association between the type of beverage group and the financial year.
- Alternative Hypothesis (H_A): There is an association between the type of beverage group and the financial year.

# Filter data for the financial years 2018-19 and 2022-23
data_filtered <- df_major %>% filter(financial_year %in% c("2018-19", "2022-23"))

# Create a contingency table for the chi-square test
contingency_table <- table(data_filtered$Beverage_groups, data_filtered$financial_year)

# Perform the chi-square test of independence
chi_square_test <- chisq.test(contingency_table)

# Print the chi-square test result
chi_square_test
## 
##  Pearson's Chi-squared test
## 
## data:  contingency_table
## X-squared = 0, df = 6, p-value = 1
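One likely reason for a statistic of exactly 0 is that `table()` counts rows and ignores the Amount column. The sketch below rebuilds only the row structure of the filtered data, assuming 7 major groups with 12 monthly rows per group per year (as implied by n = 60 per group over five years in the summary table). It does not use the real data.

```r
# Row structure only: 7 groups x 2 years x 12 months (no consumption amounts)
groups <- c("Cordials", "Electrolyte drinks", "Energy drinks",
            "Fruit and vegetable drinks", "Fruit and vegetable juices",
            "Packaged water", "Soft drinks")
years <- c("2018-19", "2022-23")
toy <- expand.grid(Beverage_groups = groups, financial_year = years,
                   month = 1:12, stringsAsFactors = FALSE)

# Every cell holds the same count, so observed and expected counts coincide
tab <- table(toy$Beverage_groups, toy$financial_year)
tab

res <- chisq.test(tab)
res$statistic
res$parameter
res$p.value
```

Under that assumption the test reflects the balanced layout of the dataset and not any change in consumption. Comparing amounts across groups and years would need a method that uses the Amount column, such as a two-way ANOVA.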

Discussion

From 2018-19 to 2021-22, there was a noticeable rise in the per capita consumption of non-alcoholic beverages. However, in 2022-23, a slight decline occurred, primarily attributed to decreased consumption of packaged water and soft drinks. Despite this reduction, the overall consumption level in 2022-23 remains higher than that of 2018-19.

Statistical Analysis

The Welch two-sample t-test gave t = -0.586 (df = 164.57, p = 0.559), so the difference between the 2018-19 mean (48.51 mL) and the 2022-23 mean (53.97 mL) is not statistically significant at the 5% level; the 95% confidence interval (-23.84 to 12.93) includes zero. The chi-square test returned X-squared = 0 (df = 6, p = 1), providing no evidence of an association between beverage group and financial year.

Discussion cont

Strengths and Limitations

Strengths

Limitations

Directions for Future Investigations

Conclusion

The investigation reveals stable yet notable trends in non-alcoholic beverage consumption in Australia, highlighting a general increase over the years with minor fluctuations. The findings underscore the importance of comprehensive data collection and detailed nutritional analysis to inform public health strategies effectively.

References

  1. Wickham, H., François, R., Henry, L., & Müller, K. (2021). dplyr: A grammar of data manipulation (Version 1.1.4) [Computer software]. Retrieved from https://cran.r-project.org/package=dplyr

  2. Wickham, H. (2023). ggplot2 (Version 3.5.0) [R package]. Accessed on June 26, 2023. Available online at: https://ggplot2.tidyverse.org/

  3. Australian Bureau of Statistics. (2022-23). Apparent Consumption of Selected Foodstuffs, Australia. ABS. https://www.abs.gov.au/statistics/health/health-conditions-and-risks/apparent-consumption-selected-foodstuffs-australia/latest-release#cite-window1

  4. OpenAI. (2023). ChatGPT (May 24 version) [Large language model]. Retrieved from https://chat.openai.com