Introduction

The consumption of non-alcoholic beverages is a significant aspect of dietary habits, with substantial implications for public health. Understanding consumption patterns provides insight into nutritional intake and helps inform public health strategies. In Australia, apparent consumption data describe the average intake of various non-alcoholic beverages based on sales from the food retail sector, excluding restaurants and fast-food outlets. These data are crucial for understanding how dietary trends evolve over time and what their health impacts may be. The primary question driving this investigation is: “What are the trends and patterns in the apparent consumption of non-alcoholic beverages in Australia for the financial years 2018-19 to 2022-23?” This investigation analyzes these consumption patterns, focusing on per capita intake, changes over time, and the nutritional implications of different types of non-alcoholic beverages. By examining beverage groups such as soft drinks, packaged water, fruit and vegetable juices, and energy drinks, the analysis seeks to identify significant increases or decreases in consumption and the potential health impacts associated with them.

Key concepts and terms:

Apparent Consumption: This refers to the amount of food and non-alcoholic beverages purchased from the food retail sector, providing an estimate of average dietary intake.
Per Capita Consumption: This metric represents the average amount of a particular beverage consumed per person.
Non-Alcoholic Beverages: This category includes soft drinks, packaged water, fruit and vegetable juices, energy drinks, electrolyte drinks, and cordials.

Introduction Cont.

Background

Recent data indicates that in the financial year 2022-23, the per capita apparent consumption of non-alcoholic beverages in Australia was 4533 mL per day, a decrease from 4636 mL per day in the previous year. This decline follows consecutive year-on-year increases between 2018-19 and 2021-22, yet the 2022-23 figure still represents an 11.25% increase from 2018-19 levels.
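
The year-on-year change quoted above can be reproduced with a few lines of R. This is a minimal sketch: `pct_change` is a hypothetical helper, and the 2018-19 baseline is only back-calculated from the quoted 11.25% rise, so it is a derived value rather than a published figure.

```r
# Percentage change from an earlier value to a later one (hypothetical helper)
pct_change <- function(old, new) (new - old) / old * 100

# Figures quoted in the text (mL per day, per capita)
prev_year <- 4636   # 2021-22
latest    <- 4533   # 2022-23

# Change between 2021-22 and 2022-23, in percent
drop_pct <- pct_change(prev_year, latest)
print(round(drop_pct, 2))

# Baseline implied by the quoted 11.25% rise since 2018-19 (derived, not published)
implied_2018_19 <- latest / (1 + 11.25 / 100)
print(round(implied_2018_19))
```

The decline between the last two financial years works out to roughly 2.2%.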

This investigation will explore these trends in detail, analyzing how different beverage types contribute to overall consumption and identifying the shifts in consumer preferences. Additionally, the study will assess the nutritional implications, particularly focusing on the intake of added and free sugars, which are critical for public health considerations.

By providing a comprehensive analysis of non-alcoholic beverage consumption in Australia, this project aims to inform public health policies and strategies to promote healthier dietary habits.

Problem Statement

The overall problem driving this investigation is to understand the trends and patterns in the apparent consumption of non-alcoholic beverages in Australia for the financial years 2018-19 to 2022-23. Specifically, the investigation aims to analyze the changes in per capita consumption of various types of non-alcoholic beverages, identify significant increases or decreases, and assess the nutritional implications, particularly the intake of added and free sugars.

Statistical Approach

To solve this problem and answer the question, the analysis will employ a comprehensive statistical approach using the provided dataset. The key steps in this analysis will include:

Data Cleaning and Preparation: Ensure the dataset is clean and properly formatted for analysis. This includes handling missing values, ensuring consistency in data entries, and appropriately categorizing beverage types.
Descriptive Statistics: Calculate summary statistics such as mean, median, and standard deviation for the consumption of each type of beverage across different years, and generate visualizations (e.g., bar charts, box plots) to illustrate consumption patterns over time.
Trend Analysis: Examine year-on-year changes in consumption for each beverage type to identify significant trends.
Comparative Analysis: Compare the proportions of sugar-sweetened and intense-sweetened beverages consumed over the years. Analyze the shifts in consumption patterns between different types of beverages, such as soft drinks, packaged water, fruit and vegetable juices, energy drinks, and electrolyte drinks.
Nutritional Implications: Assess the contribution of different beverage types to the total intake of free sugars, calculate the proportion of dietary energy derived from free sugars, and compare it with the World Health Organisation’s recommendations.
Interpretation and Reporting: Interpret the results of the descriptive, trend, and hypothesis-testing steps and report what they imply for the research question.
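
The nutritional step listed above compares the share of dietary energy from free sugars with the WHO recommendation of less than 10% of total energy intake. The sketch below shows one way that comparison could be computed. The function name, the 17 kJ per gram energy factor, and both input values are assumptions for illustration and are not taken from the dataset.

```r
# Share of daily energy that comes from free sugars, in percent.
# kj_per_g = 17 is an assumed energy factor for sugars (kJ per gram).
free_sugar_energy_share <- function(sugar_g, total_energy_kj, kj_per_g = 17) {
  sugar_g * kj_per_g / total_energy_kj * 100
}

# Hypothetical inputs for illustration only (not values from the dataset)
sugar_g         <- 50     # grams of free sugars per day
total_energy_kj <- 8700   # daily energy intake in kJ

share <- free_sugar_energy_share(sugar_g, total_energy_kj)

# WHO recommendation: free sugars below 10% of total energy intake
who_limit   <- 10
exceeds_who <- share >= who_limit

print(round(share, 2))
print(exceeds_who)
```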

Data

Data Source

The data used in this project was sourced from the Australian Bureau of Statistics and other reputable organizations. The main dataset, which includes information on the apparent consumption of non-alcoholic beverages in Australia for the financial years 2018-19 to 2022-23, was obtained from the Food and Agriculture Organization (FAO) database. Additional references and guidelines were used to provide context and support the analysis.

Data Collection Process

Primary Dataset

The primary dataset containing information on non-alcoholic beverage consumption was downloaded from the FAO database. This dataset includes monthly and annual data on various beverage groups such as soft drinks, packaged water, fruit and vegetable juices, energy drinks, electrolyte drinks, and cordials. - [Apparent Consumption of Selected Foodstuffs, Australia](https://www.abs.gov.au/statistics/health/health-conditions-and-risks/apparent-consumption-selected-foodstuffs-australia/2022-23#data-downloads)

Supplementary Data Sources

Australian Bureau of Statistics (ABS)

National Health and Medical Research Council (NHMRC)

Food Standards Australia New Zealand (FSANZ)

Determining the amount of added sugars and free sugars in foods listed in the AUSNUT 2011-13 dataset

Data Cont.

Data Replication

To replicate the data collection process, follow these steps:

Access the FAO Database

Navigate to the FAO website and locate the data section for non-alcoholic beverage consumption.
Download the dataset for the relevant financial years.

Access Supplementary Data Sources

Visit the websites of the ABS and FSANZ to access the additional data and guidelines referenced in this study.
Download or record the necessary information related to consumer price indexes, retail trade, dietary guidelines, and nutrient reference values.

Sampling Method

The dataset primarily uses apparent consumption data, which is derived from sales of non-alcoholic beverages in the food retail sector. The sampling method involves collecting data from major supermarkets, smaller outlets such as convenience stores, butchers, seafood shops, bakeries, delis, and fresh food markets. It does not include data from fast food outlets, cafes, restaurants, or institutions that source from non-supermarket suppliers. This approach provides an estimate of the average dietary intake of non-alcoholic beverages based on retail sales.

Reference to Open Data

The primary data source for this analysis is open data provided by the FAO, supplemented by additional publicly accessible data from the ABS, NHMRC, FSANZ, and WHO. These sources offer comprehensive and reliable information necessary for conducting the analysis and drawing meaningful conclusions.

Data Cont.

Data Overview

The dataset contains the following columns:

Month_date: The date representing the month and year of the data.
Month_year: A numeric representation of the month and year.
month: The month number.
financial_year: The financial year the data belongs to.
Beverage_groups: The group of beverages (e.g., soft drinks, fruit juices).
Group_type: The categorization of the beverage group as major or minor.
Type_of_sweetness: The type of sweetness, if applicable.
Unit: The unit of measurement (e.g., mL).
Amount: The amount consumed.
Proportion: The proportion of the total consumption.

The analysis will proceed with a detailed examination of this dataset to uncover the consumption trends and insights.
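
To illustrate how the Group_type column relates Major and Minor rows, the sketch below uses the first four rows as they appear in the glimpse() and str() output in the preprocessing section. It assumes that the three Minor rows following "Fruit and vegetable juices" are its sub-groups, which the printed values support but the dataset description does not state explicitly.

```r
# First four rows as printed in the preprocessing output
first_rows <- data.frame(
  Beverage_groups = c("Fruit and vegetable juices", "Fruit juices",
                      "Vegetable juices", "Fruit and vegetable juice blends"),
  Group_type = c("Major", "Minor", "Minor", "Minor"),
  Amount = c(31.9, 29.9, 0.9, 1.1),
  stringsAsFactors = FALSE
)

# Total of the Major row versus the sum of the Minor rows beneath it
major_total <- sum(first_rows$Amount[first_rows$Group_type == "Major"])
minor_total <- sum(first_rows$Amount[first_rows$Group_type == "Minor"])

# all.equal() tolerates floating-point rounding in the decimal sums
matches <- isTRUE(all.equal(major_total, minor_total))
print(matches)
```

If the same relationship holds for the other groups, summing Major and Minor rows together would double count consumption, which is why the two are kept apart in the analysis.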

Data Cont.

Data Preprocessing

Data Cleaning and Preparation

The dataset needs to be clean and properly formatted for analysis. The following steps will be taken:

Load the Data: Load the dataset from the file.

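# Load the packages used throughout the analysis
library(readr)    # read_csv()
library(dplyr)    # %>%, filter(), mutate(), group_by(), summarise()
library(ggplot2)  # plots

# Load the dataset
data <- read_csv("H:/RMIT/1st Sem/Advance Analytics/assingment_2/FAO_Data.csv")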

# Separate the dataframe into major and minor groups
df_major <- data %>%
  filter(Group_type == "Major")

df_minor <- data %>%
  filter(Group_type == "Minor")

# Display the first few rows of the dataset
head(data)

Dimensions, column names, and data types: Check the dimensions, column names, and data types of the dataset to understand its structure.

data %>% glimpse()

## Rows: 1,320
## Columns: 10
## $ Month_date        <chr> "7/1/2018", "7/1/2018", "7/1/2018", "7/1/2018", "7/1…
## $ Month_year        <dbl> 201807, 201807, 201807, 201807, 201807, 201807, 2018…
## $ month             <dbl> 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7…
## $ financial_year    <chr> "2018-19", "2018-19", "2018-19", "2018-19", "2018-19…
## $ Beverage_groups   <chr> "Fruit and vegetable juices", "Fruit juices", "Veget…
## $ Group_type        <chr> "Major", "Minor", "Minor", "Minor", "Major", "Minor"…
## $ Type_of_sweetness <chr> NA, NA, NA, NA, NA, NA, NA, NA, "Sugar sweetened", "…
## $ Unit              <chr> "mL", "mL", "mL", "mL", "mL", "mL", "mL", "mL", "mL"…
## $ Amount            <dbl> 31.9, 29.9, 0.9, 1.1, 16.8, 16.6, 0.1, 5.1, 4.2, 0.8…
## $ Proportion        <dbl> 11.1, 10.4, 0.3, 0.4, 5.8, 5.8, 0.0, 1.8, 1.5, 0.3, …

Handling Missing Values: Identify and handle any missing values in the dataset.

colSums(is.na(data))

##        Month_date        Month_year             month    financial_year 
##                 0                 0                 0                 0 
##   Beverage_groups        Group_type Type_of_sweetness              Unit 
##                 0                 0               840                 0 
##            Amount        Proportion 
##                 0                 0

The type of sweetness is not applicable to all beverage groups, which accounts for the 840 missing values in the Type_of_sweetness column. All other columns are complete.
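
The size of the missing-value pattern can be put in proportion with simple arithmetic on the counts reported above. This sketch assumes that every month contributes the same number of rows, which is consistent with the 60 monthly observations per major group reported later but is not verified here.

```r
n_rows    <- 1320    # rows reported by glimpse()
n_missing <- 840     # NA count in Type_of_sweetness from colSums(is.na(data))
n_months  <- 5 * 12  # five financial years of monthly data

# Share of rows without a sweetness type, in percent
missing_share <- n_missing / n_rows * 100

# Rows per month, and how many of them carry no sweetness type
rows_per_month <- n_rows / n_months
na_per_month   <- n_missing / n_months

print(round(missing_share, 1))
print(c(rows_per_month = rows_per_month, na_per_month = na_per_month))

# On the real data, the groups behind the NAs could be listed with, for example:
# data %>% count(Beverage_groups, is_na = is.na(Type_of_sweetness))
```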

Converting Month_date to Date data type:

# Convert the Month_date column to date datatype
data <- data %>%
  mutate(Month_date = as.Date(Month_date, format = "%m/%d/%Y"))

# Display the structure of the dataframe to confirm the change
str(data)

## tibble [1,320 × 10] (S3: tbl_df/tbl/data.frame)
##  $ Month_date       : Date[1:1320], format: "2018-07-01" "2018-07-01" ...
##  $ Month_year       : num [1:1320] 201807 201807 201807 201807 201807 ...
##  $ month            : num [1:1320] 7 7 7 7 7 7 7 7 7 7 ...
##  $ financial_year   : chr [1:1320] "2018-19" "2018-19" "2018-19" "2018-19" ...
##  $ Beverage_groups  : chr [1:1320] "Fruit and vegetable juices" "Fruit juices" "Vegetable juices" "Fruit and vegetable juice blends" ...
##  $ Group_type       : chr [1:1320] "Major" "Minor" "Minor" "Minor" ...
##  $ Type_of_sweetness: chr [1:1320] NA NA NA NA ...
##  $ Unit             : chr [1:1320] "mL" "mL" "mL" "mL" ...
##  $ Amount           : num [1:1320] 31.9 29.9 0.9 1.1 16.8 16.6 0.1 5.1 4.2 0.8 ...
##  $ Proportion       : num [1:1320] 11.1 10.4 0.3 0.4 5.8 5.8 0 1.8 1.5 0.3 ...
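
As a small check on the date conversion, the sketch below parses a few date strings in the same month/day/year format and rebuilds a Month_year-style key and a financial-year label from them. The helper `financial_year_label` is hypothetical and assumes the Australian financial year running from July to June. The second and third date strings are illustrative rather than copied from the dataset.

```r
# Parse strings in the same format as the Month_date column
month_date <- as.Date(c("7/1/2018", "6/1/2019", "12/1/2022"), format = "%m/%d/%Y")

# Month_year-style key (YYYYMM) rebuilt from the parsed date
month_key <- as.numeric(format(month_date, "%Y%m"))

# Hypothetical helper: financial year label, assuming a July-to-June year
financial_year_label <- function(d) {
  yr <- as.integer(format(d, "%Y"))
  mo <- as.integer(format(d, "%m"))
  start <- ifelse(mo >= 7L, yr, yr - 1L)
  sprintf("%d-%02d", start, (start + 1L) %% 100L)
}

fy <- financial_year_label(month_date)
print(data.frame(month_date, month_key, fy))
```

Comparing such derived values with the existing Month_year and financial_year columns is one way to confirm that the format string parsed month and day in the intended order.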

Descriptive Statistics and Visualisation

The “Major” group dataframe is particularly significant as it implies the inclusion of all the Minor group values. Therefore, for a comprehensive summary, we are considering only the “Major” group values.

# Summary statistics of the monthly amounts for each major beverage group
table1 <- df_major %>%
  group_by(Beverage_groups) %>%
  summarise(
    Min = min(Amount, na.rm = TRUE),
    Q1 = quantile(Amount, probs = .25, na.rm = TRUE),
    Median = median(Amount, na.rm = TRUE),
    Q3 = quantile(Amount, probs = .75, na.rm = TRUE),
    Max = max(Amount, na.rm = TRUE),
    Mean = mean(Amount, na.rm = TRUE),
    SD = sd(Amount, na.rm = TRUE),
    n = n(),
    Missing = sum(is.na(Amount))
  )
knitr::kable(table1)

Beverage_groups	Min	Q1	Median	Q3	Max	Mean	SD	n
Cordials	4.8	5.500	6.15	7.100	9.0	6.326667	0.9931744	60
Electrolyte drinks	4.5	6.100	7.80	9.125	12.8	7.796667	2.0071876	60
Energy drinks	7.4	8.875	11.20	11.800	13.4	10.461667	1.6741395	60
Fruit and vegetable drinks	14.0	15.975	16.80	18.425	20.1	17.198333	1.6078717	60
Fruit and vegetable juices	31.0	32.675	34.10	36.100	40.5	34.666667	2.5404969	60
Packaged water	79.4	103.400	118.90	137.125	178.8	120.243333	25.3440286	60
Soft drinks	135.7	151.675	165.95	175.500	223.7	166.016667	20.1040865	60

Central Tendency: The median and mean values provide insights into the typical consumption amounts for each beverage group. For most groups, the median and mean values are relatively close, indicating a symmetrical distribution of consumption values.
Variability: Standard deviation values highlight the variability in consumption. “Packaged water” and “Soft drinks” have the largest standard deviations, meaning their monthly per capita amounts fluctuate most in absolute terms.
Range: The minimum and maximum values indicate the spread of monthly amounts within each group. “Soft drinks” and “Packaged water” show the widest ranges.
Quartiles: The first (Q1) and third (Q3) quartiles provide further details on the distribution. Groups like “Energy Drinks” and “Fruit and Vegetable Juices” show concentrated consumption patterns with closer Q1 and Q3 values.
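
Because the beverage groups differ greatly in scale, standard deviations alone can overstate the variability of the large groups. One common complement is the coefficient of variation (SD divided by mean). The sketch below computes it from the Mean and SD values in the summary table above. The choice of this measure is an addition for illustration, not part of the original analysis.

```r
# Mean and SD copied from the summary table above
summary_stats <- data.frame(
  Beverage_groups = c("Cordials", "Electrolyte drinks", "Energy drinks",
                      "Fruit and vegetable drinks", "Fruit and vegetable juices",
                      "Packaged water", "Soft drinks"),
  Mean = c(6.326667, 7.796667, 10.461667, 17.198333,
           34.666667, 120.243333, 166.016667),
  SD = c(0.9931744, 2.0071876, 1.6741395, 1.6078717,
         2.5404969, 25.3440286, 20.1040865),
  stringsAsFactors = FALSE
)

# Coefficient of variation: variability relative to the group's own mean
summary_stats$CV_pct <- summary_stats$SD / summary_stats$Mean * 100

# Rank groups from most to least variable in relative terms
ranked <- summary_stats[order(-summary_stats$CV_pct), ]
print(ranked, digits = 3, row.names = FALSE)
```

On these figures, electrolyte drinks vary most relative to their mean (about 26%), followed by packaged water (about 21%), while soft drinks come in at about 12%.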

Descriptive Statistics Cont.

The box plot below shows the distribution of consumption amounts for different beverage groups.

ggplot(df_major, aes(x = Beverage_groups, y = Amount, fill = Beverage_groups)) +
  geom_boxplot() + theme_minimal() +
  labs(title = "Distribution of Beverage Consumption by Group",
    x = "Beverage Group",
    y = "Amount Consumed (mL)",
    fill = "Beverage Group") + theme(axis.text.x = element_text(angle = 45, hjust = 1))

Consumption Patterns: The plot shows distinct consumption patterns among beverage groups. Cordials, electrolyte drinks, and energy drinks have lower, consistent consumption. In contrast, fruit and vegetable juices, packaged water, and soft drinks have higher, more variable consumption.
Median Consumption: Soft drinks and packaged water have the highest median consumption, indicating that they are the most consumed beverages in terms of quantity.
Variability: Packaged water and soft drinks have the highest variability, indicating the largest month-to-month variation in per capita consumption.
Outliers: The outliers in the soft drinks category are months in which per capita consumption was markedly higher than usual. Because the data are monthly per capita averages, they do not describe individual consumers.
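
Box plots conventionally flag points that lie more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule to the quartiles in the summary table. `iqr_fences` is a hypothetical helper, and geom_boxplot() computes its hinges slightly differently from quantile(), so the fences are approximate.

```r
# Hypothetical helper: lower and upper fences under the 1.5 x IQR rule
iqr_fences <- function(q1, q3, k = 1.5) {
  iqr <- q3 - q1
  c(lower = q1 - k * iqr, upper = q3 + k * iqr)
}

# Quartiles taken from the summary table
soft  <- iqr_fences(q1 = 151.675, q3 = 175.500)   # Soft drinks
water <- iqr_fences(q1 = 103.400, q3 = 137.125)   # Packaged water

# Compare each group's maximum monthly value with its upper fence
soft_max_is_outlier  <- 223.7 > soft[["upper"]]
water_max_is_outlier <- 178.8 > water[["upper"]]

print(soft)
print(water)
print(c(soft = soft_max_is_outlier, water = water_max_is_outlier))
```

Under this rule the soft drinks maximum of 223.7 lies above its upper fence, while the packaged water maximum of 178.8 does not, which agrees with outliers appearing only for soft drinks.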

Descriptive Statistics Cont.

Total per capita consumption of each major beverage group across the financial years:

ggplot(df_major, aes(x = financial_year, y = Amount, fill = Beverage_groups)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  labs(title = "Total Per Capita Consumption by Beverage Group",
    x = "Financial Year",
    y = "Amount Consumed (mL)",
    fill = "Beverage Group")

All beverage groups except soft drinks and packaged water show stable consumption across the five financial years, suggesting steady demand for them.
Stable consumption trends for most beverage groups suggest consistent consumer preferences across financial years.
Packaged water consumption was consistent until 2020-21, then increased suddenly in 2021-22, followed by a slight decrease in 2022-23.
Soft drinks consumption increased until 2020-21, then showed a decreasing trend, indicating a shift in consumer behavior.
These trends provide valuable insights for marketing, product development, and strategic planning in the beverage industry.
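
Since df_major holds twelve monthly rows for each beverage group and financial year, a dodged bar chart drawn with stat = "identity" may overdraw those rows, leaving only the tallest month visible. One alternative is to aggregate to one value per group and year before plotting. The sketch below shows that aggregation on toy data with hypothetical amounts. Using the yearly mean as the summary is an assumption, not the original method.

```r
# Toy monthly data with hypothetical amounts (three months per group and year)
toy <- data.frame(
  financial_year  = rep(c("2018-19", "2022-23"), each = 6),
  Beverage_groups = rep(rep(c("Soft drinks", "Packaged water"), each = 3), times = 2),
  Amount = c(150, 160, 170, 100, 110, 120,
             155, 165, 175, 120, 130, 140),
  stringsAsFactors = FALSE
)

# One row per financial year and beverage group: the mean monthly amount
yearly_mean <- aggregate(Amount ~ financial_year + Beverage_groups,
                         data = toy, FUN = mean)

# For comparison: the maximum month, which overlapping bars would display
yearly_max <- aggregate(Amount ~ financial_year + Beverage_groups,
                        data = toy, FUN = max)

print(yearly_mean)
print(yearly_max)

# The aggregated table could then be plotted with, for example:
# ggplot(yearly_mean, aes(financial_year, Amount, fill = Beverage_groups)) +
#   geom_col(position = "dodge")
```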

Hypothesis Testing

Problem Statement

To determine if there is a significant difference in the mean per capita consumption of non-alcoholic beverages between 2018-19 and 2022-23, we perform a two-sample t-test.

Hypotheses

The null and alternative hypotheses for this test are:

Null Hypothesis (\(H_0\)): There is no significant difference in the average amount of non-alcoholic beverage consumption between 2018-19 and 2022-23. Mathematically, \[H_0: \mu_{2018-19} = \mu_{2022-23} \]
Alternative Hypothesis (\(H_A\)): There is a significant difference in the average amount of non-alcoholic beverage consumption between 2018-19 and 2022-23. Mathematically, \[H_A: \mu_{2018-19} \ne \mu_{2022-23}\]

Assumptions

The samples from each year are independent.
The data is approximately normally distributed.
The variances of the two populations are equal.

Hypothesis Testing Cont.

The two-sample t-test is chosen for this analysis because it compares the means of two independent groups, here the consumption data for the financial years 2018-19 and 2022-23. The classic form of the test assumes that the data are approximately normally distributed and that the variances of the two groups are equal, which can be checked with tests such as Shapiro-Wilk and Levene’s test. The test is robust to minor deviations from these assumptions, particularly with larger sample sizes. If the variances are unequal, Welch’s t-test is the appropriate alternative. Note that R’s t.test() applies Welch’s correction by default (var.equal = FALSE), so the result reported in this section is a Welch two-sample t-test, which does not require equal variances.

# Filter the data for the two financial years
data_2018_19 <- filter(df_major, financial_year == "2018-19")
data_2022_23 <- filter(df_major, financial_year == "2022-23")
# Perform a two-sample t-test (t.test() defaults to Welch's test, var.equal = FALSE)
t_test_result <- t.test(data_2018_19$Amount, data_2022_23$Amount)
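
The assumption checks mentioned above can be sketched as follows. The example runs on simulated data rather than the real dataset, with 84 observations per year (7 major groups times 12 months) and a hypothetical spread. It uses var.test() from base R as a stand-in for Levene’s test, which requires an additional package.

```r
set.seed(42)  # reproducible simulated data, not the real dataset

# Simulated stand-ins for the two years (means near those reported, sd assumed)
x <- rnorm(84, mean = 48.5, sd = 60)
y <- rnorm(84, mean = 54.0, sd = 60)

# Normality check for each sample (Shapiro-Wilk)
normality_x <- shapiro.test(x)
normality_y <- shapiro.test(y)

# Equality of variances (F test, used here instead of Levene's test)
variance_check <- var.test(x, y)

# Welch test (R's default) versus the classic pooled-variance test
welch  <- t.test(x, y)
pooled <- t.test(x, y, var.equal = TRUE)

print(c(shapiro_x = normality_x$p.value, shapiro_y = normality_y$p.value,
        var_test = variance_check$p.value))
print(c(welch_df = unname(welch$parameter), pooled_df = unname(pooled$parameter)))
```

The pooled test always has n1 + n2 - 2 = 166 degrees of freedom for two samples of 84, whereas Welch’s test estimates a slightly smaller, non-integer value from the sample variances.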

Sum of Squared Differences

Calculate the sum of squared differences to understand the variability in the data.

\[S = \sum^n_{i = 1}d^2_i\]

# Calculate differences for paired observations (if applicable)
# For demonstration, assuming we have paired data (this is just an example, adjust as needed)
differences <- data_2018_19$Amount - data_2022_23$Amount

# Calculate sum of squared differences (S)
S <- sum(differences^2)

Hypothesis Testing Cont.

# Summary of the t-test result
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  data_2018_19$Amount and data_2022_23$Amount
## t = -0.58615, df = 164.57, p-value = 0.5586
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -23.83975  12.92546
## sample estimates:
## mean of x mean of y 
##  48.51071  53.96786

# Sum of squared difference
print(S)

## [1] 14336.6

The t-value of -0.58615 is the difference between the two sample means divided by its estimated standard error. Because R applied Welch’s correction, the test uses 164.57 degrees of freedom. The p-value of 0.5586 is considerably higher than the conventional alpha level of 0.05, so the result is not statistically significant: the data provide no evidence of a difference in the mean per capita consumption of non-alcoholic beverages between the financial years 2018-19 and 2022-23.
The 95% confidence interval for the difference in means ranges from -23.84 to 12.93. This interval implies that we are 95% confident that the true difference in means lies within this range. Since the interval includes zero, it further supports the conclusion that there is no significant difference in mean consumption between the two years.
In conclusion, we fail to reject the null hypothesis because of the high p-value (p > 0.05). The sample estimates indicate that the mean consumption for 2018-19 is approximately 48.51 mL, while for 2022-23 it is about 53.97 mL, but this observed difference is not statistically significant.
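
The reported p-value and confidence interval can be cross-checked from the printed t statistic, degrees of freedom, and sample means. This is a sketch based only on the rounded values in the output above, so small discrepancies in the last decimals are expected.

```r
# Values printed by t.test() above
t_stat <- -0.58615
df     <- 164.57
mean_x <- 48.51071   # 2018-19
mean_y <- 53.96786   # 2022-23

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# Standard error recovered from t = difference / standard error
diff <- mean_x - mean_y
se   <- diff / t_stat

# 95% confidence interval for the difference in means
ci <- diff + c(-1, 1) * qt(0.975, df) * se

print(round(p_value, 4))
print(round(ci, 2))
```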

Hypothesis Testing Cont.

Categorical Association Analysis

For this analysis, let’s investigate whether there is an association between the type of beverage group and the financial year (2018-19 and 2022-23).

Hypotheses

Null Hypothesis (\(H_0\)): There is no association between the type of beverage groups and the financial years.
Alternative Hypothesis (\(H_A\)): There is an association between the type of beverage groups and the financial years.

# Filter data for the financial years 2018-19 and 2022-23
data_filtered <- df_major %>% filter(financial_year %in% c("2018-19", "2022-23"))

# Create a contingency table for the chi-square test
contingency_table <- table(data_filtered$Beverage_groups, data_filtered$financial_year)

# Perform the chi-square test of independence
chi_square_test <- chisq.test(contingency_table)

# Print the chi-square test result
chi_square_test

## 
##  Pearson's Chi-squared test
## 
## data:  contingency_table
## X-squared = 0, df = 6, p-value = 1

Chi-Squared Value: A chi-squared value of 0 indicates that the observed and expected frequencies in the contingency table are identical. Note that table() counts records here, not volumes: each beverage group contributes 12 monthly records to each financial year, so every cell of the table holds the same count.
p-value: The p-value of 1 is much higher than the typical alpha level of 0.05, indicating that there is no statistical evidence to reject the null hypothesis.
Given the p-value of 1, we fail to reject the null hypothesis: there is no significant association between the type of beverage group and the financial year (2018-19 and 2022-23). Because the table counts records rather than amounts consumed, this result mainly reflects the balanced structure of the dataset. It should not be read as evidence that the amounts consumed of each beverage type stayed the same between the two years.
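
The reason the statistic is exactly 0 can be reproduced without the dataset. The sketch below builds a table with 12 records in every cell, as table() would produce for seven major groups observed monthly in two financial years, and runs the same test. The matrix is constructed by hand for illustration.

```r
groups <- c("Cordials", "Electrolyte drinks", "Energy drinks",
            "Fruit and vegetable drinks", "Fruit and vegetable juices",
            "Packaged water", "Soft drinks")

# 12 monthly records per beverage group in each financial year
balanced <- matrix(12, nrow = 7, ncol = 2,
                   dimnames = list(groups, c("2018-19", "2022-23")))

result <- chisq.test(balanced)

# Expected counts equal observed counts in every cell, so the statistic is 0
print(result$expected[1:2, ])
print(c(statistic = unname(result$statistic),
        df = unname(result$parameter),
        p_value = result$p.value))
```

Any dataset with the same number of rows per group and year yields this result, whatever the amounts recorded in those rows.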

Discussion

Beverage-Specific Trends

Packaged Water: Remained steady until 2020-21, rose sharply in 2021-22, then decreased slightly in 2022-23.
Soft Drinks: Increased until 2020-21, followed by a decline, indicating a shift in consumer preferences.
Other Beverages: Categories such as energy drinks and fruit juices maintained stable consumption levels.

Overall Consumption Trends

From 2018-19 to 2021-22, there was a noticeable rise in the per capita consumption of non-alcoholic beverages. However, in 2022-23, a slight decline occurred, primarily attributed to decreased consumption of packaged water and soft drinks. Despite this reduction, the overall consumption level in 2022-23 remains higher than that of 2018-19.

Statistical Analysis

Two-Sample t-Test: No significant difference in mean consumption between 2018-19 and 2022-23.
Chi-Square Test: No significant association between beverage types and financial years. Because the test was run on record counts, it says little about changes in the volumes consumed.

Discussion Cont.

Strengths and Limitations

Strengths

Comprehensive Dataset: Covers multiple years and various beverage types, offering a broad perspective.
Robust Analysis: Utilized detailed statistical methods to ensure reliable results.

Limitations

Exclusion of Non-Retail Data: Missing consumption data from restaurants and fast-food outlets.
Data Accuracy: Sales data may not perfectly reflect actual consumption due to factors like wastage.

Directions for Future Investigations

Inclusive Data Collection: Incorporate non-retail consumption data for a more comprehensive analysis.
Longitudinal Studies: Examine long-term trends and factors influencing beverage consumption.
Nutritional Analysis: Focus on the nutritional content, especially added sugars, to understand health impacts better.

Conclusion

The investigation reveals stable yet notable trends in non-alcoholic beverage consumption in Australia, highlighting a general increase over the years with minor fluctuations. The findings underscore the importance of comprehensive data collection and detailed nutritional analysis to inform public health strategies effectively.

References

Wickham, H., François, R., Henry, L., & Müller, K. (2021). dplyr: A grammar of data manipulation (Version 1.1.4) [Computer software]. Retrieved from https://cran.r-project.org/package=dplyr
Wickham, H. (2023). ggplot2 (Version 3.5.0) [R package]. Accessed on June 26, 2023. Available online at: https://ggplot2.tidyverse.org/
Australian Bureau of Statistics. (2022-23). Apparent Consumption of Selected Foodstuffs, Australia. ABS. https://www.abs.gov.au/statistics/health/health-conditions-and-risks/apparent-consumption-selected-foodstuffs-australia/latest-release#cite-window1
OpenAI. (2023). ChatGPT (May 24 version) [Large language model]. Retrieved from https://chat.openai.com

Apparent Consumption of Non-Alcoholic Beverages in Australia (2018-23)

Analysis of Per Capita Consumption Patterns and Nutritional Implications
