Puneethkrishna Basavaiah (s4051957) and Rahul Ramesh (s4029852)
Last updated: 09 June, 2024
The consumption of non-alcoholic beverages is a significant aspect of dietary habits and can have substantial implications for public health. Understanding these consumption patterns provides insight into nutritional intake and helps inform public health strategies. In Australia, apparent consumption data gives the average intake of various non-alcoholic beverages based on sales from the food retail sector, excluding restaurants and fast-food outlets. This data is crucial for understanding how dietary trends evolve over time and their potential health impacts.
The primary question driving this investigation is: “What are the trends and patterns in the apparent consumption of non-alcoholic beverages in Australia across the financial years 2018-19 to 2022-23?” The investigation analyzes these consumption patterns, focusing on per capita intake, changes over time, and the nutritional implications of different types of non-alcoholic beverages. By examining beverage groups such as soft drinks, packaged water, fruit and vegetable juices, energy drinks, and others, the analysis seeks to identify increases or decreases in consumption and the potential health impacts associated with them.
Recent data indicates that in the financial year 2022-23, the per capita apparent consumption of non-alcoholic beverages in Australia was 4533 mL per day, a decrease from 4636 mL per day in the previous year. This decline follows consecutive year-on-year increases between 2018-19 and 2021-22, yet the 2022-23 figure still represents an 11.25% increase from 2018-19 levels.
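The percentage figures quoted above can be checked with a few lines of R. This is a minimal sketch that uses only the two values stated in the text; the variable names are illustrative, and the 2018-19 level is back-calculated from the quoted 11.25% rise as an assumption rather than read from the dataset.

```r
# Per capita apparent consumption (mL per day) as quoted in the text
consumption_2021_22 <- 4636
consumption_2022_23 <- 4533

# Year-on-year percentage change from 2021-22 to 2022-23
pct_change <- (consumption_2022_23 - consumption_2021_22) / consumption_2021_22 * 100
cat("Change 2021-22 to 2022-23:", round(pct_change, 2), "%\n")

# Assumption: if 2022-23 sits 11.25% above 2018-19, the implied 2018-19 level is
implied_2018_19 <- consumption_2022_23 / (1 + 11.25 / 100)
cat("Implied 2018-19 level:", round(implied_2018_19, 1), "mL per day\n")
```

The same formula, (new - old) / old * 100, applies to any pair of years.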
This investigation will explore these trends in detail, analyzing how different beverage types contribute to overall consumption and identifying the shifts in consumer preferences. Additionally, the study will assess the nutritional implications, particularly focusing on the intake of added and free sugars, which are critical for public health considerations.
By providing a comprehensive analysis of non-alcoholic beverage consumption in Australia, this project aims to inform public health policies and strategies to promote healthier dietary habits.
The overall problem driving this investigation is to understand the trends and patterns in the apparent consumption of non-alcoholic beverages in Australia across the financial years 2018-19 to 2022-23. Specifically, the investigation aims to analyze the changes in per capita consumption of various types of non-alcoholic beverages, identify significant increases or decreases, and assess the nutritional implications, particularly the intake of added and free sugars.
To solve this problem and answer the question, the analysis will employ a comprehensive statistical approach using the provided dataset. The key steps in this analysis will include:
Data Cleaning and Preparation: Ensure the dataset is clean and properly formatted for analysis. This includes handling missing values, ensuring consistency in data entries, and appropriately categorizing beverage types.
Descriptive Statistics: Calculate summary statistics such as mean, median, and standard deviation for the consumption of each type of beverage across different years, and generate visualizations (e.g., bar charts, line graphs) to illustrate consumption trends over time.
Trend Analysis: Examine year-on-year changes in consumption for each beverage type to identify significant trends.
Comparative Analysis: Compare the proportions of sugar-sweetened and intense-sweetened beverages consumed over the years. Analyze the shifts in consumption patterns between different types of beverages, such as soft drinks, packaged water, fruit and vegetable juices, energy drinks, and electrolyte drinks.
Nutritional Implications: Assess the contribution of different beverage types to the total intake of free sugars, calculate the proportion of dietary energy derived from free sugars, and compare it with the World Health Organisation’s recommendations.
Interpretation and Reporting: Interpret the results of the statistical tests and summarise how they answer the research question.
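The descriptive and trend steps listed above can be sketched on a small example. The data frame below is hypothetical: it only mimics the `financial_year`, `Beverage_groups` and `Amount` columns of the dataset, and its values are invented for illustration. Base R is used here so the sketch runs without extra packages.

```r
# Hypothetical monthly records in the same shape as the dataset described
# in this report (values are illustrative only, not real consumption data)
toy <- data.frame(
  financial_year  = rep(c("2018-19", "2019-20", "2020-21"), each = 4),
  Beverage_groups = rep(c("Soft drinks", "Packaged water"), times = 6),
  Amount          = c(160, 100, 164, 104,   # 2018-19
                      170, 110, 174, 114,   # 2019-20
                      165, 120, 169, 124)   # 2020-21
)

# Descriptive step: mean amount per beverage group and financial year
annual <- aggregate(Amount ~ Beverage_groups + financial_year, data = toy, FUN = mean)

# Trend step: year-on-year percentage change within each beverage group.
# The first year of each group has no predecessor, so its change is NA.
annual <- annual[order(annual$Beverage_groups, annual$financial_year), ]
annual$yoy_pct <- ave(annual$Amount, annual$Beverage_groups,
                      FUN = function(x) c(NA, diff(x) / head(x, -1) * 100))
print(annual)
```

Sorting by group and year before calling `ave()` matters, because `diff()` compares each row with the one before it.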
The data used in this project was sourced from the Australian Bureau of Statistics and other reputable organizations. The main dataset, which includes information on the apparent consumption of non-alcoholic beverages in Australia for the financial years 2018-19 to 2022-23, was obtained from the Food and Agriculture Organization (FAO) database. Additional references and guidelines were used to provide context and support the analysis.
The primary dataset containing information on non-alcoholic beverage consumption was downloaded from the FAO database. This dataset includes monthly and annual data on various beverage groups such as soft drinks, packaged water, fruit and vegetable juices, energy drinks, electrolyte drinks, and cordials.
- [Apparent Consumption of Selected Foodstuffs, Australia](https://www.abs.gov.au/statistics/health/health-conditions-and-risks/apparent-consumption-selected-foodstuffs-australia/2022-23#data-downloads)
To replicate the data collection process, follow these steps:
The dataset primarily uses apparent consumption data, which is derived from sales of non-alcoholic beverages in the food retail sector. The sampling method involves collecting data from major supermarkets, smaller outlets such as convenience stores, butchers, seafood shops, bakeries, delis, and fresh food markets. It does not include data from fast food outlets, cafes, restaurants, or institutions that source from non-supermarket suppliers. This approach provides an estimate of the average dietary intake of non-alcoholic beverages based on retail sales.
The primary data source for this analysis is open data provided by the FAO, supplemented by additional publicly accessible data from the ABS, NHMRC, FSANZ, and WHO. These sources offer comprehensive and reliable information necessary for conducting the analysis and drawing meaningful conclusions.
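The dataset contains the following columns:
- Month_date: first day of the month, stored as text in month/day/year form
- Month_year: year and month as a number (e.g. 201807)
- month: month number
- financial_year: financial year label (e.g. "2018-19")
- Beverage_groups: name of the beverage group
- Group_type: whether the group is "Major" or "Minor"
- Type_of_sweetness: sweetness classification, missing where it does not apply
- Unit: unit of measurement (mL)
- Amount: amount consumed, in the stated unit
- Proportion: proportion value reported for the beverage group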
The analysis will proceed with a detailed examination of this dataset to uncover the consumption trends and insights.
The dataset needs to be clean and properly formatted for analysis. The following steps will be taken:
# Load the packages used throughout the analysis
library(readr)
library(dplyr)
library(ggplot2)
# Load the dataset
data <- read_csv("H:/RMIT/1st Sem/Advance Analytics/assingment_2/FAO_Data.csv")
# Separate the dataframe into major and minor groups
df_major <- data %>%
  filter(Group_type == "Major")
df_minor <- data %>%
  filter(Group_type == "Minor")
# Display the structure of the dataset
glimpse(data)
## Rows: 1,320
## Columns: 10
## $ Month_date <chr> "7/1/2018", "7/1/2018", "7/1/2018", "7/1/2018", "7/1…
## $ Month_year <dbl> 201807, 201807, 201807, 201807, 201807, 201807, 2018…
## $ month <dbl> 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7…
## $ financial_year <chr> "2018-19", "2018-19", "2018-19", "2018-19", "2018-19…
## $ Beverage_groups <chr> "Fruit and vegetable juices", "Fruit juices", "Veget…
## $ Group_type <chr> "Major", "Minor", "Minor", "Minor", "Major", "Minor"…
## $ Type_of_sweetness <chr> NA, NA, NA, NA, NA, NA, NA, NA, "Sugar sweetened", "…
## $ Unit <chr> "mL", "mL", "mL", "mL", "mL", "mL", "mL", "mL", "mL"…
## $ Amount <dbl> 31.9, 29.9, 0.9, 1.1, 16.8, 16.6, 0.1, 5.1, 4.2, 0.8…
## $ Proportion <dbl> 11.1, 10.4, 0.3, 0.4, 5.8, 5.8, 0.0, 1.8, 1.5, 0.3, …
Number of missing values in each column:
## Month_date Month_year month financial_year
## 0 0 0 0
## Beverage_groups Group_type Type_of_sweetness Unit
## 0 0 840 0
## Amount Proportion
## 0 0
The concept of sweetness is not applicable to all beverage groups, leading to the presence of null values in the type of sweetness column.
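A per-column count of missing values like the one above can be produced in one line. The sketch below is an assumption about how such a count could be obtained, not necessarily the code used for this report; the small data frame and its values (including the "Intense sweetened" label) are hypothetical.

```r
# Hypothetical rows mimicking three of the dataset's columns (illustrative values only)
toy <- data.frame(
  Beverage_groups   = c("Fruit juices", "Packaged water", "Soft drinks", "Soft drinks"),
  Type_of_sweetness = c(NA, NA, "Sugar sweetened", "Intense sweetened"),
  Amount            = c(29.9, 100.0, 120.0, 45.0)
)

# Count missing values in every column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Share of rows without a sweetness classification, by beverage group
na_share <- tapply(is.na(toy$Type_of_sweetness), toy$Beverage_groups, mean)
print(na_share)
```

Breaking the missing share down by group is a quick way to confirm that the gaps are structural (whole groups without a sweetness type) rather than scattered data-entry omissions.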
# Convert the Month_date column to date datatype
data <- data %>%
  mutate(Month_date = as.Date(Month_date, format = "%m/%d/%Y"))
# Display the structure of the dataframe to confirm the change
str(data)
## tibble [1,320 × 10] (S3: tbl_df/tbl/data.frame)
## $ Month_date : Date[1:1320], format: "2018-07-01" "2018-07-01" ...
## $ Month_year : num [1:1320] 201807 201807 201807 201807 201807 ...
## $ month : num [1:1320] 7 7 7 7 7 7 7 7 7 7 ...
## $ financial_year : chr [1:1320] "2018-19" "2018-19" "2018-19" "2018-19" ...
## $ Beverage_groups : chr [1:1320] "Fruit and vegetable juices" "Fruit juices" "Vegetable juices" "Fruit and vegetable juice blends" ...
## $ Group_type : chr [1:1320] "Major" "Minor" "Minor" "Minor" ...
## $ Type_of_sweetness: chr [1:1320] NA NA NA NA ...
## $ Unit : chr [1:1320] "mL" "mL" "mL" "mL" ...
## $ Amount : num [1:1320] 31.9 29.9 0.9 1.1 16.8 16.6 0.1 5.1 4.2 0.8 ...
## $ Proportion : num [1:1320] 11.1 10.4 0.3 0.4 5.8 5.8 0 1.8 1.5 0.3 ...
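The date conversion can be tried in isolation on a few strings. The sketch below also derives a financial-year label from each parsed date, assuming the Australian financial year runs from 1 July to 30 June; that derivation is an illustration and is not part of the original code, since the dataset already supplies `financial_year`.

```r
# Parse month/day/year strings such as those shown in Month_date
month_date <- as.Date(c("7/1/2018", "6/1/2019", "12/1/2022"), format = "%m/%d/%Y")

# Assumption: the financial year starts on 1 July and ends on 30 June
yr <- as.integer(format(month_date, "%Y"))
mo <- as.integer(format(month_date, "%m"))
start_year <- ifelse(mo >= 7, yr, yr - 1L)
financial_year <- sprintf("%d-%02d", start_year, (start_year + 1L) %% 100L)

print(data.frame(month_date, financial_year))
```

Comparing a derived label with the supplied `financial_year` column is one way to confirm that the dates were parsed as month/day/year and not day/month/year.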
Each “Major” group is the aggregate of its “Minor” subgroups, so the Major rows already include all the Minor group values. To give a comprehensive summary without double counting, we consider only the “Major” group values.
# Summary statistics of Amount for each major beverage group
table1 <- df_major %>%
  group_by(Beverage_groups) %>%
  summarise(Min = min(Amount, na.rm = TRUE),
            Q1 = quantile(Amount, probs = .25, na.rm = TRUE),
            Median = median(Amount, na.rm = TRUE),
            Q3 = quantile(Amount, probs = .75, na.rm = TRUE),
            Max = max(Amount, na.rm = TRUE),
            Mean = mean(Amount, na.rm = TRUE),
            SD = sd(Amount, na.rm = TRUE),
            n = n(),
            Missing = sum(is.na(Amount)))
knitr::kable(table1)

| Beverage_groups | Min | Q1 | Median | Q3 | Max | Mean | SD | n | Missing |
|---|---|---|---|---|---|---|---|---|---|
| Cordials | 4.8 | 5.500 | 6.15 | 7.100 | 9.0 | 6.326667 | 0.9931744 | 60 | 0 |
| Electrolyte drinks | 4.5 | 6.100 | 7.80 | 9.125 | 12.8 | 7.796667 | 2.0071876 | 60 | 0 |
| Energy drinks | 7.4 | 8.875 | 11.20 | 11.800 | 13.4 | 10.461667 | 1.6741395 | 60 | 0 |
| Fruit and vegetable drinks | 14.0 | 15.975 | 16.80 | 18.425 | 20.1 | 17.198333 | 1.6078717 | 60 | 0 |
| Fruit and vegetable juices | 31.0 | 32.675 | 34.10 | 36.100 | 40.5 | 34.666667 | 2.5404969 | 60 | 0 |
| Packaged water | 79.4 | 103.400 | 118.90 | 137.125 | 178.8 | 120.243333 | 25.3440286 | 60 | 0 |
| Soft drinks | 135.7 | 151.675 | 165.95 | 175.500 | 223.7 | 166.016667 | 20.1040865 | 60 | 0 |
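Because the beverage groups differ greatly in scale, the standard deviations in the table are hard to compare directly. One possible way to put them on a common footing is the coefficient of variation (SD divided by mean). The sketch below is an illustrative addition, not part of the original analysis; it simply re-enters the Mean and SD columns from the table above.

```r
# Mean and SD copied from the summary table above
stats <- data.frame(
  group = c("Cordials", "Electrolyte drinks", "Energy drinks",
            "Fruit and vegetable drinks", "Fruit and vegetable juices",
            "Packaged water", "Soft drinks"),
  mean  = c(6.326667, 7.796667, 10.461667, 17.198333, 34.666667, 120.243333, 166.016667),
  sd    = c(0.9931744, 2.0071876, 1.6741395, 1.6078717, 2.5404969, 25.3440286, 20.1040865)
)

# Coefficient of variation: SD expressed as a percentage of the mean
stats$cv_pct <- round(stats$sd / stats$mean * 100, 1)

# List groups from most to least variable relative to their mean
print(stats[order(-stats$cv_pct), c("group", "cv_pct")])
```

On this measure, electrolyte drinks and packaged water vary the most relative to their average level, even though soft drinks have one of the largest absolute SDs.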
The box plot below shows the distribution of consumption amounts for different beverage groups.
ggplot(df_major, aes(x = Beverage_groups, y = Amount, fill = Beverage_groups)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Distribution of Beverage Consumption by Group",
       x = "Beverage Group",
       y = "Amount Consumed (mL)",
       fill = "Beverage Group") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Total per capita consumption of each major beverage group across the financial years:
ggplot(df_major, aes(x = financial_year, y = Amount, fill = Beverage_groups)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  labs(title = "Total Per Capita Consumption by Beverage Group",
       x = "Financial Year",
       y = "Amount Consumed (mL)",
       fill = "Beverage Group")
To determine if there is a significant difference in the mean per capita consumption of non-alcoholic beverages between 2018-19 and 2022-23, we perform a two-sample t-test.
The null and alternative hypotheses for this test are:
Null Hypothesis (\(H_0\)): There is no difference in the mean amount of non-alcoholic beverage consumption between 2018-19 and 2022-23. Mathematically, \[H_0: \mu_{\text{2018-19}} = \mu_{\text{2022-23}}\]
Alternative Hypothesis (\(H_A\)): There is a difference in the mean amount of non-alcoholic beverage consumption between 2018-19 and 2022-23. Mathematically, \[H_A: \mu_{\text{2018-19}} \ne \mu_{\text{2022-23}}\]
The samples from each year are independent.
The data is approximately normally distributed.
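The variances of the two populations need not be equal: t.test() performs Welch's test by default (var.equal = FALSE), which does not pool the variances.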
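The assumptions listed above can be examined before running the test. The sketch below uses simulated values as stand-ins for the two years' `Amount` columns (84 values each, assuming 7 major groups with 12 monthly records per year); the means and standard deviations are invented, so only the calls, not the numbers, carry over to the real data.

```r
set.seed(42)  # reproducible simulated data

# Simulated stand-ins for the two years' Amount values (not the real data)
amount_a <- rnorm(84, mean = 50, sd = 60)
amount_b <- rnorm(84, mean = 55, sd = 65)

# Normality: Shapiro-Wilk test on each sample
shapiro_a <- shapiro.test(amount_a)
shapiro_b <- shapiro.test(amount_b)

# Equality of variances: F test comparing the two samples
var_check <- var.test(amount_a, amount_b)

# Welch test (R's default) versus the pooled-variance test
welch  <- t.test(amount_a, amount_b)
pooled <- t.test(amount_a, amount_b, var.equal = TRUE)

cat("Shapiro p-values:", shapiro_a$p.value, shapiro_b$p.value, "\n")
cat("F test p-value:", var_check$p.value, "\n")
cat("Welch df:", welch$parameter, " Pooled df:", pooled$parameter, "\n")
```

The pooled test always has n1 + n2 - 2 degrees of freedom, while Welch's test adjusts the degrees of freedom downward when the sample variances differ.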
# Filter the data for the two financial years
data_2018_19 <- filter(df_major, financial_year == "2018-19")
data_2022_23 <- filter(df_major, financial_year == "2022-23")
# Perform a two-sample t-test
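t_test_result <- t.test(data_2018_19$Amount, data_2022_23$Amount)
# Print the t-test result
t_test_result
##
## Welch Two Sample t-test
##
## data: data_2018_19$Amount and data_2022_23$Amount
## t = -0.58615, df = 164.57, p-value = 0.5586
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -23.83975 12.92546
## sample estimates:
## mean of x mean of y
## 48.51071 53.96786
The p-value of 0.5586 exceeds the 0.05 significance level, and the 95% confidence interval for the difference in means (-23.84 to 12.93) contains 0, so we fail to reject the null hypothesis: the test does not detect a significant difference in mean consumption between 2018-19 and 2022-23.
Calculate the sum of squared differences to understand the variability in the data.
\[S = \sum_{i = 1}^{n} d_i^2\]
## [1] 14336.6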
For this analysis, let’s investigate if there is an association between the type of beverage groups and the financial years (2018-19 and 2022-23).
Hypotheses:
- Null Hypothesis (\(H_0\)): There is no association between the type of beverage groups and the financial years.
- Alternative Hypothesis (\(H_A\)): There is an association between the type of beverage groups and the financial years.
# Filter data for the financial years 2018-19 and 2022-23
data_filtered <- df_major %>% filter(financial_year %in% c("2018-19", "2022-23"))
# Create a contingency table for the chi-square test
contingency_table <- table(data_filtered$Beverage_groups, data_filtered$financial_year)
# Perform the chi-square test of independence
chi_square_test <- chisq.test(contingency_table)
# Print the chi-square test result
chi_square_test
##
## Pearson's Chi-squared test
##
## data: contingency_table
## X-squared = 0, df = 6, p-value = 1
Chi-Squared Value: A chi-squared value of 0 indicates that the observed and expected frequencies in the contingency table are identical. The table counts records rather than amounts consumed, and each beverage group has the same number of monthly records in both financial years, so the observed counts match the expected counts exactly.
p-value: The p-value of 1 is far above the usual significance level of 0.05, so there is no statistical evidence against the null hypothesis.
Given the p-value of 1, we fail to reject the null hypothesis: the test finds no association between the type of beverage groups and the financial years (2018-19 and 2022-23). Because the contingency table is built from record counts, this result reflects the balanced structure of the dataset; it does not by itself show that the amounts consumed of each beverage type were unchanged between the two years.
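The chi-squared value of 0 reported above can be reproduced from the shape of the contingency table alone. This is a minimal sketch, assuming each of the seven major groups contributes 12 monthly records in each financial year (consistent with n = 60 per group over five years in the summary table); the matrix is built by hand and does not read the dataset.

```r
# Assumption: each of the 7 major beverage groups has 12 monthly records
# in each of the two financial years, so every cell count is 12
groups <- c("Cordials", "Electrolyte drinks", "Energy drinks",
            "Fruit and vegetable drinks", "Fruit and vegetable juices",
            "Packaged water", "Soft drinks")
counts <- matrix(12, nrow = length(groups), ncol = 2,
                 dimnames = list(groups, c("2018-19", "2022-23")))

result <- chisq.test(counts)
print(result$expected)  # expected counts under independence
print(result)
```

With identical counts in every cell, the expected counts equal the observed counts, so the statistic is 0 with (7 - 1) × (2 - 1) = 6 degrees of freedom, whatever the consumption amounts were.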
Packaged Water: Showed consistent growth, peaking in 2021-22 before a slight decrease in 2022-23.
Soft Drinks: Increased until 2020-21, followed by a decline, indicating a shift in consumer preferences.
Other Beverages: Categories such as energy drinks and fruit juices maintained stable consumption levels.
From 2018-19 to 2021-22, there was a noticeable rise in the per capita consumption of non-alcoholic beverages. However, in 2022-23, a slight decline occurred, primarily attributed to decreased consumption of packaged water and soft drinks. Despite this reduction, the overall consumption level in 2022-23 remains higher than that of 2018-19.
The investigation reveals stable yet notable trends in non-alcoholic beverage consumption in Australia, highlighting a general increase over the years with minor fluctuations. The findings underscore the importance of comprehensive data collection and detailed nutritional analysis to inform public health strategies effectively.
Wickham, H., François, R., Henry, L., & Müller, K. (2021). dplyr: A grammar of data manipulation (Version 1.1.4) [Computer software]. Retrieved from https://cran.r-project.org/package=dplyr
Wickham, H. (2023). ggplot2 (Version 3.5.0) [R package]. Accessed on June 26, 2023. Available online at: https://ggplot2.tidyverse.org/
Australian Bureau of Statistics. (2022-23). Apparent Consumption of Selected Foodstuffs, Australia. ABS. https://www.abs.gov.au/statistics/health/health-conditions-and-risks/apparent-consumption-selected-foodstuffs-australia/latest-release#cite-window1
OpenAI. (2023). ChatGPT (May 24 version) [Large language model]. Retrieved from https://chat.openai.com