MATH1324 Semester 2, 2019 - Assignment 3

Investigation on the Relationship between
Sugar Intake and Energy Levels in Australia

Ashutosh Kumar Singh (s3797767)

Last updated: 27 October, 2019

Introduction

According to Harvard Health,

What’s a good breakfast? One that delivers some healthful protein, some slowly digested carbohydrates, and some fruit or vegetables.

According to Better Health Channel (Victoria),

Glucose is the body’s energy source… In the morning, after you have gone without food for as long as 12 hours, your glycogen stores are low… Without carbohydrate, … cause reduced energy levels. Eating breakfast restores your glycogen stores and boosts your energy levels.


Sugar, a sweet-tasting, soluble carbohydrate, is therefore a source of energy, and most breakfast cereals in Australia contain it in one form or another.

This brings us to the question:

Problem Statement

Does eating more sugar give enough energy?


In other words, as is the common belief in Australia:


This raises the question:

Is there a weak but positive relationship between sugar intake from breakfast cereals and energy levels?

Data

Dataset Origin

Dataset Contents

Variables used in this Investigation

Data (Cont.)

Filter

Sampling Method

head(data)

Data (Cont.)

Subsetting Data

data <- data[, c(1, 5, 13)]

head(data)

Descriptive Statistics and Visualisation

Summary and Missing Values

library(dplyr)  # provides %>% and summarise()

# Range, mean, and missing-value count for total sugars
data %>% summarise(Variable = "Total Sugar",
                   Min = min(`Total sugars (g)`, na.rm = TRUE),
                   Max = max(`Total sugars (g)`, na.rm = TRUE),
                   Mean = mean(`Total sugars (g)`, na.rm = TRUE),
                   Missing = sum(is.na(`Total sugars (g)`))) -> table1

knitr::kable(table1)

Variable      Min   Max   Mean       Missing
Total Sugar   0.6   46    17.48699   0

data %>% summarise(Variable = "Energy with Dietary Fibre",
                   Min = min(`Energy, with dietary fibre (kJ)`, na.rm = TRUE),
                   Max = max(`Energy, with dietary fibre (kJ)`, na.rm = TRUE),
                   Mean = mean(`Energy, with dietary fibre (kJ)`, na.rm = TRUE),
                   Missing = sum(is.na(`Energy, with dietary fibre (kJ)`))) -> table2

knitr::kable(table2)

Variable                    Min   Max    Mean      Missing
Energy with Dietary Fibre   287   1776   1469.52   0

Descriptive Statistics and Visualisation (Cont.)

Multivariate Outlier Detection

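library(MVN)  # provides mvn()

# Keep only the two numeric variables: energy (kJ) and total sugars (g)
data1 <- data[, 2:3]

# Flag multivariate outliers with the quantile method and return the cleaned data
results <- mvn(data1, 
               multivariateOutlierMethod = "quan", 
               showOutliers = TRUE, showNewData = TRUE)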
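
The idea behind a quantile-based outlier rule can be sketched in base R. This is a minimal illustration on simulated data, not the cereal dataset: the data frame `toy`, its values, and the planted outlier are all hypothetical. It uses the classical mean and covariance with a 97.5% chi-square cutoff, which is an assumption; `mvn()` may rely on robust estimates, so the rows it flags can differ from what this simplified rule would flag.

```r
# Hypothetical simulated data standing in for energy (kJ) and total sugars (g).
set.seed(1)
toy <- data.frame(energy = rnorm(100, mean = 1500, sd = 90),
                  sugars = rnorm(100, mean = 17, sd = 9))
toy[1, ] <- c(300, 45)  # plant one obvious multivariate outlier

# Squared Mahalanobis distance of each row from the sample centre
# (classical estimates; a simplification of what a robust method would do).
x  <- as.matrix(toy)
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Flag rows beyond the 97.5% chi-square quantile, df = number of variables.
cutoff     <- qchisq(0.975, df = ncol(x))
is_outlier <- d2 > cutoff

# Cleaned data with flagged rows removed.
toy_clean <- toy[!is_outlier, ]
```

With two variables the cutoff is the 97.5% quantile of a chi-square distribution with 2 degrees of freedom, so roughly 2.5% of well-behaved bivariate normal observations would be flagged by chance.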

Descriptive Statistics and Visualisation (Cont.)

Dataset without Outliers

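# Column names must match the variable names used in the lm() formula for model1
data2 <- data.frame(Energy..with.dietary.fibre..kJ. = results$newData[, 1],
                    Total.sugars..g. = results$newData[, 2])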

results <- mvn(data2, 
               multivariateOutlierMethod = "quan")

Hypothesis Testing

Assumptions for Linear Regression

1. Independence

Every breakfast cereal in the dataset is recorded independently.

2. Linearity

Residual vs. Fitted Plot

model1 <- lm(Energy..with.dietary.fibre..kJ. ~ Total.sugars..g., data = data2)

plot(model1, which = 1)

Hypothesis Testing (Cont.)

3. Normality of Residuals

Normal Q-Q Plot

plot(model1, which = 2)

Hypothesis Testing (Cont.)

4. Homoscedasticity

Scale-Location Plot

plot(model1, which = 3)

Hypothesis Testing (Cont.)

Linear Regression

summary(model1)
## 
## Call:
## lm(formula = Energy..with.dietary.fibre..kJ. ~ Total.sugars..g., 
##     data = data2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -238.043  -47.079    1.984   50.390  254.333 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      1496.3033    16.6862  89.673   <2e-16 ***
## Total.sugars..g.    1.8788     0.8076   2.326   0.0218 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 88.52 on 114 degrees of freedom
## Multiple R-squared:  0.04532,    Adjusted R-squared:  0.03695 
## F-statistic: 5.412 on 1 and 114 DF,  p-value: 0.02177
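
As a check on how the figures in this output relate to one another, the short sketch below recomputes the slope's t value, its two-sided p-value, the F-statistic, and R-squared from the printed estimate, standard error, and residual degrees of freedom. The variable names are illustrative only, and the inputs are the rounded values shown above, so the results should agree with the output only up to rounding.

```r
# Values copied from the summary(model1) output above (rounded as printed).
slope    <- 1.8788
se_slope <- 0.8076
df_resid <- 114

t_value   <- slope / se_slope                      # t statistic for the slope
p_value   <- 2 * pt(-abs(t_value), df = df_resid)  # two-sided p-value
f_value   <- t_value^2                             # with one predictor, F equals t squared
r_squared <- f_value / (f_value + df_resid)        # R-squared recovered from F and residual df

round(c(t = t_value, p = p_value, F = f_value, R2 = r_squared), 4)
```

Because the model has a single predictor, the overall F-test and the t-test on the slope are the same test, which is why both report a p-value of about 0.0218.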

Hypothesis Testing (Cont.)

Overall Regression Model

Hypotheses

\[H_0: \text{The data do not fit the linear regression model.}\]

\[H_A: \text{The data do fit the linear regression model.}\]

Hypothesis Testing (Cont.)

Intercept or Constant

Hypotheses

\[H_0: \alpha = 0\]

\[H_A: \alpha \neq 0\]

confint(model1)
##                         2.5 %      97.5 %
## (Intercept)      1463.2480701 1529.358604
## Total.sugars..g.    0.2789205    3.478691

Hypothesis Testing (Cont.)

Slope

Hypotheses

\[H_0: \beta = 0\]

\[H_A: \beta \neq 0\]

confint(model1)
##                         2.5 %      97.5 %
## (Intercept)      1463.2480701 1529.358604
## Total.sugars..g.    0.2789205    3.478691
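
The intervals reported by `confint(model1)` can be reproduced by hand as estimate ± critical t value × standard error. The sketch below does this with the rounded estimates and standard errors printed in the regression summary and 114 residual degrees of freedom; the object names are illustrative, and small differences in the last decimal places are expected from rounding.

```r
# Estimates and standard errors copied from the summary(model1) output.
est <- c(intercept = 1496.3033, slope = 1.8788)
se  <- c(intercept = 16.6862,   slope = 0.8076)

# Critical value for a 95% two-sided interval with 114 residual df.
t_crit <- qt(0.975, df = 114)

# Lower and upper 95% confidence limits for each coefficient.
ci <- cbind(lower = est - t_crit * se,
            upper = est + t_crit * se)
ci
```

Neither interval contains 0, which is consistent with rejecting the null hypothesis for both the intercept and the slope at the 5% level.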

Discussion

Statistical Results

Conclusion

Discussion (Cont.)

Investigation Limitations

Investigation Strengths

Data Limitations

Data Strengths

References

Data

More info