MATH1324 Assignment 3

Exploring the difference between weights after a specific diet plan

Hewayalage Vishva Lahiru Kantha Abeyrathne (s3735195), Kodithuwakku Arachchige Iresh Udara Kaushalya (s3704769)

Last updated: 28 October, 2018

RPubs link information

RPubs link: http://rpubs.com/Vishva/MATH1324Assignment3

Introduction

People often try diet plans to either gain or lose weight.
They then monitor their weight at intervals to check whether the diet plan is working.
The outcome is a decisive factor in whether they continue the plan.
Comparing weights before and after a diet plan helps reveal any difference between the two measurements.
“Does a diet plan reduce the weight of a person?” is the main question to answer using statistical analysis.

Problem Statement

The main objective of this investigation is to explore the difference between people's weights before and after the diet plan.
Since the weights before and after the diet plan were measured on the same people, the measurements are “dependent” or “paired”.
Therefore, a “paired-samples t-test” is the appropriate way to check for a significant mean change between the two measurements.

Problem Statement Cont.

Descriptive statistics will be used to summarise both measurements (before and after) and the differences between them.
A line plot will be used to visualise how each person's weight changes between the two measurements.
A box plot of the weight differences will then be used to identify possible outliers before running the paired-samples t-test.
A Q-Q plot will be used to check that the differences are approximately normally distributed, as the paired-samples t-test assumes.
Finally, the paired-samples t-test will be applied, with the relevant null and alternative hypotheses stated, in order to reach a decision.

Data

The Diet data set contains information on 78 people, each following one of three diets.
The data set is openly available for statistical analysis.
Link to the data set: https://www.sheffield.ac.uk/mash/statistics2/data
It contains the following 7 variables.
- Person (Participant Number) -> Integer
- Gender -> Integer (should be a factor)
- Age -> Integer
- Height -> Integer
- pre.weight (Weight before diet) -> Integer
- Diet -> Integer (should be a factor)
- weight6weeks (Weight after 6 weeks of diet) -> Numeric

Data Cont.

“pre.weight” and “weight6weeks” are the key variables in this test.
“pre.weight” ranges from 58 to 103, while “weight6weeks” ranges from 53 to 103.
“Gender” and “Diet” are the two factor variables in the data set.
- Gender (Levels = “Female” “Male”)
- Diet (Levels = “Diet1” “Diet2” “Diet3”)
Originally, both “Gender” and “Diet” were stored as integers.
Both were converted to factors after exploring the structure of the data set.
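The integer-to-factor conversion described above can be sketched in base R. This is a minimal illustration on a made-up data frame (`diet_toy` is hypothetical), and the integer codings (0/1 for Gender, 1 to 3 for Diet) are assumptions rather than something stated in the data set documentation.

```r
# Made-up stand-in for the Diet data; values are for illustration only
diet_toy <- data.frame(
  Person = 1:4,
  Gender = c(0L, 1L, 0L, 1L),  # assumed coding: 0 = Female, 1 = Male
  Diet   = c(1L, 2L, 3L, 1L)   # assumed coding: 1, 2, 3 = Diet1, Diet2, Diet3
)

# Convert the integer codes to labelled factors
diet_toy$Gender <- factor(diet_toy$Gender,
                          levels = 0:1,
                          labels = c("Female", "Male"))
diet_toy$Diet <- factor(diet_toy$Diet,
                        levels = 1:3,
                        labels = c("Diet1", "Diet2", "Diet3"))

# Inspect the structure to confirm both columns are now factors
str(diet_toy)
```

Passing explicit `levels` and `labels` keeps the mapping from code to label visible, which makes a wrong coding assumption easy to spot.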

Descriptive Statistics and Visualisation

Descriptive statistics were generated for “pre.weight” and “weight6weeks” variables.

#Descriptive statistics for weight6weeks and pre.weight
library(dplyr)  # provides %>%, summarise() and n()

Diet %>% summarise(
  Mean_weight6weeks = mean(weight6weeks, na.rm = TRUE),
  SD_weight6weeks = sd(weight6weeks, na.rm = TRUE),
  Mean_pre.weight = mean(pre.weight, na.rm = TRUE),
  SD_pre.weight = sd(pre.weight, na.rm = TRUE),
  Mean_Difference = mean(weight6weeks - pre.weight, na.rm = TRUE),
  SD_Difference = sd(weight6weeks - pre.weight, na.rm = TRUE),
  n = n()
) -> table1

#Render the summary as a table
knitr::kable(table1)

Mean_weight6weeks	SD_weight6weeks	Mean_pre.weight	SD_pre.weight	Mean_Difference	SD_Difference	n
68.68077	8.924504	72.52564	8.723344	-3.844872	2.551478	78

Descriptive Statistics and Visualisation Cont.

A “weight_difference” column was created with the mutate() function as “weight6weeks” minus “pre.weight”, so negative values indicate weight loss.
Descriptive statistics were then generated for “weight_difference”.

#Create differences Column (weight_difference)
Diet <- Diet %>% mutate(weight_difference = weight6weeks - pre.weight)

#Descriptive statistics for weight_difference
Diet %>% summarise(
  Min = min(weight_difference, na.rm = TRUE),
  Q1 = quantile(weight_difference, probs = .25, na.rm = TRUE),
  Median = median(weight_difference, na.rm = TRUE),
  Q3 = quantile(weight_difference, probs = .75, na.rm = TRUE),
  Max = max(weight_difference, na.rm = TRUE),
  Mean = mean(weight_difference, na.rm = TRUE),
  SD = sd(weight_difference, na.rm = TRUE),
  IQR = IQR(weight_difference, na.rm = TRUE),
  n = n(),
  Missing = sum(is.na(weight_difference))
) -> table2

knitr::kable(table2)

Min	Q1	Median	Q3	Max	Mean	SD	IQR	n	Missing
-9.2	-5.55	-3.6	-2	2.1	-3.844872	2.551478	3.55	78	0

Descriptive Statistics and Visualisation Cont2.

A line plot was drawn to compare each person's “pre.weight” and “weight6weeks”.
For most people, the two weights differ.

#line plot Visualization
matplot(t(data.frame(Diet$weight6weeks, Diet$pre.weight)),
  type = "b",
  pch = 19,
  col = 1,
  lty = 1,
  xlab = "",
  ylab = "Weight",
  xaxt = "n"
  )
axis(1, at = 1:2, labels = c("After 6 weeks", "Before"))

Descriptive Statistics and Visualisation Cont3.

The box plot of “weight_difference” was checked for possible outliers, and none were found.

#Outliers
boxplot(Diet$weight_difference)
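The “no outliers” reading can be cross-checked with the usual 1.5 × IQR rule that box plots are based on. The sketch below plugs in the quartiles, minimum and maximum of “weight_difference” from the summary table above. The variable names are illustrative, and because boxplot() in R works from hinges rather than quantile() quartiles, the fences here only approximate what the plot draws.

```r
# Summary values of weight_difference, copied from the table above
q1    <- -5.55
q3    <- -2
min_d <- -9.2
max_d <- 2.1

# Interquartile range and the 1.5 * IQR fences
iqr         <- q3 - q1
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# A value is flagged as an outlier if it falls outside the fences
any_outliers <- min_d < lower_fence || max_d > upper_fence

print(c(lower = lower_fence, upper = upper_fence))
print(any_outliers)
```

Both the minimum (-9.2) and the maximum (2.1) fall inside the fences of about -10.875 and 3.325, which agrees with the box plot showing no outliers.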

Descriptive Statistics and Visualisation Cont4.

A Q-Q plot was used to check the normality assumption.
According to the Q-Q plot, the weight differences are approximately normally distributed.

#Check normality of the differences using a Q-Q plot
library(car)  # provides qqPlot()
qqPlot(Diet$weight_difference, dist = "norm")

## [1] 17 77
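To show what a Q-Q plot measures, here is a minimal base-R sketch on simulated data. The values in `sim_diff` are randomly generated from a normal distribution using the mean and SD reported earlier; they are not the Diet data, and the seed is arbitrary. For normally distributed data the sample quantiles line up with the theoretical ones, which shows up as a correlation close to 1.

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible

# Simulated stand-in for weight_difference (NOT the real data)
sim_diff <- rnorm(78, mean = -3.844872, sd = 2.551478)

# Theoretical (x) and sample (y) quantiles, computed without drawing a plot
qq <- qqnorm(sim_diff, plot.it = FALSE)

# Correlation between theoretical and sample quantiles
qq_cor <- cor(qq$x, qq$y)
print(qq_cor)

# To draw the plot with a reference line instead:
# qqnorm(sim_diff); qqline(sim_diff)
```

The `[1] 17 77` printed by qqPlot() above appears to be, going by the car package's default behaviour, the row indices of the two most extreme points labelled on the plot rather than a test result.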

Hypothesis Testing

A paired-samples t-test was chosen to suit the statistical problem.
The normality assumption is reasonable: the sample is large (n = 78 > 30) and the Q-Q plot suggests that the differences are approximately normal.
The significance level is set at 0.05.
Null hypothesis:

\[H_0: \mu_1 = \mu_2 \]
Alternative hypothesis:

\[H_A: \mu_1 \ne \mu_2\]
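
Because the samples are paired, the same hypotheses can be written in terms of the per-person differences. As a sketch, assume \(\mu_1\) and \(\mu_2\) denote the population mean weights after 6 weeks and before the diet (the order used in the R code), and write \(d_i\) for person \(i\)'s “weight_difference”, with sample mean \(\bar{d}\) and sample standard deviation \(s_d\) (notation introduced here for illustration):

\[H_0: \mu_d = \mu_1 - \mu_2 = 0, \qquad H_A: \mu_d \ne 0\]

\[t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad df = n - 1\]

Under \(H_0\), and given the normality assumption, \(t\) follows a t distribution with \(n - 1\) degrees of freedom.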

Hypothesis Testing Cont.

A paired-samples t-test was performed.
The mean difference between the weights was -3.844872 and the p-value was < 2.2e-16.
Since the p-value is less than the significance level (0.05), H0 (the null hypothesis) is rejected.
The result is therefore statistically significant, t(df = 77) = -13.309, p < 2.2e-16, 95% CI [-4.420141, -3.269602]: mean weights before and after the diet plan differ significantly.

#Calculation of the paired sample t-test

t.test(Diet$weight6weeks, Diet$pre.weight,
       paired = TRUE,
       alternative = "two.sided")

## 
##  Paired t-test
## 
## data:  Diet$weight6weeks and Diet$pre.weight
## t = -13.309, df = 77, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.420141 -3.269602
## sample estimates:
## mean of the differences 
##               -3.844872
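
The reported test statistic and confidence interval can be reproduced by hand from the summary statistics of “weight_difference” (mean, SD and n from the tables above). This is a sketch with illustrative variable names; because the inputs are rounded, the results agree with the t.test() output only to a few decimal places.

```r
# Summary statistics of weight_difference, copied from the tables above
mean_diff <- -3.844872
sd_diff   <- 2.551478
n         <- 78

# Standard error of the mean difference and degrees of freedom
se <- sd_diff / sqrt(n)
df <- n - 1

# Paired t statistic: mean difference divided by its standard error
t_stat <- mean_diff / se

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval for the mean difference
ci <- mean_diff + c(-1, 1) * qt(0.975, df) * se

print(round(t_stat, 3))
print(round(ci, 4))
print(p_value < 2.2e-16)
```

The interval lies entirely below zero, which is what supports reading the two-sided result as a decrease in weight.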

Discussion

The main finding of this investigation is that weights before and after the diet plan are significantly different, as shown by the rejection of the null hypothesis in the paired t-test.
The mean difference is negative and its 95% confidence interval lies entirely below zero, so weights tended to decrease over the 6 weeks of the diet plan.
It can therefore be stated that the diet plan has an impact in reducing people's weight.
The answer to the question “Does diet plan reduce the weight of a person?” is therefore ‘Yes’.
A major strength of the investigation is the accuracy of the data set: all the weights were taken from a real-world sample.
There are still limitations, and further improvements might be required.
One limitation is the sample size of 78 records. A larger sample would bring the sampling distribution of the mean difference closer to normal.
