eating <- read.csv("Blackmore.csv")  # Rdatasets export of the Blackmore exercise data
In 2005, Davis, Blackmore, Katzman, and Fox studied whether patients with anorexia nervosa were more likely than their non-anorexic peers to exercise. The data included here was collected from 138 teenaged girls hospitalized for eating disorders and 98 control subjects, using longitudinal, retrospective, self-report measures. Estimated weekly exercise time is reported at ages 8, 10, and 12, followed by one or two more measurements, many at age 14. The data points at ages 8 and 10 appear to be common to all participants, while the timing of the one to three later data points varies; every participant was under 18 at the last collection point (max = 17.92 years).
The original purpose of the data in its entirety was to examine if there was a link between physical activity in patients with anorexia nervosa as compared to a control group. The research also asked whether the influence of parents’ activity on that of their children was significantly different for the anorexic group. The researchers’ initial prediction was that adolescents with anorexia nervosa would be significantly more active than healthy controls both prior to, and during, the progression of their disorder. They also “expected that the activity levels of parents and their daughters would be correlated, and that this relationship would be stronger in patient than control families. Finally, they expected that the anorexic patients’ parents would be more active and report a greater commitment to exercise than the control parents.” (Davis C, Blackmore E, Katzman DK, Fox J, 2005)
The amended data does not include the information about parent activity levels. It is reduced to four variables: two categorical (subject number and group, patient or control) and two numerical (age and hours of exercise per week). The data was accessed through Rdatasets on GitHub [https://vincentarelbundock.github.io/Rdatasets/datasets.html], where it is titled Exercise Histories of Eating-Disordered and Control Subjects.
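The structure described above (the number of rows and of distinct subjects in each group, and the age range) can be checked directly. This is a minimal sketch: the helper name `describe_groups` is hypothetical, and it assumes the CSV columns are named `subject`, `age`, `exercise`, and `group`, as in the Blackmore data used here.

```r
# Hypothetical helper: rows and distinct subjects per group.
# Assumes columns named `subject` and `group`.
describe_groups <- function(df) {
  groups <- sort(unique(as.character(df$group)))
  data.frame(
    group     = groups,
    rows      = sapply(groups, function(g) sum(df$group == g)),
    subjects  = sapply(groups, function(g) length(unique(df$subject[df$group == g]))),
    row.names = NULL   # drop the names sapply() attaches
  )
}

# Only runs if the CSV is in the working directory.
if (file.exists("Blackmore.csv")) {
  eating <- read.csv("Blackmore.csv")
  str(eating)                   # column names and types
  print(describe_groups(eating))
  print(range(eating$age))      # the text reports a maximum age of 17.92
}
```

Because each subject contributes several rows (one per age), the `rows` and `subjects` columns should differ; the subject counts are the ones to compare with the 138 patients and 98 controls quoted above.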
age <- eating$age
exercise <- eating$exercise
group <- eating$group
patient <- eating[which(group == "patient"), ]  # subset: patients only (5 columns)
control <- eating[which(group == "control"), ]  # subset: controls only (5 columns)
pexer12 <- subset(patient, age > 12, select = c(age, exercise))   # patient exercise data, age > 12
cexer12 <- subset(control, age > 12, select = c(age, exercise))   # control exercise data, age > 12
pexer10 <- subset(patient, age <= 10, select = c(age, exercise))  # patient exercise data, age <= 10
cexer10 <- subset(control, age <= 10, select = c(age, exercise))  # control exercise data, age <= 10
# boxplot to compare the two groups (age > 12); names label each box
boxplot(cexer12$exercise, pexer12$exercise, names = c("Control", "Patient"), horizontal = TRUE,
        main = "Exercise Habits: Anorexic Patients vs Control", xlab = "estimated hours per week")
summary(cexer12$exercise)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.415 1.380 2.090 2.853 9.290
summary(pexer12$exercise)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.210 3.420 5.457 8.380 29.960
When the data is subset to exercise measurements for participants older than 12, the anorexic patients clearly spend more time exercising, on average, than the controls. Both distributions are skewed right, yet the patient median (3.42 hours per week) exceeds the control group's third quartile (2.853 hours), so at least half of the patients exercise more than 75% of the controls. The control median is 1.38 hours per week; the means (5.457 hours for patients, 2.090 for controls) sit above the medians because of the right skew.
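The quartile comparison in the paragraph above can be computed rather than read off the summaries. A minimal sketch; `share_above_quantile` is a hypothetical helper that uses R's default (type 7) quantile, the same one `summary()` reports.

```r
# Hypothetical helper: proportion of x strictly above the p-th quantile
# of a reference sample (R's default type-7 quantile).
share_above_quantile <- function(x, reference, p = 0.75) {
  mean(x > quantile(reference, probs = p, names = FALSE))
}

if (exists("pexer12") && exists("cexer12")) {
  # share of patient measurements above the control third quartile
  print(share_above_quantile(pexer12$exercise, cexer12$exercise))
  # mean vs median for each group; the gap reflects the right skew
  print(sapply(list(patient = pexer12$exercise, control = cexer12$exercise),
               function(v) c(mean = mean(v), median = median(v))))
}
```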
# side-by-side histograms to check the shape of each group's distribution before choosing an inference procedure
par(mfrow=c(1,2))
hist(pexer12$exercise, main = "Anorexic patients", xlab= "hrs exercise/wk")
hist(cexer12$exercise, main = "Control", xlab = "hrs exercise/wk")
Both distributions of exercise hours per week are skewed right. The two histograms use different axis scales, so a fair visual comparison would need common intervals on the axes. Because the individual observations are skewed rather than normal, I rely on the central limit theorem, which states that for a sufficiently large sample size the sampling distribution of the sample mean is approximately normal, even when the population is skewed.
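One way to put the two histograms on a common scale is to give them the same bins and the same y-axis limit. This is a sketch: the helper name `hist_common_scale` and the bin width of 2 hours are illustrative choices, and it assumes exercise hours are non-negative.

```r
# Hypothetical helper: side-by-side histograms with shared bins and a shared
# y-axis. Assumes all values are >= 0. Returns the breaks used, invisibly.
hist_common_scale <- function(x, y, main_x = "Anorexic patients",
                              main_y = "Control", width = 2) {
  upper  <- max(width, ceiling(max(x, y) / width) * width)
  breaks <- seq(0, upper, by = width)
  # tallest bar across both groups, computed without plotting
  ymax <- max(hist(x, breaks = breaks, plot = FALSE)$counts,
              hist(y, breaks = breaks, plot = FALSE)$counts)
  old <- par(mfrow = c(1, 2))
  on.exit(par(old))   # restore the previous plot layout
  hist(x, breaks = breaks, ylim = c(0, ymax), main = main_x, xlab = "hrs exercise/wk")
  hist(y, breaks = breaks, ylim = c(0, ymax), main = main_y, xlab = "hrs exercise/wk")
  invisible(breaks)
}

if (exists("pexer12") && exists("cexer12")) {
  hist_common_scale(pexer12$exercise, cexer12$exercise)
}
```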
To make use of this, I create a sampling distribution for each group. The following code draws 50 random samples of size n = 10 from each group and records the mean of each sample. As in the subsets above, the groups include only observations from participants older than 12, which I think is most representative of what I would like to compare: the difference between the average weekly exercise hours of the patient and control groups.
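The sampling loops that follow can also be written with base R's `replicate()`. A minimal sketch, with `sample_means` as a hypothetical helper name: since `replicate()` evaluates the same `sample()` calls in the same order, calling `set.seed(51919)` right before it should reproduce the same 50 means as the corresponding loop.

```r
# Hypothetical helper: means of n_samples random samples (drawn without
# replacement, sample()'s default) of the given size from x.
sample_means <- function(x, n_samples = 50, size = 10) {
  # sample(x, size) on a length-1 numeric x would sample from 1:x instead
  stopifnot(length(x) > 1, length(x) >= size)
  replicate(n_samples, mean(sample(x, size)))
}

if (exists("pexer12") && exists("cexer12")) {
  set.seed(51919); pex12means_alt <- sample_means(pexer12$exercise)
  set.seed(51919); cex12means_alt <- sample_means(cexer12$exercise)
}
```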
# draw 50 samples of size 10 from the patient data and store each sample mean
set.seed(51919)  # fix the random seed so the same samples are drawn on every run
pex12means <- rep(0, 50)
for (i in 1:50) {
  sam <- sample(pexer12$exercise, 10)
  pex12means[i] <- mean(sam)
}  # pex12means holds the 50 sample means
# draw 50 samples of size 10 from the control data and store each sample mean
set.seed(51919)  # fix the random seed so the same samples are drawn on every run
cex12means <- rep(0, 50)
for (i in 1:50) {
  sam2 <- sample(cexer12$exercise, 10)
  cex12means[i] <- mean(sam2)
}  # cex12means holds the 50 sample means
# side-by-side histograms of the two sampling distributions, to check that they are approximately normal before running a t-test
par(mfrow=c(1,2))
hist(pex12means, main = "Patient Sample Means", xlab= "hrs exercise/wk")
hist(cex12means, main = "Control Sample Means", xlab = "hrs exercise/wk")
Approximate Normality Achieved
The sampling distributions (n = 10) for both groups, patient and control, are approximately normal, as the histograms above show, so I can proceed with inference testing.
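Histograms of 50 values are a coarse check, so a normal Q-Q plot is a common complement. This sketch also reports a Shapiro-Wilk p-value (`shapiro.test` in base R's stats package). With only 50 means that test has limited power, so it supplements the plots rather than replacing them. `normality_check` is a hypothetical helper name.

```r
# Hypothetical helper: normal Q-Q plot plus Shapiro-Wilk p-value for one vector.
normality_check <- function(x, label) {
  qqnorm(x, main = paste("Normal Q-Q plot:", label))
  qqline(x)                  # reference line through the first and third quartiles
  shapiro.test(x)$p.value    # small values suggest departure from normality
}

if (exists("pex12means") && exists("cex12means")) {
  old <- par(mfrow = c(1, 2))
  print(c(patient = normality_check(pex12means, "patient"),
          control = normality_check(cex12means, "control")))
  par(old)
}
```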
#summary statistics for sampling distributions to determine whether or not data fits within 3 standard deviations of the mean
summary(pex12means)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.597 3.867 4.854 5.001 5.761 10.044
summary(cex12means)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.907 1.524 1.939 2.007 2.345 3.785
sd(pex12means)
## [1] 1.629129
sd(cex12means)
## [1] 0.60726
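Both sampling distributions are unimodal and roughly symmetric. The patient sample means are centered at 5.001 hours with a standard deviation of 1.63 hours, while the control sample means are centered at 2.007 hours with a standard deviation of 0.607 hours. Each sampling distribution is based on 50 sample means, each computed from a sample of n = 10. The mean plus or minus 3 standard deviations covers nearly all of the sample means: every control mean falls within 2.007 plus or minus 3(0.607), roughly 0.19 to 3.83 hours, while the largest patient mean (10.044) lies just above the patient upper bound of 5.001 + 3(1.629), about 9.89 hours.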
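The 3-standard-deviation check can be made explicit by computing the bounds and the share of sample means inside them. A minimal sketch; `within_k_sd` is a hypothetical helper name.

```r
# Hypothetical helper: bounds mean(x) +/- k * sd(x) and the share of x inside them.
within_k_sd <- function(x, k = 3) {
  center <- mean(x)
  spread <- k * sd(x)
  lower  <- center - spread
  upper  <- center + spread
  c(lower = lower, upper = upper, share_inside = mean(x >= lower & x <= upper))
}

if (exists("pex12means") && exists("cex12means")) {
  print(rbind(patient = within_k_sd(pex12means), control = within_k_sd(cex12means)))
}
```

Given the reported mean (5.001) and standard deviation (1.629) of `pex12means`, the patient upper bound is about 9.89, below the maximum of 10.044, so `share_inside` for the patient row should come out slightly below 1.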
Null hypothesis: There is no difference in average exercise time between the patient and control groups.
Alternative hypothesis: The patient group exercises more on average than the control group.
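In symbols, write μ_P and μ_C for the mean weekly exercise hours of patients and controls older than 12. The hypotheses and the Welch statistic that `t.test()` computes by default are shown below. The values substituted are the means and standard deviations of the 50 sample means reported in this section (n_P = n_C = 50), so this is a worked check of the R output rather than a separate analysis.

```latex
H_0:\ \mu_P - \mu_C = 0 \qquad H_a:\ \mu_P - \mu_C > 0

t = \frac{\bar{x}_P - \bar{x}_C}{\sqrt{s_P^2/n_P + s_C^2/n_C}}
  = \frac{5.00064 - 2.00738}{\sqrt{1.629129^2/50 + 0.60726^2/50}}
  \approx \frac{2.99326}{0.24588} \approx 12.17

\nu \approx \frac{\left(s_P^2/n_P + s_C^2/n_C\right)^2}
  {\dfrac{(s_P^2/n_P)^2}{n_P - 1} + \dfrac{(s_C^2/n_C)^2}{n_C - 1}} \approx 62.36
```

These agree with the t = 12.174 and df = 62.359 in the R output.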
# Welch two-sample t-test for a difference in means (t.test is two-sided by default)
t.test(pex12means, cex12means)
##
## Welch Two Sample t-test
##
## data: pex12means and cex12means
## t = 12.174, df = 62.359, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.501811 3.484709
## sample estimates:
## mean of x mean of y
## 5.00064 2.00738
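Because the p-value is essentially 0, I reject the null hypothesis. The test shown is two-sided; since t is positive, the one-sided p-value for the alternative that patients exercise more is half the two-sided value, so it is also essentially 0. There is strong evidence that the patient group exercises more on average than the control group.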
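To match the one-sided alternative directly, `t.test()` accepts `alternative = "greater"`. With that option it reports only a lower confidence bound (the upper bound is `Inf`). This sketch wraps the call in a hypothetical helper, `one_sided_welch`. Its output for this data is not reproduced here.

```r
# One-sided Welch two-sample t-test for H_a: mean of x > mean of y.
# t.test() applies the Welch correction by default (var.equal = FALSE).
one_sided_welch <- function(x, y) {
  t.test(x, y, alternative = "greater")
}

if (exists("pex12means") && exists("cex12means")) {
  print(one_sided_welch(pex12means, cex12means))
}
```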
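I am 95% confident that the true mean difference in weekly exercise hours between the anorexic patient group and the control group (among participants older than 12) is between 2.50 and 3.48 hours per week. This interval should be read as approximate: its width comes from the spread of 50 resampled means of size 10, so it depends on those choices (drawing more samples would narrow it) rather than directly on the number of observations.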
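For comparison, here is a sketch of an alternative the original analysis does not use: a one-sided Welch test run directly on the raw observations. It relies on the central limit theorem for each group's mean without resampling. It still treats each row as independent, although one subject can contribute more than one measurement after age 12, so it is an approximation too. The second helper shows how the resampled-means interval narrows as the number of samples grows. Both helper names are hypothetical.

```r
# (1) One-sided Welch test on the raw observations (one row per measurement).
raw_welch <- function(patient_x, control_x) {
  t.test(patient_x, control_x, alternative = "greater")
}

# (2) Width of the two-sided 95% CI when the t-test is run on resampled
#     means, as in the analysis above, for a chosen number of samples.
resampled_ci_width <- function(x, y, n_samples, size = 10) {
  mx <- replicate(n_samples, mean(sample(x, size)))
  my <- replicate(n_samples, mean(sample(y, size)))
  diff(t.test(mx, my)$conf.int)
}

if (exists("pexer12") && exists("cexer12")) {
  print(raw_welch(pexer12$exercise, cexer12$exercise))
  set.seed(51919)
  print(sapply(c(50, 500, 5000), function(k)
    resampled_ci_width(pexer12$exercise, cexer12$exercise, k)))
}
```

The width should shrink by roughly a factor of sqrt(10) for each tenfold increase in `n_samples`, because the standard error of the mean of k sample means scales as 1/sqrt(k).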
As stated in the introduction, “the original purpose of the data in its entirety was to examine if there was a link between physical activity in patients with anorexia nervosa as compared to a control group.” My statistical analysis provides evidence of such a link: among participants older than 12, anorexic patients have higher physical activity, on average, than control subjects of similar age.
```