Title: Estimation of physical activity levels using cell phone questionnaires: A comparison with accelerometry for evaluation of between-subject and within-subject variations
Abstract: Physical activity promotes health and longevity. From a business perspective, healthier employees are more likely to report to work, miss fewer days, and cost less in health insurance. Your business wants to encourage healthy lifestyles affordably through health care incentive programs. The use of telecommunication technologies such as cell phones is highly interesting in this respect. In an earlier report, we showed that physical activity level (PAL) assessed using a cell phone procedure agreed well with corresponding estimates obtained using the doubly labeled water method. However, our earlier study indicated high within-subject variation relative to between-subject variation in PAL using cell phones, and we could not assess whether this was true variation in PAL or an artifact of the cell phone technique. Objective: Our objective was to compare within- and between-subject variations in PAL by means of cell phones with corresponding estimates using an accelerometer. In addition, we compared the agreement of daily PAL values obtained using the cell phone questionnaire with corresponding data obtained using an accelerometer.
- Gender: male and female subjects were examined in this experiment.
- PAL_cell: average physical activity values for the cell phone accelerometer (range 0-100).
- PAL_acc: average physical activity values for the hand held accelerometer (range 0-100).
APA write ups should include means, standard deviation/error, t-values, p-values, effect size, and a brief description of what happened in plain English.
df <- read.csv("09_data.csv")
a) Include output and indicate how the data are not accurate.
The summary() output shows no accuracy problems: gender contains only "male" and "female", and PAL_cell and PAL_acc both fall within the 0-100 range.
b) Include output to show how you fixed the accuracy errors, and describe what you did.
There were no accuracy errors to fix. The only change was renaming the first column to gender, which makes the later analyses easier to write.
summary(df)
## ï..gender PAL_cell PAL_acc
## female:50 Min. :38.99 Min. :36.62
## male :50 1st Qu.:58.34 1st Qu.:61.70
## Median :65.58 Median :71.75
## Mean :64.50 Mean :71.85
## 3rd Qu.:70.92 3rd Qu.:83.32
## Max. :88.37 Max. :99.39
# Data are all accurate, just need to rename first column to be gender
names(df)[1] <- "gender"
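The ï..gender column name in the first summary comes from a byte-order mark (BOM) at the start of the CSV file. As an alternative to renaming, here is a minimal sketch, assuming the file is UTF-8 with a BOM: read.csv() can drop the mark through its fileEncoding argument. The temporary file and the name demo are stand-ins for illustration, not part of the assignment data.

```r
# Build a tiny stand-in CSV that starts with a UTF-8 byte-order mark (BOM).
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
csv_text <- "gender,PAL_cell,PAL_acc\nfemale,50,60\nmale,70,80\n"
writeBin(c(bom, charToRaw(csv_text)), tmp)

# fileEncoding = "UTF-8-BOM" asks R to strip the BOM while reading,
# so the first column arrives with a clean name.
demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(demo)
```

With the real file, the same argument would be added to the read.csv("09_data.csv") call.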
a) Include output that shows you have missing data.
The summary() output lists no NA counts for any column, so we don't have missing data.
b) Include output and a description that shows what you did with the missing data.
summary(df)
## gender PAL_cell PAL_acc
## female:50 Min. :38.99 Min. :36.62
## male :50 1st Qu.:58.34 1st Qu.:61.70
## Median :65.58 Median :71.75
## Mean :64.50 Mean :71.85
## 3rd Qu.:70.92 3rd Qu.:83.32
## Max. :88.37 Max. :99.39
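summary() only reports NA counts when they exist, so an explicit count per column can make the "no missing data" answer easier to verify. The sketch below uses a toy data frame (toy is a hypothetical stand-in for df, with one missing value added on purpose) to show what the check and a listwise deletion would look like.

```r
# Toy stand-in for df with one deliberately missing PAL_cell value.
toy <- data.frame(gender   = c("female", "male", "female"),
                  PAL_cell = c(56.1, NA, 60.3),
                  PAL_acc  = c(61.0, 75.2, 66.4))

# Number of missing values in each column.
missing_per_column <- colSums(is.na(toy))
missing_per_column

# Keep only the rows with no missing values (listwise deletion).
complete_rows <- toy[complete.cases(toy), ]
nrow(complete_rows)
```

Running colSums(is.na(df)) on the real data should return all zeros if the summary is right.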
a) Include a summary of your mahal scores that are greater than the cutoff.
Using the table() function, 0 Mahalanobis scores are greater than the cutoff.
b) What are the df for your Mahalanobis cutoff?
DF = 2 (# of variables used to calculate Mahalanobis)
c) What is the cut off score for your Mahalanobis measure?
Mahalanobis cut off score is 13.81551.
d) How many outliers did you have?
We don't have outliers.
e) Delete all outliers.
No outliers.
mahal <- mahalanobis(df[,2:3],
colMeans(df[,2:3]),
cov(df[,2:3]))
cutmahal <- qchisq(1-.001, ncol(df[,2:3]))
badmahal <- as.numeric(mahal > cutmahal)
table(badmahal)
## badmahal
## 0
## 100
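The cutoff of 13.81551 can be checked by hand: for a chi-square distribution with 2 degrees of freedom, the quantile at 1 - p has the closed form -2 * log(p). The sketch below confirms that and shows how outliers would be dropped if there were any. The data are simulated, and sim, sim_mahal, and noout are hypothetical names.

```r
# Cutoff for p < .001 with 2 variables (df = 2).
cutoff <- qchisq(1 - .001, df = 2)

# With df = 2 the chi-square quantile has a closed form.
closed_form <- -2 * log(.001)

# Simulated stand-in for the two PAL columns.
set.seed(1)
sim <- data.frame(PAL_cell = rnorm(100, mean = 65, sd = 11),
                  PAL_acc  = rnorm(100, mean = 72, sd = 15))
sim_mahal <- mahalanobis(sim, colMeans(sim), cov(sim))

# Keep only rows whose Mahalanobis distance is below the cutoff.
noout <- sim[sim_mahal < cutoff, ]
nrow(noout)
```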
a) We won't need to calculate a correlation table. Why not?
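Because each t-test uses only one continuous dependent variable at a time, there is no set of continuous variables whose intercorrelation (additivity) needs to be checked. PAL_cell and PAL_acc measure the activity of the same participants with two devices, so they are expected to be correlated, but that does not affect this assumption.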
a) Include a picture that shows how you might assess multivariate linearity.
b) Do you think you've met the assumption for linearity?
No, the assumption for linearity is not met.
random <- rchisq(nrow(df), 3)
fake <- lm(random ~ ., data = df)
summary(fake)
##
## Call:
## lm(formula = random ~ ., data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8520 -1.8292 -0.4033 1.1399 7.5384
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.355806 1.751545 0.774 0.441
## gendermale -0.132524 0.674084 -0.197 0.845
## PAL_cell 0.027514 0.039803 0.691 0.491
## PAL_acc -0.002578 0.026741 -0.096 0.923
##
## Residual standard error: 2.325 on 96 degrees of freedom
## Multiple R-squared: 0.0103, Adjusted R-squared: -0.02063
## F-statistic: 0.3329 on 3 and 96 DF, p-value: 0.8015
standardized = rstudent(fake)
qqnorm(standardized)
abline(0,1)
a) Include a picture that shows how you might assess multivariate normality.
b) Do you think you've met the assumption for normality?
No, the assumption for normality is not met. It is skewed to the right.
hist(standardized, breaks=15)
a) Include a picture that shows how you might assess multivariate homogeneity.
b) Do you think you've met the assumption for homogeneity?
No, the assumption for homogeneity is not met.
c) Do you think you've met the assumption for homoscedasticity?
No, the assumption for homoscedasticity is not met.
plot(fake, 1) # look at a residual scatterplot
Use the equal variances option to adjust for problems with homogeneity (if necessary). Because the assumption of homogeneity is not met, we set var.equal = FALSE.
Include means and sds for your groups.
Means: female: 56.55691 male: 72.44089
SDs: female: 8.304537 male: 7.207361
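One way to put a number on the homogeneity question, offered as a sketch and not as the method used in this write-up, is the ratio of the two group variances, rebuilt here from the reported SDs and group sizes. On the raw data, var.test(PAL_cell ~ gender, data = df) reports the same ratio. This F test is sensitive to non-normality, so it supplements the residual plot without replacing it.

```r
# Reported summary statistics for PAL_cell by gender.
sd_female <- 8.304537
sd_male   <- 7.207361
n_female  <- 50
n_male    <- 50

# Ratio of the two sample variances (female over male).
f_ratio <- sd_female^2 / sd_male^2

# Two-sided p-value from the F distribution with (n1 - 1, n2 - 1) df.
upper_tail <- pf(f_ratio, n_female - 1, n_male - 1, lower.tail = FALSE)
lower_tail <- pf(f_ratio, n_female - 1, n_male - 1)
p_value <- 2 * min(upper_tail, lower_tail)

c(F = f_ratio, p = p_value)
```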
# Homogeneity is not met, so use the Welch correction (var.equal = FALSE)
t.test(PAL_cell ~ gender, data = df,
var.equal = FALSE, paired = FALSE)
##
## Welch Two Sample t-test
##
## data: PAL_cell by gender
## t = -10.214, df = 96.096, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.97071 -12.79723
## sample estimates:
## mean in group female mean in group male
## 56.55691 72.44089
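The Welch statistic and its fractional degrees of freedom can be reproduced from the group means, SDs, and sizes alone. The sketch below applies the Welch-Satterthwaite formula to the values reported in this write-up. The variable names are illustrative.

```r
# Reported summary statistics for PAL_cell by gender.
m_female <- 56.55691; sd_female <- 8.304537; n_female <- 50
m_male   <- 72.44089; sd_male   <- 7.207361; n_male   <- 50

# Squared standard error contributed by each group.
se2_female <- sd_female^2 / n_female
se2_male   <- sd_male^2 / n_male

# Welch t: mean difference over the unpooled standard error.
t_welch <- (m_female - m_male) / sqrt(se2_female + se2_male)

# Welch-Satterthwaite approximation for the degrees of freedom.
df_welch <- (se2_female + se2_male)^2 /
  (se2_female^2 / (n_female - 1) + se2_male^2 / (n_male - 1))

c(t = t_welch, df = df_welch)
```

Both values agree with the t.test() output above (t = -10.214, df = 96.096).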
# Names such as group_means avoid overwriting the base functions mean() and sd()
group_means <- tapply(df$PAL_cell, df$gender, mean) # mean cell phone PAL for each gender
group_sds <- tapply(df$PAL_cell, df$gender, sd) # SD of cell phone PAL for each gender
group_ns <- tapply(df$PAL_cell, df$gender, length) # sample size for each gender
group_means
## female male
## 56.55691 72.44089
group_sds
## female male
## 8.304537 7.207361
group_ns
## female male
## 50 50
library(MOTE)
## Registered S3 methods overwritten by 'lme4':
## method from
## cooks.distance.influence.merMod car
## influence.merMod car
## dfbeta.influence.merMod car
## dfbetas.influence.merMod car
Effectsize <- d.ind.t(m1 = group_means[1], m2 = group_means[2],
sd1 = group_sds[1], sd2 = group_sds[2],
n1 = group_ns[1], n2 = group_ns[2], a = .05)
Effectsize$d
## female
## -2.042869
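The effect size can be checked without MOTE. The sketch below assumes d.ind.t() divides the mean difference by the pooled standard deviation, an assumption that is consistent with the value printed above.

```r
# Reported summary statistics for PAL_cell by gender.
m_female <- 56.55691; sd_female <- 8.304537; n_female <- 50
m_male   <- 72.44089; sd_male   <- 7.207361; n_male   <- 50

# Pooled standard deviation, weighting each variance by its df.
sd_pooled <- sqrt(((n_female - 1) * sd_female^2 + (n_male - 1) * sd_male^2) /
                    (n_female + n_male - 2))

# Cohen's d for independent groups: mean difference over pooled SD.
d_pooled <- (m_female - m_male) / sd_pooled
d_pooled
```

The sign is negative only because female is entered as the first group.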
Given the effect size found above, we should have used 5 participants in each group (10 in total) in this experiment.
library(pwr)
pwr.t.test(n = NULL, d = Effectsize$d,
sig.level = .05,
power = .80, type = "two.sample",
alternative = "two.sided")
##
## Two-sample t test power calculation
##
## n = 4.934352
## d = 2.042869
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
library(ggplot2)
cleanup = theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line.x = element_line(color = "black"),
axis.line.y = element_line(color = "black"),
legend.key = element_rect(fill = "white"),
text = element_text(size = 15))
bargraph = ggplot(df, aes(gender, PAL_cell))
bargraph +
cleanup +
stat_summary(fun = match.fun(mean), # fun replaces fun.y, which is deprecated since ggplot2 3.3.0
geom = "bar",
fill = "White",
color = "Black") +
stat_summary(fun.data = mean_cl_normal,
geom = "errorbar",
width = .2,
position = "dodge") +
xlab("Gender Group") +
ylab("Average PAL_cell")
A Welch independent-samples t-test showed a significant gender difference in the cell phone measurement of physical activity level, t(96.10) = -10.21, p < .001, d = -2.04, so we reject the null hypothesis. Women (M = 56.56, SD = 8.30) had lower cell phone PAL values than men (M = 72.44, SD = 7.21).
Run a dependent t-test to tell if there are differences in the cell phone and hand held accelerometer results.
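Means: PAL_cell: 64.49890 PAL_acc: 71.84861
SDs: PAL_cell: 11.11563 PAL_acc: 14.87863
The difference in ratings is significant: PAL_cell has a smaller mean and a smaller SD than PAL_acc.
The Welch test reported below gives t(183.26) = -3.96, p < .001, so we reject the null hypothesis: the cell phone and hand held accelerometer results differ significantly. That test treats the two sets of scores as independent samples (paired = FALSE). Because every participant contributes both scores, the dependent t statistic follows from the effect size on the difference scores as t(99) = d × sqrt(n) = -0.829654 × 10 ≈ -8.30, which leads to the same conclusion.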
library(reshape)
longdata <- melt(df,
id.vars = "gender", # column that identifies each row
measure.vars = c("PAL_cell", "PAL_acc")) # columns stacked into variable/value
# t test for differences in the cell phone and hand held accelerometer results
t.test(value ~ variable, data = longdata,
var.equal = FALSE, paired = FALSE)
##
## Welch Two Sample t-test
##
## data: value by variable
## t = -3.9573, df = 183.26, p-value = 0.0001083
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.014014 -3.685402
## sample estimates:
## mean in group PAL_cell mean in group PAL_acc
## 64.49890 71.84861
# calculate means for PAL_cell/PAL_acc group
M <- tapply(longdata$value, longdata$variable, mean)
# calculate stds for PAL_cell/PAL_acc group
sds <- tapply(longdata$value, longdata$variable, sd)
N <- tapply(longdata$value, longdata$variable, length)
M
## PAL_cell PAL_acc
## 64.49890 71.84861
sds
## PAL_cell PAL_acc
## 11.11563 14.87863
The effect size for this difference is d = -0.829654. We are using Cohen's d computed from the difference scores (mean difference divided by the SD of the differences).
differences <- df$PAL_cell - df$PAL_acc
library(MOTE)
effect_diff <- d.dep.t.diff(mdiff = mean(differences), sddiff = sd(differences),
n = length(differences), a = .05)
effect_diff$d
## [1] -0.829654
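Because each participant has both a PAL_cell and a PAL_acc score, the dependent t-test works on the 100 difference scores. The sketch below uses simulated data (sim_cell and sim_acc are hypothetical stand-ins, not the assignment data) to show the paired call and the identity t = d × sqrt(n) that links the test statistic to the effect size above.

```r
# Simulated paired scores: each row is one participant measured twice.
set.seed(42)
n <- 100
sim_acc  <- rnorm(n, mean = 72, sd = 15)
sim_cell <- sim_acc - 7 + rnorm(n, mean = 0, sd = 9)

# Dependent (paired) t-test on the two measurements.
paired_res <- t.test(sim_cell, sim_acc, paired = TRUE)

# Effect size on the difference scores, then t rebuilt from it.
sim_diff <- sim_cell - sim_acc
d_z <- mean(sim_diff) / sd(sim_diff)
t_from_d <- d_z * sqrt(n)

# The same identity applied to the effect size reported above (n = 100).
t_reported <- -0.829654 * sqrt(100)
t_reported
```

On the real data, the equivalent call would be t.test(df$PAL_cell, df$PAL_acc, paired = TRUE), with 99 degrees of freedom.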
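Given the effect size found above, the power analysis below returns n = 23.81, so we should have used 24 participants per group under the type = "two.sample" setting. For this repeated-measures design, type = "paired" is the matching option, and n would then count pairs of scores.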
library(pwr)
pwr.t.test(n = NULL, d = effect_diff$d,
sig.level = .05,
power = .80, type = "two.sample",
alternative = "two.sided")
##
## Two-sample t test power calculation
##
## n = 23.80526
## d = 0.829654
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
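For comparison, here is a sketch of the same power question framed for a paired design, using base R's power.t.test() so that no extra package is needed. With sd = 1, delta is the standardized effect on the difference scores. The absolute value of the reported effect size is used because only its magnitude matters for power.

```r
# Power analysis for a paired design: n is the number of pairs.
paired_power <- power.t.test(delta = 0.829654,  # |d| from the difference scores
                             sd = 1,            # delta is already standardized
                             sig.level = .05,
                             power = .80,
                             type = "paired",
                             alternative = "two.sided")

# Round up to whole participants.
pairs_needed <- ceiling(paired_power$n)
pairs_needed
```

A paired design needs fewer participants than the two-sample figure because each person serves as their own control.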
library(ggplot2)
cleanup = theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line.x = element_line(color = "black"),
axis.line.y = element_line(color = "black"),
legend.key = element_rect(fill = "white"),
text = element_text(size = 15))
bargraph = ggplot(longdata, aes(variable, value))
bargraph +
cleanup +
stat_summary(fun = match.fun(mean), # fun replaces fun.y, which is deprecated since ggplot2 3.3.0
geom = "bar",
fill = "White",
color = "Black") +
stat_summary(fun.data = mean_cl_normal,
geom = "errorbar",
width = .2,
position = "dodge") +
xlab("PAL_cell/PAL_acc Group") +
ylab("Average Value")
List the null hypothesis for the dependent t-test. The null hypothesis for the dependent t-test above is that the mean difference between the cell phone and hand held accelerometer results is 0.
List the research hypothesis for the dependent t-test. The research hypothesis for the dependent t-test above is that the mean difference between the cell phone and hand held accelerometer results is not 0.
If the null were true, what would we expect the mean difference score to be? If the null were true, we would expect the mean difference score to be 0.
If the null were false, what would we expect the mean difference score to be? If the null were false, we would expect the mean difference score to be different from 0. Here the 95% confidence interval for the difference is (-11.014014, -3.685402).
In our formula for dependent t, what is the estimation of systematic variance? Systematic variance: the mean difference between PAL_cell and PAL_acc, which is the numerator of the dependent t formula.
In our formula for dependent t, what is the estimation of unsystematic variance? Unsystematic variance: the standard error of the mean difference between PAL_cell and PAL_acc, which is the denominator of the dependent t formula.
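The two answers above can be read directly off the dependent t formula. In the notation assumed here, \bar{D} is the mean of the difference scores, s_D is their standard deviation, n is the number of participants, and \mu_D is the difference expected under the null hypothesis (0):

```latex
t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}}
  = \frac{\text{systematic variance (mean difference)}}
         {\text{unsystematic variance (standard error of the difference)}},
\qquad df = n - 1
```

With \mu_D = 0, this reduces to t = (\bar{D} / s_D)\sqrt{n}, which is the effect size on the difference scores multiplied by \sqrt{n}.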