What is our question?

Is there a statistically significant difference in the number of steps taken daily by males and females?

# Load necessary libraries
library(readr)
library(ggplot2)
library(ggthemes)
library(BSDA)
## Loading required package: lattice
## 
## Attaching package: 'BSDA'
## The following object is masked from 'package:datasets':
## 
##     Orange
# Load data
stepsM <- c(1756,6042,23567,16156,5641,3943,5255,10610,8162,4927,13476,1506,19336,4940,4847,16313,18710,16758,7874)

stepsFM <- c(3504,1260,23748,7990,1405,10766,2566,12772,14879,8020,14985,21070,1127,17954,10432,7514,13934,9710,16846)

Level 1: Eyeball the data. Can you tell which group has a higher mean?

# Calculate and display summary statistics like the mean for males
summary(stepsM)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1506    4934    7874    9990   16234   23567

Level 2: Visualize the data. Can you tell which group has a higher mean?

# Visualize the data
hist(stepsM)

# Calculate and display summary statistics like the mean for females
summary(stepsFM)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1127    5509   10432   10552   14932   23748
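# Merge the two datasets into one data frame for visualization
# Caution: merge() on two vectors with no shared key returns every pairing
# (19 x 19 = 361 rows), so tests run on stepsGender use n = 361 per group, not 19
stepsGender <- merge(stepsM, stepsFM)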
str(stepsGender)
## 'data.frame':    361 obs. of  2 variables:
##  $ x: num  1756 6042 23567 16156 5641 ...
##  $ y: num  3504 3504 3504 3504 3504 ...
# Name the columns appropriately "StepsMale" and "StepsFemale"
colnames(stepsGender) <- c("StepsMale", "StepsFemale")
head(stepsGender)
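
One possible alternative for the visualization step, sketched under the assumption that each vector holds 19 independent daily step counts: `data.frame()` keeps one row per observation, and a long-format copy allows side-by-side boxplots. The names `stepsWide` and `stepsLong` are illustrative and do not appear in the original notes.

```r
# Raw data from the top of the notes
stepsM <- c(1756,6042,23567,16156,5641,3943,5255,10610,8162,4927,13476,1506,19336,4940,4847,16313,18710,16758,7874)
stepsFM <- c(3504,1260,23748,7990,1405,10766,2566,12772,14879,8020,14985,21070,1127,17954,10432,7514,13934,9710,16846)

# One row per observation (19 rows), rather than every male/female pairing
stepsWide <- data.frame(StepsMale = stepsM, StepsFemale = stepsFM)

# Long format: one column for the group label, one for the step count
stepsLong <- data.frame(
  Gender = rep(c("Male", "Female"), times = c(length(stepsM), length(stepsFM))),
  Steps  = c(stepsM, stepsFM)
)

# Side-by-side boxplots make the two centers easier to compare than a single histogram
boxplot(Steps ~ Gender, data = stepsLong, ylab = "Steps per day")

# Group means for reference
tapply(stepsLong$Steps, stepsLong$Gender, mean)
```

The wide frame is convenient for `summary()`, while the long frame is the shape most plotting functions expect.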

Level 3: Conduct a variety of inferential tests. But which one makes the most sense?

Recall NHST in five steps:

1. State the null and alternative hypotheses (H)

H0 : μ… HA : μ…

2. Check your conditions and assumptions (A)

The sample is a… The sample size (n = ?) is less than 10% of the population size (unknown?)?

So, we’d typically do a t-test when the population standard deviation is unknown (especially if n < 30) and a z-test when it is known or when n ≥ 30.

alpha = ? or ?

3. Calculate the test statistic (TS)

Manual

sample_mean <- mean(df$column)
sample_sd <- sd(df$column)
n <- nrow(df)

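# Two-sample version: the standard error combines the spread of both groups
t_score <- (sample_mean_male - sample_mean_fm) / sqrt(sample_sd_male^2 / n_male + sample_sd_fm^2 / n_fm)
t_score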

Computer-assisted

ttest_result <- t.test(df, mu = ?, conf.level = ?)

Visualize the t-distribution and the t-statistic

library(gginference)
ggttest(ttest_result, colaccept = "lightsteelblue1", colreject = "pink", colstat = "navyblue")

4. Find the p-value (P)

Manual

# Two-sided p-value; df = n - 1 for one sample, n1 + n2 - 2 for a pooled two-sample test
p_value <- 2 * pt(-abs(t_score), df = n - 1)
p_value

Computer-assisted t.test()
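
To make steps 3 and 4 concrete, here is a minimal sketch of the manual calculation on the raw 19-value vectors, checked against `t.test()`. It assumes a pooled (equal-variance) two-sample test; the names `pooled_var`, `se_diff`, `df_pooled`, and `check` are illustrative.

```r
stepsM <- c(1756,6042,23567,16156,5641,3943,5255,10610,8162,4927,13476,1506,19336,4940,4847,16313,18710,16758,7874)
stepsFM <- c(3504,1260,23748,7990,1405,10766,2566,12772,14879,8020,14985,21070,1127,17954,10432,7514,13934,9710,16846)

n_m <- length(stepsM)
n_f <- length(stepsFM)

# Pooled variance: weighted average of the two sample variances
pooled_var <- ((n_m - 1) * var(stepsM) + (n_f - 1) * var(stepsFM)) / (n_m + n_f - 2)

# Standard error of the difference in means
se_diff <- sqrt(pooled_var * (1 / n_m + 1 / n_f))

# Test statistic and its degrees of freedom
t_score <- (mean(stepsM) - mean(stepsFM)) / se_diff
df_pooled <- n_m + n_f - 2

# Two-sided p-value: area in both tails beyond |t|
p_value <- 2 * pt(-abs(t_score), df = df_pooled)

# Compare with the built-in test
check <- t.test(stepsM, stepsFM, var.equal = TRUE)
c(manual_t = t_score, builtin_t = unname(check$statistic))
c(manual_p = p_value, builtin_p = check$p.value)
```

Using `-abs(t_score)` keeps the p-value correct whether the statistic is negative or positive.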

5. Make a decision (D)

The p-value ? >/</= ?, so we reject/fail to reject the null hypothesis. There is/is not enough evidence to suggest that…

Remember the mnemonic:

HATS-PD or HATS-Police Department

1. Hypotheses

2. Assumptions

3. Test statistic

4. Significance (p-value)

5. Decision

But first, let’s test the assumption of equal variances.

Are our variances equal or not?

The F test in R can help us determine that.

First, let’s do it manually…

var(stepsM)
## [1] 44596292
var(stepsFM)
## [1] 46065055
f_ratio <- var(stepsM) / var(stepsFM)
f_ratio
## [1] 0.9681155

Now let’s use base R code to do it for us…

var.test(stepsGender$StepsMale, stepsGender$StepsFemale)
## 
##  F test to compare two variances
## 
## data:  stepsGender$StepsMale and stepsGender$StepsFemale
## F = 0.96812, num df = 360, denom df = 360, p-value = 0.7587
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.7871552 1.1906769
## sample estimates:
## ratio of variances 
##          0.9681155
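
The F test's p-value can also be reproduced by hand. The sketch below runs on the raw 19-value vectors, so each group has 18 degrees of freedom rather than the 360 reported for `stepsGender`. The names `p_lower`, `p_f`, and `check_f` are illustrative.

```r
stepsM <- c(1756,6042,23567,16156,5641,3943,5255,10610,8162,4927,13476,1506,19336,4940,4847,16313,18710,16758,7874)
stepsFM <- c(3504,1260,23748,7990,1405,10766,2566,12772,14879,8020,14985,21070,1127,17954,10432,7514,13934,9710,16846)

# Ratio of sample variances and the degrees of freedom for each group
f_ratio <- var(stepsM) / var(stepsFM)
df1 <- length(stepsM) - 1
df2 <- length(stepsFM) - 1

# Two-sided p-value: double the smaller tail area of the F distribution
p_lower <- pf(f_ratio, df1, df2)
p_f <- 2 * min(p_lower, 1 - p_lower)

# Compare with the built-in test on the same vectors
check_f <- var.test(stepsM, stepsFM)
c(manual_p = p_f, builtin_p = check_f$p.value)
```

The ratio itself does not depend on which data frame is used, but the p-value does, because it depends on the degrees of freedom.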

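Yay! The F-ratios match, and at about 0.968 they’re close to 1. With p = 0.7587 we fail to reject the hypothesis of equal variances, so the two variances look very similar.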

# Conduct a pooled t-test (assumes equal variances)
t.test(stepsGender$StepsMale, stepsGender$StepsFemale, var.equal=TRUE)
## 
##  Two Sample t-test
## 
## data:  stepsGender$StepsMale and stepsGender$StepsFemale
## t = -1.149, df = 720, p-value = 0.251
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1520.1651   397.7441
## sample estimates:
## mean of x mean of y 
##  9990.474 10551.684
# Visualize the t-statistic within the t-distribution and whether or not it falls in the rejection region
library(gginference)
ggttest(t.test(stepsGender$StepsMale, stepsGender$StepsFemale,
               var.equal=TRUE),
        colaccept = "skyblue",
        colreject = "maroon",
        colstat = "black")
## Warning: `geom_vline()`: Ignoring `data` because `xintercept` was provided.

# So, is the t-statistic extremely small or large enough to fall in the rejection regions?

Class response…

# Conduct a Welch t-test (does not assume equal variances)
t.test(stepsGender$StepsMale, stepsGender$StepsFemale, var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  stepsGender$StepsMale and stepsGender$StepsFemale
## t = -1.149, df = 719.81, p-value = 0.251
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1520.1655   397.7445
## sample estimates:
## mean of x mean of y 
##  9990.474 10551.684
# Visualize the t-statistic within the t-distribution and whether or not it falls in the rejection region
library(gginference)
ggttest(t.test(stepsGender$StepsMale, stepsGender$StepsFemale,
               var.equal=FALSE),
        colaccept = "skyblue",
        colreject = "maroon",
        colstat = "black")
## Warning: `geom_vline()`: Ignoring `data` because `xintercept` was provided.
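
The rejection-region question can also be checked numerically. This sketch plugs in the statistic and degrees of freedom reported in the pooled output above (t = -1.149, df = 720) and assumes alpha = 0.05, which matches the 95 percent confidence level but is not fixed anywhere in the notes. The names `t_reported`, `t_crit`, and `in_rejection_region` are illustrative.

```r
# Values copied from the printed t.test() output
t_reported <- -1.149
df_reported <- 720

# Assumed significance level (two-sided)
alpha <- 0.05

# Critical value: the cutoff that leaves alpha / 2 in each tail
t_crit <- qt(1 - alpha / 2, df = df_reported)

# The statistic is in the rejection region only if it lies beyond the cutoff
in_rejection_region <- abs(t_reported) > t_crit

# Two-sided p-value recovered from the reported statistic
p_two_sided <- 2 * pt(-abs(t_reported), df = df_reported)

t_crit
in_rejection_region
p_two_sided
```

Comparing |t| with the critical value and comparing the p-value with alpha always lead to the same decision.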

BONUS: USE IF WE HAVE TIME

But what if we knew the population standard deviations? Would a z-test be more appropriate? Do we know the population standard deviations in reality?

We don’t really know for UMD students if we’re looking at all students, but let’s say we did.

Studies report the following US averages:

https://www.healthline.com/health/average-steps-per-day#sex describes a systematic review of studies from 1995 to 2015: https://pmc.ncbi.nlm.nih.gov/articles/PMC5397769/

“A 2010 study looked at pedometer data for just over 1,000 adults. Overall, males took an average of 5,340 steps per day, compared to 4,912 for females.”

Male = 5,340 steps/day

Female = 4,912 steps/day

Difference = 428 steps/day

But I can’t find the exact place in the article where the numbers appear!

Gender differences in total walking

There was no evidence for a gender difference in the prevalence of walking for any purpose in studies including all ages from the USA. Data reported by age group (in two studies from the USA, Fig. 4b) suggest that at younger ages more women walk than men, but at older ages the gender difference is very small. However, both Australian studies looking at wide age ranges reported that the prevalence of walking was higher in women than in men (Fig. 4a).

A study conducted across nearly all adult ages in the Czech Republic reported that a greater proportion of women than men walked for 150 min per week (OR 1.46 (1.34, 1.68)) (Frömel, Czech Republic, 25+) [26].

In older age groups, Hörder (Sweden, 75) [28] found no significant difference in the proportion of women and men achieving 75 min walking per week (OR 0.97 (0.63, 1.37)), whereas Satariano (US, 65+) [44] found that more men than women walked for more than 150 min per week (OR 0.56 (0.43, 0.74)).

Three studies reported on gender differences in ‘regular substantial walking’. One found that fewer women engaged in ‘regular substantial walking’ than men (OR 0.90 (0.88, 0.92)) (Ryu, South Korea, 19+) [42]. Granner (USA, 18+) [27] and Wen (USA, 18+) [52] reported non-significant odds ratios of 0.90 (0.75, 1.08) and 0.96 (0.90, 1.03) respectively (Wen adjusted for age, ethnicity, marital status, employment, education, income and weight category).

??z.test
# Conduct a z-test 
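# Did I do this correctly?
# Not quite: x and y below are both StepsFemale, so this run compares the female
# sample with itself (both sample means in the output are 10551.68).
# To compare the groups, x should be stepsGender$StepsMale.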

z.test(
  stepsGender$StepsFemale,
  y = stepsGender$StepsFemale,
  alternative = "two.sided",
  mu = 428,
  sigma.x = 1250, #JP made these numbers up to illustrate the z-test
  sigma.y = 1500, #JP made these numbers up to illustrate the z-test
  conf.level = 0.95
)
## 
##  Two-sample z-Test
## 
## data:  stepsGender$StepsFemale and stepsGender$StepsFemale
## z = -4.1648, p-value = 3.116e-05
## alternative hypothesis: true difference in means is not equal to 428
## 95 percent confidence interval:
##  -201.4185  201.4185
## sample estimates:
## mean of x mean of y 
##  10551.68  10551.68
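
For comparison, here is a sketch of the two-sample z-statistic computed by hand in base R, with the male vector as x and the female vector as y. It reuses the made-up population standard deviations (1250 and 1500) and the hypothesized difference of 428 from the call above, and it runs on the raw 19-value vectors. The names `sigma_m`, `sigma_f`, `mu0`, `se_z`, and `z_score` are illustrative.

```r
stepsM <- c(1756,6042,23567,16156,5641,3943,5255,10610,8162,4927,13476,1506,19336,4940,4847,16313,18710,16758,7874)
stepsFM <- c(3504,1260,23748,7990,1405,10766,2566,12772,14879,8020,14985,21070,1127,17954,10432,7514,13934,9710,16846)

# Made-up population standard deviations, as in the notes
sigma_m <- 1250
sigma_f <- 1500

# Hypothesized difference in means (male minus female)
mu0 <- 428

# Standard error of the difference when population SDs are treated as known
se_z <- sqrt(sigma_m^2 / length(stepsM) + sigma_f^2 / length(stepsFM))

# z-statistic and two-sided p-value from the standard normal distribution
z_score <- (mean(stepsM) - mean(stepsFM) - mu0) / se_z
p_z <- 2 * pnorm(-abs(z_score))

c(z = z_score, p = p_z)

# The equivalent BSDA call would be:
# BSDA::z.test(stepsM, y = stepsFM, mu = 428, sigma.x = 1250, sigma.y = 1500)
```

Because the standard deviations are invented for illustration, the resulting p-value should not be read as a real finding.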

Interpret the results of the two-sample z-test from above

z = ?, p = ? # So, is the z-statistic extremely small or large enough to fall in the rejection regions?

P-value

…Class response….

JP: Check that my interpretation is correct here. # A confidence interval is a range of values, computed from the sample, that is intended to capture the true population parameter. The confidence level is the long-run percentage of such intervals that would contain the true parameter if the same sampling procedure were repeated many times.
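
The "repeated sampling" reading of the confidence level can be illustrated with a small simulation. This is a sketch under invented assumptions: a normal population with mean 10000 and standard deviation 6500, samples of size 19, and 2000 repetitions. None of these values come from the notes.

```r
set.seed(1)

# Hypothetical population and sampling setup (assumptions for illustration only)
true_mean <- 10000
true_sd <- 6500
n <- 19
reps <- 2000

# For each repetition: draw a sample, build a 95% CI, record whether it covers the true mean
covered <- replicate(reps, {
  s <- rnorm(n, mean = true_mean, sd = true_sd)
  ci <- t.test(s, conf.level = 0.95)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Proportion of intervals that captured the true mean; should be near 0.95
coverage <- mean(covered)
coverage
```

Each individual interval either contains the true mean or it does not; the 95% describes how often the procedure succeeds across many samples.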