I have added the effect size to all hypothesis tests. Latex is used in the document.

Chi-squared – Cramer’s V ($\phi_c$)
- $\phi_c \in [0.1,0.3]$: weak association
- $\phi_c \in [0.4,0.5]$: medium association
- $_c > 0.5 $: strong association
Kruskal Wallis Test by Rank – Eta-squared ($\eta^2$)
- $\eta^2 = 0.01$ indicates a small effect;
- $\eta^2 = 0.06$ indicates a medium effect;
- $\eta^2 = 0.14$ indicates a large effect.
Pairwise Wilcoxon Signed Ranks Test – (Cohen) $r = z/\sqrt{n}$; z statistic; n: sample size number of pairs
- 0.1 (small effect),
- 0.3 (moderate effect) and
- 0.5 and above (large effect).
- P adjustment method “BH” is used.

$\phi_c$, $\eta^2$, and $r$ of pairwise Wilcoxon are incomparable.

The effect size of Wilcoxon test is not explained well in the original source, but this method of calculation is provided by both R and SPSS:

Rosenthal, R. (1991). Applied Social Research Methods: Meta-analytic procedures for social research Thousand Oaks, CA: SAGE Publications Ltd doi: 10.4135/9781412984997

You can view significant data by limiting p value and effect size (e.g. p <0.05, effect.size $\in [0.4,\infty]$).

# function that converts group notations (P,S,C) to more readable texts
group.legend = function(data){
    new = data %>%
    mutate(group = case_when(
        group == "control" ~ "Control",
        group == "single" ~ "20 minuted 1X/day",
        TRUE ~ "10 minutes 2X/day"
    ))
    return(new)
}

1.0 Group differences

in all demographic variables and in exam grade, hours studying, & self-reported study productivity at day 6 and 7

Data preparation

AEQ_sum <- function(data){
    new = data %>%
    pivot_longer(-emory.id, names_to = c("subscale","q_num"), names_pattern = "(.*).(.)", values_to = "score") %>%
    group_by(emory.id, subscale) %>%
    summarise(score = sum(score)) %>%
    ungroup()
    return(new)
}

# Define factor levels of 3; this step is useful for future correlation analysis and data display
levels = as.character(c(1:3))
labels = c("Barely concerned", "Somewhat concerned", "Very Concerned")

# The variable "data" is used for 1.1 
# Do not remove it until 1.2 Numeric Data

data <- datasets.raw$pre_study %>% select(1:35) %>%
    left_join(select(datasets$other, emory.id, group)) %>%
    mutate(
        # Separate Race, religious affiliation, family affiliation, and types of meditation practiced using ';'
           race = strsplit(race,";"),
           religion = strsplit(religion,";"),
           religion.family = strsplit(religion.family,";"),
           meditation.type = strsplit(meditation.type,";"),
           major = strsplit(major,";"),
           
           worry = factor(worry, levels = as.character(c(1:5))),
           concern.1 = factor(concern.1, levels=levels),
           concern.2 = factor(concern.2, levels=levels),
           concern.3 = factor(concern.3, levels=levels),
           concern.4 = factor(concern.4, levels=levels),
           concern.5 = factor(concern.5, levels=levels),
           concern.6 = factor(concern.6, levels=levels),
           
           religious.level = factor(religious.level , levels=as.character(c(1:5))),
           meditation.freq = factor(meditation.freq, levels=as.character(c(1:4))),
           meditation.times = factor(meditation.times, levels=as.character(c(1:4))),
           
           relax = factor(relax, levels=as.character(c(1:5))),
           focus = factor(focus, levels=as.character(c(1:5))),
           exam.score = factor(exam.score, levels=as.character(c(1:5))),
           
           ) %>%
    group.legend()


levels(data$concern.1) <- c("Barely concerned", "Somewhat concerned", "Very Concerned")
levels(data$concern.2) <- c("Barely concerned", "Somewhat concerned", "Very Concerned")
levels(data$concern.3) <- c("Barely concerned", "Somewhat concerned", "Very Concerned")
levels(data$concern.4) <- c("Barely concerned", "Somewhat concerned", "Very Concerned")
levels(data$concern.5) <- c("Barely concerned", "Somewhat concerned", "Very Concerned")
levels(data$concern.6) <- c("Barely concerned", "Somewhat concerned", "Very Concerned")

freq <- c("Never", "Weekly", "Monthly", "Other")
times <- c("Never tried it", "Tried it 1-10 times", "Tried it 10-100 times", "Tried it 100-1000 times")
levels(data$meditation.freq) <- freq
levels(data$meditation.times) <- times


rm(list=ls()[! ls() %in% c("datasets","datasets.raw", "group.legend", "data")])

1.1 Categorical Variables

Ordinal variables:
- Meditation frequency, meditation times, Difficulty, worry, and concerns (6 questions), religious level,
- Attitude (Likert Scale): meditation helps me relax, focus, improve exam score
- self-reported study productivity on day 6 and 7
Gender, Grade, and Ethnicity does not need any change before analysis
- gender, selection bias
Separate Race, major, religious affiliation, family affiliation, and types of meditation practiced using ‘;’

You can interact with the following tables.

# Ordinal Variables
colnames1 <- c("gender", "grade","ethnicity","meditation.freq","meditation.times")

concerns = paste0("concern.",c(1:6))
concerns.label = paste("If you were to get a bad grade, how concerned would you be about:", 
    c("Disappointing your parents", 
        "Embarrassing yourself amongst friends",
        "Embarrassing yourself amongst professors",
        "Not getting the job or into the grad school that you hope to",
        "Feeling unintelligent",
        "Feeling lazy"))

colnames = c(colnames1, "difficulty", "worry", concerns, "religious.level",
         "relax","focus","exam.score")

colnames.labels = c(colnames1, "Difficulty of the selected course", "How worried are you about this exam?",
           concerns.label, 
           "How religious do you consider yourself?",
           paste("Meditation helps me", c("relax", "focus","improve my exam score")))

colnames2 = c("race", "major", "religion", "religion.family", "meditation.type")

productivity = c("productivity6", "productivity7")
productivity.label = c("productivity day 6", "productivity day 7")

# function to calculate CI
CI.categorical = function(data, colname){ # colname is a string
    new <- data %>%
        group_by_('group',colname) %>% # use the 'standard evaluation' counterpart of `group_by`
        
        # ---*** Warning! some subjects left the question `meditation.freq` in blank ***---
        filter(!is.na(.data[[colname]])) %>% # To refer to column names that are stored as strings, use the `.data` pronoun
        summarise(count = n()) %>%
        ungroup() %>%
        group_by(group) %>%
        mutate(sum = sum(count),
               proportion = count / sum) %>%
        group_by_('group',colname) %>% # use the 'standard evaluation' counterpart of `group_by`
        mutate(lower = prop.test(count, sum)$conf.int[1], # CI.formula(sum, proportion)[1],
               upper = prop.test(count, sum)$conf.int[2], # CI.formula(sum,proportion)[2], 
            ) %>%
        ungroup() %>%
        mutate(variable = colname) %>%
        # rename_('value' = colname) %>%
        select(group, variable, value=colname, proportion, lower, upper) %>% # remove original count of frequences of categorical value and the sum of all instences
        arrange_('value')
    return(new)
}

# function for Chi-squared Hypothesis Testing 
chi <- function(data, var1="group", var2){ # pass the variable name instead of the variable
    # print(sprintf("Chi-square test of independence between %s and %s:",var1, var2))
    tbl = table(data[[var1]], data[[var2]])
    
    result <- chisq.test(data[[var1]], data[[var2]])
    
    if(result$p.value <= 0.05){
        print("There is statistically significant differences acrosss three groups.")
    }

    n = nrow(data)
    effect = sqrt((result$statistic)/(n * result$parameter)) # parameter is the degree of freedom, calculates Cramer's V
    result_vector = c("Chi-squared", var1,var2, result$statistic, result$parameter, n, result$p.value, effect)
    return(result_vector)
}

CI = data.frame(matrix(ncol = 6, nrow = 0))
chisq.result = data.frame(matrix(ncol = 8, nrow = 0))
colnames(chisq.result) = c("test.type","var1", "var2","statistic", "df", "n", "p.value", "effect.size")

for(colname in colnames){
    # CI.categorical
    result = CI.categorical(data, colname)
    CI = rbind(CI, result)
    
    # Hypothesis Testing
    result = chi(data, "group", colname)
    chisq.result[nrow(chisq.result) + 1,] = result
    gc(verbose = FALSE) # free some RAM
}

## [1] "There is statistically significant differences acrosss three groups."

##########
# variables that are nested: 
# race, major, religion, religion.family, meditation.type (with multiple values stored as vectors inside a cell)
##########

for(colname in colnames2){
    # CI
    tmp = data %>% select(group, colname) %>%
        unnest(colname) # unnest the variable
    result = CI.categorical(tmp, colname)
    CI = rbind(CI, result)
    
    # Hypothesis Testing
    result = chi(tmp, "group", colname)
    chisq.result[nrow(chisq.result) + 1,] = result
}

# modified mood.6 and mood.7

mood.7 <- datasets.raw$daily %>%
    filter(day == 7) %>%
    mutate(emory.id = as.integer(emory.id))

productivity67 <- datasets.raw$daily %>%
    filter(day == 6) %>% select(emory.id, productivity, day) %>%
    bind_rows(select(mood.7, emory.id, productivity, day)) %>%
    left_join(datasets$other %>% select(emory.id, group), by = "emory.id")

# Create Productivity Dataset separately for day 6 and 7
productivity6 <- productivity67 %>%
    filter(day == 6) %>%
    select(group, productivity) %>%
    group.legend() %>%
    rename(productivity6 = productivity)
productivity7 = productivity67 %>%
    filter(day == 7) %>%
    select(group, productivity) %>%
    group.legend() %>%
    rename(productivity7 = productivity)

productivity67 <- list(productivity6,productivity7)


# use the functions
for(var in productivity){
    # CI.categorical
    i = which(productivity == var)
    result = CI.categorical(productivity67[[i]], var)
    CI = rbind(CI, result)
    
    # Hypothesis Testing
    result = chi(productivity67[[i]], "group", var)
    chisq.result[nrow(chisq.result) + 1,] = result
    gc(verbose = FALSE) # free some RAM
}



# Display
datatable(
  CI, filter = 'top', extensions = 'Buttons', options = list(
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print')
  )
)

datatable(
  chisq.result, filter = 'top', extensions = 'Buttons', options = list(
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print')
  )
)

# remove unnecessary data and functions
CI.categorical = CI
rm(list=ls()[! ls() %in% c("datasets","datasets.raw", "group.legend", "data", "chisq.result", "CI.categorical")])

Most feared class and Reason to meditate are not included.

1.2 Numeric Baseline Data

Discrete variables: credits, age
Continuous variables: expected grade for the course, and estimated GPA
Group differences in exam grade and hours studying

Analysis of the differences of demographic variables across groups:

# Create variable names and labels
vars.numeric = c("credits","age",
         "GPA.previous", "grade.predict")
labels.numeric = c("Credits of the current semester","Age", "Expected grade for the most challenging course",
           "Estimated GPA of the previous semester")


CI.numeric <- function(data, colname){
    new = data %>%
        group_by(group) %>%
        filter(!is.na(.data[[colname]])) %>%
        summarise(
            mean = t.test(.data[[colname]], conf.level = 0.95)$estimate,
            lower = t.test(.data[[colname]], conf.level = 0.95)$conf.int[1],
            lower = case_when(lower < 0 ~ 0, TRUE ~ lower),
            upper = t.test(.data[[colname]], conf.level = 0.95)$conf.int[2]) %>%
        ungroup() %>%
        mutate(variable = colname) %>%
        select(group, variable, mean, lower, upper)
    return(new)
}

kruskal_wilcox <- function(data, colname){
    # Kruskal-Wallis rank sum test
    kruskal = kruskal_test(reformulate('group',colname), data = data) %>% # y~x
        rename(test.type = method,
               y = .y.,
               p.value = p) %>%
        mutate(x = "group", 
               effect.size = kruskal_effsize(data, reformulate('group',colname))$effsize) %>% # incorporate effect size
        select(test.type, x, y, statistic, df, n, p.value, effect.size)

    # Multiple pairwise Wilcox unpaired test between groups
    if(kruskal$p.value <=0.05){
        wilcox = pairwise_wilcox_test(data, reformulate('group',colname), p.adjust.method = "BH") %>% # y~x
        rename(y = .y.,
               p.value = p) %>% # P adjusted or raw P value?
        mutate(var1 = "group", 
               test.type = "pairwise Wilcox",
               effect.size = wilcox_effsize(data, reformulate('group',colname))$effsize) %>% # incorporate effect size
        select(test.type, group1, group2, y, statistic, n1, n2, p.value, p.adj, effect.size)

        return(list(kruskal, wilcox))
    }
    else{
        return(list(kruskal))
    }
}

CI = data.frame(matrix(ncol = 5, nrow = 0))
kruskal.group.diff = data.frame(matrix(ncol = 8, nrow = 0))
wilcox.group.diff = data.frame(matrix(ncol = 10, nrow = 0))

for(var in vars.numeric){
    # CI
    result = CI.numeric(data, var)
    CI = rbind(CI, result)
    # Hypothesis Testing
    result = kruskal_wilcox(data %>% select(group, var), var)
    kruskal.group.diff = rbind(kruskal.group.diff, result[[1]])
    if(length(result) > 1){
        wilcox.group.diff = rbind(wilcox.group.diff, result[[2]])
    }
    
    gc(verbose = FALSE) # free some RAM
}

vars = c("exam.score", "study.time.total")
labels = c("Exam score", "Hours studying")


for(var in vars){
    # CI
    result = CI.numeric(datasets$other %>% group.legend(), var)
    CI = rbind(CI, result)
    
    # hypothesis testing
    result = kruskal_wilcox(datasets$other %>% group.legend(), var)
    kruskal.group.diff = rbind(kruskal.group.diff, result[[1]])
    if(length(result) > 1){
        wilcox.group.diff = rbind(wilcox.group.diff, result[[2]])
    }
    gc(verbose = FALSE) # free some RAM
}



datatable(
  CI, filter = 'top', extensions = 'Buttons', options = list(
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print')
  )
)

datatable(
  kruskal.group.diff, filter = 'top', extensions = 'Buttons', options = list(
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print')
  )
)

datatable(
  wilcox.group.diff, filter = 'top', extensions = 'Buttons', options = list(
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print')
  )
)

CI.numeric = CI
rm(list=ls()[! ls() %in% c("datasets","datasets.raw", "group.legend","kruskal_wilcox", "chisq.result", "CI.categorical", "CI.numeric", "kruskal.group.diff", "wilcox.group.diff")])

P value adjusted or raw P value? Which one I should use?

2.0 AEQs

CI.AEQ = function(data){
    new = data %>%
    group.legend() %>%
    group_by(subscale, group) %>%
    summarise(
            avg = t.test(score, conf.level = 0.95)$estimate,
            L = t.test(score, conf.level = 0.95)$conf.int[1],
            U = t.test(score, conf.level = 0.95)$conf.int[2]) %>%
    ungroup()
    return(new)
}

plot.aeq <- function(data, title = "AEQ", x = "Group", y = "Score"){ # plot a dotplot
    
    subscales = unique(data$subscale)
    
    p = data %>%
        group.legend() %>%
        # filter(subscale == subscales[i]) %>%
        ggplot(mapping = aes(x=group, y=score)) + 
        geom_dotplot(binaxis='y', stackdir='center', alpha = 0.6) + 
        facet_grid(rows = vars(subscale)) + 
        stat_summary(fun.data = "mean_cl_normal",
                   geom = "errorbar",size = 0.6,
                   width = .3, color = "black") +
        stat_summary(fun = "mean", geom = "point",  color = "red", size = 2) +
        # geom_errorbar(aes(ymin = , ymax = , fill = group), width = 0.1, size = 0.8) + 
        labs(# title=title,
            y =y, x = x # , caption = "Error bars indicate 95% CI"
            )
    return(p)
}

hypo.aeq = function(data){
    results = vector(mode = "list", length = 2)
    kruskal = data.frame(matrix(ncol = 8, nrow = 0))
    wilcox = data.frame(matrix(ncol = 9, nrow = 0))

    subscales = unique(data$subscale) # get the subscale names by removing duplicates
    
    for(var in subscales){ 
        new = data %>%
            group.legend() %>%
            filter(subscale == var) %>%
            mutate_(.dots=setNames(list(quote(score)),var)) %>% # standard evaluation  https://stackoverflow.com/questions/37259581/r-understanding-standard-evaluation-in-mutate
            select_('group', var)
        
        result = kruskal_wilcox(new, var) 
        
        if(length(result) > 1){
            kruskal = rbind(kruskal, result[[1]])
            wilcox = rbind(wilcox, result[[2]])
        }
        else{
            kruskal = rbind(kruskal, result[[1]])
        }
        results = list(kruskal, wilcox)
        
        
    }
    return(results)
}

# CI in tables

datasets$difference <- datasets$difference %>%
    rename(score = difference)

AEQ_6.CI <- datasets$AEQ_6 %>% CI.AEQ() %>%
    mutate(AEQ.type = "state", AEQ.day = "6")
AEQ_7.CI <- datasets$AEQ_7 %>% CI.AEQ() %>%
    mutate(AEQ.type = "state", AEQ.day = "7")
AEQ_14.CI <- datasets$AEQ_14 %>% CI.AEQ() %>%
    mutate(AEQ.type = "state", AEQ.day = "14")
difference.CI <- datasets$difference %>%
    group.legend() %>%
    group_by(subscale, group) %>%
    summarise(
            avg = t.test(score, conf.level = 0.95)$estimate,
            L = t.test(score, conf.level = 0.95)$conf.int[1],
            U = t.test(score, conf.level = 0.95)$conf.int[2]) %>%
    ungroup() %>%
    mutate(AEQ.type = "trait", AEQ.day = "Trait AEQ change (21-0)")

CI <- rbind(AEQ_6.CI, AEQ_7.CI, AEQ_14.CI, difference.CI)


# CI in graphs
titles = paste("Total State AEQ Score on Day", c(6, 7, 14))

i = 1
AEQ.graphs = list()

for(data in datasets[c(1,2,3)]){
    graph = data %>% plot.aeq(title = titles[i])
    
    AEQ.graphs = append(AEQ.graphs, list(graph))

    names(AEQ.graphs)[length(AEQ.graphs)] = c(6, 7, 14)[i]
    i = i+1
    
    gc(verbose = FALSE) # free some RAM
}
difference.graph <- datasets$difference %>%
    plot.aeq(title = "Change in Trait AEQ Scores (21-0)")

AEQ.graphs = append(AEQ.graphs, list(difference.graph))
names(AEQ.graphs)[length(AEQ.graphs)] = "difference"

AEQ.6714 <- ggarrange(AEQ.graphs[[1]],AEQ.graphs[[2]],AEQ.graphs[[3]], 
          # labels = c(paste("Day", c(6, 7, 14))),
          ncol = 1, nrow = 3, # hjust = c(-0.9, -0.9,-0.9),
          vjust = 0,
          heights = c(0.8,0.8,0.8))


figure <- ggarrange(AEQ.6714,AEQ.graphs[[4]], 
          labels = c(paste("Day", c(6, 7, 14)), "21-0"),
          ncol = 2, hjust = -1.1,
          vjust = 0)

figure.AEQ <- annotate_figure(figure,
                top = text_grob("AEQ Scores", color = "black", face = "bold", size = 14),
                bottom = text_grob("Error bars indicate 95% CI ", color = "black",
                                   hjust = 1, x = 1, face = "italic", size = 10),
                )
figure.AEQ

# hypothesis testing
AEQ.kruskal = data.frame(matrix(ncol = 9, nrow = 0))
AEQ.wilcox = data.frame(matrix(ncol = 11, nrow = 0))

i = 1

for(data in datasets[1:4]){
    result = hypo.aeq(data)
    result[[1]] = result[[1]] %>%
        mutate(day = names(datasets[1:4])[i]) 
    
    if(length(result[[2]]) != 0){
        result[[2]] = result[[2]] %>%
        mutate(day = names(datasets[1:4])[i])
    }
    
    AEQ.kruskal = rbind(AEQ.kruskal, result[[1]])
    AEQ.wilcox = rbind(AEQ.wilcox, result[[2]])
    
    i = i+1
    
    gc(verbose = FALSE) # free some RAM
}

results = list(AEQ.kruskal, AEQ.wilcox)

datatable(
  CI, filter = 'top', extensions = 'Buttons', options = list(
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print')
  )
)

datatable(
  AEQ.kruskal, filter = 'top', extensions = 'Buttons', options = list(
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print')
  )
)

datatable(
  AEQ.wilcox, filter = 'top', extensions = 'Buttons', options = list(
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print')
  )
)

CI.AEQ = CI
rm(list=ls()[! ls() %in% c("datasets","datasets.raw", "group.legend","kruskal_wilcox", "chisq.result", "CI.categorical", 
    "CI.numeric", "kruskal.group.diff", "wilcox.group.diff",
    "CI.AEQ", "AEQ.kruskal", "AEQ.wilcox", "figure.AEQ")])

3.0 Daily Mood Score

95% CI for daily mood
group by duration (1~6 or 7~21) and group: 2 * 3 = 6 CI in total

CI.mood <- datasets$mood %>%
    group_by(mood.type, group) %>%
    summarise(
            avg.mean = t.test(mood.avg, conf.level = 0.95)$estimate,
            min.mean = t.test(mood.min, conf.level = 0.95)$estimate,
            avg.L = t.test(mood.avg, conf.level = 0.95)$conf.int[1],
            avg.U = t.test(mood.avg, conf.level = 0.95)$conf.int[2],
            min.L = t.test(mood.min, conf.level = 0.95)$conf.int[1],
            min.U = t.test(mood.min, conf.level = 0.95)$conf.int[2]) %>%
    ungroup()

# Average daily mood


# mood.conf %>% 
#     group.legend() %>%
#     ggplot(mapping = aes(mood.type, avg.mean)) + 
#     geom_point(size = 3) + 
#     geom_errorbar(aes(ymin = avg.L, ymax = avg.U), width = 0.1, size = 0.8) + 
#     facet_wrap(~group) + 
#     # ylim(2.75, 3.75) + 
#     labs(title="4.1 Average Daily Mood across Groups",
#         x ="Day", y = "Mood (Likert Scale)", caption = "Error bars indicate 95% CI")

avg.graph <- datasets$mood %>% 
    group.legend() %>%
    ggplot(mapping = aes(x=mood.type, y=mood.avg)) + 
    geom_dotplot(binaxis='y', stackdir='center') + 
    stat_summary(fun.data = "mean_cl_normal",
               geom = "errorbar",size = 1,
               width = .3, color = "red") +
    stat_summary(fun = "mean", geom = "point",  color = "red") + 
    # geom_errorbar(aes(ymin = , ymax = , fill = group), width = 0.1, size = 0.8) + 
    facet_wrap(~group) + 
    labs(# title="4.1 Average Daily Mood across Groups",
        x ="Day", y = "Mood (Likert Scale)" #, caption = "Error bars indicate 95% CI"
        )


# min daily mood

# mood.conf %>% 
#     group.legend()%>%
#     ggplot(mapping = aes(mood.type, min.mean)) + 
#     geom_point(size = 3) + 
#     geom_errorbar(aes(ymin = min.L, ymax = min.U), width = 0.1, size = 0.8) + 
#     facet_wrap(~group) + 
#     # ylim(2.75, 3.75) + 
#     labs(title="4.2 Minimum Daily Mood across Groups",
#         x ="Day", y = "Mood (Likert Scale)", caption = "Error bars indicate 95% CI")

min.graph <- datasets$mood %>% 
    group.legend()%>%
    ggplot(mapping = aes(x=mood.type, y=mood.min)) + 
    geom_dotplot(binaxis='y', stackdir='center') + 
    stat_summary(fun.data = "mean_cl_normal",
               geom = "errorbar",size = 1,
               width = .3, color = "red") +
    stat_summary(fun = "mean", geom = "point",  color = "red") + 
    # geom_errorbar(aes(ymin = , ymax = , fill = group), width = 0.1, size = 0.8) + 
    facet_wrap(~group) + 
    labs( # title="4.2 Minimum Daily Mood across Groups",
        x ="Day", y = "Mood (Likert Scale)" # , caption = "Error bars indicate 95% CI"
        )



figure <- ggarrange(avg.graph,min.graph, 
          labels = c("Average","Minimum"),
          ncol = 1, nrow = 2, hjust = c(-1, -0.9), vjust = 0.2)

figure.mood <- annotate_figure(figure,
                top = text_grob("Mood Scores", color = "black", face = "bold", size = 14),
                bottom = text_grob("Error bars indicate 95% CI ", color = "black",
                                   hjust = 1, x = 1, face = "italic", size = 10),
                )
figure.mood

# remove data and functions
rm(list=ls()[! ls() %in% c("datasets","datasets.raw", "group.legend","kruskal_wilcox", 
    "chisq.result", "CI.categorical", 
    "CI.numeric", "kruskal.group.diff", "wilcox.group.diff",
    "CI.AEQ", "AEQ.kruskal", "AEQ.wilcox", "figure.AEQ",
    "CI.mood", "figure.mood")])

# remove all functions, checkpoint
rm(list=ls()[! ls() %in% c("datasets","datasets.raw", # original data sets
    "chisq.result", "CI.categorical", # data related to group difference in categorical data
    "CI.numeric", "kruskal.group.diff", "wilcox.group.diff", # data related to group difference in numeric data
    "CI.AEQ", "AEQ.kruskal", "AEQ.wilcox", "figure.AEQ", # data and figure related to AEQ
    "CI.mood", "figure.mood")]) # data and figure related to mood

Further Correlation Analysis (draft)

continuous (GPA) vs continuous (difference between predicted and actual exam score)) – (Correlation: Spearman’s rank or Kendall’s tau, non-parametric tests)
discrete (e.g. age, credit) vs continuous (difference between predicted and actual exam score)) – Pearson’s r_p (parametric, may not be suitable if the GPA and score does not follow normal didstribution) or Spearman’s r_s – (Spearman’s rank or Kendall’s tau, non-parametric tests)
discrete (e.g. age, credit) vs ordinal (difference (AEQ 21 - AEQ 0)) – (Spearman’s rank or Kendall’s tau, non-parametric tests)
ordinal (e.g. how concern are you, how religious are you (ordinal), can meditation help relax, focus, exam.score) vs continuous (difference between predicted and actual exam score)) – (Spearman’s rank or Kendall’s tau, non-parametric tests)
ordinal (e.g. how concern are you, how religious are you (ordinal), can meditation help relax, focus, exam.score) vs ordinal (difference (AEQ 21 - AEQ 0)) – (Spearman’s rank or Kendall’s tau, non-parametric tests)
nominal (e.g. gender, major) vs continuous – Kruskal Wallis, Eta-squared value (effect size)
- point-biserial correlation when the nominal variable is dichotomous (e.g. gender)
nominal (e.g. gender, major) vs ordinal (difference (AEQ 21 - AEQ 0)) – Kruskal Wallis, Eta-squared value (treat the ordinal variable as continuous variable)
- rank-biserial correlation when the nominal variable is dichotomous (e.g. gender)

Summary

We only use non-parametric tests
AEQ scores are essentially in ordinal scale
Chi-squared test of independence (nominal and ordinal (except AEQ score))
- Cramer’s V (effect size): examine the association between two categorical variables when at least one of the variables has more than two values (i.e. more than a $2 \times 2$ contingency table).
Kruskal Wallis test and post hoc (pairwise Wilcox test) (AEQ score and all other numeric data)
- Spearman’s rank or Kendall’s tau (correlation coefficient) (still learning)
- Eta-squared($\eta^2$) (effect size)

Data Analysis of Psychology Meditation Study

Charles_Li

2022-07-10