For attribution, please cite this work as:
Andrii Bova. (2024). Analysis of the association between dichotomous
variables with a stratifying factor: Cochran–Mantel–Haenszel test and
conditional logistic regression in R. January 27, 2024. Last updated:
2024-02-12 https://rpubs.com/abova/cmh.
This tutorial discusses the Cochran-Mantel-Haenszel test and conditional logistic regression, and provides an example of how to use these methods in R.
Overview of the Cochran-Mantel-Haenszel Test and Conditional Logistic Regression
Key Indicators for Contingency Tables and Conditional Logistic Regression
R Packages for the Cochran-Mantel-Haenszel Test and Conditional Logistic Regression
Chronic Conditions, Gender, and COVID-19 Concern: Extended Analysis of Associations
Description of the Survey
Frequency Tables
The Impact of Chronic Condition on Covid Concern
The Impact of Gender on Chronic Condition
The Impact of Gender on Covid Concern
Associations between Three Variables
Mosaic Plot of the Association of Chronic Condition and Covid Concern in Two Subgroups
The Association between Chronic Conditions and Covid Concern among Men
The Association between Chronic Conditions and Covid Concern among Women
Cochran–Mantel–Haenszel Test
Homogeneity of Odds Ratios
Binary Logistic Regression with One Predictor
Conditional Logistic Regression
Conclusion
References
Named after the statisticians William G. Cochran (1909-1980), Nathan
Mantel (1919-2002), and William Haenszel (1910-1998), the
Cochran-Mantel-Haenszel test (also called the Cochran-Mantel-Haenszel
test of a common odds ratio) is used to test the conditional independence
of two dichotomous (binary) variables given a third, nominal variable.
The Cochran-Mantel-Haenszel (CMH) test takes into account the possible
confounding effect of the third variable on the first two without having
to estimate a separate parameter for each of its categories.
In epidemiology, a confounder is a variable that affects both the
dependent and the independent variable, which can produce a spurious
relationship between them. If such a variable is categorical, it divides
(stratifies) the sample into subgroups (strata). The CMH test therefore
assumes that the two variables of interest are dichotomous (binary),
i.e., each table has two rows and two columns, and that the third,
control variable has several categories. Such contingency tables are
denoted 2 x 2 x K, where K is the number of categories of the
stratifying factor (K ≥ 2); the K stratum-specific 2 x 2 tables are
called partial tables. The stratifying variable (also called the
stratification variable, stratifying factor, or control variable) is
used to split the data into subgroups so that the association between
the dichotomous (binary) variables can be analyzed in each subgroup
separately. If the strength and direction of the association between the
two variables remain consistent across all levels of the third variable,
the third variable does not modify the association (there is no
interaction), and the stratum-specific associations can be summarized by
a single common odds ratio.
There are several tests associated with the names Nathan Mantel and
William Haenszel.
The generalized Mantel-Haenszel test is used to test the conditional
independence of two nominal variables with more than two categories, or
of ordinal variables, while controlling for one or more variables.
The Mantel-Haenszel test for linear trend is used to assess whether
there is a monotonic (increasing or decreasing) linear association
between two ordinal variables.
The Mantel-Haenszel test for homogeneity of odds ratios assesses
whether the odds ratios relating two binary variables are the same
across the strata of a categorical stratifying variable.
The results of some statistical tests can be reproduced with generalized
linear models (GLMs), which relate an outcome variable to one or more
predictors. Conditional logistic regression is an extension of logistic
regression originally used to analyze data from matched case-control
studies, in which each case is matched to one or more controls on
certain criteria. Unlike standard logistic regression, which assumes
that observations are independent, conditional logistic regression takes
the stratified (matched) structure of the data into account. It allows a
binary dependent variable (e.g., absence or presence of a disease, or
occurrence or non-occurrence of an event), dichotomous, categorical, or
continuous predictors, and categorical variables as stratification
factors. Conditional logistic regression can also be used to build a
prognostic model. By adjusting for the influence of one or more third
variables, these methods make the results more reliable. Regression is
more flexible than the CMH test, but it usually requires a larger sample
size and can be more difficult to interpret. Another alternative to the
CMH test is log-linear analysis.
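To illustrate how a GLM can reproduce a classical test, the sketch below fits a logistic regression to the grouped counts of the chronic condition by Covid concern table reported later in this tutorial. For a single binary predictor, Rao's score test for the slope should equal Pearson's chi-square without continuity correction, and the exponentiated slope should equal the crude OR. The grouped-data layout and the object names (`agg`, `fit_agg`) are illustrative choices, not the tutorial's own code.

```r
# Grouped counts from the chronic condition x covid concern table
agg <- data.frame(
  chronic       = c(0, 1),        # 0 = no chronic condition, 1 = chronic condition
  concerned     = c(758, 539),
  not_concerned = c(293, 97)
)

# Logistic regression on grouped binomial data: cbind(successes, failures)
fit_agg <- glm(cbind(concerned, not_concerned) ~ chronic,
               family = binomial(link = "logit"), data = agg)

# Rao's score test for adding 'chronic' to the intercept-only model
score_stat <- anova(fit_agg, test = "Rao")$Rao[2]

# Pearson chi-square without continuity correction on the same 2 x 2 table
tab <- matrix(c(293, 758,
                 97, 539), nrow = 2, byrow = TRUE)
pearson <- unname(chisq.test(tab, correct = FALSE)$statistic)

# Exponentiated slope = crude odds ratio
crude_or <- unname(exp(coef(fit_agg))["chronic"])

round(c(score = score_stat, pearson = pearson, odds_ratio = crude_or), 3)
```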
The CMH test is widely used in medical research to analyze outcomes such
as disease prevalence, treatment effectiveness, and side effects. In
biomedical experiments, for example, it can test the effect of treatment
(“drug” or “placebo”) on recovery (“improvement”, “no change”) while
controlling for factors such as age group, gender, region of residence,
or ethnicity. The CMH design is useful for analyzing categorical data
from randomized block experiments, especially when the number of
categories is small. Although it has broader applications, it cannot
handle complex designs such as Latin squares or multifactor ANOVA [1].
The CMH test is based on Cochran’s test statistic and/or the
Mantel-Haenszel test statistic (denoted M², or Mantel-Haenszel χ²),
which is obtained by combining the association between the two variables
across all strata of the control variable.
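For reference, the Mantel-Haenszel statistic can be written in standard textbook notation (the symbols are introduced here for illustration): n_{ijk} is the count in row i and column j of the 2 x 2 table for stratum k, a "+" marks summation over that index, and K is the number of strata. Under the null hypothesis, M² is approximately chi-square with one degree of freedom. The second line is the continuity-corrected version, and the third is the Mantel-Haenszel estimate of the common odds ratio.

```latex
M^2 = \frac{\left[\sum_{k=1}^{K}\left(n_{11k} - \mu_{11k}\right)\right]^2}{\sum_{k=1}^{K}\operatorname{Var}(n_{11k})},
\qquad
\mu_{11k} = \frac{n_{1+k}\,n_{+1k}}{n_{++k}},
\qquad
\operatorname{Var}(n_{11k}) = \frac{n_{1+k}\,n_{2+k}\,n_{+1k}\,n_{+2k}}{n_{++k}^{2}\,(n_{++k}-1)}

M^2_{c} = \frac{\left(\left|\sum_{k=1}^{K}\left(n_{11k} - \mu_{11k}\right)\right| - 0.5\right)^2}{\sum_{k=1}^{K}\operatorname{Var}(n_{11k})}

\hat{\theta}_{MH} = \frac{\sum_{k=1}^{K} n_{11k}\,n_{22k} / n_{++k}}{\sum_{k=1}^{K} n_{12k}\,n_{21k} / n_{++k}}
```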
The null hypothesis of the CMH test can be stated in a two-sided or a
one-sided form. In the two-sided form, the null hypothesis is
conditional independence: the odds ratio (or relative risk) between the
two variables equals 1 in every stratum of the control variable, i.e.,
there is no association. The alternative hypothesis is that the common
odds ratio differs from 1, i.e., there is an association. In the
one-sided form, the alternative hypothesis states that the common odds
ratio is greater than 1 (or less than 1).
Under the null hypothesis, the statistic has a chi-square distribution
with one degree of freedom. The test yields a p-value for judging the
statistical significance of the association: if the p-value is less than
the chosen significance level (e.g., 0.05), the null hypothesis of
conditional independence is rejected, and the association between the
variables is considered statistically significant.
One version of the statistic includes the Yates continuity correction.
The correction makes the test conservative: it lowers the statistic and
therefore raises the p-value.
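A minimal base-R sketch of the Mantel-Haenszel statistic, with and without the continuity correction, using the stratum-specific counts for men and women reported later in this tutorial. It follows the usual textbook form (the squared sum over strata of observed minus expected first-cell counts, divided by the sum of hypergeometric variances) and checks the result against `mantelhaen.test()`; the object names are illustrative.

```r
# 2 x 2 x 2 array of the stratum-specific tables reported in this tutorial
# dim 1: chronic condition, dim 2: covid concern, dim 3: gender
tab <- array(
  c(185, 48, 338, 199,    # men (column-major fill of the 2 x 2 table)
    108, 49, 420, 340),   # women
  dim = c(2, 2, 2),
  dimnames = list(chronic = c("no", "yes"),
                  concern = c("not concerned", "concerned"),
                  gender  = c("men", "women"))
)

n11  <- tab[1, 1, ]                    # observed first cell in each stratum
row1 <- tab[1, 1, ] + tab[1, 2, ]      # first row totals
row2 <- tab[2, 1, ] + tab[2, 2, ]      # second row totals
col1 <- tab[1, 1, ] + tab[2, 1, ]      # first column totals
col2 <- tab[1, 2, ] + tab[2, 2, ]      # second column totals
nk   <- row1 + row2                    # stratum sizes

mu <- row1 * col1 / nk                              # expected first cell under H0
v  <- row1 * row2 * col1 * col2 / (nk^2 * (nk - 1)) # hypergeometric variance

delta <- sum(n11 - mu)
m2_plain     <- delta^2 / sum(v)                    # no continuity correction
m2_corrected <- (abs(delta) - 0.5)^2 / sum(v)       # with continuity correction

ref_plain     <- unname(mantelhaen.test(tab, correct = FALSE)$statistic)
ref_corrected <- unname(mantelhaen.test(tab, correct = TRUE)$statistic)

round(c(m2_plain = m2_plain, ref_plain = ref_plain,
        m2_corrected = m2_corrected, ref_corrected = ref_corrected), 3)
pchisq(m2_plain, df = 1, lower.tail = FALSE)        # two-sided p-value
```

The corrected statistic is smaller than the uncorrected one, so its p-value is larger, which is the conservativeness described above.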
In the context of conditional logistic regression, the evaluation of the
magnitude and direction of the relationship between dependent and
independent variables involves the computation of odds ratios (ORs)
along with their corresponding confidence intervals.
The ORs should be homogeneous across subgroups, i.e., the null
hypothesis of homogeneity should not be rejected by the Breslow-Day
test (or its version with Tarone’s correction, BDT), the Woolf test,
or the Mantel-Haenszel test for homogeneity of odds ratios.
When applying the CMH test, it is common to observe a statistically
significant association in the overall contingency table, while the
subgroup tables show an association in the same direction that is
statistically significant only in some strata. Note, however, that the
data may exhibit Simpson’s paradox: an association with the same
direction, even a statistically significant one, may be present in every
subgroup table and yet be absent or even reversed when the data are
combined into the overall table.
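A small numerical illustration of Simpson’s paradox, using hypothetical counts invented for this purpose (they are not survey data): the OR is above 1 in both strata, yet the OR of the collapsed table is below 1, while the Mantel-Haenszel common OR from `mantelhaen.test()` keeps the within-stratum direction. The helper `odds_ratio()` is hypothetical.

```r
# Hypothetical counts (invented for illustration, not survey data)
# dim 1: exposure, dim 2: outcome, dim 3: stratum
sp <- array(
  c(90, 800, 10, 200,     # stratum A (column-major fill)
    300, 10, 700, 90),    # stratum B
  dim = c(2, 2, 2),
  dimnames = list(exposure = c("exposed", "unexposed"),
                  outcome  = c("event", "no event"),
                  stratum  = c("A", "B"))
)

# Odds of the event in the exposed row divided by odds in the unexposed row
odds_ratio <- function(m) (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1])

stratum_or <- apply(sp, 3, odds_ratio)               # OR within each stratum
crude_or   <- odds_ratio(apply(sp, c(1, 2), sum))    # OR after collapsing strata
mh_or      <- unname(mantelhaen.test(sp)$estimate)   # Mantel-Haenszel common OR

round(stratum_or, 2)   # A = 2.25, B = 3.86
round(crude_or, 2)     # about 0.2: the direction is reversed
round(mh_or, 2)        # 3: the stratified estimate keeps the within-stratum direction
```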
For a 2 x 2 table, the following are interpreted:
Chi-squared test of independence - tests whether the observed frequencies differ from those expected if the two variables were independent.
Statistical significance - the p-value, i.e., the probability of obtaining differences at least as large as those observed if the variables were in fact independent; it is compared with the chosen significance level, usually 0.05.
Measures of association - show how strong the relationship between two variables is. Common measures of association include the phi coefficient (φ) and Yule’s Q.
Effect size - shows how much influence one variable has on another. The OR shows how many times greater the odds of an event are in one group than in another. In the overall table, the OR is called the unadjusted (crude) or unstratified odds ratio.
Statistical power - the probability of correctly rejecting the null hypothesis, i.e., of detecting an effect of a given size with a given sample size.
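The two measures of association listed above can be computed directly from the cell counts. This sketch uses the overall chronic condition by Covid concern table reported later in this tutorial and also checks that n·φ² reproduces Pearson’s chi-square. Note that the tutorial’s `effectsize::phi()` output is labeled "adj." (bias-adjusted), so it can differ slightly from the unadjusted value computed here.

```r
# Overall chronic condition x covid concern table from this tutorial
tab <- matrix(c(293, 758,
                 97, 539),
              nrow = 2, byrow = TRUE,
              dimnames = list(chronic = c("no", "yes"),
                              concern = c("not concerned", "concerned")))

ad <- tab[1, 1] * tab[2, 2]   # product of the main-diagonal cells
bc <- tab[1, 2] * tab[2, 1]   # product of the off-diagonal cells

# Phi: (ad - bc) divided by the square root of the four marginal totals' product
phi <- (ad - bc) / sqrt(prod(rowSums(tab)) * prod(colSums(tab)))

# Yule's Q: (ad - bc) / (ad + bc), equivalently (OR - 1) / (OR + 1)
yule_q <- (ad - bc) / (ad + bc)

# For a 2 x 2 table, n * phi^2 equals Pearson's chi-square (no continuity correction)
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)

round(c(phi = phi, yule_q = yule_q,
        chi2_from_phi = sum(tab) * phi^2, chi2 = chi2), 3)
```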
For the partial tables of a 2 x 2 x K design, the following measures are
interpreted:
Within each subgroup: the chi-square test of independence and its
statistical significance, measures of association and effect size,
statistical power, and the stratum-specific OR.
Mantel-Haenszel statistic and its statistical significance. This tests
the overall association between the two variables while controlling for
the stratifying variable.
Mantel-Haenszel OR, also known as the common, adjusted, or pooled OR.
It shows the overall association between the two variables after
adjusting for the stratifying variable.
Breslow-Day test, Woolf test, or Mantel-Haenszel test for homogeneity of
odds ratios, with their statistical significance. These tests assess
whether the ORs are consistent across the strata.
In conditional logistic regression, the primary goal is to evaluate
the statistical significance of the overall model and of the individual
logit coefficients of the predictor variables. This typically involves
examining the p-values and confidence intervals of the coefficients,
along with their exponentiated values, which are odds ratios and are
easier to interpret.
Logistic regression yields an OR for each explanatory variable when
there is more than one, so the effect of each variable can be
interpreted while the others are held constant. In conditional logistic
regression, the overall model should be statistically significant
according to the likelihood ratio chi-square test and the Wald test.
In a contingency table, the statistical significance of the
association is determined with the chi-square test (χ²), which shows
whether the null hypothesis of independence can be rejected. If a cell
of the table has no observations, 0.5 is commonly added to every cell
(the Haldane-Anscombe correction) so that the OR and its logarithm
remain finite.
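A sketch of the 0.5 correction on a hypothetical table with an empty cell (the counts are invented for illustration): without the correction the OR is infinite; after adding 0.5 to every cell, both the OR and the standard error of its logarithm are finite.

```r
# Hypothetical 2 x 2 table (not survey data) with an empty cell
tab0 <- matrix(c(12, 0,
                  5, 9), nrow = 2, byrow = TRUE)

# Uncorrected OR: division by zero gives Inf in R
or_raw <- (tab0[1, 1] * tab0[2, 2]) / (tab0[1, 2] * tab0[2, 1])

tab_adj <- tab0 + 0.5                                   # add 0.5 to every cell
or_adj  <- (tab_adj[1, 1] * tab_adj[2, 2]) / (tab_adj[1, 2] * tab_adj[2, 1])
se_log_or <- sqrt(sum(1 / tab_adj))                     # standard error of log(OR)

c(or_raw = or_raw, or_adj = round(or_adj, 2), se_log_or = round(se_log_or, 2))
```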
The phi coefficient is derived from the chi-square value: for a 2 x 2
table, its absolute value equals √(χ²/n). Phi is equivalent to Cohen’s w.
Next, the effect size is assessed. For a 2 x 2 table, the conventional
benchmarks for phi are 0.10 - small, 0.30 - moderate, 0.50 - large.
In statistics, the power of a test is the probability of detecting a
statistically significant effect if the effect actually exists. Power is
expressed as a number from 0 to 1, where values near 0 mean the test will
almost never detect the effect and values near 1 mean it will almost
always detect it. A power of 0.8 is usually considered acceptable.
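For a chi-square test, power can be computed from Cohen’s w with the noncentral chi-square distribution, using the noncentrality parameter n·w². The base-R sketch below (the function name `chisq_power` is hypothetical) should agree with the `power.chisq.test()` results reported later in this tutorial for w = 0.14, 0.16, and 0.10.

```r
# Power of a chi-square test from Cohen's w via the noncentral chi-square
# distribution with noncentrality parameter n * w^2
chisq_power <- function(w, n, df = 1, sig.level = 0.05) {
  crit <- qchisq(1 - sig.level, df = df)          # critical value under H0
  pchisq(crit, df = df, ncp = n * w^2, lower.tail = FALSE)
}

# Effect sizes and sample sizes used later in the tutorial (phi = w for 2 x 2)
round(c(
  overall = chisq_power(w = 0.14, n = 1687),
  men     = chisq_power(w = 0.16, n = 770),
  women   = chisq_power(w = 0.10, n = 917)
), 3)
```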
The OR is a statistical measure that quantifies the strength of the
association between an exposure or treatment and an outcome in a medical
study. It compares the odds of the outcome in the group that was exposed
or treated (the exposed or experimental group) with the odds of the
outcome in the group that was not exposed or treated (the unexposed or
control group). An OR greater than 1 indicates that the exposure or
treatment is associated with increased odds of the outcome; an OR less
than 1 indicates that it is associated with decreased odds of the
outcome. An OR equal to 1 means that there is no association between the
exposure and the outcome: the odds of the outcome are the same in the
exposed and unexposed groups.
It is important to consider the confidence interval of the OR because it
gives a range of values within which the true OR is likely to lie. If
the confidence interval includes 1, the data do not establish whether
the exposure increases or decreases the odds of the event in the exposed
group compared with the unexposed group.
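A base-R sketch of the OR and its Wald 95% confidence interval, computed on the log scale from the overall chronic condition by Covid concern table reported later in this tutorial. Respondents without a chronic condition are taken as the unexposed group; the result should match the `OddsRatio(..., method = "wald")` output shown later.

```r
# Overall chronic condition x covid concern table from this tutorial
# rows: chronic (no = unexposed, yes = exposed); columns: not concerned, concerned
tab <- matrix(c(293, 758,
                 97, 539), nrow = 2, byrow = TRUE)

odds_unexposed <- tab[1, 2] / tab[1, 1]   # odds of concern without a chronic condition
odds_exposed   <- tab[2, 2] / tab[2, 1]   # odds of concern with a chronic condition
or <- odds_exposed / odds_unexposed

# Wald 95% CI: log(OR) +/- z * SE, with SE = sqrt(sum of reciprocal cell counts)
se_log_or <- sqrt(sum(1 / tab))
ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log_or)

round(c(or = or, lower = ci[1], upper = ci[2]), 2)
```

Here the whole interval lies above 1, so the data indicate increased odds of concern among respondents with a chronic condition.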
When analyzing public opinion survey data, it is advisable to recode the
variables so that the OR is greater than 1, which will facilitate
interpretation.
For ORs in 2 x 2 tables and logistic regression models, 1.68 corresponds
to a small effect, 3.47 to a moderate effect, and 6.71 to a large
effect [2].
The Mantel-Haenszel estimate is a weighted average of the individual ORs
(or relative risks) from a sample divided into a series of strata that
are internally homogeneous with respect to the factors that influence
the odds ratio estimates. Note that the CMH statistic has low power to
detect an association when the association in some strata is in the
opposite direction to that in other strata. Thus, a non-significant CMH
statistic implies either that there is no association or that no single
association pattern is strong or consistent enough to dominate the
others [3]. To check whether the direction of association is uniform
across subgroups, examine the stratum-specific odds ratios (all should
be greater than 1, or all less than 1) or Yule’s Q, which ranges from
-1 to 1 (all values should be greater than 0, or all less than 0).
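The direction check can be sketched in base R on the stratum-specific tables for men and women reported later in this tutorial: compute the OR in each stratum, convert it to Yule’s Q with Q = (OR - 1)/(OR + 1), and test whether all ORs lie on the same side of 1. The object names are illustrative.

```r
# 2 x 2 x 2 array of the stratum tables (chronic x concern x gender)
tab <- array(
  c(185, 48, 338, 199,    # men
    108, 49, 420, 340),   # women
  dim = c(2, 2, 2),
  dimnames = list(chronic = c("no", "yes"),
                  concern = c("not concerned", "concerned"),
                  gender  = c("men", "women"))
)

# Stratum-specific odds ratios
stratum_or <- apply(tab, 3, function(s) (s[1, 1] * s[2, 2]) / (s[1, 2] * s[2, 1]))
# Yule's Q as a transform of the OR
stratum_q <- (stratum_or - 1) / (stratum_or + 1)
# TRUE if every stratum points in the same direction
same_direction <- all(stratum_or > 1) || all(stratum_or < 1)

round(rbind(odds_ratio = stratum_or, yule_q = stratum_q), 2)
same_direction
```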
Whether the conditions for using the CMH test are met is checked with
the Breslow-Day test for homogeneity of the odds ratios or with Woolf’s
heterogeneity test. These tests assess the homogeneity (similarity) of
effects across subgroups: they help determine whether the odds ratios
obtained in several groups or studies are consistent with each other or
differ significantly between them. The Breslow-Day test checks whether
all strata have the same odds ratio; Woolf’s test checks whether all
strata have the same log odds ratio. In meta-analysis, these tests,
along with the CMH test, are used to determine whether treatment effects
differ across studies.
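Woolf’s test is simple enough to compute by hand. The sketch below is a minimal base-R version that assumes no empty cells (a zero count would make a log OR infinite) and uses the stratum tables for men and women reported later in this tutorial: inverse-variance weights, a weighted mean log OR, and a chi-square statistic with K - 1 degrees of freedom. It should be close to the `woolf_test()` result shown later.

```r
# 2 x 2 x 2 array of the stratum tables (chronic x concern x gender)
tab <- array(
  c(185, 48, 338, 199,    # men
    108, 49, 420, 340),   # women
  dim = c(2, 2, 2),
  dimnames = list(chronic = c("no", "yes"),
                  concern = c("not concerned", "concerned"),
                  gender  = c("men", "women"))
)

# Stratum-specific log odds ratios (assumes no empty cells)
log_or <- apply(tab, 3, function(s) log((s[1, 1] * s[2, 2]) / (s[1, 2] * s[2, 1])))
# Inverse-variance weights: Var(log OR) is estimated by the sum of 1 / cell count
w <- apply(tab, 3, function(s) 1 / sum(1 / s))
pooled   <- sum(w * log_or) / sum(w)          # weighted mean log OR
x2       <- sum(w * (log_or - pooled)^2)      # Woolf heterogeneity statistic
df_woolf <- dim(tab)[3] - 1                   # K - 1 degrees of freedom
p        <- pchisq(x2, df = df_woolf, lower.tail = FALSE)

round(c(x2 = x2, df = df_woolf, p = p), 3)
```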
“Interpretational Scenarios:
1. Significant CMH test with homogeneity (non-significant MH and BDT
test): Indicates conditional dependence and consistent association
across strata. The common odds ratio is a reliable summary of the
association.
2. Significant CMH test with heterogeneity (significant MH and BDT
test): Suggests conditional dependence and varying strength or direction
of association across strata (interaction), cautioning against a simple
summary of the association.
3. Non-significant CMH test with homogeneity (non-significant MH and BDT
test): Implies conditional independence and consistent association
across strata. The common odds ratio is a reliable summary of the
association.
4. Non-significant CMH test with heterogeneity (significant MH and BDT
test): Since not all the conditional odds ratios are in the same
direction, the result of the CMH test might not be reliable.
Evaluating the stratum-specific chi-squared tests should be considered.” [4]
Finally, if the CMH test is statistically significant and the subgroup
odds ratios are homogeneous, the crude odds ratio from the overall
contingency table is compared with the pooled (Mantel-Haenszel) odds
ratio. If they differ by less than 10%, the control variable is
considered not to confound the association between the two
variables [5].
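The 10% comparison can be sketched in base R with the stratum tables for men and women reported later in this tutorial: collapse the strata to obtain the crude OR, compute the Mantel-Haenszel OR as Σ(n11k·n22k/nk) / Σ(n12k·n21k/nk), and express the difference relative to the adjusted OR. Dividing by the adjusted OR is one common convention; dividing by the crude OR is another, and here both give less than 10%.

```r
# 2 x 2 x 2 array of the stratum tables (chronic x concern x gender)
tab <- array(
  c(185, 48, 338, 199,    # men
    108, 49, 420, 340),   # women
  dim = c(2, 2, 2),
  dimnames = list(chronic = c("no", "yes"),
                  concern = c("not concerned", "concerned"),
                  gender  = c("men", "women"))
)

# Crude OR: collapse over gender
crude    <- apply(tab, c(1, 2), sum)
crude_or <- (crude[1, 1] * crude[2, 2]) / (crude[1, 2] * crude[2, 1])

# Mantel-Haenszel (pooled) OR
nk    <- apply(tab, 3, sum)
mh_or <- sum(tab[1, 1, ] * tab[2, 2, ] / nk) / sum(tab[1, 2, ] * tab[2, 1, ] / nk)

# Percent change relative to the adjusted estimate
pct_change <- 100 * abs(crude_or - mh_or) / mh_or

round(c(crude_or = crude_or, mh_or = mh_or, pct_change = pct_change), 2)
pct_change < 10
```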
Stratifying the data by the confounding factor lets the common odds
ratio give a more accurate assessment of the association between the
two variables. Adjustment makes the associations observed in the
contingency table more robust and reliable.
In R, several packages and functions can perform the
Cochran-Mantel-Haenszel test, among them the mantelhaen.test function of
the stats package, the cmh.test function of the lawstat package, the CMH
function of the CMHNPA package, the cmh_test function of the coin
package, the stratastats function of the stratastats package, and the
epi.2by2 function of the epiR package.
The groupwiseCMH function of the rcompanion package performs post hoc tests for the Cochran-Mantel-Haenszel test.
For the Breslow-Day test, there is the BreslowDayTest function in the DescTools package, and for the Woolf test, the woolf_test function in the vcd package.
The power.cmh.test function of the samplesizeCMH package calculates the
sample size required to apply the Cochran-Mantel-Haenszel test.
The clogit function of the survival package is used for conditional
logistic regression.
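Base R’s `mantelhaen.test()` can also run an exact conditional test (`exact = TRUE`); for a 2 x 2 x K table it then reports the conditional maximum likelihood estimate of the common OR. In principle, this is the same conditional likelihood that conditional logistic regression maximizes for one binary predictor and one stratifying factor, so it should be close to what `clogit(concern ~ chronic + strata(gender))` reports with its default exact method. The sketch does not call survival itself; it rebuilds the array from the stratum tables reported later in this tutorial.

```r
# 2 x 2 x 2 array of the stratum tables (chronic x concern x gender)
tab <- array(
  c(185, 48, 338, 199,    # men
    108, 49, 420, 340),   # women
  dim = c(2, 2, 2),
  dimnames = list(chronic = c("no", "yes"),
                  concern = c("not concerned", "concerned"),
                  gender  = c("men", "women"))
)

asym  <- mantelhaen.test(tab, correct = FALSE)   # asymptotic CMH test, MH estimate
exact <- mantelhaen.test(tab, exact = TRUE)      # exact conditional test, conditional MLE

round(c(mh_estimate     = unname(asym$estimate),
        conditional_mle = unname(exact$estimate)), 3)
signif(c(asymptotic_p = asym$p.value, exact_p = exact$p.value), 3)
exact$conf.int   # exact conditional 95% CI for the common OR
```

With two large strata the two estimates should be very close; the exact version mainly matters when some strata are small.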
Data were drawn from the All-Ukrainian survey conducted by the Institute of Sociology of the National Academy of Sciences of Ukraine between September 9 and October 20, 2020. The survey employed a quota sampling design, with a final sample size of 1800 respondents representative of the adult population (aged 18+) of Ukraine [6].
Research question: How does the presence of chronic diseases affect the level of concern about COVID-19?
Question formulation and response alternatives. Please tell me how
concerned you are about the coronavirus pandemic.
Very concerned (1)
Somewhat concerned (2)
Somewhat unconcerned (3)
Not concerned at all (4)
Difficult to say (5)
Do you have any chronic diseases?
No, I do not (1)
Yes, one (2)
Yes, several (3)
Your gender
Man (1)
Woman (2)
The dependent variable was COVID-19 concern, the independent variable was chronic condition, and the stratifying variable was gender.
rm(list=ls())
# load packages
library(CMHNPA)
library(coin)
library(confintr)
library(DescTools)
library(dplyr)
library(effectsize)
library(ggplot2)
library(haven)
library(lawstat)
library(plotly)
library(psych)
library(sjlabelled)
library(sjmisc)
library(sjPlot)
library(stats)
library(stratastats)
library(survival)
library(vcd)
# importing data
us2020<-read_spss("us2020.sav")
# recode variables and select the required variables into a new data frame without NA
data <- us2020 %>%
mutate(
concern = recode(
V18,
`1` = 1,
`2` = 1,
`3` = 0,
`4` = 0,
`5` = NA_real_
),
chronic = recode(
V220,
`1` = 0,
`2` = 1,
`3` = 1
),
gender = recode(V306, `1` = 0, `2` = 1)
) %>%
select(concern, chronic, gender) %>%
na.omit()
attr(data$concern, "label") <-
"Please tell us how much you are concerned about the coronavirus epidemic?"
attr(data$chronic, "label") <-
"Do you have any chronic diseases?"
attr(data$gender, "label") <- "Your gender"
df <- data.frame(
concern = factor(
data$concern,
levels = c(0, 1),
labels = c("not concerned", "concerned")
),
chronic = factor(
data$chronic,
levels = c(0, 1),
labels = c("no", "yes")
),
gender = factor(
data$gender,
levels = c(0, 1),
labels = c("men", "women")
)
)
# write_sav(df, "df.sav")
frq(df)
## concern <categorical>
## # total N=1687 valid N=1687 mean=1.77 sd=0.42
##
## Value | N | Raw % | Valid % | Cum. %
## -----------------------------------------------
## not concerned | 390 | 23.12 | 23.12 | 23.12
## concerned | 1297 | 76.88 | 76.88 | 100.00
## <NA> | 0 | 0.00 | <NA> | <NA>
##
## chronic <categorical>
## # total N=1687 valid N=1687 mean=1.38 sd=0.48
##
## Value | N | Raw % | Valid % | Cum. %
## ---------------------------------------
## no | 1051 | 62.30 | 62.30 | 62.30
## yes | 636 | 37.70 | 37.70 | 100.00
## <NA> | 0 | 0.00 | <NA> | <NA>
##
## gender <categorical>
## # total N=1687 valid N=1687 mean=1.54 sd=0.50
##
## Value | N | Raw % | Valid % | Cum. %
## --------------------------------------
## men | 770 | 45.64 | 45.64 | 45.64
## women | 917 | 54.36 | 54.36 | 100.00
## <NA> | 0 | 0.00 | <NA> | <NA>
frq(data)
## Please tell us how much you are concerned about the coronavirus epidemic? (concern) <numeric>
## # total N=1687 valid N=1687 mean=0.77 sd=0.42
##
## Value | N | Raw % | Valid % | Cum. %
## ---------------------------------------
## 0 | 390 | 23.12 | 23.12 | 23.12
## 1 | 1297 | 76.88 | 76.88 | 100.00
## <NA> | 0 | 0.00 | <NA> | <NA>
##
## Do you have any chronic diseases? (chronic) <numeric>
## # total N=1687 valid N=1687 mean=0.38 sd=0.48
##
## Value | N | Raw % | Valid % | Cum. %
## ---------------------------------------
## 0 | 1051 | 62.30 | 62.30 | 62.30
## 1 | 636 | 37.70 | 37.70 | 100.00
## <NA> | 0 | 0.00 | <NA> | <NA>
##
## Your gender (gender) <numeric>
## # total N=1687 valid N=1687 mean=0.54 sd=0.50
##
## Value | N | Raw % | Valid % | Cum. %
## --------------------------------------
## 0 | 770 | 45.64 | 45.64 | 45.64
## 1 | 917 | 54.36 | 54.36 | 100.00
## <NA> | 0 | 0.00 | <NA> | <NA>
cat("Contingency table, chi-square independence test, phi")
## Contingency table, chi-square independence test, phi
tab_xtab(
var.row = df$chronic,
var.col = df$concern,
show.row.prc = TRUE,
show.summary = FALSE,
var.labels = c("chronic condition",
"covid concern")
)
| chronic condition | covid concern: not concerned | covid concern: concerned | Total |
|---|---|---|---|
| no | 293 (27.9 %) | 758 (72.1 %) | 1051 (100 %) |
| yes | 97 (15.3 %) | 539 (84.7 %) | 636 (100 %) |
| Total | 390 (23.1 %) | 1297 (76.9 %) | 1687 (100 %) |
cross_table <- table(df$chronic, df$concern)
print(chisq.test(cross_table, correct = FALSE), digits = 5)
##
## Pearson's Chi-squared test
##
## data: cross_table
## X-squared = 35.5, df = 1, p-value = 2.5e-09
effectsize::phi(cross_table)
## Phi (adj.) | 95% CI
## -------------------------
## 0.14 | [0.10, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
pow<-power.chisq.test(w=0.14 , df=1, n=1687, sig.level=0.05)
cat("Power of the effect")
## Power of the effect
print(pow, digits=3)
##
## Chi squared power calculation
##
## w = 0.14
## n = 1687
## df = 1
## sig.level = 0.05
## power = 1
##
## NOTE: n is the number of observations
cross_table <-
table(df$chronic, df$concern)
or<-OddsRatio(
cross_table,
method = "wald",
conf.level = 0.95
)
cat("Odds ratio")
## Odds ratio
print(or, digits=3)
## odds ratio lwr.ci upr.ci
## 2.15 1.66 2.77
cat("Odds and odds ratio")
## Odds and odds ratio
cat("Odds of an event without additional information")
## Odds of an event without additional information
round((1297/390),2)
## [1] 3.33
cat("Odds in control group")
## Odds in control group
round((758/293),2)
## [1] 2.59
cat("Odds in experimental group")
## Odds in experimental group
round((539/97),2)
## [1] 5.56
cat("Odds ratio")
## Odds ratio
round((539/97)/(758/293),2)
## [1] 2.15
A statistically significant association was found between chronic condition and Covid concern in the contingency table (χ² (1, n = 1687) = 35.5, p < .001). The effect was small (φ = 0.14, 95% CI [0.10, 0.18], OR = 2.15, 95% CI [1.66, 2.77]). The power to detect an effect of this size was high (≈ 1).
cat("Contingency table, chi-square independence test, phi")
## Contingency table, chi-square independence test, phi
tab_xtab(
var.row = df$gender,
var.col = df$chronic,
show.row.prc = TRUE,
show.summary = FALSE,
var.labels = c("gender",
"chronic condition")
)
| gender | chronic condition: no | chronic condition: yes | Total |
|---|---|---|---|
| men | 523 (67.9 %) | 247 (32.1 %) | 770 (100 %) |
| women | 528 (57.6 %) | 389 (42.4 %) | 917 (100 %) |
| Total | 1051 (62.3 %) | 636 (37.7 %) | 1687 (100 %) |
cross_table1 <- table(df$gender, df$chronic)
print(chisq.test(cross_table1, correct = FALSE), digits = 5)
##
## Pearson's Chi-squared test
##
## data: cross_table1
## X-squared = 19.1, df = 1, p-value = 1.3e-05
effectsize::phi(cross_table1)
## Phi (adj.) | 95% CI
## -------------------------
## 0.10 | [0.06, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
A statistically significant association was found between gender and chronic condition (χ² (1, n = 1687) = 19.1, p < .001). The effect was small (φ = 0.10, 95% CI [0.06, 1]).
cat("Contingency table, chi-square independence test, phi")
## Contingency table, chi-square independence test, phi
tab_xtab(
var.row = df$gender,
var.col = df$concern,
show.row.prc = TRUE,
show.summary = FALSE,
var.labels = c("gender",
"covid concern")
)
| gender | covid concern: not concerned | covid concern: concerned | Total |
|---|---|---|---|
| men | 233 (30.3 %) | 537 (69.7 %) | 770 (100 %) |
| women | 157 (17.1 %) | 760 (82.9 %) | 917 (100 %) |
| Total | 390 (23.1 %) | 1297 (76.9 %) | 1687 (100 %) |
cross_table2 <- table(df$gender, df$concern)
print(chisq.test(cross_table2, correct = FALSE), digits = 5)
##
## Pearson's Chi-squared test
##
## data: cross_table2
## X-squared = 40.7, df = 1, p-value = 1.8e-10
effectsize::phi(cross_table2)
## Phi (adj.) | 95% CI
## -------------------------
## 0.15 | [0.11, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
A statistically significant association was found between gender and Covid concern (χ² (1, n = 1687) = 40.7, p < .001). The effect was small (φ = 0.15, 95% CI [0.11, 1]).
cat("Yule's Q")
## Yule's Q
YuleCor(df[1:3])
## Yule and Generalized Yule coefficients
## Call: YuleCor(x = df[1:3])
##
## Yule coefficient
## concern chronic gender
## concern 1.00 0.36 0.35
## chronic 0.36 1.00 0.22
## gender 0.35 0.22 1.00
##
## Upper and Lower Confidence Intervals =
## concern chronic gender
## concern 1.00 0.47 0.45
## chronic 0.25 1.00 0.31
## gender 0.25 0.12 1.00
Yule’s Q indicates a statistically significant association between Covid concern and chronic conditions (Q = 0.36) and between Covid concern and gender (Q = 0.35), whereas the association between chronic conditions and gender was weaker (Q = 0.22).
options(digits = 3)
plot <- ggplot(df, aes(x = chronic, fill = concern)) +
geom_bar(position = "fill") +
facet_grid(~gender) +
labs(title = "Mosaic Plot",
x = "chronic condition",
y = "Proportion",
fill = "covid concern")
ggplotly(plot)
# Split the data set into two subgroups by gender
df_men <- subset(df, gender == 'men')
df_women <- subset(df, gender == 'women')
cat("Contingency table, chi-square independence test, phi")
## Contingency table, chi-square independence test, phi
tab_xtab(
var.row = df_men$chronic,
var.col = df_men$concern,
show.row.prc = TRUE,
show.summary = FALSE,
var.labels = c("chronic condition",
"covid concern")
)
| chronic condition | covid concern: not concerned | covid concern: concerned | Total |
|---|---|---|---|
| no | 185 (35.4 %) | 338 (64.6 %) | 523 (100 %) |
| yes | 48 (19.4 %) | 199 (80.6 %) | 247 (100 %) |
| Total | 233 (30.3 %) | 537 (69.7 %) | 770 (100 %) |
cross_table3 <- table(df_men$chronic, df_men$concern)
print(chisq.test(cross_table3, correct = FALSE), digits = 5)
##
## Pearson's Chi-squared test
##
## data: cross_table3
## X-squared = 20.2, df = 1, p-value = 7e-06
effectsize::phi(cross_table3)
## Phi (adj.) | 95% CI
## -------------------------
## 0.16 | [0.10, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
cat("Power of the effect of association chronic condition and Covid concern among men")
## Power of the effect of association chronic condition and Covid concern among men
print(power.chisq.test(
w = 0.16,
df = 1,
n = 770,
sig.level = 0.05
),
digits = 3)
##
## Chi squared power calculation
##
## w = 0.16
## n = 770
## df = 1
## sig.level = 0.05
## power = 0.993
##
## NOTE: n is the number of observations
cat("Odds ratio among men")
## Odds ratio among men
cross_table_men <-
table(df_men$chronic, df_men$concern)
or_men<-OddsRatio(
cross_table_men,
method = "wald",
conf.level = 0.95
)
print(or_men, digits=3)
## odds ratio lwr.ci upr.ci
## 2.27 1.58 3.26
cat("Yule's Q among men")
## Yule's Q among men
YuleCor(df_men[1:2])
## Yule and Generalized Yule coefficients
## Call: YuleCor(x = df_men[1:2])
##
## Yule coefficient
## concern chronic
## concern 1.00 0.39
## chronic 0.39 1.00
##
## Upper and Lower Confidence Intervals =
## concern chronic
## concern 1.00 0.53
## chronic 0.22 1.00
cat("Contingency table, chi-square independence test, phi")
## Contingency table, chi-square independence test, phi
tab_xtab(
var.row = df_women$chronic,
var.col = df_women$concern,
show.row.prc = TRUE,
show.summary = FALSE,
var.labels = c("chronic condition",
"covid concern"))
| chronic condition | covid concern: not concerned | covid concern: concerned | Total |
|---|---|---|---|
| no | 108 (20.5 %) | 420 (79.5 %) | 528 (100 %) |
| yes | 49 (12.6 %) | 340 (87.4 %) | 389 (100 %) |
| Total | 157 (17.1 %) | 760 (82.9 %) | 917 (100 %) |
cross_table4 <- table(df_women$chronic, df_women$concern)
print(chisq.test(cross_table4, correct = FALSE), digits = 5)
##
## Pearson's Chi-squared test
##
## data: cross_table4
## X-squared = 9.75, df = 1, p-value = 0.0018
effectsize::phi(cross_table4)
## Phi (adj.) | 95% CI
## -------------------------
## 0.10 | [0.04, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
cat("Power of the effect of association chronic condition and Covid concern among women")
## Power of the effect of association chronic condition and Covid concern among women
print(power.chisq.test(
w = 0.10,
df = 1,
n = 917,
sig.level = 0.05
),
digits = 3)
##
## Chi squared power calculation
##
## w = 0.1
## n = 917
## df = 1
## sig.level = 0.05
## power = 0.857
##
## NOTE: n is the number of observations
cat("Odds ratio among women")
## Odds ratio among women
cross_table_women <-
table(df_women$chronic, df_women$concern)
or_women<-OddsRatio(
cross_table_women,
method = "wald",
conf.level = 0.95
)
print(or_women, digits=3)
## odds ratio lwr.ci upr.ci
## 1.78 1.24 2.57
cat("Yule's Q among women")
## Yule's Q among women
YuleCor(df_women[1:2])
## Yule and Generalized Yule coefficients
## Call: YuleCor(x = df_women[1:2])
##
## Yule coefficient
## concern chronic
## concern 1.00 0.28
## chronic 0.28 1.00
##
## Upper and Lower Confidence Intervals =
## concern chronic
## concern 1.00 0.44
## chronic 0.11 1.00
Subgroup analyses of the contingency tables reveal patterns consistent with the overall findings: respondents with chronic diseases showed higher levels of Covid concern among both men and women.
# Stratum-specific 2 x 2 tables: rows = chronic condition (no, yes),
# columns = covid concern (not concerned, concerned)
table1 <- matrix(c(185, 338, 48, 199), byrow = TRUE, ncol = 2)  # men
table2 <- matrix(c(108, 420, 49, 340), byrow = TRUE, ncol = 2)  # women
tables <- list(table1, table2)
varnames <- c("concern","chronic")
stratvar <- "gender"
results <-
stratastats(
input.data = tables,
variable.names = varnames,
stratifying.variable.name = stratvar
)
print(results$cmh.test.result)
##
## Mantel-Haenszel chi-squared test without continuity correction
##
## data: array_of_tables
## Mantel-Haenszel X-squared = 29, df = 1, p-value = 6e-08
## alternative hypothesis: true common odds ratio is not equal to 1
## 95 percent confidence interval:
## 1.56 2.61
## sample estimates:
## common odds ratio
## 2.02
cmh_test(concern ~ chronic |gender, data = df)
##
## Asymptotic Generalized Cochran-Mantel-Haenszel Test
##
## data: concern by
## chronic (no, yes)
## stratified by gender
## chi-squared = 29, df = 1, p-value = 6e-08
CMH(treatment = df$concern, response = df$chronic,
strata = df$gender, cor_breakdown = TRUE)
## Warning in CMH(treatment = df$concern, response = df$chronic, strata =
## df$gender, : Treatment and response scores not provided. Not performing Mean
## Score and Correlation tests.
##
## Cochran Mantel Haenszel Tests
##
## S df p-value
## Overall Partial Association 29.9 2 3.20e-07
## General Association 29.2 1 6.43e-08
mantelhaen.test(df$concern, df$chronic, df$gender,
correct = FALSE)
##
## Mantel-Haenszel chi-squared test without continuity correction
##
## data: df$concern and df$chronic and df$gender
## Mantel-Haenszel X-squared = 29, df = 1, p-value = 6e-08
## alternative hypothesis: true common odds ratio is not equal to 1
## 95 percent confidence interval:
## 1.56 2.61
## sample estimates:
## common odds ratio
## 2.02
tablest <- xtabs(~ concern + chronic + gender, data = df)
mantelhaen.test(tablest)
##
## Mantel-Haenszel chi-squared test with continuity correction
##
## data: tablest
## Mantel-Haenszel X-squared = 29, df = 1, p-value = 9e-08
## alternative hypothesis: true common odds ratio is not equal to 1
## 95 percent confidence interval:
## 1.56 2.61
## sample estimates:
## common odds ratio
## 2.02
cat("The Breslow-Day-Tarone test")
## The Breslow-Day-Tarone test
print(results$bdt.test.result, digits=4)
## $X2.HBD
## [1] 0.836
##
## $X2.HBDT
## [1] 0.836
##
## $p
## [1] 0.3606
##
## attr(,"class")
## [1] "bdtest"
cat("The logarithm of the odds ratios")
## The logarithm of the odds ratios
vcd::oddsratio(tablest, log = TRUE)  # log odds ratio of concern vs chronic in each gender stratum
## log odds ratios for concern and chronic by gender
##
## men women
## 0.819 0.579
cat("The Woolf test")
## The Woolf test
woolf_test(tablest)
##
## Woolf-test on Homogeneity of Odds Ratios (no 3-Way assoc.)
##
## data: tablest
## X-squared = 0.8, df = 1, p-value = 0.4
cat("The Mantel-Haenszel test of homogeneity of odds ratios")
## The Mantel-Haenszel test of homogeneity of odds ratios
print(results$mh.test.result, digits=4)
## $statistic
## [1] 0.8351
##
## $df
## [1] 1
##
## $p.value
## [1] 0.3608
fit <- glm(concern ~ chronic,
family = binomial(link = "logit"), data = data)
summary(fit)
##
## Call:
## glm(formula = concern ~ chronic, family = binomial(link = "logit"),
## data = data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.9505 0.0688 13.82 < 2e-16 ***
## chronic 0.7645 0.1300 5.88 4.1e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1824.3 on 1686 degrees of freedom
## Residual deviance: 1787.2 on 1685 degrees of freedom
## AIC: 1791
##
## Number of Fisher Scoring iterations: 4
cat("The exponentiated coefficient (Exp(estim))")
## The exponentiated coefficient (Exp(estim))
round(exp(coefficients(fit)[1:2]),2)
## (Intercept) chronic
## 2.59 2.15
cat("Effects")
## Effects
cat("2.59 (intercept): baseline odds of Covid concern in the reference group (no chronic condition)")
## 2.59 (intercept): baseline odds of Covid concern in the reference group (no chronic condition)
cat("2.15 (chronic): odds ratio for respondents with a chronic condition versus those without")
## 2.15 (chronic): odds ratio for respondents with a chronic condition versus those without
The exponentiated coefficient for chronic condition (Exp(estim) = 2.15) is an odds ratio: respondents with a chronic condition had about 2.15 times the odds of reporting Covid concern compared with respondents without one. The exponentiated intercept (Exp(estim) = 2.59) is not an odds ratio but the baseline odds of Covid concern in the reference category, that is, among respondents without a chronic condition.
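As a quick check of this interpretation, the sketch below fits the same kind of logistic model to the women's partial table (the counts of `table2` above, entered one row per cell with the count as a frequency weight) and compares exp(coefficient) with the cross-product odds ratio ad/bc. It assumes that the rows of `table2` are the levels of the predictor and the columns the levels of the outcome; the odds ratio itself does not depend on which variable plays which role. The names `counts`, `cells` and `fit_cells` are illustrative only.

```r
# Counts of the women's partial table (table2 above).
# Assumption: rows = predictor levels (x), columns = outcome levels (y).
counts <- matrix(c(108, 420, 49, 340), byrow = TRUE, ncol = 2)

# Cross-product odds ratio ad / bc
or_table <- (counts[1, 1] * counts[2, 2]) / (counts[1, 2] * counts[2, 1])

# Same table, one row per cell, with the count used as a frequency weight
cells <- data.frame(
  x = c(1, 1, 0, 0),  # 1 = first row, 0 = second row
  y = c(1, 0, 1, 0),  # 1 = first column, 0 = second column
  n = c(counts[1, 1], counts[1, 2], counts[2, 1], counts[2, 2])
)
fit_cells <- glm(y ~ x, family = binomial(link = "logit"),
                 data = cells, weights = n)

or_glm <- unname(exp(coef(fit_cells)["x"]))                   # odds ratio
odds_baseline <- unname(exp(coef(fit_cells)["(Intercept)"]))  # odds of y = 1 when x = 0

round(c(or_table = or_table, or_glm = or_glm, odds_baseline = odds_baseline), 3)
```

Both odds ratios come out at about 1.78, whose logarithm (about 0.579) is the women's log odds ratio printed by `vcd::oddsratio()` above; the exponentiated intercept is simply the odds in the second row, 49/340.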
cat("The impact of chronic condition on Covid concern with gender as the stratifying variable")
## The impact of chronic condition on Covid concern with gender as the stratifying variable
fitc <- clogit(concern ~ chronic + strata(gender), data = data)  # clogit() from the survival package
cat("The exponentiated coefficient (Exp(estim))")
## The exponentiated coefficient (Exp(estim))
summary(fitc)
## Call:
## coxph(formula = Surv(rep(1, 1687L), concern) ~ chronic + strata(gender),
## data = data, method = "exact")
##
## n= 1687, number of events= 1297
##
## coef exp(coef) se(coef) z Pr(>|z|)
## chronic 0.702 2.018 0.131 5.34 9.1e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## chronic 2.02 0.496 1.56 2.61
##
## Concordance= 0.575 (se = 0.013 )
## Likelihood ratio test= 30.4 on 1 df, p=4e-08
## Wald test = 28.6 on 1 df, p=9e-08
## Score (logrank) test = 29.2 on 1 df, p=6e-08
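In the conditional logistic regression, the exponentiated coefficient (2.02, 95% CI 1.56 to 2.61) is practically identical to the common odds ratio from the CMH test (2.02, 95% CI 1.56 to 2.61). Strictly, these are two different estimators of the same quantity (conditional maximum likelihood versus the Mantel-Haenszel ratio estimator), which usually agree closely when the strata are large. Likewise, the score test of the conditional model (29.2) matches the CMH statistic without continuity correction (29.2).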
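Unlike the conditional maximum likelihood estimate, the Mantel-Haenszel common odds ratio has a closed form: a ratio of sums over strata. The sketch below implements that formula for any 2 x 2 x K array and checks it against base R's `mantelhaen.test()` on the built-in `UCBAdmissions` data (a 2 x 2 x 6 table, used here only because it ships with R; it is not the survey data). The function name `mh_common_or` is illustrative.

```r
# Mantel-Haenszel common odds ratio for a 2 x 2 x K array of counts:
# OR_MH = sum_k(n11k * n22k / nk) / sum_k(n12k * n21k / nk)
mh_common_or <- function(x) {
  stopifnot(length(dim(x)) == 3, all(dim(x)[1:2] == 2))
  x <- x * 1.0                 # work in doubles
  n_k <- apply(x, 3, sum)      # stratum totals
  num <- sum(x[1, 1, ] * x[2, 2, ] / n_k)
  den <- sum(x[1, 2, ] * x[2, 1, ] / n_k)
  num / den
}

# Check against base R on the Berkeley admissions data (2 x 2 x 6)
or_mh <- mh_common_or(UCBAdmissions)
or_builtin <- unname(mantelhaen.test(UCBAdmissions)$estimate)
c(by_hand = or_mh, mantelhaen.test = or_builtin)
```

Because it uses the same formula, `mh_common_or(tablest)` on the survey table built earlier should return the 2.02 printed by `mantelhaen.test()`.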
Statistical findings from the stratastats package. Chi-squared test results: partial table (men subgroup) chi-sq: 20.20, df: 1, p-value < 0.001; partial table (women subgroup) chi-sq: 9.75, df: 1, p-value: 0.002.
Marginal table chi-sq: 35.54, df: 1, p-value < 0.001.
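The partial-table values appear to be Pearson chi-squared statistics without continuity correction. The sketch below reproduces the women's 9.75 from the counts of `table2` above, once with the closed-form 2 x 2 expression n(ad − bc)^2 / (r1 r2 c1 c2) and once with base R's `chisq.test(correct = FALSE)`. The helper `pearson_2x2` is a hypothetical name, not part of stratastats.

```r
# Women's partial table (counts of table2 above)
women <- matrix(c(108, 420, 49, 340), byrow = TRUE, ncol = 2)

# Pearson chi-squared for a 2 x 2 table, no continuity correction:
# n (ad - bc)^2 / (r1 r2 c1 c2)
pearson_2x2 <- function(m) {
  n <- sum(m)
  num <- n * (m[1, 1] * m[2, 2] - m[1, 2] * m[2, 1])^2
  den <- prod(rowSums(m)) * prod(colSums(m))
  num / den
}

by_hand <- pearson_2x2(women)
builtin <- unname(chisq.test(women, correct = FALSE)$statistic)
round(c(by_hand = by_hand, chisq.test = builtin), 2)
```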
The Cochran-Mantel-Haenszel test is significant (test statistic: 29.23;
df: 1; p-value < 0.001), suggesting conditional dependence (the odds
ratio in at least one of the partial tables is not equal to 1).
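The CMH statistic compares, across strata, the observed top-left count of each partial table with its expected value under conditional independence, scaled by the hypergeometric variance. A minimal sketch of that computation, checked against `mantelhaen.test(correct = FALSE)` on the built-in `UCBAdmissions` data (again not the survey data), follows; `cmh_statistic` is an illustrative name.

```r
# CMH statistic (no continuity correction) for a 2 x 2 x K array:
# (sum_k [n11k - E(n11k)])^2 / sum_k Var(n11k), with hypergeometric
# E = r1 c1 / n and Var = r1 r2 c1 c2 / (n^2 (n - 1)) in each stratum
cmh_statistic <- function(x) {
  stopifnot(length(dim(x)) == 3, all(dim(x)[1:2] == 2))
  x <- x * 1.0                 # doubles avoid integer overflow in the products
  r1 <- x[1, 1, ] + x[1, 2, ]
  r2 <- x[2, 1, ] + x[2, 2, ]
  c1 <- x[1, 1, ] + x[2, 1, ]
  c2 <- x[1, 2, ] + x[2, 2, ]
  n  <- r1 + r2
  expected <- r1 * c1 / n
  variance <- r1 * r2 * c1 * c2 / (n^2 * (n - 1))
  sum(x[1, 1, ] - expected)^2 / sum(variance)
}

stat <- cmh_statistic(UCBAdmissions)
builtin <- unname(mantelhaen.test(UCBAdmissions, correct = FALSE)$statistic)
p_value <- pchisq(stat, df = 1, lower.tail = FALSE)
c(by_hand = stat, mantelhaen.test = builtin, p = p_value)
```

Applied to the survey table, `cmh_statistic(tablest)` should give the uncorrected value of about 29.2 shown earlier by `mantelhaen.test(..., correct = FALSE)`.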
The Mantel-Haenszel test for homogeneity of odds ratios is not
significant (test statistic: 0.84; df: 1; p-value: 0.361), so there is
no evidence that the odds ratios differ across strata.
The Breslow-Day-Tarone test for homogeneity of odds ratios is also not
significant (test statistic: 0.84; df: 1; p-value: 0.361), which is
likewise consistent with homogeneous odds ratios across strata.
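All of these homogeneity tests ask whether the stratum-specific log odds ratios differ by more than chance would allow. The simplest, the Woolf test run above with `vcd::woolf_test()`, can be sketched in base R as the inverse-variance weighted sum of squared deviations of the log odds ratios from their weighted mean, referred to a chi-squared distribution with K − 1 df. This is a textbook version, not vcd's source; in particular it assumes no empty cells and applies no zero-cell correction. `woolf_homogeneity` is an illustrative name.

```r
# Woolf test of homogeneity of odds ratios for a 2 x 2 x K array.
# Assumes every cell is non-zero (no zero-cell correction is applied).
woolf_homogeneity <- function(x) {
  stopifnot(length(dim(x)) == 3, all(dim(x)[1:2] == 2), all(x > 0))
  x <- x * 1.0
  n11 <- x[1, 1, ]; n12 <- x[1, 2, ]
  n21 <- x[2, 1, ]; n22 <- x[2, 2, ]
  log_or <- log(n11 * n22 / (n12 * n21))            # stratum log odds ratios
  w <- 1 / (1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)  # inverse of Var(log OR)
  pooled <- sum(w * log_or) / sum(w)                # weighted mean log OR
  statistic <- sum(w * (log_or - pooled)^2)
  dof <- dim(x)[3] - 1
  list(statistic = statistic, df = dof,
       p.value = pchisq(statistic, df = dof, lower.tail = FALSE))
}

# Two identical strata: perfectly homogeneous odds ratios, statistic 0
same <- array(rep(c(10, 30, 20, 40), 2), dim = c(2, 2, 2))
woolf_homogeneity(same)$statistic

# Berkeley admissions (2 x 2 x 6): department-specific odds ratios vary
woolf_homogeneity(UCBAdmissions)
```

With two strata, as in the gender analysis, the test has 1 df, matching the `df = 1` reported by `woolf_test(tablest)`.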
Given the homogeneity of odds ratios across strata, “gender” does not
significantly modify the association between “concern” and “chronic”.
The conditional association between “concern” and “chronic” can
therefore be treated as the same (in direction and magnitude) at each
level of the stratifying variable “gender” (log odds ratios of 0.819 for
men and 0.579 for women). Because it does not differ significantly across
the levels of “gender”, the association can be summarised with the
Mantel-Haenszel estimate of the common odds ratio (2.02).
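# Percent change from the crude odds ratio (2.15, glm) to the
# Mantel-Haenszel adjusted common odds ratio (2.02)
result_change <- round(((2.15 - 2.02) * 100 / 2.15), 1)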
output <- sprintf("%.1f%%", result_change)
final_output <- paste(output, "change")
cat(final_output, "\n")
## 6.0% change
Gender does not appear to confound the association: the unadjusted (2.15) and adjusted (2.02) odds ratios differ by 6.0%, which is below the commonly used 10% change-in-estimate threshold. Further research could examine the association using age groups as the stratifying variable.
Rayner, J. C. W., & Livingston, G. C. (2022). Introduction to Cochran-Mantel-Haenszel Testing and Nonparametric ANOVA. John Wiley & Sons.
Chen, H., Cohen, P., & Chen, S. (2010). How Big is a Big Odds Ratio? Interpreting the Magnitudes of Odds Ratios in Epidemiological Studies. Communications in Statistics - Simulation and Computation, 39(4), 860–864. https://doi.org/10.1080/03610911003650383 (date of access: 16.01.2024).
SAS Help Center. URL: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_freq_details92.htm (date of access: 16.01.2024).
Alberti, G. (2024). stratastats: Stratified Analysis of 2x2 Contingency Tables. R package version 0.2. https://CRAN.R-project.org/package=stratastats.
Tripepi, G., Jager, K. J., Dekker, F. W., & Zoccali, C. (2010). Stratification for Confounding – Part 1: The Mantel-Haenszel Formula. Nephron Clinical Practice, 116(4), 317-321. https://doi.org/10.1159/000319590 (date of access: 25.01.2024).
Vorona, V. M., & Shulha, M. O. (Eds.). (2020). Ukrainian Society: Monitoring of Social Changes. Issue 7 (21). Kyiv: Institute of Sociology of the National Academy of Sciences of Ukraine. https://isnasu.org.ua/assets/files/monitoring/mon2020.pdf (date of access: 16.01.2024).