Make sure you have installed the updated package from GitHub using the command:

devtools::install_github("umarhussain-git/dentomedical1")

About dentomedical

Publication-Ready Descriptive, Bivariate, Regression, and Diagnostic Accuracy Tools for Medical and Dental Data

Functions in the package

  • category: Categorize a Numeric Variable into Custom Ranges
  • impute_missing: Impute Missing Values in a Data Frame
  • medical_data: Load Infertility Dataset
  • recode_data: Recode Values in a Data Frame Using a Lookup Table
  • sum_cor: Summarize Correlations Between a Reference Variable and Others
  • sum_norm: Normality Test Summary Table for Numeric Variables
  • sum_posthoc: Summarize Variables with Post-hoc Tests (Multiple Comparisons)
  • sum_stat: Summarize Continuous and Categorical Variables with Optional Grouping
  • sum_stat_p: Summarize Continuous and Categorical Variables with Grouping and P-Values
  • sum_stat_p_strata: Summarize Variables with Optional Stratification and Statistical Tests
  • linreg: Linear Regression Table with Univariable and Multivariable Analysis
  • logreg: Binary Logistic Regression Table with Univariable and Multivariable Analysis
  • diag_accuracy: Diagnostic Accuracy Metrics with Optional 2x2 Table

category

category creates a new categorical variable by splitting a numeric column into specified ranges. You can provide custom labels for each range, or it will default to the range values themselves.

df <- data.frame(age = c(20, 28, 26, 40, 55, 34, 10, 24, 55))

# Categorize without custom labels
category(df, var = age, level = c("10-25", "26-35", "36-50"))
##   age age_group
## 1  20     10-25
## 2  28     26-35
## 3  26     26-35
## 4  40     36-50
## 5  55      <NA>
## 6  34     26-35
## 7  10     10-25
## 8  24     10-25
## 9  55      <NA>
# Categorize with custom labels
category(df, var = age, level = c("10-25", "26-35", "36-55"),
         labels = c("young", "adult", "old"))
##   age age_group
## 1  20     young
## 2  28     adult
## 3  26     adult
## 4  40       old
## 5  55       old
## 6  34     adult
## 7  10     young
## 8  24     young
## 9  55       old
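For comparison, the same grouping can be sketched with base R's cut(). This is only an illustration, not the package's implementation: the numeric breaks below are an assumed translation of the "10-25", "26-35", "36-55" ranges for whole-number ages.

```r
df <- data.frame(age = c(20, 28, 26, 40, 55, 34, 10, 24, 55))

# Assumed breaks: with right = TRUE (the default) the intervals are
# (9, 25], (25, 35] and (35, 55], which cover 10-25, 26-35 and 36-55
# for integer ages.
df$age_group <- cut(df$age,
                    breaks = c(9, 25, 35, 55),
                    labels = c("young", "adult", "old"))
df
```

Values outside every interval become NA, which mirrors the <NA> rows in the first example above.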

recode_data

This function replaces values in a data frame according to a named lookup vector. All columns are converted to character, and any value matching a name in lookup is replaced by its corresponding value. This is useful for correcting spelling mistakes or relabelling categories, for example replacing “F” with “Female” in a gender variable.

df <- data.frame(
  gender = c("male", "F", "male", "female"),
  status = c("single", "Married", "oo", "M"),
  stringsAsFactors = FALSE
)

lookup <- c(
  "male" = "Male",
  "M" = "Married",
  "oo" = "Widow",
  "female" = "Female",
  "F" = "Female"
)

df_recode <- recode_data(df, lookup)
print(df_recode)
##   gender  status
## 1   Male  single
## 2 Female Married
## 3   Male   Widow
## 4 Female Married
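The lookup idea can be sketched in base R as follows. The helper recode_column is a hypothetical name used only for this illustration; it is not part of dentomedical and may differ from how recode_data works internally.

```r
df <- data.frame(
  gender = c("male", "F", "male", "female"),
  status = c("single", "Married", "oo", "M"),
  stringsAsFactors = FALSE
)
lookup <- c("male" = "Male", "M" = "Married", "oo" = "Widow",
            "female" = "Female", "F" = "Female")

# Hypothetical helper: replace only the values that appear as names in lookup
recode_column <- function(x, lookup) {
  x <- as.character(x)
  hit <- x %in% names(lookup)
  x[hit] <- unname(lookup[x[hit]])
  x
}

# Apply the helper to every column and rebuild the data frame
df_recode <- as.data.frame(lapply(df, recode_column, lookup = lookup),
                           stringsAsFactors = FALSE)
df_recode
```

Values without a match in lookup, such as "single", pass through unchanged.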

impute_missing

This function imputes missing values in a data frame. For categorical variables (factor or character), missing values are replaced with the mode (most common category). For numeric variables, missing values can be imputed using the mean, median, or regression-based imputation. If no method is specified for numeric columns, missing values are left as NA. The starwars data from dplyr contains some missing values, which the examples below handle.

library(dplyr) # to import starwars data
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
head(starwars)
## # A tibble: 6 × 14
##   name      height  mass hair_color skin_color eye_color birth_year sex   gender
##   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
## 1 Luke Sky…    172    77 blond      fair       blue            19   male  mascu…
## 2 C-3PO        167    75 <NA>       gold       yellow         112   none  mascu…
## 3 R2-D2         96    32 <NA>       white, bl… red             33   none  mascu…
## 4 Darth Va…    202   136 none       white      yellow          41.9 male  mascu…
## 5 Leia Org…    150    49 brown      light      brown           19   fema… femin…
## 6 Owen Lars    178   120 brown, gr… light      blue            52   male  mascu…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
# Impute numeric columns using regression and categorical with mode
impute_missing(starwars, method = "regression")
## # A tibble: 87 × 14
##    name     height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>     <dbl> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
##  2 C-3PO       167    75 none       gold       yellow         112   none  mascu…
##  3 R2-D2        96    32 none       white, bl… red             33   none  mascu…
##  4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
##  5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
##  6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
##  7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
##  8 R5-D4        97    32 none       white, red red            254.  none  mascu…
##  9 Biggs D…    183    84 black      light      brown           24   male  mascu…
## 10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
## # ℹ 77 more rows
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
# Impute numeric columns using mean
impute_missing(starwars, method = "mean")
## # A tibble: 87 × 14
##    name     height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>     <dbl> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
##  2 C-3PO       167    75 none       gold       yellow         112   none  mascu…
##  3 R2-D2        96    32 none       white, bl… red             33   none  mascu…
##  4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
##  5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
##  6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
##  7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
##  8 R5-D4        97    32 none       white, red red             87.6 none  mascu…
##  9 Biggs D…    183    84 black      light      brown           24   male  mascu…
## 10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
## # ℹ 77 more rows
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
# Impute numeric columns using median
impute_missing(starwars, method = "median")
## # A tibble: 87 × 14
##    name     height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
##  2 C-3PO       167    75 none       gold       yellow         112   none  mascu…
##  3 R2-D2        96    32 none       white, bl… red             33   none  mascu…
##  4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
##  5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
##  6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
##  7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
##  8 R5-D4        97    32 none       white, red red             52   none  mascu…
##  9 Biggs D…    183    84 black      light      brown           24   male  mascu…
## 10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
## # ℹ 77 more rows
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
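To make the mean and mode rules concrete, here is a minimal base R sketch on a toy data frame. The data and the helper names impute_mean and impute_mode are made up for illustration; the package's own code may handle ties and column types differently.

```r
toy <- data.frame(
  height = c(150, NA, 170, 180),
  colour = c("red", NA, "red", "blue"),
  stringsAsFactors = FALSE
)

# Replace missing numeric values with the mean of the observed values
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Replace missing categorical values with the most frequent category
impute_mode <- function(x) {
  tab <- table(x)  # table() ignores NA by default
  x[is.na(x)] <- names(tab)[which.max(tab)]
  x
}

toy$height <- impute_mean(toy$height)
toy$colour <- impute_mode(toy$colour)
toy
```

Swapping mean() for median() gives the median variant.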

sum_norm

This function performs the Shapiro-Wilk normality test on all numeric variables in a dataset and returns the results in a publication-ready flextable. Extremely small p-values are displayed as “p < 0.001”. The function automatically detects numeric variables and ignores non-numeric columns.

sum_norm(iris)

Variable      W Statistic  p.value  Distribution
Sepal.Length  0.976        0.0102   Skewed
Sepal.Width   0.985        0.1012   Normal
Petal.Length  0.876        < 0.001  Skewed
Petal.Width   0.902        < 0.001  Skewed
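The table can be approximated with base R's shapiro.test(). This is a sketch, not the package's code: the 0.05 cut-off for labelling a variable "Skewed" is an assumption that happens to agree with the output above.

```r
# Keep only numeric columns, as sum_norm is described as doing
num_cols <- names(iris)[sapply(iris, is.numeric)]

res <- do.call(rbind, lapply(num_cols, function(v) {
  st <- shapiro.test(iris[[v]])
  data.frame(Variable = v,
             W = round(unname(st$statistic), 3),
             p.value = st$p.value)
}))

# Assumed cut-off: p < 0.05 is labelled "Skewed", otherwise "Normal"
res$Distribution <- ifelse(res$p.value < 0.05, "Skewed", "Normal")
res
```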

sum_stat

sum_stat provides a summary of both continuous and categorical variables in a dataset. Continuous variables can be summarized using mean (SD) or median (IQR), optionally with 95% confidence intervals. Categorical variables are summarized as counts and percentages, optionally with confidence intervals. Summaries can also be generated by a grouping variable, and a narrative interpretation is optionally printed.

# Basic summary of iris dataset
sum_stat(iris, ci = FALSE, report = TRUE)
## Mean Sepal.Length was 5.84 +/-0.83.
## 
## Mean Sepal.Width was 3.06 +/-0.44.
## 
## Mean Petal.Length was 3.76 +/-1.77.
## 
## Mean Petal.Width was 1.2 +/-0.76.
## 
## For Species, all categories were similar in frequency (n=50, 33.33%).

Variable      Characteristic  N = 150*
Sepal.Length  Mean (SD)       5.84 (0.83)
Sepal.Width   Mean (SD)       3.06 (0.44)
Petal.Length  Mean (SD)       3.76 (1.77)
Petal.Width   Mean (SD)       1.2 (0.76)
Species       setosa          50 (33.33%)
              versicolor      50 (33.33%)
              virginica       50 (33.33%)

* Mean (SD)/n(%)

Selecting a subset of variables for sum_stat

names(starwars)
##  [1] "name"       "height"     "mass"       "hair_color" "skin_color"
##  [6] "eye_color"  "birth_year" "sex"        "gender"     "homeworld" 
## [11] "species"    "films"      "vehicles"   "starships"
# Select "height", "mass", "hair_color" and "species" by their column index
sum_stat(starwars[c(2,3,4,11)])
## Mean height was 174.6 +/-34.77.
## 
## Mean mass was 97.31 +/-169.46.
## 
## For hair_color, frequency was highest in none (n=38, 43.68%), followed by brown (n=18, 20.69%), black (n=13, 14.94%), NA (n=5, 5.75%), white (n=4, 4.6%), blond (n=3, 3.45%), auburn (n=1, 1.15%), auburn, grey (n=1, 1.15%), auburn, white (n=1, 1.15%), blonde (n=1, 1.15%), brown, grey (n=1, 1.15%), and lowest in grey (n=1, 1.15%).
## 
## For species, frequency was highest in Human (n=35, 40.23%), followed by Droid (n=6, 6.9%), NA (n=4, 4.6%), Gungan (n=3, 3.45%), Kaminoan (n=2, 2.3%), Mirialan (n=2, 2.3%), Twi'lek (n=2, 2.3%), Wookiee (n=2, 2.3%), Zabrak (n=2, 2.3%), Aleena (n=1, 1.15%), Besalisk (n=1, 1.15%), Cerean (n=1, 1.15%), Chagrian (n=1, 1.15%), Clawdite (n=1, 1.15%), Dug (n=1, 1.15%), Ewok (n=1, 1.15%), Geonosian (n=1, 1.15%), Hutt (n=1, 1.15%), Iktotchi (n=1, 1.15%), Kaleesh (n=1, 1.15%), Kel Dor (n=1, 1.15%), Mon Calamari (n=1, 1.15%), Muun (n=1, 1.15%), Nautolan (n=1, 1.15%), Neimodian (n=1, 1.15%), Pau'an (n=1, 1.15%), Quermian (n=1, 1.15%), Rodian (n=1, 1.15%), Skakoan (n=1, 1.15%), Sullustan (n=1, 1.15%), Tholothian (n=1, 1.15%), Togruta (n=1, 1.15%), Toong (n=1, 1.15%), Toydarian (n=1, 1.15%), Trandoshan (n=1, 1.15%), Vulptereen (n=1, 1.15%), Xexto (n=1, 1.15%), and lowest in Yoda's species (n=1, 1.15%).

Variable    Characteristic  N = 87*
height      Mean (SD)       174.6 (34.77)
mass        Mean (SD)       97.31 (169.46)
hair_color  auburn          1 (1.15%)
            auburn, grey    1 (1.15%)
            auburn, white   1 (1.15%)
            black           13 (14.94%)
            blond           3 (3.45%)
            blonde          1 (1.15%)
            brown           18 (20.69%)
            brown, grey     1 (1.15%)
            grey            1 (1.15%)
            none            38 (43.68%)
            white           4 (4.6%)
            NA              5 (5.75%)
species     Aleena          1 (1.15%)
            Besalisk        1 (1.15%)
            Cerean          1 (1.15%)
            Chagrian        1 (1.15%)
            Clawdite        1 (1.15%)
            Droid           6 (6.9%)
            Dug             1 (1.15%)
            Ewok            1 (1.15%)
            Geonosian       1 (1.15%)
            Gungan          3 (3.45%)
            Human           35 (40.23%)
            Hutt            1 (1.15%)
            Iktotchi        1 (1.15%)
            Kaleesh         1 (1.15%)
            Kaminoan        2 (2.3%)
            Kel Dor         1 (1.15%)
            Mirialan        2 (2.3%)
            Mon Calamari    1 (1.15%)
            Muun            1 (1.15%)
            Nautolan        1 (1.15%)
            Neimodian       1 (1.15%)
            Pau'an          1 (1.15%)
            Quermian        1 (1.15%)
            Rodian          1 (1.15%)
            Skakoan         1 (1.15%)
            Sullustan       1 (1.15%)
            Tholothian      1 (1.15%)
            Togruta         1 (1.15%)
            Toong           1 (1.15%)
            Toydarian       1 (1.15%)
            Trandoshan      1 (1.15%)
            Twi'lek         2 (2.3%)
            Vulptereen      1 (1.15%)
            Wookiee         2 (2.3%)
            Xexto           1 (1.15%)
            Yoda's species  1 (1.15%)
            Zabrak          2 (2.3%)
            NA              4 (4.6%)

* Mean (SD)/n(%)

Cross-tabulation with sum_stat

Use the by argument to split the summary by a grouping variable.

# Summary of CO2 dataset by 'Treatment' with CI
sum_stat(CO2, by = "Treatment", ci = TRUE, report = TRUE, percent = "row")
## Mean conc was similar between nonchilled (435 ± 297.72 [95% CI: 344.96, 525.04]) and chilled (435 +/- 297.72 [95% CI: 344.96, 525.04]).
## 
## Mean uptake was higher in nonchilled (30.64 ± 9.7 [95% CI: 27.71, 33.58]) than in chilled (23.78 +/- 10.88 [95% CI: 20.49, 27.08]).
## 
## For Plant, all categories were similar in frequency (n=7, 8.33%).
## 
## For Type, all categories were similar in frequency (n=42, 50%).

Variable  Characteristic  nonchilled                     chilled
Plant     Qn1             7 (100% [59.04, 100])          0 (0% [0, 40.96])
          Qn2             7 (100% [59.04, 100])          0 (0% [0, 40.96])
          Qn3             7 (100% [59.04, 100])          0 (0% [0, 40.96])
          Qc1             0 (0% [0, 40.96])              7 (100% [59.04, 100])
          Qc3             0 (0% [0, 40.96])              7 (100% [59.04, 100])
          Qc2             0 (0% [0, 40.96])              7 (100% [59.04, 100])
          Mn3             7 (100% [59.04, 100])          0 (0% [0, 40.96])
          Mn2             7 (100% [59.04, 100])          0 (0% [0, 40.96])
          Mn1             7 (100% [59.04, 100])          0 (0% [0, 40.96])
          Mc2             0 (0% [0, 40.96])              7 (100% [59.04, 100])
          Mc3             0 (0% [0, 40.96])              7 (100% [59.04, 100])
          Mc1             0 (0% [0, 40.96])              7 (100% [59.04, 100])
Type      Quebec          21 (50% [34.19, 65.81])        21 (50% [34.19, 65.81])
          Mississippi     21 (50% [34.19, 65.81])        21 (50% [34.19, 65.81])
conc      Mean (SD)       435 (297.72) [344.96, 525.04]  435 (297.72) [344.96, 525.04]
uptake    Mean (SD)       30.64 (9.7) [27.71, 33.58]     23.78 (10.88) [20.49, 27.08]

* Mean (SD)/n(%) with 95% CI
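The confidence limits reported for the means are consistent with a normal-approximation interval, mean ± z × SD/√n. The sketch below reproduces the conc row for the nonchilled group under that assumption; ci_mean is a hypothetical helper, and whether sum_stat computes its intervals this way internally is not confirmed.

```r
# Hypothetical helper: normal-approximation confidence interval for a mean
ci_mean <- function(x, level = 0.95) {
  x <- x[!is.na(x)]
  z <- qnorm(1 - (1 - level) / 2)
  m <- mean(x)
  se <- sd(x) / sqrt(length(x))
  c(mean = m, lower = m - z * se, upper = m + z * se)
}

conc_ci <- ci_mean(CO2$conc[CO2$Treatment == "nonchilled"])
round(conc_ci, 2)
```

A t-based interval (qt() instead of qnorm()) would be slightly wider at this sample size.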

sum_stat_p

sum_stat_p generates a descriptive summary table for both continuous and categorical variables, stratified by a grouping variable. It automatically computes appropriate statistical tests (Chi-square, Fisher’s exact, t-test, Wilcoxon, ANOVA, or Kruskal–Wallis) based on variable type, number of groups, and data distribution. Continuous variables can be summarized as mean (SD) or median (IQR), and categorical variables as counts and percentages.

# Summary of iris dataset by Species; test_type = "auto" selects the appropriate test automatically
sum_stat_p(iris, by = "Species", statistic = "mean_sd", test_type = "auto")

Variable      Characteristic  setosa       versicolor   virginica    p-value
Sepal.Length  Mean (SD)       5.01 (0.35)  5.94 (0.52)  6.59 (0.64)  0.00
Sepal.Width   Mean (SD)       3.43 (0.38)  2.77 (0.31)  2.97 (0.32)  0.00
Petal.Length  Mean (SD)       1.46 (0.17)  4.26 (0.47)  5.55 (0.55)  0.00
Petal.Width   Mean (SD)       0.25 (0.11)  1.33 (0.2)   2.03 (0.27)  0.00

1 n (%); Mean (SD)

P-values calculated using: Kruskal-Wallis

# Summary of CO2 dataset by Type with paired t-test
sum_stat_p(CO2, by = "Type", statistic = "mean_sd", test_type = "t.test", paired = TRUE)
## Warning in chisq.test(tbl): Chi-squared approximation may be incorrect

Variable   Characteristic  Quebec        Mississippi   p-value
Plant      Qn1             7 (16.67%)    0 (0%)        0.00
           Qn2             7 (16.67%)    0 (0%)
           Qn3             7 (16.67%)    0 (0%)
           Qc1             7 (16.67%)    0 (0%)
           Qc3             7 (16.67%)    0 (0%)
           Qc2             7 (16.67%)    0 (0%)
           Mn3             0 (0%)        7 (16.67%)
           Mn2             0 (0%)        7 (16.67%)
           Mn1             0 (0%)        7 (16.67%)
           Mc2             0 (0%)        7 (16.67%)
           Mc3             0 (0%)        7 (16.67%)
           Mc1             0 (0%)        7 (16.67%)
Treatment  nonchilled      21 (50%)      21 (50%)      1.00
           chilled         21 (50%)      21 (50%)
conc       Mean (SD)       435 (297.72)  435 (297.72)  NaN
uptake     Mean (SD)       33.54 (9.67)  20.88 (7.82)  0.00

1 n (%); Mean (SD)

P-values calculated using: Chi-square, Paired t-test

# Summary using median and IQR
sum_stat_p(iris, by = "Species", statistic = "med_iqr", test_type = "kruskal")

Variable      Characteristic  setosa           versicolor      virginica         p-value
Sepal.Length  Median (IQR)    5 (4.8, 5.2)     5.9 (5.6, 6.3)  6.5 (6.23, 6.9)   0.00
Sepal.Width   Median (IQR)    3.4 (3.2, 3.68)  2.8 (2.52, 3)   3 (2.8, 3.18)     0.00
Petal.Length  Median (IQR)    1.5 (1.4, 1.58)  4.35 (4, 4.6)   5.55 (5.1, 5.88)  0.00
Petal.Width   Median (IQR)    0.2 (0.2, 0.3)   1.3 (1.2, 1.5)  2 (1.8, 2.3)      0.00

1 n (%); Median (IQR)

P-values calculated using: Kruskal-Wallis
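A p-value shown as 0.00 in these tables is a rounded value, not an exact zero. The underlying global tests can be run directly in base R, as sketched below for Sepal.Length; this illustrates the tests named above and is not the package's internal code.

```r
# Global tests comparing Sepal.Length across the three species
kw <- kruskal.test(Sepal.Length ~ Species, data = iris)
fit <- aov(Sepal.Length ~ Species, data = iris)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]

# format.pval() reports very small values as "<0.001" instead of 0.00
format.pval(c(kruskal = kw$p.value, anova = anova_p), digits = 2, eps = 0.001)
```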

Stratified analysis

sum_stat_p_strata

Produces summary tables for numeric and categorical variables in a dataset, optionally stratified by a grouping variable. Numeric variables are summarized with mean (SD) or median (IQR), and categorical variables with counts and percentages. Appropriate statistical tests (t-test, Wilcoxon, ANOVA, Kruskal-Wallis, Chi-square, or Fisher’s Exact) are performed depending on the variable type, number of groups, and user-specified options.

# Example : Summary of CO2 dataset by Type, stratified by Treatment
sum_stat_p_strata(data = CO2, by = "Type", strata = "Treatment")

Treatment   Variable  Characteristic  Quebec        Mississippi   p-value
nonchilled  Plant     Qn1             7 (33%)       0 (0%)        <0.001
                      Qn2             7 (33%)       0 (0%)
                      Qn3             7 (33%)       0 (0%)
                      Qc1             0 (0%)        0 (0%)
                      Qc3             0 (0%)        0 (0%)
                      Qc2             0 (0%)        0 (0%)
                      Mn3             0 (0%)        7 (33%)
                      Mn2             0 (0%)        7 (33%)
                      Mn1             0 (0%)        7 (33%)
                      Mc2             0 (0%)        0 (0%)
                      Mc3             0 (0%)        0 (0%)
                      Mc1             0 (0%)        0 (0%)
            conc      Mean (SD)       435 (301.42)  435 (301.42)  1.00
            uptake    Mean (SD)       35.33 (9.6)   25.95 (7.4)   0.00
chilled     Plant     Qn1             0 (0%)        0 (0%)        <0.001
                      Qn2             0 (0%)        0 (0%)
                      Qn3             0 (0%)        0 (0%)
                      Qc1             7 (33%)       0 (0%)
                      Qc3             7 (33%)       0 (0%)
                      Qc2             7 (33%)       0 (0%)
                      Mn3             0 (0%)        0 (0%)
                      Mn2             0 (0%)        0 (0%)
                      Mn1             0 (0%)        0 (0%)
                      Mc2             0 (0%)        7 (33%)
                      Mc3             0 (0%)        7 (33%)
                      Mc1             0 (0%)        7 (33%)
            conc      Mean (SD)       435 (301.42)  435 (301.42)  1.00
            uptake    Mean (SD)       31.75 (9.64)  15.81 (4.06)  <0.001

1 n (%); Mean (SD)

Tests used: Plant : Fisher's Exact; conc : Student's t-test; uptake : Student's t-test
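Stratification amounts to repeating the same comparison inside each level of the strata variable. A base R sketch for the uptake rows, assuming an equal-variance (Student's) t-test as named in the footnote, could look like this; it is an illustration rather than the package's implementation.

```r
co2 <- as.data.frame(CO2)  # drop the groupedData classes, keep a plain data frame

# Within each Treatment stratum, compare uptake between the two Types
p_by_stratum <- sapply(split(co2, co2$Treatment), function(d) {
  t.test(uptake ~ Type, data = d, var.equal = TRUE)$p.value
})
format.pval(p_by_stratum, digits = 2, eps = 0.001)
```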

sum_posthoc

Produces a summary table of numeric or categorical variables grouped by a factor, optionally performing global tests (ANOVA or Kruskal-Wallis) and post-hoc comparisons (Tukey or Dunn test). Numeric variables can be summarized using mean (SD) or median (IQR). Returns a flextable suitable for reporting.

sum_posthoc(
  data = iris,
  by = "Species",
  variables = c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")
)

Variable       Characteristic  setosa          versicolor      virginica     p-value  versicolor-setosa  virginica-setosa  virginica-versicolor  setosa - versicolor  setosa - virginica  versicolor - virginica
Sepal.Length*  Mean (SD)       5.01 (0.35)     5.94 (0.52)     6.59 (0.64)   <0.001   0.93 (<0.001)      1.58 (<0.001)     0.65 (<0.001)
Sepal.Width*   Mean (SD)       3.43 (0.38)     2.77 (0.31)     2.97 (0.32)   <0.001   -0.66 (<0.001)     -0.45 (<0.001)    0.2 (<0.001)
Petal.Length*  Mean (SD)       1.46 (0.17)     4.26 (0.47)     5.55 (0.55)   <0.001   2.8 (<0.001)       4.09 (<0.001)     1.29 (<0.001)
Petal.Width**  Median (IQR)    0.2 (0.2, 0.3)  1.3 (1.2, 1.5)  2 (1.8, 2.3)  <0.001                                                              <0.001               <0.001              <0.001

Statistic: Mean (SD) for ANOVA, Median (IQR) for Kruskal-Wallis

Tests used: ANOVA + Tukey, Kruskal-Wallis + Dunn

* ANOVA, ** Kruskal-Wallis

Post-hoc: mean difference (p-value) for pairwise comparisons

# Apply ANOVA with the Tukey post-hoc test
sum_posthoc(
  data = iris,
  by = "Species",
  variables = c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width"),
  test_type = "anova"
)

Variable       Characteristic  setosa       versicolor   virginica    p-value  versicolor-setosa  virginica-setosa  virginica-versicolor
Sepal.Length*  Mean (SD)       5.01 (0.35)  5.94 (0.52)  6.59 (0.64)  <0.001   0.93 (<0.001)      1.58 (<0.001)     0.65 (<0.001)
Sepal.Width*   Mean (SD)       3.43 (0.38)  2.77 (0.31)  2.97 (0.32)  <0.001   -0.66 (<0.001)     -0.45 (<0.001)    0.2 (<0.001)
Petal.Length*  Mean (SD)       1.46 (0.17)  4.26 (0.47)  5.55 (0.55)  <0.001   2.8 (<0.001)       4.09 (<0.001)     1.29 (<0.001)
Petal.Width*   Mean (SD)       0.25 (0.11)  1.33 (0.2)   2.03 (0.27)  <0.001   1.08 (<0.001)      1.78 (<0.001)     0.7 (<0.001)

Statistic: Mean (SD) for ANOVA, Median (IQR) for Kruskal-Wallis

Tests used: ANOVA + Tukey

* ANOVA, ** Kruskal-Wallis

Post-hoc: mean difference (p-value) for pairwise comparisons

# Post-hoc test for one variable

sum_posthoc(
  data = iris,
  by = "Species",
  variables = c("Petal.Width"))

Variable       Characteristic  setosa          versicolor      virginica     p-value  setosa - versicolor  setosa - virginica  versicolor - virginica
Petal.Width**  Median (IQR)    0.2 (0.2, 0.3)  1.3 (1.2, 1.5)  2 (1.8, 2.3)  <0.001   <0.001               <0.001              <0.001

Statistic: Mean (SD) for ANOVA, Median (IQR) for Kruskal-Wallis

Tests used: Kruskal-Wallis + Dunn

* ANOVA, ** Kruskal-Wallis

Post-hoc: mean difference (p-value) for pairwise comparisons
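The ANOVA + Tukey columns can be cross-checked against base R's TukeyHSD(). The sketch below does this for Sepal.Length only; it shows the underlying test rather than how sum_posthoc builds its table.

```r
# One-way ANOVA followed by Tukey's honest significant differences
fit <- aov(Sepal.Length ~ Species, data = iris)
tk <- TukeyHSD(fit)$Species

# Mean difference and adjusted p-value for each pair of species
round(tk[, c("diff", "p adj")], 3)
```

The row names use the same "versicolor-setosa" style as the Tukey columns in the tables above.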

Correlations

sum_cor function

Computes correlations between a reference variable and one or more comparison variables. For Pearson correlations, 95% confidence intervals are also calculated. The analysis can optionally be stratified by a grouping variable. Returns a formatted flextable and optionally prints a narrative summary describing weak, moderate, and strong correlations.

# Example 1: Correlations across entire dataset
sum_cor(
  data = iris,
  ref_var = "Sepal.Length",
  compare_vars = c("Petal.Length", "Petal.Width", "Sepal.Width"),
  method = "pearson",
  digits = 2,
  report = TRUE
)
## Strong correlation was found with Petal.Length (r = 0.87, 95% CI = 0.83,0.91, p = <0.001) and Petal.Width (r = 0.82, 95% CI = 0.76,0.86, p = <0.001). Weak correlation was found with Sepal.Width (r = -0.12, 95% CI = -0.27,0.04, p = 0.15).

Reference Variable: Sepal.Length

Comparison Variable  Correlation  95% CI Lower  95% CI Upper  p-value  Strength
Petal.Length(a)      0.87         0.83          0.91          <0.001   strong
Petal.Width          0.82         0.76          0.86          <0.001   strong
Sepal.Width          -0.12        -0.27         0.04          0.15     weak

(a) Test: pearson
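A single row of this table can be reproduced with base R's cor.test(), which returns the Pearson estimate together with its 95% confidence interval. This is a sketch of the underlying calculation, not the package's code.

```r
# Pearson correlation between the reference variable and one comparison variable
ct <- cor.test(iris$Sepal.Length, iris$Petal.Length, method = "pearson")

round(c(r = unname(ct$estimate),
        lower = ct$conf.int[1],
        upper = ct$conf.int[2]), 2)
```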

Stratified correlations

# Example 2: Correlations by Species
sum_cor(
  data = iris,
  ref_var = "Sepal.Length",
  by = "Species",
  compare_vars = c("Petal.Length", "Petal.Width", "Sepal.Width"),
  method = "pearson",
  digits = 2,
  report = TRUE
)
## For setosa: Strong correlation was found with Sepal.Width (r = 0.74, 95% CI = 0.59,0.85, p = <0.001). Weak correlation was found with Petal.Width (r = 0.28, 95% CI = 0,0.52, p = 0.05) and Petal.Length (r = 0.27, 95% CI = -0.01,0.51, p = 0.06).  
## 
## For versicolor: Strong correlation was found with Petal.Length (r = 0.75, 95% CI = 0.6,0.85, p = <0.001). Moderate correlation was found with Petal.Width (r = 0.55, 95% CI = 0.32,0.72, p = <0.001) and Sepal.Width (r = 0.53, 95% CI = 0.29,0.7, p = <0.001).  
## 
## For virginica: Strong correlation was found with Petal.Length (r = 0.86, 95% CI = 0.77,0.92, p = <0.001). Moderate correlation was found with Sepal.Width (r = 0.46, 95% CI = 0.2,0.65, p = <0.001). Weak correlation was found with Petal.Width (r = 0.28, 95% CI = 0,0.52, p = 0.05).

Reference Variable: Sepal.Length

Species     Comparison Variable  Correlation  95% CI Lower  95% CI Upper  p-value  Strength
setosa(a)   Sepal.Width          0.74         0.59          0.85          <0.001   strong
            Petal.Width          0.28         -0.00         0.52          0.05     weak
            Petal.Length         0.27         -0.01         0.51          0.06     weak
versicolor  Petal.Length         0.75         0.60          0.85          <0.001   strong
            Petal.Width          0.55         0.32          0.72          <0.001   moderate
            Sepal.Width          0.53         0.29          0.70          <0.001   moderate
virginica   Petal.Length         0.86         0.77          0.92          <0.001   strong
            Sepal.Width          0.46         0.20          0.65          <0.001   moderate
            Petal.Width          0.28         0.00          0.52          0.05     weak

(a) Test: pearson

Linear Regression Table with Univariable and Multivariable Analysis

linreg function

Fits univariable and multivariable linear regression models for a continuous outcome, summarizing beta coefficients, 95% confidence intervals, and p-values. Factor predictors include reference levels in the table. Returns a formatted flextable and optionally provides an automatic textual interpretation of results.

# Apply linear regression on iris dataset, with the automatic interpretation
linreg(
  data = iris,
  outcome = "Sepal.Length",
  predictors = c("Sepal.Width", "Petal.Length", "Species"),
  report = TRUE
)
## $table
## a flextable object.
## col_keys: `Predictor`, `Univariable
## Beta (95% CI)`, `Univariable
## p`, `Multivariable
## Beta (95% CI)`, `Multivariable
## p` 
## header has 1 row(s) 
## body has 5 row(s) 
## original dataset sample: 
##      Predictor Univariable\nBeta (95% CI) Univariable\np
## 1  Sepal.Width        -0.22 (-0.53, 0.08)           0.15
## 2 Petal.Length          0.41 (0.37, 0.45)         <0.001
## 3       setosa                  Reference               
## 4   versicolor          0.93 (0.73, 1.13)         <0.001
## 5    virginica          1.58 (1.38, 1.79)         <0.001
##   Multivariable\nBeta (95% CI) Multivariable\np
## 1            0.43 (0.27, 0.59)           <0.001
## 2             0.78 (0.65, 0.9)           <0.001
## 3                    Reference                 
## 4         -0.96 (-1.38, -0.53)           <0.001
## 5         -1.39 (-1.96, -0.83)           <0.001
## 
## $interpretation
## [1] "The multivariable model explained 86.3% of variance in Sepal.Length (Adjusted R^2 = 0.86). Each one-unit increase in Sepal.Width was associated with a statistically significant increase of 0.43 units in Sepal.Length (95% CI 0.27 to 0.59, p = <0.001). Each one-unit increase in Petal.Length was associated with a statistically significant increase of 0.78 units in Sepal.Length (95% CI 0.65 to 0.9, p = <0.001). Each one-unit increase in Speciesversicolor was associated with a statistically significant decrease of 0.96 units in Sepal.Length (95% CI -1.38 to -0.53, p = <0.001). Each one-unit increase in Speciesvirginica was associated with a statistically significant decrease of 1.39 units in Sepal.Length (95% CI -1.96 to -0.83, p = <0.001). "

# Same model, table only (report = FALSE)
linreg(
  data = iris,
  outcome = "Sepal.Length",
  predictors = c("Sepal.Width", "Petal.Length", "Species"),
  report = FALSE
)

Predictor     Univariable Beta (95% CI)  Univariable p  Multivariable Beta (95% CI)  Multivariable p
Sepal.Width   -0.22 (-0.53, 0.08)        0.15           0.43 (0.27, 0.59)            <0.001
Petal.Length  0.41 (0.37, 0.45)          <0.001         0.78 (0.65, 0.9)             <0.001
setosa        Reference                                 Reference
versicolor    0.93 (0.73, 1.13)          <0.001         -0.96 (-1.38, -0.53)         <0.001
virginica     1.58 (1.38, 1.79)          <0.001         -1.39 (-1.96, -0.83)         <0.001

Multivariable model: R^2 = 0.863; Adjusted R^2 = 0.86
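The multivariable column corresponds to an ordinary least-squares fit, which can be checked with base R's lm(). This is a sketch for cross-checking the numbers, not the package's code.

```r
# Multivariable linear model with the same outcome and predictors
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)

# Beta coefficients with 95% confidence intervals
round(cbind(beta = coef(fit), confint(fit)), 2)

# Proportion of variance explained
r2 <- summary(fit)$r.squared
round(r2, 3)
```

The univariable column would come from fitting one model per predictor, for example lm(Sepal.Length ~ Sepal.Width, data = iris).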

Binary Logistic Regression Table with Univariable and Multivariable Analysis

logreg function

Fits univariable and multivariable logistic regression models for a binary outcome, summarizing odds ratios (ORs), 95% confidence intervals, and p-values. Factor predictors include reference levels in the table. Returns a formatted flextable and optionally provides an automatic textual interpretation of results.

logreg(data = medical_data(), outcome = "case",
       predictors = c("age", "parity", "induced"), report = TRUE)
## 
## --- Automatic Interpretation ---
## age showed a non-significant increase in odds of case (OR 1, 95% CI 0.94-1.05; p=0.88). parity showed a non-significant increase in odds of case (OR 0.99, 95% CI 0.75-1.3; p=0.95). Yes showed a non-significant increase in odds of case (OR 1.04, 95% CI 0.55-1.95; p=0.90).

Predictor    Univariable OR (95% CI)  Univariable p  Multivariable OR (95% CI)  Multivariable p
age          1 (0.95-1.05)            0.96           1 (0.94-1.05)              0.88
parity       1.02 (0.82-1.25)         0.89           0.99 (0.75-1.3)            0.95
induced_No   Reference                               Reference
induced_Yes  1.04 (0.56-1.91)         0.89           1.04 (0.55-1.95)           0.90

OR, Odds ratio; Binary logistic regression

# Apply to the trial dataset from gtsummary
df1 <- gtsummary::trial
logreg(data = df1, outcome = "response",
       predictors = c("stage", "grade", "marker"), report = TRUE)
## 
## --- Automatic Interpretation ---
## T2 showed a non-significant increase in odds of response (OR 0.48, 95% CI 0.19-1.19; p=0.12). T3 showed a non-significant increase in odds of response (OR 1.06, 95% CI 0.42-2.67; p=0.89). T4 showed a non-significant increase in odds of response (OR 0.71, 95% CI 0.29-1.7; p=0.44). II showed a non-significant increase in odds of response (OR 1.2, 95% CI 0.54-2.7; p=0.66). III showed a non-significant increase in odds of response (OR 1.15, 95% CI 0.53-2.52; p=0.72). marker showed a non-significant increase in odds of response (OR 1.43, 95% CI 0.98-2.09; p=0.06).

Predictor  Univariable OR (95% CI)  Univariable p  Multivariable OR (95% CI)  Multivariable p
stage_T1   Reference                               Reference
stage_T2   0.63 (0.27-1.46)         0.29           0.48 (0.19-1.19)           0.12
stage_T3   1.13 (0.48-2.68)         0.77           1.06 (0.42-2.67)           0.89
stage_T4   0.83 (0.36-1.92)         0.67           0.71 (0.29-1.7)            0.44
grade_I    Reference                               Reference
grade_II   0.95 (0.45-2)            0.88           1.2 (0.54-2.7)             0.66
grade_III  1.1 (0.52-2.29)          0.81           1.15 (0.53-2.52)           0.72
marker     1.35 (0.94-1.93)         0.10           1.43 (0.98-2.09)           0.06

OR, Odds ratio; Binary logistic regression
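Odds ratios of this kind come from exponentiating the coefficients of a binomial glm(). The sketch below uses R's built-in infert data purely as a stand-in (whether medical_data() returns the same data or coding is not confirmed, so the numbers will not match the tables above) and assumes Wald intervals via confint.default().

```r
# Multivariable logistic regression on the built-in infert data (stand-in)
fit <- glm(case ~ age + parity + induced, data = infert, family = binomial)

# Exponentiate coefficients and Wald limits to get odds ratios with 95% CIs
or_table <- cbind(
  OR = exp(coef(fit)),
  exp(confint.default(fit)),
  p = summary(fit)$coefficients[, "Pr(>|z|)"]
)
round(or_table[-1, ], 2)  # drop the intercept row
```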

Diagnostic Accuracy Metrics with Optional 2x2 Table

diag_accuracy function

Calculates diagnostic accuracy measures (Sensitivity, Specificity, PPV, NPV, Accuracy, LR+, LR-, DOR) from a binary test and gold standard. Provides 95% confidence intervals using Wilson method for proportions and log method for ratios. Optionally, prints a descriptive 2x2 table.

diagnostic_data <- data.frame(
  test = c("positive","negative","positive","
  negative","positive","negative","positive","negative"),
  goldstandard = c("positive","positive","negative",
  "negative","positive","negative","positive","negative")
)
diag_accuracy(diagnostic_data, test_col = "test",
gold_col = "goldstandard",
descriptive = FALSE)

Diagnostic Metric  Estimate (95% CI)
Sensitivity (%)    75 (30.06-95.44)
Specificity (%)    66.67 (20.77-93.85)
PPV (%)            75 (30.06-95.44)
NPV (%)            66.67 (20.77-93.85)
Accuracy (%)       71.43 (35.89-91.78)
LR+                2.25 (0.41-12.28)
LR-                0.38 (0.06-2.45)
DOR                6 (0.48-75.35)

CI formula references: Wilson (1927) for proportions; log method for LR+/- and DOR (Altman, 1991)

diag_accuracy(diagnostic_data, test_col = "test",
gold_col = "goldstandard",
descriptive = TRUE)
## a flextable object.
## col_keys: `Metric`, `Count` 
## header has 2 row(s) 
## body has 4 row(s) 
## original dataset sample: 
##           Metric Count
## 1  True Positive     3
## 2 False Positive     1
## 3 False Negative     1
## 4  True Negative     2

Diagnostic Metric  Estimate (95% CI)
Sensitivity (%)    75 (30.06-95.44)
Specificity (%)    66.67 (20.77-93.85)
PPV (%)            75 (30.06-95.44)
NPV (%)            66.67 (20.77-93.85)
Accuracy (%)       71.43 (35.89-91.78)
LR+                2.25 (0.41-12.28)
LR-                0.38 (0.06-2.45)
DOR                6 (0.48-75.35)

CI formula references: Wilson (1927) for proportions; log method for LR+/- and DOR (Altman, 1991)
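The sensitivity and specificity rows can be reproduced by hand from the 2x2 counts printed above (TP = 3, FP = 1, FN = 1, TN = 2) using the Wilson score interval. The helper wilson_ci is a hypothetical name for this sketch and is not a dentomedical function.

```r
# Hypothetical helper: Wilson score interval for a proportion x / n
wilson_ci <- function(x, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  p <- x / n
  centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  c(estimate = p, lower = centre - half, upper = centre + half)
}

# Counts taken from the descriptive 2x2 table
tp <- 3; fp <- 1; fn <- 1; tn <- 2

sens <- wilson_ci(tp, tp + fn)  # TP / (TP + FN)
spec <- wilson_ci(tn, tn + fp)  # TN / (TN + FP)

# Likelihood ratios and diagnostic odds ratio (point estimates only)
lr_pos <- unname(sens["estimate"] / (1 - spec["estimate"]))
lr_neg <- unname((1 - sens["estimate"]) / spec["estimate"])
dor <- (tp * tn) / (fp * fn)

round(100 * rbind(Sensitivity = sens, Specificity = spec), 2)
round(c(LR_pos = lr_pos, LR_neg = lr_neg, DOR = dor), 2)
```

The log-method intervals for LR+, LR- and DOR are left out of this sketch.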