About dentomedical

Publication-Ready Descriptive, Bivariate, Regression, and Diagnostic Accuracy Tools for Medical and Dental Data

Functions in the package

category Categorize a Numeric Variable into Custom Ranges
impute_missing Impute Missing Values in a Data Frame
medical_data Load Infertility Dataset
recode_data Recode values in a data frame using a lookup table
sum_cor Summarize Correlations Between a Reference Variable and Others
sum_norm Normality Test Summary Table for Numeric Variables
sum_posthoc Summarize Variables with Post-hoc Tests: Multiple Comparisons
sum_stat Summarize Continuous and Categorical Variables with Optional Grouping
sum_stat_p Summarize Continuous and Categorical Variables with Grouping and P-Values
sum_stat_p_strata Summarize Variables with Optional Stratification and Statistical Tests
linreg Linear Regression Table with Univariable and Multivariable Analysis
logreg Binary Logistic Regression Table with Univariable and Multivariable Analysis
diag_accuracy Diagnostic Accuracy Metrics with Optional 2x2 Table

recode_data

This function replaces values in a data frame according to a named lookup vector. All columns are converted to character, and any value that matches a name in lookup is replaced by its corresponding value. This is useful for correcting spelling mistakes or relabeling categories, for example replacing “F” with “Female” in a gender variable.

df <- data.frame(
  gender = c("male", "F", "male", "female"),
  status = c("single", "Married", "oo", "M"),
  stringsAsFactors = FALSE
)

lookup <- c(
  "male" = "Male",
  "M" = "Married",
  "oo" = "Widow",
  "female" = "Female",
  "F" = "Female"
)

df_recode <- recode_data(df, lookup)
print(df_recode)

##   gender  status
## 1   Male  single
## 2 Female Married
## 3   Male   Widow
## 4 Female Married
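The replacement rule described above (convert every column to character, then swap any value that matches a lookup name) can be sketched in a few lines of base R. This is a minimal sketch under that assumption; recode_sketch() is a hypothetical name, not the package's implementation.

```r
# Hypothetical helper, NOT dentomedical's recode_data():
# converts every column to character, then replaces values that match a lookup name.
recode_sketch <- function(df, lookup) {
  df[] <- lapply(df, function(col) {
    col <- as.character(col)
    hit <- col %in% names(lookup)        # positions that have a lookup entry (NA never matches)
    col[hit] <- unname(lookup[col[hit]]) # index the named vector by the original value
    col
  })
  df
}

df <- data.frame(
  gender = c("male", "F", "male", "female"),
  status = c("single", "Married", "oo", "M"),
  stringsAsFactors = FALSE
)
lookup <- c("male" = "Male", "M" = "Married", "oo" = "Widow",
            "female" = "Female", "F" = "Female")

out <- recode_sketch(df, lookup)
print(out)
```

Because the lookup is applied to every column, a key such as "M" would also turn an "M" in gender into "Married". Matching is exact and case-sensitive, so keys should be unique across columns, or columns should be recoded one at a time.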

impute_missing

This function imputes missing values in a data frame. For categorical variables (factor or character), missing values are replaced with the mode (most common category). For numeric variables, missing values can be imputed using the mean, median, or regression-based imputation. If no method is specified for numeric columns, missing values are left as NA. The starwars dataset contains several missing values, which are imputed below.

library(dplyr) # provides the starwars dataset

head(starwars)

## # A tibble: 6 × 14
##   name      height  mass hair_color skin_color eye_color birth_year sex   gender
##   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
## 1 Luke Sky…    172    77 blond      fair       blue            19   male  mascu…
## 2 C-3PO        167    75 <NA>       gold       yellow         112   none  mascu…
## 3 R2-D2         96    32 <NA>       white, bl… red             33   none  mascu…
## 4 Darth Va…    202   136 none       white      yellow          41.9 male  mascu…
## 5 Leia Org…    150    49 brown      light      brown           19   fema… femin…
## 6 Owen Lars    178   120 brown, gr… light      blue            52   male  mascu…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>

# Impute numeric columns by regression; categorical columns use the mode
impute_missing(starwars, method = "regression")

## # A tibble: 87 × 14
##    name     height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>     <dbl> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
##  2 C-3PO       167    75 none       gold       yellow         112   none  mascu…
##  3 R2-D2        96    32 none       white, bl… red             33   none  mascu…
##  4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
##  5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
##  6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
##  7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
##  8 R5-D4        97    32 none       white, red red            254.  none  mascu…
##  9 Biggs D…    183    84 black      light      brown           24   male  mascu…
## 10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
## # ℹ 77 more rows
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>

# Impute numeric columns using mean
impute_missing(starwars, method = "mean")

## # A tibble: 87 × 14
##    name     height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>     <dbl> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
##  2 C-3PO       167    75 none       gold       yellow         112   none  mascu…
##  3 R2-D2        96    32 none       white, bl… red             33   none  mascu…
##  4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
##  5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
##  6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
##  7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
##  8 R5-D4        97    32 none       white, red red             87.6 none  mascu…
##  9 Biggs D…    183    84 black      light      brown           24   male  mascu…
## 10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
## # ℹ 77 more rows
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>

# Impute numeric columns using median
impute_missing(starwars, method = "median")

## # A tibble: 87 × 14
##    name     height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
##  2 C-3PO       167    75 none       gold       yellow         112   none  mascu…
##  3 R2-D2        96    32 none       white, bl… red             33   none  mascu…
##  4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
##  5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
##  6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
##  7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
##  8 R5-D4        97    32 none       white, red red             52   none  mascu…
##  9 Biggs D…    183    84 black      light      brown           24   male  mascu…
## 10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
## # ℹ 77 more rows
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
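The mode, mean and median rules described above can be sketched in base R. This is a minimal sketch, not the package's code: impute_sketch() is a hypothetical name, regression imputation is omitted because its exact model is not documented here, and ties for the mode are broken by taking the first category in sorted order (an assumption).

```r
# Hypothetical sketch of mode / mean / median imputation (regression imputation omitted).
impute_sketch <- function(df, method = c("mean", "median")) {
  method <- match.arg(method)
  mode_of <- function(x) {
    counts <- table(x)                  # table() drops NA by default
    names(counts)[which.max(counts)]    # ties: first category in sorted order wins
  }
  df[] <- lapply(df, function(col) {
    if (!anyNA(col)) return(col)
    if (is.numeric(col)) {
      fill <- if (method == "mean") mean(col, na.rm = TRUE) else median(col, na.rm = TRUE)
      col[is.na(col)] <- fill
    } else if (is.character(col) || is.factor(col)) {
      col[is.na(col)] <- mode_of(col)
    }
    col                                 # other types (e.g. list columns) are left unchanged
  })
  df
}

df <- data.frame(x = c(1, NA, 3, 10), g = c("a", NA, "a", "b"),
                 stringsAsFactors = FALSE)
out_mean <- impute_sketch(df, "mean")
out_med  <- impute_sketch(df, "median")
```

Mean imputation is pulled toward outliers (here 10), which is why the median is often preferred for skewed variables such as starwars mass.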

sum_norm

This function performs the Shapiro-Wilk normality test on all numeric variables in a dataset and returns the results in a publication-ready flextable. Extremely small p-values are displayed as “p < 0.001”. The function automatically detects numeric variables and ignores non-numeric columns.

sum_norm(iris)

Variable	W Statistic	p.value	Distribution
Sepal.Length	0.976	0.0102	Skewed
Sepal.Width	0.985	0.1012	Normal
Petal.Length	0.876	< 0.001	Skewed
Petal.Width	0.902	< 0.001	Skewed
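A table like the one above can be approximated with base R's shapiro.test() applied to each numeric column. This is a minimal sketch, assuming a 0.05 cut-off and reusing the table's "Normal"/"Skewed" labels; norm_sketch() is a hypothetical name, and the package's formatting (flextable) is not reproduced.

```r
# Hypothetical sketch: Shapiro-Wilk test on each numeric column (non-numeric columns skipped).
norm_sketch <- function(data, alpha = 0.05) {
  num <- data[vapply(data, is.numeric, logical(1))]
  rows <- lapply(names(num), function(v) {
    st <- shapiro.test(num[[v]])        # requires 3 to 5000 non-missing values
    data.frame(
      Variable = v,
      W = round(unname(st$statistic), 3),
      p = st$p.value,
      Distribution = if (st$p.value < alpha) "Skewed" else "Normal"
    )
  })
  out <- do.call(rbind, rows)
  out$p.value <- ifelse(out$p < 0.001, "< 0.001", sprintf("%.4f", out$p))
  out$p <- NULL
  out
}

res <- norm_sketch(iris)
print(res)
```

A significant Shapiro-Wilk result indicates a departure from normality, which is not necessarily skewness, and with large samples even small departures become significant.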

sum_stat

sum_stat provides a summary of both continuous and categorical variables in a dataset. Continuous variables can be summarized using mean (SD) or median (IQR), optionally with 95% confidence intervals. Categorical variables are summarized as counts and percentages, optionally with confidence intervals. Summaries can also be generated by a grouping variable, and a narrative interpretation is optionally printed.

# Basic summary of iris dataset
sum_stat(iris, ci = FALSE, report = TRUE)

## Mean Sepal.Length was 5.84 +/-0.83.
## 
## Mean Sepal.Width was 3.06 +/-0.44.
## 
## Mean Petal.Length was 3.76 +/-1.77.
## 
## Mean Petal.Width was 1.2 +/-0.76.
## 
## For Species, all categories were similar in frequency (n=50, 33.33%).

Variable	Characteristic	N = 150*
Sepal.Length	Mean (SD)	5.84 (0.83)
Sepal.Width	Mean (SD)	3.06 (0.44)
Petal.Length	Mean (SD)	3.76 (1.77)
Petal.Width	Mean (SD)	1.2 (0.76)
Species	setosa	50 (33.33%)
	versicolor	50 (33.33%)
	virginica	50 (33.33%)
* Mean (SD)/n(%)

Summarizing a subset of variables with sum_stat

names(starwars)

##  [1] "name"       "height"     "mass"       "hair_color" "skin_color"
##  [6] "eye_color"  "birth_year" "sex"        "gender"     "homeworld" 
## [11] "species"    "films"      "vehicles"   "starships"

# Select height, mass, hair_color and species by their column positions
sum_stat(starwars[c(2,3,4,11)])

## Mean height was 174.6 +/-34.77.
## 
## Mean mass was 97.31 +/-169.46.
## 
## For hair_color, frequency was highest in none (n=38, 43.68%), followed by brown (n=18, 20.69%), black (n=13, 14.94%), NA (n=5, 5.75%), white (n=4, 4.6%), blond (n=3, 3.45%), auburn (n=1, 1.15%), auburn, grey (n=1, 1.15%), auburn, white (n=1, 1.15%), blonde (n=1, 1.15%), brown, grey (n=1, 1.15%), and lowest in grey (n=1, 1.15%).
## 
## For species, frequency was highest in Human (n=35, 40.23%), followed by Droid (n=6, 6.9%), NA (n=4, 4.6%), Gungan (n=3, 3.45%), Kaminoan (n=2, 2.3%), Mirialan (n=2, 2.3%), Twi'lek (n=2, 2.3%), Wookiee (n=2, 2.3%), Zabrak (n=2, 2.3%), Aleena (n=1, 1.15%), Besalisk (n=1, 1.15%), Cerean (n=1, 1.15%), Chagrian (n=1, 1.15%), Clawdite (n=1, 1.15%), Dug (n=1, 1.15%), Ewok (n=1, 1.15%), Geonosian (n=1, 1.15%), Hutt (n=1, 1.15%), Iktotchi (n=1, 1.15%), Kaleesh (n=1, 1.15%), Kel Dor (n=1, 1.15%), Mon Calamari (n=1, 1.15%), Muun (n=1, 1.15%), Nautolan (n=1, 1.15%), Neimodian (n=1, 1.15%), Pau'an (n=1, 1.15%), Quermian (n=1, 1.15%), Rodian (n=1, 1.15%), Skakoan (n=1, 1.15%), Sullustan (n=1, 1.15%), Tholothian (n=1, 1.15%), Togruta (n=1, 1.15%), Toong (n=1, 1.15%), Toydarian (n=1, 1.15%), Trandoshan (n=1, 1.15%), Vulptereen (n=1, 1.15%), Xexto (n=1, 1.15%), and lowest in Yoda's species (n=1, 1.15%).

Variable	Characteristic	N = 87*
height	Mean (SD)	174.6 (34.77)
mass	Mean (SD)	97.31 (169.46)
hair_color	auburn	1 (1.15%)
	auburn, grey	1 (1.15%)
	auburn, white	1 (1.15%)
	black	13 (14.94%)
	blond	3 (3.45%)
	blonde	1 (1.15%)
	brown	18 (20.69%)
	brown, grey	1 (1.15%)
	grey	1 (1.15%)
	none	38 (43.68%)
	white	4 (4.6%)
	NA	5 (5.75%)
species	Aleena	1 (1.15%)
	Besalisk	1 (1.15%)
	Cerean	1 (1.15%)
	Chagrian	1 (1.15%)
	Clawdite	1 (1.15%)
	Droid	6 (6.9%)
	Dug	1 (1.15%)
	Ewok	1 (1.15%)
	Geonosian	1 (1.15%)
	Gungan	3 (3.45%)
	Human	35 (40.23%)
	Hutt	1 (1.15%)
	Iktotchi	1 (1.15%)
	Kaleesh	1 (1.15%)
	Kaminoan	2 (2.3%)
	Kel Dor	1 (1.15%)
	Mirialan	2 (2.3%)
	Mon Calamari	1 (1.15%)
	Muun	1 (1.15%)
	Nautolan	1 (1.15%)
	Neimodian	1 (1.15%)
	Pau'an	1 (1.15%)
	Quermian	1 (1.15%)
	Rodian	1 (1.15%)
	Skakoan	1 (1.15%)
	Sullustan	1 (1.15%)
	Tholothian	1 (1.15%)
	Togruta	1 (1.15%)
	Toong	1 (1.15%)
	Toydarian	1 (1.15%)
	Trandoshan	1 (1.15%)
	Twi'lek	2 (2.3%)
	Vulptereen	1 (1.15%)
	Wookiee	2 (2.3%)
	Xexto	1 (1.15%)
	Yoda's species	1 (1.15%)
	Zabrak	2 (2.3%)
	NA	4 (4.6%)
* Mean (SD)/n(%)

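Cross-tabulation with sum_stat

Use the by argument (for example by = "Treatment") to summarize each variable within the levels of a grouping variable.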

# Summary of CO2 dataset by 'Treatment' with CI
sum_stat(CO2, by = "Treatment", ci = TRUE, report = TRUE, percent = "row")

## Mean conc was similar between nonchilled (435 ± 297.72 [95% CI: 344.96, 525.04]) and chilled (435 +/- 297.72 [95% CI: 344.96, 525.04]).
## 
## Mean uptake was higher in nonchilled (30.64 ± 9.7 [95% CI: 27.71, 33.58]) than in chilled (23.78 +/- 10.88 [95% CI: 20.49, 27.08]).
## 
## For Plant, all categories were similar in frequency (n=7, 8.33%).
## 
## For Type, all categories were similar in frequency (n=42, 50%).

Variable	Characteristic	nonchilled	chilled
Plant	Qn1	7 (100% [59.04, 100])	0 (0% [0, 40.96])
	Qn2	7 (100% [59.04, 100])	0 (0% [0, 40.96])
	Qn3	7 (100% [59.04, 100])	0 (0% [0, 40.96])
	Qc1	0 (0% [0, 40.96])	7 (100% [59.04, 100])
	Qc3	0 (0% [0, 40.96])	7 (100% [59.04, 100])
	Qc2	0 (0% [0, 40.96])	7 (100% [59.04, 100])
	Mn3	7 (100% [59.04, 100])	0 (0% [0, 40.96])
	Mn2	7 (100% [59.04, 100])	0 (0% [0, 40.96])
	Mn1	7 (100% [59.04, 100])	0 (0% [0, 40.96])
	Mc2	0 (0% [0, 40.96])	7 (100% [59.04, 100])
	Mc3	0 (0% [0, 40.96])	7 (100% [59.04, 100])
	Mc1	0 (0% [0, 40.96])	7 (100% [59.04, 100])
Type	Quebec	21 (50% [34.19, 65.81])	21 (50% [34.19, 65.81])
	Mississippi	21 (50% [34.19, 65.81])	21 (50% [34.19, 65.81])
conc	Mean (SD)	435 (297.72) [344.96, 525.04]	435 (297.72) [344.96, 525.04]
uptake	Mean (SD)	30.64 (9.7) [27.71, 33.58]	23.78 (10.88) [20.49, 27.08]
* Mean (SD)/n(%) with 95% CI
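The intervals in the CO2 table appear consistent with a normal-approximation (z) interval for means and an exact Clopper-Pearson interval for proportions. The sketch below reproduces the conc interval and the 7/7 interval under that assumption; ci_mean() and ci_prop() are hypothetical helpers, and the package may compute its intervals differently.

```r
# Hypothetical helpers (assumed methods, not the package's code).
ci_mean <- function(x, level = 0.95) {
  x <- x[!is.na(x)]
  z <- qnorm(1 - (1 - level) / 2)       # normal quantile, about 1.96 for 95%
  se <- sd(x) / sqrt(length(x))
  mean(x) + c(-1, 1) * z * se
}
ci_prop <- function(k, n, level = 0.95) {
  # Exact (Clopper-Pearson) binomial interval, expressed in percent
  100 * binom.test(k, n, conf.level = level)$conf.int[1:2]
}

conc_nonchilled <- CO2$conc[CO2$Treatment == "nonchilled"]
round(ci_mean(conc_nonchilled), 2)
round(ci_prop(7, 7), 2)
round(ci_prop(21, 42), 2)
```

A t-based interval (qt() instead of qnorm()) would be slightly wider for small groups; the z version is shown only because it matches the conc row.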

sum_stat_p

sum_stat_p generates a descriptive summary table for both continuous and categorical variables, stratified by a grouping variable. It automatically computes appropriate statistical tests (Chi-square, Fisher’s exact, t-test, Wilcoxon, ANOVA, or Kruskal–Wallis) based on variable type, number of groups, and data distribution. Continuous variables can be summarized as mean (SD) or median (IQR), and categorical variables as counts and percentages.

# Summary of iris by Species; test_type = "auto" selects the test automatically
sum_stat_p(iris, by = "Species", statistic = "mean_sd", test_type = "auto")

Variable	Characteristic	setosa	versicolor	virginica	p-value
Sepal.Length	Mean (SD)	5.01 (0.35)	5.94 (0.52)	6.59 (0.64)	0.00
Sepal.Width	Mean (SD)	3.43 (0.38)	2.77 (0.31)	2.97 (0.32)	0.00
Petal.Length	Mean (SD)	1.46 (0.17)	4.26 (0.47)	5.55 (0.55)	0.00
Petal.Width	Mean (SD)	0.25 (0.11)	1.33 (0.2)	2.03 (0.27)	0.00
¹ n (%); Mean (SD)
P-values calculated using: Kruskal-Wallis

# Summary of CO2 dataset by Type with paired t-test
sum_stat_p(CO2, by = "Type", statistic = "mean_sd", test_type = "t.test", paired = TRUE)

## Warning in chisq.test(tbl): Chi-squared approximation may be incorrect

Variable	Characteristic	Quebec	Mississippi	p-value
Plant	Qn1	7 (16.67%)	0 (0%)	0.00
	Qn2	7 (16.67%)	0 (0%)
	Qn3	7 (16.67%)	0 (0%)
	Qc1	7 (16.67%)	0 (0%)
	Qc3	7 (16.67%)	0 (0%)
	Qc2	7 (16.67%)	0 (0%)
	Mn3	0 (0%)	7 (16.67%)
	Mn2	0 (0%)	7 (16.67%)
	Mn1	0 (0%)	7 (16.67%)
	Mc2	0 (0%)	7 (16.67%)
	Mc3	0 (0%)	7 (16.67%)
	Mc1	0 (0%)	7 (16.67%)
Treatment	nonchilled	21 (50%)	21 (50%)	1.00
	chilled	21 (50%)	21 (50%)
conc	Mean (SD)	435 (297.72)	435 (297.72)	NaN
uptake	Mean (SD)	33.54 (9.67)	20.88 (7.82)	0.00
¹ n (%); Mean (SD)
P-values calculated using: Chi-square, Paired t-test

# Summary using median and IQR
sum_stat_p(iris, by = "Species", statistic = "med_iqr", test_type = "kruskal")

Variable	Characteristic	setosa	versicolor	virginica	p-value
Sepal.Length	Median (IQR)	5 (4.8, 5.2)	5.9 (5.6, 6.3)	6.5 (6.23, 6.9)	0.00
Sepal.Width	Median (IQR)	3.4 (3.2, 3.68)	2.8 (2.52, 3)	3 (2.8, 3.18)	0.00
Petal.Length	Median (IQR)	1.5 (1.4, 1.58)	4.35 (4, 4.6)	5.55 (5.1, 5.88)	0.00
Petal.Width	Median (IQR)	0.2 (0.2, 0.3)	1.3 (1.2, 1.5)	2 (1.8, 2.3)	0.00
¹ n (%); Median (IQR)
P-values calculated using: Kruskal-Wallis
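The automatic choice of test can be illustrated with one plausible rule: numeric variables use Shapiro-Wilk within each group to pick a parametric or rank-based test, and categorical variables use Fisher's exact test when any expected cell count is below 5. This rule is an assumption for illustration only; choose_test() is hypothetical and may not match the package (for example, the iris output above reports Kruskal-Wallis for every variable).

```r
# Hypothetical decision rule (an assumption, not the package's documented logic).
choose_test <- function(x, group, alpha = 0.05) {
  group <- droplevels(as.factor(group))
  k <- nlevels(group)
  if (is.numeric(x)) {
    # Groups that cannot be tested (fewer than 3 values, more than 5000,
    # or all values identical) are treated as non-normal in this sketch.
    is_normal <- function(v) {
      v <- v[!is.na(v)]
      n <- length(v)
      n >= 3 && n <= 5000 && length(unique(v)) > 1 &&
        shapiro.test(v)$p.value >= alpha
    }
    normal <- all(vapply(split(x, group), is_normal, logical(1)))
    if (k == 2) return(if (normal) "t-test" else "Wilcoxon")
    return(if (normal) "ANOVA" else "Kruskal-Wallis")
  }
  # Categorical: Fisher's exact test when any expected count is below 5
  tbl <- table(as.character(x), group)
  expected <- suppressWarnings(chisq.test(tbl)$expected)
  if (any(expected < 5)) "Fisher's exact" else "Chi-square"
}

choose_test(CO2$Type, CO2$Treatment)
choose_test(iris$Sepal.Length, iris$Species)
```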

Stratified analysis

sum_stat_p_strata

Produces summary tables for numeric and categorical variables in a dataset, optionally stratified by a grouping variable. Numeric variables are summarized with mean (SD) or median (IQR), and categorical variables with counts and percentages. Appropriate statistical tests (t-test, Wilcoxon, ANOVA, Kruskal-Wallis, Chi-square, or Fisher’s Exact) are performed depending on the variable type, number of groups, and user-specified options.

# Example : Summary of CO2 dataset by Type, stratified by Treatment
sum_stat_p_strata(data = CO2, by = "Type", strata = "Treatment")

Treatment	Variable	Characteristic	Quebec	Mississippi	p-value
nonchilled	Plant	Qn1	7 (33%)	0 (0%)	<0.001
		Qn2	7 (33%)	0 (0%)
		Qn3	7 (33%)	0 (0%)
		Qc1	0 (0%)	0 (0%)
		Qc3	0 (0%)	0 (0%)
		Qc2	0 (0%)	0 (0%)
		Mn3	0 (0%)	7 (33%)
		Mn2	0 (0%)	7 (33%)
		Mn1	0 (0%)	7 (33%)
		Mc2	0 (0%)	0 (0%)
		Mc3	0 (0%)	0 (0%)
		Mc1	0 (0%)	0 (0%)
	conc	Mean (SD)	435 (301.42)	435 (301.42)	1.00
	uptake	Mean (SD)	35.33 (9.6)	25.95 (7.4)	0.00
chilled	Plant	Qn1	0 (0%)	0 (0%)	<0.001
		Qn2	0 (0%)	0 (0%)
		Qn3	0 (0%)	0 (0%)
		Qc1	7 (33%)	0 (0%)
		Qc3	7 (33%)	0 (0%)
		Qc2	7 (33%)	0 (0%)
		Mn3	0 (0%)	0 (0%)
		Mn2	0 (0%)	0 (0%)
		Mn1	0 (0%)	0 (0%)
		Mc2	0 (0%)	7 (33%)
		Mc3	0 (0%)	7 (33%)
		Mc1	0 (0%)	7 (33%)
	conc	Mean (SD)	435 (301.42)	435 (301.42)	1.00
	uptake	Mean (SD)	31.75 (9.64)	15.81 (4.06)	<0.001
¹ n (%); Mean (SD)
Tests used: Plant : Fisher's Exact; conc : Student's t-test; uptake : Student's t-test
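The per-stratum p-values for the numeric rows can be reproduced in spirit by splitting the data by the stratum and running a Student's t-test (equal variances) inside each piece. This is a minimal sketch; strata_t() is a hypothetical helper that handles only the two-group numeric case, not the package's full logic.

```r
# Hypothetical sketch: Student's t-test of `var` between the two levels of `by`,
# run separately within each level of `strata`.
strata_t <- function(data, var, by, strata) {
  vapply(split(data, data[[strata]]), function(d) {
    groups <- split(d[[var]], droplevels(as.factor(d[[by]])))
    stopifnot(length(groups) == 2)      # Student's t-test compares exactly two groups
    t.test(groups[[1]], groups[[2]], var.equal = TRUE)$p.value
  }, numeric(1))
}

p_uptake <- strata_t(CO2, var = "uptake", by = "Type", strata = "Treatment")
p_conc   <- strata_t(CO2, var = "conc",   by = "Type", strata = "Treatment")
round(p_uptake, 4)
p_conc
```

conc gives p = 1 in both strata because both Types were measured at identical concentrations, so the mean difference is exactly zero.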

sum_posthoc

Produces a summary table of numeric or categorical variables grouped by a factor, optionally performing global tests (ANOVA or Kruskal-Wallis) and post-hoc comparisons (Tukey or Dunn test). Numeric variables can be summarized using mean (SD) or median (IQR). Returns a flextable suitable for reporting.

sum_posthoc(
  data = iris,
  by = "Species",
  variables = c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")
)

Variable	Characteristic	setosa	versicolor	virginica	p-value	versicolor-setosa	virginica-setosa	virginica-versicolor	setosa - versicolor	setosa - virginica	versicolor - virginica
Sepal.Length*	Mean (SD)	5.01 (0.35)	5.94 (0.52)	6.59 (0.64)	<0.001	0.93 (<0.001)	1.58 (<0.001)	0.65 (<0.001)
Sepal.Width*	Mean (SD)	3.43 (0.38)	2.77 (0.31)	2.97 (0.32)	<0.001	-0.66 (<0.001)	-0.45 (<0.001)	0.2 (<0.001)
Petal.Length*	Mean (SD)	1.46 (0.17)	4.26 (0.47)	5.55 (0.55)	<0.001	2.8 (<0.001)	4.09 (<0.001)	1.29 (<0.001)
Petal.Width**	Median (IQR)	0.2 (0.2, 0.3)	1.3 (1.2, 1.5)	2 (1.8, 2.3)	<0.001				<0.001	<0.001	<0.001
Statistic: Mean (SD) for ANOVA, Median (IQR) for Kruskal-Wallis
Tests used: ANOVA + Tukey, Kruskal-Wallis + Dunn
* ANOVA, ** Kruskal-Wallis
Post-hoc: mean difference (p-value) for pairwise comparisons

# Force ANOVA with Tukey post-hoc comparisons for all variables
sum_posthoc(
  data = iris,
  by = "Species",
  variables = c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width"),
  test_type = "anova"
)

Variable	Characteristic	setosa	versicolor	virginica	p-value	versicolor-setosa	virginica-setosa	virginica-versicolor
Sepal.Length*	Mean (SD)	5.01 (0.35)	5.94 (0.52)	6.59 (0.64)	<0.001	0.93 (<0.001)	1.58 (<0.001)	0.65 (<0.001)
Sepal.Width*	Mean (SD)	3.43 (0.38)	2.77 (0.31)	2.97 (0.32)	<0.001	-0.66 (<0.001)	-0.45 (<0.001)	0.2 (<0.001)
Petal.Length*	Mean (SD)	1.46 (0.17)	4.26 (0.47)	5.55 (0.55)	<0.001	2.8 (<0.001)	4.09 (<0.001)	1.29 (<0.001)
Petal.Width*	Mean (SD)	0.25 (0.11)	1.33 (0.2)	2.03 (0.27)	<0.001	1.08 (<0.001)	1.78 (<0.001)	0.7 (<0.001)
Statistic: Mean (SD) for ANOVA, Median (IQR) for Kruskal-Wallis
Tests used: ANOVA + Tukey
* ANOVA, ** Kruskal-Wallis
Post-hoc: mean difference (p-value) for pairwise comparisons

# Post-hoc comparisons for a single variable

sum_posthoc(
  data = iris,
  by = "Species",
  variables = c("Petal.Width"))

Variable	Characteristic	setosa	versicolor	virginica	p-value	setosa - versicolor	setosa - virginica	versicolor - virginica
Petal.Width**	Median (IQR)	0.2 (0.2, 0.3)	1.3 (1.2, 1.5)	2 (1.8, 2.3)	<0.001	<0.001	<0.001	<0.001
Statistic: Mean (SD) for ANOVA, Median (IQR) for Kruskal-Wallis
Tests used: Kruskal-Wallis + Dunn
* ANOVA, ** Kruskal-Wallis
Post-hoc: mean difference (p-value) for pairwise comparisons
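The ANOVA + Tukey rows correspond to the standard base-R workflow of aov() followed by TukeyHSD(). The sketch below shows that workflow for Sepal.Length; it is an illustration, not the package's internals. Dunn's test (used for the Kruskal-Wallis rows) is not part of base R, so it is not shown.

```r
# Base-R sketch: one-way ANOVA followed by Tukey's honest significant differences.
fit <- aov(Sepal.Length ~ Species, data = iris)
summary(fit)                            # global F-test
tk <- TukeyHSD(fit)$Species             # matrix with columns diff, lwr, upr, p adj
round(tk[, c("diff", "p adj")], 3)
```

The diff column matches the mean differences in the table (for example versicolor-setosa 0.93), and p adj is already corrected for the three pairwise comparisons.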

Correlations

sum_cor function

Computes correlations between a reference variable and one or more comparison variables. For Pearson correlations, 95% confidence intervals are also calculated. The analysis can optionally be stratified by a grouping variable. Returns a formatted flextable and can optionally print a narrative summary describing weak, moderate, and strong correlations.

# Example 1: Correlations across entire dataset
sum_cor(
  data = iris,
  ref_var = "Sepal.Length",
  compare_vars = c("Petal.Length", "Petal.Width", "Sepal.Width"),
  method = "pearson",
  digits = 2,
  report = TRUE
)

## Strong correlation was found with Petal.Length (r = 0.87, 95% CI = 0.83,0.91, p = <0.001) and Petal.Width (r = 0.82, 95% CI = 0.76,0.86, p = <0.001). Weak correlation was found with Sepal.Width (r = -0.12, 95% CI = -0.27,0.04, p = 0.15).

Reference Variable: Sepal.Length
Comparison Variable	Correlation	95% CI Lower	95% CI Upper	p-value	Strength
Petal.Lengthᵃ	0.87	0.83	0.91	<0.001	strong
Petal.Width	0.82	0.76	0.86	<0.001	strong
Sepal.Width	-0.12	-0.27	0.04	0.15	weak
ᵃ Test: pearson

Stratified correlations

# Example 2: Correlations by Species
sum_cor(
  data = iris,
  ref_var = "Sepal.Length",
  by = "Species",
  compare_vars = c("Petal.Length", "Petal.Width", "Sepal.Width"),
  method = "pearson",
  digits = 2,
  report = TRUE
)

## For setosa: Strong correlation was found with Sepal.Width (r = 0.74, 95% CI = 0.59,0.85, p = <0.001). Weak correlation was found with Petal.Width (r = 0.28, 95% CI = 0,0.52, p = 0.05) and Petal.Length (r = 0.27, 95% CI = -0.01,0.51, p = 0.06).  
## 
## For versicolor: Strong correlation was found with Petal.Length (r = 0.75, 95% CI = 0.6,0.85, p = <0.001). Moderate correlation was found with Petal.Width (r = 0.55, 95% CI = 0.32,0.72, p = <0.001) and Sepal.Width (r = 0.53, 95% CI = 0.29,0.7, p = <0.001).  
## 
## For virginica: Strong correlation was found with Petal.Length (r = 0.86, 95% CI = 0.77,0.92, p = <0.001). Moderate correlation was found with Sepal.Width (r = 0.46, 95% CI = 0.2,0.65, p = <0.001). Weak correlation was found with Petal.Width (r = 0.28, 95% CI = 0,0.52, p = 0.05).

Reference Variable: Sepal.Length
Species	Comparison Variable	Correlation	95% CI Lower	95% CI Upper	p-value	Strength
setosaᵃ	Sepal.Width	0.74	0.59	0.85	<0.001	strong
	Petal.Width	0.28	-0.00	0.52	0.05	weak
	Petal.Length	0.27	-0.01	0.51	0.06	weak
versicolor	Petal.Length	0.75	0.60	0.85	<0.001	strong
	Petal.Width	0.55	0.32	0.72	<0.001	moderate
	Sepal.Width	0.53	0.29	0.70	<0.001	moderate
virginica	Petal.Length	0.86	0.77	0.92	<0.001	strong
	Sepal.Width	0.46	0.20	0.65	<0.001	moderate
	Petal.Width	0.28	0.00	0.52	0.05	weak
ᵃ Test: pearson
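Pearson estimates, confidence intervals and p-values like those in both tables can be obtained from base R's cor.test(), overall or within each level of a grouping variable via split(). The weak/moderate/strong cut-offs below (0.3 and 0.7) are an assumption chosen to agree with the labels printed above; the package's own thresholds are not stated in this document, and cor_row() and strength() are hypothetical helpers.

```r
# Hypothetical sketch: Pearson correlations with 95% CIs from cor.test().
strength <- function(r) {
  a <- abs(r)                           # ASSUMED cut-offs: 0.3 and 0.7
  if (a >= 0.7) "strong" else if (a >= 0.3) "moderate" else "weak"
}
cor_row <- function(x, y) {
  ct <- cor.test(x, y, method = "pearson")
  data.frame(r = unname(ct$estimate),
             lower = ct$conf.int[1], upper = ct$conf.int[2],
             p = ct$p.value, strength = strength(ct$estimate))
}

# Whole dataset
overall <- do.call(rbind, lapply(c("Petal.Length", "Petal.Width", "Sepal.Width"),
  function(v) cbind(var = v, cor_row(iris$Sepal.Length, iris[[v]]))))

# Stratified by Species (Sepal.Length vs Sepal.Width)
by_species <- do.call(rbind, lapply(split(iris, iris$Species),
  function(d) cor_row(d$Sepal.Length, d$Sepal.Width)))

overall
by_species
```

The strength of the overall Sepal.Length vs Sepal.Width association (r = -0.12) reverses within species (r = 0.46 to 0.74), a reminder that pooled correlations can hide group structure.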

# Linear regression on iris with report = TRUE (returns the table and an interpretation)
linreg(
  data = iris,
  outcome = "Sepal.Length",
  predictors = c("Sepal.Width", "Petal.Length", "Species"),
  report = TRUE
)

## $table
## a flextable object.
## col_keys: `Predictor`, `Univariable
## Beta (95% CI)`, `Univariable
## p`, `Multivariable
## Beta (95% CI)`, `Multivariable
## p` 
## header has 1 row(s) 
## body has 5 row(s) 
## original dataset sample: 
##      Predictor Univariable\nBeta (95% CI) Univariable\np
## 1  Sepal.Width        -0.22 (-0.53, 0.08)           0.15
## 2 Petal.Length          0.41 (0.37, 0.45)         <0.001
## 3       setosa                  Reference               
## 4   versicolor          0.93 (0.73, 1.13)         <0.001
## 5    virginica          1.58 (1.38, 1.79)         <0.001
##   Multivariable\nBeta (95% CI) Multivariable\np
## 1            0.43 (0.27, 0.59)           <0.001
## 2             0.78 (0.65, 0.9)           <0.001
## 3                    Reference                 
## 4         -0.96 (-1.38, -0.53)           <0.001
## 5         -1.39 (-1.96, -0.83)           <0.001
## 
## $interpretation
## [1] "The multivariable model explained 86.3% of variance in Sepal.Length (Adjusted R^2 = 0.86). Each one-unit increase in Sepal.Width was associated with a statistically significant increase of 0.43 units in Sepal.Length (95% CI 0.27 to 0.59, p = <0.001). Each one-unit increase in Petal.Length was associated with a statistically significant increase of 0.78 units in Sepal.Length (95% CI 0.65 to 0.9, p = <0.001). Each one-unit increase in Speciesversicolor was associated with a statistically significant decrease of 0.96 units in Sepal.Length (95% CI -1.38 to -0.53, p = <0.001). Each one-unit increase in Speciesvirginica was associated with a statistically significant decrease of 1.39 units in Sepal.Length (95% CI -1.96 to -0.83, p = <0.001). "

linreg function

Linear Regression Table with Univariable and Multivariable Analysis

Fits univariable and multivariable linear regression models for a continuous outcome, summarizing beta coefficients, 95% confidence intervals, and p-values. Factor predictors include reference levels in the table. Returns a formatted flextable and optionally provides an automatic textual interpretation of results.

# Linear regression on iris, table only (report = FALSE)
linreg(
  data = iris,
  outcome = "Sepal.Length",
  predictors = c("Sepal.Width", "Petal.Length", "Species"),
  report = FALSE
)

Predictor	Univariable Beta (95% CI)	Univariable p	Multivariable Beta (95% CI)	Multivariable p
Sepal.Width	-0.22 (-0.53, 0.08)	0.15	0.43 (0.27, 0.59)	<0.001
Petal.Length	0.41 (0.37, 0.45)	<0.001	0.78 (0.65, 0.9)	<0.001
setosa	Reference		Reference
versicolor	0.93 (0.73, 1.13)	<0.001	-0.96 (-1.38, -0.53)	<0.001
virginica	1.58 (1.38, 1.79)	<0.001	-1.39 (-1.96, -0.83)	<0.001
Multivariable model: R^2 = 0.863; Adjusted R^2 = 0.86
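The univariable and multivariable columns correspond to separate lm() fits per predictor and one joint fit. The sketch below reproduces the multivariable betas with base R; it illustrates the modelling step only, not the package's table formatting.

```r
# Base-R sketch: univariable and multivariable linear models for Sepal.Length.
predictors <- c("Sepal.Width", "Petal.Length", "Species")
uni <- lapply(predictors, function(p)
  lm(reformulate(p, response = "Sepal.Length"), data = iris))
multi <- lm(reformulate(predictors, response = "Sepal.Length"), data = iris)

# Beta with 95% CI for the multivariable model (intercept dropped)
est <- cbind(beta = coef(multi), confint(multi))[-1, ]
round(est, 2)
summary(multi)$adj.r.squared
```

For a factor such as Species, each coefficient is the adjusted difference from the reference level (setosa), not a per-unit slope. Its sign can differ from the univariable estimate once Petal.Length is in the model.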

Binary Logistic Regression Table with Univariable and Multivariable Analysis

logreg function

Fits univariable and multivariable logistic regression models for a binary outcome, summarizing odds ratios (ORs), 95% confidence intervals, and p-values. Factor predictors include reference levels in the table. Returns a formatted flextable and optionally provides an automatic textual interpretation of results.

logreg(data = medical_data(), outcome = "case",
       predictors = c("age", "parity", "induced"), report = TRUE)

## 
## --- Automatic Interpretation ---
## age showed a non-significant increase in odds of case (OR 1, 95% CI 0.94-1.05; p=0.88). parity showed a non-significant increase in odds of case (OR 0.99, 95% CI 0.75-1.3; p=0.95). Yes showed a non-significant increase in odds of case (OR 1.04, 95% CI 0.55-1.95; p=0.90).

Predictor	Univariable OR (95% CI)	Univariable p	Multivariable OR (95% CI)	Multivariable p
age	1 (0.95-1.05)	0.96	1 (0.94-1.05)	0.88
parity	1.02 (0.82-1.25)	0.89	0.99 (0.75-1.3)	0.95
induced_No	Reference		Reference
induced_Yes	1.04 (0.56-1.91)	0.89	1.04 (0.55-1.95)	0.90
OR, Odds ratio; Binary logistic regression

# Apply to the trial dataset from gtsummary
df1 <- gtsummary::trial
logreg(data = df1, outcome = "response",
       predictors = c("stage", "grade", "marker"), report = TRUE)

## 
## --- Automatic Interpretation ---
## T2 showed a non-significant increase in odds of response (OR 0.48, 95% CI 0.19-1.19; p=0.12). T3 showed a non-significant increase in odds of response (OR 1.06, 95% CI 0.42-2.67; p=0.89). T4 showed a non-significant increase in odds of response (OR 0.71, 95% CI 0.29-1.7; p=0.44). II showed a non-significant increase in odds of response (OR 1.2, 95% CI 0.54-2.7; p=0.66). III showed a non-significant increase in odds of response (OR 1.15, 95% CI 0.53-2.52; p=0.72). marker showed a non-significant increase in odds of response (OR 1.43, 95% CI 0.98-2.09; p=0.06).

Predictor	Univariable OR (95% CI)	Univariable p	Multivariable OR (95% CI)	Multivariable p
stage_T1	Reference		Reference
stage_T2	0.63 (0.27-1.46)	0.29	0.48 (0.19-1.19)	0.12
stage_T3	1.13 (0.48-2.68)	0.77	1.06 (0.42-2.67)	0.89
stage_T4	0.83 (0.36-1.92)	0.67	0.71 (0.29-1.7)	0.44
grade_I	Reference		Reference
grade_II	0.95 (0.45-2)	0.88	1.2 (0.54-2.7)	0.66
grade_III	1.1 (0.52-2.29)	0.81	1.15 (0.53-2.52)	0.72
marker	1.35 (0.94-1.93)	0.10	1.43 (0.98-2.09)	0.06
OR, Odds ratio; Binary logistic regression
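Odds ratios come from exponentiating logistic-regression coefficients. The sketch uses base R's built-in infert data only because it is self-contained. It is not necessarily identical to medical_data(), and its induced variable is a 0/1/2 count rather than No/Yes. It uses Wald intervals (confint.default); whether the package uses Wald or profile-likelihood intervals is not stated here.

```r
# Base-R sketch: odds ratios with Wald 95% CIs from a binomial GLM.
fit <- glm(case ~ age + parity + induced, family = binomial, data = infert)
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))[-1, ]  # drop intercept
round(or_table, 2)
```

An OR above 1 means higher odds of the outcome per unit increase (or versus the reference level); an OR below 1 means lower odds.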

Diagnostic Accuracy Metrics with Optional 2x2 Table

diag_accuracy function

Calculates diagnostic accuracy measures (Sensitivity, Specificity, PPV, NPV, Accuracy, LR+, LR-, DOR) from a binary test and gold standard. Provides 95% confidence intervals using Wilson method for proportions and log method for ratios. Optionally, prints a descriptive 2x2 table.

# 7 paired results (3 TP, 1 FP, 1 FN, 2 TN), matching the output below
diagnostic_data <- data.frame(
  test         = c("positive", "negative", "positive", "positive",
                   "negative", "positive", "negative"),
  goldstandard = c("positive", "positive", "negative", "positive",
                   "negative", "positive", "negative")
)
diag_accuracy(diagnostic_data, test_col = "test",
              gold_col = "goldstandard",
              descriptive = FALSE)

Diagnostic Metric	Estimate (95% CI)
Sensitivity (%)	75 (30.06-95.44)
Specificity (%)	66.67 (20.77-93.85)
PPV (%)	75 (30.06-95.44)
NPV (%)	66.67 (20.77-93.85)
Accuracy (%)	71.43 (35.89-91.78)
LR+	2.25 (0.41-12.28)
LR-	0.38 (0.06-2.45)
DOR	6 (0.48-75.35)
CI formula references: Wilson (1927) for proportions; log method for LR+/- and DOR (Altman, 1991)

diag_accuracy(diagnostic_data, test_col = "test",
gold_col = "goldstandard",
descriptive = TRUE)

## a flextable object.
## col_keys: `Metric`, `Count` 
## header has 2 row(s) 
## body has 4 row(s) 
## original dataset sample: 
##           Metric Count
## 1  True Positive     3
## 2 False Positive     1
## 3 False Negative     1
## 4  True Negative     2

Diagnostic Metric	Estimate (95% CI)
Sensitivity (%)	75 (30.06-95.44)
Specificity (%)	66.67 (20.77-93.85)
PPV (%)	75 (30.06-95.44)
NPV (%)	66.67 (20.77-93.85)
Accuracy (%)	71.43 (35.89-91.78)
LR+	2.25 (0.41-12.28)
LR-	0.38 (0.06-2.45)
DOR	6 (0.48-75.35)
CI formula references: Wilson (1927) for proportions; log method for LR+/- and DOR (Altman, 1991)
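The Wilson interval for proportions and the log method for LR+ named in the footnote can be written out directly from the 2x2 counts (TP = 3, FP = 1, FN = 1, TN = 2). This is a minimal sketch with hypothetical helpers wilson_ci() and lr_pos_ci(); it reproduces the sensitivity, specificity and LR+ rows above. The DOR interval is omitted because its exact formula is not stated in this document.

```r
# Hypothetical helpers (not the package's code).
wilson_ci <- function(k, n, z = qnorm(0.975)) {
  p <- k / n
  centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  c(estimate = p, lower = centre - half, upper = centre + half)
}
lr_pos_ci <- function(tp, fn, fp, tn, z = qnorm(0.975)) {
  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  lr <- sens / (1 - spec)
  # Standard error of log(LR+)
  se_log <- sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
  c(estimate = lr, lower = lr * exp(-z * se_log), upper = lr * exp(z * se_log))
}

tp <- 3; fp <- 1; fn <- 1; tn <- 2
round(100 * wilson_ci(tp, tp + fn), 2)  # sensitivity (%)
round(100 * wilson_ci(tn, tn + fp), 2)  # specificity (%)
round(lr_pos_ci(tp, fn, fp, tn), 2)     # LR+
```

With only seven subjects the intervals are very wide, so these estimates illustrate the calculation rather than support any conclusion about the test.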

dentomedical new version

Umar Hussain

2026-01-08
