The ‘dentomedical’ package provides a comprehensive suite of tools for medical and dental research. It includes automated descriptive statistics, bivariate analysis with intelligent test selection, logistic regression, and diagnostic accuracy assessment. All functions generate structured, publication-ready tables using ‘flextable’, ensuring reproducibility and clarity suitable for manuscripts, reports, and clinical research workflows.
library(dentomedical)
library(ggplot2)
# ggplot2 is loaded only to access the diamonds dataset;
# it is not needed to run dentomedical
norm_sum() automatically selects all numeric variables in a dataset and performs normality testing for each of them. It applies widely used tests, including:
Shapiro–Wilk test
Kolmogorov–Smirnov test (used when n > 5000)
The output is a tidy summary table listing, for each variable, the test statistic, the p-value, and a distribution label:
norm_sum(diamonds)
| Variable | W Statistic | p.value | Distribution |
|---|---|---|---|
| carat | 0.897 | < 0.001 | Skewed |
| depth | 0.964 | < 0.001 | Skewed |
| table | 0.926 | < 0.001 | Skewed |
| price | 0.795 | < 0.001 | Skewed |
| x | 0.955 | < 0.001 | Skewed |
| y | 0.957 | < 0.001 | Skewed |
| z | 0.764 | < 0.001 | Skewed |
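
To make the idea behind norm_sum() concrete, here is a minimal base-R sketch of per-column normality screening. It is not the package's implementation: the helper name `normality_sketch`, the 0.05 cutoff, and the "Normal" label are assumptions made for illustration, and it uses only the Shapiro–Wilk test, which base R restricts to samples of 3 to 5000 observations.

```r
# Hypothetical helper for illustration; not part of dentomedical.
normality_sketch <- function(data, alpha = 0.05) {
  # Keep only the numeric columns
  numeric_cols <- names(data)[vapply(data, is.numeric, logical(1))]
  rows <- lapply(numeric_cols, function(col) {
    res <- shapiro.test(data[[col]])  # base R requires 3 <= n <= 5000
    data.frame(
      Variable = col,
      W = round(unname(res$statistic), 3),
      p.value = res$p.value,
      # Assumed labelling rule: p < alpha is flagged as "Skewed"
      Distribution = if (res$p.value < alpha) "Skewed" else "Normal",
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

normality_result <- normality_sketch(iris)
print(normality_result)
```

The sketch returns one row per numeric column, which mirrors the layout of the table above.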
sum_stat() generates descriptive summary tables for both continuous and categorical variables. Continuous variables can be summarized using mean (SD) or median (IQR), and categorical variables are summarized as counts and percentages. Optionally, summaries can be stratified by a grouping variable.
sum_stat(iris)
| Variable | Characteristic | Value |
|---|---|---|
| Sepal.Length | Mean (SD) | 5.84 (0.83) |
| Sepal.Width | Mean (SD) | 3.06 (0.44) |
| Petal.Length | Mean (SD) | 3.76 (1.77) |
| Petal.Width | Mean (SD) | 1.20 (0.76) |
| Species | setosa | 50 (33.33%) |
| | versicolor | 50 (33.33%) |
| | virginica | 50 (33.33%) |

Mean (SD) / n (%)
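
The three summary formats used in these tables can be reproduced by hand in base R. The helpers below (`fmt_mean_sd`, `fmt_med_iqr`, `fmt_n_pct`) are hypothetical names introduced for this sketch and are not functions exported by the package; the fixed two-decimal formatting is also an assumption.

```r
# Hypothetical helpers for illustration; not part of dentomedical.

# Continuous variable as "mean (SD)"
fmt_mean_sd <- function(x) {
  sprintf("%.2f (%.2f)", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
}

# Continuous variable as "median (Q1, Q3)"
fmt_med_iqr <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  sprintf("%.2f (%.2f, %.2f)", q[2], q[1], q[3])
}

# Categorical variable as "n (%)" per level
fmt_n_pct <- function(x) {
  counts <- table(x)
  pct <- 100 * as.vector(counts) / sum(counts)
  setNames(sprintf("%d (%.2f%%)", as.vector(counts), pct), names(counts))
}

print(fmt_mean_sd(iris$Sepal.Length))
print(fmt_med_iqr(iris$Sepal.Length))
print(fmt_n_pct(iris$Species))
```

For `iris$Sepal.Length` these helpers give the same mean (SD) and median (IQR) values that the sum_stat() tables report.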
# if the data are not normally distributed, use median (IQR)
sum_stat(iris, statistic = "med_iqr")
| Variable | Characteristic | Value |
|---|---|---|
| Sepal.Length | Median (IQR) | 5.80 (5.10, 6.40) |
| Sepal.Width | Median (IQR) | 3.00 (2.80, 3.30) |
| Petal.Length | Median (IQR) | 4.35 (1.60, 5.10) |
| Petal.Width | Median (IQR) | 1.30 (0.30, 1.80) |
| Species | setosa | 50 (33.33%) |
| | versicolor | 50 (33.33%) |
| | virginica | 50 (33.33%) |

Median (IQR) / n (%)
# list the column names to find their index positions
names(diamonds)
## [1] "carat" "cut" "color" "clarity" "depth" "table" "price"
## [8] "x" "y" "z"
# summarize cut, color and table (columns 2, 3 and 6)
sum_stat(diamonds[c(2, 3, 6)])
| Variable | Characteristic | Value |
|---|---|---|
| cut | Fair | 1610 (2.98%) |
| | Good | 4906 (9.1%) |
| | Very Good | 12082 (22.4%) |
| | Premium | 13791 (25.57%) |
| | Ideal | 21551 (39.95%) |
| color | D | 6775 (12.56%) |
| | E | 9797 (18.16%) |
| | F | 9542 (17.69%) |
| | G | 11292 (20.93%) |
| | H | 8304 (15.39%) |
| | I | 5422 (10.05%) |
| | J | 2808 (5.21%) |
| table | Mean (SD) | 57.46 (2.23) |

Mean (SD) / n (%)
sum_stat(diamonds, by = "cut")
| Variable | Characteristic | Fair | Good | Very Good | Premium | Ideal |
|---|---|---|---|---|---|---|
| carat | Mean (SD) | 1.05 (0.52) | 0.85 (0.45) | 0.81 (0.46) | 0.89 (0.52) | 0.70 (0.43) |
| color | D | 163 (10.12%) | 662 (13.49%) | 1513 (12.52%) | 1603 (11.62%) | 2834 (13.15%) |
| | E | 224 (13.91%) | 933 (19.02%) | 2400 (19.86%) | 2337 (16.95%) | 3903 (18.11%) |
| | F | 312 (19.38%) | 909 (18.53%) | 2164 (17.91%) | 2331 (16.9%) | 3826 (17.75%) |
| | G | 314 (19.5%) | 871 (17.75%) | 2299 (19.03%) | 2924 (21.2%) | 4884 (22.66%) |
| | H | 303 (18.82%) | 702 (14.31%) | 1824 (15.1%) | 2360 (17.11%) | 3115 (14.45%) |
| | I | 175 (10.87%) | 522 (10.64%) | 1204 (9.97%) | 1428 (10.35%) | 2093 (9.71%) |
| | J | 119 (7.39%) | 307 (6.26%) | 678 (5.61%) | 808 (5.86%) | 896 (4.16%) |
| clarity | I1 | 210 (13.04%) | 96 (1.96%) | 84 (0.7%) | 205 (1.49%) | 146 (0.68%) |
| | SI2 | 466 (28.94%) | 1081 (22.03%) | 2100 (17.38%) | 2949 (21.38%) | 2598 (12.06%) |
| | SI1 | 408 (25.34%) | 1560 (31.8%) | 3240 (26.82%) | 3575 (25.92%) | 4282 (19.87%) |
| | VS2 | 261 (16.21%) | 978 (19.93%) | 2591 (21.45%) | 3357 (24.34%) | 5071 (23.53%) |
| | VS1 | 170 (10.56%) | 648 (13.21%) | 1775 (14.69%) | 1989 (14.42%) | 3589 (16.65%) |
| | VVS2 | 69 (4.29%) | 286 (5.83%) | 1235 (10.22%) | 870 (6.31%) | 2606 (12.09%) |
| | VVS1 | 17 (1.06%) | 186 (3.79%) | 789 (6.53%) | 616 (4.47%) | 2047 (9.5%) |
| | IF | 9 (0.56%) | 71 (1.45%) | 268 (2.22%) | 230 (1.67%) | 1212 (5.62%) |
| depth | Mean (SD) | 64.04 (3.64) | 62.37 (2.17) | 61.82 (1.38) | 61.26 (1.16) | 61.71 (0.72) |
| table | Mean (SD) | 59.05 (3.95) | 58.69 (2.85) | 57.96 (2.12) | 58.75 (1.48) | 55.95 (1.25) |
| price | Mean (SD) | 4358.76 (3560.39) | 3928.86 (3681.59) | 3981.76 (3935.86) | 4584.26 (4349.20) | 3457.54 (3808.40) |
| x | Mean (SD) | 6.25 (0.96) | 5.84 (1.06) | 5.74 (1.10) | 5.97 (1.19) | 5.51 (1.06) |
| y | Mean (SD) | 6.18 (0.96) | 5.85 (1.05) | 5.77 (1.10) | 5.94 (1.26) | 5.52 (1.07) |
| z | Mean (SD) | 3.98 (0.65) | 3.64 (0.65) | 3.56 (0.73) | 3.65 (0.73) | 3.40 (0.66) |

Mean (SD) / n (%)
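
Stratified summaries like the one above amount to computing each statistic within each group, with categorical percentages taken within each group's column. The base-R sketch below illustrates the idea on the built-in iris and mtcars datasets. It is not the package's code, and the object names are chosen only for this example.

```r
# Base-R illustration of stratified summaries; not the package's implementation.

# Continuous variable: mean (SD) within each group
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)
group_sds <- tapply(iris$Sepal.Length, iris$Species, sd)
by_species <- setNames(
  sprintf("%.2f (%.2f)", group_means, group_sds),
  names(group_means)
)
print(by_species)

# Categorical variable: counts and column percentages within each group
counts <- table(gear = mtcars$gear, cyl = mtcars$cyl)
col_pct <- 100 * prop.table(counts, margin = 2)  # each column sums to 100
print(counts)
print(round(col_pct, 2))
```

Using `margin = 2` matches the tables above, where each percentage is relative to the size of its own group rather than the whole dataset.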
sum_stat_p() generates a descriptive summary table for both categorical and continuous variables stratified by a grouping variable. It automatically computes appropriate statistical tests (Chi-square, Fisher’s exact, t-test, Wilcoxon, ANOVA, or Kruskal–Wallis) based on data type and distribution characteristics. The output is formatted as a flextable with footnotes indicating the summary statistics used and the tests applied.
# Example 1: Auto test selection, median/IQR summary
sum_stat_p(CO2, by = "Type", statistic = "med_iqr")
| Variable | Characteristic | Quebec | Mississippi | p-value |
|---|---|---|---|---|
| Plant | Qn1 | 7 (17%) | 0 (0%) | <0.001 |
| | Qn2 | 7 (17%) | 0 (0%) | |
| | Qn3 | 7 (17%) | 0 (0%) | |
| | Qc1 | 7 (17%) | 0 (0%) | |
| | Qc3 | 7 (17%) | 0 (0%) | |
| | Qc2 | 7 (17%) | 0 (0%) | |
| | Mn3 | 0 (0%) | 7 (17%) | |
| | Mn2 | 0 (0%) | 7 (17%) | |
| | Mn1 | 0 (0%) | 7 (17%) | |
| | Mc2 | 0 (0%) | 7 (17%) | |
| | Mc3 | 0 (0%) | 7 (17%) | |
| | Mc1 | 0 (0%) | 7 (17%) | |
| Treatment | nonchilled | 21 (50%) | 21 (50%) | 1 |
| | chilled | 21 (50%) | 21 (50%) | |
| conc | Median (IQR) | 350 (175, 675) | 350 (175, 675) | 1 |
| uptake | Median (IQR) | 37.15 (30.33, 40.15) | 19.3 (13.87, 28.05) | <0.001 |

¹ n (%); Median (IQR)
P-values calculated using: Fisher's Exact, Chi-square, Student's t-test
# Example 2: Force Wilcoxon test for continuous variables
sum_stat_p(CO2, by = "Type", statistic = "med_iqr", test_type = "wilcox")
## Warning in chisq.test(tbl): Chi-squared approximation may be incorrect
## Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
## compute exact p-value with ties
## Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
## compute exact p-value with ties
| Variable | Characteristic | Quebec | Mississippi | p-value |
|---|---|---|---|---|
| Plant | Qn1 | 7 (17%) | 0 (0%) | <0.001 |
| | Qn2 | 7 (17%) | 0 (0%) | |
| | Qn3 | 7 (17%) | 0 (0%) | |
| | Qc1 | 7 (17%) | 0 (0%) | |
| | Qc3 | 7 (17%) | 0 (0%) | |
| | Qc2 | 7 (17%) | 0 (0%) | |
| | Mn3 | 0 (0%) | 7 (17%) | |
| | Mn2 | 0 (0%) | 7 (17%) | |
| | Mn1 | 0 (0%) | 7 (17%) | |
| | Mc2 | 0 (0%) | 7 (17%) | |
| | Mc3 | 0 (0%) | 7 (17%) | |
| | Mc1 | 0 (0%) | 7 (17%) | |
| Treatment | nonchilled | 21 (50%) | 21 (50%) | 1 |
| | chilled | 21 (50%) | 21 (50%) | |
| conc | Median (IQR) | 350 (175, 675) | 350 (175, 675) | 1 |
| uptake | Median (IQR) | 37.15 (30.33, 40.15) | 19.3 (13.87, 28.05) | <0.001 |

¹ n (%); Median (IQR)
P-values calculated using: Chi-square, Wilcoxon Rank-Sum
# Example 3: Mean/SD with automatic test choice
sum_stat_p(CO2, by = "Treatment", statistic = "mean_sd")
| Variable | Characteristic | nonchilled | chilled | p-value |
|---|---|---|---|---|
| Plant | Qn1 | 7 (17%) | 0 (0%) | <0.001 |
| | Qn2 | 7 (17%) | 0 (0%) | |
| | Qn3 | 7 (17%) | 0 (0%) | |
| | Qc1 | 0 (0%) | 7 (17%) | |
| | Qc3 | 0 (0%) | 7 (17%) | |
| | Qc2 | 0 (0%) | 7 (17%) | |
| | Mn3 | 7 (17%) | 0 (0%) | |
| | Mn2 | 7 (17%) | 0 (0%) | |
| | Mn1 | 7 (17%) | 0 (0%) | |
| | Mc2 | 0 (0%) | 7 (17%) | |
| | Mc3 | 0 (0%) | 7 (17%) | |
| | Mc1 | 0 (0%) | 7 (17%) | |
| Type | Quebec | 21 (50%) | 21 (50%) | 1 |
| | Mississippi | 21 (50%) | 21 (50%) | |
| conc | Mean (SD) | 435 (297.72) | 435 (297.72) | 1 |
| uptake | Mean (SD) | 30.64 (9.7) | 23.78 (10.88) | 0.00311 |

¹ n (%); Mean (SD)
P-values calculated using: Fisher's Exact, Chi-square, Student's t-test
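
The automatic test selection that sum_stat_p() describes can be pictured as a small decision rule. The sketch below is one plausible version, written under stated assumptions: the function name `choose_test`, the expected-count threshold of 5, and the per-group Shapiro–Wilk check at 0.05 are all illustrative choices, and the package's actual rules may differ.

```r
# Hypothetical decision rule for illustration; the package's actual
# thresholds and checks may differ.
choose_test <- function(x, group, alpha = 0.05) {
  group <- droplevels(as.factor(group))

  # Categorical variable: pick between chi-square and Fisher's exact
  if (is.factor(x) || is.character(x)) {
    tbl <- table(x, group)
    expected <- suppressWarnings(chisq.test(tbl)$expected)
    # Assumed rule: any expected cell count below 5 favours Fisher's exact
    if (any(expected < 5)) return("Fisher's exact")
    return("Chi-square")
  }

  # Continuous variable: assumed rule is Shapiro-Wilk within each group
  # (each group needs between 3 and 5000 observations)
  normal <- all(tapply(x, group, function(v) shapiro.test(v)$p.value > alpha))
  if (nlevels(group) == 2) {
    if (normal) "t-test" else "Wilcoxon rank-sum"
  } else {
    if (normal) "ANOVA" else "Kruskal-Wallis"
  }
}

plant_test <- choose_test(CO2$Plant, CO2$Type)          # sparse 12 x 2 table
treatment_test <- choose_test(CO2$Treatment, CO2$Type)  # balanced 2 x 2 table
uptake_test <- choose_test(CO2$uptake, CO2$Type)        # continuous, 2 groups
print(c(Plant = plant_test, Treatment = treatment_test, uptake = uptake_test))
```

Under this rule the sparse Plant table is routed to Fisher's exact test and the balanced Treatment table to the chi-square test, which is consistent with both tests appearing in the footnotes above.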
linreg() performs univariate and multivariate linear regression analyses for the specified predictors and outcome variable, returning a summary table with characteristics, regression coefficients (β) with 95% CIs, and p-values. Numeric variables are summarized as mean (SD); categorical variables as n (%). Multivariate model R^2 and adjusted R^2 are included in the table footer.
# Example using built-in iris dataset
linreg(iris, outcome = "Sepal.Length",
predictors = c("Sepal.Width", "Petal.Length", "Species"))
| Predictor | Level | Characteristics | Univariate beta (95% CI) | Univariate p | Multivariate beta (95% CI) | Multivariate p |
|---|---|---|---|---|---|---|
| Sepal.Width | | 3.06 (0.44) | -0.22 (-0.53, 0.08) | 0.152 | 0.43 (0.27, 0.59) | <0.001 |
| Petal.Length | | 3.76 (1.77) | 0.41 (0.37, 0.45) | <0.001 | 0.78 (0.65, 0.9) | <0.001 |
| Species | setosa | 50 (33.3%) [ref] | | | | |
| | versicolor | 50 (33.3%) | 0.93 (0.73, 1.13) | <0.001 | -0.96 (-1.38, -0.53) | <0.001 |
| | virginica | 50 (33.3%) | 1.58 (1.38, 1.79) | <0.001 | -1.39 (-1.96, -0.83) | <0.001 |

Characteristics: Mean (SD) for numeric; n (%) for categorical. Multivariate model: R^2 = 0.863; Adjusted R^2 = 0.86
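
The difference between the univariate and multivariate columns can be seen by fitting the models directly with base R's `lm()`. This is a sketch of the underlying calculation rather than the package's own code; `coef_ci` is a hypothetical helper written for this example.

```r
# Base-R illustration with lm(); not the package's implementation.

# Univariate model: one predictor at a time
uni_fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Multivariate model: all predictors together
multi_fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)

# Hypothetical helper: coefficient and 95% confidence interval for one term
coef_ci <- function(fit, term) {
  ci <- confint(fit, parm = term, level = 0.95)
  c(beta = unname(coef(fit)[term]), lower = ci[1, 1], upper = ci[1, 2])
}

uni_width <- coef_ci(uni_fit, "Sepal.Width")
multi_width <- coef_ci(multi_fit, "Sepal.Width")
multi_r2 <- summary(multi_fit)$r.squared
multi_adj_r2 <- summary(multi_fit)$adj.r.squared

print(round(rbind(univariate = uni_width, multivariate = multi_width), 2))
print(round(c(R2 = multi_r2, adj_R2 = multi_adj_r2), 3))
```

The sign of the Sepal.Width coefficient flips between the two models, which is why reporting both columns side by side is informative.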
logreg() performs logistic regression for a binary outcome and a set of predictor variables. It computes both univariate and multivariate odds ratios (ORs) with 95% confidence intervals and p-values. Categorical variables automatically include a reference level in the output. Results are returned as a formatted flextable.
## medical_data() is an internal dataset in the package
## Note: make sure all categorical variables are stored as factors
## if they are not, run this code first (assuming your data frame is named 'df'; requires dplyr)
# df <- df %>%
#   mutate(across(where(is.character), as.factor))
logreg(data = medical_data(), outcome = "case",
       predictors = c("age", "parity", "induced"))
| Predictor | Univariate OR (95% CI) | P-value (Univariate) | Multivariate OR (95% CI) | P-value (Multivariate) |
|---|---|---|---|---|
| age | 1 (0.95-1.05) | 0.956 | 1 (0.94-1.05) | 0.878 |
| parity | 1.02 (0.82-1.25) | 0.888 | 0.99 (0.75-1.3) | 0.952 |
| induced_No (ref) | Reference | | Reference | |
| induced_Yes | 1.04 (0.56-1.91) | 0.890 | 1.04 (0.55-1.95) | 0.897 |
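
Odds ratios of this kind come from exponentiating logistic regression coefficients. The base-R sketch below shows the calculation with `glm()`. It is not the package's implementation: `or_table` is a hypothetical helper, the built-in `infert` dataset is only a stand-in, and the Wald intervals from `confint.default()` are an assumption, since the interval method the package uses is not stated here.

```r
# Base-R illustration with glm(); not the package's implementation.
# The built-in infert dataset is used only as a stand-in example.

# Hypothetical helper: odds ratios, 95% Wald intervals and p-values
or_table <- function(fit) {
  est <- coef(fit)
  ci <- confint.default(fit)  # Wald intervals on the log-odds scale
  p <- summary(fit)$coefficients[, "Pr(>|z|)"]
  out <- data.frame(
    OR = exp(est),
    lower = exp(ci[, 1]),
    upper = exp(ci[, 2]),
    p.value = p
  )
  out[rownames(out) != "(Intercept)", ]  # drop the intercept row
}

# Univariate model for one predictor, then a model with both predictors
uni_age <- or_table(glm(case ~ age, data = infert, family = binomial))
multi <- or_table(glm(case ~ age + parity, data = infert, family = binomial))

print(round(uni_age, 3))
print(round(multi, 3))
```

An odds ratio whose interval contains 1 corresponds to a predictor with no clear association with the outcome, as in the table above.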
diag_accuracy() calculates diagnostic accuracy measures (Sensitivity, Specificity, PPV, NPV, Accuracy, LR+, LR-, DOR) from a binary test and a gold standard. It provides 95% confidence intervals using the Wilson method for proportions and the log method for ratios. Optionally, it prints a descriptive 2x2 table.
diagnostic_data <- data.frame(
test = c("positive","negative","positive","
negative","positive","negative","positive","negative"),
goldstandard = c("positive","positive","negative",
"negative","positive","negative","positive","negative")
)
diag_accuracy(diagnostic_data, test_col = "test",
gold_col = "goldstandard",
descriptive = FALSE)
| Diagnostic Metric | Estimate (95% CI) |
|---|---|
| Sensitivity (%) | 75 (30.06-95.44) |
| Specificity (%) | 66.67 (20.77-93.85) |
| PPV (%) | 75 (30.06-95.44) |
| NPV (%) | 66.67 (20.77-93.85) |
| Accuracy (%) | 71.43 (35.89-91.78) |
| LR+ | 2.25 (0.41-12.28) |
| LR- | 0.38 (0.06-2.45) |
| DOR | 6 (0.48-75.35) |

CI formula references: Wilson (1927) for proportions; log method for LR+/- and DOR (Altman, 1991)
diag_accuracy(diagnostic_data, test_col = "test",
gold_col = "goldstandard",
descriptive = TRUE)
## a flextable object.
## col_keys: `Metric`, `Count`
## header has 2 row(s)
## body has 4 row(s)
## original dataset sample:
## Metric Count
## 1 True Positive 3
## 2 False Positive 1
## 3 False Negative 1
## 4 True Negative 2
| Diagnostic Metric | Estimate (95% CI) |
|---|---|
| Sensitivity (%) | 75 (30.06-95.44) |
| Specificity (%) | 66.67 (20.77-93.85) |
| PPV (%) | 75 (30.06-95.44) |
| NPV (%) | 66.67 (20.77-93.85) |
| Accuracy (%) | 71.43 (35.89-91.78) |
| LR+ | 2.25 (0.41-12.28) |
| LR- | 0.38 (0.06-2.45) |
| DOR | 6 (0.48-75.35) |

CI formula references: Wilson (1927) for proportions; log method for LR+/- and DOR (Altman, 1991)
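
Several of the estimates above can be checked by hand from the 2x2 counts (3 true positives, 1 false positive, 1 false negative, 2 true negatives). The sketch below applies the standard Wilson score interval to the proportions and the standard log-method interval to LR+. The helper `wilson_ci` is a hypothetical name for this example, and the DOR is shown as a point estimate only, because the exact interval formula the package uses is not spelled out here.

```r
# Counts taken from the descriptive 2x2 table above
tp <- 3; fp <- 1; fn <- 1; tn <- 2
z <- qnorm(0.975)  # two-sided 95% interval

# Hypothetical helper: Wilson score interval, returned in percent
wilson_ci <- function(successes, n) {
  p <- successes / n
  denom <- 1 + z^2 / n
  centre <- (p + z^2 / (2 * n)) / denom
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
  100 * c(estimate = p, lower = centre - half, upper = centre + half)
}

sens <- wilson_ci(tp, tp + fn)  # sensitivity
spec <- wilson_ci(tn, tn + fp)  # specificity

# Likelihood ratios and diagnostic odds ratio (point estimates)
lr_pos <- (tp / (tp + fn)) / (fp / (fp + tn))
lr_neg <- (fn / (tp + fn)) / (tn / (fp + tn))
dor <- (tp * tn) / (fp * fn)

# Log-method 95% interval for LR+
se_log_lr_pos <- sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
lr_pos_ci <- exp(log(lr_pos) + c(-1, 1) * z * se_log_lr_pos)

print(round(rbind(Sensitivity = sens, Specificity = spec), 2))
print(round(c(LR_pos = lr_pos, lower = lr_pos_ci[1], upper = lr_pos_ci[2]), 2))
print(round(c(LR_neg = lr_neg, DOR = dor), 2))
```

With these counts the sketch gives the same sensitivity, specificity, and LR+ intervals as the table.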