dentomedical

The ‘dentomedical’ package provides a comprehensive suite of tools for medical and dental research. It includes automated descriptive statistics, bivariate analysis with intelligent test selection, logistic regression, and diagnostic accuracy assessment. All functions generate structured, publication-ready tables using ‘flextable’, ensuring reproducibility and clarity suitable for manuscripts, reports, and clinical research workflows.

library(dentomedical)
library(ggplot2)
# ggplot2 is loaded only to access the diamonds dataset;
# dentomedical itself does not need it to run

Assessing normality

norm_sum() automatically selects all numeric variables in a dataset and performs normality testing for each of them. It applies widely used tests, including:

Shapiro–Wilk test

Kolmogorov–Smirnov test (when applicable, for n > 5000)

The output is a tidy summary table with one row per variable, giving the test statistic, the p-value, and a distribution label:

norm_sum(diamonds)

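| Variable | W Statistic | p.value | Distribution |
|----------|-------------|---------|--------------|
| carat    | 0.897       | < 0.001 | Skewed       |
| depth    | 0.964       | < 0.001 | Skewed       |
| table    | 0.926       | < 0.001 | Skewed       |
| price    | 0.795       | < 0.001 | Skewed       |
| x        | 0.955       | < 0.001 | Skewed       |
| y        | 0.957       | < 0.001 | Skewed       |
| z        | 0.764       | < 0.001 | Skewed       |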
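To make the idea concrete, the following is a minimal base-R sketch of per-column normality testing. It is not the package's implementation: the helper name `normality_table`, the 0.05 cut-off, and the "Normal"/"Skewed" labelling rule are assumptions made for illustration, and `iris` is used because `shapiro.test()` only accepts between 3 and 5000 observations.

```r
# Illustrative sketch only; normality_table() is a hypothetical helper,
# not a dentomedical function.
normality_table <- function(data, alpha = 0.05) {
  # Keep only the numeric columns
  numeric_vars <- names(data)[vapply(data, is.numeric, logical(1))]
  rows <- lapply(numeric_vars, function(v) {
    res <- shapiro.test(data[[v]])  # base R; requires 3 <= n <= 5000
    data.frame(
      Variable = v,
      W = round(unname(res$statistic), 3),
      p.value = res$p.value,
      # Assumed labelling rule: p < alpha is reported as "Skewed"
      Distribution = if (res$p.value < alpha) "Skewed" else "Normal",
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

normality_result <- normality_table(iris)
print(normality_result)
```

A label of this kind is a convenient guide for choosing between mean (SD) and median (IQR) summaries, but with large samples even small departures from normality give very small p-values.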

Summary statistics

sum_stat() generates descriptive summary tables for both continuous and categorical variables. Continuous variables can be summarized using mean (SD) or median (IQR), and categorical variables are summarized as counts and percentages. Optionally, summaries can be stratified by a grouping variable.

sum_stat(iris)

| Variable     | Characteristic | Value       |
|--------------|----------------|-------------|
| Sepal.Length | Mean (SD)      | 5.84 (0.83) |
| Sepal.Width  | Mean (SD)      | 3.06 (0.44) |
| Petal.Length | Mean (SD)      | 3.76 (1.77) |
| Petal.Width  | Mean (SD)      | 1.20 (0.76) |
| Species      | setosa         | 50 (33.33%) |
|              | versicolor     | 50 (33.33%) |
|              | virginica      | 50 (33.33%) |

Mean (SD) / n(%)

# if the data are not normally distributed, use median (IQR) instead
sum_stat(iris, statistic = "med_iqr")

| Variable     | Characteristic | Value             |
|--------------|----------------|-------------------|
| Sepal.Length | Median (IQR)   | 5.80 (5.10, 6.40) |
| Sepal.Width  | Median (IQR)   | 3.00 (2.80, 3.30) |
| Petal.Length | Median (IQR)   | 4.35 (1.60, 5.10) |
| Petal.Width  | Median (IQR)   | 1.30 (0.30, 1.80) |
| Species      | setosa         | 50 (33.33%)       |
|              | versicolor     | 50 (33.33%)       |
|              | virginica      | 50 (33.33%)       |

Median (IQR) / n(%)
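For readers who want to see what these summaries amount to, here is a small base-R sketch that computes the same three kinds of cell for `iris`. The helper names `mean_sd`, `med_iqr`, and `n_pct` are hypothetical and are not part of the package; the sketch assumes the default quantile definition of `quantile()`.

```r
# Illustrative helpers (hypothetical names, not dentomedical functions)

# Continuous variable as "mean (SD)"
mean_sd <- function(x) {
  sprintf("%.2f (%.2f)", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
}

# Continuous variable as "median (Q1, Q3)"
med_iqr <- function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  sprintf("%.2f (%.2f, %.2f)", q[2], q[1], q[3])
}

# Categorical variable as "n (%)" per level
n_pct <- function(x) {
  counts <- table(x)
  out <- sprintf("%d (%.2f%%)", as.integer(counts),
                 100 * as.integer(counts) / sum(counts))
  setNames(out, names(counts))
}

mean_sd(iris$Sepal.Length)
med_iqr(iris$Petal.Length)
n_pct(iris$Species)
```

Called on `iris`, these helpers give the same figures as the corresponding cells in the tables above.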

Selecting a subset of variables to summarize

# list the column names to find their positions
names(diamonds)
##  [1] "carat"   "cut"     "color"   "clarity" "depth"   "table"   "price"  
##  [8] "x"       "y"       "z"
# summarize cut, color, and table (columns 2, 3, and 6)
sum_stat(diamonds[c(2,3,6)])

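| Variable | Characteristic | Value          |
|----------|----------------|----------------|
| cut      | Fair           | 1610 (2.98%)   |
|          | Good           | 4906 (9.1%)    |
|          | Very Good      | 12082 (22.4%)  |
|          | Premium        | 13791 (25.57%) |
|          | Ideal          | 21551 (39.95%) |
| color    | D              | 6775 (12.56%)  |
|          | E              | 9797 (18.16%)  |
|          | F              | 9542 (17.69%)  |
|          | G              | 11292 (20.93%) |
|          | H              | 8304 (15.39%)  |
|          | I              | 5422 (10.05%)  |
|          | J              | 2808 (5.21%)   |
| table    | Mean (SD)      | 57.46 (2.23)   |

Mean (SD) / n(%)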

Summarize by group

sum_stat(diamonds, by = "cut")

| Variable | Characteristic | Fair | Good | Very Good | Premium | Ideal |
|----------|----------------|------|------|-----------|---------|-------|
| carat    | Mean (SD) | 1.05 (0.52) | 0.85 (0.45) | 0.81 (0.46) | 0.89 (0.52) | 0.70 (0.43) |
| color    | D    | 163 (10.12%) | 662 (13.49%) | 1513 (12.52%) | 1603 (11.62%) | 2834 (13.15%) |
|          | E    | 224 (13.91%) | 933 (19.02%) | 2400 (19.86%) | 2337 (16.95%) | 3903 (18.11%) |
|          | F    | 312 (19.38%) | 909 (18.53%) | 2164 (17.91%) | 2331 (16.9%) | 3826 (17.75%) |
|          | G    | 314 (19.5%) | 871 (17.75%) | 2299 (19.03%) | 2924 (21.2%) | 4884 (22.66%) |
|          | H    | 303 (18.82%) | 702 (14.31%) | 1824 (15.1%) | 2360 (17.11%) | 3115 (14.45%) |
|          | I    | 175 (10.87%) | 522 (10.64%) | 1204 (9.97%) | 1428 (10.35%) | 2093 (9.71%) |
|          | J    | 119 (7.39%) | 307 (6.26%) | 678 (5.61%) | 808 (5.86%) | 896 (4.16%) |
| clarity  | I1   | 210 (13.04%) | 96 (1.96%) | 84 (0.7%) | 205 (1.49%) | 146 (0.68%) |
|          | SI2  | 466 (28.94%) | 1081 (22.03%) | 2100 (17.38%) | 2949 (21.38%) | 2598 (12.06%) |
|          | SI1  | 408 (25.34%) | 1560 (31.8%) | 3240 (26.82%) | 3575 (25.92%) | 4282 (19.87%) |
|          | VS2  | 261 (16.21%) | 978 (19.93%) | 2591 (21.45%) | 3357 (24.34%) | 5071 (23.53%) |
|          | VS1  | 170 (10.56%) | 648 (13.21%) | 1775 (14.69%) | 1989 (14.42%) | 3589 (16.65%) |
|          | VVS2 | 69 (4.29%) | 286 (5.83%) | 1235 (10.22%) | 870 (6.31%) | 2606 (12.09%) |
|          | VVS1 | 17 (1.06%) | 186 (3.79%) | 789 (6.53%) | 616 (4.47%) | 2047 (9.5%) |
|          | IF   | 9 (0.56%) | 71 (1.45%) | 268 (2.22%) | 230 (1.67%) | 1212 (5.62%) |
| depth    | Mean (SD) | 64.04 (3.64) | 62.37 (2.17) | 61.82 (1.38) | 61.26 (1.16) | 61.71 (0.72) |
| table    | Mean (SD) | 59.05 (3.95) | 58.69 (2.85) | 57.96 (2.12) | 58.75 (1.48) | 55.95 (1.25) |
| price    | Mean (SD) | 4358.76 (3560.39) | 3928.86 (3681.59) | 3981.76 (3935.86) | 4584.26 (4349.20) | 3457.54 (3808.40) |
| x        | Mean (SD) | 6.25 (0.96) | 5.84 (1.06) | 5.74 (1.10) | 5.97 (1.19) | 5.51 (1.06) |
| y        | Mean (SD) | 6.18 (0.96) | 5.85 (1.05) | 5.77 (1.10) | 5.94 (1.26) | 5.52 (1.07) |
| z        | Mean (SD) | 3.98 (0.65) | 3.64 (0.65) | 3.56 (0.73) | 3.65 (0.73) | 3.40 (0.66) |

Mean (SD) / n(%)

Apply tests to summarized data

sum_stat_p() generates a descriptive summary table for both categorical and continuous variables stratified by a grouping variable. It automatically computes appropriate statistical tests (Chi-square, Fisher’s exact, t-test, Wilcoxon, ANOVA, or Kruskal–Wallis) based on data type and distribution characteristics. The output is formatted as a flextable with footnotes indicating the summary statistics used and the tests applied.

# Example 1: Auto test selection, median/IQR summary
sum_stat_p(CO2, by = "Type", statistic = "med_iqr")

| Variable  | Characteristic | Quebec | Mississippi | p-value |
|-----------|----------------|--------|-------------|---------|
| Plant     | Qn1          | 7 (17%) | 0 (0%) | <0.001 |
|           | Qn2          | 7 (17%) | 0 (0%) |        |
|           | Qn3          | 7 (17%) | 0 (0%) |        |
|           | Qc1          | 7 (17%) | 0 (0%) |        |
|           | Qc3          | 7 (17%) | 0 (0%) |        |
|           | Qc2          | 7 (17%) | 0 (0%) |        |
|           | Mn3          | 0 (0%) | 7 (17%) |        |
|           | Mn2          | 0 (0%) | 7 (17%) |        |
|           | Mn1          | 0 (0%) | 7 (17%) |        |
|           | Mc2          | 0 (0%) | 7 (17%) |        |
|           | Mc3          | 0 (0%) | 7 (17%) |        |
|           | Mc1          | 0 (0%) | 7 (17%) |        |
| Treatment | nonchilled   | 21 (50%) | 21 (50%) | 1 |
|           | chilled      | 21 (50%) | 21 (50%) |   |
| conc      | Median (IQR) | 350 (175, 675) | 350 (175, 675) | 1 |
| uptake    | Median (IQR) | 37.15 (30.33, 40.15) | 19.3 (13.87, 28.05) | <0.001 |

1 n (%); Median (IQR)

P-values calculated using: Fisher's Exact, Chi-square, Student's t-test

# Example 2: Force Wilcoxon test for continuous variables
sum_stat_p(CO2, by = "Type", statistic = "med_iqr", test_type = "wilcox")
## Warning in chisq.test(tbl): Chi-squared approximation may be incorrect
## Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
## compute exact p-value with ties
## Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
## compute exact p-value with ties

| Variable  | Characteristic | Quebec | Mississippi | p-value |
|-----------|----------------|--------|-------------|---------|
| Plant     | Qn1          | 7 (17%) | 0 (0%) | <0.001 |
|           | Qn2          | 7 (17%) | 0 (0%) |        |
|           | Qn3          | 7 (17%) | 0 (0%) |        |
|           | Qc1          | 7 (17%) | 0 (0%) |        |
|           | Qc3          | 7 (17%) | 0 (0%) |        |
|           | Qc2          | 7 (17%) | 0 (0%) |        |
|           | Mn3          | 0 (0%) | 7 (17%) |        |
|           | Mn2          | 0 (0%) | 7 (17%) |        |
|           | Mn1          | 0 (0%) | 7 (17%) |        |
|           | Mc2          | 0 (0%) | 7 (17%) |        |
|           | Mc3          | 0 (0%) | 7 (17%) |        |
|           | Mc1          | 0 (0%) | 7 (17%) |        |
| Treatment | nonchilled   | 21 (50%) | 21 (50%) | 1 |
|           | chilled      | 21 (50%) | 21 (50%) |   |
| conc      | Median (IQR) | 350 (175, 675) | 350 (175, 675) | 1 |
| uptake    | Median (IQR) | 37.15 (30.33, 40.15) | 19.3 (13.87, 28.05) | <0.001 |

1 n (%); Median (IQR)

P-values calculated using: Chi-square, Wilcoxon Rank-Sum

# Example 3: Mean/SD with automatic test choice
sum_stat_p(CO2, by = "Treatment", statistic = "mean_sd")

| Variable | Characteristic | nonchilled | chilled | p-value |
|----------|----------------|------------|---------|---------|
| Plant    | Qn1         | 7 (17%) | 0 (0%) | <0.001 |
|          | Qn2         | 7 (17%) | 0 (0%) |        |
|          | Qn3         | 7 (17%) | 0 (0%) |        |
|          | Qc1         | 0 (0%) | 7 (17%) |        |
|          | Qc3         | 0 (0%) | 7 (17%) |        |
|          | Qc2         | 0 (0%) | 7 (17%) |        |
|          | Mn3         | 7 (17%) | 0 (0%) |        |
|          | Mn2         | 7 (17%) | 0 (0%) |        |
|          | Mn1         | 7 (17%) | 0 (0%) |        |
|          | Mc2         | 0 (0%) | 7 (17%) |        |
|          | Mc3         | 0 (0%) | 7 (17%) |        |
|          | Mc1         | 0 (0%) | 7 (17%) |        |
| Type     | Quebec      | 21 (50%) | 21 (50%) | 1 |
|          | Mississippi | 21 (50%) | 21 (50%) |   |
| conc     | Mean (SD)   | 435 (297.72) | 435 (297.72) | 1 |
| uptake   | Mean (SD)   | 30.64 (9.7) | 23.78 (10.88) | 0.00311 |

1 n (%); Mean (SD)

P-values calculated using: Fisher's Exact, Chi-square, Student's t-test
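The p-values in these tables come from standard tests that are available in base R. The sketch below runs three of them by hand on `CO2`, grouped by `Treatment`; it does not reproduce the package's selection logic, which is not described here. It assumes an equal-variance (Student's) t-test, and the object names are chosen for illustration only.

```r
# Illustrative base-R sketch; object names are hypothetical.

# Categorical variable vs. group: chi-square test on the cross-tabulation
type_by_treatment <- table(CO2$Type, CO2$Treatment)
chisq_p <- chisq.test(type_by_treatment)$p.value

# Continuous variable vs. two-level group: Student's t-test
# (var.equal = TRUE is an assumption made for this sketch)
t_p <- t.test(uptake ~ Treatment, data = CO2, var.equal = TRUE)$p.value

# Non-parametric alternative: Wilcoxon rank-sum test
# (exact = FALSE uses the normal approximation, which avoids the ties warning)
wilcox_p <- wilcox.test(uptake ~ Treatment, data = CO2, exact = FALSE)$p.value

c(chisq = chisq_p, t_test = t_p, wilcoxon = wilcox_p)
```

Because `Type` and `Treatment` are perfectly balanced in `CO2`, the chi-square p-value is 1, in line with the `Type` row of the Example 3 table.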

Linear regression

linreg() performs univariate and multivariate linear regression analyses for the specified predictors and outcome variable, returning a summary table with characteristics and regression coefficients (β) with 95% CIs and p-values. Numeric variables are summarized as mean (SD); categorical variables as n (%). Multivariate model R^2 and adjusted R^2 are included in the table footer.

# Example using built-in iris dataset
linreg(iris, outcome = "Sepal.Length",
       predictors = c("Sepal.Width", "Petal.Length", "Species"))

| Predictor    | Level      | Characteristics  | Univariate beta (95% CI) | Univariate p | Multivariate beta (95% CI) | Multivariate p |
|--------------|------------|------------------|--------------------------|--------------|----------------------------|----------------|
| Sepal.Width  |            | 3.06 (0.44)      | -0.22 (-0.53, 0.08)      | 0.152        | 0.43 (0.27, 0.59)          | <0.001         |
| Petal.Length |            | 3.76 (1.77)      | 0.41 (0.37, 0.45)        | <0.001       | 0.78 (0.65, 0.9)           | <0.001         |
| Species      | setosa     | 50 (33.3%) [ref] |                          |              |                            |                |
|              | versicolor | 50 (33.3%)       | 0.93 (0.73, 1.13)        | <0.001       | -0.96 (-1.38, -0.53)       | <0.001         |
|              | virginica  | 50 (33.3%)       | 1.58 (1.38, 1.79)        | <0.001       | -1.39 (-1.96, -0.83)       | <0.001         |

Characteristics: Mean (SD) for numeric; n (%) for categorical. Multivariate model: R^2 = 0.863; Adjusted R^2 = 0.86
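The distinction between the univariate and multivariate columns can be illustrated with plain `lm()` calls. This is a sketch of the underlying idea rather than the package's code: a univariate estimate comes from a model with a single predictor, while a multivariate estimate comes from one model containing all predictors. Object names are hypothetical.

```r
# Illustrative sketch with base R; object names are hypothetical.

# Univariate model: one predictor at a time
uni_fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)
uni_beta <- coef(uni_fit)[["Sepal.Width"]]
uni_ci <- unname(confint(uni_fit)["Sepal.Width", ])  # 95% CI by default

# Multivariate model: all predictors entered together
multi_fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species,
                data = iris)
multi_beta <- coef(multi_fit)[["Sepal.Width"]]
multi_r2 <- summary(multi_fit)$r.squared
multi_adj_r2 <- summary(multi_fit)$adj.r.squared

round(c(uni_beta = uni_beta, multi_beta = multi_beta,
        r2 = multi_r2, adj_r2 = multi_adj_r2), 3)
```

The sign of the `Sepal.Width` coefficient flips between the two models, which is why reporting both columns side by side is useful.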

Logistic regression

logreg() performs logistic regression for a binary outcome and a set of predictor variables. It computes both univariate and multivariate odds ratios (ORs) with 95% confidence intervals and p-values. Categorical variables automatically include a reference level in the output. Results are returned as a formatted flextable.

## medical_data() is a dataset bundled with the package.
## Note: all categorical variables must be factors.
## If they are not, convert them first (assuming your data frame is named 'df'):

# library(dplyr)
# df <- df %>%
#   mutate(across(where(is.character), as.factor))

logreg(data = medical_data(), outcome = "case",
       predictors = c("age", "parity", "induced"))

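| Predictor        | Univariate OR (95% CI) | P-value (Univariate) | Multivariate OR (95% CI) | P-value (Multivariate) |
|------------------|------------------------|----------------------|--------------------------|------------------------|
| age              | 1 (0.95-1.05)          | 0.956                | 1 (0.94-1.05)            | 0.878                  |
| parity           | 1.02 (0.82-1.25)       | 0.888                | 0.99 (0.75-1.3)          | 0.952                  |
| induced_No (ref) | Reference              |                      | Reference                |                        |
| induced_Yes      | 1.04 (0.56-1.91)       | 0.890                | 1.04 (0.55-1.95)         | 0.897                  |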
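Odds ratios of this kind are the exponentiated coefficients of a logistic model. The sketch below shows the calculation with `glm()` on the base-R `infert` dataset, used here purely as a stand-in; it is not claimed to be the same data as `medical_data()`, and `or_table` is a hypothetical helper. Wald intervals from `confint.default()` are an assumption, since the interval method used by `logreg()` is not stated.

```r
# Illustrative sketch; or_table() is a hypothetical helper, not a package function.
or_table <- function(fit) {
  est <- coef(fit)
  ci <- confint.default(fit)  # Wald 95% CI on the log-odds scale
  out <- data.frame(
    OR = exp(est),
    lower = exp(ci[, 1]),
    upper = exp(ci[, 2]),
    p.value = summary(fit)$coefficients[, 4]
  )
  out[rownames(out) != "(Intercept)", ]  # drop the intercept row
}

# Univariate model: a single predictor
uni_or <- or_table(glm(case ~ age, data = infert, family = binomial))

# Multivariate model: all predictors entered together
multi_or <- or_table(glm(case ~ age + parity + induced,
                         data = infert, family = binomial))

print(round(uni_or, 3))
print(round(multi_or, 3))
```

An OR close to 1 with an interval that spans 1 indicates no detectable association between that predictor and the outcome.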

Diagnostic accuracy

diag_accuracy() calculates diagnostic accuracy measures (Sensitivity, Specificity, PPV, NPV, Accuracy, LR+, LR-, DOR) from a binary test and a gold standard. It provides 95% confidence intervals using the Wilson method for proportions and the log method for ratios. Optionally, it prints a descriptive 2x2 table.

diagnostic_data <- data.frame(
  test = c("positive","negative","positive","
  negative","positive","negative","positive","negative"),
  goldstandard = c("positive","positive","negative",
  "negative","positive","negative","positive","negative")
)
diag_accuracy(diagnostic_data, test_col = "test",
              gold_col = "goldstandard",
              descriptive = FALSE)

| Diagnostic Metric | Estimate (95% CI)   |
|-------------------|---------------------|
| Sensitivity (%)   | 75 (30.06-95.44)    |
| Specificity (%)   | 66.67 (20.77-93.85) |
| PPV (%)           | 75 (30.06-95.44)    |
| NPV (%)           | 66.67 (20.77-93.85) |
| Accuracy (%)      | 71.43 (35.89-91.78) |
| LR+               | 2.25 (0.41-12.28)   |
| LR-               | 0.38 (0.06-2.45)    |
| DOR               | 6 (0.48-75.35)      |

CI formula references: Wilson (1927) for proportions; log method for LR+/- and DOR (Altman, 1991)
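As a worked illustration of the interval methods named in the footnote, the sketch below computes the Wilson interval for sensitivity and specificity and the log-method interval for LR+ from 2x2 counts of 3 true positives, 1 false positive, 1 false negative, and 2 true negatives. The helper `wilson_ci` is hypothetical and written from the standard textbook formulas; it is not the package's internal code.

```r
# Illustrative sketch; wilson_ci() is a hypothetical helper.
wilson_ci <- function(x, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  p <- x / n
  centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  c(estimate = p, lower = centre - half, upper = centre + half)
}

# 2x2 counts
tp <- 3; fp <- 1; fn <- 1; tn <- 2

sens <- wilson_ci(tp, tp + fn)  # sensitivity = TP / (TP + FN)
spec <- wilson_ci(tn, tn + fp)  # specificity = TN / (TN + FP)

# Positive likelihood ratio with a log-method 95% CI
lr_pos <- unname(sens["estimate"] / (1 - spec["estimate"]))
se_log_lr_pos <- sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
lr_pos_ci <- exp(log(lr_pos) + c(-1, 1) * qnorm(0.975) * se_log_lr_pos)

round(100 * sens, 2)
round(100 * spec, 2)
round(c(lr_pos, lr_pos_ci), 2)
```

With so few observations the intervals are very wide, which is the main message of the example output: point estimates from a small 2x2 table should be read together with their confidence limits.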

diag_accuracy(diagnostic_data, test_col = "test",
              gold_col = "goldstandard",
              descriptive = TRUE)
## a flextable object.
## col_keys: `Metric`, `Count` 
## header has 2 row(s) 
## body has 4 row(s) 
## original dataset sample: 
##           Metric Count
## 1  True Positive     3
## 2 False Positive     1
## 3 False Negative     1
## 4  True Negative     2

| Diagnostic Metric | Estimate (95% CI)   |
|-------------------|---------------------|
| Sensitivity (%)   | 75 (30.06-95.44)    |
| Specificity (%)   | 66.67 (20.77-93.85) |
| PPV (%)           | 75 (30.06-95.44)    |
| NPV (%)           | 66.67 (20.77-93.85) |
| Accuracy (%)      | 71.43 (35.89-91.78) |
| LR+               | 2.25 (0.41-12.28)   |
| LR-               | 0.38 (0.06-2.45)    |
| DOR               | 6 (0.48-75.35)      |

CI formula references: Wilson (1927) for proportions; log method for LR+/- and DOR (Altman, 1991)