dentomedical

The ‘dentomedical’ package provides a comprehensive suite of tools for medical and dental research. It includes automated descriptive statistics, bivariate analysis with intelligent test selection, logistic regression, and diagnostic accuracy assessment. All functions generate structured, publication-ready tables using ‘flextable’, ensuring reproducibility and clarity suitable for manuscripts, reports, and clinical research workflows.

library(dentomedical)
library(ggplot2)
# ggplot2 is loaded only for the diamonds dataset used in the examples;
# dentomedical itself does not need it

Assessing normality

norm_sum() automatically selects all numeric variables in a dataset and performs normality testing for each of them. It applies widely used tests, including:

Shapiro–Wilk test

Kolmogorov–Smirnov test (used when n > 5000)

The output provides a tidy summary table containing:

Variable name
Sample size
Normality test statistics
p-values
Interpretation (Normal/Skewed)
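The selection logic described above can be sketched in base R. This is a minimal illustration, not norm_sum()'s actual code: the helper name normality_summary(), the 0.05 cut-off, and the use of the sample mean and SD as the reference normal for the Kolmogorov–Smirnov test are all assumptions.

```r
# Hypothetical helper, NOT dentomedical's implementation of norm_sum().
normality_summary <- function(data, alpha = 0.05) {
  # Keep only the numeric columns
  num_cols <- names(data)[vapply(data, is.numeric, logical(1))]
  rows <- lapply(num_cols, function(col) {
    x <- data[[col]][!is.na(data[[col]])]
    if (length(x) <= 5000) {
      # shapiro.test() accepts 3 to 5000 observations
      res <- shapiro.test(x)
    } else {
      # Assumption: K-S test against a normal with the sample mean and SD.
      # Ties trigger a warning, which is suppressed here.
      res <- suppressWarnings(ks.test(x, "pnorm", mean(x), sd(x)))
    }
    data.frame(
      Variable = col,
      n = length(x),
      Test = res$method,
      Statistic = unname(res$statistic),
      p.value = res$p.value,
      # Labels mirror the Normal/Skewed interpretation used by norm_sum()
      Distribution = if (res$p.value < alpha) "Skewed" else "Normal"
    )
  })
  do.call(rbind, rows)
}

normality_summary(iris)
```

Because the reference normal is estimated from the same data, the K–S p-value in this sketch is only approximate; the Lilliefors correction is the stricter option for that case.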

norm_sum(diamonds)

Variable	W Statistic	p.value	Distribution
carat	0.897	< 0.001	Skewed
depth	0.964	< 0.001	Skewed
table	0.926	< 0.001	Skewed
price	0.795	< 0.001	Skewed
x	0.955	< 0.001	Skewed
y	0.957	< 0.001	Skewed
z	0.764	< 0.001	Skewed

Summary statistics

sum_stat() generates descriptive summary tables for both continuous and categorical variables. Continuous variables are summarized as mean (SD) or median (IQR), and categorical variables as counts and percentages. Optionally, the summaries can be stratified by a grouping variable.
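The three kinds of cells in these tables can be reproduced with small base-R helpers, which is handy for checking a formatted value by hand. The helper names below are hypothetical and not part of dentomedical; reporting the IQR as the first and third quartiles from R's default quantile definition is an assumption, although it matches the iris values in this section.

```r
# Hypothetical formatting helpers, NOT part of dentomedical.
fmt_mean_sd <- function(x) {
  sprintf("%.2f (%.2f)", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
}

fmt_med_iqr <- function(x) {
  # Median with first and third quartiles (R's default type-7 quantiles)
  q <- quantile(x, c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  sprintf("%.2f (%.2f, %.2f)", q[2], q[1], q[3])
}

fmt_n_pct <- function(x) {
  counts <- table(x)
  setNames(sprintf("%d (%.2f%%)", as.integer(counts),
                   100 * as.numeric(prop.table(counts))),
           names(counts))
}

fmt_mean_sd(iris$Sepal.Length)
fmt_med_iqr(iris$Petal.Length)
fmt_n_pct(iris$Species)

# Stratified summary: one formatted cell per level of the grouping variable
tapply(iris$Sepal.Length, iris$Species, fmt_mean_sd)
```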

sum_stat(iris)

Variable	Characteristic	Value
Sepal.Length	Mean (SD)	5.84 (0.83)
Sepal.Width	Mean (SD)	3.06 (0.44)
Petal.Length	Mean (SD)	3.76 (1.77)
Petal.Width	Mean (SD)	1.20 (0.76)
Species	setosa	50 (33.33%)
	versicolor	50 (33.33%)
	virginica	50 (33.33%)
Mean (SD) / n(%)

# If the data are not normally distributed, report median (IQR)
sum_stat(iris, statistic = "med_iqr")

Variable	Characteristic	Value
Sepal.Length	Median (IQR)	5.80 (5.10, 6.40)
Sepal.Width	Median (IQR)	3.00 (2.80, 3.30)
Petal.Length	Median (IQR)	4.35 (1.60, 5.10)
Petal.Width	Median (IQR)	1.30 (0.30, 1.80)
Species	setosa	50 (33.33%)
	versicolor	50 (33.33%)
	virginica	50 (33.33%)
Median (IQR) / n(%)

Summarizing selected variables

# List the column names with their index positions
names(diamonds)

##  [1] "carat"   "cut"     "color"   "clarity" "depth"   "table"   "price"  
##  [8] "x"       "y"       "z"

# Summarize cut, color and table (columns 2, 3 and 6)
sum_stat(diamonds[c(2, 3, 6)])

Variable	Characteristic	Value
cut	Fair	1610 (2.98%)
	Good	4906 (9.1%)
	Very Good	12082 (22.4%)
	Premium	13791 (25.57%)
	Ideal	21551 (39.95%)
color	D	6775 (12.56%)
	E	9797 (18.16%)
	F	9542 (17.69%)
	G	11292 (20.93%)
	H	8304 (15.39%)
	I	5422 (10.05%)
	J	2808 (5.21%)
table	Mean (SD)	57.46 (2.23)
Mean (SD) / n(%)
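Selecting columns by position breaks silently if the column order changes. A sketch of the name-based alternative follows; it uses iris so that no extra package is needed, and for the diamonds example the equivalent subset would be diamonds[c("cut", "color", "table")]. The object names vars and iris_subset are illustrative only.

```r
# Select columns by name rather than by position
vars <- c("Sepal.Width", "Species")

# Fail early with a clear message if a name is misspelled
unknown <- setdiff(vars, names(iris))
if (length(unknown) > 0) {
  stop("Unknown column(s): ", paste(unknown, collapse = ", "))
}

iris_subset <- iris[vars]
str(iris_subset)
# The subset can then be summarized as before, e.g. sum_stat(iris_subset)
```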

Summarizing by group

sum_stat(diamonds, by = "cut")

Variable	Characteristic	Fair	Good	Very Good	Premium	Ideal
carat	Mean (SD)	1.05 (0.52)	0.85 (0.45)	0.81 (0.46)	0.89 (0.52)	0.70 (0.43)
color	D	163 (10.12%)	662 (13.49%)	1513 (12.52%)	1603 (11.62%)	2834 (13.15%)
	E	224 (13.91%)	933 (19.02%)	2400 (19.86%)	2337 (16.95%)	3903 (18.11%)
	F	312 (19.38%)	909 (18.53%)	2164 (17.91%)	2331 (16.9%)	3826 (17.75%)
	G	314 (19.5%)	871 (17.75%)	2299 (19.03%)	2924 (21.2%)	4884 (22.66%)
	H	303 (18.82%)	702 (14.31%)	1824 (15.1%)	2360 (17.11%)	3115 (14.45%)
	I	175 (10.87%)	522 (10.64%)	1204 (9.97%)	1428 (10.35%)	2093 (9.71%)
	J	119 (7.39%)	307 (6.26%)	678 (5.61%)	808 (5.86%)	896 (4.16%)
clarity	I1	210 (13.04%)	96 (1.96%)	84 (0.7%)	205 (1.49%)	146 (0.68%)
	SI2	466 (28.94%)	1081 (22.03%)	2100 (17.38%)	2949 (21.38%)	2598 (12.06%)
	SI1	408 (25.34%)	1560 (31.8%)	3240 (26.82%)	3575 (25.92%)	4282 (19.87%)
	VS2	261 (16.21%)	978 (19.93%)	2591 (21.45%)	3357 (24.34%)	5071 (23.53%)
	VS1	170 (10.56%)	648 (13.21%)	1775 (14.69%)	1989 (14.42%)	3589 (16.65%)
	VVS2	69 (4.29%)	286 (5.83%)	1235 (10.22%)	870 (6.31%)	2606 (12.09%)
	VVS1	17 (1.06%)	186 (3.79%)	789 (6.53%)	616 (4.47%)	2047 (9.5%)
	IF	9 (0.56%)	71 (1.45%)	268 (2.22%)	230 (1.67%)	1212 (5.62%)
depth	Mean (SD)	64.04 (3.64)	62.37 (2.17)	61.82 (1.38)	61.26 (1.16)	61.71 (0.72)
table	Mean (SD)	59.05 (3.95)	58.69 (2.85)	57.96 (2.12)	58.75 (1.48)	55.95 (1.25)
price	Mean (SD)	4358.76 (3560.39)	3928.86 (3681.59)	3981.76 (3935.86)	4584.26 (4349.20)	3457.54 (3808.40)
x	Mean (SD)	6.25 (0.96)	5.84 (1.06)	5.74 (1.10)	5.97 (1.19)	5.51 (1.06)
y	Mean (SD)	6.18 (0.96)	5.85 (1.05)	5.77 (1.10)	5.94 (1.26)	5.52 (1.07)
z	Mean (SD)	3.98 (0.65)	3.64 (0.65)	3.56 (0.73)	3.65 (0.73)	3.40 (0.66)
Mean (SD) / n(%)

Apply tests to summarized data

sum_stat_p() generates a descriptive summary table for both categorical and continuous variables stratified by a grouping variable. It automatically computes appropriate statistical tests (Chi-square, Fisher’s exact, t-test, Wilcoxon, ANOVA, or Kruskal–Wallis) based on data type and distribution characteristics. The output is formatted as a flextable with footnotes indicating the summary statistics used and the tests applied.
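A decision rule of the kind described above can be sketched as follows. choose_test() is a hypothetical helper, not the package's implementation: the expected-count threshold of 5, the per-group Shapiro–Wilk check at alpha = 0.05, Student's (equal-variance) t-test, and the Monte Carlo Fisher p-value for tables larger than 2×2 are assumptions based on common practice.

```r
# Hypothetical helper, NOT dentomedical's test-selection code.
choose_test <- function(x, group, alpha = 0.05) {
  group <- droplevels(as.factor(group))
  if (is.factor(x) || is.character(x) || is.logical(x)) {
    tbl <- table(droplevels(as.factor(x)), group)
    # Expected counts under independence decide chi-square vs Fisher
    expected <- suppressWarnings(chisq.test(tbl))$expected
    if (any(expected < 5)) {
      # Monte Carlo p-value keeps tables larger than 2x2 tractable
      p <- fisher.test(tbl, simulate.p.value = TRUE, B = 10000)$p.value
      return(list(test = "Fisher's exact", p = p))
    }
    return(list(test = "Chi-square", p = chisq.test(tbl)$p.value))
  }
  # Continuous variable: Shapiro-Wilk within each group
  # (needs 3 to 5000 values per group)
  normal <- all(tapply(x, group, function(v) shapiro.test(v)$p.value > alpha))
  if (nlevels(group) == 2) {
    if (normal) {
      list(test = "Student's t-test",
           p = t.test(x ~ group, var.equal = TRUE)$p.value)
    } else {
      list(test = "Wilcoxon rank-sum",
           p = suppressWarnings(wilcox.test(x ~ group))$p.value)
    }
  } else if (normal) {
    list(test = "One-way ANOVA",
         p = summary(aov(x ~ group))[[1]][["Pr(>F)"]][1])
  } else {
    list(test = "Kruskal-Wallis", p = kruskal.test(x ~ group)$p.value)
  }
}

choose_test(CO2$Treatment, CO2$Type)
choose_test(iris$Sepal.Length, iris$Species)
```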

# Example 1: Auto test selection, median/IQR summary
sum_stat_p(CO2, by = "Type", statistic = "med_iqr")

Variable	Characteristic	Quebec	Mississippi	p-value
Plant	Qn1	7 (17%)	0 (0%)	<0.001
	Qn2	7 (17%)	0 (0%)
	Qn3	7 (17%)	0 (0%)
	Qc1	7 (17%)	0 (0%)
	Qc3	7 (17%)	0 (0%)
	Qc2	7 (17%)	0 (0%)
	Mn3	0 (0%)	7 (17%)
	Mn2	0 (0%)	7 (17%)
	Mn1	0 (0%)	7 (17%)
	Mc2	0 (0%)	7 (17%)
	Mc3	0 (0%)	7 (17%)
	Mc1	0 (0%)	7 (17%)
Treatment	nonchilled	21 (50%)	21 (50%)	1
	chilled	21 (50%)	21 (50%)
conc	Median (IQR)	350 (175, 675)	350 (175, 675)	1
uptake	Median (IQR)	37.15 (30.33, 40.15)	19.3 (13.87, 28.05)	<0.001
¹ n (%); Median (IQR)
P-values calculated using: Fisher's Exact, Chi-square, Student's t-test

# Example 2: Force Wilcoxon test for continuous variables
sum_stat_p(CO2, by = "Type", statistic = "med_iqr", test_type = "wilcox")

## Warning in chisq.test(tbl): Chi-squared approximation may be incorrect

## Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
## compute exact p-value with ties
## Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
## compute exact p-value with ties

Variable	Characteristic	Quebec	Mississippi	p-value
Plant	Qn1	7 (17%)	0 (0%)	<0.001
	Qn2	7 (17%)	0 (0%)
	Qn3	7 (17%)	0 (0%)
	Qc1	7 (17%)	0 (0%)
	Qc3	7 (17%)	0 (0%)
	Qc2	7 (17%)	0 (0%)
	Mn3	0 (0%)	7 (17%)
	Mn2	0 (0%)	7 (17%)
	Mn1	0 (0%)	7 (17%)
	Mc2	0 (0%)	7 (17%)
	Mc3	0 (0%)	7 (17%)
	Mc1	0 (0%)	7 (17%)
Treatment	nonchilled	21 (50%)	21 (50%)	1
	chilled	21 (50%)	21 (50%)
conc	Median (IQR)	350 (175, 675)	350 (175, 675)	1
uptake	Median (IQR)	37.15 (30.33, 40.15)	19.3 (13.87, 28.05)	<0.001
¹ n (%); Median (IQR)
P-values calculated using: Chi-square, Wilcoxon Rank-Sum

# Example 3: Mean/SD with automatic test choice
sum_stat_p(CO2, by = "Treatment", statistic = "mean_sd")

Variable	Characteristic	nonchilled	chilled	p-value
Plant	Qn1	7 (17%)	0 (0%)	<0.001
	Qn2	7 (17%)	0 (0%)
	Qn3	7 (17%)	0 (0%)
	Qc1	0 (0%)	7 (17%)
	Qc3	0 (0%)	7 (17%)
	Qc2	0 (0%)	7 (17%)
	Mn3	7 (17%)	0 (0%)
	Mn2	7 (17%)	0 (0%)
	Mn1	7 (17%)	0 (0%)
	Mc2	0 (0%)	7 (17%)
	Mc3	0 (0%)	7 (17%)
	Mc1	0 (0%)	7 (17%)
Type	Quebec	21 (50%)	21 (50%)	1
	Mississippi	21 (50%)	21 (50%)
conc	Mean (SD)	435 (297.72)	435 (297.72)	1
uptake	Mean (SD)	30.64 (9.7)	23.78 (10.88)	0.00311
¹ n (%); Mean (SD)
P-values calculated using: Fisher's Exact, Chi-square, Student's t-test

Linear regression

linreg() performs univariate and multivariate linear regression for the specified outcome and predictors. It returns a summary table with each predictor's characteristics and its regression coefficients (β) with 95% confidence intervals and p-values. Numeric variables are summarized as mean (SD) and categorical variables as n (%). The R² and adjusted R² of the multivariate model are reported in the table footer.

# Example using built-in iris dataset
linreg(iris, outcome = "Sepal.Length",
       predictors = c("Sepal.Width", "Petal.Length", "Species"))

Predictor	Level	Characteristics	Univariate beta (95% CI)	Univariate p	Multivariate beta (95% CI)	Multivariate p
Sepal.Width		3.06 (0.44)	-0.22 (-0.53, 0.08)	0.152	0.43 (0.27, 0.59)	<0.001
Petal.Length		3.76 (1.77)	0.41 (0.37, 0.45)	<0.001	0.78 (0.65, 0.9)	<0.001
Species	setosa	50 (33.3%) [ref]
	versicolor	50 (33.3%)	0.93 (0.73, 1.13)	<0.001	-0.96 (-1.38, -0.53)	<0.001
	virginica	50 (33.3%)	1.58 (1.38, 1.79)	<0.001	-1.39 (-1.96, -0.83)	<0.001
Characteristics: Mean (SD) for numeric; n (%) for categorical. Multivariate model: R^2 = 0.863; Adjusted R^2 = 0.86
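The coefficients in the table above come from least-squares models, which can be sketched in base R as one lm() fit per predictor plus one joint fit. This illustrates the idea and is not linreg()'s internal code; the helper coef_table() is hypothetical, and the intervals come from confint(), which for lm objects gives t-based 95% CIs.

```r
# Base-R sketch, NOT dentomedical's implementation of linreg().
outcome <- "Sepal.Length"
predictors <- c("Sepal.Width", "Petal.Length", "Species")

# Hypothetical helper: beta, 95% CI and p-value for each non-intercept term
coef_table <- function(fit) {
  tab <- cbind(beta = coef(fit), confint(fit),
               p = summary(fit)$coefficients[, "Pr(>|t|)"])
  tab[-1, , drop = FALSE]
}

# Univariate: one model per predictor
univariate <- lapply(predictors, function(p) {
  coef_table(lm(reformulate(p, response = outcome), data = iris))
})
names(univariate) <- predictors

# Multivariate: all predictors in one model
multi_fit <- lm(reformulate(predictors, response = outcome), data = iris)
multivariate <- coef_table(multi_fit)

round(multivariate, 3)
c(R2 = summary(multi_fit)$r.squared, adj_R2 = summary(multi_fit)$adj.r.squared)
```

With iris this gives a univariate Sepal.Width slope of about -0.22 and a multivariate R² of about 0.863, in line with the table.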

Logistic regression

logreg() performs logistic regression for a binary outcome and a set of predictor variables. It computes both univariate and multivariate odds ratios (ORs) with 95% confidence intervals and p-values. Categorical variables automatically include a reference level in the output. Results are returned as a formatted flextable.

## medical_data() returns an example dataset included in the package
## Note: make sure all categorical variables are stored as factors.
## If they are not, convert them first (here the data frame is called 'df'):

# library(dplyr)
# df <- df %>%
#   mutate(across(where(is.character), as.factor))

logreg(data = medical_data(), outcome = "case",
       predictors = c("age", "parity", "induced"))

Predictor	Univariate OR (95% CI)	P-value (Univariate)	Multivariate OR (95% CI)	P-value (Multivariate)
age	1 (0.95-1.05)	0.956	1 (0.94-1.05)	0.878
parity	1.02 (0.82-1.25)	0.888	0.99 (0.75-1.3)	0.952
induced_No (ref)	Reference		Reference
induced_Yes	1.04 (0.56-1.91)	0.890	1.04 (0.55-1.95)	0.897
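A base-R sketch of the same univariate and multivariate odds-ratio workflow follows. Because medical_data() ships with the package, the sketch uses mtcars instead, with a made-up character column engine to show the factor conversion and the reference level. The helper or_table() is hypothetical, and the Wald intervals from confint.default() are an assumption; logreg() may compute its intervals differently.

```r
# Base-R sketch, NOT dentomedical's implementation of logreg().
dat <- mtcars
# Hypothetical character predictor derived from vs
dat$engine <- ifelse(dat$vs == 1, "straight", "v_shaped")

# Base-R equivalent of mutate(across(where(is.character), as.factor))
dat[] <- lapply(dat, function(col) if (is.character(col)) factor(col) else col)

outcome <- "am"  # 1 = manual, 0 = automatic transmission
predictors <- c("wt", "engine")

# Hypothetical helper: OR, Wald 95% CI and p-value for each non-intercept term
or_table <- function(fit) {
  est <- coef(fit)
  ci <- confint.default(fit)  # Wald intervals on the log-odds scale
  tab <- cbind(OR = exp(est), lower = exp(ci[, 1]), upper = exp(ci[, 2]),
               p = summary(fit)$coefficients[, "Pr(>|z|)"])
  tab[-1, , drop = FALSE]     # drop the intercept
}

univariate <- do.call(rbind, lapply(predictors, function(p) {
  or_table(glm(reformulate(p, response = outcome), family = binomial, data = dat))
}))
multivariate <- or_table(glm(reformulate(predictors, response = outcome),
                             family = binomial, data = dat))
round(univariate, 3)
round(multivariate, 3)
```

The first factor level ("straight") becomes the reference, so the engine row reports the odds ratio for "v_shaped" relative to it.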

Diagnostic accuracy

diag_accuracy() calculates diagnostic accuracy measures (sensitivity, specificity, PPV, NPV, accuracy, LR+, LR- and DOR) by comparing a binary test against a gold standard. It reports 95% confidence intervals using the Wilson method for proportions and the log method for ratios, and can optionally print a descriptive 2x2 table.

# One row per subject: index test result and gold-standard result
# (7 subjects: 3 true positives, 1 false negative, 1 false positive, 2 true negatives)
diagnostic_data <- data.frame(
  test         = c("positive", "negative", "positive", "positive",
                   "negative", "positive", "negative"),
  goldstandard = c("positive", "positive", "negative", "positive",
                   "negative", "positive", "negative")
)
diag_accuracy(diagnostic_data, test_col = "test",
              gold_col = "goldstandard",
              descriptive = FALSE)

Diagnostic Metric	Estimate (95% CI)
Sensitivity (%)	75 (30.06-95.44)
Specificity (%)	66.67 (20.77-93.85)
PPV (%)	75 (30.06-95.44)
NPV (%)	66.67 (20.77-93.85)
Accuracy (%)	71.43 (35.89-91.78)
LR+	2.25 (0.41-12.28)
LR-	0.38 (0.06-2.45)
DOR	6 (0.48-75.35)
CI formula references: Wilson (1927) for proportions; log method for LR+/- and DOR (Altman, 1991)
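The intervals in the table above can be checked by hand from the 2x2 counts (3 true positives, 1 false positive, 1 false negative, 2 true negatives). The sketch uses hypothetical helpers wilson_ci() and log_ratio_ci(), not the package's code; sensitivity, specificity, PPV, NPV, accuracy, LR+ and LR- should agree with the table to rounding. The DOR interval here uses Woolf's standard error, sqrt(1/TP + 1/FP + 1/FN + 1/TN), which is an assumption and may not match the package's DOR interval.

```r
# Hypothetical helpers, NOT dentomedical's implementation of diag_accuracy().
wilson_ci <- function(x, n, z = qnorm(0.975)) {
  # Wilson score interval for a proportion x/n (no continuity correction)
  p <- x / n
  denom <- 1 + z^2 / n
  centre <- (p + z^2 / (2 * n)) / denom
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
  c(est = p, lower = centre - half, upper = centre + half)
}

log_ratio_ci <- function(est, se, z = qnorm(0.975)) {
  # Symmetric interval on the log scale, back-transformed
  c(est = est, lower = exp(log(est) - z * se), upper = exp(log(est) + z * se))
}

tp <- 3; fp <- 1; fn <- 1; tn <- 2

sens <- wilson_ci(tp, tp + fn)
spec <- wilson_ci(tn, tn + fp)
ppv  <- wilson_ci(tp, tp + fp)
npv  <- wilson_ci(tn, tn + fn)
acc  <- wilson_ci(tp + tn, tp + fp + fn + tn)

lr_pos <- log_ratio_ci(sens[["est"]] / (1 - spec[["est"]]),
                       sqrt(1/tp - 1/(tp + fn) + 1/fp - 1/(fp + tn)))
lr_neg <- log_ratio_ci((1 - sens[["est"]]) / spec[["est"]],
                       sqrt(1/fn - 1/(tp + fn) + 1/tn - 1/(fp + tn)))
# Assumption: Woolf standard error for the diagnostic odds ratio
dor <- log_ratio_ci((tp * tn) / (fp * fn), sqrt(1/tp + 1/fp + 1/fn + 1/tn))

results <- rbind(Sensitivity = sens, Specificity = spec, PPV = ppv, NPV = npv,
                 Accuracy = acc, `LR+` = lr_pos, `LR-` = lr_neg, DOR = dor)
round(results, 4)
```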

diag_accuracy(diagnostic_data, test_col = "test",
              gold_col = "goldstandard",
              descriptive = TRUE)

## a flextable object.
## col_keys: `Metric`, `Count` 
## header has 2 row(s) 
## body has 4 row(s) 
## original dataset sample: 
##           Metric Count
## 1  True Positive     3
## 2 False Positive     1
## 3 False Negative     1
## 4  True Negative     2

Diagnostic Metric	Estimate (95% CI)
Sensitivity (%)	75 (30.06-95.44)
Specificity (%)	66.67 (20.77-93.85)
PPV (%)	75 (30.06-95.44)
NPV (%)	66.67 (20.77-93.85)
Accuracy (%)	71.43 (35.89-91.78)
LR+	2.25 (0.41-12.28)
LR-	0.38 (0.06-2.45)
DOR	6 (0.48-75.35)
CI formula references: Wilson (1927) for proportions; log method for LR+/- and DOR (Altman, 1991)

dentomedical R Package

Umar Hussain

2025-12-09