Install the updated package from GitHub with:
devtools::install_github("umarhussain-git/dentomedical1")
Publication-Ready Descriptive, Bivariate, Regression, and Diagnostic Accuracy Tools for Medical and Dental Data
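category creates a new categorical variable by splitting a numeric column into specified ranges and stores it in a new column (here age_group). You can provide custom labels for each range; otherwise the range strings themselves are used as labels. Values that fall outside every range are set to NA, as the two ages of 55 show in the first example.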
df <- data.frame(age = c(20, 28, 26, 40, 55, 34, 10, 24, 55))
# Categorize without custom labels
category(df, var = age, level = c("10-25", "26-35", "36-50"))
## age age_group
## 1 20 10-25
## 2 28 26-35
## 3 26 26-35
## 4 40 36-50
## 5 55 <NA>
## 6 34 26-35
## 7 10 10-25
## 8 24 10-25
## 9 55 <NA>
# Categorize with custom labels
category(df, var = age, level = c("10-25", "26-35", "36-55"),
labels = c("young", "adult", "old"))
## age age_group
## 1 20 young
## 2 28 adult
## 3 26 adult
## 4 40 old
## 5 55 old
## 6 34 adult
## 7 10 young
## 8 24 young
## 9 55 old
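For readers who want to see the idea behind range-based categorisation, here is a minimal base-R sketch. categorize_ranges is a hypothetical helper, not the package's implementation; it assumes each level is written as "low-high" with inclusive, non-negative bounds, so a value such as 25.5 would fall between two ranges and become NA.

```r
# Hypothetical helper, not part of dentomedical: assign each value to the
# "low-high" range that contains it (bounds inclusive); anything outside
# every range becomes NA.
categorize_ranges <- function(x, level, labels = level) {
  # Parse "10-25" style strings into a matrix of lower/upper bounds
  bounds <- do.call(rbind, lapply(strsplit(level, "-", fixed = TRUE), as.numeric))
  out <- rep(NA_character_, length(x))
  for (i in seq_along(level)) {
    in_range <- !is.na(x) & x >= bounds[i, 1] & x <= bounds[i, 2]
    out[in_range] <- labels[i]
  }
  factor(out, levels = labels)
}

age <- c(20, 28, 26, 40, 55, 34, 10, 24, 55)
age_group <- categorize_ranges(age, c("10-25", "26-35", "36-50"))
age_label <- categorize_ranges(age, c("10-25", "26-35", "36-55"),
                               labels = c("young", "adult", "old"))
data.frame(age, age_group, age_label)
```

Returning a factor keeps the categories in the order the ranges were given, which is usually what you want in a summary table.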
recode_data replaces values in a data frame according to a named lookup vector. All columns are converted to character, and any value that matches a name in lookup is replaced by the corresponding value. This is useful for correcting spelling mistakes or relabelling the categories of a variable, for example replacing "F" with "Female" in gender.
df <- data.frame(
gender = c("male", "F", "male", "female"),
status = c("single", "Married", "oo", "M"),
stringsAsFactors = FALSE
)
lookup <- c(
"male" = "Male",
"M" = "Married",
"oo" = "Widow",
"female" = "Female",
"F" = "Female"
)
df_recode <- recode_data(df, lookup)
print(df_recode)
## gender status
## 1 Male single
## 2 Female Married
## 3 Male Widow
## 4 Female Married
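The same lookup idea can be expressed in a few lines of base R. This is a sketch under the assumption that matching is exact and case-sensitive (which is consistent with "Married" staying unchanged above); recode_with_lookup is a hypothetical name, not the package's code.

```r
# Hypothetical base-R sketch of lookup-based recoding (not the package's code)
recode_with_lookup <- function(df, lookup) {
  df[] <- lapply(df, function(col) {
    col <- as.character(col)               # every column becomes character
    hit <- !is.na(col) & col %in% names(lookup)
    col[hit] <- unname(lookup[col[hit]])   # replace exact matches only
    col
  })
  df
}

df <- data.frame(
  gender = c("male", "F", "male", "female"),
  status = c("single", "Married", "oo", "M"),
  stringsAsFactors = FALSE
)
lookup <- c("male" = "Male", "M" = "Married", "oo" = "Widow",
            "female" = "Female", "F" = "Female")
recoded <- recode_with_lookup(df, lookup)
recoded
```

Because matching is exact, "female" is not caught by the "male" entry, and values without an entry are left as they are.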
impute_missing imputes missing values in a data frame. For categorical variables (factor or character), missing values are replaced with the mode (the most frequent category). For numeric variables, missing values can be imputed using the mean, the median, or regression-based imputation; if no method is specified, numeric missing values are left as NA. The starwars dataset from dplyr contains missing values in several columns, so it makes a convenient example.
library(dplyr) # provides the starwars dataset
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
head(starwars)
## # A tibble: 6 × 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Luke Sky… 172 77 blond fair blue 19 male mascu…
## 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…
## 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…
## 4 Darth Va… 202 136 none white yellow 41.9 male mascu…
## 5 Leia Org… 150 49 brown light brown 19 fema… femin…
## 6 Owen Lars 178 120 brown, gr… light blue 52 male mascu…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
# Impute numeric columns using regression and categorical with mode
impute_missing(starwars, method = "regression")
## # A tibble: 87 × 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Luke Sk… 172 77 blond fair blue 19 male mascu…
## 2 C-3PO 167 75 none gold yellow 112 none mascu…
## 3 R2-D2 96 32 none white, bl… red 33 none mascu…
## 4 Darth V… 202 136 none white yellow 41.9 male mascu…
## 5 Leia Or… 150 49 brown light brown 19 fema… femin…
## 6 Owen La… 178 120 brown, gr… light blue 52 male mascu…
## 7 Beru Wh… 165 75 brown light blue 47 fema… femin…
## 8 R5-D4 97 32 none white, red red 254. none mascu…
## 9 Biggs D… 183 84 black light brown 24 male mascu…
## 10 Obi-Wan… 182 77 auburn, w… fair blue-gray 57 male mascu…
## # ℹ 77 more rows
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
# Impute numeric columns using mean
impute_missing(starwars, method = "mean")
## # A tibble: 87 × 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Luke Sk… 172 77 blond fair blue 19 male mascu…
## 2 C-3PO 167 75 none gold yellow 112 none mascu…
## 3 R2-D2 96 32 none white, bl… red 33 none mascu…
## 4 Darth V… 202 136 none white yellow 41.9 male mascu…
## 5 Leia Or… 150 49 brown light brown 19 fema… femin…
## 6 Owen La… 178 120 brown, gr… light blue 52 male mascu…
## 7 Beru Wh… 165 75 brown light blue 47 fema… femin…
## 8 R5-D4 97 32 none white, red red 87.6 none mascu…
## 9 Biggs D… 183 84 black light brown 24 male mascu…
## 10 Obi-Wan… 182 77 auburn, w… fair blue-gray 57 male mascu…
## # ℹ 77 more rows
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
# Impute numeric columns using median
impute_missing(starwars, method = "median")
## # A tibble: 87 × 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Luke Sk… 172 77 blond fair blue 19 male mascu…
## 2 C-3PO 167 75 none gold yellow 112 none mascu…
## 3 R2-D2 96 32 none white, bl… red 33 none mascu…
## 4 Darth V… 202 136 none white yellow 41.9 male mascu…
## 5 Leia Or… 150 49 brown light brown 19 fema… femin…
## 6 Owen La… 178 120 brown, gr… light blue 52 male mascu…
## 7 Beru Wh… 165 75 brown light blue 47 fema… femin…
## 8 R5-D4 97 32 none white, red red 52 none mascu…
## 9 Biggs D… 183 84 black light brown 24 male mascu…
## 10 Obi-Wan… 182 77 auburn, w… fair blue-gray 57 male mascu…
## # ℹ 77 more rows
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
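To make the mode/mean/median rules concrete, here is a small base-R sketch. impute_simple is a hypothetical helper that covers only the mean and median options (regression imputation is omitted); ties for the mode go to the first category in sort order, which may differ from the package's choice.

```r
# Hypothetical helper, not the package's implementation: mode for
# character/factor columns, mean or median for numeric columns.
impute_simple <- function(df, method = c("mean", "median")) {
  method <- match.arg(method)
  mode_of <- function(x) {
    counts <- table(x)                 # table() ignores NA by default
    names(counts)[which.max(counts)]   # first category wins on ties
  }
  df[] <- lapply(df, function(col) {
    if (!anyNA(col)) return(col)
    if (is.numeric(col)) {
      fill <- if (method == "mean") mean(col, na.rm = TRUE) else median(col, na.rm = TRUE)
      col[is.na(col)] <- fill
    } else if (is.character(col) || is.factor(col)) {
      col[is.na(col)] <- mode_of(col)
    }
    col
  })
  df
}

toy <- data.frame(
  x = c(1, NA, 3, 10),
  g = c("a", NA, "a", "b"),
  stringsAsFactors = FALSE
)
impute_simple(toy, "median")  # x[2] -> 3, g[2] -> "a"
impute_simple(toy, "mean")    # x[2] -> 14/3
```

Note that filling an integer column with a mean converts it to double, which matches the height column changing from int to dbl in the regression and mean outputs above.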
sum_norm performs the Shapiro-Wilk normality test on every numeric variable in a dataset and returns the results as a publication-ready flextable. Non-numeric columns are detected and ignored automatically, and very small p-values are displayed as "< 0.001".
sum_norm(iris)
Variable | W Statistic | p.value | Distribution |
|---|---|---|---|
Sepal.Length | 0.976 | 0.0102 | Skewed |
Sepal.Width | 0.985 | 0.1012 | Normal |
Petal.Length | 0.876 | < 0.001 | Skewed |
Petal.Width | 0.902 | < 0.001 | Skewed |
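The W statistics and p-values in this table can be cross-checked with stats::shapiro.test. The sketch below reproduces only those two columns; the Normal/Skewed label is the package's own classification and is not reimplemented here.

```r
# Shapiro-Wilk test on every numeric column of iris, using base R only
num_cols <- names(iris)[vapply(iris, is.numeric, logical(1))]
sw <- lapply(iris[num_cols], shapiro.test)

norm_tab <- data.frame(
  Variable = num_cols,
  W = vapply(sw, function(t) unname(t$statistic), numeric(1)),
  p.value = vapply(sw, function(t) t$p.value, numeric(1)),
  row.names = NULL
)
# Display very small p-values as "< 0.001", others with 4 decimals
norm_tab$p.label <- ifelse(norm_tab$p.value < 0.001, "< 0.001",
                           formatC(norm_tab$p.value, format = "f", digits = 4))
norm_tab
```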
sum_stat provides a summary of both continuous and categorical variables in a dataset. Continuous variables can be summarized using mean (SD) or median (IQR), optionally with 95% confidence intervals. Categorical variables are summarized as counts and percentages, optionally with confidence intervals. Summaries can also be generated by a grouping variable, and a narrative interpretation is optionally printed.
# Basic summary of iris dataset
sum_stat(iris, ci = FALSE, report = TRUE)
## Mean Sepal.Length was 5.84 +/-0.83.
##
## Mean Sepal.Width was 3.06 +/-0.44.
##
## Mean Petal.Length was 3.76 +/-1.77.
##
## Mean Petal.Width was 1.2 +/-0.76.
##
## For Species, all categories were similar in frequency (n=50, 33.33%).
| Variable | Characteristic | N = 150* |
|---|---|---|
| Sepal.Length | Mean (SD) | 5.84 (0.83) |
| Sepal.Width | Mean (SD) | 3.06 (0.44) |
| Petal.Length | Mean (SD) | 3.76 (1.77) |
| Petal.Width | Mean (SD) | 1.2 (0.76) |
| Species | setosa | 50 (33.33%) |
| | versicolor | 50 (33.33%) |
| | virginica | 50 (33.33%) |
| * Mean (SD)/n(%) | | |
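The two summary formats used by sum_stat, mean (SD) for continuous variables and n (%) for categorical ones, can be sketched in base R as follows. This is an illustration, not the package's code; one visible difference is that this sketch always prints two decimals (1.20) where the package drops the trailing zero (1.2).

```r
# Base-R sketch (not the package's code) of the two summary formats above
fmt_mean_sd <- function(x) {
  sprintf("%.2f (%.2f)", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
}
fmt_n_pct <- function(x) {
  counts <- table(x, useNA = "ifany")   # keep an NA category if present
  pct <- 100 * as.numeric(prop.table(counts))
  setNames(sprintf("%d (%.2f%%)", as.integer(counts), pct), names(counts))
}

num_summary <- vapply(iris[1:4], fmt_mean_sd, character(1))
cat_summary <- fmt_n_pct(iris$Species)
num_summary
cat_summary
```

Keeping NA as its own category (useNA = "ifany") mirrors the NA rows that appear in the starwars summary below.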
names(starwars)
## [1] "name" "height" "mass" "hair_color" "skin_color"
## [6] "eye_color" "birth_year" "sex" "gender" "homeworld"
## [11] "species" "films" "vehicles" "starships"
# Select height, mass, hair_color and species by their column positions (2, 3, 4, 11)
sum_stat(starwars[c(2,3,4,11)])
## Mean height was 174.6 +/-34.77.
##
## Mean mass was 97.31 +/-169.46.
##
## For hair_color, frequency was highest in none (n=38, 43.68%), followed by brown (n=18, 20.69%), black (n=13, 14.94%), NA (n=5, 5.75%), white (n=4, 4.6%), blond (n=3, 3.45%), auburn (n=1, 1.15%), auburn, grey (n=1, 1.15%), auburn, white (n=1, 1.15%), blonde (n=1, 1.15%), brown, grey (n=1, 1.15%), and lowest in grey (n=1, 1.15%).
##
## For species, frequency was highest in Human (n=35, 40.23%), followed by Droid (n=6, 6.9%), NA (n=4, 4.6%), Gungan (n=3, 3.45%), Kaminoan (n=2, 2.3%), Mirialan (n=2, 2.3%), Twi'lek (n=2, 2.3%), Wookiee (n=2, 2.3%), Zabrak (n=2, 2.3%), Aleena (n=1, 1.15%), Besalisk (n=1, 1.15%), Cerean (n=1, 1.15%), Chagrian (n=1, 1.15%), Clawdite (n=1, 1.15%), Dug (n=1, 1.15%), Ewok (n=1, 1.15%), Geonosian (n=1, 1.15%), Hutt (n=1, 1.15%), Iktotchi (n=1, 1.15%), Kaleesh (n=1, 1.15%), Kel Dor (n=1, 1.15%), Mon Calamari (n=1, 1.15%), Muun (n=1, 1.15%), Nautolan (n=1, 1.15%), Neimodian (n=1, 1.15%), Pau'an (n=1, 1.15%), Quermian (n=1, 1.15%), Rodian (n=1, 1.15%), Skakoan (n=1, 1.15%), Sullustan (n=1, 1.15%), Tholothian (n=1, 1.15%), Togruta (n=1, 1.15%), Toong (n=1, 1.15%), Toydarian (n=1, 1.15%), Trandoshan (n=1, 1.15%), Vulptereen (n=1, 1.15%), Xexto (n=1, 1.15%), and lowest in Yoda's species (n=1, 1.15%).
| Variable | Characteristic | N = 87* |
|---|---|---|
| height | Mean (SD) | 174.6 (34.77) |
| mass | Mean (SD) | 97.31 (169.46) |
| hair_color | auburn | 1 (1.15%) |
| | auburn, grey | 1 (1.15%) |
| | auburn, white | 1 (1.15%) |
| | black | 13 (14.94%) |
| | blond | 3 (3.45%) |
| | blonde | 1 (1.15%) |
| | brown | 18 (20.69%) |
| | brown, grey | 1 (1.15%) |
| | grey | 1 (1.15%) |
| | none | 38 (43.68%) |
| | white | 4 (4.6%) |
| | NA | 5 (5.75%) |
| species | Aleena | 1 (1.15%) |
| | Besalisk | 1 (1.15%) |
| | Cerean | 1 (1.15%) |
| | Chagrian | 1 (1.15%) |
| | Clawdite | 1 (1.15%) |
| | Droid | 6 (6.9%) |
| | Dug | 1 (1.15%) |
| | Ewok | 1 (1.15%) |
| | Geonosian | 1 (1.15%) |
| | Gungan | 3 (3.45%) |
| | Human | 35 (40.23%) |
| | Hutt | 1 (1.15%) |
| | Iktotchi | 1 (1.15%) |
| | Kaleesh | 1 (1.15%) |
| | Kaminoan | 2 (2.3%) |
| | Kel Dor | 1 (1.15%) |
| | Mirialan | 2 (2.3%) |
| | Mon Calamari | 1 (1.15%) |
| | Muun | 1 (1.15%) |
| | Nautolan | 1 (1.15%) |
| | Neimodian | 1 (1.15%) |
| | Pau'an | 1 (1.15%) |
| | Quermian | 1 (1.15%) |
| | Rodian | 1 (1.15%) |
| | Skakoan | 1 (1.15%) |
| | Sullustan | 1 (1.15%) |
| | Tholothian | 1 (1.15%) |
| | Togruta | 1 (1.15%) |
| | Toong | 1 (1.15%) |
| | Toydarian | 1 (1.15%) |
| | Trandoshan | 1 (1.15%) |
| | Twi'lek | 2 (2.3%) |
| | Vulptereen | 1 (1.15%) |
| | Wookiee | 2 (2.3%) |
| | Xexto | 1 (1.15%) |
| | Yoda's species | 1 (1.15%) |
| | Zabrak | 2 (2.3%) |
| | NA | 4 (4.6%) |
| * Mean (SD)/n(%) | | |
To summarize by a grouping variable, use the by argument:
# Summary of CO2 dataset by 'Treatment' with CI
sum_stat(CO2, by = "Treatment", ci = TRUE, report = TRUE, percent = "row")
## Mean conc was similar between nonchilled (435 ± 297.72 [95% CI: 344.96, 525.04]) and chilled (435 +/- 297.72 [95% CI: 344.96, 525.04]).
##
## Mean uptake was higher in nonchilled (30.64 ± 9.7 [95% CI: 27.71, 33.58]) than in chilled (23.78 +/- 10.88 [95% CI: 20.49, 27.08]).
##
## For Plant, all categories were similar in frequency (n=7, 8.33%).
##
## For Type, all categories were similar in frequency (n=42, 50%).
| Variable | Characteristic | nonchilled | chilled |
|---|---|---|---|
| Plant | Qn1 | 7 (100% [59.04, 100]) | 0 (0% [0, 40.96]) |
| | Qn2 | 7 (100% [59.04, 100]) | 0 (0% [0, 40.96]) |
| | Qn3 | 7 (100% [59.04, 100]) | 0 (0% [0, 40.96]) |
| | Qc1 | 0 (0% [0, 40.96]) | 7 (100% [59.04, 100]) |
| | Qc3 | 0 (0% [0, 40.96]) | 7 (100% [59.04, 100]) |
| | Qc2 | 0 (0% [0, 40.96]) | 7 (100% [59.04, 100]) |
| | Mn3 | 7 (100% [59.04, 100]) | 0 (0% [0, 40.96]) |
| | Mn2 | 7 (100% [59.04, 100]) | 0 (0% [0, 40.96]) |
| | Mn1 | 7 (100% [59.04, 100]) | 0 (0% [0, 40.96]) |
| | Mc2 | 0 (0% [0, 40.96]) | 7 (100% [59.04, 100]) |
| | Mc3 | 0 (0% [0, 40.96]) | 7 (100% [59.04, 100]) |
| | Mc1 | 0 (0% [0, 40.96]) | 7 (100% [59.04, 100]) |
| Type | Quebec | 21 (50% [34.19, 65.81]) | 21 (50% [34.19, 65.81]) |
| | Mississippi | 21 (50% [34.19, 65.81]) | 21 (50% [34.19, 65.81]) |
| conc | Mean (SD) | 435 (297.72) [344.96, 525.04] | 435 (297.72) [344.96, 525.04] |
| uptake | Mean (SD) | 30.64 (9.7) [27.71, 33.58] | 23.78 (10.88) [20.49, 27.08] |
| * Mean (SD)/n(%) with 95% CI | | | |
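The confidence intervals in this table can be reproduced from the numbers alone. The continuous CIs are consistent with a normal approximation, mean ± 1.96 × SD/√n (for conc, 435 ± 1.96 × 297.72/√42 gives 344.96 to 525.04), and the category CIs (59.04 to 100 for 7 of 7 plants) are consistent with the exact Clopper-Pearson interval from binom.test(). This is an inference from the printed values, not documented package behaviour; the sketch below shows both calculations.

```r
# Normal-approximation CI for a mean: mean +/- z * SD / sqrt(n)
ci_mean <- function(x, z = 1.96) {
  x <- x[!is.na(x)]
  half <- z * sd(x) / sqrt(length(x))
  c(mean = mean(x), lower = mean(x) - half, upper = mean(x) + half)
}

co2 <- as.data.frame(CO2)
# One row per Treatment level (nonchilled, chilled)
conc_ci   <- t(sapply(split(co2$conc, co2$Treatment), ci_mean))
uptake_ci <- t(sapply(split(co2$uptake, co2$Treatment), ci_mean))
round(conc_ci, 2)
round(uptake_ci, 2)

# Exact (Clopper-Pearson) CI for a proportion, e.g. 7 of 7 plants
round(100 * binom.test(7, 7)$conf.int, 2)
```

With 42 observations per group, a t multiplier (qt(0.975, 41), about 2.02) would give slightly wider intervals than the 1.96 used here.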
sum_stat_p generates a descriptive summary table for both continuous and categorical variables, stratified by a grouping variable. It automatically computes appropriate statistical tests (Chi-square, Fisher’s exact, t-test, Wilcoxon, ANOVA, or Kruskal–Wallis) based on variable type, number of groups, and data distribution. Continuous variables can be summarized as mean (SD) or median (IQR), and categorical variables as counts and percentages.
# Summary of iris by Species; test_type = "auto" selects the test automatically
sum_stat_p(iris, by = "Species", statistic = "mean_sd", test_type = "auto")
| Variable | Characteristic | setosa | versicolor | virginica | p-value |
|---|---|---|---|---|---|
| Sepal.Length | Mean (SD) | 5.01 (0.35) | 5.94 (0.52) | 6.59 (0.64) | 0.00 |
| Sepal.Width | Mean (SD) | 3.43 (0.38) | 2.77 (0.31) | 2.97 (0.32) | 0.00 |
| Petal.Length | Mean (SD) | 1.46 (0.17) | 4.26 (0.47) | 5.55 (0.55) | 0.00 |
| Petal.Width | Mean (SD) | 0.25 (0.11) | 1.33 (0.2) | 2.03 (0.27) | 0.00 |
| ¹ n (%); Mean (SD) | | | | | |
| P-values calculated using: Kruskal-Wallis | | | | | |
# Summary of CO2 dataset by Type with paired t-test
sum_stat_p(CO2, by = "Type", statistic = "mean_sd", test_type = "t.test", paired = TRUE)
## Warning in chisq.test(tbl): Chi-squared approximation may be incorrect
| Variable | Characteristic | Quebec | Mississippi | p-value |
|---|---|---|---|---|
| Plant | Qn1 | 7 (16.67%) | 0 (0%) | 0.00 |
| | Qn2 | 7 (16.67%) | 0 (0%) | |
| | Qn3 | 7 (16.67%) | 0 (0%) | |
| | Qc1 | 7 (16.67%) | 0 (0%) | |
| | Qc3 | 7 (16.67%) | 0 (0%) | |
| | Qc2 | 7 (16.67%) | 0 (0%) | |
| | Mn3 | 0 (0%) | 7 (16.67%) | |
| | Mn2 | 0 (0%) | 7 (16.67%) | |
| | Mn1 | 0 (0%) | 7 (16.67%) | |
| | Mc2 | 0 (0%) | 7 (16.67%) | |
| | Mc3 | 0 (0%) | 7 (16.67%) | |
| | Mc1 | 0 (0%) | 7 (16.67%) | |
| Treatment | nonchilled | 21 (50%) | 21 (50%) | 1.00 |
| | chilled | 21 (50%) | 21 (50%) | |
| conc | Mean (SD) | 435 (297.72) | 435 (297.72) | NaN |
| uptake | Mean (SD) | 33.54 (9.67) | 20.88 (7.82) | 0.00 |
| ¹ n (%); Mean (SD) | | | | |
| P-values calculated using: Chi-square, Paired t-test | | | | |
# Summary using median and IQR
sum_stat_p(iris, by = "Species", statistic = "med_iqr", test_type = "kruskal")
| Variable | Characteristic | setosa | versicolor | virginica | p-value |
|---|---|---|---|---|---|
| Sepal.Length | Median (IQR) | 5 (4.8, 5.2) | 5.9 (5.6, 6.3) | 6.5 (6.23, 6.9) | 0.00 |
| Sepal.Width | Median (IQR) | 3.4 (3.2, 3.68) | 2.8 (2.52, 3) | 3 (2.8, 3.18) | 0.00 |
| Petal.Length | Median (IQR) | 1.5 (1.4, 1.58) | 4.35 (4, 4.6) | 5.55 (5.1, 5.88) | 0.00 |
| Petal.Width | Median (IQR) | 0.2 (0.2, 0.3) | 1.3 (1.2, 1.5) | 2 (1.8, 2.3) | 0.00 |
| ¹ n (%); Median (IQR) | | | | | |
| P-values calculated using: Kruskal-Wallis | | | | | |
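The idea behind test_type = "auto" can be sketched with a simple decision rule. The rule below is a hypothetical assumption, not the package's logic: the auto example above reported Kruskal-Wallis for all four iris variables, whereas this rule would pick ANOVA for a variable that looks normal in every group, so the package's rule evidently differs in detail. shapiro.test needs 3 to 5000 non-identical values per group.

```r
# Hypothetical selection rule (an assumption, not the package's code):
# categorical -> Chi-square, or Fisher's exact if any expected count < 5;
# numeric -> parametric test only if Shapiro-Wilk p > alpha in every group.
choose_test <- function(x, g, alpha = 0.05) {
  g <- droplevels(as.factor(g))
  if (!is.numeric(x)) {
    expected <- suppressWarnings(chisq.test(table(x, g))$expected)
    return(if (any(expected < 5)) "Fisher's exact" else "Chi-square")
  }
  normal <- all(tapply(x, g, function(v) shapiro.test(v)$p.value) > alpha)
  if (nlevels(g) == 2) {
    if (normal) "t-test" else "Wilcoxon"
  } else {
    if (normal) "ANOVA" else "Kruskal-Wallis"
  }
}

choose_test(iris$Petal.Width, iris$Species)  # setosa petal widths are far from normal
choose_test(CO2$Treatment, CO2$Type)         # 2 x 2 table, every expected count is 21

# Running the selected tests with base R
kruskal.test(Petal.Width ~ Species, data = iris)$p.value
chisq.test(table(CO2$Treatment, CO2$Type))$p.value
```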
sum_stat_p_strata produces summary tables for numeric and categorical variables compared across the levels of a grouping variable (by), repeated within each level of a stratification variable (strata). Numeric variables are summarized as mean (SD) or median (IQR), and categorical variables as counts and percentages. The appropriate test (t-test, Wilcoxon, ANOVA, Kruskal-Wallis, Chi-square, or Fisher's exact) is chosen according to the variable type, the number of groups, and user-specified options.
# Example: summary of CO2 by Type, stratified by Treatment
sum_stat_p_strata(data = CO2, by = "Type", strata = "Treatment")
| Treatment | Variable | Characteristic | Quebec | Mississippi | p-value |
|---|---|---|---|---|---|
| nonchilled | Plant | Qn1 | 7 (33%) | 0 (0%) | <0.001 |
| | | Qn2 | 7 (33%) | 0 (0%) | |
| | | Qn3 | 7 (33%) | 0 (0%) | |
| | | Qc1 | 0 (0%) | 0 (0%) | |
| | | Qc3 | 0 (0%) | 0 (0%) | |
| | | Qc2 | 0 (0%) | 0 (0%) | |
| | | Mn3 | 0 (0%) | 7 (33%) | |
| | | Mn2 | 0 (0%) | 7 (33%) | |
| | | Mn1 | 0 (0%) | 7 (33%) | |
| | | Mc2 | 0 (0%) | 0 (0%) | |
| | | Mc3 | 0 (0%) | 0 (0%) | |
| | | Mc1 | 0 (0%) | 0 (0%) | |
| | conc | Mean (SD) | 435 (301.42) | 435 (301.42) | 1.00 |
| | uptake | Mean (SD) | 35.33 (9.6) | 25.95 (7.4) | 0.00 |
| chilled | Plant | Qn1 | 0 (0%) | 0 (0%) | <0.001 |
| | | Qn2 | 0 (0%) | 0 (0%) | |
| | | Qn3 | 0 (0%) | 0 (0%) | |
| | | Qc1 | 7 (33%) | 0 (0%) | |
| | | Qc3 | 7 (33%) | 0 (0%) | |
| | | Qc2 | 7 (33%) | 0 (0%) | |
| | | Mn3 | 0 (0%) | 0 (0%) | |
| | | Mn2 | 0 (0%) | 0 (0%) | |
| | | Mn1 | 0 (0%) | 0 (0%) | |
| | | Mc2 | 0 (0%) | 7 (33%) | |
| | | Mc3 | 0 (0%) | 7 (33%) | |
| | | Mc1 | 0 (0%) | 7 (33%) | |
| | conc | Mean (SD) | 435 (301.42) | 435 (301.42) | 1.00 |
| | uptake | Mean (SD) | 31.75 (9.64) | 15.81 (4.06) | <0.001 |
| ¹ n (%); Mean (SD) | | | | | |
| Tests used: Plant : Fisher's Exact; conc : Student's t-test; uptake : Student's t-test | | | | | |
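Stratification simply means repeating the comparison inside each level of the strata variable. A minimal base-R sketch for the uptake rows, using Student's t-test (equal variances) as named in the footnote, is shown below; it is an illustration, not the package's code, and it does not reproduce the Plant rows.

```r
co2 <- as.data.frame(CO2)

# Split by the stratification variable, then compare Type within each stratum
strata_p <- vapply(split(co2, co2$Treatment), function(d) {
  # Student's t-test (equal variances) of uptake: Quebec vs Mississippi
  t.test(uptake ~ Type, data = d, var.equal = TRUE)$p.value
}, numeric(1))

signif(strata_p, 3)
```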
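sum_posthoc produces a summary table of numeric or categorical variables grouped by a factor, with global tests (ANOVA or Kruskal-Wallis) and post-hoc pairwise comparisons (Tukey or Dunn). Numeric variables are summarized as mean (SD) or median (IQR), and the result is returned as a flextable suitable for reporting. In the first example below the test is chosen per variable (ANOVA with Tukey for three variables, Kruskal-Wallis with Dunn for Petal.Width); test_type = "anova" forces ANOVA for all of them.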
sum_posthoc(
data = iris,
by = "Species",
variables = c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")
)
| Variable | Characteristic | setosa | versicolor | virginica | p-value | versicolor-setosa | virginica-setosa | virginica-versicolor | setosa - versicolor | setosa - virginica | versicolor - virginica |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sepal.Length* | Mean (SD) | 5.01 (0.35) | 5.94 (0.52) | 6.59 (0.64) | <0.001 | 0.93 (<0.001) | 1.58 (<0.001) | 0.65 (<0.001) | | | |
| Sepal.Width* | Mean (SD) | 3.43 (0.38) | 2.77 (0.31) | 2.97 (0.32) | <0.001 | -0.66 (<0.001) | -0.45 (<0.001) | 0.2 (<0.001) | | | |
| Petal.Length* | Mean (SD) | 1.46 (0.17) | 4.26 (0.47) | 5.55 (0.55) | <0.001 | 2.8 (<0.001) | 4.09 (<0.001) | 1.29 (<0.001) | | | |
| Petal.Width** | Median (IQR) | 0.2 (0.2, 0.3) | 1.3 (1.2, 1.5) | 2 (1.8, 2.3) | <0.001 | | | | <0.001 | <0.001 | <0.001 |
| Statistic: Mean (SD) for ANOVA, Median (IQR) for Kruskal-Wallis | | | | | | | | | | | |
| Tests used: ANOVA + Tukey, Kruskal-Wallis + Dunn | | | | | | | | | | | |
| * ANOVA, ** Kruskal-Wallis | | | | | | | | | | | |
| Post-hoc: mean difference (p-value) for pairwise comparisons | | | | | | | | | | | |
# Force ANOVA with Tukey post-hoc tests for all variables
sum_posthoc(
data = iris,
by = "Species",
variables = c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width"),
test_type = "anova"
)
Variable | Characteristic | setosa | versicolor | virginica | p-value | versicolor-setosa | virginica-setosa | virginica-versicolor |
|---|---|---|---|---|---|---|---|---|
Sepal.Length* | Mean (SD) | 5.01 (0.35) | 5.94 (0.52) | 6.59 (0.64) | <0.001 | 0.93 (<0.001) | 1.58 (<0.001) | 0.65 (<0.001) |
Sepal.Width* | Mean (SD) | 3.43 (0.38) | 2.77 (0.31) | 2.97 (0.32) | <0.001 | -0.66 (<0.001) | -0.45 (<0.001) | 0.2 (<0.001) |
Petal.Length* | Mean (SD) | 1.46 (0.17) | 4.26 (0.47) | 5.55 (0.55) | <0.001 | 2.8 (<0.001) | 4.09 (<0.001) | 1.29 (<0.001) |
Petal.Width* | Mean (SD) | 0.25 (0.11) | 1.33 (0.2) | 2.03 (0.27) | <0.001 | 1.08 (<0.001) | 1.78 (<0.001) | 0.7 (<0.001) |
Statistic: Mean (SD) for ANOVA, Median (IQR) for Kruskal-Wallis | ||||||||
Tests used: ANOVA + Tukey | ||||||||
* ANOVA, ** Kruskal-Wallis | ||||||||
Post-hoc: mean difference (p-value) for pairwise comparisons | ||||||||
# Post-hoc tests for a single variable
sum_posthoc(
data = iris,
by = "Species",
variables = c("Petal.Width"))
Variable | Characteristic | setosa | versicolor | virginica | p-value | setosa - versicolor | setosa - virginica | versicolor - virginica |
|---|---|---|---|---|---|---|---|---|
Petal.Width** | Median (IQR) | 0.2 (0.2, 0.3) | 1.3 (1.2, 1.5) | 2 (1.8, 2.3) | <0.001 | <0.001 | <0.001 | <0.001 |
Statistic: Mean (SD) for ANOVA, Median (IQR) for Kruskal-Wallis | ||||||||
Tests used: Kruskal-Wallis + Dunn | ||||||||
* ANOVA, ** Kruskal-Wallis | ||||||||
Post-hoc: mean difference (p-value) for pairwise comparisons | ||||||||
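The ANOVA plus Tukey route can be checked with stats::aov and stats::TukeyHSD; the diff column reproduces the mean differences shown above (0.93, 1.58, 0.65 for Sepal.Length). Base R has no Dunn test, so for the nonparametric route this sketch swaps in pairwise Wilcoxon tests with Holm adjustment, which is a different procedure from the package's Dunn test.

```r
# Parametric route: global ANOVA, then Tukey HSD
fit <- aov(Sepal.Length ~ Species, data = iris)
summary(fit)
tukey <- TukeyHSD(fit)$Species      # columns: diff, lwr, upr, p adj
round(tukey[, c("diff", "p adj")], 3)

# Nonparametric route: global Kruskal-Wallis, then pairwise Wilcoxon
# (a stand-in for Dunn's test; ties trigger warnings, so they are silenced)
kruskal.test(Petal.Width ~ Species, data = iris)$p.value
suppressWarnings(
  pairwise.wilcox.test(iris$Petal.Width, iris$Species, p.adjust.method = "holm")
)
```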
sum_cor computes correlations between a reference variable and one or more comparison variables. For Pearson correlations, 95% confidence intervals are also calculated. It can optionally stratify by a grouping variable (by), and it returns a formatted flextable and optionally prints a narrative summary describing weak, moderate, and strong correlations.
# Example 1: Correlations across entire dataset
sum_cor(
data = iris,
ref_var = "Sepal.Length",
compare_vars = c("Petal.Length", "Petal.Width", "Sepal.Width"),
method = "pearson",
digits = 2,
report = TRUE
)
## Strong correlation was found with Petal.Length (r = 0.87, 95% CI = 0.83,0.91, p = <0.001) and Petal.Width (r = 0.82, 95% CI = 0.76,0.86, p = <0.001). Weak correlation was found with Sepal.Width (r = -0.12, 95% CI = -0.27,0.04, p = 0.15).
Reference Variable: Sepal.Length
| Comparison Variable | Correlation | 95% CI Lower | 95% CI Upper | p-value | Strength |
|---|---|---|---|---|---|
| Petal.Lengthᵃ | 0.87 | 0.83 | 0.91 | <0.001 | strong |
| Petal.Width | 0.82 | 0.76 | 0.86 | <0.001 | strong |
| Sepal.Width | -0.12 | -0.27 | 0.04 | 0.15 | weak |
| ᵃ Test: pearson | | | | | |
# Example 2: Correlations by Species
sum_cor(
data = iris,
ref_var = "Sepal.Length",
by = "Species",
compare_vars = c("Petal.Length", "Petal.Width", "Sepal.Width"),
method = "pearson",
digits = 2,
report = TRUE
)
## For setosa: Strong correlation was found with Sepal.Width (r = 0.74, 95% CI = 0.59,0.85, p = <0.001). Weak correlation was found with Petal.Width (r = 0.28, 95% CI = 0,0.52, p = 0.05) and Petal.Length (r = 0.27, 95% CI = -0.01,0.51, p = 0.06).
##
## For versicolor: Strong correlation was found with Petal.Length (r = 0.75, 95% CI = 0.6,0.85, p = <0.001). Moderate correlation was found with Petal.Width (r = 0.55, 95% CI = 0.32,0.72, p = <0.001) and Sepal.Width (r = 0.53, 95% CI = 0.29,0.7, p = <0.001).
##
## For virginica: Strong correlation was found with Petal.Length (r = 0.86, 95% CI = 0.77,0.92, p = <0.001). Moderate correlation was found with Sepal.Width (r = 0.46, 95% CI = 0.2,0.65, p = <0.001). Weak correlation was found with Petal.Width (r = 0.28, 95% CI = 0,0.52, p = 0.05).
Reference Variable: Sepal.Length
| Species | Comparison Variable | Correlation | 95% CI Lower | 95% CI Upper | p-value | Strength |
|---|---|---|---|---|---|---|
| setosaᵃ | Sepal.Width | 0.74 | 0.59 | 0.85 | <0.001 | strong |
| | Petal.Width | 0.28 | -0.00 | 0.52 | 0.05 | weak |
| | Petal.Length | 0.27 | -0.01 | 0.51 | 0.06 | weak |
| versicolor | Petal.Length | 0.75 | 0.60 | 0.85 | <0.001 | strong |
| | Petal.Width | 0.55 | 0.32 | 0.72 | <0.001 | moderate |
| | Sepal.Width | 0.53 | 0.29 | 0.70 | <0.001 | moderate |
| virginica | Petal.Length | 0.86 | 0.77 | 0.92 | <0.001 | strong |
| | Sepal.Width | 0.46 | 0.20 | 0.65 | <0.001 | moderate |
| | Petal.Width | 0.28 | 0.00 | 0.52 | 0.05 | weak |
| ᵃ Test: pearson | | | | | | |
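The correlations and their 95% CIs come from the standard Pearson test, so stats::cor.test can be used to verify them. The strength cut-offs below (|r| below 0.3 weak, below 0.7 moderate, otherwise strong) are an assumption that happens to agree with the labels in the tables above; the package's actual thresholds are not stated here.

```r
compare_vars <- c("Petal.Length", "Petal.Width", "Sepal.Width")

cor_tab <- do.call(rbind, lapply(compare_vars, function(v) {
  ct <- cor.test(iris$Sepal.Length, iris[[v]], method = "pearson")
  data.frame(variable = v, r = unname(ct$estimate),
             lower = ct$conf.int[1], upper = ct$conf.int[2], p = ct$p.value)
}))

# Hypothetical cut-offs: [0, 0.3) weak, [0.3, 0.7) moderate, [0.7, 1] strong
cor_tab$strength <- cut(abs(cor_tab$r), breaks = c(0, 0.3, 0.7, 1),
                        labels = c("weak", "moderate", "strong"),
                        include.lowest = TRUE, right = FALSE)
cor_tab
```

For the by-group version, the same code can be run on each element of split(iris, iris$Species).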
# Apply linear regression on iris dataset
linreg(
data = iris,
outcome = "Sepal.Length",
predictors = c("Sepal.Width", "Petal.Length", "Species"),
report = TRUE
)
## $table
## a flextable object.
## col_keys: `Predictor`, `Univariable
## Beta (95% CI)`, `Univariable
## p`, `Multivariable
## Beta (95% CI)`, `Multivariable
## p`
## header has 1 row(s)
## body has 5 row(s)
## original dataset sample:
## Predictor Univariable\nBeta (95% CI) Univariable\np
## 1 Sepal.Width -0.22 (-0.53, 0.08) 0.15
## 2 Petal.Length 0.41 (0.37, 0.45) <0.001
## 3 setosa Reference
## 4 versicolor 0.93 (0.73, 1.13) <0.001
## 5 virginica 1.58 (1.38, 1.79) <0.001
## Multivariable\nBeta (95% CI) Multivariable\np
## 1 0.43 (0.27, 0.59) <0.001
## 2 0.78 (0.65, 0.9) <0.001
## 3 Reference
## 4 -0.96 (-1.38, -0.53) <0.001
## 5 -1.39 (-1.96, -0.83) <0.001
##
## $interpretation
## [1] "The multivariable model explained 86.3% of variance in Sepal.Length (Adjusted R^2 = 0.86). Each one-unit increase in Sepal.Width was associated with a statistically significant increase of 0.43 units in Sepal.Length (95% CI 0.27 to 0.59, p = <0.001). Each one-unit increase in Petal.Length was associated with a statistically significant increase of 0.78 units in Sepal.Length (95% CI 0.65 to 0.9, p = <0.001). Each one-unit increase in Speciesversicolor was associated with a statistically significant decrease of 0.96 units in Sepal.Length (95% CI -1.38 to -0.53, p = <0.001). Each one-unit increase in Speciesvirginica was associated with a statistically significant decrease of 1.39 units in Sepal.Length (95% CI -1.96 to -0.83, p = <0.001). "
linreg fits univariable and multivariable linear regression models for a continuous outcome, summarizing beta coefficients, 95% confidence intervals, and p-values. Factor predictors include reference levels in the table. It returns a formatted flextable and, with report = TRUE, also an automatic textual interpretation of the results.
# Same model with report = FALSE: the table only
linreg(
data = iris,
outcome = "Sepal.Length",
predictors = c("Sepal.Width", "Petal.Length", "Species"),
report = FALSE
)
| Predictor | Univariable Beta (95% CI) | Univariable p | Multivariable Beta (95% CI) | Multivariable p |
|---|---|---|---|---|
| Sepal.Width | -0.22 (-0.53, 0.08) | 0.15 | 0.43 (0.27, 0.59) | <0.001 |
| Petal.Length | 0.41 (0.37, 0.45) | <0.001 | 0.78 (0.65, 0.9) | <0.001 |
| setosa | Reference | | Reference | |
| versicolor | 0.93 (0.73, 1.13) | <0.001 | -0.96 (-1.38, -0.53) | <0.001 |
| virginica | 1.58 (1.38, 1.79) | <0.001 | -1.39 (-1.96, -0.83) | <0.001 |
| Multivariable model: R^2 = 0.863; Adjusted R^2 = 0.86 | | | | |
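Under the hood, a univariable/multivariable table like this corresponds to one lm() per predictor plus one lm() with all predictors. The base-R sketch below reproduces the multivariable betas, CIs and R² shown above; it is a cross-check, not the package's code.

```r
predictors <- c("Sepal.Width", "Petal.Length", "Species")

# Univariable: one model per predictor
uni <- lapply(predictors, function(p) {
  lm(reformulate(p, response = "Sepal.Length"), data = iris)
})
names(uni) <- predictors

# Multivariable: all predictors in one model
multi <- lm(reformulate(predictors, response = "Sepal.Length"), data = iris)

round(cbind(beta = coef(multi), confint(multi)), 2)
round(c(R2 = summary(multi)$r.squared, adj_R2 = summary(multi)$adj.r.squared), 3)
```

The sign change for Species (positive in univariable models, negative after adjustment) is a genuine feature of the data, not a bug: species differences in sepal length are largely explained by petal length.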
logreg fits univariable and multivariable logistic regression models for a binary outcome, summarizing odds ratios (ORs), 95% confidence intervals, and p-values. Factor predictors include reference levels in the table. It returns a formatted flextable and, with report = TRUE, also an automatic textual interpretation of the results.
logreg(data = medical_data(), outcome = "case",
       predictors = c("age", "parity", "induced"), report = TRUE)
##
## --- Automatic Interpretation ---
## age showed a non-significant increase in odds of case (OR 1, 95% CI 0.94-1.05; p=0.88). parity showed a non-significant increase in odds of case (OR 0.99, 95% CI 0.75-1.3; p=0.95). Yes showed a non-significant increase in odds of case (OR 1.04, 95% CI 0.55-1.95; p=0.90).
| Predictor | Univariable OR (95% CI) | Univariable p | Multivariable OR (95% CI) | Multivariable p |
|---|---|---|---|---|
| age | 1 (0.95-1.05) | 0.96 | 1 (0.94-1.05) | 0.88 |
| parity | 1.02 (0.82-1.25) | 0.89 | 0.99 (0.75-1.3) | 0.95 |
| induced_No | Reference | | Reference | |
| induced_Yes | 1.04 (0.56-1.91) | 0.89 | 1.04 (0.55-1.95) | 0.90 |
| OR, Odds ratio; Binary logistic regression | | | | |
# Apply to the trial dataset from gtsummary
df1 <- gtsummary::trial
logreg(data = df1, outcome = "response",
       predictors = c("stage", "grade", "marker"), report = TRUE)
##
## --- Automatic Interpretation ---
## T2 showed a non-significant increase in odds of response (OR 0.48, 95% CI 0.19-1.19; p=0.12). T3 showed a non-significant increase in odds of response (OR 1.06, 95% CI 0.42-2.67; p=0.89). T4 showed a non-significant increase in odds of response (OR 0.71, 95% CI 0.29-1.7; p=0.44). II showed a non-significant increase in odds of response (OR 1.2, 95% CI 0.54-2.7; p=0.66). III showed a non-significant increase in odds of response (OR 1.15, 95% CI 0.53-2.52; p=0.72). marker showed a non-significant increase in odds of response (OR 1.43, 95% CI 0.98-2.09; p=0.06).
| Predictor | Univariable OR (95% CI) | Univariable p | Multivariable OR (95% CI) | Multivariable p |
|---|---|---|---|---|
| stage_T1 | Reference | | Reference | |
| stage_T2 | 0.63 (0.27-1.46) | 0.29 | 0.48 (0.19-1.19) | 0.12 |
| stage_T3 | 1.13 (0.48-2.68) | 0.77 | 1.06 (0.42-2.67) | 0.89 |
| stage_T4 | 0.83 (0.36-1.92) | 0.67 | 0.71 (0.29-1.7) | 0.44 |
| grade_I | Reference | | Reference | |
| grade_II | 0.95 (0.45-2) | 0.88 | 1.2 (0.54-2.7) | 0.66 |
| grade_III | 1.1 (0.52-2.29) | 0.81 | 1.15 (0.53-2.52) | 0.72 |
| marker | 1.35 (0.94-1.93) | 0.10 | 1.43 (0.98-2.09) | 0.06 |
| OR, Odds ratio; Binary logistic regression | | | | |
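Odds ratios are the exponentiated coefficients of a binomial glm(). The sketch below uses base R's infert dataset, which has columns named case, age, parity and induced (coded 0, 1, 2 or more). Whether medical_data() is derived from it is not stated here, and the Yes/No recoding of induced is a hypothetical choice, so the estimates will not match the table above. Wald intervals (confint.default) are used for simplicity; the package may compute its CIs differently.

```r
dat <- infert
# Hypothetical recoding: any previous induced abortion -> "Yes"
dat$induced_any <- factor(ifelse(dat$induced > 0, "Yes", "No"), levels = c("No", "Yes"))

# Multivariable binary logistic regression
fit <- glm(case ~ age + parity + induced_any, data = dat, family = binomial)

# Exponentiate log-odds coefficients and their Wald CIs to get ORs
est <- coef(fit)
ci <- confint.default(fit)
or_tab <- exp(cbind(OR = est, lower = ci[, 1], upper = ci[, 2]))[-1, ]  # drop intercept
round(or_tab, 2)
```

The univariable columns would come from fitting the same model once per predictor, e.g. glm(case ~ age, data = dat, family = binomial).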
diag_accuracy calculates diagnostic accuracy measures (sensitivity, specificity, PPV, NPV, accuracy, LR+, LR-, and DOR) from a binary test and a gold standard. It provides 95% confidence intervals using the Wilson method for proportions and the log method for ratios. With descriptive = TRUE, it also prints the 2x2 table of counts (true/false positives and negatives).
diagnostic_data <- data.frame(
test = c("positive","negative","positive","
negative","positive","negative","positive","negative"),
goldstandard = c("positive","positive","negative",
"negative","positive","negative","positive","negative")
)
diag_accuracy(diagnostic_data, test_col = "test",
gold_col = "goldstandard",
descriptive = FALSE)
Diagnostic Metric | Estimate (95% CI) |
|---|---|
Sensitivity (%) | 75 (30.06-95.44) |
Specificity (%) | 66.67 (20.77-93.85) |
PPV (%) | 75 (30.06-95.44) |
NPV (%) | 66.67 (20.77-93.85) |
Accuracy (%) | 71.43 (35.89-91.78) |
LR+ | 2.25 (0.41-12.28) |
LR- | 0.38 (0.06-2.45) |
DOR | 6 (0.48-75.35) |
CI formula references: Wilson (1927) for proportions; log method for LR+/- and DOR (Altman, 1991) | |
diag_accuracy(diagnostic_data, test_col = "test",
gold_col = "goldstandard",
descriptive = TRUE)
## a flextable object.
## col_keys: `Metric`, `Count`
## header has 2 row(s)
## body has 4 row(s)
## original dataset sample:
## Metric Count
## 1 True Positive 3
## 2 False Positive 1
## 3 False Negative 1
## 4 True Negative 2
Diagnostic Metric | Estimate (95% CI) |
|---|---|
Sensitivity (%) | 75 (30.06-95.44) |
Specificity (%) | 66.67 (20.77-93.85) |
PPV (%) | 75 (30.06-95.44) |
NPV (%) | 66.67 (20.77-93.85) |
Accuracy (%) | 71.43 (35.89-91.78) |
LR+ | 2.25 (0.41-12.28) |
LR- | 0.38 (0.06-2.45) |
DOR | 6 (0.48-75.35) |
CI formula references: Wilson (1927) for proportions; log method for LR+/- and DOR (Altman, 1991) | |
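The proportions and likelihood ratios above can be recomputed from the 2x2 counts. Note that these counts (TP = 3, FP = 1, FN = 1, TN = 2) sum to 7, not to the 8 rows of diagnostic_data: the fourth test value is a string literal that starts with a line break ("\nnegative"), so it presumably matches neither "positive" nor "negative" and that row is dropped. The sketch below uses prop.test() without continuity correction, which gives the Wilson score interval, and the standard log-method standard errors for LR+ and LR-; the DOR interval is not reproduced here.

```r
# Counts taken from the descriptive 2x2 table above
tp <- 3; fp <- 1; fn <- 1; tn <- 2

# Wilson score interval: prop.test() without continuity correction
wilson <- function(x, n) {
  ci <- suppressWarnings(prop.test(x, n, correct = FALSE)$conf.int)
  100 * c(est = x / n, lower = ci[1], upper = ci[2])
}

metrics <- rbind(
  Sensitivity = wilson(tp, tp + fn),
  Specificity = wilson(tn, tn + fp),
  PPV         = wilson(tp, tp + fp),
  NPV         = wilson(tn, tn + fn),
  Accuracy    = wilson(tp + tn, tp + fp + fn + tn)
)

# Log-method CI for a ratio: exp(log(est) +/- z * SE)
log_ci <- function(est, se, z = qnorm(0.975)) {
  c(est = est, lower = exp(log(est) - z * se), upper = exp(log(est) + z * se))
}
sens <- tp / (tp + fn)
spec <- tn / (tn + fp)
lr_pos <- log_ci(sens / (1 - spec), sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn)))
lr_neg <- log_ci((1 - sens) / spec, sqrt(1 / fn - 1 / (tp + fn) + 1 / tn - 1 / (fp + tn)))
dor <- (tp * tn) / (fp * fn)

round(metrics, 2)
round(rbind(LR_pos = lr_pos, LR_neg = lr_neg), 2)
dor
```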