Data Analysis Manal

Published

September 1, 2023

Modified

September 27, 2023

Packages

Dataset

Create new variables for experience

Merge the other specialists

Because each non-radiologist specialty has only a few readers, all non-radiologists are merged into a single group for a more robust comparison.
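The same merge can be sketched in base R with `ifelse()`. The specialty names in the vector below are hypothetical placeholders, not the study's actual labels. The only assumption carried over from the source is that radiologists are labelled exactly `"Radiologist"`.

```r
# Hypothetical specialty labels; the study's real labels are not shown here
group_raw <- c("Radiologist", "Endodontist", "General Dentist",
               "Radiologist", "Periodontist")

# Keep "Radiologist" and collapse every other specialty into one level
group <- ifelse(group_raw == "Radiologist", "Radiologist", "Non Radiologist")

table(group)
```

One difference from a `case_when()` with a `TRUE` fallback: there, a missing `Group` fails the first condition and falls through to "Non Radiologist", whereas `ifelse()` returns `NA` for a missing label.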

Characteristic	N = 12¹
Group
Non Radiologist	8 (67%)
Radiologist	4 (33%)
Experience (years)
[3, 10.4)	7 (58%)
[10.4, 17.8)	2 (17%)
[25.2, 32.6)	1 (8.3%)
[32.6, 40]	2 (17%)
Gender
F	2 (17%)
M	10 (83%)
¹ n (%)

It is preferable to split experience into two groups: 3-8 and 8-40 years.
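A minimal base-R sketch of the proposed two-group split, using `cut()` with a fixed break at 8 years. The experience values are illustrative, not the study's readers. Note that `chop_equally(groups = 2)` places its break at the sample median, which need not fall at 8 years.

```r
# Illustrative years of experience for 12 hypothetical readers
experience <- c(3, 4, 5, 6, 7, 8, 9, 12, 15, 28, 35, 40)

# Intervals [3, 8] and (8, 40]; include.lowest keeps the minimum (3) in the first bin,
# and right-closed intervals put a reader with exactly 8 years in "3-8"
exp_group <- cut(experience,
                 breaks = c(3, 8, 40),
                 labels = c("3-8", "8-40"),
                 include.lowest = TRUE)

table(exp_group)
```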

How many readers?

[1] 12

How many cases (distinct CaseID)?

[1] 66

How many distinct LesionID values?

[1] 5
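The counts above are numbers of distinct values, not numbers of rows. The dataset itself has many more rows (1,466 aided plus 1,414 unaided reads in the time summary). A small illustration on a hypothetical grid of reads:

```r
# Hypothetical reads: 3 readers x 2 cases x 2 reading types
reads <- expand.grid(ReaderID = 1:3,
                     CaseID = c("A", "B"),
                     Reading_type = c("Unaided Read", "Aided Read"))

nrow(reads)                   # number of rows (individual reads)
length(unique(reads$CaseID))  # number of distinct cases, like n_distinct(df$CaseID)
```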

Table 1. Descriptive statistics of the dataset

Characteristic	Negative, N = 1¹	Positive, N = 37¹
Rating
Other than 1	1 (100%)	10 (27%)
1	0 (0%)	27 (73%)
Extent
Large Extent	0 (0%)	21 (57%)
Small Extent	1 (100%)	16 (43%)
Location
Anterior Mandibular	0 (0%)	3 (8.1%)
Anterior Maxillary	0 (0%)	6 (16%)
Molar Mandibular	1 (100%)	10 (27%)
Molar Maxillary	0 (0%)	8 (22%)
Premolar Mandibular	0 (0%)	6 (16%)
Premolar Maxillary	0 (0%)	4 (11%)
Treatment
Endodontically Treated Tooth	1 (100%)	22 (59%)
Non Endodontically Treated Tooth	0 (0%)	15 (41%)
¹ n (%)
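A base-R sketch of the steps behind Table 1, on hypothetical toy reads. It keeps unaided reads with a known `True_status`, keeps the first row per case (as `distinct(CaseID, .keep_all = TRUE)` does), and collapses `Rating` into 1 versus every other value. Because only the first row per case survives, each case contributes a single reader's rating.

```r
# Hypothetical long-format reads, several readers per case
reads <- data.frame(
  CaseID       = c(1, 1, 2, 2, 3, 3, 4),
  Reading_type = c("Unaided Read", "Aided Read", "Unaided Read", "Unaided Read",
                   "Unaided Read", "Aided Read", "Unaided Read"),
  True_status  = c("Positive", "Positive", "Positive", "Positive",
                   "Negative", "Negative", NA),
  Rating       = c(1, 2, 0, 1, 1, 1, 3)
)

# Unaided reads with a known true status
unaided <- subset(reads, Reading_type == "Unaided Read" & !is.na(True_status))

# First row per case only
one_per_case <- unaided[!duplicated(unaided$CaseID), ]

# Rating 1 versus any other value (including values below 1)
one_per_case$Rating_cat <- ifelse(one_per_case$Rating == 1, "1", "Other than 1")

tab <- table(one_per_case$Rating_cat, one_per_case$True_status)
tab
```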

Questions

Rating: Was there a difference in rating between the groups?

Characteristic	Beta	95% CI¹	p-value
Group
Non Radiologist	—	—
Radiologist	0.09	0.06, 0.12	<0.001
as.factor(Reading_number)
1	—	—
2	0.01	-0.02, 0.03	0.6
Group * as.factor(Reading_number)
Radiologist * 2	-0.02	-0.06, 0.02	0.3
¹ CI = Confidence Interval

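Radiologists gave higher ratings than non-radiologists (Beta = 0.09, 95% CI 0.06 to 0.12, p < 0.001). Ratings did not change detectably on the second reading with AI (Beta = 0.01, p = 0.6), and there was no evidence of an interaction between group and AI (p = 0.3).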
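The interaction p-value can also be obtained by comparing the additive model with the interaction model in an F-test via `anova()`. This sketch uses hypothetical ratings. For a single interaction coefficient, the F statistic equals the square of that coefficient's t statistic, so both routes give the same p-value.

```r
# Hypothetical ratings: 3 reads per Group x Reading_number cell
toy <- data.frame(
  Group = rep(c("Non Radiologist", "Radiologist"), each = 6),
  Reading_number = factor(rep(rep(c(1, 2), each = 3), times = 2)),
  Rating = c(0.70, 0.80, 0.90, 0.75, 0.80, 0.85,
             0.85, 0.90, 0.95, 0.90, 0.95, 1.00)
)

fit_add <- lm(Rating ~ Group + Reading_number, data = toy)  # main effects only
fit_int <- lm(Rating ~ Group * Reading_number, data = toy)  # adds the interaction

# Nested-model F-test for the one extra (interaction) coefficient
cmp <- anova(fit_add, fit_int)
cmp
```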

Time: Was there a difference in reading time between the groups?

Characteristic	Beta	95% CI¹	p-value
Group
Non Radiologist	—	—
Radiologist	16	15, 18	<0.001
as.factor(Reading_number)
1	—	—
2	-10	-12, -9.0	<0.001
Group * as.factor(Reading_number)
Radiologist * 2	-4.0	-6.4, -1.6	0.001
¹ CI = Confidence Interval

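Without AI, radiologists took about 16 s more per image than non-radiologists (95% CI 15 to 18, p < 0.001). With AI, non-radiologists were about 10 s per image faster (95% CI -12 to -9.0, p < 0.001). The interaction (-4.0 s, 95% CI -6.4 to -1.6, p = 0.001) means that radiologists gained about 4 s per image more from AI than non-radiologists did, roughly 14 s per image in total.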
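In a `Group * Reading_number` model, the AI effect for non-radiologists is the `Reading_number2` coefficient, and the AI effect for radiologists is that coefficient plus the interaction. The toy data below are hypothetical, with cell means chosen to mirror the estimates above (16, -10 and -4 s), so the arithmetic can be checked against the raw cell means.

```r
# Hypothetical seconds per image; cell means are 40, 30 (non-radiologists)
# and 56, 42 (radiologists) for readings 1 and 2
toy <- data.frame(
  Group = rep(c("Non Radiologist", "Radiologist"), each = 4),
  Reading_number = factor(rep(c(1, 1, 2, 2), times = 2)),
  Sec_per_image = c(39, 41, 29, 31, 55, 57, 41, 43)
)

fit <- lm(Sec_per_image ~ Group * Reading_number, data = toy)
b <- coef(fit)

# AI effect per group: main effect, or main effect plus interaction
ai_effect_nonrad <- b[["Reading_number2"]]
ai_effect_rad    <- b[["Reading_number2"]] + b[["GroupRadiologist:Reading_number2"]]

# The same quantities read directly from the cell means
cell_means <- with(toy, tapply(Sec_per_image, list(Group, Reading_number), mean))
c(nonrad = ai_effect_nonrad, rad = ai_effect_rad)
```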

Effect on Time

Characteristic	Aided Read, N = 1,466¹	Unaided Read, N = 1,414¹
Group
Non Radiologist	1,040 (71%)	1,012 (72%)
Radiologist	426 (29%)	402 (28%)
Sec_per_image	26 (20, 35)	30 (24, 51)
¹ n (%); Median (IQR)

Reading_type	mean_rating	mean_time
Aided Read	0.8100955	27.90256
Unaided Read	0.8104668	39.21976


    Welch Two Sample t-test

data:  Rating by Reading_type
t = -0.038465, df = 2868, p-value = 0.9693
alternative hypothesis: true difference in means between group Aided Read and group Unaided Read is not equal to 0
95 percent confidence interval:
 -0.01929654  0.01855402
sample estimates:
  mean in group Aided Read mean in group Unaided Read 
                 0.8100955                  0.8104668


    Welch Two Sample t-test

data:  Sec_per_image by Reading_type
t = -18.406, df = 1761.3, p-value < 2.2e-16
alternative hypothesis: true difference in means between group Aided Read and group Unaided Read is not equal to 0
95 percent confidence interval:
 -12.52314 -10.11126
sample estimates:
  mean in group Aided Read mean in group Unaided Read 
                  27.90256                   39.21976
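`t.test()` runs Welch's test by default (`var.equal = FALSE`). This sketch recomputes the t statistic and the Welch-Satterthwaite degrees of freedom by hand with a hypothetical helper `welch_t()` on illustrative data, and checks them against `t.test()`. With the formula interface, the difference is taken as the first factor level minus the second (`Aided Read` minus `Unaided Read`), which is why both t statistics above are negative.

```r
# Illustrative seconds per image for two reading conditions
aided   <- c(20, 25, 26, 30, 35, 22)
unaided <- c(30, 45, 28, 51, 40, 33, 60)

welch_t <- function(x, y) {
  se2_x <- var(x) / length(x)  # squared standard error of mean(x)
  se2_y <- var(y) / length(y)  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df_ws <- (se2_x + se2_y)^2 /
    (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))
  p_value <- 2 * pt(-abs(t_stat), df_ws)  # two-sided p-value
  c(t = t_stat, df = df_ws, p = p_value)
}

manual  <- welch_t(aided, unaided)
builtin <- t.test(aided, unaided)
manual
```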

Characteristic	Time: Beta	Time: 95% CI¹	Time: p-value	Rating: Beta	Rating: 95% CI¹	Rating: p-value
Reading_type
Aided Read	—	—		—	—
Unaided Read	12	11, 13	<0.001	0.01	-0.01, 0.02	0.3
Group
Non Radiologist	—	—		—	—
Radiologist	15	13, 16	<0.001	0.02	0.00, 0.04	0.015
Age	0.00	-0.04, 0.03	0.8	0.00	0.00, 0.00	0.017
Gender
F	—	—		—	—
M	-0.57	-1.9, 0.72	0.4	-0.01	-0.03, 0.00	0.091
¹ CI = Confidence Interval

Plot models

Explore the change in reading time (Sec_per_image) in more detail

# A tibble: 2,880 × 3
   Metric        `Aided Read` `Unaided Read`
   <chr>                <dbl>          <dbl>
 1 Sec_per_image         35.5           NA  
 2 Sec_per_image         35.5           NA  
 3 Sec_per_image         NA             28.4
 4 Sec_per_image         NA             28.4
 5 Sec_per_image         20.2           NA  
 6 Sec_per_image         20.2           NA  
 7 Sec_per_image         NA             53.3
 8 Sec_per_image         NA             53.3
 9 Sec_per_image         NA             45.6
10 Sec_per_image         NA             45.6
# ℹ 2,870 more rows
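In the output above, each row has a value in only one column, because `row_number()` gives every read its own row. Pairing the aided and unaided read of the same reader and case needs a shared key. This sketch joins on `ReaderID` and `CaseID` with base `merge()` on hypothetical toy data. Whether those two columns uniquely identify one aided and one unaided read in the real dataset is an assumption; the repeated values in the output above suggest there may be more than one row per pair.

```r
# Hypothetical reads: each reader reads each case once unaided and once aided
reads <- data.frame(
  ReaderID      = c(1, 1, 1, 1, 2, 2, 2, 2),
  CaseID        = c(10, 10, 11, 11, 10, 10, 11, 11),
  Reading_type  = rep(c("Unaided Read", "Aided Read"), times = 4),
  Sec_per_image = c(40, 30, 55, 35, 28, 26, 45, 30)
)

# Split by reading type, then join on the shared key so each row is one pair
unaided <- subset(reads, Reading_type == "Unaided Read",
                  select = c(ReaderID, CaseID, Sec_per_image))
aided   <- subset(reads, Reading_type == "Aided Read",
                  select = c(ReaderID, CaseID, Sec_per_image))
paired  <- merge(unaided, aided, by = c("ReaderID", "CaseID"),
                 suffixes = c("_unaided", "_aided"))

# Per-pair change in seconds per image (negative = faster with AI)
paired$change <- paired$Sec_per_image_aided - paired$Sec_per_image_unaided
paired

# Paired t-test on the matched reads
t.test(paired$Sec_per_image_aided, paired$Sec_per_image_unaided, paired = TRUE)
```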

---
title: "Data Analysis Manal"
date: 2023-09-01
date-modified: last-modified
format:
  html:
    toc: true
    toc-expand: 3
    code-fold: true
    code-tools: true
editor: visual
execute:
  echo: false
  warning: false
  message: false
---

# Packages

```{r}
pacman::p_load(tidyverse, janitor, kableExtra, sjPlot, patchwork,
               santoku, # to cut the data
               gtsummary, here)
```

```{r}
theme_set(theme_minimal())
```

# Dataset

```{r}
df <- read_csv(here("data", "df.csv"))
```

```{r}
# head(df)
```

Create new variables for experience

```{r}
df <- df |>
  mutate(Experience_5 = chop_evenly(Experience, intervals = 5)) |>   # five equal-width bins
  mutate(Experience_equally = chop_equally(Experience, groups = 2))  # two equal-size groups
```

## Merge the other specialists

Because each non-radiologist specialty has only a few readers, all non-radiologists are merged into a single group for a more robust comparison.

```{r}
df <- df |>
  mutate(Group = case_when(
    Group == "Radiologist" ~ "Radiologist",
    TRUE ~ "Non Radiologist"
  ))
```

```{r}
# One row per reader for the reader-level summary
df |>
  mutate(ReaderID = as.factor(ReaderID)) |>
  mutate(Age = as.integer(Age)) |>
  mutate(Experience = as.integer(Experience)) |>
  distinct(ReaderID, .keep_all = TRUE) |>
  select(Group,
         # Experience,
         "Experience (years)" = Experience_5,
         # Experience_equally,
         Gender) |>
  gtsummary::tbl_summary()
```

It is preferable to split experience into two groups: 3-8 and 8-40 years.

## How many readers?

```{r}
n_distinct(df$ReaderID)
```

## How many cases (distinct CaseID)?

```{r}
n_distinct(df$CaseID)
```

## How many distinct LesionID values?

```{r}
n_distinct(df$LesionID)
```

## Table 1. Descriptive statistics of the dataset

```{r}
df |>
  filter(Reading_type == "Unaided Read") |>
  drop_na(True_status) |>  # drop rows where True_status is NA
  distinct(CaseID, .keep_all = TRUE) |>
  select(
    # CaseID,
    # LesionID,
    Rating, Extent, Location, Treatment, True_status) |>
  mutate(Rating = case_when(
    Rating != 1 ~ "Other than 1",  # every value except 1, including values below 1
    TRUE ~ as.character(Rating)
  )) |>
  gtsummary::tbl_summary(by = True_status)
```

# Questions

## Rating: Was there a difference in rating between the groups?
```{r}
df |>
  ggplot(aes(x = Group, y = Rating, color = as.factor(Reading_number))) +
  geom_violin()
```

```{r}
gtsummary::tbl_regression(lm(Rating ~ Group * as.factor(Reading_number), data = df))
```

Radiologists gave higher ratings than non-radiologists (Beta = 0.09, 95% CI 0.06 to 0.12, p < 0.001). Ratings did not change detectably on the second reading with AI (Beta = 0.01, p = 0.6), and there was no evidence of an interaction between group and AI (p = 0.3).

## Time: Was there a difference in reading time between the groups?

```{r}
df |>
  mutate(Reading_number = recode(as.factor(Reading_number),
                                 "1" = "No AI",
                                 "2" = "With AI")) |>
  ggplot(aes(x = Group, y = Sec_per_image, fill = Reading_number)) +
  geom_violin() +
  labs(title = "Differences in reading time by group and use of AI",
       y = "Sec per image", x = "Group", fill = "Use of AI")
```

```{r}
gtsummary::tbl_regression(lm(Sec_per_image ~ Group * as.factor(Reading_number), data = df))
```

Without AI, radiologists took about 16 s more per image than non-radiologists (95% CI 15 to 18, p < 0.001). With AI, non-radiologists were about 10 s per image faster (95% CI -12 to -9.0, p < 0.001). The interaction (-4.0 s, 95% CI -6.4 to -1.6, p = 0.001) means that radiologists gained about 4 s per image more from AI than non-radiologists did, roughly 14 s per image in total.
### Effect on Time

```{r}
df |>
  ggplot(aes(x = Sec_per_image)) +
  geom_histogram(bins = 5) +
  facet_grid(Reading_type ~ Group) +
  labs(title = "Effect on time", x = "Sec per image", y = "Readings") +
  scale_y_log10()
```

```{r}
df |>
  select(Reading_type, Group, Sec_per_image) |>
  gtsummary::tbl_summary(by = Reading_type)
```

```{r}
df |>
  group_by(Reading_type) |>
  summarise(mean_rating = mean(Rating, na.rm = TRUE),
            mean_time = mean(Sec_per_image, na.rm = TRUE)) |>
  kbl() |>
  kable_styling()
```

```{r}
# Welch t-test for Rating (t.test uses var.equal = FALSE by default)
t.test(Rating ~ Reading_type, data = df)
```

```{r}
# Welch t-test for Sec_per_image
t.test(Sec_per_image ~ Reading_type, data = df)
```

```{r}
# Linear model for Rating
lm_rating <- lm(Rating ~ Reading_type + Group + Age + Gender, data = df)
# gtsummary::tbl_regression(lm_rating) |>
#   as_gt() |>
#   gt::tab_header(title = "Effect on Ratings")
```

```{r}
# Store the formatted table separately so lm_rating stays a model object
tbl_rating <- gtsummary::tbl_regression(lm_rating)
```

```{r}
# Linear model for Sec_per_image
lm_time <- lm(Sec_per_image ~ Reading_type + Group + Age + Gender, data = df)
# summary(lm_time)
```

```{r}
tbl_time <- gtsummary::tbl_regression(lm_time)
```

```{r}
tbl_merge(
  tbls = list(tbl_time, tbl_rating),
  tab_spanner = c("**Time**", "**Rating**")
)
```

## Plot models

```{r}
# plot_model(lm(Rating ~ Reading_type + Group + Age + Gender, data = df),
#            colors = "black")
```

```{r}
# plot_model(lm(Sec_per_image ~ Reading_type + Group + Age + Gender, data = df),
#            colors = "black")
```

```{r}
# Reshape the data to long format: one row per read and metric
df |>
  select(Reading_type, Sec_per_image, Rating) |>
  pivot_longer(cols = c(Sec_per_image, Rating),
               names_to = "Metric", values_to = "Value") |>
  # Make the plot
  ggplot(aes(x = Reading_type, y = Value)) +
  geom_boxplot() +
  facet_wrap(~ Metric, scales = "free") +
  labs(title = "Impact of Reading_type on Sec_per_image and Rating",
       x = "Reading type", y = "Value") +
  theme_minimal()
```

Explore the change in reading time (Sec_per_image) in more detail

```{r}
df |>
  select(Reading_type, Sec_per_image, Rating) |>
  pivot_longer(cols = c(Sec_per_image, Rating),
               names_to = "Metric", values_to = "Value") |>
  filter(Metric == "Sec_per_image") |>
  # row_number() makes every row unique, so each row holds a value in only one
  # of the two columns; pairing aided and unaided reads would need a shared key
  mutate(row = row_number()) |>
  pivot_wider(names_from = Reading_type, values_from = Value) |>
  select(-row)
```

#