10% of patients showed volume loss of less than 15%, which means that the final implant volume is more than anticipated. 20% of patients showed volume loss of more than 35% which means that the final implant volume is considerably less than anticipated and 5% of patients lost 45% or more of the initial volume. Furthermore, when analysing symmetry of volume loss we can see that 20% of patients had more than 7% difference in volume loss for one of the sides, this difference can cause asymmetrical malar zones.
# A tibble: 1 × 1
change
<dbl>
1 4.44
Correct answer: 4.4% of patients showed volume loss of less than 15%
20% of patients showed volume loss of more than 35%
# A tibble: 1 × 1
change
<dbl>
1 13.3
Correct answer: 13.3% of patients showed volume loss of more than 35%
5% of patients lost 45% or more of the initial volume
# A tibble: 1 × 1
change
<dbl>
1 4.44
Correct answer: 4.4% of patients showed volume loss of more than 45%
Furthermore, when analysing symmetry of volume loss we can see that 20% of patients had more than 7% difference in volume loss for one of the sides,
Correct answer: Furthermore, when analysing symmetry of volume loss we can see that 17.7% of patients had more than 7% difference in volume loss for one of the sides,
Source Code
---title: "02_EDA"author: "Sergio Uribe"date-modified: last-modifiedformat: html: toc: truetoc-expand: 3code-fold: truecode-tools: trueeditor: visualexecute: echo: false cache: false warning: false message: false---# Packages```{r}pacman::p_load(tidyverse, # tools for data science visdat, #NAs scales, here, skimr, # EDA gtsummary, janitor)``````{r}theme_set(theme_minimal())``````{r}#| output: falsehere::here()```# Dataset```{r}df <-read_csv(here("data", "df.csv"))```## Preliminary EDA```{r}skimr::skim(df)```# Simplify the dataset```{r}# df |> # # create all by side and time # pivot_longer(right_t1:left_t3, names_to = "side_time", values_to = "side_volume") |># pivot_longer(t1:t3, names_to = "t_time", values_to = "t_volume") |># pivot_longer(right_t1_abs:left_t3_abs, names_to = "side_time_abs", values_to = "side_volume_abs") |># pivot_longer(t1_abs:t3_abs, names_to = "t_time_abs", values_to = "t_volume_abs")``````{r}#| output: falsedf_long <- df |>pivot_longer(cols =starts_with("right_") |starts_with("left_"),names_to ="side_time",values_to ="side_volume") |>pivot_longer(cols =matches("^[t][1-3]_?abs?$"),names_to ="time_abs",values_to ="volume_abs") |>pivot_longer(t1:t3, names_to ="time", values_to ="volume") ```# Exploratory Data Analysis# DEMOGRAPHICS## Table 1 Patients and age```{r}df |>select(Age = age, Sex = sex) |> gtsummary::tbl_summary()```# ABSOLUTE CHANGES## Absolute Change in Volume by Time```{r}df |>select(id:age, t1:t3) |># calculate the change in volumemutate(t1 = t1, t2 = t1 - t2,t3 = t2 - t3) |>pivot_longer(-c(id:age), names_to ="time", values_to ="value") |>mutate(time =recode(time, t1 ="1 week",t2 ="4 months",t3 ="12 months")) |>mutate(time =fct_relevel(time, "1 week", "4 months", "12 months")) |># # calculate the relative change# group_by(id) %>%# arrange(id, time) %>%# mutate(relative_change = (value / lag(value) - 1) * 100) %>%# ungroup() |> # # # filter(!is.na(relative_change)) |># select(-c(value)) |> ggplot(aes(x = time, y = value,# size = age, color = sex, # by sexgroup = id)) +geom_line(alpha = .3) +geom_point(alpha = .3) +labs(title ="Absolute Change in Volume by Time",x ="Time",color ="Sex", y ="Volume (mm3)") +# facet_grid(. ~ sex ) + # by sexscale_y_continuous(labels = scales::comma) # scale_y_log10(labels = scales::comma) ```### Absolute changes by time violin plot```{r}df |>select(id:age, t1:t3) |># calculate the change in volumemutate(t1 = t1, t2 = t1 - t2,t3 = t2 - t3) |>pivot_longer(-c(id:age), names_to ="Time") |>mutate(Time =recode(Time, t1 ="1 week",t2 ="4 months",t3 ="12 months")) |>mutate(Time =fct_relevel(Time, "1 week", "4 months", "12 months")) |>ggplot(aes(x = Time,# color = sex,# group = id, y = value)) +geom_violin() +geom_jitter(alpha = .2, width = .2) +geom_boxplot(width = .1, alpha =0.5) +# geom_line(aes(group = id, color = as.factor(id)), alpha = 0.1, size = 0.7) + # scale_color_viridis_d(begin = 0.5, end = 0.9, option = "C") +# geom_line(aes(group = id), alpha = 0.1) + # this line shows each patient. Check the outliers shown in 12scale_y_continuous(labels = scales::comma) +labs(title ="Absolute Changes in Volume by Time",x ="Time",y ="Total Bilateral Volume (mm3)") ``````{r}df |>select(id:age, t1:t3) |># calculate the change in volumemutate(t1 = t1, t2 = t1 - t2,t3 = t2 - t3) |>pivot_longer(-c(id:age), names_to ="Time") |>group_by(Time) |>summarise(n =n(), median =median(value, na.rm =TRUE),range =IQR(value, na.rm =TRUE),max =max(value, na.rm =TRUE),min =min(value, na.rm =TRUE),mean =mean(value, na.rm =TRUE), sd =sd(value, na.rm =TRUE)) |>mutate_if(is.numeric, round, 2) |> knitr::kable(caption ="Absolute Changes in Volume by Time")``````{r}# Graph Absolute change in volume by time# df |> # select(id, sex, age, t1_abs:t3_abs) |> # pivot_longer(-c(id:age)) |> # mutate(name = recode(name, # t1_abs = "1 week",# t2_abs = "4 months",# t3_abs = "12 months")) |> # group_by(name) |> # summarise(n = n(), # mean = mean(value), # sd = sd(value)) |> # mutate_if(is.numeric, # ~round(., 2)) |> # knitr::kable(caption = "Absolute Changes in Volume by Time")``````{r}# df |> # select(id, sex, age, t1_abs:t3_abs) |> # pivot_longer(-c(id:age)) |> # mutate(name = recode(name, # t1_abs = "1 week",# t2_abs = "4 months",# t3_abs = "12 months")) |> # ggplot(aes(x = name,# # color = sex, # y = value)) +# geom_violin() + # geom_jitter(alpha = .2, width = .2) + # geom_boxplot(width = .1, alpha = 0.5) +# scale_y_continuous(labels = scales::comma) + # labs(title = "Absolute Changes in Volume by Time",# x = "Time",# y = "Volume (mm3)")```# RELATIVE CHANGES## Relative change in volume by time```{r}df |>select(id, sex, t1_abs:t3_abs) |># create the relative change variablemutate(relative_change = ((t3_abs - t1_abs) / t1_abs)*100) |>ggplot(aes(x = relative_change)) +geom_histogram(bins =20) +labs(title ="Relative Change in Volume by Time",x ="Relative Change",y ="Frequency") +scale_x_continuous(labels = scales::percent_format(scale =1)) +xlim(-75, 0) ```## Relative change in volume by time```{r}df |>select(id, sex, age, t1:t3) |># what is the volume at each time point?mutate(t2 = t1 - t2, t3 = t2 - t3) |># relative changemutate(t2 = (t2 / t1) *100, t3 = (t3 / t1) *100) |>mutate(t1 =100) |>pivot_longer(-c(id:age), names_to ="time") |>mutate(time =recode(time, t1 ="1 week",t2 ="4 months",t3 ="12 months") ) |>mutate(time =fct_relevel(time, "1 week", "4 months")) |>ggplot(aes(x = time, y = value, group = id)) +geom_line(alpha = .1) +geom_jitter(alpha = .2, width = .2) +labs(title ="Relative Change in Volume by Time",x ="Time",y ="Relative Change") +scale_y_continuous(labels = scales::percent_format(scale =1)) ```### Table Relative Change in Volume by Time```{r}df |>select(id, sex, age, t1:t3) |># what is the volume at each time point?mutate(t2 = t1 - t2,t3 = t2 - t3) |># relative changemutate(t2 = (t2 / t1) *100,t3 = (t3 / t1) *100) |>mutate(t1 =100) |>pivot_longer(-c(id:age),names_to ="time") |>mutate(time =recode(time,t1 ="1 week",t2 ="4 months",t3 ="12 months")) |>mutate(time =fct_relevel(time,"1 week","4 months")) |>group_by(time) |>summarise(n =n(),mean =mean(value, na.rm =TRUE),sd =sd(value, na.rm =TRUE),n =n(),se = sd /sqrt(n), # Calculate Standard Errorlower_ci = mean -1.96* se, # Lower bound of the 95% CIupper_ci = mean +1.96* se ) |># Upper bound of the 95% CI mutate_if(is.numeric, round, 1) |> knitr::kable(caption ="Relative Change in Volume by Time")```## Relative change by Sex```{r}df |>select(id, Sex = sex, t1_abs:t3_abs) |># create the relative change variablemutate(relative_change = ((t3_abs - t1_abs) / t1_abs)*100) |>ggplot(aes(y = relative_change,x = Sex)) +geom_violin() +geom_jitter(alpha = .2, width = .2) +geom_boxplot(width = .1, alpha =0.5) +labs(title ="Relative Change at 12 months in Volume", x ="Sex", y ="Relative Change") +scale_y_continuous(labels = scales::percent_format(scale =1)) +ylim(-75, 0)``````{r}## Difference in relative change by sex?# df |> # select(Sex = sex, # t1_abs:t3_abs) |># # create the relative change variable# mutate(relative_change = ((t3_abs - t1_abs) / t1_abs)*100) |> # select(Sex, relative_change) |># gtsummary::tbl_summary(by = Sex) |> # gtsummary::add_ci() |> # gtsummary::add_p()``````{r}# df |> # select(Sex = sex, # t1_abs:t3_abs) |># # create the relative change variable# mutate(relative_change = ((t3_abs - t1_abs) / t1_abs)*100) |> # select(Sex, relative_change) |> # with(lm(relative_change ~ Sex)) |> # gtsummary::tbl_regression()```## Relative change by Age```{r}df |>select(id, Age = age, t1_abs:t3_abs) |># create the relative change variablemutate(relative_change = ((t3_abs - t1_abs) / t1_abs)*100) |>ggplot(aes(y = relative_change,x = Age)) +geom_point() +geom_smooth() +labs(title ="Relative Change at 12 months in Volume by Age", x ="Age", y ="Relative Change") ```Difference in relative_change by age?```{r}# df |> # select(age, # t1_abs:t3_abs) |># # create the relative change variable# mutate(relative_change = ((t3_abs - t1_abs) / t1_abs)*100) |> # select(age, relative_change) |> # with(lm(relative_change ~ age)) |> # gtsummary::tbl_regression()```# REGRESSION ANALYSIS## Table 2 Difference in relative change by age and sex?```{r}df |>select(age, sex, t1_abs:t3_abs) |># create the relative change variablemutate(relative_change = ((t3_abs - t1_abs) / t1_abs)*100) |>select(Age = age, Sex = sex, relative_change) |>with(lm(relative_change ~ Age + Sex + Sex:Age)) |> gtsummary::tbl_regression()```# ANALYSIS BY SIDE```{r}df |>select(id:age, right_t1_abs = right_t1, right_t2_abs:left_t3_abs) |>pivot_longer(-c(id:age), names_to ="side_time_abs", values_to ="value") |>separate(side_time_abs, into =c("side", "time", "abs"), sep ="_", remove =TRUE) |>mutate(time =recode(time,t1 ="1 week",t2 ="4 months",t3 ="12 months")) |>mutate(time=fct_relevel(time,"1 week","4 months", "12 months")) |>select(-abs) |>ggplot(aes(x = time, y = value,# color = sex, group = id)) +geom_line(alpha = .3) +geom_point(alpha = .3) +scale_y_continuous(labels = scales::comma) +facet_grid(. ~ side ) +labs(title ="Change in Volume by Time and Side",x ="Time",y ="Volume (mm3)")```# Additional```{r}df |>select(id, t1:t3) |># what is the volume at each time point?mutate(t2 = t1 - t2,t3 = t2 - t3) |># relative changemutate(t2 = (t2 / t1) *100,t3 = (t3 / t1) *100) |>mutate(t1 =100) |>arrange(desc(t3)) |># ggplot(aes(x = t3)) + # geom_histogram(bins = 10) + # xlim(c(0, 100))summarise(across(t3, quantile, probs =seq(0, 1, 0.1))) ```10% of patients showed volume loss of less than 15%, which means that the final implant volume is more than anticipated. 20% of patients showed volume loss of more than 35% which means that the final implant volume is considerably less than anticipated and 5% of patients lost 45% or more of the initial volume. Furthermore, when analysing symmetry of volume loss we can see that 20% of patients had more than 7% difference in volume loss for one of the sides, this difference can cause asymmetrical malar zones.```{r}total_rows <-nrow(df)df |>mutate(percentage_change = (abs((t3_abs - t1_abs) / t1_abs)) *100) |>select(id, percentage_change) |>filter(percentage_change <15) |>summarise(change =n() / total_rows *100)```**Correct answer: 4.4% of patients showed volume loss of less than 15%**20% of patients showed volume loss of more than 35%```{r}df |>mutate(percentage_change = (abs((t3_abs - t1_abs) / t1_abs)) *100) |>select(id, percentage_change) |>filter(percentage_change >35) |>summarise(change =n() / total_rows *100)```**Correct answer: 13.3% of patients showed volume loss of more than 35%**5% of patients lost 45% or more of the initial volume```{r}df |>mutate(percentage_change = (abs((t3_abs - t1_abs) / t1_abs)) *100) |>select(id, percentage_change) |>filter(percentage_change >45) |>summarise(change =n() / total_rows *100)```**Correct answer: 4.4% of patients showed volume loss of more than 45%**Furthermore, when analysing symmetry of volume loss we can see that 20% of patients had more than 7% difference in volume loss for one of the sides,1. calculate the difference in volume per side```{r} df |>mutate(right_change =abs((right_t3_abs - right_t1_abs)) / right_t1_abs *100,left_change =abs((left_t3_abs - left_t1_abs)) / left_t1_abs *100) |>select(id, right_change, left_change)```Now compare the difference between sides```{r}df |>mutate(right_change =abs((right_t3_abs - right_t1_abs)) / right_t1_abs *100,left_change =abs((left_t3_abs - left_t1_abs)) / left_t1_abs *100) |>select(id, right_change, left_change) |>mutate(difference_between_sides =abs(right_change - left_change)) |>select(id, difference_between_sides)```Now count how many \> 7%```{r}df |>mutate(right_change =abs((right_t3_abs - right_t1_abs)) / right_t1_abs *100,left_change =abs((left_t3_abs - left_t1_abs)) / left_t1_abs *100) |>select(id, right_change, left_change) |>mutate(difference_between_sides =abs(right_change - left_change)) |>select(id, difference_between_sides) |>filter(difference_between_sides >7) |>summarise(change =n() / total_rows *100)```**Correct answer: Furthermore, when analysing symmetry of volume loss we can see that 17.7% of patients had more than 7% difference in volume loss for one of the sides,**