[1] "/home/sergiouribe/Insync/sergio.uribe@gmail.com/Google Drive - Shared with me/Epidemiologijas petijumi/2023_Publication_epidemiology/2023_analysis_epidemiology_Latvia_ilze_12_15"
Dataset for EPi
Load data for Epidemiological Analysis
Now subset only ages 12 and 15
Labelling
SIC caries index - Significant Caries Index
SIC 12 years old caries
SIC 15 years old caries
Table SIC results
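SiC Index for 12 and 15-Year-Olds

| Caries Threshold | 12-Year-Olds | 15-Year-Olds |
|------------------|--------------|--------------|
| D1MFS            | 17.37        | 25.76        |
| D1MFT            | 10.58        | 14.75        |
| D3MFS            | 8.24         | 14.79        |
| D3MFT            | 5.08         | 8.93         |
| D5MFS            | 6.67         | 12.38        |
| D5MFT            | 3.86         | 7.18         |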
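
Each value in the table is the mean score of roughly the third of the age group with the highest caries scores. As a minimal sketch of that idea, assuming a plain numeric vector of scores (the values below are hypothetical, not taken from the study data) and a count-based top third (which may differ by an observation from the quantile-based cut-off used in the report's own functions, depending on group size):

```r
# Hypothetical DMFT scores for nine children (illustrative values only)
scores <- c(0, 0, 1, 2, 3, 4, 6, 8, 10)

# Sketch of a SiC calculation: mean score of the third of the group
# with the highest scores. `sic_index` is an illustrative helper name.
sic_index <- function(x) {
  x <- sort(x[!is.na(x)], decreasing = TRUE) # drop missing values, highest first
  if (length(x) == 0) return(NA_real_)       # nothing to summarise
  n_top <- ceiling(length(x) / 3)            # size of the top third, rounded up
  mean(x[seq_len(n_top)])
}

sic_value <- sic_index(scores)  # mean of the three highest scores (10, 8, 6)
mean_value <- mean(scores)      # overall mean, for comparison
```

With these toy values `sic_value` is 8 while the overall mean is about 3.78, which illustrates why the SiC Index is reported alongside the mean: it isolates the burden carried by the most affected third of the group.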
Source Code
---
title: "05 SIC caries Index Epidemiology Latvia 2023"
date: 2025-02-04
date-modified: last-modified
date-format: "MMM D, YYYY, HH:mm"
theme: default
format:
  # docx: default
  # pdf: default
  html:
    toc: true
    toc-location: left
    embed-resources: true
toc-expand: 1
code-fold: true
code-tools: true
editor: source
execute:
  echo: false
  cache: true
  warning: false
  message: false
---

# PACKAGES & DATASETS

```{r}
pacman::p_load(
  tidyverse,
  performance, # Assessment of Regression Models Performance
  # expss, # for the labels
  # haven, # to keep the labels names
  # Hmisc, # for the labels
  # sjlabelled, # for the labels
  labelled, # for the labels using purrr
  grid, # for arrow results in the SIC section
  gt, # for the tables
  knitr,
  tidymodels,
  sjPlot,
  scales,
  gtsummary,
  irr, # for agreement calculations
  patchwork, # for several plots
  viridis, # colour palette
  janitor,
  here
)
```

```{r}
theme_set(theme_minimal())
```

```{r}
here::here()
```

## Dataset for EPi

Load data for Epidemiological Analysis

```{r}
df <- read_rds(here("data", "df.rds")) # the rds file with the correct levels
```

```{r}
# Will subset the dataset, only works with the csv file, not the rds
# df <- df |>
#   select(-c(`17 [Distal]`:`Sealants[ 47]`))
```

Now subset only ages 12 and 15

```{r}
df <- df |>
  filter(Age %in% c("12", "15"))
```

## Labelling

```{r}
## For labelled package
df <- set_variable_labels(
  df,
  Count = "Skaits",
  `Examen date` = "Egzāmena datums",
  `Examen time` = "Egzāmena laiks",
  Age = "Vecums",
  `Examiner code` = "Egzaminatora kods",
  Region = "Reģions",
  `School code` = "Skolas kods",
  `Child code` = "Bērna kods",
  Examination = "Pārbaude",
  Gender = "Dzimums",
  `Date of birth` = "Dzimšanas datums",
  `Language spoken iat home` = "Mājās runātā valoda",
  `Place of living` = "Dzīvesvieta",
  `Toothbrushing frequency` = "Zobu sukošanas biežums",
  Toothpaste = "Zobu pasta",
  `Daily sugary drinks` = "Ikdienu saldētie dzērieni",
  `Daily sweets` = "Ikdienu saldumi",
  `Annual dental / dental hygiene visits` = "Ikgadējās zobārstniecības / zobu higiēnas vizītes",
  `Smoking or other tobacco at least once per week` = "Smēķēšana vai citu tabakas izstrādājumu lietošana vismaz reizi nedēļā",
  `Visible plaque` = "Redzams zobu aplikums",
  Smoking = "Smēķēšana"
)
```

## SIC caries index - Significant Caries Index

```{r}
caries <- df |>
  select(Gender, Age, D1MFS:D5MFT)
```

```{r}
# Function to calculate and plot SiC Index for a given age group and caries threshold
plot_sic_index <- function(data, age, caries_column) {
  # Ensure caries_column is a valid column name
  if (!caries_column %in% names(data)) {
    stop("Invalid caries column name. Choose from: D1MFS, D3MFS, D5MFS, D1MFT, D3MFT, D5MFT")
  }

  # Filter data for selected age and caries column
  df <- data |>
    filter(Age == age) |>
    select(all_of(caries_column)) |>
    arrange(.data[[caries_column]])

  # Rename column dynamically for consistent referencing
  colnames(df) <- c("Caries_Score")

  # If there are no valid values, return an error
  if (nrow(df) == 0) {
    stop("No data available for the selected age group.")
  }

  # Calculate cumulative frequency distribution
  df <- df |>
    mutate(Cumulative_Percent = (row_number() / n()) * 100)

  # Identify the SiC Index threshold (top 1/3)
  sic_threshold <- quantile(df$Cumulative_Percent, probs = 2 / 3, na.rm = TRUE)

  # Calculate the SiC Index as the mean of the highest 1/3
  sic_index <- df |>
    filter(Cumulative_Percent >= sic_threshold) |>
    summarise(SiC = mean(Caries_Score, na.rm = TRUE)) |>
    pull(SiC)

  # Generate the plot
  p <- ggplot(df, aes(x = Cumulative_Percent, y = Caries_Score)) +
    # Shaded area below the step function
    geom_ribbon(aes(ymin = 0, ymax = Caries_Score), fill = "black", alpha = 0.2) +
    # Step plot for cumulative distribution
    geom_step(size = 1.2, color = "black") +
    # Vertical line marking the SiC Index threshold
    geom_vline(xintercept = sic_threshold, linetype = "solid", color = "black", size = 1.2) +
    # Label and double-headed arrow to indicate the SiC range
    annotate("text",
      x = sic_threshold + 5,
      y = max(df$Caries_Score, na.rm = TRUE) - 1,
      label = "SIC", fontface = "bold", hjust = 0
    ) +
    annotate("segment",
      x = sic_threshold, xend = 100,
      y = max(df$Caries_Score, na.rm = TRUE) - 2,
      yend = max(df$Caries_Score, na.rm = TRUE) - 2,
      arrow = arrow(type = "closed", ends = "both", length = unit(0.3, "cm")),
      color = "black"
    ) +
    # Labels and theme
    labs(
      title = paste("Cumulative Distribution of", caries_column, "for", age, "Year-Olds"),
      x = "Cumulative Percentage of Group",
      y = paste(caries_column, "Score")
    ) +
    theme_minimal()

  return(p) # Explicitly return the plot
}
```

### SIC 12 years old caries

```{r}
plot_sic_index(caries, age = 12, caries_column = "D1MFS")
```

```{r}
plot_sic_index(caries, age = 12, caries_column = "D3MFS")
```

```{r}
plot_sic_index(caries, age = 12, caries_column = "D5MFS")
```

```{r}
plot_sic_index(caries, age = 12, caries_column = "D1MFT")
```

```{r}
plot_sic_index(caries, age = 12, caries_column = "D3MFT")
```

```{r}
plot_sic_index(caries, age = 12, caries_column = "D5MFT")
```

### SIC 15 years old caries

```{r}
plot_sic_index(caries, age = 15, caries_column = "D1MFS")
```

```{r}
plot_sic_index(caries, age = 15, caries_column = "D3MFS")
```

```{r}
plot_sic_index(caries, age = 15, caries_column = "D5MFS")
```

```{r}
plot_sic_index(caries, age = 15, caries_column = "D1MFT")
```

```{r}
plot_sic_index(caries, age = 15, caries_column = "D3MFT")
```

```{r}
plot_sic_index(caries, age = 15, caries_column = "D5MFT")
```

## Table SIC results

```{r}
# Function to calculate SiC Index (without plotting)
calculate_sic_index <- function(data, age, caries_column) {
  # Ensure caries_column exists in the dataset
  if (!caries_column %in% colnames(data)) {
    stop("Invalid caries column name.")
  }

  # Filter data for selected age and caries column
  df <- data |>
    filter(Age == age) |>
    select(all_of(caries_column)) |>
    arrange(.data[[caries_column]])

  # Rename column dynamically
  colnames(df) <- c("Caries_Score")

  # If there are no rows for this age group, return NA
  if (nrow(df) == 0) {
    return(NA)
  }

  # Calculate cumulative frequency distribution
  df <- df |>
    mutate(Cumulative_Percent = (row_number() / n()) * 100)

  # Identify the SiC Index threshold (top 1/3)
  sic_threshold <- quantile(df$Cumulative_Percent, probs = 2 / 3, na.rm = TRUE)

  # Calculate the SiC Index as the mean of the highest 1/3
  sic_index <- df |>
    filter(Cumulative_Percent >= sic_threshold) |>
    summarise(SiC = mean(Caries_Score, na.rm = TRUE)) |>
    pull(SiC)

  return(sic_index)
}
```

```{r}
# Define caries thresholds
caries_levels <- c("D1MFS", "D3MFS", "D5MFS", "D1MFT", "D3MFT", "D5MFT")
```

```{r}
# Compute SiC Index for each age (12 & 15) and caries level
sic_table <- expand.grid(Age = c(12, 15), Caries_Level = caries_levels, stringsAsFactors = FALSE) |>
  mutate(Caries_Level = as.character(Caries_Level)) |> # Convert factor to character
  rowwise() |>
  mutate(SiC_Index = calculate_sic_index(caries, Age, Caries_Level)) |>
  ungroup()
```

```{r}
# Reshape the table: Caries Thresholds in rows, Age in columns
sic_table_wide <- sic_table |>
  pivot_wider(names_from = Age, values_from = SiC_Index) |>
  arrange(Caries_Level)

# Display the transformed table
sic_table_wide |>
  gt() |>
  tab_header(title = "SiC Index for 12 and 15-Year-Olds") |>
  cols_label(Caries_Level = "Caries Threshold", `12` = "12-Year-Olds", `15` = "15-Year-Olds") |>
  fmt_number(columns = c("12", "15"), decimals = 2) |>
  tab_options(
    table.font.size = "medium",
    heading.title.font.size = "large"
  )
```
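
One detail of the ranking step may be worth checking. `arrange()` places missing values last, so any `NA` scores would fall inside the top third before `mean(..., na.rm = TRUE)` discards them. Whether the caries columns contain missing values is not stated here, so the following is only a sketch under that assumption. It uses hypothetical scores and base R (`order(..., na.last = TRUE)` stands in for the `arrange()` behaviour) to compare ranking with and without the missing rows:

```r
# Hypothetical scores with two missing values (illustrative only)
scores <- c(0, 1, 2, 3, 4, 5, 6, NA, NA)

# Variant 1: rank with NAs kept last, mirroring how arrange() orders rows
ranked <- scores[order(scores, na.last = TRUE)]
cum_pct <- seq_along(ranked) / length(ranked) * 100
cutoff <- quantile(cum_pct, probs = 2 / 3)
sic_with_na <- mean(ranked[cum_pct >= cutoff], na.rm = TRUE)

# Variant 2: drop NAs first, then rank only the observed scores
observed <- sort(scores[!is.na(scores)])
cum_pct_obs <- seq_along(observed) / length(observed) * 100
cutoff_obs <- quantile(cum_pct_obs, probs = 2 / 3)
sic_observed <- mean(observed[cum_pct_obs >= cutoff_obs])
```

In this toy case the first variant averages only the single highest observed score, because the two `NA` rows occupy the rest of the top third, while the second averages the three highest observed scores. If the real data have no missing caries scores, both variants give the same result and no change is needed.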