Argentina Data Analysis

1. Demographics

Variable: Description
ID: Unique identifier for the participant.
ID_genetics: Genetic identifier related to the participant’s sample or genetic profile.
cognitive: Assessment or overall score of cognitive function.
de800cog: Result obtained from a specific cognitive test (e.g., DE800 version).
images: Indicates the availability or analysis of images (e.g., neuroimaging).
DATE_STUDY: Date when the study was conducted.
BIRTH: Participant’s birth date.
AGE-2024: Participant’s age in the year 2024.
AGE-PCR: Age of the participant at the time of the PCR test.
AGE_INTERVAL: Interval between age assessments (e.g., between the study date and the PCR test).
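
AGE-PCR and AGE-2024 are presumably derived from BIRTH and a reference date. A minimal base R sketch of such a derivation follows, assuming ages are counted in completed years. The helper `age_in_years()` and the example dates are hypothetical and are not taken from the dataset.

```r
# Hypothetical helper: completed years between a birth date and a reference date.
age_in_years <- function(birth, ref) {
  birth <- as.POSIXlt(birth)
  ref <- as.POSIXlt(ref)
  years <- ref$year - birth$year
  # Subtract one year if the birthday has not yet occurred in the reference year
  before_birthday <- (ref$mon < birth$mon) |
    (ref$mon == birth$mon & ref$mday < birth$mday)
  years - as.integer(before_birthday)
}

# Illustrative values only (not study data)
birth <- as.Date(c("1955-03-10", "1960-11-30"))
date_pcr <- as.Date(c("2021-03-09", "2021-12-01"))
ages <- age_in_years(birth, date_pcr)
ages
```

Recomputing ages this way and comparing them with the stored columns is one way to catch data-entry errors in the date fields.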

2. Symptoms

Variable: Description
ID: Identifier associated with the symptom evaluation.
ANOSMIA: Indicator of loss of smell (anosmia), presented in binary format or on a scale.
Anosmia: A second measure or confirmation of the presence of anosmia.
RISK-HOSPITAL-ICU: Indicator of the risk of hospitalization or the need for ICU care due to COVID-19.
VACCINE_BEFORE_STUDY: Participant’s vaccination status prior to the study.
COVID_BEFORE_VACCINATION: History of COVID-19 infection before vaccination.
FEVER: Presence of fever.
COUGH: Presence of cough.
MUSCLE_PAIN: Presence of muscle pain (myalgia).
BREATH_DIF: Indication of breathing difficulties (dyspnea).
SMELL_LOST: Report of loss of smell.
TASTE_LOST: Report of loss of taste.
DATE_PCR: Date when the PCR test for COVID-19 was performed.
PCR: Qualitative result of the PCR test (e.g., positive or negative).
PCR_NUM: Numerical value associated with the PCR test (e.g., Ct value or viral load).
COVID-VARIANT: SARS-CoV-2 variant identified (e.g., Alpha, Delta, Omicron).
VACCINE_1: Information regarding the first dose of the vaccine (type or brand).
VACCINE_2: Information regarding the second dose of the vaccine (type or brand).
VACCINE_3: Information regarding the third dose or booster dose of the vaccine.

3. Environmental COVID

Variable: Description
ID: Identifier related to environmental data or COVID-19 exposure.
LISTAPRIMERREC: List referring to the first recognition (possibly of symptoms or initial contact).
LISTAAPRENDIZAJE: List associated with the acquisition of knowledge or learning in the context of COVID-19.
LISTACP: Indicator or list related to close contacts (CP) or similar parameters.
LISTALP: List of parameters or locations (LP), whose definition depends on the study protocol.
LISTARECON: List for the recognition of signs or symptoms related to COVID-19.
CORSIDIRECTO: Measure or score of direct correlation, possibly related to environmental factors.
CORSIINVERSO: Measure or score of inverse correlation related to the same parameters.
CACTUSCORRECTAS: Number of correct responses obtained in the “cactus” test.
CACTUSVIVOS: Number of responses classified as “living” in the “cactus” test.
CACTUSINANIM: Number of responses classified as “inanimate” in the “cactus” test.

4. Cognitive Variables

Variable: Description
OTVERBALTPO: Response time in the verbal task (OT).
OTVERBALERR: Number of errors in the verbal task (OT).
OTVISUALTPO: Response time in the visual task (OT).
OTVISUALERR: Number of errors in the visual task (OT).
OTMENTALTPO: Response time in the mental task (OT).
OTMENTALERR: Number of errors in the mental task (OT).
OTVISMENTTPO: Response time in the task combining visual and mental stimuli (OT).
OTVISMENTERR: Number of errors in the task combining visual and mental stimuli (OT).
OTSWITCHTPO: Response time in the switching task (OT evaluation).
OTSWITCHERR: Number of errors in the switching task (OT evaluation).
5DREADTPO: Reading time in the designated 5D task.
5DREADERR: Number of errors in the 5D reading task.
5DCOUNTTPO: Time taken in the counting task (5D).
5DCOUNTERR: Number of errors in the counting task (5D).
5DFOCTPO: Execution time in the focus task (5D).
5DFOCERR: Number of errors in the focus task (5D).
5DSWITCHTPO: Response time in the switching task (5D evaluation).
5DSWITCHERR: Number of errors in the switching task (5D evaluation).
DSCORR: Number of correct responses in the DS task (e.g., digit span test).
DSOMIS: Number of omissions in the DS task.
DSCOMIS: Number of commission errors (incorrect responses) in the DS task.
TORREMOV: Indicator or number of removals in a tower task (possibly related to response inhibition).
TORRETPO: Execution time in the tower task (e.g., Tower of Hanoi or similar).
BOSTONSC: Score obtained on the Boston test subscale, possibly related to naming.
BOSTONLAT: Latency in the performance of the Boston test.
BOSTONSEM: Performance in the semantic component of the Boston test.
BOSTONSEMERR: Number of errors in the semantic component of the Boston test.
BOSTONFON: Performance in the phonemic component of the Boston test.
BOSTONFONERR: Number of errors in the phonemic component of the Boston test.
FLUENCIA: Measure of verbal fluency, evaluating the ability to generate words within a specified time.

library(readxl)
library(tidyverse)
library(caret)
library(janitor)
library(DataExplorer)
library(dlookr)
library(skimr)
library(car) # For Anova()
library(effectsize) # For effect sizes
library(corrplot) # For correlation matrix visualization

# Read the Excel file.
# All 101 columns are numeric except the date columns (6, 7, 11, 23)
# and the text column (24).
col_types <- rep("numeric", 101)
col_types[c(6, 7, 11, 23)] <- "date"
col_types[24] <- "text"

dataset <- read_excel("~/Downloads/en_uso8_last_V10.xlsx",
                      sheet = "dataset", col_types = col_types)


# Rename variables to clean names
dataset <- dataset %>% 
  clean_names()

# Subset the data (selecting specific columns)
data <- dataset[c(2:5, 8:10, 12, 14:22, 24:29, 31:37, 39:65, 67, 69:101)]

# Get the names of the columns
# names(data)
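
Selecting columns by position is fragile: adding or removing one column in the spreadsheet silently shifts every index. A sketch of a name-based alternative is shown here on a small made-up data frame. `toy` and the `wanted` vector are hypothetical; the real names would come from `names(dataset)` after `clean_names()`.

```r
# Made-up data frame standing in for the cleaned dataset
toy <- data.frame(id = 1:3, id_genetics = c(11, 12, 13),
                  cognitive = c(1, 2, 2), notes = c("a", "b", "c"))
wanted <- c("id_genetics", "cognitive", "age_pcr")

# Report requested names that are absent instead of failing silently
missing_cols <- setdiff(wanted, names(toy))
if (length(missing_cols) > 0) {
  message("Columns not found: ", paste(missing_cols, collapse = ", "))
}

# Keep only the requested columns that exist, in the requested order
subset_toy <- toy[, intersect(wanted, names(toy)), drop = FALSE]
names(subset_toy)
```

The same pattern applies to the later `data[24:60]` style ranges, which depend on the column order produced by this subset.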

Data Summary

# Summary of cognitive data:
skim(data[24:60])
Data summary
Name data[24:60]
Number of rows 463
Number of columns 37
_______________________
Column type frequency:
numeric 37
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
listaprimerrec 0 1.00 3.71 1.63 0.00 3.00 4.00 5.00 16.00 ▇▇▁▁▁
listaaprendizaje 0 1.00 23.99 7.35 0.00 19.00 24.00 29.00 41.00 ▁▂▇▇▂
listacp 0 1.00 6.20 2.21 0.00 5.00 6.00 8.00 12.00 ▁▃▇▃▁
listalp 0 1.00 5.48 2.38 0.00 4.00 5.00 7.00 12.00 ▂▅▇▃▁
listarecon 0 1.00 21.21 3.27 0.00 20.00 22.00 23.00 24.00 ▁▁▁▂▇
corsidirecto 0 1.00 5.13 1.34 0.00 4.00 5.00 6.00 11.00 ▁▃▇▁▁
corsiinverso 1 1.00 4.17 1.21 0.00 4.00 4.00 5.00 9.00 ▁▂▇▂▁
cactusvivos 0 1.00 12.67 2.56 0.00 12.00 13.00 14.00 16.00 ▁▁▁▃▇
cactusinanim 0 1.00 15.11 2.46 0.00 14.00 16.00 17.00 17.00 ▁▁▁▁▇
otverbaltpo 0 1.00 185.29 11.42 99.00 180.00 189.00 192.00 215.00 ▁▁▁▇▃
otverbalerr 0 1.00 19.84 0.55 15.00 20.00 20.00 20.00 20.00 ▁▁▁▁▇
otvisualtpo 0 1.00 63.95 31.16 0.00 46.00 55.00 74.00 300.00 ▇▅▁▁▁
otvisualerr 0 1.00 19.70 1.42 0.00 20.00 20.00 20.00 20.00 ▁▁▁▁▇
otmentaltpo 0 1.00 245.58 52.95 19.00 234.50 262.00 278.00 319.00 ▁▁▁▅▇
otmentalerr 0 1.00 18.04 3.03 0.00 17.00 19.00 20.00 20.00 ▁▁▁▂▇
otvismenttpo 0 1.00 240.89 49.22 32.00 226.50 256.00 272.50 332.00 ▁▁▂▇▃
otvismenterr 0 1.00 18.63 2.70 0.00 18.00 20.00 20.00 20.00 ▁▁▁▁▇
otswitchtpo 0 1.00 350.00 53.89 28.00 338.00 366.00 383.00 428.00 ▁▁▁▃▇
otswitcherr 0 1.00 17.65 2.95 0.00 17.00 19.00 20.00 20.00 ▁▁▁▂▇
x5dreadtpo 0 1.00 188.32 20.01 22.00 185.00 193.00 198.00 222.00 ▁▁▁▂▇
x5dreaderr 3 0.99 49.83 1.06 30.00 50.00 50.00 50.00 50.00 ▁▁▁▁▇
x5dcounttpo 2 1.00 165.29 14.23 68.00 162.00 168.00 173.00 200.00 ▁▁▁▇▃
x5dcounterr 2 1.00 49.79 1.51 19.00 50.00 50.00 50.00 50.00 ▁▁▁▁▇
x5dfoctpo 2 1.00 246.04 23.31 89.00 239.00 252.00 259.00 300.00 ▁▁▁▇▃
x5dfocerr 2 1.00 48.28 3.01 3.00 48.00 49.00 50.00 50.00 ▁▁▁▁▇
x5dswitchtpo 2 1.00 271.96 35.53 0.00 257.00 280.00 295.00 350.00 ▁▁▁▇▇
x5dswitcherr 2 1.00 46.58 4.59 0.00 46.00 48.00 49.00 50.00 ▁▁▁▁▇
dscorr 0 1.00 69.16 18.11 0.00 57.00 70.00 82.00 109.00 ▁▂▆▇▂
dsomis 0 1.00 0.35 1.95 0.00 0.00 0.00 0.00 24.00 ▇▁▁▁▁
dscomis 0 1.00 25.93 2.25 0.00 26.00 27.00 27.00 27.00 ▁▁▁▁▇
torremov 1 1.00 341.44 3.93 323.00 340.00 343.00 344.00 351.00 ▁▁▂▇▁
torretpo 0 1.00 264.54 74.67 2.58 240.58 287.58 316.08 352.58 ▁▁▁▅▇
bostonsc 0 1.00 3.20 2.93 0.00 1.00 2.00 5.00 12.00 ▇▃▂▂▁
bostonlat 0 1.00 2.71 1.89 0.00 1.83 2.62 3.52 16.00 ▇▃▁▁▁
bostonsemerr 0 1.00 1.99 2.37 0.00 0.00 1.00 3.00 10.00 ▇▂▁▁▁
bostonfonerr 0 1.00 1.12 1.76 0.00 0.00 0.00 2.00 11.00 ▇▁▁▁▁
fluencia 1 1.00 16.52 4.82 1.00 14.00 16.50 20.00 33.00 ▁▃▇▂▁
# Visualize the missing data pattern:
plot_missing(data[24:60])

# Plot density distributions of numeric variables:
plot_density(data[24:60])

# Diagnose potential data issues
print(diagnose_outlier(data[24:60]), n = 40)
## # A tibble: 37 × 6
##    variables    outliers_cnt outliers_ratio outliers_mean with_mean without_mean
##    <chr>               <int>          <dbl>         <dbl>     <dbl>        <dbl>
##  1 listaprimer…            4          0.864         10.8      3.71         3.64 
##  2 listaaprend…            4          0.864          0.75    24.0         24.2  
##  3 listacp                 8          1.73           0        6.20         6.31 
##  4 listalp                 4          0.864         12        5.48         5.42 
##  5 listarecon             16          3.46           8.75    21.2         21.7  
##  6 corsidirecto            7          1.51           6        5.13         5.12 
##  7 corsiinverso           52         11.2            3.17     4.17         4.3  
##  8 cactusvivos            31          6.70           5.81    12.7         13.2  
##  9 cactusinanim           20          4.32           6.6     15.1         15.5  
## 10 otverbaltpo            20          4.32         157.     185.         187.   
## 11 otverbalerr            49         10.6           18.5     19.8         20    
## 12 otvisualtpo            29          6.26         142.      64.0         58.7  
## 13 otvisualerr            75         16.2           18.2     19.7         20    
## 14 otmentaltpo            44          9.50         110.     246.         260.   
## 15 otmentalerr            18          3.89           7       18.0         18.5  
## 16 otvismenttpo           29          6.26          97.7    241.         250.   
## 17 otvismenterr           24          5.18          10.0     18.6         19.1  
## 18 otswitchtpo            30          6.48         197.     350.         361.   
## 19 otswitcherr            21          4.54           8.43    17.7         18.1  
## 20 x5dreadtpo             29          6.26         138.     188.         192.   
## 21 x5dreaderr             38          8.21          47.9     49.8         50    
## 22 x5dcounttpo            32          6.91         135.     165.         168.   
## 23 x5dcounterr            46          9.94          47.9     49.8         50    
## 24 x5dfoctpo              28          6.05         194.     246.         249.   
## 25 x5dfocerr              21          4.54          39       48.3         48.7  
## 26 x5dswitchtpo           19          4.10         160.     272.         277.   
## 27 x5dswitcherr           31          6.70          34.4     46.6         47.5  
## 28 dscorr                  3          0.648         11.3     69.2         69.5  
## 29 dsomis                 46          9.94           3.54     0.352        0    
## 30 dscomis                68         14.7           21.8     25.9         26.6  
## 31 torremov               18          3.89         330.     341.         342.   
## 32 torretpo               37          7.99          66.8    265.         282.   
## 33 bostonsc                4          0.864         12        3.20         3.12 
## 34 bostonlat              19          4.10           8.04     2.71         2.48 
## 35 bostonsemerr           17          3.67           8.76     1.99         1.73 
## 36 bostonfonerr           22          4.75           6.64     1.12         0.846
## 37 fluencia                8          1.73          17.6     16.5         16.5
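
Several accuracy variables above (for example otvisualerr, whose 25th and 75th percentiles both equal 20) show many outliers and a "without" mean exactly at the ceiling. That pattern is what a boxplot-style rule produces when the interquartile range is zero: every value different from the ceiling is flagged. The sketch below reproduces the table's columns on made-up scores with `boxplot.stats()`. That `diagnose_outlier()` applies this 1.5 × IQR rule is an assumption here, not something the output itself confirms.

```r
# Made-up scores with a ceiling at 20 (not study data)
x <- c(rep(20, 17), 19, 18, 15)

out <- boxplot.stats(x)$out            # values beyond 1.5 * IQR from the hinges
outliers_cnt <- length(out)
outliers_ratio <- 100 * outliers_cnt / length(x)
outliers_mean <- mean(out)
with_mean <- mean(x)                   # mean including flagged values
without_mean <- mean(x[!x %in% out])   # mean after dropping flagged values

c(outliers_cnt = outliers_cnt, outliers_ratio = outliers_ratio,
  outliers_mean = outliers_mean, with_mean = with_mean,
  without_mean = without_mean)
```

For such ceiling-bound variables the flagged values are the informative ones, so removing them as "outliers" would discard most of the signal.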
# Explore missing data patterns
print(diagnose(data[24:60]), n = 40)
## # A tibble: 37 × 6
##    variables        types missing_count missing_percent unique_count unique_rate
##    <chr>            <chr>         <int>           <dbl>        <int>       <dbl>
##  1 listaprimerrec   nume…             0           0               11      0.0238
##  2 listaaprendizaje nume…             0           0               39      0.0842
##  3 listacp          nume…             0           0               13      0.0281
##  4 listalp          nume…             0           0               13      0.0281
##  5 listarecon       nume…             0           0               17      0.0367
##  6 corsidirecto     nume…             0           0               11      0.0238
##  7 corsiinverso     nume…             1           0.216           11      0.0238
##  8 cactusvivos      nume…             0           0               14      0.0302
##  9 cactusinanim     nume…             0           0               13      0.0281
## 10 otverbaltpo      nume…             0           0               49      0.106 
## 11 otverbalerr      nume…             0           0                5      0.0108
## 12 otvisualtpo      nume…             0           0               99      0.214 
## 13 otvisualerr      nume…             0           0                7      0.0151
## 14 otmentaltpo      nume…             0           0              136      0.294 
## 15 otmentalerr      nume…             0           0               14      0.0302
## 16 otvismenttpo     nume…             0           0              128      0.276 
## 17 otvismenterr     nume…             0           0               13      0.0281
## 18 otswitchtpo      nume…             0           0              134      0.289 
## 19 otswitcherr      nume…             0           0               14      0.0302
## 20 x5dreadtpo       nume…             0           0               65      0.140 
## 21 x5dreaderr       nume…             3           0.648            7      0.0151
## 22 x5dcounttpo      nume…             2           0.432           59      0.127 
## 23 x5dcounterr      nume…             2           0.432            7      0.0151
## 24 x5dfoctpo        nume…             2           0.432           84      0.181 
## 25 x5dfocerr        nume…             2           0.432           15      0.0324
## 26 x5dswitchtpo     nume…             2           0.432          118      0.255 
## 27 x5dswitcherr     nume…             2           0.432           21      0.0454
## 28 dscorr           nume…             0           0               82      0.177 
## 29 dsomis           nume…             0           0               12      0.0259
## 30 dscomis          nume…             0           0               13      0.0281
## 31 torremov         nume…             1           0.216           20      0.0432
## 32 torretpo         nume…             0           0              151      0.326 
## 33 bostonsc         nume…             0           0               13      0.0281
## 34 bostonlat        nume…             0           0              147      0.317 
## 35 bostonsemerr     nume…             0           0               11      0.0238
## 36 bostonfonerr     nume…             0           0               10      0.0216
## 37 fluencia         nume…             1           0.216           30      0.0648
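
The missing_count and missing_percent columns can be reproduced with base R, which helps when checking the report by hand. The sketch below uses a made-up data frame `toy`; only the row count of 463 comes from the report.

```r
# Made-up data frame with a few missing values (not study data)
toy <- data.frame(a = c(1, NA, 3, 4), b = c(NA, NA, 1, 2), c = 1:4)

missing_count <- colSums(is.na(toy))
missing_percent <- 100 * missing_count / nrow(toy)
data.frame(variable = names(toy), missing_count, missing_percent,
           row.names = NULL)

# With 463 rows, a single missing value corresponds to about 0.216 percent
one_missing_pct <- round(100 * 1 / 463, 3)
one_missing_pct
```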
# Summary of the COVID data
summary(data[c(2:23)])
##    cognitive        de800cog         images         age_2024       age_pcr     
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :12.0   Min.   :53.00  
##  1st Qu.:1.000   1st Qu.:2.000   1st Qu.:1.000   1st Qu.:65.0   1st Qu.:62.00  
##  Median :2.000   Median :2.000   Median :2.000   Median :69.0   Median :65.00  
##  Mean   :1.501   Mean   :2.352   Mean   :1.877   Mean   :69.7   Mean   :66.48  
##  3rd Qu.:2.000   3rd Qu.:3.000   3rd Qu.:2.000   3rd Qu.:73.0   3rd Qu.:70.00  
##  Max.   :2.000   Max.   :4.000   Max.   :3.000   Max.   :90.0   Max.   :86.00  
##                                                  NA's   :13                    
##   age_interval      anosmia      risk_hospital_icu vaccine_before_study
##  Min.   :1.000   Min.   :0.000   Min.   :0.0000    Min.   :0.000       
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:0.0000    1st Qu.:1.000       
##  Median :2.000   Median :2.000   Median :0.0000    Median :2.000       
##  Mean   :1.922   Mean   :1.734   Mean   :0.3024    Mean   :1.482       
##  3rd Qu.:2.000   3rd Qu.:3.000   3rd Qu.:0.0000    3rd Qu.:2.000       
##  Max.   :3.000   Max.   :3.000   Max.   :3.0000    Max.   :3.000       
##                  NA's   :1                                             
##  covid_before_vaccination     fever            cough         muscle_pain    
##  Min.   :0.0000           Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000           1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.0000           Median :0.0000   Median :1.0000   Median :1.0000  
##  Mean   :0.5487           Mean   :0.4323   Mean   :0.5178   Mean   :0.6081  
##  3rd Qu.:1.0000           3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000           Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##  NA's   :42               NA's   :42       NA's   :42       NA's   :42      
##    breath_dif       smell_lost       taste_lost        pcr           
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.000   Length:463        
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.000   Class :character  
##  Median :0.0000   Median :0.0000   Median :0.000   Mode  :character  
##  Mean   :0.3967   Mean   :0.4561   Mean   :0.399                     
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.000                     
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.000                     
##  NA's   :42       NA's   :42       NA's   :42                        
##     pcr_num       covid_variant     vaccine_1       vaccine_2    
##  Min.   :0.0000   Min.   :0.000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:1.0000   1st Qu.:1.000   1st Qu.:2.000   1st Qu.:1.000  
##  Median :1.0000   Median :1.000   Median :2.000   Median :2.000  
##  Mean   :0.8377   Mean   :1.348   Mean   :1.996   Mean   :1.819  
##  3rd Qu.:1.0000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000  
##  Max.   :1.0000   Max.   :7.000   Max.   :5.000   Max.   :6.000  
##  NA's   :1                                                       
##    vaccine_3    
##  Min.   :0.000  
##  1st Qu.:0.000  
##  Median :1.000  
##  Mean   :1.726  
##  3rd Qu.:2.000  
##  Max.   :6.000  
## 
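
The summary above reports a minimum age_2024 of 12 alongside a minimum age_pcr of 53, which cannot both hold for the same participant, since age in 2024 should not be lower than age at the PCR test. A sketch of a consistency check on a made-up data frame follows. `toy` and its values are hypothetical; only the column names mirror the report.

```r
# Made-up rows (not study data); row 2 imitates the implausible pattern
toy <- data.frame(id = 1:4,
                  age_2024 = c(69, 12, NA, 73),
                  age_pcr  = c(66, 61, 58, 70))

# which() drops NA comparisons, so missing ages are not flagged as inconsistent
suspect <- toy[which(toy$age_2024 < toy$age_pcr), ]
suspect
```

Rows flagged this way are candidates for checking against BIRTH and the study dates before any age-adjusted model is fitted.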
# Summary of the cognitive data
summary(data[24:60])
##  listaprimerrec   listaaprendizaje    listacp          listalp      
##  Min.   : 0.000   Min.   : 0.00    Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 3.000   1st Qu.:19.00    1st Qu.: 5.000   1st Qu.: 4.000  
##  Median : 4.000   Median :24.00    Median : 6.000   Median : 5.000  
##  Mean   : 3.706   Mean   :23.99    Mean   : 6.203   Mean   : 5.477  
##  3rd Qu.: 5.000   3rd Qu.:29.00    3rd Qu.: 8.000   3rd Qu.: 7.000  
##  Max.   :16.000   Max.   :41.00    Max.   :12.000   Max.   :12.000  
##                                                                     
##    listarecon     corsidirecto     corsiinverso    cactusvivos   
##  Min.   : 0.00   Min.   : 0.000   Min.   :0.000   Min.   : 0.00  
##  1st Qu.:20.00   1st Qu.: 4.000   1st Qu.:4.000   1st Qu.:12.00  
##  Median :22.00   Median : 5.000   Median :4.000   Median :13.00  
##  Mean   :21.21   Mean   : 5.134   Mean   :4.173   Mean   :12.67  
##  3rd Qu.:23.00   3rd Qu.: 6.000   3rd Qu.:5.000   3rd Qu.:14.00  
##  Max.   :24.00   Max.   :11.000   Max.   :9.000   Max.   :16.00  
##                                   NA's   :1                      
##   cactusinanim    otverbaltpo     otverbalerr     otvisualtpo    
##  Min.   : 0.00   Min.   : 99.0   Min.   :15.00   Min.   :  0.00  
##  1st Qu.:14.00   1st Qu.:180.0   1st Qu.:20.00   1st Qu.: 46.00  
##  Median :16.00   Median :189.0   Median :20.00   Median : 55.00  
##  Mean   :15.11   Mean   :185.3   Mean   :19.84   Mean   : 63.95  
##  3rd Qu.:17.00   3rd Qu.:192.0   3rd Qu.:20.00   3rd Qu.: 74.00  
##  Max.   :17.00   Max.   :215.0   Max.   :20.00   Max.   :300.00  
##                                                                  
##   otvisualerr    otmentaltpo     otmentalerr     otvismenttpo    otvismenterr  
##  Min.   : 0.0   Min.   : 19.0   Min.   : 0.00   Min.   : 32.0   Min.   : 0.00  
##  1st Qu.:20.0   1st Qu.:234.5   1st Qu.:17.00   1st Qu.:226.5   1st Qu.:18.00  
##  Median :20.0   Median :262.0   Median :19.00   Median :256.0   Median :20.00  
##  Mean   :19.7   Mean   :245.6   Mean   :18.04   Mean   :240.9   Mean   :18.63  
##  3rd Qu.:20.0   3rd Qu.:278.0   3rd Qu.:20.00   3rd Qu.:272.5   3rd Qu.:20.00  
##  Max.   :20.0   Max.   :319.0   Max.   :20.00   Max.   :332.0   Max.   :20.00  
##                                                                                
##   otswitchtpo   otswitcherr      x5dreadtpo      x5dreaderr     x5dcounttpo   
##  Min.   : 28   Min.   : 0.00   Min.   : 22.0   Min.   :30.00   Min.   : 68.0  
##  1st Qu.:338   1st Qu.:17.00   1st Qu.:185.0   1st Qu.:50.00   1st Qu.:162.0  
##  Median :366   Median :19.00   Median :193.0   Median :50.00   Median :168.0  
##  Mean   :350   Mean   :17.65   Mean   :188.3   Mean   :49.83   Mean   :165.3  
##  3rd Qu.:383   3rd Qu.:20.00   3rd Qu.:198.0   3rd Qu.:50.00   3rd Qu.:173.0  
##  Max.   :428   Max.   :20.00   Max.   :222.0   Max.   :50.00   Max.   :200.0  
##                                                NA's   :3       NA's   :2      
##   x5dcounterr      x5dfoctpo     x5dfocerr      x5dswitchtpo  x5dswitcherr  
##  Min.   :19.00   Min.   : 89   Min.   : 3.00   Min.   :  0   Min.   : 0.00  
##  1st Qu.:50.00   1st Qu.:239   1st Qu.:48.00   1st Qu.:257   1st Qu.:46.00  
##  Median :50.00   Median :252   Median :49.00   Median :280   Median :48.00  
##  Mean   :49.79   Mean   :246   Mean   :48.28   Mean   :272   Mean   :46.58  
##  3rd Qu.:50.00   3rd Qu.:259   3rd Qu.:50.00   3rd Qu.:295   3rd Qu.:49.00  
##  Max.   :50.00   Max.   :300   Max.   :50.00   Max.   :350   Max.   :50.00  
##  NA's   :2       NA's   :2     NA's   :2       NA's   :2     NA's   :2      
##      dscorr           dsomis           dscomis         torremov    
##  Min.   :  0.00   Min.   : 0.0000   Min.   : 0.00   Min.   :323.0  
##  1st Qu.: 57.00   1st Qu.: 0.0000   1st Qu.:26.00   1st Qu.:340.0  
##  Median : 70.00   Median : 0.0000   Median :27.00   Median :343.0  
##  Mean   : 69.16   Mean   : 0.3521   Mean   :25.93   Mean   :341.4  
##  3rd Qu.: 82.00   3rd Qu.: 0.0000   3rd Qu.:27.00   3rd Qu.:344.0  
##  Max.   :109.00   Max.   :24.0000   Max.   :27.00   Max.   :351.0  
##                                                     NA's   :1      
##     torretpo         bostonsc        bostonlat       bostonsemerr   
##  Min.   :  2.58   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.:240.58   1st Qu.: 1.000   1st Qu.: 1.830   1st Qu.: 0.000  
##  Median :287.58   Median : 2.000   Median : 2.620   Median : 1.000  
##  Mean   :264.54   Mean   : 3.201   Mean   : 2.706   Mean   : 1.991  
##  3rd Qu.:316.08   3rd Qu.: 5.000   3rd Qu.: 3.520   3rd Qu.: 3.000  
##  Max.   :352.58   Max.   :12.000   Max.   :16.000   Max.   :10.000  
##                                                                     
##   bostonfonerr       fluencia    
##  Min.   : 0.000   Min.   : 1.00  
##  1st Qu.: 0.000   1st Qu.:14.00  
##  Median : 0.000   Median :16.50  
##  Mean   : 1.121   Mean   :16.52  
##  3rd Qu.: 2.000   3rd Qu.:20.00  
##  Max.   :11.000   Max.   :33.00  
##                   NA's   :1
# Summary of the brain volume data
summary(data[71:91])
##  right_thalamus_proper left_thalamus_proper  fornix_right    fornix_left   
##  Min.   :4349          Min.   :4478         Min.   : 67.0   Min.   :105.0  
##  1st Qu.:6449          1st Qu.:6800         1st Qu.:324.0   1st Qu.:399.5  
##  Median :6859          Median :7203         Median :361.0   Median :453.0  
##  Mean   :6871          Mean   :7231         Mean   :358.6   Mean   :446.2  
##  3rd Qu.:7265          3rd Qu.:7658         3rd Qu.:394.0   3rd Qu.:496.0  
##  Max.   :9039          Max.   :9560         Max.   :541.0   Max.   :694.0  
##  anterior_limb_of_internal_capsule_right anterior_limb_of_internal_capsule_left
##  Min.   :1849                            Min.   :1438                          
##  1st Qu.:2842                            1st Qu.:2440                          
##  Median :3071                            Median :2673                          
##  Mean   :3099                            Mean   :2687                          
##  3rd Qu.:3336                            3rd Qu.:2919                          
##  Max.   :4742                            Max.   :4213                          
##  posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right
##  Min.   :1571                                                  
##  1st Qu.:2194                                                  
##  Median :2349                                                  
##  Mean   :2384                                                  
##  3rd Qu.:2557                                                  
##  Max.   :4129                                                  
##  posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left corpus_callosum
##  Min.   :1455                                                  Min.   : 6119  
##  1st Qu.:2155                                                  1st Qu.: 9718  
##  Median :2333                                                  Median :10789  
##  Mean   :2365                                                  Mean   :10885  
##  3rd Qu.:2550                                                  3rd Qu.:11905  
##  Max.   :3814                                                  Max.   :17580  
##  right_a_cg_g_anterior_cingulate_gyrus left_a_cg_g_anterior_cingulate_gyrus
##  Min.   :1901                          Min.   :2210                        
##  1st Qu.:2962                          1st Qu.:3522                        
##  Median :3316                          Median :3896                        
##  Mean   :3392                          Mean   :3953                        
##  3rd Qu.:3772                          3rd Qu.:4374                        
##  Max.   :5867                          Max.   :6185                        
##  right_a_ins_anterior_insula left_a_ins_anterior_insula
##  Min.   :1537                Min.   :1562              
##  1st Qu.:2925                1st Qu.:2907              
##  Median :3186                Median :3165              
##  Mean   :3206                Mean   :3177              
##  3rd Qu.:3493                3rd Qu.:3450              
##  Max.   :4521                Max.   :5098              
##  right_an_g_angular_gyrus left_an_g_angular_gyrus right_cun_cuneus
##  Min.   : 4960            Min.   : 4715           Min.   :2653    
##  1st Qu.: 7492            1st Qu.: 6732           1st Qu.:4556    
##  Median : 8189            Median : 7401           Median :5139    
##  Mean   : 8312            Mean   : 7447           Mean   :5188    
##  3rd Qu.: 9116            3rd Qu.: 8115           3rd Qu.:5791    
##  Max.   :13560            Max.   :11434           Max.   :8236    
##  left_cun_cuneus right_ent_entorhinal_area left_ent_entorhinal_area
##  Min.   :2627    Min.   : 749              Min.   : 796            
##  1st Qu.:4551    1st Qu.:2131              1st Qu.:1856            
##  Median :5085    Median :2372              Median :2070            
##  Mean   :5144    Mean   :2424              Mean   :2109            
##  3rd Qu.:5652    3rd Qu.:2704              3rd Qu.:2336            
##  Max.   :8628    Max.   :3989              Max.   :4156            
##  right_g_re_gyrus_rectus left_g_re_gyrus_rectus
##  Min.   :1253            Min.   :1189          
##  1st Qu.:1752            1st Qu.:1774          
##  Median :1955            Median :1956          
##  Mean   :1972            Mean   :1978          
##  3rd Qu.:2166            3rd Qu.:2152          
##  Max.   :2912            Max.   :3072
# Convert relevant columns to factors
factor_cols <- c("pcr", "anosmia", "risk_hospital_icu", "vaccine_before_study",
                 "covid_before_vaccination", "fever", "cough", "muscle_pain",
                 "breath_dif", "smell_lost", "taste_lost", "covid_variant",
                 "vaccine_1", "vaccine_2", "vaccine_3")

# Check which factor_cols actually exist in the data
existing_factor_cols <- factor_cols[factor_cols %in% names(data)]

# Apply as.factor only to existing columns
if (length(existing_factor_cols) > 0) {
    data <- data %>%
        mutate(across(all_of(existing_factor_cols), as.factor))
}
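
One caveat with this conversion: once a 0/1 column is a factor, `as.numeric()` returns the internal level codes rather than the original values. A small sketch on a made-up vector (`fever` here is illustrative, not the dataset column):

```r
fever <- c(0, 1, 1, NA, 0)           # made-up 0/1 symptom indicator
fever_f <- as.factor(fever)

levels(fever_f)                       # levels are the character labels "0" and "1"
table(fever_f, useNA = "ifany")       # counts per level, including missing values

codes <- as.numeric(fever_f)                    # level codes: 1 for "0", 2 for "1"
values <- as.numeric(as.character(fever_f))     # recovers the original 0/1 values
```

This matters for any later step that computes proportions or means from the converted columns.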

Question 1: Does COVID-19 infection (positive PCR) impact overall cognitive scores?
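
The earlier summary of `cognitive` shows values only between 1 and 2 with a median of 2, which suggests a two-level code rather than a continuous score. If that is the case (an assumption; the variable description does not say), a test on the 2 × 2 table of counts is a natural complement to a comparison of means. The sketch below uses made-up counts, not the study data; with the real data the table would come from `table(data$cognitive, data$pcr)`.

```r
# Made-up 2 x 2 counts, for illustration only
tab <- matrix(c(30, 45, 200, 188), nrow = 2,
              dimnames = list(cognitive = c("1", "2"),
                              pcr = c("NEGATIVA", "POSITIVA")))

chisq_result <- chisq.test(tab)    # applies Yates' continuity correction for 2 x 2 tables
fisher_result <- fisher.test(tab)  # exact test; also reports an odds ratio estimate

chisq_result$p.value
fisher_result$estimate
```

Fisher's exact test is the safer choice when any expected cell count is small, which can happen when one PCR group is much smaller than the other.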

# --- Analysis Questions & Code Examples ---

# Question 1: Does COVID-19 infection (positive PCR) impact overall cognitive scores?
# Assuming 'cognitive' is a primary numeric outcome. Adjust variable name if needed.
# Check normality assumption first (e.g., Shapiro-Wilk test, histograms)
# shapiro.test(data$cognitive[data$pcr == "Positive"]) # Example for one group
# shapiro.test(data$cognitive[data$pcr == "Negative"]) # Example for other group
# hist(data$cognitive)

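# Welch t-test (reasonable if the data are roughly normal).
# Note: the Wilcoxon fallback below runs only if t.test() throws an error,
# not when the normality assumption is violated.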
if ("cognitive" %in% names(data) && "pcr" %in% names(data)) {
    print("--- Q1: Cognitive Score vs PCR Status (T-test/Wilcoxon) ---")
    # Check levels of pcr factor
    print(levels(data$pcr))
    # Ensure levels are correctly specified for comparison if needed
    tryCatch({
        ttest_result <- t.test(cognitive ~ pcr, data = data)
        print(ttest_result)
    }, error = function(e) {
        print(paste("T-test failed:", e$message))
        print("Attempting Wilcoxon test instead...")
        tryCatch({
            wilcox_result <- wilcox.test(cognitive ~ pcr, data = data)
            print(wilcox_result)
        }, error = function(e2) {
            print(paste("Wilcoxon test also failed:", e2$message))
        })
    })
    
    # Linear model controlling for age (assuming age_pcr is relevant age)
    if("age_pcr" %in% names(data)) {
        print("--- Q1: Cognitive Score vs PCR Status controlling for Age (LM) ---")
        lm_model_pcr_age <- lm(cognitive ~ pcr + age_pcr, data = data)
        print(summary(lm_model_pcr_age))
    }
} else {
    print("Skipping Q1 analysis: 'cognitive' or 'pcr' column not found.")
}
## [1] "--- Q1: Cognitive Score vs PCR Status (T-test/Wilcoxon) ---"
## [1] "NEGATIVA" "POSITIVA"
## 
##  Welch Two Sample t-test
## 
## data:  cognitive by pcr
## t = 1.3513, df = 105.05, p-value = 0.1795
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
##  -0.03970381  0.20962630
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA 
##               1.573333               1.488372 
## 
## [1] "--- Q1: Cognitive Score vs PCR Status controlling for Age (LM) ---"
## 
## Call:
## lm(formula = cognitive ~ pcr + age_pcr, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8350 -0.4707  0.2359  0.4252  0.8676 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.325410   0.290119  11.462  < 2e-16 ***
## pcrPOSITIVA -0.111318   0.060860  -1.829    0.068 .  
## age_pcr     -0.026021   0.004229  -6.153 1.66e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4812 on 459 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.07983,    Adjusted R-squared:  0.07582 
## F-statistic: 19.91 on 2 and 459 DF,  p-value: 5.107e-09

Q1: Overall Cognitive Score vs. PCR Status

The simple comparison (Welch t-test) didn’t find a significant difference in the cognitive score between the PCR negative and PCR positive groups (means 1.57 vs. 1.49, p = 0.18).

In the linear model controlling for age (age_pcr), age itself is strongly negatively associated with cognitive scores (about -0.026 points per year of age, p < 0.001), which is expected. The effect of PCR status (pcrPOSITIVA) shows a trend towards lower scores in the positive group (estimate -0.11), but it doesn’t reach statistical significance (p = 0.068). This suggests age is a much stronger factor, and any direct effect of PCR status on this general cognitive score is weak or requires more power to detect. One caveat: the Cohen’s d output further down warns that a sample has only 2 unique values, which suggests that cognitive may be a two-level code rather than a continuous score. If so, the t-test and the linear model are only rough approximations.
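If cognitive really is a two-level code (this is an assumption based on the Cohen's d warning, not something the data dictionary confirms), a logistic regression would be a more natural model than lm(). The sketch below uses simulated stand-in data; the column cog_high, the coding of 1 and 2, and all simulated effect sizes are hypothetical.

```r
# Simulated stand-in data: none of these values come from the study.
set.seed(1)
n <- 400
sim <- data.frame(
  age_pcr = round(runif(n, 50, 85)),
  pcr = factor(sample(c("NEGATIVA", "POSITIVA"), n, replace = TRUE,
                      prob = c(0.2, 0.8)))
)

# HYPOTHETICAL coding: cognitive takes only the values 1 and 2.
p_two <- plogis(4 - 0.06 * sim$age_pcr - 0.3 * (sim$pcr == "POSITIVA"))
sim$cognitive <- 1 + rbinom(n, size = 1, prob = p_two)

# Model the probability of the higher code (2) with a logistic regression.
sim$cog_high <- as.integer(sim$cognitive == 2)
fit <- glm(cog_high ~ pcr + age_pcr, data = sim, family = binomial())

# Odds ratios with Wald 95% confidence intervals.
est <- coef(summary(fit))
or_table <- data.frame(
  odds_ratio = exp(est[, "Estimate"]),
  lower = exp(est[, "Estimate"] - 1.96 * est[, "Std. Error"]),
  upper = exp(est[, "Estimate"] + 1.96 * est[, "Std. Error"]),
  row.names = rownames(est)
)
print(round(or_table, 3))
```

On the real data, the same glm() call with data = data would replace the simulated frame, once it is clear which of the two codes denotes the better outcome.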

Question 2: Is there a difference in specific cognitive domains (e.g., fluency, memory) between COVID+ and COVID- groups?

# Question 2: Is there a difference in specific cognitive domains (e.g., fluency, memory) between COVID+ and COVID- groups?
# Example using 'fluencia' and 'dscorr'
if (all(c("fluencia", "dscorr", "pcr") %in% names(data))) {
    print("--- Q2: Specific Cognitive Domains vs PCR Status ---")
    # Individual tests (repeat for other domains, consider p-value adjustment)
    print("Fluency:")
    tryCatch({print(t.test(fluencia ~ pcr, data = data))}, error=function(e){print(paste("Error:", e$message))})
    print("DS Correct:")
    tryCatch({print(t.test(dscorr ~ pcr, data = data))}, error=function(e){print(paste("Error:", e$message))})
    
    # MANOVA (Multivariate Analysis of Variance) - checks multiple DVs at once
    # Ensure no missing values in the selected columns for MANOVA
    print("MANOVA for Fluency & DS Correct:")
    manova_data <- data %>% select(fluencia, dscorr, pcr) %>% na.omit()
    if(nrow(manova_data) > 0 && length(unique(manova_data$pcr)) > 1) {
        manova_result <- manova(cbind(fluencia, dscorr) ~ pcr, data = manova_data)
        print(summary(manova_result))
        print(summary.aov(manova_result)) # To see univariate results
    } else {
        print("Insufficient data or factor levels for MANOVA.")
    }
} else {
    print("Skipping Q2 analysis: 'fluencia', 'dscorr', or 'pcr' column not found.")
}
## [1] "--- Q2: Specific Cognitive Domains vs PCR Status ---"
## [1] "Fluency:"
## 
##  Welch Two Sample t-test
## 
## data:  fluencia by pcr
## t = -2.7437, df = 99.602, p-value = 0.007207
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
##  -3.0291666 -0.4866883
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA 
##               15.04000               16.79793 
## 
## [1] "DS Correct:"
## 
##  Welch Two Sample t-test
## 
## data:  dscorr by pcr
## t = -0.57336, df = 111.6, p-value = 0.5676
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
##  -5.479240  3.019912
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA 
##               68.09333               69.32300 
## 
## [1] "MANOVA for Fluency & DS Correct:"
##            Df   Pillai approx F num Df den Df   Pr(>F)   
## pcr         1 0.021077   4.9305      2    458 0.007611 **
## Residuals 459                                            
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##  Response fluencia :
##              Df  Sum Sq Mean Sq F value   Pr(>F)   
## pcr           1   194.1 194.066  8.4713 0.003783 **
## Residuals   459 10515.1  22.909                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response dscorr :
##              Df Sum Sq Mean Sq F value Pr(>F)
## pcr           1    100  100.36   0.305  0.581
## Residuals   459 151035  329.05

Q2: Specific Cognitive Domains (Fluency, DS Correct) vs. PCR Status

Fluency (fluencia): Surprisingly, the PCR positive group showed significantly higher verbal fluency scores than the negative group (means 16.8 vs. 15.0, p = 0.007). This is counter-intuitive if expecting COVID to impair cognition and warrants further investigation (see Next Steps). Note that this comparison is not adjusted for age.

Memory (dscorr - DS Correct): No significant difference was found between groups (p = 0.57).

MANOVA: The overall multivariate test was significant (Pillai = 0.021, p = 0.0076), indicating a difference between PCR groups when fluency and DS Correct are considered together. The univariate results show that this difference is driven by the fluencia variable alone (p = 0.004); dscorr does not differ between groups (p = 0.58).
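Because two domains were tested against PCR status, the raw p-values can be adjusted for multiple comparisons. A minimal sketch using the two Welch t-test p-values reported above (the vector names are just labels chosen here):

```r
# Raw p-values copied from the two Welch t-tests in the Q2 output.
p_raw <- c(fluencia = 0.007207, dscorr = 0.5676)

# Holm controls the family-wise error rate; BH controls the false discovery rate.
p_holm <- p.adjust(p_raw, method = "holm")
p_bh <- p.adjust(p_raw, method = "BH")

print(rbind(raw = p_raw, holm = p_holm, BH = p_bh))
```

With only two tests, both methods give an adjusted fluency p-value of about 0.014, so the fluency difference survives correction here. The adjustment matters more once all cognitive domains are tested.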

Question 3: Does the presence of specific symptoms (e.g., anosmia) correlate with cognitive performance?

# Question 3: Does the presence of specific symptoms (e.g., anosmia) correlate with cognitive performance?
# Using 'anosmia' factor and 'cognitive' score
if (all(c("cognitive", "anosmia", "age_pcr") %in% names(data))) {
    print("--- Q3: Cognitive Score vs Anosmia ---")
    print(levels(data$anosmia)) # Check levels
    # Compare cognitive scores based on anosmia presence (assuming binary factor)
    tryCatch({print(t.test(cognitive ~ anosmia, data = data))}, error=function(e){print(paste("Error:", e$message))})
    
    # Linear model controlling for age
    print("Linear Model for Cognitive Score ~ Anosmia + Age:")
    lm_model_anosmia_age <- lm(cognitive ~ anosmia + age_pcr, data = data)
    print(summary(lm_model_anosmia_age))
} else {
    print("Skipping Q3 analysis: 'cognitive', 'anosmia', or 'age_pcr' column not found.")
}
## [1] "--- Q3: Cognitive Score vs Anosmia ---"
## [1] "0" "1" "2" "3"
## [1] "Error: grouping factor must have exactly 2 levels"
## [1] "Linear Model for Cognitive Score ~ Anosmia + Age:"
## 
## Call:
## lm(formula = cognitive ~ anosmia + age_pcr, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.9277 -0.4598  0.1478  0.4395  0.8791 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.911246   0.290696  10.015  < 2e-16 ***
## anosmia1     0.098567   0.079292   1.243 0.214469    
## anosmia2     0.232995   0.060789   3.833 0.000144 ***
## anosmia3     0.086466   0.059951   1.442 0.149911    
## age_pcr     -0.022953   0.004242  -5.411 1.01e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4758 on 457 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.1044, Adjusted R-squared:  0.09654 
## F-statistic: 13.31 on 4 and 457 DF,  p-value: 2.859e-10
# In the Q3 block, replace the t.test line with:
print("ANOVA for Cognitive Score ~ Anosmia:")
## [1] "ANOVA for Cognitive Score ~ Anosmia:"
tryCatch({
    aov_anosmia <- aov(cognitive ~ anosmia, data = data)
    print(summary(aov_anosmia))
    # Optional: Post-hoc tests if ANOVA is significant
    if (summary(aov_anosmia)[[1]]$`Pr(>F)`[1] < 0.05) {
         print(TukeyHSD(aov_anosmia))
    }
}, error=function(e){print(paste("Error:", e$message))})
##              Df Sum Sq Mean Sq F value  Pr(>F)    
## anosmia       3   5.43  1.8095   7.529 6.3e-05 ***
## Residuals   458 110.07  0.2403                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = cognitive ~ anosmia, data = data)
## 
## $anosmia
##            diff         lwr         upr     p adj
## 1-0  0.15129274 -0.05777993  0.36036542 0.2442834
## 2-0  0.28409373  0.12454535  0.44364210 0.0000337
## 3-0  0.10220183 -0.05689478  0.26129845 0.3481916
## 2-1  0.13280098 -0.06681980  0.33242177 0.3167972
## 3-1 -0.04909091 -0.24835080  0.15016898 0.9206020
## 3-2 -0.18189189 -0.32834601 -0.03543778 0.0079235

Q3: Cognitive Score vs. Anosmia

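Linear Model (controlling for age): Age is again significant (p < 0.001). Interestingly, compared to the baseline anosmia level 0, anosmia level 2 is associated with significantly higher cognitive scores (estimate 0.23, p = 0.00014). Levels 1 and 3 were not significantly different from 0. This is also counter-intuitive and needs careful checking.

ANOVA (replacing the t-test, which failed because anosmia has 4 levels rather than 2): The anosmia levels differ overall (F = 7.53, p < 0.001). In the Tukey comparisons, only the contrasts involving level 2 are significant: 2 vs. 0 (p < 0.001) and 3 vs. 2 (p = 0.008).

However, I don’t know what the levels of anosmia mean (0, 1, 2, 3), so these differences can’t be interpreted yet.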
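Until the coding is known, two checks are cheap: a per-level summary of counts and means, and a two-group comparison after collapsing the levels. The sketch below runs on simulated data and assumes, purely hypothetically, that 0 means "no anosmia" and 1 to 3 mean some degree of anosmia; anosmia_any is a made-up helper column.

```r
# Simulated stand-in data: values are random, not from the study.
set.seed(2)
sim <- data.frame(
  anosmia = factor(sample(0:3, 200, replace = TRUE)),
  cognitive = sample(1:2, 200, replace = TRUE)
)

# Per-level counts and mean scores, to spot sparse or odd levels.
level_summary <- data.frame(
  level = levels(sim$anosmia),
  n = as.vector(table(sim$anosmia)),
  mean_cognitive = as.vector(tapply(sim$cognitive, sim$anosmia, mean))
)
print(level_summary)

# HYPOTHETICAL recoding: 0 = none, 1-3 = any anosmia. Verify against the
# study codebook before using this on the real data.
sim$anosmia_any <- factor(ifelse(sim$anosmia == "0", "none", "any"),
                          levels = c("none", "any"))

# With exactly two groups, t.test() no longer fails.
print(t.test(cognitive ~ anosmia_any, data = sim))
```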

Question 4: Does vaccination status influence cognitive outcomes, potentially interacting with COVID status?

# Question 4: Does vaccination status influence cognitive outcomes, potentially interacting with COVID status?
# Requires 'num_doses' variable created earlier, and assumes 'cognitive', 'pcr', 'age_pcr' exist
if (all(c("cognitive", "pcr", "age_pcr") %in% names(data)) && "num_doses" %in% names(data)) {
    print("--- Q4: Cognitive Score vs Vaccination & PCR Status ---")
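    # ANOVA/Linear Model including interaction
    # Using car::Anova for Type III sums of squares, often preferred with interactions.
    # Note: Type III tests are only meaningful with sum-to-zero contrasts, e.g.
    # options(contrasts = c("contr.sum", "contr.poly")) set before fitting the model.
    lm_interaction <- lm(cognitive ~ num_doses * pcr + age_pcr, data = data)
    print("ANOVA Table (Type III SS):")
    print(car::Anova(lm_interaction, type = "III"))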
    print("Model Summary:")
    print(summary(lm_interaction))
    
    # Visualize interaction if significant (example)
    # ggplot(data, aes(x = pcr, y = cognitive, color = num_doses, group = num_doses)) +
    #   stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.1) +
    #   stat_summary(fun = mean, geom = "line") +
    #   stat_summary(fun = mean, geom = "point") +
    #   labs(title = "Interaction Plot: Cognitive Score by PCR Status and Vaccination",
    #        x = "PCR Status", y = "Mean Cognitive Score", color = "Vaccine Doses") +
    #   theme_minimal()
    
} else {
    print("Skipping Q4 analysis: 'cognitive', 'pcr', 'age_pcr', or 'num_doses' column not found or not created.")
}
## [1] "Skipping Q4 analysis: 'cognitive', 'pcr', 'age_pcr', or 'num_doses' column not found or not created."

Q4: Cognitive Score vs. Vaccination & PCR Status

NOTE: This analysis was skipped because the num_doses variable has not been created yet. We will try again later…
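The num_doses variable that Q4 needs could be derived roughly as follows. This is a sketch on a toy data frame, assuming that a non-missing, non-empty entry in vaccine_1/2/3 means the dose was received; the real columns may be coded differently, and has_dose is a helper name invented here.

```r
# Toy data: the vaccine labels "A", "B", "C" are placeholders.
toy <- data.frame(
  vaccine_1 = c("A", "A", NA, "B", ""),
  vaccine_2 = c("A", NA, NA, "B", ""),
  vaccine_3 = c(NA, NA, NA, "C", ""),
  stringsAsFactors = FALSE
)

# ASSUMPTION: a dose counts as received if the entry is not NA, "" or "NA".
has_dose <- function(x) {
  x <- as.character(x)  # also works if the column was converted to a factor
  !is.na(x) & x != "" & x != "NA"
}

# One logical column per dose, then count the doses per participant.
dose_flags <- sapply(toy[c("vaccine_1", "vaccine_2", "vaccine_3")], has_dose)
toy$num_doses <- factor(rowSums(dose_flags), levels = 0:3,
                        labels = c("0 Doses", "1 Dose", "2 Doses", "3 Doses"))

print(table(toy$num_doses, useNA = "ifany"))
```

Counting recorded doses differs from taking the highest recorded dose whenever a later dose is recorded without an earlier one, so the rule should be checked against how the vaccine columns were filled in.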

Question 5: Are there differences in cognitive performance based on COVID-19 variant?

# Question 5: Are there differences in cognitive performance based on COVID-19 variant?
# Requires 'covid_variant', 'cognitive', 'age_pcr'. May need to control for vaccination ('num_doses').
if (all(c("cognitive", "covid_variant", "age_pcr") %in% names(data))) {
    print("--- Q5: Cognitive Score vs COVID Variant ---")
    print(levels(data$covid_variant)) # Check levels/variants present
    
    # Filter out variants with very few cases if necessary for stable analysis
    variant_counts <- table(data$covid_variant)
    print("Variant Counts:")
    print(variant_counts)
    # data_filtered_variants <- data %>% filter(covid_variant %in% names(variant_counts[variant_counts > 10])) # Example threshold
    
    # ANOVA model (using original data or filtered data)
    # Add other covariates like num_doses if available and relevant
    control_vars <- "age_pcr"
    if ("num_doses" %in% names(data)) {
        control_vars <- paste(control_vars, "+ num_doses")
    }
    formula_q5 <- as.formula(paste("cognitive ~ covid_variant +", control_vars))
    aov_model_variant <- aov(formula_q5, data = data)
    print("ANOVA Summary:")
    print(summary(aov_model_variant))
    
    # Post-hoc tests if ANOVA is significant
    if (summary(aov_model_variant)[[1]]$`Pr(>F)`[1] < 0.05) {
        print("Post-hoc Tests (Tukey HSD):")
        print(TukeyHSD(aov_model_variant, which = "covid_variant"))
    }
} else {
    print("Skipping Q5 analysis: 'cognitive', 'covid_variant', or 'age_pcr' column not found.")
}
## [1] "--- Q5: Cognitive Score vs COVID Variant ---"
## [1] "0" "1" "2" "3" "4" "5" "6" "7"
## [1] "Variant Counts:"
## 
##   0   1   2   3   4   5   6   7 
##  81 222  94  59   1   4   1   1 
## [1] "ANOVA Summary:"
##                Df Sum Sq Mean Sq F value   Pr(>F)    
## covid_variant   7   1.24   0.177   0.754    0.626    
## age_pcr         1   8.12   8.119  34.648 7.69e-09 ***
## Residuals     454 106.39   0.234                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Q5: Cognitive Score vs. COVID Variant

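In an ANOVA that also included age, there were no significant differences in the cognitive score between the identified covid_variant levels (p = 0.626). Note that aov() uses sequential (Type I) sums of squares and covid_variant is entered first, so this test is not actually adjusted for age; putting age_pcr first in the formula, or using car::Anova(), would give the age-adjusted test.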

Caveat: Some variants had very few participants (n = 1 for variants 4, 6 and 7; n = 4 for variant 5), making comparisons involving them unreliable. We might consider grouping rare variants or excluding them. Let’s check later.
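Grouping the rare variants could look like the sketch below. It rebuilds a factor from the variant counts printed above and lumps every level with fewer than 10 participants into "other"; the threshold of 10 and the label "other" are arbitrary choices made here.

```r
# Rebuild a variant factor from the counts shown in the Q5 output.
variant <- factor(rep(0:7, times = c(81, 222, 94, 59, 1, 4, 1, 1)))
counts <- table(variant)

# ASSUMPTION: levels with fewer than min_n participants are treated as rare.
min_n <- 10
rare <- names(counts)[counts < min_n]

# Replace rare levels by a single "other" level, keeping the common ones.
variant_chr <- as.character(variant)
variant_chr[variant_chr %in% rare] <- "other"
variant_lumped <- factor(variant_chr,
                         levels = c(setdiff(names(counts), rare), "other"))

print(table(variant_lumped))
```

This leaves variants 0 to 3 unchanged and puts the remaining 7 participants into one group. That group is still small, so excluding it may be the cleaner option.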

Question 6: How does age relate to cognitive performance? Does COVID history modify this?

# Question 6: How does age relate to cognitive performance? Does COVID history modify this?
if (all(c("cognitive", "age_pcr", "pcr") %in% names(data))) {
    print("--- Q6: Age, Cognitive Score, and PCR Interaction ---")
    # Correlation
    cor_age_cog <- cor.test(~ cognitive + age_pcr, data = data)
    print("Correlation between Age and Cognitive Score:")
    print(cor_age_cog)
    
    # Linear model with interaction term
    lm_age_interaction <- lm(cognitive ~ age_pcr * pcr, data = data)
    print("Linear Model with Age * PCR Interaction:")
    print(summary(lm_age_interaction))
    
    # Visualize the relationship (optional)
    # ggplot(data, aes(x = age_pcr, y = cognitive, color = pcr)) +
    #   geom_point(alpha = 0.5) +
    #   geom_smooth(method = "lm", aes(fill = pcr), alpha = 0.1) + # Add regression lines
    #   labs(title = "Cognitive Score vs Age by PCR Status",
    #        x = "Age at PCR", y = "Cognitive Score") +
    #   theme_minimal()
    
} else {
    print("Skipping Q6 analysis: 'cognitive', 'age_pcr', or 'pcr' column not found.")
}
## [1] "--- Q6: Age, Cognitive Score, and PCR Interaction ---"
## [1] "Correlation between Age and Cognitive Score:"
## 
##  Pearson's product-moment correlation
## 
## data:  cognitive and age_pcr
## t = -5.9975, df = 461, p-value = 4.052e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3515442 -0.1823737
## sample estimates:
##        cor 
## -0.2690327 
## 
## [1] "Linear Model with Age * PCR Interaction:"
## 
## Call:
## lm(formula = cognitive ~ age_pcr * pcr, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8347 -0.4722  0.1653  0.4324  0.8382 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          3.972676   0.668204   5.945 5.48e-09 ***
## age_pcr             -0.035634   0.009889  -3.603 0.000349 ***
## pcrPOSITIVA         -0.901191   0.737102  -1.223 0.222104    
## age_pcr:pcrPOSITIVA  0.011763   0.010940   1.075 0.282823    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4811 on 458 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.08214,    Adjusted R-squared:  0.07613 
## F-statistic: 13.66 on 3 and 458 DF,  p-value: 1.5e-08

Q6: Age, Cognitive Score, and PCR Interaction

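There’s a significant negative correlation between age and cognitive score (r = -0.27, p < 0.001), as seen before.

The linear model testing for an interaction found no significant interaction effect (p = 0.28). This suggests that the relationship between age and cognitive score is similar for both PCR positive and negative groups in our sample. Note that with the interaction in the model, the pcrPOSITIVA coefficient (-0.90) is the group difference extrapolated to age 0, so it should not be read as an overall effect of PCR status.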
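To read the interaction model more easily, the coefficients can be turned into a separate age slope for each group and into the estimated group difference at specific ages. The sketch below only does arithmetic on the coefficients printed above; the ages 50, 65 and 80 are illustrative values chosen here, and gap_at is a helper name introduced for this example.

```r
# Coefficients copied from the lm(cognitive ~ age_pcr * pcr) summary.
b <- c(intercept = 3.972676, age_pcr = -0.035634,
       pcrPOSITIVA = -0.901191, interaction = 0.011763)

# Age slope in each PCR group.
slope_neg <- b[["age_pcr"]]
slope_pos <- b[["age_pcr"]] + b[["interaction"]]
print(c(NEGATIVA = slope_neg, POSITIVA = slope_pos))

# Estimated POSITIVA minus NEGATIVA difference at a given age.
gap_at <- function(age) b[["pcrPOSITIVA"]] + b[["interaction"]] * age

ages <- c(50, 65, 80)  # illustrative ages, not taken from the data
print(data.frame(age = ages, positiva_minus_negativa = gap_at(ages)))
```

The slope in the positive group is about -0.024 per year, against -0.036 in the negative group. Centering age_pcr before fitting would make the pcrPOSITIVA coefficient directly interpretable as the difference at the mean age.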

Question 7: Are there correlations between different cognitive test performances?

# Question 7: Are there correlations between different cognitive test performances?
# Select relevant cognitive variable columns (adjust column names/indices as needed)
# The skimr output lists these variables in columns 24:60, but positional indices break
# if the data are reordered, so select the columns by name instead.
cog_vars_indices <- which(names(data) %in% c("listaprimerrec", "listaaprendizaje", "listacp", "listalp", "listarecon",
                                             "corsidirecto", "corsiinverso", "cactusvivos", "cactusinanim",
                                             "otverbaltpo", "otverbalerr", "otvisualtpo", "otvisualerr",
                                             "otmentaltpo", "otmentalerr", "otvismenttpo", "otvismenterr",
                                             "otswitchtpo", "otswitcherr", "x5dreadtpo", "x5dreaderr",
                                             "x5dcounttpo", "x5dcounterr", "x5dfoctpo", "x5dfocerr",
                                             "x5dswitchtpo", "x5dswitcherr", "dscorr", "dsomis", "dscomis",
                                             "torremov", "torretpo", "bostonsc", "bostonlat",
                                             "bostonsemerr", "bostonfonerr", "fluencia"))

if (length(cog_vars_indices) > 1) {
    print("--- Q7: Correlations among Cognitive Variables ---")
    cognitive_subset <- data[, cog_vars_indices]
    
    # Handle missing data for correlation matrix (e.g., pairwise complete)
    cor_matrix <- cor(cognitive_subset, use = "pairwise.complete.obs")
    print("Correlation Matrix (Top Left Corner):")
    print(round(cor_matrix[1:min(10, nrow(cor_matrix)), 1:min(10, ncol(cor_matrix))], 2)) # Print a portion
    
    # Consider visualizing with corrplot or GGally packages
    # library(corrplot)
    # corrplot(cor_matrix, type = "upper", order = "hclust", tl.col = "black", tl.srt = 45)
    
    # Optional: Principal Component Analysis (PCA) or Factor Analysis
    # Requires careful consideration of scaling and interpretation
    # pca_result <- princomp(na.omit(cognitive_subset), cor = TRUE) # Using correlation matrix
    # summary(pca_result)
    # loadings(pca_result)
} else {
    print("Skipping Q7 analysis: Not enough cognitive variable columns found or selected.")
}
## [1] "--- Q7: Correlations among Cognitive Variables ---"
## [1] "Correlation Matrix (Top Left Corner):"
##                  listaprimerrec listaaprendizaje listacp listalp listarecon
## listaprimerrec             1.00             0.62    0.43    0.41       0.37
## listaaprendizaje           0.62             1.00    0.71    0.69       0.53
## listacp                    0.43             0.71    1.00    0.81       0.54
## listalp                    0.41             0.69    0.81    1.00       0.55
## listarecon                 0.37             0.53    0.54    0.55       1.00
## corsidirecto               0.14             0.14    0.22    0.17       0.21
## corsiinverso               0.28             0.30    0.26    0.27       0.29
## cactusvivos                0.34             0.45    0.43    0.38       0.44
## cactusinanim               0.32             0.43    0.41    0.37       0.41
## otverbaltpo                0.24             0.41    0.36    0.34       0.15
##                  corsidirecto corsiinverso cactusvivos cactusinanim otverbaltpo
## listaprimerrec           0.14         0.28        0.34         0.32        0.24
## listaaprendizaje         0.14         0.30        0.45         0.43        0.41
## listacp                  0.22         0.26        0.43         0.41        0.36
## listalp                  0.17         0.27        0.38         0.37        0.34
## listarecon               0.21         0.29        0.44         0.41        0.15
## corsidirecto             1.00         0.55        0.34         0.42        0.19
## corsiinverso             0.55         1.00        0.43         0.46        0.32
## cactusvivos              0.34         0.43        1.00         0.76        0.41
## cactusinanim             0.42         0.46        0.76         1.00        0.39
## otverbaltpo              0.19         0.32        0.41         0.39        1.00

Q7: Correlations Among Cognitive Variables

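The output shows a portion of the correlation matrix. As expected, there are moderate-to-strong correlations between related tests: the ‘lista’ measures correlate with each other at r = 0.37 to 0.81 (highest for listacp and listalp), and cactusvivos and cactusinanim correlate at r = 0.76. Correlations across tasks are weaker (e.g., corsidirecto with the ‘lista’ measures, r = 0.14 to 0.22).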
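With 37 variables the full matrix is hard to scan by eye, so it helps to list the strongest pairs. A minimal sketch, using only the five ‘lista’ correlations printed above as input (on the real data, cor_matrix from the Q7 chunk would be used instead):

```r
# Correlations among the five 'lista' measures, copied from the Q7 output.
vars <- c("listaprimerrec", "listaaprendizaje", "listacp", "listalp", "listarecon")
cor_matrix <- matrix(c(
  1.00, 0.62, 0.43, 0.41, 0.37,
  0.62, 1.00, 0.71, 0.69, 0.53,
  0.43, 0.71, 1.00, 0.81, 0.54,
  0.41, 0.69, 0.81, 1.00, 0.55,
  0.37, 0.53, 0.54, 0.55, 1.00
), nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# Keep each pair once (upper triangle) and sort by absolute correlation.
idx <- which(upper.tri(cor_matrix), arr.ind = TRUE)
pairs <- data.frame(
  var1 = rownames(cor_matrix)[idx[, "row"]],
  var2 = colnames(cor_matrix)[idx[, "col"]],
  r = cor_matrix[idx]
)
pairs <- pairs[order(-abs(pairs$r)), ]

print(head(pairs, 3))
```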

Question 8: Do neuroimaging measures correlate with cognitive scores or differ by COVID status?

# Question 8: Do neuroimaging measures correlate with cognitive scores or differ by COVID status?
# Example using 'right_hippocampus' and 'cognitive', 'pcr', 'age_pcr'
neuro_var <- "right_hippocampus" # Choose an imaging variable
if (all(c(neuro_var, "cognitive", "pcr", "age_pcr") %in% names(data))) {
    print(paste("--- Q8: Neuroimaging (", neuro_var, ") Analysis ---"))
    
    # Correlation with cognitive score
    # cor.test() drops incomplete pairs itself and has no 'use' argument
    cor_neuro_cog <- cor.test(data[[neuro_var]], data$cognitive)
    print(paste("Correlation between", neuro_var, "and Cognitive Score:"))
    print(cor_neuro_cog)
    
    # Comparison based on PCR status (controlling for age)
    lm_neuro_pcr_age <- lm(as.formula(paste(neuro_var, "~ pcr + age_pcr")), data = data)
    print(paste("Linear Model:", neuro_var, "~ pcr + age_pcr"))
    print(summary(lm_neuro_pcr_age))
    
    # T-test (uncorrected for age)
    # tryCatch({print(t.test(as.formula(paste(neuro_var, "~ pcr")), data = data))}, error=function(e){print(paste("Error:", e$message))})
    
} else {
    print(paste("Skipping Q8 analysis: '", neuro_var, "', 'cognitive', 'pcr', or 'age_pcr' column not found."))
}
## [1] "--- Q8: Neuroimaging ( right_hippocampus ) Analysis ---"
## [1] "Correlation between right_hippocampus and Cognitive Score:"
## 
##  Pearson's product-moment correlation
## 
## data:  data[[neuro_var]] and data$cognitive
## t = 2.8752, df = 461, p-value = 0.004224
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.04210788 0.22118372
## sample estimates:
##       cor 
## 0.1327288 
## 
## [1] "Linear Model: right_hippocampus ~ pcr + age_pcr"
## 
## Call:
## lm(formula = as.formula(paste(neuro_var, "~ pcr + age_pcr")), 
##     data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1404.85  -301.81   -37.61   263.75  1614.20 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5217.312    252.916  20.629  < 2e-16 ***
## pcrPOSITIVA  -20.996     53.055  -0.396    0.692    
## age_pcr      -20.689      3.687  -5.612 3.47e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 419.5 on 459 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.0642, Adjusted R-squared:  0.06013 
## F-statistic: 15.75 on 2 and 459 DF,  p-value: 2.432e-07

Q8: Neuroimaging (Right Hippocampus) Analysis

There’s a small but statistically significant positive correlation between the right_hippocampus measure and the cognitive score (r = 0.13, 95% CI 0.04 to 0.22, p = 0.004). This correlation is not adjusted for age, and both measures decline with age.

In the linear model controlling for age, age_pcr was significantly negatively associated with the right_hippocampus measure (about -21 units per year of age, p < 0.001), but PCR status was not (p = 0.69).
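Since both the hippocampus measure and the cognitive score decline with age, part of their raw correlation may simply reflect age. One way to check is a partial correlation, computed by correlating the residuals of each variable after regressing it on age. The sketch below uses simulated data in which the two variables are linked only through age; all numbers and variable names are made up for illustration.

```r
# Simulated data: volume and score each depend on age but not on each other.
set.seed(3)
n <- 500
age <- runif(n, 40, 90)
volume <- 5200 - 20 * age + rnorm(n, sd = 200)
score <- 3.3 - 0.026 * age + rnorm(n, sd = 0.3)

# Raw correlation, inflated by the shared dependence on age.
raw_r <- cor(volume, score)

# Partial correlation: correlate what is left after removing the age trend.
partial_r <- cor(resid(lm(volume ~ age)), resid(lm(score ~ age)))

print(round(c(raw = raw_r, partial_given_age = partial_r), 3))
```

On the real data, the same idea would use right_hippocampus, cognitive and age_pcr on complete cases. If the partial correlation stays near 0.13, the association is not just an age effect.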

# --- Data Preparation (Adapt as needed) ---

# Convert relevant columns to factors
factor_cols <- c("pcr", "anosmia", "risk_hospital_icu", "vaccine_before_study",
                 "covid_before_vaccination", "fever", "cough", "muscle_pain",
                 "breath_dif", "smell_lost", "taste_lost", "covid_variant",
                 "vaccine_1", "vaccine_2", "vaccine_3")

# Check which factor_cols actually exist in the data
existing_factor_cols <- factor_cols[factor_cols %in% names(data)]

# Apply as.factor only to existing columns
if (length(existing_factor_cols) > 0) {
  data <- data %>%
    mutate(across(all_of(existing_factor_cols), as.factor))
}

# !! Action Needed for Q4: Create 'num_doses' variable !!
# Uncomment and ADAPT the following code based on how vaccine_1/2/3 columns indicate doses
# Example: Assumes non-NA means dose received. Change logic if needed (e.g., check for specific values like "Pfizer", "Yes", etc.)
# data <- data %>%
#   mutate(
#     num_doses = case_when(
#       !is.na(vaccine_3) & vaccine_3 != "" & vaccine_3 != "NA" ~ 3, # Adjust conditions based on your data
#       !is.na(vaccine_2) & vaccine_2 != "" & vaccine_2 != "NA" ~ 2, # Adjust conditions based on your data
#       !is.na(vaccine_1) & vaccine_1 != "" & vaccine_1 != "NA" ~ 1, # Adjust conditions based on your data
#       TRUE ~ 0 # Assumes those without vaccine_1 entry have 0 doses
#     ),
#     # Convert to factor with meaningful labels
#     num_doses = factor(num_doses, levels = 0:3, labels = c("0 Doses", "1 Dose", "2 Doses", "3 Doses"))
#   )
#
# # Check the created variable
# print("Summary of num_doses variable:")
# print(summary(data$num_doses))
# print(table(data$num_doses, useNA = "ifany"))


# --- Analysis Questions & Code Examples ---

# Question 1: Does COVID-19 infection (positive PCR) impact overall cognitive scores?
if ("cognitive" %in% names(data) && "pcr" %in% names(data)) {
  print("--- Q1: Cognitive Score vs PCR Status ---")
  print(levels(data$pcr))

  # Welch T-test (original analysis)
  tryCatch({
    ttest_result_q1 <- t.test(cognitive ~ pcr, data = data)
    print(ttest_result_q1)
    # Suggestion 9: Calculate Effect Size (Cohen's d)
    print("Effect Size (Cohen's d):")
    print(cohens_d(cognitive ~ pcr, data = data))
  }, error = function(e) {print(paste("T-test failed:", e$message))})

  # Linear model controlling for age (original analysis)
  if("age_pcr" %in% names(data)) {
     print("--- Q1: Cognitive Score vs PCR Status controlling for Age (LM) ---")
     lm_model_pcr_age <- lm(cognitive ~ pcr + age_pcr, data = data)
     print(summary(lm_model_pcr_age))
     # Suggestion 9: Calculate Effect Size (Partial Eta Squared for ANOVA equivalent)
     print("Effect Sizes (Partial Eta Squared) from ANOVA:")
     print(eta_squared(car::Anova(lm_model_pcr_age, type="III"), partial = TRUE)) # requires car package

     # Suggestion 4: Check Model Assumptions
     print("Checking assumptions for lm(cognitive ~ pcr + age_pcr):")
     par(mfrow=c(2,2)) # Arrange plots in a 2x2 grid
     plot(lm_model_pcr_age)
     par(mfrow=c(1,1)) # Reset plot layout
  }
} else {
    print("Skipping Q1 analysis: 'cognitive' or 'pcr' column not found.")
}
## [1] "--- Q1: Cognitive Score vs PCR Status ---"
## [1] "NEGATIVA" "POSITIVA"
## 
##  Welch Two Sample t-test
## 
## data:  cognitive by pcr
## t = 1.3513, df = 105.05, p-value = 0.1795
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
##  -0.03970381  0.20962630
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA 
##               1.573333               1.488372 
## 
## [1] "Effect Size (Cohen's d):"
## Warning: 'y' is numeric but has only 2 unique values.
##   If this is a grouping variable, convert it to a factor.
## Cohen's d |        95% CI
## -------------------------
## 0.17      | [-0.08, 0.42]
## 
## - Estimated using pooled SD.
## [1] "--- Q1: Cognitive Score vs PCR Status controlling for Age (LM) ---"
## 
## Call:
## lm(formula = cognitive ~ pcr + age_pcr, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8350 -0.4707  0.2359  0.4252  0.8676 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.325410   0.290119  11.462  < 2e-16 ***
## pcrPOSITIVA -0.111318   0.060860  -1.829    0.068 .  
## age_pcr     -0.026021   0.004229  -6.153 1.66e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4812 on 459 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.07983,    Adjusted R-squared:  0.07582 
## F-statistic: 19.91 on 2 and 459 DF,  p-value: 5.107e-09
## 
## [1] "Effect Sizes (Partial Eta Squared) from ANOVA:"
## # Effect Size for ANOVA (Type III)
## 
## Parameter | Eta2 (partial) |       95% CI
## -----------------------------------------
## pcr       |       7.24e-03 | [0.00, 1.00]
## age_pcr   |           0.08 | [0.04, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].
## [1] "Checking assumptions for lm(cognitive ~ pcr + age_pcr):"

# Question 2: Is there a difference in specific cognitive domains (e.g., fluency, memory) between COVID+ and COVID- groups?
if (all(c("fluencia", "dscorr", "pcr") %in% names(data))) {
  print("--- Q2: Specific Cognitive Domains vs PCR Status ---")
  # Fluency (original analysis)
  print("Fluency:")
  tryCatch({
      ttest_fluencia <- t.test(fluencia ~ pcr, data = data)
      print(ttest_fluencia)
      # Suggestion 9: Effect Size
      if(ttest_fluencia$p.value < 0.05) { # Only print if significant
          print("Effect Size (Cohen's d) for Fluency:")
          print(cohens_d(fluencia ~ pcr, data = data))
      }
       # Suggestion 3 & 10: Visualize counter-intuitive fluency finding
      print("Generating Boxplot for Fluency by PCR status:")
      print( # Explicitly print ggplot object
          ggplot(data, aes(x = pcr, y = fluencia, fill = pcr)) +
          geom_boxplot(alpha=0.7) +
          geom_jitter(width=0.1, alpha=0.3) +
          labs(title = "Verbal Fluency by PCR Status", x = "PCR Status", y = "Fluency Score") +
          theme_minimal() +
          theme(legend.position = "none")
      )

  }, error=function(e){print(paste("Error testing fluency:", e$message))})

  # DS Correct (original analysis)
  print("DS Correct:")
  tryCatch({print(t.test(dscorr ~ pcr, data = data))}, error=function(e){print(paste("Error testing dscorr:", e$message))})

  # MANOVA (original analysis)
  print("MANOVA for Fluency & DS Correct:")
  manova_data <- data %>% select(fluencia, dscorr, pcr) %>% na.omit()
  if(nrow(manova_data) > 0 && length(unique(manova_data$pcr)) > 1) {
      manova_result <- manova(cbind(fluencia, dscorr) ~ pcr, data = manova_data)
      print(summary(manova_result))
      print(summary.aov(manova_result))
      # Suggestion 9: Effect Size (Pillai's Trace - MANOVA effect sizes are complex, eta_squared on univariate is simpler)
      print("Effect Sizes (Partial Eta Squared) for Univariate ANOVAs from MANOVA:")
      print(eta_squared(manova_result, partial = TRUE)) # Eta squared for the MANOVA model factors
  } else {
      print("Insufficient data or factor levels for MANOVA.")
  }

  # Suggestion 8: Address Multiple Comparisons if testing many domains
  # Example: If you tested 5 cognitive domains with t-tests vs PCR
  # p_values_domains <- c(ttest_fluencia$p.value, p_val_domain2, p_val_domain3, p_val_domain4, p_val_domain5)
  # print("Adjusted p-values (Benjamini-Hochberg):")
  # print(p.adjust(p_values_domains, method = "BH"))

} else {
    print("Skipping Q2 analysis: 'fluencia', 'dscorr', or 'pcr' column not found.")
}
## [1] "--- Q2: Specific Cognitive Domains vs PCR Status ---"
## [1] "Fluency:"
## 
##  Welch Two Sample t-test
## 
## data:  fluencia by pcr
## t = -2.7437, df = 99.602, p-value = 0.007207
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
##  -3.0291666 -0.4866883
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA 
##               15.04000               16.79793 
## 
## [1] "Effect Size (Cohen's d) for Fluency:"
## Warning: Missing values detected. NAs dropped.
## Cohen's d |         95% CI
## --------------------------
## -0.37     | [-0.62, -0.12]
## 
## - Estimated using pooled SD.
## [1] "Generating Boxplot for Fluency by PCR status:"
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_boxplot()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

## [1] "DS Correct:"
## 
##  Welch Two Sample t-test
## 
## data:  dscorr by pcr
## t = -0.57336, df = 111.6, p-value = 0.5676
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
##  -5.479240  3.019912
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA 
##               68.09333               69.32300 
## 
## [1] "MANOVA for Fluency & DS Correct:"
##            Df   Pillai approx F num Df den Df   Pr(>F)   
## pcr         1 0.021077   4.9305      2    458 0.007611 **
## Residuals 459                                            
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##  Response fluencia :
##              Df  Sum Sq Mean Sq F value   Pr(>F)   
## pcr           1   194.1 194.066  8.4713 0.003783 **
## Residuals   459 10515.1  22.909                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response dscorr :
##              Df Sum Sq Mean Sq F value Pr(>F)
## pcr           1    100  100.36   0.305  0.581
## Residuals   459 151035  329.05               
## 
## [1] "Effect Sizes (Partial Eta Squared) for Univariate ANOVAs from MANOVA:"
## # Effect Size for ANOVA (Type I)
## 
## Parameter | Eta2 (partial) |       95% CI
## -----------------------------------------
## pcr       |           0.02 | [0.00, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].
# Question 3: Does the presence/severity of specific symptoms (e.g., anosmia) correlate with cognitive performance?
if (all(c("cognitive", "anosmia", "age_pcr") %in% names(data))) {
    print("--- Q3: Cognitive Score vs Anosmia ---")
    print("Levels of anosmia variable:")
    print(levels(data$anosmia))
    print("Summary of anosmia variable:")
    print(summary(data$anosmia)) # See counts per level

    # Suggestion 1: Replace t.test with ANOVA for multi-level factor
    print("ANOVA for Cognitive Score ~ Anosmia:")
    tryCatch({
        aov_anosmia <- aov(cognitive ~ anosmia, data = data)
        summary_aov_anosmia <- summary(aov_anosmia)
        print(summary_aov_anosmia)

        # Suggestion 9: Effect Size (Eta Squared)
        print("Effect Size (Eta Squared) for Anosmia ANOVA:")
        print(eta_squared(aov_anosmia, partial = FALSE)) # Simple eta-squared for one-way ANOVA

        # Optional: Post-hoc tests if ANOVA is significant
        if (summary_aov_anosmia[[1]]$`Pr(>F)`[1] < 0.05) {
             print("Post-hoc Tests (Tukey HSD) for Anosmia:")
             print(TukeyHSD(aov_anosmia))
        }
    }, error=function(e){print(paste("Error running ANOVA for anosmia:", e$message))})

    # Linear model controlling for age (original analysis)
    print("Linear Model for Cognitive Score ~ Anosmia + Age:")
    lm_model_anosmia_age <- lm(cognitive ~ anosmia + age_pcr, data = data)
    print(summary(lm_model_anosmia_age))
    # Suggestion 9: Effect Sizes
    print("Effect Sizes (Partial Eta Squared) from Anosmia + Age model:")
    print(eta_squared(car::Anova(lm_model_anosmia_age, type="III"), partial = TRUE))

    # Suggestion 4: Check Model Assumptions
    print("Checking assumptions for lm(cognitive ~ anosmia + age_pcr):")
    par(mfrow=c(2,2))
    plot(lm_model_anosmia_age)
    par(mfrow=c(1,1))

    # Suggestion 1 & 10: Visualize potentially counter-intuitive Anosmia finding
    print("Generating Boxplot for Cognitive Score by Anosmia level:")
    print( # Explicitly print ggplot object
        ggplot(data, aes(x = anosmia, y = cognitive, fill = anosmia)) +
        geom_boxplot(alpha=0.7) +
        labs(title = "Cognitive Score by Anosmia Level", x = "Anosmia Level", y = "Cognitive Score") +
        theme_minimal() +
        theme(legend.position = "none")
    )

} else {
    print("Skipping Q3 analysis: 'cognitive', 'anosmia', or 'age_pcr' column not found.")
}
## [1] "--- Q3: Cognitive Score vs Anosmia ---"
## [1] "Levels of anosmia variable:"
## [1] "0" "1" "2" "3"
## [1] "Summary of anosmia variable:"
##    0    1    2    3 NA's 
##  109   55  148  150    1 
## [1] "ANOVA for Cognitive Score ~ Anosmia:"
##              Df Sum Sq Mean Sq F value  Pr(>F)    
## anosmia       3   5.43  1.8095   7.529 6.3e-05 ***
## Residuals   458 110.07  0.2403                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
## [1] "Effect Size (Eta Squared) for Anosmia ANOVA:"
## # Effect Size for ANOVA (Type I)
## 
## Parameter | Eta2 |       95% CI
## -------------------------------
## anosmia   | 0.05 | [0.02, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].
## [1] "Post-hoc Tests (Tukey HSD) for Anosmia:"
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = cognitive ~ anosmia, data = data)
## 
## $anosmia
##            diff         lwr         upr     p adj
## 1-0  0.15129274 -0.05777993  0.36036542 0.2442834
## 2-0  0.28409373  0.12454535  0.44364210 0.0000337
## 3-0  0.10220183 -0.05689478  0.26129845 0.3481916
## 2-1  0.13280098 -0.06681980  0.33242177 0.3167972
## 3-1 -0.04909091 -0.24835080  0.15016898 0.9206020
## 3-2 -0.18189189 -0.32834601 -0.03543778 0.0079235
## 
## [1] "Linear Model for Cognitive Score ~ Anosmia + Age:"
## 
## Call:
## lm(formula = cognitive ~ anosmia + age_pcr, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.9277 -0.4598  0.1478  0.4395  0.8791 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.911246   0.290696  10.015  < 2e-16 ***
## anosmia1     0.098567   0.079292   1.243 0.214469    
## anosmia2     0.232995   0.060789   3.833 0.000144 ***
## anosmia3     0.086466   0.059951   1.442 0.149911    
## age_pcr     -0.022953   0.004242  -5.411 1.01e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4758 on 457 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.1044, Adjusted R-squared:  0.09654 
## F-statistic: 13.31 on 4 and 457 DF,  p-value: 2.859e-10
## 
## [1] "Effect Sizes (Partial Eta Squared) from Anosmia + Age model:"
## # Effect Size for ANOVA (Type III)
## 
## Parameter | Eta2 (partial) |       95% CI
## -----------------------------------------
## anosmia   |           0.03 | [0.01, 1.00]
## age_pcr   |           0.06 | [0.03, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].
## [1] "Checking assumptions for lm(cognitive ~ anosmia + age_pcr):"

## [1] "Generating Boxplot for Cognitive Score by Anosmia level:"

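The eta squared reported for the one-way anosmia ANOVA can be checked by hand from the sums of squares in the ANOVA table above. The sketch below is only an arithmetic check under that assumption: the two numbers are copied from the printed table (already rounded), so the result agrees with the reported 0.05 only to rounding.

```r
# Sums of squares copied from the printed one-way ANOVA table (rounded values)
ss_anosmia   <- 5.43
ss_residuals <- 110.07

# For a one-way ANOVA, eta squared = SS_effect / SS_total
eta_sq <- ss_anosmia / (ss_anosmia + ss_residuals)
print(round(eta_sq, 3))
```

Read this way, anosmia level accounts for roughly 5% of the variance in the cognitive score before age is taken into account.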
# Question 4: Does vaccination status influence cognitive outcomes, potentially interacting with COVID status?
# Requires 'num_doses' variable created in Preparation Step
if (all(c("cognitive", "pcr", "age_pcr") %in% names(data)) && "num_doses" %in% names(data)) {
    print("--- Q4: Cognitive Score vs Vaccination & PCR Status ---")
    # Ensure num_doses is a factor
    if(!is.factor(data$num_doses)) data$num_doses <- factor(data$num_doses)

    # ANOVA/Linear Model including interaction
    print("Linear Model: cognitive ~ num_doses * pcr + age_pcr")
    lm_interaction_vacc <- lm(cognitive ~ num_doses * pcr + age_pcr, data = data)

    # Using car::Anova for Type III sums of squares
    print("ANOVA Table (Type III SS):")
    anova_table_q4 <- car::Anova(lm_interaction_vacc, type = "III")
    print(anova_table_q4)

    print("Model Summary:")
    print(summary(lm_interaction_vacc))

    # Suggestion 9: Effect Sizes
    print("Effect Sizes (Partial Eta Squared):")
    print(eta_squared(anova_table_q4, partial = TRUE))

    # Suggestion 4: Check Model Assumptions
    print("Checking assumptions for lm(cognitive ~ num_doses * pcr + age_pcr):")
    par(mfrow=c(2,2))
    plot(lm_interaction_vacc)
    par(mfrow=c(1,1))

    # Suggestion 10: Visualize interaction if significant
    # Check if the interaction term p-value is less than 0.05
    interaction_p_value <- anova_table_q4["num_doses:pcr", "Pr(>F)"]
    if (!is.na(interaction_p_value) && interaction_p_value < 0.05) {
      print("Interaction detected, generating interaction plot:")
      print( # Explicitly print ggplot object
        ggplot(data, aes(x = pcr, y = cognitive, color = num_doses, group = num_doses)) +
          stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.1, position=position_dodge(0.1)) +
          stat_summary(fun = mean, geom = "line", position=position_dodge(0.1)) +
          stat_summary(fun = mean, geom = "point", position=position_dodge(0.1), size=2) +
          labs(title = "Interaction: Cognitive Score by PCR Status and Vaccination",
               x = "PCR Status", y = "Mean Cognitive Score", color = "Vaccine Doses") +
          theme_minimal()
      )
    } else {
        print("Interaction term num_doses:pcr not significant (or NA), skipping interaction plot.")
    }

} else {
    print("Skipping Q4 analysis: 'cognitive', 'pcr', 'age_pcr', or 'num_doses' column not found or not created/adapted correctly.")
}
## [1] "Skipping Q4 analysis: 'cognitive', 'pcr', 'age_pcr', or 'num_doses' column not found or not created/adapted correctly."
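Q4 was skipped because `num_doses` was not available in `data`. The following is a minimal sketch of one way such a variable might be derived. It assumes that columns named `vaccine_1`, `vaccine_2`, and `vaccine_3` exist and hold `NA` when a dose was not received. Those column names, the `toy` data frame, and the brand labels are hypothetical and are not taken from the study data.

```r
# Toy data standing in for the real data frame (hypothetical values)
toy <- data.frame(
  vaccine_1 = c("brand_a", "brand_b", NA, "brand_c"),
  vaccine_2 = c("brand_a", NA, NA, "brand_c"),
  vaccine_3 = c("brand_d", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Assumed dose columns; adjust to the names actually present in the data
dose_cols <- c("vaccine_1", "vaccine_2", "vaccine_3")

# Count the non-missing dose entries per participant and store as a factor
toy$num_doses <- factor(rowSums(!is.na(toy[, dose_cols])))
print(table(toy$num_doses))
```

If missing doses are coded as empty strings or as a label such as "none" rather than `NA`, they would need to be recoded first, otherwise every participant would be counted as having three doses.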
# Question 5: Are there differences in cognitive performance based on COVID-19 variant?
if (all(c("cognitive", "covid_variant", "age_pcr") %in% names(data))) {
    print("--- Q5: Cognitive Score vs COVID Variant ---")
    print("Original Variant Counts:")
    variant_counts <- table(data$covid_variant, useNA = "ifany")
    print(variant_counts)

    # Suggestion 6: Refine Variant Analysis - Handle rare variants (Example: Grouping)
    # Define a threshold for 'rare'
    rare_threshold <- 10 # Example: group variants with fewer than 10 cases
    data <- data %>%
      mutate(
        covid_variant_grouped = ifelse(covid_variant %in% names(variant_counts[variant_counts < rare_threshold]),
                                       "Other_Rare",
                                       as.character(covid_variant)), # Keep others as is
        covid_variant_grouped = factor(covid_variant_grouped) # Convert back to factor
        )
    print("Grouped Variant Counts:")
    print(table(data$covid_variant_grouped, useNA = "ifany"))

    # ANOVA model using the grouped variant variable
    control_vars <- "age_pcr"
    # Add num_doses if available and analysis in Q4 worked
    if ("num_doses" %in% names(data) && exists("lm_interaction_vacc")) {
      control_vars <- paste(control_vars, "+ num_doses")
    }
    formula_q5 <- as.formula(paste("cognitive ~ covid_variant_grouped +", control_vars))

    print(paste("Running ANOVA:", deparse(formula_q5)))
    aov_model_variant <- aov(formula_q5, data = data)
    summary_aov_variant <- summary(aov_model_variant)
    print("ANOVA Summary (Grouped Variants):")
    print(summary_aov_variant)

    # Suggestion 9: Effect Size
    print("Effect Size (Eta Squared) for Grouped Variants ANOVA:")
    print(eta_squared(aov_model_variant)) # Use car::Anova if using Type III SS with interactions/covariates

    # Post-hoc tests if ANOVA is significant
    if (summary_aov_variant[[1]]$`Pr(>F)`[1] < 0.05) {
      print("Post-hoc Tests (Tukey HSD) for Grouped Variants:")
      # Ensure the factor name matches the one in the formula
      print(TukeyHSD(aov_model_variant, which = "covid_variant_grouped"))
    }
} else {
    print("Skipping Q5 analysis: 'cognitive', 'covid_variant', or 'age_pcr' column not found.")
}
## [1] "--- Q5: Cognitive Score vs COVID Variant ---"
## [1] "Original Variant Counts:"
## 
##   0   1   2   3   4   5   6   7 
##  81 222  94  59   1   4   1   1 
## [1] "Grouped Variant Counts:"
## 
##          0          1          2          3 Other_Rare 
##         81        222         94         59          7 
## [1] "Running ANOVA: cognitive ~ covid_variant_grouped + age_pcr"
## [1] "ANOVA Summary (Grouped Variants):"
##                        Df Sum Sq Mean Sq F value   Pr(>F)    
## covid_variant_grouped   4   0.52   0.131    0.56    0.692    
## age_pcr                 1   8.45   8.454   36.18 3.68e-09 ***
## Residuals             457 106.77   0.234                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Effect Size (Eta Squared) for Grouped Variants ANOVA:"
## # Effect Size for ANOVA (Type I)
## 
## Parameter             | Eta2 (partial) |       95% CI
## -----------------------------------------------------
## covid_variant_grouped |       4.87e-03 | [0.00, 1.00]
## age_pcr               |           0.07 | [0.04, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].
# Question 6: How does age relate to cognitive performance? Does COVID history modify this?
if (all(c("cognitive", "age_pcr", "pcr") %in% names(data))) {
    print("--- Q6: Age, Cognitive Score, and PCR Interaction ---")
    # Correlation (original analysis)
    cor_age_cog <- cor.test(~ cognitive + age_pcr, data = data)
    print("Correlation between Age and Cognitive Score:")
    print(cor_age_cog)

    # Linear model with interaction term (original analysis)
    lm_age_interaction <- lm(cognitive ~ age_pcr * pcr, data = data)
    print("Linear Model with Age * PCR Interaction:")
    print(summary(lm_age_interaction))
    # Suggestion 9: Effect Sizes
    print("Effect Sizes (Partial Eta Squared) for Age*PCR model:")
    print(eta_squared(car::Anova(lm_age_interaction, type="III"), partial = TRUE))

    # Suggestion 4: Check Model Assumptions
    print("Checking assumptions for lm(cognitive ~ age_pcr * pcr):")
    par(mfrow=c(2,2))
    plot(lm_age_interaction)
    par(mfrow=c(1,1))

    # Suggestion 10: Visualize the relationship (interaction or main effect of age)
    print("Generating Scatter Plot for Cognitive Score vs Age by PCR Status:")
    print( # Explicitly print ggplot object
      ggplot(data, aes(x = age_pcr, y = cognitive, color = pcr)) +
        geom_point(alpha = 0.4) +
        geom_smooth(method = "lm", aes(fill = pcr), alpha = 0.1) + # Add regression lines per group
        labs(title = "Cognitive Score vs Age by PCR Status",
             x = "Age at PCR", y = "Cognitive Score") +
        theme_minimal()
    )

} else {
    print("Skipping Q6 analysis: 'cognitive', 'age_pcr', or 'pcr' column not found.")
}
## [1] "--- Q6: Age, Cognitive Score, and PCR Interaction ---"
## [1] "Correlation between Age and Cognitive Score:"
## 
##  Pearson's product-moment correlation
## 
## data:  cognitive and age_pcr
## t = -5.9975, df = 461, p-value = 4.052e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3515442 -0.1823737
## sample estimates:
##        cor 
## -0.2690327 
## 
## [1] "Linear Model with Age * PCR Interaction:"
## 
## Call:
## lm(formula = cognitive ~ age_pcr * pcr, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8347 -0.4722  0.1653  0.4324  0.8382 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          3.972676   0.668204   5.945 5.48e-09 ***
## age_pcr             -0.035634   0.009889  -3.603 0.000349 ***
## pcrPOSITIVA         -0.901191   0.737102  -1.223 0.222104    
## age_pcr:pcrPOSITIVA  0.011763   0.010940   1.075 0.282823    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4811 on 458 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.08214,    Adjusted R-squared:  0.07613 
## F-statistic: 13.66 on 3 and 458 DF,  p-value: 1.5e-08
## 
## [1] "Effect Sizes (Partial Eta Squared) for Age*PCR model:"
## Type 3 ANOVAs only give sensible and informative results when covariates
##   are mean-centered and factors are coded with orthogonal contrasts (such
##   as those produced by `contr.sum`, `contr.poly`, or `contr.helmert`, but
##   *not* by the default `contr.treatment`).
## # Effect Size for ANOVA (Type III)
## 
## Parameter   | Eta2 (partial) |       95% CI
## -------------------------------------------
## age_pcr     |           0.03 | [0.01, 1.00]
## pcr         |       3.25e-03 | [0.00, 1.00]
## age_pcr:pcr |       2.52e-03 | [0.00, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].
## [1] "Checking assumptions for lm(cognitive ~ age_pcr * pcr):"

## [1] "Generating Scatter Plot for Cognitive Score vs Age by PCR Status:"
## `geom_smooth()` using formula = 'y ~ x'

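The Type 3 ANOVA message printed with the Age*PCR effect sizes asks for mean-centered covariates and orthogonal factor contrasts. The sketch below shows that preparation on simulated data (the `sim` data frame and its values are invented for illustration, not study data). It uses base R's `drop1()` with the full scope to obtain marginal F tests for each term, which in this setup should correspond to Type III tests. Treat it as an illustration, not a replacement for the analysis above.

```r
set.seed(1)
n <- 200

# Simulated stand-in for the study variables (hypothetical values)
sim <- data.frame(
  age_pcr = round(runif(n, 40, 85)),
  pcr     = factor(sample(c("NEGATIVA", "POSITIVA"), n, replace = TRUE))
)
sim$cognitive <- 4 - 0.03 * sim$age_pcr + rnorm(n, sd = 0.5)

# Mean-center the covariate so main effects are evaluated at the average age
sim$age_c <- sim$age_pcr - mean(sim$age_pcr)

# Replace the default treatment contrasts with sum-to-zero contrasts
contrasts(sim$pcr) <- contr.sum(2)

fit <- lm(cognitive ~ age_c * pcr, data = sim)

# Drop each term in turn while keeping all others, including the interaction
type3 <- drop1(fit, scope = . ~ ., test = "F")
print(type3)
```

With centering and sum contrasts in place, the main-effect rows no longer depend on where age zero falls, which is what makes them interpretable alongside the interaction term.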
# Question 7: Are there correlations between different cognitive test performances?
cog_vars_indices <- which(names(data) %in% c("listaprimerrec", "listaaprendizaje", "listacp", "listalp", "listarecon",
                                            "corsidirecto", "corsiinverso", "cactusvivos", "cactusinanim",
                                            "otverbaltpo", "otverbalerr", "otvisualtpo", "otvisualerr",
                                            "otmentaltpo", "otmentalerr", "otvismenttpo", "otvismenterr",
                                            "otswitchtpo", "otswitcherr", "x5dreadtpo", "x5dreaderr",
                                            "x5dcounttpo", "x5dcounterr", "x5dfoctpo", "x5dfocerr",
                                            "x5dswitchtpo", "x5dswitcherr", "dscorr", "dsomis", "dscomis",
                                            "torremov", "torretpo", "bostonsc", "bostonlat",
                                            "bostonsemerr", "bostonfonerr", "fluencia"))

if (length(cog_vars_indices) > 1) {
    print("--- Q7: Correlations among Cognitive Variables ---")
    cognitive_subset <- data[, cog_vars_indices]

    # Correlation matrix (original analysis)
    cor_matrix <- cor(cognitive_subset, use = "pairwise.complete.obs")
    print("Correlation Matrix (Top Left Corner):")
    print(round(cor_matrix[1:min(10, nrow(cor_matrix)), 1:min(10, ncol(cor_matrix))], 2))

    # Suggestion 5: Visualize the full correlation matrix
    print("Generating Correlation Plot (allow time for plot rendering):")
    # Adjust cex (size) parameters if labels overlap or are too small
    corrplot(cor_matrix, method="number", type = "upper", order = "hclust",
             tl.col = "black", tl.srt = 45, number.cex = 0.5, tl.cex = 0.6,
             title = "Correlation Matrix of Cognitive Variables", mar=c(0,0,1,0)) # Add title

    # Optional: PCA/Factor Analysis (original suggestion)
    # print("Running PCA (example):")
    # # Ensure no missing values for PCA - using na.omit is one way, imputation is another
    # cognitive_subset_complete <- na.omit(cognitive_subset)
    # if(nrow(cognitive_subset_complete) > ncol(cognitive_subset_complete)) { # Need more rows than columns
    #   pca_result <- princomp(cognitive_subset_complete, cor = TRUE, scores=TRUE)
    #   print(summary(pca_result)) # Show variance explained
    #   # print(loadings(pca_result)) # Show component loadings (how variables contribute)
    # } else {
    #   print("Skipping PCA: Not enough complete cases or too many variables.")
    # }

} else {
    print("Skipping Q7 analysis: Not enough cognitive variable columns found or selected.")
}
## [1] "--- Q7: Correlations among Cognitive Variables ---"
## [1] "Correlation Matrix (Top Left Corner):"
##                  listaprimerrec listaaprendizaje listacp listalp listarecon
## listaprimerrec             1.00             0.62    0.43    0.41       0.37
## listaaprendizaje           0.62             1.00    0.71    0.69       0.53
## listacp                    0.43             0.71    1.00    0.81       0.54
## listalp                    0.41             0.69    0.81    1.00       0.55
## listarecon                 0.37             0.53    0.54    0.55       1.00
## corsidirecto               0.14             0.14    0.22    0.17       0.21
## corsiinverso               0.28             0.30    0.26    0.27       0.29
## cactusvivos                0.34             0.45    0.43    0.38       0.44
## cactusinanim               0.32             0.43    0.41    0.37       0.41
## otverbaltpo                0.24             0.41    0.36    0.34       0.15
##                  corsidirecto corsiinverso cactusvivos cactusinanim otverbaltpo
## listaprimerrec           0.14         0.28        0.34         0.32        0.24
## listaaprendizaje         0.14         0.30        0.45         0.43        0.41
## listacp                  0.22         0.26        0.43         0.41        0.36
## listalp                  0.17         0.27        0.38         0.37        0.34
## listarecon               0.21         0.29        0.44         0.41        0.15
## corsidirecto             1.00         0.55        0.34         0.42        0.19
## corsiinverso             0.55         1.00        0.43         0.46        0.32
## cactusvivos              0.34         0.43        1.00         0.76        0.41
## cactusinanim             0.42         0.46        0.76         1.00        0.39
## otverbaltpo              0.19         0.32        0.41         0.39        1.00
## [1] "Generating Correlation Plot (allow time for plot rendering):"

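A large correlation matrix is easier to read when its strongest pairs are listed explicitly. The sketch below does this for the five list-learning variables only, using the rounded coefficients copied from the printed matrix above. The object names `cor_block` and `top_pairs` are illustrative. In the full analysis, the same steps could be applied to `cor_matrix` directly.

```r
vars <- c("listaprimerrec", "listaaprendizaje", "listacp", "listalp", "listarecon")

# Rounded coefficients copied from the printed correlation matrix
cor_block <- matrix(
  c(1.00, 0.62, 0.43, 0.41, 0.37,
    0.62, 1.00, 0.71, 0.69, 0.53,
    0.43, 0.71, 1.00, 0.81, 0.54,
    0.41, 0.69, 0.81, 1.00, 0.55,
    0.37, 0.53, 0.54, 0.55, 1.00),
  nrow = 5, byrow = TRUE, dimnames = list(vars, vars)
)

# Keep each pair once by taking only the upper triangle
idx <- which(upper.tri(cor_block), arr.ind = TRUE)
top_pairs <- data.frame(
  var1 = rownames(cor_block)[idx[, "row"]],
  var2 = colnames(cor_block)[idx[, "col"]],
  r    = cor_block[idx]
)

# Sort by absolute correlation, strongest first
top_pairs <- top_pairs[order(-abs(top_pairs$r)), ]
print(head(top_pairs, 3))
```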
# Question 8: Do neuroimaging measures correlate with cognitive scores or differ by COVID status?
# Suggestion 7: Expand Neuroimaging Analysis - Structure for repeating
neuro_vars_to_test <- c("right_hippocampus", "left_hippocampus", "right_amygdala", "left_amygdala") # Add other variables here

# Check which neuroimaging variables actually exist in the data
neuro_vars_to_test <- neuro_vars_to_test[neuro_vars_to_test %in% names(data)]

all_required_cols_q8 <- c(neuro_vars_to_test, "cognitive", "pcr", "age_pcr")

if (all(all_required_cols_q8 %in% names(data))) {
    print(paste("--- Q8: Neuroimaging Analysis for:", paste(neuro_vars_to_test, collapse=", "), "---"))

    results_neuro <- list() # Store results

    for (neuro_var in neuro_vars_to_test) {
        print(paste("--- Analyzing:", neuro_var, "---"))
        results_neuro[[neuro_var]] <- list() # Create sublist for this variable

        # Correlation with cognitive score
        # cor.test() has no 'use' argument; it drops incomplete pairs on its own
        cor_neuro_cog <- cor.test(data[[neuro_var]], data$cognitive)
        print(paste("Correlation between", neuro_var, "and Cognitive Score:"))
        print(cor_neuro_cog)
        results_neuro[[neuro_var]]$correlation <- cor_neuro_cog

        # Comparison based on PCR status (controlling for age)
        formula_q8 <- as.formula(paste(neuro_var, "~ pcr + age_pcr"))
        lm_neuro_pcr_age <- lm(formula_q8, data = data)
        print(paste("Linear Model:", neuro_var, "~ pcr + age_pcr"))
        summary_lm_neuro <- summary(lm_neuro_pcr_age)
        print(summary_lm_neuro)
        results_neuro[[neuro_var]]$lm_summary <- summary_lm_neuro

        # Suggestion 9: Effect Sizes for LM
        print("Effect Sizes (Partial Eta Squared):")
        tryCatch({ # Anova might fail if model is singular etc.
            anova_lm_neuro <- car::Anova(lm_neuro_pcr_age, type="III")
            print(eta_squared(anova_lm_neuro, partial = TRUE))
            results_neuro[[neuro_var]]$lm_effect_sizes <- eta_squared(anova_lm_neuro, partial = TRUE)
        }, error = function(e){print(paste("Could not calculate effect sizes for", neuro_var, ":", e$message))})

        # Suggestion 4: Check Model Assumptions
        print(paste("Checking assumptions for lm for", neuro_var, ":"))
        par(mfrow=c(2,2))
        plot(lm_neuro_pcr_age)
        par(mfrow=c(1,1))
    }

    # Suggestion 8: Address Multiple Comparisons for Neuroimaging LMs
    # Example: Adjust p-values for the 'pcrPOSITIVA' term across all tested neuro variables
    # p_values_pcr_effect <- sapply(results_neuro, function(res) {
    #     coef_summary <- coefficients(res$lm_summary)
    #     if ("pcrPOSITIVA" %in% rownames(coef_summary)) {
    #         return(coef_summary["pcrPOSITIVA", "Pr(>|t|)"])
    #     } else {
    #         return(NA) # Return NA if the coefficient doesn't exist
    #     }
    # })
    # p_values_pcr_effect <- p_values_pcr_effect[!is.na(p_values_pcr_effect)] # Remove NAs
    # if(length(p_values_pcr_effect) > 1) {
    #   print("Adjusted p-values for PCR effect across neuroimaging variables (BH method):")
    #   print(p.adjust(p_values_pcr_effect, method = "BH"))
    # }

} else {
    print(paste("Skipping Q8 analysis: One or more required columns not found:", paste(all_required_cols_q8[!all_required_cols_q8 %in% names(data)], collapse=", ")))
}
## [1] "--- Q8: Neuroimaging Analysis for: right_hippocampus, left_hippocampus, right_amygdala, left_amygdala ---"
## [1] "--- Analyzing: right_hippocampus ---"
## [1] "Correlation between right_hippocampus and Cognitive Score:"
## 
##  Pearson's product-moment correlation
## 
## data:  data[[neuro_var]] and data$cognitive
## t = 2.8752, df = 461, p-value = 0.004224
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.04210788 0.22118372
## sample estimates:
##       cor 
## 0.1327288 
## 
## [1] "Linear Model: right_hippocampus ~ pcr + age_pcr"
## 
## Call:
## lm(formula = formula_q8, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1404.85  -301.81   -37.61   263.75  1614.20 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5217.312    252.916  20.629  < 2e-16 ***
## pcrPOSITIVA  -20.996     53.055  -0.396    0.692    
## age_pcr      -20.689      3.687  -5.612 3.47e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 419.5 on 459 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.0642, Adjusted R-squared:  0.06013 
## F-statistic: 15.75 on 2 and 459 DF,  p-value: 2.432e-07
## 
## [1] "Effect Sizes (Partial Eta Squared):"
## # Effect Size for ANOVA (Type III)
## 
## Parameter | Eta2 (partial) |       95% CI
## -----------------------------------------
## pcr       |       3.41e-04 | [0.00, 1.00]
## age_pcr   |           0.06 | [0.03, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].
## [1] "Checking assumptions for lm for right_hippocampus :"

## [1] "--- Analyzing: left_hippocampus ---"
## [1] "Correlation between left_hippocampus and Cognitive Score:"
## 
##  Pearson's product-moment correlation
## 
## data:  data[[neuro_var]] and data$cognitive
## t = 2.8959, df = 461, p-value = 0.00396
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.04306161 0.22209225
## sample estimates:
##       cor 
## 0.1336673 
## 
## [1] "Linear Model: left_hippocampus ~ pcr + age_pcr"
## 
## Call:
## lm(formula = formula_q8, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1325.45  -262.95   -18.78   242.66  1773.68 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4760.336    232.020  20.517  < 2e-16 ***
## pcrPOSITIVA   12.712     48.672   0.261    0.794    
## age_pcr      -19.155      3.382  -5.664 2.61e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 384.8 on 459 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.06616,    Adjusted R-squared:  0.06209 
## F-statistic: 16.26 on 2 and 459 DF,  p-value: 1.506e-07
## 
## [1] "Effect Sizes (Partial Eta Squared):"
## # Effect Size for ANOVA (Type III)
## 
## Parameter | Eta2 (partial) |       95% CI
## -----------------------------------------
## pcr       |       1.49e-04 | [0.00, 1.00]
## age_pcr   |           0.07 | [0.03, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].
## [1] "Checking assumptions for lm for left_hippocampus :"

## [1] "--- Analyzing: right_amygdala ---"
## [1] "Correlation between right_amygdala and Cognitive Score:"
## 
##  Pearson's product-moment correlation
## 
## data:  data[[neuro_var]] and data$cognitive
## t = 1.0567, df = 461, p-value = 0.2912
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.04216545  0.13965835
## sample estimates:
##        cor 
## 0.04915368 
## 
## [1] "Linear Model: right_amygdala ~ pcr + age_pcr"
## 
## Call:
## lm(formula = formula_q8, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -550.59 -127.42  -26.96  103.96  811.74 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1374.489    113.518  12.108   <2e-16 ***
## pcrPOSITIVA    2.158     23.813   0.091    0.928    
## age_pcr       -3.964      1.655  -2.395    0.017 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 188.3 on 459 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.01249,    Adjusted R-squared:  0.008187 
## F-statistic: 2.903 on 2 and 459 DF,  p-value: 0.05588
## 
## [1] "Effect Sizes (Partial Eta Squared):"
## # Effect Size for ANOVA (Type III)
## 
## Parameter | Eta2 (partial) |       95% CI
## -----------------------------------------
## pcr       |       1.79e-05 | [0.00, 1.00]
## age_pcr   |           0.01 | [0.00, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].
## [1] "Checking assumptions for lm for right_amygdala :"

## [1] "--- Analyzing: left_amygdala ---"
## [1] "Correlation between left_amygdala and Cognitive Score:"
## 
##  Pearson's product-moment correlation
## 
## data:  data[[neuro_var]] and data$cognitive
## t = 1.3678, df = 461, p-value = 0.1721
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.02771658  0.15381339
## sample estimates:
##        cor 
## 0.06357426 
## 
## [1] "Linear Model: left_amygdala ~ pcr + age_pcr"
## 
## Call:
## lm(formula = formula_q8, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -527.45 -122.77  -20.61   89.47  800.90 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1233.465    111.054  11.107   <2e-16 ***
## pcrPOSITIVA   17.400     23.296   0.747    0.456    
## age_pcr       -2.388      1.619  -1.475    0.141    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 184.2 on 459 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.006286,   Adjusted R-squared:  0.001956 
## F-statistic: 1.452 on 2 and 459 DF,  p-value: 0.2353
## 
## [1] "Effect Sizes (Partial Eta Squared):"
## # Effect Size for ANOVA (Type III)
## 
## Parameter | Eta2 (partial) |       95% CI
## -----------------------------------------
## pcr       |       1.21e-03 | [0.00, 1.00]
## age_pcr   |       4.72e-03 | [0.00, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].
## [1] "Checking assumptions for lm for left_amygdala :"

print("--- End of Updated Analysis Script ---")
## [1] "--- End of Updated Analysis Script ---"

Results

Interpretation of Neuroimaging Results (Q8)

This analysis examined the relationship between four subcortical brain regions (right_hippocampus, left_hippocampus, right_amygdala, left_amygdala) and both the general cognitive score and pcr status (controlling for age_pcr).

Here’s a breakdown by region:

1. Right Hippocampus

Correlation with Cognition: There is a statistically significant, albeit small, positive correlation between the right hippocampus measure and the cognitive score (r = 0.13, p = 0.004). This suggests individuals with larger right hippocampal measures tend to have slightly higher cognitive scores in this sample.

Linear Model (vs. PCR Status & Age):

age_pcr has a highly significant negative association (p < 0.001), indicating that older age is linked to smaller right hippocampal measures. The partial eta squared (η²p = 0.06) suggests age accounts for about 6% of the variance in the right hippocampus measure after controlling for PCR status, roughly a medium effect by Cohen's conventional benchmarks (0.01 small, 0.06 medium, 0.14 large).

pcr status (positive vs. negative) was not significantly associated with the right hippocampus measure after controlling for age (p = 0.69). The effect size (η²p = 0.0003) is negligible.

2. Left Hippocampus

Correlation with Cognition: Similar to the right side, there’s a significant, small positive correlation between the left hippocampus measure and the cognitive score (r = 0.13, p = 0.004).

Linear Model (vs. PCR Status & Age):

age_pcr again shows a highly significant negative association (p < 0.001), linking older age to smaller left hippocampal measures. The effect size (η²p = 0.07) is similar to the right side (around 7% of variance explained).

pcr status was not significantly associated with the left hippocampus measure after controlling for age (p = 0.79). The effect size (η²p = 0.0001) is negligible.

3. Right Amygdala

Correlation with Cognition: The right amygdala measure was not significantly correlated with the cognitive score (r = 0.05, p = 0.29).

Linear Model (vs. PCR Status & Age):

age_pcr shows a significant negative association (estimate = -3.96, p = 0.017), suggesting older age is linked to smaller right amygdala measures, although the effect size is small (η²p = 0.01).

pcr status was not significantly associated with the right amygdala measure after controlling for age (p = 0.93). The effect size (η²p ≈ 0) is negligible.

The overall model fell just short of significance at α = 0.05 (F(2, 459) = 2.90, p = 0.056).

4. Left Amygdala

Correlation with Cognition: The left amygdala measure was not significantly correlated with the cognitive score (r = 0.06, p = 0.17).

Linear Model (vs. PCR Status & Age):

Neither age_pcr (p = 0.14) nor pcr status (p = 0.46) was significantly associated with the left amygdala measure.

Effect sizes were negligible (η²p < 0.005 for both).

The overall model was not statistically significant (p = 0.24).
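
The correlation results for the two amygdala measures can be checked from the reported r and degrees of freedom alone. The sketch below assumes the standard Pearson test (a t statistic on df = n - 2) and a Fisher z confidence interval; `cor_summary` is a hypothetical helper written for this illustration, not part of the analysis script.

```r
# Recompute the t statistic, two-sided p-value and 95% CI from r and df.
# cor_summary() is a hypothetical helper for illustration only.
cor_summary <- function(r, df, conf_level = 0.95) {
  n <- df + 2                                   # Pearson test uses df = n - 2
  t_stat <- r * sqrt(df) / sqrt(1 - r^2)        # t statistic for H0: rho = 0
  p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
  z_crit <- qnorm(1 - (1 - conf_level) / 2)
  # Fisher z transform, symmetric interval on the z scale, back-transform
  ci <- tanh(atanh(r) + c(-1, 1) * z_crit / sqrt(n - 3))
  c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2])
}

# r and df copied from the printed cor.test() output
right_amygdala <- cor_summary(r = 0.04915368, df = 461)
left_amygdala  <- cor_summary(r = 0.06357426, df = 461)
round(rbind(right_amygdala, left_amygdala), 4)
```

Both intervals include zero, which is consistent with the non-significant p-values reported for the amygdala measures.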

Other Conclusions:

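Age Effect: Age shows a significant negative relationship with the hippocampal measures (both sides) and the right amygdala measure. The left amygdala estimate points in the same direction (-2.39) but does not reach significance (p = 0.14). A negative age association is expected, as these structures tend to shrink with age.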

Cognitive Correlation: Both left and right hippocampal measures show a weak positive association with the general cognitive score used in this study. Amygdala measures did not show this association.

PCR Status Effect: In this sample, PCR status (positive vs. negative) was not significantly associated with any of the four hippocampal or amygdala measures after accounting for age. The effect sizes for PCR status were consistently negligible across all four regions.

Model Fit: The linear models explained a small percentage of the variance in the neuroimaging measures (Adjusted R-squared ranging from ~0% for left amygdala to ~6% for hippocampi), primarily driven by age.
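
For a model term with a single degree of freedom, partial eta squared can be recovered from its t statistic as η²p = t² / (t² + df_residual). The sketch below applies this identity to the amygdala coefficients printed above. It is a cross-check under that assumption, not necessarily how the effect-size table was computed, and `partial_eta2_from_t` is a hypothetical helper.

```r
# Partial eta squared for a single-df term, recovered from its t statistic.
# partial_eta2_from_t() is a hypothetical helper for illustration only.
partial_eta2_from_t <- function(t_value, df_residual) {
  t_value^2 / (t_value^2 + df_residual)
}

# t values copied from the printed coefficient tables (residual df = 459)
amygdala_terms <- data.frame(
  region = c("right_amygdala", "right_amygdala", "left_amygdala", "left_amygdala"),
  term   = c("pcr", "age_pcr", "pcr", "age_pcr"),
  t      = c(0.091, -2.395, 0.747, -1.475)
)
amygdala_terms$eta2p <- partial_eta2_from_t(amygdala_terms$t, df_residual = 459)
print(amygdala_terms)
```

The recomputed values agree with the printed table up to the rounding of the t statistics.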

Impute data

# Impute the data using the MICE package
library(mice)
# Use the CART imputation method
imp <- mice(data[2:91], m = 5, maxit = 5, method = "cart", seed = 123)

# Extract a completed dataset (by default, complete() returns the first of the m imputations)
imputed_data <- complete(imp)
# names(imputed_data)
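
If inference on regression coefficients is needed, one alternative to working with a single completed dataset is to fit the model in each of the m imputations and pool the estimates with Rubin's rules. The sketch below uses the `nhanes` example data that ships with mice as a stand-in for the study data. The model `bmi ~ age + chl` and the object names are assumptions chosen only for illustration.

```r
library(mice)

# nhanes ships with mice; it stands in for the study data here (assumption)
imp_demo <- mice(nhanes, m = 5, maxit = 5, method = "cart",
                 seed = 123, printFlag = FALSE)

# Fit the same linear model in each of the m completed datasets
fit_mi <- with(imp_demo, lm(bmi ~ age + chl))

# Combine estimates and standard errors across imputations (Rubin's rules)
pooled <- pool(fit_mi)
summary(pooled)
```

The pooled standard errors reflect between-imputation variability, which a single completed dataset cannot capture.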

Exploratory Graph Analysis (EGA)

library(EGAnet)
# Run EGA on the imputed data

# EGA on the Symptoms and Demographics
fit_demo <- EGA(imputed_data[c(1:3, 5, 8:16, 18:22)])

# Scale cognitive variables before running the EGA
fit_cog <- EGA(scale(imputed_data[23:59]))

# Confirmatory Factor Analysis from EGA model
cfa <- CFA(fit_cog, plot = TRUE, data = scale(imputed_data[23:59]), estimator = "MLR")
## [1] "listaprimerrec"   "listaaprendizaje" "listacp"          "listalp"         
## [5] "listarecon"      
##  [1] "corsidirecto" "corsiinverso" "cactusvivos"  "cactusinanim" "otverbalerr" 
##  [6] "dscorr"       "dscomis"      "bostonsc"     "bostonlat"    "bostonsemerr"
## [11] "bostonfonerr" "fluencia"    
## [1] "otverbaltpo"  "otvisualtpo"  "x5dreadtpo"   "x5dcounttpo"  "x5dfoctpo"   
## [6] "x5dswitchtpo"
## [1] "otvisualerr"  "otmentaltpo"  "otmentalerr"  "otvismenttpo" "otvismenterr"
## [6] "otswitchtpo"  "otswitcherr" 
## [1] "x5dreaderr"   "x5dcounterr"  "x5dfocerr"    "x5dswitcherr"
## [1] "torremov" "torretpo"

# Scale brain variables before running the EGA
fit_all <- EGA(scale(imputed_data[60:90]))

Compare groups function
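
The core comparison that the function below automates is a likelihood ratio test between a model with the group term and a null model with covariates only, followed by Benjamini-Hochberg adjustment across outcomes. A minimal sketch of that pattern for continuous outcomes follows. It uses the built-in `mtcars` data purely for illustration, with `am` standing in for the group and `disp` for a covariate.

```r
# Illustration only: mtcars stands in for the study data
dat <- transform(mtcars, am = factor(am))
outcomes <- c("mpg", "hp", "wt")

# One LRT per outcome: y ~ am + disp versus y ~ disp
lrt_p <- sapply(outcomes, function(v) {
  f_full <- reformulate(c("am", "disp"), response = v)
  f_null <- reformulate("disp", response = v)
  m_full <- glm(f_full, data = dat, family = gaussian())
  m_null <- glm(f_null, data = dat, family = gaussian())
  anova(m_null, m_full, test = "LRT")[["Pr(>Chi)"]][2]
})

# Benjamini-Hochberg adjustment across the tested outcomes
p_fdr <- p.adjust(lrt_p, method = "BH")
data.frame(Variable = outcomes, p_value = lrt_p, p_value_FDR = p_fdr,
           row.names = NULL)
```

The full function adds outcome-type detection, other model families, convergence checks and missing-data handling on top of this pattern.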

#' Compare Groups Across Multiple Variables Automatically, Adjusting for Covariates
#'
#' This function iterates through specified variables in a dataset, determines their type,
#' fits appropriate generalized linear models (GLM), ordinal, or multinomial models,
#' and compares a model with the group predictor (and optional covariates) to a
#' null model (with optional covariates only) using a Likelihood Ratio Test (LRT).
#' It reports p-values and FDR-adjusted p-values for the group effect, adjusted for covariates.
#'
#' @param data A data.frame containing the variables to compare, the grouping variable,
#'   and any covariates.
#' @param group A character string specifying the name of the column in `data` that
#'   contains the grouping factor. Must have at least 2 levels.
#' @param vars_to_test A character vector specifying the names of the outcome variables
#'   in `data` to be tested against the `group`. If NULL (default), all columns
#'   except `group` and `covariates` are tested.
#' @param covariates A character vector specifying the names of the columns in `data` to
#'   be used as covariates in the regression models. Default is `NULL` (no covariates).
#'   These variables will be included in both the full and null models.
#' @param alpha The significance level (default 0.05) used to determine significance
#'   based on the FDR-adjusted p-value of the group effect.
#' @param zero_threshold For continuous outcome variables, the minimum proportion of zero values
#'   (default 0.3) to consider potentially using a Gamma GLM (along with skewness).
#' @param skew_threshold For continuous outcome variables, the minimum absolute skewness
#'   (default 2) to consider potentially using a Gamma GLM (along with zero proportion).
#'   Note: Uses `moments::skewness`.
#' @param verbose Logical indicating whether to print progress messages (default TRUE).
#' @param group_ref Optional. A string specifying the reference level for the group variable.
#'   If NULL (default), the first level is used.
#'
#' @return A data.frame summarizing the results for each variable tested, including:
#'   \item{Variable}{Name of the outcome variable tested.}
#'   \item{Type}{Detected type of the outcome variable (e.g., "Continuous (Gaussian GLM used)", "Continuous (Gamma GLM used)", "Binary", "Ordinal", "Categorical", "Numeric Constant", "Factor/Char Constant").}
#'   \item{Covariates_Used}{Comma-separated string of covariates included in the models for this variable.}
#'   \item{n_obs}{Number of observations used for the test after handling missing data for the outcome, group, and all covariates.}
#'   \item{Status}{Outcome of the modeling ("OK", "Error: [message]", "Constant Outcome Variable", "No DF improvement", "Low N or too many NAs", "Unsupported outcome variable type", "Log-Likelihood calculation failed...").}
#'   \item{p_value}{The raw p-value from the Likelihood Ratio Test for the `group` effect, formatted as a string.}
#'   \item{p_value_FDR}{The group p-value adjusted for multiple comparisons using the Benjamini-Hochberg FDR method, formatted as a string.}
#'   \item{Significant}{"Yes" if p_value_FDR < alpha, "No" otherwise (based on non-NA adjusted p-values).}
#'   \item{Gamma_Shift_Warning}{A flag ("Yes"/"No") indicating if non-positive outcome values were shifted for the Gamma GLM.}
#'   \item{Convergence_Warning}{A flag ("Yes"/"No") indicating if GLM/Multinom models reported non-convergence or if logLik calculation failed for Ordinal models (proxy for potential issues).}
#'
#' @details
#' - **Covariate Adjustment:** Compares `model(y ~ group + covariates)` vs `model(y ~ covariates)` (or `y ~ group` vs `y ~ 1` if no covariates).
#' - **Variable Selection:** Uses `vars_to_test` or defaults to all other variables.
#' - **Variable Types & Modeling:** Automatically detects outcome type. Uses Gaussian GLM (default continuous), Gamma GLM (continuous outcomes where the zero proportion exceeds `zero_threshold` and the absolute skewness exceeds `skew_threshold`), Binomial GLM (binary), `MASS::polr` (ordinal), `nnet::multinom` (categorical). Numeric and factor covariates are used as-is; other types are converted to factor.
#' - **Convergence:** Checks convergence flags for GLM/Multinom. For Ordinal (`polr`), failure to calculate log-likelihood is used as a proxy for potential convergence/Hessian issues and flagged.
#' - **Missing Data:** Uses complete cases for outcome, group, and all covariates per test.
#' - **Group Reference Level:** Users can specify the reference level for the group variable using `group_ref`. If the specified level is not present in the data for a particular variable after removing NAs, a warning is issued, and the default reference level is used.
#' - **Dependencies:** Requires 'MASS', 'nnet', and 'moments' packages.
#'
#' @examples
#' \dontrun{
#' set.seed(123)
#' sample_data <- data.frame(
#'   Site = factor(rep(c("Site1", "Site2"), each = 60)),
#'   Age = rnorm(120, mean = 50, sd = 10),
#'   Education = sample(10:20, 120, replace = TRUE),
#'   Sex = factor(sample(c("M", "F"), 120, replace = TRUE)),
#'   CognitiveScore = rnorm(120, mean = rep(c(100, 105), each = 60) + 0.1 * Age - 5 * (Sex == "F")),
#'   BiomarkerA = rgamma(120, shape = rep(c(2, 3), each = 60) + 0.05 * Age, scale = 1.5),
#'   Diagnosis = factor(sample(c("Normal", "MCI", "AD"), 120, replace = TRUE, prob = c(0.6, 0.3, 0.1))),
#'   OrdinalResp = factor(sample(1:5, 120, replace = TRUE), ordered = TRUE),
#'   ConstantVar = rep(10, 120)
#' )
#' sample_data$Age[sample(1:120, 10)] <- NA
#' sample_data$CognitiveScore[sample(1:120, 5)] <- NA
#' sample_data$BiomarkerA[1:5] <- NA
#'
#' results_with_cov <- compare_groups_auto_v4(
#'     data = sample_data,
#'     group = "Site",
#'     vars_to_test = c("CognitiveScore", "BiomarkerA", "Diagnosis", "OrdinalResp", "ConstantVar"),
#'     covariates = c("Age", "Education", "Sex"),
#'     verbose = TRUE
#' )
#' print(results_with_cov)
#'
#' # Example with specified group reference level
#' results_with_ref <- compare_groups_auto_v4(
#'     data = sample_data,
#'     group = "Site",
#'     vars_to_test = c("CognitiveScore", "BiomarkerA", "Diagnosis", "OrdinalResp", "ConstantVar"),
#'     covariates = c("Age", "Education", "Sex"),
#'     group_ref = "Site2",
#'     verbose = TRUE
#' )
#' print(results_with_ref)
#' }
#' @importFrom stats glm gaussian binomial Gamma logLik pchisq p.adjust complete.cases sd anova reformulate relevel na.omit
#' @importFrom MASS polr
#' @importFrom nnet multinom
#' @importFrom moments skewness
#' @export
compare_groups_auto_v4 <- function(data, group, vars_to_test = NULL, covariates = NULL, alpha = 0.05,
                                   zero_threshold = 0.3, skew_threshold = 2, verbose = TRUE, group_ref = NULL) {
  
  # --- 1. Input Validation and Package Checks ---
  if (!requireNamespace("MASS", quietly = TRUE)) stop("Package 'MASS' is required. Please install it.")
  if (!requireNamespace("nnet", quietly = TRUE)) stop("Package 'nnet' is required. Please install it.")
  if (!requireNamespace("moments", quietly = TRUE)) stop("Package 'moments' is required. Please install it.")
  
  if (!is.data.frame(data)) stop("'data' must be a data.frame.")
  if (!is.character(group) || length(group) != 1 || !(group %in% names(data))) {
    stop("'group' must be a single string naming an existing column in 'data'.")
  }
  
  if (!is.null(covariates)) {
    if (!is.character(covariates) || any(!covariates %in% names(data))) {
      stop("'covariates' must be NULL or a character vector of existing column names in 'data'.")
    }
    if (any(covariates %in% group)) stop("The 'group' variable cannot also be listed in 'covariates'.")
    covariates <- unique(covariates)
  } else {
    covariates <- character(0)
  }
  
  if (!is.null(vars_to_test)) {
    if (!is.character(vars_to_test) || any(!vars_to_test %in% names(data))) {
      stop("'vars_to_test' must be NULL or a character vector of existing column names in 'data'.")
    }
    if (any(vars_to_test %in% c(group, covariates))) stop("Variables in 'vars_to_test' cannot include the 'group' or 'covariates'.")
    vars_to_test <- unique(vars_to_test)
  }
  
  if (!is.null(group_ref) && (!is.character(group_ref) || length(group_ref) != 1)) {
    stop("'group_ref' must be NULL or a single string.")
  }
  
  if (!is.numeric(alpha) || alpha <= 0 || alpha >= 1) stop("'alpha' must be a numeric value strictly between 0 and 1.")
  if (!is.numeric(zero_threshold) || zero_threshold < 0 || zero_threshold > 1) stop("'zero_threshold' must be numeric between 0 and 1.")
  if (!is.numeric(skew_threshold) || skew_threshold < 0) stop("'skew_threshold' must be non-negative numeric.")
  if (!is.logical(verbose)) stop("'verbose' must be a logical value (TRUE or FALSE).")
  
  # --- 2. Prepare Data and Identify Variables ---
  if (!is.factor(data[[group]])) {
    if (verbose) message("Converting grouping variable '", group, "' to factor.")
    data[[group]] <- as.factor(data[[group]])
  }
  if (nlevels(data[[group]]) < 2) stop("Grouping variable '", group, "' must have at least two levels.")
  
  if (!is.null(group_ref)) {
    if (!(group_ref %in% levels(data[[group]]))) {
      stop(sprintf("'group_ref' '%s' is not a level of the group variable '%s'.", group_ref, group))
    }
  }
  
  if (is.null(vars_to_test)) {
    var_names <- setdiff(names(data), c(group, covariates))
  } else {
    var_names <- vars_to_test
  }
  
  if (length(var_names) == 0) {
    warning("No variables identified for testing after excluding 'group' and 'covariates'.")
    return(data.frame(Variable = character(), Type = character(), Covariates_Used = character(),
                      n_obs = integer(), Status = character(), p_value = character(),
                      p_value_FDR = character(), Significant = character(),
                      Gamma_Shift_Warning = character(), Convergence_Warning = character(),
                      stringsAsFactors = FALSE))
  }
  
  if (verbose) {
    message(sprintf("Starting analysis of %d variables with '%s' as the grouping variable.",
                    length(var_names), group))
    if (length(covariates) > 0) {
      message(sprintf("Using %d covariates: %s", length(covariates), paste(covariates, collapse = ", ")))
    }
  }
  
  # --- 3. Internal Helper Function: Model Fitting and LRT ---
  fit_models_and_lrt <- function(temp_data, formula_full, formula_null, type, variable, verbose,
                                 zero_threshold, skew_threshold) {
    gamma_shift_warning <- FALSE
    convergence_warning <- FALSE
    p_val <- NA_real_
    model_status <- "OK"
    type_used <- type
    data_for_model <- temp_data
    outcome <- data_for_model$y
    
    res <- tryCatch({
      if (type == "Continuous") {
        zero_prop <- mean(outcome == 0, na.rm = TRUE)
        skew_val <- tryCatch(moments::skewness(outcome, na.rm = TRUE), error = function(e) NA_real_)
        use_gamma <- !is.na(skew_val) && (zero_prop > zero_threshold) && (abs(skew_val) > skew_threshold)
        if (is.na(skew_val) && verbose) {
          message(sprintf("Skewness calculation failed for '%s'. Using Gaussian GLM.", variable))
        }
        if (use_gamma) {
          min_val <- min(outcome, na.rm = TRUE)
          if (min_val <= 0) {
            shift_amount <- max(1e-6, abs(min_val) * 1.01 + 1e-6)
            outcome <- outcome + shift_amount
            data_for_model$y <- outcome
            gamma_shift_warning <- TRUE
            if (verbose) warning(sprintf("Variable '%s': Contains non-positive values. Added shift of ~%s for Gamma GLM fitting.",
                                         variable, format(shift_amount, digits = 2)), call. = FALSE, immediate. = TRUE)
          }
          model_full <- stats::glm(formula = formula_full, data = data_for_model, family = stats::Gamma(link = "log"))
          model_null <- stats::glm(formula = formula_null, data = data_for_model, family = stats::Gamma(link = "log"))
          type_used <- "Continuous (Gamma GLM used)"
          if (!model_full$converged || !model_null$converged) {
            convergence_warning <- TRUE
            if (verbose) warning(sprintf("Gamma GLM did not converge for '%s'.", variable), call. = FALSE)
          }
        } else {
          model_full <- stats::glm(formula = formula_full, data = data_for_model, family = stats::gaussian(link = "identity"))
          model_null <- stats::glm(formula = formula_null, data = data_for_model, family = stats::gaussian(link = "identity"))
          type_used <- "Continuous (Gaussian GLM used)"
          if (!model_full$converged || !model_null$converged) {
            convergence_warning <- TRUE
            if (verbose) warning(sprintf("Gaussian GLM did not converge for '%s'.", variable), call. = FALSE)
          }
        }
      } else if (type == "Binary") {
        model_full <- stats::glm(formula = formula_full, data = data_for_model, family = stats::binomial(link = "logit"))
        model_null <- stats::glm(formula = formula_null, data = data_for_model, family = stats::binomial(link = "logit"))
        type_used <- "Binary"
        if (!model_full$converged || !model_null$converged) {
          convergence_warning <- TRUE
          if (verbose) warning(sprintf("Binomial GLM did not converge for '%s'.", variable), call. = FALSE)
        }
      } else if (type == "Ordinal") {
        model_full <- MASS::polr(formula = formula_full, data = data_for_model, method = "logistic", Hess = TRUE)
        model_null <- MASS::polr(formula = formula_null, data = data_for_model, method = "logistic", Hess = TRUE)
        type_used <- "Ordinal"
      } else if (type == "Categorical") {
        model_full <- nnet::multinom(formula = formula_full, data = data_for_model, trace = FALSE)
        model_null <- nnet::multinom(formula = formula_null, data = data_for_model, trace = FALSE)
        type_used <- "Categorical"
        if ((!is.null(model_full$convergence) && model_full$convergence != 0) ||
            (!is.null(model_null$convergence) && model_null$convergence != 0)) {
          convergence_warning <- TRUE
          if (verbose) warning(sprintf("Multinomial model did not fully converge for '%s'.", variable), call. = FALSE)
        }
      } else {
        stop("Internal error: Unknown model type.")
      }
      
      # --- Likelihood Ratio Test ---
      if (inherits(model_full, "glm")) {
        lrt_result <- suppressWarnings(stats::anova(model_null, model_full, test = "LRT"))
        if (!is.null(lrt_result) && "Pr(>Chi)" %in% names(lrt_result) && nrow(lrt_result) > 1) {
          candidate <- lrt_result$"Pr(>Chi)"[2]
          if (!is.na(candidate) && is.finite(candidate)) {
            p_val <- candidate
          } else {
            if (!is.na(lrt_result$Df[2]) && lrt_result$Df[2] <= 0) {
              model_status <- "No DF improvement/equivalent models"
              p_val <- 1.0
            } else {
              model_status <- "LRT p-value calculation failed (anova)"
              p_val <- NA_real_
            }
          }
        } else {
          model_status <- "LRT failed (anova output invalid/models identical)"
          p_val <- NA_real_
        }
      } else {
        ll_full <- tryCatch(stats::logLik(model_full), error = function(e) NA)
        ll_null <- tryCatch(stats::logLik(model_null), error = function(e) NA)
        if (is.na(ll_full) || is.na(ll_null)) {
          model_status <- "Log-Likelihood calculation failed (NA/NaN)"
          if (inherits(model_full, "polr")) {
            model_status <- paste(model_status, "- potential polr convergence?")
            convergence_warning <- TRUE
          }
          p_val <- NA_real_
        } else {
          lr_stat <- 2 * (as.numeric(ll_full) - as.numeric(ll_null))
          df_full <- attr(ll_full, "df")
          df_null <- attr(ll_null, "df")
          if (is.null(df_full) || is.null(df_null)) {
            model_status <- "Could not retrieve degrees of freedom for LRT"
            p_val <- NA_real_
          } else {
            df_diff <- df_full - df_null
            if (df_diff > .Machine$double.eps^0.5) {
              if (lr_stat < -1e-8) {
                model_status <- sprintf("Negative LRT statistic (%.2g)", lr_stat)
                p_val <- NA_real_
              } else {
                p_val <- stats::pchisq(max(0, lr_stat), df = df_diff, lower.tail = FALSE)
              }
            } else {
              p_val <- 1.0
              model_status <- "No DF improvement"
            }
          }
        }
      }
      
      list(p_value = p_val, model_status = model_status, type_used = type_used,
           gamma_shift_warning = gamma_shift_warning, convergence_warning = convergence_warning)
    }, error = function(e) {
      msg <- paste("Error:", gsub("[\\r\\n\\t]+", " ", conditionMessage(e)))
      if (verbose) message(sprintf("Model fitting/LRT failed for '%s'. Status: %s", variable, msg))
      list(p_value = NA_real_, model_status = msg, type_used = type,
           gamma_shift_warning = gamma_shift_warning, convergence_warning = convergence_warning)
    })
    
    return(res)
  }
  
  # --- 4. Loop Through Variables ---
  res_list <- vector("list", length(var_names))
  names(res_list) <- var_names
  
  for (i in seq_along(var_names)) {
    variable <- var_names[i]
    # Report roughly every 10% of variables; (i - 1) %% step also works when step is 1
    if (verbose && length(var_names) > 5 && (i - 1) %% ceiling(length(var_names) / 10) == 0) {
      message(sprintf("Processing variable %d of %d: %s", i, length(var_names), variable))
    }
    
    # Initialize default result for the variable
    res <- data.frame(
      Variable = variable,
      Type = "Unknown",
      Covariates_Used = paste(covariates, collapse = ", "),
      n_obs = 0L,
      Status = "Not processed",
      p_value = NA_real_,
      p_value_FDR = NA_real_,
      Significant = "No",
      Gamma_Shift_Warning = "No",
      Convergence_Warning = "No",
      stringsAsFactors = FALSE
    )
    
    # --- Data Extraction and Missing Data Handling ---
    current_cols <- c(variable, group, covariates)
    full_data <- tryCatch(data[, current_cols, drop = FALSE], error = function(e) NULL)
    if (is.null(full_data)) {
      res$Status <- paste("Error accessing column:", variable)
      res_list[[variable]] <- res
      next
    }
    
    complete_idx <- stats::complete.cases(full_data)
    n_complete <- sum(complete_idx)
    res$n_obs <- n_complete
    min_obs_needed <- (nlevels(data[[group]]) - 1) + length(covariates) + nlevels(data[[group]]) + 5
    
    if (n_complete < min_obs_needed || n_complete < (2 * nlevels(data[[group]]))) {
      res$Status <- "Low N or too many NAs for model complexity"
      res_list[[variable]] <- res
      next
    }
    
    temp_data <- full_data[complete_idx, , drop = FALSE]
    names(temp_data)[names(temp_data) == variable] <- "y"
    temp_data[[group]] <- factor(temp_data[[group]])
    
    # Check if group has at least two levels after removing NAs
    if (nlevels(temp_data[[group]]) < 2) {
      res$Status <- "Grouping variable has less than 2 levels after removing NAs"
      res_list[[variable]] <- res
      next
    }
    
    # Set reference level for group if specified and present
    if (!is.null(group_ref) && group_ref %in% levels(temp_data[[group]])) {
      temp_data[[group]] <- relevel(temp_data[[group]], ref = group_ref)
    } else if (!is.null(group_ref)) {
      warning(sprintf("For variable '%s', the specified 'group_ref' '%s' is not present in the data after removing NAs. Using default reference level.", variable, group_ref))
    }
    
    x <- temp_data$y
    
    # --- Check for Constant Outcome Variable ---
    is_constant <- FALSE
    if (is.numeric(x)) {
      if (is.na(stats::sd(x, na.rm = TRUE)) || stats::sd(x, na.rm = TRUE) < .Machine$double.eps^0.5) {
        is_constant <- TRUE
      }
    } else {
      if (length(unique(stats::na.omit(x))) <= 1) is_constant <- TRUE
    }
    if (is_constant) {
      res$Status <- "Constant Outcome Variable"
      res$Type <- if (is.numeric(x)) "Numeric Constant" else "Factor/Char Constant"
      res_list[[variable]] <- res
      next
    }
    
    # --- Determine Outcome Variable Type ---
    type <- "Unknown"
    if (is.numeric(x)) {
      unique_vals <- unique(stats::na.omit(x))
      n_unique <- length(unique_vals)
      is_int_like_binary <- n_unique == 2 && all(abs(unique_vals - round(unique_vals)) < .Machine$double.eps^0.5)
      if (is_int_like_binary) {
        type <- "Binary"
        temp_data$y <- factor(temp_data$y, levels = sort(unique_vals))
      } else {
        type <- "Continuous"
      }
    } else if (is.factor(x)) {
      current_levels <- levels(droplevels(factor(x)))
      n_levels <- length(current_levels)
      original_levels <- levels(data[[variable]])
      if (is.ordered(x)) {
        if (n_levels < 2) {
          res$Status <- "Ordinal outcome with < 2 levels after NA removal"
          res_list[[variable]] <- res
          next
        }
        type <- "Ordinal"
        temp_data$y <- factor(temp_data$y, levels = intersect(original_levels, current_levels), ordered = TRUE)
      } else {
        if (n_levels == 2) {
          type <- "Binary"
          temp_data$y <- factor(temp_data$y, levels = current_levels)
        } else if (n_levels > 2) {
          type <- "Categorical"
          temp_data$y <- stats::relevel(factor(temp_data$y, levels = current_levels), ref = current_levels[1])
        } else {
          res$Status <- "Factor outcome with < 2 levels after NA removal"
          res_list[[variable]] <- res
          next
        }
      }
    } else if (is.character(x)) {
      temp_data$y <- factor(temp_data$y)
      current_levels <- levels(temp_data$y)
      n_levels <- length(current_levels)
      if (n_levels == 2) {
        type <- "Binary"
      } else if (n_levels > 2) {
        type <- "Categorical"
        temp_data$y <- stats::relevel(temp_data$y, ref = current_levels[1])
      } else {
        res$Status <- "Character outcome with < 2 levels after NA removal"
        res_list[[variable]] <- res
        next
      }
    } else {
      res$Status <- "Unsupported outcome variable type"
      res$Type <- class(x)[1]
      res_list[[variable]] <- res
      next
    }
    
    res$Type <- type
    
    # --- Ensure Covariates are Factor or Numeric ---
    for (covar in covariates) {
      if (!is.numeric(temp_data[[covar]]) && !is.factor(temp_data[[covar]])) {
        if (verbose) message("Converting covariate '", covar, "' to factor for variable '", variable, "'.")
        temp_data[[covar]] <- factor(temp_data[[covar]])
      }
    }
    
    # --- Construct Model Formulas ---
    terms_full <- c(group, covariates)
    terms_null <- if (length(covariates) > 0) covariates else "1"
    formula_full <- tryCatch(stats::reformulate(termlabels = terms_full, response = "y"),
                             error = function(e) NULL)
    formula_null <- tryCatch(stats::reformulate(termlabels = terms_null, response = "y"),
                             error = function(e) NULL)
    if (is.null(formula_full) || is.null(formula_null)) {
      res$Status <- "Error: Failed to construct model formulas"
      res_list[[variable]] <- res
      next
    }
    
    # --- Fit Models and Perform LRT via Helper Function ---
    model_out <- fit_models_and_lrt(temp_data, formula_full, formula_null, type, variable, verbose,
                                    zero_threshold, skew_threshold)
    res$p_value <- model_out$p_value
    res$Status <- model_out$model_status
    res$Type <- model_out$type_used
    if (model_out$gamma_shift_warning) res$Gamma_Shift_Warning <- "Yes"
    if (model_out$convergence_warning) res$Convergence_Warning <- "Yes"
    res_list[[variable]] <- res
    
    # --- Periodic Garbage Collection ---
    if (i %% 50 == 0 && (nrow(data) * ncol(data) > 1e6 || length(var_names) > 100)) {
      if (verbose) message("Running garbage collection...")
      gc(verbose = FALSE)
    }
  }
  
  # --- 5. Combine and Adjust Results ---
  res_df <- do.call(rbind, res_list)
  rownames(res_df) <- NULL
  valid_idx <- which(!is.na(res_df$p_value) & is.finite(res_df$p_value))
  if (length(valid_idx) > 0) {
    res_df$p_value_FDR[valid_idx] <- stats::p.adjust(res_df$p_value[valid_idx], method = "fdr")
    res_df$Significant <- ifelse(!is.na(res_df$p_value_FDR) &
                                   res_df$p_value_FDR < alpha, "Yes", "No")
    res_df$Significant[is.na(res_df$p_value)] <- "No"
  } else {
    res_df$p_value_FDR <- NA_real_
    res_df$Significant <- "No"
  }
  
  # --- 6. Round and Format P-values ---
  # Keep missing p-values as NA rather than the literal string "NA"
  format_p <- function(p) {
    ifelse(is.na(p), NA_character_, ifelse(p < 0.001, "<0.001", sprintf("%.3f", p)))
  }
  res_df$p_value <- format_p(res_df$p_value)
  res_df$p_value_FDR <- format_p(res_df$p_value_FDR)
  
  final_order <- c("Variable", "Type", "Covariates_Used", "n_obs", "Status",
                   "p_value", "p_value_FDR", "Significant",
                   "Gamma_Shift_Warning", "Convergence_Warning")
  if (!all(final_order %in% names(res_df))) {
    warning("Internal issue: Some expected result columns are missing.")
  } else {
    res_df <- res_df[, final_order, drop = FALSE]
  }
  
  if (verbose) message("Analysis finished. Returning results table.")
  return(res_df)
}
#' Compare Groups Across Multiple Variables Automatically, Adjusting for Covariates
#'
#' This function iterates through specified variables in a dataset, determines their type,
#' fits appropriate generalized linear models (GLM), ordinal, or multinomial models,
#' and compares a model with the group predictor (and optional covariates) to a
#' null model (with optional covariates only) using a Likelihood Ratio Test (LRT).
#' It reports p-values and FDR-adjusted p-values for the group effect, adjusted for covariates.
#'
#' @param data A data.frame containing the variables to compare, the grouping variable,
#'   and any covariates.
#' @param group A character string specifying the name of the column in `data` that
#'   contains the grouping factor. Must have at least 2 levels.
#' @param vars_to_test A character vector specifying the names of the outcome variables
#'   in `data` to be tested against the `group`. If NULL (default), all columns
#'   except `group` and `covariates` are tested.
#' @param covariates A character vector specifying the names of the columns in `data` to
#'   be used as covariates in the regression models. Default is `NULL` (no covariates).
#'   These variables will be included in both the full and null models.
#' @param alpha The significance level (default 0.05) used to determine significance
#'   based on the FDR-adjusted p-value of the group effect.
#' @param zero_threshold For continuous outcome variables, the minimum proportion of zero values
#'   (default 0.3) to consider potentially using a Gamma GLM (along with skewness).
#' @param skew_threshold For continuous outcome variables, the minimum absolute skewness
#'   (default 2) to consider potentially using a Gamma GLM (along with zero proportion).
#'   Note: Uses `moments::skewness`.
#' @param verbose Logical indicating whether to print progress messages (default TRUE).
#' @param group_ref Optional. A string specifying the reference level for the group variable.
#'   If NULL (default), the first level (after NA removal and potential factor conversion) is used.
#'
#' @return A data.frame summarizing the results for each variable tested, including:
#'   \item{Variable}{Name of the outcome variable tested.}
#'   \item{Type}{Detected type of the outcome variable (e.g., "Continuous (Gaussian GLM used)", "Continuous (Gamma GLM used)", "Binary", "Ordinal", "Categorical"; constant outcomes are reported as "Numeric Constant" or "Factor/Char Constant").}
#'   \item{Covariates_Used}{Comma-separated string of covariates included in the models for this variable.}
#'   \item{Group_Ref_Level_Used}{The actual reference level used for the group factor in the model for this variable.}
#'   \item{n_obs}{Number of observations used for the test after handling missing data for the outcome, group, and all covariates.}
#'   \item{Status}{Outcome of the modeling ("OK", "Error: [message]", "Constant Outcome Variable", "No DF improvement", "Low N or too many NAs...", "Unsupported outcome variable type", "Log-Likelihood calculation failed...").}
#'   \item{p_value}{The raw p-value from the Likelihood Ratio Test for the `group` effect, formatted as a string.}
#'   \item{p_value_FDR}{The group p-value adjusted for multiple comparisons using the Benjamini-Hochberg FDR method, formatted as a string.}
#'   \item{Significant}{"Yes" if p_value_FDR < alpha, "No" otherwise (based on non-NA adjusted p-values).}
#'   \item{Gamma_Shift_Warning}{Logical (`TRUE`/`FALSE`) indicating if non-positive outcome values were shifted for the Gamma GLM.}
#'   \item{Convergence_Warning}{Logical (`TRUE`/`FALSE`) indicating if GLM/Multinom models reported non-convergence or if logLik calculation failed for Ordinal models (proxy for potential issues).}
#'
#' @details
#' - **Covariate Adjustment:** Compares `model(y ~ group + covariates)` vs `model(y ~ covariates)` (or `y ~ group` vs `y ~ 1` if no covariates).
#' - **Variable Selection:** Uses `vars_to_test` or defaults to all other variables.
#' - **Variable Types & Modeling:** Automatically detects outcome type. Uses Gaussian GLM (default continuous), Gamma GLM (continuous outcomes that exceed both the zero-proportion and skewness thresholds), Binomial GLM (binary), `MASS::polr` (ordinal), `nnet::multinom` (categorical). Character covariates are converted to factors; all other covariates are used as-is (ensure they are factor or numeric).
#' - **Convergence:** Checks convergence flags for GLM/Multinom. For Ordinal (`polr`), failure to calculate log-likelihood is used as a proxy for potential convergence/Hessian issues (as `polr` lacks a simple flag) and flagged. Note that binomial/multinomial models may also issue warnings or fail related to separation/quasi-separation, which might be reflected in convergence status or errors.
#' - **Missing Data:** Uses complete cases for outcome, group, and all covariates per test.
#' - **Minimum Observations:** A variable is skipped unless `n_complete >= params_approx + 5` and `n_complete >= 2 * (number of group levels)`, where `params_approx` is intercept + (group levels - 1) + number of covariates. Each covariate is counted as one parameter, so factor covariates with many levels are undercounted. This is a safeguard, not a guarantee of model stability.
#' - **Group Reference Level:** Users can specify the reference level for the group variable using `group_ref`. If the specified level is not present in the data for a particular variable after removing NAs, a warning is issued, and the default reference level (first level of the factor in the subset) is used. The actual level used is reported.
#' - **Dependencies:** Requires 'MASS', 'nnet', and 'moments' packages.
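#'
#' A minimal sketch of the comparison for a single continuous outcome, for illustration only
#' (`d`, `y`, `grp`, and `age` are hypothetical names, not objects used by this function):
#' \preformatted{
#' d <- data.frame(y = rnorm(60), grp = gl(3, 20), age = runif(60, 20, 80))
#' full <- glm(y ~ grp + age, data = d)  # group + covariate
#' null <- glm(y ~ age, data = d)        # covariate only
#' anova(null, full, test = "LRT")       # second row holds the p-value for grp
#' }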
#'
#' @examples
#' \dontrun{
#' set.seed(123)
#' sample_data <- data.frame(
#'   Site = factor(rep(c("Site1", "Site2", "Site3"), each = 40)),
#'   Age = rnorm(120, mean = 50, sd = 10),
#'   Education = sample(10:20, 120, replace = TRUE),
#'   Sex = factor(sample(c("M", "F"), 120, replace = TRUE)),
#'   Diagnosis = factor(sample(c("Normal", "MCI", "AD"), 120, replace = TRUE, prob = c(0.6, 0.3, 0.1))),
#'   OrdinalResp = factor(sample(1:5, 120, replace = TRUE), ordered = TRUE),
#'   ConstantVar = rep(10, 120),
#'   LowNVar = c(rnorm(5), rep(NA, 115)) # Variable with few non-NA cases
#' )
#' # Columns that depend on Age and Sex are added afterwards: arguments to
#' # data.frame() cannot refer to sibling columns defined in the same call.
#' sample_data$CognitiveScore <- rnorm(120, mean = rep(c(100, 105, 102), each = 40) +
#'                                       0.1 * sample_data$Age - 5 * (sample_data$Sex == "F"))
#' sample_data$BiomarkerA <- rgamma(120, shape = rep(c(2, 3, 2.5), each = 40) +
#'                                    0.05 * sample_data$Age, scale = 1.5)
#' sample_data$Age[sample(1:120, 10)] <- NA
#' sample_data$CognitiveScore[sample(1:120, 5)] <- NA
#' sample_data$BiomarkerA[1:5] <- NA
#' # Make Site3 rare for CognitiveScore after NA removal
#' sample_data$CognitiveScore[sample(which(sample_data$Site == "Site3"), 35)] <- NA
#'
#' results_final <- compare_groups_auto_v4(
#'     data = sample_data,
#'     group = "Site",
#'     # Test all relevant variables
#'     vars_to_test = c("CognitiveScore", "BiomarkerA", "Diagnosis",
#'                      "OrdinalResp", "ConstantVar", "LowNVar", "Age"),
#'     covariates = c("Education", "Sex"), # Using fewer covariates for example
#'     group_ref = "Site2", # Specify reference
#'     verbose = TRUE
#' )
#' print(results_final)
#' }
#' @importFrom stats glm gaussian binomial Gamma logLik pchisq p.adjust complete.cases sd anova reformulate relevel na.omit
#' @importFrom MASS polr
#' @importFrom nnet multinom
#' @importFrom moments skewness
#' @export
compare_groups_auto_v4 <- function(data, group, vars_to_test = NULL, covariates = NULL, alpha = 0.05,
                                   zero_threshold = 0.3, skew_threshold = 2, verbose = TRUE, group_ref = NULL) {
  
  # --- 1. Input Validation and Package Checks ---
  # Ensure required packages are available
  if (!requireNamespace("MASS", quietly = TRUE)) stop("Package 'MASS' is required. Please install it.")
  if (!requireNamespace("nnet", quietly = TRUE)) stop("Package 'nnet' is required. Please install it.")
  if (!requireNamespace("moments", quietly = TRUE)) stop("Package 'moments' is required. Please install it.")
  
  # Validate inputs
  if (!is.data.frame(data)) stop("'data' must be a data.frame.")
  if (!is.character(group) || length(group) != 1 || !(group %in% names(data))) {
    stop("'group' must be a single string naming an existing column in 'data'.")
  }
  
  if (!is.null(covariates)) {
    if (!is.character(covariates) || any(!covariates %in% names(data))) {
      stop("'covariates' must be NULL or a character vector of existing column names in 'data'.")
    }
    if (any(covariates %in% group)) stop("The 'group' variable cannot also be listed in 'covariates'.")
    covariates <- unique(covariates) # Remove duplicates
  } else {
    covariates <- character(0) # Ensure it's an empty character vector if NULL
  }
  
  if (!is.null(vars_to_test)) {
    if (!is.character(vars_to_test) || any(!vars_to_test %in% names(data))) {
      stop("'vars_to_test' must be NULL or a character vector of existing column names in 'data'.")
    }
    if (any(vars_to_test %in% c(group, covariates))) stop("Variables in 'vars_to_test' cannot include the 'group' or 'covariates'.")
    vars_to_test <- unique(vars_to_test) # Remove duplicates
  }
  
  if (!is.null(group_ref) && (!is.character(group_ref) || length(group_ref) != 1)) {
    stop("'group_ref' must be NULL or a single string.")
  }
  
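  # Validate scalar parameters (length and NA checks keep `||` from failing on vectors or NA)
  if (!is.numeric(alpha) || length(alpha) != 1 || is.na(alpha) || alpha <= 0 || alpha >= 1) stop("'alpha' must be a single numeric value strictly between 0 and 1.")
  if (!is.numeric(zero_threshold) || length(zero_threshold) != 1 || is.na(zero_threshold) || zero_threshold < 0 || zero_threshold > 1) stop("'zero_threshold' must be a single numeric value between 0 and 1.")
  if (!is.numeric(skew_threshold) || length(skew_threshold) != 1 || is.na(skew_threshold) || skew_threshold < 0) stop("'skew_threshold' must be a single non-negative numeric value.")
  if (!is.logical(verbose) || length(verbose) != 1 || is.na(verbose)) stop("'verbose' must be a single logical value (TRUE or FALSE).")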
  
  # --- 2. Prepare Data and Identify Variables ---
  # Ensure group variable is a factor
  if (!is.factor(data[[group]])) {
    if (verbose) message("Converting grouping variable '", group, "' to factor.")
    data[[group]] <- as.factor(data[[group]])
  }
  original_group_levels <- levels(data[[group]])
  if (length(original_group_levels) < 2) stop("Grouping variable '", group, "' must have at least two levels in the original data.")
  
  # Validate group_ref against original levels
  if (!is.null(group_ref)) {
    if (!(group_ref %in% original_group_levels)) {
      stop(sprintf("'group_ref' '%s' is not a level of the original group variable '%s'. Available levels: %s",
                   group_ref, group, paste(original_group_levels, collapse=", ")))
    }
  }
  
  # Identify variables to test
  if (is.null(vars_to_test)) {
    var_names <- base::setdiff(names(data), c(group, covariates))
  } else {
    var_names <- vars_to_test # Already validated and made unique
  }
  
  # Handle case with no variables to test
  if (length(var_names) == 0) {
    warning("No variables identified for testing after excluding 'group' and 'covariates'.")
    return(data.frame(Variable = character(), Type = character(), Covariates_Used = character(),
                      Group_Ref_Level_Used = character(), n_obs = integer(), Status = character(),
                      p_value = character(), p_value_FDR = character(), Significant = character(),
                      Gamma_Shift_Warning = logical(), Convergence_Warning = logical(),
                      stringsAsFactors = FALSE))
  }
  
  # Initial message
  if (verbose) {
    message(sprintf("Starting analysis of %d variables with '%s' as the grouping variable.",
                    length(var_names), group))
    if (length(covariates) > 0) {
      message(sprintf("Using %d covariates: %s", length(covariates), paste(covariates, collapse = ", ")))
    }
    if (!is.null(group_ref)) message(sprintf("Attempting to use '%s' as the reference level for '%s'.", group_ref, group))
  }
  
  # --- 3. Internal Helper Function: Model Fitting and LRT ---
  # This function isolates the core modeling logic for a single variable
  fit_models_and_lrt <- function(temp_data, formula_full, formula_null, type, variable, verbose,
                                 zero_threshold, skew_threshold) {
    # Initialize flags and results
    gamma_shift_warning <- FALSE
    convergence_warning <- FALSE
    p_val <- NA_real_
    model_status <- "OK"
    type_used <- type
    data_for_model <- temp_data # Local binding; R copies on modify, so the caller's data is never altered
    outcome <- data_for_model$y
    
    # Use tryCatch to handle potential errors during model fitting/comparison
    res <- tryCatch({
      # --- Model Fitting based on Variable Type ---
      if (type == "Continuous") {
        zero_prop <- base::mean(outcome == 0, na.rm = TRUE)
        # Use moments::skewness safely
        skew_val <- tryCatch(moments::skewness(outcome, na.rm = TRUE), error = function(e) NA_real_)
        # Decide whether to use Gamma GLM based on thresholds
        use_gamma <- !is.na(skew_val) && (zero_prop > zero_threshold) && (abs(skew_val) > skew_threshold)
        if (is.na(skew_val) && verbose) {
          message(sprintf("Skewness calculation failed for '%s'. Using Gaussian GLM.", variable))
        }
        
        if (use_gamma) {
          type_used <- "Continuous (Gamma GLM used)"
          min_val <- base::min(outcome, na.rm = TRUE)
          # Shift non-positive values for Gamma GLM
          if (min_val <= 0) {
            shift_amount <- base::max(1e-6, abs(min_val) * 1.01 + 1e-6) # Small relative shift
            outcome <- outcome + shift_amount
            data_for_model$y <- outcome # Update outcome in data used for modeling
            gamma_shift_warning <- TRUE
            if (verbose) warning(sprintf("Variable '%s': Contains non-positive values. Added shift of ~%s for Gamma GLM fitting.",
                                         variable, format(shift_amount, digits = 2)), call. = FALSE, immediate. = TRUE)
          }
          # Fit Gamma GLMs
          model_full <- stats::glm(formula = formula_full, data = data_for_model, family = stats::Gamma(link = "log"))
          model_null <- stats::glm(formula = formula_null, data = data_for_model, family = stats::Gamma(link = "log"))
          if (!model_full$converged || !model_null$converged) {
            convergence_warning <- TRUE
            if (verbose) warning(sprintf("Gamma GLM did not converge for '%s'.", variable), call. = FALSE)
          }
        } else {
          type_used <- "Continuous (Gaussian GLM used)"
          # Fit Gaussian GLMs
          model_full <- stats::glm(formula = formula_full, data = data_for_model, family = stats::gaussian(link = "identity"))
          model_null <- stats::glm(formula = formula_null, data = data_for_model, family = stats::gaussian(link = "identity"))
          if (!model_full$converged || !model_null$converged) {
            convergence_warning <- TRUE
            if (verbose) warning(sprintf("Gaussian GLM did not converge for '%s'.", variable), call. = FALSE)
          }
        }
      } else if (type == "Binary") {
        type_used <- "Binary"
        # Fit Binomial GLMs
        model_full <- stats::glm(formula = formula_full, data = data_for_model, family = stats::binomial(link = "logit"))
        model_null <- stats::glm(formula = formula_null, data = data_for_model, family = stats::binomial(link = "logit"))
        if (!model_full$converged || !model_null$converged) {
          convergence_warning <- TRUE
          if (verbose) warning(sprintf("Binomial GLM did not converge for '%s'.", variable), call. = FALSE)
        }
      } else if (type == "Ordinal") {
        type_used <- "Ordinal"
        # Fit Proportional Odds Logistic Regression (POLR) models
        model_full <- MASS::polr(formula = formula_full, data = data_for_model, method = "logistic", Hess = TRUE)
        model_null <- MASS::polr(formula = formula_null, data = data_for_model, method = "logistic", Hess = TRUE)
        # Convergence check for polr relies on logLik below
      } else if (type == "Categorical") {
        type_used <- "Categorical"
        # Fit Multinomial Log-linear Models
        # Ensure baseline category is explicit (handled in main loop via relevel)
        model_full <- nnet::multinom(formula = formula_full, data = data_for_model, trace = FALSE)
        model_null <- nnet::multinom(formula = formula_null, data = data_for_model, trace = FALSE)
        # Check nnet convergence flag (0 indicates success)
        if ((!is.null(model_full$convergence) && model_full$convergence != 0) ||
            (!is.null(model_null$convergence) && model_null$convergence != 0)) {
          convergence_warning <- TRUE
          if (verbose) warning(sprintf("Multinomial model did not fully converge for '%s'. Code: %s (full), %s (null).",
                                       variable, model_full$convergence, model_null$convergence), call. = FALSE)
        }
      } else {
        # This case should not be reached due to prior checks
        stop("Internal error: Unknown model type specified.")
      }
      
      # --- Likelihood Ratio Test (LRT) ---
      # Use stats::anova for GLM objects, manual logLik comparison otherwise
      if (inherits(model_full, "glm")) {
        # Suppress warnings from anova (e.g., about non-integer df) - check output carefully
        lrt_result <- suppressWarnings(stats::anova(model_null, model_full, test = "LRT"))
        # Validate the anova output
        if (!is.null(lrt_result) && "Pr(>Chi)" %in% names(lrt_result) && nrow(lrt_result) > 1) {
          candidate_p <- lrt_result$"Pr(>Chi)"[2]
          candidate_df <- lrt_result$Df[2] # Df difference
          # Check if Df difference is valid (NA or non-positive indicates issues)
          if(!is.na(candidate_df) && candidate_df <= 0) {
            model_status <- "No DF improvement/equivalent models (anova)"
            p_val <- 1.0 # Models are equivalent or null is more complex
          } else if (!is.na(candidate_p) && is.finite(candidate_p)) {
            p_val <- candidate_p # Valid p-value from anova
          } else {
            # P-value calculation failed within anova
            model_status <- "LRT p-value calculation failed (anova returned NA/NaN)"
            p_val <- NA_real_
          }
        } else {
          # Anova failed to produce expected output
          model_status <- "LRT failed (anova output invalid or models identical?)"
          # Consider if models are identical, maybe p=1? But anova failure suggests other issues.
          p_val <- NA_real_
        }
      } else { # Manual LRT for polr, multinom
        # Safely get log-likelihoods
        ll_full <- tryCatch(stats::logLik(model_full), error = function(e) NA)
        ll_null <- tryCatch(stats::logLik(model_null), error = function(e) NA)
        
        # Check if logLik calculation succeeded
        if (is.na(ll_full) || is.na(ll_null)) {
          model_status <- "Log-Likelihood calculation failed (NA/NaN)"
          if (inherits(model_full, "polr")) {
            # For polr, logLik failure often indicates fitting issues (e.g., Hessian non-positive definite)
            # Use this as a proxy for convergence/fitting problems.
            model_status <- paste(model_status, "- potential polr convergence/Hessian issue?")
            convergence_warning <- TRUE # Set convergence flag as proxy
          }
          p_val <- NA_real_
        } else {
          # Calculate LRT statistic and degrees of freedom difference
          lr_stat <- 2 * (as.numeric(ll_full) - as.numeric(ll_null))
          df_full <- attr(ll_full, "df")
          df_null <- attr(ll_null, "df")
          
          if (is.null(df_full) || is.null(df_null)) {
            # Should not happen if logLik worked, but check anyway
            model_status <- "Could not retrieve degrees of freedom for LRT"
            p_val <- NA_real_
          } else {
            df_diff <- df_full - df_null
            # Check if full model has more parameters (allow for floating point noise)
            if (df_diff > sqrt(.Machine$double.eps)) {
              # Check for negative LRT statistic (usually indicates fitting error or numerical instability)
              if (lr_stat < -sqrt(.Machine$double.eps)) {
                model_status <- sprintf("Warning: Negative LRT statistic (%.2g), check model fits/convergence.", lr_stat)
                p_val <- NA_real_ # P-value is unreliable
              } else {
                # Calculate p-value using chi-squared distribution
                # Ensure LRT statistic is non-negative before calculating p-value
                p_val <- stats::pchisq(max(0, lr_stat), df = df_diff, lower.tail = FALSE)
              }
            } else {
              # Full model does not have more parameters than null model
              p_val <- 1.0 # No evidence against null hypothesis based on complexity
              model_status <- "No DF improvement"
            }
          }
        }
      }
      # Final check for NaN p-values
      if (is.nan(p_val)) {
        p_val <- NA_real_
        if (model_status == "OK") model_status <- "LRT resulted in NaN p-value"
      }
      
      # Return results list
      list(p_value = p_val, model_status = model_status, type_used = type_used,
           gamma_shift_warning = gamma_shift_warning, convergence_warning = convergence_warning)
      
    }, error = function(e) {
      # Catch errors during the tryCatch block (fitting or LRT)
      msg <- paste("Error during model/LRT:", gsub("[\\r\\n\\t]+", " ", conditionMessage(e)))
      if (verbose) message(sprintf("Model fitting/LRT process failed for '%s'. Status: %s", variable, msg))
      # Return NA p-value and error status; warning flags set before the error are kept.
      # Assignments made inside the tryCatch expression are visible here, so type_used
      # reports the model family that was being attempted (e.g. Gamma) when the error occurred.
      list(p_value = NA_real_, model_status = msg, type_used = type_used,
           gamma_shift_warning = gamma_shift_warning, convergence_warning = convergence_warning)
    }) # End tryCatch
    
    return(res)
  } # End helper function fit_models_and_lrt
  
  # --- 4. Loop Through Variables to Test ---
  res_list <- vector("list", length(var_names))
  names(res_list) <- var_names
  
  for (i in seq_along(var_names)) {
    variable <- var_names[i]
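    # Progress message: roughly ten updates per run, starting with the first variable.
    # (i - 1) %% step is used because i %% 1 is always 0, which silenced runs of 6-10 variables.
    progress_step <- max(1, ceiling(length(var_names) / 10))
    if (verbose && length(var_names) > 5 && (i - 1) %% progress_step == 0) {
      message(sprintf("Processing variable %d of %d: %s", i, length(var_names), variable))
    }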
    
    # Initialize default result structure for this variable
    res <- data.frame(
      Variable = variable,
      Type = "Unknown",
      Covariates_Used = paste(covariates, collapse = ", "),
      Group_Ref_Level_Used = NA_character_, # Placeholder for actual ref level
      n_obs = 0L,
      Status = "Not processed",
      p_value = NA_real_, # Store raw p-value initially
      p_value_FDR = NA_real_,
      Significant = "No", # Default to No significance
      Gamma_Shift_Warning = FALSE,
      Convergence_Warning = FALSE,
      stringsAsFactors = FALSE
    )
    
    # --- Data Subsetting and Missing Data Handling for Current Variable ---
    current_cols <- c(variable, group, covariates)
    # Use tryCatch for safe column access
    full_data_subset <- tryCatch(data[, current_cols, drop = FALSE], error = function(e) NULL)
    if (is.null(full_data_subset)) {
      # Identify missing columns if possible
      missing_cols <- current_cols[!current_cols %in% names(data)]
      res$Status <- paste("Error accessing column(s):", paste(missing_cols, collapse=", "))
      res_list[[variable]] <- res
      next # Skip to next variable
    }
    
    # Get complete cases for this variable + group + covariates
    complete_idx <- stats::complete.cases(full_data_subset)
    n_complete <- sum(complete_idx)
    res$n_obs <- n_complete # Store number of observations used
    temp_data <- full_data_subset[complete_idx, , drop = FALSE]
    
    # --- Check Minimum Observations and Group Levels *after* NA removal ---
    # Ensure group variable is factor in the subset and drop unused levels
    temp_data[[group]] <- base::droplevels(factor(temp_data[[group]]))
    nlevels_subset <- nlevels(temp_data[[group]])
    
    # Check if group still has at least two levels
    if (nlevels_subset < 2) {
      res$Status <- "Grouping variable has < 2 levels after NA removal"
      res_list[[variable]] <- res
      next
    }
    
    # Heuristic check for sufficient observations relative to basic model parameters
    # Params approx = intercept + group_levels-1 + num_covariates
    # This is a rough safeguard, not a guarantee of stable estimation.
    params_approx <- 1 + (nlevels_subset - 1) + length(covariates)
    min_obs_threshold <- params_approx + 5 # Require ~5 obs beyond basic parameters
    min_group_threshold <- 2 * nlevels_subset # Require minimum average obs per group
    
    if (n_complete < min_obs_threshold || n_complete < min_group_threshold) {
      res$Status <- sprintf("Low N (%d) for model complexity (heuristic check: need >~%d obs for params, >~%d for group avg)",
                            n_complete, min_obs_threshold, min_group_threshold)
      res_list[[variable]] <- res
      next
    }
    
    # --- Set Reference Level for Group (if specified and valid in subset) ---
    current_group_levels_subset <- levels(temp_data[[group]])
    ref_level_to_use <- current_group_levels_subset[1] # Default is first level in subset
    if (!is.null(group_ref)) {
      if (group_ref %in% current_group_levels_subset) {
        # Set the specified reference level if it exists in the current subset
        temp_data[[group]] <- stats::relevel(temp_data[[group]], ref = group_ref)
        ref_level_to_use <- group_ref
      } else {
        # Warn if specified ref level is missing after NA removal, use default
        warning(sprintf("For variable '%s', specified 'group_ref' '%s' is not present after NA removal. Using default reference '%s'.",
                        variable, group_ref, ref_level_to_use), call. = FALSE, immediate. = TRUE)
        # No need to relevel, as the default is the first level already
      }
    }
    res$Group_Ref_Level_Used <- ref_level_to_use # Store the reference level actually used
    
    # --- Prepare Outcome Variable `y` and Check for Constant ---
    names(temp_data)[names(temp_data) == variable] <- "y" # Rename outcome for formula use
    outcome_vec <- temp_data$y
    
    # Check if outcome is constant within the subset
    is_constant <- FALSE
    if(is.numeric(outcome_vec)) {
      sd_outcome <- stats::sd(outcome_vec, na.rm = TRUE) # Already handled NAs
      # Check if sd is NA (single value) or effectively zero
      if (is.na(sd_outcome) || sd_outcome < sqrt(.Machine$double.eps)) {
        is_constant <- TRUE
      }
    } else {
      # Check for single unique non-NA value (NAs already removed)
      if (length(unique(outcome_vec)) <= 1) is_constant <- TRUE
    }
    if(is_constant) {
      res$Status <- "Constant Outcome Variable (after NA removal)"
      res$Type <- if(is.numeric(outcome_vec)) "Numeric Constant" else "Factor/Char Constant"
      res_list[[variable]] <- res
      next
    }
    
    # --- Determine Outcome Variable Type for Modeling ---
    type <- "Unknown"
    if (is.numeric(outcome_vec)) {
      unique_vals <- unique(outcome_vec) # NAs already removed
      n_unique <- length(unique_vals)
      # Check if looks like binary integer (e.g., 0/1, 1/2)
      is_int_like <- all(abs(unique_vals - round(unique_vals)) < sqrt(.Machine$double.eps))
      if (n_unique == 2 && is_int_like) {
        type <- "Binary"
        # Convert to factor for modeling, ensuring consistent levels (e.g., 0 then 1)
        temp_data$y <- factor(temp_data$y, levels = sort(unique_vals))
      } else {
        type <- "Continuous"
        # Keep as numeric for Gaussian/Gamma GLM
      }
    } else if (is.factor(outcome_vec)) {
      # Factor levels already dropped via droplevels earlier on temp_data[[group]]
      # Need to ensure outcome factor levels are also correct for the subset
      temp_data$y <- base::droplevels(factor(outcome_vec)) # Ensure y uses only present levels
      current_levels_y <- levels(temp_data$y)
      n_levels_y <- length(current_levels_y)
      
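      if (is.ordered(outcome_vec)) { # Check original variable's property
        if (n_levels_y < 2) {
          res$Status <- "Ordinal outcome with < 2 levels after NA removal"
          res_list[[variable]] <- res
          next
        }
        if (n_levels_y == 2) {
          # MASS::polr requires at least 3 response levels, so a two-level
          # ordered factor is modeled as binary instead
          type <- "Binary"
          temp_data$y <- factor(temp_data$y, levels = current_levels_y, ordered = FALSE)
        } else {
          type <- "Ordinal"
          # Ensure factor remains ordered with only present levels
          temp_data$y <- factor(temp_data$y, levels = current_levels_y, ordered = TRUE)
        }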
      } else { # Unordered factor
        if (n_levels_y == 2) {
          type <- "Binary"
          # Ensure factor with correct levels (already done by droplevels+factor)
        } else if (n_levels_y > 2) {
          type <- "Categorical"
          # Relevel ensures a consistent baseline category (first level) for multinom
          temp_data$y <- stats::relevel(temp_data$y, ref = current_levels_y[1])
        } else { # n_levels_y < 2
          res$Status <- "Factor outcome with < 2 levels after NA removal"
          res_list[[variable]] <- res
          next
        }
      }
    } else if (is.character(outcome_vec)) {
      # Convert character to factor for modeling
      temp_data$y <- factor(outcome_vec)
      current_levels_y <- levels(temp_data$y)
      n_levels_y <- length(current_levels_y)
      if (n_levels_y == 2) {
        type <- "Binary"
      } else if (n_levels_y > 2) {
        type <- "Categorical"
        # Relevel ensures a consistent baseline category (first level) for multinom
        temp_data$y <- stats::relevel(temp_data$y, ref = current_levels_y[1])
      } else { # n_levels_y < 2
        res$Status <- "Character outcome with < 2 levels after NA removal"
        res_list[[variable]] <- res
        next
      }
    } else {
      # Handle unsupported types (e.g., list, Date, logical); none of these pass is.numeric()
      res$Status <- paste("Unsupported outcome variable type:", class(outcome_vec)[1])
      res$Type <- class(outcome_vec)[1]
      res_list[[variable]] <- res
      next
    }
    
    # Store the determined type, might be updated later by helper if Gamma GLM is used
    res$Type <- type
    
    # --- Ensure Covariates are Factor or Numeric (within subset) ---
    # Also check for constant covariates within the subset, which often cause errors
    for (covar in covariates) {
      covar_vec <- temp_data[[covar]]
      if(length(unique(stats::na.omit(covar_vec))) <= 1) {
        warning(sprintf("Covariate '%s' is constant for variable '%s' after NA removal. This will likely cause model fitting errors.",
                        covar, variable), call. = FALSE, immediate. = TRUE)
      }
      # Convert character covariates to factors if not already numeric/factor
      if (!is.numeric(covar_vec) && !is.factor(covar_vec)) {
        if (verbose) message(sprintf("Converting covariate '%s' to factor for variable '%s'.", covar, variable))
        temp_data[[covar]] <- factor(covar_vec)
      }
    }
    
    # --- Construct Model Formulas ---
    terms_full <- c(group, covariates)
    # Null model: only covariates, or intercept-only if no covariates
    terms_null <- if (length(covariates) > 0) covariates else "1"
    
    # Use tryCatch for formula construction just in case of weird term names
    formula_full <- tryCatch(stats::reformulate(termlabels = terms_full, response = "y"),
                             error = function(e) NULL)
    formula_null <- tryCatch(stats::reformulate(termlabels = terms_null, response = "y"),
                             error = function(e) NULL)
    
    if (is.null(formula_full) || is.null(formula_null)) {
      res$Status <- "Error: Failed to construct model formulas (invalid terms?)"
      res_list[[variable]] <- res
      next
    }
    
    # --- Fit Models and Perform LRT via Helper Function ---
    model_out <- fit_models_and_lrt(temp_data = temp_data,
                                    formula_full = formula_full,
                                    formula_null = formula_null,
                                    type = type, # Pass initial type
                                    variable = variable,
                                    verbose = verbose,
                                    zero_threshold = zero_threshold,
                                    skew_threshold = skew_threshold)
    
    # --- Store Results from Helper ---
    res$p_value <- model_out$p_value # Store raw p-value
    res$Status <- model_out$model_status
    res$Type <- model_out$type_used # Update type if Gamma was used
    res$Gamma_Shift_Warning <- model_out$gamma_shift_warning
    res$Convergence_Warning <- model_out$convergence_warning
    res_list[[variable]] <- res # Add results for this variable to the list
    
    # --- Periodic Garbage Collection (optional, for very large datasets/loops) ---
    if (i %% 50 == 0 && (nrow(data) * ncol(data) > 1e6 || length(var_names) > 100)) {
      if (verbose) message("Running garbage collection...")
      gc(verbose = FALSE)
    }
  } # End loop through variables
  
  # --- 5. Combine, Adjust, and Format Results ---
  if (length(res_list) == 0) {
    # Should have been caught earlier, but as a safeguard
    warning("No results were generated in the loop.")
    # Return empty structure consistent with initial empty check
    return(data.frame(Variable = character(), Type = character(), Covariates_Used = character(),
                      Group_Ref_Level_Used = character(), n_obs = integer(), Status = character(),
                      p_value = character(), p_value_FDR = character(), Significant = character(),
                      Gamma_Shift_Warning = logical(), Convergence_Warning = logical(),
                      stringsAsFactors = FALSE))
  }
  # Combine list of data frames into one
  res_df <- do.call(rbind, res_list)
  rownames(res_df) <- NULL # Clean row names
  
  # Calculate FDR adjusted p-values only for valid numeric raw p-values
  valid_p_idx <- which(!is.na(res_df$p_value) & is.finite(res_df$p_value))
  res_df$p_value_FDR <- NA_real_ # Initialize column before filling
  
  if (length(valid_p_idx) > 0) {
    # Calculate FDR adjusted p-values
    res_df$p_value_FDR[valid_p_idx] <- stats::p.adjust(res_df$p_value[valid_p_idx], method = "fdr")
    
    # Determine significance based on FDR p-value
    # Ensure Significant column exists if needed (should be initialized)
    if (!"Significant" %in% names(res_df)) res_df$Significant <- "No"
    
    res_df$Significant <- ifelse(!is.na(res_df$p_value_FDR) & res_df$p_value_FDR < alpha,
                                 "Yes", "No")
    # Ensure non-significant if p-value was NA or adjustment resulted in NA
    res_df$Significant[is.na(res_df$p_value_FDR)] <- "No"
    
  } else {
    # If no valid p-values, ensure FDR is NA and Significance is No
    res_df$p_value_FDR <- NA_real_
    res_df$Significant <- "No"
  }
  
  # --- Format P-values for Reporting (after FDR calculation) ---
  # Convert numeric p-values to formatted strings
  p_val_fmt <- ifelse(!is.na(res_df$p_value) & res_df$p_value < 0.001,
                      "<0.001", sprintf("%.3f", res_df$p_value))
  p_val_fdr_fmt <- ifelse(!is.na(res_df$p_value_FDR) & res_df$p_value_FDR < 0.001,
                          "<0.001", sprintf("%.3f", res_df$p_value_FDR))
  
  # Assign formatted strings back, handling potential NAs from numeric columns
  res_df$p_value <- ifelse(is.na(res_df$p_value), NA_character_, p_val_fmt)
  res_df$p_value_FDR <- ifelse(is.na(res_df$p_value_FDR), NA_character_, p_val_fdr_fmt)
  
  
  # --- 6. Final Output ---
  # Ensure consistent column order
  final_order <- c("Variable", "Type", "Covariates_Used", "Group_Ref_Level_Used",
                   "n_obs", "Status", "p_value", "p_value_FDR", "Significant",
                   "Gamma_Shift_Warning", "Convergence_Warning")
  # Check if all expected columns are present before ordering
  if (!all(final_order %in% names(res_df))) {
    warning("Internal issue: Some expected result columns might be missing.")
    # Order using only the columns that are present
    final_order <- intersect(final_order, names(res_df))
  }
  res_df <- res_df[, final_order, drop = FALSE]
  
  if (verbose) message("Analysis finished. Returning results table.")
  return(res_df)
}
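
The per-variable logic of the loop above (build a full and a null formula, compare the two fits with a likelihood-ratio test, then FDR-adjust the collected p-values) can be sketched in a few lines. This is only an illustration under stated assumptions, not the `fit_models_and_lrt` helper itself: it fits Gaussian GLMs only, uses the built-in `mtcars` data as a stand-in for the study data, and the names `dat`, `outcomes`, and `lrt_p` are hypothetical.

```r
# Toy stand-in for the study data (assumption: built-in mtcars, not the real dataset)
dat <- mtcars
dat$cyl <- factor(dat$cyl)          # grouping variable
outcomes <- c("mpg", "hp", "qsec")  # outcome variables to test
covariates <- "wt"                  # adjustment covariate

# Hypothetical helper: LRT p-value for the group effect on one outcome
lrt_p <- function(variable) {
  d <- dat
  d$y <- d[[variable]]
  f_full <- stats::reformulate(c("cyl", covariates), response = "y")
  f_null <- stats::reformulate(covariates, response = "y")
  fit_full <- stats::glm(f_full, data = d, family = stats::gaussian())
  fit_null <- stats::glm(f_null, data = d, family = stats::gaussian())
  # Row 2 of the anova table holds the test of the full model against the null
  stats::anova(fit_null, fit_full, test = "LRT")[2, "Pr(>Chi)"]
}

p_raw <- vapply(outcomes, lrt_p, numeric(1))
p_fdr <- stats::p.adjust(p_raw, method = "fdr")  # "fdr" is an alias for Benjamini-Hochberg

res <- data.frame(Variable = outcomes,
                  p_value = sprintf("%.3f", p_raw),
                  p_value_FDR = sprintf("%.3f", p_fdr),
                  Significant = ifelse(p_fdr < 0.05, "Yes", "No"),
                  stringsAsFactors = FALSE)
res
```

Adjusting only after all raw p-values are collected matters: the Benjamini-Hochberg correction depends on the full set of tests, so it cannot be applied one variable at a time inside the loop.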
#' Plot Results from the compare_groups_auto_v4 Function
#'
#' Creates a lollipop (dot) plot showing the significance (-log10 of the FDR-adjusted p-value)
#' of each variable tested, using the output of compare_groups_auto_v4.
#' Well suited to visualizing multiple comparisons in scientific articles.
#'
#' @param results_df Data frame returned by `compare_groups_auto_v4`.
#' @param p_value_type Character. Which p-value to plot on the x-axis?
#'   Default is `"FDR"` (p_value_FDR), recommended for multiple comparisons.
#'   Can be set to `"raw"` (p_value) if desired.
#' @param significance_threshold Numeric. Significance threshold (alpha) used to draw
#'   a vertical line and to color the points (default 0.05).
#' @param label_significant Logical. If `TRUE` (default), labels the most
#'   significant points directly on the plot using `ggrepel`.
#' @param n_label Integer. Maximum number of significant points to label
#'   (default 15), to avoid visual clutter. Used only if `label_significant = TRUE`.
#' @param plot_title Character. Optional plot title. If `NULL` (default),
#'   a generic title is generated.
#' @param exclude_statuses Character vector. Values of the `Status` column to
#'   exclude from the plot (default: excludes errors, low N, constant, unsupported).
#'   Set to `NULL` to include all statuses.
#' @param order_by_significance Logical. If `TRUE` (default), orders the variables
#'   on the y-axis by significance (lowest p-value at the top). If `FALSE`,
#'   keeps the original order of the data frame.
#' @param custom_labels Named character vector. Optional; renames variables
#'   on the y-axis of the plot. E.g. `c("ScoreContinuo" = "Cognitive Score", "BiomarcadorGamma" = "Biomarker A")`.
#'
#' @return A ggplot object.
#'
#' @details
#' - The function converts the formatted p-values (strings such as "<0.001") back to numeric
#'   in order to compute -log10(p-value). "<0.001" is treated as a small value (e.g., 1e-4).
#' - Variables with NA p-values or excluded statuses are not plotted.
#' - Requires the packages `ggplot2`, `dplyr`, and `forcats`; `ggrepel` is needed only for labels.
#'
#' @importFrom ggplot2 ggplot aes geom_point geom_segment geom_vline scale_color_manual labs theme_minimal theme element_text scale_y_discrete ggtitle
#' @importFrom ggrepel geom_text_repel
#' @importFrom dplyr filter mutate arrange select case_when pull rename any_of desc slice_head
#' @importFrom forcats fct_reorder fct_relevel
#' @importFrom rlang .data sym
#' @examples
#' \dontrun{
#' # Assuming 'resultados_ex1' exists from the example of the previous function
#' if (exists("resultados_ex1") && is.data.frame(resultados_ex1)) {
#'   # Basic plot
#'   p1 <- plot_comparison_results(resultados_ex1)
#'   print(p1)
#'
#'   # Plot with more labels and a custom title
#'   p2 <- plot_comparison_results(resultados_ex1,
#'                                 n_label = 20,
#'                                 plot_title = "Comparison between Centers (Adjusted for Age and Sex)",
#'                                 custom_labels = c("ScoreContinuo" = "Cognitive Score",
#'                                                   "BiomarcadorGamma" = "Biomarker (Gamma)"))
#'   print(p2)
#'
#'   # Plot using the raw p-value, without ordering
#'   p3 <- plot_comparison_results(resultados_ex1,
#'                                 p_value_type = "raw",
#'                                 order_by_significance = FALSE,
#'                                 label_significant = FALSE)
#'   print(p3)
#' } else {
#'  print("Run the compare_groups_auto_v4 example first to generate 'resultados_ex1'")
#' }
#' }
# Note: the first argument is named results_df, matching the roxygen docs and the function body
plot_comparison_results <- function(results_df,
                                    p_value_type = "FDR",
                                    significance_threshold = 0.05,
                                    label_significant = TRUE,
                                    n_label = 15,
                                    plot_title = NULL,
                                    exclude_statuses = c("Constant Outcome Variable",
                                                         "Low N", # Matches statuses starting with Low N
                                                         "Unsupported outcome variable type",
                                                         "Grouping variable has < 2 levels",
                                                         "Error", # Matches statuses starting with Error
                                                         "Not processed"),
                                    order_by_significance = TRUE,
                                    custom_labels = NULL) {
  
  # --- 1. Input Checks ---
  req_cols <- c("Variable", "Status", "p_value", "p_value_FDR")
  if (!all(req_cols %in% names(results_df))) {
    stop("The data frame 'results_df' is missing the required columns: ",
         paste(req_cols[!req_cols %in% names(results_df)], collapse = ", "))
  }
  if (!p_value_type %in% c("FDR", "raw")) {
    stop("'p_value_type' must be 'FDR' or 'raw'.")
  }
  if (!requireNamespace("ggplot2", quietly = TRUE)) {
    stop("Package 'ggplot2' is required. Please install it.", call. = FALSE)
  }
  if (!requireNamespace("dplyr", quietly = TRUE)) {
    stop("Package 'dplyr' is required. Please install it.", call. = FALSE)
  }
  if (!requireNamespace("forcats", quietly = TRUE)) {
    stop("Package 'forcats' is required. Please install it.", call. = FALSE)
  }
  if (label_significant && !requireNamespace("ggrepel", quietly = TRUE)) {
    warning("Package 'ggrepel' not found. Labels will not be added. Install 'ggrepel' to enable them.", call. = FALSE)
    label_significant <- FALSE
  }
  
  # --- 2. Data Preparation ---
  p_col_name <- if (p_value_type == "FDR") "p_value_FDR" else "p_value"
  p_col_sym <- rlang::sym(p_col_name)
  
  # Filter by status
  plot_data <- results_df

  if (!is.null(exclude_statuses) && length(exclude_statuses) > 0) {
    # Build a regex matching statuses that *start with* (or exactly equal)
    # any of the strings in exclude_statuses
    exclude_pattern <- paste0("^(", paste(exclude_statuses, collapse = "|"), ")")
    plot_data <- dplyr::filter(plot_data, !grepl(exclude_pattern, .data$Status, ignore.case = TRUE))
  }


  # Convert the p-value strings to numeric
  # "<0.001" is mapped to a small placeholder value; NAs stay NA
  p_numeric <- plot_data[[p_col_name]]
  p_numeric <- suppressWarnings(as.numeric(gsub("<0\\.001", "1e-4", p_numeric))) # Placeholder: 1e-4

  plot_data <- dplyr::mutate(plot_data,
                             p_numeric = p_numeric,
                             # Compute -log10(p), handling p = 0 and NAs
                             neg_log10_p = dplyr::case_when(
                               is.na(p_numeric) ~ NA_real_,
                               p_numeric == 0 ~ -log10(1e-300), # Avoid Inf by using a large finite value
                               TRUE ~ -log10(p_numeric)
                             ),
                             # Significance based on the numeric p-value
                             Significant_num = !is.na(p_numeric) & p_numeric < significance_threshold
  )

  # Drop rows whose plotting value is NA or non-finite
  plot_data <- dplyr::filter(plot_data, !is.na(.data$neg_log10_p) & is.finite(.data$neg_log10_p))

  if (nrow(plot_data) == 0) {
    warning("No valid data left to plot after filtering and p-value conversion.", call. = FALSE)
    # Return an empty plot carrying a message
    return(ggplot2::ggplot() + ggplot2::theme_void() + ggplot2::ggtitle("No data to plot"))
  }

  # Order variables if requested
  if (order_by_significance) {
    # Reorder the Variable factor by ascending neg_log10_p; the last level is drawn
    # at the top of the y-axis, so the smallest p-value ends up on top
    plot_data <- dplyr::mutate(plot_data, Variable = forcats::fct_reorder(.data$Variable, .data$neg_log10_p, .desc = FALSE))
  } else {
    # Keep the original row order by fixing the factor levels in order of appearance
    plot_data <- dplyr::mutate(plot_data, Variable = factor(.data$Variable, levels = unique(.data$Variable)))
  }

  # Apply custom labels if provided
  if (!is.null(custom_labels)) {
    original_levels <- levels(plot_data$Variable)
    new_labels <- ifelse(original_levels %in% names(custom_labels),
                         custom_labels[original_levels],
                         original_levels)
    # Rename the factor levels
    plot_data <- dplyr::mutate(plot_data, Variable = factor(.data$Variable, levels = original_levels, labels = new_labels))
  }
  
  
  # --- 3. Build the Plot ---
  # Set a default title if none was provided
  if (is.null(plot_title)) {
    plot_title <- paste("Significance of Group Comparison (-log10", p_value_type, "p-value)")
  }

  # Position of the vertical threshold line
  neg_log10_threshold <- -log10(significance_threshold)

  # Base plot
  gg <- ggplot2::ggplot(plot_data, ggplot2::aes(x = .data$neg_log10_p, y = .data$Variable)) +
    # Lollipop segments (optional, but visually helpful)
    ggplot2::geom_segment(ggplot2::aes(xend = 0, yend = .data$Variable), color = "grey80", linewidth = 0.5) +
    # Points, colored by significance
    ggplot2::geom_point(ggplot2::aes(color = .data$Significant_num), size = 2.5, alpha = 0.8) +
    # Vertical line at the significance threshold
    ggplot2::geom_vline(xintercept = neg_log10_threshold, linetype = "dashed", color = "darkred", linewidth = 0.8) +
    # Manual color scale for significance
    ggplot2::scale_color_manual(values = c(`FALSE` = "grey40", `TRUE` = "red"),
                                labels = c(`FALSE` = paste0("p >= ", significance_threshold),
                                           `TRUE` = paste0("p < ", significance_threshold)),
                                name = paste(p_value_type, "Significance")) +
    # Axis labels and title
    ggplot2::labs(
      title = plot_title,
      x = bquote(-log[10] ~ .(paste("(", p_value_type, " p-value)", sep=""))), # Plotmath expression for log10
      y = "Variable Analyzed"
    ) +
    # Clean theme, common in publications
    ggplot2::theme_minimal(base_size = 12) +
    ggplot2::theme(
      axis.text.y = ggplot2::element_text(size = 8), # Reduce y-axis text size when there are many variables
      axis.title = ggplot2::element_text(size = 10),
      plot.title = ggplot2::element_text(size = 12, face = "bold", hjust = 0.5),
      legend.position = "bottom"
    )
  
  # Add labels to the significant points using ggrepel
  if (label_significant && sum(plot_data$Significant_num) > 0) {
    # Prepare label data (top N significant points).
    # Explicit dplyr:: calls avoid relying on %>% being attached.
    label_data <- dplyr::filter(plot_data, .data$Significant_num)
    label_data <- dplyr::arrange(label_data, dplyr::desc(.data$neg_log10_p))
    label_data <- dplyr::slice_head(label_data, n = n_label)

    if (nrow(label_data) > 0) {
      gg <- gg +
        ggrepel::geom_text_repel(
          data = label_data,
          ggplot2::aes(label = .data$Variable),
          size = 2.5,          # Label text size
          max.overlaps = Inf, # Try to show every label (can be adjusted)
          segment.color = "grey50",
          segment.size = 0.3,
          nudge_x = 0.15,     # Nudge the label position
          box.padding = 0.3,
          point.padding = 0.3
        )
    }
  }
  
  return(gg)
}
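
Two preparation steps inside `plot_comparison_results` are easy to check in isolation: the prefix-based status filter and the conversion of formatted p-value strings back to numbers. The following base-R sketch reproduces both on a made-up data frame; `toy` and its values are hypothetical and exist only for illustration.

```r
# Hypothetical formatted results, mimicking the string columns the function receives
toy <- data.frame(
  Variable = c("v1", "v2", "v3", "v4"),
  Status = c("OK", "Low N (n = 3)", "OK", "Error: model failed"),
  p_value_FDR = c("<0.001", NA, "0.250", NA),
  stringsAsFactors = FALSE
)

# Drop rows whose Status starts with an excluded prefix
exclude_statuses <- c("Low N", "Error")
exclude_pattern <- paste0("^(", paste(exclude_statuses, collapse = "|"), ")")
kept <- toy[!grepl(exclude_pattern, toy$Status, ignore.case = TRUE), ]

# Map "<0.001" to the placeholder 1e-4, then convert to numeric
kept$p_numeric <- suppressWarnings(as.numeric(gsub("<0\\.001", "1e-4", kept$p_value_FDR)))
kept$neg_log10_p <- -log10(kept$p_numeric)
kept$Significant_num <- !is.na(kept$p_numeric) & kept$p_numeric < 0.05

kept[, c("Variable", "neg_log10_p", "Significant_num")]
```

One consequence of the placeholder is that every variable reported as "<0.001" is drawn at the same x position (4 when the placeholder is 1e-4), so the plot cannot rank those variables among themselves.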
library(ggrepel)
library(forcats)

#plot_comparison_results(results)

#plot_comparison_results(results2)

# To save a plot:
# ggsave("my_significance_plot.png", plot = plot1, width = 8, height = 6, dpi = 300)
# ggsave("my_significance_plot.pdf", plot = plot1, width = 8, height = 6)
#' Create a Formatted Table of Results from compare_groups_auto_v4 (v2, Corrected)
#'
#' Generates a clean, organized table suitable for scientific publication
#' from the results data frame of `compare_groups_auto_v4`.
#' Uses the knitr and kableExtra packages for formatting.
#'
#' @param results_df Data frame. The output of `compare_groups_auto_v4`.
#' @param caption Character. The table title/caption.
#' @param columns_to_include Character vector. Names of the `results_df` columns
#'   to include in the table. The default includes the most common columns.
#' @param column_rename Named character vector used to rename columns in the final table,
#'   mapping original names to display names.
#'   E.g. `c("Variable" = "Variable name", "p_value_FDR" = "Adjusted P-value (FDR)")`.
#' @param highlight_significant Logical. If `TRUE` (default), bolds the
#'   adjusted (FDR) p-values that are significant (based on the 'Significant' column
#'   of the input).
#' @param add_notes_column Logical. If `TRUE` (default), adds a "Notes" column
#'   with codes for warnings (Convergence, Gamma shift) or non-"OK" statuses.
#'   A general footnote explains the codes.
#' @param status_codes Named character vector. Codes used in the Notes column
#'   for specific statuses (in addition to Convergence 'C' and Gamma 'G').
#'   The default includes codes for common statuses such as low N, constant outcome, and error.
#'   E.g. `c("Low N" = "N", "Constant Outcome Variable" = "K", "Error" = "E")`
#'   A key matches when the Status text contains it (case-insensitive).
#' @param format Character. Output format for `kable` (e.g., "html", "latex",
#'   "markdown"). Default is "pipe" (GFM Markdown). For articles, "latex" or "html"
#'   (depending on the workflow) are common.
#' @param booktabs Logical. Use `booktabs = TRUE` for LaTeX tables (recommended).
#'   Default is `TRUE`.
#' @param full_width Logical. The `full_width` argument of `kable_styling`. Default `FALSE`.
#' @param font_size Numeric. Font size for `kable_styling`. Default `NULL`.
#' @param ... Additional arguments passed to `kableExtra::kable_styling`.
#'
#' @return A kable object (formatted table).
#'
#' @details
#' - The function selects, renames, and optionally formats columns.
#' - P-values must already be formatted as strings in `results_df` (including "<0.001").
#' - The 'Notes' column and the footnote communicate methodological problems or
#'   warnings concisely without overloading the main table.
#' - Requires the packages `knitr`, `kableExtra`, and `dplyr`.
#'
#' @importFrom knitr kable
#' @importFrom kableExtra kable_styling footnote cell_spec row_spec kable_classic save_kable
#' @importFrom dplyr select rename mutate case_when across all_of left_join relocate filter any_of arrange desc slice_head setdiff intersect
#' @importFrom rlang := !! sym `%||%`
#' @importFrom tibble tibble
#'
create_results_table <- function(results_df,
                                 caption = "Group Comparison Results",
                                 columns_to_include = c("Variable", "n_obs",
                                                        "Group_Ref_Level_Used", "Status",
                                                        "p_value", "p_value_FDR"),
                                 column_rename = c("Variable" = "Variable",
                                                   "n_obs" = "N",
                                                   "Group_Ref_Level_Used" = "Ref.",
                                                   "Status" = "Analysis Status",
                                                   "p_value" = "P-value",
                                                   "p_value_FDR" = "Adjusted P-value"),
                                 highlight_significant = TRUE,
                                 add_notes_column = TRUE,
                                 status_codes = c("Low N" = "N", # Low N
                                                  "Constant Outcome" = "K", # Constant outcome
                                                  "Error" = "E", # General error
                                                  "Unsupported" = "U", # Unsupported type
                                                  "No DF improvement" = "DF", # No gain in degrees of freedom
                                                  "calculation failed" = "Calc", # Calculation failure
                                                  "converge" = "C" # 'C' if the status contains 'converge'
                                 ),
                                 format = "pipe", # pipe works well for GFM Markdown
                                 booktabs = TRUE,
                                 full_width = FALSE,
                                 font_size = NULL,
                                 ...) {
  
  # --- 1. Input Checks ---
  if (!requireNamespace("knitr", quietly = TRUE)) stop("Package 'knitr' is required.")
  if (!requireNamespace("kableExtra", quietly = TRUE)) stop("Package 'kableExtra' is required.")
  if (!requireNamespace("dplyr", quietly = TRUE)) stop("Package 'dplyr' is required.")
  if (!requireNamespace("rlang", quietly = TRUE)) stop("Package 'rlang' is required.")

  req_input_cols <- c("Variable", "Status", "p_value", "p_value_FDR", "Significant",
                      "Convergence_Warning", "Gamma_Shift_Warning")
  if (!all(req_input_cols %in% names(results_df))) {
    missing_cols <- req_input_cols[!req_input_cols %in% names(results_df)]
    stop("The data frame 'results_df' is missing the required input columns: ",
         paste(missing_cols, collapse = ", "))
  }

  original_columns_to_include <- columns_to_include # Keep the list as originally requested
  if (!all(columns_to_include %in% names(results_df))) {
    missing_cols <- columns_to_include[!columns_to_include %in% names(results_df)]
    warning("The following columns given in 'columns_to_include' were not found in 'results_df': ",
            paste(missing_cols, collapse = ", "), ". They will be ignored.", immediate. = TRUE)
    columns_to_include <- intersect(columns_to_include, names(results_df))
    if(length(columns_to_include) == 0) stop("No valid columns to include in the table.")
  }
  
  
  # --- 2. Data Preparation ---
  # Select the columns needed for processing (includes those used for notes/highlighting)
  cols_for_processing <- unique(c(columns_to_include, req_input_cols))
  table_data <- dplyr::select(results_df,
                              dplyr::all_of(intersect(cols_for_processing, names(results_df))))

  footnote_explanations <- list() # A list avoids problems with duplicated names
  notes_col_present <- FALSE

  # Create the Notes column
  if (add_notes_column) {
    table_data <- dplyr::mutate(table_data, Notes = "") # Initialize the column

    # Add codes for the logical warning flags (%in% TRUE treats NA as FALSE)
    if (any(table_data$Convergence_Warning, na.rm = TRUE)) {
      table_data <- dplyr::mutate(table_data, Notes = ifelse(.data$Convergence_Warning %in% TRUE, paste0(.data$Notes, "C"), .data$Notes))
      if (!"C" %in% names(footnote_explanations)) footnote_explanations[["C"]] <- "Model did not converge or similar problem (e.g., Hessian in polr)"
    }
    if (any(table_data$Gamma_Shift_Warning, na.rm = TRUE)) {
      table_data <- dplyr::mutate(table_data, Notes = ifelse(.data$Gamma_Shift_Warning %in% TRUE, paste0(.data$Notes, "G"), .data$Notes))
      if (!"G" %in% names(footnote_explanations)) footnote_explanations[["G"]] <- "Non-positive values shifted for the Gamma GLM"
    }

    # Add codes for non-"OK" statuses
    if (length(status_codes) > 0) {
      # Rows with a non-"OK" status that have NOT yet received a C or G warning code
      needs_status_code <- table_data$Status != "OK" & !grepl("[CG]", table_data$Notes)

      for (status_text in names(status_codes)) {
        code <- status_codes[[status_text]]
        match_idx <- which(grepl(status_text, table_data$Status, ignore.case = TRUE) & needs_status_code)
        if(length(match_idx) > 0) {
          table_data$Notes[match_idx] <- paste0(table_data$Notes[match_idx], code)
          needs_status_code[match_idx] <- FALSE # Mark as processed
          # The footnote prefixes each explanation with its code, so the code is not repeated here
          explanation <- paste0("Status contains '", status_text, "'")
          if (!code %in% names(footnote_explanations)) footnote_explanations[[code]] <- explanation
        }
      }
      # Generic code 'S' for remaining non-OK statuses without a specific code
      final_needs_code_idx <- which(table_data$Status != "OK" & !grepl("[CG]", table_data$Notes) & needs_status_code)
      if (length(final_needs_code_idx) > 0) {
        table_data$Notes[final_needs_code_idx] <- paste0(table_data$Notes[final_needs_code_idx], "S")
        if (!"S" %in% names(footnote_explanations)) footnote_explanations[["S"]] <- "Other non-'OK' status"
      }
    }

    # Check whether the Notes column has any useful content
    if (any(table_data$Notes != "", na.rm = TRUE)) {
      notes_col_present <- TRUE
      if (!"Notes" %in% names(column_rename)) column_rename["Notes"] <- "Notes" # Default name
    } else {
      table_data <- dplyr::select(table_data, -dplyr::all_of("Notes")) # Drop if empty
    }
  } # End of if(add_notes_column)
  
  # Apply highlighting (bold) if requested
  p_col_original <- "p_value_FDR" # Base column for highlighting
  target_col_formatted <- paste0(p_col_original, "_fmt")
  # Display name of the final column. Indexing a named vector by a missing name
  # returns NA rather than NULL, so test for the name explicitly.
  final_col_name <- if (p_col_original %in% names(column_rename)) {
    column_rename[[p_col_original]]
  } else {
    "Adjusted P-value"
  }

  highlight_applied <- FALSE # Tracks whether highlighting was applied

  if (highlight_significant && p_col_original %in% names(table_data)) {
    if (!"Significant" %in% names(table_data)) {
      warning("Column 'Significant' not found. Highlighting cannot be applied.", immediate. = TRUE)
    } else if (!p_col_original %in% original_columns_to_include) {
      warning(paste0("Column '", p_col_original, "' is not in 'columns_to_include'. Highlighting will not be applied to it."), immediate. = TRUE)
    } else {
      # Make sure the original column is character to avoid problems with cell_spec
      table_data[[p_col_original]] <- as.character(table_data[[p_col_original]])
      p_chr <- table_data[[p_col_original]]
      is_sig <- table_data$Significant %in% "Yes" & !is.na(p_chr)

      if (format %in% c("latex", "html")) {
        # cell_spec is documented for "html" and "latex" output only
        table_data <- dplyr::mutate(
          table_data,
          !!rlang::sym(target_col_formatted) := dplyr::case_when(
            is_sig ~ kableExtra::cell_spec(.data[[p_col_original]], format = format, bold = TRUE),
            # Non-significant values still go through cell_spec so escaping stays consistent
            !is.na(.data[[p_col_original]]) ~
              kableExtra::cell_spec(.data[[p_col_original]], format = format, bold = FALSE),
            TRUE ~ .data[[p_col_original]]
          )
        )
      } else {
        # Other formats such as "pipe": fall back to Markdown bold markers
        table_data[[target_col_formatted]] <- ifelse(is_sig, paste0("**", p_chr, "**"), p_chr)
      }
      highlight_applied <- TRUE

      # Update which columns to include and how to rename them
      columns_to_include <- setdiff(columns_to_include, p_col_original) # Drop the original from the final list
      # Insert the new column after the raw p_value column, if present
      insert_pos <- which(columns_to_include == "p_value")
      if (length(insert_pos) == 0) { # If p_value is absent, insert after Variable
        insert_pos <- which(columns_to_include == "Variable")
        if (length(insert_pos) == 0) insert_pos <- 0 # Insert at the start if Variable is absent too
      }
      columns_to_include <- append(columns_to_include, target_col_formatted, after = insert_pos[1])

      # Update the rename map
      column_rename[target_col_formatted] <- final_col_name # Display name for the new column
      column_rename <- column_rename[!names(column_rename) %in% p_col_original] # Drop the original's entry
    }
  }
  
  # Add the Notes column to the final list if it was created
  if (notes_col_present && !("Notes" %in% columns_to_include)) {
    columns_to_include <- c(columns_to_include, "Notes")
  }

  # Select the final columns in the desired order
  final_columns_ordered <- intersect(columns_to_include, names(table_data))

  # Restrict `column_rename` to the columns that actually exist in the final table
  valid_rename_keys <- names(column_rename)[names(column_rename) %in% final_columns_ordered]
  column_rename_final <- column_rename[valid_rename_keys]

  # Build the final table with the selected, renamed columns.
  # dplyr::rename() expects c(new_name = "old_name"), whereas column_rename maps
  # old names to new names, so the mapping is inverted before renaming.
  table_data_final <- dplyr::select(table_data, dplyr::all_of(final_columns_ordered))
  if (length(column_rename_final) > 0) {
    rename_map <- stats::setNames(names(column_rename_final), unname(column_rename_final))
    table_data_final <- dplyr::rename(table_data_final, dplyr::any_of(rename_map))
  }
  
  
  # --- 3. Generate the Kable Table ---
  # escape = FALSE is required when cell_spec was used for formatting (bold)
  escape_setting <- highlight_applied && format %in% c("latex", "html")

  kbl_obj <- knitr::kable(table_data_final,
                          format = format,
                          caption = caption,
                          booktabs = booktabs,
                          linesep = "",
                          escape = !escape_setting, # escape = TRUE when NO special formatting was used
                          align = 'l') # Left-align columns by default

  # --- 4. Apply KableExtra Styling ---
  kbl_obj <- kableExtra::kable_styling(
    kbl_obj,
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = full_width,
    font_size = font_size,
    latex_options = if(format == "latex") c("striped", "repeat_header") else NULL,
    ...
  )

  # Add the footnote
  if (notes_col_present && length(footnote_explanations) > 0) {
    footnote_text <- paste(names(footnote_explanations), footnote_explanations, sep = ": ", collapse = "; ")
    kbl_obj <- kableExtra::footnote(kbl_obj, general = footnote_text,
                                    general_title = "Notes:",
                                    footnote_as_chunk = TRUE,
                                    escape = TRUE, # The footnote is plain text, so it is escaped
                                    threeparttable = (format == "latex")
    )
  }
  
  return(kbl_obj)
}
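
The `column_rename` argument maps original column names to display names. A minimal sketch of applying such a mapping with `dplyr::rename()` follows; it assumes `dplyr` is installed, and `toy`, `old_to_new`, and `new_to_old` are hypothetical names used only here. The point to note is the direction: tidyselect renaming takes `c(new_name = "old_name")`, so an old-to-new map has to be inverted first.

```r
library(dplyr)

# Hypothetical two-column results table
toy <- data.frame(Variable = c("v1", "v2"),
                  p_value = c("0.012", "0.250"),
                  stringsAsFactors = FALSE)

# Mapping in the style of `column_rename`: original name -> display name
old_to_new <- c(Variable = "Variable name", p_value = "P (raw)", n_obs = "N")

# Invert it to c(new_name = "old_name"), the form rename() expects
new_to_old <- stats::setNames(names(old_to_new), unname(old_to_new))

# any_of() silently skips entries whose original column is absent (here: n_obs)
renamed <- dplyr::rename(toy, dplyr::any_of(new_to_old))
names(renamed)
```

Using `any_of()` rather than `all_of()` keeps the call from failing when the map mentions a column that was dropped earlier, which suits a function whose column list is user-configurable.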
# --- Using the create_results_table Function ---

# Make sure the results data frame (e.g., resultados_ex1) exists
if (exists("resultados_ex1") && is.data.frame(resultados_ex1)) {

  # Example 1: Default table in Markdown format (good for console/Rmd)
  cat("\n--- Table 1 (Markdown) ---\n")
  tabela_md <- create_results_table(
    results_df = resultados_ex1,
    caption = "Table 1: Comparison of Variables between Centers (Adjusted for Age and Sex)",
    format = "pipe" # or "markdown"
  )
  print(tabela_md)


  # Example 2: HTML table with selected and renamed columns
  cat("\n--- Table 2 (HTML - see the Viewer or save) ---\n")
  tabela_html <- create_results_table(
    results_df = resultados_ex1,
    caption = "Table 2: Main Results of the Comparison between Centers",
    columns_to_include = c("Variable", "n_obs", "Group_Ref_Level_Used", "p_value", "p_value_FDR"),
    column_rename = c("Variable" = "Variable Analyzed",
                      "n_obs" = "N",
                      "Group_Ref_Level_Used" = "Ref. Group",
                      "p_value" = "P (Raw)",
                      "p_value_FDR" = "P (Adjusted)"),
    highlight_significant = TRUE,
    add_notes_column = TRUE,
    format = "html"
  )
  # print(tabela_html) # In RStudio, opens in the Viewer
  # To save:
  # kableExtra::save_kable(tabela_html, file = "tabela_resultados_exemplo.html")


  # Example 3: LaTeX table (requires a LaTeX installation)
  # cat("\n--- Table 3 (LaTeX - for .tex documents) ---\n")
  # tabela_latex <- create_results_table(
  #   results_df = resultados_ex1,
  #   caption = "Comparison between Centers",
  #   columns_to_include = c("Variable", "n_obs", "p_value_FDR"),
  #   column_rename = c("Variable" = "Variable", "n_obs" = "N", "p_value_FDR" = "Adjusted P"),
  #   highlight_significant = TRUE,
  #   add_notes_column = TRUE,
  #   format = "latex",
  #   booktabs = TRUE
  # )
  # print(tabela_latex) # Prints the LaTeX code


} else {
  print("Please run the compare_groups_auto_v4 example first to generate 'resultados_ex1'.")
}
## [1] "Please run the compare_groups_auto_v4 example first to generate 'resultados_ex1'."
#cat("\n--- Table 2 (HTML - see the Viewer or save) ---\n")
#tabela_html <- create_results_table(
#  results_df = results,
#  caption = "Table 2: Main Results",
#  columns_to_include = c("Variable", "n_obs", "Group_Ref_Level_Used", "p_value", "p_value_FDR"),
#  column_rename = c("Variable" = "Variable Analyzed",
#                    "n_obs" = "N",
#                    "Group_Ref_Level_Used" = "Ref. Group",
#                    "p_value" = "P (Raw)",
#                    "p_value_FDR" = "P (Adjusted)"),
#  highlight_significant = TRUE,
#  add_notes_column = TRUE,
#  format = "html"
#)

#print(tabela_html)

#cat("\n--- Table 2 (HTML - see the Viewer or save) ---\n")
#tabela_html2 <- create_results_table(
#  results_df = results2,
#  caption = "Table 2: Main Results",
#  columns_to_include = c("Variable", "n_obs", "Group_Ref_Level_Used", "p_value", "p_value_FDR"),
#  column_rename = c("Variable" = "Variable Analyzed",
#                    "n_obs" = "N",
#                    "Group_Ref_Level_Used" = "Ref. Group",
#                    "p_value" = "P (Raw)",
#                    "p_value_FDR" = "P (Adjusted)"),
#  highlight_significant = TRUE,
#  add_notes_column = TRUE,
#  format = "html"
#)

#print(tabela_html2)
# Compare brain variables across groups (cognitive)
compare_groups_auto_v4(imputed_data[c(1, 5, 60:90)], group = "cognitive", covariates = "age_pcr")
##                                                          Variable
## 1                                            right_accumbens_area
## 2                                             left_accumbens_area
## 3                                                  right_amygdala
## 4                                                   left_amygdala
## 5                                       right_cerebellum_exterior
## 6                                        left_cerebellum_exterior
## 7                                               right_hippocampus
## 8                                                left_hippocampus
## 9                                                   right_putamen
## 10                                                   left_putamen
## 11                                          right_thalamus_proper
## 12                                           left_thalamus_proper
## 13                                                   fornix_right
## 14                                                    fornix_left
## 15                        anterior_limb_of_internal_capsule_right
## 16                         anterior_limb_of_internal_capsule_left
## 17 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right
## 18  posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left
## 19                                                corpus_callosum
## 20                          right_a_cg_g_anterior_cingulate_gyrus
## 21                           left_a_cg_g_anterior_cingulate_gyrus
## 22                                    right_a_ins_anterior_insula
## 23                                     left_a_ins_anterior_insula
## 24                                       right_an_g_angular_gyrus
## 25                                        left_an_g_angular_gyrus
## 26                                               right_cun_cuneus
## 27                                                left_cun_cuneus
## 28                                      right_ent_entorhinal_area
## 29                                       left_ent_entorhinal_area
## 30                                        right_g_re_gyrus_rectus
## 31                                         left_g_re_gyrus_rectus
##                              Type Covariates_Used Group_Ref_Level_Used n_obs
## 1  Continuous (Gaussian GLM used)         age_pcr                    1   463
## 2  Continuous (Gaussian GLM used)         age_pcr                    1   463
## 3  Continuous (Gaussian GLM used)         age_pcr                    1   463
## 4  Continuous (Gaussian GLM used)         age_pcr                    1   463
## 5  Continuous (Gaussian GLM used)         age_pcr                    1   463
## 6  Continuous (Gaussian GLM used)         age_pcr                    1   463
## 7  Continuous (Gaussian GLM used)         age_pcr                    1   463
## 8  Continuous (Gaussian GLM used)         age_pcr                    1   463
## 9  Continuous (Gaussian GLM used)         age_pcr                    1   463
## 10 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 11 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 12 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 13 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 14 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 15 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 16 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 17 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 18 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 19 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 20 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 21 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 22 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 23 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 24 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 25 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 26 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 27 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 28 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 29 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 30 Continuous (Gaussian GLM used)         age_pcr                    1   463
## 31 Continuous (Gaussian GLM used)         age_pcr                    1   463
##    Status p_value p_value_FDR Significant Gamma_Shift_Warning
## 1      OK   0.884       0.931          No               FALSE
## 2      OK   0.497       0.854          No               FALSE
## 3      OK   0.668       0.867          No               FALSE
## 4      OK   0.318       0.854          No               FALSE
## 5      OK   0.277       0.854          No               FALSE
## 6      OK   0.251       0.854          No               FALSE
## 7      OK   0.134       0.854          No               FALSE
## 8      OK   0.134       0.854          No               FALSE
## 9      OK   0.948       0.948          No               FALSE
## 10     OK   0.439       0.854          No               FALSE
## 11     OK   0.032       0.854          No               FALSE
## 12     OK   0.058       0.854          No               FALSE
## 13     OK   0.410       0.854          No               FALSE
## 14     OK   0.584       0.854          No               FALSE
## 15     OK   0.770       0.905          No               FALSE
## 16     OK   0.528       0.854          No               FALSE
## 17     OK   0.789       0.905          No               FALSE
## 18     OK   0.561       0.854          No               FALSE
## 19     OK   0.492       0.854          No               FALSE
## 20     OK   0.511       0.854          No               FALSE
## 21     OK   0.281       0.854          No               FALSE
## 22     OK   0.283       0.854          No               FALSE
## 23     OK   0.395       0.854          No               FALSE
## 24     OK   0.901       0.931          No               FALSE
## 25     OK   0.565       0.854          No               FALSE
## 26     OK   0.606       0.854          No               FALSE
## 27     OK   0.721       0.894          No               FALSE
## 28     OK   0.671       0.867          No               FALSE
## 29     OK   0.820       0.908          No               FALSE
## 30     OK   0.252       0.854          No               FALSE
## 31     OK   0.210       0.854          No               FALSE
##    Convergence_Warning
## 1                FALSE
## 2                FALSE
## 3                FALSE
## 4                FALSE
## 5                FALSE
## 6                FALSE
## 7                FALSE
## 8                FALSE
## 9                FALSE
## 10               FALSE
## 11               FALSE
## 12               FALSE
## 13               FALSE
## 14               FALSE
## 15               FALSE
## 16               FALSE
## 17               FALSE
## 18               FALSE
## 19               FALSE
## 20               FALSE
## 21               FALSE
## 22               FALSE
## 23               FALSE
## 24               FALSE
## 25               FALSE
## 26               FALSE
## 27               FALSE
## 28               FALSE
## 29               FALSE
## 30               FALSE
## 31               FALSE
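The `p_value_FDR` column above is consistent with a Benjamini-Hochberg adjustment across the 31 tests, although this output does not show how `compare_groups_auto_v4` computes it. A minimal sketch, assuming BH via base R's `p.adjust` and using the rounded p-values printed above (so the third decimal may differ slightly from the table):

```r
# Raw p-values copied from the p_value column above, in row order (rounded to 3 decimals)
p_raw <- c(0.884, 0.497, 0.668, 0.318, 0.277, 0.251, 0.134, 0.134, 0.948, 0.439,
           0.032, 0.058, 0.410, 0.584, 0.770, 0.528, 0.789, 0.561, 0.492, 0.511,
           0.281, 0.283, 0.395, 0.901, 0.565, 0.606, 0.721, 0.671, 0.820, 0.252,
           0.210)

# Benjamini-Hochberg false discovery rate adjustment (assumed method)
p_fdr <- p.adjust(p_raw, method = "BH")

# Number of variables that remain significant at the 5% level after adjustment
n_significant <- sum(p_fdr < 0.05)

# Side-by-side view of raw and adjusted values
print(data.frame(p_raw = p_raw, p_fdr = round(p_fdr, 3)))
```

Under this assumption, right_thalamus_proper (row 11, raw p = 0.032) no longer falls below 0.05 after adjustment, which agrees with the "No" entries in the Significant column.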
# Compare brain variables across groups (age_interval)
compare_groups_auto_v4(imputed_data[c(6, 60:90)], group = "age_interval")
##                                                          Variable
## 1                                            right_accumbens_area
## 2                                             left_accumbens_area
## 3                                                  right_amygdala
## 4                                                   left_amygdala
## 5                                       right_cerebellum_exterior
## 6                                        left_cerebellum_exterior
## 7                                               right_hippocampus
## 8                                                left_hippocampus
## 9                                                   right_putamen
## 10                                                   left_putamen
## 11                                          right_thalamus_proper
## 12                                           left_thalamus_proper
## 13                                                   fornix_right
## 14                                                    fornix_left
## 15                        anterior_limb_of_internal_capsule_right
## 16                         anterior_limb_of_internal_capsule_left
## 17 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right
## 18  posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left
## 19                                                corpus_callosum
## 20                          right_a_cg_g_anterior_cingulate_gyrus
## 21                           left_a_cg_g_anterior_cingulate_gyrus
## 22                                    right_a_ins_anterior_insula
## 23                                     left_a_ins_anterior_insula
## 24                                       right_an_g_angular_gyrus
## 25                                        left_an_g_angular_gyrus
## 26                                               right_cun_cuneus
## 27                                                left_cun_cuneus
## 28                                      right_ent_entorhinal_area
## 29                                       left_ent_entorhinal_area
## 30                                        right_g_re_gyrus_rectus
## 31                                         left_g_re_gyrus_rectus
##                              Type Covariates_Used Group_Ref_Level_Used n_obs
## 1  Continuous (Gaussian GLM used)                                    1   463
## 2  Continuous (Gaussian GLM used)                                    1   463
## 3  Continuous (Gaussian GLM used)                                    1   463
## 4  Continuous (Gaussian GLM used)                                    1   463
## 5  Continuous (Gaussian GLM used)                                    1   463
## 6  Continuous (Gaussian GLM used)                                    1   463
## 7  Continuous (Gaussian GLM used)                                    1   463
## 8  Continuous (Gaussian GLM used)                                    1   463
## 9  Continuous (Gaussian GLM used)                                    1   463
## 10 Continuous (Gaussian GLM used)                                    1   463
## 11 Continuous (Gaussian GLM used)                                    1   463
## 12 Continuous (Gaussian GLM used)                                    1   463
## 13 Continuous (Gaussian GLM used)                                    1   463
## 14 Continuous (Gaussian GLM used)                                    1   463
## 15 Continuous (Gaussian GLM used)                                    1   463
## 16 Continuous (Gaussian GLM used)                                    1   463
## 17 Continuous (Gaussian GLM used)                                    1   463
## 18 Continuous (Gaussian GLM used)                                    1   463
## 19 Continuous (Gaussian GLM used)                                    1   463
## 20 Continuous (Gaussian GLM used)                                    1   463
## 21 Continuous (Gaussian GLM used)                                    1   463
## 22 Continuous (Gaussian GLM used)                                    1   463
## 23 Continuous (Gaussian GLM used)                                    1   463
## 24 Continuous (Gaussian GLM used)                                    1   463
## 25 Continuous (Gaussian GLM used)                                    1   463
## 26 Continuous (Gaussian GLM used)                                    1   463
## 27 Continuous (Gaussian GLM used)                                    1   463
## 28 Continuous (Gaussian GLM used)                                    1   463
## 29 Continuous (Gaussian GLM used)                                    1   463
## 30 Continuous (Gaussian GLM used)                                    1   463
## 31 Continuous (Gaussian GLM used)                                    1   463
##    Status p_value p_value_FDR Significant Gamma_Shift_Warning
## 1      OK   0.008       0.016         Yes               FALSE
## 2      OK   0.081       0.114          No               FALSE
## 3      OK   0.609       0.609          No               FALSE
## 4      OK   0.358       0.397          No               FALSE
## 5      OK   0.095       0.128          No               FALSE
## 6      OK   0.100       0.130          No               FALSE
## 7      OK   0.002       0.004         Yes               FALSE
## 8      OK  <0.001      <0.001         Yes               FALSE
## 9      OK   0.187       0.215          No               FALSE
## 10     OK   0.030       0.049         Yes               FALSE
## 11     OK  <0.001      <0.001         Yes               FALSE
## 12     OK  <0.001      <0.001         Yes               FALSE
## 13     OK  <0.001      <0.001         Yes               FALSE
## 14     OK  <0.001      <0.001         Yes               FALSE
## 15     OK  <0.001      <0.001         Yes               FALSE
## 16     OK  <0.001      <0.001         Yes               FALSE
## 17     OK   0.009       0.016         Yes               FALSE
## 18     OK   0.068       0.100          No               FALSE
## 19     OK   0.007       0.014         Yes               FALSE
## 20     OK  <0.001      <0.001         Yes               FALSE
## 21     OK  <0.001      <0.001         Yes               FALSE
## 22     OK  <0.001      <0.001         Yes               FALSE
## 23     OK  <0.001      <0.001         Yes               FALSE
## 24     OK  <0.001       0.001         Yes               FALSE
## 25     OK   0.028       0.049         Yes               FALSE
## 26     OK   0.004       0.009         Yes               FALSE
## 27     OK   0.458       0.489          No               FALSE
## 28     OK   0.131       0.156          No               FALSE
## 29     OK   0.115       0.143          No               FALSE
## 30     OK   0.040       0.061          No               FALSE
## 31     OK   0.577       0.596          No               FALSE
##    Convergence_Warning
## 1                FALSE
## 2                FALSE
## 3                FALSE
## 4                FALSE
## 5                FALSE
## 6                FALSE
## 7                FALSE
## 8                FALSE
## 9                FALSE
## 10               FALSE
## 11               FALSE
## 12               FALSE
## 13               FALSE
## 14               FALSE
## 15               FALSE
## 16               FALSE
## 17               FALSE
## 18               FALSE
## 19               FALSE
## 20               FALSE
## 21               FALSE
## 22               FALSE
## 23               FALSE
## 24               FALSE
## 25               FALSE
## 26               FALSE
## 27               FALSE
## 28               FALSE
## 29               FALSE
## 30               FALSE
## 31               FALSE
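The Type and Status columns suggest that each continuous variable is tested with a Gaussian GLM and a likelihood ratio test (LRT) on the grouping factor. The internals of `compare_groups_auto_v4` are not shown here, so the sketch below is only an illustration of that kind of test on simulated data; the data frame `toy` and its values are invented, and only the variable names are borrowed from the output above.

```r
set.seed(1)

# Simulated data: three age_interval groups of 50 participants each
n <- 150
toy <- data.frame(age_interval = factor(rep(1:3, each = n / 3)))

# Simulated outcome whose mean decreases across groups (illustrative values only)
toy$left_hippocampus <- 4.0 - 0.2 * as.integer(toy$age_interval) + rnorm(n, sd = 0.3)

# Null model without the group term, full model with it (no covariates, as in the call above)
fit_null <- glm(left_hippocampus ~ 1, family = gaussian(), data = toy)
fit_full <- glm(left_hippocampus ~ age_interval, family = gaussian(), data = toy)

# Likelihood ratio test comparing the two nested models
lrt <- anova(fit_null, fit_full, test = "LRT")
p_value <- lrt[2, "Pr(>Chi)"]
print(p_value)
```

With a covariate such as `age_pcr`, the same comparison would add that term to both formulas, so the test isolates the contribution of the grouping factor.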
# Compare cognitive variables across groups (cognitive)
compare_groups_auto_v4(imputed_data[c(1, 5, 23:59)], group = "cognitive", covariates = "age_pcr")
##            Variable                           Type Covariates_Used
## 1    listaprimerrec Continuous (Gaussian GLM used)         age_pcr
## 2  listaaprendizaje Continuous (Gaussian GLM used)         age_pcr
## 3           listacp Continuous (Gaussian GLM used)         age_pcr
## 4           listalp Continuous (Gaussian GLM used)         age_pcr
## 5        listarecon Continuous (Gaussian GLM used)         age_pcr
## 6      corsidirecto Continuous (Gaussian GLM used)         age_pcr
## 7      corsiinverso Continuous (Gaussian GLM used)         age_pcr
## 8       cactusvivos Continuous (Gaussian GLM used)         age_pcr
## 9      cactusinanim Continuous (Gaussian GLM used)         age_pcr
## 10      otverbaltpo Continuous (Gaussian GLM used)         age_pcr
## 11      otverbalerr Continuous (Gaussian GLM used)         age_pcr
## 12      otvisualtpo Continuous (Gaussian GLM used)         age_pcr
## 13      otvisualerr Continuous (Gaussian GLM used)         age_pcr
## 14      otmentaltpo Continuous (Gaussian GLM used)         age_pcr
## 15      otmentalerr Continuous (Gaussian GLM used)         age_pcr
## 16     otvismenttpo Continuous (Gaussian GLM used)         age_pcr
## 17     otvismenterr Continuous (Gaussian GLM used)         age_pcr
## 18      otswitchtpo Continuous (Gaussian GLM used)         age_pcr
## 19      otswitcherr Continuous (Gaussian GLM used)         age_pcr
## 20       x5dreadtpo Continuous (Gaussian GLM used)         age_pcr
## 21       x5dreaderr Continuous (Gaussian GLM used)         age_pcr
## 22      x5dcounttpo Continuous (Gaussian GLM used)         age_pcr
## 23      x5dcounterr Continuous (Gaussian GLM used)         age_pcr
## 24        x5dfoctpo Continuous (Gaussian GLM used)         age_pcr
## 25        x5dfocerr Continuous (Gaussian GLM used)         age_pcr
## 26     x5dswitchtpo Continuous (Gaussian GLM used)         age_pcr
## 27     x5dswitcherr Continuous (Gaussian GLM used)         age_pcr
## 28           dscorr Continuous (Gaussian GLM used)         age_pcr
## 29           dsomis                     Continuous         age_pcr
## 30          dscomis Continuous (Gaussian GLM used)         age_pcr
## 31         torremov Continuous (Gaussian GLM used)         age_pcr
## 32         torretpo Continuous (Gaussian GLM used)         age_pcr
## 33         bostonsc Continuous (Gaussian GLM used)         age_pcr
## 34        bostonlat Continuous (Gaussian GLM used)         age_pcr
## 35     bostonsemerr Continuous (Gaussian GLM used)         age_pcr
## 36     bostonfonerr                     Continuous         age_pcr
## 37         fluencia Continuous (Gaussian GLM used)         age_pcr
##    Group_Ref_Level_Used n_obs                                    Status p_value
## 1                     1   463                                        OK  <0.001
## 2                     1   463                                        OK  <0.001
## 3                     1   463                                        OK  <0.001
## 4                     1   463                                        OK  <0.001
## 5                     1   463                                        OK  <0.001
## 6                     1   463                                        OK   0.191
## 7                     1   463                                        OK  <0.001
## 8                     1   463                                        OK  <0.001
## 9                     1   463                                        OK  <0.001
## 10                    1   463                                        OK  <0.001
## 11                    1   463                                        OK  <0.001
## 12                    1   463                                        OK  <0.001
## 13                    1   463                                        OK   0.041
## 14                    1   463                                        OK  <0.001
## 15                    1   463                                        OK  <0.001
## 16                    1   463                                        OK  <0.001
## 17                    1   463                                        OK  <0.001
## 18                    1   463                                        OK  <0.001
## 19                    1   463                                        OK  <0.001
## 20                    1   463                                        OK  <0.001
## 21                    1   463                                        OK   0.356
## 22                    1   463                                        OK  <0.001
## 23                    1   463                                        OK   0.811
## 24                    1   463                                        OK  <0.001
## 25                    1   463                                        OK   0.001
## 26                    1   463                                        OK  <0.001
## 27                    1   463                                        OK  <0.001
## 28                    1   463                                        OK  <0.001
## 29                    1   463 Error during model/LRT: NA/NaN/Inf in 'x'    <NA>
## 30                    1   463                                        OK   0.057
## 31                    1   463                                        OK  <0.001
## 32                    1   463                                        OK  <0.001
## 33                    1   463                                        OK  <0.001
## 34                    1   463                                        OK  <0.001
## 35                    1   463                                        OK  <0.001
## 36                    1   463 Error during model/LRT: NA/NaN/Inf in 'x'    <NA>
## 37                    1   463                                        OK  <0.001
##    p_value_FDR Significant Gamma_Shift_Warning Convergence_Warning
## 1       <0.001         Yes               FALSE               FALSE
## 2       <0.001         Yes               FALSE               FALSE
## 3       <0.001         Yes               FALSE               FALSE
## 4       <0.001         Yes               FALSE               FALSE
## 5       <0.001         Yes               FALSE               FALSE
## 6        0.202          No               FALSE               FALSE
## 7       <0.001         Yes               FALSE               FALSE
## 8       <0.001         Yes               FALSE               FALSE
## 9       <0.001         Yes               FALSE               FALSE
## 10      <0.001         Yes               FALSE               FALSE
## 11      <0.001         Yes               FALSE               FALSE
## 12      <0.001         Yes               FALSE               FALSE
## 13       0.046         Yes               FALSE               FALSE
## 14      <0.001         Yes               FALSE               FALSE
## 15      <0.001         Yes               FALSE               FALSE
## 16      <0.001         Yes               FALSE               FALSE
## 17      <0.001         Yes               FALSE               FALSE
## 18      <0.001         Yes               FALSE               FALSE
## 19      <0.001         Yes               FALSE               FALSE
## 20      <0.001         Yes               FALSE               FALSE
## 21       0.366          No               FALSE               FALSE
## 22      <0.001         Yes               FALSE               FALSE
## 23       0.811          No               FALSE               FALSE
## 24      <0.001         Yes               FALSE               FALSE
## 25       0.001         Yes               FALSE               FALSE
## 26      <0.001         Yes               FALSE               FALSE
## 27      <0.001         Yes               FALSE               FALSE
## 28      <0.001         Yes               FALSE               FALSE
## 29        <NA>          No                TRUE               FALSE
## 30       0.062          No               FALSE               FALSE
## 31      <0.001         Yes               FALSE               FALSE
## 32      <0.001         Yes               FALSE               FALSE
## 33      <0.001         Yes               FALSE               FALSE
## 34      <0.001         Yes               FALSE               FALSE
## 35      <0.001         Yes               FALSE               FALSE
## 36        <NA>          No                TRUE               FALSE
## 37      <0.001         Yes               FALSE               FALSE
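Rows 29 (dsomis) and 36 (bostonfonerr) failed during model fitting and are the only rows with Gamma_Shift_Warning set to TRUE. The cause cannot be determined from this output alone. One plausible first check is whether those outcomes contain many zeros, negative values, or non-finite values, which can destabilise a GLM fit. The helper below, `diagnose_outcome`, is hypothetical and is shown on a made-up vector rather than the study data:

```r
# Hypothetical helper: summarise values that commonly break GLM fitting
diagnose_outcome <- function(x) {
  c(
    n_missing   = sum(is.na(x)),                     # NA and NaN
    n_nonfinite = sum(!is.finite(x) & !is.na(x)),    # Inf and -Inf
    n_zero      = sum(x == 0, na.rm = TRUE),
    n_negative  = sum(x < 0, na.rm = TRUE)
  )
}

# Made-up example vector, dominated by zeros like many error-count scores
toy_scores <- c(0, 0, 0, 1, 2, NA, Inf)
report <- diagnose_outcome(toy_scores)
print(report)
```

On the real data, the equivalent call would be `diagnose_outcome(imputed_data$dsomis)`, assuming the column is numeric.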
# Compare symptom variables across groups (cognitive)
compare_groups_auto_v4(imputed_data[1:22], group = "cognitive", covariates = "age_pcr")
##                    Variable                           Type Covariates_Used
## 1                  de800cog Continuous (Gaussian GLM used)         age_pcr
## 2                    images Continuous (Gaussian GLM used)         age_pcr
## 3                  age_2024 Continuous (Gaussian GLM used)         age_pcr
## 4              age_interval Continuous (Gaussian GLM used)         age_pcr
## 5                   anosmia                    Categorical         age_pcr
## 6         risk_hospital_icu                    Categorical         age_pcr
## 7      vaccine_before_study                    Categorical         age_pcr
## 8  covid_before_vaccination                         Binary         age_pcr
## 9                     fever                         Binary         age_pcr
## 10                    cough                         Binary         age_pcr
## 11              muscle_pain                         Binary         age_pcr
## 12               breath_dif                         Binary         age_pcr
## 13               smell_lost                         Binary         age_pcr
## 14               taste_lost                         Binary         age_pcr
## 15                      pcr                         Binary         age_pcr
## 16                  pcr_num                         Binary         age_pcr
## 17            covid_variant                    Categorical         age_pcr
## 18                vaccine_1                    Categorical         age_pcr
## 19                vaccine_2                    Categorical         age_pcr
## 20                vaccine_3                    Categorical         age_pcr
##    Group_Ref_Level_Used n_obs Status p_value p_value_FDR Significant
## 1                     1   463     OK  <0.001      <0.001         Yes
## 2                     1   463     OK   0.235       0.521          No
## 3                     1   463     OK   0.290       0.527          No
## 4                     1   463     OK   0.132       0.330          No
## 5                     1   463     OK   0.002       0.016         Yes
## 6                     1   463     OK   0.534       0.681          No
## 7                     1   463     OK   0.032       0.213          No
## 8                     1   463     OK   0.093       0.266          No
## 9                     1   463     OK   0.335       0.533          No
## 10                    1   463     OK   0.545       0.681          No
## 11                    1   463     OK   0.654       0.727          No
## 12                    1   463     OK   0.063       0.257          No
## 13                    1   463     OK   0.952       0.952          No
## 14                    1   463     OK   0.283       0.527          No
## 15                    1   463     OK   0.087       0.266          No
## 16                    1   462     OK   0.064       0.257          No
## 17                    1   463     OK   0.347       0.533          No
## 18                    1   463     OK   0.451       0.645          No
## 19                    1   463     OK   0.890       0.937          No
## 20                    1   463     OK   0.649       0.727          No
##    Gamma_Shift_Warning Convergence_Warning
## 1                FALSE               FALSE
## 2                FALSE               FALSE
## 3                FALSE               FALSE
## 4                FALSE               FALSE
## 5                FALSE               FALSE
## 6                FALSE               FALSE
## 7                FALSE               FALSE
## 8                FALSE               FALSE
## 9                FALSE               FALSE
## 10               FALSE               FALSE
## 11               FALSE               FALSE
## 12               FALSE               FALSE
## 13               FALSE               FALSE
## 14               FALSE               FALSE
## 15               FALSE               FALSE
## 16               FALSE               FALSE
## 17               FALSE               FALSE
## 18               FALSE               FALSE
## 19               FALSE               FALSE
## 20               FALSE               FALSE
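For the variables typed as Binary above, an age-adjusted group comparison can be illustrated with a logistic regression and a likelihood ratio test. This is a sketch under the assumption that a binomial GLM is appropriate for such outcomes; it is not taken from the definition of `compare_groups_auto_v4`, and the data frame `toy` is simulated.

```r
set.seed(42)

# Simulated data: two cognitive groups, age at PCR, and a binary symptom
n <- 300
toy <- data.frame(
  cognitive = factor(sample(1:2, n, replace = TRUE)),
  age_pcr   = round(runif(n, 20, 80))
)
toy$fever <- rbinom(n, size = 1, prob = 0.4)

# Both models adjust for age; only the full model includes the group term
fit_null <- glm(fever ~ age_pcr, family = binomial(), data = toy)
fit_full <- glm(fever ~ age_pcr + cognitive, family = binomial(), data = toy)

# Likelihood ratio test for the group effect, adjusted for age
p_fever <- anova(fit_null, fit_full, test = "LRT")[2, "Pr(>Chi)"]
print(p_fever)
```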
# Compare symptom variables across groups (age_interval)
compare_groups_auto_v4(imputed_data[1:22], group = "age_interval")
##                    Variable                           Type Covariates_Used
## 1                 cognitive                         Binary                
## 2                  de800cog Continuous (Gaussian GLM used)                
## 3                    images Continuous (Gaussian GLM used)                
## 4                  age_2024 Continuous (Gaussian GLM used)                
## 5                   age_pcr Continuous (Gaussian GLM used)                
## 6                   anosmia                    Categorical                
## 7         risk_hospital_icu                    Categorical                
## 8      vaccine_before_study                    Categorical                
## 9  covid_before_vaccination                         Binary                
## 10                    fever                         Binary                
## 11                    cough                         Binary                
## 12              muscle_pain                         Binary                
## 13               breath_dif                         Binary                
## 14               smell_lost                         Binary                
## 15               taste_lost                         Binary                
## 16                      pcr                         Binary                
## 17                  pcr_num                         Binary                
## 18            covid_variant                    Categorical                
## 19                vaccine_1                    Categorical                
## 20                vaccine_2                    Categorical                
## 21                vaccine_3                    Categorical                
##    Group_Ref_Level_Used n_obs Status p_value p_value_FDR Significant
## 1                     1   463     OK  <0.001      <0.001         Yes
## 2                     1   463     OK  <0.001       0.001         Yes
## 3                     1   463     OK  <0.001      <0.001         Yes
## 4                     1   463     OK  <0.001      <0.001         Yes
## 5                     1   463     OK  <0.001      <0.001         Yes
## 6                     1   463     OK   0.014       0.044         Yes
## 7                     1   463     OK   0.712       0.712          No
## 8                     1   463     OK   0.410       0.478          No
## 9                     1   463     OK   0.470       0.493          No
## 10                    1   463     OK   0.103       0.144          No
## 11                    1   463     OK   0.039       0.074          No
## 12                    1   463     OK   0.027       0.061          No
## 13                    1   463     OK   0.081       0.130          No
## 14                    1   463     OK   0.022       0.057          No
## 15                    1   463     OK   0.015       0.044         Yes
## 16                    1   463     OK   0.087       0.130          No
## 17                    1   462     OK   0.071       0.125          No
## 18                    1   463     OK   0.029       0.061          No
## 19                    1   463     OK   0.432       0.478          No
## 20                    1   463     OK   0.221       0.290          No
## 21                    1   463     OK   0.264       0.326          No
##    Gamma_Shift_Warning Convergence_Warning
## 1                FALSE               FALSE
## 2                FALSE               FALSE
## 3                FALSE               FALSE
## 4                FALSE               FALSE
## 5                FALSE               FALSE
## 6                FALSE               FALSE
## 7                FALSE               FALSE
## 8                FALSE               FALSE
## 9                FALSE               FALSE
## 10               FALSE               FALSE
## 11               FALSE               FALSE
## 12               FALSE               FALSE
## 13               FALSE               FALSE
## 14               FALSE               FALSE
## 15               FALSE               FALSE
## 16               FALSE               FALSE
## 17               FALSE               FALSE
## 18               FALSE               FALSE
## 19               FALSE               FALSE
## 20               FALSE               FALSE
## 21               FALSE               FALSE

Descriptive Statistics and Data Quality

The dataset (N=463) exhibited minimal missingness for most variables, although some items (e.g., certain “cough” or “PCR” measures) had higher rates of missing data. Missing values were imputed using the classification and regression trees (CART) method (see: https://stefvanbuuren.name/fimd/sec-cart.html).
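As a minimal sketch of what CART imputation can look like in R, the example below uses the `mice` package on simulated data. The toy data frame, its column names, and the settings (`m = 5`, the seed) are assumptions for illustration only; the report does not show the imputation call that was actually used.

```r
# Toy data standing in for the study table (hypothetical columns and values).
set.seed(42)
toy <- data.frame(
  age_pcr = round(runif(100, 20, 80)),
  score   = rnorm(100, mean = 50, sd = 10),
  latency = rnorm(100, mean = 300, sd = 40)
)
toy$score[sample(100, 10)]   <- NA   # inject 10 missing scores
toy$latency[sample(100, 15)] <- NA   # inject 15 missing latencies

if (requireNamespace("mice", quietly = TRUE)) {
  library(mice)
  # method = "cart" imputes each incomplete column from a fitted tree
  imp <- mice(toy, m = 5, method = "cart", seed = 123, printFlag = FALSE)
  # Extract the first of the m completed data sets
  toy_completed <- mice::complete(imp, 1)
  print(colSums(is.na(toy_completed)))
}
```

Analysing a single completed data set, as in this sketch, ignores the uncertainty between imputations; pooling across the `m` data sets is the more conservative choice when that matters.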

Outlier diagnostics indicated a small subset of extreme values in several cognitive-performance and neuroimaging measures. These did not prevent model convergence overall, but a few variables required a shift when attempting Gamma-based modeling, suggesting skewed distributions and the presence of zeros or negative values. P-values were corrected for multiple comparisons with a false discovery rate (FDR) controlling procedure in the R code (https://link.springer.com/referenceworkentry/10.1007/978-1-4419-9863-7_223).
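The FDR step can be illustrated with base R's `p.adjust()`. This sketch assumes the Benjamini-Hochberg method and uses made-up p-values and region names; it is not the report's own code, which is not shown here.

```r
# Made-up raw p-values for five hypothetical outcomes
p_raw <- c(regionA = 0.001, regionB = 0.008, regionC = 0.039,
           regionD = 0.041, regionE = 0.200)

# Benjamini-Hochberg adjustment (assumed method)
p_fdr <- p.adjust(p_raw, method = "BH")

result <- data.frame(
  p_value     = p_raw,
  p_value_FDR = p_fdr,
  Significant = ifelse(p_fdr < 0.05, "Yes", "No")
)
print(result)
# p_value_FDR: 0.005, 0.020, 0.05125, 0.05125, 0.200
```

Here regionC and regionD are below 0.05 before adjustment but not after, which is the same pattern reported for several variables in this document.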

Cognitive Group Comparisons (Controlling for Age)

When grouping by the “cognitive” variable (binary/ordinal classification), we tested multiple subcortical and cortical volumes as outcomes in generalized linear models (Gaussian family) with “age_pcr” as a covariate.

Most brain volumetric measures did not show a statistically significant difference between cognitive groups after false discovery rate (FDR) correction. One region (right_thalamus_proper, p=0.032) reached nominal significance before correction but was not significant after multiple-comparison adjustment.
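The age-adjusted group comparison described above can be sketched as a likelihood ratio test between two nested Gaussian GLMs, one with and one without the grouping factor. The data are simulated and the variable `volume` is hypothetical; the internals of `compare_groups_auto_v4` are not shown in this report, so this is only an assumption about the general approach.

```r
set.seed(1)
n <- 200
toy <- data.frame(
  age_pcr   = runif(n, 20, 80),
  cognitive = factor(rep(c("0", "1"), each = n / 2))
)
# Simulated volume: declines with age, plus a built-in group effect
toy$volume <- 5000 - 10 * toy$age_pcr +
  150 * (toy$cognitive == "1") + rnorm(n, sd = 100)

# Null model: covariate only. Full model: covariate + grouping factor.
fit_null <- glm(volume ~ age_pcr, family = gaussian(), data = toy)
fit_full <- glm(volume ~ age_pcr + cognitive, family = gaussian(), data = toy)

# Likelihood ratio test for the group term, adjusted for age
lrt <- anova(fit_null, fit_full, test = "LRT")
p_group <- lrt[["Pr(>Chi)"]][2]
print(p_group)
```

Testing the whole grouping factor at once yields a single p-value per outcome even when the factor has more than two levels, which matches the one-row-per-variable layout of the result tables.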

Age Interval Group Comparisons

By contrast, grouping participants according to their “age_interval” revealed widespread associations with numerous brain volumes (e.g., hippocampus, thalamus, cerebellum, and other subcortical/cortical areas). Several regions displayed highly significant p-values that remained robust after FDR correction. This finding underscores the substantial impact of age on volumetric measures.

Additionally, certain task-based cognitive measures (e.g., reaction times and error rates in the 5D tasks or OT tasks) also varied significantly across age intervals, indicating age-related differences in cognitive performance.

Symptom and Demographic Variables

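Exploratory analyses of symptoms across “age_interval” yielded few significant results after FDR correction: only anosmia (p=0.014) and taste_lost (p=0.015) remained significant (FDR p=0.044 for both). Cough (p=0.039), muscle_pain (p=0.027), smell_lost (p=0.022), and covid_variant (p=0.029) were nominally significant (<0.05 uncorrected) but did not survive correction (FDR p between 0.057 and 0.074).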

Interestingly, “cognitive” status and certain related variables (e.g., de800cog) differed significantly across age intervals (p<0.001), suggesting that older groups may show different cognitive test profiles.

Overall Implications

These preliminary results highlight age as a crucial factor influencing both neuroimaging measures and certain cognitive markers.

The “cognitive” grouping alone did not robustly differentiate subcortical volumes once age was accounted for, suggesting that chronological age exerts a stronger effect on structural brain metrics than the cognitive classification in this sample.

Analysis 1: Grouping by COVID Severity within PCR+ Group

# --- Outcome variable lists (must be defined before running the analyses) ---
# Detailed cognitive test scores:
cognitive_vars_detailed <- c("listaprimerrec", "listaaprendizaje", "listacp", "listalp",
                             "listarecon", "corsidirecto", "corsiinverso", "cactusvivos",
                             "cactusinanim", "otverbaltpo", "otverbalerr", "otvisualtpo",
                             "otvisualerr", "otmentaltpo", "otmentalerr", "otvismenttpo",
                             "otvismenterr", "otswitchtpo", "otswitcherr", "x5dreadtpo",
                             "x5dreaderr", "x5dcounttpo", "x5dcounterr", "x5dfoctpo",
                             "x5dfocerr", "x5dswitchtpo", "x5dswitcherr", "dscorr",
                             "dsomis", "dscomis", "torremov", "torretpo", "bostonsc",
                             "bostonlat", "bostonsemerr", "bostonfonerr", "fluencia")

neuroimaging_vars <- c("right_accumbens_area", "left_accumbens_area", "right_amygdala",
                       "left_amygdala", "right_cerebellum_exterior", "left_cerebellum_exterior",
                       "right_hippocampus", "left_hippocampus", "right_putamen",
                       "left_putamen", "right_thalamus_proper", "left_thalamus_proper",
                       "fornix_right", "fornix_left", "anterior_limb_of_internal_capsule_right",
                       "anterior_limb_of_internal_capsule_left",
                       "posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right",
                       "posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left",
                       "corpus_callosum", "right_a_cg_g_anterior_cingulate_gyrus",
                       "left_a_cg_g_anterior_cingulate_gyrus", "right_a_ins_anterior_insula",
                       "left_a_ins_anterior_insula", "right_an_g_angular_gyrus",
                       "left_an_g_angular_gyrus", "right_cun_cuneus", "left_cun_cuneus",
                       "right_ent_entorhinal_area", "left_ent_entorhinal_area",
                       "right_g_re_gyrus_rectus", "left_g_re_gyrus_rectus")
                       # Add any other relevant imaging vars

# --- Analysis 1: Grouping by COVID Severity within PCR+ group ---
# Goal: Impact of severity on cognition/brain in PCR+ group.

# Create subset for PCR positive
pcr_positive_data <- subset(imputed_data, pcr == "POSITIVA") # Or however POSITIVA is coded

# Check distribution of risk_hospital_icu within this subset
print(table(pcr_positive_data$risk_hospital_icu))
## 
##   0   1   2   3 
## 314  17  49   7
# Consider grouping levels 1, 2, 3 if counts are very low, e.g.:
# pcr_positive_data$severity_grouped <- ifelse(pcr_positive_data$risk_hospital_icu == 0, "0", "1+")
# Use "severity_grouped" as grouping_var if you do this.

# (Assuming pcr_positive_data and cognitive_vars_detailed are defined)

# Run for Cognitive Variables
results_severity_cog <- compare_groups_auto_v4(
  vars_to_test = cognitive_vars_detailed,  
  group = "risk_hospital_icu",        
  covariates = c("age_pcr"),          
  data = pcr_positive_data              
)
print(results_severity_cog)
##            Variable                           Type Covariates_Used
## 1    listaprimerrec Continuous (Gaussian GLM used)         age_pcr
## 2  listaaprendizaje Continuous (Gaussian GLM used)         age_pcr
## 3           listacp Continuous (Gaussian GLM used)         age_pcr
## 4           listalp Continuous (Gaussian GLM used)         age_pcr
## 5        listarecon Continuous (Gaussian GLM used)         age_pcr
## 6      corsidirecto Continuous (Gaussian GLM used)         age_pcr
## 7      corsiinverso Continuous (Gaussian GLM used)         age_pcr
## 8       cactusvivos Continuous (Gaussian GLM used)         age_pcr
## 9      cactusinanim Continuous (Gaussian GLM used)         age_pcr
## 10      otverbaltpo Continuous (Gaussian GLM used)         age_pcr
## 11      otverbalerr Continuous (Gaussian GLM used)         age_pcr
## 12      otvisualtpo Continuous (Gaussian GLM used)         age_pcr
## 13      otvisualerr Continuous (Gaussian GLM used)         age_pcr
## 14      otmentaltpo Continuous (Gaussian GLM used)         age_pcr
## 15      otmentalerr Continuous (Gaussian GLM used)         age_pcr
## 16     otvismenttpo Continuous (Gaussian GLM used)         age_pcr
## 17     otvismenterr Continuous (Gaussian GLM used)         age_pcr
## 18      otswitchtpo Continuous (Gaussian GLM used)         age_pcr
## 19      otswitcherr Continuous (Gaussian GLM used)         age_pcr
## 20       x5dreadtpo Continuous (Gaussian GLM used)         age_pcr
## 21       x5dreaderr Continuous (Gaussian GLM used)         age_pcr
## 22      x5dcounttpo Continuous (Gaussian GLM used)         age_pcr
## 23      x5dcounterr Continuous (Gaussian GLM used)         age_pcr
## 24        x5dfoctpo Continuous (Gaussian GLM used)         age_pcr
## 25        x5dfocerr Continuous (Gaussian GLM used)         age_pcr
## 26     x5dswitchtpo Continuous (Gaussian GLM used)         age_pcr
## 27     x5dswitcherr Continuous (Gaussian GLM used)         age_pcr
## 28           dscorr Continuous (Gaussian GLM used)         age_pcr
## 29           dsomis                     Continuous         age_pcr
## 30          dscomis Continuous (Gaussian GLM used)         age_pcr
## 31         torremov Continuous (Gaussian GLM used)         age_pcr
## 32         torretpo Continuous (Gaussian GLM used)         age_pcr
## 33         bostonsc Continuous (Gaussian GLM used)         age_pcr
## 34        bostonlat Continuous (Gaussian GLM used)         age_pcr
## 35     bostonsemerr Continuous (Gaussian GLM used)         age_pcr
## 36     bostonfonerr Continuous (Gaussian GLM used)         age_pcr
## 37         fluencia Continuous (Gaussian GLM used)         age_pcr
##    Group_Ref_Level_Used n_obs                                    Status p_value
## 1                     0   387                                        OK   0.463
## 2                     0   387                                        OK   0.201
## 3                     0   387                                        OK   0.667
## 4                     0   387                                        OK   0.833
## 5                     0   387                                        OK   0.191
## 6                     0   387                                        OK   0.870
## 7                     0   387                                        OK   0.132
## 8                     0   387                                        OK   0.810
## 9                     0   387                                        OK   0.424
## 10                    0   387                                        OK   0.958
## 11                    0   387                                        OK   0.072
## 12                    0   387                                        OK   0.979
## 13                    0   387                                        OK   0.977
## 14                    0   387                                        OK   0.322
## 15                    0   387                                        OK   0.590
## 16                    0   387                                        OK   0.114
## 17                    0   387                                        OK   0.892
## 18                    0   387                                        OK   0.435
## 19                    0   387                                        OK   0.755
## 20                    0   387                                        OK   0.380
## 21                    0   387                                        OK   0.499
## 22                    0   387                                        OK   0.520
## 23                    0   387                                        OK   0.937
## 24                    0   387                                        OK   0.937
## 25                    0   387                                        OK   0.894
## 26                    0   387                                        OK   0.418
## 27                    0   387                                        OK   0.919
## 28                    0   387                                        OK   0.960
## 29                    0   387 Error during model/LRT: NA/NaN/Inf in 'x'    <NA>
## 30                    0   387                                        OK   0.648
## 31                    0   387                                        OK   0.049
## 32                    0   387                                        OK   0.254
## 33                    0   387                                        OK   0.096
## 34                    0   387                                        OK   0.327
## 35                    0   387                                        OK   0.105
## 36                    0   387                                        OK   0.185
## 37                    0   387                                        OK   0.106
##    p_value_FDR Significant Gamma_Shift_Warning Convergence_Warning
## 1        0.926          No               FALSE               FALSE
## 2        0.723          No               FALSE               FALSE
## 3        0.979          No               FALSE               FALSE
## 4        0.979          No               FALSE               FALSE
## 5        0.723          No               FALSE               FALSE
## 6        0.979          No               FALSE               FALSE
## 7        0.681          No               FALSE               FALSE
## 8        0.979          No               FALSE               FALSE
## 9        0.922          No               FALSE               FALSE
## 10       0.979          No               FALSE               FALSE
## 11       0.681          No               FALSE               FALSE
## 12       0.979          No               FALSE               FALSE
## 13       0.979          No               FALSE               FALSE
## 14       0.906          No               FALSE               FALSE
## 15       0.979          No               FALSE               FALSE
## 16       0.681          No               FALSE               FALSE
## 17       0.979          No               FALSE               FALSE
## 18       0.922          No               FALSE               FALSE
## 19       0.979          No               FALSE               FALSE
## 20       0.922          No               FALSE               FALSE
## 21       0.935          No               FALSE               FALSE
## 22       0.935          No               FALSE               FALSE
## 23       0.979          No               FALSE               FALSE
## 24       0.979          No               FALSE               FALSE
## 25       0.979          No               FALSE               FALSE
## 26       0.922          No               FALSE               FALSE
## 27       0.979          No               FALSE               FALSE
## 28       0.979          No               FALSE               FALSE
## 29        <NA>          No                TRUE               FALSE
## 30       0.979          No               FALSE               FALSE
## 31       0.681          No               FALSE               FALSE
## 32       0.832          No               FALSE               FALSE
## 33       0.681          No               FALSE               FALSE
## 34       0.906          No               FALSE               FALSE
## 35       0.681          No               FALSE               FALSE
## 36       0.723          No               FALSE               FALSE
## 37       0.681          No               FALSE               FALSE
# Run for Neuroimaging Variables
results_severity_neuro <- compare_groups_auto_v4(
  vars_to_test = neuroimaging_vars,       
  group = "risk_hospital_icu",        
  covariates = c("age_pcr"),          
  data = pcr_positive_data              
)
print(results_severity_neuro)
##                                                          Variable
## 1                                            right_accumbens_area
## 2                                             left_accumbens_area
## 3                                                  right_amygdala
## 4                                                   left_amygdala
## 5                                       right_cerebellum_exterior
## 6                                        left_cerebellum_exterior
## 7                                               right_hippocampus
## 8                                                left_hippocampus
## 9                                                   right_putamen
## 10                                                   left_putamen
## 11                                          right_thalamus_proper
## 12                                           left_thalamus_proper
## 13                                                   fornix_right
## 14                                                    fornix_left
## 15                        anterior_limb_of_internal_capsule_right
## 16                         anterior_limb_of_internal_capsule_left
## 17 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right
## 18  posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left
## 19                                                corpus_callosum
## 20                          right_a_cg_g_anterior_cingulate_gyrus
## 21                           left_a_cg_g_anterior_cingulate_gyrus
## 22                                    right_a_ins_anterior_insula
## 23                                     left_a_ins_anterior_insula
## 24                                       right_an_g_angular_gyrus
## 25                                        left_an_g_angular_gyrus
## 26                                               right_cun_cuneus
## 27                                                left_cun_cuneus
## 28                                      right_ent_entorhinal_area
## 29                                       left_ent_entorhinal_area
## 30                                        right_g_re_gyrus_rectus
## 31                                         left_g_re_gyrus_rectus
##                              Type Covariates_Used Group_Ref_Level_Used n_obs
## 1  Continuous (Gaussian GLM used)         age_pcr                    0   387
## 2  Continuous (Gaussian GLM used)         age_pcr                    0   387
## 3  Continuous (Gaussian GLM used)         age_pcr                    0   387
## 4  Continuous (Gaussian GLM used)         age_pcr                    0   387
## 5  Continuous (Gaussian GLM used)         age_pcr                    0   387
## 6  Continuous (Gaussian GLM used)         age_pcr                    0   387
## 7  Continuous (Gaussian GLM used)         age_pcr                    0   387
## 8  Continuous (Gaussian GLM used)         age_pcr                    0   387
## 9  Continuous (Gaussian GLM used)         age_pcr                    0   387
## 10 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 11 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 12 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 13 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 14 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 15 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 16 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 17 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 18 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 19 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 20 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 21 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 22 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 23 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 24 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 25 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 26 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 27 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 28 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 29 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 30 Continuous (Gaussian GLM used)         age_pcr                    0   387
## 31 Continuous (Gaussian GLM used)         age_pcr                    0   387
##    Status p_value p_value_FDR Significant Gamma_Shift_Warning
## 1      OK   0.578       0.853          No               FALSE
## 2      OK   0.439       0.853          No               FALSE
## 3      OK   0.062       0.595          No               FALSE
## 4      OK   0.247       0.850          No               FALSE
## 5      OK   0.431       0.853          No               FALSE
## 6      OK   0.344       0.853          No               FALSE
## 7      OK   0.722       0.948          No               FALSE
## 8      OK   0.501       0.853          No               FALSE
## 9      OK   0.986       0.989          No               FALSE
## 10     OK   0.795       0.948          No               FALSE
## 11     OK   0.774       0.948          No               FALSE
## 12     OK   0.894       0.956          No               FALSE
## 13     OK   0.154       0.680          No               FALSE
## 14     OK   0.069       0.595          No               FALSE
## 15     OK   0.637       0.897          No               FALSE
## 16     OK   0.542       0.853          No               FALSE
## 17     OK   0.837       0.951          No               FALSE
## 18     OK   0.357       0.853          No               FALSE
## 19     OK   0.565       0.853          No               FALSE
## 20     OK   0.989       0.989          No               FALSE
## 21     OK   0.483       0.853          No               FALSE
## 22     OK   0.457       0.853          No               FALSE
## 23     OK   0.372       0.853          No               FALSE
## 24     OK   0.077       0.595          No               FALSE
## 25     OK   0.236       0.850          No               FALSE
## 26     OK   0.146       0.680          No               FALSE
## 27     OK   0.373       0.853          No               FALSE
## 28     OK   0.027       0.595          No               FALSE
## 29     OK   0.753       0.948          No               FALSE
## 30     OK   0.142       0.680          No               FALSE
## 31     OK   0.859       0.951          No               FALSE
##    Convergence_Warning
## 1                FALSE
## 2                FALSE
## 3                FALSE
## 4                FALSE
## 5                FALSE
## 6                FALSE
## 7                FALSE
## 8                FALSE
## 9                FALSE
## 10               FALSE
## 11               FALSE
## 12               FALSE
## 13               FALSE
## 14               FALSE
## 15               FALSE
## 16               FALSE
## 17               FALSE
## 18               FALSE
## 19               FALSE
## 20               FALSE
## 21               FALSE
## 22               FALSE
## 23               FALSE
## 24               FALSE
## 25               FALSE
## 26               FALSE
## 27               FALSE
## 28               FALSE
## 29               FALSE
## 30               FALSE
## 31               FALSE

Analysis 1: COVID Severity (risk_hospital_icu) within PCR+ Group (N=387)

Cognitive Outcomes: After adjusting for age (age_pcr) and correcting for multiple comparisons (FDR), no significant differences were found across COVID-19 severity levels (defined by risk_hospital_icu, reference level 0) for any of the detailed cognitive performance variables within the PCR-positive group. The model for dsomis could not be fitted (NA/NaN/Inf error, with a Gamma shift warning raised), so no p-value is available for that variable. One variable (torremov, p=0.049) showed nominal significance before FDR correction but was not significant afterward (FDR p=0.681).

Neuroimaging Outcomes: Similarly, when examining structural neuroimaging volumes within the PCR-positive group, no significant differences were detected across COVID-19 severity levels after adjusting for age (age_pcr) and applying FDR correction. One variable (right_ent_entorhinal_area, p=0.027) showed nominal significance but did not survive multiple comparison correction (FDR p=0.595).
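The dsomis row pairs a Gamma shift warning with an NA/NaN/Inf model error. A Gamma GLM needs strictly positive outcomes, so scores containing zeros or negative values have to be shifted first. The sketch below shows one way such a shift could be done; `shift_for_gamma`, its `offset` argument, and the simulated `omissions` score are hypothetical and are not taken from `compare_groups_auto_v4`.

```r
# Hypothetical helper (not the function used in this report): shift an outcome
# so that all values are strictly positive, as a Gamma GLM requires.
shift_for_gamma <- function(y, offset = 1) {
  stopifnot(all(is.finite(y)))          # fail early on NA/NaN/Inf
  min_y <- min(y)
  if (min_y > 0) {
    return(list(y = y, shifted = FALSE))
  }
  list(y = y - min_y + offset, shifted = TRUE)
}

set.seed(7)
toy <- data.frame(age_pcr = runif(80, 20, 70))
toy$omissions <- rpois(80, lambda = 1)  # simulated count-like score with zeros

res <- shift_for_gamma(toy$omissions)
toy$omissions_pos <- res$y

# Gamma GLM on the shifted outcome, adjusted for age
fit <- glm(omissions_pos ~ age_pcr, family = Gamma(link = "log"), data = toy)
print(res$shifted)
print(coef(fit))
```

A shift changes the scale on which coefficients are interpreted, so for error counts dominated by zeros a count model or a rank-based test may be worth considering as an alternative.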

Analysis 2: Grouping by Vaccination Status (Pre-Study)

# Goal: Explore relationship between pre-study vaccination and outcomes.

# Define key symptom variables if needed
symptom_vars_key <- c("anosmia", "risk_hospital_icu") # Add others if desired

# Run for Cognitive Variables
results_vaccine_cog <- compare_groups_auto_v4(
  vars_to_test = cognitive_vars_detailed,
  group = "vaccine_before_study",
  covariates = c("age_pcr"),
  data = imputed_data
)
print(results_vaccine_cog)
##            Variable                           Type Covariates_Used
## 1    listaprimerrec Continuous (Gaussian GLM used)         age_pcr
## 2  listaaprendizaje Continuous (Gaussian GLM used)         age_pcr
## 3           listacp Continuous (Gaussian GLM used)         age_pcr
## 4           listalp Continuous (Gaussian GLM used)         age_pcr
## 5        listarecon Continuous (Gaussian GLM used)         age_pcr
## 6      corsidirecto Continuous (Gaussian GLM used)         age_pcr
## 7      corsiinverso Continuous (Gaussian GLM used)         age_pcr
## 8       cactusvivos Continuous (Gaussian GLM used)         age_pcr
## 9      cactusinanim Continuous (Gaussian GLM used)         age_pcr
## 10      otverbaltpo Continuous (Gaussian GLM used)         age_pcr
## 11      otverbalerr Continuous (Gaussian GLM used)         age_pcr
## 12      otvisualtpo Continuous (Gaussian GLM used)         age_pcr
## 13      otvisualerr Continuous (Gaussian GLM used)         age_pcr
## 14      otmentaltpo Continuous (Gaussian GLM used)         age_pcr
## 15      otmentalerr Continuous (Gaussian GLM used)         age_pcr
## 16     otvismenttpo Continuous (Gaussian GLM used)         age_pcr
## 17     otvismenterr Continuous (Gaussian GLM used)         age_pcr
## 18      otswitchtpo Continuous (Gaussian GLM used)         age_pcr
## 19      otswitcherr Continuous (Gaussian GLM used)         age_pcr
## 20       x5dreadtpo Continuous (Gaussian GLM used)         age_pcr
## 21       x5dreaderr Continuous (Gaussian GLM used)         age_pcr
## 22      x5dcounttpo Continuous (Gaussian GLM used)         age_pcr
## 23      x5dcounterr Continuous (Gaussian GLM used)         age_pcr
## 24        x5dfoctpo Continuous (Gaussian GLM used)         age_pcr
## 25        x5dfocerr Continuous (Gaussian GLM used)         age_pcr
## 26     x5dswitchtpo Continuous (Gaussian GLM used)         age_pcr
## 27     x5dswitcherr Continuous (Gaussian GLM used)         age_pcr
## 28           dscorr Continuous (Gaussian GLM used)         age_pcr
## 29           dsomis                     Continuous         age_pcr
## 30          dscomis Continuous (Gaussian GLM used)         age_pcr
## 31         torremov Continuous (Gaussian GLM used)         age_pcr
## 32         torretpo Continuous (Gaussian GLM used)         age_pcr
## 33         bostonsc Continuous (Gaussian GLM used)         age_pcr
## 34        bostonlat Continuous (Gaussian GLM used)         age_pcr
## 35     bostonsemerr Continuous (Gaussian GLM used)         age_pcr
## 36     bostonfonerr                     Continuous         age_pcr
## 37         fluencia Continuous (Gaussian GLM used)         age_pcr
##    Group_Ref_Level_Used n_obs                                    Status p_value
## 1                     0   463                                        OK   0.770
## 2                     0   463                                        OK   0.007
## 3                     0   463                                        OK   0.681
## 4                     0   463                                        OK   0.585
## 5                     0   463                                        OK   0.476
## 6                     0   463                                        OK   0.069
## 7                     0   463                                        OK   0.735
## 8                     0   463                                        OK   0.703
## 9                     0   463                                        OK   0.953
## 10                    0   463                                        OK   0.525
## 11                    0   463                                        OK   0.376
## 12                    0   463                                        OK   0.070
## 13                    0   463                                        OK   0.410
## 14                    0   463                                        OK   0.091
## 15                    0   463                                        OK   0.814
## 16                    0   463                                        OK   0.017
## 17                    0   463                                        OK   0.595
## 18                    0   463                                        OK   0.013
## 19                    0   463                                        OK   0.175
## 20                    0   463                                        OK   0.224
## 21                    0   463                                        OK   0.264
## 22                    0   463                                        OK   0.155
## 23                    0   463                                        OK   0.325
## 24                    0   463                                        OK   0.325
## 25                    0   463                                        OK   0.077
## 26                    0   463                                        OK   0.207
## 27                    0   463                                        OK   0.220
## 28                    0   463                                        OK   0.279
## 29                    0   463 Error during model/LRT: NA/NaN/Inf in 'x'    <NA>
## 30                    0   463                                        OK   0.041
## 31                    0   463                                        OK   0.064
## 32                    0   463                                        OK   0.001
## 33                    0   463                                        OK   0.402
## 34                    0   463                                        OK  <0.001
## 35                    0   463                                        OK   0.022
## 36                    0   463 Error during model/LRT: NA/NaN/Inf in 'x'    <NA>
## 37                    0   463                                        OK   0.518
##    p_value_FDR Significant Gamma_Shift_Warning Convergence_Warning
## 1        0.816          No               FALSE               FALSE
## 2        0.079          No               FALSE               FALSE
## 3        0.794          No               FALSE               FALSE
## 4        0.718          No               FALSE               FALSE
## 5        0.666          No               FALSE               FALSE
## 6        0.245          No               FALSE               FALSE
## 7        0.804          No               FALSE               FALSE
## 8        0.794          No               FALSE               FALSE
## 9        0.953          No               FALSE               FALSE
## 10       0.681          No               FALSE               FALSE
## 11       0.598          No               FALSE               FALSE
## 12       0.245          No               FALSE               FALSE
## 13       0.598          No               FALSE               FALSE
## 14       0.265          No               FALSE               FALSE
## 15       0.838          No               FALSE               FALSE
## 16       0.120          No               FALSE               FALSE
## 17       0.718          No               FALSE               FALSE
## 18       0.110          No               FALSE               FALSE
## 19       0.437          No               FALSE               FALSE
## 20       0.461          No               FALSE               FALSE
## 21       0.513          No               FALSE               FALSE
## 22       0.418          No               FALSE               FALSE
## 23       0.541          No               FALSE               FALSE
## 24       0.541          No               FALSE               FALSE
## 25       0.245          No               FALSE               FALSE
## 26       0.461          No               FALSE               FALSE
## 27       0.461          No               FALSE               FALSE
## 28       0.514          No               FALSE               FALSE
## 29        <NA>          No                TRUE               FALSE
## 30       0.207          No               FALSE               FALSE
## 31       0.245          No               FALSE               FALSE
## 32       0.023         Yes               FALSE               FALSE
## 33       0.598          No               FALSE               FALSE
## 34       0.009         Yes               FALSE               FALSE
## 35       0.127          No               FALSE               FALSE
## 36        <NA>          No                TRUE               FALSE
## 37       0.681          No               FALSE               FALSE
# Run for Neuroimaging Variables
results_vaccine_neuro <- compare_groups_auto_v4(
  vars_to_test = neuroimaging_vars,
  group = "vaccine_before_study",
  covariates = c("age_pcr"),
  data = imputed_data
)
print(results_vaccine_neuro)
##                                                          Variable
## 1                                            right_accumbens_area
## 2                                             left_accumbens_area
## 3                                                  right_amygdala
## 4                                                   left_amygdala
## 5                                       right_cerebellum_exterior
## 6                                        left_cerebellum_exterior
## 7                                               right_hippocampus
## 8                                                left_hippocampus
## 9                                                   right_putamen
## 10                                                   left_putamen
## 11                                          right_thalamus_proper
## 12                                           left_thalamus_proper
## 13                                                   fornix_right
## 14                                                    fornix_left
## 15                        anterior_limb_of_internal_capsule_right
## 16                         anterior_limb_of_internal_capsule_left
## 17 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right
## 18  posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left
## 19                                                corpus_callosum
## 20                          right_a_cg_g_anterior_cingulate_gyrus
## 21                           left_a_cg_g_anterior_cingulate_gyrus
## 22                                    right_a_ins_anterior_insula
## 23                                     left_a_ins_anterior_insula
## 24                                       right_an_g_angular_gyrus
## 25                                        left_an_g_angular_gyrus
## 26                                               right_cun_cuneus
## 27                                                left_cun_cuneus
## 28                                      right_ent_entorhinal_area
## 29                                       left_ent_entorhinal_area
## 30                                        right_g_re_gyrus_rectus
## 31                                         left_g_re_gyrus_rectus
##                              Type Covariates_Used Group_Ref_Level_Used n_obs
## 1  Continuous (Gaussian GLM used)         age_pcr                    0   463
## 2  Continuous (Gaussian GLM used)         age_pcr                    0   463
## 3  Continuous (Gaussian GLM used)         age_pcr                    0   463
## 4  Continuous (Gaussian GLM used)         age_pcr                    0   463
## 5  Continuous (Gaussian GLM used)         age_pcr                    0   463
## 6  Continuous (Gaussian GLM used)         age_pcr                    0   463
## 7  Continuous (Gaussian GLM used)         age_pcr                    0   463
## 8  Continuous (Gaussian GLM used)         age_pcr                    0   463
## 9  Continuous (Gaussian GLM used)         age_pcr                    0   463
## 10 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 11 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 12 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 13 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 14 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 15 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 16 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 17 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 18 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 19 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 20 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 21 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 22 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 23 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 24 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 25 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 26 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 27 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 28 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 29 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 30 Continuous (Gaussian GLM used)         age_pcr                    0   463
## 31 Continuous (Gaussian GLM used)         age_pcr                    0   463
##    Status p_value p_value_FDR Significant Gamma_Shift_Warning
## 1      OK   0.352       0.971          No               FALSE
## 2      OK   0.202       0.971          No               FALSE
## 3      OK   0.901       0.984          No               FALSE
## 4      OK   0.317       0.971          No               FALSE
## 5      OK   0.956       0.984          No               FALSE
## 6      OK   0.984       0.984          No               FALSE
## 7      OK   0.976       0.984          No               FALSE
## 8      OK   0.859       0.984          No               FALSE
## 9      OK   0.740       0.984          No               FALSE
## 10     OK   0.376       0.971          No               FALSE
## 11     OK   0.620       0.984          No               FALSE
## 12     OK   0.625       0.984          No               FALSE
## 13     OK   0.228       0.971          No               FALSE
## 14     OK   0.275       0.971          No               FALSE
## 15     OK   0.602       0.984          No               FALSE
## 16     OK   0.854       0.984          No               FALSE
## 17     OK   0.789       0.984          No               FALSE
## 18     OK   0.524       0.984          No               FALSE
## 19     OK   0.250       0.971          No               FALSE
## 20     OK   0.264       0.971          No               FALSE
## 21     OK   0.052       0.971          No               FALSE
## 22     OK   0.122       0.971          No               FALSE
## 23     OK   0.089       0.971          No               FALSE
## 24     OK   0.664       0.984          No               FALSE
## 25     OK   0.824       0.984          No               FALSE
## 26     OK   0.932       0.984          No               FALSE
## 27     OK   0.759       0.984          No               FALSE
## 28     OK   0.722       0.984          No               FALSE
## 29     OK   0.572       0.984          No               FALSE
## 30     OK   0.809       0.984          No               FALSE
## 31     OK   0.310       0.971          No               FALSE
##    Convergence_Warning
## 1                FALSE
## 2                FALSE
## 3                FALSE
## 4                FALSE
## 5                FALSE
## 6                FALSE
## 7                FALSE
## 8                FALSE
## 9                FALSE
## 10               FALSE
## 11               FALSE
## 12               FALSE
## 13               FALSE
## 14               FALSE
## 15               FALSE
## 16               FALSE
## 17               FALSE
## 18               FALSE
## 19               FALSE
## 20               FALSE
## 21               FALSE
## 22               FALSE
## 23               FALSE
## 24               FALSE
## 25               FALSE
## 26               FALSE
## 27               FALSE
## 28               FALSE
## 29               FALSE
## 30               FALSE
## 31               FALSE
# Run for Key Symptoms
results_vaccine_symptoms <- compare_groups_auto_v4(
  vars_to_test = symptom_vars_key,
  group = "vaccine_before_study",
  covariates = c("age_pcr"),
  data = imputed_data
)
print(results_vaccine_symptoms)
##            Variable        Type Covariates_Used Group_Ref_Level_Used n_obs
## 1           anosmia Categorical         age_pcr                    0   463
## 2 risk_hospital_icu Categorical         age_pcr                    0   463
##   Status p_value p_value_FDR Significant Gamma_Shift_Warning
## 1     OK   0.793       0.793          No               FALSE
## 2     OK   0.025       0.050         Yes               FALSE
##   Convergence_Warning
## 1               FALSE
## 2               FALSE
# Note: You might later stratify these by PCR status if sample sizes allow.
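
Whether stratifying by PCR status is feasible depends mainly on how many participants fall into each PCR-by-vaccination cell. The sketch below shows one way to check this before rerunning the comparisons. The data frame `toy`, the column name `pcr`, and the threshold `min_cell` are assumptions made for illustration; they are not taken from the study data.

```r
# Hypothetical stand-in for imputed_data; values are simulated for illustration only.
set.seed(1)
toy <- data.frame(
  pcr = rbinom(200, 1, 0.8),
  vaccine_before_study = rbinom(200, 1, 0.5)
)

# Cross-tabulate PCR status (rows) by pre-study vaccination status (columns).
cell_counts <- table(pcr = toy$pcr, vaccine = toy$vaccine_before_study)
print(cell_counts)

# min_cell is an arbitrary illustrative threshold, not a value from the study.
min_cell <- 20

# A PCR stratum is usable only if both vaccination groups reach the threshold.
strata_ok <- apply(cell_counts, 1, function(row) all(row >= min_cell))
print(strata_ok)
```

A stratum flagged `FALSE` here would probably be too small for an age-adjusted model per vaccination group.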

2. Analysis: Pre-Study Vaccination Status (vaccine_before_study) - All Participants (N=463)

Cognitive Outcomes: After adjusting for age (age_pcr) and applying FDR correction, two cognitive measures differed significantly by pre-study vaccination status (reference level 0): torretpo (Tower Task Time, p=0.001, FDR p=0.023) and bostonlat (Boston Latency, p<0.001, FDR p=0.009). No other cognitive variable remained significant after FDR correction, although listaaprendizaje, otvismenttpo, otswitchtpo, dscomis, and bostonsemerr had uncorrected p-values < 0.05. The models for dsomis and bostonfonerr failed during fitting (Gamma_Shift_Warning was TRUE for both), so no p-values are available for them.

Neuroimaging Outcomes: After adjusting for age (age_pcr) and correcting for multiple comparisons, none of the 31 neuroimaging volumes differed significantly by pre-study vaccination status (all FDR p>=0.971). The smallest uncorrected p-value was 0.052 (left_a_cg_g_anterior_cingulate_gyrus).

Symptom Outcomes: With anosmia and risk_hospital_icu as outcomes and age as a covariate, risk_hospital_icu differed across vaccination status groups (p=0.025, FDR p=0.050). The adjusted value sits at the 0.05 threshold as displayed, so this result is borderline. No significant difference was found for anosmia (FDR p=0.793).
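
The FDR-adjusted values in the symptom table can be checked by hand with the Benjamini-Hochberg procedure. The sketch below uses base R's `p.adjust()` on the rounded p-values displayed above. Whether `compare_groups_auto_v4` calls `p.adjust()` internally is an assumption, since its source is not shown in this section.

```r
# Rounded p-values as displayed in the symptom table above.
p_raw <- c(anosmia = 0.793, risk_hospital_icu = 0.025)

# Benjamini-Hochberg: each sorted p-value is scaled by n / rank,
# then made monotone from the largest p-value downward.
p_fdr <- p.adjust(p_raw, method = "BH")
print(round(p_fdr, 3))

# A failed model contributes NA. p.adjust() leaves it as NA and counts
# only the non-missing p-values as the number of tests.
p_with_failure <- c(p_raw, failed_model = NA)
p_fdr_with_failure <- p.adjust(p_with_failure, method = "BH")
print(round(p_fdr_with_failure, 3))
```

With two tests, the smaller p-value is doubled (0.025 becomes 0.050) and the larger one is unchanged (0.793), which matches the p_value_FDR column for the symptom outcomes.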

— Analysis 3: Grouping by Infection Timing Relative to Vaccination (within PCR+) —

# Goal: Does getting COVID before vs. after vaccination associate with different outcomes?

# Ensure the grouping variable isn't mostly missing data in the subset
print(table(pcr_positive_data$covid_before_vaccination, useNA = "ifany"))
## 
##   0   1 
## 187 200
# Run for Cognitive Variables
results_timing_cog <- compare_groups_auto_v4(
  vars_to_test = cognitive_vars_detailed,
  group = "covid_before_vaccination",
  covariates = c("age_pcr", "vaccine_before_study"), # Control for overall vaccine status too
  data = pcr_positive_data
)
print(results_timing_cog)
##            Variable                           Type
## 1    listaprimerrec Continuous (Gaussian GLM used)
## 2  listaaprendizaje Continuous (Gaussian GLM used)
## 3           listacp Continuous (Gaussian GLM used)
## 4           listalp Continuous (Gaussian GLM used)
## 5        listarecon Continuous (Gaussian GLM used)
## 6      corsidirecto Continuous (Gaussian GLM used)
## 7      corsiinverso Continuous (Gaussian GLM used)
## 8       cactusvivos Continuous (Gaussian GLM used)
## 9      cactusinanim Continuous (Gaussian GLM used)
## 10      otverbaltpo Continuous (Gaussian GLM used)
## 11      otverbalerr Continuous (Gaussian GLM used)
## 12      otvisualtpo Continuous (Gaussian GLM used)
## 13      otvisualerr Continuous (Gaussian GLM used)
## 14      otmentaltpo Continuous (Gaussian GLM used)
## 15      otmentalerr Continuous (Gaussian GLM used)
## 16     otvismenttpo Continuous (Gaussian GLM used)
## 17     otvismenterr Continuous (Gaussian GLM used)
## 18      otswitchtpo Continuous (Gaussian GLM used)
## 19      otswitcherr Continuous (Gaussian GLM used)
## 20       x5dreadtpo Continuous (Gaussian GLM used)
## 21       x5dreaderr Continuous (Gaussian GLM used)
## 22      x5dcounttpo Continuous (Gaussian GLM used)
## 23      x5dcounterr Continuous (Gaussian GLM used)
## 24        x5dfoctpo Continuous (Gaussian GLM used)
## 25        x5dfocerr Continuous (Gaussian GLM used)
## 26     x5dswitchtpo Continuous (Gaussian GLM used)
## 27     x5dswitcherr Continuous (Gaussian GLM used)
## 28           dscorr Continuous (Gaussian GLM used)
## 29           dsomis                     Continuous
## 30          dscomis Continuous (Gaussian GLM used)
## 31         torremov Continuous (Gaussian GLM used)
## 32         torretpo Continuous (Gaussian GLM used)
## 33         bostonsc Continuous (Gaussian GLM used)
## 34        bostonlat Continuous (Gaussian GLM used)
## 35     bostonsemerr Continuous (Gaussian GLM used)
## 36     bostonfonerr Continuous (Gaussian GLM used)
## 37         fluencia Continuous (Gaussian GLM used)
##                  Covariates_Used Group_Ref_Level_Used n_obs
## 1  age_pcr, vaccine_before_study                    0   387
## 2  age_pcr, vaccine_before_study                    0   387
## 3  age_pcr, vaccine_before_study                    0   387
## 4  age_pcr, vaccine_before_study                    0   387
## 5  age_pcr, vaccine_before_study                    0   387
## 6  age_pcr, vaccine_before_study                    0   387
## 7  age_pcr, vaccine_before_study                    0   387
## 8  age_pcr, vaccine_before_study                    0   387
## 9  age_pcr, vaccine_before_study                    0   387
## 10 age_pcr, vaccine_before_study                    0   387
## 11 age_pcr, vaccine_before_study                    0   387
## 12 age_pcr, vaccine_before_study                    0   387
## 13 age_pcr, vaccine_before_study                    0   387
## 14 age_pcr, vaccine_before_study                    0   387
## 15 age_pcr, vaccine_before_study                    0   387
## 16 age_pcr, vaccine_before_study                    0   387
## 17 age_pcr, vaccine_before_study                    0   387
## 18 age_pcr, vaccine_before_study                    0   387
## 19 age_pcr, vaccine_before_study                    0   387
## 20 age_pcr, vaccine_before_study                    0   387
## 21 age_pcr, vaccine_before_study                    0   387
## 22 age_pcr, vaccine_before_study                    0   387
## 23 age_pcr, vaccine_before_study                    0   387
## 24 age_pcr, vaccine_before_study                    0   387
## 25 age_pcr, vaccine_before_study                    0   387
## 26 age_pcr, vaccine_before_study                    0   387
## 27 age_pcr, vaccine_before_study                    0   387
## 28 age_pcr, vaccine_before_study                    0   387
## 29 age_pcr, vaccine_before_study                    0   387
## 30 age_pcr, vaccine_before_study                    0   387
## 31 age_pcr, vaccine_before_study                    0   387
## 32 age_pcr, vaccine_before_study                    0   387
## 33 age_pcr, vaccine_before_study                    0   387
## 34 age_pcr, vaccine_before_study                    0   387
## 35 age_pcr, vaccine_before_study                    0   387
## 36 age_pcr, vaccine_before_study                    0   387
## 37 age_pcr, vaccine_before_study                    0   387
##                                       Status p_value p_value_FDR Significant
## 1                                         OK   0.345       0.910          No
## 2                                         OK   0.624       0.910          No
## 3                                         OK   0.946       0.973          No
## 4                                         OK   0.559       0.910          No
## 5                                         OK   0.157       0.751          No
## 6                                         OK   0.238       0.893          No
## 7                                         OK   0.025       0.297          No
## 8                                         OK   0.709       0.910          No
## 9                                         OK   0.356       0.910          No
## 10                                        OK   0.600       0.910          No
## 11                                        OK   0.094       0.563          No
## 12                                        OK   0.396       0.910          No
## 13                                        OK   0.852       0.935          No
## 14                                        OK   0.822       0.935          No
## 15                                        OK   0.514       0.910          No
## 16                                        OK   0.747       0.910          No
## 17                                        OK   0.722       0.910          No
## 18                                        OK   0.935       0.973          No
## 19                                        OK   0.640       0.910          No
## 20                                        OK   0.690       0.910          No
## 21                                        OK   0.093       0.563          No
## 22                                        OK   0.857       0.935          No
## 23                                        OK   0.012       0.297          No
## 24                                        OK   0.984       0.984          No
## 25                                        OK   0.167       0.751          No
## 26                                        OK   0.705       0.910          No
## 27                                        OK   0.046       0.410          No
## 28                                        OK   0.248       0.893          No
## 29 Error during model/LRT: NA/NaN/Inf in 'x'    <NA>        <NA>          No
## 30                                        OK   0.462       0.910          No
## 31                                        OK   0.480       0.910          No
## 32                                        OK   0.759       0.910          No
## 33                                        OK   0.304       0.910          No
## 34                                        OK   0.021       0.297          No
## 35                                        OK   0.478       0.910          No
## 36                                        OK   0.274       0.897          No
## 37                                        OK   0.727       0.910          No
##    Gamma_Shift_Warning Convergence_Warning
## 1                FALSE               FALSE
## 2                FALSE               FALSE
## 3                FALSE               FALSE
## 4                FALSE               FALSE
## 5                FALSE               FALSE
## 6                FALSE               FALSE
## 7                FALSE               FALSE
## 8                FALSE               FALSE
## 9                FALSE               FALSE
## 10               FALSE               FALSE
## 11               FALSE               FALSE
## 12               FALSE               FALSE
## 13               FALSE               FALSE
## 14               FALSE               FALSE
## 15               FALSE               FALSE
## 16               FALSE               FALSE
## 17               FALSE               FALSE
## 18               FALSE               FALSE
## 19               FALSE               FALSE
## 20               FALSE               FALSE
## 21               FALSE               FALSE
## 22               FALSE               FALSE
## 23               FALSE               FALSE
## 24               FALSE               FALSE
## 25               FALSE               FALSE
## 26               FALSE               FALSE
## 27               FALSE               FALSE
## 28               FALSE               FALSE
## 29                TRUE               FALSE
## 30               FALSE               FALSE
## 31               FALSE               FALSE
## 32               FALSE               FALSE
## 33               FALSE               FALSE
## 34               FALSE               FALSE
## 35               FALSE               FALSE
## 36               FALSE               FALSE
## 37               FALSE               FALSE
# Run for Neuroimaging Variables
results_timing_neuro <- compare_groups_auto_v4(
  vars_to_test = neuroimaging_vars,
  group = "covid_before_vaccination",
  covariates = c("age_pcr", "vaccine_before_study"),
  data = pcr_positive_data
)
print(results_timing_neuro)
##                                                          Variable
## 1                                            right_accumbens_area
## 2                                             left_accumbens_area
## 3                                                  right_amygdala
## 4                                                   left_amygdala
## 5                                       right_cerebellum_exterior
## 6                                        left_cerebellum_exterior
## 7                                               right_hippocampus
## 8                                                left_hippocampus
## 9                                                   right_putamen
## 10                                                   left_putamen
## 11                                          right_thalamus_proper
## 12                                           left_thalamus_proper
## 13                                                   fornix_right
## 14                                                    fornix_left
## 15                        anterior_limb_of_internal_capsule_right
## 16                         anterior_limb_of_internal_capsule_left
## 17 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right
## 18  posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left
## 19                                                corpus_callosum
## 20                          right_a_cg_g_anterior_cingulate_gyrus
## 21                           left_a_cg_g_anterior_cingulate_gyrus
## 22                                    right_a_ins_anterior_insula
## 23                                     left_a_ins_anterior_insula
## 24                                       right_an_g_angular_gyrus
## 25                                        left_an_g_angular_gyrus
## 26                                               right_cun_cuneus
## 27                                                left_cun_cuneus
## 28                                      right_ent_entorhinal_area
## 29                                       left_ent_entorhinal_area
## 30                                        right_g_re_gyrus_rectus
## 31                                         left_g_re_gyrus_rectus
##                              Type               Covariates_Used
## 1  Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 2  Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 3  Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 4  Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 5  Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 6  Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 7  Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 8  Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 9  Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 10 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 11 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 12 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 13 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 14 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 15 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 16 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 17 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 18 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 19 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 20 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 21 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 22 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 23 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 24 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 25 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 26 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 27 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 28 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 29 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 30 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 31 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
##    Group_Ref_Level_Used n_obs Status p_value p_value_FDR Significant
## 1                     0   387     OK   0.431       0.937          No
## 2                     0   387     OK   0.573       0.937          No
## 3                     0   387     OK   0.052       0.810          No
## 4                     0   387     OK   0.316       0.823          No
## 5                     0   387     OK   0.691       0.937          No
## 6                     0   387     OK   0.295       0.823          No
## 7                     0   387     OK   0.142       0.810          No
## 8                     0   387     OK   0.157       0.810          No
## 9                     0   387     OK   0.987       0.987          No
## 10                    0   387     OK   0.877       0.937          No
## 11                    0   387     OK   0.791       0.937          No
## 12                    0   387     OK   0.695       0.937          No
## 13                    0   387     OK   0.148       0.810          No
## 14                    0   387     OK   0.770       0.937          No
## 15                    0   387     OK   0.978       0.987          No
## 16                    0   387     OK   0.691       0.937          No
## 17                    0   387     OK   0.848       0.937          No
## 18                    0   387     OK   0.796       0.937          No
## 19                    0   387     OK   0.845       0.937          No
## 20                    0   387     OK   0.295       0.823          No
## 21                    0   387     OK   0.255       0.823          No
## 22                    0   387     OK   0.319       0.823          No
## 23                    0   387     OK   0.487       0.937          No
## 24                    0   387     OK   0.027       0.810          No
## 25                    0   387     OK   0.087       0.810          No
## 26                    0   387     OK   0.591       0.937          No
## 27                    0   387     OK   0.527       0.937          No
## 28                    0   387     OK   0.650       0.937          No
## 29                    0   387     OK   0.239       0.823          No
## 30                    0   387     OK   0.694       0.937          No
## 31                    0   387     OK   0.439       0.937          No
##    Gamma_Shift_Warning Convergence_Warning
## 1                FALSE               FALSE
## 2                FALSE               FALSE
## 3                FALSE               FALSE
## 4                FALSE               FALSE
## 5                FALSE               FALSE
## 6                FALSE               FALSE
## 7                FALSE               FALSE
## 8                FALSE               FALSE
## 9                FALSE               FALSE
## 10               FALSE               FALSE
## 11               FALSE               FALSE
## 12               FALSE               FALSE
## 13               FALSE               FALSE
## 14               FALSE               FALSE
## 15               FALSE               FALSE
## 16               FALSE               FALSE
## 17               FALSE               FALSE
## 18               FALSE               FALSE
## 19               FALSE               FALSE
## 20               FALSE               FALSE
## 21               FALSE               FALSE
## 22               FALSE               FALSE
## 23               FALSE               FALSE
## 24               FALSE               FALSE
## 25               FALSE               FALSE
## 26               FALSE               FALSE
## 27               FALSE               FALSE
## 28               FALSE               FALSE
## 29               FALSE               FALSE
## 30               FALSE               FALSE
## 31               FALSE               FALSE

3. Analysis: Infection Timing (covid_before_vaccination) within PCR+ Group (N=387)

Cognitive Outcomes: Within the PCR-positive group, comparing participants infected before versus after vaccination (reference level 0), no detailed cognitive score differed significantly after adjusting for age (age_pcr) and pre-study vaccination status (vaccine_before_study) and applying FDR correction. Four variables were nominally significant (uncorrected p<0.05) but did not meet the FDR threshold: corsiinverso (p=0.025), x5dcounterr (p=0.012), x5dswitcherr (p=0.046), and bostonlat (p=0.021). The lowest FDR-adjusted p-value was 0.297. The analysis for dsomis failed.

Neuroimaging Outcomes: Similarly, no neuroimaging volume differed significantly between PCR-positive individuals infected before versus after vaccination after controlling for age, pre-study vaccination status, and multiple comparisons (all FDR p>=0.810). Only right_an_g_angular_gyrus was nominally significant (p=0.027); right_amygdala (p=0.052) and left_an_g_angular_gyrus (p=0.087) approached nominal significance.
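
The Type and Status columns ("Gaussian GLM used", "Error during model/LRT") suggest that each outcome is tested with a likelihood ratio test comparing a covariate-only model against one that adds the grouping variable. The internals of `compare_groups_auto_v4` are not shown here, so the following is only a minimal sketch of how such a test could be set up for one outcome. The data frame `toy` and the outcome `score` are simulated and hypothetical.

```r
# Simulated stand-in for pcr_positive_data; values are illustrative only.
set.seed(42)
n <- 300
toy <- data.frame(
  age_pcr = rnorm(n, mean = 45, sd = 12),
  vaccine_before_study = rbinom(n, 1, 0.5),
  covid_before_vaccination = factor(rbinom(n, 1, 0.5))
)
# The outcome depends on age only, so no true group effect is built in.
toy$score <- 50 - 0.2 * toy$age_pcr + rnorm(n, sd = 5)

# Reduced model: covariates only.
fit_reduced <- glm(score ~ age_pcr + vaccine_before_study,
                   family = gaussian(), data = toy)

# Full model: the same covariates plus the grouping variable.
fit_full <- update(fit_reduced, . ~ . + covid_before_vaccination)

# Likelihood ratio test for the contribution of the group term.
lrt <- anova(fit_reduced, fit_full, test = "LRT")
p_group <- lrt[["Pr(>Chi)"]][2]
print(p_group)
```

Repeating this for every outcome and passing the resulting p-values to a multiple-comparison correction would give a table with the same shape as the ones above.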

— Analysis 4: Grouping by Specific Key Symptoms (Binary) within PCR+ group —

# Goal: Is presence of specific symptoms linked to cognitive or brain changes?

# Example for Smell Loss:
results_smell_cog <- compare_groups_auto_v4(
  vars_to_test = cognitive_vars_detailed,
  group = "smell_lost", # Assumes this is 0/1 coded
  covariates = c("age_pcr", "risk_hospital_icu"), # Control for age and severity
  data = pcr_positive_data
)
print(results_smell_cog)
##            Variable                           Type            Covariates_Used
## 1    listaprimerrec Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 2  listaaprendizaje Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 3           listacp Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 4           listalp Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 5        listarecon Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 6      corsidirecto Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 7      corsiinverso Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 8       cactusvivos Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 9      cactusinanim Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 10      otverbaltpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 11      otverbalerr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 12      otvisualtpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 13      otvisualerr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 14      otmentaltpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 15      otmentalerr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 16     otvismenttpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 17     otvismenterr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 18      otswitchtpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 19      otswitcherr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 20       x5dreadtpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 21       x5dreaderr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 22      x5dcounttpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 23      x5dcounterr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 24        x5dfoctpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 25        x5dfocerr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 26     x5dswitchtpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 27     x5dswitcherr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 28           dscorr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 29           dsomis                     Continuous age_pcr, risk_hospital_icu
## 30          dscomis Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 31         torremov Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 32         torretpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 33         bostonsc Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 34        bostonlat Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 35     bostonsemerr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 36     bostonfonerr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 37         fluencia Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
##    Group_Ref_Level_Used n_obs                                    Status p_value
## 1                     0   387                                        OK   0.405
## 2                     0   387                                        OK   0.522
## 3                     0   387                                        OK   0.680
## 4                     0   387                                        OK   0.333
## 5                     0   387                                        OK   0.316
## 6                     0   387                                        OK   0.769
## 7                     0   387                                        OK   0.174
## 8                     0   387                                        OK   0.993
## 9                     0   387                                        OK   0.394
## 10                    0   387                                        OK   0.513
## 11                    0   387                                        OK   0.604
## 12                    0   387                                        OK   0.829
## 13                    0   387                                        OK   0.815
## 14                    0   387                                        OK   0.764
## 15                    0   387                                        OK   0.849
## 16                    0   387                                        OK   0.911
## 17                    0   387                                        OK   0.694
## 18                    0   387                                        OK   0.882
## 19                    0   387                                        OK   0.742
## 20                    0   387                                        OK   0.804
## 21                    0   387                                        OK   0.180
## 22                    0   387                                        OK   0.385
## 23                    0   387                                        OK   0.361
## 24                    0   387                                        OK   0.582
## 25                    0   387                                        OK   0.029
## 26                    0   387                                        OK   0.371
## 27                    0   387                                        OK   0.446
## 28                    0   387                                        OK   0.954
## 29                    0   387 Error during model/LRT: NA/NaN/Inf in 'x'    <NA>
## 30                    0   387                                        OK   0.958
## 31                    0   387                                        OK   0.887
## 32                    0   387                                        OK   0.820
## 33                    0   387                                        OK   0.482
## 34                    0   387                                        OK   0.513
## 35                    0   387                                        OK   0.592
## 36                    0   387                                        OK   0.850
## 37                    0   387                                        OK   0.716
##    p_value_FDR Significant Gamma_Shift_Warning Convergence_Warning
## 1        0.985          No               FALSE               FALSE
## 2        0.985          No               FALSE               FALSE
## 3        0.985          No               FALSE               FALSE
## 4        0.985          No               FALSE               FALSE
## 5        0.985          No               FALSE               FALSE
## 6        0.985          No               FALSE               FALSE
## 7        0.985          No               FALSE               FALSE
## 8        0.993          No               FALSE               FALSE
## 9        0.985          No               FALSE               FALSE
## 10       0.985          No               FALSE               FALSE
## 11       0.985          No               FALSE               FALSE
## 12       0.985          No               FALSE               FALSE
## 13       0.985          No               FALSE               FALSE
## 14       0.985          No               FALSE               FALSE
## 15       0.985          No               FALSE               FALSE
## 16       0.985          No               FALSE               FALSE
## 17       0.985          No               FALSE               FALSE
## 18       0.985          No               FALSE               FALSE
## 19       0.985          No               FALSE               FALSE
## 20       0.985          No               FALSE               FALSE
## 21       0.985          No               FALSE               FALSE
## 22       0.985          No               FALSE               FALSE
## 23       0.985          No               FALSE               FALSE
## 24       0.985          No               FALSE               FALSE
## 25       0.985          No               FALSE               FALSE
## 26       0.985          No               FALSE               FALSE
## 27       0.985          No               FALSE               FALSE
## 28       0.985          No               FALSE               FALSE
## 29        <NA>          No                TRUE               FALSE
## 30       0.985          No               FALSE               FALSE
## 31       0.985          No               FALSE               FALSE
## 32       0.985          No               FALSE               FALSE
## 33       0.985          No               FALSE               FALSE
## 34       0.985          No               FALSE               FALSE
## 35       0.985          No               FALSE               FALSE
## 36       0.985          No               FALSE               FALSE
## 37       0.985          No               FALSE               FALSE
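
Row 29 (dsomis) reports a caught error in the Status column instead of a p-value. The sketch below shows one way such a status string could be produced, by wrapping the model fit in tryCatch so that a failure in one variable does not stop the loop over the others. The helper name safe_fit is hypothetical and the data are a toy example with a non-finite predictor; the actual cause of the dsomis failure is not established here.

```r
# Hypothetical helper: fit a Gaussian GLM and return a status string
# instead of stopping when the fit raises an error.
safe_fit <- function(formula, data) {
  tryCatch({
    fit <- glm(formula, family = gaussian(), data = data)
    list(status = "OK", fit = fit)
  }, error = function(e) {
    list(status = paste("Error during model/LRT:", conditionMessage(e)),
         fit = NULL)
  })
}

# Toy data: one predictor value is infinite, which the fitting routine rejects.
bad  <- data.frame(y = c(1.2, 0.7, 3.1, 2.4), x = c(1, 2, Inf, 4))
good <- data.frame(y = c(1.2, 0.7, 3.1, 2.4), x = c(1, 2, 3, 4))

res_bad  <- safe_fit(y ~ x, bad)
res_good <- safe_fit(y ~ x, good)
print(res_bad$status)
print(res_good$status)

# A quick check for non-finite values in the columns that enter a model.
sapply(bad, function(col) sum(!is.finite(col)))
```

Checking the outcome and any transformed version of it for non-finite values is a reasonable first diagnostic when this kind of error appears.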
results_smell_neuro <- compare_groups_auto_v4(
  vars_to_test = neuroimaging_vars,
  group = "smell_lost",
  covariates = c("age_pcr", "risk_hospital_icu"),
  data = pcr_positive_data
)
print(results_smell_neuro)
##                                                          Variable
## 1                                            right_accumbens_area
## 2                                             left_accumbens_area
## 3                                                  right_amygdala
## 4                                                   left_amygdala
## 5                                       right_cerebellum_exterior
## 6                                        left_cerebellum_exterior
## 7                                               right_hippocampus
## 8                                                left_hippocampus
## 9                                                   right_putamen
## 10                                                   left_putamen
## 11                                          right_thalamus_proper
## 12                                           left_thalamus_proper
## 13                                                   fornix_right
## 14                                                    fornix_left
## 15                        anterior_limb_of_internal_capsule_right
## 16                         anterior_limb_of_internal_capsule_left
## 17 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right
## 18  posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left
## 19                                                corpus_callosum
## 20                          right_a_cg_g_anterior_cingulate_gyrus
## 21                           left_a_cg_g_anterior_cingulate_gyrus
## 22                                    right_a_ins_anterior_insula
## 23                                     left_a_ins_anterior_insula
## 24                                       right_an_g_angular_gyrus
## 25                                        left_an_g_angular_gyrus
## 26                                               right_cun_cuneus
## 27                                                left_cun_cuneus
## 28                                      right_ent_entorhinal_area
## 29                                       left_ent_entorhinal_area
## 30                                        right_g_re_gyrus_rectus
## 31                                         left_g_re_gyrus_rectus
##                              Type            Covariates_Used
## 1  Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 2  Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 3  Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 4  Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 5  Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 6  Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 7  Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 8  Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 9  Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 10 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 11 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 12 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 13 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 14 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 15 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 16 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 17 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 18 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 19 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 20 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 21 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 22 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 23 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 24 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 25 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 26 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 27 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 28 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 29 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 30 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 31 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
##    Group_Ref_Level_Used n_obs Status p_value p_value_FDR Significant
## 1                     0   387     OK   0.221       0.554          No
## 2                     0   387     OK   0.352       0.554          No
## 3                     0   387     OK   0.610       0.700          No
## 4                     0   387     OK   0.987       0.998          No
## 5                     0   387     OK   0.451       0.607          No
## 6                     0   387     OK   0.250       0.554          No
## 7                     0   387     OK   0.998       0.998          No
## 8                     0   387     OK   0.374       0.554          No
## 9                     0   387     OK   0.025       0.380          No
## 10                    0   387     OK   0.038       0.397          No
## 11                    0   387     OK   0.264       0.554          No
## 12                    0   387     OK   0.363       0.554          No
## 13                    0   387     OK   0.176       0.545          No
## 14                    0   387     OK   0.470       0.607          No
## 15                    0   387     OK   0.752       0.803          No
## 16                    0   387     OK   0.535       0.638          No
## 17                    0   387     OK   0.147       0.538          No
## 18                    0   387     OK   0.349       0.554          No
## 19                    0   387     OK   0.639       0.708          No
## 20                    0   387     OK   0.393       0.554          No
## 21                    0   387     OK   0.105       0.538          No
## 22                    0   387     OK   0.132       0.538          No
## 23                    0   387     OK   0.299       0.554          No
## 24                    0   387     OK   0.128       0.538          No
## 25                    0   387     OK   0.383       0.554          No
## 26                    0   387     OK   0.531       0.638          No
## 27                    0   387     OK   0.354       0.554          No
## 28                    0   387     OK   0.259       0.554          No
## 29                    0   387     OK   0.156       0.538          No
## 30                    0   387     OK   0.008       0.257          No
## 31                    0   387     OK   0.094       0.538          No
##    Gamma_Shift_Warning Convergence_Warning
## 1                FALSE               FALSE
## 2                FALSE               FALSE
## 3                FALSE               FALSE
## 4                FALSE               FALSE
## 5                FALSE               FALSE
## 6                FALSE               FALSE
## 7                FALSE               FALSE
## 8                FALSE               FALSE
## 9                FALSE               FALSE
## 10               FALSE               FALSE
## 11               FALSE               FALSE
## 12               FALSE               FALSE
## 13               FALSE               FALSE
## 14               FALSE               FALSE
## 15               FALSE               FALSE
## 16               FALSE               FALSE
## 17               FALSE               FALSE
## 18               FALSE               FALSE
## 19               FALSE               FALSE
## 20               FALSE               FALSE
## 21               FALSE               FALSE
## 22               FALSE               FALSE
## 23               FALSE               FALSE
## 24               FALSE               FALSE
## 25               FALSE               FALSE
## 26               FALSE               FALSE
## 27               FALSE               FALSE
## 28               FALSE               FALSE
## 29               FALSE               FALSE
## 30               FALSE               FALSE
## 31               FALSE               FALSE
# The same structure can be repeated for other binary symptoms, for example:
# group = "taste_lost"
# group = "breath_dif"
# etc.
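
Repeating the comparison for several symptoms can be written as a loop rather than copied chunks. The following is a minimal sketch under stated assumptions: compare_stub is a simplified, hypothetical stand-in for compare_groups_auto_v4 (whose definition is not shown in this section), and the data are simulated. With the real function and pcr_positive_data available, only the function name, variable list, and data argument would change.

```r
# Hypothetical stand-in: one covariate-adjusted LRT per outcome, then BH correction.
compare_stub <- function(vars_to_test, group, covariates, data) {
  p <- vapply(vars_to_test, function(v) {
    reduced <- glm(reformulate(covariates, response = v), data = data)
    full    <- glm(reformulate(c(covariates, group), response = v), data = data)
    anova(reduced, full, test = "LRT")[["Pr(>Chi)"]][2]
  }, numeric(1))
  data.frame(Variable = vars_to_test, p_value = p,
             p_value_FDR = p.adjust(p, method = "BH"), row.names = NULL)
}

# Simulated data: names mirror the report, values are random.
set.seed(2)
n <- 150
toy <- data.frame(
  age_pcr = runif(n, 20, 70),
  risk_hospital_icu = rbinom(n, 1, 0.2),
  smell_lost = rbinom(n, 1, 0.5),
  taste_lost = rbinom(n, 1, 0.5),
  breath_dif = rbinom(n, 1, 0.3),
  fluencia = rnorm(n),
  bostonsc = rnorm(n)
)

# One result table per symptom, stored in a named list.
symptoms <- c("smell_lost", "taste_lost", "breath_dif")
results_by_symptom <- lapply(setNames(symptoms, symptoms), function(s) {
  compare_stub(vars_to_test = c("fluencia", "bostonsc"), group = s,
               covariates = c("age_pcr", "risk_hospital_icu"), data = toy)
})
print(results_by_symptom$taste_lost)
```

Note that the FDR correction here is applied within each symptom, matching how each table in this report is corrected separately; testing many symptoms adds a further layer of multiplicity that this sketch does not address.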

4. Analysis: Specific Symptom (smell_lost - Example) within PCR+ Group (N=387)

Cognitive Outcomes: Comparing PCR-positive individuals with and without reported smell loss (reference level 0), while controlling for age (age_pcr) and COVID severity (risk_hospital_icu), no significant differences were found for any detailed cognitive score after FDR correction. Only x5dfocerr reached nominal significance (p=0.029), and it was not significant after correction (FDR p=0.985). The analysis for dsomis failed with a model-fitting error, so no p-value is available for that variable.

Neuroimaging Outcomes: After adjusting for age and COVID severity and applying FDR correction, no significant differences in neuroimaging volumes were observed between PCR-positive participants with and without smell loss. Three regions showed nominal significance (right_g_re_gyrus_rectus, p=0.008; right_putamen, p=0.025; left_putamen, p=0.038) but did not survive multiple-comparison correction (FDR p=0.257 to 0.397).
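
The jump from p=0.029 to an FDR-adjusted 0.985 for x5dfocerr can be checked by hand. The sketch below applies the Benjamini-Hochberg adjustment to the rounded cognitive p-values printed in the smell_lost table. That the function uses Benjamini-Hochberg is an assumption, although the printed p_value_FDR column is consistent with it.

```r
# Raw p-values copied (rounded to 3 decimals) from the smell_lost cognitive table;
# row 29 (dsomis) is NA because its model failed.
p_raw <- c(0.405, 0.522, 0.680, 0.333, 0.316, 0.769, 0.174, 0.993, 0.394, 0.513,
           0.604, 0.829, 0.815, 0.764, 0.849, 0.911, 0.694, 0.882, 0.742, 0.804,
           0.180, 0.385, 0.361, 0.582, 0.029, 0.371, 0.446, 0.954, NA,    0.958,
           0.887, 0.820, 0.482, 0.513, 0.592, 0.850, 0.716)

# Benjamini-Hochberg adjustment; NA entries are skipped and stay NA.
p_fdr <- p.adjust(p_raw, method = "BH")

# Rows 8 (cactusvivos), 25 (x5dfocerr), and 29 (dsomis).
print(round(p_fdr[c(8, 25, 29)], 3))
```

With 36 valid tests, the smallest p-value is scaled by 36/1, which exceeds 1, and the adjustment then caps each value at the smallest adjusted value among the larger p-values. Because almost every raw p-value in this table is large, nearly all adjusted values collapse to the same number, approximately 0.985.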

— Analysis 5: Grouping by COVID Variant (Grouped) within PCR+ group —

# Goal: Do different major variants associate with specific outcomes?

# First, create the grouped variant variable (ensure dplyr is loaded)
library(dplyr)
pcr_positive_data <- pcr_positive_data %>%
  mutate(covid_variant_grouped = case_when(
    covid_variant %in% c(0, 1, 2, 3) ~ as.character(covid_variant), # Keep major ones separate
    TRUE ~ "Other_Rare" # Group others
  )) %>%
  mutate(covid_variant_grouped = factor(covid_variant_grouped)) # Make it a factor

print(table(pcr_positive_data$covid_variant_grouped))
## 
##          0          1          2          3 Other_Rare 
##          5        222         94         59          7
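
One detail of the grouping step worth checking is how missing variant codes are handled. The sketch below is a base-R equivalent of the case_when logic on a small made-up vector (not study data). It illustrates that a missing code is not in the set 0 to 3, so it falls through to "Other_Rare" rather than staying NA; whether pcr_positive_data contains any missing variant codes is not known from this output.

```r
# Made-up variant codes, including two rare codes and one missing value.
covid_variant <- c(0, 1, 2, 3, 4, 7, NA)

# Base-R equivalent of the grouping: keep codes 0-3, pool everything else.
# `%in%` returns FALSE for NA, so the NA entry is also pooled.
grouped <- ifelse(covid_variant %in% c(0, 1, 2, 3),
                  as.character(covid_variant), "Other_Rare")
grouped <- factor(grouped)

print(table(grouped, useNA = "ifany"))

# Counting missing codes before grouping shows how many would be absorbed.
n_missing <- sum(is.na(covid_variant))
print(n_missing)
```

If missing codes should remain missing, they would need an explicit branch ahead of the catch-all. The group sizes also matter for interpretation: level 0 has 5 participants and Other_Rare has 7, so estimates for those levels will be imprecise.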
# Run for Cognitive Variables
results_variant_cog <- compare_groups_auto_v4(
  vars_to_test = cognitive_vars_detailed,
  group = "covid_variant_grouped",
  covariates = c("age_pcr", "vaccine_before_study"),
  data = pcr_positive_data
)
print(results_variant_cog)
##            Variable                           Type
## 1    listaprimerrec Continuous (Gaussian GLM used)
## 2  listaaprendizaje Continuous (Gaussian GLM used)
## 3           listacp Continuous (Gaussian GLM used)
## 4           listalp Continuous (Gaussian GLM used)
## 5        listarecon Continuous (Gaussian GLM used)
## 6      corsidirecto Continuous (Gaussian GLM used)
## 7      corsiinverso Continuous (Gaussian GLM used)
## 8       cactusvivos Continuous (Gaussian GLM used)
## 9      cactusinanim Continuous (Gaussian GLM used)
## 10      otverbaltpo Continuous (Gaussian GLM used)
## 11      otverbalerr Continuous (Gaussian GLM used)
## 12      otvisualtpo Continuous (Gaussian GLM used)
## 13      otvisualerr Continuous (Gaussian GLM used)
## 14      otmentaltpo Continuous (Gaussian GLM used)
## 15      otmentalerr Continuous (Gaussian GLM used)
## 16     otvismenttpo Continuous (Gaussian GLM used)
## 17     otvismenterr Continuous (Gaussian GLM used)
## 18      otswitchtpo Continuous (Gaussian GLM used)
## 19      otswitcherr Continuous (Gaussian GLM used)
## 20       x5dreadtpo Continuous (Gaussian GLM used)
## 21       x5dreaderr Continuous (Gaussian GLM used)
## 22      x5dcounttpo Continuous (Gaussian GLM used)
## 23      x5dcounterr Continuous (Gaussian GLM used)
## 24        x5dfoctpo Continuous (Gaussian GLM used)
## 25        x5dfocerr Continuous (Gaussian GLM used)
## 26     x5dswitchtpo Continuous (Gaussian GLM used)
## 27     x5dswitcherr Continuous (Gaussian GLM used)
## 28           dscorr Continuous (Gaussian GLM used)
## 29           dsomis                     Continuous
## 30          dscomis Continuous (Gaussian GLM used)
## 31         torremov Continuous (Gaussian GLM used)
## 32         torretpo Continuous (Gaussian GLM used)
## 33         bostonsc Continuous (Gaussian GLM used)
## 34        bostonlat Continuous (Gaussian GLM used)
## 35     bostonsemerr Continuous (Gaussian GLM used)
## 36     bostonfonerr Continuous (Gaussian GLM used)
## 37         fluencia Continuous (Gaussian GLM used)
##                  Covariates_Used Group_Ref_Level_Used n_obs
## 1  age_pcr, vaccine_before_study                    0   387
## 2  age_pcr, vaccine_before_study                    0   387
## 3  age_pcr, vaccine_before_study                    0   387
## 4  age_pcr, vaccine_before_study                    0   387
## 5  age_pcr, vaccine_before_study                    0   387
## 6  age_pcr, vaccine_before_study                    0   387
## 7  age_pcr, vaccine_before_study                    0   387
## 8  age_pcr, vaccine_before_study                    0   387
## 9  age_pcr, vaccine_before_study                    0   387
## 10 age_pcr, vaccine_before_study                    0   387
## 11 age_pcr, vaccine_before_study                    0   387
## 12 age_pcr, vaccine_before_study                    0   387
## 13 age_pcr, vaccine_before_study                    0   387
## 14 age_pcr, vaccine_before_study                    0   387
## 15 age_pcr, vaccine_before_study                    0   387
## 16 age_pcr, vaccine_before_study                    0   387
## 17 age_pcr, vaccine_before_study                    0   387
## 18 age_pcr, vaccine_before_study                    0   387
## 19 age_pcr, vaccine_before_study                    0   387
## 20 age_pcr, vaccine_before_study                    0   387
## 21 age_pcr, vaccine_before_study                    0   387
## 22 age_pcr, vaccine_before_study                    0   387
## 23 age_pcr, vaccine_before_study                    0   387
## 24 age_pcr, vaccine_before_study                    0   387
## 25 age_pcr, vaccine_before_study                    0   387
## 26 age_pcr, vaccine_before_study                    0   387
## 27 age_pcr, vaccine_before_study                    0   387
## 28 age_pcr, vaccine_before_study                    0   387
## 29 age_pcr, vaccine_before_study                    0   387
## 30 age_pcr, vaccine_before_study                    0   387
## 31 age_pcr, vaccine_before_study                    0   387
## 32 age_pcr, vaccine_before_study                    0   387
## 33 age_pcr, vaccine_before_study                    0   387
## 34 age_pcr, vaccine_before_study                    0   387
## 35 age_pcr, vaccine_before_study                    0   387
## 36 age_pcr, vaccine_before_study                    0   387
## 37 age_pcr, vaccine_before_study                    0   387
##                                       Status p_value p_value_FDR Significant
## 1                                         OK   0.003       0.067          No
## 2                                         OK   0.153       0.554          No
## 3                                         OK   0.512       0.943          No
## 4                                         OK   0.136       0.554          No
## 5                                         OK   0.123       0.554          No
## 6                                         OK   0.587       0.943          No
## 7                                         OK   0.127       0.554          No
## 8                                         OK   0.358       0.920          No
## 9                                         OK   0.679       0.943          No
## 10                                        OK   0.908       0.951          No
## 11                                        OK   0.589       0.943          No
## 12                                        OK   0.812       0.943          No
## 13                                        OK   0.697       0.943          No
## 14                                        OK   0.907       0.951          No
## 15                                        OK   0.727       0.943          No
## 16                                        OK   0.970       0.970          No
## 17                                        OK   0.925       0.951          No
## 18                                        OK   0.764       0.943          No
## 19                                        OK   0.802       0.943          No
## 20                                        OK   0.649       0.943          No
## 21                                        OK   0.037       0.336          No
## 22                                        OK   0.925       0.951          No
## 23                                        OK   0.008       0.101          No
## 24                                        OK   0.573       0.943          No
## 25                                        OK   0.154       0.554          No
## 26                                        OK   0.501       0.943          No
## 27                                        OK   0.345       0.920          No
## 28                                        OK   0.205       0.671          No
## 29 Error during model/LRT: NA/NaN/Inf in 'x'    <NA>        <NA>          No
## 30                                        OK   0.650       0.943          No
## 31                                        OK   0.763       0.943          No
## 32                                        OK   0.699       0.943          No
## 33                                        OK   0.338       0.920          No
## 34                                        OK   0.547       0.943          No
## 35                                        OK   0.797       0.943          No
## 36                                        OK   0.061       0.439          No
## 37                                        OK   0.004       0.067          No
##    Gamma_Shift_Warning Convergence_Warning
## 1                FALSE               FALSE
## 2                FALSE               FALSE
## 3                FALSE               FALSE
## 4                FALSE               FALSE
## 5                FALSE               FALSE
## 6                FALSE               FALSE
## 7                FALSE               FALSE
## 8                FALSE               FALSE
## 9                FALSE               FALSE
## 10               FALSE               FALSE
## 11               FALSE               FALSE
## 12               FALSE               FALSE
## 13               FALSE               FALSE
## 14               FALSE               FALSE
## 15               FALSE               FALSE
## 16               FALSE               FALSE
## 17               FALSE               FALSE
## 18               FALSE               FALSE
## 19               FALSE               FALSE
## 20               FALSE               FALSE
## 21               FALSE               FALSE
## 22               FALSE               FALSE
## 23               FALSE               FALSE
## 24               FALSE               FALSE
## 25               FALSE               FALSE
## 26               FALSE               FALSE
## 27               FALSE               FALSE
## 28               FALSE               FALSE
## 29                TRUE               FALSE
## 30               FALSE               FALSE
## 31               FALSE               FALSE
## 32               FALSE               FALSE
## 33               FALSE               FALSE
## 34               FALSE               FALSE
## 35               FALSE               FALSE
## 36               FALSE               FALSE
## 37               FALSE               FALSE
# Run for Neuroimaging Variables
results_variant_neuro <- compare_groups_auto_v4(
  vars_to_test = neuroimaging_vars,
  group = "covid_variant_grouped",
  covariates = c("age_pcr", "vaccine_before_study"),
  data = pcr_positive_data
)
print(results_variant_neuro)
##                                                          Variable
## 1                                            right_accumbens_area
## 2                                             left_accumbens_area
## 3                                                  right_amygdala
## 4                                                   left_amygdala
## 5                                       right_cerebellum_exterior
## 6                                        left_cerebellum_exterior
## 7                                               right_hippocampus
## 8                                                left_hippocampus
## 9                                                   right_putamen
## 10                                                   left_putamen
## 11                                          right_thalamus_proper
## 12                                           left_thalamus_proper
## 13                                                   fornix_right
## 14                                                    fornix_left
## 15                        anterior_limb_of_internal_capsule_right
## 16                         anterior_limb_of_internal_capsule_left
## 17 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right
## 18  posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left
## 19                                                corpus_callosum
## 20                          right_a_cg_g_anterior_cingulate_gyrus
## 21                           left_a_cg_g_anterior_cingulate_gyrus
## 22                                    right_a_ins_anterior_insula
## 23                                     left_a_ins_anterior_insula
## 24                                       right_an_g_angular_gyrus
## 25                                        left_an_g_angular_gyrus
## 26                                               right_cun_cuneus
## 27                                                left_cun_cuneus
## 28                                      right_ent_entorhinal_area
## 29                                       left_ent_entorhinal_area
## 30                                        right_g_re_gyrus_rectus
## 31                                         left_g_re_gyrus_rectus
##                              Type               Covariates_Used
## 1  Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 2  Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 3  Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 4  Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 5  Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 6  Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 7  Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 8  Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 9  Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 10 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 11 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 12 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 13 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 14 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 15 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 16 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 17 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 18 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 19 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 20 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 21 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 22 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 23 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 24 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 25 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 26 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 27 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 28 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 29 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 30 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 31 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
##    Group_Ref_Level_Used n_obs Status p_value p_value_FDR Significant
## 1                     0   387     OK   0.664       0.879          No
## 2                     0   387     OK   0.552       0.778          No
## 3                     0   387     OK   0.015       0.457          No
## 4                     0   387     OK   0.517       0.778          No
## 5                     0   387     OK   0.442       0.778          No
## 6                     0   387     OK   0.176       0.654          No
## 7                     0   387     OK   0.190       0.654          No
## 8                     0   387     OK   0.832       0.921          No
## 9                     0   387     OK   0.780       0.896          No
## 10                    0   387     OK   0.737       0.879          No
## 11                    0   387     OK   0.314       0.778          No
## 12                    0   387     OK   0.421       0.778          No
## 13                    0   387     OK   0.147       0.654          No
## 14                    0   387     OK   0.461       0.778          No
## 15                    0   387     OK   0.732       0.879          No
## 16                    0   387     OK   0.685       0.879          No
## 17                    0   387     OK   0.400       0.778          No
## 18                    0   387     OK   0.877       0.925          No
## 19                    0   387     OK   0.281       0.778          No
## 20                    0   387     OK   0.487       0.778          No
## 21                    0   387     OK   0.142       0.654          No
## 22                    0   387     OK   0.127       0.654          No
## 23                    0   387     OK   0.103       0.654          No
## 24                    0   387     OK   0.094       0.654          No
## 25                    0   387     OK   0.225       0.698          No
## 26                    0   387     OK   0.327       0.778          No
## 27                    0   387     OK   0.970       0.970          No
## 28                    0   387     OK   0.156       0.654          No
## 29                    0   387     OK   0.895       0.925          No
## 30                    0   387     OK   0.443       0.778          No
## 31                    0   387     OK   0.527       0.778          No
##    Gamma_Shift_Warning Convergence_Warning
## 1                FALSE               FALSE
## 2                FALSE               FALSE
## 3                FALSE               FALSE
## 4                FALSE               FALSE
## 5                FALSE               FALSE
## 6                FALSE               FALSE
## 7                FALSE               FALSE
## 8                FALSE               FALSE
## 9                FALSE               FALSE
## 10               FALSE               FALSE
## 11               FALSE               FALSE
## 12               FALSE               FALSE
## 13               FALSE               FALSE
## 14               FALSE               FALSE
## 15               FALSE               FALSE
## 16               FALSE               FALSE
## 17               FALSE               FALSE
## 18               FALSE               FALSE
## 19               FALSE               FALSE
## 20               FALSE               FALSE
## 21               FALSE               FALSE
## 22               FALSE               FALSE
## 23               FALSE               FALSE
## 24               FALSE               FALSE
## 25               FALSE               FALSE
## 26               FALSE               FALSE
## 27               FALSE               FALSE
## 28               FALSE               FALSE
## 29               FALSE               FALSE
## 30               FALSE               FALSE
## 31               FALSE               FALSE

5. Analysis: Grouped COVID Variant (covid_variant_grouped) within PCR+ Group (N=387)

Cognitive Outcomes: Within the PCR-positive group, grouped COVID-19 variants were compared (reference level 0, including an “Other_Rare” category) while controlling for age (age_pcr) and pre-study vaccination status (vaccine_before_study). No detailed cognitive score differed significantly across variant groups after FDR correction. listaprimerrec, x5dreaderr, x5dcounterr, fluencia, and bostonfonerr were nominally significant (p<0.05) or borderline (p=0.061), but none met the FDR threshold. The analysis for dsomis failed.

Neuroimaging Outcomes: After adjusting for age and vaccination status and applying FDR correction, no significant differences in neuroimaging volumes were detected across the grouped variant categories within the PCR-positive sample. right_amygdala showed nominal significance, whereas right_an_g_angular_gyrus showed only a trend (p=0.094); neither survived correction (all FDR p≥0.457).

— Analysis 6: Exploring Cognitive Domains Separately —

# Goal: Focus comparisons on specific types of cognitive measures.

# Define domain-specific variable lists (examples - adjust based on your expertise)
processing_speed_vars <- c("otverbaltpo", "otvisualtpo", "otmentaltpo", "otvismenttpo",
                           "otswitchtpo", "x5dreadtpo", "x5dcounttpo", "x5dfoctpo",
                           "x5dswitchtpo", "torretpo", "bostonlat")
accuracy_error_vars <- c("otverbalerr", "otvisualerr", "otmentalerr", "otvismenterr",
                         "otswitcherr", "x5dreaderr", "x5dcounterr", "x5dfocerr",
                         "x5dswitcherr", "dsomis", "dscomis", "bostonsemerr", "bostonfonerr")
memory_learning_vars <- c("listaprimerrec", "listaaprendizaje", "listacp", "listalp",
                          "listarecon", "corsidirecto", "corsiinverso", "dscorr")
executive_naming_vars <- c("fluencia", "bostonsc", "torremov") # Add others like otswitchtpo/err if desired

# Example: Comparing Processing Speed by PCR Status
results_pcr_speed <- compare_groups_auto_v4(
  vars_to_test = processing_speed_vars,
  group = "pcr",
  covariates = c("age_pcr"),
  data = imputed_data
)
print(results_pcr_speed)
##        Variable                           Type Covariates_Used
## 1   otverbaltpo Continuous (Gaussian GLM used)         age_pcr
## 2   otvisualtpo Continuous (Gaussian GLM used)         age_pcr
## 3   otmentaltpo Continuous (Gaussian GLM used)         age_pcr
## 4  otvismenttpo Continuous (Gaussian GLM used)         age_pcr
## 5   otswitchtpo Continuous (Gaussian GLM used)         age_pcr
## 6    x5dreadtpo Continuous (Gaussian GLM used)         age_pcr
## 7   x5dcounttpo Continuous (Gaussian GLM used)         age_pcr
## 8     x5dfoctpo Continuous (Gaussian GLM used)         age_pcr
## 9  x5dswitchtpo Continuous (Gaussian GLM used)         age_pcr
## 10     torretpo Continuous (Gaussian GLM used)         age_pcr
## 11    bostonlat Continuous (Gaussian GLM used)         age_pcr
##    Group_Ref_Level_Used n_obs Status p_value p_value_FDR Significant
## 1              NEGATIVA   463     OK   0.230       0.421          No
## 2              NEGATIVA   463     OK   0.994       0.994          No
## 3              NEGATIVA   463     OK   0.143       0.339          No
## 4              NEGATIVA   463     OK   0.345       0.447          No
## 5              NEGATIVA   463     OK   0.366       0.447          No
## 6              NEGATIVA   463     OK   0.154       0.339          No
## 7              NEGATIVA   463     OK   0.142       0.339          No
## 8              NEGATIVA   463     OK   0.344       0.447          No
## 9              NEGATIVA   463     OK   0.910       0.994          No
## 10             NEGATIVA   463     OK   0.007       0.078          No
## 11             NEGATIVA   463     OK   0.064       0.339          No
##    Gamma_Shift_Warning Convergence_Warning
## 1                FALSE               FALSE
## 2                FALSE               FALSE
## 3                FALSE               FALSE
## 4                FALSE               FALSE
## 5                FALSE               FALSE
## 6                FALSE               FALSE
## 7                FALSE               FALSE
## 8                FALSE               FALSE
## 9                FALSE               FALSE
## 10               FALSE               FALSE
## 11               FALSE               FALSE
# Example: Comparing Memory/Learning by Smell Loss (within PCR+)
results_smell_memory <- compare_groups_auto_v4(
  vars_to_test = memory_learning_vars,
  group = "smell_lost",
  covariates = c("age_pcr", "risk_hospital_icu"),
  data = pcr_positive_data
)
print(results_smell_memory)
##           Variable                           Type            Covariates_Used
## 1   listaprimerrec Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 2 listaaprendizaje Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 3          listacp Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 4          listalp Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 5       listarecon Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 6     corsidirecto Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 7     corsiinverso Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 8           dscorr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
##   Group_Ref_Level_Used n_obs Status p_value p_value_FDR Significant
## 1                    0   387     OK   0.405       0.810          No
## 2                    0   387     OK   0.522       0.835          No
## 3                    0   387     OK   0.680       0.878          No
## 4                    0   387     OK   0.333       0.810          No
## 5                    0   387     OK   0.316       0.810          No
## 6                    0   387     OK   0.769       0.878          No
## 7                    0   387     OK   0.174       0.810          No
## 8                    0   387     OK   0.954       0.954          No
##   Gamma_Shift_Warning Convergence_Warning
## 1               FALSE               FALSE
## 2               FALSE               FALSE
## 3               FALSE               FALSE
## 4               FALSE               FALSE
## 5               FALSE               FALSE
## 6               FALSE               FALSE
## 7               FALSE               FALSE
## 8               FALSE               FALSE
# *** Repeat the above structure for other domains and other grouping variables of interest ***
# e.g., Accuracy vs PCR, Executive vs Severity (grouped), Memory vs Variant (grouped) etc.

6. Analysis: Cognitive Domains

Processing Speed vs. PCR Status (N=463): Comparing PCR-positive and PCR-negative groups (reference: NEGATIVA) on processing speed variables, while controlling for age (age_pcr), revealed no significant differences after FDR correction. torretpo (Tower Task Time) came closest: it was nominally significant (p=0.007) but did not survive correction (FDR p=0.078). bostonlat approached nominal significance (p=0.064, FDR p=0.339).

Memory/Learning vs. Smell Loss (within PCR+, N=387): Comparing PCR-positive individuals with and without smell loss (reference level 0) on memory and learning variables, while controlling for age (age_pcr) and COVID severity (risk_hospital_icu), revealed no significant differences, either nominally (smallest p=0.174) or after FDR correction.
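
The relationship between the nominal and FDR-adjusted p-values can be checked by hand. The sketch below assumes that the p_value_FDR column was produced with the Benjamini-Hochberg method (the source does not state this here); the raw p-values are copied from the processing-speed table, so small differences in the third decimal are expected because the printed inputs are already rounded.

```r
# Raw p-values copied from the processing-speed table (PCR status comparison)
p_raw <- c(otverbaltpo = 0.230, otvisualtpo = 0.994, otmentaltpo = 0.143,
           otvismenttpo = 0.345, otswitchtpo = 0.366, x5dreadtpo = 0.154,
           x5dcounttpo = 0.142, x5dfoctpo = 0.344, x5dswitchtpo = 0.910,
           torretpo = 0.007, bostonlat = 0.064)

# Benjamini-Hochberg adjustment (assumed method; not confirmed by the source)
p_fdr <- p.adjust(p_raw, method = "BH")

# Rebuild a small results table and flag FDR-significant rows at alpha = 0.05
fdr_table <- data.frame(
  p_value     = p_raw,
  p_value_FDR = round(p_fdr, 3),
  Significant = ifelse(p_fdr < 0.05, "Yes", "No")
)

# Show the rows from smallest to largest nominal p-value
print(fdr_table[order(fdr_table$p_value), ])
```

Under this assumption the adjusted value for torretpo is 0.007 × 11 / 1 ≈ 0.077, which stays above 0.05, so no processing-speed variable is flagged as significant.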

Random Forest Classification for Cognitive Status

A machine learning approach such as a Random Forest can be used to classify participants according to cognitive status using demographic and neuroimaging predictors. This approach can also help identify the most important predictors of cognitive differences.

# Load the randomForest package
library(randomForest)

# Ensure cognitive status is treated as a factor
imputed_data$cognitive <- as.factor(imputed_data$cognitive)

# Select predictors (here we include age_pcr and neuroimaging measures; adjust as needed)
predictors <- imputed_data %>% 
  select(age_pcr, right_accumbens_area:left_g_re_gyrus_rectus)

# Combine response and predictors into one dataframe
rf_data <- data.frame(cognitive = imputed_data$cognitive, predictors)

# Set seed for reproducibility and train the random forest model
set.seed(123)
rf_model <- randomForest(cognitive ~ ., data = rf_data, importance = TRUE)

# Print the model summary
print(rf_model)
## 
## Call:
##  randomForest(formula = cognitive ~ ., data = rf_data, importance = TRUE) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 5
## 
##         OOB estimate of  error rate: 40.6%
## Confusion matrix:
##     1   2 class.error
## 1 134  97   0.4199134
## 2  91 141   0.3922414
# Plot variable importance to identify key predictors
varImpPlot(rf_model)

Interpretation of the Random Forest Variable Importance Plot

In the two panels of the variable importance plot, each point represents the contribution of a particular predictor variable to the random forest model used to classify participants according to their cognitive status. The left panel (Mean Decrease in Accuracy) indicates how much out-of-bag accuracy drops when the values of a given variable are randomly permuted, while the right panel (Mean Decrease in Gini) shows how much each variable contributes to node purity (i.e., how well it splits the data within the trees).

1. Top Predictors
• Age (age_pcr) stands out as the most influential predictor for classifying cognitive status, indicating that chronological age has the strongest impact on the model’s decisions.
• Subcortical volumes (e.g., hippocampus and thalamus) also rank highly, suggesting that these brain regions help differentiate between cognitive groups.

2. Model Performance
• The out-of-bag (OOB) error rate of about 40.6% corresponds to roughly 59.4% correct classification. Because the two classes are almost equally sized (231 vs. 232 participants), chance level is about 50%, so the model performs only modestly above chance. While certain variables (like age and hippocampal volumes) are influential, the model misclassifies a large share of individuals in both classes.

3. Practical Implications
• The strong role of age underscores the need to control for or further investigate age-related effects when studying cognitive outcomes.
• The importance of hippocampal and thalamic volumes aligns with existing evidence that these structures are linked to cognitive performance, particularly in aging populations.
• The modest overall accuracy suggests that either (a) additional variables or (b) a different modeling approach (e.g., feature engineering, dimension reduction, or other machine learning methods) may be needed to improve classification performance.

Overall, these plots highlight that age and certain subcortical regions are key drivers of the model’s classification decisions, but the relatively high misclassification rate points to a complex interplay of factors influencing cognitive status.
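
To put the OOB error rate in context, the accuracy can be compared with the rate obtained by always predicting the larger class. The sketch below uses only the counts from the confusion matrix printed above. The binomial test is a rough check, not part of the original analysis: it treats the OOB predictions as independent trials, which is only approximately true.

```r
# Counts copied from the OOB confusion matrix printed by print(rf_model)
conf_mat <- matrix(c(134,  97,
                      91, 141),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(observed = c("1", "2"),
                                   predicted = c("1", "2")))

n_total      <- sum(conf_mat)         # all participants
n_correct    <- sum(diag(conf_mat))   # correctly classified (diagonal)
oob_accuracy <- n_correct / n_total

# Baseline: accuracy of always predicting the most frequent observed class
majority_rate <- max(rowSums(conf_mat)) / n_total

# One-sided exact binomial test of accuracy against that baseline
chance_test <- binom.test(n_correct, n_total, p = majority_rate,
                          alternative = "greater")

print(round(c(accuracy = oob_accuracy, baseline = majority_rate), 3))
print(chance_test$p.value)
```

An accuracy that is statistically above the baseline can still be of limited practical use; the size of the gap (here about nine percentage points) is the more informative quantity.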

Generalized Additive Models (GAM) for Non-linear Relationships

GAMs allow flexible modeling of non-linear effects. For instance, you can explore the non-linear association between age (or age interval) and brain volumes, which may not be captured adequately by linear models.

# Load the mgcv package
library(mgcv)

# Fit a GAM for one brain region (e.g., right_hippocampus) as a function of age_pcr
gam_model <- gam(right_hippocampus ~ s(age_pcr), data = imputed_data)

# Print the summary of the model
summary(gam_model)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## right_hippocampus ~ s(age_pcr)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3823.52      19.28   198.3   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##              edf Ref.df     F  p-value    
## s(age_pcr) 3.419  4.326 9.536 3.68e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0801   Deviance explained = 8.69%
## GCV = 1.7373e+05  Scale est. = 1.7207e+05  n = 463
# Plot the fitted smooth along with the residuals
plot(gam_model, residuals = TRUE, pch = 20, cex = 0.5, shade = TRUE)

# Diagnostic plots for the GAM model
# gam.check() draws four residual panels itself, so the smooth is not re-plotted here
par(mfrow = c(2,2))
gam.check(gam_model)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 4 iterations.
## The RMS GCV score gradient at convergence was 0.01721632 .
## The Hessian was positive definite.
## Model rank =  10 / 10 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##              k'  edf k-index p-value
## s(age_pcr) 9.00 3.42    0.95    0.12
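
A smooth term with an effective degrees of freedom (edf) above 1, as for s(age_pcr) here, suggests curvature, and one way to check whether that curvature matters is to compare the smooth fit against a purely linear one. The sketch below illustrates the comparison on simulated data: the data frame `sim`, its columns `age` and `volume`, and the curved relationship are all hypothetical and are not taken from the study.

```r
library(mgcv)

set.seed(42)
n <- 300
sim <- data.frame(age = runif(n, 20, 80))
# Hypothetical curved relationship: simulated volume peaks near age 50
sim$volume <- 4000 - 0.3 * (sim$age - 50)^2 + rnorm(n, sd = 100)

# Same outcome modelled with a straight line and with a penalized smooth;
# ML smoothing-parameter estimation keeps the two likelihoods comparable
fit_linear <- gam(volume ~ age,    data = sim, method = "ML")
fit_smooth <- gam(volume ~ s(age), data = sim, method = "ML")

# Lower AIC indicates the better trade-off between fit and complexity
aic_table <- AIC(fit_linear, fit_smooth)
print(aic_table)

# Effective degrees of freedom of the smooth (1 would mean a straight line)
smooth_edf <- summary(fit_smooth)$edf
print(smooth_edf)
```

Applied to the real data, the same comparison would use right_hippocampus ~ age_pcr against right_hippocampus ~ s(age_pcr) on imputed_data.
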
library(lavaan)

# --- Create the numeric version of cognitive FIRST ---
# Ensure the 'cognitive' factor exists and convert it
if ("cognitive" %in% names(imputed_data) && is.factor(imputed_data$cognitive)) {
    imputed_data$cognitive_num <- as.numeric(as.character(imputed_data$cognitive))
    print("Created 'cognitive_num' variable.")
} else {
    stop("Error: 'cognitive' column not found in imputed_data or is not a factor. Cannot create 'cognitive_num'.")
}
## [1] "Created 'cognitive_num' variable."
# --- Scaling Variables for Mediation ---
# Select only the data needed for the model and scale the continuous variables
# Now 'cognitive_num' exists and can be selected
mediation_data_scaled <- imputed_data %>%
  # all_of() errors if a required column is missing (one_of() is deprecated)
  select(all_of(c("cognitive_num", "right_hippocampus", "age_pcr"))) %>%
  mutate(
    # scale() returns a one-column matrix; as.numeric() keeps plain numeric columns
    cognitive_num_scaled = as.numeric(scale(cognitive_num)),
    right_hippocampus_scaled = as.numeric(scale(right_hippocampus)),
    age_pcr_scaled = as.numeric(scale(age_pcr))
  ) %>%
  # Select only the scaled variables for the model
  select(cognitive_num_scaled, right_hippocampus_scaled, age_pcr_scaled)

# --- Define the Mediation Model using Scaled Variables ---
mediation_model_scaled <- '
  # Direct effect
  cognitive_num_scaled ~ c*age_pcr_scaled
  # Mediator path
  right_hippocampus_scaled ~ a*age_pcr_scaled
  cognitive_num_scaled ~ b*right_hippocampus_scaled
  # Indirect effect (a*b) and total effect
  ab := a*b
  total := c + (a*b)
'

# --- Fit the Mediation Model using Scaled Data ---
# Still use MLR for robustness if desired, but scaling often helps stability
fit_mediation_scaled <- sem(mediation_model_scaled,
                            data = mediation_data_scaled,
                            estimator = "MLR", # Or use "ML" if MLR still fails
                            warn = TRUE) # Keep warnings on

# --- Summarize the Results ---
# Note: Coefficients will now be standardized estimates because variables were scaled
print("--- Summary of Mediation Model with Scaled Variables ---")
## [1] "--- Summary of Mediation Model with Scaled Variables ---"
# Check if the model converged and standard errors were computed
summary_output <- tryCatch(summary(fit_mediation_scaled, standardized = FALSE, fit.measures = TRUE),
                           error = function(e) { print(paste("Error in summary:", e$message)); NULL })

if (!is.null(summary_output)) {
    print(summary_output)
} else {
    print("Model summary could not be generated. Check previous warnings/errors.")
}
## lavaan 0.6-19 ended normally after 1 iteration
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         5
## 
##   Number of observations                           463
## 
## Model Test User Model:
##                                               Standard      Scaled
##   Test Statistic                                 0.000       0.000
##   Degrees of freedom                                 0           0
## 
## Model Test Baseline Model:
## 
##   Test statistic                                67.355      67.355
##   Degrees of freedom                                 3           3
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  1.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000       1.000
##   Tucker-Lewis Index (TLI)                       1.000       1.000
##                                                                   
##   Robust Comparative Fit Index (CFI)                            NA
##   Robust Tucker-Lewis Index (TLI)                               NA
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1279.258   -1279.258
##   Loglikelihood unrestricted model (H1)      -1279.258   -1279.258
##                                                                   
##   Akaike (AIC)                                2568.517    2568.517
##   Bayesian (BIC)                              2589.205    2589.205
##   Sample-size adjusted Bayesian (SABIC)       2573.337    2573.337
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000          NA
##   90 Percent confidence interval - lower         0.000          NA
##   90 Percent confidence interval - upper         0.000          NA
##   P-value H_0: RMSEA <= 0.050                       NA          NA
##   P-value H_0: RMSEA >= 0.080                       NA          NA
##                                                                   
##   Robust RMSEA                                               0.000
##   90 Percent confidence interval - lower                     0.000
##   90 Percent confidence interval - upper                     0.000
##   P-value H_0: Robust RMSEA <= 0.050                            NA
##   P-value H_0: Robust RMSEA >= 0.080                            NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000       0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Sandwich
##   Information bread                           Observed
##   Observed information based on                Hessian
## 
## Regressions:
##                              Estimate  Std.Err  z-value  P(>|z|)
##   cognitive_num_scaled ~                                        
##     ag_pcr_scl (c)             -0.252    0.043   -5.914    0.000
##   right_hippocampus_scaled ~                                    
##     ag_pcr_scl (a)             -0.252    0.052   -4.869    0.000
##   cognitive_num_scaled ~                                        
##     rght_hppc_ (b)              0.069    0.044    1.562    0.118
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .cogntv_nm_scld    0.921    0.022   41.150    0.000
##    .rght_hppcmps_s    0.935    0.068   13.742    0.000
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     ab               -0.017    0.012   -1.457    0.145
##     total            -0.269    0.040   -6.718    0.000
# You can also request standardized=TRUE to see std.all column if needed,
# though it will be very similar to the estimates themselves now.
# summary(fit_mediation_scaled, standardized = TRUE, fit.measures = TRUE)

# Check variances of scaled data (should be ~1)
print("--- Variance Table for Scaled Data Used in Model ---")
## [1] "--- Variance Table for Scaled Data Used in Model ---"
# Use the scaled data frame directly to check variances
print(sapply(mediation_data_scaled, var))
##     cognitive_num_scaled right_hippocampus_scaled           age_pcr_scaled 
##                        1                        1                        1
# Or use varTable on the fitted object if it succeeded
# print("--- Variance Table from lavaan Object ---")
# tryCatch(print(varTable(fit_mediation_scaled)), error = function(e) print(paste("varTable failed:", e$message)))

The mediation model is just-identified (0 degrees of freedom), so all global fit indices (CFI, TLI, RMSEA, SRMR) are perfect by definition and provide no additional insight. Because all variables were scaled before fitting, the estimates in the output above are standardized; the corresponding unstandardized values are given in parentheses. The estimated parameters indicate the following:

• Direct Effect (c): The effect of age (age_pcr) on cognitive performance (cognitive_num) is -0.252 (unstandardized -0.024; p<0.001), suggesting that, holding hippocampal volume constant, higher age is directly associated with lower cognitive scores.
• Path a: The effect of age on right hippocampal volume is -0.252 (unstandardized -20.512; p<0.001), indicating that older age is associated with smaller hippocampal volume.
• Path b: The effect of right hippocampal volume on cognitive performance is 0.069 (unstandardized 0.000; p=0.118). The unstandardized value rounds to zero because volume is measured on a much larger scale than the cognitive score; the standardized effect is small and not statistically significant, so hippocampal volume does not significantly predict cognitive performance once age is accounted for.
• Indirect Effect (ab): The product of paths a and b (i.e., the mediated effect) is -0.017 (unstandardized -0.002; p=0.145), which is very small relative to the total effect and not statistically significant.
• Total Effect: The sum of the direct and indirect effects is -0.269 (unstandardized -0.025; p<0.001), which is essentially driven by the direct effect of age on cognition.

While age significantly predicts both lower cognitive performance and reduced hippocampal volume, this model provides no evidence that hippocampal volume mediates the relationship between age and cognitive performance. The indirect effect (ab) is small and not statistically significant (p=0.145), so the association between age and cognition is primarily direct.
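
The defined parameters ab and total can be reproduced by hand from the path estimates. The sketch below uses the rounded standardized estimates and standard errors from the lavaan output and a first-order delta-method (Sobel) standard error that treats a and b as uncorrelated. This is an approximation, not the robust computation lavaan performs, so the z-value and p-value will differ slightly from the reported ones.

```r
# Standardized path estimates and standard errors copied from the lavaan output
a        <- -0.252; se_a <- 0.052   # age -> right hippocampus
b        <-  0.069; se_b <- 0.044   # right hippocampus -> cognition
c_direct <- -0.252                  # age -> cognition (direct path)

# Defined parameters, as in the model syntax: ab := a*b, total := c + (a*b)
indirect <- a * b
total    <- c_direct + indirect

# First-order delta-method (Sobel) standard error, assuming cov(a, b) = 0
se_indirect <- sqrt(a^2 * se_b^2 + b^2 * se_a^2)
z_indirect  <- indirect / se_indirect
p_indirect  <- 2 * pnorm(-abs(z_indirect))

# Share of the total effect that runs through the mediator
prop_mediated <- indirect / total

print(round(c(ab = indirect, total = total, se_ab = se_indirect,
              z = z_indirect, p = p_indirect, prop = prop_mediated), 3))
```

With these inputs the indirect effect accounts for only a small fraction of the total effect. A bootstrap confidence interval for ab (for example via lavaan's se = "bootstrap" option) is a common alternative to the normal-theory test.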

— New Analysis Questions —

# Assume 'imputed_data' dataframe from the imputation chunk is available
# Ensure relevant variables are factors where appropriate (as done in original EDA)
factor_cols_imp <- c("pcr", "anosmia", "risk_hospital_icu", "vaccine_before_study",
                     "covid_before_vaccination", "fever", "cough", "muscle_pain",
                     "breath_dif", "smell_lost", "taste_lost", "covid_variant",
                     "vaccine_1", "vaccine_2", "vaccine_3", "cognitive") # Add cognitive if used as factor

existing_factor_cols_imp <- factor_cols_imp[factor_cols_imp %in% names(imputed_data)]

if (length(existing_factor_cols_imp) > 0) {
    imputed_data <- imputed_data %>%
        mutate(across(all_of(existing_factor_cols_imp), as.factor))
}

# Define cognitive and neuroimaging variable sets (adjust based on actual names in imputed_data)
# Assuming imputed_data columns 23:59 are cognitive, and 60:90 are neuroimaging
# Use names for robustness if column order might change
cognitive_vars_names <- names(imputed_data)[23:59] # Example range, verify column names
neuroimaging_vars_names <- names(imputed_data)[60:90] # Example range, verify column names
symptom_vars_names <- c("anosmia", "fever", "cough", "muscle_pain", "breath_dif", "smell_lost", "taste_lost") # Add others if relevant
symptom_vars_names <- intersect(symptom_vars_names, names(imputed_data)) # Keep only existing symptom vars

Question 9: Does COVID-19 severity (Risk Hospital/ICU) interact with PCR status to predict cognitive outcomes?

# Hypothesis: More severe COVID (higher risk) might lead to worse cognitive scores,
# potentially amplified in the PCR positive group.
# Using 'cognitive_num' (created for mediation) or a primary cognitive score.
# Let's use 'cognitive_num' and control for age.

if (all(c("cognitive_num", "risk_hospital_icu", "pcr", "age_pcr") %in% names(imputed_data))) {
    print("--- Q9: Severity (Risk Hospital/ICU) Interaction with PCR on Cognition ---")

    # Ensure risk_hospital_icu is a factor
    if (!is.factor(imputed_data$risk_hospital_icu)) {
        imputed_data$risk_hospital_icu <- factor(imputed_data$risk_hospital_icu)
        print("Converted risk_hospital_icu to factor.")
    }
    print("Levels of risk_hospital_icu:")
    print(levels(imputed_data$risk_hospital_icu))
    print("Table of risk_hospital_icu vs pcr:")
    print(table(imputed_data$risk_hospital_icu, imputed_data$pcr))

    # Fit linear model with interaction term
    # Using cognitive_num as the outcome. Note: it is derived from the two-level 'cognitive'
    # factor (classes 1 and 2), so this lm() is effectively a linear probability model.
    lm_severity_interaction <- lm(cognitive_num ~ risk_hospital_icu * pcr + age_pcr, data = imputed_data)

    print("ANOVA Table (Type III SS) for Severity Interaction Model:")
    tryCatch({
        anova_q9 <- car::Anova(lm_severity_interaction, type = "III")
        print(anova_q9)

        # Calculate effect sizes
        print("Effect Sizes (Partial Eta Squared):")
        print(effectsize::eta_squared(anova_q9, partial = TRUE))

    }, error = function(e) {
        print(paste("Error running Anova/eta_squared for Q9:", e$message))
        print("Showing basic model summary instead:")
        print(summary(lm_severity_interaction))
    })

    # Visualize interaction if significant (example using ggplot)
    # Check interaction term p-value from Anova output
    # interaction_p_val_q9 <- anova_q9["risk_hospital_icu:pcr", "Pr(>F)"] # Adjust row name if needed
    # if (!is.na(interaction_p_val_q9) && interaction_p_val_q9 < 0.05) {
    #    print("Interaction detected, generating plot:")
    #    print(
    #        ggplot(imputed_data, aes(x = pcr, y = cognitive_num, color = risk_hospital_icu, group = risk_hospital_icu)) +
    #            stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.1, position = position_dodge(0.1)) +
    #            stat_summary(fun = mean, geom = "line", position = position_dodge(0.1)) +
    #            stat_summary(fun = mean, geom = "point", position = position_dodge(0.1), size = 2) +
    #            labs(title = "Interaction: Cognitive Score by PCR Status and Hospital/ICU Risk",
    #                 x = "PCR Status", y = "Mean Cognitive Score", color = "Hospital/ICU Risk") +
    #            theme_minimal()
    #    )
    # } else {
    #    print("Interaction term risk_hospital_icu:pcr not significant, skipping interaction plot.")
    # }

} else {
    print("Skipping Q9 analysis: Required columns ('cognitive_num', 'risk_hospital_icu', 'pcr', 'age_pcr') not found.")
}
## [1] "--- Q9: Severity (Risk Hospital/ICU) Interaction with PCR on Cognition ---"
## [1] "Levels of risk_hospital_icu:"
## [1] "0" "1" "2" "3"
## [1] "Table of risk_hospital_icu vs pcr:"
##    
##     NEGATIVA POSITIVA
##   0       74      314
##   1        0       17
##   2        2       49
##   3        0        7
## [1] "ANOVA Table (Type III SS) for Severity Interaction Model:"
## [1] "Error running Anova/eta_squared for Q9: there are aliased coefficients in the model"
## [1] "Showing basic model summary instead:"
## 
## Call:
## lm(formula = cognitive_num ~ risk_hospital_icu * pcr + age_pcr, 
##     data = imputed_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7691 -0.4598  0.2444  0.4404  0.8511 
## 
## Coefficients: (2 not defined because of singularities)
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     3.295779   0.292112  11.283  < 2e-16 ***
## risk_hospital_icu1              0.001595   0.120389   0.013    0.989    
## risk_hospital_icu2             -0.101563   0.346109  -0.293    0.769    
## risk_hospital_icu3             -0.138195   0.185020  -0.747    0.455    
## pcrPOSITIVA                    -0.093310   0.062589  -1.491    0.137    
## age_pcr                        -0.025670   0.004258  -6.029 3.42e-09 ***
## risk_hospital_icu1:pcrPOSITIVA        NA         NA      NA       NA    
## risk_hospital_icu2:pcrPOSITIVA  0.028691   0.353973   0.081    0.935    
## risk_hospital_icu3:pcrPOSITIVA        NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4829 on 456 degrees of freedom
## Multiple R-squared:  0.08123,    Adjusted R-squared:  0.06914 
## F-statistic:  6.72 on 6 and 456 DF,  p-value: 7.827e-07

R9: This analysis asked whether the relationship between COVID-19 severity (operationalized as risk_hospital_icu) and cognitive performance (cognitive_num) differs by PCR status (pcr), controlling for age (age_pcr). A linear model with a severity × PCR interaction was fitted. Because severity levels 1 and 3 contain no PCR-negative participants (and level 2 contains only two), two of the three interaction coefficients are aliased and could not be estimated; the Type III ANOVA and effect-size step failed for the same reason. Among the coefficients that could be estimated, none was significant: PCR status (p = 0.137), the severity contrasts against level 0 (all p ≥ 0.455), and the level 2 × PCR interaction (p = 0.935). In a model with an interaction these severity and PCR coefficients are conditional on the reference level of the other factor rather than main effects, and the aliasing makes them harder still to interpret. Age remained a highly significant predictor of cognitive score (about 0.026 points lower per year, p < 0.001), although the model explained little variance overall (adjusted R² ≈ 0.07). The interaction therefore cannot be assessed with these data; the estimates that are available give no evidence that severity or PCR status is associated with cognition after accounting for age.
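The aliasing described above can be reproduced and diagnosed before fitting. The sketch below rebuilds the cell layout from the printed risk_hospital_icu vs pcr table, but age and score are simulated and the names toy, score, and any_risk are hypothetical. The collapsed none-vs-any coding in step 3 is only one possible workaround, not something the original analysis did, and with two participants in one cell its interaction estimate would still be very imprecise.

```r
# Cell counts copied from the risk_hospital_icu vs pcr table above;
# age and score are simulated, so the estimates themselves are meaningless.
cells <- data.frame(
  severity = c("0", "0", "1", "2", "2", "3"),
  pcr      = c("NEGATIVA", "POSITIVA", "POSITIVA", "NEGATIVA", "POSITIVA", "POSITIVA"),
  n        = c(74, 314, 17, 2, 49, 7)
)
toy <- cells[rep(seq_len(nrow(cells)), cells$n), c("severity", "pcr")]
toy$severity <- factor(toy$severity)
toy$pcr <- factor(toy$pcr)

set.seed(1)
toy$age <- runif(nrow(toy), 20, 80)
toy$score <- 3.3 - 0.025 * toy$age + rnorm(nrow(toy), sd = 0.5)

# 1. Empty cells show up directly in the cross-tabulation.
cell_counts <- table(toy$severity, toy$pcr)
empty_cells <- which(cell_counts == 0, arr.ind = TRUE)
print(empty_cells)

# 2. lm() reports the interaction terms it cannot estimate as NA.
fit_full <- lm(score ~ severity * pcr + age, data = toy)
aliased_terms <- names(coef(fit_full))[is.na(coef(fit_full))]
print(aliased_terms)

# 3. Hypothetical workaround: collapse severity to none vs. any so every cell is populated.
toy$any_risk <- factor(ifelse(toy$severity == "0", "none", "any"), levels = c("none", "any"))
fit_collapsed <- lm(score ~ any_risk * pcr + age, data = toy)
print(table(toy$any_risk, toy$pcr))
```

Checking the cross-tabulation first makes it possible to decide on a coding of severity before calling car::Anova(), which stops with an error when coefficients are aliased.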

Question 10: Can symptom clusters predict cognitive performance differently than individual symptoms?

# Hypothesis: Patterns of symptoms (e.g., predominantly neurological vs. respiratory)
# might be more informative for predicting cognitive outcomes than single symptoms.
# Approach: Use clustering (e.g., K-means or hierarchical) on symptom variables
# for the PCR positive group, then compare cognitive scores across clusters.

if (length(symptom_vars_names) > 1 && "pcr" %in% names(imputed_data) && "cognitive_num" %in% names(imputed_data) && "age_pcr" %in% names(imputed_data)) {
    print("--- Q10: Symptom Clusters and Cognition (in PCR+ group) ---")

    # Subset data to PCR positive individuals and select symptom variables
    pcr_positive_data <- imputed_data %>% filter(pcr == levels(pcr)[2]) # Assuming level 2 is positive
    symptoms_for_clustering <- pcr_positive_data %>% select(all_of(symptom_vars_names))

    # Convert factors to numeric for clustering (handle with care - assumes meaningful numeric representation)
    # This might require careful dummy coding if factors are not ordinal/binary 0/1
    # Example: simple conversion assuming binary factors are 0/1 or similar
    symptoms_numeric <- symptoms_for_clustering %>%
        mutate(across(everything(), ~ as.numeric(as.character(.)))) %>%
        na.omit() # Clustering typically requires complete data

    if (nrow(symptoms_numeric) > 10) { # Need sufficient data points for clustering
        # Scale data before clustering
        symptoms_scaled <- scale(symptoms_numeric)

        # Determine optimal number of clusters (e.g., using elbow or silhouette method)
        print("Determining optimal clusters (Example: Elbow method):")
        # k is chosen manually after inspecting the plot; the plot itself is diagnostic only.
        # tryCatch() returns the value of whichever branch runs, so chosen_k is always defined
        # (an assignment made inside the error handler would stay local to that function).
        chosen_k <- tryCatch({
             print(factoextra::fviz_nbclust(symptoms_scaled, kmeans, method = "wss", k.max = 10))
             # print(factoextra::fviz_nbclust(symptoms_scaled, kmeans, method = "silhouette", k.max = 10))
             3 # <<-- Set k based on fviz_nbclust output (e.g., k=3)
        }, error = function(e) {
             print(paste("Cluster number determination failed:", e$message))
             print("Defaulting to k = 3")
             3 # Default to 3 clusters if determination fails
        })

        # Perform K-means clustering
        set.seed(123) # for reproducibility
        km_results <- kmeans(symptoms_scaled, centers = chosen_k, nstart = 25)

        # Add cluster assignments back to the PCR positive data.
        # Drop exactly the rows that na.omit() removed from symptoms_numeric (recorded in its
        # "na.action" attribute), so km_results$cluster lines up row-for-row with the data.
        # Calling na.omit() on the full data frame could drop a different set of rows.
        dropped_rows <- attr(symptoms_numeric, "na.action")
        pcr_positive_data_complete <- if (is.null(dropped_rows)) pcr_positive_data else pcr_positive_data[-as.integer(dropped_rows), ]
        if(nrow(pcr_positive_data_complete) == nrow(symptoms_numeric)){
             pcr_positive_data_complete$symptom_cluster <- factor(km_results$cluster)
             print("Cluster assignments added.")

             # Analyze cognitive scores across clusters, controlling for age
             print(paste("Comparing cognitive_num across", chosen_k, "symptom clusters (controlling for age):"))
             lm_cluster_cognition <- lm(cognitive_num ~ symptom_cluster + age_pcr, data = pcr_positive_data_complete)
             print(summary(lm_cluster_cognition))

             print("ANOVA Table (Type III SS) for Cluster Model:")
             tryCatch({
                 anova_q10 <- car::Anova(lm_cluster_cognition, type = "III")
                 print(anova_q10)
                 print("Effect Sizes (Partial Eta Squared):")
                 print(effectsize::eta_squared(anova_q10, partial = TRUE))
             }, error = function(e){ print(paste("Error in Anova/eta_squared for Q10:", e$message))})

             # Visualize differences (optional)
             # print(
             #    ggplot(pcr_positive_data_complete, aes(x = symptom_cluster, y = cognitive_num, fill = symptom_cluster)) +
             #        geom_boxplot(alpha=0.7) +
             #        labs(title = "Cognitive Score by Symptom Cluster (PCR Positive, Age Adjusted?)", # Note: plot doesn't show adjustment
             #             x = "Symptom Cluster", y = "Cognitive Score") +
             #        theme_minimal()
             # )

        } else {
             print("Mismatch in row numbers after na.omit(). Cannot reliably add cluster assignments.")
        }

    } else {
        print("Skipping Q10 clustering: Insufficient complete data points for symptom variables in PCR+ group.")
    }

} else {
    print("Skipping Q10 analysis: Required symptom variables, 'pcr', 'cognitive_num', or 'age_pcr' not found/sufficient.")
}
## [1] "--- Q10: Symptom Clusters and Cognition (in PCR+ group) ---"
## [1] "Determining optimal clusters (Example: Elbow method):"

## [1] "Cluster assignments added."
## [1] "Comparing cognitive_num across 3 symptom clusters (controlling for age):"
## 
## Call:
## lm(formula = cognitive_num ~ symptom_cluster + age_pcr, data = pcr_positive_data_complete)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8010 -0.4853 -0.1251  0.4426  0.8268 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3.075647   0.318998   9.642  < 2e-16 ***
## symptom_cluster2  0.018474   0.062070   0.298    0.766    
## symptom_cluster3 -0.002058   0.060403  -0.034    0.973    
## age_pcr          -0.024012   0.004752  -5.053 6.75e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4865 on 383 degrees of freedom
## Multiple R-squared:  0.06268,    Adjusted R-squared:  0.05534 
## F-statistic: 8.538 on 3 and 383 DF,  p-value: 1.68e-05
## 
## [1] "ANOVA Table (Type III SS) for Cluster Model:"
## Anova Table (Type III tests)
## 
## Response: cognitive_num
##                 Sum Sq  Df F value    Pr(>F)    
## (Intercept)     21.999   1 92.9605 < 2.2e-16 ***
## symptom_cluster  0.033   2  0.0687    0.9336    
## age_pcr          6.042   1 25.5321 6.751e-07 ***
## Residuals       90.636 383                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Effect Sizes (Partial Eta Squared):"
## # Effect Size for ANOVA (Type III)
## 
## Parameter       | Eta2 (partial) |       95% CI
## -----------------------------------------------
## symptom_cluster |       3.59e-04 | [0.00, 1.00]
## age_pcr         |           0.06 | [0.03, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].

R10: To investigate whether patterns of symptoms are associated with cognitive outcomes, symptom data from the PCR-positive participants (n = 387) were grouped by k-means clustering. The number of clusters was fixed at k = 3 in the script; an elbow plot was produced, but no data-driven choice of k is reported, so the three clusters should be read as a working partition rather than as empirically validated symptom profiles. A linear model then compared cognitive performance (cognitive_num) across clusters, controlling for age (age_pcr). Cognitive scores did not differ between clusters (F(2, 383) ≈ 0.07, p = 0.93), and the partial eta-squared for cluster membership was negligible (η²p < 0.001). Age remained a significant predictor (about 0.024 points lower per year, p < 0.001; η²p ≈ 0.06), so older PCR-positive participants had lower cognitive scores regardless of cluster. Symptom patterns, as clustered here, therefore explained essentially no variance in cognitive performance beyond age.
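The effect sizes and the F test quoted above follow directly from the sums of squares in the Type III table. As a quick arithmetic check (a sketch that uses only the rounded values printed in the output, with variable names chosen here for illustration), partial eta-squared is SS_effect / (SS_effect + SS_residual):

```r
# Sums of squares and degrees of freedom as printed in the Type III table above.
ss_cluster <- 0.033;  df_cluster <- 2
ss_age     <- 6.042
ss_resid   <- 90.636; df_resid <- 383

# Partial eta-squared: share of (effect + residual) variation attributed to the effect.
eta2p_cluster <- ss_cluster / (ss_cluster + ss_resid)
eta2p_age     <- ss_age / (ss_age + ss_resid)

# F statistic and p-value for the cluster term, rebuilt from the same table.
f_cluster <- (ss_cluster / df_cluster) / (ss_resid / df_resid)
p_cluster <- pf(f_cluster, df_cluster, df_resid, lower.tail = FALSE)

print(signif(c(eta2p_cluster = eta2p_cluster, eta2p_age = eta2p_age,
               F = f_cluster, p = p_cluster), 3))
```

Small differences from the printed values (for example in the third significant digit of F) are expected, because the cluster sum of squares is only available here rounded to three decimals.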

Question 11: Are there distinct cognitive profiles among participants, and how do they relate to COVID history or demographics?

# Hypothesis: Participants might fall into subgroups based on their performance patterns across
# multiple cognitive tests, revealing different types of cognitive impact (or resilience).
# Approach: Cluster participants based on a wide range of cognitive test scores.

if (length(cognitive_vars_names) > 5) { # Need multiple cognitive variables
    print("--- Q11: Identifying Cognitive Profiles via Clustering ---")

    # Select cognitive variables from imputed data
    cognitive_data_for_clustering <- imputed_data %>% select(all_of(cognitive_vars_names)) %>% na.omit()

    if(nrow(cognitive_data_for_clustering) > 10){
        # Scale the data
        cognitive_scaled <- scale(cognitive_data_for_clustering)

        # Determine optimal number of clusters (similar to Q10)
        print("Determining optimal cognitive clusters (Example: Elbow method):")
        # k is chosen manually after inspecting the plot; the plot itself is diagnostic only.
        # tryCatch() returns the value of whichever branch runs, so chosen_k_cog is always defined
        # (an assignment made inside the error handler would stay local to that function).
        chosen_k_cog <- tryCatch({
             print(factoextra::fviz_nbclust(cognitive_scaled, kmeans, method = "wss", k.max = 10))
             # print(factoextra::fviz_nbclust(cognitive_scaled, kmeans, method = "silhouette", k.max = 10))
             4 # <<-- Set k based on fviz_nbclust output (e.g., k=4)
        }, error = function(e) {
             print(paste("Cluster number determination failed:", e$message))
             print("Defaulting to k = 3")
             3 # Default to 3 clusters if determination fails
        })

        # Perform K-means clustering
        set.seed(456)
        km_cog_results <- kmeans(cognitive_scaled, centers = chosen_k_cog, nstart = 25)

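        # Add cluster assignments back to the main imputed dataset
        # Caution: na.omit() below checks every column of imputed_data, whereas the clustering
        # input was filtered on the cognitive columns only, so the two row counts can differ.
        # Matching on complete.cases() of the cognitive columns (or on ID) would avoid this.
        imputed_data_complete_cog <- imputed_data %>% na.omit() # Match data used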
         if(nrow(imputed_data_complete_cog) == nrow(cognitive_data_for_clustering)){
             imputed_data_complete_cog$cognitive_profile <- factor(km_cog_results$cluster)
             print("Cognitive profile cluster assignments added.")

             # Analyze characteristics of each profile cluster
             print(paste("Analyzing characteristics across", chosen_k_cog, "cognitive profiles:"))

             # Example: Compare age across profiles
             if("age_pcr" %in% names(imputed_data_complete_cog)){
                 print("Age distribution by cognitive profile:")
                 print(summary(aov(age_pcr ~ cognitive_profile, data = imputed_data_complete_cog)))
                 # print(ggplot(imputed_data_complete_cog, aes(x=cognitive_profile, y=age_pcr, fill=cognitive_profile)) + geom_boxplot())
             }

             # Example: Compare PCR status distribution across profiles
             if("pcr" %in% names(imputed_data_complete_cog)){
                 print("PCR status distribution by cognitive profile:")
                 print(table(imputed_data_complete_cog$cognitive_profile, imputed_data_complete_cog$pcr))
                 print(chisq.test(table(imputed_data_complete_cog$cognitive_profile, imputed_data_complete_cog$pcr)))
             }

             # Example: Compare symptom cluster distribution across profiles (if Q10 ran successfully)
             if("symptom_cluster" %in% names(imputed_data_complete_cog)){ # Need to merge results carefully if na.omits differed
                 print("Symptom cluster distribution by cognitive profile:")
                 # Requires merging Q10 results back carefully if na.omits were different
                 # print(table(imputed_data_complete_cog$cognitive_profile, imputed_data_complete_cog$symptom_cluster))
                 # print(chisq.test(table(imputed_data_complete_cog$cognitive_profile, imputed_data_complete_cog$symptom_cluster)))
             }

             # Further analysis: Characterize each cluster by looking at the mean scaled cognitive scores
             profile_centers <- aggregate(cognitive_scaled, by=list(cluster=km_cog_results$cluster), mean)
             print("Mean scaled cognitive scores for each profile cluster:")
             print(profile_centers)

         } else {
             print("Mismatch in row numbers after na.omit(). Cannot reliably add cognitive profile assignments.")
         }

    } else {
        print("Skipping Q11 clustering: Insufficient complete data points for cognitive variables.")
    }

} else {
    print("Skipping Q11 analysis: Not enough cognitive variables found.")
}
## [1] "--- Q11: Identifying Cognitive Profiles via Clustering ---"
## [1] "Determining optimal cognitive clusters (Example: Elbow method):"

## [1] "Mismatch in row numbers after na.omit(). Cannot reliably add cognitive profile assignments."

R11: The aim was to identify distinct cognitive profiles by k-means clustering of the cognitive performance variables, and then to compare demographics and COVID-19 history across profiles. The elbow plot was produced and k-means was run with k = 4 (set manually in the script), but the cluster labels could not be attached to the main dataset: the script’s safeguard reported “Mismatch in row numbers after na.omit()”. The cause is visible in the code. The clustering input was built by applying na.omit() to the cognitive columns only, whereas the dataset meant to receive the labels was built by applying na.omit() to every column of imputed_data. Rows with missing values in any non-cognitive column were therefore dropped from the second set but not the first, so the row counts differed. As a result, none of the planned comparisons across cognitive profiles (age, PCR status, symptom cluster) was carried out, and no conclusion about cognitive profiles can be drawn from this run. Selecting rows with complete.cases() on the cognitive columns, or joining on participant ID, would resolve the problem.
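A minimal sketch of the row-matching approach mentioned above, on simulated data. The data frame toy and its columns (test_a, test_b, test_c, vaccine_3) are hypothetical stand-ins for imputed_data and cognitive_vars_names, and k = 3 is arbitrary. The point is only that the same logical index is used both to build the clustering input and to write the labels back, so missing values in unrelated columns cannot cause a mismatch.

```r
set.seed(456)
n <- 60
# Simulated stand-in for imputed_data; all column names here are hypothetical.
toy <- data.frame(
  id        = seq_len(n),
  test_a    = rnorm(n),
  test_b    = rnorm(n),
  test_c    = rnorm(n),
  vaccine_3 = sample(c("A", "B", NA), n, replace = TRUE)  # NAs outside the cognitive columns
)
toy$test_b[c(5, 17)] <- NA  # a few gaps inside the cognitive columns
cog_cols <- c("test_a", "test_b", "test_c")

# Flag rows that are complete on the clustering variables only.
keep <- complete.cases(toy[, cog_cols])
cog_scaled <- scale(toy[keep, cog_cols])
km <- kmeans(cog_scaled, centers = 3, nstart = 25)

# Write cluster labels back through the same logical index; excluded rows stay NA.
toy$cognitive_profile <- NA_integer_
toy$cognitive_profile[keep] <- km$cluster
toy$cognitive_profile <- factor(toy$cognitive_profile)

print(table(toy$cognitive_profile, useNA = "ifany"))
```

Downstream comparisons (for example aov() or chisq.test() by profile) can then run on the full data frame, since R's modelling functions drop the rows with a missing profile by default.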

Question 12: What are the correlations between specific cognitive domains and a broader range of neuroimaging variables?

# Hypothesis: Specific cognitive functions (e.g., executive function, memory) might correlate
# more strongly with certain brain regions than others.
# Approach: Create a correlation matrix between selected cognitive scores and neuroimaging variables.

# Select key cognitive domain representatives and neuroimaging variables
# Example cognitive vars: fluency, dscorr (memory), otswitcherr (switching errors), bostonsc (naming)
selected_cog_vars <- c("fluencia", "dscorr", "otswitcherr", "bostonsc")
selected_cog_vars <- intersect(selected_cog_vars, names(imputed_data)) # Ensure they exist

# Use all available neuroimaging vars
selected_neuro_vars <- neuroimaging_vars_names

if (length(selected_cog_vars) > 1 && length(selected_neuro_vars) > 1) {
    print("--- Q12: Correlations between Cognitive Domains and Neuroimaging ---")

    # Subset data
    cor_data_q12 <- imputed_data %>% select(all_of(selected_cog_vars), all_of(selected_neuro_vars))

    # Calculate correlation matrix (using pairwise complete observations)
    cor_matrix_q12 <- cor(cor_data_q12, use = "pairwise.complete.obs")

    # Extract the submatrix of correlations between cognitive and neuroimaging variables
    cog_neuro_cor <- cor_matrix_q12[selected_cog_vars, selected_neuro_vars]

    print("Correlation Matrix (Cognitive vs. Neuroimaging):")
    # Print rounded matrix (adjust rounding digits if needed)
    print(round(cog_neuro_cor, 2))

    # Visualize the correlation matrix using corrplot
    print("Generating Correlation Plot (Cognitive vs. Neuroimaging):")
    tryCatch({
        corrplot::corrplot(cog_neuro_cor,
                 method = "color", # Use color intensity
                 type = "full",    # Show full matrix
                 order = "hclust", # Reorder based on clustering
                 addCoef.col = "black", # Add correlation coefficients
                 tl.col = "black", tl.srt = 45, # Text label color and rotation
                 number.cex = 0.7, # Size of coefficients
                 tl.cex = 0.8,     # Size of labels
                 is.corr = TRUE,  # Input is a correlation matrix
                 sig.level = 0.05, # Optionally cross out non-significant correlations
                 insig = "blank",  # Leave non-significant blank (or use "pch", "p-value")
                 # Note: Significance requires calculating p-values separately, corrplot doesn't do it automatically from matrix input
                 title = "Correlations: Cognitive Domains vs Neuroimaging",
                 mar = c(0, 0, 1, 0)) # Adjust margins
    }, error = function(e){ print(paste("Corrplot failed:", e$message))})

} else {
    print("Skipping Q12 analysis: Not enough selected cognitive or neuroimaging variables found.")
}
## [1] "--- Q12: Correlations between Cognitive Domains and Neuroimaging ---"
## [1] "Correlation Matrix (Cognitive vs. Neuroimaging):"
##             right_accumbens_area left_accumbens_area right_amygdala
## fluencia                   -0.11               -0.10          -0.11
## dscorr                     -0.10               -0.12          -0.08
## otswitcherr                 0.09                0.10           0.19
## bostonsc                   -0.06               -0.06          -0.15
##             left_amygdala right_cerebellum_exterior left_cerebellum_exterior
## fluencia            -0.09                     -0.11                    -0.12
## dscorr              -0.08                     -0.13                    -0.11
## otswitcherr          0.23                      0.11                     0.11
## bostonsc            -0.12                     -0.16                    -0.16
##             right_hippocampus left_hippocampus right_putamen left_putamen
## fluencia                -0.14            -0.16         -0.06        -0.08
## dscorr                  -0.15            -0.13         -0.03        -0.05
## otswitcherr              0.23             0.22          0.05         0.02
## bostonsc                -0.17            -0.19         -0.06        -0.09
##             right_thalamus_proper left_thalamus_proper fornix_right fornix_left
## fluencia                    -0.20                -0.22        -0.15       -0.19
## dscorr                      -0.21                -0.23        -0.23       -0.22
## otswitcherr                  0.24                 0.22         0.23        0.28
## bostonsc                    -0.25                -0.28        -0.17       -0.16
##             anterior_limb_of_internal_capsule_right
## fluencia                                      -0.14
## dscorr                                        -0.19
## otswitcherr                                    0.22
## bostonsc                                      -0.19
##             anterior_limb_of_internal_capsule_left
## fluencia                                     -0.16
## dscorr                                       -0.23
## otswitcherr                                   0.22
## bostonsc                                     -0.21
##             posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right
## fluencia                                                             -0.13
## dscorr                                                               -0.11
## otswitcherr                                                           0.14
## bostonsc                                                             -0.14
##             posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left
## fluencia                                                            -0.13
## dscorr                                                              -0.11
## otswitcherr                                                          0.10
## bostonsc                                                            -0.13
##             corpus_callosum right_a_cg_g_anterior_cingulate_gyrus
## fluencia              -0.15                                 -0.04
## dscorr                -0.15                                 -0.06
## otswitcherr            0.19                                  0.12
## bostonsc              -0.15                                 -0.05
##             left_a_cg_g_anterior_cingulate_gyrus right_a_ins_anterior_insula
## fluencia                                   -0.17                       -0.14
## dscorr                                     -0.16                       -0.24
## otswitcherr                                 0.18                        0.24
## bostonsc                                   -0.16                       -0.21
##             left_a_ins_anterior_insula right_an_g_angular_gyrus
## fluencia                         -0.13                    -0.15
## dscorr                           -0.19                    -0.13
## otswitcherr                       0.21                     0.13
## bostonsc                         -0.16                    -0.08
##             left_an_g_angular_gyrus right_cun_cuneus left_cun_cuneus
## fluencia                      -0.09            -0.11           -0.09
## dscorr                        -0.08            -0.06           -0.05
## otswitcherr                    0.05             0.11            0.07
## bostonsc                      -0.05            -0.11           -0.12
##             right_ent_entorhinal_area left_ent_entorhinal_area
## fluencia                        -0.09                    -0.07
## dscorr                          -0.07                    -0.03
## otswitcherr                      0.17                     0.13
## bostonsc                        -0.15                    -0.11
##             right_g_re_gyrus_rectus left_g_re_gyrus_rectus
## fluencia                      -0.10                  -0.05
## dscorr                        -0.07                  -0.01
## otswitcherr                    0.10                   0.08
## bostonsc                      -0.08                  -0.04
## [1] "Generating Correlation Plot (Cognitive vs. Neuroimaging):"

R12: This analysis explored bivariate correlations between four cognitive measures (verbal fluency, fluencia; digit span correct, dscorr; switching errors, otswitcherr; and Boston Naming Test score, bostonsc) and the full set of neuroimaging variables, described here as regional brain volumes. All associations were weak (|r| ≤ 0.28). Higher scores on fluency, dscorr, and naming correlated negatively with the imaging measures in several regions, most clearly the thalami, fornix, hippocampi, and anterior insula (r ≈ -0.13 to -0.28), whereas more switching errors (otswitcherr) correlated positively with measures in the amygdalae, hippocampi, thalami, and fornix (r ≈ 0.19 to 0.28). Because fewer switching errors mean better performance, all four measures point the same way: better cognition accompanies lower imaging values. That direction is somewhat counterintuitive if the variables are raw volumes, so their units and scaling should be confirmed before the pattern is interpreted. These are also unadjusted correlations reported without significance tests or correction for the large number of coefficients, and they do not control for confounders such as age, which is related to cognitive scores in this sample and may influence the imaging measures as well.
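To illustrate why the lack of age adjustment matters, the sketch below computes a raw and an age-adjusted (partial) correlation on simulated data in which both variables depend on age but not on each other. The names region_measure and naming_score and all coefficients are assumptions made for the illustration; nothing here uses the study data.

```r
set.seed(12)
n <- 500
age <- runif(n, 20, 80)

# Simulated: each variable is driven by age plus independent noise.
region_measure <- 0.05 * age + rnorm(n)
naming_score   <- -0.05 * age + rnorm(n)

# Raw correlation mixes the shared age trend into the association.
raw_r <- cor(naming_score, region_measure)

# Partial correlation: correlate what is left after regressing each variable on age.
partial_r <- cor(resid(lm(naming_score ~ age)),
                 resid(lm(region_measure ~ age)))

print(round(c(raw = raw_r, age_adjusted = partial_r), 2))
```

In the real analysis the same residual approach could be applied with age_pcr as the covariate, and cor.test() could supply a p-value for each adjusted coefficient.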

Question 13: Exploring the ‘Environmental COVID’ Variables

# Hypothesis: These variables might relate to perceived risk, learning about COVID,
# or performance on specific experimental tasks. Explore their relationship with
# cognitive scores and COVID history.
# Approach: Basic correlations and comparisons.

env_vars <- c("listaprimerrec", "listaaprendizaje", "listacp", "listalp", "listarecon",
              "corsidirecto", "corsiinverso", "cactuscorrectas", "cactusvivos", "cactusinanim")
env_vars <- intersect(env_vars, names(imputed_data))

if (length(env_vars) > 0 && "cognitive_num" %in% names(imputed_data)) {
    print("--- Q13: Exploring Environmental COVID Variables ---")

    # Select environmental and key cognitive/demographic variables
    explore_data_q13 <- imputed_data %>% select(all_of(env_vars), cognitive_num, age_pcr, pcr) %>% na.omit()

    if(nrow(explore_data_q13) > 10){
        # Correlations between environmental variables and cognitive score / age
        cor_env_cog_age <- cor(explore_data_q13 %>% select(all_of(env_vars), cognitive_num, age_pcr), use="pairwise.complete.obs")
        print("Correlations involving Environmental Variables, Cognition, and Age:")
        print(round(cor_env_cog_age[c("cognitive_num", "age_pcr"), env_vars], 2))

        # Compare environmental variables based on PCR status (example using t-tests/wilcoxon)
        print("Comparing Environmental Variables by PCR Status:")
        for (env_var in env_vars) {
            if (is.numeric(explore_data_q13[[env_var]])) {
                 print(paste("---", env_var, "vs PCR Status ---"))
                 tryCatch({
                     # Simple t-test or Wilcoxon as fallback
                     test_result <- t.test(as.formula(paste(env_var, "~ pcr")), data = explore_data_q13)
                     print(test_result)
                 }, error = function(e) {
                     tryCatch({
                        wilcox_result <- wilcox.test(as.formula(paste(env_var, "~ pcr")), data = explore_data_q13)
                        print(wilcox_result)
                     }, error = function(e2){
                        print(paste("Could not perform test for", env_var, ":", e2$message))
                     })
                 })
            }
        }
    } else {
         print("Skipping Q13 exploration: Insufficient complete data for environmental variables.")
    }

} else {
    print("Skipping Q13 analysis: Environmental variables or 'cognitive_num' not found.")
}
## [1] "--- Q13: Exploring Environmental COVID Variables ---"
## [1] "Correlations involving Environmental Variables, Cognition, and Age:"
##               listaprimerrec listaaprendizaje listacp listalp listarecon
## cognitive_num           0.37             0.61    0.58    0.56       0.35
## age_pcr                -0.20            -0.33   -0.35   -0.31      -0.18
##               corsidirecto corsiinverso cactusvivos cactusinanim
## cognitive_num         0.08         0.25        0.42         0.34
## age_pcr              -0.08        -0.14       -0.23        -0.26
## [1] "Comparing Environmental Variables by PCR Status:"
## [1] "--- listaprimerrec vs PCR Status ---"
## 
##  Welch Two Sample t-test
## 
## data:  listaprimerrec by pcr
## t = 0.11574, df = 122.92, p-value = 0.908
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
##  -0.3355948  0.3772784
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA 
##               3.723684               3.702842 
## 
## [1] "--- listaaprendizaje vs PCR Status ---"
## 
##  Welch Two Sample t-test
## 
## data:  listaaprendizaje by pcr
## t = 1.4734, df = 116.69, p-value = 0.1433
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
##  -0.4279609  2.9147010
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA 
##               25.02632               23.78295 
## 
## [1] "--- listacp vs PCR Status ---"
## 
##  Welch Two Sample t-test
## 
## data:  listacp by pcr
## t = 1.3638, df = 100.95, p-value = 0.1757
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
##  -0.1829785  0.9880240
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA 
##               6.539474               6.136951 
## 
## [1] "--- listalp vs PCR Status ---"
## 
##  Welch Two Sample t-test
## 
## data:  listalp by pcr
## t = 1.6448, df = 104.74, p-value = 0.103
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
##  -0.1026459  1.1014219
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA 
##               5.894737               5.395349 
## 
## [1] "--- listarecon vs PCR Status ---"
## 
##  Welch Two Sample t-test
## 
## data:  listarecon by pcr
## t = 1.8522, df = 143.03, p-value = 0.06606
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
##  -0.04028367  1.23911408
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA 
##               21.71053               21.11111 
## 
## [1] "--- corsidirecto vs PCR Status ---"
## 
##  Welch Two Sample t-test
## 
## data:  corsidirecto by pcr
## t = -0.14342, df = 150.46, p-value = 0.8862
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
##  -0.2738168  0.2367571
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA 
##               5.118421               5.136951 
## 
## [1] "--- corsiinverso vs PCR Status ---"
## 
##  Welch Two Sample t-test
## 
## data:  corsiinverso by pcr
## t = 1.1679, df = 110.26, p-value = 0.2454
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
##  -0.1192161  0.4613893
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA 
##               4.315789               4.144703 
## 
## [1] "--- cactusvivos vs PCR Status ---"
## 
##  Welch Two Sample t-test
## 
## data:  cactusvivos by pcr
## t = 0.90244, df = 121.75, p-value = 0.3686
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
##  -0.3058832  0.8183951
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA 
##               12.88158               12.62532 
## 
## [1] "--- cactusinanim vs PCR Status ---"
## 
##  Welch Two Sample t-test
## 
## data:  cactusinanim by pcr
## t = 0.92729, df = 122.67, p-value = 0.3556
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
##  -0.2850283  0.7874083
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA 
##               15.31579               15.06460
print("--- End of Expanded Analysis Script ---")
## [1] "--- End of Expanded Analysis Script ---"

R13: This exploratory analysis focused on variables grouped under ‘Environmental COVID’, which appear to represent performance on various cognitive or learning-related tasks (e.g., listaprimerrec, listaaprendizaje, cactusvivos) rather than environmental exposures. Nine of the ten listed variables appear in the output; cactuscorrectas is absent, presumably because it was not found in imputed_data and was dropped by the intersect() call. First, seven of the variables (listaprimerrec, listaaprendizaje, listacp, listalp, listarecon, cactusvivos, cactusinanim) were moderately positively correlated with the general cognitive score (cognitive_num, r ≈ 0.34 to 0.61) and negatively correlated with age (age_pcr, r ≈ -0.18 to -0.35). The two Corsi measures were more weakly related to cognition (corsidirecto r = 0.08, corsiinverso r = 0.25) and to age (r = -0.08 and -0.14). Overall, this suggests the variables largely reflect age-sensitive cognitive performance. Second, Welch’s t-tests compared each variable between PCR-negative and PCR-positive participants. None of the nine comparisons reached significance (all p > 0.05; the smallest was listarecon, p = 0.066), and this holds without any correction for multiple testing. Thus, while these tasks measure age-related cognitive functions, performance on them did not differ significantly by past PCR positivity in this sample.
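Since nine tests were run on related outcomes, it can be useful to see the p-values after a multiplicity adjustment. The sketch below takes the nine unadjusted p-values as printed in the Welch test output and applies Holm and Benjamini-Hochberg adjustments with base R's p.adjust(); the choice of these two methods is an assumption made for illustration, not part of the original script.

```r
# Unadjusted p-values copied from the Welch t-test output above.
p_raw <- c(listaprimerrec = 0.908, listaaprendizaje = 0.1433, listacp = 0.1757,
           listalp = 0.103, listarecon = 0.06606, corsidirecto = 0.8862,
           corsiinverso = 0.2454, cactusvivos = 0.3686, cactusinanim = 0.3556)

# Family-wise (Holm) and false-discovery-rate (BH) adjustments.
p_holm <- p.adjust(p_raw, method = "holm")
p_bh   <- p.adjust(p_raw, method = "BH")

print(round(cbind(raw = p_raw, holm = p_holm, BH = p_bh), 3))
```

Adjustment can only increase p-values, so the conclusion of no significant PCR-group differences is unchanged; the table simply makes that explicit.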