| Variable | Description |
|---|---|
| ID | Unique identifier for the participant. |
| ID_genetics | Genetic identifier related to the participant’s sample or genetic profile. |
| cognitive | Assessment or overall score of cognitive function. |
| de800cog | Result obtained from a specific cognitive test (e.g., DE800 version). |
| images | Indicates the availability or analysis of images (e.g., neuroimaging). |
| DATE_STUDY | Date when the study was conducted. |
| BIRTH | Participant’s birth date. |
| AGE-2024 | Participant’s age in the year 2024. |
| AGE-PCR | Age of the participant at the time of the PCR test. |
| AGE_INTERVAL | Interval between age assessments (e.g., between the study date and the PCR test). |

| Variable | Description |
|---|---|
| ID | Identifier associated with the symptom evaluation. |
| ANOSMIA | Indicator of loss of smell (anosmia), presented in binary format or on a scale. |
| Anosmia | A second measure or confirmation of the presence of anosmia. |
| RISK-HOSPITAL-ICU | Indicator of the risk of hospitalization or the need for ICU care due to COVID-19. |
| VACCINE_BEFORE_STUDY | Participant’s vaccination status prior to the study. |
| COVID_BEFORE_VACCINATION | History of COVID-19 infection before vaccination. |
| FEVER | Presence of fever. |
| COUGH | Presence of cough. |
| MUSCLE_PAIN | Presence of muscle pain (myalgia). |
| BREATH_DIF | Indication of breathing difficulties (dyspnea). |
| SMELL_LOST | Report of loss of smell. |
| TASTE_LOST | Report of loss of taste. |
| DATE_PCR | Date when the PCR test for COVID-19 was performed. |
| PCR | Qualitative result of the PCR test (e.g., positive or negative). |
| PCR_NUM | Numerical value associated with the PCR test (e.g., Ct value or viral load). |
| COVID-VARIANT | SARS-CoV-2 variant identified (e.g., Alpha, Delta, Omicron). |
| VACCINE_1 | Information regarding the first dose of the vaccine (type or brand). |
| VACCINE_2 | Information regarding the second dose of the vaccine (type or brand). |
| VACCINE_3 | Information regarding the third dose or booster dose of the vaccine. |

| Variable | Description |
|---|---|
| ID | Identifier related to environmental data or COVID-19 exposure. |
| LISTAPRIMERREC | Word-list task: probably the first recall trial ("primer recuerdo"); the exact definition depends on the study protocol. |
| LISTAAPRENDIZAJE | Word-list task: probably the learning score ("aprendizaje") accumulated across trials. |
| LISTACP | Word-list task: probably short-delay recall (CP, "corto plazo"). |
| LISTALP | Word-list task: probably long-delay recall (LP, "largo plazo"); the definition depends on the study protocol. |
| LISTARECON | Word-list task: probably the recognition ("reconocimiento") score. |
| CORSIDIRECTO | Probably the forward ("directo") score of the Corsi block-tapping test. |
| CORSIINVERSO | Probably the backward ("inverso") score of the Corsi block-tapping test. |
| CACTUSCORRECTAS | Number of correct responses obtained in the “cactus” test. |
| CACTUSVIVOS | Number of responses classified as “living” in the “cactus” test. |
| CACTUSINANIM | Number of responses classified as “inanimate” in the “cactus” test. |

| Variable | Description |
|---|---|
| OTVERBALTPO | Response time in the verbal task (OT). |
| OTVERBALERR | Number of errors in the verbal task (OT). |
| OTVISUALTPO | Response time in the visual task (OT). |
| OTVISUALERR | Number of errors in the visual task (OT). |
| OTMENTALTPO | Response time in the mental task (OT). |
| OTMENTALERR | Number of errors in the mental task (OT). |
| OTVISMENTTPO | Response time in the task combining visual and mental stimuli (OT). |
| OTVISMENTERR | Number of errors in the task combining visual and mental stimuli (OT). |
| OTSWITCHTPO | Response time in the switching task (OT evaluation). |
| OTSWITCHERR | Number of errors in the switching task (OT evaluation). |
| 5DREADTPO | Reading time in the designated 5D task. |
| 5DREADERR | Number of errors in the 5D reading task. |
| 5DCOUNTTPO | Time taken in the counting task (5D). |
| 5DCOUNTERR | Number of errors in the counting task (5D). |
| 5DFOCTPO | Execution time in the focus task (5D). |
| 5DFOCERR | Number of errors in the focus task (5D). |
| 5DSWITCHTPO | Response time in the switching task (5D evaluation). |
| 5DSWITCHERR | Number of errors in the switching task (5D evaluation). |
| DSCORR | Number of correct responses in the DS task (e.g., digit span test). |
| DSOMIS | Number of omissions in the DS task. |
| DSCOMIS | Number of commission errors (incorrect responses) in the DS task. |
| TORREMOV | Indicator or number of removals in a tower task (possibly related to response inhibition). |
| TORRETPO | Execution time in the tower task (e.g., Tower of Hanoi or similar). |
| BOSTONSC | Score obtained on the Boston test subscale, possibly related to naming. |
| BOSTONLAT | Latency in the performance of the Boston test. |
| BOSTONSEM | Performance in the semantic component of the Boston test. |
| BOSTONSEMERR | Number of errors in the semantic component of the Boston test. |
| BOSTONFON | Performance in the phonemic component of the Boston test. |
| BOSTONFONERR | Number of errors in the phonemic component of the Boston test. |
| FLUENCIA | Measure of verbal fluency, evaluating the ability to generate words within a specified time. |

library(readxl)
library(tidyverse)
library(caret)
library(janitor)
library(DataExplorer)
library(dlookr)
library(skimr)
library(car) # For Anova()
library(effectsize) # For effect sizes
library(corrplot) # For correlation matrix visualization
# Read the Excel file.
# col_types is positional: one entry per column of the sheet (102 entries in total).
dataset <- read_excel(
  "~/Downloads/en_uso8_last_V10.xlsx",
  sheet = "dataset",
  col_types = c(
    rep("numeric", 6),   # columns 1-6
    "date", "date",      # columns 7-8
    rep("numeric", 3),   # columns 9-11
    "date",              # column 12
    rep("numeric", 11),  # columns 13-23
    "date", "text",      # columns 24-25
    rep("numeric", 77)   # columns 26-102
  )
)
# Rename variables to clean names
dataset <- dataset %>%
  clean_names()
# Subset the data (selecting specific columns)
data <- dataset[c(2:5, 8:10, 12, 14:22, 24:29, 31:37, 39:65, 67, 69:101)]
# Get the names of the columns
# names(data)
# Summary of cognitive data:
skim(data[24:60])
| Name | data[24:60] |
| Number of rows | 463 |
| Number of columns | 37 |
| _______________________ | |
| Column type frequency: | |
| numeric | 37 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| listaprimerrec | 0 | 1.00 | 3.71 | 1.63 | 0.00 | 3.00 | 4.00 | 5.00 | 16.00 | ▇▇▁▁▁ |
| listaaprendizaje | 0 | 1.00 | 23.99 | 7.35 | 0.00 | 19.00 | 24.00 | 29.00 | 41.00 | ▁▂▇▇▂ |
| listacp | 0 | 1.00 | 6.20 | 2.21 | 0.00 | 5.00 | 6.00 | 8.00 | 12.00 | ▁▃▇▃▁ |
| listalp | 0 | 1.00 | 5.48 | 2.38 | 0.00 | 4.00 | 5.00 | 7.00 | 12.00 | ▂▅▇▃▁ |
| listarecon | 0 | 1.00 | 21.21 | 3.27 | 0.00 | 20.00 | 22.00 | 23.00 | 24.00 | ▁▁▁▂▇ |
| corsidirecto | 0 | 1.00 | 5.13 | 1.34 | 0.00 | 4.00 | 5.00 | 6.00 | 11.00 | ▁▃▇▁▁ |
| corsiinverso | 1 | 1.00 | 4.17 | 1.21 | 0.00 | 4.00 | 4.00 | 5.00 | 9.00 | ▁▂▇▂▁ |
| cactusvivos | 0 | 1.00 | 12.67 | 2.56 | 0.00 | 12.00 | 13.00 | 14.00 | 16.00 | ▁▁▁▃▇ |
| cactusinanim | 0 | 1.00 | 15.11 | 2.46 | 0.00 | 14.00 | 16.00 | 17.00 | 17.00 | ▁▁▁▁▇ |
| otverbaltpo | 0 | 1.00 | 185.29 | 11.42 | 99.00 | 180.00 | 189.00 | 192.00 | 215.00 | ▁▁▁▇▃ |
| otverbalerr | 0 | 1.00 | 19.84 | 0.55 | 15.00 | 20.00 | 20.00 | 20.00 | 20.00 | ▁▁▁▁▇ |
| otvisualtpo | 0 | 1.00 | 63.95 | 31.16 | 0.00 | 46.00 | 55.00 | 74.00 | 300.00 | ▇▅▁▁▁ |
| otvisualerr | 0 | 1.00 | 19.70 | 1.42 | 0.00 | 20.00 | 20.00 | 20.00 | 20.00 | ▁▁▁▁▇ |
| otmentaltpo | 0 | 1.00 | 245.58 | 52.95 | 19.00 | 234.50 | 262.00 | 278.00 | 319.00 | ▁▁▁▅▇ |
| otmentalerr | 0 | 1.00 | 18.04 | 3.03 | 0.00 | 17.00 | 19.00 | 20.00 | 20.00 | ▁▁▁▂▇ |
| otvismenttpo | 0 | 1.00 | 240.89 | 49.22 | 32.00 | 226.50 | 256.00 | 272.50 | 332.00 | ▁▁▂▇▃ |
| otvismenterr | 0 | 1.00 | 18.63 | 2.70 | 0.00 | 18.00 | 20.00 | 20.00 | 20.00 | ▁▁▁▁▇ |
| otswitchtpo | 0 | 1.00 | 350.00 | 53.89 | 28.00 | 338.00 | 366.00 | 383.00 | 428.00 | ▁▁▁▃▇ |
| otswitcherr | 0 | 1.00 | 17.65 | 2.95 | 0.00 | 17.00 | 19.00 | 20.00 | 20.00 | ▁▁▁▂▇ |
| x5dreadtpo | 0 | 1.00 | 188.32 | 20.01 | 22.00 | 185.00 | 193.00 | 198.00 | 222.00 | ▁▁▁▂▇ |
| x5dreaderr | 3 | 0.99 | 49.83 | 1.06 | 30.00 | 50.00 | 50.00 | 50.00 | 50.00 | ▁▁▁▁▇ |
| x5dcounttpo | 2 | 1.00 | 165.29 | 14.23 | 68.00 | 162.00 | 168.00 | 173.00 | 200.00 | ▁▁▁▇▃ |
| x5dcounterr | 2 | 1.00 | 49.79 | 1.51 | 19.00 | 50.00 | 50.00 | 50.00 | 50.00 | ▁▁▁▁▇ |
| x5dfoctpo | 2 | 1.00 | 246.04 | 23.31 | 89.00 | 239.00 | 252.00 | 259.00 | 300.00 | ▁▁▁▇▃ |
| x5dfocerr | 2 | 1.00 | 48.28 | 3.01 | 3.00 | 48.00 | 49.00 | 50.00 | 50.00 | ▁▁▁▁▇ |
| x5dswitchtpo | 2 | 1.00 | 271.96 | 35.53 | 0.00 | 257.00 | 280.00 | 295.00 | 350.00 | ▁▁▁▇▇ |
| x5dswitcherr | 2 | 1.00 | 46.58 | 4.59 | 0.00 | 46.00 | 48.00 | 49.00 | 50.00 | ▁▁▁▁▇ |
| dscorr | 0 | 1.00 | 69.16 | 18.11 | 0.00 | 57.00 | 70.00 | 82.00 | 109.00 | ▁▂▆▇▂ |
| dsomis | 0 | 1.00 | 0.35 | 1.95 | 0.00 | 0.00 | 0.00 | 0.00 | 24.00 | ▇▁▁▁▁ |
| dscomis | 0 | 1.00 | 25.93 | 2.25 | 0.00 | 26.00 | 27.00 | 27.00 | 27.00 | ▁▁▁▁▇ |
| torremov | 1 | 1.00 | 341.44 | 3.93 | 323.00 | 340.00 | 343.00 | 344.00 | 351.00 | ▁▁▂▇▁ |
| torretpo | 0 | 1.00 | 264.54 | 74.67 | 2.58 | 240.58 | 287.58 | 316.08 | 352.58 | ▁▁▁▅▇ |
| bostonsc | 0 | 1.00 | 3.20 | 2.93 | 0.00 | 1.00 | 2.00 | 5.00 | 12.00 | ▇▃▂▂▁ |
| bostonlat | 0 | 1.00 | 2.71 | 1.89 | 0.00 | 1.83 | 2.62 | 3.52 | 16.00 | ▇▃▁▁▁ |
| bostonsemerr | 0 | 1.00 | 1.99 | 2.37 | 0.00 | 0.00 | 1.00 | 3.00 | 10.00 | ▇▂▁▁▁ |
| bostonfonerr | 0 | 1.00 | 1.12 | 1.76 | 0.00 | 0.00 | 0.00 | 2.00 | 11.00 | ▇▁▁▁▁ |
| fluencia | 1 | 1.00 | 16.52 | 4.82 | 1.00 | 14.00 | 16.50 | 20.00 | 33.00 | ▁▃▇▂▁ |
# Visualize the missing data pattern:
plot_missing(data[24:60])
# Plot density distributions of numeric variables:
plot_density(data[24:60])
# Diagnose potential data issues
print(diagnose_outlier(data[24:60]), n = 40)
## # A tibble: 37 × 6
## variables outliers_cnt outliers_ratio outliers_mean with_mean without_mean
## <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 listaprimer… 4 0.864 10.8 3.71 3.64
## 2 listaaprend… 4 0.864 0.75 24.0 24.2
## 3 listacp 8 1.73 0 6.20 6.31
## 4 listalp 4 0.864 12 5.48 5.42
## 5 listarecon 16 3.46 8.75 21.2 21.7
## 6 corsidirecto 7 1.51 6 5.13 5.12
## 7 corsiinverso 52 11.2 3.17 4.17 4.3
## 8 cactusvivos 31 6.70 5.81 12.7 13.2
## 9 cactusinanim 20 4.32 6.6 15.1 15.5
## 10 otverbaltpo 20 4.32 157. 185. 187.
## 11 otverbalerr 49 10.6 18.5 19.8 20
## 12 otvisualtpo 29 6.26 142. 64.0 58.7
## 13 otvisualerr 75 16.2 18.2 19.7 20
## 14 otmentaltpo 44 9.50 110. 246. 260.
## 15 otmentalerr 18 3.89 7 18.0 18.5
## 16 otvismenttpo 29 6.26 97.7 241. 250.
## 17 otvismenterr 24 5.18 10.0 18.6 19.1
## 18 otswitchtpo 30 6.48 197. 350. 361.
## 19 otswitcherr 21 4.54 8.43 17.7 18.1
## 20 x5dreadtpo 29 6.26 138. 188. 192.
## 21 x5dreaderr 38 8.21 47.9 49.8 50
## 22 x5dcounttpo 32 6.91 135. 165. 168.
## 23 x5dcounterr 46 9.94 47.9 49.8 50
## 24 x5dfoctpo 28 6.05 194. 246. 249.
## 25 x5dfocerr 21 4.54 39 48.3 48.7
## 26 x5dswitchtpo 19 4.10 160. 272. 277.
## 27 x5dswitcherr 31 6.70 34.4 46.6 47.5
## 28 dscorr 3 0.648 11.3 69.2 69.5
## 29 dsomis 46 9.94 3.54 0.352 0
## 30 dscomis 68 14.7 21.8 25.9 26.6
## 31 torremov 18 3.89 330. 341. 342.
## 32 torretpo 37 7.99 66.8 265. 282.
## 33 bostonsc 4 0.864 12 3.20 3.12
## 34 bostonlat 19 4.10 8.04 2.71 2.48
## 35 bostonsemerr 17 3.67 8.76 1.99 1.73
## 36 bostonfonerr 22 4.75 6.64 1.12 0.846
## 37 fluencia 8 1.73 17.6 16.5 16.5
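The high outlier ratios for several error counts (for example otvisualerr at 16.2%) are at least partly a ceiling effect: when the 25th, 50th and 75th percentiles are all equal, the interquartile range is zero and every other value gets flagged. The sketch below illustrates this with made-up scores. It assumes the flagging follows the usual boxplot rule (1.5 × IQR beyond the hinges) as implemented in base R's `boxplot.stats()`; whether `diagnose_outlier()` uses exactly this rule should be checked in the dlookr documentation.

```r
# Toy scores (not study data): most values sit at the ceiling of 20.
scores <- c(rep(20, 17), 19, 18, 15)

# All quartiles equal 20, so the interquartile range is 0.
iqr <- IQR(scores)

# boxplot.stats() flags values lying more than 1.5 * IQR beyond the hinges.
# With IQR = 0, every value different from 20 is flagged.
flagged <- boxplot.stats(scores)$out

print(iqr)
print(flagged)
```

For variables like these, the outlier count says more about the shape of the distribution than about data-entry problems, so flagged values should be reviewed rather than removed automatically.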
# Explore missing data patterns
print(diagnose(data[24:60]), n = 40)
## # A tibble: 37 × 6
## variables types missing_count missing_percent unique_count unique_rate
## <chr> <chr> <int> <dbl> <int> <dbl>
## 1 listaprimerrec nume… 0 0 11 0.0238
## 2 listaaprendizaje nume… 0 0 39 0.0842
## 3 listacp nume… 0 0 13 0.0281
## 4 listalp nume… 0 0 13 0.0281
## 5 listarecon nume… 0 0 17 0.0367
## 6 corsidirecto nume… 0 0 11 0.0238
## 7 corsiinverso nume… 1 0.216 11 0.0238
## 8 cactusvivos nume… 0 0 14 0.0302
## 9 cactusinanim nume… 0 0 13 0.0281
## 10 otverbaltpo nume… 0 0 49 0.106
## 11 otverbalerr nume… 0 0 5 0.0108
## 12 otvisualtpo nume… 0 0 99 0.214
## 13 otvisualerr nume… 0 0 7 0.0151
## 14 otmentaltpo nume… 0 0 136 0.294
## 15 otmentalerr nume… 0 0 14 0.0302
## 16 otvismenttpo nume… 0 0 128 0.276
## 17 otvismenterr nume… 0 0 13 0.0281
## 18 otswitchtpo nume… 0 0 134 0.289
## 19 otswitcherr nume… 0 0 14 0.0302
## 20 x5dreadtpo nume… 0 0 65 0.140
## 21 x5dreaderr nume… 3 0.648 7 0.0151
## 22 x5dcounttpo nume… 2 0.432 59 0.127
## 23 x5dcounterr nume… 2 0.432 7 0.0151
## 24 x5dfoctpo nume… 2 0.432 84 0.181
## 25 x5dfocerr nume… 2 0.432 15 0.0324
## 26 x5dswitchtpo nume… 2 0.432 118 0.255
## 27 x5dswitcherr nume… 2 0.432 21 0.0454
## 28 dscorr nume… 0 0 82 0.177
## 29 dsomis nume… 0 0 12 0.0259
## 30 dscomis nume… 0 0 13 0.0281
## 31 torremov nume… 1 0.216 20 0.0432
## 32 torretpo nume… 0 0 151 0.326
## 33 bostonsc nume… 0 0 13 0.0281
## 34 bostonlat nume… 0 0 147 0.317
## 35 bostonsemerr nume… 0 0 11 0.0238
## 36 bostonfonerr nume… 0 0 10 0.0216
## 37 fluencia nume… 1 0.216 30 0.0648
# Summary of the COVID data
summary(data[c(2:23)])
## cognitive de800cog images age_2024 age_pcr
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :12.0 Min. :53.00
## 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:65.0 1st Qu.:62.00
## Median :2.000 Median :2.000 Median :2.000 Median :69.0 Median :65.00
## Mean :1.501 Mean :2.352 Mean :1.877 Mean :69.7 Mean :66.48
## 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:73.0 3rd Qu.:70.00
## Max. :2.000 Max. :4.000 Max. :3.000 Max. :90.0 Max. :86.00
## NA's :13
## age_interval anosmia risk_hospital_icu vaccine_before_study
## Min. :1.000 Min. :0.000 Min. :0.0000 Min. :0.000
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:1.000
## Median :2.000 Median :2.000 Median :0.0000 Median :2.000
## Mean :1.922 Mean :1.734 Mean :0.3024 Mean :1.482
## 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:0.0000 3rd Qu.:2.000
## Max. :3.000 Max. :3.000 Max. :3.0000 Max. :3.000
## NA's :1
## covid_before_vaccination fever cough muscle_pain
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :0.0000 Median :1.0000 Median :1.0000
## Mean :0.5487 Mean :0.4323 Mean :0.5178 Mean :0.6081
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## NA's :42 NA's :42 NA's :42 NA's :42
## breath_dif smell_lost taste_lost pcr
## Min. :0.0000 Min. :0.0000 Min. :0.000 Length:463
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.000 Class :character
## Median :0.0000 Median :0.0000 Median :0.000 Mode :character
## Mean :0.3967 Mean :0.4561 Mean :0.399
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.000
## Max. :1.0000 Max. :1.0000 Max. :1.000
## NA's :42 NA's :42 NA's :42
## pcr_num covid_variant vaccine_1 vaccine_2
## Min. :0.0000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:1.0000 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:1.000
## Median :1.0000 Median :1.000 Median :2.000 Median :2.000
## Mean :0.8377 Mean :1.348 Mean :1.996 Mean :1.819
## 3rd Qu.:1.0000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:2.000
## Max. :1.0000 Max. :7.000 Max. :5.000 Max. :6.000
## NA's :1
## vaccine_3
## Min. :0.000
## 1st Qu.:0.000
## Median :1.000
## Mean :1.726
## 3rd Qu.:2.000
## Max. :6.000
##
# Summary of the cognitive data
summary(data[24:60])
## listaprimerrec listaaprendizaje listacp listalp
## Min. : 0.000 Min. : 0.00 Min. : 0.000 Min. : 0.000
## 1st Qu.: 3.000 1st Qu.:19.00 1st Qu.: 5.000 1st Qu.: 4.000
## Median : 4.000 Median :24.00 Median : 6.000 Median : 5.000
## Mean : 3.706 Mean :23.99 Mean : 6.203 Mean : 5.477
## 3rd Qu.: 5.000 3rd Qu.:29.00 3rd Qu.: 8.000 3rd Qu.: 7.000
## Max. :16.000 Max. :41.00 Max. :12.000 Max. :12.000
##
## listarecon corsidirecto corsiinverso cactusvivos
## Min. : 0.00 Min. : 0.000 Min. :0.000 Min. : 0.00
## 1st Qu.:20.00 1st Qu.: 4.000 1st Qu.:4.000 1st Qu.:12.00
## Median :22.00 Median : 5.000 Median :4.000 Median :13.00
## Mean :21.21 Mean : 5.134 Mean :4.173 Mean :12.67
## 3rd Qu.:23.00 3rd Qu.: 6.000 3rd Qu.:5.000 3rd Qu.:14.00
## Max. :24.00 Max. :11.000 Max. :9.000 Max. :16.00
## NA's :1
## cactusinanim otverbaltpo otverbalerr otvisualtpo
## Min. : 0.00 Min. : 99.0 Min. :15.00 Min. : 0.00
## 1st Qu.:14.00 1st Qu.:180.0 1st Qu.:20.00 1st Qu.: 46.00
## Median :16.00 Median :189.0 Median :20.00 Median : 55.00
## Mean :15.11 Mean :185.3 Mean :19.84 Mean : 63.95
## 3rd Qu.:17.00 3rd Qu.:192.0 3rd Qu.:20.00 3rd Qu.: 74.00
## Max. :17.00 Max. :215.0 Max. :20.00 Max. :300.00
##
## otvisualerr otmentaltpo otmentalerr otvismenttpo otvismenterr
## Min. : 0.0 Min. : 19.0 Min. : 0.00 Min. : 32.0 Min. : 0.00
## 1st Qu.:20.0 1st Qu.:234.5 1st Qu.:17.00 1st Qu.:226.5 1st Qu.:18.00
## Median :20.0 Median :262.0 Median :19.00 Median :256.0 Median :20.00
## Mean :19.7 Mean :245.6 Mean :18.04 Mean :240.9 Mean :18.63
## 3rd Qu.:20.0 3rd Qu.:278.0 3rd Qu.:20.00 3rd Qu.:272.5 3rd Qu.:20.00
## Max. :20.0 Max. :319.0 Max. :20.00 Max. :332.0 Max. :20.00
##
## otswitchtpo otswitcherr x5dreadtpo x5dreaderr x5dcounttpo
## Min. : 28 Min. : 0.00 Min. : 22.0 Min. :30.00 Min. : 68.0
## 1st Qu.:338 1st Qu.:17.00 1st Qu.:185.0 1st Qu.:50.00 1st Qu.:162.0
## Median :366 Median :19.00 Median :193.0 Median :50.00 Median :168.0
## Mean :350 Mean :17.65 Mean :188.3 Mean :49.83 Mean :165.3
## 3rd Qu.:383 3rd Qu.:20.00 3rd Qu.:198.0 3rd Qu.:50.00 3rd Qu.:173.0
## Max. :428 Max. :20.00 Max. :222.0 Max. :50.00 Max. :200.0
## NA's :3 NA's :2
## x5dcounterr x5dfoctpo x5dfocerr x5dswitchtpo x5dswitcherr
## Min. :19.00 Min. : 89 Min. : 3.00 Min. : 0 Min. : 0.00
## 1st Qu.:50.00 1st Qu.:239 1st Qu.:48.00 1st Qu.:257 1st Qu.:46.00
## Median :50.00 Median :252 Median :49.00 Median :280 Median :48.00
## Mean :49.79 Mean :246 Mean :48.28 Mean :272 Mean :46.58
## 3rd Qu.:50.00 3rd Qu.:259 3rd Qu.:50.00 3rd Qu.:295 3rd Qu.:49.00
## Max. :50.00 Max. :300 Max. :50.00 Max. :350 Max. :50.00
## NA's :2 NA's :2 NA's :2 NA's :2 NA's :2
## dscorr dsomis dscomis torremov
## Min. : 0.00 Min. : 0.0000 Min. : 0.00 Min. :323.0
## 1st Qu.: 57.00 1st Qu.: 0.0000 1st Qu.:26.00 1st Qu.:340.0
## Median : 70.00 Median : 0.0000 Median :27.00 Median :343.0
## Mean : 69.16 Mean : 0.3521 Mean :25.93 Mean :341.4
## 3rd Qu.: 82.00 3rd Qu.: 0.0000 3rd Qu.:27.00 3rd Qu.:344.0
## Max. :109.00 Max. :24.0000 Max. :27.00 Max. :351.0
## NA's :1
## torretpo bostonsc bostonlat bostonsemerr
## Min. : 2.58 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.:240.58 1st Qu.: 1.000 1st Qu.: 1.830 1st Qu.: 0.000
## Median :287.58 Median : 2.000 Median : 2.620 Median : 1.000
## Mean :264.54 Mean : 3.201 Mean : 2.706 Mean : 1.991
## 3rd Qu.:316.08 3rd Qu.: 5.000 3rd Qu.: 3.520 3rd Qu.: 3.000
## Max. :352.58 Max. :12.000 Max. :16.000 Max. :10.000
##
## bostonfonerr fluencia
## Min. : 0.000 Min. : 1.00
## 1st Qu.: 0.000 1st Qu.:14.00
## Median : 0.000 Median :16.50
## Mean : 1.121 Mean :16.52
## 3rd Qu.: 2.000 3rd Qu.:20.00
## Max. :11.000 Max. :33.00
## NA's :1
# Summary of the brain volume data
summary(data[71:91])
## right_thalamus_proper left_thalamus_proper fornix_right fornix_left
## Min. :4349 Min. :4478 Min. : 67.0 Min. :105.0
## 1st Qu.:6449 1st Qu.:6800 1st Qu.:324.0 1st Qu.:399.5
## Median :6859 Median :7203 Median :361.0 Median :453.0
## Mean :6871 Mean :7231 Mean :358.6 Mean :446.2
## 3rd Qu.:7265 3rd Qu.:7658 3rd Qu.:394.0 3rd Qu.:496.0
## Max. :9039 Max. :9560 Max. :541.0 Max. :694.0
## anterior_limb_of_internal_capsule_right anterior_limb_of_internal_capsule_left
## Min. :1849 Min. :1438
## 1st Qu.:2842 1st Qu.:2440
## Median :3071 Median :2673
## Mean :3099 Mean :2687
## 3rd Qu.:3336 3rd Qu.:2919
## Max. :4742 Max. :4213
## posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right
## Min. :1571
## 1st Qu.:2194
## Median :2349
## Mean :2384
## 3rd Qu.:2557
## Max. :4129
## posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left corpus_callosum
## Min. :1455 Min. : 6119
## 1st Qu.:2155 1st Qu.: 9718
## Median :2333 Median :10789
## Mean :2365 Mean :10885
## 3rd Qu.:2550 3rd Qu.:11905
## Max. :3814 Max. :17580
## right_a_cg_g_anterior_cingulate_gyrus left_a_cg_g_anterior_cingulate_gyrus
## Min. :1901 Min. :2210
## 1st Qu.:2962 1st Qu.:3522
## Median :3316 Median :3896
## Mean :3392 Mean :3953
## 3rd Qu.:3772 3rd Qu.:4374
## Max. :5867 Max. :6185
## right_a_ins_anterior_insula left_a_ins_anterior_insula
## Min. :1537 Min. :1562
## 1st Qu.:2925 1st Qu.:2907
## Median :3186 Median :3165
## Mean :3206 Mean :3177
## 3rd Qu.:3493 3rd Qu.:3450
## Max. :4521 Max. :5098
## right_an_g_angular_gyrus left_an_g_angular_gyrus right_cun_cuneus
## Min. : 4960 Min. : 4715 Min. :2653
## 1st Qu.: 7492 1st Qu.: 6732 1st Qu.:4556
## Median : 8189 Median : 7401 Median :5139
## Mean : 8312 Mean : 7447 Mean :5188
## 3rd Qu.: 9116 3rd Qu.: 8115 3rd Qu.:5791
## Max. :13560 Max. :11434 Max. :8236
## left_cun_cuneus right_ent_entorhinal_area left_ent_entorhinal_area
## Min. :2627 Min. : 749 Min. : 796
## 1st Qu.:4551 1st Qu.:2131 1st Qu.:1856
## Median :5085 Median :2372 Median :2070
## Mean :5144 Mean :2424 Mean :2109
## 3rd Qu.:5652 3rd Qu.:2704 3rd Qu.:2336
## Max. :8628 Max. :3989 Max. :4156
## right_g_re_gyrus_rectus left_g_re_gyrus_rectus
## Min. :1253 Min. :1189
## 1st Qu.:1752 1st Qu.:1774
## Median :1955 Median :1956
## Mean :1972 Mean :1978
## 3rd Qu.:2166 3rd Qu.:2152
## Max. :2912 Max. :3072
# Convert relevant columns to factors
factor_cols <- c(
  "pcr", "anosmia", "risk_hospital_icu", "vaccine_before_study",
  "covid_before_vaccination", "fever", "cough", "muscle_pain",
  "breath_dif", "smell_lost", "taste_lost", "covid_variant",
  "vaccine_1", "vaccine_2", "vaccine_3"
)
# Check which factor_cols actually exist in the data
existing_factor_cols <- factor_cols[factor_cols %in% names(data)]
# Apply as.factor only to existing columns
if (length(existing_factor_cols) > 0) {
  data <- data %>%
    mutate(across(all_of(existing_factor_cols), as.factor))
}
# --- Analysis Questions & Code Examples ---
# Question 1: Does COVID-19 infection (positive PCR) impact overall cognitive scores?
# Assuming 'cognitive' is a primary numeric outcome. Adjust variable name if needed.
# Check normality assumption first (e.g., Shapiro-Wilk test, histograms).
# Note: the pcr levels in this dataset are "NEGATIVA" and "POSITIVA".
# shapiro.test(data$cognitive[data$pcr == "POSITIVA"]) # Example for one group
# shapiro.test(data$cognitive[data$pcr == "NEGATIVA"]) # Example for other group
# hist(data$cognitive)
# Run a t-test; the Wilcoxon test is only attempted if the t-test throws an error.
if ("cognitive" %in% names(data) && "pcr" %in% names(data)) {
  print("--- Q1: Cognitive Score vs PCR Status (T-test/Wilcoxon) ---")
  # Check levels of pcr factor
  print(levels(data$pcr))
  # Ensure levels are correctly specified for comparison if needed
  tryCatch({
    ttest_result <- t.test(cognitive ~ pcr, data = data)
    print(ttest_result)
  }, error = function(e) {
    print(paste("T-test failed:", e$message))
    print("Attempting Wilcoxon test instead...")
    tryCatch({
      wilcox_result <- wilcox.test(cognitive ~ pcr, data = data)
      print(wilcox_result)
    }, error = function(e2) {
      print(paste("Wilcoxon test also failed:", e2$message))
    })
  })
  # Linear model controlling for age (assuming age_pcr is relevant age)
  if ("age_pcr" %in% names(data)) {
    print("--- Q1: Cognitive Score vs PCR Status controlling for Age (LM) ---")
    lm_model_pcr_age <- lm(cognitive ~ pcr + age_pcr, data = data)
    print(summary(lm_model_pcr_age))
  }
} else {
  print("Skipping Q1 analysis: 'cognitive' or 'pcr' column not found.")
}
## [1] "--- Q1: Cognitive Score vs PCR Status (T-test/Wilcoxon) ---"
## [1] "NEGATIVA" "POSITIVA"
##
## Welch Two Sample t-test
##
## data: cognitive by pcr
## t = 1.3513, df = 105.05, p-value = 0.1795
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
## -0.03970381 0.20962630
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA
## 1.573333 1.488372
##
## [1] "--- Q1: Cognitive Score vs PCR Status controlling for Age (LM) ---"
##
## Call:
## lm(formula = cognitive ~ pcr + age_pcr, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8350 -0.4707 0.2359 0.4252 0.8676
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.325410 0.290119 11.462 < 2e-16 ***
## pcrPOSITIVA -0.111318 0.060860 -1.829 0.068 .
## age_pcr -0.026021 0.004229 -6.153 1.66e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4812 on 459 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.07983, Adjusted R-squared: 0.07582
## F-statistic: 19.91 on 2 and 459 DF, p-value: 5.107e-09
The simple comparison (Welch t-test) found no significant difference in the cognitive score between the PCR-negative and PCR-positive groups (means 1.57 vs. 1.49, p = 0.18).
In the linear model that controls for age (age_pcr), age is strongly and negatively associated with the cognitive score (about 0.026 points lower per year of age, p < 0.001), as expected. The coefficient for PCR status (pcrPOSITIVA = -0.11) points towards lower scores in the positive group but does not reach statistical significance (p = 0.068). Age is therefore the much stronger factor, and any direct effect of PCR status on this general cognitive score is weak or would need more power to detect. The model as a whole explains only about 8% of the variance (R-squared = 0.08).
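Because the summary shows cognitive ranging only from 1 to 2, it may be a two-category classification rather than a continuous score. If so, a logistic regression is a more natural model than a t-test or `lm()`. The following is a minimal sketch on simulated data, not a re-analysis: it assumes the coding is exactly 1/2 (the meaning of each code is not stated here) and simply reuses the column names pcr and age_pcr.

```r
set.seed(1)

# Simulated stand-in for the study data (all values are made up).
n <- 200
sim <- data.frame(
  pcr     = factor(sample(c("NEGATIVA", "POSITIVA"), n, replace = TRUE)),
  age_pcr = round(runif(n, 53, 86))
)

# Assumed coding: cognitive is 1 or 2; higher age makes a 2 less likely.
p_two <- plogis(6 - 0.09 * sim$age_pcr)
sim$cognitive <- ifelse(runif(n) < p_two, 2, 1)

# Model the probability that cognitive == 2.
sim$cog_is_2 <- as.integer(sim$cognitive == 2)
fit <- glm(cog_is_2 ~ pcr + age_pcr, data = sim, family = binomial)

# Odds ratios with Wald 95% confidence intervals.
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
print(round(or_table, 3))
```

On the real data the same call would use `data` in place of `sim`, after confirming what the two cognitive codes stand for.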
# Question 2: Is there a difference in specific cognitive domains (e.g., fluency, memory) between COVID+ and COVID- groups?
# Example using 'fluencia' and 'dscorr'
if (all(c("fluencia", "dscorr", "pcr") %in% names(data))) {
  print("--- Q2: Specific Cognitive Domains vs PCR Status ---")
  # Individual tests (repeat for other domains, consider p-value adjustment)
  print("Fluency:")
  tryCatch({
    print(t.test(fluencia ~ pcr, data = data))
  }, error = function(e) {
    print(paste("Error:", e$message))
  })
  print("DS Correct:")
  tryCatch({
    print(t.test(dscorr ~ pcr, data = data))
  }, error = function(e) {
    print(paste("Error:", e$message))
  })
  # MANOVA (Multivariate Analysis of Variance) - checks multiple DVs at once
  # Ensure no missing values in the selected columns for MANOVA
  print("MANOVA for Fluency & DS Correct:")
  manova_data <- data %>%
    select(fluencia, dscorr, pcr) %>%
    na.omit()
  if (nrow(manova_data) > 0 && length(unique(manova_data$pcr)) > 1) {
    manova_result <- manova(cbind(fluencia, dscorr) ~ pcr, data = manova_data)
    print(summary(manova_result))
    print(summary.aov(manova_result)) # To see univariate results
  } else {
    print("Insufficient data or factor levels for MANOVA.")
  }
} else {
  print("Skipping Q2 analysis: 'fluencia', 'dscorr', or 'pcr' column not found.")
}
## [1] "--- Q2: Specific Cognitive Domains vs PCR Status ---"
## [1] "Fluency:"
##
## Welch Two Sample t-test
##
## data: fluencia by pcr
## t = -2.7437, df = 99.602, p-value = 0.007207
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
## -3.0291666 -0.4866883
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA
## 15.04000 16.79793
##
## [1] "DS Correct:"
##
## Welch Two Sample t-test
##
## data: dscorr by pcr
## t = -0.57336, df = 111.6, p-value = 0.5676
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
## -5.479240 3.019912
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA
## 68.09333 69.32300
##
## [1] "MANOVA for Fluency & DS Correct:"
## Df Pillai approx F num Df den Df Pr(>F)
## pcr 1 0.021077 4.9305 2 458 0.007611 **
## Residuals 459
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Response fluencia :
## Df Sum Sq Mean Sq F value Pr(>F)
## pcr 1 194.1 194.066 8.4713 0.003783 **
## Residuals 459 10515.1 22.909
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response dscorr :
## Df Sum Sq Mean Sq F value Pr(>F)
## pcr 1 100 100.36 0.305 0.581
## Residuals 459 151035 329.05
Fluency (fluencia): Surprisingly, the PCR-positive group showed significantly higher verbal fluency scores than the negative group (means 16.80 vs. 15.04, p = 0.007). This is counter-intuitive if COVID is expected to impair cognition, and it warrants further investigation (see Next Steps).
Memory (dscorr, DS Correct): No significant difference was found between groups (p = 0.57).
MANOVA: The overall multivariate test was significant (Pillai's trace = 0.021, p = 0.0076), indicating a difference between PCR groups when fluency and DS Correct are considered together. The univariate follow-up tests show that this difference comes from fluencia alone (p = 0.004 for fluencia, p = 0.58 for dscorr).
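Since each domain is tested separately, the p-values can be adjusted for multiple comparisons, as the code comment suggests. The sketch below applies Holm's method to the two Welch t-test p-values reported above; Holm is only one reasonable choice among the methods offered by `p.adjust()`.

```r
# Unadjusted p-values copied from the two Welch t-tests above.
p_raw <- c(fluencia = 0.007207, dscorr = 0.5676)

# Holm's step-down adjustment controls the family-wise error rate.
p_holm <- p.adjust(p_raw, method = "holm")

print(p_holm)
```

With only two tests, the fluency result stays below 0.05 after adjustment (adjusted p of about 0.014). The correction becomes stricter as more cognitive variables are added to the family of tests.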
# Question 3: Does the presence of specific symptoms (e.g., anosmia) correlate with cognitive performance?
# Using 'anosmia' factor and 'cognitive' score
if (all(c("cognitive", "anosmia", "age_pcr") %in% names(data))) {
  print("--- Q3: Cognitive Score vs Anosmia ---")
  print(levels(data$anosmia)) # Check levels
  # Compare cognitive scores based on anosmia presence (assuming binary factor)
  tryCatch({
    print(t.test(cognitive ~ anosmia, data = data))
  }, error = function(e) {
    print(paste("Error:", e$message))
  })
  # Linear model controlling for age
  print("Linear Model for Cognitive Score ~ Anosmia + Age:")
  lm_model_anosmia_age <- lm(cognitive ~ anosmia + age_pcr, data = data)
  print(summary(lm_model_anosmia_age))
} else {
  print("Skipping Q3 analysis: 'cognitive', 'anosmia', or 'age_pcr' column not found.")
}
## [1] "--- Q3: Cognitive Score vs Anosmia ---"
## [1] "0" "1" "2" "3"
## [1] "Error: grouping factor must have exactly 2 levels"
## [1] "Linear Model for Cognitive Score ~ Anosmia + Age:"
##
## Call:
## lm(formula = cognitive ~ anosmia + age_pcr, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.9277 -0.4598 0.1478 0.4395 0.8791
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.911246 0.290696 10.015 < 2e-16 ***
## anosmia1 0.098567 0.079292 1.243 0.214469
## anosmia2 0.232995 0.060789 3.833 0.000144 ***
## anosmia3 0.086466 0.059951 1.442 0.149911
## age_pcr -0.022953 0.004242 -5.411 1.01e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4758 on 457 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.1044, Adjusted R-squared: 0.09654
## F-statistic: 13.31 on 4 and 457 DF, p-value: 2.859e-10
# In the Q3 block, replace the t.test line with:
print("ANOVA for Cognitive Score ~ Anosmia:")
## [1] "ANOVA for Cognitive Score ~ Anosmia:"
tryCatch({
  aov_anosmia <- aov(cognitive ~ anosmia, data = data)
  print(summary(aov_anosmia))
  # Optional: Post-hoc tests if ANOVA is significant
  if (summary(aov_anosmia)[[1]]$`Pr(>F)`[1] < 0.05) {
    print(TukeyHSD(aov_anosmia))
  }
}, error = function(e) {
  print(paste("Error:", e$message))
})
## Df Sum Sq Mean Sq F value Pr(>F)
## anosmia 3 5.43 1.8095 7.529 6.3e-05 ***
## Residuals 458 110.07 0.2403
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = cognitive ~ anosmia, data = data)
##
## $anosmia
## diff lwr upr p adj
## 1-0 0.15129274 -0.05777993 0.36036542 0.2442834
## 2-0 0.28409373 0.12454535 0.44364210 0.0000337
## 3-0 0.10220183 -0.05689478 0.26129845 0.3481916
## 2-1 0.13280098 -0.06681980 0.33242177 0.3167972
## 3-1 -0.04909091 -0.24835080 0.15016898 0.9206020
## 3-2 -0.18189189 -0.32834601 -0.03543778 0.0079235
Linear model (controlling for age): Age is again significant (p < 0.001). Compared with the baseline anosmia level 0, anosmia level 2 is associated with significantly higher cognitive scores (estimate = 0.23, p = 0.00014). Levels 1 and 3 were not significantly different from level 0. The unadjusted Tukey comparisons point the same way: only 2 vs 0 (p < 0.001) and 3 vs 2 (p = 0.008) are significant. This is also counter-intuitive and needs careful checking.
However, I don’t know what the levels of anosmia mean (anosmia1, anosmia2, anosmia3), so the direction of this effect cannot be interpreted until the coding is confirmed.
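One way to start decoding the anosmia levels is to cross-tabulate them against variables whose meaning is known, such as smell_lost and pcr. The sketch below runs on a small simulated data frame: all values are invented, and the "NO"/"SI" coding of smell_lost is an assumption, not something taken from the study data.

```r
# Simulated stand-in for the study data; every value here is invented.
set.seed(1)
toy <- data.frame(
  anosmia    = factor(sample(0:3, 200, replace = TRUE)),
  smell_lost = factor(sample(c("NO", "SI"), 200, replace = TRUE)),  # assumed coding
  pcr        = factor(sample(c("NEGATIVA", "POSITIVA"), 200, replace = TRUE))
)

# Counts of each anosmia level within the self-reported smell-loss groups.
tab_smell <- table(anosmia = toy$anosmia, smell_lost = toy$smell_lost, useNA = "ifany")
print(tab_smell)

# Row proportions: share of each anosmia level that reported smell loss.
prop_smell <- prop.table(tab_smell, margin = 1)
print(round(prop_smell, 2))

# Same check against PCR status.
tab_pcr <- table(anosmia = toy$anosmia, pcr = toy$pcr, useNA = "ifany")
print(tab_pcr)
```

On the real data, replacing toy with data would show, for instance, whether one anosmia level consists mostly of participants who never reported smell loss, which would help identify the reference category.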
# Question 4: Does vaccination status influence cognitive outcomes, potentially interacting with COVID status?
# Requires 'num_doses' variable created earlier, and assumes 'cognitive', 'pcr', 'age_pcr' exist
if (all(c("cognitive", "pcr", "age_pcr") %in% names(data)) && "num_doses" %in% names(data)) {
print("--- Q4: Cognitive Score vs Vaccination & PCR Status ---")
# ANOVA/Linear Model including interaction
# Using car::Anova for Type III sums of squares, often preferred with interactions
lm_interaction <- lm(cognitive ~ num_doses * pcr + age_pcr, data = data)
print("ANOVA Table (Type III SS):")
print(car::Anova(lm_interaction, type = "III")) # namespace-qualified so it works without library(car)
print("Model Summary:")
print(summary(lm_interaction))
# Visualize interaction if significant (example)
# ggplot(data, aes(x = pcr, y = cognitive, color = num_doses, group = num_doses)) +
# stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.1) +
# stat_summary(fun = mean, geom = "line") +
# stat_summary(fun = mean, geom = "point") +
# labs(title = "Interaction Plot: Cognitive Score by PCR Status and Vaccination",
# x = "PCR Status", y = "Mean Cognitive Score", color = "Vaccine Doses") +
# theme_minimal()
} else {
print("Skipping Q4 analysis: 'cognitive', 'pcr', 'age_pcr', or 'num_doses' column not found or not created.")
}
## [1] "Skipping Q4 analysis: 'cognitive', 'pcr', 'age_pcr', or 'num_doses' column not found or not created."
NOTE: Q4 was skipped because the num_doses variable has not been created yet. We will try again later.
# Question 5: Are there differences in cognitive performance based on COVID-19 variant?
# Requires 'covid_variant', 'cognitive', 'age_pcr'. May need to control for vaccination ('num_doses').
if (all(c("cognitive", "covid_variant", "age_pcr") %in% names(data))) {
print("--- Q5: Cognitive Score vs COVID Variant ---")
print(levels(data$covid_variant)) # Check levels/variants present
# Filter out variants with very few cases if necessary for stable analysis
variant_counts <- table(data$covid_variant)
print("Variant Counts:")
print(variant_counts)
# data_filtered_variants <- data %>% filter(covid_variant %in% names(variant_counts[variant_counts > 10])) # Example threshold
# ANOVA model (using original data or filtered data)
# Add other covariates like num_doses if available and relevant
control_vars <- "age_pcr"
if ("num_doses" %in% names(data)) {
control_vars <- paste(control_vars, "+ num_doses")
}
formula_q5 <- as.formula(paste("cognitive ~ covid_variant +", control_vars))
aov_model_variant <- aov(formula_q5, data = data)
print("ANOVA Summary:")
print(summary(aov_model_variant))
# Post-hoc tests if ANOVA is significant
if (summary(aov_model_variant)[[1]]$`Pr(>F)`[1] < 0.05) {
print("Post-hoc Tests (Tukey HSD):")
print(TukeyHSD(aov_model_variant, which = "covid_variant"))
}
} else {
print("Skipping Q5 analysis: 'cognitive', 'covid_variant', or 'age_pcr' column not found.")
}
## [1] "--- Q5: Cognitive Score vs COVID Variant ---"
## [1] "0" "1" "2" "3" "4" "5" "6" "7"
## [1] "Variant Counts:"
##
## 0 1 2 3 4 5 6 7
## 81 222 94 59 1 4 1 1
## [1] "ANOVA Summary:"
## Df Sum Sq Mean Sq F value Pr(>F)
## covid_variant 7 1.24 0.177 0.754 0.626
## age_pcr 1 8.12 8.119 34.648 7.69e-09 ***
## Residuals 454 106.39 0.234
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
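The ANOVA found no significant differences in the cognitive score between covid_variant groups (p = 0.626). Note that aov() reports sequential (Type I) sums of squares and covid_variant is listed first, so the variant sum of squares is computed before age enters the model; age_pcr, entered second, remains highly significant (p < 0.001).
Caveat: Some variants had very few participants (levels 4, 6, and 7 have n = 1 each; level 5 has n = 4), making comparisons involving them unreliable. We might consider grouping rare variants or excluding them. Let’s check later.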
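An age-adjusted version of the variant test can be read from marginal F tests, where each term is tested after all the others are in the model. The following is a minimal sketch on simulated data (all values invented; only the column names follow this analysis), using base R's drop1() rather than any particular Type II/III implementation.

```r
set.seed(2)
n <- 300
toy <- data.frame(
  covid_variant = factor(sample(0:3, n, replace = TRUE)),
  age_pcr       = round(runif(n, 40, 80))
)
# Outcome depends on age only; variant has no true effect in this simulation.
toy$cognitive <- 3 - 0.02 * toy$age_pcr + rnorm(n, sd = 0.5)

fit <- lm(cognitive ~ covid_variant + age_pcr, data = toy)

# Sequential (Type I) table: covid_variant is tested before age_pcr is added.
seq_tab <- anova(fit)
print(seq_tab)

# Marginal tests: each term is tested after all other terms are in the model.
marg_tab <- drop1(fit, test = "F")
print(marg_tab)
```

For the last term in the formula (here age_pcr) the two tables agree; for covid_variant they can differ, and the marginal row is the one that corresponds to "controlling for age".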
# Question 6: How does age relate to cognitive performance? Does COVID history modify this?
if (all(c("cognitive", "age_pcr", "pcr") %in% names(data))) {
print("--- Q6: Age, Cognitive Score, and PCR Interaction ---")
# Correlation
cor_age_cog <- cor.test(~ cognitive + age_pcr, data = data)
print("Correlation between Age and Cognitive Score:")
print(cor_age_cog)
# Linear model with interaction term
lm_age_interaction <- lm(cognitive ~ age_pcr * pcr, data = data)
print("Linear Model with Age * PCR Interaction:")
print(summary(lm_age_interaction))
# Visualize the relationship (optional)
# ggplot(data, aes(x = age_pcr, y = cognitive, color = pcr)) +
# geom_point(alpha = 0.5) +
# geom_smooth(method = "lm", aes(fill = pcr), alpha = 0.1) + # Add regression lines
# labs(title = "Cognitive Score vs Age by PCR Status",
# x = "Age at PCR", y = "Cognitive Score") +
# theme_minimal()
} else {
print("Skipping Q6 analysis: 'cognitive', 'age_pcr', or 'pcr' column not found.")
}
## [1] "--- Q6: Age, Cognitive Score, and PCR Interaction ---"
## [1] "Correlation between Age and Cognitive Score:"
##
## Pearson's product-moment correlation
##
## data: cognitive and age_pcr
## t = -5.9975, df = 461, p-value = 4.052e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3515442 -0.1823737
## sample estimates:
## cor
## -0.2690327
##
## [1] "Linear Model with Age * PCR Interaction:"
##
## Call:
## lm(formula = cognitive ~ age_pcr * pcr, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8347 -0.4722 0.1653 0.4324 0.8382
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.972676 0.668204 5.945 5.48e-09 ***
## age_pcr -0.035634 0.009889 -3.603 0.000349 ***
## pcrPOSITIVA -0.901191 0.737102 -1.223 0.222104
## age_pcr:pcrPOSITIVA 0.011763 0.010940 1.075 0.282823
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4811 on 458 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.08214, Adjusted R-squared: 0.07613
## F-statistic: 13.66 on 3 and 458 DF, p-value: 1.5e-08
There’s a significant negative correlation between age and cognitive score (r = -0.27, p < 0.001), as seen before.
The linear model found no significant age × PCR interaction (p = 0.28). The data therefore give no evidence that the relationship between age and cognitive score differs between the PCR positive and negative groups in our sample, although a non-significant interaction does not show that the two slopes are equal.
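To see how large a slope difference the data could still be compatible with, the interaction can be summarised by a confidence interval as well as a p-value. This is a sketch on simulated data (invented values, with the same column names and PCR labels as above), not a re-analysis of the study sample.

```r
set.seed(3)
n <- 400
toy <- data.frame(
  age_pcr = round(runif(n, 40, 80)),
  pcr     = factor(sample(c("NEGATIVA", "POSITIVA"), n, replace = TRUE,
                          prob = c(0.2, 0.8)))
)
# Same age slope in both groups (no true interaction in this simulation).
toy$cognitive <- 3.3 - 0.026 * toy$age_pcr + rnorm(n, sd = 0.5)

fit_main <- lm(cognitive ~ age_pcr + pcr, data = toy)
fit_int  <- lm(cognitive ~ age_pcr * pcr, data = toy)

# F test for adding the interaction term to the main-effects model.
cmp <- anova(fit_main, fit_int)
print(cmp)

# 95% CI for the difference in age slopes between the PCR groups.
ci_int <- confint(fit_int)["age_pcr:pcrPOSITIVA", ]
print(ci_int)
```

A wide interval around zero would mean the sample cannot distinguish "no difference" from a moderate difference in slopes, which is worth reporting alongside the p-value.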
# Question 7: Are there correlations between different cognitive test performances?
# Select relevant cognitive variable columns (adjust column names/indices as needed)
# Using names from skimr output: columns 24:60 correspond to indices 24:60 if data wasn't reordered
# Let's use specific names for clarity
cog_vars_indices <- which(names(data) %in% c("listaprimerrec", "listaaprendizaje", "listacp", "listalp", "listarecon",
"corsidirecto", "corsiinverso", "cactusvivos", "cactusinanim",
"otverbaltpo", "otverbalerr", "otvisualtpo", "otvisualerr",
"otmentaltpo", "otmentalerr", "otvismenttpo", "otvismenterr",
"otswitchtpo", "otswitcherr", "x5dreadtpo", "x5dreaderr",
"x5dcounttpo", "x5dcounterr", "x5dfoctpo", "x5dfocerr",
"x5dswitchtpo", "x5dswitcherr", "dscorr", "dsomis", "dscomis",
"torremov", "torretpo", "bostonsc", "bostonlat",
"bostonsemerr", "bostonfonerr", "fluencia"))
if (length(cog_vars_indices) > 1) {
print("--- Q7: Correlations among Cognitive Variables ---")
cognitive_subset <- data[, cog_vars_indices]
# Handle missing data for correlation matrix (e.g., pairwise complete)
cor_matrix <- cor(cognitive_subset, use = "pairwise.complete.obs")
print("Correlation Matrix (Top Left Corner):")
print(round(cor_matrix[1:min(10, nrow(cor_matrix)), 1:min(10, ncol(cor_matrix))], 2)) # Print a portion
# Consider visualizing with corrplot or GGally packages
# library(corrplot)
# corrplot(cor_matrix, type = "upper", order = "hclust", tl.col = "black", tl.srt = 45)
# Optional: Principal Component Analysis (PCA) or Factor Analysis
# Requires careful consideration of scaling and interpretation
# pca_result <- princomp(na.omit(cognitive_subset), cor = TRUE) # Using correlation matrix
# summary(pca_result)
# loadings(pca_result)
} else {
print("Skipping Q7 analysis: Not enough cognitive variable columns found or selected.")
}
## [1] "--- Q7: Correlations among Cognitive Variables ---"
## [1] "Correlation Matrix (Top Left Corner):"
## listaprimerrec listaaprendizaje listacp listalp listarecon
## listaprimerrec 1.00 0.62 0.43 0.41 0.37
## listaaprendizaje 0.62 1.00 0.71 0.69 0.53
## listacp 0.43 0.71 1.00 0.81 0.54
## listalp 0.41 0.69 0.81 1.00 0.55
## listarecon 0.37 0.53 0.54 0.55 1.00
## corsidirecto 0.14 0.14 0.22 0.17 0.21
## corsiinverso 0.28 0.30 0.26 0.27 0.29
## cactusvivos 0.34 0.45 0.43 0.38 0.44
## cactusinanim 0.32 0.43 0.41 0.37 0.41
## otverbaltpo 0.24 0.41 0.36 0.34 0.15
## corsidirecto corsiinverso cactusvivos cactusinanim otverbaltpo
## listaprimerrec 0.14 0.28 0.34 0.32 0.24
## listaaprendizaje 0.14 0.30 0.45 0.43 0.41
## listacp 0.22 0.26 0.43 0.41 0.36
## listalp 0.17 0.27 0.38 0.37 0.34
## listarecon 0.21 0.29 0.44 0.41 0.15
## corsidirecto 1.00 0.55 0.34 0.42 0.19
## corsiinverso 0.55 1.00 0.43 0.46 0.32
## cactusvivos 0.34 0.43 1.00 0.76 0.41
## cactusinanim 0.42 0.46 0.76 1.00 0.39
## otverbaltpo 0.19 0.32 0.41 0.39 1.00
The output shows the top-left 10 × 10 portion of the correlation matrix. As expected, related tests correlate moderately to strongly (e.g., the ‘lista’ tests, r = 0.37 to 0.81; cactusvivos and cactusinanim, r = 0.76), while correlations across domains are weaker (e.g., corsidirecto with the ‘lista’ tests, r = 0.14 to 0.22).
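Reading pairs off a large matrix by eye is error-prone, so it can help to flatten the matrix into a ranked list of variable pairs. The sketch below does this in base R on simulated scores; the values and the shared "memory" factor are invented for illustration, and only the variable names are borrowed from the analysis.

```r
set.seed(4)
n <- 200
memory <- rnorm(n)   # hypothetical shared factor behind the list-learning scores
toy <- data.frame(
  listacp      = memory + rnorm(n, sd = 0.5),
  listalp      = memory + rnorm(n, sd = 0.5),
  listarecon   = memory + rnorm(n, sd = 1.0),
  corsidirecto = rnorm(n)
)

cor_matrix <- cor(toy, use = "pairwise.complete.obs")

# Keep each pair once by taking the upper triangle only.
idx <- which(upper.tri(cor_matrix), arr.ind = TRUE)
pairs_df <- data.frame(
  var1 = rownames(cor_matrix)[idx[, "row"]],
  var2 = colnames(cor_matrix)[idx[, "col"]],
  r    = cor_matrix[idx]
)

# Sort by absolute correlation, strongest first.
pairs_df <- pairs_df[order(-abs(pairs_df$r)), ]
print(head(pairs_df, 10), row.names = FALSE)
```

Applied to the real cor_matrix, the same steps would list the strongest pairs across all selected cognitive variables rather than only the first ten columns.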
# Question 8: Do neuroimaging measures correlate with cognitive scores or differ by COVID status?
# Example using 'right_hippocampus' and 'cognitive', 'pcr', 'age_pcr'
neuro_var <- "right_hippocampus" # Choose an imaging variable
if (all(c(neuro_var, "cognitive", "pcr", "age_pcr") %in% names(data))) {
print(paste("--- Q8: Neuroimaging (", neuro_var, ") Analysis ---"))
# Correlation with cognitive score
cor_neuro_cog <- cor.test(data[[neuro_var]], data$cognitive) # cor.test() has no 'use' argument; it drops incomplete pairs itself
print(paste("Correlation between", neuro_var, "and Cognitive Score:"))
print(cor_neuro_cog)
# Comparison based on PCR status (controlling for age)
lm_neuro_pcr_age <- lm(as.formula(paste(neuro_var, "~ pcr + age_pcr")), data = data)
print(paste("Linear Model:", neuro_var, "~ pcr + age_pcr"))
print(summary(lm_neuro_pcr_age))
# T-test (uncorrected for age)
# tryCatch({print(t.test(as.formula(paste(neuro_var, "~ pcr")), data = data))}, error=function(e){print(paste("Error:", e$message))})
} else {
print(paste("Skipping Q8 analysis: '", neuro_var, "', 'cognitive', 'pcr', or 'age_pcr' column not found."))
}
## [1] "--- Q8: Neuroimaging ( right_hippocampus ) Analysis ---"
## [1] "Correlation between right_hippocampus and Cognitive Score:"
##
## Pearson's product-moment correlation
##
## data: data[[neuro_var]] and data$cognitive
## t = 2.8752, df = 461, p-value = 0.004224
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.04210788 0.22118372
## sample estimates:
## cor
## 0.1327288
##
## [1] "Linear Model: right_hippocampus ~ pcr + age_pcr"
##
## Call:
## lm(formula = as.formula(paste(neuro_var, "~ pcr + age_pcr")),
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1404.85 -301.81 -37.61 263.75 1614.20
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5217.312 252.916 20.629 < 2e-16 ***
## pcrPOSITIVA -20.996 53.055 -0.396 0.692
## age_pcr -20.689 3.687 -5.612 3.47e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 419.5 on 459 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.0642, Adjusted R-squared: 0.06013
## F-statistic: 15.75 on 2 and 459 DF, p-value: 2.432e-07
There’s a small but statistically significant positive correlation between the right_hippocampus measure and the cognitive score (r = 0.13, p = 0.004). This correlation is not adjusted for age.
In the linear model controlling for age, age_pcr was significantly negatively associated with the right_hippocampus measure (p < 0.001), but PCR status was not (p = 0.69).
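Since age is related to both the hippocampus measure and the cognitive score, part of the raw correlation may reflect age. One possible check is a partial correlation, obtained by correlating the residuals after regressing each variable on age. The sketch below uses simulated data (all values invented; only the column names match this analysis) in which the two variables are linked through age alone.

```r
set.seed(5)
n <- 400
age_pcr <- round(runif(n, 40, 80))
toy <- data.frame(
  age_pcr           = age_pcr,
  right_hippocampus = 5200 - 20 * age_pcr + rnorm(n, sd = 400),
  cognitive         = 3.3 - 0.026 * age_pcr + rnorm(n, sd = 0.5)
)

# Raw correlation: driven by age in this simulation.
raw_cor <- cor.test(toy$right_hippocampus, toy$cognitive)
print(raw_cor$estimate)

# Remove the linear age effect from both variables, then correlate residuals.
res_hipp <- resid(lm(right_hippocampus ~ age_pcr, data = toy))
res_cog  <- resid(lm(cognitive ~ age_pcr, data = toy))
partial_cor <- cor(res_hipp, res_cog)
print(partial_cor)

# Related check: the hippocampus coefficient in an age-adjusted model
# tests the same age-adjusted association.
fit_adj <- lm(cognitive ~ right_hippocampus + age_pcr, data = toy)
print(summary(fit_adj)$coefficients["right_hippocampus", ])
```

The real data contain at least one missing value, so the two residual vectors would need to be computed on the same complete cases (for example by subsetting with complete.cases() first) before being correlated.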
# --- Data Preparation (Adapt as needed) ---
# Convert relevant columns to factors
factor_cols <- c("pcr", "anosmia", "risk_hospital_icu", "vaccine_before_study",
"covid_before_vaccination", "fever", "cough", "muscle_pain",
"breath_dif", "smell_lost", "taste_lost", "covid_variant",
"vaccine_1", "vaccine_2", "vaccine_3")
# Check which factor_cols actually exist in the data
existing_factor_cols <- factor_cols[factor_cols %in% names(data)]
# Apply as.factor only to existing columns
if (length(existing_factor_cols) > 0) {
data <- data %>%
mutate(across(all_of(existing_factor_cols), as.factor))
}
# !! Action Needed for Q4: Create 'num_doses' variable !!
# Uncomment and ADAPT the following code based on how vaccine_1/2/3 columns indicate doses
# Example: Assumes non-NA means dose received. Change logic if needed (e.g., check for specific values like "Pfizer", "Yes", etc.)
# data <- data %>%
# mutate(
# num_doses = case_when(
# !is.na(vaccine_3) & vaccine_3 != "" & vaccine_3 != "NA" ~ 3, # Adjust conditions based on your data
# !is.na(vaccine_2) & vaccine_2 != "" & vaccine_2 != "NA" ~ 2, # Adjust conditions based on your data
# !is.na(vaccine_1) & vaccine_1 != "" & vaccine_1 != "NA" ~ 1, # Adjust conditions based on your data
# TRUE ~ 0 # Assumes those without vaccine_1 entry have 0 doses
# ),
# # Convert to factor with meaningful labels
# num_doses = factor(num_doses, levels = 0:3, labels = c("0 Doses", "1 Dose", "2 Doses", "3 Doses"))
# )
#
# # Check the created variable
# print("Summary of num_doses variable:")
# print(summary(data$num_doses))
# print(table(data$num_doses, useNA = "ifany"))
# --- Analysis Questions & Code Examples ---
# Question 1: Does COVID-19 infection (positive PCR) impact overall cognitive scores?
if ("cognitive" %in% names(data) && "pcr" %in% names(data)) {
print("--- Q1: Cognitive Score vs PCR Status ---")
print(levels(data$pcr))
# Welch T-test (original analysis)
tryCatch({
ttest_result_q1 <- t.test(cognitive ~ pcr, data = data)
print(ttest_result_q1)
# Suggestion 9: Calculate Effect Size (Cohen's d)
print("Effect Size (Cohen's d):")
print(cohens_d(cognitive ~ pcr, data = data))
}, error = function(e) {print(paste("T-test failed:", e$message))})
# Linear model controlling for age (original analysis)
if("age_pcr" %in% names(data)) {
print("--- Q1: Cognitive Score vs PCR Status controlling for Age (LM) ---")
lm_model_pcr_age <- lm(cognitive ~ pcr + age_pcr, data = data)
print(summary(lm_model_pcr_age))
# Suggestion 9: Calculate Effect Size (Partial Eta Squared for ANOVA equivalent)
print("Effect Sizes (Partial Eta Squared) from ANOVA:")
print(eta_squared(car::Anova(lm_model_pcr_age, type="III"), partial = TRUE)) # requires car package
# Suggestion 4: Check Model Assumptions
print("Checking assumptions for lm(cognitive ~ pcr + age_pcr):")
par(mfrow=c(2,2)) # Arrange plots in a 2x2 grid
plot(lm_model_pcr_age)
par(mfrow=c(1,1)) # Reset plot layout
}
} else {
print("Skipping Q1 analysis: 'cognitive' or 'pcr' column not found.")
}
## [1] "--- Q1: Cognitive Score vs PCR Status ---"
## [1] "NEGATIVA" "POSITIVA"
##
## Welch Two Sample t-test
##
## data: cognitive by pcr
## t = 1.3513, df = 105.05, p-value = 0.1795
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
## -0.03970381 0.20962630
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA
## 1.573333 1.488372
##
## [1] "Effect Size (Cohen's d):"
## Warning: 'y' is numeric but has only 2 unique values.
## If this is a grouping variable, convert it to a factor.
## Cohen's d | 95% CI
## -------------------------
## 0.17 | [-0.08, 0.42]
##
## - Estimated using pooled SD.
## [1] "--- Q1: Cognitive Score vs PCR Status controlling for Age (LM) ---"
##
## Call:
## lm(formula = cognitive ~ pcr + age_pcr, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8350 -0.4707 0.2359 0.4252 0.8676
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.325410 0.290119 11.462 < 2e-16 ***
## pcrPOSITIVA -0.111318 0.060860 -1.829 0.068 .
## age_pcr -0.026021 0.004229 -6.153 1.66e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4812 on 459 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.07983, Adjusted R-squared: 0.07582
## F-statistic: 19.91 on 2 and 459 DF, p-value: 5.107e-09
##
## [1] "Effect Sizes (Partial Eta Squared) from ANOVA:"
## # Effect Size for ANOVA (Type III)
##
## Parameter | Eta2 (partial) | 95% CI
## -----------------------------------------
## pcr | 7.24e-03 | [0.00, 1.00]
## age_pcr | 0.08 | [0.04, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
## [1] "Checking assumptions for lm(cognitive ~ pcr + age_pcr):"
# Question 2: Is there a difference in specific cognitive domains (e.g., fluency, memory) between COVID+ and COVID- groups?
if (all(c("fluencia", "dscorr", "pcr") %in% names(data))) {
print("--- Q2: Specific Cognitive Domains vs PCR Status ---")
# Fluency (original analysis)
print("Fluency:")
tryCatch({
ttest_fluencia <- t.test(fluencia ~ pcr, data = data)
print(ttest_fluencia)
# Suggestion 9: Effect Size
if(ttest_fluencia$p.value < 0.05) { # Only print if significant
print("Effect Size (Cohen's d) for Fluency:")
print(cohens_d(fluencia ~ pcr, data = data))
}
# Suggestion 3 & 10: Visualize counter-intuitive fluency finding
print("Generating Boxplot for Fluency by PCR status:")
print( # Explicitly print ggplot object
ggplot(data, aes(x = pcr, y = fluencia, fill = pcr)) +
geom_boxplot(alpha=0.7) +
geom_jitter(width=0.1, alpha=0.3) +
labs(title = "Verbal Fluency by PCR Status", x = "PCR Status", y = "Fluency Score") +
theme_minimal() +
theme(legend.position = "none")
)
}, error=function(e){print(paste("Error testing fluency:", e$message))})
# DS Correct (original analysis)
print("DS Correct:")
tryCatch({print(t.test(dscorr ~ pcr, data = data))}, error=function(e){print(paste("Error testing dscorr:", e$message))})
# MANOVA (original analysis)
print("MANOVA for Fluency & DS Correct:")
manova_data <- data %>% select(fluencia, dscorr, pcr) %>% na.omit()
if(nrow(manova_data) > 0 && length(unique(manova_data$pcr)) > 1) {
manova_result <- manova(cbind(fluencia, dscorr) ~ pcr, data = manova_data)
print(summary(manova_result))
print(summary.aov(manova_result))
# Suggestion 9: Effect Size (Pillai's Trace - MANOVA effect sizes are complex, eta_squared on univariate is simpler)
print("Effect Sizes (Partial Eta Squared) for Univariate ANOVAs from MANOVA:")
print(eta_squared(manova_result, partial = TRUE)) # Eta squared for the MANOVA model factors
} else {
print("Insufficient data or factor levels for MANOVA.")
}
# Suggestion 8: Address Multiple Comparisons if testing many domains
# Example: If you tested 5 cognitive domains with t-tests vs PCR
# p_values_domains <- c(ttest_fluencia$p.value, p_val_domain2, p_val_domain3, p_val_domain4, p_val_domain5)
# print("Adjusted p-values (Benjamini-Hochberg):")
# print(p.adjust(p_values_domains, method = "BH"))
} else {
print("Skipping Q2 analysis: 'fluencia', 'dscorr', or 'pcr' column not found.")
}
## [1] "--- Q2: Specific Cognitive Domains vs PCR Status ---"
## [1] "Fluency:"
##
## Welch Two Sample t-test
##
## data: fluencia by pcr
## t = -2.7437, df = 99.602, p-value = 0.007207
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
## -3.0291666 -0.4866883
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA
## 15.04000 16.79793
##
## [1] "Effect Size (Cohen's d) for Fluency:"
## Warning: Missing values detected. NAs dropped.
## Cohen's d | 95% CI
## --------------------------
## -0.37 | [-0.62, -0.12]
##
## - Estimated using pooled SD.
## [1] "Generating Boxplot for Fluency by PCR status:"
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_boxplot()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
## [1] "DS Correct:"
##
## Welch Two Sample t-test
##
## data: dscorr by pcr
## t = -0.57336, df = 111.6, p-value = 0.5676
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
## -5.479240 3.019912
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA
## 68.09333 69.32300
##
## [1] "MANOVA for Fluency & DS Correct:"
## Df Pillai approx F num Df den Df Pr(>F)
## pcr 1 0.021077 4.9305 2 458 0.007611 **
## Residuals 459
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Response fluencia :
## Df Sum Sq Mean Sq F value Pr(>F)
## pcr 1 194.1 194.066 8.4713 0.003783 **
## Residuals 459 10515.1 22.909
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response dscorr :
## Df Sum Sq Mean Sq F value Pr(>F)
## pcr 1 100 100.36 0.305 0.581
## Residuals 459 151035 329.05
##
## [1] "Effect Sizes (Partial Eta Squared) for Univariate ANOVAs from MANOVA:"
## # Effect Size for ANOVA (Type I)
##
## Parameter | Eta2 (partial) | 95% CI
## -----------------------------------------
## pcr | 0.02 | [0.00, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
# Question 3: Does the presence/severity of specific symptoms (e.g., anosmia) correlate with cognitive performance?
if (all(c("cognitive", "anosmia", "age_pcr") %in% names(data))) {
print("--- Q3: Cognitive Score vs Anosmia ---")
print("Levels of anosmia variable:")
print(levels(data$anosmia))
print("Summary of anosmia variable:")
print(summary(data$anosmia)) # See counts per level
# Suggestion 1: Replace t.test with ANOVA for multi-level factor
print("ANOVA for Cognitive Score ~ Anosmia:")
tryCatch({
aov_anosmia <- aov(cognitive ~ anosmia, data = data)
summary_aov_anosmia <- summary(aov_anosmia)
print(summary_aov_anosmia)
# Suggestion 9: Effect Size (Eta Squared)
print("Effect Size (Eta Squared) for Anosmia ANOVA:")
print(eta_squared(aov_anosmia, partial = FALSE)) # Simple eta-squared for one-way ANOVA
# Optional: Post-hoc tests if ANOVA is significant
if (summary_aov_anosmia[[1]]$`Pr(>F)`[1] < 0.05) {
print("Post-hoc Tests (Tukey HSD) for Anosmia:")
print(TukeyHSD(aov_anosmia))
}
}, error=function(e){print(paste("Error running ANOVA for anosmia:", e$message))})
# Linear model controlling for age (original analysis)
print("Linear Model for Cognitive Score ~ Anosmia + Age:")
lm_model_anosmia_age <- lm(cognitive ~ anosmia + age_pcr, data = data)
print(summary(lm_model_anosmia_age))
# Suggestion 9: Effect Sizes
print("Effect Sizes (Partial Eta Squared) from Anosmia + Age model:")
print(eta_squared(car::Anova(lm_model_anosmia_age, type="III"), partial = TRUE))
# Suggestion 4: Check Model Assumptions
print("Checking assumptions for lm(cognitive ~ anosmia + age_pcr):")
par(mfrow=c(2,2))
plot(lm_model_anosmia_age)
par(mfrow=c(1,1))
# Suggestion 1 & 10: Visualize potentially counter-intuitive Anosmia finding
print("Generating Boxplot for Cognitive Score by Anosmia level:")
print( # Explicitly print ggplot object
ggplot(data, aes(x = anosmia, y = cognitive, fill = anosmia)) +
geom_boxplot(alpha=0.7) +
labs(title = "Cognitive Score by Anosmia Level", x = "Anosmia Level", y = "Cognitive Score") +
theme_minimal() +
theme(legend.position = "none")
)
} else {
print("Skipping Q3 analysis: 'cognitive', 'anosmia', or 'age_pcr' column not found.")
}
## [1] "--- Q3: Cognitive Score vs Anosmia ---"
## [1] "Levels of anosmia variable:"
## [1] "0" "1" "2" "3"
## [1] "Summary of anosmia variable:"
## 0 1 2 3 NA's
## 109 55 148 150 1
## [1] "ANOVA for Cognitive Score ~ Anosmia:"
## Df Sum Sq Mean Sq F value Pr(>F)
## anosmia 3 5.43 1.8095 7.529 6.3e-05 ***
## Residuals 458 110.07 0.2403
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
## [1] "Effect Size (Eta Squared) for Anosmia ANOVA:"
## # Effect Size for ANOVA (Type I)
##
## Parameter | Eta2 | 95% CI
## -------------------------------
## anosmia | 0.05 | [0.02, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
## [1] "Post-hoc Tests (Tukey HSD) for Anosmia:"
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = cognitive ~ anosmia, data = data)
##
## $anosmia
## diff lwr upr p adj
## 1-0 0.15129274 -0.05777993 0.36036542 0.2442834
## 2-0 0.28409373 0.12454535 0.44364210 0.0000337
## 3-0 0.10220183 -0.05689478 0.26129845 0.3481916
## 2-1 0.13280098 -0.06681980 0.33242177 0.3167972
## 3-1 -0.04909091 -0.24835080 0.15016898 0.9206020
## 3-2 -0.18189189 -0.32834601 -0.03543778 0.0079235
##
## [1] "Linear Model for Cognitive Score ~ Anosmia + Age:"
##
## Call:
## lm(formula = cognitive ~ anosmia + age_pcr, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.9277 -0.4598 0.1478 0.4395 0.8791
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.911246 0.290696 10.015 < 2e-16 ***
## anosmia1 0.098567 0.079292 1.243 0.214469
## anosmia2 0.232995 0.060789 3.833 0.000144 ***
## anosmia3 0.086466 0.059951 1.442 0.149911
## age_pcr -0.022953 0.004242 -5.411 1.01e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4758 on 457 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.1044, Adjusted R-squared: 0.09654
## F-statistic: 13.31 on 4 and 457 DF, p-value: 2.859e-10
##
## [1] "Effect Sizes (Partial Eta Squared) from Anosmia + Age model:"
## # Effect Size for ANOVA (Type III)
##
## Parameter | Eta2 (partial) | 95% CI
## -----------------------------------------
## anosmia | 0.03 | [0.01, 1.00]
## age_pcr | 0.06 | [0.03, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
## [1] "Checking assumptions for lm(cognitive ~ anosmia + age_pcr):"
## [1] "Generating Boxplot for Cognitive Score by Anosmia level:"
# Question 4: Does vaccination status influence cognitive outcomes, potentially interacting with COVID status?
# Requires 'num_doses' variable created in Preparation Step
if (all(c("cognitive", "pcr", "age_pcr") %in% names(data)) && "num_doses" %in% names(data)) {
print("--- Q4: Cognitive Score vs Vaccination & PCR Status ---")
# Ensure num_doses is a factor
if(!is.factor(data$num_doses)) data$num_doses <- factor(data$num_doses)
# ANOVA/Linear Model including interaction
print("Linear Model: cognitive ~ num_doses * pcr + age_pcr")
lm_interaction_vacc <- lm(cognitive ~ num_doses * pcr + age_pcr, data = data)
# Using car::Anova for Type III sums of squares
print("ANOVA Table (Type III SS):")
anova_table_q4 <- car::Anova(lm_interaction_vacc, type = "III")
print(anova_table_q4)
print("Model Summary:")
print(summary(lm_interaction_vacc))
# Suggestion 9: Effect Sizes
print("Effect Sizes (Partial Eta Squared):")
print(eta_squared(anova_table_q4, partial = TRUE))
# Suggestion 4: Check Model Assumptions
print("Checking assumptions for lm(cognitive ~ num_doses * pcr + age_pcr):")
par(mfrow=c(2,2))
plot(lm_interaction_vacc)
par(mfrow=c(1,1))
# Suggestion 10: Visualize interaction if significant
# Check if the interaction term p-value is less than 0.05
interaction_p_value <- anova_table_q4["num_doses:pcr", "Pr(>F)"]
if (!is.na(interaction_p_value) && interaction_p_value < 0.05) {
print("Interaction detected, generating interaction plot:")
print( # Explicitly print ggplot object
ggplot(data, aes(x = pcr, y = cognitive, color = num_doses, group = num_doses)) +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.1, position=position_dodge(0.1)) +
stat_summary(fun = mean, geom = "line", position=position_dodge(0.1)) +
stat_summary(fun = mean, geom = "point", position=position_dodge(0.1), size=2) +
labs(title = "Interaction: Cognitive Score by PCR Status and Vaccination",
x = "PCR Status", y = "Mean Cognitive Score", color = "Vaccine Doses") +
theme_minimal()
)
} else {
print("Interaction term num_doses:pcr not significant (or NA), skipping interaction plot.")
}
} else {
print("Skipping Q4 analysis: 'cognitive', 'pcr', 'age_pcr', or 'num_doses' column not found or not created/adapted correctly.")
}
## [1] "Skipping Q4 analysis: 'cognitive', 'pcr', 'age_pcr', or 'num_doses' column not found or not created/adapted correctly."
# Question 5: Are there differences in cognitive performance based on COVID-19 variant?
if (all(c("cognitive", "covid_variant", "age_pcr") %in% names(data))) {
print("--- Q5: Cognitive Score vs COVID Variant ---")
print("Original Variant Counts:")
variant_counts <- table(data$covid_variant, useNA = "ifany")
print(variant_counts)
# Suggestion 6: Refine Variant Analysis - Handle rare variants (Example: Grouping)
# Define a threshold for 'rare'
rare_threshold <- 10 # Example: group variants with fewer than 10 cases
data <- data %>%
mutate(
covid_variant_grouped = ifelse(covid_variant %in% names(variant_counts[variant_counts < rare_threshold]),
"Other_Rare",
as.character(covid_variant)), # Keep others as is
covid_variant_grouped = factor(covid_variant_grouped) # Convert back to factor
)
print("Grouped Variant Counts:")
print(table(data$covid_variant_grouped, useNA = "ifany"))
# ANOVA model using the grouped variant variable
control_vars <- "age_pcr"
# Add num_doses if available and analysis in Q4 worked
if ("num_doses" %in% names(data) && exists("lm_interaction_vacc")) {
control_vars <- paste(control_vars, "+ num_doses")
}
formula_q5 <- as.formula(paste("cognitive ~ covid_variant_grouped +", control_vars))
print(paste("Running ANOVA:", deparse(formula_q5)))
aov_model_variant <- aov(formula_q5, data = data)
summary_aov_variant <- summary(aov_model_variant)
print("ANOVA Summary (Grouped Variants):")
print(summary_aov_variant)
# Suggestion 9: Effect Size
print("Effect Size (Eta Squared) for Grouped Variants ANOVA:")
print(eta_squared(aov_model_variant)) # Use car::Anova if using Type III SS with interactions/covariates
# Post-hoc tests if ANOVA is significant
if (summary_aov_variant[[1]]$`Pr(>F)`[1] < 0.05) {
print("Post-hoc Tests (Tukey HSD) for Grouped Variants:")
# Ensure the factor name matches the one in the formula
print(TukeyHSD(aov_model_variant, which = "covid_variant_grouped"))
}
} else {
print("Skipping Q5 analysis: 'cognitive', 'covid_variant', or 'age_pcr' column not found.")
}
## [1] "--- Q5: Cognitive Score vs COVID Variant ---"
## [1] "Original Variant Counts:"
##
## 0 1 2 3 4 5 6 7
## 81 222 94 59 1 4 1 1
## [1] "Grouped Variant Counts:"
##
## 0 1 2 3 Other_Rare
## 81 222 94 59 7
## [1] "Running ANOVA: cognitive ~ covid_variant_grouped + age_pcr"
## [1] "ANOVA Summary (Grouped Variants):"
## Df Sum Sq Mean Sq F value Pr(>F)
## covid_variant_grouped 4 0.52 0.131 0.56 0.692
## age_pcr 1 8.45 8.454 36.18 3.68e-09 ***
## Residuals 457 106.77 0.234
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Effect Size (Eta Squared) for Grouped Variants ANOVA:"
## # Effect Size for ANOVA (Type I)
##
## Parameter | Eta2 (partial) | 95% CI
## -----------------------------------------------------
## covid_variant_grouped | 4.87e-03 | [0.00, 1.00]
## age_pcr | 0.07 | [0.04, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
# Question 6: How does age relate to cognitive performance? Does COVID history modify this?
if (all(c("cognitive", "age_pcr", "pcr") %in% names(data))) {
print("--- Q6: Age, Cognitive Score, and PCR Interaction ---")
# Correlation (original analysis)
cor_age_cog <- cor.test(~ cognitive + age_pcr, data = data)
print("Correlation between Age and Cognitive Score:")
print(cor_age_cog)
# Linear model with interaction term (original analysis)
lm_age_interaction <- lm(cognitive ~ age_pcr * pcr, data = data)
print("Linear Model with Age * PCR Interaction:")
print(summary(lm_age_interaction))
# Suggestion 9: Effect Sizes
print("Effect Sizes (Partial Eta Squared) for Age*PCR model:")
print(eta_squared(car::Anova(lm_age_interaction, type="III"), partial = TRUE))
# Suggestion 4: Check Model Assumptions
print("Checking assumptions for lm(cognitive ~ age_pcr * pcr):")
par(mfrow=c(2,2))
plot(lm_age_interaction)
par(mfrow=c(1,1))
# Suggestion 10: Visualize the relationship (interaction or main effect of age)
print("Generating Scatter Plot for Cognitive Score vs Age by PCR Status:")
print( # Explicitly print ggplot object
ggplot(data, aes(x = age_pcr, y = cognitive, color = pcr)) +
geom_point(alpha = 0.4) +
geom_smooth(method = "lm", aes(fill = pcr), alpha = 0.1) + # Add regression lines per group
labs(title = "Cognitive Score vs Age by PCR Status",
x = "Age at PCR", y = "Cognitive Score") +
theme_minimal()
)
} else {
print("Skipping Q6 analysis: 'cognitive', 'age_pcr', or 'pcr' column not found.")
}
## [1] "--- Q6: Age, Cognitive Score, and PCR Interaction ---"
## [1] "Correlation between Age and Cognitive Score:"
##
## Pearson's product-moment correlation
##
## data: cognitive and age_pcr
## t = -5.9975, df = 461, p-value = 4.052e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3515442 -0.1823737
## sample estimates:
## cor
## -0.2690327
##
## [1] "Linear Model with Age * PCR Interaction:"
##
## Call:
## lm(formula = cognitive ~ age_pcr * pcr, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8347 -0.4722 0.1653 0.4324 0.8382
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.972676 0.668204 5.945 5.48e-09 ***
## age_pcr -0.035634 0.009889 -3.603 0.000349 ***
## pcrPOSITIVA -0.901191 0.737102 -1.223 0.222104
## age_pcr:pcrPOSITIVA 0.011763 0.010940 1.075 0.282823
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4811 on 458 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.08214, Adjusted R-squared: 0.07613
## F-statistic: 13.66 on 3 and 458 DF, p-value: 1.5e-08
##
## [1] "Effect Sizes (Partial Eta Squared) for Age*PCR model:"
## Type 3 ANOVAs only give sensible and informative results when covariates
## are mean-centered and factors are coded with orthogonal contrasts (such
## as those produced by `contr.sum`, `contr.poly`, or `contr.helmert`, but
## *not* by the default `contr.treatment`).
## # Effect Size for ANOVA (Type III)
##
## Parameter | Eta2 (partial) | 95% CI
## -------------------------------------------
## age_pcr | 0.03 | [0.01, 1.00]
## pcr | 3.25e-03 | [0.00, 1.00]
## age_pcr:pcr | 2.52e-03 | [0.00, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
## [1] "Checking assumptions for lm(cognitive ~ age_pcr * pcr):"
## [1] "Generating Scatter Plot for Cognitive Score vs Age by PCR Status:"
## `geom_smooth()` using formula = 'y ~ x'
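The Type III warning printed above asks for mean-centered covariates and orthogonal factor contrasts. The following is a minimal sketch of that preparation on simulated stand-in data, not a rerun of the study model: the values are made up, the level label "NEGATIVA" is an assumption (only "POSITIVA" appears in the output), and base R's `drop1()` is used here in place of `car::Anova()` to obtain marginal F tests without extra packages.

```r
# Simulated stand-in data; column names mirror the report but the values are made up.
set.seed(1)
n <- 200
sim <- data.frame(
  age_pcr = rnorm(n, mean = 65, sd = 8),
  pcr = factor(sample(c("NEGATIVA", "POSITIVA"), n, replace = TRUE))  # "NEGATIVA" is assumed
)
sim$cognitive <- 4 - 0.03 * sim$age_pcr + rnorm(n, sd = 0.5)

# Mean-center the covariate so lower-order terms are evaluated at the average age
sim$age_c <- sim$age_pcr - mean(sim$age_pcr)

# Sum-to-zero (orthogonal) contrasts for the two-level factor
contrasts(sim$pcr) <- contr.sum(2)

fit_c <- lm(cognitive ~ age_c * pcr, data = sim)

# Marginal (Type III style) F tests: each term is dropped from the full model in turn
type3 <- drop1(fit_c, scope = . ~ ., test = "F")
print(type3)
```

With this coding, the `age_c` row tests the age slope averaged over the two PCR groups, and the `pcr` row tests the group difference at the mean age, which is what makes the main-effect rows interpretable alongside the interaction.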
# Question 7: Are there correlations between different cognitive test performances?
cog_vars_indices <- which(names(data) %in% c("listaprimerrec", "listaaprendizaje", "listacp", "listalp", "listarecon",
"corsidirecto", "corsiinverso", "cactusvivos", "cactusinanim",
"otverbaltpo", "otverbalerr", "otvisualtpo", "otvisualerr",
"otmentaltpo", "otmentalerr", "otvismenttpo", "otvismenterr",
"otswitchtpo", "otswitcherr", "x5dreadtpo", "x5dreaderr",
"x5dcounttpo", "x5dcounterr", "x5dfoctpo", "x5dfocerr",
"x5dswitchtpo", "x5dswitcherr", "dscorr", "dsomis", "dscomis",
"torremov", "torretpo", "bostonsc", "bostonlat",
"bostonsemerr", "bostonfonerr", "fluencia"))
if (length(cog_vars_indices) > 1) {
print("--- Q7: Correlations among Cognitive Variables ---")
cognitive_subset <- data[, cog_vars_indices]
# Correlation matrix (original analysis)
cor_matrix <- cor(cognitive_subset, use = "pairwise.complete.obs")
print("Correlation Matrix (Top Left Corner):")
print(round(cor_matrix[1:min(10, nrow(cor_matrix)), 1:min(10, ncol(cor_matrix))], 2))
# Suggestion 5: Visualize the full correlation matrix
print("Generating Correlation Plot (allow time for plot rendering):")
# Adjust cex (size) parameters if labels overlap or are too small
corrplot(cor_matrix, method="number", type = "upper", order = "hclust",
tl.col = "black", tl.srt = 45, number.cex = 0.5, tl.cex = 0.6,
title = "Correlation Matrix of Cognitive Variables", mar=c(0,0,1,0)) # Add title
# Optional: PCA/Factor Analysis (original suggestion)
# print("Running PCA (example):")
# # Ensure no missing values for PCA - using na.omit is one way, imputation is another
# cognitive_subset_complete <- na.omit(cognitive_subset)
# if(nrow(cognitive_subset_complete) > ncol(cognitive_subset_complete)) { # Need more rows than columns
# pca_result <- princomp(cognitive_subset_complete, cor = TRUE, scores=TRUE)
# print(summary(pca_result)) # Show variance explained
# # print(loadings(pca_result)) # Show component loadings (how variables contribute)
# } else {
# print("Skipping PCA: Not enough complete cases or too many variables.")
# }
} else {
print("Skipping Q7 analysis: Not enough cognitive variable columns found or selected.")
}
## [1] "--- Q7: Correlations among Cognitive Variables ---"
## [1] "Correlation Matrix (Top Left Corner):"
## listaprimerrec listaaprendizaje listacp listalp listarecon
## listaprimerrec 1.00 0.62 0.43 0.41 0.37
## listaaprendizaje 0.62 1.00 0.71 0.69 0.53
## listacp 0.43 0.71 1.00 0.81 0.54
## listalp 0.41 0.69 0.81 1.00 0.55
## listarecon 0.37 0.53 0.54 0.55 1.00
## corsidirecto 0.14 0.14 0.22 0.17 0.21
## corsiinverso 0.28 0.30 0.26 0.27 0.29
## cactusvivos 0.34 0.45 0.43 0.38 0.44
## cactusinanim 0.32 0.43 0.41 0.37 0.41
## otverbaltpo 0.24 0.41 0.36 0.34 0.15
## corsidirecto corsiinverso cactusvivos cactusinanim otverbaltpo
## listaprimerrec 0.14 0.28 0.34 0.32 0.24
## listaaprendizaje 0.14 0.30 0.45 0.43 0.41
## listacp 0.22 0.26 0.43 0.41 0.36
## listalp 0.17 0.27 0.38 0.37 0.34
## listarecon 0.21 0.29 0.44 0.41 0.15
## corsidirecto 1.00 0.55 0.34 0.42 0.19
## corsiinverso 0.55 1.00 0.43 0.46 0.32
## cactusvivos 0.34 0.43 1.00 0.76 0.41
## cactusinanim 0.42 0.46 0.76 1.00 0.39
## otverbaltpo 0.19 0.32 0.41 0.39 1.00
## [1] "Generating Correlation Plot (allow time for plot rendering):"
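A 37-variable correlation matrix is hard to scan by eye, so it can help to list the strongest pairs explicitly. The sketch below does this for the five list-learning variables, with coefficients copied from the printed corner of the matrix above. Applied to the full `cor_matrix`, the same steps would work unchanged. The object names `cm` and `cor_pairs` are introduced here for illustration.

```r
# Correlations copied from the printed matrix (first five variables only)
vars <- c("listaprimerrec", "listaaprendizaje", "listacp", "listalp", "listarecon")
cm <- matrix(c(
  1.00, 0.62, 0.43, 0.41, 0.37,
  0.62, 1.00, 0.71, 0.69, 0.53,
  0.43, 0.71, 1.00, 0.81, 0.54,
  0.41, 0.69, 0.81, 1.00, 0.55,
  0.37, 0.53, 0.54, 0.55, 1.00
), nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# Keep each pair once by looking only at the upper triangle
idx <- which(upper.tri(cm), arr.ind = TRUE)
cor_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r = cm[idx],
  stringsAsFactors = FALSE
)

# Sort by absolute correlation, strongest first
cor_pairs <- cor_pairs[order(-abs(cor_pairs$r)), ]
rownames(cor_pairs) <- NULL
print(head(cor_pairs, 3))
```

In this subset the strongest pair is `listacp` with `listalp` (r = 0.81), followed by `listaaprendizaje` with `listacp` (r = 0.71).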
# Question 8: Do neuroimaging measures correlate with cognitive scores or differ by COVID status?
# Suggestion 7: Expand Neuroimaging Analysis - Structure for repeating
neuro_vars_to_test <- c("right_hippocampus", "left_hippocampus", "right_amygdala", "left_amygdala") # Add other variables here
# Check which neuroimaging variables actually exist in the data
neuro_vars_to_test <- neuro_vars_to_test[neuro_vars_to_test %in% names(data)]
all_required_cols_q8 <- c(neuro_vars_to_test, "cognitive", "pcr", "age_pcr")
if (all(all_required_cols_q8 %in% names(data))) {
print(paste("--- Q8: Neuroimaging Analysis for:", paste(neuro_vars_to_test, collapse=", "), "---"))
results_neuro <- list() # Store results
for (neuro_var in neuro_vars_to_test) {
print(paste("--- Analyzing:", neuro_var, "---"))
results_neuro[[neuro_var]] <- list() # Create sublist for this variable
# Correlation with cognitive score
# cor.test() drops incomplete pairs itself; 'use' is an argument of cor(), not cor.test(), and was silently ignored
cor_neuro_cog <- cor.test(data[[neuro_var]], data$cognitive)
print(paste("Correlation between", neuro_var, "and Cognitive Score:"))
print(cor_neuro_cog)
results_neuro[[neuro_var]]$correlation <- cor_neuro_cog
# Comparison based on PCR status (controlling for age)
formula_q8 <- as.formula(paste(neuro_var, "~ pcr + age_pcr"))
lm_neuro_pcr_age <- lm(formula_q8, data = data)
print(paste("Linear Model:", neuro_var, "~ pcr + age_pcr"))
summary_lm_neuro <- summary(lm_neuro_pcr_age)
print(summary_lm_neuro)
results_neuro[[neuro_var]]$lm_summary <- summary_lm_neuro
# Suggestion 9: Effect Sizes for LM
print("Effect Sizes (Partial Eta Squared):")
tryCatch({ # Anova might fail if model is singular etc.
anova_lm_neuro <- car::Anova(lm_neuro_pcr_age, type="III")
eta_lm_neuro <- eta_squared(anova_lm_neuro, partial = TRUE) # Compute once, then print and store
print(eta_lm_neuro)
results_neuro[[neuro_var]]$lm_effect_sizes <- eta_lm_neuro
}, error = function(e){print(paste("Could not calculate effect sizes for", neuro_var, ":", e$message))})
# Suggestion 4: Check Model Assumptions
print(paste("Checking assumptions for lm for", neuro_var, ":"))
par(mfrow=c(2,2))
plot(lm_neuro_pcr_age)
par(mfrow=c(1,1))
}
# Suggestion 8: Address Multiple Comparisons for Neuroimaging LMs
# Example: Adjust p-values for the 'pcrPOSITIVA' term across all tested neuro variables
# p_values_pcr_effect <- sapply(results_neuro, function(res) {
# coef_summary <- coefficients(res$lm_summary)
# if ("pcrPOSITIVA" %in% rownames(coef_summary)) {
# return(coef_summary["pcrPOSITIVA", "Pr(>|t|)"])
# } else {
# return(NA) # Return NA if the coefficient doesn't exist
# }
# })
# p_values_pcr_effect <- p_values_pcr_effect[!is.na(p_values_pcr_effect)] # Remove NAs
# if(length(p_values_pcr_effect) > 1) {
# print("Adjusted p-values for PCR effect across neuroimaging variables (BH method):")
# print(p.adjust(p_values_pcr_effect, method = "BH"))
# }
} else {
print(paste("Skipping Q8 analysis: One or more required columns not found:", paste(all_required_cols_q8[!all_required_cols_q8 %in% names(data)], collapse=", ")))
}
## [1] "--- Q8: Neuroimaging Analysis for: right_hippocampus, left_hippocampus, right_amygdala, left_amygdala ---"
## [1] "--- Analyzing: right_hippocampus ---"
## [1] "Correlation between right_hippocampus and Cognitive Score:"
##
## Pearson's product-moment correlation
##
## data: data[[neuro_var]] and data$cognitive
## t = 2.8752, df = 461, p-value = 0.004224
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.04210788 0.22118372
## sample estimates:
## cor
## 0.1327288
##
## [1] "Linear Model: right_hippocampus ~ pcr + age_pcr"
##
## Call:
## lm(formula = formula_q8, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1404.85 -301.81 -37.61 263.75 1614.20
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5217.312 252.916 20.629 < 2e-16 ***
## pcrPOSITIVA -20.996 53.055 -0.396 0.692
## age_pcr -20.689 3.687 -5.612 3.47e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 419.5 on 459 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.0642, Adjusted R-squared: 0.06013
## F-statistic: 15.75 on 2 and 459 DF, p-value: 2.432e-07
##
## [1] "Effect Sizes (Partial Eta Squared):"
## # Effect Size for ANOVA (Type III)
##
## Parameter | Eta2 (partial) | 95% CI
## -----------------------------------------
## pcr | 3.41e-04 | [0.00, 1.00]
## age_pcr | 0.06 | [0.03, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
## [1] "Checking assumptions for lm for right_hippocampus :"
## [1] "--- Analyzing: left_hippocampus ---"
## [1] "Correlation between left_hippocampus and Cognitive Score:"
##
## Pearson's product-moment correlation
##
## data: data[[neuro_var]] and data$cognitive
## t = 2.8959, df = 461, p-value = 0.00396
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.04306161 0.22209225
## sample estimates:
## cor
## 0.1336673
##
## [1] "Linear Model: left_hippocampus ~ pcr + age_pcr"
##
## Call:
## lm(formula = formula_q8, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1325.45 -262.95 -18.78 242.66 1773.68
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4760.336 232.020 20.517 < 2e-16 ***
## pcrPOSITIVA 12.712 48.672 0.261 0.794
## age_pcr -19.155 3.382 -5.664 2.61e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 384.8 on 459 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.06616, Adjusted R-squared: 0.06209
## F-statistic: 16.26 on 2 and 459 DF, p-value: 1.506e-07
##
## [1] "Effect Sizes (Partial Eta Squared):"
## # Effect Size for ANOVA (Type III)
##
## Parameter | Eta2 (partial) | 95% CI
## -----------------------------------------
## pcr | 1.49e-04 | [0.00, 1.00]
## age_pcr | 0.07 | [0.03, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
## [1] "Checking assumptions for lm for left_hippocampus :"
## [1] "--- Analyzing: right_amygdala ---"
## [1] "Correlation between right_amygdala and Cognitive Score:"
##
## Pearson's product-moment correlation
##
## data: data[[neuro_var]] and data$cognitive
## t = 1.0567, df = 461, p-value = 0.2912
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.04216545 0.13965835
## sample estimates:
## cor
## 0.04915368
##
## [1] "Linear Model: right_amygdala ~ pcr + age_pcr"
##
## Call:
## lm(formula = formula_q8, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -550.59 -127.42 -26.96 103.96 811.74
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1374.489 113.518 12.108 <2e-16 ***
## pcrPOSITIVA 2.158 23.813 0.091 0.928
## age_pcr -3.964 1.655 -2.395 0.017 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 188.3 on 459 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.01249, Adjusted R-squared: 0.008187
## F-statistic: 2.903 on 2 and 459 DF, p-value: 0.05588
##
## [1] "Effect Sizes (Partial Eta Squared):"
## # Effect Size for ANOVA (Type III)
##
## Parameter | Eta2 (partial) | 95% CI
## -----------------------------------------
## pcr | 1.79e-05 | [0.00, 1.00]
## age_pcr | 0.01 | [0.00, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
## [1] "Checking assumptions for lm for right_amygdala :"
## [1] "--- Analyzing: left_amygdala ---"
## [1] "Correlation between left_amygdala and Cognitive Score:"
##
## Pearson's product-moment correlation
##
## data: data[[neuro_var]] and data$cognitive
## t = 1.3678, df = 461, p-value = 0.1721
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.02771658 0.15381339
## sample estimates:
## cor
## 0.06357426
##
## [1] "Linear Model: left_amygdala ~ pcr + age_pcr"
##
## Call:
## lm(formula = formula_q8, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -527.45 -122.77 -20.61 89.47 800.90
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1233.465 111.054 11.107 <2e-16 ***
## pcrPOSITIVA 17.400 23.296 0.747 0.456
## age_pcr -2.388 1.619 -1.475 0.141
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 184.2 on 459 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.006286, Adjusted R-squared: 0.001956
## F-statistic: 1.452 on 2 and 459 DF, p-value: 0.2353
##
## [1] "Effect Sizes (Partial Eta Squared):"
## # Effect Size for ANOVA (Type III)
##
## Parameter | Eta2 (partial) | 95% CI
## -----------------------------------------
## pcr | 1.21e-03 | [0.00, 1.00]
## age_pcr | 4.72e-03 | [0.00, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
## [1] "Checking assumptions for lm for left_amygdala :"
print("--- End of Updated Analysis Script ---")
## [1] "--- End of Updated Analysis Script ---"
This analysis examined the relationship between four subcortical brain regions (right_hippocampus, left_hippocampus, right_amygdala, left_amygdala) and both the general cognitive score and pcr status (controlling for age_pcr).
Here’s a breakdown by region:
Correlation with Cognition: There is a statistically significant, albeit small, positive correlation between the right hippocampus measure and the cognitive score (r = 0.13, p = 0.004). This suggests individuals with larger right hippocampal measures tend to have slightly higher cognitive scores in this sample.
Linear Model (vs. PCR Status & Age):
age_pcr has a highly significant negative association (p < 0.001), indicating that older age is linked to smaller right hippocampal measures. The partial eta squared (η²p = 0.06) suggests age accounts for about 6% of the variance in the right hippocampus measure that is not explained by PCR status, a medium effect by Cohen's conventional benchmarks (0.01 small, 0.06 medium, 0.14 large).
pcr status (positive vs. negative) was not significantly associated with the right hippocampus measure after controlling for age (p = 0.69). The effect size (η²p = 0.0003) is negligible.
Correlation with Cognition: Similar to the right side, there’s a significant, small positive correlation between the left hippocampus measure and the cognitive score (r = 0.13, p = 0.004).
Linear Model (vs. PCR Status & Age):
age_pcr again shows a highly significant negative association (p < 0.001), linking older age to smaller left hippocampal measures. The effect size (η²p = 0.07) is similar to the right side (around 7% of variance explained).
pcr status was not significantly associated with the left hippocampus measure after controlling for age (p = 0.79). The effect size (η²p = 0.0001) is negligible.
Correlation with Cognition: There was no significant correlation found between the right amygdala measure and the cognitive score (p = 0.29).
Linear Model (vs. PCR Status & Age):
age_pcr shows a significant negative association (p = 0.017), suggesting older age is linked to smaller right amygdala measures, although the effect size is small (η²p = 0.01).
pcr status was not significantly associated with the right amygdala measure after controlling for age (p = 0.93). The effect size (η²p ≈ 0) is negligible.
The overall model did not reach significance at the 0.05 level (p = 0.056).
Correlation with Cognition: There was no significant correlation found between the left amygdala measure and the cognitive score (p = 0.17).
Linear Model (vs. PCR Status & Age):
Neither age_pcr (p = 0.14) nor pcr status (p = 0.46) was significantly associated with the left amygdala measure.
Effect sizes were negligible (η²p < 0.005 for both).
The overall model was not statistically significant (p = 0.24).
Age Effect: Age shows a significant negative relationship with the hippocampal measures on both sides and with the right amygdala measure, but not with the left amygdala measure (p = 0.14). A negative association is expected, in line with the age-related decline typically reported for these structures.
Cognitive Correlation: Both left and right hippocampal measures show a weak positive association with the general cognitive score used in this study. Amygdala measures did not show this association.
PCR Status Effect: These analyses provide no statistically significant evidence that a positive COVID-19 PCR test is associated with differences in the hippocampus or amygdala measures (left or right) in this sample, after accounting for age. The effect sizes for PCR status were negligible in all four regions (η²p ≤ 0.0012).
Model Fit: The linear models explained only a small share of the variance in the neuroimaging measures (adjusted R-squared from about 0.2% for the left amygdala to about 6% for the hippocampi), and that share was driven primarily by age.
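Because the same tests are repeated across four regions, the raw p-values can be adjusted for multiple comparisons. The sketch below applies the Benjamini-Hochberg method with base R's `p.adjust()` to the p-values printed in the output above. Treating each set of four tests (correlation with cognition, PCR term, age term) as its own family is an assumption made here for illustration, not a choice stated in the original analysis.

```r
regions <- c("right_hippocampus", "left_hippocampus", "right_amygdala", "left_amygdala")

# Raw p-values copied from the output above
p_cor_cog <- setNames(c(0.004224, 0.00396, 0.2912, 0.1721), regions)  # correlation with cognitive score
p_pcr     <- setNames(c(0.692, 0.794, 0.928, 0.456), regions)         # pcrPOSITIVA term
p_age     <- setNames(c(3.47e-08, 2.61e-08, 0.017, 0.141), regions)   # age_pcr term

# Benjamini-Hochberg adjustment within each family of four tests (assumed grouping)
adjusted <- data.frame(
  cor_cog_BH = p.adjust(p_cor_cog, method = "BH"),
  pcr_BH     = p.adjust(p_pcr, method = "BH"),
  age_BH     = p.adjust(p_age, method = "BH")
)
print(signif(adjusted, 3))
```

Under this grouping the conclusions do not change: the hippocampal correlations with cognition and the age effects on both hippocampi and the right amygdala remain below 0.05 after adjustment, and the PCR terms remain far from significance.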
# Impute the data using the MICE package
library(mice)
# Use the CART imputation method
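imp <- mice(data[2:91], m = 5, maxit = 5, method = "cart", seed = 123)
# Extract a completed dataset. complete() returns only the first of the m = 5 imputations by default,
# so analyses run on imputed_data do not reflect between-imputation variability.
imputed_data <- mice::complete(imp)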
# names(imputed_data)
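Working from a single completed dataset ignores the uncertainty introduced by imputation. A common alternative is to fit the model in each of the m datasets and combine the results with Rubin's rules. The sketch below shows the pooling arithmetic in base R for one coefficient. The estimates and standard errors are hypothetical numbers chosen for illustration, and `pool_rubin` is a hypothetical helper name.

```r
# Hypothetical per-imputation results for one coefficient (m = 5); not taken from the study
est <- c(-0.034, -0.037, -0.035, -0.036, -0.033)  # point estimates
se  <- c(0.0100, 0.0098, 0.0101, 0.0099, 0.0102)  # standard errors

# Rubin's rules: pooled estimate and total variance across m imputations
pool_rubin <- function(est, se) {
  m <- length(est)
  q_bar <- mean(est)                 # pooled point estimate
  u_bar <- mean(se^2)                # within-imputation variance
  b <- var(est)                      # between-imputation variance
  t_var <- u_bar + (1 + 1 / m) * b   # total variance
  c(estimate = q_bar, se = sqrt(t_var))
}

pooled <- pool_rubin(est, se)
print(pooled)
```

In mice itself, the same combination is available by fitting with `with(imp, ...)` and passing the result to `pool()`.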
library(EGAnet)
# Run EGA on the imputed data
# EGA on the Symptoms and Demographics
fit_demo <- EGA(imputed_data[c(1:3, 5, 8:16, 18:22)])
# Scale cognitive variables before running the EGA
fit_cog <- EGA(scale(imputed_data[23:59]))
# Confirmatory Factor Analysis from EGA model
cfa <- CFA(fit_cog, plot = TRUE, data = (scale(imputed_data[23:59])), estimator = "MLR")
## [1] "listaprimerrec" "listaaprendizaje" "listacp" "listalp"
## [5] "listarecon"
## [1] "corsidirecto" "corsiinverso" "cactusvivos" "cactusinanim" "otverbalerr"
## [6] "dscorr" "dscomis" "bostonsc" "bostonlat" "bostonsemerr"
## [11] "bostonfonerr" "fluencia"
## [1] "otverbaltpo" "otvisualtpo" "x5dreadtpo" "x5dcounttpo" "x5dfoctpo"
## [6] "x5dswitchtpo"
## [1] "otvisualerr" "otmentaltpo" "otmentalerr" "otvismenttpo" "otvismenterr"
## [6] "otswitchtpo" "otswitcherr"
## [1] "x5dreaderr" "x5dcounterr" "x5dfocerr" "x5dswitcherr"
## [1] "torremov" "torretpo"
# Scale brain variables before running the EGA
fit_all <- EGA(scale(imputed_data[60:90]))
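The function `compare_groups_auto_v4`, defined next, tests each outcome by comparing a model that includes the group term against a covariates-only model with a likelihood ratio test. The sketch below illustrates that comparison for a binary outcome on simulated data, once through `anova(..., test = "LRT")` and once by hand from the two log-likelihoods. The data and variable names are made up, and the sketch is a simplified illustration under those assumptions rather than an excerpt of the function.

```r
# Simulated data; variable names are illustrative only
set.seed(42)
n <- 300
d <- data.frame(
  group = factor(rep(c("A", "B"), each = n / 2)),
  age = rnorm(n, mean = 50, sd = 10)
)
# Binary outcome whose log-odds depend on age and group
lin_pred <- -2 + 0.03 * d$age + 0.8 * (d$group == "B")
d$y <- rbinom(n, size = 1, prob = plogis(lin_pred))

# Null model (covariates only) and full model (group + covariates)
fit_null <- glm(y ~ age, data = d, family = binomial(link = "logit"))
fit_full <- glm(y ~ group + age, data = d, family = binomial(link = "logit"))

# Likelihood ratio test via anova()
p_anova <- anova(fit_null, fit_full, test = "LRT")[["Pr(>Chi)"]][2]

# Same test by hand: 2 * (logLik_full - logLik_null) compared to a chi-squared distribution
lr_stat <- as.numeric(2 * (logLik(fit_full) - logLik(fit_null)))
df_diff <- attr(logLik(fit_full), "df") - attr(logLik(fit_null), "df")
p_manual <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

print(c(anova = p_anova, manual = p_manual))
```

For a binomial model the two routes give the same p-value, because the deviance difference equals twice the log-likelihood difference. The by-hand route is the one that also works for model classes without an `anova()` LRT method.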
#' Compare Groups Across Multiple Variables Automatically, Adjusting for Covariates
#'
#' This function iterates through specified variables in a dataset, determines their type,
#' fits appropriate generalized linear models (GLM), ordinal, or multinomial models,
#' and compares a model with the group predictor (and optional covariates) to a
#' null model (with optional covariates only) using a Likelihood Ratio Test (LRT).
#' It reports p-values and FDR-adjusted p-values for the group effect, adjusted for covariates.
#'
#' @param data A data.frame containing the variables to compare, the grouping variable,
#' and any covariates.
#' @param group A character string specifying the name of the column in `data` that
#' contains the grouping factor. Must have at least 2 levels.
#' @param vars_to_test A character vector specifying the names of the outcome variables
#' in `data` to be tested against the `group`. If NULL (default), all columns
#' except `group` and `covariates` are tested.
#' @param covariates A character vector specifying the names of the columns in `data` to
#' be used as covariates in the regression models. Default is `NULL` (no covariates).
#' These variables will be included in both the full and null models.
#' @param alpha The significance level (default 0.05) used to determine significance
#' based on the FDR-adjusted p-value of the group effect.
#' @param zero_threshold For continuous outcome variables, the proportion of zero values
#' (default 0.3) that must be exceeded before a Gamma GLM is used. The Gamma GLM is chosen
#' only when this threshold and `skew_threshold` are both exceeded.
#' @param skew_threshold For continuous outcome variables, the absolute skewness
#' (default 2) that must be exceeded before a Gamma GLM is used (together with `zero_threshold`).
#' Note: Uses `moments::skewness`.
#' @param verbose Logical indicating whether to print progress messages (default TRUE).
#' @param group_ref Optional. A string specifying the reference level for the group variable.
#' If NULL (default), the first level is used.
#'
#' @return A data.frame summarizing the results for each variable tested, including:
#' \item{Variable}{Name of the outcome variable tested.}
#' \item{Type}{Detected type of the outcome variable (e.g., "Continuous (Gaussian GLM used)", "Binary", "Ordinal", "Categorical", "Constant Outcome Variable").}
#' \item{Covariates_Used}{Comma-separated string of covariates included in the models for this variable.}
#' \item{n_obs}{Number of observations used for the test after handling missing data for the outcome, group, and all covariates.}
#' \item{Status}{Outcome of the modeling ("OK", "Error: [message]", "Constant Outcome Variable", "No DF improvement", "Low N or too many NAs", "Unsupported outcome variable type", "Log-Likelihood calculation failed...").}
#' \item{p_value}{The raw p-value from the Likelihood Ratio Test for the `group` effect, formatted as a string.}
#' \item{p_value_FDR}{The group p-value adjusted for multiple comparisons using the Benjamini-Hochberg FDR method, formatted as a string.}
#' \item{Significant}{"Yes" if p_value_FDR < alpha, "No" otherwise (based on non-NA adjusted p-values).}
#' \item{Gamma_Shift_Warning}{A flag ("Yes"/"No") indicating if non-positive outcome values were shifted for the Gamma GLM.}
#' \item{Convergence_Warning}{A flag ("Yes"/"No") indicating if GLM/Multinom models reported non-convergence or if logLik calculation failed for Ordinal models (proxy for potential issues).}
#'
#' @details
#' - **Covariate Adjustment:** Compares `model(y ~ group + covariates)` vs `model(y ~ covariates)` (or `y ~ group` vs `y ~ 1` if no covariates).
#' - **Variable Selection:** Uses `vars_to_test` or defaults to all other variables.
#' - **Variable Types & Modeling:** Automatically detects outcome type. Uses Gaussian GLM (default continuous), Gamma GLM (skewed/zeros continuous), Binomial GLM (binary), `MASS::polr` (ordinal), `nnet::multinom` (categorical). Covariates used as-is (ensure factor/numeric).
#' - **Convergence:** Checks convergence flags for GLM/Multinom. For Ordinal (`polr`), failure to calculate log-likelihood is used as a proxy for potential convergence/Hessian issues and flagged.
#' - **Missing Data:** Uses complete cases for outcome, group, and all covariates per test.
#' - **Group Reference Level:** Users can specify the reference level for the group variable using `group_ref`. If the specified level is not present in the data for a particular variable after removing NAs, a warning is issued, and the default reference level is used.
#' - **Dependencies:** Requires 'MASS', 'nnet', and 'moments' packages.
#'
#' @examples
#' \dontrun{
#' set.seed(123)
#' sample_data <- data.frame(
#' Site = factor(rep(c("Site1", "Site2"), each = 60)),
#' Age = rnorm(120, mean = 50, sd = 10),
#' Education = sample(10:20, 120, replace = TRUE),
#' Sex = factor(sample(c("M", "F"), 120, replace = TRUE)),
#' CognitiveScore = rnorm(120, mean = rep(c(100, 105), each = 60) + 0.1 * Age - 5 * (Sex == "F")),
#' BiomarkerA = rgamma(120, shape = rep(c(2, 3), each = 60) + 0.05 * Age, scale = 1.5),
#' Diagnosis = factor(sample(c("Normal", "MCI", "AD"), 120, replace = TRUE, prob = c(0.6, 0.3, 0.1))),
#' OrdinalResp = factor(sample(1:5, 120, replace = TRUE), ordered = TRUE),
#' ConstantVar = rep(10, 120)
#' )
#' sample_data$Age[sample(1:120, 10)] <- NA
#' sample_data$CognitiveScore[sample(1:120, 5)] <- NA
#' sample_data$BiomarkerA[1:5] <- NA
#'
#' results_with_cov <- compare_groups_auto_v4(
#' data = sample_data,
#' group = "Site",
#' vars_to_test = c("CognitiveScore", "BiomarkerA", "Diagnosis", "OrdinalResp", "ConstantVar"),
#' covariates = c("Age", "Education", "Sex"),
#' verbose = TRUE
#' )
#' print(results_with_cov)
#'
#' # Example with specified group reference level
#' results_with_ref <- compare_groups_auto_v4(
#' data = sample_data,
#' group = "Site",
#' vars_to_test = c("CognitiveScore", "BiomarkerA", "Diagnosis", "OrdinalResp", "ConstantVar"),
#' covariates = c("Age", "Education", "Sex"),
#' group_ref = "Site2",
#' verbose = TRUE
#' )
#' print(results_with_ref)
#' }
#' @importFrom stats glm gaussian binomial Gamma logLik pchisq p.adjust complete.cases sd anova reformulate relevel na.omit
#' @importFrom MASS polr
#' @importFrom nnet multinom
#' @importFrom moments skewness
#' @export
compare_groups_auto_v4 <- function(data, group, vars_to_test = NULL, covariates = NULL, alpha = 0.05,
zero_threshold = 0.3, skew_threshold = 2, verbose = TRUE, group_ref = NULL) {
# --- 1. Input Validation and Package Checks ---
if (!requireNamespace("MASS", quietly = TRUE)) stop("Package 'MASS' is required. Please install it.")
if (!requireNamespace("nnet", quietly = TRUE)) stop("Package 'nnet' is required. Please install it.")
if (!requireNamespace("moments", quietly = TRUE)) stop("Package 'moments' is required. Please install it.")
if (!is.data.frame(data)) stop("'data' must be a data.frame.")
if (!is.character(group) || length(group) != 1 || !(group %in% names(data))) {
stop("'group' must be a single string naming an existing column in 'data'.")
}
if (!is.null(covariates)) {
if (!is.character(covariates) || any(!covariates %in% names(data))) {
stop("'covariates' must be NULL or a character vector of existing column names in 'data'.")
}
if (any(covariates %in% group)) stop("The 'group' variable cannot also be listed in 'covariates'.")
covariates <- unique(covariates)
} else {
covariates <- character(0)
}
if (!is.null(vars_to_test)) {
if (!is.character(vars_to_test) || any(!vars_to_test %in% names(data))) {
stop("'vars_to_test' must be NULL or a character vector of existing column names in 'data'.")
}
if (any(vars_to_test %in% c(group, covariates))) stop("Variables in 'vars_to_test' cannot include the 'group' or 'covariates'.")
vars_to_test <- unique(vars_to_test)
}
if (!is.null(group_ref) && (!is.character(group_ref) || length(group_ref) != 1)) {
stop("'group_ref' must be NULL or a single string.")
}
if (!is.numeric(alpha) || alpha <= 0 || alpha >= 1) stop("'alpha' must be a numeric value strictly between 0 and 1.")
if (!is.numeric(zero_threshold) || zero_threshold < 0 || zero_threshold > 1) stop("'zero_threshold' must be numeric between 0 and 1.")
if (!is.numeric(skew_threshold) || skew_threshold < 0) stop("'skew_threshold' must be non-negative numeric.")
if (!is.logical(verbose)) stop("'verbose' must be a logical value (TRUE or FALSE).")
# --- 2. Prepare Data and Identify Variables ---
if (!is.factor(data[[group]])) {
if (verbose) message("Converting grouping variable '", group, "' to factor.")
data[[group]] <- as.factor(data[[group]])
}
if (nlevels(data[[group]]) < 2) stop("Grouping variable '", group, "' must have at least two levels.")
if (!is.null(group_ref)) {
if (!(group_ref %in% levels(data[[group]]))) {
stop(sprintf("'group_ref' '%s' is not a level of the group variable '%s'.", group_ref, group))
}
}
if (is.null(vars_to_test)) {
var_names <- setdiff(names(data), c(group, covariates))
} else {
var_names <- vars_to_test
}
if (length(var_names) == 0) {
warning("No variables identified for testing after excluding 'group' and 'covariates'.")
return(data.frame(Variable = character(), Type = character(), Covariates_Used = character(),
n_obs = integer(), Status = character(), p_value = character(),
p_value_FDR = character(), Significant = character(),
Gamma_Shift_Warning = character(), Convergence_Warning = character(),
stringsAsFactors = FALSE))
}
if (verbose) {
message(sprintf("Starting analysis of %d variables with '%s' as the grouping variable.",
length(var_names), group))
if (length(covariates) > 0) {
message(sprintf("Using %d covariates: %s", length(covariates), paste(covariates, collapse = ", ")))
}
}
# --- 3. Internal Helper Function: Model Fitting and LRT ---
fit_models_and_lrt <- function(temp_data, formula_full, formula_null, type, variable, verbose,
zero_threshold, skew_threshold) {
gamma_shift_warning <- FALSE
convergence_warning <- FALSE
p_val <- NA_real_
model_status <- "OK"
type_used <- type
data_for_model <- temp_data
outcome <- data_for_model$y
res <- tryCatch({
if (type == "Continuous") {
zero_prop <- mean(outcome == 0, na.rm = TRUE)
skew_val <- tryCatch(moments::skewness(outcome, na.rm = TRUE), error = function(e) NA_real_)
use_gamma <- !is.na(skew_val) && (zero_prop > zero_threshold) && (abs(skew_val) > skew_threshold)
if (is.na(skew_val) && verbose) {
message(sprintf("Skewness calculation failed for '%s'. Using Gaussian GLM.", variable))
}
if (use_gamma) {
min_val <- min(outcome, na.rm = TRUE)
if (min_val <= 0) {
shift_amount <- max(1e-6, abs(min_val) * 1.01 + 1e-6)
outcome <- outcome + shift_amount
data_for_model$y <- outcome
gamma_shift_warning <- TRUE
if (verbose) warning(sprintf("Variable '%s': Contains non-positive values. Added shift of ~%s for Gamma GLM fitting.",
variable, format(shift_amount, digits = 2)), call. = FALSE, immediate. = TRUE)
}
model_full <- stats::glm(formula = formula_full, data = data_for_model, family = stats::Gamma(link = "log"))
model_null <- stats::glm(formula = formula_null, data = data_for_model, family = stats::Gamma(link = "log"))
type_used <- "Continuous (Gamma GLM used)"
if (!model_full$converged || !model_null$converged) {
convergence_warning <- TRUE
if (verbose) warning(sprintf("Gamma GLM did not converge for '%s'.", variable), call. = FALSE)
}
} else {
model_full <- stats::glm(formula = formula_full, data = data_for_model, family = stats::gaussian(link = "identity"))
model_null <- stats::glm(formula = formula_null, data = data_for_model, family = stats::gaussian(link = "identity"))
type_used <- "Continuous (Gaussian GLM used)"
if (!model_full$converged || !model_null$converged) {
convergence_warning <- TRUE
if (verbose) warning(sprintf("Gaussian GLM did not converge for '%s'.", variable), call. = FALSE)
}
}
} else if (type == "Binary") {
model_full <- stats::glm(formula = formula_full, data = data_for_model, family = stats::binomial(link = "logit"))
model_null <- stats::glm(formula = formula_null, data = data_for_model, family = stats::binomial(link = "logit"))
type_used <- "Binary"
if (!model_full$converged || !model_null$converged) {
convergence_warning <- TRUE
if (verbose) warning(sprintf("Binomial GLM did not converge for '%s'.", variable), call. = FALSE)
}
} else if (type == "Ordinal") {
model_full <- MASS::polr(formula = formula_full, data = data_for_model, method = "logistic", Hess = TRUE)
model_null <- MASS::polr(formula = formula_null, data = data_for_model, method = "logistic", Hess = TRUE)
type_used <- "Ordinal"
} else if (type == "Categorical") {
model_full <- nnet::multinom(formula = formula_full, data = data_for_model, trace = FALSE)
model_null <- nnet::multinom(formula = formula_null, data = data_for_model, trace = FALSE)
type_used <- "Categorical"
if ((!is.null(model_full$convergence) && model_full$convergence != 0) ||
(!is.null(model_null$convergence) && model_null$convergence != 0)) {
convergence_warning <- TRUE
if (verbose) warning(sprintf("Multinomial model did not fully converge for '%s'.", variable), call. = FALSE)
}
} else {
stop("Internal error: Unknown model type.")
}
# --- Likelihood Ratio Test ---
if (inherits(model_full, "glm")) {
lrt_result <- suppressWarnings(stats::anova(model_null, model_full, test = "LRT"))
if (!is.null(lrt_result) && "Pr(>Chi)" %in% names(lrt_result) && nrow(lrt_result) > 1) {
candidate <- lrt_result$"Pr(>Chi)"[2]
if (!is.na(candidate) && is.finite(candidate)) {
p_val <- candidate
} else {
if (!is.na(lrt_result$Df[2]) && lrt_result$Df[2] <= 0) {
model_status <- "No DF improvement/equivalent models"
p_val <- 1.0
} else {
model_status <- "LRT p-value calculation failed (anova)"
p_val <- NA_real_
}
}
} else {
model_status <- "LRT failed (anova output invalid/models identical)"
p_val <- NA_real_
}
} else {
ll_full <- tryCatch(stats::logLik(model_full), error = function(e) NA)
ll_null <- tryCatch(stats::logLik(model_null), error = function(e) NA)
if (is.na(ll_full) || is.na(ll_null)) {
model_status <- "Log-Likelihood calculation failed (NA/NaN)"
if (inherits(model_full, "polr")) {
model_status <- paste(model_status, "- potential polr convergence?")
convergence_warning <- TRUE
}
p_val <- NA_real_
} else {
lr_stat <- 2 * (as.numeric(ll_full) - as.numeric(ll_null))
df_full <- attr(ll_full, "df")
df_null <- attr(ll_null, "df")
if (is.null(df_full) || is.null(df_null)) {
model_status <- "Could not retrieve degrees of freedom for LRT"
p_val <- NA_real_
} else {
df_diff <- df_full - df_null
if (df_diff > .Machine$double.eps^0.5) {
if (lr_stat < -1e-8) {
model_status <- sprintf("Negative LRT statistic (%.2g)", lr_stat)
p_val <- NA_real_
} else {
p_val <- stats::pchisq(max(0, lr_stat), df = df_diff, lower.tail = FALSE)
}
} else {
p_val <- 1.0
model_status <- "No DF improvement"
}
}
}
}
list(p_value = p_val, model_status = model_status, type_used = type_used,
gamma_shift_warning = gamma_shift_warning, convergence_warning = convergence_warning)
}, error = function(e) {
msg <- paste("Error:", gsub("[\\r\\n\\t]+", " ", conditionMessage(e)))
if (verbose) message(sprintf("Model fitting/LRT failed for '%s'. Status: %s", variable, msg))
list(p_value = NA_real_, model_status = msg, type_used = type,
gamma_shift_warning = gamma_shift_warning, convergence_warning = convergence_warning)
})
return(res)
}
# --- 4. Loop Through Variables ---
res_list <- vector("list", length(var_names))
names(res_list) <- var_names
for (i in seq_along(var_names)) {
variable <- var_names[i]
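# Report roughly every 10% of variables. `(i - 1) %% step == 0` also fires when step is 1
# (6 to 10 variables), whereas `i %% 1 == 1` is never TRUE.
if (verbose && length(var_names) > 5 && (i - 1) %% ceiling(length(var_names) / 10) == 0) {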
message(sprintf("Processing variable %d of %d: %s", i, length(var_names), variable))
}
# Initialize default result for the variable
res <- data.frame(
Variable = variable,
Type = "Unknown",
Covariates_Used = paste(covariates, collapse = ", "),
n_obs = 0L,
Status = "Not processed",
p_value = NA_real_,
p_value_FDR = NA_real_,
Significant = "No",
Gamma_Shift_Warning = "No",
Convergence_Warning = "No",
stringsAsFactors = FALSE
)
# --- Data Extraction and Missing Data Handling ---
current_cols <- c(variable, group, covariates)
full_data <- tryCatch(data[, current_cols, drop = FALSE], error = function(e) NULL)
if (is.null(full_data)) {
res$Status <- paste("Error accessing column:", variable)
res_list[[variable]] <- res
next
}
complete_idx <- stats::complete.cases(full_data)
n_complete <- sum(complete_idx)
res$n_obs <- n_complete
min_obs_needed <- (nlevels(data[[group]]) - 1) + length(covariates) + nlevels(data[[group]]) + 5
if (n_complete < min_obs_needed || n_complete < (2 * nlevels(data[[group]]))) {
res$Status <- "Low N or too many NAs for model complexity"
res_list[[variable]] <- res
next
}
temp_data <- full_data[complete_idx, , drop = FALSE]
names(temp_data)[names(temp_data) == variable] <- "y"
temp_data[[group]] <- factor(temp_data[[group]])
# Check if group has at least two levels after removing NAs
if (nlevels(temp_data[[group]]) < 2) {
res$Status <- "Grouping variable has less than 2 levels after removing NAs"
res_list[[variable]] <- res
next
}
# Set reference level for group if specified and present
if (!is.null(group_ref) && group_ref %in% levels(temp_data[[group]])) {
temp_data[[group]] <- relevel(temp_data[[group]], ref = group_ref)
} else if (!is.null(group_ref)) {
warning(sprintf("For variable '%s', the specified 'group_ref' '%s' is not present in the data after removing NAs. Using default reference level.", variable, group_ref))
}
x <- temp_data$y
# --- Check for Constant Outcome Variable ---
is_constant <- FALSE
if (is.numeric(x)) {
if (is.na(stats::sd(x, na.rm = TRUE)) || stats::sd(x, na.rm = TRUE) < .Machine$double.eps^0.5) {
is_constant <- TRUE
}
} else {
if (length(unique(stats::na.omit(x))) <= 1) is_constant <- TRUE
}
if (is_constant) {
res$Status <- "Constant Outcome Variable"
res$Type <- if (is.numeric(x)) "Numeric Constant" else "Factor/Char Constant"
res_list[[variable]] <- res
next
}
# --- Determine Outcome Variable Type ---
type <- "Unknown"
if (is.numeric(x)) {
unique_vals <- unique(stats::na.omit(x))
n_unique <- length(unique_vals)
is_int_like_binary <- n_unique == 2 && all(abs(unique_vals - round(unique_vals)) < .Machine$double.eps^0.5)
if (is_int_like_binary) {
type <- "Binary"
temp_data$y <- factor(temp_data$y, levels = sort(unique_vals))
} else {
type <- "Continuous"
}
} else if (is.factor(x)) {
current_levels <- levels(droplevels(factor(x)))
n_levels <- length(current_levels)
original_levels <- levels(data[[variable]])
if (is.ordered(x)) {
if (n_levels < 2) {
res$Status <- "Ordinal outcome with < 2 levels after NA removal"
res_list[[variable]] <- res
next
}
type <- "Ordinal"
temp_data$y <- factor(temp_data$y, levels = intersect(original_levels, current_levels), ordered = TRUE)
} else {
if (n_levels == 2) {
type <- "Binary"
temp_data$y <- factor(temp_data$y, levels = current_levels)
} else if (n_levels > 2) {
type <- "Categorical"
temp_data$y <- stats::relevel(factor(temp_data$y, levels = current_levels), ref = current_levels[1])
} else {
res$Status <- "Factor outcome with < 2 levels after NA removal"
res_list[[variable]] <- res
next
}
}
} else if (is.character(x)) {
temp_data$y <- factor(temp_data$y)
current_levels <- levels(temp_data$y)
n_levels <- length(current_levels)
if (n_levels == 2) {
type <- "Binary"
} else if (n_levels > 2) {
type <- "Categorical"
temp_data$y <- stats::relevel(temp_data$y, ref = current_levels[1])
} else {
res$Status <- "Character outcome with < 2 levels after NA removal"
res_list[[variable]] <- res
next
}
} else {
res$Status <- "Unsupported outcome variable type"
res$Type <- class(x)[1]
res_list[[variable]] <- res
next
}
res$Type <- type
# --- Ensure Covariates are Factor or Numeric ---
for (covar in covariates) {
if (!is.numeric(temp_data[[covar]]) && !is.factor(temp_data[[covar]])) {
if (verbose) message("Converting covariate '", covar, "' to factor for variable '", variable, "'.")
temp_data[[covar]] <- factor(temp_data[[covar]])
}
}
# --- Construct Model Formulas ---
terms_full <- c(group, covariates)
terms_null <- if (length(covariates) > 0) covariates else "1"
formula_full <- tryCatch(stats::reformulate(termlabels = terms_full, response = "y"),
error = function(e) NULL)
formula_null <- tryCatch(stats::reformulate(termlabels = terms_null, response = "y"),
error = function(e) NULL)
if (is.null(formula_full) || is.null(formula_null)) {
res$Status <- "Error: Failed to construct model formulas"
res_list[[variable]] <- res
next
}
# --- Fit Models and Perform LRT via Helper Function ---
model_out <- fit_models_and_lrt(temp_data, formula_full, formula_null, type, variable, verbose,
zero_threshold, skew_threshold)
res$p_value <- model_out$p_value
res$Status <- model_out$model_status
res$Type <- model_out$type_used
if (model_out$gamma_shift_warning) res$Gamma_Shift_Warning <- "Yes"
if (model_out$convergence_warning) res$Convergence_Warning <- "Yes"
res_list[[variable]] <- res
# --- Periodic Garbage Collection ---
if (i %% 50 == 0 && (nrow(data) * ncol(data) > 1e6 || length(var_names) > 100)) {
if (verbose) message("Running garbage collection...")
gc(verbose = FALSE)
}
}
# --- 5. Combine and Adjust Results ---
res_df <- do.call(rbind, res_list)
rownames(res_df) <- NULL
valid_idx <- which(!is.na(res_df$p_value) & is.finite(res_df$p_value))
if (length(valid_idx) > 0) {
res_df$p_value_FDR[valid_idx] <- stats::p.adjust(res_df$p_value[valid_idx], method = "fdr")
res_df$Significant <- ifelse(!is.na(res_df$p_value_FDR) &
res_df$p_value_FDR < alpha, "Yes", "No")
res_df$Significant[is.na(res_df$p_value)] <- "No"
} else {
res_df$p_value_FDR <- NA_real_
res_df$Significant <- "No"
}
# --- 6. Round and Format P-values ---
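# Format p-values as strings; keep missing values as NA rather than the literal string "NA"
res_df$p_value <- ifelse(is.na(res_df$p_value), NA_character_,
ifelse(res_df$p_value < 0.001, "<0.001", sprintf("%.3f", res_df$p_value)))
res_df$p_value_FDR <- ifelse(is.na(res_df$p_value_FDR), NA_character_,
ifelse(res_df$p_value_FDR < 0.001, "<0.001", sprintf("%.3f", res_df$p_value_FDR)))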
final_order <- c("Variable", "Type", "Covariates_Used", "n_obs", "Status",
"p_value", "p_value_FDR", "Significant",
"Gamma_Shift_Warning", "Convergence_Warning")
if (!all(final_order %in% names(res_df))) {
warning("Internal issue: Some expected result columns are missing.")
} else {
res_df <- res_df[, final_order, drop = FALSE]
}
if (verbose) message("Analysis finished. Returning results table.")
return(res_df)
}
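# --- Illustration only (not used by the functions in this file) ---
# A minimal sketch of the likelihood ratio test that the helper computes by hand for
# non-GLM fits: twice the log-likelihood difference is referred to a chi-squared
# distribution whose df is the difference in parameter counts. The function name
# `lrt_sketch` and the toy data are hypothetical. A binomial GLM is used so the manual
# result can be checked against stats::anova(); the two are expected to agree here
# because the binomial dispersion is fixed at 1.
lrt_sketch <- function() {
set.seed(1)
toy <- data.frame(
grp = factor(rep(c("A", "B", "C"), each = 30)),
age = stats::rnorm(90, mean = 50, sd = 10)
)
toy$y <- stats::rbinom(90, size = 1, prob = stats::plogis(-0.5 + 0.8 * (toy$grp == "C")))
# Full model contains the group term; the null model keeps only the covariate
fit_full <- stats::glm(y ~ grp + age, data = toy, family = stats::binomial(link = "logit"))
fit_null <- stats::glm(y ~ age, data = toy, family = stats::binomial(link = "logit"))
ll_full <- stats::logLik(fit_full)
ll_null <- stats::logLik(fit_null)
lr_stat <- 2 * (as.numeric(ll_full) - as.numeric(ll_null))
df_diff <- attr(ll_full, "df") - attr(ll_null, "df")
p_manual <- stats::pchisq(max(0, lr_stat), df = df_diff, lower.tail = FALSE)
# Reference value from anova() on the same pair of nested models
p_anova <- stats::anova(fit_null, fit_full, test = "LRT")$"Pr(>Chi)"[2]
list(lr_stat = lr_stat, df_diff = df_diff, p_manual = p_manual, p_anova = p_anova)
}
# Usage: res <- lrt_sketch(); compare res$p_manual with res$p_anova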
#' Compare Groups Across Multiple Variables Automatically, Adjusting for Covariates
#'
#' This function iterates through specified variables in a dataset, determines their type,
#' fits appropriate generalized linear models (GLM), ordinal, or multinomial models,
#' and compares a model with the group predictor (and optional covariates) to a
#' null model (with optional covariates only) using a Likelihood Ratio Test (LRT).
#' It reports p-values and FDR-adjusted p-values for the group effect, adjusted for covariates.
#'
#' @param data A data.frame containing the variables to compare, the grouping variable,
#' and any covariates.
#' @param group A character string specifying the name of the column in `data` that
#' contains the grouping factor. Must have at least 2 levels.
#' @param vars_to_test A character vector specifying the names of the outcome variables
#' in `data` to be tested against the `group`. If NULL (default), all columns
#' except `group` and `covariates` are tested.
#' @param covariates A character vector specifying the names of the columns in `data` to
#' be used as covariates in the regression models. Default is `NULL` (no covariates).
#' These variables will be included in both the full and null models.
#' @param alpha The significance level (default 0.05) used to determine significance
#' based on the FDR-adjusted p-value of the group effect.
#' @param zero_threshold For continuous outcome variables, the proportion of exact zeros that
#' must be exceeded (default 0.3) before a Gamma GLM is considered. A Gamma GLM is used only
#' when both this threshold and `skew_threshold` are exceeded.
#' @param skew_threshold For continuous outcome variables, the absolute skewness that must be
#' exceeded (default 2) before a Gamma GLM is considered. A Gamma GLM is used only when both
#' this threshold and `zero_threshold` are exceeded. Skewness is computed with `moments::skewness`.
#' @param verbose Logical indicating whether to print progress messages (default TRUE).
#' @param group_ref Optional. A string specifying the reference level for the group variable.
#' If NULL (default), the first level (after NA removal and potential factor conversion) is used.
#'
#' @return A data.frame summarizing the results for each variable tested, including:
#' \item{Variable}{Name of the outcome variable tested.}
#' \item{Type}{Detected type of the outcome variable (e.g., "Continuous (Gaussian GLM used)", "Binary", "Ordinal", "Categorical", "Constant Outcome Variable").}
#' \item{Covariates_Used}{Comma-separated string of covariates included in the models for this variable.}
#' \item{Group_Ref_Level_Used}{The actual reference level used for the group factor in the model for this variable.}
#' \item{n_obs}{Number of observations used for the test after handling missing data for the outcome, group, and all covariates.}
#' \item{Status}{Outcome of the modeling ("OK", "Error during model/LRT: [message]", "Constant Outcome Variable", "No DF improvement", "Low N (...) for model complexity...", "Unsupported outcome variable type", "Log-Likelihood calculation failed...").}
#' \item{p_value}{The raw p-value from the Likelihood Ratio Test for the `group` effect, formatted as a string.}
#' \item{p_value_FDR}{The group p-value adjusted for multiple comparisons using the Benjamini-Hochberg FDR method, formatted as a string.}
#' \item{Significant}{"Yes" if p_value_FDR < alpha, "No" otherwise (based on non-NA adjusted p-values).}
#' \item{Gamma_Shift_Warning}{Logical (`TRUE`/`FALSE`) indicating if non-positive outcome values were shifted for the Gamma GLM.}
#' \item{Convergence_Warning}{Logical (`TRUE`/`FALSE`) indicating if GLM/Multinom models reported non-convergence or if logLik calculation failed for Ordinal models (proxy for potential issues).}
#'
#' @details
#' - **Covariate Adjustment:** Compares `model(y ~ group + covariates)` vs `model(y ~ covariates)` (or `y ~ group` vs `y ~ 1` if no covariates).
#' - **Variable Selection:** Uses `vars_to_test` or defaults to all other variables.
#' - **Variable Types & Modeling:** Automatically detects outcome type. Uses Gaussian GLM (default continuous), Gamma GLM (skewed/zeros continuous), Binomial GLM (binary), `MASS::polr` (ordinal), `nnet::multinom` (categorical). Covariates used as-is (ensure factor/numeric).
#' - **Convergence:** Checks convergence flags for GLM/Multinom. For Ordinal (`polr`), failure to calculate log-likelihood is used as a proxy for potential convergence/Hessian issues (as `polr` lacks a simple flag) and flagged. Note that binomial/multinomial models may also issue warnings or fail related to separation/quasi-separation, which might be reflected in convergence status or errors.
#' - **Missing Data:** Uses complete cases for outcome, group, and all covariates per test.
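#' - **Minimum Observations:** A heuristic check skips a variable unless `n_complete >= params_approx + 5` and `n_complete` is at least twice the number of group levels present after NA removal. Here `params_approx` counts the intercept, one parameter per non-reference group level, and one parameter per covariate (a factor covariate with several levels is still counted once). This is a safeguard, not a guarantee of model stability.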
#' - **Group Reference Level:** Users can specify the reference level for the group variable using `group_ref`. If the specified level is not present in the data for a particular variable after removing NAs, a warning is issued, and the default reference level (first level of the factor in the subset) is used. The actual level used is reported.
#' - **Dependencies:** Requires 'MASS', 'nnet', and 'moments' packages.
#'
#' @examples
#' \dontrun{
#' set.seed(123)
#' # Draw Age and Sex first: arguments to data.frame() cannot refer to one another,
#' # so the columns that depend on them need these vectors to exist beforehand.
#' Age <- rnorm(120, mean = 50, sd = 10)
#' Sex <- factor(sample(c("M", "F"), 120, replace = TRUE))
#' sample_data <- data.frame(
#' Site = factor(rep(c("Site1", "Site2", "Site3"), each = 40)), # Three sites
#' Age = Age,
#' Education = sample(10:20, 120, replace = TRUE),
#' Sex = Sex,
#' CognitiveScore = rnorm(120, mean = rep(c(100, 105, 102), each = 40) + 0.1 * Age - 5 * (Sex == "F")),
#' BiomarkerA = rgamma(120, shape = rep(c(2, 3, 2.5), each = 40) + 0.05 * Age, scale = 1.5),
#' Diagnosis = factor(sample(c("Normal", "MCI", "AD"), 120, replace = TRUE, prob = c(0.6, 0.3, 0.1))),
#' OrdinalResp = factor(sample(1:5, 120, replace = TRUE), ordered = TRUE),
#' ConstantVar = rep(10, 120),
#' LowNVar = c(rnorm(5), rep(NA, 115)) # Variable with few non-NA cases
#' )
#' sample_data$Age[sample(1:120, 10)] <- NA
#' sample_data$CognitiveScore[sample(1:120, 5)] <- NA
#' sample_data$BiomarkerA[1:5] <- NA
#' # Make Site3 rare for CognitiveScore after NA removal
#' sample_data$CognitiveScore[sample(which(sample_data$Site == "Site3"), 35)] <- NA
#'
#' results_final <- compare_groups_auto_v4(
#' data = sample_data,
#' group = "Site",
#' # Test all relevant variables
#' vars_to_test = c("CognitiveScore", "BiomarkerA", "Diagnosis",
#' "OrdinalResp", "ConstantVar", "LowNVar", "Age"),
#' covariates = c("Education", "Sex"), # Using fewer covariates for example
#' group_ref = "Site2", # Specify reference
#' verbose = TRUE
#' )
#' print(results_final)
#' }
#' @importFrom stats glm gaussian binomial Gamma logLik pchisq p.adjust complete.cases sd anova reformulate relevel na.omit
#' @importFrom MASS polr
#' @importFrom nnet multinom
#' @importFrom moments skewness
#' @export
compare_groups_auto_v4 <- function(data, group, vars_to_test = NULL, covariates = NULL, alpha = 0.05,
zero_threshold = 0.3, skew_threshold = 2, verbose = TRUE, group_ref = NULL) {
# --- 1. Input Validation and Package Checks ---
# Ensure required packages are available
if (!requireNamespace("MASS", quietly = TRUE)) stop("Package 'MASS' is required. Please install it.")
if (!requireNamespace("nnet", quietly = TRUE)) stop("Package 'nnet' is required. Please install it.")
if (!requireNamespace("moments", quietly = TRUE)) stop("Package 'moments' is required. Please install it.")
# Validate inputs
if (!is.data.frame(data)) stop("'data' must be a data.frame.")
if (!is.character(group) || length(group) != 1 || !(group %in% names(data))) {
stop("'group' must be a single string naming an existing column in 'data'.")
}
if (!is.null(covariates)) {
if (!is.character(covariates) || any(!covariates %in% names(data))) {
stop("'covariates' must be NULL or a character vector of existing column names in 'data'.")
}
if (any(covariates %in% group)) stop("The 'group' variable cannot also be listed in 'covariates'.")
covariates <- unique(covariates) # Remove duplicates
} else {
covariates <- character(0) # Ensure it's an empty character vector if NULL
}
if (!is.null(vars_to_test)) {
if (!is.character(vars_to_test) || any(!vars_to_test %in% names(data))) {
stop("'vars_to_test' must be NULL or a character vector of existing column names in 'data'.")
}
if (any(vars_to_test %in% c(group, covariates))) stop("Variables in 'vars_to_test' cannot include the 'group' or 'covariates'.")
vars_to_test <- unique(vars_to_test) # Remove duplicates
}
if (!is.null(group_ref) && (!is.character(group_ref) || length(group_ref) != 1)) {
stop("'group_ref' must be NULL or a single string.")
}
# Validate numeric parameters
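# Each check also requires a single non-missing value, so a vector or NA cannot reach `||`
if (!is.numeric(alpha) || length(alpha) != 1 || is.na(alpha) || alpha <= 0 || alpha >= 1) stop("'alpha' must be a single numeric value strictly between 0 and 1.")
if (!is.numeric(zero_threshold) || length(zero_threshold) != 1 || is.na(zero_threshold) || zero_threshold < 0 || zero_threshold > 1) stop("'zero_threshold' must be a single numeric value between 0 and 1.")
if (!is.numeric(skew_threshold) || length(skew_threshold) != 1 || is.na(skew_threshold) || skew_threshold < 0) stop("'skew_threshold' must be a single non-negative numeric value.")
if (!is.logical(verbose) || length(verbose) != 1 || is.na(verbose)) stop("'verbose' must be a single logical value (TRUE or FALSE).")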
# --- 2. Prepare Data and Identify Variables ---
# Ensure group variable is a factor
if (!is.factor(data[[group]])) {
if (verbose) message("Converting grouping variable '", group, "' to factor.")
data[[group]] <- as.factor(data[[group]])
}
original_group_levels <- levels(data[[group]])
if (length(original_group_levels) < 2) stop("Grouping variable '", group, "' must have at least two levels in the original data.")
# Validate group_ref against original levels
if (!is.null(group_ref)) {
if (!(group_ref %in% original_group_levels)) {
stop(sprintf("'group_ref' '%s' is not a level of the original group variable '%s'. Available levels: %s",
group_ref, group, paste(original_group_levels, collapse=", ")))
}
}
# Identify variables to test
if (is.null(vars_to_test)) {
var_names <- base::setdiff(names(data), c(group, covariates))
} else {
var_names <- vars_to_test # Already validated and made unique
}
# Handle case with no variables to test
if (length(var_names) == 0) {
warning("No variables identified for testing after excluding 'group' and 'covariates'.")
return(data.frame(Variable = character(), Type = character(), Covariates_Used = character(),
Group_Ref_Level_Used = character(), n_obs = integer(), Status = character(),
p_value = character(), p_value_FDR = character(), Significant = character(),
Gamma_Shift_Warning = logical(), Convergence_Warning = logical(),
stringsAsFactors = FALSE))
}
# Initial message
if (verbose) {
message(sprintf("Starting analysis of %d variables with '%s' as the grouping variable.",
length(var_names), group))
if (length(covariates) > 0) {
message(sprintf("Using %d covariates: %s", length(covariates), paste(covariates, collapse = ", ")))
}
if (!is.null(group_ref)) message(sprintf("Attempting to use '%s' as the reference level for '%s'.", group_ref, group))
}
# --- 3. Internal Helper Function: Model Fitting and LRT ---
# This function isolates the core modeling logic for a single variable
fit_models_and_lrt <- function(temp_data, formula_full, formula_null, type, variable, verbose,
zero_threshold, skew_threshold) {
# Initialize flags and results
gamma_shift_warning <- FALSE
convergence_warning <- FALSE
p_val <- NA_real_
model_status <- "OK"
type_used <- type
data_for_model <- temp_data # Local copy: R copies on modification, so the caller's data frame is not altered
outcome <- data_for_model$y
# Use tryCatch to handle potential errors during model fitting/comparison
res <- tryCatch({
# --- Model Fitting based on Variable Type ---
if (type == "Continuous") {
zero_prop <- base::mean(outcome == 0, na.rm = TRUE)
# Use moments::skewness safely
skew_val <- tryCatch(moments::skewness(outcome, na.rm = TRUE), error = function(e) NA_real_)
# Decide whether to use Gamma GLM based on thresholds
use_gamma <- !is.na(skew_val) && (zero_prop > zero_threshold) && (abs(skew_val) > skew_threshold)
if (is.na(skew_val) && verbose) {
message(sprintf("Skewness calculation failed for '%s'. Using Gaussian GLM.", variable))
}
if (use_gamma) {
type_used <- "Continuous (Gamma GLM used)"
min_val <- base::min(outcome, na.rm = TRUE)
# Shift non-positive values for Gamma GLM
if (min_val <= 0) {
shift_amount <- base::max(1e-6, abs(min_val) * 1.01 + 1e-6) # Small relative shift
outcome <- outcome + shift_amount
data_for_model$y <- outcome # Update outcome in data used for modeling
gamma_shift_warning <- TRUE
if (verbose) warning(sprintf("Variable '%s': Contains non-positive values. Added shift of ~%s for Gamma GLM fitting.",
variable, format(shift_amount, digits = 2)), call. = FALSE, immediate. = TRUE)
}
# Fit Gamma GLMs
model_full <- stats::glm(formula = formula_full, data = data_for_model, family = stats::Gamma(link = "log"))
model_null <- stats::glm(formula = formula_null, data = data_for_model, family = stats::Gamma(link = "log"))
if (!model_full$converged || !model_null$converged) {
convergence_warning <- TRUE
if (verbose) warning(sprintf("Gamma GLM did not converge for '%s'.", variable), call. = FALSE)
}
} else {
type_used <- "Continuous (Gaussian GLM used)"
# Fit Gaussian GLMs
model_full <- stats::glm(formula = formula_full, data = data_for_model, family = stats::gaussian(link = "identity"))
model_null <- stats::glm(formula = formula_null, data = data_for_model, family = stats::gaussian(link = "identity"))
if (!model_full$converged || !model_null$converged) {
convergence_warning <- TRUE
if (verbose) warning(sprintf("Gaussian GLM did not converge for '%s'.", variable), call. = FALSE)
}
}
} else if (type == "Binary") {
type_used <- "Binary"
# Fit Binomial GLMs
model_full <- stats::glm(formula = formula_full, data = data_for_model, family = stats::binomial(link = "logit"))
model_null <- stats::glm(formula = formula_null, data = data_for_model, family = stats::binomial(link = "logit"))
if (!model_full$converged || !model_null$converged) {
convergence_warning <- TRUE
if (verbose) warning(sprintf("Binomial GLM did not converge for '%s'.", variable), call. = FALSE)
}
} else if (type == "Ordinal") {
type_used <- "Ordinal"
# Fit Proportional Odds Logistic Regression (POLR) models
model_full <- MASS::polr(formula = formula_full, data = data_for_model, method = "logistic", Hess = TRUE)
model_null <- MASS::polr(formula = formula_null, data = data_for_model, method = "logistic", Hess = TRUE)
# Convergence check for polr relies on logLik below
} else if (type == "Categorical") {
type_used <- "Categorical"
# Fit Multinomial Log-linear Models
# Ensure baseline category is explicit (handled in main loop via relevel)
model_full <- nnet::multinom(formula = formula_full, data = data_for_model, trace = FALSE)
model_null <- nnet::multinom(formula = formula_null, data = data_for_model, trace = FALSE)
# Check nnet convergence flag (0 indicates success)
if ((!is.null(model_full$convergence) && model_full$convergence != 0) ||
(!is.null(model_null$convergence) && model_null$convergence != 0)) {
convergence_warning <- TRUE
if (verbose) warning(sprintf("Multinomial model did not fully converge for '%s'. Code: %s (full), %s (null).",
variable, model_full$convergence, model_null$convergence), call. = FALSE)
}
} else {
# This case should not be reached due to prior checks
stop("Internal error: Unknown model type specified.")
}
# --- Likelihood Ratio Test (LRT) ---
# Use stats::anova for GLM objects, manual logLik comparison otherwise
if (inherits(model_full, "glm")) {
# Suppress warnings from anova (e.g., about non-integer df) - check output carefully
lrt_result <- suppressWarnings(stats::anova(model_null, model_full, test = "LRT"))
# Validate the anova output
if (!is.null(lrt_result) && "Pr(>Chi)" %in% names(lrt_result) && nrow(lrt_result) > 1) {
candidate_p <- lrt_result$"Pr(>Chi)"[2]
candidate_df <- lrt_result$Df[2] # Df difference
# Check if Df difference is valid (NA or non-positive indicates issues)
if(!is.na(candidate_df) && candidate_df <= 0) {
model_status <- "No DF improvement/equivalent models (anova)"
p_val <- 1.0 # Models are equivalent or null is more complex
} else if (!is.na(candidate_p) && is.finite(candidate_p)) {
p_val <- candidate_p # Valid p-value from anova
} else {
# P-value calculation failed within anova
model_status <- "LRT p-value calculation failed (anova returned NA/NaN)"
p_val <- NA_real_
}
} else {
# Anova failed to produce expected output
model_status <- "LRT failed (anova output invalid or models identical?)"
# Consider if models are identical, maybe p=1? But anova failure suggests other issues.
p_val <- NA_real_
}
} else { # Manual LRT for polr, multinom
# Safely get log-likelihoods
ll_full <- tryCatch(stats::logLik(model_full), error = function(e) NA)
ll_null <- tryCatch(stats::logLik(model_null), error = function(e) NA)
# Check if logLik calculation succeeded
if (is.na(ll_full) || is.na(ll_null)) {
model_status <- "Log-Likelihood calculation failed (NA/NaN)"
if (inherits(model_full, "polr")) {
# For polr, logLik failure often indicates fitting issues (e.g., Hessian non-positive definite)
# Use this as a proxy for convergence/fitting problems.
model_status <- paste(model_status, "- potential polr convergence/Hessian issue?")
convergence_warning <- TRUE # Set convergence flag as proxy
}
p_val <- NA_real_
} else {
# Calculate LRT statistic and degrees of freedom difference
lr_stat <- 2 * (as.numeric(ll_full) - as.numeric(ll_null))
df_full <- attr(ll_full, "df")
df_null <- attr(ll_null, "df")
if (is.null(df_full) || is.null(df_null)) {
# Should not happen if logLik worked, but check anyway
model_status <- "Could not retrieve degrees of freedom for LRT"
p_val <- NA_real_
} else {
df_diff <- df_full - df_null
# Check if full model has more parameters (allow for floating point noise)
if (df_diff > sqrt(.Machine$double.eps)) {
# Check for negative LRT statistic (usually indicates fitting error or numerical instability)
if (lr_stat < -sqrt(.Machine$double.eps)) {
model_status <- sprintf("Warning: Negative LRT statistic (%.2g), check model fits/convergence.", lr_stat)
p_val <- NA_real_ # P-value is unreliable
} else {
# Calculate p-value using chi-squared distribution
# Ensure LRT statistic is non-negative before calculating p-value
p_val <- stats::pchisq(max(0, lr_stat), df = df_diff, lower.tail = FALSE)
}
} else {
# Full model does not have more parameters than null model
p_val <- 1.0 # No evidence against null hypothesis based on complexity
model_status <- "No DF improvement"
}
}
}
}
# Final check for NaN p-values
if (is.nan(p_val)) {
p_val <- NA_real_
if (model_status == "OK") model_status <- "LRT resulted in NaN p-value"
}
# Return results list
list(p_value = p_val, model_status = model_status, type_used = type_used,
gamma_shift_warning = gamma_shift_warning, convergence_warning = convergence_warning)
}, error = function(e) {
# Catch errors during the tryCatch block (fitting or LRT)
msg <- paste("Error during model/LRT:", gsub("[\\r\\n\\t]+", " ", conditionMessage(e)))
if (verbose) message(sprintf("Model fitting/LRT process failed for '%s'. Status: %s", variable, msg))
# Return NA p-value and error status. Assignments made inside the tryCatch expression
# (type_used and the warning flags) are visible here, so report the model type actually attempted.
list(p_value = NA_real_, model_status = msg, type_used = type_used,
gamma_shift_warning = gamma_shift_warning, convergence_warning = convergence_warning)
}) # End tryCatch
return(res)
} # End helper function fit_models_and_lrt
# --- 4. Loop Through Variables to Test ---
res_list <- vector("list", length(var_names))
names(res_list) <- var_names
for (i in seq_along(var_names)) {
variable <- var_names[i]
# Progress message
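# `(i - 1) %% step == 0` also fires when step is 1 (6 to 10 variables); `i %% 1 == 1` never does
if (verbose && length(var_names) > 5 && (i - 1) %% ceiling(length(var_names) / 10) == 0) {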
message(sprintf("Processing variable %d of %d: %s", i, length(var_names), variable))
}
# Initialize default result structure for this variable
res <- data.frame(
Variable = variable,
Type = "Unknown",
Covariates_Used = paste(covariates, collapse = ", "),
Group_Ref_Level_Used = NA_character_, # Placeholder for actual ref level
n_obs = 0L,
Status = "Not processed",
p_value = NA_real_, # Store raw p-value initially
p_value_FDR = NA_real_,
Significant = "No", # Default to No significance
Gamma_Shift_Warning = FALSE,
Convergence_Warning = FALSE,
stringsAsFactors = FALSE
)
# --- Data Subsetting and Missing Data Handling for Current Variable ---
current_cols <- c(variable, group, covariates)
# Use tryCatch for safe column access
full_data_subset <- tryCatch(data[, current_cols, drop = FALSE], error = function(e) NULL)
if (is.null(full_data_subset)) {
# Identify missing columns if possible
missing_cols <- current_cols[!current_cols %in% names(data)]
res$Status <- paste("Error accessing column(s):", paste(missing_cols, collapse=", "))
res_list[[variable]] <- res
next # Skip to next variable
}
# Get complete cases for this variable + group + covariates
complete_idx <- stats::complete.cases(full_data_subset)
n_complete <- sum(complete_idx)
res$n_obs <- n_complete # Store number of observations used
temp_data <- full_data_subset[complete_idx, , drop = FALSE]
# --- Check Minimum Observations and Group Levels *after* NA removal ---
# Ensure group variable is factor in the subset and drop unused levels
temp_data[[group]] <- base::droplevels(factor(temp_data[[group]]))
nlevels_subset <- nlevels(temp_data[[group]])
# Check if group still has at least two levels
if (nlevels_subset < 2) {
res$Status <- "Grouping variable has < 2 levels after NA removal"
res_list[[variable]] <- res
next
}
# Heuristic check for sufficient observations relative to basic model parameters
# Params approx = intercept + group_levels-1 + num_covariates
# This is a rough safeguard, not a guarantee of stable estimation.
params_approx <- 1 + (nlevels_subset - 1) + length(covariates)
min_obs_threshold <- params_approx + 5 # Require ~5 obs beyond basic parameters
min_group_threshold <- 2 * nlevels_subset # Require minimum average obs per group
if (n_complete < min_obs_threshold || n_complete < min_group_threshold) {
res$Status <- sprintf("Low N (%d) for model complexity (heuristic check: need >~%d obs for params, >~%d for group avg)",
n_complete, min_obs_threshold, min_group_threshold)
res_list[[variable]] <- res
next
}
# --- Set Reference Level for Group (if specified and valid in subset) ---
current_group_levels_subset <- levels(temp_data[[group]])
ref_level_to_use <- current_group_levels_subset[1] # Default is first level in subset
if (!is.null(group_ref)) {
if (group_ref %in% current_group_levels_subset) {
# Set the specified reference level if it exists in the current subset
temp_data[[group]] <- stats::relevel(temp_data[[group]], ref = group_ref)
ref_level_to_use <- group_ref
} else {
# Warn if specified ref level is missing after NA removal, use default
warning(sprintf("For variable '%s', specified 'group_ref' '%s' is not present after NA removal. Using default reference '%s'.",
variable, group_ref, ref_level_to_use), call. = FALSE, immediate. = TRUE)
# No need to relevel, as the default is the first level already
}
}
res$Group_Ref_Level_Used <- ref_level_to_use # Store the reference level actually used
# --- Prepare Outcome Variable `y` and Check for Constant ---
names(temp_data)[names(temp_data) == variable] <- "y" # Rename outcome for formula use
outcome_vec <- temp_data$y
# Check if outcome is constant within the subset
is_constant <- FALSE
if(is.numeric(outcome_vec)) {
sd_outcome <- stats::sd(outcome_vec, na.rm = TRUE) # Already handled NAs
# Check if sd is NA (single value) or effectively zero
if (is.na(sd_outcome) || sd_outcome < sqrt(.Machine$double.eps)) {
is_constant <- TRUE
}
} else {
# Check for single unique non-NA value (NAs already removed)
if (length(unique(outcome_vec)) <= 1) is_constant <- TRUE
}
if(is_constant) {
res$Status <- "Constant Outcome Variable (after NA removal)"
res$Type <- if(is.numeric(outcome_vec)) "Numeric Constant" else "Factor/Char Constant"
res_list[[variable]] <- res
next
}
# --- Determine Outcome Variable Type for Modeling ---
type <- "Unknown"
if (is.numeric(outcome_vec)) {
unique_vals <- unique(outcome_vec) # NAs already removed
n_unique <- length(unique_vals)
# Check if looks like binary integer (e.g., 0/1, 1/2)
is_int_like <- all(abs(unique_vals - round(unique_vals)) < sqrt(.Machine$double.eps))
if (n_unique == 2 && is_int_like) {
type <- "Binary"
# Convert to factor for modeling, ensuring consistent levels (e.g., 0 then 1)
temp_data$y <- factor(temp_data$y, levels = sort(unique_vals))
} else {
type <- "Continuous"
# Keep as numeric for Gaussian/Gamma GLM
}
} else if (is.factor(outcome_vec)) {
# Factor levels already dropped via droplevels earlier on temp_data[[group]]
# Need to ensure outcome factor levels are also correct for the subset
temp_data$y <- base::droplevels(factor(outcome_vec)) # Ensure y uses only present levels
current_levels_y <- levels(temp_data$y)
n_levels_y <- length(current_levels_y)
if (is.ordered(outcome_vec)) { # Check original variable's property
if (n_levels_y < 2) {
res$Status <- "Ordinal outcome with < 2 levels after NA removal"
res_list[[variable]] <- res
next
}
type <- "Ordinal"
# Ensure factor remains ordered with only present levels
temp_data$y <- factor(temp_data$y, levels = current_levels_y, ordered = TRUE)
} else { # Unordered factor
if (n_levels_y == 2) {
type <- "Binary"
# Ensure factor with correct levels (already done by droplevels+factor)
} else if (n_levels_y > 2) {
type <- "Categorical"
# Relevel ensures a consistent baseline category (first level) for multinom
temp_data$y <- stats::relevel(temp_data$y, ref = current_levels_y[1])
} else { # n_levels_y < 2
res$Status <- "Factor outcome with < 2 levels after NA removal"
res_list[[variable]] <- res
next
}
}
} else if (is.character(outcome_vec)) {
# Convert character to factor for modeling
temp_data$y <- factor(outcome_vec)
current_levels_y <- levels(temp_data$y)
n_levels_y <- length(current_levels_y)
if (n_levels_y == 2) {
type <- "Binary"
} else if (n_levels_y > 2) {
type <- "Categorical"
# Relevel ensures a consistent baseline category (first level) for multinom
temp_data$y <- stats::relevel(temp_data$y, ref = current_levels_y[1])
} else { # n_levels_y < 2
res$Status <- "Character outcome with < 2 levels after NA removal"
res_list[[variable]] <- res
next
}
} else {
# Handle unexpected types (e.g., list, date - though date might work as numeric)
res$Status <- paste("Unsupported outcome variable type:", class(outcome_vec)[1])
res$Type <- class(outcome_vec)[1]
res_list[[variable]] <- res
next
}
# Store the determined type, might be updated later by helper if Gamma GLM is used
res$Type <- type
# --- Ensure Covariates are Factor or Numeric (within subset) ---
# Also check for constant covariates within the subset, which often cause errors
for (covar in covariates) {
covar_vec <- temp_data[[covar]]
if(length(unique(stats::na.omit(covar_vec))) <= 1) {
warning(sprintf("Covariate '%s' is constant for variable '%s' after NA removal. This will likely cause model fitting errors.",
covar, variable), call. = FALSE, immediate. = TRUE)
}
# Convert character covariates to factors if not already numeric/factor
if (!is.numeric(covar_vec) && !is.factor(covar_vec)) {
if (verbose) message(sprintf("Converting covariate '%s' to factor for variable '%s'.", covar, variable))
temp_data[[covar]] <- factor(covar_vec)
}
}
# --- Construct Model Formulas ---
terms_full <- c(group, covariates)
# Null model: only covariates, or intercept-only if no covariates
terms_null <- if (length(covariates) > 0) covariates else "1"
# Use tryCatch for formula construction just in case of weird term names
formula_full <- tryCatch(stats::reformulate(termlabels = terms_full, response = "y"),
error = function(e) NULL)
formula_null <- tryCatch(stats::reformulate(termlabels = terms_null, response = "y"),
error = function(e) NULL)
if (is.null(formula_full) || is.null(formula_null)) {
res$Status <- "Error: Failed to construct model formulas (invalid terms?)"
res_list[[variable]] <- res
next
}
# --- Fit Models and Perform LRT via Helper Function ---
model_out <- fit_models_and_lrt(temp_data = temp_data,
formula_full = formula_full,
formula_null = formula_null,
type = type, # Pass initial type
variable = variable,
verbose = verbose,
zero_threshold = zero_threshold,
skew_threshold = skew_threshold)
# --- Store Results from Helper ---
res$p_value <- model_out$p_value # Store raw p-value
res$Status <- model_out$model_status
res$Type <- model_out$type_used # Update type if Gamma was used
res$Gamma_Shift_Warning <- model_out$gamma_shift_warning
res$Convergence_Warning <- model_out$convergence_warning
res_list[[variable]] <- res # Add results for this variable to the list
# --- Periodic Garbage Collection (optional, for very large datasets/loops) ---
if (i %% 50 == 0 && (nrow(data) * ncol(data) > 1e6 || length(var_names) > 100)) {
if (verbose) message("Running garbage collection...")
gc(verbose = FALSE)
}
} # End loop through variables
# --- 5. Combine, Adjust, and Format Results ---
if (length(res_list) == 0) {
# Should have been caught earlier, but as a safeguard
warning("No results were generated in the loop.")
# Return empty structure consistent with initial empty check
return(data.frame(Variable = character(), Type = character(), Covariates_Used = character(),
Group_Ref_Level_Used = character(), n_obs = integer(), Status = character(),
p_value = character(), p_value_FDR = character(), Significant = character(),
Gamma_Shift_Warning = logical(), Convergence_Warning = logical(),
stringsAsFactors = FALSE))
}
# Combine list of data frames into one
res_df <- do.call(rbind, res_list)
rownames(res_df) <- NULL # Clean row names
# Calculate FDR adjusted p-values only for valid numeric raw p-values
valid_p_idx <- which(!is.na(res_df$p_value) & is.finite(res_df$p_value))
res_df$p_value_FDR <- NA_real_ # Initialize column before filling
if (length(valid_p_idx) > 0) {
# Calculate FDR adjusted p-values
res_df$p_value_FDR[valid_p_idx] <- stats::p.adjust(res_df$p_value[valid_p_idx], method = "fdr")
# Determine significance based on FDR p-value
# Ensure Significant column exists if needed (should be initialized)
if (!"Significant" %in% names(res_df)) res_df$Significant <- "No"
res_df$Significant <- ifelse(!is.na(res_df$p_value_FDR) & res_df$p_value_FDR < alpha,
"Yes", "No")
# Ensure non-significant if p-value was NA or adjustment resulted in NA
res_df$Significant[is.na(res_df$p_value_FDR)] <- "No"
} else {
# If no valid p-values, ensure FDR is NA and Significance is No
res_df$p_value_FDR <- NA_real_
res_df$Significant <- "No"
}
# --- Format P-values for Reporting (after FDR calculation) ---
# Convert numeric p-values to formatted strings
p_val_fmt <- ifelse(!is.na(res_df$p_value) & res_df$p_value < 0.001,
"<0.001", sprintf("%.3f", res_df$p_value))
p_val_fdr_fmt <- ifelse(!is.na(res_df$p_value_FDR) & res_df$p_value_FDR < 0.001,
"<0.001", sprintf("%.3f", res_df$p_value_FDR))
# Assign formatted strings back, handling potential NAs from numeric columns
res_df$p_value <- ifelse(is.na(res_df$p_value), NA_character_, p_val_fmt)
res_df$p_value_FDR <- ifelse(is.na(res_df$p_value_FDR), NA_character_, p_val_fdr_fmt)
# --- 6. Final Output ---
# Ensure consistent column order
final_order <- c("Variable", "Type", "Covariates_Used", "Group_Ref_Level_Used",
"n_obs", "Status", "p_value", "p_value_FDR", "Significant",
"Gamma_Shift_Warning", "Convergence_Warning")
# Check if all expected columns are present before ordering
if (!all(final_order %in% names(res_df))) {
warning("Internal issue: Some expected result columns might be missing.")
# Order using only the columns that are present
final_order <- intersect(final_order, names(res_df))
}
res_df <- res_df[, final_order, drop = FALSE]
if (verbose) message("Analysis finished. Returning results table.")
return(res_df)
}
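# --- Illustrative sketch (not part of the original analysis code) ---
# The FDR step above adjusts only the finite, non-missing raw p-values and leaves
# failed fits as NA before formatting everything as strings. The helper below
# reproduces that logic on a plain numeric vector using base R only. Its name,
# `sketch_fdr_format()`, and its `alpha` default are assumptions made for this
# example; it is a minimal sketch, not the function's actual internals.
sketch_fdr_format <- function(p_raw, alpha = 0.05) {
  # Adjust only the usable p-values; everything else stays NA
  valid <- which(!is.na(p_raw) & is.finite(p_raw))
  p_fdr <- rep(NA_real_, length(p_raw))
  if (length(valid) > 0) {
    p_fdr[valid] <- stats::p.adjust(p_raw[valid], method = "fdr")
  }
  # Format for reporting: "<0.001" below the cutoff, three decimals otherwise
  fmt <- function(p) {
    ifelse(is.na(p), NA_character_,
           ifelse(p < 0.001, "<0.001", sprintf("%.3f", p)))
  }
  data.frame(
    p_value = fmt(p_raw),
    p_value_FDR = fmt(p_fdr),
    Significant = ifelse(!is.na(p_fdr) & p_fdr < alpha, "Yes", "No"),
    stringsAsFactors = FALSE
  )
}
# Usage (hypothetical input): sketch_fdr_format(c(0.0004, 0.03, NA, 0.2))
# The NA entry is excluded from the adjustment, so it does not count toward the
# number of tests, and it is reported as not significant.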
#' Plot Results from compare_groups_auto_v4 function
#'
#' Creates a lollipop (dot) plot showing the significance (-log10 of the FDR-adjusted p-value)
#' of each variable tested, based on the output of the compare_groups_auto_v4 function.
#' Well suited to visualizing multiple comparisons in scientific articles.
#'
#' @param results_df Data frame returned by `compare_groups_auto_v4`.
#' @param p_value_type Character. Which p-value to plot on the x-axis?
#' Default is `"FDR"` (p_value_FDR), recommended for multiple comparisons.
#' Can be set to `"raw"` (p_value) if desired.
#' @param significance_threshold Numeric. Significance threshold (alpha) used to draw
#' a vertical line and to color the points (default 0.05).
#' @param label_significant Logical. If `TRUE` (default), labels the most
#' significant points directly on the plot using `ggrepel`.
#' @param n_label Integer. Maximum number of significant points to label
#' (default 15), to avoid visual clutter. Used only if `label_significant = TRUE`.
#' @param plot_title Character. Optional plot title. If `NULL` (default),
#' a generic title is generated.
#' @param exclude_statuses Character vector. Values of the `Status` column to
#' exclude from the plot (default: errors, low N, constant outcome, unsupported type).
#' Matching is by prefix and case-insensitive. Set to `NULL` to include all statuses.
#' @param order_by_significance Logical. If `TRUE` (default), orders the variables
#' on the y-axis by significance (lowest p-value at the top). If `FALSE`,
#' keeps the original order of the data frame.
#' @param custom_labels Named character vector. Optional; renames variables
#' on the y-axis. E.g.: `c("ScoreContinuo" = "Cognitive Score", "BiomarcadorGamma" = "Biomarker A")`.
#'
#' @return A ggplot object.
#'
#' @details
#' - The function converts the formatted p-values (strings such as "<0.001") back to numeric
#' in order to compute -log10(p-value). "<0.001" is treated as a small placeholder value (1e-4).
#' - Variables with NA p-values or excluded statuses are not plotted.
#' - Requires the `ggplot2`, `dplyr`, and `forcats` packages; `ggrepel` is needed only for labels.
#'
#' @importFrom ggplot2 ggplot aes geom_point geom_segment geom_vline scale_color_manual labs theme_minimal theme element_text scale_y_discrete ggtitle
#' @importFrom ggrepel geom_text_repel
#' @importFrom dplyr filter mutate arrange select case_when pull rename any_of
#' @importFrom forcats fct_reorder fct_relevel
#' @importFrom rlang .data sym
#' @examples
#' \dontrun{
#' # Assuming 'resultados_ex1' exists from the previous function's example
#' if (exists("resultados_ex1") && is.data.frame(resultados_ex1)) {
#' # Basic plot
#' p1 <- plot_comparison_results(resultados_ex1)
#' print(p1)
#'
#' # Plot with more labels and a custom title
#' p2 <- plot_comparison_results(resultados_ex1,
#' n_label = 20,
#' plot_title = "Comparison between Centers (Adjusted for Age and Sex)",
#' custom_labels = c("ScoreContinuo" = "Cognitive Score",
#' "BiomarcadorGamma" = "Biomarker (Gamma)"))
#' print(p2)
#'
#' # Plot using the raw p-value, without reordering or labels
#' p3 <- plot_comparison_results(resultados_ex1,
#' p_value_type = "raw",
#' order_by_significance = FALSE,
#' label_significant = FALSE)
#' print(p3)
#' } else {
#' print("Run the compare_groups_auto_v4 example first to generate 'resultados_ex1'")
#' }
#' }
plot_comparison_results <- function(results_df, # Name must match the body and the roxygen docs
p_value_type = "FDR",
significance_threshold = 0.05,
label_significant = TRUE,
n_label = 15,
plot_title = NULL,
exclude_statuses = c("Constant Outcome Variable",
"Low N", # Matches any status starting with "Low N"
"Unsupported outcome variable type",
"Grouping variable has < 2 levels",
"Error", # Matches any status starting with "Error"
"Not processed"),
order_by_significance = TRUE,
custom_labels = NULL) {
# --- 1. Input Checks ---
req_cols <- c("Variable", "Status", "p_value", "p_value_FDR")
if (!all(req_cols %in% names(results_df))) {
stop("The 'results_df' data frame is missing required columns: ",
paste(req_cols[!req_cols %in% names(results_df)], collapse = ", "))
}
if (!p_value_type %in% c("FDR", "raw")) {
stop("'p_value_type' must be 'FDR' or 'raw'.")
}
if (!requireNamespace("ggplot2", quietly = TRUE)) {
stop("Package 'ggplot2' is required. Please install it.", call. = FALSE)
}
if (!requireNamespace("dplyr", quietly = TRUE)) {
stop("Package 'dplyr' is required. Please install it.", call. = FALSE)
}
if (!requireNamespace("forcats", quietly = TRUE)) {
stop("Package 'forcats' is required. Please install it.", call. = FALSE)
}
if (label_significant && !requireNamespace("ggrepel", quietly = TRUE)) {
warning("Package 'ggrepel' not found. Labels will not be added. Install 'ggrepel' to enable them.", call. = FALSE)
label_significant <- FALSE
}
# --- 2. Data Preparation ---
p_col_name <- if (p_value_type == "FDR") "p_value_FDR" else "p_value"
# Filter by status
plot_data <- results_df
if (!is.null(exclude_statuses) && length(exclude_statuses) > 0) {
# Build a regex that matches statuses *starting with* any string in exclude_statuses
# (exact matches are covered as well)
exclude_pattern <- paste0("^(", paste(exclude_statuses, collapse = "|"), ")")
plot_data <- dplyr::filter(plot_data, !grepl(exclude_pattern, .data$Status, ignore.case = TRUE))
}
# Convert the formatted p-value strings back to numeric.
# "<0.001" is mapped to a small placeholder (1e-4); unparseable entries become NA.
p_numeric <- plot_data[[p_col_name]]
p_numeric <- suppressWarnings(as.numeric(gsub("<0\\.001", "1e-4", p_numeric)))
plot_data <- dplyr::mutate(plot_data,
p_numeric = p_numeric,
# Compute -log10(p), handling p = 0 and NAs
neg_log10_p = dplyr::case_when(
is.na(p_numeric) ~ NA_real_,
p_numeric == 0 ~ -log10(1e-300), # Avoid Inf by using a large finite value
TRUE ~ -log10(p_numeric)
),
# Determine significance from the numeric p-value
Significant_num = !is.na(p_numeric) & p_numeric < significance_threshold
)
# Drop rows whose plotting value is NA or not finite
plot_data <- dplyr::filter(plot_data, !is.na(.data$neg_log10_p) & is.finite(.data$neg_log10_p))
if (nrow(plot_data) == 0) {
warning("No valid data left to plot after filtering and p-value conversion.", call. = FALSE)
# Return an empty plot carrying a message
return(ggplot2::ggplot() + ggplot2::theme_void() + ggplot2::ggtitle("No data to plot"))
}
# Order the variables if requested
if (order_by_significance) {
# Reorder Variable by neg_log10_p. Ascending factor order puts the largest values
# (smallest p-values) at the top of the y-axis.
plot_data <- dplyr::mutate(plot_data, Variable = forcats::fct_reorder(.data$Variable, .data$neg_log10_p, .desc = FALSE))
} else {
# Keep the original row order by fixing the factor levels in order of appearance
plot_data <- dplyr::mutate(plot_data, Variable = factor(.data$Variable, levels = unique(.data$Variable)))
}
# Apply custom labels if provided
if (!is.null(custom_labels)) {
original_levels <- levels(plot_data$Variable)
new_labels <- ifelse(original_levels %in% names(custom_labels),
custom_labels[original_levels],
original_levels)
# Rename the factor levels
plot_data <- dplyr::mutate(plot_data, Variable = factor(.data$Variable, levels = original_levels, labels = new_labels))
}
# --- 3. Build the Plot ---
# Set a default title if none was provided
if (is.null(plot_title)) {
plot_title <- paste("Group Comparison Significance (-log10", p_value_type, "p-value)")
}
# Position of the vertical threshold line
neg_log10_threshold <- -log10(significance_threshold)
# Base plot
gg <- ggplot2::ggplot(plot_data, ggplot2::aes(x = .data$neg_log10_p, y = .data$Variable)) +
# Lollipop segments (optional, but visually helpful)
ggplot2::geom_segment(ggplot2::aes(xend = 0, yend = .data$Variable), color = "grey80", linewidth = 0.5) +
# Points, colored by significance
ggplot2::geom_point(ggplot2::aes(color = .data$Significant_num), size = 2.5, alpha = 0.8) +
# Vertical line at the significance threshold
ggplot2::geom_vline(xintercept = neg_log10_threshold, linetype = "dashed", color = "darkred", linewidth = 0.8) +
# Manual color scale for significance
ggplot2::scale_color_manual(values = c(`FALSE` = "grey40", `TRUE` = "red"),
labels = c(`FALSE` = paste0("p >= ", significance_threshold),
`TRUE` = paste0("p < ", significance_threshold)),
name = paste(p_value_type, "significance")) +
# Axis labels and title
ggplot2::labs(
title = plot_title,
x = bquote(-log[10] ~ .(paste("(", p_value_type, " p-value)", sep=""))), # Plotmath expression for log10
y = "Variable"
) +
# Clean theme, common in publications
ggplot2::theme_minimal(base_size = 12) +
ggplot2::theme(
axis.text.y = ggplot2::element_text(size = 8), # Reduce the y-axis text size when there are many variables
axis.title = ggplot2::element_text(size = 10),
plot.title = ggplot2::element_text(size = 12, face = "bold", hjust = 0.5),
legend.position = "bottom"
)
# Label the significant points using ggrepel
if (label_significant && sum(plot_data$Significant_num) > 0) {
# Label data: the top `n_label` most significant points.
# Plain function calls are used so the code does not depend on `%>%` being attached.
label_data <- dplyr::filter(plot_data, .data$Significant_num)
label_data <- dplyr::arrange(label_data, dplyr::desc(.data$neg_log10_p))
label_data <- dplyr::slice_head(label_data, n = n_label)
if (nrow(label_data) > 0) {
gg <- gg +
ggrepel::geom_text_repel(
data = label_data,
ggplot2::aes(label = .data$Variable),
size = 2.5, # Label text size
max.overlaps = Inf, # Try to show every label (adjust if needed)
segment.color = "grey50",
segment.size = 0.3,
nudge_x = 0.15, # Adjust the label position
box.padding = 0.3,
point.padding = 0.3
)
}
}
return(gg)
}
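# --- Illustrative sketch (not part of the plotting function) ---
# plot_comparison_results() has to turn the formatted p-value strings back into
# numbers before taking -log10. The helper below isolates that step in base R.
# Its name, `sketch_neg_log10_p()`, is an assumption made for this example, and
# the 1e-4 placeholder simply mirrors the value used in the function above.
sketch_neg_log10_p <- function(p_chr) {
  # "<0.001" carries no exact value, so it is mapped to the placeholder 1e-4
  p_num <- suppressWarnings(as.numeric(gsub("<0\\.001", "1e-4", p_chr)))
  # NA inputs stay NA and are expected to be filtered out before plotting
  -log10(p_num)
}
# Usage (hypothetical input): sketch_neg_log10_p(c("<0.001", "0.030", NA))
# One consequence of the placeholder: every "<0.001" entry is drawn at the same
# x position, so the plot cannot rank variables within that group.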
library(ggrepel)
library(forcats)
#plot_comparison_results(results)
#plot_comparison_results(results2)
# To save a plot:
# ggsave("meu_grafico_significancia.png", plot = plot1, width = 8, height = 6, dpi = 300)
# ggsave("meu_grafico_significancia.pdf", plot = plot1, width = 8, height = 6)
#' Create a Formatted Table from the Results of compare_groups_auto_v4 (v2, corrected)
#'
#' Generates a clean, organized table suitable for scientific publication
#' from the results data frame returned by `compare_groups_auto_v4`.
#' Uses the knitr and kableExtra packages for formatting.
#'
#' @param results_df Data frame. The output of `compare_groups_auto_v4`.
#' @param caption Character. The table title/caption.
#' @param columns_to_include Character vector. Names of the `results_df` columns
#' to include in the table. The default covers the most common columns.
#' @param column_rename Named character vector (old name = new name) used to rename columns in the final table.
#' E.g.: `c("Variable" = "Outcome", "p_value_FDR" = "Adjusted P-value (FDR)")`.
#' @param highlight_significant Logical. If `TRUE` (default), applies bold to the
#' adjusted (FDR) p-values that are significant (based on the 'Significant' column
#' of the input).
#' @param add_notes_column Logical. If `TRUE` (default), adds a "Notes" column
#' with codes for warnings (convergence, Gamma shift) or non-"OK" statuses.
#' A general footnote explains the codes.
#' @param status_codes Named character vector. Codes used in the Notes column
#' for specific statuses (in addition to 'C' for convergence and 'G' for Gamma).
#' The default includes codes for common statuses such as low N, constant outcome, and errors.
#' E.g.: `c("Low N" = "N", "Constant Outcome Variable" = "K", "Error" = "E")`
#' Matching is a case-insensitive search for the text anywhere in Status.
#' @param format Character. Output format for `kable` (e.g., "html", "latex",
#' "markdown"). Default is "pipe" (GFM Markdown). For articles, "latex" or "html"
#' (depending on the workflow) are common.
#' @param booktabs Logical. Use `booktabs = TRUE` for LaTeX tables (recommended).
#' Default is `TRUE`.
#' @param full_width Logical. The `full_width` argument of `kable_styling`. Default `FALSE`.
#' @param font_size Numeric. Font size for `kable_styling`. Default `NULL`.
#' @param ... Additional arguments passed to `kableExtra::kable_styling`.
#'
#' @return A kable object (formatted table).
#'
#' @details
#' - The function selects, renames, and optionally formats columns.
#' - P-values must already be formatted as strings in `results_df` (including "<0.001").
#' - The 'Notes' column and the footnote communicate methodological problems or
#' warnings concisely without overloading the main table.
#' - Requires the `knitr`, `kableExtra`, and `dplyr` packages.
#'
#' @importFrom knitr kable
#' @importFrom kableExtra kable_styling footnote cell_spec row_spec kable_classic save_kable
#' @importFrom dplyr select rename mutate case_when across all_of left_join relocate filter any_of arrange desc slice_head setdiff intersect
#' @importFrom rlang := !! sym `%||%`
#' @importFrom tibble tibble
#'
create_results_table <- function(results_df,
caption = "Group Comparison Results",
columns_to_include = c("Variable", "n_obs",
"Group_Ref_Level_Used", "Status",
"p_value", "p_value_FDR"),
column_rename = c("Variable" = "Variable",
"n_obs" = "N",
"Group_Ref_Level_Used" = "Ref.",
"Status" = "Analysis Status",
"p_value" = "P-value",
"p_value_FDR" = "Adjusted P-value"),
highlight_significant = TRUE,
add_notes_column = TRUE,
status_codes = c("Low N" = "N", # Low N
"Constant Outcome" = "K", # Constant outcome
"Error" = "E", # General error
"Unsupported" = "U", # Unsupported type
"No DF improvement" = "DF", # No improvement in degrees of freedom
"calculation failed" = "Calc", # Calculation failed
"converge" = "C" # Use 'C' if the status contains 'converge'
),
format = "pipe", # "pipe" works well for GFM Markdown
booktabs = TRUE,
full_width = FALSE,
font_size = NULL,
...) {
# --- 1. Input Checks ---
if (!requireNamespace("knitr", quietly = TRUE)) stop("Package 'knitr' is required.")
if (!requireNamespace("kableExtra", quietly = TRUE)) stop("Package 'kableExtra' is required.")
if (!requireNamespace("dplyr", quietly = TRUE)) stop("Package 'dplyr' is required.")
if (!requireNamespace("rlang", quietly = TRUE)) stop("Package 'rlang' is required.")
req_input_cols <- c("Variable", "Status", "p_value", "p_value_FDR", "Significant",
"Convergence_Warning", "Gamma_Shift_Warning")
if (!all(req_input_cols %in% names(results_df))) {
missing_cols <- req_input_cols[!req_input_cols %in% names(results_df)]
stop("The 'results_df' data frame is missing required input columns: ",
paste(missing_cols, collapse = ", "))
}
original_columns_to_include <- columns_to_include # Keep the originally requested list
if (!all(columns_to_include %in% names(results_df))) {
missing_cols <- columns_to_include[!columns_to_include %in% names(results_df)]
warning("The following columns listed in 'columns_to_include' were not found in 'results_df': ",
paste(missing_cols, collapse = ", "), ". They will be ignored.", immediate. = TRUE)
columns_to_include <- intersect(columns_to_include, names(results_df))
if(length(columns_to_include) == 0) stop("No valid columns to include in the table.")
}
# --- 2. Data Preparation ---
# Select the columns needed for processing (including those used for notes/highlighting)
cols_for_processing <- unique(c(columns_to_include, req_input_cols))
table_data <- dplyr::select(results_df,
dplyr::all_of(intersect(cols_for_processing, names(results_df))))
footnote_explanations <- list() # A list avoids problems with duplicated names
notes_col_present <- FALSE
# Create the Notes column
if (add_notes_column) {
table_data <- dplyr::mutate(table_data, Notes = "") # Initialize the column
# Add codes for the logical warning flags (`%in% TRUE` treats NA flags as FALSE)
if (any(table_data$Convergence_Warning, na.rm = TRUE)) {
table_data <- dplyr::mutate(table_data, Notes = ifelse(.data$Convergence_Warning %in% TRUE, paste0(.data$Notes, "C"), .data$Notes))
if (!"C" %in% names(footnote_explanations)) footnote_explanations[["C"]] <- "Model did not converge or a similar problem occurred (e.g., Hessian in polr)"
}
if (any(table_data$Gamma_Shift_Warning, na.rm = TRUE)) {
table_data <- dplyr::mutate(table_data, Notes = ifelse(.data$Gamma_Shift_Warning %in% TRUE, paste0(.data$Notes, "G"), .data$Notes))
if (!"G" %in% names(footnote_explanations)) footnote_explanations[["G"]] <- "Non-positive values shifted for the Gamma GLM"
}
# Add codes for non-"OK" statuses
if (length(status_codes) > 0) {
# Rows with a non-"OK" status that have NOT yet received a C or G warning code
needs_status_code <- table_data$Status != "OK" & !grepl("[CG]", table_data$Notes)
for (status_text in names(status_codes)) {
code <- status_codes[[status_text]]
match_idx <- which(grepl(status_text, table_data$Status, ignore.case = TRUE) & needs_status_code)
if(length(match_idx) > 0) {
table_data$Notes[match_idx] <- paste0(table_data$Notes[match_idx], code)
needs_status_code[match_idx] <- FALSE # Mark as processed
# The footnote prefixes each explanation with its code, so the code is not repeated here
explanation <- paste0("Status contains '", status_text, "'")
if (!code %in% names(footnote_explanations)) footnote_explanations[[code]] <- explanation
}
}
# Generic code 'S' for remaining non-OK statuses without a specific code
final_needs_code_idx <- which(table_data$Status != "OK" & !grepl("[CG]", table_data$Notes) & needs_status_code)
if (length(final_needs_code_idx) > 0) {
table_data$Notes[final_needs_code_idx] <- paste0(table_data$Notes[final_needs_code_idx], "S")
if (!"S" %in% names(footnote_explanations)) footnote_explanations[["S"]] <- "Other non-'OK' status"
}
}
# Check whether the Notes column has any useful content
if (any(table_data$Notes != "", na.rm = TRUE)) {
notes_col_present <- TRUE
if (!"Notes" %in% names(column_rename)) column_rename["Notes"] <- "Notes" # Default display name
} else {
table_data <- dplyr::select(table_data, -dplyr::all_of("Notes")) # Drop the column if empty
}
} # End of if(add_notes_column)
# Apply highlighting (bold) if requested
p_col_original <- "p_value_FDR" # Column that receives the highlight
target_col_formatted <- paste0(p_col_original, "_fmt")
# Final display name. Indexing a named vector with a missing name yields NA (not NULL),
# so the name is checked explicitly instead of relying on `%||%`.
final_col_name <- if (p_col_original %in% names(column_rename)) {
column_rename[[p_col_original]]
} else {
"Adjusted P-value"
}
highlight_applied <- FALSE # Tracks whether highlighting was applied
if (highlight_significant && p_col_original %in% names(table_data)) {
if (!"Significant" %in% names(table_data)) {
warning("Column 'Significant' not found. Highlighting cannot be applied.", immediate. = TRUE)
} else if (!p_col_original %in% original_columns_to_include) {
warning(paste0("Column '", p_col_original, "' is not in 'columns_to_include'. Highlighting will not be applied to it."), immediate. = TRUE)
} else {
# Work on a character copy so formatted strings can be assigned back safely
p_vals <- as.character(table_data[[p_col_original]])
table_data[[p_col_original]] <- p_vals
is_sig <- !is.na(p_vals) & table_data$Significant %in% "Yes"
p_fmt <- p_vals
if (format %in% c("latex", "html")) {
# cell_spec() is documented for "html" and "latex" only; it also escapes values such as "<0.001"
plain_idx <- !is.na(p_vals) & !is_sig
if (any(is_sig)) {
p_fmt[is_sig] <- kableExtra::cell_spec(p_vals[is_sig], format = format, bold = TRUE)
}
if (any(plain_idx)) {
p_fmt[plain_idx] <- kableExtra::cell_spec(p_vals[plain_idx], format = format, bold = FALSE)
}
} else if (any(is_sig)) {
# Markdown-style formats such as "pipe": mark bold text with ** instead of cell_spec()
p_fmt[is_sig] <- paste0("**", p_vals[is_sig], "**")
}
table_data[[target_col_formatted]] <- p_fmt
highlight_applied <- TRUE
# Update which columns to include and how to rename them
columns_to_include <- setdiff(columns_to_include, p_col_original) # Drop the original from the final list
# Insert the new column right after the raw p-value column, if present
insert_pos <- which(columns_to_include == "p_value")
if (length(insert_pos) == 0) { # If p_value is absent, insert after Variable
insert_pos <- which(columns_to_include == "Variable")
if (length(insert_pos) == 0) insert_pos <- 0 # Insert at the start if Variable is absent too
}
columns_to_include <- append(columns_to_include, target_col_formatted, after = insert_pos)
# Update the rename map
column_rename[target_col_formatted] <- final_col_name # Display name for the new column
column_rename <- column_rename[!names(column_rename) %in% p_col_original] # Drop the entry for the original
}
}
# Add the Notes column to the final list if it was created
if (notes_col_present && !("Notes" %in% columns_to_include)) {
columns_to_include <- c(columns_to_include, "Notes")
}
# Select the final columns in the desired order
final_columns_ordered <- intersect(columns_to_include, names(table_data))
# Keep only the rename entries whose columns actually exist in the final table
valid_rename_keys <- names(column_rename)[names(column_rename) %in% final_columns_ordered]
column_rename_final <- column_rename[valid_rename_keys]
# Build the final table with the selected columns
table_data_final <- dplyr::select(table_data, dplyr::all_of(final_columns_ordered))
# dplyr::rename() expects new_name = old_name, whereas column_rename maps old_name = new_name,
# so the mapping is inverted before renaming
rename_map <- stats::setNames(names(column_rename_final), unname(column_rename_final))
table_data_final <- dplyr::rename(table_data_final, dplyr::all_of(rename_map))
# --- 3. Generate the Kable Table ---
# escape = FALSE is needed when cell_spec markup (bold) was generated for html/latex
escape_setting <- highlight_applied && format %in% c("latex", "html")
kbl_obj <- knitr::kable(table_data_final,
format = format,
caption = caption,
booktabs = booktabs,
linesep = "",
escape = !escape_setting, # Stays TRUE unless cell_spec markup was generated
align = 'l') # Left-align columns by default
# --- 4. Apply KableExtra Styling ---
kbl_obj <- kableExtra::kable_styling(
kbl_obj,
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = full_width,
font_size = font_size,
latex_options = if(format == "latex") c("striped", "repeat_header") else NULL,
...
)
# Add the footnote
if (notes_col_present && length(footnote_explanations) > 0) {
footnote_text <- paste(names(footnote_explanations), footnote_explanations, sep = ": ", collapse = "; ")
kbl_obj <- kableExtra::footnote(kbl_obj, general = footnote_text,
general_title = "Notes:",
footnote_as_chunk = TRUE,
escape = TRUE, # The footnote is plain text, so escaping stays on
threeparttable = (format == "latex")
)
}
return(kbl_obj)
}
# --- Using the create_results_table function ---
# Make sure the results data frame (e.g., resultados_ex1) exists
if (exists("resultados_ex1") && is.data.frame(resultados_ex1)) {
# Example 1: Default table in Markdown format (good for console/Rmd)
cat("\n--- Table 1 (Markdown) ---\n")
tabela_md <- create_results_table(
results_df = resultados_ex1,
caption = "Table 1: Comparison of Variables between Centers (Adjusted for Age and Sex)",
format = "pipe" # or "markdown"
)
print(tabela_md)
# Example 2: HTML-formatted table with selected and renamed columns
cat("\n--- Table 2 (HTML - view in the Viewer or save) ---\n")
tabela_html <- create_results_table(
results_df = resultados_ex1,
caption = "Table 2: Main Results of the Comparison between Centers",
columns_to_include = c("Variable", "n_obs", "Group_Ref_Level_Used", "p_value", "p_value_FDR"),
column_rename = c("Variable" = "Variable Analyzed",
"n_obs" = "N",
"Group_Ref_Level_Used" = "Ref. Group",
"p_value" = "P (Raw)",
"p_value_FDR" = "P (Adjusted)"),
highlight_significant = TRUE,
add_notes_column = TRUE,
format = "html"
)
# print(tabela_html) # In RStudio, opens in the Viewer
# To save:
# kableExtra::save_kable(tabela_html, file = "tabela_resultados_exemplo.html")
# Example 3: Table for LaTeX (requires a LaTeX package)
# cat("\n--- Table 3 (LaTeX - for .tex documents) ---\n")
# tabela_latex <- create_results_table(
# results_df = resultados_ex1,
# caption = "Comparison between Centers",
# columns_to_include = c("Variable", "n_obs", "p_value_FDR"),
# column_rename = c("Variable" = "Variable", "n_obs" = "N", "p_value_FDR" = "Adjusted P"),
# highlight_significant = TRUE,
# add_notes_column = TRUE,
# format = "latex",
# booktabs = TRUE
# )
# print(tabela_latex) # Prints the LaTeX code
} else {
print("Please run the compare_groups_auto_v4 function example first to generate 'resultados_ex1'.")
}
## [1] "Please run the compare_groups_auto_v4 function example first to generate 'resultados_ex1'."
#cat("\n--- Table 2 (HTML - view in the Viewer or save) ---\n")
#tabela_html <- create_results_table(
# results_df = results,
# caption = "Table 2: Main Results",
# columns_to_include = c("Variable", "n_obs", "Group_Ref_Level_Used", "p_value", "p_value_FDR"),
# column_rename = c("Variable" = "Variable Analyzed",
# "n_obs" = "N",
# "Group_Ref_Level_Used" = "Ref. Group",
# "p_value" = "P (Raw)",
# "p_value_FDR" = "P (Adjusted)"),
# highlight_significant = TRUE,
# add_notes_column = TRUE,
# format = "html"
#)
#print(tabela_html)
#cat("\n--- Table 2 (HTML - view in the Viewer or save) ---\n")
#tabela_html2 <- create_results_table(
# results_df = results2,
# caption = "Table 2: Main Results",
# columns_to_include = c("Variable", "n_obs", "Group_Ref_Level_Used", "p_value", "p_value_FDR"),
# column_rename = c("Variable" = "Variable Analyzed",
# "n_obs" = "N",
# "Group_Ref_Level_Used" = "Ref. Group",
# "p_value" = "P (Raw)",
# "p_value_FDR" = "P (Adjusted)"),
# highlight_significant = TRUE,
# add_notes_column = TRUE,
# format = "html"
#)
#print(tabela_html2)
# Compare brain variables across groups (cognitive)
compare_groups_auto_v4(imputed_data[c(1, 5, 60:90)], group = "cognitive", covariates = "age_pcr")
## Variable
## 1 right_accumbens_area
## 2 left_accumbens_area
## 3 right_amygdala
## 4 left_amygdala
## 5 right_cerebellum_exterior
## 6 left_cerebellum_exterior
## 7 right_hippocampus
## 8 left_hippocampus
## 9 right_putamen
## 10 left_putamen
## 11 right_thalamus_proper
## 12 left_thalamus_proper
## 13 fornix_right
## 14 fornix_left
## 15 anterior_limb_of_internal_capsule_right
## 16 anterior_limb_of_internal_capsule_left
## 17 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right
## 18 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left
## 19 corpus_callosum
## 20 right_a_cg_g_anterior_cingulate_gyrus
## 21 left_a_cg_g_anterior_cingulate_gyrus
## 22 right_a_ins_anterior_insula
## 23 left_a_ins_anterior_insula
## 24 right_an_g_angular_gyrus
## 25 left_an_g_angular_gyrus
## 26 right_cun_cuneus
## 27 left_cun_cuneus
## 28 right_ent_entorhinal_area
## 29 left_ent_entorhinal_area
## 30 right_g_re_gyrus_rectus
## 31 left_g_re_gyrus_rectus
## Type Covariates_Used Group_Ref_Level_Used n_obs
## 1 Continuous (Gaussian GLM used) age_pcr 1 463
## 2 Continuous (Gaussian GLM used) age_pcr 1 463
## 3 Continuous (Gaussian GLM used) age_pcr 1 463
## 4 Continuous (Gaussian GLM used) age_pcr 1 463
## 5 Continuous (Gaussian GLM used) age_pcr 1 463
## 6 Continuous (Gaussian GLM used) age_pcr 1 463
## 7 Continuous (Gaussian GLM used) age_pcr 1 463
## 8 Continuous (Gaussian GLM used) age_pcr 1 463
## 9 Continuous (Gaussian GLM used) age_pcr 1 463
## 10 Continuous (Gaussian GLM used) age_pcr 1 463
## 11 Continuous (Gaussian GLM used) age_pcr 1 463
## 12 Continuous (Gaussian GLM used) age_pcr 1 463
## 13 Continuous (Gaussian GLM used) age_pcr 1 463
## 14 Continuous (Gaussian GLM used) age_pcr 1 463
## 15 Continuous (Gaussian GLM used) age_pcr 1 463
## 16 Continuous (Gaussian GLM used) age_pcr 1 463
## 17 Continuous (Gaussian GLM used) age_pcr 1 463
## 18 Continuous (Gaussian GLM used) age_pcr 1 463
## 19 Continuous (Gaussian GLM used) age_pcr 1 463
## 20 Continuous (Gaussian GLM used) age_pcr 1 463
## 21 Continuous (Gaussian GLM used) age_pcr 1 463
## 22 Continuous (Gaussian GLM used) age_pcr 1 463
## 23 Continuous (Gaussian GLM used) age_pcr 1 463
## 24 Continuous (Gaussian GLM used) age_pcr 1 463
## 25 Continuous (Gaussian GLM used) age_pcr 1 463
## 26 Continuous (Gaussian GLM used) age_pcr 1 463
## 27 Continuous (Gaussian GLM used) age_pcr 1 463
## 28 Continuous (Gaussian GLM used) age_pcr 1 463
## 29 Continuous (Gaussian GLM used) age_pcr 1 463
## 30 Continuous (Gaussian GLM used) age_pcr 1 463
## 31 Continuous (Gaussian GLM used) age_pcr 1 463
## Status p_value p_value_FDR Significant Gamma_Shift_Warning
## 1 OK 0.884 0.931 No FALSE
## 2 OK 0.497 0.854 No FALSE
## 3 OK 0.668 0.867 No FALSE
## 4 OK 0.318 0.854 No FALSE
## 5 OK 0.277 0.854 No FALSE
## 6 OK 0.251 0.854 No FALSE
## 7 OK 0.134 0.854 No FALSE
## 8 OK 0.134 0.854 No FALSE
## 9 OK 0.948 0.948 No FALSE
## 10 OK 0.439 0.854 No FALSE
## 11 OK 0.032 0.854 No FALSE
## 12 OK 0.058 0.854 No FALSE
## 13 OK 0.410 0.854 No FALSE
## 14 OK 0.584 0.854 No FALSE
## 15 OK 0.770 0.905 No FALSE
## 16 OK 0.528 0.854 No FALSE
## 17 OK 0.789 0.905 No FALSE
## 18 OK 0.561 0.854 No FALSE
## 19 OK 0.492 0.854 No FALSE
## 20 OK 0.511 0.854 No FALSE
## 21 OK 0.281 0.854 No FALSE
## 22 OK 0.283 0.854 No FALSE
## 23 OK 0.395 0.854 No FALSE
## 24 OK 0.901 0.931 No FALSE
## 25 OK 0.565 0.854 No FALSE
## 26 OK 0.606 0.854 No FALSE
## 27 OK 0.721 0.894 No FALSE
## 28 OK 0.671 0.867 No FALSE
## 29 OK 0.820 0.908 No FALSE
## 30 OK 0.252 0.854 No FALSE
## 31 OK 0.210 0.854 No FALSE
## Convergence_Warning
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## 7 FALSE
## 8 FALSE
## 9 FALSE
## 10 FALSE
## 11 FALSE
## 12 FALSE
## 13 FALSE
## 14 FALSE
## 15 FALSE
## 16 FALSE
## 17 FALSE
## 18 FALSE
## 19 FALSE
## 20 FALSE
## 21 FALSE
## 22 FALSE
## 23 FALSE
## 24 FALSE
## 25 FALSE
## 26 FALSE
## 27 FALSE
## 28 FALSE
## 29 FALSE
## 30 FALSE
## 31 FALSE
# Compare brain variables across groups (age_interval)
compare_groups_auto_v4(imputed_data[c(6, 60:90)], group = "age_interval")
## Variable
## 1 right_accumbens_area
## 2 left_accumbens_area
## 3 right_amygdala
## 4 left_amygdala
## 5 right_cerebellum_exterior
## 6 left_cerebellum_exterior
## 7 right_hippocampus
## 8 left_hippocampus
## 9 right_putamen
## 10 left_putamen
## 11 right_thalamus_proper
## 12 left_thalamus_proper
## 13 fornix_right
## 14 fornix_left
## 15 anterior_limb_of_internal_capsule_right
## 16 anterior_limb_of_internal_capsule_left
## 17 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right
## 18 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left
## 19 corpus_callosum
## 20 right_a_cg_g_anterior_cingulate_gyrus
## 21 left_a_cg_g_anterior_cingulate_gyrus
## 22 right_a_ins_anterior_insula
## 23 left_a_ins_anterior_insula
## 24 right_an_g_angular_gyrus
## 25 left_an_g_angular_gyrus
## 26 right_cun_cuneus
## 27 left_cun_cuneus
## 28 right_ent_entorhinal_area
## 29 left_ent_entorhinal_area
## 30 right_g_re_gyrus_rectus
## 31 left_g_re_gyrus_rectus
## Type Covariates_Used Group_Ref_Level_Used n_obs
## 1 Continuous (Gaussian GLM used) 1 463
## 2 Continuous (Gaussian GLM used) 1 463
## 3 Continuous (Gaussian GLM used) 1 463
## 4 Continuous (Gaussian GLM used) 1 463
## 5 Continuous (Gaussian GLM used) 1 463
## 6 Continuous (Gaussian GLM used) 1 463
## 7 Continuous (Gaussian GLM used) 1 463
## 8 Continuous (Gaussian GLM used) 1 463
## 9 Continuous (Gaussian GLM used) 1 463
## 10 Continuous (Gaussian GLM used) 1 463
## 11 Continuous (Gaussian GLM used) 1 463
## 12 Continuous (Gaussian GLM used) 1 463
## 13 Continuous (Gaussian GLM used) 1 463
## 14 Continuous (Gaussian GLM used) 1 463
## 15 Continuous (Gaussian GLM used) 1 463
## 16 Continuous (Gaussian GLM used) 1 463
## 17 Continuous (Gaussian GLM used) 1 463
## 18 Continuous (Gaussian GLM used) 1 463
## 19 Continuous (Gaussian GLM used) 1 463
## 20 Continuous (Gaussian GLM used) 1 463
## 21 Continuous (Gaussian GLM used) 1 463
## 22 Continuous (Gaussian GLM used) 1 463
## 23 Continuous (Gaussian GLM used) 1 463
## 24 Continuous (Gaussian GLM used) 1 463
## 25 Continuous (Gaussian GLM used) 1 463
## 26 Continuous (Gaussian GLM used) 1 463
## 27 Continuous (Gaussian GLM used) 1 463
## 28 Continuous (Gaussian GLM used) 1 463
## 29 Continuous (Gaussian GLM used) 1 463
## 30 Continuous (Gaussian GLM used) 1 463
## 31 Continuous (Gaussian GLM used) 1 463
## Status p_value p_value_FDR Significant Gamma_Shift_Warning
## 1 OK 0.008 0.016 Yes FALSE
## 2 OK 0.081 0.114 No FALSE
## 3 OK 0.609 0.609 No FALSE
## 4 OK 0.358 0.397 No FALSE
## 5 OK 0.095 0.128 No FALSE
## 6 OK 0.100 0.130 No FALSE
## 7 OK 0.002 0.004 Yes FALSE
## 8 OK <0.001 <0.001 Yes FALSE
## 9 OK 0.187 0.215 No FALSE
## 10 OK 0.030 0.049 Yes FALSE
## 11 OK <0.001 <0.001 Yes FALSE
## 12 OK <0.001 <0.001 Yes FALSE
## 13 OK <0.001 <0.001 Yes FALSE
## 14 OK <0.001 <0.001 Yes FALSE
## 15 OK <0.001 <0.001 Yes FALSE
## 16 OK <0.001 <0.001 Yes FALSE
## 17 OK 0.009 0.016 Yes FALSE
## 18 OK 0.068 0.100 No FALSE
## 19 OK 0.007 0.014 Yes FALSE
## 20 OK <0.001 <0.001 Yes FALSE
## 21 OK <0.001 <0.001 Yes FALSE
## 22 OK <0.001 <0.001 Yes FALSE
## 23 OK <0.001 <0.001 Yes FALSE
## 24 OK <0.001 0.001 Yes FALSE
## 25 OK 0.028 0.049 Yes FALSE
## 26 OK 0.004 0.009 Yes FALSE
## 27 OK 0.458 0.489 No FALSE
## 28 OK 0.131 0.156 No FALSE
## 29 OK 0.115 0.143 No FALSE
## 30 OK 0.040 0.061 No FALSE
## 31 OK 0.577 0.596 No FALSE
## Convergence_Warning
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## 7 FALSE
## 8 FALSE
## 9 FALSE
## 10 FALSE
## 11 FALSE
## 12 FALSE
## 13 FALSE
## 14 FALSE
## 15 FALSE
## 16 FALSE
## 17 FALSE
## 18 FALSE
## 19 FALSE
## 20 FALSE
## 21 FALSE
## 22 FALSE
## 23 FALSE
## 24 FALSE
## 25 FALSE
## 26 FALSE
## 27 FALSE
## 28 FALSE
## 29 FALSE
## 30 FALSE
## 31 FALSE
# Compare cognitive variables across groups (cognitive)
compare_groups_auto_v4(imputed_data[c(1, 5, 23:59)], group = "cognitive", covariates = "age_pcr")
## Variable Type Covariates_Used
## 1 listaprimerrec Continuous (Gaussian GLM used) age_pcr
## 2 listaaprendizaje Continuous (Gaussian GLM used) age_pcr
## 3 listacp Continuous (Gaussian GLM used) age_pcr
## 4 listalp Continuous (Gaussian GLM used) age_pcr
## 5 listarecon Continuous (Gaussian GLM used) age_pcr
## 6 corsidirecto Continuous (Gaussian GLM used) age_pcr
## 7 corsiinverso Continuous (Gaussian GLM used) age_pcr
## 8 cactusvivos Continuous (Gaussian GLM used) age_pcr
## 9 cactusinanim Continuous (Gaussian GLM used) age_pcr
## 10 otverbaltpo Continuous (Gaussian GLM used) age_pcr
## 11 otverbalerr Continuous (Gaussian GLM used) age_pcr
## 12 otvisualtpo Continuous (Gaussian GLM used) age_pcr
## 13 otvisualerr Continuous (Gaussian GLM used) age_pcr
## 14 otmentaltpo Continuous (Gaussian GLM used) age_pcr
## 15 otmentalerr Continuous (Gaussian GLM used) age_pcr
## 16 otvismenttpo Continuous (Gaussian GLM used) age_pcr
## 17 otvismenterr Continuous (Gaussian GLM used) age_pcr
## 18 otswitchtpo Continuous (Gaussian GLM used) age_pcr
## 19 otswitcherr Continuous (Gaussian GLM used) age_pcr
## 20 x5dreadtpo Continuous (Gaussian GLM used) age_pcr
## 21 x5dreaderr Continuous (Gaussian GLM used) age_pcr
## 22 x5dcounttpo Continuous (Gaussian GLM used) age_pcr
## 23 x5dcounterr Continuous (Gaussian GLM used) age_pcr
## 24 x5dfoctpo Continuous (Gaussian GLM used) age_pcr
## 25 x5dfocerr Continuous (Gaussian GLM used) age_pcr
## 26 x5dswitchtpo Continuous (Gaussian GLM used) age_pcr
## 27 x5dswitcherr Continuous (Gaussian GLM used) age_pcr
## 28 dscorr Continuous (Gaussian GLM used) age_pcr
## 29 dsomis Continuous age_pcr
## 30 dscomis Continuous (Gaussian GLM used) age_pcr
## 31 torremov Continuous (Gaussian GLM used) age_pcr
## 32 torretpo Continuous (Gaussian GLM used) age_pcr
## 33 bostonsc Continuous (Gaussian GLM used) age_pcr
## 34 bostonlat Continuous (Gaussian GLM used) age_pcr
## 35 bostonsemerr Continuous (Gaussian GLM used) age_pcr
## 36 bostonfonerr Continuous age_pcr
## 37 fluencia Continuous (Gaussian GLM used) age_pcr
## Group_Ref_Level_Used n_obs Status p_value
## 1 1 463 OK <0.001
## 2 1 463 OK <0.001
## 3 1 463 OK <0.001
## 4 1 463 OK <0.001
## 5 1 463 OK <0.001
## 6 1 463 OK 0.191
## 7 1 463 OK <0.001
## 8 1 463 OK <0.001
## 9 1 463 OK <0.001
## 10 1 463 OK <0.001
## 11 1 463 OK <0.001
## 12 1 463 OK <0.001
## 13 1 463 OK 0.041
## 14 1 463 OK <0.001
## 15 1 463 OK <0.001
## 16 1 463 OK <0.001
## 17 1 463 OK <0.001
## 18 1 463 OK <0.001
## 19 1 463 OK <0.001
## 20 1 463 OK <0.001
## 21 1 463 OK 0.356
## 22 1 463 OK <0.001
## 23 1 463 OK 0.811
## 24 1 463 OK <0.001
## 25 1 463 OK 0.001
## 26 1 463 OK <0.001
## 27 1 463 OK <0.001
## 28 1 463 OK <0.001
## 29 1 463 Error during model/LRT: NA/NaN/Inf in 'x' <NA>
## 30 1 463 OK 0.057
## 31 1 463 OK <0.001
## 32 1 463 OK <0.001
## 33 1 463 OK <0.001
## 34 1 463 OK <0.001
## 35 1 463 OK <0.001
## 36 1 463 Error during model/LRT: NA/NaN/Inf in 'x' <NA>
## 37 1 463 OK <0.001
## p_value_FDR Significant Gamma_Shift_Warning Convergence_Warning
## 1 <0.001 Yes FALSE FALSE
## 2 <0.001 Yes FALSE FALSE
## 3 <0.001 Yes FALSE FALSE
## 4 <0.001 Yes FALSE FALSE
## 5 <0.001 Yes FALSE FALSE
## 6 0.202 No FALSE FALSE
## 7 <0.001 Yes FALSE FALSE
## 8 <0.001 Yes FALSE FALSE
## 9 <0.001 Yes FALSE FALSE
## 10 <0.001 Yes FALSE FALSE
## 11 <0.001 Yes FALSE FALSE
## 12 <0.001 Yes FALSE FALSE
## 13 0.046 Yes FALSE FALSE
## 14 <0.001 Yes FALSE FALSE
## 15 <0.001 Yes FALSE FALSE
## 16 <0.001 Yes FALSE FALSE
## 17 <0.001 Yes FALSE FALSE
## 18 <0.001 Yes FALSE FALSE
## 19 <0.001 Yes FALSE FALSE
## 20 <0.001 Yes FALSE FALSE
## 21 0.366 No FALSE FALSE
## 22 <0.001 Yes FALSE FALSE
## 23 0.811 No FALSE FALSE
## 24 <0.001 Yes FALSE FALSE
## 25 0.001 Yes FALSE FALSE
## 26 <0.001 Yes FALSE FALSE
## 27 <0.001 Yes FALSE FALSE
## 28 <0.001 Yes FALSE FALSE
## 29 <NA> No TRUE FALSE
## 30 0.062 No FALSE FALSE
## 31 <0.001 Yes FALSE FALSE
## 32 <0.001 Yes FALSE FALSE
## 33 <0.001 Yes FALSE FALSE
## 34 <0.001 Yes FALSE FALSE
## 35 <0.001 Yes FALSE FALSE
## 36 <NA> No TRUE FALSE
## 37 <0.001 Yes FALSE FALSE
# Compare symptoms variables across groups (cognitive)
compare_groups_auto_v4(imputed_data[c(1:22)], group = "cognitive", covariates = "age_pcr")
## Variable Type Covariates_Used
## 1 de800cog Continuous (Gaussian GLM used) age_pcr
## 2 images Continuous (Gaussian GLM used) age_pcr
## 3 age_2024 Continuous (Gaussian GLM used) age_pcr
## 4 age_interval Continuous (Gaussian GLM used) age_pcr
## 5 anosmia Categorical age_pcr
## 6 risk_hospital_icu Categorical age_pcr
## 7 vaccine_before_study Categorical age_pcr
## 8 covid_before_vaccination Binary age_pcr
## 9 fever Binary age_pcr
## 10 cough Binary age_pcr
## 11 muscle_pain Binary age_pcr
## 12 breath_dif Binary age_pcr
## 13 smell_lost Binary age_pcr
## 14 taste_lost Binary age_pcr
## 15 pcr Binary age_pcr
## 16 pcr_num Binary age_pcr
## 17 covid_variant Categorical age_pcr
## 18 vaccine_1 Categorical age_pcr
## 19 vaccine_2 Categorical age_pcr
## 20 vaccine_3 Categorical age_pcr
## Group_Ref_Level_Used n_obs Status p_value p_value_FDR Significant
## 1 1 463 OK <0.001 <0.001 Yes
## 2 1 463 OK 0.235 0.521 No
## 3 1 463 OK 0.290 0.527 No
## 4 1 463 OK 0.132 0.330 No
## 5 1 463 OK 0.002 0.016 Yes
## 6 1 463 OK 0.534 0.681 No
## 7 1 463 OK 0.032 0.213 No
## 8 1 463 OK 0.093 0.266 No
## 9 1 463 OK 0.335 0.533 No
## 10 1 463 OK 0.545 0.681 No
## 11 1 463 OK 0.654 0.727 No
## 12 1 463 OK 0.063 0.257 No
## 13 1 463 OK 0.952 0.952 No
## 14 1 463 OK 0.283 0.527 No
## 15 1 463 OK 0.087 0.266 No
## 16 1 462 OK 0.064 0.257 No
## 17 1 463 OK 0.347 0.533 No
## 18 1 463 OK 0.451 0.645 No
## 19 1 463 OK 0.890 0.937 No
## 20 1 463 OK 0.649 0.727 No
## Gamma_Shift_Warning Convergence_Warning
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## 7 FALSE FALSE
## 8 FALSE FALSE
## 9 FALSE FALSE
## 10 FALSE FALSE
## 11 FALSE FALSE
## 12 FALSE FALSE
## 13 FALSE FALSE
## 14 FALSE FALSE
## 15 FALSE FALSE
## 16 FALSE FALSE
## 17 FALSE FALSE
## 18 FALSE FALSE
## 19 FALSE FALSE
## 20 FALSE FALSE
# Compare symptoms variables across groups (age_interval)
compare_groups_auto_v4(imputed_data[c(1:22)], group = "age_interval")
## Variable Type Covariates_Used
## 1 cognitive Binary
## 2 de800cog Continuous (Gaussian GLM used)
## 3 images Continuous (Gaussian GLM used)
## 4 age_2024 Continuous (Gaussian GLM used)
## 5 age_pcr Continuous (Gaussian GLM used)
## 6 anosmia Categorical
## 7 risk_hospital_icu Categorical
## 8 vaccine_before_study Categorical
## 9 covid_before_vaccination Binary
## 10 fever Binary
## 11 cough Binary
## 12 muscle_pain Binary
## 13 breath_dif Binary
## 14 smell_lost Binary
## 15 taste_lost Binary
## 16 pcr Binary
## 17 pcr_num Binary
## 18 covid_variant Categorical
## 19 vaccine_1 Categorical
## 20 vaccine_2 Categorical
## 21 vaccine_3 Categorical
## Group_Ref_Level_Used n_obs Status p_value p_value_FDR Significant
## 1 1 463 OK <0.001 <0.001 Yes
## 2 1 463 OK <0.001 0.001 Yes
## 3 1 463 OK <0.001 <0.001 Yes
## 4 1 463 OK <0.001 <0.001 Yes
## 5 1 463 OK <0.001 <0.001 Yes
## 6 1 463 OK 0.014 0.044 Yes
## 7 1 463 OK 0.712 0.712 No
## 8 1 463 OK 0.410 0.478 No
## 9 1 463 OK 0.470 0.493 No
## 10 1 463 OK 0.103 0.144 No
## 11 1 463 OK 0.039 0.074 No
## 12 1 463 OK 0.027 0.061 No
## 13 1 463 OK 0.081 0.130 No
## 14 1 463 OK 0.022 0.057 No
## 15 1 463 OK 0.015 0.044 Yes
## 16 1 463 OK 0.087 0.130 No
## 17 1 462 OK 0.071 0.125 No
## 18 1 463 OK 0.029 0.061 No
## 19 1 463 OK 0.432 0.478 No
## 20 1 463 OK 0.221 0.290 No
## 21 1 463 OK 0.264 0.326 No
## Gamma_Shift_Warning Convergence_Warning
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## 7 FALSE FALSE
## 8 FALSE FALSE
## 9 FALSE FALSE
## 10 FALSE FALSE
## 11 FALSE FALSE
## 12 FALSE FALSE
## 13 FALSE FALSE
## 14 FALSE FALSE
## 15 FALSE FALSE
## 16 FALSE FALSE
## 17 FALSE FALSE
## 18 FALSE FALSE
## 19 FALSE FALSE
## 20 FALSE FALSE
## 21 FALSE FALSE
The dataset (N = 463) exhibited minimal missingness for most variables, although some items (e.g., certain “cough” or “PCR” measures) had higher rates of missing data. Missing values were imputed using the CART (classification and regression trees) method (see: https://stefvanbuuren.name/fimd/sec-cart.html).
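As a minimal sketch of the CART imputation mentioned above, assuming the `mice` package was used (the text names only the method, not the package): the built-in `airquality` data stand in for the study data, and the seed and the single imputed dataset (`m = 1`) are illustrative choices rather than the original settings.

```r
# Sketch only: CART imputation with mice on a stand-in dataset.
# Assumption: the study used mice; airquality replaces the real data.
if (requireNamespace("mice", quietly = TRUE)) {
  imp <- mice::mice(airquality, method = "cart", m = 1,
                    seed = 123, printFlag = FALSE)
  toy_imputed <- mice::complete(imp, 1)  # extract the first completed dataset
  print(colSums(is.na(airquality)))      # missing values before imputation
  print(colSums(is.na(toy_imputed)))     # missing values after imputation
}
```

With more than one imputed dataset (`m > 1`), results would normally be pooled across imputations instead of analysing a single completed table.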
Outlier diagnostics indicated a small subset of extreme values in several cognitive-performance and neuroimaging measures. These did not prevent model convergence overall, but a few variables required a shift when attempting Gamma-based modeling, suggesting skewed distributions and the presence of zeros or negative values. I adjusted the p-values for multiple comparisons with a false discovery rate (FDR) controlling procedure in the R syntax (see: https://link.springer.com/referenceworkentry/10.1007/978-1-4419-9863-7_223).
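The FDR adjustment can be illustrated with base R's `p.adjust()`. This is a sketch under two assumptions: the Benjamini-Hochberg method (`"BH"`) is used, which the text does not state explicitly, and the p-values and region names are invented for illustration.

```r
# Sketch only: FDR adjustment of a family of p-values.
# Assumption: Benjamini-Hochberg ("BH"); the values below are made up.
p_raw <- c(region_a = 0.001, region_b = 0.012, region_c = 0.032,
           region_d = 0.210, region_e = 0.650)
p_fdr <- p.adjust(p_raw, method = "BH")

fdr_table <- data.frame(
  p_value     = p_raw,
  p_value_FDR = round(p_fdr, 3),
  Significant = ifelse(p_fdr < 0.05, "Yes", "No")
)
print(fdr_table)
```

In this toy family, `region_c` is below 0.05 before adjustment but not after it, which is the same pattern as the nominal-only findings reported in the tables.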
When grouping by the “cognitive” variable (binary/ordinal classification), we tested multiple subcortical and cortical volumes as outcomes in generalized linear models (Gaussian family) with “age_pcr” as a covariate.
None of the 31 brain volumetric measures differed significantly between cognitive groups after false discovery rate (FDR) correction. Only right_thalamus_proper reached nominal significance (uncorrected p = 0.032), and it did not survive multiple-comparison adjustment (FDR p = 0.854).
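The kind of test described above can be sketched for a single outcome as a likelihood-ratio test between nested Gaussian GLMs. This is an illustration on simulated data, not the internals of `compare_groups_auto_v4`; the `volume` variable and all numeric settings are hypothetical.

```r
# Sketch only: test a group effect on one volume, adjusting for age.
# Assumption: likelihood-ratio test of nested Gaussian GLMs; data are simulated.
set.seed(42)
n <- 200
toy <- data.frame(
  cognitive = factor(sample(c(1, 2), n, replace = TRUE)),
  age_pcr   = round(runif(n, 20, 80))
)
# Simulated volume depends on age only, not on the cognitive group
toy$volume <- 5000 - 10 * toy$age_pcr + rnorm(n, sd = 150)

fit_null <- glm(volume ~ age_pcr, family = gaussian(), data = toy)
fit_full <- glm(volume ~ age_pcr + cognitive, family = gaussian(), data = toy)

lrt <- anova(fit_null, fit_full, test = "LRT")
p_group <- lrt[["Pr(>Chi)"]][2]  # p-value for adding the group term
print(lrt)
```

Repeating this for each outcome and passing the collected p-values to an FDR procedure gives one adjusted p-value per variable, as in the tables.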
By contrast, grouping participants by “age_interval” (here without covariates) revealed widespread associations with brain volumes: 19 of the 31 regions remained significant after FDR correction, including the hippocampus, thalamus, fornix, anterior limb of the internal capsule, anterior cingulate gyrus, and anterior insula, several with FDR p < 0.001. The cerebellum, amygdala, and entorhinal area did not reach significance. This finding underscores the substantial impact of age on volumetric measures.
Additionally, certain task-based cognitive measures (e.g., reaction times and error rates in the 5D tasks or OT tasks) also varied significantly across age intervals, indicating age-related differences in cognitive performance.
Exploratory analyses of symptoms across “age_interval” yielded few associations that survived FDR correction: anosmia and taste_lost remained significant (FDR p = 0.044 for both), whereas cough, muscle_pain, smell_lost, and covid_variant were nominally significant (uncorrected p < 0.05) but not after correction.
Interestingly, “cognitive” status and certain related variables (e.g., de800cog) differed significantly across age intervals (p<0.001), suggesting that older groups may show different cognitive test profiles.
These preliminary results highlight age as a crucial factor influencing both neuroimaging measures and certain cognitive markers.
The “cognitive” grouping alone did not differentiate subcortical or cortical volumes once age was accounted for, suggesting that chronological age has a stronger effect on structural brain metrics than the cognitive classification in this sample.
# --- Define your outcome variable lists FIRST ---
# Example (replace with your actual column names):
cognitive_vars_detailed <- c("listaprimerrec", "listaaprendizaje", "listacp", "listalp",
"listarecon", "corsidirecto", "corsiinverso", "cactusvivos",
"cactusinanim", "otverbaltpo", "otverbalerr", "otvisualtpo",
"otvisualerr", "otmentaltpo", "otmentalerr", "otvismenttpo",
"otvismenterr", "otswitchtpo", "otswitcherr", "x5dreadtpo",
"x5dreaderr", "x5dcounttpo", "x5dcounterr", "x5dfoctpo",
"x5dfocerr", "x5dswitchtpo", "x5dswitcherr", "dscorr",
"dsomis", "dscomis", "torremov", "torretpo", "bostonsc",
"bostonlat", "bostonsemerr", "bostonfonerr", "fluencia")
neuroimaging_vars <- c("right_accumbens_area", "left_accumbens_area", "right_amygdala",
"left_amygdala", "right_cerebellum_exterior", "left_cerebellum_exterior",
"right_hippocampus", "left_hippocampus", "right_putamen",
"left_putamen", "right_thalamus_proper", "left_thalamus_proper",
"fornix_right", "fornix_left", "anterior_limb_of_internal_capsule_right",
"anterior_limb_of_internal_capsule_left",
"posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right",
"posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left",
"corpus_callosum", "right_a_cg_g_anterior_cingulate_gyrus",
"left_a_cg_g_anterior_cingulate_gyrus", "right_a_ins_anterior_insula",
"left_a_ins_anterior_insula", "right_an_g_angular_gyrus",
"left_an_g_angular_gyrus", "right_cun_cuneus", "left_cun_cuneus",
"right_ent_entorhinal_area", "left_ent_entorhinal_area",
"right_g_re_gyrus_rectus", "left_g_re_gyrus_rectus")
# Add any other relevant imaging vars
# --- Analysis 1: Grouping by COVID Severity within PCR+ group ---
# Goal: Impact of severity on cognition/brain in PCR+ group.
# Create subset for PCR positive
pcr_positive_data <- subset(imputed_data, pcr == "POSITIVA") # Or however POSITIVA is coded
# Check distribution of risk_hospital_icu within this subset
print(table(pcr_positive_data$risk_hospital_icu))
##
## 0 1 2 3
## 314 17 49 7
# Consider grouping levels 1, 2, 3 if counts are very low, e.g.:
# pcr_positive_data$severity_grouped <- ifelse(pcr_positive_data$risk_hospital_icu == 0, "0", "1+")
# Use "severity_grouped" as grouping_var if you do this.
# (Assuming pcr_positive_data and cognitive_vars_detailed are defined)
# Run for Cognitive Variables
results_severity_cog <- compare_groups_auto_v4(
vars_to_test = cognitive_vars_detailed,
group = "risk_hospital_icu",
covariates = c("age_pcr"),
data = pcr_positive_data
)
print(results_severity_cog)
## Variable Type Covariates_Used
## 1 listaprimerrec Continuous (Gaussian GLM used) age_pcr
## 2 listaaprendizaje Continuous (Gaussian GLM used) age_pcr
## 3 listacp Continuous (Gaussian GLM used) age_pcr
## 4 listalp Continuous (Gaussian GLM used) age_pcr
## 5 listarecon Continuous (Gaussian GLM used) age_pcr
## 6 corsidirecto Continuous (Gaussian GLM used) age_pcr
## 7 corsiinverso Continuous (Gaussian GLM used) age_pcr
## 8 cactusvivos Continuous (Gaussian GLM used) age_pcr
## 9 cactusinanim Continuous (Gaussian GLM used) age_pcr
## 10 otverbaltpo Continuous (Gaussian GLM used) age_pcr
## 11 otverbalerr Continuous (Gaussian GLM used) age_pcr
## 12 otvisualtpo Continuous (Gaussian GLM used) age_pcr
## 13 otvisualerr Continuous (Gaussian GLM used) age_pcr
## 14 otmentaltpo Continuous (Gaussian GLM used) age_pcr
## 15 otmentalerr Continuous (Gaussian GLM used) age_pcr
## 16 otvismenttpo Continuous (Gaussian GLM used) age_pcr
## 17 otvismenterr Continuous (Gaussian GLM used) age_pcr
## 18 otswitchtpo Continuous (Gaussian GLM used) age_pcr
## 19 otswitcherr Continuous (Gaussian GLM used) age_pcr
## 20 x5dreadtpo Continuous (Gaussian GLM used) age_pcr
## 21 x5dreaderr Continuous (Gaussian GLM used) age_pcr
## 22 x5dcounttpo Continuous (Gaussian GLM used) age_pcr
## 23 x5dcounterr Continuous (Gaussian GLM used) age_pcr
## 24 x5dfoctpo Continuous (Gaussian GLM used) age_pcr
## 25 x5dfocerr Continuous (Gaussian GLM used) age_pcr
## 26 x5dswitchtpo Continuous (Gaussian GLM used) age_pcr
## 27 x5dswitcherr Continuous (Gaussian GLM used) age_pcr
## 28 dscorr Continuous (Gaussian GLM used) age_pcr
## 29 dsomis Continuous age_pcr
## 30 dscomis Continuous (Gaussian GLM used) age_pcr
## 31 torremov Continuous (Gaussian GLM used) age_pcr
## 32 torretpo Continuous (Gaussian GLM used) age_pcr
## 33 bostonsc Continuous (Gaussian GLM used) age_pcr
## 34 bostonlat Continuous (Gaussian GLM used) age_pcr
## 35 bostonsemerr Continuous (Gaussian GLM used) age_pcr
## 36 bostonfonerr Continuous (Gaussian GLM used) age_pcr
## 37 fluencia Continuous (Gaussian GLM used) age_pcr
## Group_Ref_Level_Used n_obs Status p_value
## 1 0 387 OK 0.463
## 2 0 387 OK 0.201
## 3 0 387 OK 0.667
## 4 0 387 OK 0.833
## 5 0 387 OK 0.191
## 6 0 387 OK 0.870
## 7 0 387 OK 0.132
## 8 0 387 OK 0.810
## 9 0 387 OK 0.424
## 10 0 387 OK 0.958
## 11 0 387 OK 0.072
## 12 0 387 OK 0.979
## 13 0 387 OK 0.977
## 14 0 387 OK 0.322
## 15 0 387 OK 0.590
## 16 0 387 OK 0.114
## 17 0 387 OK 0.892
## 18 0 387 OK 0.435
## 19 0 387 OK 0.755
## 20 0 387 OK 0.380
## 21 0 387 OK 0.499
## 22 0 387 OK 0.520
## 23 0 387 OK 0.937
## 24 0 387 OK 0.937
## 25 0 387 OK 0.894
## 26 0 387 OK 0.418
## 27 0 387 OK 0.919
## 28 0 387 OK 0.960
## 29 0 387 Error during model/LRT: NA/NaN/Inf in 'x' <NA>
## 30 0 387 OK 0.648
## 31 0 387 OK 0.049
## 32 0 387 OK 0.254
## 33 0 387 OK 0.096
## 34 0 387 OK 0.327
## 35 0 387 OK 0.105
## 36 0 387 OK 0.185
## 37 0 387 OK 0.106
## p_value_FDR Significant Gamma_Shift_Warning Convergence_Warning
## 1 0.926 No FALSE FALSE
## 2 0.723 No FALSE FALSE
## 3 0.979 No FALSE FALSE
## 4 0.979 No FALSE FALSE
## 5 0.723 No FALSE FALSE
## 6 0.979 No FALSE FALSE
## 7 0.681 No FALSE FALSE
## 8 0.979 No FALSE FALSE
## 9 0.922 No FALSE FALSE
## 10 0.979 No FALSE FALSE
## 11 0.681 No FALSE FALSE
## 12 0.979 No FALSE FALSE
## 13 0.979 No FALSE FALSE
## 14 0.906 No FALSE FALSE
## 15 0.979 No FALSE FALSE
## 16 0.681 No FALSE FALSE
## 17 0.979 No FALSE FALSE
## 18 0.922 No FALSE FALSE
## 19 0.979 No FALSE FALSE
## 20 0.922 No FALSE FALSE
## 21 0.935 No FALSE FALSE
## 22 0.935 No FALSE FALSE
## 23 0.979 No FALSE FALSE
## 24 0.979 No FALSE FALSE
## 25 0.979 No FALSE FALSE
## 26 0.922 No FALSE FALSE
## 27 0.979 No FALSE FALSE
## 28 0.979 No FALSE FALSE
## 29 <NA> No TRUE FALSE
## 30 0.979 No FALSE FALSE
## 31 0.681 No FALSE FALSE
## 32 0.832 No FALSE FALSE
## 33 0.681 No FALSE FALSE
## 34 0.906 No FALSE FALSE
## 35 0.681 No FALSE FALSE
## 36 0.723 No FALSE FALSE
## 37 0.681 No FALSE FALSE
# Run for Neuroimaging Variables
results_severity_neuro <- compare_groups_auto_v4(
vars_to_test = neuroimaging_vars,
group = "risk_hospital_icu",
covariates = c("age_pcr"),
data = pcr_positive_data
)
print(results_severity_neuro)
## Variable
## 1 right_accumbens_area
## 2 left_accumbens_area
## 3 right_amygdala
## 4 left_amygdala
## 5 right_cerebellum_exterior
## 6 left_cerebellum_exterior
## 7 right_hippocampus
## 8 left_hippocampus
## 9 right_putamen
## 10 left_putamen
## 11 right_thalamus_proper
## 12 left_thalamus_proper
## 13 fornix_right
## 14 fornix_left
## 15 anterior_limb_of_internal_capsule_right
## 16 anterior_limb_of_internal_capsule_left
## 17 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right
## 18 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left
## 19 corpus_callosum
## 20 right_a_cg_g_anterior_cingulate_gyrus
## 21 left_a_cg_g_anterior_cingulate_gyrus
## 22 right_a_ins_anterior_insula
## 23 left_a_ins_anterior_insula
## 24 right_an_g_angular_gyrus
## 25 left_an_g_angular_gyrus
## 26 right_cun_cuneus
## 27 left_cun_cuneus
## 28 right_ent_entorhinal_area
## 29 left_ent_entorhinal_area
## 30 right_g_re_gyrus_rectus
## 31 left_g_re_gyrus_rectus
## Type Covariates_Used Group_Ref_Level_Used n_obs
## 1 Continuous (Gaussian GLM used) age_pcr 0 387
## 2 Continuous (Gaussian GLM used) age_pcr 0 387
## 3 Continuous (Gaussian GLM used) age_pcr 0 387
## 4 Continuous (Gaussian GLM used) age_pcr 0 387
## 5 Continuous (Gaussian GLM used) age_pcr 0 387
## 6 Continuous (Gaussian GLM used) age_pcr 0 387
## 7 Continuous (Gaussian GLM used) age_pcr 0 387
## 8 Continuous (Gaussian GLM used) age_pcr 0 387
## 9 Continuous (Gaussian GLM used) age_pcr 0 387
## 10 Continuous (Gaussian GLM used) age_pcr 0 387
## 11 Continuous (Gaussian GLM used) age_pcr 0 387
## 12 Continuous (Gaussian GLM used) age_pcr 0 387
## 13 Continuous (Gaussian GLM used) age_pcr 0 387
## 14 Continuous (Gaussian GLM used) age_pcr 0 387
## 15 Continuous (Gaussian GLM used) age_pcr 0 387
## 16 Continuous (Gaussian GLM used) age_pcr 0 387
## 17 Continuous (Gaussian GLM used) age_pcr 0 387
## 18 Continuous (Gaussian GLM used) age_pcr 0 387
## 19 Continuous (Gaussian GLM used) age_pcr 0 387
## 20 Continuous (Gaussian GLM used) age_pcr 0 387
## 21 Continuous (Gaussian GLM used) age_pcr 0 387
## 22 Continuous (Gaussian GLM used) age_pcr 0 387
## 23 Continuous (Gaussian GLM used) age_pcr 0 387
## 24 Continuous (Gaussian GLM used) age_pcr 0 387
## 25 Continuous (Gaussian GLM used) age_pcr 0 387
## 26 Continuous (Gaussian GLM used) age_pcr 0 387
## 27 Continuous (Gaussian GLM used) age_pcr 0 387
## 28 Continuous (Gaussian GLM used) age_pcr 0 387
## 29 Continuous (Gaussian GLM used) age_pcr 0 387
## 30 Continuous (Gaussian GLM used) age_pcr 0 387
## 31 Continuous (Gaussian GLM used) age_pcr 0 387
## Status p_value p_value_FDR Significant Gamma_Shift_Warning
## 1 OK 0.578 0.853 No FALSE
## 2 OK 0.439 0.853 No FALSE
## 3 OK 0.062 0.595 No FALSE
## 4 OK 0.247 0.850 No FALSE
## 5 OK 0.431 0.853 No FALSE
## 6 OK 0.344 0.853 No FALSE
## 7 OK 0.722 0.948 No FALSE
## 8 OK 0.501 0.853 No FALSE
## 9 OK 0.986 0.989 No FALSE
## 10 OK 0.795 0.948 No FALSE
## 11 OK 0.774 0.948 No FALSE
## 12 OK 0.894 0.956 No FALSE
## 13 OK 0.154 0.680 No FALSE
## 14 OK 0.069 0.595 No FALSE
## 15 OK 0.637 0.897 No FALSE
## 16 OK 0.542 0.853 No FALSE
## 17 OK 0.837 0.951 No FALSE
## 18 OK 0.357 0.853 No FALSE
## 19 OK 0.565 0.853 No FALSE
## 20 OK 0.989 0.989 No FALSE
## 21 OK 0.483 0.853 No FALSE
## 22 OK 0.457 0.853 No FALSE
## 23 OK 0.372 0.853 No FALSE
## 24 OK 0.077 0.595 No FALSE
## 25 OK 0.236 0.850 No FALSE
## 26 OK 0.146 0.680 No FALSE
## 27 OK 0.373 0.853 No FALSE
## 28 OK 0.027 0.595 No FALSE
## 29 OK 0.753 0.948 No FALSE
## 30 OK 0.142 0.680 No FALSE
## 31 OK 0.859 0.951 No FALSE
## Convergence_Warning
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## 7 FALSE
## 8 FALSE
## 9 FALSE
## 10 FALSE
## 11 FALSE
## 12 FALSE
## 13 FALSE
## 14 FALSE
## 15 FALSE
## 16 FALSE
## 17 FALSE
## 18 FALSE
## 19 FALSE
## 20 FALSE
## 21 FALSE
## 22 FALSE
## 23 FALSE
## 24 FALSE
## 25 FALSE
## 26 FALSE
## 27 FALSE
## 28 FALSE
## 29 FALSE
## 30 FALSE
## 31 FALSE
Cognitive Outcomes: After adjusting for age (age_pcr) and applying FDR correction for multiple comparisons, none of the detailed cognitive performance variables differed significantly across COVID-19 severity levels (risk_hospital_icu, reference level 0) within the PCR-positive group. The model for dsomis could not be fitted and returned an error. One variable (torremov, p=0.049) was nominally significant before FDR correction but not afterward (FDR p=0.681).
Neuroimaging Outcomes: Similarly, structural neuroimaging volumes within the PCR-positive group did not differ significantly across COVID-19 severity levels after adjusting for age (age_pcr) and applying FDR correction. One variable (right_ent_entorhinal_area, p=0.027) was nominally significant but did not survive multiple comparison correction (FDR p=0.595).
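When an outcome such as dsomis cannot be fitted, a quick look at its distribution often narrows down the cause. The sketch below is an illustration only: `diagnose_outcome` is a hypothetical helper, the example vector is invented, and the idea that dsomis is a zero-heavy count is an assumption, not something this section confirms.

```r
# Hypothetical helper: summarise properties of an outcome that commonly
# cause GLM fitting to fail (missing, infinite, zero, or negative values,
# or very few distinct values).
diagnose_outcome <- function(x) {
  c(
    n           = length(x),
    n_missing   = sum(is.na(x)),
    n_nonfinite = sum(!is.na(x) & !is.finite(x)),
    n_zero      = sum(x == 0, na.rm = TRUE),
    n_negative  = sum(x < 0, na.rm = TRUE),
    n_distinct  = length(unique(x[is.finite(x)]))
  )
}

# Invented example: an omission count that is zero for most participants.
omissions <- c(0, 0, 0, 1, 0, 2, 0, 0, NA, 0)
print(diagnose_outcome(omissions))
```

In the real analysis the call would be something like `diagnose_outcome(pcr_positive_data$dsomis)`; a variable dominated by zeros or with only a handful of distinct values may need a different model family or may be better excluded from the battery.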
# Goal: Explore relationship between pre-study vaccination and outcomes.
# Define key symptom variables if needed
symptom_vars_key <- c("anosmia", "risk_hospital_icu") # Add others if desired
# Run for Cognitive Variables
results_vaccine_cog <- compare_groups_auto_v4(
  vars_to_test = cognitive_vars_detailed,
  group = "vaccine_before_study",
  covariates = c("age_pcr"),
  data = imputed_data
)
print(results_vaccine_cog)
## Variable Type Covariates_Used
## 1 listaprimerrec Continuous (Gaussian GLM used) age_pcr
## 2 listaaprendizaje Continuous (Gaussian GLM used) age_pcr
## 3 listacp Continuous (Gaussian GLM used) age_pcr
## 4 listalp Continuous (Gaussian GLM used) age_pcr
## 5 listarecon Continuous (Gaussian GLM used) age_pcr
## 6 corsidirecto Continuous (Gaussian GLM used) age_pcr
## 7 corsiinverso Continuous (Gaussian GLM used) age_pcr
## 8 cactusvivos Continuous (Gaussian GLM used) age_pcr
## 9 cactusinanim Continuous (Gaussian GLM used) age_pcr
## 10 otverbaltpo Continuous (Gaussian GLM used) age_pcr
## 11 otverbalerr Continuous (Gaussian GLM used) age_pcr
## 12 otvisualtpo Continuous (Gaussian GLM used) age_pcr
## 13 otvisualerr Continuous (Gaussian GLM used) age_pcr
## 14 otmentaltpo Continuous (Gaussian GLM used) age_pcr
## 15 otmentalerr Continuous (Gaussian GLM used) age_pcr
## 16 otvismenttpo Continuous (Gaussian GLM used) age_pcr
## 17 otvismenterr Continuous (Gaussian GLM used) age_pcr
## 18 otswitchtpo Continuous (Gaussian GLM used) age_pcr
## 19 otswitcherr Continuous (Gaussian GLM used) age_pcr
## 20 x5dreadtpo Continuous (Gaussian GLM used) age_pcr
## 21 x5dreaderr Continuous (Gaussian GLM used) age_pcr
## 22 x5dcounttpo Continuous (Gaussian GLM used) age_pcr
## 23 x5dcounterr Continuous (Gaussian GLM used) age_pcr
## 24 x5dfoctpo Continuous (Gaussian GLM used) age_pcr
## 25 x5dfocerr Continuous (Gaussian GLM used) age_pcr
## 26 x5dswitchtpo Continuous (Gaussian GLM used) age_pcr
## 27 x5dswitcherr Continuous (Gaussian GLM used) age_pcr
## 28 dscorr Continuous (Gaussian GLM used) age_pcr
## 29 dsomis Continuous age_pcr
## 30 dscomis Continuous (Gaussian GLM used) age_pcr
## 31 torremov Continuous (Gaussian GLM used) age_pcr
## 32 torretpo Continuous (Gaussian GLM used) age_pcr
## 33 bostonsc Continuous (Gaussian GLM used) age_pcr
## 34 bostonlat Continuous (Gaussian GLM used) age_pcr
## 35 bostonsemerr Continuous (Gaussian GLM used) age_pcr
## 36 bostonfonerr Continuous age_pcr
## 37 fluencia Continuous (Gaussian GLM used) age_pcr
## Group_Ref_Level_Used n_obs Status p_value
## 1 0 463 OK 0.770
## 2 0 463 OK 0.007
## 3 0 463 OK 0.681
## 4 0 463 OK 0.585
## 5 0 463 OK 0.476
## 6 0 463 OK 0.069
## 7 0 463 OK 0.735
## 8 0 463 OK 0.703
## 9 0 463 OK 0.953
## 10 0 463 OK 0.525
## 11 0 463 OK 0.376
## 12 0 463 OK 0.070
## 13 0 463 OK 0.410
## 14 0 463 OK 0.091
## 15 0 463 OK 0.814
## 16 0 463 OK 0.017
## 17 0 463 OK 0.595
## 18 0 463 OK 0.013
## 19 0 463 OK 0.175
## 20 0 463 OK 0.224
## 21 0 463 OK 0.264
## 22 0 463 OK 0.155
## 23 0 463 OK 0.325
## 24 0 463 OK 0.325
## 25 0 463 OK 0.077
## 26 0 463 OK 0.207
## 27 0 463 OK 0.220
## 28 0 463 OK 0.279
## 29 0 463 Error during model/LRT: NA/NaN/Inf in 'x' <NA>
## 30 0 463 OK 0.041
## 31 0 463 OK 0.064
## 32 0 463 OK 0.001
## 33 0 463 OK 0.402
## 34 0 463 OK <0.001
## 35 0 463 OK 0.022
## 36 0 463 Error during model/LRT: NA/NaN/Inf in 'x' <NA>
## 37 0 463 OK 0.518
## p_value_FDR Significant Gamma_Shift_Warning Convergence_Warning
## 1 0.816 No FALSE FALSE
## 2 0.079 No FALSE FALSE
## 3 0.794 No FALSE FALSE
## 4 0.718 No FALSE FALSE
## 5 0.666 No FALSE FALSE
## 6 0.245 No FALSE FALSE
## 7 0.804 No FALSE FALSE
## 8 0.794 No FALSE FALSE
## 9 0.953 No FALSE FALSE
## 10 0.681 No FALSE FALSE
## 11 0.598 No FALSE FALSE
## 12 0.245 No FALSE FALSE
## 13 0.598 No FALSE FALSE
## 14 0.265 No FALSE FALSE
## 15 0.838 No FALSE FALSE
## 16 0.120 No FALSE FALSE
## 17 0.718 No FALSE FALSE
## 18 0.110 No FALSE FALSE
## 19 0.437 No FALSE FALSE
## 20 0.461 No FALSE FALSE
## 21 0.513 No FALSE FALSE
## 22 0.418 No FALSE FALSE
## 23 0.541 No FALSE FALSE
## 24 0.541 No FALSE FALSE
## 25 0.245 No FALSE FALSE
## 26 0.461 No FALSE FALSE
## 27 0.461 No FALSE FALSE
## 28 0.514 No FALSE FALSE
## 29 <NA> No TRUE FALSE
## 30 0.207 No FALSE FALSE
## 31 0.245 No FALSE FALSE
## 32 0.023 Yes FALSE FALSE
## 33 0.598 No FALSE FALSE
## 34 0.009 Yes FALSE FALSE
## 35 0.127 No FALSE FALSE
## 36 <NA> No TRUE FALSE
## 37 0.681 No FALSE FALSE
# Run for Neuroimaging Variables
results_vaccine_neuro <- compare_groups_auto_v4(
  vars_to_test = neuroimaging_vars,
  group = "vaccine_before_study",
  covariates = c("age_pcr"),
  data = imputed_data
)
print(results_vaccine_neuro)
## Variable
## 1 right_accumbens_area
## 2 left_accumbens_area
## 3 right_amygdala
## 4 left_amygdala
## 5 right_cerebellum_exterior
## 6 left_cerebellum_exterior
## 7 right_hippocampus
## 8 left_hippocampus
## 9 right_putamen
## 10 left_putamen
## 11 right_thalamus_proper
## 12 left_thalamus_proper
## 13 fornix_right
## 14 fornix_left
## 15 anterior_limb_of_internal_capsule_right
## 16 anterior_limb_of_internal_capsule_left
## 17 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right
## 18 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left
## 19 corpus_callosum
## 20 right_a_cg_g_anterior_cingulate_gyrus
## 21 left_a_cg_g_anterior_cingulate_gyrus
## 22 right_a_ins_anterior_insula
## 23 left_a_ins_anterior_insula
## 24 right_an_g_angular_gyrus
## 25 left_an_g_angular_gyrus
## 26 right_cun_cuneus
## 27 left_cun_cuneus
## 28 right_ent_entorhinal_area
## 29 left_ent_entorhinal_area
## 30 right_g_re_gyrus_rectus
## 31 left_g_re_gyrus_rectus
## Type Covariates_Used Group_Ref_Level_Used n_obs
## 1 Continuous (Gaussian GLM used) age_pcr 0 463
## 2 Continuous (Gaussian GLM used) age_pcr 0 463
## 3 Continuous (Gaussian GLM used) age_pcr 0 463
## 4 Continuous (Gaussian GLM used) age_pcr 0 463
## 5 Continuous (Gaussian GLM used) age_pcr 0 463
## 6 Continuous (Gaussian GLM used) age_pcr 0 463
## 7 Continuous (Gaussian GLM used) age_pcr 0 463
## 8 Continuous (Gaussian GLM used) age_pcr 0 463
## 9 Continuous (Gaussian GLM used) age_pcr 0 463
## 10 Continuous (Gaussian GLM used) age_pcr 0 463
## 11 Continuous (Gaussian GLM used) age_pcr 0 463
## 12 Continuous (Gaussian GLM used) age_pcr 0 463
## 13 Continuous (Gaussian GLM used) age_pcr 0 463
## 14 Continuous (Gaussian GLM used) age_pcr 0 463
## 15 Continuous (Gaussian GLM used) age_pcr 0 463
## 16 Continuous (Gaussian GLM used) age_pcr 0 463
## 17 Continuous (Gaussian GLM used) age_pcr 0 463
## 18 Continuous (Gaussian GLM used) age_pcr 0 463
## 19 Continuous (Gaussian GLM used) age_pcr 0 463
## 20 Continuous (Gaussian GLM used) age_pcr 0 463
## 21 Continuous (Gaussian GLM used) age_pcr 0 463
## 22 Continuous (Gaussian GLM used) age_pcr 0 463
## 23 Continuous (Gaussian GLM used) age_pcr 0 463
## 24 Continuous (Gaussian GLM used) age_pcr 0 463
## 25 Continuous (Gaussian GLM used) age_pcr 0 463
## 26 Continuous (Gaussian GLM used) age_pcr 0 463
## 27 Continuous (Gaussian GLM used) age_pcr 0 463
## 28 Continuous (Gaussian GLM used) age_pcr 0 463
## 29 Continuous (Gaussian GLM used) age_pcr 0 463
## 30 Continuous (Gaussian GLM used) age_pcr 0 463
## 31 Continuous (Gaussian GLM used) age_pcr 0 463
## Status p_value p_value_FDR Significant Gamma_Shift_Warning
## 1 OK 0.352 0.971 No FALSE
## 2 OK 0.202 0.971 No FALSE
## 3 OK 0.901 0.984 No FALSE
## 4 OK 0.317 0.971 No FALSE
## 5 OK 0.956 0.984 No FALSE
## 6 OK 0.984 0.984 No FALSE
## 7 OK 0.976 0.984 No FALSE
## 8 OK 0.859 0.984 No FALSE
## 9 OK 0.740 0.984 No FALSE
## 10 OK 0.376 0.971 No FALSE
## 11 OK 0.620 0.984 No FALSE
## 12 OK 0.625 0.984 No FALSE
## 13 OK 0.228 0.971 No FALSE
## 14 OK 0.275 0.971 No FALSE
## 15 OK 0.602 0.984 No FALSE
## 16 OK 0.854 0.984 No FALSE
## 17 OK 0.789 0.984 No FALSE
## 18 OK 0.524 0.984 No FALSE
## 19 OK 0.250 0.971 No FALSE
## 20 OK 0.264 0.971 No FALSE
## 21 OK 0.052 0.971 No FALSE
## 22 OK 0.122 0.971 No FALSE
## 23 OK 0.089 0.971 No FALSE
## 24 OK 0.664 0.984 No FALSE
## 25 OK 0.824 0.984 No FALSE
## 26 OK 0.932 0.984 No FALSE
## 27 OK 0.759 0.984 No FALSE
## 28 OK 0.722 0.984 No FALSE
## 29 OK 0.572 0.984 No FALSE
## 30 OK 0.809 0.984 No FALSE
## 31 OK 0.310 0.971 No FALSE
## Convergence_Warning
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## 7 FALSE
## 8 FALSE
## 9 FALSE
## 10 FALSE
## 11 FALSE
## 12 FALSE
## 13 FALSE
## 14 FALSE
## 15 FALSE
## 16 FALSE
## 17 FALSE
## 18 FALSE
## 19 FALSE
## 20 FALSE
## 21 FALSE
## 22 FALSE
## 23 FALSE
## 24 FALSE
## 25 FALSE
## 26 FALSE
## 27 FALSE
## 28 FALSE
## 29 FALSE
## 30 FALSE
## 31 FALSE
# Run for Key Symptoms
results_vaccine_symptoms <- compare_groups_auto_v4(
  vars_to_test = symptom_vars_key,
  group = "vaccine_before_study",
  covariates = c("age_pcr"),
  data = imputed_data
)
print(results_vaccine_symptoms)
## Variable Type Covariates_Used Group_Ref_Level_Used n_obs
## 1 anosmia Categorical age_pcr 0 463
## 2 risk_hospital_icu Categorical age_pcr 0 463
## Status p_value p_value_FDR Significant Gamma_Shift_Warning
## 1 OK 0.793 0.793 No FALSE
## 2 OK 0.025 0.050 Yes FALSE
## Convergence_Warning
## 1 FALSE
## 2 FALSE
# Note: You might later stratify these by PCR status if sample sizes allow.
Cognitive Outcomes: Controlling for age (age_pcr) and applying FDR correction, pre-study vaccination status (reference level 0) was significantly associated with torretpo (Tower Task Time, FDR p=0.023) and bostonlat (Boston Latency, FDR p=0.009). No other cognitive variable was significant after FDR correction, although listaaprendizaje, otvismenttpo, otswitchtpo, dscomis, and bostonsemerr had uncorrected p-values < 0.05. The models for dsomis and bostonfonerr could not be fitted; both returned a model-fitting error.
Neuroimaging Outcomes: After adjusting for age (age_pcr) and correcting for multiple comparisons, no significant differences in neuroimaging volumes were found between participants with different pre-study vaccination statuses.
Symptom Outcomes: With anosmia and risk_hospital_icu as outcomes and age as a covariate, risk_hospital_icu differed across vaccination status groups (p=0.025) and is flagged as significant in the output. Its FDR-adjusted p-value, however, is printed as 0.050, exactly at the 0.05 threshold, so this result is best treated as borderline. No significant difference was found for anosmia (FDR p=0.793).
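The FDR column of the two-symptom table can be checked by hand. The sketch below assumes that "FDR" refers to the Benjamini-Hochberg adjustment as implemented by `p.adjust` (the helper's internals are not shown in this section) and uses the rounded p-values printed in the output, so small discrepancies from the unrounded values are possible.

```r
# Rounded p-values copied from the printed symptom table.
p_raw <- c(anosmia = 0.793, risk_hospital_icu = 0.025)

# Benjamini-Hochberg adjustment (assumed to be what "FDR" means here).
p_fdr <- p.adjust(p_raw, method = "BH")
print(round(p_fdr, 3))

# With only two tests, the smaller p-value is multiplied by 2 and the
# larger one by 2/2, so the adjusted value for risk_hospital_icu lands
# on the threshold itself. Whether it is flagged then depends on the
# comparison operator used and on the unrounded p-value.
flags <- data.frame(
  p_raw          = p_raw,
  p_fdr          = p_fdr,
  flag_strict    = p_fdr < 0.05,
  flag_inclusive = p_fdr <= 0.05
)
print(flags)
```

Because the adjustment is applied within each call, a family of only two tests is corrected far less severely than the 37 cognitive or 31 neuroimaging variables, which is worth keeping in mind when comparing "Significant" flags across tables.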
# Goal: Is getting COVID before vs. after vaccination associated with different outcomes?
# Ensure the grouping variable isn't mostly missing data in the subset
print(table(pcr_positive_data$covid_before_vaccination, useNA = "ifany"))
##
## 0 1
## 187 200
# Run for Cognitive Variables
results_timing_cog <- compare_groups_auto_v4(
  vars_to_test = cognitive_vars_detailed,
  group = "covid_before_vaccination",
  covariates = c("age_pcr", "vaccine_before_study"), # Control for overall vaccine status too
  data = pcr_positive_data
)
print(results_timing_cog)
## Variable Type
## 1 listaprimerrec Continuous (Gaussian GLM used)
## 2 listaaprendizaje Continuous (Gaussian GLM used)
## 3 listacp Continuous (Gaussian GLM used)
## 4 listalp Continuous (Gaussian GLM used)
## 5 listarecon Continuous (Gaussian GLM used)
## 6 corsidirecto Continuous (Gaussian GLM used)
## 7 corsiinverso Continuous (Gaussian GLM used)
## 8 cactusvivos Continuous (Gaussian GLM used)
## 9 cactusinanim Continuous (Gaussian GLM used)
## 10 otverbaltpo Continuous (Gaussian GLM used)
## 11 otverbalerr Continuous (Gaussian GLM used)
## 12 otvisualtpo Continuous (Gaussian GLM used)
## 13 otvisualerr Continuous (Gaussian GLM used)
## 14 otmentaltpo Continuous (Gaussian GLM used)
## 15 otmentalerr Continuous (Gaussian GLM used)
## 16 otvismenttpo Continuous (Gaussian GLM used)
## 17 otvismenterr Continuous (Gaussian GLM used)
## 18 otswitchtpo Continuous (Gaussian GLM used)
## 19 otswitcherr Continuous (Gaussian GLM used)
## 20 x5dreadtpo Continuous (Gaussian GLM used)
## 21 x5dreaderr Continuous (Gaussian GLM used)
## 22 x5dcounttpo Continuous (Gaussian GLM used)
## 23 x5dcounterr Continuous (Gaussian GLM used)
## 24 x5dfoctpo Continuous (Gaussian GLM used)
## 25 x5dfocerr Continuous (Gaussian GLM used)
## 26 x5dswitchtpo Continuous (Gaussian GLM used)
## 27 x5dswitcherr Continuous (Gaussian GLM used)
## 28 dscorr Continuous (Gaussian GLM used)
## 29 dsomis Continuous
## 30 dscomis Continuous (Gaussian GLM used)
## 31 torremov Continuous (Gaussian GLM used)
## 32 torretpo Continuous (Gaussian GLM used)
## 33 bostonsc Continuous (Gaussian GLM used)
## 34 bostonlat Continuous (Gaussian GLM used)
## 35 bostonsemerr Continuous (Gaussian GLM used)
## 36 bostonfonerr Continuous (Gaussian GLM used)
## 37 fluencia Continuous (Gaussian GLM used)
## Covariates_Used Group_Ref_Level_Used n_obs
## 1 age_pcr, vaccine_before_study 0 387
## 2 age_pcr, vaccine_before_study 0 387
## 3 age_pcr, vaccine_before_study 0 387
## 4 age_pcr, vaccine_before_study 0 387
## 5 age_pcr, vaccine_before_study 0 387
## 6 age_pcr, vaccine_before_study 0 387
## 7 age_pcr, vaccine_before_study 0 387
## 8 age_pcr, vaccine_before_study 0 387
## 9 age_pcr, vaccine_before_study 0 387
## 10 age_pcr, vaccine_before_study 0 387
## 11 age_pcr, vaccine_before_study 0 387
## 12 age_pcr, vaccine_before_study 0 387
## 13 age_pcr, vaccine_before_study 0 387
## 14 age_pcr, vaccine_before_study 0 387
## 15 age_pcr, vaccine_before_study 0 387
## 16 age_pcr, vaccine_before_study 0 387
## 17 age_pcr, vaccine_before_study 0 387
## 18 age_pcr, vaccine_before_study 0 387
## 19 age_pcr, vaccine_before_study 0 387
## 20 age_pcr, vaccine_before_study 0 387
## 21 age_pcr, vaccine_before_study 0 387
## 22 age_pcr, vaccine_before_study 0 387
## 23 age_pcr, vaccine_before_study 0 387
## 24 age_pcr, vaccine_before_study 0 387
## 25 age_pcr, vaccine_before_study 0 387
## 26 age_pcr, vaccine_before_study 0 387
## 27 age_pcr, vaccine_before_study 0 387
## 28 age_pcr, vaccine_before_study 0 387
## 29 age_pcr, vaccine_before_study 0 387
## 30 age_pcr, vaccine_before_study 0 387
## 31 age_pcr, vaccine_before_study 0 387
## 32 age_pcr, vaccine_before_study 0 387
## 33 age_pcr, vaccine_before_study 0 387
## 34 age_pcr, vaccine_before_study 0 387
## 35 age_pcr, vaccine_before_study 0 387
## 36 age_pcr, vaccine_before_study 0 387
## 37 age_pcr, vaccine_before_study 0 387
## Status p_value p_value_FDR Significant
## 1 OK 0.345 0.910 No
## 2 OK 0.624 0.910 No
## 3 OK 0.946 0.973 No
## 4 OK 0.559 0.910 No
## 5 OK 0.157 0.751 No
## 6 OK 0.238 0.893 No
## 7 OK 0.025 0.297 No
## 8 OK 0.709 0.910 No
## 9 OK 0.356 0.910 No
## 10 OK 0.600 0.910 No
## 11 OK 0.094 0.563 No
## 12 OK 0.396 0.910 No
## 13 OK 0.852 0.935 No
## 14 OK 0.822 0.935 No
## 15 OK 0.514 0.910 No
## 16 OK 0.747 0.910 No
## 17 OK 0.722 0.910 No
## 18 OK 0.935 0.973 No
## 19 OK 0.640 0.910 No
## 20 OK 0.690 0.910 No
## 21 OK 0.093 0.563 No
## 22 OK 0.857 0.935 No
## 23 OK 0.012 0.297 No
## 24 OK 0.984 0.984 No
## 25 OK 0.167 0.751 No
## 26 OK 0.705 0.910 No
## 27 OK 0.046 0.410 No
## 28 OK 0.248 0.893 No
## 29 Error during model/LRT: NA/NaN/Inf in 'x' <NA> <NA> No
## 30 OK 0.462 0.910 No
## 31 OK 0.480 0.910 No
## 32 OK 0.759 0.910 No
## 33 OK 0.304 0.910 No
## 34 OK 0.021 0.297 No
## 35 OK 0.478 0.910 No
## 36 OK 0.274 0.897 No
## 37 OK 0.727 0.910 No
## Gamma_Shift_Warning Convergence_Warning
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## 7 FALSE FALSE
## 8 FALSE FALSE
## 9 FALSE FALSE
## 10 FALSE FALSE
## 11 FALSE FALSE
## 12 FALSE FALSE
## 13 FALSE FALSE
## 14 FALSE FALSE
## 15 FALSE FALSE
## 16 FALSE FALSE
## 17 FALSE FALSE
## 18 FALSE FALSE
## 19 FALSE FALSE
## 20 FALSE FALSE
## 21 FALSE FALSE
## 22 FALSE FALSE
## 23 FALSE FALSE
## 24 FALSE FALSE
## 25 FALSE FALSE
## 26 FALSE FALSE
## 27 FALSE FALSE
## 28 FALSE FALSE
## 29 TRUE FALSE
## 30 FALSE FALSE
## 31 FALSE FALSE
## 32 FALSE FALSE
## 33 FALSE FALSE
## 34 FALSE FALSE
## 35 FALSE FALSE
## 36 FALSE FALSE
## 37 FALSE FALSE
# Run for Neuroimaging Variables
results_timing_neuro <- compare_groups_auto_v4(
  vars_to_test = neuroimaging_vars,
  group = "covid_before_vaccination",
  covariates = c("age_pcr", "vaccine_before_study"),
  data = pcr_positive_data
)
print(results_timing_neuro)
## Variable
## 1 right_accumbens_area
## 2 left_accumbens_area
## 3 right_amygdala
## 4 left_amygdala
## 5 right_cerebellum_exterior
## 6 left_cerebellum_exterior
## 7 right_hippocampus
## 8 left_hippocampus
## 9 right_putamen
## 10 left_putamen
## 11 right_thalamus_proper
## 12 left_thalamus_proper
## 13 fornix_right
## 14 fornix_left
## 15 anterior_limb_of_internal_capsule_right
## 16 anterior_limb_of_internal_capsule_left
## 17 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right
## 18 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left
## 19 corpus_callosum
## 20 right_a_cg_g_anterior_cingulate_gyrus
## 21 left_a_cg_g_anterior_cingulate_gyrus
## 22 right_a_ins_anterior_insula
## 23 left_a_ins_anterior_insula
## 24 right_an_g_angular_gyrus
## 25 left_an_g_angular_gyrus
## 26 right_cun_cuneus
## 27 left_cun_cuneus
## 28 right_ent_entorhinal_area
## 29 left_ent_entorhinal_area
## 30 right_g_re_gyrus_rectus
## 31 left_g_re_gyrus_rectus
## Type Covariates_Used
## 1 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 2 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 3 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 4 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 5 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 6 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 7 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 8 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 9 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 10 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 11 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 12 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 13 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 14 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 15 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 16 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 17 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 18 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 19 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 20 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 21 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 22 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 23 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 24 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 25 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 26 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 27 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 28 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 29 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 30 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 31 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## Group_Ref_Level_Used n_obs Status p_value p_value_FDR Significant
## 1 0 387 OK 0.431 0.937 No
## 2 0 387 OK 0.573 0.937 No
## 3 0 387 OK 0.052 0.810 No
## 4 0 387 OK 0.316 0.823 No
## 5 0 387 OK 0.691 0.937 No
## 6 0 387 OK 0.295 0.823 No
## 7 0 387 OK 0.142 0.810 No
## 8 0 387 OK 0.157 0.810 No
## 9 0 387 OK 0.987 0.987 No
## 10 0 387 OK 0.877 0.937 No
## 11 0 387 OK 0.791 0.937 No
## 12 0 387 OK 0.695 0.937 No
## 13 0 387 OK 0.148 0.810 No
## 14 0 387 OK 0.770 0.937 No
## 15 0 387 OK 0.978 0.987 No
## 16 0 387 OK 0.691 0.937 No
## 17 0 387 OK 0.848 0.937 No
## 18 0 387 OK 0.796 0.937 No
## 19 0 387 OK 0.845 0.937 No
## 20 0 387 OK 0.295 0.823 No
## 21 0 387 OK 0.255 0.823 No
## 22 0 387 OK 0.319 0.823 No
## 23 0 387 OK 0.487 0.937 No
## 24 0 387 OK 0.027 0.810 No
## 25 0 387 OK 0.087 0.810 No
## 26 0 387 OK 0.591 0.937 No
## 27 0 387 OK 0.527 0.937 No
## 28 0 387 OK 0.650 0.937 No
## 29 0 387 OK 0.239 0.823 No
## 30 0 387 OK 0.694 0.937 No
## 31 0 387 OK 0.439 0.937 No
## Gamma_Shift_Warning Convergence_Warning
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## 7 FALSE FALSE
## 8 FALSE FALSE
## 9 FALSE FALSE
## 10 FALSE FALSE
## 11 FALSE FALSE
## 12 FALSE FALSE
## 13 FALSE FALSE
## 14 FALSE FALSE
## 15 FALSE FALSE
## 16 FALSE FALSE
## 17 FALSE FALSE
## 18 FALSE FALSE
## 19 FALSE FALSE
## 20 FALSE FALSE
## 21 FALSE FALSE
## 22 FALSE FALSE
## 23 FALSE FALSE
## 24 FALSE FALSE
## 25 FALSE FALSE
## 26 FALSE FALSE
## 27 FALSE FALSE
## 28 FALSE FALSE
## 29 FALSE FALSE
## 30 FALSE FALSE
## 31 FALSE FALSE
Cognitive Outcomes: Within the PCR-positive group, comparing those infected before versus after vaccination (reference level 0), no significant differences in detailed cognitive scores were found after adjusting for age (age_pcr) and pre-study vaccination status (vaccine_before_study) and applying FDR correction. Four variables (corsiinverso, x5dcounterr, x5dswitcherr, bostonlat) were nominally significant (p<0.05) but did not meet the FDR threshold. The model for dsomis could not be fitted.
Neuroimaging Outcomes: Similarly, no significant differences in neuroimaging volumes were detected between PCR-positive individuals infected before versus after vaccination, after controlling for age, pre-study vaccination status, and multiple comparisons. One region reached nominal significance (right_an_g_angular_gyrus, p=0.027) and two others approached it (right_amygdala, p=0.052; left_an_g_angular_gyrus, p=0.087), but all three were non-significant after FDR correction (FDR p=0.810).
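A single covariate-adjusted comparison of this kind can be sketched as a likelihood-ratio test between two nested Gaussian GLMs. This is an assumption about what the helper does, inferred from the "Gaussian GLM used" and "Error during model/LRT" labels in the output; the data below are simulated, and only the column names mirror the analysis.

```r
set.seed(1)
n <- 200

# Simulated stand-in data: names follow the analysis, values are invented.
sim <- data.frame(
  age_pcr                  = rnorm(n, mean = 45, sd = 12),
  vaccine_before_study     = factor(rbinom(n, 1, 0.6)),
  covid_before_vaccination = factor(rbinom(n, 1, 0.5))
)
sim$corsiinverso <- 5 - 0.02 * sim$age_pcr + rnorm(n)

# Reduced model: covariates only.
fit_reduced <- glm(corsiinverso ~ age_pcr + vaccine_before_study,
                   family = gaussian(), data = sim)

# Full model: the same covariates plus the grouping factor of interest.
fit_full <- update(fit_reduced, . ~ . + covid_before_vaccination)

# Likelihood-ratio test for the grouping factor, adjusted for the covariates.
lrt <- anova(fit_reduced, fit_full, test = "LRT")
p_value <- lrt[["Pr(>Chi)"]][2]
print(p_value)
```

Testing the grouping factor as a whole, instead of reading individual coefficients, extends naturally to grouping variables with more than two levels.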
# Goal: Is presence of specific symptoms linked to cognitive or brain changes?
# Example for Smell Loss:
results_smell_cog <- compare_groups_auto_v4(
  vars_to_test = cognitive_vars_detailed,
  group = "smell_lost", # Assumes this is 0/1 coded
  covariates = c("age_pcr", "risk_hospital_icu"), # Control for age and severity
  data = pcr_positive_data
)
print(results_smell_cog)
## Variable Type Covariates_Used
## 1 listaprimerrec Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 2 listaaprendizaje Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 3 listacp Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 4 listalp Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 5 listarecon Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 6 corsidirecto Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 7 corsiinverso Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 8 cactusvivos Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 9 cactusinanim Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 10 otverbaltpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 11 otverbalerr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 12 otvisualtpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 13 otvisualerr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 14 otmentaltpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 15 otmentalerr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 16 otvismenttpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 17 otvismenterr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 18 otswitchtpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 19 otswitcherr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 20 x5dreadtpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 21 x5dreaderr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 22 x5dcounttpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 23 x5dcounterr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 24 x5dfoctpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 25 x5dfocerr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 26 x5dswitchtpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 27 x5dswitcherr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 28 dscorr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 29 dsomis Continuous age_pcr, risk_hospital_icu
## 30 dscomis Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 31 torremov Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 32 torretpo Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 33 bostonsc Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 34 bostonlat Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 35 bostonsemerr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 36 bostonfonerr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 37 fluencia Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## Group_Ref_Level_Used n_obs Status p_value
## 1 0 387 OK 0.405
## 2 0 387 OK 0.522
## 3 0 387 OK 0.680
## 4 0 387 OK 0.333
## 5 0 387 OK 0.316
## 6 0 387 OK 0.769
## 7 0 387 OK 0.174
## 8 0 387 OK 0.993
## 9 0 387 OK 0.394
## 10 0 387 OK 0.513
## 11 0 387 OK 0.604
## 12 0 387 OK 0.829
## 13 0 387 OK 0.815
## 14 0 387 OK 0.764
## 15 0 387 OK 0.849
## 16 0 387 OK 0.911
## 17 0 387 OK 0.694
## 18 0 387 OK 0.882
## 19 0 387 OK 0.742
## 20 0 387 OK 0.804
## 21 0 387 OK 0.180
## 22 0 387 OK 0.385
## 23 0 387 OK 0.361
## 24 0 387 OK 0.582
## 25 0 387 OK 0.029
## 26 0 387 OK 0.371
## 27 0 387 OK 0.446
## 28 0 387 OK 0.954
## 29 0 387 Error during model/LRT: NA/NaN/Inf in 'x' <NA>
## 30 0 387 OK 0.958
## 31 0 387 OK 0.887
## 32 0 387 OK 0.820
## 33 0 387 OK 0.482
## 34 0 387 OK 0.513
## 35 0 387 OK 0.592
## 36 0 387 OK 0.850
## 37 0 387 OK 0.716
## p_value_FDR Significant Gamma_Shift_Warning Convergence_Warning
## 1 0.985 No FALSE FALSE
## 2 0.985 No FALSE FALSE
## 3 0.985 No FALSE FALSE
## 4 0.985 No FALSE FALSE
## 5 0.985 No FALSE FALSE
## 6 0.985 No FALSE FALSE
## 7 0.985 No FALSE FALSE
## 8 0.993 No FALSE FALSE
## 9 0.985 No FALSE FALSE
## 10 0.985 No FALSE FALSE
## 11 0.985 No FALSE FALSE
## 12 0.985 No FALSE FALSE
## 13 0.985 No FALSE FALSE
## 14 0.985 No FALSE FALSE
## 15 0.985 No FALSE FALSE
## 16 0.985 No FALSE FALSE
## 17 0.985 No FALSE FALSE
## 18 0.985 No FALSE FALSE
## 19 0.985 No FALSE FALSE
## 20 0.985 No FALSE FALSE
## 21 0.985 No FALSE FALSE
## 22 0.985 No FALSE FALSE
## 23 0.985 No FALSE FALSE
## 24 0.985 No FALSE FALSE
## 25 0.985 No FALSE FALSE
## 26 0.985 No FALSE FALSE
## 27 0.985 No FALSE FALSE
## 28 0.985 No FALSE FALSE
## 29 <NA> No TRUE FALSE
## 30 0.985 No FALSE FALSE
## 31 0.985 No FALSE FALSE
## 32 0.985 No FALSE FALSE
## 33 0.985 No FALSE FALSE
## 34 0.985 No FALSE FALSE
## 35 0.985 No FALSE FALSE
## 36 0.985 No FALSE FALSE
## 37 0.985 No FALSE FALSE
results_smell_neuro <- compare_groups_auto_v4(
  vars_to_test = neuroimaging_vars,
  group = "smell_lost",
  covariates = c("age_pcr", "risk_hospital_icu"),
  data = pcr_positive_data
)
print(results_smell_neuro)
## Variable
## 1 right_accumbens_area
## 2 left_accumbens_area
## 3 right_amygdala
## 4 left_amygdala
## 5 right_cerebellum_exterior
## 6 left_cerebellum_exterior
## 7 right_hippocampus
## 8 left_hippocampus
## 9 right_putamen
## 10 left_putamen
## 11 right_thalamus_proper
## 12 left_thalamus_proper
## 13 fornix_right
## 14 fornix_left
## 15 anterior_limb_of_internal_capsule_right
## 16 anterior_limb_of_internal_capsule_left
## 17 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right
## 18 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left
## 19 corpus_callosum
## 20 right_a_cg_g_anterior_cingulate_gyrus
## 21 left_a_cg_g_anterior_cingulate_gyrus
## 22 right_a_ins_anterior_insula
## 23 left_a_ins_anterior_insula
## 24 right_an_g_angular_gyrus
## 25 left_an_g_angular_gyrus
## 26 right_cun_cuneus
## 27 left_cun_cuneus
## 28 right_ent_entorhinal_area
## 29 left_ent_entorhinal_area
## 30 right_g_re_gyrus_rectus
## 31 left_g_re_gyrus_rectus
## Type Covariates_Used
## 1 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 2 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 3 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 4 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 5 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 6 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 7 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 8 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 9 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 10 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 11 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 12 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 13 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 14 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 15 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 16 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 17 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 18 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 19 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 20 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 21 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 22 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 23 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 24 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 25 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 26 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 27 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 28 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 29 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 30 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 31 Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## Group_Ref_Level_Used n_obs Status p_value p_value_FDR Significant
## 1 0 387 OK 0.221 0.554 No
## 2 0 387 OK 0.352 0.554 No
## 3 0 387 OK 0.610 0.700 No
## 4 0 387 OK 0.987 0.998 No
## 5 0 387 OK 0.451 0.607 No
## 6 0 387 OK 0.250 0.554 No
## 7 0 387 OK 0.998 0.998 No
## 8 0 387 OK 0.374 0.554 No
## 9 0 387 OK 0.025 0.380 No
## 10 0 387 OK 0.038 0.397 No
## 11 0 387 OK 0.264 0.554 No
## 12 0 387 OK 0.363 0.554 No
## 13 0 387 OK 0.176 0.545 No
## 14 0 387 OK 0.470 0.607 No
## 15 0 387 OK 0.752 0.803 No
## 16 0 387 OK 0.535 0.638 No
## 17 0 387 OK 0.147 0.538 No
## 18 0 387 OK 0.349 0.554 No
## 19 0 387 OK 0.639 0.708 No
## 20 0 387 OK 0.393 0.554 No
## 21 0 387 OK 0.105 0.538 No
## 22 0 387 OK 0.132 0.538 No
## 23 0 387 OK 0.299 0.554 No
## 24 0 387 OK 0.128 0.538 No
## 25 0 387 OK 0.383 0.554 No
## 26 0 387 OK 0.531 0.638 No
## 27 0 387 OK 0.354 0.554 No
## 28 0 387 OK 0.259 0.554 No
## 29 0 387 OK 0.156 0.538 No
## 30 0 387 OK 0.008 0.257 No
## 31 0 387 OK 0.094 0.538 No
## Gamma_Shift_Warning Convergence_Warning
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## 7 FALSE FALSE
## 8 FALSE FALSE
## 9 FALSE FALSE
## 10 FALSE FALSE
## 11 FALSE FALSE
## 12 FALSE FALSE
## 13 FALSE FALSE
## 14 FALSE FALSE
## 15 FALSE FALSE
## 16 FALSE FALSE
## 17 FALSE FALSE
## 18 FALSE FALSE
## 19 FALSE FALSE
## 20 FALSE FALSE
## 21 FALSE FALSE
## 22 FALSE FALSE
## 23 FALSE FALSE
## 24 FALSE FALSE
## 25 FALSE FALSE
## 26 FALSE FALSE
## 27 FALSE FALSE
## 28 FALSE FALSE
## 29 FALSE FALSE
## 30 FALSE FALSE
## 31 FALSE FALSE
# *** The same structure can then be repeated for other binary symptoms, for example: ***
# grouping_var = "taste_lost"
# grouping_var = "breath_dif"
# etc.
Cognitive Outcomes: Among PCR-positive participants, those with and without reported smell loss (reference level 0) were compared while controlling for age (age_pcr) and COVID severity (risk_hospital_icu). No detailed cognitive score differed significantly after FDR correction. x5dfocerr was nominally significant (p=0.029) but not after correction (FDR p=0.985). The analysis for dsomis failed.
Neuroimaging Outcomes: After adjusting for age and COVID severity and applying FDR correction, no significant differences in neuroimaging volumes were observed between PCR-positive participants with and without smell loss. Three regions were nominally significant, right_g_re_gyrus_rectus (p=0.008), right_putamen (p=0.025), and left_putamen (p=0.038), but none survived multiple-comparison correction (FDR p=0.257, 0.380, and 0.397, respectively).
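The FDR column in the neuroimaging table above can be checked by hand. The sketch below assumes that compare_groups_auto_v4 applies the Benjamini-Hochberg procedure across all variables tested in one call; the function's internals are not shown in this section, so that is an assumption. The p-values are copied from the table and are rounded to three decimals, so the adjusted values should agree with p_value_FDR only up to rounding.

```r
# Nominal p-values copied from the smell-loss neuroimaging table (rows 1-31)
p_nominal <- c(0.221, 0.352, 0.610, 0.987, 0.451, 0.250, 0.998, 0.374,
               0.025, 0.038, 0.264, 0.363, 0.176, 0.470, 0.752, 0.535,
               0.147, 0.349, 0.639, 0.393, 0.105, 0.132, 0.299, 0.128,
               0.383, 0.531, 0.354, 0.259, 0.156, 0.008, 0.094)

# Benjamini-Hochberg adjustment (assumed to be what "FDR" means in the table)
p_fdr <- p.adjust(p_nominal, method = "BH")

# Side-by-side view of the three nominally significant regions (rows 9, 10, 30)
check <- data.frame(row = c(9, 10, 30),
                    p_value = p_nominal[c(9, 10, 30)],
                    p_value_FDR = round(p_fdr[c(9, 10, 30)], 3))
print(check)

# Number of tests that survive correction at alpha = 0.05
n_significant <- sum(p_fdr < 0.05)
print(n_significant)
```

With 31 tests, the smallest p-value would have to fall below roughly 0.05/31 ≈ 0.0016 to survive on its own, which is why a nominal p of 0.008 is not enough here.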
# Goal: Do different major variants associate with specific outcomes?
# First, create the grouped variant variable (ensure dplyr is loaded)
library(dplyr)
pcr_positive_data <- pcr_positive_data %>%
mutate(covid_variant_grouped = case_when(
covid_variant %in% c(0, 1, 2, 3) ~ as.character(covid_variant), # Keep major ones separate
TRUE ~ "Other_Rare" # Group others
)) %>%
mutate(covid_variant_grouped = factor(covid_variant_grouped)) # Make it a factor
print(table(pcr_positive_data$covid_variant_grouped))
##
## 0 1 2 3 Other_Rare
## 5 222 94 59 7
# Run for Cognitive Variables
results_variant_cog <- compare_groups_auto_v4(
vars_to_test = cognitive_vars_detailed,
group = "covid_variant_grouped",
covariates = c("age_pcr", "vaccine_before_study"),
data = pcr_positive_data
)
print(results_variant_cog)
## Variable Type
## 1 listaprimerrec Continuous (Gaussian GLM used)
## 2 listaaprendizaje Continuous (Gaussian GLM used)
## 3 listacp Continuous (Gaussian GLM used)
## 4 listalp Continuous (Gaussian GLM used)
## 5 listarecon Continuous (Gaussian GLM used)
## 6 corsidirecto Continuous (Gaussian GLM used)
## 7 corsiinverso Continuous (Gaussian GLM used)
## 8 cactusvivos Continuous (Gaussian GLM used)
## 9 cactusinanim Continuous (Gaussian GLM used)
## 10 otverbaltpo Continuous (Gaussian GLM used)
## 11 otverbalerr Continuous (Gaussian GLM used)
## 12 otvisualtpo Continuous (Gaussian GLM used)
## 13 otvisualerr Continuous (Gaussian GLM used)
## 14 otmentaltpo Continuous (Gaussian GLM used)
## 15 otmentalerr Continuous (Gaussian GLM used)
## 16 otvismenttpo Continuous (Gaussian GLM used)
## 17 otvismenterr Continuous (Gaussian GLM used)
## 18 otswitchtpo Continuous (Gaussian GLM used)
## 19 otswitcherr Continuous (Gaussian GLM used)
## 20 x5dreadtpo Continuous (Gaussian GLM used)
## 21 x5dreaderr Continuous (Gaussian GLM used)
## 22 x5dcounttpo Continuous (Gaussian GLM used)
## 23 x5dcounterr Continuous (Gaussian GLM used)
## 24 x5dfoctpo Continuous (Gaussian GLM used)
## 25 x5dfocerr Continuous (Gaussian GLM used)
## 26 x5dswitchtpo Continuous (Gaussian GLM used)
## 27 x5dswitcherr Continuous (Gaussian GLM used)
## 28 dscorr Continuous (Gaussian GLM used)
## 29 dsomis Continuous
## 30 dscomis Continuous (Gaussian GLM used)
## 31 torremov Continuous (Gaussian GLM used)
## 32 torretpo Continuous (Gaussian GLM used)
## 33 bostonsc Continuous (Gaussian GLM used)
## 34 bostonlat Continuous (Gaussian GLM used)
## 35 bostonsemerr Continuous (Gaussian GLM used)
## 36 bostonfonerr Continuous (Gaussian GLM used)
## 37 fluencia Continuous (Gaussian GLM used)
## Covariates_Used Group_Ref_Level_Used n_obs
## 1 age_pcr, vaccine_before_study 0 387
## 2 age_pcr, vaccine_before_study 0 387
## 3 age_pcr, vaccine_before_study 0 387
## 4 age_pcr, vaccine_before_study 0 387
## 5 age_pcr, vaccine_before_study 0 387
## 6 age_pcr, vaccine_before_study 0 387
## 7 age_pcr, vaccine_before_study 0 387
## 8 age_pcr, vaccine_before_study 0 387
## 9 age_pcr, vaccine_before_study 0 387
## 10 age_pcr, vaccine_before_study 0 387
## 11 age_pcr, vaccine_before_study 0 387
## 12 age_pcr, vaccine_before_study 0 387
## 13 age_pcr, vaccine_before_study 0 387
## 14 age_pcr, vaccine_before_study 0 387
## 15 age_pcr, vaccine_before_study 0 387
## 16 age_pcr, vaccine_before_study 0 387
## 17 age_pcr, vaccine_before_study 0 387
## 18 age_pcr, vaccine_before_study 0 387
## 19 age_pcr, vaccine_before_study 0 387
## 20 age_pcr, vaccine_before_study 0 387
## 21 age_pcr, vaccine_before_study 0 387
## 22 age_pcr, vaccine_before_study 0 387
## 23 age_pcr, vaccine_before_study 0 387
## 24 age_pcr, vaccine_before_study 0 387
## 25 age_pcr, vaccine_before_study 0 387
## 26 age_pcr, vaccine_before_study 0 387
## 27 age_pcr, vaccine_before_study 0 387
## 28 age_pcr, vaccine_before_study 0 387
## 29 age_pcr, vaccine_before_study 0 387
## 30 age_pcr, vaccine_before_study 0 387
## 31 age_pcr, vaccine_before_study 0 387
## 32 age_pcr, vaccine_before_study 0 387
## 33 age_pcr, vaccine_before_study 0 387
## 34 age_pcr, vaccine_before_study 0 387
## 35 age_pcr, vaccine_before_study 0 387
## 36 age_pcr, vaccine_before_study 0 387
## 37 age_pcr, vaccine_before_study 0 387
## Status p_value p_value_FDR Significant
## 1 OK 0.003 0.067 No
## 2 OK 0.153 0.554 No
## 3 OK 0.512 0.943 No
## 4 OK 0.136 0.554 No
## 5 OK 0.123 0.554 No
## 6 OK 0.587 0.943 No
## 7 OK 0.127 0.554 No
## 8 OK 0.358 0.920 No
## 9 OK 0.679 0.943 No
## 10 OK 0.908 0.951 No
## 11 OK 0.589 0.943 No
## 12 OK 0.812 0.943 No
## 13 OK 0.697 0.943 No
## 14 OK 0.907 0.951 No
## 15 OK 0.727 0.943 No
## 16 OK 0.970 0.970 No
## 17 OK 0.925 0.951 No
## 18 OK 0.764 0.943 No
## 19 OK 0.802 0.943 No
## 20 OK 0.649 0.943 No
## 21 OK 0.037 0.336 No
## 22 OK 0.925 0.951 No
## 23 OK 0.008 0.101 No
## 24 OK 0.573 0.943 No
## 25 OK 0.154 0.554 No
## 26 OK 0.501 0.943 No
## 27 OK 0.345 0.920 No
## 28 OK 0.205 0.671 No
## 29 Error during model/LRT: NA/NaN/Inf in 'x' <NA> <NA> No
## 30 OK 0.650 0.943 No
## 31 OK 0.763 0.943 No
## 32 OK 0.699 0.943 No
## 33 OK 0.338 0.920 No
## 34 OK 0.547 0.943 No
## 35 OK 0.797 0.943 No
## 36 OK 0.061 0.439 No
## 37 OK 0.004 0.067 No
## Gamma_Shift_Warning Convergence_Warning
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## 7 FALSE FALSE
## 8 FALSE FALSE
## 9 FALSE FALSE
## 10 FALSE FALSE
## 11 FALSE FALSE
## 12 FALSE FALSE
## 13 FALSE FALSE
## 14 FALSE FALSE
## 15 FALSE FALSE
## 16 FALSE FALSE
## 17 FALSE FALSE
## 18 FALSE FALSE
## 19 FALSE FALSE
## 20 FALSE FALSE
## 21 FALSE FALSE
## 22 FALSE FALSE
## 23 FALSE FALSE
## 24 FALSE FALSE
## 25 FALSE FALSE
## 26 FALSE FALSE
## 27 FALSE FALSE
## 28 FALSE FALSE
## 29 TRUE FALSE
## 30 FALSE FALSE
## 31 FALSE FALSE
## 32 FALSE FALSE
## 33 FALSE FALSE
## 34 FALSE FALSE
## 35 FALSE FALSE
## 36 FALSE FALSE
## 37 FALSE FALSE
# Run for Neuroimaging Variables
results_variant_neuro <- compare_groups_auto_v4(
vars_to_test = neuroimaging_vars,
group = "covid_variant_grouped",
covariates = c("age_pcr", "vaccine_before_study"),
data = pcr_positive_data
)
print(results_variant_neuro)
## Variable
## 1 right_accumbens_area
## 2 left_accumbens_area
## 3 right_amygdala
## 4 left_amygdala
## 5 right_cerebellum_exterior
## 6 left_cerebellum_exterior
## 7 right_hippocampus
## 8 left_hippocampus
## 9 right_putamen
## 10 left_putamen
## 11 right_thalamus_proper
## 12 left_thalamus_proper
## 13 fornix_right
## 14 fornix_left
## 15 anterior_limb_of_internal_capsule_right
## 16 anterior_limb_of_internal_capsule_left
## 17 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right
## 18 posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left
## 19 corpus_callosum
## 20 right_a_cg_g_anterior_cingulate_gyrus
## 21 left_a_cg_g_anterior_cingulate_gyrus
## 22 right_a_ins_anterior_insula
## 23 left_a_ins_anterior_insula
## 24 right_an_g_angular_gyrus
## 25 left_an_g_angular_gyrus
## 26 right_cun_cuneus
## 27 left_cun_cuneus
## 28 right_ent_entorhinal_area
## 29 left_ent_entorhinal_area
## 30 right_g_re_gyrus_rectus
## 31 left_g_re_gyrus_rectus
## Type Covariates_Used
## 1 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 2 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 3 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 4 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 5 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 6 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 7 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 8 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 9 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 10 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 11 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 12 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 13 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 14 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 15 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 16 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 17 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 18 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 19 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 20 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 21 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 22 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 23 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 24 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 25 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 26 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 27 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 28 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 29 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 30 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## 31 Continuous (Gaussian GLM used) age_pcr, vaccine_before_study
## Group_Ref_Level_Used n_obs Status p_value p_value_FDR Significant
## 1 0 387 OK 0.664 0.879 No
## 2 0 387 OK 0.552 0.778 No
## 3 0 387 OK 0.015 0.457 No
## 4 0 387 OK 0.517 0.778 No
## 5 0 387 OK 0.442 0.778 No
## 6 0 387 OK 0.176 0.654 No
## 7 0 387 OK 0.190 0.654 No
## 8 0 387 OK 0.832 0.921 No
## 9 0 387 OK 0.780 0.896 No
## 10 0 387 OK 0.737 0.879 No
## 11 0 387 OK 0.314 0.778 No
## 12 0 387 OK 0.421 0.778 No
## 13 0 387 OK 0.147 0.654 No
## 14 0 387 OK 0.461 0.778 No
## 15 0 387 OK 0.732 0.879 No
## 16 0 387 OK 0.685 0.879 No
## 17 0 387 OK 0.400 0.778 No
## 18 0 387 OK 0.877 0.925 No
## 19 0 387 OK 0.281 0.778 No
## 20 0 387 OK 0.487 0.778 No
## 21 0 387 OK 0.142 0.654 No
## 22 0 387 OK 0.127 0.654 No
## 23 0 387 OK 0.103 0.654 No
## 24 0 387 OK 0.094 0.654 No
## 25 0 387 OK 0.225 0.698 No
## 26 0 387 OK 0.327 0.778 No
## 27 0 387 OK 0.970 0.970 No
## 28 0 387 OK 0.156 0.654 No
## 29 0 387 OK 0.895 0.925 No
## 30 0 387 OK 0.443 0.778 No
## 31 0 387 OK 0.527 0.778 No
## Gamma_Shift_Warning Convergence_Warning
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## 7 FALSE FALSE
## 8 FALSE FALSE
## 9 FALSE FALSE
## 10 FALSE FALSE
## 11 FALSE FALSE
## 12 FALSE FALSE
## 13 FALSE FALSE
## 14 FALSE FALSE
## 15 FALSE FALSE
## 16 FALSE FALSE
## 17 FALSE FALSE
## 18 FALSE FALSE
## 19 FALSE FALSE
## 20 FALSE FALSE
## 21 FALSE FALSE
## 22 FALSE FALSE
## 23 FALSE FALSE
## 24 FALSE FALSE
## 25 FALSE FALSE
## 26 FALSE FALSE
## 27 FALSE FALSE
## 28 FALSE FALSE
## 29 FALSE FALSE
## 30 FALSE FALSE
## 31 FALSE FALSE
Cognitive Outcomes: Within the PCR-positive group, the grouped COVID-19 variant categories (reference level 0, plus an “Other_Rare” category) were compared while controlling for age (age_pcr) and pre-study vaccination status (vaccine_before_study). No detailed cognitive score differed significantly after FDR correction. listaprimerrec (p=0.003), fluencia (p=0.004), x5dcounterr (p=0.008), and x5dreaderr (p=0.037) were nominally significant, and bostonfonerr came close (p=0.061), but none met the FDR threshold (smallest FDR p=0.067). The analysis for dsomis failed. Note that reference level 0 contains only 5 participants and “Other_Rare” only 7, so contrasts involving these groups are imprecise.
Neuroimaging Outcomes: After adjusting for age and vaccination status and applying FDR correction, no significant differences in neuroimaging volumes were detected across the grouped COVID-19 variant categories within the PCR-positive sample. Only right_amygdala was nominally significant (p=0.015), and it was non-significant after correction (FDR p=0.457). The next-smallest p-value, for right_an_g_angular_gyrus (p=0.094), did not reach nominal significance.
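Because the variant comparison uses a five-level factor with very uneven group sizes, it can help to check for sparse levels and to pick a well-populated reference level before modelling. The sketch below is illustrative only: the group sizes are taken from the table of covid_variant_grouped above, but the outcome (score), the covariate (age), and the absence of any group effect are simulated assumptions, and the likelihood-ratio test is a plain glm() comparison rather than the internals of compare_groups_auto_v4.

```r
set.seed(42)

# Simulated stand-in data using the group sizes from the variant table;
# 'score' and 'age' are hypothetical and carry no real study information
sizes <- c("0" = 5, "1" = 222, "2" = 94, "3" = 59, "Other_Rare" = 7)
sim <- data.frame(
  variant = factor(rep(names(sizes), times = sizes)),
  age     = rnorm(sum(sizes), mean = 65, sd = 5)
)
sim$score <- 50 - 0.2 * sim$age + rnorm(nrow(sim), sd = 5)

# Flag sparse levels before modelling
group_n <- table(sim$variant)
sparse_levels <- names(group_n)[group_n < 10]
print(sparse_levels)

# Use the most frequent level as the reference instead of level "0"
most_common <- names(which.max(group_n))
sim$variant <- relevel(sim$variant, ref = most_common)

# Likelihood-ratio test of the grouping factor, adjusted for age
fit_null <- glm(score ~ age, family = gaussian(), data = sim)
fit_full <- glm(score ~ age + variant, family = gaussian(), data = sim)
lrt <- anova(fit_null, fit_full, test = "LRT")
print(lrt)
```

Changing the reference level does not alter the overall likelihood-ratio test of the factor; it only changes which pairwise contrasts the individual coefficients represent, and a larger reference group makes those contrasts more stable.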
# Goal: Focus comparisons on specific types of cognitive measures.
# Define domain-specific variable lists (examples - adjust based on your expertise)
processing_speed_vars <- c("otverbaltpo", "otvisualtpo", "otmentaltpo", "otvismenttpo",
"otswitchtpo", "x5dreadtpo", "x5dcounttpo", "x5dfoctpo",
"x5dswitchtpo", "torretpo", "bostonlat")
accuracy_error_vars <- c("otverbalerr", "otvisualerr", "otmentalerr", "otvismenterr",
"otswitcherr", "x5dreaderr", "x5dcounterr", "x5dfocerr",
"x5dswitcherr", "dsomis", "dscomis", "bostonsemerr", "bostonfonerr")
memory_learning_vars <- c("listaprimerrec", "listaaprendizaje", "listacp", "listalp",
"listarecon", "corsidirecto", "corsiinverso", "dscorr")
executive_naming_vars <- c("fluencia", "bostonsc", "torremov") # Add others like otswitchtpo/err if desired
# Example: Comparing Processing Speed by PCR Status
results_pcr_speed <- compare_groups_auto_v4(
vars_to_test = processing_speed_vars,
group = "pcr",
covariates = c("age_pcr"),
data = imputed_data
)
print(results_pcr_speed)
## Variable Type Covariates_Used
## 1 otverbaltpo Continuous (Gaussian GLM used) age_pcr
## 2 otvisualtpo Continuous (Gaussian GLM used) age_pcr
## 3 otmentaltpo Continuous (Gaussian GLM used) age_pcr
## 4 otvismenttpo Continuous (Gaussian GLM used) age_pcr
## 5 otswitchtpo Continuous (Gaussian GLM used) age_pcr
## 6 x5dreadtpo Continuous (Gaussian GLM used) age_pcr
## 7 x5dcounttpo Continuous (Gaussian GLM used) age_pcr
## 8 x5dfoctpo Continuous (Gaussian GLM used) age_pcr
## 9 x5dswitchtpo Continuous (Gaussian GLM used) age_pcr
## 10 torretpo Continuous (Gaussian GLM used) age_pcr
## 11 bostonlat Continuous (Gaussian GLM used) age_pcr
## Group_Ref_Level_Used n_obs Status p_value p_value_FDR Significant
## 1 NEGATIVA 463 OK 0.230 0.421 No
## 2 NEGATIVA 463 OK 0.994 0.994 No
## 3 NEGATIVA 463 OK 0.143 0.339 No
## 4 NEGATIVA 463 OK 0.345 0.447 No
## 5 NEGATIVA 463 OK 0.366 0.447 No
## 6 NEGATIVA 463 OK 0.154 0.339 No
## 7 NEGATIVA 463 OK 0.142 0.339 No
## 8 NEGATIVA 463 OK 0.344 0.447 No
## 9 NEGATIVA 463 OK 0.910 0.994 No
## 10 NEGATIVA 463 OK 0.007 0.078 No
## 11 NEGATIVA 463 OK 0.064 0.339 No
## Gamma_Shift_Warning Convergence_Warning
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## 7 FALSE FALSE
## 8 FALSE FALSE
## 9 FALSE FALSE
## 10 FALSE FALSE
## 11 FALSE FALSE
# Example: Comparing Memory/Learning by Smell Loss (within PCR+)
results_smell_memory <- compare_groups_auto_v4(
vars_to_test = memory_learning_vars,
group = "smell_lost",
covariates = c("age_pcr", "risk_hospital_icu"),
data = pcr_positive_data
)
print(results_smell_memory)
## Variable Type Covariates_Used
## 1 listaprimerrec Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 2 listaaprendizaje Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 3 listacp Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 4 listalp Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 5 listarecon Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 6 corsidirecto Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 7 corsiinverso Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## 8 dscorr Continuous (Gaussian GLM used) age_pcr, risk_hospital_icu
## Group_Ref_Level_Used n_obs Status p_value p_value_FDR Significant
## 1 0 387 OK 0.405 0.810 No
## 2 0 387 OK 0.522 0.835 No
## 3 0 387 OK 0.680 0.878 No
## 4 0 387 OK 0.333 0.810 No
## 5 0 387 OK 0.316 0.810 No
## 6 0 387 OK 0.769 0.878 No
## 7 0 387 OK 0.174 0.810 No
## 8 0 387 OK 0.954 0.954 No
## Gamma_Shift_Warning Convergence_Warning
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## 7 FALSE FALSE
## 8 FALSE FALSE
# *** Repeat the above structure for other domains and other grouping variables of interest ***
# e.g., Accuracy vs PCR, Executive vs Severity (grouped), Memory vs Variant (grouped) etc.
Processing Speed vs. PCR Status (N=463): PCR-positive and PCR-negative groups (reference: NEGATIVA) were compared on processing speed variables while controlling for age (age_pcr). No variable differed significantly after FDR correction. torretpo (Tower Task Time) showed the strongest signal, with a nominal p=0.007 that did not survive correction (FDR p=0.078). bostonlat approached nominal significance (p=0.064; FDR p=0.339).
Memory/Learning vs. Smell Loss (within PCR+, N=387): Comparing PCR-positive individuals with and without smell loss (reference level 0) on memory and learning variables, while controlling for age (age_pcr) and COVID severity (risk_hospital_icu), revealed no significant differences, either nominally (smallest p=0.174, corsiinverso) or after FDR correction.
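The domain-by-domain structure above can be looped rather than repeated by hand. The following is a minimal sketch, not the compare_groups_auto_v4 implementation: it uses simulated data, a reduced set of variable names borrowed from the domain lists, a hypothetical second PCR level ("POSITIVA"), and an age-adjusted F-test from nested lm() fits. Benjamini-Hochberg correction is applied within each domain separately.

```r
set.seed(1)

# Simulated stand-in data; column names follow the document, values do not.
# The level "POSITIVA" is an assumption made for illustration.
n <- 200
sim <- data.frame(
  pcr      = factor(sample(c("NEGATIVA", "POSITIVA"), n, replace = TRUE)),
  age_pcr  = rnorm(n, mean = 65, sd = 5),
  torretpo = rnorm(n), bostonlat = rnorm(n),
  listacp  = rnorm(n), listalp = rnorm(n), dscorr = rnorm(n)
)

# Reduced domain lists for the example
domains <- list(
  processing_speed = c("torretpo", "bostonlat"),
  memory_learning  = c("listacp", "listalp", "dscorr")
)

# Age-adjusted group test for one outcome: F-test on the added group term
test_one <- function(outcome, data) {
  fit_null <- lm(reformulate("age_pcr", response = outcome), data = data)
  fit_full <- lm(reformulate(c("age_pcr", "pcr"), response = outcome), data = data)
  anova(fit_null, fit_full)[2, "Pr(>F)"]
}

# Run every domain and apply BH correction within each domain separately
results <- do.call(rbind, lapply(names(domains), function(d) {
  p <- vapply(domains[[d]], test_one, numeric(1), data = sim)
  data.frame(domain = d, variable = domains[[d]],
             p_value = p, p_value_FDR = p.adjust(p, method = "BH"),
             row.names = NULL)
}))
print(results)
```

Correcting within a domain uses a smaller family of tests than correcting across all cognitive variables at once, so the same nominal p-value receives a smaller adjusted value. The choice of family is best fixed before looking at the results.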
A machine learning approach such as a Random Forest can be used to classify participants according to cognitive status using demographic and neuroimaging predictors. This approach can also help identify the most important predictors of cognitive differences.
# Load the randomForest package
library(randomForest)
# Ensure cognitive status is treated as a factor
imputed_data$cognitive <- as.factor(imputed_data$cognitive)
# Select predictors (here we include age_pcr and neuroimaging measures; adjust as needed)
predictors <- imputed_data %>%
select(age_pcr, right_accumbens_area:left_g_re_gyrus_rectus)
# Combine response and predictors into one dataframe
rf_data <- data.frame(cognitive = imputed_data$cognitive, predictors)
# Set seed for reproducibility and train the random forest model
set.seed(123)
rf_model <- randomForest(cognitive ~ ., data = rf_data, importance = TRUE)
# Print the model summary
print(rf_model)
##
## Call:
## randomForest(formula = cognitive ~ ., data = rf_data, importance = TRUE)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 5
##
## OOB estimate of error rate: 40.6%
## Confusion matrix:
## 1 2 class.error
## 1 134 97 0.4199134
## 2 91 141 0.3922414
# Plot variable importance to identify key predictors
varImpPlot(rf_model)
## Interpretation of the Random Forest Variable Importance Plot
In these two plots, each point represents the contribution of one predictor variable to the random forest model used to classify participants according to their cognitive status. The left panel (Mean Decrease in Accuracy) shows how much out-of-bag accuracy drops when the values of a given variable are randomly permuted, while the right panel (Mean Decrease in Gini) shows how much each variable contributes to node purity (i.e., how well it splits the data within the trees).
1. Top Predictors
• Age (age_pcr) stands out as the most influential predictor for classifying cognitive status, indicating that chronological age has the strongest impact on the model’s decision-making process.
• Subcortical volumes (e.g., hippocampus and thalamus) also rank highly, suggesting that these brain regions are important for differentiating between cognitive groups.
2. Model Performance
• The out-of-bag (OOB) error rate of 40.6% corresponds to 59.4% correct classification. Because the two classes are almost equally sized (231 vs. 232), always predicting one class would already be about 50% accurate, so the model performs only modestly better than chance. Age and hippocampal volumes rank highest in importance, but their predictive value is limited in absolute terms.
3. Practical Implications
• The strong role of age underscores the need to control for or further investigate age-related effects when studying cognitive outcomes.
• The importance of hippocampal and thalamic volumes aligns with existing evidence that these structures are linked to cognitive performance, particularly in aging populations.
• The modest overall accuracy suggests that either (a) additional variables or (b) a different modeling approach (e.g., feature engineering, dimension reduction, or other machine learning methods) may be needed to improve classification performance.
Overall, these plots highlight that age and certain subcortical regions are key drivers of the model’s classification decisions, but the relatively high misclassification rate points to a complex interplay of factors influencing cognitive status.
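To put the OOB error rate in context, the accuracy can be compared with the no-information rate, that is, the accuracy obtained by always predicting the larger class. The sketch below uses only the confusion matrix printed above. Treating the OOB predictions as independent Bernoulli trials is a simplifying assumption, so the test should be read as a rough guide rather than an exact result.

```r
# Confusion matrix copied from the random forest output (rows = actual class)
conf_mat <- matrix(c(134, 97,
                     91, 141),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(actual = c("1", "2"), predicted = c("1", "2")))

n_total   <- sum(conf_mat)
n_correct <- sum(diag(conf_mat))
accuracy  <- n_correct / n_total

# No-information rate: accuracy from always predicting the larger class
nir <- max(rowSums(conf_mat)) / n_total

# One-sided exact binomial test of accuracy against the no-information rate
# (independence of OOB predictions is assumed for simplicity)
acc_test <- binom.test(n_correct, n_total, p = nir, alternative = "greater")

print(round(c(accuracy = accuracy, no_information_rate = nir), 3))
print(acc_test$p.value)
```

The gap between accuracy and the no-information rate, about nine percentage points here, is usually more informative for a balanced two-class problem than the raw accuracy.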
GAMs allow flexible modeling of non-linear effects. For instance, you can explore the non-linear association between age (or age interval) and brain volumes, which may not be captured adequately by linear models.
# Load the mgcv package
library(mgcv)
# Fit a GAM for one brain region (e.g., right_hippocampus) as a function of age_pcr
gam_model <- gam(right_hippocampus ~ s(age_pcr), data = imputed_data)
# Print the summary of the model
summary(gam_model)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## right_hippocampus ~ s(age_pcr)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3823.52 19.28 198.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(age_pcr) 3.419 4.326 9.536 3.68e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.0801 Deviance explained = 8.69%
## GCV = 1.7373e+05 Scale est. = 1.7207e+05 n = 463
# Plot the fitted smooth along with the residuals
plot(gam_model, residuals = TRUE, pch = 20, cex = 0.5, shade = TRUE)
# Diagnostic plots for the GAM model
par(mfrow = c(2,2))
plot(gam_model, residuals = TRUE, pch = 20, cex = 0.5, shade = TRUE)
# Check model diagnostics
gam.check(gam_model)
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 4 iterations.
## The RMS GCV score gradient at convergence was 0.01721632 .
## The Hessian was positive definite.
## Model rank = 10 / 10
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(age_pcr) 9.00 3.42 0.95 0.12
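The smooth term above has an effective degrees of freedom (edf) of about 3.4, which suggests some curvature in the age effect. One way to check whether a smooth is worth its extra flexibility is to compare it with a strictly linear fit. The sketch below does this on simulated data; the variable names, the quadratic shape, and the noise level are all illustrative assumptions, and ML smoothness selection is used here (rather than the default GCV shown above) so that the two fits are compared on a likelihood basis.

```r
library(mgcv)
set.seed(2024)

# Simulated stand-in data: volume declines faster at older ages (assumed shape)
n <- 300
sim <- data.frame(age = runif(n, min = 50, max = 85))
sim$volume <- 4000 - 1.0 * (sim$age - 50)^2 + rnorm(n, sd = 150)

# Linear age effect versus a penalized smooth of age
fit_linear <- gam(volume ~ age, data = sim, method = "ML")
fit_smooth <- gam(volume ~ s(age), data = sim, method = "ML")

# Effective degrees of freedom of the smooth: values near 1 mean "almost linear"
edf_smooth <- summary(fit_smooth)$edf
print(edf_smooth)

# Lower AIC favours the model that better balances fit and complexity
print(AIC(fit_linear, fit_smooth))
```

Applied to the real data, the same comparison would replace sim with imputed_data and volume ~ s(age) with right_hippocampus ~ s(age_pcr).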
library(lavaan)
# --- Create the numeric version of cognitive FIRST ---
# Ensure the 'cognitive' factor exists and convert it
if ("cognitive" %in% names(imputed_data) && is.factor(imputed_data$cognitive)) {
imputed_data$cognitive_num <- as.numeric(as.character(imputed_data$cognitive))
print("Created 'cognitive_num' variable.")
} else {
stop("Error: 'cognitive' column not found in imputed_data or is not a factor. Cannot create 'cognitive_num'.")
}
## [1] "Created 'cognitive_num' variable."
# --- Scaling Variables for Mediation ---
# Select only the data needed for the model and scale the continuous variables
# Now 'cognitive_num' exists and can be selected
mediation_data_scaled <- imputed_data %>%
  # all_of() stops with an error if any required column is missing
  select(all_of(c("cognitive_num", "right_hippocampus", "age_pcr"))) %>%
  mutate(
    # scale() returns a one-column matrix; as.numeric() keeps plain numeric columns
    cognitive_num_scaled = as.numeric(scale(cognitive_num)),
    right_hippocampus_scaled = as.numeric(scale(right_hippocampus)),
    age_pcr_scaled = as.numeric(scale(age_pcr))
  ) %>%
  # Select only the scaled variables for the model
  select(cognitive_num_scaled, right_hippocampus_scaled, age_pcr_scaled)
# --- Define the Mediation Model using Scaled Variables ---
mediation_model_scaled <- '
# Direct effect
cognitive_num_scaled ~ c*age_pcr_scaled
# Mediator path
right_hippocampus_scaled ~ a*age_pcr_scaled
cognitive_num_scaled ~ b*right_hippocampus_scaled
# Indirect effect (a*b) and total effect
ab := a*b
total := c + (a*b)
'
# --- Fit the Mediation Model using Scaled Data ---
# Still use MLR for robustness if desired, but scaling often helps stability
fit_mediation_scaled <- sem(mediation_model_scaled,
data = mediation_data_scaled,
estimator = "MLR", # Or use "ML" if MLR still fails
warn = TRUE) # Keep warnings on
# --- Summarize the Results ---
# Note: Coefficients will now be standardized estimates because variables were scaled
print("--- Summary of Mediation Model with Scaled Variables ---")
## [1] "--- Summary of Mediation Model with Scaled Variables ---"
# Check if the model converged and standard errors were computed
summary_output <- tryCatch(summary(fit_mediation_scaled, standardized = FALSE, fit.measures = TRUE),
error = function(e) { print(paste("Error in summary:", e$message)); NULL })
if (!is.null(summary_output)) {
print(summary_output)
} else {
print("Model summary could not be generated. Check previous warnings/errors.")
}
## lavaan 0.6-19 ended normally after 1 iteration
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 5
##
## Number of observations 463
##
## Model Test User Model:
## Standard Scaled
## Test Statistic 0.000 0.000
## Degrees of freedom 0 0
##
## Model Test Baseline Model:
##
## Test statistic 67.355 67.355
## Degrees of freedom 3 3
## P-value 0.000 0.000
## Scaling correction factor 1.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000 1.000
## Tucker-Lewis Index (TLI) 1.000 1.000
##
## Robust Comparative Fit Index (CFI) NA
## Robust Tucker-Lewis Index (TLI) NA
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1279.258 -1279.258
## Loglikelihood unrestricted model (H1) -1279.258 -1279.258
##
## Akaike (AIC) 2568.517 2568.517
## Bayesian (BIC) 2589.205 2589.205
## Sample-size adjusted Bayesian (SABIC) 2573.337 2573.337
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000 NA
## 90 Percent confidence interval - lower 0.000 NA
## 90 Percent confidence interval - upper 0.000 NA
## P-value H_0: RMSEA <= 0.050 NA NA
## P-value H_0: RMSEA >= 0.080 NA NA
##
## Robust RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value H_0: Robust RMSEA <= 0.050 NA
## P-value H_0: Robust RMSEA >= 0.080 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000 0.000
##
## Parameter Estimates:
##
## Standard errors Sandwich
## Information bread Observed
## Observed information based on Hessian
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## cognitive_num_scaled ~
## ag_pcr_scl (c) -0.252 0.043 -5.914 0.000
## right_hippocampus_scaled ~
## ag_pcr_scl (a) -0.252 0.052 -4.869 0.000
## cognitive_num_scaled ~
## rght_hppc_ (b) 0.069 0.044 1.562 0.118
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .cogntv_nm_scld 0.921 0.022 41.150 0.000
## .rght_hppcmps_s 0.935 0.068 13.742 0.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## ab -0.017 0.012 -1.457 0.145
## total -0.269 0.040 -6.718 0.000
# You can also request standardized=TRUE to see std.all column if needed,
# though it will be very similar to the estimates themselves now.
# summary(fit_mediation_scaled, standardized = TRUE, fit.measures = TRUE)
# Check variances of scaled data (should be ~1)
print("--- Variance Table for Scaled Data Used in Model ---")
## [1] "--- Variance Table for Scaled Data Used in Model ---"
# Use the scaled data frame directly to check variances
print(sapply(mediation_data_scaled, var))
## cognitive_num_scaled right_hippocampus_scaled age_pcr_scaled
## 1 1 1
# Or use varTable on the fitted object if it succeeded
# print("--- Variance Table from lavaan Object ---")
# tryCatch(print(varTable(fit_mediation_scaled)), error = function(e) print(paste("varTable failed:", e$message)))
The mediation model is just-identified (0 degrees of freedom), so all global fit indices (CFI, TLI, RMSEA, SRMR) are perfect by definition and provide no additional insight. The output above is on the standardized scale because all variables were scaled before fitting; the unstandardized values quoted below are on the original scale of the variables. The estimated parameters indicate the following:
• Direct Effect (c): The effect of age (age_pcr) on cognitive performance (cognitive_num) is estimated at –0.024 (standardized –0.252, p<0.001), suggesting that, holding hippocampal volume constant, older age is directly associated with lower cognitive scores.
• Path a: The effect of age on right hippocampal volume is –20.512 (standardized –0.252, p<0.001), indicating that older age is associated with smaller hippocampal volume.
• Path b: The effect of right hippocampal volume on cognitive performance is estimated at 0.000 (standardized 0.069, p=0.118). The unstandardized value rounds to zero largely because hippocampal volume takes values in the thousands (the GAM intercept is about 3824). The standardized estimate is small and not statistically significant, so hippocampal volume does not significantly predict cognitive performance once age is accounted for.
• Indirect Effect (ab): The product of paths a and b (i.e., the mediated effect) is –0.002 (standardized –0.252 × 0.069 ≈ –0.017, p=0.145), which is very small relative to the total effect and not statistically significant.
• Total Effect: The sum of the direct and indirect effects is –0.025 (standardized –0.252 – 0.017 = –0.269, p<0.001), which is essentially driven by the direct effect of age on cognition.
While age significantly predicts both lower cognitive performance and reduced hippocampal volume, there is no evidence from this model that hippocampal volume mediates the relationship between age and cognitive performance. The mediation pathway (ab) is negligible, indicating that the association between age and cognition is primarily a direct effect.
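The z-test reported for ab relies on a normal approximation, but the sampling distribution of a product of two coefficients is often skewed. A percentile bootstrap confidence interval is a common alternative. The sketch below is not the lavaan fit used above: it estimates the same a and b paths with two lm() regressions on simulated data (names and effect sizes are illustrative assumptions chosen to resemble the standardized estimates) and resamples rows to obtain the interval.

```r
set.seed(123)

# Simulated stand-in data on a standardized scale; effect sizes are illustrative
n <- 463
x <- rnorm(n)                           # predictor, e.g. age
m <- -0.25 * x + rnorm(n)               # mediator, e.g. hippocampal volume
y <- -0.25 * x + 0.07 * m + rnorm(n)    # outcome, e.g. cognitive score
sim <- data.frame(x, m, y)

# Product-of-coefficients estimate of the indirect effect a*b
indirect <- function(d) {
  a <- coef(lm(m ~ x, data = d))[["x"]]
  b <- coef(lm(y ~ x + m, data = d))[["m"]]
  a * b
}

ab_hat <- indirect(sim)

# Percentile bootstrap: resample rows with replacement and re-estimate a*b
n_boot <- 2000
ab_boot <- replicate(n_boot, indirect(sim[sample(n, replace = TRUE), ]))
ci <- quantile(ab_boot, probs = c(0.025, 0.975))

print(ab_hat)
print(ci)
```

If the interval contains zero, the data are compatible with no mediated effect, which is the same qualitative reading as a non-significant z-test for ab.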
# Assume 'imputed_data' dataframe from the imputation chunk is available
# Ensure relevant variables are factors where appropriate (as done in original EDA)
factor_cols_imp <- c("pcr", "anosmia", "risk_hospital_icu", "vaccine_before_study",
"covid_before_vaccination", "fever", "cough", "muscle_pain",
"breath_dif", "smell_lost", "taste_lost", "covid_variant",
"vaccine_1", "vaccine_2", "vaccine_3", "cognitive") # Add cognitive if used as factor
existing_factor_cols_imp <- factor_cols_imp[factor_cols_imp %in% names(imputed_data)]
if (length(existing_factor_cols_imp) > 0) {
imputed_data <- imputed_data %>%
mutate(across(all_of(existing_factor_cols_imp), as.factor))
}
# Define cognitive and neuroimaging variable sets by column position
# This assumes imputed_data columns 23:59 are cognitive and 60:90 are neuroimaging
# Positions break silently if the column order changes, so verify the resulting
# names or replace the ranges with explicit name vectors
cognitive_vars_names <- names(imputed_data)[23:59] # Example range, verify column names
neuroimaging_vars_names <- names(imputed_data)[60:90] # Example range, verify column names
symptom_vars_names <- c("anosmia", "fever", "cough", "muscle_pain", "breath_dif", "smell_lost", "taste_lost") # Add others if relevant
symptom_vars_names <- intersect(symptom_vars_names, names(imputed_data)) # Keep only existing symptom vars
# Hypothesis: More severe COVID (higher risk) might lead to worse cognitive scores,
# potentially amplified in the PCR positive group.
# Using 'cognitive_num' (created for mediation) or a primary cognitive score.
# Let's use 'cognitive_num' and control for age.
if (all(c("cognitive_num", "risk_hospital_icu", "pcr", "age_pcr") %in% names(imputed_data))) {
print("--- Q9: Severity (Risk Hospital/ICU) Interaction with PCR on Cognition ---")
# Ensure risk_hospital_icu is a factor
if (!is.factor(imputed_data$risk_hospital_icu)) {
imputed_data$risk_hospital_icu <- factor(imputed_data$risk_hospital_icu)
print("Converted risk_hospital_icu to factor.")
}
print("Levels of risk_hospital_icu:")
print(levels(imputed_data$risk_hospital_icu))
print("Table of risk_hospital_icu vs pcr:")
print(table(imputed_data$risk_hospital_icu, imputed_data$pcr))
# Fit linear model with interaction term
# Using cognitive_num as the outcome (assuming it's a reasonable continuous score)
lm_severity_interaction <- lm(cognitive_num ~ risk_hospital_icu * pcr + age_pcr, data = imputed_data)
print("ANOVA Table (Type III SS) for Severity Interaction Model:")
tryCatch({
anova_q9 <- car::Anova(lm_severity_interaction, type = "III")
print(anova_q9)
# Calculate effect sizes
print("Effect Sizes (Partial Eta Squared):")
print(effectsize::eta_squared(anova_q9, partial = TRUE))
}, error = function(e) {
print(paste("Error running Anova/eta_squared for Q9:", e$message))
print("Showing basic model summary instead:")
print(summary(lm_severity_interaction))
})
# Visualize interaction if significant (example using ggplot)
# Check interaction term p-value from Anova output
# interaction_p_val_q9 <- anova_q9["risk_hospital_icu:pcr", "Pr(>F)"] # Adjust row name if needed
# if (!is.na(interaction_p_val_q9) && interaction_p_val_q9 < 0.05) {
# print("Interaction detected, generating plot:")
# print(
# ggplot(imputed_data, aes(x = pcr, y = cognitive_num, color = risk_hospital_icu, group = risk_hospital_icu)) +
# stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.1, position = position_dodge(0.1)) +
# stat_summary(fun = mean, geom = "line", position = position_dodge(0.1)) +
# stat_summary(fun = mean, geom = "point", position = position_dodge(0.1), size = 2) +
# labs(title = "Interaction: Cognitive Score by PCR Status and Hospital/ICU Risk",
# x = "PCR Status", y = "Mean Cognitive Score", color = "Hospital/ICU Risk") +
# theme_minimal()
# )
# } else {
# print("Interaction term risk_hospital_icu:pcr not significant, skipping interaction plot.")
# }
} else {
print("Skipping Q9 analysis: Required columns ('cognitive_num', 'risk_hospital_icu', 'pcr', 'age_pcr') not found.")
}
## [1] "--- Q9: Severity (Risk Hospital/ICU) Interaction with PCR on Cognition ---"
## [1] "Levels of risk_hospital_icu:"
## [1] "0" "1" "2" "3"
## [1] "Table of risk_hospital_icu vs pcr:"
##
## NEGATIVA POSITIVA
## 0 74 314
## 1 0 17
## 2 2 49
## 3 0 7
## [1] "ANOVA Table (Type III SS) for Severity Interaction Model:"
## [1] "Error running Anova/eta_squared for Q9: there are aliased coefficients in the model"
## [1] "Showing basic model summary instead:"
##
## Call:
## lm(formula = cognitive_num ~ risk_hospital_icu * pcr + age_pcr,
## data = imputed_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.7691 -0.4598 0.2444 0.4404 0.8511
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.295779 0.292112 11.283 < 2e-16 ***
## risk_hospital_icu1 0.001595 0.120389 0.013 0.989
## risk_hospital_icu2 -0.101563 0.346109 -0.293 0.769
## risk_hospital_icu3 -0.138195 0.185020 -0.747 0.455
## pcrPOSITIVA -0.093310 0.062589 -1.491 0.137
## age_pcr -0.025670 0.004258 -6.029 3.42e-09 ***
## risk_hospital_icu1:pcrPOSITIVA NA NA NA NA
## risk_hospital_icu2:pcrPOSITIVA 0.028691 0.353973 0.081 0.935
## risk_hospital_icu3:pcrPOSITIVA NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4829 on 456 degrees of freedom
## Multiple R-squared: 0.08123, Adjusted R-squared: 0.06914
## F-statistic: 6.72 on 6 and 456 DF, p-value: 7.827e-07
R9: This analysis asked whether the relationship between COVID-19 severity (operationalized as risk_hospital_icu) and cognitive performance (cognitive_num) differs by PCR status (pcr), controlling for age (age_pcr). A linear model with a severity × PCR interaction was fitted. The design is sparse: severity levels 1 and 3 contain no PCR-negative participants and level 2 contains only two, so two of the three interaction coefficients are aliased (not estimable) and car::Anova() could not produce Type III tests. In the basic model summary, neither PCR status (p = 0.137), any severity level relative to level 0 (all p ≥ 0.455), nor the one estimable interaction term (severity level 2 × PCR status, p = 0.935) was significant. Age remained a highly significant predictor of cognitive score (b = -0.026 per year, p < 0.001). The interaction therefore cannot be assessed with these data; the terms that could be estimated give no evidence of a severity or PCR effect on cognition after adjusting for age.
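The singularity reported in the Q9 output can be reproduced on a small simulated data set. The sketch below is illustrative only: the data frame `toy`, its cell counts, and the collapsed factor `risk_any` are assumptions made for the example and are not part of the study data. It shows that empty cells in the severity-by-PCR table produce `NA` interaction coefficients, and that collapsing severity to a binary factor is one possible way to obtain a fully estimable model.

```r
set.seed(1)

# Toy design mimicking the sparse table: no PCR-negative cases at risk levels 1 and 3
cells <- data.frame(
  risk = c("0", "0", "1", "2", "2", "3"),
  pcr  = c("NEGATIVA", "POSITIVA", "POSITIVA", "NEGATIVA", "POSITIVA", "POSITIVA"),
  n    = c(30, 60, 10, 4, 20, 6)
)
toy <- cells[rep(seq_len(nrow(cells)), cells$n), c("risk", "pcr")]
toy$risk  <- factor(toy$risk)
toy$pcr   <- factor(toy$pcr)
toy$age   <- rnorm(nrow(toy), mean = 60, sd = 8)
toy$score <- 3 - 0.02 * toy$age + rnorm(nrow(toy), sd = 0.5)

# Empty cells are visible in the cross-tabulation
cell_counts <- table(toy$risk, toy$pcr)
print(cell_counts)

# The full interaction model has coefficients that cannot be estimated
fit_full <- lm(score ~ risk * pcr + age, data = toy)
aliased_terms <- names(coef(fit_full))[is.na(coef(fit_full))]
print(aliased_terms)

# Hypothetical remedy: collapse severity to "none" vs "any" so every cell is populated
toy$risk_any <- factor(ifelse(toy$risk == "0", "none", "any"), levels = c("none", "any"))
fit_collapsed <- lm(score ~ risk_any * pcr + age, data = toy)
print(coef(summary(fit_collapsed)))
```

Collapsing levels changes the question being asked (any elevated risk versus none), so whether it is appropriate depends on the study protocol.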
# Hypothesis: Patterns of symptoms (e.g., predominantly neurological vs. respiratory)
# might be more informative for predicting cognitive outcomes than single symptoms.
# Approach: Use clustering (e.g., K-means or hierarchical) on symptom variables
# for the PCR positive group, then compare cognitive scores across clusters.
if (length(symptom_vars_names) > 1 && "pcr" %in% names(imputed_data) && "cognitive_num" %in% names(imputed_data) && "age_pcr" %in% names(imputed_data)) {
print("--- Q10: Symptom Clusters and Cognition (in PCR+ group) ---")
# Subset data to PCR positive individuals and select symptom variables
pcr_positive_data <- imputed_data %>% filter(pcr == levels(pcr)[2]) # Assuming level 2 is positive
symptoms_for_clustering <- pcr_positive_data %>% select(all_of(symptom_vars_names))
# Convert factors to numeric for clustering (handle with care - assumes meaningful numeric representation)
# This might require careful dummy coding if factors are not ordinal/binary 0/1
# Example: simple conversion assuming binary factors are 0/1 or similar
symptoms_numeric <- symptoms_for_clustering %>%
mutate(across(everything(), ~ as.numeric(as.character(.)))) %>%
na.omit() # Clustering typically requires complete data
if (nrow(symptoms_numeric) > 10) { # Need sufficient data points for clustering
# Scale data before clustering
symptoms_scaled <- scale(symptoms_numeric)
# Determine optimal number of clusters (e.g., using elbow or silhouette method)
print("Determining optimal clusters (Example: Elbow method):")
# Set the fallback before tryCatch(): an assignment made inside the error handler
# would be local to the handler function and invisible to kmeans() below.
chosen_k <- 3 # Default if the elbow plot cannot be drawn
tryCatch({
print(factoextra::fviz_nbclust(symptoms_scaled, kmeans, method = "wss", k.max = 10))
# print(factoextra::fviz_nbclust(symptoms_scaled, kmeans, method = "silhouette", k.max = 10))
# Choose k based on the plot (e.g., k=3)
chosen_k <- 3 # <<-- Set k based on fviz_nbclust output
}, error = function(e) {
print(paste("Cluster number determination failed:", e$message))
print(paste("Defaulting to k =", chosen_k))
})
# Perform K-means clustering
set.seed(123) # for reproducibility
km_results <- kmeans(symptoms_scaled, centers = chosen_k, nstart = 25)
# Add cluster assignments back to the PCR positive data (matching by row index/ID if possible)
# This assumes na.omit() didn't drastically change the dataset size/order. Robust matching needed for production.
pcr_positive_data_complete <- pcr_positive_data %>% na.omit() # Match the data used for clustering
if(nrow(pcr_positive_data_complete) == nrow(symptoms_numeric)){
pcr_positive_data_complete$symptom_cluster <- factor(km_results$cluster)
print("Cluster assignments added.")
# Analyze cognitive scores across clusters, controlling for age
print(paste("Comparing cognitive_num across", chosen_k, "symptom clusters (controlling for age):"))
lm_cluster_cognition <- lm(cognitive_num ~ symptom_cluster + age_pcr, data = pcr_positive_data_complete)
print(summary(lm_cluster_cognition))
print("ANOVA Table (Type III SS) for Cluster Model:")
tryCatch({
anova_q10 <- car::Anova(lm_cluster_cognition, type = "III")
print(anova_q10)
print("Effect Sizes (Partial Eta Squared):")
print(effectsize::eta_squared(anova_q10, partial = TRUE))
}, error = function(e){ print(paste("Error in Anova/eta_squared for Q10:", e$message))})
# Visualize differences (optional)
# print(
# ggplot(pcr_positive_data_complete, aes(x = symptom_cluster, y = cognitive_num, fill = symptom_cluster)) +
# geom_boxplot(alpha=0.7) +
# labs(title = "Cognitive Score by Symptom Cluster (PCR Positive, Age Adjusted?)", # Note: plot doesn't show adjustment
# x = "Symptom Cluster", y = "Cognitive Score") +
# theme_minimal()
# )
} else {
print("Mismatch in row numbers after na.omit(). Cannot reliably add cluster assignments.")
}
} else {
print("Skipping Q10 clustering: Insufficient complete data points for symptom variables in PCR+ group.")
}
} else {
print("Skipping Q10 analysis: Required symptom variables, 'pcr', 'cognitive_num', or 'age_pcr' not found/sufficient.")
}
## [1] "--- Q10: Symptom Clusters and Cognition (in PCR+ group) ---"
## [1] "Determining optimal clusters (Example: Elbow method):"
## [1] "Cluster assignments added."
## [1] "Comparing cognitive_num across 3 symptom clusters (controlling for age):"
##
## Call:
## lm(formula = cognitive_num ~ symptom_cluster + age_pcr, data = pcr_positive_data_complete)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8010 -0.4853 -0.1251 0.4426 0.8268
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.075647 0.318998 9.642 < 2e-16 ***
## symptom_cluster2 0.018474 0.062070 0.298 0.766
## symptom_cluster3 -0.002058 0.060403 -0.034 0.973
## age_pcr -0.024012 0.004752 -5.053 6.75e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4865 on 383 degrees of freedom
## Multiple R-squared: 0.06268, Adjusted R-squared: 0.05534
## F-statistic: 8.538 on 3 and 383 DF, p-value: 1.68e-05
##
## [1] "ANOVA Table (Type III SS) for Cluster Model:"
## Anova Table (Type III tests)
##
## Response: cognitive_num
## Sum Sq Df F value Pr(>F)
## (Intercept) 21.999 1 92.9605 < 2.2e-16 ***
## symptom_cluster 0.033 2 0.0687 0.9336
## age_pcr 6.042 1 25.5321 6.751e-07 ***
## Residuals 90.636 383
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Effect Sizes (Partial Eta Squared):"
## # Effect Size for ANOVA (Type III)
##
## Parameter | Eta2 (partial) | 95% CI
## -----------------------------------------------
## symptom_cluster | 3.59e-04 | [0.00, 1.00]
## age_pcr | 0.06 | [0.03, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
R10: To investigate whether patterns of symptoms are associated with cognitive outcomes, symptom data from PCR-positive participants (n = 387) were subjected to k-means clustering with the number of clusters set to k = 3 in the script. A linear model then assessed differences in cognitive performance (cognitive_num) across the three clusters, controlling for age (age_pcr). There were no statistically significant differences in cognitive scores between the symptom clusters (F(2, 383) ≈ 0.07, p = 0.93), and the partial eta-squared for symptom cluster was negligible (η²p < 0.001). Age remained a significant predictor (b = -0.024 per year, p < 0.001), indicating that older participants within the PCR-positive group had lower cognitive scores irrespective of cluster membership. Symptom patterns, as clustered here, therefore did not explain variance in cognitive performance beyond age.
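Because `chosen_k` is set by hand in the Q10 chunk, it can help to tabulate the quantity behind the elbow plot directly. The following is a minimal sketch on simulated binary symptom indicators; the matrix `sym`, its five columns, and the three latent groups are assumptions made for illustration. It computes the total within-cluster sum of squares for k = 1 to 6 using only `stats::kmeans`.

```r
set.seed(123)

# Simulated 0/1 symptom indicators: 300 hypothetical participants, 5 symptoms,
# generated from three latent symptom patterns
n <- 300
group <- sample(1:3, n, replace = TRUE)
probs <- rbind(c(0.9, 0.8, 0.1, 0.1, 0.1),
               c(0.1, 0.1, 0.9, 0.8, 0.1),
               c(0.1, 0.2, 0.1, 0.2, 0.9))
sym <- matrix(rbinom(n * 5, size = 1, prob = probs[group, ]), nrow = n)
colnames(sym) <- paste0("symptom_", 1:5)
sym_scaled <- scale(sym)

# Total within-cluster sum of squares for each candidate k
wss <- sapply(1:6, function(k) {
  kmeans(sym_scaled, centers = k, nstart = 25)$tot.withinss
})
names(wss) <- paste0("k=", 1:6)
print(round(wss, 1))

# Drop in WSS gained by adding one more cluster; the elbow is where this levels off
print(round(-diff(wss), 1))
```

Reading the elbow from a table is still a judgment call, so the chosen k should be reported alongside the values that motivated it.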
# Hypothesis: Participants might fall into subgroups based on their performance patterns across
# multiple cognitive tests, revealing different types of cognitive impact (or resilience).
# Approach: Cluster participants based on a wide range of cognitive test scores.
if (length(cognitive_vars_names) > 5) { # Need multiple cognitive variables
print("--- Q11: Identifying Cognitive Profiles via Clustering ---")
# Select cognitive variables from imputed data
cognitive_data_for_clustering <- imputed_data %>% select(all_of(cognitive_vars_names)) %>% na.omit()
if(nrow(cognitive_data_for_clustering) > 10){
# Scale the data
cognitive_scaled <- scale(cognitive_data_for_clustering)
# Determine optimal number of clusters (similar to Q10)
print("Determining optimal cognitive clusters (Example: Elbow method):")
# Set the fallback before tryCatch(): an assignment made inside the error handler
# would be local to the handler function and invisible to kmeans() below.
chosen_k_cog <- 3 # Default if the elbow plot cannot be drawn
tryCatch({
print(factoextra::fviz_nbclust(cognitive_scaled, kmeans, method = "wss", k.max = 10))
# print(factoextra::fviz_nbclust(cognitive_scaled, kmeans, method = "silhouette", k.max = 10))
# Choose k based on the plot (e.g., k=4)
chosen_k_cog <- 4 # <<-- Set k based on fviz_nbclust output
}, error = function(e) {
print(paste("Cluster number determination failed:", e$message))
print(paste("Defaulting to k =", chosen_k_cog))
})
# Perform K-means clustering
set.seed(456)
km_cog_results <- kmeans(cognitive_scaled, centers = chosen_k_cog, nstart = 25)
# Add cluster assignments back to the main imputed dataset
# Again, assumes na.omit didn't drastically change things. Use ID matching if available.
imputed_data_complete_cog <- imputed_data %>% na.omit() # Match data used
if(nrow(imputed_data_complete_cog) == nrow(cognitive_data_for_clustering)){
imputed_data_complete_cog$cognitive_profile <- factor(km_cog_results$cluster)
print("Cognitive profile cluster assignments added.")
# Analyze characteristics of each profile cluster
print(paste("Analyzing characteristics across", chosen_k_cog, "cognitive profiles:"))
# Example: Compare age across profiles
if("age_pcr" %in% names(imputed_data_complete_cog)){
print("Age distribution by cognitive profile:")
print(summary(aov(age_pcr ~ cognitive_profile, data = imputed_data_complete_cog)))
# print(ggplot(imputed_data_complete_cog, aes(x=cognitive_profile, y=age_pcr, fill=cognitive_profile)) + geom_boxplot())
}
# Example: Compare PCR status distribution across profiles
if("pcr" %in% names(imputed_data_complete_cog)){
print("PCR status distribution by cognitive profile:")
print(table(imputed_data_complete_cog$cognitive_profile, imputed_data_complete_cog$pcr))
print(chisq.test(table(imputed_data_complete_cog$cognitive_profile, imputed_data_complete_cog$pcr)))
}
# Example: Compare symptom cluster distribution across profiles (if Q10 ran successfully)
if("symptom_cluster" %in% names(imputed_data_complete_cog)){ # Need to merge results carefully if na.omits differed
print("Symptom cluster distribution by cognitive profile:")
# Requires merging Q10 results back carefully if na.omits were different
# print(table(imputed_data_complete_cog$cognitive_profile, imputed_data_complete_cog$symptom_cluster))
# print(chisq.test(table(imputed_data_complete_cog$cognitive_profile, imputed_data_complete_cog$symptom_cluster)))
}
# Further analysis: Characterize each cluster by looking at the mean scaled cognitive scores
profile_centers <- aggregate(cognitive_scaled, by=list(cluster=km_cog_results$cluster), mean)
print("Mean scaled cognitive scores for each profile cluster:")
print(profile_centers)
} else {
print("Mismatch in row numbers after na.omit(). Cannot reliably add cognitive profile assignments.")
}
} else {
print("Skipping Q11 clustering: Insufficient complete data points for cognitive variables.")
}
} else {
print("Skipping Q11 analysis: Not enough cognitive variables found.")
}
## [1] "--- Q11: Identifying Cognitive Profiles via Clustering ---"
## [1] "Determining optimal cognitive clusters (Example: Elbow method):"
## [1] "Mismatch in row numbers after na.omit(). Cannot reliably add cognitive profile assignments."
R11: This analysis attempted to identify distinct cognitive profiles by applying k-means clustering to the cognitive performance variables, with the aim of comparing demographic characteristics and COVID-19 history across the resulting profiles. The elbow plot was generated and the clustering itself ran, but the script then stopped at its own safety check ("Mismatch in row numbers after na.omit()"). The check compares the row count after na.omit() on the cognitive columns alone with the row count after na.omit() on the entire dataset; the two differ whenever any non-cognitive column contains missing values. Because the cluster labels could not be attached reliably to the main dataset, the planned comparisons of age, PCR status, and symptom clusters across cognitive profiles were not carried out, and no cognitive profiles are reported.
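The Q11 chunk calls `na.omit()` twice on different column sets, which is what makes the row counts diverge. One way to avoid this, sketched here on a small simulated data frame (`dat`, `cog_cols`, and all column names are hypothetical), is to compute a single `complete.cases()` index on the clustering columns and write the cluster labels back through that same index.

```r
set.seed(456)

# Hypothetical data: three "cognitive" columns plus one unrelated column
dat <- data.frame(
  id    = 1:60,
  test1 = rnorm(60),
  test2 = rnorm(60),
  test3 = rnorm(60),
  other = rnorm(60)
)
dat$test2[c(3, 17)]    <- NA  # missing values in a clustering column
dat$other[c(5, 20, 41)] <- NA  # missing values in a non-clustering column

cog_cols <- c("test1", "test2", "test3")

# The two row counts compared in the Q11 check differ here
n_all <- nrow(na.omit(dat))              # complete on every column
n_cog <- nrow(na.omit(dat[, cog_cols]))  # complete on clustering columns only
print(c(all_columns = n_all, cognitive_only = n_cog))

# Single index: rows that are complete on the clustering columns
keep <- complete.cases(dat[, cog_cols])
cog_scaled <- scale(dat[keep, cog_cols])
km <- kmeans(cog_scaled, centers = 3, nstart = 25)

# Write labels back by position; rows excluded from clustering stay NA
profile <- rep(NA_integer_, nrow(dat))
profile[keep] <- km$cluster
dat$cognitive_profile <- factor(profile, levels = 1:3)
print(table(dat$cognitive_profile, useNA = "ifany"))
```

With this pattern, missing values in unrelated columns no longer affect which participants receive a profile; each downstream comparison can then handle its own missing data.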
# Hypothesis: Specific cognitive functions (e.g., executive function, memory) might correlate
# more strongly with certain brain regions than others.
# Approach: Create a correlation matrix between selected cognitive scores and neuroimaging variables.
# Select key cognitive domain representatives and neuroimaging variables
# Example cognitive vars: fluency, dscorr (memory), otswitcherr (switching errors), bostonsc (naming)
selected_cog_vars <- c("fluencia", "dscorr", "otswitcherr", "bostonsc")
selected_cog_vars <- intersect(selected_cog_vars, names(imputed_data)) # Ensure they exist
# Use all available neuroimaging vars
selected_neuro_vars <- neuroimaging_vars_names
if (length(selected_cog_vars) > 1 && length(selected_neuro_vars) > 1) {
print("--- Q12: Correlations between Cognitive Domains and Neuroimaging ---")
# Subset data
cor_data_q12 <- imputed_data %>% select(all_of(selected_cog_vars), all_of(selected_neuro_vars))
# Calculate correlation matrix (using pairwise complete observations)
cor_matrix_q12 <- cor(cor_data_q12, use = "pairwise.complete.obs")
# Extract the submatrix of correlations between cognitive and neuroimaging variables
cog_neuro_cor <- cor_matrix_q12[selected_cog_vars, selected_neuro_vars]
print("Correlation Matrix (Cognitive vs. Neuroimaging):")
# Print rounded matrix (adjust rounding digits if needed)
print(round(cog_neuro_cor, 2))
# Visualize the correlation matrix using corrplot
print("Generating Correlation Plot (Cognitive vs. Neuroimaging):")
tryCatch({
corrplot::corrplot(cog_neuro_cor,
method = "color", # Use color intensity
type = "full", # Show full matrix
order = "hclust", # Reorder based on clustering
addCoef.col = "black", # Add correlation coefficients
tl.col = "black", tl.srt = 45, # Text label color and rotation
number.cex = 0.7, # Size of coefficients
tl.cex = 0.8, # Size of labels
is.corr = TRUE, # Input is a correlation matrix
sig.level = 0.05, # Optionally cross out non-significant correlations
insig = "blank", # Leave non-significant blank (or use "pch", "p-value")
# Note: Significance requires calculating p-values separately, corrplot doesn't do it automatically from matrix input
title = "Correlations: Cognitive Domains vs Neuroimaging",
mar = c(0, 0, 1, 0)) # Adjust margins
}, error = function(e){ print(paste("Corrplot failed:", e$message))})
} else {
print("Skipping Q12 analysis: Not enough selected cognitive or neuroimaging variables found.")
}
## [1] "--- Q12: Correlations between Cognitive Domains and Neuroimaging ---"
## [1] "Correlation Matrix (Cognitive vs. Neuroimaging):"
## right_accumbens_area left_accumbens_area right_amygdala
## fluencia -0.11 -0.10 -0.11
## dscorr -0.10 -0.12 -0.08
## otswitcherr 0.09 0.10 0.19
## bostonsc -0.06 -0.06 -0.15
## left_amygdala right_cerebellum_exterior left_cerebellum_exterior
## fluencia -0.09 -0.11 -0.12
## dscorr -0.08 -0.13 -0.11
## otswitcherr 0.23 0.11 0.11
## bostonsc -0.12 -0.16 -0.16
## right_hippocampus left_hippocampus right_putamen left_putamen
## fluencia -0.14 -0.16 -0.06 -0.08
## dscorr -0.15 -0.13 -0.03 -0.05
## otswitcherr 0.23 0.22 0.05 0.02
## bostonsc -0.17 -0.19 -0.06 -0.09
## right_thalamus_proper left_thalamus_proper fornix_right fornix_left
## fluencia -0.20 -0.22 -0.15 -0.19
## dscorr -0.21 -0.23 -0.23 -0.22
## otswitcherr 0.24 0.22 0.23 0.28
## bostonsc -0.25 -0.28 -0.17 -0.16
## anterior_limb_of_internal_capsule_right
## fluencia -0.14
## dscorr -0.19
## otswitcherr 0.22
## bostonsc -0.19
## anterior_limb_of_internal_capsule_left
## fluencia -0.16
## dscorr -0.23
## otswitcherr 0.22
## bostonsc -0.21
## posterior_limb_of_internal_capsule_inc_cerebral_peduncle_right
## fluencia -0.13
## dscorr -0.11
## otswitcherr 0.14
## bostonsc -0.14
## posterior_limb_of_internal_capsule_inc_cerebral_peduncle_left
## fluencia -0.13
## dscorr -0.11
## otswitcherr 0.10
## bostonsc -0.13
## corpus_callosum right_a_cg_g_anterior_cingulate_gyrus
## fluencia -0.15 -0.04
## dscorr -0.15 -0.06
## otswitcherr 0.19 0.12
## bostonsc -0.15 -0.05
## left_a_cg_g_anterior_cingulate_gyrus right_a_ins_anterior_insula
## fluencia -0.17 -0.14
## dscorr -0.16 -0.24
## otswitcherr 0.18 0.24
## bostonsc -0.16 -0.21
## left_a_ins_anterior_insula right_an_g_angular_gyrus
## fluencia -0.13 -0.15
## dscorr -0.19 -0.13
## otswitcherr 0.21 0.13
## bostonsc -0.16 -0.08
## left_an_g_angular_gyrus right_cun_cuneus left_cun_cuneus
## fluencia -0.09 -0.11 -0.09
## dscorr -0.08 -0.06 -0.05
## otswitcherr 0.05 0.11 0.07
## bostonsc -0.05 -0.11 -0.12
## right_ent_entorhinal_area left_ent_entorhinal_area
## fluencia -0.09 -0.07
## dscorr -0.07 -0.03
## otswitcherr 0.17 0.13
## bostonsc -0.15 -0.11
## right_g_re_gyrus_rectus left_g_re_gyrus_rectus
## fluencia -0.10 -0.05
## dscorr -0.07 -0.01
## otswitcherr 0.10 0.08
## bostonsc -0.08 -0.04
## [1] "Generating Correlation Plot (Cognitive vs. Neuroimaging):"
R12: This analysis explored the bivariate correlations between selected cognitive measures (verbal fluency, fluencia; digit span correct, dscorr; switching errors, otswitcherr; and Boston Naming Test score, bostonsc) and the full set of structural neuroimaging variables. All associations were weak (|r| ≤ 0.28). Higher scores on fluency, memory/attention (dscorr), and naming (bostonsc) correlated negatively with the imaging measures in several regions, including the hippocampi, thalami, and anterior insula (r ≈ -0.13 to -0.28). Conversely, more errors on the switching task (otswitcherr) correlated positively with measures in the amygdalae, hippocampi, thalami, and fornix (r ≈ 0.19 to 0.28). Both patterns link better performance to lower imaging values, which is the opposite of what would be expected if every variable were a regional volume, so the scale and direction of the imaging measures should be verified before interpretation. These are simple correlations, reported without significance tests and without adjustment for confounders such as age, which may influence both cognitive scores and the imaging measures.
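To illustrate why the lack of age adjustment matters, the sketch below uses simulated data in which age drives both a cognitive score and an imaging measure that are otherwise unrelated. The variables `age`, `volume`, and `score` and their effect sizes are assumptions for the example, not estimates from the study. The age-adjusted (partial) correlation is obtained by correlating the residuals of each variable after regressing it on age.

```r
set.seed(2024)

n <- 400
age    <- rnorm(n, mean = 60, sd = 10)
volume <- -0.1 * age + rnorm(n)  # hypothetical imaging measure, declines with age
score  <- -0.1 * age + rnorm(n)  # hypothetical cognitive score, declines with age

# Raw correlation: inflated by the shared dependence on age
r_raw <- cor(score, volume)

# Partial correlation: correlate what is left after removing the linear age effect
res_score  <- resid(lm(score ~ age))
res_volume <- resid(lm(volume ~ age))
r_partial  <- cor(res_score, res_volume)

print(round(c(raw = r_raw, age_adjusted = r_partial), 2))
```

Applied to the study data, the same residual approach could be run for each cognitive and imaging pair, ideally together with `cor.test()` to obtain p-values for the adjusted correlations.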
# Hypothesis: These variables might relate to perceived risk, learning about COVID,
# or performance on specific experimental tasks. Explore their relationship with
# cognitive scores and COVID history.
# Approach: Basic correlations and comparisons.
env_vars <- c("listaprimerrec", "listaaprendizaje", "listacp", "listalp", "listarecon",
"corsidirecto", "corsiinverso", "cactuscorrectas", "cactusvivos", "cactusinanim")
env_vars <- intersect(env_vars, names(imputed_data))
# age_pcr and pcr are selected below, so check for them here as well
if (length(env_vars) > 0 && all(c("cognitive_num", "age_pcr", "pcr") %in% names(imputed_data))) {
print("--- Q13: Exploring Environmental COVID Variables ---")
# Select environmental and key cognitive/demographic variables
explore_data_q13 <- imputed_data %>% select(all_of(env_vars), cognitive_num, age_pcr, pcr) %>% na.omit()
if(nrow(explore_data_q13) > 10){
# Correlations between environmental variables and cognitive score / age
cor_env_cog_age <- cor(explore_data_q13 %>% select(all_of(env_vars), cognitive_num, age_pcr), use="pairwise.complete.obs")
print("Correlations involving Environmental Variables, Cognition, and Age:")
print(round(cor_env_cog_age[c("cognitive_num", "age_pcr"), env_vars], 2))
# Compare environmental variables based on PCR status (example using t-tests/wilcoxon)
print("Comparing Environmental Variables by PCR Status:")
for (env_var in env_vars) {
if (is.numeric(explore_data_q13[[env_var]])) {
print(paste("---", env_var, "vs PCR Status ---"))
tryCatch({
# Simple t-test or Wilcoxon as fallback
test_result <- t.test(as.formula(paste(env_var, "~ pcr")), data = explore_data_q13)
print(test_result)
}, error = function(e) {
tryCatch({
wilcox_result <- wilcox.test(as.formula(paste(env_var, "~ pcr")), data = explore_data_q13)
print(wilcox_result)
}, error = function(e2){
print(paste("Could not perform test for", env_var, ":", e2$message))
})
})
}
}
} else {
print("Skipping Q13 exploration: Insufficient complete data for environmental variables.")
}
} else {
print("Skipping Q13 analysis: Environmental variables or 'cognitive_num' not found.")
}
## [1] "--- Q13: Exploring Environmental COVID Variables ---"
## [1] "Correlations involving Environmental Variables, Cognition, and Age:"
## listaprimerrec listaaprendizaje listacp listalp listarecon
## cognitive_num 0.37 0.61 0.58 0.56 0.35
## age_pcr -0.20 -0.33 -0.35 -0.31 -0.18
## corsidirecto corsiinverso cactusvivos cactusinanim
## cognitive_num 0.08 0.25 0.42 0.34
## age_pcr -0.08 -0.14 -0.23 -0.26
## [1] "Comparing Environmental Variables by PCR Status:"
## [1] "--- listaprimerrec vs PCR Status ---"
##
## Welch Two Sample t-test
##
## data: listaprimerrec by pcr
## t = 0.11574, df = 122.92, p-value = 0.908
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
## -0.3355948 0.3772784
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA
## 3.723684 3.702842
##
## [1] "--- listaaprendizaje vs PCR Status ---"
##
## Welch Two Sample t-test
##
## data: listaaprendizaje by pcr
## t = 1.4734, df = 116.69, p-value = 0.1433
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
## -0.4279609 2.9147010
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA
## 25.02632 23.78295
##
## [1] "--- listacp vs PCR Status ---"
##
## Welch Two Sample t-test
##
## data: listacp by pcr
## t = 1.3638, df = 100.95, p-value = 0.1757
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
## -0.1829785 0.9880240
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA
## 6.539474 6.136951
##
## [1] "--- listalp vs PCR Status ---"
##
## Welch Two Sample t-test
##
## data: listalp by pcr
## t = 1.6448, df = 104.74, p-value = 0.103
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
## -0.1026459 1.1014219
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA
## 5.894737 5.395349
##
## [1] "--- listarecon vs PCR Status ---"
##
## Welch Two Sample t-test
##
## data: listarecon by pcr
## t = 1.8522, df = 143.03, p-value = 0.06606
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
## -0.04028367 1.23911408
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA
## 21.71053 21.11111
##
## [1] "--- corsidirecto vs PCR Status ---"
##
## Welch Two Sample t-test
##
## data: corsidirecto by pcr
## t = -0.14342, df = 150.46, p-value = 0.8862
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
## -0.2738168 0.2367571
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA
## 5.118421 5.136951
##
## [1] "--- corsiinverso vs PCR Status ---"
##
## Welch Two Sample t-test
##
## data: corsiinverso by pcr
## t = 1.1679, df = 110.26, p-value = 0.2454
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
## -0.1192161 0.4613893
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA
## 4.315789 4.144703
##
## [1] "--- cactusvivos vs PCR Status ---"
##
## Welch Two Sample t-test
##
## data: cactusvivos by pcr
## t = 0.90244, df = 121.75, p-value = 0.3686
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
## -0.3058832 0.8183951
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA
## 12.88158 12.62532
##
## [1] "--- cactusinanim vs PCR Status ---"
##
## Welch Two Sample t-test
##
## data: cactusinanim by pcr
## t = 0.92729, df = 122.67, p-value = 0.3556
## alternative hypothesis: true difference in means between group NEGATIVA and group POSITIVA is not equal to 0
## 95 percent confidence interval:
## -0.2850283 0.7874083
## sample estimates:
## mean in group NEGATIVA mean in group POSITIVA
## 15.31579 15.06460
print("--- End of Expanded Analysis Script ---")
## [1] "--- End of Expanded Analysis Script ---"
R13: This exploratory analysis focused on the variables grouped under ‘Environmental COVID’, which appear to represent performance on various cognitive or learning-related tasks (e.g., listaprimerrec, listaaprendizaje, cactusvivos). Nine of the ten listed variables appear in the output; cactuscorrectas is absent. First, scores on seven tasks (listaprimerrec, listaaprendizaje, listacp, listalp, listarecon, cactusvivos, cactusinanim) were moderately positively correlated with the general cognitive score (cognitive_num, r ≈ 0.34 to 0.61) and negatively correlated with age (age_pcr, r ≈ -0.18 to -0.35). The two Corsi measures were more weakly related to cognition (corsidirecto r = 0.08, corsiinverso r = 0.25). This suggests that these variables largely reflect age-sensitive cognitive performance rather than environmental exposure. Second, Welch’s t-tests compared each variable between PCR-negative and PCR-positive participants. None of the nine comparisons was statistically significant (all p > 0.05; the smallest was listarecon, p = 0.066), and no correction for multiple comparisons was applied. Thus, while these tasks measure age-related cognitive functions, performance on them did not differ significantly by PCR status in this sample.
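Since nine tests were run on related outcomes, a multiplicity adjustment is a natural follow-up check. The sketch below copies the nine p-values from the Welch t-test output above and applies `p.adjust()` with the Holm and Benjamini-Hochberg methods; the choice of these two methods is an assumption for illustration, not something specified in the original script.

```r
# p-values copied from the Welch t-test output for each variable
p_raw <- c(
  listaprimerrec   = 0.908,
  listaaprendizaje = 0.1433,
  listacp          = 0.1757,
  listalp          = 0.103,
  listarecon       = 0.06606,
  corsidirecto     = 0.8862,
  corsiinverso     = 0.2454,
  cactusvivos      = 0.3686,
  cactusinanim     = 0.3556
)

# Family-wise error control (Holm) and false discovery rate control (BH)
p_holm <- p.adjust(p_raw, method = "holm")
p_bh   <- p.adjust(p_raw, method = "BH")

print(round(cbind(raw = p_raw, holm = p_holm, BH = p_bh), 3))
```

Because no unadjusted p-value was below 0.05, the adjustment cannot change the conclusion here; it only makes explicit how far the smallest p-value is from significance once the number of tests is taken into account.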